nilFM — eureka has lots of markup

eureka has lots of markup

2022-02-13

I've wanted to do this for a long time, and I finally got around to it! eureka now has a more or less complete markup language that compiles to HTML! The only major thing it's missing is (I think) tables, and that may be coming soon... But let's dive into the how and why, shall we?

why

Sometimes blog posts can be verbose, or memex entries can get technical. HTML is a very powerful markup language for displaying various types of information, but it's a little verbose. When I'm in the zone and writing in a constant stream from my brain to my fingertips, I'd prefer to write as little boilerplate as possible. So I'd been thinking for a while of implementing a variation of runic in eureka, but I hadn't dedicated the time and thought to make it happen. Now, what I came up with isn't runic, but it's similar and serves the same purpose. It reduces the number of characters I have to write to achieve the formatting I'm after.

So, I'm able to type this:

{&This is a paragraph of text with a {*/|link} and {@some bold text}}

{,
  {-this}
  {-is}
  {-an}
  {-unordered}
  {-list}
}

{!this is a heading}

{&with some content}

{.and a subheading}

{#
  {-and}
  {-an}
  {-ordered}
  {-list}
}

Whereas before I'd have to type it out in full:

<p>This is a paragraph of text with a {*/|link} and <b>some bold text</b></p>

<ul>
  <li>this</li>
  <li>is</li>
  <li>an</li>
  <li>unordered</li>
  <li>list</li>
</ul>

<h3>this is a heading</h3>

<p>with some content</p>

<h4>and a subheading</h4>

<ol>
  <li>and</li>
  <li>an</li>
  <li>ordered</li>
  <li>list</li>
</ol>

In the end they'd render the same thing (between the horizontal separators here):

----

This is a paragraph of text with a link and some bold text

- this
- is
- an
- unordered
- list

this is a heading

with some content

and a subheading

1. and
2. an
3. ordered
4. list

----

how

I couldn't directly replicate runic without a big change in the way things worked under the hood, because runic works on lines, whereas eureka works on text streams delimited by curly brackets.

Most of these extensions to the basic curly-brackets markup from 100r.co were straightforward. It was just a matter of capturing the first character between the curly brackets (as was done with the forward slash originally and characters like the colon, asterisk, and question mark in my early extensions) and then calling a function to write the proper HTML tag before and after the remaining content inside the brackets.

int fptemplate(FILE* f, Lexicon* l, char* s) {
  int target = 0;
  switch (s[0]) {
    case '/':
      return fpportal(f, l, s + 1, 1);
    case '*':
      return fphref(f, s + 1);
    case ':':
      return fpimg(f, s + 1);
    case '?':
      return fphimg(f, s + 1);
    case '_':
      return fpaudio(f, s + 1);
    case '`':
      return fpcode(f, s + 1);
    case '~':
      return fpitalic(f, s + 1);
    case '>':
      return fpblock(f, s + 1);
    case '\'':
      return fpquote(f, s + 1);
    case '$':
      return fppre(f, s + 1);
    case '@':
      return fpbold(f, s + 1);
    case '\\':
      return fpstrike(f, s + 1);
    case '!':
      return fph3(f, s + 1);
    case '.':
  
...
  }
...
}

int fpstrike(FILE* f, char* s) {
  fputs("<s>", f);
  fputs(s, f);
  fputs("</s>", f);
  return 1;
}

int fppre(FILE* f, char* s) {
  fputs("<pre>", f);
  fputs(s, f);
  fputs("</pre>", f);
  return 1;
}

int fpquote(FILE* f, char* s) {
  fputs("<q>", f);
  fputs(s, f);
  fputs("</q>", f);
  return 1;
}
...
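
Since these low-level wrappers all share one shape, one possible variation is to keep the rune-to-tag pairs in a table and route them through a single helper. This is only a sketch of that idea, not how eureka does it: `SimpleTag`, `simple_tags`, and `fpwrap` are hypothetical names, and the table lists only the four pairs that appear in this post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical table entry: a rune and the HTML tag name it maps to. */
typedef struct {
  char rune;
  const char* tag;
} SimpleTag;

/* Only pairs that appear in this post; the real switch handles more. */
static const SimpleTag simple_tags[] = {
  {'\\', "s"},
  {'$', "pre"},
  {'\'', "q"},
  {'@', "b"},
};

/* Writes <tag>body</tag> for a known rune, where s[0] is the rune.
   Returns 1 on success, 0 if the rune is not in the table. */
int fpwrap(FILE* f, const char* s) {
  size_t i;
  assert(f != NULL && s != NULL);
  for (i = 0; i < sizeof(simple_tags) / sizeof(simple_tags[0]); i++) {
    if (simple_tags[i].rune == s[0]) {
      fprintf(f, "<%s>", simple_tags[i].tag);
      fputs(s + 1, f);
      fprintf(f, "</%s>", simple_tags[i].tag);
      return 1;
    }
  }
  return 0;
}
```

With something like this, the `default:` branch of the switch could call `fpwrap(f, s)`, and adding a new copy-paste tag would be a one-line table change. The trade-off is that the explicit switch keeps every rune visible in one place, which may be worth more than the saved lines.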

But for things like paragraphs and lists, where there may be other markup inside of them, I had to do a little more than just copy-paste the text into the tag's text content.

The first task is to count the opening and closing brackets we encounter when reading a file, so we can keep track of when we close the outermost set. Then, in addition to placing HTML tags around the appropriate text, we have to recursively process that text to take care of any markup that may be inside of it:

int fppara(FILE* f, Lexicon* l, char* s) {
  fputs("<p>", f);
  fpmetatemplate(f, l, s);
  fputs("</p>", f);
  return 1;
}

int fpul(FILE* f, Lexicon* l, char* s) {
  fputs("<ul>", f);
  fpmetatemplate(f, l, s);
  fputs("</ul>", f);
  return 1;
}

int fpol(FILE* f, Lexicon* l, char* s) {
  fputs("<ol>", f);
  fpmetatemplate(f, l, s);
  fputs("</ol>", f);
  return 1;
}

int fpli(FILE* f, Lexicon* l, char* s) {
  fputs("<li>", f);
  fpmetatemplate(f, l, s);
  fputs("</li>", f);
  return 1;
}

int fptemplate(FILE* f, Lexicon* l, char* s) {
  int target = 0;
  switch (s[0]) {
    ...
    case '#':
      return fpol(f, l, s + 1);
    case ',':
      return fpul(f, l, s + 1);
    case '-':
      return fpli(f, l, s + 1);
    case '&':
      return fppara(f, l, s + 1);
  }
  ...
}

int fpmetatemplate(FILE* f, Lexicon* l, char* s) {
  int bopen, bclose;
  char ss[TAG_BODY_SIZE];
  unsigned char t = 0;
  bopen = 0;
  bclose = 0;

  while (*s) {
    if (*s == '}') {
      bclose++;
      if (bopen == bclose) {
        t = 0;
        bopen = 0;
        bclose = 0;
        s++;
        if (!fptemplate(f, l, ss)) {
          return 0;
        }
        continue;
      }
    }
    if (*s == '{') {
      bopen++;
      if (bopen == 1) {
        ss[0] = 0;
        t = 1;
        s++;
        continue;
      }
    }
    /* check the tag buffer (ss), not the remaining input (s) */
    if (t && slen(ss) >= TAG_BODY_SIZE)
      return error("Templating error", "text block exceeds tag body size");
    if (t)
      ccat(ss, *s);
    else
      fprintf(f, "%c", *s);
    s++;
  }
  return 1;
}

int fpinject(FILE* f, Lexicon* l, char* filepath) {
  FILE* inc;
  int bopen, bclose;
  int c; /* int rather than char, so the comparison with EOF is reliable */
  char s[TAG_BODY_SIZE];
  unsigned char t = 0;
  /*fprintf(stderr, "Building: %s\n", filepath);*/
  bopen = 0;
  bclose = 0;
  scsw(filepath, ' ', '_');
  if (!(inc = fopen(filepath, "r")))
    return error("Missing include", filepath);
  s[0] = 0;
  while ((c = fgetc(inc)) != EOF) {
    if (c == '}') {
      bclose++;
      if (bopen == bclose) {
        t = 0;
        bopen = 0;
        bclose = 0;
        if (!fptemplate(f, l, s)) {
          return 0;
        }
        continue;
      }
    }
    if (c == '{') {
      bopen++;
      if (bopen == 1) {
        s[0] = 0;
        t = 1;
        continue;
      }
    }
    if (slen(s) >= TAG_BODY_SIZE)
      return error("Templating error", filepath);
    if (t)
      ccat(s, c);
    else
      fprintf(f, "%c", c);
  }
  fclose(inc);
  return 1;
}

As an aside, I'm aware that some redundancy could be factored out of fpinject and fpmetatemplate, but I'm still grappling with the best way to go about it.
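
One possible way to factor out that redundancy is to put the bracket-counting loop in a single function and pass in the character source as a callback, so the same loop can read from a file or from a string. The sketch below is an illustration under assumptions, not eureka's code: `fpscan`, `NextFn`, `TagFn`, `next_from_file`, `next_from_string`, and `demo_tag` are hypothetical names, `TAG_BODY_SIZE` is given an assumed value, the `Lexicon` argument is left out, and the toy dispatcher handles only `&` and `@`. It also uses a single depth counter in place of `bopen` and `bclose`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAG_BODY_SIZE 1024 /* assumed value; eureka defines its own */

/* Hypothetical character source: returns the next char or EOF. */
typedef int (*NextFn)(void* ctx);
/* Hypothetical callback invoked with the body of each outermost {...}. */
typedef int (*TagFn)(FILE* f, const char* body);

int next_from_file(void* ctx) { return fgetc((FILE*)ctx); }

int next_from_string(void* ctx) {
  const char** p = (const char**)ctx;
  if (**p == '\0') return EOF;
  return (unsigned char)*(*p)++;
}

/* Shared loop: copies text outside brackets to f and hands each outermost
   bracketed body to on_tag. Returns 1 on success, 0 on error. */
int fpscan(FILE* f, NextFn next, void* ctx, TagFn on_tag) {
  char body[TAG_BODY_SIZE];
  size_t len = 0;
  int depth = 0; /* single counter instead of bopen/bclose */
  int c;
  while ((c = next(ctx)) != EOF) {
    if (c == '{' && depth++ == 0) {
      len = 0; /* outermost opening bracket: start a new body */
      continue;
    }
    if (c == '}' && depth > 0 && --depth == 0) {
      body[len] = '\0'; /* outermost closing bracket: dispatch the body */
      if (!on_tag(f, body)) return 0;
      continue;
    }
    if (depth == 0) {
      fputc(c, f); /* plain text (or a stray '}') is copied verbatim */
    } else {
      if (len + 1 >= TAG_BODY_SIZE) return 0; /* body too long */
      body[len++] = (char)c;
    }
  }
  return depth == 0; /* leftover depth means an unclosed bracket */
}

/* Toy dispatcher for the demo: '&' recurses, '@' copies its text. */
int demo_tag(FILE* f, const char* body) {
  const char* rest = body + 1;
  switch (body[0]) {
    case '&':
      fputs("<p>", f);
      if (!fpscan(f, next_from_string, &rest, demo_tag)) return 0;
      fputs("</p>", f);
      return 1;
    case '@':
      fprintf(f, "<b>%s</b>", rest);
      return 1;
    default:
      return 0;
  }
}
```

In this arrangement, fpinject would shrink to opening the file and calling `fpscan(f, next_from_file, inc, ...)`, and fpmetatemplate to a call with `next_from_string`. Note that the sketch differs from the listings above in two ways: it reports an unclosed bracket at end of input as an error, and it copies a stray closing bracket through without affecting later tags.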

With some minor exceptions due to typos in my original HTML, I was able to migrate all my content to the new markup with a sed script. It didn't take more than 15 minutes between writing the script, running it, and finding all my typos.

side-effects

Not only does this speed things up quite a bit when writing content for my website for the obvious reasons, it also has some additional effects that I didn't foresee originally.

The first side-effect is that things are actually more readable. The runes in this markup language are all non-alphanumeric, so it's easy to visually tune them out. Comparatively, in HTML, even with syntax highlighting, the tags can easily blend in with the content and make it a pain to read, especially when it starts to get dense.

The second side-effect is that in low-level tags (those where eureka just copy-pastes the text content between the tags), I don't have to escape my curly brackets anymore -- as long as I close all of them. If the number of opening brackets doesn't match the number of closing brackets, the algorithm won't know when to close the tag. But when they do match, I don't have to escape with HTML entity codes (like &#123; for a left curly bracket and &#125; for a right curly bracket)! This makes it significantly easier and faster to write (or copy-paste) code snippets inside of a <pre> tag.

Last, but certainly not least of the unexpected bonuses of this setup, is that this comes with a few different types of error-checking! Before, I could have errors in my HTML code go unnoticed for long periods of time because of the laxity of browsers' HTML parsers -- they'd just deal with my error and display a web page that looked pretty much correct. (Alright, perhaps I could avoid this with some scripting to check for correct HTML on build... but I'd rather stay lean.) Now, since everything is basically just a matter of matching curly brackets together, if I make an error in my markup one of three things happens:

a portion of the page simply won't display because the closing bracket wasn't found to trigger templating to occur
eureka crashes because the closing bracket wasn't found and the entire rest of the page was loaded into the template buffer and it overflowed
a portion of the page isn't templated and is instead copied verbatim from the source because there aren't enough opening brackets

Therefore, a visual inspection of the build's console output as well as the actual rendered page makes it clear whether there was a formatting error or not.
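
If a visual check ever stops being enough, the same bracket counting could double as a tiny standalone lint pass. The following is a sketch of that idea rather than part of eureka: `check_brackets` and the `BRACKETS_*` codes are hypothetical names, and the function reports only the first problem it finds, along with a line number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum { BRACKETS_OK = 0, BRACKETS_UNCLOSED = 1, BRACKETS_STRAY_CLOSE = 2 };

/* Scans src once, tracking nesting depth. On a problem, stores the 1-based
   line number of the offending bracket in *line (if line is not NULL). */
int check_brackets(const char* src, int* line) {
  int depth = 0;
  int cur = 1;
  int open_line = 0; /* line of the outermost still-open bracket */
  assert(src != NULL);
  for (; *src; src++) {
    if (*src == '\n') {
      cur++;
    } else if (*src == '{') {
      if (depth++ == 0) open_line = cur;
    } else if (*src == '}') {
      if (depth == 0) { /* closing bracket with nothing open */
        if (line) *line = cur;
        return BRACKETS_STRAY_CLOSE;
      }
      depth--;
    }
  }
  if (depth > 0) { /* ran out of input inside a tag */
    if (line) *line = open_line;
    return BRACKETS_UNCLOSED;
  }
  return BRACKETS_OK;
}
```

An unclosed bracket corresponds to the first two failure modes in the list above, and a stray closing bracket to the third. Because the check only counts brackets, a code snippet with balanced braces inside a `$` tag passes, which matches the escaping behaviour described earlier.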

onward to victory

All this lets me write more, more quickly, and with less cruft. I was able to bang this post out rather quickly, and I look forward to continuing to write content for my digital garden using this system!