How to Add a Dynamic Table of Contents to Static HTML in React with Rehype

It’s common to see content managed in a headless CMS available as static HTML, which in the land of JavaScript frameworks can seem somewhat limiting. How can we process and transform that HTML ourselves with tools like rehype, providing a richer experience for our visitors?

Why do we want to process HTML?

Often when we’re dealing with data and content that’s managed outside of our application (think headless CMS or APIs), we have access to that data, but we get it in a raw format.

This is particularly the case with headless WordPress, whether through the REST API or tools like WPGraphQL (which inspired this post). Both let us grab our content as a raw HTML string, but then we’re left simply dumping it into our app.

If we want to add any enhancements, we need to take things into our own hands and process that HTML programmatically, which is where rehype comes in.

What is rehype?

rehype is an HTML processor that allows developers to parse HTML into syntax trees, work with those trees, and then turn them back into an HTML string.

While that’s a lot to take in, ultimately it gives us a pipeline where we can take some HTML, turn it into a format where we can programmatically make changes to it, then turn it back into the original format that the application expects.
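To make “syntax tree” more concrete, here is a hand-written approximation of the tree for a small HTML fragment, in the hast format rehype uses. It comes with a deliberately tiny serializer. The toHtml function is a hypothetical stand-in that illustrates what rehype-stringify does. It is not that package’s implementation: it handles only root, element, and text nodes. Real rehype-parse output also includes position data, which is omitted here.

```javascript
// Hand-written approximation of the hast tree for the fragment
// '<h2>Hello <em>world</em></h2>' (position data omitted).
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h2',
      properties: {},
      children: [
        { type: 'text', value: 'Hello ' },
        {
          type: 'element',
          tagName: 'em',
          properties: {},
          children: [{ type: 'text', value: 'world' }],
        },
      ],
    },
  ],
};

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical, simplified serializer: root, element, and text nodes only
// (no void elements, comments, or doctypes).
function toHtml(node) {
  if (node.type === 'text') return escapeHtml(node.value);
  const inner = (node.children || []).map(toHtml).join('');
  if (node.type === 'root') return inner;
  const attrs = Object.entries(node.properties || {})
    .map(([key, value]) => {
      // hast stores the class attribute as a `className` array.
      const name = key === 'className' ? 'class' : key;
      const text = Array.isArray(value) ? value.join(' ') : value;
      return ` ${name}="${escapeHtml(text)}"`;
    })
    .join('');
  return `<${node.tagName}${attrs}>${inner}</${node.tagName}>`;
}

console.log(toHtml(tree)); // <h2>Hello <em>world</em></h2>
```

If you change a node before serializing, for example by setting tree.children[0].properties.id, the change shows up in the output HTML. That is the whole idea behind the pipeline we’re about to build.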

It’s part of the unified collective, which is a system for working with syntax trees. unified is a lot to try to cover in this tutorial, but ultimately it’s the core that tools like rehype are built on.

Tip: If you’ve ever heard of remark, rehype is quite similar, except remark works with Markdown instead of HTML!

What are we going to build?

For this tutorial, we’re going to learn how we can turn HTML content into a syntax tree using rehype, do some processing, then turn it back into HTML to add to our application.

Specifically, we’ll learn how to do a few things:

  • Add dynamic IDs to all H2 headers in the content
  • Add a dynamic table of contents with all of the H2 headers
  • Add floating anchor bookmark links to headers, like the ones on GitHub

In order to easily dive in, I set up an example Next.js Starter that we’ll work from in this tutorial. The starter simply has some static HTML that I scraped from my own spacejelly.dev content API which will allow us to easily tap into the HTML like we would if it were a typical WordPress blog.

The steps should be easy to follow along as long as you have access to the HTML string, so whether you use the Starter like we will here, a new Next.js WordPress Starter site, or your own app, you should be good to go!

Step 0: Starting a new React app with a Next.js demo project

Let’s get started by creating our application!

We’re going to use this demo application which includes an example of static HTML content being pulled dynamically into a Next.js app.

To get that up and running, in your terminal run:

yarn create next-app my-rehype-html -e https://github.com/colbyfayock/demo-html-posts-starter
# or
npx create-next-app my-rehype-html -e https://github.com/colbyfayock/demo-html-posts-starter

This clones the starter project and installs all of its dependencies.

Note: feel free to change my-rehype-html to the directory and project name of your choice!

Once everything is installed, navigate to that new directory:

cd my-rehype-html

Then, start up the new project by running:

yarn dev
# or
npm run dev

This starts a local development server at http://localhost:3000, where you can now access your new Next.js app!

[Screenshot: New Next.js app showing example content from spacejelly.dev]

Before we jump into Step 1, take a second to look around the application.

You don’t have to have an in-depth understanding of what’s happening, but essentially you want to know that we have a file located at data/posts.json, which contains our post data.

Our homepage (src/pages/index.js) and each of our post routes (src/pages/posts/[postSlug].js) pull in that content in order to list the posts, create a new page for each post, and show that post content on that page.

We’ll be primarily working in [postSlug].js to take the HTML content and process it with rehype!

Step 1: Parsing an HTML string into a syntax tree with unified and rehype

Getting started, we’re going to first learn how we can take our existing HTML content, turn it into a syntax tree, then turn it back to use in our application.

We’ll also see what that syntax tree looks like and how we’ll transform it later.

To start, we need a couple of packages which we’ll use to work with our HTML. In your terminal, run:

yarn add unified rehype-parse rehype-stringify
# or
npm install unified rehype-parse rehype-stringify

Here’s what we’re installing:

  • unified: the core package that provides the pipeline and tools that rehype uses to work with the syntax tree
  • rehype-parse: rehype plugin that will parse our HTML string into a syntax tree
  • rehype-stringify: rehype plugin that will take a syntax tree and turn it into an HTML string

Next, open up src/pages/posts/[postSlug].js and import our new packages at the top:

import { unified } from 'unified';
import rehypeParse from 'rehype-parse';
import rehypeStringify from 'rehype-stringify';

Now to actually use these tools, we need to start a new processing pipeline with unified.

For our walkthrough, we’re going to do this inside of the Post component so that we can use console.log and easily inspect the output in the browser. In a real app, it likely makes more sense to do this work inside a data fetching method like getStaticProps, so it happens at build time instead of in the browser.

So to start our pipeline, at the top of the Post component, add:

const content = unified()
  .use(rehypeParse, {
    fragment: true,
  })
  .use(() => {
    return (tree) => {
      console.log('tree', tree);
    }
  })
  .use(rehypeStringify)
  .processSync(post.content)
  .toString();

Here’s what we’re doing:

  • Create a unified pipeline
  • Use our rehype parse plugin to create a syntax tree
  • Pass in the fragment value of true because we’re not passing in a full HTML document, just a piece of HTML
  • “Use” a new function that, for now, simply logs out the tree argument; we’ll use this soon to work with our HTML
  • Use the rehype stringify plugin to turn our syntax tree back into an HTML string
  • Run that pipeline synchronously on our post’s content
  • And finally convert the processed result into an HTML string

Then update the app to use our new content variable:

<div className={styles.content} dangerouslySetInnerHTML={{
  __html: content
}} />

If we open up our browser and navigate to one of our posts, we should now be able to open up our web console and see our tree in its entirety!

[Screenshot: Web console showing the syntax tree generated from the HTML content]

While we could work with that tree “as is”, walking through it by hand would be tedious. So instead, we’ll next use the unist-util-visit package to “visit” each node.

Step 2: Programmatically working with DOM nodes in a syntax tree with unist-util-visit

In order to work with each of our nodes, we’re going to use a utility function that helps us find the nodes we need to work with.

To do this, we’re going to use unist-util-visit, a common utility for unist, the syntax tree format that unified is built around. We’ll use it to find the appropriate element nodes and run code on them.

To start we need to add our dependency:

yarn add unist-util-visit
# or
npm install unist-util-visit

Next, let’s add a new import at the top of the page:

import { visit } from 'unist-util-visit';

Then, let’s update the use statement where we’re currently adding a console log:

.use(() => {
  return (tree) => {
    visit(tree, 'element', function (node) {
      console.log('node', node)
    });
    return;
  };
})

Here we’re using the visit function and passing it three arguments. The first is our syntax tree. The second is element, meaning we only want element nodes rather than text nodes. The last is a callback function, where we’re adding a new console log to see each node.

If we reload the page again, we should now see in our web console all of the elements on the page, including paragraphs and headers.

[Screenshot: Element nodes parsed from the HTML content, logged in the web console]
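Conceptually, visit(tree, 'element', callback) is a depth-first walk that calls your function for every node of the matching type. The visitByType function below is a hypothetical, simplified stand-in written only to illustrate that idea. The real unist-util-visit also supports other kinds of tests, skipping or exiting early, and reverse traversal.

```javascript
// Hypothetical, simplified stand-in for unist-util-visit: walks the tree
// depth-first (parents before children) and calls
// callback(node, index, parent) for every node whose type matches.
function visitByType(tree, type, callback) {
  function walk(node, index, parent) {
    if (node.type === type) callback(node, index, parent);
    (node.children || []).forEach((child, i) => walk(child, i, node));
  }
  walk(tree, null, null);
}

const tree = {
  type: 'root',
  children: [
    { type: 'element', tagName: 'p', properties: {}, children: [{ type: 'text', value: 'Intro' }] },
    { type: 'element', tagName: 'h2', properties: {}, children: [{ type: 'text', value: 'Getting Started' }] },
  ],
};

// Collect the tag name of every element node, in document order.
const tagNames = [];
visitByType(tree, 'element', (node) => tagNames.push(node.tagName));
console.log(tagNames); // [ 'p', 'h2' ]
```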

For our purposes though, we only want to work with the h2 headers, so we can easily add an if statement since we can see exactly what each node is:

.use(() => {
  return (tree) => {
    visit(tree, 'element', function (node) {
      if ( node.tagName === 'h2' ) {
        console.log('node', node)
      }
    });
  };
})

Tip: the second argument to visit is a “test”. It can be a node type string like 'element', an object of properties to match, or a function, so you could move the check for an h2 (or any other header) into the test itself!

And now, we can see that we’re only logging the h2 headers!

[Screenshot: Web console log showing only the h2 headers]
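Following the tip about visit’s test argument, a test function is just a predicate that receives a node and returns true or false. Here is a sketch of one. headingTest and isH2 are hypothetical names. With the real package, you would pass isH2 as the second argument, as in visit(tree, isH2, (node) => { ... }).

```javascript
// Hypothetical factory: returns a test function that matches hast
// element nodes h1-h6 whose heading level is listed in `levels`.
function headingTest(levels) {
  return (node) => {
    if (!node || node.type !== 'element') return false;
    const match = /^h([1-6])$/.exec(node.tagName);
    return match !== null && levels.includes(Number(match[1]));
  };
}

// The h2-only test used in this tutorial.
const isH2 = headingTest([2]);

console.log(isH2({ type: 'element', tagName: 'h2', properties: {}, children: [] })); // true
console.log(isH2({ type: 'element', tagName: 'h3', properties: {}, children: [] })); // false
```

Building the test with a factory keeps the returned predicate to a single node argument. visit also passes index and parent to tests, so a function with an extra optional parameter could accidentally receive the index instead.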

In the next step, we’ll learn how we can dynamically create IDs and apply them to each of our header nodes.

Step 3: Creating and adding dynamic IDs for content headers

Now that we can programmatically access each of our headers, we want to actually perform some work on them.

Our goal will be to generate a unique ID for each of our headers.

Typically this is done by using the content of the header. For instance, if we have a header “My Cool Header”, the ID may look like my-cool-header.

Now we could write this logic ourselves, and you can if you’d prefer! But an important part of generating IDs is doing it consistently and handling unusual characters that could otherwise break the links we’ll build with those IDs later.

So to do this, we’re going to use another package called parameterize, which is a tool that will take a string and turn it into a safe parameter, removing all special characters.

Tip: parameterize was originally from Ruby on Rails!
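To get a feel for what that transformation does, here is a minimal, hand-rolled slugify. It is an approximation written for illustration, not parameterize’s implementation. It only keeps ASCII letters, digits, spaces, and hyphens, so accented or non-Latin characters are simply dropped. That limitation is one reason to reach for a dedicated package instead.

```javascript
// Hypothetical, simplified slug function in the spirit of parameterize:
// lowercase, drop anything that isn't an ASCII letter, digit, space, or
// hyphen, then join the remaining words with single hyphens.
function slugify(text) {
  return String(text)
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '') // remove punctuation like "." and "?"
    .trim()
    .replace(/[\s-]+/g, '-'); // collapse whitespace/hyphen runs into one "-"
}

console.log(slugify('My Cool Header')); // my-cool-header
console.log(slugify('How does Next.js use favicons?')); // how-does-nextjs-use-favicons
```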

To get started, we can install our dependency:

yarn add parameterize
# or
npm install parameterize

Like our other dependencies, we now want to import it at the top of our page src/pages/posts/[postSlug].js:

import parameterize from 'parameterize';

Next, we can get started creating our ID.

Inside our visit block, we want to find the text value inside of our h2 header node, which we can find in the node’s children property.

Particularly, we’ll look for the first item in our children array and grab its value and pass it in as an argument to parameterize:

visit(tree, 'element', function (node) {
  if ( node.tagName === 'h2' ) {
    const id = parameterize(node.children[0].value);
  }
});
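One caveat with node.children[0].value: it assumes the header’s first child is a plain text node. If a header starts with markup, such as <h2><code>visit</code> basics</h2>, the first child is an element with no value. In that case, parameterize would not receive the header text. A small helper that joins all of the text inside the node avoids that problem. getText below is hypothetical, and the hast-util-to-string package offers a ready-made version of the same idea.

```javascript
// Hypothetical helper: concatenates every text node under `node`, so a
// header like <h2><code>visit</code> basics</h2> still yields its full text.
function getText(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(getText).join('');
}

const header = {
  type: 'element',
  tagName: 'h2',
  properties: {},
  children: [
    { type: 'element', tagName: 'code', properties: {}, children: [{ type: 'text', value: 'visit' }] },
    { type: 'text', value: ' basics' },
  ],
};

console.log(header.children[0].value); // undefined (the first child is an element)
console.log(getText(header)); // visit basics
```

Inside the visit callback, that would look like const id = parameterize(getText(node));.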

Then, we can apply that ID to our header node!

visit(tree, 'element', function (node) {
  if ( node.tagName === 'h2' ) {
    const id = parameterize(node.children[0].value);
    node.properties.id = id;
  }
});

Now, if we inspect our headers, we should see that each of the h2 nodes has an ID!

[Screenshot: H2 header with a unique, dynamically generated ID]

You can even see that if you go to the page’s URL with the ID in the URL, it will jump down to that section! For example:

http://localhost:3000/posts/how-to-add-custom-dynamic-favicons-in-react-next-js#how-does-nextjs-use-favicons
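parameterize turns text into a slug, but it doesn’t know about the other headers on the page. Two headers with the same text would end up with the same ID, and links would only ever reach the first one. If your content can repeat header text, one option (not part of the original steps) is to remember the IDs already used in the post and append a counter, similar to how GitHub numbers duplicate header anchors. createUniqueIdGenerator is a hypothetical helper.

```javascript
// Hypothetical helper: returns a function that hands back `slug` if it's
// unused, otherwise the first free `slug-1`, `slug-2`, ... so every ID it
// returns is unique within one document.
function createUniqueIdGenerator() {
  const used = new Set();
  return (slug) => {
    let id = slug;
    let n = 1;
    // Keep bumping the suffix until we find an ID we haven't handed out.
    while (used.has(id)) {
      id = `${slug}-${n}`;
      n += 1;
    }
    used.add(id);
    return id;
  };
}

const uniqueId = createUniqueIdGenerator();
console.log(uniqueId('setup')); // setup
console.log(uniqueId('usage')); // usage
console.log(uniqueId('setup')); // setup-1
```

You would create the generator inside the transformer, so the counts reset for every post, and then wrap the slug: const id = uniqueId(parameterize(node.children[0].value));.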

Next, we’ll dynamically create a table of contents and take advantage of those IDs.

Step 4: Collecting headers to dynamically create a table of contents

Our IDs aren’t much use if people don’t know they can use them. So we’ll create a table of contents that makes it easy for people to jump down to the section they want to read.

To start, we want to collect all of the headers we’re using so that we can later render them to the page, so let’s create an array where we can store these values.

Above our unified pipeline, add:

const toc = [];

Then we want to add each header to that array.

After we add our ID, let’s add our header to the array:

visit(tree, 'element', function (node) {
  if ( node.tagName === 'h2' ) {
    const id = parameterize(node.children[0].value);
    node.properties.id = id;

    toc.push({
      id,
      title: node.children[0].value,
    });
  }
});

Here we’re storing the ID we generated as the item’s id (which we’ll use as the link target) and the text value of the header as the item’s title.

If we add a console log, we should be able to see our list of headers!

[Screenshot: Web console showing the array of header titles and IDs collected from the content]
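For a sense of what ends up in that array, here are Steps 3 and 4 condensed into one plain function over a small hand-written tree. It is a sketch under two assumptions. First, the recursive walk stands in for unist-util-visit. Second, the slug function is passed in as a parameter; in the tutorial, parameterize plays that role.

```javascript
// Sketch of Steps 3 and 4 combined: give each h2 an ID and collect
// { id, title } entries for the table of contents.
function addIdsAndCollectToc(tree, slugify) {
  const toc = [];
  function walk(node) {
    if (node.type === 'element' && node.tagName === 'h2') {
      const title = node.children[0].value;
      const id = slugify(title);
      node.properties.id = id; // mutates the tree, like the visitor does
      toc.push({ id, title });
    }
    (node.children || []).forEach(walk);
  }
  walk(tree);
  return toc;
}

// Minimal example input and slug function (both hypothetical).
const simpleSlug = (text) => text.toLowerCase().replace(/[^a-z0-9]+/g, '-');
const tree = {
  type: 'root',
  children: [
    { type: 'element', tagName: 'h2', properties: {}, children: [{ type: 'text', value: 'What is rehype' }] },
    { type: 'element', tagName: 'p', properties: {}, children: [{ type: 'text', value: 'Body text' }] },
    { type: 'element', tagName: 'h2', properties: {}, children: [{ type: 'text', value: 'Getting started' }] },
  ],
};

const toc = addIdsAndCollectToc(tree, simpleSlug);
console.log(toc);
// toc holds two entries, in document order:
// { id: 'what-is-rehype', title: 'What is rehype' }
// { id: 'getting-started', title: 'Getting started' }
```

Because the walk is depth-first, the entries come out in the same order the headers appear in the post, which is the order we want for a table of contents.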

So now, let’s use them to add a table of contents above our content.

Right below our page’s H1 add:

<ul>
  {toc.map(({ id, title}) => {
    return (
      <li key={id}>
        <a href={`#${id}`}>
          { title }
        </a>
      </li>
    )
  })}
</ul>

And now we should see our list of headers, each of which jumps down to the relevant section when clicked!

[Screenshot: Post table of contents generated dynamically from the headers]

Step 5: Adding floating anchor links to headers

Now finally, if you’ve spent time on GitHub, you’re likely familiar with the little link icon that appears when you hover over a header. Clicking it updates the browser’s URL with a link to that section, making it easy to copy.

If we look at their HTML, we can see that instead of putting the ID on the header itself, they add an additional link node, which holds the ID and floats off to the side.

[Screenshot: GitHub's floating header anchor link bookmarks]

So to do this, we’re going to take advantage of the same children property that we used to find the text of the header and inject our own node.

Inside of the same visit function we’ve been working with, underneath where we’re adding our header to the table of contents, let’s add our anchor link:

node.children.unshift({
  type: 'element',
  tagName: 'a',
  properties: {
    href: `#${id}`,
    className: [styles.anchor],
    ariaHidden: 'true'
  },
  children: []
});

We’re using the unshift method, which adds our new anchor element to the beginning of the children array, so it comes before the header text.

We’re also adding a few properties here:

  • Our href which includes a “#” so that it works as a jump link
  • A class using our styles object. This might be different if you’re not following along with the Next.js project in this tutorial, in which case you may want to pass a string as a class name
  • And finally we’re setting aria-hidden to true, since the link doesn’t offer anything to screen reader users beyond bookmarking in a browser (similar to what GitHub does)
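To see what the unshift does to the header node, here is the same idea wrapped in a hypothetical prependAnchor helper and applied to a hand-written h2. It uses hast’s property names (className as an array, and ariaHidden), which rehype-stringify serializes as the class and aria-hidden attributes. It also gives the link an empty children array, since the visible “#” comes from CSS.

```javascript
// Hypothetical helper mirroring the Step 5 snippet: builds the hast element
// for a floating anchor link and puts it first in the header's children.
function prependAnchor(header, anchorClass) {
  const id = header.properties.id;
  header.children.unshift({
    type: 'element',
    tagName: 'a',
    properties: { href: `#${id}`, className: [anchorClass], ariaHidden: 'true' },
    children: [], // empty: the visible "#" is added by a CSS :before rule
  });
  return header;
}

const header = {
  type: 'element',
  tagName: 'h2',
  properties: { id: 'getting-started' },
  children: [{ type: 'text', value: 'Getting Started' }],
};

prependAnchor(header, 'anchor');
console.log(header.children.map((child) => child.tagName || child.type)); // [ 'a', 'text' ]
```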

Note: GitHub puts an SVG icon inside its anchor tag. You can do that too, but I’m avoiding it here because we’d have to build a large, complicated node by hand, the same way we built our anchor tag.

Now if you open this up in the browser, you should see the anchor tag in the page’s HTML. Nothing shows up on the page, though, because the tag has no content inside, so let’s add some with CSS.

If you’re following along, open up src/styles/Home.module.css and add the following:

.header:hover .anchor {
  visibility: visible;
}

.anchor {
  float: left;
  margin-left: -30px;
  padding-right: 8px;
}

.anchor:before {
  visibility: hidden;
  color: gray;
  content: '#'
}

.header:hover .anchor:before {
  visibility: visible;
}

In the above, we’re saying that we want to float our anchor element to the left side of our header (again similar to how GitHub does it). We’re also making it invisible to start, but when we hover over our header element, we want it to show.

Now currently we aren’t applying that header class anywhere, so before we look in the browser, let’s add that.

Under the line where we’re adding the ID to our H2 header node from Step 3, add:

node.properties.className = [...(node.properties.className || []), styles.header];

rehype-parse stores an element’s existing classes as an array in properties.className (not as a class string), so we append our header class to that array. This keeps any original classes while adding our header styles.

Now when we open up our browser and hover over our headers, we see our bookmark link!

[Screenshot: Hashtag bookmark link on a header, pointing to its ID]

If you click that little hashtag, you’ll even see the URL update to include that ID, making it easy to copy and share or simply save your place.

What else can we do?

Move the unified pipeline out of the component

Like I mentioned earlier, leaving this logic in the React component probably isn’t the most efficient way of handling this. It will make this processing happen every time the component renders, which doesn’t make sense unless your content is dynamically changing (usually it’s not changing after coming from the source).

Instead we can use data fetching methods like getStaticProps or getServerSideProps and change the data before passing it into the component.
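Here is a rough sketch of that shape, written as plain JavaScript so it runs on its own. The posts array and processContent function are hypothetical stand-ins. In the starter, posts would come from data/posts.json, and processContent would wrap the unified pipeline from the earlier steps, returning both the HTML string and the collected toc. The function itself would be exported from src/pages/posts/[postSlug].js, and the route parameter name postSlug is assumed from that file name.

```javascript
// Hypothetical stand-in for the data in data/posts.json.
const posts = [
  { slug: 'hello-world', title: 'Hello World', content: '<h2>Intro</h2><p>Hi!</p>' },
];

// Hypothetical stand-in for the unified pipeline from Steps 1-5. A real
// version would run unified() with rehype-parse, the header plugin, and
// rehype-stringify, returning the processed HTML and the collected toc.
function processContent(html) {
  return { content: html, toc: [] };
}

// In the page file this would be `export async function getStaticProps`.
// It runs at build time, so the browser never does the processing.
async function getStaticProps({ params }) {
  const post = posts.find(({ slug }) => slug === params.postSlug);
  if (!post) {
    return { notFound: true };
  }

  const { content, toc } = processContent(post.content);

  return {
    props: {
      post: { ...post, content },
      toc, // props must be JSON-serializable; an array of { id, title } is
    },
  };
}
```

The Post component would then receive toc as a prop alongside post and render the list exactly as in Step 4, with no unified code left in the component.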

Explore more ways to dynamically transform content

Just because we’re getting our HTML statically doesn’t mean we can’t do anything special with it.

For instance, if we wanted to add fancy code blocks with little copy links, we could take advantage of the same rehype ecosystem with the rehype-react plugin.