GitHub as a CMS

First things first. I really love Ghost. It has excellent Content and Admin APIs, and I'd honestly consider it better than many other headless CMS options. Second, this isn't a post about Ghost. There were two major shortcomings in Ghost that I ultimately couldn't find good workarounds for.

Custom structured data. As it stood, I was jamming everything into codeinjection_foot as YAML and hoping I'd never need that field for anything else. I also needed to comment my code to explain that all my additional metadata lived in the field not called metadata. Yikes.
Something that isn't Mobiledoc. I understand why it's used. I don't like that my parsing process on Codedrift requires that I start from the final HTML and then throw it all in rehype to fix URL links, remove widowed text, and more.

HTML to AST back to HTML is not a good feeling and the pipeline was brittle. When I realized I didn’t want to re-add tags on my posts to Codedrift because of the complexity, I looked for alternatives.

I saw Casual JavaScript, which was running its entire site off of GitHub Issues. It wasn't perfect, but GitHub Issues are written in Markdown, Markdown supports frontmatter, and frontmatter is literally designed for metadata.

The missing piece was something that wasn't issue based. Discussions. Sold.
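Since the plan leans on frontmatter for metadata, here is a minimal sketch of how a Discussion body could be split into metadata and Markdown. The splitFrontmatter helper and the slug and description keys are assumptions for illustration, not Codedrift's actual code. It only understands flat key: value lines, so a real YAML parser would be the safer choice for anything nested.

```typescript
// Hypothetical helper: splits "---" delimited frontmatter from a Markdown body.
// Only flat `key: value` pairs are handled; this is not a YAML parser.
interface ParsedPost {
  meta: Record<string, string>;
  content: string;
}

function splitFrontmatter(body: string): ParsedPost {
  // Normalize Windows line endings so the delimiter check stays simple.
  const lines = body.replace(/\r\n/g, "\n").split("\n");
  if (lines[0]?.trim() !== "---") {
    // No opening delimiter: the whole body is content.
    return { meta: {}, content: body };
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    // No closing delimiter: treat the whole body as content.
    return { meta: {}, content: body };
  }
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not key: value
    const key = line.slice(0, colon).trim();
    if (key) meta[key] = line.slice(colon + 1).trim();
  }
  return { meta, content: lines.slice(end + 1).join("\n").trim() };
}
```

Calling splitFrontmatter on a body that starts with a --- block returns the keys in meta and the remaining Markdown in content; a body without frontmatter comes back untouched.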

Core Tools

The original Codedrift had a GraphQL micro server that ran everything. It was overengineered (like all good personal projects should be), but in the rewrite I wanted to remove as much of my own code as possible. The entire GraphQL infrastructure became a simple HTTP proxy built with http-proxy-middleware. This way I wouldn't need to expose a personal access token in the browser.
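The point of the proxy is that the token is attached on the server, never in the browser. The sketch below shows only that one step as a plain function rather than going through http-proxy-middleware itself. The buildUpstreamRequest helper and the UpstreamRequest shape are assumptions for illustration, not the code Codedrift runs.

```typescript
// GitHub's GraphQL API is served from a single endpoint; queries are POSTed to it.
const GITHUB_GRAPHQL_URL = "https://api.github.com/graphql";

interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: takes the JSON body the browser sent to the proxy and
// describes the request the server should make to GitHub on its behalf.
function buildUpstreamRequest(browserBody: string, token: string): UpstreamRequest {
  if (!token) {
    throw new Error("Missing GitHub token on the server");
  }
  return {
    url: GITHUB_GRAPHQL_URL,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Added server-side only, so the token never ships in a client bundle.
      Authorization: `bearer ${token}`,
    },
    body: browserBody,
  };
}
```

In a real deployment the token would come from server-side configuration such as an environment variable, and the proxy library would take care of streaming the response back.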

The next step was to use graphql-code-generator to produce TypeScript-typed Urql hooks. I'm still a fan of Urql, and their docs for Next.js integration are easy to follow and set up.

Because scope creep is a thing, the site's now also in TypeScript. 🥳

So now I've got typed GraphQL queries to GitHub's API without exposing the access token. All that's left is to get our Discussions, turn them into blog posts, and solve for weird cases like tag-based searches which need to go through the search endpoint instead of the repository endpoint. Consistency (and sanity) between queries came from the Post fragment.

The Post Fragment

Fragments are better explained by graphql.org, but in this context they are the reusable units of a GraphQL document. Here, a fragment on Discussion lets me get the same fields every time, regardless of whether I arrived at the Discussion via the search endpoint or a repository endpoint. Even though it might overfetch a little for a given use case, a fragment pushed through graphql-code-generator gives you a typed object representing the returned data structure.

In other words, having this GraphQL fragment

fragment PostDetails on Discussion {
  id
  title
  lastEditedAt
  url
  labels(first: 100) {
    nodes {
      id
      name
      description
    }
  }
  author {
    avatarUrl(size: 10)
    ... on User {
      id
      name
    }
  }
  body
}

will give us this type in our generated file:

export type PostDetailsFragment = {
  __typename?: "Discussion";
  id: string;
  title: string;
  lastEditedAt?: any | null | undefined;
  url: any;
  body: string;
  labels?:
    | {
        __typename?: "LabelConnection";
        nodes?:
          | Array<
              | {
                  __typename?: "Label";
                  id: string;
                  name: string;
                  description?: string | null | undefined;
                }
              | null
              | undefined
            >
          | null
          | undefined;
      }
    | null
    | undefined;
  author?:
    | { __typename?: "Bot"; avatarUrl: any }
    | { __typename?: "EnterpriseUserAccount"; avatarUrl: any }
    | { __typename?: "Mannequin"; avatarUrl: any }
    | { __typename?: "Organization"; avatarUrl: any }
    | {
        __typename?: "User";
        id: string;
        name?: string | null | undefined;
        avatarUrl: any;
      }
    | null
    | undefined;
};

Even a Partial<PostDetailsFragment> adds excellent safeguards to a function when I expect to receive data shaped like the GraphQL response for a Discussion.
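As a sketch of that safeguard, the function below accepts a Partial of a trimmed-down copy of the generated type and never assumes a field is present. PostLike and labelNames are hypothetical names, and the type is abbreviated here only to keep the example self-contained.

```typescript
// Trimmed stand-in for the generated PostDetailsFragment (title and labels only).
type PostLike = {
  title: string;
  labels?: {
    nodes?: Array<{ name: string } | null | undefined> | null;
  } | null;
};

// Returns label names, tolerating every null/undefined the schema allows.
function labelNames(post: Partial<PostLike>): string[] {
  const nodes = post.labels?.nodes ?? [];
  const names: string[] = [];
  for (const node of nodes) {
    if (node) names.push(node.name);
  }
  return names;
}
```

The compiler rejects any attempt to read post.labels.nodes without handling the missing cases, which is where most of the value comes from.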

Slugs and Search

Fetching the most recent posts is a straightforward process since the API can go from a repository to its discussions and limit those to a single category. For accessing a single post by anything other than its ID, there's only the search endpoint.
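A repository-scoped query for recent posts might look like the sketch below. The query name, the variable names, and the recentPostsVariables helper are assumptions. It relies on the PostDetails fragment being defined alongside it, and the category ID has to be looked up from the repository's discussion categories first.

```typescript
// Hypothetical query document: repository -> discussions, limited to one category.
const SELECT_RECENT_POSTS = /* GraphQL */ `
  query SelectRecentPosts(
    $owner: String!
    $repo: String!
    $categoryId: ID!
    $first: Int = 10
    $after: String
  ) {
    repository(owner: $owner, name: $repo) {
      discussions(
        first: $first
        after: $after
        categoryId: $categoryId
        orderBy: { field: CREATED_AT, direction: DESC }
      ) {
        nodes {
          ...PostDetails
        }
        pageInfo {
          endCursor
          hasNextPage
        }
      }
    }
  }
`;

interface RecentPostsVariables {
  owner: string;
  repo: string;
  categoryId: string;
  first: number;
  after: string | null;
}

// Builds the variables object from an "owner/name" string and a category ID.
function recentPostsVariables(
  repoSlug: string,
  categoryId: string,
  after: string | null = null,
  first = 10
): RecentPostsVariables {
  const [owner, repo] = repoSlug.split("/");
  if (!owner || !repo) {
    throw new Error(`Expected "owner/name", got "${repoSlug}"`);
  }
  return { owner, repo, categoryId, first, after };
}
```

Passing the endCursor from one page as the after argument for the next call is enough to page through older posts.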

A generic search on GitHub takes a search query as a string. It's the same string used in GitHub's Advanced Search, which at least means it's easy to test and iterate on, even if there is no official query builder. The result is a very generic GraphQL query for searching, which can be used for both slugs and tags. It could probably also be used for the recently created posts, but I'd prefer fewer magic strings where possible.

query SelectPostsWithSearch($search: String!, $first: Int = 1, $after: String) {
  search(first: $first, after: $after, type: DISCUSSION, query: $search) {
    nodes {
      ... on Discussion {
        ...PostDetails
      }
    }
    discussionCount
    pageInfo {
      endCursor
      hasNextPage
      hasPreviousPage
      startCursor
    }
  }
}
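Because the $search variable is only a string, a small builder can keep the magic strings in one place. This is a sketch under assumptions: buildPostSearch and its options are hypothetical, the owner/name repository and the category are placeholders, and it assumes the repo:, category:, label:, and in: qualifiers behave for Discussions the way they do in GitHub's search box, which is worth confirming there before relying on it.

```typescript
interface PostSearchOptions {
  repo: string; // "owner/name"
  category?: string;
  label?: string;
  text?: string;
  inField?: "title" | "body";
}

// Strips embedded double quotes, then wraps the value in quotes if it has whitespace.
function quoteIfNeeded(value: string): string {
  const cleaned = value.replace(/"/g, "");
  return /\s/.test(cleaned) ? `"${cleaned}"` : cleaned;
}

// Hypothetical builder for the $search string passed to the search query.
function buildPostSearch(options: PostSearchOptions): string {
  const parts: string[] = [`repo:${options.repo}`];
  if (options.category) {
    parts.push(`category:${quoteIfNeeded(options.category)}`);
  }
  if (options.label) {
    parts.push(`label:${quoteIfNeeded(options.label)}`);
  }
  if (options.text) {
    // The in: qualifier only makes sense when there is free text to match.
    if (options.inField) parts.push(`in:${options.inField}`);
    parts.push(quoteIfNeeded(options.text));
  }
  return parts.join(" ");
}
```

The resulting string can be pasted straight into GitHub's search box, which keeps the easy test-and-iterate loop intact.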

The first of two major downsides to this approach is that I need to constantly cast the node collection to Discussion, but I feel that's more my inexperience with TypeScript than the language itself. The other is that a search can always return more than one record for what should be a unique lookup. I briefly thought about changing the URLs to be slug/post_id, but that felt like it was binding me too closely to GitHub's ID system.
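One way to avoid the repeated casts is a type predicate that narrows on __typename, paired with a helper that makes the multiple-result case explicit. This is a sketch with simplified stand-in types. It assumes the query also selects __typename on each node, and isDiscussion and singlePost are hypothetical names.

```typescript
// Simplified stand-ins for the generated search node types.
type DiscussionNode = { __typename: "Discussion"; id: string; title: string };
type OtherNode = { __typename: "Issue" | "Repository" | "User" };
type SearchNode = DiscussionNode | OtherNode | null | undefined;

// Type predicate: after this check TypeScript treats the node as a Discussion.
function isDiscussion(node: SearchNode): node is DiscussionNode {
  return node != null && node.__typename === "Discussion";
}

// Returns the only matching post, null when there is none,
// and throws when the search was ambiguous.
function singlePost(nodes: SearchNode[]): DiscussionNode | null {
  const posts = nodes.filter(isDiscussion);
  if (posts.length > 1) {
    throw new Error(`Expected one post, found ${posts.length}`);
  }
  return posts[0] ?? null;
}
```

Throwing on duplicates is only one choice; picking the oldest match would also work and keeps the slug-only URLs.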

The New Post Workflow

Posts on Codedrift start in my public notes, but eventually they’re well-formed enough to get ideas and opinions. At that point, there’s discussion on the draft, which used to exist exclusively over email and Twitter DMs. These processes also worked, but it always felt like I was missing an opportunity for more open discourse about a topic.

So now, when a post grows out of Codedrift, it’ll become an issue. When it’s ready to be published, it can be moved to a Discussion, and GitHub saves all of the relevant comments and edits in the transfer. This has the added benefit of matching my mental model. Unposted content is an issue, which certainly needs resolution. Posted content deserves a discussion.

The code for all of this is in the Codedrift repository, though now that I'm just calling a GitHub API with no interference, the code feels very vanilla. And that feels pretty good.