Fetching GitHub Discussions via the API

OpenSSL’s community has largely moved from email to GitHub Discussions. It makes a ton of sense, but GitHub isn’t ideal when it comes to community data. You pretty much get this:

There is an API for Discussions, but it’s GraphQL. I’m sure it’s great and easy to use once you understand it, but GraphQL models everything as a graph of connected objects, and I find that unintuitive and annoying compared to relational tables. It is also more complicated and finicky than a traditional REST API.

GitHub offers an Explorer for the GraphQL API, which helps. Here’s a query that grabs the first 100 discussions, some useful data (including the first 100 comments) and page information to get the next 100 discussions in the next call:

{
  repository(owner: "openssl", name: "openssl") {
    discussions(first: 100) {
      totalCount
      edges {
        node {
          databaseId
          number
          category {
            name
          }
          url
          title
          createdAt
          author {
            login
          }
          answer {
            id
            databaseId
          }
          answerChosenBy {
            login
          }
          comments(first: 100) {
            totalCount
            edges {
              node {
                id
                databaseId
                createdAt
                url
                author {
                  login
                }
              }
            }
          }
        }
      }
      pageInfo {
        endCursor
        startCursor
        hasNextPage
        hasPreviousPage
      }
    }
  }
}

It’s wordy, and I’m not getting anything close to all the data available. In addition, the edges and node fields represent generic connections and objects: discussion threads are objects connected to a repository, and comments are objects connected to a discussion thread. It’s flexible in a sense. If I were setting up an application to interact with GitHub, I suspect this would be pretty convenient. But what I want is for GitHub to tell me everything it knows about discussions on a repository, and for that this is not convenient at all.
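To make the edges and node nesting concrete, here is a minimal Python sketch (not the Perl script mentioned below) that flattens a parsed response to the query above into one row per discussion. The function name flatten_discussions and the choice of output keys are my own for illustration; the input field names are the ones used in the query.

```python
from typing import Any, Dict, List


def flatten_discussions(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the nested edges/node structure into one flat dict per discussion."""
    connection = response["data"]["repository"]["discussions"]
    rows = []
    for edge in connection["edges"]:
        node = edge["node"]
        # These objects can be null in the response (for example a deleted
        # author), so fall back to an empty dict before reading from them.
        author = node.get("author") or {}
        category = node.get("category") or {}
        comments = node.get("comments") or {}
        rows.append({
            "number": node["number"],
            "title": node.get("title"),
            "category": category.get("name"),
            "author": author.get("login"),
            "created_at": node.get("createdAt"),
            # "answer" is null when no answer has been chosen.
            "answered": node.get("answer") is not None,
            "comment_count": comments.get("totalCount", 0),
        })
    return rows
```

Each row is then close to what a relational table would hold, which is the shape I actually want.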

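Fortunately, I’ve found a workaround. You can find the code I’m using to pull GitHub Discussions in this Perl script.[1] First, I get the number for each discussion in a repository. In the query below, $owner, $repo and $page (the page size) are Perl variables, and %s is filled in with the cursor returned by the previous page: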

query {
  repository(owner: "$owner", name: "$repo") {
    discussions(first: $page, after: %s) {
      edges {
        node {
          number
        }
      }
      pageInfo {
        endCursor
        startCursor
        hasNextPage
        hasPreviousPage
      }
    }
  }
}
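For readers who don’t use Perl, the paging loop can be sketched in Python along these lines. This is an illustration rather than a translation of the script: it passes GraphQL variables instead of interpolating strings, and the names build_payload, discussion_numbers and make_post are hypothetical. The endpoint https://api.github.com/graphql and the bearer-token header are GitHub’s standard ones for GraphQL.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

QUERY = """
query($owner: String!, $repo: String!, $page: Int!, $cursor: String) {
  repository(owner: $owner, name: $repo) {
    discussions(first: $page, after: $cursor) {
      edges { node { number } }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def build_payload(owner: str, repo: str, cursor: Optional[str], page: int) -> bytes:
    """Encode the query and its variables as a JSON request body."""
    variables = {"owner": owner, "repo": repo, "page": page, "cursor": cursor}
    return json.dumps({"query": QUERY, "variables": variables}).encode("utf-8")


def discussion_numbers(post: Callable[[bytes], dict], owner: str, repo: str,
                       page: int = 100) -> Iterator[int]:
    """Yield every discussion number, following endCursor until the last page."""
    cursor = None  # a null cursor asks for the first page
    while True:
        response = post(build_payload(owner, repo, cursor, page))
        connection = response["data"]["repository"]["discussions"]
        for edge in connection["edges"]:
            yield edge["node"]["number"]
        info = connection["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]


def make_post(token: str) -> Callable[[bytes], dict]:
    """Return a function that POSTs a body to GitHub's GraphQL endpoint."""
    def post(body: bytes) -> dict:
        request = urllib.request.Request(
            "https://api.github.com/graphql",
            data=body,
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as reply:
            return json.load(reply)
    return post
```

With a personal access token in hand, list(discussion_numbers(make_post(token), "openssl", "openssl")) would collect the numbers. Keeping the HTTP call behind the post argument means the loop can be exercised without touching the network.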

It turns out Discussions share the same numbering sequence as Issues on GitHub. That is why the first Discussion post on OpenSSL is number 21355. If you try to go to issue #21355, it redirects to the Discussion with the same number, but using that number with the issues API returns a 404 error. Thankfully, there is an undocumented REST API for Discussions:

https://api.github.com/repos/openssl/openssl/discussions/21355

Even better, you can grab all the comments in one go as well:

https://api.github.com/repos/openssl/openssl/discussions/21355/comments

In the process of writing this post, I realised I can cut out GraphQL altogether:

  1. Starting with 1, test each number against the issues API to see if it is an issue: https://api.github.com/repos/$owner/$repo/issues/$num
  2. If a number isn’t an issue, see if it’s a discussion thread: https://api.github.com/repos/$owner/$repo/discussions/$num
  3. If it’s a discussion, grab all the comments: https://api.github.com/repos/$owner/$repo/discussions/$num/comments
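
The three steps above could be sketched in Python roughly as follows. This is an untested-against-GitHub illustration, not the Perl script: the names http_get_json and collect_discussions are hypothetical, and the last_number stopping point is my own assumption, since the steps don’t say when to stop counting. It also assumes the comments endpoint returns everything in one response, and it sends unauthenticated requests, which GitHub rate-limits heavily.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional

API = "https://api.github.com/repos"


def http_get_json(url: str) -> Optional[Any]:
    """Return the parsed JSON body, or None on a 404. Other errors propagate."""
    try:
        with urllib.request.urlopen(url) as reply:
            return json.load(reply)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def collect_discussions(owner: str, repo: str, last_number: int,
                        get: Callable[[str], Optional[Any]] = http_get_json
                        ) -> Dict[int, Dict[str, Any]]:
    """Probe numbers 1..last_number and gather discussions with their comments."""
    base = f"{API}/{owner}/{repo}"
    found = {}
    for num in range(1, last_number + 1):
        # Step 1: anything the issues API knows about is not a discussion.
        if get(f"{base}/issues/{num}") is not None:
            continue
        # Step 2: try the undocumented discussions endpoint.
        discussion = get(f"{base}/discussions/{num}")
        if discussion is None:
            continue  # the number is neither an issue nor a discussion
        # Step 3: fetch the comments for this discussion.
        comments = get(f"{base}/discussions/{num}/comments") or []
        found[num] = {"discussion": discussion, "comments": comments}
    return found
```

Passing the fetch function in as get keeps the probing logic separate from HTTP, so authentication headers or rate-limit handling can be added in one place.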

[1] I’m editing the code here for clarity. You don’t need to know how I escape quotation marks and that sort of thing.