OpenSSL’s community has largely moved from email to GitHub Discussions. It makes a ton of sense, but GitHub isn’t ideal when it comes to community data. You pretty much get this:
There is an API for Discussions, but it’s GraphQL. I’m sure it’s great and easy to use once you understand it, but I find its graph model unintuitive and annoying compared to relational tables. Compared to more traditional REST APIs, GraphQL is more complicated and finicky.
GitHub offers an Explorer for the GraphQL API, which helps. Here’s a query that grabs the first 100 discussions, some useful data (including the first 100 comments) and page information to get the next 100 discussions in the next call:
{
  repository(owner: "openssl", name: "openssl") {
    discussions(first: 100) {
      totalCount
      edges {
        node {
          databaseId
          number
          category {
            name
          }
          url
          title
          createdAt
          author {
            login
          }
          answer {
            id
            databaseId
          }
          answerChosenBy {
            login
          }
          comments(first: 100) {
            totalCount
            edges {
              node {
                id
                databaseId
                createdAt
                url
                author {
                  login
                }
              }
            }
          }
        }
      }
      pageInfo {
        endCursor
        startCursor
        hasNextPage
        hasPreviousPage
      }
    }
  }
}
It’s wordy and I’m not getting anything close to all the data available. In addition, the edges and node represent generic connections and objects. So discussion threads are objects connected to a repository and comments are objects connected to a discussion thread. It’s flexible in a sense. If I were setting up an application to interact with GitHub, I suspect this would be pretty convenient. But what I want is for GitHub to tell me everything it knows about discussions on a repository, and this is not convenient at all.
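To make the edges and node shape concrete, here is a minimal Python sketch that flattens a response to the query above into one row per comment, which is closer to a relational view. The function name flatten_discussions and the row keys are hypothetical choices for illustration, and the input is assumed to be the parsed JSON found under the response's top-level "data" key.

```python
def flatten_discussions(data):
    """Turn the nested edges/node response into one flat row per comment.

    `data` is assumed to be the parsed JSON under the top-level "data" key
    of a response to the query above. Discussions without comments produce
    no rows in this sketch.
    """
    rows = []
    discussions = data["repository"]["discussions"]
    for discussion_edge in discussions["edges"]:
        discussion = discussion_edge["node"]
        for comment_edge in discussion["comments"]["edges"]:
            comment = comment_edge["node"]
            rows.append({
                "discussion": discussion["number"],
                "title": discussion["title"],
                "category": discussion["category"]["name"],
                "comment_id": comment["databaseId"],
                "created_at": comment["createdAt"],
                # author is nullable in the schema, so guard against None
                "author": (comment["author"] or {}).get("login"),
            })
    return rows
```

Each level of the loop peels off one edges list and one node, which is all the extra nesting amounts to once the response is in hand.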
Fortunately, I’ve found a workaround. You can find the code I’m using to pull GitHub Discussions in this Perl script.[1] First, I get the number for each discussion in a repository:
query {
  repository(owner: "$owner", name: "$repo") {
    discussions(first: $page, after: %s) {
      edges {
        node {
          number
        }
      }
      pageInfo {
        endCursor
        startCursor
        hasNextPage
        hasPreviousPage
      }
    }
  }
}
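The paging loop around that query can be sketched in Python as follows. This is not the Perl script, just an illustration under a few assumptions: the template uses printf-style slots for every value, the first page passes after: null, and the helper names (build_query, run_query, discussion_numbers) are hypothetical. The endpoint https://api.github.com/graphql and the bearer token header are GitHub's standard way to call the GraphQL API.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

QUERY_TEMPLATE = """
query {
  repository(owner: "%s", name: "%s") {
    discussions(first: %d, after: %s) {
      edges { node { number } }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def build_query(owner, repo, page_size, cursor=None):
    """Fill in the template; the first page uses `after: null`."""
    after = "null" if cursor is None else json.dumps(cursor)
    return QUERY_TEMPLATE % (owner, repo, page_size, after)


def run_query(query, token):
    """POST a query to the GraphQL endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def discussion_numbers(owner, repo, run, page_size=100):
    """Follow endCursor until hasNextPage is false, collecting numbers.

    `run` is any callable that takes a query string and returns the parsed
    response, which keeps the loop testable without network access.
    """
    numbers, cursor = [], None
    while True:
        result = run(build_query(owner, repo, page_size, cursor))
        discussions = result["data"]["repository"]["discussions"]
        numbers.extend(edge["node"]["number"] for edge in discussions["edges"])
        if not discussions["pageInfo"]["hasNextPage"]:
            return numbers
        cursor = discussions["pageInfo"]["endCursor"]
```

With a personal access token in a variable named token, a call might look like discussion_numbers("openssl", "openssl", lambda q: run_query(q, token)).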
It turns out Discussions share the same numbering sequence as Issues on GitHub. That is why the first Discussion post on OpenSSL is 21355 rather than 1. If you try to go to issue #21355, it redirects to the Discussion with the same number. But using that number with the issues API returns a 404 error. Thankfully, there is an undocumented REST API for Discussions:
https://api.github.com/repos/openssl/openssl/discussions/21355
Even better, you can grab all the comments in one go as well:
https://api.github.com/repos/openssl/openssl/discussions/21355/comments
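A minimal Python sketch of fetching both URLs, assuming the endpoints behave as described above. Because they are undocumented, the shape of the JSON they return is not assumed here. The helper names discussion_urls and fetch_json are hypothetical, and the token is optional.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def discussion_urls(owner, repo, number):
    """Return the (discussion, comments) REST URLs for one discussion."""
    base = f"{API_ROOT}/repos/{owner}/{repo}/discussions/{number}"
    return base, base + "/comments"


def fetch_json(url, token=None):
    """GET a URL from the GitHub API and return the parsed JSON body."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = "Bearer " + token
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For the example above, discussion_urls("openssl", "openssl", 21355) gives the two URLs shown, and each can be passed to fetch_json.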
In the process of writing this post, I realised I can cut out GraphQL altogether:
- Starting with 1, test numbers to see if they are an issue via the issue API:
https://api.github.com/repos/$owner/$repo/issues/$num
- If a number isn’t an issue, see if it’s a discussion thread:
https://api.github.com/repos/$owner/$repo/discussions/$num
- If it’s a discussion, grab all the comments:
https://api.github.com/repos/$owner/$repo/discussions/$num/comments
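The steps above can be sketched in Python roughly like this. It is an illustration, not the Perl script: the function names and the upper bound `last` are hypothetical, and it assumes a 200 status means the number exists at that endpoint. Two caveats worth knowing: the issues API also returns pull requests, and unauthenticated REST requests are tightly rate limited, so a real run over thousands of numbers would need a token.

```python
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def http_status(url):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def classify(owner, repo, num, status=http_status):
    """Label a number as "issue", "discussion" or "unknown".

    `status` is any callable mapping a URL to a status code; it is a
    parameter so the logic can be exercised without network access.
    """
    base = f"{API_ROOT}/repos/{owner}/{repo}"
    if status(f"{base}/issues/{num}") == 200:
        return "issue"  # includes pull requests
    if status(f"{base}/discussions/{num}") == 200:
        return "discussion"
    return "unknown"


def find_discussions(owner, repo, last, status=http_status):
    """Test numbers 1..last and return those that are discussions."""
    return [num for num in range(1, last + 1)
            if classify(owner, repo, num, status) == "discussion"]
```

The trade-off against the GraphQL route is request volume: this makes one or two calls per number, where the paged query returns up to 100 discussion numbers per call.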
[1] I’m editing the code here for clarity. You don’t need to know how I escape quotation marks and that sort of thing.