Duplicate cursor and starredAt for first 100 edges


#1

I’m just getting started with the new API. I’m looking to use it to simplify the backend of rviscomi/red-dwarf, which quickly runs into rate limits. The new API also lets me sort users by when they starred a repo, which wasn’t possible with the old API.

My query (see below) gets the name and location of everyone who stars a given repository, 100 users at a time. However, the first page of 100 users seems to have bad data.

Query
query getStargazers($user: String!, $repo: String!, $cursor: String) {
  repository(owner: $user, name: $repo) {
    owner {
      login
      avatarURL
    }
    name
    description
    stargazers(first: 100, after: $cursor, orderBy: {field: STARRED_AT, direction: ASC}) {
      edges {
        cursor
        starredAt
        node {
          login
          location
        }
      }
      totalCount
      pageInfo {
        hasNextPage
      }
    }
  }
}
Variable data
{
  "user": "rviscomi",
  "repo": "red-dwarf",
  "cursor": null
}
Results
    "edges": [
      {
        "cursor": "Y3Vyc29yOjIwMTItMDktMTlUMTM6MTE6MTctMDc6MDA=",
        "starredAt": "2012-09-19T20:11:17Z",
        "node": {
          "login": "vic",
          "location": "Mexico City"
        }
      },
      {
        "cursor": "Y3Vyc29yOjIwMTItMDktMTlUMTM6MTE6MTctMDc6MDA=",
        "starredAt": "2012-09-19T20:11:17Z",
        "node": {
          "login": "gaveen",
          "location": "Sri Lanka"
        }
      },
      ...
    ], ...

See the full results here.

The cursor and starredAt values are identical for all 100 users. So even if I changed the query to first: 1 and after: <cursor>, I would get the (n+1)th user instead of the 2nd user, where n is the number of users with this buggy data. I haven’t checked how many users share this cursor, but it’s at least 100.

Why do all these edges have the same values?

Update: I removed the orderBy and noticed something interesting. The starredAt values are all still the same, but the cursor values are different! So when ordering by STARRED_AT the cursor hash must just be a function of the starredAt time.
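The cursor appears to be plain base64, so this can be checked by decoding it. Below is a minimal Python sketch that decodes the cursor from the results above and compares it with the reported starredAt. Cursors are meant to be opaque and this format is not documented, so treat the decoding as a debugging aid only.

```python
import base64
from datetime import datetime

# Values copied from the first edge in the results above.
cursor = "Y3Vyc29yOjIwMTItMDktMTlUMTM6MTE6MTctMDc6MDA="
starred_at = "2012-09-19T20:11:17Z"

# Decode the cursor; it turns out to be a prefix plus an ISO 8601 timestamp.
decoded = base64.b64decode(cursor).decode("ascii")
print(decoded)  # cursor:2012-09-19T13:11:17-07:00

# Split off the "cursor" prefix and parse both timestamps as aware datetimes.
prefix, _, timestamp = decoded.partition(":")
cursor_time = datetime.fromisoformat(timestamp)
starred_time = datetime.fromisoformat(starred_at.replace("Z", "+00:00"))

# Same instant, just expressed in a different UTC offset.
print(cursor_time == starred_time)  # True
```

The decoded cursor contains nothing but the timestamp, which would explain why every edge with the same starredAt gets the same cursor.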

It also looks like about the first 120 users all have the same starredAt value. The repo was created on 9/16/12, and my star should have been the first, on that date, but these 100+ stars are all dated 9/19/12, mine included. The bug seems to have been fixed by 10/2/12, when the first non-9/19/12 date appears. So maybe GitHub lost star data for the period from 9/16/12 (or earlier) until 10/2/12 and backfilled everything with the date 9/19/12?

In any case, even if the starredAt values are all legitimately identical, they should still hash to different cursor values when ordering by starredAt. As it is now, it’s impossible to query for the 20 or so users who come after the 100th cursor and before the cursor dated 10/2/12.
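To illustrate what I mean by cursors that stay distinct, here is a rough Python sketch of a cursor that encodes the timestamp plus a tie-breaker. Everything in it is hypothetical: the `encode_cursor`, `decode_cursor`, and `page_after` helpers, the numeric user ids, and the 10/2/12 time of day are made up for the example, and I have no idea how GitHub actually builds its cursors.

```python
import base64

def encode_cursor(starred_at, user_id):
    """Hypothetical cursor: timestamp plus a unique tie-breaker."""
    raw = f"cursor:{starred_at}|{user_id}"
    return base64.b64encode(raw.encode("ascii")).decode("ascii")

def decode_cursor(cursor):
    raw = base64.b64decode(cursor).decode("ascii")
    starred_at, _, user_id = raw[len("cursor:"):].rpartition("|")
    return starred_at, int(user_id)

def page_after(edges, cursor, first):
    """Return up to `first` edges that sort strictly after the cursor.

    `edges` is a list of (starred_at, user_id) tuples sorted ascending.
    Timestamps share one format and offset, so string comparison is safe.
    """
    if cursor is None:
        return edges[:first]
    key = decode_cursor(cursor)
    return [edge for edge in edges if edge > key][:first]

# Made-up data: three stars share a timestamp, one comes later.
edges = [
    ("2012-09-19T20:11:17Z", 1),
    ("2012-09-19T20:11:17Z", 2),
    ("2012-09-19T20:11:17Z", 3),
    ("2012-10-02T08:00:00Z", 4),
]

# Walk the list one edge at a time, as with first: 1 and after: <cursor>.
seen, cursors, cursor = [], [], None
while True:
    page = page_after(edges, cursor, first=1)
    if not page:
        break
    seen.append(page[0][1])
    cursor = encode_cursor(*page[0])
    cursors.append(cursor)

print(seen)  # [1, 2, 3, 4]
```

Because the tie-breaker is unique per edge, every cursor is distinct and no edge is skipped, even when the timestamps are identical.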


#2

That’s very interesting @rviscomi! Thanks for reporting this. I’ve opened up an internal issue to track this bug. We’ll report back when we have more information.

Thanks again!


#3

Thanks @bswinnerton. FWIW I replicated this issue with my other repo, trunk8. Every result on the first page has a starredAt of 2012-06-16T05:25:05Z.

The repo has 687 stars, but I’m only able to get 424 stargazers back because the cursor of the 100th edge is the same all the way up to the 363rd edge. So after: $cursor skips over edges 101-363.
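The 424 figure can be reproduced with a small simulation. This Python sketch assumes the server treats after: as "everything with a later starredAt", which is my guess at the behavior rather than anything confirmed. Integers stand in for timestamps, and `page_after` is a hypothetical helper.

```python
TOTAL = 687   # stars on the repo
TIED = 363    # edges 1..363 share one starredAt (and so one cursor)
PAGE = 100    # page size used in the query

# Stand-in data: tied edges get timestamp 0, later edges get unique values.
edges = [
    {"position": i, "starred_at": 0 if i <= TIED else i}
    for i in range(1, TOTAL + 1)
]

def page_after(edges, cursor, first):
    """Assumed server behavior: a cursor is just the timestamp."""
    if cursor is None:
        return edges[:first]
    return [e for e in edges if e["starred_at"] > cursor][:first]

# Paginate until an empty page comes back.
fetched, cursor = [], None
while True:
    page = page_after(edges, cursor, PAGE)
    if not page:
        break
    fetched.extend(page)
    cursor = page[-1]["starred_at"]

got = {e["position"] for e in fetched}
skipped = [i for i in range(1, TOTAL + 1) if i not in got]

print(len(fetched))             # 424
print(skipped[0], skipped[-1])  # 101 363
```

Under this model the client sees the first 100 edges plus the 324 that come after the tied block, and edges 101 through 363 are never returned.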


#4

Thanks for all the info on this bug! I’ve just released a change to improve our cursors. I tried your query and it paginated properly this time. Please let me know if you have any more trouble with it!