Getting full records for crunching data


#1

The pagination API is fine for interfaces that only show a limited amount of data on one page.
But, as with the Graphs tab on GitHub, some people (like me) just want to crunch data.
That often means retrieving every page one after another, and that is slow.

The overhead this creates is also pretty big.
In this case it would be very nice to be able to get all records at once (or at least more than 100 at a time), especially when the requested data is just a few attributes rather than full records.

In my case I would like to get the stargazers of a repository and show their locations on a map. Since many repos have thousands of stars, this becomes a problem. @davidcelis pointed out that there might be a better way to handle this use case.

Another thing I noticed is that there is a lot of redundant information in the response when you query for only a single attribute (e.g. location).
The location key is repeated over and over, but all I want is an array of all the locations.
My use case may be very rare, so it might not be worth optimizing for.
But I think the whole point of GraphQL is to get the data you need in the most space-saving way. So instead of getting this:

{
  "data":{
    "repository":{
      "stargazers":{
        "nodes":[
          {
            "location":"Belgium"
          },
          {
            "location":""
          },
          {
            "location":null
          },
          {
            "location":"Berlin, Germany"
          },
          ... and so on 
        ],
        "pageInfo":{
          "endCursor":"Y3Vyc29yOjExMTM1NjMx",
          "hasNextPage":true
        }
      }
    }
  }
}

I would like to get this:

[
  "Belgium",
  "Berlin, Germany",
  "Tokyo, JAPAN",
  "Valencia, Spain",
  ... and so on
]

or this:

{
  "locations":[
    "Belgium",
    "Berlin, Germany",
    "Tokyo, JAPAN",
    "Valencia, Spain",
    ... and so on
  ]
}

Ideally, I would also like to leave out empty or null results.

This could be made possible by introducing some keywords, e.g.:

  • base for shortening the JSON
  • as array (or something similar) to get a one-attribute result as an array
  • skipempty to leave out empty results

I am aware that it is not as simple as that. But having this would reduce the traffic load a bit, and I think that is a good thing.
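
For comparison, here is a rough sketch of what the client currently has to do itself to get from the nested response to the flat array. It is plain Python with no API calls; the function name `extract_locations` and the `skip_empty` flag are made up for illustration and only mimic what the proposed keywords would do on the server side.

```python
from typing import Any, Optional


def extract_locations(
    response: dict[str, Any], skip_empty: bool = True
) -> list[Optional[str]]:
    """Flatten a stargazers response into a plain list of locations.

    Hypothetical helper: this runs on the client after the full response
    has already been transferred, so it saves no traffic.
    """
    nodes = response["data"]["repository"]["stargazers"]["nodes"]
    locations = [node.get("location") for node in nodes]
    if skip_empty:
        # Drop null values as well as empty or whitespace-only strings.
        locations = [loc for loc in locations if loc and loc.strip()]
    return locations


# Sample shaped like the response shown above (shortened).
sample = {
    "data": {
        "repository": {
            "stargazers": {
                "nodes": [
                    {"location": "Belgium"},
                    {"location": ""},
                    {"location": None},
                    {"location": "Berlin, Germany"},
                ],
                "pageInfo": {
                    "endCursor": "Y3Vyc29yOjExMTM1NjMx",
                    "hasNextPage": True,
                },
            }
        }
    }
}

print(extract_locations(sample))
```

This works, but every repeated key and every empty value has already gone over the wire by the time it runs, which is the part I would like to avoid.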

PS: I forgot to include my query, which is:

query {
  repository(owner: "owner", name: "repo") {
    stargazers(first: 100, after: "someCursor") {
      nodes { location },
      pageInfo { endCursor, hasNextPage }
    }
  }
}
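
To illustrate why this gets slow, here is a minimal sketch of the loop needed to walk through all pages with that query. It assumes the query is rewritten to take `$owner`, `$name` and `$cursor` variables, and that requests are POSTed to `https://api.github.com/graphql` with a bearer token. The names `iter_locations` and `github_send` are my own, and there is no error or rate-limit handling.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, Optional

# Same query as above, with the hard-coded values turned into variables.
QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    stargazers(first: 100, after: $cursor) {
      nodes { location }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""

# A transport takes (query, variables) and returns the decoded JSON response.
Transport = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_locations(send: Transport, owner: str, name: str) -> Iterator[Optional[str]]:
    """Yield every stargazer location, one request per page of 100."""
    cursor = None  # a null cursor asks for the first page
    while True:
        result = send(QUERY, {"owner": owner, "name": name, "cursor": cursor})
        stargazers = result["data"]["repository"]["stargazers"]
        for node in stargazers["nodes"]:
            yield node["location"]
        page_info = stargazers["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]


def github_send(token: str) -> Transport:
    """Build a transport that POSTs to the GraphQL endpoint (assumed setup)."""

    def send(query: str, variables: dict[str, Any]) -> dict[str, Any]:
        request = urllib.request.Request(
            "https://api.github.com/graphql",
            data=json.dumps({"query": query, "variables": variables}).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return send
```

Usage would look like `list(iter_locations(github_send(my_token), "owner", "repo"))`, where `my_token` is a placeholder. For a repo with thousands of stars this means dozens of sequential round trips, because each request needs the `endCursor` from the previous one.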