Seeking Advice: Responsible way to clone repos for a CI tool?


When building a SaaS CI tool, what’s the most responsible way to clone users’ repositories to the web server to run checks? I see a few options:

1. Clone it to a /tmp folder directly on the server
Pros: Simple, straightforward.
Cons: Untrusted scripts execute alongside your application, and every user’s code on the server is exposed to every other user’s scripts.

2. Clone it within a Docker container
Pros: Untrusted code is [more] sandboxed and ephemeral; cannot read other users’ code.
Cons: Resources may be harder to manage, e.g. how much memory/storage is Docker using? Scaling is difficult, though about as difficult as in option 1. Can’t cache user code for repeated runs, since it lives in an ephemeral Docker container.

3. Clone it in a Docker container within a Kubernetes pod
Pros: Same as 2, but scaling is more manageable since Kubernetes makes it easy.
Cons: Lots of overhead and setup.
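
To make the resource question in option 2 a bit more concrete, here is a rough sketch of how a check could be launched in a throwaway container with hard caps. It assumes the repo has already been cloned to a host directory and that the checks need no network access; the function name, image name, and default limits are placeholders of mine, not a recommendation.

```python
import os


def docker_check_command(repo_dir, image, check_cmd,
                         memory="512m", cpus="1", pids=256):
    """Build the argv for a one-off, resource-capped container run.

    repo_dir, image and check_cmd are supplied by the caller; nothing
    here is specific to any particular CI product.
    """
    repo = os.path.abspath(repo_dir)
    return [
        "docker", "run",
        "--rm",                                  # delete the container afterwards
        "--network", "none",                     # assumption: checks need no network
        "--memory", memory,                      # hard memory cap
        "--cpus", cpus,                          # CPU cap
        "--pids-limit", str(pids),               # limits fork bombs
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp:rw,size=64m",           # small writable scratch space
        "--cap-drop", "ALL",                     # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "--volume", f"{repo}:/repo:ro",          # user code mounted read-only
        "--workdir", "/repo",
        image,
        *check_cmd,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, timeout=...)` so a runaway job is also bounded in wall-clock time. This only limits what the container can consume; it does not by itself make the container a strong security boundary.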

Any thoughts or recommendations? How do you clone repos for GitHub Apps?


We’ve got a similar problem running Dependabot - we don’t clone repos, but we do pull down specific files and evaluate them for some languages (Ruby’s Gemfile is written in Ruby, for example).

Our solution is closest to your option (3). It’s essential that you have reliable isolation if you’re going to be evaluating arbitrary code. The consequences of an attacker breaking out of that isolation would be extremely severe - if they were able to get access to your app’s credentials then they would gain permissions over all of the GitHub accounts that have linked themselves to your app.

For Dependabot, we’re hosted on Heroku and spin up one-off dynos to process each job that requires code evaluation. These one-off dynos have no access to Dependabot’s credentials or database, so there’s very little an attacker can do without breaking out of one. Since breaking out of a dyno would essentially mean the entirety of Heroku is compromised, we’re pretty confident that doing so is as close to impossible as we can make it.
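
The "no access to credentials" part is worth applying whatever the isolation layer is. This is not how Dependabot does it (the one-off dynos are a separate machine-level boundary); it’s just a minimal sketch of the same idea at the process level, assuming the secrets live in environment variables. The allowlist and the `GITHUB_APP_PRIVATE_KEY` name in the usage note are made up for illustration.

```python
import os
import subprocess

# Hypothetical allowlist: only these variables reach the untrusted job.
SAFE_VARS = ("PATH", "LANG", "HOME")


def scrubbed_env(parent_env, allowed=SAFE_VARS):
    """Return a copy of parent_env containing only allowlisted variables."""
    return {k: v for k, v in parent_env.items() if k in allowed}


def run_untrusted(argv, parent_env=None, timeout=300):
    """Run argv with a scrubbed environment and a wall-clock timeout."""
    source = os.environ if parent_env is None else parent_env
    return subprocess.run(
        argv,
        env=scrubbed_env(source),   # app secrets are never inherited
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

With this, a variable such as `GITHUB_APP_PRIVATE_KEY` set in the parent process is simply absent in the child. It complements real isolation rather than replacing it: a job that escapes its sandbox could still look for secrets elsewhere on the host.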