Slides from my talk "Detecting secrets in code committed to Gitlab" at OWASP Suffolk on 15th May 2020.
This talk will cover the following:
* Problem we had
* Techniques to solve that
* Existing tools that can help us
* Comparison of tools
* Final architecture and product
* What we learnt from the experiment
* Future enhancements
2. About Me
● Chandrapal Badshah
● Security Engineer
● Stoic and spends time with philosophy
● Pentest, Automation, Read books
● Manage @HackwithGithub on Twitter
3. Context
● Product-based company: fail fast, learn fast
● Hires a lot of devs*
● Uses Gitlab community edition for code storage and CI/CD
● We audit the code for secrets at regular intervals, but that is too late
4. Problem Statement
Need to detect and remove sensitive API keys (secrets) from code
This would reduce the impact when:
● Devs make an internal repo public
● Devs push commits to their personal Github repos by mistake
● Unauthorized members get access to the code (insider threat)
5. This would help us in situations like
Source: https://www.bleepingcomputer.com/news/security/microsofts-github-account-hacked-private-repositories-stolen/
8. Git hooks
● Git hooks are scripts that git executes before or after events such as:
commit, push, and receive
● Git hooks are a built-in feature - no need to download anything.
● There are many types of git hooks. Check out https://githooks.com/
● We are interested in commit and receive based hooks:
○ pre-commit
○ post-commit
○ pre-receive
○ post-receive
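A rough sketch of what a receive-side hook sees, assuming the hook is written in Python 3. Git feeds pre-receive and post-receive hooks one line per updated ref on stdin, in the form `<old-id> <new-id> <ref-name>`. The function names below are made up for illustration and are not from the talk's tooling.

```python
import sys


def is_zero_id(object_id):
    """Git uses an all-zero id to mean "no commit" (ref created or deleted)."""
    return set(object_id) == {"0"}


def parse_ref_updates(stdin_text):
    """Parse the lines git feeds to pre-receive and post-receive hooks."""
    updates = []
    for line in stdin_text.splitlines():
        if not line.strip():
            continue
        old, new, ref = line.split(" ", 2)
        updates.append({
            "ref": ref,
            "old": old,
            "new": new,
            "created": is_zero_id(old),
            "deleted": is_zero_id(new),
        })
    return updates


def main():
    for update in parse_ref_updates(sys.stdin.read()):
        if not update["deleted"]:
            # Placeholder for the real work, e.g. handing the range to a scanner.
            print("would scan", update["ref"], "up to", update["new"])
    # The exit status of post-receive does not affect the push;
    # a pre-receive hook would return non-zero here to reject it.
    return 0
```

To use it as a hook, the file would be placed in the server-side repository's hooks directory as `post-receive`, made executable, and end with `sys.exit(main())`.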
9. Git hooks in the flow
Source: https://blog.gitguardian.com/git-hooks-automated-secrets-detection/
11. Comparison of Git hooks
Pre-commit and post-commit hooks - run the scripts on dev machines.
Advantages:
● Stops secrets even before they are committed
Disadvantages:
● Adding new regexes and managing the script on dev machines is hard
● False positives make for a bad user experience
● Privacy issues? Also, nothing stops devs from removing the git hooks
12. Comparison of Git hooks
Pre-receive hook - it can’t do many checks, as the pushed code has not yet been accepted by the server.
There is also a pre-push hook, which executes even before the pre-receive hook is
executed on the server side. But the pre-push hook still runs on the client side.
13. Comparison of Git hooks
Post-receive hook - runs on the server side.
Advantages:
● Can be configured to add no delay when a user does a git push. Devs don’t really
see the difference.
● Easy to manage the scripts
● False positives are manageable
Disadvantages:
● The secrets are already on the server
14. Final Decision
Go with the use of post receive hooks.
If secret detected:
● automatically raise a confidential Gitlab issue in the repo
● get feedback - check if it’s a false positive
● if it’s a secret, ask the devs to rotate the secret
Post receive hooks should be configured per repository
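A minimal sketch of the "raise a confidential issue" step, assuming the Gitlab REST API (v4) endpoint for creating issues and a personal access token. The function name, issue wording, and example host are hypothetical. The request is only built here, not sent, and the secret itself is deliberately left out of the issue text.

```python
import json
import urllib.request


def build_issue_request(base_url, project_id, token, commit_sha, file_path):
    """Build (but do not send) a request that opens a confidential issue."""
    url = "{}/api/v4/projects/{}/issues".format(base_url.rstrip("/"), project_id)
    body = {
        "title": "Possible secret committed",
        "description": (
            "A possible secret was found in `{}` at commit {}.\n\n"
            "If it is a real secret, please rotate it. "
            "If it is a false positive, please comment on this issue."
        ).format(file_path, commit_sha),
        # Confidential issues are hidden from users without project access.
        "confidential": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "PRIVATE-TOKEN": token},
        method="POST",
    )
```

Sending it would be a call to `urllib.request.urlopen(request)`, ideally with error handling and a timeout.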
15. Gitlab feature to help post receive hooks
● Gitlab has System hooks
● Gitlab system hooks send an HTTP POST request for many events such as push,
group create, repo create, etc.
● More details at
https://docs.gitlab.com/ee/system_hooks/system_hooks.html
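A minimal sketch of a receiver for those POST requests, using only the Python standard library. It assumes the push payload carries `event_name`, `project_id`, `ref`, `before` and `after`, and that the secret token configured on the hook arrives in the `X-Gitlab-Token` header, as described in the Gitlab docs linked above. `SHARED_TOKEN`, `parse_push_event` and `HookHandler` are hypothetical names.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler

SHARED_TOKEN = "change-me"  # hypothetical; must match the token set on the hook


def parse_push_event(token, raw_body):
    """Return the fields a scanner needs, or None if the event is ignored."""
    if token is None or not hmac.compare_digest(
        token.encode("utf-8"), SHARED_TOKEN.encode("utf-8")
    ):
        return None
    event = json.loads(raw_body)
    if event.get("event_name") != "push":
        return None  # group create, repo create, etc. are not scanned
    return {
        "project_id": event.get("project_id"),
        "ref": event.get("ref"),
        "before": event.get("before"),
        "after": event.get("after"),
    }


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        job = parse_push_event(
            self.headers.get("X-Gitlab-Token"), self.rfile.read(length)
        )
        # Reply straight away; the scan itself would be queued and run
        # in the background so the hook call returns quickly.
        self.send_response(200)
        self.end_headers()
        if job is not None:
            print("queue scan for", job)
```

It could be served with `http.server.HTTPServer(("0.0.0.0", 8080), HookHandler).serve_forever()`; the port is an arbitrary choice.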
17. Existing secret detection tools
There are lots of open source tools:
● truffleHog
● gitleaks
● git-secrets by AWS Labs
● detect-secrets by Yelp
● talisman by ThoughtWorks
● and more...
18. TruffleHog
● Python based tool
● Customizable regex
● Easy install and CLI commands
● Good documentation
● https://github.com/dxa4481/truffleHog
19. Gitleaks
● Written in Golang
● Customizable regex
● Supports whitelisting of secrets
● Lots of CLI options, but lacks documentation
● Allows scanning a single commit, but downloads the entire repo
● https://github.com/zricethezav/gitleaks
20. Comparison of truffleHog and gitleaks
truffleHog
1. Efficient for smaller commits
2. Less memory intensive
3. After configuring with Gitlab system hooks,
the total time taken to complete scanning
was less.
gitleaks
1. Same time as truffleHog for smaller commits.
Comparatively fast for huge commits.
2. Very greedy for CPU and memory
3. After configuring with Gitlab system hooks,
the total time taken to complete scanning
was less, but at the cost of CPU and memory.
21. Changes made
● Took all the necessary code from truffleHog and stripped the rest. We
internally call it “tattletale-rt”.
● The scan logic looks like the below:
○ Get the code changes in the commit (only the added content, not the removed)
○ Get all the regexes we need to scan for
○ For each line in the code change, check if any regex matches
○ If it matches, report it
● Have a separate service called “Issue Manager” which manages issues.
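The scan logic above can be sketched roughly as follows. This is not the actual tattletale-rt code, only an illustration that assumes the commit's changes are available as unified diff text. The two rules and all names are hypothetical.

```python
import re

# Hypothetical rule set; a real deployment would maintain many more patterns.
RULES = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def added_lines(diff_text):
    """Yield (file_path, line) for every added line in a unified diff."""
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/src/app.py" names the file the following hunks belong to.
            path = line[4:]
            if path.startswith("b/"):
                path = path[2:]
        elif line.startswith("+"):
            # Added content only; lines starting with "-" were removed.
            yield path, line[1:]


def scan_diff(diff_text):
    """Check each added line against every rule and collect the matches."""
    findings = []
    for path, line in added_lines(diff_text):
        for name, pattern in RULES.items():
            if pattern.search(line):
                # Report where and which rule, not the secret itself.
                findings.append({"file": path, "rule": name})
    return findings
```

Each finding could then be handed to the Issue Manager service.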
26. What we learnt
● Not all API keys are sensitive. Google API keys are everywhere and are
intended to be public - Google Maps API key, Firebase key, etc
● Deployments are different for each project - No “one solution” that fits all
● This detection is regex based. API keys / secrets will not be detected if:
○ The API key doesn't match the regex
○ The secrets are in a different language. пароль (parol’) is “password” in Russian.
● Entropy based detection is noisy but can detect some secrets.
● Learn the secure way to store secrets for each tech stack.
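For reference, entropy-based detection usually means computing the Shannon entropy of each token and flagging long tokens that look random. A minimal sketch follows; the threshold and minimum length are arbitrary assumptions, and the function names are made up.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(line, threshold=4.0, min_length=20):
    """Return whitespace-separated tokens that are long and look random."""
    return [
        token
        for token in line.split()
        if len(token) >= min_length and shannon_entropy(token) > threshold
    ]
```

The noise comes from the fact that commit hashes, UUIDs, and minified or encoded content can also score high, so hits like these tend to need a human check.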