How bad can it Git? Characterizing secret leakage in public GitHub repositories

There are (at least) two good sources of information for secret detection: the GitHub search API and the GitHub public dataset maintained in Google BigQuery. The first phase of the process is to query for candidate files that may contain secrets, using a carefully crafted set of search terms.
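To make the first phase concrete, here is a minimal sketch of how search terms could be turned into requests against GitHub's code search endpoint. The terms in `SEARCH_TERMS` are illustrative assumptions, not the paper's actual list, and `build_search_url` is a hypothetical helper. The sketch only builds the URLs; sending them requires an authenticated request and is subject to GitHub's rate limits.

```python
from urllib.parse import urlencode

# GitHub REST API endpoint for code search.
GITHUB_CODE_SEARCH = "https://api.github.com/search/code"

# Illustrative search terms only (an assumption, not the paper's list).
SEARCH_TERMS = ["AKIA", '"BEGIN RSA PRIVATE KEY"']


def build_search_url(term: str, page: int = 1, per_page: int = 100) -> str:
    """Build a code-search URL for one search term and one page of results."""
    params = {"q": term, "per_page": per_page, "page": page}
    return f"{GITHUB_CODE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    for term in SEARCH_TERMS:
        print(build_search_url(term))
```

Keeping URL construction separate from the network call makes it easy to review exactly which queries will be issued before spending any API quota.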

Given a set of candidate files, the next thing you’ll need is a set of regular expressions for popular key formats.
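As a rough illustration of what such expressions can look like, the sketch below compiles patterns for three widely documented key formats. These are commonly cited shapes (a fixed prefix followed by a fixed-length character run, or a PEM header) and are an assumption here, not necessarily the exact expressions used in the paper. The name `KEY_PATTERNS` is hypothetical.

```python
import re

# Commonly cited patterns for distinctive key formats (assumed, not the
# paper's exact list). Each relies on a fixed prefix or header that makes
# accidental matches unlikely.
KEY_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    # Google API keys: "AIza" followed by 35 URL-safe characters.
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    # PEM header of an RSA private key.
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}

if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"
    for kind, pattern in KEY_PATTERNS.items():
        if pattern.search(sample):
            print(kind)
```

Formats with a distinctive prefix are the practical targets for this approach; keys that are just random strings with no fixed structure cannot be matched reliably by a regular expression alone.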

The regular expressions can then be used to scan the candidate files from the first phase, with any matches considered “candidate secrets”.
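The scanning step can be sketched as a small generator that applies every pattern to every line of a candidate file. The names `CandidateSecret` and `scan_file` are hypothetical, and the single pattern in the usage example is an assumed AWS access key ID format rather than anything taken from the paper.

```python
import re
from typing import Dict, Iterator, NamedTuple, Pattern


class CandidateSecret(NamedTuple):
    path: str      # file the match was found in
    line_no: int   # 1-based line number
    kind: str      # name of the pattern that matched
    value: str     # the matched text


def scan_file(
    path: str, text: str, patterns: Dict[str, Pattern[str]]
) -> Iterator[CandidateSecret]:
    """Yield one CandidateSecret per regex match in the file's text."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in patterns.items():
            for match in pattern.finditer(line):
                yield CandidateSecret(path, line_no, kind, match.group(0))


if __name__ == "__main__":
    patterns = {"aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}")}
    text = "region = us-east-1\naws_key = AKIAIOSFODNN7EXAMPLE\n"
    for hit in scan_file("config.ini", text, patterns):
        print(hit)
```

A match at this stage is only a candidate: a string can have the right shape without being a live credential, so the output is a list to be examined further rather than a list of confirmed leaks.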

Source: blog.acolyer.org