How safe are your secrets? The importance of data security and risk mitigation


Written by James Cruddas, Lead Consultant, Informed Solutions

It is now widely accepted that data is a vital business asset, and with security features developing at a rapid rate, the importance of keeping up to speed with the best tools for protecting an organisation’s data cannot be overstated.

Here, Informed Solutions’ Lead Consultant James Cruddas considers the importance of data security and risk mitigation, and how InformedENGINEERING© is cultivating an environment where developers can deliver solutions that make our world safer, smarter, healthier, greener, and more secure.

 

Importance of data security

Ask an organisation to name its most valued asset and I can almost guarantee that, shortly after its people, data will be one of the most common responses you hear.

The ability of a business to convert data into actionable insight – regardless of its sector, size, age, or the number of customers it serves – continues to be ever more important in maintaining competitive advantage, increasing customer satisfaction, and driving operational efficiency gains.

This recognition, coupled with the legal need to safeguard personal information, has helped to remind people of the importance of data security. We see this evidenced by cloud providers enabling more secure configurations “by design” and offering ever more sophisticated security features at more affordable price points than ever before. Take AWS, for example, which from April 2023 will enable Block Public Access settings by default on all new S3 buckets.
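For buckets created before that change, the same protection can be applied explicitly. As a minimal sketch using the AWS CLI (the bucket name here is hypothetical):

    # Block all forms of public access on an existing bucket
    aws s3api put-public-access-block \
      --bucket my-example-bucket \
      --public-access-block-configuration \
        BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

    # Confirm the configuration took effect
    aws s3api get-public-access-block --bucket my-example-bucket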

 

Risk and mitigation

Data is increasingly acknowledged as “business gold”, and the digital vaults in which it is stored continue to grow stronger.

However, a vault built from titanium will open just as easily as one made of tin if you have its key. In data terms, when working with cloud technologies, physical keys are replaced with digital access tokens (be it for AWS, Azure, Google Cloud, or just about anything else). If you expose your access tokens you run a significant risk of exposing your data.

I choose the example of access tokens intentionally, although the same applies to several other forms of secret. Anyone who has read the marketing materials for technologies such as Azure Key Vault or AWS Secrets Manager might quickly conclude that they solve just about any credential-handling problem you could imagine. Whilst their use is best practice – credentials no longer need to be stored in plaintext, as they are instead fetched on demand over REST – you still need to authenticate against these services’ APIs. And to do that, you will need an access token.
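To make this chicken-and-egg problem concrete, here is a minimal sketch using the AWS CLI (the secret name is hypothetical). Even this one-liner only works because the CLI can find credentials in its environment:

    # Fetch a secret on demand from AWS Secrets Manager...
    aws secretsmanager get-secret-value \
      --secret-id prod/app/db-password \
      --query SecretString --output text

    # ...but the call itself is signed with whatever credentials are configured,
    # e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. Expose those, and the
    # vault opens just as easily for an attacker.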

Multi-factor Authentication (MFA) is an excellent choice for mitigating the threat of compromise, but concerningly, studies as recent as November 2022 suggest a mere 26% of enterprise cloud users from the sampled audience are taking advantage of the security benefits it brings. With such a low adoption rate for a relatively simple “switch-on” security feature, more intricate measures – such as carefully crafted identity policies, or short-lived tokens combined with role-based (or attribute-based) access controls – are probably even scarcer.
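For the avoidance of doubt about what “short-lived tokens” means in practice, here is a minimal sketch using the AWS CLI (the role ARN and session name are hypothetical):

    # Exchange your long-lived identity for temporary, role-scoped credentials
    aws sts assume-role \
      --role-arn arn:aws:iam::123456789012:role/ReadOnlyAnalyst \
      --role-session-name audit-session \
      --duration-seconds 3600

    # The response contains an AccessKeyId, SecretAccessKey and SessionToken
    # that expire after an hour, limiting the blast radius if they ever leak.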

Back in 2019, a study at NCSU by Meli et al. highlighted that “thousands of new, unique secrets are leaked every day” in public GitHub repositories. If you’re one of the 74% of enterprise cloud users not implementing MFA as a secondary layer of protection, this poses a very real threat to your data security. And given that engineers know how sensitive these credentials can be, how are they still ending up in source code?

The shortest possible answer I’ve found to this question is just three words long… “git add dot”. Whilst I strongly discourage use of this command – even the best .gitignore files can’t catch everything – you will struggle to find an engineer who has not at some point used it to reduce the time needed to prepare commit contents. The command recursively stages every change in the current directory and below in one fell swoop: newly created files, updates, removals, and everything in between. Compounding the risk, unless you pass the “--untracked-files=all” flag when running a “git status” check, untracked directories are collapsed to a single folder name in the printout – you are not shown the individual files inside them. That “temp.txt” file an engineer put their access keys in whilst working through a series of complex CLI commands might never have been intended for source control… but “git add . [dot]” doesn’t care!
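To make the failure mode concrete, here is a minimal sketch (the folder and file names are hypothetical):

    # An engineer stashes keys in a scratch file inside a new, untracked folder
    mkdir scratch
    echo "AWS_SECRET_ACCESS_KEY=..." > scratch/temp.txt

    # The default status output collapses the folder to a single entry: "scratch/"
    git status

    # Only this mode lists the individual file lurking inside
    git status --untracked-files=all

    # ...but one sweeping command stages it regardless
    git add .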

 

How can development teams protect themselves?

I’m pleased to say several advances have been made in this space over recent years. In December 2022, GitHub made its secret scanning feature – which previously sat behind an enterprise subscription paywall – free for all public repositories. That said, whilst this early Christmas present might have brought a smile to the face of any security-conscious developer, an enterprise subscription is still needed if you want to keep your private repositories secret-free. I would also argue that, as generous as this might be, only finding out your credentials have been exposed once they are out in the wild is a little too late! To underscore this argument: research published in late 2020 suggests it can take as little as one minute for hackers to begin using exposed credentials in exploit attempts.

Fortunately, there are several tools that can help stop credentials going into source code in the first place. Snyk, for example, supports proactively checking for credentials in source code from your local machine via its CLI, as part of its wider ecosystem. Alongside commercial offerings, there are open-source alternatives such as Gitleaks, GittyLeaks, git-secrets, TruffleHog, and Whispers, to name just a few.
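As a flavour of how lightweight these tools can be, here is a minimal sketch using Gitleaks (v8-style commands; treat the exact flags as an assumption to verify against current documentation):

    # Scan the repository's history and working tree for anything secret-shaped
    gitleaks detect --source . --verbose

    # Scan only the currently staged changes, before they ever become a commit
    gitleaks protect --staged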

At Informed, we use open-source a lot and have made use of TruffleHog OSS on several recent projects as part of our toolchain, alongside other complementary provider-specific tools. After reaching out to members of our Service Engineering Community of Practice to gather feedback from our engineers (it was important that everyone be involved in the selection process, as they will be the ones using the tools), we felt this choice was the best fit for our use-case. At the risk of sounding like a footnote to a financial or legal newsletter – please be aware this does not represent formal advice – it’s simply what works for us!

In April 2022, TruffleHog overhauled its implementation with a V3 redesign. Among other improvements, a look at the tool’s source code suggests the maintainers have taken the regex recommendations made by Meli et al. to heart. Combined with its Docker container support, and with no need for licence keys or a server-side hub, we feel it provides a simple, portable solution.

Owing in large part to its container support, the tool can run both locally and in serverless CI contexts such as GitHub Actions. The validation feature – which tests suspected credentials against 700+ provider types to see whether they are genuine, rather than something that merely looks like a credential – also goes a long way towards preventing “alarm fatigue” by filtering out false positives.
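By way of illustration, here is a minimal sketch of a local scan using the public Docker image (the flag names follow the V3 CLI at the time of writing; verify them against current documentation):

    # Scan the full git history of the repository mounted at /repo
    docker run --rm -v "$PWD:/repo" trufflesecurity/trufflehog:latest \
      git file:///repo

    # Restrict output to findings whose credentials validated against the
    # real provider, filtering out look-alike false positives
    docker run --rm -v "$PWD:/repo" trufflesecurity/trufflehog:latest \
      git file:///repo --only-verified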

Finally, by binding the tool’s CLI to a git pre-commit hook, so that a scan runs over staged files each time a commit is made, it becomes straightforward to deploy a multi-layered credential scanning solution that runs both on engineers’ local machines and in CI pipelines. This prevents secrets from entering source control in the first place, and double-checks pushed commits using CI. It’s always nice to have a fallback!
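A minimal sketch of such a hook is below (the --since-commit and --fail flags follow TruffleHog’s documented pre-commit pattern, but treat the exact invocation as an assumption to verify against current docs):

    #!/bin/sh
    # .git/hooks/pre-commit -- scan what is about to be committed for secrets
    docker run --rm -v "$PWD:/repo" trufflesecurity/trufflehog:latest \
      git file:///repo --since-commit HEAD --only-verified --fail
    # --fail makes TruffleHog exit non-zero on findings, which aborts the commit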

I’m not saying these types of automated tools are a replacement for understanding and practising secure software development, but as an engineer myself, it’s nice to know there’s a safety net there just in case! As the saying goes: “I’d rather have it and not need it, than need it and not have it”.

