As we talk about the different parts of the ecosystem that people hack, data is, of course, our primary concern.
One of the things that has evolved over time is that we now have huge collections of data. These collections may be in traditional SQL databases, in buckets or SAN storage and flat files, or in NoSQL environments, but it doesn't really matter where they live: what we have are huge collections, and these huge collections are rarely segmented.
This means that once you get in, you have access to everything. We should at least take the contacts collection, for example, and store it separately from the transactions collection. Then those different collections should have different security rules.
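As a rough illustration of what "different security rules" per collection could look like, here is a minimal sketch. The role names (`support`, `billing`), the `POLICIES` table, and the `is_allowed` helper are all hypothetical, made up for this example rather than taken from any particular product.

```python
from dataclasses import dataclass

# Hypothetical policy table: each collection carries its own rules,
# so access to one collection implies nothing about another.
POLICIES = {
    "contacts": {"support": {"read"}, "billing": set()},
    "transactions": {"support": set(), "billing": {"read", "write"}},
}


@dataclass(frozen=True)
class Principal:
    name: str
    role: str


def is_allowed(principal: Principal, collection: str, action: str) -> bool:
    """Deny by default: unknown collections, roles, or actions are refused."""
    allowed_actions = POLICIES.get(collection, {}).get(principal.role, set())
    return action in allowed_actions


alice = Principal(name="alice", role="support")
print(is_allowed(alice, "contacts", "read"))      # True
print(is_allowed(alice, "transactions", "read"))  # False
```

The point of the deny-by-default lookup is that getting into the contacts collection gives you nothing in the transactions collection unless a rule says so explicitly.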
The other problem with the proliferation of data storage tools is that there are many different ways to exploit that data. Some of these tools come with UIs or APIs that don't get secured in the same way that we secure our primary access to the data.
Then we find odd backups, or sometimes verbose logging that actually creates a copy of some of the data as it's accessed. And, of course, legacy systems frequently have back doors that need to be locked down and secured, but sometimes these back doors are forgotten about, or there's some reason that keeps them from being fixed.
Of course, you can say that the data is all encrypted, but my feeling is that encryption is a myth. Let's start with encryption at rest.
Encryption at rest is where you take the physical file system and you encrypt that file system. So, if somebody gets onto that operating system and steals a file, that file is encrypted and they can't read it.
However, in the day and age of managed services, with things like S3 buckets or DynamoDB in places like Amazon, we never see the operating system. The only way we ever access that data is with a key pair that has access, and when we have access, we get access to everything.
This is because we don't encrypt individual records. The same key pair that gets us into the data system decrypts everything that we said was encrypted, so it now has access to... everything.
Again, if we have a contacts collection, I shouldn't be able to read every single contact with one key pair, a single key pair that has the ability to decrypt everything. This kind of super key just leads to obvious abuse. There should be separate keys with different access depending on context, and you need to be careful about how you dole those keys out, again tying those keys to clear identity.
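One way to picture "separate keys depending on context" is key derivation: instead of handing out one super key, you derive a different key for each record and each identity. The sketch below uses HKDF with SHA-256 (RFC 5869, reduced to a single output block) built from Python's standard `hmac` and `hashlib` modules. The context string format and the `derive_key` helper are assumptions made for illustration, and the sketch only derives keys; it does not encrypt anything or manage how keys are distributed.

```python
import hashlib
import hmac


def derive_key(master_key: bytes, context: str) -> bytes:
    """Derive a 32-byte key bound to one context (HKDF-SHA256, single block)."""
    # Extract step: with no salt, RFC 5869 uses HashLen zero bytes as the salt.
    prk = hmac.new(b"\x00" * 32, master_key, hashlib.sha256).digest()
    # Expand step: one block is enough for a 32-byte output.
    return hmac.new(prk, context.encode("utf-8") + b"\x01", hashlib.sha256).digest()


# Hypothetical master secret; in practice it would stay inside a key
# management service and never be handed to applications directly.
master = b"example-master-secret-do-not-use"

# Hypothetical context format: collection / record / identity.
key_a = derive_key(master, "contacts/record/1001/reader/alice")
key_b = derive_key(master, "contacts/record/1002/reader/alice")

print(key_a != key_b)  # True
print(len(key_a))      # 32
```

Because the context includes both the record and the identity, a key that leaks unlocks one record for one person rather than the whole collection.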
Then there's how you actually dole out credentials for the database. When an application accesses a database, it's usually given a service account, and those service accounts usually have a lot of access.
In fact, the database isn't actually 'user aware,' meaning if I'm an individual logging in through an application to get access to data, it's the service account that is actually getting that data on my behalf. Now, with good API structure it shouldn't be able to do anything that the user shouldn't be able to do, but if those service keys are ever exposed, you have a big problem.
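Here is a minimal sketch of what "good API structure" can mean in practice: the service account still owns the database connection, but every query the API runs is scoped to the authenticated end user. It uses Python's built-in `sqlite3` module with an in-memory database; the table layout, the `owner` column, and the `contacts_for` function are hypothetical names chosen for the example.

```python
import sqlite3

# The connection stands in for the service account's access to the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, owner TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO contacts (owner, email) VALUES (?, ?)",
    [
        ("alice", "a1@example.com"),
        ("alice", "a2@example.com"),
        ("bob", "b1@example.com"),
    ],
)


def contacts_for(conn: sqlite3.Connection, user_id: str) -> list[str]:
    """Return only the rows owned by the authenticated end user."""
    if not user_id:
        raise PermissionError("an end-user identity is required")
    # Parameterized query: the user's identity is always part of the filter.
    rows = conn.execute(
        "SELECT email FROM contacts WHERE owner = ? ORDER BY id", (user_id,)
    )
    return [email for (email,) in rows]


print(contacts_for(conn, "bob"))  # ['b1@example.com']
```

Note what this does not solve: anyone holding the service account's credentials can skip `contacts_for` entirely and query the table directly, which is exactly the exposure described above.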
Finally, administrators should not be gods.
A lot of the time when we see hacks, it's because of a disgruntled employee who had a lot of access, or because an employee's credentials were somehow compromised. We shouldn't be able to see everything... I honestly don't want to see everything, and I have often had to ask my engineers to restrict my own access.
Adding identity and security policies to databases that are being consumed by outside services can be tricky, but it's getting more and more expensive, and more and more critical, that we do this as standard practice, not as a patch or a fix.