Keeping Sensitive Data Out of Your Logs
Protecting your users' PII and PHI requires an intentional approach and a data privacy vault with an API.
If your organization builds or maintains an application stack that processes customer data, then you know that protecting sensitive data like personally identifiable information (PII) and protected health information (PHI) is essential to keeping your business running and keeping your customers’ trust. You might think it’s enough to store PII and PHI securely and expose it only in the proper contexts through your system’s UI and APIs. However, some of the biggest breaches of customer data have happened because sensitive data found its way into poorly secured logs.
How do you ensure that PII and PHI stay out of your logs? In this article, we’ll talk about how to isolate this sensitive data and which practices will assure your customers that they can trust you to protect their data.
Why Businesses Must Isolate PII and PHI
Before we get into the “how” of isolating PII and keeping your logs clean, let’s briefly touch on why it’s important to give customer data special treatment. After all, customers share their data with us all the time. If they’re so willing to hand it over, then why do we need to treat it with special care?
Customer Expectations
Your customers reasonably assume that if they share their data directly with your business, you’ll use it only for the business purposes they expect and have authorized. Unless you’ve explicitly told them that their data will be shared, they expect you to treat it with extra caution, as if it carries a “handle with care” label.
Data Privacy Regulations and Compliance
Depending on where your customers are located, your business may also be required to comply with certain data privacy regulations. In the EU, you’re subject to GDPR. In the United States, consumer privacy laws in several states (including California, Colorado, and Connecticut) place the onus of protecting customer data squarely on the businesses that collect the data.
Different from Transactional Data
Another important reason to handle PII with care is that it’s fundamentally different from other transactional application data. Plenty of data is generated during a user’s typical interaction with your application, but data that can personally identify your customer can’t simply be changed on a whim the way a compromised password can. Some PII, like a birth date, can’t be changed at all! Many kinds of PII, if leaked to malicious actors, can have a drastic impact on people’s lives, resulting in identity theft and other kinds of fraud.
For both customer trust and legal responsibility, businesses must isolate PII and treat it with special care.
How to Isolate PII and PHI
We’ve laid the ground rules: businesses must treat PII and PHI with special care. Before we can cover what you should do to ensure PII and PHI don’t wind up in your logs, let’s look at the key considerations for how to isolate PII and PHI when collecting and storing it.
Do You Have a Reason to Store It?
For any data you gather from your customers, you should have a good reason for storing it. If you don’t have a reason to keep data on your customers, then don’t keep it. In brief: don’t need it? Then don’t store it.
You might even think that a certain piece of customer information is benign. However, if it’s something that can be tied back to a specific customer and you have no real use for it, then it's simply safer not to keep it around.
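One way to put “don’t need it? Then don’t store it” into practice is to allowlist the fields you have a documented use for and drop everything else before it is persisted. The following is a minimal Python sketch; `ALLOWED_SIGNUP_FIELDS`, `minimize`, and the field names are hypothetical.

```python
# Hypothetical allowlist: the only signup fields this example app has a documented use for.
ALLOWED_SIGNUP_FIELDS = {"email", "display_name", "country"}


def minimize(payload: dict) -> dict:
    """Return a copy of payload containing only allowlisted fields.

    Anything not on the allowlist (birthday, phone, IP address, ...) is
    dropped before it reaches storage, so it can never leak from there.
    """
    return {key: value for key, value in payload.items() if key in ALLOWED_SIGNUP_FIELDS}


incoming = {
    "email": "ana@example.com",
    "display_name": "Ana",
    "country": "PT",
    "birthday": "1990-04-01",
    "client_ip": "203.0.113.7",
}
stored = minimize(incoming)
print(sorted(stored))  # prints ['country', 'display_name', 'email']
```

An allowlist fails closed: a new field added to the form is discarded until someone decides it is worth keeping, whereas a blocklist would silently start storing it.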
What Kind of Data Is It?
Even if you’re not sure whether a piece of data is sensitive enough to warrant isolation, you should still think through the types of data you’re storing. The following are typical examples of PII:
- Full names
- Physical addresses
- Email addresses
- Official identification numbers (for example, from driver’s licenses or passports)
- Phone numbers
However, the above examples aren’t the only types of data that merit special attention and control. Think through what sorts of financial or healthcare information you might keep in your system. Passwords also need to be kept secure, and you should be very careful with the IP addresses of your customers’ devices. Again, if you don’t need any of this information, you’re best off not storing it.
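A lightweight way to work through these categories is to keep a catalog that classifies every field you store. The Python sketch below is hypothetical (the `Sensitivity` enum, the field names, and `needs_isolation` are all illustrative) and makes one deliberate design choice: fields nobody has classified yet are treated as sensitive until someone reviews them.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"          # safe to store and log as-is
    PII = "pii"                # names, addresses, emails, phone numbers, ID numbers
    PHI = "phi"                # health information
    FINANCIAL = "financial"    # card or bank details
    CREDENTIAL = "credential"  # passwords, API keys
    DEVICE = "device"          # IP addresses, device identifiers


# Hypothetical field catalog for an example "customers" model.
FIELD_CLASSIFICATION = {
    "customer_id": Sensitivity.PUBLIC,
    "full_name": Sensitivity.PII,
    "email": Sensitivity.PII,
    "passport_number": Sensitivity.PII,
    "diagnosis_code": Sensitivity.PHI,
    "card_number": Sensitivity.FINANCIAL,
    "password": Sensitivity.CREDENTIAL,
    "last_login_ip": Sensitivity.DEVICE,
}


def needs_isolation(field: str) -> bool:
    """Unclassified fields default to PII, so they are isolated until reviewed."""
    return FIELD_CLASSIFICATION.get(field, Sensitivity.PII) is not Sensitivity.PUBLIC


print([f for f in FIELD_CLASSIFICATION if not needs_isolation(f)])  # prints ['customer_id']
```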
When You Do Need to Store It, Use a Data Privacy Vault
After thinking through the above questions, you’ll likely conclude that you still have a legitimate need to store some PII and PHI. That doesn’t mean you should just dump it into your main application database. Instead, use a data privacy vault to keep your customer data isolated from everything else in your system. Using a specialized storage solution for these particularly sensitive pieces of information makes it easier to keep your logs clean and reduces your risk of a serious data breach.
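To make the idea of isolation concrete, here is a toy, in-memory model of the contract a vault provides: raw values go in, opaque tokens come out, and reading a value back requires an approved purpose. `ToyVault`, `store`, `reveal`, and the purpose names are hypothetical and are not any vendor’s API; a real vault is a separate, hardened service with its own storage, encryption, and auditing.

```python
import secrets


class ToyVault:
    """In-memory stand-in for a data privacy vault (illustration only)."""

    def __init__(self, allowed_purposes: set[str]) -> None:
        self._values: dict[str, str] = {}  # token -> raw value, never leaves the vault
        self._allowed_purposes = allowed_purposes

    def store(self, value: str) -> str:
        """Keep the raw value and hand back a random, meaningless token."""
        token = "tok_" + secrets.token_urlsafe(16)
        self._values[token] = value
        return token

    def reveal(self, token: str, purpose: str) -> str:
        """Return the raw value only for an explicitly approved purpose."""
        if purpose not in self._allowed_purposes:
            raise PermissionError(f"purpose {purpose!r} may not read vault data")
        return self._values[token]


vault = ToyVault(allowed_purposes={"send_receipt"})

# The main application database keeps only the token, never the raw email.
customer_row = {"id": 42, "email_token": vault.store("ana@example.com")}
```

With this split, `vault.reveal(customer_row["email_token"], "send_receipt")` returns the address, while the same call with a purpose such as `"analytics"` raises `PermissionError`, and a dump of the application table exposes only tokens.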
Keeping PII and PHI Out of Your Logs
Many developers use common-sense techniques to keep certain sensitive information out of logs, for example by filtering out passwords or IP addresses. These are no-brainers. But making sure that PII and PHI aren't logged is especially important: logs are one of the first targets an attacker will go after, in the hope of finding a treasure trove of PII or PHI.
Storing Encrypted Sensitive Data in Your Database Is Insufficient
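Filtering of this kind can be centralized rather than left to every individual log call. The following is a minimal sketch using Python’s standard `logging` module; `RedactingFilter` and the `REDACTIONS` list are hypothetical, and the regular expressions are deliberately simple assumptions that will miss some formats and occasionally over-match (a long order number can look like a phone number), so treat this as a backstop rather than a guarantee.

```python
import logging
import re

# Simple, illustrative patterns. Order matters: IPv4 runs before the phone
# pattern, which would otherwise also match dotted IP addresses.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]


class RedactingFilter(logging.Filter):
    """Scrubs emails, IPv4 addresses, and phone-like numbers from log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg with its %-style args first
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None
        return True  # keep the record; only its text changes


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)

logger.warning("Password reset requested by %s from %s", "ana@example.com", "203.0.113.7")
# stderr: Password reset requested by [EMAIL] from [IP]
```

Attaching the filter to the handler, rather than to a logger, means it also scrubs records that propagate up from child loggers.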
Even when businesses try to isolate sensitive data by storing it encrypted in a separate database, the unencrypted data tends to end up in log files. Many business operations need to decrypt PII or PHI in order to use it, and that is exactly when it tends to get logged, for example in a debug statement or an error message that includes the record being processed. Just like that, unencrypted sensitive data ends up in logs.
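One defensive pattern for the moment of decryption is to wrap the plaintext in a type that masks itself whenever it is formatted, so an accidental log call or f-string prints a placeholder instead of the data. This is a hypothetical Python sketch (`Sensitive` and `reveal()` are illustrative names); it does not stop someone from logging the result of `reveal()` directly, but it makes each deliberate access easy to find in code review.

```python
class Sensitive:
    """Holds a decrypted PII/PHI value but masks it whenever it is formatted.

    str(), repr(), f-strings, and %-style log arguments all go through
    __str__ or __repr__, so an accidental log call shows a placeholder.
    """

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """The one explicit, reviewable way to get the raw value."""
        return self._value

    def __str__(self) -> str:
        return "[REDACTED]"

    def __repr__(self) -> str:
        return "Sensitive('[REDACTED]')"


# e.g. the plaintext you get back right after decrypting a stored email
email = Sensitive("ana@example.com")
print(f"Charge failed for {email}")  # prints "Charge failed for [REDACTED]"
print([email])                       # prints "[Sensitive('[REDACTED]')]"
```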
Let’s take a look at how you can make sure your logs are properly scrubbed for sensitive data.
Tokenize, Tokenize, Tokenize
Whenever you have the opportunity to pass or store references to PII and PHI held in your data privacy vault instead of the actual data, you absolutely should. Tokenizing your sensitive data keeps the raw values from being written to log files. Whether in your logs, the URLs your customers visit, or the API calls clients can make, using tokenized references to PII and PHI helps keep this data safe and secure.
With PII and PHI properly tokenized, the worst that could happen is that your application inadvertently logs tokens, which are harmless on their own. As long as your data privacy vault remains secure and you’re only passing around tokens, no one can get to the raw sensitive data.
It’s important to remember that neglecting to tokenize sensitive data is one of the ways data most often gets leaked. Engineers may go to great lengths to encrypt data and secure their systems, but if the full values of sensitive PII and PHI data elements are written out to a log, then all of that work is circumvented.
Governance
Securely storing your data is only one part of the equation. It’s also essential to have well-defined and well-understood controls around how data can be used in your organization. This is where having an effective data governance model comes into play. With proper data governance in place, you make sure that the following questions are thought through and answered:
- How long will you keep sensitive data stored in your systems?
- When will it no longer be necessary to keep certain sensitive data elements?
- How does your company allow sensitive data to be used?
- What will your response be if certain types of sensitive data are exposed in a breach?
These may be difficult questions to answer, but keeping them in mind will help you build a more privacy-conscious team and be prepared in the event of an actual data breach.
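The first two governance questions can be turned into an enforceable rule. The sketch below assumes a hypothetical `RETENTION` table mapping data categories to retention periods (the periods are placeholders, not recommendations) and flags records that have outlived their policy, including records in categories that have no policy at all.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: how long each category may be kept.
# The real periods are a governance decision and may be set by regulation.
RETENTION = {
    "support_ticket_email": timedelta(days=365),
    "shipping_address": timedelta(days=90),
    "marketing_phone": timedelta(days=30),
}


def expired(records: list[dict], now: datetime) -> list[dict]:
    """Return records whose category's retention period has elapsed.

    Categories with no defined policy are returned too, so data nobody has
    justified keeping gets flagged instead of silently retained.
    """
    out = []
    for rec in records:
        limit = RETENTION.get(rec["category"])
        if limit is None or now - rec["stored_at"] > limit:
            out.append(rec)
    return out


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"token": "tok_a", "category": "shipping_address", "stored_at": now - timedelta(days=120)},
    {"token": "tok_b", "category": "support_ticket_email", "stored_at": now - timedelta(days=10)},
    {"token": "tok_c", "category": "birthday", "stored_at": now - timedelta(days=1)},
]
print([r["token"] for r in expired(records, now)])  # prints ['tok_a', 'tok_c']
```

A scheduled job could feed the flagged tokens to the vault’s deletion mechanism, which answers “when will it no longer be necessary” with something auditable rather than a policy document alone.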
Conclusion
In summary, I’d like to reiterate the importance of businesses treating customer PII and PHI with special care, isolating sensitive data from other application data for the sake of customer trust and legal compliance. Isolating sensitive data means understanding what kind of data it is and whether you even need to store it. When you have no choice but to store customer data, use a secure data privacy vault that facilitates tokenization and masking to prevent raw PII from ever being written to system logs. Finally, each business should put in place a data governance plan, setting up guardrails for how the business will handle customer data.
Among the data privacy vaults available for modern applications, Skyflow is worth considering. It offers all of the capabilities described above, and it offers a free trial so you can check out its data privacy API.