Scared of analyzing personal data and accidentally breaking the law? It’s simple to avoid when you know how. We’ve got the lowdown.
The General Data Protection Regulation (GDPR)
The GDPR, which has applied since May 2018, regulates the processing of personal data. We call data that contains personal information ‘privacy-sensitive data’. Its main characteristic is that it can be traced to a person. It’s good that there are rules about how we should deal with personal data, because organizations collect a lot of data and should think about how to handle it responsibly.
The disadvantages of the GDPR
There’s a downside to the GDPR. Several companies have closed since this legislation came into force. For data analysis, processing sensitive data incorrectly has consequences. That’s a shame, because most of the time we’re not interested in the personal information itself, but in the patterns behind it. We don't need to know that Jan Jansen has seen every episode of Game of Thrones. What we’re interested in is that the people who have watched every episode of Game of Thrones are generally men aged 50-60 living in Drenthe.
We don’t need any traceable data at all. We only want to keep the information that describes an individual, without the features that identify them. That’s why we adjust all data that can be traced to a person, such as BSN (the Dutch citizen service number), name, address and date of birth, so that it is no longer traceable. There are various techniques for this, such as data aggregation, pseudonymization and anonymization.
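As a minimal sketch of pseudonymization, the Python snippet below replaces a name with a keyed hash (HMAC-SHA256), so that records belonging to the same person can still be linked without the name itself appearing in the dataset. The key, the field names and the sample records are all made up for illustration; this is one possible approach under those assumptions, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept separate from the data.
SECRET_KEY = b"replace-with-a-secret-kept-by-the-data-owner"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    # Normalize so that " jan jansen " and "Jan Jansen" get the same pseudonym.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    # Shorten the hex digest for readability; 16 hex characters is an arbitrary choice.
    return digest.hexdigest()[:16]


# Made-up sample records, purely for illustration.
records = [
    {"name": "Jan Jansen", "series": "Game of Thrones"},
    {"name": "Jan Jansen", "series": "Another series"},
]

# Replace the name with a pseudonym and keep the rest of the record.
pseudonymized = [
    {"person_id": pseudonymize(r["name"]), "series": r["series"]}
    for r in records
]
```

Both output records carry the same `person_id`, so they can still be linked. Whoever holds the key can recompute the mapping, which is why the key should stay with the party that owns the data.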
How should you depersonalize data?
Protect your data by figuring out which information is traceable and what you can do to make it non-traceable. Personal data isn’t vital to complete an analysis, and depersonalizing it means you can avoid breaking the law when you analyze the data or share it with a third party for analysis.
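The first step, removing fields that identify a person directly, could look roughly like this in Python. The field names and the sample record are assumptions chosen for illustration, based on the kinds of identifiers mentioned earlier (BSN, name, address, date of birth).

```python
# Assumed field names for directly identifying data; adjust to your own dataset.
DIRECT_IDENTIFIERS = frozenset({"bsn", "name", "address", "date_of_birth"})


def strip_identifiers(record: dict, identifiers=DIRECT_IDENTIFIERS) -> dict:
    """Return a copy of the record without the directly identifying fields."""
    return {
        field: value
        for field, value in record.items()
        if field.lower() not in identifiers
    }


# Made-up sample record, purely for illustration.
record = {
    "name": "Jan Jansen",
    "date_of_birth": "1968-03-14",
    "province": "Drenthe",
    "watched_all_episodes": True,
}

shared = strip_identifiers(record)
# shared now holds only "province" and "watched_all_episodes".
```

Dropping these fields is only a starting point. The fields that remain can sometimes still single out a person when combined, so they may need further treatment as well.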
Aggregation depersonalizes data by grouping values into categories, so that a record describes a group rather than one person. You can do this by replacing an exact age with an age category, or by shortening postcodes to only part of the code, for example. Pseudonymization works differently: it replaces identifying values with pseudonyms. You could also make your data fully anonymous, but then you can no longer link records that belong to the same person, which isn’t ideal for analysis.
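A rough Python sketch of this kind of categorizing is shown below. It assumes ten-year age bands and Dutch postcodes in the form of four digits followed by two letters, of which only the four-digit area part is kept. The function names, band width and sample values are all illustrative choices rather than fixed rules.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '50-59'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def coarsen_postcode(postcode: str) -> str:
    """Keep only the four-digit area part of a postcode such as '9401 AB'."""
    compact = postcode.replace(" ", "").upper()
    if len(compact) != 6 or not compact[:4].isdigit() or not compact[4:].isalpha():
        raise ValueError(f"unexpected postcode format: {postcode!r}")
    return compact[:4]


# Made-up sample data, purely for illustration.
people = [
    {"age": 52, "postcode": "9401 AB", "gender": "m"},
    {"age": 58, "postcode": "9401 XZ", "gender": "m"},
    {"age": 34, "postcode": "9711 LM", "gender": "f"},
]

# Count how many people fall into each (age band, postcode area, gender) group.
groups = Counter(
    (age_band(p["age"]), coarsen_postcode(p["postcode"]), p["gender"])
    for p in people
)
```

The analysis can then work with group counts rather than individual rows. Wider bands and shorter postcodes make individuals harder to trace, at the cost of less detail.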
What do data scientists do with personal data?
As engineers and scientists, we make sure that we never receive directly identifying data in the first place. We ask customers for pseudonymized data, which lowers their risk of breaking the law. It’s important to think about safeguarding privacy with all of the parties involved. It’s a shared responsibility, and companies should involve their privacy officer to make sure that the right protocol is followed.
We spoke with Privacy Praat in more detail about the GDPR and data.