A deep dive into databases protecting confidentiality
Privacy-protecting databases use a number of techniques to protect data. The complexity of these techniques has evolved as the threats to data privacy have increased dramatically.
The easiest way to protect individuals’ records in a database may be to replace names with numeric aliases and store the mapping between aliases and names in a separate database. Researchers receive only the first database, and the pseudonyms relieve them of the obligation to protect people’s real names. The database with the real names can be stored in a second, more carefully protected location, or even deleted completely.
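The split described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the function name `pseudonymize`, the field names, and the sample records are all hypothetical.

```python
import secrets

def pseudonymize(records, name_field="name"):
    """Split records into a research table and a separate alias-to-name table."""
    research_rows = []
    alias_to_name = {}
    for record in records:
        # Draw a random numeric alias; retry on the (unlikely) collision.
        alias = secrets.randbelow(10**12)
        while alias in alias_to_name:
            alias = secrets.randbelow(10**12)
        alias_to_name[alias] = record[name_field]
        # Copy every field except the real name into the research row.
        row = {k: v for k, v in record.items() if k != name_field}
        row["alias"] = alias
        research_rows.append(row)
    return research_rows, alias_to_name

# Hypothetical sample data.
patients = [
    {"name": "Alice Example", "age": 34, "diagnosis": "asthma"},
    {"name": "Bob Example", "age": 51, "diagnosis": "diabetes"},
]
research, key_table = pseudonymize(patients)
print(research)  # safe to hand to researchers: aliases only, no names
```

Only `research` leaves the building. `key_table` is the second database, which can be locked away or discarded.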
More sophisticated approaches use encryption or a one-way function to calculate the pseudonym. This can give users the option of retrieving their information from the database by recomputing the pseudonym. But anyone else who accesses the database cannot easily match the records with names. My well-aged book, Translucent Databases, explored a number of different approaches in this regard, and many innovations have been made since then.
Some of the more complicated solutions are called “homomorphic encryption”. In these systems, sensitive information is fully encrypted, but complex algorithms are specially designed to allow some basic operations without decryption. For example, some computers may add a list of numbers from an accounting database without being able to decrypt the associated protected values.
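To make the accounting example concrete, here is a toy version of the Paillier cryptosystem, one well-known additively homomorphic scheme. It is a sketch only: the primes are far too small to be secure, the amounts are made up, and the article does not say which scheme any particular product uses.

```python
import math
import secrets

# Toy primes for illustration only; real keys use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+)

def encrypt(m):
    """Encrypt an integer 0 <= m < N using the generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    """Recover the plaintext; requires the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ

# Hypothetical ledger amounts; the untrusted machine sees only ciphertexts.
amounts = [1200, 350, 4780]
ciphertexts = [encrypt(a) for a in amounts]

encrypted_total = encrypt(0)
for c in ciphertexts:
    encrypted_total = add_encrypted(encrypted_total, c)

print(decrypt(encrypted_total) == sum(amounts))
```

The machine doing the addition needs only `N`, not `LAM` or `MU`, so it can total the column without being able to read any entry.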
Homomorphic encryption is far from mature. Most early systems require too much computation to be practical, especially for large databases with many entries. They often require that the encryption algorithms be customized for the data analysis that will be run on the data. Still, mathematicians are doing exciting work in the field, and many recent innovations have dramatically reduced the workload involved.
In recent years, researchers have started to seriously explore how adding spurious entries or changing values with random noise can make it harder to identify individuals in a database, a technique called “differential privacy.” If the noise is mixed in properly, it largely cancels out when calculating some aggregate statistics, like averages.
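The cancelling effect is easy to see in a small simulation. The sketch below adds zero-mean Laplace noise to each value and compares the averages. The salary figures and the noise scale of 5,000 are arbitrary assumptions; a formal differential privacy mechanism would calibrate the noise to a chosen privacy budget rather than pick a scale by hand.

```python
import random

def laplace_noise(rng, scale):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

# Fixed seed only so the sketch is repeatable; a real system would not
# use a predictable generator.
rng = random.Random(2021)

# Hypothetical salary column with 10,000 entries.
salaries = [rng.randint(30_000, 120_000) for _ in range(10_000)]

# Each stored value is distorted, so no single row can be trusted.
noisy = [s + laplace_noise(rng, 5_000) for s in salaries]

true_mean = sum(salaries) / len(salaries)
noisy_mean = sum(noisy) / len(noisy)
print(f"true mean:  {true_mean:.1f}")
print(f"noisy mean: {noisy_mean:.1f}")
```

Individual rows are typically off by thousands, while the two averages stay close because the positive and negative distortions offset each other.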
What are the use cases?
- Save time and money on security by deleting the most valuable data. A local version of the database stored in a branch office can remove names to eliminate the risk of loss. The central database can keep full compliance records in a more secure building.
- Share data with researchers. If a business or school wishes to cooperate with a research program, it can submit a version of the database with personal information hidden while retaining a full version in case it becomes necessary to discover the correct name associated with a record.
- Encourage compliance with record keeping rules while preserving client confidentiality.
- Provide strategic protection for military operations while sharing sufficient data with allies for planning.
- Build a trading system designed to minimize the danger of insider trading while continuing to track all transactions for compliance and settlement.
- Build a fraud detection accounting system that balances disclosure with confidentiality.
Supplier approaches to encryption
Established database makers have long experimented with encryption algorithms that scramble the data in particular rows and columns so that it can be viewed only by someone with the right access key. These encryption algorithms can protect privacy, but many privacy protection approaches try to avoid encrypting everything. The goal is to balance secrecy and sharing, protecting private information while revealing non-private information to researchers.
Encryption algorithms are often used as a component of this strategy. Personal information, such as names and addresses, is encrypted, and the key to this encryption algorithm is only kept by trusted insiders. Other users have access to unencrypted sections.
One common technique is to use a one-way function such as the SHA-256 hash algorithm to create keys for particular records. Anyone can store and retrieve their own personal information because they can calculate the record key by hashing their name, for example. But attackers who browse the data cannot reverse the one-way function to retrieve the name.
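A minimal sketch of this idea using Python's standard `hashlib` module follows. The in-memory `store` dictionary stands in for a real database table, and the helper names (`record_key`, `save`, `load`) and the normalization rule are assumptions made for illustration.

```python
import hashlib

def record_key(name):
    """Derive a record key from a name with the one-way SHA-256 function."""
    # Normalize case and spacing so the same person always gets the same key.
    normalized = " ".join(name.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Stand-in for a database table: only hashed keys are stored, never names.
store = {}

def save(name, data):
    store[record_key(name)] = data

def load(name):
    return store.get(record_key(name))

save("Alice Example", {"balance": 1250})

print(load("alice  example"))      # the owner can recompute the key
print("Alice Example" in store)    # the name itself is not in the table
```

One caveat: names are guessable, so an attacker with a list of likely names could hash each one and compare. Mixing in a secret that only the user knows, such as a passphrase, makes that kind of guessing much harder.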
Lately, some approaches do not rely on encryption, at least directly. Sometimes fake data is mixed into the database, and other times the actual data values are slightly distorted. Identifying the records of individual people becomes difficult because of the noise.
Some companies are expanding their product lines with libraries that add differential privacy to data collections. Google recently open-sourced its internal tool called Privacy on Beam, a collection of libraries written in C++, Go and Java. Users can inject noise before or after storing information in a Google Cloud database.
Microsoft also recently released a differential privacy toolkit that was developed in collaboration with computer scientists at Harvard. The team demonstrated how the tool can be used for a variety of use cases, such as sharing a dataset used for training an artificial intelligence application or calculating statistics used for planning marketing campaigns.
Oracle has also explored the use of algorithms to help protect interactions with researchers training a machine learning algorithm. A recent use case explores combining differential privacy algorithms with federated learning that works with a distributed database.
Is open source a way forward?
Many of the early explorers of differential privacy worked together on an open source project called OpenDP. It aims to create a diverse collection of algorithms sharing a common framework and data structure. Users will be able to combine multiple algorithms and build a layered approach to protect data.
Another approach focuses on auditing and resolving data issues. The Privacera platform’s suite of tools can search files to identify and hide personally identifiable information (PII). It deploys a set of machine learning techniques, and the tools are integrated with cloud APIs to simplify deployment across multiple clouds and vendors.
For more than a decade, IBM has offered homomorphic encryption. The company offers toolkits for Linux, iOS, and macOS for developers who want to incorporate homomorphic encryption into their software. The company also offers consulting services and a cloud environment to store and process data securely.
Is there something privacy-protecting databases can’t do?
The math behind these techniques is often flawless, but systems can have many other weak links. Even when the algorithms have no known weaknesses, attackers can sometimes find vulnerabilities elsewhere.
In some cases, bad actors simply attack the operating system. In others, they attack the communication layer. Some sophisticated attacks combine information from multiple sources to reconstruct the data hidden inside.
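The last kind of attack can be surprisingly simple. The sketch below shows one way it might work: a pseudonymized table is joined with a public list on fields that both happen to share. All of the data, the field names, and the `link` function are hypothetical.

```python
def link(released, public, keys=("zip", "birth_year", "sex")):
    """Match pseudonymized rows to names using fields both sources share."""
    # Index the public list by the shared fields.
    index = {}
    for person in public:
        signature = tuple(person[k] for k in keys)
        index.setdefault(signature, []).append(person["name"])
    # A released row is re-identified when exactly one person fits it.
    matches = {}
    for row in released:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            matches[row["alias"]] = names[0]
    return matches

# Hypothetical pseudonymized research table.
released = [
    {"alias": 1001, "zip": "02139", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"alias": 1002, "zip": "02139", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"alias": 1003, "zip": "94110", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public list, such as a voter roll.
public = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1970, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1985, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1985, "sex": "M"},
]

print(link(released, public))
```

Alias 1001 is exposed because only one person fits its combination of fields. Alias 1002 stays ambiguous because two people share its combination, which hints at why coarsening or distorting such fields helps.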
But the use of data privacy protection techniques continues to provide another layer of assurance that can simplify compliance. It can also enable types of collaboration that would not be possible without it.