Data Science

Is “private” data private?

Humans are social beings who readily share their life experiences with one another. From the cave dwellers who expressed themselves through cave drawings to the current era of social media, people have always been eager to communicate. Books, letters, postcards, telephone calls, and other communication media seem dated next to the recent development of the digital universe. The past two decades have seen enormous growth in that digital universe, and the rapid increase in data has opened the door to new fields such as Big Data and Data Science.

Posting your vacation photos on social media, emailing a coworker about tasks to be completed, visiting a website, even subscribing to emails and newsletters: each adds your bit to the zettabytes of data already in the digital universe. We intend to communicate with a certain group of people, but are our intentions fulfilled? Or are we posting data to a larger system where unknown parties can access it?

Every sector is now digital, and our data is all over the internet: financial, medical, personal, educational, locational, retail and more. Most sites now demand a user login before we can use their services, and users tend to classify the data behind that login as private. For example, our email accounts, online shopping, banking, and social media tools like Facebook and Twitter are all governed by individual user credentials and preferences. Whether or not the posted information is private, the login credentials at least are considered private. However, this data is stored on servers far from your geographical area, in the databases that the application is built on. So, are we the only ones who possess our credentials? Probably not. Designated teams work on these servers and databases, frequently performing maintenance and other technical activities.

Sensitive information can of course be encrypted before it is stored in a database. But many companies avoid encrypting more data than they must, because encryption adds overhead: it can require extra storage and processing, it can slow down data access, and the keys have to be managed. So it is often only the user passwords (which are better stored as salted hashes than as reversible ciphertext) and highly sensitive records such as medical details that are protected in a system. But keep in mind that encrypted data is only as safe as its key: anyone who holds the key, whether an administrator or someone who steals it, can decrypt the data at will. And that is before we consider hackers trying to penetrate the system!
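To make the password case concrete, here is a minimal sketch of salted password hashing using only Python's standard library. It is an illustration, not a description of how any particular site stores passwords: the function names `hash_password` and `verify_password` and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration; real systems tune this to their hardware.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) derived with PBKDF2-HMAC-SHA256.

    A fresh random salt is generated unless one is supplied, so two users
    with the same password end up with different stored digests.
    """
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

With this approach the database holds only the salt and the digest. The password itself is never stored, so there is nothing to decrypt; a login attempt is checked by repeating the derivation and comparing the results.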

In most applications, database security is achieved by separating the web application server from the database server. The database server usually sits behind a firewall that allows only the web application server to reach it, and access to the database is often limited to a few user accounts dedicated to the web application. Database engines generally ship with reasonably secure default settings these days, but users of some databases, such as older versions of MongoDB, found that unauthenticated administrative access was enabled by default and reachable from the entire Internet. A cakewalk for hackers!
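The allow-list idea behind such a firewall can be sketched in a few lines of Python. This is only a model of the rule, not a real firewall configuration; the address `10.0.1.10` and the function names are hypothetical and chosen for illustration.

```python
import ipaddress

# Hypothetical address of the web application server (illustration only).
WEB_APP_SERVER = ipaddress.ip_address("10.0.1.10")
ALLOWED_SOURCES = {WEB_APP_SERVER}


def is_connection_allowed(source_ip: str) -> bool:
    """Allow a database connection only if it comes from an allow-listed host."""
    return ipaddress.ip_address(source_ip) in ALLOWED_SOURCES


def is_exposed_bind(bind_address: str) -> bool:
    """Return True if the database would listen on every network interface.

    Binding to the unspecified address (0.0.0.0 or ::) is what makes a
    server reachable from outside unless a firewall blocks it.
    """
    return ipaddress.ip_address(bind_address).is_unspecified
```

Under these assumptions, `is_connection_allowed("203.0.113.7")` is rejected because that address is not on the list, and `is_exposed_bind("0.0.0.0")` flags the kind of wide-open default described above.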

So, is it only the data stored on back-end servers that is vulnerable? The answer is no. How many of us are aware that we are giving away more information to the digital community than we intend to? Too many apps and online sites ask users to share their data in exchange for help with daily chores. We think this makes our lives easier, but in fact we are falling into the trap of making our data public. Every site asks you to accept its cookie policy. But do we know what data is collected or stored in those cookies? Every mobile application asks for permission to access resources on your smartphone, without which the app is rendered useless. But do we know what data the app collects? We grant access to the GPS, the call log and every other kind of sensitive information, all so the app can help us. But does it actually help?

A shocking scandal at one of the leading social media companies has made us question the integrity and privacy of our data. In the Cambridge Analytica scandal, personal data from around 87 million Facebook profiles was harvested, reportedly including some users' messages and their friends' details. A personality quiz app developed in 2013 by a Cambridge University researcher was installed by around 300,000 people, and not until 2015 was it identified that the app had been able to retrieve users' private data. Though Facebook apologized for the breach of trust, the app's retrieval of personal information was permitted at the time, as it was the users who granted the permissions the app requested.

The NSA (the National Security Agency of the US), one of the largest intelligence organizations, collects information from all over the world to identify possible threats. As of 2010, it was reported to intercept and store around 1.7 billion emails, phone calls and other types of communications every day. This gives us a nagging feeling that we are being watched and that our communication and data are not private. There have been protests against the NSA for its data mining operations. However, there can always be other secret intelligence agencies trying to read us.

We do have data protection laws governing the use of data. The GDPR, for example, regulates how the personal data of people in the European Union may be collected and processed. However, such laws exist only on paper unless everyone who handles the data does so ethically.

As of now, the only data that cannot be read or intercepted by anybody is the thoughts in our minds. Perhaps soon we will have a device, a magical hat, that can read what the human brain thinks. With the dawn of such a device, even the most secure of human thoughts would be susceptible to hacking and could no longer be marked “Private”.

The digital era is putting an end to Data Privacy!