An Ecosystem for Private Customer Data Management by Harald Gjermundrød, Department of Computer Science, University of Nicosia, Cyprus

This article presents a conceptual framework for a private customer data ecosystem that would benefit both consumers and companies. The framework gives consumers the ability to trace their private data and build a sharing tree. The sharing tree shows which company originally collected the data and how the data has propagated from company to company. In addition, customers are offered incentives (like free apps) to share their private data, while being able to set limits on how long the data remains available and how far it can spread. The benefit for companies is that consumers would be more likely to provide accurate data and would keep an eye on the accuracy of the data pertaining to them. Moreover, as stricter privacy regulations are passed (like the European Union's General Data Protection Regulation (GDPR) [3]), the proposed framework gives companies a technical solution to stay compliant with such regulations and allows them to quickly “forget” the complete trail of data that they have collected and shared.

1. Trading Personal Data

With the proliferation of digital technologies and the trend toward digitizing all kinds of records, citizens are becoming ever more concerned about the privacy of the personal data that organizations store as part of their daily operations. Consider the scenario depicted in Figure 1: a customer registers newly purchased products (a mobile phone, a laptop, etc.) on the manufacturers' websites (Company A and B). Information such as email address, home address, phone number, and age is usually part of the registration process. Unknown to the customer, her private information is shared with partner companies (Company C and D), which then sell the information to a third party (Company E). After the registration is completed, the customer suddenly receives an unsolicited email message from a company (Company E) that sells mobile phone accessories that are compatible with the device she just purchased. The customer should have the means to investigate whether the data supplied during that particular registration was disclosed (with or without her consent).

Figure 1: Sharing and Selling of Customer Data

According to [1], 81% of consumers in Europe think that their data holds a certain value, but at the same time one out of three consumers provides false information in order to protect their privacy. An astonishing 68% of the respondents would willingly trade their privacy for a free app [2]. Consumers are concerned about their privacy, but they are willing to trade it if the monetary value is right. Companies could seize this opportunity and create an ecosystem where personal and other consumer data would be collected, traded by agreement, and then distributed to third parties. Thus, the three pillars of the proposed ecosystem are:

– The worth of the data to the companies and the consumers.
– Monitoring and propagation of personal data.
– Accuracy of collected information.

The rest of the article is organized as follows. In Section 2 the ecosystem for managing private customer data is introduced. Details on the technical approach to implement the proposed ecosystem are presented in Section 3. Section 4 concludes the article.

2. Customer Private Data Ecosystem

2.1 Data Classification

Data can be classified into various categories, depending on who is classifying it and for what purpose. For the purpose of the customer private data ecosystem, data is classified into three categories, as shown below:

Customer Business Data: This is data associated with a business transaction with an enterprise such as a financial institution, insurance company, travel agency, or online retailer, to name just a few. It also includes personal data that one supplies during product registration or prior to service requests.
State and Medical Data: This is data that a citizen must provide, by law, to the state such as tax data, employment details, and medical information. This type of data is considered to be private and confidential and is mostly managed by state-supported systems with strict legal protection policies.
Any Other Data: This is any other data that a citizen may upload to the web, including social media, blogs, photo sharing services, etc. The difference between this type and the previous two lies in the fact that citizens provide data of this type voluntarily, as it is considered nonessential for any vital societal purpose. Needless to say, this may not hold true for much longer.

The proposed ecosystem only handles data from the first category, namely customer business data. As there is distrust between consumers and businesses (see [1]), a framework that allows consumers to trace their private data would alleviate the distrust and promote healthy interactions among consumers and enterprises.

2.2 Consumer Features in the Ecosystem

As mentioned earlier, customers are concerned about their private data, but would be willing to “sell/trade” it. The trade value of their private data is not necessarily monetary. The company could, for instance, offer an extended warranty period in exchange for specific private information that the company would be allowed to share with its partner companies. Other forms of “payment” could be free shipping, exchangeable points, a discount on the next order, a free premium upgrade, etc. One problem with digital data is that replicating and sharing it costs essentially nothing. Hence, the customer may still be concerned about what happens to the data once it is sold. To alleviate these concerns, customers need to be empowered to control and verify the dissemination of their data. A technical solution must be devised that lets the company use the data it purchased while at the same time providing assurance to the customers who sold their private data.

From the customers’ point of view, the ecosystem should at least allow a consumer to:

– specify consent for the data to be shared with other companies and set a limit on the number of third parties that have access to it
– set the data lifetime, after which companies would need to delete the data (this would support the right to be forgotten)
– query companies about the data that they have collected about her, giving her the opportunity to verify that the collected data is still accurate and/or relevant
– build a “trace tree” in order to determine how specific data was disseminated and where the data originated
– prove that some specific data did indeed come from a specific company, i.e. obtain non-repudiation.

The first two features are negotiated with the company at the time of the data trade. A wider dissemination and a longer lifetime should result in a higher return value to the customer. The last three features allow a customer to monitor her own private customer data. A customer should be able to ask a company to provide her with the private data that it has stored about her. In addition to providing the personal data, the company should also disclose to the customer the origin of the data and any other entities that purchased the data from it. Of course, there has to be a technical solution that lets the customer do this in a relatively easy way.
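The two negotiated terms (consent with a third-party limit, and a data lifetime) can be illustrated with a minimal Python sketch. The class name `SharingPolicy`, its fields, and its methods are hypothetical and chosen only for illustration; the framework itself does not prescribe a representation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SharingPolicy:
    """Terms a customer negotiates at the time of the data trade (hypothetical names)."""
    sharing_allowed: bool     # consent to share the data with other companies
    max_third_parties: int    # limit on how many third parties may hold the data
    expires_at: datetime      # end of the data lifetime; holders must delete afterwards

    def may_share(self, current_holders: int, now: datetime) -> bool:
        """Return True if one more third party may receive the data."""
        if not self.sharing_allowed or now >= self.expires_at:
            return False
        return current_holders < self.max_third_parties

    def must_delete(self, now: datetime) -> bool:
        """Return True once the agreed lifetime has passed."""
        return now >= self.expires_at


# Example: data collected on 1 Jan 2016, shareable with at most two
# third parties, to be deleted after one year.
collected = datetime(2016, 1, 1, tzinfo=timezone.utc)
policy = SharingPolicy(True, 2, collected + timedelta(days=365))
```

A company would consult `may_share` before reselling a record and `must_delete` when deciding whether its copy has expired. How the reward scales with these terms is left open here, as it is in the text.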

2.3 Business Features in the Ecosystem

In today’s digital age, companies use targeted advertising for their products and services. To do so, they need to collect a vast amount of data and process it using data mining techniques. The more accurate and complete the collected data, the more accurate their findings will be. However, as was pointed out earlier, one in three customers provides false information. In addition, states are providing (or are in the process of providing) consumers with stronger privacy rights, such as the European Union's General Data Protection Regulation [3]. Therefore, companies need to implement technical solutions that collect the data needed while being compliant with state regulations. In this regard, involving the consumers creates a synergistic relationship. The companies need to provide more transparency to the customers regarding the private data that they collect, as well as pay (or otherwise reward) the customers for the data. In return, they will get more accurate data from the customers and will be better placed to comply with future data privacy regulations.

From the businesses’ point of view, the framework that they would use to purchase and resell private customer data should at least provide:

– a technical solution that makes it easy for the companies to stay compliant with regulations like the “right to be forgotten”
– verifiable proof of who originally provided the data, in case of a dispute over the accuracy of the private data
– customer data that is accurate and still relevant
– ability to trade data with other businesses, but still comply with privacy regulations
– ability to combine customer data from various sources (based on a customer unique identifier) and be able to extract new knowledge about the customer.

An additional benefit of the proposed framework is that businesses learn something new about their customers: when the user chooses how she wants to be rewarded (discount, free shipping, etc.) for trading her private data, the reward choice is itself new knowledge. The business could add this to the information that it has about the customer and then sell the record to another business. That business could then use it in a targeted campaign, for example by announcing a one-day sale with free shipping if the customer record indicates that this specific customer prefers free shipping as a reward.
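As a rough illustration of combining customer data from several sources by a unique identifier, the Python sketch below merges records keyed by email address. The function name `combine_by_email` and the field names are hypothetical. Two choices are assumptions of this sketch rather than part of the framework: addresses are compared case-insensitively, and later records overwrite earlier ones on conflicting fields.

```python
from collections import defaultdict
from typing import Dict, Iterable


def combine_by_email(records: Iterable[dict]) -> Dict[str, dict]:
    """Merge records from several sources, keyed by the customer's email address.

    Assumptions of this sketch: addresses are compared case-insensitively,
    and later records overwrite earlier ones on conflicting fields.
    """
    merged: Dict[str, dict] = defaultdict(dict)
    for record in records:
        key = record["email"].strip().lower()
        # Copy every field except the key itself into the combined profile.
        merged[key].update({k: v for k, v in record.items() if k != "email"})
    return dict(merged)


# Example: a registration record and a later record carrying a reward choice.
profiles = combine_by_email([
    {"email": "alice@example.com", "age": 30, "source": "Company A"},
    {"email": "Alice@Example.com", "reward": "free shipping", "source": "Company C"},
])
```

Here the reward preference ends up in the same profile as the registration data, which is the kind of combined record a business could use for a targeted campaign.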

3. Customer Data Record in the Ecosystem

In this section, a conceptual framework, namely the customer data framework, is presented that contains all the features outlined above. This framework is appropriate for the customer business data category. For the framework to satisfy all the features listed earlier, there must be a way to uniquely identify customers, and customers must be able to authenticate themselves (prove who they are). Today’s e-citizens possess multiple email addresses, and customers often use their email address as the username when accessing their online accounts. Hence, a user can authenticate herself by proving that she is in control of the email address that is affiliated with the account. Using the email address as the unique identifier for a customer is not a perfect solution: users can create fake profiles tied to an email address, they can have multiple email addresses, and so on. But it does provide a generally good-enough approach to uniquely identify customers worldwide without having to involve new stakeholders.

The framework works by combining the unique email address that is tied to a specific customer with metadata that contains references. When the customer registers with a company, she uses her email address, which becomes the unique identifier for this user. A customer record is then created from the information that the customer supplies when setting up the account. This record contains at a minimum: email address, creation date/time, expiration date, collecting organization, a URI (Uniform Resource Identifier) that is used as a reference to this specific record, references, and a signature. Optional information, such as the personal information of the customer and the specific purchase order, may also be stored. All of this is referred to as the metadata of the customer record and is stored in XML format. There are two types of references: Backward References and Forward References. When an organization (the source) is about to share the record with another organization (the target), the source places a Forward Reference in the record metadata that points to the location that the target will use to store the record. Similarly, the target has to insert a Backward Reference into the metadata of the record that it keeps, which points back to the record of the source. This process is repeated whenever the record is shared. As a result, a tree is created, with the root node being the originator of the data and the leaves being the companies that have not yet resold the record. Each record also holds one more Backward Reference that points back to the company that originally collected this specific customer record.
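The record layout and the reference bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not a definitive schema: the class `CustomerRecord`, the functions `share` and `to_xml`, the XML element names, and the example URIs are all hypothetical, and the signature is left as an empty placeholder.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CustomerRecord:
    email: str                          # unique customer identifier
    created: str                        # creation date/time
    expires: str                        # expiration date
    organization: str                   # collecting or holding organization
    uri: str                            # reference to this specific record
    backward_ref: Optional[str] = None  # record this copy was obtained from
    origin_ref: Optional[str] = None    # record held by the original collector
    forward_refs: List[str] = field(default_factory=list)
    signature: str = ""                 # placeholder; signing is not shown here


def share(source: CustomerRecord, target_org: str, target_uri: str) -> CustomerRecord:
    """Share `source` with `target_org` and link both copies."""
    source.forward_refs.append(target_uri)           # Forward Reference at the source
    return CustomerRecord(
        email=source.email, created=source.created, expires=source.expires,
        organization=target_org, uri=target_uri,
        backward_ref=source.uri,                      # Backward Reference at the target
        origin_ref=source.origin_ref or source.uri,   # always points at the root
    )


def to_xml(record: CustomerRecord) -> str:
    """Serialize the record metadata as XML (element names are illustrative)."""
    root = ET.Element("customerRecord", uri=record.uri)
    ET.SubElement(root, "email").text = record.email
    ET.SubElement(root, "created").text = record.created
    ET.SubElement(root, "expires").text = record.expires
    ET.SubElement(root, "organization").text = record.organization
    refs = ET.SubElement(root, "references")
    if record.backward_ref:
        ET.SubElement(refs, "backward").text = record.backward_ref
    if record.origin_ref:
        ET.SubElement(refs, "origin").text = record.origin_ref
    for target in record.forward_refs:
        ET.SubElement(refs, "forward").text = target
    ET.SubElement(root, "signature").text = record.signature
    return ET.tostring(root, encoding="unicode")


# Example: Company A collects the record, shares it with C, and C resells it to E.
a = CustomerRecord("alice@example.com", "2016-01-01T10:00:00Z", "2017-01-01T00:00:00Z",
                   "Company A", "https://a.example/records/1")
c = share(a, "Company C", "https://c.example/records/7")
e = share(c, "Company E", "https://e.example/records/3")
```

After these two calls, Company E's copy points back to Company C's record and also to Company A's record as the origin, so a customer who starts from any copy can find the root of the tree.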

As a record (essentially an exact copy) is distributed to and handled by many entities, undisputed and verifiable guarantees must be provided regarding the integrity of the record. Any modification of a record should be attributable to the entity that changed it. This can be achieved via cryptographic techniques, more specifically by digitally signing the hash of the customer record with the company’s private key. A company could modify a record in order to incorporate additional data and/or change existing data, and then share the new version with others rather than the original one. In order to keep track of the versions, the new record also embeds the original record (as signed by the company that disclosed it). In this way, the customer is able to gather all available versions of her record by traversing the tree. Companies that received the record from the same source company must hold the same original record, regardless of any further changes that they may make to it. The signing of records provides non-repudiation: a company cannot deny the existence of record versions that originated from it.
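The hashing and version-embedding steps can be sketched in Python with the standard library only. This sketch makes several assumptions that are not part of the framework: records are plain dictionaries, the hash is SHA-256 over a canonical JSON serialization (the framework stores records as XML), and the names `record_digest` and `new_version` are hypothetical. The digital signature itself is not computed here; it would be produced over the digest with the company's private key using an asymmetric scheme from a cryptographic library, and the string `"sig-A"` below is only a placeholder for it.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON serialization of the record.

    Sorting the keys makes the digest independent of field order, so two
    companies holding the same record compute the same value.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_version(original: dict, original_signature: str, changes: dict) -> dict:
    """Build a modified record that embeds the signed original it derives from."""
    updated = {k: v for k, v in original.items() if k != "embedded_original"}
    updated.update(changes)
    # Keep the untouched original (and its signature) inside the new version,
    # so earlier versions stay recoverable when the tree is traversed.
    updated["embedded_original"] = {"record": original, "signature": original_signature}
    return updated


# Example: Company C extends the record it received from Company A.
original = {"email": "alice@example.com", "organization": "Company A", "age": 30}
digest = record_digest(original)   # the value Company A would sign
modified = new_version(original, "sig-A",
                       {"organization": "Company C", "reward": "free shipping"})
```

Because the original is embedded unchanged, recomputing its digest from inside `modified` gives the same value that the source company signed, while the digest of `modified` itself differs and would be signed by the company that made the change.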

A customer should have the legal right to traverse the complete record tree, from the root to the leaves, whereas companies should only be allowed to traverse one level up or one level down the tree. Apps could be developed to implement the record tree traversal with the specified access controls in place. The deletion of the tree is a technical way of implementing the right to be forgotten.
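The two traversal rights and the deletion of the tree can be sketched in Python as follows. The class `RecordTree` is a hypothetical in-memory stand-in: in the framework the links would be the Forward and Backward References stored at the individual companies, and following them would require network requests and authentication that are not modelled here.

```python
from typing import Dict, List, Optional


class RecordTree:
    """In-memory stand-in for records linked by forward/backward references."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}   # backward references
        self.children: Dict[str, List[str]] = {}     # forward references

    def add(self, company: str, source: Optional[str] = None) -> None:
        """Register a holder; `source` is the company it got the record from."""
        self.parent[company] = source
        self.children[company] = []
        if source is not None:
            self.children[source].append(company)

    def customer_trace(self, root: str) -> List[str]:
        """Customer view: every holder, depth-first from the given root."""
        order, stack = [], [root]
        while stack:
            node = stack.pop()
            order.append(node)
            stack.extend(reversed(self.children[node]))
        return order

    def company_view(self, company: str) -> List[str]:
        """Company view: only one level up and one level down."""
        source = self.parent[company]
        return ([source] if source is not None else []) + self.children[company]

    def forget(self, root: str) -> List[str]:
        """Delete every copy reachable from `root`; return the erased holders."""
        erased = self.customer_trace(root)
        source = self.parent[root]
        if source is not None:
            self.children[source].remove(root)
        for node in erased:
            del self.parent[node]
            del self.children[node]
        return erased


# Example: A collected the data, shared it with C and D, and C resold it to E.
tree = RecordTree()
tree.add("A")
tree.add("C", "A")
tree.add("D", "A")
tree.add("E", "C")
```

With this tree, `tree.customer_trace("A")` visits all four holders, whereas `tree.company_view("C")` only reveals A and E. Calling `tree.forget("A")` removes every copy, which corresponds to deleting the tree.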

4. Conclusion

This article presented an ecosystem for the management and sharing of private customer data, which can benefit both customers and businesses. The framework benefits customers in that they are compensated for allowing their private data to be shared. In addition, they are able to use tools that present a complete trace of where their private data has been shared. From the companies’ point of view, the framework still allows them to share their customers’ private data while staying compliant with regulations. In addition, the accuracy of the collected data may be higher, which makes the data more useful.

Acknowledgment

We would like to thank the BeWiser consortium (funded under EU 7th Framework Program, Grant No: 319907) for our fruitful discussions on customers’ security and privacy issues.

References

[1] Symantec Corporation. State of privacy report 2015. https://www.symantec.com/content/en/us/about/presskits/b-state-of-privacy-report-2015.pdf. Accessed Nov. 24, 2015.
[2] Symantec Corporation. Internet security threat report 2015. http://www.symantec.com/security_response/publications/threatreport.jsp. Accessed Nov. 24, 2015.
[3] European Parliament and Council of European Union. Regulation of the European Parliament and of the Council on the Protection of Individuals with regard to the Processing of Personal Data and on the Free Movement of Such Data (General Data Protection Regulation). Technical report, European Parliament and Council of European Union, 2015.