Big data regulation. Does more need to be done?
Big data has immensely benefited the modern world by facilitating the creation of highly accurate prediction models and powerful machine learning systems. However, methods for regulating the generation and use of such a powerful resource remain unclear. Felix Wallis and Rishi Virani discuss the generation and application of big data and its associated regulatory challenges.
Big data describes the enormous volume and variety of information generated across industries ranging from financial services to social media to biomedical engineering. The sheer volume of this information means that it cannot be processed using traditional data processing systems and technologies. Instead, data analytics, a collection of methods used to process large amounts of structured and unstructured information, is used to discover patterns within these data.
The collection and processing of big data has drastically benefited the modern world, creating highly accurate prediction models and capable machine learning systems. However, the methods for regulating the generation and use of such a powerful resource remain poorly defined, theoretically and empirically. Two questions arise regarding the billions of interconnected sensors and devices continuously collecting data about their users. First, does big data mark the end of the traditional ideal of individual self-determination concerning personal data? Second, should policymakers be concerned about collective decision making facilitated by big data? In this article, we attempt to tackle these questions. We start by describing the generation and application of big data, and then evaluate the individual and collective dimensions of big data regulation in relation to the European Union’s General Data Protection Regulation (GDPR) and regulatory approaches (or lack thereof) in the United States. We conclude that stronger regulation is needed to protect individual self-determination and prevent discrimination when user-specific information is incorporated into such datasets.
The generation and application of big data
The proliferation of social media has massively contributed to the generation of big data through content such as tweets, posts, reviews, and comments. With over one billion people around the world having access to social media, such platforms produce an overwhelming amount of unstructured data, much of which is irrelevant or inconsistent and of little direct use in business analysis. As such, sophisticated machine learning and data mining techniques have been developed to identify useful information regarding opinions, behavioural trends, and correlations within big data systems. These techniques are commonly referred to collectively as data analytics.
Swarm intelligence is an effective data analytics technique in which a population of individual artificial agents, each representing a candidate solution to the data mining task, compete and interact with each other and their environment to form an ‘intelligent’ global behaviour. This behaviour optimises the data mining technique by tuning its parameters (the characteristics that define the machine learning system) towards the values that give the best learning performance. Such cooperation between agents also assists with data organisation by reducing the number of input variables, or dimensionality, of the dataset, which in turn improves the performance of machine learning algorithms.
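To make the idea of agents converging on good parameter values more concrete, the following is a minimal sketch of particle swarm optimisation, one common swarm intelligence method. It is an illustration only, not the technique used by any particular company: the function name, the default settings, and the toy loss function (which stands in for a real model's validation error) are all assumptions made for this example.

```python
import random


def particle_swarm_minimise(objective, bounds, n_particles=20, n_iterations=100,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Search for the parameter values that minimise `objective`.

    `bounds` is a list of (low, high) pairs, one per parameter.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Each particle (agent) starts at a random candidate solution.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Every agent remembers the best position it has visited so far.
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]

    # The swarm shares the best position found by any agent.
    best_index = min(range(n_particles), key=lambda i: personal_best_score[i])
    global_best = personal_best[best_index][:]
    global_best_score = personal_best_score[best_index]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Blend the agent's momentum, its own memory, and the swarm's knowledge.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the agent and keep it inside the allowed range.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)

            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score

    return global_best, global_best_score


def toy_validation_loss(params):
    """Hypothetical stand-in for a model's validation error (lowest at 0.1, 0.01)."""
    learning_rate, regularisation = params
    return (learning_rate - 0.1) ** 2 + (regularisation - 0.01) ** 2


best_params, best_loss = particle_swarm_minimise(
    toy_validation_loss, bounds=[(0.0, 1.0), (0.0, 1.0)]
)
```

In practice the objective would be the measured error of a real machine learning model, which is far more expensive to evaluate than this toy function, and the swarm settings would themselves need tuning.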
Sentiment analysis has also been used widely to identify and evaluate subjective information including explicit expressions of customer opinion, such as reviews and surveys, as well as implicit expressions, such as news articles. This approach employs text analysis, natural language processing and computational linguistics techniques to systematically identify the sentiment of a text by categorising it as positive, neutral, or negative. Analysing customer attitudes, emotions, and opinions of a company’s product in this way aids better business decision-making and leads to improved sales and customer service by uncovering whether consumers are dissatisfied with a product.
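The three-way categorisation described above can be sketched with a simple word-list approach. This is a deliberately minimal illustration: the word lists and the function name are assumptions made for this example, and production systems generally rely on trained language models that handle negation, sarcasm, and context, none of which this sketch attempts.

```python
import re

# Tiny illustrative word lists (assumptions for this example only).
POSITIVE_WORDS = {"good", "great", "excellent", "love", "reliable", "fast"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "broken", "slow"}


def classify_sentiment(text):
    """Label a text as 'positive', 'neutral', or 'negative' by counting cue words."""
    # Lower-case the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)

    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great product and fast delivery",
    "Terrible quality, it arrived broken",
    "The parcel arrived on Tuesday",
]
labels = [classify_sentiment(review) for review in reviews]
```

A business could aggregate such labels across thousands of reviews to track whether opinion of a product is improving or deteriorating, although the reliability of the result depends heavily on the quality of the underlying classifier.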
Big data analytics has proven to be useful in application-oriented sectors, particularly finance and retail. Powered by big data, live shopping and social media advertising have propelled customers towards e-commerce use. By collecting tremendous amounts of data from customers’ online interactions, product purchase history, sales quotes, and satisfaction with customer service and product quality, businesses can develop smarter shopping experiences and more efficient marketing campaigns to drive customers towards products of their choice based on market trends. For example, Walmart developed its own search engine, Polaris, which offers tailored results based on the big data collected, such as recent purchases made by customers, products that are trending on social media platforms, products with positive reviews, and items that have been viewed frequently.
Big data analysis has also been crucial in transforming banking into a more customer-focused sector. By evaluating social media data regarding customer satisfaction, loyalty and brand sentiment, banks have gained an improved understanding of customer preferences and have been able to enhance customer experience. Data mining techniques are also used to analyse customer profitability and retention, allowing banks to identify and target specific groups based on specific business objectives, and optimise offers related to loans, savings, interest, and account policies to the customers they believe to have the highest spending potential. By using big data to classify customers into different groups, banks can offer tailored products that drive revenue.
Regulatory issues surrounding big data
Mantelero and Vaciago have taken a bi-dimensional approach to big data's regulatory concerns. They propose that these problems should be evaluated from the perspective of the individual and that of the collective.
Concerning the individual, self-determination forms an essential aspect of data regulation policy. Individuals should retain the right to informational self-determination: the freedom to decide what happens to their data, who uses it, and how it is used. This idea is reflected in Article 6 of GDPR, under which consent is one of the lawful bases for processing personal data. However, applying self-determination to big data collection and analytics is becoming increasingly challenging. Because big data encompasses many forms of information from many sources, it is nearly impossible for the companies collecting it to know precisely how it will be used at the time of collection. User information may be processed multiple times by different companies for different ends. For example, in the case of Instagram, user activity is recorded to improve the platform’s search functionality and ‘Explore Page’, recommend content, and provide targeted advertising for brands across sibling platforms including Facebook, Messenger and Oculus. Consequently, providing more than a generic notice explaining all the potential uses of users’ data is essentially impossible, which limits individuals’ ability to truly consent to their data being processed. Thus, big data regulation falls at the first hurdle of individual data protection.
The collective dimension of big data regulation concerns attempts to control the use of this resource for group profiling and collective decision-making. Mantelero and Vaciago describe big data’s facilitation of a ‘new truth regime’ in which collecting and processing individuals’ information allows companies to assign predictor variables to them and segment their user bases accordingly. Hence trivial information, such as a person’s propensity to watch primetime television or buy general merchandise, can have non-trivial effects on their credit scores and access to insurance policies. These effects can be amplified by combining analysis of individuals’ habits with those of their neighbours and others in their local area. Consequently, big data analytics can reinforce bias against individuals from lower socio-economic backgrounds, as group trends are indiscriminately assigned to them, furthering their disadvantage. Mantelero and Vaciago describe how such ‘a classification approach may induce “self-fulfilling cycles of bias” and consequent discriminatory effects.’
Those who stand to be affected by biased societal group representations generated by big data analytics must receive adequate legal protection. However, big data regulation is sparse in this area, particularly in the United States. In fact, ‘there are no laws that currently regulate big data specifically’ in the US. Instead, at a federal level, the nation relies on the outdated Health Insurance Portability and Accountability Act (1996), the Children’s Online Privacy Protection Act (1998) and the Computer Fraud and Abuse Act (1986) to regulate big data. Firms are expected to ‘ensure that their proposed activities comply with privacy laws that are applicable to the data involved in their operations’, but this expectation is challenging to enforce given the international reach of big data collection systems. As a result, there is very little to protect people from the negative impacts of group decision making facilitated by big data. Far more needs to be done to tackle the collective dimension of big data regulation.
To conclude, data regulation is a burgeoning issue, and it is imperative that governing bodies develop more rigorous measures to protect individuals. Although big data analytics improves business decision making, the integration of customer data on this scale can compromise customers’ rights without their knowledge. Current data protection fails on the collective dimension, because it does not prevent bias being reinforced against already disadvantaged groups. It also fails to protect individuals, since the way big data is generated and applied makes it impossible for them to understand where and how their data is processed. As such, governing bodies must introduce regulations that protect both individual self-determination and groups from discrimination.