Unchaining data portability

The role of data portability in the EU Digital Strategy

The concept of data portability refers to the ability of a set of data to be moved to, from, and among applications, operating systems, or devices with minimal friction.

The European legislator identified data portability as a fundamental means of developing digital policies that benefit both citizens, by giving them greater control over their data, and the market, by revamping competition through clearer rules and easier mechanisms for data sharing, access, and re-use.

Concretely, the possibility for end-users and businesses to move to and from digital service providers seamlessly, without losing content or disrupting their services, is, at least in theory, the perfect arena for fuelling competition and better services.

Yet realizing such a seemingly simple capability is hindered by an unparalleled number of legal, economic, and technological complications. Data portability is in fact hard to regulate. Its complexity stems from its innate, twofold soul: one half is its economically driven capacity to affect market competition, the other its human-rights-driven capacity to enable people’s informational self-determination. Moreover, the two souls of data portability are inseparable, meaning that it is impossible to regulate it for data protection alone while leaving competition untouched. This “inseparability of souls” in itself affects the regulation of a right to data portability, because of its cascading implications across multiple domains, ranging from technological interoperability and industrial standardization, to human and economic rights, to market competition and consumer rights, to policy on data sharing and governance.

Historically, the concept of portability stemmed from “number portability”, enshrined in Art. 30 of the Universal Service Directive. Since then, however, the concept of portability in EU legislation has taken on different objects and affected different subjects, and keeping pace with the evolution of the European digital market has required new technologies as well as the development of theoretical frameworks for data sharing and governance.

The objects of portability legislation have thus come to include personal data, addressed by the GDPR and by the proposals for the Data Governance Act (‘DGA’), the Digital Markets Act (‘DMA’), and the Data Act, including special categories thereof, such as health data in the proposed European Health Data Space Regulation (‘EHDS’); non-personal data, addressed by the Free Flow of Non-Personal Data Regulation (‘FFNPD’), the Open Data Directive, the Digital Services Act (‘DSA’), and again the DMA; and even the services and online content of European users, addressed by the Content Portability Regulation and the Digital Content Directive (‘DCD’). Self-evidently, such a heterogeneous legislative framework, spanning Directives and Regulations, pursuing diverse political strategies each with its own objectives through both horizontal and vertical regulation, over 20 years (and, importantly, the last twenty years) marked by the ever-changing economics, technologies, and societal structures of the digital ecosystem, has created a legislative omnishambles.

This legislative omnishambles is extremely hard to navigate, even for legal experts, but it does have legal effects and does create rights and obligations for end-users and businesses alike.

Even within the GDPR, the text of Art. 20 allows for multiple interpretations of rights and obligations. For instance, it is unclear to users and providers which personal data shall be portable, since “data provided by the data subject” may cover only the data the subject actively generated, or also the data observed by the provider, or even the data inferred from them. It is also unclear which structured, commonly used, and machine-readable formats actually ensure interoperability, and what the legal relationship between interoperability and portability is. Another fundamental, unanswered question is where to draw the line of a “portable minimum” for each service, so that the ported data are still meaningful to the data subject. There is in fact a difference between being able to port single photos and whole albums, or entries of a contact list and a social graph of relationships. In these cases, keeping the structures and collections of the single data entries can sometimes be as vital to the service as the entries themselves. Yet the law does not clarify what the minimum “data unit” is in such cases.
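
To make the “portable minimum” problem concrete, the sketch below contrasts two exports of the same contact list in JSON, a structured, commonly used, and machine-readable format. The contact records, the relationship pairs, and both export functions are hypothetical illustrations; this schema is not prescribed by Art. 20 GDPR. The point is only that an export of single entries can satisfy a format requirement while dropping the structure that gives the data its meaning.

```python
import json

# Hypothetical contacts held by a service, plus the links between them
# (the "social graph" contrasted in the text with a flat list of entries).
contacts = [
    {"id": "c1", "name": "Alice"},
    {"id": "c2", "name": "Bob"},
    {"id": "c3", "name": "Carol"},
]
relationships = [("c1", "c2"), ("c2", "c3")]


def export_entries_only(contacts):
    """Port each entry on its own: structured and machine-readable, but flat."""
    return json.dumps([c["name"] for c in contacts])


def export_with_structure(contacts, relationships):
    """Port the entries together with the relationships that connect them."""
    return json.dumps(
        {
            "contacts": contacts,
            "relationships": [{"from": a, "to": b} for a, b in relationships],
        },
        indent=2,
    )


flat = json.loads(export_entries_only(contacts))
full = json.loads(export_with_structure(contacts, relationships))
print(flat)  # ['Alice', 'Bob', 'Carol']: the names survive, the graph does not
print(len(full["relationships"]))  # 2: the graph survives the move
```

Both exports also contain data about other people (Bob and Carol), and the structured one additionally reveals how they are connected, which is where the “rights of others” questions begin.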

As a knock-on effect, the portability of single data or collections thereof may create issues with the “rights of others”, such as privacy or intellectual property. As for privacy, data portability requests may encompass the personal data of others, e.g. in the case of contact lists, conversations with others, or pictures of others, where it will be hard to balance the interests at stake or even to find legitimate grounds for processing. Moreover, once the personal data of others are under the control of the requesting data subject, they fall under the data subject’s household exemption and are therefore no longer protected under the GDPR. This is a security problem, and a big one should the downloaded datasets contain hundreds of contacts of vulnerable subjects or hazardous content, or be sent via insecure means. As for intellectual property, there will be cases involving pictures adjusted with proprietary filters, collections created by the service provider on the basis of the subject’s generated or observed personal data, or content generated by a “prosumer”. These cases raise questions about the legitimacy of portability requests and call for legal and technical answers.

Today, the reality is that, on the one hand, most people do not know that their right to data portability exists, or would not know how to exercise it anyway. On the other hand, there are no clear rules for service providers on how to handle such requests without infringing, somewhere down the line, the legitimate interests of some stakeholders, nor on how to create portability services that comply with all the potentially applicable rules. Finally, the European and national courts, as well as the regulatory authorities, have not yet clarified the open questions on portability: only the case of UK drivers v. Ola ruled on a data portability request, and without resolving any of the above; the same holds for the guidelines adopted by the WP29/EDPB.

“Politics” of data portability

Data portability is not a necessary function in most information systems. It is instead a function that the architects of an information exchange system may choose, or be obliged, to embed (in the case of Art. 20 GDPR, by a regulatory constraint). This means that decisions on the existence and extent of data portability functions are the result of a normative choice made either by the developers or, in our case, by the legislator. Such decisions concern how (and which) data (personal, non-personal, content, etc.) should be governed, who should decide what to do with them, and who should be responsible for making that possible. Because it can enable or hinder such decisions, data portability is a crucial means within the realm of the “politics of data”.

The political goal of the EU legislator is to unchain the power of European data, first by destabilising the de facto ownership of data held by the (mostly American) information industry, and then by using data the “European way”, that is, fairly, securely, to the benefit of its people and businesses, and with respect for fundamental rights. To operationalize this “data sovereignty” strategy, the outcome of a heated political and academic debate about governance models has favoured forms of data openness, access, sharing, and re-use, which are allegedly better at achieving the goals of information privacy, innovation, and competition than data property, ownership, and other exclusivity models.

Technologies for data portability

From a technological perspective, realizing portability is quite impractical. Data migration and re-adaptation do not happen smoothly, and the lack of mandated top-down coordination from the EU is not helping the standardisation process. As for data formats, practical research on portability requests showed that respondents favour certain data formats depending on their field of service, and only a few of these are GDPR-compliant according to the U.K. ICO’s interpretation. As for information systems enabling portability, the EU is moving on multiple fronts. The first is the upcoming roll-out of the vertical European Data Spaces, with the health data space already at proposal stage. Additionally, personal information management systems (‘PIMS’) are being developed, which will allow users to act as “holders” of their personal information, managing it in secure local or online storage systems and sharing it at will. In the words of the EDPS’ TechDispatch 3/2020:

“PIMS can usually offer personal data and other metadata describing their properties in machine readable formats, as well as programming interfaces (APIs) for data access and processing. This last feature implies the use of standard policies and system protocols. This is an essential element, the lack thereof currently also represents a limit for PIMS adoption.”

Private projects such as Nextcloud, Solid, and MyData have made promising steps toward portability-enabling systems, but have not yet reached the level of technology readiness needed for market acceptance and critical-mass adoption. Unsurprisingly, there are also open-source, industry-led initiatives, championed by the Data Transfer Project of Google, Meta, Twitter, and Apple, which aim to bring to market products and services that address consumers’ requests for downloadable user data in structured, commonly used formats (Google Takeout, etc.), as well as for direct, seamless data portability from one service to another. These projects, however, may encounter legal obstacles under European competition law, namely Art. 102 TFEU, but also political ones: the EC’s policy agenda on data governance altogether rules out American big-tech players unilaterally establishing the de facto standard data formats and systems for data portability.

Conclusions

Under normal circumstances, and considering the results of the EC’s impact assessments, the interplay of the regulatory efforts mentioned above should shape a digital market that benefits everyone. The EU has seemingly found a silver bullet that makes both market players and consumers happy, economically as well as with respect to the fundamental rights to privacy, intellectual property, and fairness in the distribution of data value.

In reality, nobody seems interested in using this silver bullet. Why is that?

After months of research, my educated guess is that the reasons lie in a mixture of the following:

  • From a market perspective, portability has potentially disruptive, market-wide economic effects that are mostly stacked against big market players. The EU has been extremely careful in (not) imposing rules and technologies for full harmonization, in the hope that multi-stakeholderism would find its own way. To quote Alek Tarkowski of the Open Future Foundation, “no one tried hard to make it work, while others tried very hard not to make it work”.
  • From a law and economics perspective, although portability is said to benefit users and businesses, no exhaustive and conclusive economic analyses have provided evidence of benefits either for big tech companies or for small and medium-sized enterprises (SMEs).
  • From a regulatory perspective, the careful, delicate approach of participatory regulation and technological neutrality has been excessively open, creating uncertainties that have helped maintain the status quo, namely the monopolistic control over data by big tech players.
  • From a technological perspective, information systems enabling data portability still need to be developed. The problem is that, in privacy engineering, such development starts with identifying the requirements, both legal and technical, and in such a moment of regulatory turmoil these are hard to identify, let alone systematize, operationalize, and bring to market.

LeADS project at AIAI 2022

The LeADS project is organising a workshop titled “Best Practices for the development of intelligent and trustworthy algorithms and systems” at the 18th International Conference on Artificial Intelligence Applications and Innovations (AIAI). The workshop will take place on Sunday, 19th June, in Crete, Greece, and is divided into three sessions: 1) a panel discussion on Data Ownership, Privacy and Empowerment; 2) a panel discussion on Trustworthy Data Processing Design; and 3) a poster session.

Below you can find the detailed programme:

Legality Attentive Data Science

Workshop: Best Practices for the development of intelligent and trustworthy algorithms and systems

19/06/2022

10:30 – 11:30 AM

Panel 1: Data Ownership, Privacy, and Empowerment

Pointing to the limitations, legal uncertainties, and implementation issues of such an instrument, quite a few legal scholars argue that a new legal instrument in the form of data ownership is unnecessary. However, ownership and property are at the core of the liberal political theories of the modern state and modern law, as they form the source of rights and liberty. And if data is a valuable resource, the forms and scales of its ownership should be discussed, not only in legal writing but also in public debate. Reflecting on the current regimes of data exchange and the ownership structures related to data, this panel will discuss if and how a potential data ownership right can empower data subjects and right holders. Covering the scope and elements of a potential data ownership right, the panelists will take a closer look at the powers and limitations of such a right in relation to pervasive technologies such as AI and machine learning. Some questions the panel will explore: How would a potential data ownership right integrate with existing data protection law? Would it empower individuals with regard to access rights and ‘data portability’? Can we talk about collective ownership of data? If so, how can we justify it in light of the political questions of property and dispossession?

Duration: 60 min (including 15 min debate)

Moderator: Imge Ozcan, LSTS, Vrije Universiteit Brussel

Panelists:

  • Katerina Demetzou, Future of Privacy Forum 
  • Paul De Hert, LSTS, Vrije Universiteit Brussel (remote participation) 
  • Afonso Ferreira, CNRS, Institut de Recherche en Informatique de Toulouse

11:30 AM – 12:00 PM Coffee Break

12:00 – 1:00 PM

Panel 2: Trustworthy Data Processing Design

Data are fuelling the economy. The borders between personal and non-personal data, and between sensitive and non-sensitive data, are fading away, while the need for their secondary uses is growing exponentially. The panel focuses on these issues, starting from the legal, ethical, and technological frameworks needed to design data processing that is trustworthy for all players.

Duration: 60 min (including 15 min debate)

Moderator: Giovanni Comandé, Scuola Superiore Sant’Anna

Panelists: 

  • Jessica Eynard, Toulouse Capitole University 
  • Elias Pimenidis, University of the West of England 
  • Gabriele Lenzini, University of Luxembourg
  • Salvatore Rinzivillo, Italian National Research Council

1:00 – 2:15 PM

Poster Session: Gallery Walk on “Best Practices for the development of intelligent and trustworthy algorithms and systems”.

In addition, the University of Piraeus, a LeADS beneficiary, is co-organizing with the University of Sunderland the second workshop on “Artificial Intelligence and Ethics” (AI & ETHICS – https://ifipaiai.org/2022/workshops/#aiethics), which will take place on Monday, 20th June.

We are looking forward to engaging with the interdisciplinary and diverse community at AIAI, sharing our research and having fruitful discussions surrounding our project.

Participation of Barbara Lazarotto at the SSN2022

Early Stage Researcher Barbara Lazarotto (ESR 7) presented her research at the 9th biennial Surveillance & Society conference of the Surveillance Studies Network (SSN), hosted by Erasmus University Rotterdam on June 1–3, 2022, in Rotterdam, The Netherlands.

As part of the panel “Human Rights Europe and Global South”, Barbara presented her research on public-private data sharing through the lens of State surveillance, analyzing the role of sensors in a Smart City context and how the data they collect can be used to analyze and monitor entire populations.

To do so, Barbara first pointed out that the concept of “smart cities” is broad and is used for many different purposes and political agendas, which can be concerning when it comes to public expectations of locational privacy. She then presented three examples of public-private data sharing in smart city contexts, namely the city of Kortrijk (BE), a bridge installed in Amsterdam (NL), and the city of Enschede (NL). These examples demonstrated the extensive use of public-private partnerships that constantly track and monitor individuals’ behaviors and movements.

Lastly, Barbara focused on the fundamental rights that are violated by these sensors, highlighting that the idea that sensor data are non-personal might be misleading, since the number of sensors in some cities is becoming so high that it is in fact possible to single out an individual. Barbara therefore pointed out that violating locational privacy may also violate other rights, such as freedom of religion and the right to liberty and security. She concluded her presentation by highlighting the need to increase citizens’ participation in the decision-making processes of smart cities’ public-private partnerships, the need for citizens’ digital literacy, and further regulation of smart city sensors.

What do you mean a robot gets a say in this?

Predictive analytics, which can be understood as the application of computer algorithms based on artificial intelligence (machine learning and deep learning) to predict future activities from data on past or current behaviour, is increasingly finding applications across different domains and sectors.

In the legal and civic fields, predictive analytics, previously known for carrying out ancillary tasks such as providing case-law insights, mapping contracts, and vetting legal provisions, is now being used to provide insights during legal trials and dispute resolution. Such predictive jurisprudence software is used to predict a person’s probability of recidivism, to investigate evidence, and even to predict the possible resolution of a civil dispute or a criminal charge based on precedent and the legal infrastructure of the jurisdiction in which it operates.

Overall, predictive jurisprudence is applied in three broad spheres:

  1. Predictive justice: criminal sentencing, settlement of civil disputes, increasing access to justice;
  2. Predictive recidivism software: parole-related decisions, commutation of criminal sentences;
  3. Legal tools: drafting tools, contract analysis tools, and legal insight tools.

The development of predictive jurisprudence can be traced all the way back to the Supreme Court Forecasting Project (SCFP), a joint study conducted by researchers at the University of Pennsylvania, Washington University, and the University of California, Berkeley. This statistics-based legal project aimed to predict the outcome of every case argued before the United States Supreme Court in 2002. Its backbone was a statistical model that predicted the Court’s decisions with a degree of accuracy that even seasoned legal experts could not match: the model predicted 75% of the Court’s affirm/reverse results correctly, whereas the legal experts collectively predicted only 59.1% correctly. Although the SCFP did not rely on computers per se, it provided a proof of concept for the idea that judicial decisions and the jurisprudence of courts can indeed be predicted.

Since the SCFP, various companies and individuals have developed digital products that assist legal professionals by providing insights into legal materials or by predicting patterns in judicial pronouncements, broken down by judge and court, across a wide range of legal matters, such as the settlement of insurance-related claims, small-cause matters such as traffic violations, and the granting of parole and commutation of sentences for convicts.

Many companies now have well-established digital products in the legal sphere. One of these is CaseCrunch, a UK-based startup whose predictive jurisprudence application, CaseCruncher Alpha, achieved 86.6% accuracy in its legal predictions, while the pool of 112 lawyers pitted against it achieved an overall accuracy of 62.3%.

Another company flourishing in the predictive jurisprudence sphere is Loom Analytics, a predictive analytics platform that provides win/loss rates and information on judges’ rulings. It currently covers only civil cases in select Canadian provinces, but it is in the process of scaling up.

Among the established players in the predictive jurisprudence market, however, are two French companies: Prédictice, which provides legal case analysis (except for criminal cases), and Case Law Analytics, which also provides legal analysis but, much like Prédictice, does not analyse criminal cases.

Another major player, in the US market, is Lex Machina, owned by the global conglomerate LexisNexis, which provides legal analysis (including of criminal cases) to legal professionals. Its other services include insights into how a judge behaves in a specific type of case and a compendium of insights on arguing styles and litigation experience, which help users formulate an appropriate litigation strategy. Lex Machina also provides analysis of a party’s record before a specific judge, court, or forum. Further, it provides outcome-based and timing-based analytics and helps analyse the motions submitted to courts, supporting professionals in crafting appropriate motions for specific causes.

Predictive jurisprudence clearly excels not only at accurately analysing large volumes of data but also at identifying patterns in judicial behaviour that may not be visible even to the most seasoned experts.

The companies and private projects commercially engaged in predictive jurisprudence point to an existing market for such tools, whose many users rely on them not only to hone their professional skills and insights but also to provide increased access to justice across many jurisdictions. This, however, brings us to our most important consideration yet: is it prudent to rely on predictive jurisprudence software to carry out legal functions? And if so, what are the core tenets of designing and using such software?

How AI should be treated in this context depends on two specific considerations: the domain or sector in which it will operate, and the characteristics of the tasks it will carry out. For example, AI-based software applied in the legal sector that carries out administrative tasks, such as retrieving or organising files, can be considered low-risk AI; its users therefore need not be put through a wide array of disclosures and notifications that they are interacting with an AI system. However, where the AI-based software carries out complex tasks such as legal deliberation, which would normally require a degree of expertise, it will be classified as high-risk AI, since any mistakes or shortcomings can have a direct impact on the life and liberty of an individual.
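
The two-factor reasoning above (the domain plus the characteristics of the task) can be sketched as a toy rule. The task labels, the `RiskTier` enum, the `classify` and `needs_extensive_disclosure` helpers, and the conservative default for unlisted tasks are all hypothetical choices made for illustration; this is not a restatement of any statutory classification scheme.

```python
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical task labels drawn from the examples in the text.
ADMINISTRATIVE_TASKS = {"file_retrieval", "file_organisation"}
DELIBERATIVE_TASKS = {"legal_deliberation"}


def classify(domain: str, task: str) -> RiskTier:
    """Toy two-factor rule: the tier depends on the domain and on the task."""
    if domain != "legal":
        # This sketch only models the legal sector discussed in the text.
        raise ValueError(f"domain not modelled in this sketch: {domain!r}")
    if task in ADMINISTRATIVE_TASKS:
        return RiskTier.LOW
    if task in DELIBERATIVE_TASKS:
        return RiskTier.HIGH
    # Assumption: tasks not listed default to the stricter tier.
    return RiskTier.HIGH


def needs_extensive_disclosure(tier: RiskTier) -> bool:
    """High-risk uses call for the fuller set of disclosures and notifications."""
    return tier is RiskTier.HIGH


for task in ("file_retrieval", "legal_deliberation"):
    tier = classify("legal", task)
    print(task, tier.value, needs_extensive_disclosure(tier))
```

Defaulting unlisted tasks to the high tier errs on the side of oversight, in line with the point that mistakes in deliberative tasks can directly affect a person’s life and liberty.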

This brings us to our next crucial consideration: which core tenets should be kept in mind when designing predictive jurisprudence-based AI software. First and foremost, strict compliance with data protection law takes centre stage in such software, making a strong case for incorporating privacy by design.

Secondly, legal procedures across jurisdictions, whether civil law, common law, or others, share a common adherence to the principles of natural justice, namely: (1) adequate notice; (2) absence of bias; and (3) a reasoned order for all deliberations.

This brings us to an important component of all predictive jurisprudence-based AI applications: a degree of explainability. Explainable AI (XAI) has seen many developments in recent times, and a degree of explainability in a predictive jurisprudence application is crucial inasmuch as it allows natural persons to rely on the application readily, since they understand the reasoning behind its computational results. Using XAI as a core design tenet will also enable a predictive jurisprudence application to function independently in low-risk or moderate-risk tasks.
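
The text does not prescribe a particular XAI technique. One simple, inherently interpretable approach is an additive (logistic) scoring model, whose score decomposes exactly into per-feature contributions that a lawyer or judge can inspect. The sketch below assumes hypothetical feature names, hand-set weights, and a hypothetical `predict_with_explanation` helper, purely for illustration; a real tool would have to learn and validate its weights, and more complex models would typically need post-hoc explanation methods instead.

```python
import math

# Hypothetical, hand-set weights for illustration only.
WEIGHTS = {
    "favourable_precedents": 0.8,
    "unfavourable_precedents": -0.6,
    "missing_documentation": -1.2,
}
BIAS = 0.1


def predict_with_explanation(features):
    """Return a probability plus each feature's contribution to the score."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    score = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-score))  # logistic link
    return probability, contributions


case = {"favourable_precedents": 3, "unfavourable_precedents": 1, "missing_documentation": 1}
prob, why = predict_with_explanation(case)

# List the reasons, largest influence first, so a human can check them.
for name, c in sorted(why.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>25}: {c:+.2f}")
print(f"estimated probability of a favourable outcome: {prob:.2f}")
```

Because the model is additive, the bias plus the listed contributions reproduce the score exactly, so the explanation is faithful to the computation rather than an approximation of it.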

In their current form, since most predictive jurisprudence tools are far from perfect, they require human oversight and thus serve to enhance the legal analysis of lawyers, judges, and other legal professionals.

In 2020, the EU Agency for Fundamental Rights (FRA), under the directorship of Michael O’Flaherty, published a report titled “Getting the future right: AI and fundamental rights” (The European Union Agency for Fundamental Rights, 2020), hereafter the FRA Report.

The FRA Report mentions the requirement for adequate disclosure when AI-based predictive jurisprudence technologies are used. Such disclosure gives people an effective opportunity to complain about the use of AI and to challenge decisions reached on the basis of AI, and this grievance and complaints mechanism is crucial for upholding access to justice. To ensure access to justice, the following must be communicated to persons using predictive jurisprudence-based tools:

  1. Making people aware that AI is used
  2. Making people aware of how and where to complain (a proper, designated grievance redressal mechanism)
  3. Making sure that the AI system and decisions based on AI can be explained (use of XAI)

Many legal scholars have voiced concerns about the use of predictive jurisprudence by courts and legal officers, asserting that justice must be deliberated, not predicted, and hinting at the risk of users succumbing to automation bias. In the current scenario, this concern is adequately addressed: until AI-based predictive jurisprudence software can explain the reasoning behind its computational results, it will operate only under the supervision of a natural person, who may use its results as one component to deliberate upon when arriving at a well-reasoned decision, while relying primarily on their own experience and expertise.