Technical and legal aspects of privacy-preserving services: the case of health data

The potential usefulness and the value of health data are now broadly recognized. Such data may transform traditional medicine into a clinical science intertwined with data research, driving innovation and producing value for the key stakeholders of the health care ecosystem: not only patients but also health care providers and the life insurance sector.

Yet health data does not appear out of thin air; it is not a product that can be viewed in isolation. It comprises:

  • the personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status (data concerning health),
  • the personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question (genetic data),
  • the personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data (biometric data).

Thus, individuals cannot be deprived of the right to decide about the processing of such data, as health issues lie at the very centre of the sphere protected by the right to privacy.

It becomes clear that balancing the interests of the private individual whose privacy is protected, the interests of other private and public actors, and the general common interest is highly problematic. Naturally, the processing of health data cannot be unrestricted: optimally, the legal framework should facilitate unlocking the value of health data for European citizens and businesses and empower users in the management of their own health data without undermining the very essence of the right to privacy.

Currently, the processing of health data falls under the complex legal regime of the GDPR. This poses a serious challenge for data processors on the one hand and, on the other, gives rise to numerous legal questions. What are the grounds for processing such data in this highly differentiated context? How should medical data be protected at both the regulatory and the technological level? How can we harness the newest technologies to increase data safety? How can anonymization and/or privacy-preserving data management techniques based on efficient cryptography (e.g. homomorphic encryption, secure multi-party computation) contribute to reaching higher protection levels without becoming a hurdle or an impediment to legitimate data processing? Can blockchain technologies be used for health information exchange? Should the creation of technological infrastructure be coupled with establishing proper key management schemes?

The task is twofold. First, at the regulatory level, general policy guidelines on data sharing platforms are necessary for legislators, independent agencies and businesses, together with an analysis of the policy and market implications of providing privacy-preserving services. Second, practical recommendations are needed: specific postulates should be formulated on how data protection techniques can be applied in the health domain in order to contribute to achieving the abovementioned aims.

Author: dr. Katarzyna Południak-Gierz, Jagiellonian University

WATCH AGAIN THE WEBINAR: SoBigData++ and LeADS joint Awareness Panel. Legal Materials as Big Data: (algo)Rithms to Support Legal Interpretation. A Dialogue with Data Scientists.

6th of July 2021

Is blockchain THE reliability solution for big data?

Blockchains have sparked great enthusiasm in the data science community, which believes this technology will be THE solution for data authenticity, data privacy protection, data quality guarantees, smooth data access and real-time analysis [1], [2]. With data regarded as the new digital oil, data science and blockchain seem to be the perfect match [3]. Indeed, data science allows people and organizations to extract valuable knowledge from huge volumes of structured or unstructured data, while blockchain promises security and reliability for the data being manipulated. But does this sound too good to be true?

Blockchain is a way to implement a decentralized repository (a.k.a. Distributed Ledger Technology) managed by a group of participants who do not need to trust one another. A blockchain groups data records into blocks that are cryptographically protected and chained by back-linking each block to its predecessor. Blockchain was initially proposed for cryptocurrencies (e.g., Bitcoin). This first generation of blockchain applications is called Blockchain 1.0. Later, smart contracts were introduced, paving the way to decentralized applications, referred to as Blockchain 2.0. Today, Blockchain 3.0 explores a wider spectrum of target applications such as e-health, smart cities, identity management, etc. [4].
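
The back-linking of blocks described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming SHA-256 over a JSON serialization of each block. The names block_hash, append_block and is_valid are hypothetical and do not come from any real blockchain implementation, and the sketch has no network, no consensus and no signatures.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block deterministically (illustrative serialization: sorted JSON)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, records: list) -> None:
    """Append a block that stores the hash of its predecessor (the back-link)."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "records": records, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every block still points to the current hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
append_block(chain, ["carol pays dave 1"])
print(is_valid(chain))  # chain is untouched, so the links hold

# Rewriting an earlier block changes its hash and breaks the next block's back-link.
chain[1]["records"] = ["bob pays carol 200"]
print(is_valid(chain))  # the tampering is detected
```

Note that this check only detects changes to blocks that already have a successor. It says nothing about whether the recorded data was correct in the first place.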

Big data is one of the possible Blockchain 3.0 applications. Deepa et al. [5] recently published a survey on the use of blockchain technology for big data, which shows that projects try to apply blockchain-based solutions at different steps of big data processing. This includes big data acquisition (data collection, data transmission and data sharing [6]), big data storage (by securing decentralized file systems or by detecting malicious updates in databases [7]) and big data analytics (for machine learning model sharing, decentralized intelligence and trusted decision-making of machine learning [8]).

Although blockchain technology appears to be a good candidate for securing big data, it is not flawless [9] [10] [11], and security threats and vulnerabilities have been identified at each layer of the blockchain stack model [12]. First of all, blockchains depend on the underlying network services, so attacks on routing protocols or on DNS can harm a blockchain network. At the consensus layer, which is the core component that directly dictates the behavior and the performance of the blockchain, the situation is also complex [13]. The classic Proof of Work protocol is far from being a panacea and makes little sense from an environmental point of view [14]. In addition, most miners gather in mining pools to increase their processing capability and thus their chance of adding a new block to the blockchain. At the time of writing, the blockchain.com website estimates that six bitcoin mining pools (F2Pool, AntPool, Poolin, ViaBTC, Huobi.pool and SlushPool) represent 63% of the hash rate [15]. If they colluded, they could launch a 51% attack and destabilize the whole bitcoin network [13]. Consequently, more and more consensus algorithms are being studied, proposed, and extended, such as proof of stake, proof of authority, proof of activity, RBFT, YAC, etc. However, an ideal consensus algorithm is still missing, as almost all algorithms have significant disadvantages in one way or another with respect to their security and performance, as concluded in [13]. The Replicated State Machine layer, which is responsible for the interpretation and execution of transactions, can be vulnerable too. For instance, Karapapas et al. [16] showed how ransomware can be offered as a service using Ethereum smart contracts. Moreover, blockchain technology does not guarantee the reliability of the data, only the integrity of the blocks. Confidentiality of data is also not always built into the blockchain.
Finally, blockchain is implemented as software running on computers, and thus attackers can exploit security holes and misconfigurations. For example, white-hat hackers found more than 40 bugs in blockchain and cryptocurrency platforms during a one-month bug bounty session in 2019; four of them were buffer overflows that made it possible to inject arbitrary code [17].
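
To make the link between processing capability and the chance of adding a block more concrete, here is a toy Proof of Work loop in Python. It is a simplified sketch under stated assumptions, not Bitcoin's actual algorithm: it hashes a plain string with a single SHA-256 pass and counts leading zero hex digits, and the names mine and verify are hypothetical.

```python
import hashlib
from typing import Tuple


def mine(data: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces until the hash of data and nonce starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}:{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(data: str, nonce: int, difficulty: int) -> bool:
    """Checking a proof costs one hash, whereas finding it costs many."""
    digest = hashlib.sha256(f"{data}:{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


# Each extra zero hex digit multiplies the expected number of attempts by 16,
# so whoever can compute more hashes per second finds valid blocks more often.
nonce, digest = mine("block with some transactions", difficulty=4)
print(nonce, digest)
print(verify("block with some transactions", nonce, difficulty=4))
```

Because success is a matter of brute-force trial, a participant's share of the total hash rate roughly determines its share of new blocks, which is why a colluding majority of hash power is a concern.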

To conclude, blockchain technology offers promising features for big data. However, one should acknowledge the current technical limitations of the technology. Legal aspects are another consideration: the European Parliamentary Research Service has observed many points of tension between blockchains and the GDPR [18]. Once all these issues have been addressed, then yes, blockchain will be a serious candidate for being the reliability solution for big data.

By Romain Laborde

References

[1]       “Why Data Scientists Are Falling in Love with Blockchain Tech,” Techopedia.com. https://www.techopedia.com/why-data-scientists-are-falling-in-love-with-blockchain-technology/2/33356 (accessed Apr. 21, 2021).

[2]       I. Rallo, “Six use cases in Blockchain Analysis,” Data Science Central, Mar. 15, 2021. https://www.datasciencecentral.com/profiles/blogs/six-use-cases-in-blockchain-analysis (accessed Apr. 21, 2021).

[3]       “What Makes Blockchain and Data Science a Perfect Combination.” https://www.rubiscape.io/blog/focus-on-data-diversity-to-make-your-ai-initiatives-successful-0 (accessed Apr. 21, 2021).

[4]       D. Di Francesco Maesa and P. Mori, “Blockchain 3.0: applications survey,” Journal of Parallel and Distributed Computing, vol. 138, pp. 99–114, Apr. 2020, doi: 10.1016/j.jpdc.2019.12.019.

[5]       N. Deepa et al., “A survey on blockchain for big data: Approaches, opportunities, and future directions,” arXiv preprint arXiv:2009.00858, 2020.

[6]       N. Tariq et al., “The Security of Big Data in Fog-Enabled IoT Applications Including Blockchain: A Survey,” Sensors, vol. 19, no. 8, Art. no. 8, Jan. 2019, doi: 10.3390/s19081788.

[7]       N. Zahed Benisi, M. Aminian, and B. Javadi, “Blockchain-based decentralized storage networks: A survey,” Journal of Network and Computer Applications, vol. 162, p. 102656, Jul. 2020, doi: 10.1016/j.jnca.2020.102656.

[8]       Y. Liu, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung, “Blockchain and Machine Learning for Communications and Networking Systems,” IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 1392–1431, Secondquarter 2020, doi: 10.1109/COMST.2020.2975911.

[9]       X. Li, P. Jiang, T. Chen, X. Luo, and Q. Wen, “A survey on the security of blockchain systems,” Future Generation Computer Systems, vol. 107, pp. 841–853, 2020.

[10]     M. Saad et al., “Exploring the attack surface of blockchain: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 1977–2008, 2020.

[11]     Y. Wen, F. Lu, Y. Liu, and X. Huang, “Attacks and countermeasures on blockchains: A survey from layering perspective,” Computer Networks, vol. 191, p. 107978, 2021.

[12]     I. Homoliak, S. Venugopalan, D. Reijsbergen, Q. Hum, R. Schumi, and P. Szalachowski, “The Security Reference Architecture for Blockchains: Toward a Standardized Model for Studying Vulnerabilities, Threats, and Defenses,” IEEE Communications Surveys & Tutorials, vol. 23, no. 1, pp. 341–390, 2020.

[13]     M. Sadek Ferdous, M. Jabed Morshed Chowdhury, M. A. Hoque, and A. Colman, “Blockchain Consensus Algorithms: A Survey,” arXiv e-prints, p. arXiv-2001, 2020.

[14]     CNN Business, “Bitcoin mining in China could soon generate as much carbon emissions as some European countries, study finds,” CNN. https://www.cnn.com/2021/04/09/business/bitcoin-mining-emissions/index.html (accessed Apr. 21, 2021).

[15]     “pools,” Blockchain.com. https://www.blockchain.com/charts/pools (accessed May 03, 2021).

[16]     C. Karapapas, I. Pittaras, N. Fotiou, and G. C. Polyzos, “Ransomware as a Service using Smart Contracts and IPFS,” in 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2020, pp. 1–5.

[17]     Mix, “Security researchers found over 40 bugs in blockchain platforms in 30 days,” TNW | Hardfork, Mar. 14, 2019. https://thenextweb.com/news/blockchain-cryptocurrency-vulnerability-bug (accessed Apr. 28, 2021).

[18]     M. Finck, “Blockchain and the General Data Protection Regulation: Can distributed ledgers be squared with European data protection law?,” European Parliamentary Research Service, PE 634.445, Jul. 2019. [Online]. Available: https://www.europarl.europa.eu/RegData/etudes/STUD/2019/634445/EPRS_STU(2019)634445_EN.pdf.

Rights of the Internet of Everything (Last-JD-RIoE) – First Annual Conference

Wednesday and Thursday, 21-22 July, Online

This event, which takes place in the framework of the LAST-JD-RIoE Project, funded by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie ITN EJD grant agreement No 814177, gathers world authorities on different aspects of the Internet of Everything to promote scientific discussion, exchange research ideas and foster business opportunities.

For further info and Program

Registration