“Each person’s genome, or full complement of DNA, is unique, but the specific variants within an individual’s genome may be widely shared with biological relatives or even across the entire human population. This mixed character of the genome—as a uniquely individual assemblage of widely shared common elements—imbues it with a dual private and public significance that confounds any discussion of policy addressing genetic privacy.”

Wright Clayton E, Evans BJ, Hazel JW, Rothstein MA1

Introduction

The development of genetic testing in recent decades has opened up new opportunities for understanding the genetic etiology of human diseases and new possibilities for genetic diagnosis along with targeted treatment. At the same time, it has given rise to new risks, including threats to the privacy of sensitive data of both the test subject and his or her family.

The objective of the present paper is to identify the main reasons for creating and exchanging genetic databases internationally, as well as to highlight the main threats involved in this process. It also explores current legal provisions imposed by the European Union (EU), the United States of America (USA), and China in this area.

The first section covers the significance of genetics and genetic testing in medicine. Further, we explore arguments for the application of big genetic databases for scientific development and we present some of the key available databases. We also refer to data anonymization in this context. The next section discusses genetic material understood as personal data to confront these arguments with threats related to genetic material transfer concerning individuals, their families, and communities. The last few sections present legal provisions applied in the EU, USA, and China to address these threats. A short summary of regulations enforced in other countries is also provided. An overview of the legal regulations on the protection of genetic data in selected countries is presented in Table 1.

Table 1. Major solutions and shortcomings of legal regulations on the protection of genetic data in selected countries

| Country / organization | Legal act and year of issue | Main focus | Major solutions introduced | Identified shortcomings and limitations |
| --- | --- | --- | --- | --- |
| United Nations | Universal Declaration of Human Rights,46 1948 | Privacy listed among fundamental rights46 | Protection from arbitrary interference46 | No reference to genetic material; no sanctions available |
| European Union (EU) | Charter of Fundamental Rights,47 2000 | Broad catalogue of human rights annexed to the Treaty on the Functioning of the European Union47 | Prohibition of discrimination on genetic grounds;47 protection of personal data47 | |
| European Union (EU) | General Data Protection Regulation (GDPR),55 2016 | Focus on the protection of data subjects | Purpose of processing personal data: data subject’s consent,35 vital interest of the data subject or other natural persons, public interest in the area of public health, legitimate interest of the data controller. Terms of processing personal data:55 lawfulness, fairness and transparency, data minimization, accuracy, storage limitation, integrity and confidentiality, accountability | Binding within the European Economic Area (EEA), limited effect in cooperation outside the EEA (especially missing provisions on EU-US cooperation);61 limitations to anonymization / pseudonymization57 |
| United States (US) | Health Insurance Portability and Accountability Act,59 1996 | Protection of patient privacy | Protection of personal data balanced with the need to ensure relevant healthcare; terms of disclosing personal data | Missing provisions on EU-US cooperation61 |
| United States (US) | Genetic Information Nondiscrimination Act,58 2008 | Focus on nondiscrimination | Prohibition of discrimination in insurance and employment based on genetic predispositions | Complexity of the federal legal system83 |
| Canada | Personal Information Protection and Electronic Documents Act, 2000; Genetic Non-Discrimination Act, 2017 | General protection of health data; protection from genetic discrimination | Data minimization and processing for legitimate objectives only; genetic discrimination as a criminal offence76 | Harmonization with GDPR and strengthening of the protection of individuals needed77 |
| China | Regulation on Human Genetic Resources, 2012 | Protection of genetic material | Prohibition of export;63 later supplemented with further regulations on personal data protection68 | Protection limited to interests of China and Chinese citizens (in contrast to EU provisions)75 |
| Japan | Act on the Protection of Personal Information, 2017 | Genetic data classified as personal information | Safeguards to protect from data disclosure and subject stigmatization | No specific provisions on genetic discrimination75 |
| Taiwan | Personal Data Protection Act, 2012 | Genetic data classified as sensitive personal data | Genetic and biomedical research covered by provisions against discrimination of indigenous peoples75,84 | |
| Australia | Privacy Act, 1988 | Limitation of disclosure of subjects’ data | Predefined goals allowing disclosure; protective measures against unauthorized access | Complexity of the legal system78 |

Genetic basis of human diseases

An immense number of human diseases, including neurological diseases, physical and / or intellectual developmental disorders, birth defects, and numerous other diseases as well as cancers, with symptoms occurring after birth or during life (adult-onset diseases), may be caused by genetic alterations, such as single gene mutations (pathogenic gene variants, monogenic diseases), mutations in multiple genes (multigenic diseases), by a combination of gene variants and environmental factors (multifactorial inheritance disorders), or by changes in the number of chromosomes and / or their structure.2

Modern genetic testing in medicine

The use of modern molecular tests for the diagnosis of human diseases, especially methods that enable a precise analysis of selected fragments or the entire genome, such as next generation sequencing and microarray studies (microarray comparative genomic hybridization), has led not only to the identification of the genetic causes of most of the known human disorders, but also to the identification of new, previously undescribed genetic syndromes and diseases.3 It turned out that some diseases with the same clinical symptoms are caused by mutations in different genes (eg, hereditary nonpolyposis colon cancer,4 hereditary syndrome of increased risk of breast cancer5) or by different mutations in the same gene (eg, cystic fibrosis) leading to a different course of the disease and different response to therapy.6 It also came to light that some diseases could not be properly classified by the clinical symptoms, but only by the presence of given genetic alterations, as, for example, in the case of a specific type of spinocerebellar ataxia (out of 38 known types), which is of major clinical significance, as some types of the disease are therapeutically actionable.7 Moreover, in many cases, the only effective therapy is the one targeting molecular alterations, for example, in patients with spinal muscular atrophy.8

Recently, a shift has been observed from the diagnosis based on clinical data, such as physical examination of a patient, imaging, and pathological or biochemical tests, to molecular diagnosis, particularly in the case of rare, clinically undiagnosable diseases.9

The introduction of whole genome analyses for medical purposes revealed that in the human population there are many diseases caused by single gene mutations or microdeletions / microadditions which occur rarely, ultra-rarely, or are “private” variants of the genome.10 Recognition of such rare genetic alterations has given rise to the identification of a group of rare diseases (occurring with a prevalence of less than 5 per 10 000 persons in the population). Despite the rarity of certain syndromes, the total number of rare diseases in the population is significant (about 7000 syndromes have been described), and the global number of patients with rare diseases is high (1 in 50 people).11

Of the rare diseases, approximately 70% are genetic, and almost 70% begin in childhood.12 Most of them have a severe clinical course, leading to intellectual and / or physical disability, congenital malformations, and premature death. In cases of extremely rare diseases (single cases described worldwide) there are no specific criteria for a clinical diagnosis.13 In such a situation, the analysis of a patient’s genome becomes an optimal option for a precise and quick diagnosis, which, in some cases, allows for early treatment before irreparable changes occur, for example, in the muscles (spinal muscular atrophy),14 brain and liver (eg, metabolic diseases),15,16 or lungs (cystic fibrosis).17-19 Many genomic variants do not increase the risk of any disorders (benign variants); however, for a large number of variants, the clinical significance is still unknown: variants of undocumented pathogenicity (possibly pathogenic) and variants of unknown significance (VUS) are relatively abundant.20 The overall number of VUS in the human genome is extremely high (eg, over 2000 VUS in the BRCA1 gene are currently listed in the National Institutes of Health ClinVar database); however, individual VUS are usually rare in the population.21 Figure 1 outlines some issues that might be encountered during an analysis of a genome variant with regard to its clinical significance.

Figure 1. The analysis of genome variants in regard to their clinical significance.

Next generation sequencing should be performed in cases of clinically undiagnosable diseases (eg, rare syndromes or hereditary cancer syndromes), as well as in cases where the only effective therapy is one targeting molecular alterations (eg, treatment of patients with spinal muscular atrophy or targeted therapy for cancer patients). Both clinical molecular diagnosis and targeted treatment require precise identification of the genetic lesion underlying the disease along with determination of its clinical significance (pathogenicity). Difficulties in the clinical interpretation of the results of DNA sequencing stem from the fact that the human genome features millions of sequence variants, among which only some have negative clinical effects (pathogenic variants). Many genomic variants do not contribute to the risk of any disorders (benign, nonpathogenic variants); however, for a large number of variants the clinical significance is still unknown (variants of unknown significance [VUS]). Therefore, databases have been created worldwide that provide records of the nucleotide sequencing information, covering raw sequencing data and / or sequence assembly information and / or functional annotation of variants. Identification of a pathogenic variant allows for establishing a diagnosis, assessing the clinical outcome of the disease, and initiating a targeted therapy (if available). Identification of a nonpathogenic variant means that this alteration is not a cause of disease; therefore, genetic testing for other molecular alterations (eg, chromosomal aberrations, dynamic mutations, or epigenetic alterations) should be performed. Identification of a VUS means that steps for verification of this variant’s clinical significance are required (pedigree analysis, testing of family members with and without the given disease, linkage analyses).
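
The follow-up logic described in the Figure 1 legend can be sketched as a simple lookup from a variant's classification to the next steps. This is only an illustration under stated assumptions: the names `Significance`, `NEXT_STEPS`, and `next_steps` are hypothetical, and the three-way split simplifies the classification schemes used in practice.

```python
from enum import Enum


class Significance(Enum):
    """Three classes of clinical significance named in the Figure 1 legend."""
    PATHOGENIC = "pathogenic"
    BENIGN = "benign (nonpathogenic)"
    VUS = "variant of unknown significance"


# Follow-up actions paraphrased from the legend. The grouping is illustrative
# and is not a clinical protocol.
NEXT_STEPS = {
    Significance.PATHOGENIC: [
        "establish diagnosis",
        "assess clinical outcome of the disease",
        "initiate targeted therapy if available",
    ],
    Significance.BENIGN: [
        "test for other molecular alterations",
    ],
    Significance.VUS: [
        "pedigree analysis",
        "test family members with and without the disease",
        "linkage analysis",
    ],
}


def next_steps(significance: Significance) -> list:
    """Return a copy of the follow-up actions for a classified variant."""
    return list(NEXT_STEPS[significance])


if __name__ == "__main__":
    for s in Significance:
        print(f"{s.value}: {'; '.join(next_steps(s))}")
```

The point of the sketch is that only the pathogenic branch ends the diagnostic search; the other two branches both lead to further testing.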

The significance of databases in genomic medicine

Interpretation of genomic assays is hindered by the inability to promptly, comprehensively, and reproducibly identify information from the medical literature and related databases.22 A solution to overcome this bottleneck is to create human variant databases that support the aggregation, curation, and sharing of data.23 Several research initiatives are integrating these data, with the goal of making them available to serve as the source of analyses for a wide range of studies. Variant databases not only curate the literature but also facilitate access to unpublished variant classifications generated in diagnostic laboratories.24

Currently, there are approximately 160 databases that provide records of the nucleotide sequencing information, covering raw sequencing data and / or sequence assembly information and / or functional annotation of variants.25 For example, the Genome Aggregation Database (gnomAD, v2.1.1 and v3.1.2) is currently the largest and most widely used publicly available collection of human genetic variants from harmonized sequencing data. gnomAD provides allele frequencies for 241 million single nucleotide variants (SNVs) and small indels, 445 000 structural variants (SVs), and 56 434 mitochondrial variants.26 The database of Single Nucleotide Polymorphisms (dbSNP Build 155) contains human SNVs, microsatellites, and small-scale indels along with publication data, population frequency, and molecular consequence.27,28 dbSNP contains 900 million small variants collected from 192 000 individuals. dbVar (ver. July 15, 2022) houses exclusively human SVs greater than 50 base pairs in length, collected across 150 large studies, including the 1000 Genomes Project, Clinical Structural Variants, and Genome in a Bottle. Currently, dbVar contains over 7 million SVs.29 The Ensembl variation database (Ensembl Release 107) gathers data from the abovementioned and other sources. Importantly, this database provides results from a range of in-silico tools, including CADD, SIFT, and PolyPhen-2, to help assess the potential pathogenicity of 7 million variants that are predicted to result in an amino acid substitution.30 ClinVar is the leading public archive of reports of the relationships among SNVs, SVs, and phenotypes. As of April 2022, ClinVar contained 1 383 896 unique genetic variants from 2184 submitters from all around the world, including 1 000 984 variants with assertion criteria.31 Thus, ClinVar has become a valuable resource to support clinical variant interpretation.
Examples of other databases that collect patients’ genomic data and focus specifically on pathogenic SVs and SNVs include DECIPHER,32 the Catalogue of Somatic Mutations in Cancer (COSMIC),33 and the Online Mendelian Inheritance in Man (OMIM).34 As a consequence of a “dramatic increase in the amount of genetic information generated, analyzed, shared, and stored by diverse individuals and entities,”1 the issue of ensuring relevant protection of these data has become particularly urgent.35,36
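
The allele frequencies reported by aggregate resources such as gnomAD are, in essence, the number of observed copies of a variant allele divided by the number of alleles successfully genotyped. The sketch below shows that arithmetic only. The class and function names are hypothetical, the counts are invented for illustration, and the rarity threshold is an arbitrary assumption rather than a value taken from any database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCounts:
    allele_count: int   # AC: observed copies of the alternate allele
    allele_number: int  # AN: total called alleles (2 per diploid genotype)


def allele_frequency(counts: VariantCounts) -> float:
    """Allele frequency AF = AC / AN."""
    if counts.allele_number <= 0:
        raise ValueError("allele_number must be positive")
    if not 0 <= counts.allele_count <= counts.allele_number:
        raise ValueError("allele_count must lie between 0 and allele_number")
    return counts.allele_count / counts.allele_number


def is_rare(counts: VariantCounts, threshold: float = 0.001) -> bool:
    """Flag a variant as rare; the default threshold is an assumed example."""
    return allele_frequency(counts) < threshold


if __name__ == "__main__":
    # Invented numbers: 3 alternate alleles among 250 000 called alleles.
    example = VariantCounts(allele_count=3, allele_number=250_000)
    print(allele_frequency(example), is_rare(example))
```

Because only counts are published, such aggregates expose far less than individual-level genotypes, which is one reason aggregation appears among the protective measures discussed in the next section.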

Personal data protection

For the protection of genomic data, the primary condition is the initial de-identification of medical records. De-identification refers to a collection of techniques for removing explicit identifiers (eg, name, address),37 transforming identifiable data into nonidentifiable information, and introducing random noise into the dataset. However, publishing genomic data in open-access repositories raises questions of residual reidentification risk. Experimental results demonstrated that a small number of single-nucleotide polymorphisms (SNPs; from 30 to 100) were sufficient to establish a positive match between known personal data and an individual’s genetic information held or published in a presumptively anonymized format.38 Moreover, recently tested approaches show that phenotypic information can be inferred from genomic data and leveraged for reidentification purposes. To some extent, Lippert et al39 were able to reidentify individuals in a whole-genome sequencing dataset by predicting their visual traits, including eye and skin color. Subsequently, Joshi et al40 demonstrated that unrelated individuals with similar facial features share similar genotypes of 19 277 common SNVs. This study provides evidence that human face structure is determined by the individual’s SNV landscape. Theoretically, genomic features might be predicted from phenotypic data (for example, 3-dimensional facial morphology, visual features, or even physically observable disorders) for reidentification purposes; however, the actual sensitivity and specificity of these attempts are relatively low.39,41,42
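
A rough calculation helps explain why a few dozen variants can single out a person. The sketch below is not the method used in the cited experiments. It assumes independent biallelic sites in Hardy-Weinberg equilibrium and unrelated individuals, and all function names are hypothetical.

```python
def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic site.

    Assumes Hardy-Weinberg proportions for the three genotypes.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must be in [0, 0.5]")
    q = maf
    p = 1.0 - q
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def panel_match_probability(mafs) -> float:
    """Chance of a full-panel match, assuming the sites are independent."""
    prob = 1.0
    for maf in mafs:
        prob *= genotype_match_probability(maf)
    return prob


def expected_coincidental_matches(mafs, population_size: int) -> float:
    """Expected number of people in a population matching a given profile."""
    return population_size * panel_match_probability(mafs)


if __name__ == "__main__":
    world = 8_000_000_000  # rough world population, used only for scale
    for n_sites in (10, 30, 100):
        panel = [0.5] * n_sites  # most informative case under these assumptions
        print(n_sites, expected_coincidental_matches(panel, world))
```

With a minor allele frequency of 0.5, the per-site match probability is 0.375, so the expected number of coincidental matches in a population of 8 billion falls below one by 30 sites. Real panels contain correlated and less informative sites, so actual requirements are higher than this idealized bound suggests.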

In the literature, there are reports of serious breaches of anonymity and security of genetic databases, presenting the risk of reidentification of individual subjects. For example, in 2013, the confidentiality of 2 major genetic database projects was breached: the Personal Genome Project and the 1000 Genomes Project.43 Therefore, efforts have been made to develop technologies that improve anonymization of genetic information. Simple access control methods limit data exposure by allowing only authorized users to access sensitive data: qualified researchers apply for access on a project-by-project basis and their research plans are reviewed, often by a relevant committee. If genomic data need to be shared with an untrusted third party or the public, the privacy of the individuals to whom these data correspond can be protected by means of perturbation (eg, by transformation or aggregation).43 More advanced approaches include secure multiparty computation protocols, cryptography-based methods that enable a group of parties to jointly perform a task on data while keeping the data private.44 Currently, encryption-based approaches represent the most rigorous route to secure sharing of personal genetic information. In these approaches, an algorithm converts the original data into an encoded format. Consequently, controlled access and data encryption methods are key to preventing inappropriate use of genomic data, while still allowing their use for research purposes, for example, to design breakthrough therapies based on integrated genomic analysis.45
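
The idea behind secure multiparty computation can be illustrated with additive secret sharing, one of its simplest building blocks. The sketch below simulates, in a single process, several sites computing a joint allele count without any site revealing its own count. It is a toy under stated assumptions (honest-but-curious parties, hypothetical function names, invented counts) and not a production protocol or the specific method of the cited work.

```python
import secrets

# Public modulus; it must exceed any total the parties could jointly reach.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list:
    """Split value into n additive shares that sum to value modulo MODULUS."""
    if n_parties < 2:
        raise ValueError("at least two parties are needed")
    if not 0 <= value < MODULUS:
        raise ValueError("value out of range")
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list) -> int:
    """Simulate the protocol: only random-looking shares are ever exchanged."""
    n = len(private_values)
    # Party i splits its private value and sends share j to party j.
    distributed = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the published partial sums yields the total and nothing else.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    site_allele_counts = [12, 7, 30]  # invented per-site counts
    print(secure_sum(site_allele_counts))
```

Each individual share is uniformly random, so a party learns nothing about another site's count from the share it receives; only the aggregate is revealed.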

Protection of personal data is designed to ensure their confidentiality. In the case of genetic material, it has to be noted that once disclosed, the data cannot be changed or modified by the data subject. Nondisclosure should be considered an important element of privacy, which is referred to in the United Nations Universal Declaration of Human Rights of 1948: “No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.”46 Protection of private life, as well as protection from disclosure of personal data are also regulated by the EU Charter of Fundamental Rights.47 The data subject should be able to control disclosure of their personal data. In this context, one may also refer to “the right not to know.”48 As genetic information concerns not only the current condition of the data subject, but also his or her future predispositions, it is broadly discussed whether and how patients should be informed of potential secondary or incidental findings acquired through genetic testing. The dilemma is whether the patient should be fully informed about all aspects of their health or protected from potential discomfort associated with learning about predispositions that may (or may not) impact their future. The answers “range from never disclosing anything that was not intended, to disclosing every finding.”48 This dilemma may be solved by including a relevant question in an informed consent form to be signed by the patient.

Secondly, knowledge about genetic predisposition may become a basis of discriminative measures in such areas as education, employment, or insurance, based on the genetically identified susceptibility to disease.49 On the other hand, advances in scientific analysis of the human genome lead to identification of characteristics predisposing individuals to specific activities, most notably in sports. Application of such information as selection or employment criteria should also be regarded as discrimination.50 For insurance companies, knowledge of any potential factors that affect future risks of the insured individual is important for calculating the company costs. In 1992, Billings et al51 published a series of case studies concerning discrimination based on the genetic background, including insurance, employment, and education-related examples. There was even a case of a family that was denied the right to adopt a child due to a parent being at risk of Huntington’s disease.51 This threat was referred to in UNESCO’s Universal Declaration on the Human Genome and Human Rights: “No one shall be subjected to discrimination based on genetic characteristics that is intended to infringe or has the effect of infringing human rights, fundamental freedoms and human dignity.”52 The Declaration also stressed that the same issues should guide scientific application of human genome analysis: “research [on the human genome] should fully respect human dignity, freedom and human rights, as well as the prohibition of all forms of discrimination based on genetic characteristics.”52

As genetic data are associated not only with the given person but also with members of his or her family,53 protection is required in the interest of all data subjects involved. This is particularly important in light of the enormous potential for discrimination and stigmatization of both the subject and the entire family, including distant relatives. Accordingly, in the case of hereditary diseases, when information about the potential risk of carrying a pathogenic genetic variant should be shared among family members, an appropriate recommendation must be included in the genetic counselling report prepared for the subject, but only he or she has the right to inform the family members. Uncontrolled disclosure of genetic data, without the consent of all parties involved, can easily undermine the foundation of family functioning and even lead to its disintegration. This may have far-reaching psychosocial consequences for the family.

Finally, a larger-scale transfer of data may provide the recipient with vast material for various analyses. Unless controlled, such analyses may serve many purposes, including commercial (eg, dominance in the pharmaceutical market) or even military ones (eg, development of biological weapons). While genomic manipulation is the broadest field of possible development of new biological weapons, the potential weaponization of genetic data could also involve the application of knowledge concerning the susceptibility of specific populations to particular diseases or agents.54

European regulations on genetic data protection

Considering the grave consequences associated with the potential unauthorized access to or disclosure of genetic data, their protection has been legally regulated by the EU. Currently, the major source of law in this respect in the EU is the General Data Protection Regulation (GDPR).55 The Regulation refers to genetic data in article 9, titled accordingly Processing of special categories of personal data, which states in section 1 that “Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited.”55 Thus, by default, genetic data are protected from any processing. The GDPR further defines specific conditions which allow exceptions from this rule, including processing upon the data subject’s express consent, in the vital interest of the data subject or other natural persons, or for reasons of public interest in the area of public health and others.

Importantly, however, the GDPR relates to the specifically defined area of activities and segments of data; thus, it does not apply in any way to anonymous data that cannot be matched to a specific data subject (either because they are not identifiable at all or because they have been anonymized beyond identification, eg, aggregated data).56 Therefore, it is clear that the data protection regulations cover all information collected for medical purposes: diagnosis, treatment planning, or genetic counselling for patients and their families. On the other hand, controversies in this area concern the scope of interpretation of anonymization, especially in the context of scientific research, when it is necessary to follow particular cases, and the data—even if anonymized—remain individualized to allow for singling out a case, but not for their individual identification through a name or another unique identifier.57 Another option is pseudonymization; however, recital 26 of the Regulation limits its potential application: “Personal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person.”55
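
The distinction drawn in recital 26 can be made concrete with a small sketch of keyed pseudonymization. It assumes a record with hypothetical fields (`patient_id`, `name`, `address`, `genotype`) and uses an HMAC as the pseudonym function, which is one common choice and not a technique prescribed by the Regulation. The last function shows why such data remain attributable: whoever holds the key and a list of candidate identifiers can re-link a record.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_and_pseudonymize(record: dict, secret_key: bytes,
                           direct_identifiers=("name", "address")) -> dict:
    """Drop explicit identifiers and replace the patient ID with a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in direct_identifiers and key != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


def relink(pseudonym: str, candidate_ids, secret_key: bytes):
    """Re-identify a pseudonym using the key: the 'additional information'."""
    for candidate in candidate_ids:
        if hmac.compare_digest(pseudonymize(candidate, secret_key), pseudonym):
            return candidate
    return None


if __name__ == "__main__":
    key = b"demo-key-held-by-the-controller"  # placeholder, not a real secret
    record = {"patient_id": "P-001", "name": "Jane Doe",
              "address": "1 Main St", "genotype": "example-variant:A/G"}
    shared = strip_and_pseudonymize(record, key)
    print(shared)
    print(relink(shared["pseudonym"], ["P-000", "P-001"], key))
```

Note that even without the key, the genotype field itself may allow reidentification, which is why removing or replacing identifiers alone does not make genomic records anonymous.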

The same goes for the Regulation’s territorial scope,57 as the provisions cover not only processing that takes place inside the EU and the European Economic Area (EEA), and not only processing concerning data of citizens of the EU and EEA member states, but also that related to or concerned with activities within the EEA territory.55

Without prejudice to the provisions on ensuring data subjects’ awareness and consent with respect to processing of their data, the Regulation leaves room for special exemptions in the case of processing related to scientific purposes. These may concern data subjects’ consent to processing: “It is often not possible to fully identify the purpose of personal data processing for scientific research purposes at the time of data collection. Therefore, data subjects should be allowed to give their consent to certain areas of scientific research when in keeping with recognized ethical standards for scientific research. Data subjects should have the opportunity to give their consent only to certain areas of research or parts of research projects to the extent allowed by the intended purpose.”55 Elsewhere, potential exemptions concerning data subjects’ rights are delegated to the responsibility of member states.55 Under no circumstances, however, should these limitations be construed to endanger the protection of human rights, which is the Regulation’s underlying value.

The basic safeguard for the privacy of personal data imposed by the GDPR involves the data controller’s obligation to ensure that all processing (from collection through storage to use) is restricted exclusively to legal forms and lawful purposes. It should be stressed that obtaining informed consent from the data subject is only one of the ways to justify the lawfulness of such processing. Thus, any processing of personal data, either of special category or not, should meet the following criteria55: lawfulness, fairness and transparency, purpose limitation, data minimization, accuracy, storage limitation, integrity and confidentiality, and the controller’s accountability.

Transfer of genetic data to the European Economic Area and United States

Transfer of personal data falls into the scope of the GDPR as well.55 Entities that obtain data concerning European citizens are bound by security requirements defined in the Regulation. The provisions of the GDPR are also binding in the member states of the EEA. On the other hand, there exist separate regulations in non-EU states that are not always compatible with the GDPR and the European Charter of Fundamental Rights.

The American provisions, especially the Genetic Information Nondiscrimination Act, also referred to as GINA, focus on preventing discrimination based on predispositions identified through genetic testing, including screening tests.58 This area is also regulated by broader health care provisions, including the Health Insurance Portability and Accountability Act.59 However, some scholars question the sufficiency of the current regulations, quoting massive developments in both acquisition and application of genetic technologies, the need for protection of the data subject’s extended family, as well as insufficient standardization of requirements and sanctions on the federal level.60

The major attempt to establish an interface between the EU and US legal frameworks of medical data protection without compromising the potential for scientific cooperation was the EU–US Privacy Shield arrangement. However, as the Shield was repeatedly questioned and finally invalidated in full by the European Court of Justice,61 an overall settlement is still missing.

Another challenge was posed by the United Kingdom’s (UK) decision to leave the European Union, which made the UK a third country with respect to the EU legal system. While upon Brexit itself the UK adopted the GDPR into its domestic law, highlighting the intention to maintain the level and principles of personal data protection, this alignment may be threatened in various scenarios.62 On the one hand, if EU law is developed or amended, the UK will have to decide whether to follow suit; on the other, the UK may also seek to improve or change its own regulations.62 Thus, at this point, Brexit has not modified the legal status quo, but it has certainly limited the predictability and stability of the provisions.

Genetic data protection regulations in China

China has also taken significant measures to protect the genetic data of its citizens. While the close cooperation between some Chinese scholarly institutions and the military and / or government administration has raised concern in Europe, the People’s Republic of China also perceives outbound transfer of the genetic data as a threat to the state’s security. The key legal act regulating this area (Regulation on Human Genetic Resources of 2012) expressly states that “international cooperation and / or export of genetic resources … may jeopardize national security, national interests or public security.”63 The system relies on verification and approval of all international research cooperation projects involving transfer of genetic data. Public security purposes were also an important driver in the establishment and development of China’s National Genebank launched in 2016, and designed to become the world’s largest gene bank, so that the genetic data of Chinese citizens are not exported any more.64 “These concerns about the potential strategic significance of genetic resources have also resulted in an unwillingness to share and exchange data, even as Chinese companies are avidly seeking out access to sources of data beyond China.”64 This is especially important in the context of close links of major Chinese companies involved in genetic data storage with military research institutions.65,66

Thus, Chinese regulations are focused on ensuring the security of China and its society, with less focus than in other states on safeguards to protect imported foreign data. It should be stressed that, contrary to the European provisions, the Chinese instruments focus on the collective security of the community and not on the protection of individuals, as evidenced also by the involvement of the ministry in charge of technological development along with the ministry responsible for health in the preparation of the provisions. On the other hand, recent amendments to the provisions are more balanced and include protection of individuals.67 For health care data protection, the Chinese law also prohibits their transfer abroad (including a prohibition of storing such data on foreign servers), but still, “currently, the Chinese DTC genetic testing business is running in a regulatory vacuum, governed by self-regulation.”68 Direct-to-consumer (DTC) testing in China is thus governed by general laws on consumer rights, and only in 2019 did the state prohibit the collection and storage of personal genetic data without the subject’s informed consent.68

The Chinese state’s approach to sensitive genetic data seeks to take into account the issues raised in the discourse in other parts of the world.67 On the other hand, the endeavor to control outbound information, combined with ambitious projects to import data and to collect material concerning the Chinese population, triggers significant doubts not only among politicians but also in scientific circles. Given what is known about the Chinese government’s policies and operations, there are serious doubts about the ethical aspects of genetic research in the People’s Republic of China, especially when the subjects belong to minorities that are discriminated against and oppressed, such as the Uyghurs and Tibetans.69

These doubts have also been addressed by authorities of other states. In February 2021, the National Counterintelligence and Security Center in the USA issued the following statement: “…For years, the People’s Republic of China has collected large healthcare datasets from the U.S. and nations around the globe, through both legal and illegal means, for purposes only it can control… the PRC’s mass collection of DNA at home has helped it carry out human rights abuses against domestic minority groups and support state surveillance. The PRC’s collection of healthcare data from America poses equally serious risks, not only to the privacy of Americans, but also to the economic and national security of the U.S.”70

In 2016, the Chinese government announced a USD 9 billion project allowing for the collection, sequencing, and analysis of genomic data, paving the way for China to become a global leader in precision medicine.71 Genetic data are acquired via various channels, for example, through a world-wide offer of genetic testing or low-cost genomic sequencing services performed by Chinese biotech firms. The apparent main actor in this process, the BGI Group, is the largest private Chinese genetics company publicly traded on the Shenzhen Stock Exchange. Its best-selling product is the branded NIFTY test (Non-Invasive Fetal TrisomY), a patented procedure currently sold in over 50 countries that can detect major chromosomal aberrations in the fetus by capturing DNA from the placenta in the maternal bloodstream about 10 weeks into a pregnancy. The BGI admits that it stores and reanalyzes left-over blood samples and genetic data remaining from the test, both fetal and maternal, along with personal details such as a person’s country, height, weight, and medical history. The legality of this research remains highly questionable. So far, almost 10 million women have taken the BGI’s prenatal test globally.71

Another mechanism for acquiring biological material suitable for genetic analysis, recently practiced by the BGI, has been the sale or donation of millions of COVID-19 test kits and gene-sequencing labs outside of China.72 According to US security agencies, this was part of an effort to collect massive amounts of foreign genetic material. The BGI freely admitted that in 2021 alone it built 80 COVID-19 labs in 30 countries worldwide, which were subsequently planned to be repurposed for reproductive health screening. Even if COVID-19 tests do not collect patient DNA, prenatal tests definitely and purposely do.72

Results of genetic tests of pregnant women, not only those of Chinese nationality, are uploaded to the government-funded China National GeneBank.72 This genomic bank formally collaborates with the BGI and various military hospitals in mainland China. As in the case of prenatal research, the BGI has collaborated with military hospitals on genetic research programs designed to enhance soldiers’ performance, such as combating deafness caused by weapons training, or exploring whether drugs interacting with genes could protect the Han Chinese, the country’s major ethnic group, from brain injury at high altitudes.72 In 2018, the BGI published a paper describing the largest ever study on the genetics of the Chinese population. In that study, almost 150 000 prenatal tests were reused for a wide range of genetic analyses, from identifying genes associated with mental disorders to tracking silent viral infections. The study was also aimed at tracing genetic distinctions between the Han and minorities, including the Uyghurs and Tibetans, and at analyzing the population movements and intermarriage that had been taking place since 1949 because of Chinese government policy. In relation to these studies, 2 BGI subsidiaries have been sanctioned for abusive DNA collection and analysis schemes to repress Chinese citizens.73

Other examples of genetic data protection regulations

The issue of privacy protection in relation to genetic material has been addressed by authorities in many modern countries. The need for such regulation is clear in Japan, where a comprehensive system of data sharing combines the outcomes of various scientific projects in a vast registry.74 As in other Asian states, including Taiwan and the Philippines, the Japanese legislation defines genetic information as personal data and protects it accordingly.75 Regulation of genetic discrimination in Asia is complicated, especially in states such as India or Taiwan, with a significant representation of ethnic minorities. However, either legal regulations (Taiwan, Hong Kong) or state-endorsed specialist guidelines (Japan, Philippines, South Korea) are applied in this area.75

Quite detailed provisions in both areas have been introduced in Canada, with personal data protection regulations covering the entire process of data sharing and use, from informed consent through security measures to obligatory breach notification.76,77 In the case of discrimination, “Canada’s federal Genetic Non-Discrimination Act makes it a criminal offence to require disclosure of genetic test results as a condition of entering into a contract; or to collect, use or disclose genetic test results without explicit written consent.”77 Australia, on the other hand, while also working on developing a relevant system, faces the same challenges as the USA, with a complex mesh of federal and state-level regulations, guidelines, norms, and standards, all within the context of common law.78

While Poland is bound by the European provisions of the GDPR, it also lacks a dedicated act on genetic data protection. As concluded by the Supreme Audit Office in 2018,79 this makes the protection system inefficient: “The Polish legal system has multiple inconsistent provisions, regulating particular aspects of security of genetic testing. They do not ensure sufficient protection of human rights in view of developments in medicine.” The most serious problem identified by the Supreme Audit Office concerns insufficient control over genetic tests performed outside the state health care system, especially DTC tests.79,80

The inconsistency of regulations in different parts of the world may pose problems both in terms of scientific cooperation and security of individuals and communities.

Epigenetic data protection

Currently, there is a heated debate over whether epigenetic information (such as DNA methylation) stored in public databases exposes the genetic information of an individual. On the one hand, the epigenetic landscape is intrinsically variable and strongly depends on the tissue, cellular composition of a sample, and/or age; therefore, it may act as a safeguard against reidentification efforts. On the other hand, platforms used to assess DNA methylation in a genome-wide fashion (eg, whole genome bisulfite sequencing or Illumina EPIC2 methylation array) capture thousands of SNPs in each individual. Therefore, data confidentiality and privacy cannot be guaranteed in the case of high-throughput epigenomic data storage. Consequently, some measures have been recommended to mitigate the risk of reidentification, such as removing SNPs using existing algorithms or masking sites (or probes) that overlap common SNPs.81,82
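The masking of SNP-overlapping probes mentioned above can be illustrated with a minimal sketch. It is not the algorithm of the cited works; it only shows the general idea under stated assumptions. The function name `mask_snp_probes`, the probe identifiers, the input layout (methylation beta values per probe, and a per-probe minor allele frequency of the most common overlapping SNP), and the 0.01 frequency threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional


def mask_snp_probes(
    betas: Dict[str, List[float]],
    snp_maf: Dict[str, float],
    maf_threshold: float = 0.01,  # assumed cut-off for a "common" SNP
) -> Dict[str, List[Optional[float]]]:
    """Return a copy of `betas` in which probes overlapping common SNPs are masked.

    betas   -- hypothetical layout: probe ID -> beta values, one per sample
    snp_maf -- hypothetical annotation: probe ID -> minor allele frequency of
               the most common SNP overlapping that probe (absent = no SNP)
    """
    masked: Dict[str, List[Optional[float]]] = {}
    for probe_id, values in betas.items():
        maf = snp_maf.get(probe_id, 0.0)
        if maf >= maf_threshold:
            # Replace every sample's value with None so the probe can no
            # longer serve as a proxy for the underlying genotype.
            masked[probe_id] = [None] * len(values)
        else:
            masked[probe_id] = list(values)
    return masked


# Illustrative data only; probe IDs and values are invented.
betas = {
    "cg_hypothetical_1": [0.12, 0.55, 0.98],  # trimodal pattern typical of a SNP effect
    "cg_hypothetical_2": [0.40, 0.42, 0.39],
}
snp_maf = {"cg_hypothetical_1": 0.23}

masked = mask_snp_probes(betas, snp_maf)
```

In this sketch, masked probes are kept in the output with empty values rather than dropped, so that the shape of the shared data set stays the same; whether masking or outright removal is preferable would depend on the repository’s policy.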

Conclusions

In the present paper, various risks associated with establishing, processing, extending, and using scientific databases including genetic data of individuals were analyzed. Further research should concern challenges related to storage and use of medical records. With developing knowledge of the human genome and new technologies enabling the collection, storage, and processing of data on human health, scholars and lawmakers have come to realize the vast need for research in this area. At the same time, there are serious threats involved. Those can be divided into 3 main groups: 1) for the genetic data subjects, 2) for their close and distant relatives, and 3) for communities.

All these threats are addressed to a varying extent by legal provisions in different countries, depending on their policies and strategies on how to juggle the interests of current citizens, future generations, foreigners, as well as the state. While international cooperation seems key to shaping a safe and effective system of data sharing, discrepant interests and plans of different states render it difficult to reach a broad compromise.

To conclude, the European GDPR data security regime appears to be effective, partly due to the uniformity of recommendations for European countries, while in the case of less uniform regimes (eg, in the USA, where regulations vary from state to state), providing effective security is more challenging.83 Interstate cooperation raises further concerns, as evidenced by the case of China and its combination of strict protection of its own citizens and society with a broad policy of acquiring genetic data of foreigners. These cases show clearly that while legal provisions regulate the collection, storage, and use of such sensitive data, there is also a need to study and develop the issues related to access to the databases, especially in terms of verifying whether the potential users are bona fide scientists and whether their purposes are purely research-oriented.

Nevertheless, there is no doubt that the construction of clinical-molecular databases is a prerequisite for the development of medicine based on the knowledge of variations in the human genome. Both legal provisions and ethical codes of conduct must continuously evolve to keep pace with scientific and technological developments, while balancing 3 fundamental objectives: creation and development of international databases at a level enabling their scientific and clinical application; protection of the rights of individuals (data subjects and their family members); and prevention of the weaponization of genetic material at the international level, as well as at the level of intrastate relations between various groups.