TrustGWAS: A full-process workflow for encrypted GWAS using multi-key homomorphic encryption and pseudorandom number perturbation

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

  • Meng Yang
  • Chuwen Zhang
  • Xiaoji Wang
  • Xingmin Liu
  • Shisen Li
  • Jianye Huang
  • Zhimin Feng
  • Xiaohui Sun
  • Fang Chen
  • Shuang Yang
  • Ming Ni
  • Lin Li
  • Yanan Cao
  • Feng Mu
The statistical power of genome-wide association studies (GWASs) is affected by the effective sample size. However, the privacy and security concerns associated with individual-level genotype data pose great challenges for cross-institutional cooperation. Full-process cryptographic solutions are in demand but still lacking, especially for the essential principal-component analysis (PCA) step. Here, we present TrustGWAS, a complete solution for secure, large-scale GWAS. It recapitulates gold-standard PLINK results without compromising privacy and supports basic PLINK steps, including quality control (QC), linkage disequilibrium (LD) pruning, PCA, the chi-square test, the Cochran-Armitage trend test, covariate-supported logistic regression and linear regression, and their sequential combinations. TrustGWAS leverages pseudorandom number perturbations for PCA and a multiparty scheme of multi-key homomorphic encryption for all other modules. TrustGWAS can evaluate 100,000 individuals with 1 million variants and complete the QC-LD-PCA-regression workflow within 50 h. We further discover gene loci associated with fasting blood glucose, consistent with the findings of the ChinaMAP project.
Original language: English
Journal: Cell Systems
Volume: 13
Issue number: 9
Pages (from-to): 752-767.e6
Number of pages: 23
ISSN: 2405-4712
DOI
Status: Published - 2022

Bibliographical note

Funding Information:
This research is supported by the Ministry of Science and Technology of the People's Republic of China's program titled “Science & Technology Boost Economy (2020)” (SQ2020YFF0426292), the National Key R&D Program of China (No. 2020YFA0112800, 2020YFA0112801 to Y.C.), and the CAMS Innovation Fund for Medical Sciences (2020-12M-5-002). We thank Professor Thomas Hamelryck from the Department of Biology, University of Copenhagen, for advice on problem definition and experiment planning. We thank Professor Yu Yu from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, for insightful advice on how to use the MK-CKKS framework. We thank Professor Kai Chen, J.X.Z., and D.C. from the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, for discussions regarding pseudorandom number perturbations.

M.Y. conceived the problem and designed the studies. M.Y., C.Z., X.L., S.L., and J.H. developed the TrustGWAS protocols and performed the benchmark analysis. X.W., Z.F., and X.S. performed the GWAS analysis for the 1KG and ChinaMAP datasets. L.L. and Y.C. coordinated the real-world datasets and facilitated insightful discussions on GWAS analysis. F.C. provided advice on genetics analysis. M.N., S.Y., and F.M. supervised the work. M.Y. and C.Z. wrote the manuscript.

F.M. is an employee and shareholder of MGI Tech Co. Ltd. We have a patent related to this work, with patent application number 202110807148.6 in China.

Publisher Copyright:
© 2022 Elsevier Inc.

ID: 333435982