Sequencing introduced false positive rare taxa lead to biased microbial community diversity, assembly, and interaction interpretation in amplicon studies

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Documents

  • Full text

    Publisher's published version, 6.69 MB, PDF document

  • Yangyang Jia
  • Shengguo Zhao
  • Wenjie Guo
  • Ling Peng
  • Fang Zhao
  • Lushan Wang
  • Guangyi Fan
  • Yuanfang Zhu
  • Dayou Xu
  • Guilin Liu
  • Ruoqing Wang
  • Xiaodong Fang
  • He Zhang
  • Karsten Kristiansen
  • Wenwei Zhang
  • Jianwei Chen

Background: A growing number of studies have demonstrated that rare taxa may make disproportionate functional and ecological contributions to a microbial community. However, the study of the microbial rare biosphere is hampered by the inherent scarcity of these taxa and the limitations of currently available techniques. In metabarcoding amplicon sequencing, the most widely used approach, sample index misassignment can introduce sample-wise cross-contamination. Although downstream bioinformatic quality control and clustering or denoising algorithms can remove sequencing errors and non-biological artifact reads, no algorithm can eliminate high-quality reads that originate from sample-wise cross-contamination introduced by index misassignment, making it difficult to distinguish bona fide rare taxa from potential false positives in metabarcoding studies.

Results: We thoroughly evaluated the rate of index misassignment of the widely used NovaSeq 6000 and DNBSEQ-G400 sequencing platforms using both commercial and customized mock communities, and observed a significantly lower fraction of potential false positive reads for DNBSEQ-G400 than for NovaSeq 6000 (0.08% vs. 5.68%). Stochastically introduced false positive or false negative rare taxa could cause significant batch effects. These false detections could also inflate the alpha diversity of relatively simple microbial communities and lead to underestimation of that of complex ones. A further test using a set of cow rumen samples showed that the two sequencing platforms reported different rare taxa. Correlation analysis of the rare taxa detected by each sequencing platform demonstrated that the rare taxa identified by the DNBSEQ-G400 platform were much more likely to be correlated with the physicochemical properties of rumen fluid than those identified by the NovaSeq 6000 platform. Community assembly mechanism and microbial network correlation analyses indicated that false positive or false negative rare taxa detection could lead to biased inference of community assembly mechanisms and to the identification of spurious keystone species.

Conclusions: We strongly recommend proper positive, negative, and blank controls, technical replicates, and careful sequencing platform selection in future amplicon studies, especially when the microbial rare biosphere is the focus.
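
As an illustration only, the idea of a "fraction of potential false positive reads" in a mock community can be sketched in a few lines of Python. This is not the authors' pipeline: the function names, the taxon labels, and the read counts below are all hypothetical, and the sketch simply treats any read assigned to a taxon outside the known mock composition as a potential false positive.

```python
from typing import Dict, Set


def false_positive_fraction(counts: Dict[str, int], expected: Set[str]) -> float:
    """Fraction of reads assigned to taxa that are not part of the mock community."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    unexpected = sum(n for taxon, n in counts.items() if taxon not in expected)
    return unexpected / total


def observed_richness(counts: Dict[str, int], min_reads: int = 1) -> int:
    """Number of taxa detected with at least `min_reads` reads."""
    return sum(1 for n in counts.values() if n >= min_reads)


# Hypothetical mock community with three known members (assumption, not study data).
expected_taxa = {"taxon_A", "taxon_B", "taxon_C"}

# Hypothetical read counts; taxon_X and taxon_Y stand in for reads that
# could have arrived through index misassignment from other samples.
sample_counts = {
    "taxon_A": 5000,
    "taxon_B": 3000,
    "taxon_C": 1990,
    "taxon_X": 6,
    "taxon_Y": 4,
}

fp = false_positive_fraction(sample_counts, expected_taxa)
richness = observed_richness(sample_counts)

print(f"potential false positive read fraction: {fp:.2%}")
print(f"observed richness: {richness} (expected {len(expected_taxa)})")
```

In this toy sample a very small share of misassigned reads is enough to raise the observed richness of a simple community above its true value, which is the kind of alpha diversity inflation the abstract describes. Raising `min_reads` is one possible filter, but it cannot by itself separate genuine rare taxa from contaminants, which is why the abstract recommends controls and technical replicates.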

Original language: English
Article number: 43
Journal: Environmental Microbiomes
Volume: 17
Number of pages: 18
ISSN: 1944-3277
DOI
Status: Published - 2022

Bibliographical note

Funding Information:
This work was partly supported by the National Natural Science Foundation of China (grant number 32100047), the Agricultural Science and Technology Innovation Program (grant number ASTIP-IAS12), the State Key Laboratory of Animal Nutrition (grant number 2004DA125184G2108) and the Science Technology and Innovation Committee of Shenzhen Municipality, China (grant number SGDX20190919142801722).

Funding Information:
The authors thank China National GeneBank and GeneBank (Qingdao) for sequencing and experiment coordination.

Publisher Copyright:
© 2022, The Author(s).

ID: 318802509