
SOAPMetaS: profiling large metagenome datasets efficiently on distributed clusters

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Shixu He
Zhibo Huang
Xiaohan Wang
Lin Fang
Shengkang Li
Yong Zhang
Gengyun Zhang

The rapid growth of data size in metagenome research has raised the demand for new tools that process large datasets efficiently. To accelerate metagenome profiling in big-data scenarios, we developed SOAPMetaS, a marker gene-based, multiple-sample metagenome profiling tool built on Apache Spark. SOAPMetaS demonstrates high performance and scalability when processing large datasets. It can process 80 samples of FASTQ data, totaling 416 GiB, in around half an hour, and the accuracy of its species profiling results is similar to that of MetaPhlAn2. SOAPMetaS can handle large volumes of metagenome data more efficiently than commonly used single-machine tools.

Original language	English
Journal	Bioinformatics
Volume	37
Issue number	7
Pages (from-to)	1021-1023
Number of pages	3
ISSN	1367-4803
DOI	https://doi.org/10.1093/bioinformatics/btaa697
Status	Published - 2021

ID: 272641995

Department of Biology
