To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW)

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW). / Anwar, Muhammad Zohaib; Lanzen, Anders; Bang-Andreasen, Toke; Jacobsen, Carsten Suhr.

In: GigaScience, Vol. 8, No. 8, giz096, 2019.

Harvard

Anwar, MZ, Lanzen, A, Bang-Andreasen, T & Jacobsen, CS 2019, 'To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW)', GigaScience, vol. 8, no. 8, giz096. https://doi.org/10.1093/gigascience/giz096

APA

Anwar, M. Z., Lanzen, A., Bang-Andreasen, T., & Jacobsen, C. S. (2019). To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW). GigaScience, 8(8), Article giz096. https://doi.org/10.1093/gigascience/giz096

Vancouver

Anwar MZ, Lanzen A, Bang-Andreasen T, Jacobsen CS. To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW). GigaScience. 2019;8(8):giz096. https://doi.org/10.1093/gigascience/giz096

Author

Anwar, Muhammad Zohaib ; Lanzen, Anders ; Bang-Andreasen, Toke ; Jacobsen, Carsten Suhr. / To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW). In: GigaScience. 2019 ; Vol. 8, No. 8.

Bibtex

@article{c2073fbe62ff42e7a474442e3dc6e6e4,

title = "To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW)",

abstract = "BACKGROUND: Metatranscriptomics has been used widely for investigation and quantification of microbial communities' activity in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the environment. Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW) implemented in a modular, reproducible structure. Metatranscriptomics typically uses short sequence reads, which can either be directly aligned to external reference databases ({"}assembly-free approach{"}) or first assembled into contigs before alignment ({"}assembly-based approach{"}). We also compare CoMW (assembly-based implementation) with an assembly-free alternative workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate their accuracy in precision and recall using generic and specialized hierarchical protein databases. RESULTS: CoMW provided significantly fewer false-positive results, resulting in more precise identification and quantification of functional genes in metatranscriptomes. Using the comprehensive database M5nr, the assembly-based approach identified genes with only 0.6% false-positive results at thresholds ranging from inclusive to stringent compared with the assembly-free approach, which yielded up to 15% false-positive results. Using specialized databases (carbohydrate-active enzyme and nitrogen cycle), the assembly-based approach identified and quantified genes with 3-5 times fewer false-positive results. We also evaluated the impact of both approaches on real-world datasets. CONCLUSIONS: We present an open source de novo assembly-based CoMW. Our benchmarking findings support assembling short reads into contigs before alignment to a reference database because this provides higher precision and minimizes false-positive results.",

keywords = "alignment, assembly, benchmarking, false-positive results, metatranscriptomics, precision, recall",

author = "Anwar, {Muhammad Zohaib} and Lanzen, Anders and Bang-Andreasen, Toke and Jacobsen, {Carsten Suhr}",

year = "2019",

doi = "10.1093/gigascience/giz096",

language = "English",

volume = "8",

journal = "GigaScience",

issn = "2047-217X",

publisher = "Oxford Academic",

number = "8",

}

RIS

TY - JOUR

T1 - To assemble or not to resemble - A validated Comparative Metatranscriptomics Workflow (CoMW)

AU - Anwar, Muhammad Zohaib

AU - Lanzen, Anders

AU - Bang-Andreasen, Toke

AU - Jacobsen, Carsten Suhr

PY - 2019

Y1 - 2019

AB - BACKGROUND: Metatranscriptomics has been used widely for investigation and quantification of microbial communities' activity in response to external stimuli. By assessing the genes expressed, metatranscriptomics provides an understanding of the interactions between different major functional guilds and the environment. Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW) implemented in a modular, reproducible structure. Metatranscriptomics typically uses short sequence reads, which can either be directly aligned to external reference databases ("assembly-free approach") or first assembled into contigs before alignment ("assembly-based approach"). We also compare CoMW (assembly-based implementation) with an assembly-free alternative workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate their accuracy in precision and recall using generic and specialized hierarchical protein databases. RESULTS: CoMW provided significantly fewer false-positive results, resulting in more precise identification and quantification of functional genes in metatranscriptomes. Using the comprehensive database M5nr, the assembly-based approach identified genes with only 0.6% false-positive results at thresholds ranging from inclusive to stringent compared with the assembly-free approach, which yielded up to 15% false-positive results. Using specialized databases (carbohydrate-active enzyme and nitrogen cycle), the assembly-based approach identified and quantified genes with 3-5 times fewer false-positive results. We also evaluated the impact of both approaches on real-world datasets. CONCLUSIONS: We present an open source de novo assembly-based CoMW. Our benchmarking findings support assembling short reads into contigs before alignment to a reference database because this provides higher precision and minimizes false-positive results.

KW - alignment

KW - assembly

KW - benchmarking

KW - false-positive results

KW - metatranscriptomics

KW - precision

KW - recall

U2 - 10.1093/gigascience/giz096

DO - 10.1093/gigascience/giz096

M3 - Journal article

C2 - 31363751

AN - SCOPUS:85070884436

VL - 8

JO - GigaScience

JF - GigaScience

SN - 2047-217X

IS - 8

M1 - giz096

ER -

ID: 226786014

Department of Biology