Data for the Saqqaq genome project
This page holds links to the sequence data for a 4,000-year-old human genome described in
Rasmussen et al., Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo, Nature 463, 757-762 (11 February 2010)
Genotyping data
Two data sets are described in the online supplementary material: the complete genotyping data and the high-confidence data. The data format is described below.
Data format
The two data sets contain data in the same format.
Files with the extension ".diff" hold positions that differ from the reference human genome (hg18). Files with the extension ".same" hold positions that are the same as the reference human genome (compressed with gzip).
The files all have the same format and contain the genotype for each position (always relative to the hg18 plus-strand) and information about posterior probability etc. The files have the following tab-separated columns:
- The chromosome name
- Position using 0-indexation (*)
- The reference nucleotide in hg18
- Indicates whether the genotype is the same as the reference nucleotide (y) or not (n)
- Genotype called by the program SNPest (see supplementary material)
- (1-PP), where PP is the posterior probability of the genotype.
- Depth: The number of reads covering the position
- Repeat: If the position lies in an annotated repeat, the ID is given here (otherwise it is '-')
- Distance to nearest SNP (always 1 in the .same-files)
- RS-number: If the position overlaps an annotated SNP, the dbSNP rs-number is given (otherwise it is '-', and in that case the next three fields are also '-')
- Type of dbSNP entry ('single', 'indel' etc. - see the UCSC genome browser for details).
- Strand for dbSNP entry ('+' or '-')
- Type of SNP (e.g. 'AC' for a SNP of type 'single')
* Note that chromosomal positions start at 0. In the UCSC Genome Browser and many other resources, counting starts at 1, so add one to the positions in the files described above before comparing them with these resources.
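As an illustration of the column layout above, here is a minimal Python sketch for reading these files. It is not part of the project's own tooling. The names GenotypeRecord, parse_line, open_text and read_genotypes are hypothetical, and the sample line at the end is invented to show the layout. The sketch assumes that every line has exactly 13 tab-separated fields, that the depth and SNP-distance columns are integers, and that the (1-PP) column parses as a floating-point number.

```python
import gzip
from typing import IO, Iterator, NamedTuple, Optional

N_COLUMNS = 13  # number of tab-separated columns listed above


class GenotypeRecord(NamedTuple):
    """One line of a .diff or .same file (hypothetical helper type)."""
    chrom: str
    pos0: int                    # 0-based position, as stored in the file
    ref: str                     # reference nucleotide in hg18
    same_as_ref: bool            # True for 'y', False for 'n'
    genotype: str                # genotype called by SNPest, kept as text
    error_prob: float            # 1 - PP
    depth: int
    repeat: Optional[str]        # None when the file has '-'
    snp_distance: int            # assumed to be an integer
    rs_number: Optional[str]     # None when the file has '-'
    dbsnp_type: Optional[str]
    dbsnp_strand: Optional[str]  # '+' or '-', or None without a dbSNP entry
    snp_type: Optional[str]

    @property
    def pos1(self) -> int:
        """1-based position, for comparison with the UCSC Genome Browser."""
        return self.pos0 + 1


def parse_line(line: str) -> GenotypeRecord:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} columns, got {len(fields)}")
    (chrom, pos, ref, same, genotype, err, depth,
     repeat, dist, rs, db_type, db_strand, snp_type) = fields
    # A '-' in the strand column is a real value (minus strand) when an
    # rs-number is present, so the rs-number decides whether the three
    # dbSNP columns are populated at all.
    has_dbsnp = rs != "-"
    return GenotypeRecord(
        chrom=chrom,
        pos0=int(pos),
        ref=ref,
        same_as_ref=(same == "y"),
        genotype=genotype,
        error_prob=float(err),
        depth=int(depth),
        repeat=None if repeat == "-" else repeat,
        snp_distance=int(dist),
        rs_number=rs if has_dbsnp else None,
        dbsnp_type=db_type if has_dbsnp else None,
        dbsnp_strand=db_strand if has_dbsnp else None,
        snp_type=snp_type if has_dbsnp else None,
    )


def open_text(path: str) -> IO[str]:
    """Open a file as text, using gzip if it starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_genotypes(path: str) -> Iterator[GenotypeRecord]:
    """Yield one record per non-empty line of a .diff or .same file."""
    with open_text(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)


# Invented example line, only to illustrate the format.
example = "chr1\t999\tA\tn\tAG\t0.001\t20\t-\t15\trs123\tsingle\t-\tAG\n"
record = parse_line(example)
print(record.chrom, record.pos1, record.genotype)  # prints "chr1 1000 AG"
```

Checking the file's first two bytes rather than its name means the same function works for both compressed and uncompressed files. A typical use would be to iterate over read_genotypes(path) and keep only records whose error_prob is below a chosen threshold.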
Ancient human genome in Nature
Links
The Center for Biological Sequence Analysis at the Technical University of Denmark hosts a genotype-phenotype database of the genome.
The raw data from the project have been deposited in the NCBI Short Read Archive, accession number SRA010102.