
8. Ecological genomics of the Northern krill: Recombination rates and demographic history

dataset
posted on 2024-03-28, 00:10, authored by Andreas Wallberg and Per Unneberg

This item contains archives of data and results used to assess recombination rates (iSMC), demographic history (PSMC, MSMC) and haplotype ages (GEVA) using coalescent methods.

Population definitions

Population definitions are the same as described in a different item:

  1. "at vs. me" = Atlantic Ocean samples (n=67) vs. the Mediterranean (i.e. Barcelona) samples (n=7).
  2. "we vs. ea" = South-West North Atlantic Ocean (n=20) vs. North-East North Atlantic Ocean (n=47). In files using this contrast, sometimes the label "wa" is used instead of "we" for the South-West North Atlantic Ocean samples.

Contents:

  1. psmc_dataset.psmcfa.gz, datasets for PSMC-analyses containing signatures of heterozygosity in the reference specimen that were converted from VCF into the fasta-like PSMCFA format.
  2. msmc_datasets.tar.gz, datasets for MSMC-analyses containing signatures of heterozygosity in the reference specimen that were converted from VCF into TSV.
  3. ismc_dataset.tar.gz, the VCF dataset and accessory files for iSMC-analyses used to infer recombination rates.
  4. geva_datasets.candidates.at_vs_me.tar.gz, the re-coded VCF and binary format datasets as well as analysis output for the 660 candidate gene loci analyzed for "at" and "me" populations in the "at vs. me" contrast.
  5. geva_datasets.candidates.we_vs_ea.tar.gz, the re-coded VCF and binary format datasets as well as analysis output for the 34 candidate gene loci analyzed for "we" and "ea" populations in the "we vs. ea" contrast.
  6. geva_results.candidates.at_vs_me.tar.gz, the resulting age estimates of minor alleles in the "at vs. me" contrast.
  7. geva_results.candidates.we_vs_ea.tar.gz, the resulting age estimates of minor alleles in the "we vs. ea" contrast.

psmc_dataset.psmcfa.gz

A FASTA-like file that encodes the distribution of heterozygous genotypes across 4,911 sequences in the diploid reference specimen at 10 bp window resolution. The character states are:

  • N=a window with only inaccessible sites (i.e. missing data)
  • T=a window with accessible data
  • K=a window with accessible data and at least one heterozygous genotype

This format is further documented on the site of the original tool: https://github.com/lh3/psmc
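The three character states can be tallied per sequence to get a quick overview of accessibility and heterozygosity. The following is a minimal sketch, assuming the standard FASTA-like layout with ">" header lines; the function names and the toy record are hypothetical and not part of the dataset or of PSMC.

```python
import gzip
from collections import Counter


def count_window_states(lines):
    """Count window characters (N, T, K) per sequence in PSMCFA-formatted lines."""
    counts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new per-sequence tally.
            name = line[1:].strip()
            counts[name] = Counter()
        elif name is not None:
            # Sequence line: every character is one window.
            counts[name].update(line)
    return counts


def het_window_fraction(counter):
    """Fraction of accessible windows (T or K) that hold a heterozygous genotype (K)."""
    accessible = counter["T"] + counter["K"]
    return counter["K"] / accessible if accessible else float("nan")


# Hypothetical toy record used only for illustration.
example = [">seq_s_1", "NNTTTKTTTT", "TTKTNN"]
counts = count_window_states(example)
fraction = het_window_fraction(counts["seq_s_1"])
```

For the real file, the same function could be fed a text-mode handle, for example `gzip.open("psmc_dataset.psmcfa.gz", "rt")`.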

msmc_datasets.tar.gz

This archive contains one TSV file per sequence (n=5,176) that specifies the distribution of heterozygous genotypes. Each file contains four fields. Example: seq_s_1 2039 171 TC

  1. name of sequence
  2. position of the heterozygous genotype
  3. number of accessible sites since the last heterozygous genotype
  4. the heterozygous genotype (a string with only two alleles in this case, because a single individual is analysed)

This format is further documented on the site of the original tool: https://github.com/stschiff/msmc-tools/blob/master/msmc-tutorial/guide.md
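The four fields can be read into a small typed record. The sketch below is an illustration under stated assumptions: the names `HetSite`, `parse_msmc_line` and `crude_heterozygosity` are hypothetical, the second example line is invented, and the heterozygosity estimate assumes that field 3 counts accessible sites up to and including the current position, as described in the MSMC guide.

```python
from typing import NamedTuple


class HetSite(NamedTuple):
    sequence: str               # field 1: name of sequence
    position: int               # field 2: position of the heterozygous genotype
    accessible_since_last: int  # field 3: accessible sites since the last one
    genotype: str               # field 4: alleles, e.g. "TC"


def parse_msmc_line(line):
    """Split one whitespace- or tab-delimited record into a HetSite."""
    seq, pos, n_acc, geno = line.split()
    return HetSite(seq, int(pos), int(n_acc), geno)


def crude_heterozygosity(sites):
    """Heterozygous sites per accessible site; ignores sites after the last record."""
    total_accessible = sum(s.accessible_since_last for s in sites)
    return len(sites) / total_accessible if total_accessible else float("nan")


lines = [
    "seq_s_1 2039 171 TC",  # example given in the item description
    "seq_s_1 2500 329 AG",  # hypothetical second record
]
sites = [parse_msmc_line(l) for l in lines]
het = crude_heterozygosity(sites)
```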

ismc_dataset.tar.gz

This archive contains several files:

  • 1.merged_contigs.vcf = specifies the distribution of heterozygous genotypes in VCF format
  • 1.merged_contigs.tab = specifies the lengths of sequences (TSV format)
  • 1.merged_contigs.bpp = the program control file with run-time parameters (TXT)
  • 1.merged_contigs.fasta = specifies accessible and inaccessible sites ("N") in FASTA format
  • 1.merged_contigs.out_estimates.txt = the summary results of the analysis (TXT)
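Because the FASTA file marks inaccessible sites with "N", the accessible fraction of each sequence can be computed directly from it. This is a minimal sketch, assuming that every character other than "N" (in either case) denotes an accessible site; the function name and the toy records are hypothetical.

```python
def accessible_fraction(fasta_lines):
    """Return {sequence name: fraction of sites that are not 'N'}."""
    totals = {}  # name -> [accessible sites, total sites]
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            totals[name] = [0, 0]
        elif name is not None:
            n_masked = line.upper().count("N")
            totals[name][0] += len(line) - n_masked
            totals[name][1] += len(line)
    return {
        seq: (acc / tot if tot else float("nan"))
        for seq, (acc, tot) in totals.items()
    }


# Hypothetical toy records used only for illustration.
toy = [">contig_1", "ACGTNNNN", "ACGT", ">contig_2", "NNNN"]
fractions = accessible_fraction(toy)
```

On the real data, the lines could come from `open("1.merged_contigs.fasta")`.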

geva_datasets.candidates.at_vs_me.tar.gz and geva_datasets.candidates.we_vs_ea.tar.gz

These archives hold data and results from analysing variant ages at each of the candidate gene loci with divergent haplotypes (660 loci in the "at vs. me" contrast and 34 loci in the "we vs. ea" contrast). For each locus, the files comprise:

  • Two recoded VCF files. In the first file, the minor allele in one of the two populations (e.g. "at") was taken to represent the derived allele and coded as the ALT allele. In the second file, the minor allele in the other group (e.g. "me") was taken to represent the derived allele and coded as the ALT allele.
  • Intermediate data files generated by GEVA by processing the VCF files (*.bin, *.marker.txt, *.sample.txt), including a log and err file.
  • Results files (*.pairs.txt.gz and *.sites.txt). The "*.sites.txt" files contain allele age estimates under the mutation clock (M), recombination clock (R), and joint clock (J) models. The format of these files is described on the site of the original tool: https://github.com/pkalbers/geva
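Estimates for a single clock model can be pulled out of a "*.sites.txt" file by filtering on the clock column. The sketch below assumes a whitespace-delimited table with a header row and a column named `Clock` holding M, R or J; the column names in the toy table are assumed from the GEVA documentation and should be checked against the header of the actual files, and all values shown are invented.

```python
import io


def read_sites(handle, clock="J", clock_col="Clock"):
    """Return rows (as dicts of strings) whose clock column equals `clock`."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        row = dict(zip(header, fields))
        if row.get(clock_col) == clock:
            rows.append(row)
    return rows


# Hypothetical toy table: column names are an assumption, values are invented.
toy = io.StringIO(
    "MarkerID Clock Filtered N_Concordant N_Discordant PostMean PostMode PostMedian\n"
    "10 M 1 5 20 1200.5 1100.0 1150.0\n"
    "10 R 1 5 20 900.0 850.0 880.0\n"
    "10 J 1 5 20 1000.0 950.0 980.0\n"
    "11 J 1 4 18 300.0 280.0 290.0\n"
)
joint = read_sites(toy, clock="J")
joint_ids = [row["MarkerID"] for row in joint]
```

Keeping the values as strings leaves the choice of numeric conversion to the caller, for example `float(row["PostMean"])` if that column is present.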

geva_results.candidates.at_vs_me.tar.gz and geva_results.candidates.we_vs_ea.tar.gz

Each archive contains four TSV files, two per population (e.g. "at"). One file collects the minor allele age estimates under all three clock models, and the other holds the estimates under the joint model only.

Funding

Climate genomics in the Northern krill: the past, present and future of an important marine species

Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning

Publisher

Uppsala University
