This item contains a gzipped archive with ~13,000 orthogroups used to study molecular evolution in this project.
Archive:
krill.orthogroups.tar.gz
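The archive can be inspected before unpacking it. The following is a minimal sketch using Python's standard tarfile module; the helper name list_archive_members is illustrative and not part of the dataset.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the member names of a gzipped tar archive without extracting it."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()


# Example call (assumes the archive sits in the working directory):
# names = list_archive_members("krill.orthogroups.tar.gz")
```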
Contents of archive:
krill.proteinortho.tsv - the primary output table from Proteinortho. Describes which protein sequences from which species belong to the same orthogroup. The format follows the program's standard output.
krill.proteinortho.tsv.seqs.csv - a processed table that also contains the actual sequences, one sequence per line (see below).
the alignments directory - contains every orthogroup (OG) as unaligned and aligned files in FASTA format (see below).
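As a rough sketch of how krill.proteinortho.tsv might be read, the snippet below assumes the usual Proteinortho layout: a tab-separated table whose header line starts with "# Species", three leading summary columns, then one column per species, with "*" marking an absent species and commas separating multiple sequences from the same species. These layout details are assumptions to check against the actual file; the function name read_proteinortho is illustrative.

```python
import csv


def read_proteinortho(handle):
    """Yield one dict per orthogroup: species column name -> list of sequence IDs.

    Assumes three leading summary columns followed by one column per species.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[3:]  # columns after the three summary columns
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any further comment lines
        yield {
            name: cell.split(",")
            for name, cell in zip(species, row[3:])
            if cell != "*"  # "*" is assumed to mean "no sequence from this species"
        }
```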
Format of the krill.proteinortho.tsv.seqs.csv table
The fields are:
NR = orthogroup number
ORTHO_GROUP = orthogroup ID
N_SPECIES = the number of species represented in this orthogroup
N_GENES = the number of genes/sequences in this orthogroup
N_MATCHING[o] = number of sequences matching outgroup species for this orthogroup
N_NON_MATCHING = number of sequences matching ingroup species for this orthogroup
HEADER = the name of this particular sequence
SEQ = the protein sequence
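The fields above lend themselves to grouping sequences by orthogroup. The sketch below assumes the table is comma-delimited and starts with a header row that uses the field names listed above; neither is stated here, so the delimiter is left as a parameter. The function name group_sequences is illustrative.

```python
import csv
from collections import defaultdict


def group_sequences(handle, delimiter=","):
    """Map each ORTHO_GROUP ID to a list of (HEADER, SEQ) pairs.

    Assumes a header row containing the ORTHO_GROUP, HEADER and SEQ fields.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        groups[row["ORTHO_GROUP"]].append((row["HEADER"], row["SEQ"]))
    return dict(groups)


# Example call (file name taken from the archive listing):
# with open("krill.proteinortho.tsv.seqs.csv", newline="") as handle:
#     groups = group_sequences(handle)
```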
Contents of the alignments directory
Each orthogroup is represented by up to four FASTA files:
OG*.cds.ginsi.fasta.orig = the original, unaligned and unfiltered sequences
OG*.cds.ginsi.fasta = the aligned and filtered sequences
OG*.cds.ginsi.fasta.without_cold_euphausia.fasta = the aligned and filtered sequences after removing cold-associated Euphausia species
OG*.cds.ginsi.fasta.without_cold_thysanoessa.fasta = the aligned and filtered sequences after removing cold-associated Thysanoessa species
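Any of the four files can be read with a few lines of standard-library Python. The sketch below is a minimal FASTA reader plus a sanity check that every sequence in an aligned file has the same length; the function names are illustrative, and the reader assumes sequence names are unique within a file.

```python
def read_fasta(handle):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may wrap over several lines
    return {key: "".join(parts) for key, parts in records.items()}


def is_aligned(records):
    """True if all sequences share one length, as expected for an alignment."""
    return len({len(seq) for seq in records.values()}) <= 1
```

Running is_aligned on an OG*.cds.ginsi.fasta.orig file would normally be expected to return False, since those sequences are unaligned.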
Funding
Local adaptation and genome evolution in crustacean zooplankton: how does size matter?