SciLifeLab

5. Ecological genomics of the Northern krill: Gene family evolution and homology

Dataset posted on 2024-03-28, 00:10, authored by Andreas Wallberg, Per Unneberg

This item contains analyses of gene homology between the Northern krill and nine other crustacean or krill species. The datasets span sequence files, orthology assessments and statistics about gene family evolution, divergence and molecular evolution. The species are associated with the following labels for file names and sequence headers:

mnor = Meganyctiphanes norvegica (Northern krill)

cqua = Cherax quadricarinatus (Australian red claw crayfish)

dmag = Daphnia magna (water flea)

eaff = Eurytemora affinis (copepod)

hame = Homarus americanus (American lobster)

hazt = Hyalella azteca (amphipod)

phaw = Parhyale hawaiensis (amphipod)

pmon = Penaeus monodon (Black tiger shrimp)

pvan = Penaeus vannamei (Whiteleg shrimp)

pvir = Procambarus virginalis (Marbled crayfish)

Contents:

  1. crustacean_homologs.non_redundant_datasets.tar.gz, an archive with peptide sequences for ten crustacean species, including those derived from the gene models in the Northern krill genome assembly.
  2. crustacean_homologs.proteinortho_orthologs.tsv, results of the orthology inference among the peptide sequences using Proteinortho (TSV file in the standard Proteinortho format). Documentation of the format is available on the site of the original tool: https://gitlab.com/paulklemm_PHD/proteinortho#readme
  3. crustacean_homologs.proteinortho_1011_single_copy_orthologs_gblocks_unfiltered.tar.gz, multiple sequence alignments of n=1,011 single-copy orthologs between the ten species in FASTA format.
  4. crustacean_homologs.proteinortho_1011_single_copy_orthologs_gblocks_filtered.tar.gz, as above but unreliable alignment positions have been deleted using Gblocks.
  5. crustacean_homologs.swiftortho_gene_families.tar.gz, shared gene families inferred across the ten species using SwiftOrtho.
  6. brh_mnor_esup_krill.peptide_sequences.fasta.tar.gz, peptide sequences in FASTA format used to detect reciprocal best hits between the Northern krill and the Antarctic krill.
  7. brh_mnor_esup_krill.cds_sequences.fasta.tar.gz, the corresponding CDS nucleotide sequences in FASTA.
  8. brh_mnor_esup_krill.gene_alignments.tar.gz, pairwise sequence alignments and results of the analysis of synonymous and non-synonymous sites and divergences (mixed formats).
  9. brh_mnor_esup_krill.kaks_S_N_sites_dn_ds_dnds.tar.gz, summary tables of synonymous and non-synonymous sites and divergences (TSV files).
  10. nrf6_alignments.tar.gz, multiple sequence alignments between sequences of the nrf-6 gene in the Northern krill, Antarctic krill and Whiteleg shrimp (FASTA).
  11. wgd.datasets_and_results.tar, datasets and results used to study signatures of synonymous divergence (Ks) and whole genome duplication (wgd) among six crustaceans including the Northern krill.
  12. crustacean_opsins.rstb20210289_si_004.fasta.with_krill.aligned.fasta, a protein alignment of crustacean opsin sequences in FASTA format, including the Northern krill.
  13. crustacean_opsins.rstb20210289_si_004.fasta.with_krill.aligned.fasta.only_krill.fasta, a subset protein alignment of krill opsin sequences in FASTA format, including the Northern krill.
  14. homeodomain.fasta.with_krill.fa.aligned.fa, alignment of animal Hox genes in FASTA format, including the Northern krill.
  15. alkbh2.fasta.aligned.fasta.trimal, alignment of alkbh2 genes in FASTA format, including the Northern krill.
  16. dnmt1.fasta.aligned.fasta.trimal, alignment of dnmt1 genes in FASTA format, including the Northern krill.
  17. dnmt2.fasta.aligned.fasta.trimal, alignment of dnmt2 genes in FASTA format, including the Northern krill.
  18. dnmt3.fasta.aligned.fasta.trimal, alignment of dnmt3 genes in FASTA format, including the Northern krill.
  19. tet.fasta.aligned.fasta.trimal, alignment of tet2 genes in FASTA format, including the Northern krill.
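
As an illustration of how the Proteinortho table (item 2) relates to the single-copy ortholog alignments (items 3 and 4), the sketch below selects groups with exactly one gene in every species. It assumes the standard Proteinortho layout (a header line starting with "# Species", three numeric columns, then one column per proteome, with "*" for absent species and commas between co-orthologs). The function name and the demo rows are hypothetical, and this is not necessarily the exact filter used to obtain the 1,011 orthologs.

```python
def single_copy_orthologs(tsv_text, n_species=10):
    """Return Proteinortho groups with exactly one gene in each species.

    Assumed layout: '# Species', 'Genes', 'Alg.-Conn.' columns, followed by
    one column per input proteome. '*' marks an absent species and commas
    separate co-orthologs within a species.
    """
    header = None
    groups = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("#"):
            if header is None:
                header = fields  # first comment line holds the column names
            continue
        n_sp, n_genes = int(fields[0]), int(fields[1])
        members = fields[3:]
        one_each = all(m != "*" and "," not in m for m in members)
        if n_sp == n_species and n_genes == n_species and one_each:
            names = header[3:] if header else [str(i) for i in range(len(members))]
            groups.append(dict(zip(names, members)))
    return groups


# Hypothetical two-species table for demonstration only.
demo = (
    "# Species\tGenes\tAlg.-Conn.\ta.faa\tb.faa\n"
    "2\t2\t1\tgA1\tgB1\n"
    "2\t3\t0.5\tgA2\tgB2,gB3\n"
    "1\t1\t0\tgA3\t*\n"
)
print(single_copy_orthologs(demo, n_species=2))
```

For the real table, the file contents would be read into a string and passed with n_species=10.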

crustacean_homologs.non_redundant_datasets.tar.gz

This archive contains one FASTA file per species, as well as one TSV file per species that translates between the simplified sequence names used in the FASTA file and the original NCBI sequence labels.
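
A minimal sketch of how the translation tables could be applied to restore the original labels in a FASTA file. The column order of the TSV files is not stated here, so the code assumes two tab-separated columns (simplified name first, original NCBI label second). Both function names and the example identifiers are hypothetical.

```python
def load_name_map(tsv_text):
    """Map simplified names to original labels.

    Assumption: column 1 = simplified name, column 2 = original NCBI label.
    """
    mapping = {}
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def restore_headers(fasta_text, mapping):
    """Replace simplified FASTA headers with the original labels when known."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            short = parts[0] if parts else ""
            # Headers without a mapping entry are left unchanged.
            out.append(">" + mapping.get(short, short))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical identifiers for demonstration only.
name_map = load_name_map("mnor_1\tXP_000001.1 hypothetical protein\n")
print(restore_headers(">mnor_1\nMKV\n>other\nAAA\n", name_map))
```

If the real files use a different column order, swapping the two indices in load_name_map is the only change needed.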

crustacean_homologs.swiftortho_gene_families.tar.gz

This archive contains multiple files:

  • all_matches.out = a TSV output file from SwiftOrtho with homology information based on all vs. all hits. The format is similar to tabular BLAST output ("blastp -m8") but also contains sequence lengths in the last two columns. The SwiftOrtho format is documented on the site of the original tool: https://github.com/Rinoahu/SwiftOrtho
  • all_matches.out.30_30.orth.apc = line-by-line gene family assignments using the Affinity Propagation Clustering (APC) algorithm (TSV).
  • all_matches.out.30_30.orth.apc.table.ALL.csv = the above gene family assignment but written as a count data file for CAFE. The format is described on the site of the original tool: https://github.com/hahnlab/CAFE5
  • all_matches.out.30_30.orth.apc.table.mnor.csv, as above but filtered to only contain gene families with at least one gene in the Northern krill.
  • all_matches.out.30_30.orth.apc.table.mnor_dmag.csv, as above but filtered to only contain gene families with at least one gene in the Northern krill and the water flea.
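
The filtering that distinguishes the three count tables can be sketched as follows. The example assumes a CAFE-style table with "Desc" and "Family ID" columns followed by one count column per species label, separated by tabs. The delimiter actually used in the .csv files is not stated here, so it is exposed as a parameter. The function name and the demo counts are hypothetical.

```python
import csv
import io


def filter_families(table_text, required, delimiter="\t"):
    """Keep gene families with at least one gene in every label in `required`.

    Assumption: the header holds 'Desc', 'Family ID' and one column per
    species label (e.g. 'mnor', 'dmag') with integer gene counts.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    kept = [row for row in reader
            if all(int(row[label]) >= 1 for label in required)]
    return reader.fieldnames, kept


# Hypothetical counts for demonstration only.
demo_table = (
    "Desc\tFamily ID\tmnor\tdmag\thame\n"
    "(null)\tfam1\t3\t0\t1\n"
    "(null)\tfam2\t1\t2\t0\n"
    "(null)\tfam3\t0\t4\t4\n"
)
_, with_mnor = filter_families(demo_table, ["mnor"])
_, with_mnor_dmag = filter_families(demo_table, ["mnor", "dmag"])
print([r["Family ID"] for r in with_mnor])
print([r["Family ID"] for r in with_mnor_dmag])
```

Passing ["mnor"] corresponds to the mnor table and ["mnor", "dmag"] to the mnor_dmag table, under the stated assumptions.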

brh_mnor_esup_krill.gene_alignments.tar.gz

This file contains pairwise sequence alignments for n=13,373 putatively orthologous genes between the Northern krill (mnor) and the Antarctic krill Euphausia superba (esup), as well as associated files to analyse sequence composition and divergence. The orthologs were detected using reciprocal best hits with BLASTP, and each gene has its own directory. Each directory contains:

  • seq.aa.fasta = the two peptide sequences encoded by the homologous genes
  • seq.nt.fasta = the corresponding nucleotide sequences
  • seq.aa.fasta.ginsi.fasta = pairwise alignment of peptide sequences
  • seq.aa.fasta.ginsi.fasta.pal2nal.fasta = aligned nucleotide sequences fitted using PAL2NAL (also includes an additional log file)
  • seq.aa.fasta.ginsi.fasta.pal2nal.fasta.nogaps.fasta = alignment after removing all columns with gaps.
  • seq.aa.fasta.ginsi.fasta.pal2nal.fasta.nogaps.fasta.axt = alignment in AXT format for KaKs Calculator
  • seq.aa.fasta.ginsi.fasta.pal2nal.fasta.nogaps.fasta.axt.YN.tsv = output from KaKs Calculator, including synonymous and non-synonymous sites and substitutions (also includes an additional log file)
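
The last two preparation steps in the list above (dropping gapped columns, then writing AXT) can be sketched as below. This is an illustration under stated assumptions, not the authors' script: it treats any column with "-" in either sequence as gapped, and it writes the simple AXT layout read by KaKs Calculator (a name line, the two sequences on one line each, then a blank line). Because PAL2NAL should place gaps in whole codons, removing gapped columns is expected to keep the reading frame. The function names and demo sequences are hypothetical.

```python
def remove_gap_columns(seq_a, seq_b, gap="-"):
    """Drop every alignment column in which either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    kept = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def to_axt(name, seq_a, seq_b):
    """Format one sequence pair as an AXT record (assumed KaKs Calculator layout)."""
    return f"{name}\n{seq_a}\n{seq_b}\n\n"


# Hypothetical codon alignment with one gapped codon.
a, b = remove_gap_columns("ATG---AAA", "ATGCCCAAG")
print(to_axt("gene_0001", a, b), end="")
```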

wgd.datasets_and_results.tar.gz

This archive contains one directory with wgd data and results of analyses per species (n=6):

  1. Meganyctiphanes norvegica
  2. Homarus americanus
  3. Penaeus monodon
  4. Hyalella azteca
  5. Eurytemora affinis
  6. Daphnia magna

The files include the coding sequences used to study wgd patterns (FASTA format), gene family clustering output (MCL), and the Ks distributions and mixture model tests. The details of these files are documented on the site of the original tool: https://wgd.readthedocs.io/en/latest/methods.html


Funding

Climate genomics in the Northern krill: the past, present and future of an important marine species

Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning

Publisher

Uppsala University