SciLifeLab

Comparative population transcriptomics in krill: reference transcriptomes (FASTA, GFF, TSV files)

dataset
posted on 2023-10-19, 13:31 authored by Andreas Wallberg

This item holds one main gzipped tar archive containing 20 nested tar archives. Each nested archive contains the reference transcriptomes and associated metadata for one krill species (20 species in total).

Archive:

krill.transcriptomes.tar.gz

Contents of the main archive (FILE, TAG, SPECIES, SIZE):

  • earm.transcriptomes.tar,earm,Euphausia similis var. armata,491.6M
  • ecry.transcriptomes.tar,ecry,Euphausia crystallorophias,89.7M
  • edin.transcriptomes.tar,edin,Euphausia distinguenda,496.5M
  • efri.transcriptomes.tar,efri,Euphausia frigida,345.2M
  • elam.transcriptomes.tar,elam,Euphausia lamelligera,515.9M
  • elos.transcriptomes.tar,elos,Euphausia longirostris,234M
  • emuc.transcriptomes.tar,emuc,Euphausia mucronata,360.4M
  • epac.transcriptomes.tar,epac,Euphausia pacifica,357.1M
  • erec.transcriptomes.tar,erec,Euphausia recurva,114.8M
  • esim.transcriptomes.tar,esim,Euphausia similis,417.9M
  • espi.transcriptomes.tar,espi,Euphausia spinifera,425.1M
  • esup.transcriptomes.tar,esup,Euphausia superba,520.6M
  • etri.transcriptomes.tar,etri,Euphausia triacantha,396M
  • eval.transcriptomes.tar,eval,Euphausia vallentini,635.1M
  • mnor.transcriptomes.tar,mnor,Meganyctiphanes norvegica,469M
  • nmeg.transcriptomes.tar,nmeg,Nematoscelis megalops,429M
  • tine.transcriptomes.tar,tine,Thysanoessa inermis,594.6M
  • tlon.transcriptomes.tar,tlon,Thysanoessa longicaudata,328.8M
  • tmac.transcriptomes.tar,tmac,Thysanoessa macrura,253.4M
  • trac.transcriptomes.tar,trac,Thysanoessa raschii,231.2M
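Because each species sits in its own nested archive, one species can be unpacked without first writing all 20 nested archives to disk. The sketch below uses Python's standard `tarfile` module and assumes the nested members are named `TAG.transcriptomes.tar` as listed above; the function name `extract_species` is hypothetical and not part of the dataset.

```python
import tarfile
from pathlib import Path


def extract_species(outer_archive: str, tag: str, dest: str) -> list[str]:
    """Extract one species' nested archive from the main archive into dest.

    Assumes nested members are named '<TAG>.transcriptomes.tar'.
    Returns the names of the regular files that were extracted.
    """
    wanted = f"{tag}.transcriptomes.tar"
    with tarfile.open(outer_archive, "r:gz") as outer:
        # Match on the basename in case the nested archives sit in a directory.
        member = next(
            (m for m in outer.getmembers() if Path(m.name).name == wanted), None
        )
        if member is None:
            raise KeyError(f"{wanted} not found in {outer_archive}")
        inner_file = outer.extractfile(member)
        # Read the nested (uncompressed) tar directly from the outer archive.
        with tarfile.open(fileobj=inner_file, mode="r:") as inner:
            # Use the safer 'data' extraction filter where this Python supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            inner.extractall(dest, **kwargs)
            return [m.name for m in inner.getmembers() if m.isfile()]
```

For example, `extract_species("krill.transcriptomes.tar.gz", "esup", "esup")` would unpack the Euphausia superba files into a directory named `esup`. Note that the gzip stream still has to be decompressed up to the requested member.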

Contents of nested archives:

Each nested tar archive contains the following set of files, where each filename begins with the species TAG from the list above:

TAG.trinity.fasta

The full Trinity transcriptome, including non-coding transcripts and alternative isoforms.
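All FASTA files in the archive can be read with a small streaming parser. The sketch below is a minimal reader written for illustration (the name `read_fasta` is hypothetical); it keeps only the first whitespace-delimited token of each header as the sequence name and joins wrapped sequence lines.

```python
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file.

    The name is the first whitespace-delimited token of the header line;
    any description after it (e.g. Trinity's 'len=...') is dropped.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

For example, `{n: len(s) for n, s in read_fasta(open("esup.trinity.fasta"))}` would give transcript lengths for one species.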

TAG.trinity.longest_isoforms.fasta.renamed.list.tsv

A TSV table for translating between the original Trinity transcript names (field 3) and the names used throughout the analyses (field 2). The table covers only the longest isoforms, i.e. the transcripts retained after redundant shorter isoforms were removed.

  • field 1: number
  • field 2: species-specific transcript names used in the analyses. Names follow the format "TAG_NUMBER" for non-coding transcripts and "TAG_NUMBER_OTHER_NUMBER" for coding transcripts (the last number indicates which reading frame TransDecoder selected as the best).
  • field 3: original Trinity transcript sequence names
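The name table can be loaded into a lookup from original Trinity names to analysis names, and the analysis names can be split into coding and non-coding by their format. This is a sketch under stated assumptions: the file is taken to be tab-separated with no header row, and TAG is assumed to contain no underscores (true for the four-letter tags above), so coding names split into four parts. `load_name_map` and `classify` are hypothetical helpers.

```python
import csv
from typing import Optional, TextIO


def load_name_map(handle: TextIO) -> dict[str, str]:
    """Map original Trinity names (field 3) to analysis names (field 2).

    Assumes tab-separated rows with no header; short rows are skipped.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3:
            mapping[row[2]] = row[1]
    return mapping


def classify(name: str) -> tuple[str, Optional[int]]:
    """Return ('coding', frame) or ('non-coding', None) for an analysis name.

    Assumes 'TAG_NUMBER' splits into two parts and 'TAG_NUMBER_OTHER_NUMBER'
    into four, the last part being the TransDecoder reading frame.
    """
    parts = name.split("_")
    if len(parts) == 2:
        return "non-coding", None
    if len(parts) == 4:
        return "coding", int(parts[3])
    raise ValueError(f"unexpected name format: {name!r}")
```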

TAG.trinity.longest_isoforms.coding.fasta

The filtered transcriptome, containing only the longest isoform of each coding transcript.
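Collapsing isoforms to the longest one per gene is a common reduction step. The sketch below shows one way to do it, assuming standard Trinity names such as `TRINITY_DN1000_c115_g5_i1`, where everything before the final `_i<N>` identifies the gene. It is not necessarily the procedure used to build this dataset, and it covers only the isoform step, not the selection of coding transcripts. `longest_isoforms` is a hypothetical helper.

```python
from typing import Iterable


def longest_isoforms(records: Iterable[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Keep the longest isoform per Trinity gene.

    Assumes names like 'TRINITY_DN1000_c115_g5_i1'; the gene ID is the
    part before the final '_i'. Ties keep the first isoform seen.
    """
    best: dict[str, tuple[str, str]] = {}
    for name, seq in records:
        gene = name.rsplit("_i", 1)[0]
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (name, seq)
    return best
```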

TAG.trinity.longest_isoforms.coding.fasta.transdecoder.gff3

A GFF3 file giving the start and stop coordinates of features such as CDS and UTRs along each coding transcript.
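GFF3 records have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based, inclusive coordinates. A minimal reader for these files might look like the sketch below; `Feature` and `read_gff3` are hypothetical names, and malformed lines are simply skipped rather than reported.

```python
from dataclasses import dataclass, field
from typing import TextIO
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    type: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    attributes: dict[str, str] = field(default_factory=dict)


def read_gff3(handle: TextIO) -> list[Feature]:
    """Parse the nine tab-separated GFF3 columns into Feature records."""
    features = []
    for line in handle:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines in this sketch
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)
        )
    return features
```

Filtering the result with `[f for f in features if f.type == "CDS"]` would give the CDS coordinates per transcript.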

TAG.trinity.longest_isoforms.fasta.transdecoder.cds.fasta

The coding sequence (CDS) of the open reading frame of each coding transcript, as defined by the TAG.trinity.longest_isoforms.coding.fasta.transdecoder.gff3 file and the TAG.trinity.longest_isoforms.coding.fasta file.
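Since the CDS is defined by GFF3 coordinates on the transcript sequence, a CDS can in principle be recovered by slicing the transcript. The sketch below converts 1-based inclusive coordinates to a Python slice and reverse-complements minus-strand features; it is a hypothetical consistency check (`extract_cds` is not part of the dataset), and it assumes the GFF3 seqids match the FASTA sequence names.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_cds(transcript: str, start: int, end: int, strand: str) -> str:
    """Slice a CDS out of a transcript using GFF3-style coordinates.

    start/end are 1-based and inclusive, so the Python slice is
    [start - 1:end]. Minus-strand features are reverse-complemented.
    """
    if not 1 <= start <= end <= len(transcript):
        raise ValueError("coordinates fall outside the transcript")
    segment = transcript[start - 1:end]
    if strand == "-":
        segment = segment.translate(_COMPLEMENT)[::-1]
    return segment
```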

TAG.trinity.longest_isoforms.fasta.transdecoder.pep.fasta

The peptide sequence encoded by each CDS.
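Each peptide is the codon-by-codon translation of its CDS. The sketch below assumes the standard genetic code (NCBI translation table 1) and marks stop codons as `*` and ambiguous codons as `X`; whether these conventions match the provided .pep.fasta files exactly is an assumption, and `translate` is a hypothetical helper.

```python
from itertools import product

_BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in order TTT, TTC, TTA, TTG, TCT, ...
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon; unknown codons (e.g. with N) become 'X'.

    A trailing partial codon, if any, is ignored.
    """
    cds = cds.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
```

For example, `translate("ATGCCCTAA")` returns `"MP*"`.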

The GFF files follow the GFF3 standard:

https://www.ensembl.org/info/website/upload/gff3.html

The FASTA files follow the FASTA standard:

https://www.ncbi.nlm.nih.gov/genbank/fastaformat/

Note: Compared with the files used in the analyses, these files have been edited to use the species names and abbreviations that appear in the publication figures.

Funding

Local adaptation and genome evolution in crustacean zooplankton: how does size matter?

Swedish Research Council



Publisher

Uppsala University

Access request email

andreas.wallberg@imbim.uu.se
