2. Ecological genomics of the Northern krill: Genome assembly mask sequences
1. genome_mask.sample_and_coverage_depth_profile_accessible_sites.fa.gz, a per-base file indicating accessible or inaccessible sites according to both mapping depth and sequence coverage, computed from population genetic data. This file can be used, for example, to compute the number of accessible sites in windows of arbitrary length and to correct diversity estimates.
2. genome_mask.gene_region_profile.fa.gz, a per-base file indicating genomic regions (e.g. intergenic, intron, CDS), computed from non-redundant protein-coding gene bodies (coordinates for those are given in a GFF file in another item).
3. genome_mask.gene_region_profile_masked_inaccessible.fa.gz, a per-base file indicating genomic regions as above, but with inaccessible sites masked.
4. genome_mask.repeat_masked.fa.gz, a repeat-masked version of the genome assembly.
5. genome_mask.repeat_masked_rewritten.fa.gz, a repeat-masked version of the genome assembly, rewritten to encode repeated vs non-repeated bases differently.
States in the accessible sites mask (file 1):
- 0 = inaccessible sites
- 1 = accessible sites
States in the gene region masks (files 2 and 3):
- 1 = intergenic
- 2 = intron
- 3 = 3′-UTR
- 4 = exon (typically overwritten by UTRs or coding sequence)
- 5 = 5′-UTR
- 6 = CDS including any start and stop codons
This item holds one tar archive that contains three gzipped genome masks saved in FASTA format. These files represent masks for the finished genome assembly of the Northern krill.
In these files, sequence names are the same as those in the main genome assembly DNA file. Instead of DNA, however, each sequence consists of per-base symbols indicating accessible sites or gene regions.
These masks apply to the "main" genome assembly, i.e. they match the genome assembly FASTA file "1.m_norvegica.main_w_mito.fasta".
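Because the masks are ordinary gzipped FASTA files whose records carry state symbols instead of nucleotides, they can be read with any FASTA parser. The following is a minimal sketch in plain Python; the function names `read_fasta` and `symbol_counts` are illustrative and not part of the dataset, and the file is assumed to follow standard multi-line FASTA layout.

```python
import gzip
from collections import Counter


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open text handle in FASTA format."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the record name.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def symbol_counts(path):
    """Return {sequence name: Counter of mask symbols} for a gzipped mask FASTA."""
    with gzip.open(path, "rt") as handle:
        return {name: Counter(seq) for name, seq in read_fasta(handle)}
```

Comparing the length of each mask record with the length of the record of the same name in the assembly FASTA is a simple way to confirm that a mask and an assembly belong together.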
File 1 contains sequence masks with two states, 0 (inaccessible sites) and 1 (accessible sites).
Sites with more than 281x or less than 94x coverage, based on short-read mappings of 74 specimens (including the reference individual), or with fewer than 37 mappable individuals, were coded as inaccessible.
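The accessibility rule and the window-based use of this mask can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the mask: the function names are hypothetical, `depth` stands for whatever coverage value the 94x and 281x thresholds were applied to, and `sum_pairwise_diffs` stands for a diversity total that the user has already computed for a window.

```python
MIN_DEPTH = 94         # sites below this coverage are inaccessible
MAX_DEPTH = 281        # sites above this coverage are inaccessible
MIN_INDIVIDUALS = 37   # sites with fewer mappable individuals are inaccessible


def is_accessible(depth, n_individuals):
    """Apply the three thresholds stated in the description to one site."""
    return MIN_DEPTH <= depth <= MAX_DEPTH and n_individuals >= MIN_INDIVIDUALS


def accessible_per_window(mask, window):
    """Count '1' symbols in consecutive non-overlapping windows of a mask string."""
    return [mask[i:i + window].count("1") for i in range(0, len(mask), window)]


def per_site_diversity(sum_pairwise_diffs, n_accessible):
    """Divide a window's diversity total by its accessible sites (None if zero)."""
    if n_accessible == 0:
        return None
    return sum_pairwise_diffs / n_accessible
```

Dividing by the number of accessible sites rather than by the window length avoids underestimating diversity in windows where many sites could not be genotyped.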
File 2 contains sequence masks with six states, 1 (intergenic) to 6 (CDS), as listed above.
File 3 is as (2), but inaccessible sites (state 0 from (1)) have been written on top of the gene region mask.
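Writing the inaccessible sites on top of the gene region mask amounts to a per-base overlay of two equally long strings. A minimal sketch, assuming both masks for a given sequence have already been loaded as strings (the function name `overlay_inaccessible` is hypothetical):

```python
def overlay_inaccessible(region_mask, access_mask):
    """Return the region mask with state 0 wherever the accessibility mask is 0."""
    if len(region_mask) != len(access_mask):
        raise ValueError("masks must describe the same sequence length")
    return "".join(
        "0" if access == "0" else region
        for region, access in zip(region_mask, access_mask)
    )
```

For example, `overlay_inaccessible("112266", "101101")` gives `"102206"`: the second and fifth sites become 0, and all other sites keep their gene region state.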
Files 2 and 3 each have a matching "GLOBAL.csv" tab-separated spreadsheet file detailing the length of each type of genomic region, before and after masking inaccessible sites, respectively.
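Totals of this kind can be recomputed from the masks themselves by counting how often each state occurs. The sketch below is an assumption about how such a tally could be made; it does not reproduce the layout of the "GLOBAL.csv" files, and `REGION_NAMES` and `region_lengths` are illustrative names.

```python
from collections import Counter

# State symbols as described for the gene region masks; "0" occurs only after
# inaccessible sites have been written on top of the region mask.
REGION_NAMES = {
    "0": "inaccessible",
    "1": "intergenic",
    "2": "intron",
    "3": "3'-UTR",
    "4": "exon",
    "5": "5'-UTR",
    "6": "CDS",
}


def region_lengths(masks):
    """Sum the number of bases per state over an iterable of mask strings."""
    totals = Counter()
    for seq in masks:
        totals.update(seq)
    # Unknown symbols are reported under their own character.
    return {REGION_NAMES.get(s, s): n for s, n in sorted(totals.items())}
```

Running the tally on the same sequences from file 2 and from file 3 shows how many bases of each region type are lost to masking.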
File 4 is the repeat-masked version of the main genome assembly, with repeat bases detected by RepeatMasker written in lower case and non-repeated bases in upper case, as is standard.
File 5 is the repeat-masked version of the main genome assembly, rewritten with the following states per base:
0 = unrepeated bases
1 = repeated bases
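The relationship between the soft-masked sequence and the rewritten mask can be illustrated with a one-line conversion. This is a sketch only: it assumes that every character that is not lower case (including any N) maps to 0, which the description does not state, and the function name `rewrite_repeat_mask` is hypothetical.

```python
def rewrite_repeat_mask(seq):
    """Map lower-case (repeat-masked) bases to '1' and all other characters to '0'."""
    return "".join("1" if base.islower() else "0" for base in seq)
```

For example, `rewrite_repeat_mask("ACgtNNac")` gives `"00110011"`.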
Funding
Climate genomics in the Northern krill: the past, present and future of an important marine species
Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning