### General information

Author: National Bioinformatics Infrastructure Sweden, https://nbis.se
Contact e-mail: john.sundh@nbis.se
DOI: 10.17044/scilifelab.22246417
License: CC BY 4.0
This readme file was last updated: 2023-03-10

Please cite as: NBIS (2023). MRSA case study example data. https://doi.org/10.17044/scilifelab.22246417

### Dataset description

This dataset contains FASTQ files from three Illumina HiSeq runs of an
RNA-seq analysis (see Osmundson J et al., PLoS One, 2013;8(10):e76572).

These data are used as a case study in the Tools in Reproducible Research
course. We previously used `fastq-dump` from the `sra-tools` package
to download a subsampled set of sequences from the Sequence Read Archive
(SRA). Recently, however, `sra-tools` has become unreliable because of a
certificate/security issue when downloading from the National Center for
Biotechnology Information (NCBI). We have therefore created this dataset as
an alternative starting point for the course case study.

All three files were generated on the Rackham compute cluster by installing
`sra-tools` (v3.0.3) from the bioconda channel:

```
mamba create -n sra-tools -c bioconda sra-tools
conda activate sra-tools
```

then running:

```
fastq-dump SRR935090 -X 100000 --gzip -Z > SRR935090.fastq.gz
fastq-dump SRR935091 -X 100000 --gzip -Z > SRR935091.fastq.gz
fastq-dump SRR935092 -X 100000 --gzip -Z > SRR935092.fastq.gz
```
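
The three commands above differ only in the accession, so they could also be written as a loop. The following is a minimal sketch, not the script that was actually used: the `dump_subset` helper and the `DRY_RUN` switch are hypothetical names introduced here for illustration. By default the sketch only prints the commands; set `DRY_RUN=0` to execute them (this requires `fastq-dump` on the `PATH` and network access to NCBI).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Number of reads to keep per run; same value as in the commands above.
MAX_READS=100000

# Assumption for this sketch: print commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper (not part of the original workflow): dump the first
# MAX_READS reads of one SRA run to <accession>.fastq.gz.
dump_subset() {
    local acc="$1"
    if [ "${DRY_RUN}" = "1" ]; then
        # Dry run: show the command that would be executed.
        echo "fastq-dump ${acc} -X ${MAX_READS} --gzip -Z > ${acc}.fastq.gz"
    else
        fastq-dump "${acc}" -X "${MAX_READS}" --gzip -Z > "${acc}.fastq.gz"
    fi
}

for acc in SRR935090 SRR935091 SRR935092; do
    dump_subset "${acc}"
done
```

Keeping the read limit in a single variable makes it easier to regenerate the files with a different subset size.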

Thus, each file contains the first 100,000 reads of one sample from the
original data in the SRA (the `-X` option caps the number of reads dumped,
so this is not a random subsample). The original runs contain between 76.3
and 176.6 million reads each. The idea is to let students download these
subsampled files directly or as part of the bioinformatic workflows taught
during the course.
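
After downloading, the read count of each file can be checked. The sketch below assumes the standard FASTQ layout of four lines per record and that the files sit in the current directory; `count_reads` is a hypothetical helper name, not something shipped with this dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count records in a gzipped FASTQ file, assuming
# four lines per record (no wrapped sequence or quality lines).
count_reads() {
    local lines
    lines=$(gzip -cd "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Report the read count for each dataset file found in the current directory.
for f in SRR935090.fastq.gz SRR935091.fastq.gz SRR935092.fastq.gz; do
    if [ -f "$f" ]; then
        echo "$f: $(count_reads "$f") reads"
    fi
done
```

If the files match the description above, each one should report 100000 reads.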