SciLifeLab

MRSA case study example data

educational resource
posted on 2023-03-10, 12:39, authored by John Sundh

### Dataset description

This dataset contains FASTQ files from three Illumina HiSeq runs of an
RNA-seq experiment (see Osmundson J et al., PLoS One, 2013;8(10):e76572).

This data is used as a case study for the Tools in Reproducible Research
course. We have previously used `fastq-dump` from the `sra-tools` package
to download a subsampled set of sequences from the Sequence Read Archive
(SRA). However, sra-tools has recently become unreliable because of a
certificate/security issue when downloading from the National Center for
Biotechnology Information (NCBI). We have therefore created this dataset as
an alternative starting point for the course case study.

All three files were generated on the Rackham compute cluster by installing
sra-tools (v3.0.3) from the bioconda channel:

```
# Create an environment with sra-tools pinned to the version used for this dataset
mamba create -n sra-tools -c bioconda sra-tools=3.0.3
conda activate sra-tools
```

then running:

```
# -X 100000 (--maxSpotId) stops after the first 100,000 spots of each run
fastq-dump SRR935090 -X 100000 --gzip -Z > SRR935090.fastq.gz
fastq-dump SRR935091 -X 100000 --gzip -Z > SRR935091.fastq.gz
fastq-dump SRR935092 -X 100000 --gzip -Z > SRR935092.fastq.gz
```

Thus, each file contains the first 100,000 reads of the corresponding sample
in the SRA. The original runs contain between 76.3 and 176.6 million reads
each. The idea is to let students download these subsampled files either
directly or as part of the bioinformatic workflows taught during the course.
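To check that a downloaded file matches this description, one can count its FASTQ records. The sketch below assumes standard four-line FASTQ records (the layout `fastq-dump` writes by default) and uses a hypothetical helper, `count_reads`, which is not part of the dataset or of sra-tools.

```shell
#!/usr/bin/env bash
# count_reads: hypothetical helper (not part of sra-tools) that prints the
# number of FASTQ records in a gzip-compressed file, assuming 4 lines per record.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Report the read count for each of the three files, skipping any not yet downloaded.
for f in SRR935090.fastq.gz SRR935091.fastq.gz SRR935092.fastq.gz; do
    [ -f "$f" ] || continue
    echo "$f: $(count_reads "$f") reads"
done
```

If the files match the description above, each line should report 100000 reads. Decompressing with `gzip -dc` rather than `zcat` keeps the sketch portable, since `zcat` on some systems expects `.Z` files.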

Funding

National Bioinformatics Infrastructure Sweden (NBIS)

Swedish Research Council



Publisher

National Bioinformatics Infrastructure Sweden (NBIS)
