Examples running GABI
Below are some examples of how you could approach the assembly of your genomes. We will assume that you use a site-specific config file named my_profile. See our installation guide for ways to configure the pipeline.
Illumina short reads
This is probably the most common approach used for obtaining genomic information from bacterial isolates. Typically, you will have one set of paired-end reads per sample - so your samplesheet would look as follows:
sample platform fq1 fq2
SampleA ILLUMINA /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz
SampleB ILLUMINA /path/to/sampleB_R1.fastq.gz /path/to/sampleB_R2.fastq.gz
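For many samples, a samplesheet like the one above can be generated rather than typed by hand. The following is a minimal sketch, not part of GABI: the helper names `illumina_rows` and `write_samplesheet` are hypothetical, the `_R1.fastq.gz`/`_R2.fastq.gz` naming scheme is taken from the example paths, and the tab delimiter is an assumption (adjust it to whatever your GABI version expects).

```python
import csv
import io
from pathlib import Path


def illumina_rows(fastq_paths):
    """Pair <name>_R1.fastq.gz / <name>_R2.fastq.gz files into samplesheet rows."""
    pairs = {}
    for path in fastq_paths:
        name = Path(path).name
        for suffix, column in (("_R1.fastq.gz", "fq1"), ("_R2.fastq.gz", "fq2")):
            if name.endswith(suffix):
                # The sample ID is derived from the file name (an assumption).
                sample = name[: -len(suffix)]
                pairs.setdefault(sample, {})[column] = path
    rows = []
    for sample in sorted(pairs):
        files = pairs[sample]
        if set(files) != {"fq1", "fq2"}:
            raise ValueError(f"incomplete read pair for sample {sample!r}")
        rows.append([sample, "ILLUMINA", files["fq1"], files["fq2"]])
    return rows


def write_samplesheet(rows, handle):
    """Write header and rows; the tab delimiter is an assumption."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "platform", "fq1", "fq2"])
    writer.writerows(rows)


# Demonstration with the example paths; nothing is read from or written to disk.
demo_paths = [
    "/path/to/sampleA_R1.fastq.gz",
    "/path/to/sampleA_R2.fastq.gz",
    "/path/to/sampleB_R1.fastq.gz",
    "/path/to/sampleB_R2.fastq.gz",
]
buffer = io.StringIO()
write_samplesheet(illumina_rows(demo_paths), buffer)
print(buffer.getvalue(), end="")
```

Raising an error on an unpaired file is a deliberate choice in this sketch, so that a missing R2 file is noticed before the pipeline starts.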
With this samplesheet, GABI can then be run like so:
Run GABI with default options.
Switch from Shovill to Unicycler for assembly.
ONT reads
Inexpensive long reads generated by Nanopore sequencing are an attractive alternative to Illumina sequencing, as the superior read length typically enables reconstruction of the entire bacterial chromosome as a single contig. Downsides include platform-specific homopolymer artifacts and lower per-base quality, the latter of which can largely be compensated for by additional sequencing depth.
Nanopore data for a sample is typically split across many read files, which you can either merge before running GABI or list individually under the same sample ID. Because Nanopore is a single-end technology, the fq2 column remains empty.
sample platform fq1 fq2
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_0.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_1.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_2.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_3.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_0.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_1.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_2.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_3.fastq.gz
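If you prefer to merge the per-barcode files before running GABI instead of listing them individually, the sketch below shows one way to do it. It relies on the fact that gzip files concatenated byte for byte form a valid multi-member gzip file, so no recompression is needed. The function name `merge_fastq_gz` and the file names in the demonstration are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(parts, merged):
    """Concatenate gzip-compressed FASTQ files byte for byte, in the given order."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged)


# Demonstration on two tiny FASTQ files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    parts = []
    reads = ["@read1\nACGT\n+\nIIII\n", "@read2\nTTGA\n+\nIIII\n"]
    for index, read in enumerate(reads):
        part = tmp / f"barcode01_{index}.fastq.gz"
        with gzip.open(part, "wt") as handle:
            handle.write(read)
        parts.append(part)

    merged = merge_fastq_gz(parts, tmp / "SampleA.fastq.gz")

    # Reading the merged file back yields both records.
    with gzip.open(merged, "rt") as handle:
        print(handle.read(), end="")
```

After merging, each sample needs only a single row pointing at its merged file.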
With this samplesheet, you can run GABI:
Run GABI with default options.
Run GABI on data generated with SUP basecalling.
Perform consensus assembly with Autocycler.
ONT and Illumina reads
Combining Nanopore long reads with Illumina short reads will extend the assembly process by using short reads for assembly polishing. No additional command-line flags are required. A samplesheet with mixed ONT and Illumina reads will look as follows:
sample platform fq1 fq2
SampleA ILLUMINA /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_0.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_1.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_2.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_3.fastq.gz
SampleB ILLUMINA /path/to/sampleB_R1.fastq.gz /path/to/sampleB_R2.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_0.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_1.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_2.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_3.fastq.gz
Note that Homopolish will not run if short reads are provided.
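Since short-read polishing is triggered purely by the samplesheet contents, it can be useful to check which samples actually carry both read types before starting a run. The following is a rough sketch, not part of GABI: `platforms_per_sample` and `hybrid_samples` are hypothetical helpers, and rows are split on any whitespace because the exact delimiter is an assumption.

```python
from collections import defaultdict

SHORT_READ = {"ILLUMINA"}
LONG_READ = {"NANOPORE", "PACBIO"}


def platforms_per_sample(text):
    """Map each sample ID to the set of platforms listed for it."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or lines[0].split()[:3] != ["sample", "platform", "fq1"]:
        raise ValueError("unexpected samplesheet header")
    found = defaultdict(set)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"row needs sample, platform and fq1: {line!r}")
        found[fields[0]].add(fields[1])
    return dict(found)


def hybrid_samples(text):
    """Return IDs of samples that have both short-read and long-read rows."""
    return sorted(
        sample
        for sample, platforms in platforms_per_sample(text).items()
        if platforms & SHORT_READ and platforms & LONG_READ
    )


SHEET = """\
sample platform fq1 fq2
SampleA ILLUMINA /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz
SampleA NANOPORE /path/to/BBJ413_pass_barcode01_34ed0b48_e8967f4e_0.fastq.gz
SampleB NANOPORE /path/to/BBJ413_pass_barcode02_34ed0b48_e8967f4e_0.fastq.gz
"""

# SampleB has no Illumina row here, so only SampleA is reported as hybrid.
print(hybrid_samples(SHEET))
```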
PacBio reads
Assembling genomes from PacBio reads is largely equivalent to using ONT reads, with the exception that PacBio distinguishes between corrected (HiFi) and uncorrected (CLR) reads, and GABI needs to be told which kind it is given. GABI only accepts one kind of PacBio reads (HiFi or CLR) per run.
sample platform fq1 fq2
sampleA PACBIO /path/to/sampleA_hifi.fastq.gz
sampleB PACBIO /path/to/sampleB_hifi.fastq.gz
GABI can then be run like so:
Run GABI with default options for CLR reads.
Run GABI with default options for HiFi reads.
PacBio and Illumina reads
Combining PacBio long reads with Illumina short reads will extend the assembly process by using short reads for assembly polishing. No additional command-line flags are required. A samplesheet with mixed PacBio and Illumina reads will look as follows:
sample platform fq1 fq2
SampleA ILLUMINA /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz
SampleA PACBIO /path/to/sampleA_hifi.fastq.gz
SampleB ILLUMINA /path/to/sampleB_R1.fastq.gz /path/to/sampleB_R2.fastq.gz
SampleB PACBIO /path/to/sampleB_hifi.fastq.gz
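Short-read and long-read rows are tied together through the sample column, so it seems safest to spell each ID identically on every row. Whether GABI compares IDs case-sensitively is not stated here, so the check below is a defensive sketch: the hypothetical helper `case_collisions` reports IDs that differ only in capitalisation.

```python
from collections import defaultdict


def case_collisions(sample_ids):
    """Return groups of sample IDs that differ only by letter case."""
    groups = defaultdict(set)
    for sample_id in sample_ids:
        groups[sample_id.casefold()].add(sample_id)
    return [
        sorted(variants)
        for _, variants in sorted(groups.items())
        if len(variants) > 1
    ]


# IDs as they might appear in the first column of a mixed samplesheet.
ids = ["SampleA", "sampleA", "SampleB", "SampleB"]
for variants in case_collisions(ids):
    print("inconsistent sample ID spelling:", ", ".join(variants))
```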
Pre-assembled genomes
GABI also accepts pre-assembled genomes, although only limited QC data can be generated in that case.
A samplesheet for pre-assembled genomes looks as follows:
Then you can run GABI as usual: