------------
Introduction 
------------

QCumber is a tool for quality control and exploration of NGS data. All steps can be skipped if required. The workflow is as follows:

* Extract information from Sequence Analysis Viewer
* Quality control with FastQC
* Trim reads with Trimmomatic
* Run FastQC and retrim if necessary
* Quality control of trimmed reads with FastQC
* Map reads against reference using bowtie2
* Classify reads with Kraken

------------
Dependencies
------------

This tool is implemented in Python and requires Python 3.4 or higher. Plotting and SAV extraction require R (3.0.2). Furthermore, FastQC (> v0.10.1), bowtie2 (> 2.2.3) and Kraken (0.10.5) are required.

Further packages via apt-get install:
* python3-pip
* libfreetype6-dev
* r-cran-quantreg
* r-bioc-savr

Packages via pip3 install:
* Jinja2 
* matplotlib

R packages:
* ggplot2
* savR
* jsonlite

To change the tool or adapter paths, edit config.txt.

-----
Usage
-----
```
python3 QCumber.py -i <input> -technology <Illumina/IonTorrent> <option(s)>
```
Input parameter:

	-i, -input		sample folder/file. For an Illumina folder, file names have to match the pattern <Sample name>_<lane>_<R1/R2>_<number>,
					e.g. Sample_12_345_R1_001.fastq. Otherwise use -1, -2.
	-1, -2          alternative to -i: file name(s). These do not have to match the Illumina naming pattern.
    -adapter        adapter sequence (TruSeq2-PE, TruSeq2-SE, TruSeq3-PE, TruSeq3-SE, TruSeq3-PE-2, NexteraPE-PE). Required for Illumina.
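
Before passing a folder to `-i`, it can help to check which files follow the Illumina naming scheme described above. The following is only a sketch: the regular expression is an assumption derived from the `<Sample name>_<lane>_<R1/R2>_<number>` description and the example name, and the helper `parse_illumina_name` is hypothetical. The pattern QCumber actually applies is set in parameter.txt and may differ.

```python
import re

# Hypothetical regex for <Sample name>_<lane>_<R1/R2>_<number>; the pattern
# QCumber really uses is configured in parameter.txt and may differ.
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<lane>[^_]+)_(?P<read>R[12])_(?P<number>\d+)"
    r"\.fastq(?:\.gz)?$"
)


def parse_illumina_name(filename):
    """Return the name parts as a dict, or None if the name does not match."""
    match = ILLUMINA_NAME.match(filename)
    return match.groupdict() if match else None


# The example name from the option description matches ...
print(parse_illumina_name("Sample_12_345_R1_001.fastq"))
# ... while a short name like this one does not and would need -1/-2.
print(parse_illumina_name("sample_R1.fastq"))
```

Files for which the helper returns `None` are candidates for the `-1`/`-2` form of the call.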

Options:
    -technology     		sequencing technology (Illumina/IonTorrent). Use Illumina if files are fastq
	-output		            output folder, default: input folder
	-reference              reference file
	-threads                number of threads

	-sav 					Sequence Analysis Viewer folder. Requires Interop folder, RunInfo.xml and RunParameter.xml
	-rename                 Rename sample names in report. TSV File with two columns: <old sample name> <new sample name>
	-parameters             Use own standard parameter.
	-trimOption             Override standard trimming option. E.g. MAXINFO:<target length>:<strictness> | SLIDINGWINDOW:<window size>:<required quality>.
                            default: SLIDINGWINDOW:4:15
	-trimBetter				Optimize trimming parameters using 'Per base sequence content' from FastQC
	-trimBetter_threshold	Threshold for 'Per base sequence content' fluctuation. Default: 0.15
	-forAssembly			Trimming parameters are optimized for assemblies (trim more aggressively).
	-forMapping				Trimming parameters are optimized for mapping (allow more errors).
	-minlen                 Minlen parameter for Trimmomatic. Default:50
	-palindrome				palindrome parameter used in Trimmomatic (use 30 or 1000 for further analysis). Default: 30
    -gz                     Output trimmed files as .gz

	-db                     Kraken database
	-nokraken				skip Kraken
	-index					Bowtie2 index if available
	-save_mapping           Save sam files
	-nomapping				skip mapping
    -notrimming             skip trimming

    -version                Get version
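
The file expected by `-rename` is described above as a TSV with two columns (old sample name, new sample name). A minimal sketch for generating such a file is shown below. It assumes one tab-separated pair per line and no header row, which the description does not state explicitly; the helper `write_rename_table` and the sample names are made up for illustration.

```python
import csv
import os
import tempfile


def write_rename_table(path, mapping):
    """Write one '<old name><TAB><new name>' line per sample (no header)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for old_name, new_name in mapping.items():
            writer.writerow([old_name, new_name])


# Illustrative sample names only; written to a temporary folder.
rename_path = os.path.join(tempfile.mkdtemp(), "rename.tsv")
write_rename_table(
    rename_path, {"Sample_12": "patient_A", "Sample_13": "patient_B"}
)

with open(rename_path) as handle:
    rename_text = handle.read()
print(rename_text)
```

The resulting path would then be passed as `-rename <file>`.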

Output:

<Sample/Output Folder>
* QCResult
    * Report
        - PDF report per sample
        - HTML report for entire project
        * src
            * img
                - Summary images
    * FastQC
        - <output folder(s) from FastQC>
    * Trimmed
        - <trimmed reads>
        * FastQC
            - <output folder(s) from FastQC>
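
Given the folder layout above, the reports of a finished run could be collected with a few lines of Python. This is a sketch under assumptions: the `.pdf` and `.html` extensions are inferred from "PDF report" and "HTML report", the helper `find_reports` is hypothetical, and the file names in the demo are invented.

```python
import tempfile
from pathlib import Path


def find_reports(output_folder):
    """Collect report file names below <output folder>/QCResult/Report."""
    report_dir = Path(output_folder) / "QCResult" / "Report"
    return {
        "pdf": sorted(p.name for p in report_dir.glob("*.pdf")),
        "html": sorted(p.name for p in report_dir.glob("*.html")),
    }


# Demo on a throwaway folder that mimics the layout; file names are made up.
demo_root = Path(tempfile.mkdtemp())
demo_report_dir = demo_root / "QCResult" / "Report"
demo_report_dir.mkdir(parents=True)
(demo_report_dir / "sampleA.pdf").touch()
(demo_report_dir / "batch_report.html").touch()

reports = find_reports(demo_root)
print(reports)
```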

-------------------
Program Description
-------------------

This project consists of 9 files:

* QCumber.py		main script for running complete pipeline
* classes.py		script containing classes
* helper.py		    small helper functions
* report.tex		Template for sample reports
* batch_report.html Template for batch report
* config.txt        path to tools and adapter file
* boxplot.R		    boxplots of FastQC output for batch report
* barplot.R         barplots of read statistics
* parameter.txt	    default trimming parameters, file name pattern for Illumina samples, etc.


-------
Example
-------

1. Simple usage for Illumina:
```
python3 QCumber.py -1 sample_R1.fastq -2 sample_R2.fastq -technology Illumina -adapter NexteraPE-PE -reference myReference.fasta
```

2. Entering a project:
```
python3 QCumber.py -input myProjectFolder/ -technology IonTorrent -reference myReference.fasta
```
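
When QCumber is called from another script, the command line can be assembled as a list and handed to `subprocess`. The sketch below uses only options listed in the Usage section; the helper `build_qcumber_command` is hypothetical, and it assumes QCumber.py is in the current working directory.

```python
import shlex


def build_qcumber_command(read1, read2, adapter, reference=None, threads=None):
    """Assemble a paired-end Illumina call from documented options."""
    command = [
        "python3", "QCumber.py",
        "-1", read1,
        "-2", read2,
        "-technology", "Illumina",
        "-adapter", adapter,
    ]
    if reference is not None:
        command += ["-reference", reference]
    if threads is not None:
        command += ["-threads", str(threads)]
    return command


command = build_qcumber_command(
    "sample_R1.fastq", "sample_R2.fastq", "NexteraPE-PE",
    reference="myReference.fasta", threads=4,
)
# Print the call for inspection; quoting keeps odd file names copy-paste safe.
print(" ".join(shlex.quote(part) for part in command))
# To actually start the pipeline: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.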

-------
License
-------
Copyright (C) 2017 Vivi Hue-Trang Lieu

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License, version 3
as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this program.  If not, see
<http://www.gnu.org/licenses/>.