MACS2 Operating Manual and Introduction
[Figure: MACS2 subcommands and their output files]
Subcommands: filterdup, randsample, callpeak, predictd, pileup, refinepeaks, bdgpeakcall, bdgbroadcall, bdgcmp
callpeak OUTPUT FILEs: peaks.xls, peaks.narrowPeak, summits.bed, model.r, model.pdf, treat_pileup.bdg, control_lambda.bdg
Other OUTPUT: pileup.bdg, refinepeak.bed
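As a rough illustration of the callpeak output files listed above, the sketch below checks which of them exist for one run. It assumes the files carry the run name given with -n as a prefix (NAME_peaks.xls and so on); the name "demo" is a hypothetical placeholder. The two .bdg files are only expected when -B was used, and model.pdf is left out because it is produced afterwards from model.r.

```shell
# Hypothetical run name, i.e. the value that would be passed to -n
name="demo"

# Build the list of expected callpeak output file names
outputs=""
for suffix in peaks.xls peaks.narrowPeak summits.bed model.r \
              treat_pileup.bdg control_lambda.bdg; do
  outputs="$outputs ${name}_${suffix}"
done

# Report which files are present in the current directory
for f in $outputs; do
  if [ -e "$f" ]; then
    echo "present: $f"
  else
    echo "missing: $f"
  fi
done
```

Run it inside the output folder of a finished callpeak run; every "missing" line points at a file that was not produced.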
ChIP-seq analysis with MACS2
Tips and tricks
Sami Heikkinen, PhD Docent in Molecular Bioinformatics Institute of Biomedicine, UEF
ChIP-Seq simplified
Where?
• Usage tip: use up/down arrow keys to move in command history
• ls
– LiSt files in directory
– e.g. ‘ls -l’ to show file and folder names AND other info (Long format)
• checks on seq files
– ls -l seq
– head seq/*
• check that macs2 works
– macs2 callpeak
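A slightly more defensive version of the check above might look like the following sketch. It only tests whether a macs2 executable is on the PATH and, if so, prints its version; the variable name macs2_status is made up for this example.

```shell
# Check whether the macs2 executable can be found on the PATH
if command -v macs2 >/dev/null 2>&1; then
  macs2_status="found"
  # Print the installed version
  macs2 --version
else
  macs2_status="missing"
  echo "macs2 not found on PATH" >&2
fi
echo "macs2 is $macs2_status"
```

If the status is "missing", running ‘macs2 callpeak’ would fail with a "command not found" error.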
callpeak - Options
Various options to indicate/control input, output, peak modelling and peak calling

macs2 callpeak usage:

macs2 callpeak [-h] -t TFILE [TFILE ...] [-c [CFILE [CFILE ...]]]
    [-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}]
    [-g GSIZE] [--keep-dup KEEPDUPLICATES] [--buffer-size BUFFER_SIZE]
    [--outdir OUTDIR] [-n NAME] [-B] [--verbose VERBOSE] [--trackline]
    [--SPMR] [-s TSIZE] [--bw BW] [-m MFOLD MFOLD] [--fix-bimodal]
    [--nomodel] [--shift SHIFT] [--extsize EXTSIZE] [-q QVALUE]
    [-p PVALUE] [--to-large] [--ratio RATIO] [--down-sample]
    [--seed SEED] [--nolambda] [--slocal SMALLLOCAL]
    [--llocal LARGELOCAL] [--broad] [--broad-cutoff BROADCUTOFF]
    [--call-summits]
– Genome Biology 2008, 9:R137
– now at version 2.1.0.20140616, developed and maintained by Tao Liu at https://github.com/taoliu/MACS/
– https://github.com/taoliu/MACS/blob/macs_v1/README.rst
Using MACS - setup
• cd /home/work/public
• mkdir macsout_<user ID>
– <user ID> : e.g. ‘spheikki’ for me
– each student MUST have their own folder!!
• to avoid overlapping MACS outputs
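The setup steps above could be scripted roughly as follows. This is a sketch under assumptions: BASE_DIR is a hypothetical variable standing in for /home/work/public (it falls back to the current directory so the snippet also runs elsewhere), and the user ID is taken from $USER.

```shell
# Assumption: BASE_DIR stands in for /home/work/public on the course server;
# it defaults to the current directory when unset
base_dir="${BASE_DIR:-$(pwd)}"

# Use the login name as <user ID>; "student" is a made-up fallback
user_id="${USER:-student}"

# One output folder per user, so MACS outputs do not overlap
outdir="$base_dir/macsout_$user_id"
mkdir -p "$outdir"
echo "MACS output folder: $outdir"
```

mkdir -p is used here so that re-running the snippet does not fail when the folder already exists.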
Park, Nat Rev Genetics, 2009
Schmidt et al, Methods, 2009
From binding to binding sites
ChIP-seq
~200 bp
Control sample: “Input” or “IgG”
- Input: sonicated chromatin without immunoprecipitation
- IgG: “unspecific” IP
-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}
    Format of tag file: "AUTO", "BED", "ELAND", "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM", "BOWTIE" or "BAMPE". The default AUTO option lets MACS decide which format the file is. Please check the definition in the README file if you choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE. DEFAULT: "AUTO"

-g GSIZE, --gsize GSIZE
    Effective genome size. It can be 1.0e+9 or 1000000000, or one of the shortcuts: 'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8). Default: hs

--keep-dup KEEPDUPLICATES
    Controls how MACS treats duplicate tags at the exact same location (the same coordinates and the same strand). The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the 'all' option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1

--buffer-size BUFFER_SIZE
    Buffer size for incrementally increasing the internal array size used to store read alignment information. In most cases, you don't have to change this parameter. However, if there is a large number of chromosomes/contigs/scaffolds in your alignment, it is recommended to specify a smaller buffer size in order to decrease memory usage (but reading the alignment files will take longer). The minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 2 Bytes. DEFAULT: 100000
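To make the --buffer-size memory estimate concrete, here is a small worked example of the formula quoted above (# of CHROMOSOME * BUFFER_SIZE * 2 Bytes). The chromosome count of 25 is an assumed illustrative value, not something taken from the course data.

```shell
# Assumed number of chromosomes/contigs in the alignment (illustrative only)
n_chrom=25

# Default --buffer-size as stated in the option description
buffer_size=100000

# Minimum memory estimate: # of CHROMOSOME * BUFFER_SIZE * 2 Bytes
min_bytes=$(( n_chrom * buffer_size * 2 ))
echo "Estimated minimum memory: $min_bytes bytes"
```

With these values the estimate is 5000000 bytes, i.e. about 5 MB. An assembly with many thousands of scaffolds would scale this linearly, which is why a smaller buffer size is suggested in that case.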
• head / tail
– show first/last lines of a (text) file
– e.g. ‘head -20 ref_hg19.txt’
• Usage tip: use the TAB key to fill in available file/folder names
[Figure: further MACS2 subcommands: bdgdiff, diffpeak]
callpeak - Options
-t/--treatment FILENAME
    This is the only REQUIRED parameter for MACS.
• Package of command line programs to call peaks in ChIP-seq data
• Much improved since v1.x!!!
MACS2 – program(s)
INPUT DATA: aligned sequence reads
ChIPed sample: “treat”
Input/IgG: “control”
callpeak – Options - Input
Input files arguments:

-t TFILE [TFILE ...], --treatment TFILE [TFILE ...]
    ChIP-seq treatment file. If multiple files are given as '-t A B C', then they will all be read and combined. REQUIRED.

-c [CFILE [CFILE ...]], --control [CFILE [CFILE ...]]
    Control file. If multiple files are given as '-c A B C', then they will all be read and combined.
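A minimal sketch of how the treatment and control arguments fit into a callpeak command is shown below. The file names, run name and output folder are hypothetical placeholders; only flags that appear in the usage text are used. The command is printed first and is only executed if macs2 and both input files are actually present.

```shell
# Hypothetical input files and run settings (placeholders, not course data)
treat="seq/treat.bam"
control="seq/control.bam"
name="demo"
outdir="macsout_demo"

# Assemble the command; -t is the only required argument, -c adds the control
cmd="macs2 callpeak -t $treat -c $control -f BAM -g hs -n $name --outdir $outdir"
echo "$cmd"

# Run only when macs2 and both files exist.
# Note: the unquoted $cmd relies on word splitting, so paths must not contain spaces.
if command -v macs2 >/dev/null 2>&1 && [ -f "$treat" ] && [ -f "$control" ]; then
  $cmd
fi
```

To combine several treatment files, list them after -t (e.g. '-t A B C'), as described above.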
36-50 bp
Typically millions of reads per sample
Park, Nat Rev Genetics, 2009
MACS2
• Model-based Analysis of ChIP-Seq
• Original version published by Yong Zhang and Tao Liu from the lab of X. Shirley Liu at the Dana-Farber Cancer Institute, Boston
Unix 101
• pwd
– show Present Working Directory
• cd
– Change Directory
– e.g. ‘cd /home/work/public’ to get to the folder we use today (from wherever you are)
– or, to get back to your home directory: ‘cd $HOME’
– or, back one step ‘cd ..’, or two steps ‘cd ../../’
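The pwd and cd commands above can be tried safely in a scratch folder. The sketch below creates a temporary directory tree (the folder names a/b are made up), steps into it and then moves two levels back up.

```shell
# Remember where we started
start_dir="$(pwd)"

# Create a throwaway directory tree to practise in
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/a/b"

# Go two levels down and show the Present Working Directory
cd "$demo_dir/a/b"
pwd

# Go back two steps, as with 'cd ../../'
cd ../../
after_two_up="$(pwd)"
echo "Now in: $after_two_up"

# Return to the starting directory
cd "$start_dir"
```

After ‘cd ../../’ the working directory is the scratch folder itself again, two levels above a/b.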
Using MACS – connect to server
• Open the SSH client
– at Win -> All programs -> SSH Secure shell -> Secure shell client
– “Quick connect”
• connection : intron.uef.fi
• username : <your user ID>
• password: <your password>