QC Report


General
Report generated at: 2019-12-27 06:46:09
Title: ENCODE_GM12878_DNase-seq
Description: ENCODE GM12878 DNase-seq Samples from Stam and Crawford labs
Pipeline version: v1.5.4
Pipeline type: dnase
Genome: hg38
Aligner: bowtie2
Sequencing endedness: single-ended (paired_end: False) for rep1, rep2, rep3, rep4 and rep5
Peak caller: macs2

Alignment quality metrics


Marking duplicates (filtered BAM)

                                    rep1      rep2      rep3      rep4      rep5
Unpaired Reads                  46947124  32398606  41234629  22900695  45012772
Paired Reads                           0         0         0         0         0
Unmapped Reads                         0         0         0         0         0
Unpaired Duplicate Reads        23546263  12941233  12318727   4670359  17664285
Paired Duplicate Reads                 0         0         0         0         0
Paired Optical Duplicate Reads         0         0         0         0         0
% Duplicate Reads                50.1549   39.9438   29.8747   20.3940   39.2428
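The % Duplicate Reads row can be recomputed from the counts. The sketch below assumes Picard MarkDuplicates' definition of the duplication rate, in which each duplicate read pair counts as two reads; for single-ended data such as these replicates it reduces to Unpaired Duplicate Reads / Unpaired Reads x 100. The function name `percent_duplicates` is illustrative, not part of the pipeline.

```python
def percent_duplicates(unpaired_reads: int, unpaired_dups: int,
                       read_pairs: int = 0, pair_dups: int = 0) -> float:
    """Percent duplicate reads, assuming Picard MarkDuplicates' definition.

    Read pairs are weighted by 2 so that numerator and denominator both
    count individual reads.
    """
    total_reads = unpaired_reads + 2 * read_pairs
    if total_reads == 0:
        return 0.0
    duplicate_reads = unpaired_dups + 2 * pair_dups
    return 100.0 * duplicate_reads / total_reads


# rep1 counts (single-ended, so the paired arguments stay at 0)
print(f"{percent_duplicates(46947124, 23546263):.4f}")
```

For rep1 (46947124 unpaired reads, 23546263 duplicates) this gives about 50.1549, in line with the reported value.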

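Filtered out (samtools view -F 1804): unmapped reads (0x4), reads whose mate is unmapped (0x8, paired-end only), non-primary alignments (0x100), reads failing platform/vendor quality checks (0x200), and PCR or optical duplicates (0x400).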
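The -F 1804 mask is the sum of five SAM FLAG bits: 4 + 8 + 256 + 512 + 1024. A minimal sketch of decoding it, with bit meanings taken from the SAM format specification; `decode_flag` and `is_kept` are hypothetical helper names, and `is_kept` mirrors the semantics of `samtools view -F` (a read is dropped if any masked bit is set).

```python
# SAM FLAG bits and their meanings (SAM format specification, FLAG field)
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the description of every bit set in a SAM FLAG value."""
    return [desc for bit, desc in SAM_FLAG_BITS.items() if flag & bit]


def is_kept(flag: int, exclude_mask: int = 1804) -> bool:
    """Mimic `samtools view -F MASK`: keep a read only if no MASK bit is set."""
    return (flag & exclude_mask) == 0


for description in decode_flag(1804):
    print(description)
```

Note that supplementary alignments (0x800) are not part of this mask, so `is_kept(2048)` is True.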


SAMstat (filtered/deduped BAM)

                                        rep1      rep2      rep3      rep4      rep5
Total Reads                         23398320  19454653  28887133  18216429  27334517
Total Reads (QC-failed)                    0         0         0         0         0
Duplicate Reads                            0         0         0         0         0
Duplicate Reads (QC-failed)                0         0         0         0         0
Mapped Reads                        23398320  19454653  28887133  18216429  27334517
Mapped Reads (QC-failed)                   0         0         0         0         0
% Mapped Reads                         100.0     100.0     100.0     100.0     100.0
Paired Reads                               0         0         0         0         0
Paired Reads (QC-failed)                   0         0         0         0         0
Read1                                      0         0         0         0         0
Read1 (QC-failed)                          0         0         0         0         0
Read2                                      0         0         0         0         0
Read2 (QC-failed)                          0         0         0         0         0
Properly Paired Reads                      0         0         0         0         0
Properly Paired Reads (QC-failed)          0         0         0         0         0
% Properly Paired Reads                  0.0       0.0       0.0       0.0       0.0
With itself                                0         0         0         0         0
With itself (QC-failed)                    0         0         0         0         0
Singletons                                 0         0         0         0         0
Singletons (QC-failed)                     0         0         0         0         0
% Singleton                              0.0       0.0       0.0       0.0       0.0
Diff. Chroms                               0         0         0         0         0
Diff. Chroms (QC-failed)                   0         0         0         0         0

Filtered, with duplicates removed.


Sequence quality metrics (filtered/deduped BAM)

[Figures: sequence quality plots for rep1, rep2, rep3, rep4 and rep5]

Open chromatin assays are known to have significant GC bias; take this into account when interpreting these plots.


Library complexity quality metrics


Library complexity (filtered non-mito BAM)

                                rep1       rep2       rep3       rep4       rep5
Total Fragments             46926703   32384954   41201896   20732370   43644272
Distinct Fragments          23398740   19454957   28912911   18223701   27833424
Positions with Two Reads     6263172    4589404    5141848    1017423    1220903
NRF = Distinct/Total        0.498623   0.600741   0.701737   0.878997   0.637734
PBC1 = OneRead/Distinct     0.475800   0.606721   0.730876   0.921029   0.926850
PBC2 = OneRead/TwoRead      1.777552   2.571954   4.109756  16.497133  21.129777

Mitochondrial reads are filtered out by default. The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads in a dataset: the number of distinct genomic positions to which uniquely mapped reads map, divided by the total number of uniquely mapped reads. The NRF should be > 0.8. PBC1 is the number of genomic locations with EXACTLY one read pair (a single read, for single-ended data) divided by the number of genomic locations with AT LEAST one read pair. PBC1 is the primary measure and should be close to 1; provisionally, 0-0.5 indicates severe bottlenecking, 0.5-0.8 moderate bottlenecking, 0.8-0.9 mild bottlenecking, and 0.9-1.0 no bottlenecking. PBC2 is the number of genomic locations with EXACTLY one read pair divided by the number with EXACTLY two read pairs; it should be significantly greater than 1.
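All three metrics can be computed from a count of reads per genomic location. A minimal sketch, assuming each read is keyed by (chrom, start, end, strand), which is an assumption about how the pipeline identifies redundant reads; `library_complexity` and `bottlenecking` are hypothetical names, and the latter encodes the provisional PBC1 bands above with half-open boundaries.

```python
from collections import Counter
from typing import Hashable, Iterable


def library_complexity(fragments: Iterable[Hashable]) -> dict:
    """Compute NRF, PBC1 and PBC2 from per-read location keys.

    Each key stands for one genomic location, e.g. (chrom, start, end, strand)
    (an assumed keying); reads sharing a key are redundant copies.
    """
    reads_per_location = Counter(fragments)
    total = sum(reads_per_location.values())
    distinct = len(reads_per_location)
    # How many locations carry exactly k reads
    locations_with = Counter(reads_per_location.values())
    one, two = locations_with[1], locations_with[2]
    return {
        "total": total,
        "distinct": distinct,
        "NRF": distinct / total if total else float("nan"),
        "PBC1": one / distinct if distinct else float("nan"),
        "PBC2": one / two if two else float("inf"),
    }


def bottlenecking(pbc1: float) -> str:
    """Provisional PBC1 bands from the text; boundaries treated as half-open."""
    if pbc1 < 0.5:
        return "severe"
    if pbc1 < 0.8:
        return "moderate"
    if pbc1 < 0.9:
        return "mild"
    return "none"


# Toy data: one location with 3 reads, one with 2, two with 1
toy = ([("chr1", 100, 150, "+")] * 3 + [("chr1", 200, 250, "+")] * 2
       + [("chr1", 300, 350, "-"), ("chr2", 10, 60, "+")])
print(library_complexity(toy))
print(bottlenecking(0.4758), bottlenecking(0.92685))
```

Applied to the PBC1 row above, these provisional bands would place rep1 (0.4758) in the severe range and rep4 and rep5 (0.921029 and 0.92685) in the no-bottlenecking range.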


Abbreviations: NRF, non-redundant fraction; PBC1, PCR bottleneck coefficient 1; PBC2, PCR bottleneck coefficient 2.


Replication quality metrics


IDR (Irreproducible Discovery Rate) plots

[Figures: IDR plots for rep1_vs_rep2, rep1_vs_rep3, rep1_vs_rep4, rep1_vs_rep5, rep2_vs_rep3, rep2_vs_rep4, rep2_vs_rep5, rep3_vs_rep4, rep3_vs_rep5, rep4_vs_rep5, rep1-pr1_vs_rep1-pr2, rep2-pr1_vs_rep2-pr2, rep3-pr1_vs_rep3-pr2, rep4-pr1_vs_rep4-pr2, rep5-pr1_vs_rep5-pr2 and pooled-pr1_vs_pooled-pr2]

Reproducibility QC and peak detection statistics

                        overlap                   idr
Nt                      132200                    84052
N1                      90159                     49899
N2                      79611                     44224
N3                      108083                    66353
N4                      74543                     44562
N5                      107407                    64798
Np                      178815                    116780
N optimal               178815                    116780
N conservative          132200                    84052
Optimal Set             pooled-pr1_vs_pooled-pr2  pooled-pr1_vs_pooled-pr2
Conservative Set        rep1_vs_rep3              rep1_vs_rep3
Rescue Ratio            1.3526                    1.3894
Self Consistency Ratio  1.4499                    1.5004
Reproducibility Test    pass                      pass
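The two ratios in this table can be recomputed from its peak counts: the rescue ratio is max(Np, Nt) / min(Np, Nt), and the self-consistency ratio is max(Ni) / min(Ni) over N1 to N5. For the overlap column, 178815 / 132200 is about 1.3526 and 108083 / 74543 is about 1.4499, matching the reported values. The pass/borderline/fail rule with a cutoff of 2 in the sketch below follows the commonly cited ENCODE convention and is an assumption here, not something this report states; the function names are hypothetical.

```python
def rescue_ratio(n_p: int, n_t: int) -> float:
    """max(Np, Nt) / min(Np, Nt): pooled-pseudoreplicate vs true-replicate peak counts."""
    return max(n_p, n_t) / min(n_p, n_t)


def self_consistency_ratio(n_self: list[int]) -> float:
    """max(Ni) / min(Ni) over the per-replicate self-pseudoreplicate peak counts."""
    return max(n_self) / min(n_self)


def reproducibility_test(rescue: float, self_consistency: float,
                         cutoff: float = 2.0) -> str:
    """Assumed rule: pass if neither ratio exceeds cutoff, borderline if one does, fail if both do."""
    exceeded = int(rescue > cutoff) + int(self_consistency > cutoff)
    return ("pass", "borderline", "fail")[exceeded]


# Counts from the overlap column
rescue = rescue_ratio(178815, 132200)
self_cons = self_consistency_ratio([90159, 79611, 108083, 74543, 107407])
print(round(rescue, 4), round(self_cons, 4), reproducibility_test(rescue, self_cons))
```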

Reproducibility QC


Number of raw peaks

                      rep1      rep2      rep3      rep4      rep5
Number of peaks     173207    178292    229990    134503    169490

Top 300,000 raw peaks from macs2 with a p-value threshold of 0.01.
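The caption describes a two-step selection: discard peaks weaker than the p-value threshold, then keep at most 300,000 of the strongest. A minimal sketch of that logic on narrowPeak text records, assuming the standard narrowPeak layout in which column 8 holds -log10(p-value). The pipeline's own commands may differ (for example, the threshold may be applied by macs2 at call time), and `top_peaks` is a hypothetical helper.

```python
import math


def top_peaks(lines, n=300000, p_threshold=0.01):
    """Keep the n strongest narrowPeak records by p-value.

    Assumes standard narrowPeak columns, where column 8 (index 7) holds
    -log10(p-value). Peaks weaker than p_threshold are dropped first.
    """
    min_score = -math.log10(p_threshold)
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        score = float(fields[7])
        if score >= min_score:
            peaks.append((score, fields))
    # Highest -log10(p) first; Python's sort keeps ties in input order
    peaks.sort(key=lambda item: item[0], reverse=True)
    return ["\t".join(fields) for _, fields in peaks[:n]]


records = [
    "chr1\t10\t60\tpeak1\t0\t.\t2.0\t5.0\t3.0\t25",
    "chr1\t100\t150\tpeak2\t0\t.\t1.0\t1.5\t0.5\t25",
    "chr2\t10\t60\tpeak3\t0\t.\t3.0\t8.0\t6.0\t25",
]
for record in top_peaks(records, n=2):
    print(record.split("\t")[3])
```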

Peak calling statistics


Peak region size (bp)

                                rep1        rep2        rep3        rep4        rep5     idr_opt overlap_opt
Min size                         150         150         150         150         150         150         150
25th percentile                  165         160         161         179         193         420         313
50th percentile (median)         227         218         214         268         320         631         481
75th percentile                  379         362         354         479         532         966         783
Max size                        2235        1943        1887        2625        2264        2628        2628
Mean                          322.23      314.22      319.16      399.93      404.49      726.64      600.16
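Each statistic in this table summarizes peak widths, where a peak's size is end minus start in BED's half-open coordinates. A minimal sketch of the computation; the report does not say which percentile convention is used, so this assumes linear interpolation between closest ranks (NumPy's default). The function names are hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    if not sorted_values:
        raise ValueError("no values")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def peak_size_summary(intervals):
    """Summary statistics of peak widths (end - start, half-open coordinates)."""
    sizes = sorted(end - start for start, end in intervals)
    if not sizes:
        raise ValueError("no peaks")
    return {
        "min": sizes[0],
        "25th": percentile(sizes, 25),
        "median": percentile(sizes, 50),
        "75th": percentile(sizes, 75),
        "max": sizes[-1],
        "mean": sum(sizes) / len(sizes),
    }


# Four toy peaks with widths 150, 200, 250 and 400 bp
print(peak_size_summary([(0, 150), (1000, 1200), (5000, 5250), (9000, 9400)]))
```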

[Figures: peak region size distributions for rep1, rep2, rep3, rep4, rep5, idr_opt and overlap_opt]

Enrichment / Signal-to-noise ratio


Jensen-Shannon distance (filtered/deduped BAM)

                            rep1      rep2      rep3      rep4      rep5
AUC                       0.1745    0.1537    0.1941    0.1745    0.1952
Synthetic AUC             0.4838    0.4822    0.4854    0.4863    0.4887
X-intercept               0.2759    0.3270    0.2310    0.2522    0.1857
Synthetic X-intercept   1.13e-31  3.33e-26  2.38e-39  8.70e-45  1.44e-66
Elbow Point               0.6700    0.7389    0.6556    0.6749    0.7117
Synthetic Elbow Point     0.5033    0.4922    0.5076    0.5209    0.4999
Synthetic JS Distance     0.3669    0.3846    0.3559    0.4030    0.3965
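The row names in this table appear to match the quality metrics reported by deepTools plotFingerprint, where the "Synthetic" rows compare each sample against a synthetic reference coverage profile. As a reminder of what the distance itself measures, below is a generic sketch of the Jensen-Shannon distance between two discrete distributions (the square root of the Jensen-Shannon divergence, with base-2 logs). It is not deepTools' implementation, which works on its own binned coverage data, and `js_distance` is a hypothetical name.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs, so the result lies in [0, 1])."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]  # normalize counts to probabilities
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Identical distributions give 0; non-overlapping ones give 1
print(js_distance([1, 2, 3], [1, 2, 3]), js_distance([1, 0], [0, 1]))
```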

Peak enrichment


Fraction of reads in peaks (FRiP)

FRiP for macs2 raw peaks

Sample        FRiP
rep1          0.2009
rep2          0.2233
rep3          0.2193
rep4          0.2562
rep5          0.2646
rep1-pr1      0.1978
rep2-pr1      0.2034
rep3-pr1      0.2171
rep4-pr1      0.2392
rep5-pr1      0.2653
rep1-pr2      0.1975
rep2-pr2      0.2035
rep3-pr2      0.2169
rep4-pr2      0.2392
rep5-pr2      0.2656
pooled        0.2660
pooled-pr1    0.2468
pooled-pr2    0.2466

FRiP for overlap peaks

Peak set                  FRiP
rep1_vs_rep2              0.2063
rep1_vs_rep3              0.2097
rep1_vs_rep4              0.1936
rep1_vs_rep5              0.1985
rep2_vs_rep3              0.2088
rep2_vs_rep4              0.1929
rep2_vs_rep5              0.1972
rep3_vs_rep4              0.1969
rep3_vs_rep5              0.2027
rep4_vs_rep5              0.1970
rep1-pr1_vs_rep1-pr2      0.1603
rep2-pr1_vs_rep2-pr2      0.1734
rep3-pr1_vs_rep3-pr2      0.1739
rep4-pr1_vs_rep4-pr2      0.2224
rep5-pr1_vs_rep5-pr2      0.2370
pooled-pr1_vs_pooled-pr2  0.2356

FRiP for IDR peaks

Peak set                  FRiP
rep1_vs_rep2              0.1665
rep1_vs_rep3              0.1742
rep1_vs_rep4              0.1290
rep1_vs_rep5              0.1163
rep2_vs_rep3              0.1725
rep2_vs_rep4              0.1334
rep2_vs_rep5              0.1170
rep3_vs_rep4              0.1445
rep3_vs_rep5              0.1496
rep4_vs_rep5              0.1285
rep1-pr1_vs_rep1-pr2      0.1233
rep2-pr1_vs_rep2-pr2      0.1331
rep3-pr1_vs_rep3-pr2      0.1444
rep4-pr1_vs_rep4-pr2      0.1812
rep5-pr1_vs_rep5-pr2      0.1987
pooled-pr1_vs_pooled-pr2  0.2066

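For macs2 raw peaks: repN is true replicate N, repN-prM is pseudoreplicate M of replicate N, pooled is the pool of all true replicates, and pooled-prM is pseudoreplicate M of the pooled replicates.

For overlap/IDR peaks: repN_vs_repM compares true replicates N and M, repN-pr1_vs_repN-pr2 compares the two pseudoreplicates of replicate N, and pooled-pr1_vs_pooled-pr2 compares the two pooled pseudoreplicates.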

Annotated genomic region enrichment

                                                  rep1       rep2       rep3       rep4       rep5
Fraction of Reads in universal DHS regions      0.3385     0.3627     0.3355     0.3886     0.3598
Fraction of Reads in blacklist regions        2.39e-06   2.72e-06   3.32e-06   6.45e-05   5.16e-05
Fraction of Reads in promoter regions           0.1283     0.1450     0.1332     0.1575     0.1006
Fraction of Reads in enhancer regions           0.3061     0.3180     0.2942     0.3102     0.3265

Signal-to-noise can be assessed by checking whether reads fall into known open regions such as DHS regions. A high fraction of reads should fall into the universal (cross-cell-type) DHS set, and only a small fraction into blacklist regions. A substantial fraction, though not all, should fall into promoter regions, and likewise into enhancer regions. Promoter regions should not account for all reads, because open chromatin assays are known to be biased toward promoters.
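Each row of this table, like the FRiP values, is the fraction of reads overlapping a region set. A minimal sketch of that computation, assuming reads and regions are (chrom, start, end) tuples in half-open BED coordinates and that a read counts once if it overlaps any region by at least one base; the pipeline's own tooling and overlap rules may differ, and the function names are hypothetical.

```python
import bisect
from collections import defaultdict


def _merge(intervals):
    """Merge overlapping or touching (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_of_reads_in_regions(reads, regions):
    """Fraction of reads overlapping at least one region (half-open coordinates)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = _merge(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])

    total = hits = 0
    for chrom, start, end in reads:
        total += 1
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last merged region starting before the read ends; merged regions are
        # disjoint and sorted, so only this one can overlap the read
        i = bisect.bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / total if total else 0.0


regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 90, 110), ("chr1", 300, 350), ("chr2", 49, 60), ("chr3", 0, 10)]
print(fraction_of_reads_in_regions(reads, regions))
```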


Other quality metrics


Comparison to Roadmap DNase

[Figures: Roadmap DNase correlation bar charts for rep1, rep2, rep3, rep4 and rep5]

This bar chart shows the correlation between your sample and each Roadmap DNase sample, computed from signal within the universal DNase peak region set. The more closely a Roadmap sample's signal distribution in these regions resembles your sample's, the higher the correlation.
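A minimal sketch of the comparison, assuming Pearson correlation between per-region signal vectors (one value per universal DNase region); the report does not state which correlation measure or signal transformation the pipeline uses. The reference names and signal values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length signal vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant signal
    return cov / math.sqrt(var_x * var_y)


def rank_references(sample, references):
    """Sort reference samples by correlation with `sample`, highest first."""
    scores = {name: pearson(sample, signal) for name, signal in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-region signal values (one entry per universal DNase region)
sample = [5.0, 0.5, 3.2, 8.1, 0.0]
references = {
    "roadmap_A": [4.8, 0.7, 2.9, 7.5, 0.2],
    "roadmap_B": [0.3, 6.0, 1.1, 0.4, 5.5],
}
for name, r in rank_references(sample, references):
    print(f"{name}\t{r:.3f}")
```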