PUT

ATAC-seq data from putamen

Pipeline version: v1.4.1

Report generated at 2019-06-14 07:55:02

Paired-end: [True, True, True, True]

Pipeline type: ATAC-Seq

Genome: hg38.tsv

Peak caller: MACS2

Alignment


Flagstat (raw BAM)

Metric                       rep1 (PE)  rep2 (PE)  rep3 (PE)  rep4 (PE)
Total                        95444864   190505688  163936258  146180152
Total (QC-failed)            0          0          0          0
Dupes                        0          0          0          0
Dupes (QC-failed)            0          0          0          0
Mapped                       95394462   190336491  163871444  146146064
Mapped (QC-failed)           0          0          0          0
% Mapped                     99.9500    99.9100    99.9600    99.9800
Paired                       54494188   105424180  98055646   76743000
Paired (QC-failed)           0          0          0          0
Read1                        27247094   52712090   49027823   38371500
Read1 (QC-failed)            0          0          0          0
Read2                        27247094   52712090   49027823   38371500
Read2 (QC-failed)            0          0          0          0
Properly Paired              54411510   105164530  97958260   76671268
Properly Paired (QC-failed)  0          0          0          0
% Properly Paired            99.8500    99.7500    99.9000    99.9100
With itself                  54439110   105210878  97984506   76701606
With itself (QC-failed)      0          0          0          0
Singletons                   4676       44105      6326       7306
Singletons (QC-failed)       0          0          0          0
% Singleton                  0.0100     0.0400     0.0100     0.0100
Diff. Chroms                 1996       2912       2773       1606
Diff. Chroms (QC-failed)     0          0          0          0

Marking duplicates (filtered BAM)

Filtered out (samtools view -F 1804):


Metric             rep1 (PE)  rep2 (PE)  rep3 (PE)  rep4 (PE)
Unpaired Reads     0          0          0          0
Paired Reads       22663221   43049484   41623478   30404447
Unmapped Reads     0          0          0          0
Unpaired Dupes     0          0          0          0
Paired Dupes       19566      7133       12202      12249
Paired Opt. Dupes  0          0          0          0
% Dupes/100        0.0009     0.0002     0.0003     0.0004
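
The -F 1804 mask named above is a sum of SAM flag bits. As a rough illustration (not the pipeline's own code), the sketch below decodes such a mask using the bit values from the SAM specification. The helper names decode_flag and is_excluded are made up for this example.

```python
# Bit values and meanings follow the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "failed QC",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(mask):
    """Return the names of all SAM flag bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if mask & bit]


def is_excluded(read_flag, exclude_mask=1804):
    """Mimic `samtools view -F MASK`: a record is dropped if any masked bit is set."""
    return (read_flag & exclude_mask) != 0


if __name__ == "__main__":
    print(decode_flag(1804))
```

Under this reading, the mask removes unmapped reads, reads with an unmapped mate, secondary alignments, QC failures and duplicates.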

Library complexity (filtered non-mito BAM)

Metric                   rep1 (PE)  rep2 (PE)  rep3 (PE)  rep4 (PE)
Total Reads (Pairs)      22660866   43048114   41620633   30403338
Distinct Reads (Pairs)   22639306   43037380   41606968   30389477
One Read (Pair)          22617847   43026900   41593368   30375764
Two Reads (Pairs)        21388      10381      13554      13652
NRF = Distinct/Total     0.9990     0.9998     0.9997     0.9995
PBC1 = OnePair/Distinct  0.9991     0.9998     0.9997     0.9995
PBC2 = OnePair/TwoPair   1057.5017  4144.7741  3068.7154  2225.0047

Mitochondrial reads are filtered out.

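NRF (non-redundant fraction)
PBC1 (PCR bottleneck coefficient 1)
PBC2 (PCR bottleneck coefficient 2)
PBC1 is the primary measure. Provisionally, 0-0.5 is severe bottlenecking,
0.5-0.8 is moderate bottlenecking, 0.8-0.9 is mild bottlenecking, and 0.9-1.0
is no bottlenecking.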
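
The three ratios can be recomputed directly from the pair counts in the library complexity table. The following is a minimal sketch with hypothetical function names. The bottleneck bands are the provisional ones quoted in the per-replicate notes of this report, and the handling of values exactly on a band edge is an assumption.

```python
def library_complexity(total_pairs, distinct_pairs, one_pair, two_pairs):
    """Return (NRF, PBC1, PBC2) from the four pair counts in the table."""
    nrf = distinct_pairs / total_pairs   # NRF = Distinct/Total
    pbc1 = one_pair / distinct_pairs     # PBC1 = OnePair/Distinct
    pbc2 = one_pair / two_pairs          # PBC2 = OnePair/TwoPair
    return nrf, pbc1, pbc2


def bottleneck_level(pbc1):
    """Map PBC1 to the provisional bands; edge values go to the milder band (assumption)."""
    if pbc1 < 0.5:
        return "severe"
    if pbc1 < 0.8:
        return "moderate"
    if pbc1 < 0.9:
        return "mild"
    return "none"


if __name__ == "__main__":
    # rep1 counts taken from the table above.
    nrf, pbc1, pbc2 = library_complexity(22660866, 22639306, 22617847, 21388)
    print(f"NRF={nrf:.4f} PBC1={pbc1:.4f} PBC2={pbc2:.4f} -> {bottleneck_level(pbc1)}")
```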


Flagstat (filtered/deduped BAM)

Filtered and duplicates removed

Metric                       rep1 (PE)  rep2 (PE)  rep3 (PE)  rep4 (PE)
Total                        45287310   86084702   83222552   60784396
Total (QC-failed)            0          0          0          0
Dupes                        0          0          0          0
Dupes (QC-failed)            0          0          0          0
Mapped                       45287310   86084702   83222552   60784396
Mapped (QC-failed)           0          0          0          0
% Mapped                     100.0000   100.0000   100.0000   100.0000
Paired                       45287310   86084702   83222552   60784396
Paired (QC-failed)           0          0          0          0
Read1                        22643655   43042351   41611276   30392198
Read1 (QC-failed)            0          0          0          0
Read2                        22643655   43042351   41611276   30392198
Read2 (QC-failed)            0          0          0          0
Properly Paired              45287310   86084702   83222552   60784396
Properly Paired (QC-failed)  0          0          0          0
% Properly Paired            100.0000   100.0000   100.0000   100.0000
With itself                  45287310   86084702   83222552   60784396
With itself (QC-failed)      0          0          0          0
Singletons                   0          0          0          0
Singletons (QC-failed)       0          0          0          0
% Singleton                  0.0000     0.0000     0.0000     0.0000
Diff. Chroms                 0          0          0          0
Diff. Chroms (QC-failed)     0          0          0          0

Peak calling


IDR (Irreproducible Discovery Rate) plots

[Plots: rep1-rep2, rep1-rep3, rep1-rep4, rep2-rep3, rep2-rep4, rep3-rep4, rep1-pr, rep2-pr, rep3-pr, rep4-pr, ppr]

Reproducibility QC and peak detection statistics

The number of peaks is capped at 300K for peak-caller MACS2


Metric                  overlap    IDR
Nt                      209467     133096
N1                      169347     101398
N2                      177842     101555
N3                      194335     110142
N4                      177427     108763
Np                      241767     177004
N optimal               241767     177004
N conservative          209467     133096
Optimal Set             ppr        ppr
Conservative Set        rep1-rep2  rep1-rep2
Rescue Ratio            1.1542     1.3299
Self Consistency Ratio  1.1476     1.0862
Reproducibility         pass       pass
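
The two ratios in this table can be reproduced from the peak counts. The sketch below assumes the usual ENCODE reading: Nt is the peak count of the best true-replicate comparison, Np that of the pooled pseudoreplicates, and N1..N4 those of the per-replicate self-pseudoreplicates. The cutoff of 2 and the "borderline" label are assumptions based on that convention and are not stated in this report. Function names are illustrative.

```python
def rescue_ratio(n_pooled_pseudo, n_true):
    """max(Np, Nt) / min(Np, Nt)."""
    return max(n_pooled_pseudo, n_true) / min(n_pooled_pseudo, n_true)


def self_consistency_ratio(n_self):
    """Largest over smallest self-pseudoreplicate peak count (N1..Nk)."""
    return max(n_self) / min(n_self)


def reproducibility(rescue, self_consistency, threshold=2.0):
    """Assumed rule: pass if neither ratio exceeds the threshold,
    borderline if exactly one does, fail if both do."""
    n_over = int(rescue > threshold) + int(self_consistency > threshold)
    return ("pass", "borderline", "fail")[n_over]


if __name__ == "__main__":
    # "overlap" column of the table above.
    rr = rescue_ratio(241767, 209467)
    scr = self_consistency_ratio([169347, 177842, 194335, 177427])
    print(f"{rr:.4f} {scr:.4f} {reproducibility(rr, scr)}")
```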

Overlapping peaks


IDR (Irreproducible Discovery Rate) peaks


Enrichment


Strand cross-correlation measures

Performed on subsampled reads (25M)

Metric                    rep1      rep2      rep3      rep4
Reads                     22641318  25000000  25000000  25000000
Est. Fragment Len.        0         0         0         0
Corr. Est. Fragment Len.  0.3525    0.3159    0.3204    0.3265
Phantom Peak              50        50        50        55
Corr. Phantom Peak        0.3021    0.2778    0.2733    0.2941
Argmin. Corr.             1500      1500      1500      1500
Min. Corr.                0.2014    0.2437    0.2472    0.2319
NSC                       1.7502    1.2962    1.2962    1.4078
RSC                       1.5011    2.1173    2.7987    1.5203

NOTE1: For SE datasets, reads from replicates are randomly subsampled.
NOTE2: For PE datasets, the first end of each read-pair is selected and the reads are then randomly subsampled.
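
The NSC and RSC rows follow from the three correlation values in the table above. This is a minimal sketch using the standard definitions of the normalized and relative strand cross-correlation coefficients, with hypothetical function names. Because the table shows correlations rounded to four decimals, recomputed values can differ from the reported ones in the last digits.

```python
def nsc(corr_at_fragment_len, min_corr):
    """Normalized strand coefficient: fragment-length correlation over the minimum."""
    return corr_at_fragment_len / min_corr


def rsc(corr_at_fragment_len, corr_at_phantom_peak, min_corr):
    """Relative strand coefficient: both peaks measured above the minimum correlation."""
    return (corr_at_fragment_len - min_corr) / (corr_at_phantom_peak - min_corr)


if __name__ == "__main__":
    # rep1 values from the table above.
    print(round(nsc(0.3525, 0.2014), 4))
    print(round(rsc(0.3525, 0.3021, 0.2014), 4))
```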


[Plots: rep1, rep2, rep3, rep4]

Fraction of reads in overlapping peaks

Peak set   Fraction of Reads in Peak
rep1-rep2  0.1689
rep1-rep3  0.1667
rep1-rep4  0.1694
rep2-rep3  0.1637
rep2-rep4  0.1645
rep3-rep4  0.1626
rep1-pr    0.2272
rep2-pr    0.1362
rep3-pr    0.1354
rep4-pr    0.1808
ppr        0.1813


Fraction of reads in IDR peaks

Peak set   Fraction of Reads in Peak
rep1-rep2  0.1351
rep1-rep3  0.1230
rep1-rep4  0.1350
rep2-rep3  0.1210
rep2-rep4  0.1240
rep3-rep4  0.1147
rep1-pr    0.1861
rep2-pr    0.1049
rep3-pr    0.0982
rep4-pr    0.1458
ppr        0.1577


ATAQC


Summary table

Metric                                                            rep1                                     rep2                                     rep3                                     rep4
Genome                                                            GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz (all four replicates)
Paired/single-ended                                               Paired-ended                             Paired-ended                             Paired-ended                             Paired-ended
Read length                                                       50                                       51                                       50                                       51
Read count from sequencer                                         54494188                                 105424180                                98055646                                 76743000
Read count successfully aligned                                   54443786                                 105254983                                97990832                                 76708912
Read count after filtering for mapping quality                    51391743                                 98086184                                 93540058                                 71493582
Read count after removing duplicate reads                         51372177                                 98079051                                 93527856                                 71481333
Read count after removing mitochondrial reads (final read count)  45287310                                 86084702                                 83222552                                 60784396
Mapping quality > q30 (out of total)                              51391743, 0.943068332351                 98086184, 0.93039551268                  93540058, 0.953948720097                 71493582, 0.931597435597
Duplicates (after filtering)                                      19566, 0.000863                          7133, 0.000166                           12202, 0.000293                          12249, 0.000403
Mitochondrial reads (out of total)                                19077, 0.00019998016237                  13372, 7.02545262327e-05                 16778, 0.000102385135509                 13041, 8.92326460465e-05
Duplicates that are mitochondrial (out of all dups)               36, 0.000919963201472                    8, 0.000560773867938                     24, 0.00098344533683                     4, 0.000163278634991
Final reads (after all filters)                                   45287310, 0.831048441349                 86084702, 0.81655557577                  83222552, 0.848727792788                 60784396, 0.792051340187
NRF = Distinct/Total                                              0.999049, OK                             0.999751, OK                             0.999672, OK                             0.999544, OK
PBC1 = OnePair/Distinct                                           0.999052, OK                             0.999756, OK                             0.999673, OK                             0.999549, OK
PBC2 = OnePair/TwoPair                                            1057.50173, OK                           4144.774107, OK                          3068.715361, OK                          2225.004688, OK
Picard est library size                                           24207172875                              33139326668                              60580360021                              26316749690
Fraction of reads in nfr                                          0.642993808036, OK                       0.646909169167, OK                       0.378866302083, out of range [0.4, inf]  0.614671225298, OK
Nfr / mono-nuc reads                                              2.15627959638, out of range [2.5, inf]   2.07666427821, out of range [2.5, inf]   0.931539832479, out of range [2.5, inf]  1.88572038667, out of range [2.5, inf]
Presence of nfr peak                                              OK                                       OK                                       OK                                       OK
Presence of mono-nuc peak                                         OK                                       OK                                       OK                                       OK
Presence of di-nuc peak                                           OK                                       OK                                       OK                                       OK
Naive overlap peaks                                               241767, OK                               241767, OK                               241767, OK                               241767, OK
Idr peaks                                                         177004, OK                               177004, OK                               177004, OK                               177004, OK
Naive peak stats: min size                                        73.0000                                  73.0000                                  73.0000                                  73.0000
Naive peak stats: 25 percentile                                   343.0000                                 343.0000                                 343.0000                                 343.0000
Naive peak stats: 50 percentile (median)                          532.0000                                 532.0000                                 532.0000                                 532.0000
Naive peak stats: 75 percentile                                   754.0000                                 754.0000                                 754.0000                                 754.0000
Naive peak stats: max size                                        2704.0000                                2704.0000                                2704.0000                                2704.0000
Naive peak stats: mean                                            579.3118                                 579.3118                                 579.3118                                 579.3118
Idr peak stats: min size                                          73.0000                                  73.0000                                  73.0000                                  73.0000
Idr peak stats: 25 percentile                                     438.0000                                 438.0000                                 438.0000                                 438.0000
Idr peak stats: 50 percentile (median)                            618.0000                                 618.0000                                 618.0000                                 618.0000
Idr peak stats: 75 percentile                                     829.0000                                 829.0000                                 829.0000                                 829.0000
Idr peak stats: max size                                          2704.0000                                2704.0000                                2704.0000                                2704.0000
Idr peak stats: mean                                              660.5189                                 660.5189                                 660.5189                                 660.5189
Tss enrichment                                                    17.5774                                  8.2167                                   8.9285                                   13.5067
Fraction of reads in universal dhs regions                        11673624, 0.257794709654                 16374593, 0.190220937091                 16816239, 0.202077244275                 15375755, 0.252964840913
Fraction of reads in blacklist regions                            154, 3.40086208762e-06                   401, 4.65835063951e-06                   137, 1.64630048762e-06                   197, 3.24108140771e-06
Fraction of reads in promoter regions                             4531865, 0.100079531589                  4352604, 0.0505634803664                 5064565, 0.0608598235699                 5546244, 0.0912478594467
Fraction of reads in enhancer regions                             10857709, 0.239776434393                 19100683, 0.221889473487                 18755121, 0.225376385749                 15744310, 0.259028377757
Fraction of reads in called peak regions                          8426262, 0.186081525819                  9032504, 0.104929104201                  8168805, 0.0981628295969                 8861093, 0.145784384641

Replicate 1

Sample Information

Sample
Genome GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
Paired/Single-ended Paired-ended
Read length 50

Summary

Read count from sequencer 54,494,188
Read count successfully aligned 54,443,786
Read count after filtering for mapping quality 51,391,743
Read count after removing duplicate reads 51,372,177
Read count after removing mitochondrial reads (final read count) 45,287,310
Note that all these read counts are determined using 'samtools view' - as such,
these are all reads found in the file, whether one end of a pair or a single
end read. In other words, if your file is paired end, then you should divide
these counts by two. Each step follows the previous step; for example, the
duplicate reads were removed after reads were removed for low mapping quality.
This bar chart also shows the filtering process and where the reads were lost
over the process. Note that each step is sequential - as such, there may
have been more mitochondrial reads which were already filtered because of
high duplication or low mapping quality.
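
To make the "divide by two" advice concrete, here is a small sketch that converts the Replicate 1 read counts listed above into fragment (pair) counts and into fractions of the sequencer total. The step labels and the summarize helper are hypothetical names chosen for this example.

```python
# Replicate 1 read counts as listed in the summary above.
REP1_COUNTS = [
    ("from sequencer", 54494188),
    ("successfully aligned", 54443786),
    ("after mapping-quality filter", 51391743),
    ("after duplicate removal", 51372177),
    ("after mitochondrial removal", 45287310),
]


def summarize(counts, paired_end=True):
    """Return (step, reads, fragments, fraction of the first step) per stage."""
    total = counts[0][1]
    rows = []
    for step, n_reads in counts:
        # For paired-end data each fragment contributes two reads.
        fragments = n_reads // 2 if paired_end else n_reads
        rows.append((step, n_reads, fragments, n_reads / total))
    return rows


if __name__ == "__main__":
    for step, reads, fragments, fraction in summarize(REP1_COUNTS):
        print(f"{step:30s} {reads:>10d} {fragments:>10d} {fraction:.3f}")
```

The fragment count for the first step equals the number of read pairs given at the top of the Bowtie alignment log.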

Alignment statistics

Bowtie alignment log

27247094 reads; of these:
  27247094 (100.00%) were paired; of these:
    41339 (0.15%) aligned concordantly 0 times
    20628768 (75.71%) aligned concordantly exactly 1 time
    6576987 (24.14%) aligned concordantly >1 times
    ----
    41339 pairs aligned concordantly 0 times; of these:
      6053 (14.64%) aligned discordantly 1 time
    ----
    35286 pairs aligned 0 times concordantly or discordantly; of these:
      70572 mates make up the pairs; of these:
        50402 (71.42%) aligned 0 times
        3820 (5.41%) aligned exactly 1 time
        16350 (23.17%) aligned >1 times
99.91% overall alignment rate
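
The figures in this log are mutually consistent, which can be checked by hand. The sketch below is only an illustration, with invented function names: it counts aligned mates from the log lines and derives the overall alignment rate from them.

```python
def aligned_mates(conc_once, conc_multi, disc_once, mates_once, mates_multi):
    """Mates aligned in any way: two per aligned pair, plus mates aligned on their own."""
    return 2 * (conc_once + conc_multi + disc_once) + mates_once + mates_multi


def overall_alignment_rate(total_pairs, n_aligned_mates):
    """Percentage of all mates (2 per pair) that aligned at least once."""
    return 100.0 * n_aligned_mates / (2 * total_pairs)


if __name__ == "__main__":
    # Values copied from the Replicate 1 log above.
    n = aligned_mates(20628768, 6576987, 6053, 3820, 16350)
    print(n, f"{overall_alignment_rate(27247094, n):.2f}%")
```

For Replicate 1 the aligned-mate total matches the "Read count successfully aligned" value in the summary.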

  

Samtools flagstat

95444864 + 0 in total (QC-passed reads + QC-failed reads)
40950676 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
95394462 + 0 mapped (99.95%:-nan%)
54494188 + 0 paired in sequencing
27247094 + 0 read1
27247094 + 0 read2
54411510 + 0 properly paired (99.85%:-nan%)
54439110 + 0 with itself and mate mapped
4676 + 0 singletons (0.01%:-nan%)
5318 + 0 with mate mapped to a different chr
1996 + 0 with mate mapped to a different chr (mapQ>=5)

  
Note that the flagstat command counts alignments, not reads. Please
use the read counts table to get accurate counts of reads at each
stage of the pipeline.
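
One way to see the difference between alignments and reads is to subtract secondary and supplementary records from the flagstat total. The sketch below parses a few lines copied from the Replicate 1 output. The helper names are hypothetical, and the regular expression assumes the "N + M label" line layout shown above.

```python
import re

# Four lines copied from the Replicate 1 flagstat output above.
REP1_FLAGSTAT = """\
95444864 + 0 in total (QC-passed reads + QC-failed reads)
40950676 + 0 secondary
0 + 0 supplementary
54494188 + 0 paired in sequencing
"""


def flagstat_count(text, label):
    """Return QC-passed + QC-failed count for the line whose label starts with `label`."""
    pattern = r"^\s*(\d+) \+ (\d+) " + re.escape(label)
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no flagstat line for {label!r}")
    return int(match.group(1)) + int(match.group(2))


def primary_records(text):
    """Total records minus secondary and supplementary alignments."""
    return (
        flagstat_count(text, "in total")
        - flagstat_count(text, "secondary")
        - flagstat_count(text, "supplementary")
    )


if __name__ == "__main__":
    print(primary_records(REP1_FLAGSTAT))
```

For Replicate 1 the result equals the "paired in sequencing" line, which is also the read count from the sequencer.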

Filtering statistics

Mapping quality > q30 (out of total) 51,391,743 0.943
Duplicates (after filtering) 19,566 0.001
Mitochondrial reads (out of total) 19,077 0.000
Duplicates that are mitochondrial (out of all dups) 36 0.001
Final reads (after all filters) 45,287,310 0.831
Mapping quality refers to the quality of the read being aligned to that
particular location in the genome. A standard quality score is > 30.
Duplications are often due to PCR duplication rather than two unique reads
mapping to the same location. High duplication is an indication of poor
libraries. Mitochondrial reads are often high in chromatin accessibility
assays because the mitochondrial genome is very open. A high mitochondrial
fraction is an indication of poor libraries. Based on prior experience, a
final read fraction above 0.70 indicates a good library.
  

Library complexity statistics

ENCODE library complexity metrics

Metric Result
NRF 0.999049 - OK
PBC1 0.999052 - OK
PBC2 1057.50173 - OK
The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads
in a dataset; it is the ratio between the number of positions in the genome
that uniquely mapped reads map to and the total number of uniquely mappable
reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations
with EXACTLY one read pair over the genomic locations with AT LEAST one read
pair. PBC1 is the primary measure, and the PBC1 should be close to 1.
Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking,
0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is
the ratio of genomic locations with EXACTLY one read pair over the genomic
locations with EXACTLY two read pairs. The PBC2 should be significantly
greater than 1.

Picard EstimateLibraryComplexity

24,207,172,875

Yield prediction

Preseq performs a yield prediction by subsampling the reads, calculating the
number of distinct reads, and then extrapolating out to see where the
expected number of distinct reads no longer increases. The confidence interval
gives a gauge as to the validity of the yield predictions.

Fragment length statistics

Metric Result
Fraction of reads in NFR 0.642993808036 - OK
NFR / mono-nuc reads 2.15627959638 out of range [2.5, inf]
Presence of NFR peak OK
Presence of Mono-Nuc peak OK
Presence of Di-Nuc peak OK
Open chromatin assays show distinct fragment length enrichments, as the cut
sites are only in open chromatin and not in nucleosomes. As such, peaks
representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal)
fragment lengths will arise. Good libraries will show these peaks in a
fragment length distribution and will show specific peak ratios.
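
The "OK" and "out of range [low, inf]" annotations in the table above amount to a simple range check. The sketch below reproduces that labelling under the assumption that the bounds are inclusive. qc_label is a hypothetical helper, not the report generator's own function.

```python
import math


def qc_label(value, low, high=math.inf):
    """Return "OK" if low <= value <= high, else the out-of-range note used in the report."""
    if low <= value <= high:
        return "OK"
    return f"out of range [{low}, {high}]"


if __name__ == "__main__":
    # Replicate 1 values with the ranges shown in the report.
    print(qc_label(0.642993808036, 0.4))
    print(qc_label(2.15627959638, 2.5))
```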

Peak statistics

Metric Result
Naive overlap peaks 241767 - OK
IDR peaks 177004 - OK

Naive overlap peak file statistics

Min size 73.0
25 percentile 343.0
50 percentile (median) 532.0
75 percentile 754.0
Max size 2704.0
Mean 579.311800204

IDR peak file statistics

Min size 73.0
25 percentile 438.0
50 percentile (median) 618.0
75 percentile 829.0
Max size 2704.0
Mean 660.518858331
For a good ATAC-seq experiment in human, you expect to get 100k-200k peaks
for a specific cell type.

Sequence quality metrics

GC bias

Open chromatin assays are known to have significant GC bias. Please take this
into consideration as necessary.

Annotation-based quality metrics

Enrichment plots (TSS)

Open chromatin assays should show enrichment in open chromatin sites, such as
TSS's. An average TSS enrichment in human (hg19) is above 6. A strong TSS enrichment is
above 10. For other references please see https://www.encodeproject.org/atac-seq/
  

Annotated genomic region enrichments

Fraction of reads in universal DHS regions 11,673,624 0.258
Fraction of reads in blacklist regions 154 0.000
Fraction of reads in promoter regions 4,531,865 0.100
Fraction of reads in enhancer regions 10,857,709 0.240
Fraction of reads in called peak regions 8,426,262 0.186
Signal to noise can be assessed by considering whether reads are falling into
known open regions (such as DHS regions) or not. A high fraction of reads
should fall into the universal (across cell type) DHS set. A small fraction
should fall into the blacklist regions. A large share (though not all) should
fall into the promoter regions, and likewise into the enhancer regions. The
promoter regions should not take up all reads, as it is known that there is
a bias for promoters in open chromatin assays.

Comparison to Roadmap DNase

This bar chart shows the correlation between the Roadmap DNase samples and
your sample when the signal in the universal DNase peak region set is
compared. The closer a Roadmap sample's signal distribution in those regions
is to that of your sample, the higher the correlation.

Replicate 2

Sample Information

Sample
Genome GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
Paired/Single-ended Paired-ended
Read length 51

Summary

Read count from sequencer 105,424,180
Read count successfully aligned 105,254,983
Read count after filtering for mapping quality 98,086,184
Read count after removing duplicate reads 98,079,051
Read count after removing mitochondrial reads (final read count) 86,084,702
Note that all these read counts are determined using 'samtools view' - as such,
these are all reads found in the file, whether one end of a pair or a single
end read. In other words, if your file is paired end, then you should divide
these counts by two. Each step follows the previous step; for example, the
duplicate reads were removed after reads were removed for low mapping quality.
This bar chart also shows the filtering process and where the reads were lost
over the process. Note that each step is sequential - as such, there may
have been more mitochondrial reads which were already filtered because of
high duplication or low mapping quality.

Alignment statistics

Bowtie alignment log

52712090 reads; of these:
  52712090 (100.00%) were paired; of these:
    129825 (0.25%) aligned concordantly 0 times
    39176936 (74.32%) aligned concordantly exactly 1 time
    13405329 (25.43%) aligned concordantly >1 times
    ----
    129825 pairs aligned concordantly 0 times; of these:
      8677 (6.68%) aligned discordantly 1 time
    ----
    121148 pairs aligned 0 times concordantly or discordantly; of these:
      242296 mates make up the pairs; of these:
        169197 (69.83%) aligned 0 times
        26920 (11.11%) aligned exactly 1 time
        46179 (19.06%) aligned >1 times
99.84% overall alignment rate

  

Samtools flagstat

190505688 + 0 in total (QC-passed reads + QC-failed reads)
85081508 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
190336491 + 0 mapped (99.91%:-nan%)
105424180 + 0 paired in sequencing
52712090 + 0 read1
52712090 + 0 read2
105164530 + 0 properly paired (99.75%:-nan%)
105210878 + 0 with itself and mate mapped
44105 + 0 singletons (0.04%:-nan%)
8712 + 0 with mate mapped to a different chr
2912 + 0 with mate mapped to a different chr (mapQ>=5)

  
Note that the flagstat command counts alignments, not reads. Please
use the read counts table to get accurate counts of reads at each
stage of the pipeline.

Filtering statistics

Mapping quality > q30 (out of total) 98,086,184 0.930
Duplicates (after filtering) 7,133 0.000
Mitochondrial reads (out of total) 13,372 0.000
Duplicates that are mitochondrial (out of all dups) 8 0.001
Final reads (after all filters) 86,084,702 0.817
Mapping quality refers to the quality of the read being aligned to that
particular location in the genome. A standard quality score is > 30.
Duplications are often due to PCR duplication rather than two unique reads
mapping to the same location. High duplication is an indication of poor
libraries. Mitochondrial reads are often high in chromatin accessibility
assays because the mitochondrial genome is very open. A high mitochondrial
fraction is an indication of poor libraries. Based on prior experience, a
final read fraction above 0.70 indicates a good library.
  

Library complexity statistics

ENCODE library complexity metrics

Metric Result
NRF 0.999751 - OK
PBC1 0.999756 - OK
PBC2 4144.774107 - OK
The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads
in a dataset; it is the ratio between the number of positions in the genome
that uniquely mapped reads map to and the total number of uniquely mappable
reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations
with EXACTLY one read pair over the genomic locations with AT LEAST one read
pair. PBC1 is the primary measure, and the PBC1 should be close to 1.
Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking,
0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is
the ratio of genomic locations with EXACTLY one read pair over the genomic
locations with EXACTLY two read pairs. The PBC2 should be significantly
greater than 1.

Picard EstimateLibraryComplexity

33,139,326,668

Yield prediction

Preseq performs a yield prediction by subsampling the reads, calculating the
number of distinct reads, and then extrapolating out to see where the
expected number of distinct reads no longer increases. The confidence interval
gives a gauge as to the validity of the yield predictions.

Fragment length statistics

Metric Result
Fraction of reads in NFR 0.646909169167 - OK
NFR / mono-nuc reads 2.07666427821 out of range [2.5, inf]
Presence of NFR peak OK
Presence of Mono-Nuc peak OK
Presence of Di-Nuc peak OK
Open chromatin assays show distinct fragment length enrichments, as the cut
sites are only in open chromatin and not in nucleosomes. As such, peaks
representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal)
fragment lengths will arise. Good libraries will show these peaks in a
fragment length distribution and will show specific peak ratios.

Peak statistics

Metric Result
Naive overlap peaks 241767 - OK
IDR peaks 177004 - OK

Naive overlap peak file statistics

Min size 73.0
25 percentile 343.0
50 percentile (median) 532.0
75 percentile 754.0
Max size 2704.0
Mean 579.311800204

IDR peak file statistics

Min size 73.0
25 percentile 438.0
50 percentile (median) 618.0
75 percentile 829.0
Max size 2704.0
Mean 660.518858331
For a good ATAC-seq experiment in human, you expect to get 100k-200k peaks
for a specific cell type.

Sequence quality metrics

GC bias

Open chromatin assays are known to have significant GC bias. Please take this
into consideration as necessary.

Annotation-based quality metrics

Enrichment plots (TSS)

Open chromatin assays should show enrichment in open chromatin sites, such as
TSS's. An average TSS enrichment in human (hg19) is above 6. A strong TSS enrichment is
above 10. For other references please see https://www.encodeproject.org/atac-seq/
  

Annotated genomic region enrichments

Fraction of reads in universal DHS regions 16,374,593 0.190
Fraction of reads in blacklist regions 401 0.000
Fraction of reads in promoter regions 4,352,604 0.051
Fraction of reads in enhancer regions 19,100,683 0.222
Fraction of reads in called peak regions 9,032,504 0.105
Signal to noise can be assessed by considering whether reads are falling into
known open regions (such as DHS regions) or not. A high fraction of reads
should fall into the universal (across cell type) DHS set. A small fraction
should fall into the blacklist regions. A large share (though not all) should
fall into the promoter regions, and likewise into the enhancer regions. The
promoter regions should not take up all reads, as it is known that there is
a bias for promoters in open chromatin assays.

Comparison to Roadmap DNase

This bar chart shows the correlation between the Roadmap DNase samples and
your sample when the signal in the universal DNase peak region set is
compared. The closer a Roadmap sample's signal distribution in those regions
is to that of your sample, the higher the correlation.

Replicate 3

Sample Information

Sample
Genome GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
Paired/Single-ended Paired-ended
Read length 50

Summary

Read count from sequencer 98,055,646
Read count successfully aligned 97,990,832
Read count after filtering for mapping quality 93,540,058
Read count after removing duplicate reads 93,527,856
Read count after removing mitochondrial reads (final read count) 83,222,552
Note that all these read counts are determined using 'samtools view' - as such,
these are all reads found in the file, whether one end of a pair or a single
end read. In other words, if your file is paired end, then you should divide
these counts by two. Each step follows the previous step; for example, the
duplicate reads were removed after reads were removed for low mapping quality.
This bar chart also shows the filtering process and where the reads were lost
over the process. Note that each step is sequential - as such, there may
have been more mitochondrial reads which were already filtered because of
high duplication or low mapping quality.

Alignment statistics

Bowtie alignment log

49027823 reads; of these:
  49027823 (100.00%) were paired; of these:
    48693 (0.10%) aligned concordantly 0 times
    38514404 (78.56%) aligned concordantly exactly 1 time
    10464726 (21.34%) aligned concordantly >1 times
    ----
    48693 pairs aligned concordantly 0 times; of these:
      4789 (9.84%) aligned discordantly 1 time
    ----
    43904 pairs aligned 0 times concordantly or discordantly; of these:
      87808 mates make up the pairs; of these:
        64814 (73.81%) aligned 0 times
        4520 (5.15%) aligned exactly 1 time
        18474 (21.04%) aligned >1 times
99.93% overall alignment rate

  

Samtools flagstat

163936258 + 0 in total (QC-passed reads + QC-failed reads)
65880612 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
163871444 + 0 mapped (99.96%:-nan%)
98055646 + 0 paired in sequencing
49027823 + 0 read1
49027823 + 0 read2
97958260 + 0 properly paired (99.90%:-nan%)
97984506 + 0 with itself and mate mapped
6326 + 0 singletons (0.01%:-nan%)
6528 + 0 with mate mapped to a different chr
2773 + 0 with mate mapped to a different chr (mapQ>=5)

  
Note that the flagstat command counts alignments, not reads. Please
use the read counts table to get accurate counts of reads at each
stage of the pipeline.

Filtering statistics

Mapping quality > q30 (out of total) 93,540,058 0.954
Duplicates (after filtering) 12,202 0.000
Mitochondrial reads (out of total) 16,778 0.000
Duplicates that are mitochondrial (out of all dups) 24 0.001
Final reads (after all filters) 83,222,552 0.849
Mapping quality refers to the quality of the read being aligned to that
particular location in the genome. A standard quality score is > 30.
Duplications are often due to PCR duplication rather than two unique reads
mapping to the same location. High duplication is an indication of poor
libraries. Mitochondrial reads are often high in chromatin accessibility
assays because the mitochondrial genome is very open. A high mitochondrial
fraction is an indication of poor libraries. Based on prior experience, a
final read fraction above 0.70 indicates a good library.
  

Library complexity statistics

ENCODE library complexity metrics

Metric Result
NRF 0.999672 - OK
PBC1 0.999673 - OK
PBC2 3068.715361 - OK
The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads
in a dataset; it is the ratio between the number of positions in the genome
that uniquely mapped reads map to and the total number of uniquely mappable
reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations
with EXACTLY one read pair over the genomic locations with AT LEAST one read
pair. PBC1 is the primary measure, and the PBC1 should be close to 1.
Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking,
0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is
the ratio of genomic locations with EXACTLY one read pair over the genomic
locations with EXACTLY two read pairs. The PBC2 should be significantly
greater than 1.

Picard EstimateLibraryComplexity

60,580,360,021

Yield prediction

Preseq performs a yield prediction by subsampling the reads, calculating the
number of distinct reads, and then extrapolating out to see where the
expected number of distinct reads no longer increases. The confidence interval
gives a gauge as to the validity of the yield predictions.

Fragment length statistics

Metric Result
Fraction of reads in NFR 0.378866302083 out of range [0.4, inf]
NFR / mono-nuc reads 0.931539832479 out of range [2.5, inf]
Presence of NFR peak OK
Presence of Mono-Nuc peak OK
Presence of Di-Nuc peak OK
Open chromatin assays show distinct fragment length enrichments, as the cut
sites are only in open chromatin and not in nucleosomes. As such, peaks
representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal)
fragment lengths will arise. Good libraries will show these peaks in a
fragment length distribution and will show specific peak ratios.

Peak statistics

Metric Result
Naive overlap peaks 241767 - OK
IDR peaks 177004 - OK

Naive overlap peak file statistics

Min size 73.0
25 percentile 343.0
50 percentile (median) 532.0
75 percentile 754.0
Max size 2704.0
Mean 579.311800204

IDR peak file statistics

Min size 73.0
25 percentile 438.0
50 percentile (median) 618.0
75 percentile 829.0
Max size 2704.0
Mean 660.518858331
For a good ATAC-seq experiment in human, you expect to get 100k-200k peaks
for a specific cell type.

Sequence quality metrics

GC bias

Open chromatin assays are known to have significant GC bias. Please take this
into consideration as necessary.

Annotation-based quality metrics

Enrichment plots (TSS)

Open chromatin assays should show enrichment in open chromatin sites, such as
TSS's. An average TSS enrichment in human (hg19) is above 6. A strong TSS enrichment is
above 10. For other references please see https://www.encodeproject.org/atac-seq/
  

Annotated genomic region enrichments

Fraction of reads in universal DHS regions 16,816,239 0.202
Fraction of reads in blacklist regions 137 0.000
Fraction of reads in promoter regions 5,064,565 0.061
Fraction of reads in enhancer regions 18,755,121 0.225
Fraction of reads in called peak regions 8,168,805 0.098
Signal to noise can be assessed by considering whether reads are falling into
known open regions (such as DHS regions) or not. A high fraction of reads
should fall into the universal (across cell type) DHS set. A small fraction
should fall into the blacklist regions. A large share (though not all) should
fall into the promoter regions, and likewise into the enhancer regions. The
promoter regions should not take up all reads, as it is known that there is
a bias for promoters in open chromatin assays.

Comparison to Roadmap DNase

This bar chart shows the correlation between the Roadmap DNase samples and
your sample when the signal in the universal DNase peak region set is
compared. The closer a Roadmap sample's signal distribution in those regions
is to that of your sample, the higher the correlation.

Replicate 4

Sample Information

Sample
Genome GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
Paired/Single-ended Paired-ended
Read length 51

Summary

Read count from sequencer 76,743,000
Read count successfully aligned 76,708,912
Read count after filtering for mapping quality 71,493,582
Read count after removing duplicate reads 71,481,333
Read count after removing mitochondrial reads (final read count) 60,784,396
Note that all these read counts are determined using 'samtools view' - as such,
these are all reads found in the file, whether one end of a pair or a single
end read. In other words, if your file is paired end, then you should divide
these counts by two. Each step follows the previous step; for example, the
duplicate reads were removed after reads were removed for low mapping quality.
This bar chart also shows the filtering process and where the reads were lost
over the process. Note that each step is sequential - as such, there may
have been more mitochondrial reads which were already filtered because of
high duplication or low mapping quality.

Alignment statistics

Bowtie alignment log

38371500 reads; of these:
  38371500 (100.00%) were paired; of these:
    35866 (0.09%) aligned concordantly 0 times
    27760554 (72.35%) aligned concordantly exactly 1 time
    10575080 (27.56%) aligned concordantly >1 times
    ----
    35866 pairs aligned concordantly 0 times; of these:
      5296 (14.77%) aligned discordantly 1 time
    ----
    30570 pairs aligned 0 times concordantly or discordantly; of these:
      61140 mates make up the pairs; of these:
        34088 (55.75%) aligned 0 times
        4660 (7.62%) aligned exactly 1 time
        22392 (36.62%) aligned >1 times
99.96% overall alignment rate
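
The numbers in this log are internally consistent, and the overall alignment rate can be reproduced from them. The sketch below assumes the rate is the share of individual mates that aligned at least once (concordantly, discordantly, or singly); that is a reading of the log layout, not something the report states. All variable names are illustrative.

```python
# Values copied from the Bowtie alignment log above.
total_pairs = 38_371_500
concordant_0 = 35_866          # pairs aligned concordantly 0 times
concordant_1 = 27_760_554      # pairs aligned concordantly exactly 1 time
concordant_multi = 10_575_080  # pairs aligned concordantly >1 times
discordant_1 = 5_296           # of the concordant_0 pairs, aligned discordantly
mates_unaligned = 34_088       # individual mates that aligned 0 times

# The three concordance classes partition all pairs.
assert concordant_0 + concordant_1 + concordant_multi == total_pairs

# Pairs that aligned neither concordantly nor discordantly.
pairs_unaligned = concordant_0 - discordant_1

# Assumption: the overall rate is computed per mate, not per pair.
total_mates = 2 * total_pairs
overall_rate = (total_mates - mates_unaligned) / total_mates
print(f"overall alignment rate: {overall_rate:.2%}")
```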

  

Samtools flagstat

146180152 + 0 in total (QC-passed reads + QC-failed reads)
69437152 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
146146064 + 0 mapped (99.98%:-nan%)
76743000 + 0 paired in sequencing
38371500 + 0 read1
38371500 + 0 read2
76671268 + 0 properly paired (99.91%:-nan%)
76701606 + 0 with itself and mate mapped
7306 + 0 singletons (0.01%:-nan%)
5342 + 0 with mate mapped to a different chr
1606 + 0 with mate mapped to a different chr (mapQ>=5)

  
Note that the flagstat command counts alignments, not reads. Please
use the read counts table to get accurate counts of reads at each
stage of the pipeline.
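
To see why the flagstat total differs from the read counts table, the secondary and supplementary alignments can be subtracted out. This is a minimal sketch using the flagstat values above; it assumes every secondary alignment is also counted in the "mapped" line, and the variable names are illustrative.

```python
# Values copied from the samtools flagstat output above.
total_alignments = 146_180_152
secondary = 69_437_152
supplementary = 0
mapped_alignments = 146_146_064

# Each read has exactly one primary record, so removing secondary and
# supplementary alignments leaves one record per read.
primary_reads = total_alignments - secondary - supplementary

# Assumption: secondary alignments are all mapped, so they can be
# subtracted from the mapped total as well.
mapped_primary_reads = mapped_alignments - secondary - supplementary

print(f"reads (primary records): {primary_reads:,}")
print(f"mapped reads:            {mapped_primary_reads:,}")
```

Both results agree with the Summary table: 76,743,000 reads from the sequencer and 76,708,912 reads successfully aligned.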

Filtering statistics

Metric Reads Fraction
Mapping quality > q30 (out of total) 71,493,582 0.932
Duplicates (after filtering) 12,249 0.000
Mitochondrial reads (out of total) 13,041 0.000
Duplicates that are mitochondrial (out of all dups) 4 0.000
Final reads (after all filters) 60,784,396 0.792
Mapping quality refers to the confidence that a read is aligned to that
particular location in the genome. A standard threshold is a quality score
above 30. Duplicates are often due to PCR duplication rather than two unique
reads mapping to the same location. High duplication is an indication of a
poor library. Mitochondrial reads are often abundant in chromatin
accessibility assays because the mitochondrial genome is very open. A high
mitochondrial fraction is an indication of a poor library. Based on prior
experience, a final read fraction above 0.70 indicates a good library.
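
The fractions in this table can be recomputed from the counts. The sketch below assumes that "out of total" means the 76,743,000 reads from the sequencer, which reproduces the reported values; the 0.70 cutoff is the one quoted in the text, and the variable names are illustrative.

```python
# Counts copied from the Summary and Filtering statistics tables.
total_reads = 76_743_000      # read count from sequencer
mapq_pass_reads = 71_493_582  # mapping quality > q30
final_reads = 60_784_396      # after all filters

GOOD_LIBRARY_MIN_FRACTION = 0.70  # threshold quoted in the text

mapq_fraction = mapq_pass_reads / total_reads
final_fraction = final_reads / total_reads
is_good_library = final_fraction > GOOD_LIBRARY_MIN_FRACTION

print(f"mapping quality fraction: {mapq_fraction:.3f}")
print(f"final read fraction:      {final_fraction:.3f}")
print(f"above 0.70 threshold:     {is_good_library}")
```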
  

Library complexity statistics

ENCODE library complexity metrics

Metric Result
NRF 0.999544 - OK
PBC1 0.999549 - OK
PBC2 2225.004688 - OK
The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads
in a dataset; it is the ratio between the number of positions in the genome
that uniquely mapped reads map to and the total number of uniquely mappable
reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations
with EXACTLY one read pair over the genomic locations with AT LEAST one read
pair. PBC1 is the primary measure, and the PBC1 should be close to 1.
Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking,
0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is
the ratio of genomic locations with EXACTLY one read pair over the genomic
locations with EXACTLY two read pairs. The PBC2 should be significantly
greater than 1.
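
As a worked example of these definitions, the three metrics can be recomputed from the rep4 column of the library complexity table earlier in this report. This is a minimal sketch; the function name `bottleneck_level` and its handling of values that fall exactly on a boundary are assumptions.

```python
# Counts for rep4 from the "Library complexity (filtered non-mito BAM)" table.
total_pairs = 30_403_338     # Total Reads (Pairs)
distinct_pairs = 30_389_477  # Distinct Reads (Pairs)
one_pair = 30_375_764        # locations with exactly one read pair
two_pairs = 13_652           # locations with exactly two read pairs

nrf = distinct_pairs / total_pairs
pbc1 = one_pair / distinct_pairs
pbc2 = one_pair / two_pairs


def bottleneck_level(pbc1_value):
    """Map PBC1 to the provisional categories given in the text."""
    if pbc1_value < 0.5:
        return "severe"
    if pbc1_value < 0.8:
        return "moderate"
    if pbc1_value < 0.9:
        return "mild"
    return "none"


print(f"NRF  = {nrf:.6f}")
print(f"PBC1 = {pbc1:.6f} ({bottleneck_level(pbc1)} bottlenecking)")
print(f"PBC2 = {pbc2:.6f}")
```

The results match the reported values of 0.999544, 0.999549 and 2225.004688.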

Picard EstimateLibraryComplexity

26,316,749,690

Yield prediction

Preseq performs a yield prediction by subsampling the reads, calculating the
number of distinct reads, and then extrapolating out to see where the
expected number of distinct reads no longer increases. The confidence interval
gives a gauge as to the validity of the yield predictions.

Fragment length statistics

Metric Result
Fraction of reads in NFR 0.614671225298 - OK
NFR / mono-nuc reads 1.88572038667 out of range [2.5, inf]
Presence of NFR peak OK
Presence of Mono-Nuc peak OK
Presence of Di-Nuc peak OK
Open chromatin assays show distinct fragment length enrichments, as the cut
sites are only in open chromatin and not in nucleosomes. As such, peaks
representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal)
fragment lengths will arise. Good libraries will show these peaks in a
fragment length distribution and will show specific peak ratios.
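
The "out of range" flag on the NFR / mono-nuc ratio is a simple bounds check. The sketch below reproduces it and also derives the mono-nucleosomal fraction implied by the two reported numbers. The derivation assumes both metrics share the same denominator (all reads), which the report does not state; the names are illustrative.

```python
import math

# Values copied from the Fragment length statistics table.
fraction_nfr = 0.614671225298
nfr_to_mono_ratio = 1.88572038667
expected_ratio_range = (2.5, math.inf)  # range printed in the table


def in_range(value, bounds):
    """Return True if value lies within the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


ratio_ok = in_range(nfr_to_mono_ratio, expected_ratio_range)

# Assumption: ratio = NFR reads / mono-nuc reads and fraction_nfr = NFR reads
# / all reads, so dividing one by the other gives the mono-nuc fraction.
implied_mono_fraction = fraction_nfr / nfr_to_mono_ratio

print(f"ratio within expected range: {ratio_ok}")
print(f"implied mono-nuc fraction:   {implied_mono_fraction:.3f}")
```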

Peak statistics

Metric Result
Naive overlap peaks 241767 - OK
IDR peaks 177004 - OK

Naive overlap peak file statistics

Min size 73.0
25 percentile 343.0
50 percentile (median) 532.0
75 percentile 754.0
Max size 2704.0
Mean 579.311800204

IDR peak file statistics

Min size 73.0
25 percentile 438.0
50 percentile (median) 618.0
75 percentile 829.0
Max size 2704.0
Mean 660.518858331
For a good ATAC-seq experiment in human, you expect to get 100k-200k peaks
for a specific cell type.
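
Size statistics like those above can be computed from peak intervals. The following is a minimal sketch on a handful of made-up peaks. The peak coordinates are hypothetical, and the report does not say which percentile method it uses, so the linear-interpolation ("inclusive") method here is an assumption.

```python
import statistics


def peak_width_summary(peaks):
    """Summarize peak widths for (chrom, start, end) intervals."""
    widths = sorted(end - start for _, start, end in peaks)
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(widths, n=4, method="inclusive")
    return {
        "min": widths[0],
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": widths[-1],
        "mean": statistics.mean(widths),
    }


# Hypothetical peaks, for illustration only.
toy_peaks = [
    ("chr1", 0, 100),
    ("chr1", 500, 700),
    ("chr2", 1000, 1300),
    ("chr2", 2000, 2400),
    ("chr3", 0, 500),
]

summary = peak_width_summary(toy_peaks)
for name, value in summary.items():
    print(f"{name}: {value}")
```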

Sequence quality metrics

GC bias

Open chromatin assays are known to have significant GC bias. Please take this
into consideration as necessary.

Annotation-based quality metrics

Enrichment plots (TSS)

Open chromatin assays should show enrichment at open chromatin sites, such as
TSSs. An average TSS enrichment in human (hg19) is above 6; a strong TSS
enrichment is above 10. For other references, please see
https://www.encodeproject.org/atac-seq/
  

Annotated genomic region enrichments

Metric Reads Fraction
Fraction of reads in universal DHS regions 15,375,755 0.253
Fraction of reads in blacklist regions 197 0.000
Fraction of reads in promoter regions 5,546,244 0.091
Fraction of reads in enhancer regions 15,744,310 0.259
Fraction of reads in called peak regions 8,861,093 0.146
Signal to noise can be assessed by considering whether reads fall into
known open regions (such as DHS regions) or not. A high fraction of reads
should fall into the universal (across cell type) DHS set. A small fraction
should fall into the blacklist regions. A high fraction (though not all)
should fall into the promoter regions, and a high fraction (though not all)
should fall into the enhancer regions. The promoter regions should not take
up all reads, as it is known that there is a bias for promoters in open
chromatin assays.
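
The fractions in this table can be recomputed from the read counts. The sketch below assumes the denominator is the final read count of 60,784,396 from the Summary table; that assumption reproduces every reported fraction. The names are illustrative.

```python
# Final read count (after all filters) from the Summary table.
final_reads = 60_784_396

# Read counts copied from the annotated region enrichment table.
reads_in_region = {
    "universal DHS": 15_375_755,
    "blacklist": 197,
    "promoter": 5_546_244,
    "enhancer": 15_744_310,
    "called peaks": 8_861_093,
}

# Fraction of final reads falling in each region set.
fractions = {
    region: count / final_reads for region, count in reads_in_region.items()
}

for region, fraction in fractions.items():
    print(f"{region}: {fraction:.3f}")
```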

Comparison to Roadmap DNase

This bar chart shows the correlation between each Roadmap DNase sample and
your sample when signal is compared across the universal DNase peak regions.
The more closely a Roadmap sample's signal distribution in these regions
matches your sample's, the higher the correlation.
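
To illustrate the idea, the sketch below correlates per-region signal between a sample and two reference samples. The report does not say which correlation measure it uses, so Spearman rank correlation is an assumption here, and all signal values and sample names are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical signal in four universal DNase regions.
your_sample = [1.0, 2.0, 3.0, 4.0]
reference_similar = [10.0, 20.0, 30.0, 40.0]
reference_opposite = [40.0, 30.0, 20.0, 10.0]

print(spearman(your_sample, reference_similar))
print(spearman(your_sample, reference_opposite))
```

A reference sample whose regions rank in the same order as yours scores near 1, while one with the opposite ordering scores near -1.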