Bioinformatic Data on G-Quadruplexes in Genomes

Links to our data and answers to frequently asked questions are available on our GitHub page:

https://github.com/sblab-bioinformatics/faq

For example:

DNA G-quadruplexes

1.  DNA G-quadruplexes in the human genome: raw data, supplementary BED files with genomic coordinates, and genomic tracks are deposited in GEO

(GEO: GSE63874, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63874)

NOTE: genomic coordinates of the DNA G4s are reported per strand for each of the following experimental conditions:

  • [Na+]_[K+]: GSE63874_Na_K*.bed
  • [Na+]_[PDS]: GSE63874_Na_PDS*.bed
  • [Na+]_[K+]_[PDS]: GSE63874_Na_K_PDS*.bed
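Because the pattern GSE63874_Na_K*.bed also matches the Na_K_PDS files, selecting files by condition needs some care. The sketch below assigns a file name to its condition and strand by testing the most specific prefix first. The helper name `describe_bed_file` is hypothetical, and the Na_K file name in the usage lines is assumed by analogy with the PDS file names listed on this page.

```python
# Prefixes ordered from most to least specific, so that "Na_K_PDS"
# files are not mistaken for "Na_K" files.
PREFIXES = [
    ("GSE63874_Na_K_PDS_", "Na_K_PDS"),
    ("GSE63874_Na_PDS_", "Na_PDS"),
    ("GSE63874_Na_K_", "Na_K"),
]


def describe_bed_file(filename):
    """Return (condition, strand) for a GSE63874 BED file name.

    Strand is "+" for "plus" files, "-" for "minus" files, else None.
    Returns (None, None) when no known prefix matches.
    """
    for prefix, condition in PREFIXES:
        if filename.startswith(prefix):
            rest = filename[len(prefix):]
            if rest.startswith("plus"):
                strand = "+"
            elif rest.startswith("minus"):
                strand = "-"
            else:
                strand = None
            return condition, strand
    return None, None


print(describe_bed_file("GSE63874_Na_K_PDS_minus_hits_intersect.bed"))
print(describe_bed_file("GSE63874_Na_PDS_plus_hits_intersect.bed"))
# Assumed file name, following the same naming pattern:
print(describe_bed_file("GSE63874_Na_K_minus_hits_intersect.bed"))
```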

For example, the 716,310 observed quadruplexes (OQs) reported in the paper are those identified in the presence of PDS, counted across the forward and reverse strands. To obtain the list of these distinct quadruplexes, download and decompress the BED files containing PDS in the file name:

GSE63874_Na_K_PDS_minus_hits_intersect.bed
GSE63874_Na_PDS_minus_hits_intersect.bed

GSE63874_Na_K_PDS_plus_hits_intersect.bed
GSE63874_Na_PDS_plus_hits_intersect.bed

Merging the intervals and counting them separately for each strand gives the number of distinct OQs per strand:

cat *PDS*minus*bed | sortBed -i - | mergeBed -i - | wc -l
356379
cat *PDS*plus*bed | sortBed -i - | mergeBed -i - | wc -l
359932

Note that this total is one more than the number reported in the paper (356,379 + 359,932 = 716,311 versus 716,310).
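To make explicit what the pipeline above counts, here is a minimal pure-Python sketch of the same sort, merge and count steps. It is not how bedtools is implemented; the function name `count_merged_intervals` and the toy intervals are illustrative assumptions. It assumes the default mergeBed behaviour of joining overlapping and book-ended intervals on the same chromosome.

```python
from collections import defaultdict


def count_merged_intervals(bed_lines):
    """Count intervals left after merging overlapping or book-ended
    intervals per chromosome (analogous to sortBed | mergeBed | wc -l)."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines and common BED header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        by_chrom[chrom].append((start, end))

    merged = 0
    for intervals in by_chrom.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                # Gap before this interval: it starts a new merged region.
                merged += 1
                current_end = end
            else:
                # Overlapping or book-ended: extend the current region.
                current_end = max(current_end, end)
    return merged


# Toy input (not real data): three merged regions in total.
example = [
    "chr1\t100\t200",
    "chr1\t150\t250",  # overlaps the previous interval
    "chr1\t250\t300",  # book-ended with the merged region
    "chr1\t400\t500",
    "chr2\t100\t200",
]
print(count_merged_intervals(example))  # 3

# Per-strand counts from the commands above, summed across strands.
minus_count, plus_count = 356379, 359932
print(minus_count + plus_count)  # 716311
```

With real data, the lines of the decompressed minus-strand (or plus-strand) PDS files would be passed in together, mirroring the `cat` step.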

2.  DNA G-quadruplexes in multiple organisms

(GEO: GSE110582 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110582):

NOTE: there is a separate sample accession for each individual organism and condition.

Accession     Sample
GSM3003535    Arabidopsis_Li_K
GSM3003536    Arabidopsis_Li_KPDS
GSM3003537    Celegans_Li_K
GSM3003538    Celegans_Li_KPDS
GSM3003539    Homo_Li_K
GSM3003540    Homo_Li_KPDS
GSM3003541    Drosophila_Li_K
GSM3003542    Drosophila_Li_KPDS
GSM3003543    Ecoli_Li_K
GSM3003544    Ecoli_Li_KPDS
GSM3003545    Leishmania_Li_K
GSM3003546    Leishmania_Li_KPDS
GSM3003547    Mouse_Li_K
GSM3003548    Mouse_Li_KPDS
GSM3003549    Plasmodium_Li_K
GSM3003550    Plasmodium_Li_KPDS
GSM3003551    Rhodobacter_Li_K
GSM3003552    Rhodobacter_Li_KPDS
GSM3003553    Saccharomyces_Li_K
GSM3003554    Saccharomyces_Li_KPDS
GSM3003555    Trypanosoma_Li_K
GSM3003556    Trypanosoma_Li_KPDS
GSM3003557    Zebrafish_Li_K
GSM3003558    Zebrafish_Li_KPDS
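The sample names above follow an Organism_Li_Condition pattern, which makes it easy to pair the K and KPDS samples of one organism. The sketch below splits a few of the listed names and builds a GEO link for each accession. The helper names are hypothetical, and it is an assumption that sample (GSM) accessions resolve through the same acc.cgi URL pattern used for the series accessions on this page.

```python
GEO_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}"

# A few accession/sample pairs copied from the listing above.
samples = {
    "GSM3003539": "Homo_Li_K",
    "GSM3003540": "Homo_Li_KPDS",
    "GSM3003547": "Mouse_Li_K",
    "GSM3003548": "Mouse_Li_KPDS",
}


def parse_sample(name):
    """Split 'Organism_Li_Condition' into (organism, condition)."""
    organism, _, condition = name.partition("_Li_")
    return organism, condition


def geo_url(accession):
    """Build a GEO accession URL (assumed to work for GSM as for GSE)."""
    return GEO_URL.format(accession)


for accession, name in samples.items():
    organism, condition = parse_sample(name)
    print(organism, condition, geo_url(accession))
```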

RNA G-quadruplex interacting proteins – iCLIP and related RNA-seq

1.  iCLIP data for rG4-interacting proteins in Flp-In T-REx 293 cells

(GEO: GSE106476, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106476):

Accession     Sample
GSM2838582    mRG1
GSM2838583    mRG2
GSM2838584    mRG3
GSM2838585    WT1
GSM2838586    WT2
GSM2838587    WT3

2.  RNA-seq in Flp-In T-REx 293 cells

(GEO: GSE106476, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106476):

Accession     Sample
GSM2838588    DDX3X_mRG_1
GSM2838589    DDX3X_mRG_2
GSM2838590    DDX3X_mRG_3
GSM2838591    DDX3X_mRG_4
GSM2838592    DDX3X_WT_1
GSM2838593    DDX3X_WT_2
GSM2838594    DDX3X_WT_3
GSM2838595    DDX3X_WT_4
GSM2838596    Negative_1
GSM2838597    Negative_2
GSM2838598    Negative_3

3.  iCLIP for rG4-interacting proteins in HeLa cells

(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):

Accession     Sample
GSM2817677    iCLIP-DHX9-1
GSM2817678    iCLIP-DHX9-2

4.  RPF (ribosome-protected fragment) data in HeLa cells

(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):

Accession     Sample
GSM2817679    RPF-NT-1
GSM2817680    RPF-NT-2
GSM2817681    RPF_sc_2
GSM2817682    RPF_sc_3
GSM2817683    RPF_sc_4
GSM2817684    RPF_DHX36_1
GSM2817685    RPF_DHX36_2
GSM2817686    RPF_DHX36_3
GSM2817687    RPF_DHX9_1
GSM2817688    RPF_DHX9_2
GSM2817689    RPF_DHX9_3

5.  RNA-seq in HeLa cells

(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):

Accession     Sample
GSM2817690    RNA-NT-1
GSM2817691    RNA-NT-2
GSM2817692    RNA_sc_2
GSM2817693    RNA_sc_3
GSM2817694    RNA_sc_4
GSM2817695    RNA_DHX36_1
GSM2817696    RNA_DHX36_2
GSM2817697    RNA_DHX36_3
GSM2817698    RNA_DHX9_1
GSM2817699    RNA_DHX9_2
GSM2817700    RNA_DHX9_3

BG4-ChIP-seq data

1.  BG4-ChIP-seq in K562 cells

(GEO: GSE107690, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE107690):

Accession     Sample
GSM2876090    K562_BG4-ChIP-rep_1a
GSM2876091    K562_BG4-ChIP-rep_1C
GSM2876092    K562_BG4-ChIP-rep_1_input
GSM2876093    K562_BG4-ChIP-rep_2a
GSM2876094    K562_BG4-ChIP-rep_2b
GSM2876095    K562_BG4-ChIP-rep_2c
GSM2876096    K562_BG4-ChIP-rep_2_input

2.  BG4-ChIP-seq in HaCaT cells

(GEO: GSE99205, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99205):

Accession     Sample
GSM2635752    BG4-ChIP-rep1
GSM2635753    BG4-ChIP-rep2
GSM2635754    BG4-ChIP-rep3
GSM2635755    Input

3.  BG4-ChIP-seq in HaCaT cells

(GEO: GSE76688, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76688):

Accession     Sample
GSM2035783    rhh169_ChIP1_702_502_entst_17082015
GSM2035784    rhh170_ChIP10_702_504_entst_26082015
GSM2035785    rhh171_ChIP3_703_503_entst_17082015
GSM2035786    rhh172_ChIP4_704_504_entst_17082015
GSM2035787    rhh173_ChIP8_706_502_entst_26082015
GSM2035788    rhh174_ChIP9_701_503_entst_26082015
GSM2035789    rhh175_ChIPwthacat_704_502_entst_26082015
GSM2035790    rhh177_Input_705_517_entst_26082015

4.  BG4-ChIP-seq in HEK cells

(GEO: GSE76688, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76688):

Accession     Sample
GSM2817690    HEKnp_Lonza_1472015_BG4
GSM2817691    HEKnp_Lonza_1572015_BG4
GSM2817692    merged_14_and_15072015_input_heknplonza