Bioinformatic Data on G-Quadruplexes in Genomes

Links to our data and frequently asked questions can be found in our GitHub pages below:

https://github.com/sblab-bioinformatics/faq

For example:

DNA G-quadruplexes

1.  DNA G-quadruplexes in human: raw data, supplementary bed files with genomic coordinates, and genomic tracks are deposited
(GEO: GSE63874 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63874)

NOTE: genomic coordinates of the DNA G4s are reported by strand and for each of the following experimental conditions:

  • [Na+]_[K+]: GSE63874_Na_K*.bed
  • [Na+]_[PDS]: GSE63874_Na_PDS*.bed
  • [K+]_[Na+]_[PDS]: GSE63874_Na_K_PDS*.bed

For example, the 716,310 OQs are those identified in PDS on both the forward and reverse strands. To obtain the list of these distinct quadruplexes, download and decompress the bed files with PDS in the file name:

GSE63874_Na_K_PDS_minus_hits_intersect.bed
GSE63874_Na_PDS_minus_hits_intersect.bed

GSE63874_Na_K_PDS_plus_hits_intersect.bed
GSE63874_Na_PDS_plus_hits_intersect.bed
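
The four files above are selected per strand by the wildcards `*PDS*minus*bed` and `*PDS*plus*bed` used in the merge commands in this section. As a minimal sketch, assuming Python's `fnmatch` rules are a fair stand-in for shell globbing on these simple patterns, the grouping can be checked as follows. The variable names are illustrative only.

```python
from fnmatch import fnmatchcase

# The four PDS bed files listed in this section.
files = [
    "GSE63874_Na_K_PDS_minus_hits_intersect.bed",
    "GSE63874_Na_PDS_minus_hits_intersect.bed",
    "GSE63874_Na_K_PDS_plus_hits_intersect.bed",
    "GSE63874_Na_PDS_plus_hits_intersect.bed",
]

# Wildcards taken from the merge commands in this section.
patterns = {"minus": "*PDS*minus*bed", "plus": "*PDS*plus*bed"}

# Group file names by the strand pattern they match.
by_strand = {
    strand: [name for name in files if fnmatchcase(name, pattern)]
    for strand, pattern in patterns.items()
}

for strand, names in by_strand.items():
    print(strand, names)

# The wildcard listed for the [Na+]_[K+] condition also matches the
# Na_K_PDS files, so it is worth checking what a glob actually selects.
na_k_matches = [name for name in files if fnmatchcase(name, "GSE63874_Na_K*.bed")]
print(na_k_matches)
```

Each strand pattern picks up two files (one per PDS condition), which is why the merge step combines both PDS conditions for a given strand.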

Merging and counting by strand provides the individual OQs by strand:

cat *PDS*minus*bed | sortBed -i - | mergeBed -i - | wc -l
356379
cat *PDS*plus*bed | sortBed -i - | mergeBed -i - | wc -l
359932

Note that the total is one element more than the number reported in the paper (356379 + 359932 = 716311).
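
To make the sort, merge and count step concrete, here is a minimal pure-Python sketch of the same idea. It is not the bedtools implementation: it assumes that intervals which overlap or touch end-to-start on the same chromosome are fused into one, which is the default behaviour of mergeBed as we understand it. The function names and the toy coordinates are made up for illustration.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end), BED-style half-open coordinates.
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns, skipping blank and header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort by chromosome and start, then fuse overlapping or touching intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy records standing in for the two PDS files of one strand (made-up coordinates).
na_pds = ["chr1\t100\t150", "chr1\t400\t450"]
na_k_pds = ["chr1\t140\t200", "chr1\t450\t480", "chr2\t10\t60"]

merged = merge_intervals(parse_bed(na_pds + na_k_pds))
print(len(merged))  # prints 3
```

Concatenating the two lists plays the role of `cat`, `sorted` plays the role of sortBed, and the final `len` corresponds to `wc -l`. Because the input files are already split by strand, the sketch does not track strand itself.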

2.  DNA G-quadruplexes in multiple organisms
(GEO: GSE110582 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110582):

NOTE: there is a separate sample accession for each individual organism.

Arabidopsis_Li_K
Arabidopsis_Li_KPDS
Celegans_Li_K
Celegans_Li_KPDS
Homo_Li_K
Homo_Li_KPDS
Drosophila_Li_K
Drosophila_Li_KPDS
Ecoli_Li_K
Ecoli_Li_KPDS
Leishmania_Li_K
Leishmania_Li_KPDS
Mouse_Li_K
Mouse_Li_KPDS
Plasmodium_Li_K
Plasmodium_Li_KPDS
Rhodobacter_Li_K
Rhodobacter_Li_KPDS
Saccharomyces_Li_K
Saccharomyces_Li_KPDS
Trypanosoma_Li_K
Trypanosoma_Li_KPDS
Zebrafish_Li_K
Zebrafish_Li_KPDS

RNA G-quadruplex interacting proteins – iCLIP and related RNA-seq

1.  iCLIP data for rG4 interacting proteins in Flp-In T-REx 293 cells
(GEO: GSE106476, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106476):

2.  RNA-seq in Flp-In T-REx 293 cells
(GEO: GSE106476, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106476):

DDX3X_mRG_1
DDX3X_mRG_2
DDX3X_mRG_3
DDX3X_mRG_4
DDX3X_WT_1
DDX3X_WT_2
DDX3X_WT_3
DDX3X_WT_4
Negative_1
Negative_2
Negative_3

3.  iCLIP for rG4 interacting proteins in HeLa cells
(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):

iCLIP-DHX9-1
iCLIP-DHX9-2

4.  RPF data in HeLa cells
(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):

RPF-NT-1
RPF-NT-2
RPF_sc_2
RPF_sc_3
RPF_sc_4
RPF_DHX36_1
RPF_DHX36_2
RPF_DHX36_3
RPF_DHX9_1
RPF_DHX9_2
RPF_DHX9_3

5.  RNA-seq in HeLa cells
(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):

RNA-NT-1
RNA-NT-2
RNA_sc_2
RNA_sc_3
RNA_sc_4
RNA_DHX36_1
RNA_DHX36_2
RNA_DHX36_3
RNA_DHX9_1
RNA_DHX9_2
RNA_DHX9_3

BG4-ChIP-seq data

1.  BG4-ChIP-seq in K562 cells
(GEO: GSE107690, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE107690):

K562_BG4-ChIP-rep_1a
K562_BG4-ChIP-rep_1C
K562_BG4-ChIP-rep_1_input
K562_BG4-ChIP-rep_2a
K562_BG4-ChIP-rep_2b
K562_BG4-ChIP-rep_2c
K562_BG4-ChIP-rep_2_input

2.  BG4-ChIP-seq in HaCaT cells
(GEO: GSE99205, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99205):

BG4-ChIP-rep1
BG4-ChIP-rep2
BG4-ChIP-rep3
Input

3.  BG4-ChIP-seq in HaCaT cells
(GEO: GSE76688, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76688):

rhh169_ChIP1_702_502_entst_17082015
rhh170_ChIP10_702_504_entst_26082015
rhh171_ChIP3_703_503_entst_17082015
rhh172_ChIP4_704_504_entst_17082015
rhh173_ChIP8_706_502_entst_26082015
rhh174_ChIP9_701_503_entst_26082015
rhh175_ChIPwthacat_704_502_entst_26082015
rhh177_Input_705_517_entst_26082015

4.  BG4-ChIP-seq in HEK cells
(GEO: GSE76688, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76688):

HEKnp_Lonza_1472015_BG4
HEKnp_Lonza_1572015_BG4
merged_14_and_15072015_input_heknplonza