Bioinformatic Data on G-Quadruplexes in Genomes
Links to our data and frequently asked questions can be found in our GitHub pages below:
https://github.com/sblab-bioinformatics/faq
For example:
DNA G-quadruplexes
1. DNA G-quadruplexes in human: raw data, supplementary bed files with genomic coordinates, and genomic tracks are deposited
(GEO: GSE63874 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63874)
NOTE: genomic coordinates of the DNA G4s are reported by strand and for each of the following experimental conditions:
- [Na+]_[K+]: GSE63874_Na_K*.bed
- [Na+]_[PDS]: GSE63874_Na_PDS*.bed
- [K+]_[Na+]_[PDS]: GSE63874_Na_K_PDS*.bed
For example, the 716,310 observed quadruplexes (OQs) are those identified in the presence of PDS on both the forward and reverse strands. To obtain the list of these distinct quadruplexes, download and decompress the bed files containing PDS in the file name:
GSE63874_Na_K_PDS_minus_hits_intersect.bed
GSE63874_Na_PDS_minus_hits_intersect.bed
GSE63874_Na_K_PDS_plus_hits_intersect.bed
GSE63874_Na_PDS_plus_hits_intersect.bed
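One way to assemble download links for these four files is sketched below. The FTP directory layout (series grouped under a `GSE63nnn`-style folder with a `suppl/` subdirectory) and the `.gz` suffix are assumptions about how GEO stores supplementary files, and `geo_suppl_url` is a hypothetical helper; check the actual links on the GEO series page before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the GEO supplementary-file directory URL
# for a series accession.
# Assumption: GEO groups series into folders named by replacing the last
# three characters of the accession with "nnn" (GSE63874 -> GSE63nnn).
geo_suppl_url() {
    local acc="$1"
    local stem="${acc%???}nnn"
    printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/\n' "$stem" "$acc"
}

base="$(geo_suppl_url GSE63874)"

# Print one candidate URL per PDS file listed above.
# Assumption: each file is stored gzip-compressed with a .gz suffix.
for strand in minus plus; do
    for cond in Na_K_PDS Na_PDS; do
        echo "${base}GSE63874_${cond}_${strand}_hits_intersect.bed.gz"
    done
done
```

The printed URLs can then be passed to a downloader such as `curl -O` or `wget`, and the downloaded files decompressed with `gunzip` before merging.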
Merging and counting by strand provides the individual OQs by strand:
cat *PDS*minus*bed | sortBed -i - | mergeBed -i - | wc -l
356379
cat *PDS*plus*bed | sortBed -i - | mergeBed -i - | wc -l
359932
Note that this gives one more element than the number reported in the paper (356,379 + 359,932 = 716,311, versus 716,310).
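To make the merge-and-count step concrete, the sketch below reproduces its logic on a toy example without bedtools: intervals are sorted by chromosome and start, overlapping or book-ended intervals are collapsed, and the remaining lines are counted. The coordinates and the `toy_bed` and `merge_intervals` helpers are made up for illustration; for real data use `sortBed` and `mergeBed` as shown above.

```shell
#!/usr/bin/env bash
# Toy BED records (chrom, start, end); coordinates are invented for illustration.
toy_bed() {
    printf 'chr1\t150\t250\n'
    printf 'chr1\t100\t200\n'   # overlaps the record above
    printf 'chr1\t250\t300\n'   # book-ended with 150-250
    printf 'chr1\t400\t500\n'   # separate interval
    printf 'chr2\t100\t200\n'   # different chromosome
}

# Sort by chromosome, then numerically by start, and collapse any interval
# whose start is <= the end of the interval currently being built.
merge_intervals() {
    sort -k1,1 -k2,2n |
    awk 'BEGIN { OFS = "\t" }
         NR == 1 { chrom = $1; lo = $2 + 0; hi = $3 + 0; next }
         $1 == chrom && $2 + 0 <= hi { if ($3 + 0 > hi) hi = $3 + 0; next }
         { print chrom, lo, hi; chrom = $1; lo = $2 + 0; hi = $3 + 0 }
         END { if (NR > 0) print chrom, lo, hi }'
}

# Count the merged intervals, as `wc -l` does in the commands above.
toy_bed | merge_intervals | wc -l
```

Here the first three chr1 records collapse into a single interval, so the five input records yield three merged intervals.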
2. DNA G-quadruplexes in multiple organisms
(GEO: GSE110582 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110582):
NOTE: there is a separate sample accession for each individual organism.
DNA G-quadruplexes in multiple organisms | |
---|---|
GSM3003535 | Arabidopsis_Li_K |
GSM3003536 | Arabidopsis_Li_KPDS |
GSM3003537 | Celegans_Li_K |
GSM3003538 | Celegans_Li_KPDS |
GSM3003539 | Homo_Li_K |
GSM3003540 | Homo_Li_KPDS |
GSM3003541 | Drosophila_Li_K |
GSM3003542 | Drosophila_Li_KPDS |
GSM3003543 | Ecoli_Li_K |
GSM3003544 | Ecoli_Li_KPDS |
GSM3003545 | Leishmania_Li_K |
GSM3003546 | Leishmania_Li_KPDS |
GSM3003547 | Mouse_Li_K |
GSM3003548 | Mouse_Li_KPDS |
GSM3003549 | Plasmodium_Li_K |
GSM3003550 | Plasmodium_Li_KPDS |
GSM3003551 | Rhodobacter_Li_K |
GSM3003552 | Rhodobacter_Li_KPDS |
GSM3003553 | Saccharomyces_Li_K |
GSM3003554 | Saccharomyces_Li_KPDS |
GSM3003555 | Trypanosoma_Li_K |
GSM3003556 | Trypanosoma_Li_KPDS |
GSM3003557 | Zebrafish_Li_K |
GSM3003558 | Zebrafish_Li_KPDS |
RNA G-quadruplex interacting proteins – iClip and related RNA-seq
1. iClip data for rG4 interacting proteins in Flp-In T-Rex 29 cells
(GEO: GSE106476, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106476):
iClip data for rG4 interacting proteins in Flp-In T-Rex 29 cells | |
---|---|
GSM2838582 | mRG1 |
GSM2838583 | mRG2 |
GSM2838584 | mRG3 |
GSM2838585 | WT1 |
GSM2838586 | WT2 |
GSM2838587 | WT3 |
2. RNA-seq in Flp-In T-Rex 29 cells
(GEO: GSE106476, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106476):
RNA-seq in Flp-In T-Rex 29 cells | |
---|---|
GSM2838588 | DDX3X_mRG_1 |
GSM2838589 | DDX3X_mRG_2 |
GSM2838590 | DDX3X_mRG_3 |
GSM2838591 | DDX3X_mRG_4 |
GSM2838592 | DDX3X_WT_1 |
GSM2838593 | DDX3X_WT_2 |
GSM2838594 | DDX3X_WT_3 |
GSM2838595 | DDX3X_WT_4 |
GSM2838596 | Negative_1 |
GSM2838597 | Negative_2 |
GSM2838598 | Negative_3 |
3. iClip for rG4 interacting proteins in HeLa cells
(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):
iClip for rG4 interacting proteins in HeLa cells | |
---|---|
GSM2817677 | iCLIP-DHX9-1 |
GSM2817678 | iCLIP-DHX9-2 |
4. RPF data in HeLa cells
(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):
RPF data in HeLa cells | |
---|---|
GSM2817679 | RPF-NT-1 |
GSM2817680 | RPF-NT-2 |
GSM2817681 | RPF_sc_2 |
GSM2817682 | RPF_sc_3 |
GSM2817683 | RPF_sc_4 |
GSM2817684 | RPF_DHX36_1 |
GSM2817685 | RPF_DHX36_2 |
GSM2817686 | RPF_DHX36_3 |
GSM2817687 | RPF_DHX9_1 |
GSM2817688 | RPF_DHX9_2 |
GSM2817689 | RPF_DHX9_3 |
5. RNA-seq in HeLa cells
(GEO: GSE105082, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE105082):
RNA-seq in HeLa cells | |
---|---|
GSM2817690 | RNA-NT-1 |
GSM2817691 | RNA-NT-2 |
GSM2817692 | RNA_sc_2 |
GSM2817693 | RNA_sc_3 |
GSM2817694 | RNA_sc_4 |
GSM2817695 | RNA_DHX36_1 |
GSM2817696 | RNA_DHX36_2 |
GSM2817697 | RNA_DHX36_3 |
GSM2817698 | RNA_DHX9_1 |
GSM2817699 | RNA_DHX9_2 |
GSM2817700 | RNA_DHX9_3 |
BG4-ChIP-seq data
1. BG4-ChIP-seq in K562 cells
(GEO: GSE107690, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE107690):
BG4-ChIP-seq in K562 cells | |
---|---|
GSM2876090 | K562_BG4-ChIP-rep_1a |
GSM2876091 | K562_BG4-ChIP-rep_1C |
GSM2876092 | K562_BG4-ChIP-rep_1_input |
GSM2876093 | K562_BG4-ChIP-rep_2a |
GSM2876094 | K562_BG4-ChIP-rep_2b |
GSM2876095 | K562_BG4-ChIP-rep_2c |
GSM2876096 | K562_BG4-ChIP-rep_2_input |
2. BG4-ChIP-seq in HaCaT cells
(GEO: GSE99205, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99205):
BG4-ChIP-seq in HaCaT cells | |
---|---|
GSM2635752 | BG4-ChIP-rep1 |
GSM2635753 | BG4-ChIP-rep2 |
GSM2635754 | BG4-ChIP-rep3 |
GSM2635755 | Input |
3. BG4-ChIP-seq in HaCaT cells
(GEO: GSE76688, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76688):
BG4-ChIP-seq in HaCaT cells | |
---|---|
GSM2035783 | rhh169_ChIP1_702_502_entst_17082015 |
GSM2035784 | rhh170_ChIP10_702_504_entst_26082015 |
GSM2035785 | rhh171_ChIP3_703_503_entst_17082015 |
GSM2035786 | rhh172_ChIP4_704_504_entst_17082015 |
GSM2035787 | rhh173_ChIP8_706_502_entst_26082015 |
GSM2035788 | rhh174_ChIP9_701_503_entst_26082015 |
GSM2035789 | rhh175_ChIPwthacat_704_502_entst_26082015 |
GSM2035790 | rhh177_Input_705_517_entst_26082015 |
4. BG4-ChIP-seq in HEK cells
(GEO: GSE76688, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76688):
BG4-ChIP-seq in HEK cells | |
---|---|
GSM2817690 | HEKnp_Lonza_1472015_BG4 |
GSM2817691 | HEKnp_Lonza_1572015_BG4 |
GSM2817692 | merged_14_and_15072015_input_heknplonza |