About
At GCB Hub, we enhance the understanding of biomarkers and their causal connections with complex traits and diseases across global populations.
Science
Large-scale imputation models for multi-ancestry proteome-wide association analysis.
Proteome-wide association studies (PWAS) decode the intricate proteomic landscape of biological mechanisms for complex diseases. Traditional PWAS model training relies heavily on individual-level reference proteomes, thereby restricting its capacity to harness the emerging summary-level protein quantitative trait loci (pQTL) data in the public domain. Here we introduced a novel framework to train PWAS models directly from pQTL summary statistics. By leveraging extensive pQTL data from the UK Biobank, deCODE, and ARIC studies, we applied our approach to train large-scale European PWAS models (total n = 88,838 subjects). Furthermore, we developed PWAS models tailored for Asian and African ancestries by integrating multi-ancestry summary and individual-level data resources (total n = 914 for Asian and 3,042 for African ancestries). We validated the performance of our PWAS models through a systematic multi-ancestry analysis of over 700 phenotypes across five major genetic data resources. Our results bridge the gap between genomics and proteomics for drug discovery, highlighting novel protein-phenotype links and their transferability across diverse ancestries.
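Summary statistics can be enough to train a linear prediction model. Take standardized genotypes $X$ ($n \times p$) and a standardized protein level $y$. The least-squares loss then depends on the data only through the LD matrix $R$ and the marginal pQTL correlations $r$, which can be recovered from the pQTL z-scores $z$. This is the general principle behind summary-statistics-based training methods such as lassosum. It is offered as background, not as the exact BLISS objective. The penalty $P_\lambda$ (for example lasso or elastic net) and the use of an ancestry-matched reference panel to estimate $R$ are assumptions of this sketch.

```latex
\begin{aligned}
\frac{1}{n}\lVert y - Xw\rVert_2^2
  &= \frac{y^\top y}{n} - 2\,w^\top \frac{X^\top y}{n} + w^\top \frac{X^\top X}{n}\,w
   = 1 - 2\,w^\top r + w^\top R\,w, \\
r &= \frac{X^\top y}{n} \approx \frac{z}{\sqrt{n}}, \qquad R = \frac{X^\top X}{n}, \\
\hat{w} &= \arg\min_{w}\,\bigl( w^\top R\,w - 2\,w^\top r + P_\lambda(w) \bigr).
\end{aligned}
```

Only $r$ and $R$ enter the objective. The weights can therefore be fitted from published pQTL summary statistics plus an LD reference panel, without access to individual-level proteomes.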

Note:
The BLISS code and the trained models can be downloaded from our data repository. For the latest version of the BLISS code, please refer to our GitHub page.
For each GWAS data resource, we applied the ancestry-matched PWAS models and analyzed only proteins with cis-heritability above 0.01. The eight available models are:
| Name | Platform | Method | Ancestry | Training sample size | # proteins |
|---|---|---|---|---|---|
| UKB | OLink | BLISS | EUR | 46,066 | 1,412 |
| ARIC | SomaScan | BLISS | EUR | 7,213 | 4,423 |
| deCODE | SomaScan | BLISS | EUR | 35,559 | 4,428 |
| UKB_AFR_std | OLink | Standard PWAS | AFR | 1,171 | 1,412 |
| UKB_AFR_super | OLink | BLISS (Super Learner) | AFR | 1,171 | 1,412 |
| ARIC_AA | SomaScan | BLISS | African American | 1,871 | 4,415 |
| UKB_ASN_std | OLink | Standard PWAS | Asian | 914 | 1,412 |
| UKB_ASN_super | OLink | BLISS (Super Learner) | Asian | 914 | 1,412 |

We used SNP names (rsids) to link the different datasets, and all positions are on GRCh37.
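The application rules above can be sketched in Python: ancestry matching, the 0.01 cis-heritability cutoff, and rsid-based linking. The sketch scores each protein with the standard FUSION/S-PrediXcan-style weighted z-score, z = w'z_GWAS / sqrt(w'Rw). Whether BLISS models are applied exactly this way is an assumption.

The names `ProteinModel`, `matched_models`, `ld_corr` and `pwas_z` are hypothetical. The dictionaries stand in for the real weight and LD files. GWAS and weight alleles are assumed to be harmonized already.

```python
import math
from dataclasses import dataclass

# Training ancestry of each model, as listed in the table above.
MODEL_ANCESTRY = {
    "UKB": "EUR", "ARIC": "EUR", "deCODE": "EUR",
    "UKB_AFR_std": "AFR", "UKB_AFR_super": "AFR",
    "ARIC_AA": "African American",
    "UKB_ASN_std": "Asian", "UKB_ASN_super": "Asian",
}
H2_THRESHOLD = 0.01  # proteins with cis-heritability <= 0.01 are not analyzed


@dataclass
class ProteinModel:
    """Hypothetical in-memory form of one protein's weights (not the real file format)."""
    protein: str
    cis_h2: float
    weights: dict  # rsid -> SNP weight


def matched_models(gwas_ancestry):
    """Names of models whose training ancestry equals the GWAS ancestry."""
    return [name for name, anc in MODEL_ANCESTRY.items() if anc == gwas_ancestry]


def ld_corr(ld, a, b):
    """Symmetric LD lookup; 1 on the diagonal, 0 for pairs missing from `ld` (an assumption)."""
    if a == b:
        return 1.0
    return ld.get((a, b), ld.get((b, a), 0.0))


def pwas_z(model, gwas_z, ld):
    """Weighted z-score w'z / sqrt(w'Rw) over SNPs shared by rsid, or None if skipped.

    gwas_z: rsid -> GWAS z-score, alleles assumed harmonized with the weights.
    ld:     (rsid_a, rsid_b) -> correlation from an ancestry-matched reference panel.
    """
    if model.cis_h2 <= H2_THRESHOLD:
        return None
    snps = [s for s in model.weights if s in gwas_z]  # link datasets by rsid
    if not snps:
        return None
    w = [model.weights[s] for s in snps]
    num = sum(wi * gwas_z[s] for wi, s in zip(w, snps))
    var = sum(w[i] * w[j] * ld_corr(ld, snps[i], snps[j])
              for i in range(len(snps)) for j in range(len(snps)))
    return num / math.sqrt(var) if var > 0 else None


model = ProteinModel("PROT_A", cis_h2=0.05, weights={"rs1": 0.3, "rs2": -0.1, "rs3": 0.2})
gwas = {"rs1": 2.0, "rs2": -1.0}  # rs3 is absent from the GWAS and is dropped
ld = {("rs1", "rs2"): 0.5}
print(matched_models("Asian"))  # ['UKB_ASN_std', 'UKB_ASN_super']
print(pwas_z(model, gwas, ld))  # 0.7 / sqrt(0.07), about 2.646
```

Treating LD pairs missing from the reference as uncorrelated keeps the sketch short. A real pipeline would more likely restrict to SNPs present in the LD panel.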
An atlas of genetic effects on the monocyte methylome across European and African populations.
Comprehensive Multi-Ancestry Methylome-Wide Association Study (MWAS)
This is a comprehensive multi-ancestry methylome-wide association study (MWAS) conducted on purified monocytes from European American (EA) and African American (AA) populations.
Key Features:
- Whole-genome bisulfite sequencing (WGBS) data from 298 EA and 160 AA individuals
- Analysis of over 25 million methylation sites
- Identification of cis- and trans-methylation quantitative trait loci (meQTLs)
- Development of population-specific DNA methylation imputation models
- MWAS of 41 complex traits using Million Veteran Program (MVP) data
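The cis/trans distinction for meQTLs is usually made by the physical distance between the SNP and the CpG site. Below is a minimal sketch of that classification. It assumes a ±1 Mb window on the same chromosome, a common convention; the window used in this study is not stated here. `Locus`, `classify_meqtl` and `CIS_WINDOW_BP` are hypothetical names, and SNP and CpG positions are assumed to be on the same genome build.

```python
from typing import NamedTuple

CIS_WINDOW_BP = 1_000_000  # assumed cis window; the study's actual window may differ


class Locus(NamedTuple):
    chrom: str
    pos: int  # base-pair position, same genome build for SNPs and CpGs


def classify_meqtl(snp: Locus, cpg: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Label a SNP-CpG pair 'cis' (same chromosome, within window) or 'trans'."""
    if snp.chrom == cpg.chrom and abs(snp.pos - cpg.pos) <= window:
        return "cis"
    return "trans"


print(classify_meqtl(Locus("1", 1_500_000), Locus("1", 2_200_000)))  # cis
print(classify_meqtl(Locus("1", 1_500_000), Locus("1", 3_000_000)))  # trans
print(classify_meqtl(Locus("1", 1_500_000), Locus("2", 1_500_000)))  # trans
```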
Our study provides:
- CpG-trait associations: Direct links between specific methylation sites and complex traits
- Gene-trait associations: Aggregated effects of methylation on genes associated with various phenotypes
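The description does not say how CpG-level results are aggregated into gene-level associations. One widely used way to combine correlated p-values, such as those of the CpGs mapped to one gene, is the Cauchy combination test (ACAT). The sketch below illustrates that technique only; it is not presented as the method behind this resource. `acat` and `_cauchy_quantile` are hypothetical names, and weights are assumed to be nonnegative.

```python
import math


def _cauchy_quantile(p):
    """Map a p-value to the upper-tail standard Cauchy quantile tan((0.5 - p) * pi)."""
    # For tiny p the tangent loses precision; use its limit 1 / (p * pi).
    if p < 1e-16:
        return 1.0 / (p * math.pi)
    return math.tan((0.5 - p) * math.pi)


def acat(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test (equal weights by default)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and p-values differ in length")
    total = sum(weights)
    # Weighted average of Cauchy variates; its tail stays approximately
    # standard Cauchy under the null, even when the p-values are correlated.
    t = sum(w * _cauchy_quantile(p) for w, p in zip(weights, pvalues)) / total
    # Upper-tail Cauchy probability, with a stable approximation for huge t.
    if t > 1e15:
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi


gene_p = acat([0.001, 0.5, 0.9])  # hypothetical p-values for three CpGs in one gene
print(gene_p)
```

ACAT is driven mainly by the smallest p-values, so a gene with one strongly associated CpG keeps a small combined p-value even if its other CpGs are null.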
This resource bridges the gap between genomics and the monocyte methylome, offering insights into:
- Genetic regulation of DNA methylation
- Novel methylation-phenotype associations
- Transferability of findings across diverse ancestries
The data presented here are valuable for researchers investigating the epigenetic basis of complex diseases, particularly those mediated by the immune system.

Team
GCB Hub aims to assemble a passionate team dedicated to open science, pulling together skills from a wide range of disciplines, including statistics, biostatistics, causal inference, epidemiology, drug development, clinical medicine, human genomics, as well as web and software development.
Leadership Team:
Chong Wu (MD Anderson) and Bingxin Zhao (UPenn)
Current and past members:
Zichen Zhang (MD Anderson), Xiaochen Yang (Purdue), Wanheng Zhang (MD Anderson)
Contact and license
We are dedicated to identifying causal biomarkers for complex traits and diseases across different domains, including but not limited to proteins, genes (expression and splicing), and CpG sites. The resulting resources can address many relevant scientific questions and help researchers pinpoint the most promising targets for follow-up functional analysis, drug development, and drug repurposing.
If you have QTL datasets or would like to deposit your summary statistics in GCB Hub, please reach out to Chong Wu and Bingxin Zhao.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.