Data preparation
JASS requires GWAS summary statistics to be harmonized and formatted (see the JASS input data section below).
We advise users to follow the methods described in the first section of this page to harmonize their data. The second section describes the input data required by JASS.
The third section describes an imputation tool compatible with the JASS input format (an optional preparation step). Finally, we provide a command line example that assembles the input data into the JASS inittable (the database of curated summary statistics used to perform the multi-trait GWAS).
How to generate input data for JASS
Option 1 Nextflow pipeline:
The preprocessing steps for JASS (data harmonisation and imputation) have been gathered into a Nextflow pipeline: the JASS pipeline Suite. While this option has heavier installation requirements, it ensures reproducibility by relying on Docker containers (fixed versions of JASS and its accompanying packages). It is also much more efficient if you have a large number of heterogeneous datasets to handle and a computing cluster available.
Option 2 Prepare input data using the JASS pre-processing Python package:
To standardize the format of the input GWAS datasets, you can use the JASS Pre-processing package. The JASS Pre-processing documentation details the use of this tool.
JASS input data
The data from which JASS computes multi-trait GWAS are stored in an HDF5 file. This file is created with the create-inittable procedure, which needs the following input files:
GWAS description
This file must be tab-separated and contain the following columns:
| Consortium | Outcome | FullName | Type | Nsample | Ncase | Ncontrol | Reference | ReferenceLink | dataLink | internalDataLink |
|---|---|---|---|---|---|---|---|---|---|---|
| GIANT | HIP | Hip Circumference | Anthropometry | 142762 | | | Shungin et al. 2015 | url to reference | url to data | local path to data |
The Consortium and Outcome names must correspond to the names of the summary statistic files and of the covariance columns, as described in the following sections. Nsample, Ncase and Ncontrol can be left blank. The last four columns can also be left blank if the user does not want to run JASS on a server.
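As an illustration, the description file can be generated and checked with the Python standard library. This is a minimal sketch rather than part of JASS: the function names `write_description` and `missing_columns` are hypothetical, and the column list is simply copied from the table above.

```python
import csv

# Column names as listed in the table above.
DESCRIPTION_COLUMNS = [
    "Consortium", "Outcome", "FullName", "Type", "Nsample", "Ncase",
    "Ncontrol", "Reference", "ReferenceLink", "dataLink", "internalDataLink",
]


def write_description(path, studies):
    """Write study records (dicts) as a tab-separated description file.

    Fields missing from a record (e.g. Ncase, Ncontrol) become empty cells.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=DESCRIPTION_COLUMNS,
            delimiter="\t",
            restval="",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(studies)


def missing_columns(path):
    """Return the expected columns that are absent from the file header."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return [name for name in DESCRIPTION_COLUMNS if name not in header]
```

Calling `missing_columns` on an existing description file returns an empty list when every expected column is present.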
GWAS results files
The GWAS results must be provided as tab-separated tabular files, one per chromosome, all in the same folder and all with the following header and columns:
| rsID | pos | A0 | A1 | Z |
|---|---|---|---|---|
| rs6548219 | 30762 | A | G | -1.133 |
A0 is the effect allele. File names MUST follow this pattern: "z_{CONSORTIUM}_{TRAIT}_chr{chromosome number}.txt". The consortium and trait names must be capitalized and must NOT contain an underscore (_).
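The naming rule above can be checked before building the inittable. The following is a minimal sketch, not part of JASS: the function name `parse_z_file_name` is hypothetical, "capitalized" is read here as upper case (as in the examples on this page), and the chromosome is assumed to be written as a number.

```python
import re

# z_{CONSORTIUM}_{TRAIT}_chr{chromosome number}.txt
# Neither the consortium nor the trait may contain "_".
Z_FILE_PATTERN = re.compile(r"^z_([^_]+)_([^_]+)_chr(\d+)\.txt$")


def parse_z_file_name(file_name):
    """Return (consortium, trait, chromosome) or None if the name is invalid."""
    match = Z_FILE_PATTERN.match(file_name)
    if match is None:
        return None
    consortium, trait, chromosome = match.groups()
    # Assumption: "capitalized" means upper case, as in GIANT or HIP.
    if consortium != consortium.upper() or trait != trait.upper():
        return None
    return consortium, trait, int(chromosome)
```

For example, `parse_z_file_name("z_GIANT_HIP_chr1.txt")` yields the consortium, trait and chromosome, while a name whose trait contains an underscore is rejected.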
Covariance file (OPTIONAL)
The covariance file gives the covariance between traits under H0. It is a tab-separated tabular file.
We recommend computing this covariance file with LD score regression. However, this step can be tedious; if the user does not provide the file, a matrix is inferred from low-signal Z-scores.
The trait names (column and row names of the matrix) must correspond to the summary statistic file names: z_{CONSORTIUM}_{TRAIT}. The example subset below illustrates this format:
PHE C4D_CHD CARDIOGRAM_CHD DIAGRAM_T2D GABRIEL_ASTHMA GEFOS_BMD-FOREARM GEFOS_BMD-NECK
C4D_CHD 1.0593 0.0351 0.0548 0.085 -0.0061
CARDIOGRAM_CHD 0.0351 1.0256 0.0631 0.025 -0.0002
DIAGRAM_T2D 0.0548 0.0631 1.0136 0.0382 0.0048
GABRIEL_ASTHMA 0.085 0.025 0.0382 1.0134 -0.0104
GEFOS_BMD-FOREARM -0.0061 -0.0002 0.0048 -0.0104 1.0123
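A covariance file prepared by hand can be sanity-checked before use. The sketch below is illustrative only and not part of JASS: the function names are hypothetical, it assumes the first header cell labels the row-name column (PHE in the example above), and it assumes rows and columns are listed in the same order. It checks that the matrix is square and symmetric, as any covariance matrix must be.

```python
import csv


def read_covariance(handle):
    """Read a tab-separated covariance matrix from an open text file.

    Returns (column_names, row_names, matrix as a list of float rows).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # first cell labels the row-name column
    rows, matrix = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows.append(record[0])
        matrix.append([float(value) for value in record[1:]])
    return columns, rows, matrix


def covariance_problems(columns, rows, matrix, tolerance=1e-8):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    if rows != columns:
        problems.append("row names and column names differ")
    size = len(columns)
    if len(matrix) != size or any(len(line) != size for line in matrix):
        problems.append("matrix is not square")
        return problems  # the symmetry check needs a square matrix
    for i in range(size):
        for j in range(i):
            if abs(matrix[i][j] - matrix[j][i]) > tolerance:
                problems.append(f"asymmetric entry: {rows[i]} / {columns[j]}")
    return problems
```

A file can be checked with `covariance_problems(*read_covariance(open(path, newline="")))`, where `path` points to the covariance file.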
Region file
A file of approximately independent LD regions in BED format. For European ancestry and GRCh37/hg19, we suggest using the regions defined by [BP15], which are already available in the data folder of the package.
For GRCh38, we computed these regions for the five superpopulations available in 1000G using bigsnpr [Pri21]. The corresponding files are stored at <https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/tree/pipeline_ancestry/input_files>.
| chr | start | stop |
|---|---|---|
| chr1 | 10583 | 1892607 |
To infer approximately independent LD regions from your own panel, we recommend using https://privefl.github.io/bigsnpr/ (see [Pri21]).
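When regions are inferred from a custom panel, it can help to verify that the resulting file is well formed. The sketch below is an assumption-laden illustration, not a JASS function: the names `read_regions` and `overlapping_regions` are hypothetical, a header line such as "chr start stop" is skipped if present, and intervals are treated as half-open, so a region may start exactly where the previous one stops.

```python
def read_regions(lines):
    """Parse whitespace-separated chr/start/stop lines into tuples."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skips blank lines and a header such as "chr start stop"
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlapping_regions(regions):
    """Return (chr, first, second) for neighbouring regions that overlap."""
    by_chromosome = {}
    for chromosome, start, stop in regions:
        by_chromosome.setdefault(chromosome, []).append((start, stop))
    overlaps = []
    for chromosome, intervals in by_chromosome.items():
        intervals.sort()
        for first, second in zip(intervals, intervals[1:]):
            if second[0] < first[1]:  # next start lies before previous stop
                overlaps.append((chromosome, first, second))
    return overlaps
```

An empty list from `overlapping_regions(read_regions(open(path)))` indicates that no two regions on the same chromosome overlap.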
Data imputation (optional)
Creation of the JASS inittable
Once the GWAS summary statistics are harmonized, they are integrated into a single file with the jass command line (see the command line usage for details). Note that the GWAS result files must be specified with a file name pattern (following the rules used by the Unix shell) that matches the files to include.
jass create-inittable --input-data-path "harmonized_GWAS_files/*.txt" --init-covariance-path $path1/Covariance_matrix_H0.csv --regions-map-path $path2/Region_file.bed --description-file-path $path3/Data_summary.csv --init-table-path $path4/init_table_EUR_not_imputed.hdf5
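Before running the command, it can be useful to preview which files a pattern matches. The sketch below uses Python's standard glob module, which applies shell-style wildcard rules; the helper name `preview_input_files` is hypothetical, and whether JASS expands the pattern in exactly the same way is an assumption.

```python
import glob
import os


def preview_input_files(pattern):
    """Return the sorted names of the files matched by a shell-style pattern."""
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

For instance, `preview_input_files("harmonized_GWAS_files/*.txt")` lists the files that the pattern used in the command above would select in the current directory.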