jass_preprocessing package

Submodules

jass_preprocessing.compute_score module

jass_preprocessing.compute_score.compute_sample_size(mgwas, diagnostic_folder, trait, perSS=0.7)[source]

jass_preprocessing.compute_score.compute_z_score(mgwas)[source]

Compute the Z-score value and sign, and add the corresponding columns to the mgwas dataframe.

The smallest positive normalized float is sys.float_info.min = 2.2250738585072014e-308.

The largest Z-score that can be derived from a p-value is therefore np.sqrt(ss.chi2.isf(sys.float_info.min, 1)) = 37.537836095576054.
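The bound above can be reproduced with a short sketch. It assumes two-sided p-values and uses the standard library's `NormalDist` in place of the SciPy expression quoted above; the helper name `z_from_p` and its `effect_sign` argument are hypothetical and not part of the package.

```python
import sys
from statistics import NormalDist


def z_from_p(p_value, effect_sign=1):
    """Convert a two-sided p-value to a signed Z-score (illustrative helper)."""
    # Clamp to the smallest positive normalized float so that p == 0
    # does not produce an infinite Z-score.
    p = max(p_value, sys.float_info.min)
    # For a two-sided test, |Z| is the upper p/2 quantile of the standard normal.
    z_abs = -NormalDist().inv_cdf(p / 2.0)
    return effect_sign * z_abs


# A p-value of 0 is clamped, giving the largest attainable Z-score (about 37.54).
max_z = z_from_p(0.0)
```

Clamping before the conversion keeps extreme associations finite, at the cost of making all p-values below the float limit indistinguishable.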

jass_preprocessing.dna_utils module

A few functions to compute the DNA complement

jass_preprocessing.dna_utils.dna_complement(input)[source]

jass_preprocessing.dna_utils.dna_complement_base(inputbase)[source]
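As a rough illustration of what a DNA complement computation looks like, here is a minimal sketch. It assumes a per-base complement without reversing the sequence; whether the package functions reverse the input is not stated here. The names `complement_base` and `complement` are hypothetical.

```python
# Translation table for the four bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_base(base):
    """Return the complement of a single base; unknown characters pass through."""
    return base.translate(_COMPLEMENT)


def complement(sequence):
    """Return the base-by-base complement of a sequence (no reversal)."""
    return sequence.translate(_COMPLEMENT)


example = complement("ATGC")
```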

jass_preprocessing.map_gwas module

Map GWAS

A set of functions to find GWAS files in subfolders and to map their columns

jass_preprocessing.map_gwas.convert_missing_values(df)[source]

Convert all missing value strings to a standard np.nan value

Parameters: GWAS_table (pandas dataframe) – GWAS data as a dataframe
Returns: a pandas dataframe with all missing values set to np.nan
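The idea can be sketched with `DataFrame.replace`. The list of missing-value tokens below is an assumption for illustration; the set of strings the package actually recognizes is not documented here, and `to_nan` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Assumed tokens that stand for "missing" in raw GWAS files.
MISSING_TOKENS = ["", "NA", "N/A", "NaN", "nan", ".", "-"]


def to_nan(df):
    """Return a copy of df where every missing-value token becomes np.nan."""
    return df.replace(MISSING_TOKENS, np.nan)


raw = pd.DataFrame({"snp": ["rs1", "rs2", "rs3"], "beta": ["0.12", "NA", "."]})
clean = to_nan(raw)
```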

jass_preprocessing.map_gwas.gwas_internal_link(GWAS_table, GWAS_path)[source]

Walk the GWAS path to find the GWAS tables

Parameters:

GWAS_table (str) – path of the folder to explore
findfile (str) – name of the file to find

Returns:

a pandas dataframe with one column for the filename and one column containing the complete path to the file

jass_preprocessing.map_gwas.map_columns_position(gwas_internal_link, column_dict)[source]

Find the column positions for each specific GWAS

Parameters:

gwas_internal_link (str) – filename of the GWAS data (with path)
GWAS_labels (pd.DataFrame) – corresponding row of the information file

Returns:

a pandas Series with the column positions as values and the column names as index

jass_preprocessing.map_gwas.read_gwas(gwas_internal_link, column_map, imputation_treshold=None)[source]

Read the raw GWAS data, fetch the columns using the positions stored in column_map, and rename them according to column_map.index

Parameters:

gwas_internal_link (str) – GWAS data as a dataframe
column_map (pandas Series) – Series containing the position of each column in the raw data

Returns:

a pandas dataframe with all missing values set to np.nan
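The two steps described above (locating columns by position, then reading and renaming them) might look like the following sketch. The file content, the column names, and the layout of `column_dict` (standardized name mapped to the name used in the file) are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for a raw GWAS file (tab-separated); content is invented.
raw_text = (
    "MarkerName\tAllele1\tAllele2\tEffect\tP.value\n"
    "rs1\tA\tG\t0.1\t0.5\n"
    "rs2\tC\tT\t-0.2\tNA\n"
)

# Hypothetical mapping: standardized name -> column name in this file.
column_dict = {"snpid": "MarkerName", "a1": "Allele1", "a2": "Allele2", "pval": "P.value"}

# Step 1: positions of the wanted columns, indexed by standardized name.
header = raw_text.splitlines()[0].split("\t")
column_map = pd.Series({std: header.index(name) for std, name in column_dict.items()})

# Step 2: read only those positions, then rename to the standardized names.
gwas = pd.read_csv(
    io.StringIO(raw_text),
    sep="\t",
    usecols=[int(pos) for pos in column_map],
    na_values=["NA"],
)
gwas = gwas.rename(columns={header[pos]: std for std, pos in column_map.items()})
```

Selecting by position rather than by name means the reader does not depend on each study using the same header spelling.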

jass_preprocessing.map_gwas.walkfs(startdir, findfile)[source]

Go through the folder and its subfolders to find the specified file

Parameters:

startdir (str) – path of the folder to explore
findfile (str) – name of the file to find
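A recursive search of this kind is commonly written with `os.walk`. The sketch below is an assumption about the approach, not the package's code; in particular, `find_files` is a hypothetical name and it yields every match, whereas the documented function may behave differently.

```python
import os
import tempfile


def find_files(startdir, findfile):
    """Yield the full path of every file named findfile under startdir."""
    for dirpath, _dirnames, filenames in os.walk(startdir):
        if findfile in filenames:
            yield os.path.join(dirpath, findfile)


# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "study_a", "raw")
    os.makedirs(nested)
    open(os.path.join(nested, "gwas.txt"), "w").close()
    found = list(find_files(root, "gwas.txt"))
```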

jass_preprocessing.map_reference module

Module of functions

jass_preprocessing.map_reference.compute_is_aligned(mgwas)[source]

Check if the reference panel and the GWAS data have the same reference allele. Return a boolean vector. The function should be the complement of “is_flipped”, but both functions are still computed so that unusual cases (more than two alleles, for instance) can be detected.

jass_preprocessing.map_reference.compute_is_flipped(mgwas)[source]

Check if the reference panel and the GWAS data have the same reference allele. Return a boolean vector.

Parameters: mgwas (pandas dataframe) – GWAS study dataframe merged with the reference_panel
Returns: the merged studies
Return type: is_flipped (pandas dataframe)

jass_preprocessing.map_reference.compute_snp_alignement(mgwas)[source]

Add a column to mgwas indicating whether the reference and coded alleles are flipped compared to the reference panel. If they are, the sign of the statistic must be flipped.

Parameters: mgwas (pandas dataframe) – GWAS data merged with the reference panel
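The alignment logic described above can be sketched as follows. The column names (`ref`, `alt`, `a1`, `a2`, `z`, `sign_flip`) are hypothetical, and the sketch compares alleles directly without considering strand complements, which the package may handle differently.

```python
import pandas as pd

mgwas = pd.DataFrame({
    "ref": ["A", "C", "G"], "alt": ["G", "T", "A"],  # reference panel alleles
    "a1": ["A", "T", "C"], "a2": ["G", "C", "T"],    # GWAS alleles
    "z": [1.5, -2.0, 0.7],
})

# Same allele order as the panel.
is_aligned = (mgwas["a1"] == mgwas["ref"]) & (mgwas["a2"] == mgwas["alt"])
# Same two alleles, opposite order.
is_flipped = (mgwas["a1"] == mgwas["alt"]) & (mgwas["a2"] == mgwas["ref"])

# 1 if aligned, -1 if flipped, 0 if neither (alleles do not match the panel).
mgwas["sign_flip"] = is_aligned.astype(int) - is_flipped.astype(int)

# Flip the statistic where needed; rows matching neither pattern become NaN.
mgwas["z_aligned"] = mgwas["z"] * mgwas["sign_flip"].where(mgwas["sign_flip"] != 0)
```

Computing both masks independently, rather than taking one as the negation of the other, is what exposes the third row, which is neither aligned nor flipped.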

jass_preprocessing.map_reference.map_on_ref_panel(gw_df, ref_panel, index_type='rs-number')[source]

Merge the GWAS dataframe with the reference panel. Make sure that the same SNPs are in the reference panel and the GWAS.

Parameters:

gw_df (pandas dataframe) – GWAS study dataframe
ref_panel (pandas dataframe) – reference panel dataframe

Returns:

the merged studies

Return type:

merge_GWAS (pandas dataframe)
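Keeping only the SNPs present in both tables corresponds to an inner join. The sketch below assumes both dataframes are indexed by rs-number; the column names and values are invented for illustration and the actual merge performed by the package may differ.

```python
import pandas as pd

ref_panel = pd.DataFrame(
    {"chr": [1, 1, 2], "pos": [100, 200, 300], "ref": ["A", "C", "G"], "alt": ["G", "T", "A"]},
    index=["rs1", "rs2", "rs3"],
)
gw_df = pd.DataFrame(
    {"a1": ["A", "T"], "a2": ["G", "C"], "z": [1.5, -2.0]},
    index=["rs1", "rs4"],
)

# Inner join on the index: SNPs missing from either table are dropped.
merged = ref_panel.join(gw_df, how="inner")
```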

jass_preprocessing.map_reference.read_reference(gwas_reference_panel, mask_MHC=False, minimum_MAF=None, region_to_mask=None)[source]

Helper function to name the columns correctly. Filter the reference panel by minimum allele frequency (hg19 coordinate).

Parameters:

gwas_reference_panel (str) – path to the reference panel file
mask_MHC (bool) – whether the MHC region should be masked or not; default is False
minimum_MAF (float) – minimum allele frequency for a SNP to be retained in the panel
region_to_mask (dict) – a list of additional regions to mask
type_of_index (str) – ‘rs-number’ or ‘positional’

Returns: the reference_panel with the specified filter applied
Return type: ref (pandas dataframe)
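The filters described above can be illustrated with boolean masks. Everything specific in this sketch is an assumption: the column names (`chr`, `pos`, `MAF`), the structure of the region dictionary (chromosome mapped to a start/end pair), the region boundaries, and the helper name `filter_panel`.

```python
import pandas as pd


def filter_panel(ref, minimum_maf=None, regions=None):
    """Drop SNPs below minimum_maf or inside any masked region (illustrative)."""
    keep = pd.Series(True, index=ref.index)
    if minimum_maf is not None:
        keep &= ref["MAF"] >= minimum_maf
    for chrom, (start, end) in (regions or {}).items():
        in_region = (ref["chr"] == chrom) & ref["pos"].between(start, end)
        keep &= ~in_region
    return ref[keep]


ref = pd.DataFrame(
    {"chr": [6, 6, 1], "pos": [30_000_000, 50_000_000, 1_000], "MAF": [0.2, 0.005, 0.3]},
    index=["rs1", "rs2", "rs3"],
)

# Placeholder boundaries on chromosome 6, chosen only for the example.
filtered = filter_panel(ref, minimum_maf=0.01, regions={6: (25_000_000, 35_000_000)})
```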

jass_preprocessing.save_output module

jass_preprocessing.save_output.save_output(mgwas, ImpG_output_Folder, my_study)[source]

Write the preprocessed GWAS for LD score analysis

jass_preprocessing.save_output.save_output_by_chromosome(mgwas, ImpG_output_Folder, my_study)[source]

Write the preprocessed GWAS for imputation
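Writing one file per chromosome is typically done with a `groupby` over the chromosome column. The sketch below assumes a `chr` column and an invented file-naming pattern; the actual output format and file names used by the package are not documented here.

```python
import os
import tempfile

import pandas as pd

mgwas = pd.DataFrame(
    {"chr": [1, 1, 2], "pos": [100, 200, 300], "Z": [1.5, -2.0, 0.7]},
    index=["rs1", "rs2", "rs3"],
)

with tempfile.TemporaryDirectory() as out_dir:
    # One tab-separated file per chromosome (hypothetical naming scheme).
    for chrom, chunk in mgwas.groupby("chr"):
        chunk.to_csv(os.path.join(out_dir, f"z_STUDY_chr{chrom}.txt"), sep="\t")
    written = sorted(os.listdir(out_dir))
```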

Module contents

`map_gwas`	Map GWAS
`dna_utils`	A few functions to compute the DNA complement
`map_reference`	Module of functions
`compute_score`
`save_output`