jass_preprocessing package
Submodules
jass_preprocessing.compute_score module
- jass_preprocessing.compute_score.compute_sample_size(mgwas, diagnostic_folder, trait, perSS=0.7)[source]
- jass_preprocessing.compute_score.compute_z_score(mgwas)[source]
Compute the Z-score value and its sign, and add the corresponding column to the mgwas dataframe.
The smallest positive float value is sys.float_info.min = 2.2250738585072014e-308, so the largest Z-score that can be obtained is np.sqrt(ss.chi2.isf(sys.float_info.min, 1)) = 37.537836095576054.
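The bound quoted above can be checked with a short sketch. It recomputes both numbers and shows one way a two-sided p-value could be turned into a signed Z-score. The helper name `p_to_signed_z` and its `beta` argument are assumptions made for illustration; they are not taken from the package.

```python
import sys

import numpy as np
import scipy.stats as ss

# Smallest positive normalized float; smaller p-values cannot be represented reliably.
min_p = sys.float_info.min

# Largest Z-score obtainable from the chi-square (1 df) inverse survival function.
max_z = np.sqrt(ss.chi2.isf(min_p, 1))


def p_to_signed_z(p_values, beta):
    """Hypothetical helper: convert two-sided p-values to signed Z-scores."""
    # Clip p-values into [min_p, 1] so that p = 0 maps to the largest finite Z-score.
    p = np.clip(np.asarray(p_values, dtype=float), min_p, 1.0)
    return np.sign(beta) * np.sqrt(ss.chi2.isf(p, 1))


z = p_to_signed_z([0.05, 0.0], beta=np.array([1.0, -2.0]))
```

Here a p-value of 0 is clipped to `min_p`, so its Z-score has the magnitude `max_z` rather than infinity.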
jass_preprocessing.dna_utils module
A few functions to compute the DNA complement
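As a minimal sketch of what computing a DNA complement involves, the example below uses a translation table. The function names `dna_complement` and `dna_reverse_complement` are hypothetical and are not claimed to match the module's own functions.

```python
# Translation table for the four unambiguous nucleotides, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def dna_complement(seq):
    """Return the base-wise complement of a DNA string (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)


def dna_reverse_complement(seq):
    """Return the complement read in the opposite direction (hypothetical helper)."""
    return dna_complement(seq)[::-1]
```

Characters outside the table (for example `N`) are left unchanged by `str.translate`.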
jass_preprocessing.map_gwas module
Map GWAS
A set of functions to find GWAS files in subfolder and to map columns
- jass_preprocessing.map_gwas.convert_missing_values(df)[source]
Convert all missing-value strings to the standard np.nan value
- Parameters:
df (pandas dataframe) – GWAS data as a dataframe
- Returns:
a pandas dataframe in which all missing values are equal to np.nan
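The idea can be sketched with `DataFrame.replace`. The list of placeholder strings below is an assumption chosen for illustration; the strings actually recognised by the package may differ.

```python
import numpy as np
import pandas as pd

# Assumed placeholder strings; not taken from the package.
MISSING_STRINGS = ["", ".", "NA", "N/A", "NaN", "nan", "-"]


def convert_missing_values_sketch(df):
    """Replace every assumed missing-value string with np.nan."""
    return df.replace(MISSING_STRINGS, np.nan)


raw = pd.DataFrame({"snpid": ["rs1", "rs2", "rs3"], "pval": ["0.01", "NA", "."]})
clean = convert_missing_values_sketch(raw)
```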
- jass_preprocessing.map_gwas.gwas_internal_link(GWAS_table, GWAS_path)[source]
Walk the GWAS path to find the GWAS tables
- Parameters:
GWAS_table (str) – path of the folder to explore
findfile (str) – name of the file to find
- Returns:
a pandas dataframe with one column for the filename and one column containing the complete path to the file
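A folder walk of this kind could look like the sketch below, which uses `os.walk` and returns one row per file found. The function name `find_gwas_files` and the column names `filename` and `path` are hypothetical.

```python
import os
import tempfile

import pandas as pd


def find_gwas_files(folder, filenames):
    """Hypothetical helper: locate each named file anywhere under `folder`."""
    wanted = set(filenames)
    rows = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name in wanted:
                rows.append({"filename": name, "path": os.path.join(root, name)})
    return pd.DataFrame(rows, columns=["filename", "path"])


# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "study_A"))
    target = os.path.join(tmp, "study_A", "gwas_a.txt")
    open(target, "w").close()
    links = find_gwas_files(tmp, ["gwas_a.txt", "absent.txt"])
    path_matches = links.loc[0, "path"] == target
```

Files that are requested but not present simply do not appear in the result.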
- jass_preprocessing.map_gwas.map_columns_position(gwas_internal_link, column_dict)[source]
Find the column positions for each specific GWAS
- Parameters:
gwas_internal_link (str) – filename of the GWAS data (with path)
GWAS_labels (pd.DataFrame) – corresponding row of the information file
- Returns:
pandas Series with column position and column names as index
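One way to obtain such a Series is to read only the header line and look up each expected column name. In this sketch, `column_dict` is assumed to map a standard name to the raw header name used in a given file; the standard names (`snpid`, `a1`, `pval`) and the helper name are illustrative assumptions.

```python
import io

import pandas as pd


def map_columns_position_sketch(handle, column_dict, sep="\t"):
    """Return the position of each known column, indexed by its standard name."""
    # nrows=0 parses the header line only.
    header = pd.read_csv(handle, sep=sep, nrows=0).columns.tolist()
    positions = {
        std: header.index(raw) for std, raw in column_dict.items() if raw in header
    }
    return pd.Series(positions).sort_values()


header_text = "MarkerName\tAllele1\tAllele2\tEffect\tP.value\n"
column_dict = {"snpid": "MarkerName", "pval": "P.value", "a1": "Allele1"}
column_positions = map_columns_position_sketch(io.StringIO(header_text), column_dict)
```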
- jass_preprocessing.map_gwas.read_gwas(gwas_internal_link, column_map, imputation_treshold=None)[source]
Read the raw GWAS data, fetch the columns using the positions stored in column_map, and rename the columns according to column_map.index
- Parameters:
gwas_internal_link (str) – filename of the GWAS data (with path)
column_map (pandas Series) – Series containing the position of each column in the raw data
- Returns:
a pandas dataframe in which all missing values are equal to np.nan
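Selecting columns by position and renaming them from the Series index can be sketched with `pandas.read_csv`. The tab separator, the standard column names and the function name below are assumptions; the optional imputation threshold of the real function is not modelled here.

```python
import io

import pandas as pd

raw_text = (
    "MarkerName\tAllele1\tAllele2\tEffect\tP.value\n"
    "rs1\tA\tG\t0.1\t0.02\n"
    "rs2\tC\tT\t-0.3\t0.5\n"
)

# Positions of the columns to keep, indexed by the names to assign (assumed names).
column_map = pd.Series({"snpid": 0, "a1": 1, "a2": 2, "pval": 4})


def read_gwas_sketch(handle, column_map, sep="\t"):
    """Read only the mapped columns and rename them after column_map.index."""
    # usecols returns columns in file order, so sort the map by position first.
    ordered = column_map.sort_values()
    df = pd.read_csv(handle, sep=sep, usecols=[int(i) for i in ordered.values])
    df.columns = list(ordered.index)
    return df


gwas = read_gwas_sketch(io.StringIO(raw_text), column_map)
```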
jass_preprocessing.map_reference module
Module of functions for mapping GWAS data onto the reference panel
- jass_preprocessing.map_reference.compute_is_aligned(mgwas)[source]
Check whether the reference panel and the GWAS data have the same reference allele. Returns a boolean vector. This function should be the complement of “is_flipped”, but both functions are still computed so that unusual cases (more than two alleles, for instance) can be detected.
- jass_preprocessing.map_reference.compute_is_flipped(mgwas)[source]
Check whether the reference panel and the GWAS data have the same reference allele. Returns a boolean vector.
- Parameters:
mgwas (pandas dataframe) – GWAS study dataframe merged with the reference_panel
- Returns:
merge studies,
- Return type:
is_flipped (pandas dataframe)
- jass_preprocessing.map_reference.compute_snp_alignement(mgwas)[source]
Add a column to mgwas indicating whether the reference and coded alleles are flipped compared to the reference panel. If they are, the sign of the statistic must be flipped.
- Parameters:
mgwas – a pandas dataframe of the GWAS data merged with the reference panel
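The alignment logic described in the three entries above can be illustrated on a toy table. The column names (`ref`, `alt`, `a1`, `a2`, `z`) are assumptions, and the sketch ignores strand issues (complementary alleles); it is not the package's implementation.

```python
import numpy as np
import pandas as pd

mgwas = pd.DataFrame(
    {
        "ref": ["A", "C", "G"],  # reference allele in the panel
        "alt": ["G", "T", "A"],  # alternative allele in the panel
        "a1": ["A", "T", "G"],   # first allele reported by the GWAS
        "a2": ["G", "C", "C"],   # second allele reported by the GWAS
        "z": [1.2, -0.5, 2.0],
    },
    index=["rs1", "rs2", "rs3"],
)

# Same alleles in the same order as the panel.
is_aligned = (mgwas["a1"] == mgwas["ref"]) & (mgwas["a2"] == mgwas["alt"])
# Same alleles in swapped order.
is_flipped = (mgwas["a1"] == mgwas["alt"]) & (mgwas["a2"] == mgwas["ref"])

# +1 when aligned, -1 when flipped, NaN when neither holds (e.g. a third allele).
sign = np.where(is_aligned, 1.0, np.where(is_flipped, -1.0, np.nan))
mgwas["z_aligned"] = mgwas["z"] * sign
```

In this toy data rs3 is neither aligned nor flipped, which is the kind of case that computing both vectors makes visible.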
- jass_preprocessing.map_reference.map_on_ref_panel(gw_df, ref_panel, index_type='rs-number')[source]
Merge the GWAS dataframe with the reference panel. Make sure that the same SNPs are in the reference panel and in the GWAS.
- Parameters:
gw_df (pandas dataframe) – GWAS study dataframe
ref_panel (pandas dataframe) – reference panel dataframe
- Returns:
the merged studies
- Return type:
merge_GWAS (pandas dataframe)
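Keeping only the SNPs present in both tables corresponds to an inner join. The sketch below assumes both dataframes are indexed by rs-number; the column names are illustrative.

```python
import pandas as pd

ref_panel = pd.DataFrame(
    {"chr": [1, 1, 2], "pos": [100, 200, 300], "ref": ["A", "C", "G"], "alt": ["G", "T", "A"]},
    index=pd.Index(["rs1", "rs2", "rs3"], name="snpid"),
)
gw_df = pd.DataFrame(
    {"a1": ["C", "A", "T"], "a2": ["T", "G", "C"], "pval": [0.01, 0.2, 0.5]},
    index=pd.Index(["rs2", "rs3", "rs9"], name="snpid"),
)

# An inner join on the index keeps only SNPs found in both tables.
merged = ref_panel.merge(gw_df, how="inner", left_index=True, right_index=True)
```

SNPs found in only one of the two tables (rs1 and rs9 here) are dropped from the result.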
- jass_preprocessing.map_reference.read_reference(gwas_reference_panel, mask_MHC=False, minimum_MAF=None, region_to_mask=None)[source]
Helper function to name the columns correctly. Filter the reference panel by minimum allele frequency (hg19 coordinate).
- Parameters:
gwas_reference_panel (str) – path to the reference panel file
mask_MHC (bool) – whether the MHC region should be masked or not; default is False
minimum_MAF (float) – minimum allele frequency for a SNP to be retained in the panel
region_to_mask (dict) – a list of additional regions to mask
type_of_index (str) – ‘rs-number’ or ‘positional’
- Returns:
the reference_panel with the specified filter applied
- Return type:
ref (pandas dataframe)
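The two filters mentioned above (a minimum allele frequency and masked regions) can be sketched on a toy panel. The column names, the layout of the `regions` dictionary and the region boundaries are all assumptions chosen for illustration; in particular the boundaries are not the package's definition of the MHC region.

```python
import pandas as pd

ref = pd.DataFrame(
    {
        "chr": [1, 6, 6, 2],
        "pos": [1_000, 30_000_000, 50_000_000, 500],
        "MAF": [0.2, 0.3, 0.005, 0.1],
    },
    index=["rs1", "rs2", "rs3", "rs4"],
)


def mask_region(ref, chrom, start, end):
    """Drop SNPs located on `chrom` between `start` and `end` (inclusive)."""
    inside = (ref["chr"] == chrom) & ref["pos"].between(start, end)
    return ref.loc[~inside]


def filter_reference(ref, minimum_MAF=None, regions=None):
    """Hypothetical helper: apply the MAF filter, then mask each region."""
    if minimum_MAF is not None:
        ref = ref.loc[ref["MAF"] >= minimum_MAF]
    for chrom, (start, end) in (regions or {}).items():
        ref = mask_region(ref, chrom, start, end)
    return ref


# Illustrative boundaries only.
filtered = filter_reference(ref, minimum_MAF=0.01, regions={6: (25_000_000, 35_000_000)})
```

With these inputs rs3 is removed by the frequency filter and rs2 by the masked region.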
jass_preprocessing.save_output module
Module contents