mightypy.stats package
Module contents
mightypy.stats
- class WOE_IV(event: str, non_event: str, target_col: str, bucket_col: str, value_col: str | None = None, agg_func: Callable = np.count_nonzero, bucket_col_type: str = 'continuous', n_buckets: int = 10)
Bases: object
Weight of Evidence and Information Value.
References
https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html
- Parameters:
event (str) – event name. Generally label true/1.
non_event (str) – non event name. Generally label false/0.
target_col (str) – Target column name.
value_col (str, optional) – Name of the value column to aggregate (counted by default). Defaults to None.
bucket_col (str) – bucketing column name.
agg_func (Callable, optional) – Aggregation function name. Defaults to np.count_nonzero.
bucket_col_type (str, optional) – Value type of the bucketing column. If 'discrete', no buckets are created; otherwise the column is split into buckets. Defaults to 'continuous'.
n_buckets (int, optional) – Number of artificial buckets to create when the bucket column has continuous values. Defaults to 10.
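As a rough illustration of what bucketing a continuous column into n_buckets groups can look like, here is a minimal sketch. It assumes equal-frequency (quantile) buckets built with pandas.qcut. The binning strategy WOE_IV actually uses is not stated in this page, so treat the choice of qcut, the toy data, and the variable names as assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one continuous feature to be bucketed.
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=200), name="feature")

# Assumption: equal-frequency (quantile) buckets. The library may instead
# use equal-width bins or another strategy.
n_buckets = 10
buckets = pd.qcut(values, q=n_buckets, duplicates="drop")

# Number of rows that landed in each bucket, ordered by bucket interval.
bucket_sizes = buckets.value_counts().sort_index()
print(bucket_sizes)
```

With bucket_col_type='discrete' no such step would be needed, since the column's own values already define the groups.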
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from mightypy.stats import WOE_IV
>>> dataset = load_breast_cancer(as_frame=True)
>>> # copy the slice so that adding a column does not trigger SettingWithCopyWarning
>>> df = dataset.frame[['mean radius', 'target']].copy()
>>> target_map = {0: 'False', 1: 'True'}
>>> df['label'] = df['target'].map(target_map)
>>> obj = WOE_IV(event='True', non_event='False', target_col='label',
...              bucket_col='mean radius')
>>> cal_df, iv = obj.values(df)
>>> fig = obj.plot()
>>> fig.tight_layout()
>>> fig.show()
or directly
>>> fig = obj.plot(df)
>>> fig.show()
- plot(df: DataFrame | None = None, figsize=(10, 5)) -> Figure
Plot weight of evidence and subsequent plots.
- Parameters:
df (Optional[pd.DataFrame], optional) – Input dataframe. Defaults to None.
figsize (tuple, optional) – Figure size. Defaults to (10, 5).
- Raises:
ValueError – If no dataframe is available, either stored in the object or passed as an argument.
- Returns:
matplotlib figure.
- Return type:
plt.Figure
- values(df: DataFrame | None = None) -> Tuple[DataFrame, float]
Return the weight of evidence and information value for the given dataframe.
- Parameters:
df (Optional[pd.DataFrame], optional) – Input dataframe. Defaults to None.
- Raises:
ValueError – If no input dataframe is available, either stored in the object or passed as an argument.
- Returns:
calculated dataframe and information value.
- Return type:
Tuple[pd.DataFrame, float]
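To make the two returned quantities concrete, the following is a minimal sketch of weight of evidence and information value computed by hand with pandas. It follows the convention of the article referenced above, WOE = ln(% of non-events / % of events) and IV = sum((% of non-events - % of events) * WOE). The toy data and column names are hypothetical, and the columns and sign convention of the dataframe returned by values() are not documented here, so this should not be read as the library's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a discrete bucket column and a binary label.
df = pd.DataFrame({
    "bucket": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "label": ["True", "False", "False", "False",
              "True", "True", "False", "False",
              "True", "True", "True", "False"],
})

# Count events ('True') and non-events ('False') per bucket.
counts = pd.crosstab(df["bucket"], df["label"])
events, non_events = counts["True"], counts["False"]

# Share of all events / all non-events that falls into each bucket.
pct_event = events / events.sum()
pct_non_event = non_events / non_events.sum()

# WOE per bucket and the overall IV. A bucket with zero events or zero
# non-events would give an infinite WOE; this sketch does not handle that.
woe = np.log(pct_non_event / pct_event)
iv = ((pct_non_event - pct_event) * woe).sum()

print(woe)
print(iv)
```

Bucket "b" holds equal shares of events and non-events, so its WOE is zero and it contributes nothing to the IV.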
- population_stability_index(expected: list | ndarray, actual: list | ndarray, data_type: str) -> DataFrame
Population Stability Index.
References
https://www.listendata.com/2015/05/population-stability-index.html
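- Parameters:
expected (list | ndarray) – Expected values.
actual (list | ndarray) – Actual values.
data_type (str) – Type of the data, 'continuous' or 'discrete'.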
- Returns:
calculated dataframe.
- Return type:
pd.DataFrame
Examples
>>> import numpy as np
>>> from mightypy.stats import population_stability_index
- continuous data
>>> expected_continuous = np.random.normal(size=(500,))
>>> actual_continuous = np.random.normal(size=(500,))
>>> psi_df = population_stability_index(expected_continuous, actual_continuous, data_type='continuous')
>>> psi_df.psi.sum()
- discrete data
>>> expected_discrete = np.random.randint(0, 10, size=(500,))
>>> actual_discrete = np.random.randint(0, 10, size=(500,))
>>> psi_df = population_stability_index(expected_discrete, actual_discrete, data_type='discrete')
>>> psi_df.psi.sum()
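For readers who want to see what the summed psi column represents, here is a minimal NumPy sketch of the usual formula, PSI = sum((actual% - expected%) * ln(actual% / expected%)), for continuous data. The helper name psi_sketch, the decile bins taken from the expected sample, and the small floor used to avoid log(0) are all assumptions made for illustration. The binning used inside population_stability_index is not described in this page.

```python
import numpy as np

def psi_sketch(expected, actual, n_bins=10, eps=1e-6):
    """Hypothetical helper: PSI for continuous data with quantile bins."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Assumption: interior bin edges are quantiles of the expected sample,
    # so values outside its range fall into the first or last bin.
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]

    # Bin index (0 .. n_bins - 1) for every observation.
    expected_idx = np.digitize(expected, inner_edges)
    actual_idx = np.digitize(actual, inner_edges)

    # Share of observations per bin, floored at eps to avoid log(0).
    expected_pct = np.bincount(expected_idx, minlength=n_bins) / len(expected)
    actual_pct = np.bincount(actual_idx, minlength=n_bins) / len(actual)
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)

    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
expected = rng.normal(size=500)
actual = rng.normal(loc=0.2, size=500)  # slightly shifted distribution

print(psi_sketch(expected, actual))
```

Each term of the sum is non-negative, so the result is zero only when both samples have the same share in every bin.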