FeatureSelection.get_db_filtered
- FeatureSelection.get_db_filtered()
Returns the original DataFrame without the dropped features.
Removes from the original DataFrame (the db attribute) the columns corresponding to the features that the selection filters have stored in the _filtered_out_features attribute. All other original columns are kept (including target, sample_col, etc.). The result is the dataset after the feature selection filters have been applied.
- Returns:
The original DataFrame with the columns of the dropped features removed.
- Return type:
pd.DataFrame
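The column removal described above can be approximated in plain pandas. This is a minimal sketch, not the library's actual implementation: the helper name drop_filtered_features is hypothetical, and the db and filtered_out variables stand in for the db and _filtered_out_features attributes, using the first three rows of the example data.

```python
import pandas as pd


def drop_filtered_features(db: pd.DataFrame, filtered_out: list[str]) -> pd.DataFrame:
    """Hypothetical helper: return a copy of db without the filtered-out columns."""
    # errors="ignore" skips any name that is not a column of db
    return db.drop(columns=filtered_out, errors="ignore")


# Stand-in for the `db` attribute (first three rows of the example data)
db = pd.DataFrame(
    {
        "feature_B": [34.835708, 3.086785, 42.384427],
        "feature_C": ["Z", "X", "Y"],
        "target": [75.013312, 109.194174, 96.287048],
        "feature_A1": [-0.270712, 0.104848, 0.250528],
        "feature_A2": [-0.812137, 0.314544, 0.751583],
        "sample_col": ["train", "train", "train"],
    }
)

# Stand-in for the `_filtered_out_features` attribute
filtered_out = ["feature_A1", "feature_A2"]

filtered = drop_filtered_features(db, filtered_out)
print(list(filtered.columns))
# ['feature_B', 'feature_C', 'target', 'sample_col']
```

Because DataFrame.drop returns a new object by default, db itself is left unchanged in this sketch; whether the library copies or mutates its own db attribute is not stated in this page.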
Data used for the examples:
>>> df_test_filters
     feature_B  feature_C       target  feature_A1  feature_A2  sample_col
0    34.835708          Z    75.013312   -0.270712   -0.812137       train
1     3.086785          X   109.194174    0.104848    0.314544       train
2    42.384427          Y    96.287048    0.250528    0.751583       train
3    86.151493          X   264.905765   -0.925200   -2.775600       train
4    -1.707669          V     2.880829    0.567144    1.701431       train
5    -1.706848          V     2.318509   -1.040180   -3.120541       train
6    88.960641          X   273.054387   -0.153676   -0.461028       train
7    48.371736          V   101.779140    0.789852    2.369555       train
8   -13.473719          Z   -25.266714   -1.226216   -3.678648       train
9    37.128002          Y    73.118623   -0.948007   -2.844021       train
10  -13.170885          X    69.538553   -0.569654   -1.708962       train
11  -13.286488          Z   -30.168523   -0.977150   -2.931451       train
12   22.098114          W    54.445288   -0.770632   -2.311895       train
13  -85.664012          W  -171.324610   -0.033711   -0.101134       train
14  -76.245892          X   -48.581133   -1.032859   -3.098578       train
15  -18.114376          X    59.816750    1.142427    3.427282       train
16  -40.641556          Z   -85.182377   -0.609778   -1.829334       train
17   25.712367          Y    56.834657    1.469416    4.408249       train
18  -35.401204          Y   -77.550289    1.492679    4.478037       train
19  -60.615185          Y  -123.306439    0.707125    2.121376       train
Example:
>>> from cefeste.selection import FeatureSelection
>>> fs = FeatureSelection(
...     db=df_test_filters,
...     target_col='target',
...     sample_col='sample_col',
...     sample_train_value='train',
...     verbose=True
... )
>>> fs.run()
>>> fs.make_report()
    feat_name  result    drop_reason
0  feature_A1    drop  unexplanatory
1  feature_A2    drop  unexplanatory
2   feature_C    keep            NaN
3   feature_B    keep            NaN
>>> fs.get_db_filtered()
     feature_B  feature_C       target  sample_col
0    34.835708          Z    75.013312       train
1     3.086785          X   109.194174       train
2    42.384427          Y    96.287048       train
3    86.151493          X   264.905765       train
4    -1.707669          V     2.880829       train
5    -1.706848          V     2.318509       train
6    88.960641          X   273.054387       train
7    48.371736          V   101.779140       train
8   -13.473719          Z   -25.266714       train
9    37.128002          Y    73.118623       train
10  -13.170885          X    69.538553       train
11  -13.286488          Z   -30.168523       train
12   22.098114          W    54.445288       train
13  -85.664012          W  -171.324610       train
14  -76.245892          X   -48.581133       train
15  -18.114376          X    59.816750       train
16  -40.641556          Z   -85.182377       train
17   25.712367          Y    56.834657       train
18  -35.401204          Y   -77.550289       train
19  -60.615185          Y  -123.306439       train