FeatureElimination.get_db_filtered
- FeatureElimination.get_db_filtered()
Returns the original DataFrame without the eliminated features.
Removes from the original DataFrame (the db attribute) the columns for the features that were eliminated during the RFE process and stored in the _filtered_out_features attribute by the extract_features() method. All other original columns (including target, sample_col, etc.) are kept.
- Returns:
The original DataFrame with the columns of the eliminated features removed.
- Return type:
pd.DataFrame
Note
Make sure make_report() and extract_features() have been run before calling this method, otherwise the result will not be meaningful.
Example:
>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from cefeste.elimination import FeatureElimination
>>> # Sample data
>>> data = pd.DataFrame({
...     'feature1': [1,2,3,4,5,6,7,8,9,2,3,1,3,6,43,2,4,6,3,2,6,3,2,6,3,2,5],
...     'feature2': [5,4,3,2,1,6,3,7,3,5,8,4,2,9,75,4,5,7,5,2,5,8,6,3,5,7,8],
...     'feature3': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
...     'feature4': [8,8,8,9,5,3,4,5,6,2,6,8,4,4,4,6,5,7,8,5,9,3,4,7,5,78,8],
...     'feature5': [2,12,4,14,2,3,1,2,3,14,1,1,14,12,15,16,2,12,13,12,1,15,17,2,1,15,1],
...     'target': [0,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1]
... })
>>> # Parameters
>>> target_col = 'target'
>>> model = LogisticRegression()
>>> grid = {'C': [0.1, 1, 10]}
>>> # Initialization
>>> fe = FeatureElimination(
...     db=data,
...     target_col=target_col,
...     model=model,
...     grid=grid,
...     min_n_feat_step=1
... )
>>> # Generate the report
>>> fe.make_report()
   n_feat  train_score  valid_score  n_feat_to_remove                                           feat_used  feat_to_remove                               feat_select             best_estimator
0       5     0.964706     0.891667                 1  [feature3, feature5, feature1, feature2, feature4]      [feature3]  [feature5, feature1, feature2, feature4]  LogisticRegression(C=0.1)
1       4     0.964706     0.891667                 1            [feature4, feature5, feature2, feature1]      [feature4]            [feature5, feature2, feature1]  LogisticRegression(C=0.1)
2       3     0.982353     0.925000                 1                      [feature2, feature5, feature1]      [feature2]                      [feature5, feature1]    LogisticRegression(C=1)
3       2     0.817647     0.750000                 1                                [feature1, feature5]      [feature1]                                [feature5]  LogisticRegression(C=0.1)
4       1     0.788235     0.791667                 0                                          [feature5]              []                                [feature5]  LogisticRegression(C=0.1)
>>> fe.extract_features()
['feature2', 'feature5', 'feature1']
>>> fe.get_db_filtered()
    feature1  feature2  feature5  target
0          1         5         2       0
1          2         4        12       1
2          3         3         4       0
3          4         2        14       1
4          5         1         2       0
5          6         6         3       0
6          7         3         1       0
7          8         7         2       0
8          9         3         3       0
9          2         5        14       1
10         3         8         1       1
11         1         4         1       1
12         3         2        14       1
13         6         9        12       1
14        43        75        15       1
15         2         4        16       1
16         4         5         2       0
17         6         7        12       1
18         3         5        13       1
19         2         2        12       1
20         6         5         1       0
21         3         8        15       1
22         2         6        17       1
23         6         3         2       0
24         3         5         1       1
25         2         7        15       1
26         5         8         1       1
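
The column-dropping behaviour described for this method can be approximated with plain pandas. The snippet below is only a sketch of that idea, not the library's actual implementation: `db` and `filtered_out_features` are stand-ins for the `db` and `_filtered_out_features` attributes, and the data values are illustrative.

```python
import pandas as pd

# Stand-in for the `db` attribute (illustrative values only).
db = pd.DataFrame({
    "feature1": [1, 2, 3],
    "feature3": [0, 0, 0],
    "feature4": [8, 8, 8],
    "target": [0, 1, 0],
})

# Stand-in for `_filtered_out_features`: the features discarded by RFE.
filtered_out_features = ["feature3", "feature4"]

# Drop only the eliminated feature columns; every other column
# (here the target) is kept, and `db` itself is left unchanged.
db_filtered = db.drop(columns=filtered_out_features)

print(list(db_filtered.columns))
```

Because `DataFrame.drop` returns a new object by default, the original DataFrame in this sketch keeps all of its columns.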