FeatureElimination.get_X_original
- FeatureElimination.get_X_original()
Returns the original DataFrame restricted to the features initially considered.
Selects from the original DataFrame (the db attribute) only the columns listed in the feat_to_check attribute, i.e. the features that were included at the start of the analysis/elimination process.
- Returns:
A DataFrame containing only the columns specified in feat_to_check.
- Return type:
pd.DataFrame
Example:
>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from cefeste.elimination import FeatureElimination
>>> # Sample data
>>> data = pd.DataFrame({
...     'feature1': [1,2,3,4,5,6,7,8,9,2,3,1,3,6,43,2,4,6,3,2,6,3,2,6,3,2,5],
...     'feature2': [5,4,3,2,1,6,3,7,3,5,8,4,2,9,75,4,5,7,5,2,5,8,6,3,5,7,8],
...     'feature3': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
...     'feature4': [8,8,8,9,5,3,4,5,6,2,6,8,4,4,4,6,5,7,8,5,9,3,4,7,5,78,8],
...     'feature5': [2,12,4,14,2,3,1,2,3,14,1,1,14,12,15,16,2,12,13,12,1,15,17,2,1,15,1],
...     'target': [0,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1]
... })
>>> # Parameters
>>> target_col = 'target'
>>> model = LogisticRegression()
>>> grid = {'C': [0.1, 1, 10]}
>>> # Initialization
>>> fe = FeatureElimination(
...     db=data,
...     target_col=target_col,
...     model=model,
...     grid=grid,
...     min_n_feat_step=1
... )
>>> # Generate the report
>>> fe.make_report()
   n_feat  train_score  valid_score  n_feat_to_remove                                           feat_used  feat_to_remove                               feat_select             best_estimator
0       5     0.964706     0.891667                 1  [feature3, feature5, feature1, feature2, feature4]      [feature3]  [feature5, feature1, feature2, feature4]  LogisticRegression(C=0.1)
1       4     0.964706     0.891667                 1            [feature4, feature5, feature2, feature1]      [feature4]            [feature5, feature2, feature1]  LogisticRegression(C=0.1)
2       3     0.982353     0.925000                 1                      [feature2, feature5, feature1]      [feature2]                      [feature5, feature1]    LogisticRegression(C=1)
3       2     0.817647     0.750000                 1                                [feature1, feature5]      [feature1]                                [feature5]  LogisticRegression(C=0.1)
4       1     0.788235     0.791667                 0                                          [feature5]              []                                [feature5]  LogisticRegression(C=0.1)
>>> fe.extract_features()
['feature2', 'feature5', 'feature1']
>>> fe.get_X_original()
    feature4  feature1  feature2  feature3  feature5
0          8         1         5         0         2
1          8         2         4         0        12
2          8         3         3         0         4
3          9         4         2         0        14
4          5         5         1         0         2
5          3         6         6         0         3
6          4         7         3         0         1
7          5         8         7         0         2
8          6         9         3         0         3
9          2         2         5         0        14
10         6         3         8         0         1
11         8         1         4         0         1
12         4         3         2         0        14
13         4         6         9         0        12
14         4        43        75         0        15
15         6         2         4         0        16
16         5         4         5         0         2
17         7         6         7         0        12
18         8         3         5         0        13
19         5         2         2         0        12
20         9         6         5         0         1
21         3         3         8         0        15
22         4         2         6         0        17
23         7         6         3         0         2
24         5         3         5         0         1
25        78         2         7         0        15
26         8         5         8         0         1
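The column selection this method is described as performing can be sketched in plain pandas. This is only an illustration, not the library's actual implementation: the helper name select_original_features is hypothetical, and db and feat_to_check below are small stand-ins for the attributes of the same name.

```python
import pandas as pd

# Stand-in for the `db` attribute: the full original DataFrame,
# including the target column.
db = pd.DataFrame({
    "feature1": [1, 2, 3],
    "feature2": [5, 4, 3],
    "target": [0, 1, 0],
})

# Stand-in for the `feat_to_check` attribute: the features considered
# at the start of the elimination process (assumed to be a list of names).
feat_to_check = ["feature2", "feature1"]


def select_original_features(db, feat_to_check):
    """Hypothetical helper: keep only the columns listed in feat_to_check."""
    # Indexing with a list of labels returns a new DataFrame whose columns
    # follow the order of the list, not the order in `db`.
    return db[list(feat_to_check)]


X_original = select_original_features(db, feat_to_check)
print(X_original.columns.tolist())
```

Because the columns come back in the order given by the list, the result can differ from the column order of the source data, which would be consistent with feature4 appearing first in the output above.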