FeatureElimination.get_X_reduced

FeatureElimination.get_X_reduced()

Returns the DataFrame containing only the final selected features.

Selects from the original DataFrame (the db attribute) only the columns listed in the final_feat attribute, which is populated by the extract_features() method. The result is the dataset after feature elimination has been applied (excluding the target and sample_col columns).

Returns: A DataFrame containing only the columns of the final selected features.
Return type: pd.DataFrame

Note

Run make_report() and extract_features() before calling this method so that it returns a meaningful result.
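
The column selection and the call-order requirement described above can be sketched with plain pandas. This is only an illustration and not cefeste's actual implementation: the helper name select_final_features and the empty-list check are assumptions introduced here, and whether the real method raises an error when final_feat has not been populated is not stated in this page.

```python
import pandas as pd


def select_final_features(db: pd.DataFrame, final_feat: list) -> pd.DataFrame:
    """Hypothetical stand-in for get_X_reduced(); not cefeste's own code."""
    # Assumption: an empty final_feat means extract_features() has not run yet.
    if not final_feat:
        raise ValueError(
            "final_feat is empty: run make_report() and extract_features() first"
        )
    # Indexing with a list keeps only those columns, in the order of the list.
    return db[final_feat]


# Small illustrative dataset (first three rows of the example columns).
db = pd.DataFrame({
    "feature1": [1, 2, 3],
    "feature2": [5, 4, 3],
    "feature5": [2, 12, 4],
    "target": [0, 1, 0],
})

X_reduced = select_final_features(db, ["feature2", "feature5", "feature1"])
print(list(X_reduced.columns))
```

Because the selection uses a list of column names, the target column is dropped simply by not appearing in that list, and the column order follows the list rather than the original DataFrame.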

Example:

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from cefeste.elimination import FeatureElimination
>>> # Sample data
>>> data = pd.DataFrame({
... 'feature1': [1,2,3,4,5,6,7,8,9,2,3,1,3,6,43,2,4,6,3,2,6,3,2,6,3,2,5],
... 'feature2': [5,4,3,2,1,6,3,7,3,5,8,4,2,9,75,4,5,7,5,2,5,8,6,3,5,7,8],
... 'feature3': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
... 'feature4': [8,8,8,9,5,3,4,5,6,2,6,8,4,4,4,6,5,7,8,5,9,3,4,7,5,78,8],
... 'feature5': [2,12,4,14,2,3,1,2,3,14,1,1,14,12,15,16,2,12,13,12,1,15,17,2,1,15,1],
... 'target':   [0,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1]
... })
>>> # Parameters
>>> target_col = 'target'
>>> model = LogisticRegression()
>>> grid = {'C': [0.1, 1, 10]}
>>> # Initialization
>>> fe = FeatureElimination(
...    db=data,
...    target_col=target_col,
...    model=model,
...    grid=grid,
...    min_n_feat_step=1
... )
>>> # Generate the report
>>> fe.make_report()

	n_feat	train_score	valid_score	n_feat_to_remove	feat_used	feat_to_remove	feat_select	best_estimator
0	5	0.964706	0.891667	1	[feature3, feature5, feature1, feature2, feature4]	[feature3]	[feature5, feature1, feature2, feature4]	LogisticRegression(C=0.1)
1	4	0.964706	0.891667	1	[feature4, feature5, feature2, feature1]	[feature4]	[feature5, feature2, feature1]	LogisticRegression(C=0.1)
2	3	0.982353	0.925000	1	[feature2, feature5, feature1]	[feature2]	[feature5, feature1]	LogisticRegression(C=1)
3	2	0.817647	0.750000	1	[feature1, feature5]	[feature1]	[feature5]	LogisticRegression(C=0.1)
4	1	0.788235	0.791667	0	[feature5]	[]	[feature5]	LogisticRegression(C=0.1)

>>> fe.extract_features()
['feature2', 'feature5', 'feature1']

>>> fe.get_X_reduced()

	feature2	feature5	feature1
0	5	2	1
1	4	12	2
2	3	4	3
3	2	14	4
4	1	2	5
5	6	3	6
6	3	1	7
7	7	2	8
8	3	3	9
9	5	14	2
10	8	1	3
11	4	1	1
12	2	14	3
13	9	12	6
14	75	15	43
15	4	16	2
16	5	2	4
17	7	12	6
18	5	13	3
19	2	12	2
20	5	1	6
21	8	15	3
22	6	17	2
23	3	2	6
24	5	1	3
25	7	15	2
26	8	1	5
