FeatureElimination.get_db_filtered

FeatureElimination.get_db_filtered()

Returns the original DataFrame without the eliminated features.

Removes from the original DataFrame (the db attribute) the columns for the features that were eliminated during the recursive feature elimination (RFE) process and stored in the _filtered_out_features attribute by the extract_features() method. All other original columns are kept (including target, sample_col, etc.).

Returns: The original DataFrame with the columns of the eliminated features removed.
Return type: pd.DataFrame

Note

Make sure you have run make_report() and extract_features() before calling this method, otherwise the result will not be meaningful.

Example:

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from cefeste.elimination import FeatureElimination
>>> # Sample data
>>> data = pd.DataFrame({
... 'feature1': [1,2,3,4,5,6,7,8,9,2,3,1,3,6,43,2,4,6,3,2,6,3,2,6,3,2,5],
... 'feature2': [5,4,3,2,1,6,3,7,3,5,8,4,2,9,75,4,5,7,5,2,5,8,6,3,5,7,8],
... 'feature3': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
... 'feature4': [8,8,8,9,5,3,4,5,6,2,6,8,4,4,4,6,5,7,8,5,9,3,4,7,5,78,8],
... 'feature5': [2,12,4,14,2,3,1,2,3,14,1,1,14,12,15,16,2,12,13,12,1,15,17,2,1,15,1],
... 'target':   [0,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1]
... })
>>> # Parameters
>>> target_col = 'target'
>>> model = LogisticRegression()
>>> grid = {'C': [0.1, 1, 10]}
>>> # Initialization
>>> fe = FeatureElimination(
...    db=data,
...    target_col=target_col,
...    model=model,
...    grid=grid,
...    min_n_feat_step=1
... )
>>> # Generate the report
>>> fe.make_report()

	n_feat	train_score	valid_score	n_feat_to_remove	feat_used	feat_to_remove	feat_select	best_estimator
0	5	0.964706	0.891667	1	[feature3, feature5, feature1, feature2, feature4]	[feature3]	[feature5, feature1, feature2, feature4]	LogisticRegression(C=0.1)
1	4	0.964706	0.891667	1	[feature4, feature5, feature2, feature1]	[feature4]	[feature5, feature2, feature1]	LogisticRegression(C=0.1)
2	3	0.982353	0.925000	1	[feature2, feature5, feature1]	[feature2]	[feature5, feature1]	LogisticRegression(C=1)
3	2	0.817647	0.750000	1	[feature1, feature5]	[feature1]	[feature5]	LogisticRegression(C=0.1)
4	1	0.788235	0.791667	0	[feature5]	[]	[feature5]	LogisticRegression(C=0.1)

>>> fe.extract_features()
['feature2', 'feature5', 'feature1']

>>> fe.get_db_filtered()

	feature1	feature2	feature5	target
0	1	5	2	0
1	2	4	12	1
2	3	3	4	0
3	4	2	14	1
4	5	1	2	0
5	6	6	3	0
6	7	3	1	0
7	8	7	2	0
8	9	3	3	0
9	2	5	14	1
10	3	8	1	1
11	1	4	1	1
12	3	2	14	1
13	6	9	12	1
14	43	75	15	1
15	2	4	16	1
16	4	5	2	0
17	6	7	12	1
18	3	5	13	1
19	2	2	12	1
20	6	5	1	0
21	3	8	15	1
22	2	6	17	1
23	6	3	2	0
24	3	5	1	1
25	2	7	15	1
26	5	8	1	1
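
The column removal described for this method can be approximated in plain pandas. The following is a minimal sketch, not the library's implementation: the helper name drop_eliminated and the hard-coded list of eliminated features are hypothetical, standing in for the db and _filtered_out_features attributes.

```python
import pandas as pd


def drop_eliminated(db, filtered_out_features):
    """Return a new DataFrame without the eliminated feature columns.

    Hypothetical helper: it mimics the documented behaviour, it is not
    taken from the cefeste source code.
    """
    # DataFrame.drop returns a new object, so the input frame is left untouched
    return db.drop(columns=filtered_out_features)


# Reduced version of the sample data used in the example above
data = pd.DataFrame({
    "feature1": [1, 2, 3],
    "feature3": [0, 0, 0],
    "feature4": [8, 8, 8],
    "target": [0, 1, 0],
})

# Assumed list of eliminated features (feature3 and feature4, as in the example)
filtered = drop_eliminated(data, ["feature3", "feature4"])
print(list(filtered.columns))  # ['feature1', 'target']
```

Because non-feature columns such as target are never listed among the eliminated features, they pass through unchanged, which matches the output shown above.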
