FeatureElimination.get_X_original

FeatureElimination.get_X_original()

Returns the original DataFrame restricted to the features initially considered.

Selects from the original DataFrame (the db attribute) only the columns listed in the feat_to_check attribute, i.e. the features that were included at the start of the analysis/elimination process.

Returns: A DataFrame containing only the columns listed in feat_to_check.
Return type: pd.DataFrame
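
The column selection described above can be illustrated with plain pandas. This is a minimal sketch and not the library's actual implementation: the function name select_original_features and the small DataFrame are hypothetical, and only the db and feat_to_check names come from the description.

```python
import pandas as pd


def select_original_features(db, feat_to_check):
    """Hypothetical stand-in for get_X_original.

    Assumes db is a DataFrame and feat_to_check is a list of column
    names that all exist in db.
    """
    # Selecting with a list of labels returns a new DataFrame that
    # contains only those columns, in the order given by the list.
    return db[feat_to_check]


# Illustrative data (not taken from the documented example).
db = pd.DataFrame({
    "feature1": [1, 2, 3],
    "feature2": [5, 4, 3],
    "target": [0, 1, 0],
})
feat_to_check = ["feature1", "feature2"]

X_original = select_original_features(db, feat_to_check)
print(list(X_original.columns))
```

Under these assumptions the target column is left out of the result simply because it is not listed in feat_to_check.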

Example:

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from cefeste.elimination import FeatureElimination
>>> # Sample data
>>> data = pd.DataFrame({
... 'feature1': [1,2,3,4,5,6,7,8,9,2,3,1,3,6,43,2,4,6,3,2,6,3,2,6,3,2,5],
... 'feature2': [5,4,3,2,1,6,3,7,3,5,8,4,2,9,75,4,5,7,5,2,5,8,6,3,5,7,8],
... 'feature3': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
... 'feature4': [8,8,8,9,5,3,4,5,6,2,6,8,4,4,4,6,5,7,8,5,9,3,4,7,5,78,8],
... 'feature5': [2,12,4,14,2,3,1,2,3,14,1,1,14,12,15,16,2,12,13,12,1,15,17,2,1,15,1],
... 'target':   [0,1,0,1,0,0,0,0,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1]
... })
>>> # Parameters
>>> target_col = 'target'
>>> model = LogisticRegression()
>>> grid = {'C': [0.1, 1, 10]}
>>> # Initialization
>>> fe = FeatureElimination(
...    db=data,
...    target_col=target_col,
...    model=model,
...    grid=grid,
...    min_n_feat_step=1
... )
>>> # Generate the report
>>> fe.make_report()

	n_feat	train_score	valid_score	n_feat_to_remove	feat_used	feat_to_remove	feat_select	best_estimator
0	5	0.964706	0.891667	1	[feature3, feature5, feature1, feature2, feature4]	[feature3]	[feature5, feature1, feature2, feature4]	LogisticRegression(C=0.1)
1	4	0.964706	0.891667	1	[feature4, feature5, feature2, feature1]	[feature4]	[feature5, feature2, feature1]	LogisticRegression(C=0.1)
2	3	0.982353	0.925000	1	[feature2, feature5, feature1]	[feature2]	[feature5, feature1]	LogisticRegression(C=1)
3	2	0.817647	0.750000	1	[feature1, feature5]	[feature1]	[feature5]	LogisticRegression(C=0.1)
4	1	0.788235	0.791667	0	[feature5]	[]	[feature5]	LogisticRegression(C=0.1)

>>> fe.extract_features()
['feature2', 'feature5', 'feature1']

>>> fe.get_X_original()

	feature4	feature1	feature2	feature5
0	8	1	5	2
1	8	2	4	12
2	8	3	3	4
3	9	4	2	14
4	5	5	1	2
5	3	6	6	3
6	4	7	3	1
7	5	8	7	2
8	6	9	3	3
9	2	2	5	14
10	6	3	8	1
11	8	1	4	1
12	4	3	2	14
13	4	6	9	12
14	4	43	75	15
15	6	2	4	16
16	5	4	5	2
17	7	6	7	12
18	8	3	5	13
19	5	2	2	12
20	9	6	5	1
21	3	3	8	15
22	4	2	6	17
23	7	6	3	2
24	5	3	5	1
25	78	2	7	15
26	8	5	8	1
