This document discusses preprocessing data with Jupyter Notebooks, Anaconda, and Python 3. It covers several key preprocessing steps: 1) importing libraries and datasets, 2) handling missing values through imputation, 3) encoding categorical variables, 4) splitting data into training and test sets, and 5) scaling variables. The goal is to preprocess a sample dataset on country, age, salary, and purchase data to prepare it for machine learning modeling.
3. www.stratebi.com
2 Data preprocessing – Importing the dataset
Country  Age  Salary  Purchased
France   44   72000   No
Spain    27   48000   Yes
Germany  30   54000   No
Spain    38   61000   No
Germany  40           Yes
France   35   58000   Yes
Spain         52000   No
France   48   79000   Yes
Germany  50   83000   No
France   37   67000   Yes
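A minimal sketch of this step with pandas, assuming the ten rows above are typed in directly. In practice the data would more likely be read from a file; the file name in the comment is hypothetical. The two empty cells are stored as NaN so that later steps can detect them.

```python
import numpy as np
import pandas as pd

# The ten rows from the slide; np.nan marks the two empty cells.
# Reading from disk would look like pd.read_csv("Data.csv"),
# where "Data.csv" is a hypothetical file name.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Separate the features (X) from the target column (y).
X = dataset[["Country", "Age", "Salary"]]
y = dataset["Purchased"]

# Count the missing values per column.
print(dataset.isna().sum())
```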
3 Data preprocessing – Imputing missing values
Country  Age   Salary    Purchased
France   44.0  72000.00  No
Spain    27.0  48000.00  Yes
Germany  30.0  54000.00  No
Spain    38.0  61000.00  No
Germany  40.0  63777.78  Yes
France   35.0  58000.00  Yes
Spain    38.8  52000.00  No
France   48.0  79000.00  Yes
Germany  50.0  83000.00  No
France   37.0  67000.00  Yes
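The filled-in values are the column means of the nine known entries (349 / 9 ≈ 38.8 for Age and 574000 / 9 ≈ 63777.78 for Salary). A minimal sketch of mean imputation with pandas, assuming the numeric columns are handled on their own; the slides do not show which library call was used.

```python
import numpy as np
import pandas as pd

# Numeric columns of the dataset; np.nan marks the two missing cells.
numeric = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# DataFrame.mean() skips NaN by default, so each mean is taken over
# the nine known values; fillna() then fills each column with its own mean.
imputed = numeric.fillna(numeric.mean())

print(imputed.round(2))
```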
1 Data preprocessing – Encoding categorical variables
Country  Age   Salary    Purchased
0        44.0  72000.00  No
2        27.0  48000.00  Yes
1        30.0  54000.00  No
2        38.0  61000.00  No
1        40.0  63777.78  Yes
0        35.0  58000.00  Yes

Country_0  Country_1  Country_2  Age   Salary    Purchased
1          0          0          44.0  72000.00  No
0          0          1          27.0  48000.00  Yes
0          1          0          30.0  54000.00  No
0          0          1          38.0  61000.00  No
0          1          0          40.0  63777.78  Yes
1          0          0          35.0  58000.00  Yes
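The two tables correspond to label encoding (France = 0, Germany = 1, Spain = 2) followed by one-hot encoding into Country_0, Country_1 and Country_2. A minimal sketch with pandas, assuming alphabetical category order, which matches the codes shown; the mapping of Purchased to 0/1 is the one that appears in the later slides.

```python
import pandas as pd

# First six rows of the Country and Purchased columns.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany", "France"],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes"],
})

# Label encoding: pandas sorts string categories alphabetically,
# giving France=0, Germany=1, Spain=2.
codes = dataset["Country"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per code, so the model does not
# read the codes as an ordering (Spain > Germany > France).
one_hot = pd.get_dummies(codes, prefix="Country", dtype=int)

# Binary target: No -> 0, Yes -> 1.
purchased = dataset["Purchased"].map({"No": 0, "Yes": 1})

encoded = pd.concat([one_hot, purchased], axis=1)
print(encoded)
```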
1 Data preprocessing – Creating training and test sets
Country_0  Country_1  Country_2  Age   Salary    Purchased
1          0          0          44.0  72000.00  0
0          0          1          27.0  48000.00  1
0          1          0          30.0  54000.00  0
0          0          1          38.0  61000.00  0
0          1          0          40.0  63777.78  1
1          0          0          35.0  58000.00  1
...

Sets: X_train, Y_train, X_test, Y_test

Country_0  Country_1  Country_2  Age   Salary    Purchased
0          1          0          40.0  63777.78  1
1          0          0          37.0  67000.00  1
0          0          1          27.0  48000.00  1
0          0          1          38.8  52000.00  0
1          0          0          48.0  79000.00  1
0          0          1          38.0  61000.00  0
...

Steps: reordering the rows, creating the sets
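A minimal sketch of the reorder-then-split idea with pandas. The 80/20 ratio and the random seed are assumptions, and the resulting row order will not match the slide, which does not state how its split was produced. The one-hot columns are left out to keep the example short.

```python
import pandas as pd

# Imputed dataset with the target already encoded as 0/1.
data = pd.DataFrame({
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, 38.8, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, 63777.78,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
    "Purchased": [0, 1, 0, 0, 1, 1, 0, 1, 0, 1],
})

# Reorder the rows at random (fixed seed for repeatability).
shuffled = data.sample(frac=1.0, random_state=0)

# Hold out 20% of the rows as the test set (assumed ratio).
n_test = int(len(shuffled) * 0.2)
test, train = shuffled.iloc[:n_test], shuffled.iloc[n_test:]

# Split each part into features (X) and target (Y).
X_train, Y_train = train.drop(columns="Purchased"), train["Purchased"]
X_test, Y_test = test.drop(columns="Purchased"), test["Purchased"]

print(len(X_train), len(X_test))
```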
1 Data preprocessing – Scaling columns
Country_0  Country_1    Country_2    Age          Salary       Purchased
-1         2.64575131   -0.77459667  0.26306757   0.12381479   1
1          -0.37796447  -0.77459667  -0.25350148  0.46175632   1
-1         -0.37796447  1.29099445   -1.97539832  -1.53093341  1
-1         -0.37796447  1.29099445   0.05261351   -1.11141978  0
...
Sets: X_train, Y_train, X_test, Y_test
Column scaling
• Country_0
• Country_1
• Country_2
• Age
• Salary
Mean 0
Variance 1
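Scaling to mean 0 and variance 1 means subtracting each column's mean and dividing by its standard deviation. A minimal sketch with NumPy, assuming the statistics are computed on the training rows only and then reused for the test rows. The function names and the sample arrays are illustrative, not taken from the slides.

```python
import numpy as np

def fit_scaler(train):
    """Return per-column mean and population standard deviation."""
    train = np.asarray(train, dtype=float)
    return train.mean(axis=0), train.std(axis=0)  # std uses ddof=0

def transform(data, mean, std):
    """Standardize data with previously fitted statistics."""
    return (np.asarray(data, dtype=float) - mean) / std

# Illustrative training columns: a dummy flag set in 1 of 8 rows, and an age.
X_train = np.array([
    [1, 40.0], [0, 37.0], [0, 27.0], [0, 38.8],
    [0, 48.0], [0, 38.0], [0, 44.0], [0, 35.0],
])
X_test = np.array([[1, 30.0], [1, 50.0]])

# Fit on the training set, then apply the same statistics to both sets.
mean, std = fit_scaler(X_train)
X_train_scaled = transform(X_train, mean, std)
X_test_scaled = transform(X_test, mean, std)

print(X_train_scaled.round(8))
```

With one flag among eight rows, the flag column scales to 2.64575131 for the 1 and -0.37796447 for each 0, the same two values that appear under Country_1 in the table.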