302 Generating Probabilities
302 Generating Probabilities¶
Brainome’s Random Forest and Neural Network predictors can output the probability of each class in addition to the predicted class label.
Prerequisites¶
This notebook assumes brainome is installed as described in the notebook brainome_101_Quick_Start.
The data sets are:
titanic_train.csv for training data
titanic_predict.csv for predictions
Predictors require numpy to run. scipy is optional and only needed to generate a confusion matrix.
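Before running a predictor, it can help to confirm that these dependencies are available. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of Brainome.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


required = ['numpy']  # needed by every predictor
optional = ['scipy']  # only needed for the confusion matrix

print('missing required:', missing_modules(required))
print('missing optional:', missing_modules(optional))
```

An empty list means every module in that group can be imported in the current environment.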
!python3 -m pip install brainome --quiet
!brainome --version
import urllib.request as request
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_predict.csv
brainome v1.8-120-prod
-rw-r--r-- 1 runner docker 858 Mar 12 21:09 titanic_predict.csv
Generate a predictor¶
The predictor filename is predictor_302.py
!brainome https://download.brainome.ai/data/public/titanic_train.csv -y -o predictor_302.py -modelonly -q
print('\nCreated predictor_302.py')
!ls -lh predictor_302.py
Created predictor_302.py
-rw-r--r-- 1 runner docker 35K Mar 12 21:09 predictor_302.py
Generating classification probabilities for a data set¶
When predict is called with return_probabilities=True, it returns the probability of each class for every row instead of a single predicted class.
# using pandas to read csv data
%pip install pandas
import pandas as pd
import predictor_302 as predictor
# reading csv file
predict_data = pd.read_csv('titanic_predict.csv', na_values=[], na_filter=False)
# REQUIRED: strip the headers from dataset
predict_values = predict_data.values
probabilities_output = predictor.predict(predict_values, return_probabilities=True)
print(' Prediction Probabilities '.center(80, '-'))
print(probabilities_output)
--------------------------- Prediction Probabilities ---------------------------
[['died' 'survived']
['0.10347254522374348' '0.8965274547762565']
['0.8357493590851747' '0.16425064091482533']
['0.5723091317715159' '0.4276908682284841']
['0.8357493590851747' '0.16425064091482533']
['0.8833638515749265' '0.11663614842507353']
['0.8204901964380503' '0.17950980356194968']
['0.8215620538923634' '0.17843794610763664']
['0.10347254522374348' '0.8965274547762565']
['0.42322660063987716' '0.5767733993601228']
['0.7495210975955352' '0.2504789024044648']
['0.8833638515749265' '0.11663614842507353']]
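As printed above, the returned array holds the class labels in its first row and the probabilities as strings in the rows below. A minimal sketch, assuming that layout, converts the array into a numeric DataFrame and picks the most likely class per row. The helper name `probabilities_to_frame` is hypothetical and not part of Brainome; the sample values are the first three rows of the output above.

```python
import numpy as np
import pandas as pd


def probabilities_to_frame(probabilities_output):
    """Convert a label-row-plus-string-values array into a float DataFrame.

    Assumes row 0 holds the class labels and every following row holds
    one probability per class, as in the printed output above.
    """
    output = np.asarray(probabilities_output)
    labels = [str(label) for label in output[0]]
    values = output[1:].astype(float)
    return pd.DataFrame(values, columns=labels)


# First three rows of the output printed above, used as sample data.
sample_output = np.array([
    ['died', 'survived'],
    ['0.10347254522374348', '0.8965274547762565'],
    ['0.8357493590851747', '0.16425064091482533'],
    ['0.5723091317715159', '0.4276908682284841'],
])

probabilities = probabilities_to_frame(sample_output)
# Each row should sum to 1 across the classes.
row_sums = probabilities.sum(axis=1)
# The most likely class is the column with the highest probability.
predicted_class = probabilities.idxmax(axis=1)
print(probabilities.assign(predicted=predicted_class))
```

Working with floats rather than strings also makes it straightforward to apply a custom decision threshold instead of always taking the most likely class.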
Combining probabilities into the source data set¶
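import numpy as np
# header row of the source data as a 1 x N array
predict_header = predict_data.columns.values.reshape(1, -1)
# stack the header on top of the data rows, then append the probability columns
source_with_header = np.concatenate((predict_header, predict_data.values))
full_output = np.concatenate((source_with_header, probabilities_output), axis=1)
pd.DataFrame(full_output)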
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PassengerId | Cabin_Class | Name | Sex | Age | Sibling_Spouse | Parent_Children | Ticket_Number | Fare | Cabin_Number | Port_of_Embarkation | died | survived |
| 1 | 881 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25 | 0 | 1 | 230433 | 26.0 | | S | 0.10347254522374348 | 0.8965274547762565 |
| 2 | 882 | 3 | Markun, Mr. Johann | male | 33 | 0 | 0 | 349257 | 7.8958 | | S | 0.8357493590851747 | 0.16425064091482533 |
| 3 | 883 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22 | 0 | 0 | 7552 | 10.5167 | | S | 0.5723091317715159 | 0.4276908682284841 |
| 4 | 884 | 2 | Banfield, Mr. Frederick James | male | 28 | 0 | 0 | C.A./SOTON 34068 | 10.5 | | S | 0.8357493590851747 | 0.16425064091482533 |
| 5 | 885 | 3 | Sutehall, Mr. Henry Jr | male | 25 | 0 | 0 | SOTON/OQ 392076 | 7.05 | | S | 0.8833638515749265 | 0.11663614842507353 |
| 6 | 886 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39 | 0 | 5 | 382652 | 29.125 | | Q | 0.8204901964380503 | 0.17950980356194968 |
| 7 | 887 | 2 | Montvila, Rev. Juozas | male | 27 | 0 | 0 | 211536 | 13.0 | | S | 0.8215620538923634 | 0.17843794610763664 |
| 8 | 888 | 1 | Graham, Miss. Margaret Edith | female | 19 | 0 | 0 | 112053 | 30.0 | B42 | S | 0.10347254522374348 | 0.8965274547762565 |
| 9 | 889 | 3 | Johnston, Miss. Catherine Helen Carrie"" | female | | 1 | 2 | W./C. 6607 | 23.45 | | S | 0.42322660063987716 | 0.5767733993601228 |
| 10 | 890 | 1 | Behr, Mr. Karl Howell | male | 26 | 0 | 0 | 111369 | 30.0 | C148 | C | 0.7495210975955352 | 0.2504789024044648 |
| 11 | 891 | 3 | Dooley, Mr. Patrick | male | 32 | 0 | 0 | 370376 | 7.75 | | Q | 0.8833638515749265 | 0.11663614842507353 |
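The NumPy concatenation above yields numbered columns, with the original column names stored as the first data row. As an alternative, here is a minimal sketch that keeps the column names and numeric probabilities by using `pd.concat`. It assumes the probability array has the layout shown earlier (labels in row 0), and it uses two passengers with a subset of columns from the table above as sample data.

```python
import numpy as np
import pandas as pd

# Two passengers copied from the table above (subset of columns) as sample data.
predict_data = pd.DataFrame({
    'PassengerId': [881, 882],
    'Sex': ['female', 'male'],
    'Fare': [26.0, 7.8958],
})
probabilities_output = np.array([
    ['died', 'survived'],
    ['0.10347254522374348', '0.8965274547762565'],
    ['0.8357493590851747', '0.16425064091482533'],
])

# Row 0 of the output is assumed to hold the class labels.
probability_frame = pd.DataFrame(
    probabilities_output[1:].astype(float),
    columns=probabilities_output[0].tolist(),
    index=predict_data.index,  # align rows with the source data
)

# Append the probability columns to the right of the source columns.
combined = pd.concat([predict_data, probability_frame], axis=1)
print(combined)
```

Reusing the index of `predict_data` keeps each probability row aligned with its source row even if the source frame has a non-default index.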
Next Steps¶
Check out 303 Predictor Json Measurements
Check out 300 Put your model to work