302 Generating Probabilities

Brainome’s Random Forest and Neural Network predictors can return the probability of each class in addition to a single predicted class.

Prerequisites

This notebook assumes brainome is installed as described in the notebook brainome_101_Quick_Start.

The data sets are:

titanic_train.csv for training data
titanic_predict.csv for predictions

Predictors require numpy to run and optionally scipy to generate a confusion matrix.
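
As a quick sanity check before running a predictor, the snippet below reports which of the packages named above are not importable in the current environment. It is only a sketch using the Python standard library; the helper name `missing_packages` is hypothetical and is not part of brainome.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# numpy is required by the predictor; scipy is only needed for the confusion matrix.
required = missing_packages(["numpy"])
optional = missing_packages(["scipy"])

if required:
    print("Missing required packages:", ", ".join(required))
if optional:
    print("Missing optional packages:", ", ".join(optional))
if not required and not optional:
    print("numpy and scipy are both available.")
```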

!python3 -m pip install brainome --quiet
!brainome --version

import urllib.request as request
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_predict.csv

brainome v1.8-120-prod

-rw-r--r-- 1 runner docker 858 Mar 12 21:09 titanic_predict.csv

Generate a predictor

The predictor filename is predictor_302.py

!brainome https://download.brainome.ai/data/public/titanic_train.csv -y -o predictor_302.py -modelonly -q
print('\nCreated predictor_302.py')
!ls -lh predictor_302.py

Created predictor_302.py
-rw-r--r-- 1 runner docker 35K Mar 12 21:09 predictor_302.py

Generating classification probabilities for a data set

Passing return_probabilities=True to the predictor's predict function returns the probability of each class for every row, rather than a single predicted class.

# pandas is used to read the csv data
%pip install pandas
import pandas as pd
import predictor_302 as predictor
# read the csv file; na_filter=False keeps empty cells as empty strings instead of NaN
predict_data = pd.read_csv('titanic_predict.csv', na_values=[], na_filter=False)
# REQUIRED: pass only the data rows; .values drops the header row
predict_values = predict_data.values
# the first row of the result holds the class names, the remaining rows the probabilities
probabilities_output = predictor.predict(predict_values, return_probabilities=True)
print(' Prediction Probabilities '.center(80, '-'))
print(probabilities_output)

--------------------------- Prediction Probabilities ---------------------------
[['died' 'survived']
 ['0.10347254522374348' '0.8965274547762565']
 ['0.8357493590851747' '0.16425064091482533']
 ['0.5723091317715159' '0.4276908682284841']
 ['0.8357493590851747' '0.16425064091482533']
 ['0.8833638515749265' '0.11663614842507353']
 ['0.8204901964380503' '0.17950980356194968']
 ['0.8215620538923634' '0.17843794610763664']
 ['0.10347254522374348' '0.8965274547762565']
 ['0.42322660063987716' '0.5767733993601228']
 ['0.7495210975955352' '0.2504789024044648']
 ['0.8833638515749265' '0.11663614842507353']]
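
To turn the probabilities back into a single label per row, one option is to pick the class with the highest probability. The sketch below assumes the layout shown in the output above: a first row of class names followed by rows of probabilities stored as strings. The helper `most_likely_classes` is hypothetical and is not part of the generated predictor; it works on a nested list as well as on the numpy array returned by the notebook.

```python
def most_likely_classes(probabilities_output):
    """Return (class_name, probability) for the most probable class of each row."""
    # First row holds the class names, e.g. 'died' and 'survived'.
    class_names = [str(name) for name in probabilities_output[0]]
    results = []
    for row in probabilities_output[1:]:
        # Probabilities arrive as strings, so convert them before comparing.
        probs = [float(p) for p in row]
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((class_names[best], probs[best]))
    return results

# Sample copied from the first three rows of the output above.
sample = [
    ["died", "survived"],
    ["0.10347254522374348", "0.8965274547762565"],
    ["0.8357493590851747", "0.16425064091482533"],
    ["0.5723091317715159", "0.4276908682284841"],
]

for label, probability in most_likely_classes(sample):
    print(f"{label}: {probability:.3f}")
```

Rows whose top probability is close to 0.5, such as the third sample row, are the ones where a plain class prediction hides the most uncertainty.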

Combining probabilities into the source data set

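import numpy as np
# column names of the source data, reshaped into a single header row
predict_header = predict_data.columns.values.reshape(1, -1)
# stack the header row on top of the data rows
data_with_header = np.concatenate((predict_header, predict_data.values))
# append the probability columns (class-name row included) on the right
full_output = np.concatenate((data_with_header, probabilities_output), axis=1)
pd.DataFrame(full_output)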

	0	1	2	3	4	5	6	7	8	9	10	11	12
0	PassengerId	Cabin_Class	Name	Sex	Age	Sibling_Spouse	Parent_Children	Ticket_Number	Fare	Cabin_Number	Port_of_Embarkation	died	survived
1	881	2	Shelley, Mrs. William (Imanita Parrish Hall)	female	25	0	1	230433	26.0		S	0.10347254522374348	0.8965274547762565
2	882	3	Markun, Mr. Johann	male	33	0	0	349257	7.8958		S	0.8357493590851747	0.16425064091482533
3	883	3	Dahlberg, Miss. Gerda Ulrika	female	22	0	0	7552	10.5167		S	0.5723091317715159	0.4276908682284841
4	884	2	Banfield, Mr. Frederick James	male	28	0	0	C.A./SOTON 34068	10.5		S	0.8357493590851747	0.16425064091482533
5	885	3	Sutehall, Mr. Henry Jr	male	25	0	0	SOTON/OQ 392076	7.05		S	0.8833638515749265	0.11663614842507353
6	886	3	Rice, Mrs. William (Margaret Norton)	female	39	0	5	382652	29.125		Q	0.8204901964380503	0.17950980356194968
7	887	2	Montvila, Rev. Juozas	male	27	0	0	211536	13.0		S	0.8215620538923634	0.17843794610763664
8	888	1	Graham, Miss. Margaret Edith	female	19	0	0	112053	30.0	B42	S	0.10347254522374348	0.8965274547762565
9	889	3	Johnston, Miss. Catherine Helen Carrie""	female		1	2	W./C. 6607	23.45		S	0.42322660063987716	0.5767733993601228
10	890	1	Behr, Mr. Karl Howell	male	26	0	0	111369	30.0	C148	C	0.7495210975955352	0.2504789024044648
11	891	3	Dooley, Mr. Patrick	male	32	0	0	370376	7.75		Q	0.8833638515749265	0.11663614842507353
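
In the table above the column names end up as the first data row and every value is stored as a string. An alternative sketch, assuming the same output layout (class names in the first row, string probabilities below), builds the probability columns as a DataFrame and joins them to the source data so that headers and numeric types are preserved. The helper `attach_probabilities` and the two-row sample data are hypothetical stand-ins for `predict_data` and `probabilities_output`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for predict_data (the notebook reads titanic_predict.csv).
predict_data = pd.DataFrame({
    "PassengerId": [881, 882],
    "Sex": ["female", "male"],
})

# Stand-in for probabilities_output: class names first, then one row per sample.
probabilities_output = np.array([
    ["died", "survived"],
    ["0.10347254522374348", "0.8965274547762565"],
    ["0.8357493590851747", "0.16425064091482533"],
])

def attach_probabilities(data, probabilities):
    """Append one float column per class to a copy of the source data."""
    class_names = [str(name) for name in probabilities[0]]
    # Convert the string probabilities to floats.
    values = np.asarray(probabilities[1:]).astype(float)
    prob_frame = pd.DataFrame(values, columns=class_names, index=data.index)
    return pd.concat([data, prob_frame], axis=1)

combined = attach_probabilities(predict_data, probabilities_output)
print(combined)
```

With the columns kept numeric, the result can be filtered or sorted directly, for example `combined.sort_values("survived", ascending=False)`.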

Next Steps

Check out 303 Predictor Json Measurements
Check out 300 Put your model to work
