I used the code "train_y = train.pop('Species')" to split the labels from the training set, since the labels are the species of the flowers.
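As a rough sketch (the file name here is just illustrative), that label split looks like this, assuming the training set was loaded into a pandas DataFrame:

```python
import pandas as pd

# Assumed file name; pop() removes the Species column and returns it,
# leaving only the measurement features in "train".
train = pd.read_csv("iris_training.csv")
train_y = train.pop("Species")
```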
The input functions are how you pull in the data that you later use for training, evaluating, and making predictions. Feature columns describe each of the features that you want to use in a model; by defining your feature columns you are essentially creating objects that tell the model how to use the raw data from the features dictionary that the input function creates.
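A minimal sketch of those two pieces, loosely following the TensorFlow premade-estimator tutorial (the batch size and shuffle buffer here are illustrative):

```python
import tensorflow as tf

def input_fn(features, labels, training=True, batch_size=256):
    """Converts the pandas features/labels into a tf.data.Dataset the estimator can consume."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if training:
        # Shuffle and repeat only while training.
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

# One numeric feature column per key in the features dictionary tells the
# model how to interpret each raw input.
my_feature_columns = [tf.feature_column.numeric_column(key=key)
                      for key in train.keys()]
```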
A classifier is an algorithm that implements a classification. For this exercise I used the DNNClassifier, which is good for deep models that perform multi-class classification. I defined the classifier by assigning it to a variable ("classifier") and calling the DNNClassifier constructor on tf.estimator. I then specified the feature columns I wanted it to use, the hidden layers I wanted it to include (two layers, with 30 and 10 nodes), and finally how many classes the model had to choose between (3). To then train the model using the classifier I used the classifier.train() command, which fits the model to the labeled training data so that it learns to assign the correct label to each data point. To actually use the classifier.train() command, I passed in the earlier input function that pulled in the training data as my argument so that the model was trained on the training data.
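Putting those pieces together, the classifier definition and training call look roughly like this (the number of training steps is an assumption):

```python
# Two hidden layers with 30 and 10 nodes, choosing between 3 species.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 10],
    n_classes=3)

# The lambda wraps the input function so the estimator can call it with no arguments.
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)
```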
One: LinearClassifier
Accuracy = 97%
Prediction is "Setosa" (99.2%), expected "Setosa"
Prediction is "Versicolor" (97.4%), expected "Versicolor"
Prediction is "Virginica" (95.6%), expected "Virginica"

Two: DNNLinearCombinedClassifier
Accuracy = 73%
Prediction is "Setosa" (77.4%), expected "Setosa"
Prediction is "Virginica" (45.9%), expected "Versicolor"
Prediction is "Virginica" (63.1%), expected "Virginica"
Three: DNNClassifier
Accuracy = 70%
Prediction is "Setosa" (87.4%), expected "Setosa"
Prediction is "Versicolor" (49.9%), expected "Versicolor"
Prediction is "Virginica" (61.0%), expected "Virginica"
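For reference, numbers like these come from the estimator's evaluate() and predict() calls; a hedged sketch, assuming a test set ("test"/"test_y") loaded the same way as the training data:

```python
# Accuracy on the held-out test set ("test"/"test_y" are assumed names).
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))
print('Test set accuracy: {accuracy:0.3f}'.format(**eval_result))

# For predictions, an input function that yields only features is enough.
def pred_input_fn(features, batch_size=256):
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

for pred in classifier.predict(input_fn=lambda: pred_input_fn(test)):
    class_id = pred['class_ids'][0]
    print('Prediction: class {} ({:.1%})'.format(
        class_id, pred['probabilities'][class_id]))
```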
As you can see, the area under the curve in the tile of the pairplot that shows age (the upper left-most corner) mimics the shape of the histogram showing age. At this point it appears that a majority of the people on the ship were in their 20s/30s, as that is where the curve is highest and thus has the most area under it. The area under the curve indicates how many observations were seen at that value, so the value where the curve is highest and the area under it is greatest is the value that is most likely to occur in the dataset.
A categorical column is a column with a limited, fixed set of possible categories/values, so that each data point is assigned to one group or another. A dense feature, on the other hand, essentially turns a categorical feature column into an array where every category except the correct one for that data point is represented by a 0 and only the correct category is represented by a 1 (a dense feature identifies the location of the data point's category/value in the array and fills the other locations with 0s).
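A small sketch of that one-hot idea using the Titanic 'class' column (the vocabulary values are the ones that appear in the dataset):

```python
import tensorflow as tf

# A categorical column with a fixed vocabulary of three possible values.
class_column = tf.feature_column.categorical_column_with_vocabulary_list(
    'class', ['First', 'Second', 'Third'])

# Wrapping it in an indicator column and a DenseFeatures layer turns each
# value into a one-hot array: a 1 in the matching slot, 0s everywhere else.
dense_layer = tf.keras.layers.DenseFeatures(
    [tf.feature_column.indicator_column(class_column)])
print(dense_layer({'class': tf.constant([['Third']])}).numpy())  # [[0. 0. 1.]]
```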
The feature columns that have been put into my LinearClassifier are made up of seven categorical columns ('sex', 'n_siblings_spouses', 'parch', 'class', 'deck', 'embark_town', 'alone') and two numeric columns ('age', 'fare'). My initial output gave me an accuracy of about 70%, which is a bit low, and this might be because the model is assessing each column individually. By adding a crossed feature column I can combine two highly correlated columns into one column, which allows me to capture all of the possible combinations between the two variables and might increase the accuracy of my model; essentially, crossed feature columns allow the model to learn the differences between different feature combinations.
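One way to build such a crossed feature column is to bucketize age and cross it with sex; this is a sketch rather than the exact columns I used, and the bucket boundaries, hash size, and the "feature_columns" list name are assumptions:

```python
# Bucketize the numeric age column so it can be crossed with a categorical column.
age = tf.feature_column.numeric_column('age')
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 50, 65])
sex = tf.feature_column.categorical_column_with_vocabulary_list(
    'sex', ['male', 'female'])

# The crossed column captures every (age bucket, sex) combination.
age_x_gender = tf.feature_column.crossed_column(
    [age_buckets, sex], hash_bucket_size=100)

# Rebuild the LinearClassifier with the derived column appended to the
# base feature columns ("feature_columns" is the list described above).
linear_est = tf.estimator.LinearClassifier(
    feature_columns=feature_columns + [age_x_gender])
```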
Using the derived feature column with the combination of age and gender did improve my model: the accuracy increased a bit, from 74% to 78%. Looking at the plots, you can see from the ROC curve that the curve pulls well away from the 45-degree diagonal (it gets fairly close to the upper left-hand corner), so the model has a fairly high true positive rate. From the predicted probabilities plot you can see that the model most often predicted a very low chance of survival (the data is skewed to the left; a .2 chance of survival was predicted about 40% of the time), but the model also fairly often predicted a very high chance of survival, though less often than it predicted a low chance (values between .8 and 1 were predicted at about 25% frequency).
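Both plots can be produced from the estimator's predicted probabilities; a sketch, assuming the evaluation labels are in "y_eval" and an "eval_input_fn" was built the same way as the training input function:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Probability of the positive class (survived) for every evaluation example.
pred_dicts = list(linear_est.predict(input_fn=eval_input_fn))
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])

# Histogram of predicted survival probabilities.
probs.plot(kind='hist', bins=20, title='predicted probabilities')
plt.show()

# ROC curve: true positive rate against false positive rate at every threshold.
fpr, tpr, _ = roc_curve(y_eval, probs)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.xlim(0,)
plt.ylim(0,)
plt.show()
```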