I used the code "train_y = train.pop('Species')" to split the labels from the training set, since the labels are the species of the flowers.
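As a rough sketch (the file name here is just illustrative), that label split looks like this, assuming the training set was loaded into a pandas DataFrame:

```python
import pandas as pd

# Assumed file name; pop() removes the Species column and returns it,
# leaving only the measurement features in "train".
train = pd.read_csv("iris_training.csv")
train_y = train.pop("Species")
```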
The input functions are how you pull in the data that you later use for training, evaluating, and making predictions. Feature columns describe each of the features that you want to use in a model; by defining your feature columns you are essentially creating objects that tell the model how to use the raw data from the features dictionary that the input function creates.
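A minimal sketch of those two pieces, loosely following the TensorFlow premade-estimator tutorial (the batch size and shuffle buffer here are illustrative):

```python
import tensorflow as tf

def input_fn(features, labels, training=True, batch_size=256):
    """Converts the pandas features/labels into a tf.data.Dataset the estimator can consume."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if training:
        # Shuffle and repeat only while training.
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

# One numeric feature column per key in the features dictionary tells the
# model how to interpret each raw input.
my_feature_columns = [tf.feature_column.numeric_column(key=key)
                      for key in train.keys()]
```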
A classifier is an algorithm that implements a classification. For this exercise I used the DNNClassifier, which is good for deep models that perform multi-class classification. I defined the classifier by assigning it to a variable ("classifier") and calling the DNNClassifier constructor on tf.estimator. I then specified the feature columns I wanted it to use, the hidden layers I wanted it to include (two layers, with 30 and 10 nodes), and finally how many classes the model had to choose between (3). To then train the model using the classifier I used the classifier.train() command, which fits the model to the labeled training data so that it learns to assign the correct label to each data point. To actually use the classifier.train() command, I passed in the earlier input function that pulled in the training data as my argument so that the model was trained on the training data.
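Putting those pieces together, the classifier definition and training call look roughly like this (the number of training steps is an assumption):

```python
# Two hidden layers with 30 and 10 nodes, choosing between 3 species.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 10],
    n_classes=3)

# The lambda wraps the input function so the estimator can call it with no arguments.
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)
```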
One: LinearClassifier
Accuracy = 97%
Prediction is "Setosa" (99.2%), expected "Setosa"
Prediction is "Versicolor" (97.4%), expected "Versicolor"
Prediction is "Virginica" (95.6%), expected "Virginica"

Two: DNNLinearCombinedClassifier
Accuracy = 73%
Prediction is "Setosa" (77.4%), expected "Setosa"
Prediction is "Virginica" (45.9%), expected "Versicolor"
Prediction is "Virginica" (63.1%), expected "Virginica"
Three: DNNClassifier
Accuracy = 70%
Prediction is "Setosa" (87.4%), expected "Setosa"
Prediction is "Versicolor" (49.9%), expected "Versicolor"
Prediction is "Virginica" (61.0%), expected "Virginica"
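For reference, numbers like these come from the estimator's evaluate() and predict() calls; a hedged sketch, assuming a test set ("test"/"test_y") loaded the same way as the training data:

```python
# Accuracy on the held-out test set ("test"/"test_y" are assumed names).
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))
print('Test set accuracy: {accuracy:0.3f}'.format(**eval_result))

# For predictions, an input function that yields only features is enough.
def pred_input_fn(features, batch_size=256):
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

for pred in classifier.predict(input_fn=lambda: pred_input_fn(test)):
    class_id = pred['class_ids'][0]
    print('Prediction: class {} ({:.1%})'.format(
        class_id, pred['probabilities'][class_id]))
```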
As you can see, the area under the curve in the tile of the pairplot that shows age (the upper left-most corner) mimics the shape of the histogram showing age. At this point it appears that a majority of the people on the ship were in their 20s/30s, as that is where the curve is highest and thus has the most area under it. The area under the curve indicates how many observations were seen at that value, so the value where the curve is highest and the area under it is greatest is the value that is most likely to occur in the dataset.
A categorical column is a column with a limited, fixed set of possible categories/values, so that each data point is assigned to one group or another. A dense feature, on the other hand, essentially turns a categorical feature column into an array where every category except the correct one for that data point is represented by a 0 and only the correct category is represented by a 1 (a dense feature identifies the location of the data point's category/value in the array and fills the other locations with 0s).
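A small sketch of that one-hot idea using the Titanic 'class' column (the vocabulary values are the ones that appear in the dataset):

```python
import tensorflow as tf

# A categorical column with a fixed vocabulary of three possible values.
class_column = tf.feature_column.categorical_column_with_vocabulary_list(
    'class', ['First', 'Second', 'Third'])

# Wrapping it in an indicator column and a DenseFeatures layer turns each
# value into a one-hot array: a 1 in the matching slot, 0s everywhere else.
dense_layer = tf.keras.layers.DenseFeatures(
    [tf.feature_column.indicator_column(class_column)])
print(dense_layer({'class': tf.constant([['Third']])}).numpy())  # [[0. 0. 1.]]
```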
The feature columns that have been put into my LinearClassifier are made up of seven categorical columns ('sex', 'n_siblings_spouses', 'parch', 'class', 'deck', 'embark_town', 'alone') and two numeric columns ('age', 'fare'). My initial output gave me an accuracy of about 70%, which is a bit low, and this might be because the model is assessing each column individually. By adding a crossed feature column I can combine two highly correlated columns into one column, which allows me to capture all of the possible combinations between the two variables and might increase the accuracy of my model; essentially, crossed feature columns allow the model to learn the differences between different feature combinations.
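One way to build such a crossed feature column is to bucketize age and cross it with sex; this is a sketch rather than the exact columns I used, and the bucket boundaries, hash size, and the "feature_columns" list name are assumptions:

```python
# Bucketize the numeric age column so it can be crossed with a categorical column.
age = tf.feature_column.numeric_column('age')
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 50, 65])
sex = tf.feature_column.categorical_column_with_vocabulary_list(
    'sex', ['male', 'female'])

# The crossed column captures every (age bucket, sex) combination.
age_x_gender = tf.feature_column.crossed_column(
    [age_buckets, sex], hash_bucket_size=100)

# Rebuild the LinearClassifier with the derived column appended to the
# base feature columns ("feature_columns" is the list described above).
linear_est = tf.estimator.LinearClassifier(
    feature_columns=feature_columns + [age_x_gender])
```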
Using the derived feature column with the combination of age and gender did improve my model: the accuracy increased a bit, from 74% to 78%. Looking at the plots, you can see from the ROC curve that the curve pulls well away from the 45-degree diagonal (it gets fairly close to the upper left-hand corner), so the model has a fairly high true positive rate. From the predicted probabilities plot you can see that the model most often predicted a very low chance of survival (the data is skewed to the left; a .2 chance of survival was predicted about 40% of the time), but the model also fairly often predicted a very high chance of survival, though less often than it predicted a low chance (values between .8 and 1 were predicted at about 25% frequency).
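Both plots can be produced from the estimator's predicted probabilities; a sketch, assuming the evaluation labels are in "y_eval" and an "eval_input_fn" was built the same way as the training input function:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Probability of the positive class (survived) for every evaluation example.
pred_dicts = list(linear_est.predict(input_fn=eval_input_fn))
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])

# Histogram of predicted survival probabilities.
probs.plot(kind='hist', bins=20, title='predicted probabilities')
plt.show()

# ROC curve: true positive rate against false positive rate at every threshold.
fpr, tpr, _ = roc_curve(y_eval, probs)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.xlim(0,)
plt.ylim(0,)
plt.show()
```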