I modeled my project off of the linear regression TensorFlow exercise that we did earlier in the class, since the goal of this project was to predict a continuous value (i.e., population). I created a simple DNN with three dense layers of 128, 64, and 1 neurons. The fact that this was a regression problem informed all of the decisions I made when building and compiling my model, which is why I decided to use a ReLU activation (I also knew I wanted my numerical output, the population prediction, to be a positive number, and ReLU never outputs a negative value). For my loss function I decided to use MSE, because MSE is well suited to regression problems. For my optimizer I decided to use Adam. I was initially going to use RMSprop because it can learn so quickly, but I did some reading on optimizers and it seemed like Adam combines the best parts of RMSprop and another technique (momentum), which I thought would be best for my model. I used 0.001 as my learning rate, as that is the default learning rate for Adam in the documentation. Finally, when compiling my model I decided the metrics that would best indicate how well my model was working were MAE and MSE: MAE (mean absolute error) tells me the average difference between the predicted and actual values across the whole data set, and MSE (mean squared error) tells me the average squared difference between the predicted and actual values, which penalizes large errors more heavily.
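Roughly, the model and compile step described above look like the sketch below in Keras. The image size and the Flatten step are placeholders for my input handling rather than my exact setup; the layer sizes, activation, loss, optimizer, learning rate, and metrics follow the choices discussed above.

```python
import tensorflow as tf

# A sketch of the model described above. The image size and the Flatten step
# are assumptions about the input handling; the three dense layers (128, 64, 1),
# the ReLU activation, the loss, the optimizer, the learning rate, and the
# metrics all follow the choices discussed in this write-up.
IMG_SHAPE = (128, 128, 3)  # placeholder image dimensions

model = tf.keras.Sequential([
    tf.keras.Input(shape=IMG_SHAPE),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    # ReLU on the output keeps the population prediction non-negative
    tf.keras.layers.Dense(1, activation="relu"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam's documented default
    loss="mse",              # mean squared error, standard for regression
    metrics=["mae", "mse"],  # mean absolute error and mean squared error
)
```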
On the very first run I trained my model on only 50 images, with 5 epochs, a batch size of 5, and 10 steps per epoch. The results were a loss of 4846.3262, an MAE of 55.7935, and an MSE of 4846.3262. Clearly, at the start this was not a great model; the loss was huge and so was the MAE. I slowly bumped up each argument until I was training on 1000 images, with 20 epochs, a batch size of 20, and 30 steps per epoch, and I also included a validation split of 0.2 (past this point my computer began to get upset and really slow down!). This model resulted in a loss of 228.5554, an MAE of 12.4009, a validation loss of 197.3452, and a validation MAE of 9.9374. On the test images the loss was 7131.8188 and the MAE was 75.8363. After producing some graphs, it seemed like 13 epochs would be optimal for my model. After running the same model on the same number of training images and just decreasing the number of epochs, I received a loss of 199.6139, an MAE of 12.5380, a validation loss of 388.9312, and a validation MAE of 18.3390. On the test images the loss was 4481.4771 and the MAE was 60.8062. Overall, this was clearly not a very effective model, though it was much improved from the initial run; the loss on the test images was still relatively high, as was the MAE, indicating that there is plenty of room for improvement. In the future, I think (to name just a few things) that adding more layers to the model and increasing the number of images the network trains on would help this model improve by leaps and bounds.
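The 1000-image, 20-epoch configuration corresponds roughly to a fit/evaluate call like the one below. The data arrays here are random stand-ins rather than my actual images and labels; only the hyperparameters come from the run described above.

```python
import numpy as np

# Random stand-ins for the real images and population labels; only the
# hyperparameters below are taken from the run described above.
train_images = np.random.rand(1000, 128, 128, 3).astype("float32")
train_labels = np.random.rand(1000).astype("float32") * 100.0
test_images = np.random.rand(200, 128, 128, 3).astype("float32")
test_labels = np.random.rand(200).astype("float32") * 100.0

# Depending on the TensorFlow version, steps_per_epoch may need to be dropped
# when training directly on in-memory arrays.
history = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=20,
    steps_per_epoch=30,
    validation_split=0.2,
)

test_loss, test_mae, test_mse = model.evaluate(test_images, test_labels)
print(f"test loss (MSE): {test_loss:.2f}, test MAE: {test_mae:.2f}")
```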
These were the graphs I produced after running through 20 epochs:
Later, I received better results when I decreased my epochs to 14. These were the graphs from that run:
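Both sets of graphs came from the History object returned by model.fit; roughly, the plotting code looks like the sketch below (the variable names assume the earlier snippets).

```python
import matplotlib.pyplot as plt

# Plot the training/validation curves from the History object returned by
# model.fit (variable names follow the earlier snippets).
hist = history.history

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(hist["loss"], label="training loss (MSE)")
ax1.plot(hist["val_loss"], label="validation loss (MSE)")
ax1.set_xlabel("epoch")
ax1.set_ylabel("loss")
ax1.legend()

ax2.plot(hist["mae"], label="training MAE")
ax2.plot(hist["val_mae"], label="validation MAE")
ax2.set_xlabel("epoch")
ax2.set_ylabel("MAE")
ax2.legend()

plt.tight_layout()
plt.show()
```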
On a more personal note, this was a very useful project for me. At first I was overwhelmed by the openness of the instructions, but in the end working on this project made me realize that while I understood each individual component of a model while doing the TensorFlow exercises, I did not understand how all of the components fit together or what the metrics my models were producing really meant. This project forced me to go back, connect all of the dots, and dig a little deeper, giving me a more solid understanding of what exactly I am building, which can only help in the long run.