Monday, October 1, 2018

Week 7 Reloaded - Fine-tuning VGG16 model for CIFAR-10 dataset

Week 7 is the last week of this deep-learning plan, and I didn’t want to finish before trying a bit more to get better results. So here we go again.

This time I learned two things: Google Colaboratory and the Lambda layer. Those two helped me refactor my code into two notebooks and get better results.

Google Colaboratory

Because my computer was too slow to process such a dataset, I didn’t have the luxury of experimenting with models and hyperparameters. So one obvious step was to look for an online solution that could run my Python code at an acceptable speed. Google Colaboratory was a nice, free solution that fit my needs.

If you haven’t used it before, here is a nice tutorial.

This processing power allowed me to rerun a previous notebook that trains my own CNN on the CIFAR-10 dataset, but for 100 epochs instead of only 15. With the longer run, my model reached an accuracy of 82.68% (compared to 74.53% with 15 epochs).

You can find the whole Python notebook for my own model on GitHub here.

Fine-tuning VGG16 model

Now that I had Google Colaboratory, I could try a large number of changes to hyperparameters and layer customizations. After a couple of days of trying, I reached a maximum test accuracy of 73.33% (compared to 68.03% with less tinkering).

This was the maximum test accuracy I could get. Beyond that point, the model was just overfitting: training accuracy climbed past 90% with no improvement in test accuracy. I could have gone further and added normalization and dropout layers, but that was enough for me for this part of my homework.

I also solved my earlier problem of not having enough memory to resize all the data, which had forced me to use only part of it. The fix was to put a Lambda layer at the input of the model and use it to resize every batch on the fly. Another option was ImageDataGenerator, but I didn’t need it for now.
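For readers who haven’t met the Lambda layer, here is a minimal sketch of the idea, which is not necessarily the exact code in the notebook. It assumes TensorFlow 2.x with its bundled Keras, uses tf.image.resize to do the resizing, and picks 48px as the target because that was the minimum size VGG16 accepted in my earlier attempt. The helper name build_resizer is made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

TARGET_SIZE = (48, 48)  # assumption: the minimum size VGG16 accepted in this post


def build_resizer(target_size=TARGET_SIZE):
    """Hypothetical helper: a tiny model that only resizes its input batch."""
    inputs = layers.Input(shape=(32, 32, 3))  # CIFAR-10 image shape
    # The Lambda layer wraps a plain function, so resizing happens per batch
    # inside the model instead of on the whole dataset up front.
    resized = layers.Lambda(
        lambda x: tf.image.resize(x, target_size), name="resize"
    )(inputs)
    return models.Model(inputs, resized)


resizer = build_resizer()
batch = np.zeros((4, 32, 32, 3), dtype="float32")  # fake batch of 4 images
print(resizer.predict(batch, verbose=0).shape)  # (4, 48, 48, 3)
```

Because only one batch is enlarged at a time, the full resized dataset never has to sit in memory. In a real model, the layers that follow would take the Lambda output as their input.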

You can find the whole Python notebook for a customization of VGG16 on GitHub here.

What’s next

I decided it was not worth spending more time tinkering with an existing model to reach a result that I could reach more easily with my own model. I had already achieved the purpose of this homework, which was to recycle an existing model and adapt it to my needs. Now it was time to move on.

Next is to find a real-world problem and work on it with real-world data to achieve an acceptable solution. And this new solution should not only be a quick and dirty bunch of python files, but rather a proper project with coding standards and a user-friendly interface. This way, I can learn even more and sharpen my skills.



Sunday, September 23, 2018

Week 7 - Transfer learning with VGG16 model and CIFAR-10 dataset

Here I come to the 7th and final week of the initial plan to start working with deep learning. This week’s assignment was to use an existing CNN to learn the CIFAR-10 dataset. The VGG16 model was chosen for me to try. It was quite an interesting assignment, and it surfaced some problems I will probably face again in the future.

Let’s talk about what I did this week.

You can find the whole Python notebook on GitHub here.

My images are small


This was the first problem I faced. According to the Keras documentation, the model takes a minimum input size of 32px. But when I actually tried it, the minimum was 48px, and my dataset had images of size 32px. So I had to resize the images. This turned out to take more time than expected. There were a few options for resizing such data, and it took me quite some time to read the documentation of each one and try it before I found a suitable and less complicated solution.


My computer is a humble one

Not only did I have to resize the images (which was the easy part while they were still integers), but I also had to convert the data type to float32 and normalize the data. After the conversion to float32, the resizing task was too much for my computer. That’s why, after many dead IPython kernels, I ended up using only one third of the training data and converting the data in batches.

The error was always:

Allocation of X exceeds 10% of system memory.

This was not even a large dataset, so I should start learning about more memory-efficient ways of processing such data.
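To see why a "small" dataset can still hurt, here is a rough sketch of the arithmetic and of the batch conversion idea. It is an illustration, not the notebook code: the function names are made up, the 50,000 figure is the size of the CIFAR-10 training set, and dividing by 255 is only my assumption about what "normalize" means here.

```python
import numpy as np


def float32_megabytes(n_images, side, channels=3):
    """Size in MB (10^6 bytes) of a float32 image array; 4 bytes per value."""
    return n_images * side * side * channels * 4 / 1e6


def normalized_batches(images_uint8, batch_size=1000):
    """Yield float32 batches scaled to [0, 1], one slice at a time."""
    for start in range(0, len(images_uint8), batch_size):
        batch = images_uint8[start:start + batch_size]
        # Assumed normalization: scale pixel values from 0..255 to 0..1.
        yield batch.astype(np.float32) / 255.0


print(float32_megabytes(50000, 32))  # 614.4
print(float32_megabytes(50000, 48))  # 1382.4
```

So the full training set at 48px in float32 needs about 1.4 GB for a single copy, and resizing or normalizing can easily create extra temporary copies on top of that. Converting one batch at a time keeps only a small slice in float32 at any moment.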

How to reuse the VGG16 model?

Now, getting to the step of reusing the model, I faced the question of how to reuse it. There are four options, depending on how much data I have and how similar it is to the data the model was originally trained on. You can learn more about the options here: http://cs231n.github.io/transfer-learning/.

For my case, I assumed that the CIFAR-10 data is similar enough to the ImageNet data used for training the VGG16 model. So I used the model as a fixed feature extractor and I only added the fully connected layers at the end to classify 10 classes instead of 1000.
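The "fixed feature extractor" setup can be sketched roughly like this. It is a minimal sketch assuming TensorFlow 2.x with its bundled Keras and 48px inputs. The helper name, the hidden layer size of 256, the Adam optimizer, and the one-hot label format implied by the loss are all assumptions for illustration, not the settings from my notebook.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_feature_extractor_model(weights="imagenet", num_classes=10):
    """Hypothetical helper: frozen VGG16 base plus a new classifier head."""
    # include_top=False drops the original 1000-class dense layers.
    base = VGG16(weights=weights, include_top=False, input_shape=(48, 48, 3))
    base.trainable = False  # fixed feature extractor: VGG16 weights stay as they are

    inputs = layers.Input(shape=(48, 48, 3))
    features = base(inputs, training=False)
    x = layers.Flatten()(features)
    x = layers.Dense(256, activation="relu")(x)  # assumed hidden size
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    # Assumed training setup; the loss expects one-hot encoded labels.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling build_feature_extractor_model() downloads the ImageNet weights on first use. Passing weights=None builds the same architecture with random weights, which is enough for a quick shape check. Only the two new Dense layers are trained; everything inside VGG16 is left untouched.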

The long wait for results

Now that I had solved the initial problems of preparing the data and the model, I started training the output layers of the model. This took about 4 hours. And after the wait, the test accuracy was 68% compared to 87.77% training accuracy. :( I already had a better accuracy last week (74.53%) using my own network.

The sad end of the day

This was a sad result after a long time of waiting and restarting the computer a few times. But I believe that I should get used to this as a fundamental part of this data life. Now I have to find out the reason behind the result.

  • Was it overfitting related to the model?
  • Was it because the data was not enough?
  • Or was it because my images were smaller than what the network was trained for?
  • Do I have to fine-tune the model more?

Whatever the reason is, I will have to start by finding a quicker computer/cloud solution to make it easier to test any theory.

What’s next

The original next step was to find a bigger real-world dataset to play with. But given the performance I had on my computer, my next step will be looking for an affordable cloud solution to train my models. This way I can put more time into learning and trying instead of spending half a day or even a whole day to test one theory. I also want to investigate this not-so-large array resizing problem.



Tuesday, September 11, 2018

Week 5/6 - Convolutional neural network (CNN) with CIFAR-10 dataset

Here I am in Weeks 5 and 6 of my mentor’s plan to practice deep-learning and start solving real problems. This time, my homework was about designing another CNN like the previous week, but with the CIFAR-10 dataset.

At first, I was intimidated and thought that I really sucked and that I could not really move on by myself, which I think was close to being true. :D But after several hours of looking closely at the dataset and reading the Keras documentation, I started to get results! Here they are.

You can find the whole Python notebook on GitHub here.

UPDATE: I reran the code for 100 epochs and reached an accuracy of 82.68%. The notebook can be found here.

The accuracy I got this time, compared to the previous MNIST homework, reminded me of the difference between my 9X% results at school and my embarrassing university results :D

Anyway, I got an accuracy of 74.53%. This already required a lot of time to train on my humble computer with no GPU (about 2 hours, I think), which is why I only have one model in my notebook this time. Because testing a single model took so long, I decided to try something crazy: waiting for only one epoch to finish and looking at the resulting accuracy. If it was not promising, I stopped the process and tried a different design, and so on. In the end, I let the current one run to completion to see the result.

After reaching this number, I thought it was time to look online at how people solve this problem. That’s how I found this link: https://github.com/BIGBALLON/cifar-10-cnn#accuracy-of-all-my-implementations

That person also reached an accuracy of 76.27% at first with, I think, a well-known network that noobs like me don’t know yet. Then he had to go for more complicated and more famous networks to get much better results. Of course, the training time with a GPU was scary, going up to a day or even two.

What’s next


The conclusion of this week actually proves how good the plan I am following is, because now I can see how complicated and time-consuming it is to solve such a problem, even with small images and only 10 classes. That’s why the next week of the plan is “transfer learning”: I have to use an existing network (VGG16) and adapt it to my dataset to get better results.