Sunday, September 23, 2018

Week 7 - Transfer learning with VGG16 model and CIFAR-10 dataset

Here I come to the 7th and final week of the initial plan to start working with deep learning. This week’s assignment was to use an existing CNN to learn the CIFAR-10 dataset, and VGG16 was the model chosen for me to try. It turned out to be quite an interesting assignment, surfacing some problems I will probably face again in the future.

Let’s talk about what I did this week.

You can find the whole Python notebook on GitHub here.

My images are small

This was the first problem I faced. According to the Keras documentation, the model takes a minimum input size of 32px. But when I actually tried, the minimum was 48px, while my dataset had images of size 32px. So I had to resize the images. This turned out to take more time than expected: there were a few possible solutions for resizing such data, and it took me quite some time to read the documentation of each one and try it before settling on something suitable and not too complicated.
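One of the options I looked at can be sketched like this, using `scipy.ndimage.zoom` with bilinear interpolation. This is just an illustration of one possible approach, not necessarily the one from my notebook:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_batch(images, new_size=48):
    """Resize a batch of square images with bilinear interpolation."""
    old_size = images.shape[1]
    factor = new_size / old_size
    # Zoom factors per axis (batch, height, width, channels):
    # only scale height and width, leave batch and channels alone.
    return zoom(images, (1, factor, factor, 1), order=1)

# A fake batch shaped like CIFAR-10: 8 images of 32x32x3
batch = np.random.randint(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
resized = resize_batch(batch)
print(resized.shape)  # (8, 48, 48, 3)
```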

My computer is a humble one

Not only did I have to resize the images (which was the easy part while they were still integers), but I also had to convert the data type to float32 and normalize the data. After the conversion to float32, this resizing task was no longer easy for my computer. That’s why, after the IPython kernel died many times, I ended up using only one third of the training data and converting the data in batches.
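The batch-conversion idea looks roughly like this. The point is that converting the whole uint8 array at once creates full-size float32 temporaries, while converting chunk by chunk into a preallocated array keeps only one small chunk in flight. The batch size here is an arbitrary choice for illustration:

```python
import numpy as np

def to_float32_in_batches(images, batch_size=1000):
    """Convert uint8 images to normalized float32 one chunk at a time,
    avoiding full-size temporary arrays during the conversion."""
    out = np.empty(images.shape, dtype=np.float32)  # allocated once
    n = images.shape[0]
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        out[start:end] = images[start:end].astype(np.float32) / 255.0
    return out

data = np.random.randint(0, 256, size=(5000, 48, 48, 3), dtype=np.uint8)
floats = to_float32_in_batches(data)
print(floats.dtype, float(floats.max()))  # float32, at most 1.0
```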

The error was always:

Allocation of X exceeds 10% of system memory.

This was not even a large set of data, so I should start looking into more optimized methods of processing such data.
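One direction I want to explore is streaming batches with a Python generator instead of materializing the whole float32 dataset in memory at all. A minimal sketch of the idea (the infinite loop matches the generator style Keras expected at the time):

```python
import numpy as np

def batch_generator(images, labels, batch_size=64):
    """Yield normalized float32 batches on the fly; the full float32
    dataset never exists in memory, only one batch at a time."""
    n = images.shape[0]
    while True:  # Keras-style generators loop forever over the data
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            x = images[start:end].astype(np.float32) / 255.0
            yield x, labels[start:end]

x = np.random.randint(0, 256, size=(256, 32, 32, 3), dtype=np.uint8)
y = np.random.randint(0, 10, size=(256,))
xb, yb = next(batch_generator(x, y))
print(xb.shape, xb.dtype)  # (64, 32, 32, 3) float32
```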

How to reuse the VGG16 model?

Now, getting to the step of reusing the model, I faced the question of how to reuse it. There were 4 options, depending on the size of my dataset and how similar it is to the data the model was originally trained on. You can learn more about the options here.

For my case, I assumed that the CIFAR-10 data is similar enough to the ImageNet data used to train the VGG16 model. So I used the model as a fixed feature extractor and only added fully connected layers at the end to classify 10 classes instead of 1000.
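The fixed-feature-extractor setup can be sketched with the Keras applications API like this. Note the layer sizes of the new head are illustrative choices, and `weights=None` is used here only to avoid downloading the ImageNet weights in this snippet; the real run would pass `weights='imagenet'`:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Convolutional base without the original 1000-class classifier head
base = VGG16(weights=None, include_top=False, input_shape=(48, 48, 3))
base.trainable = False  # freeze the base: use it as a fixed feature extractor

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)          # new fully connected layer
outputs = Dense(10, activation='softmax')(x)  # 10 CIFAR-10 classes
model = Model(inputs=base.input, outputs=outputs)
print(model.output_shape)  # (None, 10)
```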

The long wait for results

Now that I’d solved the initial problems of preparing the data and the model, I started training the output layers of the model. This took about 4 hours. And after the wait, the test accuracy was 68%, compared to 87.77% training accuracy. :( I had already reached a better accuracy last week (74.53%) using my own network.

The sad end of the day

This was a sad result after a long time of waiting and restarting the computer a few times. But I believe that I should get used to this as a fundamental part of this data life. Now I have to find out the reason behind the result.

  • Was it overfitting related to the model?
  • Was it because the data was not enough?
  • Or was it because my images were smaller than what the network was trained for?
  • Do I have to fine-tune the model more?
Whatever the reason is, I will have to start by finding a quicker computer/cloud solution to make it easier to test any theory.

What’s next

The next step was supposed to be finding a bigger real-world dataset to play with. But given the performance I got on my computer, my next step will be looking for an affordable cloud solution to train my models. This way I can put more time into learning and experimenting instead of spending half a day, or even a whole day, testing one theory. I also want to investigate this not-so-large array resizing problem.

