Sunday, September 23, 2018

Week 7 - Transfer learning with VGG16 model and CIFAR-10 dataset

Here I come to the 7th and final week of the initial plan to start working with deep learning. This week’s assignment was to use an existing CNN to learn the CIFAR-10 dataset. The VGG16 model was chosen for me to try. It was quite an interesting assignment, and it surfaced some problems I will probably face again in the future.

Let’s talk about what I did this week.

You can find the whole python notebook on Github here.

My images are small

This was the first problem I faced. According to the Keras documentation, the model takes a minimum input size of 32px, but when I tried it, the minimum was actually 48px. My dataset had images of size 32px, so I had to resize them. This turned out to take more time than expected. There were a few solutions for resizing such data, and it took me quite some time to read the documentation of each one and try it in order to find a suitable and less complicated one.
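The post does not say which resizing method was finally chosen. As one low-dependency possibility, nearest-neighbour upscaling can be done with NumPy indexing alone. This is only a sketch: the function name `upscale_nearest` and the random stand-in batch are assumptions for illustration, not the notebook's code.

```python
import numpy as np


def upscale_nearest(images, size=48):
    """Nearest-neighbour resize of a batch shaped (n, height, width, channels)."""
    _, height, width, _ = images.shape
    # For every target row/column, compute the index of the source pixel it maps to.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    # Select rows first, then columns; the input dtype (e.g. uint8) is preserved.
    return images[:, rows][:, :, cols]


# Random stand-in for a few CIFAR-10 images (32x32 RGB, uint8).
batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
resized = upscale_nearest(batch)
print(resized.shape, resized.dtype)
```

Resizing while the data is still `uint8` keeps the intermediate arrays four times smaller than doing it after the conversion to `float32`. Smoother interpolation (bilinear, bicubic) would need an image library instead.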

My computer is a humble one

Not only did I have to resize the images (which was the easy part with integers), but I also had to convert the data type to float32 and normalize the data. The resizing was not that easy for my computer after the conversion to float32. That’s why, after many dead IPython kernels, I ended up using only one third of the training data and converting the data in batches.

The error was always:

Allocation of X exceeds 10% of system memory.

This was not even a large data set, so I should start learning about more memory-efficient methods of processing such data.
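To put numbers on the memory problem, here is a rough sketch that estimates the footprint of the resized data in float32 and converts `uint8` images in batches into one preallocated array. The helper names and the batch size are assumptions for illustration, and the actual notebook may do this differently.

```python
import numpy as np


def float32_megabytes(n_images, size, channels=3):
    """Memory needed for n_images square float32 images, in MB (10**6 bytes)."""
    return n_images * size * size * channels * 4 / 1e6


def to_float32_in_batches(images, batch_size=1000):
    """Convert uint8 images to float32 values in [0, 1], one batch at a time."""
    # Allocate the output once; only one batch-sized temporary exists at a time.
    out = np.empty(images.shape, dtype=np.float32)
    for start in range(0, len(images), batch_size):
        stop = start + batch_size
        out[start:stop] = images[start:stop].astype(np.float32) / 255.0
    return out


# The 50,000 CIFAR-10 training images at 48x48x3 in float32:
print(float32_megabytes(50_000, 48))  # 1382.4
# One third of the training set, as used in this post:
print(float32_megabytes(50_000 // 3, 48))

# Small random stand-in so the sketch runs quickly.
small = np.random.randint(0, 256, size=(2500, 8, 8, 3), dtype=np.uint8)
converted = to_float32_in_batches(small)
print(converted.dtype, converted.shape)
```

So the full training set needs about 1.4 GB as float32 before any temporary copies are counted, which is a plausible reason for a modest machine to struggle.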

How to reuse the VGG16 model?

Now getting to the step of reusing the model, I faced the question of how to reuse it. There were 4 options depending on the data I have and the data the model used initially for training. You can learn more about the options here:

For my case, I assumed that the CIFAR-10 data is similar enough to the ImageNet data used for training the VGG16 model. So I used the model as a fixed feature extractor and I only added the fully connected layers at the end to classify 10 classes instead of 1000.
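A minimal sketch of the "fixed feature extractor" setup with Keras might look like the following. The head size (256 units), the dropout rate, the optimizer, and the one-hot label loss are all assumptions made for illustration; the notebook may use different values.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_feature_extractor_model(weights="imagenet", input_size=48, n_classes=10):
    """Frozen VGG16 convolutional base plus a small new classifier head."""
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(input_size, input_size, 3))
    # Freeze every pre-trained layer so only the new head is trained.
    for layer in base.layers:
        layer.trainable = False

    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)  # head size is an assumption
    x = layers.Dropout(0.5)(x)                   # dropout rate is an assumption
    outputs = layers.Dense(n_classes, activation="softmax")(x)

    model = models.Model(base.input, outputs)
    # categorical_crossentropy assumes one-hot encoded labels.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# weights=None skips the ImageNet download; useful for checking shapes only.
model = build_feature_extractor_model(weights=None)
print(model.output_shape)
```

For real training, the default `weights="imagenet"` would be used so that the frozen layers carry the pre-trained filters.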

The long wait for results

Now that I’ve solved the initial problems of preparing the data and the model, I started the training for the output layer of the model. This took about 4 hours. And after the wait, the test accuracy was 68% compared to 87.77% training accuracy. :( I already had a better accuracy last week (74.53%) using my own network.

The sad end of the day

This was a sad result after a long time of waiting and restarting the computer a few times. But I believe that I should get used to this as a fundamental part of this data life. Now I have to find out the reason behind the result.

  • Was it overfitting related to the model?
  • Was it because the data was not enough?
  • Or was it because my images were smaller than what the network was trained for?
  • Do I have to fine-tune the model more?
Whatever the reason is, I will have to start by finding a quicker computer/cloud solution to make it easier to test any theory.

What’s next

The next step was to find a bigger real-world data set to play with. But given the performance I had on my computer, my next step will be looking for an affordable cloud solution to train my models. This way I can put more time into learning and trying instead of spending half a day or even a whole day to test one theory. And I also want to investigate this not-so-large array resizing problem.


Tuesday, September 11, 2018

Week 5/6 - Convolutional neural network (CNN) with CIFAR-10 dataset

Here I am in Week 5&6 of my mentor’s plan to practice deep-learning and start solving real problems. This time, my homework was about designing another CNN like the previous week but with the CIFAR-10 dataset.

At first, I was intimidated and thought that I really sucked and that I could not really move on by myself. Which I think was close to being true. :D But after several hours of looking closely at the dataset and reading the Keras documentation, I started to find results! Here they are.

You can find the whole python notebook on Github here.

UPDATE: I reran the code for 100 epochs and reached an accuracy of 82.68%. The notebook is found here.

The accuracy I could get this time compared to previous MNIST homework reminded me of the difference between my 9X% results at school compared to the embarrassing university’s results :D

Anyways, I could get an accuracy of 74.53%. This already required a lot of time to train on my humble computer with no GPU (about 2 hours, I think). Because testing a single model took so long, I only have one model in my notebook this time. I decided to try something crazy: wait for only one epoch to finish and look at the resulting accuracy. If it was not promising, I stopped the process and tried a different design, and so on. In the end, I decided to let the current one run to completion and see the result.
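The one-epoch screening idea can be written down as a small loop. This is only a sketch: `screen_designs` and `train_one_epoch` are hypothetical names, and the accuracies in the demo are dummy numbers, not results from the notebook.

```python
def screen_designs(designs, train_one_epoch, min_accuracy):
    """Return the names of designs whose first-epoch accuracy looks promising.

    `designs` maps a name to whatever object describes a network, and
    `train_one_epoch` is a caller-supplied function that trains that network
    for a single epoch and returns its validation accuracy.
    """
    promising = []
    for name, design in designs.items():
        accuracy = train_one_epoch(design)
        print(f"{name}: {accuracy:.2%} after one epoch")
        if accuracy >= min_accuracy:
            promising.append(name)
    return promising


# Dummy stand-in so the sketch runs without training anything:
# each "design" is just a made-up accuracy that the fake trainer returns.
fake_designs = {"two_conv_blocks": 0.41, "three_conv_blocks": 0.52}
keep = screen_designs(fake_designs, train_one_epoch=lambda acc: acc,
                      min_accuracy=0.5)
print(keep)
```

With Keras, `train_one_epoch` could wrap a call to `model.fit(..., epochs=1)` with validation data and read the validation accuracy from the returned history. The first epoch is a noisy signal, so this is a way to discard clearly bad designs rather than to rank good ones.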

After reaching this number, I thought it is time to look online at how people solve such a problem. And then I found that link:

That person could also reach an accuracy of 76.27% at first with -I think- a known network that noobs like me don’t know yet. Then he had to go for more complicated and more famous networks to get much better results. Of course, the training time with a GPU was so scary; going for a day or even two.

What’s next

The conclusion of this week actually proves how good the plan I am following is. Because now I can see how complicated and time consuming it is to solve such a problem with small images and only 10 classes. That’s why next week of the plan is “transfer learning”. So I have to use an existing network (VGG16) and adapt it to my dataset to have better results.

Saturday, September 8, 2018

Week 2/3/4 - Convolutional neural network (CNN) with MNIST dataset

Week 2 of my deep learning plan was to train a convolutional network using the MNIST dataset. The intent is to learn the basics of convolutional networks. Instead of writing the code of my homework here again, I will only link to it on Github and speak about my experience with it here.

You can find the whole python notebook on Github here.
This homework was the first step that did not feel like the hello-world example in week 1. The first task was to load the MNIST data like before, then analyze and curate it before processing. It was not complicated, but it was a nice start to realizing how important this step is for deciding how to design the network.

Next was the mix of confusion and fun. It was basically about looking for a reasonable number of hidden layers and number of filters per layer. After finding the ‘popular’ numbers for such a problem (what if my problem is a personal/custom one? back to that later in the future inshallah), it was time to try different depths and different values for hyperparameters like dropout and filters. This was actually a boring and time-consuming step. Just to try some basic variations, I had to keep my computer running for about 4 hours. And I didn’t even try that many variations (only 7).
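Enumerating such variations systematically is easy with `itertools.product`. The search space below is hypothetical (the values tried in the notebook may differ); the point is only how quickly the number of combinations grows.

```python
from itertools import product

# Hypothetical search space; the values tried in the notebook may differ.
search_space = {
    "conv_blocks": [1, 2, 3],
    "filters": [32, 64],
    "dropout": [0.25, 0.5],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_configs(search_space))
print(len(configs))  # 12 = 3 * 2 * 2
print(configs[0])
```

At roughly half an hour per variation (7 variations in about 4 hours), even this small grid of 12 would take around 6 to 7 hours on the same machine, which is why trying only a handful of hand-picked combinations is a reasonable compromise.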

The results were fine and I could reach an accuracy of 99.32% compared to 98.48% in my previous feedforward assignment.

Week 3 & 4

Week 3 & 4 were actually about playing with hyperparameters and optimization. So it was somehow included in my current assignment. I could get a taste of how they can affect the results and how I should give a considerable time, changing and observing the effects of such parameters and deciding which parameters can introduce an accuracy improvement.

Week 1 - Feedforward network with MNIST dataset

Week 1 of the plan was to train a feedforward network using the MNIST dataset. This is probably the easiest and most straightforward example to understand how the training cycle goes.

Before starting, I should mention that the code for this week (and the other weeks too) was not written from scratch by me. And that’s the main difference between school and work. I treated this homework the same way I treat work: I can google what I want and understand/refactor it. It can also be about reading about a specific network and understanding the recommended range of values for a specific parameter or the recommended layer structure.

This time, it was so simple that I copied the code and started playing with it to understand how it works. And as we move forward in the weeks, I had to write more myself. Consider it a Hello World week. Now let’s see the code!

You can find the whole python notebook on Github here.
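For readers without the notebook at hand, a feedforward MNIST classifier in Keras can be as small as the sketch below. The layer width, optimizer, and loss are assumptions for illustration and are not taken from the notebook.

```python
from tensorflow.keras import layers, models


def build_feedforward(n_hidden=128, n_classes=10):
    """One hidden layer on flattened 28x28 images; sizes are assumptions."""
    model = models.Sequential([
        layers.Input(shape=(28, 28)),
        layers.Flatten(),                                # 28 * 28 = 784 inputs
        layers.Dense(n_hidden, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),   # one output per digit
    ])
    # sparse_categorical_crossentropy works with integer labels 0..9.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_feedforward()
print(model.count_params())  # 784*128 + 128 + 128*10 + 10 = 101770
```

Training would then load the data with `tensorflow.keras.datasets.mnist.load_data()`, scale the pixel values to the range 0 to 1, and call `model.fit` on the training split.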

Week 0 - Jupyter notebook with Keras and Tensorflow

Starting with my deep-learning learning plan, I wanted to have a good development environment. This meant having a Docker image that can work on any device I am using with no extra setup needed. My choice was to use Keras with Tensorflow core for an easy start with not so many unwanted details at this step. I also chose to use Jupyter notebook to have a nice interface to trace/explain my code along with graphs and output numbers.

The following is my finalized Dockerfile with the latest versions which worked exactly how I wanted it.

# To build the container
# docker build -t jupyter-keras .
# To run the container:
# docker run -it -v /$(pwd)/:/home/jovyan/work -p 8888:8888 jupyter-keras:latest --NotebookApp.token=''

# To access the notebook from the browser:
# http://localhost:8888/tree

# To log in to the server:
# docker exec -it  /bin/bash

# To check Keras version:
# python -c 'import keras; print(keras.__version__)'

FROM jupyter/scipy-notebook

MAINTAINER Gaarv <@Gaarv1911>

USER root

# bash instead of dash to use source
RUN ln -snf /bin/bash /bin/sh

USER jovyan

RUN pip install --upgrade pip \
  && pip install --upgrade tensorflow \
  && pip install --upgrade --no-deps git+git:// \
  && pip install --upgrade --no-deps h5py

Tuesday, September 4, 2018

A deep-learning plan from a mentor

I’ve been learning machine learning by myself for a long time. From one Coursera specialization to a course to a YouTube playlist. But then I felt the problem with starting. I am learning the theory with some basic applications, but I don’t know how to go on by myself and start a project and analyze the data and find the correct structure and fine-tune the parameters and so on…

Then came to me the old idea one more time; I need a mentor who knows how such professional life works and what really matters more. And after some searching and asking, I found one through a friend of a friend of a friend. And then she contacted me and offered help.


And after explaining to her what I know and what I want in life from this exercise, she formulated a plan for me. So here I am posting it.

Week 1: Feedforward networks

A good start is the simple MNIST dataset. So train a feedforward network to study the basics of neural networks.

Week 2: Convolutional networks

Change the previous network to a convolutional network to study the basics of convolutional networks.

Week 3: Hyperparameter optimization

Change the number of layers and different learning rates and other hyperparameters to learn validation and hyperparameter optimization.

Week 4: Dropout and batch normalization

Introduce dropout and batch normalization to the network to learn regularization and the rest of hyperparameter optimization. Up till now, it’s learning deep learning basics rather than a project.

Week 5&6: CIFAR-10 dataset

Repeat the above classification project but switch to CIFAR-10 dataset. There might be a few changes in the process, like input normalization and the need for more conv layers and so on. But this should solidify the knowledge.

Week 7: Transfer learning

Since you won’t train on your own from scratch all the time (no time and no resources), we sometimes borrow the lower layers from pre-trained networks and refine them. Use VGG model parameters with any other dataset and retrain for fine-tuning.

Which tools to use?

Since the whole project is deep learning, stick to Tensorflow and Keras. Keras is way easier, but it is not as flexible, so you may want to choose between them based on your end goal. But give both a try and make sure you understand the basics at a theoretical level first. Consult a tutorial or a book or whatever you’re comfortable with. There are tons of materials online. The Stanford course is one. It is more academic but easy. You won’t need other libraries like scikit, unless you want to play and compare with other algorithms or perhaps do a little input manipulation via them.

What’s next?

I assume if you reached this level, you’d have a pretty decent knowledge with clean data. I’ll look into other datasets to play with, as I know people who don’t consider the standard research datasets as a project but rather more of a tutorial following. So when you add it to your CV, it’ll look better.

I was almost done with all the steps and then got busy in life once again. Now I wanted to start the engines again and move on to a personal project to learn more, but I have some solid doubts that I did not do a 100% clean homework. That’s why I wanted to post my solution for every week again after revising it, cleaning it, and making sure that I can understand and present it well.