tensorflow Archives » AI Geek Programmer
https://aigeekprogrammer.com/tag/tensorflow/

Convolutional neural network 4: data augmentation
https://aigeekprogrammer.com/convolutional-neural-network-4-data-augmentation/
Sat, 14 Mar 2020

In the previous three parts of the tutorial, we learned about convolutional networks in detail. We looked at the convolution operation, the convolutional network architecture, and the problem of overfitting. On the CIFAR-10 classification task we achieved 81% accuracy on the test set. To go further, we would have to change the architecture of our network, experiment with hyperparameters, or get more data. I leave the first two options for you 😉 to experiment with; in this part of the tutorial I want to feed our network with more data. I will use so-called data augmentation, i.e. the artificial generation of large amounts of new data from the existing set.

In the fourth part of the tutorial you will learn:

  • What is data augmentation?
  • How to use the data generator from the Keras library?
  • How to artificially generate new data for the CIFAR-10 set?
  • And how well will our model do on the set of artificially generated (augmented) data?

What is data augmentation?

As I mentioned in the previous part of the tutorial, if we are dealing with a closed dataset, i.e. one that cannot be significantly enlarged, or can only be enlarged at great expense, we can reach for so-called data augmentation. This technique is particularly valuable for image analysis. Why? Because images tolerate minor modifications well: the modified images are new data for the algorithm, while remaining essentially the same to the human eye. Moreover, such “minor modifications” occur in the real world. Standing in front of a car, we can look at it head-on or slightly from the side. It is still the same vehicle, and our brain has no doubt it is a car. For the algorithm, seeing the object from a different perspective is valuable information that helps the training process generalize better.

What can we actually do with an image to process it artificially? In theory, we have infinitely many options: we can rotate the image slightly, in any direction and at any angle; shift it left, right, up or down; change its colors; or make other more or less subtle changes that will give the model tons of new data. In practice, a collection of tens of thousands of images can become a collection of millions. This is a field where the possibilities are really vast. As a curiosity: technologies related to autonomous vehicles are also trained on artificially generated datasets, e.g. using realistic game environments such as GTA.
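These modifications are simple array operations. As a rough, hand-rolled illustration (plain NumPy, not the Keras tooling used below), here are two of them, a horizontal flip and a shift:

```python
import numpy as np

def horizontal_flip(img):
  # Mirror an H x W x C image along its width axis
  return img[:, ::-1, :]

def shift_right(img, pixels):
  # Shift the image right by `pixels` columns, filling the gap with zeros
  out = np.zeros_like(img)
  out[:, pixels:, :] = img[:, :img.shape[1] - pixels, :]
  return out

img = np.arange(12).reshape(2, 6, 1)   # tiny 2 x 6 single-channel "image"
flipped = horizontal_flip(img)
shifted = shift_right(img, 2)
```

Both results are valid new training samples, even though a human would describe them as the same picture.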

Generating data with the Keras library

The Keras library offers a set of helpful tools for generating data. Let’s try to process the previously seen picture of a building in Crete with this generator. First, we make the necessary imports and define a function that will load an image from a file and convert it to a NumPy array:

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline

def convert_image(file):
  return np.array(Image.open(file))

We load a picture that you can download here, display the shape of the NumPy array, and show the picture itself:

image = convert_image(r'<<path-to-the-file-on-disc>>\house-small.jpg')
image.shape
>>> (302, 403, 3)

plt.imshow(image)

Image for data augmentation

To generate data we will use the flow(x, y) method of the ImageDataGenerator class. To use it correctly, we have to import the class – that’s pretty obvious – but also shape the data accordingly. The method expects a tensor x whose first axis is the sample index. In our case there will be only one element, but the method still requires that axis. The input y holds the labels; we don’t need them for this simple experiment, but we must provide them. Hence:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

x = np.expand_dims(image, 0)
x.shape
>>> (1, 302, 403, 3)

y = np.asarray(['any-label'])

Then we create the generator object, passing it the appropriate parameters. The specification offers tons of them; below are only a few examples:

datagen = ImageDataGenerator(
  width_shift_range=0.2,   # shift along the x axis
  height_shift_range=0.2,  # shift along the y axis
  rotation_range=20,       # rotate by up to 20 degrees
  horizontal_flip=True,
  vertical_flip=True,
  rescale=1./255,
  shear_range=0.25,
  zoom_range=0.25,
)

Now we just call the flow(x, y) method, passing it the prepared data, then receive and display the generated images.

figure = plt.figure()
i = 0
for x_batch, y_batch in datagen.flow(x, y):
  a = figure.add_subplot(5, 5, i + 1)
  plt.imshow(np.squeeze(x_batch))
  a.axis('off')
  i += 1
  if i == 25: break  # the generator is infinite, so we have to stop it ourselves
figure.set_size_inches(np.array(figure.get_size_inches()) * 3)
plt.show()

The result? Some images are literally turned upside down 😉 and a little “exaggerated”, because some parameters are set to high values. But it illustrates the generator’s capabilities well. You can experiment with the settings yourself.

Data augmentation with ImageDataGenerator in Keras

Data augmentation on CIFAR-10

Armed with a generator, we can once again approach the classification of the CIFAR-10 dataset. Most of the code has already been discussed in the previous parts of the tutorial, so I will only provide it here for consistency and clarity. At the beginning we make the necessary imports, load the dataset and build a model:

import numpy as np

%tensorflow_version 2.x
import tensorflow

import matplotlib.pyplot as plt
%matplotlib inline

from tensorflow import keras
print(tensorflow.__version__)
print(keras.__version__)
>>> 1.15.0
>>> 2.2.4-tf

from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Convolution2D, MaxPool2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras import regularizers
from tensorflow.keras.utils import to_categorical

model = Sequential([
  Convolution2D(filters=128, kernel_size=(5,5), input_shape=(32,32,3), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=128, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  MaxPool2D((2,2)),
  Convolution2D(filters=64, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=64, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  MaxPool2D((2,2)),
  Convolution2D(filters=32, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=32, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  MaxPool2D((2,2)),
  Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
  BatchNormalization(),
  Flatten(),
  Dense(units=32, activation="relu"),
  Dropout(0.15),
  Dense(units=16, activation="relu"),
  Dropout(0.05),
  Dense(units=10, activation="softmax")
])
optim = RMSprop(lr=0.001)
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=['accuracy'])

After preparing and successfully compiling the model, we define the generator. We assume the data will be rotated by up to 10 degrees; we allow a horizontal flip, but not a vertical one, so as not to artificially put things upside down. The generator will also shift the images vertically and horizontally by up to 10%. A small zoom and shear are also allowed. Remember that the images are small, and major modifications could make them hard to recognize even for a person:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
  rotation_range=10,
  horizontal_flip=True,
  vertical_flip=False,
  width_shift_range=0.1,
  height_shift_range=0.1,
  rescale=1./255,
  shear_range=0.05,
  zoom_range=0.05,
)

We also need one-hot encoding for training and test labels. We set the size of the batch and the generator is basically ready to use:

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

batch_size = 64
train_generator = datagen.flow(x_train, y_train, batch_size=batch_size)

The above generator will be the source of data for the training process. But what about the validation set that will let us track progress? We need to define a separate generator; this one, however, will not modify the source images in any way, only rescale them:

datagen_valid = ImageDataGenerator(
  rescale = 1. / 255,
)

x_valid = x_train[:100*batch_size]
y_valid = y_train[:100*batch_size]

x_valid.shape[0]
>>> 6400

valid_steps = x_valid.shape[0] // batch_size
validation_generator = datagen_valid.flow(x_valid, y_valid, batch_size=batch_size)

As you can see above, the dataset used for validation during training will be 100 batches in size. We also need to calculate the number of validation steps – both values are needed to start the training.
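A quick sanity check of the arithmetic (plain Python, with the 50,000 CIFAR-10 training images and a batch size of 64):

```python
n_train = 50000                            # CIFAR-10 training images
batch_size = 64

steps_per_epoch = n_train // batch_size    # batches per training epoch
valid_size = 100 * batch_size              # images held out for validation
valid_steps = valid_size // batch_size     # validation batches per epoch

print(steps_per_epoch, valid_size, valid_steps)   # → 781 6400 100
```

The 781 here matches the "781/781" visible in the training output below.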

history = model.fit_generator(
  train_generator,
  steps_per_epoch=len(x_train) // batch_size,
  epochs=120,
  validation_data=validation_generator,
  validation_freq=1,
  validation_steps=valid_steps,
  verbose=2
)

Note that we are not using the fit() method as before, but fit_generator(), which accepts a training data generator and (optionally) a validation data generator. (In newer versions of TensorFlow, fit() itself accepts generators and fit_generator() is deprecated.) With so much data, we will train for 120 epochs instead of 80, hoping that overfitting will not appear.

>>> Epoch 1/120
>>> 781/781 - 49s - loss: 1.8050 - acc: 0.3331 - val_loss: 1.5368 - val_acc: 0.4581
>>> Epoch 2/120
>>> 781/781 - 41s - loss: 1.3230 - acc: 0.5249 - val_loss: 1.1828 - val_acc: 0.5916
>>> Epoch 3/120

(...)

>>> 781/781 - 39s - loss: 0.1679 - acc: 0.9473 - val_loss: 0.1484 - val_acc: 0.9463
>>> Epoch 119/120
>>> 781/781 - 38s - loss: 0.1708 - acc: 0.9466 - val_loss: 0.1538 - val_acc: 0.9538
>>> Epoch 120/120
>>> 781/781 - 39s - loss: 0.1681 - acc: 0.9486 - val_loss: 0.1379 - val_acc: 0.9534

We obtained an accuracy of about 95% for both sets. This can also be seen in the chart below:

print(history.history.keys())
>>> dict_keys(['loss', 'acc', 'val_loss', 'val_acc'])

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Valid'], loc='upper left')
plt.show()

Accuracy for a model with a data generator

Due to the lack of overfitting, we could theoretically further increase the number of epochs.

Let’s check how the trained model will cope with the test dataset that it has not yet seen.

x_final_test = x_test / 255.0
eval = model.evaluate(x_final_test, y_test)
>>> 10000/10000 [==============================] - 3s 314us/sample - loss: 0.5128 - acc: 0.8687

We achieved an accuracy of 87% – 6 percentage points more than the model version without a data generator.
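For context, the accuracy reported by evaluate() is simply the fraction of samples whose highest-probability class matches the label. A minimal NumPy illustration with toy numbers (not the actual CIFAR-10 outputs):

```python
import numpy as np

def accuracy(probs, labels):
  # probs: (n, k) class probabilities; labels: (n,) integer class ids
  return float((probs.argmax(axis=1) == labels).mean())

probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7]])
labels = np.array([1, 0, 0])
acc = accuracy(probs, labels)   # 2 of 3 predictions are correct
```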

Most importantly, however, the model is eager to continue learning, without compromising accuracy on the validation and test sets.


This is the last post in this tutorial. I hope I managed to present some interesting topics related to convolutional neural networks. If you liked this post and the entire tutorial, please share it with people who may be interested in machine learning.

Convolutional neural network 3: convnets and overfitting
https://aigeekprogrammer.com/convnets-and-overfitting/
Fri, 31 Jan 2020

The convolutional neural network is one of the most effective neural network architectures in the field of image classification. In the first part of the tutorial, we discussed the convolution operation and built a simple densely connected neural network, which we used to classify the CIFAR-10 dataset, achieving an accuracy of 47%. In the second part, we familiarized ourselves in detail with the architecture and parameters of convolutional neural networks, built our own network, and obtained ~70% accuracy on the test set. As it turned out, however, we encountered the problem of overfitting, which prevented us from getting better results. In this part of the tutorial, we’ll take a closer look at convnets and overfitting and inspect various regularization techniques, i.e. ways of preventing excessive fitting to the training set. We will end the post with a list of practical tips that can be useful when building a convolutional neural network.

From the third part of the tutorial you will learn:

  • What is overfitting?
  • How to deal with the problem of overfitting?
  • What is internal covariate shift?
  • How to apply batch normalization?
  • What is dropout?
  • And some practical tips for building convolutional neural networks.

What is overfitting?

Let’s look again at the results we got in the second part of the tutorial. Figure 1 shows the classification results on the training set, which eventually reached ~95% (the blue line). Below it is the classification result for the validation set (the orange line). As you can see, the results for the two sets began to diverge already around the 15th epoch, and the final gap at the 80th epoch was as large as ~25%.

Convnets and overfitting

Figure 1 – learning outcomes for training and validation sets

We call this situation overfitting. The network has learned to classify a training set so well that it has lost the ability to effectively generalize, i.e. the ability to correctly classify data that it has not previously seen.

To better understand overfitting, imagine a real-life example. A professional basketball player needs the highest-quality shoes. He works with a footwear company, and this company prepares shoes that are perfectly suited to the shape and construction of his feet. This requires not only matching the shoes to the shape of his feet, but above all a special insole. The basketball player now feels great in the new shoes, and his game gets even better. Does this mean that such shoes will be equally good for another professional or for amateur players? In the vast majority of cases, probably not. These shoes have been fitted so well to this particular player’s feet that they will not perform well on anyone else’s. This is overfitting; footwear companies instead try to design shoes and insoles whose shapes fit the greatest possible number of feet while still ensuring comfortable play.

Yet another example – this time graphic. Let’s assume that we want to build a classifier that will correctly classify the data into “circular” and “triangular”.

Convnets and overfitting

Figure 2 – overfitting vs. better generalizing model

If we adjust the classifier too much to the training data, it will not be able to correctly classify the new data, because it is unlikely that these new data will fit perfectly into the distribution of training data. Therefore, it is better for the model to be less complex. Although it will achieve slightly worse results on the training set, it will probably generalize the problem better, so it will also classify new data more correctly.

How to solve the problem of overfitting?

OK, so how do we counteract overfitting? There are at least several effective methods. Below I describe the most important ones, and we will try some of them in our classifier.

Collect more data – this is often the most effective way to prevent overfitting. If the model sees more data, it will be able to generalize better. Remember that neural networks, and machine learning in general, love huge amounts of data and high computing power. Unfortunately, this method is often the most difficult to apply in practice, or even impossible – as in our case, where we have a closed dataset.

If we can’t collect more data, we can sometimes create it ourselves. Although this sounds far-fetched, and we may wonder whether artificially generated data will really improve the model, in practice the method brings good results. Especially in image processing we have a wide range of options: we can slightly rotate or shift an image, change its colors, or make other more or less subtle changes that will give the model tons of new data. From a logical point of view: having an original photo of a horse, we can mirror it or change its colors and it will still be a photo of a horse. This technique is called data augmentation, and the leading libraries offer ready-to-use tools for it. We will use one of them in the next part of this tutorial.

As I mentioned in the second part of the tutorial, each neural network has many so-called hyperparameters. They have a significant impact on how the network works: they are part of the model architecture, and by tuning them you can get better or worse results. When building a model, it’s worth experimenting to find an architecture that gives better results. Sometimes reducing the complexity of the architecture gives surprisingly good results; an overly complex architecture can produce overfitting fairly quickly, because it is easier for such a network to fit the training set exactly.

Let’s start with this simple move. The network from the second part of the tutorial consists of convolutional and densely connected subnets. For the convolutional subnet we should rather try to increase its complexity than reduce it, because that lets it capture more image features. Therefore, to reduce the complexity of the architecture, it is a good idea to start with the densely connected part.

From the model in the following form (the second part of the tutorial):

Dense(units=512, activation="relu"),
Dense(units=64, activation="relu"),
Dense(units=10, activation="softmax")

we will move to a much simpler one:

Dense(units=32, activation="relu"),
Dense(units=16, activation="relu"),
Dense(units=10, activation="softmax")

(...)
>>> Epoch 78/80
>>> loss: 0.5725 - accuracy: 0.7968 - val_loss: 0.7897 - val_accuracy: 0.7367
>>> Epoch 79/80
>>> loss: 0.5667 - accuracy: 0.8014 - val_loss: 0.8373 - val_accuracy: 0.7259
>>> Epoch 80/80
>>> loss: 0.5611 - accuracy: 0.8019 - val_loss: 0.8255 - val_accuracy: 0.7220

eval = model.evaluate(x_test, to_categorical(y_test))
>>> loss: 0.8427 - accuracy: 0.7164

Convnets - training outcome

Figure 3 – Training results after reducing the complexity of a densely connected network

As you can see, there are some benefits: about 2% higher classification accuracy on the validation set; faster training, because the network is less computationally demanding; and reduced, though not eliminated, overfitting – currently at around 10%.
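The savings from shrinking the dense subnet are easy to quantify with back-of-the-envelope arithmetic, assuming the part-2 convolutional stack flattens to 8 × 8 × 16 = 1024 values (a 32×32 input halved by two MaxPool2D layers, with 16 filters in the last convolutional layer):

```python
flat = 8 * 8 * 16                  # flattened conv output under the assumption above

def dense_params(in_dim, units):
  # weights + biases of one fully connected layer
  return in_dim * units + units

before = (dense_params(flat, 512)   # Dense(512)
          + dense_params(512, 64)   # Dense(64)
          + dense_params(64, 10))   # Dense(10)
after = (dense_params(flat, 32)     # Dense(32)
         + dense_params(32, 16)     # Dense(16)
         + dense_params(16, 10))    # Dense(10)

print(before, after)   # → 558282 33498
```

Roughly a 17× reduction in dense-subnet parameters, which explains both the faster training and the weaker tendency to memorize the training set.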

The architecture of our first version of the network, proposed in the second part of the tutorial, assumed the processing of each image by three convolution “modules”, with 64, 32 and 16 filters, respectively. Such a complexity of the convolutional network allowed us to obtain about 80% accuracy on the training set, which translated into ~72% on the test set. For the record, it looked like this:

Convolution2D(filters=64, kernel_size=(3,3), input_shape=(32,32,3), activation='relu', padding='same'),
Convolution2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'),
MaxPool2D((2,2)),
Convolution2D(filters=32, kernel_size=(3,3), activation='relu', padding='same'),
Convolution2D(filters=32, kernel_size=(3,3), activation='relu', padding='same'),
MaxPool2D((2,2)),
Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),

To get better classification results, we should improve in two areas. First, increase the accuracy on the training set, because as you can see the accuracy for the test set is always lower than for the training set. Second, reduce overfitting. A network that generalizes well will achieve much better results on data it has not seen before. In addition, we will be able to train it for more than 80 epochs. Currently that does not make much sense: although the accuracy on the training set could still increase, the same metric on the validation set shows that the learning is not generalizing, but fitting to the training set.

How to improve the accuracy of classification on the training set? One way is to deepen the convolutional subnet. By adding more layers and increasing the number of filters, we give the network the ability to capture more features and thus greater accuracy in classification. To achieve this, we will add one more convolutional “module” with an increased number of filters:

Convolution2D(filters=128, kernel_size=(5,5), input_shape=(32,32,3), activation='relu', padding='same'),
Convolution2D(filters=128, kernel_size=(5,5), activation='relu', padding='same'),
MaxPool2D((2,2)),
Convolution2D(filters=64, kernel_size=(5,5), activation='relu', padding='same'),
Convolution2D(filters=64, kernel_size=(5,5), activation='relu', padding='same'),
MaxPool2D((2,2)),
Convolution2D(filters=32, kernel_size=(5,5), activation='relu', padding='same'),
Convolution2D(filters=32, kernel_size=(5,5), activation='relu', padding='same'),
MaxPool2D((2,2)),
Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),

Attempting to reduce overfitting requires the introduction of two new elements: batch normalization and dropout technique.

Convnets and overfitting: batch normalization

Batch normalization aims to reduce so-called internal covariate shift. To understand the idea behind batch normalization, you must first understand what the internal covariate shift is.

Covariate is a fairly widely used term, mainly in statistics, meaning an independent variable – in other words, an input variable. Output (dependent) variables are determined on the basis of input variables. By analogy, in machine learning a covariate means an input variable / feature. In our example, the covariates are the values of the color components of the individual pixels of the processed images.

Each dataset has a certain distribution of input data. For example, if in the CIFAR-10 dataset we analyzed the distribution of average brightness of images depicting aircraft, it would probably be different from the brightness of images depicting frogs. If we superimposed these two distributions, they would be shifted from each other. This shift is called covariate shift.

Although the datasets we use for machine learning are usually well balanced, splitting a set into training, validation and test sets means that these subsets have slightly different input data distributions. For this reason (among others) we usually get lower accuracy on the test set compared to the training set.

Convnets and overfitting - covariate shift

Figure 4 – covariate shift

Covariate shift occurs not only when splitting a set or enriching it with new data, but also as input data passes through subsequent layers of the neural network. The network naturally modifies data by applying the weights assigned to connections between neurons. As a consequence, each subsequent layer must learn from data whose distribution differs slightly from the original input. This not only slows down training but also makes the network more susceptible to overfitting. This phenomenon of input distribution shift inside the neural network was described by Ioffe and Szegedy and called internal covariate shift.

Ioffe and Szegedy proposed a method of data normalization performed between layers of the neural network, as part of its architecture, thanks to which internal covariate shift can be minimized. It should be noted that some researchers studying the issue argue that batch normalization does not so much reduce internal covariate shift as “smooth” the objective function, thus accelerating and improving training.

To sum up: batch normalization speeds up learning – it needs fewer iterations to reach the same results as a network without batch normalization. It allows the use of higher learning rates without running into the vanishing gradient problem, and it also helps to limit overfitting. Most machine learning libraries, including Keras, have a built-in batch normalization layer.
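For intuition, the core computation of a batch normalization layer at training time can be sketched in a few lines of NumPy (a simplified forward pass; the real layer additionally learns gamma and beta and keeps running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
  # Normalize each feature over the batch axis, then scale and shift
  mu = x.mean(axis=0)
  var = x.var(axis=0)
  x_hat = (x - mu) / np.sqrt(var + eps)
  return gamma * x_hat + beta

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

After this step each column of `out` has (approximately) zero mean and unit variance, so the next layer always sees inputs with a stable distribution.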

For those interested: a wiki entry and a scientific article by Sergey Ioffe and Christian Szegedy, who proposed and described the batch normalization method. The article is rather technical, with a large dose of mathematics, but the abstract, introduction and summary are easily understandable.

Convnets and overfitting: dropout

The second very useful technique that effectively fights overfitting is so-called dropout. It was proposed by Geoffrey E. Hinton et al. in the work Improving neural networks by preventing co-adaptation of feature detectors. It is a relatively simple, yet very effective technique for preventing overfitting. It consists of randomly removing individual neurons from the network (from hidden layers, sometimes also the input layer) during training. Because complex networks – and deep neural networks are undoubtedly complex – especially those trained on relatively small amounts of data, tend to fit the data too closely, this regularization method forces them to learn in a more generalized way.

In each training round, each neuron is either removed from or left in the network. The chance of removal is defined by a probability, the dropout rate. In the original work it was 50% for each neuron. In practice, we set this probability ourselves, and it may differ between layers.
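The mechanics can be sketched in a few lines of NumPy. This is the "inverted dropout" variant used by modern libraries, where surviving activations are scaled up at training time so that nothing needs to change at inference:

```python
import numpy as np

def dropout(x, rate, rng):
  # Inverted dropout: zero activations with probability `rate`,
  # scale survivors by 1/(1-rate) so the expected activation is unchanged
  keep = rng.random(x.shape) >= rate
  return x * keep / (1.0 - rate)

rng = np.random.default_rng(seed=0)
a = np.ones((4, 8))
dropped = dropout(a, rate=0.5, rng=rng)   # entries are now either 0.0 or 2.0
```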

Convnets and overfitting - dropout technique

Figure 5 – dropout as a technique to minimize overfitting

In practice, dropout makes the network architecture change dynamically during training. In effect, one dataset is used to train many networks with different architectures, and at test time we use a single network whose weights approximate the average of that ensemble.

Using dropout in Keras comes down to adding a Dropout(rate) layer, whose hyperparameter is the probability with which a neuron will be dropped. We add dropout to the densely connected subnet. Using it in the convolutional subnet is less common and largely misses the idea behind convolutions.

In the convolutional layers we will use batch normalization, which in Keras means adding a BatchNormalization() layer. As a result, we get the following new architecture:

model = Sequential([
  Convolution2D(filters=128, kernel_size=(5,5), input_shape=(32,32,3), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=128, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  MaxPool2D((2,2)),
  Convolution2D(filters=64, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=64, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  MaxPool2D((2,2)),
  Convolution2D(filters=32, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=32, kernel_size=(5,5), activation='relu', padding='same'),
  BatchNormalization(),
  MaxPool2D((2,2)),
  Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
  BatchNormalization(),
  Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
  BatchNormalization(),
  Flatten(),
  Dense(units=32, activation="relu"),
  Dropout(0.15),
  Dense(units=16, activation="relu"),
  Dropout(0.05),
  Dense(units=10, activation="softmax")
])

optim = RMSprop(lr=0.001)

As you can see above, I also changed the optimizer from SGD to RMSprop, which, as my tests showed, worked slightly better for this architecture.

A small digression: you may wonder where all these changes come from. They come from two sources: accumulated experience and experiments with the given network. I spent at least a dozen hours on the solution I finally present in this tutorial, trying different architectures and hyperparameter values. This is how it looks in practice, so if you are on your second day with your model and have no idea what to do next, know that this is completely normal, and in a moment (or after a short break) you will probably move forward again.

The network has been trained for 80 epochs and as a result we have achieved a classification accuracy of 81%.

Epoch 77/80
42500/42500 - 19s - loss: 0.0493 - accuracy: 0.9888 - val_loss: 1.7957 - val_accuracy: 0.8119
Epoch 78/80
42500/42500 - 19s - loss: 0.0523 - accuracy: 0.9879 - val_loss: 1.2465 - val_accuracy: 0.8016
Epoch 79/80
42500/42500 - 19s - loss: 0.0499 - accuracy: 0.9880 - val_loss: 1.7057 - val_accuracy: 0.8137
Epoch 80/80
42500/42500 - 18s - loss: 0.0490 - accuracy: 0.9880 - val_loss: 1.5880 - val_accuracy: 0.8175

eval = model.evaluate(x_test, to_categorical(y_test))
>>> 10000/10000 [==============================] - 2s 167us/sample - loss: 1.5605 - accuracy: 0.8112

A look at the chart for the training and validation sets leaves mixed feelings. On the one hand, we increased the classification accuracy for all three sets, including the most important one, the test set, by nearly 10 percentage points (from 71% to 81%). On the other hand, strong overfitting has appeared again, which means the network is again “learning the training set” rather than generalizing.

Convolutional neural networks - classification result

Figure 6 – the result of the classification (changed architecture)

If I wanted to get a better result than 81%, I would choose one of three ways. First, I could experiment with different architectures, referring to one of the reference architectures that achieved very good results on CIFAR-10 or a similar dataset. Second, I could examine the network’s response to other hyperparameter settings – tedious and time-consuming work, but sometimes a few simple changes give good results. The third way is to keep fighting overfitting, but with a slightly different method, which I have already mentioned above: data augmentation. We’ll take a look at it in the next part of the tutorial.

Convnets – some practical tips

At the very end of this part of the tutorial, I have put together some practical, loosely related tips that you can take into account when building your Convnet.

  • If you can, use a proven network architecture and, if possible, adapt it to your needs.
  • First let the network overfit, then introduce regularization.
  • Place dropout in the densely connected layers and batch normalization in the convolutional subnet. However, do not stick to this rule rigidly. Sometimes a non-standard move can give unexpected results.
  • The kernel should generally be much smaller than the image.
  • Experiment with different hyperparameter settings, then experiment even more.
  • Use the GPU. If you don’t have a computer with a suitable graphics card, use Google Colaboratory.
  • Collect as much training data as possible. There is no such thing as “too much data.”
  • If you cannot collect more data, use a data generator when possible – more on this in the fourth part of the tutorial.
  • A very deep and extensive network will have a strong tendency to overfit. Use as shallow a network as possible. In particular, do not overdo the number of neurons and layers in the densely connected subnet.
  • Ensure that the training and test sets are well balanced and have a similar distribution. Otherwise, you will consistently get much worse results on the test set than on the training set.
  • Once you feel comfortable in the Convnet world, start reading more advanced scientific studies. They will allow you to better understand how convolutional networks work, and they will introduce new techniques and architectures to your toolbox.
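As a loose sketch of the dropout / batch-normalization tip above (the layer sizes and counts here are illustrative only, not a recommended architecture):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Convolution2D, Dense,
                                     Dropout, Flatten, MaxPool2D)

model = Sequential([
    # convolutional subnet: batch normalization after the convolution
    Convolution2D(32, (3, 3), activation='relu', padding='same',
                  input_shape=(32, 32, 3)),
    BatchNormalization(),
    MaxPool2D((2, 2)),
    Flatten(),
    # densely connected subnet: dropout between the Dense layers
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])
```

The exact dropout rate and the number of BatchNormalization layers are things to experiment with, as the tips above suggest.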

Good luck! 🙂


I hope you found the above post interesting. If so, share it with your friends.

I invite you to the fourth part of the tutorial.

The post Convolutional neural network 3: convnets and overfitting appeared first on AI Geek Programmer.

Convolutional neural network 2: architecture – https://aigeekprogrammer.com/convolutional-neural-network-image-recognition-part-2/ (Wed, 25 Dec 2019)

A convolutional neural network provides some of the best classification results for images. In the previous post, you had the opportunity to learn what a convolution is and how to classify the CIFAR-10 dataset using a simple densely connected neural network. Along the way, we obtained an accuracy of 47% on the test set.

In the second part of the tutorial we go further:

  • we explain the basic concepts and the architecture of convolutional neural networks,
  • we build a simple convnet and check how it works on the CIFAR-10 dataset,
  • and we briefly explain what overfitting is – an issue we will deal with in the third part of this tutorial.

This post is the second part of the tutorial, so if you haven’t read the first one, I encourage you to read it first.

Convolutional neural network – architecture

Let’s start with the fact that a convolutional neural network consists of two sub-nets. The first converts the input tensor. The second is a classic densely connected neural network, terminated with a layer that classifies the input data into N classes – as in the example from the first part of the tutorial. The convolutional subnet usually processes three-dimensional data, i.e. data unprocessed except for normalization. A densely connected network, on the other hand, requires data flattened to one dimension.

Figure no. 1: Convolutional neural network – high-level architecture

If we adopt this somewhat artificial division into sub-nets (and knowing from another post how to classify image data using a classic neural network), we can basically focus only on the first element, i.e. the convolutional part.

Before we go deeper into the architecture of the convolutional network, let’s think for a moment about why Convnets handle images so well. If you read my post about handwriting classification, you may remember that all the digits were more or less in the center of the picture. It looked something like this:

The neural network did well with the classification, safe in the assumption that whatever is most interesting will always be in the center of the image. But wait – what if the digit were shifted to one side? Like this:

 

A classic neural network, additionally trained on centrally located digits, would probably not be able to handle such data. Here the term “translational invariance” comes in. It means that we can still recognize an object as the right object even if its appearance or location has changed to some extent: by shifting, rotation, change of size, colors, brightness, etc. As in the example above: we moved the zero to the upper left corner, and yet for the human brain it is unquestionably still a zero. For a network without translational invariance it will be a different object – probably impossible to classify for a network trained on centered symbols. It is worth noting that the word “translational” can be confusing: it has nothing to do with language translation. It was taken from geometry, where a translation means shifting each pixel in the same way.

Convnets are insensitive to location because their central element is the convolution operation, which processes each part of the image with the same filter values. In other words: the convolution does not look only at the center of the image or any other particular area. It runs the filter over the entire image and reports characteristic values in the places where it finds an interesting feature.

Figure no. 2: convolution – step one

Figure no. 2 presents graphically how we perform the convolution operation on a 5×5 pixel image labeled Input, using a filter of size 3×3. The filter was initialized with values that already appeared in the first part of the tutorial, when we tried to sharpen the edges of an image. The filter is projected onto the image (green frame), and then the value of each pixel under it is multiplied by the corresponding filter value and the products are summed. As a result, we get the value of -74. The calculations are shown under the output.

In the next step, the filter is moved one position to the right and the calculations are repeated. As a result we get the value of -96.

Figure no. 3: convolution – step two

For the last step we get the value of -43, as shown in Figure 4 below.

Figure no. 4: convolution – step nine

The filter runs all over the input, from left to right and from top to bottom, ultimately filling the output with values.
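The sliding-window computation above can be sketched in a few lines of NumPy (the input values and the sharpening-style filter below are illustrative; the exact pixel values from the figures are not reproduced):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, "valid" mode) and sum
    the elementwise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Illustrative 5x5 "image" and a 3x3 sharpening-style filter.
image = np.arange(25).reshape(5, 5)
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])
result = conv2d_valid(image, kernel)
print(result.shape)  # (3, 3): a 5x5 input shrinks to 3x3 with a 3x3 filter
```

Nine filter positions produce the nine output values, exactly as in Figures 2 to 4.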

Figure no. 5: moving the filter through the image

As you can see, image convolution can produce values below 0 as well as above 255. However, remember that the purpose of convolution in a neural network is to detect image features, not to produce an image for display. Hence, this does not really matter unless we want to display the image after convolution (which happens quite rarely during the training process). If we do want to visualize some intermediate state, we should first shift the values, and only then truncate those above 255 down to 255 and those below 0 up to 0. The question of what value to shift the data by is quite involved and goes beyond the scope of this tutorial. If you are interested, please refer to this thread on Stack Overflow first.
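For completeness, one simple way to rescale a feature map for display is min–max rescaling, a variant of the shift-and-truncate idea described above (not the only possible choice):

```python
import numpy as np

def to_displayable(feature_map):
    """Min-max rescale an arbitrary feature map into the 0..255 display range."""
    fm = feature_map.astype(float)
    fm -= fm.min()                 # shift so the minimum becomes 0
    if fm.max() > 0:
        fm *= 255.0 / fm.max()     # scale so the maximum becomes 255
    return fm.astype(np.uint8)

x = np.array([[-74.0, -96.0],      # values like those computed above...
              [300.0, -43.0]])     # ...including one beyond 255
img = to_displayable(x)
print(img.min(), img.max())  # 0 255
```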

In summary, data after convolution become insensitive to the location of the object. No matter where it is actually located, the convolution will be able to find it and return a characteristic set of data for it – a set later recognizable by the densely connected neural network. In this sense, Convnets are insensitive to changes in object location, and this is one of the main reasons why they classify images so well.

Like a classic neural sub-net, a convolutional sub-net can be multi-layered, which means that each subsequent layer is able to find more image features. What’s more, each convolutional layer is also multidimensional, because N filters are defined for each layer (see Figure 6 below). The filter values are initialized differently for each filter, so some filters will find image features important for correct classification better than others. The backpropagation algorithm will reduce the importance of ineffective filters and promote those that support correct classification. Therefore, after many, many iterations we will have a set of effective filters – effective in the sense of helping the network classify images correctly and, ultimately, generalize the classification process.

Figure no. 6: multilayered convolutional network

Another important element of Convnets is the so-called maxpooling operation. What maxpooling is, again, is best explained visually.

Figure no. 7: maxpooling operation

We analyze the values of four adjacent pixels, select the largest, and it becomes the output of the operation. In the next step, we move the window to the next group of pixels and repeat the calculation. Unlike convolution, maxpooling windows do not overlap. Therefore, in our example, the next window will contain the pixels 114, 105, 182 and 75.

Maxpooling can also be done over a larger window, e.g. 3×3, 4×4, etc. Note that M×M maxpooling reduces the number of pixels by a factor of M². So a 32×32 image (1024 pixels), after 2×2 maxpooling, will have a size of 16×16 = 256 pixels, i.e. 2² times fewer.

What is the purpose of maxpooling? You have already learned the main reason above: dimensionality is reduced without losing information relevant to the classification, which simplifies the problem computationally. After performing the convolution (here: after applying the filter to a given area of the image), we are only interested in whether a significant feature was found in that area or not. We are not interested in every single pixel value after convolution, only in the values that give a strong answer (a strong hint) to our network. Therefore, we look at the values of neighboring pixels and choose the largest – the most important – of them. Furthermore, maxpooling helps identify the elements of the image that are most prominent. As a result, we also achieve a higher level of translational invariance.

In addition to the maxpooling operation, there is also an average pooling (we calculate the average instead of the maximum) and minpooling, but they are not widely used in Convnets.
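The 2×2 maxpooling described above can be sketched in NumPy (here the top-right window contains the pixels 114, 105, 182 and 75 mentioned earlier; the remaining values are made up):

```python
import numpy as np

def maxpool2d(image, m=2):
    """Non-overlapping m x m max-pooling; height/width must be divisible by m."""
    h, w = image.shape
    return image.reshape(h // m, m, w // m, m).max(axis=(1, 3))

x = np.array([[23, 71, 114, 105],    # top-right 2x2 block holds the pixels
              [90, 40, 182,  75],    # 114, 105, 182 and 75 from the text
              [ 5, 10,  20,  30],
              [15, 25,  35,  45]])
pooled = maxpool2d(x)
print(pooled.tolist())  # [[90, 182], [25, 45]]
```

Each 2×2 block collapses to its maximum, so the 4×4 input becomes a 2×2 output.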

Dimensioning of convolutional neural network

As you may have noticed in Figures 2 to 5, convolving a 5×5 image with a 3×3 filter resulted in a new 3×3 “image”. Hence the simple conclusion that convolution may change the dimensions of the processed data, and it is worth understanding how these dimensions change and how to control them. Since I try to use the Keras API on this blog, let me quote a line of code, which we will see again later, defining the parameters of one of the convolutional layers:

Convolution2D(filters=32, kernel_size=(3,3), activation='relu', padding='same')

The above code creates a convolutional layer with 32 filters, each of size 3×3, and the output of the layer additionally passes through the relu activation function. The padding option is what matters here; it is set to 'same', which means that the converted image will keep its original size. How is this possible? As the name suggests, an artificially added margin (padding) filled with zeros is used. Keras and Tensorflow offer two ways of handling margins: same and valid.

We saw the valid mode in the examples shown in Figures 2 to 5. The filter does not go outside the image area, and thus the output image is smaller. There is even a simple formula for this: if the image is D×D (in our example 5×5) and the filter is N×N (here: 3×3), then the size of the resulting image is D − N + 1. For our example this gives 5 − 3 + 1 = 3.

The “same” mode preserves the image size, so the input and output of the convolutional layer have the same dimensions. To achieve this, Keras and other frameworks choose the margin size so that the filter fits over the padded image exactly as many times as the size of the input image. In our case, Keras will add a one-pixel margin (padding) to the 5×5 image.
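A small sketch of this margin arithmetic (for odd filter sizes and stride 1, where a zero margin of (N−1)/2 pixels is what effectively restores the input size):

```python
import numpy as np

def same_pad(image, n):
    """Zero margin of (n-1)//2 pixels on each side, for an odd n x n filter."""
    return np.pad(image, (n - 1) // 2, mode="constant")

img = np.zeros((5, 5))
padded = same_pad(img, 3)            # 5x5 -> 7x7 after a 1-pixel margin
out_size = padded.shape[0] - 3 + 1   # D - N + 1 applied to the padded image
print(padded.shape, out_size)  # (7, 7) 5
```

The "valid" sweep over the padded 7×7 image gives back a 5×5 output, i.e. the original input size.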

Figure 8 – padding types

It is worth noting that for small images – such as those in the CIFAR-10 dataset – padding in 'same' mode is usually the better choice, so that the output of each convolution does not shrink too quickly; otherwise we would not be able to add second and subsequent convolutional layers or perform maxpooling.

Example of dimensioning

Based on what we learned above, let’s try to trace the dimensions in a simple convolutional network. Let’s assume that it consists of three convolution operations, two maxpooling operations and two classic densely connected layers. Specifically:

  1. Conv1: padding = “valid”, filter size: 5×5, number of filters: 16
  2. Maxpool1: 2×2
  3. Conv2: padding = “same”, filter size: 3×3, number of filters: 8
  4. Maxpool2: 2×2
  5. Conv3: padding = “valid”, filter size: 3×3, number of filters: 4
  6. Dense1: number of neurons at the output: 50
  7. Dense2: number of neurons at the output: 10  (we classify into 10 classes)

Assuming that we have an image from the CIFAR-10 dataset at the input, how will the sizes change?

  1. Conv1: the image has a size of 32x32x3 and we convolve it with a 5×5 filter in valid mode. Note that the 5×5 filter is applied to each color channel and the results are summed. The resulting spatial size will therefore be 32 − 5 + 1 = 28. There are 16 filters, hence the output of Conv1 will be a 28x28x16 tensor.
  2. Maxpool1 will reduce the size of each dimension twice. Hence, the output from this layer will be a 14x14x16 tensor.
  3. Conv2: in this layer we are convolving in the same mode, so it will not change the dimensions of the image. There are 8 filters. Hence the output from Conv2 will be a 14x14x8 tensor.
  4. Maxpool2 will reduce the size of each dimension twice. Hence, the output from this layer will be a 7x7x8 tensor.
  5. Conv3: we do a convolution in valid mode, with a 3×3 filter. The resulting tensor size will therefore be 7-3 + 1 = 5×5. There are 4 filters. Hence the output from Conv3 will be a 5x5x4 tensor.
  6. Dense1: we need to flatten the data to pass it to the densely connected network. The result is a vector of size 5 * 5 * 4 = 100. This is the input to the first Dense layer, which has 50 neurons at its output; 50 will in turn be the size of the input vector to the last layer, Dense2.
  7. Dense2: on the input it receives data of size 50, on the output it has 10 neurons that classify the result with the softmax function.
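The same walk-through condensed into a few lines of plain Python (just the size arithmetic, not an actual Keras model):

```python
def conv_size(d, n, padding):
    """Spatial size after an n x n convolution of a d x d input (stride 1)."""
    return d if padding == "same" else d - n + 1  # "valid": D - N + 1

size = 32                           # CIFAR-10 spatial size (32 x 32 x 3)
size = conv_size(size, 5, "valid")  # Conv1 -> 28 (x 16 filters)
size //= 2                          # Maxpool1 -> 14
size = conv_size(size, 3, "same")   # Conv2 -> 14 (x 8 filters)
size //= 2                          # Maxpool2 -> 7
size = conv_size(size, 3, "valid")  # Conv3 -> 5 (x 4 filters)
flattened = size * size * 4         # input vector to Dense1
print(flattened)  # 100
```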

Classification of the CIFAR-10 dataset

I believe this much theory is enough 😉 and we are ready to build a simple convolutional neural network and use it to classify the CIFAR-10 dataset. We will see if we can beat the previous result (47%) obtained with the densely connected neural network.

This time I will use Google Colaboratory environment as we have easy access to the GPU processor there.

Because I want to use Tensorflow 2.x, and at the time of writing this post (December 2019) the default version in Colab is still 1.x, besides the standard imports we need to indicate the expected version of the Tensorflow library.

import numpy as np

%tensorflow_version 2.x
import tensorflow

import matplotlib.pyplot as plt
%matplotlib inline
>>> TensorFlow 2.x selected.

We import the Keras library and the cifar-10 dataset, and normalize the data to the range −0.5 to 0.5:

from tensorflow import keras
print(tensorflow.__version__)
print(keras.__version__)
>>> 2.1.0-rc1
>>> 2.2.4-tf

from tensorflow.keras.datasets import cifar10
(x_train,y_train), (x_test,y_test) = cifar10.load_data()
x_train.shape
>>> (50000, 32, 32, 3)

print(x_train.min(), "-", x_train.max())
>>> 0 - 255

x_train = (x_train / 255.0) - 0.5
x_test = (x_test / 255.0) - 0.5
print(x_train.min(), "-", x_train.max())
>>> -0.5 - 0.5

Next, we import the classes that we will need to build and train the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Convolution2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

Time to build a model from the imported classes. Model building with the Keras API is simple, and you can easily experiment with various architectures. You will quickly notice, however, that diagnosing whether a change helps is not so simple, as even such a simple convolutional neural network requires considerable computing power and – more importantly – your time spent waiting for results.

As for the proposed architecture, it will consist of:

  • two convolutional layers, each with 64 filters,
  • followed by 2×2 MaxPool layer,
  • then another two convolutional layers, each with 32 filters,
  • and the second layer of 2×2 maxpooling,
  • and finally the last pair of convolutions, with 16 filters each.
  • Since the images in the cifar-10 dataset are small, all convolutional layers use a 3×3 filter and 'same' padding, so as not to reduce the size of the processed data too quickly.
  • As each convolutional layer (like a classic densely connected layer) is linear, we add an activation function to the output of each of them, introducing the extra non-linearity that neural networks are so fond of.
  • The densely connected subnet expects flattened data, hence the Flatten layer.
  • The model ends with three Dense layers, of which the first two use the relu activation function, and the third classifies the result into one of 10 classes using the softmax function.

model = Sequential([
  Convolution2D(filters=64, kernel_size=(3,3), input_shape=(32,32,3), activation='relu', padding='same'),
  Convolution2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'),
  MaxPool2D((2,2)),
  Convolution2D(filters=32, kernel_size=(3,3), activation='relu', padding='same'),
  Convolution2D(filters=32, kernel_size=(3,3), activation='relu', padding='same'),
  MaxPool2D((2,2)),
  Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
  Convolution2D(filters=16, kernel_size=(3,3), activation='relu', padding='same'),
  Flatten(),
  Dense(units=512, activation="relu"),
  Dense(units=64, activation="relu"),
  Dense(units=10, activation="softmax")
])

Keras offers two useful ways to visualize a model. The first is the summary method, which not only shows the output shapes of each layer, but also quantifies the complexity of the model by counting the number of network parameters.

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 32) 18464
_________________________________________________________________
conv2d_3 (Conv2D) (None, 16, 16, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 8, 8, 16) 4624
_________________________________________________________________
conv2d_5 (Conv2D) (None, 8, 8, 16) 2320
_________________________________________________________________
flatten (Flatten) (None, 1024) 0
_________________________________________________________________
dense (Dense) (None, 512) 524800
_________________________________________________________________
dense_1 (Dense) (None, 64) 32832
_________________________________________________________________
dense_2 (Dense) (None, 10) 650
=================================================================
Total params: 631,658
Trainable params: 631,658
Non-trainable params: 0
_________________________________________________________________

The second way presents the model in graphical form, with the option of showing the shapes of the input and output data.

from tensorflow.keras.utils import plot_model
plot_model(model, 'model_info.png', show_shapes=True)

Figure 9 – graphic representation of the model

Next, we need to choose the optimizer, the loss function and the metrics we want to collect during training. Both the type of optimizer and its parameters (here: learning rate and momentum) are so-called hyperparameters of the model. They affect its behavior, and the speed and effectiveness of learning. These are, of course, only some of the many hyperparameters of our model: the architecture and complexity of the model itself are already critical to the network’s effectiveness, and the size of the filters, their number, the type of padding, the activation functions, the data normalization method and the maxpooling settings are further hyperparameters. The number of combinations is really large. It is worth taking advantage of tips that can easily be found on the internet, starting with hyperparameters that have already been checked and have given good results, and then trying to improve the results experimentally.

optim = SGD(lr=0.001, momentum=0.5)
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=['accuracy'])

After compiling the model, we can start the training process. With the parameters given below, it will last from several to several dozen minutes, depending on the processing unit available. Note that the result of the fit method is assigned to the history object, which collects data so that afterwards we can plot the characteristics of the training process. I have also added the validation_split parameter to the fit method, which specifies what fraction of the training data is reserved for validation. Validation takes place after each epoch, showing how the training on the training set performs on the validation set (which the network has not seen during learning in that epoch).

history = model.fit(
  x_train,
  to_categorical(y_train),
  epochs=80,
  validation_split=0.15,
  verbose=1
)
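With validation_split=0.15, Keras holds out the tail of the training arrays (taken before shuffling), so each epoch actually trains on fewer samples than the full set:

```python
# For CIFAR-10's 50,000 training images and validation_split=0.15:
n_total = 50000
n_val = int(n_total * 0.15)   # 7,500 examples held out for validation
n_train = n_total - n_val     # 42,500 examples actually trained on per epoch
print(n_val, n_train)  # 7500 42500
```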

The final verification takes place on the test set that the network has not previously seen:

eval = model.evaluate(x_test, to_categorical(y_test))
eval
>>> [1.6473742815971375, 0.6954]

As you can see, by using the convolutional neural network, we were able to increase the accuracy of classification from 47% to nearly 70%.

However, some interesting things can be observed in the data collected in the history object:

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

Figure 10 – learning and overfitting process

Figure 10 shows the training results on the training set (blue line) and the results on the validation set (orange line). As you can see, around the 15th epoch the two curves began to diverge. The accuracy on the training set gradually increased, eventually reaching around 95%. Meanwhile, validation did much worse, and the same applies to the test set.

What does it mean? Well, we are dealing here with so-called overfitting (marked with a green arrow in Figure 10): the model has learned to recognize objects from the training set almost perfectly, but cannot generalize this knowledge so as to correctly classify objects it has not seen before. This is not a good situation, because the network will not cope well with data from outside the training process. Fortunately, there are several techniques that reduce overfitting – we’ll cover them in the third part of the tutorial.


Thanks for reading! If you have anything to ask or share, please do comment below. I also invite you to the third part of the tutorial.

The post Convolutional neural network 2: architecture appeared first on AI Geek Programmer.

Convolutional neural network 1: convolutions – https://aigeekprogrammer.com/convolutional-neural-network-image-recognition-part-1/ (Sun, 24 Nov 2019)

The post Convolutional neural network 1: convolutions appeared first on AI Geek Programmer.

Deep neural networks are widely used in image and shape recognition. Examples of applications include face recognition, image analysis in medicine, handwriting classification, and detection of surrounding objects. A special type of neural network that handles image processing extremely well is a convolutional neural network.

I have to admit that ConvNet is my favorite deep neural network architecture and I like to use it whenever I have the opportunity. Hence, I have been looking forward to describing it in this series of posts 😎 . The series has the character of a practical tutorial, so I encourage you to join in and start coding along. If you don’t have a programming environment based on Keras and Tensorflow 2.0 yet, here you will find instructions on how to build it.

My idea for this tutorial was to reach for a demanding but widely available data set, and then show how a classic neural network does on such a set. In the next step we will check what results we get by putting a convolutional neural network to work, and how we can increase classification accuracy using various regularization techniques. Building a neural network does not always mean working from scratch: there are many proven architectures you can use. Note, however, that a large convolutional neural network can have quite a large demand for computing power – a good GPU will definitely be useful, especially later in the tutorial. There is a lot of work and material, so I have divided the content into several parts.

From the first part of the tutorial you will learn:

  • What is the cifar-10 data set? How to download it, load and prepare data for training?
  • How to implement the classification of the data set using the classic densely connected neural network?
  • What is a convolution?
  • How to do a convolution on a simple example?

In the following parts I will deal with topics such as:

  • What is a convolutional neural network and how does it work?
  • How to build a simple convolutional neural network using the Keras and Tensorflow libraries?
  • What is regularization, what regularization techniques are the most popular and how will they affect the training results?
  • How to use one of the reference architectures?
  • How much will our calculations be accelerated by a GPU?

Data set for a convolutional neural network

For the purposes of this tutorial we will use the popular cifar-10 data set. It contains 60,000 color images in 32×32 pixel format. The images are low-resolution and have been classified into 10 classes: airplanes, cars, ships and trucks, as well as cats, birds, deer, horses, dogs and frogs. The classes are nicely balanced: each of them has 6,000 photos.

The authors of the data set are Alex Krizhevsky, Vinod Nair and Geoffrey Hinton. More about the data set, how it was created and how to use it can be found in this work: Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky.

Loading the data set is pretty easy because it is available in the keras library:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow

print(tensorflow.__version__)
print(tensorflow.keras.__version__)

>>>2.0.0

>>>2.2.4-tf

from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

The load_data method divides the set into training and test data in a ratio of 50,000 to 10,000.

Let’s check what the first 10 labels look like and what is the shape of the training data:

y_train.shape
>>> (50000, 1)

y_train[0:10]
>>> array([[6], [9], [9], [4], [1], [1], [2], [7], [8], [3]], dtype=uint8)

x_train.shape
>>> (50000, 32, 32, 3)

As you can see, the labels are numbers from 0 to 9, where each number represents a given class, e.g. 1 is a car. The training data is an array of 50,000 elements, each containing an image with three channels (RGB) and a resolution of 32×32.

Let’s display 16 randomly selected pictures.

photos_count = 16
photos = np.zeros((photos_count,32,32,3), dtype=int)
desc = np.zeros((photos_count,1), dtype=int)
for i in range(photos_count):
   indeks = np.random.randint(0, 50000)
   photos[i] = x_train[indeks]
   desc[i] = y_train[indeks]

photos.shape
>>> (16, 32, 32, 3)

desc.shape
>>> (16, 1)

A simple dictionary can be used to label the pictures.

dict = {
   0: 'airplane',
   1: 'automobile',
   2: 'bird',
   3: 'cat',
   4: 'deer',
   5: 'dog',
   6: 'frog',
   7: 'horse',
   8: 'ship',
   9: 'truck',
}

The matplotlib.pyplot library provides us with extensive data presentation capabilities. We will use a small fraction of these possibilities to present randomly selected 16 photos in one picture:

fig = plt.figure()
for n, (picture, label) in enumerate(zip(photos, desc)):
   a = fig.add_subplot(4, 4, n + 1)
   plt.imshow(picture)
   a.set_title(dict[label[0]])
   a.axis('off')
fig.set_size_inches(fig.get_size_inches() * photos_count / 7)
plt.show()

Data for a convolutional neural network - cifar10

Classification using a classic neural network

Before we get to ConvNets, for reference purposes we will build a classifier on a “regular” densely connected neural network – that is, the one in which each neuron of a given layer is connected to each neuron of the next layer.

Densely connected neural networks expect “flat” input. We therefore have to flatten our color images to one-dimensional form. This can be done as below; you can also use the predefined Flatten layer, which flattens the data for you. I will use it a little later.

x_train = x_train.reshape((-1, 3072))
x_test = x_test.reshape((-1, 3072))

x_train.shape
>>> (50000, 3072)

The number 3072 results from multiplying the dimensions 32 x 32 x 3.

Each pixel in each channel stores the intensity of a color component, so the data should lie in the range 0 to 255. Let’s check this:

x_train.max()
>>> 255
x_train.min()
>>> 0

Neural networks work best when their inputs fall within the most responsive range of their activation functions. Therefore, it is good practice to normalize the input data. There are many types of normalization – I wrote more about it here. Here we normalize the data so that the mean falls around 0.

x_train = (x_train / 255)- 0.5
x_test = (x_test / 255)- 0.5

If we wanted to normalize the data around 0.5 instead (i.e. scale it to the range [0, 1]), the code would look like this:

x_train = (x_train / 255)
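As a quick sanity check, here is a small sketch (plain NumPy, with synthetic random pixels standing in for x_train) showing that the zero-centered variant lands in [-0.5, 0.5] with a mean near 0:

```python
import numpy as np

# synthetic stand-in for x_train: uint8-style pixel values in [0, 255]
x = np.random.randint(0, 256, size=(1000, 3072)).astype(np.float64)

x_centered = x / 255 - 0.5     # zero-centered variant
x_scaled = x / 255             # [0, 1] variant, centered around 0.5

print(x_centered.min() >= -0.5, x_centered.max() <= 0.5)  # True True
print(abs(x_centered.mean()) < 0.01)                      # True
```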

Now all we need to do is make the necessary imports, build a model, compile it, and start the training process. I wrote in more detail about how to build the model in this post:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

model = Sequential([
  Dense(1024, activation='tanh', input_shape=(3072,)),
  Dense(512, activation='tanh'),
  Dense(256, activation='tanh'),
  Dense(128, activation='tanh'),
  Dense(64, activation='tanh'),
  Dense(10, activation='softmax')
])

model.compile(
  optimizer='RMSprop',
  loss='categorical_crossentropy',
  metrics=['accuracy']
)

model.fit(
  x=x_train,
  y=to_categorical(y_train),
  epochs=15,
  shuffle=True
)

On the training set you should get an accuracy of around 53%.

Let’s check how our neural network will handle test data that it has not previously seen. We will use the evaluate method. If you are wondering why the to_categorical function was used in the fit and evaluate methods, I explained that in this post.

eval = model.evaluate(
  x_test,
  to_categorical(y_test)
)

eval
>>> [1.5109523288726807, 0.4749]

Our simple neural network achieved an accuracy of 47% on the CIFAR-10 test set. Given that we have a very simple network, as many as 10 classes and relatively low-quality images, this is a pretty decent result, especially compared to the 10% offered by random guessing. With this benchmark in hand, we could move on to ConvNets. However, we will begin by explaining what a convolution is.

Convolution is a mathematical operation

Yeah! Pretty revealing 😉 . Most of what happens in neural networks is in fact a mathematical operation, isn’t it? I will add, however, that convolution is quite simple and has only a few parameters, which you just need to be able to interpret in order to perform a convolution yourself.

Let’s start with what a convolution is and why we need it at all, since even a simple classic neural network can classify the set quite well, and if we worked on its architecture a bit and added regularization (more on which in the next parts of the tutorial), the result could probably be much better than 47%.

So what is convolution? I will use some examples. If any of you are interested in music, then you certainly know something about sound effects. The original sound can be modified by applying a sound-effect filter to it. In the case of an image, which is also a signal, applying a filter can give us the same image but with certain features highlighted or hidden: e.g. we can sharpen or blur the image, or identify the edges in it. If someone is a fan of photography, they have probably used the anti-aliasing effect more than once. It is also nothing more than applying the appropriate filter to the original signal. The operation of applying a filter to a signal is called convolution.

As you can see, there are thousands of uses for convolution, and we often don’t realize how frequently we encounter data filtering. However, in the context of image classification and ConvNets, we are interested in a convolution whose purpose is not to enrich the signal (here, the image) with special effects, but to transform it so that the neural network is able to better capture the characteristics of the image and, as a result, classify it more effectively.
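To make the filtering operation concrete, here is a minimal NumPy sketch of a “valid” convolution (strictly speaking, cross-correlation, which is what Conv2D layers actually compute); the function name conv2d_valid is ours, not part of any library:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise
    products at each position - the operation Conv2D layers perform."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
# the "identity" kernel just copies the center pixel of each window
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
print(conv2d_valid(image, identity))  # the inner 2x2 of `image`
```

Swapping the identity kernel for a sharpening, blurring or edge-detecting kernel reproduces the effects described above.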

The idea behind a convolutional neural network is to transform the original image before transferring it to a densely connected neural network.

What do we get thanks to a convolution?

  • after processing an image with a filter, certain image features are highlighted, which makes them easier to recognize. This is called feature extraction – the network itself finds the features relevant to the image,
  • we usually apply many filters at once and each one can highlight different features. For example, when recognizing faces, one will highlight the eyes, the other ears, the hairline or no hair 😉 ,
  • as a result of convolution we become independent of the object’s position in the image. Whether the aircraft is shown centrally or in the upper left corner will not matter to the convolutional network and will not negatively affect the classification. If you read my post about the classification of the MNIST data set, you probably remember that all the digits were presented centrally there. However, preparing such a “perfect” data set is difficult and time-consuming, and sometimes simply impossible,
  • we reduce noise in the analyzed images by focusing the network’s attention on key features,
  • usually one of the stages in a convolutional neural network is a layer performing so-called pooling, i.e. combining the values of several adjacent pixels into one. This significantly reduces the computing power needed to train the network without losing important information.
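The pooling idea from the last point can be shown in a few lines of NumPy (a sketch, not Keras’s MaxPooling2D, although a 2×2 max-pooling layer would give the same result for this input):

```python
import numpy as np

# 2x2 max pooling on a 4x4 feature map: keep the largest value in each block
fmap = np.array([[ 1,  2,  5,  6],
                 [ 3,  4,  7,  8],
                 [ 9, 10, 13, 14],
                 [11, 12, 15, 16]])

# reshape into 2x2 blocks, then take the maximum of each block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[ 4  8] [12 16]] - a quarter of the values, peaks preserved
```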

One important note: it is the training algorithm that selects the appropriate filters. This is not our choice, as we don’t supervise the training process at this level. Filters are initialized randomly, and then the backpropagation algorithm decides which filter values give the best classification results. As a consequence, the appearance of an image after a machine-learned convolution often doesn’t say much to the human eye, but it is somehow relevant to the neural network.

Convolution on a single picture

Knowing what a convolution is, before we even use it in neural networks, let’s try to do a convolution on a single photo and see what effects we can achieve.

We will need imports of several libraries, including those for image processing.

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline

We will also define a helper function that loads the image from the local disk, converts it to grayscale – so that we operate on one rather than three color channels – and returns the whole thing as a NumPy array.

def convert_image(file):
  return np.array(Image.open(file).convert('L'))

In this example I will use a photo which I took in the summer of 2019 on Crete. Unfortunately, the gray scale does not reflect how beautiful the sun can be there in September 😎 . The original file (though small in size) is available for download here.

image = convert_image(r'path-to-your-local-directory\house-small.jpg')
image.shape
>>> (302, 403)
plt.imshow(image, cmap='gray')

To implement the convolution we will use the Keras library:

import tensorflow as tf
print(tf.__version__)
>>> 2.0.0

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D

We imported the Sequential model and we’ll use the Conv2D layer to implement a two-dimensional convolution. The model is very simple – it contains only one layer with a 3×3 kernel (filter). We also need to indicate the size of the input image – we checked it above: 302 x 403.

We perform a convolution using one filter, i.e. the filters parameter should be set to 1.

model = Sequential(
  Conv2D(filters=1,
  kernel_size=(3,3),
  input_shape=(302, 403, 1))
)

The model in the Keras library contains a method that can be helpful in understanding what the model looks like.

model.summary()

>>> Model: "sequential"
>>> _________________________________________________________________
>>> Layer (type)                 Output Shape              Param #   
>>> =================================================================
>>> conv2d (Conv2D)              (None, 300, 401, 1)       10        
>>> =================================================================
>>> Total params: 10
>>> Trainable params: 10
>>> Non-trainable params: 0

It is worth noting the “Output Shape” after the convolution, as it is slightly different from the input shape. This is due to the way the convolution operation is carried out. I will not go into details here – more about that in the next part of the tutorial.
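For readers who want the quick arithmetic now: with no padding and stride 1, a dimension of n pixels convolved with a k-pixel kernel yields n − k + 1 outputs. A small sketch of the standard formula (the function name is ours, not a Keras API):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 302 x 403 input with a 3x3 kernel -> 300 x 401, as in the summary above
print(conv_output_size(302, 3), conv_output_size(403, 3))  # 300 401
```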

Another important issue is reformatting the image so that it is acceptable to the model. The input image is stored in a 302 x 403 array, while the Conv2D layer expects a 4-dimensional tensor. So we need to make the appropriate transformation, for example using the dedicated expand_dims method.

image4Conv = tf.expand_dims(image, 0)
image4Conv = tf.expand_dims(image4Conv, -1)
image4Conv.shape
>>> TensorShape([1, 302, 403, 1])

First, we added a leading dimension whose task is to store the index of the item in the data batch. When we train on large amounts of data, this definitely matters. We have one element / one photo here, so in theory it doesn’t matter, but Keras still expects the batch index on the first dimension of the received tensor. The next two dimensions contain the pixel coordinates and the last dimension the pixel value. In our example we are dealing with shades of gray, i.e. we have one number / one channel. If we were processing an RGB image, the tensor would have the shape [1, 302, 403, 3].
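The same four-dimensional layout can be illustrated in plain NumPy, using np.newaxis instead of tf.expand_dims (a sketch with a zero-filled array standing in for the photo):

```python
import numpy as np

gray = np.zeros((302, 403))                   # H x W grayscale image
batch = gray[np.newaxis, :, :, np.newaxis]    # -> (batch, H, W, channels)
print(batch.shape)  # (1, 302, 403, 1)
```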

If we were in the training process, the model designed in this way would have to be first compiled by setting the objective function and metrics, and then we would start the training by calling the fit method. However, we just want to pass the image through the Conv2D layer, with filter values randomly selected for now and see what effect we will get. Hence, we will use the predict method, which will do exactly what we want.

result = model.predict(image4Conv)
result.shape
>>> (1, 300, 401, 1)

As indicated by the summary method, we received a 300 x 401 tensor at the output, so the resulting image will be slightly smaller. To display the image we must first get rid of the previously added “artificial” dimensions.

result = tf.squeeze(result)
result.shape
>>> TensorShape([300, 401])

plt.imshow(result, cmap='gray')

Convolutional neural network - the result of a random convolution

As you can see, the picture was processed by the randomly initialized filter. If you run this or similar code in your environment on my image, you will get a different result, because a different random filter gives different final results.

If we want to get a predictable (specific) effect from our convolution, we have to set the filter values ourselves. Keras gives us this option, although in practice it is rarely used. It is described at the very bottom of this doc page, in the section “Using custom initializers“. It is worth emphasizing once again that the filter values are de facto parameters of the model, and the training algorithm changes them in an attempt to obtain the best final results. Thus, the operations we perform below are only intended to help understand the mechanism of convolution. While training a convolutional neural network, we will not interfere with the filter values, although we can initialize the filters in various ways, as described in the link above.

from tensorflow.keras import backend as K

def my_filter(shape, dtype=None):
  # We set the filter values to detect edges
  f = np.array([
     [[[-1]], [[-1]], [[-1]]],
     [[[-1]], [[ 8]], [[-1]]],
     [[[-1]], [[-1]], [[-1]]]
  ])
  return K.variable(f, dtype='float32')

The filter values are set so that the filter identifies vertical and horizontal edges, distinguishing them from the other elements of the image. Note that the shape of the filter array is (3,3,1,1) – this may not be visible at first glance – and that the filter elements sum to zero, so uniform areas of the image map to zero and only edges produce a response. (For brightness-preserving filters, such as the blur used later, the elements should instead sum to one; if they didn’t, the picture would come out darker or lighter.) It’s worth experimenting with different filter values. One small note: if you use your own image and it is significantly larger than 300 x 400 pixels, you need a slightly larger filter so that the effects of the convolution are visible to the naked eye.
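A quick NumPy sketch (independent of Keras): this edge-detection kernel sums to zero, so a uniform patch produces no response, while a sharp vertical edge does:

```python
import numpy as np

edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]])
print(edge.sum())            # 0

flat = np.full((3, 3), 100)  # a uniform (edge-free) patch
print(np.sum(flat * edge))   # 0: no response on flat regions

grad = np.tile([0, 0, 255], (3, 1))  # a sharp vertical edge
print(np.sum(grad * edge))           # non-zero: the edge is detected
```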

Now we have to build a model based on the filter initialized this way, perform a convolution, and display the original and processed images.

model_edge = Sequential(
  Conv2D(filters=1,
         kernel_size=(3,3),
         kernel_initializer=my_filter,
         input_shape=(302, 403, 1))
)

result_edge = model_edge.predict(image4Conv)
result_edge.shape
>>> (1, 300, 401, 1)

result_edge = tf.squeeze(result_edge)
result_edge.shape
>>> TensorShape([300, 401])

plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 142
fig = plt.figure()
ax1 = fig.add_subplot(2,1,1)
ax1.imshow(image, cmap='gray')
ax1.set_title('ORIGINAL')
ax2 = fig.add_subplot(2,1,2)
ax2.imshow(result_edge, cmap='gray')
ax2.set_title('AFTER CONVOLUTION')
plt.show()

Convolution - edge detection

You can see that the convolution has caught most of the edges.

Let’s try another filter, one that blurs the image. For this purpose, each filter element will have the value 1/n, where n is the number of filter elements. For example, for a 3×3 filter, each element should be 1/9. Because I would like the blur to be stronger (more visible), we will use a 7×7 filter with each element equal to 1/49. This way, all the filter elements still add up to one.
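A small NumPy check (independent of Keras) that such an averaging kernel preserves brightness:

```python
import numpy as np

# a 7x7 averaging (box-blur) kernel: every element 1/49, so the sum is 1
kernel = np.full((7, 7), 1.0 / 49.0)
print(kernel.sum())  # close to 1.0

# applied to a uniform patch, the overall brightness is preserved
patch = np.full((7, 7), 200.0)
print(np.sum(patch * kernel))  # close to 200.0
```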

def my_filter(shape, dtype=None):
  f = np.empty(shape=(7,7,1,1))
  f.fill(1/49)
  return K.variable(f, dtype='float32')

model_blur = Sequential(
  Conv2D(filters=1,
         kernel_size=(7,7),
         kernel_initializer=my_filter,
         input_shape=(302, 403, 1))
)
result_blur = model_blur.predict(image4Conv)
result_blur.shape
>>> (1, 296, 397, 1)

result_blur = tf.squeeze(result_blur)

fig = plt.figure()
ax1 = fig.add_subplot(2,1,1)
ax1.imshow(image, cmap='gray')
ax1.set_title('ORIGINAL')
ax2 = fig.add_subplot(2,1,2)
ax2.imshow(result_blur, cmap='gray')
ax2.set_title('AFTER CONVOLUTION')
plt.show()

Convolution - blurring an image

Ready to read the next part of this tutorial?


If you liked the tutorial above, please share it or recommend it to your friends.

The post Convolutional neural network 1: convolutions appeared first on AI Geek Programmer.

Development environment for machine learning
https://aigeekprogrammer.com/development-environment-for-machine-learning/
Sun, 20 Oct 2019
One of the first problems faced by AI students is how to build a development environment for machine learning. This is a thankless problem, because there are many methods and tools available and sometimes you simply don’t know which to choose or where to start. On top of that come the questions of which libraries to install, which IDE to use, and whether to use a GPU.

Personally, for the purposes of learning and experimenting, I am in favor of using the set of packages offered by Anaconda, conda manager and Jupyter notebook.

From this post you will learn:

  • What is Anaconda, conda and what are the alternatives?
  • How to install Anaconda?
  • How to create virtual development environment for machine learning using conda?
  • What packages to install to get started with machine learning?
  • How to install Tensorflow library in different versions?
  • How to install Tensorflow on GPU?

Differences between Anaconda, conda and pip

Anaconda is an open-source platform for machine learning and data science, available on most operating systems. It contains a set of over 1,500 programming and tool libraries, including conda and pip, which will be useful for building our environments. With Anaconda we have everything at hand, building an environment is quick and simple, and the platform keeps maintaining the libraries and the dependencies between them. Unfortunately, it also has its drawbacks: the more advanced features are paid, and the installation of Tensorflow 2.0 is not yet supported by conda (as of October 2019), so you need to use the pip manager.

Conda is a manager of virtual environments and programming libraries for Python (and other languages). It ships with Anaconda, and it lets you create separate environments that use different libraries or different versions, e.g. tensorflow 1.x, tensorflow 2.x, tensorflow-gpu, etc.

Pip is also a library / package manager. However, it does not provide the ability to create separate environments. If you would like to use pip and still work in a virtualized environment, you can use venv, the virtual environment manager for Python. A certain advantage of pip over conda (at least as of October 2019) is that tensorflow 2.0 is available from pip and is even the default version, while conda does not yet support tensorflow 2.0; if you want to build an environment based on version 2.0, you need a few workarounds, which are discussed below.
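For completeness, here is what the venv + pip route might look like (a sketch; the environment name tf-env is arbitrary, and the tensorflow install line is left commented out because it downloads a large package):

```shell
# create an isolated Python environment with venv (the name tf-env is arbitrary)
python3 -m venv tf-env
. tf-env/bin/activate           # on Windows: tf-env\Scripts\activate
python --version                # the interpreter now comes from tf-env
# pip install tensorflow       # would install TF into this environment only
deactivate
```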

It is worth adding here that there are plenty of great Python environments available in the cloud – just search for “cloud python ide”. In addition, if you, like me, enjoy Jupyter Notebook, Google offers Colaboratory, which is nothing more than a free Jupyter notebook running in the cloud, ready to use or requiring only minimal setup. Interestingly, it also allows you to use a GPU for free.

Despite this, I think that the local development environment for machine learning is always worth having, so let’s move on to its configuration.

Development environment for machine learning- step no 1

We download and install Anaconda for Python 3.7. Installation is carried out with the default settings. Only this screen may raise a question:

Building development environment for machine learning using Anaconda

 

As you can see, Anaconda does not recommend adding its path to the PATH variable. Consequently, to run e.g. conda we will have to go to the condabin subdirectory of the Anaconda installation folder, or use the Anaconda Prompt program (on Windows). Adding the Anaconda directory to PATH isn’t recommended because, in more complex setups, it can cause conflicts. Hence, I leave the recommended settings and start the installation.

Development environment for machine learning – step no 2

We run the console – in Windows, the cmd command – go to the condabin directory (alternatively: run Anaconda Prompt) and check the conda version:

>conda -V

If we want to make sure that we are using the latest version of conda, we can run the following command:

>conda update -n base -c defaults conda

Let’s create a new virtual environment. I call it my_env here, but the name can be anything:

>conda create --name my_env

To start working with the newly created environment, we run the following command:

>conda activate my_env

Switching to this environment is important; otherwise we will operate in the context of the base environment and there will be no environment separation. To see what packages are installed in the current environment, we issue the command:

> conda list

Since we did not indicate any packages or libraries when creating the environment, the list should be empty at the moment. To install packages we can use the command:

>conda install numpy pandas matplotlib pillow jupyter

We have listed only a few key packages, but conda, by examining dependencies, will install many more, including Python in the appropriate version. This is one of the biggest advantages of managers such as conda or pip. For anyone who would like to read a little more about the possibilities of conda, I recommend this Conda Cheat Sheet website.

At this point, we can stop for a moment with further installation and think about how you can install tensorflow:

  1. If we want one of the latest stable 1.x versions, we can install it using conda – this is the recommended way, because our environment will still be managed by a single package manager: conda.
  2. If we want to install version 2.0 today (October 2019), it is unfortunately not yet offered by conda and we need to use the pip manager.
  3. There is yet another path if we can use a GPU.

To walk each of these paths easily, we’ll now clone our environment: we deactivate it and use the clone option:

>conda deactivate
>conda create -n my_env-20 --clone my_env
>conda create -n my_env-gpu --clone my_env

As a result, we have three twin environments: my_env, my_env-20 and my_env-gpu, and on each of them we may separately proceed with the corresponding installation type. We simply switch between these environments with the deactivate / activate commands.

Development environment for machine learning – step no 3A

We activate the my_env environment and install tensorflow in the latest version available in the Anaconda repository:

>conda activate my_env
>conda install tensorflow

If we want to check in what version Python has been installed:

>python --version
>>>Python 3.7.4

If we want to see the tensorflow version, I suggest you do it from the Jupyter Notebook level (by the way we will check if it works correctly):

>jupyter notebook
# And in a notebook:

import tensorflow as tf
print(tf.__version__)
>>> 1.14.0

Development environment for machine learning – step no 3B

Let’s switch to the my_env-20 environment and try to install tensorflow version 2.0. Unfortunately, in this situation we cannot use conda, because as of today (October 2019) it does not have this version of tensorflow in its repository. Another package manager – pip – then comes into play.

>conda deactivate
>conda activate my_env-20
>conda install pip
>>># All requested packages already installed.

As you can see, pip is already installed – it was pulled in as a dependency when installing the base packages. We install Tensorflow using pip:

>pip install tensorflow
>python --version
>python -c "import tensorflow as tf; print(tf.__version__);"

As a result, we will get an environment with Python version 3.7.4 and tensorflow version 2.0.0. Unfortunately, it will not be an ideal environment. The whole idea of building an environment with conda assumes that this package manager tracks installations and all dependencies, which keeps the environment consistent and allows it to be supplemented and updated at any time. When pip enters the picture, conda partially loses this information and with it some control over the environment. The general rule in such situations is that pip installations should be carried out at the very end, and after a pip installation no further conda installations should be performed; otherwise the environment may become unstable. More about the potential problems of this configuration and good practices in this area can be found in this article.

I will add that one more important difference between tensorflow installed using conda vs. pip is that tensorflow from conda can be up to 8 times more efficient, due to the way the package is built in the Anaconda repository.

Development environment for machine learning – step no 3C

The last option for building your development environment for machine learning is to configure it to use the power of a GPU, which dramatically speeds up the training process. First, the main caveat: only those with a CUDA-enabled graphics card can use the GPU. You can check yours at this address. Note: many people also check here, but that list is not up to date (as of October 2019); for example, my GeForce GTX 1660 Ti graphics card is not on it, and it can undoubtedly be used by tensorflow.

The installation with conda is currently very simple and in no way resembles the very complicated installation process from a few months ago, mainly because when installing tensorflow-gpu, conda also fetches the cudatoolkit and cudnn packages by itself – something that once had to be done laboriously by hand. In the first step, we switch to the appropriate environment clone, and then install the tensorflow-gpu package:

>conda deactivate
>conda activate my_env-gpu
>conda install tensorflow-gpu

And that’s it 🙂 . All you need to do now is make sure that our environment actually “sees” and uses a GPU unit. It’s best to use a Jupyter notebook:

>jupyter notebook
# In the notebook:
import tensorflow as tf
print(tf.__version__)
>>> 1.14.0
tf.test.is_gpu_available()
>>>True
tf.test.gpu_device_name()
>>> '/device:GPU:0'

That’s all for today 🙂 . I hope you will now be able to easily build your own development environment for machine learning. Good luck with your learning process!


If the post was helpful, like it and share it with people who might be interested – thank you.

Looking for more reading? Check my other posts:

The post Development environment for machine learning appeared first on AI Geek Programmer.
