
# Neural Network from scratch - Part 2

Sergios Karagiannakos on 2018-08-01 · 2 mins

In this part, we are going to improve our neural network library by adding a convolutional neural network structure to train on a dataset of images. Few would argue against convolutional neural networks being the best way to classify images, which is why they are so widely used in computer vision systems. The goal, of course, is once again to use GPUs and OpenCL, as ConvNets require more computing resources and memory than plain fully connected networks.

Let's begin. First of all, we have to remember that ConvNets in their simplest form consist of a convolutional layer, a pooling layer and a fully connected layer. Luckily, we have already implemented the last one, so all that remains are the first two.

## Convolutional layer

This time I am not going to go into much detail about the C++ part and how we build the basic structure of our ConvNet (I did that in the first part for fully connected layers). Instead, I will dive into the kernel code, which I think is the most interesting part. In these layers, we convolve the input image with a small kernel and acquire the feature map.

```cpp
kernel void convolve(global float* image, global Filter* filters,
                     global float* featMap, int filterWidth,
                     int inWidth, int featmapdim) {
    const int xIn = get_global_id(0); // cols
    const int yIn = get_global_id(1); // rows
    const int z   = get_global_id(2); // filters

    float sum = 0;
    for (int r = 0; r < filterWidth; r++) {
        for (int c = 0; c < filterWidth; c++) {
            sum += filters[z].weights[c * filterWidth + r] *
                   image[(xIn + c) + inWidth * (yIn + r)];
        }
    }
    sum += filters[z].bias;
    featMap[xIn + yIn * featmapdim + z * featmapdim * featmapdim] = relu(sum);
}
```

As you can tell, we rely on the assumption that each pixel of the feature map can be computed in parallel, as it is inherently independent from all the others. So if we have a 28x28 image and use a 5x5 kernel, we need 24x24 = 576 threads to run simultaneously. The backward propagation is a little trickier, because there are not many online resources that actually provide the equations for a convolutional layer.
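To see why each feature-map pixel is independent, here is a minimal CPU sketch of the same convolution, for a single filter. The function name and signature are illustrative, not part of the library; the activation is the same ReLU the kernel applies:

```cpp
#include <vector>

// CPU reference for the convolve kernel: each output pixel is an independent
// dot product of the filter with one image patch, which is why every
// (x, y, filter) triple can map to its own GPU thread.
std::vector<float> convolveCPU(const std::vector<float>& image, int inWidth,
                               const std::vector<float>& weights, int filterWidth,
                               float bias) {
    int featmapdim = inWidth - filterWidth + 1;   // e.g. 28 - 5 + 1 = 24
    std::vector<float> featMap(featmapdim * featmapdim);
    for (int y = 0; y < featmapdim; ++y)
        for (int x = 0; x < featmapdim; ++x) {
            float sum = bias;
            for (int r = 0; r < filterWidth; ++r)
                for (int c = 0; c < filterWidth; ++c)
                    sum += weights[c * filterWidth + r] *
                           image[(x + c) + inWidth * (y + r)];
            featMap[x + y * featmapdim] = sum > 0.f ? sum : 0.f; // ReLU
        }
    return featMap;
}
```

On the GPU the two outer loops disappear: each iteration becomes one thread.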

$$
\begin{aligned} \frac{\partial E}{\partial w_{x, y}^{l}} &=\sum_{x^{\prime}} \sum_{y^{\prime}} \delta_{x^{\prime}, y^{\prime}}^{l} f\left(o_{x^{\prime}-x, y^{\prime}-y}^{l-1}\right) \\ &=\delta_{x, y}^{l} * f\left(o_{-x,-y}^{l-1}\right) \\ &=\delta_{x, y}^{l} * f\left(\operatorname{rot}_{180^{\circ}}\left(o_{x, y}^{l-1}\right)\right) \\ \frac{\partial E}{\partial o_{x, y}^{l}} &=\sum_{x^{\prime}} \sum_{y^{\prime}} \delta_{x^{\prime}, y^{\prime}}^{l+1} w_{x^{\prime}-x, y^{\prime}-y}^{l+1} f^{\prime}\left(o_{x, y}^{l}\right) \\ &=\delta_{x, y}^{l+1} * w_{-x,-y}^{l+1} f^{\prime}\left(o_{x, y}^{l}\right) \\ &=\delta_{x, y}^{l+1} * \operatorname{rot}_{180^{\circ}}\left(w_{x, y}^{l+1}\right) f^{\prime}\left(o_{x, y}^{l}\right) \end{aligned}
$$

If we translate the above into OpenCL code, we get:

```cpp
kernel void deltas(global Node* nodes, global Node* nextnodes,
                   global float* deltas, global int* indexes,
                   int dim, int nextnumNodes, int pooldim) {
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z   = get_global_id(2);

    int i = xIn + yIn * pooldim + z * pooldim * pooldim;

    float delta = 0;
    for (int j = 0; j != nextnumNodes; j++)
        delta += nextnodes[j].delta * nextnodes[j].weights[i];
    delta *= devsigmoid(nodes[i].output);

    // Route the delta back to the "winning unit" of the 2x2 pooling window
    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 2; c++) {
            if ((c * 2 + r) == indexes[i])
                deltas[(2 * xIn + r) + (2 * yIn + c) * dim + z * dim * dim] = delta;
        }
    }
}
```

```cpp
kernel void backpropcnn(global float* featMap, global float* deltas,
                        global Filter* filters, int featmapdim,
                        int imagedim, int filterdim, float a,
                        global float* Image) {
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z   = get_global_id(2);

    float sum = 0;
    for (int r = 0; r < featmapdim; r++) {
        for (int c = 0; c < featmapdim; c++) {
            sum += deltas[c + r * featmapdim + z * featmapdim * featmapdim] *
                   Image[(xIn + r) + imagedim * (yIn + c)];
        }
    }
    filters[z].weights[xIn + filterdim * yIn] -= a * sum;
}
```
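The weight update above is just the first equation, dE/dw, computed as a cross-correlation of the deltas with the input, followed by a gradient-descent step. A CPU sketch for a single filter makes this explicit (the function name and signature are illustrative, not from the library):

```cpp
#include <vector>

// CPU sketch of the backpropcnn kernel for one filter: the gradient of each
// filter weight is the cross-correlation of the layer's deltas with the
// input image, and the weight then takes a gradient-descent step.
void updateFilterCPU(std::vector<float>& weights, int filterdim,
                     const std::vector<float>& image, int imagedim,
                     const std::vector<float>& deltas, int featmapdim,
                     float lr) {
    for (int y = 0; y < filterdim; ++y)
        for (int x = 0; x < filterdim; ++x) {
            float sum = 0.f;   // dE/dw for weight (x, y)
            for (int r = 0; r < featmapdim; ++r)
                for (int c = 0; c < featmapdim; ++c)
                    sum += deltas[c + r * featmapdim] *
                           image[(x + r) + imagedim * (y + c)];
            weights[x + filterdim * y] -= lr * sum;   // gradient-descent step
        }
}
```

On the GPU, each of the filterdim x filterdim weights is again one thread.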

## Pooling layer

The pooling layer is just a downsampling of the feature map into a new, smaller map. There are two kinds of pooling, average and max pooling, with the second being the most common. In max pooling, we define a filter (usually of size 2x2) and slide it over the feature map. The goal is simply to extract the maximum value inside each filter window.

```cpp
kernel void pooling(global float* prevfeatMap, global float* poolMap,
                    global int* indexes, int Width, int pooldim) {
    const int xIn = get_global_id(0);
    const int yIn = get_global_id(1);
    const int z   = get_global_id(2);

    float max = 0;
    int index = 0;
    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 2; c++) {
            // read the 2x2 window with stride 2 inside the z-th feature map
            // (this matches the upsampling indices used in the deltas kernel)
            float value = prevfeatMap[(2 * xIn + r) + (2 * yIn + c) * Width
                                      + z * Width * Width];
            if (value > max) {
                max = value;
                index = c * 2 + r; // remember the winning unit for backprop
            }
        }
    }
    poolMap[xIn + yIn * pooldim + z * pooldim * pooldim] = max;
    indexes[xIn + yIn * pooldim + z * pooldim * pooldim] = index;
}
```
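To make the indexing easier to follow, here is a plain C++ sketch of the same max-pooling logic for a single feature map, with a 2x2 window and stride 2. The function name is illustrative, not part of the library:

```cpp
#include <vector>

// CPU sketch of 2x2 max pooling with stride 2, also recording the position
// of the winning unit inside each window so backprop can route gradients.
void maxPoolCPU(const std::vector<float>& featMap, int width,
                std::vector<float>& poolMap, std::vector<int>& indexes) {
    int pooldim = width / 2;
    poolMap.assign(pooldim * pooldim, 0.f);
    indexes.assign(pooldim * pooldim, 0);
    for (int y = 0; y < pooldim; ++y)
        for (int x = 0; x < pooldim; ++x) {
            float best = featMap[2 * x + 2 * y * width];
            int idx = 0;
            for (int r = 0; r < 2; ++r)
                for (int c = 0; c < 2; ++c) {
                    float v = featMap[(2 * x + r) + (2 * y + c) * width];
                    if (v > best) { best = v; idx = c * 2 + r; }
                }
            poolMap[x + y * pooldim] = best;    // downsampled value
            indexes[x + y * pooldim] = idx;     // winner's offset in the window
        }
}
```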

As far as backward propagation is concerned, there are no actual gradient calculations; all we need to do is upsample the matrix. In fact, we pass the gradient to the "winning unit" of the forward propagation. That is why we built an indexes matrix in the snippet above, in which we keep the positions of all the "winning units". This functionality is visible in the "deltas" kernel, which is responsible for calculating the gradient errors.
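In isolation, the upsampling step looks like this on the CPU (a minimal sketch for one feature map; the function name is illustrative): every pooled delta is written back only to the position of the stored winning unit, and every other position stays zero.

```cpp
#include <vector>

// Route each pooled-layer delta back to the "winning unit" recorded during
// forward max pooling; all other positions in the upsampled map stay zero.
std::vector<float> upsampleDeltas(const std::vector<float>& poolDeltas,
                                  const std::vector<int>& indexes,
                                  int pooldim) {
    int dim = pooldim * 2;                      // 2x2 pooling doubles each side
    std::vector<float> deltas(dim * dim, 0.f);
    for (int y = 0; y < pooldim; ++y)
        for (int x = 0; x < pooldim; ++x) {
            int i = x + y * pooldim;
            int r = indexes[i] % 2;             // winner's offset in the window
            int c = indexes[i] / 2;             // (index was stored as c*2 + r)
            deltas[(2 * x + r) + (2 * y + c) * dim] = poolDeltas[i];
        }
    return deltas;
}
```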

To run the kernels, we follow these steps:

• Pass the data in matrix format (OpenCV can be used for that)

• Define the cl::Buffers and the cl::Kernels

• Run the kernels with the following code:

```cpp
// Forward
convKern.setArg(0, d_InputBuffer);
convKern.setArg(1, d_FiltersBuffer);
convKern.setArg(2, d_FeatMapBuffer);
convKern.setArg(3, filterdim);
convKern.setArg(4, inputdim);
convKern.setArg(5, featmapdim);
err = (OpenCL::clqueue).enqueueNDRangeKernel(convKern, cl::NullRange,
          cl::NDRange(featmapdim, featmapdim, convLayer.numOfFilters),
          cl::NullRange);

poolKern.setArg(0, d_FeatMapBuffer);
poolKern.setArg(1, d_PoolBuffer);
poolKern.setArg(2, d_PoolIndexBuffer);
poolKern.setArg(3, featmapdim);
poolKern.setArg(4, pooldim);
err = (OpenCL::clqueue).enqueueNDRangeKernel(poolKern, cl::NullRange,
          cl::NDRange(pooldim, pooldim, convLayer.numOfFilters),
          cl::NullRange);

// Backward
err = (OpenCL::clqueue).enqueueNDRangeKernel(backpropcnnKern, cl::NullRange,
          cl::NDRange(filterdim, filterdim, convLayer.numOfFilters),
          cl::NullRange);
```