Add Layers To A Neural Network In TensorFlow

Add Multiple Layers to a Neural Network in TensorFlow by working through an example where you add multiple ReLU layers and one convolutional layer


Video Transcript


Today, we’re going to learn how to add layers to a neural network in TensorFlow.


Right now, we have a simple neural network that reads the MNIST dataset, which consists of a series of images, runs them through a single fully connected layer with rectified linear (ReLU) activation, and uses that to make predictions.

# add-layers.py
#
# to run
# python add-layers.py
#
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, shape=[None, 784])

W = tf.get_variable("weights", shape=[784, 10],
                    initializer=tf.glorot_uniform_initializer())

b = tf.get_variable("bias", shape=[10],
                    initializer=tf.constant_initializer(0.1))

y = tf.nn.relu(tf.matmul(x, W) + b)

y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for step in range(50):
    print(f"training step: {step}")
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if step % 10 == 0:
        print("model accuracy: ")
        print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                            y_: mnist.test.labels}))

print("final model accuracy: ")
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))


We have accuracy reporting that tells us how well we’re doing, as you can see.

# Command line
$ python add-layers.py

However, an accuracy of 0.6958 on MNIST is really bad.

The top-quality networks get around 99% accuracy. We want to try to get closer to that.


So a simple way to do that is just to add more layers.

W1 = tf.get_variable("weights1", shape=[784, 10],
                    initializer=tf.glorot_uniform_initializer())

b1 = tf.get_variable("bias1", shape=[10],
                    initializer=tf.constant_initializer(0.1))

W2 = tf.get_variable("weights2", shape=[784, 10],
                    initializer=tf.glorot_uniform_initializer())

b2 = tf.get_variable("bias2", shape=[10],
                    initializer=tf.constant_initializer(0.1))

W3 = tf.get_variable("weights3", shape=[784, 10],
                    initializer=tf.glorot_uniform_initializer())

b3 = tf.get_variable("bias3", shape=[10],
                    initializer=tf.constant_initializer(0.1))

So we’re going to go to three fully connected layers.

To do that, we first have to make sure that all of our variable names make sense, so we’ll number the weights W1, W2, W3 and the biases b1, b2, b3.


We have to define our network.

out1 = tf.nn.relu(tf.matmul(x, W1) + b1)
out2 = tf.nn.relu(tf.matmul(out1, W2) + b2)
y = tf.nn.relu(tf.matmul(out2, W3) + b3)

So we’re going to say out1, out2, and then y here.

So we’re going to multiply the input times the first weight matrix and then add the first bias vector.

Then we’ll take the output of that, out1, and multiply it by the second weight matrix and add the second bias vector.

Finally, we’re going to take the output of our second layer and multiply it by our remaining weight matrix and bias vector.


We’re going to need to change the sizes.

So we’ll keep the original input dimension, 784, the same.


Then the first layer’s output dimension will go to 350:

W1 = tf.get_variable("weights1", shape=[784, 350],
                    initializer=tf.glorot_uniform_initializer())

b1 = tf.get_variable("bias1", shape=[350],
                    initializer=tf.constant_initializer(0.1))


Then the second layer will go from 350 to 175:

W2 = tf.get_variable("weights2", shape=[350, 175],
                    initializer=tf.glorot_uniform_initializer())

b2 = tf.get_variable("bias2", shape=[175],
                    initializer=tf.constant_initializer(0.1))

and we’ll add these new definitions as new lines right after the first layer’s.


The final one is going to go from 175 to 10.

W3 = tf.get_variable("weights3", shape=[175, 10],
                    initializer=tf.glorot_uniform_initializer())

As you can see, all of the dimensions now line up for the calculations.

x is a 784-dimensional tensor, we multiply it by a 784x350 matrix, and so on down the chain.
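
Here’s a quick trace of the shapes (this isn’t in the lesson; it just restates the definitions above), which you can confirm with get_shape:

# x:    [None, 784]
# out1 = relu(x    x W1 + b1) -> [None, 350]
# out2 = relu(out1 x W2 + b2) -> [None, 175]
# y    = relu(out2 x W3 + b3) -> [None, 10]

print(out1.get_shape())   # (?, 350)
print(y.get_shape())      # (?, 10)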


This should work smoothly.

# Command line
$ python add-layers.py

As you can see, the accuracy is a lot better; we gained about 20 percentage points there.


However, it takes a little bit longer to calculate.

Once you start adding multiple layers and your network gets deeper, that’s probably about the time you’d want to use a GPU for your computations.
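
If you’re not sure whether your TensorFlow build can see a GPU, a quick check (tf.test.is_gpu_available is part of the 1.x API used in this lesson) is:

print(tf.test.is_gpu_available())   # True if TensorFlow can see a usable GPU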

We could go as far as we want with this and keep adding layers and layers and layers.

But if we were to do that, you would see that each additional layer gives a smaller and smaller improvement.


Another thing we might want to do instead is add in some sort of convolutional layer.


So the first thing we want to do is reshape our input.

conv1_input = tf.reshape(x, [-1, 28, 28, 1])

Currently, our input is a 784-dimensional tensor, which you can think of as one really long vector. We need it to be in the shape of a matrix, so we use the reshape function.
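
If it helps to see that reshape concretely, here’s a small NumPy sketch (NumPy isn’t used in the lesson; this is just for illustration) of a batch of 100 flattened images being turned back into 28x28 single-channel images:

import numpy as np

batch = np.zeros((100, 784))             # 100 flattened images
images = batch.reshape(-1, 28, 28, 1)    # -1 lets the batch size be inferred
print(images.shape)                      # (100, 28, 28, 1)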


Then we’ll define our layer.

conv1 = tf.layers.conv2d(inputs=conv1_input,
                         filters=3,
                         kernel_size=[5, 5],
                         padding='valid')

A lot of what we’re doing here doesn’t really have much rhyme or reason to it.

It’s just kind of ad hoc.

Four filters would probably work just as well as three, and a kernel size of 4x4 or 6x6 would work basically the same.

I just wanted to come up with something that seems reasonable.

So the input to our convolutional layer has to be a matrix instead of a vector, and its output is a matrix as well.


Now that we’ve defined our convolutional layer, remember that its output is a matrix rather than a vector.

So we’re going to need to figure out what size the output is so that we can reshape it into a vector to feed into our fully connected layers.

conv1 = tf.reshape(conv1, [-1, ???])


We’re going to call the reshape function in a similar way as above.

But first, we need to know what size the output layer is.

To do that, we’re going to call the get_shape function and then just quit.

#conv1 = tf.reshape(conv1, [-1, ???])

print(conv1.get_shape())
quit()


# Command line
$ python add-layers.py

As you can see, we now have a 24x24x3 dimensional tensor.


# Python Interpreter
>>> 24 * 24 * 3
1728

24x24x3 is 1728.
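
In case you’re wondering where the 24 comes from: with padding='valid' (no zero-padding) and the default stride of 1, a 5x5 kernel only fits in 28 - 5 + 1 = 24 positions along each axis, and we asked for 3 filters. A quick sketch of that arithmetic:

input_size, kernel_size, filters = 28, 5, 3

output_size = input_size - kernel_size + 1        # 'valid' padding, stride 1
print(output_size)                                # 24
print(output_size * output_size * filters)        # 1728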


So we’re going to reshape it into a 1728 dimensional tensor.

conv1 = tf.reshape(conv1, [-1, 1728])

W1 = tf.get_variable("weights1", shape=[1728, 350],
                    initializer=tf.glorot_uniform_initializer())


Now, this should work as we want it to.

# Command line
$ python add-layers.py

If you notice, the accuracy is pretty good here, but it also makes the model a lot slower.

Again, a GPU starts to get really useful once you have multiple convolutional layers.

In fact, if you notice, the accuracy didn’t really plateau at any point here, so we could probably add even more training steps and get a better result.
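
For example (these numbers are just a guess at something reasonable, not from the lesson), bumping the training loop from 50 steps to a few hundred is a one-line change:

for step in range(500):    # was range(50); more steps, same batch size
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})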



Full Source Code For Lesson

# add-layers.py
#
# to run
# python add-layers.py
#
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, shape=[None, 784])

conv1_input = tf.reshape(x, [-1, 28, 28, 1])
conv1 = tf.layers.conv2d(inputs=conv1_input,
                         filters=3,
                         kernel_size=[5, 5],
                         padding='valid')

conv1 = tf.reshape(conv1, [-1, 1728])

W1 = tf.get_variable("weights1", shape=[1728, 350],
                    initializer=tf.glorot_uniform_initializer())

b1 = tf.get_variable("bias1", shape=[350],
                    initializer=tf.constant_initializer(0.1))

W2 = tf.get_variable("weights2", shape=[350, 175],
                    initializer=tf.glorot_uniform_initializer())

b2 = tf.get_variable("bias2", shape=[175],
                    initializer=tf.constant_initializer(0.1))

W3 = tf.get_variable("weights3", shape=[175, 10],
                    initializer=tf.glorot_uniform_initializer())

b3 = tf.get_variable("bias3", shape=[10],
                    initializer=tf.constant_initializer(0.1))


out1 = tf.nn.relu(tf.matmul(conv1, W1) + b1)
out2 = tf.nn.relu(tf.matmul(out1, W2) + b2)
y = tf.nn.relu(tf.matmul(out2, W3) + b3)

y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

# We'll use these to measure how accurate our model's predictions are
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for step in range(50):
    #print(f"training step: {step}")
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if step % 10 == 0:
        print("model accuracy: ")
        print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                            y_: mnist.test.labels}))

print("final model accuracy: ")
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))
