Kinetic Creatures

Elly the Kinetic Creature

A Christmas gift that I received from my lovely wife is a kinetic creature. This is a cardboard model animal (mine is an elephant named Elly) that can walk thanks to some clever design. The walking mechanism is based on a simple crankshaft paired with cleverly designed legs that act as linkages. Here’s a slideshow of the construction and a video of it in action:


One exciting thing about this toy is that you can replace the hand crank with a motor. They sell a kit for this but I plan to make my own.

Laser Cut Windmill


I’ve been learning how to use the laser cutter at NextFab over the last few months. I’ve already posted about the laser cut Christmas ornaments that I made, but I also took on a larger project for the Holidays: I decided to make a toy windmill for my nephew who loves anything that spins, especially fans and gears. This turned out to be quite the design challenge, requiring several different versions and iterations – some of which never made it out of the computer.

Roughly speaking, I specified what I wanted in the design, created it in SketchUp, exported and touched it up in Illustrator, laser-cut it out, and finally sanded and glued pieces together. I actually went through the full process twice, but many steps required several more passes.

Windmill Design Goals

  • It is a wooden toy
  • It can be assembled by hand by a young child
  • The windmill blades spin
  • The tower looks like a windmill tower
  • The gears are functional
  • The windmill blades turn with the gears

SketchUp

I used SketchUp to design several versions of the windmill. It was a great way to iterate quickly and to create the gears (there’s a very simple plugin for that). I don’t have the Pro license, so exporting was tough. Luckily, NextFab has a copy of Rhino, which can import a SketchUp file and export it to Adobe Illustrator (just make sure the scale factor is 1:1).

Here are sample snapshots from various versions of the windmill. I think the box at the top of the windmill and the base were the pieces that changed the most.


Illustrator

After exporting from SketchUp to Adobe Illustrator, there’s still a lot to do. I found that first joining the line segments into a single path made the cutting much more consistent; instead of jumping around to cut individual segments, the laser cutter followed a complete path, which eliminated gaps in the cutting. SketchUp exports circles as a series of line segments, which is OK, but for cleaner looking pieces I redrew them in Illustrator as actual circles. Because the laser cutter has a color-coded preset cutting order, I recolored the pieces to make sure they were cut in the right order (black cuts first, then red, then blue, etc.). Finally, I grouped the pieces and distributed them on the artboard (which maps to my wooden board) to maximize the number of pieces I could cut from one board. Because I often needed multiple copies of a single piece (e.g. a gear), I usually made copies from one finished piece in Illustrator.

 


Laser Cutting

Once the files were done in Illustrator, I loaded them onto the laser cutter’s computer. A laser cutter works a lot like a really large printer: you press print in Illustrator, adjust some settings, and it opens a program dedicated to the laser cutter. Before sending the file, you put the wood in the laser cutter, raise the bed so that the laser is in focus on the wood, and move the laser left and right to position it where you want to start cutting on the wood. Then you send the file from the computer to the laser cutter.

Here’s a video of the laser cutter in action:

You can’t actually see the laser beam since all of its light is going straight down to the wood where it very quickly burns through. The fan blows air in to clear the smoke away. The big plastic cover is darkened to protect your eyes from the laser.

Sanding and Gluing

The laser cutter can leave behind some unwanted smoke marks, but they are easily sanded off. The most important pieces to sand were the gears designed to meet at a 90 degree angle at the top of the tower. Laser cutting is limited to two dimensional shapes, but to fit gears together at a 90 degree angle the teeth have to be angled at 45 degrees. My solution was to shape the flat gears on a sanding belt until they ran together smoothly.

90 Degree Gears

These gears are sanded to fit together at 90 degrees

 

I chose to round the edges since they were a bit sharp for a toy. I also sanded the tabs and holes to narrow and widen them, respectively, which made the parts more forgiving during assembly. I glued some of the pieces together to reduce the number of assembly steps and to make the gears turn with the axles. For the last few gears I made, I added a flat side to the axle hole in the center of the gear to act as a key. I didn’t need to glue the axle to these gears to get them to turn together; I just needed to sand the axle to match.


The flat edge in the hole grabs the axle

Laser Cutting Version 1

The first version I cut was nice, but because it was in 1/8″ Birch plywood it was too fragile to be a toy. There were some other issues too: the weight of the blades tipped the tower over, so I needed to add a counterbalance; the box at the top kept falling apart, so I used a rubber band to hold it together; and the axle didn’t really stay straight with just 1/8″ of plywood in the base or the gear (1/4″ combined), which led to a lot of friction in the gears. Initially I used a friction fit to get the gears and axles to turn together, but after repeated use it became apparent that they needed glue.

Overall the pieces were just too complex and needed a lot of simplification; building this version in the real world revealed a lot of issues that I couldn’t see in SketchUp.

Here is the first version:


Windmill Version 1

Laser Cutting Version 2

Having learned from the first version, I cut this final version (after several computer-aided iterations) out of 1/4″ plywood to make it sturdier. The base is triple layered, with the gear holes extending through two layers. This solves the axle alignment issue and keeps the axles from marking the table underneath, and I added some spacers to increase the thickness of the gears. In total, each axle now has a whole inch of material keeping it aligned. The base and the box panels are greatly simplified and hold together on their own. The extra thickness made the tower pieces difficult to fit, so I sanded them to a shape that worked, which improved the tolerance when putting it together.

Here’s the sequence of pictures that I used to make the assembly instructions for my nephew and his parents:


Here’s the final product (with overly dramatic music):

Next Steps

The windmill was a big hit, and I was actually surprised by how fast my nephew could get it spinning. Right now he has to turn the gear directly, so I’m working on a crank to make it easier to spin.

Overall I’d say this project was a success.

 

Lasercut Christmas Ornaments


So this blog has been silent for a while because I’ve been busy making things at NextFab. Now that the presents have been distributed and the surprise is over, I thought I’d share what I’ve made. I spent the fall working on Christmas ornaments with my wife. We drew, designed, and cut out a whole slew of ornaments. Some were more general holiday themes, like reindeer and snowflakes; some were more personalized, like giraffes and trains for my nephews. Here they are:


 

The process was unique for each of these, but the general sequence was a hand-drawn sketch or a picture from the internet, traced in Inkscape, exported to Adobe Illustrator for final processing, and cut out on a Trotec laser cutter at NextFab. A few of them (the toolbox and train) started as a SketchUp drawing that I screen-captured and traced in Inkscape. The final touch was adding some string through the laser cut hole (or hot gluing it on in a few cases).

Getting Started at NextFab

I recently joined a local makerspace, NextFab, and have been busy taking various safety and training classes. I really like that each class uses a small project to get you familiar with the tools. Here’s what I’ve made so far.

In the wood safety class I made this shelf:

Shelf from Wood Shop Safety

In the metal shop safety class I made this bottle opener:

Bottle Opener from Metal Shop Safety

This key chain is from the intro to laser cutting class:

Key Chain from Laser Cutting

I made this box while learning to use the table saw:

Box from the Table Saw Class

And here’s my first project that I planned out myself:

Miniature Easels

They are miniature easels for small paintings. The one on the far right is one I already had; I duplicated the others from it using some square and round dowel rods. The tall one is for paintings that are small, but not quite as small.

I’m having a lot of fun at NextFab and have a growing list of projects to make!

Field Notes

Back in March I got my first set of Field Notes and I recently completed my first notebook with them. I love writing in notebooks but in the past I’ve struggled with two problems: trying to separate different things into different notebooks and forgetting the notebook. Field Notes encourages you to put everything in one place, saying:

“I’m not writing it down to remember it later, I’m writing it down to remember it now”

And they fit in your pocket so you always have it with you.

I used mine for everything:

a well used Field Notes notebook

I used it for math and recipes:

math and recipes

I used it for scheming:

schemes

I used it for sketching:

sketches

I used it for derivations:

derivations

I used it for more scheming:

more schemes

I used it to figure out fractals:

fractals

And it can be used for many, many more things:

the many, many uses

Here’s the before and after as I start on my next one:

before and after

 

I really loved having this on me all the time. I was able to keep track of a lot more of my ideas, and I cut down on the random scraps of paper that usually accumulate and that I then have to try to hold onto. It doesn’t replace a full lab notebook for writing down detailed setups, but I don’t do that very often and I know when I’m going to. This is perfect for that aha moment you have in the middle of a plane ride or sitting out in a backyard watching your dog get tangled up again and again. I plan on using these over and over again.

Oh, and I’ve switched the pocket I carry it around in, and it’s already reducing the wear and tear a bit.

Optimization with IPython

I just discovered the easy optimization features of IPython and thought I’d share how I used them to speed up my deep learning code. For a full explanation of the deep learning code see my previous posts about some projects that I’m building it for. This is going to be a speed comparison of the original R code, my first implementation in Python, and finally the optimized version in Python.

Over the last few weeks I’ve been exploring the various features of the IPython shell. I like its tab completion, its easy help syntax (?), its magic commands (with %), and its ability to access the shell (with !). But I’m really excited about the latest set of features I found: the optimization tools. You can profile your code with just a single line and not have to worry about re-writing all of the code to do it. I’ve optimized code before (usually with timers) and I’ve used profilers before, like the built-in one in NetBeans for Java, and I know that profile and timeit are Python packages rather than IPython features, but the ease of use is really what does it for me. I don’t want to invest a ton of time and effort into finding out that my code is already as fast as it’s going to get, or that I’m looking in the wrong place. I want an answer now, and I want to be able to play around with it to dig into what’s going on. IPython provides that with %timeit and %prun. Here’s how I used these commands to get my 30-hour Python code down to 6 minutes.
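For anyone who hasn’t used them, here’s roughly what those features look like at an IPython prompt (the array names and the train_my_model call are just placeholders):

np.dot?                         # quick help on any object (assumes numpy imported as np)
!ls *.py                        # run a shell command without leaving IPython
%timeit np.dot(A, B)            # time a single statement
%prun -l 10 train_my_model()    # profile a call and show the top 10 functions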

The Problem:

First off, the code I was optimizing trains a machine learning model using gradient descent. To do that I have two key functions (with many supporting ones): optimize and a gradient function (cd1 in this case). The optimize function performs gradient descent using the model and the gradient function: it takes a chunk of data, calls the gradient function, and then applies a weight update to the model, over and over again. Here’s the code:


import numpy as np

def optimize(model, gradient_function, training_data, momentum=0.9, learning_rate=0.1, N_epochs=10, mini_batch_size=100):
    '''Trains a model with gradient descent'''
    # work on copies so the caller's model isn't modified in place
    model = {key: np.copy(val) for key, val in model.iteritems()}
    momentum_speed = {key: np.zeros(np.shape(val)) for key, val in model.iteritems()}
    data_len = np.shape(training_data['data'])[1]
    for epoch in range(N_epochs):
        for mini_batch in extract_mini_batch(training_data, mini_batch_size):
            gradient = gradient_function(model, mini_batch)
            # momentum-based update for each parameter in the model
            for key in model.iterkeys():
                momentum_speed[key] = momentum*momentum_speed[key] + gradient[key]
                model[key] += momentum_speed[key]*learning_rate  # -weightcost*model[key]
    return model

def cd1(model, data):
    '''Computes one iteration of contrastive divergence on the rbm model'''
    N_cases = np.shape(data['data'])[1]

    # forward pass: sample binary visible states, then hidden probabilities and states
    vis_prob_0 = data['data']
    vis_states_0 = sample_bernoulli(vis_prob_0)

    hid_prob_0 = logistic(model['W']*vis_states_0 + model['fwd_bias'])
    hid_states_0 = sample_bernoulli(hid_prob_0)

    # reverse pass: reconstruct the visible units from the hidden states
    vis_prob_n = logistic(model['W'].T*hid_states_0 + model['rev_bias'])
    vis_states_n = sample_bernoulli(vis_prob_n)

    # second forward pass on the reconstruction
    hid_prob_n = logistic(model['W']*vis_states_n + model['fwd_bias'])

    # CD-1 update: difference of the visible-hidden correlations
    vh0 = hid_states_0 * vis_states_0.T / N_cases
    vh1 = hid_prob_n * vis_states_n.T / N_cases
    cd1_value = vh0 - vh1

    model_delta = dict([('W', cd1_value),
                        ('fwd_bias', np.mean(hid_states_0 - hid_prob_n, axis=1)),
                        ('rev_bias', np.mean(vis_prob_0 - vis_prob_n, axis=1))])
    return model_delta

When I first ran this code to train a model on the full MNIST data for the Kaggle competition, it took 30 hours to complete the training on the 42,000 data samples. I could have just accepted that, but I had a suspicion that it could be faster because I was pretty sure the R version ran faster than that. So I set up a fair comparison and implemented the exact same problem in R with the old code. It ran in 30 minutes.

Before I found the IPython tools, I recorded a bunch of data to see how fast the code was running with different amounts of data. I found that the R code scaled linearly while the Python code scaled exponentially. I also found that if, instead of splitting the data up into mini-batches, I kept it all as one batch, the Python code was linear and faster than the R code. All of this led me to scour the internet for advice on optimizing Python code, specifically numpy code. I found some good tips from the scikit-learn site, the SciPy lectures, and a good example of the results of optimization.

Profiling in IPython:

First I timed my function on a small subset of the data (700 samples, with 100 samples per mini-batch). I kept this for comparison later, to see if the changes really did help. To do this I just put %timeit in front of my call to the optimize function.


%timeit model_rbm1 = dbn.optimize(model_rbm1_init, dbn.cd1, train, learning_rate=lr_pretrain, N_epochs=N_epochs_pretrain)
1 loops, best of 3: 18.6 s per loop

Next I ran the profiler, which computes how much time is spent in each function that gets called. For this I just added %prun in front of the command (-l 10 limits the output to the top 10 calls):


#2000 cases

%prun -l 10 model_rbm1 = dbn.optimize(model_rbm1_init, dbn.cd1, train, learning_rate=lr_pretrain, N_epochs=N_epochs_pretrain)
   106298 function calls in 150.431 seconds

   Ordered by: internal time
   List reduced from 38 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     6597   44.723    0.007   44.737    0.007  {numpy.core._dotblas.dot}
     1800   36.143    0.020   36.178    0.020  dbn.py:10(logistic)
     1800   28.197    0.016   28.197    0.016  {method 'rand' of 'mtrand.RandomState' objects}
      600   20.137    0.034  146.362    0.244  dbn.py:31(cd1)
     1800   11.819    0.007   44.837    0.025  dbn.py:26(sample_bernoulli)
     1800    4.798    0.003    4.808    0.003  {method 'astype' of 'numpy.ndarray' objects}
        1    3.174    3.174  150.431  150.431  dbn.py:69(optimize)
     1200    1.201    0.001    1.207    0.001  {method 'reduce' of 'numpy.ufunc' objects}
    26397    0.090    0.000    0.101    0.000  defmatrix.py:290(__array_finalize__)
     1200    0.040    0.000    1.267    0.001  _methods.py:49(_mean)

The output is ordered by tottime, which is the total time spent in a function, not counting the functions that it calls. From this I can see that most of the time is spent in the numpy dot product routine, as expected, and the next most is in the logistic function that I wrote. So that’s what I investigated next. To do this I had to separate it out from the code that called it and pass it some dummy data to take the arguments’ place. I used numpy.random.rand for that.

#for loop to get the number of calls comparable to dbn.optimize
%prun for i in range(2000): dbn.logistic(np.dot(np.random.rand(300,784),np.random.rand(784,100)) + np.random.rand(300,1))
   10003 function calls in 10.278 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     6000    7.535    0.001    7.535    0.001  {method 'rand' of 'mtrand.RandomState' objects}
     2000    1.585    0.001    1.585    0.001  {numpy.core._dotblas.dot}
     2000    0.846    0.000    0.846    0.000  dbn.py:10(logistic)
        1    0.311    0.311   10.278   10.278  <string>:1(<module>)
        1    0.000    0.000    0.000    0.000  {range}
        1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}

No matter what I tried I couldn’t get logistic to be as slow as it was in the optimize routine (cumulative time of 0.9s vs. 36.2s) so I backed up and tried timing the function that calls it: cd1. I figured there must be something about how it is called that slows it down. Here’s that profile run:

#for loop to get the number of calls comparable to dbn.optimize
%prun for i in range(0, data_len, mini_batch_size)*30: v = dbn.cd1(model_rbm1, {'data':(train['data'][:,i:i+mini_batch_size])})
    81805 function calls in 3.944 seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      3000    1.327    0.000    1.330    0.000  {numpy.core._dotblas.dot}
       600    1.264    0.002    3.733    0.006  dbn.py:31(cd1)
      1800    0.462    0.000    0.470    0.000  dbn.py:10(logistic)
      1800    0.459    0.000    0.459    0.000  {method 'rand' of 'mtrand.RandomState' objects}
         1    0.204    0.204    3.944    3.944  <string>:1(<module>)
      1800    0.073    0.000    0.579    0.000  dbn.py:26(sample_bernoulli)
      1800    0.042    0.000    0.044    0.000  {method 'astype' of 'numpy.ndarray' objects}
      1200    0.032    0.000    0.034    0.000  {method 'reduce' of 'numpy.ufunc' objects}
     21600    0.023    0.000    0.027    0.000  defmatrix.py:290(__array_finalize__)
      1200    0.018    0.000    0.062    0.000  _methods.py:49(_mean)

Again this isn’t as slow as it was when it was part of the optimize call (cumulative time of 3.7s vs. 146s) but working up this path seemed promising so I looked at how cd1 was called in the optimize function:

#gradient_function = cd1 for this call to optimize
for mini_batch in extract_mini_batch(training_data, mini_batch_size):
    gradient = gradient_function(model, mini_batch)
    ...

There’s nothing especially interesting about the gradient function line, but the for loop that feeds it uses a generator function. I tried swapping that out for a plain slicing of the training data and voila:

%timeit model_rbm1 = dbn.optimize(model_rbm1_init, dbn.cd1, train, learning_rate=lr_pretrain, N_epochs=N_epochs_pretrain)
1 loops, best of 3: 3.18 s per loop
# compared to 18.6s per loop before

The generator function was the culprit, not because it consumes a lot of resources (they are actually very efficient), but because I implemented it poorly:

#before
def extract_mini_batch(data, mini_batch_size=100):
    '''Generates mini_batches from the data'''
    data_len = np.shape(data['data'])[1]
    return (dict([(key,
                   val[:,i:i+mini_batch_size] if len(np.shape(val))>2 and np.shape(val)[1]==data_len else val)
                   for key, val in data.iteritems()])
            for i in xrange(0, data_len, mini_batch_size))

#after
def extract_mini_batch(data, mini_batch_size=100):
    '''Generates mini_batches from the data'''
    data_len = np.shape(data['data'])[1]
    return (dict([(key, val[:,i:i+mini_batch_size])
                   for key, val in data.iteritems()])
            for i in xrange(0, data_len, mini_batch_size))

Using an if statement, even in the form of a ternary expression, breaks the ability of the interpreter to turn this into efficient code. I had to give up the ability to store some extra variables with the data, but they were extraneous and the optimization is definitely worth it. Plus, it is more readable after the change.
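If I had wanted to double-check the generator on its own, the same %timeit trick works there too. Something like this (assuming the generator is reachable as dbn.extract_mini_batch and train is the same training set as before) forces all of the slicing and dict building for one pass over the data:

%timeit [mini_batch for mini_batch in dbn.extract_mini_batch(train, 100)]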

Results

Below are the results of the optimization. You can see the exponential curve of the original Python code and the improved, linearly scaling Python code. Also note that the R code scales worse than the new Python code (still linear, but with a larger coefficient). A fully timed run of the R code takes 30 minutes, while the new Python code takes only 6 minutes for all 42,000 data samples.

Python vs. R Deep Learning

While it may be possible to get further improvements out of Python and R, those improvements would be hard to find, at least for me. I’m satisfied with how fast it runs and am glad to know how easy it is to profile code in Python. I had been wondering for a while how Python and R compare in speed, and while there are resources out there, testing it myself lets me know what the level of effort is to get good, fast code. I think Python wins on that one, because between the straightforward numpy matrix syntax and the handy profiling commands, it really only took me a couple of days to clean up the code and get it running fast. It took a lot more work to get the R code to where it is.

 

Reconstructing Numbers

I’m working on finishing up the code for the final 30 Days of Python project, saving the whales, but I took a detour to work with the MNIST handwritten digits again. I did this because they are a great dataset for testing: it’s obvious what you are training on, and it’s well known how well an algorithm should do.

I’ve been implementing Python code that trains restricted Boltzmann machines (RBMs), deep belief nets (DBNs), and deep neural networks (DNNs). I originally implemented this in Matlab (as part of a Coursera course), then ported it to R (because it’s free, open source, and good for stats), and am now bringing it into Python. I chose to bring this code into Python for a couple of reasons: I’m using Python as my go-to language, I want to learn how to use numpy effectively, and I’ve become a fan of how Python manages code. There are plenty of other reasons, but I will caveat all of this with the fact that I know there are existing RBM/DBN/DNN implementations available; I want to make my own as a learning exercise and because I like having control of the finer details of the model.

A Brief Intro to RBMs, DBNs, and DNNs

A Restricted Boltzmann Machine (RBM) is a model that has two layers: an input layer (called the visible units) and a hidden layer. Each layer is made up of “units” which can take on either a 1 or a 0. For a given input, the hidden layer can be computed using a weight matrix, a bias term, and the logistic function: h = logistic(W*v + b_fwd). Because the inputs are often real valued and the computation ends up with real values on the interval from 0.0 to 1.0, the values are treated as probabilities that a given unit will be ‘on’ or ‘off.’
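Here’s a minimal numpy sketch of that forward pass, written in the same matrix style as the code in the optimization post above; the shapes and the logistic helper are my stand-ins, not the exact dbn.py code:

import numpy as np

def logistic(x):
    '''Elementwise logistic (sigmoid) function'''
    return 1.0 / (1.0 + np.exp(-x))

# 784 visible units (a 28x28 image), 300 hidden units, 100 input cases
W = np.matrix(0.01 * np.random.randn(300, 784))         # weight matrix
b_fwd = np.matrix(np.zeros((300, 1)))                   # forward bias
v = np.matrix((np.random.rand(784, 100) > 0.5) * 1.0)   # fake binary images

h_prob = logistic(W * v + b_fwd)   # h = logistic(W*v + b_fwd), each entry in (0, 1)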

The interesting thing about RBMs is that they are trained to reconstruct their inputs. So instead of trying to tie the output layer to a digit label, we try to reconstruct the digit image. The theory is that if the model can reconstruct the input, then the features learned will be useful in determining the label later. To calculate a reconstruction of the visible layer, use the transpose of the weight matrix, a reverse bias term, and the logistic function: v = logistic(t(W)*h + b_rev). Calculating the hidden units from the visible is a forward pass, and calculating the visible from the hidden is a reverse pass. The optimal way to train these models is quite slow, but Geoffrey Hinton found a shortcut called contrastive divergence:

  1. Compute a forward pass (v0 -> h0)
  2. Compute a reverse pass (h0 -> v1)
  3. Compute another forward pass (v1 -> h1)
  4. Calculate a weight update rule: delta_W = h0*t(v0) - h1*t(v1)

The full version of contrastive divergence requires the back and forth to continue to infinity, but it turns out even one pass gives a good weight update rule that can be used in gradient descent.

RBMs can be stacked by using the output of one layer as the input to the next. Stacked RBMs are mathematically equivalent to Deep Belief Nets (DBNs), and it’s much faster to train them by doing “greedy” training one layer at a time. There are ways to train the DBN to improve the generative ability of the model and tie in the labels, but that’s not what I’m interested in here.

The reason I use RBMs is that you can initialize a deep neural network (DNN) with the same coefficients as the DBN. The two changes to make are discarding the reverse biases, since the DNN is one way, and adding an extra layer to connect to the labels. I follow what I learned in Hinton’s Coursera class and use a softmax layer. A softmax layer is like a logistic layer except that it is normalized so that you can interpret it as a probability distribution. For the 10 MNIST digits there will always be 10 units in the softmax layer. For a digit of 2, unit 2 will have a high value (near 1.0) and the others will be near 0.0, which can be interpreted as a high probability of the image being a 2. For any input, the sum of all the output units will be 1.0. The computation of a forward pass in the DNN is just a succession of forward passes through each layer. The model can be updated through gradient descent using backprop.
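Here’s a rough sketch of the softmax computation (the shapes are just for illustration; a real layer would feed it the output of the last hidden layer):

import numpy as np

def softmax(z):
    '''Turns each column of z into a probability distribution over the 10 digits'''
    z = z - np.max(z, axis=0)      # shift by the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=0)

z = np.random.randn(10, 5)         # stand-in for the layer input, 5 cases
p = softmax(z)
print(np.sum(p, axis=0))           # each column sums to 1.0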

Crucially, because the DNN is initialized with pre-trained RBMs, only a few passes of backprop are needed to fine-tune the model for discrimination. Virtually all of the training time is spent in the pre-training without the labels, which minimizes the amount of time the model has to overfit the training dataset.

Visualizing the Learning: Reconstructed Digits

One way to see how well the model learned from the data is to reconstruct the input data through the model. What I mean is: set the input data to be an image of a digit, compute the hidden layer through a forward pass of the model, and then compute a reverse pass to get a reconstructed image. For a random model this results in white noise (think snow on the TV). Below is the result for a forward pass into the first layer and then into the second layer, followed by a reverse pass through both. The reconstructed image is quite good, meaning the model has the ability to represent the data well. Having worked with RBMs on multiple projects, I highly recommend this technique to verify that the model has learned something. It very easily points out bugs in the code (my first reconstructions looked identical) or points to issues with trying to reconstruct the data this way.
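In code, that round trip looks roughly like the sketch below. rbm_rev mirrors the dbn.rbm_rev helper used in the probing code further down, rbm_fwd is my assumed counterpart for the forward pass, and logistic, train, model_rbm1, and model_rbm2 are assumed to be the same objects as in the earlier code:

def rbm_fwd(model, vis):
    '''Forward pass: visible -> hidden probabilities'''
    return logistic(model['W'] * vis + model['fwd_bias'])

def rbm_rev(model, hid):
    '''Reverse pass: hidden -> reconstructed visible probabilities'''
    return logistic(model['W'].T * hid + model['rev_bias'])

digit = train['data'][:, 0:1]                  # one digit image as a column vector
h1 = rbm_fwd(model_rbm1, digit)                # forward through the first RBM
h2 = rbm_fwd(model_rbm2, h1)                   # forward through the second RBM
reconstruction = rbm_rev(model_rbm1, rbm_rev(model_rbm2, h2))   # reverse pass back down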


Reconstruction of digits through the RBMs

To explore what the model actually learned we can reconstruct an image from just the bias unit (i.e. turn off all of the hidden units). The reconstructed image is similar to the average of all of the images the model sees:


Digit reconstruction from just the bias compared to the mean digit

Another way to see what’s going on is to turn on a single hidden unit in a layer and see what it reconstructs. The resulting image is essentially what would drive this hidden unit as close to 1 as possible. It shows what the model is paying attention to:


Reconstruction from a single hidden unit in the first layer

A dark area means a higher value, so you can see that these units tend to have a very active curve somewhere in the image and a very inactive spot next to it. What that means is that each unit is triggering off of part of a digit. Notice that there aren’t really dark areas around the edges, because the digits are centered in the image. Here’s a reconstruction from a single unit in the second layer:


Reconstruction from a single hidden unit in the second layer

It gets harder to see what’s going on at this point, but the main thing that is visible is that some of these units have more than one active area, while others have very tight, localized spots that are active. Getting these plots for the first time wasn’t exactly straightforward, so I thought I’d share my code:

import numpy as np
import matplotlib.pyplot as plt
import dbn   # my module with the RBM helper functions

# %% Probe the bias units
probe1 = dbn.rbm_rev(model_rbm1, np.matrix(np.zeros((N_hid1, 1))))
probe2 = dbn.rbm_rev(model_rbm1, dbn.rbm_rev(model_rbm2, np.matrix(np.zeros((N_hid2, 1)))))

plt.subplot(1, 3, 1)
plt.axis('off')
plt.imshow(np.reshape(np.mean(train['data'],axis=1),image_shape), cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Mean Image')
plt.subplot(1, 3, 2)
plt.axis('off')

plt.imshow(np.reshape(probe1,image_shape), cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Rev Bias 1')
plt.subplot(1, 3, 3)
plt.axis('off')
plt.imshow(np.reshape(probe2,image_shape), cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Rev Bias 2')

# %% Probe the first RBM
bias1_probe = dbn.rbm_rev(model_rbm1, np.matrix(np.zeros((N_hid1, N_hid1))))
W1_probe = dbn.rbm_rev(model_rbm1, np.matrix(np.eye(N_hid1)))
rbm1_probe = W1_probe - bias1_probe

plt_shape1 = (6,12)
plt.figure(figsize=(15.,9.))
for i in range(np.prod(plt_shape1)):
    plt.subplot(plt_shape1[0], plt_shape1[1], i+1)
    plt.axis('off')
    plt.imshow(np.reshape(rbm1_probe[:,i],image_shape), cmap=plt.cm.gray_r, interpolation='nearest')

# %% Probe the second RBM
bias2_probe = dbn.rbm_rev(model_rbm1, dbn.rbm_rev(model_rbm2, np.matrix(np.zeros((N_hid2, N_hid2)))))
W2_probe = dbn.rbm_rev(model_rbm1, dbn.rbm_rev(model_rbm2, np.matrix(np.eye(N_hid2))))
rbm2_probe = W2_probe - bias2_probe

plt_shape2 = (6,12)
plt.figure(figsize=(15.,9.))
for i in range(np.prod(plt_shape2)):
    plt.subplot(plt_shape2[0], plt_shape2[1], i+1)
    plt.axis('off')
    plt.imshow(np.reshape(rbm2_probe[:,i],image_shape), cmap=plt.cm.gray_r, interpolation='nearest')

Results

I trained each RBM layer with 30 passes through the data, 10 passes for the softmax layer, and then finished up with 10 passes of backpropagation for the whole DNN. The resulting model got me my highest score on the MNIST digit challenge on Kaggle, 97% accurate, which I believe is on par with humans. At the very least it beats my 96% that I got with K Nearest Neighbors.

Stay tuned for a comparison of the Python implementation to the R implementation with a bit about optimization.

p.s. Once my code structure settles down, I plan to update this with a link to the GitHub repo for the full thing.