You've been hearing about Deep Learning for years now: every week there are new frameworks being released, new AI assistants launched, and new impressive-sounding milestones being hit. Finally, you've decided to stop being a spectator, jump aboard the hype train, and take that online course you saw on Hacker News but never started.
Before you embark on your neural net adventure, there is one important decision to make, one that could mean the difference between smooth sailing and being adrift at sea.
Should I run this tutorial on my laptop or in the cloud?
Machine learning takes a lot of computing power. Underneath the magic and tenuous biology analogies lie millions upon millions of matrix multiplications. Your new MacBook Pro is no slouch, but compared to Google's custom Tensor Processing Units, Microsoft's FPGA racks, and Amazon's GPU farms, it's hard to see how a puny laptop could keep up without combusting or causing third-degree burns.
A key thing to remember is that these companies deal with enormous quantities of data: the ImageNet database holds 14 million images, and YouTube-8M contains over 500,000 hours of video.
You won't be tackling these behemoths right off the bat; instead, you'll be working on datasets like the MNIST handwritten digit database, which is only about 10MB.
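To put a rough number on "millions upon millions," here is a back-of-the-envelope count of the multiply-adds in a small two-layer MNIST network like the one benchmarked later in this post. It's a minimal sketch under stated assumptions: the hidden layer size of 1024 is a hypothetical choice (a common tutorial value, not a figure from my benchmark), and only the forward pass is counted.

```python
# Rough multiply-add count for a two-layer fully connected MNIST network.
# ASSUMPTION: HIDDEN_UNITS = 1024 is hypothetical, chosen for illustration.

INPUT_PIXELS = 28 * 28   # one flattened 28x28 grayscale image
HIDDEN_UNITS = 1024      # hypothetical hidden layer size
NUM_CLASSES = 10         # digits 0-9


def forward_multiply_adds(num_images: int) -> int:
    """Multiply-adds for one forward pass over `num_images` images.

    Each layer is a matrix product: (batch x in) @ (in x out) costs
    batch * in * out multiply-adds (biases and activations ignored).
    """
    layer1 = num_images * INPUT_PIXELS * HIDDEN_UNITS
    layer2 = num_images * HIDDEN_UNITS * NUM_CLASSES
    return layer1 + layer2


if __name__ == "__main__":
    print(f"per image:        {forward_multiply_adds(1):,} multiply-adds")
    print(f"2 million images: {forward_multiply_adds(2_000_000):.2e} multiply-adds")
```

Under these assumptions that's 813,056 multiply-adds per image, or about 1.6 trillion for two million images, before the backward pass is even counted.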
New options from tech giants
There are a few platforms available for aspiring machine learners. Amazon recently launched the P2 instance type for EC2, which runs on NVIDIA K80 GPUs; Google Cloud Platform's ML beta promises access to all the fancy hardware acceleration Google uses (as long as you use TensorFlow); and Microsoft has announced that GPUs are coming to Azure.
These options cost money, so as a beginner the main tradeoff is how much you're willing to pay to run your initially buggy code versus how long you're willing to wait for your models to train. Most tutorials take somewhere between one and 30 minutes to train on a laptop. Imagine waiting half an hour only to realize you had a typo! Unless you were watching Netflix in the meantime, that's a half hour you'll wish you could get back.
If you have a fancy gaming rig with a top-of-the-line NVIDIA GPU (most deep learning frameworks with GPU support are built on CUDA), you can stop reading and get to work. You might even have a better graphics card than what the big companies are offering. If you're like me and haven't owned or upgraded a desktop since Portal 2 came out, these cloud services may make more sense.
To quantify the tradeoffs, I trained a vanilla TensorFlow classifier for MNIST digits on my laptop and on two cloud platforms, and compared how long each took to train the model.
Training a simple two-layer network over two million 28x28 grayscale images took about four and a half minutes on my 2015 MacBook Pro, which is not too shabby! Running the same network on Amazon's P2 instance took only 21 seconds, over 10x as fast. Interestingly, Google's Cloud ML showed minimal gains over running locally: it took about 3.8 minutes.
I suspect Cloud ML carries some overhead because it's designed for scale, so you won't see much benefit from running small models on it. Once you get to huge Googley datasets and networks with ten Inception modules, I'd expect compute times comparable to Amazon's. However, this discussion is aimed at beginners, so I didn't feel like spending the time or money to find out (maybe later, or maybe someone from Google can tell me!).
Another downside of Google Cloud ML for beginners is that many online courses rely on IPython notebooks, which I don't believe Cloud ML supports, since it's a more production-focused platform.
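The speedups implied by the times above can be checked with a few lines of arithmetic. This is just a sketch using the rounded figures reported here (4.5 minutes, 21 seconds, 3.8 minutes), so treat the ratios as approximate.

```python
# Speedups implied by the reported MNIST training times (rounded figures).
LAPTOP_SECONDS = 4.5 * 60      # 2015 MacBook Pro: about 4.5 minutes
P2_SECONDS = 21.0              # Amazon EC2 P2 instance
CLOUD_ML_SECONDS = 3.8 * 60    # Google Cloud ML: about 3.8 minutes


def speedup(baseline_s: float, other_s: float) -> float:
    """How many times faster a run taking `other_s` is than `baseline_s`."""
    return baseline_s / other_s


if __name__ == "__main__":
    print(f"P2 vs laptop:       {speedup(LAPTOP_SECONDS, P2_SECONDS):.1f}x")
    print(f"Cloud ML vs laptop: {speedup(LAPTOP_SECONDS, CLOUD_ML_SECONDS):.2f}x")
```

That works out to roughly 13x for the P2 instance and under 1.2x for Cloud ML.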
And the winner is:
My recommendation is to try out the P2 instance on Amazon if you are OK with spending a few dollars a day. Remember to stop your instance when you aren't using it!
From the outset you'll be iterating frequently and making lots of stupid mistakes. Much of your time will be spent tuning your model's hyperparameters: adjusting values like the learning rate, the number and size of layers, regularization weights, and so on. The right parameters are what separate good results from complete garbage, and adjusting them effectively requires intuition that comes from experience. Trying 12 different parameter combinations will take about an hour on a MacBook versus six minutes on Amazon; personally, I would rather get my hour back.
If you go the Amazon route, don't get used to seeing models train in seconds. Once you start working with more data you won't have that luxury, even on high-end GPUs (for reference, the newest Titan X card boasts that it can train AlexNet in only 2 days, key word "only"). Learn good habits when it comes to choosing hyperparameters and don't rely on getting lucky, at least until 2020, when we finally develop networks that can train themselves and reach the singularity.
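Here is a minimal sketch of where "12 combinations" might come from and what a sequential sweep costs at the measured per-run times. The specific learning rates, hidden sizes, and L2 weights are hypothetical placeholders, not recommendations, and the sketch assumes every run takes exactly the time measured for my MNIST model.

```python
import itertools

# Hypothetical search space: 3 x 2 x 2 = 12 combinations (placeholder values).
LEARNING_RATES = [0.5, 0.1, 0.01]
HIDDEN_SIZES = [512, 1024]
L2_WEIGHTS = [0.0, 1e-3]

# Per-run training times measured in this post.
LAPTOP_SECONDS = 4.5 * 60
P2_SECONDS = 21.0


def grid(*axes):
    """Every combination of the given hyperparameter values (a grid search)."""
    return list(itertools.product(*axes))


def sweep_minutes(num_runs: int, seconds_per_run: float) -> float:
    """Wall-clock minutes to train every configuration one after another."""
    return num_runs * seconds_per_run / 60


if __name__ == "__main__":
    configs = grid(LEARNING_RATES, HIDDEN_SIZES, L2_WEIGHTS)
    for lr, hidden, l2 in configs:
        print(f"lr={lr:<5} hidden={hidden:<5} l2={l2}")
    n = len(configs)
    print(f"{n} runs on the laptop: {sweep_minutes(n, LAPTOP_SECONDS):.0f} min")
    print(f"{n} runs on a P2:       {sweep_minutes(n, P2_SECONDS):.1f} min")
```

With the measured times this comes to 54 minutes on the laptop and about 4 minutes on the P2; the "hour versus six minutes" above is the same comparison, rounded.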
Where do I get started?
Udacity has a good course on practical deep learning with TensorFlow, though it doesn't go very deep into the theory and math.
Stanford's CS231n lectures are on YouTube and all the assignments are online. The class has you implement more of the nuts and bolts of neural networks rather than relying on existing frameworks, which I found very helpful for understanding how they work.
All the major frameworks (TensorFlow, Torch, Caffe) have tutorials online that a quick Google search will turn up.
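As a taste of the nuts-and-bolts work a course like CS231n emphasizes, here is a minimal sketch of a softmax cross-entropy loss, its analytic gradient, and a numerical gradient check, written in plain Python rather than a framework. It's my own illustration of the technique, not code from the course assignments; the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1.

    Subtracting the max score first avoids overflow in exp()
    without changing the result.
    """
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(scores: Sequence[float], label: int) -> float:
    """Softmax cross-entropy loss for one example whose true class is `label`."""
    return -math.log(softmax(scores)[label])


def loss_gradient(scores: Sequence[float], label: int) -> List[float]:
    """Analytic gradient of the loss w.r.t. each score: softmax minus one-hot."""
    grad = softmax(scores)
    grad[label] -= 1.0
    return grad


def numerical_gradient(f: Callable[[List[float]], float],
                       scores: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central-difference estimate of df/dscore_i, used to check the analytic one."""
    grad = []
    for i in range(len(scores)):
        plus, minus = list(scores), list(scores)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]
    print("loss:    ", cross_entropy_loss(scores, label=0))
    print("analytic:", loss_gradient(scores, label=0))
    print("numeric: ", numerical_gradient(lambda s: cross_entropy_loss(s, 0), scores))
```

If the analytic and numerical gradients disagree by more than a tiny amount, the bug is almost certainly in your derivative, which is exactly the kind of mistake that implementing things by hand teaches you to catch.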