Automating the creation and configuration of your AWS Deep Learning instance
Last updated on Dec 12, 2018
Introduction
Being able to reproduce an environment - be it an application runtime, testing, or build environment - has
some important benefits:
Repeatability: Your environment is always the same, which means your application has the necessary components to run, and you have a known state
you can use as a starting point for debugging or development.
Reproducibility: You can easily deploy a new instance of your service or application.
Ease of deployment: The deployment process is simple and leaves little room for error.
Bootstrapping latency: It’s quick to get up and running - even a complex deployment process is going to be faster when steps don’t have to be taken manually.
Completeness: Manually configuring your environment invites cutting corners; since every step takes a bit longer, it’s tempting to speed up the process by
leaving some steps out. Automation ensures no steps are skipped.
These benefits also apply to the deployment of EC2 training instances. In this post, we’re going to automate the creation and configuration of an EC2 deep learning instance, with the following end state:
A running P2 deep learning EC2 instance
A configured Jupyter notebook server running in the background
Jupyter plugin configurator & a set of base plugins installed
A specific (user chosen) folder copied to the remote instance
Terraform is used to create the relevant AWS infrastructure, which in our case means the EC2 instance configuration. Ansible is used for configuration
management; we will use it to install the relevant software, copy files, and launch the Jupyter notebook server. Let’s get started!
Getting Terraform and Ansible
First, we’ll need to install Terraform and Ansible:
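One way to do this, assuming macOS with Homebrew and a working pip (adjust the commands for your platform and package manager):

```shell
# Install Terraform (the Homebrew formula name is "terraform")
brew install terraform

# Install Ansible into the current Python environment
pip install ansible
```

On Linux, Terraform can instead be downloaded as a zip from the HashiCorp releases page and Ansible installed via your distribution's package manager.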
Creating the Terraform configuration
It’s good security practice to create a dedicated IAM user to limit the scope of your credentials in AWS.
Follow the necessary steps to accomplish that, and take note of the Access Key ID and Secret Access Key. Alternatively, you can have Terraform rely on your credentials in ~/.aws/credentials.
First, define the provider block:
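A minimal provider block might look like the following; the region is an assumption, so pick whichever one suits you (note that P2 instances are not available in every region):

```hcl
provider "aws" {
  # Region is an assumption - choose one where P2 instances are offered
  region = "us-east-1"

  # Credentials are picked up from ~/.aws/credentials or from the
  # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
}
```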
Next, let’s define some security rules in a security group:
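A sketch of such a security group, using the allow_ssh name referenced later in this post (the exact port ranges and the open 0.0.0.0/0 CIDR are assumptions; you may want to restrict ingress to your own IP):

```hcl
resource "aws_security_group" "allow_ssh" {
  name        = "allow_ssh"
  description = "Allow ssh and ephemeral-port ingress"

  # ssh access, used for provisioning the instance
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Ephemeral ports (also covers the default Jupyter port 8888)
  ingress {
    from_port   = 1024
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic, e.g. for pip installs
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```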
Here, we’re allowing both ephemeral ingress and ssh ingress. The former will allow us to use pip to install additional packages, the latter will allow us to login and provision the instance.
Now, we create a configuration template for the deep learning EC2 instance we’re going to create:
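A sketch of the instance configuration; the resource name, AMI filter, and key pair name are assumptions you should adapt to your account:

```hcl
# Look up the most recent AWS Deep Learning AMI (the name filter is an
# assumption - check the AMI catalog for the exact naming scheme)
data "aws_ami" "deep_learning" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["Deep Learning AMI (Ubuntu)*"]
  }
}

resource "aws_instance" "dl_instance" {
  ami             = data.aws_ami.deep_learning.id
  instance_type   = "p2.xlarge"
  key_name        = "my-key-pair"  # must refer to a pre-existing EC2 key pair
  security_groups = [aws_security_group.allow_ssh.name]
}

# Expose the public IP so we can point Ansible at the instance later
output "public_ip" {
  value = aws_instance.dl_instance.public_ip
}
```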
We’ve chosen the p2.xlarge instance type here, but you can change this to an instance type of your choosing (note that AWS account limits may apply). Note the reference to the allow_ssh security group; this is necessary
to configure the instance to use it. Further note that key_name must refer to a pre-existing key pair, which you can create in the EC2 console.
Next, run:
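```shell
terraform init   # download the AWS provider
terraform plan   # optional: review what will be created
terraform apply  # create the instance and security group
```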
At this point, you should have a running EC2 instance with ssh access!
Automatic provisioning using Ansible
While we now have a running EC2 instance of the appropriate type, we still need to configure the Jupyter server, copy any files we need and do any other configuration we might want.
First, let’s create an inventory file:
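A minimal inventory might look like this, where the IP is the public_ip output from Terraform and the dl group name is an assumption (the Ubuntu deep learning AMI uses the ubuntu login user):

```ini
[dl]
203.0.113.10 ansible_user=ubuntu
```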
We also need to create a configuration file to point Ansible to the right ssh key:
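A sketch of an ansible.cfg, assuming the inventory file sits next to it and the key pair's private key is in ~/.ssh (both paths are assumptions):

```ini
[defaults]
inventory = ./inventory
private_key_file = ~/.ssh/my-key-pair.pem
# Skip the interactive host key prompt for the freshly created instance
host_key_checking = False
```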
All that’s left to do now is to configure the EC2 instance:
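A playbook covering the steps below might look roughly like this; the file paths, the jupyter_password_hash and local_folder variables, and the exact plugin commands are assumptions, not the post's original code:

```yaml
- hosts: dl
  tasks:
    - name: Kill all running Jupyter processes
      shell: pkill -f jupyter || true

    - name: Generate a self-signed ssl certificate
      command: >
        openssl req -x509 -nodes -newkey rsa:2048 -days 365
        -keyout /home/ubuntu/.jupyter/jupyter.key
        -out /home/ubuntu/.jupyter/jupyter.pem
        -subj "/CN=jupyter"
      args:
        creates: /home/ubuntu/.jupyter/jupyter.pem

    - name: Configure Jupyter to use the certificate and a password
      blockinfile:
        path: /home/ubuntu/.jupyter/jupyter_notebook_config.py
        create: yes
        block: |
          c.NotebookApp.certfile = '/home/ubuntu/.jupyter/jupyter.pem'
          c.NotebookApp.keyfile = '/home/ubuntu/.jupyter/jupyter.key'
          c.NotebookApp.password = '{{ jupyter_password_hash }}'
          c.NotebookApp.ip = '0.0.0.0'

    - name: Install the nbextensions configurator and some plugins
      shell: |
        pip install jupyter_contrib_nbextensions jupyter_nbextensions_configurator
        jupyter contrib nbextension install --user
        jupyter nbextensions_configurator enable --user

    - name: Start Jupyter in the background
      shell: nohup jupyter notebook >/home/ubuntu/jupyter.log 2>&1 &

    - name: Copy desired files to the instance
      copy:
        src: "{{ local_folder }}"
        dest: /home/ubuntu/
```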
These steps are pretty straightforward:
Kill all Jupyter processes that are running.
Generate ssl cert & password, and configure Jupyter to use it.
Install jupyter nbextensions + some plugins.
Start jupyter.
Copy desired files.
Finally, a little script to tie it all together:
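A sketch of such a wrapper script; the playbook filename, output name, and inventory layout match the assumptions made above:

```shell
#!/bin/bash
set -e

# Create (or update) the AWS infrastructure
terraform init
terraform apply -auto-approve

# Write the instance's public IP into the Ansible inventory
echo "[dl]" > inventory
echo "$(terraform output public_ip) ansible_user=ubuntu" >> inventory

# Provision the instance
ansible-playbook configure.yml
```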
It’s all pretty simple, but this can save us a lot of time. Find the complete source code here.
Did you like this post? You can subscribe to the RSS feed.