Well, I've been having more fun messing around with Node.js and allowing myself to be distracted by interesting problems, the latest of which was triggered by my desire to integrate the BrowserStack beta API for cross-browser testing. This is a nice service that will fire up any number of different versions of browsers and point them at a URL that you specify. Integrating this with Testacular and Mocha means that I can run all my browser JavaScript tests in all browser variants and get the results right back in my shell immediately, without having to run a myriad of browser versions locally :) This even includes mobile platforms :D
So what's the catch?
Well, in order for BrowserStack to connect to my Testacular server it needs to hit a public URL. Unfortunately my development machine is not reachable on a public URL (nor do I want it to be, at least not really public). The solution suggested by BrowserStack was to use a simple service called LocalTunnel. This service provides a client with which you can create an SSH tunnel to a local port that you specify. The service then allocates a random subdomain of localtunnel.com from which it will forward HTTP requests to your local port. Very useful and sounds easy, right? Unfortunately when I tried the client it didn't work and the only clues were leading me into a world of SSH keys, etc.
Hence the distraction. As I probably want to fire up my tunnel and browsers programmatically I'm not so fond of relying on command line interfaces - really I want a node module to do it. What's more, if I'm going to dig around in secure connections, why not take the opportunity to expand my knowledge in a direction I actually want it expanded? So I decided I would implement my own tunnel service and client solution in node and thus the tls-tunnel package was born.
Early on I figured I didn't want to mess about with generating random subdomains and trying to route based on the subdomain on which a connection was made, so instead I decided to assign ports on the server to satisfy client connections. This way, whenever a new client connects and requests a tunnel, the server will allocate a port from a predefined range of available ports and start listening on that.
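To make that concrete, here's a minimal sketch of the kind of port allocation I mean - this isn't the actual tls-tunnel code, and the port range is just made up for illustration:

```javascript
// Illustrative only - not the actual tls-tunnel implementation.
// The range of ports handed out to tunnels is an assumption.
var net = require('net');

var FIRST_PORT = 8080;
var LAST_PORT = 8089;
var available = [];
for (var port = FIRST_PORT; port <= LAST_PORT; port++) {
  available.push(port);
}

// Called when a client requests a new tunnel: take a free port, start
// listening on it and hand back the port number (and a way to release it).
function openTunnel(onConnection, callback) {
  var port = available.shift();
  if (!port) {
    return callback(new Error('no ports available'));
  }
  var server = net.createServer(onConnection);
  server.listen(port, function () {
    callback(null, port, function close() {
      server.close();
      available.push(port); // free the port for the next tunnel
    });
  });
}

// Example: open a tunnel that just echoes whatever it receives.
openTunnel(function (socket) { socket.pipe(socket); }, function (err, port) {
  if (err) throw err;
  console.log('tunnel listening on port ' + port);
});
```

The client asking for a tunnel is then told which port to advertise, and anything arriving on that port gets forwarded down the client's connection.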
My plan was to use a free Heroku or Nodejitsu instance to then deploy my tls-tunnel server when I needed it.
This is where I learnt a hard lesson in the problems of bottom up development. Although I am applying TDD principles I did in fact fail to validate one of my initial assumptions - that I could use multiple ports! Both Heroku and Nodejitsu will only expose one port to your application... this could/should have been a red flag. I realised this early on but plowed ahead anyway thinking that at a later date I could apply a small change to my tunnel and instead use the random subdomain solution to differentiate between tunnels.
So I got my tunnel working using TLS (hence the name), with clients and servers authenticating each other with their own self-signed SSL certificates. I was pretty proud of myself for implementing something that was in theory protocol agnostic - I had noticed that other similar solutions were limited to HTTP traffic... this should have been a red flag!
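For anyone curious what mutual authentication looks like in Node, this is roughly the shape of it - a sketch only, not the actual tls-tunnel code, and the certificate file names and addresses are assumptions:

```javascript
// Sketch of mutual TLS authentication with self-signed certificates.
// Not the actual tls-tunnel code - file names and host are assumptions,
// and the certificates must be generated with matching common names.
var tls = require('tls');
var fs = require('fs');

// Server side: only accept clients presenting the certificate we trust.
var server = tls.createServer({
  key: fs.readFileSync('server-key.pem'),
  cert: fs.readFileSync('server-cert.pem'),
  ca: [fs.readFileSync('client-cert.pem')], // trust only our own client cert
  requestCert: true,                        // ask the client for a certificate
  rejectUnauthorized: true                  // and refuse anyone we don't know
}, function (socket) {
  console.log('authorized client connected');
});
server.listen(8080);

// Client side: present our certificate and only trust our own server cert.
// In reality this would run on a different machine pointing at the server's
// public address rather than localhost.
var connection = tls.connect({
  host: 'localhost',
  port: 8080,
  key: fs.readFileSync('client-key.pem'),
  cert: fs.readFileSync('client-cert.pem'),
  ca: [fs.readFileSync('server-cert.pem')],
  rejectUnauthorized: true
}, function () {
  console.log('connected, server authorized: ' + connection.authorized);
});
```

Because the tunnel just shuttles bytes over the TLS connection it doesn't care what protocol flows through it.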
I next turned to the problem of making it all work on one port. Having already learnt quite a bit about the TLS/SSL problem domain I now learned a hard lesson about the TCP domain or more specifically the Node.js net domain.
I had made the assumption that when a raw TCP socket was connected to a server I would be able to read out the domain name that it had used... Wrong!!!
What LocalTunnel is doing is using the HTTP protocol to get the domain name that was used for the connection. GAH!! And what do you know, this is the same reason that Heroku and Nodejitsu limit access to a single port. Double GAH!!!
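The difference is easy to see in Node. A raw TCP server never finds out the name the client used to reach it, whereas an HTTP server gets it in the Host header - a small sketch (the ports and subdomain are just for illustration):

```javascript
// A raw TCP server only sees addresses and ports - the name the client
// connected with (and therefore the subdomain) is simply not available.
var net = require('net');
net.createServer(function (socket) {
  console.log('connection from ' + socket.remoteAddress + ':' + socket.remotePort);
}).listen(9000);

// An HTTP server, on the other hand, gets the Host header, which is what a
// LocalTunnel style service can use to pick the right tunnel.
var http = require('http');
http.createServer(function (request, response) {
  var host = request.headers.host || '';  // eg. 'abcd.localtunnel.com'
  var subdomain = host.split('.')[0];
  console.log('route this request to the tunnel registered as: ' + subdomain);
  response.end('ok\n');
}).listen(9001);
```

So subdomain based routing pretty much drags you into speaking HTTP, which is exactly the restriction I was trying to avoid.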
So now I'm left with a choice. My solution can still work but I'm going to have to put it on an Amazon EC2 instance or something (I can get one for free for now). Or I can bite the bullet and implement the same HTTP restriction (boo) and do subdomain based tunnelling.
It's not such a simple choice though. On the one hand it's easy to integrate Heroku and Nodejitsu into my development and testing process (and even share that) as opposed to the hoops I will have to jump through to get it up and running on an EC2 instance. But on the other I don't want to limit my solution to HTTP and I haven't actually verified yet that I can use random subdomains on either service (once bitten, etc).
Perhaps there is a third way though - maybe if I only support one tunnel at a time I can use a single port...
That said, I'm leaning towards the EC2 solution for flexibility ("lean"-ing might be a bad choice of word here though - if you'll excuse the pun ;))
Thursday, July 12, 2012
Amazon EC2 learnings
Yesterday I discovered that Amazon AWS offers a free tier, which basically means I can have a free server (possibly more than one) in their cloud for a year!
Awesome, I'll have some of that :)
I decided to see if I could get our 5Live hangout project up and running there. Figured it would be useful as our free Heroku plan limits us to a 16MB database!!!! On AWS I can have up to 30GB of storage for free :)
Of course I'll have to be my own sys admin to get it though. So that's where the adventure begins. This is what I needed to set up...
- A server
- Some storage
- Install Node.js
- Install MongoDB
- Install Git
- Open ports to allow access from the interwebs
In figuring this stuff out I probably created and terminated about 20 EC2 instances. Here are the first few things I learned...
- Amazon refers to its virtual machines as EC2 instances
- Amazon calls deleting an instance "terminating it"
- When you terminate an instance it does not go away immediately (takes about 20 minutes) but it is not recoverable
- There is an option for termination protection which I haven't tried but might be a good idea :)
Only a limited number of the virtual machine types are covered by the free tier, but that's OK - I only actually tried two of them. I didn't think I'd be interested in running my stuff on Windows so I only tried the Amazon Linux and Ubuntu 12.04 images, both of which are free in the micro configuration (1 core, 613MB RAM). After switching between the two a few times I settled on Ubuntu, mainly because it is more familiar to me. However, my research suggests that the Amazon Linux images might be better optimized for EC2.
Now for the real purpose of this blog post (which is mainly for my own notes): these are the steps for setting up the above list of requirements.
Create an Amazon AWS account
First we need an AWS account
- From https://aws.amazon.com/free/ sign up for a new account if you don't have one and verify with the fancy phone call verification
- Wait for email confirmation of the new account
Create an EC2 instance
We need a virtual machine
[Screenshot: Choose the free tier eligible machine type]
[Screenshot: Keep the default machine options]
[Screenshot: Create a new security group]
- Head over to http://aws.amazon.com/console/ and sign in with your new account
- Select the EC2 link
- Select the Instances/Instances link on the left hand side
- Click the Launch Instance button
- Choose the Classic Wizard option and click Continue
- Choose Ubuntu Server 12.04 LTS 64bit and click Select
- Keep the default options for the machine type as pictured above and click Continue
- Keep the default options for the machine features as pictured above and click Continue
- Enter a name for the instance (this is only used for display in the AWS console and is not the machine name) and click Continue
- Next you will have to create a key pair - this is used instead of passwords to log on to the virtual machine using SSH (If this is not the first instance on the account then you can reuse an existing key pair). Enter a name for the key pair and click Create & Download your Key Pair - keep this somewhere safe but accessible. Then click Continue
- Create a new security group with at least port 22 open so that you can SSH to the instance, as pictured above. I have decided that it is best to create a new security group for each EC2 instance, as it is not possible to change to a different security group after the instance has been created. It is, however, possible to change the rules in a security group, so if you want different instances to have different rules then you need to create a separate security group for each instance. Then click Continue
- You will then be presented with a page to review so just click Launch and on the next dialog click Close
Create an EBS volume
We need an Elastic Block Store volume so we can separate our MongoDB data from the OS volume
- Select the Elastic Block Store/Volumes link on the left hand side. Notice that there is already an 8GB volume for the EC2 instance OS. Make a note of the zone for this existing volume (eg. us-east-1d) - we will want to create our new volume in the same zone so that it can be attached to the EC2 instance
- Click Create Volume
- Select the size of the volume (eg. 10GB) and the same zone as noted in the last step. Don't select a snapshot. Click Yes, Create
- Right click the newly created volume and select Attach Volume
- Select the newly created Ubuntu instance and leave the Device field at the default. Click Yes, Attach. This will actually attach the volume to /dev/xvdf and not /dev/sdf on this version of Ubuntu, as noted on the dialog
Start the instance and log on using SSH
We're going to need our key pair file in the next step. On OS X and Linux it can be supplied to the ssh command using the -i option, but on Windows I use PuTTY. PuTTY does not accept the *.pem files generated by Amazon so it's necessary to convert them to *.ppk files using PuTTYgen. Anyway, follow these steps to log on...
- In the AWS console go back to Instances/Instances on the left hand side
- Select the instance and on the Description tab scroll down until you find the Public DNS entry. This is the public host name of your server (eg. ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com). As an aside it also contains the static IP address in case you want to know what that is
- Launch PuTTY and paste the Public DNS host name into the host name field
- Prepend the host name with ubuntu@ so that you don't need to specify the user name when connecting (the default user is called ubuntu)
- On the left hand side select Connection/SSH/Auth.
- Under Private key file for authentication browse for the *.ppk file generated by PuTTYgen from the *.pem file created and downloaded from Amazon
- Go back to the Session section at the top on the left hand side and save the session with a sensible name
- Click Open and you should just be logged in as the ubuntu user (after accepting the public key)
Format the EBS volume and mount it permanently
We want a nice efficient file system and it seems that it's de rigueur to use XFS. XFS is supported by the Ubuntu 12.04 kernel but the tools to format volumes are not there by default. Anyway here are the steps to follow at the command line...
- sudo apt-get install xfsprogs
- sudo mkfs -t xfs /dev/xvdf
- sudo mkdir /mnt/data
- sudo nano /etc/fstab
The last step will start nano so that we can edit the /etc/fstab file to ensure that our volume is mounted whenever the machine reboots. Add the following line...
- /dev/xvdf /mnt/data xfs noatime,noexec,nodiratime 0 0
Write out the file with ctrl-o and exit with ctrl-x.
Now we need to mount the data volume. At the command line...
- sudo mount -a
Install the latest stable Node.js
At the time of writing the default Node.js package available in Ubuntu is 0.6.12 and the latest stable is 0.8.2. In order to get the latest stable release do the following at the command line...
- sudo apt-get install python-software-properties
- sudo apt-add-repository ppa:chris-lea/node.js
- sudo apt-get update
- sudo apt-get install nodejs npm
Install and start the latest stable MongoDB
At the time of writing the latest MongoDB was 2.0.6 and that is what we download in the following steps. Check with http://www.mongodb.org/downloads to see if there is a newer version. At the command line...
- cd ~
- curl -O http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.0.6.tgz
- tar -xzf mongodb-linux-x86_64-2.0.6.tgz
- cd mongodb-linux-x86_64-2.0.6/bin
- sudo mkdir /mnt/data/db
- sudo chown ubuntu /mnt/data/db
- ./mongod --fork --logpath ~/mongod.log --dbpath /mnt/data/db/
- cd ~
- tail -f mongod.log
This will start the MongoDB daemon in the background and output the logging to ~/mongod.log. The last command allows you to check that the daemon starts up OK. Once it has completed the startup sequence it is safe to ctrl-c out of the tail and mongod will continue running. To stop mongod, the safest way is from the mongo client. At the command line...
- cd ~/mongodb-linux-x86_64-2.0.6/bin
- ./mongo
- use admin
- db.shutdownServer()
The last command shuts down the server and prints out lots of stuff that looks like errors, but it should be fine and it should be possible to start the server again as before.
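If you'd rather check from Node that the daemon is up (instead of firing up the mongo shell), a quick sanity check using only the core net module looks something like this - 27017 is MongoDB's default port:

```javascript
// check-mongod.js - confirm that something is accepting connections on
// MongoDB's default port. Uses only Node's core net module.
var net = require('net');

var socket = net.connect(27017, 'localhost', function () {
  console.log('mongod is accepting connections on port 27017');
  socket.end();
});

socket.on('error', function (err) {
  console.error('could not connect to mongod: ' + err.message);
});
```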
Install Git
I use GitHub and all my code is up there so I need git to put it on my new server. At the command line...
- sudo apt-get install git
Opening more ports
While developing Node.js applications I usually use the default Express port of 3000. You will remember that when we created the server instance we only opened port 22 in the security group. In order to hit the server on port 3000 we have to add that to our security group too...
- In the AWS console select Network & Security/Security Groups on the left hand side
- Select the security group created specifically for the server instance
- Select the Inbound tab
- For Create a new rule select Custom TCP rule
- For Port range enter 3000
- For Source enter 0.0.0.0/0
- Click Add Rule
- Click Apply Rule Changes
It should now be possible to connect to services running on port 3000 from the internet. Remember that the host name is the Public DNS entry under the EC2 instance description.
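A quick way to prove that is to run a throwaway server on the instance and hit it from a browser - a minimal sketch using only Node's core http module (no Express needed just for this check):

```javascript
// test-port-3000.js - throwaway server to confirm port 3000 is reachable.
// Run with: node test-port-3000.js
var http = require('http');

http.createServer(function (request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello from EC2 on port 3000\n');
}).listen(3000, function () {
  console.log('listening on port 3000');
});
```

Then point a browser at the Public DNS host name on port 3000 and you should see the greeting.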