RDTN Simulations in the Cloud

To perform simulations with RDTN more efficiently, I added support for running them on Amazon’s cloud computing services (called Amazon Web Services or AWS).

As the main point of the experiments I do with RDTN is to run a lot of simulations with varying parameters, parallelization is trivial: just start a couple of processes that each handle a different set of parameters.
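To make "varying parameters" concrete, here is a minimal sketch of how a parameter grid could be expanded into independent variants. The parameter names and values are hypothetical and not taken from RDTN; the only point is that every combination is a separate run that does not depend on any other.

```ruby
# Hypothetical parameter grid; names and values are illustrative only
# and do not come from RDTN.
PARAMETER_GRID = {
  nodes:       [10, 50, 100],
  buffer_size: [1_000, 10_000],
  routing:     ["epidemic", "spray_and_wait"]
}

# Expand a hash of parameter => candidate values into one hash per
# combination (the Cartesian product of all value lists).
def expand_variants(grid)
  keys = grid.keys
  return [{}] if keys.empty?

  first, *rest = grid.values
  first.product(*rest).map { |combo| keys.zip(combo).to_h }
end

variants = expand_variants(PARAMETER_GRID)
puts "#{variants.size} independent simulation runs"
```

With the grid above this yields 3 × 2 × 2 = 12 variants, each of which can be handed to a different process or machine.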

Thanks to the ongoing trend towards utility computing, putting the simulation on a cluster is now not only feasible but even easy. Simulations are an ideal application for this style of accessing resources, where you pay only for what you use: they use as much CPU power as they can while they run, but a dedicated machine would sit idle most of the time, because simulations are not a continuously running service.

Ingredients

The goal was to implement plumbing around RDTN and Amazon Web Services so that you can run simulations on a given number of machines just by typing one command. The components used for this are:

  • S3: The Simple Storage Service allows you to store data on Amazon’s infrastructure. I use it to store the results of the simulations.
  • EC2: The Elastic Compute Cloud provides the computing resources for running the simulations. The general idea is that you have the image of the system (including the operating system and all necessary daemons and applications) you want to run and then tell EC2 to start it as a virtual machine — Amazon calls them instances — on a real machine somewhere in Amazon’s datacenters. The usage of EC2 is billed in units of machine hours — you pay only for the time your instances are running. A nice feature of EC2 is that you can create the images from a running system, so that you can start out with a basic system (I used a bare Debian GNU/Linux), start it, log in via ssh, and then configure it and install the software you need. When it works, you bundle the system into an image and store it in S3, from where it can be loaded.
  • SQS: I use the Simple Queue Service to manage the simulation tasks. The variants that are to be simulated are computed in advance and stored in SQS. When the instances start they take the next variant from the queue and work on it.
  • GitHub: In order to avoid updating the images whenever the code for the simulations changes, the instances pull the latest version of the awssim branch of RDTN from GitHub.
  • RDTN, obviously.
  • AWS-S3 for Ruby: To access S3 from RDTN, I use Marcel Molina’s aws-s3, which I forked to fix some issues with Ruby 1.9 (I use 1.9 because it runs the simulator about three times faster than 1.8).
  • SQS sample code for Ruby: The code I use to interact with SQS is based on sample code from Amazon’s developer resources. I also needed to tweak this to work with Ruby 1.9. Currently, the code lives in the RDTN repository, but I plan to extract it into its own repository, as it may be useful beyond the scope of RDTN.

What it does

  1. From any machine, compute the variants that should be simulated and put them into SQS. Each variant is one entry in the queue.
  2. Start some EC2 instances.
  3. Each instance pulls the current version of RDTN from GitHub. The simulations always use the awssim branch.
  4. On each instance, the simulation runner is started, getting the next variant from the queue to simulate it.
  5. The results of a simulation run are written to S3.
  6. The simulation runner keeps getting variants from SQS (step 4) until the queue is empty, at which point the instance shuts down automatically.
  7. The results can be pulled from S3 to any machine where the data is analyzed.
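The loop in steps 4 to 6 can be sketched as follows. This is not the actual RDTN runner: FakeQueue and FakeStore are hypothetical in-memory stand-ins for SQS and S3, and run_simulation is a placeholder, so that the control flow can be tried without an AWS account.

```ruby
require "json"
require "digest"

# Hypothetical in-memory stand-in for the task queue (SQS in the real setup).
class FakeQueue
  def initialize(messages)
    @messages = messages.dup
  end

  # Returns the next message, or nil when the queue is empty.
  def receive
    @messages.shift
  end
end

# Hypothetical in-memory stand-in for the result store (S3 in the real setup).
class FakeStore
  attr_reader :objects

  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body
  end
end

# Placeholder for an actual simulation run; it just echoes the variant.
def run_simulation(variant)
  { "variant" => variant, "status" => "done" }
end

# Take variants until the queue is empty, storing one result per variant.
# Returns the number of variants processed.
def work_until_empty(queue, store)
  processed = 0
  while (message = queue.receive)
    variant = JSON.parse(message)
    result = run_simulation(variant)
    # Key derived from the variant, so instances never overwrite each other.
    key = "results/#{Digest::SHA1.hexdigest(message)}.json"
    store.put(key, JSON.generate(result))
    processed += 1
  end
  processed
end

messages = [{ "nodes" => 10 }, { "nodes" => 50 }].map { |v| JSON.generate(v) }
queue = FakeQueue.new(messages)
store = FakeStore.new
count = work_until_empty(queue, store)
puts "processed #{count} variants"
# A real instance would shut itself down at this point.
```

Deriving the result key from the variant itself means several instances can write to the same store without coordinating. With real SQS, a received message is only hidden temporarily and reappears unless it is deleted, so a real worker would delete the message once the result has been stored.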

Future Work

Part of the processing of the result data could be performed on the EC2 instances, so that only the synthesis of the results needs to be performed on a single machine.
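One way this split could look, as a rough sketch rather than a plan: each instance reduces its raw samples to a small summary that can be merged later, and the single machine only combines the summaries. The metric and the summarize and merge helpers are hypothetical.

```ruby
# Per-instance step: reduce raw samples to a small, mergeable summary.
def summarize(samples)
  { count: samples.size, sum: samples.sum, min: samples.min, max: samples.max }
end

# Single-machine step: combine the summaries from several instances.
# Assumes at least one sample overall.
def merge(summaries)
  total = summaries.sum { |s| s[:count] }
  {
    count: total,
    mean:  summaries.sum { |s| s[:sum] } / total.to_f,
    min:   summaries.map { |s| s[:min] }.min,
    max:   summaries.map { |s| s[:max] }.max
  }
end

instance_a = summarize([2.0, 4.0, 6.0]) # e.g. samples measured on one instance
instance_b = summarize([8.0])           # samples measured on another instance
overall = merge([instance_a, instance_b])
puts overall
```

Keeping the sum and count rather than a per-instance mean is what makes the merge exact: averaging the two instance means directly would weight the single sample on the second instance as heavily as the three on the first.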
