BEAST in the cloud

We recently managed to set up an account on Amazon Web Services (AWS) that will enable us to test how practical and efficient it is to run computationally intensive analyses in the cloud. We often wonder whether it is worth spending loads of money on new servers, and lots of time managing the hardware and software on them, or whether it would be better to offload most of that to a private company like Amazon. Do we save time? Do we save money?

The cloud is not impervious to problems: Amazon was recently affected by a significant outage. It is nevertheless still worth thinking about using cloud computing and storage in the future.

The following post is a walk-through of my first experiment using AWS EC2 to benchmark different instance types for BEAST phylogenetic analyses.

1) Setting up an Amazon AWS account, paid for using a purchase order number from the CVR.

Arcus Global Ltd is a company that has been set up to help universities use AWS. The following video explains the background and how it works.

It wasn’t too difficult to set up, and the people at Arcus Global Ltd were quite helpful. They don’t charge more than the AWS pricing but try to make money by offering support for $100 a month. They aren’t pushy, though, so we decided to go it alone.

2) Trouble getting going

This was by far the hardest step. AWS terminology is quite alien and there is a big step to make to get started; there is still an awful lot that I don’t understand. The hard part is installing the CUDA toolkit and the beagle library on an instance. I found the following three blog posts very useful:

  1. http://blog.faircloth-lab.org/beast-in-the-cloud/
  2. http://francoismichonneau.net/2014/05/how-to-install-beagle-on-ubuntu/
  3. and the most useful: http://tleyden.github.io/blog/2014/10/25/cuda-6-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/

Using the EC2 CentOS 5.5 GPU HVM AMI (ami-aa30c7c3), I didn’t manage to install the beagle library. I kept getting the following error message:

./autogen.sh
Putting files in AC_CONFIG_AUX_DIR, `.config'.
configure.ac: installing `.config/install-sh'
configure.ac: installing `.config/missing'
examples/complextest/Makefile.am: installing `.config/depcomp'
Makefile.am: installing `./INSTALL'
configure.ac:388: required file `hmsbeagle-${GENERIC_API_VERSION}.pc.in' not found
autoreconf: automake failed with exit status: 1

However, starting from the following AMI (ami-9eaa1cf6), I was successful.

3) Setting up an AWS instance.

I found the AWS command line tools really useful, so I would recommend installing them.

a) Launch the instance and log in

ec2-run-instances --key my-key ami-9eaa1cf6 -t g2.8xlarge
ssh -i my-key.pem ubuntu@xx.yy.zz.qq
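
The ec2-run-instances command above comes from the older EC2 API tools. As a rough sketch, assuming the unified AWS CLI (the aws command) is installed and configured instead, the same launch might look like the following. The helper name build_launch_cmd is made up for illustration, and it only prints the command so that it can be checked before anything is launched and billed.

```shell
# Sketch only: assumes the unified AWS CLI ("aws") is installed and configured,
# and that a key pair named "my-key" already exists in the chosen region.
# build_launch_cmd is a hypothetical helper; it prints the command, it does not run it.
build_launch_cmd() {
    # $1 = AMI id, $2 = instance type, $3 = key pair name
    printf 'aws ec2 run-instances --image-id %s --instance-type %s --key-name %s --count 1\n' "$1" "$2" "$3"
}

# Print the command for review before running it by hand.
build_launch_cmd ami-9eaa1cf6 g2.8xlarge my-key
```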

b) Change the settings to allow password login over ssh (optional)

sudo bash
adduser bob

In /etc/ssh/sshd_config change to:

PasswordAuthentication yes

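Add the following line. Note that AllowUsers restricts logins to the users listed, so append ubuntu as well if you still want to log in as that user: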

AllowUsers bob

Then:

service ssh restart
exit
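
The manual edits above could also be scripted. This is a minimal sketch, assuming a plain sshd_config with no Match blocks at the end of the file; allow_password_login is a hypothetical helper, not something shipped with Ubuntu or OpenSSH.

```shell
# Hypothetical helper: enable password logins for one user in an sshd_config file.
# $1 = path to the config file, $2 = user name (for example bob)
allow_password_login() {
    conf=$1
    user=$2
    cp "$conf" "$conf.bak"    # keep a backup before editing
    # Drop any existing (possibly commented-out) PasswordAuthentication lines.
    sed -E '/^[[:space:]]*#?[[:space:]]*PasswordAuthentication[[:space:]]/d' "$conf.bak" > "$conf"
    # Append the two settings described above.
    printf 'PasswordAuthentication yes\nAllowUsers %s\n' "$user" >> "$conf"
}

# Usage (as root), followed by the restart shown above:
#   allow_password_login /etc/ssh/sshd_config bob
```

Trying it on a copy of the file first is a cheap way to see what it changes before touching the real configuration.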

c) Verify the CUDA library is installed properly

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
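
The deviceQuery sample prints a long report and finishes with a "Result = PASS" line when it can talk to the GPU. A small sketch for checking that automatically, assuming that final line is present in the CUDA samples version on the AMI; device_query_passed is a hypothetical helper name.

```shell
# Reads deviceQuery output on stdin; succeeds only if the sample reported a pass.
device_query_passed() {
    grep -q 'Result = PASS'
}

# Usage, from the deviceQuery directory:
#   ./deviceQuery | device_query_passed && echo "CUDA looks OK"
```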

d) Install all the necessary libraries for the beagle installation

sudo apt-get update
sudo apt-get install build-essential autoconf automake libtool subversion pkg-config openjdk-7-jdk

sudo apt-get install git
git clone --depth=1 https://github.com/beagle-dev/beagle-lib.git
cd beagle-lib
./autogen.sh
./configure --prefix=$HOME
make install
make check

Then add the following export line to ~/.profile and reload it with source:

export LD_LIBRARY_PATH=$HOME/lib:$LD_LIBRARY_PATH
source ~/.profile
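
If the setup is repeated on several instances, the export line can be appended from a script without duplicating it. A minimal sketch, with add_beagle_lib_path as a hypothetical helper name:

```shell
# Hypothetical helper: append the export line to a profile file unless it is already there.
# $1 = profile file (for example "$HOME/.profile")
add_beagle_lib_path() {
    # Single quotes keep $HOME and $LD_LIBRARY_PATH unexpanded until login time.
    line='export LD_LIBRARY_PATH=$HOME/lib:$LD_LIBRARY_PATH'
    grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}

# Usage:
#   add_beagle_lib_path "$HOME/.profile"
```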

e) Finally install BEAST

I did this by transferring BEASTv1.8.2 using Cyberduck because I couldn’t find the appropriate link to use for wget. Uncompress it and then check that BEAST can find the beagle resources:

tar xvf BEASTv1.8.2.tar
beast -beagle_info

Output for instance type cg1.4xlarge:

BEAGLE resources available:
0 : CPU
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
1 : Tesla M2050
Global memory (MB): 2687
Clock speed (Ghz): 1.15
Number of cores: 448
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA
2 : Tesla M2050
Global memory (MB): 2687
Clock speed (Ghz): 1.15
Number of cores: 448
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA

Or for the instance type g2.2xlarge:

BEAGLE resources available:
0 : CPU
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
1 : GRID K520
Global memory (MB): 4096
Clock speed (Ghz): 0.80
Number of cores: 1536
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA
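
When comparing instance types it can be handy to pull just the GPU names out of this output. A minimal sketch, assuming the output keeps the layout of the listings above (an "N : name" line followed by a Flags line); list_beagle_gpus is a hypothetical helper, not part of BEAST or BEAGLE.

```shell
# Reads `beast -beagle_info` output on stdin and prints the name of every
# resource whose Flags line contains PROCESSOR_GPU.
list_beagle_gpus() {
    awk '
        /^[ \t]*[0-9]+ : / { name = $0; sub(/^[ \t]*[0-9]+ : /, "", name) }  # remember the resource name
        /PROCESSOR_GPU/    { print name }                                     # print it if it is a GPU
    '
}

# Usage:
#   beast -beagle_info | list_beagle_gpus
```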

4) Benchmarking different instance types and my personal computers.

Time (minutes) | Java JVM | Operating System | Processor model | Processor speed | Computer model | Notes
2.65 | 1.8.0_40 | MacOS X 10.10.5 | 4 GHz Intel Core i7 | 4 GHz | iMac | ./beast -beagle_gpu -seed 666 ../examples/Benchmarks/benchmark1.xml
2.71 | 1.8.0_40 | MacOS X 10.10.5 | 4 GHz Intel Core i7 | 4 GHz | iMac | ./beast -beagle_opencl -seed 666 ../examples/Benchmarks/benchmark1.xml
2.72 | 1.8.0_40 | MacOS X 10.10.5 | 4 GHz Intel Core i7 | 4 GHz | iMac | ./beast -seed 666 ../examples/Benchmarks/benchmark1.xml
3.69 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel Xeon E5-2680 v2 (Ivy Bridge) Processors | 2.8 GHz (Max Turbo 3.6 GHz) | ami-a596b8d2 on instance type c3.8xlarge | ./beast -beagle -beagle_cpu -seed 666 ../examples/Benchmarks/benchmark1.xml
3.71 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel Xeon E5-2680 v2 (Ivy Bridge) Processors | 2.8 GHz (Max Turbo 3.6 GHz) | ami-a596b8d2 on instance type c3.8xlarge | ./beast -beagle -beagle_gpu -seed 666 ../examples/Benchmarks/benchmark1.xml
4.63 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel® Xeon® Processor E5-2670 (20M Cache, 2.60 GHz, 8.00 GT/s Intel® QPI) | 2.6 GHz (Max Turbo 3.3 GHz) | ami-2cbf3e44 using instance type g2.2xlarge | ./beast -beagle -beagle_cuda -seed 666 ../examples/Benchmarks/benchmark1.xml
4.71 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel® Xeon® Processor E5-2670 (20M Cache, 2.60 GHz, 8.00 GT/s Intel® QPI) | 2.6 GHz (Max Turbo 3.3 GHz) | ami-2cbf3e44 using instance type g2.2xlarge | ./beast -beagle -beagle_GPU -seed 666 ../examples/Benchmarks/benchmark1.xml
4.72 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel® Xeon® Processor E5-2670 (20M Cache, 2.60 GHz, 8.00 GT/s Intel® QPI) | 2.6 GHz (Max Turbo 3.3 GHz) | ami-2cbf3e44 using instance type g2.2xlarge | ./beast -seed 666 ../examples/Benchmarks/benchmark1.xml
5.06 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel Sandy Bridge processor 2.6 GHz and two NVIDIA Tesla “Fermi” M2050 GPUs | 2.6 GHz | ami-a596b8d2 on instance type cg1.4xlarge | ./beast -beagle -beagle_GPU -seed 666 ../examples/Benchmarks/benchmark1.xml
5.35 | 1.7.0_25 | MacOS X 10.9.5 | 2 x 2.4 GHz Quad-Core Intel Xeon | 2 x 2.4 GHz | Mac Pro | ./beast -beagle_opencl ../examples/Benchmarks/benchmark1.xml
31.16 | java-7-openjdk-amd64 | Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) | Intel Sandy Bridge processor 2.6 GHz and two NVIDIA Tesla “Fermi” M2050 GPUs | 2.6 GHz | ami-a596b8d2 on instance type cg1.4xlarge | ./beast -beagle -beagle_GPU -seed 666 ../examples/Benchmarks/benchmark2.xml
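
To make the timings easier to compare, each machine's best benchmark1.xml time can be expressed relative to the fastest one. A minimal sketch using the times from the table above; relative_times and the shortened machine labels are made up for illustration.

```shell
# Print each machine's time relative to the fastest one.
# Input on stdin: one "label<TAB>minutes" pair per line.
relative_times() {
    awk -F'\t' '
        { label[NR] = $1; t[NR] = $2 + 0              # + 0 forces a numeric value
          if (NR == 1 || t[NR] < best) best = t[NR] }
        END { for (i = 1; i <= NR; i++) printf "%s\t%.2f\n", label[i], t[i] / best }
    '
}

# Best benchmark1.xml time per machine, copied from the table above (labels shortened).
printf 'iMac\t2.65\nc3.8xlarge\t3.69\ng2.2xlarge\t4.63\ncg1.4xlarge\t5.06\nMac Pro\t5.35\n' | relative_times
```

On these numbers the g2.2xlarge instance takes about 1.75 times as long as the iMac, and the Mac Pro about twice as long.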

I still have loads to learn when it comes to using the cloud, but this was a useful experiment to start off with.

Thanks for reading this far,

Joseph

 
