BEAST in the cloud
- Post by: Joseph Hughes
- September 30, 2015
We recently set up an account on Amazon Web Services (AWS) that will let us test how practical and efficient it is to run computationally intensive analyses in the cloud. We often wonder whether it is worth spending loads of money on new servers, and lots of time managing the hardware and software on them, or whether it would be better to offload most of it to a private company like Amazon. Do we save time? Do we save money?
The cloud is not impervious to problems: Amazon was recently affected by a significant outage. However, it is still worth considering cloud computing and storage for the future.
The following post is a walk-through of my first experiment using AWS EC2 to benchmark different instance types for BEAST phylogenetic analyses.
1) Setting up an AWS account paid for using a purchase order number from the CVR.
Arcus Global Ltd is a company that has been set up to help universities use AWS. The following video explains the background and how it works.
It wasn’t too difficult to set up, and the people at Arcus Global Ltd were quite helpful. They don’t charge more than the AWS pricing but try to make money by offering support for $100 a month. They aren’t pushy though, so we decided to go it alone.
2) Trouble getting going
This was by far the hardest step. AWS terminology is quite alien and getting started is a big step; there is still an awful lot that I don’t understand. The hard part is installing the CUDA toolkit and the BEAGLE library on an instance. I found the following blogs very useful:
- http://blog.faircloth-lab.org/beast-in-the-cloud/
- http://francoismichonneau.net/2014/05/how-to-install-beagle-on-ubuntu/
- and the most useful: http://tleyden.github.io/blog/2014/10/25/cuda-6-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/
Using the EC2 CentOS 5.5 GPU HVM AMI (ami-aa30c7c3), I didn’t manage to install the beagle library. I kept getting the following error message:
./autogen.sh
Putting files in AC_CONFIG_AUX_DIR, `.config'.
configure.ac: installing `.config/install-sh'
configure.ac: installing `.config/missing'
examples/complextest/Makefile.am: installing `.config/depcomp'
Makefile.am: installing `./INSTALL'
configure.ac:388: required file `hmsbeagle-${GENERIC_API_VERSION}.pc.in' not found
autoreconf: automake failed with exit status: 1
However, starting from the following AMI (ami-9eaa1cf6), I was successful.
3) Setting up an AWS instance.
I found the AWS command line tools really useful so I would recommend installing them.
a) Launch the instance and log in
ec2-run-instances --key my-key ami-9eaa1cf6 -t g2.8xlarge
ssh -i my-key.pem ubuntu@xx.yy.zz.qq
b) Change the settings to allow password-based ssh login for a new user (optional)
sudo bash
adduser bob
In /etc/ssh/sshd_config change to:
PasswordAuthentication yes
Add:
AllowUsers bob
Then:
service ssh restart
exit
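The two edits to sshd_config described above can also be scripted. The following is only a minimal sketch: `enable_password_login` is a hypothetical helper name, and it assumes the config file has no `Match` blocks (lines appended at the end would otherwise fall inside the last one). Try it on a copy before pointing it at /etc/ssh/sshd_config.

```shell
#!/bin/sh
# Hypothetical helper: enable password logins and allow one user
# in an sshd_config-style file. Usage: enable_password_login FILE USER
enable_password_login() {
    config="$1"
    user="$2"
    # Back up the file as it was before this call.
    cp "$config" "$config.bak"
    # Drop existing PasswordAuthentication lines, commented out or not.
    grep -v -E '^[[:space:]]*#?[[:space:]]*PasswordAuthentication[[:space:]]' \
        "$config.bak" > "$config"
    # Append the setting used in the post.
    echo "PasswordAuthentication yes" >> "$config"
    # Add the AllowUsers line only if that exact line is not already present.
    grep -q -x "AllowUsers $user" "$config" || echo "AllowUsers $user" >> "$config"
}

# Example (as root), followed by the restart shown above:
#   enable_password_login /etc/ssh/sshd_config bob
#   service ssh restart
```

Running it twice leaves a single copy of each line, which makes it safer to re-run while experimenting with a fresh instance.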
c) Verify the CUDA library is installed properly
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
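The deviceQuery sample prints a long report and ends with a `Result = PASS` line when it can talk to the GPU. If you want to check this from a script rather than by eye, a small filter like the one below could be used. This is a sketch; `cuda_check_passed` is a hypothetical name of my own.

```shell
#!/bin/sh
# Hypothetical helper: read deviceQuery output on stdin and succeed
# only if it contains a line starting with "Result = PASS".
cuda_check_passed() {
    grep -q '^Result = PASS'
}

# Example, run from the deviceQuery directory:
#   ./deviceQuery | cuda_check_passed && echo "CUDA looks fine"
```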
d) Install all the necessary libraries for the BEAGLE installation
sudo apt-get update
sudo apt-get install build-essential autoconf automake libtool subversion pkg-config openjdk-7-jdk
sudo apt-get install git
git clone --depth=1 https://github.com/beagle-dev/beagle-lib.git
cd beagle-lib
./autogen.sh
./configure --prefix=$HOME
make install
make check
Then add the following line to ~/.profile:
export LD_LIBRARY_PATH=$HOME/lib:$LD_LIBRARY_PATH
and reload it:
source ~/.profile
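If you rebuild instances often, it is easy to end up with the same export line in .profile several times. A possible way to avoid that is to append the line only when it is missing. This is a sketch, and `add_line_once` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line
# is already there. Usage: add_line_once FILE LINE
add_line_once() {
    file="$1"
    line="$2"
    # Create the file if it does not exist yet.
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep the variables unexpanded in the file):
#   add_line_once "$HOME/.profile" 'export LD_LIBRARY_PATH=$HOME/lib:$LD_LIBRARY_PATH'
```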
e) Finally install BEAST
I did this by transferring BEASTv1.8.2 using CyberDuck because I could not find the appropriate link to use for wget. Uncompress and then check that BEAST can find the BEAGLE resources:
tar xvf BEASTv1.8.2.tar
beast -beagle_info
Output for instance type cg1.4xlarge:
BEAGLE resources available:
0 : CPU
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
1 : Tesla M2050
Global memory (MB): 2687
Clock speed (Ghz): 1.15
Number of cores: 448
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA
2 : Tesla M2050
Global memory (MB): 2687
Clock speed (Ghz): 1.15
Number of cores: 448
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA
Or for the instance type g2.2xlarge:
BEAGLE resources available:
0 : CPU
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
1 : GRID K520
Global memory (MB): 4096
Clock speed (Ghz): 0.80
Number of cores: 1536
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_NONE THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA
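When comparing instance types it can be handy to pull just the GPU names out of this output. The filter below is a sketch that assumes the layout shown above (a `N : name` line for each resource, followed by a Flags line containing PROCESSOR_GPU for GPUs); `list_gpus` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `beast -beagle_info` output on stdin and
# print the name of every resource flagged as a GPU, one per line.
list_gpus() {
    awk '
        # Resource header such as "1 : Tesla M2050": remember the name.
        /^[ \t]*[0-9]+ : / { name = $0; sub(/^[ \t]*[0-9]+ : /, "", name) }
        # Flags line of a GPU resource: print the remembered name.
        /PROCESSOR_GPU/    { print name }
    '
}

# Example:
#   beast -beagle_info | list_gpus
#   beast -beagle_info | list_gpus | wc -l    # number of GPUs found
```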
4) Benchmarking different instance types and my personal computers.
[table id=6 /]
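For anyone wanting to repeat this kind of comparison, one simple way to time a run is to wrap the command and record the wall-clock seconds. This is only a sketch of a possible approach, not necessarily how the numbers above were collected. `time_run` is a hypothetical helper, and it assumes a `date` that supports `+%s` (true on the Ubuntu AMI used here).

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, print the
# elapsed wall-clock time in whole seconds and return its exit status.
time_run() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Example (the XML file name is a placeholder for your own analysis):
#   seconds=$(time_run beast my_analysis.xml)
#   echo "g2.2xlarge: $seconds s"
```

Running the same XML file on each instance type and on a local machine gives directly comparable numbers, as long as the chain length and seed are kept the same.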
I still have loads to learn when it comes to using the cloud but this was a useful experiment to start off with.
Thanks for reading this far,
Joseph