Amazon Elastic Compute Cloud Machine Image for Analytics
I am embarking on creating an Amazon Machine Image for compute-intensive workloads. I am looking for collaborators who have large data sets to process and who are interested in using cloud computing to meet their compute needs.
The economics are straightforward: if you do not have enough work to keep a cluster busy, cloud computing is much more cost-effective than building your own infrastructure.
The two most interesting offerings in this domain are Sun's Network.com and Amazon's Elastic Compute Cloud (EC2). Network.com uses high-end servers and charges $1 per CPU-hour. Amazon offers a spectrum of machines, priced per machine-hour, from low-end machines at 10 cents per hour to more powerful machines at 80 cents per hour.
If you build your own cluster of high-end machines, you are looking at about 20 cents per CPU-hour for hardware capex and opex. I use an aggressive two-year amortization of the hardware, given that CPU performance still doubles about every 18 months. If you can't administer the cluster yourself, add another 20 cents per CPU-hour.
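The per-CPU-hour figure comes from spreading the hardware cost over the amortization period and adding operating expenses. Here is a minimal sketch of that arithmetic. The server price, CPU count, and operating cost are hypothetical placeholders, not the inputs behind the 20-cent estimate.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def cost_per_cpu_hour(capex_usd: float, cpus: int, amortization_years: float,
                      opex_per_cpu_hour: float) -> float:
    """Spread the hardware cost evenly over every CPU-hour in the
    amortization window, then add a flat operating cost per CPU-hour
    (power, cooling, space). All inputs are caller-supplied assumptions."""
    cpu_hours = amortization_years * HOURS_PER_YEAR * cpus
    return capex_usd / cpu_hours + opex_per_cpu_hour


if __name__ == "__main__":
    # Hypothetical server: $8,000 with 4 CPUs, $0.05 opex per CPU-hour.
    for years in (2, 3):
        c = cost_per_cpu_hour(8000, 4, years, 0.05)
        print(f"{years}-year amortization: ${c:.3f} per CPU-hour")
```

A shorter amortization period raises the hourly cost. That is the price of assuming the hardware goes stale quickly.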
These figures show that if you can keep a cluster continuously busy, it is cheaper to build and operate your own. An idle cluster still incurs its amortization and operating costs, however, so the effective price of each CPU-hour of useful work rises as utilization falls. If your workload cannot keep a cluster busy, renting compute infrastructure on demand is attractive.
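The break-even point can be made concrete. This is a minimal sketch that assumes an owned cluster costs the figures above per CPU-hour whether or not it is busy, and it compares only against Network.com's $1 per CPU-hour. Amazon's prices are per machine rather than per CPU, so they are left out. The function names are illustrative.

```python
def effective_cost(own_cost_per_cpu_hour: float, utilization: float) -> float:
    """Cost of one CPU-hour of useful work when the cluster is busy only
    `utilization` of the time; idle hours are still paid for."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return own_cost_per_cpu_hour / utilization


def breakeven_utilization(own_cost_per_cpu_hour: float, rental_price: float) -> float:
    """Utilization above which owning is cheaper than renting.
    A result above 1.0 means renting wins even at full utilization."""
    return own_cost_per_cpu_hour / rental_price


if __name__ == "__main__":
    network_com = 1.00  # $ per CPU-hour, from the text
    for label, own in (("self-administered", 0.20), ("with paid admin", 0.40)):
        u = breakeven_utilization(own, network_com)
        print(f"{label}: owning wins above {u:.0%} utilization")
    # At 10% utilization, a $0.20 CPU-hour effectively costs $2.00 of useful work.
    print(f"${effective_cost(0.20, 0.10):.2f} per useful CPU-hour at 10% busy")
```

Under these assumptions, a self-administered cluster has to be busy more than 20% of the time to beat Network.com. With paid administration, that threshold rises to 40%.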
Analytic workloads, particularly data mining workloads, can combine heavy statistical processing with scripting for automation, so we need control over the machine we deploy. Because Amazon lets you build your own machine image, I have selected EC2 as the platform. Given its spectrum of service levels compared to Sun's Network.com, EC2 will provide us a flexible solution for outsourcing analytic workloads.
Tags: analysis, analytic workload, cloud computing, statistics