Why an EC2 Instance Isn’t a Server

At re:Invent, the AWS conference in Las Vegas last November, Amazon’s CTO, Werner Vogels, made an interesting observation during his keynote address: “An EC2 instance is not a server—it’s a building block.” This sounded suspiciously like a tagline a car manufacturer might use to convince me that their car “isn’t just another car.” But something about Vogels’ statement stuck with me. I had used EC2 since the public beta, and he had succinctly stated something I knew to be true but couldn’t explain. It took a few months before I put the pieces together and could articulate what EC2 is and what it is not.

From all outward appearances, and even functionally speaking, an EC2 instance behaves like a virtualized server: you can SSH (or Remote Desktop) into it and install nearly any application. In fact, in my experience, this is how >95% of developers use AWS. However, this is a very myopic view of the platform. Most of the complaints levied against AWS (unreliable, costly, difficult to use) come from those who are trying to use EC2 as if it were a traditional server located in a datacenter.

To truly understand AWS, we have to examine Amazon’s DNA. In 2002-2003, a few years prior to the introduction of EC2, Bezos boldly committed the entire organization to embracing service-oriented architecture (SOA). SOA is not unlike object-oriented programming (OOP): each function is discretely contained in an isolated building block, and the pieces connect via an agreed-upon interface. In software design, for example, this allows a programmer to use a third-party library without needing to understand how its innards work. SOA applies the same principles at the organizational level. Bezos mandated that all business units compartmentalize their operations and communicate only through well-defined (and documented) interfaces such as SOAP or RESTful APIs.
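The library analogy can be made concrete with a toy sketch (the service and method names below are hypothetical, invented purely for illustration): one team publishes a small, documented contract, and a consuming team depends only on that contract, never on the internals behind it.

```python
import json


class InventoryService:
    """Toy 'service': the public contract is check_stock(sku) -> JSON string.
    Callers never touch the internal storage directly."""

    def __init__(self):
        # Private implementation detail; another team could swap this
        # for a database without breaking any consumer.
        self._stock = {"book-123": 7}

    def check_stock(self, sku: str) -> str:
        return json.dumps({"sku": sku, "available": self._stock.get(sku, 0)})


# A consumer (say, the fulfillment team) depends only on the interface:
def can_fulfill(service, sku: str, qty: int) -> bool:
    response = json.loads(service.check_stock(sku))
    return response["available"] >= qty


print(can_fulfill(InventoryService(), "book-123", 5))  # True
```

Swap the in-memory dict for a real datastore, or the method call for an HTTP request, and nothing on the consumer side changes; that interchangeability is the whole point of the mandate.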

For an organization of Amazon’s size, this change was a big deal. The company had grown very rapidly over the previous decade, which led to the web of interconnectedness seen in all large companies: purchasing and fulfillment might share the same database server without either knowing who’s ultimately responsible for it, IT operations might have undocumented root access to payroll records, etc. Amazon.com’s organic growth had left it full of the equivalent of legacy “spaghetti code.” Bezos’ decision to refactor the core of operations required a tremendous amount of planning and effort but ultimately led to a much more scalable and manageable organization. As a result of the SOA initiative, in 2006, a small team in South Africa released a service that allowed computing resources to be provisioned on demand using an API: Amazon Elastic Compute Cloud (EC2) was born. Along with S3, it was the first product of the newly introduced Amazon Web Services unit (since spun off into a separate business).

Amazon Web Services is the world’s most ambitious—and successful—result of service-oriented architecture. This philosophy drives product innovation and flows down to intended usage. When competitors like Rackspace tout “persistence” as a competitive advantage, they’re missing the entire point of AWS. EC2 is the antithesis of buying a server, lovingly configuring it into a unique work of art, and then making sure it doesn’t break until it’s depreciated off the books. Instead, EC2 instances are meant to be treated as disposable building blocks that provide dynamic compute resources to a larger application. That application will span multiple EC2 instances (auto-scaling groups) and likely use other AWS products such as DynamoDB and S3. The pieces are then glued together with Simple Queue Service (SQS), Simple Notification Service (SNS), and CloudWatch. When a single EC2 instance is misbehaving, it ought to be automatically killed and replaced, not fixed. When an application needs more resources, it should know how to provision them itself rather than requiring an engineer to be paged in the middle of the night.
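The “kill and replace, never repair” loop can be sketched in a few lines. This is a toy simulation, not the real AWS API (the class, the `reconcile` method, and the instance IDs are all invented for illustration); in practice an auto-scaling group’s health checks play this role.

```python
import itertools


class InstancePool:
    """Toy model of an auto-scaling group holding a desired instance count."""

    def __init__(self, desired: int):
        self._ids = itertools.count(1)
        self.desired = desired
        self.instances = {}          # instance_id -> healthy?
        self.reconcile()             # launch the initial fleet

    def launch(self):
        self.instances[f"i-{next(self._ids):04d}"] = True

    def reconcile(self):
        """Terminate anything unhealthy, then top the pool back up."""
        for iid in [i for i, ok in self.instances.items() if not ok]:
            del self.instances[iid]  # terminate, don't fix
        while len(self.instances) < self.desired:
            self.launch()            # replace with a fresh instance


pool = InstancePool(desired=3)
pool.instances["i-0002"] = False     # simulate a misbehaving instance
pool.reconcile()
print(sorted(pool.instances))        # ['i-0001', 'i-0003', 'i-0004']
```

The key design choice is that no code path ever tries to diagnose or repair `i-0002`; the sick instance is simply discarded and an identical replacement takes its place, which only works when instances are stateless, interchangeable blocks.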

Interconnecting loosely-coupled components is how a systems architect properly designs for AWS. Trying to build something on EC2 without SOA in mind is about as productive (and fun) as playing with a single Lego brick.


8 thoughts on “Why an EC2 Instance Isn’t a Server”

  1. Hi,

    what I’ve done is to sneak “grid instance” into conversations about AWS/EC2, since this seems to catch on better. I mean, “compute unit” or whatever should make it clear enough, but so far it hasn’t taken hold with many people. Just the same, it’s not easy for everyone (i.e. me) to rewrite their applications as something cloud-ready that fully supports scaling out through the whole “stack” (I mean multiple IPs, multiple service instances, and still keeping things transactionally synced)
    By now, I see a trend to redefine the classical cloud VM. E.g. on Webhostingtalk.com you’ll find providers saying “it’s not a true cloud if you can’t resize the memory” (yeah right, as if it mattered when you can just spin up a new, bigger instance and phase out the smaller one), or “cloud hosting” being used to mean “if the VM fails it will be launched on another host” – with the original data.
    It makes me cringe twice: first because I hate people redefining terms so they match their old-fashioned offerings, and second because they won’t be liable if someone follows their false advertising on a real cloud infrastructure. We all know the joke about how “you get to keep the pieces” after a failure.
    Imagine you were at some cloud-redefining hosting company and switched over to a cloud that really is one, more concerned with spinning up a few hundred more instances of your service than with treating each single instance as a VM that has no backup, no SAN storage, …

    I wish them a good lawyer to sue the old shop for their false advertising.

  2. We at Cloudozer take the concept even further. Virtual instances should effectively replace OS processes. They should be slimmed down and contain mostly application code. Instances should be started/stopped with the same ease with which we call (remote) functions today. SOA as currently implemented (XML/RPC) is too heavyweight for our taste. We are using the 9P protocol from Plan 9 to coordinate computing nodes.

  3. I had to check the date on this post, because I was surprised that people still needed to be reminded of these things. If you needed to write it, though, I imagine some people still need to read it. Given that, Rackspace’s emphasis on persistence isn’t missing the point of AWS; it is good marketing — there are people who aren’t ready for AWS’s building blocks, and may never be.

    So far, I’ve been one of those people. At my last job, I led the deployment of an automated configuration management system that was a big step toward being able to run in an AWS-like environment, but we ended up stopping short because, as it happened, we had more important issues to deal with (i.e. why our growth rate was pathetic) and so never implemented and tested failover for all the services. My personal “server” is hosted on Linode (a company that competes with Rackspace’s persistent virtual server offering). Getting it ready for AWS just isn’t worthwhile.

  4. I can relate to a lot of what you’re saying in this post, but I feel I must disagree when it comes to your thoughts on the persistent cloud. I began using Amazon EC2 and Rackspace Cloud on the same day just about four years ago, and via my cloud computing consultancy, we have multiple customers utilizing both providers for different purposes. We’re constantly telling our customers that they need to architect for the cloud provider they choose, and we always present all options to customers when they ask our opinion. What’s interesting is that, even when remaining as neutral as possible, we end up recommending Rackspace Cloud 95%+ of the time.

    I agree with you that applications need to be “cloud aware” and use their instances as simple building blocks to accomplish that feat. Unfortunately, that simply isn’t realistic for most companies. Unless you entered the tech world a mere few years ago, you’re likely coming from a traditional background: spin up a server, configure that server, and make it a part of your stack. That’s usually the path of least resistance, and it’s a hard habit to change, especially when customers want to focus on building and testing the viability of their product. In fact, we’re usually out celebrating when we can convince our customers to use tools like Puppet for consistent server deployments, as it’s almost always a small investment for a large return. However, when we stand up at the whiteboard and draw out the costs and time required to architect for instance failure, we almost always get pushback. To be honest, I usually agree with their decision. In the modern world of continuous deployment, “MVP out the door ASAP” thinking, and “build build build” strategies, start-ups don’t have that extra time to waste. Things will probably change in time, especially as software (databases especially) is built to handle failure internally… but that day is not today.

    Granted, there are great Platform as a Service (PaaS) products that can live on top of services like EC2 and handle the chaotic drama of dying instances. Unfortunately, those platforms cost money, and a lot of companies are hesitant to use them due to vendor lock-in (then again, EC2 itself is vendor lock-in, but I won’t comment on the benefits of OpenStack here — that’s another response entirely). Furthermore, most PaaS offerings are limited in the technologies that can be used, and customers oftentimes end up spinning up raw EC2 instances anyway as a workaround. Moreover, I often see customers take a step backwards when they start at a service like Heroku and move themselves to chaos clouds like Amazon EC2. They’re going from a solid platform with underlying redundancy and complete abstraction to essentially bare-metal instances that can “die” at any time. Those bare instances make great building blocks, but those developers aren’t going to focus their time on treating them as such.

    Amazon was the first major player in the market, and we can all admit that those looking to move to “the cloud” often choose Amazon simply because they’ve heard it’s the “right direction to go” and they are the “biggest player in the field”. Those are the types of customers that fail to architect for Amazon’s services, utilize instances like they would persistent servers, and experience all of the pain you describe in your post. No one can deny that there’s a market for a “happy medium” service.

    While we’re a small organization with mostly small customers, I can happily say that we’ve worked with customers with hundreds of EC2 instances who still don’t have the spare developer cycles to focus on architecting for Amazon and handling failure properly. Luckily, many of those start-ups succeed long-term and can iterate on their product until they get it right and can handle most failures. Until then, they should feel fortunate to have a persistent cloud under the hood to nurture them as they grow.

    Again, I compliment you on your blog post. I like Amazon EC2, but I think it’s important that people understand how it was designed to be used and what kind of thinking is required to utilize it successfully. Let’s not ignore that much-needed “middle ground” that companies like Rackspace offer… they’re doing it on purpose, and they’re doing it quite well.

  5. Taking the comment from Matt even further: it’s not only the cost implied in making a custom-built product properly cloud-aware; the problem also lies in the hundreds of thousands of COTS apps out there which were not developed in cloud mode. From SAP to WordPress, all of them are designed around a persistence model with a big DB sitting in the middle (or some other equivalent persistence layer which does not play well with the many-disposable-blocks approach).

  6. Amazon’s concept of engineering solutions by gluing together building blocks like EC2, S3, DynamoDB, etc. is not truly engineering unless there is a service which will rate an application’s performance when executed on the various available compute types and units. Gluing and/or bolting requires knowledge of the load-carrying capacity and bonding strength of the joints. In the absence of that kind of information, Amazon’s dream will remain a dream.
