Today I decided to take Amazon’s new AWS Certified Solutions Architect certification exam. The exam consists of 55 multiple-choice questions with a time limit of 80 minutes. I completed it in about 35 minutes with a passing score of 85%. I went into the exam cold, without any preparation, because I wanted an honest assessment of my knowledge. It actually surprised me how challenging I found it to be; it’s a really good, comprehensive exam.
What I appreciated most about the exam is that there weren’t any “gotcha” or filler questions like: “How many Compute Units does an m1.medium instance have?” or “What is the proper syntax for configuring an auto-scaling group using the command line tools?” The exam was obviously authored by someone with real-world experience working with AWS, not someone combing through the online docs looking for factoids. This is a worthwhile credential to have on your resume and for hiring managers to consider.
Some key items you should know before you take the exam:
- how to configure and troubleshoot a VPC inside and out, including basic IP subnetting. VPC is arguably one of the more complex components of AWS and you cannot pass this exam without a thorough understanding of it.
- the difference in use cases between Simple Workflow (SWF), Simple Queue Services (SQS), and Simple Notification Services (SNS).
- how an Elastic Load Balancer (ELB) interacts with auto-scaling groups in a high-availability deployment.
- how to properly secure an S3 bucket in different usage scenarios.
- when it would be appropriate to use EBS-backed versus instance-store (ephemeral) backed instances.
- a basic understanding of CloudFormation.
- how to properly use various EBS volume configurations and snapshots to optimize I/O performance and data durability.
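Since the VPC bullet calls for basic IP subnetting, it’s worth being able to do CIDR math quickly without a calculator. Here’s a small practice sketch using Python’s standard ipaddress module; the CIDR blocks are arbitrary practice values, not anything AWS mandates:

```python
import ipaddress

# A VPC is assigned a CIDR block; subnets carve it into smaller networks.
# These blocks are arbitrary practice values.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Splitting the /16 into /24s yields 2**(24 - 16) = 256 subnets.
subnets = list(vpc.subnets(new_prefix=24))

print(len(subnets))               # 256
print(subnets[0])                 # 10.0.0.0/24
print(subnets[0].num_addresses)   # 256 (note: AWS reserves 5 IPs per subnet)
```

The same module makes it easy to sanity-check whether two ranges collide — `vpc.overlaps(other_network)` — which is exactly the kind of reasoning the VPC questions probe.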
Most default installations of Apache take care of gzip compression automatically via mod_deflate (Nginx uses the equivalent HttpGzipModule). Files are compressed on the fly and a Content-Encoding: gzip HTTP header is added to the response. However, since Amazon S3 is just a place to store files, it lacks the ability to gzip files in real time before delivering them. When using a website speed-test application like WebPageTest, this can result in informational warnings that look like this:
Use gzip compression for transferring compressable responses: 90/100 FAILED - (52.1 KB, compressed = 40.6 KB - savings of 11.5 KB) - http://mybucket.s3.amazonaws.com/css/awesomeness.css
To resolve this, files have to be compressed before being uploaded to S3. On Linux or OS X, this is easily done with gzip -9 awesomeness.css. Note that gzip replaces the original file with a compressed awesomeness.css.gz, so rename it back to “awesomeness.css” before uploading (or use gzip’s -c flag to redirect the compressed output to a separate file that keeps the original name).
This new file is then uploaded to S3 and the following metadata is set on the bucket object: Content-Type: text/css and Content-Encoding: gzip. The Content-Type header informs the web browser that the actual contents of the file are CSS markup, while Content-Encoding specifies that it’s a gzipped file. Both HTTP headers (“metadata” in AWS-speak) are required for the browser to correctly interpret the file.
Manually gzipping each file and setting the correct metadata can get very tedious, very quickly. I would recommend automating this process as a build step on a continuous integration (CI) server (at RBN we use Jenkins): a post-commit hook on the source repository fires, triggering a build. The CI server performs an SVN export or Git clone, then executes a shell script that gzips the files, uploads them to S3, and sets the proper object metadata. If you’re not currently using CI in your development workflow, a more basic version of this process can be hacked together using only commit hooks, though I would argue this is an opportune time to dip your toes into continuous integration.
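As a rough sketch of what that build step needs to do, here is a Python equivalent of the shell script described above. Everything specific in it is an assumption to adapt: the bucket name, the build-directory layout, and the use of the `aws` CLI for the upload step.

```python
#!/usr/bin/env python3
"""Hypothetical CI build step: gzip static assets, then upload each one
to S3 with the Content-Type and Content-Encoding metadata set.
The bucket name, directory layout, and use of the `aws` CLI are all
assumptions -- adapt them to your own pipeline."""
import gzip
import shutil
import subprocess
from pathlib import Path

BUCKET = "mybucket"  # hypothetical bucket name
CONTENT_TYPES = {".css": "text/css", ".js": "application/javascript"}

def gzip_file(src: Path, dest: Path) -> None:
    """Compress src into dest at maximum compression (like `gzip -9 -c`)."""
    with open(src, "rb") as fin, gzip.open(dest, "wb", compresslevel=9) as fout:
        shutil.copyfileobj(fin, fout)

def upload_command(path: Path, bucket: str = BUCKET) -> list:
    """Build the `aws s3 cp` invocation that sets both HTTP headers."""
    content_type = CONTENT_TYPES.get(path.suffix, "application/octet-stream")
    return ["aws", "s3", "cp", str(path), f"s3://{bucket}/{path.name}",
            "--content-type", content_type,
            "--content-encoding", "gzip"]

def deploy(src_dir: Path, build_dir: Path) -> None:
    """Gzip every .css asset into build_dir (keeping the .css name) and upload."""
    build_dir.mkdir(parents=True, exist_ok=True)
    for src in src_dir.glob("*.css"):
        dest = build_dir / src.name  # compressed copy keeps the original name
        gzip_file(src, dest)
        subprocess.run(upload_command(dest), check=True)  # needs AWS credentials
```

Because the compressed copy is written to a separate build directory under its original name, there is no rename step, and the source tree stays untouched between builds.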
At re:Invent, the AWS conference in Vegas last November, Amazon’s CTO, Werner Vogels, made an interesting observation during his keynote address: “An EC2 instance is not a server—it’s a building block.” This sounded suspiciously like a tagline a car manufacturer might use to convince me that their car “isn’t just another car.” But something about Vogels’ statement stuck with me. I had used EC2 since the public beta, and he had succinctly stated something that I knew to be true but couldn’t explain why. It took a few months before I put the pieces together and could articulate what EC2 is and what it is not.
From all outward appearances, and even functionally speaking, an EC2 instance behaves like a virtualized server: a person can SSH (or Remote Desktop) into it and install nearly any application. In fact, in my experience, this is how >95% of developers are using AWS. However, this is a very myopic view of the platform. Most of the complaints levied against AWS (unreliable, costly, difficult to use) come from those who are trying to use EC2 as if it were a traditional server located in a datacenter.
To truly understand AWS, we have to examine Amazon’s DNA. In 2002–2003, a few years prior to the introduction of EC2, Bezos boldly committed the entire organization to embracing service-oriented architecture. SOA is not unlike object-oriented programming (OOP): each function is discretely contained in an isolated building block, and these blocks connect via an agreed-upon interface. In software design, for example, this allows a programmer to use a third-party library without needing to understand how its innards work. SOA applies the same principles at the organizational level. Bezos mandated that all business units compartmentalize their operations and only communicate through well-defined (and documented) interfaces such as SOAP/RESTful APIs.
For an organization of Amazon’s size, this change was a big deal. The company had grown very rapidly over the previous decade, which led to the web of interconnectedness seen in all large companies: purchasing and fulfillment might share the same database server without either knowing who’s ultimately responsible for it, IT operations might have undocumented root access to payroll records, etc. Amazon.com’s organic growth had left it full of the equivalent of legacy “spaghetti code.” Bezos’ decision to refactor the core of operations required a tremendous amount of planning and effort but ultimately led to a much more scalable and manageable organization. As a result of the SOA initiative, in 2006, a small team in South Africa released a service that allowed computing resources to be provisioned on demand using an API: Amazon Elastic Compute Cloud (EC2) was born. Along with S3, it was the first product of the newly introduced Amazon Web Services unit (since spun off into a separate business).
Amazon Web Services is the world’s most ambitious—and successful—result of service-oriented architecture. This philosophy drives product innovation and flows down to its intended usage. When competitors like Rackspace argue “persistence” as a competitive advantage, they’re missing the entire point of AWS. EC2 is the antithesis of buying a server, lovingly configuring it into a unique work of art, and then making sure it doesn’t break until it’s depreciated off the books. Instead, EC2 instances are intended to be treated as disposable building blocks that provide dynamic compute resources to a larger application. This application will span multiple EC2 instances (autoscaling groups) and likely use other AWS products such as DynamoDB, S3, etc. The pieces are then glued together using Simple Queue Service (SQS), Simple Notification Service (SNS), and CloudWatch. When a single EC2 instance is misbehaving, it ought to be automatically killed and replaced, not fixed. When an application needs more resources, it should know how to provision them itself rather than needing an engineer to be paged in the middle of the night.
Interconnecting loosely-coupled components is how a systems architect properly designs for AWS. Trying to build something on EC2 without SOA in mind is about as productive (and fun) as playing with a single Lego brick.