Choosing the appropriate Amazon EC2 instance type is an important task, but it can be frustrating to navigate the many options available and understand what kind of performance to expect. One common pitfall is to assume similar specs have the same performance characteristics across instance classes. We recently ran into this problem at Credera when changing instance types for one of our own internal applications running on AWS.
Background
The application in question is composed of multiple Java Spring Boot microservices sharing the same t2.large EC2 instance. Over time, services were added until we were using close to 100% of the 8 GB of RAM available on our instance. To remedy this, we made the decision to move up to a memory-optimized r4.large instance. It had two vCPUs just like the t2.large and almost double the memory at 15.25 GB.
Shortly after, we noticed that it was taking close to twice the amount of time for all our services to start up during deployments. This pointed to a CPU issue, but that didn’t seem to make sense. An r4.large instance has the same number of vCPUs as a t2.large, and the underlying Intel Xeon processor is roughly the same in terms of raw performance. So what could cause the difference?
A Surprising Difference
For a while, we were stumped as to what could be causing the slowdown. It wasn’t until I read the documentation on AWS’s new CPU option customization capabilities that it clicked: Not all vCPUs are made the same.
It turns out there is an important piece of information on AWS’s instance types page that can be easy to overlook because it is just a footnote at the bottom of a very large table:
“Each vCPU is a hyper-thread of an Intel Xeon core except for T2.”
In other words, for T2 instances, 1 vCPU = 1 physical core. For all others, 1 vCPU = 1 logical core. This can lead to significant performance differences when it comes to multi-threaded, burst CPU usage. As luck would have it, starting multiple services at the same time is a multi-threaded, burst task, causing us to see significant performance drops on our R4 class instance as compared to our T2 class instance. Wishing to understand the full impact, I decided to perform some tests myself.
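You don't have to rely on the footnote alone, because the EC2 API reports each instance type's default core and thread counts. The sketch below assumes the AWS CLI is installed and configured with a default region. It uses the DescribeInstanceTypes API, which is newer than this post, and the helper name `show_vcpu_topology` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print vCPU count, physical cores, and threads per core
# for each instance type passed as an argument.
# Assumes the AWS CLI is installed and configured with a default region.
show_vcpu_topology() {
  aws ec2 describe-instance-types \
    --instance-types "$@" \
    --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, VCpuInfo.DefaultCores, VCpuInfo.DefaultThreadsPerCore]' \
    --output table
}

# Example (not run here):
#   show_vcpu_topology t2.large r4.large m4.large
```

Going by the footnote, t2.large should report one thread per core, and r4.large and m4.large should report two.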
Methodology
I discovered a well-written blog post by Marc Felding titled “Virtual CPUs With Amazon Web Services.” He researched this topic back when AWS switched from the ECU model to the current vCPU model for measuring instance compute capacity. I used the same basic methodology as Felding, which I will briefly summarize here.
The instances I used for my tests were t2.large, r4.large, and m4.large instances running the latest Amazon Linux AMI. To keep disk performance from influencing the results, gzip reads its input from /dev/zero and writes its output to /dev/null, so only virtual devices are involved.
The commands I used are the same as those used by Felding:
Single-threaded performance:
dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2) | gzip -c > /dev/null
Multi-threaded performance:
for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2) | gzip -c > /dev/null & done; wait
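The two commands above can be wrapped in a small function so the job count and data size are easy to vary, and so the multi-threaded run waits for every job before returning. This is a sketch, not Felding's script. The function name `run_gzip_jobs` is hypothetical, and the optional CPU list, which pins gzip with util-linux `taskset`, is an assumption about how a "single-core, two threads" case could be reproduced. The commands above don't show how that case was run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the dd | gzip test.
# Usage: run_gzip_jobs JOBS SIZE_MB [CPU_LIST]
#   JOBS      number of concurrent dd | gzip pipelines
#   SIZE_MB   megabytes of zeros each pipeline compresses (2070 in the tests above)
#   CPU_LIST  optional taskset CPU list such as "0" (assumes util-linux taskset)
run_gzip_jobs() {
  local jobs=$1 size_mb=$2 cpus=${3:-}
  local pin=()
  if [ -n "$cpus" ]; then
    # Restrict gzip, the CPU-bound half of the pipeline, to the given vCPUs.
    pin=(taskset -c "$cpus")
  fi
  local i
  for ((i = 0; i < jobs; i++)); do
    # dd prints its throughput summary on stderr; keep only the "bytes" line.
    dd if=/dev/zero bs=1M count="$size_mb" 2> >(grep bytes >&2) |
      "${pin[@]}" gzip -c > /dev/null &
  done
  # Block until every background pipeline has finished.
  wait
}

# Example runs (not executed here):
#   run_gzip_jobs 1 2070      # one job
#   run_gzip_jobs 2 2070 0    # two jobs forced to share vCPU 0
#   run_gzip_jobs 2 2070      # two jobs, spread across vCPUs by the scheduler
```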
Results
| Spec / test | t2.large | m4.large | r4.large |
|---|---|---|---|
| OS | Amazon Linux | Amazon Linux | Amazon Linux |
| Processor | Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40 GHz | Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30 GHz | Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30 GHz |
| Sockets | 1 | 1 | 1 |
| Cores per socket | 2 | 1 | 1 |
| Threads per core | 1 | 2 | 2 |
| Single-core, single thread | 132 MB/s | 138 MB/s | 137 MB/s |
| Single-core, two threads | 65.1 MB/s | 67.8 MB/s | 67.7 MB/s |
| Dual-core, two threads | 132 MB/s | 90.7 MB/s | 90.9 MB/s |
vCPU Performance Test Results
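The CPU information and performance results are summarized in the table above. The unexpected hit from hyper-threading shows up in the "Dual-core, two threads" row. The t2.large holds steady at 132 MB/s per thread, but on the m4.large and r4.large each thread manages only about 91 MB/s, around a 34% drop from their own single-thread results.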
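The 34% figure is the per-thread drop relative to each instance's own single-thread result. A one-line awk calculation is enough to check the arithmetic or apply it to your own measurements. `drop_pct` is a hypothetical helper name.

```shell
#!/bin/sh
# Per-thread throughput drop, in percent, when two threads run concurrently
# compared with one thread alone: (1 - multi / single) * 100.
drop_pct() {
  awk -v single="$1" -v multi="$2" 'BEGIN { printf "%.1f\n", (1 - multi / single) * 100 }'
}

# Figures in MB/s from the results table above.
echo "t2.large: $(drop_pct 132 132)%"   # t2.large: 0.0%
echo "m4.large: $(drop_pct 138 90.7)%"  # m4.large: 34.3%
echo "r4.large: $(drop_pct 137 90.9)%"  # r4.large: 33.6%
```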
The CPU topology rows explain this drop. While all three instances report one socket, the t2.large shows two cores with one thread per core, meaning hyper-threading is disabled. The m4.large and r4.large instead have one physical core with two threads, which make up their two vCPUs.
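The Sockets, Cores per socket, and Threads per core rows come from lscpu, so you can run the same check on any Linux instance. A minimal sketch that reads lscpu output on stdin and reports what kind of vCPU you have; `vcpu_kind` is a hypothetical helper, and the leading-whitespace match is there because newer lscpu versions indent this field.

```shell
#!/bin/sh
# Reads `lscpu` output on stdin and reports whether each vCPU is a full
# physical core or a hyper-thread, based on the "Thread(s) per core" field.
# Exits non-zero if the field is missing.
vcpu_kind() {
  awk -F: '
    /^[ \t]*Thread\(s\) per core/ {
      gsub(/[ \t]/, "", $2)
      if ($2 + 0 == 1) print "physical core"
      else print "hyper-thread (" $2 " threads per core)"
      found = 1
    }
    END { if (!found) exit 1 }
  '
}

# Example on an instance (not run here):
#   lscpu | vcpu_kind
```

Based on the table above, a t2.large should print "physical core", while an m4.large or r4.large should print "hyper-thread (2 threads per core)".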
Customizing vCPU Hyper-Threading
What can you do if you don’t want your vCPUs to be hyper-threaded? The good news is that AWS now lets you 1) reduce the number of physical cores assigned to an instance, and 2) disable hyper-threading. This means you can do things like:
Reduce licensing costs for software using a per-core or per-socket licensing model when using an instance type with sufficient RAM but an excess of CPU resources.
Improve performance for some workloads, such as high-performance compute (HPC) workloads, where hyper-threading can cause resource contention.
Note that customizing CPU options will not improve performance for most workloads. Consider carefully whether it is the right fit for yours. You can read more about customizing CPU options in AWS’s documentation.
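CPU options are specified when you launch an instance on a supported instance type. The minimal sketch below uses the AWS CLI's --cpu-options flag on run-instances and assumes a configured CLI. The helper names are hypothetical, and the AMI ID you pass in is a placeholder. The table above implies a trade-off: an r4.large has one physical core, so turning off hyper-threading leaves it with a single vCPU rather than two full cores like a t2.large.

```shell
#!/bin/sh
# Hypothetical helper: launch an r4.large with hyper-threading disabled.
# $1 is a placeholder AMI ID you must supply. With ThreadsPerCore=1 the
# instance keeps its one physical core but exposes only one vCPU.
launch_r4_without_ht() {
  aws ec2 run-instances \
    --image-id "$1" \
    --instance-type r4.large \
    --cpu-options "CoreCount=1,ThreadsPerCore=1"
}

# Hypothetical helper: show the core and thread counts a running instance uses.
show_cpu_options() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[].Instances[].CpuOptions'
}
```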
Takeaways
To sum up, here are some key takeaways from our experience:
For all instances except for T2, a vCPU is a hyper-thread of a physical core.
Always, always test the performance of your application on the target instance type, especially when moving between instance families (e.g., T2 to R4) or instance generations (e.g., M4 to M5).
Ensure you have post-launch monitoring for any instances in production, both at the application level and the server/instance level.
T2 instances offer great value for burstable CPU capacity. Because each T2 vCPU is a full physical core, they may use fewer CPU credits than you expect if you measured CPU utilization on another instance family.
Review instance types on a regular basis. AWS regularly releases new instance types that may offer better performance for the price.
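For the monitoring and CPU-credit points, CloudWatch already publishes per-instance EC2 metrics such as CPUUtilization and, for T2 instances, CPUCreditBalance. The minimal sketch below pulls hourly averages with the AWS CLI and assumes a configured CLI. The helper name ec2_metric and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fetch hourly averages of an EC2 CloudWatch metric.
# Usage: ec2_metric INSTANCE_ID METRIC START END
#   METRIC      e.g. CPUUtilization, or CPUCreditBalance for T2 instances
#   START, END  ISO 8601 timestamps such as 2018-01-01T00:00:00Z
ec2_metric() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name "$2" \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$3" \
    --end-time "$4" \
    --period 3600 \
    --statistics Average
}
```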
For Credera, we were fortunate that the performance impact only really affected the start-up time of our application and not its ongoing performance. The increased memory capacity, at a lower price than a comparable T2 instance, more than made up for this, so we stuck with the r4.large. We will also be evaluating the new R5 instances, which AWS advertises as offering 20% more CPU performance than R4 instances.
Well-Architected Framework
Choosing an instance type is an important part of two pillars of the AWS Well-Architected Framework: performance efficiency and cost optimization. Credera is an AWS Advanced Consulting Partner trained to provide AWS Well-Architected reviews to ensure that your workloads meet AWS best practices. Our AWS-certified experts have experience migrating, architecting, securing, and optimizing solutions on AWS. Feel free to reach out to us at findoutmore@credera.com with any questions.
Note: As this post was being published, AWS announced its new T3 instances. T3 instances feature a newer generation of Intel processors and support network bandwidth bursting up to 5 Gbps and EBS bursting up to 2.05 Gbps (or 1.5 Gbps for smaller instances). Unlike T2, they use hyper-threading by default, so multi-threaded processing takes a similar, though smaller, performance hit. Running the same test on a t3.large instance, I recorded compression speeds of 137 MB/s single-threaded and 108 MB/s per thread multi-threaded, a drop of about 21%.