ECR + S3 VPC Gateways

February 25, 2021

AWS Gateway VPC endpoints are so good that Amazon only ever shipped two. This is a short post so I’m not going to go into exhaustive detail about what they are, but inexhaustively: they help to keep traffic inside a VPC. See, if you can attach an ENI to a resource, then that resource can be inside a VPC, and two ENIs inside the same VPC can chat with each other without ever hitting the internet. EC2 instances, RDS databases, Lambdas (sometimes) and more can all have ENIs. That part all makes sense – it’s the “virtual datacentre” and all that.

Hold up though – you know what’s not inside your VPC? S3. SNS. SQS. A whole bunch of other fully managed services. If in the course of enabling an AWS service you aren’t asked what VPC to launch it into, then that service lives on the internet, and anything communicating with that service must route through an Internet Gateway or NAT Gateway to reach it. This is non-obvious at first. It makes sense when it’s explained as “S3 doesn’t have an ENI in your virtual datacentre,” but S3 is still an AWS service, so shouldn’t it be reachable privately? Not according to the route table.
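
If you want to see this for yourself, here's a minimal boto3 sketch that prints a subnet's routes (the subnet ID is a placeholder; swap in one of your own). Without an endpoint, the only non-local route is typically 0.0.0.0/0 pointed at an internet or NAT gateway, and S3-bound packets take it like everything else.

import boto3

ec2 = boto3.client("ec2")

# Find the route table associated with a particular subnet.
tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": ["subnet-0123456789abcdef0"]}]
)

for table in tables["RouteTables"]:
    for route in table["Routes"]:
        # Each route is either a CIDR block or a prefix list pointed at some target.
        destination = route.get("DestinationCidrBlock") or route.get("DestinationPrefixListId")
        target = route.get("GatewayId") or route.get("NatGatewayId") or route.get("TransitGatewayId")
        print(destination, "->", target)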

This is what VPC endpoints do: an interface endpoint drops an ENI with a private IP address into your VPC, while a gateway endpoint adds a route for the service's prefix list to your route tables. Either way, something like S3 becomes reachable without having to hit the internet on the way there.

I told you that story so I can tell you this one

So gateway endpoints (not to be confused with interface endpoints) are pretty cool already, but it gets better, because gateway endpoints are literally cheaper than free. By that I mean there's no hourly rate for running them, and because the packets never leave your VPC, you save on Amazon's inscrutable network charges (plus the NAT Gateway's per-GB processing charge, if your traffic would otherwise flow through one).
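
Turning one on is a single API call. Here's a rough boto3 sketch; the VPC and route table IDs are placeholders, and the region baked into the service name is assumed to be us-east-1, so use whichever region you actually run in.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    # A gateway endpoint works by adding a prefix-list route to these tables,
    # so list every route table whose subnets should reach S3 privately.
    RouteTableIds=["rtb-0123456789abcdef0"],
)

print(response["VpcEndpoint"]["VpcEndpointId"])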

Maybe you don't move a massive amount of stuff in and out of S3, though, and what's a couple of pennies saved anyway? Well, how about containers?

ECR storage is backed by S3, and not indirectly either. As in, when you pull an image from ECR, the ECR API is mostly there for authorization and to fetch the manifest. When you start pulling down the actual image layers, you’re pulling from Amazon’s starport buckets. This is called out most explicitly in a troubleshooting doc where it’s explained that you won’t be able to pull from ECR if you blanket disallow S3 access without making an exception for their buckets.

I'm not sure how long it will be until entropy eats that link, so here are the relevant bits:

How do I resolve the “error pulling image configuration: error parsing HTTP 403 response body” error in Amazon ECS when pulling a Docker image from Amazon ECR?

Amazon ECR uses Amazon Simple Storage Service (Amazon S3) to store your image layers. When your containers download images from Amazon ECR, they must access Amazon ECR to get the image manifest and then Amazon S3 to download the image layers. The following is the Amazon Resource Name (ARN) of the Amazon S3 bucket that contains the layers for each Docker image.

arn:aws:s3:::prod-region-starport-layer-bucket/*
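
Gateway endpoints come with an allow-all policy attached by default, which is why most people never trip over this. If you do tighten that policy, though, the doc above is telling you to leave a hole for ECR's layer bucket. Here's a rough sketch of what that could look like with boto3; the endpoint ID is a placeholder, and the bucket name assumes us-east-1 substituted for "region" in the ARN.

import json

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowEcrLayerPulls",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            # ECR's layer bucket for the region this endpoint lives in.
            "Resource": "arn:aws:s3:::prod-us-east-1-starport-layer-bucket/*",
        }
        # ...plus whatever statements cover your own buckets.
    ],
}

ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-0123456789abcdef0",  # the gateway endpoint from earlier
    PolicyDocument=json.dumps(policy),
)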

In conclusion…

If you're moving a lot of stuff through ECR all day (especially giant monolithic Rails apps, and especially if you run NAT Gateways), then go ahead and turn on that VPC endpoint and save a bundle.