AWS Elastic Beanstalk for Docker (aws.typepad.com)
139 points by jeffbarr on April 23, 2014 | hide | past | favorite | 50 comments


It's a shame the containers appear to be stuck at one container per VM, which is fairly limiting. There's no real reason you wouldn't run 10 or more containers on one VM, other than IP allocation (which AWS already does very well).


Agree - but this is a good initial offering that I believe will only get better with time.

This is a great thing for AWS and for Docker users.


Help me understand why Docker adds any value in prod. When I tried it, I got quickly frustrated with the "expose a single port" to the local OS. Sure, I could go ham on it and move ports around, but where is the scale in that? If we are limited to one Docker container per VM, it seems like we waste more than we gain. Same problem as with physical servers.

Not trolling here, seriously interested in where people see value in the real world.


I'd love to spend some time with you if you're seriously interested.

You can expose more than a single port to the local OS, so it sounds like your experience is fraught with misunderstanding, which can only be characterized as our fault :)

Reach out - nick@docker.com


The "one docker container per VM" is a Beanstalk-imposed constraint, not a Docker constraint.

I think it's helpful to frame Docker in terms of encapsulation.

Imagine a basic Java app with multiple dependencies. Rather than build a lean app.jar and rely on the host providing dependency1.jar and dependency2.jar, many folks package it all into a single app_with_dependencies.jar.

But what if your app is a web app with static content? Rather than depending on the host system to serve the static content separately, many folks package it all into a single app_with_content.war.

But what if your app needs a specific version of Java? Rather than depending on the host system to have the exact Java you need, folks are starting to package the app into a Docker image that has their desired Java version.

Each layer of encapsulation simplifies server provisioning and thereby improves app portability. Taken to an extreme, you can treat your hosts as homogeneous and run any Docker container on any server.
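As a sketch of that last layer of encapsulation (base image, Java package, and paths are illustrative choices, not anything the parent prescribed), a Dockerfile pinning the app's Java version might look like:

```dockerfile
# Bake the exact Java runtime the app needs into the image,
# so the host only needs Docker - nothing else.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y openjdk-7-jre-headless
ADD app_with_dependencies.jar /opt/app/
CMD ["java", "-jar", "/opt/app/app_with_dependencies.jar"]
```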


IMO, Docker is less useful when you are on a cloud platform like AWS, since you can already package and ship your application as an AMI; they even have a marketplace for this: https://aws.amazon.com/marketplace. One exception I can think of where you really need to run Docker inside EC2 is when you want to build a PaaS or something like memcached-as-a-service, since it is expensive to give every customer a dedicated instance.

Honestly, I would prefer to run many small/medium instances rather than one powerful instance hosting many Docker containers: if you put all the eggs in one basket and the host machine goes down, you have a bigger problem. And if you host a single container per EC2 instance, then as I said before, you can already solve this via an AMI without the overhead/abstraction.

Of course, it is still very useful to run Docker on a physical machine when you don't want to set up KVM or mess around with OpenStack.


Docker is massively more lightweight than shipping AMIs.

- We build hundreds if not thousands of Docker images a day, after each CI commit. That's not viable with virtual machines or AMIs.

- Docker has the union file system, so pushes and pulls of Docker images after each CI build are tiny;

- Docker is more platform independent, whereas AMIs are very EC2 specific. No other cloud provider has anything as sophisticated as AMIs;

- Rollbacks are instant with Docker;

Docker can also be used in conjunction with autoscaling and resilience. We have 10+ containers on a box, but still replicate this 5 or 6 times, and will possibly add EC2 AutoScaling in future. Cloud and Docker are complementary, not alternatives, in my opinion.


Cool. This analysis is similar to my thoughts. Still trying to get educated and reached out to the docker fellow below for more info.

Big fan of packer.io + AMI / Virtualbox / GCE for this.

See generally: http://www.infoq.com/articles/virtual-panel-immutable-infras...


Sounds like you might be a perfect candidate for CoreOS. fleet allows for single-purpose containers to move around a cluster. It supports all of the platforms that you mentioned (AWS, Openstack, KVM and bare metal).


One Docker container per VM is not a Docker restriction - more a restriction of this AWS offering.

Was a single exposed port a legacy restriction in Docker?

It's not the case now - you can expose many ports to the OS, and then map them as you wish.

This is actually a massive benefit. We might deploy 10 containers to one box, all exposing SSH on port 22 and HTTP on port 80, but then at runtime choose which ports the OS exposes and map 10 different host ports onto 22/80 in the containers.

The real value I find of Docker in production is the deployments - fast, repeatable, easy to rollback. Once they're on a box, it's much of a muchness once you've mapped the ports.


Please explain the port mapping in more detail. For example, let's say I want to run N containers on one host OS. Each of the N containers exposes port 80. How do you do the mapping?


So you have one image exposing port 80.

You create, say, 3 instances of that image (i.e. 3 containers) with different port mappings: 5001:80, 5002:80, 5003:80. These are just specified at the command line (docker run -p 5001:80).

Against the same containers you might also do 6001:22, 6002:22, 6003:22 to expose SSHD to different ports on the host.

This allows you to defer the decision of which containers will be listening where to deployment or runtime, giving you lots of flexibility.
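A minimal sketch of that workflow (the image name is hypothetical, and the commands assume a running Docker daemon):

```shell
# Run three containers from the same image; each maps different
# host ports onto the container's port 80 (HTTP) and port 22 (SSH).
docker run -d -p 5001:80 -p 6001:22 --name web1 myorg/webapp
docker run -d -p 5002:80 -p 6002:22 --name web2 myorg/webapp
docker run -d -p 5003:80 -p 6003:22 --name web3 myorg/webapp

# Host port 5002 now reaches port 80 inside the web2 container.
curl http://localhost:5002/
```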


Thank you.

OK, how does a client outside of my docker host know to connect to 5001 to get to container1 port 80 and 6001 to get to container1 port 22?

BTW, this was the scenario I mentioned initially as "I could go ham on it and move ports around."

Thanks again for the reply, getting somewhere.


The less mature piece of the puzzle for that is service discovery.

So instead of your apps having hard-coded endpoints, they go to some config file, ZooKeeper, or etcd to find the services. Your config management piece can register endpoints with the same directories as they come online.

An alternative is just to register with a load balancer dynamically.

This bit does take work but it gives your system a very nice dynamic fungible property.
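For instance, with etcd's v2 HTTP API (default port 4001 at the time; the key layout here is just an illustration), a container's start script could register its mapped endpoint, and consumers could look it up:

```shell
# Register container web1's host:port under a well-known service key.
curl -L -X PUT http://127.0.0.1:4001/v2/keys/services/web/web1 \
  -d value="10.0.1.5:5001"

# A client or proxy-config generator lists the live endpoints.
curl -L http://127.0.0.1:4001/v2/keys/services/web
```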

Happy to discuss more offline with anyone as I'm a massive advocate of docker now! Same username at Twitter.


Thanks for the solid explanations Benjamin.

Docker will soon offer a better solution to service discovery. It will allow taking advantage of the various SD tools out there (zookeeper, etcd, synapse/haproxy, skynet, serf) while keeping your containers compatible with all of them. This will help keep all docker containers compatible with each other, instead of fragmenting them among mutually incompatible service discovery protocols.

Happy to chat about it more on irc - #docker and #docker-dev on Freenode.


"an alternative is to register with a load balancer dynamically"

To expand on this, a common pattern is to have a local proxy installed on each host machine. To route traffic to a new container, one simply updates the config for the local nginx/haproxy/hipache/other proxy.

A second layer of load balancing exposes the singular, top-level endpoint. All Docker hosts are registered with this LB, but those without live containers for a given service are simply "out of rotation" for that endpoint (health check failure).

This simplifies the service discovery aspect for architectures not already using a more sophisticated mechanism.
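A sketch of what that per-host proxy config might look like with nginx (upstream name and ports are illustrative; the config-management piece would rewrite this file and reload nginx as containers come and go):

```nginx
# One upstream per service, listing the host ports mapped to live containers.
upstream web_service {
    server 127.0.0.1:5001;  # container web1
    server 127.0.0.1:5002;  # container web2
}

server {
    listen 80;
    location / {
        proxy_pass http://web_service;
    }
}
```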


http://12factor.net/port-binding

"In deployment, a routing layer handles routing requests from a public-facing hostname to the port-bound web processes."


The "expose a single port" default suggestions always annoyed me a bit. The fact is that each container gets an IP on a virtual Ethernet interface and, with the right routes in place, can be accessed by anyone on the same network -- so obviously more ports can be exposed :-)


Aha!

This is called advanced networking, yes?

http://docs.docker.io/use/networking/

They didn't have this when I looked last summer.


It's been a few months since I managed to dig into Docker for some test projects at work, but docker0 was the name of the virtual bridge interface, so the documentation you linked is probably what we're talking about :-)

I only started playing with Docker last December, and I think this was supported from that point in time, so it could well be a recent addition.

Currently I'm envisioning Docker playing a nice role in our CI and development infrastructure at work, where we can choose to start a new container for any given project and get a locally accessible IP back in a matter of seconds ... all from our soon-to-be-created CI+deployment web UI ;-)


We'll carve out ~15 minutes for a brief overview + Q&A during tomorrow's Office Hours: https://plus.google.com/events/ceoe036ugsu6hndr6ncl616gr3k


Evan this is really nice!



I even got a simple .NET app running on Elastic Beanstalk, and it's been up for almost two years with no downtime. Quite impressed.


This is excellent news. I have a rather complicated app which I fought tooth and nail to get working on Elastic Beanstalk. The provisioning scripts were just too buggy, undocumented, and full of bad assumptions.

I have a couple of apps running perfectly under EB. This neatly solves my main issue with it.


I would be interested in the details with regards to problems you had. I deploy a java/tomcat application to Elastic Beanstalk, and haven't had any issues, going so far as to heavily modify the environment using .ebextensions scripts to replace apache configurations, install additional packages, etc.


My Clojure app originally written to run on EB worked with no issues. The problem was with an old, sprawling "monorail" Rails app that needed a very specific system configuration, including running a daemon or two in addition to the web app itself. I can't remember all the individual issues, but the .ebextensions directory was getting pretty big and complicated.

I finally gave up when I ran into bugs in the Rails provisioning scripts that only appeared on initial deploy (or re-deploy; I forget). I had to patch and overwrite them, which was very brittle, and it was just easier to use Chef to spin custom AMIs exactly as I wanted them.

Having Docker be the common "configure a system to your app's needs" mechanism shields me from all these details about how EB works, and that's exactly what was needed.


That's fair. Thanks for the details.


I'm still looking for a way to deploy with Elastic Beanstalk without causing a brief interruption in service. It seems like a major oversight in the system it provides.

I know you can do the CNAME swap trick, but that causes monitoring issues due to essentially switching between groups of servers, plus you have to run twice as many servers.

Since you've had some experience with advanced configuration of EB apps, have you run into any convenient trick for doing this?


Elastic Beanstalk does have rolling updates, but unfortunately they're for environment changes, not application version changes.

Our developers are fond of the Elastic Beanstalk interface; to switch between app versions without end-user disruption, I have logic that directly manipulates instance state and uses connection draining (both ELB functions) to cycle new clients to the fresh cluster while not disrupting existing clients.

We don't run twice as many servers; before we switch between groups of servers, we scale up the destination cluster, and after the switch scale down the stale cluster with the old app version (2 is the minimum for load balanced applications unfortunately).

My email is in my profile; feel free to get in touch, I'd be happy to answer any questions you may have.


I agree this is the EB's biggest missing feature. I do it by having a second environment running the same web app. Before deploying I add nodes from this 'reserve' env to the ELB of the main env, and remove all the nodes-to-be-deployed from the main load balancer. Then I deploy the app. When deploy is complete and prod nodes are ready I add them back to the ELB and remove the reserve nodes. The deployment is seamless but this is not a convenient trick - it required a fair bit of custom scripting... Luckily AWS provides great APIs to make all this stuff possible.


My .ebextensions script just calls chef-client, which is pretty straightforward to maintain. Downside is that it takes many minutes for a new node to come up. Sounds like Docker might help speed this up.


How do you guys feel about the "Container-as-a-Service" providers (Orchard, Stackdock, Tutum) in relation to Elastic Beanstalk for Docker? IMHO AWS's solution appears complex and lacking compared to this: http://ow.ly/w63M6


It says on the docker site "Please note Docker is currently under heavy development. It should not be used in production (yet)."

http://www.docker.io/learn_more/


Until elastic beanstalk supports time scheduled scaling, I will be sad. Every morning my app servers go from zero to a million miles an hour and it takes about 20 minutes to scale up. Does anyone actually use this at scale?


Elastic Beanstalk uses Auto Scaling groups, which support as-put-scheduled-update-group-action in the CLI or http://docs.aws.amazon.com/AutoScaling/latest/APIReference/A... to set time-based scaling.
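With the unified aws CLI, the equivalent is `aws autoscaling put-scheduled-update-group-action`; a rough sketch (the group name, action name, and sizes are made up for illustration):

```shell
# Scale the group up every weekday morning, ahead of the traffic spike.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-eb-asg \
  --scheduled-action-name morning-scale-up \
  --recurrence "30 7 * * 1-5" \
  --min-size 4 --max-size 12 --desired-capacity 8
```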


Lovely. I had a consultation with an Amazon rep after reading http://aws.amazon.com/application-management/ and they confirmed their own incorrect documentation.


We use CloudFormation to do this, but I thought beanstalk had similar cloudwatch metrics to scale with.

We don't use chef/puppet/ansible/salt in production or any code deploy stuff. Basically use your provisioning tool of choice to setup an instance like you would a local vm, deploy your code, then package that all up as an AMI.

When the server spins up there is some minor config that gets set depending on the zone it is in and we are off to the races. Takes about 2 minutes to fully register with ELB and start serving traffic.

Each step you remove from a new instance becoming 'ready' decreases your load time and lets you run at higher margins, so you aren't stuck with a CloudWatch alert of 30% CPU -> scale up that takes 20 minutes while Chef configures the server, git deploys the code, assets/configs compile, and the instance finally registers with the ELB.

Downside of course is you are always making AMIs to do a new push.

If you follow semver, you could add a small step to pull your latest .PATCH version, so you are only creating AMIs for MAJOR.MINOR - or, if you prefer, only for MAJOR releases.

tl;dr: reducing the number of steps a new instance has to take before it can handle traffic = higher CloudWatch thresholds = more server utilization = a cheaper bill and faster scaling!


I believe you could write code to do this, as long as you have amazon credentials on a server at that time in the morning. Did you write your own app, or do you need to hire a consultant to do this?


Autoscaling isn't really intended for bursting; it works better when you have predictable traffic patterns. But if I understand it correctly, you can use EB with custom CloudWatch metrics, and that lets you do some pretty cool, predictive things that autoscale based on external factors such as volume of Twitter mentions, Google Analytics data, number of active user sessions, etc. In theory you could write some app code that uses custom CloudWatch metrics to initiate scale-up during certain timeframes, right?


> Every morning my app servers go from zero to a million miles an hour

Sounds pretty predictable to me.


Use AWS OpsWorks. It allows you to set up time-based instances easily.


I'm currently playing with Docker and wonder: how does that tech fit with things like Chef/Puppet/SaltStack (their config part)? Is it going to make all those technologies pointless?


You can call chef-solo etc. inside your Dockerfiles to create your images, and you can use Chef etc. to put images on your servers and hook them together.

The Docker "philosophy" is that Docker images should be simple, running only a single process. In that case chef-solo et al. are often overkill for creating images. Docker mostly solves the "known state" problem, so there really isn't much that chef-solo gives you over running shell commands from your Dockerfile.

But sometimes it's convenient to create a single image that encapsulates everything, containing lots of moving parts. In that case, a provisioning tool could be very useful to put it together. You'll probably lose some of the 'layering' benefits of Docker, though.

I do find that provisioning tools are very helpful for putting images onto a server and hooking them together. That's the central configuration store philosophy. Others are working on things like etcd for distributed configuration stores.
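A rough sketch of the first pattern, baking an image with chef-solo at build time (the paths, cookbook layout, and start script are hypothetical):

```dockerfile
# Run chef-solo once during the build; the result is a sealed image
# that no longer needs Chef at runtime.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y curl
RUN curl -L https://www.opscode.com/chef/install.sh | bash
ADD cookbooks /srv/chef/cookbooks
ADD solo.rb solo.json /srv/chef/
RUN chef-solo -c /srv/chef/solo.rb -j /srv/chef/solo.json
CMD ["/opt/app/start.sh"]
```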


Hi, Docker is pretty complementary with the configuration management tools you mention. Lots of people use them in combination.

One common pattern is to use chef/puppet/salt/ansible to help you build the application stack inside the container, then seal the result into a standard Docker container. Once the build phase is over, it no longer matters which configuration management tool you used (if any), which makes your application more portable across infrastructure. See for example http://tech.paulcz.net/2013/09/creating-immutable-servers-wi...


Here's a post I wrote about using Ansible inside a docker file -- http://www.ansible.com/blog/2014/02/12/installing-and-buildi...


Docker can play with or without Chef/Puppet/Salt/Ansible/etc.:

- you can use configuration management tools to author Docker images;

- you can use configuration management tools to deploy Docker (and start Docker containers).

The following presentation has some Puppet-specific information (but the concepts map neatly to other CM tools):

http://www.slideshare.net/jpetazzo/docker-and-puppet-puppet-...


Does updating the source rebuild the image from scratch?


If you deploy a Dockerfile, EB will re-build the image.

Also, by default, EB will do a `docker pull your-image` on each app version deploy. To disable that, include a Dockerrun.aws.json file with the following:

  {
    "AWSEBDockerrunVersion": "1",
    "Image": {
      "Update": "false"
    }
  }

And some more color here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create...

EDIT: I'd also point out that EB uses Docker's layer/image caching, so a rebuild does use the local cache (but it does do the `docker pull` to check the remote for any updates).


Thank you, all clear now!



