Elixir/Erlang and autoscaled EC2 servers - how to make them best buddies

At Find a Player, we’ve recently been trying to figure out how to cluster our Elixir servers so that they can share state with each other. Because we come from Ruby (i.e., not a high-availability background), when we want to discover other machines in a network we’d traditionally use something like Consul, for which I wrote a Ruby library called Diplomat (which admittedly needs some love).

However, when we chatted over Twitter with Gordon Guthrie - a guy who’s been working with Erlang for over 13 years - he told us to check out :net_adm.world/0, which does pretty much exactly what we want, straight out of the box.

So, if you are building an Elixir app with Autoscaling and AWS, here is how to autoconnect your nodes:

Step 1 - Ensure all nodes have the correct hostname set up

On each node, make sure that the hostname is set to the instance’s private DNS name:

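HOSTNAME="$(curl -s http://169.254.169.254/latest/meta-data/hostname)"
sudo hostnamectl set-hostname "$HOSTNAME"
# Append via tee so the write runs as root; a plain >> redirect is not covered by sudo
echo "127.0.0.1 ${HOSTNAME}" | sudo tee -a /etc/hosts > /dev/null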
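Appending to /etc/hosts on every run leaves duplicate entries if the setup script is executed more than once. Here is a minimal sketch of an idempotent variant. The helper name add_hosts_entry and its arguments are hypothetical and not part of the original setup:

```shell
# Hypothetical helper: append "127.0.0.1 <hostname>" to a hosts file,
# but only if that exact line is not already present.
add_hosts_entry() {
  local hosts_file="$1"
  local host="$2"
  local entry="127.0.0.1 ${host}"
  # -x matches the whole line, -F treats the entry as a fixed string.
  if ! grep -qxF "$entry" "$hosts_file" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$hosts_file"
  fi
}
```

On a real node this would be called as root, for example add_hosts_entry /etc/hosts "$(hostname)".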

Step 2 - Build a .hosts.erlang file

If you have the AWS CLI installed (highly recommended), you can run this at some point before starting the app to build the file:

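SERVER_CLASS=tag-for-similar-nodes-in-aws
# List the private DNS names of all instances tagged Class=$SERVER_CLASS,
# put one name per line, then wrap each as a quoted Erlang atom ending in a period
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateDnsName' \
  --output text \
  --filters "Name=tag:Class,Values=${SERVER_CLASS}" \
  | tr '\t' '\n' | sed -e "s/\(.*\)/'\1'./" > "$OTP_ROOT/.hosts.erlang"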

The above script finds the matching instances, prints their private DNS names, and produces a file in the format below (one quoted host name per line, each terminated by a period):

'lorem.example.org'.
'ipsum.example.org'.
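The formatting step can be tried on its own, without calling AWS. Here is a minimal sketch, assuming a hypothetical helper named format_hosts that reads whitespace-separated host names on stdin:

```shell
# Hypothetical helper: read whitespace-separated host names on stdin and
# print one single-quoted Erlang atom per line, each terminated by a period.
format_hosts() {
  # Turn every run of whitespace into a single newline, drop empty lines,
  # then wrap each remaining line as '<host>'.
  tr -s '[:space:]' '\n' | sed -e '/^$/d' -e "s/.*/'&'./"
}
```

For example, printf 'lorem.example.org\tipsum.example.org\n' | format_hosts prints the two lines shown above.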

Step 3 - Set the hostname in vm.args

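Make sure the hostname is set up correctly in your vm.args. Because the hostname is now a fully qualified domain name, the node needs a long name (-name rather than -sname). Here’s a one-liner to do it using sed: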

sed -r -i "s/\-sname.+/\-name api\@$(hostname)/g" /opt/api/releases/$LATEST_RELEASE/vm.args
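To check the substitution before touching a real release, the same rewrite can be wrapped in a function and run against a scratch copy of vm.args. This is a sketch only: the helper name set_node_name is hypothetical, and unlike the one-liner it assumes the -sname flag starts its own line.

```shell
# Hypothetical helper: replace a "-sname ..." line in a vm.args file with
# "-name <app>@<host>". Assumes the flag sits at the start of a line.
set_node_name() {
  local vm_args="$1"
  local app="$2"
  local host="$3"
  local tmp
  tmp="$(mktemp)"
  # Write to a temp file first, then copy back so the original file's
  # permissions are preserved and no sed -i variant is needed.
  sed -E "s/^-sname.*/-name ${app}@${host}/" "$vm_args" > "$tmp" \
    && cat "$tmp" > "$vm_args"
  rm -f "$tmp"
}
```

A call such as set_node_name /tmp/vm.args api "$(hostname)" leaves every other line of the file untouched.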

Step 4 - Attempt to join the cluster

In your code, add a module that checks whether the .hosts.erlang file exists and, if so, uses it to connect to the nodes listed in it.

defmodule Api.Discovery do
  require Logger

  # Location of the hosts file built in step 2.
  @hosts_path "/opt/api/.hosts.erlang"

  def discover do
    # :net_adm.world/1 pings every host listed in .hosts.erlang. Erlang looks
    # for that file in the current working directory, the user's home
    # directory and the OTP root, so /opt/api must be one of those.
    if File.exists?(@hosts_path) do
      Logger.info("Attempting to join cluster")
      :net_adm.world(:verbose)
    else
      # Logger.warning/1 replaces the deprecated Logger.warn/1 (Elixir 1.11+).
      Logger.warning("Couldn't find #{@hosts_path} file - not joining cluster")
    end
  end
end