Breaking My Dependency Nightmares With Habitat

Overview

In June this year Chef released an awesome new open-source project called Habitat. It does a lot of useful things under the hood, but I’m going to focus on telling you a story about how it helped me, as an application developer, stop worrying about underlying infrastructure dependencies when building and deploying my app. It saved me a heck of a lot of time and frustration, and I got to play with some cool new technology too.

Background

Back in 2006 I was diagnosed with Type 1 Diabetes, and soon afterwards I got to attend a structured education course for people with the condition called Dose Adjustment For Normal Eating (DAFNE for short). As well as learning how to manage my condition better, I got to talk to and share experiences with a group of people who had the same condition. This was fantastic to be a part of, as people really wanted to help and support each other, as happens when you form a community of like-minded individuals. The challenge was keeping this community going after the course ended. As a recent Computer Science graduate, I decided that I’d create a website to act as a ‘social network’ for people who had been on the course, and thus DAFNE Online was born.

I’d dabbled a little with a web application framework called Ruby on Rails before. Knowing how intuitive Ruby was, and that Rails helped you develop things rapidly by valuing convention over configuration with the mantra of Don’t Repeat Yourself (DRY), the choice for my new website was simple. I quickly developed a first version of the website and then needed somewhere to release it so that my group of beta testers could get access and provide some feedback. As I was a developer, not an infrastructure guy (DevOps wasn’t a thing back then), I had to turn to my favorite search engine to help me figure out how to deploy this thing.

Help was duly found in the form of a shared hosting provider who could take my Rails app code and automatically deploy it and stand up a database, also handling that tricky DNS stuff at the same time. This was great! I didn’t have to do the infrastructure stuff at all…

Until… The site got more complex, and I started to depend on more and more RubyGems to support things like pagination, auto complete, etc. With these dependencies came the increased risk of seeing errors like this:

Errno::EACCES: Permission denied - /home/-----/#####/-----/vendor/bundle/ruby/1.9.1/cache/bundler

  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:247:in `mkdir'
  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:247:in `fu_mkdir'
  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:221:in `block (2 levels) in mkdir_p'
  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:219:in `reverse_each'
  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:219:in `block in mkdir_p'
  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:205:in `each'
  /home/-----/.rvm/rubies/ruby-1.9.3-p551/lib/ruby/1.9.1/fileutils.rb:205:in `mkdir_p'

Which started to happen more and more frequently. It was because I was on a shared host and didn’t have sudo access, which is required for some gems that depend on native libraries. Each time I came across this I had to raise a support ticket with the provider, who to their credit were very fast to fix my issues, but there was still some downtime when all I wanted to do was deploy the shiny new features in my app.

So I turned to the Internet again for answers, and found a cheap but well-regarded VPS hosting provider who could supply me with a virtual server with full root access.

Now, I had an empty Ubuntu 10 server which needed some magic to happen to it to make my application run. So the Internet was my saviour once again, providing me with a guide on “How to setup Nginx Unicorn and Mysql for Ruby on Rails on Ubuntu” which I ran through, and sure enough I ended up with a server that had the necessary infrastructure in place for me to migrate my app over.

Let’s Get Cooking

Fast forward a few years to 2016 when I started working for Chef. I’d steadily been adding new features to the website (and fixed a few bugs) and had even created a companion iOS App. The underlying infrastructure hadn’t been changed much and I was still running the same stack on Ubuntu 10, with many security patches waiting to be done.

I figured that Chef would allow me to migrate to a new server and operating system so I started to write some recipes to configure the infrastructure and application in a repeatable manner. Now, Chef allows you to write very clear code like this:

include_recipe 'nginx::default'
template "#{node['dafne_online']['deploy_base']}/shared/config/nginx.conf" do
  source 'nginx_conf.erb'
  owner 'rails'
  group 'admin'
  mode '0644'
  notifies :restart, 'service[nginx]'
end

Which will use the nginx cookbook available on the Chef Supermarket to install Nginx and then populate the configuration file from a template stored in my cookbook. Chef is well-known for this kind of approach to infrastructure automation and does a really good job of it.

Then it came time to write some recipe code to deploy the actual Rails application, and what I came up with looked something like this:

deploy node['dafne_online']['deploy_base'] do
  repository "https://#{node['dafne_online']['git_user']}:#{git_pw}@dafneonline.git.cloudforge.com/dafneonline.git"
  rollback_on_error true
  environment rails_user_env
  restart_command 'sudo -H -u rails bash -c "export PATH=/opt/rubies/2.2.3/bin:$PATH && /etc/init.d/unicorn_dafneonline restart"'
  action :deploy
  symlinks shared_hash
  user node['dafne_online']['deploy_user']
  group 'admin'
  before_symlink do
    execute 'bundle install' do
      command "bundle install --path #{node['dafne_online']['deploy_base']}/shared/bundle --without development test --deployment --quiet"
      user node['dafne_online']['deploy_user']
      environment rails_user_env
      cwd release_path
      action :run
    end
  end

  before_restart do
    link '/etc/init.d/unicorn_dafneonline' do
      to '/home/rails/apps/dafneonline/current/config/unicorn_init.sh'
    end

    template "#{node['dafne_online']['deploy_base']}/current/config/unicorn.rb" do
      source 'unicorn.rb.erb'
      owner node['dafne_online']['deploy_user']
      group 'admin'
      mode 00744
    end

    execute 'compile assets' do
      command "RAILS_ENV=#{node['dafne_online']['rails_env']} bundle exec rake assets:precompile"
      user node['dafne_online']['deploy_user']
      environment rails_user_env
      cwd release_path
      action :run
    end
  end

  after_restart do
    execute 'notify Airbrake of deployment' do
      command "RAILS_ENV=#{node['dafne_online']['rails_env']} bundle exec rake airbrake:deploy TO=#{node['dafne_online']['rails_env']} REVISION= REPO=https://dafneonline.git.cloudforge.com/dafneonline.git USER=#{node['dafne_online']['deploy_user']}"
      environment rails_user_env
      user node['dafne_online']['deploy_user']
      cwd release_path
      action :run
    end
  end
end

Which is not so pretty or easy to understand. A further challenge was that I tried to run this on a brand new CentOS 7 machine but ran into issues where dependencies of the RubyGems were missing from the base OS build, so when I came to run the app it failed due to some missing library which had been there on my original Ubuntu 10 machine (I’m looking at you, libxml2). So my cookbook grew as I found these dependencies through trial and error, but I eventually got it to work on both CentOS and Ubuntu.

Enter… Habitat!

Just as I finished up on the cookbook writing, this brand new open source project was unveiled called Habitat. It promised so much including making it easy to run apps on any Linux environment, intelligent choreography, topology management, service discovery and yes, dependency management. I knew it was the right choice, so I set about writing my first plan.

A Habitat plan is, at its core, a structured shell script. It contains a lot of useful metadata, such as who created the plan, where the application binaries are stored, checksums and all sorts of other information. This helps people understand whether the plan is right for them to consume, and it helps Habitat build the plan into a runnable Habitat artifact.

Of particular interest to my use case was the ability to specify all of the build and runtime dependencies up front – thus I could guarantee that when this plan was eventually packaged with Habitat, it would contain all that was needed to run the artifact anywhere without any external dependency management. Furthermore, specifying these dependencies was easy:

# Specify package runtime dependencies
pkg_deps=(
  core/bundler
  core/glibc
  core/libxml2
  core/libxslt
  core/libyaml
  core/node
  core/postgresql
  core/ruby
  core/zlib
)

# Specify package build time dependencies
pkg_build_deps=(
  core/coreutils
  core/gcc
  core/make
)

When I build this plan, Habitat knows to download the dependencies from its public repository so that they will definitely be present when the artifact is run.
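
Since a plan is ordinary shell, the dependency lists above are plain bash arrays of "origin/name" identifiers. The following is only a small sketch of that structure, using a shortened list; split_ident is a hypothetical helper introduced here for illustration and is not part of Habitat.

```shell
#!/usr/bin/env bash
# A shortened copy of the runtime dependency list, for illustration only.
pkg_deps=(core/bundler core/libxml2 core/ruby)

# split_ident is a hypothetical helper (not part of Habitat): it splits
# an "origin/name" identifier into its two parts.
split_ident() {
  local ident="$1"
  local origin="${ident%%/*}"   # text before the first slash
  local name="${ident#*/}"      # text after the first slash
  printf '%s %s\n' "$origin" "$name"
}

# Print one "origin name" pair per dependency.
for dep in "${pkg_deps[@]}"; do
  split_ident "$dep"
done
```

Every entry in my plan uses the core origin, which is where the packages maintained by the Habitat team live.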

The next important part of the plan is a set of callbacks which define the automation for building and installing the app. If they aren’t overridden they perform a sensible set of defaults, such as running make for the installation process. However, I added some overrides to perform a few Rails-specific tasks:

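do_build() {
  # Paths to the dependencies referenced below. These lookups are not part
  # of the original excerpt; resolving them with pkg_path_for is an assumption.
  local _bundler_dir=$(pkg_path_for bundler)
  local _libxml2_dir=$(pkg_path_for libxml2)
  local _libxslt_dir=$(pkg_path_for libxslt)
  local _zlib_dir=$(pkg_path_for zlib)
  local _pgconfig=$(pkg_path_for postgresql)/bin/pg_config

  export GEM_HOME=${pkg_path}/vendor/bundle
  export GEM_PATH=${_bundler_dir}:${GEM_HOME}

  # don't let bundler split up the nokogiri config string (it breaks
  # the build), so specify it as an env var instead
  export NOKOGIRI_CONFIG="--use-system-libraries --with-zlib-dir=${_zlib_dir} --with-xslt-dir=${_libxslt_dir} --with-xml2-include=${_libxml2_dir}/include/libxml2 --with-xml2-lib=${_libxml2_dir}/lib"
  bundle config build.nokogiri '${NOKOGIRI_CONFIG}'
  bundle config build.pg --with-pg-config=${_pgconfig}

  # install the gems, then precompile the static assets
  bundle install --jobs 2 --retry 5 --path vendor/bundle --binstubs

  bundle exec bin/rake assets:precompile
}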

This will perform a couple of Bundler tasks to download and install the gem dependencies of my app and then precompile the static JavaScript and CSS assets into minified versions ready for production.
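
One detail in do_build that is easy to miss is the single-quoted '${NOKOGIRI_CONFIG}'. Going by the comment in the plan, the quoting looks deliberate: the literal text is stored, and the variable is only expanded later by whichever shell ends up evaluating it. The sketch below shows just that shell behaviour, with a made-up value. How Bundler hands the stored text on is not shown and is an assumption on my part.

```shell
#!/usr/bin/env bash
# Illustrative value only; the real plan builds this from dependency paths.
export NOKOGIRI_CONFIG="--use-system-libraries --with-zlib-dir=/tmp/zlib"

# Single quotes: the text ${NOKOGIRI_CONFIG} is stored literally,
# with no expansion at this point.
stored='${NOKOGIRI_CONFIG}'

# A later shell that receives the stored text expands it from the
# exported environment.
expanded="$(sh -c "echo $stored")"

echo "stored:   $stored"
echo "expanded: $expanded"
```

Had the plan used double quotes instead, the long option string would have been expanded on the spot, which is what the comment in the plan is trying to avoid.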

So the Habitat plan defines all of the automation needed to download, build and install the app and to ensure all of the build and runtime dependencies are in place. Now I don’t have to worry about the OS-level dependencies that my application has, because built Habitat packages will run on any Linux-based operating system (or container).
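
The callback overrides mentioned earlier rely on nothing more exotic than shell function semantics: when a function is defined twice, the later definition wins. Here is a toy sketch of the idea. The "default" body is a stand-in written for this example, not Habitat's actual default implementation.

```shell
#!/usr/bin/env bash
# Stand-in default, loosely modelled on the "run make" default described
# in the text. Habitat's real default callback is not reproduced here.
do_build() {
  echo "default build: make"
}

# A plan is just shell, so redefining the function replaces the default.
# In a real build this definition would live in the plan file.
do_build() {
  echo "custom build: bundle install, then assets:precompile"
}

# Calls the most recent definition.
do_build
```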

But Wait… There’s More!

It turns out that Habitat is so much more than just a dependency packager. It allows you to override the configuration when you start a package and at any point during the lifecycle of that package. Most applications have some sort of configuration file alongside them and my Rails app is no different. There’s a database.yml file which defines the configuration info my app needs to connect to its backend database, which in my case is now PostgreSQL:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5

production:
  <<: *default
  database: {{cfg.database_name}}
  username: {{cfg.database_username}}
  password: {{cfg.database_password}}
{{~#if bind.has_database}}
{{~#each bind.database.members}}
  host: {{ip}}
  port: {{port}}
{{~/each}}
{{~else}}
  host: {{cfg.database_host}}
  port: {{cfg.database_port}}
{{~/if}}

As you can see, this template contains some key-value pairs along with placeholders written in moustache-style (Handlebars) syntax. When my built package is run, Habitat populates the configuration file with real values. By default, these are read from the default.toml file which is stored alongside the plan:

rails_binding_ip = "0.0.0.0"
rails_port = 3000

database_name = "dafne-online-production"
database_username = "dafne-online"
database_password = ""
database_host = "localhost"
database_port = 5432
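
To make the substitution concrete, here is a minimal sketch that fills in two of the placeholders with the defaults above. It uses sed as a stand-in and only handles plain {{cfg.KEY}} placeholders. The render function is hypothetical, and Habitat's own template engine (which also understands the #if and #each blocks) works differently under the hood.

```shell
#!/usr/bin/env bash
# render is a hypothetical stand-in for Habitat's template engine.
# It replaces one {{cfg.KEY}} placeholder with a value read from stdin.
# Values containing "|" or "&" would need escaping; this sketch skips that.
render() {
  local key="$1" value="$2"
  sed "s|{{cfg\.${key}}}|${value}|g"
}

template='database: {{cfg.database_name}}
username: {{cfg.database_username}}'

# Apply the defaults from default.toml one key at a time.
printf '%s\n' "$template" \
  | render database_name "dafne-online-production" \
  | render database_username "dafne-online"
```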

However, these defaults can be overridden when you start up the package, either directly on the command line or by expanding the contents of a .toml file into an environment variable named after your package (HAB_DAFNE_ONLINE in this example). This is a great feature which means you don’t have to rebuild or even stop your running package when you want to make configuration changes.
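
As a rough sketch of the environment-variable route, the snippet below writes a small override file, derives the variable name from the package name and exports the file contents under that name. The hab_env_name helper, the host name and the port are made up for this example. The final hab command is left as a comment because it needs a Habitat installation.

```shell
#!/usr/bin/env bash
# hab_env_name is a hypothetical helper: it upper-cases a package name and
# swaps dashes for underscores, matching the HAB_DAFNE_ONLINE example.
hab_env_name() {
  printf 'HAB_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Write an override file (the values are illustrative, not from the real site).
override_file="$(mktemp)"
cat > "$override_file" <<'EOF'
database_host = "db.example.internal"
database_port = 5433
EOF

# Expand the file contents into the package's environment variable.
var_name="$(hab_env_name dafne-online)"
export "$var_name=$(cat "$override_file")"
rm -f "$override_file"

echo "$var_name is set to:"
printenv "$var_name"

# The package would then be started from this same environment, e.g.:
# hab start simfish85/dafne-online
```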

Hooks Into Runtime

The final part of my plan comes in the form of defining the lifecycle of the application in terms of how to initialize and get it into a running state. Habitat allows you to do this by providing shell scripts which define lifecycle hooks.  My init hook makes sure the app is ready to run:

echo "Linking database.yml"
ln -sf {{pkg.svc_config_path}}/database.yml \
  ${DAFNE_ONLINE_DATA}/dist/config/database.yml

if [[ ! -f ${DAFNE_ONLINE_DATA}/.migrations_complete ]]; then
  echo "Running 'rake db:create' in ${PWD}"
  bundle exec bin/rake db:create
  echo "Running 'rake db:migrate' in ${PWD}"
  bundle exec bin/rake db:migrate
  echo "Running 'rake db:seed' in ${PWD}"
  bundle exec bin/rake db:seed && touch $DAFNE_ONLINE_DATA/.migrations_complete
fi

As you can see, it starts by symlinking the populated database.yml file into the location my Rails app expects it to be in. Once that is in place, it runs some rake tasks to ensure the database is created and populated with the correct schema and data for the app to run correctly. My run hook is a lot simpler, and just starts the Rails app using the Unicorn web server (which is bundled as a RubyGem):

exec chpst -u hab unicorn -p {{cfg.rails_port}}

Again, see that the default or overridden config values (seen in default.toml) are available for use in my hooks. The plan allows me to define the build process for the app, and the hooks define the automation needed to get it running and govern its lifecycle. Together they give me a simple way to define how I go from Rails application source to a built and runnable package which can be deployed anywhere.
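
The init hook’s .migrations_complete check is a handy pattern in its own right: do some one-off setup, then drop a marker file so later restarts skip it. Here is a minimal, generic sketch of that guard. The run_once helper is hypothetical and the echo command stands in for the real rake tasks.

```shell
#!/usr/bin/env bash
# run_once is a hypothetical helper mirroring the init hook's guard:
# run a command only if the marker file is absent, and create the marker
# only when the command succeeds.
run_once() {
  local marker="$1"; shift
  if [[ ! -f "$marker" ]]; then
    "$@" && touch "$marker"
  fi
}

data_dir="$(mktemp -d)"
marker="$data_dir/.migrations_complete"

run_once "$marker" echo "running migrations"   # runs the command
run_once "$marker" echo "running migrations"   # skipped: marker exists

rm -rf "$data_dir"
```

Because the marker is only written after a successful run, a failed first attempt will be retried the next time the hook runs.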

Runtime Goodness

After entering the Habitat studio (bundled with the hab binary) and building the package (the simplest of commands: build) I have a package that I can run. However, to run this application for real I need a database. Thankfully those helpful folks at Habitat HQ have already published a package for postgresql, so all I need to do is this:

hab start core/postgresql

Which will download the postgresql Habitat package from the public-facing Habitat depot, including all of its runtime dependencies, and then start a supervisor which controls the lifecycle using the hooks that are part of that package. So with that one command I am presented with a running postgresql instance. Awesome!

Now I need to connect it up to the package I have just built in the Habitat Studio:

hab start simfish85/dafne-online --peer 172.17.0.3 --bind database:postgresql.default

Which starts up the Rails application, running the init and run hooks shown earlier to get the application to a ready state. The command has a few more components, so let’s break them down further:

hab start simfish85/dafne-online

As I’m in the Habitat studio, this will look locally for my package and run it by first starting the Habitat Supervisor which in turn will then run the lifecycle hooks which are packaged with the app.

--peer 172.17.0.3

Here I’m supplying the IP address of the already-running Habitat supervisor for the postgresql package I started earlier. This means that when the supervisor for the Rails application package starts up, the two will form a topology with one another and start to share what they learn about themselves and other running Habitat packages. This is called the gossip ring.

--bind database:postgresql.default

For me this is the best part: I don’t need to supply any IP addresses or other connectivity information for my postgresql database. All I need to do is tell my application to find the postgresql.default service group within the gossip ring, and Habitat’s built-in service discovery exchanges the relevant information so that the application package can use the running postgresql database automatically.
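
The value passed to --bind packs three pieces of information into one string: the bind name that the database.yml template refers to (database), the service to look for (postgresql) and its group (default). The sketch below pulls those apart with plain bash parameter expansion. The parse_bind function is a hypothetical helper for illustration, not something the hab CLI provides.

```shell
#!/usr/bin/env bash
# parse_bind is a hypothetical helper (not part of the hab CLI): it splits
# a "name:service.group" bind string into its three parts.
parse_bind() {
  local spec="$1"
  local bind_name="${spec%%:*}"     # text before the colon
  local target="${spec#*:}"         # text after the colon
  local service="${target%%.*}"     # text before the first dot
  local group="${target#*.}"        # text after the first dot
  printf 'bind=%s service=%s group=%s\n' "$bind_name" "$service" "$group"
}

parse_bind "database:postgresql.default"
```

The bind name is what ties the command line back to the bind.database.members block in the template shown earlier.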

This one command let me run the application package I built, form a topology with the running postgresql instance and then automatically populate my Rails database config with the correct information to connect to the PostgreSQL instance, giving me a completely running stack ready to be consumed by my users. Of course, I could rerun the above command on multiple nodes to add more application instances to the topology. Pretty awesome!

Conclusion

Over the years of maintaining the DAFNE Online site I’ve always wanted to focus on what provides value: developing new features for the people who use and rely on the site. When I deployed on a shared hosting provider I had the freedom to focus on exactly that, but there were challenges with this approach, and I quickly had to become an ‘operations guy’ who could stand up and maintain the infrastructure to support the app.

Then Habitat came along, which freed me from the worry of dependency management and gave me a simple but powerful framework for defining all of the automation needed to take my application from source code to a fully built and runnable package which can be deployed anywhere.

Try it Yourself

There are some great tutorials and documents on the Habitat site: https://www.habitat.sh/try/

Get the Habitat plan source for DAFNE Online here: https://github.com/simfish85/dafne-online-habitat

Simon Fisher

Simon is a Solutions Architect at Chef. He has spent the past few years helping customers solve their Continuous Delivery problems with a variety of products and methods. In his spare time he also develops and maintains DAFNE Online, a site for people with Type 1 Diabetes in the UK. He also enjoys watching rugby, attempting to learn the piano, and learning how to be a new dad.