Habitat For Packaging Python Flask Web Services

This is a guest post by our friend Tom McLaughlin, Engineering Advocate at Threat Stack. It was first published on the Threat Stack blog on February 22, 2017.

One of the challenges of building open source tools is figuring out how to package and distribute them. This is particularly true with web services. To make building, deploying, and running web services easier, Chef created Habitat.

When building open source web services for Threat Stack, one of our concerns is how to package these Python Flask applications so they run in the widest array of environments with low adoption friction. Using Habitat, the process is quick and easy.

For this post, we’re going to focus on the specifics of packaging a Python Flask application and the particular needs of that stack.

Note: If you are not familiar with Habitat, take a look at Chef’s own series on why they created it.

The Habitat tutorial also explains how to install Habitat and then package and deploy a generic application: https://www.habitat.sh/tutorials/

Our Application

The application we are going to package is a simple RESTful web service written in Python with the Flask microframework. To serve the application, you are expected to use the Gunicorn WSGI HTTP server. Without a packaging tool, the deploy process would be something like this:

  • Get the application code to the host.
  • Install Python dependencies.
  • Launch Gunicorn, pointing it to the config file and application.
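
The three steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the source and destination paths are hypothetical, the helper names run and deploy_manually are invented for this example, and the script defaults to a dry run that prints each command instead of executing it.

```shell
#!/bin/sh
# Manual deploy sketch. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_manually() {
    src="$1"    # checkout of the service (hypothetical path)
    dest="$2"   # install directory on the host (hypothetical path)

    # 1. Get the application code to the host.
    run mkdir -p "$dest"
    run cp -r "$src/." "$dest/"

    # 2. Install Python dependencies into a virtual environment.
    run virtualenv "$dest/venv"
    run "$dest/venv/bin/pip" install -r "$dest/requirements.txt"

    # 3. Launch Gunicorn, pointing it to the config file and application.
    run "$dest/venv/bin/gunicorn" --chdir "$dest" -c "$dest/gunicorn.conf.py" threatstack-to-s3
}

deploy_manually ./threatstack-to-s3 /opt/threatstack-to-s3
```

Every host that runs the service needs these steps, plus a working Python and virtualenv, which is the friction the rest of this post removes.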

Depending on your familiarity with the Python stack and your existing tool pipeline, this could be quick and easy, or it could take you several hours to figure out. Since getting this service deployed and usable as quickly as possible is really important to gaining user adoption, streamlining the process is very important.

Habitat Configuration

Using Habitat starts with writing three files: plan.sh, an init hook, and a run hook. In most Habitat examples, these files are located in the repository root, but in ours they are under the build/ directory to keep them separate from the application code:

$ tree threatstack-to-s3/build/
threatstack-to-s3/build/
├── hooks
│ ├── init
│ └── run
└── plan.sh
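
If you are starting from an empty repository, the layout above could be created with a few lines of shell. This is a minimal sketch: the function name scaffold_habitat_dir is made up for this example, and it is demonstrated against a throwaway temporary directory rather than a real checkout.

```shell
#!/bin/sh
# Create the build/ layout (plan.sh plus hooks/init and hooks/run) in a repository.
scaffold_habitat_dir() {
    repo="$1"
    mkdir -p "$repo/build/hooks"
    # Create empty files only where nothing exists yet, so reruns are harmless.
    for f in plan.sh hooks/init hooks/run; do
        [ -e "$repo/build/$f" ] || : > "$repo/build/$f"
    done
}

# Demonstrate against a temporary directory (a stand-in for your checkout).
demo_repo="$(mktemp -d)/threatstack-to-s3"
scaffold_habitat_dir "$demo_repo"
find "$demo_repo/build" | sort
```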

plan.sh

The plan.sh file is a bash shell script used by Habitat to drive package building. The first part of the file is the basic configuration for packaging the service:

pkg_name=threatstack-to-s3
pkg_description='Archive alerts from Threat Stack to S3'
pkg_version=0.1.0
pkg_origin=tmclaugh
pkg_maintainer='Tom McLaughlin'
pkg_license=('MIT')
# we copy in the source code in the `unpack` phase and need to put
# something here due to https://github.com/habitat-sh/habitat/issues/870
pkg_source="fake"
pkg_build_deps=(core/virtualenv)
pkg_deps=(core/coreutils core/python2)
pkg_exports=([http]=8080)
pkg_exposes=(http)

The first six lines are basic packaging information that describes the service. The pkg_source variable typically points to a remote source archive. Here it is set to fake because you are building from the existing source tree.

The pkg_build_deps and pkg_deps variables are where you define arrays of the Habitat packages needed for the application to build and run. Packages listed in pkg_build_deps are used only at build time and are not pulled in as runtime dependencies of your package. If you are doing anything more complicated than Hello World in Python, you will need third-party Python modules. You will need to create a Python virtual environment and install modules into it for them to be included in your Habitat package. You don’t have to include the core/virtualenv package in your runtime, however. For runtime, you just need core/coreutils for things like file copy and linking operations, and core/python2 to provide runtime libraries.

The pkg_exports and pkg_exposes variables define the ports that should be exposed by Habitat. In this case, only TCP port 8080 is exposed. This is the port that is defined in hooks/run for the gunicorn process to bind to. These need to match.
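
Because the port appears in two files, a small check can catch a mismatch before you build. The sketch below is an assumption-laden illustration, not part of Habitat: the function names are invented, and the sed patterns only understand the exact line shapes used in this post (a single pkg_exports entry and a gunicorn --bind HOST:PORT argument). It is demonstrated on temporary stand-in files.

```shell
#!/bin/sh
# Extract the port from a line such as: pkg_exports=([http]=8080)
plan_port() {
    sed -n 's/^pkg_exports=.*=\([0-9][0-9]*\)).*$/\1/p' "$1"
}

# Extract the port from a gunicorn "--bind HOST:PORT" argument.
hook_port() {
    sed -n 's/.*--bind [^ :]*:\([0-9][0-9]*\).*/\1/p' "$1"
}

# Compare the two; a non-zero exit status signals a mismatch or a missing value.
check_ports() {
    p="$(plan_port "$1")"
    h="$(hook_port "$2")"
    if [ -n "$p" ] && [ "$p" = "$h" ]; then
        echo "ports match: $p"
    else
        echo "port mismatch: plan.sh=$p hooks/run=$h"
        return 1
    fi
}

# Demo files that mirror the relevant lines of plan.sh and hooks/run.
demo="$(mktemp -d)"
printf '%s\n' 'pkg_exports=([http]=8080)' 'pkg_exposes=(http)' > "$demo/plan.sh"
printf '%s\n' '#!/bin/sh -xe' \
    'gunicorn -c gunicorn.conf.py --bind 0.0.0.0:8080 threatstack-to-s3' > "$demo/run"
check_ports "$demo/plan.sh" "$demo/run"
```

In the repository layout used here, the call would be check_ports build/plan.sh build/hooks/run.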

You don’t set it in the package you are creating, but there is an optional pkg_svc_user. If you have a user account that all your services run as, set it here to that user. Otherwise the value defaults to hab. This user must exist in order to start a service in a Habitat package.
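
On a host where the hab user does not exist yet, it has to be created before the service will start. A minimal sketch follows, assuming a Linux host with groupadd and useradd available. The helper names run and ensure_svc_user are invented for this example, and the script defaults to a dry run that prints the commands rather than executing them.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ensure_svc_user [NAME]: create a same-named group and user unless the user exists.
ensure_svc_user() {
    user="${1:-hab}"
    if id "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
        return 0
    fi
    run groupadd "$user"
    run useradd -g "$user" "$user"
}

ensure_svc_user hab
```

Run it with DRY_RUN=0 and root privileges to actually create the account. If you set pkg_svc_user to something else, pass that name instead of hab.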

With the package variables defined, next come the plan’s callback functions. These are executed by Habitat during the packaging process:

# Need to opt-out of all of these steps, as we're copying in source code
do_download() {
    return 0
}
do_verify() {
    return 0
}
do_clean() {
    return 0
}

do_unpack() {
    # Because our Habitat files live under build/.
    PROJECT_ROOT="${PLAN_CONTEXT}/.."

    mkdir -p $pkg_prefix
    build_line "Copying project data to $pkg_prefix/"
    cp -r $PROJECT_ROOT/app $pkg_prefix/
    cp -r $PROJECT_ROOT/*.py $pkg_prefix/
    cp -r $PROJECT_ROOT/requirements.txt $pkg_prefix/
}

do_build() {
    return 0
}

do_install() {
    cd $pkg_prefix
    virtualenv venv
    source venv/bin/activate
    pip install -r requirements.txt
}

Since you are not downloading any source, do_download(), do_verify(), and do_clean() do not need to do anything. The do_unpack() callback is where you copy the files for your service into the $pkg_prefix. If you were downloading a remote source archive, you would be decompressing it in this step. Python is not a compiled language, so leave do_build() empty. Finally, in do_install(), you change to the packaging directory you copied files to in do_unpack() and install Python modules via pip using the requirements.txt. (You may be savvy and catch a bug here. We’ll get to that.)
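
To see what do_unpack() does without running a full Habitat build, you can exercise the same copy logic against a scratch directory. This is only a sketch: PLAN_CONTEXT and pkg_prefix are normally set by Habitat and are set by hand here, build_line is replaced with a simple stand-in, the file names in the fake project are placeholders, and the variables are quoted for safety.

```shell
#!/bin/sh
# Stand-in for Habitat's build_line helper, which prints build progress.
build_line() {
    echo "   plan: $*"
}

# Same copy logic as the plan's do_unpack(), with quoting added.
do_unpack() {
    PROJECT_ROOT="${PLAN_CONTEXT}/.."
    mkdir -p "$pkg_prefix"
    build_line "Copying project data to $pkg_prefix/"
    cp -r "$PROJECT_ROOT/app" "$pkg_prefix/"
    cp "$PROJECT_ROOT"/*.py "$pkg_prefix/"
    cp "$PROJECT_ROOT/requirements.txt" "$pkg_prefix/"
}

# Build a fake project tree: repo/build holds the plan, repo/ holds the code.
work="$(mktemp -d)"
mkdir -p "$work/repo/build" "$work/repo/app"
: > "$work/repo/app/__init__.py"
: > "$work/repo/config.py"
: > "$work/repo/requirements.txt"

# Values Habitat would normally provide during a build.
PLAN_CONTEXT="$work/repo/build"
pkg_prefix="$work/pkg"

do_unpack
ls "$pkg_prefix"
```

Only the files matched by the three cp commands reach the package, so comparing this listing with what the hooks expect is a quick sanity check.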

hooks/init & hooks/run

The hooks/init file initializes the application. For the application you are creating, the hook symlinks the package contents into the var/ directory of the Habitat application root. With more complicated applications, you might do something like initialize a database schema. Notice the first two lines: they are very important for making it easier to debug package errors. The shebang line passes -xe to the shell, which enables script tracing and causes the script to exit immediately on error. The second line redirects stderr to stdout for all commands in the script. If you don’t do these two things, you may be left scratching your head trying to understand why your service is failing to start:

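#!/bin/sh -xe
exec 2>&1
ln -fs {{pkg.path}}/app {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/venv {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/gunicorn.conf.py {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/config.py {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/logging.conf {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/threatstack-to-s3.py {{pkg.svc_var_path}}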
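
The effect of the two lines at the top of the hook is easy to see in isolation. The sketch below runs a tiny stand-in script (not the real hook) with the same -xe flags and the same exec 2>&1 redirect, then shows what ends up on stdout and what exit status comes back.

```shell
#!/bin/sh
# Run a stand-in hook the way the real hooks start:
#   -x traces each command, -e aborts on the first failing command,
#   and `exec 2>&1` sends the trace (normally on stderr) to stdout.
# The outer 2>/dev/null proves the trace is captured via stdout only.
status=0
output="$(sh -xe -c 'exec 2>&1; echo "linking files"; false; echo "never reached"' 2>/dev/null)" || status=$?

echo "captured stdout:"
echo "$output"
echo "exit status: $status"
```

The captured output contains the trace lines for the commands that ran, stops at the failing command, and the non-zero status tells the supervisor that initialization failed.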

The hooks/run file is what is executed to start the service. The script changes directory to where you symlinked files, activates the virtual environment so that gunicorn and the installed modules are on the PATH, and starts the gunicorn process. Importantly, you tell gunicorn to bind to the port you exposed in the plan.sh:

#!/bin/sh -xe
exec 2>&1
cd {{pkg.svc_var_path}}
source venv/bin/activate
gunicorn -c gunicorn.conf.py --bind 0.0.0.0:8080 threatstack-to-s3

Building & Deploying

With your Habitat configuration in place, you can build a package and start the service. From the root of the threatstack-to-s3 repository, run:

$ hab pkg build build/

All your Habitat files are stored under the build/ directory, which is why that is supplied at the end of the command. With the package built, you can start the service as shown below. Remember that the pkg_svc_user, which defaults to hab, must be available:

$ sudo ./hab start tmclaugh-threatstack-to-s3-0.1.0-20170220224242-x86_64-linux.hart
hab-sup(MN): Starting tmclaugh/threatstack-to-s3/0.1.0/20170220224242
hab-sup(CS): tmclaugh/threatstack-to-s3/0.1.0/20170220224242 is not installed
→ Using core/acl/2.2.52/20161208223311
→ Using core/attr/2.4.47/20161208223238
→ Using core/bzip2/1.0.6/20161208225359
→ Using core/cacerts/2016.09.14/20161031044952
→ Using core/coreutils/8.25/20161208223423
→ Using core/gcc-libs/5.2.0/20161208223920
→ Using core/glibc/2.22/20160612063629
→ Using core/gmp/6.1.0/20161208212521
→ Using core/libcap/2.24/20161208223353
→ Using core/linux-headers/4.3/20160612063537
→ Using core/make/4.2.1/20161214000256
→ Using core/ncurses/6.0/20161213233720
→ Using core/openssl/1.0.2j/20161214012334
→ Using core/python2/2.7.12/20161214012727
→ Using core/readline/6.3.8/20161213234107
→ Using core/sqlite/3130000/20161214012650
→ Using core/zlib/1.2.8/20161118033245
✓ Installed tmclaugh/threatstack-to-s3/0.1.0/20170220224242
★ Install of tmclaugh/threatstack-to-s3/0.1.0/20170220224242 complete with 1 new packages installed.
hab-sup(MR): Butterfly Member ID 2608f6668ce8417e96b1b068db8cb146
hab-sup(MR): Starting butterfly on 0.0.0.0:9638
hab-sup(MR): Starting http-gateway on 0.0.0.0:9631
threatstack-to-s3.default(SR): Initializing
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/app /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/gunicorn.conf.py /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/config.py /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/logging.conf /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/threatstack-to-s3.py /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + exec
threatstack-to-s3.default(SV): Starting process as user=hab, group=hab
threatstack-to-s3.default(O): + cd /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default(O): + source venv/bin/activate
threatstack-to-s3.default(O): ++ deactivate nondestructive
threatstack-to-s3.default(O): ++ unset -f pydoc
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ '[' -n /bin/sh ']'
threatstack-to-s3.default(O): ++ hash -r
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ unset VIRTUAL_ENV
threatstack-to-s3.default(O): ++ '[' '!' nondestructive = nondestructive ']'
threatstack-to-s3.default(O): ++ VIRTUAL_ENV=/hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv
threatstack-to-s3.default(O): ++ export VIRTUAL_ENV
threatstack-to-s3.default(O): ++ _OLD_VIRTUAL_PATH=/hab/pkgs/core/coreutils/8.25/20161208223423/bin:/hab/pkgs/core/python2/2.7.12/20161214012727/bin:/hab/pkgs/core/acl/2.2.52/20161208223311/bin:/hab/pkgs/core/attr/2.4.47/20161208223238/bin:/hab/pkgs/core/bzip2/1.0.6/2016$
208225359/bin:/hab/pkgs/core/glibc/2.22/20160612063629/bin:/hab/pkgs/core/libcap/2.24/20161208223353/bin:/hab/pkgs/core/make/4.2.1/20161214000256/bin:/hab/pkgs/core/ncurses/6.0/20161213233720/bin:/hab/pkgs/core/openssl/1.0.2j/20161214012334/bin:/hab/pkgs/core/sqlite/3130$
00/20161214012650/bin:/hab/pkgs/core/busybox-static/1.24.2/20161214032531/bin:/sbin:/bin:/usr/sbin:/usr/bin
threatstack-to-s3.default(O): ++ PATH=/hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv/bin:/hab/pkgs/core/coreutils/8.25/20161208223423/bin:/hab/pkgs/core/python2/2.7.12/20161214012727/bin:/hab/pkgs/core/acl/2.2.52/20161208223311/bin:/hab/pkgs/core/attr/2.4$
47/20161208223238/bin:/hab/pkgs/core/bzip2/1.0.6/20161208225359/bin:/hab/pkgs/core/glibc/2.22/20160612063629/bin:/hab/pkgs/core/libcap/2.24/20161208223353/bin:/hab/pkgs/core/make/4.2.1/20161214000256/bin:/hab/pkgs/core/ncurses/6.0/20161213233720/bin:/hab/pkgs/core/openss$
/1.0.2j/20161214012334/bin:/hab/pkgs/core/sqlite/3130000/20161214012650/bin:/hab/pkgs/core/busybox-static/1.24.2/20161214032531/bin:/sbin:/bin:/usr/sbin:/usr/bin
threatstack-to-s3.default(O): ++ export PATH
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ _OLD_VIRTUAL_PS1=
threatstack-to-s3.default(O): ++ '[' x '!=' x ']'
threatstack-to-s3.default(O): +++ basename /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv
threatstack-to-s3.default(O): ++ PS1='(venv) '
threatstack-to-s3.default(O): ++ export PS1
threatstack-to-s3.default(O): ++ alias pydoc
threatstack-to-s3.default(O): ++ '[' -n /bin/sh ']'
threatstack-to-s3.default(O): ++ hash -r
threatstack-to-s3.default(O): + gunicorn -c gunicorn.conf.py --bind 0.0.0.0:8080 threatstack-to-s3
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23685] [INFO] Starting gunicorn 19.6.0
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23685] [INFO] Listening at: http://0.0.0.0:8080 (23685)
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23685] [INFO] Using worker: gevent
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23690] [INFO] Booting worker with pid: 23690
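
Once the log shows gunicorn listening on port 8080, a quick check that the service answers can be scripted. This is a minimal sketch: wait_for and check_service are names invented for this example, and the URL path / is an assumption, since the service's actual endpoints are not covered in this post. curl is used without -f, so any HTTP response (even an error status) counts as "the service is up".

```shell
#!/bin/sh
# wait_for TRIES CMD...: run CMD until it succeeds, pausing 1s between attempts.
wait_for() {
    tries="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$tries" ]; then
            sleep 1
        fi
    done
    return 1
}

# check_service [TRIES]: succeed once something answers HTTP on the bound port.
# The path "/" is an assumption; substitute an endpoint your service exposes.
check_service() {
    wait_for "${1:-10}" curl -sS -o /dev/null "http://localhost:8080/"
}
```

For example, check_service 30 && echo "service is up" gives the workers up to roughly 30 seconds to boot.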

Exporting Other Package Formats

Habitat’s native package format is great, but may not suit your needs. For instance, it requires that you have Habitat pre-installed on the host that the package is going to run on. It also requires that its dependencies be downloaded when the service first starts. You may prefer to export a tarball that contains the application, all its dependencies, and the Habitat binary for running the application. Or, you might want to use your existing Docker infrastructure.

To export a tarball, run the following command:

$ hab pkg export tar tmclaugh/threatstack-to-s3

To run the application, transfer the tarball to the host that the application will run on, and do the following:

$ sudo tar zxvf tmclaugh-threatstack-to-s3-0.1.0-20170221015341.tar.gz -C /
$ sudo /hab/bin/hab start tmclaugh/threatstack-to-s3

If you are using Docker instead, export a Docker image using the following command and deploy it as you would any other container in your infrastructure:

$ hab pkg export docker tmclaugh/threatstack-to-s3
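
Starting the exported image might then look like the sketch below. It assumes the export tags the image with the package's origin and name (tmclaugh/threatstack-to-s3) and that you want to publish the same port 8080 on the host. Both are assumptions to verify against your own export output, for example with docker images. The helper names are invented for this example, and it defaults to a dry run that only prints the command.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) the docker command is only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# run_exported_image [IMAGE] [PORT]: start the container detached and
# publish the service port on the same host port.
run_exported_image() {
    image="${1:-tmclaugh/threatstack-to-s3}"   # assumed image tag
    port="${2:-8080}"
    run docker run -d -p "$port:$port" "$image"
}

run_exported_image
```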

Final Words. . .

So there you have it — a streamlined, effective, and fast way of packaging a Python Flask web service using Chef Habitat. Using Habitat is an excellent way of getting your service deployed and usable as quickly as possible, thereby boosting the likelihood of user adoption.

To see the final code, visit our repository: https://github.com/threatstack/threatstack-to-s3/tree/phase_3_habitat

Also, a big thanks goes out to Mike Fielder whose blog post, GitHub repo, and time on Slack got me started.

Additionally, the Habitat community Slack helped me with some remaining questions.

Tom McLaughlin

As the Engineering Advocate at Threat Stack, Tom uses his experience in cloud infrastructure / security to solve problems and provide great insight into solutions. He loves finding new and interesting ways of safely and securely automating infrastructure. When not at work he is a proud cat dad to two calicoes and enjoys spending his time drag racing and sailing. He is also an amateur thinkfluencer on Twitter at @tmclaughbos.