Learn You Rake

Who needs a Makefile anyway?

Rake is a task runner and, in my opinion, a worthy replacement for Make. It is written in Ruby and, as a result, has the immense benefit of being concise, easy on the eyes, and fun to write.

Task runners, such as Rake or Make, provide many advantages over manual task management. You can program complex rules or conditional triggers to automate your tasks. They also come with a variety of built-in functions to help you write DRY code and reduce bugs. Finally, they help you write consistent and reproducible code that runs on different systems.

For example, Rake can help you organize and manage analyses in a bioinformatics project. With a few simple commands, you can rerun all of your analyses in the desired order without worrying about mistyping file paths or forgetting important flags.

“Very cool,” you may say, “but how do I use it?” Fair enough. In this post, I will cover the elements of Rake that I find useful on a daily basis and that cover almost all of my use cases.

So, in the words of Gandalf, let’s:

Fly, You Fools!

Usage

macOS users should have rake preinstalled. If not, use the following command to install rake:

gem install rake

Once installed, using rake is as simple as the following:

$ cd my_dir
$ tree -L 1
.
├── Rakefile
├── file1.py
├── file2.py
├── data/
└── src/

$ rake
(in /home/user/my_dir)
...running...

When invoked on the command line without any options, rake searches the current directory for a file named Rakefile (Rake's equivalent of a Makefile) and executes the default task within that file. We will see later how we can adjust which task gets executed and how.

Write you a `Rakefile`

First of all, there is no special format for a Rakefile. A Rakefile contains executable Ruby code. Anything legal in a Ruby script is allowed in a Rakefile. However, there are conventions that we must follow.¹

For a crash course in Ruby, consider the Learn X in Y Minutes tutorial for Ruby.

Hello Rake

One of the primary building blocks of a Rakefile is a task. A task is an action you wish to perform, which consumes an input and produces an output. Naturally, if you make the input of one task depend on the output of another, you create a dependency between the tasks and, with it, what is called a pipeline or workflow. More on that later. For now, let us write a simple task that prints a greeting to the terminal.

# Rakefile

task :default => :hello_rake

# cool ruby block syntax
task :hello_rake do
  # any valid Ruby code goes here
  puts "Hello, Rake!"
end

Save the snippet in a Rakefile inside a directory and run rake.

$ rake 
Hello, Rake!

By default, Rake builds the :default task in the Rakefile. If there is no :default task, Rake aborts with an error saying it does not know how to build it. In the case above, you could also get the same output by executing rake hello_rake.

So, to summarize, here’s what we did so far:

We wrote a hello_rake task that prints “Hello, Rake!” to the screen.
Because it is not the default task, we declared it as a prerequisite of the default task in the first line using the => syntax.

What is a Rake Task?

Based on what we see above, here’s what an empty or pseudo-code style task looks like:

task <task_name> => <pre-requisites> do |t|
  # actions (may reference t or omit |t| from previous line)
end

You can also use strings for task names and prerequisites; Rake doesn’t care. For example,

task 'name' => %w[prereq1 prereq2]

As you may have noticed, multiple prerequisites are specified by putting them into an array.
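
Here is a small sketch of how prerequisites behave when there are several of them. I wrote it as a standalone Ruby script rather than a Rakefile so it can be run directly with ruby, and the task names (fetch, tidy, report) are made up for illustration.

```ruby
require 'rake'

order = [] # records the order in which task actions run

task :fetch do
  order << :fetch
end

task :tidy => :fetch do
  order << :tidy
end

# :fetch is listed here and is also a prerequisite of :tidy,
# but Rake runs each task at most once per invocation.
task :report => [:fetch, :tidy] do
  order << :report
end

# Equivalent to running `rake report` from the shell.
Rake::Task[:report].invoke
p order
```

Prerequisites are invoked in the order they are listed, and the task's own action runs last.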

Augmenting Tasks

The existing skeleton, although simple, is quite effective. We can, however, add features to it that make life easier in certain cases. For instance, say your task expects an argument and acts on it, such as running code that assumes a certain version of a program. Rake provides a way to achieve this as follows:

# Rakefile

task :default => :hello_world

desc "Run Hello World with your name"
task :hello_world, [:name] do |t, args|
  # any valid Ruby code goes here
  puts "Hello, #{args.name}!"
end

Now run it with a twist this time:

$ rake hello_world[Vivek]
Hello, Vivek!

$ rake -T
rake hello_world[name]  # Run Hello World with your name

The arguments specified on the command line are passed to the task through args, which can then be used inside the block. You can pass multiple arguments at once by separating them with commas inside the brackets.

You may also notice that we added a desc statement with a string describing what the task does. The benefit is that the description shows up as help when a user runs rake -T, as shown in the example.

Rule: Document your tasks.

What if no arguments are supplied? In such cases, you can use the with_defaults method in the task body to assume specific defaults.

task :name, [:first_name, :last_name] do |t, args|
  args.with_defaults(:first_name => "John", :last_name => "Dough")
  puts "First name is #{args.first_name}"
  puts "Last  name is #{args.last_name}"
end
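
To see the defaults kick in, here is a minimal sketch that invokes the same task twice from a standalone Ruby script. The greetings array and the argument "Jane" are my own additions for illustration; invoke and reenable are used only so the script can stand in for two separate rake runs.

```ruby
require 'rake'

greetings = [] # collects what each run would have printed

task :name, [:first_name, :last_name] do |t, args|
  args.with_defaults(:first_name => "John", :last_name => "Dough")
  greetings << "#{args.first_name} #{args.last_name}"
end

name_task = Rake::Task[:name]
name_task.invoke("Jane") # like: rake name[Jane]
name_task.reenable       # let the task run a second time in this process
name_task.invoke         # like: rake name

p greetings
```

Only the arguments that were left out fall back to their defaults, so the first run keeps "Jane" and fills in the last name.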

What if the number of arguments is unknown or variable? In that case, use the extras method of the args variable. This allows for tasks that loop over a variable number of values, and it's compatible with named parameters as well:

task :email, [:message] do |t, args|
  mail = Mail.new(args.message)
  recipients = args.extras
  recipients.each do |target|
    mail.send_to(target)
  end
end
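
The Mail class above is only a placeholder. A self-contained sketch of the same idea, assuming we just collect the messages in an array instead of sending anything, could look like this (the recipient names are invented):

```ruby
require 'rake'

sent = [] # stands in for an outbox

task :email, [:message] do |t, args|
  # args.message is the named argument; everything after it lands in extras
  args.extras.each do |recipient|
    sent << "#{recipient}: #{args.message}"
  end
end

# like: rake "email[hello,alice,bob]"
Rake::Task[:email].invoke("hello", "alice", "bob")
p sent
```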

Skeleton of a Task

task <target>, [:arg1, :arg2] => [:pre_req1, :pre_req2] do |t, args|
  # actions, may reference t, args here and use methods on them
end

Types of Tasks

While task is a generic way of doing things using Rake, there are special types of tasks as well depending on what kind of output is expected.

file tasks
phony tasks
directory tasks
clean or clobber tasks

File tasks

As the name suggests, file tasks are expected to create a file from one or more input files. Rake skips such a task if the target file already exists and is newer than all of its prerequisites.

File tasks are declared using the file method instead of the task method.
File tasks are usually named with a string rather than a symbol.

file "read_distribution.pdf" => ["read_counts.csv"] do |t|
  sh "Rscript plot_distribution #{t.prerequisites[0]} #{t.name}"
end
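
The skipping behavior is easy to check for yourself. The sketch below is a standalone script that works in a temporary directory; summary.txt and the runs counter are hypothetical names I picked for the demonstration.

```ruby
require 'rake'
require 'tmpdir'

runs = 0 # how many times the action actually executed

Dir.mktmpdir do |dir|
  source = File.join(dir, "read_counts.csv")
  target = File.join(dir, "summary.txt")
  File.write(source, "gene,count\nA,10\n")

  file target => [source] do |t|
    runs += 1
    # t.source is the first prerequisite, t.name is the target
    File.write(t.name, File.read(t.source).lines.count.to_s)
  end

  Rake::Task[target].invoke   # target is missing, so the action runs
  Rake::Task[target].reenable
  Rake::Task[target].invoke   # target exists and is not older than source: skipped
end

p runs
```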

The great thing about file tasks is that Rake provides useful file handling functions such as cp, mv, and rm_r to perform common file operations. For convenience, these are named after their equivalent command-line programs. They come from the RakeFileUtils module, an extended version of the standard Ruby FileUtils module, and can be explored using the ri FileUtils command.

task :remove_all do
  rm_r("./build")
end

Phony Tasks

A phony task lets a file task use non-file-based tasks as prerequisites without forcing the file task to rebuild. In a Makefile, the equivalent is declared using .PHONY.

Use require 'rake/phony' to define it. The file provides a single task named :phony whose timestamp is always very old, so a file task that depends on it never looks out of date on its account.

require 'rake/phony'

# "config.yml" is an illustrative file name.
# Depending on :phony does not trigger a rebuild: the action runs
# only when the file is missing.
file "config.yml" => :phony do |t|
  touch t.name
end

Directory tasks

It is common to need to create directories on demand. The directory convenience method is shorthand for creating a file task that creates the directory. The directory method itself does not accept prerequisites or actions, but both can be added later.

directory "testdata"
file "testdata" => ["otherdata"]
file "testdata" do
  cp Dir["standard_data/*.data"], "testdata"
end
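
A common pattern is to make a file task depend on the directory it writes into. Here is a minimal sketch of that, run in a temporary directory as a standalone script; README.txt and the created variable are just illustrative names.

```ruby
require 'rake'
require 'tmpdir'

created = nil

Dir.mktmpdir do |dir|
  out_dir = File.join(dir, "testdata")
  marker  = File.join(out_dir, "README.txt")

  directory out_dir

  # Depending on the directory task guarantees the directory exists
  # before the action writes into it.
  file marker => out_dir do |t|
    File.write(t.name, "generated\n")
  end

  Rake::Task[marker].invoke
  created = File.directory?(out_dir) && File.exist?(marker)
end

p created
```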

Clean and Clobber Tasks

Through require 'rake/clean', Rake provides clean and clobber tasks:

clean: Clean up the project by deleting scratch files and backup files. Add files to the CLEAN FileList to have the clean target handle them.
clobber: Clobber all generated and non-source files in a project. The task depends on clean, so all the CLEAN files will be deleted as well as files in the CLOBBER FileList. The intent of this task is to return a project to its pristine, just unpacked state.

You can add file names or glob patterns to both the CLEAN and CLOBBER lists.

Rule: Include rules to clean up temporary files.

require 'rake/clean'

# Define some files to be cleaned
CLEAN.include('*.o', '*.obj')

# Define some files to be clobbered
CLOBBER.include('*.exe', '*.dll')

# Define a task that generates some object files
task :compile do
  # code to compile source files into object files
end

# Define a task that links the object files into an executable
task :link => :compile do
  # code to link object files into an executable
end

# Define a task that depends on the link task
task :build => :link do
  puts "Build complete"
end

# rake/clean already defines :clean and :clobber. Declaring them again
# only appends an extra action that runs after the files are removed.
task :clean do
  puts "Object files cleaned"
end

task :clobber do
  puts "All generated files clobbered"
end
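
To watch clean and clobber at work without touching a real project, here is a sketch that creates a few empty files in a temporary directory and then clobbers them. The file names are invented, and the script is standalone so it can be run with ruby directly.

```ruby
require 'rake'
require 'rake/clean'
require 'tmpdir'

remaining = nil

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # One source file plus some pretend build products
    %w[main.c main.o util.o app.exe].each { |f| File.write(f, "") }

    CLEAN.include('*.o')
    CLOBBER.include('*.exe')

    # clobber depends on clean, so both lists are removed
    Rake::Task[:clobber].invoke

    remaining = Dir['*'].sort
  end
end

p remaining
```

Only the source file survives, which is exactly the "pristine, just unpacked" state that clobber aims for.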

Useful powerups

FileLists

FileList allows you to write tasks that process a lot of files. It is essentially an array of files with special methods available.

Creating a file list is easy. Just give it the list of file names:

fl = FileList['file1.rb', 'file2.rb']

Or give it a glob pattern:

fl = FileList['*.rb']

# more fun
FileList['*.rb'].each do |src|
  # add tasks that process files here
  target = src.ext('.out') # illustrative: any name derived from src works
  file target => src do
    # actions
  end
end
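
A few of those special methods are worth seeing in action. The sketch below uses explicit, made-up file names so it does not depend on what is on disk; exclude, ext, and pathmap are the FileList methods I reach for most.

```ruby
require 'rake'

sources = FileList['src/a.c', 'src/b.c', 'src/scratch.c']

# Drop entries you do not want to process
sources.exclude('src/scratch.c')

# Swap the extension on every entry
objects = sources.ext('.o')

# pathmap rewrites paths with a format string; %n is the base name without extension
names = sources.pathmap('%n')

p sources.to_a # ["src/a.c", "src/b.c"]
p objects.to_a # ["src/a.o", "src/b.o"]
p names.to_a   # ["a", "b"]
```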

Parallel execution

Rake allows parallel execution of prerequisites using the following syntax:

multitask copy_files: %w[copy_src copy_doc copy_bin] do
  puts "All Copies Complete"
end

In this example, copy_files is a normal rake task. Its actions are executed once all of its prerequisites are done. The big difference is that the prerequisites (copy_src, copy_bin and copy_doc) are executed in parallel. Each prerequisite runs in its own Ruby thread, possibly allowing a faster overall runtime.
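
Here is a minimal sketch that fills in the three prerequisites so the example can actually run. The sleep calls stand in for real copying, and the Queue and the finished_before_parent variable are my own additions to show that the parent action waits for every prerequisite.

```ruby
require 'rake'

done = Queue.new # thread-safe, since the prerequisites run in separate threads

%w[copy_src copy_doc copy_bin].each do |name|
  task name do
    sleep 0.1 # pretend to copy something
    done << name
  end
end

finished_before_parent = nil

multitask copy_files: %w[copy_src copy_doc copy_bin] do
  # By the time this action runs, all prerequisites have completed
  finished_before_parent = done.size
end

Rake::Task[:copy_files].invoke
p finished_before_parent
```

Because the prerequisites share the process, anything they write to in common needs to be thread-safe, which is why the sketch uses a Queue rather than a plain array.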

Rules

Rule tasks, also known as synthesized tasks, have the same characteristics as all other kinds of tasks: they have a name, they can have zero or more actions, they can have prerequisites, and if Rake determines the task needs to be run it will only be run once.

What makes rule tasks different is that you don’t actually give them a name – I know, I just said that rule tasks have names, just bear with me – instead when you declare the task you give it a pattern in place of a name.²

Regular expression based matching:

rule /foo/ do |task|
  puts 'called task named: %s' % task.name
end

Specifying regular expression matched rules with syntactic sugar:

rule '.txt' do |task|
  puts 'creating file: %s' % task.name
  touch task.name
end

Specifying dependencies:

rule '.dependency' do |task|
  puts 'called task: %s' % task.name
end

rule '.task' => '.dependency' do |task|
  puts 'called task: %s' % task.name
end

Rules for Files

rule '.txt' => '.template' do |task|
  cp task.source, task.name
end
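
To try the file rule without cluttering a project, here is a sketch that runs it in a temporary directory as a standalone script. The file name letter.template and its contents are invented for the demonstration.

```ruby
require 'rake'
require 'tmpdir'

content = nil

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    File.write("letter.template", "Dear reader\n")

    # Any *.txt target can be built from the matching *.template file
    rule '.txt' => '.template' do |t|
      cp t.source, t.name
    end

    # No task named letter.txt was declared; Rake synthesizes one from the rule
    Rake::Task["letter.txt"].invoke
    content = File.read("letter.txt")
  end
end

p content
```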

Advanced Rakefile

Accessing Other Tasks

You can directly manipulate the input, output, and actions of one task from another. For instance,

task :doit do
  puts "DONE"
end

task :dont do
  Rake::Task[:doit].clear
end

Running this example:

$ rake doit
(in /Users/jim/working/git/rake/x)
DONE
$ rake dont doit
(in /Users/jim/working/git/rake/x)
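
Clearing is not the only thing you can do to another task. As a small sketch, assuming you want to add to a task rather than empty it, the enhance method appends prerequisites and actions to a task that already exists. The prepare task and the log array are hypothetical names used for illustration.

```ruby
require 'rake'

log = []

task :doit do
  log << "DONE"
end

task :prepare do
  log << "prepared"
end

# Add a prerequisite and a second action to the existing :doit task
Rake::Task[:doit].enhance([:prepare]) do
  log << "after"
end

Rake::Task[:doit].invoke
p log
```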

Namespaces

When your Rakefile grows, it’s a good idea to bundle tasks into separate namespaces as an additional layer of organization. This is done using namespace.

Caution: File tasks are not scoped by the namespace command, since they refer to actual physical files on the system.

For example:

namespace "main" do
  task :build do
    # Build the main program
  end
end

namespace "samples" do
  task :build do
    # Build the sample programs
  end
end

task build: %w[main:build samples:build]
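
Here is the same layout as a runnable sketch, with the placeholder comments replaced by a line that records which build ran. The built array is my own addition; from the shell you would run rake build, rake main:build, or rake samples:build instead of calling invoke.

```ruby
require 'rake'

built = []

namespace "main" do
  task :build do
    built << "main"
  end
end

namespace "samples" do
  task :build do
    built << "samples"
  end
end

# The top-level build task refers to namespaced tasks by their full names
task build: %w[main:build samples:build]

Rake::Task[:build].invoke
p built
```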

This post should give you enough detail to get started with the majority of tasks. Once you are done with it, I highly recommend checking out other excellent resources that dive deeper into specific details.