How to write an LLVM Pass

The official LLVM documentation has its own tutorial, “Writing an LLVM Pass”.

This post focuses on writing an LLVM pass as a standalone project, built outside the LLVM source tree, and on using the compiled pass (the .so file) at compile time, for example when compiling an instrumentation target with clang.

0. Prerequisites

  • Ubuntu (here we take Ubuntu 20.04 as an example)
  • LLVM (default version 10)
  • clang (default version 10)

Install the default versions of LLVM and clang with the following command:

sudo apt install llvm clang

1. Create files to contain an LLVM Pass

Assuming that we are under a user directory (i.e., cd ~), we create a new directory named llvm-pass and enter it:

mkdir llvm-pass && cd llvm-pass

Then, we create a folder to contain our custom LLVM Pass named Hello:

mkdir Hello && cd Hello

In the Hello folder, we create a new file named Hello.cpp (for example with touch Hello.cpp) and open it in an editor.

2. Write an LLVM Pass

First, we add the following include directives at the top of Hello.cpp:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

Second, we add a using namespace statement to bring the LLVM names we need into scope:

using namespace llvm;

Then, in an anonymous namespace, we write our pass as a subclass of FunctionPass:

namespace {
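    struct Hello : public FunctionPass {
        static char ID;  // LLVM identifies the pass by the address of this member
        Hello() : FunctionPass(ID) {}

        // Called once for every function defined in the module.
        bool runOnFunction(Function &F) override {
            errs() << "Hello: ";
            errs().write_escaped(F.getName()) << '\n';
            return false;  // false: this pass did not modify the function
        }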
    };
}

The function runOnFunction overrides a pure virtual method inherited from FunctionPass, and LLVM calls it once for each function defined in the module being processed. In this example, the string “Hello: ” followed by the function name is printed to stderr every time a function is visited. The return value tells LLVM whether the pass modified the function; our pass only prints, so it returns false.
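To make the "one function at a time" behaviour concrete, here is a minimal sketch of the calling pattern in plain standard C++. It does not use LLVM at all: ToyFunction, ToyFunctionPass, ToyHello, runOverModule and helloOutputFor are hypothetical names invented for this illustration. The real pass manager does much more, such as scheduling and analysis bookkeeping.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for llvm::Function: only the name matters here.
struct ToyFunction {
    std::string Name;
};

// Toy stand-in for llvm::FunctionPass (hypothetical, not LLVM API).
struct ToyFunctionPass {
    virtual ~ToyFunctionPass() = default;
    // Returns true if the pass changed F, false otherwise.
    virtual bool runOnFunction(ToyFunction &F) = 0;
};

// Mirrors the Hello pass, writing to a caller-supplied stream
// instead of errs().
struct ToyHello : ToyFunctionPass {
    std::ostream &Out;
    explicit ToyHello(std::ostream &OS) : Out(OS) {}

    bool runOnFunction(ToyFunction &F) override {
        Out << "Hello: " << F.Name << '\n';
        return false;  // nothing was modified
    }
};

// Toy driver: visit every function once, in order, and report whether
// any invocation of the pass claimed to have modified a function.
bool runOverModule(ToyFunctionPass &P, std::vector<ToyFunction> &Module) {
    bool Changed = false;
    for (ToyFunction &F : Module)
        Changed |= P.runOnFunction(F);
    return Changed;
}

// Runs ToyHello over the given functions and returns what it printed.
std::string helloOutputFor(std::vector<ToyFunction> Module) {
    std::ostringstream OS;
    ToyHello P(OS);
    bool Changed = runOverModule(P, Module);
    assert(!Changed && "ToyHello never modifies a function");
    (void)Changed;  // silence the unused warning when NDEBUG is set
    return OS.str();
}
```

For a toy module holding two functions named hello and main, helloOutputFor returns one “Hello: ” line per function, in the order the functions are stored.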

To finish writing this pass, we initialize the pass ID and register our pass class Hello at the end of Hello.cpp:

char Hello::ID = 0;  // the value is unused; LLVM uses the address of ID
static RegisterPass<Hello> X(
    "hello",        // command-line argument that enables the pass (-hello)
    "Hello Pass",   // human-readable pass name
    false,          // true only if the pass just walks the CFG without modifying it
    false           // true only if the pass is an analysis pass (e.g., a dominator tree pass)
);
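It may help to see why declaring a static object is enough to register a pass. The sketch below is a toy model of the self-registration idiom in plain standard C++, not LLVM's actual implementation. ToyPass, toyRegistry, ToyRegisterPass, ToyHelloPass and createPassByArg are hypothetical names made up for this example. The idea is that the constructor of a static object runs when the program (or shared library) is loaded and records a factory under the command-line name.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a pass; the real base class is llvm::Pass.
struct ToyPass {
    virtual ~ToyPass() = default;
    virtual std::string describe() const = 0;
};

using PassFactory = std::function<std::unique_ptr<ToyPass>()>;

// Function-local static, so the registry is constructed on first use,
// before any registrar object tries to insert into it.
std::map<std::string, PassFactory> &toyRegistry() {
    static std::map<std::string, PassFactory> Registry;
    return Registry;
}

// Toy counterpart of RegisterPass<T>: the constructor runs during
// static initialization and records how to create a PassT.
template <typename PassT>
struct ToyRegisterPass {
    explicit ToyRegisterPass(const std::string &Arg) {
        bool Inserted =
            toyRegistry()
                .emplace(Arg, [] { return std::unique_ptr<ToyPass>(new PassT()); })
                .second;
        assert(Inserted && "duplicate pass argument");
        (void)Inserted;  // silence the unused warning when NDEBUG is set
    }
};

struct ToyHelloPass : ToyPass {
    std::string describe() const override { return "Hello Pass"; }
};

// Same shape as `static RegisterPass<Hello> X("hello", ...)`.
static ToyRegisterPass<ToyHelloPass> X("hello");

// Toy counterpart of a tool resolving `-hello` on its command line.
std::unique_ptr<ToyPass> createPassByArg(const std::string &Arg) {
    auto It = toyRegistry().find(Arg);
    if (It == toyRegistry().end())
        return nullptr;
    return It->second();
}
```

In this model, createPassByArg("hello") returns a fresh ToyHelloPass and an unknown name returns a null pointer. This is roughly how a tool can enable a pass by name without having been compiled against it.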

3. Build the LLVM Pass

The following command builds the custom pass into a shared library (.so file):

clang++ -shared -fPIC -o <output-so-file-name>.so <your-pass-name>.cpp $(llvm-config --cxxflags --ldflags --libs)

In this example, we simply replace both placeholders with “Hello”:

clang++ -shared -fPIC -o Hello.so Hello.cpp $(llvm-config --cxxflags --ldflags --libs)

Then, you will see Hello.so in the Hello directory.

4. Use the LLVM Pass

There are two ways to use the pass we just wrote: (1) load it into the optimization pipeline with the LLVM optimizer tool opt, or (2) load it from a compiler such as clang.

Here, we use a simple C file to demonstrate the two ways.

/* main.c */
#include <stdio.h>

int hello() {
    return 0;
}

int main() {
    hello();
    return 0;
}

4.1 Use a pass at LLVM optimization phase

To run the pass with opt, we first compile the C code to an LLVM bitcode file:

clang -emit-llvm -c main.c -o main.bc

Then load and enable our Hello pass with opt:

opt -load path-to-pass/Hello.so -hello < main.bc > /dev/null

In this example, we simply discard the bitcode that opt writes by redirecting stdout to /dev/null. If the pass modifies the code (e.g., by adding or deleting instructions), you can save the transformed bitcode by redirecting stdout to a file instead, for example with > main_opt.bc.

While opt runs, whatever your pass prints appears on the terminal; for the Hello pass, that is one “Hello: ” line per function on stderr. If the pass modifies the code and you saved the result (e.g., as main_opt.bc), you can compile that file to an executable, and the modifications take effect when the executable runs. For instance:

clang main_opt.bc -o main

4.2 Use a pass with a compiler

Running passes through opt can be tedious, because you always have to produce the LLVM bitcode (.bc files) of the target first and then run opt to enable the passes.

Alternatively, a pass can be loaded directly from the compiler's command line, which makes the process more convenient.

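First, we add the following lines at the end of Hello.cpp and rebuild it. They need two more headers, llvm/IR/LegacyPassManager.h and llvm/Transforms/IPO/PassManagerBuilder.h, which go with the other include directives at the top of the file:

// Callback invoked while the pass pipeline is built; it adds our pass.
static void registerHello(const PassManagerBuilder &, llvm::legacy::PassManagerBase &PM) {
    PM.add(new Hello());
}

// Hook the callback in at EP_EarlyAsPossible, i.e., before any other transformations.
static RegisterStandardPasses RegisterHello(PassManagerBuilder::EP_EarlyAsPossible, registerHello);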

Now, you can use a simpler command to compile the target with your pass loaded and enabled.

clang -Xclang -load -Xclang path-to-pass/Hello.so <your-source-files>
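The extension-point mechanism can be pictured with a small toy model in plain standard C++, offered only as a sketch and not as LLVM's real code. All names in it (ToyExtensionPoint, ToyPassManager, toyExtensions, ToyRegisterStandardPasses, registerToyHello, addExtensions, buildToyPipeline) are hypothetical. A static registrar object stores a callback for a given extension point, and the pipeline builder later invokes every callback registered for each point it reaches.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins; the real types are PassManagerBuilder::ExtensionPointTy
// and legacy::PassManagerBase.
enum class ToyExtensionPoint { EarlyAsPossible, OptimizerLast };

struct ToyPassManager {
    std::vector<std::string> Passes;  // names of scheduled passes, in order
    void add(const std::string &Name) { Passes.push_back(Name); }
};

using ExtensionFn = std::function<void(ToyPassManager &)>;

// Global list of (extension point, callback) pairs.
std::vector<std::pair<ToyExtensionPoint, ExtensionFn>> &toyExtensions() {
    static std::vector<std::pair<ToyExtensionPoint, ExtensionFn>> Extensions;
    return Extensions;
}

// Toy counterpart of RegisterStandardPasses: its constructor runs at
// static-initialization time and stores the callback.
struct ToyRegisterStandardPasses {
    ToyRegisterStandardPasses(ToyExtensionPoint EP, ExtensionFn Fn) {
        assert(Fn && "extension callback must be callable");
        toyExtensions().emplace_back(EP, std::move(Fn));
    }
};

static void registerToyHello(ToyPassManager &PM) { PM.add("hello"); }

static ToyRegisterStandardPasses
    RegisterToyHello(ToyExtensionPoint::EarlyAsPossible, registerToyHello);

// Invoke every callback registered for the given extension point.
void addExtensions(ToyExtensionPoint EP, ToyPassManager &PM) {
    for (auto &Ext : toyExtensions())
        if (Ext.first == EP)
            Ext.second(PM);
}

// Toy pipeline builder: early extensions first, then the compiler's
// own passes (a placeholder name here), then late extensions.
ToyPassManager buildToyPipeline() {
    ToyPassManager PM;
    addExtensions(ToyExtensionPoint::EarlyAsPossible, PM);
    PM.add("builtin-optimizations");
    addExtensions(ToyExtensionPoint::OptimizerLast, PM);
    return PM;
}
```

In this model, buildToyPipeline schedules "hello" ahead of the placeholder for the built-in passes, which is the ordering the early extension point is meant to provide.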
