angr is a powerful
binary analysis tool with a wide range of applications in the security
and reverse engineering domain, but it can also be used for other kinds
of problems. To put it concisely, it can read program executable files
and convert them into logical formulas while representing unknown values
like memory cells, registers or user input from stdin
as
variables which can then be constrained and solved for.
This allows for lots of interesting applications like solving for a set of function parameters that produces certain program behavior. Consider the following snippet of C code:
#include <stdio.h>
int derive_passcode(int seed) {
int a = seed >> 0 & 0xffff;
int b = seed >> 16 & 0xffff;
for (int i = 0; i < 100; i++) {
a *= 0x19461999;
b *= 0x9af10bbf;
}
return a ^ b;
}
int check_access(int passcode) {
return passcode == derive_passcode(0x12de456c);
}
int main(int argc, char *argv[])
{
int passcode;
scanf("%d", &passcode);
return !check_access(passcode);
}
In this case, derive_passcode
is straightforward but
might be way more complex in reality. It could for example derive valid
license keys for a program you wish to use. Your task now is to find a
valid passcode that passes the check. How do we do this without
searching the whole 2^32-large space of possibilities?
Angr can help us here: we load the program, convert it to a formula and then use z3 as a backend to solve this formula. A great advantage is that angr works with machine code so we don’t even need any debug symbols or source code for this. Let’s take a look at the revelant machine code:
...>:
0000000000001149 <derive_passcodepush %rbp
1149: 55 %rsp,%rbp
114a: 48 89 e5 mov %edi,-0x14(%rbp)
114d: 89 7d ec mov -0x14(%rbp),%eax
1150: 8b 45 ec mov %ax,%eax
1153: 0f b7 c0 movzwl %eax,-0xc(%rbp)
1156: 89 45 f4 mov -0x14(%rbp),%eax
1159: 8b 45 ec mov 10 shr $0x10,%eax
115c: c1 e8 %eax,-0x8(%rbp)
115f: 89 45 f8 mov 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1162: c7 1c jmp 1187 <derive_passcode+0x3e>
1169: eb -0xc(%rbp),%eax
116b: 8b 45 f4 mov 99 19 46 19 imul $0x19461999,%eax,%eax
116e: 69 c0 %eax,-0xc(%rbp)
1174: 89 45 f4 mov -0x8(%rbp),%eax
1177: 8b 45 f8 mov 0b f1 9a imul $0x9af10bbf,%eax,%eax
117a: 69 c0 bf %eax,-0x8(%rbp)
1180: 89 45 f8 mov 01 addl $0x1,-0x4(%rbp)
1183: 83 45 fc 63 cmpl $0x63,-0x4(%rbp)
1187: 83 7d fc 116b <derive_passcode+0x22>
118b: 7e de jle -0xc(%rbp),%eax
118d: 8b 45 f4 mov xor -0x8(%rbp),%eax
1190: 33 45 f8 pop %rbp
1193: 5d
1194: c3 ret
>:
0000000000001195 <check_accesspush %rbp
1195: 55 %rsp,%rbp
1196: 48 89 e5 mov 08 sub $0x8,%rsp
1199: 48 83 ec %edi,-0x4(%rbp)
119d: 89 7d fc mov 6c 45 de 12 mov $0x12de456c,%edi
11a0: bf 9f ff ff ff call 1149 <derive_passcode>
11a5: e8 %eax,-0x4(%rbp)
11aa: 39 45 fc cmp %al
11ad: 0f 94 c0 sete %al,%eax
11b0: 0f b6 c0 movzbl
11b3: c9 leave
11b4: c3 ret ...
According to the System V ABI, the input to check_access
is stored in register edi
and the output in register
eax
. The problem now is as follows:
edi
to get a 1 in
eax
?
A possible solution will be presented without going too deep into the API of angr - there is a plethora of documentation and practice material out there.
#!/usr/bin/env python3
import angr
# load the program
project = angr.Project('./a.out')
# get the address of the `check_access` function and construct a state ready to execute it
check_access_addr = project.loader.find_symbol('check_access').rebased_addr
check_access_state = project.factory.blank_state(addr=check_access_addr)
# create symbolic variable (this is what we want to solve for)
passcode = check_access_state.regs.edi
# now we need to simulate the program up to the point of return so we can constrain the result
simgr = project.factory.simulation_manager(check_access_state)
simgr.explore(find=0x4011B4)
# constrain the result and solve for the passcode
result_state = simgr.found[0]
result = result_state.regs.eax
result_state.add_constraints(result == 1)
concretized_passcode = result_state.solver.eval(passcode)
print(concretized_passcode)
This spits out 703611698
which is the correct value we
were looking for.
The goal of Day 17 of the aoc 2024 challenges was to find a fixpoint input that encodes a program outputting itself. Brute-forcing was not possible since the input space is unconstrained and over two hours of searching yielded nothing. I then tried the good ol’ pen-and-paper approach and figured out some patterns in the program which probably allows you to solve it, but I didn’t want to solve all of this by hand. That’s why I decided to implement the interpreter in a small C program and do the same procedure as above to find an input that generates the appropriate output. This is the C program:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
const uint64_t program[] = { 2,4, 1,3, 7,5, 0,3, 1,5, 4,4, 5,5, 3,0 };
const size_t program_len = sizeof(program) / sizeof(program[0]);
uint64_t combo(uint64_t operand, uint64_t regs[3]) {
switch (operand) {
case 0:
case 1:
case 2:
case 3: return operand;
case 4: return regs[0];
case 5: return regs[1];
case 6: return regs[2];
}
exit(1);
}
void test(uint64_t reg_a) {
uint64_t regs[] = { reg_a, 0, 0 };
size_t ip = 0;
size_t out_index = 0;
while (ip < program_len) {
uint64_t instr = program[ip];
uint64_t operand = program[ip + 1];
switch (instr) {
case 0: regs[0] >>= combo(operand, regs); break;
case 1: regs[1] ^= operand; break;
case 2: regs[1] = combo(operand, regs) % 8; break;
case 3: if (regs[0] != 0) { ip = operand; continue; }; break;
case 4: regs[1] ^= regs[2]; break;
case 5: if (program[out_index++] != combo(operand, regs) % 8) { return; }; break;
case 6: regs[1] = regs[0] >> combo(operand, regs); break;
case 7: regs[2] = regs[0] >> combo(operand, regs); break;
}
ip += 2;
}
if (out_index == program_len) {
printf("Success!\n");
}
}
int main(int argc, char *argv[])
{
test(234971750570964);
return EXIT_SUCCESS;
}
This made it fairly easy to constrain the program to reach
printf("Success!\n")
and solving for the input
reg_a
. Since there might be multiple solutions and the
challenge specifically asks for the smallest one, we solve using the
min
function.
#!/usr/bin/env python3
import angr
import claripy
proj = angr.Project('a.out')
reg_a = claripy.BVS('reg_a', 64)
func_addr = proj.loader.find_symbol('test').rebased_addr
print(hex(func_addr))
state = proj.factory.blank_state(addr=func_addr)
state.regs.rdi = reg_a
simgr = proj.factory.simulation_manager(state)
simgr.explore(find=0x4013EC)
print(simgr.found)
if simgr.found:
reg_a_value = simgr.found[0].solver.min(reg_a)
print(reg_a_value)
Constraint solving is an extremely powerful technique that has many applications. You can not just use it for reverse engineering, but also for encoding problems in programs instead of directly using formulas which I find to be more ergonomic. A deep understanding of SMT solvers and all the different options is not necessary, but might be helpful. Definitely a tool that should be in every computer scientists’ toolbox!