In this post, we will talk about canaries, which are part of the “Stack Smashing Protector” (SSP) mechanism built into GCC (and most other modern compilers). This article aims to describe canaries and summarize the different implementations of SSP on different architectures. Developers enforcing SSP should be aware of these implementations when writing code that is meant to be built for different architectures (for example, embedded software in IoT devices). We will dig deep into the libc and the kernel to understand all the components of the canary.
The following architectures were tested in VMs (thanks to QEMU):
All using the same setup:
All the links to source code are provided within this article, and all the code excerpts can be found on elttam’s GitHub. The disassembled snippets are run against compiled versions of files on the GitHub, which you can compile to reproduce, with:
The stack protection mechanism appeared in response to widespread stack buffer overflow vulnerabilities, which started attracting a lot of attention after the famous Phrack article by AlephOne in 1996.
Starting in 2000, Hiroaki Etoh (from IBM) first suggested the idea of modifying the GCC compilation process to integrate a low-overhead mechanism to protect against stack overflows. This gave birth to “StackGuard”, the GCC stack-smashing protection still in use today. As early as 2000, several implementations were tested, and consequently attacked.
However, through constant improvement, GCC implemented the “StackGuard” protection, thoroughly described in the GCC Summit paper (2003).
The goal of SSP is to give the program a way to detect that the stack has been corrupted to the point where an attacker could redirect the code flow and achieve arbitrary code execution. To that end, a random value is inserted at the base of the stack frame of a function, like this:
In the example above, if the boundaries of Var2 are not properly checked (for example when using strcpy()-like functions) and a write overflows towards the return address, it will corrupt the canary. The program detects this and forces a premature (but safe) exit, preventing further memory corruption and, ultimately, code execution. One immediate weakness is that this does not prevent the corruption of the other variables of the current context (here Var1). However, compilers can rearrange the layout of the variables to prevent that.
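To make the detection logic concrete, here is a minimal simulation of the idea rather than real compiler output. The `fake_frame` struct and the three helper functions are hypothetical names invented for this sketch: the struct stands in for a stack frame, and the "overflow" is a clamped `memcpy` so that the demo never touches memory it does not own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a protected frame: buffer first, then the
 * canary, then the saved return address (lowest address at the top). */
struct fake_frame {
    char      var2[16];     /* attacker-controlled buffer */
    uintptr_t canary;       /* copy of the reference value, set on entry */
    uintptr_t return_addr;  /* what an attacker ultimately wants to reach */
};

/* "Prologue": store the reference canary between buffer and return address. */
void frame_enter(struct fake_frame *f, uintptr_t reference)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    f->canary = reference;
    f->return_addr = 0x1000;
}

/* Emulates an unchecked strcpy()-style write starting at var2. The length is
 * clamped to the struct so the demo itself never leaves its own memory. */
void frame_unsafe_write(struct fake_frame *f, const void *src, size_t len)
{
    if (len > sizeof(*f))
        len = sizeof(*f);
    memcpy((unsigned char *)f, src, len);
}

/* "Epilogue": returns 1 if the canary is intact, 0 if it was smashed. */
int frame_canary_intact(const struct fake_frame *f, uintptr_t reference)
{
    return f->canary == reference;
}
```

A write that stays within `var2` leaves `frame_canary_intact()` returning 1. A write long enough to reach `return_addr` necessarily passes over `canary` first, so the check returns 0 before the corrupted return address would ever be used.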
The SSP is enabled at two levels:
- `-fstack-protector` (since GCC 4.1): includes a canary when a function defines an array of char with a size of 8 bytes or more
- `-fstack-protector-all`: adds a canary for all non-inline functions
- `-fstack-protector-strong` (since 4.9): provides a smarter way to protect any sensitive location within the current context (the best description can be found on Kees Cook’s blog)

The original paper defines 3 possible canary types:
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/terminator_canary.png?raw=true" width="100%">
In practice, the presence of a canary in an ELF can be checked by looking for the __stack_chk_fail@plt symbol, which is the PLT entry for the procedure invoked should the canary be tampered with. Some tools (like checksec.sh or pwntools) can also be used.
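As a rough illustration of that check, the sketch below simply searches a file for the byte string `__stack_chk_fail`. This is a crude heuristic and an assumption of this example, not how checksec.sh or pwntools work: a proper tool parses the ELF dynamic symbol table, while a raw string search can be fooled by a stripped static binary or by a file that merely mentions the name. The function name `elf_mentions_stack_chk_fail` is made up for the sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the file contains the string "__stack_chk_fail",
 * 0 if it does not, and -1 if the file cannot be read. */
int elf_mentions_stack_chk_fail(const char *path)
{
    static const char needle[] = "__stack_chk_fail";
    const size_t nlen = sizeof(needle) - 1;
    unsigned char *buf;
    size_t got, i;
    long size;
    int found = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* Load the whole file; fine for a demo, wasteful for huge binaries. */
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }
    got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    /* Plain sliding-window comparison over the file content. */
    for (i = 0; i + nlen <= got; i++) {
        if (memcmp(buf + i, needle, nlen) == 0) {
            found = 1;
            break;
        }
    }
    free(buf);
    return found;
}
```

Calling `elf_mentions_stack_chk_fail("./greetz")` on a binary built with `-fstack-protector` would be expected to return 1, keeping in mind the limitations listed above.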
To be secure, the canary must ensure at least the following properties:
Let’s examine that!
The glibc manipulates the canary through a global variable called __stack_chk_guard. By reading the source code of glibc-2.24, one can quickly see when the userland canary* is set up. The canary is generated by the loader, through the following calls:
*Note: the reason I specified “userland canary” is because the Linux kernel uses another (and different) canary to protect against stack overflow within the kernel. However, for simplicity, canary will always refer to userland canary for now. We will cover the kernel-land canary later in this article.
We can observe that the function _dl_setup_stack_chk_guard() can create all the canary types mentioned earlier: if dl_random is null, then __stack_chk_guard will be a “terminator canary”, otherwise a “random canary”.
In practice, on recent Glibc, dl_random is never null (we will understand why later on), and so the canary is only a (mem-)copy of it, with its least significant byte being nullified.
This operation is done to force the termination of a C-string, making the canary harder for attackers to overwrite. On the other hand, it also reduces the number of possible values for the canary, which can only take 2^((sizeof(register)-1)*8) different values.
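The derivation described above fits in a few lines. This is a sketch of the behaviour as described in this article, not a copy of the glibc source; the function names are hypothetical. It zeroes the first byte in memory, which is the least significant byte on little-endian targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a canary from the first sizeof(uintptr_t) bytes of the random
 * buffer, then zero the first byte in memory so that the value starts
 * with a C-string terminator. */
uintptr_t canary_from_random(const unsigned char *dl_random)
{
    uintptr_t canary;

    assert(dl_random != NULL);
    memcpy(&canary, dl_random, sizeof(canary));
    ((unsigned char *)&canary)[0] = 0;
    return canary;
}

/* Number of random bits left once one byte is forced to zero:
 * 24 on a 32-bit target, 56 on a 64-bit target. */
unsigned canary_entropy_bits(void)
{
    return (unsigned)(sizeof(uintptr_t) - 1) * 8;
}
```

On a 64-bit machine this leaves 2^56 possible values, and 2^24 on a 32-bit one, which matches the formula above.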
The canary is roughly a (mem)copy of _dl_random, which according to a vague description, is populated by the kernel. Let’s see how it’s done:
_dl_aux_init() is called by LIBC_START_MAIN(), itself called by _start, which is the ELF entry point in userland once the kernel hands over control, as defined by the System V R4 ABI (see [the x86_64 ABI](https://refspecs.linuxbase.org/elf/x86_64-abi-0.21.pdf), page 25 onward). _dl_aux_init() is the function in charge of handling, in userland, the values passed from the kernel through the “Auxiliary Vector”.
An Auxiliary Vector is an ELF structure that aims to provide information from the kernel to the application. Note that this structure must be present but can be empty. If not empty, then it will provide information basically in the form of an associative array, whose keys can be found in the manpage of getauxval(). Among other valuable information we find:
AT_RANDOM value can be found in the libC:
The Auxiliary Vector can be dumped directly from the terminal by invoking the target binary and setting the environment variable LD_SHOW_AUXV:
The same information is exposed through the procfs structure (/proc/<pid>/auxv):
Note: Some hardened kernels/systems will not expose this information.
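For readers who prefer code to shell commands, the procfs file can be parsed directly. The sketch below assumes the file is a flat sequence of (key, value) pairs of `unsigned long`, which holds when the reader and the target process have the same word size; `auxv_lookup` is a name made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Look up one key (for example AT_RANDOM) in an auxv dump such as
 * /proc/<pid>/auxv. Returns 1 if found, 0 if absent, -1 if unreadable. */
int auxv_lookup(const char *path, unsigned long key, unsigned long *value)
{
    unsigned long pair[2];
    int found = 0;
    FILE *fp;

    assert(path != NULL && value != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Each entry is a (type, value) pair; AT_NULL terminates the vector. */
    while (fread(pair, sizeof(pair), 1, fp) == 1 && pair[0] != AT_NULL) {
        if (pair[0] == key) {
            *value = pair[1];
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For the current process, `auxv_lookup("/proc/self/auxv", AT_RANDOM, &addr)` should yield the same address as `getauxval(AT_RANDOM)`, unless the system hides the file as noted above.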
Using the greetz.c test file compiled with -fstack-protector, we can also use GDB to confirm this.
Bingo, we have a perfect match: rax contains the value of the first 8 bytes of the AUXV AT_RANDOM! This information is useful, because now we have an easy way to determine the canary of any process.
The file read_canary_from_pid.c provides a Proof-of-Concept for this attack:
Quick note: this code will work universally on all recent Linux systems, for all architectures, as long as they support the process_vm_readv syscall and expose their Auxiliary Vector. A Python version is also provided, which relies solely on procfs information.
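The core of such a tool can be sketched as follows. This is not the content of read_canary_from_pid.c, only an illustration under stated assumptions: the caller already knows the AT_RANDOM address of the target (for instance from /proc/<pid>/auxv), has permission to read its memory, and shares its word size. The name `read_canary_of` is hypothetical, and the raw `syscall()` form is used only to avoid depending on feature-test macros.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/auxv.h>     /* getauxval(), used in the usage example */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read sizeof(uintptr_t) bytes at the AT_RANDOM address of process `pid`
 * and apply the same masking as the loader. Returns 0 on success, -1 if
 * the memory could not be read. */
int read_canary_of(pid_t pid, unsigned long at_random_addr, uintptr_t *canary)
{
    struct iovec local, remote;
    long n;

    assert(canary != NULL);
    local.iov_base  = canary;
    local.iov_len   = sizeof(*canary);
    remote.iov_base = (void *)at_random_addr;
    remote.iov_len  = sizeof(*canary);

    n = syscall(SYS_process_vm_readv, (long)pid, &local, 1UL, &remote, 1UL, 0UL);
    if (n != (long)sizeof(*canary))
        return -1;

    /* The first byte in memory is forced to zero, as in the loader. */
    ((unsigned char *)canary)[0] = 0;
    return 0;
}
```

A process can try it on itself with `read_canary_of(getpid(), getauxval(AT_RANDOM), &c)`. Reading another process is subject to the usual ptrace access checks.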
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/canary-dump.png?raw=true" width="100%">
This means that if a process allows reading arbitrary files (for example through a Directory Traversal vulnerability on a Web server), it is possible to retrieve the canary this way, provided you can seek through the file descriptor. For example, if targeting an HTTP server, the leak would look something like:

- Read /proc/self/auxv to get the AT_RANDOM location
- Read /proc/self/mem and force an lseek access to reach the location found above via the HTTP Range header (for instance Range: bytes=<0xAT_RANDOM_ADDRESS>-<0xAT_RANDOM_ADDRESS+16>)
- Truncate the result to sizeof(register) bytes
- Nullify the least significant byte (data &= ~0xff)
- You now have the canary without ever knowing the __stack_chk_guard location in memory!

That’s pretty cool, but back to our business. Right now, what we really want to know is how the canary gets populated. So far, we only know where the canary gets its value from, but we do not know how the 16-byte (or 128-bit) location pointed to by the Auxiliary Vector AT_RANDOM gets filled.
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/deeper.jpg?raw=true" width="100%">
The creation of a new process goes way beyond the purpose of this article, so we will simply cover the part that interests us. Note that there are plenty of excellent resources covering this topic.
When sys_execve is called, the kernel will prepare the new process. If the executable is an ELF, it will call load_elf_binary(), which will in turn call create_elf_tables().
It is this function that will populate the 16-byte buffer k_rand_bytes with random data, and expose it to the user.
And finally, it will create the Auxiliary Vector entry for AT_RANDOM:
Finally, we know exactly how the canary gets its value, which we can summarize here:
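- the kernel generates 16 random bytes during sys_execve, using the function void get_random_bytes(void *buf, int nbytes),
- the address of these bytes is passed to userland through the Auxiliary Vector entry AT_RANDOM,
- the libc’s _dl_random global is pointing to this location,
- the first sizeof(register) bytes are memcopy-ed into __stack_chk_guard.

There, done!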
Unfortunately for us, that does not leave much room for attacking its randomness. Although evaluating and attacking the random generation from this function has been done in the past, we will not cover it as part of this article. This function is at the core of most (if not all) cryptographic mechanisms in Linux (cryptographic key generation, TCP sequencing, Bluetooth pairing exchange, Linux kernel canary, etc.), and even though this source of randomness is not perfect, it is considered very secure.
It is actually such a good source of entropy that developers could also rely on it to seed user-land random generators such as srand() in non-forking processes.
The following C snippet could be used to that extent:
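A minimal sketch of such a snippet is shown here, with the caveat that it is a reconstruction of the idea and not the original code. It assumes `getauxval(AT_RANDOM)` is available (glibc 2.16 or later) and uses the last bytes of the 16-byte buffer; the function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

/* Seed rand() from the kernel-provided AT_RANDOM buffer instead of the
 * clock. Returns 0 on success, -1 if AT_RANDOM is not exposed. */
int seed_rand_from_at_random(void)
{
    const unsigned char *bytes = (const unsigned char *)getauxval(AT_RANDOM);
    unsigned int seed;

    if (bytes == NULL)
        return -1;

    /* Assumption of this sketch: take the tail of the 16-byte buffer,
     * away from the bytes that become the stack canary. */
    memcpy(&seed, bytes + 16 - sizeof(seed), sizeof(seed));
    srand(seed);
    return 0;
}
```

One design caveat: the libc consumes part of this same buffer for its own guards, and a `rand()` stream can reveal its seed to anyone who observes enough output. Hashing the bytes before seeding, or keeping the resulting numbers private, would be the prudent choice.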
This would actually be better than the traditional (and vulnerable) call:
which is still used far too often:
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/github-search.png?raw=true" width="100%">
But there is a design flaw here: as we saw, the canary value is set via the LIBC_START_MAIN function. This means that the value is only generated when a new ELF is executed and mapped in memory (via the sys_execve syscall). A regular fork, however, will result in the child process systematically inheriting its canary from its parent. This weakness makes the canary inherently vulnerable to brute-force attacks (CTF players are very familiar with this kind of attack).
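To see why inheritance matters so much, consider the classic byte-by-byte attack against a forking server. The sketch below is a pure simulation: the `fork_oracle` struct and `oracle_survives()` are hypothetical stand-ins for "send an overflow that overwrites the first N canary bytes and observe whether the child crashed". It assumes the attacker controls the overflow length precisely and that every child shares the parent's canary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated forking server: every "child" has the same secret canary. */
struct fork_oracle {
    unsigned char secret[sizeof(uintptr_t)];
    unsigned long attempts;
};

/* Stand-in for one connection: the child survives only if the first
 * `len` overwritten canary bytes match the real ones. */
int oracle_survives(struct fork_oracle *o, const unsigned char *guess, size_t len)
{
    assert(len <= sizeof(o->secret));
    o->attempts++;
    return memcmp(o->secret, guess, len) == 0;
}

/* Recover the canary one byte at a time. Returns 0 on success. */
int brute_force_canary(struct fork_oracle *o, unsigned char *out)
{
    size_t i;
    unsigned v;

    for (i = 0; i < sizeof(o->secret); i++) {
        int found = 0;
        for (v = 0; v < 256; v++) {
            out[i] = (unsigned char)v;
            if (oracle_survives(o, out, i + 1)) {
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return 0;
}
```

Because each byte is confirmed independently, the search needs at most 256 tries per byte, so a few thousand connections at worst, instead of the 2^56 guesses a fresh canary per child would require on a 64-bit target.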
Now that we know where the canary’s value comes from, let’s spend some time trying to analyse how the canary is used at the assembly level as directed by the compiler (here GCC-6.3.0), for several architectures, starting naturally with Intel.
GCC’s implementation of the canary for Intel architectures relies on the gs selector, as a quick grep (or even better, rg) will tell:
The i386 family uses segmentation to translate virtual addresses in protected mode to physical addresses. Several 16-bit selectors exist to make this mechanism possible, the most commonly known being the Code Selector (CS), Data Selector (DS) and Stack Selector (SS). Two additional selectors exist, FS and GS, without a specific purpose. GS is usually used to store TLS information. GCC uses this segment register to save the canary at the offset GS:0x14. If there is no TLS, its location is pointed to by the symbol __stack_chk_guard.
If we apply it to the binary greetz, the canary is copied in the current context right after the function prologue:
And the epilogue will be in charge of checking if the canary has been modified:
As seen by grep-ing GCC earlier, x86_64 uses FS instead of GS as a selector, with an offset of 0x28. This implementation choice is interesting since x86_64 has a flat memory model, and GS and FS are only offsetting registers. However, it is used because of the following property:
Every segment register has a “visible” part and a “hidden” part. When a segment selector is loaded into the visible part of a segment register, the processor also loads the hidden part of the segment register with the base address, segment limit, and access control information from the segment descriptor pointed to by the segment selector. The information cached in the segment register (visible and hidden) allows the processor to translate addresses without taking extra bus cycles to read the base address and limit from the segment descriptor.
Source: Intel® 64 and IA-32 Architectures, Sect 3.4.3 - Segment registers
Using FS keeps the canary within the current memory layout without allowing a potential attacker to reach its address directly.
On Intel, the canary is stored in a readable and writable location. By inserting a simple C stub, such as
it becomes possible to read the canary’s value from inside the current process (read_canary.c is here):
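A stub of this kind could look like the sketch below; it is an illustration and not the content of read_canary.c. The x86 offsets are the ones GCC uses on Linux (`%fs:0x28` on x86_64, `%gs:0x14` on i386). The fallback branch is an assumption for other targets and only links where the libc exports `__stack_chk_guard`.

```c
#include <stdint.h>

#if !defined(__x86_64__) && !defined(__i386__)
/* Assumption: on targets without a TLS-based canary, the libc exports it. */
extern uintptr_t __stack_chk_guard;
#endif

/* Return the canary of the calling thread. */
uintptr_t read_canary(void)
{
    uintptr_t canary;

#if defined(__x86_64__)
    __asm__ volatile ("mov %%fs:0x28, %0" : "=r"(canary));
#elif defined(__i386__)
    __asm__ volatile ("mov %%gs:0x14, %0" : "=r"(canary));
#else
    canary = __stack_chk_guard;
#endif
    return canary;
}
```

Printing `read_canary()` with `%lx` from within greetz should show the same value that GDB displayed in rax earlier.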
Using the same movl instruction and swapping the arguments also allows us to rewrite it. By combining those two mechanisms (reading and writing), one can replace a forked process’ canary with an arbitrary one at runtime. This means that x86 developers can protect their code against brute-force attacks on the SSP with a very simple stub, and minimal performance impact.
To prove it, the file greetz-renew-canary.c was written as a Proof-of-Concept, where it will replace the child process’ canary with a dummy value (in this case 0x4142434445464748). This code runs similarly on 32 and 64 bits.
As we can see, through a quite simple hack, we have protected our forked process against brute-force attacks! A good seed for the new canary would be to re-use another chunk of the randomly generated buffer (provided by AT_RANDOM).
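The renewal itself can be sketched as follows. This is not greetz-renew-canary.c: the function name is hypothetical, the code is x86-only (it reports "unsupported" elsewhere), and it deliberately draws the new value from /dev/urandom rather than from AT_RANDOM, since the AT_RANDOM buffer is itself inherited across fork and would give every child the same replacement value.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define CANARY_READ(v)  __asm__ volatile ("mov %%fs:0x28, %0" : "=r"(v))
#define CANARY_WRITE(v) __asm__ volatile ("mov %0, %%fs:0x28" : : "r"(v) : "memory")
#elif defined(__i386__)
#define CANARY_READ(v)  __asm__ volatile ("mov %%gs:0x14, %0" : "=r"(v))
#define CANARY_WRITE(v) __asm__ volatile ("mov %0, %%gs:0x14" : : "r"(v) : "memory")
#endif

/* Fork, then give the child a canary of its own.
 * Returns 1 if the child saw its new canary, 0 if the architecture is
 * not supported by this sketch, -1 on error. */
int fork_with_fresh_canary(void)
{
#if defined(CANARY_WRITE)
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        uintptr_t fresh = 0, check = 0;
        int fd = open("/dev/urandom", O_RDONLY);

        if (fd < 0 || read(fd, &fresh, sizeof(fresh)) != (ssize_t)sizeof(fresh))
            _exit(2);
        close(fd);
        ((unsigned char *)&fresh)[0] = 0;   /* keep the terminator byte */
        CANARY_WRITE(fresh);
        CANARY_READ(check);
        /* Never return from here: frames created before the renewal still
         * hold the old canary and would fail their epilogue check. */
        _exit(check == fresh ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
#else
    return 0;
#endif
}
```

The comment in the child branch is the main practical constraint: after the renewal, the child has to do its work in newly called functions and terminate with `_exit()` or `exit()`, because returning into any protected frame that predates the change would trigger `__stack_chk_fail`.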
Other implementations such as RenewSSP provide a ready-to-use library to force the canary renewal upon forking. Similarly, this library uses this “hack” to update the canary value in forked processes, and works only on x86. The very nature of this hack will never allow it to be merged upstream.
Now that canaries on Intel architectures hold no more secrets for us, let’s move on to other architectures and implementations.
The location of the SSP can be found under symbol __stack_chk_guard, and the failure procedure (__stack_chk_fail) by its PLT location.
Let’s compile greetz.c on an ARMv6l (RaspberryPi-like) with -fstack-protector, and disassemble the (vulnerable) greetz() function. It will look something like this:
At 0x000085c0, the binary loads the canary, and stores it on the stack at 0x000085c8. A careful reader will have noticed those weird andeq instructions after the return (pop pc). The first address (at 0x00008604 greetz+80) corresponds to the address where the canary location is hardcoded by the compiler. But because it is within the .text segment, GDB assumes it is code and disassembles it as code, whereas it is really an address.
The canary is written in BSS, so its location will always be predictable unless the binary is compiled as PIE. But wait, if the compiler defines a hardcoded value to indicate where to find the canary, how can this work if the memory is totally randomized?
To do that, the compiler will cheat: it will hardcode an offset at the end of the function (at 0x7f558808) and, since $pc on ARM is a register like any other, it will simply add $pc to this offset to find the canary (at 0x7f55880c)!
This means that the compiler requires that the .got page be located immediately after the .text page(s). Such predictability allows attacks such as Offset2lib.
Fun fact: if only one function is to be SSP-protected, the compiler can optimize the code to strip the reference to __stack_chk_guard. The location of the canary will stay the same, but no symbol will exist.
MIPS compiled binaries can also be protected by SSP, and the canary check implementation on MIPS is very similar to the ARM approach.
But unlike ARM, the stub inserted by the compiler will point to an address in the GOT. This location holds another address pointing into a read-only location mapped by ld.so, where the __stack_chk_guard is stored.
This double dereference does not really let us hack our way into simply updating the canary when the process is forked, like we did on Intel.
Just as in ARM, the few ways to recover the canary would be by either bruteforcing the 2^24 possible values, or through an information leak. Many home routers are MIPS-based Linux boxes, and still have many format string vulnerabilities which can be precious for this kind of attack.
Last but not least, let’s see SSP on PowerPC. As it just so happens, there is not much more to say for this architecture and it is very similar to ARM and MIPS.
A page is allocated in memory as read/write, which will contain the canary.
As expected, the canary is populated the same way that we described before, and the PoC read_canary_from_pid can still be used to know the canary of a running process:
And now that we’ve covered all the major architectures, you might also be curious to know about the kernel-land canary.
Well, Linux also protects itself against overflows thanks to a per-process field called stack_canary. This field is populated very early during the kernel initialization by calling the architecture-specific function boot_init_stack_canary().
On x86, Linux will use the same function as in user-land (i.e. get_random_bytes()), and will shuffle it using the timestamp like this:
For MIPS and ARM (including AARCH64), the kernel canary uses get_random_bytes() as well, but the result is XOR-ed with the LINUX_VERSION_CODE variable:
And every fork() will generate a new kernel canary for the current process:
Very similarly to user-land, the procedure __stack_chk_fail() will be invoked to panic() the kernel when a corruption is detected.
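The two mixing steps described above can be mimicked in user space to see what they actually do to the random value. This is a sketch for illustration, not kernel code: `mix_canary_x86` and `mix_canary_generic` are hypothetical names, the random value and the timestamp are plain parameters, and `KERNEL_VERSION` is redefined locally with the usual packing of major, minor and patch numbers.

```c
#include <stdint.h>

/* Same packing as the kernel macro of the same name. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* x86 flavour: the random value is shuffled with the timestamp counter. */
uint64_t mix_canary_x86(uint64_t random, uint64_t tsc)
{
    return random + tsc + (tsc << 32);
}

/* ARM/MIPS flavour: the random value is XOR-ed with the version code. */
unsigned long mix_canary_generic(unsigned long random, unsigned long version_code)
{
    return random ^ version_code;
}
```

In both cases the mixing is a reversible transformation of the value returned by get_random_bytes(), so the strength of the kernel canary still rests on that function. The version code in particular is a public constant for a given kernel build.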
In this article, we’ve tried to cover a big part of the SSP protection, which is the canary generation and use. We’ve tested it across several architectures, which had us peeking down into kernel-land. Although the focus was given to understanding the canary mechanism of it, it is important to note that SSP encompasses more mechanisms, such as local variable re-ordering, and can also be finely tuned according to specific needs (using --param=ssp-buffer-size=N with N=8 as a default).
To conclude, SSP provides fairly good protection against stack buffer overflows on all the architectures tested. Developers should be encouraged to systematically provide binaries compiled with this flag. In case of doubt as to which SSP option offers the best security/performance trade-off, we would recommend -fstack-protector-strong, as it provides more protection against buffer overruns by improving the traditional SSP argument re-ordering (to detect function pointers and such).
As you may have noticed reading the implementation details across all the different architectures, the SSP implementation within the C compiler is pretty much the same; the most notable exception being Intel, which uses an architecture-specific property to provide a better way to reach the canary.
So if we were to summarize the pros & cons of the use of a stack canary, we could say that:
Pros:
Cons:
- execve() generates a new canary, forking a process does not, meaning that the canaries of forked processes may be brute-forced;
- on some architectures, the predictable location of the canary relative to the code exposes it to offset2lib attacks.

Newer protections, such as SafeStack, may offer a newer/better alternative, which may just be the subject of a follow-up blog post.
Well, that’s it. I hope you’ve enjoyed reading those notes, and feel free to poke me for comments or questions.