gruba: The rise of the undead (WebKit Heap Exploitation)

I remember one day Pablo told me, "forget about attacking metadata, that just wont help you" (or something along those lines, sorry about the inaccuracy).
He was right, as he almost always is, in that case.

Long gone are the days where heap exploitation was all about attacking (overwriting) the internal structures of the heap allocator.

But are they really gone?

It is well known that nowadays, attacking metadata on Windows Heap it is a real pain in the rear. That's one of the reason most of the current heap research papers are focused on crafting the heap layout in order to exploit use-after-free conditions (there are some great exceptions, like Chris Valasek's research on the Low Fragmentation Heap).

But heap overflows are not restricted just to the operating system heap implementation. There are plenty of custom heap allocators in the world of Open Source Software and some of them are widely used (without most of the people noticing).

Our research case was TCMalloc. The reason? Mainly because it is used by WebKit, one of the most popular Web Browser engines out there. WebKit is responsible of rendering web pages in Chrome, Safari a huge amount of Smart Phones (yes Android uses it).
I don't like to lie to myself, owning Bas's phone was enough reason to do the research, the rest was just a bonus.

Understanding the heap allocator (as usual) was the first step in the right direction that drove the whole research to a good end.

TCMalloc is a heap allocator initially designed by Sanjay Ghemawat and Paul Menage from Google. It was designed with two main premises, speed and multi-threaded environments. This two premises drove the whole design of all the
internal structures to be almost lock free (in order to reduce lock contention) and thread local.
From an attacker perspective, this means that we are going to need to control a lot of factors if we need to succeed at exploiting vulnerabilities that involve corruption of information between threads.

The allocator itself it is not complex. It is divided in three main subsystems

Page Allocator
Central Cache Allocator
Thread Allocator

The main place where memory is obtained is at the Page Allocator. This layer of abstraction deals with the system level memory allocators. It can allocate memory from several places depending on which operating system you are on. The options are VirtualAlloc, sbrk, mmap or even /dev/mem.
At this level, memory is handled in integral multiples of a page. This groups of pages are called Spans in TCMalloc terminology.
A global array of list keeps track of each of the span the system has allocated. This array is indexed by the number of pages the span contains. That is, page_heap_freelist[N] will contain a list of free spans of N pages each.
There are 256 entries in the page_heap_freelist (this depends if you are looking at Chrome or Safari, etc.). Those spans with more than 256 pages are handled by another free list of what is called a "really large span".

The Spans serves for two main purposes:

to hold a large object
to be used by the central cache

Memory in multiples of the page size is a little bit inconvenient to use directly from the application. The Central Cache allocator is in charge of making those Spans usable by the application.
The central cache keeps track of the status of spans and also to split the spans into units that are more manageable by the application.
To do so, the central heap has an array of free lists indexed by size class. Each free list contains a set of spans which are spitted into objects of the corresponding size class.
Those objects could have been handed directly to the application, but remember that this allocator is thread oriented, so in order to serve multiple threads, many of the structures of the Central Heap should be guarded by a lock in order to prevent races and stuff that we *do not like*.

That's the reason why there is another level of abstraction. The Thread Heap is a per thread structure where the central allocator stores memory chunks. Because of the nature of per thread structures, there is no need to hold locks to access them.
The Thread Cache consists mainly of an array of free lists indexed by size class. These free lists are populated with objects by the Central Cache allocator and consumed by each thread that needs some objects.

That is basically how the algorithm and structures are tied together. With an in-depth understanding of how these subsystems interact we as attackers can start to make some assumptions about how we can force TCMalloc to be our bitch^H^H^H^H^H friend.

The next step is too take a look into where all this structures are placed and how are they connected to each other, but that my friend is something that we are going to go delve into at Infiltrate (and Immunity's Masters Class).

To give the readers a quick glance over what we have been doing lately, here is a screen-shot of a specially crafted heap layout used by a Canvas exploit to pwn your browser (yes, it will *penetrate* your phone, your windows and your fancy Mac):

To say the least, the outcome of this research was more than satisfying. From an attacker perspective, everything started to look exploitable.

Thanks for reading.

EOF

gruba

Friday, March 04, 2011

The rise of the undead (WebKit Heap Exploitation)

2 comments:

About Me

Links