
Getting linked library symbol table

Posted: Wed Nov 02, 2016 7:26 pm
by spartan
Hello all,

I'm currently trying to understand some OS X internals by playing with Mach-O. For that, I have a simple exercise that consists of calling 'printf' from a C program without actually including <stdio.h>. The idea, for the ELF format for example, is to retrieve the link_map, find the libc, and recover the printf symbol by finding which symtab entry points to the string 'printf' in the strtab.

With my Mach-O, I've managed to locate libsystem_c.dylib in memory, but the strtab/symtab/dysymtab offsets are quite fucked up.

Here is a sample of the code I am using right now: https://gist.github.com/P1kachu/c114bcf ... 6075522c93

I then understood that the symtab is not loaded in memory, but the dysymtab is. From my understanding, the dysymtab entries I get are indexes into the symtab pointed to by the LC_SYMTAB command. Am I right? If not, how can I recover the dysymtab of a loaded library from memory?

I'm on OS X 10.10.5.

Thank you very much in advance!

Spartan

Re: Getting linked library symbol table

Posted: Thu Nov 03, 2016 1:05 am
by morpheus
To get the symbol table, code such as the one you referred to would work. There are no actual APIs for this (libmacho.dylib can enumerate the binaries, but not their symbols). My own jtool also takes the "hard" approach of traversing the load commands, then parsing the symbol table.
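
For readers following along, the "hard" approach of traversing the load commands can be sketched roughly as below. This is only a minimal illustration, not jtool's code: the structs `mh64`, `lc` and `symtab_lc` and the function `find_symtab` are hypothetical local stand-ins that mirror the field layout of `mach_header_64`, `load_command` and `symtab_command` from <mach-o/loader.h>, so that the sketch builds without the macOS headers. It assumes a 64-bit image and does no magic or endianness checking.

```c
#include <stdint.h>
#include <stddef.h>

#define LC_SYMTAB_CMD 0x2u  /* numeric value of LC_SYMTAB */

struct mh64 {               /* same field sizes as struct mach_header_64 */
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

struct lc {                 /* same layout as struct load_command */
    uint32_t cmd, cmdsize;
};

struct symtab_lc {          /* same layout as struct symtab_command */
    uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
};

/* Walks the load commands that follow the header and returns the
 * LC_SYMTAB command, or NULL if there is none or a command is malformed. */
const struct symtab_lc *find_symtab(const void *image)
{
    const struct mh64 *hdr = image;
    const uint8_t *p   = (const uint8_t *)image + sizeof(*hdr);
    const uint8_t *end = p + hdr->sizeofcmds;

    for (uint32_t i = 0; i < hdr->ncmds && p + sizeof(struct lc) <= end; ++i) {
        const struct lc *cmd = (const struct lc *)p;
        if (cmd->cmdsize < sizeof(struct lc))
            return NULL;                    /* would loop forever otherwise */
        if (cmd->cmd == LC_SYMTAB_CMD)
            return (const struct symtab_lc *)p;
        p += cmd->cmdsize;                  /* next command starts right after */
    }
    return NULL;
}
```

The returned command only gives you `symoff` and `stroff`; what those offsets are relative to depends on where the image came from, which is what the rest of this thread is about.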

Remember, that offsets are PC relative - and that your binary is slid. That might account for the "fucked up" offsets you mention.

Btw the 1st Edition actually discusses some of this.

Re: Getting linked library symbol table

Posted: Thu Nov 03, 2016 5:23 am
by spartan
Administrator wrote:Remember, that offsets are PC relative - and that your binary is slid. That might account for the "fucked up" offsets you mention.


What do you mean by PC relative here? I get offsets that are way out of the range of the libc in memory, and I don't understand why they changed (relocations?). Also, what do you mean by "slid"? I used jtool to check against the on-disk (not loaded) version of the libc, and I still can't get a clear understanding of how to access the symbols from the offsets, apart from adding the offset to the libc's base address.

Thank you very much

Re: Getting linked library symbol table

Posted: Thu Nov 03, 2016 4:29 pm
by Siguza
For "slid", see ASLR.
For "PC-relative", see addressing modes.

TL;DR: Everything is relative. ;)

Re: Getting linked library symbol table

Posted: Thu Nov 03, 2016 7:33 pm
by spartan
I don't understand how offsets in the loaded libc's structures can be PC relative :/
I can understand them being relative to the libc's base address in memory (ASLR shouldn't be a problem, or ROP wouldn't even be possible, and I resolve the libc's base address at runtime), but not PC relative here :/
If they are, should I then subtract my instruction pointer from the offset before adding it to the libc's base address...? This seems crazy :o

Re: Getting linked library symbol table

Posted: Thu Nov 03, 2016 8:41 pm
by Siguza
I tried running your code, and I didn't do much testing, but... could it be that you're looking at a dyld_shared_cache? :P

Re: Getting linked library symbol table

Posted: Sat Nov 05, 2016 1:51 am
by spartan
Siguza wrote:I tried running your code, and I didn't do much testing, but... could it be that you're looking at a dyld_shared_cache? :P

Hmm, I don't think so; I got the lib address by iterating over dyld_info.all_image_info_addr. I must be missing something obvious, but I am completely stuck... :oops:

Re: Getting linked library symbol table

Posted: Sat Nov 05, 2016 8:47 pm
by spartan
Siguza wrote:I tried running your code, and I didn't do much testing, but... could it be that you're looking at a dyld_shared_cache? :P


Oh yes, actually I think I took libsystem_c from the dyld_shared_cache. Can you tell me more about it?

Re: Getting linked library symbol table

Posted: Sun Nov 06, 2016 3:04 pm
by Siguza
Okay, the first problem is that the symtab is designed to be available to dyld when a file is loaded, and not to any running program as a means of reflection.
This means that a) it deals in file offsets rather than virtual memory addresses and b) its context can entirely change its meaning. E.g. for libraries that are part of a dyld_shared_cache, the symtab and strtab offsets are relative to the start of the entire cache, and not to the start of the library.

Now let's look at such a cache. On iOS you can find them in /System/Library/Caches/com.apple.dyld/, on macOS they're in /var/db/dyld.
We can examine the header with jtool -h:
Code:
$ jtool -h /var/db/dyld/dyld_shared_cache_x86_64h
File is a shared cache containing 570 images (use -l to list)
Header size: 0x70 bytes
3 mappings starting from 0x70. 570 Images starting from 0xd0
mapping r-x/r-x  359MB     7fff80000000 -> 7fff967d2000      (0-167d2000)
mapping rw-/rw-   69MB     7fff70000000 -> 7fff745a8000      (167d2000-1ad7a000)
mapping r--/r--   96MB     7fff967d2000 -> 7fff9c806000      (1ad7a000-20dae000)
DYLD base address: 7fff5fc00000
Local Symbols:  0x0-0x0 (0 bytes)
Code Signature: 0x20dae000-0x2145a9a6 (6998438 bytes)
Slide info:     0x20c60000-0x20dae000 (1368064 bytes)
   Slide Info version 1, TOC offset: 24, count 17832, entries: 10394 of size 128
All caches I've seen define three huge memory regions as above (r-x, rw-, r--), across which the segments of all libraries are split up. So all executable segments (and therefore the library headers) are in the r-x mapping, all data segments in the rw- mapping, and stuff like __LINKEDIT segments, symbol table, string tables and whatnot are in the r-- mapping.
As you can see, however, the virtual memory layout does not correspond to the file layout — the rw- mapping is before the r-x mapping in virtual memory, but after it in the file — and therefore the r-- mapping's vm address is not equal to the r-x's base address plus the r--'s file offset.

So in order to read and resolve a symtab entry correctly, you have to
1) be aware of the file layout (e.g. independent dylib or shared cache)
2) translate the virtual memory layout back to file offsets

Now I haven't looked into how to determine whether a dylib was loaded from cache or not at runtime, or how to locate the cache.
However, looking at dyld_images.h, there's a thing called dyld_shared_cache_ranges, and there's a sharedCacheSlide in dyld_all_image_infos (which you already have). If you can get a hold of dyld_shared_cache_ranges, it should be straightforward to check whether the libc falls into any of the shared cache ranges, and to locate the cache header.
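
The range check itself could look roughly like the sketch below. I have not verified how to obtain the real ranges at runtime, so `addr_range` and `find_range` are hypothetical names, and the caller is assumed to have filled the array with start/length pairs from wherever the ranges were obtained.

```c
#include <stdint.h>
#include <stddef.h>

/* A start/length pair describing one region of the shared cache (hypothetical type). */
struct addr_range {
    uint64_t start;
    uint64_t length;
};

/* Returns the index of the range containing addr, or -1 if none does. */
int find_range(const struct addr_range *ranges, size_t count, uint64_t addr)
{
    for (size_t i = 0; i < count; ++i) {
        if (addr >= ranges[i].start &&
            addr - ranges[i].start < ranges[i].length) {
            return (int)i;
        }
    }
    return -1;
}
```

Fed with the three unslid mappings from the jtool -h output, the unslid libsystem_c address 0x7fff89c86000 falls into the first (r-x) range, as expected for a library header.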

For now, I simply assumed that libsystem_c will always be loaded as part of the cache, and I've worked out the offsets by static analysis.
We can get the (unslid) virtual address of a library within the cache with jtool -l:
Code:
$ jtool -l /var/db/dyld/dyld_shared_cache_x86_64h | fgrep libsystem_c.dylib
 236:     7fff89c86000 /usr/lib/system/libsystem_c.dylib
Given this information, and the knowledge that symtab and strtab are gonna be in the r-- mapping, we can now correctly resolve them:

cache slide = libc - 0x7fff89c86000
symtab = 0x7fff967d2000 - 0x1ad7a000 + cache slide + symoff

I've accordingly rewritten the last part of your find_printf function with these hardcoded offsets:
Code:
// Hardcoded values for this particular cache, taken from the jtool output above.
char * const libc_unslid = (char *)0x7fff89c86000;
char * const dsc_rx_base = (char *)0x7fff80000000;
char * const dsc_ro_addr = (char *)0x7fff967d2000;
const off_t  dsc_ro_off  =             0x1ad7a000;
char * const dsc_ro_base = dsc_ro_addr - dsc_ro_off;

off_t dsc_slide = (char *)libc - libc_unslid;
char *dsc_ro = dsc_ro_base + dsc_slide;

char            *strtab =                     dsc_ro + symcmd->stroff;
struct nlist_64 *symtab = (struct nlist_64 *)(dsc_ro + symcmd->symoff);

uint64_t printf_off = 0;

for(uint32_t i = 0; i < symcmd->nsyms; ++i)
{
    uint32_t strtab_off = symtab[i].n_un.n_strx;
    uint64_t func       = symtab[i].n_value;
    printf("%016llx %s\n", func, &strtab[strtab_off]);

    if(strcmp(&strtab[strtab_off], "_printf") == 0)
    {
        printf_off = func;
    }
}

if(printf_off != 0)
{
    return dsc_rx_base + dsc_slide + printf_off;
}

return NULL;

Re: Getting linked library symbol table

Posted: Sun Nov 06, 2016 7:35 pm
by spartan
Wow, thank you very much! I had no idea the offsets would be from the beginning of the cache!