Getting linked library symbol table

Questions and Answers about all things *OS (macOS, iOS, tvOS, watchOS)

Postby spartan » Wed Nov 02, 2016 7:26 pm

Hello all,

I'm currently trying to understand some OS X internals by playing with Mach-O. For that, I have a simple exercise that consists of calling 'printf' from a C program without actually including <stdio.h>. The idea, for the ELF format for example, is to retrieve the link_map, find the libc, and recover the printf symbol by searching the strtab for the string 'printf' and finding the symbol linked to it.

With my Mach-O, I've managed to locate libsystem_c.dylib in memory, but the strtab/symtab/dysymtab offsets are quite fucked up.

Here is a sample of the code I am using right now: https://gist.github.com/P1kachu/c114bcf ... 6075522c93

I then understood that the symtab is not loaded in memory, but the dysymtab is. From my understanding, the dysymtab entries I get are indexes into the symtab pointed to by the LC_SYMTAB command. Am I right? If not, how can I recover the dysymtab of a loaded library from memory?

I'm on OSX 10.10.5

Thank you very much in advance !

Spartan
spartan
 
Posts: 6
Joined: Wed Nov 02, 2016 4:55 pm

Re: Getting linked library symbol table

Postby morpheus » Thu Nov 03, 2016 1:05 am

Code such as the one you referred to would work for getting the symbol table; there are no actual APIs for it (libmacho.dylib can enumerate the binaries, but not their symbols). My own jtool also takes the "hard" approach of traversing the load commands, then parsing the symbol table.

Remember, that offsets are PC relative - and that your binary is slid. That might account for the "fucked up" offsets you mention.

Btw the 1st Edition actually discusses some of this.
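
For reference, the "hard" approach of traversing the load commands can be sketched roughly as follows. This is a minimal sketch, not the code from the gist or from jtool: the sk_-prefixed structs are local stand-ins that mirror the layouts in <mach-o/loader.h> so the snippet compiles anywhere, and sk_find_symtab is a hypothetical helper name.

```c
#include <stdint.h>
#include <stddef.h>

/* Local mirrors of the <mach-o/loader.h> definitions (sk_ names are made up). */
#define SK_MH_MAGIC_64 0xfeedfacfu  /* MH_MAGIC_64 */
#define SK_LC_SYMTAB   0x2u         /* LC_SYMTAB */

struct sk_mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;       /* number of load commands */
    uint32_t sizeofcmds;  /* total size of all load commands */
    uint32_t flags;
    uint32_t reserved;
};

struct sk_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

struct sk_symtab_command {
    uint32_t cmd;
    uint32_t cmdsize;
    uint32_t symoff;   /* file offset of the nlist array */
    uint32_t nsyms;
    uint32_t stroff;   /* file offset of the string table */
    uint32_t strsize;
};

/* Walk the load commands that follow the header and return the
 * LC_SYMTAB command, or NULL if the image has none or looks malformed. */
const struct sk_symtab_command *sk_find_symtab(const void *image)
{
    const struct sk_mach_header_64 *hdr = image;
    if (hdr->magic != SK_MH_MAGIC_64)
        return NULL;

    const uint8_t *p   = (const uint8_t *)(hdr + 1);
    const uint8_t *end = p + hdr->sizeofcmds;

    for (uint32_t i = 0; i < hdr->ncmds && p + sizeof(struct sk_load_command) <= end; ++i) {
        const struct sk_load_command *lc = (const void *)p;
        if (lc->cmdsize < sizeof(*lc))
            return NULL;  /* a zero-sized command would loop forever */
        if (lc->cmd == SK_LC_SYMTAB)
            return (const void *)p;
        p += lc->cmdsize;
    }
    return NULL;
}
```

On macOS one would include <mach-o/loader.h> and use struct mach_header_64, struct load_command and struct symtab_command directly. The symoff and stroff fields of the result are file offsets, so they still need translating before they can be dereferenced in a loaded image.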
morpheus
Site Admin
 
Posts: 532
Joined: Thu Apr 11, 2013 6:24 pm

Re: Getting linked library symbol table

Postby spartan » Thu Nov 03, 2016 5:23 am

Administrator wrote: Remember, that offsets are PC relative - and that your binary is slid. That might account for the "fucked up" offsets you mention.


What do you mean by PC relative here? I get offsets that are way out of range of the libc in memory, and I don't understand why they changed (relocations?). Also, what do you mean by "slid", please? I used jtool to check against the on-disk (not loaded) version of the libc, and I still can't get a clear understanding of how to access the symbols from the offsets, apart from adding the offset to the libc's base address.

Thank you very much
spartan
 
Posts: 6
Joined: Wed Nov 02, 2016 4:55 pm

Re: Getting linked library symbol table

Postby Siguza » Thu Nov 03, 2016 4:29 pm

For "slid", see ASLR.
For "PC-relative", see addressing modes.

TL;DR: Everything is relative. ;)
Siguza
Unicorn
 
Posts: 159
Joined: Thu Jan 28, 2016 10:38 am

Re: Getting linked library symbol table

Postby spartan » Thu Nov 03, 2016 7:33 pm

I don't understand how offsets in the loaded libc's structures can be PC relative :/
I can understand them being relative to the libc's base address in memory (ASLR shouldn't be a problem, or ROP wouldn't even be possible, and I resolve the libc's base address at runtime), but not PC relative here :/
If they are, should I then subtract my instruction pointer from the offsets before adding them to the libc's base address...? This seems crazy :o
spartan
 
Posts: 6
Joined: Wed Nov 02, 2016 4:55 pm

Re: Getting linked library symbol table

Postby Siguza » Thu Nov 03, 2016 8:41 pm

I tried running your code, and I didn't do much testing, but... could it be that you're looking at a dyld_shared_cache? :P
Siguza
Unicorn
 
Posts: 159
Joined: Thu Jan 28, 2016 10:38 am

Re: Getting linked library symbol table

Postby spartan » Sat Nov 05, 2016 1:51 am

Siguza wrote: I tried running your code, and I didn't do much testing, but... could it be that you're looking at a dyld_shared_cache? :P

Hmm, I don't think so; I got the lib address by iterating over dyld_info.all_image_info_addr. I must be missing something obvious, but I am completely stuck... :oops:
spartan
 
Posts: 6
Joined: Wed Nov 02, 2016 4:55 pm

Re: Getting linked library symbol table

Postby spartan » Sat Nov 05, 2016 8:47 pm

Siguza wrote: I tried running your code, and I didn't do much testing, but... could it be that you're looking at a dyld_shared_cache? :P


Oh yes, actually I think I took libsystem_c from the dyld_shared_cache. Can you tell me more about it?
spartan
 
Posts: 6
Joined: Wed Nov 02, 2016 4:55 pm

Re: Getting linked library symbol table

Postby Siguza » Sun Nov 06, 2016 3:04 pm

Okay, the first problem is that the symtab is designed to be available to dyld when a file is loaded, and not to any running program as a means of reflection.
This means that a) it deals in file offsets rather than virtual memory addresses and b) its context can entirely change its meaning. E.g. for libraries that are part of a dyld_shared_cache, their symtab and strtab offsets are relative to the start of the entire cache, and not to the start of the library.

Now let's look at such a cache. On iOS you can find them in /System/Library/Caches/com.apple.dyld/, on macOS they're in /var/db/dyld.
We can examine the header with jtool -h:
Code:
$ jtool -h /var/db/dyld/dyld_shared_cache_x86_64h
File is a shared cache containing 570 images (use -l to list)
Header size: 0x70 bytes
3 mappings starting from 0x70. 570 Images starting from 0xd0
mapping r-x/r-x  359MB     7fff80000000 -> 7fff967d2000      (0-167d2000)
mapping rw-/rw-   69MB     7fff70000000 -> 7fff745a8000      (167d2000-1ad7a000)
mapping r--/r--   96MB     7fff967d2000 -> 7fff9c806000      (1ad7a000-20dae000)
DYLD base address: 7fff5fc00000
Local Symbols:  0x0-0x0 (0 bytes)
Code Signature: 0x20dae000-0x2145a9a6 (6998438 bytes)
Slide info:     0x20c60000-0x20dae000 (1368064 bytes)
   Slide Info version 1, TOC offset: 24, count 17832, entries: 10394 of size 128
All caches I've seen define three huge memory regions as above (r-x, rw-, r--), across which the segments of all libraries are split up. So all executable segments (and therefore the library headers) are in the r-x mapping, all data segments in the rw- mapping, and stuff like __LINKEDIT segments, symbol table, string tables and whatnot are in the r-- mapping.
As you can see, however, the virtual memory layout does not correspond to the file layout — the rw- mapping is before the r-x mapping in virtual memory, but after it in the file — and therefore the r-- mapping's vm address is not equal to the r-x's base address plus the r--'s file offset.

So in order to read and resolve a symtab entry correctly, you have to
1) be aware of the file layout (e.g. independent dylib or shared cache)
2) translate the virtual memory layout back to file offsets
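
The translation between file offsets and virtual addresses in step 2 can be sketched as a lookup in the cache's mapping table. This is a minimal sketch, assuming each mapping entry boils down to an unslid address, a size and a file offset (the three values jtool -h prints per mapping). The names sk_mapping, sk_example_mappings and sk_fileoff_to_vmaddr are hypothetical, and the table values are copied from the output above, so they only apply to that exact cache.

```c
#include <stdint.h>
#include <stddef.h>

/* One cache mapping: where a chunk of the file ends up in memory (unslid). */
struct sk_mapping {
    uint64_t address;      /* unslid virtual address */
    uint64_t size;         /* length in bytes */
    uint64_t file_offset;  /* offset of the chunk within the cache file */
};

/* Values taken from the jtool -h output above. */
const struct sk_mapping sk_example_mappings[3] = {
    { 0x7fff80000000ULL, 0x167d2000ULL, 0x00000000ULL }, /* r-x */
    { 0x7fff70000000ULL, 0x045a8000ULL, 0x167d2000ULL }, /* rw- */
    { 0x7fff967d2000ULL, 0x06034000ULL, 0x1ad7a000ULL }, /* r-- */
};

/* Translate a cache file offset into a (slid) virtual address.
 * Returns 0 if the offset is not covered by any mapping. */
uint64_t sk_fileoff_to_vmaddr(const struct sk_mapping *maps, size_t count,
                              uint64_t file_offset, uint64_t slide)
{
    for (size_t i = 0; i < count; ++i) {
        uint64_t start = maps[i].file_offset;
        if (file_offset >= start && file_offset - start < maps[i].size)
            return maps[i].address + (file_offset - start) + slide;
    }
    return 0;
}
```

For example, sk_fileoff_to_vmaddr(sk_example_mappings, 3, symoff, slide) would give the in-memory address of a symtab whose symoff lies in the r-- part of the file.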

Now I haven't looked into how to determine whether a dylib was loaded from cache or not at runtime, or how to locate the cache.
However, looking at dyld_images.h, there's a thing called dyld_shared_cache_ranges, and there's a sharedCacheSlide in dyld_all_image_infos (which you already have). If you can get a hold of dyld_shared_cache_ranges, it should be straightforward to check whether the libc falls into any of the shared cache ranges, and to locate the cache header.
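
The range check itself could look something like the sketch below. It is untested against a real process and assumes the ranges boil down to a count followed by start/length pairs; sk_cache_ranges is a local stand-in rather than the real dyld_shared_cache_ranges declaration, and sk_find_cache_range is a hypothetical helper name.

```c
#include <stdint.h>

/* Stand-in for the shared cache range list (assumed layout, made-up name). */
struct sk_cache_ranges {
    uint64_t count;  /* how many entries of ranges[] are valid */
    struct {
        uint64_t start;
        uint64_t length;
    } ranges[4];
};

/* Return the index of the range containing addr, or -1 if addr lies
 * outside every range (i.e. the image was not loaded from the cache). */
int sk_find_cache_range(const struct sk_cache_ranges *r, uint64_t addr)
{
    for (uint64_t i = 0; i < r->count && i < 4; ++i) {
        uint64_t start = r->ranges[i].start;
        if (addr >= start && addr - start < r->ranges[i].length)
            return (int)i;
    }
    return -1;
}
```

Passing the libc's mach_header address as addr would then tell whether it came from the cache and, if so, which region holds its header.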

For now, I simply assumed that libsystem_c will always be loaded as part of the cache, and I've determined the offsets by static analysis.
We can get the (unslid) virtual address of a library within the cache with jtool -l:
Code:
$ jtool -l /var/db/dyld/dyld_shared_cache_x86_64h | fgrep libsystem_c.dylib
 236:     7fff89c86000 /usr/lib/system/libsystem_c.dylib
Given this information, and the knowledge that symtab and strtab are gonna be in the r-- mapping, we can now correctly resolve them:

cache slide = libc - 0x7fff89c86000
symtab = 0x7fff967d2000 - 0x1ad7a000 + cache slide + symoff

I've accordingly rewritten the last part of your find_printf function with these hardcoded offsets:
Code:
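// SOMETHING'S WRONG HERE FFS
// Hardcoded values from the jtool output above; valid only for this exact cache build.
char * const libc_unslid = (char *)0x7fff89c86000; // unslid address of libsystem_c.dylib
char * const dsc_rx_base = (char *)0x7fff80000000; // unslid address of the r-x mapping
char * const dsc_ro_addr = (char *)0x7fff967d2000; // unslid address of the r-- mapping
const off_t  dsc_ro_off  =             0x1ad7a000; // file offset of the r-- mapping
char * const dsc_ro_base = dsc_ro_addr - dsc_ro_off;

// The whole cache is slid by the same amount, so libc's slide is the cache slide.
off_t dsc_slide = (char *)libc - libc_unslid;
char *dsc_ro = dsc_ro_base + dsc_slide;

// stroff and symoff are file offsets from the start of the cache.
char            *strtab =                     dsc_ro + symcmd->stroff;
struct nlist_64 *symtab = (struct nlist_64 *)(dsc_ro + symcmd->symoff);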

uint64_t printf_off = 0;

for(uint32_t i = 0; i < symcmd->nsyms; ++i)
{
    uint32_t strtab_off = symtab[i].n_un.n_strx;
    uint64_t func       = symtab[i].n_value;
    printf("%016llx %s\n", func, &strtab[strtab_off]);

    if(strcmp(&strtab[strtab_off], "_printf") == 0)
    {
        printf_off = func;
    }
}

if(printf_off != 0)
{
    return dsc_rx_base + dsc_slide + printf_off;
}

return NULL;
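
The lookup loop above could also be factored into a small helper that checks string offsets against the string table size before comparing. This is a minimal sketch: sk_nlist_64 is a flattened local stand-in for struct nlist_64 from <mach-o/nlist.h> (where n_strx sits inside the n_un union), and sk_find_symbol is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Same field order and sizes as struct nlist_64, with n_un.n_strx flattened. */
struct sk_nlist_64 {
    uint32_t n_strx;   /* offset of the name within the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;
};

/* Return the first symbol whose name equals `name`, or NULL.
 * Entries whose name would run past the string table are skipped. */
const struct sk_nlist_64 *sk_find_symbol(const struct sk_nlist_64 *symtab,
                                         uint32_t nsyms,
                                         const char *strtab, uint32_t strsize,
                                         const char *name)
{
    for (uint32_t i = 0; i < nsyms; ++i) {
        uint32_t strx = symtab[i].n_strx;
        if (strx >= strsize)
            continue;  /* offset outside the string table */
        if (memchr(&strtab[strx], '\0', strsize - strx) == NULL)
            continue;  /* name is not NUL-terminated within the table */
        if (strcmp(&strtab[strx], name) == 0)
            return &symtab[i];
    }
    return NULL;
}
```

With the symtab and strtab pointers computed above, the call would be sk_find_symbol(symtab, symcmd->nsyms, strtab, symcmd->strsize, "_printf"), after casting symtab to the stand-in type or switching the helper to struct nlist_64.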
Siguza
Unicorn
 
Posts: 159
Joined: Thu Jan 28, 2016 10:38 am

Re: Getting linked library symbol table

Postby spartan » Sun Nov 06, 2016 7:35 pm

Wow, thank you very much! I had no idea the offsets would be from the beginning of the cache!
spartan
 
Posts: 6
Joined: Wed Nov 02, 2016 4:55 pm

