4. 4
WHAT THIS PRESENTATION IS ABOUT
It’s not about: studying memory access patterns, latency, the data locality
problem (memory hierarchy), cache optimization or reducing memory stall cycles.
It’s about: tracking userspace memory accesses (data) of an Oracle database:
Memory address
Function name
Instruction address
Memory content
Memory access mode (read/write)
5. 5
FOR WHAT PURPOSE ?
Useful for:
Reverse engineering:
Building performance monitoring/auditing tools (interfacing
with the Oracle database),
Security analysis (e.g. assessing the security of the db link
password stored in « sys.link$ »),
Etc.
Researching Oracle internals (beyond C function call tracing).
6. 6
HIGHLIGHTS
Virtual memory
Memory access tracing/profiling
Use cases/Examples
All the concepts, tools and examples described here
are specific to Linux and the x86_64 architecture.
8. 8
VIRTUAL MEMORY
Virtual memory is an abstraction of
main memory.
Each process runs in its own large, linear
and private address space.
Virtual memory is made possible by
support in both the processor and
operating system.
[Figure: the private virtual memory of Process 1 and of Process 2]
13. 13
HARDWARE BREAKPOINT
Provide an elegant mechanism to monitor memory access.
Make use of dedicated registers and hence are limited in number.
x86 debug registers:
• DR0 to DR3: virtual memory address of the desired watchpoint
• DR4 and DR5: obsolete synonyms for DR6 and DR7
• DR6: status register, information about the last breakpoint hit
• DR7: control register [local and global enables / memory
access type / memory access length (1, 2, 4, 8 bytes)]
https://en.wikipedia.org/wiki/X86_debug_register
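To make the DR7 layout more concrete, here is a minimal Python sketch that computes the control value arming one watchpoint slot. It only does the bit arithmetic (enable bits in the low byte, a 2-bit access type and a 2-bit length per slot starting at bit 16); it does not touch any real register. The function and constant names are illustrative, not part of any tool used in this presentation.

```python
# Sketch: compute a DR7 value that arms one hardware watchpoint slot (DR0..DR3).
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11   # access-type encodings
LEN_ENCODING = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}      # length in bytes -> LEN bits


def dr7_for_watchpoint(slot, rw, length, global_enable=False):
    """Return the DR7 bits enabling `slot` for the given access type and length."""
    if slot not in range(4):
        raise ValueError("x86 has only four address registers, DR0 to DR3")
    # Local enable is bit 2*slot, global enable is bit 2*slot + 1.
    enable_bit = 1 << (2 * slot + (1 if global_enable else 0))
    rw_bits = rw << (16 + 4 * slot)                      # R/W field of this slot
    len_bits = LEN_ENCODING[length] << (18 + 4 * slot)   # LEN field of this slot
    return enable_bit | rw_bits | len_bits


# Local 8-byte write watchpoint in slot 0.
print(hex(dr7_for_watchpoint(0, RW_WRITE, 8)))
```

In practice the kernel programs these registers on behalf of tools such as perf or gdb; the sketch only shows why the number of simultaneous watchpoints and the allowed lengths are so constrained.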
15. 15
INTEL PIN
Pin is a dynamic binary instrumentation (DBI) framework.
PinTools are Programmable instrumentation tools (C, C++, assembly).
Benefits :
Insert arbitrary code into a running user program.
No change or recompilation of source code.
Attach to a running process.
Rich API that abstracts away the underlying instruction-set (instrument a
class of instructions).
16. 16
DYNAMIC TRACING MODE: JIT MODE
Pin injects some dynamic libraries into the address space of the target
application to gain control of the execution (relies on the ptrace system
call).
It instruments binary code right before it runs.
17. 17
PINATRACE.SO : TRACING MEMORY ACCESS
The pin tool « pinatrace.so »
Generates a trace of all memory addresses referenced by a program.
Instruments instructions that read or write memory.
Syntax :
pin -pid 9266 -t pinatrace.so
19. 19
PINATRACE ORACLE ANNOTATE TOOL
Pinatrace oracle annotate, by Frits Hoogland: translates function addresses
into function names, and memory addresses into named memory locations
whenever possible.
https://fritshoogland.wordpress.com/2016/11/18/advanced-oracle-memory-profiling-using-pin-tool-pinatrace/
20. 20
EX1:TRACKING FUNCTIONS WHERE DATA OF INTEREST IS
HANDLED
A trivial C program « ./hello_world » that prints « Hello, World! » when
executed.
How can we check in which function this happens?
21. 21
EX1:TRACKING FUNCTIONS WHERE DATA OF INTEREST IS
HANDLED
Let’s track where this happens using the Intel Pin tool « pinatrace.so »
23. 23
EX2:TRACKING FUNCTIONS WHERE DATA OF INTEREST IS
HANDLED
« Client_pass »: a C program that asks for a password before executing!
How can we hack the password?
HINT: the password is stored in clear text
25. 25
DEBUGTRACE.SO : TRACING MEMORY/CALL/INSTRUCTION
The pin tool « debugtrace.so » is designed to help with debugging.
Pin tool switches:
-call [default 1]: trace calls
-instruction [default 0]: trace instructions
-memory [default 0]: trace memory
-symbols [default 1]: include symbol information
26. 26
EX2:TRACKING FUNCTIONS WHERE DATA OF INTEREST IS
HANDLED
Memory/call tracing: (debugtrace.so -memory)
Memory/call/instruction tracing: (debugtrace.so -instruction -memory)
30. 30
I- LATCH MONITORING TOOLS
LATCH CALL GRAPH
EXTRACTING LATCH HOLDER INFO FROM SGA
Test env : oracle 12.2.0.1/OEL6/UEK4
31. 31
LATCHES
Latches are very low-level locks.
Every latch is just a memory structure in the SGA.
There are dedicated functions related to latches in KSL (e.g. ksl_get_shared_latch).
But latches can also be acquired/released inside
functions like kcbgtcr (consistent get) or kcbgcur (current
get) without dedicated calls to ksl* functions.
32. 32
LATCHES
Watching an ultra-fast latch in action (cache buffer chain latch):
perf record -e mem:0x000000009F668880:w -p 9154
33. 33
LATCHES
Watching an ultra-fast latch in action (cache buffer chain latch)
Shared latch memory layout: Nproc | X flag | gets | latch#
34. 34
LATCHES
Getting latch holder info out of the state objects in SGA memory
Based on the work of Andrey Nikolaev
“V$LATCHHOLDER scans through the process state object array (V$PROCESS/X$KSUPR) and looks into a field there which points to the latch held by a process”
http://tech.e2sn.com/oracle/troubleshooting/latch-contention-troubleshooting
Use Intel Pin tools (pinatrace.so/debugtrace.so) and gdb to analyze
what’s going on under the hood
39. 39
EXTRACTING LATCH HOLDER INFO FROM SGA
A C program that extracts latch holder info from the state objects in
SGA memory.
It can be enhanced to sample latch state objects at high frequency and present a
profile of latches held, with extended info.
40. 40
II- PASSWORD HACKING
EXTRACTING DB_LINK PASSWORD
EXTRACTING ANY USER PASSWORD
REVERSE ENGINEERING DB LINK PASSWORD
DECRYPTION IN PL/SQL
Test env : oracle 12.1.0.2.6|12.2.0.1/OEL6/UEK4
41. 41
EXTRACTING DB_LINK PASSWORD
Starting with version 11.2.0.4, and also in 12c, it is no longer possible to supply
the obfuscated password using a BY VALUES clause when creating a database link;
this is only allowed from a Data Pump import.
ORA-02153 : Invalid VALUES Password String When Creating a Database Link
Using BY VALUES With Obfuscated Password After Upgrade To 11.2.0.4 (Doc ID
1905221.1).
42. 42
Extracting the database link password from function “r0_aes_cbc_loop_enc_x86_intel”
EXTRACTING DB_LINK PASSWORD
44. 44
We will have to attach Intel Pin tools to the process just after its creation!
Suspending a newly forked Oracle process.
EXTRACTING ANY USER PASSWORD
46. 46
REVERSE ENGINEERING DB LINK PASSWORD DECRYPTION
IN PL/SQL
Password decrypted using r0_AES_CBC_loop_dec_x86_intel (Advanced
Encryption Standard in CBC mode)
47. 47
REVERSE ENGINEERING DB LINK PASSWORD DECRYPTION
IN PL/SQL
To decrypt the password we need three parameters
48. 48
REVERSE ENGINEERING DB LINK PASSWORD DECRYPTION
IN PL/SQL
These parameters depend on:
“PASSWORDX” from “sys.link$”
Variable “ztcshpl_v6” (Database independent)
NO_USERID_VERIFIER_SALT from SYS.PROPS$ (Database dependent).
50. 50
REVERSE ENGINEERING DB LINK PASSWORD DECRYPTION
IN PL/SQL
The procedure “db_link_password_decrypt” takes two parameters:
NO_USERID_VERIFIER_SALT of your database
Passwordx from sys.link$
Memory access tracing/profiling is a very large subject.
How to characterize good memory access behavior? (which function)
• Percentage of accesses served by the cache (good ratio: > 97%)
• Symptom of bad memory access: cache misses
Understanding memory access patterns for the purpose of improving data locality (reducing latency and cache misses).
http://calcul.math.cnrs.fr/IMG/pdf/Valgrind_weidendorfer.pdf
If your IPC is < 1.0, you are likely memory stalled, and software tuning strategies include reducing memory I/O, and improving CPU caching and memory locality, especially on NUMA systems. Hardware tuning includes using processors with larger CPU caches, and faster memory, busses, and interconnects.
The common bottleneck is moving from disks to the memory subsystem: CPU caches, the MMU, memory busses, and CPU interconnects. These can only be analyzed with PMCs.
Reasons for reverse engineering:
Tracking functions where data of interest is handled (e.g. passwords, latches, fixed SGA variables)
Recording how the program interacts with virtual memory (different memory locations) in order to do some self-study and experiments
Agenda
One virtual address may be relevant to one task but not to another.
This provides several benefits:
Simplifies software development (no need to worry about contention; physical memory placement is left for the operating system to manage).
Allows the operating system to use the same memory locations for multiple tasks (memory page sharing).
Physical RAM can be mapped into multiple processes at once.
Allows memory overcommitting/oversubscription.
Increased security due to memory isolation (protection against accidentally writing or deliberately reading sensitive information).
The operating system decides where a memory page resides (page out/page in).
Nearly all implementations of virtual memory divide a virtual address space into pages, blocks of contiguous virtual memory addresses.
Pages on contemporary systems are usually at least 4 kilobytes in size.
64-bit Linux allows up to 128 TB of virtual address space for individual processes, and can address approximately 64 TB of physical memory, subject to processor and system limitations.
32-bit x86 processors support 32-bit virtual addresses and 4-GiB virtual address spaces, and current 64-bit processors support 48-bit virtual addressing and 256-TiB virtual address spaces.
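The sizes quoted above follow directly from the address width and the page size. The short Python sketch below redoes the arithmetic and splits an address into page number and page offset, assuming 4 KiB pages and 48-bit virtual addresses. The helper name is illustrative; the sample address is the one used in the perf example of this deck.

```python
# Sketch: address-space arithmetic for 48-bit virtual addresses and 4 KiB pages.
PAGE_SIZE = 4096      # 4 KiB pages
VA_BITS = 48          # virtual address width on current x86_64 processors
TIB = 2 ** 40


def split_address(vaddr, page_size=PAGE_SIZE):
    """Return (virtual page number, offset inside the page)."""
    return vaddr // page_size, vaddr % page_size


print(2 ** VA_BITS // TIB)         # total virtual address space, in TiB
print(2 ** (VA_BITS - 1) // TIB)   # half of it, the part available to a process
page, offset = split_address(0x9F668880)
print(hex(page), hex(offset))
```

The second figure matches the 128 TB per-process limit mentioned above: the other half of the 48-bit space is reserved for the kernel.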
An area is a contiguous chunk of existing (allocated) virtual memory whose pages are related in some way. For example, the code segment, data segment, heap, shared library segment, and user stack are all distinct areas.
The addresses are split into areas called segments for storing the thread stacks, process executable, libraries, and heap.
Segment: an area of memory flagged for a particular purpose, such as for storing executable or writeable pages.
.text – executable code/.data – initialized variables/.bss – uninitialized variables
Kernel memory (page table/call stack/data)
Each process has its own memory map
● struct mm
At context switch time, the memory map of the new process is used
● This is part of the context switch overhead
As I mentioned above, shared libraries have no pre-defined load address - it will be decided at runtime.
The base of the executable is fixed; the positions of the stack, heap and libraries are random.
Sections contain important data for linking and relocation (.text – executable code/.data – initialized variables/.bss – uninitialized variables/etc.)
Segments contain information that is necessary for runtime execution,
Address space layout randomization (ASLR) is a computer security technique involved in protection from buffer overflow attacks.
Position-independent executable (PIE) implements a random base address for the main executable binary and has been in place since 2003. It provides the same address randomness to the main executable as being used for the shared libraries. The PIE feature is in use only for the network facing daemons – the PIE feature cannot be used together with the prelink feature for the same executable.
The reason this memory segment type is called anonymous is that this memory doesn't correspond to any real (named) file on disk
Curiously, due to the address space layout randomization feature which is enabled in Linux, relocation is relatively difficult to follow, because every time I run the executable, the libmlreloc.so shared library gets placed in a different virtual memory address [9]. Recall that the program/executable is not relocatable, and thus its data addresses have to be bound at link time. Compiling with PIE enabled affects the resultant binary.
PT_LOAD Specifies a loadable segment, described by p_filesz and p_memsz.
Depending on the kind of file being loaded into memory, the memory address might not match the p_vaddr values.
"_start" label: where all the real code starts (it lives in the C/C++ library startup code for C/C++ programs); it calls the initialization functions, then main.
When the binary is mapped into memory, it won't be mapped at the exact same address on every run. An offset (base) is applied, hence the difference in the pointer addresses.
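The effect of the load base can be illustrated with a few lines of Python. This is only a sketch of the arithmetic: the base addresses and the offset below are made-up values, not taken from a real trace, and the function names are hypothetical.

```python
# Sketch: converting between run-time addresses and file-relative offsets.
def runtime_address(load_base, offset):
    """Address of a symbol once the binary is mapped at load_base."""
    return load_base + offset


def file_offset(load_base, runtime_addr):
    """Undo the rebasing so that addresses from different runs can be compared."""
    return runtime_addr - load_base


# Two hypothetical runs of the same binary, mapped at different bases.
run1 = runtime_address(0x555555554000, 0x1139)
run2 = runtime_address(0x55D0A3C00000, 0x1139)
print(hex(run1), hex(run2))
print(file_offset(0x555555554000, run1) == file_offset(0x55D0A3C00000, run2))
```

Subtracting the base before comparing traces is what makes addresses captured in separate runs line up again.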
TLS : Thread local storage
Exception if using prelink - prelink ELF shared libraries and binaries to speed up startup time (assigns a unique virtual address space slot to each library)
Libc may be prelinked
DR6 is a status register that gives information about the last breakpoint hit, such as the register number of the breakpoint, and DR7 is the breakpoint control register. DR7 includes controls such as, local and global enables, memory access type, and memory access length. However, as with any limited hardware resource, multiple software users must contend for access of these registers.
Global breakpoint: an address in a debug address register may be relevant to one task but not to another.
Local and global enables: these bits indicate whether a given debug address has global (all tasks) or local (current task only) relevance.
The local enable bits are automatically reset by the processor at every task switch to avoid unwanted breakpoint conditions in the new task.
The global enable bits are not reset by a task switch; therefore, they can be used for conditions that are global to all tasks.
https://en.wikipedia.org/wiki/X86_debug_register
https://www.kernel.org/doc/ols/2009/ols2009-pages-149-158.pdf
https://lwn.net/Articles/353050/
The Pin API includes functions that classify and examine instructions (such as memory operations or branch instructions).
Many of the PIN APIs that are available in JIT mode are not available in Probe mode.
Probe mode is a method of using Pin to instrument at the function level only. Wrap, Replace, call Analysis function before/after.
It’s vitally important to use and run pin as the same user as the process you want to run pin against. The way pin works is that, upon execution of pin, the pin executable inserts itself into the process’ address space, gains control and then tries to load the necessary libraries.
The first field is the function location (IP),
the second field is R or W (reading or writing),
the third field is the memory location read or written,
the fourth field is the number of bytes read/written,
the fifth field is the memory value.
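The five fields above can be pulled apart with a small parser. The sketch below assumes whitespace-separated fields, hexadecimal addresses and values, a decimal size, and a trailing colon after the IP; the exact spacing of a real pinatrace line may differ, and the sample line is made up for illustration.

```python
# Sketch: parse one line of a pinatrace-style memory trace (format assumed).
from collections import namedtuple

Access = namedtuple("Access", "ip mode address size value")


def parse_trace_line(line):
    """Return an Access tuple, or None if the line is not a memory access record."""
    fields = line.split()
    if len(fields) != 5 or fields[1] not in ("R", "W"):
        return None
    return Access(
        ip=int(fields[0].rstrip(":"), 16),   # instruction address
        mode=fields[1],                      # R = read, W = write
        address=int(fields[2], 16),          # memory location accessed
        size=int(fields[3]),                 # number of bytes
        value=int(fields[4], 16),            # memory content
    )


sample = "0x00007f0e2f4b1d08: W 0x00007ffc1ff2e1c0  8 0x7f0e2f4b1d0b"  # hypothetical line
print(parse_trace_line(sample))
```

Once the records are structured like this, filtering on a value of interest (for example the bytes of a known string) points at the instruction addresses that touched it.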
X$ksupr v$process
Because a PGA memory snapshot is made at a certain point in time, this snapshot represents the memory layout of that moment, which has a high probability of having memory deallocated and freed to the operating system. A lot of the SGA/shared pool allocations on the other hand have the intention of re-usability, and thus are not freed immediately after usage, which gives the SGA memory snapshot a good chance of capturing a lot of the memory allocations.
Little-endian format reverses the order of the sequence and addresses/sends/stores the least significant byte first (lowest address) and the most significant byte last (highest address).
Endianness refers to the sequential order used to numerically interpret a range of bytes in computer memory as a larger, composed word value (32/64 bit).
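A short Python sketch of what this means when reading raw bytes from a memory dump on x86_64 (little-endian). The byte values are chosen for illustration only.

```python
# Sketch: the same bytes interpreted as little-endian and big-endian words.
raw = bytes([0x16, 0x00, 0x00, 0x00])          # bytes as they sit in memory, lowest address first

print(hex(int.from_bytes(raw, "little")))      # how x86_64 reads this 32-bit word
print(hex(int.from_bytes(raw, "big")))         # the same bytes read as big-endian

# Going the other way: how a 64-bit address is laid out in memory.
print((0x9F668880).to_bytes(8, "little").hex())
```

This is why a value found in a trace has to be byte-reversed before it can be matched against a raw hexadecimal dump of the same memory.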
The first two instructions belong to the function prologue, which prepares the stack and registers for use within the function.
RSP: stack pointer: points to the top of the stack.
RBP: frame pointer: provides a stable reference from which function arguments and local variables can be addressed.
Sub 0x10, rsp: allocates 16 bytes of space on the stack.
Movl: stores the string variable on the stack.
puts
int puts ( const char * str );
Write string to stdout
This is how dynamic relocation behaves when using position-independent code. Basically, when the function is called (lazy binding), the code does not call the external function directly but goes through a PLT stub. The GOT and PLT sections (the Procedure Linkage Table is an array of stubs) are the keys to dynamic linking.
1.2 Systematic troubleshooting of latch contention
1.3 Question 1 - Who is trying to get the latch and why?
1.4 Question 2 - Who is holding the latch and why?
KS service layer
KC cache layer
Latches are very low-level locks (used for protecting very short operations on memory structures).
Every latch is just a memory structure in SGA, usually 100-200 bytes in size, depending on your Oracle version, hardware platform and whether you are running 32 or 64-bit Oracle.
There are dedicated latch functions in the KSL (Kernel Service layer, Latching & Wait) layer, such as kslgetl (KSL Get Exclusive Latch), kslgetsl (KSL Get Shared Latch), etc. But latches can also be acquired inside functions like kcbgtcr (consistent get) or kcbgcur (current get) without dedicated calls to ksl* functions (ultra-fast latches).
Whenever the latch is acquired or released, the first word pointed to by the latch address is modified to reflect the PID of the holding process or the number of holders, depending on the latch type/acquisition mode.
Latches, on the other hand, are much less sophisticated, more lightweight and are used for protecting very short operations on memory structures, such as various internal linked list modifications, shared pool memory allocation, library cache object lookups and so on.
Exclusive latch memory layout:
oradebug peek 200222A0 24
[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007
                       pid      gets     latch#   level#
Shared latch memory layout:
oradebug peek 0x6000AEA8 24
[6000AEA8, 6000AEC0) = 00000002 00000000 00000001 00000007
                       Nproc    X flag   gets     latch#
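The two dumps above can be decoded mechanically. The Python sketch below maps the four words printed by oradebug peek to the field names shown in the annotations (pid, gets, latch#, level# for the exclusive latch; Nproc, X flag, gets, latch# for the shared latch). The field order is taken from those annotations only; the function and tuple names are illustrative.

```python
# Sketch: label the words of an "oradebug peek" latch dump with their field names.
EXCLUSIVE_FIELDS = ("pid", "gets", "latch#", "level#")
SHARED_FIELDS = ("nproc", "x_flag", "gets", "latch#")


def decode_words(dump, names):
    """Map each hexadecimal word of the dump to the corresponding field name."""
    words = [int(word, 16) for word in dump.split()]
    return dict(zip(names, words))


print(decode_words("00000016 00000001 000001D0 00000007", EXCLUSIVE_FIELDS))
print(decode_words("00000002 00000000 00000001 00000007", SHARED_FIELDS))
```

Read this way, the first dump shows an exclusive latch held by the process with PID 0x16 (22), and the second a shared latch with two holders and no exclusive flag set.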
Cache buffer chain latch
The idea is that if we can find the pointer to one element of the array “ksllalat”, we can deduce the others (the elements have a fixed size). How to do that?
Thanks to the state objects that Oracle has to maintain for process recovery.
The background process PMON is responsible for freeing those resources if the process owning a state object fails.
A state object is just a structure in the SGA containing information about the state of a database resource.
Based on my previous findings, and using only one hardware breakpoint on “KSLLALAQ” of the X$KSUPR fixed table, I was able to draw the latch call graph of a specific process using SystemTap.
Extract from latch call graph after a commit was issued :
We have three levels of indentation (3 latches acquired at the same time), which can be confirmed using v$latchholder:
Here is a little C program that extracts all the latch state objects from the SGA. The program loops through the X$KSUPR fixed table to extract the latch state objects of every process. It takes as parameters the shmid of the shared memory segment to attach to, the minimum process address as found in X$KSUPR (which corresponds to index 0 in the array) and the process count.
This program can be enhanced to sample latch state object at high frequency and present a profile of latches held with extended info.
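The address arithmetic behind such a loop can be sketched in a few lines of Python. This is not the C program itself and it does not attach to shared memory: it only shows how, given the address of element 0, the process count and a fixed element size, the address of every process state object (and of a field inside it) can be derived. The element size and field offset used below are placeholders, not real Oracle values.

```python
# Sketch: enumerate the addresses of fixed-size entries in the process state object array.
def state_object_addresses(base_address, process_count, element_size):
    """Address of each array element, starting from the element at index 0."""
    return [base_address + index * element_size for index in range(process_count)]


def field_addresses(base_address, process_count, element_size, field_offset):
    """Address of one field (e.g. a latch pointer) inside every element."""
    return [addr + field_offset
            for addr in state_object_addresses(base_address, process_count, element_size)]


# Placeholder values for illustration only.
for addr in field_addresses(0x60000000, 3, 0x1000, 0x20):
    print(hex(addr))
```

A sampler would then read the word at each of these addresses at regular intervals and record which latch, if any, each process holds.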
Latch holder info out of the state objects in SGA memory
Recover password
What can we do if we forget the database link password and need it? Are we stuck?
(Advanced Encryption Standard in CBC mode)
The password is encrypted before being sent over the network.
Suspend the Oracle process just after it has been forked (using the SIGSTOP signal; the idea comes from Mauro Pagano's blog post “Manually holding an enqueue with SystemTap”).
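The suspend step can be illustrated outside of Oracle. The Python sketch below forks a child, freezes it with SIGSTOP and checks that it is reported as stopped, which is the window in which an instrumentation tool could attach. It is a stand-in for the idea only: the presentation sends the signal from SystemTap at fork time, whereas here the parent sends it itself, and the function name is hypothetical. It assumes a Unix-like system.

```python
# Sketch: freeze a freshly forked child with SIGSTOP so a tool can attach to it.
import os
import signal
import time


def stop_fresh_child():
    """Fork a child, suspend it, and report whether it was seen in the stopped state."""
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the newly forked server process.
        time.sleep(60)
        os._exit(0)
    os.kill(pid, signal.SIGSTOP)                # suspend the child
    _, status = os.waitpid(pid, os.WUNTRACED)   # returns once the child has stopped
    stopped = os.WIFSTOPPED(status)
    # A tool could attach to `pid` here, then resume it with SIGCONT.
    # This sketch simply cleans up instead.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return stopped


if __name__ == "__main__":
    print(stop_fresh_child())
```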
Based on my previous work using Pin tools, I have identified the function “r0_aes_cbc_loop_dec_x86_intel” used for decrypting the password, and the offset of interest inside it.
No need for a brute-force technique using the password hash.
Initialization vector
Ciphertext
Encryption Key
A lot of interesting things happen inside the function “ztcsr_dblink_v6”:
Initialization vector: looked up in ztcshpl_v6 using the second byte of passwordx
Ciphertext: calculated from ztcshpl_v6 and passwordx
Encryption key: calculated as an XOR between two keys:
Key 1: calculated from ztcshpl_v6 and passwordx
Key 2: SHA-256 hash of NO_USERID_VERIFIER_SALT
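The key combination step can be sketched as follows. Only the XOR of two keys and the SHA-256 hash of the salt come from the notes above; how key 1 is derived from ztcshpl_v6 and passwordx, and how the salt is encoded before hashing, are not shown here, so the inputs below are placeholders and the function names are hypothetical.

```python
# Sketch: combine two 32-byte keys with XOR, key 2 being a SHA-256 digest.
import hashlib


def xor_bytes(left, right):
    """Byte-wise XOR of two equally long byte strings."""
    if len(left) != len(right):
        raise ValueError("both keys must have the same length")
    return bytes(a ^ b for a, b in zip(left, right))


def derive_key(key1, salt):
    """Return key1 XOR SHA-256(salt); the encoding of `salt` is an assumption."""
    key2 = hashlib.sha256(salt).digest()
    return xor_bytes(key1, key2)


placeholder_key1 = bytes(range(32))          # stands in for the real key 1
placeholder_salt = b"NOT_A_REAL_SALT"        # stands in for NO_USERID_VERIFIER_SALT
print(derive_key(placeholder_key1, placeholder_salt).hex())
```

Because SHA-256 produces 32 bytes, the resulting key has the size expected for AES with a 256-bit key.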