# Memory and the Stack
## Understanding Computer Memory
To understand buffer overflows, you need to understand how programs use memory. When a program runs, the operating system gives it an address space that is divided into several regions.
## Memory Layout of a Process
A typical process memory layout (shown from high to low addresses):
```
High Memory Address
┌─────────────────┐
│ Kernel Space │ ← Operating system memory (off-limits)
├─────────────────┤
│ Stack │ ← Local variables, function calls (grows downward ⬇)
│ ⬇ │
│ │
│ [free space] │
│ │
│ ⬆ │
│ Heap │ ← Dynamic memory allocation (grows upward ⬆)
├─────────────────┤
│ BSS Segment │ ← Uninitialized global/static variables
├─────────────────┤
│ Data Segment │ ← Initialized global/static variables
├─────────────────┤
│ Text Segment │ ← Program code (instructions)
└─────────────────┘
Low Memory Address
```
### Memory Segments Explained
| Segment | Purpose | Characteristics |
|---------|---------|-----------------|
| **Text** | Program code (machine instructions) | Read-only, executable, shared |
| **Data** | Initialized global/static variables | Read-write, fixed size |
| **BSS** | Uninitialized global/static variables | Read-write, zeroed at start |
| **Heap** | Dynamic memory (`malloc`, `new`) | Grows upward, managed manually |
| **Stack** | Local variables, function calls | Grows downward, automatic management |
## The Stack: Where Buffer Overflows Usually Happen
The **stack** is a Last-In-First-Out (LIFO) data structure used for:
- Storing local variables
- Managing function calls and returns
- Passing function arguments
- Saving CPU register states
### Stack Growth Direction
**Important:** The stack grows from high memory addresses to low memory addresses (downward), but data written into a buffer on the stack fills it from low to high addresses (upward), toward the saved frame pointer and return address.
```
High Address
┌──────────────┐
│ Old Data │ ⬅ Stack starts here
├──────────────┤
│ Function 1 │
├──────────────┤
│ Function 2 │ ⬅ Stack grows down
├──────────────┤
│ Function 3 │ ⬅ Most recent function
└──────────────┘
Low Address
```
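The two directions can be seen side by side in a toy model. This is a minimal sketch, not a real stack: `STACK_TOP` is a hypothetical starting address and the helper names are made up for illustration.

```python
# Toy model: a 32-bit stack pointer that moves down on each push,
# while bytes written into a buffer move up from its lowest address.
STACK_TOP = 0xBFFFF700  # hypothetical starting value of ESP


def push_addresses(esp, count, word_size=4):
    """Return the address each successive push would store to."""
    addresses = []
    for _ in range(count):
        esp -= word_size  # a push decrements ESP before storing
        addresses.append(esp)
    return addresses


def buffer_write_addresses(buffer_start, length):
    """Return the address of each byte written into a buffer, in order."""
    return [buffer_start + i for i in range(length)]


pushes = push_addresses(STACK_TOP, 3)
writes = buffer_write_addresses(pushes[-1], 4)

print([hex(a) for a in pushes])  # each address is lower than the last
print([hex(a) for a in writes])  # each address is higher than the last
```

Because the writes climb toward higher addresses, a copy that runs past the end of a buffer heads toward data that was pushed earlier.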
## Stack Frame Anatomy
Each function call creates a **stack frame** (also called an activation record):
```
High Memory
┌─────────────────────┐
│ Function Arguments │ ⬅ Pushed by caller
├─────────────────────┤
│ Return Address │ ⬅ Where to jump back after function completes
├─────────────────────┤
│ Saved Frame Ptr │ ⬅ Previous function's base pointer (EBP/RBP)
├─────────────────────┤
│ Local Variable 1 │
├─────────────────────┤
│ Local Variable 2 │
├─────────────────────┤
│ Buffer[0..N] │ ⬅ Local arrays/buffers
├─────────────────────┤
│ ...more locals... │
└─────────────────────┘
Low Memory
```
### Key Stack Pointers
Two CPU registers track the stack:
**ESP/RSP (Stack Pointer)**
- Points to the current top of the stack
- Moves as data is pushed/popped
- Changes frequently during execution
**EBP/RBP (Base/Frame Pointer)**
- Points to the base of the current stack frame
- Used as a reference point for accessing local variables and parameters
- Remains stable during function execution
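
Since EBP stays fixed for the lifetime of the frame, frame contents can be located at constant offsets from it. The sketch below assumes a conventional 32-bit cdecl frame with the frame pointer kept (compilers may omit it when optimizing); the EBP value and the `frame_slots` helper are hypothetical.

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def frame_slots(ebp):
    """Map frame items to addresses for a conventional 32-bit cdecl frame."""
    return {
        "first argument": ebp + 2 * WORD,  # [ebp+8]
        "return address": ebp + WORD,      # [ebp+4]
        "saved EBP": ebp,                  # [ebp]
        "first local": ebp - WORD,         # [ebp-4]
    }


slots = frame_slots(0xBFFFF678)  # hypothetical EBP value
for name, addr in slots.items():
    print(f"{name:15s} {addr:#010x}")
```

Arguments and the return address sit above EBP, while locals sit below it.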
## How Function Calls Work
Let's trace what happens when `main()` calls `vulnerable()` on 32-bit x86 with the cdecl calling convention, where arguments are passed on the stack:
### Before the Call (in main)
```
Stack:
┌─────────────────┐
│ main's vars │ ⬅ EBP, ESP here
└─────────────────┘
```
### Step 1: Push Arguments (if any)
```c
vulnerable("Hello"); // Push "Hello" pointer
```
```
Stack:
┌─────────────────┐
│ main's vars │
├─────────────────┤
│ argument │ ⬅ "Hello" pointer
└─────────────────┘
```
### Step 2: Execute CALL Instruction
The `call` instruction:
1. Pushes the **return address** (next instruction in `main`)
2. Jumps to `vulnerable()` function
```
Stack:
┌─────────────────┐
│ main's vars │
├─────────────────┤
│ argument │
├─────────────────┤
│ Return Address │ ⬅ Where to return after vulnerable()
└─────────────────┘
```
### Step 3: Function Prologue
At the start of `vulnerable()`:
```assembly
push ebp ; Save old base pointer
mov ebp, esp ; Set new base pointer
sub esp, N ; Allocate space for local variables
```
```
Stack:
┌─────────────────┐
│ main's vars │
├─────────────────┤
│ argument │
├─────────────────┤
│ Return Address │ ⬅ CRITICAL: Controls where program returns
├─────────────────┤
│ Saved EBP │ ⬅ Previous frame pointer (EBP points here)
├─────────────────┤
│ Local Var 1 │
├─────────────────┤
│ buffer[20] │ ⬅ ESP points here
└─────────────────┘
```
### Step 4: Function Epilogue (Normal Return)
At the end of `vulnerable()`:
```assembly
mov esp, ebp ; Restore stack pointer
pop ebp ; Restore base pointer
ret ; Pop return address and jump to it
```
The program returns to `main()` and continues normally.
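The four steps can be replayed on a toy machine. This is a minimal sketch assuming 32-bit little-endian words; `ToyCPU`, the addresses, and the 24-byte local area are all hypothetical and only mirror the instructions shown above.

```python
import struct


class ToyCPU:
    """A tiny 32-bit model: ESP, EBP, EIP and a block of stack memory."""

    def __init__(self, stack_top=0x1000):
        self.mem = bytearray(stack_top)
        self.esp = stack_top
        self.ebp = stack_top
        self.eip = 0

    def push(self, value):
        self.esp -= 4
        struct.pack_into("<I", self.mem, self.esp, value)

    def pop(self):
        (value,) = struct.unpack_from("<I", self.mem, self.esp)
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)  # call: push the return address...
        self.eip = target          # ...then jump to the function

    def prologue(self, local_bytes):
        self.push(self.ebp)        # push ebp
        self.ebp = self.esp        # mov ebp, esp
        self.esp -= local_bytes    # sub esp, N

    def epilogue(self):
        self.esp = self.ebp        # mov esp, ebp
        self.ebp = self.pop()      # pop ebp
        self.eip = self.pop()      # ret


cpu = ToyCPU()
esp_before, ebp_before = cpu.esp, cpu.ebp
cpu.push(0x2000)                              # Step 1: argument (hypothetical pointer)
cpu.call(target=0x400, return_address=0x123)  # Step 2: call
cpu.prologue(local_bytes=24)                  # Step 3: prologue
esp_inside = cpu.esp
cpu.epilogue()                                # Step 4: epilogue
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))
```

After the epilogue, EIP holds the return address and EBP is back to the caller's value. ESP is still one word below its starting point because, under cdecl, the caller removes the argument after the call returns.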
## Buffer Overflow Visualization
Now let's see what happens with a buffer overflow:
### Normal Case
```c
#include <string.h>

void vulnerable() {
    char buffer[8];
    strcpy(buffer, "Hello"); // 5 bytes + null terminator = 6 bytes (OK)
}
```
```
Stack:
┌──────────────────┐
│ Return Address │ ⬅ 0x08048123 (unchanged)
├──────────────────┤
│ Saved EBP │ ⬅ 0xbffff678 (unchanged)
├──────────────────┤
│ buffer[4-7] │ ⬅ "o\0" + 2 unused bytes
├──────────────────┤
│ buffer[0-3] │ ⬅ "Hell"
└──────────────────┘
⬆ ESP
```
### Overflow Case
```c
#include <string.h>

void vulnerable() {
    char buffer[8];
    strcpy(buffer, "ThisStringIsMuchLongerThan8Bytes"); // 32 bytes + null terminator = 33 bytes (OVERFLOW!)
}
```
```
Stack Before:
┌──────────────────┐
│ Return Address │ ⬅ 0x08048123
├──────────────────┤
│ Saved EBP │ ⬅ 0xbffff678
├──────────────────┤
│ buffer[8] │
└──────────────────┘
Stack After Overflow:
┌──────────────────┐
│ Caller's frame │ ⬅ "LongerThan8Bytes\0" (OVERWRITTEN!)
├──────────────────┤
│ Return Address │ ⬅ 0x6863754d (OVERWRITTEN! Actually "Much" from string)
├──────────────────┤
│ Saved EBP │ ⬅ 0x7349676e (OVERWRITTEN! Actually "ngIs" from string)
├──────────────────┤
│ buffer[4-7] │ ⬅ "Stri"
├──────────────────┤
│ buffer[0-3] │ ⬅ "This"
└──────────────────┘
⬆ ESP
```
**What happens next:**
1. Function tries to return
2. Pops corrupted return address (0x6863754d)
3. Tries to jump to that address
4. **CRASH!** - Segmentation fault (invalid memory access)
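
The overwritten values can be checked with a small simulation. This is a sketch under a stated assumption: the 8-byte buffer sits directly below the saved EBP and return address with no padding, which real compilers do not guarantee. The function name `strcpy_into_frame` is made up for illustration.

```python
import struct


def strcpy_into_frame(src, buffer_size=8):
    """Copy src plus a NUL terminator into a toy frame with no bounds check.

    Toy layout, low to high: buffer | saved EBP | return address | caller data.
    Returns the (saved EBP, return address) values found after the copy.
    """
    frame = bytearray(buffer_size)
    frame += struct.pack("<I", 0xBFFFF678)  # saved EBP
    frame += struct.pack("<I", 0x08048123)  # return address
    frame += bytes(32)                      # stand-in for the caller's frame
    data = src + b"\x00"
    frame[0:len(data)] = data               # like strcpy: no length check
    saved_ebp, ret = struct.unpack_from("<II", frame, buffer_size)
    return saved_ebp, ret


print([hex(v) for v in strcpy_into_frame(b"Hello")])
print([hex(v) for v in strcpy_into_frame(b"ThisStringIsMuchLongerThan8Bytes")])
```

With the short string both control values survive. With the long one, bytes 8 to 11 of the input (`"ngIs"`) land in the saved EBP slot and bytes 12 to 15 (`"Much"`) land in the return address slot, read back as little-endian integers.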
## Exploiting Buffer Overflows
An attacker can carefully craft input to:
### 1. Control the Return Address
```
Stack Layout:
┌──────────────────┐
│ Return Address │ ⬅ Overwrite with 0xbffff7d0 (address of shellcode)
├──────────────────┤
│ Saved EBP │ ⬅ Can be junk (not critical)
├──────────────────┤
│ buffer + padding │ ⬅ Fill with NOPs + shellcode
└──────────────────┘
```
### 2. Inject Malicious Code
```
Payload Structure:
[ NOP Sled ][ Shellcode ][ Junk ][ Return Address ]
(safety) (exploit) (fill) (points to NOPs)
```
### 3. Redirect Execution
When the function returns:
1. Pops attacker-controlled return address
2. Jumps to NOP sled
3. Executes the NOPs one after another until it reaches the shellcode
4. Executes arbitrary code!
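
The payload structure above can be sketched as a small builder. Everything specific here is an assumption for illustration: the 68-byte offset, the sled length, and the `build_payload` helper are hypothetical, and the "shellcode" is only a run of INT3 breakpoint bytes used as a placeholder.

```python
import struct


def build_payload(offset_to_ret, return_address, shellcode, sled_len=16):
    """Lay out [NOP sled][shellcode][junk][return address] for 32-bit x86."""
    body = b"\x90" * sled_len + shellcode  # 0x90 is the x86 NOP opcode
    if len(body) > offset_to_ret:
        raise ValueError("sled and shellcode do not fit before the return address")
    junk = b"A" * (offset_to_ret - len(body))  # fill up to the return slot
    return body + junk + struct.pack("<I", return_address)


placeholder = b"\xcc" * 8  # INT3 breakpoints standing in for real shellcode
payload = build_payload(offset_to_ret=68, return_address=0xBFFFF7D0,
                        shellcode=placeholder)
print(len(payload), payload[-4:].hex())
```

The last four bytes are the little-endian return address, positioned exactly `offset_to_ret` bytes from the start so they line up with the return address slot.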
## Little Endian vs Big Endian
When overwriting addresses, byte order matters:
**Little Endian** (x86, x64):
- Least significant byte first
- Address 0x12345678 stored as: `\x78\x56\x34\x12`
**Big Endian** (some ARM, network protocols):
- Most significant byte first
- Address 0x12345678 stored as: `\x12\x34\x56\x78`
Example:
```python
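# To overwrite the return address with 0xdeadbeef on x86.
# The 32 bytes of padding is an example offset; the real distance
# depends on the target function's stack layout.
payload = b"A" * 32 + b"\xef\xbe\xad\xde"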
```
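Writing the bytes out by hand is error-prone, so a packing helper is often used instead. A minimal sketch using Python's standard `struct` module, where `<I` means a little-endian unsigned 32-bit integer and `>I` means big-endian:

```python
import struct

address = 0x12345678
little = struct.pack("<I", address)  # byte order used by x86 and x64
big = struct.pack(">I", address)     # network byte order

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# The hand-written bytes from the example above match the packed form.
deadbeef = struct.pack("<I", 0xDEADBEEF)
print(deadbeef == b"\xef\xbe\xad\xde")  # True
```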
## Stack vs Heap Overflows
### Stack Overflow Characteristics
- **Target**: Local variables, return addresses
- **Easier to exploit**: Predictable structure
- **Impact**: Code execution via return address overwrite
### Heap Overflow Characteristics
- **Target**: Dynamically allocated memory
- **Harder to exploit**: Less predictable layout
- **Impact**: Data corruption, function pointer overwrite, metadata manipulation
## Key Takeaways
1. **The stack grows downward** (high to low addresses), but **buffers grow upward** (low to high)
2. **Return addresses are stored on the stack** and can be overwritten
3. **Buffer overflow happens** when data exceeds buffer boundaries
4. **A careful understanding of memory layout** is critical for both exploitation and defense
5. **Stack frames contain critical control data** that attackers want to modify
## Practical Implications
### For Attackers (Ethical Hackers)
- Need to calculate exact offset to return address
- Must understand stack layout of target function
- Payload must account for stack alignment and protections
### For Defenders (Developers)
- Use stack canaries to detect corruption
- Enable DEP/NX to prevent code execution on stack
- Use ASLR to randomize stack addresses
- Validate all input sizes
- Use safe string functions
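
The idea behind a stack canary can be shown with a toy frame: a random value is placed between the buffer and the control data, then checked before the function returns. This is a simplified sketch, not how any particular compiler implements it. The layout and the `canary_survives` helper are hypothetical.

```python
import os
import struct


def canary_survives(src, buffer_size=8, canary=None):
    """Toy frame, low to high: buffer | canary | saved EBP | return address.

    Returns True if the canary is unchanged after an unchecked copy.
    """
    if canary is None:
        canary = os.urandom(4)              # fresh random value per run
    frame = bytearray(buffer_size) + canary
    frame += struct.pack("<II", 0xBFFFF678, 0x08048123)
    frame += bytes(64)                      # stand-in for the caller's frame
    data = src + b"\x00"
    frame[0:len(data)] = data               # unchecked copy, like strcpy
    # A protected function makes this comparison before it returns
    # and aborts the program if the value has changed.
    return bytes(frame[buffer_size:buffer_size + 4]) == canary


print(canary_survives(b"Hello"))
print(canary_survives(b"ThisStringIsMuchLongerThan8Bytes"))
```

A linear overflow cannot reach the return address without first passing through the canary, so the corruption is detected before the corrupted address is used. In GCC and Clang this protection is enabled with the `-fstack-protector` family of flags.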
## Next Steps
1. Learn about [CPU Registers](registers.md) used in stack operations
2. Study [Assembly Basics](assembly-basics.md) to understand low-level stack manipulation
3. Practice with [Simple Buffer Overflow Example](../examples/01-simple-overflow/)
4. Read about [Modern Mitigations](../defenses/mitigations.md)
## Further Reading
- [Smashing the Stack for Fun and Profit](http://phrack.org/issues/49/14.html) - The classic paper
- [Intel Software Developer Manual](https://software.intel.com/en-us/articles/intel-sdm) - Architecture details
- [Stack Frame Layout](https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/)
- [ASLR Explained](https://en.wikipedia.org/wiki/Address_space_layout_randomization)
---
**Remember:** Understanding the stack is fundamental to both exploiting and defending against buffer overflows. Master these concepts before moving to exploitation techniques.