cyber-security-resources/buffer-overflow-examples/basics/memory-and-stack.md

# Memory and the Stack

## Understanding Computer Memory

To understand buffer overflows, you need to understand how programs use memory. When a program runs, the operating system allocates memory to it, which is divided into several regions.

## Memory Layout of a Process

A typical process memory layout (from low to high addresses):

```
High Memory Address
┌─────────────────┐
│  Kernel Space   │ ← Operating system memory (off-limits)
├─────────────────┤
│  Stack          │ ← Local variables, function calls (grows downward ⬇)
│       ⬇         │
│                 │
│   [free space]  │
│                 │
│       ⬆         │
│  Heap           │ ← Dynamic memory allocation (grows upward ⬆)
├─────────────────┤
│  BSS Segment    │ ← Uninitialized global/static variables
├─────────────────┤
│  Data Segment   │ ← Initialized global/static variables
├─────────────────┤
│  Text Segment   │ ← Program code (instructions)
└─────────────────┘
Low Memory Address
```

### Memory Segments Explained

| Segment | Purpose | Characteristics |
|---------|---------|-----------------|
| **Text** | Program code (machine instructions) | Read-only, executable, shared |
| **Data** | Initialized global/static variables | Read-write, fixed size |
| **BSS** | Uninitialized global/static variables | Read-write, zeroed at start |
| **Heap** | Dynamic memory (`malloc`, `new`) | Grows upward, managed manually |
| **Stack** | Local variables, function calls | Grows downward, automatic management |

## The Stack: Where Buffer Overflows Usually Happen

The **stack** is a Last-In-First-Out (LIFO) data structure used for:
- Storing local variables
- Managing function calls and returns
- Passing function arguments
- Saving CPU register states

### Stack Growth Direction

**Important:** The stack grows from high memory addresses to low memory addresses (downward), but buffers within the stack grow from low to high addresses (upward).

```
High Address
┌──────────────┐
│   Old Data   │  ⬅ Stack starts here
├──────────────┤
│  Function 1  │
├──────────────┤
│  Function 2  │  ⬅ Stack grows down
├──────────────┤
│  Function 3  │  ⬅ Most recent function
└──────────────┘
Low Address
```

## Stack Frame Anatomy

Each function call creates a **stack frame** (also called activation record):

```
High Memory
┌─────────────────────┐
│  Function Arguments │ ⬅ Pushed by caller
├─────────────────────┤
│  Return Address     │ ⬅ Where to jump back after function completes
├─────────────────────┤
│  Saved Frame Ptr    │ ⬅ Previous function's base pointer (EBP/RBP)
├─────────────────────┤
│  Local Variable 1   │
├─────────────────────┤
│  Local Variable 2   │
├─────────────────────┤
│  Buffer[0..N]       │ ⬅ Local arrays/buffers
├─────────────────────┤
│  ...more locals...  │
└─────────────────────┘
Low Memory
```

### Key Stack Pointers

Two CPU registers track the stack:

**ESP/RSP (Stack Pointer)**
- Points to the current top of the stack
- Moves as data is pushed/popped
- Changes frequently during execution

**EBP/RBP (Base/Frame Pointer)**
- Points to the base of the current stack frame
- Used as a reference point for accessing local variables and parameters
- Remains stable during function execution

## How Function Calls Work

Let's trace what happens when `main()` calls `vulnerable()`:

### Before the Call (in main)

```
Stack:
┌─────────────────┐
│   main's vars   │ ⬅ EBP, ESP here
└─────────────────┘
```

### Step 1: Push Arguments (if any)

```c
vulnerable("Hello");  // Push "Hello" pointer
```

```
Stack:
┌─────────────────┐
│   main's vars   │
├─────────────────┤
│   argument      │ ⬅ "Hello" pointer
└─────────────────┘
```

### Step 2: Execute CALL Instruction

The `call` instruction:
1. Pushes the **return address** (next instruction in `main`)
2. Jumps to `vulnerable()` function

```
Stack:
┌─────────────────┐
│   main's vars   │
├─────────────────┤
│   argument      │
├─────────────────┤
│ Return Address  │ ⬅ Where to return after vulnerable()
└─────────────────┘
```

### Step 3: Function Prologue

At the start of `vulnerable()`:

```assembly
push ebp          ; Save old base pointer
mov ebp, esp      ; Set new base pointer
sub esp, N        ; Allocate space for local variables
```

```
Stack:
┌─────────────────┐
│   main's vars   │
├─────────────────┤
│   argument      │
├─────────────────┤
│ Return Address  │ ⬅ CRITICAL: Controls where program returns
├─────────────────┤
│ Saved EBP       │ ⬅ Previous frame pointer
├─────────────────┤
│ Local Var 1     │
├─────────────────┤
│ buffer[20]      │ ⬅ ESP, EBP now point here
└─────────────────┘
```

### Step 4: Function Epilogue (Normal Return)

At the end of `vulnerable()`:

```assembly
mov esp, ebp      ; Restore stack pointer
pop ebp           ; Restore base pointer
ret               ; Pop return address and jump to it
```

The program returns to `main()` and continues normally.

## Buffer Overflow Visualization

Now let's see what happens with a buffer overflow:

### Normal Case

```c
void vulnerable() {
    char buffer[8];
    strcpy(buffer, "Hello");  // 5 bytes + null terminator = 6 bytes (OK)
}
```

```
Stack:
┌──────────────────┐
│ Return Address   │ ⬅ 0x08048123 (unchanged)
├──────────────────┤
│ Saved EBP        │ ⬅ 0xbffff678 (unchanged)
├──────────────────┤
│ buffer[4-7]      │ ⬅ "\0\0\0\0"
├──────────────────┤
│ buffer[0-3]      │ ⬅ "Hell"
└──────────────────┘
    ⬆ ESP
```

### Overflow Case

```c
void vulnerable() {
    char buffer[8];
    strcpy(buffer, "ThisStringIsMuchLongerThan8Bytes");  // OVERFLOW!
}
```

```
Stack Before:
┌──────────────────┐
│ Return Address   │ ⬅ 0x08048123
├──────────────────┤
│ Saved EBP        │ ⬅ 0xbffff678
├──────────────────┤
│ buffer[8]        │
└──────────────────┘

Stack After Overflow:
┌──────────────────┐
│ Return Address   │ ⬅ 0x73736572 (OVERWRITTEN! Actually "ress" from string)
├──────────────────┤
│ Saved EBP        │ ⬅ 0x676e6f4c (OVERWRITTEN! Actually "Long" from string)
├──────────────────┤
│ buffer[8-11]     │ ⬅ "Much"
├──────────────────┤
│ buffer[4-7]      │ ⬅ "ngIs"
├──────────────────┤
│ buffer[0-3]      │ ⬅ "This"
└──────────────────┘
    ⬆ ESP
```

**What happens next:**
1. Function tries to return
2. Pops corrupted return address (0x73736572)
3. Tries to jump to that address
4. **CRASH!** - Segmentation fault (invalid memory access)

## Exploiting Buffer Overflows

An attacker can carefully craft input to:

### 1. Control the Return Address

```
Stack Layout:
┌──────────────────┐
│ Return Address   │ ⬅ Overwrite with 0xbffff7d0 (address of shellcode)
├──────────────────┤
│ Saved EBP        │ ⬅ Can be junk (not critical)
├──────────────────┤
│ buffer + padding │ ⬅ Fill with NOPs + shellcode
└──────────────────┘
```

### 2. Inject Malicious Code

```
Payload Structure:
[  NOP Sled  ][  Shellcode  ][  Junk  ][  Return Address  ]
  (safety)      (exploit)     (fill)     (points to NOPs)
```

### 3. Redirect Execution

When the function returns:
1. Pops attacker-controlled return address
2. Jumps to NOP sled
3. Slides down to shellcode
4. Executes arbitrary code!

## Little Endian vs Big Endian

When overwriting addresses, byte order matters:

**Little Endian** (x86, x64):
- Least significant byte first
- Address 0x12345678 stored as: `\x78\x56\x34\x12`

**Big Endian** (some ARM, network protocols):
- Most significant byte first
- Address 0x12345678 stored as: `\x12\x34\x56\x78`

Example:
```python
# To overwrite return address with 0xdeadbeef on x86:
payload = b"A" * 32 + b"\xef\xbe\xad\xde"
```

## Stack vs Heap Overflows

### Stack Overflow Characteristics
- **Target**: Local variables, return addresses
- **Easier to exploit**: Predictable structure
- **Impact**: Code execution via return address overwrite

### Heap Overflow Characteristics
- **Target**: Dynamically allocated memory
- **Harder to exploit**: Less predictable layout
- **Impact**: Data corruption, function pointer overwrite, metadata manipulation

## Key Takeaways

1. **The stack grows downward** (high to low addresses), but **buffers grow upward** (low to high)
2. **Return addresses are stored on the stack** and can be overwritten
3. **Buffer overflow happens** when data exceeds buffer boundaries
4. **Careful memory layout understanding** is critical for both exploitation and defense
5. **Stack frames contain critical control data** that attackers want to modify

## Practical Implications

### For Attackers (Ethical Hackers)
- Need to calculate exact offset to return address
- Must understand stack layout of target function
- Payload must account for stack alignment and protections

### For Defenders (Developers)
- Use stack canaries to detect corruption
- Enable DEP/NX to prevent code execution on stack
- Use ASLR to randomize stack addresses
- Validate all input sizes
- Use safe string functions

## Next Steps

1. Learn about [CPU Registers](registers.md) used in stack operations
2. Study [Assembly Basics](assembly-basics.md) to understand low-level stack manipulation
3. Practice with [Simple Buffer Overflow Example](../examples/01-simple-overflow/)
4. Read about [Modern Mitigations](../defenses/mitigations.md)

## Further Reading

- [Smashing the Stack for Fun and Profit](http://phrack.org/issues/49/14.html) - The classic paper
- [Intel Software Developer Manual](https://software.intel.com/en-us/articles/intel-sdm) - Architecture details
- [Stack Frame Layout](https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/)
- [ASLR Explained](https://en.wikipedia.org/wiki/Address_space_layout_randomization)

---

**Remember:** Understanding the stack is fundamental to both exploiting and defending against buffer overflows. Master these concepts before moving to exploitation techniques.