diff --git a/Linux_Hacking/README.md b/Linux_Hacking/README.md index e740c1d..e62c1e2 100644 --- a/Linux_Hacking/README.md +++ b/Linux_Hacking/README.md @@ -2,10 +2,6 @@ - - ---- - ## Folders ### SSH Hacking @@ -30,4 +26,1248 @@ * rshd, rlogind, sshd: honor trust relationship established with the source's IP address. +--- + +## My Linux Guide & Tricks + +# The Linux Environment + + +## The Linux Filesystem + +* Let's start getting an idea of our system. The *Linux filesystem* is composed of several system directories located at **/**: + +``` +$ ls / +``` + +* You can verify their sizes and where they are mounted with (the flag **-T** adds the filesystem type column): + +``` +$ df -hT . +Filesystem Type Size Used Avail Use% Mounted on +/dev/mapper/fedora-home ext4 127G 62G 59G 51% /home +``` + + +* The filesystem architecture is generally divided into the following folders: + +### /bin, /sbin and /usr/sbin +* **/bin** is a directory containing executable binaries, essential commands used in single-user mode, and essential commands required by all system users. + +* **/sbin** contains essential system administration binaries, while non-essential ones (such as system daemons) live in **/usr/sbin**. + +### /dev +* **/dev** contains device nodes, which are a type of pseudo-file used by most hardware and software devices (except for network devices). + +* The directory also contains entries that are created by the **udev** system, which creates and manages device nodes on Linux (creating them dynamically when devices are found). + +### /var +* **/var** stands for variable and contains files that are expected to change in size and content as the system is running. + +* For example, the system log files are located at **/var/log**, the packages and database files are located at **/var/lib**, the print queues are located at **/var/spool**, temporary files stay inside **/var/tmp**, and network services can be found in subdirectories such as **/var/ftp** and **/var/www**. + +### /etc + +* **/etc** holds the system configuration files.
It contains no binary programs, but it might have some executable scripts. + +* For instance, the file **/etc/resolv.conf** tells the system which DNS servers to query to translate between host names and IP addresses. + +* Additionally, the **/etc/passwd** file is the authoritative list of users on any Unix system. It does not contain the passwords: the encrypted password information was migrated into **/etc/shadow**. + +### /lib +* **/lib** contains libraries (common code shared by applications and needed for them to run) for essential programs in **/bin** and **/sbin**. + +* These library filenames start with either ```ld``` or ```lib``` and are called *dynamically loaded libraries* (or shared libraries). + + +### /boot + +* **/boot** contains the few essential files needed to boot the system. + +* For every alternative kernel installed on the system, there are four files: + + * ```vmlinuz```: the compressed Linux kernel, required for booting. + + * ```initramfs``` or ```initrd```: the initial RAM filesystem, required for booting. + + * ```config```: the kernel configuration file, used for debugging. + + * ```System.map```: the kernel symbol table. + + * [GRUB](http://www.gnu.org/software/grub/) files can also be found here. + + + +### /opt +* Optional directory for application software packages, usually installed manually by the user. + +### /tmp +* **/tmp** contains temporary files that are erased on reboot. + +### /usr + +* **/usr** contains multi-user applications, utilities and data. The common subdirectories are: + * **/usr/include**: header files used to compile applications. + + * **/usr/lib**: libraries for programs in **/usr/(s)bin**. + + * **/usr/sbin**: non-essential system binaries, such as system daemons. In modern Linux systems, this is actually linked together with **/sbin**. + + * **/usr/bin**: primary directory of executable commands of the system. + + * **/usr/share**: shared data used by applications, generally architecture-independent.
+ + * **/usr/src**: source code, usually for the Linux kernel. + + * **/usr/local**: data and programs specific to the local machine. + + + + +--- +## /dev Specials + +* There exist files provided by the operating system that do not represent any physical device, but provide a way to access special features: + + * **/dev/null** ignores everything written to it. It's convenient for discarding unwanted output. + + * **/dev/zero** contains an *infinite* number of zero bytes, which can be useful for creating files of a specified length. + + * **/dev/urandom** and **/dev/random** contain an infinite stream of operating-system-generated random numbers, available to any application that wants to read them. The difference between them is that the second guarantees strong randomness (it will wait until enough entropy is available) and so it should be used for encryption, while the former can be used for games. + +* For example, to output random printable strings, you can type: + +``` +$ cat /dev/urandom | strings +``` + + + +## The Kernel + +* The **Linux Kernel** is the program that manages *input/output requests* from software, and translates them into *data processing instructions* for the *central processing unit* (CPU). + +* To find the Kernel information you can type: + +``` +$ cat /proc/version +Linux version 3.14.9-200.fc20.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #1 SMP Thu Jun 26 21:40:51 UTC 2014 +``` + +* You can also print similar system information with ```uname```. The flag **-a** stands for all: + +``` + $ uname -a + Linux XXXXX 3.14.9-200.fc20.x86_64 #1 SMP Thu Jun 26 21:40:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux +``` + +* For instance, you might be interested in **checking whether you are using the latest Kernel**.
You can do this by checking whether the outputs of the following commands match: + +``` +$ rpm -qa kernel | sort -V | tail -n 1 +$ uname -r +``` + +* Additionally, for Fedora (and RPM systems) you can check what kernels are installed with: + +``` +$ rpm -q kernel +``` + + +--- +## Processes + +* A running program is called a **process**. Each process has an **owner** (in the same sense as when we talk about file permissions below). + +* You can find out which programs are running with the **ps** command. This also gives the **process ID** or **PID**, which uniquely identifies the process for as long as it runs (different copies of a given program will have separate PIDs). + +* To put a job (process) in the background we either run it with **&** or we press CTRL-Z and then type **bg**. To bring it back to the foreground, we type **fg**. + +* To get the list of running jobs in the shell, we type **jobs**. Each job has a **job ID** which can be used with the percent sign **%** to **bg**, **fg** or **kill** (described below). + + +### ps + +* To also see your processes that were not started from your current session you can run: + +``` +$ ps x +``` + +* To see your processes and those belonging to other users: + +``` +$ ps aux +``` + +* To list all zombie processes you can either do: + +``` +$ ps aux | grep -w Z +``` + +or (zombies are reported as *defunct*): + +``` +$ ps -e | grep defunct +``` + + +### top + +* Another useful command is **top** (table of processes). It tells you which programs are using the most memory or CPU: + +``` +$ top +``` + +* I particularly like [htop](http://hisham.hm/htop/) over top, which needs to be installed if you want to use it. + +### kill + +* To stop running a command you can use **kill**. This will send a message called a **signal** to the program.
There are [64 different signals](http://www.linux.org/threads/kill-commands-and-signals.4423/), some with meanings other than *stop running*: + +``` +$ kill +``` + +* The default signal sent by kill is **SIGTERM**, telling the program that you want it to quit. This is just a request, and the program can choose to ignore it. + +* The signal **SIGKILL** is mandatory and causes the immediate end of the process. The only exception is if the program is in the middle of making a request to the operating system (*i.e.* a system call). This is because the request needs to finish first. **SIGKILL** is the 9th signal in the list and it is usually sent with: + +``` +$ kill -9 +``` + +* Pressing CTRL-C is a simpler way to tell the program to quit, and it sends a message called **SIGINT**. You can also specify the PID as an argument to kill. + + +### uptime + +* Another great command is **uptime**, which shows how long the system has been running, with a measure of its load average as well: + +``` +$ uptime +``` + +### nice and renice + +* Finally, you can change process priority using ```nice``` (runs a program with modified scheduling priority) and ```renice``` (alters the priority of running processes). + + +--- +## Environment Variables + +* *Environment variables* are dynamic named values in the operating system that can be used by running processes. + + +### set and env +* You can see the *environment variables and configuration* in your system with: + +``` +$ set +``` +or +``` +$ env +``` + +### export and echo +* The value of an environment variable can be changed with: + +``` +$ export VAR= +``` + +* The value can be checked with: + +``` +$ echo $VAR +``` + +* The **PATH** (search path) is the list of directories that the shell looks in to try to find a particular command. For example, when you type ```ls``` it will find ```/bin/ls```.
The path is stored in the variable **PATH**, which is a list of directory names separated by colons, and it is usually set inside **~/.bashrc**. To export a new path you can do: + +``` +$ export PATH=$PATH:/ +``` + +### Variables in Scripts + +* Inside a running shell script, there are pseudo-environment variables that are called with **$1**, **$2**, etc., for individual arguments that were passed to the script when it was run. In addition, **$0** is the name of the script and **$@** is for the list of all the command-line arguments. + + + + + +--- +## The "~/." Files (dot-files) + +* The leading dot in a file name is used as an indicator to not list these files normally, but only when they are specifically requested. The reason is that, generally, dot-files are used to store configuration and sensitive information for applications. + +### ~/.bashrc + +* **~/.bashrc** contains scripts and variables that are executed when bash is invoked. + +* It's a good exercise to customize your **~/.bashrc**. Just google for samples, or take a look at this [site dedicated to sharing dot-files](http://dotfiles.org), or at [mine](https://github.com/mariwahl/Dotfiles-and-Bash-Examples/blob/master/configs/bashrc). Don't forget to ```source``` your **~/.bashrc** file every time you make a change (opening a new terminal has the same effect): + +``` +$ source ~/.bashrc +``` + +### Sensitive dot-files + +* If you use cryptographic programs such as [ssh](http://en.wikipedia.org/wiki/Secure_Shell) and [gpg](https://www.gnupg.org/), you'll find that they keep a lot of information in the directories **~/.ssh** and **~/.gnupg**. + +* If you are a *Firefox* user, the **~/.mozilla** directory contains your web browsing history, bookmarks, cookies, and any saved passwords. + +* If you use [Pidgin](http://pidgin.im/), the **~/.purple** directory (named after [the IM library](https://developer.pidgin.im/wiki/WhatIsLibpurple)) contains private information.
This includes sensitive cryptographic keys for users of cryptographic extensions to Pidgin such as [Off-the-Record](https://otr.cypherpunks.ca/). + + +--- +## File Descriptors + +* A **file descriptor** (FD) is a numeric handle for accessing an I/O resource. The standard values are the following: + * fd 0: stdin (standard input). + * fd 1: stdout (standard output). + * fd 2: stderr (standard error). + +* This naming is used for manipulation of these resources in the command line. For example, to send an **input** to a program you use **<**: + +``` +$ < +``` + +* To send a program's **output** somewhere other than the terminal (such as a file), you use **>**. For example, to just discard the output: + +``` +$ > /dev/null +``` + +* To send the program's error messages to a file you use the file descriptor 2: + +``` +$ 2> +``` + +* To send the program's error messages to the same place where **stdout** is going, *i.e.* merging them into a single stream (this works great for pipelines): + +``` +$ 2>&1 +``` + +---- +## File Permissions + +* Every file/directory in Linux is said to belong to a particular **owner** and a particular **group**. Files also have permissions stating what operations are allowed. + + +### chmod +* A resource can have three permissions: read, write, and execute: + + * For a file resource, these permissions are: to read the file, to modify the file, and to run the file as a program. + + * For a directory, these permissions are: the ability to list the directory's contents, to create and delete files inside the directory, and to access files within the directory. + +* To change the permissions you use the command ```chmod```. + + +### chown and chgrp + +* The traditional Unix permissions model does not support *access control lists* allowing a file to be shared with an enumerated list of users for a particular purpose. Instead, the admin needs to put all the users in a group and make the file belong to that group.
File owners cannot share files with an arbitrary list of users. + + +* There are three classes of users related to the resource: user (the owner), group, and others. Each of them can have separate permissions to read, write, and execute. + +* To change the owner of a resource you use ```chown```. There are two ways of setting permissions with chmod: + + * A numeric form using octal modes: read = 4, write = 2, execute = 1, where you multiply by user = x100, group = x10, others = x1, and sum the values corresponding to the granted permissions. For example 755 = 700 + 50 + 5 = rwxr-xr-x: ``` $ chmod 755 ``` + + * An abbreviated letter-based form using symbolic modes: u, g, o, or a (all), followed by a plus or minus, followed by a letter r, w, or x. This means that u+x "grants user execute permission", g-w "denies group write permission", and a+r "grants all read permission":```$ chmod g-w ```. + + +* To change the group you use ```chgrp```, using the same logic as for chown. + + + +* To see the file permissions in the current folder, type: + +``` +$ ls -l +``` + +* For example, ```-rw-r--r--``` means that it is a file (-) where the owner has read (r) and write (w) permissions, but not execute permission (-), while the group (r--) and everyone else (r--) can only read it.
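The correspondence between octal and symbolic modes can be checked on a throwaway file. This is only a sketch: it assumes GNU coreutils (`mktemp` and `stat -c`), and the variable names are made up for illustration.

```shell
# Sketch: confirm that octal and symbolic modes describe the same permission bits.
# Assumes GNU coreutils (mktemp, stat -c); the temp file is only for illustration.
demo=$(mktemp)

# Octal form: 7 = 4+2+1 (user rwx), 5 = 4+1 (group r-x), 5 = 4+1 (others r-x).
chmod 755 "$demo"
after_octal=$(stat -c '%a %A' "$demo")
echo "$after_octal"

# Symbolic form: remove read and execute from group and others.
chmod go-rx "$demo"
after_minus=$(stat -c '%a %A' "$demo")
echo "$after_minus"

# Symbolic form: grant read permission to everyone.
chmod a+r "$demo"
after_plus=$(stat -c '%a %A' "$demo")
echo "$after_plus"

rm -f "$demo"
```

Each `echo` prints the octal mode next to the string that `ls -l` would show, so the arithmetic (4 + 2 + 1 per class) can be read off directly.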
+ + +--- + +# Shell Commands and Tricks + +## Reading Files + +### cat + +* Prints the content of a file in the terminal: + +``` +$ cat +``` + +### tac + +* Prints the lines of a file in reverse order in the terminal (starting from the bottom): + +``` +$ tac +``` + +### less and more + +* Both print the content of a file, adding page control: + +``` +$ less +$ more +``` + +### head and tail +* To read 20 lines from the beginning: + +``` +$ head -20 +``` + +* To read 20 lines from the bottom: + +``` +$ tail -20 +``` + +### nl + +* To print (cat) a file with line numbers: + +``` +$ nl +``` + +### tee + +* To save the output of a program and see it as well (the flag **-a** appends to the file instead of overwriting it): + +``` +$ | tee -a +``` + +### wc + +* To print the number of lines, words, and bytes of a file: + +``` +$ wc +``` + + + +--- +## Searching inside Files + +### diff and diff3 + +* **diff** can be used to compare files and directories. Useful flags include: **-c** to show differences with surrounding context, **-r** to recursively compare subdirectories, **-i** to ignore case, and **-w** to ignore spaces and tabs. + +* You can compare three files at once using **diff3**, which uses one file as the reference basis for the other two. + + +### file + +* The command **file** shows the real nature of a file: + +``` +$ file requirements.txt +requirements.txt: ASCII text +``` + +### grep +* **grep** finds matches for a particular search pattern. The flag **-l** lists the files that contain matches, the flag **-i** makes the search case insensitive, and the flag **-r** searches all the files in a directory and its subdirectories: + +``` +$ grep -lir +``` + +* For example, to print only the lines that are not exactly equal to a word (removing the lines that are): + +``` +$ grep -xv +``` + +--- +## Listing or Searching for Files + + +### ls + +* **ls** lists directories and files.
Useful flags are **-l** to list the permissions of each file in the directory and **-a** to include the dot-files: + +``` +$ ls -la +``` + +* To list files sorted by size (smallest first, because of **-r**): + +``` +$ ls -lrS +``` + +* To list the names of the 10 most recently modified files ending with .txt: + +``` +$ ls -rt *.txt | tail -10 +``` + + +### tree + +* The **tree** command lists contents of directories in a tree-like format. + + +### find + +* To find files in a directory: + +``` +$ find -name +``` + +### which + +* To find where a binary lives among the directories in PATH: + +``` +$ which ls +``` + +### whereis + +* To locate the binary, source, and manual page files for a command: + +``` +$ whereis +``` + +### locate + +* To find files by name (using a prebuilt database): + +``` +$ locate +``` + +* To test if a file exists: + +``` +$ test -f +``` + +--- +## Modifying Files + +### true + +* To make a file empty: +``` +$ true > +``` + +### tr + +* **tr** takes a pair of strings as arguments and replaces, in its input, every character that occurs in the first string by the corresponding character in the second string. For example, to make everything lowercase: + +``` +$ tr A-Z a-z +``` + +* To put every word on its own line by replacing spaces with newlines: + +``` +$ tr -s ' ' '\n' +``` + +* To combine multiple lines into a single line: + + +``` +$ tr -d '\n' +``` + +* **tr** doesn't accept the names of files to act upon, so we can pipe it with cat to take input file arguments (same effect as ```$ < ```): + +``` +$ cat "$@" | tr +``` + +### sort + +* Sort the contents of text files. The flag **-r** sorts backwards, and the flag **-n** selects numeric sort order (for example, without it, 2 comes after 1000): + +``` +$ sort -rn +``` + +* To output a frequency count (histogram): + +``` +$ sort | uniq -c | sort -rn +``` + +* To choose random lines from a file: + +``` +$ sort -R | head -10 +``` + +* To merge multiple already-sorted files into one sorted file: + +``` +$ sort -m +``` + +### uniq + + +* **uniq** removes *adjacent* duplicate lines.
The flag **-c** prefixes each line with a count: + +``` +$ uniq -c +``` + +* To output only duplicate lines: + +``` +$ uniq -d +``` + +### cut + +* **cut** selects particular fields (columns) from structured text files (or particular characters from each line of any text file). The flag **-d** specifies what delimiter should be used to divide columns (default is tab), and the flag **-f** specifies which field or fields to print: + +``` +$ cut -d ' ' -f 2 +``` + +* The flag **-c** specifies a range of characters to output, so **-c1-2** means to output only the first two characters of each line: + +``` +$ cut -c1-2 +``` + +### join +* **join** combines two files by a common delimited field: + +``` +$ join +``` + + + + +---- +## Creating Files and Directories + +### mkdir + +* **mkdir** creates a directory. A useful flag is **-p** which creates the entire path of directories (in case they don't exist): + +``` +$ mkdir -p +``` + + +### cp + +* Copying directory trees is done with **cp**. The flag **-a** is used to preserve all metadata: + +``` +$ cp -a +``` + +* Interestingly, commands enclosed in **$()** are run first, and their output is substituted for the clause so it can be used as a part of another command line: + +``` +$ cp $(ls -rt *.txt | tail -10) +``` + + +### pushd and popd + +* The **pushd** command saves the current working directory in memory so it can be returned to at any time, optionally changing to a new directory: + +``` + $ pushd ~/Desktop/ +``` + +* The **popd** command returns to the path at the top of the directory stack. + +### ln + +* Files can be linked with different names with the **ln** command. To create a symbolic (soft) link you can use the flag **-s**: + +``` +$ ln -s +``` + + +### dd + +* **dd** is used for disk-to-disk copies, being useful for making copies of raw disk space.
For example, to back up your [Master Boot Record](http://en.wikipedia.org/wiki/Master_boot_record) (MBR): + +``` +$ dd if=/dev/sda of=sda.mbr bs=512 count=1 +``` + +* To use **dd** to make a copy of one disk onto another: + +``` +$ dd if=/dev/sda of=/dev/sdb +``` + + + +---- +## Network and Admin + +### du + +* **du** shows how much disk space is used for each file (GNU **du** refuses to combine **-s** with **-a**, so only **-h** and **-a** are used here): + +``` +$ du -ha +``` + +* To see this information sorted and only the 10 largest files: + +``` +$ du -a | sort -rn | head -10 +``` + +* To determine which subdirectories are taking a lot of disk space: + +``` +$ du --max-depth=1 | sort -k1 -rn +``` + +### df + +* **df** shows how much disk space is used on each mounted filesystem. It displays six columns for each filesystem: the name, the size, how much is used, how much is available, percentage of use, and where it is mounted. Note the values won't add up because Unix filesystems have **reserved** storage blocks which only the root user can write to. + +``` +$ df -h +``` + + + +### ifconfig + +* You can check and configure your network interface with: + +``` +$ ifconfig +``` + +* In general, you will see the following devices when you issue **ifconfig**: + + * ***eth0***: shows the Ethernet card with information such as: hardware (MAC) address, IP address, and the network mask. + + * ***lo***: loopback address or localhost. + + +* **ifconfig** is considered deprecated in favor of **ip**. See [my short guide on ip-netns](https://coderwall.com/p/uf_44a). + + +### dhclient + +* Linux has a DHCP server that runs a daemon called ```dhcpd```, assigning IP addresses to all the systems on the subnet (it also keeps log files). On the client side, **dhclient** requests an address from such a server: + +``` +$ dhclient +``` + +### dig + + +* **dig** is a DNS lookup utility (similar to ```nslookup``` in Windows). + +### netstat + + +* **netstat** prints the network connections, routing tables, and interface statistics, among others. Useful flags are **-t** for TCP, **-u** for UDP, **-l** for listening, **-p** for program, **-n** for numeric.
For example: + +``` +$ netstat -tulpn +``` + + + +### netcat, telnet and ssh + +* To connect to a host server, you can use **netcat** (nc) and **telnet**. To connect under an encrypted session, **ssh** is used. For example, to send a string to a host at port 3000: + +``` +$ echo 4wcYUJFw0k0XLShlDzztnTBHiqxU3b3e | nc localhost 3000 +``` + +* To telnet to localhost at port 3000: + +``` +$ telnet localhost 3000 +``` + + + +### lsof + +* **lsof** lists open files (remember that everything is considered a file in Linux): + +``` +$ lsof +``` + +* To see open TCP ports: + +``` +$ lsof | grep TCP +``` + +* To see IPv4 port(s): + +``` +$ lsof -Pnl +M -i4 +``` + +* To see IPv6 listening port(s): + +``` +$ lsof -Pnl +M -i6 +``` + + + +--- + +## Useful Stuff + +### echo + + +* **echo** prints its arguments as output. It can be useful in pipelines, and in this case you use the flag **-n** to not output the trailing newline: +``` +$ echo -n +``` + +* **echo** can be useful to write messages inside scripts, for instance to stderr (remember the discussion about file descriptors): + +``` +$ echo 'Done!' >&2 +``` + +* Or to find shell environment variables (remember the discussion about them): + +``` +$ echo $PATH +``` + +* For example, we can send the current date information to a file: + +``` +$ echo Completed at $(date) >> log.log +``` + +### bc + +* A calculator program is given by the command **bc**. The flag **-l** stands for the standard math library: + +``` +$ bc -l +``` + +* For example, we can make a quick calculation with: +``` +$ echo '2*15454' | bc +30908 +``` + + + +### w, who, finger, users + + +* To find information about logged-in users you can use the commands **w, who, finger**, and **users**. + + + + + +--- +## Regular Expression 101 + +* **Regular expressions** (regex) are sequences of characters that form a search pattern for use in pattern matching with strings. + +* Letters and numbers match themselves. Therefore, 'awesome' is a regular expression that matches 'awesome'.
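As a small sketch of a literal pattern matching itself, the snippet below runs **grep** over three made-up sample lines (the sample text and variable names are only for illustration):

```shell
# Sketch with made-up sample text: a literal regex matches itself anywhere in a line.
sample=$(printf '%s\n' 'this is awesome' 'nothing here' 'awesome start')

# grep prints every line that contains the literal pattern 'awesome'.
matches=$(printf '%s\n' "$sample" | grep 'awesome')
echo "$matches"

# The flag -c counts the matching lines instead of printing them.
count=$(printf '%s\n' "$sample" | grep -c 'awesome')
echo "$count"
```

The pattern is not tied to a position in the line, which is why both the first and the third sample lines are selected.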
+ +* The main rules that can be used with **grep** are: + * ```.``` matches any character. + * ```*``` matches the preceding item any number of times (including zero). + * ```.*``` matches any string (including empty). + * ```[abc]``` matches any character a or b or c. + * ```[^abc]``` matches any character other than a or b or c. + * ```^``` matches the beginning of a line. + * ```$``` matches the end of a line. + +* For example, to find lines in a file that begin with a particular string you can use the regex symbol **^**: + +``` +$ grep ^awesome +``` + +* Additionally, to find lines that end with a particular string you can use **$**: + +``` +$ grep awesome$ +``` + +* As an extension, **egrep** uses a version called *extended regular expressions* (EREs) which include things such as: + * ```()``` for grouping + * ```|``` for or + * ```+``` for one or more times + * ```\n``` for back-references (to refer to an additional copy of whatever was matched before by parenthesis group number n in this expression). + +* For instance, on a file with one word per line, you can use ``` egrep '.{12}'``` to find words of at least 12 letters. You can use ```egrep -x '.{12}'``` to find words of exactly twelve letters. + + + + +--- + +## Awk and Sed + +* **awk** is a pattern scanning tool while **sed** is a stream editor for filtering and transforming text. While these tools are extremely powerful, if you have knowledge of any very high level language such as Python or Ruby, you don't necessarily need to learn them. + +### sed + +* Let's say we want to replace every occurrence of *mysql* with MySQL (Linux is case sensitive), and then save the result to a new file. We can write a one-line command that says "search for the word mysql and replace it with the word MySQL": + +``` +$ sed s/mysql/MySQL/g > +``` + +* To replace any instance of a period followed by any number of spaces with a period followed by a single space in every file in this directory: + +``` +$ sed -i 's/\. */. 
/g' * +``` + +* To pass an input through a stream editor and then quit after printing the number of lines designated by the script's first parameter: + +``` +$ sed ${1}q +``` + + + + +---- + +# Some More Advanced Stuff + + +## Scheduling Recurrent Processes + + +### at +* A very handy command is **at**, which allows you to run commands later (you type the commands and end the input with CTRL+D): + +``` +$ at 3pm +``` + + +### cron +* If you have to run processes periodically, you should use **cron**, which is already running as a [system daemon](http://en.wikipedia.org/wiki/Daemon_%28computing%29). You can add a list of tasks in a file named **crontab** and install those lists using a program also called **crontab**. **cron** checks all the installed crontab files and runs the cron jobs. + + +* To view the contents of your crontab, run: + +``` +$ crontab -l +``` + +* To edit your crontab, run: + +``` +$ crontab -e +``` + +* The format of a cron job is: *min, hour, day, month, dow* (day of the week, where Sunday is 0), followed by the command. They are separated by tabs or spaces. The symbol * means any. It's possible to specify many values with commas. + +* For example, to run a backup every day at 5am, edit your crontab to: + +``` +0 5 * * * /home/files/backup.sh +``` + +* Or if you want to remember some birthday, you can edit your crontab to (the minute and hour are fixed here, since leaving both as * would fire every minute of that day): + +``` +0 9 16 1 * echo "Remember Mom's bday!" +``` + + +--- +## rsync + +* **rsync** performs file synchronization and file transfer. It can compress the data transferred using *zlib* and can use SSH or [stunnel](https://www.stunnel.org/index.html) to encrypt the transfer. + +* **rsync** is very efficient when recursively copying one directory tree to another because only the differences are transmitted over the network. + +* Useful flags are: **-e** to specify SSH as the remote shell, **-a** for archive mode, **-r** to recurse into directories, and **-z** to compress file data.
+ +* A very common set is **-av** which makes **rsync** work recursively, preserving metadata about the files it copies, and displaying the name of each file as it is copied. For example, the command below is used to transfer some directory to the **/planning** subdirectory on a remote host: + +``` +$ rsync -av :/planning +``` + + + +---- +## File Compression + +* Historically, **tar** stood for tape archive and was used to archive files to a magnetic tape. Today **tar** is used to create archive files, often called **tarballs**, and to extract files from them. + +* Additionally you can add *file compression*, which works by finding redundancies in a file (like repeated strings) and creating a more concise representation of the file's content. The most common compression programs are **gzip** and **bzip2**. + +* When issuing **tar**, the flag **f** must be the last option, because the archive name follows it. No hyphen is needed. You can add **v** as verbose. + +* A simple tarball is created with the flag **c**: + +``` +$ tar cf +``` + +* To extract a tarball you use the flag **x**: + +``` +$ tar xf +``` + +### gzip + + +* **gzip** is the most frequently used Linux compression utility. To create the archive and compress with gzip you use the flag **z**: + +``` +$ tar zcf +``` + +* You can directly work with gzip-compressed files with ```zcat, zmore, zless, zgrep, zegrep```. + +### bzip2 + +* **bzip2** usually produces files smaller than those produced by gzip, at the cost of speed. To create the archive and compress with bz2 you use the flag **j**: + +``` +$ tar jcf +``` + +### xz + +* **xz** is generally the most space-efficient of these compression utilities. To create the archive and compress with xz: + +``` +$ tar Jcf +``` + + +---- +## Logs + +* The standard logging facility can be found at ```/var/log```. For instance: + * ```/var/log/boot.log``` contains information that is logged when the system boots. + * ```/var/log/auth.log``` contains system authorization information.
+ * ```/var/log/dmesg``` contains kernel ring buffer information. + + +* The file ```/etc/rsyslog.conf``` controls what goes inside the log files. + +* The file ```/etc/services``` is a plain ASCII file providing a mapping between friendly textual names for internet services, and their underlying assigned port numbers and protocol types. To check it: + +``` +$ cat /etc/services +$ grep 110 /etc/services +``` + +* To see the most recent login of each user: + +``` +$ lastlog +``` + + +----- +## /proc and inodes + +* If the last link to a file is deleted but this file is still open in some program, we can still retrieve its content. This can be done, for example, by: + 1. attaching a debugger like **gdb** to the program that has the file open and commanding the program to read the content out of the file descriptor, or + + 2. using the **/proc** filesystem to copy the file content directly out of the open file descriptor pseudo-file inside **/proc**. + +* For example, if one runs ```$ dd if=/dev/zero of=trash & sleep 10; rm trash```, the available disk space on the system will continue to go downward (since more content gets written into the file to which **dd** is sending its output). + +* However, the file can't be seen anywhere in the system! Only killing the **dd** process will cause this space to be reclaimed. + +* An **index node** (inode) is a data structure used to represent a filesystem object such as a file or a directory. The true name of a file, even when it has no other name, is in fact its *inode number* within the filesystem where it was created, which can be obtained by +``` +$ stat +``` +or +``` +$ ls -i +``` + +* Creating a hard link with **ln** results in a new name for the same file, with the same *inode number* as the original.
Running *rm* on one of the names won't affect the other: + +``` +$ echo awesome > awesome +$ cp awesome more-awesome +$ ln awesome same-awesome +$ ls -i *some +7602299 awesome +7602302 more-awesome +7602299 same-awesome +``` + +---- +## Text, Hexdump, and Encodings + +* A Linux text file contains lines consisting of zero or more text characters, followed by the **newline character** (ASCII 10, also referred to as hexadecimal 0x0A or '\n'). + +* A text with a single line containing the word 'Hello' in ASCII would be 6 bytes (one for each letter, and one for the trailing newline). For example, the text below: + +``` +$ cat text.txt +Hello everyone! +Linux is really cool. +Let's learn more! +``` + +is represented as: + +``` +$ hexdump -c < text.txt +0000000 H e l l o e v e r y o n e ! \n +0000010 L i n u x i s r e a l l y +0000020 c o o l . \n L e t ' s l e a r +0000030 n m o r e ! \n +0000038 +``` + +* The numbers displayed at left are the hexadecimal byte offsets of each output line in the file. + +* Unlike text files on some other operating systems, Linux files do not end with a special end-of-file character. + + +* Linux text files were traditionally always interpreted as **ASCII**. In ASCII, each character is a single byte, and the ASCII standard as such defines exactly **128 characters** from **ASCII 0 to ASCII 127**. Some of them are non-printable (such as newline). The printable characters start at **32**. The **ISO 8859** standards are extensions to ASCII where the character positions **128 to 255** are given foreign-language interpretation. + +* Nowadays, Linux files are most often interpreted as **UTF-8**, which is an encoding of **Unicode**, a character set standard able to represent a very large number of languages. + +* For East Asian languages, **UTF-8** chars are typically encoded with **3 bytes** and **UTF-16** chars are typically encoded with **2 bytes**.
For western languages (such as German), **UTF-16** characters also take **2 bytes**, and all the regular (ASCII) characters have **00** in front of them. + +* In **UTF-16**, text often starts with the two bytes **fe ff** (decimal 254 255), which don't encode any part of the text. These are the **Unicode byte order mark** (BOM), which guards against certain kinds of encoding errors [1]. + + +* Linux has a command to translate between character sets: + +``` +$ recode iso8859-1..utf-8 +``` + +* This is useful if you see **mojibake**, which is a character set encoding mismatch bug. + + +* There are only two mandatory rules about characters that can't appear in a filename: no null bytes (bytes that have numeric value zero) and no forward slashes **/**. + + + + +---- + +# Extra Juice: (pseudo)-Random Tricks + +## Creating Pseudo-Random Passwords + +* Add this to your **~/.bashrc**: + +``` +genpass() { + local p=$1 + [ "$p" == "" ] && p=16 + tr -dc A-Za-z0-9_ < /dev/urandom | head -c ${p} | xargs +} +``` + +* Then, to generate passwords, just type: + +``` +$ genpass +``` + +* For example: + +```sh +$ genpass +dIBObynGX9epYogz +$ genpass 8 +c_yhmaXt +$ genpass 12 +FZI2wz2LzyVQ +$ genpass 14 +ZEfgQvpY4ixePt +``` + + +--- +## Password Asterisks + +* By default, when you type your password in the terminal you should see no feedback. If you would like to see asterisks instead, edit the sudoers file with: + +``` +$ sudo visudo +``` + +and add the line: + +``` +Defaults pwfeedback +``` + + +---- +## imagemagick + +* You can create a gif file from the terminal with ***imagemagick***: + +``` +$ mogrify -resize 640x480 *.jpg +$ convert -delay 20 -loop 0 *.jpg myimage.gif +``` + +--- +## Easy access to the History + + +* Type ```!!``` to run the last command in the history, ```!-2``` for the command before that, and so on.
+ + + diff --git a/Network_and_802.11/wireshark_guide.md b/Network_and_802.11/wireshark_guide.md new file mode 100644 index 0000000..be23749 --- /dev/null +++ b/Network_and_802.11/wireshark_guide.md @@ -0,0 +1,781 @@ +# [WIRESHARK GUIDE (by bt3)](http://bt3gl.github.io/wiresharking-for-fun-or-profit.html) + + + +[Wireshark](https://www.wireshark.org/) is an open source **network packet analyzer** that allows live traffic analysis, with support to several protocols. + +Wireshark also allows **network forensic**, being very useful for CTFs for example (check my writeups for the [D-CTF Quals 2014](http://bt3gl.github.io/exploring-d-ctf-quals-2014s-exploits.html) and for the CSAW Quals 2014 in [Networking](http://bt3gl.github.io/csaw-ctf-2014-networking-100-big-data.html) and [Forensics](http://bt3gl.github.io/csaw-ctf-2014-forensics-200-why-not-sftp.html)). + +In this blog post I introduce Wireshark and I talk about my favorite features in the tool. + + +------------------------------------------------------ +# The Network Architecture + +Before we are able to understand and analyse network traffic packets, we must have an insight of how the network stack works. + + +## The OSI Model + +The [Open Systems Interconnection](http://en.wikipedia.org/wiki/OSI_model) (OSI) model was published in 1983 and is a conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers. + +![](http://i.imgur.com/dZyiOTX.png) + +Protocols are separated according to their function and the hierarchy makes it easier to understand network communication: + + + +### Layer 1: Physical Layer + +Represents the physical and electrical medium through which the network data is transferred. + +It comprehends all hardware, hubs, network adapters, cable, etc. + +### Layer 2: Data Link Layer + +Provides the means of *transporting data* across a physical network. Bridges and switches are the physical devices in this layer. 
+ +It is responsible for providing an addressing scheme that can be used to identify physical devices: the [MAC address](http://en.wikipedia.org/wiki/MAC_address). + +Examples of protocols in this layer are: [Ethernet](http://en.wikipedia.org/wiki/Ethernet), [Token Ring](http://en.wikipedia.org/wiki/Token_ring), [AppleTalk](http://en.wikipedia.org/wiki/AppleTalk), and [Fiber Distributed Data Interface](http://en.wikipedia.org/wiki/Fiber_Distributed_Data_Interface) (FDDI). + +### Layer 3: Network Layer + +Responsible for routing data between physical networks, assigning the *logical addressing* of network hosts. It also handles *packet fragmentation* and *error detection*. + +Routers and their *routing tables* belong to this layer. Examples of protocols are: [Internet Protocol](http://en.wikipedia.org/wiki/Internet_Protocol) (IP), [Internetwork Packet Exchange](http://en.wikipedia.org/wiki/Internetwork_Packet_Exchange), and the [Internet Control Message Protocol](http://en.wikipedia.org/wiki/Internet_Control_Message_Protocol) (ICMP). + + +### Layer 4: Transport Layer + +Provides the *flow control* of data between two hosts. Many firewalls and proxy servers operate at this layer. + +Examples of protocols are: [UDP](http://en.wikipedia.org/wiki/User_Datagram_Protocol) and [TCP](http://en.wikipedia.org/wiki/Transmission_Control_Protocol). + +### Layer 5: Session Layer +Responsible for the *session* between two computers, managing operations such as gracefully terminating connections. It can also establish whether a connection is [duplex or half-duplex](http://en.wikipedia.org/wiki/Duplex_%28telecommunications%29). + +Examples of protocols are: [NetBIOS](http://en.wikipedia.org/wiki/NetBIOS) and [NWLink](http://en.wikipedia.org/wiki/NWLink). + +### Layer 6: Presentation Layer + +Transforms the received data into a format that can be read by the application layer, for example by encoding/decoding and by several forms of encryption/decryption for securing the data.
+ +Examples of protocols are: [ASCII](http://en.wikipedia.org/wiki/ASCII), [MPEG](http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group), [JPEG](http://en.wikipedia.org/wiki/JPEG), and [MIDI](http://en.wikipedia.org/wiki/MIDI). + +### Layer 7: Application Layer + +Provides the details for end users to access network resources. + +Examples of protocols are: [HTTP](http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol), [SMTP](http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol), [FTP](http://en.wikipedia.org/wiki/File_Transfer_Protocol), and [Telnet](http://en.wikipedia.org/wiki/Telnet). + +--- + +## Data Encapsulation + +The protocols on different layers of the OSI model communicate through *data encapsulation*, where each layer in the stack adds a header or footer to the packet. + +The encapsulation protocol creates a [protocol data unit](http://en.wikipedia.org/wiki/Protocol_data_unit) (PDU), including the data with all header and footer information added to it. What we call a *packet* is the complete PDU. + +For instance, in Wireshark we can track the sequence number where a higher layer PDU starts and stops. This allows us to measure how long it took to transfer a PDU (the *display filter* is **tcp.pdu.time**). + + +--- + + +## Switches and Routers +There are four primary ways to capture traffic from a target device on a +**switched** network: using a **hub**, using a **tap**, by port mirroring, or by ARP cache poisoning. The first two obviously require a hub or a tap. Port mirroring requires forwarding capability from the switch. A great way to decide which method to use is borrowed from reference [1]: + +![](http://i.imgur.com/aRUfmsp.png) + + +All of the techniques for switched networks are available on **routed** networks as well. However, for routers the sniffer placement becomes more relevant since a device's broadcast domain extends only until it reaches a +router.
+ +--- +## Types of Traffic Packets + +There are three types of traffic packets within a network: + +* **Broadcast packet**: sent to all ports on the network segment. The broadcast MAC address is *ff:ff:ff:ff:ff:ff* (layer 2); at layer 3, it is the highest possible IP address of the network. + +* **Multicast packet**: sent from a single source to multiple destinations, to simplify the process while using as little bandwidth as possible. + +* **Unicast packet**: transmitted from one computer to another. + + +--- +## Common Protocols by Layer + +### The Address Resolution Protocol (Layer 2) + +Both **logical** and **physical addresses** are used for communication on a network. Logical addresses allow communication between multiple networks (indirectly connected devices). Physical addresses allow communication on a single network (devices that are connected to each other with a switch, for example). + + +[ARP](http://en.wikipedia.org/wiki/Address_Resolution_Protocol) is the protocol used to determine which [MAC address](http://en.wikipedia.org/wiki/MAC_address) (a physical address such as 00:09:5B:01:02:03, belonging to layer 2) corresponds to a particular IP address (a logical address such as 10.100.20.1, belonging to layer 3). + +The ARP resolution process uses two packets (*ARP request* and *ARP response*) to find the matching MAC address, sending a **broadcast** packet to every device in the domain, and waiting for the response of the correct client. This works because a switch uses a MAC table to know through which port to send the traffic. + +In Wireshark, ARP is easily spotted by info lines of the form **"Who has X? Tell Y"**, where X and Y are IP addresses. Additionally, you can see the ARP table in your device with: + +``` +$ arp -a +``` + +### The Internet Protocol (Layer 3) + +Every interface on an internet must have a unique Internet address. IP has the task of delivering packets between hosts based on the IP addresses in the packet headers.
+ +[IPv4](http://en.wikipedia.org/wiki/IPv4) addresses are 32-bit addresses used to uniquely identify devices connected in a network. They are represented by the dotted-quad notation with four sets of 8 bits, represented by decimal numbers between 0 and 255. + +In addition, an IP address consists of two parts: a **network address** and a **host address**. The network address identifies the *local area network* (LAN), and the host address identifies the device on that network. + +The determination of these two parts is given by another set of addressing information, the **network mask** (netmask or subnet mask), which is also 32 bits long. In the netmask, every bit set to 1 identifies the portion of the IP address that belongs to the network address. Remaining bits set to 0 identify the host address: + +![](http://i.imgur.com/a7Evq9z.png) + +Additionally, the IP packet header contains information such as: + +* **Version**: version of IP used. + +* **Header Length**: length of the IP header. + +* **Type of Service**: flag used by routers to prioritize traffic. + +* **Total length**: length of the IP header and the data in the packet. + +* **Identification**: identification of a packet or sequence of fragmented packets. + +* **Fragment offset**: position of a fragment within the original packet (together with the flags, it tells whether a packet is a fragment). + +* **Time to Live**: definition of the lifetime of the packet, measured in hops (or seconds) through routers. A TTL is defined when a packet is created, and generally is decremented by 1 every time the packet is forwarded by a router. + +* **Protocol**: identification of the upper-layer protocol (for example TCP or UDP) encapsulated in the packet. + +* **Header checksum**: error-detection mechanism. + +* **Source IP Address**. + +* **Destination IP address**. + +* **Options**: for routing and timestamps. + +* **Data**.
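
To make the network/host split from the netmask discussion above concrete, here is a minimal Python sketch using only the standard `ipaddress` module. The helper name `split_address` and the sample addresses are assumptions chosen for illustration; they do not come from the guide itself.

```python
import ipaddress


def split_address(address, netmask):
    """Return (network part, host part) of an IPv4 address as dotted quads.

    Hypothetical helper for illustration only.
    """
    addr = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(netmask))
    network = addr & mask             # bits where the netmask is 1
    host = addr & ~mask & 0xFFFFFFFF  # bits where the netmask is 0
    return (str(ipaddress.IPv4Address(network)),
            str(ipaddress.IPv4Address(host)))


if __name__ == "__main__":
    # Sample address and mask are made up for the example.
    network, host = split_address("10.10.1.22", "255.255.0.0")
    print("network:", network)
    print("host:   ", host)
```

With a mask of 255.255.0.0, the first two octets form the network address and the last two identify the host on that network.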
+ + + +### The Internet Control Message Protocol (Layer 3) + +ICMP is the utility protocol of TCP/IP responsible for providing information about the availability of devices, services, or routes on a network. + +Examples of services that use ICMP are **ping**: + +``` +$ ping www.google.com +PING www.google.com (74.125.228.210) 56(84) bytes of data. +64 bytes from iad23s23-in-f18.1e100.net (74.125.228.210): icmp_seq=1 ttl=53 time=21.5 ms +64 bytes from iad23s23-in-f18.1e100.net (74.125.228.210): icmp_seq=2 ttl=53 time=22.5 ms +64 bytes from iad23s23-in-f18.1e100.net (74.125.228.210): icmp_seq=3 ttl=53 time=21.4 ms +``` + + +and **traceroute**: + +``` +$ traceroute www.google.com +traceroute to www.google.com (173.194.46.84), 30 hops max, 60 byte packets + 1 * * * + 2 67.59.254.85 (67.59.254.85) 30.078 ms 30.452 ms 30.766 ms + 3 67.59.255.137 (67.59.255.137) 33.889 ms 67.59.255.129 (67.59.255.129) 33.426 ms 67.59.255.137 (67.59.255.137) 34.007 ms + 4 rtr101.wan.hcvlny.cv.net (65.19.107.109) 34.004 ms 451be075.cst.lightpath.net (65.19.107.117) 32.743 ms rtr102.wan.hcvlny.cv.net (65.19.107.125) 33.951 ms + 5 64.15.3.222 (64.15.3.222) 34.972 ms 64.15.0.218 (64.15.0.218) 35.187 ms 35.120 ms + 6 * 72.14.215.203 (72.14.215.203) 29.225 ms 29.646 ms + 7 209.85.248.242 (209.85.248.242) 29.361 ms 209.85.245.116 (209.85.245.116) 39.780 ms 42.108 ms + 8 209.85.249.212 (209.85.249.212) 33.220 ms 209.85.252.242 (209.85.252.242) 33.500 ms 33.786 ms + 9 216.239.50.248 (216.239.50.248) 53.231 ms 57.314 ms 216.239.46.215 (216.239.46.215) 52.140 ms +10 216.239.50.237 (216.239.50.237) 52.022 ms 209.85.254.241 (209.85.254.241) 48.517 ms 48.075 ms +11 209.85.243.55 (209.85.243.55) 56.220 ms 45.359 ms 44.934 ms +12 ord08s11-in-f20.1e100.net (173.194.46.84) 43.184 ms 39.770 ms 45.095 ms +``` + +The way traceroute works is by sending echo request that have a particular feature in the IP header: **the TTL is 1**. This means that the packet will be dropped at the first router. 
The second packet is then a reply from the first router along the path to the destination. The TTL is increased by 1 for the next request, which reaches the second router, and so on. + +To make this work, the router replies with an ICMP response in a *double-headed packet*, containing a copy of the IP header and the ICMP data that was sent in the original echo request. + + + +### The Transmission Control Protocol (Layer 4) + +Provides a reliable flow of data between two hosts with a **three-way handshake**. The purpose is to allow the transmitting host to ensure that the destination host is up, and let the transmitting host check the availability of the port as well. + +This handshake works as follows: + +1. Host A sends an initial packet with no data but with the synchronize (SYN) flag and the initial sequence number and [maximum segment size](http://en.wikipedia.org/wiki/Maximum_segment_size) (MSS) for the communication process. +2. Host B responds with a packet with the synchronize and acknowledge (SYN + ACK) flags, with its initial sequence number. +3. Host A sends an acknowledge (ACK) packet. + +When the communication is done, a **TCP teardown** process is used to gracefully end a connection between two devices. The process involves four packets: + +1. Host A sends a packet with FIN and ACK flags. +2. Host B sends an ACK packet. +3. Host B sends a FIN/ACK packet. +4. Host A sends an ACK packet. + + +Sometimes, however, connections can end abruptly (for example due to a potential attacker issuing a port scan or due to a misconfigured host). In these cases, TCP reset packets with the RST flag are used. This indicates that a connection was closed abruptly or a connection attempt was refused. + +Furthermore, when communicating with TCP, 65,535 ports are available. We typically divide them into two groups: + +* **standard port group**: from 1 to 1023, used by specific services. + +* **ephemeral port group**: from 1024 through 65535, randomly chosen by services. + +Finally, the TCP header contains information such as: + +* **Source Port**. +* **Destination Port**.
+* **Sequence number**: identifies a TCP segment. +* **Acknowledgment Number**: sequence number to be expected in the next packet from the other device. +* **Flags**: URG, ACK, PSH, RST, SYN, FIN flags for identifying the type of TCP packet being transmitted. +* **Window size**: size of the TCP receiver buffer in bytes. +* **Checksum**: used to verify the integrity of the TCP header and data. +* **Urgent Pointer**: if the URG flag is set, examined for additional instructions on where the CPU should begin reading the data within the packet. +* **Options**: optional fields. + + +### The User Datagram Protocol (Layer 4) + +While TCP is designed for reliable data delivery, UDP focuses on speed. UDP sends packets of data called **datagrams** from one host to another, with no guarantee that they reach the other end. + +Unlike TCP, UDP does not formally establish and terminate a connection between hosts. For this reason, protocols built on top of UDP (such as DNS and DHCP) usually rely on their own built-in reliability services. + +The UDP header has fewer fields than TCP: + +* **Source Port**. +* **Destination Port**. +* **Packet Length**. +* **Checksum**. + + +### The Dynamic Host Configuration Protocol (Layer 7) + + +In the beginning of the Internet, when a device needed to communicate over +a network, it would be assigned an address by hand. + +As the Internet grew, the **Bootstrap Protocol** (BOOTP) was created, automatically assigning addresses to devices. Later, BOOTP was replaced by DHCP. + + +### The Hypertext Transfer Protocol (Layer 7) + +HTTP is the mechanism that allows browsers to connect to web servers to view web pages. HTTP packets are built on top of TCP, and each request is identified by one of eight request methods (as defined in HTTP/1.1). + + +------------------------------------------------------ + + +# Analyzing Packets in Wireshark + +In Wireshark, the entire process of network sniffing can be divided into three steps: + +1.
**Collection**: switching the selected network interface into promiscuous mode so it can capture raw binary data. + +2. **Conversion**: chunks of collected binary are converted into readable form. + +3. **Analysis**: processing of the protocol type, communication channel, port number, protocol headers, etc. + +## Collecting Packets +Network traffic sniffing is only possible if the **network interface** (NIC) is switched to **promiscuous mode**. This allows the transfer of all received traffic to the CPU (instead of processing only the frames that the interface was intended to receive). If the NIC is not set to promiscuous mode, packets that are not destined to that controller are discarded. + +## Wireshark's main GUI +Wireshark's main GUI is composed of four parts: + +* **Capture options**. +* **Packet List**: lists all packets in the capture file. It can be edited to display packet number, relative time, source, destination, protocol, etc. +* **Packet details**: hierarchical display of information about a single packet. +* **Packet Bytes**: a packet in its raw, unprocessed form. + +To start capturing packets, all you need to do is to choose the network interface. You may also edit a *capture filter* prior to the packet collection. + + + +## Color Scheme + +The packet list panel displays several types of traffic in (configurable) colors. For instance: + +* green is TCP (and consequently HTTP), +* dark blue is DNS, +* light blue is UDP, +* light yellow is for ARP, +* black identifies TCP packets with problems. + +## Packet Visualization and Statistics + +Wireshark has several tools to learn about packets and networks: + +* **Statistics -> IO Graphs**: Graphs the throughput of data. For instance, you can use graphs to find peaks in the data, discover performance bottlenecks in individual protocols, and compare data streams. Filtering is available in this interface (for example, to show ARP and DHCP traffic).
+ +* **Statistics -> TCP -> Stream Graph -> Round Trip Time Graph**: Plots **round-trip times** (RTT) for a given capture file. This is the time it takes for an acknowledgment to be received for a sent packet. + +* **Statistics -> Flow Graph**: Timeline-based representation of communication statistics (based on time intervals). It allows the visualization of connections and the flow of data over time. A flow graph contains a column-based view of a connection between hosts and organizes the traffic. This analysis can show slow points or bottlenecks and determine if there is any latency. + +* **Statistics -> Summary**: Returns a report about the entire process by features such as interface, capture duration and number, and size of packets. + +* **Statistics -> Protocol Hierarchy**: Shows statistical information of different protocols in a *nodal form*. It arranges the protocols according to their layers, presenting them in percentage form. For example, if you know that your network usually gets 15% ARP traffic and you see a value such as 50%, you know something is wrong. + +* **Statistics -> Conversations**: Shows the address of the endpoints involved in the conversation. + +* **Statistics -> Endpoints**: Similar to conversations, reflecting the statistics of traffic to and from an IP address. For example, for TCP, it can look like **SYN, SYN/ACK, SYN**. + +* **Edit -> Find Packet or CTRL-F**: Finds packets that match some criteria. There are three options: + * *Display filter*: expression-based filter (for example **not ip**, **ip.addr==192.168.0.10**, or **arp**). + * *Hex value*: packets with a hexadecimal value (for example 00:ff, ff:ff). + * *String*: packets with a text string (for example admin or workstation). + +* **Right click -> Follow TCP Stream**: Reassembles TCP streams into a readable format (instead of having the data in small chunks).
The text displayed in *red* signifies traffic from the source to the destination, and the text in *blue* identifies traffic in the opposite direction. If you know the stream number (value to be followed to get various data packets), you can also use the following filter for the same purpose: + +``` +tcp.stream eq +``` + +* **Right click -> Mark Packet or CTRL+M**: Helps to organize relevant packets. + + + + +--- +## Filters + +### The Berkeley Packet Filter Syntax +Wireshark's filtering is a very powerful feature. Its capture filters use the [Berkeley Packet Filter](http://en.wikipedia.org/wiki/Berkeley_Packet_Filter) (BPF) syntax. The syntax corresponds to an **expression** which is made of one or more **primitives**. These primitives can have one or more **qualifiers**, which are defined below: + +* **Type**: ID name or number (for example: **host**, **net**, **port**). +* **Dir**: transfer direction to or from the ID name or number (for example: **src** and **dst**). +* **Proto**: restricts the match to a particular protocol (for example: **ether**, **ip**, **tcp**, **udp**, or **arp**). + +An example of a primitive is: +``` +dst host 192.168.0.10 +``` +where **dst host** is the qualifier, and the IP address is the ID. + +### Types of Filters +Packets can be filtered in two ways: + +* **Capture filters**: specified when packets are being captured. This method is good for performance of large captures. +* **Display filters**: applied to an existing set of collected packets. This method gives more versatility since you have the entire data available. + +In the following sections I show several examples of capture and display filters. + +### Capture Filters by Host Address and Name + +* Traffic associated with a host's IPv4 address (also works for an IPv6 network).
+ +``` +host 172.16.16.150 +``` + +* Traffic to or from a range of IP addresses: + +``` +net 192.168.0.0/24 +``` + +* Device's hostname with the host qualifier: + +``` +host testserver +``` + +* If you are concerned that the IP address for a host changed, you can filter based on MAC address: + +``` +ether host ff-ff-ff-ff-ff-aa +``` + +* Only traffic coming from a particular host (host is an optional qualifier): + +``` +src host 172.16.16.150 +``` + +* All the traffic going to a host: + +``` +dst host 172.16.16.150 +``` + +* Only traffic to or from IP address 173.15.2.1 + +``` +host 173.15.2.1 +``` + +* Traffic from a range of IP addresses: + +``` +src net 192.168.0.0/24 +``` + + +### Capture Filters by Ports + +* Only traffic on port 8000: + +``` +port 8000 +``` + +* All traffic except on port 443: + +``` +!port 443 +``` + +* Traffic going to a host listening on 80: + +``` +dst port 80 +``` + +* Traffic within a range of ports: + +``` +tcp portrange 1501-1549 +``` + +* Both inbound and outbound traffic on ports 80 and 21: + +``` +port 80 || port 21 +``` + +* Only non-HTTP and non-SMTP traffic to or from a host: + +``` +host www.example.com and not (port 80 or port 25) +``` + +### Capture Filters by Protocols + +* Capture only unicast traffic (useful to get rid of noise on the network): + +``` +not broadcast and not multicast +``` + +* ICMP traffic only: + +``` +icmp +``` + + +* Drop ARP packets: + +``` +!arp +``` + +* Drop IPv6 traffic: + +``` +!ip6 +``` + +* DNS traffic (capture filters have no **dns** keyword, so match the port): + +``` +port 53 +``` + +* Clear text email traffic (SMTP, POP3, and IMAP, matched by port): + +``` +tcp port 25 || tcp port 110 || tcp port 143 +``` + +### Capture Filters by Packet's Properties + +* TCP packets with SYN flag set: + +``` +tcp[13]&2==2 +``` + +* ICMP packets with destination unreachable (type 3): + +``` +icmp[0]==3 +``` + +* HTTP GET requests (bytes 'G','E','T' and a trailing space are hex values 47, 45, 54, 20): + +``` +port 80 and tcp[((tcp[12:1] & 0xf0 ) >> 2 ):4 ] = 0x47455420 +``` + +--- +### Display Filters by Host Address and Name + + +* Filter by IP address: + +```
+ip.addr == 10.0.0.1 +``` + +* IP source address field: + +``` +ip.src == 192.168.1.114 +``` + +* IP address src/dst for a network range: + +``` +ip.addr== 192.168.1.0/24 +``` + +### Display Filters by Ports + +* Any TCP packet with 4000 as a source or destination port: + +``` +tcp.port == 4000 +``` + +* Source port: + +``` +tcp.srcport == 31337 +``` + +### Display Filters by Protocols + +* Drops arp, icmp, dns, or whatever other protocols may be background noise: + +``` +!(arp or icmp or dns) +``` + +* Displays all re-transmissions in the trace (helps when tracking down slow application performance and packet loss): + +``` +tcp.analysis.retransmission +``` + +* ICMP Type field to find all ping (echo request) packets: + +``` +icmp.type== 8 +``` + +### Display Filters by Packet's Properties + +* Displays all HTTP requests: + +``` +http.request +``` + +* Display all POST requests: + +``` +http.request.method == "POST" +``` + +* Filter for the HEX values: + +``` +udp contains 33:27:58 +``` + +* Sequence number field in a TCP header: + +``` +tcp.seq == 52703261 +``` + + +* Packets that are 128 bytes or less in length: + +``` +frame.len <= 128 +``` + +* TCP packets with SYN flag set: + +``` +tcp.flags.syn == 1 +``` + +* TCP packets with RST flag set (displays all TCP resets): + +``` +tcp.flags.reset == 1 +``` + +* IP packets where the don't-fragment (DF) bit is not set (see if someone is trying ping): + +``` +ip.flags.df == 0 +``` + + +-------------------------------------- +# Using Wireshark for Security + + +## Some Reconnaissance Tips + +### Network Scan with SYN + +A TCP SYN scan is a fast and reliable method to scan ports and services in a network. It is also less noisy than other scanning techniques. + +Basically, it relies on the three-way handshake process to determine which ports are open on a target host: + +1. The attacker sends a TCP SYN packet to a range of ports on the victim. + +2.
Once this packet is received by the victim, one of the following responses will be observed: + + * **Open ports**: replies with a TCP SYN/ACK packet (three times). Then the attacker knows that port is open and a service is listening on it. + + * **Closed ports, not filtered**: the attacker receives a RST response. + + * **Filtered ports** (by a firewall, for example): the attacker does not receive any response. + + +### Operating System Fingerprint + +Technique to determine the operating system on a system without having access to it. + +In **Passive Fingerprinting**, an attacker can use certain fields within packets sent from the target to craft a stealthy fingerprint. + +This is possible due to the lack of specificity in the protocols' [RFCs](http://en.wikipedia.org/wiki/Request_for_Comments): although the various fields contained in the TCP, UDP and IP headers are very specific, no default values are defined for these fields. + +For instance, the following header values can help one to distinguish between several operating systems: + +* **IP, Initial Time to Live**: + - 64 for Linux, Mac OS + - 128 for Windows + - 255 for Cisco IOS +* **IP, Don't Fragment Flag**: + - Set for Linux, Mac OS, Windows + - Not set for Cisco IOS +* **TCP, Max Segment Size**: + - 1440 for Windows + - 1460 for Mac OS 10, Linux +* **TCP, Window Size**: + - 2920-5840 for Linux + - 4128 for Cisco IOS + - 65535 for Mac OS 10 + - variable for Windows +* **TCP, SackOK**: + - Set for Linux, Windows + - Not set for Cisco IOS, Mac OS 10 + +Note: A nice tool using operating system fingerprinting techniques is [p0f](http://lcamtuf.coredump.cx/p0f3/). + + + +In **Active Fingerprinting**, the attacker actively sends crafted packets to the victim whose replies reveal the OS. This can be done with [Nmap](http://nmap.org/). + +--- + + +## Some Forensics Tips + +### DNS Queries + +Look at different DNS queries that are made while the user was online.
A possible filter is: + +``` +dns +``` +This will give a view of any malicious DNS request made without the knowledge of the user. An example is a case where a visited website has a hidden **iframe** with some malicious script inside. + +### HTTP GET Headers + +Look for different HTTP streams that have flown during the network activity: HTML, JavaScript, image traffic, 302 redirections, non-HTTP streams, Java Archive downloads, etc. A possible filter is: + +``` +http +``` +You can also look at different GET requests with: + +``` +tcp contains "GET" +``` + +--- +## ARP Cache Poisoning + +### Sniffing + +ARP cache poisoning allows tapping into the wire with Wireshark. This can be used for good or for evil. + +The way this works is the following: all devices on a network communicate with each other on layer 3 using IP addresses. Because switches operate on layer 2 they only see MAC addresses, which are usually cached. + +When a MAC address is not in the cache list, ARP broadcasts a packet asking which MAC address owns some IP address. The destination machine replies to the packet with its MAC address via an ARP reply (as we have learned above). So, at this point, the transmitting computer has the data link layer addressing information it needs to communicate with the remote computer. This information is then stored into the ARP cache. + +An attacker can spoof this process by sending ARP messages to an Ethernet switch or router with fake MAC addresses in order to intercept the traffic of another computer. + +ARP cache poisoning can be performed using [Cain & Abel](http://www.oxid.it/cain.html). + + +### Denial-of-Service + +In networks with very high demand, when you reroute traffic, everything transmitted and received by the target system must first go through your analyzer system. This makes your analyzer the bottleneck in the communication process, which can cause a [DoS](http://en.wikipedia.org/wiki/Denial-of-service_attack).
+ +You might be able to avoid all the traffic going through your analyzer system by using a feature called [asymmetric routing](http://www.cisco.com/web/services/news/ts_newsletter/tech/chalktalk/archives/200903.html). + +--- +## Wireless Sniffing + +### The 802.11 Spectrum +The main difference when capturing traffic from a **wireless local area network** (WLAN) is that the wireless spectrum is a **shared medium** (unlike wired networks, where each client has its own cable to the switch). + +A single WLAN occupies a portion of the [802.11 spectrum](http://en.wikipedia.org/wiki/IEEE_802.11), allowing multiple systems to operate in the same physical medium. In the US, 11 channels are available and a WLAN can operate on only one channel at a time (and so can the sniffer). + +However, a technique called **channel hopping** allows quick change between channels to collect data. A tool to perform this is [kismet](https://www.kismetwireless.net/), which can hop up to 10 channels/second. + + + +### Wireless NIC modes + +Wireless network cards can have four modes: + +* **Managed**: when the wireless client connects directly to a wireless access point (WAP). + +* **ad hoc mode**: devices connect directly to each other, sharing the responsibility of a WAP. + +* **Master mode**: the NIC works with specialized software to allow the computer to act as a WAP for other devices. + +* **Monitor**: used to stop transmitting and receiving data, and start listening to the packets flying in the air. + +To access the wireless extensions in Linux you can type: + +``` +$ iwconfig +``` + +To change the interface (for example eth1) to monitor mode, you type: +``` +$ iwconfig eth1 mode monitor +$ ifconfig eth1 up +``` + +To change the channel of the interface: + +``` +$ iwconfig eth1 channel 4 +``` + + diff --git a/README.md b/README.md index 39fb6d9..9965f01 100644 --- a/README.md +++ b/README.md @@ -19,53 +19,66 @@ All in one big bag. For fun, profits, or CTFs.
----- +---- ### Useful Command Line #### Searching - - + + ``` grep word f1 - + sort | uniq -c - + diff f1 f2 - + find -size f1 ``` - - + + #### Compressed Files - - + + ``` zcat f1 > f2 - + gzip -d file - + bzip2 -d f1 - + tar -xvf file ``` - - - + + + #### Connecting to a Server/Port - + ``` echo 4wcYUJFw0k0XLShlDzztnTBHiqxU3b3e | nc localhost 30000 - + openssl s_client -connect localhost:30001 -quiet - + nmap -p 31000-32000 localhost - + telnet localhost 3000 ``` - + + +---- + +## References: + +### Books +- The Tangled Web +- The Art of Exploitation +- The Art of Software Security Assessment +- Practical Packet Analysis + + + ---- ### License diff --git a/Reverse_Engineering/README.md b/Reverse_Engineering/README.md index bfcdae8..ebebf0e 100644 --- a/Reverse_Engineering/README.md +++ b/Reverse_Engineering/README.md @@ -1,6 +1,219 @@ # Reverse Engineering +* Objective: turn an x86 binary executable back into C source code. +* Understand how the compiler turns C into assembly code. +* Understand low-level OS structures and executable file formats.
+--- +## Assembly 101 + +### Arithmetic Instructions + +``` +mov eax,2 ; eax = 2 +mov ebx,3 ; ebx = 3 +add eax,ebx ; eax = eax + ebx +sub ebx, 2 ; ebx = ebx - 2 +``` + +### Accessing Memory + +``` +mov eax, [1234] ; eax = *(int*)1234 +mov ebx, 1234 ; ebx = 1234 +mov eax, [ebx] ; eax = *ebx +mov [ebx], eax ; *ebx = eax +``` + +### Conditional Branches + +``` +cmp eax, 2 ; compare eax with 2 +je label1 ; if(eax==2) goto label1 +ja label2 ; if(eax>2) goto label2 (unsigned) +jb label3 ; if(eax<2) goto label3 (unsigned) +jbe label4 ; if(eax<=2) goto label4 (unsigned) +jne label5 ; if(eax!=2) goto label5 +jmp label6 ; unconditional goto label6 +``` + +### Function calls + +First, calling a function: + +``` +call func ; store return address on the stack and jump to func +``` +Inside the function, the first operation is to save the registers it will modify: +``` +push esi ; save esi +``` + +Right before leaving the function: +``` +pop esi ; restore esi +ret ; read return address from the stack and jump to it +``` +--- + + +## Modern Compiler Architecture + +**C code** --> Parsing --> **Intermediate representation** --> optimization --> **Low-level intermediate representation** --> register allocation --> **x86 assembly** + +### High-level Optimizations + + + +#### Inlining + +For example, the call to foo in: +``` +int foo(int a, int b){ + return a+b; +} +c = foo(a, b+1); +``` +translates to c = a+b+1 + + +#### Loop unrolling + +The loop: +``` +for(i=0; i<2; i++){ + a[i]=0; +} +``` +becomes: +``` +a[0]=0; +a[1]=0; +``` + +#### Loop-invariant code motion + +The loop: +``` +for (i = 0; i < 2; i++) { + a[i] = p + q; +} +``` +becomes: +``` +temp = p + q; +for (i = 0; i < 2; i++) { + a[i] = temp; +} +``` + +#### Common subexpression elimination + +The assignments: +``` +a = b + (z + 1) +p = q + (z + 1) +``` + +become: +``` +temp = z + 1 +a = b + temp +p = q + temp +``` + +#### Constant folding and propagation +The assignments: +``` +a = 3 + 5 +b = a + 1 +func(b) +``` + +become: +``` +func(9) +``` + +#### Dead code elimination + +Delete
unnecessary code: +``` +a = 1 +if (a < 0) { +printf("ERROR!") +} +``` +to +``` +a = 1 +``` + +### Low-Level Optimizations + +#### Strength reduction + +Code such as: +``` +y = x * 2 +y = x * 15 +``` + +becomes: +``` +y = x + x +y = (x << 4) - x +``` +#### Code block reordering + +Code such as: + +``` +if (a < 10) goto l1 +printf("ERROR") +goto l2 +l1: + printf("OK") +l2: + return; +``` +becomes: +``` +if (a >= 10) goto l1 +printf("OK") +l2: +return +l1: +printf("ERROR") +goto l2 +``` + + +#### Register allocation + +* Memory access is slower than register access. +* Try to fit as many local variables as possible in registers. +* The mapping of local variables to stack locations and registers is not constant. + +#### Instruction scheduling + +Assembly code like: +``` +mov eax, [esi] +add eax, 1 +mov ebx, [edi] +add ebx, 1 +``` +becomes: +``` +mov eax, [esi] +mov ebx, [edi] +add eax, 1 +add ebx, 1 +``` + + +--- ## Tools Folder - X86 Win32 Cheat sheet @@ -9,7 +222,7 @@ - Command line tricks - +---- ## Other Tools - gdb @@ -27,18 +240,18 @@ - uncompyle2 (Python) - unpackers, hex editors, compilers - +--- ## Encondings/ Binaries - + ``` file f1 - + ltrace bin - + strings f1 - + base64 -d - + xxd -r nm @@ -47,17 +260,17 @@ objcopy binutils ``` - - + +--- ## Online References [Reverse Engineering, the Book]: http://beginners.re/ - +--- ## IDA - Cheat sheet @@ -65,7 +278,7 @@ binutils - +--- ## gdb - Commands and cheat sheet @@ -88,8 +301,8 @@ set disassembly-flavor intel disas main ``` - -## objdump +--- +## objdump Display information from object files: Where object file can be an intermediate file created during compilation but before linking, or a fully linked executable @@ -98,14 +311,15 @@ created during compilation but before linking, or a fully linked executable $ objdump -d ``` +---- ## hexdump & xxd For canonical hex & ASCII view: ``` -$hexdump -C +$hexdump -C ``` - -## xxd +---- +## xxd Make a hexdump or do the reverse: ``` xxd hello > hello.dump diff --git
a/text.txt b/text.txt new file mode 100644 index 0000000..2d9d6f5 --- /dev/null +++ b/text.txt @@ -0,0 +1,3 @@ +Hello everyone! +Linux is really cool. +Let's learn more!