Steganography is the practice of hiding information inside other media like images, audio or video files, text, or pretty much anything else. It is different from encryption in that it aims not at making information unreadable but at concealing the very fact that it is there. Steganography and steganalysis (detection of steganography) are long-standing fields of research. Overviews of the field can be found, e.g., in Subhedar/Mankar (2014) or Zielińska/Mazurczyk/Szczypiorski (2014).
Real-world use of steganography is becoming more prevalent. For instance, malware authors use these techniques to conceal their traffic. In 2011, a piece of malware called Duqu was discovered that used JPG steganography to transmit information back to its control nodes. It illustrates how steganography and cryptography can work hand in hand: the traffic was first encrypted and then embedded into images, which makes detecting the malware very difficult. Researchers expect steganography to become more and more sophisticated in the future.
In this post, I walk through a very simple example of steganography. There is an innocent-looking image of a book with some German text on it. Imagine you suspect someone embedded a message into this file. How would you go about finding the message?
The post is structured into three parts. In part one, I’ll start by showing you the image. Stop there if you want to try it yourself first. Then, in part two, I’ll walk you through one possible solution to show how the hidden message can be discovered. Finally, in part three, I’ll show you how the image was created.
My walkthrough is based on a toolbox of steganography-related software I created a while ago.
You can find it here on GitHub.
It’s a Docker image that you can run locally, and it contains all the tools used in this post.
However, this post uses no exotic tools, so if you are missing something, apt-cache search will usually find it.
The challenge
The challenge here is similar to one I read about recently in this blog post. It was originally part of a CTF at BSides Iowa and was created by SecDSM. I’ve spiced it up slightly to make it a little more difficult. If you want to try this yourself, don’t read that blog post yet, as it spoils most of the challenge.
Look at the following two images. The first one is the original image with no message inside (thanks to Pexels for the free image!).
Now here is the second image, which does have a message embedded. Both should render perfectly in your browser. You don’t see a difference, right?
Now imagine that all you had intercepted was this second image, and you suspect that there is a message inside. Go and find evidence of manipulation. Then go on and extract the hidden message.
…
Tried hard enough? Then read on to see how it can be solved.
The solution
Initial screening
The file we are looking at is a JPG image. For these files, I usually start with a script that checks a few basic things: it looks for embedded strings or other files and tries some steganography tools with no password or with default passwords. The script can be found here.
Applied to this file, we can spot a few oddities that deserve more attention:
root@a77ff3b227ba:/data# check_jpg book-of-secrets.jpg
[...]
#############################
########## binwalk ##########
#############################
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 JPEG image data, JFIF standard 1.01
99456 0x18480 End of Zip archive
[...]
################################
########## stegdetect ##########
################################
book-of-secrets.jpg : appended(227)<[nonrandom][data][.K........f.?L..]>
[...]
##################################
########## stegoVeritas ##########
##################################
Type: JPEG (ISO 10918)
Mode: RGB
[...]
Trailing Data Discovered... Saving
b'\xffK\x03\x04\n\x00\t\x00\x00\x00f\xae?L\xc5\x16\xf7\xc7-\x00\x00\x00!\x00\x00\x00\x08\x00\x1c\x00flag.txtUT\t\x00\x03P:rZP:rZux\x0b\x00\x01\x04\x00\x00\x00\x00\x04\x00\x00\x00\x00\xb6J\xf2\x03\x97\x88\x9cU\xd9\xb84\xc4HA\x13\xa36UW\xb1\x85&=8\xd9\xdd/\xd7n\x91\xba\xf9\xf8\xe5&\xe8\xec\xac\xb6\xd7\xa9\xd8\xea:SPK\x07\x08\xc5\x16\xf7\xc7-\x00\x00\x00!\x00\x00\x00PK\x01\x02\x1e\x03\n\x00\t\x00\x00\x00f\xae?L\xc5\x16\xf7\xc7-\x00\x00\x00!\x00\x00\x00\x08\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81\x00\x00\x00\x00flag.txtUT\x05\x00\x03P:rZux\x0b\x00\x01\x04\x00\x00\x00\x00\x04\x00\x00\x00\x00PK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00N\x00\x00\x00\x7f\x00\x00\x00\x00\x00'
First, binwalk, a forensics tool that searches binary images for embedded files, claims your image contains the end of a ZIP archive.
Strangely, it does not find the beginning of the archive, although it usually would.
Stegdetect, a tool made to identify steganography in images, found some non-random data at the end of the image.
Finally, stegoVeritas, a steganography tool written in Python, also found some data after the actual image data.
It conveniently prints out this data too.
Inside this data, we find a few readable strings, most importantly two occurrences of flag.txt.
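Pulling printable strings out of raw bytes is easy to do yourself. The following is a minimal Python sketch in the spirit of the Unix strings tool; the function name printable_runs is my own, and the minimum run length of 4 mirrors the default of GNU strings.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters, like `strings`."""
    # printable ASCII ranges from 0x20 (space) to 0x7E (~)
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


if __name__ == "__main__":
    # the first bytes of the trailing data printed by stegoVeritas above
    trailing = (b"\xffK\x03\x04\n\x00\t\x00\x00\x00f\xae?L\xc5\x16\xf7\xc7-\x00\x00\x00!"
                b"\x00\x00\x00\x08\x00\x1c\x00flag.txtUT\t\x00\x03P:rZP:rZux\x0b\x00")
    print(printable_runs(trailing))
```

On those bytes, flag.txtUT shows up among the results; the trailing UT is the ID of the ZIP "extended timestamp" extra field that follows the filename in the header.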
Taken together, it appears that a ZIP file was appended to the image. Binwalk claims it found the end of a ZIP file, and the filenames inside a ZIP archive are stored in plain text, just like the flag.txt strings in the bytes printed by stegoVeritas. It seems someone appended a ZIP file and made its beginning unreadable. Because a JPEG decoder stops reading at the end-of-image marker (FF D9), appending data after it does not make the image unusable. Your browser simply ignores the additional data, which is why both images look exactly the same. How do we get this ZIP archive out of the image and extract it?
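To check for appended data yourself, you can walk the JPEG marker segments up to the end-of-image marker and look at whatever follows. The sketch below assumes a well-formed JPEG and handles only the common marker types; it is an illustration, not how stegdetect or stegoVeritas work internally. Walking the segments matters because simply searching for the last FF D9 could be fooled by appended data that happens to contain those two bytes.

```python
import struct
import sys


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the JPEG end-of-image (EOI) marker FF D9.

    Walks the marker segments instead of searching for the last FF D9,
    because appended data may contain those two bytes as well.
    Assumes a well-formed JPEG; only the common marker types are handled.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte in front of a marker
            i += 1
        elif marker == 0xD9:  # EOI: end of image
            return i + 2
        elif marker == 0x01 or 0xD0 <= marker <= 0xD7:
            i += 2  # standalone markers have no length field
        else:
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            i += 2 + length  # the length includes its own two bytes
            if marker == 0xDA:  # SOS: skip the entropy-coded scan data
                # in scan data, FF is followed by 00 (byte stuffing) or RST0-RST7
                while i + 1 < len(data) and not (
                    data[i] == 0xFF
                    and data[i + 1] != 0x00
                    and not 0xD0 <= data[i + 1] <= 0xD7
                ):
                    i += 1
    raise ValueError("no EOI marker found")


def trailing_data(data: bytes) -> bytes:
    """Return whatever is stored after the end of the JPEG image."""
    return data[jpeg_end_offset(data):]


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        extra = trailing_data(f.read())
    print(f"{len(extra)} bytes after EOI, starting with {extra[:8]!r}")
```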
Repairing the ZIP archive
There is a forensics tool called foremost that I typically use for file carving (i.e., splitting a concatenation of several files into its parts). It is meant to be used on raw disk images but can be applied to pretty much any file, and it works similarly to binwalk. The problem is that these tools currently don’t identify the beginning of the ZIP file, so running them on the image will not extract the ZIP archive. Let’s think about why these tools don’t work.
These forensic tools look for special signatures called magic numbers, which tools such as the file command use to quickly identify file types.
A comprehensive list can be found on Wikipedia.
You can see in that list that ZIP files typically start with the bytes 50 4B 03 04.
In our screening output, we saw that our data starts with \xffK\x03\x04, and since K is 4B in hex, it starts with FF 4B 03 04.
Thus, it appears someone corrupted the magic bytes to prevent identification of the ZIP archive.
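A signature scan in the style of binwalk can be sketched with a few byte searches. The table below lists the four ZIP record signatures defined in PKWARE’s APPNOTE; find_signatures is a hypothetical helper, and real carving tools check far more than the signature itself.

```python
import sys

# ZIP record signatures from PKWARE's APPNOTE (all start with "PK")
ZIP_SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x07\x08": "data descriptor",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory record",
}


def find_signatures(data: bytes) -> list:
    """Return sorted (offset, description) pairs for every ZIP signature in data."""
    hits = []
    for signature, description in ZIP_SIGNATURES.items():
        offset = data.find(signature)
        while offset != -1:
            hits.append((offset, description))
            offset = data.find(signature, offset + 1)
    return sorted(hits)


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for offset, description in find_signatures(f.read()):
            print(f"{offset:#x} {description}")
```

On the trailing bytes printed by stegoVeritas, this finds a data descriptor, a central directory header and the end-of-central-directory record, but no local file header, because the first byte of PK\x03\x04 was replaced by FF. That mirrors binwalk reporting only the end of the archive.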
Let’s repair the magic numbers with a hex editor.
Use hexedit book-of-secrets.jpg to change FF into 50.
00018390 4B 6F B2 BB 34 10 E2 0C 82 96 E7 77 28 B6 A6 DB Ko..4......w(...
000183A0 85 14 BC C7 0E 09 4D E6 D4 FF 00 31 4B 6A 50 AF ......M....1KjP.
000183B0 43 FF D9 FF 4B 03 04 0A 00 09 00 00 00 66 AE 3F C...K........f.?
000183C0 4C C5 16 F7 C7 2D 00 00 00 21 00 00 00 08 00 1C L....-...!......
000183D0 00 66 6C 61 67 2E 74 78 74 55 54 09 00 03 50 3A .flag.txtUT...P:
000183E0 72 5A 50 3A 72 5A 75 78 0B 00 01 04 00 00 00 00 rZP:rZux........
000183F0 04 00 00 00 00 B6 4A F2 03 97 88 9C 55 D9 B8 34 ......J.....U..4
00018400 C4 48 41 13 A3 36 55 57 B1 85 26 3D 38 D9 DD 2F .HA..6UW..&=8../
00018410 D7 6E 91 BA F9 F8 E5 26 E8 EC AC B6 D7 A9 D8 EA .n.....&........
00018420 3A 53 50 4B 07 08 C5 16 F7 C7 2D 00 00 00 21 00 :SPK......-...!.
00018430 00 00 50 4B 01 02 1E 03 0A 00 09 00 00 00 66 AE ..PK..........f.
00018440 3F 4C C5 16 F7 C7 2D 00 00 00 21 00 00 00 08 00 ?L....-...!.....
00018450 18 00 00 00 00 00 01 00 00 00 A4 81 00 00 00 00 ................
00018460 66 6C 61 67 2E 74 78 74 55 54 05 00 03 50 3A 72 flag.txtUT...P:r
00018470 5A 75 78 0B 00 01 04 00 00 00 00 04 00 00 00 00 Zux.............
00018480 50 4B 05 06 00 00 00 00 01 00 01 00 4E 00 00 00 PK..........N...
00018490 7F 00 00 00 00 00 ......
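If you prefer not to edit the file by hand, the same one-byte repair can be scripted. This sketch assumes the first occurrence of FF 4B 03 04 at or after a given start offset is the damaged header; in general, you would start the search after the JPEG end-of-image marker, since image data could contain the same bytes. It writes the result to a new file, book-of-secrets-fixed.jpg, a name chosen only for illustration.

```python
import sys

CORRUPTED = b"\xffK\x03\x04"  # FF 4B 03 04, as seen in the hexdump
ZIP_MAGIC = b"PK\x03\x04"     # 50 4B 03 04, a ZIP local file header


def repair_zip_magic(data: bytes, start: int = 0) -> bytes:
    """Replace the first FF 4B 03 04 at or after `start` with 50 4B 03 04."""
    offset = data.find(CORRUPTED, start)
    if offset == -1:
        raise ValueError("damaged ZIP header not found")
    # only the first byte differs, so the total length stays the same
    return data[:offset] + ZIP_MAGIC + data[offset + len(ZIP_MAGIC):]


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:  # e.g. book-of-secrets.jpg
        fixed = repair_zip_magic(f.read())
    with open("book-of-secrets-fixed.jpg", "wb") as f:  # hypothetical output name
        f.write(fixed)
```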
Now binwalk can properly identify the file again. We can also use foremost to carve out the ZIP file very conveniently. It searches for files and puts the results into a folder you specify:
root@a77ff3b227ba:/data# foremost -i book-of-secrets.jpg -o outdir
Processing: book-of-secrets.jpg
|foundat=flag.txtUT
*|
root@a77ff3b227ba:/data# ls -lah outdir/zip
total 4.0K
drwxr-xr-- 3 root root 96 Jan 30 20:57 .
drwxr-xr-- 5 root root 160 Jan 30 20:57 ..
-rw-r--r-- 1 root root 228 Jan 31 21:32 00000193.zip
That was it. We now have our ZIP file.
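Because the archive was simply appended, carving it by hand is also straightforward once the header is repaired: everything from the first PK\x03\x04 to the end of the file is the ZIP. The sketch below does exactly that and then lists the members with Python’s zipfile module, which reads the central directory without needing the password. The helper names are hypothetical, and the approach assumes the image data itself contains no PK\x03\x04 and that nothing follows the archive.

```python
import io
import sys
import zipfile


def carve_zip(data: bytes) -> bytes:
    """Return everything from the first ZIP local file header to the end of data."""
    offset = data.find(b"PK\x03\x04")
    if offset == -1:
        raise ValueError("no ZIP local file header found")
    return data[offset:]


def list_members(zip_bytes: bytes) -> list:
    """Return (name, is_encrypted) for each member, read from the central directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # bit 0 of the general purpose flag marks an encrypted member
        return [(info.filename, bool(info.flag_bits & 0x1)) for info in zf.infolist()]


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:  # the repaired image
        archive = carve_zip(f.read())
    print(list_members(archive))
```

For the repaired challenge image, flag.txt should be reported as encrypted, since its general purpose flag is 09 00 in the hexdump.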
Cracking the ZIP archive
We have a zip file and would like to extract it. Sounds easy. Just run unzip on it:
root@a77ff3b227ba:/data# unzip outdir/zip/00000193.zip
Archive: outdir/zip/00000193.zip
[outdir/zip/00000193.zip] flag.txt password:
skipping: flag.txt incorrect password
That would have been too easy. Unzip asks for a password, so we have to find a way to crack it.
The plan is to get a wordlist and use the tool fcrackzip to find the password. There are two ways to go from here: either we use a huge standard wordlist and hope the password is in it, or we build our own. You could try a standard wordlist, but in this case, I suspect you would not be successful, so we start directly by building our own.
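To see what fcrackzip does in dictionary mode, here is a much slower pure-Python sketch of the same idea: try each candidate password until one decrypts every member. It relies on zipfile’s support for decrypting the legacy ZipCrypto scheme that zip --password produces (zipfile cannot handle AES-encrypted archives); crack_zip is a hypothetical helper name. A wrong password is usually rejected by zipfile’s one-byte password check, and the rare false positives are caught by the CRC check when the data is read.

```python
import sys
import zipfile
import zlib
from typing import Iterable, Optional


def crack_zip(zip_path: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that decrypts every member, or None."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        for password in candidates:
            try:
                for name in names:
                    # read() decrypts and verifies the CRC of the whole member
                    zf.read(name, pwd=password.encode("utf-8"))
            except (RuntimeError, zipfile.BadZipFile, zlib.error, EOFError):
                # wrong password: bad check byte, CRC mismatch or corrupt data
                continue
            return password
    return None


if __name__ == "__main__":
    with open(sys.argv[2]) as f:  # a wordlist, one candidate per line
        words = [line.rstrip("\n") for line in f]
    print(crack_zip(sys.argv[1], words))
```

Saved as a hypothetical crack.py, it would be run as python3 crack.py outdir/zip/00000193.zip followed by the path of a wordlist. Expect it to be far slower than fcrackzip, which is written in C and optimized for exactly this job.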
The way to come up with a wordlist is to look at the image. Three big words are visible in a prominent location: “Buch der Geheimnisse”, which is German for “Book of Secrets”. Put them into a file:
root@a77ff3b227ba:/data/# cat wordlist.txt
buch
der
geheimnisse
We assume that the password is somehow based on these words, but with some scheme applied to make it more complicated. First, the password could be any of these words or any combination of them. Moreover, we don’t know how the words might be delimited. A small Python script creates an extended wordlist with all permutations and a number of different delimiters.
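import itertools

# read the base words, one per line
with open("wordlist.txt") as f:
    words = [line.strip() for line in f if line.strip()]

def append_list(f, combos, delim):
    # each combo is a tuple of words; join them with the given delimiter
    for combo in combos:
        f.write(delim.join(combo) + "\n")

with open("wordlist_ext.txt", "w") as f:
    # all ordered selections of 1, 2 and 3 words (single words as 1-tuples,
    # so join() never splits a word into its characters)
    lists = [list(itertools.permutations(words, n)) for n in (1, 2, 3)]
    delims = ["", " ", ",", ";"]
    for combos in lists:
        for delim in delims:
            append_list(f, combos, delim)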
The result is a file with 60 passwords:
root@a77ff3b227ba:/data# python2 combine.py
root@a77ff3b227ba:/data# wc -l wordlist_ext.txt
60 wordlist_ext.txt
root@a77ff3b227ba:/data# cat wordlist_ext.txt | head -n 3
buch
der
geheimnisse
root@a77ff3b227ba:/data# cat wordlist_ext.txt | tail -n 3
der;geheimnisse;buch
geheimnisse;buch;der
geheimnisse;der;buch
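The count of 60 follows from the structure of the script: for k = 1, 2, 3 it writes every ordered selection of k out of the 3 words, once per delimiter, and there are 4 delimiters:

$$
4 \cdot \bigl(P(3,1) + P(3,2) + P(3,3)\bigr) = 4 \cdot (3 + 6 + 6) = 60,
\qquad P(n,k) = \frac{n!}{(n-k)!}
$$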
Since fcrackzip is really fast and the passwords are not very complicated yet, we can extend the wordlist even more. John the Ripper has some nice features for generating password lists. My reference for using it is this post, which nicely illustrates the various modes; look in particular at the built-in rules. We use the built-in rule set ‘Single’, which generates really large lists:
root@a77ff3b227ba:/data# john -wordlist:`pwd`/wordlist_ext.txt -rules:Single -stdout > `pwd`/wordlist_huge.txt
Press 'q' or Ctrl-C to abort, almost any other key for status
50632p 0:00:00:00 100.00% (2018-01-31 21:38) 632900p/s geheimnisse;der;buch1900
root@a77ff3b227ba:/data# sed -i '1d' wordlist_huge.txt
root@a77ff3b227ba:/data# wc -l wordlist_huge.txt
50632 wordlist_huge.txt
root@a77ff3b227ba:/data# cat wordlist_huge.txt | head -n 4
buch
der
geheimnisse
root@a77ff3b227ba:/data# cat wordlist_huge.txt | tail -n 3
der;geheimnisse;buch1900
geheimnisse;buch;der1900
geheimnisse;der;buch1900
We have blown up the list to about 50k passwords.
Notice the line sed -i '1d' wordlist_huge.txt directly after password generation.
It deletes the first line of the file, because my john writes a line of status information at the top of its output.
It’s time to crack.
root@a77ff3b227ba:/data# fcrackzip -v -u -D -p ./wordlist_huge.txt ./outdir/zip/00000193.zip
found file 'flag.txt', (size cp/uc 45/ 33, flags 9, chk ae66)
PASSWORD FOUND!!!!: pw == BuChGeHeImNiSsE
Password found :) Now we can unzip the file and read the flag:
root@a77ff3b227ba:/data# unzip ./outdir/zip/00000193.zip
Archive: ./outdir/zip/00000193.zip
[./outdir/zip/00000193.zip] flag.txt password:
extracting: flag.txt
root@a77ff3b227ba:/data# cat flag.txt
flag{this_was_a_little_too_easy}
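The password explains why the John step was necessary: it is buchgeheimnisse, one of the two-word combinations from the extended list, with alternating upper and lower case. A toy version of such word mangling is sketched below. The transformations (a few case variants plus the suffixes 1, 123 and 1900) are my own choice for illustration and are not John’s actual Single rule set.

```python
from itertools import product


def alternate_case(word: str) -> str:
    """Upper-case every other character, starting with the first one."""
    return "".join(c.upper() if i % 2 == 0 else c.lower() for i, c in enumerate(word))


def mangle(word: str, suffixes=("", "1", "123", "1900")) -> list:
    """Return a few mangled variants of word (illustrative rules, not John's)."""
    cases = [word, word.lower(), word.upper(), word.capitalize(), alternate_case(word)]
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(c + s for c, s in product(cases, suffixes)))


if __name__ == "__main__":
    print(mangle("buchgeheimnisse"))
```

Even this tiny rule set turns one base word into 16 distinct candidates, which is why the John step inflated 60 lines into tens of thousands.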
How to create the image
Finding the flag was quite some work.
Creating the challenge is much easier.
The following bash script is all you need.
It assumes the original image is named buch-der-geheimnisse.jpg and is located in the folder you run the script from:
#!/bin/bash
# create flag
echo "flag{this_was_a_little_too_easy}" > flag.txt
# encrypt with password
zip --password BuChGeHeImNiSsE flag.zip flag.txt
# corrupt magic byte of zip file
printf '\xff' | dd of=flag.zip bs=1 seek=0 count=1 conv=notrunc
# append zip to image
cat buch-der-geheimnisse.jpg flag.zip > book-of-secrets.jpg
The script first creates a file with a secret message (you could use any existing file). Then it creates a password-protected ZIP archive. Next, it overwrites the first magic byte of the ZIP archive using dd, a low-level copying tool. Finally, it appends the ZIP archive to the image with cat, creating a new file.
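The last two steps do not require dd and cat. The following sketch performs the same corruption and concatenation in Python; build_challenge is a hypothetical helper. The password-protected ZIP still has to come from zip, because Python’s zipfile module cannot create encrypted archives.

```python
def build_challenge(image: bytes, archive: bytes) -> bytes:
    """Corrupt the ZIP magic byte and append the archive to the image."""
    if not archive.startswith(b"PK\x03\x04"):
        raise ValueError("archive does not start with a ZIP local file header")
    # like: printf '\xff' | dd of=flag.zip bs=1 seek=0 count=1 conv=notrunc
    corrupted = b"\xff" + archive[1:]
    # like: cat buch-der-geheimnisse.jpg flag.zip > book-of-secrets.jpg
    return image + corrupted


if __name__ == "__main__":
    with open("buch-der-geheimnisse.jpg", "rb") as f:
        image = f.read()
    with open("flag.zip", "rb") as f:  # created with zip --password beforehand
        archive = f.read()
    with open("book-of-secrets.jpg", "wb") as f:
        f.write(build_challenge(image, archive))
```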
That was it. Hope you liked it as much as I did :)