Steganography challenge - The Book of Secrets


A small steganography challenge illustrating basic tricks used to hide data inside images. This post introduces the challenge, walks you through the solution, and ends by describing how the challenge was created. The solution involves some basic JPG image screening, hexedit surgery, and password cracking with custom wordlists.

Steganography is the practice of hiding information inside other media like images, audio or video files, text, or pretty much anything else. It is different from encryption in that it aims not at making information unreadable but at concealing the very fact that it is there. Steganography and steganalysis (detection of steganography) are long-standing fields of research. Overviews of the field can be found, e.g., in Subhedar/Mankar (2014) or Zielińska/Mazurczyk/Szczypiorski (2014).

Real-world use of steganography is becoming more prevalent. For instance, malware authors use these techniques to conceal their traffic. In 2011, a piece of malware called Duqu was discovered that used JPG steganography to send information back to its command-and-control servers. It illustrates how steganography and cryptography can work hand in hand: the traffic was first encrypted and then embedded into images, which makes the malware very difficult to detect. Researchers expect steganography to become increasingly sophisticated in the future.

In this post, I walk through a very simple example of steganography: an innocent-looking image of a book with some German text on it. Imagine you suspect someone embedded a message in this file. How would you go about finding it?

The post is structured into three parts. In part one, I’ll start by showing you the image. Stop here if you want to try it yourself first. Then, in part two, I’ll walk you through an example solution that shows how the hidden message can be discovered. Finally, in part three, I’ll show you how the image was created.

My walkthrough is based on a toolbox of steganography-related software I created a while ago. You can find it here on GitHub. It is a Docker image that you can run locally, and it contains all the tools used in this post. That said, this post uses no exotic tools, so if you are missing one, apt-cache search should usually turn it up.

The challenge

The challenge here is similar to one I read about recently in this blog post. It originally comes from a CTF at BSides Iowa and was created by SecDSM. I’ve spiced it up slightly to make it a little more difficult. If you want to try this yourself, don’t read that blog post: it spoils most of the challenge.

Look at the following two images. The first one is the original image with no message inside (thx to Pexels for the free image!).

original image

Now here is the second image, which does have a message embedded. Both should render perfectly in your browser. You don’t see a difference, right?

modified image containing a secret message

Now imagine that all you intercepted was this second image, and you suspect there is a message inside. Find evidence of manipulation, then go on and extract the hidden message.

Tried hard enough? Then read on to see how it can be solved.

The solution

Initial screening

The file we are looking at is a JPG image. For such files, I usually start with a script that checks a few basic things: it looks for embedded strings and files and runs several steganography tools with no password or with default passwords. The script can be found here.
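The strings check in such a screening script boils down to finding runs of printable characters in raw bytes, which is roughly what the `strings` utility does. Below is a minimal Python sketch of that idea. The function name `printable_strings` and the default minimum length of 4 are my own choices for illustration, not taken from the linked script.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (0x20-0x7E) that are at least
    min_len bytes long, similar to what the `strings` utility prints."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Calling `printable_strings(open("book-of-secrets.jpg", "rb").read(), min_len=6)` on the challenge file should surface fragments such as `flag.txtUT`, which is exactly the kind of string that looks out of place in a photo.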

Applied to this file, we can spot a few oddities that deserve more attention:

root@a77ff3b227ba:/data# check_jpg book-of-secrets.jpg
[...]
#############################
########## binwalk ##########
#############################

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             JPEG image data, JFIF standard 1.01
99456         0x18480         End of Zip archive
[...]
################################
########## stegdetect ##########
################################
book-of-secrets.jpg : appended(227)<[nonrandom][data][.K........f.?L..]>
[...]
##################################
########## stegoVeritas ##########
##################################
Type:   JPEG (ISO 10918)
Mode:   RGB
[...]
Trailing Data Discovered... Saving
b'\xffK\x03\x04\n\x00\t\x00\x00\x00f\xae?L\xc5\x16\xf7\xc7-\x00\x00\x00!\x00\x00\x00\x08\x00\x1c\x00flag.txtUT\t\x00\x03P:rZP:rZux\x0b\x00\x01\x04\x00\x00\x00\x00\x04\x00\x00\x00\x00\xb6J\xf2\x03\x97\x88\x9cU\xd9\xb84\xc4HA\x13\xa36UW\xb1\x85&=8\xd9\xdd/\xd7n\x91\xba\xf9\xf8\xe5&\xe8\xec\xac\xb6\xd7\xa9\xd8\xea:SPK\x07\x08\xc5\x16\xf7\xc7-\x00\x00\x00!\x00\x00\x00PK\x01\x02\x1e\x03\n\x00\t\x00\x00\x00f\xae?L\xc5\x16\xf7\xc7-\x00\x00\x00!\x00\x00\x00\x08\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81\x00\x00\x00\x00flag.txtUT\x05\x00\x03P:rZux\x0b\x00\x01\x04\x00\x00\x00\x00\x04\x00\x00\x00\x00PK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00N\x00\x00\x00\x7f\x00\x00\x00\x00\x00'

First, binwalk, a forensics tool that searches binary images for embedded files, claims your image contains the end of a ZIP archive. Strangely, it does not find the beginning of the archive, which it normally should. Stegdetect, a tool made to identify steganography in images, reports 227 bytes of appended, non-random data at the end of the image. Finally, stegoVeritas, a Python steganography tool, also finds data after the actual image data and conveniently prints it out. Inside this data, we find a few readable strings, most importantly two occurrences of flag.txt.

Taken together, it appears that a ZIP file was appended to the image. Binwalk claims it found the end of a ZIP file, and ZIP archives store the names of the files they contain in plain text, which matches the flag.txt strings in the bytes printed by stegoVeritas. It seems someone appended a ZIP file and made its beginning unrecognizable. Appending data does not make the image unusable: JPEG image data ends with an end-of-image marker (the bytes FF D9), and image viewers stop reading there. Your browser simply ignores the additional data, which is why both images look exactly the same. So how do we get this ZIP archive out of the image and extract it?
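The end-of-image logic can be sketched in Python as a function that returns whatever follows the JPEG end-of-image marker. This is a minimal sketch assuming a well-formed JPEG; `trailing_data` is a hypothetical helper name. Rather than searching for the first FF D9, it walks the marker segments, because those two bytes can also occur inside segment payloads such as an embedded EXIF thumbnail.

```python
import struct

# Markers without a 2-byte length field: TEM (0x01) and RST0-RST7 (0xD0-0xD7).
STANDALONE = {0x01, *range(0xD0, 0xD8)}


def trailing_data(jpeg: bytes) -> bytes:
    """Return the bytes that follow the JPEG end-of-image marker (FF D9)."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing start-of-image marker FF D8")
    pos, n = 2, len(jpeg)
    while pos + 1 < n:
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:          # fill byte, the real marker follows
            pos += 1
        elif marker == 0xD9:        # EOI: the image ends here
            return jpeg[pos + 2:]
        elif marker in STANDALONE:
            pos += 2
        else:
            # The big-endian length counts itself but not the 2 marker bytes.
            (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
            pos += 2 + length
            if marker == 0xDA:      # SOS: skip the entropy-coded scan data
                # Inside a scan, FF is followed by 00 (byte stuffing) or an
                # RST marker; any other FF xx pair is the next real marker.
                while pos + 1 < n and not (
                    jpeg[pos] == 0xFF
                    and jpeg[pos + 1] != 0x00
                    and not 0xD0 <= jpeg[pos + 1] <= 0xD7
                ):
                    pos += 1
    raise ValueError("no end-of-image marker found")
```

Applied to the bytes of book-of-secrets.jpg, it should return the 227 appended bytes that stegdetect and stegoVeritas flagged.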

Repairing the ZIP archive

There is a forensics tool called foremost that I typically use for file carving (i.e., splitting a concatenation of several files into its parts). It is meant to be used on raw disk images but can be applied to pretty much any file, and it works much like binwalk. The problem is that neither tool currently identifies the beginning of the ZIP file, so running them on the image will not extract the ZIP archive. Let’s think about why these tools don’t work.

These forensic tools look for special signatures called magic numbers, which programs use to quickly identify file types. A comprehensive list can be found on Wikipedia. That list shows that ZIP files typically start with the bytes 50 4B 03 04 (the ASCII characters PK followed by 03 04). In our screening output, the appended data starts with \xffK\x03\x04, and since K is 4B in hex, it starts with FF 4B 03 04. It appears someone corrupted the first magic byte to prevent identification of the ZIP archive.
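A toy version of the signature scan that binwalk and foremost perform might look like the sketch below. The three-entry `SIGNATURES` table is a tiny illustrative subset chosen for this challenge; the real tools ship far larger tables and also validate the headers they find.

```python
# A few file signatures ("magic numbers"); real carving tools know hundreds.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP local file header",
    b"PK\x05\x06": "ZIP end of central directory",
}


def find_signatures(data: bytes) -> list:
    """Return sorted (offset, description) pairs for every signature in data."""
    hits = []
    for magic, name in SIGNATURES.items():
        offset = data.find(magic)
        while offset != -1:
            hits.append((offset, name))
            offset = data.find(magic, offset + 1)
    return sorted(hits)
```

Because the appended data starts with FF 4B 03 04, a scan like this finds the end-of-central-directory record but no local file header, which mirrors what binwalk reported.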

Let’s repair the magic number with a hex editor. Run hexedit book-of-secrets.jpg and change the FF at offset 0x183B3, directly after the JPEG end-of-image marker FF D9, into 50:

00018390   4B 6F B2 BB  34 10 E2 0C  82 96 E7 77  28 B6 A6 DB  Ko..4......w(...
000183A0   85 14 BC C7  0E 09 4D E6  D4 FF 00 31  4B 6A 50 AF  ......M....1KjP.
000183B0   43 FF D9 FF  4B 03 04 0A  00 09 00 00  00 66 AE 3F  C...K........f.?
000183C0   4C C5 16 F7  C7 2D 00 00  00 21 00 00  00 08 00 1C  L....-...!......
000183D0   00 66 6C 61  67 2E 74 78  74 55 54 09  00 03 50 3A  .flag.txtUT...P:
000183E0   72 5A 50 3A  72 5A 75 78  0B 00 01 04  00 00 00 00  rZP:rZux........
000183F0   04 00 00 00  00 B6 4A F2  03 97 88 9C  55 D9 B8 34  ......J.....U..4
00018400   C4 48 41 13  A3 36 55 57  B1 85 26 3D  38 D9 DD 2F  .HA..6UW..&=8../
00018410   D7 6E 91 BA  F9 F8 E5 26  E8 EC AC B6  D7 A9 D8 EA  .n.....&........
00018420   3A 53 50 4B  07 08 C5 16  F7 C7 2D 00  00 00 21 00  :SPK......-...!.
00018430   00 00 50 4B  01 02 1E 03  0A 00 09 00  00 00 66 AE  ..PK..........f.
00018440   3F 4C C5 16  F7 C7 2D 00  00 00 21 00  00 00 08 00  ?L....-...!.....
00018450   18 00 00 00  00 00 01 00  00 00 A4 81  00 00 00 00  ................
00018460   66 6C 61 67  2E 74 78 74  55 54 05 00  03 50 3A 72  flag.txtUT...P:r
00018470   5A 75 78 0B  00 01 04 00  00 00 00 04  00 00 00 00  Zux.............
00018480   50 4B 05 06  00 00 00 00  01 00 01 00  4E 00 00 00  PK..........N...
00018490   7F 00 00 00  00 00                                  ......
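The same one-byte patch can be scripted instead of done by hand in hexedit. This minimal sketch assumes, as in this challenge, that only the first signature byte was overwritten and that the pattern FF 4B 03 04 occurs exactly once in the file; `repair_zip_magic` is a hypothetical name.

```python
CORRUPTED = b"\xffK\x03\x04"  # FF 4B 03 04, as seen at offset 0x183B3 above
ZIP_MAGIC = b"PK\x03\x04"     # 50 4B 03 04, a ZIP local file header


def repair_zip_magic(data: bytes) -> bytes:
    """Return a copy of data with the corrupted ZIP signature restored."""
    offset = data.find(CORRUPTED)
    if offset == -1:
        raise ValueError("corrupted ZIP signature not found")
    if data.find(CORRUPTED, offset + 1) != -1:
        # More than one match: safer to patch the right offset by hand.
        raise ValueError("pattern is ambiguous")
    return data[:offset] + ZIP_MAGIC + data[offset + len(CORRUPTED):]
```

Read book-of-secrets.jpg in binary mode, pass the bytes through `repair_zip_magic`, and write the result to a new file so the original stays untouched.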

Now binwalk can identify the ZIP file properly. We can also use foremost to carve out the ZIP file in a very convenient way: it searches for files and puts the results into a folder you specify:

root@a77ff3b227ba:/data# foremost -i book-of-secrets.jpg -o outdir
Processing: book-of-secrets.jpg
|foundat=flag.txtUT
*|
root@a77ff3b227ba:/data# ls -lah outdir/zip
total 4.0K
drwxr-xr-- 3 root root  96 Jan 30 20:57 .
drwxr-xr-- 5 root root 160 Jan 30 20:57 ..
-rw-r--r-- 1 root root 228 Jan 31 21:32 00000193.zip

That was it. We now have our ZIP file.
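For a single, simple archive, this carving step can be approximated in a few lines: take everything from the first local file header (50 4B 03 04) to the end of the last end-of-central-directory record (50 4B 05 06), which is 22 bytes plus an optional comment. This is a sketch assuming the blob contains exactly one archive, not a description of how foremost is implemented; `carve_zip` is a hypothetical name.

```python
import io
import struct
import zipfile

EOCD_SIZE = 22  # fixed part of the end-of-central-directory record


def carve_zip(data: bytes) -> bytes:
    """Cut one ZIP archive (first local header to end of EOCD) out of data."""
    start = data.find(b"PK\x03\x04")
    eocd = data.rfind(b"PK\x05\x06")
    if start == -1 or eocd == -1 or eocd < start:
        raise ValueError("no complete ZIP archive found")
    # Bytes 20-21 of the EOCD record hold the comment length (little-endian).
    (comment_len,) = struct.unpack("<H", data[eocd + 20:eocd + EOCD_SIZE])
    carved = data[start:eocd + EOCD_SIZE + comment_len]
    # Sanity check: the standard library must recognise the result as a ZIP.
    if not zipfile.is_zipfile(io.BytesIO(carved)):
        raise ValueError("carved bytes do not parse as a ZIP archive")
    return carved
```

Run on the repaired image, this should yield the same archive that foremost wrote to outdir/zip.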

Cracking the ZIP archive

We have a zip file and would like to extract it. Sounds easy. Just run unzip on it:

root@a77ff3b227ba:/data# unzip outdir/zip/00000193.zip
Archive:  outdir/zip/00000193.zip
[outdir/zip/00000193.zip] flag.txt password:
   skipping: flag.txt                incorrect password

It would have been too easy. Unzip asks for a password. We have to find a way to crack it.

The plan is to get a wordlist and use the tool fcrackzip to find the password. There are two ways to go from here: either we use a huge standard wordlist and hope the password is in there, or we build our own. You could try a standard wordlist, but in this case I suspect you would not be successful, so we start directly by building our own.
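Why is a wordlist attack on this archive so fast? `zip --password` uses the traditional PKWARE stream cipher ("ZipCrypto"). Each encrypted entry starts with a 12-byte header whose last decrypted byte must match a known check byte (the high byte of the CRC-32, or of the modification time when flag bit 3 is set). A wrong password therefore fails after decrypting just 12 bytes in about 255 of 256 cases, and the rest are weeded out with the CRC. The fcrackzip output later in this post reports 45 compressed vs 33 uncompressed bytes for flag.txt; the difference is exactly that header. Below is a minimal Python sketch of this principle for a stored (uncompressed) entry. It is not fcrackzip's actual implementation, and the names `ZipCrypto` and `crack` are hypothetical.

```python
import zlib


def crc_step(crc: int, byte: int) -> int:
    """One raw CRC-32 table step (no pre/post inversion), as ZipCrypto uses it."""
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class ZipCrypto:
    """Traditional PKWARE ("ZipCrypto") stream cipher keyed by a password."""

    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self._update(b)

    def _update(self, b: int) -> None:
        self.k0 = crc_step(self.k0, b)
        self.k1 = ((self.k1 + (self.k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc_step(self.k2, self.k1 >> 24)

    def _key_byte(self) -> int:
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def encrypt(self, plain: bytes) -> bytes:
        out = bytearray()
        for p in plain:
            out.append(p ^ self._key_byte())
            self._update(p)  # the keys always advance with the plaintext byte
        return bytes(out)

    def decrypt(self, cipher: bytes) -> bytes:
        out = bytearray()
        for c in cipher:
            p = c ^ self._key_byte()
            self._update(p)
            out.append(p)
        return bytes(out)


def crack(header: bytes, body: bytes, check_byte: int, crc: int, candidates):
    """Try each candidate password against one stored ZipCrypto entry."""
    for word in candidates:
        cipher = ZipCrypto(word.encode())
        if cipher.decrypt(header)[-1] != check_byte:
            continue  # cheap 1-byte filter after decrypting the 12-byte header
        # A wrong password can pass the check byte by chance; confirm via CRC-32.
        if zlib.crc32(cipher.decrypt(body)) == crc:
            return word
    return None
```

In practice you would read the header, body, check byte and CRC from the ZIP headers. Python's own `zipfile` module can also decrypt such archives directly via `ZipFile.read(name, pwd=...)`.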

A good way to come up with a wordlist is to look at the image. Three words are printed prominently on it: “Buch der Geheimnisse”, which is German for “Book of Secrets”. Put them into a file:

root@a77ff3b227ba:/data/# cat wordlist.txt
buch
der
geheimnisse

We assume that the password is somehow based on these words, with some scheme applied to make it more complicated. First, the password could be any of these words or any combination of them. Moreover, we don’t know how the words might be delimited. A small Python script creates an extended wordlist with all permutations of one, two, and three words and a number of different delimiters.

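import itertools

# Read the base words, one per line, skipping blank lines
with open("wordlist.txt") as f:
    words = [line.strip() for line in f if line.strip()]


def append_list(f, combos, delim):
    # Join each combination with the delimiter and write one candidate per line.
    # For single words (plain strings), join() puts the delimiter between the
    # characters, e.g. "b,u,c,h", which adds a few extra candidates.
    for combo in combos:
        f.write(delim.join(combo) + "\n")


with open("wordlist_ext.txt", "w") as f:
    lists = [words,                                   # single words
             list(itertools.permutations(words, 2)),  # ordered pairs
             list(itertools.permutations(words, 3))]  # ordered triples
    delims = ["", " ", ",", ";"]
    for combos in lists:
        for delim in delims:
            append_list(f, combos, delim)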

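The result is a file with 60 passwords: 3 single words, 6 ordered pairs, and 6 ordered triples, each written with 4 delimiters (15 × 4 = 60):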

root@a77ff3b227ba:/data# python2 combine.py
root@a77ff3b227ba:/data# wc -l wordlist_ext.txt
60 wordlist_ext.txt
root@a77ff3b227ba:/data# cat wordlist_ext.txt | head -n 3
buch
der
geheimnisse
root@a77ff3b227ba:/data# cat wordlist_ext.txt | tail -n 3
der;geheimnisse;buch
geheimnisse;buch;der
geheimnisse;der;buch

Since fcrackzip is really fast and our passwords are not very complicated yet, we can extend the wordlist even more. John the Ripper has some nice features for generating password lists. My reference for using it is this post, which nicely illustrates the various modes; in particular, look at the built-in rules. We use the built-in rule set ‘Single’, which generates really large lists:

root@a77ff3b227ba:/data# john -wordlist:`pwd`/wordlist_ext.txt -rules:Single -stdout > `pwd`/wordlist_huge.txt
Press 'q' or Ctrl-C to abort, almost any other key for status
50632p 0:00:00:00 100.00% (2018-01-31 21:38) 632900p/s geheimnisse;der;buch1900
root@a77ff3b227ba:/data# sed -i '1d' wordlist_huge.txt
root@a77ff3b227ba:/data# wc -l wordlist_huge.txt
50632 wordlist_huge.txt
root@a77ff3b227ba:/data# cat wordlist_huge.txt | head -n 4
buch
der
geheimnisse
root@a77ff3b227ba:/data# cat wordlist_huge.txt | tail -n 3
der;geheimnisse;buch1900
geheimnisse;buch;der1900
geheimnisse;der;buch1900
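John's rules take each base word and apply transformations such as case changes and appended numbers (the log above shows years like 1900 being appended). The sketch below applies a small, hypothetical selection of such rules in Python to make the idea concrete; it is not John's actual Single rule set, and `mangle` is an illustrative name.

```python
def mangle(word: str) -> list:
    """Return rule-based variants of word, without duplicates, in rule order."""
    alternating = "".join(
        c.upper() if i % 2 == 0 else c.lower() for i, c in enumerate(word)
    )
    variants = [
        word,
        word.capitalize(),
        word.upper(),
        word.swapcase(),
        alternating,    # alternate case, e.g. "buch" -> "BuCh"
        word[::-1],     # reversed
    ]
    variants += [word + d for d in "0123456789"]            # append one digit
    variants += [word + str(y) for y in range(1900, 2030)]  # append a year
    return list(dict.fromkeys(variants))  # drop duplicates, keep order
```

Feeding every line of wordlist_ext.txt through a function like this multiplies the list in the same way, although John's real rules are far more numerous.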

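We have blown up the list to about 50k passwords. Notice the command sed -i '1d' wordlist_huge.txt directly after the password generation: it is there because my john writes a header line with some status information into the output. Now it’s time to crack. In the fcrackzip call below, -D selects dictionary mode, -p names the wordlist, -u tries each candidate with unzip to weed out false positives, and -v enables verbose output.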

root@a77ff3b227ba:/data# fcrackzip -v -u -D -p ./wordlist_huge.txt ./outdir/zip/00000193.zip
found file 'flag.txt', (size cp/uc     45/    33, flags 9, chk ae66)


PASSWORD FOUND!!!!: pw == BuChGeHeImNiSsE

Password found :) Now we can unzip the file and read the flag:

root@a77ff3b227ba:/data# unzip ./outdir/zip/00000193.zip
Archive:  ./outdir/zip/00000193.zip
[./outdir/zip/00000193.zip] flag.txt password:
 extracting: flag.txt
root@a77ff3b227ba:/data# cat flag.txt
flag{this_was_a_little_too_easy}

How to create the image

Finding the flag took quite some work; creating the challenge is much easier. The following bash script is all you need. It assumes the original image is named buch-der-geheimnisse.jpg and sits in the folder you run the script from:

#!/bin/bash

# create flag
echo "flag{this_was_a_little_too_easy}" > flag.txt

# encrypt with password
zip --password BuChGeHeImNiSsE flag.zip flag.txt

# corrupt magic byte of zip file
printf '\xff' | dd of=flag.zip bs=1 seek=0 count=1 conv=notrunc

# append zip to image
cat buch-der-geheimnisse.jpg flag.zip > book-of-secrets.jpg

The script simply creates a file containing the secret message (you could use any existing file). Then it creates a password-protected ZIP archive. Next, it corrupts the ZIP signature by overwriting its first byte with FF using dd, a low-level tool for copying and converting raw data. Finally, it appends the ZIP archive to the image with cat, creating a new file.
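For readers who prefer Python, the two raw-file steps of the script (patching one byte in place and concatenating files) can be sketched as below; the helper names are hypothetical. The encryption step stays with `zip --password`, because Python's standard `zipfile` module can decrypt traditionally encrypted archives but cannot create them.

```python
def patch_byte(path: str, offset: int, value: int) -> None:
    """Overwrite one byte in place, like dd with bs=1 count=1 conv=notrunc."""
    with open(path, "r+b") as f:  # read/write mode does not truncate the file
        f.seek(offset)
        f.write(bytes([value]))


def concatenate(out_path: str, *in_paths: str) -> None:
    """Write the input files back to back into out_path, like `cat a b > out`."""
    with open(out_path, "wb") as out:
        for path in in_paths:
            with open(path, "rb") as f:
                out.write(f.read())
```

Calling `patch_byte("flag.zip", 0, 0xFF)` and then `concatenate("book-of-secrets.jpg", "buch-der-geheimnisse.jpg", "flag.zip")` mirrors the last two commands of the script.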

That’s it. I hope you enjoyed it as much as I did :)