Cryptography & OSINT - The fundamentals
I am often asked how cryptography fits into Open-Source Intelligence (OSINT). My answer: it depends on what you are investigating and whether you are able to detect a form of encryption.
For this blog post series I have asked for help from my friend Sadie, a former NSA cryptanalyst. She has far more knowledge than I have about this subject matter. This is why we have co-written this blog series, of which this is the first, introductory post.
What is Cryptography?
Cryptography is a method of protecting and/or hiding information through the use of codes (ciphers). Encryption is the application of cryptography. Cryptanalysis is the study of ciphers in order to detect encryption vulnerabilities that could lead to revealing (decrypting) the original message. Using strong encryption (implemented properly), a user can better control who is able to read or use the data in question (even if the answer is only the user themselves).
Cryptography has been used for centuries, with famous examples including the Caesar cipher used by Julius Caesar and the Enigma machine used by Nazi Germany during WWII. Over time, encryption algorithms and applications have evolved alongside technology to protect all kinds of information including but not limited to text, images, audio, and video, traveling over various communication channels (e.g. radio, satellite, internet, phone, fax). As modern science investigates the world of quantum physics, cryptographers and cryptanalysts are also looking into quantum cryptography!
A simple example of cryptography is the substitution cipher, which encrypts a message by replacing the original characters with "other" characters. For example, we can take the word "osintcurious" and encrypt it by going 3 characters up in the alphabet for each letter, which results in "rvlqwfxulrxv".
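The shift described above can be sketched in a few lines of Python. This is a minimal illustration, not a secure cipher: the function name caesar_shift is our own, and the sketch assumes lowercase English letters only, leaving every other character untouched.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each lowercase letter by `shift` positions, wrapping past 'z'."""
    alphabet = string.ascii_lowercase
    result = []
    for ch in text:
        if ch in alphabet:
            result.append(alphabet[(alphabet.index(ch) + shift) % 26])
        else:
            result.append(ch)  # spaces, digits, punctuation stay as they are
    return "".join(result)


ciphertext = caesar_shift("osintcurious", 3)
print(ciphertext)                    # rvlqwfxulrxv
print(caesar_shift(ciphertext, -3))  # osintcurious
```

Decrypting is the same operation with the negative shift, which is also why this cipher is so weak: there are only 25 possible shifts to try.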
Think back to your school days to illustrate another simple example. You’re sitting in a boring lecture so naturally you start passing notes back and forth to your friends. You suddenly realise, if your teacher catches you, she might read the note aloud for the entire class to hear.
To save yourself the embarrassment, you and your friends decide to use the low-level cryptosystem - Pig Latin. If you’re familiar with the Python programming language or new to coding in general, try programming a Pig Latin encrypting and decrypting function.
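If you want something to compare your own attempt with, here is one possible sketch. Pig Latin has several spoken variants, so the rules below are assumptions: the leading consonant cluster moves to the end followed by "ay", and we place a hyphen in front of the moved cluster so that decoding is unambiguous. The function names are our own, "y" is treated as a consonant, and punctuation is not handled.

```python
VOWELS = "aeiou"


def encode_word(word: str) -> str:
    """Move the leading consonant cluster behind a hyphen and add 'ay'."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return f"{word[i:]}-{word[:i]}ay"
    return f"-{word}ay"  # no vowel at all: the whole word is the cluster


def decode_word(word: str) -> str:
    """Undo encode_word: drop 'ay' and put the cluster back in front."""
    stem, _, suffix = word.rpartition("-")
    return suffix[:-2] + stem


def pig_latin(text: str) -> str:
    return " ".join(encode_word(w) for w in text.lower().split())


def from_pig_latin(text: str) -> str:
    return " ".join(decode_word(w) for w in text.split())


secret = pig_latin("meet me after class")
print(secret)                  # eet-may e-may after-ay ass-clay
print(from_pig_latin(secret))  # meet me after class
```

Without the hyphen, a decoder cannot tell where the moved cluster begins, which is a nice first lesson in why a cryptosystem needs an agreed, reversible rule set.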
Why does Cryptography matter to OSINT?
OSINT investigators sometimes end up finding content that may appear random, garbled, or even fully intact but simply out of place. Suppose Criminal A sends a picture to Criminal B. The picture seems to show an innocent cat. Compared with all other communications between Criminal A and B, this picture stands out and doesn't look like any other message the two have shared. This might imply that they are trying to "hide" a "message" in the cat picture that we can't see. But what if the OSINT investigator tries to decrypt the picture?
Image with Hidden Message:
Understanding how different file types (e.g. jpg, png, pdf) structure and store their data can narrow down the search for potential steganography (steg) opportunities. NVISO Labs highlights an easy and effective way to inject hidden messages into JPEG images: simply insert the message after all of the required JPEG structures and fields. We used this idea to create our stegged image above.
When you try to view the contents of an image file (e.g. by running the UNIX less command or opening the file in a text editor), it looks pretty much unreadable. That’s because these files are binary files. You’re probably thinking “if I can’t even read the file, how am I supposed to insert a message into it?”, and that’s an excellent question. While there are many ways of interacting with binary files, the most intuitive and user-friendly seems to be the Python programming language. Python has a built-in bytes type whose objects can be manipulated using Python’s built-in functions.
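A minimal sketch of the append-after-the-image idea, using only Python bytes objects, could look like this. It is an illustration under stated assumptions rather than the exact code we ran: the function names and the file names in the comments are hypothetical, the message is assumed to be UTF-8 text, and the short fake_jpeg value is a stand-in so the sketch runs without a real picture.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker: the first two bytes of a JPEG
JPEG_EOI = b"\xff\xd9"  # end-of-image marker: closes the image data


def hide_message(jpeg_bytes: bytes, message: str) -> bytes:
    """Return a copy of the JPEG with the message appended after the image."""
    if not jpeg_bytes.startswith(JPEG_SOI):
        raise ValueError("input does not start with the JPEG SOI marker")
    return jpeg_bytes + message.encode("utf-8")


def reveal_message(stego_bytes: bytes) -> str:
    """Return whatever follows the last EOI marker, decoded as text."""
    end = stego_bytes.rfind(JPEG_EOI)
    if end == -1:
        raise ValueError("no JPEG EOI marker found")
    return stego_bytes[end + len(JPEG_EOI):].decode("utf-8", errors="replace")


# Stand-in bytes (not a viewable image) so the sketch is self-contained.
fake_jpeg = JPEG_SOI + b"\xff\xe0" + b"image data" + JPEG_EOI
stego = hide_message(fake_jpeg, "meet at noon")
print(reveal_message(stego))  # meet at noon

# With real files (hypothetical names), the same functions would be used as:
#   from pathlib import Path
#   original = Path("cat.jpg").read_bytes()
#   Path("cat_steg.jpg").write_bytes(hide_message(original, "meet at noon"))
```

Image viewers generally stop rendering at the end-of-image marker, so the trailing bytes do not change what the picture looks like.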
Now we should have the original picture as well as a new steg picture. Using the UNIX file and diff commands, we can confirm that both pictures are JPEGs and that they differ.
To see the difference between the two files, we can use the UNIX diff command again but this time we’ll use it on the output from the UNIX command xxd to get a more human-readable format.
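If you prefer to stay in Python, the same comparison can be approximated with a short sketch. The helper names first_difference and hexdump are our own, the output layout only loosely imitates xxd, and the byte strings are stand-ins for the two picture files.

```python
def first_difference(a: bytes, b: bytes):
    """Return the first offset where a and b differ, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one input is a prefix of the other
    return None


def hexdump(data: bytes, start: int = 0, width: int = 16) -> str:
    """Render bytes from `start` as offset, hex, and printable-ASCII columns."""
    pad = width * 3
    lines = []
    for offset in range(start, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{pad}} {text_part}")
    return "\n".join(lines)


# Stand-in bytes for the original and the steg picture.
original = b"\xff\xd8\xff\xe0" + b"image data" + b"\xff\xd9"
stego = original + b"meet at noon"

offset = first_difference(original, stego)
print(offset)                  # 16
print(hexdump(stego, offset))  # hex view of everything after the original data
```

Because the message was appended, the first difference sits exactly at the length of the original file, and everything from that offset onward is the hidden text.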
Success! We’ve created two JPEG images that appear the same but in fact are different with one containing a secret message. Note that if we had instead prepended our hidden message to the original, we would no longer have a JPEG file.
The example above is a very basic version of how information (text in this case) can be hidden in another piece of information (a picture).
This specific technique is called steganography. The name comes from the Greek words steganos, which means "hidden" or "covered", and graphein, which means "to write": hidden writing.
Side note by Sadie: "although we've found steg, this alone doesn't mean there is malicious intent. Encryption/steg != crime"
To validate files that are supposed to be identical, it is good tradecraft to generate a cryptographic hash value of each file. If all the files are exactly the same, the cryptographic hash value for each file will be identical. If not, you know that a file may have been tampered with, or at least does not contain exactly the same content.
As an example, take 3 files found in different places on the internet that should all have exactly the same content. We have uploaded the 3 JSON files here so that you can replicate the steps below:
All the files have the exact same file size (3 KB) and are in the same format (JSON). But before we open these files in a code editor for visual analysis, we can simply check the file integrity by generating a hash value for each file. If the content of all three files is exactly the same, we should get 3 matching hash values.
You can generate a hash value with terminal/shell commands, or you can use CyberChef, a tool for turning one form of data into something else or extracting data based upon recipes.
For this example we will use CyberChef and make a recipe that generates a SHA2 hash value of a file that you load.
File1.json > SHA2 hash value = 2705d674acfe31c3749d7eda2de9f6988704cef2954f7e4f17457a649940d62a2623afe6441b759f9891ee953bc9bf66e75f97cbff63ab64ac0dfd35a454ae2a
File2.json > SHA2 hash value =
File3.json > SHA2 hash value =
Clearly the hash value of file2.json does not match the hash values of file1.json and file3.json. This means that the actual data inside file2.json is different from the other two files.
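The same check can be scripted instead of done by hand. The sketch below uses Python's standard hashlib module and assumes SHA-512, since the 128-character hexadecimal value shown for File1.json has the length of a SHA-512 digest. The function names are our own, and the file contents are short stand-ins, not the real JSON files.

```python
import hashlib
from collections import defaultdict


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of the data as 128 hex characters."""
    return hashlib.sha512(data).hexdigest()


def group_by_hash(files: dict) -> dict:
    """Map each digest to the list of file names that share it."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[sha512_hex(content)].append(name)
    return dict(groups)


# Stand-in contents; with real files you could read the bytes with
# pathlib.Path(name).read_bytes() instead.
files = {
    "file1.json": b'{"name": "example", "value": 1}',
    "file2.json": b'{"name": "example", "value": 2}',
    "file3.json": b'{"name": "example", "value": 1}',
}

for digest, names in group_by_hash(files).items():
    print(digest[:16], names)  # files listed together have identical content
```

Any file that ends up in a group of its own differs from the others by at least one byte, even when the name and size look the same.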
Why is this important?
If you download files that should be exactly the same but the hash values are different, this means that the data inside the files has changed. There could be a harmless reason: someone may have edited the document to fix a typo, for example. The filename and file size could still be the same, but the content isn't. Alternatively, someone may have edited a document by adding false information or a malicious link or payload. This is why it is good tradecraft to always check the file integrity of two or more files that are supposed to be the same.
Why should OSINT analysts and cryptanalysts work together?
OSINT analysts may have encrypted material lying around, disguised as garbled text, seemingly innocuous images, or completely random-looking data. Cryptanalysts may be able to help by identifying the cipher, narrowing down potential attack vectors for decryption, and suggesting auxiliary information that may be useful for decryption (algorithm type, key space, whether a human-entered password is involved, etc.). OSINT analysts, in turn, can help legally obtain and provide the data as well as the auxiliary information needed to crack an encryption scheme. While encryption isn’t a crime, cryptanalysts have traditionally gained access to “suspect” data when it was delivered by government entities.