Searching for OP_RETURNs in a haystack

What OP_RETURN messages in Bitcoin are, how to find them using Bitcoin Core, and how all of this relates to the kidnapping (and alleged murder) of the wife of Tom Hagen, a Norwegian billionaire.

A haystack, if you apply a generous amount of goodwill. Photo by Torkel Rogstad

In the winter of 2019, a bizarre and intriguing case broke in Norwegian media: Anne-Elisabeth Hagen, the wife of Norwegian billionaire businessman Tom Hagen, had been kidnapped, the police had been working in secrecy for months trying to find her, and the kidnappers were demanding payment in Monero, a cryptocurrency.

Later, the case took a couple more interesting twists: the husband was charged with the murder (the police are still investigating him and no verdict is out at the time of writing), and it turns out that the kidnappers, Tom Hagen and the police have been communicating through Bitcoin transactions. I’ve been interviewed in a major Norwegian newspaper about this, which you can read here.

There have been two types of communication over Bitcoin. Both are awful for all parties involved and should not be used for either illicit or legal activities.

Shared addresses and predetermined amounts

The first way of communicating is very simple: the kidnappers left a note specifying a Bitcoin address to be used for communication, along with a list of Bitcoin amounts, each with a corresponding “code”. For example, sending 0.0007 BTC to the communication address meant “in a rush, quick or she will die”, and sending 0.03 BTC meant “I’ll send Monero in seven days”. This communication platform is awful for a few different reasons:

  1. It’s one-way communication with a limited vocabulary. As you can tell, the only messages the parties can exchange with this setup are the ones the kidnappers decided on when they left the letter.
  2. It’s not private. Anyone can inspect the Bitcoin blockchain, and see what the parties are saying to each other. The list of codes from the letter quickly ended up on the internet, so all communication is known to the public.
  3. It’s repudiable, meaning neither party can prove that the sender of a message is who they claim to be. Right now I (or any other person) can send a message to the communication address, saying either that the money is on its way, or that Anne-Elisabeth is dead.
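To make the limited vocabulary concrete, the code book boils down to a lookup table keyed on the payment amount. Here is a minimal sketch containing only the two amounts quoted above. The function name decode_amount is made up for illustration, and the real letter listed more codes than these.

```shell
# Hypothetical code book: payment amount in BTC -> predefined message.
# Only the two entries quoted in this article are included.
decode_amount() {
    case "$1" in
        0.0007) echo "in a rush, quick or she will die" ;;
        0.03)   echo "I'll send Monero in seven days" ;;
        *)      echo "unknown code" ;;
    esac
}

decode_amount 0.03   # prints: I'll send Monero in seven days
decode_amount 0.5    # prints: unknown code
```

Anything that isn't in the table simply can't be said, and anyone who has seen the table can both read and forge the messages.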

The police quickly moved on from being limited by these predefined messages, which brings us to our next form of communication.

OP_RETURN text fields

All Bitcoin transactions consist of a number of inputs and outputs (at least one of each). The inputs are the “source” of the money. These would typically come from your Bitcoin wallet. The outputs are most of the time the destination(s) you want to send your funds to. However, this is not always the case.

All Bitcoin transactions are really small code snippets, and sending money somewhere is just one of the many things those code snippets can do. It is also possible to say “don’t send any money anywhere”, while including a small data field (up to 80 bytes under the default relay policy Bitcoin Core has long used; the limit was originally 40). Why would this be useful? One major reason is that all Bitcoin transactions are visible on the blockchain, providing a highly available, globally visible and censorship-resistant way of broadcasting data. The observant reader will notice that this is not what you want for private communications, so keep that in mind the next time you’re planning a kidnapping.
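To show what such a “code snippet” looks like on the wire, here is a rough sketch that builds the hex of a data-carrying output script by hand: the OP_RETURN opcode (0x6a), a one-byte push length, then the data itself. The helper name op_return_script is made up, and the sketch only handles payloads of 1 to 75 bytes (longer ones need an extra OP_PUSHDATA1 byte, which is left out here).

```shell
# op_return_script is a hypothetical helper, not part of Bitcoin Core.
# It prints the script hex for an OP_RETURN output carrying the given text.
op_return_script() {
    # Hex-encode the text: od dumps bytes as hex, tr strips spaces and newlines.
    data_hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n')
    len=$(( ${#data_hex} / 2 ))
    # Opcodes 0x01 to 0x4b push that many bytes directly; other sizes are out of scope.
    if [ "$len" -lt 1 ] || [ "$len" -gt 75 ]; then
        echo "this sketch only handles 1 to 75 bytes of data" >&2
        return 1
    fi
    # 6a = OP_RETURN, then the push length as one hex byte, then the data.
    printf '6a%02x%s\n' "$len" "$data_hex"
}

op_return_script "hello"   # prints 6a0568656c6c6f
```

In practice you would let your wallet or Bitcoin Core assemble this for you; the point is only that the “message” is a handful of bytes sitting in an output that sends money nowhere.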

If you want to read more about OP_RETURN data fields, check out the bitcoin.org developer guide.

Locating the OP_RETURN messages

In June of this year, the Norwegian state broadcaster (NRK) wrote that Tom Hagen/the police had reached out to the kidnappers with the message cee face ae (presumably, they want to see the face of Anne-Elisabeth). It turns out that we don’t need more information than that to locate the transactions sent between the different parties. I’ll show you how!

The easiest way to interact with the Bitcoin blockchain is through bitcoin-cli, a tool that is shipped with Bitcoin Core. It allows you to (among other things) query blockchain data. To make this a bit easier to wrestle with, we’ll query the Bitcoin test network (testnet) instead of the proper main network (mainnet). Testnet is about 10% of the size of mainnet. I’ve recreated the message the police/Tom Hagen sent to the alleged kidnappers, so this code will also work on mainnet, after switching around a few parameters.

First, let’s create a file called print-op-returns.sh, and give it this content:

for height in $(seq 1835420 1835420); do #1
    hash=$(bitcoin-cli getblockhash "$height") #2
    # Pass the full 64-character hash; trimming it makes getblock fail
    bitcoin-cli getblock "$hash" 2 > "block-$height.json" #3

    echo "Processing block file block-$height.json" >&2
    # read -r keeps backslashes in the JSON intact
    jq -c '.tx | .[]' "block-$height.json" | while read -r tx; do #4
        txid=$(jq -r .txid <<< "$tx")
        jq -c '.vout | .[] | select(.scriptPubKey.type == "nulldata")' <<< "$tx" | \
        while read -r vout; do #5
            echo "block-$height.json" \
              "$txid" "$(jq -r .scriptPubKey.asm <<< "$vout" | cut --fields 2 --delimiter " ")" #6
        done
    done
done

There are a few things going on here, so let’s break it down. In addition to bitcoin-cli, we’re also using jq. Chances are you’re already familiar with it, but if not, all you need to know is that it is a JSON processing tool. The numbers below correspond to the comments in the Bash script:

  1. We have a rough idea where the message we’re looking for is located, so we specify the start and end of the block height range we want to search in. When NRK wrote the article that led me to finding the message, they said it was sent in November or December of 2018. If you’re running this on mainnet, all you need to do is replace these parameters with the block heights for those dates.
  2. Get the hash of the block at the height we’re interested in. Block hashes are unique identifiers for all Bitcoin blocks.
  3. Load all the block data into a file named after that height. Adding 2 at the end of the request tells Bitcoin Core to be extra verbose, and include all transaction data in the response.
  4. Loop over all the transactions in the block. Bash and jq syntax is a bit arcane, but the gist of it is that we want to run our loop for all transactions, located under the tx field in the block data.
  5. Loop over all outputs in each transaction. Outputs with OP_RETURN messages are identified by a type of nulldata, so we filter our jq query to only give us those.
  6. Finally, extract the message associated with the output, and print it to the terminal with its block file and transaction ID.
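The last column the script prints is the pushed data as it appears in the asm field, which for payloads longer than a few bytes is a hex string. Many OP_RETURN payloads are plain ASCII text, so if you want to browse what other people have written, a small decoder helps. This is a sketch: hex_to_text is a made-up helper, it assumes an even number of hex digits, and payloads that aren’t text will print as garbage.

```shell
# hex_to_text is a hypothetical helper: it prints the bytes encoded by a hex string.
hex_to_text() {
    hex=$1
    # Refuse odd-length input, which cannot be split into whole bytes.
    [ $(( ${#hex} % 2 )) -eq 0 ] || { echo "odd number of hex digits" >&2; return 1; }
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}   # first two hex digits
        hex=${hex#??}             # everything after them
        # Convert the byte to an octal escape and let printf emit it.
        printf "\\$(printf '%03o' "0x$byte")"
    done
    echo
}

hex_to_text 68656c6c6f   # prints hello
```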

If you run this, you’re going to see all transactions in our block range that have OP_RETURN messages. That’s a lot of messages! Filtering messages based on content is trivial:

bash print-op-returns.sh | grep --regexp='cee.*face.*ae'

Here, we’re saying we only want to see lines that have the words cee, face and ae in them, optionally with some other characters in between.
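If you want to convince yourself of what the pattern does before pointing it at real data, you can feed it a few made-up lines shaped like the script output. The sample lines and the sample_lines helper below are invented for illustration, and -e is the short form of --regexp.

```shell
# Made-up lines in the same "file txid payload" shape the script prints.
sample_lines() {
    printf '%s\n' \
        'block-1.json txid-a 00cee00face00ae' \
        'block-1.json txid-b deadbeef' \
        'block-1.json txid-c aeface'
}

# Only the first line contains cee, face and ae in that order.
sample_lines | grep -e 'cee.*face.*ae'
```

Note that the order matters: the third sample line contains both ae and face, but it is not matched because cee never appears before them.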

All in all, we were able to find sensitive information about an ongoing investigation with fewer than 15 lines of code. Not bad! The journalists at NRK were probably not aware of how much information they were giving away with the article they wrote. To me, that says a couple of interesting things:

  1. Bitcoin is not a private currency, but most people think it is.
  2. The Bitcoin blockchain is a treasure trove of interesting economic data that anyone with a computer can query and analyze. Try finding a specific USD transaction with 15 lines of Bash! In the years ahead, investigative journalists, economists and researchers are going to uncover a lot of really fascinating truths about human behavior.

Were you able to find the message I posted on testnet? And did you also try and apply this to mainnet? Feel free to contact me if you need some help with this, or just want to chat about anything related to shady billionaires, Bitcoin transactions or compact Bash scripts.