In the same spirit, he wondered what treasures might be submersed in Bitcoin’s data lake. “We literally have a record of every single transaction,” he said. “These are remarkable economic and sociological data sets. Clearly, there’s a lot of information in there, if you can get at it.”
Getting at it proved nontrivial. Ms. Blackburn was barred from the university’s supercomputing cluster; because her file folder was labeled “Bitcoin,” she was suspected of mining the cryptocurrency. “I objected,” she said. She tried to convince an administrator that she was conducting research, she added, but “they were completely unmoved.”
A key tactic of Ms. Blackburn’s was to trace patterns in plots of numbers that in theory should have been random and meaningless. In one case, she was chasing the “extranonce,” one piece of the mining puzzle: a short field of 0s and 1s tucked within a longer string that encodes each block, or bundle, of transactions. The extranonce leaked information about a computer’s activity, which let Ms. Blackburn reconstruct the miners’ behavior: when they were mining, when they stopped and when they started up again. She speculates that the extranonce’s leaky behavior was tolerated because it allowed Bitcoin’s creator to keep an eye on miners; the source code was modified to plug this leak shortly before Satoshi Nakamoto disappeared from the public Bitcoin community in December 2010.
Once Ms. Blackburn had put various toeholds to use — allowing her to erode the identity-masking protections — she began merging addresses, linking nodes on a graph, consolidating the effective population of mining agents. Then she cross-referenced and validated the results with information scraped from Bitcoin discussion forums and blogs. Initially, the catalog of agents who mined most of the Bitcoin tallied a couple of thousand; then it hovered for a while around 200. Ultimately, Hail Mary spit out 64. (Eventually, Hail Mary’s brains were incorporated into the lab’s computer cluster, Voltron.)
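For readers curious how "merging addresses" and "linking nodes on a graph" can shrink thousands of addresses into a few dozen agents, here is a minimal sketch. It is not the researchers' code. It assumes each clue has already been reduced to a pair of addresses believed to share an owner, and it uses a standard union-find structure to group them. The names `UnionFind`, `cluster_addresses`, and the sample addresses and links are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure: each address points toward a representative."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen addresses as their own representative.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(addresses, links):
    """Group addresses into agents, given pairs believed to share an owner."""
    uf = UnionFind()
    for address in addresses:
        uf.find(address)
    for a, b in links:
        uf.union(a, b)
    groups = defaultdict(set)
    for address in list(uf.parent):
        groups[uf.find(address)].add(address)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: five addresses and three pieces of linking evidence.
addresses = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("B", "C"), ("D", "E")]
agents = cluster_addresses(addresses, links)
print(len(agents), agents)
```

Each new link can only merge groups, never split them, which is consistent with a count that falls as more evidence is added. It also means a single false link would wrongly fuse two agents, one reason cross-checking results against outside sources matters.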
The study’s purpose was not to name names; it’s the job of the F.B.I. and the I.R.S. to bust Bitcoin criminals. But the researchers pinpointed the identities of a couple of the top players who were publicly known Bitcoin criminals: Agent No. 19 is Michael Mancil Brown, a.k.a. “Dr. Evil,” who was found guilty of a 2012 fraud and extortion scheme involving Mitt Romney, then a candidate for president. Agent No. 67 is associated with Ross Ulbricht, a.k.a. “DreadPirateRoberts,” creator of the Silk Road. Naturally, Agent No. 1 is Satoshi Nakamoto — whose true identity the researchers did not try to determine.
Mark Gerstein, a professor of bioinformatics at Yale University, found in the research implications for data privacy. He recently stored a genome on a private blockchain, which allowed for a secure and tamperproof record. But he noted that in a public setting, as with Bitcoin’s blockchain, a data set’s size and subtle patterns made it susceptible to breaches, even as the data remained immutable. (Ms. Blackburn wasn’t tampering with the Bitcoin blockchain’s records.)
“That’s the amazing thing about big data,” Dr. Gerstein said. “If you have a big enough data set, it starts to leak information in unexpected ways.” Even more so when data from different sources are connected, he said: “When you combine one data set with another to make a bigger data set, nonobvious linkages can arise.”