Wednesday, May 4, 2016

In God We Trust. All Others Use Blockchain

A blockchain (or block chain) is a distributed database or ledger that keeps track of a continuously growing set of data records, protected against tampering or revision.  Data can only be appended to the store; existing data cannot be changed.

The blockchain is the core innovation behind Bitcoin, but it may be useful in other situations as well.  Bitcoin is the original and best known of the “crypto currencies,” electronic money that is not issued by any government and that exists solely within the computers of the Internet.

Beyond the eDiscovery questions raised by Bitcoin itself, other uses make blockchain technology interesting to eDiscovery.  As records of transactions or other events, blockchains are likely to play an essential role in future litigation.  Their ability to provide a secure, verifiable source of truth about the data entered in them, without any central authority, is likely to be an important benefit for the legal system, but there are limitations.

Bitcoin’s blockchain is a virtual ledger that keeps track of who owns Bitcoins, how many they own, and the chain of custody of each Bitcoin from its creation to its current owner.  The coins are not physical objects.  They are only entries in the Bitcoin ledger, that is, in the Bitcoin blockchain.  The Bitcoin blockchain contains a verifiable record of every Bitcoin transaction ever made.  It uses cryptography and other features to allow each participant on the network to securely update the ledger without the need for a central authority.

Three essential features make blockchain technology interesting and secure: 
  • it is distributed
  • it uses cryptography
  • it has methods for automatically reconciling conflicts


When we say that a blockchain is distributed, what we mean is that exact copies exist on a large number of computers, organized into a peer-to-peer network.  Any computer is allowed to connect to the network, send new transactions to it, verify transactions, and create new blocks (groups of transactions).

Every computer in the network holds a complete copy of the record of these transactions.  No single one of these computers serves as a central source of truth about the status of a transaction.  There is no central authority.  Instead, the truth is established using an algorithm that relies on a consensus of the computers in the network.  As long as a majority of the computers in the network are run honestly, it is nearly impossible to modify earlier entries in the blockchain.

The ledger can only be updated by agreement of a majority of the computers in the network.  If the ledger were stored on only a single computer, then a malicious party with access to that computer could change the ledger.  Because the Bitcoin blockchain is represented on a large number of computers, a majority of them (more precisely, a majority of the network's computing power) would have to be compromised to change the ledger.

Public key cryptography is a second essential feature of blockchains.  Public key cryptography uses a pair of mathematically related large numbers.  One number serves as the public key and the other serves as the private key.  The public key can be made widely available, while the private key is only cryptographically useful as long as it is kept secret.

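A message can be encrypted using a cryptographic algorithm and the public key; it can then be decrypted only by someone in possession of the private key.  The same key pair can also produce digital signatures: a message signed with the private key can be checked by anyone holding the public key.  Signing is the use that matters most for Bitcoin transactions.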
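Both uses of the key pair can be made concrete with a toy sketch. It uses "textbook RSA" with the tiny primes 61 and 53, a common classroom example. The function names are hypothetical, real keys are hundreds of digits long and use padding, and Bitcoin itself uses elliptic-curve signatures (ECDSA) rather than RSA, so treat this only as an illustration of the public/private idea.

```python
# Toy "textbook RSA" with tiny primes: an illustration only, not secure.
# Bitcoin itself uses ECDSA signatures, not RSA.

def make_toy_keypair(p: int, q: int, e: int = 17):
    """Return (public_key, private_key) as (n, e) and (n, d)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (needs Python 3.8+)
    return (n, e), (n, d)

def encrypt(message: int, public_key) -> int:
    n, e = public_key
    return pow(message, e, n)

def decrypt(ciphertext: int, private_key) -> int:
    n, d = private_key
    return pow(ciphertext, d, n)

def sign(message: int, private_key) -> int:
    # Signing applies the *private* exponent...
    n, d = private_key
    return pow(message, d, n)

def verify(message: int, signature: int, public_key) -> bool:
    # ...so anyone holding the public key can check the result.
    n, e = public_key
    return pow(signature, e, n) == message % n

public, private = make_toy_keypair(61, 53)   # n = 3233
ciphertext = encrypt(65, public)
print(ciphertext)                     # 2790
print(decrypt(ciphertext, private))   # 65
signature = sign(65, private)
print(verify(65, signature, public))  # True
print(verify(66, signature, public))  # False
```

Only the holder of the private exponent can produce a signature that checks out under the public key, which is the property a transaction signature relies on.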

One computer on the network can transfer a coin to another computer by issuing a transaction.  A transaction contains:
  • An input, which is a reference to the output of one or more previous transactions
  • A hash of the previous transaction
  • An index, which identifies a specific output of the referenced transaction
  • A signature
  • A public key


The hash of this public key must match the hash recorded in the output of the previous transaction.  The public key is then used to verify the signature.  The combination of the public key and the signature proves that the transaction was created by the actual owner of the Bitcoin being transferred.  It is relatively easy to verify the authenticity of a Bitcoin transaction by checking the chain of signatures.
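A heavily simplified sketch of this check might look like the following. Everything here is a hypothetical stand-in: the TxInput/TxOutput classes, plain SHA-256 for the public-key hash (real Bitcoin addresses apply SHA-256 followed by RIPEMD-160), toy textbook-RSA numbers in place of Bitcoin's ECDSA, and an integer `message` standing in for the hash of the spending transaction. The `prev_tx_hash` and `index` fields correspond to the hash and index items in the list above.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical, heavily simplified transaction model (not Bitcoin's format).

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def pubkey_bytes(public_key: tuple) -> bytes:
    n, e = public_key
    return f"{n}:{e}".encode()

def toy_verify(message: int, signature: int, public_key: tuple) -> bool:
    # Textbook-RSA verification, standing in for ECDSA.
    n, e = public_key
    return pow(signature, e, n) == message % n

@dataclass
class TxOutput:
    amount: int
    pubkey_hash: str       # hash of the public key allowed to spend this output

@dataclass
class TxInput:
    prev_tx_hash: str      # the earlier transaction being spent
    index: int             # which output of that transaction
    signature: int
    public_key: tuple      # toy (n, e) public key

def input_is_valid(tx_in: TxInput, outputs_by_tx: dict, message: int) -> bool:
    prev_outputs = outputs_by_tx.get(tx_in.prev_tx_hash)
    if prev_outputs is None or not 0 <= tx_in.index < len(prev_outputs):
        return False  # refers to an output that does not exist
    prev_out = prev_outputs[tx_in.index]
    # 1. The supplied public key must hash to the value in the earlier output.
    if sha256_hex(pubkey_bytes(tx_in.public_key)) != prev_out.pubkey_hash:
        return False
    # 2. The signature must verify under that public key.
    return toy_verify(message, tx_in.signature, tx_in.public_key)

# Usage with a toy key pair (n=3233, e=17, private exponent d=2753):
owner_key = (3233, 17)
ledger = {"tx_abc": [TxOutput(50, sha256_hex(pubkey_bytes(owner_key)))]}
signature = pow(42, 2753, 3233)  # owner "signs" message 42 with d
print(input_is_valid(TxInput("tx_abc", 0, signature, owner_key), ledger, 42))  # True
print(input_is_valid(TxInput("tx_abc", 0, signature, (3233, 7)), ledger, 42))  # False
```

Both conditions must hold: presenting the right public key is not enough without a matching signature, and a valid signature under some other key does not match the recorded hash.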

A third innovation of blockchain technology is its method for reconciling conflicts so that there is only one distributed version of the truth, rather than a bunch of inconsistent versions.  New transactions are broadcast to all peers in the network.  Each peer that receives a transaction checks it for validity and adds it to its candidate block of current transactions.

The consensus method used to verify transactions is called proof of work.  Participants on the network run algorithms to confirm that the digital signatures attached to each transaction are valid.  They then compete to solve a computational puzzle, which is designed to have a low probability of success on each attempt.  This low probability makes it unpredictable which networked computer will be able to generate the next block in the blockchain.  In effect, a randomly selected computer in the network (chosen with probability proportional to its computing power) serves as the temporary authority providing the authoritative current status of the blockchain for further computations.  Its peers accept the new block only if they can verify its accuracy.

Bitcoin’s decentralized consensus depends on four processes running independently on network nodes (computers connected to the network):
  • Verification of each transaction
  • Aggregation of those transactions into new blocks, with demonstrated computation through a proof-of-work algorithm
  • Verification of the new blocks by every node and incorporation of the block into a chain
  • Selection, by each node, of the chain with the most cumulative computation demonstrated through proof of work
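The fourth step, choosing the chain with the most cumulative work, can be sketched in a few lines. This is a hypothetical simplification: each block is represented only by its target (a smaller target means a harder puzzle), the work in a block is estimated as about 2^256 / (target + 1) expected hashes (the same estimate Bitcoin Core uses), and validation of the blocks themselves is omitted.

```python
# Hypothetical sketch: pick the chain with the most cumulative proof of work.
# Each block is represented only by its target; a smaller target is harder.

def block_work(target: int) -> int:
    # Expected number of hashes needed to find a hash <= target.
    return 2**256 // (target + 1)

def chain_work(targets: list[int]) -> int:
    return sum(block_work(t) for t in targets)

def select_chain(chains: list[list[int]]) -> list[int]:
    return max(chains, key=chain_work)

easy, hard = 2**240, 2**236          # hard blocks need ~16x more hashing
longer_chain = [easy, easy, easy]    # 3 blocks
heavier_chain = [hard, hard]         # 2 blocks, but more total work
print(select_chain([longer_chain, heavier_chain]) is heavier_chain)  # True
```

In this model the "longest" chain means the chain with the most accumulated work, not simply the one with the most blocks.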


When a new block passes all of these checks, the network accepts it as part of the consensus.  Because the processes run independently on every node, they are highly resistant to tampering by a minority of the network nodes.

Generating a block that the network will accept requires a substantial amount of work, and each block records the hash of the block that precedes it.  The combination of high computational effort and the preserved identity of the preceding block protects against anyone trying to regenerate a string of blocks to cover up a change to an earlier transaction.  This protects the chain from tampering by making changes to earlier blocks computationally infeasible.
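How each block preserves the hash of its predecessor can be sketched as follows. This is a minimal illustration under stated assumptions: blocks are plain Python dictionaries hashed as JSON with SHA-256 (real Bitcoin hashes a binary block header with double SHA-256), the helper names are hypothetical, and there is no proof of work yet.

```python
import hashlib
import json

# Minimal sketch of hash-linked blocks (no proof of work, no networking).

def block_hash(block: dict) -> str:
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain: list, transactions: list) -> None:
    # The first block points at an all-zero placeholder hash.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def chain_is_intact(chain: list) -> bool:
    # Every block must record the hash of the block before it.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
append_block(chain, ["Alice pays Bob 5"])
append_block(chain, ["Bob pays Carol 2"])
append_block(chain, ["Carol pays Dave 1"])
print(chain_is_intact(chain))                        # True
chain[0]["transactions"][0] = "Alice pays Bob 500"   # tamper with history
print(chain_is_intact(chain))                        # False
```

Without proof of work, an attacker could simply recompute every later `prev_hash` and restore a consistent-looking chain; the computational cost of proof of work is what makes that recomputation impractical.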

Put simply, the proof of work method requires the node to guess a number, called a nonce, that, when combined with the block, yields a hash value below a certain target.  This nonce is essentially a random number, and, on average, finding it by trial and error takes a great deal of computation.  In fact, it can take billions of guesses to come up with an acceptable nonce.  Once it has been found, however, it is trivial to verify that it is correct.  The first peer to find an acceptable nonce is awarded a "mining credit" of newly created Bitcoins (currently 25) plus any transaction fees for its work.  The successful peer then broadcasts that block to its peers, who can easily verify that it is correct.
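A toy version of the nonce search shows the asymmetry: finding a nonce takes many attempts, while checking one takes a single hash. This is a sketch, not Bitcoin's actual header format. The "header" is an arbitrary byte string, the hash is read as a big-endian number (Bitcoin reads it little-endian), and the target is made extremely easy (roughly 1 success in 65,536 tries) so that the example runs quickly.

```python
import hashlib

# Toy proof-of-work search: find a nonce whose double SHA-256 hash,
# read as a number, is at or below the target.

def pow_hash(header: bytes, nonce: int) -> int:
    data = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")

def mine(header: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order until one hashes at or below the target."""
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) <= target:
            return nonce
    return None  # a real miner would then change the header and keep going

def verify(header: bytes, nonce: int, target: int) -> bool:
    # Verification is one hash, no matter how long mining took.
    return pow_hash(header, nonce) <= target

target = 2**240   # about 1 in 2**16 hashes qualifies
header = b"example block: prev_hash=..., transactions=..."
nonce = mine(header, target)
print(nonce, verify(header, nonce, target))
```

Lowering the target by a factor of 16 would make the search about 16 times longer on average while leaving verification exactly as cheap.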


Hash: 000000000043a8c0fd1d6f726790caa2a406010d19efd2780db27bdbbd93baf6
Previous block: 00000000001937917bd2caba204bb1aa530ec1de9d0f6736e5d85d96da9c8bba
Next block: 00000000000036312a44ab7711afa46f475913fbd9727cf508ed4af3bc933d16
Time: 2010-09-16 05:03:47
Difficulty: 712.884864
Transactions: 2
Merkle root: 8fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b2600719
Nonce: 1462756097

Input/Previous Output | Source & Amount               | Recipient & Amount
N/A                   | Generation: 50 + 0 total fees | 1JBSCVF6VM6QjFZyTnbpLjoCJ...: 50
f5d8ee39a430...:0     | Generation: 50 + 0 total fees | 16ro3Jptwo4asSevZnsRX6vf...: 50

Table 1: Example Block of Bitcoin. The block contains 2 transactions, one of which awards the generator peer with 50 Bitcoins.
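The mining reward follows a fixed schedule, which is why a 2010 block like the one in Table 1 pays 50 Bitcoins while blocks mined in 2016 pay 25. A minimal sketch, modeled on the integer arithmetic Bitcoin Core uses (amounts in satoshis, where 1 Bitcoin = 100,000,000 satoshis; transaction fees are paid on top and are not included). The constant and function names are illustrative.

```python
# Sketch of Bitcoin's block-subsidy schedule: 50 BTC at first, halving
# every 210,000 blocks. Amounts are integers in satoshis.

COIN = 100_000_000          # satoshis per Bitcoin
HALVING_INTERVAL = 210_000  # blocks between halvings

def block_subsidy(height: int) -> int:
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0  # mirrors Bitcoin Core's guard; in Python the shift alone would also give 0
    return (50 * COIN) >> halvings

print(block_subsidy(80_000) / COIN)    # 50.0  (any height below 210,000, e.g. 2010-era blocks)
print(block_subsidy(410_136) / COIN)   # 25.0  (the newest block in Table 2)
print(block_subsidy(420_000) / COIN)   # 12.5
```

By this schedule the reward drops to 12.5 Bitcoins at block 420,000.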

For a peer to double-spend a given Bitcoin, it would need to replace the transaction in which the Bitcoin was originally spent, along with its corresponding block.  It would then have to recompute all of the subsequent blocks in the chain, because each one refers to the block before it, and it would have to do so faster than the honest peers keep extending the chain.  While not impossible, this is highly unlikely to succeed.  As long as the honest peers control more computing power than the dishonest ones, such cheating is infeasible.
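Satoshi Nakamoto's 2008 Bitcoin white paper quantifies this claim with a short calculation, adapted below. Here q is the attacker's share of the network's computing power and z is the number of blocks added on top of the transaction being reversed; the result is the probability that the attacker ever catches up with the honest chain. The function name is ours, and the early return for q ≥ 0.5 reflects the paper's observation that a majority attacker eventually succeeds.

```python
import math

# Adapted from the attacker-success calculation in the Bitcoin white paper.
# q: attacker's share of computing power; z: blocks the honest chain is ahead.

def attacker_success_probability(q: float, z: int) -> float:
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with a majority eventually catches up
    lam = z * (q / p)  # expected attacker progress while honest nodes find z blocks
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam)          # Poisson probability of k attacker blocks
        for i in range(1, k + 1):
            poisson *= lam / i
        total -= poisson * (1 - (q / p) ** (z - k))
    return total

for z in range(6):
    print(z, round(attacker_success_probability(0.1, z), 7))
```

With q = 0.1, the probability falls from 1 at z = 0 to roughly 0.2 at z = 1 and keeps dropping quickly as more blocks are added, which is why recipients typically wait for several confirmations.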

Table 2 shows a few recently committed Bitcoin blocks.  “Mined by” indicates the peer system that found the nonce and won the right to broadcast the block.  The difficulty of finding the nonce is adjusted so that blocks are committed on average every 10 minutes, but as you can see, there is some variation in how long it takes to find the nonce.


Block Number | Age            | Transactions | Mined by    | Size (bytes)
410136       | 3 minutes ago  | 653          | Discus Fish | 462607
410135       | 6 minutes ago  | 338          | Discus Fish | 180432
410134       | 6 minutes ago  | 1303         | Discus Fish | 978222
410133       | 7 minutes ago  | 3318         |             | 989240
410132       | 37 minutes ago | 563          | AntMiner    | 259451

Table 2: A record of some recent Bitcoin blocks
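The ten-minute average is maintained by periodically rescaling the target. A simplified sketch of Bitcoin's rule: every 2,016 blocks, the target is multiplied by the ratio of the time those blocks actually took to the intended two weeks (2,016 × 10 minutes), with the change limited to a factor of four in either direction. A larger target means an easier puzzle. Real Bitcoin also caps the target at a maximum and stores it in a compact encoding; both details are omitted, and the names are illustrative.

```python
# Simplified sketch of Bitcoin's difficulty retargeting (illustrative names).
RETARGET_INTERVAL = 2016                              # blocks per adjustment
TARGET_SPACING = 10 * 60                              # desired seconds per block
TARGET_TIMESPAN = RETARGET_INTERVAL * TARGET_SPACING  # two weeks, in seconds

def next_target(old_target: int, actual_timespan: int) -> int:
    # Limit the adjustment to a factor of 4 in either direction.
    clamped = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    # Blocks came too fast -> smaller target (harder); too slow -> larger (easier).
    return old_target * clamped // TARGET_TIMESPAN

old = 2**220
print(next_target(old, TARGET_TIMESPAN) == old)            # True: on schedule
print(next_target(old, TARGET_TIMESPAN // 2) == old // 2)  # True: twice as fast, twice as hard
```

Because the adjustment only happens every 2,016 blocks, individual blocks can still arrive much faster or slower than ten minutes, as the ages in Table 2 show.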

Why blockchains might matter to eDiscovery

The blockchain provides a way to track and verify transactions without requiring a central tracking authority.  Right now it is used mainly by Bitcoin (or similar crypto-currencies), but it could find applications in other domains.  Many organizations, including IBM, at least two states, and several financial institutions, are exploring the use of blockchains in their businesses.  If these uses become common, they will certainly figure in future litigation.  Blockchains will join the list of sources that must be considered in eDiscovery.

As an example of the interest among financial institutions in blockchain technology, DTCC (Depository Trust & Clearing Corporation) recently held a well-attended symposium on the use of blockchain technology in clearing transactions.  DTCC is pursuing blockchain technologies because it believes that these technologies have the potential to address perceived shortcomings of the post-trade process.  If this turns out to be true, blockchain technologies are likely to play an important role in future financial litigation, where, for example, they could be used to verify transactions.

Another application of this technology might be tracking the provenance of valuable objects, such as real estate titles, art objects, or diamonds.  Allianz, for example, is working with a startup to develop a system to track diamonds from the time they are mined through retail sale.

According to the Gemological Institute of America (GIA), a cybersecurity breach led to the alteration of the grading reports for over a thousand registered diamonds.  The grading report strongly influences the value of a diamond, so any changes could have a significant economic impact.  Only 175 of these diamonds have been resubmitted for reevaluation.  If the evaluation records for these diamonds had been stored in a blockchain, it would have been impractically difficult to modify them, and disputes concerning the provenance of registered gems would be easier to resolve.

The owners of Bitcoins or other goods registered in the ledger are represented by their Bitcoin addresses, each of which is simply a number.  The blockchain verifies that the address of the transmitting party is authentic, in the sense that the address was associated with at least the number of Bitcoins being transferred.  The blockchain does not, however, contain information about who owns that address, merely that the person, whoever it is, had access to the private key associated with that address.  The personal identity of the person corresponding to a Bitcoin address is held outside of the blockchain and may be difficult to discover.
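The "number" behind a classic Bitcoin address is a 20-byte hash of the owner's public key, written in an encoding called Base58Check. The sketch below shows that encoding with hypothetical helper names. It takes the 20-byte hash as input rather than computing RIPEMD-160(SHA-256(public key)), because RIPEMD-160 is not available in every Python build. Nothing in the output identifies a person, which is exactly the limitation described above.

```python
import hashlib

# Sketch of Base58Check encoding, as used for classic Bitcoin addresses.

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _checksum(data: bytes) -> bytes:
    # First 4 bytes of a double SHA-256 hash.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]

def base58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    data += _checksum(data)
    n = int.from_bytes(data, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(ALPHABET[rem])
    # Each leading zero byte is written as a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))

def base58check_decode(address: str) -> tuple[int, bytes]:
    n = 0
    for ch in address:
        n = n * 58 + ALPHABET.index(ch)
    leading_ones = len(address) - len(address.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    data = b"\x00" * leading_ones + body
    payload, checksum = data[:-4], data[-4:]
    if _checksum(payload) != checksum:
        raise ValueError("bad checksum (mistyped address?)")
    return payload[0], payload[1:]

key_hash = bytes(range(1, 21))   # hypothetical 20-byte public-key hash
addr = base58check_encode(0, key_hash)
print(addr)                                          # begins with "1"
print(base58check_decode(addr) == (0, key_hash))     # True
```

The 4-byte checksum lets wallet software catch most typing mistakes, and the version byte 0 is why these addresses begin with "1", like the addresses in Table 1.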

Tools exist that let anyone view the transactions in a Bitcoin block, so discovering the information that is contained in a blockchain is likely to be fairly easy and direct.

It may be too soon to tell, but blockchains may also play a role in the eDiscovery process itself.  For example, blockchains might be used for tracking or establishing chain of custody, legal holds, and trial exhibits.

Why blockchain technology might not matter to eDiscovery

Blockchain technology is very effective at showing that transactions entered into its ledger have not been modified.  Its approach suits Bitcoin so well because a Bitcoin exists only within this ledger.  Bitcoins have no physical manifestation, except within the blockchain.

Just as the blockchain is of little use in identifying the owner of a Bitcoin address, dealing with tangible goods or instruments that exist outside of the blockchain is also not so simple.  The blockchain can guarantee that the transactions are faithfully represented, but unless the only manifestation of a good is within the blockchain itself, it cannot guarantee that the transaction faithfully represents the object to which it refers.  If someone submits the wrong information about a diamond, that wrong information will be preserved forever in the blockchain, but it will still be wrong.

The blockchain will faithfully transmit the records it is given, but it has nothing to say about whether those records accurately represent anything else.  The blockchain can accurately represent the sequence of transactions that brought a Bitcoin to a specific owner, as represented by a specific Bitcoin address, but once that Bitcoin is converted to a tangible currency, such as dollars, the blockchain has nothing more to say.  Nor does it have anything to say about the dollars before they were exchanged for Bitcoins.  A blockchain can accurately transmit the truth, but it cannot create it.

The asymmetry between the ability of a blockchain to guarantee that the information it contains cannot be changed and its inability to guarantee that the correct information has been inserted in it is a logical limitation, not something that can be fixed by any kind of technology.  As long as the information exists solely within the blockchain (Bitcoins are DEFINED only within the blockchain), it can work very effectively.  The blockchain transmits the truth from step to step in the same way that a logical deduction transmits the truth.  For example, the logical sequence,
  1. Mary is a cow
  2. All cows have four legs
  3. Therefore Mary has four legs

transmits the truth from the premises (Mary is a cow and all cows have four legs) to the conclusion (Therefore, Mary has four legs).  But it cannot guarantee that Mary is, in fact, a cow or that all cows have four legs.  Nothing in the logic compels these two statements to be true.  If they are true, then the conclusion must also be true, but one or both of the premises could be false.  Establishing the truth of the premises requires a different logical process, induction.  There are no guarantees for inductive reasoning.

In the case of the diamonds, the validity of the description entered into the blockchain is the responsibility of the GIA gemologists.  They provide an authoritative evaluation of each diamond.  Being human beings, they are at risk of being suborned.  Once entered into the blockchain, their analysis cannot easily be changed, but there can be no guarantee that they entered the correct data at the start.

Finally, blockchain technology is only effective when its database is distributed across a large group of independent systems.  The truth is what a majority of these systems says the truth is.  As long as a majority of these systems is honest, the guarantee of truth is valid.  One would have to suborn a majority of systems to change the information recorded in the blockchain.  The smaller the network, or, more precisely, the fewer independent systems there are in a network, the higher the likelihood that a majority could be corrupted.
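The effect of network size can be illustrated with a deliberately simple probability model. It is an assumption for illustration, not a description of Bitcoin (which weighs computing power rather than counting nodes): suppose each of n independently run nodes is compromised with the same probability p, independently of the others, and ask how likely it is that a strict majority is compromised. The function name is hypothetical.

```python
from math import comb

# Illustrative model only: each of n nodes is independently compromised
# with probability p. Returns the chance that a strict majority is compromised.

def prob_majority_compromised(n: int, p: float) -> float:
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

for n in (1, 3, 11, 101):
    print(n, prob_majority_compromised(n, 0.2))
```

With p = 0.2 the probability shrinks rapidly as n grows (0.2 for one node, about 0.104 for three), in line with the argument that small networks are easier to corrupt. If p were above 0.5 the trend would reverse, mirroring the caveat that the guarantee holds only while a majority is honest.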

Two things help to ensure that the Bitcoin system is widely distributed.  First, it is open.  It is based on open-source software.  It is a public network.  Anyone can download it and participate in the network.  The interest in Bitcoin is sufficiently broad that many people would be interested in participating.  There are currently at least 6,655 nodes on the Bitcoin network.

Second, the first system that finds an acceptable nonce for a block receives a reward, which, depending on the exchange rate, could be worth a significant number of dollars.  Private networks, such as might be used to support GIA records or DTCC transactions, are likely to be much smaller, and so would be easier to compromise.  The rewards for running the servers would likely be lower, so there would be less incentive to add servers to the network.  Limited networks would be correspondingly more susceptible to scams than larger networks.

Networks that are not open or broad present a much higher risk of being successfully attacked, thus reducing the value and validity of the blockchain.  A centrally run (e.g., by DTCC) blockchain presents no value at all, I would argue.

Conclusion

The concept of a blockchain has the potential to radically change secure computing in important ways.  Exactly what the impact of these changes will be on eDiscovery is, at this point, difficult to predict.  Logical constraints limit the applicability of blockchain to the non-virtual world, because blockchains secure the ledger, not the correspondence between the ledger and the items whose transactions are recorded in the ledger.  

Nevertheless, it seems apparent that blockchain technology will have some impact on eDiscovery.  Bitcoin and other crypto currencies are popular enough that they will surely figure in some litigation at some point, and that point is probably soon.

As I see it, financial instruments, if they can be designed to exist solely within the blockchain as Bitcoin does, are the most immediate application of this technology relevant to the eDiscovery community.  That is, if we can make a stock, for example, that exists only in the ledger recorded in the blockchain, then we would be creating a new special-purpose crypto currency for trading in that company.  The stock would only exist as entries in the blockchain ledger, thus avoiding the weak link of external reference.

The security of that system depends on having the ledger represented on many independently controlled servers, so it might only work if whole markets could be converted to virtual stock and every company or every stockholder were to run a full blockchain server, each with a complete copy of all of the transactions for the entire history of the market.  An appropriate proof of work method would also be required.


We are at the beginning of the hype cycle concerning blockchain technology.  It will be interesting to see how it settles out once we reach the plateau of productivity.  In the meantime, many new blockchain applications are appearing, some with direct appeal to the eDiscovery market.  Although these applications may have potential value, careful consideration is needed to determine whether they actually provide any particular value to an organization.