A blockchain (or block chain) is a distributed database or ledger that keeps track of a continuously growing
set of data records, protected against tampering or revision. Data can only be appended to the store;
existing data cannot be changed.
The blockchain is the core innovation behind Bitcoin, but it may be useful
in other situations as well. Bitcoin is
the original and best known of the “crypto currencies,” electronic money that
is not issued by any government and that exists solely within the computers of
the Internet.
Bitcoins themselves may become a concern for eDiscovery, and other
uses of blockchain technology make it interesting to eDiscovery as well. As records of
transactions or other events, blockchains are likely to play an essential role
in future litigation. Their ability to
provide a secure, verifiable source of truth concerning the data entered in a
blockchain, without any central authority, is likely to be an important benefit
for the legal system, but there are limitations.
Bitcoin’s blockchain is a virtual ledger that keeps track of
who owns Bitcoins, how many they own, and the chain of custody of each Bitcoin
from its creation to its current owner.
The coins are not physical objects.
They are only entries in the Bitcoin ledger, that is, in the Bitcoin
blockchain. The Bitcoin blockchain
contains a verifiable record of every Bitcoin transaction ever made. It uses cryptography and other features to
allow each participant on the network to securely update the ledger without the
need for a central authority.
Three essential features make blockchain technology
interesting and secure:
- it is distributed
- it uses cryptography
- it has methods for automatically reconciling conflicts
When we say that a blockchain is distributed, what we mean
is that exact copies exist on a large number of computers, organized into a
peer-to-peer network. Any computer is
allowed to connect to the network, send new transactions to it, verify
transactions, and create new blocks (groups of transactions).
Every computer in the network has the definitive record of
these transactions. No single one of these
computers serves as a central source of truth about the status of a
transaction. There is no central
authority. Instead, the truth is
established using an algorithm that relies on a consensus of the computers in
the network. As long as a majority of
the computers in the network are run honestly, it is nearly impossible to modify
earlier entries in the blockchain.
The ledger can only be updated by agreement of a majority of
the computers in the network. If the ledger were stored on only a single
computer, then a malicious party could, if they had access to this computer,
change the ledger. Because the Bitcoin
blockchain is represented on a large number of computers, a majority of them
would have to be compromised to change the ledger.
Public key cryptography is a second essential feature of blockchains. Public key cryptography uses two large,
mathematically related numbers. One number serves as the public key
and the other number serves as the private key.
The public key can be made widely available, while the private key is
cryptographically useful only as long as it is kept secret.
A message can be encoded using a cryptographic algorithm and
the public key. Then, it can be decoded only by someone in possession of the
private key. The same key pair also works in the other direction: a signature
created with the private key can be checked by anyone who has the public key,
and this is the form that Bitcoin transactions rely on.
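To make the two-number idea concrete, here is a minimal sketch using textbook RSA with tiny primes. It is only an illustration: the primes, the exponent, and the function names are my own assumptions, the numbers are far too small to be secure, and Bitcoin itself uses a different public key scheme (ECDSA) for signatures rather than RSA.

```python
# Toy textbook RSA with tiny primes: illustrative only, never secure.
p, q = 61, 53                 # secret primes (assumed values for the example)
n = p * q                     # public modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent: (n, e) is the public key
d = pow(e, -1, phi)           # private exponent: (n, d) is the private key (Python 3.8+)

def encode(message: int) -> int:
    """Anyone can encode a number smaller than n with the public key."""
    return pow(message, e, n)

def decode(ciphertext: int) -> int:
    """Only the holder of the private exponent can decode it."""
    return pow(ciphertext, d, n)

def sign(digest: int) -> int:
    """The private key holder signs a (small) message digest."""
    return pow(digest, d, n)

def verify(digest: int, signature: int) -> bool:
    """Anyone with the public key can check the signature."""
    return pow(signature, e, n) == digest

print(decode(encode(65)) == 65)     # the round trip recovers the message
print(verify(123, sign(123)))       # a genuine signature checks out
print(verify(124, sign(123)))       # the same signature fails for other data
```

The point of the sketch is the asymmetry: everything that uses `e` can be done by anyone, while everything that uses `d` requires the secret.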
One computer on the network can transfer a coin to another
computer by issuing a transaction. A transaction contains one or more inputs, each of which
is a reference to an output of a previous transaction. Each input includes:
- A hash of the previous transaction
- An index, which identifies a specific output of the referenced transaction
- A signature
- A public key
The public key must match the hash from the output of the
previous transaction. This public key is
used to verify the signature. The
combination of the public key and the signature proves that the transaction was
created by the actual owner of the Bitcoin being transferred. It is relatively easy to verify the
authenticity of a Bitcoin transaction by checking the chain of signatures.
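The check described above can be sketched in a few lines. This is a simplified model, not Bitcoin's actual transaction format: the `Output` and `Input` classes, the helper names, and the toy RSA key pair (tiny primes, insecure) are all assumptions made for illustration, and real Bitcoin uses ECDSA signatures and a scripting language for this step.

```python
import hashlib
from dataclasses import dataclass

def key_hash(pubkey: tuple) -> bytes:
    """Hash of a public key, as stored in the output being spent."""
    return hashlib.sha256(repr(pubkey).encode()).digest()

def digest_int(message: bytes, modulus: int) -> int:
    """Reduce a SHA-256 digest to a number small enough for the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

@dataclass
class Output:
    pubkey_hash: bytes    # hash of the key that is allowed to spend this output
    amount: int

@dataclass
class Input:
    prev_tx_hash: str     # which earlier transaction is being spent
    index: int            # which output of that transaction
    signature: int
    pubkey: tuple         # (modulus, public exponent) of the toy key

def can_spend(inp: Input, prev_output: Output, message: bytes) -> bool:
    # Step 1: the public key must match the hash recorded in the previous output.
    if key_hash(inp.pubkey) != prev_output.pubkey_hash:
        return False
    # Step 2: the public key must verify the signature over the transaction data.
    n, e = inp.pubkey
    return pow(inp.signature, e, n) == digest_int(message, n)

# Hypothetical owner with a toy RSA key pair.
N, E = 61 * 53, 17
D = pow(E, -1, 60 * 52)                       # private exponent, kept secret
owner_key = (N, E)
prev_output = Output(pubkey_hash=key_hash(owner_key), amount=50)
message = b"spend output 0 of the previous transaction"
signature = pow(digest_int(message, N), D, N)
spend = Input("placeholder-hash", 0, signature, owner_key)
print(can_spend(spend, prev_output, message))  # True
```

Someone who does not hold the private exponent can copy the public key, but cannot produce a signature that passes the second step.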
A third innovation of blockchain technology is its method
for reconciling conflicts so that there is only one distributed version of the
truth, rather than a bunch of inconsistent versions. New transactions are broadcast to all peers
in the network. Each peer who receives the transaction checks it for validity
and adds it to its block of current transactions.
The consensus method used to verify transactions is called
proof of work. Participants on the network run algorithms to confirm that the
digital signatures attached to the transactions in a block are valid. They then compete to solve a computational
puzzle, which is designed to have a low probability of success on each attempt.
This low probability of success makes it unpredictable which networked computer
will be able to generate the next block in the blockchain. It ensures that an effectively randomly selected computer
in the network serves as the temporary authority providing the authoritative
current status of the blockchain for further computations. Its peers only accept the generated
block if they can verify its accuracy.
Bitcoin’s
decentralized consensus depends on four processes running independently on network
nodes (computers connected to the network):
- Verification of each transaction
- Aggregation of those transactions into new blocks, with demonstrated computation through a proof-of-work algorithm
- Verification of the new blocks by every node and incorporation of the block into a chain
- Selection, by each node, of the chain with the most cumulative computation demonstrated through proof of work
When a block has passed all of these checks, the network accepts it
as the consensus. Because
the processes run independently on each node, they are highly resistant to tampering by a minority
of the network nodes.
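The last of the four processes, chain selection, is easy to sketch. In this hypothetical example each block carries a `work` value standing in for the computation demonstrated by its proof of work; the field names and numbers are assumptions for illustration only.

```python
def cumulative_work(chain):
    """Total computation demonstrated by all of the blocks in a chain."""
    return sum(block["work"] for block in chain)

def select_chain(candidates):
    """Each node independently keeps the chain with the most cumulative work."""
    return max(candidates, key=cumulative_work)

# Two competing versions of the ledger seen by one node.
short_but_hard = [{"id": "a1", "work": 8}, {"id": "a2", "work": 8}]
long_but_easy = [{"id": "b1", "work": 2}, {"id": "b2", "work": 2}, {"id": "b3", "work": 2}]

best = select_chain([short_but_hard, long_but_easy])
print([block["id"] for block in best])   # ['a1', 'a2']
```

Note that the winner is the chain with the most demonstrated work, not simply the one with the most blocks, which is why a long chain of cheaply produced blocks does not displace the honest one.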
Each block requires a substantial amount of work to generate
an acceptable candidate for acceptance by the network. It preserves the hash of
the preceding block. The combination of a high level of computational effort
and the preservation of the identity of its preceding block provides protection
against anyone trying to regenerate a string of blocks to cover a change to an
earlier transaction. This process
protects the chain from tampering, by making changes to preceding blocks
computationally infeasible.
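The linking of each block to the hash of its predecessor can be illustrated with a small sketch. It leaves out proof of work entirely and uses made-up transactions and function names; the only thing it is meant to show is how a change to an early block breaks every later link.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, transactions: list) -> None:
    """Append a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def chain_is_consistent(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

ledger = []
append_block(ledger, ["Alice pays Bob 5"])
append_block(ledger, ["Bob pays Carol 2"])
append_block(ledger, ["Carol pays Dave 1"])
print(chain_is_consistent(ledger))    # True

# Rewriting an early transaction changes that block's hash,
# so the next block no longer points at it.
ledger[0]["transactions"][0] = "Alice pays Mallory 5"
print(chain_is_consistent(ledger))    # False
```

To hide the change, the tamperer would have to regenerate every later block, and in a real blockchain each of those blocks would also need a fresh proof of work.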
Put simply, the proof of work method requires the node to
guess a number, called a nonce, that when
combined with the block yields a hash value below a
certain target. This nonce is basically a
random number and, on average, will take a great deal of computation to guess
(by trial and error). In fact, it can
take billions of guesses to come up with an acceptable nonce. Once guessed, however, it is trivial
to verify that it is correct. The first
peer to guess an acceptable nonce is awarded a “mining credit” of some (currently
25) Bitcoins for its work. The
successful peer then broadcasts that block to its peers, who can easily verify
that it is correct.
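The guessing game can be sketched as follows. Bitcoin hashes the block header twice with SHA-256, which the sketch imitates, but everything else is simplified: the header is a placeholder string rather than a real 80-byte header, the nonce encoding is my own choice, and the target is set so that roughly one guess in 65,536 succeeds, which is vastly easier than the real network's target.

```python
import hashlib

def double_sha256(data: bytes) -> int:
    """Hash the data twice with SHA-256 and read the result as a number."""
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")

def meets_target(header: bytes, nonce: int, target: int) -> bool:
    """Verification is a single hash computation and one comparison."""
    return double_sha256(header + nonce.to_bytes(8, "big")) < target

def mine(header: bytes, target: int) -> int:
    """Finding a nonce means trying candidates one after another."""
    nonce = 0
    while not meets_target(header, nonce, target):
        nonce += 1
    return nonce

TOY_TARGET = 1 << (256 - 16)                    # assumed toy difficulty
header = b"prev-hash|merkle-root|timestamp"     # placeholder block contents
nonce = mine(header, TOY_TARGET)
print(nonce, meets_target(header, nonce, TOY_TARGET))
```

Finding the nonce takes tens of thousands of hash computations on average even at this toy difficulty, while checking it takes one. That gap between the cost of producing and the cost of verifying is the whole idea.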
Hash: 000000000043a8c0fd1d6f726790caa2a406010d19efd2780db27bdbbd93baf6
Previous block: 00000000001937917bd2caba204bb1aa530ec1de9d0f6736e5d85d96da9c8bba
Next block: 00000000000036312a44ab7711afa46f475913fbd9727cf508ed4af3bc933d16
Time: 2010-09-16 05:03:47
Difficulty: 712.884864
Transactions: 2
Merkle root: 8fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b2600719
Nonce: 1462756097

| Input/Previous Output | Source & Amount | Recipient & Amount |
|---|---|---|
| N/A | Generation: 50 + 0 total fees | Generation: 50 + 0 total fees |
| f5d8ee39a430...:0 | 1JBSCVF6VM6QjFZyTnbpLjoCJ...: 50 | 16ro3Jptwo4asSevZnsRX6vf..: 50 |

Table 1: Example Block of Bitcoin. The block contains 2 transactions, one of which awards the
generator peer with 50 Bitcoins.
For a peer to double-spend a given Bitcoin, it would need to
replace the transaction where the Bitcoin was originally spent along with its
corresponding block. It would then have
to re-compute all of the subsequent blocks in the chain, because they refer to
this earlier block. While not
impossible, it is highly unlikely that this could be accomplished. As long as the honest peers outnumber the
dishonest ones (more precisely, as long as they control most of the network's computing power), such cheating is infeasible.
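How unlikely is "highly unlikely"? The original Bitcoin white paper treats the race between an attacker and the honest network as a gambler's ruin problem: if the attacker controls a fraction q of the computing power and is z blocks behind, the chance of ever catching up is (q/p)^z, where p = 1 - q. The sketch below implements only that simplified expression; the function name and the sample values are my own.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance that an attacker with share q of the computing power
    ever catches up from z blocks behind (simplified gambler's ruin)."""
    p = 1.0 - q                  # share held by the honest peers
    if q >= p:
        return 1.0               # a majority attacker eventually wins
    return (q / p) ** z

for blocks_behind in (1, 6, 12):
    print(blocks_behind, catch_up_probability(0.10, blocks_behind))
```

With a tenth of the computing power, the attacker's chances shrink geometrically with every block added on top of the transaction. With half or more, the protection disappears.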
Table 2 shows a few recently committed Bitcoin blocks. “Mined by” indicates the peer system that
found the nonce and won the right to broadcast the block. The difficulty of finding the nonce is
adjusted so that blocks are committed on average every 10 minutes, but as you
can see, there is some variation in how long it takes to find the nonce.
| Block Number | Age | Transactions | Mined by | Size |
|---|---|---|---|---|
| 410136 | 3 minutes ago | 653 | Discus Fish | 462607 |
| 410135 | 6 minutes ago | 338 | Discus Fish | 180432 |
| 410134 | 6 minutes ago | 1303 | Discus Fish | 978222 |
| 410133 | 7 minutes ago | 3318 | | 989240 |
| 410132 | 37 minutes ago | 563 | AntMiner | 259451 |
| 410132 | 36 minutes ago | 563 | AntMiner | 259451 |
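The ten-minute average is maintained by periodically rescaling the difficulty. Bitcoin does this every 2,016 blocks (about two weeks at ten minutes per block) and limits each adjustment to a factor of four. The sketch below is a simplified version of that rule expressed in terms of difficulty; the function and constant names are my own, and the real implementation works on the target value and has some additional quirks.

```python
TARGET_SPACING = 10 * 60       # desired seconds between blocks
RETARGET_INTERVAL = 2016       # blocks between difficulty adjustments

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Scale the difficulty by how fast the last interval was mined."""
    expected = TARGET_SPACING * RETARGET_INTERVAL
    # Each adjustment is limited to a factor of four in either direction.
    actual = min(max(actual_seconds, expected / 4), expected * 4)
    return old_difficulty * expected / actual

two_weeks = TARGET_SPACING * RETARGET_INTERVAL
print(retarget(100.0, two_weeks))        # on schedule: difficulty unchanged
print(retarget(100.0, two_weeks / 2))    # mined twice as fast: difficulty doubles
print(retarget(100.0, two_weeks * 2))    # mined half as fast: difficulty halves
```

Because the adjustment only controls the average, individual blocks can still arrive a few minutes or more than half an hour apart, as the table shows.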
Why blockchains might matter to eDiscovery
The blockchain provides a way to track and verify
transactions without requiring a central tracking authority. Right now it is used mainly by Bitcoin (or
similar crypto-currencies), but it could find applications in other
domains. Many organizations, including
IBM, at least two states, and several financial institutions, are exploring the
use of blockchains in their businesses.
If these uses become common, they will certainly figure in future
litigation. Blockchains will join the
list of sources that must be considered in eDiscovery.
As an example of the interest among financial institutions
in blockchain technology, DTCC (Depository
Trust & Clearing Corporation) recently held a well-attended symposium
on the use of blockchain technology in clearing transactions. DTCC is pursuing
blockchain technologies because they believe that these technologies have
the potential to address perceived shortcomings of the post-trade process. If this turns out to be true, blockchain
technologies are likely to play an important role in future financial
litigation, where, for example, they could be used to verify disputed transactions.
Another application
of this technology might be tracking the provenance of valuable objects,
such as real estate titles, art objects, or diamonds. Allianz, for instance, is working with a
startup to develop a system to track diamonds from the time that they are mined
through retail sale.
According to the Gemological Institute of America (GIA), the
grading reports for over a thousand registered diamonds were altered in a cybersecurity
breach. The grading report is very influential in
setting the value of a diamond, so any changes could have a significant
economic impact. Only 175 of these
diamonds have been resubmitted for reevaluation. If the evaluation records for these diamonds
had been stored in a blockchain, it would have been impractically difficult to modify
them. Disputes concerning the provenance
of registered gems would be easier to resolve.
The owners of Bitcoins or other goods registered in the
ledger are represented by their Bitcoin address, which is simply a number. The blockchain verifies that the address of
the transmitting party is authentic, in
that it verifies that the address was associated with at least the number of
Bitcoins being transferred. The blockchain does not,
however, contain information about who owns that address, merely that the
person, whoever it is, had access to the private key associated with that
address. The personal identity of the
person corresponding to a Bitcoin address is held outside of the blockchain and
may be difficult to discover.
Tools exist that let
anyone view the
transactions in a Bitcoin block, so discovering the information that is
contained in a blockchain is likely to be fairly easy and direct.
It may be too soon to tell, but blockchains may also play a
role in the eDiscovery process itself.
For example, blockchains might be used for tracking or establishing
chain of custody, legal holds, and trial exhibits.
Why blockchain technology might not matter to eDiscovery
Blockchain technology is very effective at showing that
transactions entered into its ledger have not been modified. Its approach suits Bitcoin so well
because a Bitcoin’s only existence is within this ledger. Bitcoins have no manifestation,
physical or otherwise, outside the blockchain.
Just as the blockchain is of
little use in identifying the owner of a Bitcoin address, it is not well suited to dealing with
tangible goods or instruments that exist outside of the blockchain. The blockchain can guarantee
that the transactions are faithfully represented, but unless the only
manifestation of a good is within the blockchain itself, it cannot guarantee
that the transaction faithfully represents the object to which the transaction
refers. If someone submits the wrong
information about a diamond, that wrong information will be preserved forever
in the blockchain, but will still be wrong.
The blockchain will faithfully transmit the records it is
given, but it has nothing to say about whether those records accurately
represent anything else. The blockchain
can accurately represent the sequence of transactions that brought a Bitcoin to
a specific owner, as represented by a specific Bitcoin address, but once that
Bitcoin is converted to a tangible currency, such as dollars, the blockchain
has nothing more to say. Nor does it
have anything to say about the dollars before they were exchanged for
Bitcoins. A blockchain can accurately
transmit the truth, but it cannot create it.
The asymmetry between the ability of a blockchain to
guarantee that the information it contains cannot be changed and its inability
to guarantee that the correct information has been inserted in it is a logical
limitation, not something that can be fixed by any kind of technology. As long as the information exists solely
within the blockchain (Bitcoins are DEFINED only within the blockchain), it can
work very effectively. The blockchain
transmits the truth from step to step in the same way that a logical deduction
transmits the truth. For example, the
logical sequence,
- Mary is a cow
- All cows have four legs
- Therefore Mary has four legs
transmits the truth from the premises (Mary is a cow and all cows
have four legs) to the conclusion (Therefore,
Mary has four legs). But it cannot
guarantee that Mary is, in fact, a cow or that all cows have four legs. Nothing in the logic compels these two
statements to be true. If they are true,
then the conclusion must also be true, but one or both of the premises could be
false. Establishing the truth of the
premises requires a different logical process, induction. There are no guarantees for inductive
reasoning.
In the case of the diamonds, the validity of the description
entered into the blockchain is the responsibility of the GIA gemologists. They provide an authoritative evaluation of
each diamond. Being human beings, they
are at risk of being suborned. Once
entered into the blockchain, their analysis cannot easily be changed, but there
can be no guarantee that they entered the correct data at the start.
Finally, blockchain technology is only effective when its
database is distributed across a large group of independent systems. The truth is what a majority of these systems
says the truth is. As long as a majority
of these systems is honest, the guarantee of truth is valid. One would have to suborn a majority of
systems to change the information recorded in the blockchain. The smaller the network, or, properly, the
fewer independent systems there are in a network, the higher the likelihood
that a majority could be corrupted.
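A rough calculation shows how quickly this risk grows as the network shrinks. The sketch assumes, purely for illustration, that each of n independent systems is compromised with the same probability p and that compromises are independent of one another; real attacks are unlikely to be that tidy, so the numbers should be read as a trend rather than a prediction.

```python
from math import comb

def majority_compromised(n: int, p: float) -> float:
    """Probability that more than half of n independent systems are
    compromised when each is compromised with probability p."""
    needed = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )

for nodes in (5, 25, 101):
    print(nodes, majority_compromised(nodes, 0.2))
```

Under these assumptions, a five-node network with a one-in-five chance of compromise per node loses its honest majority almost 6 percent of the time, while the same per-node risk spread over a hundred nodes makes a corrupt majority vanishingly rare.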
Two things help to ensure that the Bitcoin system is widely
distributed. First, it is open. It is based on open-source software. It is a public network. Anyone can download it and participate in the
network. The interest in Bitcoin is
sufficiently broad that many people would be interested in participating. There are currently at least 6,655 nodes on the Bitcoin network.
Second, the first system that finds the correct nonce for a
block receives a fee, which, depending on the exchange rate, could correspond
to a significant number of dollars.
Private networks, such as might be used to support GIA records or DTCC
transactions, are likely to be much smaller, and so would be easier to
compromise. The fees for running the
servers would likely be lower so there would be less incentive to add servers
to the network. Limited networks would
be correspondingly more susceptible to scams than larger networks.
Networks that are not open or broad present a much higher
risk of being successfully attacked, thus reducing the value and validity of
the blockchain. A centrally run (e.g.,
by DTCC) blockchain presents no value at all, I would argue.
Conclusion
The concept of a blockchain has the potential to radically
change secure computing in important ways.
Exactly what the impact of these changes will be on eDiscovery is, at
this point, difficult to predict.
Logical constraints limit the applicability of blockchain to the
non-virtual world, because blockchains secure the ledger, not the
correspondence between the ledger and the items whose transactions are recorded
in the ledger.
Nevertheless, it seems apparent that blockchain technology will
have some impact on eDiscovery. Bitcoin
and other crypto currencies are popular enough that they will surely figure in
some litigation at some point, and that point is probably soon.
As I see it, financial instruments, if they can be designed
to exist solely within the blockchain, as Bitcoin does, are the most immediate
application of this technology of relevance to the eDiscovery community. That is, if we can make a stock, for example,
that exists only in the ledger recorded in the blockchain, then we would be
creating a new special-purpose crypto currency for trading in that
company. The stock would only exist as
entries in the blockchain ledger, thus avoiding the weak link of external
reference.
The security of that system depends on having the ledger
represented on many independently controlled servers, so it might only work if
whole markets could be converted to virtual stock and every company or every stockholder
were to run a full blockchain server, each with a complete copy of all of the
transactions for the entire history of the market. An appropriate proof of work method would
also be required.
We are at the beginning of the hype cycle concerning blockchain
technology. It will be interesting to
see how it settles out once we reach the plateau of productivity. In the meantime, many new blockchain
applications are appearing,
some with direct appeal to the eDiscovery market. Although there may be potential value in
these applications, careful consideration is needed to determine whether they
do provide any particular value to an organization.