Robert Steinadler, 6 months ago
Hash trees are vital to blockchain technology and Bitcoin because they allow validating data securely and efficiently. The name Merkle tree derives from its scientist inventor Ralph Merkle who patented his invention in 1979. Today, hash trees are very common in cryptography as well as in computer science.
What is a hash tree, how does it work, and why is it so important for Bitcoin and blockchain technology in general?
A Merkle tree, or hash tree, is a mathematical data structure that is commonly used in many computer science applications to organize data. Blockchain technology is only one of them. Hash trees are used to encode blockchain data efficiently and securely. Other applications that use hash trees are Git, the IPFS or database systems like, for example, Amazon DynamoDB or Cassandra.
Hash trees are used to summarize transactions on the blockchain by making up a hash of various data blocks of transactions that have been performed in past. The hashed data does not use a lot of space, and its verification is fast and secure since the hash functions that are used are considered to be highly reliable.
The Merkle root is a mathematical method to confirm all facts that are related to the hash tree. In blockchain, the Merkle root is used to make sure that all blocks that are being sent through a peer-to-peer network are secure. This includes checking the consistency of the data of each block to prove that it has not been altered.
This is achieved by using hash functions. These functions are one-way, meaning that they are deterministic, collision resistant and cannot be reversed. A specific input will always match the same output. This output is the hash value. Let’s take a look at an example. When running the sentence “LiteBit is the best exchange.” through the SHA256 hash algorithm, the following output is always the same:
1abe7feee5daba070aa7673651ab9d407af23277387198d3da938c388bb2d295
Changing one letter or adding more information to the sentence would immediately alter the output.
A Merkle tree creates a digital fingerprint that totals all transactions in a block by creating a verifiable hash value. This way, a user can verify if a transaction is included in a block or not. Transactions are hashed using their transaction ID as input until only one hash value remains. That particular value is the Merkle root.
Take a look at the diagram above. The hash tree grows from the bottom to the top. The IDs of transactions A, B, C, and D are hashed. The resulting hash values are considered to be the so-called leaves of the tree (hash A, hash B, hash C, and hash D). A pair of leaf nodes are used to create a hash value of a non-leaf node. In our example above, a hash value is created by combining the hashes of transaction ID A and B and another one by combining the two hashes of transaction ID C and D. The Merkle root is created by combining hash AB and hash CD. In effect, the Merkle root in our example is a hash value based on four transaction IDs.
Since most hash tree implementations are binary, they are also called binary hash trees because each node has two child nodes. It is essential that only the hashed data are considered part of the hash tree, not the data itself. Also, this example is very simplistic. A Merkle tree growing in the wild is usually much more complex and is not necessarily binary.
A hash can represent a huge amount of data while being super small. At the end of the day, it is only a relatively short string. Therefore, hash trees offer several benefits:
Bitcoin is the first cryptocurrency invented, and it also introduced the blockchain. Let’s pretend that Bitcoin wouldn’t use Merkle trees. To check transaction data for integrity, each node in the network could not rely on the Merkle root that is usually part of each block. In effect, every node would need to have a full copy of the Bitcoin blockchain and hold that copy against every new block propagated by the network.
One could argue that this is already the case. Each full node has a copy of the Bitcoin blockchain and can check the whole ledger. While this is true, the computational effort would be immense since each node would be busy checking data instead of hashes.
A Merkle tree separates plays an important role in accounting by separating the proof for a transaction from the transaction's data. Handing down the proof that a transaction happened is faster, more efficient and frankly more reliable than checking all the data repeatedly.