An Introduction to Blockchain

Learn the fundamentals of a blockchain starting from first principles. We'll cover hashing, mining, consensus and more. After reading this article, you'll have a solid foundation upon which to explore platforms like Ethereum and Solana.

Web3 has been hailed by some as the next age of the Internet. While still in its infancy, Web3 promises to provide a more decentralized way of computing. Love it or hate it, as engineers and scientists, we should understand the technical foundations of Web3. In order to understand Web3, it is essential to understand the fundamentals of blockchains. This is because blockchains form the backbone of systems like Bitcoin and Ethereum by providing a cryptographically secure and immutable data layer.

My goal in this primer is to give you an intuitive understanding of blockchains, along with some tools and examples, so that you can better understand future articles on more sophisticated Web3 technologies, like Ethereum. In this guide, we will focus on Bitcoin’s blockchain, since it is one of the simplest to understand. It will give you a set of fundamentals to build upon.

Blockchains have a fairly steep learning curve because they combine topics from multiple fields, such as computer science, cryptography, distributed systems, and economics. The concept has also received so much hype and misinformation that people have a hard time grokking it, and some mistake it for a kind of magic. This need not be the case. As we’ll soon see, blockchains, while clever, can be understood at a fundamental level by proceeding step by step.

A useful resource: Visualization is key in understanding these concepts. Anders Brownworth has created a very nice, interactive tool for tinkering with blockchains. I will reference it throughout this document, and recommend experimenting with it.

Background

Bitcoin was designed to be a "peer-to-peer version of electronic cash...requiring no trusted third party to prevent double spending".

Loosely speaking, double spending is when someone tries to spend the same coin twice, for example by giving it to two people at the same time.

You can read the full abstract from the original Bitcoin whitepaper. It does a pretty great job of introducing and motivating the technology. You'll notice two things about the abstract:

  1. Bitcoin’s main goal was to prevent "double-spending" without using a trusted 3rd party.
  2. The word “blockchain” doesn’t appear anywhere.

Bitcoin is designed to be an electronic currency that can process payment transactions in a way which does not require a trusted 3rd party. Users use public key cryptography to sign transactions. This is a theme that is common across blockchain technologies.

Signed transactions, from the Bitcoin whitepaper

Technical Concepts

Here’s a quick overview of the technical concepts covered in this article. These are what enable blockchains like Bitcoin to function:

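  • Cryptographic hash functions, like SHA256. Used for deriving blockchain account addresses from public keys, for linking blocks together, and for mining. Transactions are digitally signed with public/private key pairs, which come from public key cryptography rather than from hashing.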
  • Transactions containing data which update the state of the blockchain. For example, Alice sends Bob 10 bitcoin. Such transactions use digital signatures (via private key) to show that Alice really did send Bob 10 bitcoin. Otherwise, Bob could just fake the transaction.
  • Blocks, which group transactions. This grouping is done to increase the efficiency of the network by processing multiple transactions at once.
  • Blockchain. A data structure containing blocks of transactions which are securely linked together. Think of it like a singly-linked list with some hash functions sprinkled in. It’s a simple, yet surprisingly useful mental model.
  • A peer-to-peer network consisting of nodes. These nodes propagate transactions and blocks on the network to one another in order to reach consensus and make updates to the blockchain.
  • A distributed consensus mechanism for deciding how new blocks should be added to the blockchain. The consensus mechanism is of vital importance, since it is what decentralizes control over the blockchain by forcing nodes participating in the network to cooperate in the enforcement of a set of consensus rules. This makes it difficult for bad actors to add fake blocks or try to attack the network.
  • Cryptoeconomics. A game-theoretic way to incentivize nodes in the network to spend the energy and computing resources required to process transactions in an honest way. This is usually done in the form of rewards paid using a native cryptocurrency, like bitcoin.
  • An open-source software client which implements all of the above, making it easy for individuals to download and become a node participating in the network.

Before diving into the technical details, it's worth taking a moment to understand some of the motivations behind decentralization.

Decentralization

In a centralized world, a trusted authority controls reads and writes to a store of data. For example, your bank is the only entity which can update your account balance in their database, and you trust them to do so. If you received news that your bank had decided to make their database publicly available for anyone to download and update, you would probably panic.

And yet, that’s what Bitcoin does. Bitcoin is decentralized and trustless, meaning there is no trusted authority which controls updates. Bitcoin forms a distributed, peer-to-peer network of computers called "nodes" which run Bitcoin’s software. This software contains a copy of Bitcoin’s blockchain, along with the ability to write updates to it, and communicate with other nodes on the network. Any person can download a copy of the software freely, and become a node.

If Bitcoin were using a traditional database, this would be chaos. If any node could make updates to the database at any time, there would be conflicts, fake transactions, and inconsistencies between different nodes’ copies of the database. It would be completely useless. This could be solved by permitting only certain nodes to have write access, but this would require the network to trust those special nodes, and Bitcoin is designed to be trustless. Too much trust, and you’re back at centralization.

Because of this, Bitcoin does not use a conventional database. Instead, it uses a blockchain to store data, and a consensus mechanism called Proof-of-Work to control updates to the blockchain. In a distributed network, there can be multiple competing versions of the blockchain at any time. Nodes in the network are configured to always accept the longest chain as being valid, as that is the chain that has the largest amount of “proof-of-work” in it. We’ll see why this is shortly.

It’s worth noting that having a blockchain ledger containing transactions is nothing special. The difficult part is getting a network to agree on the state of the ledger when it’s updated. This has been a topic of interest in distributed systems research for decades. You'll often hear the term Byzantine Fault Tolerant (BFT) thrown around. BFT refers to a class of consensus mechanisms that work when participants in the network can't trust each other to behave well. Imagine that Alice broadcasts a transaction into the network saying that she pays Bob 10 bitcoin. How can Alice and Bob be sure that other nodes on the network have added this transaction to their ledger? How can nodes on the network be sure that this is a legitimate transaction? Maybe Alice already spent her 10 bitcoin somewhere else (Double Spend).

This was the innovation of Bitcoin - finding a straightforward way for a distributed network to reach consensus on the state of the ledger without involving any trusted parties. This is proof-of-work, also known as Nakamoto consensus. Briefly, it says that of all the different copies of the blockchain that exist in the network at any given time, the true blockchain is always the longest one, because it is the one that has the most computational work invested into it by the network. As long as honest nodes control more than half of the computing power in the network, the longest blockchain will be the real one. If this were not the case, and say Alice and Bob together controlled more than 50% of the computing power, they could potentially double spend their coins in what is known as a 51% attack.
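As a rough illustration of the rule described above, here is a minimal Python sketch of how a node might choose between competing chains. It is not Bitcoin's actual implementation: each chain is represented only by a list of per-block work estimates, and the names `chain_work` and `select_chain` are made up for this example.

```python
def chain_work(work_per_block: list[int]) -> int:
    """Total work in a chain, given a (hypothetical) work estimate per block."""
    return sum(work_per_block)


def select_chain(candidates: dict[str, list[int]]) -> str:
    """Return the name of the candidate chain with the most total work."""
    return max(candidates, key=lambda name: chain_work(candidates[name]))


# When every block takes the same amount of work, the longest chain wins.
equal_work = {"a": [10, 10, 10], "b": [10, 10, 10, 10]}
print(select_chain(equal_work))

# A shorter chain can still win if its blocks were harder to produce.
mixed_work = {"long_but_easy": [10, 10, 10, 10], "short_but_hard": [100, 100]}
print(select_chain(mixed_work))
```

When all blocks require the same effort, "most work" and "longest" are the same thing, which is why the two phrases are often used interchangeably.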

In the next sections, we’ll look into the technical details of how this consensus is achieved.

Cryptographic Hashing

An essential concept in blockchains is hashing. Bitcoin uses the SHA256 cryptographic hash function. Cryptographic hash functions have two properties worth noting:

  1. Their inputs cannot feasibly be recovered from their outputs. They are non-reversible.
  2. Two similar inputs should produce greatly different outputs. This is related to point (1). You should not be able to find any patterns in the output that would help you guess the input.

For a visualization of this, notice in the example below that each time the data changes, the hash value changes to a new value. Let’s try hashing a simple transaction, like {from: "alice", to: "bob", value: 10}. Notice how the hash changes whenever we change part of the data, such as the value.

Changing the data changes the hash
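If you'd rather try this in code than in the interactive tool, here is a small Python sketch using the standard library's `hashlib`. Serializing the transaction as sorted JSON is an assumption made for this example, and `hash_transaction` is a made-up helper name. Real Bitcoin transactions use their own binary format.

```python
import hashlib
import json


def hash_transaction(tx: dict) -> str:
    """Return the SHA256 hex digest of a transaction serialized as JSON."""
    # sort_keys gives a stable serialization, so equal dicts hash equally.
    serialized = json.dumps(tx, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


original = {"from": "alice", "to": "bob", "value": 10}
modified = {"from": "alice", "to": "bob", "value": 11}

# The two digests share no obvious pattern, even though only one digit changed.
print(hash_transaction(original))
print(hash_transaction(modified))
```

Running it prints two 64-character hex strings that look unrelated to each other.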

Blocks

Transaction data is grouped together and stored in blocks. Adding blocks to the blockchain takes time, so it is more efficient to group multiple transactions into a single block. In addition to transaction data, a block has a header which stores some metadata about the block. For now, we’ll focus on two pieces of metadata, the nonce and the hash.

We’ve already seen that the hash changes based on the data. When adding a block to the blockchain, there is a special challenge that must be solved: we need to guess a value for the nonce such that the SHA256 hash of the nonce and the serialized data fields has a certain number of leading zeros. In this article, that required number of leading zeros is referred to as the block’s difficulty level. The larger the required number of leading zeros, the longer it takes to guess a nonce that produces a hash with that many leading zeros. Why is this?

Think of it like this: a SHA256 hash is 256 bits, which is 64 hexadecimal characters (each hex character takes 4 bits to encode, and 4 * 64 = 256). Each hex character can be one of 16 possible values, and we have 64 of them, so there are 16^64 possible SHA256 hashes. Let’s say we want to find a hash with 20 leading zeros. That means 20 of our 64 characters are fixed, so there are only 16^(64 - 20) or 16^44 such hashes. The chance that any single guess lands on one of them is therefore 16^44 / 16^64 = 1 / 16^20, or roughly 1 in 10^24.

Our odds of finding such a hash randomly are quite small, so a block difficulty of 20 means that it will take a very long time to mine the block (i.e. find a nonce that hashes with the block’s contents to produce a hash with 20 leading zeros). In the example below, we are only searching for hashes with 4 leading zeros, so it is much easier to find a suitable nonce.
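To put numbers on this, here is a short Python sketch of the counting argument. It assumes, as this article does, that difficulty means a number of leading hexadecimal zeros, and that each guess is equally likely to produce any hash. The function names are made up for the example.

```python
from fractions import Fraction

HEX_DIGITS = 64  # a SHA256 hash is 256 bits, i.e. 64 hex characters


def matching_hashes(leading_zeros: int) -> int:
    """Number of possible hashes that start with the given count of zeros."""
    return 16 ** (HEX_DIGITS - leading_zeros)


def success_probability(leading_zeros: int) -> Fraction:
    """Chance that one uniformly random hash has enough leading zeros."""
    return Fraction(matching_hashes(leading_zeros), 16 ** HEX_DIGITS)


def expected_attempts(leading_zeros: int) -> int:
    """Average number of guesses needed, which is 1 / success_probability."""
    return 16 ** leading_zeros


for zeros in (4, 20):
    print(zeros, success_probability(zeros), expected_attempts(zeros))
```

Four leading zeros take about 65,536 guesses on average, which a laptop handles almost instantly. Twenty leading zeros take about 1.2 * 10^24 guesses.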

Even slight changes to data will change the hash and invalidate the block

Once the “Mine” button is clicked, we start guessing nonces. The block eventually turns green, indicating that a nonce has been found whose hash has four leading zeros. As you may have guessed, this process of finding the correct nonce value for a set of data is called “mining”, like a prospector mining for gold in the wild west.
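What the “Mine” button does can be sketched in a few lines of Python. This is a toy version under stated assumptions: the nonce is simply prepended to the data as text before hashing, nonces are tried in order starting from zero, and `mine` is a made-up function name. Real Bitcoin miners hash a binary block header instead.

```python
import hashlib


def mine(data: str, difficulty: int = 4) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash has `difficulty` leading zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{nonce}{data}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


nonce, digest = mine("alice pays bob 10")
print(nonce, digest)
```

Note the asymmetry: finding the nonce takes many guesses, but anyone can verify it with a single hash.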

The computational “work” in “proof-of-work” is the act of guessing nonces until a suitable hash is produced. Note that in this example, we need to produce a hash with only four leading zeros, but real Bitcoin hashes currently need around 17 leading zeros, a requirement the network adjusts over time. You can see this in action by looking at a block explorer for Bitcoin’s main network. Notice how all the Hash IDs start with leading zeros. As mentioned previously, the more leading zeros, the more time-consuming the mining process is.

Why would Bitcoin want to make the process of mining so time consuming? The answer can be found when we start to link the blocks together into a chain. It has to do with making it very difficult for an attacker to produce a fake version of the blockchain.

Blockchains

In addition to the nonce and hash fields, a block header also contains the hash of the block that came before it in the chain. That previous block also contains the hash of the block that came before it, and so on until a chain of blocks is formed. Hence the term, “blockchain”.

Blockchain diagram from the Bitcoin whitepaper
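The linked structure can be sketched in Python as follows. This is a toy model, not Bitcoin's real block format: the `Block` fields, the text serialization, the all-zero "previous hash" for the first block, and the default difficulty of 3 (chosen so the example runs quickly) are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str  # hash of the block that came before this one
    nonce: int = 0

    def hash(self) -> str:
        """SHA256 over all fields, so changing any field changes the hash."""
        payload = f"{self.index}{self.nonce}{self.data}{self.prev_hash}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine_block(index: int, data: str, prev_hash: str, difficulty: int) -> Block:
    """Increment the nonce until the block hash has enough leading zeros."""
    block = Block(index, data, prev_hash)
    while not block.hash().startswith("0" * difficulty):
        block.nonce += 1
    return block


def build_chain(entries: list[str], difficulty: int = 3) -> list[Block]:
    """Mine one block per entry, each pointing at the previous block's hash."""
    chain: list[Block] = []
    prev_hash = "0" * 64  # placeholder for the first block, which has no parent
    for index, data in enumerate(entries, start=1):
        block = mine_block(index, data, prev_hash, difficulty)
        chain.append(block)
        prev_hash = block.hash()
    return chain


chain = build_chain(["alice pays bob 10", "bob pays carol 3", "carol pays dave 1"])
for block in chain:
    print(block.index, block.prev_hash[:12], block.hash()[:12])
```

Each printed row's previous-hash prefix matches the hash prefix of the row above it, which is the "chain" in blockchain.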

In the example below, notice that the five blocks in the chain start out green, each with hashes with four leading zeros. These are all valid “mined” blocks.

In this toy example, all hashes must have 4 leading zeros. The block difficulty is thus "4"

Now, suppose an attacker wanted to change the data in block 2. In this case, assume Alice has only 10 bitcoin, and she wants to double spend that 10 bitcoin to pay both Bob and George. As soon as she changes the data in block 2 of her local copy of the blockchain, the hash of block 2 changes. Because block 3 contains the hash of block 2 (which has just changed), the hash of block 3 will now be invalidated, and so on up the chain. Every block starting at block 2 is now red (invalid), because the change has rippled and invalidated the chain.

A change to block 2 invalidates blocks 2, 3, 4 and 5
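The ripple effect can be reproduced with a small, self-contained Python sketch. The block layout, the difficulty of 3, and the helper names (`Block`, `mine_block`, `validity`) are assumptions for this toy model. A block counts as valid here only if its hash has enough leading zeros, its stored previous hash matches the block before it, and that earlier block is itself valid.

```python
import hashlib
from dataclasses import dataclass

DIFFICULTY = 3  # toy value so the example runs quickly


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str
    nonce: int = 0

    def hash(self) -> str:
        payload = f"{self.index}{self.nonce}{self.data}{self.prev_hash}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine_block(index: int, data: str, prev_hash: str) -> Block:
    block = Block(index, data, prev_hash)
    while not block.hash().startswith("0" * DIFFICULTY):
        block.nonce += 1
    return block


def validity(chain: list[Block]) -> list[bool]:
    """Return one flag per block: True for green (valid), False for red."""
    flags = []
    prev_hash, prev_ok = "0" * 64, True
    for block in chain:
        ok = (
            prev_ok
            and block.prev_hash == prev_hash
            and block.hash().startswith("0" * DIFFICULTY)
        )
        flags.append(ok)
        prev_hash, prev_ok = block.hash(), ok
    return flags


# Build a five-block chain, then tamper with block 2.
chain: list[Block] = []
prev = "0" * 64
for i in range(1, 6):
    block = mine_block(i, f"transactions for block {i}", prev)
    chain.append(block)
    prev = block.hash()

print(validity(chain))  # every block is valid before tampering
chain[1].data = "alice pays bob 10, alice pays george 10"
print(validity(chain))  # blocks after the edit no longer link up
```

After the edit, block 2's new hash almost certainly lacks the required zeros, and block 3 onward fail because they still point at block 2's old hash.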

If Alice tried to submit this chain to the network, it would immediately be rejected as invalid, because the hashes don’t have four leading zeros. Put another way, there is no “proof of work” to say that these blocks are valid, and the network always accepts the chain with the largest amount of verifiable (or provable) work. Alice would have to re-mine every block starting from block 2 to validate them, and then send this chain into the network for consensus. But we saw in the previous step that mining a block is computationally intensive and takes time.

Turning blocks from red to green takes time

In the time it takes for Alice to re-mine each of these blocks, a new block will have been added to the legitimate blockchain by the rest of the network. Unless Alice controls more than half of the computational power in the network, it can be shown that the probability of her catching up with the main chain and successfully completing her attack drops off exponentially with each additional block. Remember, the nodes on the network will always accept the longest mined chain as the valid one, because this is the chain that has the most computational proof of work invested into it, and we assume that the majority of the computing resources on the network are controlled by honest nodes. This is how Bitcoin’s blockchain maintains its security in a public, decentralized, trustless network.
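The exponential drop-off can be made concrete with the gambler's-ruin result from the Bitcoin whitepaper: if the attacker controls a fraction q of the computing power and the honest nodes control p = 1 - q, the chance of ever catching up from z blocks behind is (q/p)^z when q < p, and 1 otherwise. The Python sketch below covers only that simplified piece of the whitepaper's analysis, and the function name is made up for this example.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance an attacker ever catches up from `blocks_behind` blocks back."""
    q = attacker_share
    p = 1.0 - q  # share of computing power held by honest nodes
    if q >= p:
        return 1.0  # with half the power or more, catching up is certain
    return (q / p) ** blocks_behind


for z in (1, 2, 4, 6):
    print(z, catch_up_probability(0.10, z))
```

With 10% of the computing power, the attacker's odds shrink by a factor of nine for every extra block they have to make up.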

It’s worth noting that Proof-of-Work is one consensus mechanism for blockchains. It has its limitations, such as being slow and requiring large amounts of computation. It also assumes that most nodes will be honest. Other consensus mechanisms, such as Proof-of-Stake, also exist. They are the topic of a future article.

Incentives and Cryptoeconomics

We’ve yet to answer an important question: if mining is so resource intensive, why would anyone volunteer to do it? All that electricity and specialized hardware costs money - to say nothing of the environmental impact. This brings us to game theory and an emerging field known as cryptoeconomics, which asks how to incentivize honest participation in the network.

Here's a fun fact: in Bitcoin, mining is the only process by which bitcoin can be created. Each time a miner successfully adds a block to the chain, they are allowed to include a transaction giving themselves some bitcoin as a reward for their work (6.25 bitcoin as of this writing in 2022; this number halves roughly every four years). This is called a miner reward, and it is the primary economic incentive - along with transaction fees - for miners to pay for the specialized hardware and energy required to mine. Ethereum provides its miners with a reward in its native currency (ether) as well. Note that it makes sense to use a currency native to the blockchain to reward miners. If rewards were to instead use fiat currencies like USD or JPY, we’d be back to trusting a central, off-chain authority, which is what Bitcoin set out to avoid in the first place. Note also that the miner reward is intended to be large enough to incentivize nodes to behave honestly, instead of using their resources to try to attack the network. If you had enough resources to attack the network, you'd probably be better off just mining a bunch of blocks instead.
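The halving schedule is simple enough to sketch in Python. In Bitcoin the reward started at 50 bitcoin and halves every 210,000 blocks, which works out to roughly four years. The function and constant names below are made up for this example, and amounts are in satoshis (1 bitcoin = 100,000,000 satoshis) to avoid floating-point rounding.

```python
HALVING_INTERVAL = 210_000  # blocks between halvings
INITIAL_SUBSIDY_SATOSHI = 50 * 100_000_000  # 50 bitcoin, in satoshis


def block_subsidy(height: int) -> int:
    """New coins (in satoshis) a miner may create at the given block height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0  # the subsidy has been halved away entirely
    return INITIAL_SUBSIDY_SATOSHI >> halvings  # integer halving per interval


for height in (0, 210_000, 420_000, 700_000):
    print(height, block_subsidy(height) / 100_000_000)
```

Block 700,000 falls after the third halving, which gives the 6.25 bitcoin reward mentioned above.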

Conclusion

If you’ve made it this far, here are some additional resources for learning about blockchains. Keep in mind that by understanding how Bitcoin works in terms of its blockchain and consensus mechanism, you will be in very good shape to understand how Ethereum works, and how you can develop Web3 programs on top of it. Even if you despise the idea of Web3, this is still a great way to learn some fundamentals of computer science and economics. Thanks for reading, and see you next time.

Article Resources

About the author

Nate Lapinski

Fullstack Developer. Love digging into the internals of stuff. Always trying to reach the next level.
