Arweave 2.6: Potentially Aligning Better with Satoshi Nakamoto's Vision

Intermediate · Mar 24, 2024
This article argues that Satoshi Nakamoto's vision—consensus accessible to everyone via CPU—has yet to be fully realized. The iterative mechanisms of Arweave may align more faithfully with Nakamoto's original vision, with version 2.6 marking a significant step towards fulfilling his expectations.

Introduction

In about a month, Bitcoin will undergo its next halving. Yet the author believes Satoshi Nakamoto’s vision, consensus accessible to everyone via CPU, has yet to be realized. In this regard, the iterative mechanisms of Arweave may align more faithfully with Nakamoto’s original vision, and version 2.6 represents a significant step towards fulfilling his expectations. This version brings substantial improvements over its predecessors, aiming to:

  • Restrict hardware acceleration, allowing consensus maintenance with a general-purpose CPU + mechanical hard drive, thereby reducing storage costs;
  • Direct consensus costs towards efficient data storage rather than energy-intensive hash competition;
  • Incentivize miners to establish their full Arweave data set copies, enabling faster data routing and more distributed storage.

Consensus mechanism

Based on the above goals, the mechanism of version 2.6 is roughly as follows:

  • A new component, the Hash Chain, is added to the original SPoRA mechanism. It is the cryptographic clock mentioned earlier, generating one SHA-256 mining hash per second.
  • The miner selects the index of one of the partitions it stores and begins mining, using that index together with the mining hash and its mining address as the mining input.
  • A recall range 1 is generated within the chosen partition, and a recall range 2 at a random position in the weave.
  • The miner hashes the recall chunks (Chunks) in range 1 one by one, testing whether each yields a block solution. If a result exceeds the current network difficulty, the miner wins the right to mine; otherwise, it proceeds to the next chunk in the range.
  • Chunks in range 2 may also be tested, but their solutions require the hash from range 1.

Graph 1: Schematic of the Consensus Mechanism in Version 2.6
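The loop described in the steps above can be sketched in code. This is a minimal illustration under assumed names: `try_mine`, the preimage layout, and the difficulty comparison are all inventions for exposition, not Arweave’s actual hashing scheme.

```python
import hashlib

def try_mine(mining_hash: bytes, miner_addr: bytes,
             range1_chunks: list, range2_chunks: list,
             difficulty: int):
    """Illustrative sketch of one mining attempt in the 2.6 scheme.
    Only the control flow from the bullet points is shown."""
    for i, chunk in enumerate(range1_chunks):
        # Hash each chunk of recall range 1 against the mining input.
        h1 = hashlib.sha256(mining_hash + miner_addr + chunk).digest()
        if int.from_bytes(h1, "big") > difficulty:
            return ("range 1 solution", i)
        # A range-2 chunk can only be tested using the range-1 hash.
        if i < len(range2_chunks):
            h2 = hashlib.sha256(h1 + range2_chunks[i]).digest()
            if int.from_bytes(h2, "big") > difficulty:
                return ("range 2 solution", i)
    return None  # no solution in this range; wait for the next mining hash
```

Note how a range-2 chunk is only usable together with a range-1 hash: this is what later makes storing the full dataset the dominant strategy.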

Let’s get acquainted with various terms and concepts that appear in this mechanism:

Arweave Data: Also known as the “Weave Network.” All data in the network is divided into individual data blocks, called Chunks (the blocks resembling a “brick wall” in the diagram). These chunks are evenly distributed throughout the Arweave network and addressed via a Merkle tree structure that assigns each chunk a Global Offset, allowing any chunk’s position within the Weave Network to be identified.

Chunk: Each data block is typically 256 KB in size. To win the right to mine, miners must pack the corresponding chunks and hash them, proving during the SPoRA mining process that they store copies of the data.

Partition: “Partition” is a new concept introduced in version 2.6. Each partition covers 3.6TB of data. Partitions are numbered from the beginning of the Weave Network (index 0) up to the total number of partitions covering the entire Weave Network.

Recall Range: Recall Range is another new concept in version 2.6. It represents a series of contiguous data blocks (Chunks) in the Weave Network, starting from a specific offset and having a length of 100MB. With each data block being 256 KB, a Recall Range includes 400 data blocks. In this mechanism, there are two Recall Ranges, as explained in detail below.

Potential Solutions: Every 256KB data block within the Recall Range is considered a potential solution for winning the right to mine. As part of the mining process, each data block is hashed to test if it meets the network’s difficulty requirements. If successful, the miner wins the right to mine and receives mining rewards. If unsuccessful, the miner continues to attempt the next 256KB block within the Recall Range.

Hash Chain: The Hash Chain is the key update in version 2.6, adding a cryptographic clock to the previous SPoRA that caps the maximum hash rate. The Hash Chain generates a sequence of hashes by consecutively applying the SHA-256 function, where each output becomes the next input. This process cannot be parallelized, yet is easily performed by a consumer-grade CPU; a 1-second delay is achieved by requiring a fixed number of consecutive hash operations.

Mining Hash: After a sufficient number of consecutive hash operations (i.e., after a 1-second delay), the Hash Chain produces a hash value considered valid for mining. Notably, the mining hash is identical across all miners, and all miners can verify it.
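The Hash Chain can be sketched as a strictly sequential SHA-256 loop. The iteration count below is illustrative; the network calibrates it so that one pass takes about a second on a fast CPU core.

```python
import hashlib

def hash_chain_step(seed: bytes, iterations: int) -> bytes:
    """Advance the hash chain by `iterations` consecutive SHA-256 hashes.
    Each hash depends on the previous output, so the loop cannot be
    parallelized: it acts as a delay clock, not a lottery."""
    h = seed
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h

# One "tick" of the clock; the real iteration count is tuned to ~1 second.
mining_hash = hash_chain_step(b"previous mining hash", 100_000)
```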

Now that we’ve introduced the necessary terms, we can better understand how version 2.6 operates by examining the optimal strategies for winning the right to mine under it.

Best Strategies

Arweave’s overall goal has been stated many times before: to maximize the number of data replicas stored on the network. But what should be stored, and how? There are many requirements and subtleties involved. Here, we discuss how to adopt a best-practice strategy.

Replicas vs Copies

Since version 2.6, I’ve frequently encountered two terms in the technical documents: Replicas and Copies. Both can be rendered by the same word in Chinese, but there are significant differences between them, and this ambiguity initially made the mechanism harder for me to understand. For clarity, I translate Replicas as “副本” (replicas) and Copies as “备份” (backups).

Copies are simply duplicates of the data; two backups of the same data are indistinguishable from each other.

Replicas, by contrast, emphasize uniqueness: a replica is data that has been processed into a unique form before being stored. The Arweave network incentivizes the storage of replicas, not mere backups.

Note: In version 2.7, the consensus mechanism has changed to SPoRes, which stands for Succinct Proofs of Replications, based on the storage of replicas. I will provide further interpretation in the future.

Packing Unique Replicas

Unique replicas are crucial to the Arweave mechanism. To be eligible to win the right to mine, a miner must pack its data in a specific format to form its own unique replica.

If you want to run a new node, directly copying data that other miners have already packed won’t work. You must first download and synchronize the original data from the Arweave network (you need not download all of it; downloading only a part is fine, and you can set your own data policies to filter out risky data). Then use the RandomX function to pack each chunk of the original data, turning it into a potential mining solution.

Packing involves providing a Packing Key to the RandomX function, which through many iterations generates a result used to pack the original chunk. Unpacking an already packed chunk works the same way: provide the packing key and use the result of the same computation to unpack the chunk.

In version 2.5, the packing key was a SHA-256 hash of chunk_offset (the chunk’s offset, i.e., its position parameter) and tx_root (the transaction root). This ensured that every mining solution came from a unique replica of the chunk at a specific position in a specific block. If a chunk had multiple backups at different locations in the weave, each backup had to be packed separately as its own unique replica.

In version 2.6, the packing key is extended to a SHA-256 hash of chunk_offset, tx_root, and miner_address (the miner’s address). This means each replica is also unique to each mining address.
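The difference between the 2.5 and 2.6 packing keys can be illustrated as follows. The byte encodings of the fields are assumptions; only the set of inputs to the SHA-256 hash reflects the text, and the subsequent RandomX packing step is not shown.

```python
import hashlib

def packing_key_v2_5(chunk_offset: int, tx_root: bytes) -> bytes:
    # 2.5: the key binds a replica to a position in the weave.
    return hashlib.sha256(chunk_offset.to_bytes(32, "big") + tx_root).digest()

def packing_key_v2_6(chunk_offset: int, tx_root: bytes,
                     miner_address: bytes) -> bytes:
    # 2.6: adding the miner address makes every miner's replica unique.
    return hashlib.sha256(
        chunk_offset.to_bytes(32, "big") + tx_root + miner_address
    ).digest()
```

Two miners packing the same chunk now derive different keys, so neither can reuse the other’s packed data.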

Advantages of storing complete replicas

The algorithm incentivizes miners to build a complete unique replica rather than many partial copies, which ensures an even distribution of data throughout the network.

How should we understand this? Let’s compare the two scenarios illustrated in the images below.

First, let’s assume the entire Arweave weave consists of 16 data partitions.

Scenario 1:

  • Miner Bob found downloading the data too time-consuming, so he downloaded only the first 4 partitions of the weave.
  • To squeeze the most mining out of these 4 partitions, Bob had a clever idea: he made 4 copies of the data and packed them into 4 unique replicas under 4 different mining addresses. Bob now has 16 packed partitions in storage, which complies with the unique-replica rules.
  • Each second, on receiving the mining hash, Bob can test every chunk in recall range 1 of each of his partitions, giving him 400 * 16 = 6,400 potential solutions per second.
  • But Bob’s cleverness comes at a cost: the second recall ranges. The “question marks” in Figure 2 mark second recall ranges Bob cannot find on his hard drive, because they fall in partitions he did not store. Since Bob stores only 4 of the weave’s 16 distinct partitions, on average just 25% of his second recall ranges are available, contributing 6,400 * 25% = 1,600 potential solutions.
  • This strategy therefore gives Bob 6,400 + 1,600 = 8,000 potential solutions per second.

Figure 2: Bob’s “Clever” Strategy: First Scenario

Second Scenario:

Now, let’s look at the second scenario. Because of how the two recall ranges are arranged, the more optimal strategy is to store a single unique replica of the full dataset. This is illustrated in Figure 3.

  • Miner Alice, unlike the “clever” Bob, diligently downloads all 16 partitions of the weave and packs them into a single unique replica under just one mining address.
  • Since Alice also stores 16 partitions, her potential solutions from the first recall range match Bob’s: 6,400.
  • In this scenario, however, Alice obtains all of the potential solutions from the second recall range as well: another 6,400.
  • Alice’s strategy thus yields 6,400 + 6,400 = 12,800 potential solutions per second. The advantage is obvious.
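The arithmetic behind the two scenarios can be reproduced directly. The model below simplifies the reasoning in the text: recall range 2 lands uniformly in the weave, so only the fraction of distinct partitions a miner stores contributes second-range solutions. The function name and parameters are illustrative.

```python
CHUNKS_PER_RANGE = 400   # 100 MB recall range / 256 KB chunks
TOTAL_PARTITIONS = 16    # size of the example weave

def potential_solutions_per_second(distinct_partitions: int,
                                   copies: int) -> int:
    """Potential solutions/second for a miner storing `distinct_partitions`
    different partitions, duplicated under `copies` mining addresses."""
    stored = distinct_partitions * copies
    range1 = stored * CHUNKS_PER_RANGE
    # Range 2 is drawn uniformly from the whole weave: only hits inside
    # the distinct partitions the miner stores are usable.
    range2 = range1 * distinct_partitions // TOTAL_PARTITIONS
    return range1 + range2

bob = potential_solutions_per_second(4, 4)     # 6,400 + 1,600 = 8,000
alice = potential_solutions_per_second(16, 1)  # 6,400 + 6,400 = 12,800
```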

Figure 3: Alice’s strategy has greater advantages

The Role of Recall Ranges

You might wonder why, in earlier versions, a function randomly hashed out a single recall chunk offset for miners to locate and prove storage of, whereas version 2.6 hashes out an entire recall range instead.

The reason is quite simple: a recall range consists of contiguous chunks, and this structure serves one main purpose: minimizing movement of a mechanical hard drive’s (HDD) read head. This physical optimization lets HDD read performance keep pace with far more expensive solid-state drives (SSDs). An SSD retains a slight edge, able to transfer perhaps four recall ranges per second, but compared with much cheaper HDDs, cost per unit of capacity becomes the key metric driving miners’ hardware choices.
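A back-of-the-envelope comparison shows why contiguity matters. The seek time and throughput figures below are typical commodity-HDD assumptions, not numbers from the text.

```python
# Typical commodity HDD figures (assumed for illustration).
SEEK_MS = 8.0            # average seek + rotational latency
SEQ_MB_PER_S = 200.0     # sequential read throughput
CHUNK_KB = 256
CHUNKS = 400             # one 100 MB recall range

# Old-style access: one seek per randomly placed chunk.
random_reads_s = CHUNKS * (SEEK_MS / 1000 + (CHUNK_KB / 1024) / SEQ_MB_PER_S)

# 2.6-style access: one seek, then one contiguous 100 MB read.
contiguous_s = SEEK_MS / 1000 + (CHUNKS * CHUNK_KB / 1024) / SEQ_MB_PER_S
```

Under these assumptions, 400 random chunk reads cost several seconds of seeking, while a single contiguous range read completes in about half a second: the seek penalty, not raw throughput, is what the recall range eliminates.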

Verification of the Hash Chain

Now let’s discuss the verification of a new block.

To accept a new block, a validator checks the block received from the block producer by verifying the block’s mining hash against the mining hash its own hash chain generates.

A validator that is not at the current head of the hash chain relies on checkpoints: each mining hash includes 25 40-millisecond checkpoints, each the result of 40 milliseconds of consecutive hashing, which together span the one-second interval since the previous mining hash.

Before propagating a newly received block to other nodes, a validator rapidly verifies these 25 checkpoints in parallel, completing within roughly 40 milliseconds. If this quick check succeeds, the validator propagates the block and continues validating the remaining checkpoints.

Full validation then covers the remaining checkpoints: after the first 25 come 500 more checkpoints, then another 500, with the interval covered by each checkpoint doubling for each subsequent group of 500.

While the hash chain must generate mining hashes strictly sequentially, validators can verify the checkpoints in parallel, shortening block verification time and improving efficiency.
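The asymmetry between sequential generation and parallel verification can be sketched as follows. The helper names and uniform checkpoint spacing are simplifications; the real scheme uses the 25 × 40 ms layout followed by groups of 500 described above.

```python
import hashlib

def sha256_chain(h: bytes, n: int) -> bytes:
    """n consecutive SHA-256 hashes; must be computed sequentially."""
    for _ in range(n):
        h = hashlib.sha256(h).digest()
    return h

def verify_checkpoints(start: bytes, checkpoints: list,
                       hashes_per_gap: int) -> bool:
    """Each gap between checkpoints has a known start and end value, so
    the gaps can be verified independently (hence in parallel), unlike
    generation, which must run end to end."""
    prev = start
    for cp in checkpoints:
        if sha256_chain(prev, hashes_per_gap) != cp:
            return False
        prev = cp
    return True
```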

Figure 4: Verification Process of the Hash Chain

Seed of the Hash Chain

If a miner or mining pool has faster SHA-256 hashing capability, its hash chain may run ahead of the rest of the network. Over time, this speed advantage can accumulate into a significant hash chain offset, desynchronizing its mining hashes from those of other validators and potentially leading to a series of uncontrollable forks and reorganizations.

To reduce the likelihood of such hash chain offsets, Arweave synchronizes the global hash chain at fixed intervals using data from a historical block. This regularly provides a new seed for the hash chain, anchoring the hash chains of all miners to a validated block.

A new seed block is selected every 50 * 120 mining hashes (50 blocks times 120 mining hashes per 2-minute block production cycle). Seed blocks therefore appear roughly every ~50 Arweave blocks, though variation in block times means a seed block may arrive slightly earlier or later.
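The seed interval arithmetic from the paragraph above, spelled out:

```python
BLOCKS_PER_SEED_INTERVAL = 50
SECONDS_PER_BLOCK = 120               # 2-minute target block time
HASHES_PER_BLOCK = SECONDS_PER_BLOCK  # one mining hash per second

# A new seed block is chosen every 50 * 120 = 6000 mining hashes,
# i.e. roughly every 50 blocks (~100 minutes) at the target rate.
hashes_per_seed_interval = BLOCKS_PER_SEED_INTERVAL * HASHES_PER_BLOCK
minutes_per_seed_interval = hashes_per_seed_interval / 60
```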

Figure 5: Generation Method of Hash Chain Seeds

The above, drawn by the author from the version 2.6 specification, illustrates that starting with this version, Arweave operates the entire network on low-power, more decentralized mechanisms. Satoshi Nakamoto’s vision finds a practical realization in Arweave.

Arweave 2.6: https://2-6-spec.arweave.dev/

Statement:

  1. This article originally titled “Arweave 2.6 也许更符合中本聪的愿景” is reproduced from [PermaDAO]. All copyrights belong to the original author [Arweave Oasis]. If you have any objection to the reprint, please contact Gate Learn team, the team will handle it as soon as possible.

  2. Disclaimer: The views and opinions expressed in this article represent only the author’s personal views and do not constitute any investment advice.

  3. Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.
