In about a month, #Bitcoin is set to undergo its next halving. However, the author believes that Satoshi Nakamoto’s vision of consensus accessible to everyone via CPU has yet to be realized. In this respect, Arweave’s iterative mechanisms may align more faithfully with Nakamoto’s original vision, and version 2.6 represents a significant step toward fulfilling it. This version brings substantial improvements over its predecessors, aiming to:
Based on the above goals, the mechanism of version 2.6 is roughly as follows:
Figure 1: Schematic of the Consensus Mechanism in Version 2.6
Let’s get acquainted with various terms and concepts that appear in this mechanism:
Arweave Data: Also known as the “Weave Network.” All data in the network is divided into individual data blocks called Chunks (the blocks resembling a “brick wall” in the diagram). These chunks are distributed evenly across the Arweave network and are addressed through a Merkle tree structure, which gives each chunk a Global Offset identifying its position within the Weave Network.
Chunk: Each data block is typically 256 KB in size. To win the right to mine, miners must pack the corresponding chunks and hash them, thereby proving in the SPoRA mining process that they store copies of the data.
Partition: “Partition” is a new concept introduced in version 2.6. Each partition covers 3.6TB of data. Partitions are numbered from the beginning of the Weave Network (index 0) up to the total number of partitions covering the entire Weave Network.
Recall Range: Recall Range is another new concept in version 2.6. It represents a series of contiguous data blocks (Chunks) in the Weave Network, starting from a specific offset and having a length of 100MB. With each data block being 256 KB, a Recall Range includes 400 data blocks. In this mechanism, there are two Recall Ranges, as explained in detail below.
Potential Solutions: Every 256KB data block within the Recall Range is considered a potential solution for winning the right to mine. As part of the mining process, each data block is hashed to test if it meets the network’s difficulty requirements. If successful, the miner wins the right to mine and receives mining rewards. If unsuccessful, the miner continues to attempt the next 256KB block within the Recall Range.
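To make these figures concrete, here is a minimal Python sketch built only from the numbers quoted above. All names are hypothetical (the actual node is written in Erlang, and the real solution hash takes more inputs than shown here); the difficulty check is modeled as a simple threshold comparison.

```python
import hashlib

# Figures quoted in this article; the canonical values live in the 2.6 spec.
CHUNK_SIZE = 256 * 1024                # 256 KB per chunk
RECALL_RANGE_SIZE = 100 * 1024 * 1024  # 100 MB per recall range
PARTITION_SIZE = 3_600_000_000_000     # 3.6 TB per partition

def partition_index(global_offset: int) -> int:
    """Partition (indexed from 0) containing a given global byte offset."""
    return global_offset // PARTITION_SIZE

def try_recall_range(mining_hash: bytes, chunks: list, difficulty: int):
    """Test every chunk in a recall range as a potential solution."""
    for chunk in chunks:
        candidate = hashlib.sha256(mining_hash + chunk).digest()
        if int.from_bytes(candidate, "big") < difficulty:
            return chunk   # a winning solution: claim the mining reward
    return None            # no chunk qualified; wait for the next mining hash

print(RECALL_RANGE_SIZE // CHUNK_SIZE)  # 400 chunks per recall range
```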
Hash Chain: The hash chain is the key update in version 2.6. It adds a cryptographic clock to the earlier SPoRA mechanism, capping the maximum hash rate. The hash chain generates a sequence of hashes by repeatedly applying the SHA-256 function to a piece of data. The process cannot be parallelized, yet it is easily performed by a consumer-grade CPU; a 1-second delay is achieved by requiring a set number of consecutive hash operations.
Mining Hash: After a sufficient number of consecutive hash operations (i.e., after a 1-second delay), the hash chain produces a hash value that is considered valid for mining. Notably, the mining hash is consistent across all miners, and all miners can verify it.
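The hash chain itself fits in a few lines. The iteration count below is a placeholder: in reality it is calibrated so that one pass of sequential SHA-256 takes about one second on a consumer-grade CPU.

```python
import hashlib

ITERATIONS_PER_MINING_HASH = 1_000_000  # placeholder; tuned to ~1 s of SHA-256

def next_mining_hash(previous: bytes) -> bytes:
    """Advance the hash chain by one mining hash.

    Inherently sequential: every hash depends on the one before it,
    so the chain cannot be sped up by parallel hardware.
    """
    h = previous
    for _ in range(ITERATIONS_PER_MINING_HASH):
        h = hashlib.sha256(h).digest()
    return h
```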
Now that we’ve introduced the necessary terms, we can better understand how version 2.6 operates by examining the optimal strategies miners use to win mining rewards.
Arweave’s overarching goal has been stated many times before: to maximize the number of data replicas stored on the network. But what should be stored, and how? There are many requirements and subtleties involved. Here we discuss how to adopt a best-practice strategy.
Replicas vs Copies
Since version 2.6, two terms have appeared frequently in the technical documents: Replicas and Copies. Both can be rendered by the same word in Chinese, but they differ significantly, and the distinction initially made the mechanism harder for me to understand. For clarity, I prefer to translate Replicas as “副本” and Copies as “备份” (backups).
Copies are simple duplicates of data: there is no difference between one backup of a piece of data and another.
Replicas, by contrast, emphasize uniqueness: a replica is data that has been stored after first being made unique. The Arweave network encourages the storage of replicas rather than mere backups.
Note: In version 2.7, the consensus mechanism has changed to SPoRes, which stands for Succinct Proofs of Replications, based on the storage of replicas. I will provide further interpretation in the future.
Packing Unique Replicas
Unique replicas are crucial in the Arweave mechanism. Miners must pack all data in a specific format to form their unique replicas; this is a prerequisite for winning the right to mine.
If you want to run a new node, simply copying the data that other miners have already packed will not work. First, you need to download and synchronize the raw data from the Arweave Weave Network (you need not download all of it; synchronizing only a portion is feasible, and you can set your own data policies to filter out risky data). Then, use the RandomX function to pack each chunk of the raw data, turning it into a potential mining solution.
Packing means providing a Packing Key to the RandomX function, which runs many computations to produce the result used to pack the original chunk. Unpacking a packed chunk works the same way: provide the same packing key and use the result of the same computations to unpack it.
In version 2.5, the packing key was a SHA-256 hash derived from chunk_offset (the offset of the chunk, which can be understood as its position parameter) and tx_root (the transaction root). This ensured that every mining solution came from a unique replica of a chunk within a specific block. If a chunk had multiple copies at different locations in the Weave Network, each copy had to be packed separately as its own unique replica.
In version 2.6, the packing key is extended to a SHA-256 hash derived from chunk_offset, tx_root, and miner_address (the miner’s address). This means that each replica is also unique to each mining address.
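Based on the two descriptions above, the evolution of the packing key can be sketched as follows. The byte serialization is hypothetical (the specification defines the exact encoding), and the RandomX step is left as a comment, since the point here is the key derivation rather than the packing computation itself.

```python
import hashlib

def packing_key_v2_5(chunk_offset: int, tx_root: bytes) -> bytes:
    # 2.5: the key binds the chunk's position and its transaction root.
    return hashlib.sha256(chunk_offset.to_bytes(32, "big") + tx_root).digest()

def packing_key_v2_6(chunk_offset: int, tx_root: bytes, miner_address: bytes) -> bytes:
    # 2.6: the miner's address is added, making each miner's replica unique.
    return hashlib.sha256(
        chunk_offset.to_bytes(32, "big") + tx_root + miner_address
    ).digest()

# Either key is then fed to RandomX, whose many-round computation packs
# the 256 KB chunk; running the same computation again unpacks it.
```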
Advantages of Storing Complete Replicas
The mechanism is designed so that miners construct unique, complete replicas rather than partial ones, which ensures that data is distributed evenly throughout the network.
How should we understand this? Let’s compare the following two scenarios.
First, let’s assume that the entire Arweave Weave Network comprises a total of 16 data partitions.
First Scenario:
Figure 2: Bob’s “Clever” Strategy: First Scenario
Second Scenario:
Now, let’s look at the second scenario. Because two recall ranges are drawn per mining hash, the more optimal strategy is to store a unique replica that covers as much of the Weave Network as possible. This is illustrated in Figure 3.
Figure 3: Alice’s strategy has greater advantages
The Role of Recall Ranges
You might wonder why, in version 2.5, the function hashed out a single recall chunk offset for miners to locate and prove storage of, whereas version 2.6 hashes out an entire recall range instead.
The reason is quite simple: a recall range is composed of contiguous chunks, and this structure serves one main purpose: to minimize the movement of the read heads of mechanical hard drives (HDDs). This physical optimization lets HDD read performance keep pace with far more expensive solid-state drives (SSDs). It effectively ties one hand and one foot of the SSD: an SSD retains only a slight speed advantage, being able to transfer roughly four recall ranges per second, while against much cheaper HDDs, cost per terabyte becomes the key metric driving miners’ hardware choices.
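A back-of-the-envelope check with the article’s own figures shows why HDDs keep pace; the HDD throughput below is a ballpark assumption, not a quoted number.

```python
recall_range_mb = 100        # one recall range = 100 MB of contiguous chunks
ranges_per_mining_hash = 2   # two recall ranges per 1-second mining hash
required_mb_per_s = recall_range_mb * ranges_per_mining_hash   # 200 MB/s

hdd_sequential_mb_per_s = 200   # rough figure for a modern HDD's sequential reads
print(required_mb_per_s <= hdd_sequential_mb_per_s)  # True: an HDD can keep pace
```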
Now let’s discuss the verification of a new block.
To accept a new block, a validator checks the block received from the block producer by comparing the block’s mining hash against the mining hash the validator has generated locally.
For a validator that is not yet at the current head of the hash chain, each mining hash includes 25 40-millisecond checkpoints. Each checkpoint is the result of 40 milliseconds of consecutive hashing, and together they span the one-second interval that begins at the previous mining hash.
Before propagating a newly received block to other nodes, a validator quickly verifies the first 25 checkpoints within 40 milliseconds. If this verification succeeds, the validator propagates the block and continues validating the remaining checkpoints.
Full validation then works through the remaining checkpoints: after the first 25, there is a group of 500 checkpoints, followed by another group of 500, with the checkpoint interval doubling for each subsequent group of 500.
While the hash chain must be generated sequentially to produce mining hashes, validators can verify the checkpoints in parallel, which shortens block verification time and improves efficiency.
Figure 4: Verification Process of the Hash Chain
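The checkpoint schedule can be laid out numerically. The grouping below follows this article’s description (25 checkpoints at 40 ms, then groups of 500 with the interval doubling per group) and should be read as an illustration rather than the specification’s exact table.

```python
def checkpoint_schedule(extra_groups: int = 3):
    """Yield (checkpoint_count, interval_ms) groups as described above."""
    yield 25, 40                 # first second: 25 checkpoints, 40 ms apart
    interval_ms = 40
    for _ in range(extra_groups):
        interval_ms *= 2         # interval doubles for each group of 500
        yield 500, interval_ms

for count, interval_ms in checkpoint_schedule():
    print(f"{count} checkpoints at {interval_ms} ms intervals")
```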
Seed of the Hash Chain
If a miner or mining pool has faster SHA-256 hashing capability, its hash chain may run ahead of the rest of the network. Over time, this speed advantage can accumulate into a significant hash chain offset, leaving its mining hashes out of sync with those of the other validators. This could lead to a series of uncontrollable forks and reorganizations.
To reduce the likelihood of such hash chain offsets, Arweave synchronizes the global hash chain by taking seeds from historical blocks at fixed intervals. These regularly supplied new seeds keep the hash chains of the various miners synchronized against a validated block.
A new seed block is chosen every 50 × 120 mining hashes (50 is the number of blocks, and 120 is the number of mining hashes in a two-minute block production cycle). This means seed blocks appear roughly every 50 Arweave blocks, though due to variation in block times a seed block may arrive slightly earlier or later.
Figure 5: Generation Method of Hash Chain Seeds
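The seeding interval is easy to verify from the quoted figures:

```python
blocks_per_seed = 50
mining_hashes_per_block = 120   # one mining hash per second, 2-minute blocks
hashes_per_seed = blocks_per_seed * mining_hashes_per_block
print(hashes_per_seed)                  # 6000 mining hashes per seed interval
print(hashes_per_seed / 60, "minutes")  # ~100 minutes, i.e. ~50 blocks
```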
This content, excerpted by the author from the version 2.6 specification, shows that starting with version 2.6 Arweave has run the entire network on low-power, more decentralized mechanisms. Satoshi Nakamoto’s vision finds practical realization in Arweave.
Arweave 2.6: https://2-6-spec.arweave.dev/
Statement:
This article, originally titled “Arweave 2.6 也许更符合中本聪的愿景” (“Arweave 2.6 May Better Match Satoshi Nakamoto’s Vision”), is reproduced from [PermaDAO]. All copyrights belong to the original author [Arweave Oasis]. If you have any objection to the reprint, please contact the Gate Learn team, who will handle it promptly.
Disclaimer: The views and opinions expressed in this article represent only the author’s personal views and do not constitute any investment advice.
Translations of the article into other languages are done by the Gate Learn team. Unless otherwise mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.