Why Web3 Needs a Real-Time Data Layer Now More Than Ever

Beginner | Feb 02, 2024
This article discusses what a real-time data layer is, the current high costs and slow speeds of blockchain implementations making them unsuitable as a general Web3 computing platform, and how successful systems are leveraging real-time off-chain data to find market fit.
Nowadays, Web3 finds itself in a tricky situation, not just because of the long shadow cast by high-profile bad actors on the blockchain ecosystem. Overcoming three significant challenges without abandoning the principles that initially made blockchain appealing is a tough task:

  1. Compared to similar Web 2.0 products, the cost of on-chain storage and write operations is prohibitively high.

  2. On-chain storage and write operations are slow by design, to ensure the security promised by blockchain-based systems. As nodes are added to the network and the volume of write requests increases, performance degrades further, because a majority of the network (more than 50%) must agree on the validity of new data.

  3. The length (size) of any given blockchain ledger grows significantly with use, breaking the limits of most database infrastructures on the market today.

Operational databases, analytical databases, and distributed ledgers are effective yet distinct types of database management systems. What confuses many about emerging peer-to-peer blockchain networks is that they are not just “databases”; many also serve as “servers” for hosting internet applications (or “dApps” - decentralized applications) written by any capable developer.

Most new technologies go through a phase of being overgeneralized until a suitable product or market fit is found. The root of these three challenges lies in the same “using the right tool for the wrong job” issue. For instance, most IT professionals would not use operational databases as analytical databases, and vice versa. Using distributed ledgers as operational or analytical databases (e.g., under a dApp deployed to a blockchain network) is a particularly poor match, further explained below.

Indeed, the blockchain community is exploring innovative ways to tackle performance issues without compromising security, but this takes time. Ethereum has made some changes in this regard recently. Trust must be placed somewhere. Blockchain moves this trust away from the traditional Web 2.0 model, but it doesn’t eliminate the need for trust— at least not yet.

Real-time off-chain data provides a direct path for Web3 to find product/market fit. However, this approach places trust for a dApp's operational and analytical data in Web 2.0 systems. Even so, the most successful dApps and blockchain-based services have made this trade-off, using the right tools for the right job by leveraging each technology to its strengths.

Before delving deeper into how and why Web3 can progress with real-time data, let’s first consider the future prospects of Web3, regardless of the tri-fold challenges we’ve just identified.

What will continue to drive Web3 forward?

At such times, it’s important to remember blockchain ≠ cryptocurrency. Cryptocurrency is an application of the blockchain concept and underlying technology. The same goes for NFTs and the broader concept of Web3. The core concept of blockchain—transactions, positions, and immutable public records of ownership—continues to offer an interesting contrast to the current financial system, where such ledgers reside in private databases accessible only through institutional and legal gateways. What are these real-world valuable and meaningful use cases?

According to McKinsey, the largest Web3 lending platforms issued $200 billion in loans in 2021. Loans, deposits, remittances, asset swaps, trade finance, and insurance have become viable use cases. Other peer-to-peer, gaming, social, and online media, though early starters, show significant activity.

Digital identity services and supply chain and logistics management remain obvious possibilities. Hypothetical use cases in the supposed metaverse are driving real investment dollars, with companies like Facebook pivoting, rebranding to Meta, and going all-in.

Private blockchain systems on closed and protected networks (e.g., Hyperledger Fabric) may not be what creators envisioned but can now offer more generic use cases for specific industries and institutions (at the cost of being an open Web3 system to the public). NFTs, or the concept of unique, indivisible, and immutable tokens, hold genuine potential commercial value in digitally representing real-world and online-only ephemeral assets.

These possibilities have been opened up by secure public ledgers, yet they remain unresolved. Legitimately (and in some cases, physically) establishing connections between the real world and digital NFTs is still undergoing extensive exploration. Web3 provider Alchemy noted in its quarterly report that smart contract deployments grew 143% compared to the same quarter in 2021.

While there are still significant challenges to overcome, like any new idea, the allure of investment funds, developer, and institutional interest does indeed have the potential to draw energy that propels blockchain forward. As the core technology matures, more Web3 value will be created. With more value generation, new opportunities will arise, sparking interest in addressing regulation, legal issues, data privacy, and improved developer and end-user experiences.

Web3 Developers’ Considerations for On-Chain Data

The challenges faced by blockchain products based on Proof of Work extend into their underlying architecture. Operational databases are highly suited for quick, efficient data storage and retrieval. Analytical databases excel in fast, open-ended queries and exploration. Non-relational databases offer varying levels of operational or analytical capabilities at massive scale without sacrificing performance and availability.

Blockchain-based systems provide secure, immutable ledgers but at the cost of performance. Attempting to use secure, append-only immutable ledgers as operational, analytical, or non-relational databases will lead to the following issues:

Unacceptable Performance

The Web 2.0 technology stack has set expectations of a rapid digital experience for most people worldwide, whether they use tablets, smartphones, or desktop/notebook computers; they will not accept waits of two minutes to six hours. Most popular blockchain implementations rely on slow Proof of Work algorithms to secure write operations to the blockchain data store and on slow peer-to-peer consensus to ensure consistent data reads across the node network.
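To make the "slow by design" point concrete, here is a toy Proof of Work loop. It is a minimal sketch, not the algorithm of any particular chain: the single SHA-256 hash, the 8-byte nonce encoding, and the names `mine` and `verify` are all assumptions made for illustration. What it shows is that each extra bit of difficulty roughly doubles the expected number of hash attempts a writer must make, while checking a finished result takes only one hash.

```python
import hashlib
import time


def _digest_value(block_data: bytes, nonce: int) -> int:
    """Hash the block data together with the nonce and return it as an integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int) -> tuple[int, str]:
    """Search for a nonce whose hash has `difficulty_bits` leading zero bits.

    Toy scheme for illustration only; real chains use different hash
    functions, block formats, and difficulty rules.
    """
    target = 1 << (256 - difficulty_bits)  # valid hashes are numerically below this
    nonce = 0
    while True:
        value = _digest_value(block_data, nonce)
        if value < target:
            return nonce, f"{value:064x}"
        nonce += 1


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a proof costs a single hash, unlike finding one."""
    return _digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    for bits in (8, 12, 16):
        start = time.perf_counter()
        nonce, digest_hex = mine(b"example block", bits)
        elapsed = time.perf_counter() - start
        print(f"{bits} bits: nonce={nonce} hash={digest_hex[:16]}... {elapsed:.4f}s")
```

Running the script shows how the search time tends to grow as the difficulty rises, which is the cost a write pays for tamper resistance.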

Volume of Data Causes Production Interruptions

Blockchain is not just a “big data” problem; it’s a massive data problem that only gets bigger with increased usage. Few operational or analytical databases can operate at this scale, and even fewer scale linearly, which significantly narrows the range of choices.

Contradictory and Inaccurate Data

Blockchain’s widespread peer-to-peer, eventual consistency design and the nature of Proof of Work make it secure but result in inconsistent data, making it unsuitable as an operational or analytical database for Web3 applications. Since there are no error messages or fault codes for these issues, writing error handling code to test, interpret, or resolve these errors to attempt compensation is time-consuming or impossible. Naturally, debugging in production or other critical moments is a nightmare for all involved parties. Downstream tech support will be unable to provide answers to frustrated users, and developers will be unable to provide answers to tech support personnel. This leads to negative reviews in app stores.

Unacceptable storage/usage costs

On-chain operations are costly: storing 1GB of data in Ethereum contract storage can cost hundreds of thousands to millions of dollars, depending on gas and ETH prices.
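A back-of-the-envelope calculation helps put a number on this. The sketch below uses the roughly 20,000 gas that Ethereum charges to write a new non-zero 32-byte storage slot. The gas price (20 gwei) and ETH price ($2,000) are assumptions chosen only for illustration, and the estimate ignores per-transaction base fees, access surcharges, and block gas limits, so treat it as a rough lower bound rather than a quote.

```python
SSTORE_NEW_SLOT_GAS = 20_000  # approximate gas to write one new non-zero 32-byte slot
SLOT_BYTES = 32
GWEI_PER_ETH = 10**9


def storage_cost_usd(num_bytes: int, gas_price_gwei: float, eth_price_usd: float) -> float:
    """Rough cost of writing `num_bytes` into contract storage.

    gas_price_gwei and eth_price_usd are caller-supplied assumptions;
    both fluctuate constantly on a live network.
    """
    slots = -(-num_bytes // SLOT_BYTES)  # ceiling division: partial slots still cost a full slot
    gas = slots * SSTORE_NEW_SLOT_GAS
    eth = gas * gas_price_gwei / GWEI_PER_ETH
    return eth * eth_price_usd


if __name__ == "__main__":
    one_gib = 2**30
    cost = storage_cost_usd(one_gib, gas_price_gwei=20, eth_price_usd=2_000)
    # 33,554,432 slots * 20,000 gas = 671,088,640,000 gas, about 13,422 ETH at 20 gwei,
    # which is about $26.8 million under these assumed prices.
    print(f"${cost:,.0f}")
```

Even with a gas price twenty times lower, the result stays in the millions of dollars, which is why bulk data belongs off-chain.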

Other Considerations

Off-chain indexing or syncing of blockchain data is not straightforward, as these data are not human-readable. Blockchain data requires decoding, enriching, reorganizing, and data modeling through third-party data services before it can be easily used by developers.

Solution: Real-time Off-chain Data Synchronization

The implementation of popular blockchain networks requires time to address performance issues inherent in their design. Off-chain processing is a primary technique used by successful IT professionals to fully leverage existing database technologies and the advantages of blockchain, allocating each technology to its best-designed purpose. Simply put, dApps should read data from off-chain databases and write data back to the chain (but only record the minimal details necessary for the final transaction result).

By synchronizing the state of the blockchain to an operational or analytical database in real time, you ensure the accuracy and currency of the data that is crucial for the fast operation of your dApp. Then, after your dApp and the off-chain database complete as much preprocessing as possible, submit the final results back to the chain.
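The read side of this pattern can be sketched in a few lines. In the example below, a plain Python list stands in for a node's block feed and an in-memory dictionary stands in for the off-chain database; `Block`, `OffChainIndex`, and `sync` are hypothetical names, not part of any real service. The index applies blocks strictly in order and remembers the last block it has seen, so the dApp can answer reads locally.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    number: int
    transfers: list  # list of (sender, recipient, amount) tuples


@dataclass
class OffChainIndex:
    """In-memory stand-in for an off-chain database kept in sync with the chain."""
    last_synced: int = -1
    balances: dict = field(default_factory=dict)  # net position per account, starting from 0

    def apply(self, block: Block) -> None:
        # Refuse gaps or replays so the index never silently drifts from the chain.
        if block.number != self.last_synced + 1:
            raise ValueError(f"expected block {self.last_synced + 1}, got {block.number}")
        for sender, recipient, amount in block.transfers:
            self.balances[sender] = self.balances.get(sender, 0) - amount
            self.balances[recipient] = self.balances.get(recipient, 0) + amount
        self.last_synced = block.number

    def balance_of(self, account: str) -> int:
        # Fast local read; no round trip to the chain.
        return self.balances.get(account, 0)


def sync(index: OffChainIndex, chain: list) -> int:
    """Apply every block the index has not seen yet; return how many were applied."""
    applied = 0
    for block in chain:
        if block.number > index.last_synced:
            index.apply(block)
            applied += 1
    return applied


if __name__ == "__main__":
    chain = [Block(0, [("alice", "bob", 5)]), Block(1, [("bob", "carol", 2)])]
    index = OffChainIndex()
    print(sync(index, chain), index.balance_of("bob"))
```

This sketch does not handle chain reorganizations; a production sync process would also need to roll back blocks that are later replaced.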

Static and binary assets can utilize systems like IPFS, but for similar reasons, it is prudent to consider off-chain object storage (such as S3) wherever possible. Therefore, in practice, an off-chain database with an always-synchronized clone of the chain state should become the read/write target for as many operational or analytical workloads as possible.

However, as previously discussed, the sheer volume of data (especially over time) can overwhelm most data infrastructures. Apache Cassandra is one of the most powerful operational database systems at this level of capacity, scale, and performance.

With the right data model, applications can experience sub-second speeds expected of in-memory caches like Redis and persistent database management systems (DBMS). What if non-relational data services could provide historical data and always up-to-date (real-time) off-chain data?

During the indexing process, the raw data is automatically decoded. For developers, this changes the experience from working with blockchain data in raw hexadecimal form to working with human-readable data.
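As a rough illustration of what decoding involves, the sketch below turns a raw ERC-20 `Transfer` event log, in the shape returned by a node's JSON-RPC log queries, into a readable record. The topic constant is the well-known hash of the `Transfer(address,address,uint256)` signature; the sample addresses and amount are made up, and `decode_transfer_log` is a hypothetical helper rather than part of any indexing service.

```python
# keccak256("Transfer(address,address,uint256)"), the first topic of every ERC-20 Transfer log
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def decode_transfer_log(log: dict) -> dict:
    """Decode a raw ERC-20 Transfer log into human-readable fields.

    Expects `topics` (list of 32-byte hex strings) and `data` (hex string).
    ERC-20 Transfer logs carry exactly three topics: signature, from, to.
    """
    topics = log["topics"]
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        "event": "Transfer",
        # Addresses are 20 bytes, left-padded to 32; keep the last 40 hex characters.
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        # The unindexed amount is a big-endian uint256 in the data field.
        "value": int(log["data"], 16),
    }


if __name__ == "__main__":
    raw_log = {  # made-up sample values
        "topics": [
            TRANSFER_TOPIC,
            "0x" + "00" * 12 + "11" * 20,
            "0x" + "00" * 12 + "22" * 20,
        ],
        "data": "0x" + format(1_000_000, "064x"),
    }
    print(raw_log["data"])
    print(decode_transfer_log(raw_log))
```

The first print shows the kind of opaque hexadecimal a developer would otherwise handle directly; the second shows the same event with named fields.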

Then, Web3 developers typically need to reorganize and enrich blockchain data from third-party data services like Etherscan, whatsabi, NFT metadata, etc., to make it useful for the simplest queries. If the enriched data is subsequently modeled into queryable database tables, developers will have the full capabilities of standard DBMS query languages (instead of having to learn blockchain analytics APIs).
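Enrichment can be as simple as joining a decoded event with token metadata. The sketch below assumes a small, hypothetical metadata table keyed by contract address (in practice this would come from a third-party service) and converts a raw integer amount into a display amount using the token's `decimals` value.

```python
from decimal import Decimal

# Hypothetical metadata lookup; the address, symbol, and decimals are made up.
TOKEN_METADATA = {
    "0x" + "aa" * 20: {"symbol": "DEMO", "decimals": 18},
}


def enrich_transfer(transfer: dict, metadata: dict) -> dict:
    """Attach the token symbol and a human-scale amount to a decoded transfer.

    `transfer` needs a `token` (contract address) and a raw integer `value`.
    Unknown tokens are passed through with symbol and amount set to None.
    """
    meta = metadata.get(transfer["token"].lower())
    if meta is None:
        return {**transfer, "symbol": None, "amount": None}
    # Decimal avoids the rounding errors floats would introduce on 18-decimal values.
    amount = Decimal(transfer["value"]) / (Decimal(10) ** meta["decimals"])
    return {**transfer, "symbol": meta["symbol"], "amount": amount}


if __name__ == "__main__":
    decoded = {"token": "0x" + "aa" * 20, "value": 1_500_000_000_000_000_000}
    print(enrich_transfer(decoded, TOKEN_METADATA))
```

Once rows look like this, they can be loaded into ordinary database tables and queried without any blockchain-specific tooling.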

Let’s look at an example:

Developer intent: Search for five entries from block group 134

[Figure: actual query code and system response]
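As a rough illustration of how such an intent maps onto a standard query language, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for the off-chain database. The table name, columns, and rows are hypothetical and are not the schema of any real service; the sketch also assumes that `block_group` is a column that buckets blocks together.

```python
import sqlite3

# In-memory database standing in for the off-chain store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (block_group INTEGER, block_number INTEGER, tx_hash TEXT)"
)

# Made-up sample rows: seven entries in group 134, two in group 135.
rows = [(134, 134_000 + i, f"0x{i:064x}") for i in range(7)]
rows += [(135, 135_000 + i, f"0x{100 + i:064x}") for i in range(2)]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)


def first_entries(connection: sqlite3.Connection, block_group: int, limit: int = 5) -> list:
    """Return up to `limit` entries from one block group, ordered by block number."""
    cursor = connection.execute(
        "SELECT block_group, block_number, tx_hash "
        "FROM transactions WHERE block_group = ? "
        "ORDER BY block_number LIMIT ?",
        (block_group, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in first_entries(conn, 134):
        print(row)
```

The point is that the intent translates into one short, familiar query instead of a sequence of blockchain-specific API calls.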

So, what does this look like in practice? To bring it to life, take a look at these two example applications, which use exactly this kind of real-time off-chain data service. The application source code should look familiar to Web3 developers; it is written using the popular Web3.js library.

NFT Explorer

Search for every NFT created within seconds

Extract the transfer history of an NFT in a single API call

NFT Explorer is built with React and Next JS, providing users with a complete view of NFTs that have been minted or transferred in real-time on the Ethereum blockchain.

Blockchain Explorer

Pull historical Gas prices by block number

Fetch ERC20 transfer quantities by block number

Like the NFT Explorer, this blockchain data explorer extracts all blockchain data from off-chain data, providing users with a real-time view of the latest mined blocks and the latest Ethereum transactions.

Offering all of this as a hosted cloud service helps overcome traditional hesitations and delivers relational-DBMS-style usability and time to market. Building such services on top of Cassandra makes it possible to colocate this data with your Web3 applications in any region, or across multiple regions, without the need for sharding. Cassandra’s built-in replication has been battle-tested in the most extreme internet-scale production environments for over a decade.

Advantages for Web3 Applications and Developers

By minimizing the size of dApps and the amount of data stored on-chain, and by processing blockchain writes off-chain first, the operational costs for most use cases are brought back in line with Web 2.0 levels. Users’ experience of dApp performance on their device of choice returns to acceptable/expected levels. Then, dApp developers can design appropriate “waiting time” dialogues, screens, and alerts to set expectations when users need to submit write operations to a blockchain-based system.

The biggest and most challenging data consistency issues are resolved, as most of the operational data for a dApp is stored in fast, reliable off-chain databases. This can save hours of frustrating (and potentially fruitless) debugging time and avoid production errors that might be impossible to resolve.

Because off-chain systems like non-relational databases can handle large volumes of data, your dApp will meet uptime and response time expectations as the blockchain grows, without needing expensive system redesigns or complete rewrites months after going into production. Working with Cassandra—arguably the most reliable, scalable, and fastest non-relational database—is also one of the highest-paid jobs, according to the latest Stack Overflow developer survey.

Benefits for Enterprises

Broken, slow, or inaccurate applications can lead to irreparable losses of users, revenue, and investor confidence. But let’s discuss the conversation we all hope to have—what exciting possibilities might syncing blockchain state in real-time to off-chain, non-relational infrastructure bring?

Analyzing dApps: Integrating dApps with off-chain analytical databases opens up the entire range of “Web 2.0” analytics options and use cases.

Fraud Detection/Prevention Capabilities: Build dApps that can expel bad actors or flag/block abuse, thereby protecting your user community and your business.

Authority for Digital Asset Exchanges: NFT exchanges require accurate/up-to-date market data to facilitate the best trades/sales/exchanges. Prevent buyer’s remorse when users see the item they purchased at a lower price minutes later, as well as resource-intensive refund processes and negative user reviews.

Location-based Features: Knowing the current location is foundational for many of today’s mobile applications. Bring it to your dApp!

IoT Applications: The speed and capacity for writing machine-generated data from software or hardware can only be uncompromisingly handled by non-relational databases.

Data Sovereignty: For compliance, regulatory, or legal reasons, keep a synchronized copy of the blockchain state colocated with your dApp (wherever it’s deployed in the world).

Blockchain transaction processing time is determined by the protocol and cannot be expedited except by paying higher gas/transaction fees or using accelerator services. By preprocessing as much as possible off-chain, you can minimize the size and frequency of transactions for the final result. This will lower the chain write costs for any use case and improve dApp speed.
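One way to picture this preprocessing is netting: many small transfers are recorded off-chain, and only each account's net change is submitted as the final result. The sketch below is a minimal illustration with made-up accounts and amounts; `net_settlements` is a hypothetical helper, and whether netting is acceptable depends on how much trust an application is willing to place in its off-chain system.

```python
from collections import defaultdict


def net_settlements(transfers: list) -> dict:
    """Collapse (sender, recipient, amount) transfers into one net change per account.

    Accounts whose transfers cancel out are dropped, since they need no
    on-chain write at all.
    """
    net = defaultdict(int)
    for sender, recipient, amount in transfers:
        net[sender] -= amount
        net[recipient] += amount
    return {account: delta for account, delta in net.items() if delta != 0}


if __name__ == "__main__":
    off_chain_transfers = [  # made-up sample activity
        ("alice", "bob", 10),
        ("bob", "alice", 4),
        ("bob", "carol", 6),
        ("carol", "alice", 1),
    ]
    settlements = net_settlements(off_chain_transfers)
    # Four transfers reduce to two net changes: alice -5, carol +5 (bob nets to zero).
    print(len(off_chain_transfers), "transfers ->", len(settlements), "net changes:", settlements)
```

Fewer and smaller final writes mean fewer fees and fewer waits on protocol-determined processing times.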

Try It Yourself as a Service

This focus on real-time data goes beyond blockchain. It’s an area the industry has been innovating in for over a decade. But technologies like blockchain help demonstrate the importance of real-time data being part of data architecture and business models.

As we wait for quantum cryptography as a service, the ubiquity of atomic clocks, and new innovations in distributed consensus algorithms, real-time data can now be obtained through a Web 2.0 cost structure. Real-time data will still be a core, fundamental element of any blockchain implementation in the future.

Disclaimer:

  1. This article is reprinted from [AIcoin]. All copyrights belong to the original author [Pieter Humphrey,DataStax]. If there are objections to this reprint, please contact the Gate Learn team, and they will handle it promptly.
  2. Liability Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.