Sora Emerges: Will 2024 Be the Year of AI+Web3 Revolution?

Beginner · Feb 29, 2024
Besides Depin, what kind of sparks can the interweaving of Web3 and AI ignite? What opportunities lie within the Sora track? This article also contemplates the possibilities of Web3 in the era of AI.

Foreword

On February 16th, OpenAI announced its latest text-to-video generative diffusion model, “Sora,” marking another milestone in generative AI with its ability to produce high-quality video across a wide range of visual data types. Unlike AI video tools such as Pika, which generate a few seconds of video from multiple images, Sora trains in the compressed latent space of videos and images, breaking them into spatiotemporal patches for scalable video generation. The model also demonstrates the capability to simulate both the physical and digital worlds, with its 60-second demos described as a “universal simulator of the physical world.”

Sora continues the “source data-Transformer-Diffusion-emergence” technical path of earlier GPT models, meaning that its maturity likewise depends on computational power. And since video training requires far more data than text, the demand for computational power is expected to increase further. Our earlier article, “Promising Sector Preview: The Decentralized Computing Power Market,” already explored the importance of computational power in the AI era; with AI’s rising popularity, numerous computing power projects have emerged, and other Depin projects (storage, computing power, etc.) have seen a surge in value as well. Beyond Depin, this article aims to update and extend those past discussions, pondering what sparks the intertwining of Web3 and AI might ignite and what opportunities this trajectory holds in the AI era.

The Development of AI: Three Major Directions

Artificial Intelligence (AI) is a burgeoning field focused on emulating, extending, and enriching human intelligence. Since its inception in the 1950s and 1960s, AI has undergone over half a century of evolution, emerging as a pivotal technology propelling societal transformation and various industries. Throughout this journey, the intertwined progress of three primary research directions—symbolism, connectionism, and behaviorism—has laid the groundwork for the rapid advancement of AI today.

Symbolism

Symbolism, also referred to as logicism or rule-based reasoning, posits that replicating human intelligence through symbol processing is feasible. This approach utilizes symbols to represent and manipulate objects, concepts, and their relationships within a given problem domain, employing logical reasoning to resolve issues. Symbolism has achieved notable success, particularly in expert systems and knowledge representation. Its central tenet is that intelligent behavior can be realized through symbol manipulation and logical inference, with symbols serving as high-level abstractions of the real world.

Connectionism

Connectionism, alternatively known as the neural network approach, seeks to attain intelligence by mirroring the structure and functionality of the human brain. This methodology constructs networks comprising numerous simple processing units akin to neurons and adjusts the connection strengths between these units, akin to synapses, to facilitate learning. Emphasizing learning and generalization from data, connectionism is well-suited for tasks such as pattern recognition, classification, and continuous input-output mapping. Deep learning, an evolution of connectionism, has achieved breakthroughs in domains like image and speech recognition, as well as natural language processing.

Behaviorism

Behaviorism, closely linked to biomimetic robotics and autonomous intelligent systems research, underscores that intelligent agents can learn through environmental interaction. Unlike the preceding approaches, behaviorism doesn’t focus on simulating internal representations or cognitive processes but rather achieves adaptive behavior through the perception-action cycle. It posits that intelligence manifests through dynamic environmental interaction and learning, making it especially effective for mobile robots and adaptive control systems operating in complex and unpredictable environments.

Despite their fundamental disparities, these three research directions can synergize and complement each other in practical AI research and applications, collectively driving the field’s development.

The Principles of AIGC

The burgeoning field of Artificial Intelligence Generated Content (AIGC) represents an evolution and application of connectionism, facilitating the generation of novel content by emulating human creativity. These models are trained on vast datasets with deep learning algorithms to discern the underlying structures, relationships, and patterns within the data. Prompted by user input, they produce diverse outputs including images, videos, code, music, designs, translations, answers to questions, and text. Presently, AIGC fundamentally comprises three elements: Deep Learning (DL), Big Data, and Massive Computational Power.

Deep Learning

Deep Learning, a subset of Machine Learning (ML), employs algorithms modeled after the neural networks of the human brain. Just as the human brain comprises interconnected neurons processing information, deep learning neural networks consist of multiple layers of artificial neurons performing computations within a computer. These artificial neurons, or nodes, leverage mathematical operations to process data and solve complex problems through deep learning algorithms.

Neural networks consist of input, hidden, and output layers, with parameters linking these layers; each component is described below, followed by a short code sketch.

· Input Layer: The first layer of the neural network; it receives external input data. Each neuron in this layer corresponds to one feature of the input data. For instance, when processing image data, individual neurons might represent pixel values.

· Hidden Layers: Following the input layer, the hidden layers process and transmit data through the network. These layers analyze information at various levels, adapting their behavior as they receive new input. Deep learning networks can have hundreds of hidden layers, allowing for multifaceted problem analysis. For instance, when classifying an unfamiliar animal from an image, the network can compare it with known animals by assessing characteristics such as ear shape, leg count, and pupil size. Hidden layers function similarly, each processing different animal features to aid in accurate classification.

· Output Layer: The final layer of the neural network; it produces the network’s output. Neurons in this layer represent potential output categories or values. In classification tasks, each neuron might correspond to a category, while in regression tasks the output layer might contain a single neuron whose value is the prediction.

· Parameters: In neural networks, connections between different layers are represented by weights and biases, which are optimized during the training process to enable the network to accurately recognize patterns in the data and make predictions. Increasing parameters can enhance the neural network’s model capacity, i.e., the ability to learn and represent complex patterns in the data. However, this also increases the demand for computational power.
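To make these terms concrete, here is a minimal sketch of a forward pass through a tiny fully connected network, written in Python with NumPy. The layer sizes, activation choice, and random inputs are illustrative assumptions, not something specified in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, one hidden layer of 8 units, 3 output classes.
n_in, n_hidden, n_out = 4, 8, 3

# Parameters: weights and biases linking the layers. They are randomly initialized
# here; training would adjust them via backpropagation.
W1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)

def forward(x):
    # Hidden layer: linear transform followed by a ReLU nonlinearity.
    h = np.maximum(0, x @ W1 + b1)
    # Output layer: one score (logit) per candidate class, turned into probabilities.
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

x = rng.normal(size=n_in)      # input layer: one value per feature (e.g., pixel values)
probs = forward(x)
print(probs, probs.argmax())   # class probabilities and the predicted class
```

Adding more hidden layers, or widening them, increases the parameter count and with it the model capacity described above, at the cost of more computation.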

Big Data

Effective neural network training typically necessitates extensive, diverse, high-quality, and multi-source data. Such data forms the cornerstone for training and validating machine learning models. Through big data analysis, machine learning models can identify patterns and relationships within the data, facilitating predictions or classifications.

Massive Computational Power

Several factors collectively contribute to the substantial computational demands of neural networks (a rough compute sketch follows this list):

· the intricate multi-layer structure and large number of parameters;

· the need to process big data;

· iterative training methods, involving repeated forward and backward propagation (activation and loss function computations, gradient calculations, and weight updates);

· high-precision computing needs;

· parallel computing capabilities;

· optimization and regularization techniques;

· model evaluation and validation processes.
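To get a feel for the scale, a widely used back-of-envelope heuristic puts training compute at roughly 6 FLOPs per parameter per training token (about 2 for the forward pass and 4 for the backward pass). The model sizes, token counts, and hardware figures below are illustrative assumptions, not official numbers for any particular model:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Illustrative scenarios: a 7B-parameter model on 1T tokens, and a 70B model on 2T.
for params, tokens in [(7e9, 1e12), (70e9, 2e12)]:
    flops = training_flops(params, tokens)
    # Assume a GPU sustaining ~1.2e14 FLOP/s (e.g., roughly 40% utilization of an
    # A100's ~3e14 FP16 peak); both figures are rough assumptions.
    days = flops / 1.2e14 / 86400
    print(f"{params:.0e} params x {tokens:.0e} tokens -> {flops:.1e} FLOPs, ~{days:,.0f} GPU-days")
```

Video models compound this further, since each second of video expands into far more tokens than a sentence of text.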

Sora

Sora, OpenAI’s latest video generation AI model, signifies a substantial advancement in artificial intelligence’s capacity to process and comprehend diverse visual data. By employing video compression networks and spatiotemporal patch techniques, Sora can convert vast amounts of visual data captured worldwide and from various devices into a unified representation. This capability enables efficient processing and comprehension of intricate visual content. Sora utilizes text-conditioned Diffusion models to generate videos or images highly correlated with text prompts, showcasing remarkable creativity and adaptability.
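OpenAI’s technical report describes compressing video into a latent space and slicing it into spacetime patches, but the implementation details are not public. The sketch below only illustrates the patch idea itself, applied to a raw pixel tensor with made-up dimensions; the compression network and the diffusion model that consumes the patches are omitted.

```python
import numpy as np

# Illustrative video tensor: 16 frames of 64x64 RGB. A real system would first
# compress this into a lower-dimensional latent space; that step is omitted here.
video = np.random.rand(16, 64, 64, 3)

def to_spacetime_patches(v, t=4, p=16):
    """Cut a (T, H, W, C) tensor into non-overlapping t x p x p spacetime patches,
    returning one flattened token vector per patch."""
    T, H, W, C = v.shape
    v = v.reshape(T // t, t, H // p, p, W // p, p, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch grid dimensions first
    return v.reshape(-1, t * p * p * C)    # (num_patches, patch_dim)

tokens = to_spacetime_patches(video)
print(tokens.shape)  # (64, 3072): 4x4x4 patches, each flattened to 4*16*16*3 values
```

Treating these patches like the tokens of a language model is what lets the same Transformer machinery scale across videos of different lengths, resolutions, and aspect ratios.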

Despite Sora’s breakthroughs in video generation and simulating real-world interactions, it encounters certain limitations. These include the accuracy of physical world simulations, consistency in generating long videos, comprehension of complex text instructions, and efficiency in training and generation. Essentially, Sora follows the “big data-Transformer-Diffusion-emergence” technical trajectory, facilitated by OpenAI’s monopolistic computational power and first-mover advantage, resulting in a form of brute-force aesthetics. However, other AI companies still possess the potential to surpass Sora through technological innovation.

While Sora’s connection with blockchain remains modest, it is anticipated that in the next one or two years, the influence of Sora will lead to the emergence and rapid development of other high-quality AI generation tools. These developments are expected to impact various Web3 sectors such as GameFi, social platforms, creative platforms, Depin, etc. Consequently, acquiring a general understanding of Sora is essential, and contemplating how AI will effectively integrate with Web3 in the future becomes a crucial consideration.

The Four Pathways of AI x Web3 Integration

As previously discussed, the fundamental components of generative AI boil down to three elements: algorithms, data, and computing power. AI, in turn, is a universal tool that reshapes production methods and revolutionizes how industries operate. Blockchain technology’s significant impacts, meanwhile, are twofold: it restructures production relationships and enables decentralization. The convergence of these two technologies can thus give rise to four potential pathways:

Decentralized Computing Power

This section aims to provide insights into the current landscape of computing power. In AI, computing power holds immense significance, and the demand for it, particularly since the emergence of Sora, has reached unprecedented levels. During the 2024 World Economic Forum in Davos, Switzerland, OpenAI’s CEO Sam Altman emphasized that computing power and energy are currently the foremost constraints, hinting that they may one day be as valuable as currency. Subsequently, on February 10th, Sam Altman announced via Twitter a plan to raise a staggering 7 trillion USD (equivalent to 40% of China’s GDP in 2023) to reshape the global semiconductor industry and build a semiconductor empire. Previously, my thinking about computing power was confined to national restrictions and corporate monopolies; the notion of a single entity aspiring to dominate the global semiconductor sector is truly remarkable.

The significance of decentralized computing power is evident. Blockchain’s features offer solutions to the prevalent issues of monopolization in computing power and the exorbitant costs associated with acquiring specialized GPUs. From the perspective of AI requirements, computing power utilization can be categorized into two aspects: inference and training. Projects primarily focusing on training are scarce due to the complex integration required for decentralized networks and the substantial hardware demands, posing significant barriers to implementation. Conversely, inference tasks are relatively simpler, with less intricate decentralized network designs and lower hardware and bandwidth requisites, thus representing a more accessible avenue.

The landscape of decentralized computing power holds vast potential, often associated with the “trillion-level” descriptor, and remains a heavily hyped topic in the AI era. However, observing the multitude of recent projects, many appear to be hastily conceived attempts to capitalize on the trend. They champion decentralization while sidestepping the inefficiencies of decentralized networks, and there is a notable degree of design uniformity, with numerous projects adopting similar approaches (such as a one-click L2 plus mining design). This may ultimately lead to failure, leaving such projects struggling to differentiate themselves from the traditional AI race.

Algorithm and Model Collaboration System

Machine learning algorithms are designed to learn patterns and rules from data, enabling them to make predictions or decisions based on these learned patterns. Due to the complexity involved in their design and optimization, algorithms are inherently technology-intensive, requiring deep expertise and technological innovation. They serve as the backbone of training AI models, dictating how data is processed to derive useful insights or make decisions. Notable generative AI algorithms, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformers, are tailored for specific domains like painting, language recognition, translation, or video generation, and are instrumental in training specialized AI models.
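To ground one of the algorithm families named above, here is a minimal GAN sketch in Python with PyTorch: a generator learns to mimic a toy one-dimensional data distribution while a discriminator learns to distinguish real samples from generated ones. The network sizes, learning rates, and toy distribution are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "real" data: samples from N(4, 1.25). The generator should learn to mimic it.
def real_batch(n=128):
    return torch.randn(n, 1) * 1.25 + 4.0

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # sample -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Discriminator step: push real samples toward "1", generated ones toward "0".
    real, fake = real_batch(), G(torch.randn(128, 8)).detach()
    loss_d = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool the discriminator into labeling generated samples "1".
    fake = G(torch.randn(128, 8))
    loss_g = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())  # should drift toward ~4.0
```

The same adversarial pattern scales, with very different architectures, to images and beyond; VAEs and Transformers pursue the same generative goal through reconstruction and sequence prediction respectively.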

The plethora of algorithms and models with distinct strengths raises the question: can they be integrated into a versatile model? Bittensor, a recently prominent project, spearheads efforts in this direction by incentivizing collaboration among various AI models and algorithms, thereby fostering the development of more efficient and capable AI models. Other initiatives, such as Commune AI, focus on fostering code collaboration, although the sharing of algorithms and models remains a challenge due to their proprietary nature within AI companies.

The concept of an AI collaborative ecosystem is intriguing, leveraging blockchain technology to mitigate the drawbacks associated with isolated AI algorithms. However, its ability to generate corresponding value is yet to be determined. Established AI companies, equipped with proprietary algorithms and models, possess strong capabilities in updating, iterating, and integrating their technologies. For example, OpenAI has swiftly progressed from early text generation models to multi-domain generative models within a span of two years. Projects like Bittensor may need to explore innovative pathways in their targeted domains to compete effectively.

Decentralized Big Data

From a simple standpoint, integrating private data to fuel AI training, along with data annotation, are avenues that harmonize well with blockchain technology. The primary concerns are how to prevent junk data and malicious behavior. Moreover, data storage can benefit Depin projects such as FIL and AR.

From a more sophisticated angle, applying machine learning (ML) to blockchain data in order to tackle its accessibility presents another compelling direction, as explored by Giza.

In theory, blockchain data is accessible at any given time and mirrors the state of the entire blockchain. However, for those outside the blockchain ecosystem, accessing these extensive datasets is not straightforward. Storing an entire blockchain necessitates substantial expertise and specialized hardware resources.

To surmount the challenges of accessing blockchain data, the industry has witnessed the emergence of several solutions. For instance, RPC providers offer node access through APIs, while indexing services facilitate data retrieval via SQL and GraphQL, playing a pivotal role in mitigating the issue. Nevertheless, these methods have their limitations. RPC services are inadequate for high-density use cases requiring extensive data queries and often fail to meet the demand. Meanwhile, although indexing services offer a more structured approach to data retrieval, the intricacy of Web3 protocols renders constructing efficient queries extremely challenging, sometimes necessitating hundreds or even thousands of lines of complex code. This complexity poses a significant barrier for general data practitioners and those with limited understanding of Web3 intricacies. The collective impact of these limitations underscores the necessity for a more accessible and usable method of obtaining and leveraging blockchain data, which could spur broader application and innovation in the field.
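As a small illustration of the indexing route, the snippet below posts a GraphQL query to a subgraph-style endpoint from Python. The endpoint URL and entity names are hypothetical; real indexers such as The Graph expose similar GraphQL APIs, but schemas vary per protocol, and non-trivial questions can require far larger queries plus extensive post-processing.

```python
import requests

# Hypothetical subgraph endpoint and schema, for illustration only.
ENDPOINT = "https://api.example.com/subgraphs/name/some-dex"

query = """
{
  swaps(first: 5, orderBy: timestamp, orderDirection: desc) {
    timestamp
    amountUSD
  }
}
"""

resp = requests.post(ENDPOINT, json={"query": query}, timeout=10)
print(resp.json())
```

Answering the same question through raw RPC calls would mean scanning blocks and decoding event logs by hand, which is exactly the barrier described above.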

Hence, the fusion of ZKML (Zero-Knowledge Machine Learning, which uses zero-knowledge proofs so that ML computation can run off-chain while remaining verifiable, alleviating the burden of machine learning on the chain) with high-quality blockchain data could potentially yield datasets that address the accessibility challenges of blockchain data. AI has the potential to significantly lower the barriers to accessing blockchain data. Over time, developers, researchers, and ML enthusiasts could gain access to more high-quality, relevant datasets for crafting effective and innovative solutions.

AI Empowerment for Dapps

Since the explosion of ChatGPT in 2023, AI empowerment for Dapps has become a very common direction. Broadly applicable generative AI can be integrated through APIs to simplify and smarten up data platforms, trading bots, blockchain encyclopedias, and other applications. It can also serve as a chatbot (such as Myshell) or AI companion (like Sleepless AI), and even create NPCs in blockchain games. However, because the technical barrier is low, most implementations are mere tweaks on top of an API integration, and the integration with the project itself is often imperfect, so they are rarely highlighted.
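For concreteness, here is a minimal sketch of what “integration through an API” typically looks like, using OpenAI’s chat API to power an in-game NPC. The model name, system prompt, and NPC persona are illustrative assumptions, not taken from any project mentioned above.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def npc_reply(player_message: str) -> str:
    """Generate a line of NPC dialogue for a blockchain game via a hosted LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You are Orin, a gruff blacksmith NPC in a fantasy on-chain game. "
                        "Stay in character and keep replies under two sentences."},
            {"role": "user", "content": player_message},
        ],
    )
    return response.choices[0].message.content

print(npc_reply("Can you repair my sword?"))
```

The thinness of this layer is precisely the point made above: a wrapper like this is easy to ship but hard to differentiate unless it is woven deeply into the project’s own mechanics.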

With the advent of Sora, I personally believe that AI empowerment for GameFi (including the metaverse) and creative platforms will be the primary focus moving forward. Given the bottom-up nature of the Web3 field, it is unlikely to produce products that directly compete with those of traditional game or creative companies. However, the emergence of Sora has the potential to break this deadlock, possibly within just two to three years. Judging from Sora’s demos, it already appears capable of competing with micro-drama companies. Additionally, Web3’s active community culture can foster a plethora of interesting ideas; when the only limit is imagination, the barriers between the bottom-up industry and the top-down traditional industry will crumble.

Conclusion

As generative AI tools continue to advance, we are poised to experience more transformative “iPhone moments” in the future. Despite initial skepticism surrounding the integration of AI with Web3, I am confident that current trajectories are generally on track, albeit with three primary pain points requiring attention: necessity, efficiency, and compatibility. While the convergence of these domains remains exploratory, it should not deter us from envisioning its mainstream adoption in the forthcoming bull market.

Maintaining a mindset of curiosity and receptivity to new ideas is crucial. Historical precedents, such as the swift transition from horse-drawn carriages to automobiles and the evolution from NFTs to inscriptions, underscore how excessive bias often results in missed opportunities.

Disclaimer:

  1. This article is reprinted from [Deep Tide]. All copyrights belong to the original author [YBB Capital Zeke]. If there are objections to this reprint, please contact the Gate Learn team, and they will handle it promptly.
  2. Liability Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.