AIxDePIN: What new opportunities will arise from the collision of these two hot tracks?

Beginner · Jan 26, 2024
This article explains the changes that DePIN could bring to AI, with the potential for making AI training more efficient and achieving AI popularization.

By harnessing the power of algorithms, computing power, and data, the advancement of AI technology is redefining the boundaries of data processing and intelligent decision-making. At the same time, DePIN represents a paradigm shift from centralized infrastructure to decentralized, blockchain-based networks.

As the world accelerates its pace towards digital transformation, AI and DePIN (decentralized physical infrastructure) have become foundational technologies driving transformation across industries. The fusion of AI and DePIN not only promotes rapid technological iteration and widespread application but also opens up a more secure, transparent, and efficient service model, bringing profound changes to the global economy.

DePIN: Decentralization Moves From Virtuality to Reality, The Mainstay of The Digital Economy

DePIN is short for Decentralized Physical Infrastructure Networks. In a narrow sense, DePIN refers to distributed networks of traditional physical infrastructure supported by distributed ledger technology, such as power networks, communication networks, and positioning networks. Broadly speaking, any distributed network supported by physical devices can be called a DePIN, such as storage networks and computing networks.

Image source: Messari

If Crypto brought decentralized change at the financial level, then DePIN is a decentralized solution for the real economy. In a sense, a PoW mining machine is itself a kind of DePIN, which means DePIN has been a core pillar of Web3 from day one.

The Three Elements of AI: Algorithms, Computing Power, and Data. DePIN Provides Two of Them

The development of artificial intelligence is generally considered to rely on three key elements: algorithms, computing power and data. Algorithms refer to the mathematical models and program logic that drive AI systems, computing power refers to the computing resources required to execute these algorithms, and data is the basis for training and optimizing AI models.

Which of the three elements is the most important? Before the emergence of ChatGPT, the usual answer was algorithms; otherwise academic conferences and journals would not have been filled with one round of algorithm fine-tuning after another. But when ChatGPT and the large language models (LLMs) behind it were unveiled, people began to realize the importance of the other two. Massive computing power is the prerequisite for the birth of such models, and data quality and diversity are crucial to building robust and efficient AI systems. In comparison, the demands placed on algorithms are no longer as exacting as before.

In the era of large models, AI has transitioned from fine-tuning to brute force, with an increasing demand for computational power and data. DePIN happens to be able to provide that. Token incentives will leverage the long-tail market, where massive consumer-grade computing power and storage will become the best nourishment for large models.

Decentralization of AI Is Not An Option, But A Necessity

Of course, someone might ask: if both computing power and data are available in AWS data centers, and AWS outperforms DePIN in stability and user experience, why choose DePIN over centralized services?

This argument naturally has its merits. Looking at the present situation, almost all large models are developed, directly or indirectly, by large internet companies. Behind ChatGPT is Microsoft, and behind Gemini is Google; in China, almost every major internet company has its own large model. Why? Because only large internet companies command the high-quality data and the financial resources needed for massive computing power. But that should not be the end of the story: people no longer want to be at the mercy of internet giants.

On the one hand, centralized AI carries data privacy and security risks and may be subject to censorship and control. On the other hand, AI produced by Internet giants will further strengthen people’s dependence, lead to market concentration, and increase barriers to innovation.

from: https://www.gensyn.ai/

Humanity should not need a Martin Luther in the AI era: people should have the right to speak directly to God.

DePIN From A Business Perspective: Cost Reduction and Efficiency Increase Are Key

Even setting aside the debate between decentralization and centralization values, from a business perspective, there are still advantages to using DePIN for AI.

Firstly, it is important to recognize that although internet giants control a large number of high-end graphics card resources, the combination of consumer-grade graphics cards in the hands of individuals can still form a significant computing power network, known as the long tail effect of computing power. These consumer-grade graphics cards often have high idle rates. As long as the incentives provided by DePIN exceed the cost of electricity, users have the motivation to contribute their computing power to the network. Additionally, with users managing the physical infrastructure themselves, the DePIN network does not bear the operational costs that centralized suppliers cannot avoid, and can focus solely on protocol design.

On the data side, DePIN networks can unlock the potential usability of data and reduce transmission costs through edge computing and other methods. Furthermore, most distributed storage networks have automatic deduplication capabilities, reducing the need for extensive data cleaning in AI training.
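As a toy illustration of why content addressing gives deduplication for free, here is a minimal Python sketch (the `DedupStore` class and its methods are hypothetical, for illustration only): identical blocks hash to the same key, so duplicates are stored exactly once.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical blocks are kept only once."""
    def __init__(self):
        self.blocks = {}  # content hash -> block bytes

    def put(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(key, block)  # storing a duplicate is a no-op
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]

store = DedupStore()
k1 = store.put(b"training sample 42")
k2 = store.put(b"training sample 42")  # exact duplicate of the first block
assert k1 == k2 and len(store.blocks) == 1
```

Real networks such as IPFS apply the same principle at the level of content-addressed chunks.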

Lastly, the Crypto economics brought by DePIN enhances the system’s fault tolerance and has the potential to achieve a win-win situation for providers, consumers, and platforms.

Image from: UCLA

Skeptical? UCLA's latest research shows that decentralized computing achieves 2.75 times better performance than traditional GPU clusters at the same cost: specifically, 1.22 times faster and 4.83 times cheaper.

Difficult Road Ahead: What Challenges Will AIxDePIN Encounter?

We choose to go to the Moon in this decade and do the other things, not because they are easy, but because they are hard. — John Fitzgerald Kennedy

Using DePIN's distributed storage and distributed computing to build AI models in a trustless setting still poses many challenges.

Work Verification

Essentially, both deep learning computation and PoW mining are forms of general computation: at the lowest level, both are signal changes in gate circuits. At the macro level, PoW mining is "useless computation": it tries to find a hash value with a prefix of n zeros through countless nonce generations and hash evaluations. Deep learning, by contrast, is "useful computation": it computes the parameter values of each layer through forward and backward propagation, building an effective AI model.

The difference is that "useless computations" such as PoW mining rely on hash functions: it is easy to compute the hash (the image) from an input (the preimage), but hard to recover the preimage from the hash, so anyone can easily and quickly verify the validity of a claimed result. For deep learning models, however, the layered structure means each layer's output is the next layer's input; verifying a result therefore requires re-performing all the preceding work, and cannot be done simply and efficiently.
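The asymmetry is easy to demonstrate in a few lines of Python (the `mine` and `verify` helpers below are illustrative toys, not a real PoW implementation): finding a nonce whose hash starts with a given zero prefix takes many attempts, while checking a claimed nonce takes a single hash evaluation.

```python
import hashlib
import itertools

def verify(nonce: int, difficulty_prefix: str) -> bool:
    """Cheap: one hash evaluation checks the claimed solution."""
    digest = hashlib.sha256(str(nonce).encode()).hexdigest()
    return digest.startswith(difficulty_prefix)

def mine(difficulty_prefix: str) -> int:
    """Expensive: brute-force search over nonces until one verifies."""
    for nonce in itertools.count():
        if verify(nonce, difficulty_prefix):
            return nonce

nonce = mine("00")          # many hash evaluations (~256 expected here)
assert verify(nonce, "00")  # a single hash evaluation
```

No analogous one-shot check exists for a trained model's parameters, which is exactly the verification gap described above.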

Image from: AWS

Work verification is critical: without it, a compute provider could skip the computation entirely and submit a randomly generated result.

One idea is to have different servers perform the same computing task and verify the work by re-executing it and checking that the results match. However, the vast majority of model computations are non-deterministic: the same result cannot be reproduced even in an identical computing environment, only approximated in a statistical sense. Moreover, redundant computation rapidly increases costs, which conflicts with DePIN's key goal of reducing cost and increasing efficiency.
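One mundane root of this non-determinism is that floating-point addition is not associative, and parallel reductions schedule additions in different orders on different runs. A minimal Python demonstration:

```python
# The same three numbers summed in two different orders give
# bitwise-different results: floating-point addition is not associative.
# Parallel reductions change the addition order from run to run, so
# byte-identical re-execution cannot be expected, only statistical closeness.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one reduction order
right = a + (b + c)   # another reduction order
assert left != right              # bitwise different results
assert abs(left - right) < 1e-12  # yet numerically "the same"
```

This is why verifiers can only compare results against a distance threshold rather than demand exact equality.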

Another category of ideas is the Optimistic mechanism: the result is optimistically assumed to be computed correctly, and anyone is allowed to verify it. If an error is found, a Fraud Proof can be submitted; the protocol then penalizes the fraudster and rewards the whistleblower.
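A minimal sketch of such an optimistic flow (all names are hypothetical, for illustration only): results are accepted by default, and a challenger who recomputes the task and finds a mismatch triggers slashing.

```python
# Toy optimistic-verification flow. Results are accepted optimistically;
# a challenger may recompute the task and submit a fraud proof.
def optimistic_settle(task, claimed_result, recompute, challenger=None):
    if challenger is None:
        return ("accepted", claimed_result)        # optimistic fast path
    honest = recompute(task)                       # challenger redoes the work
    if honest != claimed_result:                   # fraud proof holds
        return ("slashed solver, rewarded " + challenger, honest)
    return ("challenge failed", claimed_result)

square = lambda x: x * x
# Honest result, no challenge: accepted without recomputation.
assert optimistic_settle(7, 49, square)[0] == "accepted"
# Fraudulent result caught by a challenger: corrected and punished.
status, fixed = optimistic_settle(7, 50, square, challenger="alice")
assert fixed == 49
```

The design trades verification cost for latency: honest results settle cheaply, and redundant computation happens only when someone disputes.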

Parallelization

As mentioned before, DePIN mainly leverages the long-tail consumer computing power market, which means that the computing power provided by a single device is relatively limited. For large AI models, training on a single device will take a very long time, and parallelization must be used to shorten the training time.

The main difficulty in parallelizing deep learning training lies in the dependency between successive steps: each step consumes the output of the previous one, which makes naive parallelization impossible.

Currently, parallelization of deep learning training is mainly divided into data parallelism and model parallelism.

Data parallelism distributes the data across multiple machines. Each machine keeps a full copy of the model's parameters, trains on its local data, and the parameters are then aggregated across machines. Data parallelism works well when the amount of data is large, but requires synchronous communication to aggregate the parameters.
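A toy Python sketch of synchronous data parallelism, assuming a one-parameter linear model y = w·x with squared loss (all names are illustrative): each "worker" computes a gradient on its own data shard, the gradients are averaged (the all-reduce step), and every replica applies the identical update.

```python
# Toy synchronous data parallelism for the model y = w * x, squared loss.
def local_gradient(w, shard):
    # d/dw of mean((w*x - y)^2) over this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

w = 0.0
# Two workers, each holding a private shard of data drawn from y = 2x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
for _ in range(200):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel
    avg = sum(grads) / len(grads)                   # all-reduce: average gradients
    w -= 0.01 * avg                                 # identical update on every replica
print(round(w, 3))  # converges toward the true slope 2.0
```

The averaging step is the synchronous communication bottleneck the text refers to: no replica can proceed until all gradients have arrived.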

Model parallelism is used when a model is too large to fit on a single machine: the model is split across multiple machines, each holding a part of the parameters. Forward and backward propagation then require communication between machines. Model parallelism has advantages when the model is large, but the communication overhead during forward and backward propagation is high.
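The communication pattern can be sketched with a toy two-"device" pipeline in Python (the classes and names are illustrative, not a real framework): activations flow forward from device 0 to device 1, and gradients flow back the other way.

```python
# Toy model parallelism: a two-layer model split across two "devices".
class LinearDevice:
    """One scalar linear layer, standing in for one machine's model shard."""
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        self.x = x                        # cache input for backprop
        return self.w * x
    def backward(self, grad_out, lr=0.01):
        grad_w = grad_out * self.x        # gradient for this shard's weight
        grad_in = grad_out * self.w       # gradient sent to the previous device
        self.w -= lr * grad_w
        return grad_in

dev0, dev1 = LinearDevice(0.5), LinearDevice(0.5)
x, target = 1.0, 1.0                      # want dev1(dev0(x)) == 1.0
for _ in range(500):
    h = dev0.forward(x)                   # communication: activation to device 1
    y = dev1.forward(h)
    grad = 2 * (y - target)               # d(loss)/dy for squared loss
    dev0.backward(dev1.backward(grad))    # communication: gradient back to device 0
print(round(dev0.w * dev1.w, 3))  # effective weight approaches 1.0
```

Every training step costs two cross-device transfers here; with many layers and large activations, that overhead is exactly the drawback the text describes.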

Gradient updates across machines can be synchronous or asynchronous. Synchronous updates are simple and direct but increase waiting time; asynchronous updates shorten the waiting time but introduce stability problems.

Image from: Stanford University, Parallel and Distributed Deep Learning

Privacy

The global trend of protecting personal privacy is rising, and governments around the world are strengthening the protection of personal data privacy security. Although AI makes extensive use of public data sets, what truly differentiates different AI models is the proprietary user data of each enterprise.

How to get the benefits of proprietary data during training without exposing privacy? How to ensure that the parameters of the built AI model are not leaked?

These are two aspects of privacy, data privacy and model privacy. Data privacy protects users, while model privacy protects the organization that builds the model. In the current scenario, data privacy is much more important than model privacy.

A variety of solutions are being attempted to address the issue of privacy. Federated learning ensures data privacy by training at the source of the data, keeping the data locally, and transmitting model parameters; and zero-knowledge proof may become a rising star.
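A minimal federated averaging (FedAvg) sketch in Python, assuming a one-parameter model y = w·x (all names are illustrative): each client trains on data that never leaves it, and the server averages only the parameters.

```python
# Toy FedAvg: raw data stays on each client; only parameters travel.
def client_update(w_global, private_data, lr=0.05, steps=20):
    """Local training on one client's private shard for the model y = w * x."""
    w = w_global
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in private_data) / len(private_data)
        w -= lr * grad
    return w  # only the parameter is shared, never private_data

# Three clients whose private data all follow y = 3x.
clients = [[(1.0, 3.0)], [(2.0, 6.0)], [(3.0, 9.0)]]
w_global = 0.0
for _ in range(10):                                  # communication rounds
    local_ws = [client_update(w_global, d) for d in clients]
    w_global = sum(local_ws) / len(local_ws)         # server averages parameters
print(round(w_global, 2))  # converges toward the true slope 3.0
```

The privacy property is structural: the server's aggregation step only ever sees `local_ws`, never any client's `(x, y)` pairs.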

Case Analysis: What are the high-quality projects in the market?

Gensyn

Gensyn is a distributed computing network designed for training AI models. The network uses a layer-one blockchain based on Polkadot to verify that deep learning tasks were executed properly and to trigger payments automatically. Founded in 2020, it disclosed a $43 million Series A round in June 2023, led by a16z.

Gensyn uses metadata from the gradient-based optimization process to build certificates of the work performed. A multi-granular, graph-based pinpoint protocol and cross-evaluators allow verification work to be re-run and compared for consistency, with the chain itself ultimately confirming the validity of the computation. To further strengthen the reliability of work verification, Gensyn introduces staking to create incentives.

There are four types of participants in the system: submitters, solvers, verifiers and whistleblowers.

• Submitters are the end users of the system: they provide tasks to be computed and pay for completed units of work.
• Solvers are the system's main workers: they perform model training and generate proofs for inspection by verifiers.
• Verifiers are the key link between the non-deterministic training process and deterministic linear computation: they replicate parts of the solver's proof and compare distances against expected thresholds.
• Whistleblowers are the last line of defense: they check the verifiers' work and raise challenges, receiving a reward when a challenge succeeds.

Solvers must stake tokens, and whistleblowers test solvers' work. If a whistleblower discovers misbehavior, it raises a challenge; if the challenge succeeds, the solver's staked tokens are slashed and the whistleblower is rewarded.
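The stake-and-challenge economics can be sketched as follows. This is an illustrative toy, not Gensyn's actual protocol; all names and amounts are hypothetical.

```python
# Toy stake/challenge flow: a solver bonds stake with its result; a
# successful whistleblower challenge slashes the solver and pays the
# whistleblower out of the slashed stake.
balances = {"solver": 100, "whistleblower": 100}
STAKE, REWARD = 20, 10

def submit(result, honest_result):
    balances["solver"] -= STAKE                  # solver bonds its stake
    if result != honest_result:                  # whistleblower challenge succeeds
        balances["whistleblower"] += REWARD      # reward paid from slashed stake
        return "slashed"
    balances["solver"] += STAKE                  # honest work: stake returned
    return "accepted"

assert submit(42, 42) == "accepted" and balances["solver"] == 100
assert submit(41, 42) == "slashed"
assert balances["solver"] == 80 and balances["whistleblower"] == 110
```

The incentive logic is that cheating has negative expected value whenever the stake at risk exceeds the cost of doing the work honestly.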

According to Gensyn’s predictions, this solution is expected to reduce training costs to 1/5 of those of centralized providers.

Source: Gensyn

FedML

FedML is a collaborative machine learning platform for decentralized AI, anywhere and at any scale. More specifically, FedML provides an MLOps ecosystem that trains, deploys, monitors, and continuously improves machine learning models while combining data, models, and computing resources in a privacy-preserving manner. Founded in 2022, FedML disclosed a $6 million seed round in March 2023.

FedML consists of two key components, FedML-API and FedML-core, providing the high-level and low-level APIs respectively.

FedML-core includes two independent modules: distributed communication and model training. The communication module is responsible for the underlying communication between different workers/clients and is based on MPI; the model training module is based on PyTorch.

FedML-API is built on FedML-core. With FedML-core, new distributed algorithms can be easily implemented by adopting client-oriented programming interfaces.

The FedML team's latest work demonstrates that using FedML Nexus AI for AI model inference on consumer-grade RTX 4090 GPUs is 20 times cheaper and 1.88 times faster than using an A100.

from: FedML

Future Outlook: DePIN brings the democratization of AI

One day, AI may develop further into AGI, and computing power will become a de facto universal currency. DePIN can accelerate that process.

The intersection and collaboration of AI and DePIN has opened up a brand new point of technological growth, providing enormous opportunities for the development of artificial intelligence. DePIN provides AI with massive distributed computing power and data, which helps train larger-scale models and achieve stronger intelligence. At the same time, DePIN also allows AI to develop towards a more open, secure, and reliable direction, reducing reliance on a single centralized infrastructure.

Looking ahead, AI and DePIN will continue to develop in synergy. Distributed networks will provide a strong foundation for training super-large models, which will play an important role in DePIN applications. While protecting privacy and security, AI will also contribute to the optimization of DePIN network protocols and algorithms. We look forward to AI and DePIN bringing a more efficient, fair, and trustworthy digital world.

Disclaimer:

  1. This article is reprinted from []. All copyrights belong to the original author [**]. If there are objections to this reprint, please contact the Gate Learn team, and they will handle it promptly.
  2. Liability Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.