FHE promises to be the “Holy Grail of Cryptography,” however there are many performance, developer experience and security concerns that limit its current adoption.
As shown in the graphic above, FHE will need to be used alongside ZKPs and MPC to build a truly confidential and secure, shared state system.

3.FHE is rapidly evolving; The development of new compilers, libraries, hardware etc. Not to mention, FHE immensely benefits from the R&D of Web2 companies (Intel, Google, DARPA etc.) .

4.As FHE and its surrounding ecosystem matures, we expect to see “verifiable FHE’’ become the standard. FHE applications may opt to outsource computation and verification to FHE co-processors.

5.A foundational limitation of onchain FHE is “who holds the decryption key?” Threshold decryption and MPC offer solutions, however they are generally bound by a performance & security tradeoff.

Introduction:

The transparent nature of blockchains is a double edge sword; while there is beauty in its openness and observability, it is a fundamental obstacle for enterprise adoption.

Onchain privacy has been one of the most challenging problems in crypto for nearly a decade. Although there are many teams building ZK based systems to achieve onchain privacy; they cannot support shared, encrypted state. Why? Short answer is because these programs are a function of a series of ZKPs and as such, it’s impossible for anyone to apply arbitrary logic to the current state. What does this mean? Basically we cannot build confidential shared state applications (think private Uniswap) using just ZKPs.

However, recent breakthroughs have shown that combining ZKPs with Fully Homomorphic Encryption (FHE) could achieve fully generalizable, confidential DeFi. How? FHE is a burgeoning field of cryptography which enables arbitrary computation over encrypted data (sounds crazy right?!) As shown in the graphic above, ZKPs can prove the integrity of user inputs and computation, and FHE can process the computation itself.

While FHE promises to be the “Holy Grail of Cryptography,” we attempt to provide an objective analysis of the field and its various challenges and possible solutions. We cover the following onchain FHE topics:

FHE Schemes, Libraries and Compilers
Secure Threshold Decryption
ZKPs for User Inputs + Computing Party
Programmable, Scalable DA Layer
FHE Hardware

FHE Schemes, Libraries and Compilers

Challenge: Nascent FHE Schemes, Libraries and Compilers

The foundational bottleneck of onchain FHE lies within its lagging developer tooling/infra. Unlike ZKPs or MPC, FHE has only been around since 2009, and therefore had a much shorter timeline to mature.

The primary limitations in FHE DevEx are :

An easy-to-use front-end language for developers to code without much knowledge of the backend cryptography
A fully functional FHE compiler that can handle all the dirty work. (Parameter selection, SIMD optimization for BGV/BFV and parallel optimization)
Existing FHE schemes (particularly TFHE) are around 1000x slower compared to plain computation

To truly understand the intricacies of integrating FHE, let’s walk through the developer journey:

The first step to integrate FHE into your application is to choose an FHE scheme to use. The following chart explains the primary schemes:

As shown in the table above, boolean circuits like FHEW & TFHE have the fastest bootstrapping and can avoid complicated a rather complex parameter selection procedure, and they could be used in arbitrary/generic computations but are relatively slow; While BGV/BFV could be fit for general DeFi as they are more efficient at high-precision arithmetic computations, but developers have to know the depth of the circuit in advance to set up all the parameters of the scheme. On the other end of the spectrum, CKKS supports homomorphic multiplication where precision errors are allowed, making it optimal for non-precise use cases such as ML.

As a developer, you need to choose an FHE solution very carefully as it will affect all other design decisions and future performance. In addition, there are several key parameters that are crucial to correctly setting up the FHE scheme, such as the choice of the module size and the role of the polynomial degree.

Moving on to libraries, the features and capabilities of the popular FHE libraries can be seen from the table below:

But the libraries also have different relationships with schemes and compilers as shown below:

After selecting the FHE solution, developers need to set parameters. Proper selection of parameters will have a huge impact on the performance of the FHE scheme. More difficulty, this requires some understanding of abstract algebra, FHE-specific operations such as relinearization and analog-to-digital switching, and arithmetic or binary circuits. Finally, for effective parameter selection, a conceptual understanding of how they affect the FHE scheme is required.

At this point, developers may ask:

How large does my plaintext space need to be? What magnitude of ciphertexts are acceptable? Where can I parallelize computation? And many more…

In addition, although FHE can support arbitrary computations, developers need to change their mindset when writing FHE programs. For example, one cannot write a branch (if-else) depending on a variable in the program, because programs cannot directly compare the variables like plain data. Instead, developers need to change their code from branches to some kind of computation that could incorporate all branches’ conditions. Similarly, this prevents developers from writing loops in their code.

In short, for a developer uninitiated in FHE, it’s nearly impossible to integrate FHE into their application. It will require significant dev tooling/infra to abstract away the complexities presented by FHE.

Solution:

Web3 Specific FHE Compilers

While there already exist FHE compilers built by the likes of Google and Microsoft, they are:

Not designed with performance in mind, adding 1000x overhead vs writing circuits directly
Optimized for CKKS (aka ML) and not as beneficial for BFV/BGV needed for DeFi
Not built for Web3. Does not support compatibility with ZKP schemes, program chaining etc.

That is until the Sunscreen FHE compiler. It is a Web3 native compiler that offers some of the best performance for arithmetic operations (think DeFi) without hardware accelerators. As discussed above, parameter selection is arguably the most difficult part of implementing an FHE scheme; Sunscreen has not only automated parameter selection, but data encoding, key selection, implements relinearization and modulus switching, sets up the circuits and implements SIMD style operations.

As we move into the future, we expect to see further optimizations not only in the Sunscreen compiler, but other teams building their own implementations that support other high level languages.

New FHE library

While researchers keep exploring new powerful schemes, FHE libraries can enable more developers to integrate FHE.

Let’s take FHE smart contracts as an example. Although it can be difficult to choose from different FHE libraries, it becomes easier when we talk about onchain FHE because there are only a few dominate programming languages across Web3.

For example, Zama’s fhEVM integrates Zama’s open source library TFHE-rs into a typical EVM, exposing homomorphic operations as precompiled contracts. Effectively enabling developers to use encrypted data in their contracts, without any changes to compilation tools.

For other specific scenarios, some other infrastructure may be required. For example, the TFHE library written in C++ is not as well maintained as the Rust version. Although TFHE-rs can support exports for C/C++, if C++ developers want better compatibility and performance, they must write their own version of the TFHE library.

Finally, a core reason for the lack of FHE infrastructure in the market is that we lack efficient ways to build FHE-RAM. One possible solution is to work on a FHE-EVM without certain opcodes.

Secure Threshold Decryption

Challenge: Un-secure/Centralized Threshold Decryption

Every confidential, shared state system is predicated upon an encryption/decryption key. No single party can be trusted, therefore the decryption key is sharded among network participants.

One of the most challenging aspects of onchain FHE (Threshold FHE) is building a secure and preformant threshold decryption protocol. There are two primary bottlenecks: (1) FHE based computation introduces significant overhead, consequently requiring high performant nodes which inherently reduces the potential size of the validator set (2) As the number of nodes participating in the decryption protocol increases, the latency increases.

At least for now, FHE protocols rely on an honest majority among validators, however as stated above, Threshold FHE implies a smaller validator set and therefore, a higher probability of collusion and malicious behavior.

What if the threshold nodes collude? They would be able to bypass the protocol and basically decrypt anything, from user inputs to any onchain data. In addition, it’s important to note the decryption protocol can happen undetectably in current systems (aka “silent attack”).

Solution: Improved Threshold Decryption or Dynamic MPC

There a few ways to possibly address the shortcomings of threshold decryption. (1) Enable an n/2 threshold which would make it more difficult to collude (2) Run the threshold decryption protocol inside of an HSM (3) Use a TEE-based threshold decryption network controlled by a decentralized chain for auth allowing for dynamic key management.

Rather than leverage threshold decryption, it’s possible to use MPC instead. An example of an MPC protocol that could be used in onchain FHE includes Odsy’s new 2PC-MPC, the first noncollusive and massively decentralized MPC algorithm.

A different approach could be to only use the user’s public key to encrypt the data. The validators then process the homomorphic operations, and the users themselves can decrypt the result if needed. While feasible only for niche use cases, we could avoid the threshold assumption altogether.

To summarize, onchain FHE needs an efficient MPC implementation that: (1) Works even when there are malicious actors (2) Introduces minimal latency (3) Enables permissionless/flexible entry of nodes.

Zero-knowledge proof (ZKP) For User Input and Computing Party

Challenge: Verifiability of User Inputs + Computation:

In web2, when we ask for a computational task to be performed, we completely trust a given company will run the task exactly as they’ve promised behind the scenes. In web3, the model is completely flipped; rather than rely on the reputation of a company and simply trust they will act honestly, we’re now operating in a trustless environment, meaning users shouldn’t have to trust anyone.

While FHE enables the processing of encrypted data, it cannot verify correctness of either user inputs or computation. Without the ability to check for either, FHE is hardly useful in the context of blockchain.

Solution: ZKPs for User Inputs + Computing Party:

While FHE enables anyone to perform arbitrary computation over encrypted data, ZKPs allow one to prove something is true without revealing the underlying information itself. So how do they relate?

There are three key ways FHE and ZKPs fit together:

The user needs to submit a proof that their input ciphertext is well formed. Meaning the ciphertext meets the requirements of the encryption scheme and isn’t just arbitrary strings of data.
The user needs to submit a proof that their input plaintext satisfies a given application’s conditions. Ex. amount_balance > amount_transfer
The validator node (aka computing party) needs to prove they’ve correctly executed the FHE computation. This is referred to as “verifiable FHE” and can be seen as analogous to ZKPs needed for rollups.

To further explore the applications of ZKP:

ZKP of Ciphertext

FHE is built on lattice-based cryptography, meaning the construction of the cryptographic primitive involves lattices, which are PQ (post-quantum) secure. In contrast, popular ZKPs such as SNARKS, STARKS and Bulletproofs do not rely on lattice-based cryptography.

To prove that the user’s FHE ciphertext is well-formed, we need to show that it satisfies some matrix-vector equation with entries from rings, and that the numerical values of these elements are small. Essentially, we need a cost-effective on-chain verification proof system designed for handling lattice-based relations, which is an open area of research.

ZKP of Plaintext Input

In addition to proving a well-formed ciphertext, the user needs to prove their input plaintext satisfies an application’s requirements. Luckily, unlike the prior step, we can utilize general SNARKs to prove validity around user inputs, therefore enabling developers to make use of the existing ZKP schemes, libraries and infrastructure.

However, the challenging part is we need to “connect” the two proof systems. Connect as in we need to ensure the user used the same input for both proof systems; otherwise a malicious user could cheat the protocol. Once again, this is an incredibly difficult cryptographic feat and an open area of early research.

Sunscreen has also laid the groundwork with their ZKP compiler that offers support for Bulletproofs as it most easily connects with SDLP. Future development is being done to “link” the FHE and ZKP compiler.

Mind Network has been pioneering the integration of ZKPs to ensure zero-trust inputs and outputs alongside leveraging FHE for secure computation.

ZKP of Correct Computation

FHE run on existing hardware is extremely inefficient and expensive. As we’ve seen the dynamic of the scalability trilemma play out before, networks with higher node resource requirements cannot scale to a sufficient level of decentralization. A possible solution could be a process termed “Verifiable FHE,” a process where the computing party (validator) submits a ZKP to prove honest execution of transactions.

An early prototype of verifiable FHE can be displayed by a project called SherLOCKED. Essentially the FHE computation is offloaded to Risc ZERO’s Bonsai which processes computation over the encrypted data offchain and returns the result with a ZKP and updates the state onchain.

A recent example of leveraging an FHE coprocessor can be seen with Aztec’s onchain auction demo. As we covered earlier, existing FHE chains encrypt the entire state with a threshold key, meaning the system is only as strong as its threshold committee. Conversely, in Aztec, user’s own their own data, and therefore not subject to a threshold security assumption.

Lastly, it’s important to touch on where a developer can build an FHE application in the first place. Devs can build their applications on either an FHE powered L1, an FHE rollup or build on any EVM chain and leverage an FHE coprocessor. Each design comes with it’s own set of tradeoffs, however we are more inclined towards well-designed FHE rollups (pioneered by Fhenix) or FHE coprocessors as they inherit security from Ethereum among other benefits.

Up to recently, achieving fraud proofs on FHE encrypted data was complex, but recently the team at Fhenix.io demonstrated how to achieve fraud proofs using the Arbitrum Nitro stack paired with FHE logic compilation to WebAssembly.

Data Availability (DA) Layer of FHE

Challenge: Lack of Customizability and Throughput

While storage has been commoditized in web2, the same doesn’t hold true in Web3. Given FHE maintains a large data blow-up of 10,000x+, we need to figure out where to store large FHE ciphertexts. If every node operator in a given DA layer were to download and store every FHE chain’s data, only institutional operators would be able to keep up with the demand, risking centralization.

Regarding throughput, Celestia tops out at 6.7mb/s, if we want to run any form FHEML, we would need easily n GBs+/sec. As per recent data shared by 1k(x), existing DA layers cannot support FHE due to design decisions that limit throughput (intentionally).

Solution: Horizontal Scaling + Customizability

We need a DA layer that can support horizontal scalability. Existing DA architectures propagate all data to every node in the network which is a major constraint for scalability. Conversely, with horizontal scalability, as more DA nodes enter the system, each node is storing 1/nth of the data, improving performance and cost as more blockspace is made available.

Separately, given the substantial data size associated with FHE, it would make sense to offer developers higher level of customizability around erasure coding thresholds. i.e. How much of the DA system are you comfortable on a guarantee with? Whether stake based weighting or decentralization based weighting.

The architecture of EigenDA serves as the basis of what a DA layer for FHE could look like. It’s properties of horizontal scaling, structural cost reduction, decoupling of DA and consensus etc.. all gives way to a level of scalability that could one day support FHE.

Lastly, a high level idea could be to build optimized storage slots for storing FHE ciphertexts since ciphertexts follow a particular encoding scheme, so having optimized storage slots might help with efficient use of storage and faster retrieval.

Fully Homomorphic Encryption (FHE) Hardware

Challenge: Low Performance FHE Hardware

In the promotion of on-chain fully homomorphic encryption (FHE) applications, a major issue is the significant latency due to computational overhead, which makes running FHE impractical on any standard processing hardware. This limitation stems from the large amount of interaction with memory due to the huge amount of data that the processor needs to process. In addition to memory interactions, complex polynomial calculations and time-consuming ciphertext maintenance operations also bring a lot of overhead.

To further understand the state of FHE accelerators, we need to uncover the design space: Application-specific FHE accelerators vs. generalizable FHE accelerators.

The crux of computation complexity and hardware design for FHE is always related to the number of multiplications needed for a given application, referred to as the “depth of homomorphic operation.” In the application-specific case, the depth is known, so the system parameters and the related computation are fixed. Therefore, hardware design for this application-specific case will be easier and usually can be optimized for better performance than the generalizable accelerator design. In the general case, where FHE is required to support an arbitrary number of multiplications, bootstrapping is needed to remove part of the noise accumulated in the homomorphic operations. Bootstrapping is expensive and requires considerable hardware resources including on-chip memory, memory bandwidth and computation, which means the hardware design will be very different from the app-specific design. Hence, while the work done by major players in the space, including the likes of Intel, Duality, SRI, DARPA and many others are no doubt pushing the limits with their GPU and ASIC based designs, they may not be 1:1 applicable to support blockchain use cases.

Regarding the development cost：Generalizable hardware requires more capital to design and manufacture than the application-specific one, but its flexibility allows the hardware to be used in a broader application scope. On the other hand, with app-specific design, if an application’s needs change and require support for deeper homomorphic computation, then the app-specific hardware would need to be combined with some software techniques, such as MPC, to achieve the same end as generalizable hardware.

It’s important to note, onchain FHE is far more difficult to accelerate than application-specific use cases (ex.FHEML) because it requires the ability to process more general computations versus a more niche set. To illustrate, fhEVM devnet is constrained to single digit TPS at the moment.

Given we need an FHE accelerator tailored to blockchain, another consideration is how transferable is the work from ZKP hardware to FHE hardware?

There are some common modules between ZKP and FHE, such as the number theoretic transform (NTT) and the underlying polynomial operations. However, the bit width of NTT used in these two cases are different. In ZKP, the hardware needs to support a wide range of bit width for NTT, such as 64-bit and 256-bit, while in FHE, the operands for NTT are shorter due to being represented in the residue number system. Secondly, the degree of NTT handled in the ZKP is in general higher than that of the FHE case. Because of these two reasons, it is not trivial to design an NTT module that achieves satisfying performance for both ZKP and FHE. Other than NTT, there are some other computing bottlenecks in FHE, such as automorphism, which are not present in ZKPs. A naive way of transferring ZKP hardware to the FHE setting is to offload the NTT computing task to the ZKP hardware and use the CPU or other hardware to handle the rest of computation in FHE.

To round off the challenges, computing on encrypted data with FHE used to be 100,000 times slower than on plaintext data. However, thanks to the newer encryption schemes and recent developments in FHE hardware, the current performance has dramatically improved to around 100x slower than plaintext data.

Solutions:

Improvement in FHE Hardware

There are many teams actively building FHE accelerators, however, they are not working on FHE accelerators that are specific to generalizable blockchain compute (i.e. TFHE). An example of a blockchain specific hardware accelerator is FPT, a Fixed-Point FPGA accelerator. FPT is the first hardware accelerator to heavily exploit the inherent noise present in FHE calculations by implementing TFHE bootstrapping entirely with approximate fixed-point arithmetic. Another project that could be useful for FHE is BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud.

As denoted earlier, one of the primary bottlenecks in FHE hardware design is the massive interaction with the memory. To circumvent this barrier, a potential solution is to reduce the interaction with memory as much as possible. The processing-in-memory (PIM) or near memory computation is probably helpful in this scenario. PIM is a promising solution to address the “memory wall” challenges for future computer systems, which allows the memory to serve both computation and memory functions, promising a radical renovation of the relationship between computation and memory. In the past decade, tremendous progress has been made in designing the non-volatile memories for this purpose, such as resistive RAM, spin-transfer torque magnetic RAM and phase change memory. Using this special memory, there has been work showing significant computation improvement in machine learning and lattice-based public-key encryption.

Optimized software and hardware

In recent developments, software has been acknowledged as a critical component in the hardware acceleration process. For instance, prominent FHE accelerators such as F1 and CraterLake use compilers for a hybrid software/hardware co-design approach.

As the space advances, we can expect fully functional compilers to be co-designed with blockchain-specific FHE compilers. FHE compilers can optimize FHE programs based on the cost model of the respective FHE scheme. These FHE compilers could be integrated with FHE accelerator compilers to improve end-to-end performance by incorporating a cost model from a hardware-level aspect.

Networking FHE Accelerators: The Shift from Scale-up to Scale-out

Existing FHE hardware acceleration efforts largely focus on “scale-up,” which means focusing on the improvement of a single accelerator vertically. On the other hand, “scale-out” places focus on connecting multiple FHE accelerators by networking horizontally, which could drastically improve performance, akin to parallel proof generation with ZKPs.

One of the primary difficulties with FHE acceleration is the data inflation problem: Refers to the significant increase in data size that occurs during encryption and computation, posing challenges for efficient data movement between on-chip and off-chip memories.

Data inflation poses a significant challenge to connecting multiple FHE accelerators via networking horizontally. Therefore, FHE hardware and network co-design will be a promising direction of future research. An example could include an adaptive network routing for FHE accelerators: Implementing a routing mechanism that dynamically adjusts data paths between FHE accelerators based on real-time computational load and network traffic. This would ensure optimal data transfer rates and efficient resource utilization.

Conclusion

FHE will fundamentally reimagine the way we secure data across platforms, paving the way for a new era of unprecedented privacy. To build such a system will require significant advancements in not only FHE, but in ZKPs and MPC as well.

As we venture into this new frontier, collaborative efforts among cryptographers, software engineers, and hardware specialists will be imperative. Not to mention lawmakers, politicians etc. as regulatory compliance is the only path to mainstream adoption.

Ultimately, FHE will stand at the forefront of a transformative wave of digital sovereignty, heralding a future where data privacy and security are no longer mutually exclusive but inextricably unified.

Special Thanks：

Many thanks to Mason Song (Mind Network), Guy Itzhaki (Fhenix), Leo Fan (Cysic), Kurt Pan, Xiang Xie (PADO), Nitanshu Lokhande (BananaHQ) for reviewing. We look forward to reading about the impressive work and efforts these individuals are doing in the field!

Disclaimer：

This article is reprinted from [HashKey Capital]. All copyrights belong to the original author [Jeffrey Hu、Arnav Pagidyala]. If there are objections to this reprint, please contact the Gate Learn team, and they will handle it promptly.
Liability Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.

Start Now

$100

Voucher!

Create Account

Unlocking the Holy Grail: Challenges and Solutions of Fully Homomorphic Encryption on Chain

Core ideas: