Building an arbitrage bot: Finding arbitrage opportunities

IntermediateApr 09, 2024
In this article, we perform a pre-selection of token pairs of interest. We then derive the mathematical formula for finding the optimal arbitrage between two pools of the same token pairs.
Building an arbitrage bot: Finding arbitrage opportunities

If your MEV setup doesn’t look like this, you’re ngmi

This article is a part of a series about building an arbitrage bot. The goal of this series is to provide a step-by-step guide to building an automated MEV trading robot that can find and execute arbitrage opportunities on popular decentralized exchanges.

In this article, we perform a pre-selection of token pairs of interest. We then derive the mathematical formula for finding the optimal arbitrage between two pools of the same token pairs. Finally, we implement the formula in code and return a list of potential arbitrage opportunities.

Selecting the token pairs

Precisions on the arbitrage strategy

Before we start looking for arbitrage opportunities, we have to clearly define the perimeter of our arbitrage bot. Specifically, what type of arbitrages do we want to act on. The safest kind of arbitrage is between pools that involve ETH. Since ETH is the asset with which the gas of our transactions is paid, it is natural to always want to end up with ETH after an arbitrage. But everyone is tempted to think like this. Keep in mind that in trading, punctual opportunities get less and less profitable as more people act on them.

For the sake of simplicity, we will focus on arbitrage opportunities between pools that involve ETH. We will only look for opportunities between two pools of the same token pair. We will not trade on opportunities which involve more than 2 pools in the trading route (so-called multi-hop opportunities). Note that upgrading this strategy to a riskier one is the first step you should take to improve the profitability of your bot.

To improve on this strategy, you could for example keep some inventory in stablecoins and to act on arbitrage opportunities that yield stablecoins. The same could be done for much riskier assets like shitcoins (with necessary precautions), and periodically rebalance your portfolio into ETH to pay for gas.

Another direction would be to abandon the implicit assumption of atomicity that we made, and introduce statistical reasoning in our strategy. For example, by buying one token in a pool when the price has moved favorably more than some amount of standard deviations, and selling it later (mean reversion strategy). This would be ideal for shitcoins which aren’t listed on much more efficient centralized exchanges, or ones which are but whose price isn’t correctly tracked on-chain. This involves many more moving parts and is out of the scope of this series.

Selecting the token pairs

Now that we have defined the perimeter of our arbitrage bot, we need to select the token pairs we want to trade on. Here are the 2 selection criteria we will use:

  • The pairs will selected must involve ETH.
  • The pairs need to be traded on at least 2 different pools.

Re-using the code from article 2: Efficient reading of pool prices, we have the following code which lists all the token pairs that were deployed by the factory contracts provided:

# [...]

# Load the addresses of the factory contracts

with open("FactoriesV2.json", "r") as f:

factories = json.load(f)

# [...]

# Fetch list of pools for each factory contract

pairDataList = []

for factoryName, factoryData in factories.items():

events = getPairEvents(w3.eth.contract(address=factoryData['factory'], abi=factory_abi), 0, w3.eth.block_number)

print(f'Found {len(events)} pools for {factoryName}')

for e in events:

   pairDataList.append({

       "token0": e["args"]["token0"],

       "token1": e["args"]["token1"],

       "pair": e["args"]["pair"],

       "factory": factoryName

   })

We will simply invert pairDataList into a dictionary where the keys are the token pairs, and the values are the list of pools that trade this pair. When looping through the list, we ignore the pairs that don’t involve ETH. When the loop is over, pairs with at least 2 pools are selected will be stored in lists with at least 2 elements:

# [...]

WETH = "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"

pair_pool_dict = {}

for pair_object in pairDataList:

# Check for ETH (WETH) in the pair.

pair = (pair_object['token0'], pair_object['token1'])

if WETH not in pair:

   continue

# Make sure the pair is referenced in the dictionary.

 if pair not in pair_pool_dict:

   pair_pool_dict[pair] = []

# Add the pool to the list of pools that trade this pair.

pair_pool_dict[pair].append(pair_object)

# Create the final dictionnary of pools that will be traded on.

pool_dict = {}

for pair, pool_list in pair_pool_dict.items():

if len(pool_list) >= 2:

   pool_dict[pair] = pool_list

Some stats should be printed to have a better grip with the data we are working with:

# Number of different pairs

print(f'We have {len(pool_dict)} different pairs.')

# Total number of pools

print(f'We have {sum([len(pool_list) for pool_list in pool_dict.values()])} pools in total.')

# Pair with the most pools

 print(f'The pair with the most pools is {max(pool_dict, key=lambda k: len(pool_dict[k]))} with {len(max(pool_dict.values(), key=len))} pools.')

# Distribution of the number of pools per pair, deciles

pool_count_list = [len(pool_list) for pool_list in pool_dict.values()]

pool_count_list.sort(reverse=True)

print(f'Number of pools per pair, in deciles: {pool_count_list[::int(len(pool_count_list)/10)]}')

# Distribution of the number of pools per pair, percentiles (deciles of the first decile)

pool_count_list.sort(reverse=True)

print(f'Number of pools per pair, in percentiles: {pool_count_list[::int(len(pool_count_list)/100)][:10]}')

At the time of writing, this outputs the following:

We have 1431 different pairs.

We have 3081 pools in total.

The pair with the most pools is (‘0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2’, ‘0xdAC17F958D2ee523a2206206994597C13D831ec7’) with 16 pools.

Number of pools per pair, in deciles: [16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

Number of pools per pair, in percentiles: [16, 5, 4, 3, 3, 3, 3, 3, 3, 3]

Fetching reserves for 3000 pools can be done in less than 1 second with public RPC nodes. This is a reasonable amount of time.

Now, we that we have all the data we need, we need to start finding arbitrage opportunities.

Finding arbitrage opportunities

General idea

There is an arbitrage opportunity whenever there is a price discrepancy between two pools that trade the same pair. Not all price differences however are exploitable: the gas cost of the transaction sets a minimum value that must be recouped by the trade, and the liquidity in each pool limits the value that can be extracted from a given difference in price.

In order to find the most profitable arbitrage opportunity accessible to us, we will need to compute the potential value extractible from each price difference, considering the reserves/liquidity in each pool, and estimate the gas cost of the transaction.

Arbitrage optimal trade size formula

When an arbitrage opportunity is exploited, the price of the pool that buys the input token will go down, and the price of the pool that sells will go up. The movement of the prices is described by constant product formula.

We already saw in @emileamajar/building-an-arbitrage-bot-automated-market-makers-and-uniswap-2d208215d8c2">article 1 how to compute the output of a swap through a pool, given the reserves of that pool and the input amount.

In order to find the optimal trade size, we first find a formula for the output of two successive swaps, given some input amount, and the reserves of the two pools involved in the swaps.

We assume that the input of the first swap is in token0, and the input of the second swap is in token1, which finally yields an output in token0.

Let x be the input amount, (a1, b1) the reserves of the first pool, and (a2, b2) the reserves of the second pool. fee is the fee taken by the pools, and is assumed to be the same for both pools (0.3% most of the time).

We define a function that computes the output of a swap, given input x and reserves (a,b):

f(x, a, b) = b (1 - a/(a + x(1-fee)))

We then know that the output of the first swap is:

out1(x) = f(x, a1, b1)

out1(x) = b1 (1 - a1/(a1 + x(1-fee)))

The output of the second swap is: (notice the swapped reserve variables)

out2(x) = f(out1(x), b2, a2)

out2(x) = f(f(x, a1, b1), b2, a2)

out2(x) = a2 (1 - b2/(b2 + f(x, a1, b1)(1-fee)))

out2(x) = a2 (1 - b2/(b2 + b1 (1 - a1/(a1 + x (1-fee))) (1-fee)))

We can plot this function using desmos. By choosing the reserve values so that we simulate the first pool having 1 ETH and 1750 USDC, and the second pool having 1340 USDC and 1 ETH, we get the following graph:

Plot of the gross profit of the trade as function of input value

Note that we have actually plotted out2(x) - x, which is the profit of the trade, minus the input amount.

Graphically, we can see that the optimal trade size is at 0.0607 ETH of input, which yields a profit of 0.0085 ETH. The contract must have at least 0.0607 ETH of liquidity in WETH in order to be able to exploit this opportunity.

This profit value of 0.0085 ETH (~$16 when writing this article) is NOT the final profit of the trade, as we still need to take into account the gas cost of the transaction. This will be discussed in a following article.

We want to automatically compute this optimal trade size for our MEV bot. This can be done through elementary calculus. We have a function of one variable x that we want to maximize. The function reaches its maximum for a value of x where the derivative of the function is 0.

Various free and online tools can be used to symbolically compute the derivative of a function, such as wolfram alpha.

Finding the derivative of our gross profit function.

Finding such a derivative is very simple with Wolfram Alpha. You can also do it by hand if you are insecure about your math skills.

Wolfram Alpha yields the following derivative:

dout2(x)/dx = (a1b1a2b2(1-fee)^2)/(a1b2 + (1-fee)x(b1(1-fee)+b2))^2

Since we want to find the value of x that maximizes the profit (which is out2(x) - x), we need to find the value of x where the derivative is 1 (and not 0).

Wolfram Alpha yields the following solution for x in the equation dout2(x)/dx = 1:

x = (sqrt(a1b1a2b2(1-fee)^4 (b1(1-fee)+b2)^2) - a1b2(1-fee)(b1(1-fee)+b2)) / ((1-fee) (b1(1-fee) + b2))^2

With the values of the reserves we used in the graph above, we get x_optimal = 0.0607203782551, which validates our formula (compared to the graph value of 0.0607).

Although this formula is not very readable, it is easy to implement in code. Here is a python implementation of the formula to compute the output of the 2 swaps, and the optimal trade size:

# Helper functions for calculating the optimal trade size

# Output of a single swap

def swap_output(x, a, b, fee=0.003):

return b * (1 - a/(a + x*(1-fee)))

# Gross profit of two successive swaps

def trade_profit(x, reserves1, reserves2, fee=0.003):

 a1, b1 = reserves1

a2, b2 = reserves2

return swap_output(swap_output(x, a1, b1, fee), b2, a2, fee) - x

# Optimal input amount

def optimal_trade_size(reserves1, reserves2, fee=0.003):

a1, b1 = reserves1

a2, b2 = reserves2

return (math.sqrt(a1*b1*a2*b2*(1-fee)**4 * (b1*(1-fee)+b2)**2) - a1*b2*(1-fee)*(b1*(1-fee)+b2)) / ((1-fee) * (b1*(1-fee) + b2))**2

Arbitrage opportunity finder

Now that we know how to compute the gross profit from an arbitrage opportunity between any two given pools of the same token pair, we simply have to iterate over all the token pairs, and all test two-by-two all the pools that have the same token pair. This will give us the gross profit of all the possible arbitrage opportunities that are within the perimeter of our strategy.

To estimate the net profit of a trade, we need to estimate the gas cost of exploiting a given opportunity. This can be done precisely by simulating the transaction through an eth_call to a RPC node, but takes a lot of time and can only be performed for a few dozen opportunities per block.

We will first make a gross estimation of the gas cost by assuming a fixed transaction gas cost (a lower bound, actually), and weed out the opportunities that are not profitable enough to cover the gas cost. Only then will we perform a precise estimation of the gas cost for the remaining opportunities.

Here is the code that goes through all the pairs and all the pools, and sorts the opportunities by profit:

# [...]

 # Fetch the reserves of each pool in pool_dict

to_fetch = [] # List of pool addresses for which reserves need to be fetched.

for pair, pool_list in pool_dict.items():

for pair_object in pool_list:

   to_fetch.append(pair_object["pair"]) # Add the address of the pool

print(f"Fetching reserves of {len(to_fetch)} pools...")

# getReservesParallel() is from article 2 in the MEV bot series

reserveList = asyncio.get_event_loop().run_until_complete(getReservesParallel(to_fetch, providersAsync))

# Build list of trading opportunities

index = 0

opps = []

for pair, pool_list in pool_dict.items():

# Store the reserves in the pool objects for later use

for pair_object in pool_list:

   pair_object["reserves"] = reserveList[index]

   index += 1

# Iterate over all the pools of the pair

for poolA in pool_list:

   for poolB in pool_list:

       # Skip if it's the same pool

       if poolA["pair"] == poolB["pair"]:

           continue

       # Skip if one of the reserves is 0 (division by 0)

       if 0 in poolA["reserves"] or 0 in poolB["reserves"]:

           continue

       # Re-order the reserves so that WETH is always the first token

       if poolA["token0"] == WETH:

           res_A = (poolA["reserves"][0], poolA["reserves"][1])

           res_B = (poolB["reserves"][0], poolB["reserves"][1])

       else:

           res_A = (poolA["reserves"][1], poolA["reserves"][0])

           res_B = (poolB["reserves"][1], poolB["reserves"][0])

       # Compute value of optimal input through the formula

       x = optimal_trade_size(res_A, res_B)

       # Skip if optimal input is negative (the order of the pools is reversed)

       if x < 0:

           continue

       # Compute gross profit in Wei (before gas cost)

       profit = trade_profit(x, res_A, res_B)

       # Store details of the opportunity. Values are in ETH. (1e18 Wei = 1 ETH)

       opps.append({

           "profit": profit / 1e18,

           "input": x / 1e18,

           "pair": pair,

           "poolA": poolA,

           "poolB": poolB,

       })

print(f"Found {len(opps)} opportunities.")

Which produces the following output:

Fetching reserves of 3081 pools.

Found 1791 opportunities.

We now have a list of all the opportunities. We just need to estimate their profit. Right now, we will simply assume a constant gas cost for trading on an opportunity.

We must use a lower bound for the gas cost of a swap on Uniswap V2. Experimentally, we found that this value is close to 43k gas.

Exploiting an opportunity requires 2 swaps, and executing a transaction on Ethereum costs a flat 21k gas, for a total of 107k gas per opportunity.

Here is the code that computes the estimated net profit of each opportunity:

# [...]

# Use the hard-coded gas cost of 107k gas per opportunity

 gp = w3.eth.gas_price

for opp in opps:

opp["net_profit"] = opp["profit"] - 107000 * gp / 1e18

# Sort by estimated net profit

opps.sort(key=lambda x: x["net_profit"], reverse=True)

# Keep positive opportunities

positive_opps = [opp for opp in opps if opp["net_profit"] > 0]

Print stats

# Positive opportunities count

print(f"Found {len(positive_opps)} positive opportunities.")

# Details on each opportunity

 ETH_PRICE = 1900 # You should dynamically fetch the price of ETH

for opp in positive_opps:

print(f"Profit: {opp['net_profit']} ETH (${opp['net_profit'] * ETH_PRICE})")

print(f"Input: {opp['input']} ETH (${opp['input'] * ETH_PRICE})")

print(f"Pool A: {opp['poolA']['pair']}")

print(f"Pool B: {opp['poolB']['pair']}")

print()

Here is the output of the script:

Found 57 positive opportunities.

Profit: 4.936025725859028 ETH ($9378.448879132153)

Input: 1.7958289984719014 ETH ($3412.075097096613)

Pool A: 0x1498bd576454159Bb81B5Ce532692a8752D163e8

Pool B: 0x7D7E813082eF6c143277c71786e5bE626ec77b20

{‘profit’: 4.9374642090282865, ‘input’: 1.7958(…)

Profit: 4.756587769768892 ETH ($9037.516762560894)

Input: 0.32908348765283796 ETH ($625.2586265403921)

Pool A: 0x486c1609f9605fA14C28E311b7D708B0541cd2f5

Pool B: 0x5e81b946b61F3C7F73Bf84dd961dE3A0A78E8c33

{‘profit’: 4.7580262529381505, ‘input’: 0.329(…)

Profit: 0.8147203063054365 ETH ($1547.9685819803292)

Input: 0.6715171730669338 ETH ($1275.8826288271744)

Pool A: 0x1f1B4836Dde1859e2edE1C6155140318EF5931C2

Pool B: 0x1f7efDcD748F43Fc4BeAe6897e5a6DDd865DcceA

{‘profit’: 0.8161587894746954, ‘input’: 0.671(…)

(…)

Which are suspiciously high profits. The first step that should be taken is to verify that the code is correct. After cautiously checking the code, we found that the code is correct.

Are these profits real? As it turns out, no. We have cast our net too wide when selecting which pools to consider in our strategy, and have gotten in our hands pools of toxic tokens.

The ERC20 token standard only describes an interface for interoperability. Anyone can deploy a token that implements this interface, and choose to implement un-orthodox behavior, which is exactly what is at play here.

Some token creators craft their ERC20 so that the pools on which they are traded cannot sell, but only buy the token. Some token contracts even have kill-switches mechanisms that allow the creator to rug-pull all its users.

In our MEV bot, these toxic tokens must be filtered out. This will be addressed in a future article.

If we manually filter out the obviously toxic tokens, we are left with the following 42 opportunities:

Profit: 0.004126583158496902 ETH ($7.840508001144114)

Input: 0.008369804833786892 ETH ($15.902629184195094)

Pool A: 0xdF42388059692150d0A9De836E4171c7B9c09CBf

Pool B: 0xf98fCEB2DC0Fa2B3f32ABccc5e8495E961370B23

{‘profit’: 0.005565066327755902, (…)

Profit: 0.004092580415474992 ETH ($7.775902789402485)

Input: 0.014696360216108083 ETH ($27.92308441060536)

Pool A: 0xfDBFb4239935A15C2C348400570E34De3b044c5F

Pool B: 0x0F15d69a7E5998252ccC39Ad239Cef67fa2a9369

{‘profit’: 0.005531063584733992, (…)

Profit: 0.003693235163284344 ETH ($7.017146810240254)

Input: 0.1392339178514088 ETH ($264.5444439176767)

Pool A: 0x2957215d0473d2c811A075725Da3C31D2af075F1

Pool B: 0xF110783EbD020DCFBA91Cd1976b79a6E510846AA

{‘profit’: 0.005131718332543344, (…)

Profit: 0.003674128918827048 ETH ($6.980844945771391)

Input: 0.2719041848570484 ETH ($516.617951228392)

Pool A: 0xBa19343ff3E9f496F17C7333cdeeD212D65A8425

Pool B: 0xD30567f1d084f411572f202ebb13261CE9F46325

{‘profit’: 0.005112612088086048, (…)

(…)

Notice that in general the profits are lower than the input amount necessary to execute the transaction.

These profits are much more reasonable. But remember that they still are best-case scenario profits, as we have used a very crude estimation of the gas cost of each opportunity.

In a future article, we will simulate the execution of our trade in order to get a precise value of the gas cost of each opportunity.

In order to simulate execution, we need first to develop the smart contract that will execute the trade. This is the topic of the next article.

Conclusion

We have now a clear definition of the perimeter of our MEV arbitrage bot.

We have explored the mathematical theory behind the arbitrage strategy, and have implemented it in Python.

We now have a list of potential arbitrage opportunities, and we need to simulate their execution in order to get a final profit value. To do so, we need to have our trading smart contract ready.

In the next article, we will develop such a smart contract in Solidity, and simulate our first arbitrage trade.

You can find the complete code in the github repo associated with this article. The script is best run in a Jupyter notebook.

Disclaimer:

  1. This article is reprinted from [medium], All copyrights belong to the original author [Emile Amajar]. Original article title”Building an arbitrage bot: Finding arbitrage opportunities (article 3/n)”, If there are objections to this reprint, please contact the Gate Learn team, and they will handle it promptly.
  2. Liability Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.
Start Now
Sign up and get a
$100
Voucher!
Create Account