Introduction

In our problem statement, the concept of verifiable queries was introduced. We can split the verifiable queries problem into two main subproblems:

Ultra Light Client: Verifying that the indexer is working on top of the current valid state of the blockchain.
Verifiable Query Execution: Verifying that the indexer is executing the correct computation.

These subproblems are pretty much independent from each other. In fact, the first one is agnostic to The Graph’s inner working and encapsulates the broader challenge of ultra light clients. The indexer wants to prove to the consumer (a frontend) that it is executing the indexing logic on top of the current valid state of the blockchain. Otherwise, the indexer could be executing the indexing correctly but in an old version of the chain, or even a completely different one. A relevant caveat here is that the consumer does not run the blockchain! We can’t just compare the indexer’s blockchain state with the frontend’s blockchain state, because the frontend doesn’t have a blockchain state at all. The end goal of the first subproblem, then, is to quickly prove to someone with no access to or interaction with the blockchain that the block header we are showing them is the header of the most recent valid block in the chain.

Once this is fulfilled, the second problem consists of proving that the indexing is executed correctly. Arguably, this verification is easier to do for subgraphs that correctly use events than for subgraphs that involve complex inner logic. In this proof of concept, we will only contemplate the indexing of events and their metadata, and we won’t work with trace data or other more complicated indexing methods.

Ultra Light Client

Succinct Labs faces a similar problem to ours. To solve it, they use Ethereum’s sync committees, a light-client-oriented feature introduced in the Altair fork.

The sync committee is the "flagship feature" of the Altair hard fork. This is a committee of 512 validators that is randomly selected every sync committee period (~1 day), and while a validator is part of the currently active sync committee they are expected to continually sign the block header that is the new head of the chain at each slot.

The purpose of the sync committee is to allow Light Clients to keep track of the chain of beacon block headers. The other two duties that involve signing block headers, block proposal and block attestation, do not work for this function because computing the proposer or attesters at a given slot requires a calculation on the entire active validator set, which Light Clients do not have access to (if they did, they would not be light!). Sync committees, on the other hand, are (i) updated infrequently, and (ii) saved directly in the beacon state, allowing Light Clients to verify the sync committee with a Merkle branch from a block header that they already know about, and use the public keys in the sync committee to directly authenticate signatures of more recent blocks.

In this way, a block is trusted if two thirds or more of the sync committee signed its header. These signatures can be aggregated and therefore efficiently verified. It is also worth noting that block headers contain a commitment to both the current sync committee and the next one. Thus, the information contained in a block header from the current sync period is enough to verify block headers from the next sync period as well, allowing us to inductively verify blocks, looking at just one block header per sync period.

Untitled

This last approach is similar to what the Celo team did with their ultra light client Plumo. To get to the current state of the chain, we hardcode the “genesis” (since the Merge) sync committee, and read and verify the sync committee signatures of one block header per sync period (roughly one day). The main improvement here is that we don’t need to go through all the blocks in the chain, but just through one block per sync period. The signatures could be aggregated and then verified using recursive SNARKs, building a SNARK that verifies the proof for the last sync period and the validity of the new sync committee. Celo used these recursive SNARKs in Plumo, although we still need to dig deeper into them and make sure they would also work for Ethereum.

Verifiable Query Execution

As mentioned before, when verifying query execution, we will focus on subgraphs with contracts that correctly log events. Thus, solving the ultralight client subproblem will help us obtain a provably valid receipt tree root, and so the verifiable query execution subproblem becomes verifying data extraction from that tree.

Proving inclusion of an event log is pretty straightforward: you just provide the Merkle proof and check it against the receipt root in the block header. Proving exclusion, or counting the emissions of an event, is harder as you need to go through all the receipts. This means that building this proof will take a considerable time, and therefore we don’t want this proof to be built every time a consumer sends a query. We want something more general, that only requires building one proof per block and can then be re-used for any query pertaining that block. Moreover, the solution we propose does not need every indexer to build their own proof: it could be just one indexer building it and propagating (or selling) it to the network. In this way, for each block only one proof is built in the entire network! The construction of this proof can easily be parallelized among a few indexers, so that it is faster to complete.

Let’s now go through how we achieve this. First of all, our subgraph will define a fixed list of contracts of interest. We will sort this list in lexicographic order and associate the n-th event in the list to the n-th prime number.

In this way, if the list of contracts of interest of our subgraph is:

contracts_of_interest = [0x0d4a11d5EEaaC28EC3F61d100daF4d40471f1852, 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984, 0x7a250d5630B4cF539739dF2C5dAcb4c659F2488D, 0x88e6A0c2dDD26FEEb64F039a2c41296FcB3f5640]