Filters Support Over HTTPS

Today, we're happy to announce that eth_newFilter and related RPCs are now generally available over HTTPS. Previously, these RPCs were only available over WebSocket connections; in this post I hope to explain how we enabled support for filters over HTTPS at Infura's scale.

Recall that filters are inherently "stateful": each time you call eth_newFilter, a new filter ID is returned, and that ID is then polled for changes via eth_getFilterChanges. This normally requires an Ethereum client to keep track of the state of each filter locally, which introduces a couple of challenges. First, tracking filter changes per ID is not cheap; recent Mainnet blocks can easily contain over 300 log events each. Additionally, if the Ethereum client where your state is stored needs to undergo maintenance, all active filters on that node may be lost.
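For reference, the standard polling flow can be sketched at the JSON-RPC level as follows. This is a minimal illustration that only builds the request bodies. The helper names, the placeholder address, and the filter ID "0x1" are assumptions made for the example, and actually sending the requests to an endpoint is left out.

```python
import json


def rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def new_filter_request(address=None, topics=None, from_block="latest"):
    """Body for eth_newFilter; a node answers with a new filter ID."""
    criteria = {"fromBlock": from_block}
    if address is not None:
        criteria["address"] = address
    if topics is not None:
        criteria["topics"] = topics
    return rpc_request("eth_newFilter", [criteria])


def get_filter_changes_request(filter_id):
    """Body for eth_getFilterChanges; polls the filter for new logs."""
    return rpc_request("eth_getFilterChanges", [filter_id])


if __name__ == "__main__":
    # Placeholder address and filter ID, for illustration only.
    print(json.dumps(new_filter_request(address="0x" + "00" * 20)))
    print(json.dumps(get_filter_changes_request("0x1")))
```

The filter ID returned by the first call is the piece of state that every later eth_getFilterChanges call depends on.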

So, to enable access to filters over HTTPS, we took a different approach. Rather than tying filter IDs to a backend Ethereum node, we associate them with your Infura Project ID. This means that they are not tied to a single connection and, unlike filters created over WebSockets, can be shared across different HTTPS connections.

To do this, we had to move the storage of filter "state" out of the Ethereum node itself, and into a new backend system we've dubbed "Virtual Filters".

This system consists of three high-level services:

  • A replicated, in-memory cache of the recent Ethereum blockchain, including all reorgs.
  • A backend store where we track the "position" of each filter in the above chain.
  • A microservice that converts filter-related RPCs into actions to perform on the services above.

When a user requests a new filter via eth_newFilter, we store the filter parameters they passed in on the backend, along with the current "head" block's hash, in what we call the filter's "block cursor". Then, to update the filter's state during eth_getFilterChanges, we find a path from that cursor position to the new "head" block and output data accordingly.
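To make the stored state concrete, here is a minimal in-memory sketch of such a filter store. The class and method names are hypothetical, and keying records by (project ID, filter ID) is our reading of the description above rather than the actual backend schema.

```python
from dataclasses import dataclass


@dataclass
class FilterState:
    criteria: dict      # the parameters passed to eth_newFilter
    block_cursor: str   # hash of the head block at the last update


class FilterStore:
    """Toy stand-in for the backend store (hypothetical API)."""

    def __init__(self):
        self._filters = {}
        self._next_id = 1

    def create(self, project_id, criteria, head_hash):
        """Record a new filter with its cursor at the current head."""
        filter_id = hex(self._next_id)
        self._next_id += 1
        self._filters[(project_id, filter_id)] = FilterState(dict(criteria), head_hash)
        return filter_id

    def get(self, project_id, filter_id):
        return self._filters[(project_id, filter_id)]

    def advance(self, project_id, filter_id, new_head_hash):
        """Move the cursor after a successful eth_getFilterChanges."""
        self._filters[(project_id, filter_id)].block_cursor = new_head_hash
```

Because the record holds only the criteria and a single block hash, any connection presenting the same project ID and filter ID can pick up where the last poll left off.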

For example, given the chain state below, and two filter cursors (labelled A and B):

Example Cursors A and B

In the case of cursor A, the path to the head of the chain simply walks from block 996 to block 1000 in sequence. For cursor B, however, the path is a bit more complex.

  • First, we must walk "backwards" to block 997 to get off the reorg'ed "side chain".
  • Then, we can walk forward to block 1000.

So, in the case of cursor B, we must tell the user that any side-chain logs were removed ("removed": true in the eth_getFilterChanges results) and then report all the new logs in blocks 998, 999, and 1000 (the logs in block 997 would have been reported previously). In addition, for all the logs found, we need to apply the user's original filter criteria (e.g. which topics or addresses the filter was limited to).
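Applying the filter criteria to a single log can be sketched as below. The function name is hypothetical. The matching rules follow the standard eth_newFilter semantics: address may be one address or a list, topics are matched by position, null is a wildcard, and a nested list means "any of these". Hex strings are compared case-insensitively here as a simplifying assumption.

```python
def log_matches(log, criteria):
    """Return True if a log satisfies eth_newFilter-style criteria."""
    address = criteria.get("address")
    if address is not None:
        allowed = [address] if isinstance(address, str) else address
        if log["address"].lower() not in {a.lower() for a in allowed}:
            return False

    for position, wanted in enumerate(criteria.get("topics") or []):
        if wanted is None:
            continue  # wildcard: any topic (or none) at this position
        if position >= len(log["topics"]):
            return False  # the log has no topic at a required position
        options = [wanted] if isinstance(wanted, str) else wanted
        if log["topics"][position].lower() not in {t.lower() for t in options}:
            return False
    return True
```

The same check would apply to both the removed side-chain logs and the newly added ones, so a user only sees removals for logs their filter would have matched.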

Finally, in both cases, we would update the filter state on the backend to note that the new head is block 1000 with hash 0x104, so we are prepared to repeat the process the next time the user calls eth_getFilterChanges.
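The path calculation described above can be sketched as a walk to the common ancestor of the cursor block and the head block. This is an illustrative sketch, not our production code: the Block type, the function names, and every hash except 0x104 are assumptions, and cursor B is assumed to sit on a two-block side chain that forks after block 997.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    hash: str
    parent: str


def find_path(blocks, cursor_hash, head_hash):
    """Return (removed, added) block lists for moving cursor -> head.

    removed: blocks to undo, newest first (walking backwards).
    added:   blocks to apply, oldest first (walking forwards).
    """
    removed, added = [], []
    back, fwd = blocks[cursor_hash], blocks[head_hash]
    # Bring both pointers to the same height.
    while back.number > fwd.number:
        removed.append(back)
        back = blocks[back.parent]
    while fwd.number > back.number:
        added.append(fwd)
        fwd = blocks[fwd.parent]
    # Step back in lockstep until both sit on the common ancestor.
    while back.hash != fwd.hash:
        removed.append(back)
        back = blocks[back.parent]
        added.append(fwd)
        fwd = blocks[fwd.parent]
    added.reverse()
    return removed, added


def example_chain():
    """Hypothetical chain: main blocks 996-1000 plus a side chain off 997."""
    chain = [
        Block(996, "0x100", "0x0ff"),
        Block(997, "0x101", "0x100"),
        Block(998, "0x102", "0x101"),
        Block(999, "0x103", "0x102"),
        Block(1000, "0x104", "0x103"),
        Block(998, "0x102b", "0x101"),   # reorged-out side chain
        Block(999, "0x103b", "0x102b"),
    ]
    return {b.hash: b for b in chain}
```

With cursor A at block 996 (0x100), find_path returns no removed blocks and blocks 997 through 1000 as added. With cursor B at the side-chain block 0x103b, it returns the two side-chain blocks as removed and blocks 998, 999, and 1000 as added. In both cases the caller would then store 0x104 as the new cursor.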

With this system in place, we can offer a couple of improvements over the standard filter APIs. Since standard Ethereum nodes store all the filter results directly in memory, they usually require a user to poll their filter every five minutes to "drain" that information, or the filter is discarded. Since we store less state per filter, we have increased that timeout to fifteen minutes. Additionally, our filters limit the amount of "spam" around reorgs. Since the path is only calculated when eth_getFilterChanges is called, we do not have to include results for blocks that were added but later reorged out between calls to eth_getFilterChanges, resulting in smaller, more meaningful payloads for our customers.
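As a small illustration of the idle timeout, an expiry check might look like the sketch below. The constant and function names are hypothetical, and treating a filter as expired only once strictly more than fifteen minutes have passed since the last poll is an assumption about the boundary case.

```python
FILTER_TTL_SECONDS = 15 * 60  # fifteen-minute idle timeout described above


def is_expired(last_polled_at, now, ttl=FILTER_TTL_SECONDS):
    """True if the filter has not been polled within the TTL (in seconds)."""
    return now - last_polled_at > ttl
```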

There is one trade-off, however. Standard nodes offer the ability to create a filter on the pending logs in the node's transaction mempool, something the above solution doesn't support. That said, we found that less than 0.002% of our users use this feature, and we have ideas on how we can implement it if needed. So, if access to pending filters is important to you, please reach out to us and explain your use case. We'd love to discuss our plans for providing access to pending logs in the future.

Want more insight from the Infura engineering team? Subscribe to the Infura newsletter and never miss a post. As always, if you have questions or feature requests, you can join our community or reach out to us directly.