Inborg Upgrade for the Polygon PoS Network: State Sync With Indore

Polygon Labs
July 4, 2023

Researchers are constantly studying the Polygon proof-of-stake (PoS) network to find ways to improve the public, decentralized protocol. 

After the network experienced a few long block reorgs earlier this year, affecting user experience and chain stability, researchers at Polygon Labs set out to determine the cause and propose solutions to the community.

The Polygon Improvement Proposal (PIP) framework, which largely resembles the EIP framework, provides a coordination layer for all upgrades to Polygon PoS. This allows for ecosystem consensus to emerge in the forum and the Polygon Protocol Governance Calls, among other mediums. 

Facilitated by the PIP framework, the Inborg Upgrade consists of two proposals:

  1. Indore [PIP-12] – A proposed upgrade to the State Sync mechanism to enhance network stability and eliminate potential BADBLOCK errors.
  2. Aalborg [PIP-11] – Introduces the concept of “Milestones” to arrive at faster finality on the Polygon PoS network. 

This is the two-step Inborg Upgrade, intended to enhance network stability and reduce time to finality, with Aalborg scheduled for on-chain consensus next month. 

This post will describe the Indore Upgrade, and show how it is designed to increase stability of the network. 

In short, if implemented, the Indore State Sync Upgrade will:

  1. Enhance network stability.
  2. Eliminate a bug – the BADBLOCK error – that sometimes arises when a reorg (network partition) is longer than 16 blocks. 
  3. Do this by moving the State Sync mechanism from a block-based to a time-based design. 
  4. Leave user and developer experiences unaffected.

Let’s dive in.

What is the “State Sync” mechanism and why does it matter?

A quick refresher. The Polygon PoS network relies on a dual-consensus architecture: a validator layer, Heimdall, and a block producer layer, Bor. 

Heimdall is in charge of initiating the State Sync mechanism. This is the process by which the network reads event data from Ethereum. Eventually, that data is passed on to Bor, the layer where the stuff of the Polygon PoS network happens: transactions and block production and so on. How does that process work, the State Sync from Ethereum to Heimdall to Bor?

To make sure that the state of the Polygon PoS network is up-to-date with the state of Ethereum, at the start of every sprint (a sprint being a run of contiguous blocks produced by a single validator on Bor), Bor fetches the State Sync events from Heimdall. 

It does so using two arguments: 

1. fromID - a value which is a unique, incremental identity of the state; and 

2. to - a value, which is a timestamp.

In simple terms, this means that at the start of a sprint, Bor petitions Heimdall for all the State Sync events that have occurred between two points: the “fromID” and the “to” value, a particular time. It’s gathering all the Ethereum data that Heimdall recorded in that range. The “to” value is the timestamp of an earlier block, found by taking the “current block” number at the start of the sprint and subtracting the sprint length (which is 16 blocks, after the Delhi hardfork). 
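To make the two arguments concrete, here is a minimal Python sketch of selecting events by “fromID” and “to”. It is an illustration only, not Heimdall's or Bor's actual code: the StateSyncEvent type, its field names, the select_events helper, and the choice of a strict "before" comparison on the timestamp are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateSyncEvent:
    event_id: int     # unique, incremental identity of the state (hypothetical field name)
    recorded_at: int  # Unix timestamp in seconds (hypothetical field name)


def select_events(events, from_id, to):
    """Return events whose id is at least from_id and that were recorded before `to`.

    Assumption: the upper bound is exclusive. The post does not specify this.
    """
    return [e for e in events if e.event_id >= from_id and e.recorded_at < to]


events = [
    StateSyncEvent(101, 1_000),
    StateSyncEvent(102, 1_030),
    StateSyncEvent(103, 1_090),
]

selected = select_events(events, from_id=102, to=1_060)
print([e.event_id for e in selected])  # [102]
```

Event 101 falls below the starting ID and event 103 was recorded after the cutoff, so only event 102 is returned.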

Here’s what it looks like, spelled out. “To” is the timestamp of block n, calculated as follows: 

n = (current block number) - (sprint length)

After this calculation, the timestamp of a particular block n determines the range of the State Sync events to be included in the current block.
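The calculation above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Bor's implementation: the chain is modelled as a plain dictionary from block number to timestamp, the two-second block spacing is made up for the example, and the name block_based_to is hypothetical.

```python
SPRINT_LENGTH = 16  # blocks per sprint after the Delhi hardfork


def block_based_to(chain_timestamps, current_block, sprint_length=SPRINT_LENGTH):
    """Return the timestamp of block n, where n = current block number - sprint length."""
    n = current_block - sprint_length
    return chain_timestamps[n]


# Hypothetical chain: block i carries timestamp 1_000 + 2 * i.
chain = {i: 1_000 + 2 * i for i in range(64)}

to = block_based_to(chain, current_block=48)
print(to)  # 1064, the timestamp of block 32
```

Note that the result depends on which block sits at height n in the chain the node is looking at, which is where the trouble described next comes from.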

But sometimes an error occurs: during a network partition, if two forks are running in parallel for more than a sprint length, determining the “to” time value gets tricky. Two forks have different “to” values to calculate the State Sync events–and therefore, instead of merging into a single canonical chain, they’re hit with a BADBLOCK error. 

Let’s break down why that happens. 

How a “network partition” or reorg can introduce an error

Think about it like parallel dimensions, or alternative timelines. 

For some amount of time, the universe, i.e., a segment of blocks produced in the Bor layer, splits into more than one world. In the case of the Polygon PoS network, that might mean that a node has gone offline or is, for some reason, producing blocks in isolation.

But then it comes back online.

Now there are two parallel partitions, chain A and chain B, with two states of the network, and two conflicting truths about what has happened and is happening–and, importantly, when. If these two dimensions attempt to merge, suddenly they disagree on the exact time that bounds the range for retrieving State Sync events from Heimdall–because the “to” value is determined by prior blocks and each chain has different prior blocks. Different timelines.

In most cases, the timelines agree quickly on the “canonical” version and one fork becomes the truth. The main timeline is restored. 

Occasionally, however, when these forks are longer than 16 blocks, the chains cannot agree on the “to” value. Rather than settling on a single “to” value, they hit the BADBLOCK error, leaving the network in limbo. That means that sometimes it takes a while – many, many blocks – for resolution to occur. 

The error therefore may affect network stability.
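A toy model in Python shows why fork length matters under the block-based rule. Everything here is hypothetical and simplified: chains are dictionaries from block number to timestamp, make_fork is an invented helper that makes blocks on the forked side arrive a few seconds late, and real reorg handling in Bor is considerably more involved.

```python
SPRINT_LENGTH = 16  # blocks per sprint after the Delhi hardfork


def block_based_to(chain, current_block, sprint_length=SPRINT_LENGTH):
    """Timestamp of block (current_block - sprint_length) on the given chain."""
    return chain[current_block - sprint_length]


def make_fork(fork_point, delay, length=64):
    """Blocks below fork_point match chain A; later blocks are `delay` seconds late."""
    return {i: 1_000 + 2 * i + (delay if i >= fork_point else 0) for i in range(length)}


chain_a = {i: 1_000 + 2 * i for i in range(64)}
short_fork = make_fork(fork_point=40, delay=4)  # 9 blocks long at height 48
long_fork = make_fork(fork_point=30, delay=4)   # 19 blocks long at height 48

current = 48  # block n is 48 - 16 = 32

# Short fork: block 32 is still shared history, so both sides agree on "to".
print(block_based_to(chain_a, current), block_based_to(short_fork, current))  # 1064 1064

# Long fork: block 32 differs between the chains, so "to" differs as well.
print(block_based_to(chain_a, current), block_based_to(long_fork, current))   # 1064 1068
```

Once the fork reaches back past block n, the two sides compute different cutoffs and so expect different sets of State Sync events.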

So what’s the proposal that will mitigate this bug in the future? 

Change how Bor determines the “to” time value

This PIP proposes a new genesis parameter called stateSyncConfirmationDelay that changes how the “to” time value is determined. 

Yup, it’s that simple. A minor tweak. 

Instead of calculating the “to” time value based on sprint length to determine the range for retrieving State Sync events from Heimdall, PIP-12 proposes a simple time-based fix for Bor. The parameter, stateSyncConfirmationDelay, defines the number of seconds subtracted from the current block’s timestamp to calculate the value of “to.” The proposed value is 128 seconds. 

The new equation, therefore, would look like this:

to = (current block timestamp) - (128 seconds)

Now, the value of “to” will remain consistent across the network – even in the case of a reorg.

If parallel dimensions arise, there’s a shared timeline.

That’s hugely important, because it enhances network stability and stamps out the BADBLOCK error. With stateSyncConfirmationDelay, the same State Sync events will always be returned from Heimdall.
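For comparison with the block-based rule, the time-based rule can be sketched as follows. This is a minimal illustration, not the code in Bor: the function name time_based_to and the constant name are hypothetical, and the example timestamp is arbitrary. Only the 128-second value and the subtraction itself come from the proposal as described above.

```python
STATE_SYNC_CONFIRMATION_DELAY = 128  # seconds, the value described for PIP-12


def time_based_to(current_block_timestamp, delay=STATE_SYNC_CONFIRMATION_DELAY):
    """Return the "to" cutoff: the current block's timestamp minus the delay."""
    return current_block_timestamp - delay


print(time_based_to(1_700_000_000))  # 1699999872
```

The function takes only the current block's own timestamp, so the cutoff no longer depends on which earlier blocks a node happens to have in its history.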

A quick note: this upgrade is not backward compatible. If the network reaches on-chain consensus, the upgraded chain becomes the canonical Polygon PoS network. 

Want to learn more? Head over to GitHub.

Tune into the Polygon Labs Blog and our social channels to keep up with updates about the Polygon ecosystem.

Together, we can build an equitable future for all through the mass adoption of Web3!
