Replication is a fundamental idea in collaborative editing systems.
Sub-problem 1: Source of Truth
Offline-mode support is impossible unless we keep a local copy of the data that the client can operate on while offline.
The basic idea is that we let the server maintain the source of truth for the conversation thread and we make a copy (replica) of that conversation thread on each client.
Each client operates on its replica in response to events from the server or the user, but only the server is allowed to update the source of truth.
The clients collaborate on changes to the source of truth by sending update requests to the server and syncing the server's state into their respective replicas.
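To make these roles concrete, here is a minimal in-memory sketch. `ChatServer`, `ChatClient`, and `requestAppend` are hypothetical names, the "server" is a plain object rather than a networked service, and updates are delivered synchronously, so the sketch only illustrates who may write where: clients send update requests, the server alone mutates the thread, and every client overwrites its replica with the state the server pushes back.

```typescript
// Hypothetical message shape for illustration only.
interface Message {
  id: string;
  author: string;
  text: string;
}

type Listener = (thread: readonly Message[]) => void;

// The server holds the source of truth and is the only party that mutates it.
class ChatServer {
  private thread: Message[] = [];
  private listeners: Listener[] = [];

  // A client subscribes and immediately receives the current state.
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
    listener([...this.thread]);
  }

  // Clients ask for changes; the server applies them and broadcasts the result.
  requestAppend(message: Message): void {
    this.thread.push(message);
    for (const listener of this.listeners) listener([...this.thread]);
  }
}

// Each client keeps a replica that it overwrites with the server's state.
class ChatClient {
  replica: Message[] = [];

  constructor(readonly name: string, private readonly server: ChatServer) {
    server.subscribe((thread) => {
      this.replica = [...thread];
    });
  }

  send(id: string, text: string): void {
    this.server.requestAppend({ id, author: this.name, text });
  }
}

const server = new ChatServer();
const alice = new ChatClient("Alice", server);
const bob = new ChatClient("Bob", server);
alice.send("m1", "Hi Bob!");
bob.send("m2", "Hey Alice!");
console.log(alice.replica.map((m) => m.id)); // [ 'm1', 'm2' ]
console.log(bob.replica.map((m) => m.id)); // [ 'm1', 'm2' ]
```

Because there is no latency here, both replicas trivially agree; the later sections are about what happens once that assumption goes away.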
Does the source of truth need to exist on the server? Not necessarily. In decentralized systems, there is no single authority that determines the final state every client must converge on. Instead, all replicas can reach eventual consistency using techniques that are widely deployed in distributed systems such as massively multiplayer online games and peer-to-peer applications. It would be interesting to see how these distributed computing techniques could be applied to web applications so that our data is not owned by a centralized authority like OkCupid (the premise of the Web 3 movement).
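Conflict-free replicated data types (CRDTs) are one well-known family of such techniques, although the text above does not name a specific one. As an illustration only, the sketch below treats a conversation as a grow-only set of messages, assuming every message has a globally unique `id`, never changes once created, and carries a `sentAt` timestamp used solely to pick a deterministic display order. `MessageGSet` is a hypothetical name.

```typescript
// Hypothetical message type. `id` must be globally unique and messages are
// treated as immutable; `sentAt` is only used to pick a display order.
interface Message {
  id: string;
  sentAt: number;
  text: string;
}

// A grow-only set (G-Set) of messages keyed by id. Merge is a set union, which
// is commutative, associative and idempotent, so replicas that have received
// the same messages end up identical regardless of merge order.
class MessageGSet {
  private byId = new Map<string, Message>();

  add(message: Message): void {
    if (!this.byId.has(message.id)) this.byId.set(message.id, message);
  }

  merge(other: MessageGSet): void {
    for (const message of other.byId.values()) this.add(message);
  }

  // Every replica sorts by the same key, so equal sets render identically.
  view(): string[] {
    return [...this.byId.values()]
      .sort((a, b) => a.sentAt - b.sentAt || a.id.localeCompare(b.id))
      .map((m) => m.id);
  }
}

const alice = new MessageGSet();
const bob = new MessageGSet();
alice.add({ id: "m1", sentAt: 1, text: "Hi Bob!" });
bob.add({ id: "m2", sentAt: 2, text: "Hey Alice!" });
// The peers exchange state directly, in either order, with no server involved.
alice.merge(bob);
bob.merge(alice);
console.log(alice.view(), bob.view()); // [ 'm1', 'm2' ] [ 'm1', 'm2' ]
```

A plain G-Set cannot express edits or deletions, and timestamps from different device clocks may produce an odd (but still identical) order on every replica; richer CRDTs exist for those cases.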
But in our Web 2 world, we have a server that acts as the gatekeeper for communication between two users, as in this example.
When Alice and Bob first open their chat app, their replicas are populated from the server's source of truth via an API request. A WebSocket connection is also established between each client and the OkCupid server to stream any subsequent updates to the source of truth.
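Loading a snapshot and opening a stream at the same time raises a small ordering question: an update can arrive over the WebSocket before the API response does. One common way to handle this, sketched below, assumes the server stamps the snapshot and every update with a `version` that increases by exactly one per accepted mutation. The client buffers early updates, drops any the snapshot already includes, and flags a resync when a version is skipped. `Snapshot`, `Update`, `Replica`, and `needsResync` are hypothetical names.

```typescript
// Hypothetical payloads: a snapshot from the initial API request and
// versioned updates streamed over the WebSocket.
interface Snapshot {
  version: number;
  messageIds: string[];
}

interface Update {
  version: number; // assumed to increase by exactly 1 per accepted mutation
  messageId: string;
}

class Replica {
  needsResync = false;
  private version = -1; // -1 means no snapshot has been loaded yet
  private messageIds: string[] = [];
  private buffered: Update[] = [];

  // The socket may deliver updates before the API response arrives,
  // so hold them until the snapshot is in place.
  onUpdate(update: Update): void {
    if (this.version < 0) {
      this.buffered.push(update);
    } else {
      this.apply(update);
    }
  }

  onSnapshot(snapshot: Snapshot): void {
    this.version = snapshot.version;
    this.messageIds = [...snapshot.messageIds];
    // Replay early updates in order; apply() skips those already in the snapshot.
    this.buffered.sort((a, b) => a.version - b.version).forEach((u) => this.apply(u));
    this.buffered = [];
  }

  ids(): string[] {
    return [...this.messageIds];
  }

  private apply(update: Update): void {
    if (update.version <= this.version) return; // already reflected
    if (update.version > this.version + 1) {
      this.needsResync = true; // an update was missed; refetch the snapshot
      return;
    }
    this.messageIds.push(update.messageId);
    this.version = update.version;
  }
}

const replica = new Replica();
// v3 and v4 arrive over the socket before the API response, which is at v3.
replica.onUpdate({ version: 3, messageId: "m3" });
replica.onUpdate({ version: 4, messageId: "m4" });
replica.onSnapshot({ version: 3, messageIds: ["m1", "m2", "m3"] });
console.log(replica.ids()); // [ 'm1', 'm2', 'm3', 'm4' ]
```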
Once the replicas are populated, each user can mutate the conversation thread in a few ways:
- Send (and re-send) a message
- React to a message
- Send a read receipt
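One way to model these three mutations, using hypothetical type and field names, is a discriminated union plus a pure function that applies a mutation to a thread. Giving each message a client-generated `messageId` lets a re-send be idempotent, so it does not create a duplicate. A pure function like this could, in principle, run both on a client replica and on the server, so both sides interpret a mutation the same way.

```typescript
// Hypothetical mutation shapes for the three actions above.
type Mutation =
  | { kind: "send"; messageId: string; author: string; text: string }
  | { kind: "react"; messageId: string; user: string; emoji: string }
  | { kind: "readReceipt"; user: string; lastReadMessageId: string };

interface ThreadMessage {
  id: string;
  author: string;
  text: string;
  reactions: Record<string, string>; // user -> emoji
}

interface Thread {
  messages: ThreadMessage[];
  lastRead: Record<string, string>; // user -> id of the last message they read
}

// Returns a new thread with the mutation applied; the input is not modified.
function applyMutation(thread: Thread, m: Mutation): Thread {
  switch (m.kind) {
    case "send": {
      const { messageId, author, text } = m;
      // A re-send carries the same messageId, so it must not add a duplicate.
      if (thread.messages.some((msg) => msg.id === messageId)) return thread;
      const message: ThreadMessage = { id: messageId, author, text, reactions: {} };
      return { ...thread, messages: [...thread.messages, message] };
    }
    case "react": {
      const { messageId, user, emoji } = m;
      const messages = thread.messages.map((msg) =>
        msg.id === messageId ? { ...msg, reactions: { ...msg.reactions, [user]: emoji } } : msg,
      );
      return { ...thread, messages };
    }
    case "readReceipt":
      return { ...thread, lastRead: { ...thread.lastRead, [m.user]: m.lastReadMessageId } };
    default: {
      const unhandled: never = m; // compile error if a new kind is not handled
      return unhandled;
    }
  }
}

let thread: Thread = { messages: [], lastRead: {} };
const send: Mutation = { kind: "send", messageId: "m1", author: "Alice", text: "Hi Bob!" };
thread = applyMutation(thread, send);
thread = applyMutation(thread, send); // re-send: no duplicate
thread = applyMutation(thread, { kind: "react", messageId: "m1", user: "Bob", emoji: "👍" });
thread = applyMutation(thread, { kind: "readReceipt", user: "Bob", lastReadMessageId: "m1" });
```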
Next, we will look at how we keep the replicas in sync with the source of truth when mutations are applied.
Sub-problem 2: Consistency Maintenance
In our chat app, we have two replicas of the conversation thread, one on Alice's device and one on Bob's. We would like to keep the replicas in sync with each other: you can't really have a conversation when your replica shows a different chat history than your conversation partner's.
The replicas can become out of sync when Alice and Bob are proposing changes to the conversation thread (e.g., adding a new message to the thread or reacting to a message).
Suppose Alice wants to send Bob a message M1. She applies the change optimistically to her replica and then makes a request to the server to update the source of truth. Meanwhile, Bob is drafting a message M2 to Alice and sends it shortly after Alice sends M1.
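The optimistic part of this flow can be sketched as a replica that shows the message immediately with a `pending` status and then updates it when the server responds. The `SendFn` transport, the status names, and the `resend` method are assumptions for illustration, not a specific client's API.

```typescript
type Status = "pending" | "sent" | "failed";

interface LocalMessage {
  id: string; // client-generated, so a re-send refers to the same message
  text: string;
  status: Status;
}

// Hypothetical transport: resolves once the server accepts the message.
type SendFn = (id: string, text: string) => Promise<void>;

class OptimisticReplica {
  messages: LocalMessage[] = [];

  constructor(private readonly sendToServer: SendFn) {}

  async send(id: string, text: string): Promise<void> {
    const msg: LocalMessage = { id, text, status: "pending" };
    this.messages.push(msg); // 1. apply the change to the local replica first
    await this.deliver(msg); // 2. then ask the server to update the source of truth
  }

  async resend(id: string): Promise<void> {
    const msg = this.messages.find((m) => m.id === id);
    if (msg && msg.status === "failed") await this.deliver(msg);
  }

  private async deliver(msg: LocalMessage): Promise<void> {
    msg.status = "pending";
    try {
      await this.sendToServer(msg.id, msg.text);
      msg.status = "sent";
    } catch {
      msg.status = "failed"; // keep it visible so the user can re-send it
    }
  }
}
```

Alice sees M1 in her thread as soon as she hits send; whether it later turns `sent` or `failed` depends entirely on how the server request goes.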
In a perfect zero-latency world, Alice and Bob would receive each other's messages instantaneously, and their replicas would always be in sync.
In the real world, server and network latencies both affect the order in which mutation requests are processed and broadcast. That order determines what Alice and Bob see in their steady-state replicas once all the messages have been sent and received.
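A toy simulation shows how latency alone can leave the two replicas in different steady states. It assumes each client appends its own message optimistically at send time and appends the other person's message whenever the server's broadcast arrives; the latency figures are made up. With zero latency both replicas end up as `M1, M2`, but with 100 ms of server work and 20 ms of network delay each way, Bob's replica ends up as `M2, M1`.

```typescript
type Peer = "alice" | "bob";

interface Send {
  from: Peer;
  messageId: string;
  sentAt: number; // simulated time in ms
}

function simulate(sends: Send[], serverMs: number, networkMs: number): Record<Peer, string[]> {
  const events: { at: number; peer: Peer; messageId: string }[] = [];
  for (const s of sends) {
    const other: Peer = s.from === "alice" ? "bob" : "alice";
    // The sender applies its own message optimistically at send time.
    events.push({ at: s.sentAt, peer: s.from, messageId: s.messageId });
    // The other peer sees it after: upload + server work + broadcast.
    events.push({ at: s.sentAt + networkMs + serverMs + networkMs, peer: other, messageId: s.messageId });
  }
  // Process events in time order (Array.prototype.sort is stable since ES2019).
  events.sort((a, b) => a.at - b.at);
  const replicas: Record<Peer, string[]> = { alice: [], bob: [] };
  for (const e of events) replicas[e.peer].push(e.messageId);
  return replicas;
}

const sends: Send[] = [
  { from: "alice", messageId: "M1", sentAt: 0 },
  { from: "bob", messageId: "M2", sentAt: 50 },
];
console.log(simulate(sends, 0, 0)); // { alice: [ 'M1', 'M2' ], bob: [ 'M1', 'M2' ] }
console.log(simulate(sends, 100, 20)); // { alice: [ 'M1', 'M2' ], bob: [ 'M2', 'M1' ] }
```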
For instance, when the server receives Alice's request, it needs to do some work, which takes time. Maybe it runs some expensive checks on the incoming message for inappropriate content before it adds the message to the database (which also takes time) and broadcasts that mutation to Bob. You can implement timeouts in the server-client contract to provide some guarantee that the mutation will be processed within a given window of time, but there is still some variability in server latency.
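On the client side, one way to enforce such a contract is to race the request against a timer. `withTimeout` and `TimeoutError` are hypothetical names, and what the client does on timeout (for example, marking the message as failed so it can be re-sent) is an assumption. A client-side timeout only bounds how long the client waits: the server may still finish processing the request after the deadline, so the client cannot assume a timed-out message was rejected.

```typescript
class TimeoutError extends Error {
  constructor(readonly ms: number) {
    super(`no response within ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Settles like `work`, unless `ms` elapses first, in which case it rejects
// with TimeoutError. The timer is cleared either way.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```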