Well, maybe “Enemy” is too strong a word, but it’s something you should probably keep away from.
Timestamps are everywhere. We use them generously and often don’t give them a second thought. They’re so natural to use that we never stop and ask ourselves: “Is it really a good idea?”
Well, it turns out that in many cases, the answer is a resounding No.
The Problem with Timestamps
The following scenario is quite similar to a real-world dilemma we had in one of the projects I’m working on:
So what do we have here?
Our system receives messages from various sources and handles them in the order they were received.
Now, as you can see in the diagram, there are three message sources, each belonging to a different system. The messages are inserted into a centralized pipe (labeled Messages to Handle), from which the Handling Service pulls and handles them.
Now, the key requirement here is this one: the messages must be handled in the order they were received.
So you’re probably saying to yourself: “OK, what’s the problem? Implement the central pipe as a Queue and call it a day! With a Queue you are guaranteed to get the messages in the order they were added!”
Well, that is correct, but in our case, we needed the messages to be handled in the order they were sent from the sources. That is, if Message #1 was sent using Message Source #1, and Message #2 was sent 0.5 seconds later using Message Source #2, then Message #1 MUST be handled before Message #2, even if Message #2 was added to the Queue before Message #1.
In other words, the handling order is determined by the time the messages were inserted into their respective Message Sources, not by the order in which they were added to the Queue.
So how can we do that?
Well, the first solution that comes to mind is, of course, timestamps.
By timestamping (is that an actual word?!) the messages, we can determine their correct order, regardless of the order in which they were added to the pipe.
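To make the idea concrete, here is a minimal Python sketch. It assumes a hypothetical `Message` type with a `sent_at` field and, crucially, that every clock involved agrees. The pipe delivers messages in arrival order, and sorting by the timestamp recovers the order in which they were sent:

```python
from dataclasses import dataclass


@dataclass
class Message:
    source: str
    body: str
    sent_at: float  # seconds; set when the message is created (hypothetical field)


# Arrival order in the central pipe: Message #2 happened to get there first.
arrived = [
    Message("Message Source #2", "Message #2", sent_at=100.5),
    Message("Message Source #1", "Message #1", sent_at=100.0),
]

# Handling by send time means sorting by the timestamp, not by arrival.
handling_order = sorted(arrived, key=lambda m: m.sent_at)

print([m.body for m in arrived])         # ['Message #2', 'Message #1']
print([m.body for m in handling_order])  # ['Message #1', 'Message #2']
```

Sorting a fixed snapshot is a simplification: a real handler also has to decide how long to wait for a message that was sent earlier but hasn’t reached the pipe yet.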
So we began to explore that path, and quite quickly hit a wall.
It all started with a rather innocent question:
Who is going to set the timestamp?
We need someone to set the timestamp of the message before it arrives at the pipe. What is the right place to do that?
So we examined our options:
1. The Message Sources
This is the easiest option. Each message source, when creating the message, will set its timestamp. Easy!
Except it’s not.
The sources are separate applications, running on different servers located in different data centers. Naturally, each has its own clock, and no two clocks show exactly the same time. If we use their clocks, the timestamps won’t reflect the actual order.
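A small illustration of the problem, using made-up clock offsets: Message #1 really is sent 0.5 seconds before Message #2, but Source #1’s clock runs 0.8 seconds fast and Source #2’s runs 0.2 seconds slow, so the timestamps tell the opposite story:

```python
# Hypothetical clock offsets (seconds) of each source relative to true time.
clock_offset = {"source-1": +0.8, "source-2": -0.2}

# True send times: Message #1 is sent 0.5 s before Message #2.
true_sends = [("source-1", "Message #1", 100.0), ("source-2", "Message #2", 100.5)]

# Each source stamps its message using its own (skewed) clock.
stamped = [(src, name, t + clock_offset[src]) for src, name, t in true_sends]

true_order = [name for _, name, _ in sorted(true_sends, key=lambda m: m[2])]
stamped_order = [name for _, name, _ in sorted(stamped, key=lambda m: m[2])]

print(true_order)     # ['Message #1', 'Message #2']
print(stamped_order)  # ['Message #2', 'Message #1']  (100.3 sorts before 100.8)
```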
2. Time Server
This is the next logical step.
Let’s add another server to our architecture, with a single purpose:
Provide the time for the message sources.
Something like this:
Yes, that’s a lot of arrows, but it’s worth it. We now have a Single Source of Truth: all sources get the time from the same clock, and all is good.
Except, again, that it’s not.
Yes, we do have a Single Source of Truth, but we also have a Single Point of Failure.
We actually created a huge dependency in our system: everything depends on a single, flimsy server.
If the Time Server stops working, for any reason, the whole system will come crashing down.
Now, the obvious solution is, of course, to scale out: let’s add another Time Server and put a Load Balancer in front of both servers, so that if one crashes, we still have the other!
Well, yes, but then we’ve come full circle back to our original problem: we now have two servers, each with its own clock, and we can no longer trust the timestamps they provide.
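Here is that effect sketched with two hypothetical Time Servers behind a round-robin load balancer. The 300 ms offset between the two servers is an illustrative assumption:

```python
import itertools

# Two hypothetical Time Servers; offsets (seconds) relative to true time.
# Server B runs 300 ms behind server A (illustrative value).
time_servers = {"A": 0.0, "B": -0.3}
balancer = itertools.cycle(time_servers)  # round-robin: "A", "B", "A", ...


def get_time(true_time: float) -> float:
    """Ask the load balancer for the time; whichever server answers uses its own clock."""
    server = next(balancer)
    return true_time + time_servers[server]


# Message #1 is really sent first; Message #2 follows 0.1 s later.
ts1 = get_time(100.0)  # answered by server A: 100.0
ts2 = get_time(100.1)  # answered by server B: about 99.8
print(ts1 < ts2)       # False: the timestamps claim Message #2 came first
```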
So this option was turned down too.
3. NTP
NTP! Of course! The famous Network Time Protocol!
After all, this is what it was created for, isn’t it? To provide the most accurate time, with 24/7 reliability!
Not familiar with NTP? Here is a short intro:
NTP is a network protocol for clock synchronization. It uses a hierarchical network of servers, each with its own designated stratum. The lower the stratum, the closer the server is to a reference clock, and the more accurate the time it provides. Stratum 0 devices are high-precision reference clocks, such as atomic clocks or GPS receivers. Stratum 1 servers are directly connected to a Stratum 0 device, and Stratum n+1 servers (n ≥ 1) synchronize over the network with Stratum n servers.
You can read a great explanation of NTP in the Wikipedia article.
So, it looks like we found our solution, doesn’t it?
Well, not so fast…
Buried inside the NTP article, there is this gem:
“NTP can usually maintain time to within tens of milliseconds over the public Internet, and can achieve better than one millisecond accuracy in local area networks under ideal conditions. Asymmetric routes and network congestion can cause errors of 100 ms or more.”
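To see where that error comes from: NTP estimates a client’s clock offset from four timestamps taken during a single request/response exchange, and the estimate assumes the outbound and return trips take equally long. The sketch below applies the standard offset and delay formulas to made-up numbers. The two clocks are perfectly in sync, but the route is asymmetric (150 ms out, 10 ms back), so NTP reports an offset equal to half the asymmetry:

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP estimates from one request/response exchange.

    t0: client sends request (client clock)
    t1: server receives request (server clock)
    t2: server sends reply (server clock)
    t3: client receives reply (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: clocks are in sync (true offset 0),
# but the outbound path takes 150 ms and the return path only 10 ms.
t0 = 0.000
t1 = t0 + 0.150  # request travels 150 ms
t2 = t1 + 0.001  # server replies 1 ms later
t3 = t2 + 0.010  # reply travels 10 ms

offset, delay = ntp_offset_and_delay(t0, t1, t2, t3)
print(f"estimated offset: {offset * 1000:.1f} ms")  # 70.0 ms, although the true offset is 0
print(f"round-trip delay: {delay * 1000:.1f} ms")   # 160.0 ms
```

The client only sees the total round-trip delay, not how it splits between the two directions, so it has no way of detecting this kind of error on its own.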
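So NTP is really great at telling you what time it is, but with heavy loads of, say, hundreds of messages per second, consecutive messages can be just a few milliseconds apart, well within NTP’s margin of error, so their timestamps can’t be trusted to order them.
At this point we understood that timestamps were not going to work, and that we needed to look elsewhere.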
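A quick back-of-the-envelope check, assuming an illustrative rate of r = 200 messages per second arriving evenly spaced, and writing ε_NTP for the clock error range quoted above:

```latex
\Delta t = \frac{1}{r} = \frac{1}{200\,\text{s}^{-1}} = 5\,\text{ms}
\;\ll\; \varepsilon_{\text{NTP}} \approx 10\text{--}100\,\text{ms}
```

With an average gap of 5 ms between messages, a clock error of even a few tens of milliseconds is enough to swap the timestamps of several consecutive messages.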
After much debate, and a lot of back-and-forth, we opted for a completely different solution, one that does not involve time.
This is what we ended up with:
Instead of timestamps, we’re using sequences. Each message gets its own sequence number, which is retrieved from a Sequence Server, which in turn gets it from a database.
Here we don’t have the Single Point of Failure problem, since scaling out the Sequence Servers is no problem: they all still talk to a single database (which, like any other database, can be distributed). And unlike the Time Servers, adding more Sequence Servers doesn’t add more sources of truth, because the numbers themselves always come from that one database.
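On the handling side, one possible way to use the sequence numbers is a small reorder buffer: hold back any message that arrives early and release messages strictly in sequence order. This is a minimal sketch under stated assumptions, not the exact setup described above: `itertools.count` stands in for the Sequence Server, `OrderedHandler` is a hypothetical name, and numbering is assumed to start at 1 with no gaps:

```python
import heapq
import itertools

# Hypothetical stand-in for the Sequence Server: hands out 1, 2, 3, ...
sequence_server = itertools.count(1)


class OrderedHandler:
    """Handles messages strictly in sequence order, buffering early arrivals.

    Assumes numbering starts at 1 and has no gaps.
    """

    def __init__(self) -> None:
        self._next_expected = 1
        self._pending = []  # min-heap of (sequence, body)
        self.handled = []

    def receive(self, sequence: int, body: str) -> None:
        heapq.heappush(self._pending, (sequence, body))
        # Release every buffered message whose turn has come.
        while self._pending and self._pending[0][0] == self._next_expected:
            _, ready = heapq.heappop(self._pending)
            self.handled.append(ready)  # "handle" the message
            self._next_expected += 1


# Message #1 is created first, so it gets the lower sequence number.
msg1 = (next(sequence_server), "Message #1")
msg2 = (next(sequence_server), "Message #2")

handler = OrderedHandler()
handler.receive(*msg2)  # reaches the pipe first: buffered, not handled yet
print(handler.handled)  # []
handler.receive(*msg1)  # sequence 1 arrives: both messages are released
print(handler.handled)  # ['Message #1', 'Message #2']
```

Real database sequences can skip values (for example, when a source fetches a number and then crashes before sending its message), so a production handler would also need a timeout or some other gap-handling policy.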
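A sketch of why scaling out is safe, using SQLite from Python’s standard library as a stand-in for the shared database. This is an assumption: a production database would typically provide a native sequence, such as PostgreSQL’s `nextval`. Two hypothetical `SequenceServer` instances each open their own connection, yet because the database assigns the numbers, interleaved requests still produce a single, strictly increasing series:

```python
import os
import sqlite3
import tempfile


class SequenceServer:
    """Hypothetical Sequence Server: each instance has its own connection,
    but every instance draws numbers from the same database table."""

    def __init__(self, db_path: str) -> None:
        # isolation_level=None puts the connection in autocommit mode,
        # so each INSERT is committed (and visible to others) immediately.
        self._conn = sqlite3.connect(db_path, isolation_level=None)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS message_seq "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT)"
        )

    def next_sequence(self) -> int:
        # The database picks the id; AUTOINCREMENT never hands out a value twice.
        cursor = self._conn.execute("INSERT INTO message_seq DEFAULT VALUES")
        return cursor.lastrowid

    def close(self) -> None:
        self._conn.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "sequences.db")
    server_a = SequenceServer(db_path)
    server_b = SequenceServer(db_path)

    # Requests alternate between the servers, as a load balancer might spread them.
    issued = [
        server_a.next_sequence(),
        server_b.next_sequence(),
        server_a.next_sequence(),
        server_b.next_sequence(),
    ]
    print(issued)  # [1, 2, 3, 4]

    server_a.close()
    server_b.close()
```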
The bottom line here is quite simple:
Do not use timestamps in distributed systems for ordering purposes.
They’re great for, well, timestamping, and you probably won’t have a problem in small systems, but for anything more complex, always prefer sequences.
Let me know what you think of it in the comments!