Overview
One of the most frequently asked questions we get is, in one form or another, “why is this server so laggy?” While we do our best to keep TPS (ticks per second) as high as possible, lag is still an issue, for a variety of reasons, some of them out of our control.
In this post I will highlight the major factors and explain their causes and trade-offs. The most important thing to remember is that running a server is a balancing act between three things: giving players the features and gameplay they want, having enough capacity for everyone who wants to play, and keeping performance acceptable. No optimisation-related decision can be made without impacting one of these factors; if it could, every server would run at 20 TPS with unlimited player slots.
Entities
The primary and most obvious cause of lag is a large number of entities. This includes dropped items, monsters, animals and NPCs such as villagers, and each kind of entity has its own source of lag. For example, we found pathfinding to be very intensive not only for villagers but for monsters too: in our testing, zombies targeting villagers alone used up to 15% of each tick, which is why, at the time of writing, zombie aggression towards villagers is disabled on Purity Vanilla. Entity collisions are another source of lag, although their cost can be reduced through configuration.
Because our players build huge mob farms and grinders, our server also enforces a per-chunk limit for each mob type in addition to Bukkit’s spawn limits. The goal is to let these farms work as efficiently as possible without causing too much lag, while keeping spawning outside of farms close to vanilla. However, these limits often result in fewer naturally spawning mobs, which can be irritating for players. Paper’s per-player mob spawning helps, but it still leaves us balancing farm efficiency against vanilla gameplay.
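As a rough illustration of how a per-chunk cap can work, here is a minimal Java sketch that counts live mobs per chunk and type and refuses a spawn once that chunk hits its limit. It is hypothetical: `PerChunkMobCap`, `MobType` and the limit values are invented for this example and are not Purity's plugin, Bukkit's API or Paper's configuration. A chunk spans 16×16 blocks, so block coordinates map to chunk coordinates with an arithmetic shift by 4.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mob categories, invented for this example.
enum MobType { ZOMBIE, SKELETON, CREEPER, COW }

// Hypothetical per-chunk, per-type spawn cap (not Purity's, Bukkit's or Paper's implementation).
final class PerChunkMobCap {
    private final EnumMap<MobType, Integer> limits = new EnumMap<>(MobType.class);
    // Live mob counts, keyed by packed chunk coordinates.
    private final Map<Long, EnumMap<MobType, Integer>> counts = new HashMap<>();

    PerChunkMobCap(Map<MobType, Integer> limits) {
        this.limits.putAll(limits);
    }

    // A chunk is 16x16 blocks: an arithmetic shift by 4 floors block coordinates to chunk
    // coordinates (so block -1 lands in chunk -1), then both are packed into one long.
    static long chunkKey(int blockX, int blockZ) {
        long chunkX = blockX >> 4;
        long chunkZ = blockZ >> 4;
        return (chunkX << 32) | (chunkZ & 0xFFFFFFFFL);
    }

    // Returns true and records the mob if the chunk is still under its cap for this type.
    boolean trySpawn(int blockX, int blockZ, MobType type) {
        int limit = limits.getOrDefault(type, Integer.MAX_VALUE); // no entry means uncapped
        EnumMap<MobType, Integer> perType = counts.computeIfAbsent(
                chunkKey(blockX, blockZ), k -> new EnumMap<>(MobType.class));
        int current = perType.getOrDefault(type, 0);
        if (current >= limit) {
            return false; // cap reached: refuse the spawn
        }
        perType.put(type, current + 1);
        return true;
    }

    // Call when a tracked mob dies so its slot becomes free again.
    void onDeath(int blockX, int blockZ, MobType type) {
        EnumMap<MobType, Integer> perType = counts.get(chunkKey(blockX, blockZ));
        if (perType != null) {
            // Returning null from the remapping function removes the entry when it reaches zero.
            perType.computeIfPresent(type, (t, c) -> c > 1 ? c - 1 : null);
        }
    }

    public static void main(String[] args) {
        PerChunkMobCap cap = new PerChunkMobCap(Map.of(MobType.ZOMBIE, 2));
        System.out.println(cap.trySpawn(0, 0, MobType.ZOMBIE));   // first zombie in chunk (0,0)
        System.out.println(cap.trySpawn(15, 15, MobType.ZOMBIE)); // same chunk, second zombie
        System.out.println(cap.trySpawn(8, 8, MobType.ZOMBIE));   // same chunk, over the cap
        System.out.println(cap.trySpawn(16, 0, MobType.ZOMBIE));  // chunk (1,0) has its own cap
    }
}
```

With a zombie limit of 2, a third zombie in the same chunk is refused while the neighbouring chunk can still spawn one. A real implementation would hook the server's spawn events and also decrement counts on despawn and chunk unload, not just on death.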
To demonstrate how much huge mob farms contribute to server lag, we took sampler reports during one of our temporary maps, when the world had only been running for a few hours. With 80 players online, the average tick took only 33 milliseconds (33 mspt); 20 TPS requires 50 mspt or less. Our normal map usually runs at 50-60mspt with 80 players, so this is a huge improvement. Since players are more likely to explore a fresh map, chunks were probably being loaded at least as often as usual, so the difference shows the impact mob farms have and the benefit of nerfing them.
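The relationship between TPS and mspt is simple arithmetic: the server targets 20 ticks per second, so each tick has a budget of 1000 / 20 = 50 ms, and once the average tick takes longer than that, TPS falls to roughly 1000 / mspt. A small sketch of that calculation (class and method names are illustrative):

```java
final class TickMath {
    static final double TARGET_TPS = 20.0;
    // At 20 TPS each tick has a budget of 1000 ms / 20 = 50 ms.
    static final double TICK_BUDGET_MS = 1000.0 / TARGET_TPS;

    // The server never runs faster than 20 TPS: ticks shorter than 50 ms wait for the next slot,
    // while longer ticks run back to back, giving roughly 1000 / mspt ticks per second.
    static double tpsFromMspt(double mspt) {
        if (mspt <= 0) {
            throw new IllegalArgumentException("mspt must be positive");
        }
        return Math.min(TARGET_TPS, 1000.0 / mspt);
    }

    // Fraction of the 50 ms budget used; values above 1.0 mean the server is falling behind.
    static double budgetUsed(double mspt) {
        return mspt / TICK_BUDGET_MS;
    }

    public static void main(String[] args) {
        for (double mspt : new double[] {33, 50, 55, 60}) {
            System.out.printf("%.0f mspt -> %.2f TPS, %.0f%% of budget%n",
                    mspt, tpsFromMspt(mspt), budgetUsed(mspt) * 100);
        }
    }
}
```

With the figures above, 33 mspt still gives a full 20 TPS using 66% of the budget, while 60 mspt drops to about 16.67 TPS.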
Chunks
Another contributor to overall server lag is chunk loading and generation. Paper implements asynchronous chunk loading and saving, which is a huge performance boost. However, not all chunk loads are asynchronous: “sync loads” can be triggered by a variety of events. A synchronous chunk load means the main server thread can do nothing else until the chunk has loaded, which causes lag. A very common cause of sync loads on survival servers is player teleportation such as /tpa or /home, although more recent plugins should teleport asynchronously. Even on servers like Purity with no player teleportation, sync loads can still be triggered by maps, cartographers or certain EntityPlayer methods called by the server. Brief but severe lag spikes can sometimes be attributed to these synchronous loads.
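To make the sync/async distinction concrete, the following plain-Java sketch simulates it with `CompletableFuture`; it does not use Paper's API. `loadChunkFromDisk` is a hypothetical stand-in that sleeps for 200 ms. The synchronous version blocks the caller for the whole load, while the asynchronous version hands the load to a worker thread so the main loop keeps ticking. In real plugins, Paper provides async methods such as `World#getChunkAtAsync` and `Entity#teleportAsync` for this purpose.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ChunkLoadSketch {
    // Hypothetical stand-in for reading and decoding a chunk from disk: sleeps 200 ms.
    static String loadChunkFromDisk(int x, int z) {
        sleepQuietly(200);
        return "chunk(" + x + "," + z + ")";
    }

    // Synchronous: the calling (main) thread is blocked for the whole load,
    // so no other game logic runs until it returns.
    static String loadSync(int x, int z) {
        return loadChunkFromDisk(x, z);
    }

    // Asynchronous: the load runs on a worker thread and the caller gets a future back immediately.
    static CompletableFuture<String> loadAsync(int x, int z, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> loadChunkFromDisk(x, z), io);
    }

    // Simulates the main thread: it keeps running 50 ms "ticks" until the async load finishes.
    static int ticksWhileLoading(ExecutorService io) {
        CompletableFuture<String> pending = loadAsync(0, 0, io);
        int ticks = 0;
        while (!pending.isDone()) {
            sleepQuietly(50); // stand-in for one tick of other work
            ticks++;
        }
        pending.join(); // already complete; rethrows if the load failed
        return ticks;
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            loadSync(0, 0);
            long blockedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("sync load blocked the main thread for ~" + blockedMs + " ms");
            System.out.println("ticks run during async load: " + ticksWhileLoading(io));
        } finally {
            io.shutdown();
        }
    }
}
```

The simulated main loop completes several ticks while the async load is in flight; with `loadSync`, those ticks simply would not happen until the load returned.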
Just because most chunk loading in Paper is asynchronous does not mean it causes no lag. Players flying with elytra and tridents can easily cause lag by loading chunks faster than less powerful hardware, or a shared server environment, can keep up with. This is why pregenerating chunks is recommended for survival servers with a reasonable world border. However, for anarchy servers, or survival servers with large world borders (over 50k blocks), pregenerating is impractical because of the file size of a fully generated world: Purity’s overworld is currently 1.5TB with only 8% generated.
Without pregenerating, the server is constantly generating new chunks while players are online, which puts more strain on the CPU.
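Two quick calculations show why pregeneration scales badly. The sketch below counts the chunks inside a square border and naively extrapolates storage from the figures above, assuming (loosely) that disk usage grows linearly with generated area and that the 8% refers to the area inside the border. Treating “50k blocks” as the border's side length is also an assumption; it could equally mean a radius. Class and method names are illustrative.

```java
final class PregenEstimate {
    // Number of 16x16 chunks needed to cover a square border with the given side length in blocks.
    static long chunksInSquareBorder(long sideBlocks) {
        long chunksPerSide = (sideBlocks + 15) / 16; // round up to whole chunks
        return chunksPerSide * chunksPerSide;
    }

    // Naive linear extrapolation (assumption): if usedTb covers fractionGenerated of the world,
    // a full generation needs about usedTb / fractionGenerated.
    static double fullSizeTb(double usedTb, double fractionGenerated) {
        if (fractionGenerated <= 0 || fractionGenerated > 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return usedTb / fractionGenerated;
    }

    public static void main(String[] args) {
        System.out.println("chunks in a 50,000-block square border: " + chunksInSquareBorder(50_000));
        System.out.printf("estimated full world size: %.2f TB%n", fullSizeTb(1.5, 0.08));
    }
}
```

Under those assumptions, a 50,000-block square border holds 3,125 × 3,125 = 9,765,625 chunks, and 1.5TB at 8% extrapolates to roughly 18.75TB for the full world.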
Lack of Threading
Most of the Minecraft server’s work runs on a single thread. This means that for a single-instance server such as Purity Vanilla, no matter how many threads our CPU has, the Minecraft server can’t take advantage of them all. There are many reasons why Minecraft isn’t multithreaded in a way that uses more CPU threads, and going back and reworking the server’s code to do so would be a huge undertaking even for Mojang. As it stands, our server runs on the Ryzen 9 3950X, one of the most powerful CPUs for single-threaded performance. We have used an i9-9900K in the past, but we have found that third-generation Ryzen gives us better performance. This is why, no matter how many donations the server receives, we cannot just “buy a better server”. Using a 3950X dedicated server for a single Minecraft server is already overkill.
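Amdahl’s law is a standard way to see why extra cores barely help a mostly single-threaded program: if only a fraction p of the work can run in parallel, the speedup on n cores is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). The value p = 0.2 below is purely hypothetical, not a measurement of Minecraft or Paper.

```java
final class AmdahlSketch {
    // Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work
    // that can run in parallel and n is the number of cores.
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0 || parallelFraction > 1 || cores < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and cores >= 1");
        }
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    public static void main(String[] args) {
        double p = 0.2; // hypothetical: assume 20% of the work can be spread across threads
        for (int cores : new int[] {1, 4, 16, 32}) {
            System.out.printf("%2d cores -> %.2fx speedup%n", cores, speedup(p, cores));
        }
    }
}
```

With p = 0.2, 32 cores give only about a 1.24x speedup, against a ceiling of 1.25x however many cores are added; single-threaded speed, not core count, dominates.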
As for why minigame networks like Hypixel can have thousands of simultaneous players: they don’t just run a single Minecraft server and fill it with players. They use a proxy like BungeeCord and a load balancer to distribute players among any number of backend servers. Even without any entities or chunk loading, Hypixel’s 100+ player lobbies can still lag, which is why each minigame lobby runs as its own server instance handling only 20-40 players.
This kind of load-balancing is impossible for a survival server like Purity since all of our players are playing in the same world at all times.
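The core decision a load balancer makes can be sketched as a least-connections picker: send each joining player to the emptiest backend that still has room. This is one common strategy, shown as a hypothetical Java sketch (`Backend`, `LobbyBalancer` and the lobby names are invented); it is not a description of how Hypixel or BungeeCord actually route players. It uses a record, so it needs Java 16 or newer.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of a backend server as a proxy might track it.
record Backend(String name, int players, int capacity) {
    boolean hasRoom() {
        return players < capacity;
    }
}

final class LobbyBalancer {
    // Least-connections: the emptiest backend that is not full, or empty if every backend is full.
    static Optional<Backend> pick(List<Backend> backends) {
        return backends.stream()
                .filter(Backend::hasRoom)
                .min(Comparator.comparingInt(Backend::players));
    }

    public static void main(String[] args) {
        List<Backend> lobbies = List.of(
                new Backend("lobby-1", 38, 40),
                new Backend("lobby-2", 12, 40),
                new Backend("lobby-3", 40, 40));
        System.out.println(pick(lobbies).map(Backend::name).orElse("all lobbies full"));
    }
}
```

With the sample lobbies in `main`, the picker chooses `lobby-2`, and it returns an empty `Optional` when every lobby is full.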
Hardware Limitations
As I mentioned above, the number of players your Minecraft server can handle is heavily bound by your CPU and its single-threaded performance. Many shared hosts advertise their use of “enterprise” hardware for the utmost performance; however, this typically means Intel Xeon CPUs, which are not designed for single-threaded workloads. Even hosts that use Intel i9s or high-end Ryzen CPUs will often run more servers on each node than is optimal. This is known as overallocation and should be avoided if at all possible.
Purity runs on the most powerful hardware available because it is within our budget. At the lower end, you need to be much more conscious of what hardware you are paying for. Our bottleneck is instead the Minecraft server software itself.
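Overallocation can be reasoned about with a deliberately simplified rule of thumb: a busy Minecraft server’s main thread can keep about one core occupied by itself, so once more servers are busy at the same time than a node has physical cores, their main threads compete for CPU time. The sketch below encodes only that rule; it ignores GC, networking and other background threads (which make contention worse), and the names and node size are illustrative.

```java
final class NodeAllocation {
    // Each busy server's main thread wants roughly one physical core to itself.
    static double mainThreadsPerCore(int busyServers, int physicalCores) {
        if (physicalCores < 1) {
            throw new IllegalArgumentException("physicalCores must be >= 1");
        }
        return (double) busyServers / physicalCores;
    }

    // More than one busy main thread per physical core means they must share cores.
    static boolean isOverallocated(int busyServers, int physicalCores) {
        return mainThreadsPerCore(busyServers, physicalCores) > 1.0;
    }

    public static void main(String[] args) {
        int cores = 16; // hypothetical host node
        for (int busy : new int[] {8, 16, 24}) {
            System.out.printf("%d busy servers on %d cores: %.2f per core, overallocated=%b%n",
                    busy, cores, mainThreadsPerCore(busy, cores), isOverallocated(busy, cores));
        }
    }
}
```

For example, 24 busy servers on a hypothetical 16-core node give 1.5 main threads per core, so that node is overallocated by this measure.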
Garbage Collection
In Java, the programmer does not need to explicitly free objects that are no longer used. Instead, the JVM performs “garbage collection”, during which the program’s heap is scanned for unused objects that can be deleted to free up memory. On recent Java versions, the default garbage collector is G1GC. This process can cause lag spikes if misconfigured, so if you are using G1GC, make sure to use Aikar’s Flags. On Purity, we use ZGC, a garbage collector designed for the lowest possible pause times. (Your mileage will vary with ZGC; I may write a post about our setup and how it works for us at some point.)
Either way, garbage collection adds work for the CPU, which can cause lag spikes while it is happening: with G1 this shows up as GC pauses, with ZGC as higher CPU utilisation. Appropriate memory allocation and heap size are important; don’t just allocate as much memory as possible.
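The collector and heap a JVM actually ended up with can be checked from inside the process with the standard `java.lang.management` and `Runtime` APIs. The sketch below is generic Java rather than a Purity or Paper tool (the class name is illustrative); running it with `-XX:+UseG1GC` or `-XX:+UseZGC` and different `-Xmx` values shows how the reported collectors and heap limits change. Collector bean names depend on the collector and JDK version (G1, for example, reports “G1 Young Generation”), and how the reported collection time relates to real pause time differs between collectors. For reference, Aikar’s Flags set `-Xms` and `-Xmx` to the same value.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class JvmMemoryReport {
    static final long MIB = 1024L * 1024L;

    // One line per collector the running JVM reports, with cumulative counts and times.
    // getCollectionCount()/getCollectionTime() return -1 when the JVM does not track them.
    static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    // Heap in use: memory the JVM has committed minus the free part of it.
    static long usedHeapBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.print(gcSummary());
        // maxMemory() roughly reflects -Xmx (or the JVM's default when -Xmx is not set).
        System.out.printf("max heap: %d MiB%n", rt.maxMemory() / MIB);
        System.out.printf("committed heap: %d MiB%n", rt.totalMemory() / MIB);
        System.out.printf("used heap: %d MiB%n", usedHeapBytes(rt) / MIB);
    }
}
```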
Summary
There are many other reasons why a server can seem laggy to a player beyond those highlighted above. This post was written to explain the main sources of lag on Purity Vanilla, although they likely apply to other survival servers too. Every properly configured server has its own limitations and lag-reducing changes, which players will need to adapt to.
A lazily configured or misconfigured server will lag no matter what. Read pre-existing optimisation guides for a good baseline, but constant tweaking is necessary to meet your own server’s needs. Purity’s optimisation has been an ongoing process for the entire lifetime of the server and will likely never be completed.