More work on performance this week. Things were getting a bit too slow for my tastes, meaning that they would likely be unplayable on medium-end phones. This work involved quite a bit of refactoring (which is jargon for “creating new problems to replace your old ones”), so I now have a bunch of screenshots of things… not working the way they should. Because this is a long and technical post, I will intersperse them for comic relief.
So far, the game ran in two parallel threads of execution. One main thread that handles the input (the “UI thread”), and another that does the rendering (the “GL thread”). The game logic (physics, AI, …) was done on the GL thread, before each render operation.
The problem here is with the buffer swap. Like in most, if not all, modern games, two framebuffers are used: the front buffer is visible on the screen, and the back buffer is what we’re drawing to. When we’re done drawing, the two are swapped. This ensures that there’s always a completely rendered image on the screen.
The swap operation ensures that all OpenGL commands have been fully executed, waits for the next vertical refresh to come along, and only then shows the new image on the screen. The vertical refresh rate is usually 60 times per second, so on average we have to wait 1000 ms / 60 Hz / 2 = 8.3 milliseconds. During this time, the GL thread blocks, waiting for the video hardware. I was seeing exactly this effect: even though the actual processing took around 28 ms, the entire cycle took about 33 ms, which means a framerate of about 30 frames per second.
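The arithmetic can be sketched in a few lines of Java (a hypothetical helper, not code from the game): because a frame that misses a vsync deadline has to wait for the next one, the effective frame time is the processing time rounded up to a whole number of refresh intervals.

```java
// Hypothetical helper illustrating the vsync math described above.
public class FrameTiming {
    static final double VSYNC_MS = 1000.0 / 60.0; // ~16.7 ms per refresh

    // Effective time per frame: processing time rounded up to the
    // next whole number of refresh intervals.
    static double effectiveFrameMs(double processingMs) {
        int intervals = (int) Math.ceil(processingMs / VSYNC_MS);
        return intervals * VSYNC_MS;
    }

    static double effectiveFps(double processingMs) {
        return 1000.0 / effectiveFrameMs(processingMs);
    }

    public static void main(String[] args) {
        // 28 ms of work misses the first deadline (16.7 ms), so each frame
        // occupies two refresh intervals: ~33.3 ms, i.e. 30 fps.
        System.out.printf("28 ms of work -> %.1f fps%n", effectiveFps(28.0));
    }
}
```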
Android developer and advocate Chris Pruett explains how his game Replica Island gets around this problem. It uses a third thread, which I’ll call the “game thread”, to perform the actual game logic. While the GL thread is waiting for the buffer swap, the game thread can get busy preparing the next frame. This should keep both the CPU and the GPU working at maximum efficiency. In my particular case, I expect this to lead to a framerate increase of exactly 0.
That’s what you get when you overrun your vertex buffer and render some random data.
Zero? Yes, zero. Rendering of the next frame can start as soon as the previous buffer has been swapped to the screen. The game is still very render-heavy, and unless the rendering can be optimized to take less than 16.7 ms, it will miss the next deadline and have to wait until 33.3 ms have passed before it can swap the buffers again. Yet for slower phones, it could mean the difference between 20 fps (missing two deadlines) and 30 fps (missing just one), or between 15 fps and 20 fps; the difference between stuttering and playable. And since I was overhauling the rendering anyway, I figured that this gain would be worth a day or two of effort.
As always with multiple threads, problems crop up when they have to communicate. Each of my threads is essentially a message loop, picking up and handling messages. The UI thread sends input to the game thread. The game thread sends render requests to the render thread. The render thread sends viewport size changes to the game thread. The game thread sends game status updates to the UI thread. (And surprisingly, things do not seem to blow up.)
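A minimal sketch of such a message loop, using java.util.concurrent.BlockingQueue (class and method names here are illustrative, not the game’s actual ones): each thread owns an inbox, and other threads post messages into it.

```java
// Illustrative sketch of a per-thread message loop.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessageLoopSketch {

    // A message posted from one thread to another.
    interface Message { void handle(); }

    // Each thread owns an inbox and drains it in a loop.
    static class Loop implements Runnable {
        final BlockingQueue<Message> inbox = new ArrayBlockingQueue<>(64);

        public void post(Message m) { inbox.offer(m); }

        @Override public void run() {
            while (true) {
                try {
                    inbox.take().handle(); // blocks until a message arrives
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdown signal
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Loop gameLoop = new Loop();
        Thread gameThread = new Thread(gameLoop, "game");
        gameThread.start();

        // The UI thread posting an input event to the game thread:
        gameLoop.post(() -> System.out.println("handling touch event on " +
                Thread.currentThread().getName()));

        Thread.sleep(100);
        gameThread.interrupt(); // shut the loop down
        gameThread.join();
    }
}
```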
But by far the largest mass of inter-thread communication happens from the game thread to the render thread. Each time step, the game thread determines which objects need to be rendered, and puts these into a queue. The render thread takes this queue, and renders the objects one by one.
This would make for an interesting challenge, if it didn’t kill the framerate so badly.
But wait a second: what if, while rendering, the game thread posts updates to the objects in the queue? For example, what if my cart slams into another, and my cart is drawn in its new position, but the other one hasn’t been updated yet? The two would seem to penetrate each other. I actually implemented things this way at first, hoping that the effects would not be noticeable, but they are. It gives the really weird result of all objects in the scene (including the camera itself) stuttering, but at different times.
The solution, then, is to keep two queues. One for the game thread to write to, and another for the render thread to read from. Once the game thread is done with a full time step, the queues are swapped, so the render thread can then read from that queue.
But wait another second: what if the new queue replaces the old one while the render thread is still processing it? Should it restart? Continue in the same position in the new queue? Maybe the queue swapping should wait until the renderer is done, blocking the game thread because it has nowhere to write to?
The solution, then, must be to keep three queues. One for the game thread to write to (A), another for the renderer to read from (C), and a queue in between (B) that gets periodically swapped with the other two. Queue B always holds a complete, consistent view of the world. When the game thread is done with a frame, it swaps A and B, which puts B next in line for rendering. When the renderer is ready for another frame, it swaps B and C, and renders the new C.
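The three-queue swap could look roughly like this (an illustrative sketch, with the A/B/C roles from the description above; the synchronization is deliberately coarse):

```java
// Illustrative sketch of the triple-queue scheme described above.
import java.util.ArrayList;
import java.util.List;

public class TripleBuffer<T> {
    private List<T> writing = new ArrayList<>();   // A: game thread writes here
    private List<T> pending = new ArrayList<>();   // B: last complete frame
    private List<T> rendering = new ArrayList<>(); // C: render thread reads here

    // Game thread: the queue to fill during the current time step.
    public synchronized List<T> writeQueue() { return writing; }

    // Game thread, end of time step: swap A and B, so B now holds
    // a complete, consistent frame.
    public synchronized void publishFrame() {
        List<T> tmp = writing; writing = pending; pending = tmp;
    }

    // Render thread, start of frame: swap B and C, then render the new C.
    public synchronized List<T> acquireFrame() {
        List<T> tmp = rendering; rendering = pending; pending = tmp;
        return rendering;
    }
}
```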
That’s what you get if you assume that the radius of every object is zero: they’ll get discarded too soon.
Problem solved? Not quite; this scheme, too, is flawed. But I won’t go into that, because the scheme is also useless. Why would we ever want the game thread and the render thread to run completely independently? If the game thread does more time steps than the renderer can show, this is wasted time because the player will never see those frames. And if the renderer runs faster than the game thread, it will have to render the very same frame more than once, which is obviously also a waste of time.
The conclusion is clear: the game thread and render thread must run in lockstep. Each frame that the game thread produces will be rendered exactly once. It might seem that this negates the use of multithreading, because the threads will have to wait for each other anyway, but this is not the case: while the render thread is drawing the current frame, the game thread can be working on the next.
Using this scheme, the game now uses two queues, which get swapped at a synchronization point. The synchronization is currently done with more message passing: the game thread sends a message to the render thread when a new frame is ready, upon which the render thread starts drawing it, but not before having sent a message to the game thread to start computing the next frame. (Java synchronized methods may provide a better way to do this. It’s just a few lines to be changed, anyway.)
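For illustration, here is one way to express that lockstep in plain Java, using java.util.concurrent.Exchanger as the synchronization point instead of message passing (a sketch, not the game’s actual code): each thread fills or drains its own queue, then the two trade queues, so the game thread computes frame N+1 while the render thread draws frame N.

```java
// Illustrative lockstep sketch: two threads trading queues at a sync point.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Exchanger;

public class LockstepSketch {

    // Run `frames` lockstep iterations; returns what the render thread
    // "drew", in order. Each produced frame is rendered exactly once.
    static List<String> runFrames(int frames) {
        Exchanger<List<String>> syncPoint = new Exchanger<>();
        List<String> drawn = Collections.synchronizedList(new ArrayList<>());

        Thread game = new Thread(() -> {
            List<String> queue = new ArrayList<>();
            try {
                for (int f = 0; f < frames; f++) {
                    queue.clear();
                    queue.add("frame " + f);           // compute the time step
                    queue = syncPoint.exchange(queue); // swap queues at the sync point
                }
            } catch (InterruptedException ignored) {}
        }, "game");

        Thread render = new Thread(() -> {
            List<String> queue = new ArrayList<>();
            try {
                for (int f = 0; f < frames; f++) {
                    queue = syncPoint.exchange(queue); // wait for a full queue
                    drawn.addAll(queue); // "render" it; meanwhile the game
                                         // thread computes the next frame
                }
            } catch (InterruptedException ignored) {}
        }, "render");

        game.start(); render.start();
        try {
            game.join(); render.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return drawn;
    }

    public static void main(String[] args) {
        System.out.println(runFrames(3)); // prints: [frame 0, frame 1, frame 2]
    }
}
```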
So far, this works like a charm. I cannot yet say if it has improved the framerate any, because I’m still working on the new rendering queues. Recall that we cannot have objects update their state while rendering is taking place. So basically, instead of giving each object a render() method, I have the objects send off RenderPacket objects which contain all the information necessary to render them: transformation matrix, texture, vertices, texture coordinates… Not every object has been written with this use in mind, so it will take a few more hours tomorrow to get this done.
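A RenderPacket could be sketched as a plain immutable snapshot (the field names here are my guesses, not the game’s actual ones); the important part is copying the data, so later updates by the game thread cannot affect a frame that is already queued for rendering:

```java
// Illustrative sketch of an immutable render snapshot.
public final class RenderPacket {
    final float[] modelMatrix; // 4x4 transformation, column-major
    final int textureId;       // OpenGL texture handle
    final float[] vertices;    // x, y, z triples
    final float[] texCoords;   // u, v pairs

    public RenderPacket(float[] modelMatrix, int textureId,
                        float[] vertices, float[] texCoords) {
        // Defensive copies: the packet stays valid even if the game
        // thread mutates the original arrays afterwards.
        this.modelMatrix = modelMatrix.clone();
        this.textureId = textureId;
        this.vertices = vertices.clone();
        this.texCoords = texCoords.clone();
    }
}
```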
A possible future enhancement is to merge these packets in the renderer: putting them into one big buffer, and drawing that in one call. This will only work if they all use the same transformation matrix, so it will require that we do all vertex transformations on the CPU. Whether that is a good tradeoff remains to be seen.
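Such merging could be sketched like this, assuming for simplicity 2D vertices and a 2×2 matrix plus translation instead of the full 4×4 case (illustrative names throughout): each packet’s vertices are transformed on the CPU and appended to one big buffer, which could then be drawn with a single call.

```java
// Illustrative sketch of CPU-side batching: transform, then concatenate.
import java.util.List;

public class Batcher {
    // One draw's worth of geometry: local-space vertices plus its transform.
    static class Packet {
        final float[] verts; // x, y pairs in object space
        final float[] mat;   // {a, b, c, d}: 2x2 row-major matrix
        final float tx, ty;  // translation
        Packet(float[] verts, float[] mat, float tx, float ty) {
            this.verts = verts; this.mat = mat; this.tx = tx; this.ty = ty;
        }
    }

    // Transform every packet's vertices on the CPU and append them to one
    // big buffer; the result can be drawn in a single call.
    static float[] merge(List<Packet> packets) {
        int total = 0;
        for (Packet p : packets) total += p.verts.length;
        float[] out = new float[total];
        int o = 0;
        for (Packet p : packets) {
            for (int i = 0; i < p.verts.length; i += 2) {
                float x = p.verts[i], y = p.verts[i + 1];
                out[o++] = p.mat[0] * x + p.mat[1] * y + p.tx;
                out[o++] = p.mat[2] * x + p.mat[3] * y + p.ty;
            }
        }
        return out;
    }
}
```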