Leverage the Power of the GPU

While developing the procedural terrain for Hades I ran into an issue: I couldn’t create the terrain at a scale large enough to look naturally flowing without significantly slowing the rendering of the scene. Over the last three weeks I have spent a good portion of my time on the game developing algorithms, ultimately landing on a GPU alpha-masked texture solution that is at least in the ballpark of reasonable performance. The slides contain the actual data, but I will compare the performance of the implementations here.

The first implementation was Stacked Boxes. It was easy to implement with the physics engine I am using, but it was not very easy to get around. There is a balance to be struck between a Minecraft-style blocky world and a smooth one, to be sure, but I ran into significant performance issues as the number of chunks being created climbed. That led me to develop a number of culling algorithms, but even then things were not great. Rendering a chunk averaged about 0.045s, but as the number of blocks grew the physics update call began to tank the whole project. I ultimately threaded the physics library, but it did not improve things much because of the syncing between threads.

My first step into the image collision implementation was on the CPU, using a Texture2D and an array of color values. It was successful in generating the rolling terrain, but no matter how I tried I couldn’t get the generation to speed up. In debugging and profiling I tracked the issue down to two methods of Texture2D: GetData and SetData. They convert between the internal, proprietary format of the texture and an array of bytes, colors, or whatever you want, and in doing so they create a slew of objects. The conversion is slow enough that it took 23.08s to render out a 512×512 image after the looping had completed, and mind you, as the resolution climbs so does the number of steps in the loop. Another thing to keep in mind is that as the size of a region decreases, the number of regions that need to be rendered to fill the screen increases, which led to a massive upfront slowdown, at one point exceeding 30s. Also, with my goal of smooth flowing terrain, I was having a horrible time having to blow up the 512×512 textures.

I spent a couple of days optimizing my algorithm and ultimately found that masking would be quite handy, so I worked on a new implementation that only fills in the elements below the terrain height to cut down on work. It still had to use the SetData and GetData calls, but I factored out the physics engine and wrote some simpler homebrew (read: simple Newtonian) physics. Things came together and I was able to get the rendering time for a region down to 1.09s on average, but this was still way outside the 0.016s budget for a frame (roughly 60 frames per second) and led to some slowdown (once again) on the rendering step.
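To make the "only fill below the height" idea concrete, here is a minimal sketch in Python rather than the actual Texture2D code. The flat row-major buffer stands in for the array that would eventually be handed to SetData. The function name, the `heights` list, and the convention that row 0 is the top of the image are all assumptions made for illustration.

```python
def fill_below_height(heights, rows, solid=255, empty=0):
    """Build a flat, row-major mask, writing only the pixels below the surface.

    heights[x] is the row index of the terrain surface in column x.
    Assumption: row 0 is the top of the image, so "below the surface"
    means row >= heights[x].
    Returns the buffer and the number of pixels actually written.
    """
    cols = len(heights)
    # Start with every pixel empty (transparent).
    buffer = bytearray([empty]) * (cols * rows)
    written = 0
    for x, surface in enumerate(heights):
        # Clamp so a surface above or below the image is still valid.
        top = max(0, min(rows, surface))
        # Pixels above the surface are never visited at all.
        for y in range(top, rows):
            buffer[y * cols + x] = solid
            written += 1
    return buffer, written


# A 4x4 region: the loop touches 8 of the 16 pixels.
mask, touched = fill_below_height([1, 3, 0, 4], rows=4)
```

The saving depends on how much of the region is sky, and the whole buffer still has to cross to the texture in one bulk call afterwards, which fits with the rendering time staying well above the frame budget.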

Online there are a number of examples of alpha masking, but few of them appear to use it as I am, to define the terrain. The idea is simple: I have created an HLSL shader that tests the Y position of each pixel, and if it is below the height of that pixel column of the world it is rendered; otherwise it is transparent. The largest benefit is that the GPU is inherently parallel, but I had no idea how much that would help me. I was able to push the size of the texture up to 4096×4096 and still render each one in 0.028s on average. When I scale them back to 1024×1024 it takes 0.0056s on average to render a single texture, which is much more reasonable than anything else I have used, and provides the level of detail I was looking for.
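The per-pixel test can be written out as a small CPU reference in Python. This is not the HLSL shader itself, just a sketch of the same decision. The names `mask_alpha` and `shade_region`, the list-of-rows texture layout, and the convention that row 0 is the top of the image are assumptions for illustration.

```python
def mask_alpha(x, y, heights):
    """Alpha for one pixel: opaque at or below the surface, else transparent.

    Assumption: row 0 is the top, so "below" means y >= heights[x].
    The result depends only on this pixel's coordinates and its column
    height, never on any other pixel.
    """
    return 1.0 if y >= heights[x] else 0.0


def shade_region(texture, heights):
    """Apply the mask to every texel of a list of rows of (r, g, b) tuples."""
    return [
        [(r, g, b, mask_alpha(x, y, heights)) for x, (r, g, b) in enumerate(row)]
        for y, row in enumerate(texture)
    ]


# A 3-wide, 2-tall region of one flat color.
region = [[(10, 20, 30)] * 3 for _ in range(2)]
shaded = shade_region(region, heights=[0, 2, 1])
```

Because each pixel's alpha is a pure function of its own position, the pixels can be evaluated in any order or all at once, which is exactly the kind of work a GPU spreads across its cores.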

Overall the improvement has been monumental. Going from a 30s rendering time down to 0.0056s has been amazing, and has also saved me from having to find a different road to take for the game. Learning HLSL is going to take time, but I am pretty satisfied with my pixel shader. With further tweaks I can likely improve it, but I have already moved my physics library back into place and am now letting it handle physics calls again. As the project continues I hope to find a method of handling physics calls on the GPU, and I’ll try building a library for it over the summer.