[u][h1]Welcome to ‘Foundry Fridays’[/h1][/u]
Hello! My name is Yog (Cheerio) and I have the honor of writing the first ‘Foundry Fridays’ dev diary, a place for different members of the Foundry team to share what they’ve been working on. Each post will dive into a different area of the game, be that art, design, UX, etc. As I tend to do a lot of technical work on Foundry, today’s post is going to be fairly technical. Let me know on our Discord if you like these types of posts and I’ll try to do more of them in the future.

For today’s post, we’re going to talk about one of the systems I’ve been working on to help optimize our CPU usage.

[u][h2]What problem are we trying to solve?[/h2][/u]
[img]https://clan.cloudflare.steamstatic.com/images//38913947/bc70dbef2e554803896a8d20d7d358bf446747cd.png[/img]
Foundry is an online, deterministic, procedurally generated, voxel-based, infinite world where players build massive factories that grow to hundreds of thousands of objects, all of which need to be rendered to the screen. Basically a full bingo card of game development challenges.

One of the big costs in rendering so many objects is gathering everything that needs to be rendered, determining which objects are visible, grouping similar objects together into batches, and then sending all of that data to the GPU. Since Foundry is made in Unity, most of this work is done on the CPU. On top of that, most of it happens on the same thread as other game systems such as physics, particles, animation, audio, factory simulation, UI, etc.

That being said, if you’re willing to get your hands dirty, Unity gives you the tools to move a lot of this work onto the GPU using what we’re going to call Compute Based Rendering.
[u][h2]Compute Based Rendering[/h2][/u]
[img]https://clan.cloudflare.steamstatic.com/images//38913947/e81fc1da29fa3f4969cf95a74651a33b203eb0f4.png[/img]
Compute based rendering allows us to move a lot of the rendering work that would typically be done on the CPU over to the GPU. Not only does this free up our CPU, it turns out this kind of massively parallelizable work is exactly what GPUs are really good at, meaning we can draw even more objects than before.

[u][h2]Baby Steps[/h2][/u]
Before I joined Foundry, I did a lot of rendering and optimization work on games like Oxygen Not Included and Dead Rising, but this was going to be my first time working with compute shaders, so I wanted to start with something fairly simple to prove that this idea would actually work.

My first step on my compute rendering journey was to take one of the awesome new grass tufts made by our new art director (Jason) and make myself a test scene in Unity. Next, I made it so that every frame, all objects that get streamed in have their position and material info uploaded to one big buffer on the GPU. That, combined with learning how to use Unity’s Graphics.DrawMeshInstancedIndirect API, was enough for me to get something on screen and see if this experiment was going to be worth it.

[u][h2]First Results[/h2][/u]
[img]https://clan.cloudflare.steamstatic.com/images//38913947/c66906b579ad691daa0074767105b61a21b5bdb8.png[/img]
My first test was to compare the performance of drawing 250,000 patches of grass using traditional GameObjects against my new compute based rendering solution. As you can see in the image above, my test scene’s framerate went from 7fps to 197fps. So far, this is looking very promising.

What’s even more exciting is that even after adding things like visibility tests (frustum culling), the GPU cost of iterating through all the patches of grass and copying the visible ones to a separate compute buffer for rendering is under 0.05ms, which is incredibly cheap.
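For anyone curious what that visibility pass boils down to, here is a minimal sketch of the idea. To be clear, this is not Foundry's actual code: the real thing runs in a compute shader on the GPU, while this is a plain CPU version written in C++ so the logic is easy to follow. Every name in it (Instance, Plane, cullAndCompact, makeBoxFrustum) is made up for illustration, and it assumes each object is tested as a bounding sphere, which the real system may or may not do.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not taken from Foundry's code.
struct Vec3 { float x, y, z; };

// Plane stored as dot(normal, p) + d = 0, with the normal pointing into the frustum.
struct Plane { Vec3 normal; float d; };

// One entry of the "big buffer": where the object is and what it looks like.
struct Instance {
    Vec3 position;           // world-space centre of the object
    float radius;            // bounding-sphere radius (an assumed bound type)
    std::uint32_t material;  // index into some material table
};

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.normal.x * v.x + p.normal.y * v.y + p.normal.z * v.z + p.d;
}

// A sphere is rejected as soon as it lies completely behind any one plane.
inline bool sphereInFrustum(const std::array<Plane, 6>& frustum, const Instance& inst) {
    assert(inst.radius >= 0.0f);
    for (const Plane& p : frustum) {
        if (signedDistance(p, inst.position) < -inst.radius) {
            return false;
        }
    }
    return true;
}

// Copies the visible instances into a second, tightly packed buffer.
// On a GPU, each thread would test one instance and append it using an
// atomic counter; this loop is the single-threaded equivalent.
inline std::vector<Instance> cullAndCompact(const std::vector<Instance>& all,
                                            const std::array<Plane, 6>& frustum) {
    std::vector<Instance> visible;
    visible.reserve(all.size());
    for (const Instance& inst : all) {
        if (sphereInFrustum(frustum, inst)) {
            visible.push_back(inst);
        }
    }
    return visible;
}

// Demo helper: a cube-shaped "frustum" centred on the origin. A real camera
// frustum would be derived from the view-projection matrix instead.
inline std::array<Plane, 6> makeBoxFrustum(float halfExtent) {
    std::array<Plane, 6> planes = {{
        Plane{{ 1.0f,  0.0f,  0.0f}, halfExtent},
        Plane{{-1.0f,  0.0f,  0.0f}, halfExtent},
        Plane{{ 0.0f,  1.0f,  0.0f}, halfExtent},
        Plane{{ 0.0f, -1.0f,  0.0f}, halfExtent},
        Plane{{ 0.0f,  0.0f,  1.0f}, halfExtent},
        Plane{{ 0.0f,  0.0f, -1.0f}, halfExtent},
    }};
    return planes;
}
```

Calling cullAndCompact(allInstances, makeBoxFrustum(10.0f)) returns only the instances whose spheres touch the box. In a Unity setup like the one described above, the size of that packed list would presumably become the instance count in the arguments buffer handed to Graphics.DrawMeshInstancedIndirect, so the CPU never needs to know how many objects survived culling.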
At this point I was quite optimistic; however, it was now time to try out something a little more complicated…

[u][h2]Not the Flop(s) I was looking for[/h2][/u]
[img]https://clan.cloudflare.steamstatic.com/images//38913947/c13f2ad78f049254721f43c2a2be1edd32f59211.png[/img]
For my next test, I wanted to try rendering an object with a hierarchy (an object made up of multiple parts). The new pillars in Foundry are made up of a base model, a middle piece and a top piece. It took a little bit of work to handle uploading the hierarchies to the GPU, but once that was taken care of, I was ready to gaze upon my vast sea of super optimized pillars, only to be greeted by a number that brought sadness to my heart: 6fps.

Even though this was six times faster than Unity’s default rendering, I was hoping I would at least hit a smooth 30fps with the new system. I spent some time profiling both my code and Unity’s rendering, and after a little digging I finally found the issue: the model was just a lot more detailed than the piece of grass I used in my previous test.

I then went and purchased an LOD generator from the Asset Store (PolyFew), generated some LODs and went to work adding LOD support to my rendering system. Once this was all working, it was time for another test. I crossed my fingers and was met with a much better number, which brought joy to my heart:
[img]https://clan.cloudflare.steamstatic.com/images//38913947/14e1098c0d25f4b1d084855bd48cd30397cb6eb6.png[/img]
227fps! Basically the entire problem was that the GPU was struggling to draw so many vertices. By adding LOD support to my system, we can now reduce both the CPU and GPU cost of drawing so many objects.

And at last, we have finally come to the grand finale for today’s post:

[u][h2]Conveyors @ 160fps[/h2][/u]
[img]https://clan.cloudflare.steamstatic.com/images//38913947/15f4495f70fb75bd1bed2123b7a5f372d6cc2c99.png[/img]
The final test I want to talk about today is the conveyor test.
The conveyor is the most common object players will place in Foundry. In a large base, a player can easily place tens of thousands of them, so it’s quite important that they be rendered as efficiently as possible. I didn’t actually need to add any new features to my system for this one, so without further ado, here are the results we’ve all been waiting for:
[img]https://clan.cloudflare.steamstatic.com/images//38913947/ca6dcdd7f6e2cf2771f8534442c7161fd865be9f.png[/img]
After all that, 250,000 conveyors went from rendering at 3fps to 164fps, which I am very happy with. The final framerate is lower than what the grass test reached, but that mostly comes down to an individual conveyor piece being more expensive to draw than a single piece of grass. Knowing this, there are a few more steps we can take to further improve conveyor rendering, such as optimizing the shader and mesh.

[u][h2]Final Thoughts[/h2][/u]
So far it looks like spending time investigating compute based rendering is going to be a win. There’s still more work to be done to support some of the more complex buildings, such as the animated ones, but every time we convert one of our machines to use this new rendering system, we should see a similar jump in performance. There’s also a lot of room for performance improvements to the system itself, such as occlusion culling, tight shadow frustum culling and better sorting, all of which should help us see an even bigger jump in performance.

[u][h2]Till Next Time![/h2][/u]
[img]https://clan.cloudflare.steamstatic.com/images//38913947/1f937df7aa5331173790c788919044f2baef56b6.png[/img]
I hope you enjoyed our first Foundry Friday. We will be posting a new one every second Friday up until early access. If you have any questions, please hit us up on [url=https://discord.gg/MB9YRh4B]our Discord[/url].

[url=https://i.imgur.com/yE16siQ.mp4][img]https://i.imgur.com/x1hwp4x.gif[/img][/url]
Thanks for listening!