One of the most important – but often unfamiliar – aspects of archviz in Unreal is making your scenes run fast and smoothly on the target computer(s).
You can create the most beautiful archviz scenes out there, but if you present them to your clients in a slow and stuttering way, you might run into serious problems, because the last thing you want is a poor user experience.
So what will you get from this article?
I will introduce 9 optimization techniques that you can use right away in your archviz projects. By implementing them, you might be able to significantly improve the performance of your scenes and spare everyone a poor user experience.
There are a lot of technical tips and tricks you can use to improve the overall performance of your archviz scenes. But first, let’s start with the basics.
WHAT IS PERFORMANCE OPTIMIZATION?
Before answering this question, first we have to understand what a render pipeline (or a graphics pipeline) is.
THE RENDER PIPELINE
The render pipeline is the process of displaying visual content: we render 3d models, (material) shaders, lights, effects, etc. onto a 2d screen. It consists of many stages that follow each other, so the data flow is linear (kinda).
Think of the render pipeline as a car assembly line: at each stage, a new component is added to the automobile (the frame). The further it moves down the line, the closer it gets to its finished state.
In reality, this process is more complex, but for the sake of understanding, we have to simplify it. On top of that, the data flow is both linear and parallel, with stages that overlap each other.
Basically, it’s similar to offline renderers such as V-Ray and Corona, but instead of hours, the render time of a single frame (an image) is measured in ms (milliseconds). So if you want to run your archviz scenes at 60 fps, each frame has to be rendered in about 16.67 ms (1/60 of a second).
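The budget per frame is simply 1000 ms divided by the target framerate. Here is a minimal Python sketch of that arithmetic (Python is used only for illustration; it has nothing to do with Unreal’s API). The 90 fps value is the VR target mentioned later in this article.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available to render a single frame at a given framerate."""
    return 1000.0 / target_fps


for fps in (30, 60, 90):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
# 30 fps -> 33.33 ms per frame
# 60 fps -> 16.67 ms per frame
# 90 fps -> 11.11 ms per frame
```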
The bottom line is that the final content of each frame is the result of many calculations.
HOW THE CPU AND GPU CONTRIBUTE TO IT ALL
So the render process has a linear and parallel nature.
First, the CPU prepares all the data that the GPU needs in order to render each pixel in a frame. The CPU’s request telling the GPU to draw something is called a draw call. After the draw call, the GPU starts rendering the pixels on the screen.
The linear, parallel and overlapping ‘nature’ of the render pipeline
A draw call is a set of commands: it tells the GPU what to render and how. For each frame, the number of draw calls is mainly determined by your scene’s object count and material count.
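As a rough mental model, you can think of every visible mesh issuing one draw call per material slot. The Python sketch below encodes only that simplified assumption; the `MeshActor` type and the example scene are hypothetical, and Unreal’s real count is higher because meshes are also drawn in other passes (shadow depths, for example).

```python
from dataclasses import dataclass


@dataclass
class MeshActor:
    """Hypothetical stand-in for a static mesh placed in the level."""
    name: str
    material_slots: int    # number of material IDs on the mesh
    visible: bool = True   # False if culled for the current camera view


def estimate_draw_calls(actors: list[MeshActor]) -> int:
    """Simplified estimate: one draw call per material slot of each visible mesh."""
    return sum(actor.material_slots for actor in actors if actor.visible)


scene = [
    MeshActor("sofa", material_slots=3),
    MeshActor("dining_table", material_slots=2),
    MeshActor("hallway_lamp", material_slots=2, visible=False),  # occluded
]
print(estimate_draw_calls(scene))  # 5
```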
So let’s answer our question:
WHAT IS PERFORMANCE OPTIMIZATION?
It’s the sum of the various tasks we do to make the render pipeline faster. In other words, we apply different techniques that reduce the calculation time of many stages in the render pipeline.
Performance optimization is a very complex task, as the render pipeline consists of many, many parameters that affect each other. Basically, it’s a never-ending process for every archviz scene because there’s always something that we can further optimize. The trick here is to know when to stop.
This is why I call performance optimization a technical art: how each stage works is well-defined know-how, but tweaking the parameters and balancing out the stages in order to hit the target framerate is more of an art.
THIS ONE IS A PRACTICAL ARTICLE WITH (ALMOST) NO THEORY
I won’t go deep into the technical details, for several reasons:
- As I mentioned above, the render pipeline is pretty complex, and showing you how it works would take days (no kidding).
- You can achieve decent results by implementing these techniques without understanding the science behind them. My main goal here is to show you tricks that you can use instantly, so you can get results within hours.
- Although I spent 100+ hours studying this topic, I’m still just a 3d artist, not a technical guru. In 2018, for two of my projects, I had to make high-quality, highly interactive archviz scenes run fast and smoothly on VR devices such as the Oculus Rift and HTC Vive. That’s why I had to learn about performance optimization.
In this article, I will also share the results of two of my projects from 2018:
First, the interactive apartment project (I will refer to it as the Spanish project) with 4 bedrooms and lots of accessories. The challenge here was to keep the object and material counts low enough to get the draw calls under control.
One screenshot of the Spanish project
And second, the 1900s kitchen project (aka the Kitchen), which was quite a small scene, but to deliver high visual quality, I had to use tons of high-poly objects. So optimizing the triangle count was quite a challenge here.
The Kitchen project (in collaboration with NEMA FX)
The VR version of this archviz scene is exhibited at the M9 Museum in Venice, Italy
Let’s move on, and deal with another important question:
WHY IS PERFORMANCE OPTIMIZATION SO IMPORTANT?
Well, I guess by now you have a pretty good idea of the answer, but let’s see why I consider it so important:
- Clients always want a flawless and smooth user experience for their projects. A slow and stuttering archviz scene just kills the user experience and leaves the clients unhappy and disappointed.
- For VR scenes (using the Oculus Rift or HTC Vive), besides a smooth user experience, you have to ensure a safe VR experience as well. You have to prevent VR sickness from occurring, which is no joke: I had VR sickness too and it was quite unpleasant.
- Sometimes it’s a requirement to make your archviz scenes run fast and smoothly on mid-level computers as well. So in this case, you must take hardware limitations into account when optimizing performance.
Now let’s see another important area, before I introduce the 9 techniques to you:
THE RIGHT MINDSET TO APPLY
You should always keep three things in mind during performance optimization:
HAVE A SPECIFIC GOAL IN MIND
First, you have to know the requirements before you can start optimizing performance:
- Do your clients want the desktop version only, or do they want a VR experience as well? It’s an important question because it defines the target framerate: for desktop scenes 60 fps is enough, but for VR you must go for 90 fps to minimize the risk of VR sickness.
- On what computers do your clients plan to present your scenes? It defines the hardware limitations: if they run your archviz scenes on mid-level computers, you will definitely have to pay more attention to (even heavier) performance optimization.
- How interactive should your scenes be? How many movable objects and lights do you plan to include in your scenes? The more dynamic your scenes are, the heavier your calculations will be for certain stages of the render pipeline. In this case, you have to pay attention to the dynamic light setup.
- In general, archviz scenes tend to have high visual quality. They are a bit more sophisticated than games in terms of details, lights and shadows. During optimization, maintaining high visual quality should always be your priority as long as it’s needed and/or possible. If your final scenes will run on mid-level computers only, you will have to compromise on visual quality for the sake of performance.
So the answers for all these questions will determine the target framerate that you need for your archviz projects.
ALWAYS IMPLEMENT ONE TECHNIQUE AT A TIME
Start by implementing only one technique, then monitor how it affects performance. Continue with the next one, and keep monitoring the results. Once you have reached the target framerate, there’s no point in continuing to optimize.
You have just arrived ‘there’: this is where you should stop this whole process.
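The one-technique-at-a-time workflow can be sketched as a simple loop. This is a hypothetical Python illustration: `measure_fps` stands in for you reading the framerate in the editor, and the technique names and framerate gains are made up.

```python
from typing import Callable


def optimize_until_target(
    techniques: list[tuple[str, Callable[[], None]]],
    measure_fps: Callable[[], float],
    target_fps: float,
) -> list[str]:
    """Apply one technique at a time, re-measure after each one,
    and stop as soon as the target framerate is reached."""
    applied: list[str] = []
    if measure_fps() >= target_fps:
        return applied  # already fast enough: nothing to do
    for name, apply in techniques:
        apply()
        applied.append(name)
        if measure_fps() >= target_fps:
            break  # you have arrived 'there': stop optimizing
    return applied


# Hypothetical scene running at 48 fps; each technique adds some frames.
state = {"fps": 48.0}


def gain(amount: float) -> Callable[[], None]:
    def apply() -> None:
        state["fps"] += amount
    return apply


steps = [("reduce object count", gain(6.0)),
         ("use LODs", gain(8.0)),
         ("use fewer materials", gain(5.0))]
print(optimize_until_target(steps, lambda: state["fps"], 60.0))
# ['reduce object count', 'use LODs']
```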
ARCHVIZ SCENES ARE DIFFERENT FROM GAMES
Archviz scenes are smaller and less complex than games, so you should have a slightly different approach. In general, less optimization is needed for archviz scenes, and sometimes you should apply techniques differently compared to games (for example, for lightmaps).
The 9 techniques
Okay, let’s see what exact techniques you should implement to make your archviz scenes run fast and smoothly.
1 – UPDATE TO THE LATEST VERSION OF YOUR DRIVER
I know, updating your graphics card driver to the latest version sounds weird as a technique, right? But let me explain, because it’s important.
As I already mentioned, during frame rendering, the CPU keeps issuing draw calls to the GPU. The graphics card driver’s task is to check and translate these requests for the GPU. In this process, the input data runs through the driver’s code. After the output has been generated, the GPU starts rendering pixels on the screen.
This task takes some milliseconds. If the driver’s code is not well optimized, these calculations might take more time than they should.
This is why it’s important to always update to the latest version of your graphics card driver: newer versions are often better optimized than older ones, which makes this translation process a bit faster.
But be careful: sometimes the latest versions contain bugs that might cause issues on your computer. A good approach might be to wait a week to see if others have any problems with the latest drivers (personally, I don’t wait; I always update as soon as a new version is released).
Let’s see the first ‘real’ technique then!
2 – REDUCE YOUR OBJECT COUNT
Why is it necessary?
Well, to reduce the number of draw calls, and having fewer objects in your scenes is one way to do so.
Because Unreal applies occlusion culling on the CPU’s draw thread, the number of draw calls is a dynamic value that depends on the current camera view (which assets are actually rendered in each frame). This goes for the object count as well, but there is also an overall object count for the entire scene.
What are the best practices for the object count?
- My experience is that it’s possible to create archviz scenes that stay under an overall object count of 600. It’s a good number even for more complex scenes.
- I don’t recommend having more objects, but if you have very complex scenes, you might want to go up to 1,000. In that case, make sure your CPU is powerful enough and you have a sufficient amount of RAM installed for handling the lightmaps (because lightmaps are textures too).
- The object count for the Spanish project was 610, while for the Kitchen it was only 243.
What are the best practices for draw calls?
- A general rule of thumb is to keep the number below 2,000 on average: if you have a weak CPU, it should be more like 500 – 1,000, while for fast CPUs 1,500 – 2,000 is okay.
- If you have complex scenes that run on a powerful computer, you can go up to 3,000. But don’t go any further because you’ll start experiencing significant framerate drops.
- But it’s always a good idea to take the time to minimize draw calls. You never know what further requests you’ll get from your clients that might need further optimization (for example, they might also need a VR version, which means 60 fps is not enough: you have to go for 90 fps). Being prepared is a good strategy.
- I had around 1,500 draw calls on average for the Spanish project, but in certain areas it went up to 3,000. The Kitchen project had 1,200 draw calls on average.
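The rules of thumb above can be written down as a small check. This Python sketch hard-codes the thresholds from the list above; they are experience-based guidelines, not limits defined by Unreal, and the function name is hypothetical.

```python
def draw_call_verdict(avg_draw_calls: int, fast_cpu: bool) -> str:
    """Compare an average draw call count with the rule-of-thumb ranges."""
    comfortable = 2000 if fast_cpu else 1000  # 1,500-2,000 vs 500-1,000
    if avg_draw_calls <= comfortable:
        return "ok"
    if fast_cpu and avg_draw_calls <= 3000:
        return "risky: only on a powerful machine"
    return "too many: expect framerate drops"


print(draw_call_verdict(1500, fast_cpu=True))   # ok
print(draw_call_verdict(1500, fast_cpu=False))  # too many: expect framerate drops
print(draw_call_verdict(2800, fast_cpu=True))   # risky: only on a powerful machine
```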
How to monitor draw calls and object count?
- Type stat rhi in the console. In the list, ‘DrawPrimitive calls’ refers to the amount of draw calls;
- Window – Statistics – Primitive Stats: the overall object count can be found under the ‘Count’ column.
The stat rhi console command
You can check a lot of information here
The Statistics window with the Primitive Stats
You can switch to other statistics using the dropdown list at the upper left corner
Okay, now that you’re familiar with draw calls and object count, let’s move on to another huge topic in terms of performance optimization, which is:
3 – OPTIMIZE YOUR TRIANGLE COUNT
Why is it necessary?
There are two main reasons. Triangle count plays an important role in two GPU calculation stages: the vertex shader and rasterization.
The vertex shader is the stage where the GPU transforms each vertex from its local (object) space into screen space. The more vertices the GPU has to process, the more render time it will take overall. But for modern graphics cards, processing vertices is not a big deal.
During rasterization, after the GPU has assembled the vertices into triangles, every triangle is drawn on the screen. Pixels are shaded in 2x2 pixel quads: even if a triangle covers only 1 pixel of a quad, the GPU still calculates all 4 pixels (this is just how it works). The smaller the triangles are (in other words, the more high-poly the object is), the more often the GPU has to shade the same quads again for different triangles, which is a bad thing.
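To illustrate what converting into screen space means, here is a minimal Python sketch of the classic steps: multiply the vertex by a combined model-view-projection matrix, divide by w (the perspective divide), then map the result to pixel coordinates. The toy matrix below just copies z into w; it is an assumption for illustration, not Unreal’s actual projection matrix, which follows its own conventions.

```python
Matrix4 = list[list[float]]


def transform(m: Matrix4, v: tuple[float, float, float, float]) -> tuple[float, ...]:
    """Multiply a 4x4 matrix by a homogeneous 4-component vector."""
    return tuple(sum(m[row][col] * v[col] for col in range(4)) for row in range(4))


def to_screen(vertex: tuple[float, float, float], mvp: Matrix4,
              width: int, height: int) -> tuple[float, float]:
    x, y, z = vertex
    cx, cy, _cz, cw = transform(mvp, (x, y, z, 1.0))  # local -> clip space
    ndc_x, ndc_y = cx / cw, cy / cw                   # perspective divide
    sx = (ndc_x + 1.0) * 0.5 * width                  # [-1, 1] -> [0, width]
    sy = (1.0 - ndc_y) * 0.5 * height                 # screen y points down
    return sx, sy


# Toy "perspective" matrix: w = z, so farther vertices move toward the center.
TOY_MVP: Matrix4 = [[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 1, 0]]

print(to_screen((1, 1, 2), TOY_MVP, 1920, 1080))  # (1440.0, 270.0)
print(to_screen((1, 1, 4), TOY_MVP, 1920, 1080))  # (1200.0, 405.0)
```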
This is how the GPU renders (and wastes some) pixels using 2x2 quads
This is what we call quad overdraw, which is a waste of the GPU’s resources: in many cases only 1 of the 4 calculated pixels is actually used, which means 75% of the calculation is wasted.
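The waste can be estimated with a few lines of Python. This sketch assumes the simplified rule described above (every 2x2 quad a triangle touches is shaded in full) and takes the triangle’s covered pixels as input; it ignores everything else a real GPU does.

```python
def quad_shading_cost(covered: set[tuple[int, int]]) -> tuple[int, float]:
    """Return (pixels shaded, fraction of shading wasted) for one triangle,
    assuming every 2x2 quad it touches is shaded in full."""
    quads = {(x // 2, y // 2) for x, y in covered}
    shaded = 4 * len(quads)
    wasted = 1.0 - len(covered) / shaded if shaded else 0.0
    return shaded, wasted


tiny_triangle = {(5, 5)}                          # covers a single pixel
full_quad = {(0, 0), (1, 0), (0, 1), (1, 1)}      # covers one whole quad
thin_sliver = {(0, 0), (1, 0), (2, 0), (3, 0)}    # 1 pixel tall, 4 wide

print(quad_shading_cost(tiny_triangle))  # (4, 0.75)
print(quad_shading_cost(full_quad))      # (4, 0.0)
print(quad_shading_cost(thin_sliver))    # (8, 0.5)
```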
You can check the amount of overdraw using the Quad Overdraw optimization viewmode:
The Quad Overdraw optimization viewmode
The higher the number, the more GPU resources are being wasted
What are the best practices?
With that being said, the problem is not really the high triangle count – it’s rather the overdraw.
- First of all, when you start working on a project, always keep in mind that what you create will be used for real-time visualization. This means you should always optimize the triangle count. Don’t use 64 sides for a cylinder when 18 are enough to make it look good in Unreal. Don’t use the Turbosmooth modifier with 2 or 3 iterations when 1 is enough for the visual quality. Always focus on minimizing the triangle count while maintaining the best visual quality.
- Modern graphics cards are pretty good at rendering millions of vertices, so lowering the triangle count is just one of several ways to optimize your archviz scenes. If you keep the number of draw calls under control, the triangle count will only be a problem if you have a weak graphics card. In that case, yes, you should lower the overall triangle count.
- Because Unreal applies occlusion culling on the CPU’s draw thread, many objects will be hidden in a particular camera view, so not all triangles are rendered at once. This means that having high-poly objects in your archviz scenes is okay as long as occlusion culling works properly. The triangle count is a dynamic value rather than a static one: it depends on which objects are being rendered, frame after frame.
- To tackle the most serious problem that stems from a high triangle count (quad overdraw), you have to use LODs (see the next technique).
The triangle count for the Spanish project was around 8 million on average, and for the Kitchen project it was around 8.7 million on average. I used many static meshes with 400,000+ tris. With 1,200 – 1,500 draw calls, I managed to reach the target framerate.
How to monitor triangle count?
- Type stat rhi in the console: check the current triangle count in the ‘Triangles drawn’ row.
- Window – Statistics – Primitive Stats: the overall triangle count can be found in the ‘Sum tris’ column.
Okay, now let’s see how you can tackle the problem caused by overdraw:
4 – USE LODS
Why is it necessary?
The further away a static mesh is from the camera, the fewer pixels it covers on the screen. The problem is that no matter how small an object appears on the display, the GPU will still process all of its vertices during the vertex shader and rasterization stages, causing a huge amount of quad overdraw. Thus, many of the GPU’s calculations will be wasted, especially in the case of high-poly objects.
What are the best practices?
- The more high-poly your object is, the more LODs should be created. My experience is that 3 LODs are enough for static meshes with high triangle count.
- Creating LODs in the Static Mesh Editor is usually good enough. You don’t necessarily need your 3d application to create LODs: Unreal does this job quickly and well.
- To nail the right Screen Size and Percent Triangles values, set them for each LOD while checking the Quad Overdraw optimization viewmode. Try to minimize the orange to violet areas as much as possible. But be careful not to let your static meshes lose obvious details: try to balance the triangle count against an acceptable loss of detail.
- As a starting point, the following Percent Triangles values might work: 30-40% for LOD1, 10-20% for LOD2 and 5-10% for LOD3. But don’t forget to customize these parameters for your own archviz scenes.
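To see what these starting values mean for a heavy mesh, here is a small Python sketch. It assumes every Percent Triangles value is measured against the original LOD0 mesh, and uses the midpoints of the ranges above for a hypothetical 400,000-triangle static mesh.

```python
def lod_triangle_counts(base_tris: int, percent_triangles: list[float]) -> list[int]:
    """Triangle count of LOD0 followed by each reduced LOD, assuming every
    percentage is relative to the LOD0 mesh."""
    return [base_tris] + [round(base_tris * p / 100) for p in percent_triangles]


print(lod_triangle_counts(400_000, [35, 15, 7.5]))
# [400000, 140000, 60000, 30000]
```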
How to monitor LODs?
LOD Coloration optimization viewmode: you can track which LOD is displayed in the current view. This viewmode helps a lot when you are setting up the Screen Size values for each object.
The Spanish project in the LOD Coloration optimization viewmode
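The Screen Size values work as thresholds: the smaller a mesh gets on screen, the higher the LOD index that is used. The Python sketch below is a simplified model of that behavior (not Unreal’s source code), assuming the thresholds are listed from LOD0 downwards in descending order; the values are hypothetical.

```python
def select_lod(screen_size: float, lod_screen_sizes: list[float]) -> int:
    """Pick the LOD for a mesh that currently has the given screen size.

    lod_screen_sizes[i] is the Screen Size value of LOD i, in descending
    order; LOD i is used once the mesh is smaller than that value.
    """
    lod = 0
    for i in range(1, len(lod_screen_sizes)):
        if screen_size < lod_screen_sizes[i]:
            lod = i  # thresholds descend, so the last match is the coarsest LOD
    return lod


thresholds = [1.0, 0.5, 0.25, 0.125]  # hypothetical LOD0-LOD3 values
for size in (0.8, 0.4, 0.2, 0.05):
    print(size, "->", f"LOD{select_lod(size, thresholds)}")
# 0.8 -> LOD0
# 0.4 -> LOD1
# 0.2 -> LOD2
# 0.05 -> LOD3
```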
By now, you may already see a significant performance improvement. Let’s see how you can further pimp your archviz scenes!
5 – USE FEWER (AND SIMPLER) MATERIALS
Why is it necessary?
There are several reasons for that:
- Each material on a mesh issues its own draw call, so even if you manage to reduce the number of static meshes in your archviz scenes, having lots of materials will still increase draw calls.
- The more complex your material shaders are (meaning the more shader instructions they contain and the more expensive their nodes are), the more time the GPU needs to render them in the pixel shader stage. However, this is not the biggest issue for archviz scenes, as the majority of archviz materials are not that complex. Getting the number of draw calls down should be your priority.
- The more textures a material uses, the higher the RAM consumption and memory bandwidth usage will be.
What are the best practices?
- Use as few material IDs for each object as you can. If you have a static mesh with 5 materials on it, 5 draw calls will be issued. Objects that are far away should have a single material on them.
- Use just a few Parent Materials and create the rest of the materials as Material Instances. This makes your workflow faster and easier, because changing parameters in a Material Instance generally doesn’t require recompiling the shader.
- Try to minimize the amount of shader instructions in the Parent Material, and use cheap nodes that are processed fast. This will rarely be a problem with archviz materials.
- If appropriate, use numeric values instead of textures to control channel parameters (such as metallic, roughness, etc).
For both the Spanish and the Kitchen projects, the Parent Materials contained 600 – 800 shader instructions, which is okay according to the Shader Complexity optimization viewmode.
How to monitor your material shaders?
- Shader Complexity optimization viewmode: you can track how complex your material shaders are. Keep in mind that this view is based on the number of shader instructions only, so it doesn’t take the cost of individual nodes into account. Still, it’s a great starting point.
- Material Editor – Stats: you can check the number of shader instructions here.
The Shader Complexity optimization viewmode
Stats in the Material Editor
After organizing your materials, let’s see how you can further lower the amount of draw calls in your archviz scenes:
6 – USE HISMC ACTORS
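This stuff with a scary name refers to the Hierarchical Instanced Static Mesh Component, a component that helps you create instances. In this article, an HISMC actor simply means an actor (for example, a Blueprint) that holds such a component.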
In case you haven’t heard of this actor, just watch this video on how to use the Hierarchical Instanced Static Mesh Component.
Why is it necessary?
To minimize draw calls. Let’s say you have a hundred copies of a static mesh under an HISMC actor. It can render those 100 objects with a single draw call (per material). So you can significantly reduce draw calls by organizing identical objects under this actor.
But be aware of the HISMC actor’s limitations: the instances can differ only in their transforms (location, rotation and scale). The static mesh, the applied materials and the LOD settings are the same for all of them.
Keep in mind that simply duplicating a static mesh in the viewport (for example, by Alt+dragging it) is not enough. Although the static mesh behind those objects will be the same, they will still require separate draw calls.
What are the best practices?
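The difference can be sketched with simple counting. This Python example is a simplified model, not a measurement: it assumes separate copies issue one draw call per material slot each, while instanced copies share roughly one draw call per material slot for each LOD level currently on screen.

```python
def draw_calls_separate(copies: int, material_slots: int) -> int:
    """Duplicated actors: every copy issues its own draw call per material."""
    return copies * material_slots


def draw_calls_instanced(material_slots: int, lods_on_screen: int = 1) -> int:
    """Instanced copies: roughly one draw call per material per visible LOD."""
    return material_slots * lods_on_screen


# 8 identical dining chairs with 2 materials each
print(draw_calls_separate(8, 2), "vs", draw_calls_instanced(2))      # 16 vs 2
# 100 copies of a single-material mesh, some close up and some far away
print(draw_calls_separate(100, 1), "vs", draw_calls_instanced(1, 2))  # 100 vs 2
```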
- If you have an archviz scene with thousands of objects in it, handling them with an HISMC actor might be a great way to minimize draw calls.
- If you use instances in your 3d application, it might be a good idea to turn them into an HISMC actor. For example, if you have a dining table with 8 identical chairs that you instanced in 3ds Max, just put one static mesh under an HISMC actor and arrange the rest manually.
Okay, this technique might further boost the overall performance of your archviz scenes. But there’s another issue that you want to pay attention to:
7 – OPTIMIZE YOUR TEXTURES AND LIGHTMAPS
Why is it necessary?
To optimize the RAM consumption and the memory bandwidth – you don’t want bandwidth to be the bottleneck of the render pipeline.
Bandwidth problems might occur when the engine loads multiple textures at runtime. To give you a simple analogy: when you download lots of files in parallel, they all share the bandwidth of your internet connection, so each download slows down. Downloading small files won’t necessarily be an issue, but with huge files, you might run into long download times, and that’s what you want to avoid.
One important thing to keep in mind: lightmaps are textures too, so it might be a good idea to optimize them as well.
What are the best practices for textures?
- You should always use power-of-two textures (512x512, 1024x1024, etc.), because this way the engine automatically generates mipmaps. Mipmaps are texture LODs: for a 1024x1024 texture, the following mipmaps will be generated: 512x512, 256x256, 128x128, etc. The engine then automatically displays the appropriate mipmap, depending on how many pixels the texture takes up in a particular camera view.
- For automatic mipmap generation, both sides of your textures must be powers of two, so the aspect ratio will be 1:1, 1:2, 1:4, etc.
- Texture size for small objects (or for static meshes that are far away from the camera) should be 256x256 or 512x512 at most.
- Use 1024x1024 textures on huge surfaces. You can go up to 2048x2048 if needed, but limit their numbers.
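The mipmap chain described above is just repeated halving until both sides reach 1 pixel. A minimal Python sketch, assuming power-of-two sizes (the check rejects anything else, such as a 1:3 texture like 1024x3072):

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def mip_chain(width: int, height: int) -> list[tuple[int, int]]:
    """All mip levels of a texture, from full size down to 1x1."""
    if not (is_power_of_two(width) and is_power_of_two(height)):
        raise ValueError("both sides must be powers of two")
    chain = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(width // 2, 1), max(height // 2, 1)
        chain.append((width, height))
    return chain


print(mip_chain(1024, 1024)[:4])   # [(1024, 1024), (512, 512), (256, 256), (128, 128)]
print(len(mip_chain(1024, 1024)))  # 11 levels, down to (1, 1)
print(mip_chain(2048, 1024)[-2:])  # [(2, 1), (1, 1)]
```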
What about 4K textures? Well, you shouldn’t use them in general, only if they are really needed and you run your archviz scenes on a powerful computer. A 4K texture uses 4x the memory of a 2K texture (and 16x that of a 1K texture), so keep that in mind.
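A quick way to see where the 4x and 16x factors come from: memory grows with the number of pixels. This Python sketch assumes square textures at 1 byte per pixel (the rate of BC3/DXT5 compression) and ignores streaming and block-size details; the full mip chain adds roughly one third on top of the top level.

```python
def texture_memory_mb(size: int, bytes_per_pixel: float = 1.0,
                      with_mips: bool = True) -> float:
    """Approximate memory of a square texture, optionally with all its mips."""
    pixels = 0
    level = size
    while True:
        pixels += level * level
        if not with_mips or level == 1:
            break
        level //= 2
    return pixels * bytes_per_pixel / (1024 * 1024)


for size in (1024, 2048, 4096):
    print(f"{size}x{size}: {texture_memory_mb(size):.2f} MB")
# 1024x1024: 1.33 MB
# 2048x2048: 5.33 MB
# 4096x4096: 21.33 MB
```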
What are the best practices for lightmaps?
- First of all, lower your object count. This way, fewer lightmaps will be generated.
- Use 256x256 for each object as a starting point, then check your shadow quality. If you need more details for certain objects, switch to 512x512. If you need even more details, just go for 1024x1024. This approach has been pretty helpful for optimizing lightmap resolutions in my archviz scenes.
- For objects that are far away from the camera view, use 128x128 or even 64x64 lightmap sizes.
- High-poly objects with lots of details might require bigger lightmaps, especially if they are closer to the camera view.
- World Settings – Lightmass – Lightmass Settings: consider enabling the ‘Compress Lightmaps’ option. You can further reduce memory bandwidth usage this way.
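Because lightmap cost grows with the square of its resolution, a few high-resolution lightmaps can outweigh many small ones. This Python sketch just counts texels for hypothetical groups of objects; Unreal packs lightmaps into shared atlas textures, so real memory usage will differ, but the scaling is the point.

```python
def lightmap_texels(groups: list[tuple[int, int]]) -> int:
    """Total texels for (object_count, lightmap_resolution) groups."""
    return sum(count * res * res for count, res in groups)


everything_at_512 = [(600, 512)]
tiered = [(50, 512), (450, 256), (100, 64)]  # close-ups, default, far away

print(f"{lightmap_texels(everything_at_512):,}")  # 157,286,400
print(f"{lightmap_texels(tiered):,}")             # 43,008,000
```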
How to monitor your textures?
Window – Statistics – Texture Stats: you can find information on the textures here, such as current memory usage and current mipmap size for a particular camera view.
How to monitor your lightmaps?
- Lightmap Density optimization viewmode: my experience is that for archviz scenes, it will be almost all orange and red, because archviz scenes require much finer shadow detail than games.
- Window – Statistics – Static Mesh Lighting Info: you can find information on the lightmaps such as their overall number, individual resolution and filesize.
Now let’s see what happens when your archviz scenes contain a dynamic light setup as well:
8 – OPTIMIZE YOUR DYNAMIC LIGHT SOURCES
Why is it necessary?
Because if you don’t pay attention to the number of movable and/or stationary lights, their parameters and their arrangement, you will end up with lots of calculations in the GPU’s pixel shader stage, which might cause serious framerate drops.
What are the best practices for a dynamic light setup?
- First of all, try to use as few dynamic light sources as you can.
- Lower the Attenuation Radius and (for spot lights) the Cone Angle values, so the dynamic lights affect fewer pixels.
- Lower the amount of overlap between movable light sources.
- The static meshes that cast shadows should have a lower triangle count. The calculation time of a dynamic light heavily depends on the triangle count, as the engine has to process every vertex when rendering the shadow map.
- Lower the shadow quality: you can do this by typing sg.ShadowQuality followed by a value from 0 (lowest) to 4 (highest) in the console.
How to monitor dynamic lights?
Type stat unit in the console and pay attention to the GPU time. Start by turning off all movable light sources, then turn them back on, one by one. This way you get a clear picture of the render time of each dynamic light source.
Last but not least, let’s see the final technique:
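The one-by-one measurement boils down to subtracting consecutive GPU times. In this Python sketch, the readings are hypothetical numbers you would note down from stat unit; in practice they fluctuate, so it helps to read an average over a few seconds.

```python
def per_light_cost_ms(baseline_ms: float, readings_ms: list[float]) -> list[float]:
    """Extra GPU time added by each light, given the GPU time with all
    movable lights off and the GPU time after switching each one on."""
    costs = []
    previous = baseline_ms
    for current in readings_ms:
        costs.append(round(current - previous, 2))
        previous = current
    return costs


# Hypothetical stat unit readings: all lights off, then lights 1, 2 and 3 on
print(per_light_cost_ms(8.0, [9.0, 9.5, 11.0]))  # [1.0, 0.5, 1.5]
```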
9 – USE DIFFERENT ENGINE SCALABILITY SETTINGS
Sometimes it pays off not to use the Cinematic or Epic scalability settings.
It’s not always necessary, but for VR (or for scenes that need to run on mid-level computers), switching to lower settings might help you achieve the target framerate.
What settings should you use?
Well, that depends entirely on the complexity of your scenes and the hardware you use. This ‘technique’ requires some experimentation; just keep in mind not to sacrifice the visual quality.
Well, I hope you have a much clearer understanding of performance optimization by now. I encourage you to start implementing these techniques as soon as possible – even if you just work on a practice project.
Remember, it is essential to make your archviz scenes run fast and smoothly on your clients’ computers. The last thing you want is a poor user experience.
So use these techniques to boost the performance of your archviz scenes dramatically!
CREDITS & RESOURCES
Special thanks to Tom Frank and Balázs Bakon for peer reviewing this article!
GPU Performance for Game Artists by Keith O’Conor
Gnomon Masterclass Part I: Building Better Pipelines for UE4 by Martin Sevigny
Gnomon Masterclass Part II: Rendering in UE4 by Homam Bahnassi
Tech Art Aid tutorial videos by Oskar Świerad
Render Hell by Simon Trümpler
ABOUT THE AUTHOR, ANDRAS RONAI
Architect, 3d artist and Unreal Engine specialist. I help architects and archviz studios create high-end architectural visuals using the Unreal Engine, from engine-compatible modeling to walkthrough scenes and cinematics.