Performance Profile
From Rex community wiki
When you build your world, you should constantly be aware of the performance costs related to the content you're adding. The following image shows a profile that was generated in July of 2010, using the latest develop version between Naali 0.2.3 and Naali 0.2.4 (git rev 23cbf5f5837a9a63ea4a49158199e81a4e03f448).
Four different worlds were observed with four different setups. The user was the only avatar in the world, and the profile was taken after all data had been loaded into the world. The FPS count at the bottom gives a rough estimate of Naali performance on different systems. On Windows, the Ogre Direct3D9 rendering subsystem was used to draw the world; OpenGL was not tested.
The profile compares the content found in four different worlds. Color coding gives some hints of what is considered bad (red) and what is considered good (green). Bold black highlights the highest value in a row.
The important issues to focus on are the following:
- Batch count. ~1000 batches is tolerable with Ogre on a high-end machine; at ~2000 batches Ogre becomes rather heavy on the CPU.
- Triangles/batch. Try to keep this as high as possible, by merging geometry into fewer batches, which also lowers the batch count.
- Texture sizes. Having more textures in view than the GPU has memory for kills the performance immediately. Some cards are outright bad with textures at 2k or larger.
- Texture bytes/pixel. Use texture compression to push this as low as possible. DXT1 takes half a byte per pixel, compared to 4 bytes per pixel for full RGBA32. This allows you to have more and larger textures while keeping the total amount of texture data low.
- Data sharing. Share your materials, meshes and textures between world entities. If you use the same mesh twice in the world, do not upload it twice, but share it between entities. If two meshes differ only a little, consider splitting one of them into two meshes so that the common part can be shared.
- Material count. Ogre is not too good with lots of materials. If several meshes use the same material and texture, don't upload a new material with a different name for each of them.
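The texture bytes/pixel point above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of Naali: the function name `texture_bytes` and the table `BYTES_PER_PIXEL` are hypothetical, and the mipmap factor assumes a complete mipmap chain, which adds roughly one third on top of the base level. It only uses the two per-pixel sizes quoted above.

```python
# Bytes per pixel for the two formats mentioned in the list above.
BYTES_PER_PIXEL = {"RGBA32": 4.0, "DXT1": 0.5}


def texture_bytes(width, height, fmt, mipmaps=False):
    """Approximate memory taken by one texture, in bytes.

    Hypothetical helper for illustration only; real drivers may add
    padding or alignment that is not modelled here.
    """
    base = width * height * BYTES_PER_PIXEL[fmt]
    # Assumption: a complete mipmap chain costs about 1/3 extra.
    return base * 4.0 / 3.0 if mipmaps else base


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024.0 * 1024.0)


if __name__ == "__main__":
    for size in (512, 1024, 2048):
        for fmt in ("RGBA32", "DXT1"):
            mib = to_mib(texture_bytes(size, size, fmt))
            print(f"{size}x{size} {fmt}: {mib:.2f} MiB")
```

Under these assumptions one uncompressed 2k texture takes as much memory as eight DXT1 textures of the same size, which is why compression leaves room for more and larger textures in view.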
Render profile tests for Naali
The following test results were obtained with the Nvidia PerfHUD tool. All measurements were made on a high-end computer, in the worlds FishWorld, EmptyTaiga and Multipolis.
Developers who want to run their own Nvidia PerfHUD tests can enable PerfHUD support by uncommenting the line #add_definition(-DNVIDIA_PERFHUD) in the OgreRenderModule CMake file.
GPU idle time
World : EmptyTaiga
- Avatar does not move : 1-3 ms (on land).
- Avatar/camera moves : 3-5 ms.
- Avatar does not move : 3 ms (in ocean).
- Avatar/camera moves : 3-4 ms.
World : FishWorld
- Avatar does not move : 20 - 25 ms (in lobby).
- Avatar/camera moves : 6 - 13 ms.
- Avatar does not move : 13 - 15 ms (in ocean).
- Avatar/camera moves : 13 - 15 ms.
World : Multipolis
- Avatar does not move : 3 ms.
- Avatar/camera moves : 3 ms.
Viewport render profile
main view render profile:
World : EmptyTaiga
- GPU time for rendering was between 2.155-2.741 ms. (on land)
- CPU time for rendering was between 1.573-2.12 ms.
- GPU time for rendering was between 1.273-1.314 ms. (in ocean)
- CPU time for rendering was between 0.775-0.882 ms.
render profile for shadows:
World : EmptyTaiga
- GPU time for rendering was between 0.094-0.113 ms. (on land)
- CPU time for rendering was between 0.500-0.659 ms.
- GPU time for rendering was between 0.108-0.196 ms. (in ocean)
- CPU time for rendering was between 0.492-0.911 ms.
main view render profile:
World : FishWorld
- GPU time for rendering was between 6.555-8.177 ms. (in lobby)
- CPU time for rendering was between 8.945-13.696 ms.
- GPU time for rendering was between 6.153-7.892 ms. (in ocean)
- CPU time for rendering was between 8.871-14.19 ms.
render profile for shadows:
World : FishWorld
- GPU time for rendering was between 0.977 - 1.2477 ms. (on land)
- CPU time for rendering was between 4.371 - 6.972 ms.
- GPU time for rendering was between 0.247 - 0.490 ms. (in ocean)
- CPU time for rendering was between 4.405 - 8.587 ms.
main view render profile:
World : Multipolis
- GPU time for rendering was between 1.945-3.042 ms.
- CPU time for rendering was between 1.199-3.268 ms.
render profile for shadows:
World : Multipolis
- GPU time for rendering was between 0.036 - 0.535 ms.
- CPU time for rendering was between 0.475 - 1.052 ms.
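To relate the per-pass times above to a frame rate, the times of the passes can be added up and converted to frames per second. This is a rough sketch under stated assumptions: it assumes the main view pass and the shadow pass run one after the other on the CPU and that nothing else costs time in the frame, so the result is only an upper bound on the FPS, not a prediction. The function name `fps_ceiling` is hypothetical; the numbers are the Multipolis CPU times listed above.

```python
def fps_ceiling(pass_times_ms):
    """Upper bound on frames per second given per-pass times in milliseconds.

    Assumption: the passes are executed sequentially and are the only
    cost in the frame, so the real frame rate will be lower.
    """
    total_ms = sum(pass_times_ms)
    return 1000.0 / total_ms


# Multipolis CPU times from the lists above: main view + shadows.
multipolis_cpu_best = [1.199, 0.475]
multipolis_cpu_worst = [3.268, 1.052]

if __name__ == "__main__":
    print(f"best case ceiling:  {fps_ceiling(multipolis_cpu_best):.0f} FPS")
    print(f"worst case ceiling: {fps_ceiling(multipolis_cpu_worst):.0f} FPS")
```

The same calculation with the FishWorld times gives a much lower ceiling, which matches the heavier CPU load measured in that world.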