Author Topic: Possible performance improvements with glGetUniformLocation and state changes (Read 5516 times)

hailstone · « **on:** 29 August 2011, 13:42:25 »

I've had a quick play with gDEBugger and found a possible performance improvement. Instead of calling glGetUniformLocation every time a uniform is set, for every unit rendered, I've changed it to store the uniform locations when a shader is loaded. gDEBugger shows that glGetUniformLocation was called 767 times in a single frame step, which is 11.99% of all calls. After the change it's called at most 13 times. I've only tried it with a basic shader, so if something is messed up, let me know.
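A minimal sketch of the caching idea, written in Python so it runs without a GL context. `FakeDriver` is a hypothetical stand-in for the driver call `glGetUniformLocation(program, name)`, and the uniform names are made up for illustration; this is not the code from the commit.

```python
class ShaderProgram:
    """Looks up uniform locations once, at load time, and caches them."""

    def __init__(self, program_id, uniform_names, get_uniform_location):
        # get_uniform_location stands in for glGetUniformLocation (assumption).
        self.program_id = program_id
        self.locations = {
            name: get_uniform_location(program_id, name)
            for name in uniform_names
        }

    def location(self, name):
        # Plain dictionary lookup: no driver call on the render path.
        return self.locations[name]


class FakeDriver:
    """Hypothetical driver that counts how often a location is queried."""

    def __init__(self):
        self.calls = 0
        self._known = {}

    def get_uniform_location(self, program_id, name):
        self.calls += 1
        key = (program_id, name)
        if key not in self._known:
            self._known[key] = len(self._known)
        return self._known[key]


def render_units(shader, unit_count):
    """Simulate setting every uniform for every unit in one frame."""
    touched = 0
    for _ in range(unit_count):
        for name in shader.locations:
            shader.location(name)  # real code would pass this to glUniform*
            touched += 1
    return touched
```

With three uniforms and 100 units, the driver is queried three times at load and never again during rendering.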

Commit 8918a20089..

Using a scene graph to manage state changes could help improve performance too.

[Image: state changes in a single frame step]
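To illustrate why ordering matters, here is a rough sketch that counts shader and texture binds for a list of draws, before and after sorting by state. It is a flat sort rather than a scene graph, and the `(shader, texture, mesh)` tuples are an assumed representation, not the engine's.

```python
def count_state_changes(draws):
    """Count shader and texture binds needed to issue draws in order."""
    changes = 0
    current_shader = None
    current_texture = None
    for shader, texture, _mesh in draws:
        if shader != current_shader:
            changes += 1
            current_shader = shader
        if texture != current_texture:
            changes += 1
            current_texture = texture
    return changes


def sort_by_state(draws):
    """Group draws so that identical shader/texture pairs are adjacent."""
    return sorted(draws, key=lambda d: (d[0], d[1]))
```

A scene graph would typically do this grouping as part of its traversal; the sort only shows the effect on the bind count.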

will · « **Reply #1 on:** 29 August 2011, 14:30:02 »

An interesting stat would be: how many times per frame do you draw the same mesh at the same animation frame, or interpolated between the same two frames?

With texture atlas code you could likely collapse most meshes in each model into a single mesh too.

My own playing with terrain drawing saw big improvements on low end cards using mipmapped texture atlas and degenerate triangle strips.
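For anyone unfamiliar with degenerate triangle strips, a small sketch follows. It assumes a regular grid of `cols` x `rows` vertices numbered row by row; the function names are illustrative. The result is the order in which vertices would be emitted, shown as grid indices for brevity.

```python
def grid_strip_indices(cols, rows):
    """Build one triangle strip covering a cols x rows vertex grid."""
    if cols < 2 or rows < 2:
        raise ValueError("need at least a 2x2 vertex grid")
    indices = []
    for r in range(rows - 1):
        if r > 0:
            # Repeat the last vertex of the previous row and the first of
            # this one. The resulting zero-area triangles are skipped by the
            # rasteriser, so the strip "jumps" without a new draw call.
            indices.append(indices[-1])
            indices.append(r * cols)
        for c in range(cols):
            indices.append(r * cols + c)
            indices.append((r + 1) * cols + c)
    return indices


def real_triangles(indices):
    """Return the strip's triangles, leaving out the degenerate ones."""
    tris = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i:i + 3]
        if a != b and b != c and a != c:
            tris.append((a, b, c))
    return tris
```

Each row contributes an even number of entries and each jump adds two, so the winding order stays consistent from row to row.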

Yggdrasil · « **Reply #2 on:** 31 August 2011, 23:24:09 »

Nice find. Do they still want my tel. number for a free license? Somehow that bugs me.

Quote from: hailstone on 29 August 2011, 13:42:25

Using a scene graph to manage state changes could help improve performance too.

We don't use one? I'm a bit surprised. I've heard many good things about OpenSceneGraph.
http://www.openscenegraph.org/projects/osg

For example it includes triangle stripification. Probably requires a very big rewrite...

will · « **Reply #3 on:** 1 September 2011, 04:58:51 »

My favourite subject:

I thought my octree was pretty tight code, but a special quadtree'd VBO for the landscape beat it out. If you have a 2D problem (as glest and other RTSes do), a quadtree is massively faster than an octree, and a grid can be even faster if you need to visit each tile anyway.

I thought this quadtree'd VBO was pretty tight code, but softcoder making a VBO using the old glest visible quad iterator every time the camera moved beat it out. Then I made a tighter visible quad to be iterated, and that beat the old glest iterator massively again. (Except it doesn't work to only draw visible tiles; you need a wider walk to find those that might contribute shadows etc. So it was abandoned rather than fixed.)
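A rough sketch of the grid approach with a wider walk. It assumes the camera footprint has already been reduced to an axis-aligned rectangle in world units, which a real view frustum is not; the `margin` parameter (in tiles) is a hypothetical knob for picking up off-screen tiles that may still cast shadows into view.

```python
import math


def visible_tile_range(view_min, view_max, tile_size, map_w, map_h, margin=0):
    """Return inclusive tile bounds (x0, y0, x1, y1) covering the view."""
    x0 = math.floor(view_min[0] / tile_size) - margin
    y0 = math.floor(view_min[1] / tile_size) - margin
    x1 = math.floor(view_max[0] / tile_size) + margin
    y1 = math.floor(view_max[1] / tile_size) + margin
    # Clamp to the map so the walk never leaves the grid.
    x0 = max(0, min(map_w - 1, x0))
    x1 = max(0, min(map_w - 1, x1))
    y0 = max(0, min(map_h - 1, y0))
    y1 = max(0, min(map_h - 1, y1))
    return x0, y0, x1, y1


def tiles_in_range(bounds):
    """Yield (x, y) for every tile inside the inclusive bounds."""
    x0, y0, x1, y1 = bounds
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            yield x, y
```

No tree traversal is involved: the bounds come from two divisions per axis, and the walk visits exactly the tiles that will be drawn.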

If I recall silnarm has, a long time back, played with splatting the whole map into a single massive texture. If on the other hand you are still using tile textures, you can get massive speedups from putting all those tiles into a texture atlas so you can draw the map without texture state changes and from a small number of mega-tri-strips using degenerate triangles to skip. (Always assume that an atlas will spill onto additional 'pages' so you're dealing with a small number of textures, rather than strictly a single texture).
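To make the 'pages' point concrete, here is a small sketch that maps a tile index to an atlas page and UV rectangle. It assumes square tiles packed in rows on square pages and ignores the padding a mipmapped atlas usually needs to avoid bleeding between neighbouring tiles; all names are illustrative.

```python
def atlas_slot(tile_index, tile_px, page_px):
    """Map a tile index to (page, u0, v0, u1, v1) in a paged atlas."""
    per_row = page_px // tile_px
    per_page = per_row * per_row
    page, slot = divmod(tile_index, per_page)
    row, col = divmod(slot, per_row)
    scale = tile_px / page_px
    return (page, col * scale, row * scale,
            (col + 1) * scale, (row + 1) * scale)


def pages_needed(tile_count, tile_px, page_px):
    """Number of atlas pages required to hold tile_count tiles."""
    per_row = page_px // tile_px
    per_page = per_row * per_row
    return -(-tile_count // per_page)  # ceiling division
```

Sorting draws by page then means at most one texture bind per page instead of one per tile type.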

Oh, and avoid indexed drawing wherever possible. If you must do it, try to use 16-bit indices rather than 32-bit; it's usually much quicker. But in my playing, nothing beats triangle strips.
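If indexed drawing is unavoidable, picking the index width per mesh is simple. A sketch, assuming the indices are a plain list of integers; the returned string names the type enum that would be passed to `glDrawElements`.

```python
import struct


def pack_indices(indices):
    """Pack indices as 16-bit when they all fit, otherwise as 32-bit."""
    if max(indices, default=0) <= 0xFFFF:
        fmt, gl_type = "H", "GL_UNSIGNED_SHORT"
    else:
        fmt, gl_type = "I", "GL_UNSIGNED_INT"
    # "<" selects little-endian with standard sizes: H is 2 bytes, I is 4.
    data = struct.pack("<%d%s" % (len(indices), fmt), *indices)
    return gl_type, data
```

A mesh with no more than 65536 vertices always fits the 16-bit path, which halves the size of the index buffer.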

I could imagine a G3DHack that took all the frames and merged the meshes and stripified it, and then you used a vertex shader to interpolate them.
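The interpolation step itself is just a per-vertex linear blend. Below is a CPU-side sketch of what such a vertex shader would compute (the same blend GLSL's built-in `mix` performs), assuming both frames list the same vertices in the same order. The function name is illustrative.

```python
def lerp_frame(frame_a, frame_b, t):
    """Blend two keyframes of one mesh; t=0 gives frame_a, t=1 frame_b."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same vertex count")
    return [
        tuple(a + (b - a) * t for a, b in zip(va, vb))
        for va, vb in zip(frame_a, frame_b)
    ]
```

On the GPU both frames would be bound as vertex attributes and `t` passed as a uniform, so no vertex data changes between draws of the same frame pair.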

All my empirical data is on low-end cards, where I think these details matter.

Omega · « **Reply #4 on:** 1 September 2011, 21:58:02 »

Quote from: will on 1 September 2011, 04:58:51

If I recall silnarm has, a long time back, played with splatting the whole map into a single massive texture. If on the other hand you are still using tile textures, you can get massive speedups from putting all those tiles into a texture atlas so you can draw the map without texture state changes and from a small number of mega-tri-strips using degenerate triangles to skip. (Always assume that an atlas will spill onto additional 'pages' so you're dealing with a small number of textures, rather than strictly a single texture).

Does GAE do this in some way? I recall seeing some image before that seemed to have all the tiles in one image file. Can't seem to find it now though, and no clue if that's even really related or anything.

silnarm · « **Reply #5 on:** 3 September 2011, 00:45:42 »

Quote from: Omega on 1 September 2011, 21:58:02

Quote from: will on 1 September 2011, 04:58:51
If I recall silnarm has, a long time back, played with splatting the whole map into a single massive texture. If on the other hand you are still using tile textures, you can get massive speedups from putting all those tiles into a texture atlas so you can draw the map without texture state changes and from a small number of mega-tri-strips using degenerate triangles to skip. (Always assume that an atlas will spill onto additional 'pages' so you're dealing with a small number of textures, rather than strictly a single texture).
Does GAE do this in some way? I recall seeing some image before that seemed to have all the tiles in one image file. Can't seem to find it now though, and no clue if that's even really related or anything.

'tr2' (Terrain Renderer 2, very creative name...) puts all the splatted textures on a single big texture (if possible, two or more may be needed depending on max texture dimensions and map size).

The image you saw was probably called 'terrain_tex.png' and was an output of said texture for testing/debugging; it is no longer saved.

Quote from: will on 1 September 2011, 04:58:51

I could imagine a G3DHack that took all the frames and merged the meshes and stripified it, and then you used a vertex shader to interpolate them.

I can also imagine such a thing... one day someone might even do it.

Quote from: hailstone on 29 August 2011, 13:42:25

I've had a quick play with gDEBugger and found a possible performance improvement. Instead of calling glGetUniformLocation every time a uniform is set, for every unit rendered, I've changed it to store the uniform locations when a shader is loaded. gDEBugger shows that glGetUniformLocation was called 767 times in a single frame step, which is 11.99% of all calls. After the change it's called at most 13 times. I've only tried it with a basic shader, so if something is messed up, let me know.

Nice one, don't be shy with further investigating... rendering with shaders is not even remotely optimal atm.

Now that the shaders are working correctly (and are mostly consistent with the fixed pipe code), it is probably time to split the Renderer up. Many current state changes are completely useless with shaders and are just wasting time, but they are needed for the fixed pipe code, and therefore also for the ability to switch between the two easily.

The other big thing that immediately comes to mind is getting rid of all the conditionals in shaders. A tricky job perhaps, but what we need is what the guy at http://www.3dkingdoms.com/ calls a template shader system. Basically we have no complete shader source files, just snippets that do different things and are spliced together to make a final shader (or many shaders), so there wouldn't be one shader supporting up to x lights, but rather one shader for one light, one for two lights, etc, etc, and one for x lights, with no if statements.
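A rough sketch of the splicing idea, generating one fragment shader per light count with the lighting loop fully unrolled. The GLSL snippet, uniform names, and diffuse-only lighting are all assumptions for illustration; this is not the 3dkingdoms system or GAE's shader code.

```python
# One line of GLSL per light; %(i)d is replaced by the light's index.
LIGHT_SNIPPET = (
    "    colour += max(dot(normal, lightDir[%(i)d]), 0.0)"
    " * lightColour[%(i)d];"
)


def build_fragment_shader(light_count):
    """Splice snippets into a shader for exactly light_count lights."""
    if light_count < 1:
        raise ValueError("need at least one light")
    lines = [
        "uniform vec3 lightDir[%d];" % light_count,
        "uniform vec3 lightColour[%d];" % light_count,
        "varying vec3 surfaceNormal;",
        "void main() {",
        "    vec3 normal = normalize(surfaceNormal);",
        "    vec3 colour = vec3(0.0);",
    ]
    lines += [LIGHT_SNIPPET % {"i": i} for i in range(light_count)]
    lines += ["    gl_FragColor = vec4(colour, 1.0);", "}"]
    return "\n".join(lines) + "\n"


def build_shader_set(max_lights):
    """Return {light_count: source} for 1..max_lights lights."""
    return {n: build_fragment_shader(n) for n in range(1, max_lights + 1)}
```

At draw time the renderer would pick the variant matching the number of active lights, so no shader contains a branch or loop over lights.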
