Author Topic: 'How to' guide: speeding up rendering (Read 3327 times)

silnarm · « **on:** 23 September 2009, 06:08:01 »

OPEN TASKS

1. Rewrite Renderer::renderWater() and Renderer::renderSurface() using Vertex Arrays.

2. More sophisticated state management

3. Sort objects/units before rendering, to minimise calls to glBindTexture()

4. Smarter texture management

5. Use all available texture units

1. Renderer::renderWater() and Renderer::renderSurface()

These functions are currently implemented with immediate mode OpenGL. Immediate mode makes lots of function calls and is very inefficient; the more tiles you render (the further back you zoom), the more pronounced this inefficiency becomes. I'll be reorganising the map data soon to remove duplication of information between Map and Pathfinder; at that time I will also split out the data that the renderer is interested in, and put it all in neat, compact, video card friendly arrays.

The immediate mode code then needs to be gutted. Instead of drawing the quads in the while loop, we build a vector of indices into the vertex array; then, when our array (err, vector) is built, we set up our vertex, normal, colour & texture arrays and call
glDrawElements( GL_QUADS, 4 * numTiles, GL_UNSIGNED_SHORT, &ourIndices[0] );
thus replacing hundreds (potentially thousands) of function calls with one.
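A minimal sketch of the index-building step, under stated assumptions: the vertex array is assumed to hold 4 consecutive vertices per tile in row-major tile order (so each tile can carry its own texture coordinates), and `TileRect` and `buildTileIndices()` are hypothetical names, not anything in the current Renderer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical visible region in tile coordinates, inclusive on both ends.
struct TileRect { int x0, y0, x1, y1; };

// Collects the indices of every visible tile's 4 vertices, in GL_QUADS order.
// Assumes tile (x, y) owns vertices 4 * (y * tilesW + x) .. + 3.
std::vector<unsigned short> buildTileIndices(const TileRect &r, int tilesW) {
    assert(r.x0 >= 0 && r.y0 >= 0 && r.x0 <= r.x1 && r.y0 <= r.y1 && r.x1 < tilesW);
    std::vector<unsigned short> indices;
    for (int y = r.y0; y <= r.y1; ++y) {
        for (int x = r.x0; x <= r.x1; ++x) {
            const int first = 4 * (y * tilesW + x);
            assert(first + 3 <= 65535); // must fit GL_UNSIGNED_SHORT
            for (int c = 0; c < 4; ++c) {
                indices.push_back(static_cast<unsigned short>(first + c));
            }
        }
    }
    return indices;
}

// With a GL context and the client arrays enabled, the draw would then be:
// glDrawElements(GL_QUADS, static_cast<GLsizei>(indices.size()),
//                GL_UNSIGNED_SHORT, &indices[0]);
```

With 16-bit indices and 4 vertices per tile this tops out at 16384 tiles; a bigger surface would need GL_UNSIGNED_INT indices or several draw calls.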

Note that this task may require elements of task 4, specifically to do with the tileset textures... but I'm not too sure at this stage what the current TextureManager does do for us.

This is what I would be doing with my level of knowledge of, and experience with, OpenGL. If any OpenGL Gurus happen to wander through this way and would like to rewrite this using even fancier VBOs or whatever new fandangled things are available, please do so!

2. What's your state?

State management in OpenGL is important; in particular, changing state variables on the 'server' can be expensive. We need to minimise state changes if possible, or at the very least keep track of the state ourselves, and not change it to the same thing. I'm not sure if modern implementations are smart enough to do this for you, but I know this has historically been an issue, so we should do it ourselves in either case.

An assessment of the use of glPushAttrib()/glPopAttrib() should be performed here too, with the aim of minimising their use if possible.
Ordering our rendering so we push and pop as little as possible would be the goal, and if only a couple of states are changed for something, apparently it's better to just change them individually and restore them when you're done, 'by hand'.

Trying to group renderables that need the same state would be desirable, but this would conflict with the aims of task 3.
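A minimal sketch of 'keeping track of the state ourselves', assuming a hypothetical `StateCache` class that only answers whether a real GL call is needed. `StateId` is a stand-in for GLenum so the sketch compiles without GL headers; none of these names exist in the engine.

```cpp
#include <cassert>
#include <map>

// Stand-in for GLenum (assumption, to avoid needing GL headers here).
typedef unsigned int StateId;

class StateCache {
public:
    // Returns true if the caller must issue the real glEnable()/glDisable(),
    // false if the requested value is already known to be current.
    bool setEnabled(StateId cap, bool on) {
        std::map<StateId, bool>::iterator it = enabled_.find(cap);
        if (it != enabled_.end() && it->second == on) {
            return false; // redundant change, skip the GL call
        }
        enabled_[cap] = on;
        return true;
    }

    // Same idea for texture binds, tracked per texture unit.
    bool setTexture(int unit, unsigned int texture) {
        assert(unit >= 0);
        std::map<int, unsigned int>::iterator it = bound_.find(unit);
        if (it != bound_.end() && it->second == texture) {
            return false;
        }
        bound_[unit] = texture;
        return true;
    }

private:
    std::map<StateId, bool> enabled_;   // last value requested per capability
    std::map<int, unsigned int> bound_; // last texture requested per unit
};
```

Usage would look like `if (cache.setEnabled(GL_BLEND, true)) glEnable(GL_BLEND);`. Note that glPopAttrib() restores state behind the cache's back, so any push/pop that remains would have to invalidate or update the cache.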

3. Form an orderly queue! By team colour and then unit type please.

Renderer::renderObjects() was where this all started. I was doing some profiling and noticed it was eating up a rather sizeable chunk of the 3D rendering time (about 40% on my system). An attempted quick-fix and some more investigating revealed the problem: glBindTexture(). Some of the default tileset objects are using 5 meshes, all with different textures, and object rendering uses one texture unit, causing 5 calls to glBindTexture() for each one rendered.

Before I discovered that some of the objects have so many textures, I tried to fix it by sorting the objects, and then rendering them in order...

So the old while loop that plucked out objects and rendered them becomes a preprocessing loop:

Code:

	vector<Vec2i> toRender;
	PosQuadIterator pqi(visibleQuad, Map::cellScale);
	// find all renderable objects...
	while(pqi.next()){
		const Vec2i &pos= pqi.getPos();
		if(map->isInside(pos)){
			Tile *t= map->getTile(Map::toTileCoords(pos));
			Object *o= t->getObject();
			if(o && t->isExplored(thisTeamIndex)){
				toRender.push_back ( pos );
			}
		}
	}
	// sort them by model
	std::map<const void*, vector<Vec2i> > renderTable;
	for ( vector<Vec2i>::const_iterator it = toRender.begin(); it != toRender.end(); ++it ) {
		Object *o = map->getTile( Map::toTileCoords( *it ) )->getObject();
		assert( o && o->getModel() );
		renderTable[o->getModel()].push_back( *it );
	}

Then the rendering iterates over the renderTable map; each element is a vector of positions of objects with the same model. This was meant to reduce the calls to glBindTexture(), because the code does actually check if the currently bound texture is the same as the new one, and doesn't bind if it is (but it only does this for the base texture unit... more on that in a minute...)

It didn't reduce the calls to glBindTexture(), because some of the models were changing the texture up to five times each...
The possible solutions for objects all involve tasks 4 & 5.

Units we probably can speed up using this method, and by not always binding the team texture, as is currently done. So renderable units need to be sorted first by team colour, then by unit type. Some units do use more than one texture, though 2 seems to be a typical max, and most models seem to use only one, so this should give a noticeable improvement (if there's lots of the same unit type on screen that is!)
We will need to add a 'lastTeamTexture' to MeshCallbackTeamColor, and check before binding textures, much like the model renderer currently does with the base texture unit and 'lastTexture' [see ModelRendererGl::renderMesh()].
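A minimal sketch of the sort-then-skip idea, assuming one model texture per unit type for simplicity. `RenderItem`, `byTeamThenType()` and `countBinds()` are hypothetical; the counters stand in for the 'lastTeamTexture' and 'lastTexture' checks described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical render-list entry: ids only, no real unit data.
struct RenderItem { int team; int unitType; };

// Sort key: team colour first, then unit type.
bool byTeamThenType(const RenderItem &a, const RenderItem &b) {
    if (a.team != b.team) return a.team < b.team;
    return a.unitType < b.unitType;
}

struct BindCounts { int teamBinds; int modelBinds; };

// Counts how many binds a 'last bound' check would let through
// when the items are rendered in the given order.
BindCounts countBinds(const std::vector<RenderItem> &items) {
    BindCounts counts = {0, 0};
    int lastTeam = -1;  // nothing bound yet
    int lastType = -1;
    for (std::size_t i = 0; i < items.size(); ++i) {
        assert(items[i].team >= 0 && items[i].unitType >= 0);
        if (items[i].team != lastTeam) {
            ++counts.teamBinds;
            lastTeam = items[i].team;
        }
        if (items[i].unitType != lastType) {
            ++counts.modelBinds;
            lastType = items[i].unitType;
        }
    }
    return counts;
}
```

For six units alternating between two teams, sorting with `std::sort(items.begin(), items.end(), byTeamThenType)` drops the team binds from 6 to 2 in this model. The real saving depends on how many textures each model actually uses.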

4. Texture Management

I shouldn't say too much here, as I'm not that familiar with what our current TextureManager does for us, but I have noticed a few things it doesn't do:

It's not grouping small textures into bigger textures, this is a must for tileset textures in order to complete task 1.
SurfaceAtlas seems to provide an interface to do just this, via addSurface(), but then it just creates a new texture for each needed tile texture, rather than grouping them all in one big texture, and setting the texture coordinates for each accordingly.
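A minimal sketch of 'setting the texture coordinates for each accordingly', assuming equally sized square tile textures packed into a square grid inside one big texture. `UvRect` and `atlasUv()` are hypothetical and are not part of SurfaceAtlas.

```cpp
#include <cassert>

// Texture coordinate rectangle of one tile inside the big texture.
struct UvRect { float u0, v0, u1, v1; };

// Tiles are assumed to be stored row by row, tilesPerRow per row,
// in a grid of tilesPerRow x tilesPerRow cells.
UvRect atlasUv(int tileIndex, int tilesPerRow) {
    assert(tilesPerRow > 0);
    assert(tileIndex >= 0 && tileIndex < tilesPerRow * tilesPerRow);
    const float step = 1.0f / tilesPerRow;
    const int col = tileIndex % tilesPerRow;
    const int row = tileIndex / tilesPerRow;
    UvRect r = { col * step, row * step, (col + 1) * step, (row + 1) * step };
    return r;
}
```

For example, `atlasUv(5, 4)` gives the cell from (0.25, 0.25) to (0.5, 0.5). With filtering or mipmaps, neighbouring cells can bleed into each other at the edges, so a real atlas would likely need some padding or a small inset on these coordinates.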

The same might be possible to overcome the object/unit with multiple textures problem, or...

5. Texture Units

We're using one texture unit for all our model rendering... one for the fog of war, and one for shadow mapping (team colour uses the fog of war unit).

That's 3. I'm willing to wager most of the games of Glest played today are on OpenGL implementations offering more than 3 texture units. We should check how many there are, and use them ALL!

Indeed, if sufficient texture units are available, the tileset objects with 5 textures problem is solved, use a texture unit for each, and then my ugly sorting code from above might actually be beneficial.
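A minimal sketch of planning 'a texture unit for each', assuming the unit count has already been queried and that some units stay reserved (for fog of war and shadow mapping, say). `assignSlots()` is hypothetical, and the slots are just indices among the free units, not real GL unit numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each mesh texture, picks the free slot it would stay bound to.
// Meshes sharing a texture share a slot. Returns false (and clears slotOf)
// if the model needs more slots than are free.
bool assignSlots(const std::vector<unsigned int> &meshTextures,
                 int maxUnits, int reservedUnits, std::vector<int> &slotOf) {
    assert(maxUnits >= 0 && reservedUnits >= 0);
    const int freeUnits = maxUnits - reservedUnits;
    std::vector<unsigned int> distinct; // distinct[i] lives in slot i
    slotOf.clear();
    for (std::size_t m = 0; m < meshTextures.size(); ++m) {
        std::size_t slot = 0;
        while (slot < distinct.size() && distinct[slot] != meshTextures[m]) {
            ++slot;
        }
        if (slot == distinct.size()) {
            distinct.push_back(meshTextures[m]);
        }
        slotOf.push_back(static_cast<int>(slot));
    }
    if (static_cast<int>(distinct.size()) > freeUnits) {
        slotOf.clear();
        return false;
    }
    return true;
}

// With a GL context, the fixed-function unit count can be queried with:
// GLint maxUnits = 0;
// glGetIntegerv(GL_MAX_TEXTURE_UNITS, &maxUnits);
```

A model with textures {7, 9, 7, 3} needs three slots, so it fits with 8 units and 2 reserved but not with 4 units. Selecting the right unit per mesh is still a state change of its own, so whether this beats rebinding would need profiling.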

daniel.santos · « **Reply #1 on:** 25 September 2009, 18:31:44 »

Quote from: silnarm on 23 September 2009, 06:08:01

1. Renderer::renderWater() and Renderer::renderSurface()

Very goddam cool! (I'm in a real cussing mood today.) I never spent too much time trying to figure out why my frame rates went to hell whenever I zoomed the camera out into space. After analyzing the particle system code (oh, and I modified it even more in the now deprecated 0.3.x branch), I've come to appreciate how so many particles can be drawn so efficiently, since the data for them is buffered up and then sent to the OpenGL engine all at once (well, in batches of up to 1024; if there are more than that, it will use separate calls). So this solution makes a whole lotta sense.

Quote from: silnarm on 23 September 2009, 06:08:01

2. What's your state?

I have an idea for this. Let's create our own class to encapsulate this and use it exclusively to change state. Why? Well for one, it'll help debugging when we can just look at it to see what rendering state bits are set and not set.

Now for debug support, I'm not sure how far we should go, I have to believe that people came up with nifty stuff like this already (I was looking at this BuGLe thing yesterday). So I'm thinking also force all calls to glEnable and glDisable through it.

Quote from: silnarm on 23 September 2009, 06:08:01

3. Sort objects/units before rendering, to minimise calls to glBindTexture()
4. Smarter texture management
5. Use all available texture units

Nice work!

Please keep this all in a separate branch so we can stick it into 0.2.13 whenever it looks good. (Do you have a branch for it already?) I need to make another profile build from 0.2.12a or some such and re-examine some of these older issues I had planned to work on. But here's some other stuff:

Let's re-check computeVisibleQuads() because I don't remember if it ever got properly fixed. It was screwed up to begin with, but didn't matter because of the restrictions on camera movement. When I modified the camera movement, I tried to enhance it and only managed to hack it to work "reasonably." I think it was Omega that told me what the trig was for it, but I don't remember if it ever made it into the function or not. This could mean that the visibleQuad is larger than it needs to be. I suppose it wouldn't be too hard to do something like modify the PoV and make sure we see un-rendered tiles around the border of the screen.

Finally, I hate Renderer::renderUnitsFast(), because it calculates all vertices for all visible units (interpolating between two models) only to do it again (iirc) once they are actually rendered. This also gets called whenever you left click and it doesn't hit a 2D GUI object. It sounds like we're going to get a larger performance boost from what you've listed here, but I still want to revamp this (the details are in some old thread somewhere) so that we aren't doing more of these calculations than we need to. Admittedly, with SSE2, this is pretty darn fast these days, resulting in some 6-ish CPU instructions per vertex (maybe less, not sure -- because it will do them 4 floats at a time; even though they are Vec3<float>, since there is no other data in that class, it just looks like a huge array of floats to the compiler). Nonetheless, implementing these optimizations should reduce the number of CPU cycles per rendering frame by several thousand.
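A minimal sketch of the 'interpolate once, use twice' idea, assuming vertices are stored as a flat array of floats (x, y, z, x, y, z, ...) and that a frame counter is available. `interpolateFrames()` and `InterpolationCache` are hypothetical, not the engine's interpolation code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear interpolation between two key frames: out = a + t * (b - a).
void interpolateFrames(const std::vector<float> &a, const std::vector<float> &b,
                       float t, std::vector<float> &out) {
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + t * (b[i] - a[i]);
    }
}

// Remembers the result for the current frame, so a second request in the
// same frame (e.g. the 'fast' pass and the real pass) reuses it.
struct InterpolationCache {
    int frame;                // frame the cached vertices belong to
    int computeCount;         // how often we really interpolated
    std::vector<float> verts; // cached result

    InterpolationCache() : frame(-1), computeCount(0) {}

    const std::vector<float> &get(int currentFrame, const std::vector<float> &a,
                                  const std::vector<float> &b, float t) {
        if (frame != currentFrame) {
            interpolateFrames(a, b, t, verts);
            frame = currentFrame;
            ++computeCount;
        }
        return verts;
    }
};
```

This assumes the interpolation factor for a unit does not change within a frame; one cache per unit (or per mesh) would be needed in practice.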

EDIT: hmm, we could even do something ugly like this

Code:

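// in opengl.h or some such
class OurGlProxy {
public:
    // Each method would record the change, then forward to the real GL call.
    void enable(GLenum cap);
    void disable(GLenum cap);
    void pushClientAttrib(GLbitfield mask);
    // etc...
};

extern OurGlProxy globalGlProxyObject;

// Redirect raw GL calls to the proxy. No trailing semicolons, so the macros
// still behave inside if/else. The proxy's own .cpp must not see these
// macros (or must #undef them), otherwise it would end up calling itself.
#define glEnable(a) globalGlProxyObject.enable(a)
#define glDisable(a) globalGlProxyObject.disable(a)
#define glPushClientAttrib(a) globalGlProxyObject.pushClientAttrib(a)
// etc...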

daniel.santos · « **Reply #2 on:** 25 September 2009, 20:56:56 »

Please check out these two related bugs:
Bug #6: MaxRenderDistance setting is not yet doing anything.
Bug #58: Graphic: rough map-edges when camera gets tilted

Here is my info on the interpolation stuff (lower priority): Bug #41: Performance improvements: Interpolation
Finally, while you're screwing with textures, if you see an easy opportunity for this, then go for it: Bug #63: Compressed images not supported by engine.

silnarm · « **Reply #3 on:** 27 September 2009, 05:51:39 »

Quote from: daniel.santos on 25 September 2009, 18:31:44

Quote from: silnarm on 23 September 2009, 06:08:01
2. What's your state?

I have an idea for this. Let's create our own class to encapsulate this and use it exclusively to change state. Why? Well for one, it'll help debugging when we can just look at it to see what rendering state bits are set and not set. Now for debug support, I'm not sure how far we should go, I have to believe that people came up with nifty stuff like this already (I was looking at this BuGLe thing yesterday). So I'm thinking also force all calls to glEnable and glDisable through it.

This is a great idea, the implications for debugging and diagnostics are just too nice to ignore!

Quote from: daniel.santos

Please keep this all in a separate branch so we can stick it into 0.2.13 whenever it looks good. (Do you have a branch for it already?) I need to make another profile build from 0.2.12a or some such and re-examine some of these older issues I had planned to work on. But here's some other stuff:

There is no branch yet, and may not be for a while... none of the tasks are going to be all that easy, and I don't really have the time to be playing with the Renderer as well atm

Task 5 is probably the easiest and could well result in fixing the problem with renderObjects() [at least for most users] so I might try that at some stage soon, but the others will have to wait.

Quote from: daniel.santos

Let's re-check computeVisibleQuads() because I don't remember if it ever got properly fixed.

Ok, I've got some half decent code to render a 'debug overlay' over the terrain now. I'll rig it up to highlight the current visibleQuad in response to some command key, then we take a 'snap-shot' of the visibleQuad, then move about (carefully) to see the extents... It won't actually fix anything of course, but it will tell us if it needs fixing, and if it does need fixing, the overlay will be invaluable in confirming our success

Quote from: daniel.santos

Finally, I hate Renderer::renderUnitsFast(), because it calculates all vertices for all visible units (interpolating between two models) only to do it again (iirc) once they are actually rendered. This also gets called whenever you left click and it doesn't hit a 2D GUI object.

If that is the case, obviously it could be done better. This will definitely need looking at I think, it may only be a tiny fraction of CPU time overall, but if we can shorten the time taken by any lengthy 'calculations' like this, all the better.

Quote

EDIT: hmm, we could even do something ugly like this

Not a bad idea... we could get everything 'at once' and be sure we did...

silnarm · « **Reply #4 on:** 27 September 2009, 11:55:11 »

6: fix computeVisibleQuads()

http://i687.photobucket.com/albums/vv231/silnarm/glest/quad_test2.jpg

The visibleQuad was captured here, I put initiates in the corners to give a reference...
Then zoomed out and panned around a little...

http://i687.photobucket.com/albums/vv231/silnarm/glest/quad_test3.jpg

Now that's what you call broken

Edit 20/12: removed inline images

daniel.santos · « **Reply #5 on:** 27 September 2009, 14:34:41 »

Yea, so it looks like it was rendering at *least* 2.5x what it needed to. Sorry about that, and omega posted the formula to fix it over a year ago now =)

EDIT: whoops, it was a guy named Duke, not Omega

https://forum.megaglest.org/index.php?topic=3229.msg14540#msg14540

EDIT2: And here's Martiño's 2005 explanation of the original computeVisibleQuads() function (which didn't work right after I allowed more movement of the camera) https://forum.megaglest.org/index.php?topic=1034.0

EDIT 3.14159265:
So I was looking at the code for that function and apparently, I gave up on my code and left it commented out and I'm using Martiño's original formula. If you end up re-writing it, maybe you can have it figure the PoV and maxRenderingDistance, so that when we're using a camera angle that's more side-looking than down, we don't render stuff that is supposed to be too far away to render (which is what that setting was supposed to be for). For maxRenderingDistance to be truly effective, there would have to be other stuff as well, but making sure that it correctly computes this part is important.

silnarm · « **Reply #6 on:** 20 December 2009, 04:51:10 »

Well, after playing with the original code a bit, and figuring out how to have it take into account the camera's vAng in a similar fashion, I decided it wasn't really accurate enough anyway, and contained too many 'magic numbers'.

So I came up with my own mad scheme, it's a lot more code than the old, but it's accurate, barring a few niggling issues of course

So maxRenderDistance is now of use, and the default setting causes the terrain to cut off quite sharply when the camera is angled to look at a lot of the map (though it's not as bad as the effect from the old visibleQuad method).
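A minimal sketch of one way a render distance limit can bound the visible area; this is not the scheme described in this post, only an illustration. It assumes flat ground at y = 0, a camera above it, and a limit measured horizontally from the camera; `Vec3` and `groundPoint()` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Where a view ray (e.g. through a screen corner) meets the ground plane
// y = 0. If it misses the ground, or hits it further away than maxDist
// (measured horizontally from the eye), the point is clamped to maxDist
// along the ray's horizontal direction.
Vec3 groundPoint(const Vec3 &eye, const Vec3 &dir, float maxDist) {
    assert(eye.y > 0.0f && maxDist > 0.0f);
    const float hLen = std::sqrt(dir.x * dir.x + dir.z * dir.z);
    if (dir.y < 0.0f) {
        const float t = -eye.y / dir.y; // ray parameter at y = 0
        if (t * hLen <= maxDist) {
            Vec3 hit = { eye.x + t * dir.x, 0.0f, eye.z + t * dir.z };
            return hit;
        }
    }
    // Looking at or above the horizon, or the hit is too far away.
    assert(hLen > 0.0f);
    const float s = maxDist / hLen;
    Vec3 clamped = { eye.x + s * dir.x, 0.0f, eye.z + s * dir.z };
    return clamped;
}
```

Running this for the four corner rays of the view and taking the bounds of the results would give a quad that never reaches past the limit, however shallow the camera angle.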

So, there are a few issues that really come to the fore now... terrain rendering is slow, so rendering lots of it is not really an option atm, and a naive attempt to get all the tile textures on the same physical texture a month or two back didn't end well. A less naive attempt may now be necessary.

For 'far-away' objects when the camera is low, I think billboarding will be required, we may be able to adapt renderUnitsFast() & renderObjectsFast() (which currently render the shadow texture, which is similar to what we want here).

Neither is going to be easy, but I've opened this can of worms now, so I guess I'll finish up the Path finder, and then turn the 'majority' of my attention back to the renderer again.
