I spent some time familiarising myself a bit more with multipass rendering. I set up a very simple test application where I could do live editing of the major settings and see the effect.
This paid off quickly: I figured out that my equation for blending in the lightmaps was wrong. The screenshot above shows the right settings (the top half for the base texture, the bottom half for the lightmap). The screenshot below shows the result when applied to a Quake level.
The scene is much more vibrant now. I think this is how it should have looked from the start. But it's hard to find this kind of information online. It really takes some experimenting to figure it out.
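For anyone else searching, here is a small CPU-side sketch of what a blend setting computes in a two-pass setup. It does not reproduce the settings from my screenshot. It assumes the common multiplicative choice for the lightmap pass, which in OpenGL would be glBlendFunc(GL_DST_COLOR, GL_ZERO). The names Rgb, Factor, factorValue, blendChannel and blend are made up for this illustration and only cover four of the real blend factors.

```cpp
#include <algorithm>

// One colour with channels in [0, 1], as the blend stage sees it.
struct Rgb { float r, g, b; };

// Hypothetical stand-ins for GL_ZERO, GL_ONE, GL_SRC_COLOR and GL_DST_COLOR.
enum class Factor { Zero, One, SrcColor, DstColor };

// Turn a factor name into the per-channel multiplier it stands for.
Rgb factorValue(Factor f, const Rgb& src, const Rgb& dst) {
    switch (f) {
        case Factor::Zero:     return Rgb{0.0f, 0.0f, 0.0f};
        case Factor::One:      return Rgb{1.0f, 1.0f, 1.0f};
        case Factor::SrcColor: return src;
        case Factor::DstColor: return dst;
    }
    return Rgb{0.0f, 0.0f, 0.0f};
}

// The default (additive) blend equation for one channel, clamped to [0, 1].
float blendChannel(float src, float srcFactor, float dst, float dstFactor) {
    return std::min(1.0f, std::max(0.0f, src * srcFactor + dst * dstFactor));
}

// src is the incoming fragment (second pass: the lightmap texel),
// dst is what is already in the framebuffer (first pass: the base texture).
Rgb blend(const Rgb& src, const Rgb& dst, Factor srcFactor, Factor dstFactor) {
    const Rgb sf = factorValue(srcFactor, src, dst);
    const Rgb df = factorValue(dstFactor, src, dst);
    return Rgb{blendChannel(src.r, sf.r, dst.r, df.r),
               blendChannel(src.g, sf.g, dst.g, df.g),
               blendChannel(src.b, sf.b, dst.b, df.b)};
}
```

With Factor::DstColor and Factor::Zero, the result is simply lightmap times base, so a lightmap texel of 0.5 halves the base colour. With Factor::One and Factor::One the two are added and clamped, which washes out quickly. Trying a few combinations like this on paper is a cheap way to sanity-check a setting before looking at it on screen.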
I also put some work into improving performance. I figured that switching from vertex arrays to vertex buffer objects would be a smart thing to do. This way I can upload the scene data to the graphics card just once, and reuse it as needed. No more continuous streaming of vertex array data. Everything is reduced to a bound id and an offset.
The result isn't bad: an increase in fps of about 50%. Where I got 100fps before, I now see 150fps, which means the time per frame dropped from 10ms to roughly 6.7ms. In some locations it goes up to about 200fps.
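The "bound id and an offset" idea can be sketched without any OpenGL at all. What follows is a minimal sketch of the bookkeeping, not my actual renderer: all surfaces are packed into one contiguous array, and for each surface only the starting index and vertex count are remembered. The Vertex layout and the names Surface, DrawRange, PackedScene, packScene and byteOffset are assumptions made up for this example. The OpenGL calls named in the comments are the standard ones, but they are not executed here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position, base texture coords, lightmap coords.
struct Vertex {
    float x, y, z;
    float u, v;
    float lu, lv;
};

// Hypothetical input: one polygon of the level with its own vertex list.
struct Surface {
    std::vector<Vertex> vertices;
};

// Where a surface lives inside the shared buffer. These two numbers are
// what would later go into glDrawArrays(mode, first, count).
struct DrawRange {
    int first;
    int count;
};

struct PackedScene {
    std::vector<Vertex> vertices;   // uploaded once, e.g. with glBufferData
    std::vector<DrawRange> ranges;  // one entry per surface, in input order
};

// Append every surface to one big array and remember where each one starts.
PackedScene packScene(const std::vector<Surface>& surfaces) {
    PackedScene packed;
    for (const Surface& surface : surfaces) {
        DrawRange range;
        range.first = static_cast<int>(packed.vertices.size());
        range.count = static_cast<int>(surface.vertices.size());
        packed.ranges.push_back(range);
        packed.vertices.insert(packed.vertices.end(),
                               surface.vertices.begin(),
                               surface.vertices.end());
    }
    return packed;
}

// Byte offset of a surface inside the buffer, for calls that take an
// offset in bytes instead of a first-vertex index.
std::size_t byteOffset(const DrawRange& range) {
    return static_cast<std::size_t>(range.first) * sizeof(Vertex);
}
```

At load time the packed array would go to the card once (glGenBuffers, glBindBuffer, glBufferData with GL_STATIC_DRAW). Per frame, drawing a surface then only needs the buffer bound and its DrawRange, so no vertex data crosses the bus again.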
Next I should really do some cleanup of my code. All the hacking and experimenting has degraded the quality somewhat.