Wow, it is huge!
Do you already have cut scenes?
It looks like it in the last screenshot.

Regarding the frames per second:
You need a LOD system (LOD = level of detail). I don't know whether you have already looked into it.
For the LOD system, you need to split your town into level pieces and put them together in a main level, meaning you place the level parts in that level the same way you place models. You need several versions of each level part (.wmb): one with all the details, one with fewer, and one without any details.
To reduce the number of active entities, you can also switch them off when they are far away, i.e. stop their action depending on their LOD level.
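Roughly what I mean, as a plain C sketch. This is not engine code: the type names, the function names and the two distance limits are made up for illustration, and you would have to hook it into your own entity actions. It picks one of three detail levels from the distance to the camera and tells you whether the entity should keep running its action.

```c
#include <assert.h>

/* Hypothetical detail levels, one per version of a level part. */
typedef enum { LOD_FULL = 0, LOD_REDUCED = 1, LOD_NONE = 2 } LodLevel;

/* Hypothetical position type. */
typedef struct { float x, y, z; } Vec3;

/* Squared distance between two points (avoids a square root). */
float distance_squared(Vec3 a, Vec3 b)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* Choose a detail level from the distance between camera and entity.
   near_limit and far_limit are assumed tuning values in world units. */
LodLevel lod_for_position(Vec3 camera, Vec3 entity,
                          float near_limit, float far_limit)
{
    float d2;

    assert(near_limit >= 0.0f && near_limit < far_limit);

    d2 = distance_squared(camera, entity);
    if (d2 < near_limit * near_limit) {
        return LOD_FULL;     /* close: all details */
    }
    if (d2 < far_limit * far_limit) {
        return LOD_REDUCED;  /* medium distance: fewer details */
    }
    return LOD_NONE;         /* far away: no details */
}

/* An entity only runs its action while it is not at the lowest level. */
int entity_should_act(LodLevel lod)
{
    return lod != LOD_NONE;
}
```

In an entity's action you would check entity_should_act() once per frame and skip the expensive part (movement, animation, AI) when it returns 0.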

Have a look in the manual under LOD, d3d_lodfactor, level of detail...

A different idea is this:
You could build each street separately as a .wmb, put them all into one level, and then test whether an entity is in the same street as the camera. If it is, switch its action on; if not, switch it off. Something like this. I hope you get the idea.
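Here is a small plain C sketch of the street test, again not engine code. I assume that every street can be described by a simple rectangle on the ground and that each entity has an on/off flag. All names (StreetBounds, Actor, street_at, update_actors) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ground rectangle covering one street. */
typedef struct { float min_x, min_y, max_x, max_y; } StreetBounds;

/* Hypothetical entity: a ground position plus an on/off flag. */
typedef struct { float x, y; int active; } Actor;

/* Index of the first street whose rectangle contains (x, y), or -1
   if the point lies in no street. */
int street_at(const StreetBounds *streets, size_t count, float x, float y)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (x >= streets[i].min_x && x <= streets[i].max_x &&
            y >= streets[i].min_y && y <= streets[i].max_y) {
            return (int)i;
        }
    }
    return -1;
}

/* Switch each actor on only if it is in the same street as the camera. */
void update_actors(const StreetBounds *streets, size_t street_count,
                   Actor *actors, size_t actor_count,
                   float cam_x, float cam_y)
{
    size_t i;
    int cam_street;

    assert(streets != NULL && actors != NULL);

    cam_street = street_at(streets, street_count, cam_x, cam_y);
    for (i = 0; i < actor_count; i++) {
        int s = street_at(streets, street_count, actors[i].x, actors[i].y);
        actors[i].active = (cam_street >= 0 && s == cam_street);
    }
}
```

If the camera is in no street at all, this version switches everything off. You may prefer to keep the last known street instead.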