Originally Posted By: MasterQ32
In some cases I also had the impression that CLR code (C#, ...) actually runs faster than a fully optimized C++ program. I can't remember exactly what it was, but it was a simple algorithm.


I know I'm not exactly going into what you said, but it's still important and tangentially related to simple algorithms: the thing most people forget is that CPUs are insanely fast because they make assumptions about your code, and fancy algorithms only really pay off for large values of n. An array will most likely outperform a hash table for up to a hundred entries or so, simply because a linear search through an array is much friendlier to the cache and the branch predictor. A lookup in a hash table may very well be O(1), but that doesn't buy you much if it means the CPU can't predict what's going to happen and can't effectively utilize the pre-fetcher.

Yes, RAM access is fast on a human scale, but it's incredibly slow at the CPU level, and stalling on a RAM access can easily blow your performance budget right out of the window. CPU caches, on the other hand, are insanely fast, and the closer they are to the actual core, the faster they get. If you play by the CPU's rules and effectively utilize the cache, performance will go through the roof.

A very real example is this now-famous Stack Overflow question and answer: http://stackoverflow.com/questions/11227...-unsorted-array

Anyway, since we are already walking down this rabbit hole, here is another thing:

Originally Posted By: Wjbender
And I just want to say that you can thread your functions in Lite-C by using the Windows API too. As long as you keep it thread safe and avoid engine functions, you should be able to thread those complex functions and simply wait(1) until the thread is done.


That is true, but threads are expensive! We are talking ridiculous amounts of expensive here: the OS has to allocate resources for the thread and initialize it, and neither of those is trivial. So whatever work you want to perform in the new thread has to cost more than that overhead right out of the gate, or you will spend more time twiddling your thumbs while the new thread spawns than you save.

On top of that, you introduce a lot of complexity into your code, because most of the time you don't want a thread that just fucks around on its own; you want to do something with the work it performed, so you'll have to deal with synchronizing two or more threads. And that is a whole bag of pain to get yourself into.

Next up is the issue that not everything is actually threadable! There is a category of problems called 'embarrassingly parallel' workloads, which are so trivially easy to multithread that there is almost never a reason not to do it. One example would be calculating the histogram of a picture: you can trivially divide the picture into pieces, let each thread crunch on one piece, and then, in a last step, combine all the results. This is so trivial to multithread because no piece of work depends on any other piece, i.e. there is nothing stopping them from being done in parallel. On the other hand, you have things like cryptographic hashes, where each step of the work depends on the result of the previous one, so you can't parallelize that work at all.

Now, most of the time you will find work that sits somewhere in between, and for that you'll have to test and see whether doing it in parallel is actually worthwhile.

A way to help with that is to employ a thread pool: instead of creating n threads, assigning them work and then joining those n threads, you have a thread pool which maintains its own set of worker threads and which you hand a bunch of tasks, which the pool then divides amongst those threads. In theory that's simple; in practice, not so much. I spent a good month working on a new thread pool system for an unannounced 2nd version of a certain game engine, and it's very easy to end up with a system where scheduling jobs and dividing them between the threads is more expensive than actually performing the work.

Oh, and it's quite easy to thrash the CPU cache even harder when multiple threads contend for the same data.


Shitlord by trade and passion. Graphics programmer at Laminar Research.
I write blog posts at feresignum.com