Gamestudio Links
Zorro Links
Newest Posts
Change chart colours
by 7th_zorro. 05/11/24 09:25
Data from CSV not parsed correctly
by dr_panther. 05/06/24 18:50
Help with plotting multiple ZigZag
by degenerate_762. 04/30/24 23:23
AUM Magazine
Latest Screens
The Bible Game
A psychological thriller game
SHADOW (2014)
DEAD TASTE
Who's Online Now
2 registered members (AndrewAMD, 1 invisible), 1,086 guests, and 5 spiders.
Key: Admin, Global Mod, Mod
Newest Members
firatv, wandaluciaia, Mega_Rod, EternallyCurious, howardR
19050 Registered Users
Previous Thread
Next Thread
Print Thread
Rate Thread
The truth, what really happens and what you've read on the Forum #421845
04/26/13 08:14
04/26/13 08:14
Joined: Apr 2007
Posts: 3,751
Canada
WretchedSid Offline OP
Expert
WretchedSid  Offline OP
Expert

Joined: Apr 2007
Posts: 3,751
Canada
This post is all about functions. It's the definite function guide, because there seems to be a lot of confusion and false information circling around this forum, and I'm afraid I'm partially to blame for that. So even though I said that I was going to leave the Forum, I feel like I should clean this up, it's not rocket surgery and even though I do go into the nasty details, everyone should be able to pick up the important bits (and you are free to skim over the details)

First things first: Functions aren't evil. Let's get this straight, calling a function isn't expensive in terms of CPU clock cycles. If you learn one thing from this post, than it HAS to be this bit. Truth be told, it's incredibly hard to get a CPU to its limit, even when using something like the rdrand instruction on the new Ivy Bridge CPUs, which takes roughly 100 cycles according to Intel, but out of order execution, superscalar CPUs and pipelining can minimize if not completely negate any impact, and even if they can't, 100 cycles on a 2.4 GHz CPU are nothing.

Here is the thing though, if you try to get a CPU to its limit, then you aren't going to have hard time doing so. Bringing a CPU to its limit while playing after the rules, that's hard, but you can stall a CPU in no time if you really want to (or if you just program badly). The easiest way to do so is via memory access, which is quite ironically given that it's often possible to trade memory in for performance. The problem is that memory access is inherently slow, although RAM access times are blazingly fast in our context, 100ns to wait just for the data to become available is incredibly slow in terms of a CPU. If the CPU can't do anything in the meantime, it will completely stall for 100ns, wasting precious clock cycles. Do this often enough, and you will waste time like there is no tomorrow. For this very reason, CPUs use layered caches, usually you have L1 and L2 cache, and more recently L3 and L4 cache. The lower the number of the cache, the faster it is (eg. the L1 cache usually operates at the same frequency of the CPU and is directly integrated into it, so there is no slow bus or anything between the data and the CPU). The problem is that fast memory is incredibly expensive, so you usually only have a few kb. L1 and L2 cache, and the slower caches have a bit more. The CPU will try to hold all important data as close to is as possible, so that it doesn't have to go and fetch it from main memory, and modern CPUs will also try to predict what your program is going to do next and pre-fetch memory into its caches so you have as little delay as possible (just for reference, L1 cache lookups take about 0.5ns while L2 cache lookups take about 7ns).

So, what exactly does this have to do with functions? At the first glance nothing, until you realize how functions work. The CPU holds a pointer to the next instruction it will execute, when it executes that instruction, it will move the pointer to the next instruction and so on. When you call a function, the flow changes, after the call instruction, the instruction pointer will point to the beginning of the called function. And this is where things get tricky, the jump of execution is most likely to a part of the program that doesn't reside in the cache (keep in mind, the fast caches are limited in size), so if the CPU can't predict the jump, it will result in a cache miss and it has to get the data from memory, which means the CPU has to stall. In this case, it really has to stall, because even with out of order execution and all tricks, if the code the CPU should crunch on isn't available, it has to wait for it.

Because waiting is expensive, and function calls are something that are supposed to happen a few times in normal execution, CPUs will employ a variety of tricks to try and predict the future. Of course, because jumps and calls usually depend on the result of something that has to be computed first, CPUs can't guess with 100% certainty, and mispredictions happen. Most of the time, the CPU will guess right though, but there are a few edge cases where the CPU is simply not able to guess anything meaningful (and the normal "educated guess" becomes a "well, it's sunny, so…"); For instance, if you have a bunch of random numbers with uniform distribution ranging from eg 0.0 to 1.0, and then you try to split them into two arrays, the first one containing everything from 0.0 to 0.5 and the second one from 0.51 to 1.0, you will get the CPU in deep trouble. Here is some code example;

Code:
float *list = …; // assume 10000 random floats with uniform distribution
float *left, *right; // assume arrays with enough storage to hold the numbers

for(int i=0; i<10000; i++)
{
	if(list[i] <= 0.5f)
	{
		*left ++ = list[i];
	}
	else
	{
		*right ++ = list[i];
	}
}



There is no obvious pattern for the CPU to guess, because the data is random. It's possible to speed this up by a huge factor though, and that is sorting the input first! If the input is sorted, in ascending or descending order, the CPU will see a pattern; Roughly the first half will go straight in one array, so the CPU will just assume that it's always going to take that branch. Then, after roughly the first half, it suddenly switches because the values will become > 0.5, which will first lead to a few mispredictions that are unavoidable, and then the CPU will catch up and predict the other branch.

So, if your program behaves randomly, the CPU will be unable to predict your program flow, and you will end up with mispredictions and ultimately cache misses and memory fetches. If however you can make sure that your program behaves reasonable, then the CPU can take advantage of this is make the execution of your program faster.

But, there is one more thing. The notorious wait() function. The evil of all evils, the one function that makes your program slow as fuck. Truth be told, it doesn't really, and having a few wait()'s in your code won't make your program incredibly slow. The problem with wait however is the following; It is implemented in a way that makes it incredibly hard for the CPU to guess what's going to happen next. If you have lot of functions that use wait(), you will end up with a lot of cache misses. That is not because JCL can't program, but is inherent to the nature of wait, it's nearly impossible to avoid. There ARE cases where you can't give any hints to the CPU and will end up with cache misses, no matter how nice you play. Wait() is one of these cases.

Last but not least though, before you now start to refactor your code and try to code "smart", keep the following in mind: Don't optimize until there is a need to. Write your code simple, and if you have a performance problem, find out what is making it slow and then optimize these parts! Premature optimization is the root of all evil, and may result in not only slower code, but also incredibly hard to maintain one. When writing code, you should follow the KISS principle (Keep It Simple Stupid)!


Shitlord by trade and passion. Graphics programmer at Laminar Research.
I write blog posts at feresignum.com
Re: The truth, what really happens and what you've read on the Forum [Re: WretchedSid] #421847
04/26/13 08:33
04/26/13 08:33
Joined: Nov 2007
Posts: 2,568
Germany, BW, Stuttgart
MasterQ32 Offline
Expert
MasterQ32  Offline
Expert

Joined: Nov 2007
Posts: 2,568
Germany, BW, Stuttgart
Thanks for the explanation, Sid!


Visit my site: www.masterq32.de
Re: The truth, what really happens and what you've read on the Forum [Re: MasterQ32] #421849
04/26/13 09:06
04/26/13 09:06
Joined: Mar 2011
Posts: 3,150
Budapest
sivan Offline
Expert
sivan  Offline
Expert

Joined: Mar 2011
Posts: 3,150
Budapest
nice little article, thanks.

and never forget: eat spaghetti, not code! (finally it will help code maintenance and to avoid the wait() related problems you mentioned)


Free world editor for 3D Gamestudio: MapBuilder Editor

Moderated by  HeelX, Lukas, rayp, Rei_Ayanami, Superku, Tobias, TWO, VeT 

Gamestudio download | chip programmers | Zorro platform | shop | Data Protection Policy

oP group Germany GmbH | Birkenstr. 25-27 | 63549 Ronneburg / Germany | info (at) opgroup.de

Powered by UBB.threads™ PHP Forum Software 7.7.1