texcoord3 dx9 prototype
#382296
09/06/11 22:40
|
|
Joined: Sep 2011
Posts: 13
OP
Newbie
texcoord3 dx9 prototype
Greetings,
I used to be pretty active on this forum under the alias "foxfire" - sadly I have forgotten my password and no longer have access to my old email so I made this new account.
Anyway, these are rendered dynamically in real time with the A7 Gamestudio engine. This was the prototype for my own engine API and has served its purpose well. The prototype is now retired and I use my own proprietary API in dx11.
If you have any questions or need help coding please ask me! I am always glad to help =]
Also, sorry for the LONG absence from here - I've been VERY busy. DX
-Texcoord3-
Last edited by texcoord3; 09/06/11 23:23.
|
|
|
Re: texcoord3 dx9 prototype
[Re: WretchedSid]
#386924
11/11/11 07:40
|
Joined: Mar 2002
Posts: 1,774 Magdeburg
FlorianP
Serious User
|
Sure, you can write more optimized code in assembler than in C, and that's kind of the point. You don't write a C snippet and go like 'hey, now I'm gonna compile this by hand'... Today's desktop CPUs (all CISC anyway) still have addressing modes and tricks in general that you couldn't dream of in C. It needs a buttload of experience, though, but it's fairly possible!
Last edited by FlorianP; 11/11/11 07:41.
|
|
|
Re: texcoord3 dx9 prototype
[Re: WretchedSid]
#386936
11/11/11 11:18
|
Joined: Mar 2002
Posts: 1,774 Magdeburg
FlorianP
Serious User
|
Automated optimization has been a topic in theoretical computer science for a very long time. The fact is that every optimization done by today's machines is obviously imperfect. (In this case) every compiler has to make assumptions based on the higher-level language it is compiling. I remember one of the first examples we had in theoretical CS was a floating-point operation where the compiler has to assume that you need, let's say, 32-bit precision. The compiler has no way of knowing that you might need less, and that is where the optimization ends. Though this very basic and very old example might not hold for every machine, there's tons of literature about this. In fact, this exact problem was the reason for (re-)inventing RISC CPUs, which try to minimize such problems. If you're really interested in this topic, I suggest you read some books about formal languages and numerical optimization, for instance.

EDIT: You might try this: http://en.wikipedia.org/wiki/Kahan_summation_algorithm
float KahanSum
(
    const float *data,
    int n
)
{
    float
        sum = 0.0f,
        C = 0.0f,
        Y,
        T;
    for (int i = 0 ; i < n ; ++i)
    {
        Y = *data++ - C;
        T = sum + Y;
        C = T - sum - Y;
        sum = T;
    }
    return sum;
}
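float AsmSum
(
    const float *data,
    int n
)
{
    float
        result = 0.0f;
    _asm
    {
        mov     esi,data            ; esi = pointer to the current element
        mov     ecx,n               ; ecx = loop counter
        fldz                        ; sum = 0
        fldz                        ; C = 0, so st(0) = C and st(1) = sum
        test    ecx,ecx
        jle     done                ; n <= 0: skip the loop, 'loop' would wrap around
    l1:
        fsubr   dword ptr [esi]     ; st(0) = Y = *data - C
        add     esi,4               ; ++data
        fld     st(0)               ; duplicate Y
        fadd    st(0),st(2)         ; st(0) = T = sum + Y
        fld     st(0)               ; duplicate T
        fsub    st(0),st(3)         ; st(0) = T - sum
        fsub    st(0),st(2)         ; st(0) = C = T - sum - Y
        fstp    st(2)               ; overwrite Y with C, pop
        fstp    st(2)               ; overwrite sum with T, pop
        loop    l1                  ; --ecx, repeat while ecx != 0
    done:
        fstp    result              ; pop C (overwritten by the next store)
        fstp    result              ; pop sum into result
    }
    return result;
}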
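For readers who want to see what the compensation term buys, here is a minimal sketch in portable C that puts a plain summation next to the Kahan version. The names `naive_sum` and `kahan_sum` are made up for this illustration and are not from the post. Note that this only behaves as intended when the compiler is not allowed to reassociate floating-point math; options such as GCC/Clang's `-ffast-math` permit simplifying the compensation away.

```c
#include <stddef.h>

/* Plain left-to-right summation: rounding errors accumulate. */
float naive_sum(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

/* Kahan summation: c carries the rounding error of the previous addition. */
float kahan_sum(const float *data, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;
    for (size_t i = 0; i < n; ++i)
    {
        float y = data[i] - c;   /* apply the correction to the next input */
        float t = sum + y;       /* low-order bits of y may be lost here   */
        c = (t - sum) - y;       /* recover what was lost                  */
        sum = t;
    }
    return sum;
}
```

Feeding both functions a long array of identical small values (for example a million copies of 0.1f) should show the naive result drifting away from the expected total while the Kahan result stays close to it.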
Last edited by FlorianP; 11/11/11 11:26.
|
|
|
Re: texcoord3 dx9 prototype
[Re: FlorianP]
#386937
11/11/11 11:59
|
Joined: Apr 2007
Posts: 3,751 Canada
WretchedSid
Expert
|
"Automated optimization has been a topic in theoretical computer science for a very long time. The fact is that every optimization done by today's machines is obviously imperfect."

Of course, but this doesn't mean that humans can do it any better, and that's the reason why I still doubt that texcoord3 can produce better assembly. You know, knowing that compilers aren't perfect and doing better than them are two totally different pairs of shoes. No doubt there are people who can do this, but what are the odds that one of them is here in the forum, where most can't even write performant C code? If he had said that he optimized a very few routines in assembly, I would actually believe him, but a complete project? Do you really believe this?

About your floating point example: imo the user should know what kind of data type s/he should use in which case, and the compiler should trust that the user knows what s/he is doing. Again, it's totally easy to write horribly slow C code that still performs very badly even after the compiler has optimized it; however, that was never my point.
Last edited by JustSid; 11/11/11 12:05.
|
|
|
Re: texcoord3 dx9 prototype
[Re: WretchedSid]
#386940
11/11/11 12:42
|
Joined: Mar 2002
Posts: 1,774 Magdeburg
FlorianP
Serious User
|
I admit I have no idea what this thread is actually about, nor do I have any idea what texcoord is capable of... sorry for that. Of course you're totally right that it's bogus to write a whole project in assembler, or to think that you can even get close to the average power of compiler optimizations these days. But my point is that there are actual real-life examples where assembler has a clear advantage over C, especially in computer graphics. You are the local Apple fanboy, right? So you might have already stumbled over this: let's say you want to multiply two 32-bit floats to a 64-bit result and then get the middle 32 bits. ARM processors can do that within one clock cycle ((processor) clock; no idea whether that is the correct translation), meaning using one assembler instruction. I don't know a single C compiler that recognizes this and optimizes it correctly.
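To make the multiplication example concrete, here is a small sketch of how that operation is usually spelled in C. It rests on an assumption: the post says floats, but a 64-bit product with the middle 32 bits kept matches 16.16 fixed-point multiplication, so that is what is shown. The `fix16` type and `fix16_mul` name are hypothetical. Whether a given compiler reduces this to a single widening-multiply instruction on ARM is exactly the point under discussion and is not claimed here.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t fix16;

/* Multiply two 16.16 values: widen to 64 bits, multiply, then keep the
 * middle 32 bits of the product (bits 16..47).
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
fix16 fix16_mul(fix16 a, fix16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (fix16)(wide >> 16);
}
```

With this encoding 1.5 is 0x18000 and 2.0 is 0x20000, so `fix16_mul(0x18000, 0x20000)` yields 0x30000, which is 3.0.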
Last edited by FlorianP; 11/11/11 13:56.
|
|
|
Re: texcoord3 dx9 prototype
[Re: FlorianP]
#386955
11/11/11 17:04
|
Joined: Apr 2007
Posts: 3,751 Canada
WretchedSid
Expert
|
"You are the local Apple fanboy, right?"

Guilty as charged.

"Let's say you want to multiply two 32-bit floats to a 64-bit result and then get the middle 32 bits. ARM processors can do that within one clock cycle, meaning using one assembler instruction."

I took a quick look into the ARM ARM and couldn't find such an instruction in the NEON instruction set reference for any revision of the Cortex A8. Mind pointing me to the one you mean? Btw, do you really mean one clock cycle or one assembler mnemonic?

There are certainly compilers with an ARM backend that use the power of specialized instructions. It is indeed a very daunting task for the compiler to identify several instructions that can be replaced by SSE instructions, though.

FWIW: The * operator of my color class uses something that can be optimized by NEON quite well, by just loading both single-precision floating point vectors into NEON registers at once and then multiplying them all at once. In fact, LLVM/Clang 3.0 does this when compiling for armv7 in release mode. I'm not quite sure which parameter triggers this, but one certainly does.

Used LLVM version:
noname:~ Sidney$ clang --version
Apple clang version 3.0 (tags/Apple/clang-211.10.1) (based on LLVM 3.0svn)
Target: x86_64-apple-darwin11.2.0
Thread model: posix
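The color multiplication described above might look roughly like the following in plain C. This is only a sketch: the actual color class is not shown in the thread, so the `Color` struct and `color_mul` function are hypothetical stand-ins. The operation is four independent single-precision multiplies, which is the kind of pattern a vectorizing compiler can map onto one SIMD multiply.

```c
/* Hypothetical RGBA color with four single-precision components. */
typedef struct
{
    float r, g, b, a;
} Color;

/* Component-wise product of two colors. The four multiplies do not
 * depend on each other, so they are a candidate for a single SIMD
 * multiply when the compiler vectorizes the function. */
Color color_mul(Color x, Color y)
{
    Color out = { x.r * y.r, x.g * y.g, x.b * y.b, x.a * y.a };
    return out;
}
```

Inspecting the generated assembly (for example with the compiler's `-S` option) is the way to check whether a particular compiler and target actually vectorize it.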
|
|
|
Re: texcoord3 dx9 prototype
[Re: WretchedSid]
#386978
11/12/11 03:10
|
Joined: Sep 2011
Posts: 13
texcoord3
OP
Newbie
|
Ok, well, I am no God and I am not the most experienced programmer in the world.
BUT, I have tested various code snippets against gcc and vc++ (yes, it's C++, but anyway...).
I've noticed that I can code very complex optimizations in assembly that the compilers destroy, mainly when it comes to accessing memory.
For example, my prime optimization is a ray-triangle intersection routine that reads each candidate triangle from RAM only once and then does ALL of the math in registers. Now... maybe a compiler can do that, but neither gcc nor Visual Studio's compiler produced anything nearly as efficient.
To be honest, most, if not all, of the optimizations I have made or plan to make simply manage variables better than C/C++. While that doesn't always translate to huge gains, or even to different code from the compiler, in many cases, such as my ray tracer, it makes a very significant difference (I am still testing how significant). I've observed that accessing memory is not particularly fast compared to basic add, mult, etc., so the more I can keep in the cache and in registers, the faster the routines should be.
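As an illustration of the "read the triangle once, then stay in registers" idea, here is a sketch in C. The thread does not say which intersection algorithm is used, so this swaps in the well-known Möller-Trumbore test; the `Vec3` and `Triangle` types and all function names are hypothetical. Whether the locals actually end up in registers is up to the compiler and the target.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;          /* hypothetical types */
typedef struct { Vec3 v0, v1, v2; } Triangle;

static Vec3 v_sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static Vec3 v_cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static float v_dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Moeller-Trumbore ray/triangle test. Returns true and writes the hit
 * distance to *t_out when the ray orig + t * dir hits the triangle. */
bool ray_triangle(Vec3 orig, Vec3 dir, const Triangle *tri, float *t_out)
{
    const Triangle t = *tri;            /* single read of the triangle */
    const Vec3 e1 = v_sub(t.v1, t.v0);  /* everything below uses locals only */
    const Vec3 e2 = v_sub(t.v2, t.v0);
    const Vec3 p = v_cross(dir, e2);
    const float det = v_dot(e1, p);

    if (det > -1e-8f && det < 1e-8f)    /* ray parallel to the triangle */
        return false;

    const float inv = 1.0f / det;
    const Vec3 s = v_sub(orig, t.v0);
    const float u = v_dot(s, p) * inv;  /* first barycentric coordinate */
    if (u < 0.0f || u > 1.0f)
        return false;

    const Vec3 q = v_cross(s, e1);
    const float v = v_dot(dir, q) * inv; /* second barycentric coordinate */
    if (v < 0.0f || u + v > 1.0f)
        return false;

    const float dist = v_dot(e2, q) * inv;
    if (dist < 0.0f)                    /* triangle is behind the ray */
        return false;

    *t_out = dist;
    return true;
}
```

Comparing the compiler's output for a routine like this against a hand-written version is one way to measure how much the manual register management actually gains.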
Also, mk_1 mentioned SIS (special instruction sets). Indeed, I can use these and manage portability in the installer.
Again, I'm not an expert, just an obsessive coder. lol.
|
|
|
Re: texcoord3 dx9 prototype
[Re: texcoord3]
#386991
11/12/11 10:28
|
Joined: Apr 2007
Posts: 3,751 Canada
WretchedSid
Expert
|
I didn't mean to criticize you as a person; I mean, I don't even know you at all. All I was saying is that I have huge doubts that the average programmer here on the forum can write better assembly than an optimizing compiler. And your post looked like "hey, I wrote the project in assembler", at least that's what I understood from it, and this, and I hope you agree there, is really unlikely to perform better!
By all means, it's possible to boost performance by helping the compiler out in some parts. I do this too; for example, a lot of the matrix and vector calculation done in my engine (iOS) is written directly in assembler to get the most out of NEON. I have no doubt that you can do this too, but like I said, I read your post a bit differently. Sorry about that.
Shitlord by trade and passion. Graphics programmer at Laminar Research. I write blog posts at feresignum.com
|
|
|
|