I’m confident that the whole thing can be written in high level C code using
if statements and the entire logic sequence will be complete before the
modulus operator even returns.
In this case, you’re also barking up a nonexistent tree, since both logical
constructs will produce nearly identical code syntax-wise, and totally
identical code clock-cycle-wise.
That is what I just said, yes.
I will give you style points for moving the initial assignment of 'ticks'
outside the loop. This is more efficient and locks the frame period to
exactly 30 ms (though it may suffer from the fact that absolute granularity
to the ms is not guaranteed, so it’s possible that the loop may fall behind
such that the process becomes event starved and never catches up).
This depends on how much you’re trying to draw per frame. If you’re
not drawing much then it won’t matter since the CPU will easily keep
pace, but when making large changes under heavy load it can bog down.
When CPU load drops off, it will catch up, but this isn’t pretty.
You can add an additional check so that ticks is only incremented
if now - ticks < 30, which will eliminate this issue.
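Since the original loop isn't shown here, the following is only one possible reading of that check, as a minimal sketch. The helper name frame_due, the FRAME_MS constant and the use of a 32-bit millisecond counter are all assumptions made for illustration. It steps ticks by 30 only when that leaves it less than one frame behind now, and otherwise resynchronizes to now so the loop never tries to replay missed frames.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAME_MS 30u /* frame period used in this thread */

/* Hypothetical helper (not from the original code): returns true when a
 * frame is due, and moves *ticks forward accordingly.
 * 'now' is assumed to be a millisecond counter from some timer API. */
static bool frame_due(uint32_t *ticks, uint32_t now)
{
    uint32_t elapsed = now - *ticks; /* unsigned math tolerates wraparound */

    if (elapsed < FRAME_MS)
        return false;                /* too early: nothing to draw yet */

    if (elapsed < 2 * FRAME_MS)
        *ticks += FRAME_MS;          /* on schedule: stay on the 30 ms grid */
    else
        *ticks = now;                /* fell behind: resync, skip lost frames */

    return true;
}
```

A caller would poll this once per loop iteration and draw only when it returns true. Whether skipping frames or catching up is the right behavior depends on the application.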
It’s also worth remembering that this is not about efficiency; it’s
about slowing the framerate.
It will be good for you to realize that the modulus operator is very
expensive; ifs and ?: scream in comparison.
Let’s compare the code you posted to what I posted myself:
m_frame = ++m_frame > 2 ? 0 : m_frame;
vs.
m_frame = (m_frame + 1) % 3;
First, let’s expand these to one instruction per line in C:
++m_frame;
if (m_frame > 2) {
m_frame = 0;
}
else {
m_frame = m_frame; /* hopefully optimized out… */
}
vs.
register unsigned tmp;
tmp = m_frame + 1;
tmp %= 3;
m_frame = tmp;
/* this could be optimized to:
++m_frame;
m_frame %= 3;
*/
Now let’s get the instruction count for each:
5 instructions, 4 processed per branch (or 4 instructions one third of
the time and 3 instructions two thirds of the time if optimized)
vs.
3 instructions (or 2 instructions if optimized)
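As a quick sanity check that the two forms really are interchangeable here, the sketch below wraps each one in a function. The function names are made up for illustration, and m_frame is assumed to be an unsigned counter that only ever holds 0, 1 or 2 (outside that range the two forms can disagree, for example at 3). It says nothing about which one is faster.

```c
/* Hypothetical wrappers around the two expressions being compared. */

/* Branch form: increment, then reset to 0 once the value passes 2. */
static unsigned next_frame_branch(unsigned m_frame)
{
    ++m_frame;
    return m_frame > 2 ? 0 : m_frame;
}

/* Modulus form: increment and wrap around with % 3. */
static unsigned next_frame_mod(unsigned m_frame)
{
    return (m_frame + 1) % 3;
}
```

Compiling both with optimization and reading the generated assembly (for example with gcc -O2 -S) is the most direct way to see what a given compiler makes of each. Compilers commonly replace a % by a constant with a multiply-and-shift sequence rather than an actual DIV, so the hand-written assembly in this post may not match what you get.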
But this is in C; we can’t get cycles and latency unless we look at
assembly language, so in IA32 assembly…
INC [m_frame]
CMP [m_frame], 2
JLE else_label
MOV [m_frame], 0
JMP end_label
else_label:
MOV [m_frame], [m_frame] ; stands in for the C self-assignment: basically a NOP (x86 has no memory-to-memory MOV, and MOV leaves the flags alone)
end_label:
;…
; optimized, this would be…
; MOV EAX, [m_frame] ; we’ll ignore this instruction since m_frame is
;                    ; likely already in a register
; CMP EAX, 2
; JLE else_label
; XOR EAX, EAX
; MOV [m_frame], EAX ; ditto
; else_label:
; ;…
vs.
MOV EAX, [m_frame]
INC EAX
XOR EDX,EDX
MOV EBX,3
DIV EBX
MOV [m_frame], EDX
; this really doesn’t optimize much, unless EDX/EBX have previously
; been set to appropriate values, but we can trim off one instruction:
; three:
; dd 3
; ; …
; MOV EAX,[m_frame]
; INC EAX
; XOR EDX,EDX
; DIV [three]
; MOV [m_frame], EDX
;
; or if m_frame is already in EAX:
; three:
; dd 3
; ; …
; INC EAX
; XOR EDX,EDX
; DIV [three]
; MOV EAX,EDX
Now the instruction count:
6 instructions, with 4 executed two thirds of the time and 5 executed
one third of the time (optimized, this is 5 instructions, with 3
executed two thirds of the time and 5 executed one third of the time,
or 3 instructions, with 2 executed two thirds of the time and 3
executed one third of the time if m_frame is in EAX)
vs.
6 instructions (optimized this is 5 instructions, or 4 instructions if
m_frame is in EAX)
At this point, you’re winning, but we haven’t figured out cycles and
latency yet…
A conditional jump takes longer when it actually jumps than when
execution falls through to the next instruction. There are two reasons
for this:
- the CPU has to do more work to do a jump than a nop
- jumping discards any instructions the CPU has already prefetched, so
it has to wait for the prefetch queue (or pipeline) to refill.
On old CPUs (8086, 8088, etc), the difference is 4 cycles vs. 16 cycles.
On newer CPUs, the penalty can be much larger and depends on the depth
of the pipeline (and on whether the branch predictor guessed right),
though much of this won’t show up right away, but will instead result in
the instructions following the jump executing slower than they
otherwise would.
Unfortunately, for modern processors, it’s no longer possible to
simply look up the cycle count for each instruction and add them up
since the time each instruction takes depends not only on the
instruction itself, but also the instructions before it. For example,
XOR EAX,EAX followed by INC EAX will execute slower than XOR EDX,EDX
followed by INC EAX, since in the second case the two instructions are
independent and can be executed in parallel.
So undetermined? Maybe, but the result really depends on what code is
executed next. If your next instruction flushes the prefetch queue
anyway, then your conditional jump doesn’t matter quite so much;
otherwise avoid it and use DIV (or AND, when the count is a power of
two) instead.
On 12 August 2010 04:19, CWC wrote: