HI version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
82% Positive
Analyzed from 1878 words in the discussion.
Trending Topics
#xor#sub#register#bit#zero#same#more#subtraction#don#obvious

Discussion (68 Comments)Read Original on HackerNews
Will remember for the next time I write asm for Itanium!
Even tiny tiny CPUs can do sub in one cycle, so I doubt that. On super-scalar CPUs xor and sub are normally issued to the same execution units so it wouldn't make a difference there either.
MIPS - $zero
RISC-V - x0
SPARC - %g0
ARM64 - XZR
Probably, there are ALU pipeline designs where you don't pay an explicit penalty. But not all, and so XOR is faster.
Surely, someone as awesome as Raymond Chen knows that. The answer is so obvious and basic I must be missing something myself?
A tangent, but what is Obvious depends on what you know.
Often experts don't explain the things they think are Obvious, but those things are only Obvious to them, because they are the expert.
We should all kind, and explain also the Obvious things those who do not know.
It could also be as a result of most people working in assembly being aware of the properties of logic gates, so they carry the understanding that under the hood it might somehow be better.
.b and .w -> clr eor sub are all identical
for .l moveq #0 is the winner
E.g. on Z80 and 6502 both have the same cycle count.
Not scalar, but still sub vs xor. Though you’d use vmov immediate for zeroing anyway.
> It encodes to the same number of bytes, executes in the same number of cycles.
Edit: Looked at comments, seems like x86 and the major 8bit cpu's had the same speed, pondering in this might be a remnant from the 4-bit ALU times.
I mean, not for zeroing because we know from the TFA that it's special-cased anyway. But maybe if you test on different registers?
https://en.wikipedia.org/wiki/Carry-lookahead_adder
The only minor difference between the two on x86, really, is SUB sets OF and CF according to the result while XOR always clears them.
(But this does not discount the fact that basically all CPUs treat them both as one cycle)
There are many kinds of SUB instructions in the x86-64 ISA, which do subtraction modulo 2^64, modulo 2^32, modulo 2^16 or modulo 2^8.
To produce a null result, any kind of subtraction can be used, and XOR is just a particular case of subtraction, it is not a different kind of operation.
Unlike for bigger moduli, when operations are done modulo 2 addition and subtraction are the same, so XOR can be used for either addition modulo 2 or subtraction modulo 2.
I looked it up afterwards and xor was also a valid instruction in that architecture to zero out a register, and used even fewer cycles than the subtraction method; but it was not listed in the subset of the assembly language instructions we were allowed to use for that unit. I suspect that it was deemed a bit off-topic, since you would need to explain what the mathematical XOR operation was (if you didn't already learn about it in other units), when the unit was about something else entirely- but everyone knows what subtraction is, and that subtracting a number by itself leads to zero.
[0] Not x86, I do not recall the exact architecture.
PS. What is static vs dynamic count?
Dynamic count - how many times an opcode gets executed.
I. e. an instruction that doesn't appear often in code, but comes up in some hot loops (like encryption) would have low static and high dynamic.
Edit: this is apparently not the case, see @tliltocatl's comment down the thread
xor was the default zeroing idiom.I onkly did sub reg,reg when I actually want its flags result. Otherwise the main rule is: do not touch either form unless flags liveness makes the rewrite obviously safe. Had about 40 such idioms for the passes.
as mentioned by others this is conjectural but it is a popular (if somewhat unfalsifiable) explanation for homochirality
eg: For one, Isaac Asimov in the 1970s wrote at length on this in his role as a non fiction science writer with a Chemistry Phd
> male/female ratios somehow tend to balance at 50/50.
This is different to the case of actual right handed dominance in humans and to L- Vs R- dominance in chirality ...
( Men and women aren't actual mirror images of each other ... )
The chirality argument made is more akin to dynamic systems balance; yes, you can balance a pencil on its point .. but given a bit of random tilt one way or the other it's going to tend to keep going and end near flat on the table.
Its basically free today. Of course, mov RAX, 0 is also free and does the same thing. But CPUs have limited decoder lengths per clock tick, so the more instructions you fit in a given size, the more parallel a modern CPU can potentially execute.
So.... definitely still use XOR trick today. But really, let the compiler handle it. Its pretty good at keeping track of these things in practice.
-----------
I'm not sure if "sub" is hard-coded to be recognized in the decoder as a zero'd out allocation from the register file. There's only certain instructions that have been guaranteed to do this by Intel/AMD.
8 'sub al, al', 14 'sub ah, ah', 3 'sub ax, ax'
26 'xor al, al', 43 'xor ah, ah', 3 'xor ax, ax'
edit: checked a 2010 bios and not a single 'sub x, x'