
Discussion (82 Comments)
I remember one particular interviewer who, after I explained that this was undefined behaviour and why, listened patiently to me and then explained to me that the correct answer was 17, because the two post-increments leave the variable as 6, so adding 6 twice to the original 5 gives 17.
I am very glad these types of interview questions have become less prevalent these days. They have, right? Right?
These sorts of things are neat trivia for learning about things like sequence points, but 99.9% of the time, if it matters in your codebase, you're writing something unmaintainable.
That's half of a reasonable answer. The other half is "but I do know the answer so if I see it when reviewing or working on someone else's code I can flag it or rewrite it, and explain to them why it is bad".
If you can convince someone in a position of authority that they’re wrong about something technical without upsetting them then you’re probably a good culture fit and someone who can raise the average effectiveness of your team.
"What does this produce?" and expecting an answer of "17" is a bad question even if UB didn't mean the expected answer is wrong.
Other than the fact that, for most programmers, the job has nothing to do with whether they know the outcome, because hopefully they'd never write something like it, or they'd clean it up. And IF they found it they'd hopefully test it - given that it appears to be compiler-dependent anyway.
[1] By which I mean predicting the behavior of error-prone code that requires good knowledge of all the quirks of the language to correctly answer.
I just refuse to do interviews like that any more.
The horrible undefined behavior of signed integer overflow can at least be explained by the fact that CPU architectures existed that handled it differently (though the fact that C even pulls you back into its ill-defined signed integers when you're using unsigned ones, by making a left shift of a uint16_t by a uint16_t yield a signed int, for example, is not as forgivable imho).
But this here is something that could be completely defined at the language level; there's nothing CPU-dependent about it. They could simply have stated in the language specification that, say, the order of evaluation within a statement is left to right (and/or other rules, like post-increments taking effect only after the full statement is finished). My point is not whether the rule I just typed is complete enough, but that the language designers could have made this completely defined.
At least according to this: https://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Exp...
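To make the promotion complaint above concrete, here's a sketch; it assumes the usual 32-bit int, and the function name is just for illustration:

    #include <stdint.h>

    uint16_t shl(uint16_t x, uint16_t n)
    {
        /* Both operands are promoted to (signed) int before the shift,
           so with a 32-bit int the expression x << n is a signed
           operation.  For x = 0xFFFF, n = 16 the mathematical result
           does not fit in int, which is undefined behaviour -- even
           though every type the programmer wrote down was unsigned. */
        return (uint16_t)(x << n);
    }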
This isn't quite the same case, but it's a good illustration of the effect: on gcc, if you have an expression f(a(), b()), the order in which a and b get evaluated is [1] dependent on the architecture and calling convention of f. If the calling convention pushes arguments from right to left, then b is evaluated first; otherwise, a is evaluated first. If you evaluate the arguments in the order the convention wants, then as each call returns you can push its result onto the stack immediately; in the wrong order, the result is a live value that has to be carried across the other function call, which costs a couple more instructions. I don't have a specific example for increment/decrement, but considering extremely register-poor machines and hardware support for increment/decrement addressing modes, it's not hard to imagine similar cases where forcing the compiler to insert the increment at the 'wrong' point is similarly expensive.
Now, with modern compilers using cross-architecture IRs as their main avenue of optimization, the benefit from this kind of flexibility is very limited, especially since the penalties on modern architectures for the 'wrong' order of things can be reduced to nothing with a bit more cleverness. But compiler developers tend to be loath to change observable behavior, and the standards committee unwilling to mandate that compiler developers have to modify their code, so the fact that some compilers have chosen to implement it in different manners means it's going to remain that way essentially forever. If you were making a new language from scratch, you could easily mandate a particular order of evaluation, and I imagine that every new language in the past several decades has in fact done that.
[1] Or at least was 20 years ago, when I was asked to look into this. GCC may have changed since then.
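A minimal way to watch this (both orders of output are conforming, and which one you get may change with the target or the compiler):

    #include <stdio.h>

    static int a(void) { puts("a evaluated"); return 1; }
    static int b(void) { puts("b evaluated"); return 2; }
    static void f(int x, int y) { printf("f(%d, %d)\n", x, y); }

    int main(void)
    {
        /* The order in which a() and b() are called is unspecified; a
           compiler targeting a right-to-left push convention may well
           evaluate b() first. */
        f(a(), b());
        return 0;
    }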
With modern register allocators and larger register sets, the code-generation impact of following the source evaluation order is of course lower than it used to be. Some CPUs can even involve stack slots in register renaming: https://www.agner.org/forum/viewtopic.php?t=41
On the other hand, even modern Scheme leaves evaluation order undefined. It's not just a C issue.
Anyway, yes, this one example has an obvious order in which it should be evaluated. But still, something like it shouldn't be allowed.
That would be nice, but don't forget the more general case of pointers and aliasing:
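Something along these lines, say, where the two pointers may or may not alias:

    void bump(int *p, int *q)
    {
        /* Fine as long as p and q point to different objects.  If a
           caller passes the same pointer twice, *p is modified twice
           without sequencing, which in C is undefined behaviour -- and
           no static check in the compiler can rule that out in
           general. */
        *p += (*q)++;
    }

    /* bump(&x, &y);  -- well defined
       bump(&x, &x);  -- undefined behaviour */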
The compiler cannot statically catch every possible instance of a statement where a variable is updated more than once.

Look at the addressing modes for the PDP-11 in https://en.wikipedia.org/wiki/PDP-11_architecture and you'll see you can write (R0)+ to read the contents of the location pointed to by R0, and then increment R0 afterwards (so a post-increment).
Back in the day, compilers were simple and optimisations weren't that common, so folding two statements into one and working out that there were no dependencies would have been tough with single pass compilers.
You could argue that without such instructions, C wouldn't have been embraced quite so enthusiastically for systems programming, and the world would have looked rather different.
C just wouldn't be C without things like a[i++]
Both are favorite idioms of C developers. And they are ok if done correctly, clearer than the alternative. They are also unnecessary in modern languages, so those shouldn't copy it (yeah, Python specifically).
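For what it's worth, the classic well-defined form of the idiom:

    /* K&R-style string copy: each expression modifies each pointer
       exactly once, so the post-increments here are perfectly
       defined. */
    void copy_str(char *dst, const char *src)
    {
        while ((*dst++ = *src++) != '\0')
            ;
    }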
And when I checked 3 different compilers, each of them chose a different way to use FMAs.
Even with integer math, you can get different numerical results via UB (e.g. expressions with signed overflow one way and not another).
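A sketch of the integer case; what you see for INT_MAX depends on the compiler and optimisation level precisely because the overflow is undefined:

    #include <limits.h>
    #include <stdio.h>

    /* Many optimisers fold x + 1 > x to 1 because signed overflow is
       undefined; an unoptimised build typically wraps INT_MAX + 1 to
       INT_MIN and prints 0 instead.  Same source, different numbers. */
    static int still_bigger(int x)
    {
        return x + 1 > x;
    }

    int main(void)
    {
        printf("%d\n", still_bigger(INT_MAX));
        return 0;
    }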
I didn't open TFA but my first thought was "Is this even defined?".
It kinda makes sense that such fucktardedness could be left undefined.
Luckily, I ended up with smug smiles in all those cases after showing them the output from different compilers.
No, if you invoke undefined behavior any result at all is possible.
So let me start by saying that that blog post was written 15 years ago and I don't even remember the details of it and what I've written there. But I have a hot take on the topic you've touched on!
From a programmer's perspective, you are absolutely right. The behaviour is undefined, end of discussion. A programmer should never rely on what they observe as the effective behaviour of a given instance of UB. A programmer must avoid creating situations in code that could result in the execution flow venturing into areas of UB. And - per the C and C++ standards - the results of UB can be anything (insert the old joke about UB formatting one's disk being formally correct behaviour).
However, I'm a security researcher, and from the security point of view - especially on the offensive side - we need to know and understand the effective behaviours of UB. This is because basically all "low-level" vulnerabilities in C/C++ are formally effects of UB. As such, for the security crowd it still makes sense to investigate, understand, and discuss the actual observed effects of UB: why a compiler does what it does, what the real-world variants of generated code are (if any) for a given UB on this and other compilers, how it can be abused and exploited, and so on.
My point being - there are two sides to this coin.
The problem is that it’s not specified which behaviour should be picked, but every compiler picks something.
The obvious counterpoint in this particular instance is that there's no good reason not to make such an awful expression a compile time error.
I also personally think that evaluation order should be strictly defined. I'm unclear on whether the current arrangement ever offers noticeable benefits, but it is abundantly clear that it makes the language more difficult to reason about.
This doesn't really help portability all that much.
But what often happens in practice is that "Bill's Fly-By-Night-C-Compiler-originally-written-in-the-mid-nineties" implemented it in some specific way (probably by accident) and maintains it as a (probably informal) extension. And almost certainly has users who depend on it, and can't migrate for a myriad of reasons. Anyway, it's hard to sell an upgrade when users can't just drop the new compiler in and go.
At the language level, it is undefined-behavior, and any code that relies on it is buggy at the language level, and non-portable.
Defining it would make those compilers non-conforming, instead of merely depending on their own definition of something that is undefined.
Probably the best way forward is to make this an error, instead of defining it in some way. That way you don't get silent changes in behavior.
Undefined behavior allows that to happen at the language level, but good implementations at least try not to break user code without warning.
Modern compilers with things like UBSan make changing the result of undefined behavior much less of an issue. But most UB is also "no diagnostic required", so without the modern tools users don't even know they have it in their code.
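For this particular flavour of UB you don't even need the run-time tools; asking the compiler at compile time is usually enough (the exact diagnostic text varies between GCC and Clang):

    /* Build with: cc -Wall -Wsequence-point quiz.c
       Both GCC and Clang warn here that the operation on 'a' may be
       undefined, because 'a' is modified more than once without an
       intervening sequence point. */
    int quiz(void)
    {
        int a = 5;
        a = (++a * a++) + --a;
        return a;
    }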
I don't do a lot of C anymore, but even when I did, I always did increments on separate lines, and I would write a += 1, or just a = a + 1. I never noticed any performance degradation, and I also don't think my code was harder to read. In fact I think it was easier, since the semantics were less ambiguous.
After separating a++ onto its own line, replacing a++ with a+=1 or a=a+1 comes down to personal taste in syntax sugar. I vote for a+=1.
There’s UB, so any answer is possible, isn’t it?
I'm going top-to-bottom through comments, and there was a similar question, so I'll link my answer here: https://news.ycombinator.com/item?id=48140821 (TL;DR: you are right, but there's another perspective on this)
It seems like something that should trigger a "we should specify this" reaction when adding these operators, and there is at least one reasonable way to define it which is fairly trivial and easily implementable.
This is how to keep simpletons out of your code base. Every numeric constant is defined in terms of a different lang quiz. Works well in JS as well of course.
Failing to recognize the dangers would be an instant fail; knowing that something reeks of undefined behaviour, or even potential UB, is enough: you just write out explicitly what you want and skip the mind games.
https://www.scribd.com/document/235004757/Test-Your-C-Skills...
It’s the standard technical C++ blog post everybody seems to write.
The wise nerd will not allow lines like it in their codebase in the first place and, having seen one, will refactor it (probably involving more lines or parentheses) to make it clearer and easier to maintain.
The latter approach scales better in the long run.
(this is related to my other comment here https://news.ycombinator.com/item?id=48140821)
Uh, 85% of them show the wrong result, so 85% of them clearly do not support pre- and post-increment.
int a = 5; a = (++a * a++) + --a; a = ?