Discussion (15 Comments)
https://www.w3.org/TR/WGSL/#concrete-float-accuracy
This is all fully tested in the CTS.
https://gpuweb.github.io/cts/standalone/?q=webgpu:shader,*
Why is implementing it correctly not performant? For context I have no idea how rounding is typically implemented anyways.
NVIDIA is not solely responsible, because the Microsoft DirectX specification includes the non-standard behavior.
Nevertheless, as shown in TFA, both the AMD and Intel GPUs allow the user to choose between correct behavior and incorrect behavior that might be faster, while NVIDIA ignores what the user requests and implements only the non-standard behavior.
The developers of graphics or ML/AI applications do not care about errors, but there are also people who want to use GPUs for normal computations, where the accuracy of the results matters, so they want to be able to choose between correct behavior and incorrect but faster behavior.
Actually "faster" is a misnomer, because denormals can be handled correctly without diminishing the speed, but that costs additional die area. Thus what NVIDIA gains by not implementing the right behavior is a reduced production cost.
I didn't know that. Could you provide a more specific reference?
For a lot of applications the difference between a denormal and zero is small enough to be irrelevant, so if you expect near-zero values to be common, enabling a denormals-to-zero compiler flag might give you a pretty nice performance boost for free.
Intel CPU processing, where slowdowns can be as bad as a couple hundred cycles. AMD CPUs penalize them much more mildly, usually single-digit cycles. (No idea about ARM.)
During the last half century there have been plenty of CPUs where denormals have been handled in hardware, so that any slowdown caused by them is negligible.
Except when generating graphic images seen by humans or in ML/AI applications, neither flushing results to zero nor treating denormal inputs as zero is acceptable, because either can lead to huge errors.
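To make the "huge errors" point concrete, here is a minimal sketch in Python. Python floats always use gradual underflow, so flush-to-zero is only emulated here by a hypothetical helper, `flush_to_zero`, which is not a real hardware or library mode. The values `a` and `b` are chosen for illustration. With denormals, `a != b` guarantees `a - b != 0`; with flushing, that guarantee is lost and a later division fails.

```python
import math
import sys

# Smallest positive normal double (2**-1022); anything smaller is a denormal.
MIN_NORMAL = sys.float_info.min


def flush_to_zero(x: float) -> float:
    """Emulate FTZ (assumption: hypothetical helper, not a hardware mode).

    Any denormal value is replaced by a zero of the same sign.
    """
    if x != 0.0 and abs(x) < MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


a = 1.5 * MIN_NORMAL
b = 1.0 * MIN_NORMAL

gradual = a - b                 # exact denormal result, 2**-1023
flushed = flush_to_zero(a - b)  # what an FTZ unit would return

print(a != b)         # the operands differ...
print(gradual != 0)   # ...and with denormals the difference is nonzero
print(flushed == 0)   # ...but with flushing the difference vanishes

print(b / gradual)    # well defined with gradual underflow
try:
    print(b / flushed)
except ZeroDivisionError:
    print("division by zero after flushing")
```

The relative error of the flushed difference is 100%, even though both inputs were ordinary normal numbers.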
Whoever fears that denormals can slow down an application must enable the underflow exception. In that case denormals are never generated, but the underflow exceptions must be handled, because when denormals are not desired but underflows happen, that means that there are bugs in the program, which must be fixed.
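As a rough illustration of that approach: Python floats do not expose a trapping underflow exception, so the sketch below only emulates one in software. `UnderflowTrap` and `checked_mul` are hypothetical names invented for this example. The idea is the same as with a hardware trap: a tiny result is never silently delivered, and the program is forced to deal with it.

```python
import sys

# Smallest positive normal double; results below this magnitude are "tiny".
MIN_NORMAL = sys.float_info.min


class UnderflowTrap(ArithmeticError):
    """Hypothetical stand-in for a trapped IEEE 754 underflow exception."""


def checked_mul(a: float, b: float) -> float:
    """Multiply, raising instead of returning a denormal or underflowed zero."""
    result = a * b
    # A nonzero result below the normal range is a denormal.
    tiny = 0.0 < abs(result) < MIN_NORMAL
    # A zero product of two nonzero operands underflowed completely.
    vanished = result == 0.0 and a != 0.0 and b != 0.0
    if tiny or vanished:
        raise UnderflowTrap(f"underflow in {a!r} * {b!r}")
    return result


print(checked_mul(1e-200, 1e-100))  # normal result, returned as usual

for x, y in [(1e-200, 1e-120), (1e-200, 1e-200)]:
    try:
        checked_mul(x, y)
    except UnderflowTrap as exc:
        # In a real program this is where the scaling bug would be fixed.
        print("trapped:", exc)
```

In compiled code the same effect would come from unmasking the hardware underflow exception rather than checking every result by hand.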
Denormals were created so that people can mask the underflow exception and avoid handling it, without dire consequences.
However, this habit of no longer handling floating-point exceptions, as was necessary before the IEEE 754 standard, has produced younger developers who are no longer aware of how FP arithmetic must be handled to avoid errors. As a result, too many now believe that "-ffast-math" is acceptable in general-purpose programs, not only in special applications where result accuracy does not matter.
For correct results, you must use either denormals or underflow exception handling. There is no third choice. The third choice, like in GPUs, is only for when correctness is irrelevant.
Your repo has a link to the standard[0], which might interest some people. It makes me unreasonably happy to know that this was funded out of Singapore.
[0] https://posithub.org/docs/posit_standard-2.pdf