Floor and Ceil versus Denormals on CPU and GPU

(asawicki.info)

27 points | by ibobev 4 days ago

2 comments

  • kevmo314 24 minutes ago
    > This is not the first time we can see Nvidia taking shortcuts to achieve maximum performance of their GPUs

    Why is implementing it correctly not performant? For context, I have no idea how rounding is typically implemented anyway.
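    For context on how rounding can be done in software: ceil can be computed purely with integer operations on a float's bit pattern. The sketch below assumes IEEE-754 binary64 (Python's float). ceil_bits is a hypothetical name, and the code follows a common bit-masking approach found in C math libraries rather than reproducing any particular one. The detail relevant to this thread is the |x| < 1 branch, which also receives every denormal: a positive nonzero input, however tiny, has to round up to 1.0, so an implementation that treats denormal inputs as zero returns 0.0 instead.

```python
import struct


def ceil_bits(x: float) -> float:
    """Sketch of ceil() for IEEE-754 binary64 using only integer bit operations.

    Hypothetical helper for illustration; not taken from any specific library.
    """
    (u,) = struct.unpack("<Q", struct.pack("<d", x))  # raw 64-bit pattern
    e = ((u >> 52) & 0x7FF) - 1023                     # unbiased exponent

    if e >= 52:
        return x  # no fractional bits: already integral, or inf/NaN

    if e >= 0:
        m = 0x000F_FFFF_FFFF_FFFF >> e  # mantissa bits below the binary point
        if u & m == 0:
            return x                    # already an integer
        if u >> 63 == 0:
            u += m                      # positive: carry into the integer part
        u &= ~m                         # clear the fractional bits
    else:
        # |x| < 1. Every denormal (biased exponent 0) lands in this branch too.
        if u >> 63:
            return -0.0                 # negative: result is -0.0
        if u != 0:
            return 1.0                  # positive and nonzero: round up to 1.0
        return x                        # +0.0 stays +0.0

    (r,) = struct.unpack("<d", struct.pack("<Q", u))
    return r
```

    For example, ceil_bits(2.5) gives 3.0 and ceil_bits(5e-324), the smallest positive denormal, gives 1.0. Many CPUs also provide dedicated rounding instructions (for example SSE4.1's ROUNDSD on x86), so this only illustrates the case analysis, not how any particular CPU or GPU does it.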

  • crote 1 hour ago
    Another thing to keep in mind is that CPU processing of denormals tends to be extremely slow; I vaguely recall running into something like a 10x slowdown a decade ago.

    For a lot of applications, the difference between a denormal and zero is small enough to be irrelevant, so if you expect near-zero values to be common, enabling a denormals-to-zero compiler flag might give you a pretty nice performance boost for free.
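    To make the trade-off above concrete: flushing a denormal to zero changes its value by less than the smallest normal double, yet ceil of the two can still differ by a whole unit. The Python standard library has no way to toggle the hardware modes (on x86 these are the FTZ and DAZ bits in the MXCSR register), so this sketch emulates denormals-are-zero in software. flush_denormal is a hypothetical helper, not a standard function.

```python
import math
import sys


def flush_denormal(x: float) -> float:
    """Emulate denormals-are-zero for a single value (hypothetical helper).

    Real hardware applies this through mode bits on every operation;
    this function only mimics the effect on one input.
    """
    if x != 0.0 and abs(x) < sys.float_info.min:  # subnormal: below the smallest normal
        return math.copysign(0.0, x)              # keep the sign of the zero
    return x


tiny = 5e-324  # smallest positive binary64 denormal

print(tiny - flush_denormal(tiny))                          # 5e-324: tiny absolute error
print(math.floor(tiny), math.floor(flush_denormal(tiny)))   # 0 0: floor agrees
print(math.ceil(tiny), math.ceil(flush_denormal(tiny)))     # 1 0: ceil differs by 1
```

    Whether that difference matters depends on the application; code that only accumulates or compares magnitudes rarely notices, while code that feeds near-zero values into floor, ceil, or a division can.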