I see this test/cmp after the instruction all the time and I don't understand it. pcmpestri sets ZF if |EDX| < 16 and SF if |EAX| < 16 (8 instead of 16 for word elements), so it is already giving you the status you need. Also, testing a sub-word of the larger register is very slow and a pipeline hazard.
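To make the flag rules concrete, here is a small C model of just the ZF/SF part as I read the Intel manual: the lengths in EAX and EDX are taken as absolute values and compared against the element count (16 for byte elements, 8 for word elements). `estr_length_flags` is a hypothetical helper for illustration only. It doesn't touch the hardware, and it ignores CF and OF, which depend on the actual comparison result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the length-based flags of PCMPESTRI/PCMPESTRM.
   zf: set when |EDX| (length of the second operand) < element count.
   sf: set when |EAX| (length of the first operand) < element count. */
typedef struct {
    bool zf;
    bool sf;
} estr_len_flags;

static estr_len_flags estr_length_flags(int32_t eax, int32_t edx, bool word_elements) {
    /* 16 elements per XMM register for byte modes, 8 for word modes. */
    long long max_elems = word_elements ? 8 : 16;
    /* Widen before llabs so INT32_MIN doesn't overflow. */
    long long a = llabs((long long)eax);
    long long d = llabs((long long)edx);
    estr_len_flags f = { d < max_elems, a < max_elems };
    return f;
}
```

For example, `estr_length_flags(16, 5, false)` gives ZF set (the second string is shorter than a full register) and SF clear (the first string fills the register).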
You've got this monster of an instruction and then people place all this paranoid slowness around it. Am I reading the x86 manual wrong?
I think people started doing that after one of the Intel SSE examples did it and everyone just copied it.
But on any modern CPU there should be essentially no penalty for doing that. Testing the register, even just a sub-register of it, is basically free as long as you aren't doing a partial write followed by a full read (write AH, then read AX), and I don't think there's any case where this could stall on anything newer than a Core 2 era processor. Still, just replacing it with a "jnc" (or whichever flag you're actually trying to test) would at least be fewer instructions. I'd love to see benchmarks, though, if someone has dug deeper into this than I have.
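Here's a minimal C sketch of the "just branch on the flag" idea, assuming GCC or Clang on x86-64 and an SSE4.2-capable CPU. `first_match16` is a hypothetical helper, not from any library; `_mm_cmpestrc` and `_mm_cmpestri` are the standard SSE4.2 intrinsics from `<nmmintrin.h>`. `_mm_cmpestrc` returns the CF that pcmpestri sets when any element matched, so the no-match check is the C-level equivalent of a jnc rather than a `cmp ecx, 16`. Whether the compiler merges the two intrinsics into a single pcmpestri is up to the optimizer.

```c
#include <assert.h>
#include <nmmintrin.h>

/* Byte elements, "equal any" (character-set search), report the lowest index. */
#define FIND_ANY_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical helper: returns the index of the first byte of hay[0..hay_len)
   that appears in set[0..set_len), or -1 if there is none.
   Both buffers must be readable for 16 bytes. */
__attribute__((target("sse4.2")))
static int first_match16(const char *set, int set_len, const char *hay, int hay_len) {
    __m128i s = _mm_loadu_si128((const __m128i *)set);
    __m128i h = _mm_loadu_si128((const __m128i *)hay);
    /* CF is set when any element matched. Testing it directly replaces
       the usual check of the returned index against 16 (the value
       pcmpestri leaves in ECX in byte mode when nothing matches). */
    if (!_mm_cmpestrc(s, set_len, h, hay_len, FIND_ANY_MODE))
        return -1;
    return _mm_cmpestri(s, set_len, h, hay_len, FIND_ANY_MODE);
}
```

With `char set[16] = "aeiou";` and `char hay[16] = "xyzabc";`, `first_match16(set, 5, hay, 6)` finds the `a` at index 3.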