I haven't polished this yet, but don't want to delay posting it
This describes a clamped subtract operation on 15-bit RGB pixels of the form 0RRRRRGG GGGBBBBB. Below, X and Y refer to the two source pixels. The same algorithm can also be used on a pair of 15-bit RGB pixels packed into a 32-bit integer.
The subtract clamped operation subtracts the channels from both source pixels and keeps the result from going negative, instead of wrapping around:
result = max( X - Y, 0 );
The first step is to find the difference of the components, which will require at least one extra set bit between components to serve as a borrow when a result goes below zero. Without this set bit, a component going below zero would borrow from the next component and wreck the result.
When the low bits of X and Y differ, the low bit of the result will be 1 before any borrow occurs.
RRRRR GGGGG BBBBB 0 00001 00010 00000 X XOR 0 00011 00001 00001 Y ------------------- 0 00010 00011 00001 differing bits AND 0 00001 00001 00000 ------------------- 0 00000 00001 00000 differing low bits
It's only when the low bits of X and Y are the same that the low bit needs to be set to 1. This can be achieved by always adding one and then subtracting one when the low bits of X and Y differ. The result can then be masked to leave just the borrows in the low bits, where 1 = no borrow, 0 = borrow occurred for component to right:
RRRRR GGGGG BBBBB 0 00001 00010 00000 X - 0 00011 00001 00001 Y ------------------- 1 11110 00000 11111 difference + 1 00001 00001 00000 adjust ------------------- 0 11111 00001 11111 adjusted difference - 0 00000 00001 00000 differing low bits (from above) ------------------- 0 11111 00000 11111 borrows AND 1 00001 00001 00000 ------------------- 0 00001 00000 00000 masked borrows
Before clamping, the adjusted difference must be un-adjusted back to the original difference, but without any borrows. The original adjustment was +1 and a borrow bit is 1 when the result did not go below zero, or 0 when it did (and thus subtracted one from the next component to the left). So, we can subtract the masked borrows to either cancel the +1 adjustment if no borrow occurred, or leave the +1 adjustment unchanged so that it cancels out the borrow that occurred for the component to the right. This gives the modulo difference (wrap-around):
0 11111 00001 11111 adjusted difference (from above) - 0 00001 00000 00000 masked borrows (from above) ------------------- 0 11110 00001 11111 modulo difference: result = (X - Y) % 32
These can be masked off and used to form a mask to AND with. If the borrow is set, the result didn't go below zero, so the result should be preserved, otherwise it should be ANDed with 0.
- 0 00000 00001 00000 masked borrows (from above) shifted right 5 bits ------------------- 0 00000 11111 00000 clamp mask AND 0 11110 00001 11111 modulo difference (from above) ------------------- 0 00000 00001 00000 clamped result
Here is the final code:
diff = x - y + 0x8420; low_bits = (x ^ y) & 0x8420; borrows = (diff - low_bits) & 0x8420; modulo = diff - borrows; clamp = borrows - (borrows >> 5); result = modulo & clamp;
It uses 10 operations and can do two 15-bit pixels at once by multiplying the masks by 0x10001.
Mixing Packed RGB Pixels Efficiently
Adding RGB Pixels With Clamping