Another paper, which is probably in there already but I couldn't find it:
https://graphics.stanford.edu/wikis/cs44...s-10-11-oit.pdf
It is about order-independent alpha blending in DX11. Here is a post about implementing something similar in OpenGL: http://blog.icare3d.org/2010/07/opengl-40-abuffer-v20-linked-lists-of.html
One could probably get such an approach working in DirectX 9 as well, although with a lot of overhead. It could still be a solution in some cases, since it may perform better than doing all of the sorting on the CPU.
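To make the idea a bit more concrete, here is a rough CPU-side sketch in Python of the per-pixel linked list approach as I understand it: fragments are stored in arbitrary order, then each pixel's list is sorted by depth and blended back to front. This is not code from the paper or the blog post. The names (`FragmentLists`, `add_fragment`, `resolve`) are made up for illustration, and a real GPU version would run this in shaders with atomic operations on the head pointers instead of Python lists.

```python
from dataclasses import dataclass


@dataclass
class Node:
    color: tuple   # (r, g, b), each in 0..1
    alpha: float   # opacity in 0..1
    depth: float   # assumption: larger depth = farther from the camera
    next: int      # index of the next node for this pixel, -1 ends the list


class FragmentLists:
    """Hypothetical CPU stand-in for a head-pointer buffer plus a node buffer."""

    def __init__(self, width, height):
        self.width = width
        self.heads = [-1] * (width * height)  # one list head per pixel
        self.nodes = []                       # shared pool of fragment nodes

    def add_fragment(self, x, y, color, alpha, depth):
        # Push the fragment onto the front of the pixel's list, in any order.
        pixel = y * self.width + x
        self.nodes.append(Node(color, alpha, depth, self.heads[pixel]))
        self.heads[pixel] = len(self.nodes) - 1

    def resolve(self, x, y, background=(0.0, 0.0, 0.0)):
        # Walk the pixel's list and collect its fragments.
        frags = []
        i = self.heads[y * self.width + x]
        while i != -1:
            frags.append(self.nodes[i])
            i = self.nodes[i].next
        # Sort far to near, then blend each fragment over the result so far.
        frags.sort(key=lambda n: n.depth, reverse=True)
        result = background
        for f in frags:
            result = tuple(f.alpha * c + (1.0 - f.alpha) * r
                           for c, r in zip(f.color, result))
        return result
```

Since the sorting happens per pixel at resolve time, the order in which the geometry is submitted no longer matters, which is the whole point of the technique.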