Have you tried applying a filter in the Fourier domain and reconstructing the image? How fast is your implementation?
Of course, that was the first thing I tried before I threw the Phillips spectrum at it: (the right side shows only the real component of the complex coefficients, not the magnitude; my own stupid fault from two days ago)
It runs in 2.3 ms for a 512x512x64 image, FFT + iFFT together. So it's not something you'd want to do for a simple filter like a small/midrange blur. EDIT: that's on an NVIDIA GTX 560 Ti
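For anyone who wants to try the FFT, filter, iFFT round trip themselves, here is a minimal CPU sketch in NumPy. It is not the GPU code timed above; the Gaussian low-pass mask, the `sigma` value, the random test image, and both function names are assumptions made up for illustration.

```python
import numpy as np

def gaussian_lowpass(shape, sigma):
    """Hypothetical helper: Gaussian low-pass mask laid out in FFT order.

    sigma is given in cycles per pixel; the DC bin always gets weight 1.
    """
    fy = np.fft.fftfreq(shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(shape[1])[None, :]  # horizontal frequencies
    return np.exp(-(fx**2 + fy**2) / (2.0 * sigma**2))

def filter_in_fourier_domain(image, mask):
    """Forward FFT, multiply by the mask, inverse FFT."""
    spectrum = np.fft.fft2(image)
    filtered = np.fft.ifft2(spectrum * mask)
    # A real image with a real, symmetric mask gives a real result;
    # the imaginary part left over is only rounding noise.
    return filtered.real

rng = np.random.default_rng(0)
image = rng.random((512, 512))               # stand-in test image (assumption)
mask = gaussian_lowpass(image.shape, sigma=0.05)
blurred = filter_in_fourier_domain(image, mask)
```

To look at the spectrum itself, plot `np.abs(np.fft.fft2(image))` for the magnitude; taking `.real` shows only the real component of the coefficients.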