Don't forget the half-pixel offset!
This is what I use:
Code:
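// pPos: clip-space position (before the perspective divide).
float2 calc_ScreenPos(float4 pPos)
{
	// Divide by w, flip y, add the half-pixel offset (vecViewPort.zw), then map [-1,1] to [0,1].
	return (float2(pPos.x,-pPos.y)/pPos.w+vecViewPort.zw)*0.5+0.5;
}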