Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models