Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits