Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards