Offline Multi-Action Policy Learning: Generalization and Optimization