Currently available medication for treating many chronic diseases is often effective only for a subgroup of patients, and biomarkers accurately assessing whether an individual belongs to this subgroup do not exist. In such settings, physicians learn about the effectiveness of a drug primarily through experimentation, i.e., by initiating treatment and monitoring the patient’s response. Precise guidelines for discontinuing treatment are often lacking or left entirely at the physician’s discretion. We introduce a framework for developing adaptive, personalized treatments for such chronic diseases. Our model is based on a continuous-time, multi-armed bandit setting, and acknowledges that drug effectiveness can be assessed by aggregating information from several channels: by continuously monitoring the (self-reported) state of the patient, but also by (not) observing the occurrence of particular infrequent health events, such as relapses or disease flare-ups. Recognizing that the timing and severity of such events carry critical information for treatment design is a key point of departure in our framework compared with typical (bandit) models used in healthcare. We show that the model can be analyzed in closed form for several settings of interest, resulting in optimal policies that are intuitive and have practical appeal. We showcase the effectiveness of the methodology by developing a treatment policy for multiple sclerosis. When compared with standard guidelines, our scheme identifies non-responders earlier, leading to improvements in quality-adjusted life expectancy, as well as significant cost savings.