Developing accurate clinical prediction models is often bottlenecked by the difficulty of generating meaningful predictive features from unstructured data. While electronic health records (EHRs) contain rich narrative information, extracting a comprehensive list of structured features from them requires extensive domain knowledge and granular clinical judgment, a process that has historically been manual, unscalable, and impractical for large cohorts. In this study, we first established a rigorous patient-level Clinician Feature Generation (CFG) protocol, in which domain experts manually reviewed notes to define and extract nuanced features for a cohort of 147 patients with prostate cancer. As a high-fidelity ground truth, this labor-intensive process provided the blueprint for SNOW (Scalable Note-to-Outcome Workflow), a transparent multi-agent large language model (LLM) system designed to autonomously mimic the iterative reasoning and validation workflow of clinical experts. In the prostate cancer cohort, SNOW achieved predictive performance for 5-year recurrence (AUC-ROC 0.767 ± 0.041) that was indistinguishable from the gold-standard manual CFG (0.762 ± 0.026) and superior to structured baselines, clinician-guided LLM extraction, and six representational feature generation (RFG) approaches. Manual CFG required prolonged expert review and per-patient abstraction; in contrast, once configured, SNOW generated the full patient-level feature table in 12 hours with 5 hours of clinician oversight, reducing human expert effort by approximately 48-fold. To assess scalability in a setting where manual CFG is infeasible, we deployed SNOW on an external population of 2,084 patients with heart failure with preserved ejection fraction (HFpEF) from the MIMIC-IV database. Without task-specific tuning, SNOW generated prognostic features that outperformed baseline and RFG methods for 30-day (SNOW: 0.851 ± 0.008) and 1-year (SNOW: 0.763 ± 0.003) mortality prediction.
These results demonstrate that a modular LLM agent-based system can scale expert-level feature generation from clinical notes, while enabling interpretable use of unstructured EHR text in outcome prediction and preserving generalizability across a variety of settings and conditions.