Task Prioritization Research: Human vs. LLM vs. Rules-Based SystemsEvaluated three decision-making systems — a production PHP heuristic, an improved Python formula, and a Claude AI agent — against 100 weekly point-in-time snapshots exported from a live WHMCS project management and billing database spanning approximately May 2024 through April 2026. Each snapshot captured the exact state of open tasks and client context as known at that moment, with no information leakage from future outcomes. Rankings were evaluated against actual downstream revenue: payments received within 60 days of each snapshot date. Key findings: The original production PHP heuristic contains a scoring bug that severely degrades its performance. After correcting the bug, both the improved formula and the Claude agent substantially outperform the original system. When evaluation is restricted to schedulable work, Claude achieved 81% revenue capture efficiency relative to the actual human worker, compared to 73% for the corrected formula. However, approximately 52% of weekly revenue is driven by inbound same-week reactive work — calls, quick fixes, relationship-driven requests — that never appears in the ranked task set and is therefore invisible to any scheduling model. This suggests LLM-based prioritization meaningfully improves decision quality on structured work, but that a substantial share of real-world performance depends on tacit, reactive, and relational inputs that historical task data cannot represent. All code will be published and can be rerun against any compatible WHMCS export; underlying business data is proprietary and not included. |