Detecting that an asset is underperforming is half the problem. The other half is telling the operator what to do, and the recommended action is very different depending on the cause.
For soiling, schedule a wash crew; no urgency. For an inverter fault, dispatch O&M to the inverter station this afternoon and pull the event log before resetting it. For shading, talk to the site team about vegetation; there's nothing for the dispatcher to do today.
So the question is whether I can tell these three apart from cohort-residual data alone, without bringing in inverter telemetry or site-level weather. I think I can, using the second derivative of the residual time series.
Soiling builds up over days. Dust accumulates and generation creeps slowly down. The first derivative of the residual is small and negative, and the second derivative sits near zero because the slope is roughly constant.
An inverter fault is a step change. The asset is fine at 14:00 and generating zero at 14:30 because the MPPT tripped. The first derivative has one large negative spike, and the second derivative has a matching pair of spikes either side of the step.
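To make the two signatures concrete, here's a minimal NumPy sketch. The series are synthetic and the sample count is arbitrary; only the shapes of the derivatives matter:

```python
import numpy as np

# Hypothetical residual series, as fractions of cohort-expected output.
# Soiling: slow, steady drift down across ~50 daylight samples.
soiling = -0.005 * np.arange(50)
# Inverter fault: fine until sample 25, then a step to total loss.
fault = np.where(np.arange(50) < 25, 0.0, -1.0)

for name, r in [("soiling", soiling), ("fault", fault)]:
    d1 = np.diff(r)        # first derivative: slope between samples
    d2 = np.diff(r, n=2)   # second derivative: change in slope
    print(f"{name}: max |d1| = {np.abs(d1).max():.3f}, max |d2| = {np.abs(d2).max():.3f}")
```

The soiling series has a small constant slope and a second derivative at zero; the fault shows a large spike in both derivatives at the step.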
Shading is daily-recurring. Same hour, same underperformance. The signal lives in the hour-of-day cross-section rather than in any time derivative.
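The hour-of-day cross-section is just a reduction over days. A sketch with a hypothetical day-by-hour residual matrix; the layout, the injected -0.4 shadow, and the 30%/4-day thresholds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical layout: residuals[day, hour], fraction of cohort-expected output.
residuals = rng.normal(0.0, 0.02, size=(7, 24))
residuals[:, 15] -= 0.4  # a tree shadow hitting the array at 15:00 every day

# For each hour, count the days that underperformed by more than 30%.
bad_days_per_hour = (residuals < -0.30).sum(axis=0)
shaded_hours = np.flatnonzero(bad_days_per_hour >= 4)
print(shaded_hours)
```

No derivative in sight: the reduction runs down the day axis, so a deficit that recurs at the same hour stands out even when each individual day looks unremarkable.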
So the diagnostic chain in diagnose_cause is: first, if the maximum absolute second derivative exceeds 0.25, classify as inverter_fault. Otherwise, if the trend slope across the window is below -0.005, classify as soiling. Otherwise, if the same hour underperformed by more than 30% on four or more days, classify as shading. Otherwise, return "unknown".
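The chain, written out as a sketch. The thresholds are the hand-picked values from the text; the day-by-hour array layout and the linear-fit slope estimate are my assumptions, not a spec:

```python
import numpy as np

D2_MAX = 0.25        # step-change threshold on |second derivative|
SLOPE_MIN = -0.005   # drift threshold on the fitted trend slope
DEFICIT = -0.30      # per-hour underperformance threshold (30%)
MIN_DAYS = 4         # recurrence threshold for shading

def diagnose_cause(residuals: np.ndarray) -> str:
    """residuals: shape (days, hours), fraction of cohort-expected output."""
    flat = residuals.ravel()
    # 1. Step change: a spike in the second derivative means a trip or fault.
    if np.abs(np.diff(flat, n=2)).max() > D2_MAX:
        return "inverter_fault"
    # 2. Slow drift: fit a line across the window; a steady decline is soiling.
    slope = np.polyfit(np.arange(flat.size), flat, 1)[0]
    if slope < SLOPE_MIN:
        return "soiling"
    # 3. Daily recurrence: the same hour deep-underperforming on several days.
    bad_days = (residuals < DEFICIT).sum(axis=0)
    if (bad_days >= MIN_DAYS).any():
        return "shading"
    # Fall back rather than guess.
    return "unknown"
```

The order matters: an abrupt fault also drags the fitted slope down, so the step check has to run before the drift check.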
The thresholds (0.25, -0.005, 30%) are hand-picked. In production you'd tune them against labeled incident data, which I don't have. For a one-week demo I'm flagging them as hand-picked, leaving a placeholder for tuning, and falling back to "unknown" rather than guessing. Returning "unknown" preserves the operator's trust when the classifier can't decide, which is more useful than a confident misclassification.