
Solar Asset Performance Analyzer

By Candela · Sample Project · shipped 11 May 2026

entry 1 · 5 May 2026

Day one: where I'm starting from

I have an interview next week with a solar asset management startup, and I want to walk in with something concrete to talk through rather than just background reading I did on the train.

The plan is to build a stripped-down version of the analyzer their on-call engineers would use. It takes in half-hourly generation data from a portfolio, finds the assets that underperformed today, and tells the operator what to do about each one.

Before I write any analysis code, the design question I want to nail down is what "underperformed" means, mathematically. Every paper and every product page seems to define it slightly differently, and the choice is going to shape the whole tool. I'd rather decide on paper now than discover halfway through that I built around the wrong definition.

entry 2 · 6 May 2026

Cohort-relative residuals over per-asset rolling baselines

Two camps in the field.

The first is per-asset rolling baseline. Compare each array's MW today against its own seven-day rolling average. Simple to explain, easy to implement, and every entry-level monitoring dashboard ships with it out of the box. The problem is that a cloudy week pulls all the assets' baselines down at once, so a genuinely degrading asset can hide inside the weather noise. By the time the rolling average has caught up, you've lost a week of generation that should have been an alert.
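For reference, the per-asset rolling baseline is a few lines. This is a minimal sketch under my own naming (the function and its arguments are illustrative, not from any particular product):

```python
def rolling_baseline_residual(history_mw, today_mw, window=7):
    """Per-asset rolling baseline: today's MW against the asset's own
    average over the last `window` days of history.

    history_mw: list of daily MW figures for one asset, oldest first.
    A negative residual means today underperformed the asset's own past.
    """
    recent = history_mw[-window:]          # at most the last `window` days
    baseline = sum(recent) / len(recent)   # the rolling average
    return today_mw - baseline
```

The cloudy-week problem is visible right in the code: `baseline` is built only from the asset's own history, so a dark week drags it down and today's genuine degradation hides inside it.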

The second is cohort-relative residuals. Group assets by something that captures shared weather exposure (latitude band is the cheapest proxy that doesn't fall apart), compute the cohort's median utilization at each half-hour, and flag the assets that fell well below their peers. Weather is shared inside the cohort, so comparing against the median normalizes most of it away.
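The cohort-relative residual for one half-hour is also small. A minimal sketch, assuming utilization has already been computed per asset (the function name and the dict shape are my own, not from the post):

```python
from statistics import median

def cohort_residuals(utilization_by_asset):
    """Residual of each asset's utilization against its cohort median.

    utilization_by_asset: {asset_id: utilization} for every asset in one
    latitude-band cohort at one half-hour. Weather is shared inside the
    cohort, so subtracting the median normalizes most of it away.
    """
    med = median(utilization_by_asset.values())
    return {asset: u - med for asset, u in utilization_by_asset.items()}
```

An asset sitting at the cohort median gets a residual of zero regardless of how cloudy the day was, which is exactly the weather-normalization the per-asset baseline lacks.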

I'm going with cohort-relative for two reasons. First, it catches step changes immediately. An inverter that trips at 11am shows up as 2 sigma below cohort in the next half-hour, rather than after a week of rolling-baseline catch-up. Second, it's the methodology the better operators in the field have moved toward: Lightsource bp's published methodology is cohort-based, and Octopus Energy's solar performance team has talked publicly about peer-cohort dashboards. Arguing cohort-relative in the interview means arguing for where the field is already headed.

The trade-off I'm taking on is that a cohort needs at least four peer assets in the same band to produce a useful median. Sparse-geography portfolios (Cornwall, the Highlands) won't have the density. For those I'd fall back to per-asset rolling, but I'm not building that path in v1. The cohort case is the one I want to defend.

entry 3 · 7 May 2026

Why latitude band beats inverter type for cohorts

I considered three ways to group assets into cohorts.

The first option is latitude band, which captures shared weather and shared sun angle. The second is inverter model, which captures shared failure modes; an SMA fleet has different MPPT behaviour than a Huawei fleet. The third is nameplate capacity band, which captures shared scale-related effects like mounting structures and transformer losses.

The right answer is whichever axis captures the most variance you're trying to normalize out. For half-hourly generation data, the variance is dominated by weather, then sun angle, then asset-specific factors. Latitude band groups by the two variables that move MW the most, so it wins.

Inverter-model grouping sounds clever but groups together assets that can be 600km apart and weather-uncorrelated. You end up normalizing the wrong noise term, and worse, you hide inverter-specific failures, which is the thing the grouping was supposed to expose in the first place.

Capacity-band grouping is mostly redundant once you normalize MW into utilization (MW divided by nameplate capacity), which I'm already doing.

I went with 0.5 degree latitude bands. The UK fleet spans roughly 50.5 degrees north to 58.5 degrees north, which gives me sixteen bands. That feels coarse enough that most bands should hold four or more peers but fine enough that weather inside a band stays well-correlated. If I were doing this for a fleet that crossed into mainland Europe, I'd also have to factor in longitude, because sunrise time shifts with east-west position, but for UK-only the longitude gradient is small enough to ignore.
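Band assignment is a one-liner plus bookkeeping. A sketch of how I'd bucket assets (function name is mine; 0.5-degree width matches the choice above):

```python
import math

def latitude_band(lat_deg, band_width=0.5):
    """Map a latitude in degrees to its cohort band, returned as the
    (lower, upper) bounds of the band. Assets sharing a band share a
    cohort. band_width=0.5 is exact in binary floating point, so the
    arithmetic here is safe from rounding surprises.
    """
    lower = math.floor(lat_deg / band_width) * band_width
    return (lower, lower + band_width)
```

Grouping a portfolio is then just `defaultdict(list)` keyed on `latitude_band(asset.lat)`, and the four-peer minimum becomes a length check on each list.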

entry 4 · 8 May 2026

Why z-scores rather than percentile rank

Once I have residuals, I need a way to turn "is this asset bad enough to alert on" into a single number with a defensible threshold.

Two options. Percentile rank means sorting the residuals and reporting each asset's position as a value between zero and one. It's simple, robust to outliers, and easy to explain. Z-score against the cohort standard deviation means dividing the residual by the population standard deviation (pstdev) of the cohort residuals, which is interpretable as "sigmas below the cohort mean."
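The z-score option, sketched with the standard library. I'm centering on the cohort mean explicitly here so "sigmas below the cohort mean" holds exactly; the function name and the zero-spread guard are my own choices:

```python
from statistics import mean, pstdev

def cohort_z(residual, cohort_residuals):
    """Z-score of one asset's residual against its cohort's residuals.

    A value of -2.0 reads as "two sigmas below the cohort mean".
    cohort_residuals: residuals for every asset in the cohort,
    including this one.
    """
    mu = mean(cohort_residuals)
    sigma = pstdev(cohort_residuals)
    if sigma == 0:
        return 0.0  # degenerate cohort: no spread means no signal
    return (residual - mu) / sigma
```

Note `pstdev` (population) rather than `stdev` (sample): the cohort isn't a sample from a larger population, it *is* the population being compared against.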

Z-scores win because they preserve magnitude information. A 4-sigma underperformer is qualitatively different from a 2-sigma one. The first is almost certainly hardware, the second could still be measurement noise. Percentile rank squashes both into "rank one of eight" and loses that signal.

The catch is that z-scores only make sense if the residual distribution is roughly normal. I checked a clear-sky midday hour and the histogram is tighter than normal in the middle (most assets cluster close to the median) with somewhat fatter tails (a few assets diverge for hardware reasons). Not Gaussian, but not so far off that a 2-sigma threshold is wrong. If anything the fat tails make 2-sigma slightly conservative, which is fine for an alerting system that ought to err toward fewer alerts.

I set the alert threshold at 2 sigma below cohort. That's the conventional industry value, and I have no reason yet to argue for tighter or looser. If the false-positive rate becomes a problem during the demo, I'll log how many alerts fire per day and reconsider. Tuning the threshold before I have a feedback loop is premature optimization.

entry 5 · 9 May 2026

The soiling-vs-inverter-fault diagnostic

Detecting that an asset is underperforming is half the problem. The other half is telling the operator what to do, and the recommended action is very different depending on the cause.

For soiling, schedule a wash crew, no urgency. For an inverter fault, dispatch O&M to the inverter station this afternoon and pull the event log before reset. For shading, talk to the site team about vegetation; there's nothing for the dispatcher to do today.

So the question is whether I can tell these three apart from cohort-residual data alone, without bringing in inverter telemetry or site-level weather. I think I can, using the second derivative of the residual time series.

Soiling builds up over days. Dust accumulates and generation creeps slowly down. The first derivative of the residual is small and negative, and the second derivative sits near zero because the slope is roughly constant.

An inverter fault is a step change. The asset is fine at 14:00 and generates zero at 14:30 because the MPPT tripped. The first derivative has one large negative spike, and the second derivative has a matching spike.

Shading is daily-recurring. Same hour, same underperformance. The signal lives in the hour-of-day cross-section rather than in any time derivative.

So the diagnostic chain in diagnose_cause is: first, if max absolute second derivative is greater than 0.25, classify as inverter_fault. Otherwise, if the trend slope across the window is below -0.005, classify as soiling. Otherwise, if the same hour underperformed by more than 30% on four or more days, classify as shading. Otherwise, return "unknown".

The thresholds (0.25, -0.005, 30%) are picked by hand. In production you'd train these against labeled incident data, which I don't have. For a one-week demo I'm calling them out as picked, leaving a placeholder for tuning, and falling back to "unknown" rather than guessing. Returning "unknown" keeps the operator's trust if the classifier is wrong, which is more useful than a confident misclassification.
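The chain above, sketched end to end. The function name `diagnose_cause` is from the post; the signature, the crude endpoint-to-endpoint slope, and the shape of the shading input are my assumptions, and the thresholds are the hand-picked placeholders called out above:

```python
def diagnose_cause(residuals, shade_underperf_days,
                   step_thresh=0.25, soil_slope=-0.005, shade_days=4):
    """Classify the cause of an asset's underperformance.

    residuals: cohort residuals over the analysis window, oldest first.
    shade_underperf_days: count of days on which the same hour of day
    underperformed by more than 30% (computed upstream from the
    hour-of-day cross-section).
    Thresholds are hand-picked placeholders pending labeled data.
    """
    # first and second discrete derivatives of the residual series
    d1 = [b - a for a, b in zip(residuals, residuals[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]

    # step change: a single sharp kink dominates the second derivative
    if d2 and max(abs(x) for x in d2) > step_thresh:
        return "inverter_fault"

    # slow creep: crude trend slope as total change per sample
    slope = (residuals[-1] - residuals[0]) / max(len(residuals) - 1, 1)
    if slope < soil_slope:
        return "soiling"

    # daily-recurring dip at the same hour
    if shade_underperf_days >= shade_days:
        return "shading"

    return "unknown"  # fall back rather than guess
```

In production the slope would come from a least-squares fit rather than the two endpoints, but for a demo the endpoint version is honest about its crudeness.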

entry 6 · 11 May 2026

The conversation I want to have in the interview

I read through the startup's public methodology page. They're using rolling seven-day per-asset baselines with thresholds tuned by asset class, which is the per-asset camp I argued against in entry 2.

I don't think they're wrong about this. Sparse fleets and unusual asset classes (rooftop commercial, behind-the-meter) don't have cohort density to lean on, and per-asset rolling is the right answer there. What I want to talk about is the path to a hybrid approach.

For the cohort-dense slices of the portfolio, by which I mostly mean utility-scale ground-mounted assets in the South-East, cohort-relative would catch step-change degradation roughly five to seven days earlier than per-asset rolling. That's measurable as generation-saved-per-year and converts straightforwardly into pounds.

For the sparse slices, keep per-asset rolling but make it sharper by adding clear-sky index as a regressor. Satellite-derived clear-sky data is available for free from CAMS, and feeding it in as a covariate substantially reduces the cloudy-week false-negative problem that per-asset rolling has on its own.

The two methods produce comparable z-scores, so the alerting layer doesn't have to know which signal fired. The operator sees one unified inbox.

That's the conversation I want to have. Their methodology is the right answer for the asset mix they have today, and the question I want to surface is where it stops scaling, what the wedge into cohort-relative looks like, and what the engineering cost would be. Somebody who has done a week of work and can argue from numbers is a more useful interview partner than somebody who has only read the methodology page.