LIES, DAMNED LIES AND TRANSIT KPIs

Is Sydney’s public transport performing well or not performing well? User experience is that it’s not performing well, reflected in frequent news reports, social media outrage and pre-election opinion polling. By contrast, the operator’s statistics indicate Sydney’s rail, buses and ferries are performing well, with consistent high results.

I’m going to explain what “Key Performance Indicators” (KPIs) or “Metrics” are, why they’re important, and what’s wrong with them in the context of Sydney’s urban public transportation (or transit). By way of warning, there will be a bit of what looks like management jargon. Please do not be deterred; I want to stick with the established terminology for consistency.

I’ll contrast the practices of companies working in a competitive environment with the practices of Sydney’s urban transit. I’ll also illustrate by international example that customer focus can be, and has been, replicated in public-sector transit services.

Finally I'll explain why I think the era of rampant misrepresentation of performance is drawing to a close, whether governments, agencies and operators want it to or not.

Why it Matters

The underlying motivation for me writing this is that urban transit performance indicators in Sydney have been gamed. The indicators are more about producing high percentages than telling us anything meaningful. Yet they are used as a basis for judging the effectiveness of investment of tax-payers’ dollars, fare (price) setting and assessing management performance.

Reported success or improvement is rewarded, publicised or used as a basis for fare increases, when in fact many of the reported indicators do not show success or improvement in any meaningful sense (to a customer, investor or economy).

You might expect that the indicators would seek to represent features of performance that are important to users and the success of the transit business. However, this is not the case. And it’s not surprising when incentives around the creation and application of KPIs are not to develop or grow the business, but rather to project success theatre.

A half-hourly Sydney 324 bus departs too full to take on passengers at the Edgecliff rail interchange

The bus operator, State Transit, reports 99.87% Reliability in Q4 2012 (source)

The present Sydney transit KPIs are almost useless from a management perspective; they do not meaningfully inform management actions to improve customer experience. From a passenger and price-setting perspective they are at best useless and at worst intentionally deceptive. Moreover, meaningless KPIs render it impossible to analyse the (dis)improvement arising from strategic and operational changes.

This is not a problem that is confined to Sydney or transit, indeed it permeates the public sector to the highest level. However, with an active and earnest focus on performance, several transit organisations globally have overcome this bane. Sydney can too.

Yes Minister: The Florence Nightingale Award Used the Wrong Metrics to judge success

Background

In a free, competitive market with multiple participants providing similar services, those participants are typically trying their hardest to improve how they fulfil the needs and wants of their customers.

Access, Glamour, Comfort, Value- Public transport in the competitive era (left: Central London rapid transit c1890 right: US Interstate rail c1950)

In the competitive private sector, performance reporting is important mainly for managers to understand how their business is performing, and to quantify which management actions improve or impede performance and by how much. Performance reporting is also necessary to inform investors or prospective investors of how the business is performing. Understanding KPIs or Metrics is the livelihood of investors. They want to put their money in companies which meet (or will meet) consumers’ needs better than other companies trying to meet those needs.

Investors therefore seek factual metrics on performance which are an indicator of how the company satisfies people’s needs, how efficient it is, how it’s growing and how much money it could make. As a corollary, companies are always seeking to provide those metrics, and finesse them in such a way as to claim as much success as they can.

Whether it is a manager seeking to cast his performance in the most favourable light, or a whole company talking-up its performance pre-investment, companies have a natural incentive to create metrics that overstate performance or success at meeting customers’ needs. These, in the lean startup world, are called Vanity Metrics. A famous example was Groupon’s overstated income described here by the New York Times, so-called “adjusted consolidated segment operating income”. Informed investors are attuned to the phenomenon and know when to call bullshit.

While speaking on the ineffectiveness of Vanity Metrics, Eric Ries defines “success theatre” as “energy that goes into making people think you’re being successful, rather than ... into serving customers” from this video (2:26-3:06)

In the private sector, performance indicators or metrics are under scrutiny from investors, prospective investors and even journalists reporting on the market.

In government-funded services operating in an uncompetitive environment, or in government-granted monopolies (like exclusive franchised bus, rail or ferry services), performance reporting is especially important. In the absence of market competition, it provides the only means by which performance and return on taxpayer dollars can be judged.

Unfortunately, however, public sector and monopoly transit operators are not under the scrutiny of shrewd investors. There is no individual or institutional investor who stands to gain or lose millions of dollars as a consequence of operational performance being misrepresented. For that matter, given transit operators' monopoly position serving an often captive market, there is often no direct signal from bad operational performance- only what comes back indirectly from the loss of a metropolitan area’s efficiency and productivity, and user (voter) displeasure expressed at the ballot box or in protests.

Taxpayers and voters are so far from the details of this one small part of their collective financial interests, that even when there is occasional scrutiny, there is no driving motivation to overhaul performance statistics of transit operators. For taxpayers, if the experience is bad enough, they’ll vote accordingly. Unfortunately, on a year-to-year level and on a decision-to-decision level, this gives us no insight into what’s working, what’s not working, and what we are really getting- in objective quantitative terms- for our dollars spent.

Metrics in Sydney’s Transit

I'll delve into a couple of specific examples of KPIs in Sydney, one on buses and one on trains. Both relate to the headline "on-time running" metric.

Back in 2009 the Government's transit price-setting regulator (the Independent Pricing and Regulatory Tribunal, IPART) observed on p26 of its Review of fares for metropolitan and outer metropolitan bus services for 2009:

"Measures of on-time running are largely limited to recording whether the bus leaves the depot on time. IPART has previously noted that this is not a good indicator of the bus network’s actual on-time running performance or the level of service actually experienced by passengers. IPART considers that the inadequacy of this measure makes it difficult to form a meaningful judgement of the change in on-time running performance. "

In 2010, the New South Wales Auditor General produced a report- Improving the Performance of Metropolitan Bus Services. In the conclusions on p3, the Auditor General noted:

“…a lack of performance information has prevented NSWTI from undertaking any comprehensive analysis of the performance of bus services. The proposed new performance assessment regime will be limited to using data already available and will not cover many areas of performance important to bus users. This will continue to frustrate NSWTI’s ability to manage the performance of metropolitan bus services.”

As anyone with an MBA will tell you, if you’re not measuring the performance of what you’re trying to improve, you’re off to a bad start. This would be a damning assessment, were it made by a venture capitalist with regard to an enterprise. If the managers of the company were unable to even understand, let alone demonstrate, the company’s value proposition and performance, there would be concerns for the management’s competence.

But we find that even two years after that recommendation, the reported data remains largely unchanged. The headline “on time running” metric, despite criticism from both the Auditor General and the Independent Pricing Regulator, continues to be defined by a near-meaningless criterion:

“The on-time running for State Transit is defined as a bus that starts its trip between zero to 5 minutes after the scheduled departure time.”

(p6, Quarterly Performance Report, October-December 2012, Transport- State Transit).

According to the Independent Pricing Regulator in December 2012 it is actually not 5 minutes but rather:

“A bus is considered to be on time if it departs from its starting point 5 minutes after its scheduled time. In practice this means a bus can depart up to 5 mins 59 sec late and still be on- time.”

(footnote 13, p7, Metropolitan and Outer Metropolitan Buses – Costs and service Performance Report 2012, IPART)

Just to restate what’s going on here: the only measure of timeliness of service delivery, for management reporting and for review by the independent price setter, is how many buses have left within 5 min 59 secs of timetable from the start of a route.

There is no consideration of how early or late buses become throughout their routes. As a corollary there is no understanding of how many services or connecting services are missed by virtue of early or late running, how much productivity is lost, or the general economic benefit or disbenefit of various courses of management action.
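To make the gap concrete, here is a minimal sketch in Python. The trips, lateness values and function names are invented for illustration; only the "zero to 5 min 59 sec late" window comes from the definitions quoted above. It compares the share of trips that count as on time at their first stop with the share of all stop visits that fall inside the same window.

```python
# Hypothetical lateness observations in seconds relative to the timetable
# (negative = early). Each trip lists its stops in route order.
TRIPS = {
    "trip_a": [30, 200, 480, 700],
    "trip_b": [0, 90, 150, 240],
    "trip_c": [120, -180, 400, 900],
    "trip_d": [300, 500, 650, 800],
}

ON_TIME_WINDOW_S = 5 * 60 + 59  # zero to 5 min 59 s late, per the quoted definitions


def is_on_time(lateness_s):
    """On time means not early and no later than the window allows."""
    return 0 <= lateness_s <= ON_TIME_WINDOW_S


def origin_only_on_time(trips):
    """Share of trips that are on time at their first stop only."""
    return sum(is_on_time(stops[0]) for stops in trips.values()) / len(trips)


def all_stops_on_time(trips):
    """Share of all stop visits, along the whole route, that are on time."""
    visits = [lateness for stops in trips.values() for lateness in stops]
    return sum(is_on_time(lateness) for lateness in visits) / len(visits)


print(f"Measured at origin only: {origin_only_on_time(TRIPS):.0%}")
print(f"Measured at every stop:  {all_stops_on_time(TRIPS):.0%}")
```

In this made-up sample every trip leaves its first stop inside the window, yet only half of the stop visits along the routes do. The headline figure says nothing about that difference.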

On-Time running for Sydney CBD and inner-west Region 6, from p6, Quarterly Performance Information, October-December 2012, Transport- State Transit

The Auditor General was not able to realise any change in the indicators used for price-setting and management reporting. In stark contrast, as explained in the video below, when it suited State Treasury’s interests to demonstrate the need for Federal funding, the bus “on-time” criterion was changed “to within 2-minutes” along the whole route. The manipulation of sampling and thresholds to suit managerial and political interests successfully turns the performance of inner-Sydney buses from “90.54% On-Time” into “81% running late”.

(Seven News Sydney, 9 out of 10 Sydney Buses Late, 29/4/2013)

This whimsical redefinition of KPIs to suit political ends has also been seen in Sydney's passenger rail system. Rail "on-time running" was judged against a threshold of 3 minutes 59 seconds. In 2005 the threshold was changed to 5 minutes. On-time running performance jumped from 63% to 89%. The improvement in reported performance between 2005 and 2006 was nothing but an exercise in statistical manipulation- a hollow and superficial victory if ever there was one.

Sydney's passenger rail operator increasing claimed performance by redefining "on-time" (source: CityRail http://www.cityrail.info/about/our_performance/otr_year.jsp)
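The mechanics of this kind of redefinition are easy to reproduce. The sketch below is illustrative only: the ten lateness values are invented and are not CityRail data. It scores the same set of trips against the old 3 minute 59 second threshold and the new 5 minute threshold.

```python
# Invented lateness values in seconds for ten trips (not real CityRail data).
LATENESS_S = [30, 60, 150, 180, 210, 252, 270, 288, 360, 720]

OLD_THRESHOLD_S = 3 * 60 + 59  # 3 min 59 s
NEW_THRESHOLD_S = 5 * 60       # 5 min


def on_time_share(lateness_s, threshold_s):
    """Share of trips whose lateness does not exceed the threshold."""
    return sum(late <= threshold_s for late in lateness_s) / len(lateness_s)


print(f"Old threshold: {on_time_share(LATENESS_S, OLD_THRESHOLD_S):.0%} on time")
print(f"New threshold: {on_time_share(LATENESS_S, NEW_THRESHOLD_S):.0%} on time")
```

Nothing about the trips changes between the two lines of output. The reported figure moves from 50% to 80% purely because three trips that were between four and five minutes late are relabelled as on time.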

These visible and obvious abuses of KPIs are symptomatic of a less visible structural problem with transit metrics in Sydney- they simply do not represent a meaningful or useful measure of how public transport is performing.

Whether it is the total lack of consideration for journey times, reliability being defined as for example "the percentage of buses that ran" (with no consideration for whether those buses were early, late, or too full to take passengers), or whether it is the lack of consideration for actual timeliness with which passengers complete their rail, bus or ferry journeys, the headline statistics cannot be interpreted to give an understanding of what users experience.

Moreover, scores against arbitrary thresholds (like 95.2% of trains ran less than five minutes and 59 seconds late at six measuring points in the network) cannot be used to understand the economic value or cost of performance.

These examples only relate to the headline on-time performance metric. Similar problems pervade most of the metrics which appear both in voluntarily made and statutory reports. Broader criticisms and suggestions relating to metrics in Sydney's rail system can be read in this 2011 report for the NSW Business Chamber entitled "Improving Customer Experience". For the sake of brevity I'm going to stick to examples relating to the top level metrics only.

In summary Sydney's urban passenger transport provider KPIs

Are prone to political and managerial abuse, by virtue of using arbitrarily defined thresholds (e.g. X trains within Y minutes at location Z) rather than using absolutes (e.g. Passengers experienced X minutes of lateness to their journeys)
Have been abused in the last decade, turning "63% on-time" into "89% on-time", or 90.54% on-time into "81% late" through shifting thresholds and samples
Have been identified by the Auditor General, the Government Pricing Regulator and the NSW Business Chamber as flawed
Have not been reviewed or corrected, and continue to appear in performance reports in their deficient form.

So is Sydney's problem a global one, an intrinsic failing of uncompetitive, government-funded markets, a problem to which we should be fatalistically resigned? You're probably guessing from my overcooked rhetoric that the answer is a resounding no.

International Examples

Public transportation in Zurich is often regarded as amongst the best in the world, so Swiss practice merits consideration (I'd argue Swiss practice in revenue collection deserves consideration too!)

Swiss Federal Railways (SBB) KPIs from their Punctuality and Safety Page

Of the Swiss Federal Railways' (Schweizerische Bundesbahnen or SBB) top three reported KPIs, the first looks superficially familiar (I'll come back to that) and the second and third look decidedly unfamiliar. That is because they do not relate to an arbitrary threshold, but rather focus on actual customer outcomes.

"Connections made" reports the proportion of interchanges between services that were able to be made (i.e. not missed as a consequence of late running services). This captures the impact of lateness on a journey of which a single train trip is only one part. From a customer's perspective, a few minutes of lateness for one service can very quickly amplify if it means missing a connection with a half-hourly service.

"Gross passenger delay minutes" reports the actual total delay experienced by passengers. Not only is this an informative passenger-centric continuous value in its own right, it can also be used to evaluate the economic cost of lost productivity due to delay.

The headline KPI "customer punctuality" is defined as "Percentage of passengers who arrive on time or less than 3 minutes late". Though this looks similar to Sydney's On-Time Running, in the sense that it refers to performance against a threshold, it is actually very different in that it is reporting about punctuality of customers not trains. Whereas in Sydney a 7-minute late train carrying 1300 people is the same as a 7-minute late train carrying five people, in Switzerland it is all about the passenger. Non-peak services can therefore not be relied on to dilute bad performance in peak periods. You can browse a full performance report in English here.
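The difference between counting trains and counting passengers can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the figures are invented, the function names are mine, and the exact SBB calculation methods are not described in the material above. The sketch assumes every passenger on a service experiences that service's arrival delay, and treats a connection as made when the inbound service arrives early enough to allow a minimum transfer time.

```python
# Invented figures; none of these come from SBB reports.
# Each service: (arrival delay in minutes, passengers on board).
SERVICES = [
    (7, 1200),  # one crowded peak train, 7 minutes late
    (1, 100),
    (0, 100),
    (2, 100),
    (0, 100),
]

# Each planned interchange, in minutes after midnight:
# (actual inbound arrival, actual onward departure, minimum transfer time).
CONNECTIONS = [
    (480, 486, 4),
    (483, 486, 4),
    (510, 516, 3),
    (514, 516, 3),
    (540, 550, 5),
]

THRESHOLD_MIN = 3  # "on time or less than 3 minutes late"


def train_punctuality(services):
    """Share of services under the threshold, ignoring passenger numbers."""
    return sum(delay < THRESHOLD_MIN for delay, _ in services) / len(services)


def customer_punctuality(services):
    """Share of passengers whose service arrived under the threshold."""
    total = sum(pax for _, pax in services)
    return sum(pax for delay, pax in services if delay < THRESHOLD_MIN) / total


def gross_passenger_delay_minutes(services):
    """Total minutes of delay summed over every passenger."""
    return sum(delay * pax for delay, pax in services)


def connections_made(connections):
    """Share of interchanges where arrival plus transfer time beat the departure."""
    made = sum(arr + transfer <= dep for arr, dep, transfer in connections)
    return made / len(connections)


print(f"Train punctuality:    {train_punctuality(SERVICES):.0%}")
print(f"Customer punctuality: {customer_punctuality(SERVICES):.0%}")
print(f"Passenger delay min:  {gross_passenger_delay_minutes(SERVICES)}")
print(f"Connections made:     {connections_made(CONNECTIONS):.0%}")
```

With these invented numbers four out of five trains are "on time", but only a quarter of the passengers are, because the one late train is the one most people were on.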

London, a more familiar city to Sydneysiders, also has informative practices when it comes to KPIs.

Extract from Underground (Transport for London) Performance Report for Period 12 2012-13

Though I would consider the first KPI in the table "customer satisfaction" fairly subjective, tube KPIs start with a similar focus to the SBB: passengers. The demonstrable evidence of the Underground's customer focus is in the metric "Excess Journey Time" in minutes per passenger from system entry to exit. Again, like the SBB's Gross Passenger Delay Minutes, this is about the impact of unreliability on customers in a meaningful form. Transport for London (TfL) goes a step further than the SBB, including the impacts of ticket queues and station congestion. They explain here:

Excess Journey Time

The Journey Time Metric captures service performance, related to demand, and expresses the information as average passenger journey time. For the purposes of the Journey Time Metric, each journey is broken down into its constituent parts namely access from station entrance to platform, ticket queuing & purchase time, platform wait time, on train time, platform to platform interchange and egress from platform to station exit.
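As a rough illustration of the idea, the sketch below adds up the journey components named in that description and compares actual with scheduled time. It is a simplified assumption rather than TfL's published method: the component timings and passenger counts are invented, and no weighting is applied to any component.

```python
# Journey components named in the TfL description above.
COMPONENTS = (
    "access", "ticketing", "platform_wait", "on_train", "interchange", "egress",
)


def journey_time(parts):
    """Total minutes across all components of one journey."""
    return sum(parts[name] for name in COMPONENTS)


def excess_journey_time(scheduled, actual):
    """Minutes by which the actual journey exceeded the scheduled one."""
    return journey_time(actual) - journey_time(scheduled)


def mean_excess_per_passenger(journeys):
    """Passenger-weighted average excess, in minutes per passenger."""
    total_pax = sum(pax for _, _, pax in journeys)
    total_excess = sum(
        excess_journey_time(sched, act) * pax for sched, act, pax in journeys
    )
    return total_excess / total_pax


# Invented example: (scheduled minutes, actual minutes, passengers).
JOURNEYS = [
    (
        dict(access=2, ticketing=1, platform_wait=3, on_train=20, interchange=4, egress=2),
        dict(access=2, ticketing=3, platform_wait=5, on_train=24, interchange=4, egress=2),
        300,
    ),
    (
        dict(access=3, ticketing=1, platform_wait=2, on_train=10, interchange=0, egress=2),
        dict(access=3, ticketing=1, platform_wait=2, on_train=11, interchange=0, egress=2),
        100,
    ),
]

print(f"{mean_excess_per_passenger(JOURNEYS):.2f} excess minutes per passenger")
```

Because ticket queues and platform waits are components in their own right, a long queue at the ticket window shows up in the result even when every train runs to schedule.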

London Buses also reports some interesting and noteworthy metrics. Aside from the focus again on actual quantified customer experience, there is a focus not only on how the system performs relative to its undertaking (i.e. the timetable), but also how good the service level of the undertaking is in the first place. This is demonstrated in the "Average scheduled wait" for high frequency services, telling us what the average waiting time is for a passenger who just turns up at a stop, as distinct from someone who has turned up for a specific timetabled service. This should be equally applicable to Sydney's high frequency buses like the 380 and other trunk routes.

Extract from London Buses performance report for 2012-13
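For a passenger who turns up at random, the expected wait depends on the gaps between buses rather than on the timetable. A minimal sketch follows, assuming passengers arrive uniformly over time and can always board the next bus; under that assumption the average wait is the sum of squared headways divided by twice the sum of headways. Whether London Buses computes its figure in exactly this way is not stated in the report extract, and the headways below are invented.

```python
def average_wait(headways_min):
    """Expected wait in minutes for a passenger arriving at a random moment.

    A gap of h minutes is 'hit' by arrivals in proportion to h, and the
    average wait inside that gap is h / 2, giving sum(h^2) / (2 * sum(h)).
    """
    return sum(h * h for h in headways_min) / (2 * sum(headways_min))


# Six buses an hour in both cases (invented headways, in minutes).
EVEN = [10, 10, 10, 10, 10, 10]
BUNCHED = [2, 18, 2, 18, 2, 18]

print(f"Evenly spaced: {average_wait(EVEN):.1f} min average wait")
print(f"Bunched:       {average_wait(BUNCHED):.1f} min average wait")
```

Both patterns deliver six buses an hour, but bunching lifts the average wait from 5.0 to 8.2 minutes. Running the same calculation on scheduled headways gives the scheduled wait, and on observed headways gives the wait actually experienced.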

In summary these international practices have three important features

They are customer focused (i.e. waiting time of customer, customer delay minutes not train-based)
They recognise that customer experience is not of a train or bus in isolation, but an end-to-end experience, with time sensitive interchanges (i.e. between SBB mainline services "Connections Made" or tube services "Excess Journey Time")
With a few exceptions, they are not based on arbitrary intervals which can be conveniently redefined

It would be a great step forward indeed if Sydney shifted to meaningful, transparent and actionable metrics like these. It would largely eliminate the "success theatre" such as that egregious Sydney Buses "reliability" example I opened this post with.

There may be an opportunity to even surpass these international examples, with a slight expansion of the same theme.

What should KPIs look like in transit?

Before suggesting the logical progression of these metrics, I'd just like to step back for a moment and re-establish what the important considerations are. How do we know time and timeliness is so important- are the SBB and TfL right to focus so much on time? What about air conditioning, escalators etc? There are indeed many factors influencing the choice to use transit and the level of satisfaction of that use.

Transport economists have spent years experimenting, hypothesising and testing what is valued by transit customers. These are the variables that drive mode choice, e.g. lead an individual to choose transit over their car.

In classic transportation forecasting, travel time is the driver of mode choice and route assignment. This travel time between an origin and destination includes a sequence like the following:

Walking from your start location to the transit station
Travel Time
Time to walk to interchange with another service
Waiting time
Travel time
Walking to your destination

More sophisticated analyses use the concept of Generalised Cost, but still the headline focus on end-to-end journey time remains the same. Fare price, comfort, interchange delay and weather exposure are some of the elements of Generalised Cost. (You can read more on the components of Generalised Cost and the general principles of economics of travel in TRL’s excellent 2004 publication “The demand for public transport: a practical guide”. Section 5.2 (p39) is particularly relevant here).
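In its simplest form, Generalised Cost can be sketched as the fare plus a money value for each weighted minute of the journey. The weights and the value of time below are placeholders chosen for illustration, not figures from the TRL guide; walking, waiting and interchange time are given a heavier weight than in-vehicle time, which is a common modelling convention.

```python
# Illustrative weights relative to one minute of in-vehicle time (assumed values).
WEIGHTS = {"walk": 2.0, "wait": 2.0, "in_vehicle": 1.0, "interchange": 2.0}

VALUE_OF_TIME_PER_MIN = 0.25  # dollars per weighted minute (assumed value)


def generalised_cost(minutes, fare):
    """Fare plus the money value of the weighted journey time."""
    weighted_minutes = sum(WEIGHTS[part] * mins for part, mins in minutes.items())
    return fare + VALUE_OF_TIME_PER_MIN * weighted_minutes


# Invented door-to-door journey, in minutes per component.
as_timetabled = {"walk": 10, "wait": 5, "in_vehicle": 30, "interchange": 5}
# Same journey after a missed connection adds ten minutes of waiting.
missed_connection = {**as_timetabled, "wait": 15}

print(f"As timetabled:     ${generalised_cost(as_timetabled, fare=4.00):.2f}")
print(f"Missed connection: ${generalised_cost(missed_connection, fare=4.00):.2f}")
```

Under these assumed values, ten extra minutes of waiting adds more to the cost of the journey than the fare itself, which is why delay measured in passenger minutes can be translated into economic terms.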

We have to look no further than the building blocks of Generalised Cost- and of what customers value in revealed preference- to understand what the key aspects of value, utility and transit performance are. Metrics should therefore give insight into performance against these key parameters that constitute Generalised Cost and which drive the choice to use transit.

Time is the key driver of Generalised Cost and in turn transit mode share, but it is not timeliness in isolation on a single service or at a single point-- it is the total door-to-door transit system experience that matters.

While the metrics reported by TfL and the SBB are superior by virtue of capturing things like connections made, queuing time and service interchange delays, they do so within mode-based silos. Under integrated transit administrations, like TfL, the ZVV and the new Transport for New South Wales, it should be possible to evaluate a service's contribution to global door-to-door travel time performance, not just a particular operator's performance within its operating silo.

For example, the criterion might remain the same ("gross passenger delay minutes" or "excess journey time"), but it can be evaluated over an entire multi-modal system. The technological limitations that once existed, requiring physical checking and sampling, can now be entirely automated using realtime data streams for passenger information. There is no longer a technological limitation to evaluating whole-of-system performance, while simultaneously filtering by operator, mode, time and geography to optimise into both a powerful management tool and the ultimate pricing performance benchmark.

Given the advantages that easily-manipulated KPIs give to various people in leadership positions, I don't expect there will be a hasty or willing transition. There is however some great news for most people. Let's face it, if you made it this far through this post you deserve some really great news.

Great News (unless you're covering your arse)

Within 5 to 10 years, irrespective of whether governments, authorities or transit operators reform their reported metrics, open access to real-time data will enable third parties, whether it be the Auditor General or a teenager in a garage or both, to compare and interpret real operating data with timetables to ascertain actual operating performance. While this may not tell us much more about air conditioning and crowding (the internet of things could change that a bit later), it will open up the headline performance metrics like "on time running" and "passenger delay" to broad scrutiny.
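The comparison itself is not complicated. The sketch below shows the general shape of it in Python, using invented records: a timetable keyed by trip and stop, and a list of observed arrivals of the kind a real-time feed might supply. The field layout, identifiers and operator labels are hypothetical and do not correspond to any particular published feed.

```python
from collections import defaultdict

# Timetable: (trip_id, stop_id) -> scheduled arrival, minutes after midnight.
SCHEDULED = {
    ("t1", "a"): 450,
    ("t1", "b"): 460,
    ("t2", "a"): 480,
    ("t2", "b"): 495,
}

# Observed arrivals: (trip_id, stop_id, actual arrival, operator).
OBSERVED = [
    ("t1", "a", 451, "bus"),
    ("t1", "b", 468, "bus"),
    ("t2", "a", 480, "rail"),
    ("t2", "b", 497, "rail"),
    ("t3", "a", 500, "rail"),  # no timetable entry, so it is skipped
]


def mean_delay_by_operator(scheduled, observed):
    """Average lateness in minutes per stop visit, grouped by operator.

    Early arrivals are counted as zero delay in this sketch.
    """
    delays = defaultdict(list)
    for trip_id, stop_id, actual, operator in observed:
        planned = scheduled.get((trip_id, stop_id))
        if planned is None:
            continue  # cannot judge lateness without a scheduled time
        delays[operator].append(max(0, actual - planned))
    return {op: sum(vals) / len(vals) for op, vals in delays.items()}


for operator, delay in sorted(mean_delay_by_operator(SCHEDULED, OBSERVED).items()):
    print(f"{operator}: {delay:.1f} min average delay per stop visit")
```

Grouping by operator is only one choice of key. The same join could be grouped by route, hour of day or suburb, and weighted by passenger counts where those are available.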

We are already starting to see static performance data (like service frequencies) represented in new meaningful ways using openly available tools and data.

A Google Heat map of Boston transit service levels from The Walking Bostonian

With both the New South Wales and Australian Governments committed to Open Data it is only a matter of time before live operating data can be comprehensively analysed and interpreted in new, beautiful and easily digested forms. Imagine the above heat map as "Minutes of Delay Per Passenger between 0730HRS and 0930HRS". That is the future. Sydney Buses' claims of 99.95% reliability will become moot. So understandable, engaging and compelling will this data become that the political and managerial claims will have no choice but to become congruent with and address operating reality and the customer experience.

I'm planning to write more on the huge implications that open data will have for transit in the next 20 years. In the meantime, if it's a concept that's new to you, aside from the above links this video provides a comfortable intro (with an awesome Phoenix sound track and Irish narration).

Conclusion

Sydney's transit metrics are poorly defined, subject to abuse, and do not represent aspects of performance that are relevant to customers. The Auditor General, Independent Pricing Regulator and NSW Business Chamber have all identified serious and unremediated flaws in Sydney's transit performance reporting. We do not know in a quantitative sense how Sydney's urban transportation is performing.

There are international practices that could be implemented providing an immediate improvement. There are opportunities to even advance the meaningfulness and usefulness of transit metrics further by developing measures that encompass customer experience throughout a door-to-door journey. Cities seeking to enhance their productivity and efficiency will continue to proactively develop and manage these.

Irrespective of whether governments, agencies and transit operators choose to take a proactive approach (like the Swiss SBB or London's TfL), availability of open operating data will within the next decade deliver a new era of performance awareness and accountability.

David Caldwell - personal thoughts

Saturday, May 25, 2013