Appendix A - Detail of Statistical Model

The “advanced” statistical analysis uses linear regression techniques both to establish a counterfactual and to assess the impact of the peak fares Pilot.

The approach used is a “General to Specific” methodology: all variables are initially included and a model is estimated. The most statistically insignificant variable is then excluded and the model re-run. This is repeated until all remaining variables are significant. The full list of variables is as follows:

Table 4 - Variables used and description
Variable | Description
Constant | A standard constant or intercept
Trend | An overall trend growth rate
PFT Dummy | A Peak Fares Trial Dummy: a variable that takes the value 1 from October 1 2023 and 0 before, allowing a shift in demand from the Pilot to be estimated
PFT trend | A trend variable from October 1 2023 that allows the ongoing impact of the Pilot to be estimated
Day of the week variables | Wednesday is chosen as the base, and the Sunday, Monday, Tuesday, Thursday, Friday and Saturday variables take the value 1 on the relevant day of the week, allowing daily variations to be captured*
Month variables | Similar to the Day variables, with September chosen as the base* (all other months take the value 1 when applicable). This is a standard way of capturing seasonal impacts.
XmasNewYear | Accounts for distinctly different travel demand over the Christmas and New Year period
Sport | 1 if there was a major sporting event on the day that would be assumed to influence rail demand
Concert | 1 if there was a major concert or cultural event on the day
Strike | 1 if there was strike action within Scotland on the day
Bad weather | 1 if there was a yellow weather warning on the day
Extreme weather | 1 if there was a major weather event on the day
Travel demand difference | Proxy variable for general travel demand: the percentage variation in road travel demand from the equivalent period in 2019. Various specifications were tested; they make no difference to the other results and only change the interpretation of this variable.

*Note that the choice of the base has no impact on the overall results, only on the interpretation. For example, the Day variables show the impact of each day compared with the base (Wednesday).
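
The construction of the dummy variables in Table 4 can be illustrated with a short sketch. It is not the code used for the analysis: the function name `build_row` is hypothetical, and the assumption that the PFT trend counts days starting at 1 on the first day of the Pilot is ours, since the appendix does not state the unit. The event, weather and travel demand variables are omitted because they depend on external data.

```python
from datetime import date

PILOT_START = date(2023, 10, 1)  # start date given in Table 4
DAYS = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]  # date.weekday(): Monday is 0
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_row(d):
    """Return the calendar-based regressors for one day as a dict."""
    in_pilot = d >= PILOT_START
    row = {
        "Constant": 1,
        "PFT Dummy": 1 if in_pilot else 0,
        # Assumption: the trend counts days, starting at 1 on the first Pilot day.
        "PFT trend": (d - PILOT_START).days + 1 if in_pilot else 0,
    }
    # Wednesday is the base day, so it gets no dummy of its own.
    for i, name in enumerate(DAYS):
        if name != "Wed":
            row[name] = 1 if d.weekday() == i else 0
    # September is the base month, so it gets no dummy of its own.
    for i, name in enumerate(MONTHS, start=1):
        if name != "Sep":
            row[name] = 1 if d.month == i else 0
    return row


first_pilot_day = build_row(date(2023, 10, 1))  # a Sunday in October
base_day = build_row(date(2023, 9, 13))         # a Wednesday in September
```

On a base day (a Wednesday in September before the Pilot) every dummy is 0, so the fitted value for such a day depends only on the constant and the variables not sketched here. This is why every other day and month coefficient is read as a difference from that base.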

For brevity only the results of the full data model are reported.
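
The General to Specific procedure described at the start of this appendix can be sketched as a backward-elimination loop. This is a minimal illustration on synthetic data, not the estimation code behind Table 5. The helper names, the 10% threshold as a default, the decision always to retain the constant, and the use of a normal approximation for p-values (reasonable only for large samples) are all assumptions made for the sketch.

```python
import random
from statistics import NormalDist


def solve_ols(X, y):
    """Ordinary least squares via the normal equations; returns (coefficients, std errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [row + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(xtx)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def general_to_specific(data, y, threshold=0.10):
    """data maps variable name -> column. Drop the least significant variable,
    re-estimate, and repeat until every remaining variable passes the threshold."""
    names = list(data)
    while True:
        X = [[data[name][r] for name in names] for r in range(len(y))]
        beta, se = solve_ols(X, y)
        # Two-sided p-values from a normal approximation (an assumption of this sketch).
        pvals = {m: 2 * (1 - NormalDist().cdf(abs(b / s))) for m, b, s in zip(names, beta, se)}
        candidates = {m: p for m, p in pvals.items() if m != "Constant"}  # constant always kept
        if not candidates or max(candidates.values()) <= threshold:
            return dict(zip(names, beta))
        names.remove(max(candidates, key=candidates.get))


# Synthetic demonstration: y depends on x1 only; x2 is unrelated noise.
rng = random.Random(0)
n = 200
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [100 + 5 * a + rng.gauss(0, 1) for a in x1]
kept = general_to_specific({"Constant": [1.0] * n, "x1": x1, "x2": x2}, y)
```

In practice a statistical package would be used in place of the hand-written solver; the loop structure is the part that corresponds to the method described here.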

The coefficients are discussed below. The Standard Errors are used to assess statistical significance. The star ratings denote significance at the 10% level (*), the 5% level (**) and the 1% level (***) respectively; a lower significance level indicates stronger evidence that the coefficient differs from zero.

Table 5 - Regression results
Variable | Coefficient | Std. Error | Star rating
Constant | 172648.0 | 2852.35 | ***
PFT Dummy | 8598.8 | 4177.52 | **
Trend | 116.8 | 6.47248 | ***
Peak Fares Trend | −140.1 | 26.8472 | ***
Xmas New Year | −47609.3 | 5009.91 | ***
Jan | −19278.5 | 3372.00 | ***
June | −14495.2 | 3304.30 | ***
July | −10417.5 | 3316.79 | ***
Aug | 14522.8 | 3294.85 | ***
Dec | 7474.02 | 3637.33 | **
Sat | 18070.4 | 2754.28 | ***
Sun | −91055.6 | 2733.20 | ***
Mon | −17992.6 | 2712.70 | ***
Thur | 4611.2 | 2712.94 | *
Fri | 19642.7 | 2703.65 | ***
Sport | 17484.3 | 3525.76 | ***
Concert | 15848.7 | 5114.80 | ***
Strike | −119413.0 | 5229.13 | ***
Weather | −26313.2 | 4169.88 | ***
Extreme Weather | −84368.7 | 9473.02 | ***
Travel Demand Diff | 906.8 | 275.004 | ***

Use of the 10% significance level is appropriate for a regression of this type. In practice, all variables are significant at the 1% level except for the PFT Dummy and December variables, which are significant at 5%, and the Thursday variable, which is significant at 10%.
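
The link between a coefficient, its Standard Error and its star rating can be checked with a few lines of code. The sketch below divides the coefficient by its Standard Error and converts the ratio to a two-sided p-value. It uses a normal approximation rather than the exact t distribution, which is an assumption of the sketch and adequate only for large samples; the function name `star_rating` is hypothetical.

```python
from statistics import NormalDist


def star_rating(coefficient, std_error):
    """Return the star rating implied by a coefficient and its standard error."""
    ratio = abs(coefficient / std_error)
    # Two-sided p-value under a normal approximation (assumption of this sketch).
    p_value = 2 * (1 - NormalDist().cdf(ratio))
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# Values taken from Table 5.
pft_dummy = star_rating(8598.8, 4177.52)
thursday = star_rating(4611.2, 2712.94)
december = star_rating(7474.02, 3637.33)
trend = star_rating(116.8, 6.47248)
```

Applied to the Table 5 values, this gives two stars for PFT Dummy and December, one star for Thursday and three stars for Trend, matching the reported ratings.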

The R-squared (0.835) and Adjusted R-squared (0.830) values of the model show that it explains around 83% of the variation in the data.
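
For reference, the standard relationship between R-squared and Adjusted R-squared is sketched below. The number of observations is not reported in this appendix, so the value of 700 used in the example is a placeholder chosen purely for illustration and should not be read as the actual sample size. The count of 20 explanatory variables is the number of rows in Table 5 excluding the constant.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared, penalising R-squared for the number of regressors
    (n_regressors excludes the constant)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Illustration only: n_obs=700 is a placeholder, not the reported sample size.
example = adjusted_r_squared(0.835, n_obs=700, n_regressors=20)
```

The adjustment is small here because the number of regressors is small relative to any plausible number of daily observations.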

For this model, the February, March, April, May, October and November variables were insignificant and were dropped, while the other month variables remained in the model. The retained coefficients are read relative to the base month: for example, all other things being equal, daily demand in June of any year was around 14,500 journeys lower than demand in September. Among the day of the week variables, Tuesday was insignificant, implying that Tuesday demand is not significantly different from Wednesday demand, but all other Day variables were significant and reflect varying travel patterns over the course of an average week.
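
Because the model is additive, day and month effects can be combined by summing their coefficients. The sketch below is a simple illustration using a subset of the Table 5 values; the dictionary and function names are hypothetical. Variables dropped from the model (Tuesday and the insignificant months) are treated as having no estimated difference from the base.

```python
# Subset of coefficients from Table 5 (journeys per day, relative to the base).
COEFFICIENTS = {
    "Jan": -19278.5, "June": -14495.2, "July": -10417.5,
    "Sat": 18070.4, "Sun": -91055.6, "Mon": -17992.6,
    "Thur": 4611.2, "Fri": 19642.7,
}


def effect_vs_base(*variables):
    """Estimated difference in daily journeys from a Wednesday in September,
    all other things being equal. Dropped variables contribute zero."""
    return sum(COEFFICIENTS.get(name, 0.0) for name in variables)


friday_in_june = effect_vs_base("Fri", "June")   # 19642.7 - 14495.2
tuesday_in_march = effect_vs_base("Tue", "Mar")  # both dropped from the model
```

On this reading, a Friday in June is estimated to carry roughly 5,100 more journeys than a Wednesday in September, because the Friday uplift outweighs the June reduction.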

All the event variables were significant with the expected signs (positive or negative). The Travel Demand Diff variable was significant and positive, suggesting that rail demand is higher when general (non-rail) travel demand is higher. Xmas New Year was also significant and negative, as would be expected.

The values and signs of all the significant coefficients are intuitively sensible. The obvious exception is the negative value for the Peak Fares Trend variable, which is discussed in the main text.
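
As a rough illustration of how the PFT Dummy and Peak Fares Trend coefficients in Table 5 combine, the sketch below adds the one-off shift to the accumulating trend term. It assumes the trend variable counts days from the start of the Pilot, which this appendix does not state explicitly. The function name is hypothetical, and the interpretation given in the main text takes precedence over this arithmetic.

```python
PFT_DUMMY = 8598.8   # Table 5: one-off shift in daily journeys
PFT_TREND = -140.1   # Table 5: change per unit of the Pilot trend


def pilot_effect(trend_value):
    """Combined Pilot terms of the model for a given value of the trend variable.
    Assumption: the trend variable is measured in days since the Pilot began."""
    return PFT_DUMMY + PFT_TREND * trend_value


# Trend value at which the two terms cancel out.
break_even = -PFT_DUMMY / PFT_TREND
```

Under the stated assumption the two terms offset each other at a trend value of a little over 61, after which the combined Pilot terms become negative.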