Appendix A - Detail of Statistical Model
The “advanced” statistical analysis uses linear regression techniques both to establish a counterfactual and to assess the impact of the peak fares Pilot.
The approach used is a “General to Specific” methodology: all variables are initially included and a model is estimated. The most statistically insignificant variable is then excluded and the model is re-estimated. This is repeated until all remaining variables are significant. The full list of variables is as follows:
Variable | Description |
---|---|
Constant | A standard constant or intercept |
Trend | An overall trend growth rate |
PFT Dummy | A Peak Fares Trial Dummy - A variable that takes the value 1 from October 1 2023 and 0 before and allows a shift in demand from the Pilot to be estimated |
PFT trend | A trend variable from October 1 2023 that allows the ongoing impact of the Pilot to be estimated |
Day of the week variables | Wednesday is chosen as the base, and the Sunday, Monday, Tuesday, Thursday, Friday and Saturday variables take the value 1 on the relevant day of the week to allow daily variations to be captured* |
Month variables | Similar to the Day variables, September is chosen as the base* (All other months take the value 1 when applicable). This is a standard way of capturing seasonal impacts. |
XmasNewYear | To account for distinctly different travel demand over the Christmas and New Year period. |
Sport | 1 if there was a major sporting event that would be assumed to influence rail demand on the day |
Concert | 1 if there was a major concert or cultural event on the day |
Strike | 1 if there was strike action within Scotland on the day |
Bad weather | 1 if there was a yellow weather warning on the day |
Extreme weather | 1 if there was a major weather event on the day |
Travel demand difference | Proxy variable for general travel demand: the percentage variation in road travel demand from the equivalent period in 2019. Various specifications were tested; they make no difference to the other results and change only the interpretation of this variable. |
*Note that the choice of base has no impact on the overall results, only on their interpretation. For example, the Day variables show the impact of each day compared with the base (Wednesday).
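To illustrate how the dummy and trend variables in the list could be built for a single calendar day, here is a minimal sketch. The function name `encode_day` and the column labels are hypothetical, and the sketch assumes that the PFT trend counts days since the start of the Pilot (1 on the first day); the source does not state the exact unit or starting value of the trend.

```python
from datetime import date

PILOT_START = date(2023, 10, 1)  # Pilot start date given in the variable list
DAY_NAMES = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]  # date.weekday(): Monday = 0
BASE_DAY = "Wed"  # base category, so it has no column of its own


def encode_day(d: date) -> dict:
    """Return the PFT and day-of-week regressors for one calendar day."""
    row = {}
    # Step dummy: 1 from the Pilot start date onwards, 0 before.
    row["PFT Dummy"] = 1 if d >= PILOT_START else 0
    # ASSUMPTION: the trend counts days into the Pilot, starting at 1.
    row["PFT trend"] = (d - PILOT_START).days + 1 if d >= PILOT_START else 0
    # One 0/1 column per day of the week except the base day.
    today = DAY_NAMES[d.weekday()]
    for name in DAY_NAMES:
        if name != BASE_DAY:
            row[name] = 1 if name == today else 0
    return row


monday = encode_day(date(2023, 10, 2))     # a Monday, second day of the Pilot
wednesday = encode_day(date(2023, 9, 27))  # a Wednesday before the Pilot
```

On a Wednesday every day column is 0, which is why the other day coefficients are read as differences from Wednesday.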
For brevity only the results of the full data model are reported.
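The General to Specific procedure described at the start of this appendix can be sketched as a simple elimination loop. This is an illustration only, not the software actually used: `fit_pvalues` is a hypothetical callable that re-estimates the model on the variables passed to it and returns a p-value for each, the 10% threshold is taken from the significance levels used in this appendix, and keeping the constant regardless of its p-value is an assumption. The toy p-values are invented purely to make the example runnable.

```python
def general_to_specific(variables, fit_pvalues, alpha=0.10, always_keep=("Constant",)):
    """Drop the least significant variable and re-fit until all remaining are significant."""
    kept = list(variables)
    while True:
        pvalues = fit_pvalues(kept)  # re-estimate the model on the current variable set
        droppable = {v: pvalues[v] for v in kept if v not in always_keep}
        if not droppable:
            return kept
        worst = max(droppable, key=droppable.get)  # largest p-value = least significant
        if droppable[worst] <= alpha:
            return kept  # everything left is significant at the chosen level
        kept.remove(worst)


# Toy stand-in for a regression fit: fixed, INVENTED p-values (not from the report).
# In a real fit the p-values would change each time a variable is removed.
TOY_PVALUES = {"Constant": 0.000, "Trend": 0.001, "Sat": 0.000, "Tue": 0.62, "Feb": 0.41}


def toy_fit(kept):
    return {v: TOY_PVALUES[v] for v in kept}


result = general_to_specific(["Constant", "Trend", "Sat", "Tue", "Feb"], toy_fit)
```

With these toy values the loop removes Tue first, then Feb, and stops with Constant, Trend and Sat.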
The coefficients are discussed below. The standard errors are used in assessing statistical significance. The star ratings denote significance at the 10% level (*), the 5% level (**) and the 1% level (***) respectively; a lower significance level indicates stronger evidence that the coefficient differs from zero.
Variable | Coefficient | Std. Error | Star rating |
---|---|---|---|
Constant | 172648.0 | 2852.35 | *** |
PFT Dummy | 8598.8 | 4177.52 | ** |
Trend | 116.8 | 6.47248 | *** |
Peak Fares Trend | −140.1 | 26.8472 | *** |
Xmas New Year | −47609.3 | 5009.91 | *** |
Jan | −19278.5 | 3372.00 | *** |
June | −14495.2 | 3304.30 | *** |
July | −10417.5 | 3316.79 | *** |
Aug | 14522.8 | 3294.85 | *** |
Dec | 7474.02 | 3637.33 | ** |
Sat | 18070.4 | 2754.28 | *** |
Sun | −91055.6 | 2733.20 | *** |
Mon | −17992.6 | 2712.70 | *** |
Thur | 4611.2 | 2712.94 | * |
Fri | 19642.7 | 2703.65 | *** |
Sport | 17484.3 | 3525.76 | *** |
Concert | 15848.7 | 5114.80 | *** |
Strike | −119413.0 | 5229.13 | *** |
Weather | −26313.2 | 4169.88 | *** |
Extreme Weather | −84368.7 | 9473.02 | *** |
Travel Demand Diff | 906.8 | 275.004 | *** |
A 10% significance threshold is appropriate for a regression of this type. In practice, all variables are significant at the highest level (1%) except the PFT Dummy and Dec variables, which are significant at 5%, and the Thursday variable, which is significant at 10%.
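The star ratings can be approximately reproduced from the Coefficient and Std. Error columns. The sketch below divides one by the other and converts the ratio to a two-sided p-value. It assumes a standard normal approximation, which is reasonable for a large sample; the original analysis most likely used a t-distribution, so borderline cases could differ. The function name `star_rating` is hypothetical.

```python
import math


def star_rating(coefficient, std_error):
    """Return *, ** or *** for significance at the 10%, 5% or 1% level."""
    t_ratio = coefficient / std_error
    # Two-sided p-value under a standard normal approximation (ASSUMPTION).
    p_value = math.erfc(abs(t_ratio) / math.sqrt(2))
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# Values taken from the results table.
pft_dummy = star_rating(8598.8, 4177.52)
thursday = star_rating(4611.2, 2712.94)
trend = star_rating(116.8, 6.47248)
december = star_rating(7474.02, 3637.33)
```

Under this approximation the four examples give the same ratings as the table: two stars for PFT Dummy and Dec, one for Thur and three for Trend.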
The R-Squared (0.835) and Adjusted R-squared (0.830) values of the model show that it explains around 83% of the variation in the data.
For this model, the February, March, April, May, October and November variables were insignificant and were dropped; the remaining month variables stayed in the model. This implies, for example, that all other things being equal, daily demand in June of any year was around 14,500 journeys lower than in September. Of the day of the week variables, Tuesday was insignificant, implying that Tuesday demand is not significantly different from Wednesday demand. All other Day variables were significant and reflect varying travel patterns over the course of an average week.
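Because the month and day variables are 0/1 dummies, their coefficients simply add up. The sketch below sums the relevant coefficients from the results table to give the estimated difference in daily journeys from the base case (a Wednesday in September), holding all other variables fixed. The helper name `shift_from_base` is hypothetical and only a subset of coefficients is included.

```python
# Selected coefficients copied from the results table.
COEFFICIENTS = {
    "Jan": -19278.5,
    "June": -14495.2,
    "Sat": 18070.4,
    "Sun": -91055.6,
}


def shift_from_base(active):
    """Sum the coefficients of the dummies that equal 1 on a given day."""
    return sum(COEFFICIENTS[name] for name in active)


june_saturday = shift_from_base(["June", "Sat"])   # about +3,575 journeys
january_sunday = shift_from_base(["Jan", "Sun"])   # about -110,334 journeys
```

A Saturday in June is therefore estimated to carry roughly 3,600 more journeys than a September Wednesday, because the Saturday uplift outweighs the June reduction.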
All the event variables were significant with the expected signs (positive or negative). The Travel Demand Diff variable was significant and positive, suggesting that rail demand is higher when total (non-rail) demand is higher. Xmas New Year was also significant and negative, as would be expected.
The values and signs of all the significant coefficients are intuitively sensible. The obvious exception is the negative value for the Peak Fares Trend variable, which is discussed in the main text.
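As a purely arithmetical illustration of how the PFT Dummy and Peak Fares Trend coefficients combine, the sketch below adds the step change to the accumulated trend. It assumes the trend variable increases by one for each day of the Pilot, which the source does not state explicitly, and it says nothing about why the trend is negative; that interpretation belongs to the main text. The function name `pilot_effect` is hypothetical.

```python
PFT_DUMMY = 8598.8   # step change in daily journeys from the results table
PFT_TREND = -140.1   # change per unit of the Pilot trend from the results table


def pilot_effect(days_into_pilot):
    """Combined estimated effect of the two Pilot variables on daily journeys."""
    # ASSUMPTION: the trend variable equals the number of days into the Pilot.
    return PFT_DUMMY + PFT_TREND * days_into_pilot


first_day = pilot_effect(1)
# Point at which the negative trend offsets the initial step change.
break_even_days = PFT_DUMMY / -PFT_TREND
```

Under that daily-trend assumption the two terms cancel after a little over 61 days. If the trend were measured in a different unit the break-even point would scale accordingly.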