Mirror, Mirror on the Wall - How good is CricketX, after all?
Arindom Mookerjee - 26 February 2001
(An assessment of how the CricketX model has fared in picking winners...)
Time for introspection. There are several features unique to CricketX. We
are the only web-site to post forecasts of cricket matches, both static
(pre-match) and dynamic (over-by-over). But are our forecasts reliable?
Here are the bare facts. You, the thinking fan, sit in judgement.
The model
The proprietary model of CricketX was first developed by Dr. Surjit
S. Bhalla in his book, "Between the wickets: The who and why of the best in
cricket" (1987). This basic model has since undergone many revisions that
have seen it transformed from just an authoritative rating system to a more
holistic model of forecasting. The core of the CricketX model is based on
crimetrics (from: cricket econometrics). Crimetrics allows the use of
sophisticated statistical techniques to adjust player performances
according to the nature of the pitch and the quality of the opposition. For
teams, it leads to the construction of batting and bowling indices that are
vital inputs in the forecasting model.
The data
The CricketX database covers every single international match ever played -
now 1508 Tests and 1624 ODIs. However, over-by-over scoring is a
comparatively new development. The run rate comparison statistic has
featured more prominently since the World Cup in England in 1999. Our
database tracks this for all matches since the 1999 World Cup - about 150
matches. The analysis of our predictions is based on this sample and
does not take abandoned matches into account. We do not rate or predict
matches involving non-Test-playing nations like Kenya, Holland or Scotland.
The results
There are three sets of results: one for our static forecasts, a second
for predicting the score at the end of the 50 overs in the first innings,
and a third for the dynamic winning probabilities in the second innings.
Besides these, there are forecasts of individual performances.
With the static forecasts, we got a creditable 69% right. This means that
out of 100 matches, we have correctly predicted the outcome of 69 of them
before a ball is bowled. For the World Cup, this figure was around 73%.
Before the match begins, we display two sets of probabilities on our home
page, depending on which side bats first. The record also shows that our
hits and misses are spread evenly between the team batting first and the
team batting second. This suggests that the model is fairly robust.
Predicting the end-score of the first innings
After        Mean error (runs)    Error range (runs, 25th - 75th %ile)
10 overs     10                   -22 to 36
15 overs     14                   -18 to 31
25 overs     13                   -13 to 31
35 overs     8                    -11 to 22
45 overs     0.7                  -9 to 11
For the first innings Dynamic Score Predictor (DSP), the mean error at the
end of 10 overs is 10 runs. Till about the 25th over, the error stays
around 13 to 14 runs. By the 35th over the difference between the predicted
score and the actual score at the end of 50 overs falls to 8 runs on
average. Five overs from the close, the mean error is less than 1 run. The
range of the errors narrows progressively too. At the end of 15 overs, when
the mean error is 14 runs, the middle half of the errors lies between -18
and 31 runs. By the 35th over, the range narrows to -11 to 22 runs.
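For readers who want to see how figures of this kind can be computed, here is a minimal Python sketch. It is not the CricketX code: the function name, the sign convention (predicted minus actual) and the five match totals are assumptions made purely for illustration.

```python
import statistics

def summarize_errors(predicted, actual):
    """Return (mean error, 25th percentile, 75th percentile) in runs."""
    # Assumed sign convention: error = predicted total minus actual total.
    errors = [p - a for p, a in zip(predicted, actual)]
    # Quartiles of the errors; the 1st and 3rd bound the middle half.
    q1, _, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return statistics.mean(errors), q1, q3

# Made-up first-innings totals for five hypothetical matches.
predicted = [250, 230, 270, 210, 240]
actual = [240, 235, 250, 215, 230]

mean_err, q1, q3 = summarize_errors(predicted, actual)
print(f"mean error {mean_err:.1f} runs, middle half of errors {q1:.0f} to {q3:.0f}")
```

Run once per checkpoint (10 overs, 15 overs and so on), a summary like this would give one row of a table such as the one above.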
Predicting the winner in the second innings
Overs left   Winner correctly identified (% of matches)
50           78
40           81
35           82
25           84
15           89
5            88
Once the target is known and before the second innings gets underway, we
have called it right 78% of the time. At the half-way mark, our record
improves to 84%. This means that half-way through the second innings, we
were able to say correctly whether the chasing team would win (a winning
probability of 51% or more) or not in 84% of matches. This percentage
rises steadily to 89 by the 35th over, and stands at 88 with five overs
left.
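The hit rate at each stage of the chase can be tallied along the following lines. This is only a sketch under stated assumptions: the record layout, the function name and the sample records are hypothetical, and the rule of calling the chasing team the winner when its probability is above 50% is our reading of the description above, not the actual CricketX code.

```python
from collections import defaultdict

def hit_rate_by_overs_left(records):
    """records: iterable of (overs_left, chase_win_probability, chasing_team_won).

    Returns {overs_left: percentage of matches where the winner was called right}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for overs_left, prob, chasing_team_won in records:
        # Assumed rule: pick the chasing team when its probability exceeds 50%.
        predicted_chase_win = prob > 0.5
        totals[overs_left] += 1
        hits[overs_left] += predicted_chase_win == chasing_team_won
    return {k: 100 * hits[k] / totals[k] for k in sorted(totals, reverse=True)}

# Made-up records for illustration only.
records = [
    (50, 0.60, True), (50, 0.40, True), (50, 0.30, False), (50, 0.70, True),
    (25, 0.80, True), (25, 0.55, True), (25, 0.20, False), (25, 0.35, False),
]
rates = hit_rate_by_overs_left(records)
print(rates)
```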
Besides match forecasting, we also make predictions for individual players.
Our predictions for the last Australia-South Africa series are still
featured on our site. This followed our hugely successful maiden venture
for the Asia Cup. You can read about it in "Lest we forget...". How did
these predictions stack up?
- The forecasts for the Asia Cup were spot on, with most leading batsmen
averaging within two runs of the forecasts.
- For the last Australia-South Africa series, Michael Bevan was predicted
to score 130 runs; he scored 142. For Mark Waugh, the forecasts said that
his form would not improve very significantly. After 25 runs in his last 5
innings before this series, he scored 66 in 3; CricketX had him at 90.
- Lance Klusener was correctly identified as the highest scorer from the
Proteas' ranks. He was predicted to score 120 runs in 3 innings at an
average of 40. His average for the series was 43, although he scored only
86 runs.
- Among the bowlers, McGrath was predicted to be the leading wicket-taker
with 8. He bagged 4. Other players, like Harvey, Gillespie and Shane Lee,
for whom the site had made no forecast, picked up the bulk of the wickets.
- Among South African bowlers, Nicky Boje took all 4 of the wickets
forecast for him, Klusener 3 of 5 and Pollock 3 of 6.
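The gaps between forecast and outcome for the Australia-South Africa series can be totted up with a few lines of Python. The figures are the ones quoted in the list above; the function names and the sign convention (forecast minus actual) are our own choices for this sketch.

```python
# (forecast, actual) pairs quoted in the list above.
batting = {"Bevan": (130, 142), "M. Waugh": (90, 66), "Klusener": (120, 86)}
bowling = {"McGrath": (8, 4), "Boje": (4, 4), "Klusener": (5, 3), "Pollock": (6, 3)}

def forecast_gaps(table):
    """Map each player to forecast minus actual (positive = over-forecast)."""
    return {name: forecast - actual for name, (forecast, actual) in table.items()}

def mean_absolute_gap(table):
    """Average size of the miss, ignoring its direction."""
    gaps = forecast_gaps(table)
    return sum(abs(g) for g in gaps.values()) / len(gaps)

run_gaps = forecast_gaps(batting)
wicket_gaps = forecast_gaps(bowling)
print(run_gaps, round(mean_absolute_gap(batting), 1))
print(wicket_gaps, round(mean_absolute_gap(bowling), 2))
```

On these figures the run forecasts missed by about 23 runs per batsman on average, and the wicket forecasts by a little over 2 wickets per bowler.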
Forecasting requires a sound crimetric (cri for cricket, metric for
econometric) model. It's also a risky business. An inventory of expertise
and risk appetite probably explains why CricketX are the only
ones sticking their necks out!
© CricInfo & CricketX