A Smyth, a Tigre and base runs

Base runs is a fresh approach to run estimation formulas developed by David Smyth. Tangotigre, a sabermetric denizen of Baseball Primer, has written extensively on base runs and other run estimation formulas. When I worked on a technical version of base runs to handle intentional walks and GIDPs, I discussed the results in an e-mail exchange with Tangotigre, for whose help I am grateful. My weights for the various elements in base runs match up very well with the empirical values of the various batting events. I tested my formula against game logs from 1994 to 2001, with very good results.

Before we get to the two big sluggers, this new approach to run estimation should be discussed.

*There are 4 elements to the base runs formula:*

**A factor** - this is the reach-base factor, similar to the A factor in Bill James' Runs Created, except that it removes homeruns

**B factor** - the advance-the-baserunner factor

**C factor** - the failure-to-advance-the-baserunner factor

**HR** - that's homeruns, sitting on their own. Homeruns do advance baserunners, so they are also included in the B factor, but the portion of the homerun's value that drives in the batter is always equal to 1 run.

*The factors are combined as follows:*

**Men on base** multiplied by

**Advancement Ratio** plus

**Home runs**

The idea behind this approach is dead simple: runs are scored when runners first reach base, then advance (either on their own or with another player's help), *with the exception of homeruns*. Using the factor letters, the formula is:

**A * (B/(B+C)) +HR**

Here we see that the Advancement Ratio is **Successes** divided by the sum of **Successes** and **Failures**.
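The combination above can be written as a very small function. This is only a sketch: the name `base_runs` and its argument names are my own labels for the four elements, and the inputs in the example are made up for illustration.

```python
def base_runs(a: float, b: float, c: float, hr: float) -> float:
    """Estimate runs as A * (B / (B + C)) + HR."""
    if b + c == 0:
        # No advancement successes or failures recorded: only homeruns score.
        return float(hr)
    advancement_ratio = b / (b + c)  # successes / (successes + failures)
    return a * advancement_ratio + hr


# Made-up inputs purely for illustration (not real team totals).
print(base_runs(a=100, b=50, c=150, hr=10))  # 100 * 0.25 + 10 = 35.0
```

Note that the estimate is not linear in its inputs: the same A is worth more runs when the advancement ratio around it is higher.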

*My technical version uses the following weights:*

A factor:

**Hits - Homeruns + .9*(HBP+W-IW) + .5*IW - CS - GIDP**

The .9 and .5 might look strange; they reflect the fact that HBPs and walks create many GIDP opportunities and force plays, and that intentional walks are often issued with weaker hitters coming up.

B factor:

**.7*singles + 2.4*doubles + 4.1*triples + 2.4*Homeruns + 1.2*sac flies + 0.8*sac hits + 1.1*steals + .2*caught stealing +.1*GIDP +.2*HBP + .1*(W-IW) + .04*(AB-H-K)**

Full discussion of this B factor is beyond the scope of this article, but I will deal with two peculiar aspects.

Triples carry a higher advancement value than homeruns because the triple also advances the batter two bases after he reaches base. The homerun's value here reflects only the effect it has on runners; its effect on the batter is dealt with separately.

Caught stealing and GIDP have some advancement value: some CS come on double steals (the other runner advances), and runners are sometimes called safe after an error (the middle infielder drops the ball during the tag). GIDPs can, of course, also advance runners.

C factor:

**AB - H + SH + SF**

This is nothing other than batting outs. Outs made on baserunners are dealt with in the A factor (they remove those runners).
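Putting the three factors together, the technical version can be sketched in code as follows. The `BattingLine` class, its field names, and the sample stat line are all invented for illustration, and I am assuming that W counts all walks including intentional ones (which is why IW is subtracted out); only the weights come from the formulas above.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for a team or player (field names are illustrative)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    w: int      # all walks, assumed to include intentional walks
    iw: int
    hbp: int
    k: int
    sb: int
    cs: int
    gidp: int
    sh: int
    sf: int


def factors(s: BattingLine) -> tuple:
    """Return (A, B, C) using the weights quoted in the article."""
    singles = s.h - s.doubles - s.triples - s.hr
    a = s.h - s.hr + 0.9 * (s.hbp + s.w - s.iw) + 0.5 * s.iw - s.cs - s.gidp
    b = (0.7 * singles + 2.4 * s.doubles + 4.1 * s.triples + 2.4 * s.hr
         + 1.2 * s.sf + 0.8 * s.sh + 1.1 * s.sb + 0.2 * s.cs + 0.1 * s.gidp
         + 0.2 * s.hbp + 0.1 * (s.w - s.iw) + 0.04 * (s.ab - s.h - s.k))
    c = s.ab - s.h + s.sh + s.sf  # batting outs
    return a, b, c


def base_runs(s: BattingLine) -> float:
    """A * (B / (B + C)) + HR for one stat line."""
    a, b, c = factors(s)
    return a * b / (b + c) + s.hr


# Invented stat line, for illustration only (not a real player).
line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20, w=60, iw=10,
                   hbp=5, k=80, sb=10, cs=5, gidp=10, sh=2, sf=5)
a, b, c = factors(line)
print(round(a, 1), round(b, 1), c)  # 169.5 244.4 357
print(round(base_runs(line), 2))    # 88.88
```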

Bonds and Pujols in the context of their teams

It is possible to apply the base runs formula directly to Bonds' and Pujols' stats, but then we are subject to the same difficulties the original Runs Created approach suffered from. In reality, Bonds' reach-base factor (A) is acted on by his teammates' advancement ratio, not his own. Similarly, his advancement potential acts on the batters hitting ahead of him.

To sidestep this issue, I first calculated team base runs, then recalculated them from the team's stats minus Bonds/Pujols. The difference is a rough approximation of the run value added by their performance.

The Giants have scored 601 runs (base runs predicted 607.09); the Cardinals have scored 718 (base runs predicted 710.31). Subtracting Bonds' stats from the Giants, the difference comes to 108.87 base runs; Pujols gets credit for 120.00 base runs after subtracting his stats from the Cardinals.
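The team-minus-player comparison can be sketched like this. Because A, B, C and HR are all linear in the counting stats, removing a player's stats from the team is the same as subtracting his factors from the team's. The `Factors` type and every number below are hypothetical, chosen only to show that the with/without difference is not the same as running the formula on the player alone.

```python
from typing import NamedTuple


class Factors(NamedTuple):
    a: float
    b: float
    c: float
    hr: float


def base_runs(f: Factors) -> float:
    """A * (B / (B + C)) + HR."""
    return f.a * f.b / (f.b + f.c) + f.hr


def without(team: Factors, player: Factors) -> Factors:
    """Team factors with one player's contribution removed."""
    return Factors(*(t - p for t, p in zip(team, player)))


def marginal_base_runs(team: Factors, player: Factors) -> float:
    """Team base runs minus base runs of the team without the player."""
    return base_runs(team) - base_runs(without(team, player))


# Hypothetical factors, not taken from the Giants or Cardinals.
team = Factors(a=1800, b=1000, c=4000, hr=150)
player = Factors(a=200, b=150, c=250, hr=40)

print(round(marginal_base_runs(team, player), 2))  # 104.35
print(base_runs(player))                           # 115.0 (formula applied to the player alone)
```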

But that's not the end of the story - we need to account for secondary effects. When a batter reaches base as frequently as Bonds, he's creating opportunities for his teammates by not making outs. The next question is: how many outs do these batters save for their teammates?

The simplest approach is to subtract OBP from 1. But when a batter reaches first base he's creating a GIDP opportunity for the next batter, and this ought to be accounted for. In the 2002 NL, a GIDP occurred approximately once for every 11 times on first base (with times on first estimated by the formula: singles + walks + HBP - steal attempts). The result is an estimated 16.2 GIDPs for his teammates created by Bonds and 14.4 GIDPs by Pujols. I added these to the other outs and the results were:

**Outs% (Outs/PA): Bonds 52.3%; Pujols 60.9%**
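As a rough sketch of the outs percentage calculation: estimate times on first, divide by 11 to get the GIDPs created for teammates, and add those to the batter's own outs. The function names and the sample inputs are mine and purely illustrative; they are not Bonds' or Pujols' actual lines.

```python
def gidp_created(singles: int, walks: int, hbp: int, steal_attempts: int,
                 times_on_first_per_gidp: float = 11.0) -> float:
    """Estimated GIDPs a batter sets up for the hitters behind him."""
    times_on_first = singles + walks + hbp - steal_attempts
    return times_on_first / times_on_first_per_gidp


def outs_pct(pa: int, own_outs: float, gidp_for_teammates: float) -> float:
    """(Own outs + GIDPs created for teammates) / PA."""
    return (own_outs + gidp_for_teammates) / pa


# Invented inputs for illustration only.
extra = gidp_created(singles=60, walks=50, hbp=5, steal_attempts=5)
print(extra)                                                         # 10.0
print(outs_pct(pa=500, own_outs=300, gidp_for_teammates=extra))      # 0.62
```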

The number of outs saved depends on what we assume a replacement level rate would be. I'm going to assume that an outfielder taking one of these two players' spots in the batting order would create fewer outs than average. I used an outs percentage of 67%.

**Outs saved: Bonds 66.67; Pujols 33.41**
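One way to read this step, and it is my reading rather than a stated formula, is that outs saved equals the gap between the replacement outs percentage and the player's, multiplied by his plate appearances (455 for Bonds, 546 for Pujols). Plugging in the rounded percentages lands close to the figures above, though not exactly on them, since the published percentages are rounded.

```python
REPLACEMENT_OUTS_PCT = 0.67  # the replacement rate assumed in the article


def outs_saved(pa: int, player_outs_pct: float,
               replacement_outs_pct: float = REPLACEMENT_OUTS_PCT) -> float:
    """Outs a replacement would have made in these PA, minus the player's."""
    return (replacement_outs_pct - player_outs_pct) * pa


# Rounded inputs from the article; results differ slightly from its figures.
print(round(outs_saved(455, 0.523), 1))  # Bonds, roughly 66.9
print(round(outs_saved(546, 0.609), 1))  # Pujols, roughly 33.3
```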

How many extra runs would have scored from these outs? The outs are added to the team pool, so Bonds and Pujols can later take advantage of these extra opportunities as well as their teammates. I used the team base runs per Out figures (Giants = .168 base runs per out; Cardinals = .198 base runs per out). Bonds' outs saved should result in 11.22 extra runs for the team, while Pujols adds 6.44 to his.

The new total is 120.09 base runs for Bonds and 126.44 for Pujols. But those are base runs, which is only an estimate of runs. Adjusting for actual number of runs scored by the team (compared to team base runs), Bonds moves down to 118.89 and Pujols moves up to 127.81.
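The last two steps are simple arithmetic and can be checked directly. The sketch below takes the article's figures as inputs and assumes the adjustment is a scaling by actual team runs over predicted team base runs; that interpretation is mine, though it does reproduce the adjusted totals quoted above.

```python
def adjusted_base_runs(direct: float, extra_from_outs: float,
                       team_runs: float, team_base_runs: float) -> float:
    """Add the outs-saved bonus, then scale by actual / predicted team runs."""
    total = direct + extra_from_outs
    return total * team_runs / team_base_runs


bonds = adjusted_base_runs(108.87, 11.22, team_runs=601, team_base_runs=607.09)
pujols = adjusted_base_runs(120.00, 6.44, team_runs=718, team_base_runs=710.31)

print(f"{bonds:.2f}")   # 118.89
print(f"{pujols:.2f}")  # 127.81
```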

Those are estimates of how many runs Bonds and Pujols were responsible for. However, the value of the contribution depends on replacement level and so we need to calculate marginal runs after setting an appropriate replacement level.

I don't have a good idea what the replacement baseline should be, although I think it's higher than 0.5 league average RPG (which is what Win Shares uses). I set three different replacement runs/PA as follows:

4.50 runs per game - approximately .116 runs/PA (RAR-1)

3.85 runs per game - approximately .101 runs/PA (RAR-2)

3.46 runs per game - approximately .093 runs/PA (RAR-3)

I derived these by looking at the history of the National League and at how various runs-per-game levels translate into runs/PA. The results for our two subjects are:

| Player | PA | Adj. base runs | RAR-1 | RAR-2 | RAR-3 |
|---|---|---|---|---|---|
| Bonds | 455 | 118.89 | 66.11 | 72.48 | 76.57 |
| Pujols | 546 | 127.81 | 64.47 | 72.12 | 77.03 |

RAR is Runs Above Replacement. Using any replacement level from 3.4 to 4.5 runs per game produces very close results, with 3.8 producing a virtual tie.
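For readers who want to try other baselines, RAR here can be sketched as adjusted base runs minus what a replacement would produce in the same number of plate appearances. The function name is mine. Note that the runs/PA rates listed above are rounded approximations, so results computed from them can differ from the table by a few tenths; the .116 and .093 rates shown below match it to within rounding.

```python
def runs_above_replacement(adj_base_runs: float, pa: int,
                           replacement_runs_per_pa: float) -> float:
    """Adjusted base runs minus replacement-level runs over the same PA."""
    return adj_base_runs - replacement_runs_per_pa * pa


players = {"Bonds": (118.89, 455), "Pujols": (127.81, 546)}

for name, (runs, pa) in players.items():
    rar_1 = runs_above_replacement(runs, pa, 0.116)
    rar_3 = runs_above_replacement(runs, pa, 0.093)
    print(name, round(rar_1, 1), round(rar_3, 1))
```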

I've made no attempt to account for the types of clutch performance that Win Shares includes, nor have I accounted for park differences. Pac Bell is a much better pitcher's park than Busch, so for that reason I think Barry Bonds' offensive production has been the most valuable in the NL despite the time he's missed.

But it isn't a cakewalk, and if Bonds misses another 5-10 games, it might be enough to tip the scales.
