Through the Looking Glass

Ed shows how to use HP Solve's SIGMA( ), L( ), and G( ) functions to perform Numeric Integration to analyze the normal probability distribution function.

By Ed Keefe

A problem that's out of this world

The star-date is 3481. The place is the star-ship Enterprise with Captain James Kirk in command.

The voice of the chief engineer comes over the comm system. "Cap'n, the warp drive is goin' down ag'in. We rrreally need ta 'lign the dilithium crystals." Kirk replies, "How long will it take this time, Mr. Scott?" There's a pause and Scotty finally replies, "It'll take at least four hours, Cap'n."

Kirk responds, "That's too long, Mr. Scott. We need to be at warp 7 within three hours if we're going to make it to Starbase 17 in time." Scotty moans, "Ah dinna think we kin do it, Cap'n." Kirk looks over his shoulder at his science officer, Mr. Spock, and, without having to ask, gets the answer, "If Mr. Scott would start work in the next 5 minutes, there's a 95 percent probability that he can make the repairs in 3 hours."

Captain Kirk then tells his chief engineer, "I know you can do it, Scotty. Make it three hours and not a minute more." The question is, how did Spock know that? What's the secret to his amazing ability to "predict" how long it will take Scotty to realign the dilithium crystals? Could you do something similar?

The answers are "Yes." All it takes is some knowledge of statistical inference and a calculator. (The calculator built into the HP Palmtop will do very nicely.)

Some background on statistics

A couple of hundred years ago, scientists gained an important insight.

Whenever they measured the same phenomenon a number of times, they got measurements that were slightly off. Whether they measured the height of a thousand ten-year-olds or the temperature of the same experiment repeated a thousand times, the same thing happened.

They could calculate an average (mean) height or temperature, but they were at a loss to explain the discrepancies between this mean and the other readings.

An example of the bell-shaped curve.

Then they noticed that the discrepancies formed a pattern. They drew a bar-graph in which the height of each bar represented the frequency with which they got the same measurement. They placed the bars along a horizontal axis that represented the different measurements. Then they drew a smooth line that connected the tops of the bars. They almost always found they had a bell-shaped curve.

Scientists jumped to the conclusion that this bell-shaped curve was a great truth that applied to all of nature. The bell-shaped curve made its way onto the pages of statistics books and, as they say, "the rest is history."

Later on, mathematicians found other more descriptive distribution curves, but the bell-shaped curve still endures as a pretty fair approximation of the way things are.

How the bell-shaped curve works

Suppose you've a fifth-grade teacher and over the years you've taken a series of over 100 measurements of the height of your students and found an average measurement. You've drawn a curve like the one above. Now you're about to take another measurement. You ask yourself, "What's the probability that the new measurement will fall somewhere between the highest and lowest measurement you've found so far. The answer is, "There's almost a 100% chance that this will happen."

Then you ask, "What's the probability that the new measurement will fall somewhere between the average measurement and a measurement that is two units of measurement greater?" The answer to that question is a little tougher to figure out, but basically the answer is, "It's the same number you'd get if you could find the area between the bell-shaped curve and the horizontal axis, and between the mid-line and a vertical line that's two units of measurement to the right of the mid-line." (See the diagram at the top of the next column.)

Probability is proportional to area under the bell-shaped curve between the mid-line and two units to the right.

So, how do you find the area under the bell-shaped curve?

To a mathematician, the answer is simple. You compute the definite integral of the bell-shaped function over the desired range.

Suppose you're not a mathematician. You and your HP Palmtop can still find the area in question. You just need to teach HP Solve how to compute the area in each of the bars in the above graph and add the areas together. That will yield a pretty good approximation of the answer. If you could use more bars that were narrower, the results would be even better. Let's see how to do this.

Drawing the bell-shaped curve on the Palmtop

To draw a bell-shaped curve, we can use the Plot function in HP Solve. All we need to know is the mathematical equation that describes the curve. Without going into all the details of how to find this equation, here it is in Solver format:

y = 1/sqrt(2*pi)*exp(-(z^2)/2)

where y is the height of the curve and z is a point on the horizontal axis. e is the mathematical constant 2.718282. The square root sign is replaced with the sqrt function. The constant e is replaced with the exponent function.

Step by step

Here's how to develop the HP Solve equation.

Open the HP Calc program on the Palmtop and press the (Menu) key and select Options Number Format... from the pull-down menu. Set the format to Fixed point with 6 as the number of digits. Press the (Enter) key.

Press and hold the (Ctrl) key while tapping the (S) key. This will start HP Solve. Press (Menu) and select File New from the pull-down menu. Type in the New Filename STAT.EQN and press the (Enter) key. Now press the (F6) function key to open the Solve Editor and key in the above equation. Check your typing and, when ready, press the (F10) (OK) function key.

Press the (LeftArrow) key and press (F6). Type in the Equation Name NormDistrib, then press the (F10) (OK) key.

Now press the (F10)(Plot)key and, in the Function Plotting screen, set X-minimum to -3.50000. Set X-maximum to 3.50000. Set Y-minimum to 0.000000 and Y-maximum to 0.50000. Set the Plot-variable to z and the plot resolution to 30.00000. Then press the (F4) (Draw) key. You should see the following graph:

HP Calc s plot of STAT.EQN.>

Press the (F6) key twice to get the TRACE word in the upper left corner of the screen. Now you can use the right and left arrow keys to slide the cross-hair along the curve. The top line of the screen will show the corresponding values for x and y. Hold the (Shift) key and press the (Arrow) keys to move the cross-hair more slowly.

Now that we have an idea of what the normal distribution curve looks like, we can tackle the task of finding an area under the curve. To do this we want to divide the area along the horizontal axis, between z = 0.000 and z = 2.000, into rectangles of equal width. We can find the length of each bar: it's the value of y in our NormDistrib equation. We can compute the area of each bar by multiplying the length by the width. Finally we need to accumulate the areas of all the rectangles. To expedite this task we'll use a special function in HP Solve called SIGMA( ).

On the Palmtop, return to the Solve Catalog screen and press the (DownArrow) key to move down to a blank line, then press the (F6) (Edit) key to open an empty Solve-Editor screen. Key in the equation found in Figure 1. You may omit the text enclosed in exclamation marks if you wish. (The text is commentary on what each line of code does.)

Figure 1:

CDF= 1/sqrt(2*pi)*

! This is a constant factor in the !

! equation. Isolated so it won't be !

! computed each time thru the loop !

SIGMA( z,

! Loop starts here w/ z holding the!

! results of the accumulation !

a+dz/2,

! Start at a+half the width of the !

! rectangle. !

! End at b !

dz,

! Step through the loop by dz, the !

! width of a rectangle. !

exp(-(z^2)/2)

! Use this as the function. !

)*dz

! Multiply the result by the width.!

When you're ready, press the (F10) (OK) key. Then press the (F9) (Calc) key. In the Solve Calc screen, set a = 0.000000, dz = 0.100000 and b = 2.000000. Solve for CDF by pressing the (F2) (CDF) function key. You should get the answer CDF = 0.477295.

If you happen to have a statistics book at hand, flip to the back of the book. That's where you're likely to find a Table called Normal Curve Areas (or something like that). You should find that the area given for 2.00 is very close to the CDF result above.

If you don't have a statistics book handy but you do have the HP Palmtop Paper ON DISK, there's a Lotus 1-2-3 spreadsheet on this issue s disk that has this table in it (NORMDIST.WKI <ON DISK ICON>).

If you want to check the accuracy of the above equation, try setting b = 0.100000. You should find that CDF = 0.03984, which is way off from the value given in any Normal Curve Area table. That's because the width of the rectangle we're using is way too large for this small a range. You can fix this by making the rectangles narrower. Just set dz = 0.010000 and solve for CDF = 0.039828. That's better, but there's a more elegant way to do this.

Press the (ESC) key and then the (F6) key to return to the Solve Editor and add some code to the equation so that it looks like the equation in Figure 2.

Figure 2

CDF= 0*(a+b+L(dz:abs(b-a)/20))+ 1/sqrt(2*pi)*

!This is a constant factor in the !

!equation. Isolated so it won't be !

!computed each time through the loop!

(The rest of the equation is the same as before.)

The new line at the top uses a "zero-multiplier" trick to rearrange the variables in the Solve Calc screen. The L(dz:abs(b-a)/20) command tells HP Solve to let the width of a rectangle be 1/20th the range (b-a). Since the whole expression is multiplied by zero, we can add it to the rest of the equation without effecting the outcome.

In the rest of the equation we now replace dz with g(dz) which will suppress the dz variable on the Solve Calc screen. With this polished equation you can set b = 0.010000 or b = 4.000000 (or anything in between) and get values for CDF that are very close to those in the statistics books.

A worldly example

To see how to use this Solve equation, consider the following problem.

Suppose you take your car in for a transmission repair. You want to know when you can expect the repair work to be finished. The service manager whips out his HP Palmtop and starts the HP Calc program. He presses (Ctrl)-(L) to get to the Stat List application and uses File Open to bring up a list of Transmission Repair times. There are at least thirty entries in the list. He presses the (F9) (Stats) key and sees that Mean Value = 46.13 (minutes), the Standard Deviation = 8.33 (minutes). The Standard Deviation (SD) is a measure of dispersion. Those two numbers are critical for the next part of the problem.

The mean and standard deviation may indeed be critical but they need to be "massaged" before they can be used with the equations developed so far. You see, the Normal Probability curve is always shown so that the mean is 0 (zero) and the units along the horizontal axis are shown in terms of normalized values, called z-values. (The area under the whole curve is 1 unit, to correspond with the fact that all the probabilities add up to 100%, which is the same as 1.)

To convert any other mean and standard deviation to a z-value, you use the formula:

z = (X - mean)/SD

where X is the number for which you want to find the corresponding z-value.

Suppose the service manager wants to give you a repair time, and a 90% probability for that time. He would start by assuming that since the midpoint of the graph is 50%, 90% is really 40% above the midpoint. So he'd put 0.40000 in the place of CDF and set a = 0.000000 and solve for b = 1.281332 Standard Deviations. b is the z-value he wants. Now he can compute the repair time. He'd use the formula X = z * SD + mean = 1.281332 * 8.13 + 46.33 = 56.7 minutes. The service manager can tell you that your car will be ready in 57 minutes with a probability of 90%. If he wanted to give you a better probability, say 95%, he'd put 0.4500000 in CDF and solve for b = 1.644390 with a corresponding time allotment of X = 1.644390 * 8.13 + 46.33 = 59.698893, or there's a 95% chance that the repairs will be done in an hour.

Spock's equation

On board the Enterprise, 200 years from now, Spok does all these computations in his head without the aid of a calculator or so we're asked to believe.

However, if he had an HP Palmtop, here's the equation he'd probably use. (See Figure 3.) It's an enhanced version of the equation above. I call it Spok's EQN. It combines the equation for computing the z-values along with the Cumulative Dispersion Function (CDF) used to compute probabilities. It also uses the IF(S( ) ... ) function to isolate one equation from another. It even includes a "separator" line between the variables on the Calc screen.

Figure 3

!This one does everything: see examples below !

0*(Z+X+mean+SD+

!Arrange variables in the display !

ooooooo+

!A separator between two equations!

a+b+CDF)+

IF(S(Z) OR S(X)

!IF you're Solving for any of !

OR S(mean)

!these 4 variables, !

OR S(SD)

!Use this equation that computes !

:Z-(X-mean)/SD+

!the Z-value. At the end put the !

0*L(b,Z)

!Z-value in b. !

:IF(S(ooooooo)

!ELSE IF you try to Solve for the !

:ooooooo

!separator...set it equal to 0 !

:0*L(dz:abs(b-a)/20))+

!Set the width to Range/20!

1/sqrt(2*PI)*

!and multiply by a constant. !

SIGMA(z,

!Start accumulating the lengths of!

a+g(dz)/2,

!the rectangles under the curve. !

!from mid-point of the central one!

g(dz),

!out to b in steps of dz !

exp(-(z^2)/2)

! This is the curve's equation. !

)*g(dz)-CDF+

!Multiply by width to get area !

0*L(Z,b)

!Finally, set Z to b. ! )=0 !

Example: Given that the mean time to repair a car is 45 minutes with a std dev of 8 min. The service manager says that your repair will be done in 50 minutes. What's the probability that he's WRONG?

ANALYSIS: This is really asking for the area under a bell curve whose midpoint is at 45. We want the area from X=50 out to X=a large number. We have to convert to the Normalized curve. So put 50 in X; put 45 in mean; put 8 in SD and solve for Z=0.6250. Then move the light bar to a and press Enter to make a = 0.6250 and put 5 in b. Then solve for CDF = 0.265365 . That is there's 26% chance that the service manager is wrong.

Here's how to use Spok's equation. Assume that we have built up a Stat List of the times of 30 or more repair incidents in which the dilithium crystals have been realigned. The mean is 149.582233 and the standard deviation is 15.315359. Enter those two values in the Solve Calc screen.

Since we want to find the z-value for 95%, put 0.450000 in CDF (i.e. 0.95 - .050) and solve for b = 1.644390. This value will also appear as the z-value in the first variable on the screen. l've solved for X= 174.766660 which is close to 175 minutes or 3 hours--less the 5 minutes it will take for Scotty to get organized.

Wrapping Up

Trying to figure out how a famous Vulcan might do statistical inference has let us push the HP Solve application to the limit.

We've developed a sophisticated equation that does numeric integration using the SIGMA( ) function. We've also seen examples of how to use L( ) and G( ) and even used an example of a multiple IF(S(var)...) function.

In the process of developing these equations, I needed a way to check my results. The only tool that was handy was the Lotus 1-2-3 spreadsheet program in the HP Palmtop.

On The HP Palmtop Paper ON DISK, I've included a couple of spreadsheets, one of which (NORMDIST.WKI <ON DISK ICON>) was mentioned above . This spreadsheet contains a table much like the ones you'd find in a statistics book. The other WK1 file (ND-INTEG.WKI <ON DISK ICON>) contains a numeric integration macro modeled on HP Solve's SIGMA( ) function. It is unique in that it not only performs as well as the one in HP Solve, but it will also let you single-step through the solution of the problem something you can't do in HP Solve.

It took me about a month to research and write this article. I hope there is at least a 90% probability that it will provide you with a couple of days' enjoyment.

Through the Looking Glass

Ed shows how to use HP Solve's SIGMA( ), L( ), and G( ) functions to perform Numeric Integration to analyze the normal probability distribution function.

By Ed Keefe

Shareware/freeware mentioned in this article