
Thursday, 29 October 2015

Which formula for the confidence interval for a proportion?

Stata presents a bewildering array of options for the confidence interval for a proportion. Which one should you use? 

By default, Stata uses the "exact" confidence interval. This name is a bit misleading (the interval is also called the Clopper-Pearson confidence interval, a name that makes fewer implied claims!). The exact confidence interval is exact only in the sense that it is never too narrow. In other words, the probability of the true proportion lying within the "exact" confidence interval is at least 95%. However, this also means that in most cases the interval is wider than it needs to be. 

For an apparently simple problem, finding a formula that gives reliable 95% confidence intervals for a proportion has turned out to be surprisingly hard to crack. The problem is that events are whole numbers, while proportions are continuous. Imagine you have a 25% real prevalence of smoking in your population, and a sample size of 107. Your sample cannot show a 25% prevalence of smoking, because, well, that would be 26·75 people. So some sample sizes are "lucky" because they can represent lots of proportions exactly, and some proportions are "lucky" because they can turn up exactly in lots of sample sizes. You begin to see the problem?

Solutions from research

There have been quite a few studies that have used computer simulation to examine the performance of different confidence interval formulas. The recommended alternatives are the Wilson or Jeffreys confidence intervals for samples of less than 100, and the Agresti-Coull interval for samples of 100 or more. These give the best trade-off between intervals whose coverage falls below 95% and intervals that are wider than they need to be. 
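For reference, here is how you ask Stata for each of these intervals, using a made-up example of 20 events in a sample of 100 (this is the older cii syntax used throughout this post; in more recent versions of Stata the same thing is requested with cii proportions):

. cii 100 20, wilson
. cii 100 20, jeffreys
. cii 100 20, agresti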

What about the textbook formula that SPSS uses?

One option that Stata does not offer you is the formula you find in textbooks, which simply uses the standard error of the proportion to create a confidence interval. This is known as the normal approximation interval, and it is what SPSS uses. If you calculate the confidence interval for 2 events out of a sample of 23 using the normal approximation, the confidence interval is -4% to 21%. That's right: SPSS is suggesting that the true event rate could be minus four percent. Quite clearly this is wrong, as there is no such thing as minus four percent. Moreover, the interval includes another figure that is obviously wrong: if we have observed two cases, the true value cannot be zero percent either. Less obviously, the upper end of the confidence interval is also well off the mark. Using Wilson's formula gives a confidence interval of 

. cii 23 2, wil

                                                         ------ Wilson ------
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |         23    .0869565    .0587534          .02418    .2679598

2.4% to 26.8%. The "exact" method gives an interval that is slightly wider:

. cii 23 2, exact

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |         23    .0869565    .0587534          .01071    .2803793

at 1.1% to 28.0%. 
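For the record, here is what the textbook calculation looks like if you do it by hand with Stata's display command (just a sketch: the limits come out at about -3% and 20%, slightly different from the SPSS figures quoted above, which depend on the exact multiplier and variance formula SPSS uses, but the absurd negative lower limit is the same):

. * textbook (Wald) interval: p +/- 1.96*sqrt(p*(1-p)/n), with p = 2/23
. di 2/23 - 1.96*sqrt((2/23)*(1 - 2/23)/23)
. di 2/23 + 1.96*sqrt((2/23)*(1 - 2/23)/23)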

So never calculate a binomial confidence interval by hand or using SPSS!

Skip to this bit for the answer


For such an apparently simple problem, the issue of the confidence interval for a proportion is mathematically pretty complex. Mercifully, a Stata user just has to remember three things: 
  1. the "exact" interval is conservative, but has at least a 95% chance of including the true value; 
  2. for N < 100, Wilson or Jeffreys is less conservative and closer to the nominal 95% coverage on average; 
  3. and for N > 100, Agresti-Coull is the best bet. 

Friday, 16 October 2015

Stata tips: get a display of Stata's graph colours

Stata graphs allow you to specify the colo(u)rs used for various graph items. Stata has about 50 named colours, so it's hard to remember exactly what each colour looks like. In addition, Stata uses the names gs0 to gs16 for a scale of greys, starting with gs0 (black) and ending with gs16 (white). These are useful for producing more-or-less evenly-spaced gradations of grey. 
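These names go into graph options such as mcolor() for markers and lcolor() for lines. A minimal sketch using the auto dataset that ships with Stata (the variables are just for illustration):

. sysuse auto, clear
. * markers in a mid grey, fitted line in cranberry
. twoway (scatter mpg weight, mcolor(gs8)) (lfit mpg weight, lcolor(cranberry))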
To see a chart of all the available Stata colours, you need to install the vgsg package. This package contains useful resources that accompany Mitchell's excellent book A visual guide to Stata graphics. Amongst these resources there is a simple command that makes a colour chart. 

You install the package like this:

. net from http://stata-press.com/data/vgsg

Click the link to install the package. Once you have installed it, you can issue the command 

. vgcolormap


to print a palette of the available colours. I printed one in colour and pasted it inside my copy of Mitchell's book. Here is the graph:


Notice that Stata gets by with a grey scale running from gs0 to gs16 (biostats people can't cope with fifty shades). Of course, gs0 looks black, but if you look carefully, Dougal, you'll see it's actually a very, very, very, very dark grey*. And gs16 is simply white. 

And there is a second user-written command, by Seth T. Lirette, that I rather like for its elegant output:

. ssc install hue
. hue



*For non-Irish people, this is a Father Ted joke. Don't worry about it.

Thursday, 22 January 2015

Making graphs of tables

Seeing tables as graphs


We often put tables into papers by reflex. Making them is a dull activity because, I suspect, we sense that no-one reads them. And there’s a very good reason why no-one does: while tables are a very good resource, they are lousy communicators. 

Tables: lousy communicators


Here is a table of hair and eye colour:

. use "Hair and eye colour.dta"
(Hair and Eye Colour, Caithness, from Tocher (1908))

. tabulate eye_colour hair_colour [fweight = freq]

           |                      Hair colour
Eye colour |      Fair        Red     Medium       Dark      Black |     Total
-----------+-------------------------------------------------------+----------
      Blue |       326         38        241        110          3 |       718 
     Light |       688        116        584        188          4 |     1,580 
    Medium |       343         84        909        412         26 |     1,774 
      Dark |        98         48        403        681         85 |     1,315 
-----------+-------------------------------------------------------+----------
     Total |     1,455        286      2,137      1,391        118 |     5,387 

You have to be pretty determined to make any sense of the table. Indeed, to do so requires somehow digesting the information from 20 numbers, most of which are three-digit numbers. This is pretty much guaranteed to be beyond the working memory capacity of the average human.

And no, percentages don’t help much:


. tabulate eye_colour hair_colour [fweight = freq], column nofreq 

           |                      Hair colour
Eye colour |      Fair        Red     Medium       Dark      Black |     Total
-----------+-------------------------------------------------------+----------
      Blue |     22.41      13.29      11.28       7.91       2.54 |     13.33 
     Light |     47.29      40.56      27.33      13.52       3.39 |     29.33 
    Medium |     23.57      29.37      42.54      29.62      22.03 |     32.93 
      Dark |      6.74      16.78      18.86      48.96      72.03 |     24.41 
-----------+-------------------------------------------------------+----------
     Total |    100.00     100.00     100.00     100.00     100.00 |    100.00 


Stacked bar charts

Here, instead, is what happens when we graph the data:

catplot  eye_colour hair_colour [fw=freq], name(catplot,replace) ///
asyvars stack percent(hair) legend(rows(1) stack)



The stacked bar chart shows the trend of dark-to-light running from top left to bottom right. This shows the breakdown of eye colour within each hair colour, but tells us nothing about the distribution of hair colour. 

This is done with Nick Cox’s command catplot. Download it from the SSC archive:

. ssc install catplot


Spineplots (mosaic plots)

Spine plots (also called mosaic plots) are a very effective way of visualising tables, although, unlike stacked bar charts, you may well not have come across them before. 
A spine plot shows both the distribution of hair colour and the distribution of eye colour within hair colour:

spineplot  eye_colour hair_colour [fw=freq], percent





The hair colours are shown as columns, and we can see that red hair and black hair are much rarer in this population (Scotland, early 20th century) than fair, medium and dark. And the relationship with eye colour is now very evident – the colour changes from bottom left (fair hair, light or blue eyes) to the top right (dark or black hair, dark eyes). 

Do you need a graph rather than a table?

The tables above contain the relationship but they don’t show it. And even if you are determined to find it, there are simply too many numbers in the table for any normal person to hold them all in working memory and make sense of the pattern. 

The spine plot, on the other hand, shows the relationship with little work needed on the part of the reader. It doesn’t record the exact percentages. If you needed simply to record the exact percentages for reference, then a table is better, but if you wanted to communicate a pattern, then there’s no question: the graph wins hands-down.

This is done with Nick Cox’s command spineplot. Download it from the SSC archive:

. ssc install spineplot

Tuesday, 17 June 2014

Chi-Squared : A measure of surprise

Introduction

Rashid, my opposite number in Penang Medical College, has the bad habit of asking statistical questions when I am trying to concentrate on pulling and crawling my way up steeply-sloping rainforest. And it was in such a situation that he came up with a rather good question:

Just what is Chi-squared anyway?

And I said

Chi-squared is a measure of surprise.

Think of it this way. Here we have a table of data, and we are wondering if there is a pattern in there. The data show the relationship between loneliness and the prevalence of depressed mood in people aged 65 and over in a community survey.


. tab  depressed loneliness, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

                      |     Feels lonely
 Depressed mood (Q24) | Not lonely    Lonely |     Total
----------------------+----------------------+----------
No depression last mo |     1,310        621 |     1,931 
                      |     95.48      81.28 |     90.40 
----------------------+----------------------+----------
Depressed last month  |        62        143 |       205 
                      |      4.52      18.72 |      9.60 
----------------------+----------------------+----------
                Total |     1,372        764 |     2,136 
                      |    100.00     100.00 |    100.00 


Is there anything surprising here? Well, we seem to have a much higher prevalence of depression among the lonely (19%) than among the non-lonely (5%). But how surprising is that difference? Well, that depends on what we were expecting. 

If loneliness had no impact on depression, then we would expect to see more or less the same prevalence of depression in the lonely and the non-lonely. Of course, the prevalence probably wouldn’t be exactly the same, but a few people one way or another wouldn’t cause much surprise. What we are looking for is a surprising number of people too many or too few.

So how do we judge when the number of people too many or too few is surprising? Well, that depends on how many people we were expecting, doesn’t it? If ten people more than I expected turn up, I am not surprised if I am running a conference, but I am pretty taken aback if I had thought I was going out on an intimate dinner for two. 

So the amount of surprise depends on the number of people I saw, but also on the number I was expecting. 

What are we expecting?

Let’s have a look at that table again. Notice that 9·6% of people were depressed. So if loneliness has nothing to do with depression, we expect 9·6% of the lonely and 9·6% of the non-lonely to be depressed. 

We have 1,372 non-lonely people; 9·6% of that is 

. di 1372*0.096
131.712

I let Stata’s display command do the work for me. Essentially, we are expecting 132 of the non-lonely people to be depressed, and so we are expecting 

. di 764*0.096
73.344

73 of the lonely people to be depressed. 

We could work out the remaining numbers ourselves, but Stata will display them for us. Now we know how Stata arrives at those expected frequencies. 

. tab  depressed loneliness, exp

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

                      |     Feels lonely
 Depressed mood (Q24) | Not lonely    Lonely |     Total
----------------------+----------------------+----------
No depression last mo |     1,310        621 |     1,931 
                      |   1,240.3      690.7 |   1,931.0 
----------------------+----------------------+----------
Depressed last month  |        62        143 |       205 
                      |     131.7       73.3 |     205.0 
----------------------+----------------------+----------
                Total |     1,372        764 |     2,136 
                      |   1,372.0      764.0 |   2,136.0 

We have four cells in the table, and in each cell we can compare the number we saw with the number we expected. And remember, the expectation is based on the idea that loneliness has nothing to do with depression. From that idea follows the reasoning that if 9·6% of the whole sample is depressed, then we are expecting 9·6% of the lonely, and 9·6% of the non-lonely, to be depressed. And, of course, 90·4% of the whole sample is not depressed, so we are expecting 90·4% of the lonely, and 90·4% of the non-lonely, not to be depressed.
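There is a shortcut for those expected counts, by the way: each one is just the row total multiplied by the column total, divided by the grand total. You can check this with the display command, using the totals from the table above:

. * expected count = (row total * column total) / grand total
. di 1931*1372/2136
. di 1931*764/2136
. di 205*1372/2136
. di 205*764/2136

These reproduce the 1,240.3, 690.7, 131.7 and 73.3 in Stata’s output.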

The Chi-squared test visits each cell of the table and calculates how far apart the expected and observed frequencies are: the surprise for each cell is (observed − expected)² divided by the expected frequency. It then totals up all those cell-by-cell surprises into a total surprise score.

Of course, we need to relate that total surprise score to the number of cells in the table. A big surprise score could be the result of adding together a lot of small surprises coming from the cells of a large table – in which case we aren’t really surprised – or it could come from a very small number of cells in a small table, in which case we are surprised. 

Degrees of freedom: the capacity for surprise

So a Chi-squared value has to be interpreted in the light of the number of cells in the table. Which is where the degrees of freedom come in. Degrees of freedom measures the capacity of a table to generate surprising results. Looking at our table, once we had worked out one expected frequency, we could have filled in the other three by simple subtraction. If we expect 131·7 people to be depressed but not lonely, then the rest of the depressed people must be depressed and lonely. And the rest of the non-lonely people must be non-depressed. See – once I know one number in that table, I can work out the rest by subtraction.

So that 2 × 2 table has only one independent opportunity for surprise: one degree of freedom. 

And by extension, once I know all but one of the numbers in a row of a table, I can fill in the last one by subtraction. The same goes for the columns. So the capacity of a table for surprises, its degrees of freedom, is the number of rows minus one, multiplied by the number of columns minus one.

And what is the Chi-squared value for the table anyway?

. tab  depressed loneliness, col chi

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

                      |     Feels lonely
 Depressed mood (Q24) | Not lonel     Lonely |     Total
----------------------+----------------------+----------
No depression last mo |     1,310        621 |     1,931 
                      |     95.48      81.28 |     90.40 
----------------------+----------------------+----------
Depressed last month  |        62        143 |       205 
                      |      4.52      18.72 |      9.60 
----------------------+----------------------+----------
                Total |     1,372        764 |     2,136 
                      |    100.00     100.00 |    100.00 

          Pearson chi2(1) = 114.0215   Pr = 0.000

A very large index of surprise: a Chi-squared of 114, with one degree of freedom. And a P-value that is so small that the first three digits are all zero. We would write P<0·001.
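If you want to see where that 114 comes from, you can total up the four cell-by-cell surprises by hand, using the observed counts and the (rounded) expected counts from the tables above:

. di (1310-1240.3)^2/1240.3 + (621-690.7)^2/690.7 + (62-131.7)^2/131.7 + (143-73.3)^2/73.3

This comes to about 114·1, agreeing with Stata’s 114·02 apart from the rounding of the expected counts.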

So that’s it. Chi-squared is a measure of surprise, and degrees of freedom measure the capacity of a table to come up with surprises. 

Those aren’t mathematical definitions, but they are useful ways of thinking about statistical testing, and the difficult idea of degrees of freedom. 


Monday, 9 June 2014

Analysis of Variance: Almost always a bad idea


Ask a vague question, get a vague answer

Analysis of variance is a dangerous tool. It allows researchers to avoid asking precise questions. And if you don’t formulate your question precisely, then the answer won’t tell you anything. It’s like the gag in the Hitch Hiker’s Guide where they build a supercomputer to figure out “the answer to life, the universe and everything”. When the computer finally figures it out, the answer is 42. The problem, you see, was that they hadn’t actually figured out what the question was. 

Let’s have a look at a simple case. Here is data on cardiac output late in pregnancy in three groups of mothers: those with normal blood pressure, pre-eclampsia (PET) and gestational hypertension (GH).

. table outcome, c(mean  co5) format(%2.1f)

----------------------
Hypertens |
ion       |
outcome   |  mean(co5)
----------+-----------
   Normal |        6.4
    P E T |        5.6
  Gest HT |        9.0
----------------------


I used the format option to limit us to one place of decimals. 

What does an anova tell us?

. anova  co5 outcome

                           Number of obs =     256     R-squared     =  0.4984
                           Root MSE      = .790256     Adj R-squared =  0.4944

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |   156.97329     2  78.4866448     125.68     0.0000
                         |
                 outcome |   156.97329     2  78.4866448     125.68     0.0000
                         |
                Residual |  157.999502   253  .624503962   
              -----------+----------------------------------------------------
                   Total |  314.972792   255  1.23518742  

Now, if you find that edifying you’re a better person than I. What the anova tells us is that there is a difference in cardiac output between the three groups. But that’s not the answer to any particular question. It certainly doesn’t answer any useful clinical question.

“Oh, but now we can do post-hoc comparisons”, you say. To which I might reply “But don’t you have any ideas you want to test?” The trouble with post-hoc comparisons is that you can rapidly end up with a bunch of comparisons, some of which are of interest and some of which are meaningless. 

First ask a question

To analyse data, you need to articulate the underlying hypothesis. And here, the hypothesis wasn’t “there’s some kinda difference in cardiac output between the three groups”. This is what I call the Empty Brain hypothesis: it assumes that no previous research has been done, that we know nothing about physiology, about hæmodynamics, about anything. And it’s not good enough. Science progresses by building on our understanding, not by wandering around hypothesising “some kinda difference”.

In fact, we expect that late in pregnancy, cardiac output in gestational hypertension will be higher than normal, causing high blood pressure because it exceeds the normal carrying capacity of the mother’s circulatory system. On the other hand, we expect that it will be below normal in pre-eclampsia because pre-eclampsia is characterised by a lot of clinical indicators of inadequate blood supply. The high blood pressure in pre-eclampsia is the result of very high peripheral resistance (the circulatory system basically closed tight) and the heart desperately trying to get the blood supply through.

Then use regression to answer it

We can use regression to ask this question:


. regress  co5 i.outcome

      Source |       SS       df       MS              Number of obs =     256
-------------+------------------------------           F(  2,   253) =  125.68
       Model |   156.97329     2  78.4866448           Prob > F      =  0.0000
    Residual |  157.999502   253  .624503962           R-squared     =  0.4984
-------------+------------------------------           Adj R-squared =  0.4944
       Total |  314.972792   255  1.23518742           Root MSE      =  .79026

------------------------------------------------------------------------------
         co5 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     outcome |
      P E T  |  -.8146901    .217851    -3.74   0.000    -1.243722   -.3856577
    Gest HT  |   2.612142   .1732165    15.08   0.000     2.271012    2.953272
             |
       _cons |   6.376119   .0534005   119.40   0.000     6.270953    6.481285
------------------------------------------------------------------------------

I have used Stata’s ‘i’ notation to tell Stata that outcome is to be treated as separate categories. In this case, Stata will use the normotensive group (coded as zero) as the reference group. 
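Incidentally, if you wanted a different reference group, say the PET group, Stata’s ib. prefix lets you pick the base level. A sketch only, assuming PET happens to be coded 1 in the outcome variable:

. regress co5 ib1.outcome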

Not just hypothesis tests – effect sizes too

The regression has tested our two hypotheses:
1. Compared with women with normal BP, those with PET will have lower cardiac outputs
2. Compared with women with normal BP, those with GH will have higher cardiac outputs
Furthermore, it has quantified the effects. Cardiac output is 0·8 litres a minute lower in pre-eclampsia, and 2·6 litres a minute higher in gestational hypertension. 
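And if you also wanted the direct comparison of PET with gestational hypertension, you can get it from the same model with lincom rather than reaching for a post-hoc procedure. Again a sketch, assuming PET and gestational hypertension are coded 1 and 2:

. lincom 2.outcome - 1.outcome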

Robust variance estimation as well!

Another advantage of regression is that we can use Stata’s robust variance estimates. This topic is so important that it will be the subject of its own blog post. But note what happens when I invoke robust variance estimation:

. regress  co5 i.outcome, robust

Linear regression                                      Number of obs =     256
                                                       F(  2,   253) =  207.37
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4984
                                                       Root MSE      =  .79026

------------------------------------------------------------------------------
             |               Robust
         co5 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     outcome |
      P E T  |  -.8146901   .1947096    -4.18   0.000    -1.198148   -.4312319
    Gest HT  |   2.612142   .1352127    19.32   0.000     2.345856    2.878428
             |
       _cons |   6.376119   .0549773   115.98   0.000     6.267847     6.48439
------------------------------------------------------------------------------


Our estimates for the effect sizes haven’t changed, but the standard errors and confidence intervals have. Robust variance estimation drops the assumption that the residual variance is the same for everyone, and, with the vce(cluster) option, it can also allow for clustering within the data due to factors beyond our control. Always a good idea! 
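For clustering specifically, the option to reach for is vce(cluster clustvar). A sketch only, using a hypothetical clustering variable called clinic that is not in this dataset:

. * cluster-robust standard errors, allowing for correlation within clinics
. regress co5 i.outcome, vce(cluster clinic)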

Tuesday, 20 May 2014

Don’t use a t-test, use a regression

What?

The t-test is such a mainstay of basic statistics that you might think I can’t be serious when I say that I have never used Stata’s ttest command. But it’s true. I’ve used Stata since version 3 (we’re on 13 now – I’m older too) and I have never once used the t-test command built into Stata. 

And the reason is that a t-test isn’t as versatile as a regression. But let me reassure you: a t-test is a regression.

Promise.

Watch

Doing a t-test

Using the auto dataset that is built into Stata, I’ll run a t-test to see if foreign cars have better fuel consumption than US cars:

. sysuse auto
(1978 Automobile Data)

. ttest mpg, by(foreign)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
 Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
---------+--------------------------------------------------------------------
combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -4.945804    1.362162               -7.661225   -2.230384
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
Ho: diff = 0                                     degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997

The difference in fuel consumption is 4.9 miles per gallon, in favour of the foreign cars, and it’s statistically significant. Stata writes, very correctly, that the probability of a t statistic larger in absolute value than the one we observed is 0·0005. And note the value of t: –3·6308.

A t-test is the same thing as a regression

What happens when I use a regression?

. regress mpg foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   13.18
       Model |  378.153515     1  378.153515           Prob > F      =  0.0005
    Residual |  2065.30594    72  28.6848048           R-squared     =  0.1548
-------------+------------------------------           Adj R-squared =  0.1430
       Total |  2443.45946    73  33.4720474           Root MSE      =  5.3558

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
       _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
------------------------------------------------------------------------------

The coefficient for the foreign variable is 4·945. That’s exactly the same as the difference between the means. Why?

The coefficient for a regression variable is the effect of a one-unit increase in that variable. For every one-unit increase in the foreign variable, expected mpg rises by 4·945 – in other words, fuel economy improves. (The sign is reversed compared with the t-test output simply because ttest reported the difference as Domestic minus Foreign.) But the foreign variable is a binary variable: it’s coded zero (domestic) and one (foreign). So the effect of a one-unit increase is the effect of changing from a domestic car to a foreign one. 

When a variable is coded 0/1, the regression coefficient is the difference between the means of the two categories, and the t-test on that coefficient is a test of the equality of those means. 

And, indeed, the t-value you see in the regression is the t-value of the t-test. The t-test isn’t a separate test – it’s the simplest kind of regression: a regression with just one predictor that’s coded 0/1.
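You can check the coefficient against the group means in the ttest output above:

. di 24.77273 - 19.82692

which gives 4.94581 – the regression coefficient, apart from the rounding of the displayed means.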

Taking the analysis further

But the real reason I use regress is the power I can bring to bear on the analysis. So far we know that foreign cars consume less fuel. But is that because they have smaller engines? If I do a t-test, I can’t answer the question, because a t-test only compares two means. But a regression can:

. regress mpg foreign displacement

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   35.57
       Model |  1222.85283     2  611.426414           Prob > F      =  0.0000
    Residual |  1220.60663    71  17.1916427           R-squared     =  0.5005
-------------+------------------------------           Adj R-squared =  0.4864
       Total |  2443.45946    73  33.4720474           Root MSE      =  4.1463

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |  -.8006817   1.335711    -0.60   0.551    -3.464015    1.862651
displacement |  -.0469161   .0066931    -7.01   0.000    -.0602618   -.0335704
       _cons |   30.79176   1.666592    18.48   0.000     27.46867    34.11485
------------------------------------------------------------------------------

When we take into account the effect of engine size on fuel consumption, there is no longer a significant difference between the foreign and domestic cars. The coefficient for displacement is rather silly, because it’s the effect of a one cubic inch difference in engine size. I don’t know anyone who talks about engines in cubic inches, so please excuse me while I do some work:

Making a sensible predictor variable

. gen cc= displacement*16.387064

. lab var cc "Engine size in cc"


First off, I looked up the conversion factor. Now I have a variable called cc that records engine size in cubic centimetres. 

But that’s still going to give us a silly regression coefficient, because the effect of a one-cc change in engine size will be trivial. So I’m generating a variable that represents engine size in multiples of 100 cc.


. gen cc100=cc/100

. lab var cc100 "Engine size in multiples of 100 cc"


Now let’s see the regression:


. regress mpg foreign cc100

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   35.57
       Model |  1222.85287     2  611.426434           Prob > F      =  0.0000
    Residual |  1220.60659    71  17.1916421           R-squared     =  0.5005
-------------+------------------------------           Adj R-squared =  0.4864
       Total |  2443.45946    73  33.4720474           Root MSE      =  4.1463

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |  -.8006821   1.335711    -0.60   0.551    -3.464015    1.862651
       cc100 |  -.2862997    .040844    -7.01   0.000    -.3677404    -.204859
       _cons |   30.79176   1.666592    18.48   0.000     27.46867    34.11485
------------------------------------------------------------------------------

Aha. A 100-cc increase in engine size is associated with a reduction of 0·286 mpg. Adjusted for this, the difference between foreign and domestic cars is no longer significant (P=0·551), nor is it important (a difference of 0·8 mpg).

But does that mean that smaller engines are more fuel efficient? Or is it that smaller engines are found in, well, smaller cars?

. regress mpg foreign cc100  weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   45.88
       Model |  1619.71934     3  539.906448           Prob > F      =  0.0000
    Residual |  823.740115    70  11.7677159           R-squared     =  0.6629
-------------+------------------------------           Adj R-squared =  0.6484
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4304

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |  -1.600631   1.113648    -1.44   0.155    -3.821732    .6204699
       cc100 |   .0117693   .0614517     0.19   0.849    -.1107922    .1343308
      weight |  -.0067745   .0011665    -5.81   0.000    -.0091011   -.0044479
       _cons |   41.84795   2.350704    17.80   0.000     37.15962    46.53628
------------------------------------------------------------------------------

Now we can see that the weight of the car, not the engine size, drives the fuel consumption. Adjusted for weight, engine size no longer predicts fuel consumption, and there is no significant difference between the domestic and foreign cars.

Advantages of regression

Using a regression model allowed us to examine factors that might have caused the apparent difference in fuel consumption between foreign and domestic cars. At each step, we came up with an explanation that we could then test by adding a new variable to the model. 

And don’t forget robust standard errors

Actually, in real life, I would have used Stata’s robust standard errors in all of these models. They are the rough equivalent of the unequal-variances option in the t-test, but far more flexible:

. regress mpg foreign cc100  weight, robust

Linear regression                                      Number of obs =      74
                                                       F(  3,    70) =   50.13
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6629
                                                       Root MSE      =  3.4304

------------------------------------------------------------------------------
             |               Robust
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |  -1.600631   1.185019    -1.35   0.181    -3.964077    .7628141
       cc100 |   .0117693   .0449128     0.26   0.794    -.0778065    .1013451
      weight |  -.0067745   .0008357    -8.11   0.000    -.0084413   -.0051077
       _cons |   41.84795   1.811437    23.10   0.000     38.23515    45.46075
------------------------------------------------------------------------------

That’s what it looks like. 
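If you want to see that correspondence for yourself in the simple two-group case, compare the unequal-variances t-test with the robust regression. A sketch only: the Welch correction and the robust sandwich estimator are calculated slightly differently, so expect the standard errors to agree closely rather than exactly.

. ttest mpg, by(foreign) unequal
. regress mpg foreign, vce(robust)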


So make a new year’s resolution (never too late!) to get beyond the t-test and use the power of regression to construct and test more thoughtful models.