Using the World Wide Web for Teaching Statistics

 

This document contains problems that can be used in an introductory statistics course.  The problems are grouped into the following categories.

Organizing Data

Numerical Description of Data

Binomial Distribution

Normal Distribution

Sampling Distribution

Confidence Intervals for Proportions

Tests of Hypothesis about Proportion

Comparison of Two Proportions

Confidence Intervals for Mean

Tests about Mean

Comparison of Means

Chi Square Test

Linear Regression and Correlation

 

 

The following sites have useful resources for teaching statistics.

 

The Dataset and Story Library:                   http://lib.stat.cmu.edu/DASL/
Data Surfing:                                    http://it.stlawu.edu/~rlock/datasurf.html
StatLib Dataset Archive:                         http://lib.stat.cmu.edu/datasets/
Java Applets for Statistics:                     http://www.stat.duke.edu/sites/java.html
Multimedia Statistics Page:                      http://www.berrie.dds.nl/index.html
Rice Virtual Lab in Statistics:                  http://www.ruf.rice.edu/~lane/rvls.html
Journal of Statistics Education Data Archive:    http://www.amstat.org/publications/jse/jse_data_archive.html

 

 

 

 

Organizing Data

 

1.         In a survey of 1000 people conducted in June 2002 by Strategy One/Colonial Williamsburg, respondents were asked which issue was most important to them out of the choices given in the table below.

           

Issue                                   Percentage of Responses
Freedom of speech                       26
Access to affordable health care        20
Freedom of religion                     19
Opportunity for economic advancement    12
Right to pursue an education            12
Freedom of press                         3
Don’t know/none of the above             8

 

(Source: Strategy One/Colonial Williamsburg, USA Today, August 13, 2002)

 

Draw a pie chart to describe the distribution.

 

2.         Twenty-five countries won medals in the 2002 Winter Olympics.  The table below lists them along with the total number of medals each won.

 

Country          Medals     Country           Medals
Germany          35         Croatia           4
USA              34         Korea             4
Norway           24         Bulgaria          3
Canada           17         Estonia           3
Austria          16         Great Britain     3
Russia           16         Australia         2
Italy            12         Czech Republic    2
France           11         Japan             2
Switzerland      11         Poland            2
China             8         Spain             2
Netherlands       8         Belarus           1
Finland           7         Slovenia          1
Sweden            6

 

 

 

(Source: CNNSI.com  http://sportsillustrated.cnn.com/olympics/2002/current_medal_tracker/)

 

(a)                Draw a pie chart to describe the distribution.  What problems do you encounter?

(b)               Can you find a way to organize the data so that the graph is more successful?
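One possible approach to part (b) is to merge the countries with few medals into a single category before drawing the chart. The sketch below is only an illustration of that idea: the cutoff of 5 medals and the helper name `group_small` are assumptions, not part of the problem, and other groupings (for example, by continent) would be equally reasonable.

```python
# Medal counts copied from the table in problem 2.
medals = {
    "Germany": 35, "USA": 34, "Norway": 24, "Canada": 17, "Austria": 16,
    "Russia": 16, "Italy": 12, "France": 11, "Switzerland": 11, "China": 8,
    "Netherlands": 8, "Finland": 7, "Sweden": 6, "Croatia": 4, "Korea": 4,
    "Bulgaria": 3, "Estonia": 3, "Great Britain": 3, "Australia": 2,
    "Czech Republic": 2, "Japan": 2, "Poland": 2, "Spain": 2,
    "Belarus": 1, "Slovenia": 1,
}


def group_small(counts, cutoff):
    """Merge every category with a count below `cutoff` into one 'Other' slice."""
    grouped = {name: n for name, n in counts.items() if n >= cutoff}
    other = sum(n for n in counts.values() if n < cutoff)
    if other:
        grouped["Other"] = other
    return grouped


grouped = group_small(medals, cutoff=5)  # the cutoff of 5 is an arbitrary choice
total = sum(grouped.values())
for name, n in grouped.items():
    # Each slice's share of all medals, which is what a pie chart displays.
    print(f"{name:<12} {n:>3}  {100 * n / total:5.1f}%")
```

The resulting dictionary can be passed to any charting tool; grouping changes only the number of slices, not the total.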


 

3.         A total of 202 countries participated in the 2004 Summer Olympics, and 75 countries won medals.  The table below lists the top 25 countries with the total number of medals each won.

 

Country          Medals     Country      Medals
USA              103        Romania      19
Russia            92        Spain        19
China             63        Hungary      17
Australia         49        Greece       16
Germany           48        Belarus      15
Japan             37        Canada       12
France            33        Bulgaria     12
Italy             32        Brazil       10
South Korea       30        Turkey       10
Great Britain     30        Poland       10
Cuba              27        Thailand      8
Ukraine           23        Denmark       8
Netherlands       22

 

 

 

(Source: CNNSI.com   http://sportsillustrated.cnn.com/olympics/2004/medaltracker/medalTrackerByTotal.html)

 

a.         Draw a pie chart to describe the distribution.  What problems do you encounter?

b.         Can you find a way to organize the data so that the graph is more successful?

 

 

4.         The following table, based on the American Chamber of Commerce Researchers Association Survey for the second quarter of 2002, gives the prices (in dollars) of five items in 25 urban areas across the United States.

 

City                  Apartment    Phone       Price of        Visit to      Price of
                      Rent ($)     Bill ($)    Gasoline ($)    Doctor ($)    Beer ($)
Montgomery (AL)        576         22.28       1.335           52.33         7.88
Juneau (AK)           1020         18.26       1.584           88.67         8.12
Tucson (AZ)            689         21.03       1.347           54.80         7.79
Sacramento (CA)        749         16.99       1.643           70.00         6.99
San Diego (CA)        1306         24.57       1.632           75.20         7.99
Denver (CO)            891         23.10       1.343           71.80         6.90
Hartford (CT)          896         22.39       1.419           80.25         7.15
Jacksonville (FL)      810         20.31       1.419           63.80         7.15
Bloomington (IN)       678         19.95       1.402           56.67         6.99
New Orleans (LA)       798         26.06       1.351           56.20         6.56
Boston (MA)           1248         24.41       1.405           78.00         7.21
Grand Rapids (MI)      678         22.40       1.499           59.20         8.03
Minneapolis (MN)       815         25.16       1.366           72.20         7.49
Springfield (MO)       568         18.25       1.309           63.72         7.89
Billings (MT)          550         30.45       1.449           70.75         7.09
Buffalo (NY)           714         33.71       1.413           53.00         7.03
Charlotte (NC)         540         21.07       1.359           58.00         7.03
Akron (OH)             686         21.16       1.519           59.40         7.29
Oklahoma City (OK)     579         23.04       1.308           60.02         6.94
Portland (OR)          753         20.92       1.403           72.40         7.69
Philadelphia (PA)     1282         21.12       1.360           62.50         8.57
Austin (TX)           1025         19.20       1.299           68.33         6.78
Richmond (VA)          769         26.15       1.317           59.80         6.37
Spokane (WA)           593         18.49       1.305           61.80         6.89
Charleston (WV)        606         27.08       1.423           64.67         7.01

 

Explanation of variables

Apartment Rent:        Monthly rent of an unfurnished 2-bedroom apartment (excluding all utilities except water), 1½ or 2 baths, approximately 950 square feet.

Phone Bill:            Monthly telephone charges for a private residential line (customer owns instruments).

Price of Gasoline:     Price of one gallon regular unleaded, national brand.

Visit to Doctor:       General practitioner’s routine examination of patient.

Price of Beer:         Heineken’s 6-pack, 12-oz. containers, excluding deposit.

 

a.         Prepare frequency distributions for the five variables.

b.         Construct the relative frequency and percentage distribution for the five variables.

c.                   Draw histograms.
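As a minimal sketch of parts a and b for one of the five variables, the code below builds a frequency, relative frequency, and percentage distribution for the apartment rents. The class limits (starting at 500 with a width of 200) are an assumption made for illustration; the problem does not prescribe them, and other choices are defensible.

```python
from collections import Counter

# Apartment rents (dollars) from the table in problem 4.
rents = [576, 1020, 689, 749, 1306, 891, 896, 810, 678, 798, 1248, 678, 815,
         568, 550, 714, 540, 686, 579, 753, 1282, 1025, 769, 593, 606]

LOW, WIDTH = 500, 200  # assumed classes: 500-699, 700-899, 900-1099, ...

# Map each rent to the index of its class, then count the rents in each class.
counts = Counter((r - LOW) // WIDTH for r in rents)
n = len(rents)
freq = [counts.get(i, 0) for i in range(max(counts) + 1)]

for i, f in enumerate(freq):
    lower = LOW + i * WIDTH
    print(f"{lower}-{lower + WIDTH - 1}: frequency = {f:2d}, "
          f"relative frequency = {f / n:.2f}, percentage = {100 * f / n:.0f}%")
```

The same pattern applies to the other four variables after choosing suitable class limits; the list `freq` also gives the bar heights for the histogram in part c.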

 

5.         The table below shows the average SAT scores for each of the 50 states and the District of Columbia for 1990 and 2000.

 

State              1990    2000
Alabama            1079    1114
Alaska             1015    1034
Arizona            1041    1044
Arkansas           1077    1117
California         1002    1015
Colorado           1067    1071
Connecticut        1002    1017
Delaware           1006     998
D.C.                950     980
Florida             988     998
Georgia             951     974
Hawaii              985    1007
Idaho              1066    1081
Illinois           1089    1154
Indiana             972     999
Iowa               1172    1189
Kansas             1129    1154
Kentucky           1089    1098
Louisiana          1088    1120
Maine               991    1004
Maryland           1008    1016
Massachusetts      1001    1024
Michigan           1063    1126
Minnesota          1110    1175
Mississippi        1090    1111
Missouri           1089    1149
Montana            1082    1089
Nebraska           1121    1131
Nevada             1022    1027
New Hampshire      1028    1039
New Jersey          993    1011
New Mexico         1100    1092
New York            985    1000
North Carolina      948     988
North Dakota       1157    1197
Ohio               1048    1072
Oklahoma           1095    1123
Oregon             1024    1054
Pennsylvania        987     995
Rhode Island        986    1005
South Carolina      942     966
South Dakota       1150    1175
Tennessee          1102    1116
Texas               979     993
Utah               1121    1139
Vermont            1000    1021
Virginia            997    1009
Washington         1024    1054
West Virginia      1034    1037
Wisconsin          1111    1181
Wyoming            1072    1090

(Source: College Entrance Examination Board, 2001)

 

a.         Use graphs to display the two SAT score distributions.  How has the distribution of average state scores changed over the decade?

b.         Compute the paired difference by subtracting the 1990 score from the 2000 score for each state.  Summarize these differences with a graph.


 

Numerical Description of Data

 

6.         In an advertisement in USA Today (July 9, 2001), the company Net2Phone listed its long distance rates to 24 of the 250 countries to which it offers service.

 

Country               Cost per Minute     Country           Cost per Minute
                      (cents)                               (cents)
Belgium                7.9                Italy              9.9
Chile                 17                  Japan              7.9
Canada                 3.9                Mexico            16
Colombia               9.9                Pakistan          49
Dominican Republic    15                  Philippines       49
Finland                9.9                Puerto Rico       21
France                 7.9                Singapore         11
Germany                7.9                South Korea        9.9
Hong Kong              7.9                Taiwan             9.9
India                 49                  United Kingdom     7.9
Ireland                7.9                United States      3.9
Israel                 8.9                Venezuela         22

 

a.       Make a graphical display of these rates.

b.         Find the mean and the median.

c.         Find the standard deviation.
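Parts b and c can be checked with Python's standard `statistics` module. This is a minimal sketch; it assumes the sample standard deviation (dividing by n - 1) is the one wanted, which the problem does not state explicitly.

```python
import statistics

# Cost per minute (cents) for the 24 countries in the table in problem 6.
rates = [7.9, 9.9, 17, 7.9, 3.9, 16, 9.9, 49, 15, 49, 9.9, 21,
         7.9, 11, 7.9, 9.9, 7.9, 9.9, 49, 7.9, 7.9, 3.9, 8.9, 22]

mean = statistics.mean(rates)
median = statistics.median(rates)
sd = statistics.stdev(rates)  # sample standard deviation (divides by n - 1)

print(f"n = {len(rates)}, mean = {mean:.2f}, median = {median:.1f}, sd = {sd:.2f}")
```

The mean lies well above the median because the three 49-cent rates pull it upward, which is worth noting when describing the graphical display in part a.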

 

7.         The U.S. Department of Transportation collects data on the amount of gasoline sold in each state.  The following data show the per capita (gallons used per person) consumption in the year 2000.  Using appropriate graphical displays and summary statistics, write a report on the gasoline use by state in the year 2000. 

 

State            Gallons per Person     State             Gallons per Person
Alabama          544.71                 Montana           548.5
Alaska           433.08                 Nebraska          508.28
Arizona          452.82                 Nevada            446.17
Arkansas         532.82                 New Hampshire     542.86
California       422.65                 New Jersey        474.28
Colorado         461.90                 New Mexico        474.28
Connecticut      431.04                 New York          551.18
Delaware         481.45                 North Carolina    296.66
Florida          542.36                 North Dakota      513.3
Georgia          452.82                 Ohio              574.83
Hawaii           327.27                 Oklahoma          457.63
Idaho            500.34                 Oregon            520.42
Illinois         406.66                 Pennsylvania      441.44
Indiana          518.7                  Rhode Island      410.31
Iowa             534.7                  South Carolina    381.86
Kansas           511.34                 South Dakota      555.06
Kentucky         510.9                  Tennessee         586.58
Louisiana        522.12                 Texas             515.17
Maine            542.36                 Utah              498.66
Maryland         542.82                 Vermont           456.27
Massachusetts    438.1                  Virginia          584.03
Michigan         502.77                 Washington        506.92
Minnesota        528.06                 West Virginia     450.4
Mississippi      559.29                 Wisconsin         462
Missouri         563.56                 Wyoming           462.67

 

8.         The Gallup Poll conducted a representative telephone survey during the first quarter of 1999.  Among the reported results was the following table concerning the preferred political party affiliation of respondents and their ages.

           

 

Age      Rep.    Dem.    Ind.    Total
18-29     241     351     409     1001
30-49     299     330     370      999
50-64     282     341     375      998
65+       279     382     343     1004
Total    1101    1404    1497     4002

 

a.   What percent of people surveyed were Republicans?

b.   Do you think this might be a reasonable estimate of the percentage of all voters who are Republicans?  Explain.

c.   What percent of people surveyed were under 30 or over 65?

d.   What percent of people were Independents under the age of 30?

e.   What percent of Independents were under 30?

f.    What percent of people under 30 were Independents?
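Parts d, e, and f use the same count (409 Independents under 30) with three different denominators, which is the distinction between joint and conditional percentages. The following sketch makes the denominators explicit; the variable names are illustrative only.

```python
# Counts from the table in problem 8: age group by party affiliation.
table = {
    "18-29": {"Rep": 241, "Dem": 351, "Ind": 409},
    "30-49": {"Rep": 299, "Dem": 330, "Ind": 370},
    "50-64": {"Rep": 282, "Dem": 341, "Ind": 375},
    "65+":   {"Rep": 279, "Dem": 382, "Ind": 343},
}

grand_total = sum(sum(row.values()) for row in table.values())
rep_total = sum(row["Rep"] for row in table.values())
ind_total = sum(row["Ind"] for row in table.values())
under30_total = sum(table["18-29"].values())
ind_under30 = table["18-29"]["Ind"]

pct_rep = 100 * rep_total / grand_total                      # part a
pct_ind_and_under30 = 100 * ind_under30 / grand_total        # part d: of everyone
pct_under30_given_ind = 100 * ind_under30 / ind_total        # part e: of Independents
pct_ind_given_under30 = 100 * ind_under30 / under30_total    # part f: of people under 30

print(f"a: {pct_rep:.1f}%  d: {pct_ind_and_under30:.1f}%  "
      f"e: {pct_under30_given_ind:.1f}%  f: {pct_ind_given_under30:.1f}%")
```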

 

9.         The following table gives the number of home runs hit during the 2002 season by all of the baseball teams in the American League.

 

Team         Home Runs     Team           Home Runs     Team         Home Runs
Anaheim      152           Texas          230           Tampa Bay    133
Boston       177           Chicago        217           Cleveland    192
New York     223           Toronto        187           Detroit      124
Seattle      152           Oakland        205           Baltimore    165
Minnesota    167           Kansas City    140

 

 

 

a.                   Find the mean and median. 

b.                  Find the standard deviation.


 

10.       The following table shows the number of stolen bases (SB) by each of the 16 National League baseball teams during the 2002 season.

 

Team             SB      Team            SB      Team          SB
Colorado         103     Montreal        118     Cincinnati    116
St. Louis         86     Florida         177     Milwaukee      94
Arizona           92     Atlanta          76     San Diego      71
San Francisco     74     Philadelphia    104     Chicago        63
Los Angeles       96     New York         87     Pittsburgh     86
Houston           71

 

 

 

 

 

a.                   Find the mean and median.

b.                  Find the standard deviation.

 

 

Binomial Distribution

 

11.       According to a survey by Food Processing, 85% of Americans say they eat home-cooked meals three or more times per week (Time, October 7, 2002).  Suppose that this result is true for the current population of Americans.

a.         Let X be a binomial random variable that denotes the number of Americans in a random sample of 12 who say they eat home-cooked meals three or more times per week.  What are the possible values that X can assume?

b.         Find the probability that in a random sample of 10 shoppers, exactly 4 faithfully buy the same cereal.

 

12.       During hard economic times, people switch between brands while shopping and rarely stick to one brand.  According to an Insight Express online survey of shoppers, only 24% faithfully buy a favorite cereal (CBS.MarketWatch.com, October 1, 2002).  Assume that this percentage is true for the current population of all shoppers.

a.         Let X be a binomial random variable that denotes the number who faithfully buy the same cereal in a random sample of 10 shoppers.  What are the possible values that X can assume?

b.         Find the probability that in a random sample of 10 shoppers, exactly 4 faithfully buy the same cereal.
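For part b of problem 12, the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be evaluated directly. The sketch below uses only the standard library; the function name `binom_pmf` is an illustrative choice.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 10, 0.24  # sample size and success probability from problem 12
p4 = binom_pmf(4, n, p)
print(f"P(X = 4) = {p4:.4f}")

# Sanity check: the probabilities over the possible values 0, 1, ..., n sum to 1.
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(f"sum over k = 0..{n}: {total:.6f}")
```

Cumulative probabilities such as "at least 9" or "at most 5" in the problems that follow can be obtained by summing `binom_pmf` over the relevant values of k.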

 

13.       In a poll of 12- to 18-year-old females conducted by Harris Interactive for the Gillette Company, 40% of the young females said that they expected the United States to have a female president within 10 years (USA Today, October 1, 2002).  Assume that this result is true for the current population of all 12- to 18-year-old females.  Suppose a random sample of 16 females from this age group is selected.  Find the probability that the number of young females in this sample who expect a female president within 10 years is

a.  at least 9                  b.  at most 5                 c.  between 6 and 9

 

14.       According to a 2001 study of college students conducted by Harvard University’s School of Public Health, 34.9% of the male students surveyed said they got drunk three or more times in the past 30 days (USA Today, April 3, 2002).  Assuming that this result holds true for all male college students, find the probability that in a random sample of 10 male college students, the number of students who got drunk three or more times in the past 30 days is

a.  exactly 4                  b.  none                        c.  exactly 8

 

Normal Distribution

 

15.       The U.S. Bureau of Labor Statistics conducts periodic surveys to collect information on the labor market.  According to one such survey, the average earnings of workers in retail trade were $10 per hour in August 2002 (Bureau of Labor Statistics News, September 18, 2002).  Assume that the hourly earnings of such workers in August 2002 had a normal distribution with a mean of $10 and a standard deviation of $1.10.  Find the probability that the hourly earnings of a randomly selected retail trade worker in August 2002 were

            a.  more than $12                     b.  between $8.50 and $10.80
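Normal probabilities such as those in problem 15 are usually read from a z table, but they can also be checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later):

```python
from statistics import NormalDist

# Hourly earnings model from problem 15: mean $10, standard deviation $1.10.
wage = NormalDist(mu=10, sigma=1.10)

p_more_than_12 = 1 - wage.cdf(12)              # part a: upper-tail area
p_between = wage.cdf(10.80) - wage.cdf(8.50)   # part b: area between two values

print(f"P(X > 12)            = {p_more_than_12:.4f}")
print(f"P(8.50 < X < 10.80)  = {p_between:.4f}")
```

Small differences from table-based answers are expected, because tables round z to two decimal places.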

 

16.       According to a survey by the Kaiser Family Foundation, employers paid an average of $7954 per employee in annual premiums to provide family health coverage for their employees, and each worker paid an average of $2084 toward these premiums (USA Today, September 6, 2002).  Assume that the current annual premiums paid by all workers for family health coverage are normally distributed with a mean of $2084 and a standard deviation of $300.

a.              Find the probability that a randomly selected worker pays more than $2500 per year toward the family health coverage premium.

b.                  What percentage of such workers are paying between $1800 and $2400 per year toward such premiums?

 

17.       In a Visa USA poll of Americans, participants were asked which of the following had taught them the most about money management:  school or mistakes.  Sixty-four percent of the persons polled said that mistakes had taught them the most (USA Today, May 16, 2002).  Assume that this result is true for the current population of all Americans.  Find the probability that in a random sample of 400 Americans, the number who will say that mistakes have taught them the most about money management is

a.  exactly 250              b. 260 to 272               c. at most 244

 

18.       According to a survey by Money magazine, 27% of women expect to support their parents financially (USA Today, June 19, 2002).  Assume that this percentage holds true for the current population of all women.  Suppose that a random sample of 300 women is taken.

a.                   Find the probability that exactly 79 of the women in this sample expect to support their parents financially.

b.                  Find the probability that at most 74 of the women in this sample expect to support their parents financially.

c.                   What is the probability that between 75 and 89 of the women in this sample expect to support their parents financially?
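Problems 17 and 18 are placed under the normal distribution because, with samples this large, the binomial distribution is commonly approximated by a normal curve with mean np and standard deviation sqrt(np(1 - p)). The sketch below illustrates this for part b of problem 18 and compares the result with the exact binomial sum. Using a continuity correction (74.5 instead of 74) is an assumption about the intended method.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.27               # sample size and proportion from problem 18
mu = n * p                     # mean of the binomial count
sigma = sqrt(n * p * (1 - p))  # standard deviation of the binomial count

# Normal approximation to P(X <= 74), with continuity correction.
approx = NormalDist(mu, sigma).cdf(74.5)

# Exact binomial probability for comparison: sum of P(X = k) for k = 0..74.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(75))

print(f"mean = {mu:.1f}, sd = {sigma:.3f}")
print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```

The approximation is reasonable here because both np and n(1 - p) are far larger than 5.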


 

Sampling Distribution

 

19.       According to International Communications Research for Cingular Wireless, men talk an average of 594 minutes per month on their cell phones (USA Today, July 29, 2002).  Assume that the mean time currently spent per month talking on their cell phones by all men who own cell phones is 594 minutes with a standard deviation of 160 minutes.  Let X be the mean time spent per month talking on their cell phones by a random sample of 400 men who own cell phones.  Find the mean and standard deviation of X.

 

20.       According to the U.S. Bureau of Labor Statistics estimates, the average earnings of construction workers were $18.96 per hour in August 2002 (Bureau of Labor Statistics News, September 18, 2002).  Assume that the current earnings of all construction workers are normally distributed with a mean of $18.96 per hour and a standard deviation of $3.60 per hour.  Find the probability that the mean hourly earning of a random sample of 25 construction workers is

a.                   between $18 and $20 per hour

b.                  within $1 of the population mean

c.                   greater than the population mean by $1.50 or more
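The key step in problem 20 is that the sample mean has the same mean as the population but a smaller standard deviation, sigma / sqrt(n). A minimal sketch for parts a and b, assuming (as the problem states) a normal population:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18.96, 3.60, 25   # population mean, sd, and sample size (problem 20)
se = sigma / sqrt(n)             # standard deviation of the sample mean

xbar = NormalDist(mu, se)        # sampling distribution of the sample mean

p_a = xbar.cdf(20) - xbar.cdf(18)           # part a: between $18 and $20
p_b = xbar.cdf(mu + 1) - xbar.cdf(mu - 1)   # part b: within $1 of the population mean

print(f"standard error = {se:.2f}")
print(f"P(18 < mean < 20) = {p_a:.4f}")
print(f"P(within $1 of mu) = {p_b:.4f}")
```

The same two lines (compute the standard error, then build the sampling distribution) carry over to problems 21 and 25, where the central limit theorem justifies the normal shape for large samples.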

 

21.       According to CardWeb.com, the average credit card debt per household was $8367 in 2001 (USA Today, April 29, 2002).  Assume that the probability distribution of all such current debts is skewed to the right with a mean of $8367 and a standard deviation of $2400.  Find the probability that the mean of a random sample of 225 such debts is

a.                   between $8100 and $8500

b.                  within $200 of the population mean

c.                   greater than the population mean by $300 or more

 

22.       A Maritz poll of adult drivers conducted in July 2002 found that 45% of them “often” or “sometimes” eat or drink while driving (USA Today, October 23, 2002).  Assume that 45% of all current adult drivers “often” or “sometimes” eat or drink while driving. Let p be the proportion of adult drivers in a sample of 400 who behave this way.  Find the mean and standard deviation of p and describe the shape of its sampling distribution.

 

23.       In a 2002 USA TODAY-CNN-Gallup poll, 37% of taxpayers said that the income tax they had to pay was not fair (USA Today, April 15, 2002).  Assume that this percentage is true for the current population of all taxpayers.  Let p be the proportion of taxpayers in a random sample of 300 who will say that the income tax they have to pay is not fair.  Calculate the mean and standard deviation of p and comment on the shape of its sampling distribution.

 

24.       In a Retirement Confidence survey of retired people, 51% said that retirement is better than they had expected (U.S. News & World Report, June 3, 2002).  Assume that this percentage is true for the current population of all retirees.  Let p be the proportion of retirees in a random sample of 225 who hold this opinion.  Calculate the mean and standard deviation of p and describe the shape of its sampling distribution.

 

25.       According to a 2002 survey by America Online, mothers with children under 18 years of age spent an average of 16.87 hours per week online (USA Today, May 7, 2002).  Assume that the mean time spent online by all current mothers with children under 18 years of age is 16.87 hours per week with a standard deviation of 5 hours per week. Find the probability that the mean time spent online per week by a random sample of 100 such mothers is

a.                   greater than 17 hours

b.                  between 16.5 and 17.5 hours

c.                   within 0.75 hour of the population mean

d.                  less than the population mean by 0.75 hour or more

 

Confidence Intervals for Proportions

 

26.       Due to sluggish economic conditions, the percentage of companies that host holiday parties for their employees has declined.  According to a survey by Hewitt Associates, 64% of the companies hosted holiday parties in 2002 (USA Today, December 2, 2002).  Assume that this result is based on a random sample of 400 U.S. companies.

a.                   Find a 98% confidence interval for the proportion of all U.S. companies who hosted holiday parties in 2002.

b.         Explain why we need to make the confidence interval. Why can we not say that 64% of all U.S. companies hosted parties in 2002?
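The interval in part a of problem 26 follows the usual large-sample form: sample proportion plus or minus z times sqrt(p(1 - p)/n). The sketch below computes it with the standard library; the function name `proportion_ci` is illustrative, and the critical value comes from `NormalDist().inv_cdf` rather than a table, so the last digit may differ slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, conf):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(p_hat=0.64, n=400, conf=0.98)
print(f"98% confidence interval: ({low:.4f}, {high:.4f})")
```

Calling `proportion_ci(0.08, 1012, 0.95)` in the same way gives the interval behind the margin of error asked for in problem 27.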

 

 

27.       A May 2002 Gallup Poll found that only 8% of a random sample of 1012 adults approved of attempts to clone a human.

a.         Find the margin of error for this poll if we want 95% confidence in our estimate of the percent of American adults who approve of cloning humans.

b.         Explain what that margin of error means.

c.         If we only need to be 90% confident, will the margin of error be larger or smaller?  Explain.

d.         Find that margin of error.

e.         In general, if all other aspects of the situation remain the same, would smaller samples produce smaller or larger margins of error?

 

28.       In the 1992 U.S. presidential election, Bill Clinton received 43% of the vote compared with 38% for George Bush, and 19% for Ross Perot.  Suppose we had taken a random sample of 100 voters in an exit poll and asked them for whom they had voted.

a.         Would you always get 43 votes for Clinton, 38 for Bush, and 19 for Perot in a sample of 100?  Why or why not?

b.         In 95% of such polls, our sample proportion of voters for Clinton should be between what two values?

c.         In 95% of such polls, the sample proportion of Perot votes should be between what two numbers?

d.         Would you expect the sample proportion of Perot votes to vary more, less, or about the same as the sample proportion of Bush votes?  Why?

 

29.       In May of 2000, the Pew Research Foundation sampled 1593 respondents and asked how they obtain news.  In Pew’s report, 33% of respondents say that they now obtain news from the Internet at least once a week.

a.         Pew reports a margin of error of 3% for this result.  Explain what the margin of error means.

b.         Pew also asked about investment information, and 21% of respondents reported that the Internet is their main source of this information.  When limited to the 780 respondents who identified themselves as investors, the percent who rely on the Internet rose to 28%.  How would you expect the margin of error for this statistic to change in comparison with the margin of error for the percentage of all respondents?

c.         When restricted to the 239 active traders in the sample, Pew reports that 45% rely on the Internet for investment information.  Find a confidence interval for this statistic.

d.         How does the margin of error for your confidence interval compare with the values in parts a and b?  Explain why.

 

30.       In May of 2002, the Gallup Organization asked a random sample of 537 American adults this question:

If you could choose between the following two approaches, which do you think is the better penalty for murder, the death penalty or life imprisonment, with absolutely no possibility of parole?

Of those polled, 52% chose the death penalty.

a.         Find a 95% confidence interval for the percentage of all American adults who favor the death penalty.

b.         Based on your confidence interval, is it clear that the death penalty has majority support?  Explain.

c.         If pollsters wanted to follow up on this poll with another survey that could determine the level of support for the death penalty to within 2% with 98% confidence, how many people should they poll?
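Part c of problem 30 asks for a sample size, which comes from solving the margin-of-error formula for n: n = z^2 p(1 - p) / E^2, rounded up. The sketch below shows two common choices for the planning value of p: the conservative 0.5 and the 0.52 observed in the earlier poll. Which one is intended is an assumption left to the instructor.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, conf, p_guess=0.5):
    """Smallest n giving the requested margin of error for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


n_conservative = sample_size(margin=0.02, conf=0.98)               # p = 0.5
n_using_poll = sample_size(margin=0.02, conf=0.98, p_guess=0.52)   # p from the poll

print(f"conservative: {n_conservative}, using earlier poll: {n_using_poll}")
```

Answers based on a rounded table value such as z = 2.33 will be slightly larger.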

 

Tests of Hypothesis about Proportion

 

31.       During the 2000 season, the home team won 138 of the 240 regular season National Football League games.  Is this strong evidence of a home field advantage in professional football?  Test an appropriate hypothesis and state your conclusion.  Be sure the appropriate assumptions and conditions are satisfied before you proceed.
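For problem 31, one natural setup (an assumption about the intended approach) is a one-proportion z test of H0: p = 0.5 against Ha: p > 0.5, where p is the probability that the home team wins. A minimal sketch of the computation:

```python
from math import sqrt
from statistics import NormalDist

wins, games, p0 = 138, 240, 0.5   # data from problem 31; p0 is "no home advantage"

p_hat = wins / games
se = sqrt(p0 * (1 - p0) / games)  # standard error computed under the null hypothesis
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)  # one-sided, because Ha is p > 0.5

# Success/failure condition: expected wins and losses under H0 are both 120.
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The code performs only the arithmetic; whether the games can be treated as independent trials still has to be argued in words, as the problem requests.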

 

32.       In 2000, 19 million registered voters failed to vote in the presidential election.  According to the U.S. Census Bureau’s Current Population Survey in November 2000, the most frequently given reason for not voting was “too busy,” cited by 20.9% of the respondents (USA Today, April 15, 2002).  Suppose that a random sample of 250 registered voters who did not vote in the November 2002 midterm elections showed that 18.1% of them stated the main reason for not voting was that they were too busy. At the 5% level of significance, can you conclude that the percentage of the registered voters who did not vote in November 2002 because they were too busy was less than 20.9%?

 

33.       According to a 2002 survey conducted by Harris Interactive for lawyers.com, 47% of Americans dream of owning a business (USA Today, September 10,2002).  Assume that this result was true for the population of Americans in 2002.  A recent random sample of 100 Americans found that 430 of 5them dream of owning a business.  Test at the 5% significance level if the current percentage of Americans who dream of owning a business is different from 47%.

 

34.       In a 2002 Affluent Americans and Their Money survey of “affluent” Americans (having an annual household income of $75,000 or more) conducted for Money magazine by RoperASW, 32% of the respondents indicated that they would have a serious problem paying an unexpected bill of $5000 (Money, Fall 2002).  In a recent random sample of 1100 households with annual income of $75,000 or more, 396 said that they would have a serious problem paying an unexpected bill of $5000.

a.         Test at the 2.5% significance level, whether the company should market this yogurt?

b.         What will your decision be in part (a) if the probability of making a Type I error is zero?  Explain.

 

Comparison of Two Proportions

 

35.       A Vermont study published in December 2001 by the American Academy of Pediatrics examined parental influence on teenagers’ decisions to smoke.  A group of students who had never smoked were questioned about their parents’ attitudes toward smoking.  These students were questioned again two years later to see if they had started smoking.  The researchers found that among the 284 students who indicated that their parents disapproved of kids smoking, 54 had become established smokers.  Among the 41 students who initially said their parents were lenient about smoking, 11 became smokers.  Do these data provide strong evidence that parental attitude influences teenagers’ decisions about smoking?

a.       What kind of design did the researchers use?

b.         Write appropriate hypotheses.

c.         Are the assumptions and conditions necessary for inference satisfied?

d.         Test the hypothesis and state your conclusion.

e.         Explain in this context what your p-value means.

f.          If that conclusion is actually wrong, which type of error did you commit?
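The test statistic for part d of problem 35 can be sketched as a two-proportion z test with a pooled standard error. The two-sided alternative used here is an assumption; a one-sided alternative (lenient parents lead to more smoking) halves the p-value. Note that the lenient group has only 11 smokers, so the conditions in part c deserve attention before relying on the result.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 54, 284   # smokers among students whose parents disapproved
x2, n2 = 11, 41    # smokers among students whose parents were lenient

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)   # common proportion assumed under H0
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.2f}, p-value = {p_two_sided:.3f}")
```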

 

36.       The August 2001 issue of Pediatrics reported on a study of adolescent suicide attempts.  Questionnaires were given to 6577 middle and high school students, 214 of whom were adopted.  Of the 6577 students, 213 said they had attempted suicide within the last year: 16 of those who were adopted and 197 of those who were not.  Does this indicate a significantly different rate of suicide attempts among adopted teens?

a.         Test an appropriate hypothesis and state your conclusion.

b.         If you concluded there was a difference, estimate that difference with a confidence interval and interpret your interval in context.

 

37.       Among 242 Cleveland area children born prematurely at low birth weights between 1977 and 1979, only 74% graduated from high school.  Among a comparison group of 233 children of normal birth weight, 83% were high school graduates.  (New England Journal of Medicine, 346, no. 3 [2002])

a.         Find a 95% confidence interval for the difference in graduation rates between children of normal and very low birth weights.  Be sure to check the appropriate assumptions and conditions.

b.         Does this provide evidence that premature birth may be a risk factor for not finishing high school?  Use your confidence interval to test an appropriate hypothesis.

c.         Suppose your conclusion is incorrect.  Which type of error did you make?

 

Confidence Intervals for Mean

 

38.       According to Money magazine, the average net worth of U.S. households in 2002 was $355,000 (Money, Fall 2002).  Assume that this mean is based on a random sample of 500 households and that the sample standard deviation is $125,000.  Find a 99% confidence interval for the 2002 mean net worth of all U.S. households.

 

39.       According to a 2002 survey by America Online, mothers with children under age 18 spent an average of 16.87 hours per week online (USA Today, May 7, 2002).  Suppose that this mean is based on a random sample of 1000 such mothers and that the standard deviation for this sample is 3.2 hours per week.  Find a 95% confidence interval for the corresponding population mean for all such mothers.

 

40.       According to Money magazine, the average price of new homes in the United States was $145,000 in 2002 (Money, Fall 2002).  Assume that this mean is based on a random sample of 1000 new home sales and that the sample standard deviation is $24,000.  Find a 95% confidence interval for the 2002 mean price of all such homes.

 

41.       According to Money magazine, the average cost of a movie ticket in the United States was $5.70 in 2002 (Money, Fall 2002).  Suppose that a random sample of 25 theaters in the United States yielded a mean movie ticket price of $5.70 with a standard deviation of $1.05.  Assuming that movie ticket prices are normally distributed, find a 95% confidence interval for the mean price of movie tickets for all theaters in the United States.

 

 

Tests about Mean

 

42.       In 1998, as an advertising campaign, the Nabisco Company announced a “1000 Chips Challenge,”  claiming that every 18-ounce bag of their Chips Ahoy cookies contained at least 1000 chocolate chips.  The students in a Statistics class at the Air Force Academy purchased some randomly selected bags of cookies, and counted the chocolate chips.  Some of their data are given below.  (Chance, 12, no. 1,  1999)

                   1219           1214          1087           1200           1419           1121           1325          1345

                   1244           1258          1356           1132           1191           1270           1295          1135

a.         Find a 95% confidence interval for the average number of chips in bags of Chips Ahoy cookies.

b.         What does this evidence say about Nabisco’s claim?  Use your confidence interval to test an appropriate hypothesis and state your conclusion.
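With only 16 bags in problem 42, the interval in part a is based on the t distribution with 15 degrees of freedom. The Python standard library has no t quantile function, so the sketch below hard-codes the table value 2.131 for 95% confidence and df = 15; that constant is taken from a standard t table, not computed.

```python
from math import sqrt
import statistics

# Chocolate chip counts for the 16 bags listed in problem 42.
chips = [1219, 1214, 1087, 1200, 1419, 1121, 1325, 1345,
         1244, 1258, 1356, 1132, 1191, 1270, 1295, 1135]

T_CRIT = 2.131  # t table value: 95% confidence, df = 15 (hard-coded assumption)

n = len(chips)
mean = statistics.mean(chips)
sd = statistics.stdev(chips)          # sample standard deviation
margin = T_CRIT * sd / sqrt(n)        # margin of error
low, high = mean - margin, mean + margin

print(f"n = {n}, mean = {mean:.1f}, sd = {sd:.1f}")
print(f"95% confidence interval: ({low:.1f}, {high:.1f})")
```

For part b, compare the whole interval with the claimed value of 1000 chips per bag.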

 

43.       According to the U.S. Bureau of Labor Statistics, there were 8.1 million unemployed people aged 16 years and over in August 2002.  The average duration of unemployment for these people was 16.3 weeks (Bureau of Labor Statistics News, September 6, 2002).  Suppose that a recent random sample of 400 unemployed Americans aged 16 years and over gave a mean duration of unemployment of 16.9 weeks with a standard deviation of 4.2 weeks.  Find the p-value for the hypothesis test with the alternative hypothesis that the current mean duration of unemployment for all unemployed Americans aged 16 years and over exceeds 16.3 weeks.  Will you reject the null hypothesis at α = .02?
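A p-value calculation of the kind requested in problem 43 can be sketched as follows. Because the sample is large (n = 400), the code uses the standard normal distribution with the sample standard deviation in place of the population value; treating the statistic as z rather than t is an assumption about the intended method.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 16.3                       # mean under the null hypothesis (weeks)
xbar, s, n = 16.9, 4.2, 400      # sample mean, sample sd, sample size

z = (xbar - mu0) / (s / sqrt(n))       # standardized test statistic
p_value = 1 - NormalDist().cdf(z)      # right tail, because Ha is mu > 16.3

alpha = 0.02
reject = p_value < alpha               # decision rule: reject H0 when p-value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject}")
```

For a two-sided alternative, as in problem 44, the tail area is doubled before comparing it with the significance level.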

 

44.       According to an estimate, Americans spend an average of $226 per year to “look good,” buying personal-care products and services such as tooth whitener, hair dyes, and sessions at salons (Reader’s Digest, September 2002).  Suppose that a recent random sample of 250 Americans showed that they spent an average of $238 on such products and services this year with a standard deviation of $77.

a.                   Find the p-value for the test of hypothesis with the alternative hypothesis that the current mean annual amount spent on such products and services differs from $226.

b.                  If α = .01, would you reject the null hypothesis based on the p-value calculated in part a?  What if α = .02?
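The p-value approach for Problem 44 can be sketched as below. This is a minimal sketch, assuming the sample is large enough for the normal approximation with the sample standard deviation standing in for the population value. The function name z_test_p_value and its alternative parameter are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, null_mean, sample_sd, n, alternative="two-sided"):
    """Return (z, p) for a large-sample test about a mean.

    alternative is "two-sided", "greater", or "less".
    """
    z = (sample_mean - null_mean) / (sample_sd / math.sqrt(n))
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        p = upper_tail
    elif alternative == "less":
        p = 1 - upper_tail
    else:
        p = 2 * min(upper_tail, 1 - upper_tail)
    return z, p

# Problem 44: H0: mu = 226 versus H1: mu != 226.
z, p = z_test_p_value(238, 226, 77, 250)
print(f"z = {z:.3f}, p-value = {p:.4f}")
for alpha in (0.01, 0.02):
    print(f"alpha = {alpha}: reject H0? {p < alpha}")
```

Passing alternative="greater" gives the right-tailed p-value needed for problems whose alternative hypothesis is that the mean exceeds the stated value.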

 

45.       According to the International Communications Research for Cingular Wireless, women talked an average of 394 minutes per month on their cell phones in 2002 (USA Today, July 29, 2002).  Suppose that a recent sample of 295 women who own cell phones showed that the mean time they spend per month talking on their cell phones is 402 minutes with a standard deviation of 81 minutes.

a.         At the 2% level of significance, can you conclude that the mean time spent talking on their cell phones by all women who own cell phones is currently more than 394 minutes per month?

b.         What is the Type I error in this case?   What is the probability of making this error in part a?

 

46.       According to Money magazine, the average cost of a visit to a doctor’s office in the United States was $60 in 2002 (Money, Fall 2002).  Suppose that a recent random sample of 25 visits to doctors gave a mean of $63.50 and a standard deviation of $2.00.  Using the 5% significance level, can you conclude that the current mean cost of a visit to a doctor’s office exceeds $60?  Assume that such costs for all visits to doctors are normally distributed.

 

47.       According to the U.S. Bureau of Labor Statistics, production workers in the mining industry worked an average of 43.5 hours per week in June 2002 (Bureau of Labor Statistics News, September 6, 2002).  A random sample of 24 production workers selected recently from a large mining company, Low Yield Mine, found that they work an average of 41.7 hours per week with a standard deviation of 1.3 hours per week.  Assume that the weekly working hours of all these employees are normally distributed.

a.         Suppose that the probability of making a Type I error is selected to be zero.  Can you conclude that workers at Low Yield Mine work less than 43.5 hours per week?  Answer without performing the five steps of a test of hypothesis.

b.         Using the 5% level of significance, can you conclude that workers at Low Yield Mine work less than 43.5 hours per week?

 

Comparison of Means

 

48.       In the 2002 National Geographic Society—RoperASW poll of geographic knowledge, young adults of 18 to 24 years of age from the United States and 8 other nations were asked to identify 11 countries on a numbered map of Asia (National Geographic, December 2002).  The two highest-scoring nations were Germany, with an average of 6.7 correct identifications out of 11, and Sweden, with an average of 6.3 correct answers.  Suppose that these means were based on random samples of 400 young adults from Germany and 600 from Sweden, and that the sample standard deviations of scores for Germany and Sweden were .7 and .8, respectively.

a.         Let μ1 and μ2 be the population means for Germany and Sweden, respectively.  Find the point estimate of μ1 − μ2 and its margin of error.

b.         Find a 98% confidence interval for μ1 − μ2.  Using the 5% level of significance, can you conclude that the mean score for all young adults from Germany is greater than that of all young adults from Sweden?

 

49.       According to Smith Travel Research, the mean hotel room rates in the United States were $85.69 and $84.58 per day in 2000 and 2001, respectively (USA Today, September 4, 2002).  Suppose that these mean rates were based on random samples of 1000 hotel rooms for 2000 and 1100 hotel rooms for 2001; further assume that the standard deviations of these rates for the two samples were $18.50 and $18, respectively.

a.         Let μ1 and μ2 be the mean rates for all hotel rooms in the United States in 2000 and 2001, respectively.  What are the point estimate of μ1 − μ2 and its margin of error?

b.         Find a 90% confidence interval for μ1 − μ2.

c.         Test at the 1% significance level whether the mean hotel room rate for 2000 was higher than that for 2001.
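The two-sample calculations for the hotel room rates can be sketched as follows. This is a minimal sketch, assuming independent large samples so that the normal approximation applies. The function name two_sample_z is hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, confidence):
    """Return (difference, standard error, lower, upper) for mu1 - mu2."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, se, diff - z_crit * se, diff + z_crit * se

# Summary figures for the hotel room rates (sample 1 = 2000, sample 2 = 2001).
diff, se, lower, upper = two_sample_z(85.69, 18.50, 1000, 84.58, 18.00, 1100, 0.90)
print(f"point estimate = {diff:.2f}, 90% CI = ({lower:.2f}, {upper:.2f})")

# Part c: right-tailed test of H0: mu1 = mu2 versus H1: mu1 > mu2 at alpha = .01.
z = diff / se
p_value = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0 at .01? {p_value < 0.01}")
```

The same function applies to the geography scores in Problem 48 by changing the summary figures and the confidence level.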

 

50.       In June 2002, the Journal of Applied Psychology reported on a study that examined whether the content of TV shows influenced the ability of viewers to recall brand names of items featured in the commercials.  The researchers randomly assigned volunteers to watch one of three programs, each containing the same nine commercials.  One of the programs had violent content, another sexual content, and the third neutral content.  After the shows ended the subjects were asked to recall the brands of products that were advertised.  Results are summarized below.

 

 

                     Violent     Sexual     Neutral
No. of subjects        108         108        108
Mean                  2.08        1.71       3.17
St. Dev.              1.87        1.76       1.77

 

a.                   Do the results indicate that viewer memory for ads may differ depending on the program content?

b.         Is there evidence that viewer memory for ads may differ between programs with sexual content and those with neutral content?

 

51.       In a full-page ad that ran in many U.S. newspapers in August 2002, a Canadian discount pharmacy listed costs of drugs that could be ordered from a Website in Canada.  The table compares prices (in US$) for commonly prescribed drugs.

 

COST PER 100 PILLS

Drug          United States     Canada     Percent Savings
Cardizem           131            83             37
Celebrex           136            72             47
Cipro              374           219             41
Pravachol          370           166             55
Premarin            61            17             72
Prevacid           252           214             15
Prozac             263           112             57
Tamoxifen          349            50             86
Vioxx              243           134             45
Zantac             166            42             75
Zocor              365           200             45
Zoloft             216           105             51

 

a.       Find a 95% confidence interval for the average savings in dollars.

b.         Find a 95% confidence interval for the average savings in percent.

c.         Which analysis do you think is more appropriate?  Why?
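Part a can be sketched as a paired analysis, with one dollar difference per drug. This is a minimal sketch, assuming the 12 drugs can be treated as a random sample of commonly prescribed drugs and that the differences are roughly normal. The critical value 2.201 is the tabled t value for 11 degrees of freedom at the 95% level, entered by hand.

```python
import math
import statistics

# (drug, U.S. price, Canadian price) in US$ per 100 pills, from the table.
prices = [
    ("Cardizem", 131, 83), ("Celebrex", 136, 72), ("Cipro", 374, 219),
    ("Pravachol", 370, 166), ("Premarin", 61, 17), ("Prevacid", 252, 214),
    ("Prozac", 263, 112), ("Tamoxifen", 349, 50), ("Vioxx", 243, 134),
    ("Zantac", 166, 42), ("Zocor", 365, 200), ("Zoloft", 216, 105),
]

# Paired design: one difference (U.S. minus Canada) per drug.
savings = [us - canada for _, us, canada in prices]

n = len(savings)
mean_saving = statistics.mean(savings)
se = statistics.stdev(savings) / math.sqrt(n)

# Assumption: 2.201 is the tabled t critical value for df = 11, 95% confidence.
t_crit = 2.201
lower, upper = mean_saving - t_crit * se, mean_saving + t_crit * se
print(f"mean saving = ${mean_saving:.2f}, 95% CI = (${lower:.2f}, ${upper:.2f})")
```

Replacing the dollar differences with the percent savings column gives the interval for part b by the same steps.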

 

52.       Ever since Lou Gehrig developed amyotrophic lateral sclerosis (ALS), this deadly condition has been commonly known as Lou Gehrig’s disease.  Some believe that ALS is more likely to strike athletes or the very fit.  Columbia University neurologist Lewis P. Rowland recorded personal histories of 431 patients he examined between 1992 and 2002.  He diagnosed 280 as having ALS; 38% of them had been varsity athletes.  The other 151 had other neurological disorders, and only 26% of them had been varsity athletes (Science News, Sept. 28, 2002).  Is there evidence that ALS is more common among athletes?

 

53.       A study of the health behavior of school-aged children asked a sample of 15-year-olds in several different countries if they had been drunk at least twice.  The results are shown in the table, by gender.  Find a 95% confidence interval for the difference in the rates for males and females.  Be sure to check the assumptions that support your chosen procedure, and explain what your interval means.  (Health and Health Behavior Among Young People.  Copenhagen:  World Health Organization, 2000)

 

Percent of 15-Year-Olds Drunk at Least Twice

Country          Female     Male
Denmark            63        71
Wales              63        72
Greenland          59        58
England            62        51
Finland            58        52
Scotland           56        53
No. Ireland        44        53
Slovakia           31        49
Austria            36        49
Canada             42        42
Sweden             40        40
Norway             41        37
Ireland            29        42
Germany            31        36
Latvia             23        47
Estonia            23        44
Hungary            22        43
Poland             21        39
USA                29        34
Czech Rep.         22        36
Belgium            22        36
Russia             25        32
Lithuania          20        32
France             20        29
Greece             21        24
Switzerland        16        25
Israel             10        18

 

54.       In March 2002, Consumer Reports listed the rate of return for several large cap mutual funds over the previous 3-year and 5-year periods.  (“Large cap” refers to companies worth over $10 billion.)

Annualized Returns (%)

Fund Name                         3-year     5-year
Ameristock                          7.9       17.1
Clipper                            14.1       18.2
Credit Suisse Strategic Value       5.5       11.5
Dodge & Cox Stock                  15.2       15.7
Excelsior Value                    13.1       16.4
Harbor Large Cap Value              6.3       11.5
ICAP Discretionary Equity           6.6       11.4
ICAP Equity                         7.6       12.4
Neuberger Berman Focus              9.8       13.2
PBHG Large Cap Value               10.7       18.1
Pelican                             7.7       12.1
Price Equity Income                 6.1       10.9
USAA Cornerstone Strategy           2.5        4.9
Vanguard Equity Income              3.5       11.3
Vanguard Windsor                   11.0       11.0

 

a.         Find a 95% confidence interval for the difference in rate of return for the 3- and 5-year periods covered by these data.  Clearly explain what your interval means.

b.         It’s common for advertisements to carry the disclaimer that “past returns may not be indicative of future performance,” but do these data indicate that there was an association between 3-year and 5-year rates of return?

 

 

Chi Square Test

 

55.       In 2000, the Journal of the American Medical Association published a study that examined a sample of pregnancies that resulted in the birth of twins.  Births were classified as preterm with intervention (induced labor or cesarean), preterm without such procedures, or term or postterm.  Researchers also classified the pregnancies by the level of prenatal medical care the mother received (inadequate, adequate, or intensive).  The data, from the years 1995-97, are summarized in the table below.  Figures are in thousands of births.  (JAMA 284, 2000)

 

TWIN BIRTHS 1995-1997 (IN THOUSANDS)

Level of care     Preterm (induced     Preterm (without     Term or      Total
                  or Cesarean)         procedures)          postterm
Intensive               18                   15                28          61
Adequate                46                   43                65         154
Inadequate              12                   13                38          63
Total                   76                   71               131         278

 

Is there evidence of an association between the duration of the pregnancy and the level of care received by the mother?
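The chi-square statistic for a two-way table such as this one can be sketched as below. This is a minimal sketch that uses the table entries exactly as printed; because the figures are stated in thousands, a statistic based on the actual counts would be larger. The function name chi_square_independence is hypothetical, and the critical value 9.488 is the tabled chi-square value for 4 degrees of freedom at the 5% level, entered by hand.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: intensive, adequate, inadequate care.
# Columns: preterm (induced or cesarean), preterm (without procedures), term or postterm.
twin_births = [
    [18, 15, 28],
    [46, 43, 65],
    [12, 13, 38],
]

chi2, df = chi_square_independence(twin_births)
# Assumption: 9.488 is the tabled chi-square critical value for df = 4 at the 5% level.
print(f"chi-square = {chi2:.2f}, df = {df}, exceeds 9.488? {chi2 > 9.488}")
```

The same function accepts any rectangular table of counts, so it can be reused for the other two-way tables in this section.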

 

56.       The Gallup Poll conducted a representative telephone survey during the first quarter of 1999.  Among the reported results was the following table concerning the preferred political party affiliation of respondents and their ages.  Is there evidence of age-based differences in party affiliation in the United States?

 

 

Age        Republican     Democratic     Independent     Total
18-29          241            351             409          1001
30-49          299            330             370           999
50-64          282            341             375           998
65+            279            382             343          1004
Total         1101           1404            1497          4002

 

a.         Will you conduct a test of homogeneity or independence?  Why?

b.         Test an appropriate hypothesis.

c.         State your conclusion, including an analysis of differences you find (if any).

 

57.       In January 2002, 725 people receiving outplacement assistance, with incomes of $60,000 to $150,000, were asked how long they could comfortably afford to be unemployed (Business Week, April 15, 2002).  Eight percent said “less than three months,” 46% said “up to six months,” 26% said “up to a year,” and 20% said “more than a year.”  Assume that these results are true for the 2002 population of such people.  Suppose we denote the above responses by A, B, C, and D, respectively.  Recently, 500 such people were randomly selected and asked the same question.  The following table summarizes their responses.

 

Response              A       B       C       D
Number of people     48     242     120      90

 

Test at the 5% significance level whether the current distribution of responses differs from the one for 2002.
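A goodness-of-fit calculation for this problem can be sketched as follows. This is a minimal sketch, assuming the 2002 percentages are the hypothesized population proportions. The function name goodness_of_fit is hypothetical, and the critical value 7.815 is the tabled chi-square value for 3 degrees of freedom at the 5% level, entered by hand.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected

# Responses A, B, C, D from the recent sample and the 2002 percentages.
observed = [48, 242, 120, 90]
proportions_2002 = [0.08, 0.46, 0.26, 0.20]

chi2, df, expected = goodness_of_fit(observed, proportions_2002)
# Assumption: 7.815 is the tabled chi-square critical value for df = 3 at the 5% level.
print(f"expected = {expected}")
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0? {chi2 > 7.815}")
```

Each expected count here is at least 5, which is the usual condition checked before relying on the chi-square approximation.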

 

58.       In an At-a-Glance Communications 2002 survey, office workers were asked how long they normally took to respond to e-mail.  Thirty-six percent said “as soon as I return to my desk,” 35% said “within an hour or two,” 24% said “before the end of the business day,” and 5% said “when I can” (USA Today, May 7, 2002).  Assume that these results hold true for the population of all office workers in 2002.  Suppose we denote these responses by A, B, C, and D, respectively.  A recent random sample of 400 office workers was asked the same question and it yielded the frequency distribution shown in the following table.

 

Response        A       B       C       D
Frequency     128     142     116      14

 

Using the 1% significance level, can you conclude that the current distribution of responses differs from the 2002 distribution?

 

59.       A survey conducted from June 21 through August 7, 2002 studied “affluent” Americans with household incomes of $75,000 or more per year (Money, Fall 2002). Part of that survey examined the relationship between the use of a financial advisor and ownership of stocks.  Assuming that this portion of the survey was based on a random sample of 400 affluent Americans, the percentages given in the magazine would yield the numbers shown in the following table.

 

 

 

                            Own Stocks     Do Not Own Stocks
Use financial     Yes          165                135
advisor?          No            43                 57

 

At the 5% significance level, can you conclude that use of a financial advisor is related to stock ownership for all affluent Americans?

 

60.       In a Knowledge Networks/Statistical Research 2002 survey, 8- to 17-year-olds were asked which medium was their favorite (The Reader’s Digest, November 2002).  If the survey were based on random samples of 500 boys and 500 girls, the percentages given in the magazine article would have yielded the following table.

 

 

          Internet      TV      Phone     Radio     Other
Boys         190       170        60        60        20
Girls        140        85       155        85        35

 

Using the 1% significance level, test the null hypothesis that the distribution of media preferences is the same for boys and girls in this age group.

 

Linear Regression and Correlation

 

61.       The following table gives horsepower ratings and expected gas mileage for several 2001 vehicles.

 

Vehicle             Horsepower (hp)     Gas Mileage (mpg)
Audi A4                  170                  22
Buick LeSabre            205                  20
Chevy Blazer             190                  15
Chevy Prizm              125                  31
Ford Excursion           310                  10
GMC Yukon                285                  13
Honda Civic              127                  29
Hyundai Elantra          140                  25
Lexus 300                215                  21
Lincoln LS               210                  23
Mazda MPV                170                  18
Olds Alero               140                  23
Toyota Camry             194                  21
VW Beetle                115                  29

 

a.         Make a scatterplot for these data.

b.         Describe the direction, form, and scatter of the plot.

c.         Find the correlation between horsepower and miles per gallon.

d.         Write a few sentences telling what the plot says about fuel economy.
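The correlation in part c can be sketched from its definition as below. This is a minimal sketch using only the 14 vehicles in the table; the function name correlation is hypothetical.

```python
import math

# (horsepower, mpg) pairs in the order the vehicles are listed.
cars = [(170, 22), (205, 20), (190, 15), (125, 31), (310, 10), (285, 13),
        (127, 29), (140, 25), (215, 21), (210, 23), (170, 18), (140, 23),
        (194, 21), (115, 29)]

def correlation(pairs):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = correlation(cars)
print(f"r = {r:.3f}")
```

A value of r near -1 would match a scatterplot in which mileage falls steadily as horsepower rises, so the number should be read alongside the plot from part a.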

 

62.       The following table shows the oil production of the United States from 1949 to 2000 (in thousands of barrels per year).

 

Year      Oil           Year      Oil           Year      Oil           Year      Oil
1949   1,841,940        1962   2,676,189        1975   3,056,779        1988   2,979,123
1950   1,973,574        1963   2,752,723        1976   2,976,180        1989   2,778,773
1951   2,247,711        1964   2,786,822        1977   3,009,265        1990   2,684,687
1952   2,289,836        1965   2,848,514        1978   3,178,216        1991   2,707,039
1953   2,357,082        1966   3,027,763        1979   3,121,310        1992   2,624,632
1954   2,314,988        1967   3,215,742        1980   3,146,365        1993   2,499,033
1955   2,484,428        1968   3,329,042        1981   3,128,624        1994   2,431,476
1956   2,617,283        1969   3,371,751        1982   3,156,715        1995   2,394,268
1957   2,616,901        1970   3,517,450        1983   3,170,999        1996   2,366,017
1958   2,448,987        1971   3,453,914        1984   3,249,696        1997   2,354,831
1959   2,574,590        1972   3,455,368        1985   3,274,553        1998   2,281,919
1960   2,574,933        1973   3,360,903        1986   3,168,252        1999   2,146,732
1961   2,621,758        1974   3,202,585        1987   3,047,378        2000   2,135,062

 

a.         Find the correlation between year and production.

b.         A reporter concludes that a low correlation between year and production shows that oil production has remained steady over the 50-year period.  Do you agree with this interpretation?  Explain.

c.         Fit a least squares regression line to oil production by year.

d.         Using this regression line, predict U.S. oil production in the year 2001.

e.         Does the prediction in part d look reasonable?  Comment.

f.          Do you think the regression line is an appropriate model?  Comment.

 

63.       The following table gives the total 2002 payroll (rounded to the nearest million dollars) and the percentage of games won during the 2002 season by each of the National League baseball teams.

 

Team                      Total Payroll     Percentage of Games Won
Arizona Diamondbacks           103                  60.5
Atlanta Braves                  93                  63.1
Chicago Cubs                    76                  41.4
Cincinnati Reds                 45                  48.1
Colorado Rockies                57                  45.1
Florida Marlins                 42                  48.8
Houston Astros                  63                  51.9
Los Angeles Dodgers             95                  56.8
Milwaukee Brewers               50                  34.6
Montreal Expos                  39                  51.2
New York Mets                   95                  46.6
Philadelphia Phillies           58                  49.7
Pittsburgh Pirates              42                  44.7
St. Louis Cardinals             75                  59.9
San Diego Padres                41                  40.7
San Francisco Giants            78                  59.0

 

a.         Find the least squares regression line with total payroll as an independent variable and percentage of games won as a dependent variable.

b.         Give a brief interpretation of the values of the y-intercept and the slope.

c.         Predict the percentage of games won for a team with a total payroll of $55 million.

 

64.       The following table gives the total 2002 payroll (rounded to the nearest million dollars) and the percentage of games won during the 2002 season by each of the American League baseball teams.

 

Team                      Total Payroll     Percentage of Games Won
Anaheim Angels                  62                  61.1
Baltimore Orioles               60                  41.4
Boston Red Sox                 108                  57.4
Chicago White Sox               57                  50.0
Cleveland Indians               79                  45.7
Detroit Tigers                  55                  34.2
Kansas City Royals              47                  38.3
Minnesota Twins                 40                  58.4
New York Yankees               126                  64.0
Oakland A’s                     40                  63.6
Seattle Mariners                80                  57.4
Tampa Bay Devil Rays            34                  34.2
Texas Rangers                  106                  44.4
Toronto Blue Jays               77                  48.1

 

a.         Find the least squares regression line with total payroll as an independent variable and percentage of games won as a dependent variable.

b.         Give a brief interpretation of the values of the y-intercept and the slope.

c.         Predict the percentage of games won for a team with a total payroll of $65 million.

 

65.       The following table gives the average hotel room rates in the United States for the years 1992-2001.

 

Year     Average Hotel Room Rate
1992            $59.39
1993             60.99
1994             63.35
1995             66.34
1996             70.68
1997             74.77
1998             78.24
1999             81.59
2000             85.69
2001             84.58

 

a.         Assign a value of 0 to 1992, 1 to 1993, 2 to 1994, and so on.  Call this new variable Time.  Make a new table with the variables Time and Average Hotel Room Rate.

b.         Construct a scatter diagram for these data.  Does the scatter diagram exhibit a linear positive relationship between time and average hotel room rates?

c.         Find the least squares regression line.

d.         Compute the correlation coefficient r.

e.         Predict the average hotel room rate for 2006.  Comment on this prediction.
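Parts a, c, and e can be sketched as below. This is a minimal sketch of the least squares formulas applied to the ten rates in the table; the function name least_squares is hypothetical.

```python
# Average hotel room rates for 1992-2001, coded as Time = 0, 1, ..., 9.
rates = [59.39, 60.99, 63.35, 66.34, 70.68, 74.77, 78.24, 81.59, 85.69, 84.58]
times = list(range(len(rates)))

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = least_squares(times, rates)
time_2006 = 2006 - 1992  # 2006 corresponds to Time = 14
predicted_2006 = intercept + slope * time_2006
print(f"rate = {intercept:.2f} + {slope:.3f} * Time")
print(f"predicted 2006 rate = ${predicted_2006:.2f}")
```

Time = 14 lies well outside the observed range of 0 to 9, so the 2006 figure is an extrapolation and should be treated with caution, particularly since the rate fell between 2000 and 2001.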

 

66.       The following table shows the price of ladies’ diamond rings and the weight of their diamond stones. (Journal of Statistics Education, 1996)

 

Weight  Price     Weight  Price     Weight  Price     Weight  Price     Weight  Price     Weight  Price
 .17     355       .21     483       .12     223       .17     353       .32     919       .25     655
 .16     328       .15     323       .26     663       .18     438       .15     298       .35    1086
 .17     350       .18     462       .25     750       .17     318       .16     339       .18     443
 .18     325       .28     823       .27     720       .18     419       .16     338       .25     678
 .25     642       .16     336       .18     468       .17     346       .23     595       .25     675
 .16     342       .20     498       .16     345       .15     315       .23     553       .15     287
 .15     322       .23     595       .17     352       .17     350       .17     345       .26     693
 .19     485       .29     860       .16     332       .32     918       .33     945       .15     316

a.         Construct a scatter diagram for these data. 

b.         Find the least squares regression line.

c.         Compute the correlation coefficient r.