Using the World Wide Web for Teaching Statistics
This document contains problems that can be used in an introductory statistics course. The problems are grouped by topic.
The following sites have useful resources for teaching statistics.
The Data and Story Library: http://lib.stat.cmu.edu/DASL/
Data Surfing: http://it.stlawu.edu/~rlock/datasurf.html
StatLib Dataset Archive: http://lib.stat.cmu.edu/datasets/
Java Applets for Statistics: http://www.stat.duke.edu/sites/java.html
Multimedia Statistics Page: http://www.berrie.dds.nl/index.html
Rice Virtual Lab in Statistics: http://www.ruf.rice.edu/~lane/rvls.html
Journal of Statistics Education Data Archive: http://www.amstat.org/publications/jse/jse_data_archive.html
Graphical Description of Data
1. In a survey of 1000 people conducted in June 2002 by Strategy One/Colonial Williamsburg, respondents were asked which of the issues listed in the table below was most important to them.
Issue | Percentage of Responses
Freedom of Speech | 26
Access to affordable health care | 20
Freedom of religion | 19
Opportunity of economic advancement | 12
Right to pursue an education | 12
Freedom of press | 3
Don’t know/none of the above | 8
(Source: Strategy One/Colonial Williamsburg, USA Today, August 13, 2002)
Draw a pie chart to describe the distribution.
2. Twenty-five countries won medals in the 2002 Winter Olympics. The table below lists them along with the total number of medals each won.
Country | Medals | Country | Medals
Germany | 35 | Croatia | 4
USA | 34 | Korea | 4
Norway | 24 | Bulgaria | 3
Canada | 17 | Estonia | 3
Austria | 16 | Great Britain | 3
Russia | 16 | Australia | 2
Italy | 12 | Czech Republic | 2
France | 11 | Japan | 2
Switzerland | 11 | Poland | 2
China | 8 | Spain | 2
Netherlands | 8 | Belarus | 1
Finland | 7 | Slovenia | 1
Sweden | 6
(Source: CNNSI.com http://sportsillustrated.cnn.com/olympics/2002/current_medal_tracker/)
(a) Draw a pie chart to describe the distribution. What problems do you encounter?
(b) Can you find a way to organize the data so that the graph is more successful?
3. A total of 202 countries participated in the 2004 Summer Olympics, and 75 countries won medals. The table below lists the top 25 countries along with the total number of medals each won.
Country | Medals | Country | Medals
USA | 103 | Romania | 19
Russia | 92 | Spain | 19
China | 63 | Hungary | 17
Australia | 49 | Greece | 16
Germany | 48 | Belarus | 15
Japan | 37 | Canada | 12
France | 33 | Bulgaria | 12
Italy | 32 | Brazil | 10
South Korea | 30 | Turkey | 10
Great Britain | 30 | Poland | 10
Cuba | 27 | Thailand | 8
Ukraine | 23 | Denmark | 8
Netherlands | 22
(Source: CNNSI.com http://sportsillustrated.cnn.com/olympics/2004/medaltracker/medalTrackerByTotal.html)
a. Draw a pie chart to describe the distribution. What problems do you encounter?
b. Can you find a way to organize the data so that the graph is more successful?
4. The following table, based on the American Chamber of Commerce Researchers Association Survey for the second quarter of 2002, gives the prices (in dollars) of five items in 25 urban areas across the United States.
City | Apartment Rent | Phone Bill | Price of Gasoline | Visit to Doctor | Price of Beer
Montgomery (AL) | $576 | $22.28 | $1.335 | $52.33 | $7.88
Juneau (AK) | 1020 | 18.26 | 1.584 | 88.67 | 8.12
Tucson (AZ) | 689 | 21.03 | 1.347 | 54.80 | 7.79
Sacramento (CA) | 749 | 16.99 | 1.643 | 70.00 | 6.99
San Diego (CA) | 1306 | 24.57 | 1.632 | 75.20 | 7.99
Denver (CO) | 891 | 23.10 | 1.343 | 71.8 | 6.90
Hartford (CT) | 896 | 22.39 | 1.419 | 80.25 | 7.15
Jacksonville (FL) | 810 | 20.31 | 1.419 | 63.80 | 7.15
Bloomington (IN) | 678 | 19.95 | 1.402 | 56.67 | 6.99
New Orleans (LA) | 798 | 26.06 | 1.351 | 56.20 | 6.56
Boston (MA) | 1248 | 24.41 | 1.405 | 78.00 | 7.21
Grand Rapids (MI) | 678 | 22.40 | 1.499 | 59.20 | 8.03
Minneapolis (MN) | 815 | 25.16 | 1.366 | 72.20 | 7.49
Springfield (MO) | 568 | 18.25 | 1.309 | 63.72 | 7.89
Billings (MT) | 550 | 30.45 | 1.449 | 70.75 | 7.09
Buffalo (NY) | 714 | 33.71 | 1.413 | 53.00 | 7.03
Charlotte (NC) | 540 | 21.07 | 1.359 | 58.00 | 7.03
Akron (OH) | 686 | 21.16 | 1.519 | 59.40 | 7.29
Oklahoma City (OK) | 579 | 23.04 | 1.308 | 60.02 | 6.94
Portland (OR) | 753 | 20.92 | 1.403 | 72.40 | 7.69
Philadelphia (PA) | 1282 | 21.12 | 1.360 | 62.50 | 8.57
Austin (TX) | 1025 | 19.20 | 1.299 | 68.33 | 6.78
Richmond (VA) | 769 | 26.15 | 1.317 | 59.80 | 6.37
Spokane (WA) | 593 | 18.49 | 1.305 | 61.80 | 6.89
Charleston (WV) | 606 | 27.08 | 1.423 | 64.67 | 7.01
Apartment Rent: Monthly rent of an unfurnished 2-bedroom apartment (excluding all utilities except water), 1 ½ or 2 baths, approximately 950 square feet.
Phone Bill: Monthly telephone charges for a private residential line (customer owns instruments).
Price of Gasoline: Price of one gallon regular unleaded, national brand.
Visit to doctor: General practitioner’s routine examination of patient.
Price of Beer: Heineken’s 6-pack, 12-oz. containers, excluding deposit.
a. Prepare frequency distributions for the five variables.
b. Construct the relative frequency and percentage distributions for the five variables.
c. Draw histograms.
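Parts a and b can be checked with a short script. The sketch below tallies the apartment-rent column into frequency, relative frequency, and percentage distributions. The classes (width $200, starting at $500) are an illustrative choice made here, not something the problem specifies; the same function works for the other four columns with a different start and width.

```python
# Apartment rents (dollars) for the 25 urban areas, in table order.
rents = [576, 1020, 689, 749, 1306, 891, 896, 810, 678, 798, 1248, 678, 815,
         568, 550, 714, 540, 686, 579, 753, 1282, 1025, 769, 593, 606]


def frequency_table(values, start, width, n_classes):
    """Return (low, high, frequency, relative frequency, percentage) rows.

    Each class is the half-open interval [low, high). The class limits are
    an assumed, illustrative choice.
    """
    total = len(values)
    rows = []
    for k in range(n_classes):
        low = start + k * width
        high = low + width
        freq = sum(1 for v in values if low <= v < high)
        rows.append((low, high, freq, freq / total, 100 * freq / total))
    return rows


table = frequency_table(rents, start=500, width=200, n_classes=5)
for low, high, freq, rel, pct in table:
    print(f"${low}-${high - 1}: {freq:2d}  {rel:.2f}  {pct:5.1f}%")
```

The frequency column of this table gives the bar heights for the histogram in part c.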
5. The table below shows the average SAT scores for each of the 50 states and the District of Columbia for 1990 and 2000.
State | 1990 | 2000
Alabama | 1079 | 1114
Alaska | 1015 | 1034
Arizona | 1041 | 1044
Arkansas | 1077 | 1117
California | 1002 | 1015
Colorado | 1067 | 1071
Connecticut | 1002 | 1017
Delaware | 1006 | 998
D.C. | 950 | 980
Florida | 988 | 998
Georgia | 951 | 974
Hawaii | 985 | 1007
Idaho | 1066 | 1081
Illinois | 1089 | 1154
Indiana | 972 | 999
Iowa | 1172 | 1189
Kansas | 1129 | 1154
Kentucky | 1089 | 1098
Louisiana | 1088 | 1120
Maine | 991 | 1004
Maryland | 1008 | 1016
Massachusetts | 1001 | 1024
Michigan | 1063 | 1126
Minnesota | 1110 | 1175
Mississippi | 1090 | 1111
Missouri | 1089 | 1149
Montana | 1082 | 1089
Nebraska | 1121 | 1131
Nevada | 1022 | 1027
New Hampshire | 1028 | 1039
New Jersey | 993 | 1011
New Mexico | 1100 | 1092
New York | 985 | 1000
North Carolina | 948 | 988
North Dakota | 1157 | 1197
Ohio | 1048 | 1072
Oklahoma | 1095 | 1123
Oregon | 1024 | 1054
Pennsylvania | 987 | 995
Rhode Island | 986 | 1005
South Carolina | 942 | 966
South Dakota | 1150 | 1175
Tennessee | 1102 | 1116
Texas | 979 | 993
Utah | 1121 | 1139
Vermont | 1000 | 1021
Virginia | 997 | 1009
Washington | 1024 | 1054
West Virginia | 1034 | 1037
Wisconsin | 1111 | 1181
Wyoming | 1072 | 1090
(Source: College Entrance Examination Board, 2001)
a. Use graphs to display the two SAT score distributions. How has the distribution of average state scores changed over the decade?
b. Compute the paired differences by subtracting the 1990 score from the 2000 score for each state. Summarize these differences with a graph.
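For part b, the differences can be computed and roughly graphed with a few lines of code. This is a minimal sketch: it subtracts each 1990 score from the 2000 score and prints a text histogram with bins of width 10 points (the bin width is an assumed choice for illustration).

```python
# Average SAT scores (1990, 2000) from the table above.
sat = {
    "Alabama": (1079, 1114), "Alaska": (1015, 1034), "Arizona": (1041, 1044),
    "Arkansas": (1077, 1117), "California": (1002, 1015), "Colorado": (1067, 1071),
    "Connecticut": (1002, 1017), "Delaware": (1006, 998), "D.C.": (950, 980),
    "Florida": (988, 998), "Georgia": (951, 974), "Hawaii": (985, 1007),
    "Idaho": (1066, 1081), "Illinois": (1089, 1154), "Indiana": (972, 999),
    "Iowa": (1172, 1189), "Kansas": (1129, 1154), "Kentucky": (1089, 1098),
    "Louisiana": (1088, 1120), "Maine": (991, 1004), "Maryland": (1008, 1016),
    "Massachusetts": (1001, 1024), "Michigan": (1063, 1126), "Minnesota": (1110, 1175),
    "Mississippi": (1090, 1111), "Missouri": (1089, 1149), "Montana": (1082, 1089),
    "Nebraska": (1121, 1131), "Nevada": (1022, 1027), "New Hampshire": (1028, 1039),
    "New Jersey": (993, 1011), "New Mexico": (1100, 1092), "New York": (985, 1000),
    "North Carolina": (948, 988), "North Dakota": (1157, 1197), "Ohio": (1048, 1072),
    "Oklahoma": (1095, 1123), "Oregon": (1024, 1054), "Pennsylvania": (987, 995),
    "Rhode Island": (986, 1005), "South Carolina": (942, 966), "South Dakota": (1150, 1175),
    "Tennessee": (1102, 1116), "Texas": (979, 993), "Utah": (1121, 1139),
    "Vermont": (1000, 1021), "Virginia": (997, 1009), "Washington": (1024, 1054),
    "West Virginia": (1034, 1037), "Wisconsin": (1111, 1181), "Wyoming": (1072, 1090),
}

# Paired difference for each state: 2000 score minus 1990 score.
diffs = {state: s2000 - s1990 for state, (s1990, s2000) in sat.items()}

# Tally differences into bins [low, low + 10) and print one '*' per state.
bins = {}
for d in diffs.values():
    low = (d // 10) * 10
    bins[low] = bins.get(low, 0) + 1
for low in range(-10, 80, 10):
    print(f"{low:4d} to {low + 9:3d} | {'*' * bins.get(low, 0)}")
```

Only Delaware and New Mexico have negative differences, so the histogram sits almost entirely to the right of zero.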
Numerical Description of Data
6. In an advertisement in USA Today (July 9, 2001), the company Net2Phone listed its long distance rates to 24 of the 250 countries to which it offers service.
Country | Cost per Minute (cents) | Country | Cost per Minute (cents)
Belgium | 7.9 | Italy | 9.9
Chile | 17 | Japan | 7.9
Canada | 3.9 | Mexico | 16
Colombia | 9.9 | Pakistan | 49
Dominican Republic | 15 | Philippines | 49
Finland | 9.9 | Puerto Rico | 21
France | 7.9 | Singapore | 11
Germany | 7.9 | South Korea | 9.9
Hong Kong | 7.9 | Taiwan | 9.9
India | 49 | United Kingdom | 7.9
Ireland | 7.9 | United States | 3.9
Israel | 8.9 | Venezuela | 22
a. Make a graphical display of these rates.
b. Find the mean and the median.
c. Find the standard deviation.
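Parts b and c can be checked with Python's standard `statistics` module. The sketch below uses `stdev`, which is the sample standard deviation (divisor n - 1); if your course treats these 24 countries as a population, use `pstdev` instead.

```python
import statistics

# Cost per minute (cents) for the 24 countries in the table above.
rates = {
    "Belgium": 7.9, "Chile": 17, "Canada": 3.9, "Colombia": 9.9,
    "Dominican Republic": 15, "Finland": 9.9, "France": 7.9, "Germany": 7.9,
    "Hong Kong": 7.9, "India": 49, "Ireland": 7.9, "Israel": 8.9,
    "Italy": 9.9, "Japan": 7.9, "Mexico": 16, "Pakistan": 49,
    "Philippines": 49, "Puerto Rico": 21, "Singapore": 11, "South Korea": 9.9,
    "Taiwan": 9.9, "United Kingdom": 7.9, "United States": 3.9, "Venezuela": 22,
}
values = list(rates.values())

mean = statistics.mean(values)
median = statistics.median(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

print(f"mean = {mean:.2f}, median = {median:.2f}, s = {sd:.2f}")
```

The mean (about 15.4 cents) is well above the median (9.9 cents) because the three 49-cent rates pull it upward, which is worth connecting to the graph from part a.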
7. The U.S. Department of Transportation collects data on the amount of gasoline sold in each state. The following data show the per capita (gallons used per person) consumption in the year 2000. Using appropriate graphical displays and summary statistics, write a report on the gasoline use by state in the year 2000.
State | Gallons per Person | State | Gallons per Person
Alabama | 544.71 | Montana | 548.5
Alaska | 433.08 | Nebraska | 508.28
Arizona | 452.82 | Nevada | 446.17
Arkansas | 532.82 | New Hampshire | 542.86
California | 422.65 | New Jersey | 474.28
Colorado | 461.90 | New Mexico | 474.28
Connecticut | 431.04 | New York | 551.18
Delaware | 481.45 | North Carolina | 296.66
Florida | 542.36 | North Dakota | 513.3
Georgia | 452.82 | Ohio | 574.83
Hawaii | 327.27 | Oklahoma | 457.63
Idaho | 500.34 | Oregon | 520.42
Illinois | 406.66 | Pennsylvania | 441.44
Indiana | 518.7 | Rhode Island | 410.31
Iowa | 534.7 | South Carolina | 381.86
Kansas | 511.34 | South Dakota | 555.06
Kentucky | 510.9 | Tennessee | 586.58
Louisiana | 522.12 | Texas | 515.17
Maine | 542.36 | Utah | 498.66
Maryland | 542.82 | Vermont | 456.27
Massachusetts | 438.1 | Virginia | 584.03
Michigan | 502.77 | Washington | 506.92
Minnesota | 528.06 | West Virginia | 450.4
Mississippi | 559.29 | Wisconsin | 462
Missouri | 563.56 | Wyoming | 462.67
8. The Gallup Poll conducted a representative telephone survey during the first quarter of 1999. Among their reported results was the following table concerning the preferred political party affiliation of respondents and their ages.
Age | Rep. | Dem. | Ind. | Total
18-29 | 241 | 351 | 409 | 1001
30-49 | 299 | 330 | 370 | 999
50-64 | 282 | 341 | 375 | 998
65+ | 279 | 382 | 343 | 1004
Total | 1101 | 1404 | 1497 | 4002
a. What percent of people surveyed were Republicans?
b. Do you think this might be a reasonable estimate of the percentage of all voters who are Republicans? Explain.
c. What percent of people surveyed were under 30 or over 65?
d. What percent of people were Independents under the age of 30?
e. What percent of Independents were under 30?
f. What percent of people under 30 were Independents?
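The key idea in parts d, e, and f is that the same cell count (409 Independents aged 18-29) is divided by a different total each time: the grand total, the Independent column total, or the 18-29 row total. A minimal sketch of the arithmetic follows; it reads "over 65" in part c as the 65+ row, which is an assumption about the table's grouping.

```python
# Counts from the Gallup table: rows are age groups, columns are parties.
counts = {
    "18-29": {"Rep": 241, "Dem": 351, "Ind": 409},
    "30-49": {"Rep": 299, "Dem": 330, "Ind": 370},
    "50-64": {"Rep": 282, "Dem": 341, "Ind": 375},
    "65+":   {"Rep": 279, "Dem": 382, "Ind": 343},
}

grand_total = sum(sum(row.values()) for row in counts.values())
party_total = {p: sum(row[p] for row in counts.values()) for p in ("Rep", "Dem", "Ind")}
age_total = {age: sum(row.values()) for age, row in counts.items()}


def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


a = pct(party_total["Rep"], grand_total)                      # marginal percentage
c = pct(age_total["18-29"] + age_total["65+"], grand_total)   # two age groups combined
d = pct(counts["18-29"]["Ind"], grand_total)                  # joint percentage
e = pct(counts["18-29"]["Ind"], party_total["Ind"])           # conditional on Independent
f = pct(counts["18-29"]["Ind"], age_total["18-29"])           # conditional on age 18-29
print(f"a {a:.1f}%  c {c:.1f}%  d {d:.1f}%  e {e:.1f}%  f {f:.1f}%")
```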
9. The following table gives the number of home runs hit during the 2002 season by all of the baseball teams in the American League.
Team | Home Runs | Team | Home Runs | Team | Home Runs
Anaheim | 152 | Texas | 230 | Tampa Bay | 133
Boston | 177 | Chicago | 217 | Cleveland | 192
New York | 223 | Toronto | 187 | Detroit | 124
Seattle | 152 | Oakland | 205 | Baltimore | 165
Minnesota | 167 | Kansas City | 140
a. Find the mean and median.
b. Find the standard deviation.
10. The following table shows the number of stolen bases (SB) by each of the 16 National League baseball teams during the 2002 season.
Team | SB | Team | SB | Team | SB
Colorado | 103 | Montreal | 118 | Cincinnati | 116
St. Louis | 86 | Florida | 177 | Milwaukee | 94
Arizona | 92 | Atlanta | 76 | San Diego | 71
San Francisco | 74 | Philadelphia | 104 | Chicago | 63
Los Angeles | 96 | New York | 87 | Pittsburgh | 86
Houston | 71
a. Find the mean and median.
b. Find the standard deviation.
11. According to a survey by Food Processing, 85% of Americans say they eat home-cooked meals three or more times per week (Time, October 7, 2002). Suppose that this result is true for the current population of Americans.
a. Let X be a binomial random variable that denotes the number of Americans in a random sample of 12 who say they eat home-cooked meals three or more times per week. What are the possible values that X can assume?
12. During hard economic times, many people switch between brands while shopping rather than sticking to one brand. According to an Insight Express online survey of shoppers, only 24% faithfully buy a favorite cereal (CBS.MarketWatch.com, October 1, 2002). Assume that this percentage is true for the current population of all shoppers.
a. Let X be a binomial random variable that denotes the number who faithfully buy the same cereal in a random sample of 10 shoppers. What are the possible values that X can assume?
b. Find the probability that in a random sample of 10 shoppers, exactly 4 faithfully buy the same cereal.
13. In a poll of 12- to 18-year-old females conducted by Harris Interactive for the Gillette Company, 40% of the young females said that they expected the United States to have a female president within 10 years (USA Today, October 1, 2002). Assume that this result is true for the current population of all 12- to 18-year-old females. Suppose a random sample of 16 females from this age group is selected. Find the probability that the number of young females in this sample who expect a female president within 10 years is
a. at least 9 b. at most 5 c. between 6 and 9
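The binomial problems above (12 and 13) all reduce to the formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), summed over the required values of k. A minimal sketch using only the standard library (`math.comb` needs Python 3.8 or later); "between 6 and 9" is read here as inclusive of both endpoints, which is an assumption worth stating to students.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_range(lo, hi, n, p):
    """P(lo <= X <= hi), summing the pmf over the inclusive range."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


# Problem 12(b): n = 10 shoppers, p = 0.24, exactly 4.
p12 = binom_pmf(4, 10, 0.24)

# Problem 13: n = 16 young females, p = 0.40.
n, p = 16, 0.40
at_least_9 = binom_range(9, n, n, p)
at_most_5 = binom_range(0, 5, n, p)
six_to_nine = binom_range(6, 9, n, p)  # inclusive reading of "between 6 and 9"

print(f"{p12:.4f} {at_least_9:.4f} {at_most_5:.4f} {six_to_nine:.4f}")
```

The same two functions answer problem 14 by changing n and p.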
14. According to a 2001 study of college students conducted by Harvard University’s School of Public Health, 34.9% of the male students surveyed said they got drunk three or more times in the past 30 days (USA Today, April 3, 2002). Assuming that this result holds true for all male college students, find the probability that in a random sample of 10 male college students, the number of students who got drunk three or more times in the past 30 days is
a. exactly 4 b. none c. exactly 8
15. The U.S. Bureau of Labor Statistics conducts periodic surveys to collect information on the labor market. According to one such survey, the average earnings of workers in retail trade were $10 per hour in August 2002 (Bureau of Labor Statistics News, September 18, 2002). Assume that the hourly earnings of such workers in August 2002 had a normal distribution with a mean of $10 and a standard deviation of $1.10. Find the probability that the hourly earnings of a randomly selected retail trade worker in August 2002 were
a. more than $12 b. between $8.50 and $10.80
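Normal-distribution probabilities like these can be checked with `statistics.NormalDist` (Python 3.8 or later) instead of a z table. A minimal sketch for problem 15; the same pattern applies to problem 16 with a mean of 2084 and a standard deviation of 300.

```python
from statistics import NormalDist

# Hourly earnings of retail trade workers: mean $10, standard deviation $1.10.
earnings = NormalDist(mu=10, sigma=1.10)

more_than_12 = 1 - earnings.cdf(12)                     # upper tail
between = earnings.cdf(10.80) - earnings.cdf(8.50)      # area between two values
print(f"P(X > 12) = {more_than_12:.4f}")
print(f"P(8.50 < X < 10.80) = {between:.4f}")
```

Table answers may differ slightly in the last digit because z values are usually rounded to two decimals before looking them up.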
16. According to a survey by the Kaiser Family Foundation, employers paid an average of $7954 per employee in annual premiums to provide family health coverage for their employees, and each worker paid an average of $2084 toward these premiums (USA Today, September 6, 2002). Assume that the current annual premiums paid by all workers for family health coverage are normally distributed with a mean of $2084 and a standard deviation of $300.
a. Find the probability that a randomly selected worker pays more than $2500 per year toward the family health coverage premium.
b. What percentage of such workers are paying between $1800 and $2400 per year toward such premiums?
17. In a Visa USA poll of Americans, participants were asked which of the following had taught them the most about money management: school or mistakes. Sixty-four percent of the persons polled said that mistakes had taught them the most (USA Today, May 16, 2002). Assume that this result is true for the current population of all Americans. Find the probability that in a random sample of 400 Americans, the number who will say that mistakes have taught them the most about money management is
a. exactly 250 b. 260 to 272 c. at most 244
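Problem 17 is a binomial problem with n = 400, so the normal approximation (with a continuity correction of 0.5) is the usual approach; np = 256 and n(1 - p) = 144 are both large. The sketch below applies that approximation and, for part a only, compares it with the exact binomial value. It assumes Python 3.8 or later for `math.comb` and `statistics.NormalDist`.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.64
mu = n * p                      # mean of X: 256
sigma = sqrt(n * p * (1 - p))   # standard deviation of X: 9.6
approx = NormalDist(mu, sigma)


def normal_range(lo, hi):
    """Approximate P(lo <= X <= hi) for integer X using a continuity correction."""
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)


part_a = normal_range(250, 250)      # exactly 250
part_b = normal_range(260, 272)      # 260 to 272
part_c = approx.cdf(244 + 0.5)       # at most 244

# Exact binomial probability for part a, to show how close the approximation is.
exact_a = comb(n, 250) * p**250 * (1 - p) ** 150
print(f"a {part_a:.4f} (exact {exact_a:.4f})  b {part_b:.4f}  c {part_c:.4f}")
```

Problem 18 uses the same code with n = 300 and p = 0.27.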
18. According to a survey by Money magazine, 27% of women expect to support their parents financially (USA Today, June 19, 2002). Assume that this percentage holds true for the current population of all women. Suppose that a random sample of 300 women is taken.
a. Find the probability that exactly 79 of the women in this sample expect to support their parents financially.
b. Find the probability that at most 74 of the women in this sample expect to support their parents financially.
c. What is the probability that between 75 and 89 of the women in this sample expect to support their parents financially?
19. According to International Communications Research for Cingular Wireless, men talk an average of 594 minutes per month on their cell phones (USA Today, July 29, 2002). Assume that currently the mean time spent per month talking on cell phones by all men who own cell phones is 594 minutes with a standard deviation of 160 minutes. Let x̄ be the mean time spent per month talking on their cell phones by a random sample of 400 men who own cell phones. Find the mean and standard deviation of x̄.
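As a worked check for problem 19, assuming the sample is less than 5% of the population (so no finite population correction is needed), the mean and standard deviation of the sampling distribution of x̄ are:

```latex
\mu_{\bar{x}} = \mu = 594 \text{ minutes}, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{160}{\sqrt{400}} = \frac{160}{20} = 8 \text{ minutes}
```

Because n = 400 is large, the central limit theorem also makes the sampling distribution of x̄ approximately normal, whatever the shape of the population distribution.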
20. According to the U.S. Bureau of Labor Statistics estimates, the average earnings of construction workers were $18.96 per hour in August 2002 (Bureau of Labor Statistics News, September 18, 2002). Assume that the current earnings of all construction workers are normally distributed with a mean of $18.96 per hour and a standard deviation of $3.60 per hour. Find the probability that the mean hourly earnings of a random sample of 25 construction workers are
a. between $18 and $20 per hour
b. within $1 of the population mean
c. greater than the population mean by $1.50 or more
21. According to CardWeb.com, the average credit card debt per household was $8367 in 2001 (USA Today, April 29, 2002). Assume that the probability distribution of all such current debts is skewed to the right with a mean of $8367 and a standard deviation of $2400. Find the probability that the mean of a random sample of 225 such debts is
a. between $8100 and $8500
b. within $200 of the population mean
c. greater than the population mean by $300 or more
22. A Maritz poll of adult drivers conducted in July 2002 found that 45% of them “often” or “sometimes” eat or drink while driving (USA Today, October 23, 2002). Assume that 45% of all current adult drivers “often” or “sometimes” eat or drink while driving. Let p̂ be the proportion of adult drivers in a sample of 400 who behave this way. Find the mean and standard deviation of p̂ and describe the shape of its sampling distribution.
23. In a 2002 USA TODAY-CNN-Gallup poll, 37% of taxpayers said that the income tax they had to pay was not fair (USA Today, April 15, 2002). Assume that this percentage is true for the current population of all taxpayers. Let p̂ be the proportion of taxpayers in a random sample of 300 who will say that the income tax they have to pay is not fair. Calculate the mean and standard deviation of p̂ and comment on the shape of its sampling distribution.
24. In a Retirement Confidence survey of retired people, 51% said that retirement is better than they had expected (U.S. News & World Report, June 3, 2002). Assume that this percentage is true for the current population of all retirees. Let p̂ be the proportion of retirees in a random sample of 225 who hold this opinion. Calculate the mean and standard deviation of p̂ and describe the shape of its sampling distribution.
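Problems 22 through 24 use the same three facts about the sampling distribution of p̂: its mean is p, its standard deviation is the square root of p(1 - p)/n, and it is approximately normal when np and n(1 - p) are both large. The sketch below assumes the "greater than 5" rule used in many textbooks (some texts require 10) and assumes each sample is a small fraction of its population.

```python
from math import sqrt


def phat_sampling_distribution(p, n):
    """Return the mean, standard deviation, and a normal-approximation check for p-hat."""
    mean = p
    sd = sqrt(p * (1 - p) / n)
    # Assumed rule of thumb: np > 5 and n(1 - p) > 5 (some texts use 10).
    approx_normal = n * p > 5 and n * (1 - p) > 5
    return mean, sd, approx_normal


for label, p, n in [("Problem 22", 0.45, 400), ("Problem 23", 0.37, 300),
                    ("Problem 24", 0.51, 225)]:
    mean, sd, ok = phat_sampling_distribution(p, n)
    print(f"{label}: mean = {mean:.2f}, sd = {sd:.4f}, approximately normal: {ok}")
```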
25. According to a 2002 survey by America Online, mothers with children under 18 years of age spent an average of 16.87 hours per week online (USA Today, May 7, 2002). Assume that the mean time spent online by all current mothers with children under 18 years of age is 16.87 hours per week with a standard deviation of 5 hours per week. Find the probability that the mean time spent online per week by a random sample of 100 such mothers is
a. greater than 17 hours
b. between 16.5 and 17.5 hours
c. within .75 hour of the population mean
d. less than the population mean by .75 hour or more
26. Due to sluggish economic conditions, the percentage of companies that host holiday parties for their employees has declined. According to a survey by Hewitt Associates, 64% of the companies hosted holiday parties in 2002 (USA Today, December 2, 2002). Assume that this result is based on a random sample of 400 U.S. companies.
a. Find a 98% confidence interval for the proportion of all U.S. companies that hosted holiday parties in 2002.
b. Explain why we need to make the confidence interval. Why can we not say that 64% of all U.S. companies hosted parties in 2002?
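The interval in part a is the large-sample z interval p̂ ± z* times the square root of p̂(1 - p̂)/n. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`, which supplies z* directly (about 2.33 for 98% confidence) instead of a table lookup.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(phat, n, confidence):
    """Large-sample z confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    margin = z * sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin, margin


lo, hi, me = proportion_ci(0.64, 400, 0.98)
print(f"98% CI: ({lo:.4f}, {hi:.4f}), margin of error {me:.4f}")
```

Here np̂ = 256 and n(1 - p̂) = 144, so the large-sample conditions are met. The same function gives the margins of error asked for in problem 27.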
27. A May 2002 Gallup Poll found that only 8% of a random sample of 1012 adults approved of attempts to clone a human.
a. Find the margin of error for this poll if we want 95% confidence in our estimate of the percent of American adults who approve of cloning humans.
b. Explain what that margin of error means.
c. If we only need to be 90% confident, will the margin of error be larger or smaller? Explain.
d. Find that margin of error.
e. In general, if all other aspects of the situation remain the same, would smaller samples produce smaller or larger margins of error?
28. In the 1992 U.S. presidential election, Bill Clinton received 43% of the vote, compared with 38% for George Bush and 19% for Ross Perot. Suppose we had taken a random sample of 100 voters in an exit poll and asked them for whom they had voted.
a. Would you always get 43 votes for Clinton, 38 for Bush, and 19 for Perot in a sample of 100? Why or why not?
b. In 95% of such polls, our sample proportion of voters for Clinton should be between what two values?
c. In 95% of such polls, the sample proportion of Perot votes should be between what two numbers?
d. Would you expect the sample proportion of Perot votes to vary more, less, or about the same as the sample proportion of Bush votes? Why?
29. In May of 2000, the Pew Research Foundation sampled 1593 respondents and asked how they obtain news. In Pew’s report, 33% of respondents say that they now obtain news from the Internet at least once a week.
a. Pew reports a margin of error of 3% for this result. Explain what the margin of error means.
b. Pew also asked about investment information, and 21% of respondents reported that the Internet is their main source of this information. When limited to the 780 respondents who identified themselves as investors, the percent who rely on the Internet rose to 28%. How would you expect the margin of error for this statistic to change in comparison with the margin of error for the percentage of all respondents?
c. When restricted to the 239 active traders in the sample, Pew reports that 45% rely on the Internet for investment information. Find a confidence interval for this statistic.
d. How does the margin of error for your confidence interval compare with the values in parts a and b? Explain why.
30. In May of 2002, the Gallup Organization asked a random sample of 537 American adults this question:
If you could choose between the following two approaches, which do you think is the better penalty for murder, the death penalty or life imprisonment, with absolutely no possibility of parole?
Of those polled, 52% chose the death penalty.
a. Find a 95% confidence interval for the percentage of all American adults who favor the death penalty.
b. Based on your confidence interval, is it clear that the death penalty has majority support? Explain.
c. If pollsters wanted to follow up on this poll with another survey that could determine the level of support for the death penalty to within 2% with 98% confidence, how many people should they poll?
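Parts a and c of problem 30 can be sketched as follows. Part c solves the margin-of-error formula for n, giving n = z*² p(1 - p)/E², rounded up. Using the current estimate 0.52 as the planning value is an assumption; 0.5 is the conservative choice that gives the largest n. Requires Python 3.8 or later.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_star(confidence):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


# Part a: 95% interval for a sample proportion of 0.52 with n = 537.
phat, n = 0.52, 537
margin = z_star(0.95) * sqrt(phat * (1 - phat) / n)
lo, hi = phat - margin, phat + margin


def sample_size(margin_of_error, confidence, p_guess):
    """Smallest n whose margin of error is at most margin_of_error."""
    z = z_star(confidence)
    return ceil(z**2 * p_guess * (1 - p_guess) / margin_of_error**2)


# Part c: within 2% with 98% confidence.
n_needed = sample_size(0.02, 0.98, phat)
n_conservative = sample_size(0.02, 0.98, 0.5)
print(f"95% CI: ({lo:.3f}, {hi:.3f}); n = {n_needed} (or {n_conservative} using 0.5)")
```

Since the interval from part a includes values below 0.50, it does not by itself establish majority support, which is the point of part b.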
31. During the 2000 season, the home team won 138 of the 240 regular season National Football League games. Is this strong evidence of a home field advantage in professional football? Test an appropriate hypothesis and state your conclusion. Be sure the appropriate assumptions and conditions are satisfied before you proceed.
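A one-proportion z test fits problem 31, with H0: p = 0.5 (no home-field advantage) against Ha: p > 0.5. The sketch below computes the standard error under H0, as the test requires. Treating the 240 games as independent trials is an assumption that students should discuss when checking conditions; np0 = n(1 - p0) = 120 satisfies the success/failure condition.

```python
from math import sqrt
from statistics import NormalDist

wins, games = 138, 240
p0 = 0.5                           # H0: no home-field advantage
phat = wins / games                # 0.575
se = sqrt(p0 * (1 - p0) / games)   # standard error computed under H0
z = (phat - p0) / se
p_value = 1 - NormalDist().cdf(z)  # one-sided: Ha is p > 0.5
print(f"z = {z:.2f}, one-sided P-value = {p_value:.4f}")
```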
32. In 2000, 19 million registered voters failed to vote in the presidential election. According to the U.S. Census Bureau’s Current Population Survey in November 2000, the most frequently given reason for not voting was “too busy,” cited by 20.9% of the respondents (USA Today, April 15, 2002). Suppose that a random sample of 250 registered voters who did not vote in the November 2002 midterm elections showed that 18.1% of them stated the main reason for not voting was that they were too busy. At the 5% level of significance, can you conclude that the percentage of the registered voters who did not vote in November 2002 because they were too busy was less than 20.9%?
33. According to a 2002 survey conducted by Harris Interactive for lawyers.com, 47% of Americans dream of owning a business (USA Today, September 10, 2002). Assume that this result was true for the population of Americans in 2002. A recent random sample of 1000 Americans found that 430 of them dream of owning a business. Test at the 5% significance level whether the current percentage of Americans who dream of owning a business is different from 47%.
34. In a 2002 Affluent Americans and Their Money survey of “affluent” Americans (having an annual household income of $75,000 or more) conducted for Money magazine by RoperASW, 32% of the respondents indicated that they would have a serious problem paying an unexpected bill of $5000 (Money, Fall 2002). In a recent random sample of 1100 households with annual income of $75,000 or more, 396 said that they would have a serious problem paying an unexpected bill of $5000.
a. Using the 2.5% significance level, can you conclude that the current percentage of such households that would have a serious problem paying an unexpected bill of $5000 is higher than 32%?
b. What will your decision be in part (a) if the probability of making a Type I error is zero? Explain.
35. A Vermont study published in December 2001 by the American Academy of Pediatrics examined parental influence on teenagers’ decisions to smoke. A group of students who had never smoked were questioned about their parents’ attitudes toward smoking. These students were questioned again two years later to see if they had started smoking. The researchers found that among the 284 students who indicated that their parents disapproved of kids smoking, 54 had become established smokers. Among the 41 students who initially said their parents were lenient about smoking, 11 became smokers. Do these data provide strong evidence that parental attitude influences teenagers’ decisions about smoking?
a. What kind of design did the researchers use?
b. Write appropriate hypotheses.
c. Are the assumptions and conditions necessary for inference satisfied?
d. Test the hypothesis and state your conclusion.
e. Explain in this context what your p-value means.
f. If that conclusion is actually wrong, which type of error did you commit?
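For part d, a two-proportion z test with a pooled proportion is one standard approach. This sketch assumes a two-sided alternative (parental attitude "influences" smoking in either direction); a one-sided test is also defensible if the hypothesis in part b says lenient parents lead to more smoking. The lenient group has 11 smokers and 30 nonsmokers, so the success/failure condition is met, if barely.

```python
from math import sqrt
from statistics import NormalDist

smokers_dis, n_dis = 54, 284   # parents disapproved of smoking
smokers_len, n_len = 11, 41    # parents lenient about smoking

p_dis = smokers_dis / n_dis
p_len = smokers_len / n_len
pooled = (smokers_dis + smokers_len) / (n_dis + n_len)   # pooled estimate under H0
se = sqrt(pooled * (1 - pooled) * (1 / n_dis + 1 / n_len))
z = (p_dis - p_len) / se
p_value = 2 * NormalDist().cdf(-abs(z))                  # two-sided alternative
print(f"{p_dis:.3f} vs {p_len:.3f}: z = {z:.2f}, P-value = {p_value:.3f}")
```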
36. The August 2001 issue of Pediatrics reported on a study of adolescent suicide attempts. Questionnaires were given to 6577 middle and high school students, 214 of whom were adopted. Of the 6577 students, 213 said they had attempted suicide within the last year: 16 of those who were adopted and 197 of those who were not. Does this indicate a significantly different rate of suicide attempts among adopted teens?
a. Test an appropriate hypothesis and state your conclusion.
b. If you concluded there was a difference, estimate that difference with a confidence interval and interpret your interval in context.
37. Among 242 Cleveland-area children born prematurely at very low birth weights between 1977 and 1979, only 74% graduated from high school. Among a comparison group of 233 children of normal birth weight, 83% were high school graduates. (New England Journal of Medicine, 346, no. 3 [2002])
a. Find a 95% confidence interval for the difference in graduation rates between children of normal and very low birth weights. Be sure to check the appropriate assumptions and conditions.
b. Does this provide evidence that premature birth may be a risk factor for not finishing high school? Use your confidence interval to test an appropriate hypothesis.
c. Suppose your conclusion is incorrect. Which type of error did you make?
38. According to Money magazine, the average net worth of U.S. households in 2002 was $355,000 (Money, Fall 2002). Assume that this mean is based on a random sample of 500 households and that the sample standard deviation is $125,000. Find a 99% confidence interval for the 2002 mean net worth of all U.S. households.
39. According to a 2002 survey by America Online, mothers with children under age 18 spent an average of 16.87 hours per week online (USA Today, May 7, 2002). Suppose that this mean is based on a random sample of 1000 such mothers and that the standard deviation for this sample is 3.2 hours per week. Find a 95% confidence interval for the corresponding population mean for all such mothers.
40. According to Money magazine, the average price of new homes in the United States was $145,000 in 2002 (Money, Fall 2002). Assume that this mean is based on a random sample of 1000 new home sales and that the sample standard deviation is $24,000. Find a 95% confidence interval for the 2002 mean price of all such homes.
41. According to Money magazine, the average cost of a movie ticket in the United States was $5.70 in 2002 (Money, Fall 2002). Suppose that a random sample of 25 theaters in the United States yielded a mean movie ticket price of $5.70 with a standard deviation of $1.05. Assuming that movie ticket prices are normally distributed, find a 95% confidence interval for the mean price of movie tickets for all theaters in the United States.
42. In 1998, as an advertising campaign, the Nabisco Company announced a “1000 Chips Challenge,” claiming that every 18-ounce bag of their Chips Ahoy cookies contained at least 1000 chocolate chips. The students in a Statistics class at the Air Force Academy purchased some randomly selected bags of cookies, and counted the chocolate chips. Some of their data are given below. (Chance, 12, no. 1, 1999)
1219 1214 1087 1200 1419 1121 1325 1345
1244 1258 1356 1132 1191 1270 1295 1135
a. Find a 95% confidence interval for the average number of chips in bags of Chips Ahoy cookies.
b. What does this evidence say about Nabisco’s claim? Use your confidence interval to test an appropriate hypothesis and state your conclusion.
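With only 16 bags, part a calls for a t interval. Python's standard library has no t distribution, so this sketch hard-codes the table value t* = 2.131 for 95% confidence with 15 degrees of freedom; with SciPy available, `scipy.stats.t.ppf(0.975, 15)` gives the same value. It also assumes the bags are a random sample and that chip counts are roughly normal.

```python
import statistics
from math import sqrt

chips = [1219, 1214, 1087, 1200, 1419, 1121, 1325, 1345,
         1244, 1258, 1356, 1132, 1191, 1270, 1295, 1135]

n = len(chips)
mean = statistics.mean(chips)
s = statistics.stdev(chips)      # sample standard deviation
t_star = 2.131                   # t table value: 95% confidence, df = n - 1 = 15
margin = t_star * s / sqrt(n)
lo, hi = mean - margin, mean + margin
print(f"mean = {mean:.1f}, s = {s:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

The whole interval lies well above 1000, which is the comparison part b asks about.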
43. According to the U.S. Bureau of Labor Statistics, there were 8.1 million unemployed people aged 16 years and over in August 2002. The average duration of unemployment for these people was 16.3 weeks (Bureau of Labor Statistics News, September 6, 2002). Suppose that a recent random sample of 400 unemployed Americans aged 16 years and over gave a mean duration of unemployment of 16.9 weeks with a standard deviation of 4.2 weeks. Find the p-value for the hypothesis test with the alternative hypothesis that the current mean duration of unemployment for all unemployed Americans aged 16 years and over exceeds 16.3 weeks. Will you reject the null hypothesis at α = .02?
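With n = 400, many introductory texts use a z statistic with the sample standard deviation in place of σ; a t test with 399 degrees of freedom gives nearly the same p-value. A minimal sketch of the right-tailed test (Python 3.8 or later):

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, s, n = 16.3, 16.9, 4.2, 400
se = s / sqrt(n)                    # 0.21 weeks
z = (xbar - mu0) / se
p_value = 1 - NormalDist().cdf(z)   # right-tailed: Ha is mu > 16.3
alpha = 0.02
reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject}")
```

Problems 44 and 45 follow the same pattern; for the two-sided alternative in problem 44, double the one-tail area.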
44. According to an estimate, Americans spend an average of $226 per year to “look good,” buying personal-care products and services such as tooth whitener, hair dyes, and sessions at salons (Reader’s Digest, September 2002). Suppose that a recent random sample of 250 Americans showed that they spent an average of $238 on such products and services this year with a standard deviation of $77.
a. Find the p-value for the test of hypothesis with the alternative hypothesis that the current mean annual amount spent on such products and services differs from $226.
b. If α = .01, would you reject the null hypothesis based on the p-value calculated in part a? What if α = .02?
45. According to the International Communications Research for Cingular Wireless, women talked an average of 394 minutes per month on their cell phones in 2002 (USA Today, July 29, 2002). Suppose that a recent sample of 295 women who own cell phones showed that the mean time they spend per month talking on their cell phones is 402 minutes with a standard deviation of 81 minutes.
a. At the 2% level of significance, can you conclude that the mean time spent talking on their cell phones by all women who own cell phones is currently more than 394 minutes per month?
b. What is the Type I error in this case? What is the probability of making this error in part a?
46. According to Money magazine, the average cost of a visit to a doctor’s office in the United States was $60 in 2002 (Money, Fall 2002). Suppose that a recent random sample of 25 visits to doctors gave a mean of $63.50 and a standard deviation of $2.00. Using the 5% significance level, can you conclude that the current mean cost of a visit to a doctor’s office exceeds $60? Assume that such costs for all visits to doctors are normally distributed.
47. According to the U.S. Bureau of Labor Statistics, production workers in the mining industry worked an average of 43.5 hours per week in June 2002 (Bureau of Labor Statistics News, September 6, 2002). A random sample of 24 production workers selected recently from a large mining company, Low Yield Mine, found that they work an average of 41.7 hours per week with a standard deviation of 1.3 hours per week. Assume that the weekly working hours of all these employees are normally distributed.
a. Suppose that the probability of making a Type I error is selected to be zero. Can you conclude that workers at Low Yield Mine work less than 43.5 hours per week? Answer without performing the five steps of a test of hypothesis.
b. Using the 5% level of significance, can you conclude that workers at Low Yield Mine work less than 43.5 hours per week?
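For part a, a Type I error probability of zero means the significance level is zero: the rejection region is empty, so the null hypothesis can never be rejected, whatever the sample shows. Part b is a left-tailed one-sample t test, sketched below with the table value -1.714 (df = 23, left-tail area .05).

```python
from math import sqrt

# Problem 47(b): H0: mu = 43.5 vs H1: mu < 43.5 (left-tailed).
# Normal population, sigma unknown, n = 24 -> one-sample t test with df = 23.
mu0, xbar, s, n = 43.5, 41.7, 1.3, 24
t_crit = -1.714  # t value for a left-tail area of .05 with df = 23 (standard t table)

t = (xbar - mu0) / (s / sqrt(n))   # observed t statistic
reject = t < t_crit
print(f"t = {t:.3f} (df = {n - 1}), critical t = {t_crit}, reject H0: {reject}")
```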
48. In the 2002 National Geographic Society—RoperASW poll of geographic knowledge, young adults of 18 to 24 years of age from the United States and 8 other nations were asked to identify 11 countries on a numbered map of Asia (National Geographic, December 2002). The two highest-scoring nations were Germany, with an average of 6.7 correct identifications out of 11, and Sweden, with an average of 6.3 correct answers. Suppose that these means were based on random samples of 400 young adults from Germany and 600 from Sweden, and that the sample standard deviations of scores for Germany and Sweden were .7 and .8, respectively.
a. Let \(\mu_1\) and \(\mu_2\) be the population means for Germany and Sweden, respectively. Find the point estimate of \(\mu_1 - \mu_2\) and its margin of error.
b. Find a 98% confidence interval for \(\mu_1 - \mu_2\). Using the 5% level of significance, can you conclude that the mean score for all young adults from Germany is greater than that of all young adults from Sweden?
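A sketch of problem 48 in Python, treating both samples as large enough for z procedures with the sample standard deviations in place of the population values. The margin of error of the point estimate is taken as ±1.96 standard errors; that convention is an assumption here, so adjust it if your course defines the margin differently.

```python
from math import sqrt
from statistics import NormalDist

# Problem 48: large independent samples, so z procedures with sample SDs in place of sigmas.
x1, s1, n1 = 6.7, 0.7, 400   # Germany
x2, s2, n2 = 6.3, 0.8, 600   # Sweden

diff = x1 - x2                              # point estimate of mu1 - mu2
se = sqrt(s1**2 / n1 + s2**2 / n2)          # estimated standard error of the difference
margin_point = 1.96 * se                    # margin of error of the point estimate (assumed +/- 1.96 SE)
z98 = NormalDist().inv_cdf(0.99)            # 98% two-sided -> .01 in each tail
ci98 = (diff - z98 * se, diff + z98 * se)
z = diff / se                               # test statistic for H0: mu1 - mu2 = 0
reject = z > NormalDist().inv_cdf(0.95)     # right-tailed test at alpha = .05
print(f"estimate = {diff:.2f} +/- {margin_point:.4f}; 98% CI = ({ci98[0]:.3f}, {ci98[1]:.3f}); "
      f"z = {z:.2f}, reject H0: {reject}")
```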
49. According to Smith Travel Research, the mean hotel room rates in the United States were $85.69 and $84.58 per day in 2000 and 2001, respectively (USA Today, September 4, 2002). Suppose that these mean rates were based on random samples of 1000 hotel rooms for 2000 and 1100 hotel rooms for 2001; further assume that the standard deviations of these rates for the two samples were $18.50 and $18, respectively.
a. Let \(\mu_1\) and \(\mu_2\) be the mean rates for all hotel rooms in the United States in 2000 and 2001, respectively. What are the point estimate of \(\mu_1 - \mu_2\) and its margin of error?
b. Find a 90% confidence interval for \(\mu_1 - \mu_2\).
c. Test at the 1% significance level whether the mean hotel room rate for 2000 was higher than that for 2001.
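The same large-sample approach applies to problem 49. This sketch again assumes a ±1.96 standard error margin for the point estimate, builds the 90% interval from the normal quantile, and uses a right-tailed z test for part c.

```python
from math import sqrt
from statistics import NormalDist

# Problem 49: large independent samples; sample SDs replace the population SDs.
x1, s1, n1 = 85.69, 18.50, 1000   # 2000
x2, s2, n2 = 84.58, 18.00, 1100   # 2001

diff = x1 - x2                              # point estimate of mu1 - mu2
se = sqrt(s1**2 / n1 + s2**2 / n2)          # estimated standard error of the difference
margin_point = 1.96 * se                    # margin of error of the point estimate (assumed +/- 1.96 SE)
z90 = NormalDist().inv_cdf(0.95)            # 90% two-sided -> .05 in each tail
ci90 = (diff - z90 * se, diff + z90 * se)
z = diff / se                               # test statistic for H0: mu1 - mu2 = 0
reject = z > NormalDist().inv_cdf(0.99)     # right-tailed test at alpha = .01
print(f"estimate = {diff:.2f} +/- {margin_point:.2f}; 90% CI = ({ci90[0]:.2f}, {ci90[1]:.2f}); "
      f"z = {z:.2f}, reject H0: {reject}")
```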
50. In June 2002, the Journal of Applied Psychology reported on a study that examined whether the content of TV shows influenced the ability of viewers to recall brand names of items featured in the commercials. The researchers randomly assigned volunteers to watch one of three programs, each containing the same nine commercials. One of the programs had violent content, another sexual content, and the third neutral content. After the shows ended, the subjects were asked to recall the brands of products that were advertised. Results are summarized below.
Program content | Violent | Sexual | Neutral
No. of subjects | 108 | 108 | 108
Mean | 2.08 | 1.71 | 3.17
St. Dev | 1.87 | 1.76 | 1.77
a. Do the results indicate that viewer memory for ads may differ depending on the program content?
b. Is there evidence that viewer memory for ads may differ between programs with sexual content and those with neutral content?
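Only group summaries are given, so the one-way ANOVA F statistic for part a has to be assembled from the group sizes, means, and standard deviations. The Python sketch below does that and, for part b, computes an unpooled two-sample t statistic for neutral versus sexual content. Compare F with an F table value for df = (2, 321), about 3.0 at the 5% level, and t with a t table value; a separate t test after the ANOVA is one simple approach, not the only one.

```python
from math import sqrt

# Problem 50: one-way ANOVA computed from group summaries (n, mean, SD).
groups = {"Violent": (108, 2.08, 1.87), "Sexual": (108, 1.71, 1.76), "Neutral": (108, 3.17, 1.77)}

N = sum(n for n, _, _ in groups.values())
k = len(groups)
grand_mean = sum(n * m for n, m, _ in groups.values()) / N
ssb = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())   # between-group sum of squares
ssw = sum((n - 1) * s ** 2 for n, _, s in groups.values())            # within-group sum of squares
F = (ssb / (k - 1)) / (ssw / (N - k))
print(f"F = {F:.2f} with df = ({k - 1}, {N - k})")

# Part b: unpooled two-sample t statistic, Neutral vs Sexual.
n1, m1, s1 = groups["Neutral"]
n2, m2, s2 = groups["Sexual"]
t = (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)
print(f"t = {t:.2f}")
```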
51. In a full-page ad that ran in many U.S. newspapers in August 2002, a Canadian discount pharmacy listed costs of drugs that could be ordered from a Website in Canada. The table compares prices (in US$) for commonly prescribed drugs.
COST PER 100 PILLS
Drug | United States (US$) | Canada (US$) | Percent Savings
Cardizem | 131 | 83 | 37
Celebrex | 136 | 72 | 47
Cipro | 374 | 219 | 41
Pravachol | 370 | 166 | 55
Premarin | 61 | 17 | 72
Prevacid | 252 | 214 | 15
Prozac | 263 | 112 | 57
Tamoxifen | 349 | 50 | 86
Vioxx | 243 | 134 | 45
Zantac | 166 | 42 | 75
Zocor | 365 | 200 | 45
Zoloft | 216 | 105 | 51
a. Find a 95% confidence interval for the average savings in dollars.
b. Find a 95% confidence interval for the average savings in percent.
c. Which analysis do you think is more appropriate? Why?
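Because each drug is priced in both countries, the savings form paired data, and a one-sample t interval on the differences applies. A Python sketch: dollar savings are computed from the two price columns, the printed (rounded) percent savings are used as given, and the critical value 2.201 (df = 11) comes from a t table. The helper name paired_t_interval is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Problem 51: prices per 100 pills (US$) for the same 12 drugs, in table order.
us     = [131, 136, 374, 370, 61, 252, 263, 349, 243, 166, 365, 216]
canada = [83, 72, 219, 166, 17, 214, 112, 50, 134, 42, 200, 105]
pct    = [37, 47, 41, 55, 72, 15, 57, 86, 45, 75, 45, 51]   # printed percent savings
t_crit = 2.201  # t value for a 95% interval with df = 11 (standard t table)

def paired_t_interval(diffs, t_crit):
    """Return (mean, (lower, upper)) for a t interval on paired differences."""
    d_bar = mean(diffs)
    half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
    return d_bar, (d_bar - half_width, d_bar + half_width)

dollars = [u - c for u, c in zip(us, canada)]      # savings in dollars per 100 pills
m_d, ci_d = paired_t_interval(dollars, t_crit)      # part a
m_p, ci_p = paired_t_interval(pct, t_crit)          # part b
print(f"dollars: mean {m_d:.2f}, 95% CI ({ci_d[0]:.2f}, {ci_d[1]:.2f})")
print(f"percent: mean {m_p:.2f}, 95% CI ({ci_p[0]:.2f}, {ci_p[1]:.2f})")
```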
52. Ever since Lou Gehrig developed amyotrophic lateral sclerosis (ALS), this deadly condition has been commonly known as Lou Gehrig’s disease. Some believe that ALS is more likely to strike athletes or the very fit. Columbia University neurologist Lewis P. Rowland recorded personal histories of 431 patients he examined between 1992 and 2002. He diagnosed 280 as having ALS; 38% of them had been varsity athletes. The other 151 had other neurological disorders, and only 26% of them had been varsity athletes (Science News, Sept. 28, 2002). Is there evidence that ALS is more common among athletes?
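The question compares two independent proportions, so a right-tailed two-proportion z test is one reasonable approach. The sketch below works directly from the reported percentages (38% of 280 and 26% of 151); the exact counts are not given, so using the rounded percentages is an assumption. The comparison group is patients with other neurological disorders, not the general population.

```python
from math import sqrt
from statistics import NormalDist

# Problem 52: p1 = proportion of former varsity athletes among ALS patients,
# p2 = the same proportion among patients with other neurological disorders.
# H0: p1 = p2 vs H1: p1 > p2.
n1, p1_hat = 280, 0.38
n2, p2_hat = 151, 0.26

pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)          # pooled estimate under H0
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))      # standard error under H0
z = (p1_hat - p2_hat) / se
p_value = 1 - NormalDist().cdf(z)                         # right-tailed p-value
print(f"pooled p = {pooled:.4f}, z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```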
53. A study of the health behavior of school-aged children asked a sample of 15-year-olds in several different countries if they had been drunk at least twice. The results are shown in the table, by gender. Find a 95% confidence interval for the difference in the rates for males and females. Be sure to check the assumptions that support your chosen procedure, and explain what your interval means. (Health and Health Behavior Among Young People. Copenhagen: World Health Organization, 2000)
Percent of 15-Year-Olds Drunk at Least Twice
Country | Female | Male
Denmark | 63 | 71
Wales | 63 | 72
Greenland | 59 | 58
England | 62 | 51
Finland | 58 | 52
Scotland | 56 | 53
No. Ireland | 44 | 53
Slovakia | 31 | 49
Austria | 36 | 49
Canada | 42 | 42
Sweden | 40 | 40
Norway | 41 | 37
Ireland | 29 | 42
Germany | 31 | 36
Latvia | 23 | 47
Estonia | 23 | 44
Hungary | 22 | 43
Poland | 21 | 39
USA | 29 | 34
Czech Rep. | 22 | 36
Belgium | 22 | 36
Russia | 25 | 32
Lithuania | 20 | 32
France | 20 | 29
Greece | 21 | 24
Switzerland | 16 | 25
Israel | 10 | 18
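Each country contributes both a female and a male rate, so one reasonable procedure is a paired t interval on the male-minus-female differences. A Python sketch, with the critical value 2.056 (df = 26) taken from a t table and countries listed in table order. When checking assumptions, note that these countries were not chosen at random and that each rate is itself a survey estimate.

```python
from math import sqrt
from statistics import mean, stdev

# Problem 53: percent drunk at least twice, Denmark first and Israel last, as in the table.
female = [63, 63, 59, 62, 58, 56, 44, 31, 36, 42, 40, 41, 29, 31,
          23, 23, 22, 21, 29, 22, 22, 25, 20, 20, 21, 16, 10]
male   = [71, 72, 58, 51, 52, 53, 53, 49, 49, 42, 40, 37, 42, 36,
          47, 44, 43, 39, 34, 36, 36, 32, 32, 29, 24, 25, 18]
t_crit = 2.056  # t value for a 95% interval with df = 26 (standard t table)

diffs = [m - f for m, f in zip(male, female)]   # male minus female, in percentage points
d_bar = mean(diffs)
half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
ci = (d_bar - half_width, d_bar + half_width)
print(f"mean difference = {d_bar:.2f} points, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```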
54. In March 2002, Consumer Reports listed the rate of return for several large cap mutual funds over the previous 3-year and 5-year periods. (“Large cap” refers to companies worth over $10 billion.)
Annualized Returns (%)
Fund Name | 3-year | 5-year
Ameristock | 7.9 | 17.1
Clipper | 14.1 | 18.2
Credit Suisse Strategic Value | 5.5 | 11.5
Dodge & Cox Stock | 15.2 | 15.7
Excelsior Value | 13.1 | 16.4
Harbor Large Cap Value | 6.3 | 11.5
ICAP Discretionary Equity | 6.6 | 11.4
ICAP Equity | 7.6 | 12.4
Neuberger Berman Focus | 9.8 | 13.2
PBHG Large Cap Value | 10.7 | 18.1
Pelican | 7.7 | 12.1
Price Equity Income | 6.1 | 10.9
USAA Cornerstone Strategy | 2.5 | 4.9
Vanguard Equity Income | 3.5 | 11.3
Vanguard Windsor | 11.0 | 11.0
a. Find a 95% confidence interval for the difference in rate of return for the 3- and 5-year periods covered by these data. Clearly explain what your interval means.
b. It’s common for advertisements to carry the disclaimer that “past returns may not be indicative of future performance,” but do these data indicate that there was an association between 3-year and 5-year rates of return?
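Both returns come from the same fund, so part a calls for a paired t interval on the differences, while part b asks about association, which the correlation coefficient summarizes. A Python sketch computing both, with the table value 2.145 (df = 14) as the critical t; funds are entered in table order.

```python
from math import sqrt
from statistics import mean, stdev

# Problem 54: 3-year and 5-year annualized returns (%) for the same 15 funds, in table order.
r3 = [7.9, 14.1, 5.5, 15.2, 13.1, 6.3, 6.6, 7.6, 9.8, 10.7, 7.7, 6.1, 2.5, 3.5, 11.0]
r5 = [17.1, 18.2, 11.5, 15.7, 16.4, 11.5, 11.4, 12.4, 13.2, 18.1, 12.1, 10.9, 4.9, 11.3, 11.0]
t_crit = 2.145  # t value for a 95% interval with df = 14 (standard t table)

# (a) Paired t interval for the mean of (5-year minus 3-year) returns.
diffs = [b - a for a, b in zip(r3, r5)]
d_bar = mean(diffs)
half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
ci = (d_bar - half_width, d_bar + half_width)

# (b) Pearson correlation between the two sets of returns.
m3, m5 = mean(r3), mean(r5)
sxy = sum((a - m3) * (b - m5) for a, b in zip(r3, r5))
sxx = sum((a - m3) ** 2 for a in r3)
syy = sum((b - m5) ** 2 for b in r5)
r = sxy / sqrt(sxx * syy)

print(f"mean difference = {d_bar:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), r = {r:.3f}")
```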
55. In 2000, the Journal of the American Medical Association published a study that examined a sample of pregnancies that resulted in the birth of twins. Births were classified as preterm with intervention (induced labor or cesarean), preterm without such procedures, or term or postterm. Researchers also classified the pregnancies by the level of prenatal medical care the mother received (inadequate, adequate, or intensive). The data, from the years 1995-97, are summarized in the table below. Figures are in thousands of births. (JAMA 284, 2000)
TWIN BIRTHS, 1995-1997 (IN THOUSANDS)
Prenatal care | Preterm (induced or cesarean) | Preterm (without procedures) | Term or postterm | Total
Intensive | 18 | 15 | 28 | 61
Adequate | 46 | 43 | 65 | 154
Inadequate | 12 | 13 | 38 | 63
Total | 76 | 71 | 131 | 278
Is there evidence of an association between the duration of the pregnancy and the level of care received by the mother?
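A chi-square test of independence fits this question. The sketch below, with an illustrative helper named chi_square, computes the statistic twice: once from the table entries as printed and once after converting them to actual births. Because the chi-square statistic scales with the counts, multiplying every cell by 1,000 multiplies the statistic by 1,000, so which version you test matters a great deal. The 5% critical value for df = 4 is 9.488.

```python
# Problem 55: chi-square test of independence for the twin-birth table.
# Rows: intensive, adequate, inadequate care; columns: preterm (induced/cesarean),
# preterm (no procedures), term or postterm. Entries are in thousands, as printed.
observed = [[18, 15, 28],
            [46, 43, 65],
            [12, 13, 38]]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for row, rt in zip(table, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / n          # expected count under independence
            stat += (obs - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)

stat_thousands, df = chi_square(observed)
stat_births, _ = chi_square([[1000 * x for x in row] for row in observed])
print(f"df = {df}; chi-square using thousands = {stat_thousands:.2f}, using births = {stat_births:.0f}")
```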
56. The Gallup Poll conducted a representative telephone survey during the first quarter of 1999. Among the reported results was the following table concerning the preferred political party affiliation of respondents and their ages. Is there evidence of age-based differences in party affiliation in the United States?
Age | Republican | Democratic | Independent | Total
18-29 | 241 | 351 | 409 | 1001
30-49 | 299 | 330 | 370 | 999
50-64 | 282 | 341 | 375 | 998
65+ | 279 | 382 | 343 | 1004
Total | 1101 | 1404 | 1497 | 4002
a. Will you conduct a test of homogeneity or independence? Why?
b. Test an appropriate hypothesis.
c. State your conclusion, including an analysis of differences you find (if any).
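Whichever test you choose in part a, the chi-square statistic is computed the same way from this 4 x 3 table. The sketch uses a 5% level, which is an assumption since the problem does not state one; the corresponding critical value for df = 6 is 12.592.

```python
# Problem 56: chi-square statistic for the age-by-party table.
# Rows: 18-29, 30-49, 50-64, 65+; columns: Republican, Democratic, Independent.
observed = [[241, 351, 409],
            [299, 330, 370],
            [282, 341, 375],
            [279, 382, 343]]
critical_05 = 12.592  # chi-square critical value, df = 6, alpha = .05 (standard table)

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)
chi2 = sum((obs - rt * ct / n) ** 2 / (rt * ct / n)   # (observed - expected)^2 / expected
           for row, rt in zip(observed, row_tot)
           for obs, ct in zip(row, col_tot))
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi2:.2f}, df = {df}, exceeds 5% critical value: {chi2 > critical_05}")
```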
57. In January 2002, 725 people receiving outplacement assistance, with incomes of $60,000 to $150,000, were asked how long they could comfortably afford to be unemployed (Business Week, April 15, 2002). Eight percent said “less than three months,” 46% said “up to six months,” 26% said “up to a year,” and 20% said “more than a year.” Assume that these results are true for the 2002 population of such people. Suppose we denote the above responses by A, B, C, and D, respectively. Recently, 500 such people were randomly selected and asked the same question. The following table summarizes their responses.
Response | A | B | C | D
Number of people | 48 | 242 | 120 | 90
Test at the 5% significance level whether the current distribution of responses differs from the one for 2002.
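A chi-square goodness-of-fit sketch in Python: expected counts are 500 times the 2002 proportions, df = 3, and the 5% critical value 7.815 is taken from a chi-square table.

```python
# Problem 57: chi-square goodness-of-fit test against the 2002 percentages.
observed = [48, 242, 120, 90]            # responses A, B, C, D
p_2002 = [0.08, 0.46, 0.26, 0.20]
critical_05 = 7.815  # chi-square critical value, df = 3, alpha = .05 (standard table)

n = sum(observed)
expected = [n * p for p in p_2002]       # expected counts if the 2002 distribution still holds
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"expected = {expected}, chi-square = {chi2:.3f}, reject H0: {chi2 > critical_05}")
```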
58. In an At-a-Glance Communications 2002 survey, office workers were asked how long they normally took to respond to e-mail. Thirty-six percent said “as soon as I return to my desk,” 35% said “within an hour or two,” 24% said “before the end of the business day,” and 5% said “when I can” (USA Today, May 7, 2002). Assume that these results hold true for the population of all office workers in 2002. Suppose we denote these responses by A, B, C, and D, respectively. A recent random sample of 400 office workers was asked the same question, and it yielded the frequency distribution shown in the following table.
Response | A | B | C | D
Frequency | 128 | 142 | 116 | 14
Using the 1% significance level, can you conclude that the current distribution of responses differs from the 2002 distribution?
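The same goodness-of-fit computation applies here, with expected counts equal to 400 times the 2002 proportions and the 1% critical value 11.345 for df = 3 taken from a chi-square table.

```python
# Problem 58: chi-square goodness-of-fit test against the 2002 e-mail response percentages.
observed = [128, 142, 116, 14]           # responses A, B, C, D
p_2002 = [0.36, 0.35, 0.24, 0.05]
critical_01 = 11.345  # chi-square critical value, df = 3, alpha = .01 (standard table)

n = sum(observed)
expected = [n * p for p in p_2002]       # expected counts if the 2002 distribution still holds
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"expected = {expected}, chi-square = {chi2:.3f}, reject H0: {chi2 > critical_01}")
```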
59. A survey conducted from June 21 through August 7, 2002 studied “affluent” Americans with household incomes of $75,000 or more per year (Money, Fall 2002). Part of that survey examined the relationship between the use of a financial advisor and ownership of stocks. Assuming that this portion of the survey was based on a random sample of 400 affluent Americans, the percentages given in the magazine would yield the numbers shown in the following table.
Use financial advisor? | Own Stocks | Do Not Own Stocks
Yes | 165 | 135
No | 43 | 57
At the 5% significance level, can you conclude that use of a financial advisor is related to stock ownership for all affluent Americans?
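A sketch of the 2 x 2 test of independence in Python, without Yates' continuity correction (some texts apply it to 2 x 2 tables, which would give a slightly smaller statistic). The 5% critical value for df = 1 is 3.841.

```python
# Problem 59: 2 x 2 chi-square test of independence (no continuity correction).
# Rows: uses a financial advisor (yes, no); columns: owns stocks, does not own stocks.
observed = [[165, 135],
            [43, 57]]
critical_05 = 3.841  # chi-square critical value, df = 1, alpha = .05 (standard table)

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]   # counts expected under independence
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
print(f"expected = {expected}, chi-square = {chi2:.3f}, reject H0: {chi2 > critical_05}")
```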
60. In a Knowledge Networks/Statistical Research 2002 survey, 8- to 17-year-olds were asked which medium was their favorite (The Reader’s Digest, November 2002). If the survey were based on random samples of 500 boys and 500 girls, the percentages given in the magazine article would have yielded the following table.
Gender | Internet | TV | Phone | Radio | Other
Boys | 190 | 170 | 60 | 60 | 20
Girls | 140 | 85 | 155 | 85 | 35
Using the 1% significance level, test the null hypothesis that the distribution of media preferences is the same for boys and girls in this age group.
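This is a chi-square test of homogeneity with 2 rows and 5 columns (df = 4). The sketch compares the statistic with the table value 13.277 for the 1% level.

```python
# Problem 60: chi-square test of homogeneity, boys vs girls.
# Columns: Internet, TV, Phone, Radio, Other.
observed = {"Boys": [190, 170, 60, 60, 20], "Girls": [140, 85, 155, 85, 35]}
critical_01 = 13.277  # chi-square critical value, df = 4, alpha = .01 (standard table)

rows = list(observed.values())
row_tot = [sum(r) for r in rows]
col_tot = [sum(c) for c in zip(*rows)]
n = sum(row_tot)
chi2 = sum((o - rt * ct / n) ** 2 / (rt * ct / n)   # (observed - expected)^2 / expected
           for r, rt in zip(rows, row_tot)
           for o, ct in zip(r, col_tot))
df = (len(rows) - 1) * (len(col_tot) - 1)
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_01}")
```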
61. The following table gives horsepower ratings and expected gas mileage for several 2001 vehicles.
Vehicle | Horsepower (hp) | Gas mileage (mpg)
Audi A4 | 170 | 22
Buick LeSabre | 205 | 20
Chevy Blazer | 190 | 15
Chevy Prizm | 125 | 31
Ford Excursion | 310 | 10
GMC Yukon | 285 | 13
Honda Civic | 127 | 29
Hyundai Elantra | 140 | 25
Lexus 300 | 215 | 21
Lincoln LS | 210 | 23
Mazda MPV | 170 | 18
Olds Alero | 140 | 23
Toyota Camry | 194 | 21
VW Beetle | 115 | 29
a. Make a scatterplot for these data.
b. Describe the direction, form, and scatter of the plot.
c. Find the correlation between horsepower and miles per gallon.
d. Write a few sentences telling what the plot says about fuel economy.
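For part c, a sketch that computes Pearson's r from deviations about the means (the helper name pearson_r is illustrative). The scatterplot in part a is best drawn with plotting software and is not reproduced here.

```python
from math import sqrt

# Problem 61: horsepower and expected gas mileage for the 14 vehicles, in table order.
hp  = [170, 205, 190, 125, 310, 285, 127, 140, 215, 210, 170, 140, 194, 115]
mpg = [22, 20, 15, 31, 10, 13, 29, 25, 21, 23, 18, 23, 21, 29]

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hp, mpg)
print(f"r = {r:.3f}")
```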
62. The following table shows the oil production of the United States from 1949 to 2000 (in thousands of barrels per year).
Year | Oil | Year | Oil | Year | Oil | Year | Oil
1949 | 1,841,940 | 1962 | 2,676,189 | 1975 | 3,056,779 | 1988 | 2,979,123
1950 | 1,973,574 | 1963 | 2,752,723 | 1976 | 2,976,180 | 1989 | 2,778,773
1951 | 2,247,711 | 1964 | 2,786,822 | 1977 | 3,009,265 | 1990 | 2,684,687
1952 | 2,289,836 | 1965 | 2,848,514 | 1978 | 3,178,216 | 1991 | 2,707,039
1953 | 2,357,082 | 1966 | 3,027,763 | 1979 | 3,121,310 | 1992 | 2,624,632
1954 | 2,314,988 | 1967 | 3,215,742 | 1980 | 3,146,365 | 1993 | 2,499,033
1955 | 2,484,428 | 1968 | 3,329,042 | 1981 | 3,128,624 | 1994 | 2,431,476
1956 | 2,617,283 | 1969 | 3,371,751 | 1982 | 3,156,715 | 1995 | 2,394,268
1957 | 2,616,901 | 1970 | 3,517,450 | 1983 | 3,170,999 | 1996 | 2,366,017
1958 | 2,448,987 | 1971 | 3,453,914 | 1984 | 3,249,696 | 1997 | 2,354,831
1959 | 2,574,590 | 1972 | 3,455,368 | 1985 | 3,274,553 | 1998 | 2,281,919
1960 | 2,574,933 | 1973 | 3,360,903 | 1986 | 3,168,252 | 1999 | 2,146,732
1961 | 2,621,758 | 1974 | 3,202,585 | 1987 | 3,047,378 | 2000 | 2,135,062
a. Find the correlation between year and production.
b. A reporter concludes that a low correlation between year and production shows that oil production has remained steady over the 50-year period. Do you agree with this interpretation? Explain.
c. Fit a least squares regression line to oil production by year.
d. Using this regression line, predict U.S. oil production in the year 2001.
e. Does the prediction in part d look reasonable? Comment.
f. Do you think the regression line is an appropriate model? Comment.
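A sketch for parts a, c, and d using plain Python sums of squares. Production values are entered exactly as printed in the table, and the fitted line uses the calendar year itself as x, so the intercept (fitted production in year 0) has no practical meaning. Because r measures only linear association, a plot of production against year is a natural check before interpreting it.

```python
from math import sqrt

# Problem 62: U.S. oil production by year, 1949-2000, values exactly as printed in the table.
years = list(range(1949, 2001))
oil = [
    1841940, 1973574, 2247711, 2289836, 2357082, 2314988, 2484428, 2617283, 2616901, 2448987,  # 1949-1958
    2574590, 2574933, 2621758, 2676189, 2752723, 2786822, 2848514, 3027763, 3215742, 3329042,  # 1959-1968
    3371751, 3517450, 3453914, 3455368, 3360903, 3202585, 3056779, 2976180, 3009265, 3178216,  # 1969-1978
    3121310, 3146365, 3128624, 3156715, 3170999, 3249696, 3274553, 3168252, 3047378, 2979123,  # 1979-1988
    2778773, 2684687, 2707039, 2624632, 2499033, 2431476, 2394268, 2366017, 2354831, 2281919,  # 1989-1998
    2146732, 2135062,                                                                          # 1999-2000
]

mx, my = sum(years) / len(years), sum(oil) / len(oil)
sxy = sum((x - mx) * (y - my) for x, y in zip(years, oil))
sxx = sum((x - mx) ** 2 for x in years)
syy = sum((y - my) ** 2 for y in oil)

r = sxy / sqrt(sxx * syy)              # (a) correlation between year and production
slope = sxy / sxx                      # (c) least squares slope
intercept = my - slope * mx            # (c) least squares intercept
pred_2001 = intercept + slope * 2001   # (d) prediction for 2001

print(f"r = {r:.3f}; production-hat = {intercept:,.0f} + {slope:,.1f} * year; "
      f"2001 prediction = {pred_2001:,.0f}")
```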
63. The following table gives the total 2002 payroll (rounded to the nearest million dollars) and the percentage of games won during the 2002 season by each of the National League baseball teams.
Team | Total Payroll (millions of dollars) | Percentage of Games Won
Arizona Diamondbacks | 103 | 60.5
Atlanta Braves | 93 | 63.1
Chicago Cubs | 76 | 41.4
Cincinnati Reds | 45 | 48.1
Colorado Rockies | 57 | 45.1
Florida Marlins | 42 | 48.8
Houston Astros | 63 | 51.9
Los Angeles Dodgers | 95 | 56.8
Milwaukee Brewers | 50 | 34.6
Montreal Expos | 39 | 51.2
New York Mets | 95 | 46.6
Philadelphia Phillies | 58 | 49.7
Pittsburgh Pirates | 42 | 44.7
St. Louis Cardinals | 75 | 59.9
San Diego Padres | 41 | 40.7
San Francisco Giants | 78 | 59.0
a. Find the least squares regression line with total payroll as an independent variable and percentage of games won as a dependent variable.
b. Give a brief interpretation of the values of the y-intercept and the slope.
c. Predict the percentage of games won for a team with a total payroll of $55 million.
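A sketch for parts a and c. The helper least_squares is an illustrative name; it returns the intercept a and slope b of the line y-hat = a + bx, with payroll x in millions of dollars and teams entered in table order.

```python
# Problem 63: National League 2002 payroll (millions of dollars) and percentage of games won.
payroll = [103, 93, 76, 45, 57, 42, 63, 95, 50, 39, 95, 58, 42, 75, 41, 78]
pct_won = [60.5, 63.1, 41.4, 48.1, 45.1, 48.8, 51.9, 56.8, 34.6, 51.2, 46.6, 49.7, 44.7, 59.9, 40.7, 59.0]

def least_squares(x, y):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

a, b = least_squares(payroll, pct_won)
prediction_55 = a + b * 55   # (c) payroll of $55 million
print(f"y-hat = {a:.2f} + {b:.4f} x; predicted percentage at $55 million = {prediction_55:.1f}")
```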
64. The following table gives the total 2002 payroll (rounded to the nearest million dollars) and the percentage of games won during the 2002 season by each of the American League baseball teams.
Team | Total Payroll (millions of dollars) | Percentage of Games Won
Anaheim Angels | 62 | 61.1
Baltimore Orioles | 60 | 41.4
Boston Red Sox | 108 | 57.4
Chicago White Sox | 57 | 50.0
Cleveland Indians | 79 | 45.7
Detroit Tigers | 55 | 34.2
Kansas City Royals | 47 | 38.3
Minnesota Twins | 40 | 58.4
New York Yankees | 126 | 64.0
Oakland A’s | 40 | 63.6
Seattle Mariners | 80 | 57.4
Tampa Bay Devil Rays | 34 | 34.2
Texas Rangers | 106 | 44.4
Toronto Blue Jays | 77 | 48.1
a. Find the least squares regression line with total payroll as an independent variable and percentage of games won as a dependent variable.
b. Give a brief interpretation of the values of the y-intercept and the slope.
c. Predict the percentage of games won for a team with a total payroll of $65 million.
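The same least squares computation applies to the American League data. In this sketch the helper name least_squares is again illustrative, teams are entered in table order, and the prediction in part c uses a payroll of $65 million.

```python
# Problem 64: American League 2002 payroll (millions of dollars) and percentage of games won.
payroll = [62, 60, 108, 57, 79, 55, 47, 40, 126, 40, 80, 34, 106, 77]
pct_won = [61.1, 41.4, 57.4, 50.0, 45.7, 34.2, 38.3, 58.4, 64.0, 63.6, 57.4, 34.2, 44.4, 48.1]

def least_squares(x, y):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

a, b = least_squares(payroll, pct_won)
prediction_65 = a + b * 65   # (c) payroll of $65 million
print(f"y-hat = {a:.2f} + {b:.4f} x; predicted percentage at $65 million = {prediction_65:.1f}")
```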
65. The following table gives the average hotel room rates in the United States for the years 1992-2001.
Year | Average Hotel Room Rate ($)
1992 | 59.39
1993 | 60.99
1994 | 63.35
1995 | 66.34
1996 | 70.68
1997 | 74.77
1998 | 78.24
1999 | 81.59
2000 | 85.69
2001 | 84.58
a. Assign a value of 0 to 1992, 1 to 1993, 2 to 1994, and so on. Call this new variable Time. Make a new table with the variables Time and Average Hotel Room Rate.
b. Construct a scatter diagram for these data. Does the scatter diagram exhibit a linear positive relationship between time and average hotel room rates?
c. Find the least squares regression line.
d. Compute the correlation coefficient r.
e. Predict the average hotel room rate for 2006. Comment on this prediction.
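A sketch of parts c through e, using the Time coding from part a (0 for 1992 through 9 for 2001). The 2006 prediction corresponds to Time = 14, five years beyond the last observed value, so it is an extrapolation.

```python
from math import sqrt

# Problem 65: Time = 0 for 1992, 1 for 1993, ..., 9 for 2001 (part a).
time = list(range(10))
rate = [59.39, 60.99, 63.35, 66.34, 70.68, 74.77, 78.24, 81.59, 85.69, 84.58]

mx, my = sum(time) / len(time), sum(rate) / len(rate)
sxy = sum((x - mx) * (y - my) for x, y in zip(time, rate))
sxx = sum((x - mx) ** 2 for x in time)
syy = sum((y - my) ** 2 for y in rate)

b = sxy / sxx                      # (c) slope: estimated change in average rate per year
a = my - b * mx                    # (c) intercept: fitted rate at Time = 0 (1992)
r = sxy / sqrt(sxx * syy)          # (d) correlation coefficient
time_2006 = 2006 - 1992            # Time value for 2006
pred_2006 = a + b * time_2006      # (e) extrapolated prediction
print(f"y-hat = {a:.2f} + {b:.3f} * Time; r = {r:.4f}; 2006 prediction = ${pred_2006:.2f}")
```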
66. The following table shows the prices of ladies’ diamond rings and the weights of their diamond stones (Journal of Statistics Education, 1996).
Weight | Price | Weight | Price | Weight | Price | Weight | Price | Weight | Price | Weight | Price
.17 | 355 | .21 | 483 | .12 | 223 | .17 | 353 | .32 | 919 | .25 | 655
.16 | 328 | .15 | 323 | .26 | 663 | .18 | 438 | .15 | 298 | .35 | 1086
.17 | 350 | .18 | 462 | .25 | 750 | .17 | 318 | .16 | 339 | .18 | 443
.18 | 325 | .28 | 823 | .27 | 720 | .18 | 419 | .16 | 338 | .25 | 678
.25 | 642 | .16 | 336 | .18 | 468 | .17 | 346 | .23 | 595 | .25 | 675
.16 | 342 | .20 | 498 | .16 | 345 | .15 | 315 | .23 | 553 | .15 | 287
.15 | 322 | .23 | 595 | .17 | 352 | .17 | 350 | .17 | 345 | .26 | 693
.19 | 485 | .29 | 860 | .16 | 332 | .32 | 918 | .33 | 945 | .15 | 316
a. Construct a scatter diagram for these data.
b. Find the least squares regression line.
c. Compute the correlation coefficient r.
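A sketch of parts b and c, with the 48 (weight, price) pairs read row by row across the table, two table rows per line of code. The units of weight and price are whatever the source uses; they are not stated in the problem.

```python
from math import sqrt

# Problem 66: weight and price of 48 diamond rings, read row by row across the table.
weight = [0.17, 0.21, 0.12, 0.17, 0.32, 0.25, 0.16, 0.15, 0.26, 0.18, 0.15, 0.35,
          0.17, 0.18, 0.25, 0.17, 0.16, 0.18, 0.18, 0.28, 0.27, 0.18, 0.16, 0.25,
          0.25, 0.16, 0.18, 0.17, 0.23, 0.25, 0.16, 0.20, 0.16, 0.15, 0.23, 0.15,
          0.15, 0.23, 0.17, 0.17, 0.17, 0.26, 0.19, 0.29, 0.16, 0.32, 0.33, 0.15]
price = [355, 483, 223, 353, 919, 655, 328, 323, 663, 438, 298, 1086,
         350, 462, 750, 318, 339, 443, 325, 823, 720, 419, 338, 678,
         642, 336, 468, 346, 595, 675, 342, 498, 345, 315, 553, 287,
         322, 595, 352, 350, 345, 693, 485, 860, 332, 918, 945, 316]

n = len(weight)
mx, my = sum(weight) / n, sum(price) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(weight, price))
sxx = sum((x - mx) ** 2 for x in weight)
syy = sum((y - my) ** 2 for y in price)

b = sxy / sxx                  # (b) slope: estimated price increase per unit of weight
a = my - b * mx                # (b) intercept
r = sxy / sqrt(sxx * syy)      # (c) correlation coefficient
print(f"price-hat = {a:.2f} + {b:.2f} * weight; r = {r:.4f}")
```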