MATH220–Assignment #3
Question 1: The World Bank collects data on many variables related to world development for countries throughout the world. Two of these are Internet use (in number of users per 100 people) and life expectancy (in years). The data file is provided separately.

Make a scatterplot of life expectancy (the response variable) versus internet use. Describe the relationship. Is there an overall pattern? Do you see any deviation from that pattern?

Compute the correlation coefficient R between life expectancy and internet use.

A friend looks at the scatterplot and concludes that using the internet will increase the length of your life. Would you agree with her? Explain your answer.

Make a scatterplot of life expectancy versus internet use, but this time use different symbols for European and non-European countries. Do you wish to modify your answer to part c)? Explain.
Minitab Instructions: a) To obtain the scatterplot:

Choose ‘Graph → Scatterplot’

Click on the ‘Simple’ option

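In the first row of the ‘variables’ table enter the response for the ‘Y’ variable and the explanatory variable for the ‘X’ variable, then press Enter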
Copy and paste the graph in your Word file. Describe what you see.
b) To find the correlation coefficient:

Choose ‘Stat → Basic statistics → Correlation’

Select Life Expectancy and Internet Use for the Variables box

Deselect the ‘Display p-values’ box and press Enter
The correlation coefficient appears in the Session window.
d) To obtain the scatterplot with different symbols for European and non-European countries:

Choose ‘Graph → Scatterplot’

Click on the ‘With groups’ option

In the first row of the ‘variables’ table enter

the response for the ‘Y’ variable

the explanatory variable for the ‘X’ variable

the Region variable in the ‘Categorical variables for groups’ box and press Enter
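
For readers working outside Minitab, the correlation in part b) can also be computed directly from its definition. The following is a minimal sketch in Python using only the standard library. The function name pearson_r and the short data lists are hypothetical placeholders chosen for illustration; they are not values from the World Bank data file.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Placeholder values for illustration only (NOT the assignment data).
internet_use = [5.0, 12.0, 30.0, 55.0, 70.0, 85.0]
life_expectancy = [55.0, 61.0, 68.0, 74.0, 78.0, 80.0]
region = ["Other", "Other", "Other", "Europe", "Europe", "Europe"]

r_all = pearson_r(internet_use, life_expectancy)
print(f"r (all countries) = {r_all:.3f}")

# Same calculation within each group, mirroring the 'With groups' plot.
for group in sorted(set(region)):
    xs = [x for x, g in zip(internet_use, region) if g == group]
    ys = [y for y, g in zip(life_expectancy, region) if g == group]
    print(f"r ({group}) = {pearson_r(xs, ys):.3f}")
```

With the real data, the two lists would be replaced by the Internet Use and Life Expectancy columns, and the region list by the Region column.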

Question 2: Old Faithful Geyser in Yellowstone National Park is renowned, among other things, for the regularity of its eruptions. The eruption durations (X, in minutes) and the subsequent intervals before the next eruption (Y, in minutes) are provided in a separate file.

Make a scatterplot of the interval variable versus the duration variable. Describe the relationship. Is there an overall pattern? Do you see any deviation from that pattern?

Find the correlation coefficient R between interval and duration. What would happen to the value of R if both the interval and the duration were converted from minutes to hours?

Find the equation of the regression line for predicting interval from duration. In simple language, what is the slope of the line telling us?

Add the regression line to the scatterplot.

Find the percent of variation in the interval variable that is explained by the model. Does the regression model provide a good fit?

Make a residual plot from the linear regression model you constructed above. Discuss the appropriateness of the model.

Use the equation of the regression line to predict the subsequent interval before the next eruption for an eruption that lasted 5 minutes. How confident are you that the prediction is accurate?
Minitab Instructions:
a) Proceed as in question 1a).
b) Proceed as in question 1b).
c) and d) To obtain the equation of the regression line and to plot it on the scatterplot:

Choose ‘Stat → Regression → Fitted line plot’

Select the appropriate variable for the ‘Response’ box and press Tab

Select the appropriate variable for the ‘Predictors’ box and press Enter
Paste the graph into your Word file. You will note that the equation of the regression line is printed on the graph, along with the value of R² (ignore the adjusted R²). For ease of reference, rewrite the equation separately in your Word document. What is the slope telling us?
f) To make a residual plot:

Choose ‘Stat → Regression → Regression’

Choose the interval variable for the Response box and press Tab

Choose the duration variable for the Predictors box

Click ‘Graphs’ to open the graphs dialog

Move the cursor to the ‘Residuals vs. the variables:’ box and choose the duration variable (the explanatory variable) for that box.

Press Enter twice
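
The regression quantities requested in parts c) to g) can also be checked by hand or in code. The sketch below, in Python with the standard library only, computes the least-squares line, R², the residuals, and a prediction. The function names fit_line and r_squared and the data values are hypothetical placeholders, not the Old Faithful measurements.

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only (NOT the assignment data).
duration = [1.8, 2.0, 3.5, 4.0, 4.5]        # minutes
interval = [55.0, 58.0, 72.0, 80.0, 85.0]   # minutes

b0, b1 = fit_line(duration, interval)
fit_quality = r_squared(duration, interval, b0, b1)

# Residual = observed interval minus the interval predicted by the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(duration, interval)]

# Predicted interval after an eruption lasting 5 minutes.
predicted = b0 + b1 * 5.0

print(f"interval = {b0:.2f} + {b1:.2f} * duration")
print(f"R-squared = {fit_quality:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
print(f"predicted interval for a 5-minute eruption: {predicted:.1f}")
```

Plotting the residuals against duration gives the same picture as the Minitab residual plot described above.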
Question 3: One of the most dangerous contaminants deposited over European countries following the Chernobyl accident of April 1986 was radioactive cesium. To study cesium transfer from contaminated soil to plants, researchers collected soil samples and samples of mushroom mycelia from 17 wooded locations in Umbria, Central Italy, from August 1986 to November 1989. Measured concentrations of cesium in the soil and in the mushrooms (in Bq/kg; the becquerel, Bq, is a unit of radioactivity) are given in a separate data file.

Construct a scatterplot using Y = concentration in mushrooms and X = concentration in soil. Describe the relationship between the two variables.

Fit a linear model and report the correlation coefficient.

Exclude sample number 17 and repeat parts a) and b).

What is the effect of case 17 on the linear model and the correlation coefficient?
Minitab instructions: Combine the instructions from questions 1 and 2.
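
Refitting a model with one case excluded is easy to script. The sketch below, in Python with the standard library only, fits the line and the correlation once on all cases and once with a chosen case removed. The helpers summarize and without_case and the data values are hypothetical placeholders, not the measured cesium concentrations.

```python
from math import sqrt


def summarize(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / sqrt(sxx * syy)


def without_case(values, case_number):
    """Copy of values with the 1-based case_number removed."""
    return values[:case_number - 1] + values[case_number:]


# Placeholder values for illustration only (NOT the assignment data).
soil = [10.0, 20.0, 30.0, 40.0, 200.0]
mushroom = [12.0, 18.0, 33.0, 41.0, 90.0]
excluded = 5  # with the real data file this would be case 17

full = summarize(soil, mushroom)
reduced = summarize(without_case(soil, excluded),
                    without_case(mushroom, excluded))

for label, (b0, b1, r) in (("all cases", full), ("case removed", reduced)):
    print(f"{label}: y = {b0:.2f} + {b1:.3f} x, r = {r:.3f}")
```

Comparing the two printed lines shows how much a single case moves the slope, intercept, and correlation.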
Question 4: (Paper and pencil and/or Excel)
Read the setup of Problem 2.170, p. 160 (7e), in the textbook:

Find the conditional distributions of the field of study variable for each region.

Construct the bar graphs of the three conditional distributions on the same page (Excel does this very nicely).

Provide a brief description of the relationship between field of study and region.
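
A conditional distribution is obtained by dividing each count in a two-way table by the total for its group. The sketch below shows that arithmetic in Python with the standard library only. The function name conditional_distributions, the region and field labels, and the counts are hypothetical placeholders; they are not the figures from the textbook problem.

```python
def conditional_distributions(counts):
    """Convert {group: {category: count}} into percentages within each group."""
    result = {}
    for group, row in counts.items():
        total = sum(row.values())
        result[group] = {cat: 100.0 * c / total for cat, c in row.items()}
    return result


# Placeholder counts for illustration only (NOT the textbook table).
counts = {
    "Region A": {"Field 1": 30, "Field 2": 70},
    "Region B": {"Field 1": 45, "Field 2": 55},
}

for group, dist in conditional_distributions(counts).items():
    shares = ", ".join(f"{cat}: {pct:.1f}%" for cat, pct in dist.items())
    print(f"{group}: {shares}")
```

The percentages within each region sum to 100, which is a quick check before drawing the bar graphs.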