Common Core Algebra II.Unit 13.Lesson 8.Linear Regression and Lines of Best Fit
Feb 27, 2017
Hello, I'm Kirk Weiler, and this is Common Core Algebra II by eMath Instruction. Today we're going to be doing Unit 13, Lesson 8, on linear regression and lines of best fit. Now, the vast majority of this lesson, in fact possibly all of it, is review from Common Core Algebra I. But one essential part of statistics is looking at how two variables relate to one another. Do they have a linear relationship or some other type? How can we quantify this? How can we look at it? One of the most common ways of doing that is to see how well the two variables relate to each other in terms of a linear phenomenon, i.e., the equation of a line. So let's jump right into that with the first exercise.
The first exercise says a pediatrician would like to determine the relationship between infant female weights and their ages. The pediatrician studies 100 newborn girls and finds their average weight at the end of three-month intervals. The data is shown below and graphed on the scatter plot. All right? So we've got their ages, zero months, three months, six months, et cetera, and how much they weigh. Let's construct a line of best fit. Letter (a) says: using a ruler, draw a line that you think best fits this data. As a general guideline, try to draw it such that there are as many data points above the line as there are below it. All right? Well, hey, take a shot at it.
All right. Well, this data has a pretty strong linear trend to it anyway. So I'm going to take my best shot and see if I can draw a pretty good line. Maybe something like that. That's not bad. We've got two or three of the data points above the line and roughly three of the data points below. So that's a pretty good line of best fit, I think. Letter (b) says: by picking two points that are on the line, not necessarily data points, determine the equation of your best fit line. Round your coefficients to the nearest tenth. All right.
Well, this just boils down to coming up with two points that lie on this line. Here's one that's pretty good. Why don't we take the point (5, 14)? The y-axis goes by twos, so that's (5, 14). And of course this is going to come out different than in my notes. For the other one, why don't we go with (10, 20)? All right? So I want to find an equation y = mx + b. This is good; we haven't done this for a little while. The slope is going to be 20 - 14 divided by 10 - 5. So that's going to be 6 divided by 5, and that ends up being 1.2. Now, to find the y-intercept, I guess I could eyeball it, but the more scientific way to do it is to put your slope in, y = 1.2x + b, and then pick one of these points. I think I'll just use (5, 14). Put the 14 in for y and the 5 in for x, so I'll have 14 is equal to, let's see, 1.2 times 5 ends up being 6, plus b, and so b equals 8, which is just about what I see on the graph. So my line has the equation y = 1.2x + 8.
All right. Now, I can draw a line of best fit and grab a couple of random points. You can draw a line of best fit and grab a couple of random points. The calculator, through some very sophisticated techniques, can generate literally the best equation. What it is doing is trying to come up with a line that minimizes how far those points are, overall, from the line. So it's using some sophisticated techniques that involve calculus to minimize the distance the points are away from the line. Let's make sure we can do that. I'm going to clear this out, so write down anything you need to.
All right. Let's do some linear regression in the next problem. Letter (c) says: using the linear regression command on your calculator to find the equation of the best fit line. All right, there are just serious issues with the wording of that one. So let's do it. Let's bring out the TI-84 Plus. All right, and there's the TI-84 Plus. Great. Now, I took advantage of the fact that I knew that this problem was coming.
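As a quick check on the hand calculation above, here is a minimal Python sketch that finds the slope and y-intercept of the line through the two points picked off the drawn line, (5, 14) and (10, 20). The function name `line_through` is made up for this illustration and is not part of the lesson.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points.

    Hypothetical helper for illustration; not from the lesson itself.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

# The two points read off the hand-drawn line in the lesson.
m, b = line_through((5, 14), (10, 20))
print(f"y = {round(m, 1)}x + {round(b, 1)}")  # prints "y = 1.2x + 8.0"
```

Any two points on the drawn line would do; different choices give slightly different coefficients, which is why everyone's hand-drawn line comes out a bit different.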
I actually put the data in already. Still, let's make sure of it. Let's go over to the STAT button and go into EDIT. And of course, what we see there in L1 is the ages in months, 0, 3, 6, and so on. And then in L2 we see the average weights in pounds, 7.2, 12.2, et cetera. Now, just so that you can follow along, pause the video for a minute and take a little bit of time getting that data in.
All right, so now that we've got the data in, I think it's time to do a little linear regression. Okay. So let's go over to the CALC menu. Luckily, linear regression is one of the first ones. So let's go down, grab linear regression, and hit ENTER on that. It asks us a bunch of different things. It wants our X list, which is L1, and it wants our Y list, which is L2. It also wants a frequency list, but we're going to leave that blank. It also asks where we want to store the equation; we'll leave that blank too. So we're going to continue on down until we get all the way to Calculate, and let's hit ENTER.
All right. And the linear regression equation that we find ends up being, let's see, we're supposed to round this to the nearest tenth, y = 1.2x + 7.8. And I've got to say, that's pretty good compared to what we had before, right? When we worked this out by hand, we nailed down the slope pretty much perfectly, and we had a pretty good y-intercept too. So, you know, we did a pretty good job. That doesn't mean I couldn't have been a little bit off on my slope, and I was certainly a little bit off on my y-intercept. Each one of us would draw a best fit line a little bit differently. The key here is that the calculator is finding the absolute best fit.
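For reference, here is a hedged sketch of what "absolute best fit" usually means. The lesson does not state the formula, so this assumes the calculator uses ordinary least squares: the line y = ax + b is chosen to minimize the sum of the squared vertical distances from the n data points (x_i, y_i) to the line. Setting the partial derivatives of that sum to zero (the calculus step) gives closed forms for a and b.

```latex
% Quantity being minimized (assumed: ordinary least squares)
\[
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2
\]
% Setting \partial S / \partial a = 0 and \partial S / \partial b = 0 gives
\[
a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2},
\qquad
b = \bar{y} - a\,\bar{x}
\]
```

Here \bar{x} and \bar{y} are the means of the inputs and outputs. One consequence of the second formula is that the least squares line always passes through the point (\bar{x}, \bar{y}).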
The calculator is minimizing the distance between the line and the data points in a very sophisticated way that certainly takes quite a bit of calculus. All right. Now, letter (d) asks us to use the calculator to determine the linear correlation coefficient. The linear correlation coefficient is sometimes also known as the r value. Now, we've got it just sitting there on our screen, ready to go. Our r value is 0.995, rounded to the nearest thousandth. Now, how do we actually interpret that? Well, all r values fall between negative one and positive one, with zero, obviously, right in between. Okay? Now, some people might think that negative one would be a bad correlation. But actually, both negative one and positive one are what we would call a perfect correlation. It's just that when we have an r value of negative one, the variables are negatively correlated: when x goes up, y comes down. Zero, though, is bad, or no correlation, if you will. All right? So how do we interpret an r value of 0.995? Well, that's very, very good. This is an excellent positive fit. So what we can see from the calculator is that we've got a good fit. Let's actually put the calculator away. All right, bye-bye, TI-84 Plus. We'll be seeing you soon.
All right, so there are two things that you typically do with best fit lines: you interpolate and you extrapolate. Interpolation is when you use a regression model to make predictions for inputs that fall within the range of the inputs used to generate the model. Inside: interpolation. So, for instance, using the equation that your calculator produced in exercise number one, which was this equation, y = 1.1x + 7.9, predict the weight of a baby girl after ten months. Round your answer to the nearest tenth of a pound. Now, this is interpolation because ten months falls within the range of the inputs that we used to generate the model.
Now, of course, this is very easy. I just have to put 10 in for x, so y = 1.1 times 10 plus 7.9. You crank through that and you find 18.9 pounds. And that's a good example of interpolation. Now, what is a little bit weird is how that compares with the weight in the table at 9 months, okay? And again, that's because the data that's actually in that table doesn't lie perfectly on that line. All right? But at least we have a good prediction, one that's probably very, very realistic. Pause the video now before we talk about extrapolation.
Okay, I'm going to clear it out. All right, extrapolation is when a regression model is used to make predictions for inputs that fall outside the range of inputs used to generate the model. Now, extrapolation is way more dangerous, right? We're going to do an example of it here in a moment. It says: using the equation your calculator produced in exercise one, predict the weight of a baby girl after two years. Round your answer to the nearest tenth of a pound. Well, when x is two years, we definitely don't want to use 2; we want to use 24 months. Now, that is out here, right? It's not within the range of these inputs. So it's extrapolating. Now, again, it's still very, very easy to use our model: y = 1.1 times 24 plus 7.9, which is 34.3 pounds. And that still might be pretty realistic. All right.
Of course, here is the hazard of extrapolation. Say you said, hey, why don't we go with ten years? Let's use the model to predict the weight of a girl after ten years. Well, that would be, sorry, 120 months, and what would we get then? If I did that, 1.1 times 120 plus 7.9, that would give me 139.9 pounds. And that's actually quite heavy for a ten-year-old girl. Ten-year-old girls don't tend to be that heavy. If I took it to more of an extreme and said 20 years, then we would be talking about a weight that was over 200 pounds. The moral of the story is that extrapolation is dangerous.
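The predictions above can be reproduced with a short Python sketch. It assumes the rounded model y = 1.1x + 7.9 quoted in this part of the lesson, with x in months; the function name `predict_weight` is made up for this illustration.

```python
def predict_weight(months, slope=1.1, intercept=7.9):
    """Predicted average weight in pounds at a given age in months.

    Uses the rounded model y = 1.1x + 7.9 quoted in the lesson;
    the function name is a hypothetical one for this sketch.
    """
    return slope * months + intercept

# Ages must be converted to months before plugging in.
cases = [("10 months", 10), ("2 years", 24), ("10 years", 120), ("20 years", 240)]
for label, months in cases:
    print(f"{label}: {predict_weight(months):.1f} pounds")
```

The first two predictions (18.9 and 34.3 pounds) look plausible, while the 10-year and 20-year values keep growing by 1.1 pounds every month, which is the kind of runaway result that makes far extrapolation unreliable.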
Interpolation tends to be quite good; extrapolation tends to be a bit dangerous. But the unfortunate thing is that it's the extrapolation that we tend to want to do. We want to predict the future. We want to predict what's going to happen outside of what we've observed. That's pretty much the main point whenever we build a regression model. All right, pause the video now and write down anything you need to.
Okay, let's clear out that text. Let's do one more regression problem. All right. Exercise four says biologists are trying to create a least squares regression equation, another name for a best fit line, relating the length of steelhead salmon to their weight. Seven salmon were measured and weighed, with the data given below. Okay, letter (a) says: determine the least squares regression equation in the form y = ax + b for this data. Round all your coefficients to the nearest hundredth. Well, why don't you go ahead and do this for yourself? I'm not going to actually open up the TI-84 Plus, because I think that you should know how to do this at this point. So put that data in, go through it, and I'll give you the final answer. All right. Well, if you put the lengths in L1 and the weights in L2 and you crank through everything, you'll find, when rounding the coefficients to the nearest hundredth, y = 1.33x - 27.98.
All right. Letter (b): using your equation from part (a), determine the expected weight of a salmon that is 30 inches long. Well, our inputs were the lengths in inches. So x equals 30 inches, so y = 1.33 times 30 - 27.98, and that's simple enough. It gives me 11.92 pounds. And again, it kind of makes sense. You know, 30 would be roughly right in here in the table, and 11.92 is definitely between these two weights. Now, letter (c) says: using your equation from part (a), determine the expected weight of a salmon that is 52 inches long. Why don't you go ahead and do that quickly? All right, simple enough here.
We've got x equals 52, and y would be 1.33 times 52 - 27.98. And that would equal 41.18 pounds. Letter (d) says: in which part, (b) or (c), did you use interpolation, and in which part did you use extrapolation? Explain. All right, well, go ahead. Do this and then we'll review it. All right. Well, it's pretty easy. Part (b) was our interpolation, and that was simply because 30 lies between 22 and 48. And part (c) was our extrapolation, because 52 lies outside of 22 to 48. All right. It doesn't necessarily mean that our prediction in part (b) is better than our prediction in part (c), but it does tend to be the case that interpolation is a safer estimate than extrapolation. Pause the video now, write down anything you need to, and then we'll wrap up this lesson.
Okay. Clearing it out. So in the lesson today, we really quite briefly reviewed some of the ideas behind linear fitting: linear regression and lines of best fit. The idea here, in terms of statistics, is that we've got two variables, both of which have variation for probably a variety of different reasons. And then the question is, can you use one variable and the variation within it to predict what's happening with another variable and the variation within it? One of the great pieces that we use to measure that is the r value, the correlation coefficient. The closer it is to either one or negative one, the better the variation in one set of data can explain the variation in the other set. All right. Well, thank you for joining me for another Common Core Algebra II lesson by eMath Instruction. My name is Kirk Weiler, and until next time, keep thinking and keep solving problems.
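As a recap of the salmon exercise, here is a minimal Python sketch that evaluates the rounded model y = 1.33x - 27.98 and labels each prediction as interpolation or extrapolation. It takes the observed length range of 22 to 48 inches from the discussion of part (d); the function names are made up for this illustration.

```python
# Range of salmon lengths (inches) in the data, as stated in part (d).
LENGTH_MIN, LENGTH_MAX = 22, 48

def predict_salmon_weight(length_in):
    """Expected weight in pounds from the rounded model y = 1.33x - 27.98."""
    return 1.33 * length_in - 27.98

def kind_of_prediction(x, x_min=LENGTH_MIN, x_max=LENGTH_MAX):
    """Label an input as interpolation (inside the data range) or extrapolation."""
    return "interpolation" if x_min <= x <= x_max else "extrapolation"

for length in (30, 52):
    weight = predict_salmon_weight(length)
    print(f"{length} in: {weight:.2f} lb ({kind_of_prediction(length)})")
```

The label depends only on whether the input falls inside the range of the observed inputs; it says nothing by itself about how accurate a particular prediction turns out to be.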