Transcripts
1. Easy Statistics: Regression Short Promotional:
2. What is Easy Statistics: Linear Regression?: Welcome to Easy Statistics. Easy Statistics is designed to provide you with a compact and easy to understand course that focuses on the basic principles of statistical methodology. No prior knowledge is required when following an Easy Statistics course; you do not need to have a background in statistics to follow what's happening here. In fact, the less you know, the more you gain from this course. Most important, there are no equations in this course. The emphasis of this course is on understanding statistical concepts intuitively and in a gentle manner. If you want equations, then there are plenty of other courses and textbooks that provide these. Our focus here will be very much on application and interpretation of statistics. I want you to know how to use and how to read statistical results without needing to know the exact underlying mechanics. It's a bit like driving a car: you don't need to know how an engine works to use a car, but you do need to know how to steer, accelerate and brake, and this is what I will teach you here for statistics.
3. What is Linear Regression?: What is linear regression? Linear regression is the most popular regression method used. Of the linear regression techniques available, ordinary least squares, often abbreviated to OLS, is the most common. We're going to focus on ordinary least squares because it often underpins more sophisticated techniques, and it is by far the most used regression method in the real world. Ordinary least squares is a technique that examines the relationship between one continuous variable and one or more continuous and/or categorical variables. This technique is used in many sciences, including economics, sociology, psychology, geography and even history. It is often used in business for quantitative analysis and underpins many government reports that perform some kind of policy evaluation. Anyone wishing to delve deeper into regression statistics must have a good foundational understanding of ordinary least squares.
4. Learning Outcomes: What are the main learning outcomes? To learn and understand the basic statistical intuition behind ordinary least squares without needing to know about complicated equations; to be at ease with regression terminology and the assumptions behind ordinary least squares; to be able to comfortably interpret and analyze complicated regression output from ordinary least squares; and finally, to learn some extra tips and tricks that will help you when dealing with ordinary least squares regression models and output.
5. Who is this Course for?: Who is this course for? This course is for academics and students of any level. It doesn't matter whether you're studying at school or university: if you need an easy introduction to linear regression, this course is for you. But this course is also for practitioners, such as business users, including managers who deal with quantitative analysis at their workplace. This course is also for those working in government, especially those who are involved in policy analysis. Finally, this course is aimed at anyone who has an interest in, or needs to engage with, statistical regression.
6. Pre-requisites: What prerequisites are needed? You do not need to know about mathematics or statistics to follow or get the most out of this course. Curiosity is all that is needed. Some Stata knowledge may come in handy for the practical application part of this course, but it is not required. Stata is a statistical software program that allows users to estimate many regression models, and I will use it to demonstrate ordinary least squares regression. A keen interest in understanding how measurements might be related to each other helps too: regression is all about measuring quantitative variables against each other. If you want to know how Y is related to X, then this is the right place for you.
7. Using Stata: Using Stata. In this course, I'll be using Stata to demonstrate ordinary least squares regression examples. Stata is a commercial statistical software package, and you can find out more at www.stata.com. There are many courses on how to use Stata, should you be interested; in this course, I will not teach Stata, but focus on the interpretation of output. However, if you are interested in Stata and in replicating examples from this course, I have attached the relevant do-files to this course. Do-files are text files of Stata syntax that contain code allowing you to replicate what you see on screen. We'll be using the auto training data set that comes built in with Stata for practical examples. This is a training data set that contains a variety of useful variables and relationships that are great for teaching purposes.
8. What is Regression Analysis?: What is regression analysis? Regression analysis is a statistical technique that attempts to explore the relationship between one dependent variable and one or more independent variables. Alternative terms used for the dependent variable include the outcome variable, the response variable or the endogenous variable. A dependent variable is normally denoted by the symbol Y. Alternative terms for independent variables are predictor, explanatory or exogenous variables. Explanatory variables are normally denoted by the symbol X. It is common to write regression models in the form Y equals X1 plus X2 plus X3, etcetera. The last term will be an error term, often denoted by E; this captures everything that is missing. However, there are many different practices for writing regression models in mathematical form, so we will avoid all of that in this course. Variables can take many different forms in regression analysis. They can be continuous; in other words, data can be measured anywhere on the number line, to many decimal places, e.g. minus 2.3 or 500.3. Data can also be in integer format, such as 1, 2, 3, 4, 5, etcetera. Data can also be in binary format, such as zero or one; often these denote binary responses such as yes and no. Sometimes data are ordinal. Ordinal data is categorical data that is ranked, such as Likert scales. Finally, data can also be nominal. This is categorical data that is unranked, for example, modes of transport. Importantly, data must always be in numeric format. Mathematics and computer software can do very little with string type data. String type data is data that contains letters and other non-numeric characters like exclamation marks. Data can also be transformed, and this is a common feature of regression models. For example, taking the log of Y and making this the new dependent variable is a very common technique in regression analysis.
By doing so, the interpretation of the entire model will be changed, and clearly this needs to be carefully considered when using or analysing such models.
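As a purely illustrative aside (the course itself uses Stata and deliberately avoids formulas), the points about numeric formats and transformations can be sketched in a few lines of Python. The income figures and yes/no responses below are made up, not from the auto data set:

```python
import math

# Hypothetical data: three people's incomes (a continuous variable).
incomes = [20000.0, 35000.0, 60000.0]

# Regression needs numeric data, so string responses such as yes/no
# must first be recoded into binary 0/1 format.
answers = ["yes", "no", "yes"]
binary = [1 if a == "yes" else 0 for a in answers]

# A common transformation: take the log of the dependent variable.
# Regressing log(income) instead of income changes the interpretation
# of every coefficient in the model (roughly, to percentage effects).
log_incomes = [math.log(y) for y in incomes]
```

The transformation itself is trivial; the point the lecture makes is that it silently changes how every coefficient must be read.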
9. What is Linear Regression?: What is linear regression? Regression analysis is a catch-all term for every type of regression method. Often, regression methods are split into linear and nonlinear regression methods, and there are many methods in both of these two camps. In this course we will focus only on linear methods, specifically the ordinary least squares method, which is the most popular linear method. Linear regression assumes that variable parameters relate to the dependent variable in a linear way. Variable parameters are what we try to estimate with regression models and data to find a relationship between X and Y. We often call parameters coefficients. For example, a parameter or coefficient of one means that for every unit change in X, the dependent variable Y changes by one. Without getting too technical, linear regression assumes that dependent variables are measured as continuous variables. Explanatory variables can be measured in any way. When the dependent variable is non-continuous, the correct regression method is often nonlinear. However, there are instances where linear methods can be used when the dependent variable is not continuous. When there's only one explanatory variable in the model, in other words there is only one X variable, we call this simple regression. When there are multiple explanatory variables, we call this multiple regression. Most regressions are of the multiple kind, as in practice we usually want to test or evaluate many variables against a dependent variable Y.
10. Why is Regression Analysis Useful?: Why is regression analysis useful? Regression analysis is useful when quantitative evidence is needed to answer a particular question. Quantitative analysis, by definition, requires the analysis of numbers. The opposite of this is qualitative analysis, which analyzes non-numeric data such as words, stories, meanings or concepts. Regression analysis is useful because it allows for the testing of hypotheses. For example: do men really earn more than women? Is unemployment in the economy related to inflation? Or how much more ice cream is bought on sunny days? These kinds of questions can be answered with statistics, and you'll often hear a term like "statistically significant at the 5% level" in such analysis. However, regression also allows for predictions. Because regression models estimate parameters or coefficients, these parameters can then be used to compute new statistics. This can be done within a data sample and even outside a data sample. For example, after a regression of various explanatory factors on wages, we can use the estimated parameters to compute the expected wage of a very particular type of person, whether they're in the sample or not. This prediction is a great strength of regression methods, and it allows businesses, researchers and policy makers to compute various effects.
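To make the prediction idea concrete, here is a minimal Python sketch. The intercept and coefficients below are invented for illustration; they are not estimates from any real wage regression in the course:

```python
# Suppose a wage regression had estimated these parameters (made-up numbers):
intercept = 12000.0        # baseline wage when all variables are zero
coef_education = 2500.0    # wage change per extra year of education
coef_experience = 300.0    # wage change per extra year of experience

def predict_wage(years_education, years_experience):
    """Plug new values into the estimated equation to form a prediction."""
    return (intercept
            + coef_education * years_education
            + coef_experience * years_experience)

# The person we predict for need not be in the original sample:
print(predict_wage(16, 5))  # 12000 + 2500*16 + 300*5 = 53500.0
```

This is exactly what "prediction outside the sample" means: once the parameters are estimated, any combination of explanatory values can be plugged in.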
11. What Types of Regression Analysis Exist?: What types of regression analysis exist? There are many, too many to count. In fact, many advanced regression methods will be customized for the relevant research question and the data. However, there are some core methods that you should be aware of. These methods are primarily a function of the nature of the data and the nature of the dependent variable. The most common method is ordinary least squares. This method requires the dependent variable to be continuous, and it's often applied to cross-sectional data. Cross-sectional data is data that doesn't have repeated time elements within it. Ordinary least squares also serves as the basis for many advanced methods, such as weighted least squares. Next are three nonlinear methods. These methods are nonlinear because the dependent variable is not continuous anymore. Logit and probit models are useful for binary dependent variables. Ordered logit and ordered probit models are useful when there are multiple ordered categories in the dependent variable, and multinomial logit models are useful when there are nominal, unordered categories in the dependent variable. If you're wondering what logit and probit models are, these are simply two common ways to achieve a nonlinear relationship between the variables. While there are some mathematical differences between logit and probit models, in reality these often make little difference to the results. Also note that multinomial probit models also exist, but they're not frequently used, which is why I'm not listing them here. Next are panel models, both linear and nonlinear. There are many methods in each category, but the common feature is that they all work with data that is collected repeatedly over time. This could be short household panels or long, high-frequency trading time series. Next are count data models, which are similar to logit and probit models but use slightly different transformations to account for count properties in the data. Examples of counts are things like the number of doctor visits or the number of T-shirts sold. Finally, Cox proportional hazards models are often used when the dependent variable is time. A common example of a time dependent variable is the survival time of cancer patients, and this method is often used in the health sciences.
12. Explaining Regression: Explaining regression. Now that we have some basic understanding of the concepts behind regression analysis and also what types of regression there are, let's explore how it actually works. If you're an academic student, regression is often learned through a variety of equations, often matrix-type equations that have a lot of Xs and Ys and Es in them. They serve their purpose, but you don't actually need to understand them to learn how regression works. Using visual aids can achieve the same effect, and this is something we will focus on in this course. Simple linear regression is often explained through correlation, so let's follow that approach and then slowly keep building things up later. Correlation, sometimes called association or dependence, is the relationship between two things. In statistics, these things are often variables. Let's call them X and Y from now on. Note that both variables X and Y are connected through an identifier. Without this identifier, none of this will work. The identifier is often represented by a simple i, and we can imagine it to be something like individual people or firms or countries or anything else that can connect the two variables of interest. In this little table over here, there are three identifiers, and each identifier has one value of Y and one value of X. Let's go ahead and visualize a larger version of this table on a graph. I'm going to plot 100 data points on a scatter plot where the Y axis represents the variable Y and the X axis represents the variable X. This visual representation is slowly starting to tell us something. In this case, we seem to get a fairly good idea that there seems to be a positive relationship between Y and X; in other words, as X increases, so does Y. However, there's also some noise in the data, and there seems to be some clumping in the values of Y and X around zero. The relationship between the two variables can also change.
For example, the relationship could become weaker or even negative. Here we see an example of how data can change its relationship with each other. The correlation between Y and X becomes weaker, going all the way to no correlation and then becoming negative. We end up with a relationship that is almost the opposite of what we started with. Visually, it is quite easy to distinguish between extreme types of relationships; however, it can be more difficult to visually identify differences between only minor relationship changes. Take a look at this example. Here is some data that is correlated in different ways. It is easy to tell a plus one correlation apart from a minus one correlation; however, this task becomes more difficult for smaller correlation changes. At first glance, it would probably be quite hard to identify any difference between the first two graphs, even though their correlation is different. One has to look quite closely to identify that the relationship between Y and X has flattened off a little bit in the second graph. This becomes especially tricky if there are lots of data. If we have a million data points, then all we would see, for example, is a giant blob of blue. And that is why we often want to summarize the relationship between Y and X via some kind of data reduction process.
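The correlation being described here can be computed directly. A short Python sketch (with made-up numbers, not the plotted course data) shows how two variables, linked by a shared identifier i, are reduced to a single correlation number:

```python
import math

def correlation(xs, ys):
    """Pearson correlation between two variables sharing an identifier i."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# A perfectly positive relationship gives ~ +1,
# a perfectly negative one ~ -1, and noise lands in between:
print(correlation([1, 2, 3], [2, 4, 6]))   # ~ 1.0
print(correlation([1, 2, 3], [6, 4, 2]))   # ~ -1.0
```

This single number is the "data reduction" the lecture mentions: a million points collapse into one summary of their relationship.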
13. Lines of Best Fit: Lines of best fit: what are they and how do they work? A key thing to understand before jumping into the concept of how to produce lines of best fit is that there are two types of methods we can use. These are parametric and nonparametric methods. Parametric methods are methods that apply some kind of parameter, or many parameters, to data. Often the parameters will be in the form of an equation, such as Y equals 1X, where the parameter in this case is one. This is the approach used in regression analysis and in ordinary least squares, and it has the advantage of simplicity and of working with high dimensional data. A disadvantage is that it requires strong assumptions about the data, and when these assumptions are not met, your analysis might be completely wrong, and often you may not even know about it. Nonparametric methods let the data speak for themselves. The advantage is that you need to make fewer assumptions about the initial relationships in the data. A big disadvantage is that this method is not very transportable; in other words, you cannot easily tell other people about it. In addition, it becomes extremely hard to operate this kind of method in multi-dimensional environments. So we often use nonparametric methods to explore basic relationships between Y and X, and parametric methods to explore more complicated relationships between Y and X1 and X2 and X3, etcetera. Let's have a look to see what I mean by all of this. Let's start with a scatter plot of some new data. In this case, let's plot data from the Stata auto data set and try to figure out how the price of cars is related to the miles per gallon of petrol consumed by individual cars. The initial scatter plot here tells us there is some kind of relationship between a car's price and its MPG. It looks negative; in other words, downward sloping.
Now let's try to estimate what kind of relationship this is exactly. We'll begin with a nonparametric regression. There are many nonparametric methods, so let's pick one called local polynomial regression. Local polynomial regression is a form of moving regression. The user defines a bandwidth, or lets the computer choose one, and a regression is then estimated within that bandwidth. The bandwidth then moves continuously across the X axis step by step and repeats the analysis. The individual steps are then all stitched together to reveal what is essentially a moving average plot of the data. Let's see how this works in practice. The nonparametric method shown here slowly moves across the data space and continuously updates the relationship between Y and X. We see that the relationship between Y and X starts negatively, but it ends up being slightly more horizontal. In other words, the relationship between Y and X here does not appear to be entirely linear. A big advantage of this method is that it lets the data speak for itself and doesn't rely on specific functions or even theory to fit the data. One disadvantage of this method is that the relationship still needs some kind of input; in this case, it requires the size of the bandwidth. If we changed the bandwidth to something smaller, the relationship would look different. Here is an example of that. Another disadvantage of this method is that it is difficult to transfer this relationship to other users. How can we explain this squiggly line to somebody else? So we often choose a parametric relationship. A parametric relationship is one that can be defined by some kind of equation. For example, a linear line fitted through the data will have a gradient, and that gradient will be the parameter defining the relationship between Y and X. Let's plot a linear function through the data and see how this looks.
Here we see a linear line being fitted through the data. In this case, the line fit is based on minimizing the overall distance between the fitted line and all the available data points. This concept is known as least squares, and we'll explore it in more detail in the next session; it underpins the ordinary least squares regression methodology. The fitted line in this case has a particular slope: minus 238. In other words, for every one unit increase in MPG, the average car price appears to drop by $238. Great. However, parametric lines of best fit don't always need to be linear. We can also add a quadratic line of best fit; in this case, we recover two parameters that define the relationship between Y and X. Here is an example of that. In this case, the relationship between Y and X is parameterised by one parameter pulling Y down as X increases and another parameter pulling Y back up as X increases. In this case, the parameters are approximately minus 1200 for every increase in X and plus 20 for every increase in X squared. Don't worry about the X squared right now; we'll explore this later. But the important concept is that the functional form of parametric lines of best fit can be made to be very flexible, as long as enough parameters are available. So how does all of this relate to regression? Well, this is regression. Specifically, this is simple regression, where Y is regressed against one variable X. How about multiple linear regression? Multiple linear regression is an extension of simple linear regression, and it adds more variables to the mathematical framework. One easy way to visualize this is by adding further dimensions to the scatter plot, where each extra dimension represents an additional variable. Let's say, for example, that we wanted to explore the impact of MPG on car price while controlling for the car's weight. Heavier cars are likely to have poor MPG, and this may affect the price.
Visually, we can represent this by a three-dimensional scatter plot that plots price against MPG against weight. It would look a little bit like so. Moreover, by rotating the scatter plot, we can look at the relationship that each explanatory variable has with Y, and even examine how the explanatory variables are correlated with each other. Finally, what multiple regression analysis does is, instead of estimating a line of best fit through the data, it fits a plane of best fit through the data. This can be hard to visualize on a screen, but here is a crude attempt of mine. The left graphs show the actual data points on a 3D scatter plot, while the right graphs show the estimated relationship between these data points. This relationship is represented by a 3D plane. If more variables are added to the framework, the plane of best fit becomes a hyperplane of best fit, and this is why we sometimes hear people talking about multi-dimensionality when referring to regression analysis.
14. Causality vs Correlation: Causality versus correlation. Hopefully, the previous examples will have given you a good intuitive grasp of what regression analysis tries to do. There are lots of statistics and maths in each type of analysis, but the underlying concept will always remain the same: regression analysis tries to tell users how data are related to each other in a way that is easier to understand than looking at the raw data points. However, it is important to be keenly aware of the concept of causality versus correlation. Every regression method is a statistical method that correlates data. Beyond that, a computer or a mathematical equation cannot identify what is causing what. Causality is always interpreted by the end user, and some models allow better claims of causality than others. Evidence obtained from regression analysis about a strong and statistically significant relationship between two variables may be attributed to causality through a compelling theoretical framework and common sense. This can take a lot of practice and almost becomes an art form. Sometimes the data helps. For example, if yesterday's events are used to explain today's actions, the time element in the analysis can be used to make better causal inference. However, in other settings, such as cross-sectional survey settings, it can become much harder to attribute causality. Are people happy because they're healthy, or are people healthy because they're happy? These are tough questions to answer and require theoretical and philosophical reasoning in addition to statistics. So you should always be careful when dealing with regression analysis.
15. What is Ordinary Least Squares?: What is ordinary least squares? Ordinary least squares is a regression method that is based on the concept of least squares. Least squares is a statistical method that fits a line or plane or hyperplane of best fit by minimizing the sum of squared residuals between the line of best fit and the actual data points. We square these so-called residuals because the sum of them is exactly zero when they're not squared: negative and positive residuals above and below the line of best fit cancel each other out. Squaring solves this problem. Many other ways of fitting the line of best fit exist. One example is to fit a line by the method of least absolute deviations, where instead of squaring residuals, the absolute value of them is taken; in other words, negatives are turned positive. However, least squares is by far the most popular method across all the sciences.
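The cancellation argument is easy to verify numerically. This Python sketch fits a least-squares line to four made-up points and checks the residuals (an illustrative aside, not the course's Stata output):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]

# Least-squares slope and intercept (closed form for one X variable).
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Unsquared residuals cancel to (essentially) zero...
print(sum(residuals))                    # ~ 0.0
# ...but squared residuals do not, so their sum can be minimized.
print(sum(r * r for r in residuals))     # > 0
```

This is exactly why squaring is needed: without it, every line through the point of means would score the same, namely zero.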
16. Ordinary Least Squares Visual 1: To understand ordinary least squares better, imagine a small data set with a few data points, a little bit like this one. Ordinary least squares will fit a line through these data points. This line may be linear, but it can also be nonlinear. Let's go with a linear example. The red line represents the line of best fit, estimated by ordinary least squares mechanics. In this case, the line of best fit can be represented by a single slope parameter called beta. We often use the Greek letter beta to denote the slope of a regression line. This slope informs us of the estimated relationship between Y and X; in this case, Y is the price of a car and X is the mileage in MPG. The slope is negative, which means that as the MPG increases, the price of cars decreases. However, note that our slope does not hit any of the actual data points. That is because we're estimating an average relationship between all the available data points. The actual data points are often called observed data points, in other words Y observed. The predicted value of Y at any given value of X is then given by the line of best fit; these are called predicted data points, or Y predicted. The difference between the observed value and the predicted value is called the residual value. This is what ordinary least squares tries to minimize. You can see here that there are three data points and therefore three different residuals. The sum of these squared residuals is the smallest value that we can achieve in this case. If we change the line of best fit, for example by moving the line of best fit down, the total sum of the squared residuals will increase. This is a graphical explanation of what ordinary least squares strives to do: it finds a regression slope and an intercept that lead to the very smallest sum of squared residuals. Let's have another look at this with more data.
In this example, we're going to use the full auto training data to see what happens to the root mean square error when we apply different regression slopes to the data. In the left panel, we observe the regression slope going through the data. We'll start with a positive slope of plus 100. In the right panel, we see the size of the individual residuals. The residuals are squared and then square rooted to ensure only positive values remain; the lowest value a residual can have, therefore, is zero. Higher residual values mean that the relevant data point is far from the actual regression line. The average of all these residuals is called the root mean square error of the residuals, and this is depicted by the red line. It tells us how far, on average, the data points are from the regression line. Now let's look and see what happens when we change the slope. We can see that as we slowly change the slope of the regression line from positive values to negative values, the average error between the line and the data points decreases. The residuals, on average, are trending down as we decrease the slope. This keeps happening until, after a certain slope value, the average of the residuals starts to increase again. At a slope of around minus 230, the average error from our line of best fit is minimized, and therefore that is our line of best fit. Of course, this graph is a simplified version of what happens. Regression models can have many more variables and therefore many more parameters, and we would need many more dimensions to display such models graphically. Now let's take a look at how ordinary least squares models are often presented by computers.
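The slope-sweeping experiment can be mimicked in a few lines of Python. For each candidate slope we use the best intercept for that slope, compute the root mean square error, and watch where it bottoms out (toy data again, not the auto data set, so the minimizing slope here is 0.6 rather than around minus 230):

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]

def rmse(slope):
    """Average error of the line with this slope (best intercept for that slope)."""
    n = len(xs)
    intercept = sum(ys) / n - slope * sum(xs) / n
    return math.sqrt(sum((y - (intercept + slope * x)) ** 2
                         for x, y in zip(xs, ys)) / n)

# Sweep candidate slopes from -2.0 to +2.0 in steps of 0.1:
candidates = [s / 10 for s in range(-20, 21)]
best = min(candidates, key=rmse)
print(best)  # the RMSE bottoms out at the least-squares slope, 0.6
```

In real software nobody sweeps slopes like this (the minimum is solved for directly), but it is a faithful picture of what "minimizing the average error" means.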
17. Ordinary Least Squares Visual 2: Here's an example of how Stata presents regression output. Other computer programs may present it differently, but the essence of the information displayed will be similar among all programs. Often, part of the regression output displayed will be diagnostic information that provides high level information about the overall regression model. In Stata, this is usually the top part of the output. The lower part of the output table normally presents the estimated coefficients for the relevant variables. There are many pieces of information in this table; however, generally three pieces matter most. The first is the actual parameter estimates, in other words the estimated slopes or coefficients of the lines or planes of best fit through the relevant data. In Stata this column is called Coef, which is short for coefficient. Each explanatory variable has a relationship with the dependent variable, in this case price. Each explanatory variable is also conditional on the others. In other words, the effect of MPG, conditional on controlling for weight, is that for each increase of one unit of MPG, price drops by $49. The effect of weight is as follows: conditional on MPG, an increase of one unit of weight leads to a price increase of $1.70. The final variable is the constant. The constant is the value that the dependent variable, in this case price, takes when everything in the model is set to zero. In other words, at a weight of zero and at zero MPG, a car should cost around $1946 according to this model. Constants sometimes make sense, and sometimes they don't. In this case, it doesn't make a lot of sense, because cars would never have a weight of zero or an MPG of zero. Some people say that constants should be removed from models, especially when they don't make sense. I think that is wrong. You just need to take care when interpreting constants: often the constant should not be interpreted, but it should be left in the model.
The next most important piece of information comes from the column called Std. Err., which is short for standard error. The standard error statistic reveals with what degree of accuracy the slope coefficient is estimated. If the standard error is low relative to the coefficient, then we can be more certain that the estimated coefficient is close to the true population parameter. If the standard error is high, we can be less certain, and there is more noise around the estimate. The standard error is important because it allows us to determine to what extent the estimated coefficients from the regression model are statistically significant. The four remaining columns in the results output are all further computations of the standard error and are simply different ways to identify significance. The t statistic, the p value and the lower and upper confidence intervals are essentially all the same thing and are based purely on recalculations of this standard error. We'll look at what they mean in a moment. Finally, the third piece of information that matters most is something called R squared. This information is given in the diagnostic part of the output table and can be found here. R squared is a common indicator of goodness of fit for ordinary least squares regression models. It is bounded between zero and one, and higher values indicate that the model better fits the data. However, many professional users will caution against over-interpretation of R squared statistics: numbers are relative to the discipline. If you're working with behavioral data, such as people and their choices, then R squared values of 0.2 or 0.3 are very common and usually indicate good fitting models. If you're working with time series data, such as macroeconomic GDP measures, then R squared values of 0.8 or 0.9 are very common and indicate good fitting models. Finally, let's talk a little bit more about how the estimated coefficients are related to statistical significance.
Let's begin with the t statistic. This statistic is an indicator of statistical significance, and normally we're looking for a value of 1.96 or above when we're using a reasonably sized sample. A reasonably sized sample means around 100 or more observations in the model. The t statistic is easily computed by dividing the estimated coefficient value by the estimated standard error value. Note that when the coefficient is negative, Stata will produce a negative t statistic. The sign of the t statistic should, however, be ignored. Next to that is something called the p value. This is short for probability value, and it indicates the probability of obtaining the observed result of a test, assuming that the null hypothesis is correct. The null hypothesis in regression tables is normally that a specific result is no different from zero. In other words, small p values mean that there is stronger evidence in favour of the alternative hypothesis, the alternative hypothesis being that the coefficient is the actual estimated coefficient. In layman's terms, a number of 0.05 or below indicates statistical significance at the 95% level; numbers below 0.01 indicate significance at the 99% level, and so forth. Next are the confidence intervals. There is an upper and a lower confidence interval. Upper and lower confidence intervals are computed by adding or subtracting 1.96 times the standard error from the estimated coefficient. In other words, the confidence interval is usually about two standard errors away from the coefficient estimate. Confidence intervals are really useful because they allow you to quickly eyeball statistical tests. Any number outside the confidence interval range will be statistically significantly different from the coefficient estimate. In this example, mpg is not statistically significantly different from zero, because zero is within the confidence interval range. However, mpg is different from minus 500, because this number is outside the confidence interval.
This can be a really useful way to quickly perform statistical testing, and all it involves is multiplying the standard error by approximately two.
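Those recalculations are easy to reproduce. The sketch below, using hypothetical coefficient and standard error values, computes the t statistic and the 95% confidence interval exactly as described: divide the coefficient by its standard error, and add or subtract 1.96 standard errors.

```python
# Sketch with hypothetical values: rebuilding the significance columns
# from just a coefficient and its standard error, as described above.

def t_statistic(coef, se):
    return coef / se

def conf_interval_95(coef, se):
    # roughly two standard errors either side of the estimate
    return coef - 1.96 * se, coef + 1.96 * se

coef, se = -49.0, 86.0      # hypothetical mpg estimate and standard error
t = t_statistic(coef, se)
lo, hi = conf_interval_95(coef, se)
print(abs(t) >= 1.96)       # False: not significant at the 95% level
print(lo < 0 < hi)          # True: zero sits inside the interval
```

Note how the sign of the t statistic is irrelevant: only its absolute size against 1.96 matters, which matches the eyeballing rule above.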
18. Sum of Squares: Now let's take a look at the sum of squares in a bit more detail. The previous regression table also provided diagnostic information on the explained sum of squares, the residual sum of squares, and the total sum of squares. These values indicate how much variation is explained by the fitted model, how much variation is unexplained by the model, and how much total variation there is in the data. By comparing the proportion of explained sum of squares to the total sum of squares, we can produce something called the coefficient of determination, often called R squared. The R squared value is a widely used measure of fit for ordinary least squares models, and the value indicates how well the model fits the data. Values of one mean a perfect fit; values of zero mean a terrible fit. However, the basic R squared can only increase as more explanatory variables are added to the model. In other words, models with hundreds of random covariates can saturate the data and produce artificially high goodness-of-fit statistics. This is why we often also report the adjusted R squared, which imposes penalties for more variables being added to models. If additional variables are not statistically significant, they will reduce the adjusted R squared value. This statistic tries to strike a balance between rewarding good model building and penalizing the overloading of models with unnecessary variables. However, it should be noted that R squared can be easily abused and should be treated with caution. High R squared values do not necessarily imply that one model is more valid than another. Let's take a look at this example. In this demonstration, I'm going to change the noise level around the line of best fit. The true relationship between Y and X is one, and this is what is estimated by the line of best fit. The original data has very little noise, and the regression line hits almost every data point, resulting in an R squared close to one.
Now, let's go ahead and change the noise level around the true regression line. We can see that the R squared changes quickly. As we increase the noise around the data, the R squared quickly drops in value, suggesting that the model fits the data worse and worse. However, the model actually remains the same; what is changing is only the noise around the data. Noisier data results in a lower R squared value, and a layman observer might claim this to be a poor model. But as you can see, the relationship between Y and X hasn't changed at all, and the model continues to recover the correct coefficient value. Both models in this case have the same validity, even though they have different R squared values. And that is why I want you to always be careful with R squared. The R squared example leads us to our next discussion.
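The two fit measures discussed above follow directly from the sums of squares. A small sketch, with hypothetical values: R squared is the explained share of the total variation, and the adjusted version applies the penalty for extra regressors.

```python
# Sketch of the two fit measures discussed above, with hypothetical sums
# of squares. R-squared is the explained share of the total variation;
# the adjusted version penalises extra regressors.

def r_squared(ess, tss):
    return ess / tss

def adjusted_r_squared(ess, tss, n, k):
    # n = observations, k = explanatory variables (constant excluded)
    r2 = r_squared(ess, tss)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

ess, tss = 350.0, 1000.0
print(r_squared(ess, tss))                            # 0.35
print(round(adjusted_r_squared(ess, tss, 74, 2), 3))  # slightly below 0.35
```

Adding regressors (a larger k) can only shrink the adjusted value unless they genuinely improve the explained sum of squares, which is the balance described above.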
19. Best Linear Unbiased Estimator: Ordinary least squares is said to be the best linear unbiased estimator if certain conditions are true. Having an understanding of these conditions is important, as some matter more than others. These conditions are often called the Gauss-Markov assumptions and refer to four particular assumptions that need to be made about the data. If these assumptions are met, then the ordinary least squares estimator is said to be unbiased. In other words, the results produced by the estimator will, on average, be correct. If the Gauss-Markov assumptions are met, the OLS estimator will also be the best estimator. Best is another word for efficient in statistics. This simply means that the ordinary least squares estimator will produce the most accurate results with the least amount of noise. Let's explore these two concepts a little further before we discuss the actual assumptions. Efficiency refers to the width of the sampling distribution, and when an estimator is said to be most efficient, its sampling distribution is narrower than that of any other estimator. We can visualize this in an easy way by assuming we have two different estimators and an infinite amount of data. From this infinite amount of data, let's go ahead and select a small sample and then try to estimate a particular coefficient for a variable. We're going to use an inefficient estimator and an efficient estimator. We're going to set the true value of the coefficient to one. The first time we estimate the coefficients using both estimators, we return a value of around minus six for the inefficient estimator and minus two for the efficient estimator. Now let's go ahead and repeat this process. The second time, our estimates are closer. The inefficient estimator predicts a value of around minus one and the efficient estimator a value of around zero. Both are still some way off the true value, but the efficient estimator seems to be getting closer.
Let's go ahead and repeat this process quickly, hundreds of times, and see what happens. Both estimators, on average, get to the correct value of one. However, the inefficient estimator is, on average, further away with its predictions than the efficient estimator. This is the concept of efficiency, and since we normally don't have an infinite amount of data, this concept is often visible in the standard errors of real-life results. Inefficient estimators tend to have high standard errors, resulting in more uncertainty around the true estimated value. Next, let's explore the concept of unbiasedness. When an estimator is said to be unbiased, this means that the mean of the sampling distribution of the coefficient estimates will approximate the true population coefficient. We can visualize this in an easy way by again assuming that we have two different estimators and an infinite amount of data. We'll select a small sample of this data and try to estimate a particular coefficient. The true value of this coefficient is set to one, and this is denoted by the dotted red line. We use a biased and an unbiased estimator to estimate the same coefficient. The first pass produces an estimate of around zero for the biased estimator and 1.5 for the unbiased estimator. Now, let's do it again. On the second pass, the biased estimator performs better, with a result of three compared to the unbiased estimator's result of five. But let's continue and repeat this process many times. As we repeat the process, we see that, on average, the unbiased estimator starts to predict a value of one, while the biased estimator predicts a value of minus one. That can obviously be a big problem. For example, the objective might be to perform a policy evaluation, and a biased estimator estimates the policy to have a negative effect, while in reality it might actually have a positive effect.
Bias is a serious problem in econometrics, and ordinary least squares requires some pretty strict assumptions for estimates to be unbiased. It is important, then, to have some understanding of the assumptions behind ordinary least squares.
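The repeated-sampling demonstration above can be sketched with a toy simulation. The "estimators" here are artificial stand-ins (random draws with a chosen centre and spread), not real regression estimators; they only illustrate what bias and inefficiency look like across many samples.

```python
# Toy sketch (assumed setup, not a real regression): three artificial
# "estimators" drawing noisy estimates of a true coefficient of 1.
import random

random.seed(42)
TRUE_VALUE = 1.0

def unbiased_efficient():
    return random.gauss(TRUE_VALUE, 1.0)

def unbiased_inefficient():
    return random.gauss(TRUE_VALUE, 5.0)        # same centre, wider spread

def biased():
    return random.gauss(TRUE_VALUE - 2.0, 1.0)  # centred on the wrong value

def mean(f, n=10_000):
    return sum(f() for _ in range(n)) / n

def spread(f, n=10_000):
    xs = [f() for _ in range(n)]
    m = sum(xs) / n
    return (sum((x - m) ** 2 for x in xs) / n) ** 0.5

print(round(mean(unbiased_efficient), 1))  # close to the true value, 1
print(round(mean(biased), 1))              # close to the wrong value, -1
print(spread(unbiased_inefficient) > spread(unbiased_efficient))  # True
```

Averaged over many samples, the biased estimator settles on the wrong value no matter how much data we collect, while the inefficient one is right on average but noisy in any single sample.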
20. The Gauss-Markov Assumptions: The Gauss-Markov assumptions are the underlying assumptions that make ordinary least squares the most efficient and unbiased estimator. Generally, four major conditions are needed to achieve this result. These are the homoskedasticity assumption, the no perfect collinearity assumption, the linear in parameters assumption, and the zero conditional mean assumption, sometimes called the exogeneity assumption. Roughly speaking, the first two relate to efficiency, while the last two relate to bias. Let's explain each in turn and try to determine which matters most.
21. Homoskedasticity: The homoskedasticity assumption states that the variance of the residuals remains stable across the spectrum of the independent variables. In other words, the errors produced by a variable remain roughly constant whenever we look at a small part of that variable. Failure of this assumption leads to biased standard errors, and this means we can't rely on hypothesis testing. However, many modern statistical packages can easily test and correct for this assumption. It is very common, for example, to use something called robust standard errors, which reduce the efficiency of the estimates slightly but make them immune to the failure of this assumption. Let's go ahead and look at an example. In this video, there are two graphs. The left graph shows the relationship between the explanatory variable X and the dependent variable Y. The overall relationship never changes, but the variance across X will. In the right graph, we see the residuals, or errors, of X. It shows the distance of the actual data points to the line of best fit. The left graph also shows the slope estimate and standard error from a normal ordinary least squares regression and a robust ordinary least squares regression. Now let's go ahead and run this example and examine what happens when we introduce a changing variance across X. We see that as we increase the variance across X, the actual regression coefficient never changes. However, the standard errors increase as we increase the variance across X. Moreover, the robust standard errors increase by a little bit more. All this means is that the failure of the homoskedasticity assumption leads to less precise estimates. In the real world, with modern data sets, failure of this assumption often has little overall effect on the actual results, and most practitioners do not focus on this assumption a lot.
22. No Perfect Collinearity: This assumption states that an explanatory variable cannot be an exact linear combination of another explanatory variable. If this is the case, ordinary least squares simply cannot be estimated. This is rarely a problem in real life, as you would never enter the same variable twice into a regression. However, when there is partial correlation between two variables (in other words, they measure the same thing to some extent), then we call this multicollinearity, and this can have some effect on our estimates. Specifically, it will increase the noise and therefore the standard errors of our estimates. This phenomenon is generally easy to test for and also easy to deal with, by either excluding variables or transforming them. Let's look at an example. In this example, I generated a data set that has five different explanatory variables, ranging from X1 to X5. Each X variable has a coefficient of one. The graph on the right presents the estimates from an ordinary least squares regression and the associated 95% confidence intervals around these estimates. We can see that ordinary least squares estimates a value of approximately one for each of the five variables. On the left graph, we see the correlation between X1 and X2. Currently, there is no correlation between the two variables, which is why the data points are scattered randomly. Let's go ahead and see what happens when we start to introduce a correlation between X1 and X2 and slowly force X1 and X2 to measure the same thing. At first, not much happens. But then, as the correlation between the two variables increases, the standard errors and therefore confidence intervals of both X1 and X2 start to increase. This happens until they explode towards the end. This is the effect of collinearity. High collinearity between variables leads to very noisy estimates. But as you see, the noise explosion only happens towards the very end.
And in most real scenarios, the effects of collinearity are hardly noticeable.
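One standard way to quantify this noise inflation is the variance inflation factor, which most statistical packages report. A small sketch of the formula: for regressor j, the VIF is 1/(1 - R²_j), where R²_j is the R squared from regressing X_j on the remaining explanatory variables.

```python
# Sketch of the variance inflation factor, a standard collinearity
# diagnostic. For regressor j, VIF_j = 1 / (1 - R2_j), where R2_j is the
# R-squared from regressing X_j on the remaining explanatory variables.

def vif(r_squared_j):
    return 1.0 / (1.0 - r_squared_j)

for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 1))
# The inflation stays modest until the overlap becomes extreme:
# 0.0 -> 1.0, 0.5 -> 2.0, 0.9 -> 10.0, 0.99 -> 100.0
```

This is exactly the "explosion only towards the very end" pattern from the demonstration: the VIF grows slowly at first and then blows up as R²_j approaches one.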
23. Linear in Parameters: The next assumption is that the model is linear in parameters. This assumption means that the relationship between Y and the Xs in the ordinary least squares model is linear. In other words, the coefficient estimates take single values and can only be added or subtracted. They cannot be exponentiated, divided, or multiplied. In general, this assumption makes ordinary least squares regression models easier to interpret. Note that this only applies to the estimated coefficients. Variables can be transformed in any way, including nonlinear ways. We often call this the functional form, and we can vary the functional form as we please in ordinary least squares regression. For example, it is common to add higher-order polynomials of variables to a regression equation. A commonly used example is age and age squared, where both variables are entered separately. This has the effect of introducing a curve into the line of best fit. Variables can also be interacted with each other, and we call these interaction effects. This means that lines of best fit can take on very complicated functional forms. Let's go ahead and look at an example. In this example, there are two graphs. The left-hand side shows the data plot of the auto data, where the price of cars is plotted against mpg. The right-hand graph shows the residuals, or how far the individual data points are from the line of best fit. The average distance is represented by the red horizontal line. The initial relationship plotted through the data is linear, but it should be fairly obvious that this relationship is probably not a good fit. So let's introduce a quadratic into this relationship and slowly increase the coefficient on the quadratic term from zero. Here's what happens: the line of best fit starts to curve upwards. This results in a better fit, and we can see the residuals coming down, especially for higher values of mpg. Our model fit improves. At some point, we overfit the model by continuously increasing the quadratic coefficient, and the model fit becomes worse again. This example highlights the power of functional form. The model is still linear in parameters, because the two estimated coefficients are only added or subtracted, but the squared manipulation of X leads to a complicated, nonlinear functional form that improves the model fit.
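A minimal sketch of what "linear in parameters" means in practice, using the age and age-squared example with hypothetical coefficients: the fitted curve is nonlinear in the variable, but every coefficient still only multiplies its regressor and gets added up.

```python
# Sketch (hypothetical coefficients): a model with age and age-squared.
# The fitted curve is nonlinear in age, but the model is still linear in
# parameters: each beta only multiplies its regressor and the terms are
# added together.

def predict(age, b0=10.0, b1=2.0, b2=-0.02):
    # b0 + b1*age + b2*age^2: the betas enter additively
    return b0 + b1 * age + b2 * age ** 2

# The quadratic term bends the line of best fit into a curve that rises,
# peaks (here at age 50), and falls again.
print(predict(20), predict(50), predict(80))
```

Squaring happens to the variable before estimation, never to the coefficients, which is why the assumption is still satisfied.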
24. Zero Conditional Mean: Zero conditional mean, often called the exogeneity assumption, is one of the most important assumptions in ordinary least squares. The assumption states that there is no correlation between an explanatory variable X and the error term. Failure of this assumption leads to bias in the coefficient estimates. This assumption can often fail in real life, and because it involves the error term, which by definition is not observable, it can never be tested. A good rule of thumb is that whenever a variable is a choice, especially an individual choice, then it's likely to be driven by factors that are unobserved, and hence a relationship with the error term might exist. Let's have a look at an example. In this example, I've set up a simulated data set that again contains five explanatory variables; each variable has a coefficient of one in relation to Y, the dependent variable. On the right-hand graph, we can see the individual ordinary least squares estimates and associated confidence intervals for each of the five variables. The correct results are shown by the vertical red line. On the left graph, we see the correlation that variable X1 has with the error term. Note that in reality we can never observe this, as the error term will always be hidden from us. Only in this simulated example can we see the error term. The original correlation between X1 and the error term is set to around zero. Now let's go ahead and increase the correlation between X1 and the error term and see what happens. We observe that the ordinary least squares estimate for X1 slowly deviates to the right, away from its true value. The more we increase the correlation between X1 and the error term, the higher the bias in our result becomes. This can be a real problem in applied work, and when we have such a problem, we often call it endogeneity.
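A toy simulation of this failure, assuming a simple one-regressor setup: x is constructed to share a component with the error term u, and the OLS slope, computed as cov(x, y)/var(x), is pushed away from the true coefficient of one.

```python
# Toy simulation (assumed setup): the regressor x shares a component with
# the unobserved error u, so the OLS slope cov(x, y) / var(x) is biased
# away from the true coefficient of 1.
import random

random.seed(1)
n = 20_000
u = [random.gauss(0, 1) for _ in range(n)]     # the error term
z = [random.gauss(0, 1) for _ in range(n)]     # a clean component
x = [zi + 0.8 * ui for zi, ui in zip(z, u)]    # x correlated with u
y = [1.0 * xi + ui for xi, ui in zip(x, u)]    # true coefficient: 1

def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

print(round(ols_slope(x, y), 2))  # well above 1: the estimate is biased
```

No amount of extra data fixes this: the slope converges to the wrong value, which is exactly why endogeneity is treated as a bias problem rather than a noise problem.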
25. How to Test and Correct Endogeneity: It is not possible to test for something that cannot be seen. That is why good ordinary least squares models are strongly underpinned by theoretical frameworks, prior literature, and rational argumentation. This assumption is also why many scientists argue against data mining with ordinary least squares models; data mining approaches increase the likelihood of the exogeneity condition failing and the results becoming biased. In the real world, the way to deal with endogeneity is often more data, better and more thoughtful model building, different functional forms, and also sometimes simply accepting that the models may have some bias.
26. The Gauss-Markov Assumptions Recap: Let's recap the Gauss-Markov assumptions. The linear in parameters assumption is a condition that requires all betas to be additive. It means, in layman's terms, that the dependent variable should be continuous, but it does not mean that the relationship between Y and X must be linear. More complicated functional forms can be worked into ordinary least squares regression models. Violation of the zero conditional mean assumption, often called the exogeneity assumption, can lead to biased estimates. This is a very important assumption. It is not possible to test for it statistically, and identifying or defending against it must be done on theoretical grounds. There is no easy solution if this assumption is violated. Options are to include missing variables in the regression model, to attempt alternative identification techniques, or to resort to simulation-type methods that try to identify the size and direction of any potential bias. The no perfect collinearity assumption must be met or ordinary least squares regression won't work; however, weaker collinearity between variables will only result in increased standard errors. Fortunately, standard errors only explode at extreme correlations, and this can be tested for and corrected by either dropping variables or transforming them. Violation of the homoskedasticity assumption leads to incorrect standard errors. It is easy to test for using appropriate statistical tests, and easy to correct for with robust standard errors, which are included in almost all statistical software packages.
27. Applied Examples: Okay, let's explore some of these concepts we've been discussing in a more applied environment. We're now in Stata, which is a statistical software package commonly used to analyze quantitative data sets. It's similar to other packages such as SPSS and R. I won't explain how to operate Stata or the code that I'm executing to obtain the results; you can learn more about Stata in specific Stata courses. I've already opened up a training data set called Auto. Let's go ahead and examine it a little bit closer before we start running regressions. A common mistake is to start analyzing data too quickly, before fully understanding what's actually inside the data. Modern data sets can be very complex, and more often than not, the time spent on data preparation and manipulation will outweigh the time spent on actual regression analysis. Let's describe the data to see what we have. The output returned by describe will produce some high-level information about the data, such as where it is located, how many observations, and how many variables are included. In this case, our data contains 74 observations and 12 variables, so it's not very big. It also has a title that tells us that this data is related to cars from 1978. Below that is information about the variables. One of them is a string variable that contains the names of the car types, and the rest are all numeric variables. Let's pretend that we are really interested in explaining the determinants of car price. We can already start building a picture in our head of what variables might be important in explaining the price of a car. Weight and mileage seem like important variables, while turning circle is probably less important to most people who buy cars. Next, let's explore some summary statistics of the data so that we get some idea of how the variables are measured and distributed. Price appears to be measured in dollars, and the least expensive car costs around $3,000.
The most expensive car costs around $16,000. Such prices seem reasonable for 1978. We also see that the variable rep78 has some missing observations; it only has 69 instead of 74. Most variables also appear to be continuously measured. However, it looks like the variable foreign is measured as a binary variable. Let's go ahead and confirm this quickly by tabulating foreign. We see that indeed foreign is measured as a binary variable; around 29% of cars are foreign. So let's go ahead and estimate some ordinary least squares regression models. Rather than immediately going into a full-blown model with many variables and interaction terms, let's build it up slowly and interpret the output and diagnostics along each step. The variable foreign lends itself to a nice, simple question: are foreign cars more expensive than domestic cars? We could answer this question by quickly computing the mean for both subsets of the data and simply comparing the means. However, we can also achieve the same thing in a regression framework. Let me show you. This code regresses the dependent variable price on the explanatory variable foreign. The regression results in this table are pretty easy to interpret, but before we do that, let's quickly look at some diagnostics. The regression includes 74 observations, so that's good; there are no missing observations. The F statistic is not significant here. We're looking for p values below 0.05; values above 0.05 imply that the total model, in other words all the variables in our ordinary least squares regression together, does not explain how price varies. Likewise, the R squared is extremely low. A value of 0.0024 means that we explain almost nothing in terms of price variation with the variable foreign. Now let's go and look at the result. We have one variable, foreign. However, this is a binary variable, not a continuous variable. Such variables have the following interpretation: consider what happens if the value of the variable is flipped from 0 to 1.
In other words, if a car changes from being a domestic car to a foreign car, by how much will the car's price increase? The answer here appears to be $312. However, we also observe that the standard error around this estimate is quite large. The standard error is $754. That means the associated t statistic is below 1.96 and the p value is above 0.05. This means this variable is not statistically significant at the 95% level. We get an idea of the uncertainty by looking at the confidence interval. This ranges between minus $1,200 and plus $1,800, so the true value is somewhere in there. But because the confidence interval crosses zero, we cannot claim statistical significance compared to the value zero. Finally, remember that the effect of a variable is conditional on the other controls. In this case, there are no other variables in the model, but there is a constant, and the constant is the value of price if everything else is set to zero. In other words, if a car is domestic and its value of foreign is set to zero, it will cost $6,000. A foreign car is $312 more expensive, so it would cost around $6,300. We can also visualize this. Here we see the estimated effect of foreign cars on price. Domestic cars are cheaper on average, and foreign cars are more expensive by $312, but the confidence intervals of both values are so large that they're not statistically different. Great, let's go ahead and increase the number of variables in our model. We could throw all our variables in and simply see what sticks. This is what a data mining approach would generally do. Stata has various data mining abilities, including stepwise regression, which will automatically eliminate variables that are not statistically significant. However, there are some conceptual problems with this approach. One of the most important problems is that it prevents users from thinking about the problem at hand and doesn't allow them to
understand how their data analysis is related to underlying theory or their research hypotheses. For this demonstration, let's go ahead and slowly add one variable after another to our regression model. We will not remove foreign, even though it is insignificant, because the addition of other variables may change its effect. Let's go ahead and add the miles per gallon variable. We see now, interestingly, that some immediate, significant changes have occurred. Our R squared has jumped drastically to 0.28. The adjusted R squared is a little bit lower at 0.26, but this is still much, much higher than before. Our new variable mpg is statistically very significant, with a small standard error and a high t statistic. Each increase of one unit of mpg, in other words cars getting more fuel efficient, will decrease the car price by $294. However, we also see that the effect of foreign cars has increased dramatically, to plus $1,700. The standard error has come down a bit, from 754 previously to 700 now, and the variable is now statistically significantly different from zero. What a big difference one variable makes to our model. Importantly, we can explain this change. It turns out that foreign cars have significantly higher mpg numbers than domestic cars, and once this factor is controlled for, the actual price of foreign cars is higher than that of domestic cars. This is because the effect of mpg on price is negative, and because foreign cars have higher mpg, their price appeared lower. Now that this effect has been controlled for, and therefore taken out of price, the actual effect of a car being foreign is that it causes a price rise. This is a perfect example of the exogeneity assumption I was talking about in the previous session. We omitted an important variable from our regression model, and the explanatory variable we did include was correlated with that important variable in the error term, so therefore the previous result was biased.
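This omitted-variable story can be sketched in a small simulation: y truly depends on x1 and x2 (both with coefficients of one), but x2 is dropped from the regression. Because x1 and x2 are correlated, the short regression's slope on x1 absorbs part of x2's effect, just as foreign absorbed the mpg effect above. The setup and numbers are mine, purely for illustration.

```python
# Toy simulation (assumed setup) of omitted-variable bias: y depends on
# x1 and x2 with true coefficients of 1, but x2 is left out. Since x1
# and x2 are correlated, the short regression's slope on x1 is biased.
import random

random.seed(7)
n = 20_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]   # correlated with x1
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

# The true effect of x1 is 1, but the short regression drifts towards
# 1 + (coefficient on x2) * (slope of x2 on x1) = 1 + 1 * 0.5 = 1.5
print(round(slope(x1, y), 2))
```

Bringing x2 into the model removes it from the error term, which is exactly what adding mpg did for the foreign estimate.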
However, because we have now moved the offending variable, miles per gallon, from the error term into the regression model, we are controlling for it and have hopefully produced a less biased estimate. This really shows the importance of careful model building. Let's go ahead and introduce a third variable to our model: weight. Weight is likely to be an important variable, because heavy cars need more raw materials, but also because heavier cars are likely to affect the miles per gallon number, and we know that this in turn affects the foreign estimate. So let's go ahead and add it to our regression model. Look at that: the R squared now jumps up again by a large margin, and our estimated effects have changed again. Let's explain it one more time from the top. The new variable weight is statistically significantly different from zero, due to its small standard error, high t statistic, and small p value. The effect is positive. In other words, each additional pound of weight on the car increases the price by $3.46. The effect of mpg is now positive instead of negative, so the inclusion of weight reversed the sign of this estimate. Higher mpg cars now lead to higher prices, although this effect is not statistically significant. This makes sense after all: higher mpg cars are more fuel efficient and save money. This may require better technology, and therefore such cars may cost more. However, this effect was previously masked by the fact that heavier cars had worse mileage. Now that this is controlled for, the effect of mpg has become less biased. Moreover, because there is a knock-on effect of mpg on foreign status, we see that the effect of foreign cars now jumps to $3,600, with a lower standard error of 680. This is another important example of regression bias, where important explanatory variables were left in the error term. Let's assume for a moment that we're now finished with our model building and that we're happy with the specification that we have.
The next step is usually to perform some diagnostic statistics, especially in relation to the Gauss-Markov assumptions discussed in the previous session. Unfortunately, the exogeneity assumption cannot be tested. It can only be inferred, by adding other variables to the model as just shown, or by resorting to theory. We can, however, test the homoskedasticity assumption, so let's go ahead and do that. Here, Stata performs a test for homoskedasticity. The results show that the null hypothesis of constant variance is rejected in favour of the alternative hypothesis of heteroskedasticity, in other words, varying variance. We can also explore this visually by examining the residuals. Here, we've plotted the residuals versus the fitted values. This residual versus fitted plot shows how the residuals are distributed around the plane of best fit; values close to zero mean a good fit. We can clearly see in this plot that as we move from low fitted values to higher fitted values of price, the variance of the residuals around zero increases. This is clear evidence of changing variance and needs to be dealt with. We can either use robust standard errors or specify a different functional form that tries to remove this changing variance. Improving model fit is often a better first option, and in this case the problem might be caused by the fact that, like many price variables, car price has a long tail. Often we transform such variables with logs, so let's go ahead and do that. Now let's run a new regression with the dependent variable as log price instead of price, and let's see what happens. At first glance, it looks like everything has changed. The coefficients are completely different. However, this is because we have now transformed the dependent variable. All explanatory variables now relate to the log of price and not to price itself, and this means their interpretation is slightly different.
A one unit increase in weight increases the log price of a car by about 0.004. This can be a rather inconvenient way to interpret our model estimates, so we often re-transform the coefficients to make them easier to understand. When a regression model has no log transformation, either for the dependent variable or the explanatory variables, we call this a level-level model, and the interpretation is straightforward. When an explanatory variable is in logs, the interpretation of the coefficient changes to: a 1% increase in X causes a beta divided by 100 unit change in Y. When the model has a log dependent variable, the interpretation changes to: a one unit change in X causes a beta times 100 percent change in Y. And when the model is a log-log model, the interpretation is that a 1% change in X causes a beta percent change in Y. So in this case, a one unit change in weight causes a 0.004 times 100, equal to 0.4%, increase in price. Likewise, foreign cars now cost around 53% more in terms of price. Now let's go back and test the homoskedasticity assumption again. The test statistic reveals that we can now accept the null hypothesis of homoskedasticity. We can also visualize this again using the residual versus fitted plot. Here, we can see that as we move along the fitted price values, the spread of the residuals around the horizontal zero line is much more even. This is visual evidence that our model now has homoskedastic errors and that we can accept that particular assumption. Next, let's check for collinearity. This variance inflation factor test highlights to what extent each variable inflates the variance of the model. High values (above 10 is a common rule of thumb) for particular variables indicate that those variables are collinear with other variables. Here, there is no evidence of high collinearity in our model, because all variables have very low variance inflation factor values. Finally, we can also introduce more complicated functional forms.
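As an aside on the log interpretation rules above: multiplying the coefficient by 100 is an approximation that works well for small coefficients; the exact percent change for a log dependent variable is 100 times (exp(beta) - 1). A quick check, with illustrative coefficient values:

```python
# Sketch of the rule of thumb above versus the exact transformation for a
# log dependent variable: beta*100 approximates the percent change, while
# 100*(exp(beta) - 1) is exact. Coefficients below are illustrative.
import math

def approx_pct(beta):
    return 100 * beta

def exact_pct(beta):
    return 100 * (math.exp(beta) - 1)

for beta in (0.004, 0.05, 0.53):
    print(beta, round(approx_pct(beta), 2), round(exact_pct(beta), 2))
# For weight's tiny coefficient the two agree; for foreign's 0.53 the
# approximate "53% more" understates the exact effect of roughly 70%.
```

The rule of thumb is therefore safe for small coefficients like the one on weight, but worth double-checking for large ones like the one on foreign.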
Parameters must be linear, but variables can be transformed and offer more complex forms than just linear relationships. For example, we can include a weight-squared variable in the regression to allow a quadratic relationship to exist between log price and weight. In this new regression, Stata has included a weight and a weight-squared variable. It's important that such related variables are analysed together: so whilst the weight variable is not statistically significant and the squared weight variable is statistically significant, a joint test should be done on both to see whether the pair is significant or not. Let's assume for a moment they are jointly significant. The interpretation of the output becomes a little more complicated, but such effects can also be visualized, and Stata can do that for us. Here we can see that the relationship from our predicted ordinary least squares model between weight and log price is not actually linear, but appears to be quadratic in nature. In other words, there is a curve going through the relationship between price and weight: as weight increases, the log price increases more and more. Great. Now let's assume we're done model building. Regression models are often not presented as they are shown by statistical programs. There's simply too much information in the tables produced by statistical programs, most of which is redundant or not useful to lay readers. It's also common to include multiple regression models in a table so that readers can follow the progress of the coefficients as additional variables are included in or removed from the model. Here is an example of how regression tables often look in reports. This is a classical regression output table that contains the coefficients to three decimal places and standard errors to three decimal places. Asterisks are included to easily identify statistically significant effects, and the diagnostics only include the observation count and R-squared.
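The quadratic specification and the joint test discussed above can be sketched as follows. The data are simulated with a known curved relationship (all coefficient values are hypothetical), and the joint significance of weight and weight-squared is checked with a standard F-test comparing restricted and unrestricted sums of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
weight = rng.uniform(1.5, 4.5, n)
# Simulated quadratic relationship in log price (coefficients hypothetical).
logprice = 1.0 + 0.1 * weight + 0.05 * weight**2 + rng.normal(0, 0.15, n)

def fit_ssr(y, cols):
    """OLS with an intercept plus the given columns; returns the SSR."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

# Unrestricted model: weight and weight^2. Restricted: intercept only,
# i.e. both coefficients jointly set to zero.
ssr_u = fit_ssr(logprice, [weight, weight**2])
ssr_r = fit_ssr(logprice, [])

q, k = 2, 3   # number of restrictions, parameters in the full model
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(F)      # joint F-test of weight and weight^2 together
```

A large F relative to the F(2, n-3) critical value says the pair is jointly significant even if one of the individual t-statistics is not, which is exactly the situation described in the lecture.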
This table easily allows readers to read across and examine how the effect of the variable foreign on price, for example, changes as we change our model specification. This kind of approach is important as it is a transparent approach, one that shows the ingredients of how this particular statistical meal was made. Readers can judge for themselves whether they agree with your particular conclusions or not. And this concludes this practical session on ordinary least squares regression.
28. Final Thoughts and Tips: Final thoughts and some tips. Hopefully you've enjoyed this introduction to linear regression analysis. I have some tips you may want to consider when applying regression analysis to data. Practice: as with many things in life, it is practice and frequent application that lead to an ever greater understanding of the issue at hand. The same is true for regression analysis. All the theory in the world will not overcome a lack of engagement or application. I always recommend that people just get stuck in and start exploring data. Think carefully about your original objective. Are you trying to simply understand correlations in your data, or are you trying to determine cause and effect? The first can be done through simply playing around with the data and the regression model. The second will need much more deliberate thought about theoretical underpinnings and rational argumentation: why might X cause Y, what could the transmission mechanism be, and what else might influence such transmission? Estimate multiple models with small variations. Results are more convincing when different models continuously show the same kind of outcome. Does the inclusion of a particular variable change everything, or do your coefficients remain robust? Showing a pathway towards your final preferred specification is a very important part of modern regression analysis. Data quality and sample size matter as much as model building. Big innovations in data quality and size have happened since the 1980s. Not every model needs to be a complicated thing: quality in data can add significant credibility to any results, and you should not shy away from claiming that this data is the best data available to answer this particular research question. High-quality datasets often require complicated data manipulation. A lot of regression mistakes do not emerge from bad model building, but from poor data coding.
Do not underestimate the amount of time that should be spent on data cleaning and preparing the data for regression analysis. Ordinary least squares is still the most commonly used regression method in the world, and it would be wrong to dismiss it as a simplistic method. Playing around with functional form through interaction effects can lead to complicated ordinary least squares models that closely resemble reality, so do not be afraid to explore more complicated models that use quadratic terms and other interaction terms. Understand the role of diagnostics in regression analysis. Do not get hung up about textbook diagnostics, but do query whether regression assumptions about the data hold: are there assumptions that might be too strong for the data at hand? Finally, have a healthy dose of skepticism when someone is claiming a causal relationship, as regression coefficients often contain some kind of bias. At the same time, don't be a naysayer and reject everything. Like many things in life, regression analysis is an extra tool that should be used in conjunction with other evidence, such as prior results, theoretical frameworks and also qualitative evidence. There is a fine line between art and statistics in regression analysis.
29. Suggestions for Further Learning: Next steps. This course was intended as an easy introduction to a very complicated topic. Should you wish to know more, I recommend gaining some understanding of the following. Next, nonlinear models, specifically probit and logit models to analyse binary dependent variables. These regression models extend naturally to variations that also analyse multi-categorical data. However, understanding the basics of interpreting nonlinear coefficients is vital to engaging with such models. Apply regression to actual data using statistical software such as SPSS, SAS, R or Stata. Some of the best learning takes place when models are applied to real data. Play around with different regression models and explore the diagnostic tools in the software packages. Advanced users will need to learn new coding languages, and there are many great books and courses available for each of these software packages. Learn about simulation. There are many statistical regression methods, and a good way to learn more about each of them is to learn how to generate your own custom data from nothing. Simulation allows you to do that and is a great way to explore what new estimators do to your particular data. Read relevant applied reports. No report that uses regression analysis goes straight into the results; there is always context, data description, theory, prior literature and a methodological setup. Learn how applied work is done in your field by reading government reports, consultancy reports or academic papers; copy their style and learn how to write stories around your results.
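The simulation idea above can be tried in a few lines: generate data from a model whose coefficients you chose yourself, then check that an estimator recovers them. A minimal sketch (all numbers are made up for the illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Generate data from a known model, so the "true" answer is known in advance:
# y = 2.0 + 0.5*x + noise.
x = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

# Fit OLS and see whether the chosen coefficients are recovered.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # should be close to [2.0, 0.5]
```

From here you can experiment: drop a variable that belongs in the model, add heteroskedastic noise, or shrink the sample, and watch what each change does to the estimates. That is the learning value of simulation.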