Stata: Panel Data Analysis | Franz Buscha | Skillshare

10 Lessons (1h 45m)
    • 1. Intro

    • 2. Setting up panel data

    • 3. Panel data descriptives

    • 4. Lag and lead operators

    • 5. Linear panel regression (OLS, RE, FE)

    • 6. The Hausman test

    • 7. Non-linear panel regression (OLS, RE, FE)

    • 8. Difference-in-Differences

    • 9. Parallel Trend Assumption

    • 10. Difference-in-Differences without Parallel Trends

About This Class

This class is a Stata module that explores how to analyse and model panel data using the statistics software Stata.

Panel data analysis examines longitudinal data that contains repeated measurements on observations over time. Common examples are household surveys, country surveys or medical surveys. 

The key concept of panel analysis is that time needs to be properly taken into account. Panel data analysis splits the variation in the data into multiple components, allowing more complex assumptions to be modelled. Common estimators include the pooled OLS regression model, the random effects regression model and the fixed effects regression model.

If you are working with, or analyzing, data in which the same observations are followed over time, then you will require a special class of modelling techniques.

In this class I will highlight some of the most important concepts of panel data analysis including:

  • What is panel data
  • How to perform descriptive statistics
  • What are lags and leads
  • Pooled OLS, RE and FE regression estimation.

You are expected to have a basic understanding of Statistics and Stata to get the most out of this course.


1. Intro: Hello and welcome to this short course on Stata panel data analysis. This course is a Stata module that aims to provide viewers with a quick overview of how to analyze panel data in Stata. To gain the most from this course, you should have a basic understanding of statistics, specifically regression analysis, and a basic understanding of Stata. If you don't have the first, I recommend that you watch my Easy Statistics course, which focuses on the concept of regression analysis. If you don't have the second, I recommend that you watch my Learn Data Analytics with Stata course, which gives you a great primer on how to use Stata. Please note that Stata is proprietary software; it is not free. But don't worry: even if you don't have access to Stata, you'll still learn something from this course.
Panel data analysis is concerned with analyzing longitudinal data. This kind of data has repeated survey information over time. Panels have a minimum of two time periods and no upper limit. However, the statistics for long panels differ from the statistics for short panels, and in this course we'll look at the statistics and regression methods used for short panels with many observations. We call these large N, small T panels. These kinds of panels and methods are often found in the social and political sciences. If you're analyzing data that comes from such panels, you should be aware of the special modelling techniques involved. In this course I'll highlight some of the most important concepts of panel data analysis, including what panel data is and how to set it up, how to produce descriptive statistics, and how to use lags and leads. We'll also explore the three main panel regression methods. These methods often form the basis of further panel data analysis, and if you're working with such data, you should know about them.
This course is not intended to be an in-depth mathematical exploration of panel data analysis. There are very few equations in this course. Each session will provide a quick overview of the statistical issues at hand and then move on to Stata, where I'll demonstrate how to do things and how to interpret things. The outcome of the course is to give viewers a clear, basic overview of how to handle panel data. Because this is an advanced topic, all Stata interaction is via code, and you should have a basic understanding of Stata coding. So let's go and learn some panel data analysis with Stata.
2. Setting up panel data: In this chapter, we're going to learn how to analyze panel data in Stata. Panel data is very similar to cross-sectional data, with the addition that observations are tracked over multiple time periods. Panel datasets are some of the best datasets that social scientists can obtain, but they are also often the most expensive. The main advantage of using panel data is that it contains more information and more variability than cross-sectional data, and that makes it more efficient. Panel data also allows time dynamics to be investigated, something that is not possible with cross-sectional data. Because of these advantages, panel data often allows scientists to construct more complex behavioural models. And that's a good thing.
Before we delve into Stata, it's important to take a minute to make sure we understand what panel data is. At its simplest, panel data is cross-sectional data that is gathered over multiple time periods. However, scientists often assume that panel data has a large observation count and a small number of time periods. We call this large N, small T data. A good approximation is that anything with fewer than 20 time periods is usually called panel data. Datasets that have a small observation count but are sampled over many time periods, say 100 or 200 years, are called time series datasets. We refer to these as small N, large T datasets.
Datasets that have a large N and a large T can be considered big datasets, something that has emerged only in the last few decades. The important thing to understand is that each data type has a different mathematical and statistical approach. In these sessions we'll focus on the statistics of large N, small T datasets.
Panel data can come in several forms. The first, and perhaps most unrealistic, case is the balanced case, where you have complete data for every observation in every time period. The more realistic case is likely to be the unbalanced case, which includes gaps and dropout in your data. For example, as presented here, observation one drops out after the first time period and never returns to the data. You should keep in mind that when you use unbalanced data, your sample size may change during your analysis.
Finally, it's important to understand the difference between wide-form and long-form data. To perform panel analysis in Stata, we must first convert our data to long form. If the data arrives like that, great. But often it will not, in which case you will need to convert it from wide to long. Wide-form data has the property that each variable is recorded separately for each time period. In this example, we have three observations, measuring age and gender. Both age and gender are measured in time periods 1 and 2, and this is represented by the variables sex1 and sex2, and age1 and age2. Once converted to long form, the additional variables disappear and we're left with only one sex variable and one age variable. However, the id variable has now been expanded, and we observe the same individuals multiple times in our data. We see that person one is male and aged 22 in wave 1, whilst in wave 2 the same person is still male but aged 23. To convert data from wide to long form, and from long to wide form, we can use the reshape command. This command reshapes the requested variable names to either wide or long form.
The command requires a common panel identifier to be inserted in the i() option, whilst the j() option requires the name of the time variable. Once this has been done, the xtset command can be called, which tells Stata what the panel identifier and time variables are. That will then set the data as panel data. So let's go ahead and see how this works in Stata.
Here we are in Stata with a reshape training dataset already loaded. This is a small dataset, so let's use the list command to take a look at what we have: list, and execute. The reshape training data contains only three observations: individuals 1, 2 and 3. Individuals 1 and 3 are women, whilst individual 2 is a man. We then observe three income variables over three different years: 1980, 1981 and 1982. We also observe another variable over three years, called ue. This is the classic wide setup of data, and we need to convert this to long form in order to call the xtset command. To do that, we need to use the reshape command. Let's have a look at reshape's help entry: help reshape. Here we can see the general syntax for reshape. We first specify the reshape command, and we then specify the subcommand, wide or long, to tell reshape what we want. We then need to specify something called a stub. The stub is simply the variable name without the suffix that identifies the time period. We then have two options that must be included: the i() option and the j() option.
So let's go ahead and reshape this data to long form. Close this. We type reshape long, and we then specify the variables that we want to reshape. In this case, we want to reshape the income and ue variables, so the stubs are inc and ue, i.e. we're removing the suffixes 80, 81 and 82. Our ID variable is called id, and our time variable is going to be called year. Note that year doesn't exist yet; we're going to create it with the reshape command. Let's go ahead and execute this. OK.
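The steps described so far can be sketched in Stata as follows. This is a minimal sketch that assumes the training data is the reshape1 example file that ships with Stata (variables id, sex, inc80 to inc82, ue80 to ue82); the course's own file may be named differently.

```stata
* Assumption: webuse reshape1 stands in for the course's reshape training data
webuse reshape1, clear

* Inspect the wide-form data: one row per person, one column per variable-year
list

* Convert to long form: inc and ue are the stubs, id identifies the person,
* and year is created by reshape from the numeric suffixes 80, 81, 82
reshape long inc ue, i(id) j(year)

* Declare the panel structure: panel identifier first, time variable second
xtset id year
```

To go back the other way, reshape wide inc ue, i(id) j(year) should restore the original layout.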
So the reshape command now tells us that we converted three observations to nine observations, but reduced the number of variables from eight to five. The time variable j is called year. The variables inc80, inc81 and inc82 were converted to the variable inc, and ue80, ue81 and ue82 were converted to the variable ue. You might wonder why we didn't specify sex to be included in the reshape. The reason is that we are only interested in reshaping time-varying variables. Sex doesn't tend to change over time, so we can simply keep it as it is. Now let's look at the data using the list command, but with the separator option to separate out what happened: list, separator(3). Here we can see what happened. Our data is now in long form, and for each person we have three observations over three years: 80, 81 and 82. Because sex wasn't included in the reshape command, it has become a time-invariant variable that doesn't change over time. In other words, every person has exactly the same sex status in each year. inc and ue do change over time. We can now go ahead and use the xtset command to set this data as panel data. We do that by typing xtset, followed by the panel ID variable, which is id in this case, and the panel time variable, which is year. Execute that. And there we are: Stata confirms that we have now set our data as panel data and also gives us an indication of how balanced our data is. And that concludes this introduction to setting up panel data in Stata.
3. Panel data descriptives: Now let's take a look at generating a first set of statistics from panel data. Descriptive statistics are an important component of any data analysis, and especially so with panel data analysis. This is because the data are more complex and you really need to know what it is that you're dealing with. Key to working with panel data is understanding the differences between individual observations with respect to time.
In other words, differences in panel data can occur because of differences between people, but differences can also occur within people. These are called between and within variation. Between variation is when we want to explore how an individual's data varies compared to other individuals in the same data. Within variation is when we want to explore how an individual's data varies over time within the same individual. Both are important concepts to understand and will pop up throughout the sessions.
To highlight this further, here's an example of how panel data can be used to compute three different types of statistics for the same underlying data. In this small example, we have three individuals, each observed over two time periods. Each individual has a score that starts at a baseline and then increases by one in the next period. Given the score, we could compute statistics such as the average, the variance or the standard deviation, just like we normally would with any cross-sectional data. In the first case here, we treat each observation as completely independent and compute our statistics on all the available data. We call these the overall statistics. In this case, the average score is 20.5 and the standard deviation is 8.96. However, the overall statistics don't tell us how our data varies within observations or between observations. To compute the between statistics, we compute the average score for each person first, and then compute the global average, variance and standard deviation from these numbers. This leads to an average score of 20.5 and a standard deviation of 10. To compute the within statistics, we calculate the difference between the actual score and the person's average score. This gives us each person's deviation from his or her average. Statistics are then computed on these values. In this case, the mean is 0 and the standard deviation is 0.55.
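The small example can be reproduced with a sketch like the one below. The baseline scores of 10, 20 and 30 are an assumption, chosen because they give the quoted overall mean of 20.5 and standard deviations of 8.96, 10 and 0.55; the variable names are illustrative.

```stata
* Hypothetical toy panel: three people, two periods, score rises by one
clear
input id t score
1 1 10
1 2 11
2 1 20
2 2 21
3 1 30
3 2 31
end
xtset id t

* Overall, between and within statistics in one table
xtsum score

* The same pieces by hand
egen double pmean = mean(score), by(id)
generate double within1 = score - pmean

* Overall statistics: every row treated as independent
summarize score

* Within statistics: deviations from each person's own mean
summarize within1

* Between statistics: one person-mean per individual
egen byte first = tag(id)
summarize pmean if first
```

The hand-computed within1 column is centred on zero, which is the "within 1" version of the statistic described in the lesson.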
However, Stata actually adds the global mean, in this case 20.5, back into each cell to make the results more comparable. So do keep that in mind whenever you're working with Stata and looking at within statistics: Stata will present to you the last column, within 2, and not within 1.
We're going to look at three panel-specific descriptive statistics commands. xtdescribe tells us about patterns in our panel data. Note that this command is not similar to the standard describe command. xtsum summarizes panel data, and xttab tabulates panel data. Let's go and see how they work in Stata.
Here we are in Stata, and I've already loaded a panel version of the NLSW data. This dataset is called nlswork, and we'll be using it in the upcoming sessions. It is the same as the nlsw88 data used in other sessions, but it now contains multiple observations for individuals over the years 1968 to 1988. So first, let's set the data as panel data by executing the following code: xtset idcode, which is the panel ID variable, and then year, which is our time variable. The data is now set as panel data. The data is unbalanced and ranges from 1968 to 1988. To explore the unbalanced nature of our data further, we can use the xtdescribe command, which reveals participation patterns in our data. So let's go ahead and execute the command xtdescribe. xtdescribe now shows us the nine most common data patterns. We can adjust the number of patterns with the patterns() option, and we can include up to 1,000 different patterns should we wish. The results from xtdescribe tell us that we've got 4,711 individuals in our data over a maximum of 15 time periods. We also see a distribution of T_i, which tells us that 50% of individuals are observed for five years or fewer, and only 5% are observed for 13 years or more. Next, we are shown the most common participation patterns.
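A minimal sketch of these first steps, assuming the course's dataset matches the nlswork example file that ships with Stata:

```stata
* Assumption: webuse nlswork is the same panel used in the lesson
webuse nlswork, clear

* Panel identifier first, time variable second
xtset idcode year

* Participation patterns: the nine most common by default
xtdescribe

* Show more patterns, here the 20 most common (up to 1,000 are allowed)
xtdescribe, patterns(20)
```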
The largest fraction of individuals was observed in a single year, 1968, and not thereafter. The next largest fraction was observed in 1988 and not before. However, given the small size of these patterns, we can conclude that there is no major pattern dominating our unbalanced data.
Next, we want to produce some initial summary statistics. We could use the standard summarize command, but to obtain extra detail on the within and between variation we'll need to use the panel-specific command xtsum. So let's go ahead and do that, and let's focus on only the variable hours. Type xtsum hours and execute. The result tells us the following. There are approximately 28,500 valid hours observations for 4,710 individuals. On average, a person is observed for six years. The average hours worked per week is 36.5. The overall standard deviation is 9.86, with a minimum number of hours worked of 1 and a maximum of 168. The between standard deviation is lower at 7.84, with a minimum of 1 and a maximum of 83.5. The within standard deviation is 7.52, with a minimum of minus 2 and a maximum of 130. However, because Stata adds the global mean back in, we need to subtract it from the min and max values to get the real within changes. So, for example, at some point in the observed timespan one person was minus 2 minus 36, which is 38 hours, below their own average. And another person was 130 minus 36, which is 94 hours, above their own average.
Next, let's look at a binary variable such as union. Here we can use the panel-specific tab command, which tabulates panel data. So let's type xttab union. The overall part of the table summarizes the results in terms of person-years. We have 19,230 total observations, of which 23% are union members and 77% are not. The between part repeats the breakdown, but this time in terms of people rather than observations: 3,765 people ever had no union membership, and 1,641 ever had union membership.
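The two summary commands discussed here can be run as follows. This is a sketch that again assumes the shipped nlswork file matches the course data.

```stata
* Assumption: webuse nlswork is the same panel used in the lesson
webuse nlswork, clear
xtset idcode year

* Overall, between and within statistics for weekly hours worked.
* The within min and max include the global mean, so subtract the
* overall mean to read them as deviations from a person's own average.
xtsum hours

* Panel tabulation of a binary variable: person-years (overall),
* people (between), and the share of time spent in each state (within)
xttab union
```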
Because people can switch in and out of union membership, the total number of people in the between column may exceed the actual people count, in this case 4,150. The within percent tells us the fraction of time a person has the specified value of union membership. If we take the first line, conditional on ever not being in a union, 86% of an individual's observations will have a value of 0. If we take the second line, conditional on ever having a union status of 1, 54% of that individual's observations will have union membership. And this concludes this quick overview of the basic descriptive commands available for panel data in Stata.
4. Lag and lead operators: Let's talk about a unique feature of panel data. Panel data has a time component that measures variables repeatedly. This allows us to explore dynamics and investigate what happens before and after an event. In other words, we can pose questions that aren't just "how is marriage related to happiness?", but more nuanced questions such as "how does marriage affect happiness today, tomorrow, or ten years from now?" To operationalize such questions, we need to use something called time series operators. Time series operators allow us to manipulate the time component of our data. Stata has four different types of time series operators. The most common types are the lag and lead operators. We can ask Stata to create a lag by adding an L. prefix to a variable. We can also insert a number after the L, indicating what lag length we want. So, for example, typing L1. as a prefix to a variable will tell Stata we want to create its one-period lag. L2. will be the two-period lag, and so forth. To create a lead, sometimes called a forward indicator, Stata uses the F. prefix. Stata also has two difference operators. The first, D., allows users to create the difference between a current value and the lagged value. However, be careful, because D2. is not the difference between the current value
and its lag-two value, but is the difference of the differences. So D2. calculates the difference of the current value with its lag, and subtracts from this the difference of the previous lag with the lag before that. To obtain the difference between the current value and a previous higher-order lagged value, we need to use the seasonal difference operator, called S. So, for example, S3. is the difference between the current value and the value three time periods ago.
The great thing about Stata is that these prefix operators can be used in most Stata commands, on the fly. You can use them in regression-based commands, but you can also use them in descriptive and summary-based commands. Moreover, you can add a series of lags and a series of variables into one single prefix. This is really useful because it avoids lengthy coding work, and I'll show you some examples of this. Here's an example which we'll see again later. In this setup, I regress log wages against a rather complicated-looking piece of code, highlighted here. In fact, the code is quite simple. We want Stata to produce the lags of two variables: weeks unemployed and living in the south. The lags we specify range from negative two to positive two. Because a negative lag is simply a forward lead, Stata will create for us the forward leads t+2 and t+1, a contemporaneous variable at t, and two lags, t-1 and t-2. So that's a lot of dynamics for only a little bit of code, and it can be really handy when you're exploring data and regressions.
We will not introduce any new commands in this session, but we'll use the time series operators to generate time-conditioned variables. We'll also use the time series operators to create variables on the fly within commands like regress and correlate. So let's go ahead to Stata and explore this further. Here we are in Stata with the nlswork training data already loaded and set as panel data.
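The four operators can be tried out with a sketch like the following. The new variable names (lag_nev and so on) are illustrative rather than the ones used in the video, and nev_mar is the never-married indicator in the shipped nlswork file.

```stata
* Assumption: webuse nlswork is the same panel used in the lesson
webuse nlswork, clear
xtset idcode year
sort idcode year

* One-period lag, lead, difference and seasonal difference
generate lag_nev   = L1.nev_mar
generate lead_nev  = F1.nev_mar
generate diff_nev  = D1.nev_mar
generate sdiff_nev = S1.nev_mar

* D2. is the difference of differences: (x_t - x_t-1) - (x_t-1 - x_t-2)
generate d2_nev = D2.nev_mar

* S2. is the plain two-period difference: x_t - x_t-2
generate s2_nev = S2.nev_mar

* Inspect the first individual's 12 observations
list idcode year age nev_mar lag_nev lead_nev diff_nev sdiff_nev in 1/12
```

Because the operators work on calendar time rather than row order, any year missing from a person's record produces a missing value instead of borrowing the nearest observed year.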
First, let's generate some new variables that use the four available operators, and then take a closer look at them. To remind ourselves what the four available time series operators are, we can execute the following code: help tsvarlist. Execute that, and it brings up the help file entry for time series variable lists. Here we can see the various operators and what they mean. Scrolling down this help file, we see a series of useful remarks and examples for each of these. I recommend having a good look at this help file if you're dealing with panel data. Let's go ahead and close this. Now let's execute the following code using the never-married variable: generate the lag, generate the forward lead, generate the difference, and generate the seasonal difference. Execute that. We've now created four new variables from the never-married variable: a lagged variable, a forward variable, a difference variable and a seasonal difference variable.
Next, let's look at a small portion of this data, specifically the first 12 observations, which correspond to the first individual in this dataset. We're going to use the list command to look at the variables idcode, year, age and all the never-married variables we created, for the first 12 observations. And there we are. Here we can see that individual number one is 18 years old at the beginning of our data and 37 years old at the end of our data. This individual was never married at the age of 18. At age 19, the individual becomes married, and the indicator flips from 1 to 0 and remains at 0. We can now see what the time series operators did. The lag took the previous year's value and carried it forward to the current year. Note, however, that some time gaps exist in our data, and that the lagged value is not carried forward when there is a gap. For example, observation five takes place in 1975, but the value for 1974 is missing. Stata does not carry the lag of the last observation forward, which would've been in 1973.
Instead, it takes the lag of the previous year, 1974, which in this case is missing. The same applies to all the other operators. Here, the forward operator brings the next year's value into the current row. And here we can see the change in marriage status occur using the difference operator. Finally, the seasonal difference operator gives the same result as the difference operator, since we only specified a value of one in the differences.
Having created these lagged variables, we can then tabulate them against each other to explore transitions. For example, we can type tab, never married against the future value of never married. Let's go ahead and look at that, including row percentages. In this case, we can see that if you are never married, the likelihood that you will remain never married in the next period is 81%. If you have ever married, then you cannot be never married again, so the transition from 0 to 0 remains at 100%.
Next, let me show you how to incorporate multiple lags into an estimation command. For example, we may wish to correlate the value of never married against five future values of never married. To do that, we could type correlate, never married, and then specify F(1/5). in front of never married, with the range in round brackets. Stata will then automatically expand this and include five forward leads in our command. Let's go ahead and execute this. We can then see how the current value correlates with the future never-married indicators. The correlation with next year's never-married value is around 80%. This then drops to 70%, 60% and 50% as we head into future values.
Next, let me show you another example, for the regression command. In this case, we're going to regress log wages on the lags minus two to two for both the variable unemployment in weeks and whether you've lived in the south. Let's execute that. And there we are. That's quite a lot of results from only a small bit of code. The results show something quite interesting.
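The three commands in this part of the session can be sketched as below. The names are taken from the shipped nlswork file (nev_mar, wks_ue, south, ln_wage), and lead_nev is an illustrative name; the exact code shown in the video may differ.

```stata
* Assumption: webuse nlswork is the same panel used in the lesson
webuse nlswork, clear
xtset idcode year

* Transition table: current status against next year's status
generate lead_nev = F1.nev_mar
tabulate nev_mar lead_nev, row

* Correlate the current value with its five forward leads
correlate nev_mar F(1/5).nev_mar

* Lags -2 to 2 of two variables in one prefix: negative lags are leads,
* so this adds t+2, t+1, t, t-1 and t-2 for both wks_ue and south
regress ln_wage L(-2/2).(wks_ue south)
```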
Living in the south today, yesterday or tomorrow has no real impact on current wages. However, being unemployed does have a negative effect on current wages. Current unemployment seems to matter a lot, but the lag of one also seems to matter a little bit. In other words, my wages today are a function of my unemployment today, but also of my unemployment experiences in the past. Even more interesting is the fact that my wages today are also related to tomorrow's unemployment. How can something in the future affect me today, you might ask? We call this type of effect a pre-program effect, and you need to be careful with the causal interpretation of your regression model here. Of course, tomorrow's unemployment cannot affect your wages today. But often events don't come out of the blue. People who become unemployed tomorrow may already be experiencing declining wages today as their company struggles for survival, for example. And that is what is highlighted in the statistics. This is a great example of how useful panel data is in analyzing dynamic effects. And this concludes this session on time series operators in panel data.
5. Linear panel regression (OLS, RE, FE): Now let's explore linear panel data estimators in some detail. These estimators are often ordinary least squares estimators that make more complicated assumptions about the error term in a regression. The advantage of these estimators is that they allow users to build a more nuanced regression model that can be theoretically much more powerful than standard ordinary least squares estimation on cross-sectional data. Because this is such an important topic, in this session we'll focus on both the theory of these estimators and applying them in Stata. So let's have a very quick visual recap of what ordinary least squares tries to achieve. Ordinary least squares tries to run a line of best fit through our data.
In this example, the red line is the regression line. The differences between the estimated regression line and the actual data points are called the residuals. Adding all the individual residuals together gives us something called the error term. Because positives and negatives cancel each other out, the average value of the error term is always 0. The key point to understand about panel data estimators is that they allow the residuals, i.e. the arrows in this picture, to be determined by different processes, which ordinary least squares does not allow. For example, half the length of the arrows in this picture may be due to one type of error process, whilst the other half may be due to something else. That sounds a little bit cryptic, so what do I mean by this?
Here are two equations. Don't worry too much about understanding the mathematics; just concentrate on the structure of the maths. The first equation is a representation of what we try to do. We want to estimate a dependent variable against a series of explanatory variables that are measured over time. Ultimately, we're interested in the coefficients. The error term is not observed, and it is something we must make an assumption about. However, in the panel data world we can split the error term into different components. Making assumptions about each of these components is the key concept. We can split the original error term, called e_it, into sub-components: u_i, which denotes an individual effect; lambda_t, which denotes a time effect; and v_it, which denotes a random effect.
So what do these three potential components actually mean in practice? Let me show you how to better understand this by looking at some real data. Here's some real data on two individuals followed over a period of five years: person one and person two. For each person, we observe the variables y and x. Ultimately, we want to regress the variable y on x. But each observation is also subject to some noise.
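Written out, the two equations described above take roughly the following form. The notation is reconstructed from the spoken description (e_it, u_i, lambda_t, v_it); the intercept alpha and the x'beta form are assumptions about how the slide writes the regression.

```latex
\begin{align}
  y_{it} &= \alpha + x_{it}'\beta + e_{it} \\
  e_{it} &= u_i + \lambda_t + v_{it}
\end{align}
```

Here i indexes individuals and t indexes time periods, so u_i varies only across people, lambda_t varies only across periods, and v_it varies across both.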
The total noise around each observation is captured by the variable e, the error term. We can split this noise into three different forms. There's a random noise element per person per year, shown by the variable v. Imagine this to be just random luck: good things or bad things may happen to you at any moment in time. There's also an individual, time-invariant noise effect, denoted by the variable u. This is the person's permanent outlook on life, which doesn't change over time. Maybe person one is always very happy; maybe person two is always unhappy. Finally, there's also noise from time effects that are the same for every observation at a given time, and that is denoted by the variable t. You can imagine these to be something like macroeconomic shocks, when everybody experiences the same wage rises or the same wage decreases over time. The combination of all three noise components adds to or subtracts from y.
So what do panel data estimators do with these noise components? Well, they make different assumptions about them. The most basic estimator is called the pooled ordinary least squares estimator. This estimator makes no distinction between these noise components. In other words, it doesn't decompose the error term. It also assumes that the regressors are not correlated with the error term. The random effects estimator decomposes the error term into its individual-specific component and its random error component. This has the advantage of being more efficient. But note that this estimator still assumes that the regressors are not correlated with the error term. The fixed effects estimator also decomposes the error term into its individual-specific component and its random error component, but it assumes that the regressors may be correlated with the error term. Specifically, the regressors are allowed to be correlated with the individual effect component. It applies a transformation technique to difference out all individual effects, leaving only the random component in the equation.
This estimator has many disadvantages, such as removing time-invariant variables from the regression and being more inefficient. However, many social scientists see it as one of the primary benefits of panel data, due to its ability to cope with potentially endogenous variables. The Stata commands we're going to be using in this session are the standard regress command, but with the cluster option. The cluster option will always need to be invoked when working with panel data and standard ordinary least squares. This makes sure that you get the right kind of standard errors. Using simply regress without the cluster option is the wrong thing to do with panel data. To obtain random and fixed effects estimators, we can use the xtreg command, which allows us to call on various panel data regression models. So let's go ahead and head over to Stata. Here we are in Stata, with the nlswork data already loaded and set to panel data. Let's begin by estimating a pooled ordinary least squares regression. Let's assume we're interested in the effect of age and college graduation on log wages. We can run a pooled ordinary least squares model by simply calling the regress command and specifying the cluster option, as highlighted earlier. This is important. The repeated nature of the observations leads to something called error correlation, which can artificially create very low standard errors. And this can make you think that too many variables are statistically significant. By using the cluster option, we overcome this. So let's go ahead and regress log wages against age, age squared and college graduation, clustered on the ID code. And here are our results. The results show that college graduates earn around 38% more than non-college graduates. Because we have a time dimension in our data, we can also add interview years to this regression and get an idea of time shocks. And we can do that by simply running our regression and adding time as a new variable. 
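The pooled regressions described here might look like the following in a do-file. This is a sketch that assumes the nlswork example data available through webuse, with the variable names ln_wage, age, collgrad, idcode and year; the exact commands typed in the recording are not shown in the transcript, and the factor-variable term for age squared is a choice made here rather than necessarily the instructor's.

```stata
* Load the NLS panel and declare the panel structure
webuse nlswork, clear
xtset idcode year

* Pooled OLS: log wages on age, age squared and college graduation,
* with standard errors clustered on the person identifier
regress ln_wage age c.age#c.age collgrad, vce(cluster idcode)

* Same model with interview-year dummies to pick up common time shocks
regress ln_wage age c.age#c.age collgrad i.year, vce(cluster idcode)
```

The i.year term expands the year variable into one dummy per interview year, with the first year as the base category.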
In this case, I'm expanding the year into separate dummy variables. Execute that. And the results now show that compared to the base year of 1968, log wages went up for everybody in 1969. However, in 1975, the log wage effect went negative, suggesting that national economic conditions in that year were worse. So this is an example of visualizing time effects in our regression. Next, let's have a look at the fixed effects model. To estimate the fixed effects model, we need to call the xtreg command and apply the fe option. So xtreg, log wages against age, age squared, college graduation and year effects, and then fixed effects as an option. Execute it. And here are our results. There's quite a bit of output here, so let's take a careful look. The diagnostics at the top tell us how many observations and how many people are in this regression: 28,510 observations, and this corresponds to 4,710 individual people. On average, we observe people for around 6.1 years. Here on the left, we have three different R-squared statistics. In simple terms, the between R-squared asks the question: how much of the variance between different panel units does my model account for? The within R-squared asks the question: how much of the variance within my panel units does my model account for? And the overall R-squared asks the question: how much of the variance of all observations does my model account for? All three are relevant. Another useful diagnostic is the estimate of the correlation between the individual errors and the regressors. If this number deviates far from 0, then fixed effects estimation is likely justified, although a formal Hausman test should still be conducted. And we'll talk about that later. Next we have the coefficient estimates. Age is positively correlated with log wages and age squared is negatively related to log wages. This gives us the classic inverted U-shape. Both are statistically significant. College graduation is omitted from this regression. 
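As a rough sketch of the fixed effects command just described, again assuming the webuse copy of nlswork and its variable names (ln_wage, age, collgrad, idcode, year), which may differ from the instructor's file:

```stata
* Fixed effects (within) regression with year dummies
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age c.age#c.age collgrad i.year, fe

* The header reports the within, between and overall R-squared and
* corr(u_i, Xb); the footer reports sigma_u, sigma_e and rho
```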
And that is because it is a time-invariant variable. The fixed effects transformation eliminated it from the transformed regression. In other words, it gets entered as a 0 and is therefore kicked out of the model automatically. And finally, we have the year dummies. At the bottom of our output are estimates of the standard deviation of the individual effect and the total error term. The value rho indicates that 64% of the variance in our model is due to differences across panels. Finally, let's have a look at the random effects model. If the regressors are correlated with the individual component in the error term, this model will produce biased results. But if they're not, then this model is more efficient, with lower standard errors compared to the fixed effects model. To estimate random effects, we simply replace the fe option with the re option: xtreg, log wages, age, age squared, college graduation, our year dummies, and random effects. Execute that. And the output for the random effects model is very similar to that of the fixed effects model. It produces the same kind of diagnostics as the fixed effects model, but clearly states that the correlation between the individual effect and the regressors is assumed to be 0. Note that the variable college graduate is now included in this model, as this model does not remove time-invariant regressors. The interpretation of rho remains the same. So which model would you use? Well, we'll explore how to identify which model we prefer in the next session, where we talk about the Hausman test. 6. The Hausman test: Let's examine how to choose between random or fixed effects regression estimation in a panel data setting. This is a very common question to ask when working with panel data. The advantages of both types of estimation are clear. Random effects is more efficient and therefore produces lower standard errors. It also allows for the analysis of time-invariant variables. 
As these are not differenced out in the random effects estimator. Fixed effects is not as efficient. In fact, you need to give up one panel period worth of data to achieve the required transformation. So in a worst-case scenario, in a two-period dataset, half your data will be thrown away to achieve the required transformation. And that is a very big data cost. However, fixed effects allows users to assume that there's a correlation between part of the error term and the regressors. And often, this can be a really important assumption to have in your regression analysis. So how do we choose between one or the other? Well, the good thing is that, under the null hypothesis that the individual effects are random, both estimators should produce essentially the same result. And this allows us to construct a test statistic that tests the difference between both sets of coefficients. This test is called the Hausman test. The Hausman test is very easy to perform in Stata. We first need to estimate a fixed effects regression and store the relevant estimates. This is then followed by a random effects regression, whose estimates are also stored. And finally, we call upon the hausman command to compare both sets of estimates. The resulting test statistic and its p-value can be found at the bottom of the output. If the p-value is below 0.05, then the test is significant and we reject the null that there is no difference in the results. In other words, the results from both sets of regressions are statistically different. And in that case, we tend to prefer the fixed effects estimation. So just to repeat, if there is a significant difference in the coefficients between both sets of models, we tend to prefer the coefficients from the fixed effects model. To store estimates in Stata, we're going to use the estimates store command after each regression. A name must be provided. 
And then to perform a Hausman test, we use the hausman command, which requires the stored estimates of both the random and fixed effects regression models. Both estimates are inserted after the command, and that's it. So let's go ahead to Stata and explore this further. Here we are in Stata, using the nlswork panel training data. And I've already set the data to panel data using the xtset command. Now, let's go ahead and estimate the fixed effects regression model. We can do that using the xtreg command. In this case, our dependent variable is log wages and we'll be regressing that against the variables age, college graduate and union membership. Okay, that model has now been estimated. We're now going to store the estimates using the estimates store command. Let's call these estimates fe. Okay, that's now been done. And now let's repeat the process for the same model, but using random effects. So xtreg, log wages, our variable list, and the option random effects. And now we're going to go ahead and store these estimates again: estimates store, and we'll call these estimates re, for random effects. So we've now estimated both a fixed effects and a random effects regression model and stored both sets of estimates. Next, let's use the hausman command to compare both sets of results. To do that, we'll type hausman and then simply enter fe and re. These are the stored estimates. Execute that. And here we are. The output displayed by the hausman command shows the estimated coefficients of both models for each variable, and computes the differences and the standard errors of these differences. A test statistic is computed from these differences, which tells us that it is very likely that these differences are not random. Any p-value less than 0.05 implies that we should accept the fixed effects regression estimates over the random effects regression estimates. And that's one way to choose which of these popular estimators one should use. 
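The sequence just walked through can be sketched as a short do-file. It assumes the webuse copy of nlswork and its variable names (ln_wage, age, collgrad, union, idcode, year); the stored-estimate names fe and re are simply labels and could be anything.

```stata
webuse nlswork, clear
xtset idcode year

* Fixed effects model, then store its estimates
xtreg ln_wage age collgrad union, fe
estimates store fe

* Same specification with random effects, then store again
xtreg ln_wage age collgrad union, re
estimates store re

* Hausman test: the always-consistent estimator (FE) is listed first,
* the efficient-under-the-null estimator (RE) second
hausman fe re
```

The order of the two names matters, because hausman treats the first set of estimates as the consistent one.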
Personally, in my own work, I have found it to be very rare for random effects to be accepted. However, I often work with people surveys. So if you work with different kinds of data, this may not necessarily be true for you too. Try the Hausman test yourself and see what you get. 7. Non-linear panel regression (OLS, RE, FE): Finally, let's take a look at non-linear panel data estimators. Just like for cross-sectional data, non-linear panel estimators are used when the dependent variable is binary or categorical. The general approach of these models is very similar to that of linear models. However, the mathematics behind these models can get very tricky, so I'll try to give you the most layman's interpretation possible. To keep things manageable, we're going to concentrate only on logit models in this session. Just like for linear ordinary least squares models, three types of panel data logit models are used most often: the pooled logit model, the random effects logit model, and the fixed effects logit model. In principle, they are similar to their linear counterparts. But here are some quick tips for each of them. Pooled logit models are similar to their cross-sectional counterparts. They allow for the computation of marginal effects and assume that regressors are not correlated with the individual effect in the error term. They also require clustering on the panel ID variable for correct standard errors. Random effects logit models are the most efficient type of model, although their raw coefficients may not show this. The marginal effect calculation, however, will. These models also assume that there's no correlation between the regressors and the individual effect. And they're quite computationally heavy. In other words, they can take a long time to estimate. Fixed effects logit models are the least efficient and do not allow for time-invariant variables. In addition, you cannot compute normal marginal effects for them. 
This means you'll often have to use odds ratios if you want to interpret the coefficients from these models. They're also prone to high observation attrition, as only observations that experience a change in the dependent variable are included in the model. However, they do allow for regressors to be correlated with the individual component of the error term. We'll use the logit command with the cluster option to estimate a pooled logit model in Stata. To estimate random and fixed effects logit models, we'll use the xtlogit command, which is very similar in syntax to the xtreg command. We'll also use the or option to report odds ratios, and to compute marginal effects we'll have a look at the margins command. So let's go ahead to Stata and explore this a little bit further. Here we are in Stata, with the nlswork panel training data already open and set to panel data. Let's go ahead and estimate a pooled logit model. To estimate a pooled logit model, all we need to do is add the cluster option to the traditional logit command as follows. We estimate a logit model using the logit command, the dependent variable union, a couple of independent variables, and then add the option cluster and, in brackets, the panel ID variable, which in this case is idcode. So let's go ahead and estimate that. And here's the output, which tells us that all variables are statistically significant. Age and grade have a positive effect on union membership, whilst living in the south has a negative effect. Because raw logit coefficients are quite hard to interpret, we can ask Stata to report odds ratios instead. And we can do that by adding the or option to the logit command. So we repeat all of that and add the or option. The baseline is one. So for each year of age increase, the odds of being a union member increase by 0.9%. Living in the south decreases the odds of being a union member by 48%. To compute marginal effects, we can call upon the margins command. 
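The commands named in this session can be sketched together as follows. This assumes the webuse copy of nlswork with the variables union, age, grade, south and idcode; the instructor's exact variable list is not shown in the transcript, so treat the specification as illustrative.

```stata
webuse nlswork, clear
xtset idcode year

* Pooled logit with standard errors clustered on the panel identifier
logit union age grade south, vce(cluster idcode)

* Same model reported as odds ratios (1 means no effect)
logit union age grade south, vce(cluster idcode) or

* Average marginal effects for all regressors
margins, dydx(*)

* Fixed effects (conditional) logit, raw coefficients and odds ratios
xtlogit union age grade south, fe
xtlogit union age grade south, fe or

* Random effects logit; this one can take noticeably longer to run
xtlogit union age grade south, re
xtlogit union age grade south, re or
```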
So margins, comma, dydx, and then in brackets a star, which denotes all variables. Execute that. And here we can see that living in the south decreases the probability of being a union member by 11 percentage points. Next, let's take a look at the fixed effects logit model. To estimate this, we'll use the xtlogit command with the fe option. So let's go ahead and do that: xtlogit, dependent variable union, various explanatory variables, and then the fe option. Okay, and here's our output. The output of this model is similar to that of xtreg, with information provided on how many observations and people are included in our model, and what the average length of observation per person is. Grade is a time-invariant variable and is removed from our equation. Age has a positive and significant effect, whilst living in the south has a negative and significant effect on union membership. Also note that there's no constant in this model. We can use the or option to report odds ratios for a better interpretation. Let's go ahead and do that: xtlogit, fixed effects, and then the or option. Now we can see that living in the south reduces the odds of being a union member by approximately 65%. Marginal effects as we know them cannot be computed for this model. If you request marginal effects, Stata will provide marginal effects, but these make a very specific assumption about the individual effect. And this may not be a realistic assumption. In general, marginal effects should be avoided after fixed effects logit models. To compute a random effects logit model, we simply replace the fe option in xtlogit with re, like so: xtlogit, union, our variables, and then random effects. And here is our output. This model takes longer to run and produces output that is similar to the random effects ordinary least squares model. The coefficients are all significant and have their usual meaning. The output includes an additional panel-level variance component. 
This component is shown twice, once as the log of the variance and once as the standard deviation. These can be used to compute rho, which is 0 when the panel estimator is no different from the pooled estimator. A formal test of pooled versus random effects is included in the last line of the output. If the p-value is below 0.05, we should use a random effects logit model over a pooled logit model. We can obtain odds ratios in the same way as before: we can add the or option after re to obtain odds ratios. And finally, we can also compute marginal effects for this model by using the margins command as before: margins, comma, dydx and star. Note that this can take a significant amount of time to compute. The marginal effects presented here have the usual interpretation. These are average marginal effects. And the value of around minus 0.096 on south tells us that living in the south leads to approximately a 9.6 percentage point reduction in the probability of being a union member. And there we are. The final question you might have is how do we test between the random effects logit model and the fixed effects logit model. Well, this is done using the Hausman test again. And the procedure is exactly the same as outlined in the previous session, except that the estimates used come from xtlogit rather than xtreg. And this concludes this final session on panel data models in Stata. Panel data models tend to be complex and require more underlying mathematical knowledge than standard cross-sectional models. However, they are more flexible and can be very rewarding to use. Thankfully, Stata makes the implementation simple and fast. 8. Difference-in-Differences: Let's talk about the popular panel method called difference-in-differences. This technique is commonly applied in evaluation problems, where the investigator wants to determine whether some kind of treatment had an effect on the treated group. 
This method is used by many sciences, but it is most often used in the evaluation literature, where it is used to evaluate some kind of effect, policy effects for example. Raw difference-in-differences, often just called diff-in-diff, can be computed by hand. And I'll show you some examples of that in just a moment. All it takes is four numbers and some simple subtraction. However, one reason why this estimator is so popular is because it can be combined with regression analysis. And that gives users the power of differencing, which I'll explain in a moment, and the power of controlling for many other variables from a regression. Combined, this type of estimator has very strong properties for causal evaluation. Note also that it has strong similarities to the fixed effects regression estimator. So what's the basic setup of diff-in-diff? Let's start with the predecessor estimator, called the before-and-after estimator. Imagine for a moment you have a group of people called the treatment group. These could be all the people in a particular state, or all the people who receive a particular drug, or something similar. We also have two time periods. These are called the before and after time periods. Some kind of treatment is applied to the treatment group in between these two time periods. If we wanted to figure out what the effect of the treatment was, we could compute the difference in outcomes, such as health or income, between the two time periods. This would be called a before-and-after estimation, because we're literally just comparing something before the treatment and then after the treatment for a group of people who were affected by that treatment. So let's say, for example, we're looking at a labor market policy, and the effect was to increase income by plus ten. Well, all we need to do is to compute the difference and we're done. Perfect. You might think that was too easy. And indeed, it is. 
A problem with this type of analysis is that we can't be sure whether the plus ten effect of our policy was due to the policy itself or also to other factors, such as the economy improving over time. So one way to get around this problem is to have a control group. A control group is a group that is preferably exactly the same as the treatment group, with the exception that this group was not treated. By examining what happened to the control group, we can deduce what would have happened to the treatment group had they not been treated. This in turn allows us to compute a more accurate treatment effect. So how does that work? Well, it's quite simple. We simply do another before-and-after estimation for the control group, and then look at the difference between that estimate and that of the treatment group. Let me show you. Here is the control group. Their before-and-after estimate is also plus ten. In other words, people who were not affected by our imaginary labor market policy also saw their income rise by plus ten. Ergo, the actual impact of the policy is 0. The treated group would have gotten the plus ten anyway, had they not been treated. And that is the difference-in-differences estimator. There are two before-and-after differences, and we then take the difference between those two differences. Hence the name difference-in-differences. Simple. The key to a good difference-in-differences estimator is whether the control group is similar in characteristics to the treated group. If you have random assignment of the treatment, this is likely to be the case. But in many cases, treatment is probably not randomly assigned. Policymakers do not roll the dice when determining who gets healthcare or who gets unemployment benefits. This often means that selectivity plays a big role in determining who is in the control and the treated group. To address this, the basic diff-in-diff estimator is often recast into a regression. 
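The two-step subtraction described above can be written compactly. The notation below (T for treated, C for control, bars for group averages) is assumed for illustration and does not come from the slides; the numbers are the plus ten example from the lecture.

```latex
\begin{align*}
\hat{\delta}_{\text{DiD}}
  &= \left(\bar{y}^{\,T}_{\text{after}} - \bar{y}^{\,T}_{\text{before}}\right)
   - \left(\bar{y}^{\,C}_{\text{after}} - \bar{y}^{\,C}_{\text{before}}\right) \\
  &= (+10) - (+10) = 0
\end{align*}
```

The first bracket is the before-and-after estimate for the treated group, the second is the same calculation for the control group, and the treatment effect is whatever remains after subtracting one from the other.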
There, we can use other controls to account for differences in characteristics. So even though the raw data may reveal that the treated and control groups are very different, by controlling for those differences within a regression framework, we can equalize both groups. That way, we get a more robust and causal estimate. To operationalize diff-in-diff in regression, we need to specify four components, three of which are the diff-in-diff components. Each regression will need an outcome variable y. That in turn is regressed against a binary treatment dummy and a binary time dummy, and finally the interaction of the two. Lastly, the fourth component is a vector of additional controls. The coefficient to examine will be the interaction coefficient; that reveals what the treatment effect actually is. There are two ways to perform diff-in-diff analysis in Stata. The first is to build a regression model by hand and figure out what's going on yourself. I personally prefer this way, as it gives me more control for more complex variations of diff-in-diff. The second way is to use a program called diff. This program allows you to perform diff-in-diff analysis and also triple differences. Let's go ahead and explore both. Here we are in Stata, and I'm not going to use the auto or nlswork training data. I'm going to use a famous dataset by Card and Krueger, who investigated whether minimum wage legislation increased or decreased the number of people working full-time in the fast food industry. Standard economic theory predicts that minimum wage laws are bad for employment. Card and Krueger realized that the minimum wage was increased in New Jersey, but not in the neighboring state of Pennsylvania. So they immediately realized that they could construct a diff-in-diff scenario where they compare the before and after outcomes of New Jersey to the before and after outcomes of Pennsylvania. So let's have a look at this data by running the describe command. 
So this dataset contains only a few variables. There's an ID variable that allows us to track different fast food outlets over time. And there's a time variable that has two values: the before time period, 0, and the after time period, 1. The treated variable is a state variable. New Jersey was treated; Pennsylvania is the control. The variable fte counts the number of full-time employees per store. To perform a diff-in-diff analysis of this data, we can simply tabulate the summary statistics. We can do that with the tab command, which will produce a two-way table of treated against time. But we're not interested in the counts. We're interested in the mean of fte across each of these cells. And the summarize option allows us to populate each cell with a mean of fte. I'm also calling on the nost and nofreq options, which stand for no standard deviations and no frequencies. This helps to avoid cluttering the table with too many numbers. So let's go ahead and execute this two-way tabulation. Okay, here we are. Here is the famous two-by-two table of a basic diff-in-diff estimator. We can see that in New Jersey, the average number of people employed full-time per store rose by about 0.5 over the intervening time period. In Pennsylvania, the number of people decreased by around 2.5. So that means that the minimum wage introduced in New Jersey actually increased full-time employment by 0.5 plus 2.5, which equals three. So that's a very counter-intuitive result, but also a very famous result. This tiny little table led to a revolution in modern thinking around minimum wages and why they might be beneficial. Let's go ahead and repeat this analysis with the regress command. To perform a diff-in-diff regression, we need to specify an interaction between treatment and time. Stata is really handy for that, because we can interact the variables on the fly by using the i. prefix before variables. 
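The tabulation and regression described here could look roughly like this. The sketch assumes the Card and Krueger data are already in memory, and the variable names fte, treated and t are assumptions based on how the variables are described in the session; check describe for the actual names in your copy.

```stata
* Two-by-two table of mean full-time employment by group and period,
* suppressing standard deviations and frequencies
tabulate treated t, summarize(fte) nostandard nofreq

* Diff-in-diff by regression: i. marks categorical variables and ##
* adds both main effects plus their interaction
regress fte i.treated##i.t

* The coefficient on the treated-by-time interaction is the
* difference-in-differences estimate
```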
The i. is going to tell Stata that some of my variables are categorical variables, and the double hash will tell Stata to perform a full factorial expansion. Let's go ahead and execute this regression. And here are the results. The treated coefficient tells us that New Jersey outlets employ 2.9 fewer full-time staff on average compared to Pennsylvania outlets. The time coefficient tells us that both states saw an average reduction in FTE numbers of 2.4 in the after time period. However, the interaction effect has a result that is approximately positive three. And that tells us that New Jersey experienced a significant upswing in FTE employees in the after time period compared to Pennsylvania. That is also the number which we calculated just a moment ago by hand. So that's it. That's how easy it is to perform diff-in-diff analysis in Stata. There's also a user-written command called diff. You'll need to search for it and install it. Make sure you always obtain permission before installing any software. Once done, we can call it by using the diff command and telling it what the dependent variable is, in this case fte. The treated option tells diff what the treated dummy variable is, and the period option does the same for time. Let's go ahead and execute this and see what happens. Okay. We see that the diff command presents things in a slightly different way. In fact, it presents the mean values of fte for both treated and controls in the before and after time periods. And that's very similar to our previous two-way tabulation. And of course, we obtain the same result. Finally, if we're worried about whether the control group is similar to the treatment group, we can also include a set of regression controls. If we perform diff-in-diff with the regress command, we can simply enter these as extra variables in our regression, like so: regress, fte, our diff-in-diff setup, and then these extra variables as controls. Execute that. 
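A sketch of the diff workflow follows, under the same assumptions as before: the Card and Krueger data are in memory, and the names fte, treated and t are assumed. The chain dummies bk, kfc and roys are likewise hypothetical names for the fast food chain controls and may differ in your copy of the data.

```stata
* One-time install of the user-written diff command from SSC
* (check that you are permitted to install software first)
ssc install diff

* Basic diff-in-diff: outcome, treatment dummy and period dummy
diff fte, treated(treated) period(t)

* Regression version with additional controls entered as extra regressors
regress fte i.treated##i.t bk kfc roys

* The same controls passed to diff; cov() lists the covariates and
* report displays their coefficient estimates
diff fte, treated(treated) period(t) cov(bk kfc roys) report
```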
We're now estimating a diff-in-diff regression that controls for additional information, in this case what type of fast food chain the outlet is: Burger King, KFC, et cetera. And the results didn't really change though. We can do the same thing with the diff command. The cov option, which is short for covariates, allows us to include additional controls. The report option reports the estimates for these controls. So let's select that and execute it. And there we are. Again, these are the same results as before. So that's how basic diff-in-diff is performed in Stata. However, the journey doesn't end here. Diff-in-diff makes some particular assumptions that need to be tested. And it is also common these days to build more complex diff-in-diff models. We'll explore that in another session. 9. Parallel Trend Assumption: Let's continue talking about difference-in-differences designs. Their popularity has increased enormously in the last 25 years, so it's worth digging a little bit deeper to understand some of the issues around difference-in-differences. In this session, we'll explore a very important assumption behind difference-in-differences analysis. This assumption is called the parallel trend assumption. So let's have a look at the parallel trend assumption and see what it is. Diff-in-diff analysis is subject to all the usual ordinary least squares assumptions. Nothing is different there. But the power of diff-in-diff over OLS comes from the time component, which allows us to examine two different groups over time, one which is treated and the other which is not. Basic cross-sectional ordinary least squares can't do that. However, because of the time component, the diff-in-diff estimator is subject to an additional assumption that must hold for our estimates to be valid. And that assumption is called the parallel trend assumption. It's very simple to explain really. 
The parallel trend assumption requires that, in the absence of treatment, the difference between the treated and control groups remains constant over time. In other words, they have parallel trends. To rephrase that: the trajectory in average outcomes, be that income or employment, must be the same for both groups before treatment. The outcomes don't have to be equal. They just have to follow the same kind of slope before treatment. Only if this assumption holds can we argue that any deviation by the treatment group away from the control group is because of the treatment. So that's the parallel trend assumption. It is the most important assumption around difference-in-differences, and it should always be tested. How do we test for the parallel trend assumption? Well, firstly, we need data from before the treatment period. Classic diff-in-diff is cast as only two time periods, time period 0 and time period 1, and somewhere in between these two time periods some sort of treatment takes place. To get an idea of whether there is a parallel trend, we'll need an additional time period, a third one, which we'll call t minus 1 for now. Think of this as the before-before time period. Without that data, we can't perform a test for parallel trends. In the classic example of Card and Krueger, where they examined minimum wage legislation in New Jersey and Pennsylvania, we only have two time periods. They never performed a parallel trend test, and this is a major critique of their study. If you do have time periods before t 0, then you have two options. The first is to do a simple graphic exploration of the data: plot the data and see what the relevant trends are over time for both the treated and the control group. Do the pre-treatment trends look parallel? If so, the assumption probably holds. This is very easy to perform and also gives you a good visual idea of what's happening to the data. Visually representing data is actually very important, even in regression analysis. 
The second way to examine the parallel trend assumption is via a regression technique. And the nice thing here is that the approach is actually just a diff-in-diff regression for the two time periods before treatment, in other words, time period t minus 1 and time period 0. If the interaction effects are statistically insignificant, then both treated and controls are on the same prior slope. In other words, they're on parallel trends. And the nice thing about the regression approach is that you can control for other variables and therefore take other factors out of the equation. Let's have a look at a couple of pictorial examples, with and without parallel trends. The first example is an example with parallel trends. This will really get the meaning across of what I've been trying to say. All of this is using fake and simulated data, by the way. So here's the first picture. We have a treatment group in blue and a control group in red. Some kind of treatment happens between time period 0 and time period 1. The outcomes for the blue group jump up quite a lot. But actually the outcomes for the red group also increase. So the total effect of the treatment is not quite as big as we think it is, but it still looks quite positive. Can we be confident in claiming that? Yes, we can. Because before time period 0, the trends in outcomes for both groups look identical. Whilst the blue group is always a little bit higher than the red group, their slope, in other words their trend, is similar right up to time period 0. And that means we can credibly argue that, had no treatment happened, the blue group would have seen their outcomes increase anyway. By how much? Well, by as much as the red group. And that is key to this entire analysis. By convincingly claiming that before treatment everything was running on similar tracks, we can be more confident about the causal effect of the treatment at time period 0.5. Now let's take a look at a scenario where this important assumption fails. 
Here is a similar picture. Again, the setup is the same. In fact, the data between t0 and t1 is identical. The blue outcome increases. So does the red outcome, but not as much. A difference-in-differences estimator on the t0 and t1 time periods would find a small positive effect of the treatment. But now let's look at the pre-treatment trend. We see that the trends are not parallel. What does that mean? Well, in this case the answer is pictorially very obvious. The blue group was already on that trend. In other words, the treatment had no effect. Their outcomes were already rising rapidly well before the treatment happened. The red group was also not affected by anything. Their outcomes also continued to rise at the same rate as before, which was slightly lower. So nothing actually happens. There is no effect of the treatment. So this is a great example of why parallel trends matter. Basic diff-in-diff analysis would lead us to think that there's a positive effect of treatment, whereas in actual fact there was no effect at all from the treatment. So now let's head over to Stata and explore how we can perform this within Stata. Here we are in Stata, and this time I'm using my own custom do-file. This do-file inputs two tiny little datasets into Stata. Each group here only has two observations per time period, and that means standard errors and statistical significance will be meaningless. But that's all right. In this case, we're only interested in exploring the coefficients that come out of our model. We need to make sure that we understand what's happening. So let's begin with the first example. That is an example of parallel trends. Let me go ahead and input the first set of data. That's now done. And now let me visualize this data using some scatterplots and linear fits. We're going to do that with this rather lengthy code down here. I'm not going to explain this graph code in a lot of detail, but actually it's quite simple.
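A minimal sketch of what such a do-file might contain. The variable names wages, treated and time, and the individual data values, are assumptions chosen so that the group averages match the numbers discussed in this session. This is not the instructor's actual do-file.

```stata
* Hypothetical toy data: two observations per group and time period.
* Values are assumptions; cell means follow the parallel-trends example.
clear
input treated time wages
0 -1  890
0 -1  910
0  0  990
0  0 1010
0  1 1090
0  1 1110
1 -1 1390
1 -1 1410
1  0 1490
1  0 1510
1  1 1990
1  1 2010
end

* Overlay scatterplots with separate linear fits for each group:
* solid lines cover the pre-treatment periods (time <= 0),
* dashed lines cover the move from before to after (time >= 0).
twoway (scatter wages time if treated == 1, mcolor(blue))                         ///
       (scatter wages time if treated == 0, mcolor(red))                          ///
       (lfit wages time if treated == 1 & time <= 0, lcolor(blue))                ///
       (lfit wages time if treated == 0 & time <= 0, lcolor(red))                 ///
       (lfit wages time if treated == 1 & time >= 0, lcolor(blue) lpattern(dash)) ///
       (lfit wages time if treated == 0 & time >= 0, lcolor(red) lpattern(dash)), ///
       xline(0.5) xlabel(-1 0 1) xtitle("Time period") ytitle("Wages")            ///
       legend(order(1 "Treated" 2 "Control"))
```

The vertical line at 0.5 only marks where treatment is assumed to happen. The `///` continuations mean this should be run from a do-file rather than typed into the command window.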
All I'm doing is running various scatterplots and linear fits for the treated and controls in different time periods. So let's go ahead and take a look. Okay. So this is a visualization of what our data is doing. I already explained this graph previously, so I won't talk about it too much. But noteworthy is that we do see evidence of parallel trends before treatment takes place. The lines between time period t minus 1 and t0 are parallel. So now let's run a regression and test this formally. I'm first going to tabulate time to see what I have. And I've set the time data up to be really clear. Minus 1 is the before-before time period, 0 is the before time period, and 1 is the after time period. Unfortunately, Stata's automatic expansion capability doesn't like negative numbers. So I need to recode this quickly to avoid an error later on. I'm going to replace the variable time with the variable time plus one. In other words, I'm simply going to add one to each unit of time. That doesn't change anything. We've simply shifted the time periods by one. The next step is to run a difference-in-differences model. We can do that by running a regression of treated, time and their interaction on the data for time periods 1 and 2, since these now correspond to the before and after time periods. Let's go ahead and do that. Regress wages against treated, fully interacted with time, if the time period is equal to or greater than one. And that's what we get. Let me put up the previous picture so that we have a good comparison. Our control group starts at 1,000. This is the constant or intercept. The treated group is 500 units above that. Then over time, the control group increases by 100. The treated group increases by 400 on top of the 100 that the control group is increasing over time. In other words, the treated group increases from 1,500 to 2,000 in the second time period. But the causal effect of treatment is only 400.
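The recoding and the regression described here could be sketched as follows. The command is only described verbally in the session, so the exact syntax, the variable names and the data values are assumptions. The data are built so that the cell means reproduce the coefficients just discussed.

```stata
* Hypothetical toy data (assumed values; cell means match the session).
clear
input treated time wages
0 -1  890
0 -1  910
0  0  990
0  0 1010
0  1 1090
0  1 1110
1 -1 1390
1 -1 1410
1  0 1490
1  0 1510
1  1 1990
1  1 2010
end

tabulate time

* Factor-variable notation (i.time) does not accept negative values,
* so shift every period up by one: -1, 0, 1 becomes 0, 1, 2.
replace time = time + 1

* Diff-in-diff on the before (1) and after (2) periods only.
* ## expands to both main effects plus their interaction.
regress wages i.treated##i.time if time >= 1

* Constant: control level before treatment (1,000 with these data).
* 1.treated: treated minus control before treatment (500).
* 2.time: change for the controls (100).
* 1.treated#2.time: the diff-in-diff estimate (400).
```

Because there are four group-by-period cells and four parameters, the coefficients are just differences in cell means, which is why the standard errors carry no useful information here.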
The control group, after all, also went up by 100. That's our diff-in-diff result explained in relation to our plot. Next is the parallel trends test. To perform that, we simply re-run the same analysis for the earlier time periods. Let's go ahead and do that. Regress wages against treated, fully interacted with time, if the time period is equal to or less than one. Execute that. And there we are. Again, let's interpret these results in relation to our graph. The control group starts at 900. The treated group is 500 higher, so that's 1,400. The control group goes up by 100 over time, and the treated group goes up by 0 more over time. In other words, the treated group also goes up by 100 units over time. Therefore, the trends are parallel. The interaction effect is 0. Excellent. Parallel trends check out, the diff-in-diff model is valid, and the results are good. Good to go home. Next, let's check a dataset where the parallel trend assumption fails. So let me load up that dataset and also visualize it. This is the same graph I showed you earlier. Note the absence of parallel trends before treatment. Again, I will shift the time coding to make my life a little bit easier in Stata when I use its automatic expansion capability. Okay, that's done. Again, let's go ahead and run two models: one diff-in-diff model for the before and after time periods, and one diff-in-diff model for the before-before and before time periods. Let's start with the first one. Here are the results, and the diff-in-diff shows us the same result as before. The data is the same for this part, and the treatment has a positive effect of 400. Excellent. Now let's go ahead and check the parallel trend assumption with the second diff-in-diff model. Oh, this looks like bad news. Let's go ahead and look at our graph. We can see from our regression that the control group started at 900. The treated group started at 1,000. Controls increased by 100 over time. But the treated units increased by 400 on top of that. In other words, an increase of 500 units over time.
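The pre-treatment regression for this second dataset might look like the sketch below. As before, the variable names and data values are assumptions, chosen so that the cell means match the numbers read off the output in this session.

```stata
* Hypothetical toy data for the non-parallel case (assumed values).
* Controls: 900, 1,000, 1,100. Treated: 1,000, 1,500, 2,000.
clear
input treated time wages
0 -1  890
0 -1  910
0  0  990
0  0 1010
0  1 1090
0  1 1110
1 -1  990
1 -1 1010
1  0 1490
1  0 1510
1  1 1990
1  1 2010
end

* Shift time so that factor-variable notation accepts it.
replace time = time + 1

* Parallel trends check: the same diff-in-diff specification,
* but on the two pre-treatment periods (0 and 1 after the shift).
regress wages i.treated##i.time if time <= 1

* 1.treated#1.time is the difference in pre-treatment slopes.
* A value near zero supports parallel trends; here it is 400.
```

With real data one would look at the size and the statistical significance of that interaction term. With two observations per cell only the size is meaningful.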
Clearly, therefore, those two slopes are not the same. The slope difference between the treated and controls is 400, to be exact. Therefore, the trends are not parallel. And unfortunately, this means we can now throw away our earlier diff-in-diff result. It has no meaning anymore. We need to come up with something better. So that is how to test for parallel trends in Stata when using difference-in-differences. I showed you the basic regression way of doing it. You can also, of course, use different commands, such as the user-written diff command from the previous session. It will all end in the same way, whichever way you do it. So that's it for this session. There are other concepts to take into account when using difference-in-differences, such as clustering. But these tend to be less important than the parallel trend assumption.

10. Difference-in-Differences without Parallel Trends: Just as a warning, this session will be challenging. Let's talk about advanced difference-in-differences estimation. We finished the previous session learning how to identify and test for the parallel trend assumption. This is a critical assumption behind any diff-in-diff modelling, and it needs to be tested. If the assumption fails, we cannot conduct a basic diff-in-diff estimation procedure with only two time periods. We need to come up with something else. In this session I'm going to show you that you can still obtain good estimates even when pre-treatment trends are not the same. The key to doing this is to build a more complex model that results in complex interactions between variables, but which in turn gives you more modelling freedom and more information. Before I explain how to estimate a diff-in-diff model without parallel trends, here are some basic things that you'll need. You'll need at least three time periods' worth of data, but preferably four or more. Of these four or more, at least two should be before time periods and two should be after time periods.
The model works fine with three, two before and one after, but some coefficients will collapse, as you'll see later. You'll also need to have a good understanding of interaction terms and of how to interpret linear and quadratic terms. The model I will show you is built around estimating a pre-treatment trend for the control and treated units, and then estimating a deviation away from that trend for both controls and treated. Graphically examining your results before and after regression is vital to fully understand what's going on. You'll also need quite a bit of Stata knowledge. A lot of knowledge converges in this analysis, such as building complex models, interpreting interaction effects, graphing results, et cetera. So make sure you check out my previous videos on functional form and also on interaction effects. Having said that, let's have a look at the theoretical setup next. Let's recap a basic diff-in-diff setup quickly before we expand on this. In a simple diff-in-diff model, there are two time periods for both a treatment and a control group. Let's denote a dummy variable that is one for the treated and 0 for the controls, and let's call that treated. Let's also denote a further dummy variable called post. This variable is one for all post-treatment time periods and 0 for all pre-treatment time periods. The interaction of the two is denoted by the variable treated times post. In a classic world where we only have two time periods, both variables and the interaction term are put into a regression. The coefficient on treated will tell us what the difference between treated and controls is. The coefficient on post will tell us what the difference between before and after is. And the interaction coefficient on treated times post will tell us how the treated units differ compared to the control units in the post time period. In other words, this is the diff-in-diff estimate. But now let's imagine that we have more than two time periods.
Let's assume that we have many before and many after time periods. Well, we could now include a further variable in our model called time. Time is a time variable. This variable is continuous and simply measures the time periods. So it could be measured, for example, as 1, 2, 3, 4, 5 or as 1999, 2000, 2001, 2002. Whatever fits. We can add this time variable as an extra control to our model, and that will include an overall time trend in our data. Typically, having a long-term time trend that fits through our data doesn't do a whole lot for us. What we need to do is fully interact the variables treated and post with the variable time. That will result in a new model that has seven different variables. Most important in this model are two interaction effects. Let's have a look at each of the parameters that come from this model and see what they do. Then we'll understand why these two interaction effects are so important. The constant in this model measures the level of whatever we're measuring for the control units at time period 0. Often time 0 is just before treatment. The treated parameter measures the difference between treated and controls at time 0. A positive number here means that treated units are on a higher level than controls. The time parameter tells us the slope of the controls at time period 0. In other words, this is the trend of the controls just before treatment. The treated times time parameter tells us whether the treated units are on a different slope than the controls. A value of 0 means no. A positive value means their slope is steeper. All these parameters tell us about the pre-treatment time periods. Next is the analysis of the post-treatment time periods. The variable post and its interactions will tell us what is happening in the post-treatment world. First is the individual post parameter. This simply tells us whether there is a jump or a drop in whatever we're measuring for the controls just after treatment.
In other words, is there an instantaneous effect from treatment on the controls? One would hope not. But if there is, this parameter will be non-zero. The same question can be asked of the treated units: do they experience an instantaneous jump or drop just after treatment? The post times treated parameter will tell us that. A positive or negative number here tells us that treated units gained or lost that much more than controls immediately after treatment. Well, what about trends? Did they change? To see whether the trend for the controls changed after treatment, we can examine the post times time parameter. If this is positive, it tells us that controls increased their long-run trend or slope after treatment. Again, we would hope that this is not the case, but with this parameter we can identify it clearly. Finally, what about the treated trend? Did that change? Well, the post times time times treated parameter tells us that. If this parameter is positive, then the long-run trend for the treated changed after treatment by more than the trend for the controls did. So basically, if this is positive or negative, then the treated units saw their trend change relative to the trend of the controls post-treatment. Hopefully you can see now that the two key parameters of interest are the post times treated parameter and the post times time times treated parameter. These parameters identify, one, whether the treated units saw an instantaneous boost from treatment or not, and two, whether the treated units saw a long-run trend change from treatment or not. The great thing here is that all of this is controlling for different trends beforehand. So we don't need to worry about parallel trends at all. That was quite a lot of technical information. I don't want to add to this any further, but it is worth pointing out the general syntax and how to achieve this kind of setup in Stata. Stata's automated expansion and interaction capability really pays off here.
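The seven parameters described in this part of the session, plus the constant, can be collected in a single equation. The notation here is my own and is not taken from the slides: $y_{it}$ is the outcome for unit $i$ in period $t$, and the $\beta$ subscripts are arbitrary labels.

```latex
\begin{aligned}
y_{it} ={} & \beta_0
  + \beta_1\,\mathrm{treated}_i
  + \beta_2\,\mathrm{time}_t
  + \beta_3\,(\mathrm{treated}_i \times \mathrm{time}_t) \\
 & + \beta_4\,\mathrm{post}_t
  + \beta_5\,(\mathrm{treated}_i \times \mathrm{post}_t)
  + \beta_6\,(\mathrm{post}_t \times \mathrm{time}_t) \\
 & + \beta_7\,(\mathrm{treated}_i \times \mathrm{post}_t \times \mathrm{time}_t)
  + \varepsilon_{it}
\end{aligned}
```

In this labelling, $\beta_0$ to $\beta_3$ describe the pre-treatment levels and trends, $\beta_4$ and $\beta_6$ describe the jump and trend change for the controls, and $\beta_5$ and $\beta_7$ are the two key parameters: the instantaneous jump and the trend change for the treated relative to the controls.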
You don't need to code all of these variables yourselves. In fact, if you're doing that, you're probably doing something wrong. All you need to do is specify a model that looks something like this: regress the dependent variable y against the indicator variable treated, fully interacted with the continuous variable time, which is in turn fully interacted with the indicator variable post. And if you want, you can also add additional control variables after that. Key to the setup is the triple interaction. Two variables are dummy variables, and one variable is a continuous variable that measures time. And that is all you need. Stata will automatically interact and expand these for you, and you'll end up with seven parameters in addition to any other controls that you specify. Just keep the central point of time in your mind. If you code time as 1998, 1999, 2000, et cetera, then your origin is actually the year 0. The model will still run, but that may confuse you mightily when you see the numbers coming out. Lastly, the model that I'll be showing you uses a linear time trend. There's nothing that stops you from using more complex time trends, such as quadratics or even cubics. It will make the output significantly more complex, but the intuition remains the same. I won't demonstrate this here, though. If you want to do it yourself, simply add more c.time interactions to this model. Now let's have a look in Stata and see how all of this works. Okay, here we are in Stata, and I've set up a custom do-file that inputs two different datasets into Stata. Again, these datasets are very small, and we're not looking at standard errors at all here. We're only looking at coefficients and trying to make sure that we understand what is happening. The first example is a three time period dataset. There are going to be two before time periods and one after time period. In the second example, we're going to use a dataset with nine time periods.
We use exactly the same technique for both. But you'll see that in the three time period model, we're going to have some dropped coefficients. That's not a problem. It simply represents the fact that this analysis can't estimate time trend changes in the post-treatment period for the three time period model, because there's only one post-treatment time period. It just can't be done. So don't think something is wrong when Stata is dropping coefficients for that model. So let's load the first dataset and also visualize it. I won't explain all of this code. The first bit inputs the raw data into Stata directly. The second bit uses the twoway command to overlay various scatterplots and linear fits to visualize the underlying data. So let's execute this and see what we get. Okay. So in this first example, we see three time periods: minus 1, 0 and plus 1. Minus 1 and 0 are the before-treatment time periods, and plus 1 is the after-treatment time period. We can clearly see that the treated and controls are on different trends prior to the treatment. So the parallel trend assumption is violated. We can also see that the control trend does not change after time period 0, when treatment occurs. For the treated, we do see a change in the trend. The trend becomes less positive. In other words, it looks like the effect of treatment is negative. The treated units should have continued along the same trend, just like the controls. A difference-in-differences estimator should therefore estimate a negative effect of the treatment. So let's go back to our do-file and run a regression that takes these non-parallel trends into account and hopefully ends up with a negative estimate. To do that, all we need to do is create a post dummy that is equal to one for every time period after treatment and 0 for every time period before treatment. We then run our fully interacted regression model. Let's go ahead and do that. Excellent. Here are our results. Two parameters have dropped, and that's fine.
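A sketch of how this three time period example might be set up. The variable names and the data values are assumptions: the values are chosen so that the group averages follow the pattern described for this example (controls on a steady trend of 100, treated on a trend of 500 that flattens after treatment).

```stata
* Hypothetical three-period data (assumed values).
* Controls: 900, 1,000, 1,100. Treated: 1,000, 1,500, 1,600.
clear
input treated time wages
0 -1  890
0 -1  910
0  0  990
0  0 1010
0  1 1090
0  1 1110
1 -1  990
1 -1 1010
1  0 1490
1  0 1510
1  1 1590
1  1 1610
end

* Post dummy: 1 for every period after treatment, 0 before.
generate post = (time >= 1)

* Fully interacted model. time enters as continuous (c.time),
* so negative values are fine and no recoding is needed.
regress wages i.treated##c.time##i.post

* With a single post-treatment period, post and post#c.time are
* perfectly collinear, so Stata omits two of the post-period terms.
* The pre-treatment part is unaffected: constant 1,000, treated 500,
* time 100 and treated#c.time 400 with these data.
```

The treated units would have reached 2,000 had they stayed on their old trend, but they end up at 1,600, which is where the estimate of minus 400 comes from.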
Let me put up the previous graph so that I can highlight what the coefficients actually do. At time period 0, the constant shows us that the controls have a level of 1,000. The treated units are 500 units above that. At time period 0, the control units have a trend of 100. And at the same time period, the treated units have a trend that is 400 greater. In other words, their trend or slope is 500. The post dummy indicates that the control units experienced no change from the previous trend in the post-treatment period. Everything remains on the same path. However, for the treated, the story is different. Their path changed significantly. By how much? Well, by minus 400. And that is the parameter on the treated times post interaction. So the effect of treatment was negative. By how much? By minus 400. And the great thing is that we estimated this in the absence of parallel trends. Next, let's move to a more complex model with nine time periods. Let me go ahead and input the data and graph it. Execute that. Okay, here's the graph. We can see that the treated and controls are on different trends before treatment, again. After treatment, the controls see their trend increase, whilst the treated see a small jump, but their trend actually remains stable after treatment. In other words, the effect of treatment here is twofold. Firstly, there's an immediate positive effect on the treated units via a level jump. But their trend remains unaltered, and over time, therefore, the treatment has negative effects. That is because, in the absence of treatment, the treated trend should have increased by the same amount as that of the controls. So let's use our model again and see what happens. Let's generate our dummy post and then execute the regression. And here's our regression. Again, let me pull up the previous graph to help you interpret the coefficients coming out of this complex diff-in-diff model. We start at 1,000 for the controls at time period 0.
Treated units are 300 higher at time period 0. Controls are on a trend of 100 before treatment. Treated units are on a trend of 100 less before treatment. In other words, their trend is 0. The controls see no immediate jump after treatment. But the treated see an immediate jump of 100 after treatment. That's that little gap right over here. Treated units therefore get an immediate positive shock from treatment. However, after treatment the trend for the control units increases by 200. They were already on plus 100, so the new trend will be plus 300. The treated units see no increase in their trend post-treatment. Their value of minus 200 exactly cancels out the control trend's increase of plus 200, which implies that their trend remains at 0. The result is that the treatment was not good for the treated. It negatively affected their trend by 200 post-treatment. Remember that the final minus 200 is a time trend effect. So that is minus 200 per year. In other words, the losses for the treated will mount over time. It is actually possible to highlight this by replacing the post dummy with a series of individual dummies. That can be done with the following code. I'm generating a new variable, post2, here, and I'm filling that with the time values if time is equal to or greater than one, that is, our post-treatment periods. These are then simply expanded in our regression, and we're going to end up with a series of post-treatment time dummies. And here is our model. Again, don't worry about the dropped interaction terms. The treated times post dummies now compute the overall effect in each post-treatment time period. Remember, treated units gain plus 100 instantly but also lose 200 in the trend. So in the first time period the effect is minus 100, in the second time period this increases to minus 300, then to minus 500, et cetera. So that's another way to clearly estimate the effect of treatment. Both models are exactly the same. They're just different ways of presenting the results.
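Both versions of the nine time period model can be sketched as below. Everything about the data is an assumption: the periods are taken to run from minus 4 to plus 4, and the wages are generated directly from the coefficients discussed in this session, so the fit is exact and the standard errors carry no information. The way post2 is built is also a guess at what the described code does.

```stata
* Hypothetical nine-period data, one observation per group and period.
clear
set obs 18
generate treated = (_n > 9)
generate time = mod(_n - 1, 9) - 4      // assumed periods: -4, ..., 4
generate post = (time >= 1)

* Controls: level 1,000 at time 0, trend 100, rising to 300 after treatment.
generate wages = 1000 + 100*time + 200*time*post if treated == 0
* Treated: flat at 1,300 before treatment, flat at 1,400 afterwards.
replace wages = 1300 + 100*post if treated == 1

* Model 1: a single post dummy, giving one jump and one trend change.
regress wages i.treated##c.time##i.post
* 1.treated#1.post is the immediate jump for the treated (+100).
* 1.treated#1.post#c.time is their relative trend change (-200).

* Model 2: one dummy per post-treatment period instead of post.
generate post2 = cond(time >= 1, time, 0)
regress wages i.treated##c.time##i.post2
* The post2-by-time terms are collinear with the dummies and are omitted.
* The treated#post2 terms give the total effect in each period:
* 100 - 200*time, that is -100, -300, -500 and -700.
```

The second model trades the compact jump-plus-trend summary for a period-by-period effect, which can be easier to present when the post-treatment path is not well described by a straight line.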
The key concept of this session was to highlight that diff-in-diff models can work without parallel trends, and that you can estimate very flexible models that recover very good treatment effects. Stata is especially useful here because it can do so much of this automatically. Of course, there are other issues to deal with. We haven't covered clustered standard errors, for example, but the key aim here was to show you how to build complex diff-in-diff models that incorporate the failure of parallel trends.