Stata: Count Data Analysis | Franz Buscha | Skillshare

Lessons in This Class

6 Lessons (45m)
    • 1. Introduction (2:55)
    • 2. Count 1 - Features of Count Data (7:46)
    • 3. Count 2 - Poisson Regression (9:28)
    • 4. Count 3 - Negative Binomial Regression (7:59)
    • 5. Count 4 - Truncated and Censored Count Regression (8:02)
    • 6. Count 5 - Hurdle Count Regression (8:44)

About This Class

This class is a Stata module that explores how to analyse, and model, count data using the statistics software Stata.

Count data is data that takes only non-negative integer values. Are you looking at data that counts the number of doctor visits? The number of children a couple has? How many times somebody takes a plane in a month?

If so, you will need to use count data models to analyse such data. Standard regression techniques cannot handle the peculiarities of count data, and other techniques need to be used.

In this class I will explore common count data models such as Poisson and Negative Binomial models and also explore more complex models that have hurdle or two-step processes. 

You are expected to have a basic understanding of Statistics and Stata to get the most out of this course.

Transcripts

1. Introduction: Hello and welcome to this short course on count data analysis. This course is a Stata module that aims to provide viewers with a quick overview of how to analyze count data in Stata. To gain the most from this course, you should have a basic understanding of statistics, specifically regression analysis, and a basic understanding of Stata. If you don't have the first, I recommend that you watch my Easy Statistics course, which focuses on the concept of regression analysis. If you don't have the second, I recommend that you watch my Learn Data Analytics with Stata course, because that gives you a great primer on how to use Stata. Please note that Stata is proprietary software. It is not free. You'll still learn something from this course even if you don't have access to Stata, but it is recommended that you have access to it.
Count data is data that is in count form. It must take integer values and cannot be negative: 1, 2, 3, 4. These are all counts. Examples of count data include the number of children a person has, the number of doctor visits a person makes per year, or the number of pub trips someone makes per month. If you're working with or analyzing data that is in count format, then you will require specialized methods and models to cope with its unique nature.
In this course, I will highlight some of the most important concepts of count data analysis. This includes: what is count data and why do we need special models? What is a Poisson model? What is a negative binomial model? I'll also have a look at specialized count data models that have truncated or two-step processes. These models often form the basis of count data analysis, and if you're working with data where the primary variable of interest is in count form, then you should know about these models. This course is not intended to be an in-depth mathematical or statistical exploration of count data models. If you want equations, then you won't find them here.
Each session will provide a quick overview of the main statistical issue at hand and then move on to Stata, where I will demonstrate how to do things and how to interpret them. The outcome of the course is to give viewers a clear and basic overview of how to handle count data. Because this is an advanced topic, all Stata interaction is via code, and you should already understand the basics of Stata coding. So come join me as we explore count data in Stata.

2. Count 1 - Features of Count Data: Let's learn how to analyze and model count data. Count data is data that is generated by a process that results in only non-negative integers. In other words, data of this type can only be zero or positive, and it must increment in whole numbers. For example, 0, 1, 2, 3, 4, 5 and so forth. There are many examples of count data in real life, such as how many storms hit a region per month, how many crimes occur per week, or how many children a family has. All of these are counts.
Here's a famous historical example of count data and its analysis. This data stems from somebody called Ladislaus Bortkiewicz and his book from 1898. He was interested in analyzing small numbers and, in one example, produced this table. This table shows the number of deaths that were caused by horse kicks in the Prussian army in the late 1800s. Apparently, this was a real issue back then. You can see that in most months, nobody died. However, one death per month was observed 65 times, and two deaths per month were observed 22 times. This is a classical example of count data, and we're going to come back to this example a little bit later.
So a natural question is: do we need special regression models for count data? The short answer is yes. Many count data distributions are positively skewed, rather than normally distributed as required by ordinary least squares. They also often contain many zeros, which makes transforming the data impractical.
That is because multiplying something by zero will simply produce zero, and therefore won't transform anything. Finally, ordinary least squares can predict negative counts. This is, of course, something that can't happen in reality. Therefore, we need a different type of regression model to accommodate these factors.
So let's have a look in Stata and explore some of these issues a little bit further. I'll use some of the knowledge gained from the simulation exercises to simulate count data. This can be done with the rpoisson() function, which generates a random count variable that is Poisson distributed. So let's head over to Stata and take an initial look at count data.
Here we are in Stata with an empty dataset. Do you remember the horse kick data I showed you earlier? This data from the 1800s was shown to be Poisson distributed by Bortkiewicz. He calculated that this data has a mean of 0.61. There were roughly 200 observations in that data. Let's go ahead and generate 200 new random observations that are Poisson distributed with a mean of 0.61. We can do that in Stata by executing the following code. Let's start with a clear and then also set the seed so that we can replicate our simulated numbers. We'll also set the observation count to 200. Next, let's generate a new variable. Let's call it count. This new variable is going to be Poisson distributed with a mean of 0.61. Execute that. Finally, let's tabulate the newly generated variable.
Here we can see the tabulation of the count variable. This is random data, but it should look very similar to the original data from 1898. We see that there are many zero counts and around 60 counts of one. There are also 24 counts of two and only a few counts of three. So we can confirm that Bortkiewicz was right: his data was indeed Poisson distributed. We'll explore Poisson modelling in the next video, but here is an important aspect of Poisson modelling. Let's go ahead and perform a summary of the count variable.
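The simulation just described can be sketched as follows. This is a minimal reconstruction, not the course's own do-file: the seed value is arbitrary (the video does not state which one was used), so the exact tabulation will differ from the one described. The last line runs the detailed summary so that the mean and variance can be compared.

```stata
* Simulate 200 Poisson draws with mean 0.61, mirroring the horse-kick data
clear
set seed 12345                      // arbitrary seed, assumed for reproducibility only
set obs 200
generate count = rpoisson(0.61)    // Poisson-distributed random counts
tabulate count

* Under a Poisson process the mean and the variance should be close
summarize count, detail
```

Changing the seed or the number of observations is an easy way to see how much the tabulation varies from draw to draw.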
Specifically, let's invoke the detail option to see the underlying variance. What we see is that the mean of our count variable is approximately 0.62 and the variance is approximately 0.63. So the variance and the mean are almost identical. This is called equidispersion, and it's a principle that is very important to Poisson modelling. Unfortunately, it's also a principle that is often violated with real data.
Next, let me show you what happens if we run an ordinary least squares regression model through count data. So let's load up the Doll and Hill practice dataset: clear, and then webuse dollhill3. This dataset contains information about the number of deaths per age category for smokers and non-smokers. Let's go ahead and describe it. We see that it's a very small dataset and only has ten observations and four variables. But let's ignore that for a moment. A natural analysis would be to ask how smoking and age are related to the number of deaths. So let's have a look at the deaths variable in more detail: tabulate deaths.
Here we can see that the deaths variable is likely to be a count variable. We don't know how the count takes place, for example, whether it's deaths per week, per month or per year. But we can see that all the numbers are integers and they're positive. Often count variables in real life will have many low counts and only a few high counts, but there's no reason that needs to be true. There is no fundamental problem with count data having large values.
So let's go ahead and build a regression model of deaths. Being naive, we might use an ordinary least squares regression that regresses the number of deaths against smoking behavior and age, for example. So let's go ahead and run a regression with deaths as our dependent variable, and smoking and age categories as our explanatory variables. Here's the ordinary least squares regression output.
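A minimal sketch of this naive OLS exercise is shown here. The dataset and its variables (deaths, smokes, agecat) come from Stata's webuse collection; entering age as factor indicators (i.agecat) is an assumption about the specification, and ols_hat is a hypothetical name for the fitted values.

```stata
* Load the Doll and Hill practice data
webuse dollhill3, clear
describe
tabulate deaths

* Naive OLS: deaths on smoking status and age-category indicators
* (i.agecat is an assumed specification; the video only says "age categories")
regress deaths smokes i.agecat

* Fitted values from the linear model; ols_hat is a hypothetical variable name
predict ols_hat, xb

* Any fitted value below zero is an impossible count
list agecat smokes deaths ols_hat if ols_hat < 0
```

Listing only the rows with negative fitted values makes the problem with OLS on counts visible without scanning a full tabulation.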
We can see that smoking is positively correlated with the number of deaths, whilst the effect of the age categories is less clear-cut. So next, let's go ahead and obtain the predictions from this model. We can do that with the predict command. So let's go ahead and type predict count, which will generate a new variable called count. Then we can tabulate the new variable with the predictions. And we see that there is a problem. We predicted a negative count, and that doesn't make sense. We can't have a negative number of deaths. That is one of the reasons why we want to use count-specific regression models, such as Poisson or negative binomial regressions. And that concludes this introduction to count data.

3. Count 2 - Poisson Regression: Let's examine Poisson regression. Many count processes follow a Poisson distribution. For example, the number of customers calling a call center, people visiting a website, or even movements in a stock price often follow a Poisson process. A Poisson regression is a type of regression that tries to explain a Poisson process through a series of explanatory variables. Poisson regression is a non-linear regression method, and we therefore need to take care when interpreting our results.
Poisson models make some assumptions. Poisson models assume that the events are independent of each other. For example, counting the number of buses turning up at a bus stop is likely to fail this assumption, since buses are deliberately spaced apart. Poisson models also assume that the average rate of events per time period, called the incidence rate, is constant. That may not always hold. For example, morning rush hour traffic will likely be different to midnight traffic. So if you're counting cars on the road over time, a Poisson model may not be appropriate for you. Poisson models also assume that two events cannot occur at exactly the same time.
If your count process fails to meet these criteria, then a Poisson model is probably not the right choice. A final assumption that the Poisson regression model makes is the assumption of equidispersion. This is the concept that the mean and variance are equal and cannot vary independently. This assumption is frequently violated in practice. Often your variance will exceed your mean. This is called overdispersion, and it is important to test all Poisson models for overdispersion. In rare cases, we may also have underdispersion, where the variance is smaller than the mean.
Let's move over to Stata. There, we're going to use the poisson command, which works very similarly to the regress command. To test for equidispersion, we're going to use the overdisp command. This is a command that needs to be installed first. Make sure you always seek any relevant permissions before installing anything on your computer. Now, let's head over to Stata and explore Poisson regression.
Here we are in Stata, and I've already loaded the custom count dataset that comes as part of this session. You can download it from this session. Let's take a look at the data first. Let's run a describe command. Here we can see that this data contains information on around 3,500 individuals and the number of doctor visits for each of these individuals. Let's take a closer look at the variable for the number of doctor visits. Let's tabulate docvis. The first thing that we notice is that this distribution is quite long. Some individuals went to see a doctor 20 or even 40 times. Generally speaking, count data for Poisson models is short and snappy, and it doesn't contain many high counts. So I'm already suspecting that a Poisson model may not be appropriate for this dependent variable. But let's continue anyway. Let's go ahead and run a regression model that tries to explain how these doctor visits are determined.
I'm going to specify a Poisson model that uses age, education, income, and gender as explanatory variables. But before we do that, let's explore Poisson's help file quickly. We can do that by typing help poisson. Here we can see the help file for the Poisson regression come up. It's relatively simple to use, just like a normal regression. We specify the poisson command first, and then we must specify one dependent variable. We can then specify between zero and as many independent variables as we would like. In terms of options, the only important options that you should be aware of are the exposure and offset options. Poisson regression models may also be appropriate for rate data, where a rate is a count of events divided by some measure of time or exposure. So if your data contains a measure of time, then it may be appropriate to use the exposure and offset options for your Poisson model.
Let's go ahead and close this help file, and let's run our regression. Let's type poisson, then the dependent variable, doctor visits, and then we'll use the explanatory variables age, education, income, and also gender. Execute that. And here are the results of our Poisson regression model. We can see that the results look very similar to those obtained from an ordinary least squares regression command. The only real difference here is that, because this is a non-linear type of model, we see maximum likelihood estimation.
The results from this Poisson regression model suggest to us that age, education, and income are all statistically significantly related to doctor visits. Gender does not appear to be an important explanatory variable. Age and education are positively related to the number of doctor visits, whilst income appears to be negatively related. Raw coefficients in a Poisson regression relate to the log count. This allows interpretation of coefficients as semi-elasticities. Here's what I mean: a one-unit increase in age is associated with a 0.9% increase in the count of doctor visits.
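The Poisson fit and the semi-elasticity reading can be sketched as below. It assumes the course's doctor-visit dataset is already in memory; the variable names docvis, age, educ, income and female are guesses from the narration and are not confirmed by the course.

```stata
* Assumes the course dataset is loaded. All variable names here are
* hypothetical reconstructions: docvis, age, educ, income, female.
poisson docvis age educ income female

* Raw coefficients are on the log-count scale. For a one-unit change in age,
* the implied percentage change in the expected count is 100*(exp(b) - 1),
* which is close to 100*b when b is small.
display 100 * (exp(_b[age]) - 1)

* Redisplay the same model as incidence-rate ratios, exp(b)
poisson, irr
```

For rate data, the exposure() or offset() options mentioned in the help file would be added to the poisson line; no exposure variable is described for this dataset, so none is shown.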
So that is the interpretation of the raw coefficients presented to us by Stata. However, we can also use the margins command to compute marginal effects. To do that, we can type margins with the dydx option, and in this case I'll use the star wildcard to denote all variables. Execute that. Marginal effects allow interpretation in terms of an increase in counts. For example, a one-unit increase in education is associated with 0.16 additional doctor visits. Alternatively, a one-unit increase in age is associated with 0.06 additional doctor visits.
Finally, we need to test whether the Poisson model is appropriate before we accept these estimates. Stata offers an inbuilt way to do that via the estat gof command, where gof stands for goodness of fit. The estat gof command uses deviance and Pearson statistics to test whether the Poisson model is appropriate. Let me demonstrate. We can simply type estat gof after a Poisson regression, and we are offered two test statistics. If the p-value of either test statistic is below 0.05, then the Poisson model is not a good fit. That seems to be the case here.
However, it is important to note that this is not a test of equidispersion. To test that, we need to use a user-written command called overdisp, which can be installed for free. Let me go ahead and install this command. Once the command is installed, to use it we simply copy the Poisson regression line and replace the poisson command with the overdisp command, like so: overdisp, then docvis, and then our explanatory variables. Execute that. We see that we get a test statistic back called uhat. A positive and statistically significant coefficient on this test is evidence of overdispersion. A negative and statistically significant coefficient on this test is evidence of underdispersion. In this case, there's clear evidence of overdispersion. So ultimately, we reject this modelling approach, and we need to find an alternative model for this count data.
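The post-estimation checks described in this lesson can be sketched as follows. The variable names are the same hypothetical ones as before, and installing overdisp from SSC is an assumption about where the user-written command is hosted; if that fails, typing search overdisp should locate it.

```stata
* Assumes the course dataset is loaded; variable names are hypothetical.
poisson docvis age educ income female

* Average marginal effects on the expected count, for all regressors
margins, dydx(*)

* Deviance and Pearson goodness-of-fit tests for the Poisson model
estat gof

* User-written overdispersion test. Assumed to be available from SSC;
* seek permission before installing software on a shared machine.
ssc install overdisp
overdisp docvis age educ income female
```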
And that leads us nicely into our next video about negative binomial regression, which will relax the assumption of equidispersion.

4. Count 3 - Negative Binomial Regression: Let's explore the negative binomial regression model for count data. This model is frequently used when we encounter overdispersed count data. In other words, it deals with count data where there is more variation than would be expected from a Poisson process. However, a negative binomial model is not helpful in cases of underdispersion. This is because Poisson and negative binomial models arise from situations where events are independently generated. Overdispersion often happens if some causes of the underlying Poisson process are unknown. Underdispersion, in contrast, happens if the counted events are in some way connected or regulated and therefore not independent.
A negative binomial regression model is similar to a Poisson regression model, except that it has an extra parameter which allows the variance of the predicted counts to differ from the mean. This parameter is called alpha, and it estimates to what extent overdispersion is present in the data. This parameter also provides a convenient test of whether a negative binomial model is appropriate or not. This is because the Poisson model is a special case of the negative binomial model when the parameter alpha is set to zero. In other words, the previous overdispersion command wasn't actually needed. The negative binomial regression model in Stata provides an inbuilt test of overdispersion.
So let's head to Stata and use the nbreg command to estimate a negative binomial regression. Most of the code we're using is pretty standard. But we'll also use the predict command to predict counts from our model, and then use the correlate command to compute a squared correlation between fitted and actual counts. This is often a better indicator of fit when dealing with these kinds of models.
It also allows for a better comparison across both the Poisson and negative binomial regression models. So here we are in Stata, and I've already loaded the associated count dataset that comes with this session. This is the doctor visits data that includes information on patient characteristics and how many times patients went to see a doctor. Before we head further, let's have a look at the negative binomial regression help file. We can do that by typing help nbreg.
So here's the help entry for the negative binomial regression model. It's pretty standard stuff. We need to specify one dependent variable and then as many independent variables as we like. We can also include if and in conditions and weights. If you have time or rate elements in your data to indicate things like exposure time, then you can use the exposure and offset options if you wish. There is also a dispersion option. You can change how overdispersion is parameterized via the dispersion option, and you can either have a mean dispersion, which is the default, or a constant dispersion. You'll need to decide for yourself which to use. Some authors refer to mean dispersion models as NB2 models and constant dispersion models as NB1 models. If you're uncertain about this, my advice is to generally leave this alone and go with the default mean dispersion.
So let's close this and run a negative binomial regression. Our previous Poisson regression model indicated strong signs of overdispersion. So let's fit the same model, but using negative binomial regression to estimate it. So nbreg, the dependent variable, doctor visits, and then our independent variables age, education, income, and gender. And here are the results. The output from this model can be interpreted as follows. The top half of our results contains the diagnostics, which are fairly standard. You can watch my videos on logit models for more information about these.
The only additional piece of information here is how dispersion is parameterized. Next comes the output part of the model. Age and education are both statistically significant in our model, whilst income and gender are not. The interpretation of the coefficients is the same as that of a Poisson regression model. The raw coefficients refer to changes in the log count, which can be interpreted as a semi-elasticity. So for example, a one-year increase in age increases the number of doctor visits by 0.9%.
At the bottom of our regression output, we see two estimates of alpha. nbreg actually estimates the log of alpha and then exponentiates this to produce alpha. So both numbers refer to the same thing. In this case, alpha has a value of 0.83, and the test at the bottom indicates that it is very statistically significantly different from zero. In other words, a Poisson model is not justified, whilst a negative binomial model is.
We can also use the margins command to compute average marginal effects. So let's go ahead and do that: margins with dydx for all our variables. This allows for a slightly easier interpretation in terms of count changes. So for example, a one-unit increase in education increases the number of doctor visits by 0.15 counts.
To get an idea of fit for this model, we can of course use the pseudo R-squared that was presented earlier as part of our diagnostic statistics. However, this makes it difficult to compare goodness of fit across different methods. A potentially better way to interpret goodness of fit is via the squared coefficient of correlation, which is similar to R-squared from ordinary least squares. To compute that, we need to predict the count from our regression model, like so: predict the count. Then we correlate the predicted count values against the true values. So correlate the fitted count against the actual doctor visit counts. We see that we get a correlation coefficient of 0.0875. We now square this number. And there we are.
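The negative binomial steps described above can be sketched as follows, again assuming the course dataset is loaded and using hypothetical variable names. nb_hat is an illustrative name for the predicted counts.

```stata
* Assumes the course dataset is loaded; variable names are hypothetical.
* Mean dispersion (NB2) is the default; add dispersion(constant) for NB1.
nbreg docvis age educ income female

* Average marginal effects on the expected count
margins, dydx(*)

* Predicted number of events; nb_hat is a hypothetical variable name
predict nb_hat, n

* Squared correlation between fitted and actual counts
correlate nb_hat docvis
display r(rho)^2
```

Running the same predict, correlate and display lines after the poisson fit gives a figure that is comparable across the two models, which the pseudo R-squared is not.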
We've computed a squared coefficient of correlation of roughly 0.008 (0.0875 squared is about 0.0077). So that's quite a poor value. It suggests that even though we took care of the overdispersion problem in our data, our model remains a poor model. We would probably want to use more explanatory variables in future iterations of this model. And that concludes this session on negative binomial regression. As you can see, it's very similar to Poisson regression.

5. Count 4 - Truncated and Censored Count Regression: Sometimes count data is truncated or censored. When the dependent variable is censored or truncated, we must use estimation methods that account for this limitation. If we do not account for such limitations, our estimates may not be accurate. Truncation means that we do not observe part of the dependent variable or of the covariates. For example, we may not observe the value zero in the count of the number of doctor visits, and only observe values that are equal to one or greater. Censoring, on the other hand, implies that we observe all parts of a variable, but for a specific part we only see a censored value. A common example of this is top coding, where variables with long tails, for example wages, are coded into a final category. So for example, in our doctor visits data, any values above 20 may be top coded simply to the value 20.
Here's an example of truncation and censoring using the doctor visit data. The first histogram shows the variable that counts the number of doctor visits per person. It's a bit hard to see here, but the very first count is truncated. There are no zeros in this variable. In other words, around 10% of this data is truncated and therefore missing. In the second histogram, we see the censored version of the same count variable of doctor visits. Here the count was censored at 20, and all observations with counts greater than 20 were recoded to the value of 20. We still have all observations, but for around 5% of our sample we don't know the exact value above 20.
With such data patterns in count data, we can use Stata's truncated and censored count regression commands. For truncated data, we can use tpoisson and tnbreg, which run truncated Poisson and truncated negative binomial regression models, respectively. For censored data, Stata only offers a censored Poisson model with the command cpoisson. Unfortunately, there is no cnbreg or censored negative binomial regression command. All three commands require that users specify where the censoring or truncation occurs, either at a lower limit or an upper limit. Both can be specified at the same time. So let's head over to Stata and see how this works in practice.
Here we are in Stata with the doctor visit data already loaded. I've got some code here that produces a truncated and a censored version of the count variable for doctor visits, docvis. So let's run this code, but feel free to change these values to any other values yourself. Now let's tabulate both new variables and see how they look. Let's tab tdocvis, which is our truncated version of docvis. Tabulating the truncated variable reveals that it is missing the zero count. Therefore, it has fewer observations, and the zero value is missing from its tabulation. Next, let's tabulate the censored version, cdocvis: tab cdocvis. Here we can see that the censored variable does not have any missing observations. It has 3,677 of them, but it has been top coded. So any values above 20 are now coded to 20.
To properly account for this data behavior, we need to use truncated and censored count data models. To estimate a truncated Poisson regression, we can use the tpoisson command. So tpoisson, and the dependent variable is tdocvis. We then specify our explanatory variables, and we're going to specify the lower limit, which in this case is zero. That tells Stata that in this variable the truncation occurs at the value zero. There is also a ul option that stands for upper limit, and both can be specified together.
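The video does not show the code that builds the two new variables, so the following is only one plausible way to do it, under the same hypothetical variable names as before (docvis, tdocvis, cdocvis and the regressors).

```stata
* Assumes the course dataset is loaded; all names are hypothetical.

* Truncated version: zero counts are set to missing and so drop out
generate tdocvis = docvis if docvis > 0

* Censored version: top code everything above 20 to exactly 20
generate cdocvis = docvis
replace cdocvis = 20 if docvis > 20 & !missing(docvis)

tabulate tdocvis
tabulate cdocvis

* Truncated Poisson with the truncation point at zero
tpoisson tdocvis age educ income female, ll(0)
```

The !missing() guard matters because Stata treats missing values as larger than any number, so a bare docvis > 20 would also recode missing counts to 20.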
If you forget to specify either a lower limit or an upper limit, Stata will default to the ll(0) option. So let's execute this. Here we can see that our truncated Poisson regression results are presented in an identical fashion to the standard Poisson regression command, except that we also receive some diagnostic information that states the truncation range. These estimates have the usual interpretation, and we could also use the margins command to compute marginal effects from this.
To estimate a truncated negative binomial regression model, we can replace the tpoisson command with the tnbreg command, like so: tnbreg, then truncated doctor visits, our explanatory variables, and the lower limit of zero. And here is our truncated negative binomial regression output. This output is identical to that of a standard negative binomial regression model, except that again we're given some information about the truncation point in the diagnostics. We can see that in this case there's still evidence of overdispersion in our data, even though we're working with a reduced sample size of only 3,276 observations. So this would seem to be a more appropriate model than the truncated Poisson regression model.
For censored count data, we can only use the censored Poisson model, as a censored negative binomial model doesn't exist in Stata. So let's go ahead and run one. For that, we can use the cpoisson command, and we specify the cdocvis variable as our dependent variable. We then include our explanatory variables and, in this case, specify an upper limit of 20. This is where our data is censored. Execute that. Again, the output here is similar to that of a standard Poisson model, except that the diagnostic information tells us what limits were imposed and how many observations are in the uncensored and censored parts of the regression.
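The remaining two commands from this lesson can be sketched as below. The sketch assumes that tdocvis and cdocvis already exist as the truncated and top-coded versions of the doctor-visit count, and it uses the same hypothetical regressor names as before.

```stata
* Assumes tdocvis (zeros removed) and cdocvis (top coded at 20) exist;
* all variable names are hypothetical.

* Truncated negative binomial, truncation point at zero
tnbreg tdocvis age educ income female, ll(0)

* Censored Poisson, right-censored at 20
cpoisson cdocvis age educ income female, ul(20)

* Marginal effects can follow either model in the usual way
margins, dydx(*)
```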
Interpreting the output is the same as for a Poisson model, which implies that the raw coefficients are semi-elasticities. For example, a one-unit increase in education increases the count rate by approximately 2%. And this concludes this overview of truncated and censored data models in Stata. Estimating truncated or censored count data regressions is very easy in Stata, and all the same rules from the Poisson and negative binomial regression models apply, including how to interpret the coefficients. It's a shame that Stata does not offer a censored negative binomial model. But to have a complete lineup of models, there is a user-written command that has been developed. Just search for it to install it. It's very easy to find.

6. Count 5 - Hurdle Count Regression: Sometimes we may come across a count variable that has an excessive amount of zeros. This is quite common and is visible as a big spike of observations at zero compared to the rest of the count distribution. However, even if a count variable doesn't have an excess amount of zeros, we may believe that the zeros are generated by a different process. For example, going to the cinema or not may involve different processes than counting how many times someone went to the cinema once they've decided to go. We can model such processes via hurdle or zero-inflated regression models. These split the data into two separate components and examine how each of the data processes occurs.
Here's an example from our doctor visit data. This graph is a histogram of the first 50 counts. Let's look at the zeros. They don't particularly look excessive for this kind of distribution. An excess of zeros would probably imply that the first bar should be twice as tall. However, that's fine. We may still believe that the zeros are generated from a different process. For example, people who see a doctor are usually sick. So we may assume that the first hurdle of going to a doctor is actually getting sick.
What's the difference between a hurdle and a zero-inflated model? Both models assume that the data is generated via a two-step process. Both models assume that the first part of the process is the on-off part, which determines whether something happens in the second part. The second part is the count part, which can only occur if a non-zero outcome was predicted in the first part. However, hurdle models differ from zero-inflated models by assuming that any count must be greater than zero if the hurdle has been cleared. In zero-inflated models, counts can also be zero if the first part has been cleared. In other words, visiting zero doctors may be a choice even if one is sick. And that's actually quite a sensible approach. Not everyone will choose to do something, even if they can do it.
Stata does not have an inbuilt hurdle command for count data, so we'll have to construct it ourselves using the logit and truncated negative binomial regression commands. That's not a problem, and doing it like this explains the process really well. Stata does have an inbuilt command for zero-inflated models, and in this session we will look at the zero-inflated negative binomial regression model. A zero-inflated Poisson model with the command zip also exists, but we won't cover that. So let's head over to Stata and take a look.
Here we are in Stata with the doctor visit data already open. We concluded one of the previous sessions with a negative binomial model. So let's now turn this into a hurdle model. To do this, we first specify a model of whether a zero count or any other count has taken place. We can do this by replacing the previous nbreg command with the logit command. Note that the logit command does not actually need a binary dependent variable. It will treat zeros as zeros and any other positive value as one. So let's go ahead and estimate that: logit, doctor visits, and then our explanatory variables age, education, income, and gender. Estimate that.
And the results show that all variables are positively related to the probability of visiting a doctor, although income is not statistically significant. We can compute marginal effects to help with the interpretation of this by using the margins command: margins and then the option dydx for all variables. So being female increases the probability of visiting a doctor in the first place by 3.9 percentage points. The next step is to model the second part of the hurdle model. In this case, we want to use a truncated version of the Poisson or, in this case, the negative binomial regression model. So let's go ahead and do that and use the tnbreg command, which we learned about in the previous session. The lower limit is 0. So tnbreg, doctor visits as the dependent variable, then the explanatory variables. And the important thing here is that we enter a condition: if the number of doctor visits is greater than 0. This creates the hurdle. So here are our results. The condition we inserted into our tnbreg command meant that this analysis is only for observations that actually went to see a doctor. Results suggest that age and education are positively related to the count of doctor visits, but gender is not significant, unlike in the previous equation. There is still strong evidence of overdispersion. We can also compute marginal effects to aid interpretation for this part of the regression: margins and then dydx for all our variables. And here we can see that a one-unit increase in education is associated with an increased count of doctor visits of approximately 0.103. Remember, it is important that both regressions are presented and interpreted together. When you're writing this up, you should never present only one part of a hurdle model. Next, if we assume that once people have passed a hurdle they can choose a count of 0, then a zero-inflated model may be more appropriate than a straightforward hurdle model.
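Putting the two parts of the hurdle model together, the sequence of commands described above might look something like this. It is a sketch, not the instructor's exact do-file: the variable names docvis, age, educ, income and female are hypothetical, and the doctor visit data are assumed to be in memory.

```stata
* Assumption: doctor visit data are in memory.
* docvis, age, educ, income, female are hypothetical variable names.

* Part 1: probability of clearing the hurdle (any visit versus none)
logit docvis age educ income female
margins, dydx(*)

* Part 2: zero-truncated negative binomial on the positive counts only.
* The if condition creates the hurdle; ll(0) sets the truncation point at 0.
tnbreg docvis age educ income female if docvis > 0, ll(0)
margins, dydx(*)
```

Each margins call refers to the model estimated immediately before it, so the order of the commands matters when reporting both parts side by side.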
So let's go ahead and run a zero-inflated negative binomial regression model instead. This command requires that we specify an equation that determines whether the count is 0. And this is done by the inflate option. So we specify zinb, for zero-inflated negative binomial regression, then the dependent variable, doctor visits, and then we include our explanatory variables. And here in the inflate option, we simply enter our explanatory variables again. Now let's go ahead and estimate this. And then we can see the results presented similarly to the negative binomial regression model, except that both equations are presented together. The first set of results relates to the count process, whilst the second set of results, called inflate, relates to the supposed excess number of zeros. Results show that age and education positively affect the actual count, whilst age, education and income negatively affect the probability of whether that count is 0. Note that all these coefficients are negative, whilst in our previous hurdle model they were all positive values. That is because the hurdle model predicted the probability that the count is non-zero, whilst the zero-inflated model predicts the probability that the count is 0. So you can simply invert the signs if you compare the coefficients between these two models. Finally, to compute marginal effects after the zero-inflated model, we'll need to use the margins command twice: once for the count part, like so, and once for the zero-inflated part, like so. If you compare the marginal effects between both sets of models, you'll see that they are broadly similar. Which model you go for will depend on what assumptions you ultimately make about the count process. And this concludes this session on two-step count models.
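For reference, the zero-inflated steps in this session could be sketched as below. The variable names docvis, age, educ, income and female are hypothetical, and the predict() choices in the two margins calls are an assumption about what "like so" referred to on screen: predict(n) requests the predicted number of events and predict(pr) the probability of a degenerate (inflated) zero.

```stata
* Assumption: doctor visit data are in memory.
* docvis, age, educ, income, female are hypothetical variable names.

* Zero-inflated negative binomial: count equation plus inflate equation
zinb docvis age educ income female, inflate(age educ income female)

* Marginal effects on the predicted count
margins, dydx(*) predict(n)

* Marginal effects on the probability of an inflated zero
margins, dydx(*) predict(pr)
```

Because the inflate equation models the probability of a zero, its marginal effects are expected to carry the opposite sign to those from the logit part of the hurdle model.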