In econometrics, Stata is a commonly used computer program for regression analysis. Below are some important commands and descriptions useful for doing regression analysis in Stata.

## R |

| Model | Dependent variable | Independent variable | Interpretation | Definition |
|---|---|---|---|---|
| Level-level | y | x | A 1-unit increase in x leads to a β increase in y | Δy = βΔx |
| Level-log | y | ln(x) | A 1% increase in x leads to a β/100 increase in y (approximation) | Δy = (β/100)%Δx |
| Log-level | ln(y) | x | A 1-unit increase in x leads to a β*100% increase in y (approximation) | %Δy = (100β)Δx |
| Log-log | ln(y) | ln(x) | A 1% increase in x leads to a β% increase in y (approximation) | %Δy = β%Δx |
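
As a quick numerical check of the log-log row, here is a minimal sketch in Python (used only for illustration; it is not Stata code). It assumes a made-up constant-elasticity relationship y = A * x^β; the values of `A`, `beta`, and `x0` are hypothetical and not estimated from any data.

```python
# Hypothetical constant-elasticity relationship: y = A * x**beta
A = 3.0
beta = 0.5


def y(x):
    """Value of y for a given x under the assumed relationship."""
    return A * x ** beta


x0 = 200.0       # arbitrary starting value of x
x1 = x0 * 1.01   # a 1% increase in x

# Percent change in y caused by the 1% increase in x
pct_change_y = 100 * (y(x1) - y(x0)) / y(x0)
print(f"beta = {beta}, percent change in y = {pct_change_y:.4f}")
```

The percent change in y comes out close to β percent, and it is the same for any starting value `x0`, which is what "constant" elasticity refers to.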

The log-log model has the same form as **elasticity** (if you know Swedish, see Elasticitet). Thus, the log-log model is also known as the **Constant Elasticity (CE) model**.

**Exact percent**

When the dependent variable is in logarithms, β*100 is only an approximation of the percentage effect of a change in the independent variable. To get the exact number, we must use the formula:

100 * (exp(β) - 1)

Which can also be written:

100 * (e^β - 1)

Where exp(β) is e to the power of the coefficient (β) we're looking at.
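
To see how far the approximation drifts from the exact figure, the following sketch (Python, for illustration only) compares the two for a few hypothetical coefficient values. The function names are made up for this example.

```python
import math


def approx_percent(beta):
    """Approximate percent effect: beta * 100."""
    return 100 * beta


def exact_percent(beta):
    """Exact percent effect: 100 * (exp(beta) - 1)."""
    return 100 * (math.exp(beta) - 1)


# Hypothetical coefficients, from small to fairly large
for beta in (0.01, 0.05, 0.10, 0.30):
    print(f"beta={beta:.2f}  approx={approx_percent(beta):6.2f}%  "
          f"exact={exact_percent(beta):6.2f}%")
```

For small coefficients the two numbers are nearly identical, but the gap grows as β grows. In Stata, the same calculation can be done after a regression with something like `display 100 * (exp(_b[x]) - 1)`, where `x` is a placeholder for the variable of interest.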

## Simple vs multiple regression

**Simple linear regression** (SLR): one independent variable

**Multiple linear regression** (MLR): two or more independent variables

Simple vs multiple regression in Stata.

Note that in the example above, the effect of education increases when we control for additional variables. This is because education is negatively correlated with experience and tenure: people who study more enter the labor market later. However, all of these variables are positively correlated with wage. This means that by not controlling for experience, we underestimate the effect of education on wages.
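
The mechanism can be illustrated with a small sketch in Python (for illustration only, not Stata). The data are made up: `wage` is constructed so that the "true" effects are 2 per year of education and 1 per year of experience, and education is negatively correlated with experience. The multiple-regression coefficient is obtained by partialling out experience (the Frisch-Waugh-Lovell approach), which is one of several ways to compute it.

```python
def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data: more education goes together with less experience
educ = [8, 10, 12, 14, 16]
exper = [20, 16, 15, 9, 5]
# Wage built with known effects: 2 per year of educ, 1 per year of exper
wage = [1 + 2 * e + 1 * x for e, x in zip(educ, exper)]

# Simple regression: wage on educ only (exper omitted)
b_simple = slope(educ, wage)

# Multiple regression coefficient on educ via partialling out:
# 1) regress educ on exper and keep the residuals
# 2) regress wage on those residuals
g = slope(exper, educ)
m_educ, m_exper = mean(educ), mean(exper)
educ_resid = [(e - m_educ) - g * (x - m_exper) for e, x in zip(educ, exper)]
b_multiple = slope(educ_resid, wage)

print(f"educ coefficient, simple regression:   {b_simple:.2f}")
print(f"educ coefficient, controlling exper:   {b_multiple:.2f}")
```

With these numbers the simple regression gives an education coefficient far below the value used to build the data, while controlling for experience recovers it.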

## Limited Dependent Variable (LDV)/Linear Probability Model (LPM)

A limited dependent variable (LDV) here means a dependent variable (y) that is a dummy/boolean variable, i.e. it can only take the value 0 or 1. When y is 0 or 1, the expected value of y can be interpreted as the probability that y equals 1. A multiple linear regression model with a binary dependent variable is therefore called the **linear probability model (LPM)**.

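It's often okay to use OLS even with an LDV, although the fitted values can fall outside the interval from 0 to 1 and the errors are heteroskedastic, so robust standard errors are advisable.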
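
The probability interpretation is easiest to see when the only regressor is itself a dummy. The sketch below (Python, for illustration only, with made-up data) fits the LPM by OLS and compares the coefficients with the observed shares of y = 1 in each group.

```python
def mean(v):
    return sum(v) / len(v)


# Made-up data: x is a dummy regressor, y is a binary outcome
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]

# OLS slope and intercept for y = b0 + b1 * x
mx, my = mean(x), mean(y)
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx

# Observed share of y = 1 within each group
share_x0 = mean([b for a, b in zip(x, y) if a == 0])
share_x1 = mean([b for a, b in zip(x, y) if a == 1])

print(f"b0 = {b0:.2f}, share of y=1 when x=0: {share_x0:.2f}")
print(f"b0 + b1 = {b0 + b1:.2f}, share of y=1 when x=1: {share_x1:.2f}")
```

The intercept equals the share of ones in the x = 0 group, and the slope is the difference in shares between the two groups, i.e. the change in the probability that y equals 1.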

## Interaction term

Sometimes partial effects depend on another independent variable. In those cases it can be useful to include an **interaction term**, which is simply one independent variable times another.

Example:

price = B0 + B1 * sqrft + B2 * bedrooms + B3 * (sqrft * bedrooms) + B4 * bthrms + u

This means that the effect on price of adding more bedrooms depends on the size of the house:

delta_price/delta_bedrooms = B2 + B3 * sqrft

For example, if B3 is negative, an additional bedroom raises the price more in a small house than in a large house.
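
The partial effect can be evaluated at different house sizes. The following sketch (Python, for illustration only) uses hypothetical coefficient values, with a negative B3 chosen as an assumption; they are not estimates from any data set.

```python
# Hypothetical coefficients (assumed for illustration, not estimated)
B2 = 20000.0   # coefficient on bedrooms
B3 = -5.0      # coefficient on the interaction sqrft * bedrooms


def bedroom_effect(sqrft):
    """Partial effect of one more bedroom on price: B2 + B3 * sqrft."""
    return B2 + B3 * sqrft


for sqrft in (1000, 2000, 3000):
    print(f"sqrft = {sqrft}: effect of one more bedroom = {bedroom_effect(sqrft):.0f}")
```

With a negative B3, the effect of an extra bedroom shrinks as the house gets larger; with a positive B3 it would grow instead.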
