2026/05/14
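import pandas as pd
import statsmodels.formula.api as smf

# Join df and df_ind (assumed loaded earlier) on ticker code and company name
df2 = pd.merge(df, df_ind, on=["銘柄コード", "会社名"])
# Operating profit = sales - cost of sales - SG&A expenses
df2["営業利益"] = df2["売上高"] - df2["売上原価"] - df2["販管費"]
# ROA, profit margin on sales, and total asset turnover
df2["roa"] = df2["営業利益"] / df2["総資産"]
df2["売上高利益率"] = df2["営業利益"] / df2["売上高"]
df2["総資産回転率"] = df2["売上高"] / df2["総資産"]
# Keep rows at or below the 99th percentile of inventory and of cost of sales
df2_clean = df2.loc[
    (df2["棚卸資産"] <= df2["棚卸資産"].quantile(0.99)) &
    (df2["売上原価"] <= df2["売上原価"].quantile(0.99))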
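]
When we see that "the larger the inventory (棚卸資産), the higher the cost of sales (売上原価)", is inventory really the cause of that relationship?
In fact, firm size (売上高, sales) may be pushing up both inventory and cost of sales at the same time: larger firms tend to hold more inventory and also incur a larger cost of sales. A third variable of this kind is called a confounding variable (confounder).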
            売上高 (confounder)
            ↓               ↓
    棚卸資産      →      売上原価 (?)
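A simple regression of cost of sales on inventory ("棚卸資産 → 売上原価") therefore does not isolate the inventory effect: the estimate also absorbs the effect of sales.
Entering several explanatory variables at once lets us estimate the effect of one variable while the others are held constant. The strength of multiple regression is that it statistically removes the influence of confounders that are included in the model.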
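How the sales effect leaks into the simple regression can be written out as a short derivation. The notation is introduced here for illustration only: y stands for 売上原価, x for 棚卸資産, z for 売上高, hats denote OLS estimates from the same sample, and \tilde{\beta}_1 is the slope from the simple regression of y on x.

```latex
\begin{align}
y &= \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 z + \hat\varepsilon
  && \text{(multiple regression of $y$ on $x$ and $z$)} \\
z &= \hat\delta_0 + \hat\delta_1 x + \hat u
  && \text{(auxiliary regression of $z$ on $x$)} \\
\tilde\beta_1 &= \hat\beta_1 + \hat\beta_2 \, \hat\delta_1
  && \text{(slope of the simple regression of $y$ on $x$)}
\end{align}
```

The last line is an exact in-sample identity for OLS with an intercept. Assuming the simple and multiple regressions reported in this section were fitted on the same 17 observations, their rounded coefficients (3.5234 for the simple regression, -0.3687 and 0.7587 for the multiple regression) imply \hat\delta_1 ≈ (3.5234 + 0.3687) / 0.7587 ≈ 5.13: sales rises by roughly 5 units per unit of inventory, and that association is what inflates the simple-regression slope.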
In smf.ols, adding a variable to the formula with + is all it takes to turn a simple regression into a multiple regression.
                            OLS Regression Results
==============================================================================
Dep. Variable:               売上原価   R-squared:                       0.990
Model:                            OLS   Adj. R-squared:                  0.989
Method:                 Least Squares   F-statistic:                     724.8
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           7.33e-15
Time:                        04:56:19   Log-Likelihood:                -186.49
No. Observations:                  17   AIC:                             379.0
Df Residuals:                      14   BIC:                             381.5
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.006e+04   9762.578      1.030      0.320   -1.09e+04     3.1e+04
棚卸資産      -0.3687      0.215     -1.714      0.109      -0.830       0.093
売上高         0.7587      0.036     21.152      0.000       0.682       0.836
==============================================================================
Omnibus:                        0.644   Durbin-Watson:                   2.384
Prob(Omnibus):                  0.725   Jarque-Bera (JB):                0.431
Skew:                           0.360   Prob(JB):                        0.806
Kurtosis:                       2.697   Cond. No.                     1.36e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.36e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
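The call that produced the summary above is not shown in the text. The following is a minimal sketch of what such a call could look like, assuming the model was 売上原価 ~ 棚卸資産 + 売上高 as the summary suggests. The DataFrame `toy` and all of its numbers are synthetic stand-ins invented for illustration, not the original df2_clean.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for df2_clean (hypothetical numbers, not the original data).
rng = np.random.default_rng(0)
n = 200
sales = rng.uniform(1e5, 1e6, n)                  # 売上高
inventory = 0.1 * sales + rng.normal(0, 1e4, n)   # 棚卸資産, driven by sales
cogs = 0.75 * sales + rng.normal(0, 2e4, n)       # 売上原価, driven by sales only
toy = pd.DataFrame({"売上高": sales, "棚卸資産": inventory, "売上原価": cogs})

# Each term joined with "+" on the right-hand side is one explanatory variable.
res_multi = smf.ols("売上原価 ~ 棚卸資産 + 売上高", data=toy).fit()
print(res_multi.params)
```

Calling `print(res_multi.summary())` instead prints a full table in the same layout as the summaries in this section.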
Compare this with the simple regression and check how the coefficient (coef) on 棚卸資産 has changed.
                            OLS Regression Results
==============================================================================
Dep. Variable:               売上原価   R-squared:                       0.685
Model:                            OLS   Adj. R-squared:                  0.664
Method:                 Least Squares   F-statistic:                     32.58
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           4.15e-05
Time:                        04:56:19   Log-Likelihood:                -216.20
No. Observations:                  17   AIC:                             436.4
Df Residuals:                      15   BIC:                             438.1
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.364e+05   4.28e+04      3.187      0.006    4.52e+04    2.28e+05
棚卸資産       3.5234      0.617      5.708      0.000       2.208       4.839
==============================================================================
Omnibus:                        0.740   Durbin-Watson:                   1.675
Prob(Omnibus):                  0.691   Jarque-Bera (JB):                0.671
Skew:                          -0.151   Prob(JB):                        0.715
Kurtosis:                       2.075   Cond. No.                     1.43e+05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.43e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
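In the two summaries above, the coefficient on 棚卸資産 falls from 3.5234 (p < 0.001) in the simple regression to -0.3687 (p = 0.109) once 売上高 is included. The sketch below reproduces that pattern on synthetic data and lines the two sets of coefficients up side by side. The data are hypothetical and built so that inventory has no direct effect on cost of sales; they are not the original sample.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical): sales drives both inventory and cost of sales,
# and inventory has no direct effect on cost of sales.
rng = np.random.default_rng(1)
n = 500
sales = rng.uniform(1e5, 1e6, n)
toy = pd.DataFrame({
    "売上高": sales,
    "棚卸資産": 0.1 * sales + rng.normal(0, 1e4, n),
    "売上原価": 0.75 * sales + rng.normal(0, 2e4, n),
})

simple = smf.ols("売上原価 ~ 棚卸資産", data=toy).fit()
multi = smf.ols("売上原価 ~ 棚卸資産 + 売上高", data=toy).fit()

# Side-by-side coefficients; NaN marks a variable that is not in that model.
comparison = pd.DataFrame({"simple": simple.params, "multiple": multi.params})
print(comparison)
```

In this sketch the simple-regression slope on 棚卸資産 is large and positive only because inventory stands in for sales. With 売上高 in the model it shrinks toward zero, which is the true direct effect built into the synthetic data.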
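Multiple regression is used for two main purposes.
Example: "What determines ROA?" → analyze it while controlling for profit margin on sales (売上高利益率) and total asset turnover (総資産回転率).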
                            OLS Regression Results
==============================================================================
Dep. Variable:                    roa   R-squared:                       0.953
Model:                            OLS   Adj. R-squared:                  0.946
Method:                 Least Squares   F-statistic:                     142.5
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           4.94e-10
Time:                        04:56:19   Log-Likelihood:                 58.281
No. Observations:                  17   AIC:                            -110.6
Df Residuals:                      14   BIC:                            -108.1
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0449      0.011     -4.028      0.001      -0.069      -0.021
売上高利益率   1.7005      0.113     15.001      0.000       1.457       1.944
総資産回転率   0.0288      0.005      5.264      0.000       0.017       0.040
==============================================================================
Omnibus:                        0.934   Durbin-Watson:                   2.088
Prob(Omnibus):                  0.627   Jarque-Bera (JB):                0.330
Skew:                           0.341   Prob(JB):                        0.848
Kurtosis:                       2.995   Cond. No.                         120.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
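Given how the three ratios are defined at the top of this section, ROA is exactly the product of the two explanatory variables used here. The derivation below is added for illustration; the log form assumes operating profit (営業利益) is positive.

```latex
\begin{align}
\text{roa} &= \frac{\text{営業利益}}{\text{総資産}}
            = \frac{\text{営業利益}}{\text{売上高}} \times \frac{\text{売上高}}{\text{総資産}}
            = \text{売上高利益率} \times \text{総資産回転率} \\
\log(\text{roa}) &= \log(\text{売上高利益率}) + \log(\text{総資産回転率})
\end{align}
```

A linear model in levels therefore approximates a multiplicative relationship, which is one possible reading of why R-squared is high (0.953) but not exactly 1.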
Example: "Do firms with a high ROA also have larger net assets (純資産)?" → control for size (売上高) to isolate the effect of ROA itself.
                            OLS Regression Results
==============================================================================
Dep. Variable:                 純資産   R-squared:                       0.596
Model:                            OLS   Adj. R-squared:                  0.539
Method:                 Least Squares   F-statistic:                     10.34
Date:                Fri, 17 Apr 2026   Prob (F-statistic):            0.00175
Time:                        04:56:19   Log-Likelihood:                -202.78
No. Observations:                  17   AIC:                             411.6
Df Residuals:                      14   BIC:                             414.1
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   9598.9572   2.83e+04      0.339      0.740   -5.12e+04    7.04e+04
roa         2.636e+05   2.95e+05      0.894      0.387   -3.69e+05    8.96e+05
売上高         0.1968      0.053      3.722      0.002       0.083       0.310
==============================================================================
Omnibus:                        0.979   Durbin-Watson:                   2.174
Prob(Omnibus):                  0.613   Jarque-Bera (JB):                0.899
Skew:                           0.450   Prob(JB):                        0.638
Kurtosis:                       2.322   Cond. No.                     1.56e+07
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.56e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
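In the summary above, the roa coefficient has p = 0.387 and a 95% interval that spans zero, while 売上高 remains significant (p = 0.002). When only one controlled effect is of interest, it can be read from the fitted result directly instead of from the printed table. A minimal sketch on synthetic data follows; the numbers are hypothetical and constructed so that net assets depend on sales but not on ROA.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in (hypothetical): net assets depend on sales, not on ROA.
rng = np.random.default_rng(2)
n = 300
toy = pd.DataFrame({
    "roa": rng.uniform(0.0, 0.2, n),
    "売上高": rng.uniform(1e5, 1e6, n),
})
toy["純資産"] = 0.2 * toy["売上高"] + rng.normal(0, 3e4, n)

res = smf.ols("純資産 ~ roa + 売上高", data=toy).fit()

# Pull out the estimate, p-value and 95% confidence interval for one variable.
ci_low, ci_high = res.conf_int().loc["roa"]
print(f"coef={res.params['roa']:.1f}, p={res.pvalues['roa']:.3f}, "
      f"95% CI=[{ci_low:.1f}, {ci_high:.1f}]")
```

`params`, `pvalues` and `conf_int()` are indexed by the variable names used in the formula, so the same lookup works for 売上高.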
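Adding explanatory variables raises R-squared (the coefficient of determination), or at least never lowers it, but adding variables indiscriminately is a mistake. Variables added without a theoretical rationale make the results hard to interpret.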
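The point about R-squared can be checked numerically. Adjusted R-squared, which the summaries in this section also report, penalizes extra variables: Adj. R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables. For the 純資産 model (R² = 0.596, n = 17, k = 2) this gives about 0.538, in line with the reported 0.539 given rounding. The sketch below uses synthetic data with a deliberately irrelevant variable named `noise`; all names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical): y depends on x only; "noise" is pure random noise.
rng = np.random.default_rng(3)
n = 17
toy = pd.DataFrame({"x": rng.normal(size=n), "noise": rng.normal(size=n)})
toy["y"] = 2.0 * toy["x"] + rng.normal(size=n)

base = smf.ols("y ~ x", data=toy).fit()
bigger = smf.ols("y ~ x + noise", data=toy).fit()

# R-squared cannot fall when a variable is added; adjusted R-squared can.
print(f"R2:     {base.rsquared:.4f} -> {bigger.rsquared:.4f}")
print(f"Adj R2: {base.rsquared_adj:.4f} -> {bigger.rsquared_adj:.4f}")
```

Comparing adjusted R-squared across specifications is one simple guard against adding variables only because they raise R-squared. It does not replace a theoretical reason for including a variable.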
Seminar