6 Multiple Regression Analysis

𠮷田政之

Faculty of Business Administration, Kindai University

2026/05/14

1 Load the required packages

import pandas as pd
import statsmodels.formula.api as smf

2 Prepare the data

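# Load the financial data and the industry data.
# "自分で作ったフォルダの名前" ("the name of the folder you created") is a placeholder; replace it with your own folder name.
df = pd.read_csv("drive/MyDrive/自分で作ったフォルダの名前/findata.csv", encoding="cp932")
df_ind = pd.read_csv("drive/MyDrive/自分で作ったフォルダの名前/inddata.csv", encoding="cp932")
# Join the two tables on stock code and company name (inner join by default).
df2 = pd.merge(df, df_ind, on=["銘柄コード", "会社名"])
# Operating income = net sales - cost of sales - SG&A expenses.
df2["営業利益"] = df2["売上高"] - df2["売上原価"] - df2["販管費"]
# ROA and its two components: margin on sales and total asset turnover.
df2["roa"] = df2["営業利益"] / df2["総資産"]
df2["売上高利益率"] = df2["営業利益"] / df2["売上高"]
df2["総資産回転率"] = df2["売上高"] / df2["総資産"]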

# Drop outliers: keep rows at or below the 99th percentile of both 棚卸資産 and 売上原価.
df2_clean = df2.loc[
    (df2["棚卸資産"] <= df2["棚卸資産"].quantile(0.99)) &
    (df2["売上原価"] <= df2["売上原価"].quantile(0.99))
]
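
To make the trimming step concrete, here is a minimal sketch on a tiny toy table. The values are hypothetical and are not taken from findata.csv; only the column names and the filter logic follow the code above.

```python
import pandas as pd

# Toy data (hypothetical values, NOT from findata.csv)
toy = pd.DataFrame({
    "棚卸資産": [10, 20, 30, 40, 1000],
    "売上原価": [100, 200, 300, 400, 500],
})

# Keep a row only if it is at or below the 99th percentile of BOTH columns
mask = (
    (toy["棚卸資産"] <= toy["棚卸資産"].quantile(0.99))
    & (toy["売上原価"] <= toy["売上原価"].quantile(0.99))
)
toy_clean = toy.loc[mask]

print(len(toy), "rows before,", len(toy_clean), "rows after")
```

In this toy table the last row holds the largest value in both columns, so it is the only row removed. With real data the two conditions can remove different rows, and a row is dropped if it fails either condition.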

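3 The problem of confounding variables

Suppose we observe that "the larger a firm's inventories (棚卸資産), the higher its cost of sales (売上原価)". Is inventory really the cause of that relationship?

In fact, firm size (net sales, 売上高) may be pushing up both inventories and cost of sales at the same time: larger firms tend to hold more inventory and also incur a larger cost of sales. A third variable like this, one that affects both the explanatory variable and the outcome, is called a confounding variable (confounder).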

売上高 (confounder)
  ↓              ↓
棚卸資産  →  売上原価 (?)

If we estimate "棚卸資産 → 売上原価" with a simple regression, the coefficient on 棚卸資産 also picks up the effect of 売上高.
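
The mechanism can be illustrated with a small simulation. This is only a sketch under a hypothetical data-generating process: the variable names `sales`, `inventory`, and `cogs`, and all the numbers, are assumptions made for illustration. Inventory is constructed to have no direct effect on cost of sales at all.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Hypothetical data-generating process: sales drives BOTH variables,
# and inventory has NO direct effect on cost of sales (true effect = 0).
sales = rng.uniform(100, 1000, n)
inventory = 0.2 * sales + rng.normal(0, 10, n)
cogs = 0.7 * sales + rng.normal(0, 10, n)
sim = pd.DataFrame({"inventory": inventory, "cogs": cogs})

# Simple regression that omits the confounder (sales)
simple = smf.ols("cogs ~ inventory", data=sim).fit()
slope = simple.params["inventory"]

print(f"estimated slope on inventory: {slope:.2f} (true direct effect: 0)")
```

The simple regression reports a large and clearly non-zero slope even though the true direct effect is zero by construction. The slope is driven entirely by the omitted variable `sales`.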

4 What is multiple regression analysis?

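Including several explanatory variables at once lets us estimate the effect of one variable while holding the other variables in the model constant. The strength of multiple regression is that it statistically removes the influence of confounders that are included as explanatory variables.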
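
One way to see what "holding the other variables constant" means is to partial out the other variable by hand (the Frisch-Waugh-Lovell result). The sketch below uses synthetic data; the names `y`, `x1`, `x2` and the coefficients are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200

# Synthetic data: x1 and x2 are correlated, and both affect y
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
d = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# (a) Multiple regression with both variables
full = smf.ols("y ~ x1 + x2", data=d).fit()

# (b) Remove the part of y and of x1 that x2 explains,
#     then regress the leftover of y on the leftover of x1
d["y_res"] = smf.ols("y ~ x2", data=d).fit().resid
d["x1_res"] = smf.ols("x1 ~ x2", data=d).fit().resid
partial = smf.ols("y_res ~ x1_res", data=d).fit()

print(full.params["x1"], partial.params["x1_res"])
```

The two printed coefficients agree up to floating-point error: the multiple-regression coefficient on `x1` uses only the variation in `x1` that `x2` does not explain.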

In smf.ols, adding a variable to the formula with + is all it takes to turn the model into a multiple regression.

md_multi = smf.ols('売上原価 ~ 棚卸資産 + 売上高', data=df2_clean).fit()
print(md_multi.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   売上原価   R-squared:                       0.990
Model:                            OLS   Adj. R-squared:                  0.989
Method:                 Least Squares   F-statistic:                     724.8
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           7.33e-15
Time:                        04:56:19   Log-Likelihood:                -186.49
No. Observations:                  17   AIC:                             379.0
Df Residuals:                      14   BIC:                             381.5
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.006e+04   9762.578      1.030      0.320   -1.09e+04     3.1e+04
棚卸資産          -0.3687      0.215     -1.714      0.109      -0.830       0.093
売上高            0.7587      0.036     21.152      0.000       0.682       0.836
==============================================================================
Omnibus:                        0.644   Durbin-Watson:                   2.384
Prob(Omnibus):                  0.725   Jarque-Bera (JB):                0.431
Skew:                           0.360   Prob(JB):                        0.806
Kurtosis:                       2.697   Cond. No.                     1.36e+06
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.36e+06. This might indicate that there are
strong multicollinearity or other numerical problems.

Next, fit the simple regression and check how the coefficient (coef) on 棚卸資産 differs from the multiple-regression result above.

md_single = smf.ols('売上原価 ~ 棚卸資産', data=df2_clean).fit()
print(md_single.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   売上原価   R-squared:                       0.685
Model:                            OLS   Adj. R-squared:                  0.664
Method:                 Least Squares   F-statistic:                     32.58
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           4.15e-05
Time:                        04:56:19   Log-Likelihood:                -216.20
No. Observations:                  17   AIC:                             436.4
Df Residuals:                      15   BIC:                             438.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.364e+05   4.28e+04      3.187      0.006    4.52e+04    2.28e+05
棚卸資産           3.5234      0.617      5.708      0.000       2.208       4.839
==============================================================================
Omnibus:                        0.740   Durbin-Watson:                   1.675
Prob(Omnibus):                  0.691   Jarque-Bera (JB):                0.671
Skew:                          -0.151   Prob(JB):                        0.715
Kurtosis:                       2.075   Cond. No.                     1.43e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.43e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
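
In the two outputs above, the coefficient on 棚卸資産 is 3.5234 (P>|t| = 0.000) in the simple regression and -0.3687 (P>|t| = 0.109) once 売上高 is added, while R-squared rises from 0.685 to 0.990. Rather than reading the summaries by eye, the same figures can be pulled from the fitted results into one table. The sketch below does this on a synthetic stand-in for df2_clean: the DataFrame `demo` and its values are hypothetical, and `params`, `bse`, `pvalues`, and `rsquared` are attributes of the statsmodels results object.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for df2_clean (hypothetical values, same column names)
rng = np.random.default_rng(42)
n = 100
sales = rng.uniform(1e5, 1e6, n)
demo = pd.DataFrame({
    "売上高": sales,
    "棚卸資産": 0.1 * sales + rng.normal(0, 1e4, n),
    "売上原価": 0.75 * sales + rng.normal(0, 2e4, n),
})

models = {
    "single": smf.ols("売上原価 ~ 棚卸資産", data=demo).fit(),
    "multi": smf.ols("売上原価 ~ 棚卸資産 + 売上高", data=demo).fit(),
}

# One row per model: estimate, standard error and p-value for 棚卸資産
comparison = pd.DataFrame({
    name: {
        "coef": m.params["棚卸資産"],
        "std err": m.bse["棚卸資産"],
        "p-value": m.pvalues["棚卸資産"],
        "R-squared": m.rsquared,
    }
    for name, m in models.items()
}).T

print(comparison)
```

With the real data, replacing `demo` with df2_clean should reproduce the coefficients shown in the two summaries.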

5 When to use multiple regression

Multiple regression analysis is used mainly for two purposes.

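5.1 Analysis of determinants (what determines X)

Example: "What determines ROA?" → include both 売上高利益率 (operating margin on sales) and 総資産回転率 (total asset turnover) in the model, so that each coefficient is estimated with the other variable held constant.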

md_roa = smf.ols('roa ~ 売上高利益率 + 総資産回転率', data=df2_clean).fit()
print(md_roa.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    roa   R-squared:                       0.953
Model:                            OLS   Adj. R-squared:                  0.946
Method:                 Least Squares   F-statistic:                     142.5
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           4.94e-10
Time:                        04:56:19   Log-Likelihood:                 58.281
No. Observations:                  17   AIC:                            -110.6
Df Residuals:                      14   BIC:                            -108.1
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0449      0.011     -4.028      0.001      -0.069      -0.021
売上高利益率         1.7005      0.113     15.001      0.000       1.457       1.944
総資産回転率         0.0288      0.005      5.264      0.000       0.017       0.040
==============================================================================
Omnibus:                        0.934   Durbin-Watson:                   2.088
Prob(Omnibus):                  0.627   Jarque-Bera (JB):                0.330
Skew:                           0.341   Prob(JB):                        0.848
Kurtosis:                       2.995   Cond. No.                         120.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
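
One point to keep in mind when reading this result: with the definitions from the data-preparation step, roa equals 売上高利益率 × 総資産回転率 exactly, because 売上高 cancels out. The regression therefore approximates a multiplicative identity with an additive (linear) model, which may be why R-squared is high (0.953) but not 1. The sketch below checks the identity on toy numbers; the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy figures (hypothetical values)
toy = pd.DataFrame({
    "営業利益": [50.0, 80.0, 30.0],
    "売上高": [1000.0, 1600.0, 900.0],
    "総資産": [2000.0, 1000.0, 1500.0],
})

# Same definitions as in the data-preparation step
toy["roa"] = toy["営業利益"] / toy["総資産"]
toy["売上高利益率"] = toy["営業利益"] / toy["売上高"]
toy["総資産回転率"] = toy["売上高"] / toy["総資産"]

# ROA is the product of the two ratios, not their sum
product = toy["売上高利益率"] * toy["総資産回転率"]
print(np.allclose(toy["roa"], product))
```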

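5.2 Analysis of economic consequences (what X affects)

Example: "Do firms with higher ROA also have larger net assets (純資産)?" → control for size (売上高) to isolate the effect of ROA itself.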

md_equity = smf.ols('純資産 ~ roa + 売上高', data=df2_clean).fit()
print(md_equity.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    純資産   R-squared:                       0.596
Model:                            OLS   Adj. R-squared:                  0.539
Method:                 Least Squares   F-statistic:                     10.34
Date:                Fri, 17 Apr 2026   Prob (F-statistic):            0.00175
Time:                        04:56:19   Log-Likelihood:                -202.78
No. Observations:                  17   AIC:                             411.6
Df Residuals:                      14   BIC:                             414.1
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   9598.9572   2.83e+04      0.339      0.740   -5.12e+04    7.04e+04
roa         2.636e+05   2.95e+05      0.894      0.387   -3.69e+05    8.96e+05
売上高            0.1968      0.053      3.722      0.002       0.083       0.310
==============================================================================
Omnibus:                        0.979   Durbin-Watson:                   2.174
Prob(Omnibus):                  0.613   Jarque-Bera (JB):                0.899
Skew:                           0.450   Prob(JB):                        0.638
Kurtosis:                       2.322   Cond. No.                     1.56e+07
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.56e+07. This might indicate that there are
strong multicollinearity or other numerical problems.

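R-squared (the coefficient of determination) never decreases when explanatory variables are added, but adding variables indiscriminately should be avoided: variables added without a theoretical rationale make the results harder to interpret.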
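
The behaviour of R-squared can be checked with a small experiment: add a variable that is pure noise and compare the fit. The sketch below uses synthetic data; the names `x`, `y`, and `noise` are assumptions for illustration. `rsquared` and `rsquared_adj` correspond to the R-squared and Adj. R-squared entries in the summary output.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 30

d = pd.DataFrame({"x": rng.normal(size=n)})
d["y"] = 1.0 + 2.0 * d["x"] + rng.normal(size=n)
d["noise"] = rng.normal(size=n)  # unrelated to y by construction

base = smf.ols("y ~ x", data=d).fit()
extra = smf.ols("y ~ x + noise", data=d).fit()

print("R-squared:     ", base.rsquared, "->", extra.rsquared)
print("Adj. R-squared:", base.rsquared_adj, "->", extra.rsquared_adj)
```

R-squared does not fall even though `noise` carries no information about `y`. Adj. R-squared penalizes the extra parameter, so it is the more informative figure when comparing models with different numbers of variables.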