[DS141] How do you actually do linear regression?

2022-07-06 04:59:29
LS problem and QR decomposition
These days linear regression is press-a-button / one line of code and the result pops out, while introductory textbooks explain things fairly directly through the normal equation or the maximum likelihood approach, without going into detail about how the normal equation is actually solved. The textbooks then write down this equation:

b_hat = inv( t(X) * X ) * t(X) * y
This makes it very easy to pick up a computational misconception, namely that the slope estimator is obtained by computing inv(Gram(X)) (note: Gram(X) = t(X)*X). In an earlier post I explained how to use the SVD to get the pseudo-inverse and then solve the normal equation:

[DS141] The inverse that even M2 students know — what is it actually useful for?
- shared from the LIHKG forum
https://lih.kg/3049890
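To make the contrast concrete, here is a minimal R sketch (my own illustration, not part of the original post; the data are simulated) comparing the textbook formula taken literally, the pseudo-inverse route from that earlier post, and the QR-based solve that R itself uses:

set.seed(1)
n <- 1000; p <- 5
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   # simulated design matrix with intercept
y <- X %*% rnorm(p) + rnorm(n)

# Normal equation taken literally: invert the Gram matrix t(X) %*% X
b_naive <- solve(t(X) %*% X) %*% t(X) %*% y

# Pseudo-inverse (SVD-based) route, as in the earlier post; needs the MASS package
b_pinv <- MASS::ginv(X) %*% y

# What R's own fitter effectively does: a QR-based solve
b_qr <- qr.coef(qr(X), y)

max(abs(b_naive - b_qr))   # tiny here; the gap grows as X becomes ill-conditioned

All three agree on a well-conditioned X; the rest of the thread is about how they differ in stability and speed.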
2022-07-06 05:00:16
Although the pseudo-inverse is more stable, and usually faster than computing the inverse directly, it is not faster than a properly optimised Gaussian elimination, because the pseudo-inverse is built on the SVD:

pinv uses the singular value decomposition to form the pseudoinverse of A. Singular values along the diagonal of S that are smaller than tol are treated as zeros, and the representation of A becomes...

(source: https://www.mathworks.com/help/matlab/ref/pinv.html)
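To see that dependence on the SVD directly, a pseudo-inverse can be assembled by hand from svd() — my own sketch with a toy matrix; MASS::ginv plays the same role in R that pinv plays in MATLAB, adding a tolerance that zeroes tiny singular values:

X <- cbind(1, matrix(rnorm(300 * 4), 300, 4))   # toy full-column-rank X
s <- svd(X)                                     # X = U diag(d) V'
X_pinv <- s$v %*% diag(1 / s$d) %*% t(s$u)      # pseudo-inverse built from the SVD factors
max(abs(X_pinv - MASS::ginv(X)))                # essentially zero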
2022-07-06 05:02:06
Is there a method whose stability and speed sit somewhere between the SVD and Gaussian elimination? The answer is the QR decomposition. Have a look at R's lm.fit:
if(method != "qr")
   warning(gettextf("method = '%s' is not supported. Using 'qr'", method), domain = NA)
chkDots(...)
z <- .Call(C_Cdqrls, x, y, tol, FALSE)

C_Cdqrls in lm.c then calls down into Fortran, and numpy.linalg.lstsq likewise ends up calling LAPACK's Fortran routines:
if m <= n:
    gufunc = _umath_linalg.lstsq_m
else:
    gufunc = _umath_linalg.lstsq_n

lstsq_m in _umath_linalg.cpp eventually lands in a Fortran subroutine; numpy handles more cases than R does, but a QR decomposition shows up in both. So in practice, when X has full column rank, linear regression is running on a QR factorization. As an aside, statsmodels' OLS uses MLE (this post is not a textbook; any introductory text covers MLE).
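A quick R check of that claim (my sketch, with made-up data): lm.fit skips the formula interface, hands x and y straight to the QR routine, and returns the compact QR factorization it solved with.

X <- cbind(1, matrix(rnorm(500 * 3), 500, 3)); y <- rnorm(500)   # toy data
fit <- lm.fit(x = X, y = y)                       # goes straight to C_Cdqrls
max(abs(fit$coefficients - qr.coef(qr(X), y)))    # same QR-based answer
names(fit$qr)                                     # the compact QR factorization carried back from Fortran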
2022-07-06 05:02:34
Reduced QR decomposition
A QR decomposition just means writing a matrix (let's stick with X) as the product of two matrices Q and R:

X = QR


Q is orthogonal and R is upper triangular. To avoid blowing up memory, people normally use the reduced QR rather than the full QR. Linear algebra textbooks will usually tell you that the QR decomposition is simply what the Gram-Schmidt algorithm produces, just a rearrangement of the orthogonal basis vectors and inner products.

In reality the GS algorithm is not stable, so computing a QR decomposition is not as straightforward as it looks. One approach is the Householder transform: roughly speaking, you apply a sequence of reflections that turn X into an upper triangular matrix, and those reflection matrices multiplied together give you Q. And before anyone asks why modified Gram-Schmidt isn't used instead: that, too, comes down to accuracy. Of course LAPACK has more than one method inside it; I haven't read or memorised them all.



[figure: one Householder step zeroing a column below the diagonal (source: ML wiki)]
Each multiplication by a reflector H[j] zeroes out the entries below the main diagonal in one more column; the iteration keeps multiplying, up to rank(X) times, so the zeroing happens step by step — dynamic zeroing, if you like.
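For the curious, here is a bare-bones Householder QR in R (my illustration only — it assumes X has full column rank, and LAPACK's real routines are blocked and far more careful). Each reflector H_j zeroes everything below the diagonal in column j, exactly the column-by-column zeroing described above:

householder_qr <- function(X) {
  m <- nrow(X); n <- ncol(X)
  R <- X
  Q <- diag(m)
  for (j in seq_len(n)) {
    x <- R[j:m, j]
    v <- x
    v[1] <- v[1] + sign(x[1] + (x[1] == 0)) * sqrt(sum(x^2))   # reflector direction
    v <- v / sqrt(sum(v^2))
    H <- diag(m)
    H[j:m, j:m] <- H[j:m, j:m] - 2 * tcrossprod(v)             # H_j = I - 2 v v'
    R <- H %*% R                                               # zero column j below the diagonal
    Q <- Q %*% H                                               # accumulate Q = H_1 H_2 ... H_n
  }
  list(Q = Q[, 1:n, drop = FALSE], R = R[1:n, , drop = FALSE]) # reduced QR
}

X <- cbind(1, matrix(rnorm(200 * 3), 200, 3))
out <- householder_qr(X)
max(abs(out$Q %*% out$R - X))   # reconstruction error around machine epsilon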
After all that we still haven't gone back to the LS problem. We simply plug X = QR into the normal equation:


t(X) * X * b = t(X) * y
t(QR) * QR * b = t(QR) * y
t(R) * t(Q) * Q * R * b = t(R) * t(Q) * y
t(R) * R * b = t(R) * t(Q) * y        (since t(Q) * Q = I)
R * b = t(Q) * y

(source: https://genomicsclass.github.io/book/pages/qr_and_regression.html)
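Spelled out in R, that last line is literally a matrix-vector product followed by a triangular solve — my sketch with toy data; base R's qr.coef does the same thing while also handling column pivoting and rank deficiency:

X <- cbind(1, matrix(rnorm(500 * 3), 500, 3)); y <- rnorm(500)   # toy data
QR  <- qr(X)
Qty <- qr.qty(QR, y)[1:ncol(X)]      # t(Q) %*% y  (the matrix-vector product)
b   <- backsolve(qr.R(QR), Qty)      # solve R b = t(Q) y by back substitution
max(abs(b - qr.coef(QR, y)))         # matches R's own QR solver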

2022-07-06 05:02:52
In other words, it comes down to a matrix-vector product plus a back substitution (R is upper triangular). Comparing the QR-based version:
opts.UT = true;        % tell linsolve the coefficient matrix is upper triangular
opts.TRANSA = false;   % solve R*b = Q'*y, not R'*b = Q'*y

[Q, R] = qr(X, 0);                  % economy-size (reduced) QR
b_hat = linsolve(R, Q'*y, opts);    % matrix-vector product, then triangular solve

against MATLAB's built-in pinv:
b_hat_svd = pinv(X)*y;

When X is 10^7 by 5, QR is about 40% faster than pinv; when X is 10^3 by 500, QR is about 70% faster, and the accuracy is much the same.

Summary
So, is linear regression simple? In practice you're just mashing the button anyway. I have one main point: don't go fiddling with the default qr option in other people's code, and don't be daft enough to run gradient descent for plain linear regression — every now and then I see someone asking why gradient descent isn't used for linear regression. I deleted some of the details from my original notes, so a few places may read a little unnaturally. I haven't covered every method either; in fact, if you look inside LAPACK there is something called divide-and-conquer SVD that is competitive with QR, so the SVD is not completely outdated.
https://en.wikipedia.org/wiki/Divide-and-conquer_eigenvalue_algorithm
2022-07-06 11:08:57
Thanks for the support
2022-07-06 11:58:18
Actually, you do see gradient-based methods in online linear regression, but this post is only about plain full-batch linear regression, so no talking back.
2022-07-06 12:18:55
How did this turn into code tracing?
2022-07-06 21:48:47
"點解唔用 gradient descent 做 linear regression"
如果linear model 既error 係normally distributed
ols estimation 同 mle 都係同一個closed form solution
既然有closed form solution 點解要用gradient descent 呢d numerical method?

If the error is t-distributed:
f <- function (b0, b1, b2, b3, sigma) {
  # no d is found in this function env., get d from global env.
  return(-sum(log(dnorm(d[["y"]],
                        mean = b0+b1*d[["x1"]]+b2*d[["x2"]]+b3*d[["x3"]],
                        sd = sigma))))
}
mle(f, start = list(b0 = 0.3, b1 = 0.2, b2 = -0.4, b3 = 0.5, sigma = 1), method = "BFGS") %>% coeftest()

Or, without using mle():
mle_t <- function (data) {
  neg_ln_L_t <- function (para, y, x1, x2, x3) {
    return(-sum(log(dt(y-para[1]-para[2]*x1-para[3]*x2-para[4]*x3, df = para[5]))))
  }
  betas <- optim(par = c(0.3, 0.2, -0.4, 0.5, 1), fn = neg_ln_L_t,
                 y = data[["y"]], x1 = data[["x1"]], x2 = data[["x2"]], x3 = data[["x3"]],
                 method = "BFGS")$par
  return(betas)
}
mle_t(data = d)
2022-07-06 21:51:13
# assumes stats4 (mle), lmtest (coeftest) and magrittr (%>%) are loaded
f <- function (b0, b1, b2, b3, sigma) {
  # no d is found in this function env., get d from global env.
  return(-sum(log(dnorm(d[["y"]],
                        mean = b0+b1*d[["x1"]]+b2*d[["x2"]]+b3*d[["x3"]],
                        sd = sigma))))
}

mle(f,
    start = list(b0 = 0.3,
                 b1 = 0.2,
                 b2 = -0.4,
                 b3 = 0.5,
                 sigma = 1),
    method = "BFGS") %>%
  coeftest()

Or, without using mle():
mle_t <- function (data) {
  neg_ln_L_t <- function (para, y, x1, x2, x3) {
    return(-sum(log(dt(y-para[1]-para[2]*x1-para[3]*x2-para[4]*x3,
                       df = para[5]))))
  }
  betas <- optim(par = c(0.3, 0.2, -0.4, 0.5, 1),
                 fn = neg_ln_L_t,
                 y = data[["y"]], x1 = data[["x1"]], x2 = data[["x2"]], x3 = data[["x3"]],
                 method = "BFGS")$par
  return(betas)
}

mle_t(data = d)
2022-07-06 22:08:11
Copied the wrong code.
The first block should be:
mle_t <- function (data) {
  f <- function (b0, b1, b2, b3, df) {
    return(-sum(log(dt(data[["y"]]-b0-b1*data[["x1"]]-b2*data[["x2"]]-b3*data[["x3"]],
                       df = df))))
  }

  mle(f,
      start = list(b0 = 0.3,
                   b1 = 0.2,
                   b2 = -0.4,
                   b3 = 0.5,
                   df = 1),
      method = "BFGS") %>%
    coeftest()
}

mle_t(data = d)
2022-07-07 02:10:02
Because I think linear regression is worth a deeper look at how it actually works — you get to see where the maths and the practice agree and where they differ.
2022-07-07 02:30:55
Pretending to be a woman again...
2022-07-07 03:08:23

An artist's impression of me
2022-07-07 03:21:06
Supporting this — read more useful stuff so we don't end up scraping by on expensive rice.
2022-07-07 04:05:26
The price of rice is something we all have to defend.
2022-07-07 22:07:24
"於是就好易產生計算上嘅誤會,即係以為 slope estimator 係透過 inv(Gram(X)) 得出嚟 (注: Gram(X) = t(X)*X)"
其實OLS estimator 確實係要計(X'X)^{-1}
你只係講緊用咩方法計(X'X)^{-1}

"而基本教科書都係好直接咁講解 normal equation (正規方程式) 或 maximum likelihood (最大似然) approach,冇詳細咁講點樣 solve normal equation"
正如我之前既reply
如果linear model 入面既error 係normally distributed既話
ols estimator 同ml estimator 係有同一個closed form solution
如果個error 係其他distribution(例如t distribution)
ml estimator 唔一定有closed form solution
呢個時侯就要用numerical method
例如gradient descent, newton method, quasi-newton method (e.g. BFGS)
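A quick R check of the first half of that point (my own sketch with simulated data; the variable names x1, x2, x3, y and the data frame d just follow the code above): with normal errors, a numerically maximised likelihood lands on the same coefficients as the closed-form OLS fit.

set.seed(2)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- 0.3 + 0.2*d$x1 - 0.4*d$x2 + 0.5*d$x3 + rnorm(200)      # normal errors

neg_ln_L <- function(para) {                                   # para = (b0, b1, b2, b3, log_sigma)
  mu <- para[1] + para[2]*d$x1 + para[3]*d$x2 + para[4]*d$x3
  -sum(dnorm(d$y, mean = mu, sd = exp(para[5]), log = TRUE))
}
mle_beta <- optim(par = rep(0, 5), fn = neg_ln_L, method = "BFGS")$par[1:4]
ols_beta <- coef(lm(y ~ x1 + x2 + x3, data = d))

round(cbind(mle = mle_beta, ols = ols_beta), 4)                # agree to optimisation tolerance

With t errors the likelihood no longer collapses to a least-squares problem, which is exactly when optim/BFGS earns its keep.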
2022-07-08 01:37:15
Nobody wants to know what you, the attention seeker, are trying to say. Off-topic and clueless.
Better to hide your clumsiness than to show it off; if you have something to say, start your own post instead of hijacking this one.
"Actually, the OLS estimator does require computing (X'X)^{-1};
you are only talking about which method is used to compute (X'X)^{-1}."

Moreover, this statement is wrong.
2022-07-08 01:39:06
Quoting out of context. What I wrote was: every now and then I see someone asking why gradient descent isn't used for linear regression.
Go and read the Chinese properly before you speak.
2022-07-08 01:41:39
I'm talking about how R and Python's actual built-ins run. Which R built-in uses your code? Please post the GitHub repo too. Otherwise you're off-topic as well.
2022-07-08 01:43:35
sdvsvsdav, if you're that keen on going off-topic, start your own post and say it there; I wouldn't recommend showing off here. I can't follow your galaxy-brain logic.
2022-07-08 01:45:53
sdvsvsdav: blocked -> fully blocked