Exploring Restricted Least Squares | A Quick Note
1 Introduction
This blog post will explore fitting a regression model with constraints on the parameters.
Let's start with the basics of multiple regression, where we have a dependent variable \(Y_i \in \mathbb{R}\) and a vector of \(p\) independent variables \(X_i \in \mathbb{R}^p\), with \(p \geq 2\).
The multiple regression model is given by:
\[ Y_i=\alpha + \boldsymbol{\beta}^{\top} X_i +\epsilon_i \]
where \(E(\epsilon_i)=0\) and \(Var(\epsilon_i)=\sigma^2\). After centering the data, the intercept \(\alpha\) drops out, and we can express the linear regression model in matrix notation:
\[ \mathbf{Y} =\mathbf{X} \boldsymbol{\beta}+\boldsymbol{\varepsilon} \]
where \(\mathbf{Y}=\left(\begin{array}{c}Y_{1} \\\vdots \\Y_{n}\end{array}\right) \in \mathbb{R}^{n \times 1}\), \(\mathbf{X}=\left(\begin{array}{c}X_{1}^{\top}\\\vdots\\X_{n}^{\top}\end{array}\right) \in \mathbb{R}^{n \times p}\), \(\boldsymbol{\beta} \in \mathbb{R}^{p\times1}\), and \(\boldsymbol{\varepsilon}=\left(\begin{array}{c}\epsilon_{1}\\\vdots\\\epsilon_{n}\end{array}\right)\in\mathbb{R}^{n\times1}\).
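To make the setup concrete, here is a minimal NumPy sketch that simulates data from this model, centers \(\mathbf{Y}\) and \(\mathbf{X}\), and computes the ordinary least squares (OLS) fit. The sample size, dimension, and true coefficients below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3                               # illustrative sample size and dimension
alpha = 1.5
beta = np.array([2.0, -1.0, 0.5])           # illustrative true coefficients

X = rng.normal(size=(n, p))
Y = alpha + X @ beta + rng.normal(size=n)   # Y_i = alpha + beta^T X_i + eps_i

# Centering both Y and the columns of X removes the intercept from the model
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean()

# OLS estimate beta_hat = (X^T X)^{-1} X^T Y, via a least squares solver
beta_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
print(beta_ols)                             # close to the true beta
```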
This post aims to discuss how to fit a regression model when there are constraints on the parameter vector \(\boldsymbol{\beta}\).
2 Constrained Least Squares
To begin, let's distinguish this topic from constrained least squares.
In constrained least squares, the goal is to solve a linear least squares problem subject to an extra linear equality constraint on the solution vector:
\[ \begin{array}{ll}\text { min } & \|\mathbf{Y}-\mathbf{X} \boldsymbol{\beta}\|^2 \\ \text { subject to } & \mathbf{C} \boldsymbol{\beta}=\mathbf{d}\end{array} \]
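One standard way to solve this problem numerically is the null-space method: write every feasible vector as \(\boldsymbol{\beta}_0 + \mathbf{N}\mathbf{z}\), where \(\boldsymbol{\beta}_0\) is any particular solution of \(\mathbf{C}\boldsymbol{\beta}=\mathbf{d}\) and the columns of \(\mathbf{N}\) span the null space of \(\mathbf{C}\), then solve an unconstrained least squares problem in \(\mathbf{z}\). Below is a minimal sketch, assuming SciPy is available; the data and the constraint are made up for illustration.

```python
import numpy as np
from scipy.linalg import null_space

def constrained_lstsq(X, Y, C, d):
    """Minimize ||Y - X b||^2 subject to C b = d via the null-space method."""
    b0 = np.linalg.pinv(C) @ d            # a particular solution of C b = d
    N = null_space(C)                     # columns span {v : C v = 0}
    # Every feasible b is b0 + N z; solve the unconstrained problem in z
    z, *_ = np.linalg.lstsq(X @ N, Y - X @ b0, rcond=None)
    return b0 + N @ z

# Illustrative use: force the three coefficients to sum to one
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
Y = rng.normal(size=100)
C = np.ones((1, 3))
d = np.array([1.0])
b = constrained_lstsq(X, Y, C, d)
print(b, C @ b)                           # C @ b prints [1.]
```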
3 Restricted Least Squares
Now, let's focus on fitting a regression model with parameter restrictions. We want to impose \(M<p\) linearly independent restrictions on the parameters:
\[ \begin{array}{ll}\text { min } & \|\mathbf{Y}-\mathbf{X} \boldsymbol{\beta}\|^2 \\ \text { subject to }& \mathbf{L} \boldsymbol{\beta} =\boldsymbol{r}\end{array} \]
where \(\boldsymbol{\beta}\) is the parameter vector of interest, \(\mathbf{L} \in \mathbb{R}^{M\times p}\), and \(\boldsymbol{r} \in \mathbb{R}^{M\times 1}\).
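For example, with \(p=3\), the two restrictions \(\beta_1+\beta_2=1\) and \(\beta_3=0\) correspond to \(M=2\) with
\[ \mathbf{L}=\left(\begin{array}{ccc}1 & 1 & 0 \\ 0 & 0 & 1\end{array}\right), \qquad \boldsymbol{r}=\left(\begin{array}{c}1 \\ 0\end{array}\right) \]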
Applying the method of Lagrange multipliers, we minimize the Lagrangian function
\[ \begin{aligned}L(\boldsymbol{\beta}, \boldsymbol{\lambda}) & =(\mathbf{Y}-\mathbf{X} \boldsymbol{\beta})^{\top}(\mathbf{Y}-\mathbf{X} \boldsymbol{\beta})+2 \boldsymbol{\lambda}^{\top}(\mathbf{L} \boldsymbol{\beta}-\boldsymbol{r}) \\& =\mathbf{Y}^{\top} \mathbf{Y}-2 \mathbf{Y}^{\top} \mathbf{X} \boldsymbol{\beta}+\boldsymbol{\beta}^{\top} \mathbf{X}^{\top} \mathbf{X} \boldsymbol{\beta}+2 \boldsymbol{\lambda}^{\top} \mathbf{L} \boldsymbol{\beta}-2 \boldsymbol{\lambda}^{\top} \boldsymbol{r}\end{aligned} \]
We then equate the partial derivatives to zero:
\[ \left\{ \begin{aligned}\frac{\partial L(\boldsymbol{\beta},\boldsymbol{\lambda})}{\partial\boldsymbol{\beta}}&=-2\mathbf{X}^{\top}\mathbf{Y}+2\mathbf{X}^{\top}\mathbf{X}\boldsymbol{\beta}+2\mathbf{L}^{\top}\boldsymbol{\lambda}=\mathbf{0} \\\frac{\partial L(\boldsymbol{\beta},\boldsymbol{\lambda})}{\partial\boldsymbol{\lambda}}&=2(\mathbf{L}\boldsymbol{\beta}-\boldsymbol{r})=\mathbf{0}\end{aligned}\right. \]
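Note that these two conditions are linear in \((\boldsymbol{\beta}, \boldsymbol{\lambda})\), so the restricted estimate can also be computed directly by solving a single \((p+M)\times(p+M)\) block system. Here is a minimal sketch, reusing the illustrative restrictions from the example above on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, M = 100, 3, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
L = np.array([[1.0, 1.0, 0.0],            # beta_1 + beta_2 = 1
              [0.0, 0.0, 1.0]])           # beta_3 = 0
r = np.array([1.0, 0.0])

# The stationarity conditions, divided by 2, form the symmetric system
# [X^T X  L^T] [beta  ]   [X^T Y]
# [L      0  ] [lambda] = [r    ]
K = np.block([[X.T @ X, L.T],
              [L, np.zeros((M, M))]])
rhs = np.concatenate([X.T @ Y, r])
sol = np.linalg.solve(K, rhs)
beta_rls, lam = sol[:p], sol[p:]
print(beta_rls, L @ beta_rls)             # L @ beta_rls reproduces r
```

Let's now return to the closed-form derivation.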
From the first equation, we find:
\[ \begin{aligned}\widehat{\boldsymbol{\beta}}^{(R L S)} & =\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{Y}-\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top} \boldsymbol{\lambda} \\& =\widehat{\boldsymbol{\beta}}^{(O L S)}-\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top} \boldsymbol{\lambda}\end{aligned} \]
Now, by multiplying both sides by \(\mathbf{L}\), we get:
\[ \mathbf{L} \widehat{\boldsymbol{\beta}}^{(R L S)}=\mathbf{L} \widehat{\boldsymbol{\beta}}^{(O L S)}-\mathbf{L}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top} \boldsymbol{\lambda} \]
Since the restricted estimator must satisfy \(\mathbf{L} \widehat{\boldsymbol{\beta}}^{(RLS)} =\boldsymbol{r}\), the left-hand side equals \(\boldsymbol{r}\), and we can express \(\boldsymbol{\lambda}\) as:
\[ \boldsymbol{\lambda}=\left(\mathbf{L}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top}\right)^{-1}\left(\mathbf{L} \widehat{\boldsymbol{\beta}}^{(O L S)}-\boldsymbol{r}\right) \]
Substituting this into the initial expression for \(\widehat{\boldsymbol{\beta}}^{(R L S)}\) gives us:
\[ \widehat{\boldsymbol{\beta}}^{(R L S)}=\widehat{\boldsymbol{\beta}}^{(O L S)}-\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top}\left(\mathbf{L}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top}\right)^{-1}\left(\mathbf{L} \widehat{\boldsymbol{\beta}}^{(O L S)}-\boldsymbol{r}\right) \]
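This closed-form estimator translates directly into code. Here is a minimal NumPy sketch, again with made-up data and the same illustrative restrictions; \(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\) is formed explicitly only to mirror the formula.

```python
import numpy as np

def restricted_lstsq(X, Y, L, r):
    """Closed-form restricted least squares (RLS) estimate."""
    XtX_inv = np.linalg.inv(X.T @ X)      # explicit inverse, mirroring the formula
    beta_ols = XtX_inv @ X.T @ Y
    A = XtX_inv @ L.T
    # Correction term: A (L A)^{-1} (L beta_ols - r)
    correction = A @ np.linalg.solve(L @ A, L @ beta_ols - r)
    return beta_ols - correction

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
Y = rng.normal(size=100)
L = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 0.0])

beta_rls = restricted_lstsq(X, Y, L, r)
print(beta_rls)
print(L @ beta_rls)                       # equals r up to rounding
```

Note that the correction term adjusts the OLS estimate exactly enough to satisfy the restrictions: if \(\widehat{\boldsymbol{\beta}}^{(OLS)}\) already satisfies \(\mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)}=\boldsymbol{r}\), the correction vanishes and the two estimators coincide.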