Exploring Restricted Least Squares | A Quick Note
1 Introduction
This blog post will explore fitting a regression model with constraints on the parameters.
Let's start with the basics of multiple regression, where we have a dependent variable \(Y_i \in \mathbb{R}\) and a vector of \(p\) independent variables \(X_i \in \mathbb{R}^p\), with \(p \geq 2\).
The multiple regression model is given by:
\[ Y_i=\alpha + \boldsymbol{\beta}^{\top} X_i +\epsilon_i \]
where \(E(\epsilon_i)=0\) and \(Var(\epsilon_i)=\sigma^2\). After centering the data, the intercept \(\alpha\) drops out, and we can express the linear regression model in matrix notation:
\[ \mathbf{Y} =\mathbf{X} \boldsymbol{\beta}+\boldsymbol{\varepsilon} \]
where \(\mathbf{Y}=\left(\begin{array}{c}Y_{1} \\\vdots \\Y_{n}\end{array}\right) \in \mathbb{R}^{n \times 1}\), \(\mathbf{X}=\left(\begin{array}{c}X_{1}^{\top}\\\vdots\\X_{n}^{\top}\end{array}\right) \in \mathbb{R}^{n \times p}\), \(\boldsymbol{\beta} \in \mathbb{R}^{p\times1}\), and \(\boldsymbol{\varepsilon}=\left(\begin{array}{c}\epsilon_{1}\\\vdots\\\epsilon_{n}\end{array}\right)\in\mathbb{R}^{n\times1}\).
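To make the setup concrete, here is a minimal NumPy sketch that simulates data from this model, centers \(\mathbf{Y}\) and \(\mathbf{X}\), and computes the ordinary least squares (OLS) fit. The sample size, dimension, and true coefficients below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3                               # illustrative sample size and dimension
alpha = 1.5
beta = np.array([2.0, -1.0, 0.5])           # illustrative true coefficients

X = rng.normal(size=(n, p))
Y = alpha + X @ beta + rng.normal(size=n)   # Y_i = alpha + beta^T X_i + eps_i

# Centering both Y and the columns of X removes the intercept from the model
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean()

# OLS estimate beta_hat = (X^T X)^{-1} X^T Y, via a least squares solver
beta_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
print(beta_ols)                             # close to the true beta
```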
This post aims to discuss how to fit a regression model when there are constraints on the parameter vector \(\boldsymbol{\beta}\).
2 Constrained Least Squares
To begin, let's distinguish this topic from constrained least squares.
In constrained least squares, the goal is to solve a linear least squares problem subject to an extra linear equality constraint on the solution vector:
\[ \begin{array}{ll}\text { min } & \|\mathbf{Y}-\mathbf{X} \boldsymbol{\beta}\|^2 \\ \text { subject to } & \mathbf{C} \boldsymbol{\beta}=\mathbf{d}\end{array} \]
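One standard way to solve this problem numerically is the null-space method: write every feasible vector as \(\boldsymbol{\beta}_0 + \mathbf{N}\mathbf{z}\), where \(\boldsymbol{\beta}_0\) is any particular solution of \(\mathbf{C}\boldsymbol{\beta}=\mathbf{d}\) and the columns of \(\mathbf{N}\) span the null space of \(\mathbf{C}\), then solve an unconstrained least squares problem in \(\mathbf{z}\). Below is a minimal sketch, assuming SciPy is available; the data and the constraint are made up for illustration.

```python
import numpy as np
from scipy.linalg import null_space

def constrained_lstsq(X, Y, C, d):
    """Minimize ||Y - X b||^2 subject to C b = d via the null-space method."""
    b0 = np.linalg.pinv(C) @ d            # a particular solution of C b = d
    N = null_space(C)                     # columns span {v : C v = 0}
    # Every feasible b is b0 + N z; solve the unconstrained problem in z
    z, *_ = np.linalg.lstsq(X @ N, Y - X @ b0, rcond=None)
    return b0 + N @ z

# Illustrative use: force the three coefficients to sum to one
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
Y = rng.normal(size=100)
C = np.ones((1, 3))
d = np.array([1.0])
b = constrained_lstsq(X, Y, C, d)
print(b, C @ b)                           # C @ b prints [1.]
```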
3 Restricted Least Squares
Now, let's focus on fitting a regression model with parameter restrictions. We want to impose \(M<p\) linearly independent restrictions on the parameters:
\[ \begin{array}{ll}\text { min } & \|\mathbf{Y}-\mathbf{X} \boldsymbol{\beta}\|^2 \\ \text { subject to }& \mathbf{L} \boldsymbol{\beta} =\boldsymbol{r}\end{array} \]
where \(\boldsymbol{\beta}\) is the parameter vector of interest, \(\mathbf{L} \in \mathbb{R}^{M\times p}\), and \(\boldsymbol{r} \in \mathbb{R}^{M\times 1}\).
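For example, with \(p=3\), the two restrictions \(\beta_1+\beta_2=1\) and \(\beta_3=0\) correspond to \(M=2\) with
\[ \mathbf{L}=\left(\begin{array}{ccc}1 & 1 & 0 \\ 0 & 0 & 1\end{array}\right), \qquad \boldsymbol{r}=\left(\begin{array}{c}1 \\ 0\end{array}\right) \]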
Applying the method of Lagrange multipliers, we minimize the Lagrangian function
\[ \begin{aligned}L(\boldsymbol{\beta}, \boldsymbol{\lambda}) & =(\mathbf{Y}-\mathbf{X} \boldsymbol{\beta})^{\top}(\mathbf{Y}-\mathbf{X} \boldsymbol{\beta})+2 \boldsymbol{\lambda}^{\top}(\mathbf{L} \boldsymbol{\beta}-\boldsymbol{r}) \\& =\mathbf{Y}^{\top} \mathbf{Y}-2 \mathbf{Y}^{\top} \mathbf{X} \boldsymbol{\beta}+\boldsymbol{\beta}^{\top} \mathbf{X}^{\top} \mathbf{X} \boldsymbol{\beta}+2 \boldsymbol{\lambda}^{\top} \mathbf{L} \boldsymbol{\beta}-2 \boldsymbol{\lambda}^{\top} \boldsymbol{r}\end{aligned} \]
We then equate the partial derivatives to zero:
\[ \left\{ \begin{aligned}\frac{\partial L(\boldsymbol{\beta},\boldsymbol{\lambda})}{\partial\boldsymbol{\beta}}&=-2\mathbf{X}^{\top}\mathbf{Y}+2\mathbf{X}^{\top}\mathbf{X}\boldsymbol{\beta}+2\mathbf{L}^{\top}\boldsymbol{\lambda}=\mathbf{0} \\\frac{\partial L(\boldsymbol{\beta},\boldsymbol{\lambda})}{\partial\boldsymbol{\lambda}}&=2(\mathbf{L}\boldsymbol{\beta}-\boldsymbol{r})=\mathbf{0}\end{aligned}\right. \]
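Note that these two conditions are linear in \((\boldsymbol{\beta}, \boldsymbol{\lambda})\), so the restricted estimate can also be computed directly by solving a single \((p+M)\times(p+M)\) block system. Here is a minimal sketch, reusing the illustrative restrictions from the example above on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, M = 100, 3, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
L = np.array([[1.0, 1.0, 0.0],            # beta_1 + beta_2 = 1
              [0.0, 0.0, 1.0]])           # beta_3 = 0
r = np.array([1.0, 0.0])

# The stationarity conditions, divided by 2, form the symmetric system
# [X^T X  L^T] [beta  ]   [X^T Y]
# [L      0  ] [lambda] = [r    ]
K = np.block([[X.T @ X, L.T],
              [L, np.zeros((M, M))]])
rhs = np.concatenate([X.T @ Y, r])
sol = np.linalg.solve(K, rhs)
beta_rls, lam = sol[:p], sol[p:]
print(beta_rls, L @ beta_rls)             # L @ beta_rls reproduces r
```

Let's now return to the closed-form derivation.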
From the first equation, we find:
\[ \begin{aligned}\widehat{\boldsymbol{\beta}}^{(R L S)} & =\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{Y}-\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top} \boldsymbol{\lambda} \\& =\widehat{\boldsymbol{\beta}}^{(O L S)}-\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top} \boldsymbol{\lambda}\end{aligned} \]
Now, by multiplying both sides by \(\mathbf{L}\), we get:
\[ \mathbf{L} \widehat{\boldsymbol{\beta}}^{(R L S)}=\mathbf{L} \widehat{\boldsymbol{\beta}}^{(O L S)}-\mathbf{L}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top} \boldsymbol{\lambda} \]
Since the restricted estimator must satisfy \(\mathbf{L} \widehat{\boldsymbol{\beta}}^{(RLS)} =\boldsymbol{r}\), the left-hand side equals \(\boldsymbol{r}\), and we can express \(\boldsymbol{\lambda}\) as:
\[ \boldsymbol{\lambda}=\left(\mathbf{L}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top}\right)^{-1}\left(\mathbf{L} \widehat{\boldsymbol{\beta}}^{(O L S)}-\boldsymbol{r}\right) \]
Substituting this into the initial expression for \(\widehat{\boldsymbol{\beta}}^{(R L S)}\) gives us:
\[ \widehat{\boldsymbol{\beta}}^{(R L S)}=\widehat{\boldsymbol{\beta}}^{(O L S)}-\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top}\left(\mathbf{L}\left(\mathbf{X}^{\top} \mathbf{X}\right)^{-1} \mathbf{L}^{\top}\right)^{-1}\left(\mathbf{L} \widehat{\boldsymbol{\beta}}^{(O L S)}-\boldsymbol{r}\right) \]
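This closed-form estimator translates directly into code. Here is a minimal NumPy sketch, again with made-up data and the same illustrative restrictions; \(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\) is formed explicitly only to mirror the formula.

```python
import numpy as np

def restricted_lstsq(X, Y, L, r):
    """Closed-form restricted least squares (RLS) estimate."""
    XtX_inv = np.linalg.inv(X.T @ X)      # explicit inverse, mirroring the formula
    beta_ols = XtX_inv @ X.T @ Y
    A = XtX_inv @ L.T
    # Correction term: A (L A)^{-1} (L beta_ols - r)
    correction = A @ np.linalg.solve(L @ A, L @ beta_ols - r)
    return beta_ols - correction

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
Y = rng.normal(size=100)
L = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 0.0])

beta_rls = restricted_lstsq(X, Y, L, r)
print(beta_rls)
print(L @ beta_rls)                       # equals r up to rounding
```

Note that the correction term adjusts the OLS estimate exactly enough to satisfy the restrictions: if \(\widehat{\boldsymbol{\beta}}^{(OLS)}\) already satisfies \(\mathbf{L}\widehat{\boldsymbol{\beta}}^{(OLS)}=\boldsymbol{r}\), the correction vanishes and the two estimators coincide.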