{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple OOP example with Linear Regression\n", "### Inspired by Dr. Tirthajyoti Sarkar, Fremont, CA 94536\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A very simple class `MyLinearRegression`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In linear regression we compute the parameters (coefficients) for which the model\n", "$$\n", "x_i^Tw+b\n", "$$\n", "best describes the observed data. Here $x_i$ is a data vector, $w$ is the vector of weights, and $b$ is the so-called intercept. Given $n$ data vectors $x_i$ with $d$ features each, we collect them as the rows of a matrix $\\tilde X$. To compute the weights $w$ and the intercept $b$ together, it is common in machine learning to prepend a column of ones to $\\tilde X$ and work with the matrix\n", "$$\n", "X=\n", "\\begin{bmatrix}\n", "1&x_{11}&\\ldots&x_{1d}\\\\\n", "1&x_{21}&\\ldots&x_{2d}\\\\\n", "\\vdots&\\vdots&\\ddots&\\vdots\\\\\n", "1&x_{n1}&\\ldots&x_{nd}\\\\\n", "\\end{bmatrix}\n", "$$\n", "and we solve for the coefficient vector \n", "$\n", "\\beta=\n", "\\begin{bmatrix}\n", "b\\\\\n", "w\\\\\n", "\\end{bmatrix}$.\n", "Given a vector of observations $y$, we minimize\n", "$$\n", "\\Vert X\\beta-y\\Vert_{2}^2\n", "$$\n", "whose explicit solution, provided $X^TX$ is invertible, is\n", "$$\n", "\\beta^{*}=(X^TX)^{-1}X^Ty.\n", "$$\n", "The following class implements this solution." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "class MyLinearRegression:\n", "\n", "    def __init__(self, fit_intercept=True):\n", "        self.coef_ = None\n", "        self.intercept_ = None\n", "        self._fit_intercept = fit_intercept" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Create an instance and check attributes" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "mlr = MyLinearRegression()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mlr._fit_intercept" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mlr.coef_ is None" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mlr.intercept_ is None" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<__main__.MyLinearRegression object at 0x7facc41484e0>\n" ] } ], "source": [ "print(mlr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Built-in description method\n", "We can add the special method `__repr__` to return a short description string. `print` falls back to it when the class defines no `__str__`." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "class MyLinearRegression:\n", "\n", "    def __init__(self, fit_intercept=True):\n", "        self.coef_ = None\n", "        self.intercept_ = None\n", "        self._fit_intercept = fit_intercept\n", "\n", "    def __repr__(self):\n", "        return \"I am a Linear Regression model!\"" ] }, { "cell_type": "code", 
"execution_count": 31, "metadata": {}, "outputs": [], "source": [ "mlr = MyLinearRegression()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I am a Linear Regression model!\n" ] } ], "source": [ "print(mlr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Adding the `fit` method\n", "Now we add the core fitting method, `fit`. It uses NumPy linear algebra routines to solve a linear regression problem with one or several features." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "class MyLinearRegression:\n", "\n", "    def __init__(self, fit_intercept=True):\n", "        self.coef_ = None\n", "        self.intercept_ = None\n", "        self._fit_intercept = fit_intercept\n", "\n", "    def __repr__(self):\n", "        return \"I am a Linear Regression model!\"\n", "\n", "    def fit(self, X, y):\n", "        \"\"\"\n", "        Fit model coefficients.\n", "\n", "        Arguments:\n", "        X: 1D or 2D numpy array\n", "        y: 1D numpy array\n", "        \"\"\"\n", "\n", "        # check if X is 1D or 2D array\n", "        if len(X.shape) == 1:\n", "            X = X.reshape(-1,1)\n", "\n", "        # add bias (a leading column of ones) if fit_intercept is True\n", "        if self._fit_intercept:\n", "            X_biased = np.c_[np.ones(X.shape[0]), X]\n", "        else:\n", "            X_biased = X\n", "\n", "        # closed form solution: beta = (X^T X)^{-1} X^T y\n", "        xTx = np.dot(X_biased.T, X_biased)\n", "        inverse_xTx = np.linalg.inv(xTx)\n", "        xTy = np.dot(X_biased.T, y)\n", "        coef = np.dot(inverse_xTx, xTy)\n", "\n", "        # set attributes\n", "        if self._fit_intercept:\n", "            self.intercept_ = coef[0]\n", "            self.coef_ = coef[1:]\n", "        else:\n", "            self.intercept_ = 0\n", "            self.coef_ = coef" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Generate some random test data" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "X = 10*np.random.random(size=(20,2))\n", "y = 
3.5*X.T[0]-1.2*X.T[1]+2*np.random.randn(20)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Figure: two scatter plots, output vs. first feature and output vs. second feature]" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1,2,figsize=(10,3))\n", "\n", "ax[0].scatter(X.T[0],y)\n", "ax[0].set_title(\"Output vs. first feature\")\n", "ax[0].grid(True)\n", "ax[1].scatter(X.T[1],y)\n", "ax[1].set_title(\"Output vs. second feature\")\n", "ax[1].grid(True)\n", "\n", "fig.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Instantiate a new `MyLinearRegression` object and fit the data" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "mlr = MyLinearRegression()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We have not fitted the data yet. There are no regression coefficients.\n", "Regression coefficients: None\n" ] } ], "source": [ "print(\"We have not fitted the data yet. There are no regression coefficients.\")\n", "print(\"Regression coefficients:\", mlr.coef_)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "mlr.fit(X,y)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We have fitted the data. We can print the regression coefficients now\n", "Regression coefficients: [ 3.42299229 -1.05863427]\n" ] } ], "source": [ "print(\"We have fitted the data. 
We can print the regression coefficients now\")\n", "print(\"Regression coefficients:\", mlr.coef_)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The intercept term is given by: 0.12126935155178842\n" ] } ], "source": [ "print(\"The intercept term is given by:\", mlr.intercept_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Comparison of ground truth and fitted values\n", "Let us compare the ground truth with the predictions to see how closely they agree. In the plot below, points on the dotted diagonal are predicted exactly." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "coef_ = mlr.coef_\n", "y_pred = np.dot(X,coef_)+mlr.intercept_" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Figure: scatter plot of predicted vs. ground-truth values with a dotted diagonal reference line]" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(y,y_pred,s=100,alpha=0.75,color='red',edgecolor='k')\n", "plt.plot(y,y,c='k',linestyle='dotted')\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extension\n", "Add the plotting as a method to our class!" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now create a data set of your choice and apply the `fit` method and your new plotting method to it." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "# TODO " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "toc-showcode": false, "toc-showmarkdowntxt": false }, "nbformat": 4, "nbformat_minor": 4 }