my-infosec-awesome/files/machine-learning/.ipynb_checkpoints/machine-learning-by-standford-university-checkpoint.ipynb
2018-01-17 13:44:13 +07:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning by Standford University\n",
"\n",
"## Week 1\n",
"\n",
"### Introduction\n",
"\n",
"#### What is Machine Learning?\n",
"\n",
"- Definition of machine learning defined by many computer scientists:\n",
" - Arthur Samuel (1959): Machine learning is field of study that gives computers the ability to learn without being explicitly programmed.\n",
" - Tom Mitchell (1998): Well-posed learning problem: A computer program is said to *learn* from experience $E$ with respect to some task $T$ and some performance measure $P$, if its performance on $T$, as measured by $P$, improves with experience $E$.\n",
"- Types of machine learning algorithms:\n",
" - **Supervised learning**: teach the computer how to do something\n",
" - **Unsupervices learning**: let computer learn but itself\n",
" - Others:\n",
" - Reinforcement learning\n",
" - Recommender systems\n",
"\n",
"#### Supervised Learning\n",
"\n",
"- **Definition**: Give the computer a data set in which the right answer were given. Computer then resposible for producing *more* right answer from what we were given.\n",
"- Type of problems on supervised learning\n",
" - **Regression problem**: try to predict continuous (real) valued output e.g. house pricing.\n",
" - **Classification problem**: discrete valued output(s) e.g. probability of breast cancer (nalignant, benign) based on tumor size as attribute or feature. \n",
"\n",
"#### Unsupervised Learning\n",
"\n",
"- **Definition**: Data have the same labels or no labels. Let computer find the structure of data\n",
"- By: **clustering algorithm** and **non-clustering algorithm**\n",
"\n",
"### Model and Cost Function\n",
"\n",
"#### Model Representation\n",
"\n",
"- This training set will be used in the following section:\n",
"\n",
"| Size in feet^2 (x) \t| Price ($) in 1000's (y) \t|\n",
"|:------------------:\t|:-----------------------:\t|\n",
"| 2104 \t| 460 \t|\n",
"| 1416 \t| 232 \t|\n",
"| 1534 \t| 315 \t|\n",
"| 852 \t| 178 \t|\n",
"| ... \t| ... \t|\n",
"\n",
"- To represent the model, these are basic description of notation:\n",
" - $m$ = Number of training exmaples\n",
" - $x$'s = input variable/features\n",
" - $y$'s = output variable/\"target\" variable\n",
" - $(x, y)$ = one training example for corresponding $x$ and $y$\n",
" - $(x^i, y^i); i=1,...,m$ = training examples from row on table when $i$ is an index into the training set\n",
" - $X$ = space of input values, for example: $X = R$\n",
" - $Y$ = space of output values, for example: $Y = R$\n",
"- Supervised learning (on house pricing problem) is consists of\n",
" - Training set or data set $(x^i, y^i); i=1,...,m$\n",
" - Learning algorithm, to output $h$ or *hypothesis function*\n",
" - $h$ or *hypothesis function* takes input and try to output the estimated value of $y$, corresponding to $x$ or $h: X \\rightarrow Y$\n",
"- There are many ways to represent $h$ based on learning algorithm, for example, for house pricing problem, supervised, regression problem, the hypothesis can be described as \n",
"\n",
"$$h_\\theta(x) = \\theta_0 + \\theta_1x$$\n",
"\n",
"which is called *linear regression model with one variable* or *univariate linear regression*.\n",
"\n",
"#### Cost Function\n",
"\n",
"Cost function is the function that tell *accuracy* of hypothesis.\n",
"\n",
"According to the training set of house pricing problem below where $m = 47$\n",
"\n",
"| Size in feet^2 (x) \t| Price ($) in 1000's (y) \t|\n",
"|:------------------:\t|:-----------------------:\t|\n",
"| 2104 \t| 460 \t|\n",
"| 1416 \t| 232 \t|\n",
"| 1534 \t| 315 \t|\n",
"| 852 \t| 178 \t|\n",
"| ... \t| ... \t|\n",
"\n",
"The hypothesis of this linear regression problem can be notated as:\n",
"\n",
"$$h_\\theta(x) = \\theta_0 + \\theta_1x$$\n",
"\n",
"For house pricing linear regression problem, we need to choose $\\theta_0$ and $\\theta_1$ so that the hyopothesis $h_\\theta(x_i)$ (predicted value) is close to $y$ (actual value), or $h_\\theta(x_i) - y_i$ must be small. In this situation, **mean squared error (MSE)** or **mean squared division (MSD)** can be used to measure the average of the squares of the errors or deviations. The cost function of this problem can be described by the MSE as:\n",
"\n",
"$$J(\\theta_0, \\theta_1) = \\dfrac {1}{2m} \\displaystyle \\sum _{i=1}^m \\left ( \\hat{y}_{i}- y_{i} \\right)^2 = \\dfrac {1}{2m} \\displaystyle \\sum _{i=1}^m \\left (h_\\theta (x_{i}) - y_{i} \\right)^2$$\n",
"\n",
"#### Cost Function Intuition I\n",
"\n",
"To find the best hypothesis, best straight line from linear equation that can be used to predict an output, for house pricing problem, result from the cost function of best fit hypothesis must closer to zero or ideally zero.\n",
"\n",
"#### Cost Function Intuition II\n",
"\n",
"This section explains about contour plot which use to conviniently describe more complex hypothesis.\n",
"\n",
"![Example of hypothesis with contour plots to find the best hypothesis based on result of cost function](images/1.png)\n",
"\n",
"\n",
"### Parameter Learning\n",
"\n",
"#### Gradient Descent\n",
"\n",
"#### Gradient Descent Intuition"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}