Clean up this scratch space

This commit is contained in:
bt3gl 2022-03-23 16:19:51 +04:00
parent 1de1667900
commit ed0cead015
62 changed files with 39650 additions and 13 deletions

View file

@ -1,15 +1,36 @@
# Curated Machine Learning and Data Engineering
![License: WTFPL](https://img.shields.io/badge/License-WTFPL-brightgreen.svg) [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/bt3gl/Awesome_Entrepreneur) [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity)
```
In this repository we cover resources for deploying Machine learning
in production environments, a task that includes data sourcing, data ingestion, data
transformation, pre-processing data for use in training, training a model, and hosting
the model.
```
* [Machine Learning resources](https://github.com/bt3gl/Curated_ETL_and_ML_Pipelines/tree/master/machine_learning_examples).
* [Data Engineering resources](https://github.com/bt3gl/Curated_ETL_and_ML_Pipelines/blob/master/data_engineering.md).
# 🤖 [Scrath Space] Machine Learning & Deep Learning projects and resources
* [Tensorflow](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/TensorFlow): examples in TF.
* [Caffe](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Caffe): examples in Caffe.
* [DeepArt](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Numpy): deep learning generated art.
* [ML Notebooks](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Notebooks): jupyter notebooks with ML examples.
* [Numpy](https://github.com/bt3gl/Resources-Machine_Learning/tree/master/Numpy): some snippets in Numpy.
* [Airflow]()
---------
## Learning Resources
* [Energy-based Approaches to Representation Learning - Yann LeCun](https://www.youtube.com/watch?v=m17B-cXcZFI&amp=&t=524s).
* [Stanford's Machine Learning Course](http://cs229.stanford.edu/).
* [Google's Developer Machine Learning Course](https://developers.google.com/machine-learning).
* [Deep Learning Lectures by Lex Fridman](https://www.youtube.com/watch?v=O5xeyoRL95U&list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf).
* [Andrew Ng's deeplearning.ai](https://www.deeplearning.ai/deep-learning-specialization/)
* [A Chart of Neural Networks](http://www.asimovinstitute.org/neural-network-zoo/).
* [UCL Course on RL](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
* [Stanford's Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)
* [The 9 CNN Papers You Need To Know About](https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html).
* [NVIDIA Deep Learning Course](https://www.youtube.com/playlist?list=PL5B692fm6--tI-ijknnVZWbXU2H4JpSYe)
* [DeepBench](https://github.com/baidu-research/DeepBench).
* [Deep Fake source code](https://github.com/deepfakes/faceswap/).
* [Deep Learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville](http://www.deeplearningbook.org/).
* [Tensorflow plaground](http://playground.tensorflow.org).
* [Google's Tensorflow courses](https://www.tensorflow.org/).
* [MIT Deep Learning Basics](https://medium.com/tensorflow/mit-deep-learning-basics-introduction-and-overview-with-tensorflow-355bcd26baf0).

0
caffe_examples/.gitkeep Normal file
View file

0
deep_art/.gitkeep Normal file
View file

View file

@ -0,0 +1,87 @@
# Deep Dream
## Running it in AWS
Create an AWS's ``` g2.2xlarge``` instance with the AMI [```cs231n_caffe_torch7_keras_lasagne_v2```](http://cs231n.github.io/aws-tutorial/), AMI ID: ```ami-125b2c72```, in the ```us-west-1 region```.
* It cointains Caffe, Torch7, Theano, Keras and Lasagne are pre-installed.
* Python bindings of caffe are available.
* It has CUDA 7.5 and CuDNN v3.
Once your machine is launched, get its IP and access it with:
```shell
$ ssh -i <pem_file> ubuntu@<aws_pub_ip_or_dns>
```
Copy this notebook to the remote machine:
```shell
$ scp -i <pem_file> <files_to_be_copied> ubuntu@<aws_pub_ip_or_dns>P
```
### Checking the Instance
* The root directory is only 12GB, and only ~3GB of that is free.
* There should be a 60GB /mnt directory that you can use to put your data, model checkpoints, models etc. Remember that the /mnt directory wont be persistent across reboots/terminations.
* If you need access to a large dataset and dont want to download it every time you spin up an instance, the best way to go would be to create an AMI for that and attach that AMI to your machine when configuring your instance (before launching but after you have selected the AMI).
* Check if Caffe is running:
```shell
$ cd caffe
$ ./build/tools/caffe time --gpu 0 --model examples/mnist/lenet.prototxt
```
### Running the Notebook
Install any dependences for [Jupyter](http://jupyter.readthedocs.io/en/latest/install.html):
```shell
$ apt-get install build-essential python3-dev
$ pip install jupyter
```
Run the notebook with:
```shell
$ cd caffe/examples
$ jupyter notebook --port=8888 dream.ipynb --no-browser &
```
Open it in your browser with the instance's IP and port 8888. If this does not work, try to make a tunnel:
```shell
$ ssh -i <pem_file> -N -f -L localhost:8888:localhost:8888 ubuntu@<aws_pub_ip_or_dns>
```
Note: add your AWS key to your ```~/.ssh/config``` file to avoid having to type it in all the time.
Note 2: kill the process later, finding it with ```ps aux | grep localhost:8888```.
## Running it in a Dockerfile
In the docker directory build the Dockerfile:
```shell
$ docker build -t caffe:cpu standalone/cpu
```
Then run into it:
```shell
$ docker run -ti --volume=$(pwd):/workspace caffe:cpu
```
## Testing Bat-Country
Another option is to check out [bat-country](https://github.com/jrosebr1/bat-country). I left some examples under ```/bat-country```.
## References
* Google Research [blog post](http://googleresearch.blogspot.ch/2015/06/inceptionism-going-deeper-into-neural.html) about Neural Network art.
* [Clouddream](https://github.com/VISIONAI/clouddream).
* [CNN-VIS](https://github.com/jcjohnson/cnn-vis).

View file

@ -0,0 +1 @@
echo aaa

View file

@ -0,0 +1,13 @@
#!/usr/bin/env python
""" Create dream """
IMAGE_TO_DREAM = "saturn.jpg"
IMAGE_TO_DREAM_OUTPUT = "dream.jpg"
bc = BatCountry(IMAGE_TO_DREAM)
image = bc.dream(np.float32(Image.open())
bc.cleanup()
result = Image.fromarray(np.uint8(image))
result.save(IMAGE_TO_DREAM_OUTPUT)

View file

@ -0,0 +1,10 @@
#!/usr/bin/env python
""" Create dream 2 """
bc = BatCountry(args.base_model)
features = bc.prepare_guide(Image.open(args.guide_image), end=args.layer)
image = bc.dream(np.float32(Image.open(args.image)), end=args.layer,
iter_n=20, objective_fn=BatCountry.guided_objective,
objective_features=features,)
bc.cleanup()

Binary file not shown.

After

Width:  |  Height:  |  Size: 211 KiB

Binary file not shown.

View file

@ -0,0 +1,44 @@
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
wget \
libatlas-base-dev \
libboost-all-dev \
libgflags-dev \
libgoogle-glog-dev \
libhdf5-serial-dev \
libleveldb-dev \
liblmdb-dev \
libopencv-dev \
libprotobuf-dev \
libsnappy-dev \
protobuf-compiler \
python-dev \
python-numpy \
python-pip \
python-scipy && \
rm -rf /var/lib/apt/lists/*
ENV CAFFE_ROOT=/opt/caffe
WORKDIR $CAFFE_ROOT
ENV CLONE_TAG=master
RUN pip install jupyter
RUN git clone -b ${CLONE_TAG} --depth 1 https://github.com/BVLC/caffe.git . && \
for req in $(cat python/requirements.txt) pydot; do pip install $req; done && \
mkdir build && cd build && \
cmake -DCPU_ONLY=1 .. && \
make -j"$(nproc)"
ENV PYCAFFE_ROOT $CAFFE_ROOT/python
ENV PYTHONPATH $PYCAFFE_ROOT:$PYTHONPATH
ENV PATH $CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH
RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig
WORKDIR /workspace

View file

@ -0,0 +1 @@
../dream/dream.ipynb

View file

@ -0,0 +1 @@
../dream/flowers.jpg

View file

@ -0,0 +1 @@
../dream/sky1024px.jpg

File diff suppressed because one or more lines are too long

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 64 KiB

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,947 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fine-tuning a Pretrained Network for Style Recognition\n",
"\n",
"In this example, we'll explore a common approach that is particularly useful in real-world applications: take a pre-trained Caffe network and fine-tune the parameters on your custom data.\n",
"\n",
"The upside of such approach is that, since pre-trained networks are learned on a large set of images, the intermediate layers capture the \"semantics\" of the general visual appearance. Think of it as a very powerful feature that you can treat as a black box. On top of that, only a few layers will be needed to obtain a very good performance of the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we will need to prepare the data. This involves the following parts:\n",
"(1) Get the ImageNet ilsvrc pretrained model with the provided shell scripts.\n",
"(2) Download a subset of the overall Flickr style dataset for this demo.\n",
"(3) Compile the downloaded Flickr dataset into a database that Caffe can then consume."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import os\n",
"os.chdir('..')\n",
"import sys\n",
"sys.path.insert(0, './python')\n",
"\n",
"import caffe\n",
"import numpy as np\n",
"from pylab import *\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This downloads the ilsvrc auxiliary data (mean file, etc),\n",
"# and a subset of 2000 images for the style recognition task.\n",
"!data/ilsvrc12/get_ilsvrc_aux.sh\n",
"!scripts/download_model_binary.py models/bvlc_reference_caffenet\n",
"!python examples/finetune_flickr_style/assemble_data.py \\\n",
" --workers=-1 --images=2000 --seed=1701 --label=5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's show what is the difference between the fine-tuning network and the original caffe model."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1c1\r\n",
"< name: \"CaffeNet\"\r\n",
"---\r\n",
"> name: \"FlickrStyleCaffeNet\"\r\n",
"4c4\r\n",
"< type: \"Data\"\r\n",
"---\r\n",
"> type: \"ImageData\"\r\n",
"15,26c15,19\r\n",
"< # mean pixel / channel-wise mean instead of mean image\r\n",
"< # transform_param {\r\n",
"< # crop_size: 227\r\n",
"< # mean_value: 104\r\n",
"< # mean_value: 117\r\n",
"< # mean_value: 123\r\n",
"< # mirror: true\r\n",
"< # }\r\n",
"< data_param {\r\n",
"< source: \"examples/imagenet/ilsvrc12_train_lmdb\"\r\n",
"< batch_size: 256\r\n",
"< backend: LMDB\r\n",
"---\r\n",
"> image_data_param {\r\n",
"> source: \"data/flickr_style/train.txt\"\r\n",
"> batch_size: 50\r\n",
"> new_height: 256\r\n",
"> new_width: 256\r\n",
"31c24\r\n",
"< type: \"Data\"\r\n",
"---\r\n",
"> type: \"ImageData\"\r\n",
"42,51c35,36\r\n",
"< # mean pixel / channel-wise mean instead of mean image\r\n",
"< # transform_param {\r\n",
"< # crop_size: 227\r\n",
"< # mean_value: 104\r\n",
"< # mean_value: 117\r\n",
"< # mean_value: 123\r\n",
"< # mirror: true\r\n",
"< # }\r\n",
"< data_param {\r\n",
"< source: \"examples/imagenet/ilsvrc12_val_lmdb\"\r\n",
"---\r\n",
"> image_data_param {\r\n",
"> source: \"data/flickr_style/test.txt\"\r\n",
"53c38,39\r\n",
"< backend: LMDB\r\n",
"---\r\n",
"> new_height: 256\r\n",
"> new_width: 256\r\n",
"323a310\r\n",
"> # Note that lr_mult can be set to 0 to disable any fine-tuning of this, and any other, layer\r\n",
"360c347\r\n",
"< name: \"fc8\"\r\n",
"---\r\n",
"> name: \"fc8_flickr\"\r\n",
"363c350,351\r\n",
"< top: \"fc8\"\r\n",
"---\r\n",
"> top: \"fc8_flickr\"\r\n",
"> # lr_mult is set to higher than for other layers, because this layer is starting from random while the others are already trained\r\n",
"365c353\r\n",
"< lr_mult: 1\r\n",
"---\r\n",
"> lr_mult: 10\r\n",
"369c357\r\n",
"< lr_mult: 2\r\n",
"---\r\n",
"> lr_mult: 20\r\n",
"373c361\r\n",
"< num_output: 1000\r\n",
"---\r\n",
"> num_output: 20\r\n",
"384a373,379\r\n",
"> name: \"loss\"\r\n",
"> type: \"SoftmaxWithLoss\"\r\n",
"> bottom: \"fc8_flickr\"\r\n",
"> bottom: \"label\"\r\n",
"> top: \"loss\"\r\n",
"> }\r\n",
"> layer {\r\n",
"387c382\r\n",
"< bottom: \"fc8\"\r\n",
"---\r\n",
"> bottom: \"fc8_flickr\"\r\n",
"393,399d387\r\n",
"< }\r\n",
"< layer {\r\n",
"< name: \"loss\"\r\n",
"< type: \"SoftmaxWithLoss\"\r\n",
"< bottom: \"fc8\"\r\n",
"< bottom: \"label\"\r\n",
"< top: \"loss\"\r\n"
]
}
],
"source": [
"!diff models/bvlc_reference_caffenet/train_val.prototxt models/finetune_flickr_style/train_val.prototxt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For your record, if you want to train the network in pure C++ tools, here is the command:\n",
"\n",
"<code>\n",
"build/tools/caffe train \\\n",
" -solver models/finetune_flickr_style/solver.prototxt \\\n",
" -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \\\n",
" -gpu 0\n",
"</code>\n",
"\n",
"However, we will train using Python in this example."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"iter 0, finetune_loss=3.360094, scratch_loss=3.136188\n",
"iter 10, finetune_loss=2.672608, scratch_loss=9.736364\n",
"iter 20, finetune_loss=2.071996, scratch_loss=2.250404\n",
"iter 30, finetune_loss=1.758295, scratch_loss=2.049553\n",
"iter 40, finetune_loss=1.533391, scratch_loss=1.941318\n",
"iter 50, finetune_loss=1.561658, scratch_loss=1.839706\n",
"iter 60, finetune_loss=1.461696, scratch_loss=1.880035\n",
"iter 70, finetune_loss=1.267941, scratch_loss=1.719161\n",
"iter 80, finetune_loss=1.192778, scratch_loss=1.627453\n",
"iter 90, finetune_loss=1.541176, scratch_loss=1.822061\n",
"iter 100, finetune_loss=1.029039, scratch_loss=1.654087\n",
"iter 110, finetune_loss=1.138547, scratch_loss=1.735837\n",
"iter 120, finetune_loss=0.917412, scratch_loss=1.851918\n",
"iter 130, finetune_loss=0.971519, scratch_loss=1.801927\n",
"iter 140, finetune_loss=0.868252, scratch_loss=1.745545\n",
"iter 150, finetune_loss=0.790020, scratch_loss=1.844925\n",
"iter 160, finetune_loss=1.092668, scratch_loss=1.695591\n",
"iter 170, finetune_loss=1.055344, scratch_loss=1.661715\n",
"iter 180, finetune_loss=0.969769, scratch_loss=1.823639\n",
"iter 190, finetune_loss=0.780566, scratch_loss=1.820862\n",
"done\n"
]
}
],
"source": [
"niter = 200\n",
"# losses will also be stored in the log\n",
"train_loss = np.zeros(niter)\n",
"scratch_train_loss = np.zeros(niter)\n",
"\n",
"caffe.set_device(0)\n",
"caffe.set_mode_gpu()\n",
"# We create a solver that fine-tunes from a previously trained network.\n",
"solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')\n",
"solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')\n",
"# For reference, we also create a solver that does no finetuning.\n",
"scratch_solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')\n",
"\n",
"# We run the solver for niter times, and record the training loss.\n",
"for it in range(niter):\n",
" solver.step(1) # SGD by Caffe\n",
" scratch_solver.step(1)\n",
" # store the train loss\n",
" train_loss[it] = solver.net.blobs['loss'].data\n",
" scratch_train_loss[it] = scratch_solver.net.blobs['loss'].data\n",
" if it % 10 == 0:\n",
" print 'iter %d, finetune_loss=%f, scratch_loss=%f' % (it, train_loss[it], scratch_train_loss[it])\n",
"print 'done'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the training loss produced by the two training procedures respectively."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7fbb36f0ad50>,\n",
" <matplotlib.lines.Line2D at 0x7fbb36f0afd0>]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": [
"iVBORw0KGgoAAAANSUhEUgAAAXUAAAEACAYAAABMEua6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n",
"AAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXFWd9/HPtzt7AlkkJCGAgbCIqCSyuIDaRECEYZvB\n",
"EQRFB5iMo8CjzuMwOlpdioo4IM4iM6wTgdHhgRFBRAhLM6gQtgQCIQQkYc8CJIEQQpb+PX+c01hp\n",
"eqmqrl5SfN+vV7266tZdzr11+3tPnXvuLUUEZmZWHxr6uwBmZlY7DnUzszriUDczqyMOdTOzOuJQ\n",
"NzOrIw51M7M6UlaoS2qUNFfS9fn1OEmzJS2SdLOkMb1bTDMzK0e5NfUzgAVAW6f2M4HZEbEbcGt+\n",
"bWZm/azbUJe0PXAYcDGgPPhIYFZ+Pgs4uldKZ2ZmFSmnpv5j4P8CrSXDJkTEsvx8GTCh1gUzM7PK\n",
"dRnqkv4MWB4Rc/lTLX0zke4z4HsNmJkNAIO6ef/DwJGSDgOGAVtLuhxYJmliRCyVNAlY3tHEkhz2\n",
"ZmZViIgOK9LdUbk39JL0MeDvIuIISecAL0XEDyWdCYyJiLecLJUU1RbMNiepOSKa+7sc9cLbs7a8\n",
"PWurJ9lZaT/1tiPA2cDBkhYBM/JrMzPrZ901v7wpIu4A7sjPXwYO6q1CmZlZdXxF6Zajpb8LUGda\n",
"+rsAdaalvwtgSdlt6lXN3G3qZmYV68s2dTMzG8Ac6mZmdcShbmZWRxzqZmZ1xKFuZlZHHOpmZnXE\n",
"oW5mVkcc6mZmdcShbmZWRxzqZmZ1xKFuZlZHHOpmZnXEoW5mVkcc6mZmdaTPQ11FSUUd1tfLNTN7\n",
"O+iPmvo44HoV5fusm5nVWH+FegMwqh+WbWZW17oNdUnDJM2RNE/SAkk/yMObJT0raW5+HFrmMse2\n",
"+2tmZjXS7Q9PR8Q6SQdGxFpJg4DfSToACOC8iDivwmW2hfkY4OkKpzUzsy6U1fwSEWvz0yFAI7Ay\n",
"v66mXXxc/jumimnNzKwLZYW6pAZJ84BlwO0R8Uh+6zRJD0q6RFK5Ie3mFzOzXlJuTb01IqYB2wMf\n",
"ldQEXADsBEwDXgDOLXOZpc0vZmZWQ922qZeKiNWSbgD2iYiWtuGSLgau72gaSc0lL1toZhypPd41\n",
"dTMzIFeUm2oxr25DXdI2wMaIWCVpOHAwUJQ0MSKW5tGOAeZ3NH1ENG82v6I+BzyHa+pmZgDkSnJL\n",
"22tJhWrnVU5NfRIwS1IDqbnm8oi4VdLPJE0j1boXAzPLXObYPL5r6mZmNVZOl8b5wPs7GP65Kpc5\n",
"DngS19TNzGquP64oHUsKddfUzcxqrL9uE+CauplZL+jPmrpD3cysxvo01FXUUGAwqfeLm1/MzGqs\n",
"r2vqY4FVpNsMuKZuZlZj/RHqLwOvAsNV1OA+Xr6ZWV3rj1BfGYUIYDUwuo+Xb2ZW1/o61Mfxpzs8\n",
"rsTt6mZmNdVfzS+Q2tbdrm5mVkP90vySn6/CNXUzs5rq7+YX19TNzGrIzS9mZnWkP5tffKLUzKzG\n",
"+rtN3TV1M7Ma6utQHw68np+7pm5mVmN9HeqDgI35+avA1n28fDOzutbXoT4Y2JCfvw4M6+Plm5nV\n",
"tf4O9eF9vHwzs7rmUDczqyNdhrqkYZLmSJonaYGkH+Th4yTNlrRI0s2Syu3FUhrq63Com5nVVJeh\n",
"HhHrgAMjYhrwPuBASQcAZwKzI2I34Nb8uhylJ0rdpm5mVmPdNr9ExNr8dAjQSOqKeCQwKw+fBRxd\n",
"5vLc/GJm1ou6DXVJDZLmAcuA2yPiEWBCRCzLoywDJpS5PIe6mVkvGtTdCBHRCkyTNBq4SdKB7d4P\n",
"SdHZ9JKa33zxWUYy1aFuZlZKUhPQVIt5dRvqbSJitaQbgL2BZZImRsRSSZOA5V1M19z2XEV9GdfU\n",
"zcw2ExEtQEvba0mFaufVXe+Xbdp6tkgaDhwMzAWuA07Ko50EXFvm8kpPlK4j/U6pKi20mZl1rLs2\n",
"9UnAbblNfQ5wfUTcCpwNHCxpETAjvy7Hm23qUYiNwKY8zMzMaqDL5peImA+8v4PhLwMHVbG80hOl\n",
"8KcmmPVVzMvMzNrpsytKczNLZ6FuZmY10Je3CWgEWqMQrSXDHOpmZjXUl6FeepK0jUPdzKyG+jLU\n",
"2ze9QOoB41sFmJnVSH+HumvqZmY15FA3M6sjDnUzszriE6VmZnVkINTUfaLUzKxG+jvU/etHZmY1\n",
"1N+h7uYXM7Ma6us2dYe6mVkv6uuauk+Umpn1ooHQ/OITpWZmNTIQQt01dTOzGunvUHfvFzOzGvLF\n",
"R2ZmdaS/a+oOdTOzGnKom5nVkW5DXdIOkm6X9IikhyWdnoc3S3pW0tz8OLSbWbn3i5lZL+vyh6ez\n",
"DcBXImKepFHA/ZJmAwGcFxHnlbks19TNzHpZt6EeEUuBpfn5GkmPApPz26pwWe1PlLr3i5lZDVXU\n",
"pi5pCjAduDsPOk3Sg5IukTSmm8ldUzcz62XlNL8AkJtergbOyDX2C4Dv5Le/C5wLnNzBdM0A7MZ+\n",
"7Mkb7d52qJvZ256kJqCpFvMqK9QlDQauAa6IiGsBImJ5yfsXA9d3NG1ENAOoqK8CO7R72ydKzext\n",
"LyJagJa215IK1c6rnN4vAi4BFkTE+SXDJ5WMdgwwv5tZufnFzKyXlVNT3x84EXhI0tw87BvA8ZKm\n",
"kXrBLAZmlrGsDk+UqihFIaL8YpuZWUfK6f3yOzqu0d9Y4bLeUlOPQmxUUa35vfUVzs/MzNrp7ytK\n",
"wU0wZmY1M1BC3SdLzcxqYKCEumvqZmY10N+33gWHuplZzbimbmZWRwZCqPv+L2ZmNTIQQt0nSs3M\n",
"amSghPqIPiyHmVndGggnSl/DoW5mVhMDoab+GjCyD8thZla3HOpmZnXEoW5mVkcc6mZmdWSgnCh1\n",
"qJuZ1YBr6mZmdcShbmZWRwZKqI/qw3KYmdWtgRLqrqmbmdWAT5SamdWRbkNd0g6Sbpf0iKSHJZ2e\n",
"h4+TNFvSIkk3SxrTzaxcUzcz62Xl1NQ3AF+JiD2BDwJfkrQHcCYwOyJ2A27Nr7viUDcz62XdhnpE\n",
"LI2Iefn5GuBRYDJwJDArjzYLOLqbWTnUzcx6WUVt6pKmANOBOcCEiFiW31oGTOhmcoe6mVkvG1Tu\n",
"iJJGAdcAZ0TEq5LefC8iQlJ0Ml0zAB9lBA/zQQr8pt0orwEjVZSiEB3Ow8ysnklqAppqMq8oI0cl\n",
"DQZ+DdwYEefnYQuBpohYKmkScHtEvKvddBERAlBR64CxUYjX3zL/ot4ARkch1vV4jczMtnCl2Vmp\n",
"cnq/CLgEWNAW6Nl1wEn5+UnAtd3MqrPmF3ATjJlZTZTTpr4/cCJwoKS5+XEocDZwsKRFwIz8ukMq\n",
"qiEva1MnozjUzcxqoNs29Yj4HZ2H/0FlLmcwsKGLNnOHuplZDfTVFaWdXU3axqFuZlYDfRXqXbWn\n",
"g0PdzKwmHOpmZnXEoW5mVkf6sk29u1D3PdXNzHqoL2vqPlFqZtbL3PxiZlZHHOpmZnXEoW5mVkcG\n",
"0olSh7qZWQ8NlBOla3Com5n1mJtfzMzqiEPdzKyOONTNzOqIT5SamdWRgXKi1KFuZlYDbn4xM6sj\n",
"DnUzszoyUEJ9DTBKRVX169lmZpZ0G+qSLpW0TNL8kmHNkp5t90PUXeky1KMQG0i19THlFtzMzN6q\n",
"nJr6ZUD70A7gvIiYnh+/7WYeQ4E3uhlnObBtGeUxM7NOdBvqEXEnsLKDtyppKikn1FcA4yuYp5mZ\n",
"tdOTNvXTJD0o6RJJ3TWblFtTd6ibmfXAoCqnuwD4Tn7+XeBc4OSORpTUzJ4cSNCqZjVFREsn81xB\n",
"bn5RUX8OzItCPFll+czMthiSmoCmWsyrqlCPiOUlhbkYuL6LcZtV1AjgpSh0GuiweU39NFJbvkPd\n",
"zOperuy2tL2WVKh2XlU1v0iaVPLyGGB+Z+Nm5bapt50o3REYXU3ZzMzezrqtqUv6OfAxYBtJzwAF\n",
"oEnSNFIvmMXAzG5mU26b+gdUVAOwAw51M7OKdRvqEXF8B4MvrXA5ldTUtyX1a9+6wmWYmb3t9dUV\n",
"pZX0ftkxv3ZN3cysQgMp1Ntq6juQmnUc6mZmFRpIof4i8A7gnaReLw51M7MKDZhQj0KsJ93Y632k\n",
"3jQOdTOzCvVlqK8vY7zlwN7AwzjUzcwqNmBq6tkK4N24pm5mVpWBFurLSWVyTd3MrAoDLdRXAJuA\n",
"x4ERKqqxV0tlZlZnBlqoLweeyz+a8Sq+AMnMrCJ9FepDKL+m/kx+vho3wZiZVaTaW+9Wqtya+sPA\n",
"dvm5Q93MrEIDKtSjELcBt+WXDnUzswoNtDb1Ug51M7MK9XqoqyiR2tTLufiolEPdzKxCfVFTHwJs\n",
"iEK0VjidQ93MrEJ9EerVNL2AQ93MrGIOdTOzOuJQNzOrIw51M7M60m2oS7pU0jJJ80uGjZM0W9Ii\n",
"STdLGtPFLBzqZmZ9pJya+mXAoe2GnQnMjojdgFvz68441M3M+ki3oR4RdwIr2w0+EpiVn88Cju5i\n",
"Fg51M7M+Um2b+oSIWJafLwMmdDGuQ93MrI/0+N4vERGSotMRLuIUJjNZzWoGWiKipcxZO9TN7G1B\n",
"UhPQVIt5VRvqyyRNjIilkiaR7oPesVO5Chgbc6K5wmW8CoxUUQ1VXI1qZrbFyJXdlrbXkgrVzqva\n",
"5pfrgJPy85OAa7sYt6rmlxzkrwMjKi6dmdnbVDldGn8O/AHYXdIzkr4AnA0cLGkRMCO/7sxQKr+Z\n",
"V5vXgJFVTmtm9rbTbfNLRBzfyVsHlbmMak+UgkPdzKwiA/mKUkihPqqGZTEzq2sDPdTX4Jq6mVnZ\n",
"Bnqou6ZuZlaBLSHUXVM3MyvTQA91N7+YmVWgr37Ozs0vZmZ9YKDX1N38YmZWgYEe6m5+MTOrwEAP\n",
"dTe/mJlVYEsIddfUzczKNNBD3c0vZmYVGOih7uYXM7MKbAmh7pq6mVmZBnqou/nFzKwCAz3U3fxi\n",
"ZlaBLSHUXVM3MyvTQA91N7+YmVVgoIe6m1/MzCrQ7c/ZdUXSEuAVYBOwISL262A0N7+YmfWRntbU\n",
"A2iKiOmdBDr07Ien1wNSUUOqnN7M7G2lFs0v6ub9qmvqUYjAtXUzs7LVoqZ+i6T7JJ3ayTg9aX4B\n",
"h7qZWdl61KYO7B8RL0gaD8yWtDAi7uxgGdU2v4BD3cysbD0K9Yh4If9dIemXwH7A5qF+O5u4g4Ka\n",
"BdASES0VLmYN7gFjZnVMUhPQVIt5VR3qkkYAjRHxqqSRwCFA8S0jHshr0RLNVZfQNXUzq3O5stvS\n",
"9lpSodp59aSmPgH4paS2+VwZETd3MN6iHiwDHOpmZmWrOtQjYjEwrYxRH6h2GZmbX8zMytQXV5T2\n",
"NNRdUzczK5ND3cysjvRFqM/v4fRufjEzK1Ovh3oUYl0PZ+GauplZmfqipt5TvlOjmVmZtoRQ9z3V\n",
"zczKtCWE+oBsflFRE1TUNSrqnf1dFjOzNj2990u3JBRB9GAWC4FzVNRuwMvA/yHdv/3F/JgETAa+\n",
"G4VY1UkZPgCsiODJkmHvAv4S2Bk4I4LV5RZIRe0K3AS0AscDZ1exXmZmNdcXNfV3VTORxGCJM2mO\n",
"B4F/BG4BHgS2YcUeE3lyxgnAUcBOpFC/RkXtrKJuU1FfLJmPoPViaP2HkmHbALOBscAuwKfKLldR\n",
"Q4FrgPOAmcBfVLN+Zma9QRE9qUR3M3MpIL4UwU8rn5avAz8AvhnB2SrqBOC5KESLxC+BI4G9I5in\n",
"ohqBXwKH0Np4Adp0ImKPKMSLmvjggRwx8yZGLW1k62fORq1bsfCoo3j4+Dv51HGfpTmOJtXUm8oq\n",
"V1HfB/YA/hxoBF4A9olCPFXpOpqZdURSRER3v1XR8bR9EOpXR6SasMT2wPKIjm/FK3EE8FfAfwBX\n",
"AMcBvwD2iWBJHmcn4F7gR0BTBJ8EUFHDeeITe3HFby/kswe/xtRbHgLOZ+let7B+5HPc+v3RHPbl\n",
"+3lptwZe3GN/PvK99YhNbBr8W/51wYms3GVaBE93uT5FTQduBPaKQizLwy4BlgA7knrpfC0K8Xx+\n",
"7wxgqyjEWdVuQzN7+xnoob4a+DkwGjiaFMhHR7By83HZClgAXA18HvinCL4n8Q3go8BhEbRK/BPp\n",
"xzm+mcf/TgQ/k5gA3ADMZ8SKI/m7iauBVh44ZXvu+urOvLT7ycBU4FDgcJo1F9gLOJb1I7/Gsvc+\n",
"wg53XwT8KgrplsIAKmpYXt564A7g8ijERSXvHw78GrgMeI7UJPMDYAXwfdIvQ50ShbipNlu1exKT\n",
"gfdFcGNfLdPMamegh/puwDF50H8AzcAnga+Sgu8HpJqugMYIPi8xHFgXQUgMJt2j/RekNvWrSc0u\n",
"SySmA/8NPAW8H/jXPP//BJblYX+MYKbEnsDDwIURzNysnO+67iDGPnkDk+fALjdt5KVdj46L5sxW\n",
"UVNIgT0Y+K+8HntHITa9OW1q+tk3CnF3fv1u4BzgY8ABpHb7/wL+Igpxlybf+3l2/N0pTJu1inc8\n",
"di6D17UAXyEdYH4KPApsjEKs7XS7FrUDK/YYy3//zzHMnP4rBq9rJF25Owh4N7cXz2fYyg8y/dK7\n",
"GPbKH/J2D+D1KMTSTub5UeCHwJejEPe3e0/AoCjEhpJ1PhB4D+kguLizspZLQnz0rP2Y8a1v5vX4\n",
"HOl8x4nAT6IQj1c0v6JGAu8D5kQhWkvWY0oUYnFehwuApcD3oxDrVNRY4N9I+9k5+ecUa0JFDYlC\n",
"rFdRDcAMYCvg91GI5R2MOwKYGIV4sv17ZS5rIrANqSK1FelOqYtruT4ly2ok7Vt7Ax8G7qFkm5eM\n",
"N4r0zfuxKLzlh3TKXZaA7YAVUYie/PDOgKOitibly9woxEMDOtQ7KpjEMcBZwDtIJ0F3JNXiD4rg\n",
"rTu52BmYQ+r1cmIEt5S8N4oUADdE8FQetiup18wlwBcj2JROmHI28KMIXuxgGQ3ABD7wk3+nqXgE\n",
"oacZ/vIwFh1xAxuHPcee/28m8Gma43lSCG9PCtLrgDnte/ioqBFtwZzPB5xFawNsGPlOVr1zEU8c\n",
"Opq9L9qaoasXIjYC19Da+EU2Dd4ONJhNg29h2CtXk/rpfwI4CPgNsBH4S9Zs20jj+tGodQnDXllD\n",
"MBUQm4YuZuHRU2ltvJ0nPjGFo7/wWxpaTyR1Dd0qz2NJ3uZzgeeBg4HDgYuAU4EZUYhH80Htb4Bj\n",
"Sbdavoj0jeVE0kHzQeAI4HHgKtLPFo7Pj02kwFwKvMwrk0ewZsJTbPfA3cA+wKlsGvRJkGjc8D+8\n",
"MO0jbP3sNF7a7Up2/MNzwEkEwaoptzJ2yaHAY8CupIP/WmAI6acSh5K+sd1FOsexNelAegCwEniW\n",
"tK8tBZoJDkdcCmwghf4KYF/g7vz3xly+52ltPJ+NwwYx5LVv5uX9nhRgytttEzAceIRUsdid9LsC\n",
"B+Vt8iCpQrAP8CHgJWAdsCqX50N5fovzZ/IaMA7Yn9Sz6sK83n+fP/fHSQH9ev48JpCC+2Xgj8BD\n",
"wGdI4fpCXs5rwHvzNJeSvs2uJR2UZ+Tt81iefi/SDzWszWV9iVQR+iNwAjA9l2E46YC7Sy6vSP9v\n",
"vwc+SOqRdhepI8ZUkvGkytk+wM15uncABeAPwLtJ33IPyp/bE6QDxOWk/4Ef5XXbSOrivIB0X6nf\n",
"ATdGIVbkA8cHgHeSKgb3k/bxrUn7/gZgWRQiVNSeuQx/yON+lLSv75rXcW0uf0Meb5f8ea8g9bp7\n",
"mNREPBX4OCkHXif9psRo4Mn8ua4BJuay3JTX91DSPnMTcBvwJdL/VAtwVhTi3i0u1NN7NJBq5hvK\n",
"mxf7AksjeKbM8d8FPFZNd0ptP+dAGjZcyIaR27J0+nxgMrTeDw3jSTvfBaSdZTqp9jGS9I81K4J/\n",
"6qAsUxmzeCPjH/kui2dsig0jviAxmlEv3M0BP1zAwiNPZ8mMGcC5wH8xcukrvOcXn2e/nz7CS7vt\n",
"z/N7r2T5e2Zx7HHraGgdyYX3LOT5fb8D/AvpH/NTDHl1HpuGrGHT0KdJtf1/JPXweR6YGcFaFTUa\n",
"OJm0gz9Hql1tB/wvcGUUYqmK+hzBT9kwYh2D3mikYdOlpJ33ReA0UthcHoV4BEBFDSYF2dH5vRX5\n",
"0UDamScRGseSj32YbR4bxqgX1iBeZ+24y5h12+dR61g++ON5PH3ANF7Z4Uye+OR3gD+jWQ1cOGcn\n",
"nt/vcva5YAZ/9reNwELmf3oIzxywJ4ed9gDpILKRdAvoffPrV/LjrlzmU0ndTiexascWLrzvM5wx\n",
"9U6Gvro98JEoxCoVtRewJ6kGODs3uf0dr044lQ2jtmfR4b9g2qwrGLZ6b9I/Nnm7iXSQ2wvYgRS4\n",
"d5IODFPzPN8AFnDx75/g8zPWMeiNMcBDOVjE7c27snTad5l8z3p2v+4BJjy8hBRUyvvZGFJQrAZ2\n",
"y49hpIPCsjx8LKmXWVtgXhSFePN3gXMNdzrw16TAH0lqBv0N6cCwOynMHsvTDyIF7vg83R6kHl8t\n",
"pN5ma0mh+0T+rCn9FqCitiOF+yZSQLYCq/L+NQ44PW+rVuB7pN5rTwFXkr6Jj8xlaiJ1SFhJqtR9\n",
"Nc+j7VvY3qSD0yHAiPxZ3J+XGXldd86fwWpSBWAT6UC2HbCcVLlpqxhckrfL1DysNY+/Oq9rI+kb\n",
"0HjSgfcE0sF4NqkX3Bjg26QD3E552Vvlz+lgUvhfTPo2uDupZn4I6cD1gyjEije3YQ9CnYjotUea\n",
"fe/Nv3fLHkMg3pWfj4T4GsQREEPbjSeInSD2hVgCcTzElyHuyc+/CvEixIr8mFAy7TshbodYBfEQ\n",
"xLSSeV4BsQziaIhPQNwNcRnEcXk+H4QYnp//HOIqiFMgHoPYtqTcV0A8AXEfxG8gppYsfyTEdvn5\n",
"cIgTIH6GNqxg/MP3MHTVcogP5GWdADEKohHi/XndvghxEMQOEFtD/Divx/g8z10gxkMUIG6BTQt4\n",
"7xUzmX7RKIjfQpwPsSfEXRCfyNN8CmIBxGCI2RB/gLgmfx7/lrfVaogZefxdS7dpB5/joRDn5G36\n",
"nxDPw6bzaKaxm89/XN7+n4K4KZej0+XkaQZD/FX+7K8oWac9IDZAHN7BNN/J8/9ZXobyY5v+/h8o\n",
"8/+ky+3Y7fTNNNBMQxfvb08zh9CcKqCdjCOaaezoM6WZEaXT0swUmvk4zQzOr7ejmZHt1mkiRKdl\n",
"6qIMnZYxjzO8/O1KVL1Ne/cDr75gW+IDYi+INTmkPp3/ue+CmJL/UYd0Mp0g1G7YIIhRJa9HQtwM\n",
"8Vxb+Ofh5+RgH9/FvA+A2I90YHoRogXi3lzWl3IQr4C4EWImxJQ87eEQayEezQeEl/PjEYj/hriI\n",
"dFB6AWJTDs1z8zp/Ly9rVQ7H7SAOgXga4kmIX0AM6qS8N0P8ewrgGJ3X+W6IayG2IR3kni5Zn1UQ\n",
"d0L8EOLb+SBwIMTQvKxnIM7O6/revA5j8nb5YV63n0Ic2/bPDPHPED/Nzxsgirk8V5EOUvtCfAHi\n",
"DojLIS6AWApxK8RhEH+Txz8K4pa8nW7J87sA4jyIYXnb7J6XMZd0EPkuxDqIY8rc746DOKHCfXUk\n",
"xF4dDG/MZbsVYlweNikPuxri/XnYqLwPzMlln0SqUOxSMq/d87bZuQ//B3eFmNxu2BCIbSGGtxve\n",
"kPfTr5EPohBTIVZCzOxk/g2U/K/mef8NFR4Eul8Pouppe3cDV1+wLfUBMZke1l66mPcgiK3bDdsa\n",
"Yo8K5rFjDrwP53/sxvx8aifjD2/biUk1mEmdjDcs/xXELIhfQ0zI/wRDS8Y7k1zL7qKMe0JshPh2\n",
"fn1cDtlBJeP8mHRw2SWH46EQ38rhfTrEcogLIa4r+Uf9Xp72CojHSYH/7Ry8p5O+zdwDcT/p2802\n",
"7cq1F+kbyo9I3yZuytOeDPF1iF3bjb8v6VvFgxAjcsh/E+KPEIty2W4oGX8GxCsQC0kHrudJB86/\n",
"Jn0bEsT0HEQPQPwLxDdIB60XyQfjTrZpI0QT6eD0q1yul/J6j4H4D1JI3wJxG8RP8vb9Vd5250Oc\n",
"kcu0GOJZiItJB9vPkb5FPg5xbV7eyblMl0E8BfEZiOtJlZChpG81Y/K478jLPpN08O+yxtvJ+o3P\n",
"+9wruUwNpArBhXk9l0O8Tvpm1LY//wjid6RvSS+TDqYPQlyZP9+2b02D8vgHQrwG0Zo/+6F5ewbE\n",
"sbX9Xyeqnbbf2tTNuiJxJPC/EXR26wcBDRFvtnG3f/84Ui+o90WwSGIq8HwEr+cT74cAl0XwRsk0\n",
"DaR2zleBWyJo7WDWla7Hh4GVETyau+eeReoZtYp0cvao2PzE/1eAqyN4RmLHXJ69gcNI5w9eI7U7\n",
"30g6gf4B0rUdx5PaoBfm13eTTnBOIJ24nAo8A/yWdAL3VtJ1FXeQTgJfQzoPMQa4KIL1Ep8lte3/\n",
"KvJtNHJvtB1JJx/nkdrOr+dP5xbuJp1v2gs4IoKFEqeQzuVcmtfjfaSTj5DOF3yGdNJzcF7fBtKJ\n",
"xPv4Uxv5aNJndiypff3XpLb4e0lXm/8KeBr4el63y0jt8UuBb+XtuS3phOZrpPbyHYEPRfBy3tZn\n",
"kT77L+d1+wZwUl7Hc4Bvkc6htfXGG53X5cy83Pfksm+VH/vk9yeROlXcBJwCbAucGsEGiUl5mw/K\n",
"j2cieLHfTpRKOhQ4n3QC4eKI+GG79x3q1m8ktorg1f4uRxuJrYGPRXB9fj2ms4NWB9MOJvXq+GPE\n",
"W0/+SwwhncicT+rltRcp9F4gBdtTETzXwXQ7ATtHcGtVK5Xm8RPg9xFcJXEYcAapl9qKDsYVqWdS\n",
"W1BfQgrQ09rWS+IdpB47++THzqSD4L2kLsz7kw6Mz5JCdj7p5OWH88Ho/aTgv510fcvGkuWPIN0W\n",
"5GXgruigJ1we7wukLsa3kTLuLODfIvhZfn846UB4GekgewfphOqHSMG+jnSgeoC0/T9C6vVyZV6f\n",
"l/Pwk0knUjfmR3ME1/ZLqEtqJJ0tP4jUk+Je4PiIeLRkHId6jUhqioiW/i5HvfD2rK3+2p75osNz\n",
"gO9FsKhk+BHAneUeNDuY7zBSL51/jqDbH/rJ18wcC1wcQZfXbeQDy/Wk3jendXRg6Ul29uQujfsB\n",
"T0TEklz7beZeAAADf0lEQVSIX5BusPVoVxNZ1ZpINTGrjSa8PWupiX7YnhEsIzWRtB9+fQ/nu450\n",
"sCh3/LmkZqdyxl1L6t7YK3pyl8bJsFmf8WfzMDMz6yc9CfXeO8NqZmZV6Unzy3Okq+ja7ECqrW8m\n",
"3f/FakFSob/LUE+8PWvL23Ng6MmJ0kGkE6UfJ12Kfg/tTpSamVnfqrqmHhEbJX2Z1PeyEbjEgW5m\n",
"1r969eIjMzPrW73yG6WSDpW0UNLjkv6+N5ZR7yQtkfSQpLmS7snDxkmaLWmRpJsljenvcg5Uki6V\n",
"tEzS/JJhnW4/Sf+Q99eFkg7pn1IPTJ1sy2ZJz+b9c66kT5a8523ZBUk7SLpd0iOSHpZ0eh5em/2z\n",
"lvcryLX+RtJtKqeQLvudB5R9bxI/3tyOi4Fx7YadA3w9P/974Oz+LudAfZCu4JsOzO9u+5Fupzwv\n",
"769T8v5b0xs0bcmPTrZlAfhqB+N6W3a/PScC+Y6sjCKdm9yjVvtnb9TU37woKSI2kO6RcFQvLOft\n",
"oP0VZUcCs/LzWaR7mFsHIuJO2PwnE+l8+x0F/DwiNkS6mO4J0n5sdLot4a37J3hbdisilkbEvPx8\n",
"DemCzcnUaP/sjVD3RUm1EcAtku6TdGoeNiEi/eA16X4RE/qnaFuszrbfdmzeHdf7bHlOk/SgpEtK\n",
"mgq8LSsgaQrpW9AcarR/9kao+8xrbewfEdNJv+f6JUkfKX0z0vcyb+sqlbH9vG27dgHp132mkW4a\n",
"dm4X43pbdkDSKNJNwc6IiM1uPNeT/bM3Qr2si5KsaxHxQv67Avgl6evWMkkTASRNgrf+nqt1qbPt\n",
"136f3T4Ps05ExPLISD/R1tYc4G1ZBkmDSYF+eURcmwfXZP/sjVC/D9hV0hRJQ4BPk+5hbGWSNELS\n",
"Vvn5SNJ9pNt+5Lrt5kUnAdd2PAfrRGfb7zrgOElDJO1E+r3Oe/qhfFuMHDptjiHtn+Bt2S1JIt1y\n",
"eEFEnF/yVk32z57cJqBD4YuSamEC8Mv02TMIuDIibpZ0H3CVpJNJP3j7l/1XxIFN0s9J99zeRtIz\n",
"pB8EPpsOtl9ELJB0FenHhzcCf5troEaH27IANEmaRmoGWAzMBG/LMu0PnAg8JKntzo7/QI32T198\n",
"ZGZWR3rl4iMzM+sfDnUzszriUDczqyMOdTOzOuJQNzOrIw51M7M64lA3M6sjDnUzszry/wFBsEB8\n",
"UlvRigAAAABJRU5ErkJggg==\n"
],
"text/plain": [
"<matplotlib.figure.Figure at 0x7fbb37f20990>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(np.vstack([train_loss, scratch_train_loss]).T)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how the fine-tuning procedure produces a more smooth loss function change, and ends up at a better loss. A closer look at small values, clipping to avoid showing too large loss during training:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7fbb347a8310>,\n",
" <matplotlib.lines.Line2D at 0x7fbb347a8590>]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": [
"iVBORw0KGgoAAAANSUhEUgAAAXgAAAEACAYAAAC57G0KAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n",
"AAALEgAACxIB0t1+/AAAIABJREFUeJzsnXeYHNWVt98jgXIY5ZyQMNlIJJMMwhhssI0Dxsbr8Dms\n",
"zTpne9e73qa9tnFYrzMYe53WOeyuFzA4YBAYTEYiCQQCCSRAaZQTEtL5/jj3TlXXVHdX9/SMZsR5\n",
"n2ee6a6uqq5Ov3vu7557rqgqjuM4zv5Hv319AY7jOE734ALvOI6zn+IC7ziOs5/iAu84jrOf4gLv\n",
"OI6zn+IC7ziOs59SSOBFpL+ILBSRK6s8/g0ReURE7hGRea29RMdxHKcZikbwHwQWA52S5kXkXGCO\n",
"qh4MvAu4rHWX5ziO4zRLXYEXkanAucB/ApKzy3nAjwFU9TagTUQmtPIiHcdxnMYpEsF/Ffg4sLfK\n",
"41OAFan7K4GpXbwux3Ecp4vUFHgReTmwRlUXkh+9d+yaue/1DxzHcfYxB9R5/GTgvOCzDwJGiMh/\n",
"qepbUvs8CUxL3Z8atlUgIi76juM4TaCqtQLsqkjRYmMicjrwMVV9RWb7ucD7VPVcETkR+Jqqnphz\n",
"vHIxy4EXaUmXNXyhZbkK+I6W9KpGj90fEZGLVfXifX0d+wP+XrYWfz9bi4hoswJfL4LPouEJLwJQ\n",
"1ctV9WoROVdElgLbgLfVOH4QsLOZCwWeAQY2eazjOM5zjsICr6o3ADeE25dnHntfwdN0ReB3AQOa\n",
"PNZxHOc5R0/PZPUIvnUs2NcXsB+xYF9fwH7Ggn19AY7R0wI/EBPqZvAIPoWqLtjX17C/4O9la/H3\n",
"s/fQ0wL/rJa0Wj59PTyCdxzHaYCeFvhm7RnwCN5xHKch+prAewTvOI5TkL4k8M/gEbzjOE5helrg\n",
"mx1gBY/gHcdxGsIjeMdxnP2UviTwPsjqOI7TAH1J4D1N0nEcpwH6mgfvEbzjOE5BPIJ3HMfZT+lL\n",
"Au8RvOM4TgP0NYH3CN5xHKcgfUngPU3ScRynAfraIKtH8I7jOAXxCN5xHGc/pS8JvEfwjuM4DdCX\n",
"BN4jeMdxnAboSwLvaZKO4zgN0JcGWX2ik+M4TgN4BO84jrOf0tcE3iN4x3GcgtQVeBEZJCK3icgi\n",
"EVksIpfk7DNfRDaJyMLw9y9VTueDrI7jOD3EAfV2UNWdInKGqm4XkQOAm0TkVFW9KbPrDap6Xp3T\n",
"+UQnx3GcHqKQRaOq28PNAUB/YH3OblLgVF2J4HcDB0hZetpWchzH6ZMUEksR6Scii4DVwPWqujiz\n",
"iwIni8g9InK1iBxe5VRNC7yWVPGBVsdxnMLUtWgAVHUvMFdERgJ/FJH5qrogtcvdwLRg45wD/A54\n",
"XqcTXcqb5GI5I9xbkDlHEaIP35WegOM4Tq9FROYD81tyLlVt9Mk/DexQ1X+vsc8y4FhVXZ/aplzM\n",
"XC3pPU1fbFnWAYdpSdc2ew7HcZy+hIioqhaxwDtRJItmrIi0hduDgbOAhZl9JoiIhNsnYA1Hnk/f\n",
"lUHWeLxbNI7jOAUoYtFMAn4sIv2wBuEnqvoXEbkIQFUvB14LvFtEngW2AxdWOVdXrRX34B3HcQpS\n",
"JE3yPuCYnO2Xp25/G/h2gedrhcB7qqTjOE4B+tJMVnCLxnEcpzB9TeA9gnccxylITwv8ri4e7xG8\n",
"4zhOQXpU4LWke7t4Co/gHcdxCtLXpv17BO84jlOQvibwHsE7juMUpK8JvEfwjuM4BelrAu8RvOM4\n",
"TkH6osB7BO84jlOAvibwvvC24zhOQfqawHsE7ziOU5C+JvA+yOo4jlOQvibwPsjqOI5TkL4m8B7B\n",
"O47jFKSvCbxH8I7jOAXpawLvEbzjOE5B+prAewTvOI5TkL4o8B7BO47jFKCvCbxPdHIcxylIXxN4\n",
"j+Adx3EK0tcE3iN4x3GcgvQ1gfcI3nEcpyB9TeB7bZqklGWelGXWvr4Ox3GcSE2BF5FBInKbiCwS\n",
"kcUickmV/b4hIo+IyD0iMq97LhXo3WmSHwReva8vwnEcJ1JT4FV1J3CGqs4Fng+cISKnpvcRkXOB\n",
"Oap6MPAu4LLuulh6cQQPTKb3Nj6O4zwHqWvRqOr2cHMA0B9Yn9nlPODHYd/bgDYRmdDKi0zRmyP4\n",
"SfTea3Mc5zlIXYEXkX4isghYDVyvqoszu0wBVqTurwSm5p+ry56/R/CO4zgFOaDeDqq6F5grIiOB\n",
"P4rIfFVdkNlNsofln21gWWTXnnBnQc556tErI3gpy0BgNL3w2hzH6VuIyHxgfivOVVfgI6q6SUR+\n",
"DxwHLEg99CQwLXV/atiWwzNfVGVrw1eZ0FvTJCeF/4P26VU4jtPnCYHvgnhfRErNnqteFs1YEWkL\n",
"twcDZwELM7tdAbwl7HMisFFVV1c5ZVfFubdOdJoc/vfGa3Mc5zlKPU98EnBd8OBvA65U1b+IyEUi\n",
"chGAql4NPCYiS4HLgffUOF9XBX4LMELKkrWE9jUxgneBdxyn11DTolHV+4BjcrZfnrn/voLPd2Dx\n",
"S8u5npJul7LsAYZhYt9bmAysxQXecZxeRE/PZG2Ff74WGNeC87SSScBy3IN3HKcX4QLfGiYDy/AI\n",
"3nGcXoQLfGuYhAu84zi9jJ4W+C558IF1wNgWnKcTIswXqUj5LIpH8I7j9Dr6dAQvZZkkZanwvaUs\n",
"35OyHN/oSUUYBPwSeHET1+QevOM4vY4+LfCYIN8lZTkWQMpyJPD3wGHVDhZhugjn5Dz0dmACDYp0\n",
"mMU6HJvc5RG84zi9hr5o0aQFfjbwI+APUpb5wMeAbcCYGse/EKt62YEIBwKfwHL9G43CJ2J1enbg\n",
"Au84Ti+icKmCFtGyCD5EzuOArwJ3Ar/GXs93qe3RD6azEL8Yi8D/SuMCPxl4mt47y9ZxnOcofVbg\n",
"genAk1rSZ4HrpSzvxCpbKnB0jePzBH4i8DCwk8YFfjTQjgu84zi9jL4s8DOxgU0AtKT/ByBleR21\n",
"LZrBdBbxNmAjJvCjGryekcAmTOB9kNVxnF5DX/bgZ2KpiVnaqW3RDKJzpD0K2EBzEXwU+J0553Uc\n",
"x9ln9MUsms3hPIeRiuBTrCNE8FKWfsGrT5Nn0YwiieCbEfiNwG6gv5Slry1k7jjOfkqfE3gtqWIi\n",
"fjz5Ap+O4F9P5zVi8wS+jeYj+DZgU7iuXrkgieM4z036okUDZtMcQ3WLZkwoKXwwMD7zeK0IfgfN\n",
"WzTQXAPhOI7TLfS5CD6wFhgCLBdhsAh3xge0pDuAZ4GhWKbNsMyxrY7g0wLvmTSO4/Qa+ozAizBF\n",
"hP7h7jrM834aGAEck1nQO9o007FZpmnqefCDG7w0F3jHcXolfUbggZ9hs1DBIvjHtaR7sEheqBTm\n",
"ONA6jc4Cn5dF01UPfmO47QLvOE6voS958KNIctTXkgywDgn/01ZMvQh+kAjpZf+6mkXjHrzjOL2O\n",
"vjTRaUT4A1iCeeyQRO7DsJowYAL/POz15XnwEh7bLcLAcHs77sE7jrMf0Zcsmg6B15L+Wkv6j2F7\n",
"XgS/DpiHlR8YkslNjw1CFOI2YKMqigu84zj7EX3Cogl2SjqCT1PNojkGs3F2kET7kC/wG8LthgRe\n",
"ytI/PH9cANwF3nGcXkNfieCjjZL106F6BH8E8AQmvunjooBHIY7+OzQewY8AtmpJ9zZ5vOM4TrfR\n",
"VwR+ROZ/mrQHH2nHGoQngK2kBX7ONSN5xTv30IIInkp7BjyCdxynF1FX4EVkmohcLyIPiMj9IvKB\n",
"nH3mi8gmEVkY/v6lyum6Q+CrWTSQRPDJY2MeHswhVwr9n0lH8C7wjuPsdxTJotkNfFhVF4nIMOAu\n",
"Efmzqj6Y2e8GVT2vzrmaTZNsVODXhf+dLZrB7QMYtrofx3x/JrznISrz2HcBB4rQX5U9Ba4rFhqL\n",
"uMA7jtNrqBvBq+oqVV0Ubm8FHsRWMcoiOduyNBvBD8cW8mjEooGMwIvQnyHt9prnXHNM2Kcjgk9l\n",
"0hQV6TYqI3j34B3H6TU05MGLyEws/fC2zEMKnCwi94jI1SJyeJVTdMWiWU31QdZnqRT4tVhe+9NU\n",
"RvCDGbpmD+uet43RS+dKWQbz5rNexwE7NqeObaTgmFs0juP0WgoLfLBnfgt8METyae4Gpqnq0cA3\n",
"gd/ln+X9zxORi8Pf/AaucwSwkuoWzVpS4q8l3QbMCsv5pQdZBzF09V6WnbmGYauPAt7F7GuP5vDf\n",
"psscNBKFu8A7jtNSwphm1MmLu3KuQgIvIgcC/w38VFU7ibeqblHV7eH2NcCBIjK685m++aSqXhz+\n",
"FjRwnbUEfjCwhsyMVS3pmnAzPcg6mKFr4eGXPc3AzbOAT7J+9kae9/t0SeFGCo65B+84TktR1QUp\n",
"nby4K+cqkkUjwPeBxar6tSr7TAj7ISInAKKq63N27YoH/yTVI/hOAp+i0qIZ0t6P9udtYufIp4A7\n",
"uPeNq5m4cFpq/0Yi+DwP3gXecZxeQZEI/hTgTcAZqTTIc0TkIhG5KOzzWuA+EVkEfA24sMq5uuLB\n",
"r8EyXLKZOMUFvt+uwQza0I/NU7Zw99//BvgIj58Gw5+ck9q/qxaND7I6jtMrqJsmqao3UachUNVv\n",
"A98u8HxdSZN8FFuPdTiQ7h0MAR4BplY5dgtWeAym3TKeZwft5dkhO7j2i0v0z198VD67fQADtk+S\n",
"sgzSku7EPXjHcfYT+tJM1s3hL2vT5HrwKbZ2PDbuwYk8M3I3aSF+dkgbe/s/DMwN+zcq8O7BO47T\n",
"K+lLAr8l/GUFvqpFI8LZ/OlLryJaNMOeGsfOthilD+woYiZ7b8MW8YauRfDuwTuO02voE9UkMYGu\n",
"FsHHNMm8CP4INs6cRBT4Ie3j2DlyJ0mkPQR4hn57bwdOCMd0ZZDVPXjHcXoNfSmCb8aiGceOMQcS\n",
"BX7QxrE8M3I7icDHnsF1wFmhbnyXPXgpyxgpy9sKnsNxHKdb6LUCL8JIEf4S7qYFPjubtSOCzyzD\n",
"BzCeHW0DOo4ZuGU0O0duJRH44cAWLelj4RwvoDGBj9cViec9GagouCZlmS1l+UXB8zqO43SZXivw\n",
"wATgRSIdC33U8uA3Y+UKsv73OHaOGkCM7gdsbWNn2xYyAh/2vQI4j4ICH6L9wcC21ObowU8HZkhZ\n",
"0pbUHOC4eud1HMdpFb3Zg4+R+iHU9uAHY3VnkmyZhPHsbBvUca4Dt7axY/RmEq88nhcaFHhslajt\n",
"qcU+SJ13OtAfSE+gGkOyaLjjOE6309MC30+E/gX3jUJ+GCamW8kIvAj9MEHdSb7Aj+OZEUOI67IO\n",
"2DqcHaM3kUTasWcAcAcwmuk3DqKYwKej/0jsGcwI92enHhsDtElZilTddBzH6TI9LfC7KB7Fxwj+\n",
"OGB7qM+e9eAHATtV2Uu1CF77DyWuyzpg23C2TthAjkUTIvE/ccRvplBc4LNF1+J5pwNLgINSj43B\n",
"ovqhOI7j9AC9WeBHYIJ5HEmknPXgh2DiDZml+UQYhNk3Q9Eg/gO2DmXLlHbyPXiAexm3eDTNR/Bp\n",
"D34BlRH82PDfbRrHcXqEnhb43YSBVhEGi1SdfQom5Aux+vPRJ8968NF/h84R/DgsffJZEBP/AVsH\n",
"s372OioFPp0Fs5i25eMoVk2ymkUzHBgP/JXOETxY7rzjOE63sy8i+JhJ83HgX2vsOxy4C4umqwn8\n",
"EGoL/FpgG3v7bwMuZPvY7WyZspHOefCRxQxbPYFiEfww8gV+ArbQyBI6e/DgAu84Tg+xLwX+aCqz\n",
"TLKMAFYBK6gt8NGiqVxc26LoNcA29h64A/gX/vzFRWj/9EzWbBT+BP2fGcSwp/NWjspSLYK388Bj\n",
"wOzUoOrY8Hpc4B3H6RH2pQd/BDCpxr7RPnmISg8+Lb71LBqL4J8duBP4H+5702asQYipkBUirSVV\n",
"dratZPKd4wq8lrxB1p3h/xNa0vXAHpLIfQxWEdM9eMdxeoR94sGHAdA55C/eHYn2yRKas2iSCP7W\n",
"D18GvBtrEHZQPYKHHaMfZ8J9Y6hPXgS/K/x/Ivx/lMSmGQMsxSN4x3F6iJ4W+I2YR30IZlcUieD/\n",
"Ctwftm3G6r9EinnwN/zrai3pBjoLfLbUAGybsIwxS4qIcCeBD6mWu0kE/jHgICnLIMyaWoELvOM4\n",
"PURPC/x1wFmYPXMLNvGpWibNCGCLKr9W5cth2yZgeJjgBIlgQy2BT3LP60fwG6c/wuileUsDZsmL\n",
"4AnnjgL/CHAwFr23Axtwi8ZxnB6ipwX+GuBcTOAfwLJNqkXx2RRGwmSnrSQ2TTaCT/vziUVTKfC1\n",
"Bllh5UlLGPXYkAKvZRidPXjC+aPAPwQcSiLwG/EI3nGcHqKnBf42bBLQWSQCX82Hz6YwRtIimRb4\n",
"tZioR2IEv5VE4AdRL4Jf9NZlHLijv5RlVp3XUi2CLwMPh9su8I7j7DN6VOBVeRb4M7Z60v3UjuA7\n",
"++PGRhKbI50muRyYmdqvWgQfs2jyPfjdQ3ew6P9tBj5a5+XkCryW9Fta0pgu+RA23jCOxKLpUYEX\n",
"YYgIv+/J53Qcp3fQ0xE8mE2zG8soqWfR5EXIaZFMp0k+TlLkC+p78J3SJAM7uPkTO4G/k7JMqPE6\n",
"ql1fB1rSLeF655FE8D3twY8BXpJTK99xnP2cfSHwVwJfVWU3VQQ+iFHeTFGobtGsBw4QoS0M3B6A\n",
"Ree1BllR7ZicFNnJlikDgF8CH6jxOuoKfOAh4BRgHfvGohmKFTkrMq7gOM5+RI8LvCrtqnwy3H2K\n",
"/Ah+KLAjDKpmyRV4VRSzaWZgJYaXhG3bgKEiHIgJ3a5QffJZ8gU6ToL6NXBajZeSN9Epjwex9V73\n",
"iUVDIuxFMoMcx9mPqCvwIjJNRK4XkQdE5H4RyY1qReQbIvKIiNwjIvMKPn+1QdZqA6xQaXOk0yQh\n",
"sWkOBxaHbTGCHwesCaIPFsVXE/jB7DlwJTAl/UBYa/XccLdaDyPLQ1hvoR3rUQwLq0H1FLH3MrLm\n",
"Xo7j7HcUEZrdwIdV9QjgROC9InJYegcROReYo6oHA+8CLiv4/NU8+E4pkinSUXDaooFkoDWmYUIi\n",
"8JOwyVWRZ/KeIwwEK7d8aB0wObNAx3zg31LXWETgHwz/28NEqK30rNi6wDvOc5S6Aq+qq1R1Ubi9\n",
"FROsbNR9HvDjsM9tQJtIzQHKSDWBrxfBVxP4x6ku8BPpLPDVnuNPXPulz6FsJ6klAxbRz6yyHms1\n",
"Hgr/23OuvyfoswIvZZkoZSnt6+twnL5KQ1aBiMzEMkJuyzw0BZuGH1kJTC1wyvXAEBGrvy7CiSKc\n",
"S/UUSagUyKF0juCjRVNP4HdSXeDfAJzO9rE7qLRpJgOjsUYpux5rNZ4Or2VduN/TPnz04BsSeCnL\n",
"BQXmAnQ3JwB/v6+eXMrSX8qSXcjdcfoMhQVeRIYBvwU+GCL5Trtk7munHUQuTv3ND354uibNa7Ef\n",
"dC37I+3BZ22X5cCRWL2bx8K2tMA/ndq3agSvyibgvaw7ZCidBZ7wHEUGWK1CJbyIZPJTx/WLcIAI\n",
"J6X3l7IMkrKcX+TcBbEIfuziSVKW2XX2TfMB4PQWXkczzAImSVmKruPbNFKWN0hZPp/Z/P+Ab3T3\n",
"cztOGhGZn9bKrpzrgIJPeCDw38BPVfV3Obs8SWVt96lhWwWqenHOsSuwqPsx4HlY9F0rgk9HwFOp\n",
"7Dk8Hs6xKJWBE2eyTsRqw0RyPfgUS9hw0CBm3JwV+J3AURTz3wHQkt6VupvugRwNXCXCuJDZA9ZD\n",
"+qGU5X8L9hDqYQJ/3rsuBM4EXlnwuBkkywzuK2ZhmU/jgaelLKcBf9OSPtsNz3UwNgEvu62RRtFx\n",
"uoyqLsCW/ARApHmbskgWjQDfBxar6teq7HYF8Jaw/4nARlVdXfAaHsBEHUycZ2OReU2LJuS6D8QE\n",
"P7IWy6pZnNq2Dct4yRtkrSXSq9k8rR87R6SX3ZsM3I5F8IUFPu/6w+0RwGjec8TXpdyRnTQN68HM\n",
"aeSkIpxVpXDbUAZsgcl3Hos1HvXPVZYDsJ5Lkbr43Um0iGIj+xvg2G56rrF0fs+nUcxqdJxeSRGL\n",
"5hTgTcAZIrIw/J0jIheJyEUAqno18JiILAUuB97TwDU8ABwpwgHYAOkDwMnUH2SdCqxMpT2mc+Ef\n",
"SO3fzCCrnWvH6DU8M+KQ1ObJwN9oMILPkK4oOQIUhj99IYn4xp7QMQ2e9wt0jkABhnD0j7ew+qgV\n",
"wAgpSxHRnoJ9NwoLvJTl/G5I/5yF9QSnSlniWrcHt/g5ImOB6RnPfVp4bp8F3CBSlqG13jcpy1FS\n",
"lrf24CX1KVr1nSuSRXOTqvZT1bmqOi/8XaOql6vq5an93qeqc1T1aFW9u4FruB/LepmJeeS3Y41K\n",
"rQh+FJ3tmci9wB2p+00JPAA7Rq1A+80E+8JiPYZFWI+jkAefwzISkRrOhHt3MGjjWJI6OtOwAdlG\n",
"I9U28hcLH8rxlx3Anf/wGHA3xRqOWPKhkMCHkg6/xT7HlhC+4LOw9QCmkCxg3l0CPwb7PaQHlqdj\n",
"351uy0CSsgyXshzXXeffh/wBqDWW9BrgrT1zKX2SN0pZvtXVk+yLUgVZHsCE4RDMI78X+7FVE98t\n",
"2EzTWVi2TgWqXKjKn1Obqg2y1sqiMXaMeYT+uyaGe5OxmbfLwvM3G8EvAuaG28M5/rLHefjc3WiF\n",
"wF9B4xF8vsAfd9mhDNyyl0Vv3UpjAr+c4hH8yeF/Kwdlx2CzjR/ABH42sJcmBV7KcmCdXcZiFt+c\n",
"sH8/7DN/jO61ac4HftcTA8k9ReglngK8vMZux1PZmDqVzCdJsW6a3iDwa7Af7mlYpsm9YXtuBB9s\n",
"mE2YD54XwWf33x3OL1RG3TvCeaqzadr9DNgW7ZTJmF2wPNxvVuDvBY4MP+jhPO/Ksdzx3idBpgTv\n",
"eyrwf8AxRbppUpZ+csSv+wFtjHx8lJTlnanHhjK/dDJ/+dyNaP8RmMAX6RnMAO6k+CDrKdiXsXUC\n",
"f8uHXoayjCTldjaWnttsBH9LNlKWsrxIyhLHLcYCt5L48BOw3uKjdK/An4A1YC+qtoOUpaHxmCrn\n",
"OEjKcp2U5dNdPVcBzsG+ay/Ns+3C9/o4YEqBhve5yhnA9V09yT4X+CDYDwCvxgT+vvBQLQHdiPng\n",
"nSL4KmwDVqX9euDjmJBWZ+3h93DAzoHBl40RfBzIbUrgtaSbgNXAHGZeP4MBWwfy6FlXs2vYNkxI\n",
"pmE/jq0Ui3Dezflv/Av9dvfjNW+6EPiulOX54bESTx+7gXvffDdmM9yFNRwTpSxfqnHO6WHfohH8\n",
"KcAlwOkt86ufPvar7Bi9DmtUYwT/B+DgRp9DyjIaa9iOzDz0DZJ6Q2OxBiSK6XQsgCg6p6NZXgD8\n",
"nCp2hZRlMHC/lGVG3uP1kLKIlOVd2Gu7H3hHD4wpvBy4FPudzs15fCoWcK3A3uemCPZWoUzAvoSU\n",
"ZTqWaLG43r712OcCH7gfi8weVmUtyeSgajQl8OkNqjyuWmcm6p5BS9k2fi8m7pOBp0Je+3Kaj+Ah\n",
"2jTP/+nRrJp7L3rAY2yduA3LIhqLvf67gJdIWU6p030/Htk7jwtfDZMWHg98C7goTFJ6B1ddvgxr\n",
"mEZiFtg44FrgY1KWiVXOOQNraAfnTfSRspyYuj0YeD6WRrsNK/TWdUasGM7Wie0kAn8QNrayl8bT\n",
"N08J/zui/yByM4Fp4TX0B+4hEfhp2MpcLRd4Kcs4KcuA8LyHAZ8EXiZlyfP6T8XGfpq1M34EXIR1\n",
"+T+IBScvaPJcdQkR+VnA1Vhp8HNydjsO6yEuIxlbaYb/g47lPPcn5gMLgtZ0id4i8DHrJU4G+hxJ\n",
"JJ/HBuxHXteiCWyj0n8vyuNsmtaPbWNnkkTwYALf7CArRIGfueAQHjn3VqCdjTOewbzs1VrSPVj3\n",
"rMze/n9g15Az0wenbAWAw3j45Z9i+JNw88euBL6IzcT9D+AbbJo+IFz3yJBXfztwQzh/XnQFiQff\n",
"TkZMpSxTMbsjTk47DlisJd0WztuUTSNlaZOynAcgwmDaHj+A9bO3UGnRPEpY5zYbhUpZBkpZpgbR\n",
"zHIaVmIjbe+MxcZmpmJ+/7pw7rTAd1cE/zOsptFc4EEt6UpsIZyP5OwbP/uGI/gQGFwAnKElfSAI\n",
"xi+x70dLCdH097AGZamWdBXVBf54rLFeRpMNV7DbDgH+n5Sl2qpwfZUzSOXBd4XeIvD3Y0XNHgdQ\n",
"5duq1Mqj3xj+Nx3BF0GVZ9k+djsbDjqWSoG/jq51nxYB5zFk3XBuf+/dQDsbZu8BXkhotLSkX9eS\n",
"jueO9+xl/ZzXZI6/WcrywiByh/G3jz/B5QvhhouXBbG4EWss/gMTsRjBg3Wf3wcsJCcvPpxzOvZZ\n",
"rKWzTXN2+H90+H8KcHO43bTAY6m4Pw6iNJq2ZbBq7g4t6WZsVnS8priQ+delLHdJWc6SslyGjac8\n",
"EF5zlhcCP6BS4GeG/1MxsV+HCc60EIVGi2YFKYGXsoyXshxNQaQsr5KypHsOw7DP5h3YD/n28NAH\n",
"gbdJWV6fOcWZ2Oc5Mxz/DilLUYGejQUM6d7wL4DXdcOg7kex92kR8I9h243AYSEoSJOO4GcBSFk6\n",
"9cqkLIfVsJM+DnwF+CHwT12++hyqjQ9IWQ6WsmTtvqbOL2W5OOezmE8L/HfoPQJ/N/CvVeq/57ER\n",
"62quL7h/UwIPwMMvX8q4xR/CIrsnAbSk/64lvbqp8xn3AIez7EVr2T1sI9DOukP6Y9U6O3olIoxg\n",
"2RkjGLKuI/NFyjIKs0ROw6yL7TxxavyCxOj1U8AbwopSQ7D3qZ8IA7WkO0Ikl87mSTMO2KEl3YoJ\n",
"fPaHdzb2XsZjTwVuCrcX0LwP/xpsQtox9Ns1ijEPw8qT4/dhJWaPPYMJ/DnA3wHfw+Zd7MEmss0D\n",
"zk/7siG99Sgsap6TurZZmA0YBb5dS7oL+4xnUt2ieTPWWBTlEuBjqfsvBm7B0j//kSDwWtKngFcA\n",
"34oNQhg7OAT4FUkEfz5wacH5DEdgwVMHWtKHsc+1S6mZUpYxqdvjgfcD79WSfllL+pfwXDvDtb81\n",
"jAUskLIsxoKCDoEPabYr0j1TKcsIrMG+POuzS1lmYg3f94AvYSmF1ezGZl/f84Hl4feW3t4fWyvi\n",
"2no9BylX2NA3AAAgAElEQVTL66Qs36vxe5gDlLDxx3jMBCwY63IGDfQSgVdlmypfaOCQjWQmOdVh\n",
"K80K/F0X/Y31c57GfhBPZR8WYZ4IVzW4JN4KYD33v34z5uWvY90hAzExTttOh7PyJBiy7vBUNsIL\n",
"gF1YFHgY1pNIL2GIlnSxlvS6sG0o1sBtpnLRj2oCHyNlyETw4cv9YuDrwNxw/1QsUgOzdXZjYwmF\n",
"CdHbsdiM6Rcz/zMnov1g6dnx830Ss2fABP71wNe1pN/Rkh6kJX2flnSDlvQx7P1LL9RyIrBQS/o0\n",
"lhobq5zOwnoe00gieLBg4zyqWzSHYwPV6Qlw1V7XVKwRPl/KMiBsPhfzp7+GDaTFCB4t6T2YYMUZ\n",
"4/PDNT5C0uM4HGscLsk810Apy6VSlqNSm4+kctJf5GbsfUkf32kWtJTljKzAhe2zgaekLPFz/jTw\n",
"8/D+Z/k+1ls5H/uevhV4T7BwHsM+h1dgqcfp780xWCA0AxuwTXMScL2WdIuWdDXwP8Dbcp67K5yK\n",
"fffLme3vwn5L3wJ+I2WptVLafGyG/yerPD4Hs5s/lWoEjgDub4X/Dr1E4JtgA8XtGbBlAm9t8rke\n",
"5RdX3An8b5XnnA+8jNo5vxWED+9sFr82rirVzvqD4w8sLfBHsnUiPDtoJxbJgQn7z7Av+RGYt9yG\n",
"NWJ5X7Yo8JuonLDzEDZLc5iU5Z9SXc6DqCLwmAivBq7CGoe5WGS9JvW6Otk0BSL68zAP+krgLA7+\n",
"/Su55y1Av/jcaYG/DxtPqVY24zfABcGP/yU2+Pvr8NhSEptmJiaUaYsGrPfzT9j7vQILJg4Um0kL\n",
"9p7fRTEf+8WYD70YGzAXTOB/jzWKbyFZLyDydayncTEmLtcQqqQGER6PFUE7V8ryAuh4f7+NBSHX\n",
"SlliAbsjyUTwgVvJCDxwp5TlTfFOCCh+A/xfzkD7C7Be079LWV6KRaCfqfIe3I29h98HPqElvV1L\n",
"+uPwWLRoXol9fw9NHXcs1jO8AHitlCVdE2o6SboyWC/undLamdQnYNH166UsH5KyfE7Kcin2Ot8P\n",
"fB77DT0oZemo7yRlOTTV4zgMG+B+v1gdpSxzsN/yASRjFenFirpMXxX4dSQiVBdVLlXN/aIX4VE2\n",
"T5umJX2NlnRHzuNzMcH7jEjx91NLehd6QKyauZFN04dhBccqBR620X7Ik9BRdfIkLGLZiP2wFmMz\n",
"e58mM9Ep9CqGkCPwoWDXYqx65+eBd4eHXo6NMYC9z2mBPxv4I7AE+5G9jM6DQR0CH7rlXwDuk7LU\n",
"KpH86vCabgSOZ+xDp3Pf320ksYduI/j8WtL7gIOC/ZTHb7GqpDdjP8BDtKSxImT078GE5V4sK2d2\n",
"eK3RwvgqZhc9HRqtlVjOtmA/wE8DbwiNyAeyFkKKs7CMpZ9jYv56rBfxiJZUtaQ/yRaUCzbRe4GX\n",
"AJ/FhDuOAxwBLNGSbsCiwm+HXtQnMdE9E4uWfxuuqZNFk3o/OzJppCzTsEj5c6lB6mPCe7Ia+Hna\n",
"ksEaki+G9+KXwBu1pGvz3oDw/n0TuFFL+qfMw6ux93k+ZntlBf6uMH7wc+AfUo9Nx+yzyJ3Y7+HF\n",
"edfQJCdgjes7sdf7DPZevk5Leq+WdK+W9B1YY/uT1HfgKqwRBxP4a4EPY2NG/UNDEXtfc7DEkq9i\n",
"nxu4wAPwEyp9ze7kUUJFQRFOEOH7mcfnYa36XhqI4gMjgC2q7GHPwI1ov6eoFPgjgFtYeWI7cFL4\n",
"MZ+ARWB/I8kOacPso2wEPwhbg3YPnSN4MJvmy5iQvDb41a/ARBI6R/CnAddpSXdjX8KLMEFPswDz\n",
"4fsD38UGEm8FfpY3sBe2zQf+EER7IdsmrGX9wQ8RBF5LeqmW9EfxmODt5qIlXYo1FJ/Wkl4cuvCR\n",
"dJbMTCyCXIF9hu2p/b6M/ZDjGMAKrGdjYx6Wj38A9oO/CPhKGDB7v5TloPC6BBOcP2Pv5yuwgcgP\n",
"1et+a0mv1ZKepCX9lZZ0T3i96zEBjz/+n2LjUH/FBqhfFiyLq7Dg57zwWvO83IeBUcE7B2uQf4/1\n",
"TN4ftp0dXudbMCFeImW5IDx2PBZdvwP4qJY0+x3Ivp4fhOvJbo8px7dh35FOAh9ufwuL0GNPokLg\n",
"w3kux8S4y4ilq04HHtCSXqElfZOW9DPhe1gx+KklXYAFAM8P4wCzgROCtTUE633+BgvkfovV6XpJ\n",
"OHwO1qv8C/DCVADx3BZ4VbarVvwgu5PHgJkhOj8dEyMARBiERYT3YStavazBc6fr3rdz5z+8l+RL\n",
"DRbB38jD527FunDnY1kR6zCBh8SDf4rOpQrSC6LkCfxtWDZNCROxL5L41ZAaZA1fvmOxaAmscZhM\n",
"Z4F/FJvE8mcsSj4T6x0MBr6TI/IHA2u4WDeLcCjwNW765N8wMa5M0SzYQ9KSnq8l/a+ch5ZiKZb9\n",
"sIj1cezHOZfEokFLuktL+j+p427AovEjsJRQxayD52MDhi8J1/tJLLsDQkE6Leny8HmN1ZIeHwS4\n",
"GZZjkeHicI2Kva8PA6doSdMR7fexVOPH8xrD0Gu4gySKPx1rmP8R+LhYCuxLgD+FQfn3YL2Pz4RI\n",
"dR4WXd+gJc0GPLnUaNSWAr/DGqJDoUNgpxDsKy3pQ1hjGhuJGVRG8GDiebaUZVDcEBrdi6QsV0hZ\n",
"XlXt2kJPc3Rq07HAohDIFOFmzDqNRRJfgEXvD4WemgIfwmyxM4FDw3XOwVJKn8Aa61guPW/cpCn6\n",
"pMD3JGEy1EbsCzcXmCFCTJ86HFiqyk4skjq16HlFGIBNrok/wHau/vb62GUXYQwWAdzHo2eD/WB/\n",
"gGVggH2p1mPRVbUIPvrvkC/wPwBOD8/5K8wa+FXq8XQEPx14JgyOgQn8ktR9oOOH/AcsIn6ZlnRr\n",
"+KG8Eouar5SyfF3K8uFwyDyskTkZ+JWW9H+4892bMOHKZvDcIdL8zEdMhJ+HDbRuDrn7K7H3ZV2N\n",
"467AxKVjKUgt6d1a0ie0pBuxxvcj2A/49WHg7b1Y5EbYvyvzJsAao5NIRXda0vu1pG9Vmx2d5leY\n",
"pVNLKG4lEfj52MSah7HG4VLMokk33tcBA7DewlPhdbeCd2I9vYexxrc/9p24Vyvr/i8gyfzJWjRo\n",
"SduxQCs9/vMWrBFcTu3o/mTgrtRY0fGkBr8L8DesoT8F+00dj2lDx/iKlvRuYEr4vyTsM4XEav4r\n",
"ZlUeSLMJITm4wBcj2jQx6ySmrEVxAvtyTQ3CXIThmD0TI5t2Ktd/jWLSDv1GaUkvw6KCT0FHxsWx\n",
"QVBzPXgqBT6bRUPwEeO4wq+xga505Jr24NNdZrBB54+Tzzu1pBeEtMb4XFswC+tG1s9+ht2DLgl5\n",
"xsdg7+EYkmnro7Av/iAR0gN8E+hajfoHsPfknzF7BpKB81oCfy/2wzufnO6zlnSZlvR/1OYg3BrO\n",
"/xpaO8tyOfZ7rdt9D43Jz0i+m3nchllp07H3O57337DP+vbQAMZzKvBfWPbOHbQILenq0GPahtWl\n",
"mkHn7xrY7+uoEN0fQH6K9FVU2qRvAS7GxkxeGCzIPE7Cgo+YxXMCjQl8jOBPwXojG7CApmIAPTXe\n",
"cif2XXoyjLmAWV7vJOkhtgQX+GI8itkls7DWOq7yMxeLZG1SlP24T8k7QQ7ZZQnXkS/wG7B1YNGS\n",
"rtCSdqyUpSVdHm5WWDQivEqET5AMsEJ+BN+BlvRxYGJmsCwdwR+LZUTE/Z/Ukl5Z5Vy5K1FpSZ/R\n",
"kn6Bbyy9kY2z9mIRZGwkR2MLuQzHBGd9znsyhGQR8YYJjdmrsZS65SKUWHtofH+qWn7hB3cl9iOu\n",
"133+T6wR/rKWtOg8jSI8jqXHPlpvx8AHoGbq8Q3YBLI7gRviZxaE9g1k0jADP8Gqst6Z81greAgL\n",
"Yl5O53Wf78Nsr+nAE1VE8Crg5cFymYVF0VeHHs4dwItD1tibxQrNxeDhBOzzf0mwTk4jsUCL8Aj2\n",
"3ZyLNQy3Yb26bIZU5E7gdZg9FfkrNs7TMv8dXOCL8ijWIi/BPrROAh9oxKYZTmW9nWxZgIOxbut6\n",
"kjVc+6fsoTRZi+Zw7Etaz4OvIB2xpa5pSJjQcQydo6pmaePRs3ZiPu88rOGIQj4NE/sNmMCn35Mh\n",
"kLtqVWHUsnBeh2VmnMfy+XFMoFYEDzvargFg19B6P8CrsCJm3+zKdeawHLPECi1XGKLiqh5yiPJP\n",
"xyyML2ceu1lLem3OMY9hPb3rso+1iIewAf9hVFqFYK+/DZtBnfXfIw9gmnYENiHtl6kIOdpsP8Je\n",
"8zcx2xMs0PgqNrB8ATa+UO05OhEam79httJ2TOD7U1vgJ1Ep8A9iv3UX+H3Ao1g2yKJwe3YQ2qOp\n",
"7AbfhE2LL0I2gs9aNDMxG6EjgsfSrfKqQGYHWUdi0UDaokmvJFWIICbfAf6FTATfRdp45GWKTXrZ\n",
"FXz8+BqnURnBj4OOAdaBdFHgAbSkv9eSXgEMYdXcrdg4yPaaB/38qnu56RPw+a15tW7S596lJf2g\n",
"5qfUdoW/0OIaMmEA8L+1pIXniGhJXx/swe7gQWyg9S3Zxin0MB7ABppzxTcI7fexaP2fMEspciWW\n",
"0jgDK818AXBhGFAejn3PT8NKRmQnVhXhD1haJVgUvwtL0MjjvvB4h8CH1/dDkkmDLcEFvhiPYi1y\n",
"h8Bjrf5S1Yo1YW8HjhIpVK2vnsDPApapsgNQEQZjX/6z0icRoT8memtIIviRWAMxjETgn6K5olmX\n",
"ABdi3flOC6k3SRvLTx+ICXlsIEdjqaZR4LMRfBTWLgt8iqEsOW8T8JW6vueKUwZy7RehVdUyGyQ0\n",
"HC3Lruil/BY4V0taLfK9D3gpNebAaEk/g/2O5mpJ70htfwyL0l+rJd2pJV2MBREfBu4Ig7SLsYDi\n",
"941euJb0Mi1pXBz7duAd1XpboVexEHME0ts/piVtVS8ZsMEKpz7R91yEWR2zsXSnv6R3UmW7CO8H\n",
"fiXCNaodk4fyyPPg03bETJLZeusxAZwFHCHCeFXWhMdGYlbPNmBwmNw0AhPEWSQCv4JkvdfCaEnX\n",
"SVn+AzihhYM/bewZOBTlKqRjgZfR2Be+msDHxqtpDz6HoWyddKCW9F8K7Btnsh4GZCfsOC0gpJPW\n",
"KrJ1HzYxr6Z9EmySJTnbs0kBv8IGYKNF9X1ANZn/0BSh9/HTOrudR41xn1bhAl+MdZivtgiLMg/C\n",
"JrF8LrujKj8U4XZSKXJVqBrBi9CG9RjiIF20V2ZivvxpJJOR2oANqjwrwh4slS167UeSlDVeAUwT\n",
"QRqo4RP5HPllEJrFZrWuOKXE9JtjSthorPbIYcAzquwSyRX41kbwxV9XWuCdfUMsIV7YH6/DrzDP\n",
"/zYALen3WnTeumgo79HduEVTAFVUlRNV2aTKFiwqPh4bVM2jncRTrkYti2YmsDwlxOuxruNUzFec\n",
"nzqujaR88g4sch+JWTJHUjnICplUySIEr7b24iiNYQL/g5tWq1VRhETgj4YO2ytdzbKlFk2wtmKB\n",
"tyIMxz73w1vx/NUQ4TIRWloZcT+ipQKvNuv5SyTVUPc7XOCb41Hg9horQq0HRtepMFkrTXIWlcWU\n",
"NmBivQ6rBTM/9dgoEoHfTiLwi7Bocxt0LI3YlE0jgoSJWa0i1qUZnto2Brvmg0kEvjsj+Gj1NCLw\n",
"d9H9EfzbyB9If84TLJx/pvhCP0XO+ckWTtrqdbjAN8cjWBGhXFTZhRUnqiVGI6gU+HSjMJNkIk58\n",
"7JiwbRE2oSrWEclG8EMwgV+IRajpRqgpgcemqf+47l7FacOsrrTAxwheSKyp7hT4IZn/9RiOfe4H\n",
"ijS8ZGAhQq9iAHCGSOFsrOcUWtLPF00VdQoIvIj8QERWi0juEnoiMl9ENonIwvBXZMCqr/Nx8lcO\n",
"SrMeas5qrYjgQ6OwAxPnvAj+WCyr5llsZuUR4bE2kog3G8FDawT+NJpYMq4GMa1zOHTU9IlTtDeR\n",
"vJ703IDcQdYu9CyGZv7XI35eD9J9Ufwg7DvwaZJVkRynaYpE8D/EUpNqcYOqzgt/n23BdfVqVFmt\n",
"WndN1mzaIwAinCbCyXSe6AQWsU4gP4I/nET0nyZZuGIMiSDuwCLcYVg0DJX53c0K/Imp52sFbeFa\n",
"YgQ/ClifspHi69mS2mcwVoM8G8HfK1KxFF9RmrFoWiLwInxAJHfG8xDs83qQrpVkcByggMCr6l+h\n",
"Itc7j2aWaNvfiamNWd6EpWVlPXiw9Lu3kR/B9ycR/dXQMRA3iWRB8e2YMGwPx++lMoJfSYMCL8JQ\n",
"TNBaMvAXLKiRVAr8aJKUMVvtykgL/BBs0DVZ1s0Kjx3S5LU1KvDDSAS+qwOtp2NjKlmiwG+htdlC\n",
"znOUVnjwCpwsIveIyNUi0q1ZBn2I3AgeE9iTsYlSWYH/HLYk2GwqBT4KXty2ikqBj5koO8L2Tars\n",
"xrINumrRHIfZPSLSEtEZio1PbCAR7zEkr/EJCgo8SeXAWouJ1LqOeN4ixAb5UZLl85plRJXnjQK/\n",
"lcrxCcdpilbkwd8NTFPV7SJyDlZNLXdNThG5OHV3gaouaMHz91aqpUpOw1bBuZCMwKuyQoSfAW9S\n",
"JT2yH3tQMYJfRbIy0WQqI/hJJCmR11OZUtaMwJ+IlSiegDUeS2vvniDCcaqdClPFQeG0eI8mEfVL\n",
"sAYA7PUMFOEAzKJZg5VYjZyO9VKaFfjtNC7wT2auoRmGV3lej+AdRGQ+lZlyTdNlgVfV1EChXiMi\n",
"l4rIaNXOlfRU9eKuPl8fomOQVYS3Ar9Q5RlMYF+DVTXMevBgEy+yNSzWY/5zTA+rG8EDqPL2zHlW\n",
"YBk4Eh4vMuHpJKww10mYyBcS+CDKt4gwVZX0qko1BV41mYauioqwFRO7GMGnF7uej+UwNyPw8Xzd\n",
"KvBh0trWMDgeKRTBNzkpzenjhMB3QbwvIqWqO9ehyxaNiEwQsUL5InICIHni/hykHRgTxPTbwFwR\n",
"RmBe+iNYUbLO06ltAPermc1PAQ+mRGI1MDGcOx3BR4HPazgIA8PPYEWrltTJ049++YlYGeR0o1KE\n",
"GVgAkU0prBfBZ4n7VVg0IkwJ57qZggIfqnHGBZKH0pzArwLGhgasCD+gcj3ReK5qAr8j2GvPQkUt\n",
"fMdpmCJpkr/ASmEeIiIrROTtInKRiFwUdnktcJ+ILMJWur+w+y63TxEHWcdhP9xDseh9RZgZe4fa\n",
"Itt1UbU1H1ObotgOB/aG2bXQ2aLJYwW2fuUU6Milr8Y8LJpcgTUqjWTSxJLK2XGIRgU+G8EPCw3P\n",
"6dhM4vXUKYOcYiYmuGACny7QVo/hJJH4Ooo3dvOw3lqaahH8YJKsJ/fhnS5TNwpR1ZolSlX121iE\n",
"6lQSB1lnhfuHYiK5suoRNch01ddgkfE0EnsGkgg+d85C4Nck67AeFK6pGhcAvwlWySoaE/i4uHVR\n",
"ga82OzHuNzjssxebDDQXKwu7kcrFmmsxHBgVSg/HCH5eA8fGhjTaNDU/y7B4yQRsAttoVdaHxim+\n",
"nizRooGkYVubs5/jFMJnsnYf0YM/CBPeQwgRfFdPHLrwG7EVbp5OPbSdlAdf5djPqvJ7zOc/qNp+\n",
"QYguICmalk7NLEIjAp/OosmStmi2Y1lBQ8O1PIW91qIe/HDoqLY5FIvEG53oBMV9+COxErTXkyzI\n",
"PjRcQy0PHirfH8dpChf47iNm0czCfuAdFk2Lzr8aiz6zEfwgals0kWUkvYs8jsa+H7Fee0cEL9Kx\n",
"+HEHIpyQ8fTnYAOyRT34aqVT0wK/gySynYg1bhspLvAxM2U0zXvwUFzgj8J6U78DXhW2xWJv9QQ+\n",
"vk7HaRoX+O4jWjQHYROYDsIEtVUCvwqrT5MW+GzlyFrUjOBJ2TPhfhzYnQjcIZIsHiLSsYZmevxl\n",
"dtjWVQ8+bdHEDJNh2FjDKhoT+PTzDQnHUmUZxA6CpTOEpPTyUzQm8FcBZ4WB2XRefxaP4J2W4gLf\n",
"fWzEBv9mY930J7GBwVYK/DwqLZq4TFwrIvgzgaszzzcBW7oQbKJWTAP8DlY75SsitAVBPAhb2abV\n",
"Fk06go8CPzJcy7w6q2nFiHgUyXKGRXLhhwLbU4PiDUXwqqzDspdG4RG804O4wHcTIdtiCzYYuAxb\n",
"ULiVFs0qTBhbFsGLWI55KOB1FJWLbEcP/kxM4E4M28vAlap8EVvY+HOY+G3AJlnFuQBfEOF9ZAQ+\n",
"FBobT/XlALMWzbZwjjbMQ09H8G+kdpGudATfiMBny0rUFfhgV8UIHpLCacPD7appkuF2RwMo0r01\n",
"6J3qiPAGEd64r6+jWVzgu5eYwvcEJvDQWg8emo/gV2CWywAAEY4EHgoWzFHAY+mCaqH2/bPAy7Fa\n",
"Oi8IlsMbSOqX/zNm05yN+e/papBHAx/FBD8dwc/BFjepWGQ5RV4EPxtYq8qe8FrbgqBOwUrtVssO\n",
"6zGBxywkJfmcYunjEVjjXCRNMkbwN4e6O07Pczz23e2TuMB3L+1Y3vsuTOA3pXLWu0pc6i4vgs+d\n",
"6JQmCOpT0CEccWHts7Ev9R05h63CMkB+hPn/Z2Cvb1k4ZzvwLUzwo8BHi2YGJmCnYgK/E0vTPZKk\n",
"8csjz4OfE64lllnejQnmVEwUj61yrmHYjOBGLZpcga8zUSzaM3EMIwr8cKoLfCcPXoSBWA9lUp1r\n",
"3KeIMFIk+RxFGN/AZLDeTByv6ZO4wHcv60nqxzxAZQngrhIFvtkIHoIPH4TqQsxLfylwAuafZ1kN\n",
"XK/KJmxl+zLw35l9voo1AksJq1SF888ALsZEfWMQvi1YMbNOM3pT5EXwc6h83dGmmYKNG7y4yrmG\n",
"Y+KcjuC30bjAxwa01vKHp2AzgCPpCH51fE4RThThv8I+eR587AHtE4EXYYgIRUqAzwQOTjV6P6L6\n",
"59CXcIF3qtJOUlfmFiw6bhWrsJmVaeFpVOCjD38cNoHoEuwaX0B+BP8E8Odw+1asPs1v0zuEImlv\n",
"CdvXY8I7FtiFzSK9mcS22IL1FopE8GkPviOCD2zEovLJ2MpTtQT+cZIf7fbwVy8XPrs4i5Jj04gk\n",
"lhfWu7k+9XBckrEjgk+t3hUnauVl0cS68Psqgp8DfCTbWxHpVEhvKqYncQLXqPDX13GBd6qyBisv\n",
"GxfubuWsxEeA7OpZjQyygkXwLw3n+aUqK7DIeA75s2HfiUVmYAJ/vyoPZ3dS5SpVHg4DzdswD/Nx\n",
"VXapcmqqUdqCWT1FIvi0RTObzgI/OzzXH4HjRDoi5GkiHWUThmGNVFWLJmTiZAU/r3b/Sjqnmf4Y\n",
"+GB47nlYYxaJ4xEjsIZPsVWs2kiEsFYEv68W4p6MvfcdvZUwHrA0I/oxbXZ46n9FmmemAewrjCJ/\n",
"1nGfwAW+e/ks8M3uOLEqO1X5embzDpJiVUX4C8lCIrHcxB+AhcHbzj7ntjCwCfAz4PwCz9GOifjj\n",
"OY/Fsri1BD7WZElPdBpIZ4vmSGBlGBi+FTgvPPYV4MPh9nBM4HMHWUM+/B/pnIkzHDqt4PVj4Asi\n",
"9uMPYncsVs//VGBRZlH2tAe/OfW8I8kX+KYieBEmiXBJkX0LMjnn+edi15we+I1lqOPAcCeBx4KD\n",
"l7fw2lqKCB8V4SOZzR7BO/mo0q5af8CzhawjVWa0Hqrcqsp5qnwoVdL3P4EvFDh2e170nkM7Fs1W\n",
"E/h1YXC2GlswkdsdGpcotNkI/kiSVMvLgPeLMAkr9BUHetMWTV4E/zLsPXxPxoLIi+B/go2rxAyi\n",
"KdgA7k6szs/1mf3THvwWMgKfmkwVbbYYwY/DMp6KWjRHYuWoW0UU+HQPIha+OyK1rW4Ej/WyevNg\n",
"8fNI2Xuh0W5I4EUYWq9Ka0/iAr8focoWVc7t4jmWqPK7Vl0TJmzHULnwSGQLtf33uM94ksg2RsVp\n",
"gd+EiU0s/nUFJjjfDM8bbY5o0UzEqnDuplLg34algP4v8KHU+eNyfR0EH/4i4M0ijMcasYXAd7EV\n",
"u/IEPnrw2Qi+H5U2VHzdMYK/j+IWzURqD/42Sl4E/3zs/U/n53cIfKqgWofAhwZsBq1d27fVjMHS\n",
"f6NAD8YK2zUSwV+Nfd97BS7wTnfTjkVG1SL4WvZM3Gckld40dLZoDiUIfPD+L8MspEuojOCj4MeG\n",
"Yjs24DkROA0rrnYJ8N7UDz0vgo8DytcBL8Fsi4XAT7ESDbdkdq8VwYNZHtU8+PsoHvl2h8BnexBH\n",
"YVVJ0xH8NOy9HYZZaOmyDITjD6T3C/xokkJ5sRfXiAc/kfylOvcJLvBOdxPtlzyBXwPcW+f4dK17\n",
"SAQ+XeZ4IyYe6dmw3wU+jWX9xAg+ZrDsIiPwmJVzhSpbVXkUs1smpo6rNn/hGuAcLIJfpMoGVU5U\n",
"7bBaImmBz0bwUJnZE193jODvB8aHKLgeE7EGq6Ec9ODd51kLk7AZzRPDfoOxSPx/CQIfjpuKLUie\n",
"jtzTAh/LYvSYwIswQoRzGjhkDPAwySzt0dhn1UgEP5JeVGLCBd7pbmoJ/D8Dl9Y5ficmtmlvemt6\n",
"li10rF/bUZ9dlfWqfJbK2bTRatlAZ4E/ksrSDA+RpC/WE/izsQHWhVX2AbORhmGikY7g27CJWtUi\n",
"+HFYw7WZYpFhFNBGC5XdTP4EscnYussxgj8cE8FFwGGh0WnDZjk/Fa45Pnda6GZiEX5PRvDnAp9p\n",
"YP+xWGG4WM9oFPbeu8A7ThVioa1OKaIhbfLZzodU7BMnRKVn6T6V2S0KfF49m21Av1AULc58XU+l\n",
"pz8EOAyLQCNLSAQ+2ip517cyXM8YaqxXGwqVbcAyT2IEP5iklMVYzO+NC46nI/i1mCVVxKaJvY7C\n",
"No0Is7AIe1Jmez9MkBemzvt8bIbuxvA6poW/lalrzovgZ2LWVU8K/HEUbOhCL2QM5qGnI/jCAh9S\n",
"QAdRfI2BbscF3ulu2oEnii5PWIW0wN9HkgIZiXn/nVZYCg3EOkxgokivpzKCH4qJeXrA9yHg0PDD\n",
"PxqrCFqNa4B7CrzGdZiVlPXgl2NZONtTpQ22Y172REzgV1Fc4PfQmA8fK4SOy2wfhzWeT6Se+/kk\n",
"tjqGaqcAABLySURBVNoDmE0zFXvvY0rrcKyhyhP4estEtpJjybwPYd2CdhE+GVNcA0Ox9+1vWM9k\n",
"MInAD8qzr3Iss2i3eQTvPGd4CMu37wpbCBaNKntVOw3MbsQEu9oEr3YqBT5r0UzCxCjdQCwhWYXr\n",
"AGqXmfgeFMo9Xxf+b6VS4JdhVkiHbx+EfhtmE6zHIvgimTQTw/kaFfh2OotvXNA93XuYS77AryCZ\n",
"1zAc69VkPfj7gf45E8laThDfY+gcwZ+Drc9wHvC+1PYxQHsYO1mCWXajSHqgg3Ke5gYRHhLhg+G+\n",
"C7zz3EKVe1V5dxdPk47g81gJ3JZZtzbNOkxgom+fjeCPBZZkjo8e/EnALTXOjSpLVbmq7quw64iT\n",
"xbaTiOHjhAg+s/9WYENI56xq0YgwV4SDw0St2GAUEvgQmZ6B1RTKRvCTMaFux9IfR2C2R6yxczNW\n",
"onkOnS2arMDPDNe1GpggwgHhfN3FbMIAaSbSPgObw/DLcE2RsSQN8IOYZTcaCwaqFaSbAXyeZEZ5\n",
"FHi3aBynAWoKvCorVTmzxvH1LJrxVPrvYLbJBOBFdE55bJZ2kkJlO0jy+9diUXD2NW4hGbuo5cH/\n",
"G/CecL61mCgVFc/ZWHG4v4XjSdkRk4GngvW0Bot6HwzF5sAahaXAB7AIPm3RdAi8CP3D63ucIPDA\n",
"m7BJdZ0Q4VSRQguq1OJYrJ5SbEgJaw8cB/w1XF/6/RxDkhAQG/e40tgO8gV+DHAlNlGtP8m6BB7B\n",
"O04DREujWeoJPGQmXIXB38eA19M6gV9HZdrnJMxW2kD1CD4K/CpyLJpQTvgMLE0zrnK1meICfyY2\n",
"KWsNSQT/DhG+E64vDmg/jUXrHRO4Qq/m77HofQmVFs0qYHCInidjM5afIRH440nyzbN8HGsAusJx\n",
"WFZUeuWwk7D6SVvC65qc2j9P4KM91imCDx79gSRrG7ThFo3jNEWHB98k7VRaNDeSjAvkCnxq21Dg\n",
"zi48d5p1JBH8dkxgNmEikifw6Qj+CeDwnMG+UzCxmhfOV1jgg/i+H6srtJZE4Odhs3RfSSLwq4Cz\n",
"sIldHYRsmkNVuYlKiyZmCg3D3vvl4ZAo8PMwiyOPcVjJ6q5wLPa5bSYR+HSFzzyBjxbNQyQWTRT4\n",
"7GSnMVijFQfxx2ACv5m+JPAi8gMRWS0iedUF4z7fEJFHROQeEZnX2kt0nLoefD0qInhVblDlN+Gx\n",
"eN6sRQP2Q1+k2qXnzl5HtQh+APkRfBSdW7Hfa7YU8kuAX2DCciImoEUj+FdjDecfMIGPg6wzsHLP\n",
"6UXdn8YqYN6cOUfs7cTrjQK/hUTwZ5DMg1gTXvfzgWEiuWmM4+m6wB+OZVxtIXkv0gK/ClvRLGpg\n",
"OoJ/BGuUJlDdg0/vHyexjcSybvqUB/9DrKRsLiJyLjBHVQ/GKuld1qJrc5zICiprzzTKOqpPVtqG\n",
"pcfl5bD/gSo+cZPch/m/0Fng47Y0HRF88MEvAT6V2eclWAXMhViGSKEIPvQEPg18JkSha4FxqcVZ\n",
"PodF9jFjZhVwe2aCWZa0RZMW+MkkcxRWY72O1ZgFlhfFjwNGi1RE2IVJFQlrp9KiOQyboEWwizaT\n",
"TIIbG/ZHlZ2Y7XQ4VSwaKgU+TqaLAt93InhV/SvJFzCP87DSqajqbUCbiPTmehNOH0OVL6jyrS6c\n",
"Iv4Q88RpFXBRlfLIf1Xl8i48b/Z8d6tSCne3YxHiRkxEoLMNtRmLeCO/BGaIcDJYeQFs4tTtmMDP\n",
"o7hF8wIsz/6qcG3bsdmoMeJersqb4nKMwLXUL32dtmi2kMzGjemWYMJ+Wrje5WQEPowpDMFstGaj\n",
"+CFY9dEo4iNSM27Xp/ZLD7SmBRus99af6oOs6aybtEXTtwS+AFOoXEh6JUllOcfpDcQfYl7BsGdV\n",
"+X4PXw+YwPfDIvjNmP2RjeAvxnrQQIcV8j1seUUIA6Rh+91hW1GBfzFwdSb9cw1WGE6Dt96BKjeq\n",
"8ss656xm0aQHa1djdtRCzLaZmTlHFM5baV7g20hmN8drGImVuEjPnE778GkPHsyyUyrrBqWpZtE8\n",
"RS+yaFq1KG524Cc3Z1hELk7dXaCqC1r0/I5Ti6oCvw/pWH1Llb0ibCQj8KoVFTMj15KI/otIBotj\n",
"HZxV2KScegJ/JlYaOc1aLPskr25QEaJFE0s7pAU+HcHH691DiOBFGKXKBsyeWYP1Sj7a5HWkBT4O\n",
"smYjdOgs8NkIfkP4bKoNsuZZNHfTxQheROYD87tyjkgrBP5JktVcwKL3vJogqOrFLXg+x2mUWhbN\n",
"viK7vGK6Pk4t7gamiDABE+m44MgKTKCfxESyqsCHJQWPx2yQNF0SeFV2iaCY+GU9+Cjw0XJaiAni\n",
"MSKMA5aLMJak9s4dwPEiDMizz+qQjeBHkGTEpKkl8A+m9o8lpQWYptpROyg6F+uw+QRttMCiCYHv\n",
"gnhfREpVd65DKyyaK7BFlhGRE4GNqrq69iGO06Nsw6ab98oIPvzfQAGBD7Ngb8Dyzw8k1NMPVstc\n",
"VZaSsmhE+IecGaNxScFsg7cGE/5mI3iwRnQy1SP49diYx9MkHvw5mAUykyDwqqzDBqSbieLzLJp6\n",
"EfzYzOO3Aq8Nt6NFcyhJFlHa0klbNKuw2jX9m7jullMkTfIX2Cy3Q0RkhYi8XUQuEpGLAFT1auAx\n",
"EVkKXI7NqHOcXkMqV7k3CnwUokICH7gO+BhwXdpDV+3wuePAogBfxPLZ05yJWT1Z1mK1Zboi8Fuo\n",
"jOCnAnviQuth8fnvhn2jB/9yYC+WmhgjeLAc/Y+KdFrcHAAR/l2EN+Y8lLVoRpAv8E8Dk0MVyIEk\n",
"cxRizaN7wt04yDoZmCrCMKpbNLEuUq9Yx7WuRaOqbyiwz/vq7eM4+5h17B8WDZjAf53MpKMUUdTG\n",
"h/+vxOqvEFauuoDQ686wBssc6arAx/9bsAYjbywBLNodiaV6XoEJfCy3gCrLRPh3bKzg/Jzjz8WE\n",
"9WeZ7XkRfDWLZhIm1utr1Bvajs1qjfME5tB5kDVm0WwiyR6qGVCEBni+aqflHVtGqwZZHae383Ys\n",
"D723kBX432B54UV4AKvMmBeFQyIwh2CTds4KdVhmYLn9PyBnwhJJ5NxViyb+34Jl5WTr9wMWJYuw\n",
"AhPeWzGBH0GSEQRhXV0RpgfvG7BBWez15bkQjVo0B1GZQZMlzjqO6d8H0zlNcmx4nhjBF/HhZwN/\n",
"FGFgrWJ2XcFLFTjPCUIO+u59fR0pKgRelf9WrbkiVAfB5jgqLXiZx2O1ynlYHZ17sFmrVwGfV+Xf\n",
"qghKKwR+C1bXfg+JwFeL4MF8+Kuwxi1aNB25/6psw3of/5A57kSskZos0rF2aqSoRbMKi8r/E/hK\n",
"jWuMHvx4bK7A8zLnW4/1EPphK5BtpUqqpAjDwqAy2Nq2B9LaNXQrcIF3nH1DnNRUrYZ9V9mMZcQ8\n",
"DPwfllr5J1W+V+OYNdhg9Joa+9RjK0kUvwVLL8yN4AP/io3dLcMi6bQHH7kMK4A2MLXtJOAmrN7M\n",
"C6BjkhQUtGhCg78BuEk1mW+QQ/Tgx2MFzA7HIvSN4TzPYu/3ptBwxh5UBeH6rgG+HTYdGf5nyzS3\n",
"DLdoHGcfoMoeEbZRe5Z4V4gCfyUWxc8FPlLnmEeB/2rB6ltpHx5qRPCqVqlThD1YBL+ajMCrskSE\n",
"e4HXEcYSMIH/Ojbn5sQwEPtKbH3cohYN2HjE7XVeU4zgh2ONyqsIOfKpfdaRzAfqZNEEv/3SsM8p\n",
"4f5R4eFx1FjusSt4BO84+46jVJPMjRazGfOoH1ZlhSpvDlP3q6LKRlXe1cXnTQt8jORrWTSR9Zge\n",
"zSRn/V7gq8BHRJCQgngC5tvfimXhfA6zTiDfoom1aSoIM3R31rm2ONFpPCbws3PO1U7SG8uzaE7A\n",
"Jqa9FAusp2ER/EqSejgtxwXecfYRqTov3cFmLFrslsiwBnFwldT/WhYN0JHKugzL4snr1fwBS2V8\n",
"EVblcnXIlb8t3P8B5sfHhTfyIvhsFk1RYgQ/gaRCZVbg11Ep8FmL5u3Ad8Pcg1uwmaqzsFx/t2gc\n",
"x2mIzdhqTD2dGtqQRZNhGTAxzyIKGTdfwQZDJxKWyVNljQgfwgZK34iJcFbgh2GNXZ5FU4S4MPt4\n",
"zEJ6OOdc60gW/KiwaMLM4QtILJm/YYL/GMnM4wrC5LTRqh119JvCI3jH2T/ZjAlRT9NVga81wPsz\n",
"LPPlDapJGWdVvh6ybVZg1keHwIdsnp1YBN7sgPYOTIQ1NJiP0DmtslYEfz62rm8s4XILcDrWG+hY\n",
"aEWE14rwcxHuxno972zyejvwCN5x9k/2lcAvhI7aMRtJVlUqwjLy/Xego0571bUpqBT4tJjH1M1m\n",
"c823Y1VzY/roYqygW5rVJIOsW4GhIhwJfAZ4IVZaInInlm55P/Z6Dw/bP4FlPH0TuLvemEkRXOAd\n",
"Z/9kAfR83r8qfyUsahIE+fgGDr+RrrkKK7CB5d0ZcdwMXcoM2o6Jd+xdfJnOFXQvJdHTrViJhndi\n",
"kf0LVZMlIVXZIcKd2OIjQmLRHAR8X7VLi9tU4ALvOPshqvzvvr6GRlFlEWHFpSZZga3FujGzfQvU\n",
"zZSpRZyUtho6Gq4KgkUUiR78kcCn0uKe4mXhOk/AVtIagdlILS3U6B684zj7Cyuwgcw8gW82gwaS\n",
"SWlFJ4BtxWyio7GJUZ1QZX0YTI4e/CxgWatLFrjAO46zvxAtmqzAb6b5DJo4ULuL4tH1VqyUwuOx\n",
"imYN1mJ58LMoXouoMC7wjuPsL6zE8ujzIvimBT6wneIR/DZssPeOAvtuwZYw/P/t3VuIVVUcx/Hv\n",
"r9IHMwgJxi4D+uDD+OQQDJFI8yT60oWiFAIfeoju0EMiSPrQgwVBD0EEGViEJUViEGRBRRAkkrdS\n",
"KcEBLS8DRSQSKP17WOvk8Xgue2b2OXtm+/vAxj1775mz/LP8u2fv9V9rBMqvi3CCN7O6+J00dUHZ\n",
"j2hgagm+UXvQM8E3rVUwhu/gzczay5OHneHaBP8ezPil80Wm9ogGes9x0zBJmjCt9ATvUTRmVien\n",
"aEnwEXxfws/9hrw8YgEXSENUD/W6MJskvZAt/RGNE7yZ1ck1Cb4MEVOqKj0FPDyFQqVGVawTvJlZ\n",
"Fx+RXrZWJo+6+WwK3zJJWmi89HmDnODNrDYi+LjqNkzDJH14/g5+yWpmVrWz9GlaZ0X0Za3Xaz9I\n",
"iohonb/BzOy6lhdEXxDRfijnTHJnoTt4SWskHZf0q6SNbc6PS/pL0oG8bZ5OY8zMrjcR/NMpuc9U\n",
"zwQv6UbgTdI0ncuB9ZJG2lz6bUSM5u2VkttpLSSNV92GunAsy+V4zh5F7uDHgBMRMRERl4APSYvb\n",
"tvLjl8Ear7oBNTJedQNqZrzqBlhSJMHfSRrX2XA6H2sWwL2SDkn6XNJyzMysUkWGSRZ5C/sjMBwR\n",
"FyWtBXZzZYVzMzOrQM9RNJLuAbZGxJr89Sbg34h4tcv3nATujog/mo4NZriOmVnNTHcUTZE7+P3A\n",
"MklLSLO1PQasb75A0hBwPiJC0hjpP46r3gp7iKSZ2WD1TPARcVnSs8AXpLmWt0fEMUlP5vNvA48A\n",
"T0m6TJp1bV0f22xmZgUMrNDJzMwGayBTFfQqlLLuJE1IOpyLyPblY4skfSnpF0l7Jd1adTtnK0nv\n",
"Sjon6UjTsY7xk7Qp99XjklZX0+rZqUMst0o63VTouLbpnGPZhaRhSV9L+lnST5Kez8fL6Z8R0deN\n",
"9FjnBLAEmEdaNX2k359bp400jeiilmOvAS/l/Y3AtqrbOVs3YBUwChzpFT9SMd/B3FeX5L57Q9V/\n",
"h9mydYjlFuDFNtc6lr3juRhYkfcXkuacHymrfw7iDr5ooZR11/qS+n5gR97fATw42ObMHRHxHfBn\n",
"y+FO8XsA2BkRlyJigvQPaGwQ7ZwLOsQS2hc6OpY9RMTZiDiY9y8Ax0h1RqX0z0Ek+CKFUtZdAF9J\n",
"2i+psfDAUEQ0lhA7BwxV07Q5q1P87uDq+cTdX4t5Lhc6bm96nOBYTkEeqTgK/EBJ/XMQCd5vcWdu\n",
"ZUSMAmuBZyStaj4Z6Xc3x3maCsTPse3uLWApsIK0JurrXa51LNuQtBD4BHghIv5uPjeT/jmIBP8b\n",
"MNz09TAVr7gy10TEmfznJGnx4DHgnKTFAJJup/iK75Z0il9rf70rH7MOIuJ8ZMA7XHlk4FgWIGke\n",
"Kbm/HxG78+FS+ucgEvz/hVKS5pMKpfYM4HNrQdICSbfk/ZuB1cARUgw35Ms2kKaHsOI6xW8PsE7S\n",
"fElLgWXAvgraN2fkBNTwEKl/gmPZkyQB24GjEfFG06lS+mffl+yLDoVS/f7cGhkCPk39gJuADyJi\n",
"r6T9wC5JTwATwKPVNXF2k7QTuA+4TdIp4GVgG23iFxFHJe0CjgKXgafznanRNpZbgHFJK0iPCk4C\n",
"jSJIx7K3lcDjwGFJB/KxTZTUP13oZGZWU16T1cysppzgzcxqygnezKymnODNzGrKCd7MrKac4M3M\n",
"asoJ3sysppzgzcxq6j+vUsbacqJa4gAAAABJRU5ErkJggg==\n"
],
"text/plain": [
"<matplotlib.figure.Figure at 0x7fbb37f207d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(np.vstack([train_loss, scratch_train_loss]).clip(0, 4).T)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at the testing accuracy after running 200 iterations. Note that we are running a classification task of 5 classes, thus a chance accuracy is 20%. As we will reasonably expect, the finetuning result will be much better than the one from training from scratch. Let's see."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy for fine-tuning: 0.570000001788\n",
"Accuracy for training from scratch: 0.224000000954\n"
]
}
],
"source": [
"test_iters = 10\n",
"accuracy = 0\n",
"scratch_accuracy = 0\n",
"for it in arange(test_iters):\n",
" solver.test_nets[0].forward()\n",
" accuracy += solver.test_nets[0].blobs['accuracy'].data\n",
" scratch_solver.test_nets[0].forward()\n",
" scratch_accuracy += scratch_solver.test_nets[0].blobs['accuracy'].data\n",
"accuracy /= test_iters\n",
"scratch_accuracy /= test_iters\n",
"print 'Accuracy for fine-tuning:', accuracy\n",
"print 'Accuracy for training from scratch:', scratch_accuracy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Huzzah! So we did finetuning and it is awesome. Let's take a look at what kind of results we are able to get with a longer, more complete run of the style recognition dataset. Note: the below URL might be occassionally down because it is run on a research machine.\n",
"\n",
"http://demo.vislab.berkeleyvision.org/"
]
}
],
"metadata": {
"description": "Fine-tune the ImageNet-trained CaffeNet on new data.",
"example_name": "Fine-tuning for Style Recognition",
"include_in_docs": true,
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
},
"priority": 4
},
"nbformat": 4,
"nbformat_minor": 0
}

View file

@ -0,0 +1,31 @@
file(GLOB_RECURSE examples_srcs "${PROJECT_SOURCE_DIR}/examples/*.cpp")
foreach(source_file ${examples_srcs})
# get file name
get_filename_component(name ${source_file} NAME_WE)
# get folder name
get_filename_component(path ${source_file} PATH)
get_filename_component(folder ${path} NAME_WE)
add_executable(${name} ${source_file})
target_link_libraries(${name} ${Caffe_LINK})
caffe_default_properties(${name})
# set back RUNTIME_OUTPUT_DIRECTORY
set_target_properties(${name} PROPERTIES
RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/examples/${folder}")
caffe_set_solution_folder(${name} examples)
# install
install(TARGETS ${name} DESTINATION bin)
if(UNIX OR APPLE)
# Funny command to make tutorials work
# TODO: remove in future as soon as naming is standartaized everywhere
set(__outname ${PROJECT_BINARY_DIR}/examples/${folder}/${name}${Caffe_POSTFIX})
add_custom_command(TARGET ${name} POST_BUILD
COMMAND ln -sf "${__outname}" "${__outname}.bin")
endif()
endforeach()

Binary file not shown.

After

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 538 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 21 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 517 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 138 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 56 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 238 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 196 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 939 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 531 KiB

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,537 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "AeF9mG-COZNE"
},
"source": [
"## Loading DNN model\n",
"In this notebook we are going to use a [GoogLeNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet) model trained on [ImageNet](http://www.image-net.org/) dataset, which is set in the ```model_path``` variable:\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "RMhGdYHuOZM8"
},
"source": [
"# Deep Dreams (with Caffe) adapted by marina von steinkirch\n",
"\n",
"This notebook demonstrates how to use the [Caffe](http://caffe.berkeleyvision.org/) neural network framework to \n",
"produce \"dream\" visuals shown in the \n",
"[Google Research blog post](http://googleresearch.blogspot.ch/2015/06/inceptionism-going-deeper-into-neural.html). **#deepdream**\n",
"\n",
"## Dependencies\n",
"\n",
"* Standard Python scientific stack: [NumPy](http://www.numpy.org/), [SciPy](http://www.scipy.org/), [PIL](http://www.pythonware.com/products/pil/), [IPython](http://ipython.org/). \n",
"* [Caffe](http://caffe.berkeleyvision.org/) deep learning framework ([installation instructions](http://caffe.berkeleyvision.org/installation.html)).\n",
"* Google [protobuf](https://developers.google.com/protocol-buffers/) library that is used for Caffe model manipulation.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "Pqz5k4syOZNA"
},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import scipy.ndimage as nd\n",
"import PIL.Image\n",
"from cStringIO import StringIO\n",
"from IPython.display import clear_output, Image, display\n",
"from google.protobuf import text_format\n",
"\n",
"import caffe\n",
"\n",
"# GPU support for CUDA and Caffe.\n",
"caffe.set_mode_gpu()\n",
"# Select GPU device if multiple devices exist.\n",
"caffe.set_device(0)\n",
"\n",
"# Set the background image to build the dream on top of it.\n",
"BACKGROUND_IMG = 'd2.jpg'\n",
"\n",
"# Set the image to control the dream on.\n",
"CONTROL_IMAGE = 'd3.jpg'\n",
"\n",
"def showarray(a, fmt='jpeg'):\n",
" \"\"\" Display the image in the notebook. \"\"\"\n",
" a = np.uint8(np.clip(a, 0, 255))\n",
" f = StringIO()\n",
" PIL.Image.fromarray(a).save(f, fmt)\n",
" display(Image(data=f.getvalue()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "i9hkSm1IOZNR"
},
"outputs": [],
"source": [
"model_path = '../models/bvlc_googlenet/'\n",
"net_fn = os.path.join(model_path, 'deploy.prototxt')\n",
"param_fn = os.path.join(model_path, 'bvlc_googlenet.caffemodel')\n",
"\n",
"# Patching model to be able to compute gradients.\n",
"# Note that you can also manually add \"force_backward: true\" line to \"deploy.prototxt\".\n",
"model = caffe.io.caffe_pb2.NetParameter()\n",
"text_format.Merge(open(net_fn).read(), model)\n",
"model.force_backward = True\n",
"open('tmp.prototxt', 'w').write(str(model))\n",
"\n",
"net = caffe.Classifier('tmp.prototxt', param_fn,\n",
" mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent\n",
" channel_swap = (2,1,0)) # the reference model has channels in BGR order instead of RGB\n",
"\n",
"# A couple of utility functions for converting to and from Caffe's input image layout.\n",
"def preprocess(net, img):\n",
" return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']\n",
"\n",
"def deprocess(net, img):\n",
" return np.dstack((img + net.transformer.mean['data'])[::-1])"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "UeV_fJ4QOZNb"
},
"source": [
"## Producing dreams"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9udrp3efOZNd"
},
"source": [
"The \"dream\" images is just a gradient ascent process that tries to maximize the **L2** norm of activations of a particular DNN layer. \n",
"\n",
"Here are a few simple tricks that we found useful for getting good images:\n",
"\n",
"* offset image by a random jitter,\n",
"* normalize the magnitude of gradient ascent steps,\n",
"* apply ascent across multiple scales (octaves),\n",
"\n",
"First, we implement a basic gradient ascent step function, applying the first two tricks:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "pN43nMsHOZNg"
},
"outputs": [],
"source": [
"def objective_L2(dst):\n",
" dst.diff[:] = dst.data \n",
"\n",
"def make_step(net, step_size=1.5, end='inception_4c/output', \n",
" jitter=32, clip=True, objective=objective_L2):\n",
" '''Basic gradient ascent step.'''\n",
"\n",
" src = net.blobs['data'] # input image is stored in Net's 'data' blob\n",
" dst = net.blobs[end]\n",
"\n",
" ox, oy = np.random.randint(-jitter, jitter+1, 2)\n",
" src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift\n",
" \n",
" net.forward(end=end)\n",
" objective(dst) # specify the optimization objective\n",
" net.backward(start=end)\n",
" g = src.diff[0]\n",
" # apply normalized ascent step to the input image\n",
" src.data[:] += step_size/np.abs(g).mean() * g\n",
"\n",
" src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image\n",
" \n",
" if clip:\n",
" bias = net.transformer.mean['data']\n",
" src.data[:] = np.clip(src.data, -bias, 255-bias) "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "nphEdlBgOZNk"
},
"source": [
"Next we implement an ascent through different scales. We call these scales \"octaves\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "ZpFIn8l0OZNq"
},
"outputs": [],
"source": [
"def deepdream(net, base_img, iter_n=10, octave_n=4, octave_scale=1.4, \n",
" end='inception_4c/output', clip=True, **step_params):\n",
" \n",
" # prepare base images for all octaves\n",
" octaves = [preprocess(net, base_img)]\n",
" for i in xrange(octave_n-1):\n",
" octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))\n",
" \n",
" src = net.blobs['data']\n",
" detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details\n",
" for octave, octave_base in enumerate(octaves[::-1]):\n",
" h, w = octave_base.shape[-2:]\n",
" if octave > 0:\n",
" # upscale details from the previous octave\n",
" h1, w1 = detail.shape[-2:]\n",
" detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)\n",
"\n",
" src.reshape(1,3,h,w) # resize the network's input image size\n",
" src.data[0] = octave_base+detail\n",
" for i in xrange(iter_n):\n",
" make_step(net, end=end, clip=clip, **step_params)\n",
" \n",
" # visualization\n",
" vis = deprocess(net, src.data[0])\n",
" if not clip: # adjust image contrast if clipping is disabled\n",
" vis = vis*(255.0/np.percentile(vis, 99.98))\n",
" showarray(vis)\n",
" print octave, i, end, vis.shape\n",
" clear_output(wait=True)\n",
" \n",
" # extract details produced on the current octave\n",
" detail = src.data[0]-octave_base\n",
" # returning the resulting image\n",
" return deprocess(net, src.data[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "QrcdU-lmOZNx"
},
"source": [
"Now we are ready to let the neural network reveal its dreams! Let's take a Dali image as a starting point:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"executionInfo": null,
"id": "40p5AqqwOZN5",
"outputId": "f62cde37-79e8-420a-e448-3b9b48ee1730",
"pinned": false
},
"outputs": [],
"source": [
"img = np.float32(PIL.Image.open(BACKGROUND_IMG))\n",
"showarray(img)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Z9_215_GOZOL"
},
"source": [
"Running the next code cell starts the detail generation process. You may see how new patterns start to form, iteration by iteration, octave by octave."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"executionInfo": null,
"id": "HlnVnDTlOZOL",
"outputId": "425dfc83-b474-4a69-8386-30d86361bbf6",
"pinned": false
},
"outputs": [],
"source": [
"_=deepdream(net, img)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Rp9kOCQTOZOQ"
},
"source": [
"The complexity of the details generated depends on which layer's activations we try to maximize. Higher layers produce complex features, while lower ones enhance edges and textures, giving the image an impressionist feeling:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"executionInfo": null,
"id": "eHOX0t93OZOR",
"outputId": "0de0381c-4681-4619-912f-9b6a2cdec0c6",
"pinned": false
},
"outputs": [],
"source": [
"_=deepdream(net, img, end='inception_3b/5x5_reduce')"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rkzHz9E8OZOb"
},
"source": [
"We encourage readers to experiment with layer selection to see how it affects the results. Execute the next code cell to see the list of different layers. You can modify the `make_step` function to make it follow some different objective, say to select a subset of activations to maximize, or to maximize multiple layers at once. There is a huge design space to explore!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "OIepVN6POZOc"
},
"outputs": [],
"source": [
"net.blobs.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vs2uUpMCOZOe"
},
"source": [
"What if we feed the `deepdream` function its own output, after applying a little zoom to it? It turns out that this leads to an endless stream of impressions of the things that the network saw during training. Some patterns fire more often than others, suggestive of basins of attraction.\n",
"\n",
"We will start the process from the same sky image as above, but after some iteration the original image becomes irrelevant; even random noise can be used as the starting point."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "IB48CnUfOZOe"
},
"outputs": [],
"source": [
"!mkdir frames\n",
"frame = img\n",
"frame_i = 0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"id": "fj0E-fKDOZOi"
},
"outputs": [],
"source": [
"h, w = frame.shape[:2]\n",
"s = 0.05 # scale coefficient\n",
"\n",
"for i in xrange(1):\n",
" frame = deepdream(net, frame)\n",
" PIL.Image.fromarray(np.uint8(frame)).save(\"frames/%04d.jpg\"%frame_i)\n",
" frame = nd.affine_transform(frame, [1-s,1-s,1], [h*s/2,w*s/2,0], order=1)\n",
" frame_i += 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "XzZGGME_OZOk"
},
"source": [
"Be careful running the code above, it can bring you into very strange realms!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab_type": "code",
"collapsed": false,
"executionInfo": null,
"id": "ZCZcz2p1OZOt",
"outputId": "d3773436-2b5d-4e79-be9d-0f12ab839fff",
"pinned": false
},
"outputs": [],
"source": [
"Image(filename='frames/0029.jpg')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Controlling dreams\n",
"\n",
"The image detail generation method described above tends to produce some patterns more often the others. One easy way to improve the generated image diversity is to tweak the optimization objective. Here we show just one of many ways to do that. Let's use one more input image. We'd call it a \"*guide*\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"guide = np.float32(PIL.Image.open(CONTROL_IMAGE))\n",
"showarray(guide)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the neural network we use was trained on images downscaled to 224x224 size. So high resolution images might have to be downscaled, so that the network could pick up their features. The image we use here is already small enough.\n",
"\n",
"Now we pick some target layer and extract guide image features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"end = 'inception_3b/output'\n",
"h, w = guide.shape[:2]\n",
"src, dst = net.blobs['data'], net.blobs[end]\n",
"src.reshape(1,3,h,w)\n",
"src.data[0] = preprocess(net, guide)\n",
"net.forward(end=end)\n",
"guide_features = dst.data[0].copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of maximizing the L2-norm of current image activations, we try to maximize the dot-products between activations of current image, and their best matching correspondences from the guide image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def objective_guide(dst):\n",
" x = dst.data[0].copy()\n",
" y = guide_features\n",
" ch = x.shape[0]\n",
" x = x.reshape(ch,-1)\n",
" y = y.reshape(ch,-1)\n",
" A = x.T.dot(y) # compute the matrix of dot-products with guide features\n",
" dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best\n",
"\n",
"_=deepdream(net, img, end=end, objective=objective_guide)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This way we can affect the style of generated images without using a different training set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"colabVersion": "0.3.1",
"default_view": {},
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
},
"views": {}
},
"nbformat": 4,
"nbformat_minor": 0
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

File diff suppressed because it is too large Load diff

Binary file not shown.

After

Width:  |  Height:  |  Size: 64 KiB

File diff suppressed because it is too large Load diff

42
dockerfiles/Dockerfile Normal file
View file

@ -0,0 +1,42 @@
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
wget \
libatlas-base-dev \
libboost-all-dev \
libgflags-dev \
libgoogle-glog-dev \
libhdf5-serial-dev \
libleveldb-dev \
liblmdb-dev \
libopencv-dev \
libprotobuf-dev \
libsnappy-dev \
protobuf-compiler \
python-dev \
python-numpy \
python-pip \
python-scipy && \
rm -rf /var/lib/apt/lists/*
ENV CAFFE_ROOT=/opt/caffe
WORKDIR $CAFFE_ROOT
# FIXME: clone a specific git tag and use ARG instead of ENV once DockerHub supports this.
ENV CLONE_TAG=master
RUN git clone -b ${CLONE_TAG} --depth 1 https://github.com/BVLC/caffe.git . && \
for req in $(cat python/requirements.txt) pydot; do pip install $req; done && \
mkdir build && cd build && \
cmake -DCPU_ONLY=1 .. && \
make -j"$(nproc)"
ENV PYCAFFE_ROOT $CAFFE_ROOT/python
ENV PYTHONPATH $PYCAFFE_ROOT:$PYTHONPATH
ENV PATH $CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH
RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig
WORKDIR /workspace

17
dockerfiles/README.md Normal file
View file

@ -0,0 +1,17 @@
### Build with:
```
$ docker build -t caffe:cpu .
```
### You can test with:
```
$ docker run -ti caffe:cpu bash -c "cd /opt/caffe/build; make runtest"
```
### Or play with:
```
$ docker run -ti --volume=$(pwd):/workspace caffe:cpu
```

25
ml_notebooks/README.md Normal file
View file

@ -0,0 +1,25 @@
## Jupyter Notebooks
### Installing
Install any dependences for [Jupyter](http://jupyter.readthedocs.io/en/latest/install.html):
```shell
$ apt-get install build-essential python3-dev
$ pip install jupyter
```
### Running
On the notebook directory:
```shell
$ jupyter notebook
```
### Basics
* A notebook is made up of a number of cells with Python code. You can execute a cell by clicking on it and pressing ```Shift-Enter```.

467
ml_notebooks/basics.ipynb Normal file
View file

@ -0,0 +1,467 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "iPpI7RaYoZuE"
},
"source": [
"##### Copyright 2018 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"cellView": "form",
"colab": {},
"colab_type": "code",
"id": "hro2InpHobKk"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "U9i2Dsh-ziXr"
},
"source": [
"# Customization basics: tensors and operations"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Hndw-YcxoOJK"
},
"source": [
"\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/customization/basics\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/customization/basics.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/customization/basics.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/customization/basics.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
" \u003c/td\u003e\n",
"\u003c/table\u003e"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6sILUVbHoSgH"
},
"source": [
"This is an introductory TensorFlow tutorial shows how to:\n",
"\n",
"* Import the required package\n",
"* Create and use tensors\n",
"* Use GPU acceleration\n",
"* Demonstrate `tf.data.Dataset`"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "miTaGiqV9RjO"
},
"outputs": [],
"source": [
"from __future__ import absolute_import, division, print_function\n",
"\n",
"try:\n",
" # %tensorflow_version only exists in Colab.\n",
" %tensorflow_version 2.x\n",
"except Exception:\n",
" pass\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "z1JcS5iBXMRO"
},
"source": [
"## Import TensorFlow\n",
"\n",
"To get started, import the `tensorflow` module. As of TensorFlow 2.0, eager execution is turned on by default. This enables a more interactive frontend to TensorFlow, the details of which we will discuss much later."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "vjBPmYjLdFmk"
},
"outputs": [],
"source": [
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "H9UySOPLXdaw"
},
"source": [
"## Tensors\n",
"\n",
"A Tensor is a multi-dimensional array. Similar to NumPy `ndarray` objects, `tf.Tensor` objects have a data type and a shape. Additionally, `tf.Tensor`s can reside in accelerator memory (like a GPU). TensorFlow offers a rich library of operations ([tf.add](https://www.tensorflow.org/api_docs/python/tf/add), [tf.matmul](https://www.tensorflow.org/api_docs/python/tf/matmul), [tf.linalg.inv](https://www.tensorflow.org/api_docs/python/tf/linalg/inv) etc.) that consume and produce `tf.Tensor`s. These operations automatically convert native Python types, for example:\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"cellView": "code",
"colab": {},
"colab_type": "code",
"id": "ngUe237Wt48W"
},
"outputs": [],
"source": [
"print(tf.add(1, 2))\n",
"print(tf.add([1, 2], [3, 4]))\n",
"print(tf.square(5))\n",
"print(tf.reduce_sum([1, 2, 3]))\n",
"\n",
"# Operator overloading is also supported\n",
"print(tf.square(2) + tf.square(3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "IDY4WsYRhP81"
},
"source": [
"Each `tf.Tensor` has a shape and a datatype:"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "srYWH1MdJNG7"
},
"outputs": [],
"source": [
"x = tf.matmul([[1]], [[2, 3]])\n",
"print(x)\n",
"print(x.shape)\n",
"print(x.dtype)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "eBPw8e8vrsom"
},
"source": [
"The most obvious differences between NumPy arrays and `tf.Tensor`s are:\n",
"\n",
"1. Tensors can be backed by accelerator memory (like GPU, TPU).\n",
"2. Tensors are immutable."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Dwi1tdW3JBw6"
},
"source": [
"### NumPy Compatibility\n",
"\n",
"Converting between a TensorFlow `tf.Tensor`s and a NumPy `ndarray` is easy:\n",
"\n",
"* TensorFlow operations automatically convert NumPy ndarrays to Tensors.\n",
"* NumPy operations automatically convert Tensors to NumPy ndarrays.\n",
"\n",
"Tensors are explicitly converted to NumPy ndarrays using their `.numpy()` method. These conversions are typically cheap since the array and `tf.Tensor` share the underlying memory representation, if possible. However, sharing the underlying representation isn't always possible since the `tf.Tensor` may be hosted in GPU memory while NumPy arrays are always backed by host memory, and the conversion involves a copy from GPU to host memory."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "lCUWzso6mbqR"
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"ndarray = np.ones([3, 3])\n",
"\n",
"print(\"TensorFlow operations convert numpy arrays to Tensors automatically\")\n",
"tensor = tf.multiply(ndarray, 42)\n",
"print(tensor)\n",
"\n",
"\n",
"print(\"And NumPy operations convert Tensors to numpy arrays automatically\")\n",
"print(np.add(tensor, 1))\n",
"\n",
"print(\"The .numpy() method explicitly converts a Tensor to a numpy array\")\n",
"print(tensor.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "PBNP8yTRfu_X"
},
"source": [
"## GPU acceleration\n",
"\n",
"Many TensorFlow operations are accelerated using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to use the GPU or CPU for an operation—copying the tensor between CPU and GPU memory, if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed, for example:"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"cellView": "code",
"colab": {},
"colab_type": "code",
"id": "3Twf_Rw-gQFM"
},
"outputs": [],
"source": [
"x = tf.random.uniform([3, 3])\n",
"\n",
"print(\"Is there a GPU available: \"),\n",
"print(tf.config.experimental.list_physical_devices(\"GPU\"))\n",
"\n",
"print(\"Is the Tensor on GPU #0: \"),\n",
"print(x.device.endswith('GPU:0'))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vpgYzgVXW2Ud"
},
"source": [
"### Device Names\n",
"\n",
"The `Tensor.device` property provides a fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which this program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. The string ends with `GPU:\u003cN\u003e` if the tensor is placed on the `N`-th GPU on the host."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ZWZQCimzuqyP"
},
"source": [
"\n",
"\n",
"### Explicit Device Placement\n",
"\n",
"In TensorFlow, *placement* refers to how individual operations are assigned (placed on) a device for execution. As mentioned, when there is no explicit guidance provided, TensorFlow automatically decides which device to execute an operation and copies tensors to that device, if needed. However, TensorFlow operations can be explicitly placed on specific devices using the `tf.device` context manager, for example:"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "RjkNZTuauy-Q"
},
"outputs": [],
"source": [
"import time\n",
"\n",
"def time_matmul(x):\n",
" start = time.time()\n",
" for loop in range(10):\n",
" tf.matmul(x, x)\n",
"\n",
" result = time.time()-start\n",
"\n",
" print(\"10 loops: {:0.2f}ms\".format(1000*result))\n",
"\n",
"# Force execution on CPU\n",
"print(\"On CPU:\")\n",
"with tf.device(\"CPU:0\"):\n",
" x = tf.random.uniform([1000, 1000])\n",
" assert x.device.endswith(\"CPU:0\")\n",
" time_matmul(x)\n",
"\n",
"# Force execution on GPU #0 if available\n",
"if tf.config.experimental.list_physical_devices(\"GPU\"):\n",
" print(\"On GPU:\")\n",
" with tf.device(\"GPU:0\"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.\n",
" x = tf.random.uniform([1000, 1000])\n",
" assert x.device.endswith(\"GPU:0\")\n",
" time_matmul(x)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "o1K4dlhhHtQj"
},
"source": [
"## Datasets\n",
"\n",
"This section uses the [`tf.data.Dataset` API](https://www.tensorflow.org/guide/datasets) to build a pipeline for feeding data to your model. The `tf.data.Dataset` API is used to build performant, complex input pipelines from simple, re-usable pieces that will feed your model's training or evaluation loops."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "zI0fmOynH-Ne"
},
"source": [
"### Create a source `Dataset`\n",
"\n",
"Create a *source* dataset using one of the factory functions like [`Dataset.from_tensors`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensors), [`Dataset.from_tensor_slices`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices), or using objects that read from files like [`TextLineDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TextLineDataset) or [`TFRecordDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset). See the [TensorFlow Dataset guide](https://www.tensorflow.org/guide/datasets#reading_input_data) for more information."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "F04fVOHQIBiG"
},
"outputs": [],
"source": [
"ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])\n",
"\n",
"# Create a CSV file\n",
"import tempfile\n",
"_, filename = tempfile.mkstemp()\n",
"\n",
"with open(filename, 'w') as f:\n",
" f.write(\"\"\"Line 1\n",
"Line 2\n",
"Line 3\n",
" \"\"\")\n",
"\n",
"ds_file = tf.data.TextLineDataset(filename)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vbxIhC-5IPdf"
},
"source": [
"### Apply transformations\n",
"\n",
"Use the transformations functions like [`map`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map), [`batch`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch), and [`shuffle`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle) to apply transformations to dataset records."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "uXSDZWE-ISsd"
},
"outputs": [],
"source": [
"ds_tensors = ds_tensors.map(tf.square).shuffle(2).batch(2)\n",
"\n",
"ds_file = ds_file.batch(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "A8X1GNfoIZKJ"
},
"source": [
"### Iterate\n",
"\n",
"`tf.data.Dataset` objects support iteration to loop over records:"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "ws-WKRk5Ic6-"
},
"outputs": [],
"source": [
"print('Elements of ds_tensors:')\n",
"for x in ds_tensors:\n",
" print(x)\n",
"\n",
"print('\\nElements in ds_file:')\n",
"for x in ds_file:\n",
" print(x)"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"last_runtime": {
"build_target": "",
"kind": "local"
},
"name": "basics.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true,
"version": "0.3.2"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View file

@ -0,0 +1,929 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "feature_crosses.ipynb",
"provenance": [],
"collapsed_sections": [
"JndnmDMp66FL",
"ZTDHHM61NPTw",
"0i7vGo9PTaZl"
]
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "g4T-_IsVbweU",
"colab_type": "text"
},
"source": [
"# Feature Crosses"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JndnmDMp66FL",
"colab_type": "text"
},
"source": [
"#### Copyright 2017 Google LLC."
]
},
{
"cell_type": "code",
"metadata": {
"id": "hMqWDc_m6rUC",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "F7dke6skIK-k",
"colab_type": "text"
},
"source": [
"**Learning Objectives:**\n",
" * Improve a linear regression model with the addition of additional synthetic features (this is a continuation of the previous exercise)\n",
" * Use an input function to convert pandas `DataFrame` objects to `Tensors` and invoke the input function in `fit()` and `predict()` operations\n",
" * Use the FTRL optimization algorithm for model training\n",
" * Create new synthetic features through one-hot encoding, binning, and feature crosses"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NS_fcQRd8B97",
"colab_type": "text"
},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4IdzD8IdIK-l",
"colab_type": "text"
},
"source": [
"First, as we've done in previous exercises, let's define the input and create the data-loading code."
]
},
{
"cell_type": "code",
"metadata": {
"id": "CsfdiLiDIK-n",
"colab_type": "code",
"colab": {}
},
"source": [
"from __future__ import print_function\n",
"\n",
"import math\n",
"\n",
"from IPython import display\n",
"from matplotlib import cm\n",
"from matplotlib import gridspec\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import metrics\n",
"import tensorflow as tf\n",
"from tensorflow.python.data import Dataset\n",
"\n",
"tf.logging.set_verbosity(tf.logging.ERROR)\n",
"pd.options.display.max_rows = 10\n",
"pd.options.display.float_format = '{:.1f}'.format\n",
"\n",
"california_housing_dataframe = pd.read_csv(\"https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv\", sep=\",\")\n",
"\n",
"california_housing_dataframe = california_housing_dataframe.reindex(\n",
" np.random.permutation(california_housing_dataframe.index))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "10rhoflKIK-s",
"colab_type": "code",
"colab": {}
},
"source": [
"def preprocess_features(california_housing_dataframe):\n",
" \"\"\"Prepares input features from California housing data set.\n",
"\n",
" Args:\n",
" california_housing_dataframe: A Pandas DataFrame expected to contain data\n",
" from the California housing data set.\n",
" Returns:\n",
" A DataFrame that contains the features to be used for the model, including\n",
" synthetic features.\n",
" \"\"\"\n",
" selected_features = california_housing_dataframe[\n",
" [\"latitude\",\n",
" \"longitude\",\n",
" \"housing_median_age\",\n",
" \"total_rooms\",\n",
" \"total_bedrooms\",\n",
" \"population\",\n",
" \"households\",\n",
" \"median_income\"]]\n",
" processed_features = selected_features.copy()\n",
" # Create a synthetic feature.\n",
" processed_features[\"rooms_per_person\"] = (\n",
" california_housing_dataframe[\"total_rooms\"] /\n",
" california_housing_dataframe[\"population\"])\n",
" return processed_features\n",
"\n",
"def preprocess_targets(california_housing_dataframe):\n",
" \"\"\"Prepares target features (i.e., labels) from California housing data set.\n",
"\n",
" Args:\n",
" california_housing_dataframe: A Pandas DataFrame expected to contain data\n",
" from the California housing data set.\n",
" Returns:\n",
" A DataFrame that contains the target feature.\n",
" \"\"\"\n",
" output_targets = pd.DataFrame()\n",
" # Scale the target to be in units of thousands of dollars.\n",
" output_targets[\"median_house_value\"] = (\n",
" california_housing_dataframe[\"median_house_value\"] / 1000.0)\n",
" return output_targets"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ufplEkjN8KUp",
"colab_type": "code",
"colab": {}
},
"source": [
"# Choose the first 12000 (out of 17000) examples for training.\n",
"training_examples = preprocess_features(california_housing_dataframe.head(12000))\n",
"training_targets = preprocess_targets(california_housing_dataframe.head(12000))\n",
"\n",
"# Choose the last 5000 (out of 17000) examples for validation.\n",
"validation_examples = preprocess_features(california_housing_dataframe.tail(5000))\n",
"validation_targets = preprocess_targets(california_housing_dataframe.tail(5000))\n",
"\n",
"# Double-check that we've done the right thing.\n",
"print(\"Training examples summary:\")\n",
"display.display(training_examples.describe())\n",
"print(\"Validation examples summary:\")\n",
"display.display(validation_examples.describe())\n",
"\n",
"print(\"Training targets summary:\")\n",
"display.display(training_targets.describe())\n",
"print(\"Validation targets summary:\")\n",
"display.display(validation_targets.describe())"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "oJlrB4rJ_2Ma",
"colab_type": "code",
"colab": {}
},
"source": [
"def construct_feature_columns(input_features):\n",
" \"\"\"Construct the TensorFlow Feature Columns.\n",
"\n",
" Args:\n",
" input_features: The names of the numerical input features to use.\n",
" Returns:\n",
" A set of feature columns\n",
" \"\"\"\n",
" return set([tf.feature_column.numeric_column(my_feature)\n",
" for my_feature in input_features])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NBxoAfp2AcB6",
"colab_type": "code",
"colab": {}
},
"source": [
"def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):\n",
" \"\"\"Trains a linear regression model.\n",
" \n",
" Args:\n",
" features: pandas DataFrame of features\n",
" targets: pandas DataFrame of targets\n",
" batch_size: Size of batches to be passed to the model\n",
" shuffle: True or False. Whether to shuffle the data.\n",
" num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely\n",
" Returns:\n",
" Tuple of (features, labels) for next data batch\n",
" \"\"\"\n",
" \n",
" # Convert pandas data into a dict of np arrays.\n",
" features = {key:np.array(value) for key,value in dict(features).items()} \n",
" \n",
" # Construct a dataset, and configure batching/repeating.\n",
" ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit\n",
" ds = ds.batch(batch_size).repeat(num_epochs)\n",
" \n",
" # Shuffle the data, if specified.\n",
" if shuffle:\n",
" ds = ds.shuffle(10000)\n",
" \n",
" # Return the next batch of data.\n",
" features, labels = ds.make_one_shot_iterator().get_next()\n",
" return features, labels"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "hweDyy31LBsV",
"colab_type": "text"
},
"source": [
"## FTRL Optimization Algorithm\n",
"\n",
"High dimensional linear models benefit from using a variant of gradient-based optimization called FTRL. This algorithm has the benefit of scaling the learning rate differently for different coefficients, which can be useful if some features rarely take non-zero values (it also is well suited to support L1 regularization). We can apply FTRL using the [FtrlOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/FtrlOptimizer)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "S0SBf1X1IK_O",
"colab_type": "code",
"colab": {}
},
"source": [
"def train_model(\n",
" learning_rate,\n",
" steps,\n",
" batch_size,\n",
" feature_columns,\n",
" training_examples,\n",
" training_targets,\n",
" validation_examples,\n",
" validation_targets):\n",
" \"\"\"Trains a linear regression model.\n",
" \n",
" In addition to training, this function also prints training progress information,\n",
" as well as a plot of the training and validation loss over time.\n",
" \n",
" Args:\n",
" learning_rate: A `float`, the learning rate.\n",
" steps: A non-zero `int`, the total number of training steps. A training step\n",
" consists of a forward and backward pass using a single batch.\n",
" feature_columns: A `set` specifying the input feature columns to use.\n",
" training_examples: A `DataFrame` containing one or more columns from\n",
" `california_housing_dataframe` to use as input features for training.\n",
" training_targets: A `DataFrame` containing exactly one column from\n",
" `california_housing_dataframe` to use as target for training.\n",
" validation_examples: A `DataFrame` containing one or more columns from\n",
" `california_housing_dataframe` to use as input features for validation.\n",
" validation_targets: A `DataFrame` containing exactly one column from\n",
" `california_housing_dataframe` to use as target for validation.\n",
" \n",
" Returns:\n",
" A `LinearRegressor` object trained on the training data.\n",
" \"\"\"\n",
"\n",
" periods = 10\n",
" steps_per_period = steps / periods\n",
"\n",
" # Create a linear regressor object.\n",
" my_optimizer = tf.train.FtrlOptimizer(learning_rate=learning_rate)\n",
" my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)\n",
" linear_regressor = tf.estimator.LinearRegressor(\n",
" feature_columns=feature_columns,\n",
" optimizer=my_optimizer\n",
" )\n",
" \n",
" training_input_fn = lambda: my_input_fn(training_examples, \n",
" training_targets[\"median_house_value\"], \n",
" batch_size=batch_size)\n",
" predict_training_input_fn = lambda: my_input_fn(training_examples, \n",
" training_targets[\"median_house_value\"], \n",
" num_epochs=1, \n",
" shuffle=False)\n",
" predict_validation_input_fn = lambda: my_input_fn(validation_examples, \n",
" validation_targets[\"median_house_value\"], \n",
" num_epochs=1, \n",
" shuffle=False)\n",
"\n",
" # Train the model, but do so inside a loop so that we can periodically assess\n",
" # loss metrics.\n",
" print(\"Training model...\")\n",
" print(\"RMSE (on training data):\")\n",
" training_rmse = []\n",
" validation_rmse = []\n",
" for period in range (0, periods):\n",
" # Train the model, starting from the prior state.\n",
" linear_regressor.train(\n",
" input_fn=training_input_fn,\n",
" steps=steps_per_period\n",
" )\n",
" # Take a break and compute predictions.\n",
" training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)\n",
" training_predictions = np.array([item['predictions'][0] for item in training_predictions])\n",
" validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)\n",
" validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])\n",
" \n",
" # Compute training and validation loss.\n",
" training_root_mean_squared_error = math.sqrt(\n",
" metrics.mean_squared_error(training_predictions, training_targets))\n",
" validation_root_mean_squared_error = math.sqrt(\n",
" metrics.mean_squared_error(validation_predictions, validation_targets))\n",
" # Occasionally print the current loss.\n",
" print(\" period %02d : %0.2f\" % (period, training_root_mean_squared_error))\n",
" # Add the loss metrics from this period to our list.\n",
" training_rmse.append(training_root_mean_squared_error)\n",
" validation_rmse.append(validation_root_mean_squared_error)\n",
" print(\"Model training finished.\")\n",
"\n",
" \n",
" # Output a graph of loss metrics over periods.\n",
" plt.ylabel(\"RMSE\")\n",
" plt.xlabel(\"Periods\")\n",
" plt.title(\"Root Mean Squared Error vs. Periods\")\n",
" plt.tight_layout()\n",
" plt.plot(training_rmse, label=\"training\")\n",
" plt.plot(validation_rmse, label=\"validation\")\n",
" plt.legend()\n",
"\n",
" return linear_regressor"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1Cdr02tLIK_Q",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = train_model(\n",
" learning_rate=1.0,\n",
" steps=500,\n",
" batch_size=100,\n",
" feature_columns=construct_feature_columns(training_examples),\n",
" training_examples=training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "i4lGvqajDWlw",
"colab_type": "text"
},
"source": [
"## One-Hot Encoding for Discrete Features\n",
"\n",
"Discrete (i.e. strings, enumerations, integers) features are usually converted into families of binary features before training a logistic regression model.\n",
"\n",
"For example, suppose we created a synthetic feature that can take any of the values `0`, `1` or `2`, and that we have a few training points:\n",
"\n",
"| # | feature_value |\n",
"|---|---------------|\n",
"| 0 | 2 |\n",
"| 1 | 0 |\n",
"| 2 | 1 |\n",
"\n",
"For each possible categorical value, we make a new **binary** feature of **real values** that can take one of just two possible values: 1.0 if the example has that value, and 0.0 if not. In the example above, the categorical feature would be converted into three features, and the training points now look like:\n",
"\n",
"| # | feature_value_0 | feature_value_1 | feature_value_2 |\n",
"|---|-----------------|-----------------|-----------------|\n",
"| 0 | 0.0 | 0.0 | 1.0 |\n",
"| 1 | 1.0 | 0.0 | 0.0 |\n",
"| 2 | 0.0 | 1.0 | 0.0 |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KnssXowblKm7",
"colab_type": "text"
},
"source": [
"## Bucketized (Binned) Features\n",
"\n",
"Bucketization is also known as binning.\n",
"\n",
"We can bucketize `population` into the following 3 buckets (for instance):\n",
"- `bucket_0` (`< 5000`): corresponding to less populated blocks\n",
"- `bucket_1` (`5000 - 25000`): corresponding to mid populated blocks\n",
"- `bucket_2` (`> 25000`): corresponding to highly populated blocks\n",
"\n",
"Given the preceding bucket definitions, the following `population` vector:\n",
"\n",
" [[10001], [42004], [2500], [18000]]\n",
"\n",
"becomes the following bucketized feature vector:\n",
"\n",
" [[1], [2], [0], [1]]\n",
"\n",
"The feature values are now the bucket indices. Note that these indices are considered to be discrete features. Typically, these will be further converted in one-hot representations as above, but this is done transparently.\n",
"\n",
"To define feature columns for bucketized features, instead of using `numeric_column`, we can use [`bucketized_column`](https://www.tensorflow.org/api_docs/python/tf/feature_column/bucketized_column), which takes a numeric column as input and transforms it to a bucketized feature using the bucket boundaries specified in the `boundaries` argument. The following code defines bucketized feature columns for `households` and `longitude`; the `get_quantile_based_boundaries` function calculates boundaries based on quantiles, so that each bucket contains an equal number of elements."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cc9qZrtRy-ED",
"colab_type": "code",
"colab": {}
},
"source": [
"def get_quantile_based_boundaries(feature_values, num_buckets):\n",
" boundaries = np.arange(1.0, num_buckets) / num_buckets\n",
" quantiles = feature_values.quantile(boundaries)\n",
" return [quantiles[q] for q in quantiles.keys()]\n",
"\n",
"# Divide households into 7 buckets.\n",
"households = tf.feature_column.numeric_column(\"households\")\n",
"bucketized_households = tf.feature_column.bucketized_column(\n",
" households, boundaries=get_quantile_based_boundaries(\n",
" california_housing_dataframe[\"households\"], 7))\n",
"\n",
"# Divide longitude into 10 buckets.\n",
"longitude = tf.feature_column.numeric_column(\"longitude\")\n",
"bucketized_longitude = tf.feature_column.bucketized_column(\n",
" longitude, boundaries=get_quantile_based_boundaries(\n",
" california_housing_dataframe[\"longitude\"], 10))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "U-pQDAa0MeN3",
"colab_type": "text"
},
"source": [
"## Task 1: Train the Model on Bucketized Feature Columns\n",
"**Bucketize all the real valued features in our example, train the model and see if the results improve.**\n",
"\n",
"In the preceding code block, two real valued columns (namely `households` and `longitude`) have been transformed into bucketized feature columns. Your task is to bucketize the rest of the columns, then run the code to train the model. There are various heuristics to find the range of the buckets. This exercise uses a quantile-based technique, which chooses the bucket boundaries in such a way that each bucket has the same number of examples."
]
},
{
"cell_type": "code",
"metadata": {
"id": "YFXV9lyMLedy",
"colab_type": "code",
"colab": {}
},
"source": [
"def construct_feature_columns():\n",
" \"\"\"Construct the TensorFlow Feature Columns.\n",
"\n",
" Returns:\n",
" A set of feature columns\n",
" \"\"\" \n",
" households = tf.feature_column.numeric_column(\"households\")\n",
" longitude = tf.feature_column.numeric_column(\"longitude\")\n",
" latitude = tf.feature_column.numeric_column(\"latitude\")\n",
" housing_median_age = tf.feature_column.numeric_column(\"housing_median_age\")\n",
" median_income = tf.feature_column.numeric_column(\"median_income\")\n",
" rooms_per_person = tf.feature_column.numeric_column(\"rooms_per_person\")\n",
" \n",
" # Divide households into 7 buckets.\n",
" bucketized_households = tf.feature_column.bucketized_column(\n",
" households, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"households\"], 7))\n",
"\n",
" # Divide longitude into 10 buckets.\n",
" bucketized_longitude = tf.feature_column.bucketized_column(\n",
" longitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"longitude\"], 10))\n",
"\n",
" #\n",
" # YOUR CODE HERE: bucketize the following columns, following the example above:\n",
" #\n",
" bucketized_latitude = \n",
" bucketized_housing_median_age = \n",
" bucketized_median_income =\n",
" bucketized_rooms_per_person =\n",
" \n",
" feature_columns = set([\n",
" bucketized_longitude,\n",
" bucketized_latitude,\n",
" bucketized_housing_median_age,\n",
" bucketized_households,\n",
" bucketized_median_income,\n",
" bucketized_rooms_per_person])\n",
" \n",
" return feature_columns\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "0FfUytOTNJhL",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = train_model(\n",
" learning_rate=1.0,\n",
" steps=500,\n",
" batch_size=100,\n",
" feature_columns=construct_feature_columns(),\n",
" training_examples=training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZTDHHM61NPTw",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for a solution."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JQHnUhL_NRwA",
"colab_type": "text"
},
"source": [
"You may be wondering how to determine how many buckets to use. That is of course data-dependent. Here, we just selected arbitrary values so as to obtain a not-too-large model."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Ro5civQ3Ngh_",
"colab_type": "code",
"colab": {}
},
"source": [
"def construct_feature_columns():\n",
" \"\"\"Construct the TensorFlow Feature Columns.\n",
"\n",
" Returns:\n",
" A set of feature columns\n",
" \"\"\" \n",
" households = tf.feature_column.numeric_column(\"households\")\n",
" longitude = tf.feature_column.numeric_column(\"longitude\")\n",
" latitude = tf.feature_column.numeric_column(\"latitude\")\n",
" housing_median_age = tf.feature_column.numeric_column(\"housing_median_age\")\n",
" median_income = tf.feature_column.numeric_column(\"median_income\")\n",
" rooms_per_person = tf.feature_column.numeric_column(\"rooms_per_person\")\n",
" \n",
" # Divide households into 7 buckets.\n",
" bucketized_households = tf.feature_column.bucketized_column(\n",
" households, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"households\"], 7))\n",
"\n",
" # Divide longitude into 10 buckets.\n",
" bucketized_longitude = tf.feature_column.bucketized_column(\n",
" longitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"longitude\"], 10))\n",
" \n",
" # Divide latitude into 10 buckets.\n",
" bucketized_latitude = tf.feature_column.bucketized_column(\n",
" latitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"latitude\"], 10))\n",
"\n",
" # Divide housing_median_age into 7 buckets.\n",
" bucketized_housing_median_age = tf.feature_column.bucketized_column(\n",
" housing_median_age, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"housing_median_age\"], 7))\n",
" \n",
" # Divide median_income into 7 buckets.\n",
" bucketized_median_income = tf.feature_column.bucketized_column(\n",
" median_income, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"median_income\"], 7))\n",
" \n",
" # Divide rooms_per_person into 7 buckets.\n",
" bucketized_rooms_per_person = tf.feature_column.bucketized_column(\n",
" rooms_per_person, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"rooms_per_person\"], 7))\n",
" \n",
" feature_columns = set([\n",
" bucketized_longitude,\n",
" bucketized_latitude,\n",
" bucketized_housing_median_age,\n",
" bucketized_households,\n",
" bucketized_median_income,\n",
" bucketized_rooms_per_person])\n",
" \n",
" return feature_columns"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "RNgfYk6OO8Sy",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = train_model(\n",
" learning_rate=1.0,\n",
" steps=500,\n",
" batch_size=100,\n",
" feature_columns=construct_feature_columns(),\n",
" training_examples=training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "AFJ1qoZPlQcs",
"colab_type": "text"
},
"source": [
"## Feature Crosses\n",
"\n",
"Crossing two (or more) features is a clever way to learn non-linear relations using a linear model. In our problem, if we just use the feature `latitude` for learning, the model might learn that city blocks at a particular latitude (or within a particular range of latitudes since we have bucketized it) are more likely to be expensive than others. Similarly for the feature `longitude`. However, if we cross `longitude` by `latitude`, the crossed feature represents a well defined city block. If the model learns that certain city blocks (within range of latitudes and longitudes) are more likely to be more expensive than others, it is a stronger signal than two features considered individually.\n",
"\n",
"Currently, the feature columns API only supports discrete features for crosses. To cross two continuous values, like `latitude` or `longitude`, we can bucketize them.\n",
"\n",
"If we cross the `latitude` and `longitude` features (supposing, for example, that `longitude` was bucketized into `2` buckets, while `latitude` has `3` buckets), we actually get six crossed binary features. Each of these features will get its own separate weight when we train the model."
]
},
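{
"cell_type": "markdown",
"metadata": {
"id": "toyCrossSketchMD",
"colab_type": "text"
},
"source": [
"As a toy illustration (a framework-free sketch, not TensorFlow's internal implementation): with `2` longitude buckets and `3` latitude buckets, enumerating the combinations yields the six crossed features mentioned above. `crossed_column` additionally hashes each combination into one of `hash_bucket_size` slots."
]
},
{
"cell_type": "code",
"metadata": {
"id": "toyCrossSketch",
"colab_type": "code",
"colab": {}
},
"source": [
"# Toy sketch: enumerate the crossed features for 2 longitude buckets\n",
"# x 3 latitude buckets (2 * 3 = 6 combinations in total).\n",
"NUM_LON_BUCKETS = 2\n",
"NUM_LAT_BUCKETS = 3\n",
"HASH_BUCKET_SIZE = 1000  # crossed_column hashes combos into this many slots\n",
"\n",
"for lon_bucket in range(NUM_LON_BUCKETS):\n",
"  for lat_bucket in range(NUM_LAT_BUCKETS):\n",
"    crossed_id = lon_bucket * NUM_LAT_BUCKETS + lat_bucket\n",
"    # Stand-in for the hashing step (the real hash function differs):\n",
"    hashed_slot = hash((lon_bucket, lat_bucket)) % HASH_BUCKET_SIZE\n",
"    print(\"lon=%d, lat=%d -> crossed feature #%d (hashed slot %d)\"\n",
"          % (lon_bucket, lat_bucket, crossed_id, hashed_slot))"
],
"execution_count": 0,
"outputs": []
},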
{
"cell_type": "markdown",
"metadata": {
"id": "-Rk0c1oTYaVH",
"colab_type": "text"
},
"source": [
"## Task 2: Train the Model Using Feature Crosses\n",
"\n",
"**Add a feature cross of `longitude` and `latitude` to your model, train it, and determine whether the results improve.**\n",
"\n",
"Refer to the TensorFlow API docs for [`crossed_column()`](https://www.tensorflow.org/api_docs/python/tf/feature_column/crossed_column) to build the feature column for your cross. Use a `hash_bucket_size` of `1000`."
]
},
{
"cell_type": "code",
"metadata": {
"id": "-eYiVEGeYhUi",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"def construct_feature_columns():\n",
" \"\"\"Construct the TensorFlow Feature Columns.\n",
"\n",
" Returns:\n",
" A set of feature columns\n",
" \"\"\" \n",
" households = tf.feature_column.numeric_column(\"households\")\n",
" longitude = tf.feature_column.numeric_column(\"longitude\")\n",
" latitude = tf.feature_column.numeric_column(\"latitude\")\n",
" housing_median_age = tf.feature_column.numeric_column(\"housing_median_age\")\n",
" median_income = tf.feature_column.numeric_column(\"median_income\")\n",
" rooms_per_person = tf.feature_column.numeric_column(\"rooms_per_person\")\n",
" \n",
" # Divide households into 7 buckets.\n",
" bucketized_households = tf.feature_column.bucketized_column(\n",
" households, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"households\"], 7))\n",
"\n",
" # Divide longitude into 10 buckets.\n",
" bucketized_longitude = tf.feature_column.bucketized_column(\n",
" longitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"longitude\"], 10))\n",
" \n",
" # Divide latitude into 10 buckets.\n",
" bucketized_latitude = tf.feature_column.bucketized_column(\n",
" latitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"latitude\"], 10))\n",
"\n",
" # Divide housing_median_age into 7 buckets.\n",
" bucketized_housing_median_age = tf.feature_column.bucketized_column(\n",
" housing_median_age, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"housing_median_age\"], 7))\n",
" \n",
" # Divide median_income into 7 buckets.\n",
" bucketized_median_income = tf.feature_column.bucketized_column(\n",
" median_income, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"median_income\"], 7))\n",
" \n",
" # Divide rooms_per_person into 7 buckets.\n",
" bucketized_rooms_per_person = tf.feature_column.bucketized_column(\n",
" rooms_per_person, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"rooms_per_person\"], 7))\n",
" \n",
" # YOUR CODE HERE: Make a feature column for the long_x_lat feature cross\n",
" long_x_lat = \n",
" \n",
" feature_columns = set([\n",
" bucketized_longitude,\n",
" bucketized_latitude,\n",
" bucketized_housing_median_age,\n",
" bucketized_households,\n",
" bucketized_median_income,\n",
" bucketized_rooms_per_person,\n",
" long_x_lat])\n",
" \n",
" return feature_columns"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "xZuZMp3EShkM",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = train_model(\n",
" learning_rate=1.0,\n",
" steps=500,\n",
" batch_size=100,\n",
" feature_columns=construct_feature_columns(),\n",
" training_examples=training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "0i7vGo9PTaZl",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for the solution."
]
},
{
"cell_type": "code",
"metadata": {
"id": "3tAWu8qSTe2v",
"colab_type": "code",
"colab": {}
},
"source": [
"def construct_feature_columns():\n",
" \"\"\"Construct the TensorFlow Feature Columns.\n",
"\n",
" Returns:\n",
" A set of feature columns\n",
" \"\"\" \n",
" households = tf.feature_column.numeric_column(\"households\")\n",
" longitude = tf.feature_column.numeric_column(\"longitude\")\n",
" latitude = tf.feature_column.numeric_column(\"latitude\")\n",
" housing_median_age = tf.feature_column.numeric_column(\"housing_median_age\")\n",
" median_income = tf.feature_column.numeric_column(\"median_income\")\n",
" rooms_per_person = tf.feature_column.numeric_column(\"rooms_per_person\")\n",
" \n",
" # Divide households into 7 buckets.\n",
" bucketized_households = tf.feature_column.bucketized_column(\n",
" households, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"households\"], 7))\n",
"\n",
" # Divide longitude into 10 buckets.\n",
" bucketized_longitude = tf.feature_column.bucketized_column(\n",
" longitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"longitude\"], 10))\n",
" \n",
" # Divide latitude into 10 buckets.\n",
" bucketized_latitude = tf.feature_column.bucketized_column(\n",
" latitude, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"latitude\"], 10))\n",
"\n",
" # Divide housing_median_age into 7 buckets.\n",
" bucketized_housing_median_age = tf.feature_column.bucketized_column(\n",
" housing_median_age, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"housing_median_age\"], 7))\n",
" \n",
" # Divide median_income into 7 buckets.\n",
" bucketized_median_income = tf.feature_column.bucketized_column(\n",
" median_income, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"median_income\"], 7))\n",
" \n",
" # Divide rooms_per_person into 7 buckets.\n",
" bucketized_rooms_per_person = tf.feature_column.bucketized_column(\n",
" rooms_per_person, boundaries=get_quantile_based_boundaries(\n",
" training_examples[\"rooms_per_person\"], 7))\n",
" \n",
" # YOUR CODE HERE: Make a feature column for the long_x_lat feature cross\n",
" long_x_lat = tf.feature_column.crossed_column(\n",
" set([bucketized_longitude, bucketized_latitude]), hash_bucket_size=1000) \n",
" \n",
" feature_columns = set([\n",
" bucketized_longitude,\n",
" bucketized_latitude,\n",
" bucketized_housing_median_age,\n",
" bucketized_households,\n",
" bucketized_median_income,\n",
" bucketized_rooms_per_person,\n",
" long_x_lat])\n",
" \n",
" return feature_columns"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-_vvNYIyTtPC",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = train_model(\n",
" learning_rate=1.0,\n",
" steps=500,\n",
" batch_size=100,\n",
" feature_columns=construct_feature_columns(),\n",
" training_examples=training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ymlHJ-vrhLZw",
"colab_type": "text"
},
"source": [
"## Optional Challenge: Try Out More Synthetic Features\n",
"\n",
"So far, we've tried simple bucketized columns and feature crosses, but there are many more combinations that could potentially improve the results. For example, you could cross multiple columns. What happens if you vary the number of buckets? What other synthetic features can you think of? Do they improve the model?"
]
}
]
}

View file

@ -0,0 +1,661 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "feature_sets.ipynb",
"provenance": [],
"collapsed_sections": [
"JndnmDMp66FL",
"IGINhMIJ5Wyt",
"pZa8miwu6_tQ"
]
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "JndnmDMp66FL",
"colab_type": "text"
},
"source": [
"#### Copyright 2017 Google LLC."
]
},
{
"cell_type": "code",
"metadata": {
"id": "hMqWDc_m6rUC",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zbIgBK-oXHO7",
"colab_type": "text"
},
"source": [
"# Feature Sets"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bL04rAQwH3pH",
"colab_type": "text"
},
"source": [
"**Learning Objective:** Create a minimal set of features that performs just as well as a more complex feature set"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F8Hci6tAH3pH",
"colab_type": "text"
},
"source": [
"So far, we've thrown all of our features into the model. Models with fewer features use fewer resources and are easier to maintain. Let's see if we can build a model on a minimal set of housing features that will perform equally as well as one that uses all the features in the data set."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F5ZjVwK_qOyR",
"colab_type": "text"
},
"source": [
"## Setup\n",
"\n",
"As before, let's load and prepare the California housing data."
]
},
{
"cell_type": "code",
"metadata": {
"id": "SrOYRILAH3pJ",
"colab_type": "code",
"colab": {}
},
"source": [
"from __future__ import print_function\n",
"\n",
"import math\n",
"\n",
"from IPython import display\n",
"from matplotlib import cm\n",
"from matplotlib import gridspec\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import metrics\n",
"import tensorflow as tf\n",
"from tensorflow.python.data import Dataset\n",
"\n",
"tf.logging.set_verbosity(tf.logging.ERROR)\n",
"pd.options.display.max_rows = 10\n",
"pd.options.display.float_format = '{:.1f}'.format\n",
"\n",
"california_housing_dataframe = pd.read_csv(\"https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv\", sep=\",\")\n",
"\n",
"california_housing_dataframe = california_housing_dataframe.reindex(\n",
" np.random.permutation(california_housing_dataframe.index))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "dGnXo7flH3pM",
"colab_type": "code",
"colab": {}
},
"source": [
"def preprocess_features(california_housing_dataframe):\n",
" \"\"\"Prepares input features from California housing data set.\n",
"\n",
" Args:\n",
" california_housing_dataframe: A Pandas DataFrame expected to contain data\n",
" from the California housing data set.\n",
" Returns:\n",
" A DataFrame that contains the features to be used for the model, including\n",
" synthetic features.\n",
" \"\"\"\n",
" selected_features = california_housing_dataframe[\n",
" [\"latitude\",\n",
" \"longitude\",\n",
" \"housing_median_age\",\n",
" \"total_rooms\",\n",
" \"total_bedrooms\",\n",
" \"population\",\n",
" \"households\",\n",
" \"median_income\"]]\n",
" processed_features = selected_features.copy()\n",
" # Create a synthetic feature.\n",
" processed_features[\"rooms_per_person\"] = (\n",
" california_housing_dataframe[\"total_rooms\"] /\n",
" california_housing_dataframe[\"population\"])\n",
" return processed_features\n",
"\n",
"def preprocess_targets(california_housing_dataframe):\n",
" \"\"\"Prepares target features (i.e., labels) from California housing data set.\n",
"\n",
" Args:\n",
" california_housing_dataframe: A Pandas DataFrame expected to contain data\n",
" from the California housing data set.\n",
" Returns:\n",
" A DataFrame that contains the target feature.\n",
" \"\"\"\n",
" output_targets = pd.DataFrame()\n",
" # Scale the target to be in units of thousands of dollars.\n",
" output_targets[\"median_house_value\"] = (\n",
" california_housing_dataframe[\"median_house_value\"] / 1000.0)\n",
" return output_targets"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jLXC8y4AqsIy",
"colab_type": "code",
"colab": {}
},
"source": [
"# Choose the first 12000 (out of 17000) examples for training.\n",
"training_examples = preprocess_features(california_housing_dataframe.head(12000))\n",
"training_targets = preprocess_targets(california_housing_dataframe.head(12000))\n",
"\n",
"# Choose the last 5000 (out of 17000) examples for validation.\n",
"validation_examples = preprocess_features(california_housing_dataframe.tail(5000))\n",
"validation_targets = preprocess_targets(california_housing_dataframe.tail(5000))\n",
"\n",
"# Double-check that we've done the right thing.\n",
"print(\"Training examples summary:\")\n",
"display.display(training_examples.describe())\n",
"print(\"Validation examples summary:\")\n",
"display.display(validation_examples.describe())\n",
"\n",
"print(\"Training targets summary:\")\n",
"display.display(training_targets.describe())\n",
"print(\"Validation targets summary:\")\n",
"display.display(validation_targets.describe())"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "hLvmkugKLany",
"colab_type": "text"
},
"source": [
"## Task 1: Develop a Good Feature Set\n",
"\n",
"**What's the best performance you can get with just 2 or 3 features?**\n",
"\n",
"A **correlation matrix** shows pairwise correlations, both for each feature compared to the target and for each feature compared to other features.\n",
"\n",
"Here, correlation is defined as the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient). You don't have to understand the mathematical details for this exercise.\n",
"\n",
"Correlation values have the following meanings:\n",
"\n",
" * `-1.0`: perfect negative correlation\n",
" * `0.0`: no correlation\n",
" * `1.0`: perfect positive correlation"
]
},
{
"cell_type": "code",
"metadata": {
"id": "UzoZUSdLIolF",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"correlation_dataframe = training_examples.copy()\n",
"correlation_dataframe[\"target\"] = training_targets[\"median_house_value\"]\n",
"\n",
"correlation_dataframe.corr()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RQpktkNpia2P",
"colab_type": "text"
},
"source": [
"Features that have strong positive or negative correlations with the target will add information to our model. We can use the correlation matrix to find such strongly correlated features.\n",
"\n",
"We'd also like to have features that aren't so strongly correlated with each other, so that they add independent information.\n",
"\n",
"Use this information to try removing features. You can also try developing additional synthetic features, such as ratios of two raw features.\n",
"\n",
"For convenience, we've included the training code from the previous exercise."
]
},
{
"cell_type": "code",
"metadata": {
"id": "bjR5jWpFr2xs",
"colab_type": "code",
"colab": {}
},
"source": [
"def construct_feature_columns(input_features):\n",
" \"\"\"Construct the TensorFlow Feature Columns.\n",
"\n",
" Args:\n",
" input_features: The names of the numerical input features to use.\n",
" Returns:\n",
" A set of feature columns\n",
" \"\"\" \n",
" return set([tf.feature_column.numeric_column(my_feature)\n",
" for my_feature in input_features])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jsvKHzRciH9T",
"colab_type": "code",
"colab": {}
},
"source": [
"def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):\n",
" \"\"\"Trains a linear regression model.\n",
" \n",
" Args:\n",
" features: pandas DataFrame of features\n",
" targets: pandas DataFrame of targets\n",
" batch_size: Size of batches to be passed to the model\n",
" shuffle: True or False. Whether to shuffle the data.\n",
" num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely\n",
" Returns:\n",
" Tuple of (features, labels) for next data batch\n",
" \"\"\"\n",
" \n",
" # Convert pandas data into a dict of np arrays.\n",
" features = {key:np.array(value) for key,value in dict(features).items()} \n",
" \n",
" # Construct a dataset, and configure batching/repeating.\n",
" ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit\n",
" ds = ds.batch(batch_size).repeat(num_epochs)\n",
"\n",
" # Shuffle the data, if specified.\n",
" if shuffle:\n",
" ds = ds.shuffle(10000)\n",
" \n",
" # Return the next batch of data.\n",
" features, labels = ds.make_one_shot_iterator().get_next()\n",
" return features, labels"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "g3kjQV9WH3pb",
"colab_type": "code",
"colab": {}
},
"source": [
"def train_model(\n",
" learning_rate,\n",
" steps,\n",
" batch_size,\n",
" training_examples,\n",
" training_targets,\n",
" validation_examples,\n",
" validation_targets):\n",
" \"\"\"Trains a linear regression model.\n",
" \n",
" In addition to training, this function also prints training progress information,\n",
" as well as a plot of the training and validation loss over time.\n",
" \n",
" Args:\n",
" learning_rate: A `float`, the learning rate.\n",
" steps: A non-zero `int`, the total number of training steps. A training step\n",
" consists of a forward and backward pass using a single batch.\n",
" batch_size: A non-zero `int`, the batch size.\n",
" training_examples: A `DataFrame` containing one or more columns from\n",
" `california_housing_dataframe` to use as input features for training.\n",
" training_targets: A `DataFrame` containing exactly one column from\n",
" `california_housing_dataframe` to use as target for training.\n",
" validation_examples: A `DataFrame` containing one or more columns from\n",
" `california_housing_dataframe` to use as input features for validation.\n",
" validation_targets: A `DataFrame` containing exactly one column from\n",
" `california_housing_dataframe` to use as target for validation.\n",
" \n",
" Returns:\n",
" A `LinearRegressor` object trained on the training data.\n",
" \"\"\"\n",
"\n",
" periods = 10\n",
" steps_per_period = steps / periods\n",
"\n",
" # Create a linear regressor object.\n",
" my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\n",
" my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)\n",
" linear_regressor = tf.estimator.LinearRegressor(\n",
" feature_columns=construct_feature_columns(training_examples),\n",
" optimizer=my_optimizer\n",
" )\n",
" \n",
" # Create input functions.\n",
" training_input_fn = lambda: my_input_fn(training_examples, \n",
" training_targets[\"median_house_value\"], \n",
" batch_size=batch_size)\n",
" predict_training_input_fn = lambda: my_input_fn(training_examples, \n",
" training_targets[\"median_house_value\"], \n",
" num_epochs=1, \n",
" shuffle=False)\n",
" predict_validation_input_fn = lambda: my_input_fn(validation_examples, \n",
" validation_targets[\"median_house_value\"], \n",
" num_epochs=1, \n",
" shuffle=False)\n",
"\n",
" # Train the model, but do so inside a loop so that we can periodically assess\n",
" # loss metrics.\n",
" print(\"Training model...\")\n",
" print(\"RMSE (on training data):\")\n",
" training_rmse = []\n",
" validation_rmse = []\n",
" for period in range (0, periods):\n",
" # Train the model, starting from the prior state.\n",
" linear_regressor.train(\n",
" input_fn=training_input_fn,\n",
" steps=steps_per_period,\n",
" )\n",
" # Take a break and compute predictions.\n",
" training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)\n",
" training_predictions = np.array([item['predictions'][0] for item in training_predictions])\n",
" \n",
" validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)\n",
" validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])\n",
" \n",
" # Compute training and validation loss.\n",
" training_root_mean_squared_error = math.sqrt(\n",
" metrics.mean_squared_error(training_predictions, training_targets))\n",
" validation_root_mean_squared_error = math.sqrt(\n",
" metrics.mean_squared_error(validation_predictions, validation_targets))\n",
" # Occasionally print the current loss.\n",
" print(\" period %02d : %0.2f\" % (period, training_root_mean_squared_error))\n",
" # Add the loss metrics from this period to our list.\n",
" training_rmse.append(training_root_mean_squared_error)\n",
" validation_rmse.append(validation_root_mean_squared_error)\n",
" print(\"Model training finished.\")\n",
"\n",
" \n",
" # Output a graph of loss metrics over periods.\n",
" plt.ylabel(\"RMSE\")\n",
" plt.xlabel(\"Periods\")\n",
" plt.title(\"Root Mean Squared Error vs. Periods\")\n",
" plt.tight_layout()\n",
" plt.plot(training_rmse, label=\"training\")\n",
" plt.plot(validation_rmse, label=\"validation\")\n",
" plt.legend()\n",
"\n",
" return linear_regressor"
],
"execution_count": 0,
"outputs": []
},
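{
"cell_type": "markdown",
"metadata": {
"id": "ratioScreenMD",
"colab_type": "text"
},
"source": [
"Before the timed search below, one quick way to screen a candidate synthetic feature is to check its correlation with the target. As an illustration, the next cell tries a `bedrooms_per_room` ratio; the feature and its name are just an example, not necessarily one worth keeping."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ratioScreen",
"colab_type": "code",
"colab": {}
},
"source": [
"# Screen a candidate synthetic feature (illustrative only): the share\n",
"# of bedrooms among all rooms in a block.\n",
"bedrooms_per_room = (training_examples[\"total_bedrooms\"] /\n",
"                     training_examples[\"total_rooms\"])\n",
"print(\"Correlation with the target: %0.2f\" %\n",
"      bedrooms_per_room.corr(training_targets[\"median_house_value\"]))"
],
"execution_count": 0,
"outputs": []
},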
{
"cell_type": "markdown",
"metadata": {
"id": "varLu7RNH3pf",
"colab_type": "text"
},
"source": [
"Spend 5 minutes searching for a good set of features and training parameters. Then check the solution to see what we chose. Don't forget that different features may require different learning parameters."
]
},
{
"cell_type": "code",
"metadata": {
"id": "DSgUxRIlH3pg",
"colab_type": "code",
"colab": {}
},
"source": [
"#\n",
"# Your code here: add your features of choice as a list of quoted strings.\n",
"#\n",
"minimal_features = [\n",
"]\n",
"\n",
"assert minimal_features, \"You must select at least one feature!\"\n",
"\n",
"minimal_training_examples = training_examples[minimal_features]\n",
"minimal_validation_examples = validation_examples[minimal_features]\n",
"\n",
"#\n",
"# Don't forget to adjust these parameters.\n",
"#\n",
"train_model(\n",
" learning_rate=0.001,\n",
" steps=500,\n",
" batch_size=5,\n",
" training_examples=minimal_training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=minimal_validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "IGINhMIJ5Wyt",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for a solution."
]
},
{
"cell_type": "code",
"metadata": {
"id": "BAGoXFPZ5ZE3",
"colab_type": "code",
"colab": {}
},
"source": [
"minimal_features = [\n",
" \"median_income\",\n",
" \"latitude\",\n",
"]\n",
"\n",
"minimal_training_examples = training_examples[minimal_features]\n",
"minimal_validation_examples = validation_examples[minimal_features]\n",
"\n",
"_ = train_model(\n",
" learning_rate=0.01,\n",
" steps=500,\n",
" batch_size=5,\n",
" training_examples=minimal_training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=minimal_validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RidI9YhKOiY2",
"colab_type": "text"
},
"source": [
"## Task 2: Make Better Use of Latitude\n",
"\n",
"Plotting `latitude` vs. `median_house_value` shows that there really isn't a linear relationship there.\n",
"\n",
"Instead, there are a couple of peaks, which roughly correspond to Los Angeles and San Francisco."
]
},
{
"cell_type": "code",
"metadata": {
"id": "hfGUKj2IR_F1",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"plt.scatter(training_examples[\"latitude\"], training_targets[\"median_house_value\"])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6N0p91k2iFCP",
"colab_type": "text"
},
"source": [
"**Try creating some synthetic features that do a better job with latitude.**\n",
"\n",
"For example, you could have a feature that maps `latitude` to a value of `|latitude - 38|`, and call this `distance_from_san_francisco`.\n",
"\n",
"Or you could break the space into 10 different buckets. `latitude_32_to_33`, `latitude_33_to_34`, etc., each showing a value of `1.0` if `latitude` is within that bucket range and a value of `0.0` otherwise.\n",
"\n",
"Use the correlation matrix to help guide development, and then add them to your model if you find something that looks good.\n",
"\n",
"What's the best validation performance you can get?"
]
},
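{
"cell_type": "markdown",
"metadata": {
"id": "latDistanceSketchMD",
"colab_type": "text"
},
"source": [
"For instance, here is a minimal sketch of the first suggestion above: compute `distance_from_san_francisco` as `|latitude - 38|` and screen its correlation with the target before training on it."
]
},
{
"cell_type": "code",
"metadata": {
"id": "latDistanceSketch",
"colab_type": "code",
"colab": {}
},
"source": [
"# Sketch of the |latitude - 38| idea from the text (illustrative only).\n",
"distance_from_san_francisco = (training_examples[\"latitude\"] - 38).abs()\n",
"print(\"Correlation with the target: %0.2f\" %\n",
"      distance_from_san_francisco.corr(\n",
"          training_targets[\"median_house_value\"]))"
],
"execution_count": 0,
"outputs": []
},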
{
"cell_type": "code",
"metadata": {
"id": "wduJ2B28yMFl",
"colab_type": "code",
"cellView": "form",
"colab": {}
},
"source": [
"#\n",
"# YOUR CODE HERE: Train on a new data set that includes synthetic features based on latitude.\n",
"#"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "pZa8miwu6_tQ",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for a solution."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PzABdyjq7IZU",
"colab_type": "text"
},
"source": [
"Aside from `latitude`, we'll also keep `median_income`, to compare with the previous results.\n",
"\n",
"We decided to bucketize the latitude. This is fairly straightforward in Pandas using `Series.apply`."
]
},
{
"cell_type": "code",
"metadata": {
"id": "xdVF8siZ7Lup",
"colab_type": "code",
"colab": {}
},
"source": [
"def select_and_transform_features(source_df):\n",
" LATITUDE_RANGES = zip(range(32, 44), range(33, 45))\n",
" selected_examples = pd.DataFrame()\n",
" selected_examples[\"median_income\"] = source_df[\"median_income\"]\n",
" for r in LATITUDE_RANGES:\n",
" selected_examples[\"latitude_%d_to_%d\" % r] = source_df[\"latitude\"].apply(\n",
" lambda l: 1.0 if l >= r[0] and l < r[1] else 0.0)\n",
" return selected_examples\n",
"\n",
"selected_training_examples = select_and_transform_features(training_examples)\n",
"selected_validation_examples = select_and_transform_features(validation_examples)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "U4iAdY6t7Pkh",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = train_model(\n",
" learning_rate=0.01,\n",
" steps=500,\n",
" batch_size=5,\n",
" training_examples=selected_training_examples,\n",
" training_targets=training_targets,\n",
" validation_examples=selected_validation_examples,\n",
" validation_targets=validation_targets)"
],
"execution_count": 0,
"outputs": []
}
]
}

View file

@ -0,0 +1,973 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "first_steps_with_tensor_flow.ipynb",
"provenance": [],
"collapsed_sections": [
"JndnmDMp66FL",
"ajVM7rkoYXeL",
"ci1ISxxrZ7v0"
]
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "JndnmDMp66FL",
"colab_type": "text"
},
"source": [
"#### Copyright 2017 Google LLC."
]
},
{
"cell_type": "code",
"metadata": {
"id": "hMqWDc_m6rUC",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "4f3CKqFUqL2-",
"colab_type": "text"
},
"source": [
"# First Steps with TensorFlow"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Bd2Zkk1LE2Zr",
"colab_type": "text"
},
"source": [
"**Learning Objectives:**\n",
" * Learn fundamental TensorFlow concepts\n",
" * Use the `LinearRegressor` class in TensorFlow to predict median housing price, at the granularity of city blocks, based on one input feature\n",
" * Evaluate the accuracy of a model's predictions using Root Mean Squared Error (RMSE)\n",
" * Improve the accuracy of a model by tuning its hyperparameters"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MxiIKhP4E2Zr",
"colab_type": "text"
},
"source": [
"The [data](https://developers.google.com/machine-learning/crash-course/california-housing-data-description) is based on 1990 census data from California."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6TjLjL9IU80G",
"colab_type": "text"
},
"source": [
"## Setup\n",
"In this first cell, we'll load the necessary libraries."
]
},
{
"cell_type": "code",
"metadata": {
"id": "rVFf5asKE2Zt",
"colab_type": "code",
"colab": {}
},
"source": [
"from __future__ import print_function\n",
"\n",
"import math\n",
"\n",
"from IPython import display\n",
"from matplotlib import cm\n",
"from matplotlib import gridspec\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn import metrics\n",
"import tensorflow as tf\n",
"from tensorflow.python.data import Dataset\n",
"\n",
"tf.logging.set_verbosity(tf.logging.ERROR)\n",
"pd.options.display.max_rows = 10\n",
"pd.options.display.float_format = '{:.1f}'.format"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ipRyUHjhU80Q",
"colab_type": "text"
},
"source": [
"Next, we'll load our data set."
]
},
{
"cell_type": "code",
"metadata": {
"id": "9ivCDWnwE2Zx",
"colab_type": "code",
"colab": {}
},
"source": [
"california_housing_dataframe = pd.read_csv(\"https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv\", sep=\",\")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "vVk_qlG6U80j",
"colab_type": "text"
},
"source": [
"We'll randomize the data, just to be sure not to get any pathological ordering effects that might harm the performance of Stochastic Gradient Descent. Additionally, we'll scale `median_house_value` to be in units of thousands, so it can be learned a little more easily with learning rates in a range that we usually use."
]
},
{
"cell_type": "code",
"metadata": {
"id": "r0eVyguIU80m",
"colab_type": "code",
"colab": {}
},
"source": [
"california_housing_dataframe = california_housing_dataframe.reindex(\n",
" np.random.permutation(california_housing_dataframe.index))\n",
"california_housing_dataframe[\"median_house_value\"] /= 1000.0\n",
"california_housing_dataframe"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "HzzlSs3PtTmt",
"colab_type": "text"
},
"source": [
"## Examine the Data\n",
"\n",
"It's a good idea to get to know your data a little bit before you work with it.\n",
"\n",
"We'll print out a quick summary of a few useful statistics on each column: count of examples, mean, standard deviation, max, min, and various quantiles."
]
},
{
"cell_type": "code",
"metadata": {
"id": "gzb10yoVrydW",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"california_housing_dataframe.describe()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Lr6wYl2bt2Ep",
"colab_type": "text"
},
"source": [
"## Build the First Model\n",
"\n",
"In this exercise, we'll try to predict `median_house_value`, which will be our label (sometimes also called a target). We'll use `total_rooms` as our input feature.\n",
"\n",
"**NOTE:** Our data is at the city block level, so this feature represents the total number of rooms in that block.\n",
"\n",
"To train our model, we'll use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearRegressor) interface provided by the TensorFlow [Estimator](https://www.tensorflow.org/get_started/estimator) API. This API takes care of a lot of the low-level model plumbing, and exposes convenient methods for performing model training, evaluation, and inference."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0cpcsieFhsNI",
"colab_type": "text"
},
"source": [
"### Step 1: Define Features and Configure Feature Columns"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EL8-9d4ZJNR7",
"colab_type": "text"
},
"source": [
"In order to import our training data into TensorFlow, we need to specify what type of data each feature contains. There are two main types of data we'll use in this and future exercises:\n",
"\n",
"* **Categorical Data**: Data that is textual. In this exercise, our housing data set does not contain any categorical features, but examples you might see would be the home style, the words in a real-estate ad.\n",
"\n",
"* **Numerical Data**: Data that is a number (integer or float) and that you want to treat as a number. As we will discuss more later sometimes you might want to treat numerical data (e.g., a postal code) as if it were categorical.\n",
"\n",
"In TensorFlow, we indicate a feature's data type using a construct called a **feature column**. Feature columns store only a description of the feature data; they do not contain the feature data itself.\n",
"\n",
"To start, we're going to use just one numeric input feature, `total_rooms`. The following code pulls the `total_rooms` data from our `california_housing_dataframe` and defines the feature column using `numeric_column`, which specifies its data is numeric:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "rhEbFCZ86cDZ",
"colab_type": "code",
"colab": {}
},
"source": [
"# Define the input feature: total_rooms.\n",
"my_feature = california_housing_dataframe[[\"total_rooms\"]]\n",
"\n",
"# Configure a numeric feature column for total_rooms.\n",
"feature_columns = [tf.feature_column.numeric_column(\"total_rooms\")]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "K_3S8teX7Rd2",
"colab_type": "text"
},
"source": [
"**NOTE:** The shape of our `total_rooms` data is a one-dimensional array (a list of the total number of rooms for each block). This is the default shape for `numeric_column`, so we don't have to pass it as an argument."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UMl3qrU5MGV6",
"colab_type": "text"
},
"source": [
"### Step 2: Define the Target"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cw4nrfcB7kyk",
"colab_type": "text"
},
"source": [
"Next, we'll define our target, which is `median_house_value`. Again, we can pull it from our `california_housing_dataframe`:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "l1NvvNkH8Kbt",
"colab_type": "code",
"colab": {}
},
"source": [
"# Define the label.\n",
"targets = california_housing_dataframe[\"median_house_value\"]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "4M-rTFHL2UkA",
"colab_type": "text"
},
"source": [
"### Step 3: Configure the LinearRegressor"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fUfGQUNp7jdL",
"colab_type": "text"
},
"source": [
"Next, we'll configure a linear regression model using LinearRegressor. We'll train this model using the `GradientDescentOptimizer`, which implements Mini-Batch Stochastic Gradient Descent (SGD). The `learning_rate` argument controls the size of the gradient step.\n",
"\n",
"**NOTE:** To be safe, we also apply [gradient clipping](https://developers.google.com/machine-learning/glossary/#gradient_clipping) to our optimizer via `clip_gradients_by_norm`. Gradient clipping ensures the magnitude of the gradients do not become too large during training, which can cause gradient descent to fail. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "ubhtW-NGU802",
"colab_type": "code",
"colab": {}
},
"source": [
"# Use gradient descent as the optimizer for training the model.\n",
"my_optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.0000001)\n",
"my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)\n",
"\n",
"# Configure the linear regression model with our feature columns and optimizer.\n",
"# Set a learning rate of 0.0000001 for Gradient Descent.\n",
"linear_regressor = tf.estimator.LinearRegressor(\n",
" feature_columns=feature_columns,\n",
" optimizer=my_optimizer\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "-0IztwdK2f3F",
"colab_type": "text"
},
"source": [
"### Step 4: Define the Input Function"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "S5M5j6xSCHxx",
"colab_type": "text"
},
"source": [
"To import our California housing data into our `LinearRegressor`, we need to define an input function, which instructs TensorFlow how to preprocess\n",
"the data, as well as how to batch, shuffle, and repeat it during model training.\n",
"\n",
"First, we'll convert our *pandas* feature data into a dict of NumPy arrays. We can then use the TensorFlow [Dataset API](https://www.tensorflow.org/programmers_guide/datasets) to construct a dataset object from our data, and then break\n",
"our data into batches of `batch_size`, to be repeated for the specified number of epochs (num_epochs). \n",
"\n",
"**NOTE:** When the default value of `num_epochs=None` is passed to `repeat()`, the input data will be repeated indefinitely.\n",
"\n",
"Next, if `shuffle` is set to `True`, we'll shuffle the data so that it's passed to the model randomly during training. The `buffer_size` argument specifies\n",
"the size of the dataset from which `shuffle` will randomly sample.\n",
"\n",
"Finally, our input function constructs an iterator for the dataset and returns the next batch of data to the LinearRegressor."
]
},
{
"cell_type": "code",
"metadata": {
"id": "RKZ9zNcHJtwc",
"colab_type": "code",
"colab": {}
},
"source": [
"def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):\n",
" \"\"\"Trains a linear regression model of one feature.\n",
" \n",
" Args:\n",
" features: pandas DataFrame of features\n",
" targets: pandas DataFrame of targets\n",
" batch_size: Size of batches to be passed to the model\n",
" shuffle: True or False. Whether to shuffle the data.\n",
" num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely\n",
" Returns:\n",
" Tuple of (features, labels) for next data batch\n",
" \"\"\"\n",
" \n",
" # Convert pandas data into a dict of np arrays.\n",
" features = {key:np.array(value) for key,value in dict(features).items()} \n",
" \n",
" # Construct a dataset, and configure batching/repeating.\n",
" ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit\n",
" ds = ds.batch(batch_size).repeat(num_epochs)\n",
" \n",
" # Shuffle the data, if specified.\n",
" if shuffle:\n",
" ds = ds.shuffle(buffer_size=10000)\n",
" \n",
" # Return the next batch of data.\n",
" features, labels = ds.make_one_shot_iterator().get_next()\n",
" return features, labels"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wwa6UeA1V5F_",
"colab_type": "text"
},
"source": [
"**NOTE:** We'll continue to use this same input function in later exercises. For more\n",
"detailed documentation of input functions and the `Dataset` API, see the [TensorFlow Programmer's Guide](https://www.tensorflow.org/programmers_guide/datasets)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4YS50CQb2ooO",
"colab_type": "text"
},
"source": [
"### Step 5: Train the Model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yP92XkzhU803",
"colab_type": "text"
},
"source": [
"We can now call `train()` on our `linear_regressor` to train the model. We'll wrap `my_input_fn` in a `lambda`\n",
"so we can pass in `my_feature` and `targets` as arguments (see this [TensorFlow input function tutorial](https://www.tensorflow.org/get_started/input_fn#passing_input_fn_data_to_your_model) for more details), and to start, we'll\n",
"train for 100 steps."
]
},
{
"cell_type": "code",
"metadata": {
"id": "5M-Kt6w8U803",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = linear_regressor.train(\n",
" input_fn = lambda:my_input_fn(my_feature, targets),\n",
" steps=100\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7Nwxqxlx2sOv",
"colab_type": "text"
},
"source": [
"### Step 6: Evaluate the Model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KoDaF2dlJQG5",
"colab_type": "text"
},
"source": [
"Let's make predictions on that training data, to see how well our model fit it during training.\n",
"\n",
"**NOTE:** Training error measures how well your model fits the training data, but it **_does not_** measure how well your model **_generalizes to new data_**. In later exercises, you'll explore how to split your data to evaluate your model's ability to generalize.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "pDIxp6vcU809",
"colab_type": "code",
"colab": {}
},
"source": [
"# Create an input function for predictions.\n",
"# Note: Since we're making just one prediction for each example, we don't \n",
"# need to repeat or shuffle the data here.\n",
"prediction_input_fn =lambda: my_input_fn(my_feature, targets, num_epochs=1, shuffle=False)\n",
"\n",
"# Call predict() on the linear_regressor to make predictions.\n",
"predictions = linear_regressor.predict(input_fn=prediction_input_fn)\n",
"\n",
"# Format predictions as a NumPy array, so we can calculate error metrics.\n",
"predictions = np.array([item['predictions'][0] for item in predictions])\n",
"\n",
"# Print Mean Squared Error and Root Mean Squared Error.\n",
"mean_squared_error = metrics.mean_squared_error(predictions, targets)\n",
"root_mean_squared_error = math.sqrt(mean_squared_error)\n",
"print(\"Mean Squared Error (on training data): %0.3f\" % mean_squared_error)\n",
"print(\"Root Mean Squared Error (on training data): %0.3f\" % root_mean_squared_error)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "AKWstXXPzOVz",
"colab_type": "text"
},
"source": [
"Is this a good model? How would you judge how large this error is?\n",
"\n",
"Mean Squared Error (MSE) can be hard to interpret, so we often look at Root Mean Squared Error (RMSE)\n",
"instead. A nice property of RMSE is that it can be interpreted on the same scale as the original targets.\n",
"\n",
"Let's compare the RMSE to the difference of the min and max of our targets:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "7UwqGbbxP53O",
"colab_type": "code",
"colab": {}
},
"source": [
"min_house_value = california_housing_dataframe[\"median_house_value\"].min()\n",
"max_house_value = california_housing_dataframe[\"median_house_value\"].max()\n",
"min_max_difference = max_house_value - min_house_value\n",
"\n",
"print(\"Min. Median House Value: %0.3f\" % min_house_value)\n",
"print(\"Max. Median House Value: %0.3f\" % max_house_value)\n",
"print(\"Difference between Min. and Max.: %0.3f\" % min_max_difference)\n",
"print(\"Root Mean Squared Error: %0.3f\" % root_mean_squared_error)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "JigJr0C7Pzit",
"colab_type": "text"
},
"source": [
"Our error spans nearly half the range of the target values. Can we do better?\n",
"\n",
"This is the question that nags at every model developer. Let's develop some basic strategies to reduce model error.\n",
"\n",
"The first thing we can do is take a look at how well our predictions match our targets, in terms of overall summary statistics."
]
},
{
"cell_type": "code",
"metadata": {
"id": "941nclxbzqGH",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"calibration_data = pd.DataFrame()\n",
"calibration_data[\"predictions\"] = pd.Series(predictions)\n",
"calibration_data[\"targets\"] = pd.Series(targets)\n",
"calibration_data.describe()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "E2-bf8Hq36y8",
"colab_type": "text"
},
"source": [
"Okay, maybe this information is helpful. How does the mean value compare to the model's RMSE? How about the various quantiles?\n",
"\n",
"We can also visualize the data and the line we've learned. Recall that linear regression on a single feature can be drawn as a line mapping input *x* to output *y*.\n",
"\n",
"First, we'll get a uniform random sample of the data so we can make a readable scatter plot."
]
},
{
"cell_type": "code",
"metadata": {
"id": "SGRIi3mAU81H",
"colab_type": "code",
"colab": {}
},
"source": [
"sample = california_housing_dataframe.sample(n=300)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "N-JwuJBKU81J",
"colab_type": "text"
},
"source": [
"Next, we'll plot the line we've learned, drawing from the model's bias term and feature weight, together with the scatter plot. The line will show up red."
]
},
{
"cell_type": "code",
"metadata": {
"id": "7G12E76-339G",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"# Get the min and max total_rooms values.\n",
"x_0 = sample[\"total_rooms\"].min()\n",
"x_1 = sample[\"total_rooms\"].max()\n",
"\n",
"# Retrieve the final weight and bias generated during training.\n",
"weight = linear_regressor.get_variable_value('linear/linear_model/total_rooms/weights')[0]\n",
"bias = linear_regressor.get_variable_value('linear/linear_model/bias_weights')\n",
"\n",
"# Get the predicted median_house_values for the min and max total_rooms values.\n",
"y_0 = weight * x_0 + bias \n",
"y_1 = weight * x_1 + bias\n",
"\n",
"# Plot our regression line from (x_0, y_0) to (x_1, y_1).\n",
"plt.plot([x_0, x_1], [y_0, y_1], c='r')\n",
"\n",
"# Label the graph axes.\n",
"plt.ylabel(\"median_house_value\")\n",
"plt.xlabel(\"total_rooms\")\n",
"\n",
"# Plot a scatter plot from our data sample.\n",
"plt.scatter(sample[\"total_rooms\"], sample[\"median_house_value\"])\n",
"\n",
"# Display graph.\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "t0lRt4USU81L",
"colab_type": "text"
},
"source": [
"This initial line looks way off. See if you can look back at the summary stats and see the same information encoded there.\n",
"\n",
"Together, these initial sanity checks suggest we may be able to find a much better line."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AZWF67uv0HTG",
"colab_type": "text"
},
"source": [
"## Tweak the Model Hyperparameters\n",
"For this exercise, we've put all the above code in a single function for convenience. You can call the function with different parameters to see the effect.\n",
"\n",
"In this function, we'll proceed in 10 evenly divided periods so that we can observe the model improvement at each period.\n",
"\n",
"For each period, we'll compute and graph training loss. This may help you judge when a model is converged, or if it needs more iterations.\n",
"\n",
"We'll also plot the feature weight and bias term values learned by the model over time. This is another way to see how things converge."
]
},
{
"cell_type": "code",
"metadata": {
"id": "wgSMeD5UU81N",
"colab_type": "code",
"colab": {}
},
"source": [
"def train_model(learning_rate, steps, batch_size, input_feature=\"total_rooms\"):\n",
" \"\"\"Trains a linear regression model of one feature.\n",
" \n",
" Args:\n",
" learning_rate: A `float`, the learning rate.\n",
" steps: A non-zero `int`, the total number of training steps. A training step\n",
" consists of a forward and backward pass using a single batch.\n",
" batch_size: A non-zero `int`, the batch size.\n",
" input_feature: A `string` specifying a column from `california_housing_dataframe`\n",
" to use as input feature.\n",
" \"\"\"\n",
" \n",
" periods = 10\n",
" steps_per_period = steps / periods\n",
"\n",
" my_feature = input_feature\n",
" my_feature_data = california_housing_dataframe[[my_feature]]\n",
" my_label = \"median_house_value\"\n",
" targets = california_housing_dataframe[my_label]\n",
"\n",
" # Create feature columns.\n",
" feature_columns = [tf.feature_column.numeric_column(my_feature)]\n",
" \n",
" # Create input functions.\n",
" training_input_fn = lambda:my_input_fn(my_feature_data, targets, batch_size=batch_size)\n",
" prediction_input_fn = lambda: my_input_fn(my_feature_data, targets, num_epochs=1, shuffle=False)\n",
" \n",
" # Create a linear regressor object.\n",
" my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\n",
" my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)\n",
" linear_regressor = tf.estimator.LinearRegressor(\n",
" feature_columns=feature_columns,\n",
" optimizer=my_optimizer\n",
" )\n",
"\n",
" # Set up to plot the state of our model's line each period.\n",
" plt.figure(figsize=(15, 6))\n",
" plt.subplot(1, 2, 1)\n",
" plt.title(\"Learned Line by Period\")\n",
" plt.ylabel(my_label)\n",
" plt.xlabel(my_feature)\n",
" sample = california_housing_dataframe.sample(n=300)\n",
" plt.scatter(sample[my_feature], sample[my_label])\n",
" colors = [cm.coolwarm(x) for x in np.linspace(-1, 1, periods)]\n",
"\n",
" # Train the model, but do so inside a loop so that we can periodically assess\n",
" # loss metrics.\n",
" print(\"Training model...\")\n",
" print(\"RMSE (on training data):\")\n",
" root_mean_squared_errors = []\n",
" for period in range (0, periods):\n",
" # Train the model, starting from the prior state.\n",
" linear_regressor.train(\n",
" input_fn=training_input_fn,\n",
" steps=steps_per_period\n",
" )\n",
" # Take a break and compute predictions.\n",
" predictions = linear_regressor.predict(input_fn=prediction_input_fn)\n",
" predictions = np.array([item['predictions'][0] for item in predictions])\n",
" \n",
" # Compute loss.\n",
" root_mean_squared_error = math.sqrt(\n",
" metrics.mean_squared_error(predictions, targets))\n",
" # Occasionally print the current loss.\n",
" print(\" period %02d : %0.2f\" % (period, root_mean_squared_error))\n",
" # Add the loss metrics from this period to our list.\n",
" root_mean_squared_errors.append(root_mean_squared_error)\n",
" # Finally, track the weights and biases over time.\n",
" # Apply some math to ensure that the data and line are plotted neatly.\n",
" y_extents = np.array([0, sample[my_label].max()])\n",
" \n",
" weight = linear_regressor.get_variable_value('linear/linear_model/%s/weights' % input_feature)[0]\n",
" bias = linear_regressor.get_variable_value('linear/linear_model/bias_weights')\n",
"\n",
" x_extents = (y_extents - bias) / weight\n",
" x_extents = np.maximum(np.minimum(x_extents,\n",
" sample[my_feature].max()),\n",
" sample[my_feature].min())\n",
" y_extents = weight * x_extents + bias\n",
" plt.plot(x_extents, y_extents, color=colors[period]) \n",
" print(\"Model training finished.\")\n",
"\n",
" # Output a graph of loss metrics over periods.\n",
" plt.subplot(1, 2, 2)\n",
" plt.ylabel('RMSE')\n",
" plt.xlabel('Periods')\n",
" plt.title(\"Root Mean Squared Error vs. Periods\")\n",
" plt.tight_layout()\n",
" plt.plot(root_mean_squared_errors)\n",
"\n",
" # Output a table with calibration data.\n",
" calibration_data = pd.DataFrame()\n",
" calibration_data[\"predictions\"] = pd.Series(predictions)\n",
" calibration_data[\"targets\"] = pd.Series(targets)\n",
" display.display(calibration_data.describe())\n",
"\n",
" print(\"Final RMSE (on training data): %0.2f\" % root_mean_squared_error)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kg8A4ArBU81Q",
"colab_type": "text"
},
"source": [
"## Task 1: Achieve an RMSE of 180 or Below\n",
"\n",
"Tweak the model hyperparameters to improve loss and better match the target distribution.\n",
"If, after 5 minutes or so, you're having trouble beating a RMSE of 180, check the solution for a possible combination."
]
},
{
"cell_type": "code",
"metadata": {
"id": "UzoZUSdLIolF",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"train_model(\n",
" learning_rate=0.00001,\n",
" steps=100,\n",
" batch_size=1\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ajVM7rkoYXeL",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for one possible solution."
]
},
{
"cell_type": "code",
"metadata": {
"id": "T3zmldDwYy5c",
"colab_type": "code",
"colab": {}
},
"source": [
"train_model(\n",
" learning_rate=0.00002,\n",
" steps=500,\n",
" batch_size=5\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "M8H0_D4vYa49",
"colab_type": "text"
},
"source": [
"This is just one possible configuration; there may be other combinations of settings that also give good results. Note that in general, this exercise isn't about finding the *one best* setting, but to help build your intutions about how tweaking the model configuration affects prediction quality."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QU5sLyYTqzqL",
"colab_type": "text"
},
"source": [
"### Is There a Standard Heuristic for Model Tuning?\n",
"\n",
"This is a commonly asked question. The short answer is that the effects of different hyperparameters are data dependent. So there are no hard-and-fast rules; you'll need to test on your data.\n",
"\n",
"That said, here are a few rules of thumb that may help guide you:\n",
"\n",
" * Training error should steadily decrease, steeply at first, and should eventually plateau as training converges.\n",
" * If the training has not converged, try running it for longer.\n",
" * If the training error decreases too slowly, increasing the learning rate may help it decrease faster.\n",
" * But sometimes the exact opposite may happen if the learning rate is too high.\n",
" * If the training error varies wildly, try decreasing the learning rate.\n",
" * Lower learning rate plus larger number of steps or larger batch size is often a good combination.\n",
" * Very small batch sizes can also cause instability. First try larger values like 100 or 1000, and decrease until you see degradation.\n",
"\n",
"Again, never go strictly by these rules of thumb, because the effects are data dependent. Always experiment and verify."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GpV-uF_cBCBU",
"colab_type": "text"
},
"source": [
"## Task 2: Try a Different Feature\n",
"\n",
"See if you can do any better by replacing the `total_rooms` feature with the `population` feature.\n",
"\n",
"Don't take more than 5 minutes on this portion."
]
},
{
"cell_type": "code",
"metadata": {
"id": "YMyOxzb0ZlAH",
"colab_type": "code",
"colab": {}
},
"source": [
"# YOUR CODE HERE"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ci1ISxxrZ7v0",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for one possible solution."
]
},
{
"cell_type": "code",
"metadata": {
"id": "SjdQQCduZ7BV",
"colab_type": "code",
"colab": {}
},
"source": [
"train_model(\n",
" learning_rate=0.00002,\n",
" steps=1000,\n",
" batch_size=5,\n",
" input_feature=\"population\"\n",
")"
],
"execution_count": 0,
"outputs": []
}
]
}

@@ -0,0 +1,648 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "intro_to_pandas.ipynb",
"provenance": [],
"collapsed_sections": [
"JndnmDMp66FL",
"YHIWvc9Ms-Ll",
"TJffr5_Jwqvd"
]
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JndnmDMp66FL"
},
"source": [
"#### Copyright 2017 Google LLC."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "hMqWDc_m6rUC",
"cellView": "both",
"colab": {}
},
"source": [
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rHLcriKWLRe4"
},
"source": [
"# Intro to pandas"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "QvJBqX8_Bctk"
},
"source": [
"**Learning Objectives:**\n",
" * Gain an introduction to the `DataFrame` and `Series` data structures of the *pandas* library\n",
" * Access and manipulate data within a `DataFrame` and `Series`\n",
" * Import CSV data into a *pandas* `DataFrame`\n",
" * Reindex a `DataFrame` to shuffle data"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "TIFJ83ZTBctl"
},
"source": [
"[*pandas*](http://pandas.pydata.org/) is a column-oriented data analysis API. It's a great tool for handling and analyzing input data, and many ML frameworks support *pandas* data structures as inputs.\n",
"Although a comprehensive introduction to the *pandas* API would span many pages, the core concepts are fairly straightforward, and we'll present them below. For a more complete reference, the [*pandas* docs site](http://pandas.pydata.org/pandas-docs/stable/index.html) contains extensive documentation and many tutorials."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "s_JOISVgmn9v"
},
"source": [
"## Basic Concepts\n",
"\n",
"The following line imports the *pandas* API and prints the API version:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "aSRYu62xUi3g",
"colab": {}
},
"source": [
"from __future__ import print_function\n",
"\n",
"import pandas as pd\n",
"pd.__version__"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "daQreKXIUslr"
},
"source": [
"The primary data structures in *pandas* are implemented as two classes:\n",
"\n",
" * **`DataFrame`**, which you can imagine as a relational data table, with rows and named columns.\n",
" * **`Series`**, which is a single column. A `DataFrame` contains one or more `Series` and a name for each `Series`.\n",
"\n",
"The data frame is a commonly used abstraction for data manipulation. Similar implementations exist in [Spark](https://spark.apache.org/) and [R](https://www.r-project.org/about.html)."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "fjnAk1xcU0yc"
},
"source": [
"One way to create a `Series` is to construct a `Series` object. For example:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "DFZ42Uq7UFDj",
"colab": {}
},
"source": [
"pd.Series(['San Francisco', 'San Jose', 'Sacramento'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "U5ouUp1cU6pC"
},
"source": [
"`DataFrame` objects can be created by passing a `dict` mapping `string` column names to their respective `Series`. If the `Series` don't match in length, missing values are filled with special [NA/NaN](http://pandas.pydata.org/pandas-docs/stable/missing_data.html) values. Example:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "avgr6GfiUh8t",
"colab": {}
},
"source": [
"city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])\n",
"population = pd.Series([852469, 1015785, 485199])\n",
"\n",
"pd.DataFrame({ 'City name': city_names, 'Population': population })"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "oa5wfZT7VHJl"
},
"source": [
"But most of the time, you load an entire file into a `DataFrame`. The following example loads a file with California housing data. Run the following cell to load the data and create feature definitions:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "av6RYOraVG1V",
"colab": {}
},
"source": [
"california_housing_dataframe = pd.read_csv(\"https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv\", sep=\",\")\n",
"california_housing_dataframe.describe()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "WrkBjfz5kEQu"
},
"source": [
"The example above used `DataFrame.describe` to show interesting statistics about a `DataFrame`. Another useful function is `DataFrame.head`, which displays the first few records of a `DataFrame`:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "s3ND3bgOkB5k",
"colab": {}
},
"source": [
"california_housing_dataframe.head()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "w9-Es5Y6laGd"
},
"source": [
"Another powerful feature of *pandas* is graphing. For example, `DataFrame.hist` lets you quickly study the distribution of values in a column:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "nqndFVXVlbPN",
"colab": {}
},
"source": [
"california_housing_dataframe.hist('housing_median_age')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "XtYZ7114n3b-"
},
"source": [
"## Accessing Data\n",
"\n",
"You can access `DataFrame` data using familiar Python dict/list operations:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "_TFm7-looBFF",
"colab": {}
},
"source": [
"cities = pd.DataFrame({ 'City name': city_names, 'Population': population })\n",
"print(type(cities['City name']))\n",
"cities['City name']"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "V5L6xacLoxyv",
"colab": {}
},
"source": [
"print(type(cities['City name'][1]))\n",
"cities['City name'][1]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "gcYX1tBPugZl",
"colab": {}
},
"source": [
"print(type(cities[0:2]))\n",
"cities[0:2]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "65g1ZdGVjXsQ"
},
"source": [
"In addition, *pandas* provides an extremely rich API for advanced [indexing and selection](http://pandas.pydata.org/pandas-docs/stable/indexing.html) that is too extensive to be covered here."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "RM1iaD-ka3Y1"
},
"source": [
"## Manipulating Data\n",
"\n",
"You may apply Python's basic arithmetic operations to `Series`. For example:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "XWmyCFJ5bOv-",
"colab": {}
},
"source": [
"population / 1000."
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "TQzIVnbnmWGM"
},
"source": [
"[NumPy](http://www.numpy.org/) is a popular toolkit for scientific computing. *pandas* `Series` can be used as arguments to most NumPy functions:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ko6pLK6JmkYP",
"colab": {}
},
"source": [
"import numpy as np\n",
"\n",
"np.log(population)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xmxFuQmurr6d"
},
"source": [
"For more complex single-column transformations, you can use `Series.apply`. Like the Python [map function](https://docs.python.org/2/library/functions.html#map), \n",
"`Series.apply` accepts as an argument a [lambda function](https://docs.python.org/2/tutorial/controlflow.html#lambda-expressions), which is applied to each value.\n",
"\n",
"The example below creates a new `Series` that indicates whether `population` is over one million:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "Fc1DvPAbstjI",
"colab": {}
},
"source": [
"population.apply(lambda val: val > 1000000)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ZeYYLoV9b9fB"
},
"source": [
"\n",
"Modifying `DataFrames` is also straightforward. For example, the following code adds two `Series` to an existing `DataFrame`:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "0gCEX99Hb8LR",
"colab": {}
},
"source": [
"cities['Area square miles'] = pd.Series([46.87, 176.53, 97.92])\n",
"cities['Population density'] = cities['Population'] / cities['Area square miles']\n",
"cities"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6qh63m-ayb-c"
},
"source": [
"## Exercise #1\n",
"\n",
"Modify the `cities` table by adding a new boolean column that is True if and only if *both* of the following are True:\n",
"\n",
" * The city is named after a saint.\n",
" * The city has an area greater than 50 square miles.\n",
"\n",
"**Note:** Boolean `Series` are combined using the bitwise, rather than the traditional boolean, operators. For example, when performing *logical and*, use `&` instead of `and`.\n",
"\n",
"**Hint:** \"San\" in Spanish means \"saint.\""
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "zCOn8ftSyddH",
"colab": {}
},
"source": [
"# Your code here"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "YHIWvc9Ms-Ll"
},
"source": [
"### Solution\n",
"\n",
"Click below for a solution."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "T5OlrqtdtCIb",
"colab": {}
},
"source": [
"cities['Is wide and has saint name'] = (cities['Area square miles'] > 50) & cities['City name'].apply(lambda name: name.startswith('San'))\n",
"cities"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "f-xAOJeMiXFB"
},
"source": [
"## Indexes\n",
"Both `Series` and `DataFrame` objects also define an `index` property that assigns an identifier value to each `Series` item or `DataFrame` row. \n",
"\n",
"By default, at construction, *pandas* assigns index values that reflect the ordering of the source data. Once created, the index values are stable; that is, they do not change when data is reordered."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "2684gsWNinq9",
"colab": {}
},
"source": [
"city_names.index"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "F_qPe2TBjfWd",
"colab": {}
},
"source": [
"cities.index"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "hp2oWY9Slo_h"
},
"source": [
"Call `DataFrame.reindex` to manually reorder the rows. For example, the following has the same effect as sorting by city name:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "sN0zUzSAj-U1",
"colab": {}
},
"source": [
"cities.reindex([2, 0, 1])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-GQFz8NZuS06"
},
"source": [
"Reindexing is a great way to shuffle (randomize) a `DataFrame`. In the example below, we take the index, which is array-like, and pass it to NumPy's `random.permutation` function, which shuffles its values in place. Calling `reindex` with this shuffled array causes the `DataFrame` rows to be shuffled in the same way.\n",
"Try running the following cell multiple times!"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "mF8GC0k8uYhz",
"colab": {}
},
"source": [
"cities.reindex(np.random.permutation(cities.index))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "fSso35fQmGKb"
},
"source": [
"For more information, see the [Index documentation](http://pandas.pydata.org/pandas-docs/stable/indexing.html#index-objects)."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8UngIdVhz8C0"
},
"source": [
"## Exercise #2\n",
"\n",
"The `reindex` method allows index values that are not in the original `DataFrame`'s index values. Try it and see what happens if you use such values! Why do you think this is allowed?"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "PN55GrDX0jzO",
"colab": {}
},
"source": [
"# Your code here"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "TJffr5_Jwqvd"
},
"source": [
"### Solution\n",
"\n",
"Click below for the solution."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8oSvi2QWwuDH"
},
"source": [
"If your `reindex` input array includes values not in the original `DataFrame` index values, `reindex` will add new rows for these \"missing\" indices and populate all corresponding columns with `NaN` values:"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "yBdkucKCwy4x",
"colab": {}
},
"source": [
"cities.reindex([0, 4, 5, 2])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2l82PhPbwz7g"
},
"source": [
"This behavior is desirable because indexes are often strings pulled from the actual data (see the [*pandas* reindex\n",
"documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html) for an example\n",
"in which the index values are browser names).\n",
"\n",
"In this case, allowing \"missing\" indices makes it easy to reindex using an external list, as you don't have to worry about\n",
"sanitizing the input."
]
}
]
}


@@ -0,0 +1,582 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "synthetic_features_and_outliers.ipynb",
"provenance": [],
"collapsed_sections": [
"JndnmDMp66FL",
"i5Ul3zf5QYvW",
"jByCP8hDRZmM",
"WvgxW0bUSC-c"
]
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "JndnmDMp66FL",
"colab_type": "text"
},
"source": [
"#### Copyright 2017 Google LLC."
]
},
{
"cell_type": "code",
"metadata": {
"id": "hMqWDc_m6rUC",
"colab_type": "code",
"cellView": "both",
"colab": {}
},
"source": [
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "4f3CKqFUqL2-",
"colab_type": "text"
},
"source": [
"# Synthetic Features and Outliers"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jnKgkN5fHbGy",
"colab_type": "text"
},
"source": [
"**Learning Objectives:**\n",
" * Create a synthetic feature that is the ratio of two other features\n",
" * Use this new feature as an input to a linear regression model\n",
" * Improve the effectiveness of the model by identifying and clipping (removing) outliers out of the input data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VOpLo5dcHbG0",
"colab_type": "text"
},
"source": [
"Let's revisit our model from the previous First Steps with TensorFlow exercise. \n",
"\n",
"First, we'll import the California housing data into a *pandas* `DataFrame`:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "S8gm6BpqRRuh",
"colab_type": "text"
},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9D8GgUovHbG0",
"colab_type": "code",
"colab": {}
},
"source": [
"from __future__ import print_function\n",
"\n",
"import math\n",
"\n",
"from IPython import display\n",
"from matplotlib import cm\n",
"from matplotlib import gridspec\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import sklearn.metrics as metrics\n",
"import tensorflow as tf\n",
"from tensorflow.python.data import Dataset\n",
"\n",
"tf.logging.set_verbosity(tf.logging.ERROR)\n",
"pd.options.display.max_rows = 10\n",
"pd.options.display.float_format = '{:.1f}'.format\n",
"\n",
"california_housing_dataframe = pd.read_csv(\"https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv\", sep=\",\")\n",
"\n",
"california_housing_dataframe = california_housing_dataframe.reindex(\n",
" np.random.permutation(california_housing_dataframe.index))\n",
"california_housing_dataframe[\"median_house_value\"] /= 1000.0\n",
"california_housing_dataframe"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "I6kNgrwCO_ms",
"colab_type": "text"
},
"source": [
"Next, we'll set up our input function, and define the function for model training:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5RpTJER9XDub",
"colab_type": "code",
"colab": {}
},
"source": [
"def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):\n",
" \"\"\"Trains a linear regression model of one feature.\n",
" \n",
" Args:\n",
" features: pandas DataFrame of features\n",
" targets: pandas DataFrame of targets\n",
" batch_size: Size of batches to be passed to the model\n",
" shuffle: True or False. Whether to shuffle the data.\n",
" num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely\n",
" Returns:\n",
" Tuple of (features, labels) for next data batch\n",
" \"\"\"\n",
" \n",
" # Convert pandas data into a dict of np arrays.\n",
" features = {key:np.array(value) for key,value in dict(features).items()} \n",
" \n",
" # Construct a dataset, and configure batching/repeating.\n",
" ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit\n",
" ds = ds.batch(batch_size).repeat(num_epochs)\n",
" \n",
" # Shuffle the data, if specified.\n",
" if shuffle:\n",
" ds = ds.shuffle(buffer_size=10000)\n",
" \n",
" # Return the next batch of data.\n",
" features, labels = ds.make_one_shot_iterator().get_next()\n",
" return features, labels"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VgQPftrpHbG3",
"colab_type": "code",
"colab": {}
},
"source": [
"def train_model(learning_rate, steps, batch_size, input_feature):\n",
" \"\"\"Trains a linear regression model.\n",
" \n",
" Args:\n",
" learning_rate: A `float`, the learning rate.\n",
" steps: A non-zero `int`, the total number of training steps. A training step\n",
" consists of a forward and backward pass using a single batch.\n",
" batch_size: A non-zero `int`, the batch size.\n",
" input_feature: A `string` specifying a column from `california_housing_dataframe`\n",
" to use as input feature.\n",
" \n",
" Returns:\n",
" A Pandas `DataFrame` containing targets and the corresponding predictions done\n",
" after training the model.\n",
" \"\"\"\n",
" \n",
" periods = 10\n",
" steps_per_period = steps / periods\n",
"\n",
" my_feature = input_feature\n",
" my_feature_data = california_housing_dataframe[[my_feature]].astype('float32')\n",
" my_label = \"median_house_value\"\n",
" targets = california_housing_dataframe[my_label].astype('float32')\n",
"\n",
" # Create input functions.\n",
" training_input_fn = lambda: my_input_fn(my_feature_data, targets, batch_size=batch_size)\n",
" predict_training_input_fn = lambda: my_input_fn(my_feature_data, targets, num_epochs=1, shuffle=False)\n",
" \n",
" # Create feature columns.\n",
" feature_columns = [tf.feature_column.numeric_column(my_feature)]\n",
" \n",
" # Create a linear regressor object.\n",
" my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\n",
" my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)\n",
" linear_regressor = tf.estimator.LinearRegressor(\n",
" feature_columns=feature_columns,\n",
" optimizer=my_optimizer\n",
" )\n",
"\n",
" # Set up to plot the state of our model's line each period.\n",
" plt.figure(figsize=(15, 6))\n",
" plt.subplot(1, 2, 1)\n",
" plt.title(\"Learned Line by Period\")\n",
" plt.ylabel(my_label)\n",
" plt.xlabel(my_feature)\n",
" sample = california_housing_dataframe.sample(n=300)\n",
" plt.scatter(sample[my_feature], sample[my_label])\n",
" colors = [cm.coolwarm(x) for x in np.linspace(-1, 1, periods)]\n",
"\n",
" # Train the model, but do so inside a loop so that we can periodically assess\n",
" # loss metrics.\n",
" print(\"Training model...\")\n",
" print(\"RMSE (on training data):\")\n",
" root_mean_squared_errors = []\n",
" for period in range (0, periods):\n",
" # Train the model, starting from the prior state.\n",
" linear_regressor.train(\n",
" input_fn=training_input_fn,\n",
" steps=steps_per_period,\n",
" )\n",
" # Take a break and compute predictions.\n",
" predictions = linear_regressor.predict(input_fn=predict_training_input_fn)\n",
" predictions = np.array([item['predictions'][0] for item in predictions])\n",
" \n",
" # Compute loss.\n",
" root_mean_squared_error = math.sqrt(\n",
" metrics.mean_squared_error(predictions, targets))\n",
" # Occasionally print the current loss.\n",
" print(\" period %02d : %0.2f\" % (period, root_mean_squared_error))\n",
" # Add the loss metrics from this period to our list.\n",
" root_mean_squared_errors.append(root_mean_squared_error)\n",
" # Finally, track the weights and biases over time.\n",
" # Apply some math to ensure that the data and line are plotted neatly.\n",
" y_extents = np.array([0, sample[my_label].max()])\n",
" \n",
" weight = linear_regressor.get_variable_value('linear/linear_model/%s/weights' % input_feature)[0]\n",
" bias = linear_regressor.get_variable_value('linear/linear_model/bias_weights')\n",
" \n",
" x_extents = (y_extents - bias) / weight\n",
" x_extents = np.maximum(np.minimum(x_extents,\n",
" sample[my_feature].max()),\n",
" sample[my_feature].min())\n",
" y_extents = weight * x_extents + bias\n",
" plt.plot(x_extents, y_extents, color=colors[period]) \n",
" print(\"Model training finished.\")\n",
"\n",
" # Output a graph of loss metrics over periods.\n",
" plt.subplot(1, 2, 2)\n",
" plt.ylabel('RMSE')\n",
" plt.xlabel('Periods')\n",
" plt.title(\"Root Mean Squared Error vs. Periods\")\n",
" plt.tight_layout()\n",
" plt.plot(root_mean_squared_errors)\n",
"\n",
" # Create a table with calibration data.\n",
" calibration_data = pd.DataFrame()\n",
" calibration_data[\"predictions\"] = pd.Series(predictions)\n",
" calibration_data[\"targets\"] = pd.Series(targets)\n",
" display.display(calibration_data.describe())\n",
"\n",
" print(\"Final RMSE (on training data): %0.2f\" % root_mean_squared_error)\n",
" \n",
" return calibration_data"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "FJ6xUNVRm-do",
"colab_type": "text"
},
"source": [
"## Task 1: Try a Synthetic Feature\n",
"\n",
"Both the `total_rooms` and `population` features count totals for a given city block.\n",
"\n",
"But what if one city block were more densely populated than another? We can explore how block density relates to median house value by creating a synthetic feature that's a ratio of `total_rooms` and `population`.\n",
"\n",
"In the cell below, create a feature called `rooms_per_person`, and use that as the `input_feature` to `train_model()`.\n",
"\n",
"What's the best performance you can get with this single feature by tweaking the learning rate? (The better the performance, the better your regression line should fit the data, and the lower\n",
"the final RMSE should be.)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "isONN2XK32Wo",
"colab_type": "text"
},
"source": [
"**NOTE**: You may find it helpful to add a few code cells below so you can try out several different learning rates and compare the results. To add a new code cell, hover your cursor directly below the center of this cell, and click **CODE**."
]
},
{
"cell_type": "code",
"metadata": {
"id": "5ihcVutnnu1D",
"colab_type": "code",
"cellView": "both",
"colab": {
"test": {
"output": "ignore",
"timeout": 600
}
}
},
"source": [
"#\n",
"# YOUR CODE HERE\n",
"#\n",
"california_housing_dataframe[\"rooms_per_person\"] =\n",
"\n",
"calibration_data = train_model(\n",
" learning_rate=0.00005,\n",
" steps=500,\n",
" batch_size=5,\n",
" input_feature=\"rooms_per_person\"\n",
")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "i5Ul3zf5QYvW",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for a solution."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Leaz2oYMQcBf",
"colab_type": "code",
"colab": {}
},
"source": [
"california_housing_dataframe[\"rooms_per_person\"] = (\n",
" california_housing_dataframe[\"total_rooms\"] / california_housing_dataframe[\"population\"])\n",
"\n",
"calibration_data = train_model(\n",
" learning_rate=0.05,\n",
" steps=500,\n",
" batch_size=5,\n",
" input_feature=\"rooms_per_person\")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZjQrZ8mcHFiU",
"colab_type": "text"
},
"source": [
"## Task 2: Identify Outliers\n",
"\n",
"We can visualize the performance of our model by creating a scatter plot of predictions vs. target values. Ideally, these would lie on a perfectly correlated diagonal line.\n",
"\n",
"Use Pyplot's [`scatter()`](https://matplotlib.org/gallery/shapes_and_collections/scatter.html) to create a scatter plot of predictions vs. targets, using the rooms-per-person model you trained in Task 1.\n",
"\n",
"Do you see any oddities? Trace these back to the source data by looking at the distribution of values in `rooms_per_person`."
]
},
{
"cell_type": "code",
"metadata": {
"id": "P0BDOec4HbG_",
"colab_type": "code",
"colab": {}
},
"source": [
"# YOUR CODE HERE"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jByCP8hDRZmM",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for the solution."
]
},
{
"cell_type": "code",
"metadata": {
"id": "s0tiX2gdRe-S",
"colab_type": "code",
"colab": {}
},
"source": [
"plt.figure(figsize=(15, 6))\n",
"plt.subplot(1, 2, 1)\n",
"plt.scatter(calibration_data[\"predictions\"], calibration_data[\"targets\"])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kMQD0Uq3RqTX",
"colab_type": "text"
},
"source": [
"The calibration data shows most scatter points aligned to a line. The line is almost vertical, but we'll come back to that later. Right now let's focus on the ones that deviate from the line. We notice that they are relatively few in number.\n",
"\n",
"If we plot a histogram of `rooms_per_person`, we find that we have a few outliers in our input data:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "POTM8C_ER1Oc",
"colab_type": "code",
"colab": {}
},
"source": [
"plt.subplot(1, 2, 2)\n",
"_ = california_housing_dataframe[\"rooms_per_person\"].hist()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "9l0KYpBQu8ed",
"colab_type": "text"
},
"source": [
"## Task 3: Clip Outliers\n",
"\n",
"See if you can further improve the model fit by setting the outlier values of `rooms_per_person` to some reasonable minimum or maximum.\n",
"\n",
"For reference, here's a quick example of how to apply a function to a Pandas `Series`:\n",
"\n",
" clipped_feature = my_dataframe[\"my_feature_name\"].apply(lambda x: max(x, 0))\n",
"\n",
"The above `clipped_feature` will have no values less than `0`."
]
},
{
"cell_type": "code",
"metadata": {
"id": "rGxjRoYlHbHC",
"colab_type": "code",
"colab": {}
},
"source": [
"# YOUR CODE HERE"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "WvgxW0bUSC-c",
"colab_type": "text"
},
"source": [
"### Solution\n",
"\n",
"Click below for the solution."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8YGNjXPaSMPV",
"colab_type": "text"
},
"source": [
"The histogram we created in Task 2 shows that the majority of values are less than `5`. Let's clip `rooms_per_person` to 5, and plot a histogram to double-check the results."
]
},
{
"cell_type": "code",
"metadata": {
"id": "9YyARz6gSR7Q",
"colab_type": "code",
"colab": {}
},
"source": [
"california_housing_dataframe[\"rooms_per_person\"] = (\n",
" california_housing_dataframe[\"rooms_per_person\"]).apply(lambda x: min(x, 5))\n",
"\n",
"_ = california_housing_dataframe[\"rooms_per_person\"].hist()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "vO0e1p_aSgKA",
"colab_type": "text"
},
"source": [
"To verify that clipping worked, let's train again and print the calibration data once more:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZgSP2HKfSoOH",
"colab_type": "code",
"colab": {}
},
"source": [
"calibration_data = train_model(\n",
" learning_rate=0.05,\n",
" steps=500,\n",
" batch_size=5,\n",
" input_feature=\"rooms_per_person\")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "gySE-UgfSony",
"colab_type": "code",
"colab": {}
},
"source": [
"_ = plt.scatter(calibration_data[\"predictions\"], calibration_data[\"targets\"])"
],
"execution_count": 0,
"outputs": []
}
]
}

numpy_examples/Li.py Normal file

@@ -0,0 +1,36 @@
import numpy as np

def L_i(x, y, W):
  """
  Unvectorized version. Compute the multiclass SVM loss for a single example (x, y).
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073-rd position (i.e. bias trick)
  - y is an integer giving the index of the correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # see notes about delta later in this section
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all wrong classes (xrange is Python 2 only)
    if j == y:
      # skip the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. "Half-vectorized"
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function).
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # on the y-th position scores[y] - scores[y] canceled and gave delta. We want
  # to ignore the y-th position and only consider the margin on the max wrong class
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i
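
For reference, a sketch (not part of the original file) of a fully vectorized loss over a whole batch; it assumes `X` stores one example per column, matching the shapes above, and that `y` is a vector of correct-class indices:

```
import numpy as np

def L_fully_vectorized(X, y, W):
  """
  Fully vectorized multiclass SVM loss over a batch.
  - X is D x N (each column is an example, bias trick included)
  - y is a vector of N correct-class indices
  - W is C x D
  """
  delta = 1.0
  num_train = X.shape[1]
  scores = W.dot(X)                                 # C x N class scores
  correct_scores = scores[y, np.arange(num_train)]  # true-class score per example
  margins = np.maximum(0, scores - correct_scores + delta)
  margins[y, np.arange(num_train)] = 0              # don't count the true class
  return np.sum(margins) / num_train                # average loss over the batch
```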

@@ -0,0 +1,23 @@
import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    num_test = X.shape[0]
    # make sure the output type matches the input type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    # loop over all test rows (xrange is Python 2 only)
    for i in range(num_test):
      # find the nearest training image to the i-th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      min_index = np.argmin(distances) # get the index with smallest distance
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example
    return Ypred
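
A toy usage sketch (the data is made up; on CIFAR-10 each row would be a flattened image):

```
import numpy as np

Xtr = np.array([[0.0, 0.0], [10.0, 10.0]])  # two training points
ytr = np.array([0, 1])                      # their labels

nn = NearestNeighbor()
nn.train(Xtr, ytr)
print(nn.predict(np.array([[1.0, 1.0], [9.0, 9.0]])))  # -> [0 1]
```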

numpy_examples/README.md Normal file

@@ -0,0 +1,60 @@
## Numpy Resources

### Arrays

* Grid of values, all of the same type.
* The number of dimensions is the rank of the array.
* The shape of an array is a tuple of integers giving the size of the array along each dimension.

```
import numpy as np

a = np.array([1, 2, 3])  # Create a rank 1 array
print(a.shape)  # Prints "(3,)"
```

```
numpy.asarray([])        # returns array([], dtype=float64)
numpy.asarray([]).shape  # (0,)
```

* Numpy provides many functions to create arrays:

```
a = np.zeros((2,2))  # Create an array of all zeros
b = np.ones((1,2))   # Create an array of all ones
c = np.full((2,2), 7)  # Create a constant array
d = np.eye(2)  # Create a 2x2 identity matrix
e = np.random.random((2,2))  # Create an array filled with random values
```

* Products:

```
x = np.array([[1,2],[3,4]])
v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product
print(x.dot(v))
print(np.dot(x, v))
```

* Sum:

```
print(np.sum(x))  # Compute sum of all elements
print(np.sum(x, axis=0))  # Compute sum of each column
print(np.sum(x, axis=1))  # Compute sum of each row
```

* Broadcasting is a mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array; a short sketch follows below.
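
For instance, a minimal broadcasting sketch (the values are illustrative):

```
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
v = np.array([10, 20, 30])  # shape (3,)

# v is broadcast across each row of x, as if tiled to shape (2, 3)
print(x + v)  # [[11 22 33]
              #  [14 25 36]]
```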

numpy_examples/dropout.py Normal file

@@ -0,0 +1,25 @@
"""
Inverted Dropout: Recommended implementation example.
We drop and scale at train time and don't do anything at test time.
"""
p = 0.5 # probability of keeping a unit active. higher = less dropout
def train_step(X):
# forward pass for example 3-layer neural network
H1 = np.maximum(0, np.dot(W1, X) + b1)
U1 = (np.random.rand(*H1.shape) < p) / p # first dropout mask. Notice /p!
H1 *= U1 # drop!
H2 = np.maximum(0, np.dot(W2, H1) + b2)
U2 = (np.random.rand(*H2.shape) < p) / p # second dropout mask. Notice /p!
H2 *= U2 # drop!
out = np.dot(W3, H2) + b3
# backward pass: compute gradients... (not shown)
# perform parameter update... (not shown)
def predict(X):
# ensembled forward pass
H1 = np.maximum(0, np.dot(W1, X) + b1) # no scaling necessary
H2 = np.maximum(0, np.dot(W2, H1) + b2)
out = np.dot(W3, H2) + b3
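
As a quick sanity check on the `/p` scaling (a sketch, not part of the original file): with keep-probability `p`, dividing the surviving activations by `p` preserves the expected activation.

```
import numpy as np

p = 0.5
h = np.ones(1000000)  # stand-in activations, all equal to 1.0
mask = (np.random.rand(*h.shape) < p) / p  # inverted dropout mask
print((h * mask).mean())  # ~1.0: the expectation is unchanged
```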

@@ -0,0 +1,31 @@
# compute the gradient numerically:
# a generic function that takes a function f and a vector x to evaluate
# the gradient on, and returns the gradient of f at x:
import numpy as np

def eval_numerical_gradient(f, x):
  """
  a naive implementation of the numerical gradient of f at x
  - f should be a function that takes a single argument
  - x is the point (numpy array) to evaluate the gradient at
  """
  fx = f(x) # evaluate function value at the original point
  grad = np.zeros(x.shape)
  h = 0.00001

  # iterate over all indexes in x
  it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
  while not it.finished:
    # evaluate function at x+h
    ix = it.multi_index
    old_value = x[ix]
    x[ix] = old_value + h # increment by h
    fxh = f(x) # evaluate f(x + h)
    x[ix] = old_value # restore to previous value (very important!)

    # compute the partial derivative
    grad[ix] = (fxh - fx) / h # the slope
    it.iternext() # step to next dimension
  return grad
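
A usage sketch (the quadratic below and its analytic gradient are illustrative, not from the original file):

```
import numpy as np

f = lambda x: np.sum(x ** 2)  # analytic gradient of f is 2x
x = np.array([1.0, -2.0, 3.0])
print(eval_numerical_gradient(f, x))  # approximately [ 2. -4.  6.]
```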

numpy_examples/neuron.py Normal file

@@ -0,0 +1,7 @@
import math
import numpy as np

class Neuron(object):
  # ...
  def forward(self, inputs):
    """ assume inputs and weights are 1-D numpy arrays and the bias is a number """
    cell_body_sum = np.sum(inputs * self.weights) + self.bias
    firing_rate = 1.0 / (1.0 + math.exp(-cell_body_sum)) # sigmoid activation function
    return firing_rate
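
A minimal usage sketch (the weight and bias values are hypothetical, since the original snippet elides the constructor):

```
import numpy as np

n = Neuron()
n.weights = np.array([0.5, -0.3, 0.8])  # hypothetical weights
n.bias = 0.1                            # hypothetical bias
print(n.forward(np.array([1.0, 2.0, 3.0])))  # sigmoid output in (0, 1), ~0.917 here
```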

@@ -0,0 +1,25 @@
#!/usr/bin/env python
# Adapted from: http://cs231n.github.io/neural-networks-case-study/
import numpy as np
import matplotlib.pyplot as plt

N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes

# data matrix (each row = single example)
X = np.zeros((N*K, D))
# class labels
y = np.zeros(N*K, dtype='uint8')

for j in range(K):
  ix = range(N*j, N*(j+1))
  r = np.linspace(0.0, 1, N) # radius
  t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N)*0.2 # theta
  X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
  y[ix] = j

# visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()

@@ -0,0 +1,36 @@
import numpy as np
import tensorflow as tf

# Model linear regression y = Wx + b
x = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
product = tf.matmul(x, W)
y = product + b
y_ = tf.placeholder(tf.float32, [None, 1])

# Cost function: mean((y_ - y)**2)
cost = tf.reduce_mean(tf.square(y_ - y))

# Training using Gradient Descent to minimize cost
train_step = tf.train.GradientDescentOptimizer(0.0000001).minimize(cost)

sess = tf.Session()
init = tf.global_variables_initializer()  # initialize_all_variables() was deprecated
sess.run(init)

steps = 1000
for i in range(steps):
  # Create fake data for y = W.x + b where W = 2, b = 0
  xs = np.array([[i]])
  ys = np.array([[2*i]])
  # Train
  feed = {x: xs, y_: ys}
  sess.run(train_step, feed_dict=feed)
  print("After %d iterations:" % i)
  print("W: %f" % sess.run(W))
  print("b: %f" % sess.run(b))
  print("cost: %f" % sess.run(cost, feed_dict=feed))