clean up to start add modern models:

This commit is contained in:
bt3gl 2024-11-17 17:39:16 -08:00
parent 94d09f6fba
commit 956ed2397d
34 changed files with 846 additions and 309 deletions

134
.gitignore vendored
View File

@ -1,134 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# Custom
sandbox_cachedir/
cachedir
results

253
EBMs/README.md Normal file
View File

@ -0,0 +1,253 @@
## quantum ai: training energy-based-models using openai
<br>
#### ⚛️ this repository contains my adapted code from [openai's implicit generation and generalization in energy-based-models](https://arxiv.org/pdf/1903.08689.pdf)
<br>
### installing
<br>
```bash
brew install gcc@6
brew install open-mpi
brew install pkg-config
```
<br>
* there is a **[bug](https://github.com/open-mpi/ompi/issues/7516)** in open-mpi for the specific libraries used in this project (`PMIX ERROR: ERROR`) that can be fixed with:
<br>
```
export PMIX_MCA_gds=^ds12
```
<br>
* then install python's requirements:
<br>
```bash
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```
<br>
* note that this is an adapted requirements file, since **[openai's original](https://github.com/openai/ebm_code_release/blob/master/requirements.txt)** is not complete/correct
* finally, download and install **[mujoco](https://www.roboti.us/index.html)**
* you will also need to register for a license, which asks for a machine ID
* the documentation on the website is incomplete, so just download the suggested script and run:
<br>
```bash
mv getid_osx getid_osx.dms
./getid_osx.dms
```
<br>
---
### running
<br>
#### download pre-trained models (examples)
<br>
* download all **[pre-trained models](https://sites.google.com/view/igebm/home)** and unzip them into a local folder `cachedir`:
<br>
```bash
mkdir cachedir
```
<br>
#### setting results directory
<br>
* openai's original code contains **[hardcoded constants that only work on Linux](https://github.com/openai/ebm_code_release/blob/master/data.py#L218)**
* i changed this to a constant (`ROOT_DIR = "./results"`) at the top of `data.py`
<br>
#### running (parallelization with `mpiexec`)
<br>
* all code supports **[`horovod` execution](https://github.com/horovod/horovod)**, so model training can be sped up substantially by running each command across multiple workers:
<br>
```
mpiexec -n <worker_num> <command>
```
<br>
##### cifar-10 unconditional
<br>
```
python train.py --exp=cifar10_uncond --dataset=cifar10 --num_steps=60 --batch_size=128 --step_lr=10.0 --proj_norm=0.01 --zero_kl --replay_batch --large_model
```
* this should generate the following output:
<br>
```bash
Instructions for updating:
Use tf.gfile.GFile.
2020-05-10 22:12:32.471415: W tensorflow/core/framework/op_def_util.cc:355] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
64 batch size
Local rank: 0 1
Loading data...
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Done loading...
WARNING:tensorflow:From /Users/mia/dev/ebm_code_release/venv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Building graph...
WARNING:tensorflow:From /Users/mia/dev/ebm_code_release/venv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Finished processing loop construction ...
Started gradient computation...
Applying gradients...
Finished applying gradients.
Model has a total of 7567880 parameters
Initializing variables...
Start broadcast
End broadcast
Obtained a total of e_pos: -0.0025530937127768993, e_pos_std: 0.09564747661352158, e_neg: -0.22276005148887634, e_diff: 0.22020696103572845, e_neg_std: 0.016306934878230095, temp: 1, loss_e: -0.22276005148887634, eps: 0.0, label_ent: 2.272536277770996, l
oss_ml: 0.22020693123340607, loss_total: 0.2792498469352722, x_grad: 0.0009156676824204624, x_grad_first: 0.0009156676824204624, x_off: 0.31731340289115906, iter: 0, gamma: [0.], context_0/c1_pre/cweight:0: 0.0731438547372818, context_0/res_optim_res_c1/
cweight:0: 4.732660444095593e-11, context_0/res_optim_res_c1/gb:0: 3.4007335836250263e-10, context_0/res_optim_res_c2/cweight:0: 0.9494612216949463, context_0/res_optim_res_c2/g:0: 1.8536269741353806e-10, context_0/res_optim_res_c2/gb:0: 6.27235652306268
3e-10, context_0/res_optim_res_c2/cb:0: 1.1606662297936055e-09, context_0/res_1_res_c1/cweight:0: 6.714453298917178e-11, context_0/res_1_res_c1/gb:0: 3.6198691266697836e-10, context_0/res_1_res_c2/cweight:0: 0.6582950353622437, context_0/res_1_res_c2/g:0
: 1.669797633496728e-10, context_0/res_1_res_c2/gb:0: 5.911696687732615e-10, context_0/res_1_res_c2/cb:0: 1.1932842491901852e-09, context_0/res_2_res_c1/cweight:0: 8.567072745657711e-11, context_0/res_2_res_c1/gb:0: 6.868505764145993e-10, context_0/res_2
_res_c2/cweight:0: 0.46929678320884705, context_0/res_2_res_c2/g:0: 1.655784120924153e-10, context_0/res_2_res_c2/gb:0: 8.058526068666083e-10, context_0/res_2_res_c2/cb:0: 1.0161046448686761e-09, context_0/res_2_res_adaptive/cweight:0: 0.0194275379180908
2, context_0/res_3_res_c1/cweight:0: 4.011655244107182e-11, context_0/res_3_res_c1/gb:0: 5.064903496609929e-10, context_0/res_3_res_c2/cweight:0: 0.32239994406700134, context_0/res_3_res_c2/g:0: 9.758494012857e-11, context_0/res_3_res_c2/gb:0: 7.75612463
1441708e-10, context_0/res_3_res_c2/cb:0: 6.362700366580043e-10, context_0/res_4_res_c1/cweight:0: 4.090133440270982e-11, context_0/res_4_res_c1/gb:0: 6.013010089844784e-10, context_0/res_4_res_c2/cweight:0: 0.34806951880455017, context_0/res_4_res_c2/g:
0: 8.414659247168998e-11, context_0/res_4_res_c2/gb:0: 6.443054978433338e-10, context_0/res_4_res_c2/cb:0: 5.496815780325903e-10, context_0/res_5_res_c1/cweight:0: 3.990113794927197e-11, context_0/res_5_res_c1/gb:0: 3.807749116013781e-10, context_0/res_5
_res_c2/cweight:0: 0.22841960191726685, context_0/res_5_res_c2/g:0: 4.942361797599659e-11, context_0/res_5_res_c2/gb:0: 7.697342763179904e-10, context_0/res_5_res_c2/cb:0: 3.1796060229183354e-10, context_0/fc5/wweight:0: 3.081033706665039, context_0/fc5/
b:0: 0.4506262540817261,
................................................................................................................................
Inception score of 1.2397289276123047 with std of 0.0
```
<br>
##### cifar-10 conditional
<br>
```
python train.py --exp=cifar10_cond --dataset=cifar10 --num_steps=60 --batch_size=128 --step_lr=10.0 --proj_norm=0.01 --zero_kl --replay_batch --cclass
```
<br>
##### imagenet 32x32 conditional
<br>
```
python train.py --exp=imagenet_cond --num_steps=60 --wider_model --batch_size=32 --step_lr=10.0 --proj_norm=0.01 --replay_batch --cclass --zero_kl --dataset=imagenet --imagenet_path=<imagenet32x32 path>
```
<br>
##### imagenet 128x128 conditional
<br>
```
python train.py --exp=imagenet_cond --num_steps=50 --batch_size=16 --step_lr=100.0 --replay_batch --swish_act --cclass --zero_kl --dataset=imagenetfull --imagenet_datadir=<full imagenet path>
```
<br>
##### imagenet demo
<br>
* the imagenet_demo.py file contains code for experiments with ebms on conditional imagenet 128x128
* to generate a gif on sampling, you can run the command:
<br>
```
python imagenet_demo.py --exp=imagenet128_cond --resume_iter=2238000 --swish_act
```
* the ebm_sandbox.py file contains several different tasks that can be used to evaluate ebms, which are selected via different settings of the `task` flag in the file
* for example, to visualize cross class mappings in cifar-10, you can run:
<br>
```
python ebm_sandbox.py --task=crossclass --num_steps=40 --exp=cifar10_cond --resume_iter=74700
```
<br>
##### generalization
<br>
* to test generalization to out of distribution classification for SVHN (with similar commands for other datasets):
<br>
```
python ebm_sandbox.py --task=mixenergy --num_steps=40 --exp=cifar10_large_model_uncond --resume_iter=121200 --large_model --svhnmix --cclass=False
```
<br>
* to test classification on cifar-10 using a conditional model under either L2 or Li perturbations:
<br>
```
python ebm_sandbox.py --task=label --exp=cifar10_wider_model_cond --resume_iter=21600 --lnorm=-1 --pgd=<number of pgd steps> --num_steps=10 --lival=<li bound value> --wider_model
```
<br>
##### concept combination
<br>
* to train ebms on conditional dsprites dataset, you can train each model separately on each conditioned latent in `cond_pos`, `cond_rot`, `cond_shape`, `cond_scale`, with an example command given below:
<br>
```
python train.py --dataset=dsprites --exp=dsprites_cond_pos --zero_kl --num_steps=20 --step_lr=500.0 --swish_act --cond_pos --replay_batch --cclass
```
<br>
* once models are trained, they can be sampled from jointly by running:
```
python ebm_combine.py --task=conceptcombine --exp_size=<exp_size> --exp_shape=<exp_shape> --exp_pos=<exp_pos> --exp_rot=<exp_rot> --resume_size=<resume_size> --resume_shape=<resume_shape> --resume_rot=<resume_rot> --resume_pos=<resume_pos>
```

View File

@ -1,18 +1,18 @@
scipy==1.1.0
horovod==0.16.0
torch==1.5.0
scipy==1.10.0
horovod==0.24.0
torch==1.13.1
torchvision==0.6.0
six==1.11.0
imageio==2.8.0
tqdm==4.46.0
matplotlib==3.2.1
mpi4py==3.0.3
numpy==1.18.4
Pillow==5.4.1
numpy==1.22.0
Pillow==10.0.1
baselines==0.1.5
scikit-image==0.14.2
scikit_learn
tensorflow==1.13.1
tensorflow==2.11.1
cloudpickle==1.3.0
Cython==0.29.17
mujoco-py==1.50.1.68
mujoco-py==1.50.1.68

16
GPT/README.md Normal file
View File

@ -0,0 +1,16 @@
## gpt
<br>
### cool resources
<br>
* **[vscode chatgpt plugin](https://github.com/mpociot/chatgpt-vscode) (and [here](https://marketplace.visualstudio.com/items?itemName=timkmecl.chatgpt))**
* **[scispace extension (paper explainer)](https://chrome.google.com/webstore/detail/scispace-copilot/cipccbpjpemcnijhjcdjmkjhmhniiick/related)**
* **[fix python bugs](https://platform.openai.com/playground/p/default-fix-python-bugs?model=code-davinci-002)**
* **[explain code](https://platform.openai.com/playground/p/default-explain-code?model=code-davinci-002)**
* **[translate code](https://platform.openai.com/playground/p/default-translate-code?model=code-davinci-002)**
* **[translate sql](https://platform.openai.com/playground/p/default-sql-translate?model=code-davinci-002)**
* **[calculate time complexity](https://platform.openai.com/playground/p/default-time-complexity?model=text-davinci-003)**
* **[text to programmatic command](https://platform.openai.com/playground/p/default-text-to-command?model=text-davinci-003)**

195
README.md
View File

@ -1,185 +1,44 @@
# Training EMBs using OpenAI's resources
## deep learning projects, code, resources
<br>
This repository contains an adapted code for [OpenAI's Implicit Generation and Generalization in Energy Based Models](https://arxiv.org/pdf/1903.08689.pdf).
### chapters
## Installing locally
<br>
### Install the system's requirement
#### learnings
```bash
brew install gcc@6
brew install open-mpi
brew install pkg-config
```
* **[deep learning](agents/deep_learning.md)**
* **[reinforcement learning](agents/reinforcement_learning.md)**
* **[strategy workflow](agents/strategy_workflow)**
* **[trading on gmx](agents/trading_on_gmx.md)**
<br>
There is a [bug](https://github.com/open-mpi/ompi/issues/7516) in open-mpi for the specific libraries in this problem (`PMIX ERROR: ERROR`) that can be fixed with:
#### quantum computing and machine learning
```
export PMIX_MCA_gds=^ds12
```
* **[energy-based models](EBMs)**: my adaptation of openai's implicit generation and generalization in energy based models
<br>
### Install requirements.txt
#### large language models
Install Python's requirements in a virtual environment:
```bash
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```
Note that this is an adapted requirement file since the [OpenAI's original](https://github.com/openai/ebm_code_release/blob/master/requirements.txt) is not complete/correct.
### Install MuJoCo
Download and install [MuJoCo](https://www.roboti.us/index.html).
You will also need to register for a license, which asks for a machine ID. The documentation on the website is incomplete, so just download the suggested script and run:
```bash
mv getid_osx getid_osx.dms
./getid_osx.dms
```
### Download pre-trained models (examples)
Download all [pre-trained models](https://sites.google.com/view/igebm/home) and unzip into a local folder `cachedir`:
```bash
mkdir cachedir
```
### Setting results directory
OpenAI's original code contains [hardcoded constants that only work on Linux](https://github.com/openai/ebm_code_release/blob/master/data.py#L218). We changed this to a constant (`ROOT_DIR = "./results"`) in the top of `data.py`.
* **[gpt](gpt)**
* **[claude](claude)**
<br>
----
## Running
### cool resources
### Parallelization with `mpiexec`
<br>
All code supports [`horovod` execution](https://github.com/horovod/horovod), so model training can be sped up substantially by running each command across multiple workers.
```
mpiexec -n <worker_num> <command>
```
### Examples of Training on example datasets
#### CIFAR-10 Unconditional:
```
python train.py --exp=cifar10_uncond --dataset=cifar10 --num_steps=60 --batch_size=128 --step_lr=10.0 --proj_norm=0.01 --zero_kl --replay_batch --large_model
```
This should generate the following output:
```bash
Instructions for updating:
Use tf.gfile.GFile.
2020-05-10 22:12:32.471415: W tensorflow/core/framework/op_def_util.cc:355] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
64 batch size
Local rank: 0 1
Loading data...
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Done loading...
WARNING:tensorflow:From /Users/mia/dev/ebm_code_release/venv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Building graph...
WARNING:tensorflow:From /Users/mia/dev/ebm_code_release/venv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Finished processing loop construction ...
Started gradient computation...
Applying gradients...
Finished applying gradients.
Model has a total of 7567880 parameters
Initializing variables...
Start broadcast
End broadcast
Obtained a total of e_pos: -0.0025530937127768993, e_pos_std: 0.09564747661352158, e_neg: -0.22276005148887634, e_diff: 0.22020696103572845, e_neg_std: 0.016306934878230095, temp: 1, loss_e: -0.22276005148887634, eps: 0.0, label_ent: 2.272536277770996, l
oss_ml: 0.22020693123340607, loss_total: 0.2792498469352722, x_grad: 0.0009156676824204624, x_grad_first: 0.0009156676824204624, x_off: 0.31731340289115906, iter: 0, gamma: [0.], context_0/c1_pre/cweight:0: 0.0731438547372818, context_0/res_optim_res_c1/
cweight:0: 4.732660444095593e-11, context_0/res_optim_res_c1/gb:0: 3.4007335836250263e-10, context_0/res_optim_res_c2/cweight:0: 0.9494612216949463, context_0/res_optim_res_c2/g:0: 1.8536269741353806e-10, context_0/res_optim_res_c2/gb:0: 6.27235652306268
3e-10, context_0/res_optim_res_c2/cb:0: 1.1606662297936055e-09, context_0/res_1_res_c1/cweight:0: 6.714453298917178e-11, context_0/res_1_res_c1/gb:0: 3.6198691266697836e-10, context_0/res_1_res_c2/cweight:0: 0.6582950353622437, context_0/res_1_res_c2/g:0
: 1.669797633496728e-10, context_0/res_1_res_c2/gb:0: 5.911696687732615e-10, context_0/res_1_res_c2/cb:0: 1.1932842491901852e-09, context_0/res_2_res_c1/cweight:0: 8.567072745657711e-11, context_0/res_2_res_c1/gb:0: 6.868505764145993e-10, context_0/res_2
_res_c2/cweight:0: 0.46929678320884705, context_0/res_2_res_c2/g:0: 1.655784120924153e-10, context_0/res_2_res_c2/gb:0: 8.058526068666083e-10, context_0/res_2_res_c2/cb:0: 1.0161046448686761e-09, context_0/res_2_res_adaptive/cweight:0: 0.0194275379180908
2, context_0/res_3_res_c1/cweight:0: 4.011655244107182e-11, context_0/res_3_res_c1/gb:0: 5.064903496609929e-10, context_0/res_3_res_c2/cweight:0: 0.32239994406700134, context_0/res_3_res_c2/g:0: 9.758494012857e-11, context_0/res_3_res_c2/gb:0: 7.75612463
1441708e-10, context_0/res_3_res_c2/cb:0: 6.362700366580043e-10, context_0/res_4_res_c1/cweight:0: 4.090133440270982e-11, context_0/res_4_res_c1/gb:0: 6.013010089844784e-10, context_0/res_4_res_c2/cweight:0: 0.34806951880455017, context_0/res_4_res_c2/g:
0: 8.414659247168998e-11, context_0/res_4_res_c2/gb:0: 6.443054978433338e-10, context_0/res_4_res_c2/cb:0: 5.496815780325903e-10, context_0/res_5_res_c1/cweight:0: 3.990113794927197e-11, context_0/res_5_res_c1/gb:0: 3.807749116013781e-10, context_0/res_5
_res_c2/cweight:0: 0.22841960191726685, context_0/res_5_res_c2/g:0: 4.942361797599659e-11, context_0/res_5_res_c2/gb:0: 7.697342763179904e-10, context_0/res_5_res_c2/cb:0: 3.1796060229183354e-10, context_0/fc5/wweight:0: 3.081033706665039, context_0/fc5/
b:0: 0.4506262540817261,
................................................................................................................................
Inception score of 1.2397289276123047 with std of 0.0
```
#### CIFAR-10 Conditional:
```
python train.py --exp=cifar10_cond --dataset=cifar10 --num_steps=60 --batch_size=128 --step_lr=10.0 --proj_norm=0.01 --zero_kl --replay_batch --cclass
```
#### ImageNet 32x32 Conditional:
```
python train.py --exp=imagenet_cond --num_steps=60 --wider_model --batch_size=32 step_lr=10.0 --proj_norm=0.01 --replay_batch --cclass --zero_kl --dataset=imagenet --imagenet_path=<imagenet32x32 path>
```
#### ImageNet 128x128 Conditional:
```
python train.py --exp=imagenet_cond --num_steps=50 --batch_size=16 step_lr=100.0 --replay_batch --swish_act --cclass --zero_kl --dataset=imagenetfull --imagenet_datadir=<full imagenet path>
```
#### Imagenet Demo
The imagenet_demo.py file contains code for experiments with EBMs on conditional ImageNet 128x128. To generate a gif on sampling, you can run the command:
```
python imagenet_demo.py --exp=imagenet128_cond --resume_iter=2238000 --swish_act
```
The ebm_sandbox.py file contains several different tasks that can be used to evaluate EBMs, which are defined by different settings of task flag in the file. For example, to visualize cross class mappings in CIFAR-10, you can run:
```
python ebm_sandbox.py --task=crossclass --num_steps=40 --exp=cifar10_cond --resume_iter=74700
```
#### Generalization
To test generalization to out of distribution classification for SVHN (with similar commands for other datasets)
```
python ebm_sandbox.py --task=mixenergy --num_steps=40 --exp=cifar10_large_model_uncond --resume_iter=121200 --large_model --svhnmix --cclass=False
```
To test classification on CIFAR-10 using a conditional model under either L2 or Li perturbations
```
python ebm_sandbox.py --task=label --exp=cifar10_wider_model_cond --resume_iter=21600 --lnorm=-1 --pgd=<number of pgd steps> --num_steps=10 --lival=<li bound value> --wider_model
```
#### Concept Combination
To train EBMs on the conditional dSprites dataset, you can train each model separately on each conditioned latent in cond_pos, cond_rot, cond_shape, cond_scale, with an example command given below.
```
python train.py --dataset=dsprites --exp=dsprites_cond_pos --zero_kl --num_steps=20 --step_lr=500.0 --swish_act --cond_pos --replay_batch -cclass
```
Once models are trained, they can be sampled from jointly by running:
```
python ebm_combine.py --task=conceptcombine --exp_size=<exp_size> --exp_shape=<exp_shape> --exp_pos=<exp_pos> --exp_rot=<exp_rot> --resume_size=<resume_size> --resume_shape=<resume_shape> --resume_rot=<resume_rot> --resume_pos=<resume_pos>
```
* **[cursor ai editor](https://www.cursor.com/)**
* **[microsoft notes on ai agents](https://github.com/microsoft/generative-ai-for-beginners/tree/main/17-ai-agents)**
* **[ritual.net, integrate ai models into protocols](https://ritual.net/)**
* **[the promise and challenges of crypto + ai applications, by vub](https://vitalik.eth.limo/general/2024/01/30/cryptoai.html)**
* **[on training defi agents with markov chains, by bt3gl](https://mirror.xyz/go-outside.eth/DKaWYobU7q3EvZw8x01J7uEmF_E8PfNN27j0VgxQhNQ)**
* **[google's jax (composable transformations of numpy programs)](https://github.com/google/jax)**
* **[machine learning engineering open book](https://github.com/stas00/ml-engineering)**
* **[advances in financial machine learning](books/advances_in_financial_machine_learning.pdf)**

18
agents/README.md Normal file
View File

@ -0,0 +1,18 @@
## strategy workflow
<br>
<p align="center">
<img width="854" src="https://user-images.githubusercontent.com/1130416/227752772-5d739fd8-1b5c-4841-a52a-7cda308fc4df.png">
</p>
<br>
1. **[data analysis](strategy_workflow/data_analysis.md)**
2. **[supervised model training](strategy_workflow/supervised_learning.md)**
3. **[policy development](strategy_workflow/policy.md)**
4. **[backtesting](strategy_workflow/backtesting.md)**
5. **[parameter optimization](strategy_workflow/optimization.md)**
6. **[simulation and paper trading](strategy_workflow/paper_trading.md)**
7. **[live trading](strategy_workflow/live_trading.md)**
8. **[strategy metrics](strategy_workflow/strategy_metrics.md)**

164
agents/deep_learning.md Normal file
View File

@ -0,0 +1,164 @@
## deep learning
<br>
### timeline tl; dr
<br>
* **[2012: imagenet and alexnet](https://github.com/tensorflow/models/blob/master/research/slim/nets/alexnet.py)**
* **[2013: atari with deep reinforcement learning](https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial)**
* **[2014: seq2seq](https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt)**
* **[2014: adam optimizer](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/python/keras/optimizer_v2/adam.py#L32-L281)**
* **[2015: gans](https://www.tensorflow.org/tutorials/generative/dcgan)**
* **[2015: resnets](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/python/keras/applications/resnet.py)**
* **[2017: transformers](https://github.com/huggingface/transformers)**
* **[2018: bert](https://arxiv.org/abs/1810.04805)**
<br>
---
### deep reinforcement learning for trading
<br>
* an mdp (markov decision process) consists of a set of states, a set of actions, a transition function that describes the probability of moving from one state to another after taking an action, and a reward function that assigns a numerical reward to each state-action pair
* the goal of the agent is to maximize its expected cumulative reward over a sequence of actions, chosen according to a policy.
* a policy is a function that maps each state to a probability distribution over actions. the optimal policy is the one that maximizes the expected cumulative rewards.
* the problem of reinforcement learning can be formalized using ideas from dynamical systems theory, specifically, as the optimal control of incompletely-known markov decision processes.
* as opposed to supervised learning, an agent must be able to learn from its own experience; and as opposed to unsupervised learning, reinforcement learning tries to maximize a reward signal instead of trying to find hidden structure.
* the agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future. on a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward.
* beyond the agent and the environment, one can identify four main subelements of a reinforcement learning system: a policy, a reward signal, a value function, and, optionally, a model of the environment.
* traditional reinforcement learning problems can be formulated as a markov decision process (MDP):
* we have an agent acting in an environment
* each step *t* the agent receives as the input the current state S_t, takes an action A_t, and receives a reward R_{t+1} and the next state S_{t+1}
* the agent chooses the action based on some policy pi: A_t = pi(S_t)
* our goal is to find a policy that maximizes the cumulative reward Sum R_t over some finite or infinite time horizon (a minimal sketch of this loop is given after the diagram below)
<br>
<img width="500" src="https://user-images.githubusercontent.com/1130416/227799494-d62aab7f-d6cf-419f-be03-1d2dbdee1853.png">
<br>
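* a minimal sketch of the interaction loop above, using the **[gymnasium api](https://gymnasium.farama.org/)**; the cartpole environment and the random policy are placeholders for a real trading environment and a learned policy:
<br>
```python
# agent-environment loop: observe S_t, pick A_t, receive R_{t+1} and S_{t+1}
import gymnasium as gym

env = gym.make("CartPole-v1")            # stand-in for a trading environment
observation, info = env.reset(seed=42)   # initial observation X_0

total_reward = 0.0
for t in range(1_000):
    action = env.action_space.sample()   # A_t = pi(S_t); here pi is just random
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # accumulate Sum R_t
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```
<br>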
#### agent
<br>
* the agent is the trading agent (e.g. the human trader who opens the gui of an exchange and makes trading decisions based on the current state of the exchange and their account)
<br>
#### environment
<br>
* the exchange and other agents are the environment, and they are not something we can control
* by putting other agents together into some big complex environment, we lose the ability to explicitly model them
* trying to reverse-engineer the algorithms and strategies that other traders are running puts us into a multi-agent reinforcement learning (marl) problem setting
<br>
#### state
<br>
* in the case of trading on an exchange, we don't observe the complete state of the environment (e.g. other agents), so we are dealing with a partially observable markov decision process (pomdp).
* what the agents observe is not the actual state S_t of the environment, but some derivation of that.
* we can call the observation X_t, which is calculated using some function of the full state X_t ~ O(S_t)
* the observation at each timestep t is simply the history of all exchange events received up to time t.
* this event history can be used to build up the current exchange state; however, in order for our agent to make decisions, extra info such as account balance and open limit orders needs to be included.
<br>
#### time scale
<br>
* hft techniques: decisions are based almost entirely on market microstructure signals. decisions are made on nanosecond timescales, and trading strategies use dedicated connections to exchanges and extremely fast but simple algorithms running on fpga hardware.
* neural networks are comparatively slow: they can't make predictions on nanosecond timescales, so they can't compete with the speed of hft algorithms.
* guess: the optimal time scale is between a few milliseconds and a few minutes.
* can deep rl algorithms pick up hidden patterns?
<br>
#### action space
<br>
* the simplest approach has 3 actions: buy, hold, and sell. this works, but limits us to placing market orders and to investing a deterministic amount of money at each step.
* at the next level we would let our agents learn how much money to invest, based on the uncertainty of our model, putting us into a continuous action space.
* at the next level, we would introduce limit orders, and the agent needs to decide the level (price) and quantity of the order, and be able to cancel open orders that have not yet been matched (a sketch of these levels is given below).
<br>
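* a sketch of how these three levels of action space might be encoded; the class and field names are illustrative, not taken from any trading library:
<br>
```python
from dataclasses import dataclass, field
from enum import Enum

# level 1: discrete market-order actions only
class Side(Enum):
    BUY = 0
    HOLD = 1
    SELL = 2

# level 2: the agent also chooses how much capital to commit (continuous)
@dataclass
class SizedAction:
    side: Side
    fraction_of_capital: float   # in [0, 1]

# level 3: limit orders with a price level, a quantity, and cancellations
@dataclass
class LimitOrderAction:
    side: Side
    price: float
    quantity: float
    cancel_order_ids: list[str] = field(default_factory=list)
```
<br>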
#### reward function
<br>
* there are several possible reward functions; an obvious one would be realized pnl (profit and loss): the agent receives a reward whenever it closes a position.
* the net profit is either negative or positive, and this is the reward signal.
* as the agent maximizes the total cumulative reward, it learns to trade profitably. the reward function leads to the optimal policy in the limit.
* however, buy and sell actions are rare compared to doing nothing; the agent needs to learn without receiving frequent feedback.
* an alternative is unrealized pnl, which is the net profit the agent would get if it were to close all of its positions immediately.
* because the unrealized pnl may change at each time step, it gives the agent more frequent feedback signals. however, the direct feedback may bias the agent towards short-term actions.
* both naively optimize for profit, but a trader may want to minimize risk (lower volatility)
* using the sharpe ratio is one simple way to take risk into account; another is maximum drawdown (a toy sketch of the pnl-based reward signals is given after the figure below).
<br>
<img width="505" src="https://user-images.githubusercontent.com/1130416/227811225-9af06c79-3f86-48e8-899c-ee5a80bc91e1.png">
<br>
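* a toy sketch of the two pnl-based reward signals; the function names and interfaces are illustrative, not from any framework:
<br>
```python
# realized pnl: reward only when positions are closed (sparse feedback)
def realized_pnl_reward(closed_trades_pnl: list[float]) -> float:
    return sum(closed_trades_pnl)

# unrealized pnl: mark the open position to market at every step (dense feedback,
# but it may bias the agent towards short-term moves)
def unrealized_pnl_reward(position_size: float, entry_price: float, mark_price: float) -> float:
    return position_size * (mark_price - entry_price)
```
<br>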
#### learned policies
<br>
* instead of needing to hand-code a rule-based policy, rl directly learns a policy
<br>
#### trained directly in simulation environments
<br>
* we traditionally need separate backtesting and parameter optimization steps because it's difficult for our strategies to take into account environmental factors: order book liquidity, fee structures, latencies.
* getting around environmental limitations is part of the optimization process. if we simulate the latency in the reinforcement learning environment, and this results in the agent making a mistake, the agent will get a negative reward, forcing it to learn to work around the latencies.
* by learning a model of the environment and performing rollouts using techniques like a monte carlo tree search (mcts), we could take into account potential reactions of the market (other agents)
* by being smart about the data we collect from the live environment, we can continuously improve our model
* do we act optimally in the live environment to generate profits, or do we act suboptimally to gather interesting information that we can use to improve the model of our environment and other agents?
<br>
#### learning to adapt to market conditions
<br>
* some strategy may work better in a bearish environment but lose money in a bullish environment.
* because rl agents are learning powerful policies parameterized by neural networks, they can also learn to adapt to market conditions by seeing them in historical data, given that they are trained over long time horizons and have sufficient memory.
<br>
#### trading as research
<br>
* the trading environment is a multiplayer game with thousands of agents acting simultaneously
* understanding how to build models of other agents is only one possibility; we can also choose to perform actions in a live environment with the goal of maximizing the information gain with respect to the kinds of policies the other agents may be following
* trading agents receive sparse rewards from the market. naively applying reward-hungry rl algorithms will fail.
* this opens up the possibility for new algorithms and techniques that can efficiently deal with sparse rewards.
* many of today's standard algorithms, such as dqn or a3c, use a very naive approach to exploration - basically adding random noise to the policy. however, in the trading case, most states in the environment are bad, and there are only a few good ones. a naive random approach to exploration will almost never stumble upon good state-action pairs.
* the trading environment is inherently nonstationary. market conditions change and other agents join, leave, and constantly change their strategies.
* can we train an agent that can transition from bear to bull and then back to bear, without needing to be re-trained?

View File

@ -0,0 +1,119 @@
## reinforcement learning
<br>
### tl; dr
<br>
* reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal
* an autonomous agent is a software program or system that can operate independently and make decisions on its own, without direct intervention from a human
<br>
---
### overview
<br>
* we formalize the problem of reinforcement learning using ideas from dynamical systems theory, as the optimal control of incompletely-known markov decision processes.
* a learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state.
* markov decision processes are intended to include just these three aspects: sensation, action, and goal.
* the agent has to exploit what it has already experienced in order to obtain reward, but it has also to explore in order to make better action selections in the future.
* on stochastic tasks, each action must be tried many times to gain a reliable estimate of its expected reward.
<br>
---
### elements of reinforcement learning
<br>
* beyond the agent and the environment, four more elements belong to a reinforcement learning system: a policy, a reward signal, a value function, and (optionally) a model of the environment.
* a policy defines the learning agent's way of behaving at a given time. it's a mapping from perceived states of the environment to actions to be taken when in those states. in general, policies may be stochastic (specifying probabilities for each action).
* a reward signal defines the goal of a reinforcement learning problem: on each time step, the environment sends to the reinforcement learning agent a single number called the reward. the agent's sole objective is to maximize the total reward over the run.
* a value function specifies what is good in the long run: the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state
* a model of the environment mimics the behavior of the environment, allowing inferences to be made about how it will behave.
* the most important feature distinguishing reinforcement learning from other types of learning is that it uses training information that evaluates the actions taken rather than instructs by giving correct actions.
<br>
---
### finite markov decision processes (mdps)
<br>
* the problem involves evaluating feedbacks and choosing different actions in different situations.
* mdps are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations.
* mdps involve delayed reward and the need to trade off immediate and delayed reward.
<br>
##### the agent-environment interface
* mdps are meant to be a straightforward framing of the problem of learning from interaction to achieve a goal.
* the learner and decision maker is called the agent.
* the thing it interacts with, comprising everything outside the agent, is called the environment.
* the environment gives rise to rewards, numerical values that the agent seeks to maximize over time through its choice of actions.
<br>
<img width="466" src="https://user-images.githubusercontent.com/1130416/228971927-3c574911-d0ca-4d2d-b795-8b0776599952.png">
<br>
* the agent and the environment interact at each of a sequence of discrete steps, t = 0, 1, 2, 3...
* at each time step t, the agent receives some representation of the environment's state S_t
* on that basis, the agent selects an action A_t
* one step later, in part as a consequence of its action, the agent receives a numerical reward R_{t+1} and finds itself in a new state S_{t+1}.
* the mdp and the agent together give rise to a sequence, or trajectory: S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, ...
* in a finite mdp, the sets of states, actions, and rewards all have a finite number of elements. in this case, the random variables R and S have well-defined discrete probability distributions dependent only on the preceding state and action.
* in a markov decision process, the probabilities given by p completely characterize the environment's dynamics.
* the state must include information about all aspects of the past agent-environment interaction that make a difference for the future.
* anything that cannot be changed arbitrarily by the agent is considered to be outside of it and thus part of its environment.
<br>
##### goals and rewards
* each episode ends in a special state called the terminal state, followed by a reset to a standard starting state or to a sample from a standard distribution of starting states.
* almost all reinforcement learning algorithms involve estimating value functions, i.e. functions of states (or of state-action pairs) that estimate how good it is for the agent to be in a given state (or how good it is to perform a given action in a given state).
* the bellman equation averages over all the possibilities, weighting each by its probability of occurring. it states that the value of the start state must equal the (discounted) value of the expected next state, plus the reward expected along the way.
* solving a reinforcement learning task means finding a policy that achieves a lot of reward over the long run.
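* written out for the state-value function of a policy pi, the bellman equation reads v_pi(s) = sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [ r + gamma * v_pi(s') ], where gamma is the discount factor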
<br>
---
### dynamic programming
* a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as an mdp.
* a common way of obtaining approximate solutions for tasks with continuous states and actions is to quantize the state and action spaces and then apply finite-state DP methods.
* the reason for computing the value function for a policy is to help find better policies.
* asynchronous DP algorithms are in-place iterative DP algorithms that are not organized in terms of systematic sweeps of the state set. these algorithms update the values of states in any order whatsoever, using whatever values of other states happen to be available. the values of some states may be updated several times before the values of others are updated once.
* policy evaluation refers to the (typically) iterative computation of the value function for a given policy (a minimal sketch is given below).
* policy improvement refers to the computation of an improved policy given the value function for that policy.
<br>
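* a minimal sketch of iterative policy evaluation on a tiny, made-up two-state mdp; the transition probabilities, rewards, and policy below are invented purely for illustration:
<br>
```python
import numpy as np

n_states, n_actions, gamma, theta = 2, 2, 0.9, 1e-8

# p[s, a, s'] = transition probability, r[s, a] = expected immediate reward (toy values)
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

pi = np.full((n_states, n_actions), 0.5)   # equiprobable random policy

v = np.zeros(n_states)
while True:
    # one sweep of the bellman expectation backup for v_pi
    v_new = np.array([
        sum(pi[s, a] * (r[s, a] + gamma * p[s, a] @ v) for a in range(n_actions))
        for s in range(n_states)
    ])
    if np.max(np.abs(v_new - v)) < theta:  # stop once a sweep barely changes the values
        break
    v = v_new

print(v)   # state values under the random policy
```
<br>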
##### generalized policy iteration
* policy iteration consists of two simultaneous, interacting processes, one making the value function consistent with the current policy (policy evaluation), and the other making the policy greedy with respect to the current value function (policy improvement).
* generalized policy iteration (GPI) refers to the general idea of letting policy-evaluation and policy-improvement processes interact, independent of the granularity and other details of the two processes.
* DP is sometimes thought to be of limited applicability because of the curse of dimensionality, the fact that the number of states often grows exponentially with the number of state variables.
<br>
---
### cool resources
<br>
* **[gymnasium api](https://gymnasium.farama.org/)**
* **[reinforcement learning with unsupervised auxiliary tasks, by jaderberg et al.](https://arxiv.org/abs/1611.05397)**

View File

@ -0,0 +1,6 @@
## strategy backtesting
<br>
* use a simulator to test an initial version of the strategy against a set of historical data.
* the simulator can take into account things such as order book liquidity, network latencies, fees, etc. (a toy sketch of such a loop is given below).
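<br>
* a toy sketch of such a backtest loop; the prices, fee rate, and the naive momentum rule are made up, and a real simulator would also model liquidity, latency, and partial fills:
<br>
```python
prices = [100.0, 101.5, 101.0, 102.2, 103.0, 102.5]   # historical closes (made up)
fee_rate = 0.001                                       # assumed 10 bps per trade

cash, position = 1_000.0, 0.0
for prev, price in zip(prices, prices[1:]):
    if price > prev and position == 0:                 # toy momentum entry: buy all-in
        position = cash * (1 - fee_rate) / price
        cash = 0.0
    elif price < prev and position > 0:                # toy exit: sell everything
        cash = position * price * (1 - fee_rate)
        position = 0.0

final_equity = cash + position * prices[-1]
print(f"final equity: {final_equity:.2f}")
```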

View File

@ -0,0 +1,5 @@
## data analysis
<br>
* perform exploratory data analysis to find trading opportunities, such as looking at charts, calculating statistics, etc.

View File

@ -0,0 +1,159 @@
## DeFi and MEV Glossary
<br>
### A
- Arbitrage: the simultaneous buying and selling of assets (e.g., cryptocurrencies) in several markets to take advantage of their price discrepancies.
- Assets under management (AUM): the total market value of the investments that a person or entity manages on behalf of clients.
<br>
### B
- Backrunning: when an attacker attempts to have a transaction ordered immediately after a certain unconfirmed target transaction.
- Blocks: a block contains transaction data and the hash of the previous block ensuring immutability in the blockchain network. Each block in a blockchain contains a list of transactions in a particular order. These transactions encode the updates to the blockchain state.
- Block time: the time interval between blocks being added to the blockchain.
- Broadcasting: whenever a user interacts with the blockchain, they broadcast a request to include the transaction to the network. This request is public (anyone can listen to it).
- Builders: actors that take bundles (of pending transactions from the mempool) and create a final block to send to (multiple) relays (setting themselves as the feeRecipient to receive the block's MEV).
- Bundles: one or more transactions that are grouped together and executed in the order they are provided. In addition to the searcher's transaction(s), a bundle can also contain other users' pending transactions from the mempool. Bundles can target specific blocks for inclusion as well.
<br>
### C
- Central limit order book (CLOB): patient buyers and sellers post limit orders with the price and size that they are willing to buy or sell a given asset. Impatient buyers and sellers place market orders that run through the CLOB until the desired size is reached.
- Contract address: the address hosting some source code deployed on the Ethereum blockchain, which is executed by a triggering transaction.
- Crypto copy trading strategy: a trading strategy that uses automation to buy and sell crypto, letting you copy another trader's method.
<br>
### D
- Derivatives: financial contracts that derive their values from underlying assets.
- Dollar-cost-averaging (DCA) strategy: a one-stop automated trading strategy, based on time intervals, that reduces the influence of market volatility. Parameters for DCA can be: currency, fixed/maximum investment amount, and investment frequency.
<br>
### E
- Epoch: in the context of Ethereum's block production, in each slot (every 12 seconds), a validator is randomly chosen to propose the block in that slot. An epoch contains 32 slots.
- Externally owned account (EOA): an account that is a combination of public address and private key, and that can be used to send and receive Ether to/from another account. An Ethereum address is a 42-character hexadecimal address derived from the last 20 bytes of the public key of the account (with 0x appended in front).
<br>
### F
- Frontrunning: the process by which an adversary observes transactions on the network layer and acts on this information to obtain profit.
- Fully diluted valuations (FDV): the total number of tokens multiplied by the current price of a single token.
- Futures: contracts used as proxy tools to speculate on the future prices of crypto assets or to hedge against their price changes.
- Futures grid trading bots: bots that automate futures trading activities based on grid trading strategies (a set of orders is placed both above and below a specific reference market price for the asset).
<br>
### G
- Gas price: used somewhat like a bid, indicating an amount the user is willing to pay (per unit of execution) to have their transaction processed.
- Gwei: a small unit of the Ethereum network's Ether (ETH) cryptocurrency. A gwei or gigawei is defined as 1,000,000,000 wei, the smallest base unit of Ether; equivalently, 1 ETH represents 1 billion gwei.
- Grid trading strategy: a strategy that involves placing orders above and below a set price, using a price grid of orders (which shows orders at incrementally increasing and decreasing prices). Grid trading is based on the overarching goal of buying low and selling high.
<br>
### H
- Hedging: taking offsetting positions (e.g., shorts) to reduce the risk of an existing exposure.
<br>
### K
- Keys: blockchain account keys can be either private keys (for digital signatures), or public keys (for addresses).
<br>
### L
- Limit orders: when one longs or shorts a contract, several execution options can be placed (usually with a fee difference). Limit orders are set at a specific price to be traded, and there is no guarantee that the trade will be executed (see market orders and stop-loss orders).
- Liquidity pools: a collection of crypto assets that can be used for decentralized trading. They are essential for automated market makers (AMM), borrow-lend protocols, yield farming, synthetic assets, on-chain insurance, blockchain gaming, etc.
- Liquidation threshold: the percentage of the collateral value at which a loan is considered undercollateralized and becomes subject to liquidation.
- Liquidation: when the value of a borrowed asset exceeds the collateral. Anyone can liquidate the collateral and collect the liquidation fee for themselves.
- Long: traders maintain long positions, which means that they expect the price of a coin to rise in the future.
<br>
### M
- Fully diluted market capitalization: the total token supply, multiplied by the price of a single token.
- Circulating supply market capitalization: the number of tokens that are available in the market, multiplied by the price of a single token.
- Margin trading: buying or selling assets with leverage.
- Marginal seller: the seller who is first willing to leave the market if prices move lower.
- Market orders: Market orders are executed immediately at the asset's market price (see limit orders).
- Mean reversion strategy: a trading range (or mean reversion) strategy is based on the concept that an asset's high and low prices are a temporary effect that reverts to their mean value (average value).
- Mempool: a cryptocurrency node's mechanism for storing information on unconfirmed transactions.
- Merkle tree: a type of binary tree, composed of: 1) a set of nodes with a large number of leaf nodes at the bottom, containing the underlying data, 2) a set of intermediate nodes where each node is the hash of its two children, and 3) a single root node, also formed from the hash of its two children, representing the top of the tree.
- Minting: the process of validating information, creating a new block, and recording that information into the blockchain.
<br>
### P
- Perpetual contract: a contract without an expiration date, where interest rates can be calculated by methods such as Time-Weighted-Average-Price (TWAP).
- Priority gas auctions: bots compete against each other by bidding up transaction fees (gas) to extract revenue from arbitrage opportunities, driving up user fees.
- Private key: a secret number enabling a blockchain user to prove ownership of an account or contract, via a digital signature.
- Public key: a number generated by a one-way (hash) function from the private key, used to verify a digital signature made with the matching private key.
- Provider: an entity that provides an abstraction for a connection to the blockchain network.
- POFPs: private order flow protocols.
<br>
### O
- Order flow: in the context of Ethereum and EVM-based blockchains, an order is anything that allows changing the state of the blockchain.
- Open interest: total number of futures contracts held by market participants at the end of the trading day. Used as an indicator to determine market sentiment and the strength behind price trends.
<br>
### R
- RPC endpoints: network interfaces exposed by blockchain nodes through which clients send remote procedure calls (e.g., to read state or submit transactions).
<br>
### S
- Slots: in the context of Ethereum's block production, a slot is a time period of 12 seconds in which a randomly chosen validator has time to propose a block.
- Smart contracts: a computer protocol intended to enforce a contract on the blockchain without third parties. They are reliant upon code (the functions) and data (the state), and they can trigger specific actions, such as transferring tokens from A to B.
- Sandwich attack: when slippage value is not set, this attack can happen by an actor bumping the price of an asset to an unfavorable level, executing the trade, and then returning the asset to the original price.
- Slippage: delta in pricing between the time of order and when the order is executed.
- Short: traders maintain short positions, which means they expect the price of a coin to drop in the future.
- Short squeeze: occurs when a heavily shorted stock experiences an increase in price for some unexpected reason. This situation prompts short sellers to scramble to buy the stock to cover their positions and cap their mounting losses.
- Spot trading: buying or selling assets for immediate delivery.
- Statistical trading: the class of strategies that aim to generate profitable situations stemming from pricing inefficiencies among financial markets. Statistical arbitrage is a strategy to obtain profit by applying past statistics.
- Stop-loss orders: this type of order execution places a market/limit order to close a position to restrict an investor's loss on a crypto asset.
<br>
### T
- Total value locked (TVL): the value of all tokens locked in various DeFi protocols such as lending platforms, DEXes, or derivatives protocols.
- Trading volume: the total amount of traded cryptocurrency (equivalent to US dollars) during a given timeframe.
- Transaction: on EVM-based blockchains, the two types of transactions are normal transactions and contract interactions.
- Transaction hash: a unique 66-character identifier generated with each new transaction.
- Transaction ordering: blockchains usually have loose requirements for how transactions are ordered within a block, allowing attacks that benefit from certain ordering.
- Time-weighted average price strategy: TWAP strategy breaks up a large order and releases dynamically determined smaller chunks of the order to the market, using evenly divided time slots between a start and end time.
<br>
### V
- Validation: a mathematical proof that the state change in the blockchain is consistent. To be included into a block in the blockchain, a list of transactions needs to be validated.
- VTRPs: validator transaction reordering protocols.
- Volume-weighted average price strategy: VWAP breaks up a large order and releases dynamically determined smaller chunks of the order to the market, using historical volume profiles.
<br>
### W
- Whales: individuals or institutions who hold large amounts of coins of a certain cryptocurrency, and can become powerful enough to manipulate the valuation.

View File

@ -0,0 +1,5 @@
## live trading
<br>
* the strategy is now running live on an exchange.

View File

@ -0,0 +1,6 @@
## parameter optimization
<br>
* perform a search, for example a grid search, over possible values of strategy parameters like thresholds or coefficients, using the simulator and a set of historical data (a sketch is given below)
* overfitting to historical data is a big risk (be careful with validation and test sets).
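<br>
* a minimal sketch of such a grid search; `backtest_strategy`, the parameter names, and the fake score surface are placeholders for a real simulator:
<br>
```python
from itertools import product

def backtest_strategy(threshold: float, lookback: int) -> float:
    """placeholder for the simulator: run a backtest on a validation slice and
    return a score such as the sharpe ratio (here just a fake score surface)."""
    return -(threshold - 0.02) ** 2 - 0.001 * abs(lookback - 20)

grid = {"threshold": [0.01, 0.02, 0.05], "lookback": [10, 20, 50]}

best_params, best_score = None, float("-inf")
for threshold, lookback in product(grid["threshold"], grid["lookback"]):
    score = backtest_strategy(threshold, lookback)
    if score > best_score:
        best_params, best_score = {"threshold": threshold, "lookback": lookback}, score

print(best_params, best_score)
```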

View File

@ -0,0 +1,5 @@
## paper trading
<br>
* before the strategy goes live, simulation is done on new market data, in real-time (paper trading), which helps guard against overfitting to historical data

View File

@ -0,0 +1,5 @@
## policy development
<br>
* come up with a rule-based policy that determines which actions to take based on the current state of the market and the outputs of the supervised models (a minimal sketch is given below).
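<br>
* a minimal sketch of such a rule-based policy; the thresholds and the `predicted_return` input are assumptions, not tied to any specific model:
<br>
```python
def policy(predicted_return: float, position: float,
           enter_threshold: float = 0.002, exit_threshold: float = -0.002) -> str:
    """map the supervised model's predicted return and the current position to an action."""
    if position == 0 and predicted_return > enter_threshold:
        return "buy"
    if position > 0 and predicted_return < exit_threshold:
        return "sell"
    return "hold"

print(policy(predicted_return=0.004, position=0.0))   # -> buy
```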

View File

@ -0,0 +1,9 @@
## trading strategy metrics
<br>
* **net pnl (net profit and loss):** how much money an algorithm makes (positive) or loses (negative) over some period, minus trading fees
* **alpha and beta:** excess return relative to a benchmark (alpha) and sensitivity to the benchmark's moves (beta)
* **sharpe ratio:** the excess return per unit of risk you are taking (return on capital over the standard deviation adjusted for risk; the higher the better).
* **maximum drawdown:** maximum difference between a local maximum and a subsequent local minimum, as another measure of risk (a sketch of the sharpe ratio and maximum drawdown is given below).
* **value at risk (var):** how much capital you may lose over a given time frame with some probability, assuming normal market conditions.
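<br>
* a sketch of two of these metrics computed from a series of per-period returns; the sample returns, the zero risk-free rate, and the annualization factor of 252 are assumptions:
<br>
```python
import numpy as np

returns = np.array([0.01, -0.005, 0.02, -0.01, 0.015])   # per-period strategy returns (made up)

# sharpe ratio: mean excess return per unit of volatility, annualized
sharpe = returns.mean() / returns.std(ddof=1) * np.sqrt(252)

# maximum drawdown: worst peak-to-trough drop of the cumulative equity curve
equity = np.cumprod(1 + returns)
running_peak = np.maximum.accumulate(equity)
max_drawdown = np.max((running_peak - equity) / running_peak)

print(f"sharpe: {sharpe:.2f}, max drawdown: {max_drawdown:.2%}")
```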

View File

@ -0,0 +1,5 @@
## supervised learning
<br>
* train one or more supervised learning models to predict quantities of interest that are necessary for the strategy to work, for example price prediction, quantity prediction, etc. (a toy sketch is given below)
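<br>
* a toy sketch of such a model, a plain least-squares regression predicting the next return from lagged returns; the data and features are made up:
<br>
```python
import numpy as np

returns = np.array([0.01, -0.005, 0.02, -0.01, 0.015, 0.005, -0.002, 0.01])
X = np.column_stack([returns[1:-1], returns[:-2]])   # lagged features
y = returns[2:]                                       # target: next-period return

coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # ordinary least squares fit
prediction = np.array([returns[-1], returns[-2]]) @ coef
print(f"predicted next-period return: {prediction:.4f}")
```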

34
agents/trading_on_gmx.md Normal file
View File

@ -0,0 +1,34 @@
## basics on trading, illustrated on gmx
<br>
#### price chart
* the current price is the price of the most recent trade.
* it varies depending on whether that trade was a buy or a sell.
* high volume means the price movement is more reliable (consensus of a large number of market participants).
* candlestick charts show open/start (o), high (h), low (l), and close/end (c) prices for a given time window (a toy sketch of building candles from trades is given after the charts below).
<br>
<img width="400" src="https://user-images.githubusercontent.com/1130416/227733463-d0dff53f-9a5f-45f3-80a4-9d9ab0d9201e.png">
<img width="400" src="https://user-images.githubusercontent.com/1130416/227733575-90550afd-99f2-45cc-b6aa-fd4457910cc5.png">
<br>
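* a toy sketch of resampling a stream of trades into ohlc candles; the timestamps, prices, and the 30-second window are made up:
<br>
```python
import pandas as pd

trades = pd.DataFrame(
    {"price": [100.0, 100.4, 99.8, 100.9, 101.2, 100.7]},
    index=pd.date_range("2024-01-01 09:30", periods=6, freq="10s"),
)

# resample into 30-second candles: open, high, low, close
candles = trades["price"].resample("30s").ohlc()
print(candles)
```
<br>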
----
#### order book
<br>
* the order book is made of two sides, asks (sell, offers) and bids (buy).
* the best ask (the lowest price someone is willing to sell at) > the best bid (the highest price someone is willing to buy at) (a toy sketch of these quantities is given below).
* the difference between the best ask and the best bid is called spread.
* **market order**: best price possible, right now. it takes liquidity from the market and usually has higher fees.
* **limit order (passive order)**: specify the price and qty you are willing to buy or sell at, and then wait for the match.
* **stop orders**: allow you to set a maximum price for your market orders.
<br>
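* a toy sketch of best bid, best ask, spread, and mid price on a tiny, made-up order book; all prices and sizes are invented:
<br>
```python
bids = [(99.5, 2.0), (99.0, 5.0), (98.5, 1.0)]      # (price, size): buyers
asks = [(100.0, 1.5), (100.5, 3.0), (101.0, 2.0)]   # (price, size): sellers

best_bid = max(price for price, _ in bids)   # highest price someone will buy at
best_ask = min(price for price, _ in asks)   # lowest price someone will sell at

spread = best_ask - best_bid
mid_price = (best_ask + best_bid) / 2

print(f"best bid {best_bid}, best ask {best_ask}, spread {spread:.2f}, mid {mid_price}")
```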

Binary file not shown.

3
claude/README.md Normal file
View File

@ -0,0 +1,3 @@
## claude
<br>