AI/gpt4all

mirror of https://github.com/nomic-ai/gpt4all.git synced 2024-10-01 01:06:10 -04:00

Author	SHA1	Message	Date
Aaron Miller	f79557d2aa	speedup: just use matvec shaders for matmat so far my from-scratch matmats are still slower than just running more invocations of the existing Metal ported matvec shaders - it should be theoretically possible to make a matmat that's faster (for actual matmat cases) than an optimal matvec, but it will need to be at least* as fast as the mat*vec op and then take special care to be cache-friendly and save memory bandwidth, as the # of compute ops is the same	2023-10-16 13:45:51 -04:00
cebtenzzre	22de3c56bd	convert scripts: fix AutoConfig typo (#1512 )	2023-10-13 14:16:51 -04:00
Aaron Miller	2490977f89	q6k, q4_1 mat*mat	2023-10-12 14:56:54 -04:00
Aaron Miller	afaa291eab	python bindings should be quiet by default * disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is nonempty * make verbose flag for retrieve_model default false (but also be overridable via gpt4all constructor) should be able to run a basic test: ```python import gpt4all model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf') print(model.generate('def fib(n):')) ``` and see no non-model output when successful	2023-10-11 14:14:36 -07:00
cebtenzzre	7b611b49f2	llmodel: print an error if the CPU does not support AVX (#1499 )	2023-10-11 15:09:40 -04:00
Aaron Miller	043617168e	do not process prompts on gpu yet	2023-10-11 13:15:50 -04:00
Aaron Miller	64001a480a	mat*mat for q4_0, q8_0	2023-10-11 13:15:50 -04:00
cebtenzzre	7a19047329	llmodel: do not call magic_match unless build variant is correct (#1488 )	2023-10-11 11:30:48 -04:00
Cebtenzzre	5fe685427a	chat: clearer CPU fallback messages	2023-10-06 11:35:14 -04:00
Adam Treat	eec906aa05	Speculative fix for build on mac.	2023-10-05 18:37:33 -04:00
Adam Treat	a9acdd25de	Push a new version number for llmodel backend now that it is based on gguf.	2023-10-05 18:18:07 -04:00
Cebtenzzre	8bb6a6c201	rebase on newer llama.cpp	2023-10-05 18:16:19 -04:00
Cebtenzzre	d87573ea75	remove old llama.cpp submodules	2023-10-05 18:16:19 -04:00
Cebtenzzre	cc6db61c93	backend: fix build with Visual Studio generator Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This is needed because Visual Studio is a multi-configuration generator, so we do not know what the build type will be until `cmake --build` is called. Fixes #1470	2023-10-05 18:16:19 -04:00
Adam Treat	f605a5b686	Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.	2023-10-05 18:16:19 -04:00
Cebtenzzre	672cb850f9	differentiate between init failure and unsupported models	2023-10-05 18:16:19 -04:00
Adam Treat	906699e8e9	Bump to latest llama/gguf branch.	2023-10-05 18:16:19 -04:00
Cebtenzzre	088afada49	llamamodel: fix static vector in LLamaModel::endTokens	2023-10-05 18:16:19 -04:00
Adam Treat	b4d82ea289	Bump to the latest fixes for vulkan in llama.	2023-10-05 18:16:19 -04:00
Adam Treat	12f943e966	Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.	2023-10-05 18:16:19 -04:00
Adam Treat	5d346e13d7	Add q6_k kernels for vulkan.	2023-10-05 18:16:19 -04:00
Adam Treat	4eefd386d0	Refactor for subgroups on mat * vec kernel.	2023-10-05 18:16:19 -04:00
Cebtenzzre	3c2aa299d8	gptj: remove unused variables	2023-10-05 18:16:19 -04:00
Cebtenzzre	f9deb87d20	convert scripts: add feed-forward length for better compatibility This GGUF key is used by all llama.cpp models with upstream support.	2023-10-05 18:16:19 -04:00
Cebtenzzre	cc7675d432	convert scripts: make gptj script executable	2023-10-05 18:16:19 -04:00
Cebtenzzre	0493e6eb07	convert scripts: use bytes_to_unicode from transformers	2023-10-05 18:16:19 -04:00
Cebtenzzre	d5d72f0361	gpt-j: update inference to match latest llama.cpp insights - Use F16 KV cache - Store transposed V in the cache - Avoid unnecessary Q copy Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78	2023-10-05 18:16:19 -04:00
Cebtenzzre	050e7f076e	backend: port GPT-J to GGUF	2023-10-05 18:16:19 -04:00
Cebtenzzre	8f3abb37ca	fix references to removed model types	2023-10-05 18:16:19 -04:00
Cebtenzzre	4219c0e2e7	convert scripts: make them directly executable	2023-10-05 18:16:19 -04:00
Cebtenzzre	ce7be1db48	backend: use llamamodel.cpp for Falcon	2023-10-05 18:16:19 -04:00
Cebtenzzre	cca9e6ce81	convert_mpt_hf_to_gguf.py: better tokenizer decoding	2023-10-05 18:16:19 -04:00
Cebtenzzre	25297786db	convert scripts: load model as late as possible	2023-10-05 18:16:19 -04:00
Cebtenzzre	fd47088f2b	conversion scripts: cleanup	2023-10-05 18:16:19 -04:00
Cebtenzzre	6277eac9cc	backend: use llamamodel.cpp for StarCoder	2023-10-05 18:16:19 -04:00
Cebtenzzre	17fc9e3e58	backend: port Replit to GGUF	2023-10-05 18:16:19 -04:00
Cebtenzzre	7c67262a13	backend: port MPT to GGUF	2023-10-05 18:16:19 -04:00
Cebtenzzre	42bcb814b3	backend: port BERT to GGUF	2023-10-05 18:16:19 -04:00
Cebtenzzre	1d29e4696c	llamamodel: metal supports all quantization types now	2023-10-05 18:16:19 -04:00
Aaron Miller	507753a37c	macos build fixes	2023-10-05 18:16:19 -04:00
Adam Treat	d90d003a1d	Latest rebase on llama.cpp with gguf support.	2023-10-05 18:16:19 -04:00
Adam Treat	99c106e6b5	Fix a bug seen on AMD RADEON cards with vulkan backend.	2023-09-26 11:59:47 -04:00
Jacob Nguyen	e86c63750d	Update llama.cpp.cmake Signed-off-by: Jacob Nguyen <76754747+jacoobes@users.noreply.github.com>	2023-09-16 11:42:56 -07:00
Adam Treat	84905aa281	Fix for crashes on systems where vulkan is not installed properly.	2023-09-16 12:19:46 -04:00
Adam Treat	045f6e6cdc	Link against ggml in bin so we can get the available devices without loading a model.	2023-09-15 14:45:25 -04:00
Adam Treat	aa33419c6e	Fallback to CPU more robustly.	2023-09-14 16:53:11 -04:00
Adam Treat	9013a089bd	Bump to new llama with new bugfix.	2023-09-14 10:02:11 -04:00
Adam Treat	3076e0bf26	Only show GPU when we're actually using it.	2023-09-14 09:59:19 -04:00
Adam Treat	cf4eb530ce	Sync to a newer version of llama.cpp with bugfix for vulkan.	2023-09-13 21:01:44 -04:00
Adam Treat	4b9a345aee	Update the submodule.	2023-09-13 17:05:46 -04:00
Aaron Miller	6f038c136b	init at most one vulkan device, submodule update fixes issues w/ multiple of the same gpu	2023-09-13 12:49:53 -07:00
Adam Treat	8f99dca70f	Bring the vulkan backend to the GUI.	2023-09-13 11:26:10 -04:00
Aaron Miller	f0735efa7d	vulkan python bindings on windows fixes	2023-09-12 14:16:02 -07:00
Adam Treat	c953b321b7	Don't link against libvulkan.	2023-09-12 14:26:56 -04:00
Aaron Miller	c4d23512e4	remove extra dynamic linker deps when building with vulkan	2023-09-11 08:44:39 -07:00
Adam Treat	85e34598f9	more circleci	2023-08-31 15:29:54 -04:00
Adam Treat	f578fa6cdf	Fix for windows.	2023-08-31 15:29:54 -04:00
Adam Treat	17d3e4976c	Add a comment indicating future work.	2023-08-31 15:29:54 -04:00
Adam Treat	987546c63b	Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.	2023-08-31 15:29:54 -04:00
Adam Treat	d55cbbee32	Update to newer llama.cpp and disable older forks.	2023-08-31 15:29:54 -04:00
Aaron Miller	0bc2274869	bump llama.cpp version + needed fixes for that	2023-08-31 15:29:54 -04:00
aaron miller	33c22be2aa	starcoder: use ggml_graph_plan	2023-08-31 15:29:54 -04:00
Cosmic Snow	108d950874	Fix Windows unable to load models on older Windows builds - Replace high-level IsProcessorFeaturePresent - Reintroduce low-level compiler intrinsics implementation	2023-08-09 09:27:43 +02:00
Adam Treat	6d03b3e500	Add starcoder support.	2023-07-27 09:15:16 -04:00
cosmic-snow	2d02c65177	Handle edge cases when generating embeddings (#1215 ) * Handle edge cases when generating embeddings * Improve Python handling & add llmodel_c.h note - In the Python bindings fail fast with a ValueError when text is empty - Advise other bindings authors to do likewise in llmodel_c.h	2023-07-17 13:21:03 -07:00
Aaron Miller	1c4a244291	bump mem allocation a bit	2023-07-14 09:48:57 -04:00
Adam Treat	ee4186d579	Fixup bert python bindings.	2023-07-14 09:48:57 -04:00
cosmic-snow	6200900677	Fix Windows MSVC arch detection (#1194 ) - in llmodel.cpp to fix AVX-only handling Signed-off-by: cosmic-snow <134004613+cosmic-snow@users.noreply.github.com>	2023-07-13 14:44:17 -04:00
Adam Treat	4963db8f43	Bump the version numbers for both python and c backend.	2023-07-13 14:21:46 -04:00
Adam Treat	0efdbfcffe	Bert	2023-07-13 14:21:46 -04:00
Adam Treat	315a1f2aa2	Move it back as internal class.	2023-07-13 14:21:46 -04:00
Adam Treat	ae8eb297ac	Add sbert backend.	2023-07-13 14:21:46 -04:00
Adam Treat	1f749d7633	Clean up backend code a bit and hide impl. details.	2023-07-13 14:21:46 -04:00
Adam Treat	33557b1f39	Move the implementation out of llmodel class.	2023-07-13 14:21:46 -04:00
Aaron Miller	432b7ebbd7	include windows.h just to be safe	2023-07-12 12:46:46 -04:00
Aaron Miller	95b8fb312e	windows/msvc: use high level processor feature detection API see https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent	2023-07-12 12:46:46 -04:00
Aaron Miller	f0faa23ad5	cmakelists: always export build commands (#1179 ) friendly for using editors with clangd integration that don't also manage the build themselves	2023-07-12 10:49:24 -04:00
Aaron Miller	4a24b586df	llama.cpp: metal buffer freeing	2023-06-30 21:07:21 -03:00
Aaron Miller	137bc2c367	replit: free metal context	2023-06-30 21:07:21 -03:00
Aaron Miller	57dc0c8953	adjust eval buf sizes to pass long input test	2023-06-30 21:07:21 -03:00
Aaron Miller	7a5f6e4726	limit prompt batch size to 128	2023-06-30 21:07:21 -03:00
Aaron Miller	883775bc5f	move 230511 submodule to nomic fork, fix alibi assert	2023-06-30 21:07:21 -03:00
Andriy Mulyar	46a0762bd5	Python Bindings: Improved unit tests, documentation and unification of API (#1090 ) * Makefiles, black, isort * Black and isort * unit tests and generation method * chat context provider * context does not reset * Current state * Fixup * Python bindings with unit tests * GPT4All Python Bindings: chat contexts, tests * New python bindings and backend fixes * Black and Isort * Documentation error * preserved n_predict for backwards compat with langchain --------- Co-authored-by: Adam Treat <treat.adam@gmail.com>	2023-06-30 16:02:02 -04:00
Aaron Miller	40a3faeb05	Use ggml scratch bufs for mpt and gptj models (#1104 ) * backend/gptj: use scratch buffers reduces total memory required and makes eval buf not grow with n_past * backend/mpt: use scratch bufs * fix format-related compile warnings	2023-06-30 10:53:45 -07:00
Aaron Miller	8d19ef3909	backend: factor out common elements in model code (#1089 ) * backend: factor out common structs in model code prepping to hack on these by hopefully making there be fewer places to fix the same bug rename * use common buffer wrapper instead of manual malloc * fix replit compile warnings	2023-06-28 17:35:07 -07:00
Aaron Miller	28d41d4f6d	falcon: use model-local eval & scratch bufs (#1079 ) fixes memory leaks copied from ggml/examples based implementation	2023-06-27 16:09:11 -07:00
Zach Nussbaum	2565f6a94a	feat: add conversion script	2023-06-27 14:06:39 -03:00
Aaron Miller	198b5e4832	add Falcon 7B model Tested with https://huggingface.co/TheBloke/falcon-7b-instruct-GGML/blob/main/falcon7b-instruct.ggmlv3.q4_0.bin	2023-06-27 14:06:39 -03:00
Aaron Miller	db34a2f670	llmodel: skip attempting Metal if model+kvcache > 53% of system ram	2023-06-26 19:46:49 -03:00
Aaron Miller	b19a3e5b2c	add requiredMem method to llmodel impls most of these can just shortcut out of the model loading logic llama is a bit worse to deal with because we submodule it so I have to at least parse the hparams, and then I just use the size on disk as an estimate for the mem size (which seems reasonable since we mmap() the llama files anyway)	2023-06-26 18:27:58 -03:00
Adam Treat	a0f80453e5	Use sysinfo in backend.	2023-06-26 14:14:49 -04:00
niansa/tuxifan	47323f8591	Update replit.cpp replit_tokenizer_detokenize returning std::string now Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-26 14:49:58 -03:00
niansa	0855c0df1d	Fixed Replit implementation compile warnings	2023-06-26 14:49:58 -03:00
Aaron Miller	1290b32451	update to latest mainline llama.cpp add max_size param to ggml_metal_add_buffer - introduced in https://github.com/ggerganov/llama.cpp/pull/1826	2023-06-26 14:40:52 -03:00
niansa/tuxifan	5eee16c97c	Do not specify "success" as error for unsupported models Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-22 09:28:40 +02:00
Adam Treat	bd58c46da0	Initialize these to nullptr to prevent double deletion when a model fails to load.	2023-06-20 18:23:45 -04:00
niansa/tuxifan	68f9786ed9	Use operator ""_MiB (#991 )	2023-06-16 15:56:22 -04:00
Aaron Miller	abc081e48d	fix llama.cpp k-quants (#988 ) * enable k-quants on all mainline builds	2023-06-15 14:06:14 -07:00
Aaron Miller	c4319d2c8e	dlhandle: prevent libs from using each other's symbols (#977 ) use RTLD_LOCAL so that symbols are only exposed via dlsym without this all symbols exported by the libs are available for symbol resolution, resulting in different lib versions potentially resolving each other's symbols, causing incredibly cursed behavior such as https://gist.github.com/apage43/085c1ff69f6dd05387793ebc301840f6	2023-06-13 14:52:11 -04:00
Aaron Miller	f71d8efc71	metal replit (#931 ) metal+replit makes replit work with Metal and removes its use of `mem_per_token` in favor of fixed size scratch buffers (closer to llama.cpp)	2023-06-13 07:29:14 -07:00
Aaron Miller	85964a7635	bump llama.cpp mainline to latest (#964 )	2023-06-13 08:40:38 -04:00
Tim Miller	797891c995	Initial Library Loader for .NET Bindings / Update bindings to support newest changes (#763 ) * Initial Library Loader * Load library as part of Model factory * Dynamically search and find the dlls * Update tests to use locally built runtimes * Fix dylib loading, add macos runtime support for sample/tests * Bypass automatic loading by default. * Only set CMAKE_OSX_ARCHITECTURES if not already set, allow cross-compile * Switch Loading again * Update build scripts for mac/linux * Update bindings to support newest breaking changes * Fix build * Use llmodel for Windows * Actually, it does need to be libllmodel * Name * Remove TFMs, bypass loading by default * Fix script * Delete mac script --------- Co-authored-by: Tim Miller <innerlogic4321@ghmail.com>	2023-06-13 14:05:34 +02:00
Aaron Miller	88616fde7f	llmodel: change tokenToString to not use string_view (#968 ) fixes a definite use-after-free and likely avoids some other potential ones - std::string will convert to a std::string_view automatically but as soon as the std::string in question goes out of scope it is already freed and the string_view is pointing at freed memory - this is mostly fine if its returning a reference to the tokenizer's internal vocab table but it's, imo, too easy to return a reference to a dynamically constructed string with this as replit is doing (and unfortunately needs to do to convert the internal whitespace replacement symbol back to a space)	2023-06-13 07:14:02 -04:00
Adam Treat	84deebd223	Fix compile for windows and linux again. PLEASE DON'T REVERT THIS!	2023-06-12 17:08:55 -04:00
Juuso Alasuutari	5cfb1bda89	llmodel: add model wrapper destructor, fix mem leak in golang bindings (#862 ) Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>	2023-06-12 09:41:22 -07:00
Cosmic Snow	ae4a275bcd	Fix Windows MSVC AVX builds - bug introduced in `0cb2b86730` - currently getting: `warning C5102: ignoring invalid command-line macro definition '/arch:AVX2'` - solution is to use `_options(...)` not `_definitions(...)`	2023-06-12 08:55:55 -07:00
Adam Treat	b906fb4057	When recalculating context we can't erase the BOS.	2023-06-12 08:43:20 -07:00
Aaron Miller	d3ba1295a7	Metal+LLama take two (#929 ) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 16:48:46 -04:00
Adam Treat	b162b5c64e	Revert "llama on Metal (#885 )" This reverts commit `c55f81b860`.	2023-06-09 15:08:46 -04:00
Aaron Miller	c55f81b860	llama on Metal (#885 ) Support latest llama with Metal --------- Co-authored-by: Adam Treat <adam@nomic.ai> Co-authored-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 14:58:12 -04:00
niansa/tuxifan	14e9ccbc6a	Do auto detection by default in C++ API Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 17:01:19 +02:00
niansa/tuxifan	f03da8d732	Removed double-static from variables in replit.cpp The anonymous namespace already makes it static. Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-09 08:55:15 -04:00
niansa	0cb2b86730	Synced llama.cpp.cmake with upstream	2023-06-08 18:21:32 -04:00
Aaron Miller	47fbc0e309	non-llama: explicitly greedy sampling for temp<=0 (#901 ) copied directly from llama.cpp - without this temp=0.0 will just scale all the logits to infinity and give bad output	2023-06-08 11:08:30 -07:00
Aaron Miller	b14953e136	sampling: remove incorrect offset for n_vocab (#900 ) no effect, but avoids a potential bug later if we use actualVocabSize - which is for when a model has a larger embedding tensor/# of output logits than actually trained token to allow room for adding extras in finetuning - presently all of our models have had "placeholder" tokens in the vocab so this hasn't broken anything, but if the sizes did differ we want the equivalent of `logits[actualVocabSize:]` (the start point is unchanged), not `logits[-actualVocabSize:]` (this.)	2023-06-08 11:08:10 -07:00
Adam Treat	010a04d96f	Revert "Synced llama.cpp.cmake with upstream (#887 )" This reverts commit `89910c7ca8`.	2023-06-08 07:23:41 -04:00
Adam Treat	7e304106cc	Fix for windows.	2023-06-07 12:58:51 -04:00
niansa/tuxifan	89910c7ca8	Synced llama.cpp.cmake with upstream (#887 )	2023-06-07 09:18:22 -07:00
Richard Guo	c4706d0c14	Replit Model (#713 ) * porting over replit code model to gpt4all * replaced memory with kv_self struct * continuing debug * welp it built but lot of sus things * working model loading and somewhat working generate.. need to format response? * revert back to semi working version * finally got rid of weird formatting * figured out problem is with python bindings - this is good to go for testing * addressing PR feedback * output refactor * fixed prompt response collection * cleanup * addressing PR comments * building replit backend with new ggmlver code * chatllm replit and clean python files * cleanup * updated replit to match new llmodel api * match llmodel api and change size_t to Token * resolve PR comments * replit model commit comment	2023-06-06 17:09:00 -04:00
Adam Treat	c5de9634c9	Fix llama models on linux and windows.	2023-06-05 14:31:15 -04:00
Adam Treat	8a9ad258f4	Fix symbol resolution on windows.	2023-06-05 11:19:02 -04:00
Adam Treat	812b2f4b29	Make installers work with mac/windows for big backend change.	2023-06-05 09:23:17 -04:00
Adam Treat	f73333c6a1	Update to latest llama.cpp	2023-06-04 19:57:34 -04:00
Adam Treat	301d2fdbea	Fix up for newer models on reset context. This fixes the model from totally failing after a reset context.	2023-06-04 19:31:20 -04:00
AT	5f95aa9fc6	We no longer have an avx_only repository and better error handling for minimum hardware requirements. (#833 )	2023-06-04 15:28:58 -04:00
AT	bbe195ee02	Backend prompt dedup (#822 ) * Deduplicated prompt() function code	2023-06-04 08:59:24 -04:00
Ikko Eltociear Ashimine	945297d837	Update README.md huggingface -> Hugging Face Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>	2023-06-04 08:46:37 -04:00
Peter Gagarinov	23391d44e0	Only default mlock on macOS where swap seems to be a problem Repeating the change that once was done in https://github.com/nomic-ai/gpt4all/pull/663 but then was overridden by `48275d0dcc` Signed-off-by: Peter Gagarinov <pgagarinov@users.noreply.github.com>	2023-06-03 07:51:18 -04:00
niansa/tuxifan	f3564ac6b9	Fixed tons of warnings and clazy findings (#811 )	2023-06-02 15:46:41 -04:00
niansa/tuxifan	d6a70ddb5f	Fixed model type for GPT-J (#815 ) Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>	2023-06-02 15:46:33 -04:00
Richard Guo	e709e58603	more cleanup	2023-06-02 12:32:26 -04:00
Richard Guo	98420ea6d5	cleanup	2023-06-02 12:32:26 -04:00
Richard Guo	c54c42e3fb	fixed finding model libs	2023-06-02 12:32:26 -04:00
Adam Treat	cec8831e12	Fix mac build again.	2023-06-02 10:51:09 -04:00
Adam Treat	70e3b7e907	Try and fix build on mac.	2023-06-02 10:47:12 -04:00
Adam Treat	a41bd6ac0a	Trying to shrink the copy+paste code and do more code sharing between backend model impl.	2023-06-02 07:20:59 -04:00
Tim Miller	87cb3505d3	Fix MSVC Build, Update C# Binding Scripts	2023-06-01 14:24:23 -04:00
niansa/tuxifan	27e80e1d10	Allow user to specify custom search path via $GPT4ALL_IMPLEMENTATIONS_PATH (#789 )	2023-06-01 17:41:04 +02:00
niansa	5175db2781	Fixed double-free in LLModel::Implementation destructor	2023-06-01 11:19:08 -04:00
niansa/tuxifan	fc60f0c09c	Cleaned up implementation management (#787 ) * Cleaned up implementation management * Initialize LLModel::m_implementation to nullptr * llmodel.h: Moved dlhandle fwd declare above LLModel class	2023-06-01 16:51:46 +02:00
Adam Treat	1eca524171	Add fixme's and clean up a bit.	2023-06-01 07:57:10 -04:00
niansa	a3d08cdcd5	Dlopen better implementation management (Version 2)	2023-06-01 07:44:15 -04:00
niansa/tuxifan	92407438c8	Advanced avxonly autodetection (#744 ) * Advanced avxonly requirement detection	2023-05-31 21:26:18 -04:00
AT	48275d0dcc	Dlopen backend 5 (#779 ) Major change to the backend that allows for pluggable versions of llama.cpp/ggml. This was squashed merged from dlopen_backend_5 where the history is preserved.	2023-05-31 17:04:01 -04:00
Adam Treat	7f9f91ad94	Revert "New tokenizer implementation for MPT and GPT-J" This reverts commit `bbcee1ced5`.	2023-05-30 12:59:00 -04:00
Adam Treat	cdc7d6ccc4	Revert "buf_ref.into() can be const now" This reverts commit `d59c77ac55`.	2023-05-30 12:58:53 -04:00
Adam Treat	b5edaa2656	Revert "add tokenizer readme w/ instructions for convert script" This reverts commit `5063c2c1b2`.	2023-05-30 12:58:18 -04:00
aaron miller	5063c2c1b2	add tokenizer readme w/ instructions for convert script	2023-05-30 12:05:57 -04:00
Aaron Miller	d59c77ac55	buf_ref.into() can be const now	2023-05-30 12:05:57 -04:00
Aaron Miller	bbcee1ced5	New tokenizer implementation for MPT and GPT-J Improves output quality by making these tokenizers more closely match the behavior of the huggingface `tokenizers` based BPE tokenizers these models were trained with. Featuring: * Fixed unicode handling (via ICU) * Fixed BPE token merge handling * Complete added vocabulary handling	2023-05-30 12:05:57 -04:00

269 Commits
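
Note on commit 47fbc0e309 ("non-llama: explicitly greedy sampling for temp<=0"): the message says that without a special case, temp=0.0 scales the logits to infinity and gives bad output. A minimal sketch of that guard follows, in Python for brevity. It is not the backend's actual C++ code; `sample_token` and its parameters are hypothetical names chosen for illustration.

```python
import math
import random


def sample_token(logits, temp, rng=random):
    """Pick a token index from raw logits (illustrative helper, not the gpt4all API)."""
    if temp <= 0:
        # Greedy path: dividing by temp would blow up (or divide by zero),
        # so return the index of the largest logit directly.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature path: scale logits, then apply a numerically stable softmax.
    scaled = [value / temp for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    # random.choices normalizes the weights itself.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


print(sample_token([1.0, 3.0, 2.0], 0.0))  # prints 1, the index of the largest logit
```

With temp > 0 the result is random, so only the greedy path has a fixed output.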