diff --git a/docs/ai-chat.md b/docs/ai-chat.md
index 79609eee..40a05c24 100755
--- a/docs/ai-chat.md
+++ b/docs/ai-chat.md
@@ -15,7 +15,7 @@ The use of **AI chat**, also known as Large Language Models (LLMs), has become i
 
 ## Privacy Concerns About LLMs
 
-Data used to train AI models, however, includes a massive amount of publicly available data scraped from the web, which can include sensitive information like names and addresses. Cloud-based AI software often [collects your inputs](https://openai.com/policies/row-privacy-policy), meaning your chats are not private from them. This practice also introduces a risk of data breaches. Furthermore, there is a real possibility that an LLM will leak your private chat information in future conversations with other users.
+Data used to train AI models, however, includes a massive amount of publicly available data scraped from the web, which can include sensitive information like names and addresses. Cloud-based AI software often [collects your inputs](https://openai.com/policies/row-privacy-policy), meaning your chats are not private from them. Even deleted [chats can be kept](https://openai.com/index/response-to-nyt-data-demands/). This practice also introduces a risk of data breaches. Furthermore, there is a real possibility that an LLM will leak your private chat information in future conversations with other users.
 
 If you are concerned about these practices, you can either refuse to use AI, or use [truly open-source models](https://proton.me/blog/how-to-build-privacy-first-ai) which publicly release and allow you to inspect their training datasets. One such model is [OLMoE](https://allenai.org/blog/olmoe-an-open-small-and-state-of-the-art-mixture-of-experts-model-c258432d0514) made by [Ai2](https://allenai.org/open-data).
 
@@ -25,9 +25,9 @@ Alternatively, you can run AI models locally so that your data never leaves your device.
 
 ### Hardware for Local AI Models
 
-Local models are also fairly accessible. It's possible to run smaller models at lower speeds on as little as 8 GB of RAM. Using more powerful hardware such as a dedicated GPU with sufficient VRAM or a modern system with fast LPDDR5X memory offers the best experience.
+Local models are also fairly accessible. They can run on most PCs and some high-end smartphones. It's possible to run smaller models at lower speeds on as little as 8 GB of RAM. Using more powerful hardware such as a dedicated GPU with sufficient VRAM or a modern system with fast LPDDR5X memory offers the best experience.
 
-LLMs can usually be differentiated by the number of parameters, which can vary between 1.3B to 405B for open-source models available for end users. For example, models below 6.7B parameters are only good for basic tasks like text summaries, while models between 7B and 13B are a great compromise between quality and speed. Models with advanced reasoning capabilities are generally around 70B.
+LLMs can usually be differentiated by the number of parameters, which can vary from 1.3B to 405B for open-source models available for end users. For example, models below 3B parameters are only reliably good for simple linguistic tasks such as summarization, while models above 4B start to understand context and have good knowledge of the world. From 8B, they can perform basic reasoning. Finally, models with advanced reasoning capabilities start at around 30B parameters.
 
 For consumer-grade hardware, it is generally recommended to use [quantized models](https://huggingface.co/docs/optimum/en/concept_guides/quantization) for the best balance between model quality and performance. Check out the table below for more precise information about the typical requirements for different sizes of quantized models.
 
@@ -45,9 +45,19 @@ There are many permissively licensed models available to download. [Hugging Face
 
 To help you choose a model that fits your needs, you can look at leaderboards and benchmarks. The most widely-used leaderboard is the community-driven [LM Arena](https://lmarena.ai). Additionally, the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) focuses on the performance of open-weights models on common benchmarks like [MMLU-Pro](https://arxiv.org/abs/2406.01574). There are also specialized benchmarks which measure factors like [emotional intelligence](https://eqbench.com), ["uncensored general intelligence"](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard), and [many others](https://nebuly.com/blog/llm-leaderboards).
 
+#### Recommended Models
+
+Below is a table of good models to start with.
+
+| Developer | Model name | Available sizes (billions of parameters) | Strengths | Weaknesses | Censorship |
+|---|---|---|---|---|---|
+| Google | Gemma 3 | 1, 4, 12, 27 | Multimodal, efficient | Many hallucinations | Sexuality, drugs |
+| Google | Gemma 3n | >2, >4 | Vision capabilities, efficient, mobile-friendly | Many hallucinations | Sexuality, drugs |
+| Alibaba | Qwen 3 | 0.6, 1.7, 4, 8, 14, 32, 235 | Multilingual, efficient, intelligent | Not multimodal | CCP-sensitive topics |
+
 ## AI Chat Clients
 
-| Feature | [Kobold.cpp](#koboldcpp) | [Ollama](#ollama-cli) | [Llamafile](#llamafile) |
 |---|---|---|---|
 | GPU Support | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-check:{ .pg-green } |
 | Image Generation | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } |
@@ -56,6 +66,57 @@ To help you choose a model that fits your needs, you can look at leaderboards an
 | Custom Parameters | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-check:{ .pg-green } |
 | Multi-platform | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-alert-outline:{ .pg-orange } Size limitations on Windows |
 
+## Edge Gallery (Android)
+
+
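To give a sense of why the RAM and VRAM figures in the hardware section matter, the memory needed just to load a model scales roughly with its parameter count multiplied by the quantization bit width. The Python sketch below illustrates that arithmetic only; the 4-bit default and the 20 % overhead factor (standing in for the KV cache and runtime buffers) are assumptions for illustration, not measured requirements of any particular runtime.

```python
# Back-of-the-envelope memory estimate for loading a quantized LLM locally.
# Assumption: weights dominate memory use; a flat 20% overhead approximates
# the KV cache and runtime buffers, which really vary with context length
# and backend.

def estimate_memory_gb(params_billion: float,
                       bits_per_weight: float = 4.0,
                       overhead: float = 1.2) -> float:
    """Approximate RAM/VRAM (in GB) needed to load a model of the given size."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


if __name__ == "__main__":
    # Parameter counts similar to the model sizes discussed above,
    # assuming 4-bit quantization.
    for size in (3, 8, 14, 32, 70):
        print(f"{size:>3}B parameters ≈ {estimate_memory_gb(size):.1f} GB")
```

Under these assumptions, an 8B model quantized to 4 bits fits in roughly 5 GB, which is why small quantized models are usable on machines with 8 GB of RAM, while a 70B model lands around 40 GB and calls for a dedicated GPU with substantial VRAM or a high-memory system.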