text-generation-webui/extensions/send_pictures/script.py

import base64
from io import BytesIO

import gradio as gr
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor

from modules import chat, shared
from modules.ui import gather_interface_values

# When 'state' is True, the next chat generation is hijacked with the
# custom input stored in 'value', in the format [text, visible_text]
input_hijack = {
    'state': False,
    'value': ["", ""]
}

# Load the BLIP captioning processor and model once, at import time (CPU, float32)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base", torch_dtype=torch.float32).to("cpu")


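def caption_image(raw_image):
    """Return a BLIP-generated text caption for a PIL image."""
    # Convert to RGB first so images in other modes (e.g. RGBA) are accepted
    inputs = processor(raw_image.convert('RGB'), return_tensors="pt").to("cpu", torch.float32)
    out = model.generate(**inputs, max_new_tokens=100)
    return processor.decode(out[0], skip_special_tokens=True)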


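def generate_chat_picture(picture, name1, name2):
    """Build the prompt text and the visible HTML for a picture sent in chat.

    Returns a (text, visible_text) tuple: 'text' describes the picture to the
    model through its caption, and 'visible_text' is an <img> tag that embeds
    a thumbnail of the picture as a base64 data URI.
    """
    text = f'*{name1} sends {name2} a picture that contains the following: “{caption_image(picture)}”*'
    # Downscale the picture in place (at most 300 px per side) before embedding it;
    # otherwise the base64 data in the visible history makes the log grow very quickly
    picture.thumbnail((300, 300))
    buffer = BytesIO()
    picture.save(buffer, format="JPEG")
    img_str = base64.b64encode(buffer.getvalue()).decode('utf-8')
    visible_text = f'<img src="data:image/jpeg;base64,{img_str}" alt="{text}">'
    return text, visible_text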


def ui():
    """Create the image upload widget and wire it to the chat generation."""
    picture_select = gr.Image(label='Send a picture', type='pil')

    # Prepare the hijack with custom inputs
    picture_select.upload(
        lambda picture, name1, name2: input_hijack.update({
            "state": True,
            "value": generate_chat_picture(picture, name1, name2)
        }),
        [picture_select, shared.gradio['name1'], shared.gradio['name2']],
        None)

    # Call the generation function
    picture_select.upload(
        gather_interface_values, [shared.gradio[k] for k in shared.input_elements], shared.gradio['interface_state']).then(
        chat.cai_chatbot_wrapper, shared.input_params, shared.gradio['display'], show_progress=shared.args.no_stream).then(
        lambda: None, None, picture_select, show_progress=False)