Advanced model plugins
The model plugin tutorial covers the basics of developing a plugin that adds support for a new model.
This document covers more advanced topics.
Attachments for multi-modal models
Models such as GPT-4o, Claude 3.5 Sonnet, and Google's Gemini 1.5 are multi-modal: they accept input in the form of images, and in some cases audio, video, and other formats.
LLM calls these attachments. Models can specify the types of attachments they accept and then implement special code in the `.execute()` method to handle them.
Specifying attachment types
A `Model` subclass can list the types of attachments it accepts by defining an `attachment_types` class attribute:
```python
class NewModel(llm.Model):
    model_id = "new-model"
    attachment_types = {
        "image/png",
        "image/jpeg",
        "image/webp",
        "image/gif",
    }
```
These content types are detected when an attachment is passed to LLM using `llm -a filename`, or can be specified by the user with the `--attachment-type filename image/png` option.
Note: MP3 files will have their attachment type detected as `audio/mpeg`, not `audio/mp3`.
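LLM's exact detection logic is internal to the tool, but Python's standard `mimetypes` module illustrates the same filename-based guessing, including the `audio/mpeg` result for MP3 files:

```python
import mimetypes

# guess_type() returns (content_type, encoding); the encoding element
# is None for plain media files.
print(mimetypes.guess_type("photo.png"))  # ('image/png', None)
print(mimetypes.guess_type("song.mp3"))   # ('audio/mpeg', None)
```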
LLM will use the `attachment_types` attribute to validate that provided attachments should be accepted before passing them to the model.
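The validation itself happens inside LLM, but conceptually it is a set-membership check. A minimal sketch (the `validate_attachment` helper is hypothetical, not part of LLM's API):

```python
attachment_types = {"image/png", "image/jpeg", "image/webp", "image/gif"}

def validate_attachment(content_type: str) -> None:
    # Hypothetical helper: reject any attachment whose resolved type
    # is not in the model's declared attachment_types set.
    if content_type not in attachment_types:
        raise ValueError(f"Unsupported attachment type: {content_type}")

validate_attachment("image/png")   # accepted, no error raised
try:
    validate_attachment("audio/mpeg")
except ValueError as exc:
    print(exc)  # Unsupported attachment type: audio/mpeg
```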
Handling attachments
The `prompt` object passed to the `execute()` method will have an `attachments` attribute containing a list of `Attachment` objects provided by the user.
An `Attachment` instance has the following properties:

- `url` (str): The URL of the attachment, if it was provided as a URL
- `path` (str): The resolved file path of the attachment, if it was provided as a file
- `type` (str): The content type of the attachment, if it was provided
- `content` (bytes): The binary content of the attachment, if it was provided

Generally only one of `url`, `path` or `content` will be set.
You should usually access the type and the content through one of these methods:

- `attachment.resolve_type() -> str`: Returns the `type` if it is available, otherwise attempts to guess the type by looking at the first few bytes of content
- `attachment.content_bytes() -> bytes`: Returns the binary content, which it may need to read from a file or fetch from a URL
- `attachment.base64_content() -> str`: Returns that content as a base64-encoded string
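To see what "guessing the type by looking at the first few bytes" can mean in practice, here is an illustrative magic-bytes check. This is a simplified sketch, not LLM's actual implementation:

```python
from typing import Optional

def guess_image_type(content: bytes) -> Optional[str]:
    # Sniff a content type from well-known file signatures.
    if content.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if content.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if content.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if content[:4] == b"RIFF" and content[8:12] == b"WEBP":
        return "image/webp"
    return None

print(guess_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # image/png
```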
An `id()` method returns a database ID for this content, which is either a SHA256 hash of the binary content or, in the case of attachments hosted at an external URL, a hash of `{"url": url}` instead. This is an implementation detail which you should not need to access directly.
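Although you should not need this value directly, the two hashing cases can be sketched like this. The exact serialization details (such as JSON formatting) are assumptions made for illustration:

```python
import hashlib
import json
from typing import Optional

def attachment_id(content: Optional[bytes], url: Optional[str]) -> str:
    # Sketch: hash the binary content when it is available; otherwise
    # hash a JSON document of the form {"url": url} for URL-only
    # attachments. Serialization details here are illustrative.
    if content is not None:
        return hashlib.sha256(content).hexdigest()
    return hashlib.sha256(json.dumps({"url": url}).encode("utf-8")).hexdigest()

print(attachment_id(b"hello", None))
print(attachment_id(None, "https://example.com/image.png"))
```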
Here’s how the OpenAI plugin handles attachments:
```python
messages = []
if not prompt.attachments:
    messages.append({"role": "user", "content": prompt.prompt})
else:
    attachment_message = [{"type": "text", "text": prompt.prompt}]
    for attachment in prompt.attachments:
        url = attachment.url
        if not url:
            base64_image = attachment.base64_content()
            url = f"data:{attachment.resolve_type()};base64,{base64_image}"
        attachment_message.append(
            {"type": "image_url", "image_url": {"url": url}}
        )
    messages.append({"role": "user", "content": attachment_message})
```
As you can see, it uses `attachment.url` if that is available and otherwise falls back to the `base64_content()` method to embed the image directly in the JSON sent to the API.
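A stub attachment makes the resulting message shape easy to inspect. The `FakeAttachment` class below is a stand-in for illustration, not LLM's real `Attachment`:

```python
import base64
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeAttachment:
    # Stand-in for llm.Attachment, just enough for this demo.
    url: Optional[str]
    content: bytes

    def resolve_type(self) -> str:
        return "image/png"

    def base64_content(self) -> str:
        return base64.b64encode(self.content).decode("ascii")

attachment = FakeAttachment(url=None, content=b"\x89PNG\r\n\x1a\n")
url = attachment.url
if not url:
    url = f"data:{attachment.resolve_type()};base64,{attachment.base64_content()}"

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": url}},
    ],
}
print(message["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
```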
Attachments from previous conversations

Models that implement the ability to continue a conversation can reconstruct the previous message JSON using the `response.attachments` attribute.
Here’s how the OpenAI plugin does that:
```python
for prev_response in conversation.responses:
    if prev_response.attachments:
        attachment_message = [
            {"type": "text", "text": prev_response.prompt.prompt}
        ]
        for attachment in prev_response.attachments:
            url = attachment.url
            if not url:
                base64_image = attachment.base64_content()
                url = f"data:{attachment.resolve_type()};base64,{base64_image}"
            attachment_message.append(
                {"type": "image_url", "image_url": {"url": url}}
            )
        messages.append({"role": "user", "content": attachment_message})
    else:
        messages.append(
            {"role": "user", "content": prev_response.prompt.prompt}
        )
    messages.append({"role": "assistant", "content": prev_response.text()})
```
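For the text-only branch, the reconstruction is easy to verify with stub objects. The stub classes below are for illustration only and are not part of LLM's API:

```python
from dataclasses import dataclass

@dataclass
class FakePrompt:
    prompt: str

@dataclass
class FakeResponse:
    prompt: FakePrompt
    attachments: list
    reply: str

    def text(self) -> str:
        return self.reply

# One prior text-only exchange becomes a user message plus an
# assistant message in the reconstructed history.
messages = []
for prev_response in [FakeResponse(FakePrompt("Hi"), [], "Hello!")]:
    if not prev_response.attachments:
        messages.append({"role": "user", "content": prev_response.prompt.prompt})
    messages.append({"role": "assistant", "content": prev_response.text()})

print(messages)
# [{'role': 'user', 'content': 'Hi'}, {'role': 'assistant', 'content': 'Hello!'}]
```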