The following tools are designed to be used with LLM:
strip-tags is a command for stripping tags from HTML. This is useful when working with LLMs because HTML tags can use up a lot of your token budget.
Here’s how to summarize the front page of the New York Times, by both stripping tags and filtering to just the elements with class story-wrapper:
curl -s https://www.nytimes.com/ \
  | strip-tags .story-wrapper \
  | llm -s 'summarize the news'
The article "llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs" describes ways to use strip-tags in more detail.
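To make the token-budget point concrete, here is a minimal Python sketch of tag stripping using only the standard library. It is not how strip-tags is implemented: the TextOnly class, the strip_markup function and the sample HTML are hypothetical names and data made up for illustration, and the sketch ignores CSS selectors entirely.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects text nodes and discards every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text, trimmed of surrounding whitespace.
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_markup(html: str) -> str:
    """Return the text content of an HTML fragment, joined by single spaces."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


# Hypothetical fragment, loosely shaped like a news story wrapper.
sample = (
    '<div class="story-wrapper"><h3><a href="/a">Headline</a></h3>'
    "<p>Summary text.</p></div>"
)
plain = strip_markup(sample)
print(plain)
print(len(sample), "->", len(plain), "characters")
```

Even on this tiny fragment most of the characters are markup rather than content, which is why stripping tags before prompting tends to leave more room for the text you actually care about.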
ttok is a command-line tool for counting OpenAI tokens. You can use it to check if input is likely to fit in the token limit for GPT-3.5 or GPT-4:
cat my-file.txt | ttok
It can also truncate input down to a desired number of tokens:
ttok This is too many tokens -t 3
This is too
This is useful for truncating a large document down to a size where it can be processed by an LLM.
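The truncation idea can be sketched in a few lines of Python. Note the loud assumption: this sketch treats whitespace-separated words as a stand-in for tokens, whereas ttok counts real OpenAI tokens, which often do not line up with words. The truncate_words function is a hypothetical helper for illustration, not part of ttok.

```python
def truncate_words(text: str, limit: int) -> str:
    """Keep at most `limit` whitespace-separated words.

    Words are only a crude stand-in for tokens; a real tokenizer
    would usually split the text differently.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return " ".join(text.split()[:limit])


print(truncate_words("This is too many tokens", 3))
```

For this particular input the word-based result happens to match the ttok output shown above, but that will not hold in general, so use the real tool when the limit matters.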
Symbex is a tool for searching for symbols in Python codebases. It’s useful for extracting just the code for a specific problem and then piping that into LLM for explanation, refactoring or other tasks.
Here’s how to use it to find all functions that match test*csv* and use those to guess what the software under test does:
symbex 'test*csv*' | \
  llm --system 'based on these tests guess what this tool does'
It can also be used to export symbols in a format that can be piped to llm embed-multi in order to create embeddings:
symbex '*' '*:*' --nl | \
  llm embed-multi symbols - \
  --format nl --database embeddings.db --store
For more examples see "Symbex: search Python code for functions and classes, then pipe them into a LLM".