A couple of weeks ago I was writing more bash than I care to admit, and the "language" (if you can call it that) is quite useful.
So I decided to build an agent harness in it. flash is a lightweight AI agent framework in roughly 300 lines of bash. It uses Ollama to query an LLM and lets the model use shell tools to complete tasks.
How it works
Flash uses a two-model architecture:
Gemma4:E4B (execution model) has access to tools. It decides when to run shell commands, fetch URLs, or manage a todo list via function calling.
Gemma4:E2B (response model) takes the tool results and generates the final reply. No tools, just text. The idea is that when only a conversation is needed, this model is quicker on the hardware I currently run it on.
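As a rough sketch of that split (not flash's actual code), the two calls could share one request builder and differ only in whether a tool list is attached. The function names `build_request` and `ask`, the `OLLAMA_URL` variable, and the lowercase model tags are assumptions made up for illustration. The request targets Ollama's `/api/chat` endpoint with streaming turned off.

```bash
#!/usr/bin/env bash
# Sketch only: names below are illustrative, not taken from flash itself.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"   # Ollama's default local address
EXEC_MODEL="${EXEC_MODEL:-gemma4:e4b}"               # assumed tag for the tool-calling model
REPLY_MODEL="${REPLY_MODEL:-gemma4:e2b}"             # assumed tag for the text-only model

# build_request MODEL MESSAGES_JSON [TOOLS_JSON]
# Prints a JSON body for /api/chat. The "tools" key is only included
# when a tool list is passed, so the response model gets none.
build_request() {
  local model="$1" messages="$2" tools="${3:-}"
  if [ -n "$tools" ]; then
    jq -cn --arg model "$model" --argjson messages "$messages" --argjson tools "$tools" \
      '{model: $model, messages: $messages, tools: $tools, stream: false}'
  else
    jq -cn --arg model "$model" --argjson messages "$messages" \
      '{model: $model, messages: $messages, stream: false}'
  fi
}

# ask MODEL MESSAGES_JSON [TOOLS_JSON]
# Sends the request and prints the "message" object from the reply.
ask() {
  build_request "$@" | curl -s "$OLLAMA_URL/api/chat" -d @- | jq -c '.message'
}
```

A turn would then be `ask "$EXEC_MODEL" "$messages" "$tools"`, followed by running whatever shows up under `.tool_calls`, appending the results to the messages, and finishing with `ask "$REPLY_MODEL" "$messages"` for the final text.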
The models hit a local Ollama API, and chat history persists as JSON files in a sessions/ directory. You can switch between sessions at runtime with /session.
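To make the session idea concrete, here is a minimal sketch of how history could be kept as one JSON array per session. The one-file-per-session layout, the `SESSIONS_DIR` and `SESSION` variables, and all function names are assumptions; flash's real format may differ.

```bash
#!/usr/bin/env bash
# Sketch only: the file layout and function names are assumptions.
SESSIONS_DIR="${SESSIONS_DIR:-sessions}"
SESSION="${SESSION:-default}"

# Path of the JSON file backing the current session.
session_file() { printf '%s/%s.json\n' "$SESSIONS_DIR" "$SESSION"; }

# Create the session file with an empty message array if it does not exist yet.
session_init() {
  mkdir -p "$SESSIONS_DIR"
  [ -f "$(session_file)" ] || printf '[]\n' > "$(session_file)"
}

# session_append ROLE CONTENT
# Appends one message. jq writes to a temp file first so a failed run
# cannot truncate the existing history.
session_append() {
  local file tmp
  session_init
  file="$(session_file)"
  tmp="$(mktemp)"
  jq --arg role "$1" --arg content "$2" \
    '. + [{role: $role, content: $content}]' "$file" > "$tmp" \
    && mv "$tmp" "$file" \
    || { rm -f "$tmp"; return 1; }
}

# session_switch NAME  (roughly what a /session command could do)
session_switch() {
  SESSION="$1"
  session_init
}
```

With this shape, `session_append user "hello"` grows the current file by one entry, and `session_switch work` points later appends at `sessions/work.json` without touching the old history.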
Tools are just shell scripts
Every tool is a standalone .sh file in tools/:
sh.sh - runs any shell command
webfetch.sh - fetches and strips HTML from a URL
todo_add.sh, todo_done.sh, todo_list.sh - lightweight task management
The agent can call them in parallel, collect results, and feed them back to the model.
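One way to get that parallel behavior in plain bash is background jobs plus `wait`. The sketch below is an illustration under assumptions, not flash's code: it assumes each call arrives as a `tool<TAB>argument` line on stdin and that every tool takes a single argument. `run_tools_parallel` and `TOOLS_DIR` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: the call format ("tool<TAB>argument" per line) is an assumption.
TOOLS_DIR="${TOOLS_DIR:-tools}"

# run_tools_parallel reads one call per line on stdin, starts every tool in
# the background, waits for all of them, then prints results in call order.
run_tools_parallel() {
  local outdir i=0 n name arg
  outdir="$(mktemp -d)"
  while IFS=$'\t' read -r name arg; do
    [ -n "$name" ] || continue
    i=$((i + 1))
    printf '%s\n' "$name" > "$outdir/$i.name"
    # Each tool writes to its own file so outputs never interleave.
    # stdin comes from /dev/null so a tool cannot eat the remaining calls.
    bash "$TOOLS_DIR/$name.sh" "$arg" > "$outdir/$i.out" 2>&1 < /dev/null &
  done
  wait   # block until every background tool has exited
  for ((n = 1; n <= i; n++)); do
    printf '== %s ==\n' "$(cat "$outdir/$n.name")"
    cat "$outdir/$n.out"
  done
  rm -rf "$outdir"
}
```

Something like `printf 'webfetch\thttps://example.com\ntodo_list\t\n' | run_tools_parallel` would start both tools at once and print their output in the order requested, ready to be fed back to the model as tool results.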
Why?
Most agent harnesses are big and cumbersome. I wanted something quick and easy to run locally. flash lives in my local file system and uses a local model. It's a self-modifying bash script that almost any consumer-grade laptop could run.
The whole thing fits in a Docker image based on Alpine with just bash, curl, and jq.