Brady Hawkins

flash

May 15, 2026

A couple of weeks ago I was writing more bash than I care to admit, and the "language" (if you can call it that) is quite useful.

We should all just be writing bash


I decided to build an agent harness in it.
flash is a lightweight AI agent framework in roughly 300 lines of bash. It uses Ollama to query an LLM and lets the model use shell tools to complete tasks.

How it works

Flash uses a two-model architecture:

  • Gemma4:E4B (execution model) has access to tools. It decides when to run shell commands, fetch URLs, or manage a todo list via function calling.

  • Gemma4:E2B (response model) takes the tool results and generates the final reply. No tools, just text. The idea is that when a conversation is needed, this model is quicker on the hardware I currently run it on.

Both models hit a local Ollama API, and chat history persists as JSON files in a sessions/ directory. You can switch between sessions at runtime with /session.
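To make the session and request flow concrete, here is a minimal sketch of how history could be kept in a JSON file and sent to Ollama's /api/chat endpoint. This is not flash's actual code: the function names (session_append, build_request, chat), the one-array-per-file layout, and the lowercase model tag in the usage note are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file layout are hypothetical, not flash's code.

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"  # Ollama's default local address
SESSION_DIR="${SESSION_DIR:-sessions}"

# Append one message to a session's history file (a JSON array of messages).
session_append() {
  local session="$1" role="$2" content="$3"
  local file="$SESSION_DIR/$session.json" tmp
  mkdir -p "$SESSION_DIR"
  [ -s "$file" ] || echo '[]' > "$file"
  tmp="$(mktemp)"
  # jq --arg handles quoting, so arbitrary text is stored as valid JSON.
  jq --arg role "$role" --arg content "$content" \
    '. + [{role: $role, content: $content}]' "$file" > "$tmp"
  mv "$tmp" "$file"
}

# Build a non-streaming /api/chat request body from the stored history.
build_request() {
  local session="$1" model="$2"
  jq -c --arg model "$model" \
    '{model: $model, messages: ., stream: false}' "$SESSION_DIR/$session.json"
}

# Send the history to a model and print the text of its reply.
chat() {
  local session="$1" model="$2"
  build_request "$session" "$model" \
    | curl -sS "$OLLAMA_URL/api/chat" -H 'Content-Type: application/json' -d @- \
    | jq -r '.message.content'
}
```

With Ollama running locally, something like session_append demo user "hello" followed by chat demo gemma4:e2b would print the response model's reply. The execution model would additionally need a tools array in the request body, which this sketch leaves out.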

Tools are just shell scripts

Every tool is a standalone .sh file in tools/:

  • sh.sh - runs any shell command

  • webfetch.sh - fetches and strips HTML from a URL

  • todo_add.sh, todo_done.sh, todo_list.sh - lightweight task management

The agent can call them in parallel, collect results, and feed them back to the model.
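The parallel step can be sketched in plain bash with background jobs and wait. This is an illustration under assumptions, not flash's implementation: run_tools and the "name, tab, argument" call format are hypothetical, and the only thing taken from the post is that each tool is a standalone script in tools/.

```shell
#!/usr/bin/env bash
# Sketch only: run_tools and its call format are hypothetical, not flash's code.

TOOLS_DIR="${TOOLS_DIR:-tools}"

# Run each "name<TAB>argument" call as $TOOLS_DIR/<name>.sh in the background,
# then print the results in the order the calls were given.
run_tools() {
  local out_dir call name arg i=0 n
  out_dir="$(mktemp -d)"
  for call in "$@"; do
    name="${call%%$'\t'*}"
    arg=""
    [[ "$call" == *$'\t'* ]] && arg="${call#*$'\t'}"
    case "$name" in
      ""|*[!a-z0-9_]*)
        # Reject names that could escape the tools directory (e.g. "../x").
        echo "invalid tool name" > "$out_dir/$i.out" ;;
      *)
        if [ -x "$TOOLS_DIR/$name.sh" ]; then
          # Each tool writes to its own file so outputs never interleave.
          "$TOOLS_DIR/$name.sh" "$arg" > "$out_dir/$i.out" 2>&1 &
        else
          echo "unknown tool: $name" > "$out_dir/$i.out"
        fi ;;
    esac
    i=$((i + 1))
  done
  wait  # block until every background tool has finished
  for ((n = 0; n < i; n++)); do
    printf '[%s] %s\n' "$n" "$(cat "$out_dir/$n.out")"
  done
  rm -rf "$out_dir"
}
```

A call such as run_tools $'sh\tuname -a' $'webfetch\thttps://example.com' would start both scripts at once and print one labelled result per call. Writing each result to a separate file keeps the output order stable no matter which tool finishes first.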

Why?

Most agent harnesses are big and cumbersome. I wanted something that would be quick and easy to run locally. flash lives in my local file system and uses a local model. It's a self-modifying bash script that almost any consumer-grade laptop could run.

The whole thing fits in a Docker image based on Alpine with just bash, curl, and jq.
