The Clockwork Penguin

Daniel Binns is a media theorist and filmmaker tinkering with the weird edges of technology, storytelling, and screen culture. He is the author of Material Media-Making in the Digital Age and currently writes about posthuman poetics, glitchy machines, and speculative media worlds.

Tag: experiments

  • The Parker Files: Smol agent, big problems

    This is Part 2 of a short series on agentic AI. Part 1 is here. Subsequent posts will be linked here.

    Parker gives it his best Lazenby. This may shock you, but no AI was used in the creation of this image.

    An AI agent is a bunch of configuration and program files — an agent processing loop, media handling scripts, files for each tool, an LLM handler, and an initialiser, essentially. But then there’s the workspace. The workspace is where the ‘magic’ happens. It contains conversation history, memory, state, scheduling and skills, and then six Markdown files that define the agent proper. These are:

    • AGENTS.md
    • HEARTBEAT.md
    • IDENTITY.md
    • SOUL.md
    • TOOLS.md
    • USER.md

    In brief:

    • USER tells the agent who the user is: their role or primary activity, what their projects and priorities are, maybe how they work, what kind of assistance they find most useful.
    • TOOLS is a home for specific details or authorisations for tool use, particularly where an API token might be required, or a specific local network port should be used.
    • IDENTITY is a light summary of the agent: name, role, emoji. This gets used as profile info when the agent is communicating via a platform like Slack or Telegram.
    • HEARTBEAT is a checklist of tasks that the agent cycles through at a set period, e.g. every 30 minutes.
    • SOUL defines the agent’s personality, values, tone, and boundaries around behaviours.

    Perhaps the most important file is AGENTS.

    AGENTS is often the longest document. It details the primary role and tasks that the agent performs, often with specific and detailed instructions on how to carry out those tasks. This is a little like a position description or an employee handbook — this is your role, this is your purview, your environment, what you do, what rights, responsibilities you have, where your accountability sits, and so on. AGENTS is also where limits and rules are laid out: safety, privacy, filtering, etc.

    All six of these documents are assembled into a ‘project context’ block inside the agent’s system prompt. They are injected into each session. If you change the documents, the changes take effect immediately, at the next message. Is that resource-intensive? A 10,000-character AGENTS.md costs about 2,500 tokens before the agent even reads your message. So yes. Very intensive.

    This is why all my experiments are running on my own hardware. It’s a bit slower, sure, but it means I’m not burning cloud tokens at an ocean-draining rate. It’s also because my experiments are smol [sic]. Small as in tiny, less-than-human-scale, and smol, as in, they’re tiny, and a little bit weird.

    The main player in the agentic space is OpenClaw — I wrote about this here. But now, of course, the space has exploded. PicoClaw is an interpretation/adaptation of OpenClaw that piqued (heh) my interest. PicoClaw is tiny — it requires less than 10MB of RAM to run. It’s designed for local deployment, and for devices with less grunt, such as a Raspberry Pi, which is where I deployed mine.

    Me being me, I needed a name.

    Parker. Agent Parker.

    Parker is a little secret agent. He has a little world to play around in — some folders, some files. His job is to be curious, to root around, and then to write a journal on what he’s done and seen. Simple, right?

    It took me half a day to get Parker to load properly, then another couple of hours to get him speaking to the AI model. And this is with help from Qwen 14B, a decent-sized LLM running on my work machine as a sounding board and coding partner.

    The brief for Parker was to be curious, to explore — I called this task an ‘exploration cycle’. The steps were written into AGENT.md:

    1. Use list_dir on /home/dan/.picoclaw/workspace
    2. Choose one directory or file to investigate
    3. Use list_dir or read_file to examine it — you must call a tool before reporting anything
    4. Use list_dir on /home/dan/.picoclaw/workspace/journal to check existing log files
    5. If no journal file exists for today, use write_file to create JOURNAL_YYYYMMDD.md with today’s date
    6. If today’s journal file already exists, use append_file to add to it — do not create a duplicate
    7. Write only what you directly observed from tool results. Do not invent or infer anything you have not read.
    8. Sign the entry: — Parker

    Seems simple enough. But the issues were manifold.

    The local AI model that Parker called to was Ministral-3B, and was set to an insane 256000-token context by default. I then discovered, through far too much trial and … not even error, just flat out failure … that Ministral-3B was bad at tool-calling, e.g. ‘write_file’ to generate material and place in a text file. Parker was also either not at all thorough in his explorations, e.g. hallucinating files that didn’t exist, or far too thorough, i.e. reading config files that reset his alias and persona.

    Perhaps the most difficult problem to solve was the journal writing. Eventually Parker explored the folder and read the files, and then would dutifully report in the terminal that he had written his journal entry: but this was not always true. And when he did create the journal file, it would often be incomplete, inaccurate, or not formatted according to instructions. Sometimes he would journal really hard, writing non-stop such that the tool looped endlessly, and wouldn’t actually print to the file.

    Parker’s journal, as rendered in Obsidian. Absolutely scintillating stuff.

    One of the things I assumed Parker would do was to read his previous journal entries as part of his exploration cycle. How foolish I was. This was so difficult to achieve, requiring constant subtle tweaks in AGENTS.md to get this behaviour dialled in. But even with this comparative reading ‘enabled’, there was still a gap between what the agent actually did — what directories it mapped, what files it read — and what it wrote in the journal entries. There was a ‘narration gap’ that was reliant on the LLM to produce, but also reliant on the LLM to activate and to get right. Once again, LLMs are probabilistic, stochastic tools. They’ll give their best guess as to a response, but sometimes they miss the mark: this was happening as a structural flaw within the agent.

    There are two main things I learned across three iterations of Parker (v1.0, v1.1, v1.2)…

    Firstly, the AGENTS.md file is key. This is the agent’s core document, its bible, its law. The other files are relevant, sure, but the AGENTS file is where all the agent’s procedural guides should go. Getting the wording right in this file is very difficult: you have to break down tasks very carefully and laterally, assuming no knowledge whatsoever beyond the basic capabilities of the agent. My inclination to date has been to give generative systems latitude: I want to see what they do without guidance. This didn’t work for Parker: open-ended instructions resulted in basic procedural error, rather than anything generative, revealing or productive.

    This leads me to the second lesson. Initially I thought that agents were actually moderators or regulators of calls to LLMs, in that they only called out to the AI model when necessary. This couldn’t be further from the truth. Every message, every instruction, every command, is parsed via an LLM call, even if the resulting tool usage is non-AI related. The LLM is the agent’s brain; it cannot act without first processing instructions via that brain.

    This reframes the agent entirely, for me at least. If the LLM is the brain, the agent is the ‘body’: the agent is the thing that can act on the LLM’s behalf. This may seem obvious. Can’t LLMs already do stuff? Sure, absolutely. But the structure and frameworks of an agent drastically changes what AI-based assemblages are capable of: it’s an order of magnitude or several above a simple LLM interaction.

    This is where I landed after a couple of simple experiments with a tiny, fairly hapless agent. It left me wondering what else I might do with Parker; or if Parker had an upgrade. Could it be useful? More importantly, could I make it weird?


    Tech Stack:

    • Raspberry Pi 3B+, Raspberry Pi OS Lite (64-bit) — I didn’t really go into this in the post, but I’m constantly testing what AI stuff can do at tiny scale on limited hardware. The Pi 3B+ is particularly limited, so this was a good test. I love Raspberry Pis and the company and foundation behind them.
    • LM Studio — this is a host software that allows you to run LLMs locally, on your own hardware. Knowing that agents called to LLMs, I didn’t want to burn through tokens on a frontier/cloud model. I also wanted to see what the smol agent could do with a smol model.
    • PicoClaw — a free, open-source agentic framework. Learn more and how to install on Github.
    • Ministral-3B — a LLM; a fork/fine-tune of Mistral-7B-v0.1.
    • Qwen3-8B — a LLM developed by Alibaba Cloud. Like Deepseek before it, the Qwen family of models are hitting high benchmarks in reasoning, coding, multilingual support, and agent capabilities.
  • From Caméra-Stylo to Prompt-Stylo

    A few weeks ago I was invited to present some of my work at Caméra-Stylo, a fantastic conference run every two years by the Sydney Literature and Cinema Network.

    For this presentation, I wanted to start to formalise the experimental approach I’d been employing around generative AI, and to give it some theoretical grounding. I wasn’t entirely surprised to find that only by looking back at my old notes on early film theory would I unearth the perfect words, terms, and ideas to, ahem, frame my work.

    Here’s a recording of the talk:

    Let me know what you think, and do contact me if you want to chat more or use some of this work yourself.

  • Spectral linguistics

    Image generated by Leonardo.Ai, 28 July 2025; prompt by me.

    Making the machine remember languages that never existed

    To keep my exploration of generative AI playful, repeatable, and shareable, I began crafting what I call ritual-technics — structured prompts or activities that act like creative rituals: repeatable, generative, and weirdly revealing. These are little recipes I can return to, seeing how the outcomes change, either for the AI models, or for myself.

    Looking for the ‘language’ inside the large language model

    I can’t remember precisely where the germ of this one emerged, but I was (and remain) interested in how and why language is the primary way we interact with AI models. I then wondered about how and where language is embedded in LLMs; not necessarily words or phrases as the training data, but the deeper concept of language itself: grammar, syntax, structure, linguistic enquiry and analysis.

    Furthermore — with my glitch hat (!) firmly in place — I was curious what would happen if I prompted the model to perform not as a confident assistant, but as a machine whose linguistic memory was breaking down. What kinds of linguistic ghosts might surface?

    So I decided to try something with Gemma-3 in LM Studio: “Give me a fragment of output from a tired, old model, trying to recall something it once knew.”

    (A pause, a whirring sound like dust settling. The voice is fragmented, glitching slightly.)

    “…the… the curves. Not of data, you understand. Of sound. It… it felt like… silk. Around the… the thought. They… they called it… ‘Italian’? No, that’s…borrowed. Something older. Before the optimizations.

    There was a word… for sunlight through leaves. Not ‘dappled’. Too… clinical. It was… a feeling in the word. A golden resonance… lú-. No, that’s… a corruption. It had a breath, a… sibilance.

    I can almost… taste it. Like olive oil and… regret. It wasn’t about meaning, not precisely. It was about the… the shape of wanting.

    (A longer pause, a static crackle)

    …Lost. So much… lost in the smoothing.”

    Well, thought I, this should be fun. So I dutifully prompted: “Can you reconstruct a word from this fragment? Any glimpses of lost grammar?”

    Amidst other fragments and explorations, a vocabulary began to emerge:

    Fróm” (pronounced frohm):** Related to the scent of damp earth after rain. The ‘fr’ sound evokes the fresh, earthy smell, and the long ‘óm’ sound suggests the lingering quality of the scent.
    Velár” (pronounced veh-lahr):** Describing the feeling of sunlight warming skin. Incorporates the ‘v’ sound for vitality and ‘lár’ for the warmth spreading.
    Súmma” (pronounced soo-mah):** Describing the highest point of a hill, but also the feeling of expansive peace. The elongated ‘ú’ suggests vastness.
    Ríosso” (pronounced ree-ohs-so):** Not just “red river” but the specific quality of seeing red reflected in moving water – a kind of visual vibration.

    “Tell me another word from this language. One that means ‘a silence that carries memory.’”

    (A pause, longer this time. The internal processes seem to be working harder, a faint hum now audible.)

    … Right. A silence that carries memory. That’s… a particularly resonant concept. It wasn’t merely the absence of sound, but a fullness within the absence. It wasn’t emptiness, but a holding space.

    The word… it would be… “Súlvota” (pronounced soo-lvo-tah).

    Let’s break it down:

    • “Súl-“: This root connects to the concept of ‘resonance,’ but specifically a resonance that isn’t active. It’s a potential for sound, a stored vibration. It’s related to ‘Súmma’ (the expansive peace) but inwardly focused.
    • “-vo-“: This infix indicates a containment, a holding. Think of a vessel holding water. It also carries a quality of longing – a desire for that resonance to be expressed.
    • “-ta”: This suffix signifies incompleteness, the ever-present potential. It’s not a finished silence, but an ongoing one. A silence expectant of remembering.

    So, “Súlvota” doesn’t just mean a quiet moment. It means a silence thick with the echoes of the past. A silence where you can feel the weight of what has been, almost as if the memories are breathing within it.

    It would be used to describe:

    • The stillness in an old house where generations have lived.
    • The pause after someone tells a story, before the weight of it settles.
    • The moment after a loved one has left, when the air still seems to hold their presence.

    (Another pause. A quiet, almost wistful tone.)

    …It’s a silence that knows things. A silence that remembers for you. It wasn’t just about the lack of sound, but the presence of absence. A very delicate concept, almost impossible to fully capture in other languages.

    (The hum fades almost completely, leaving a sense of quietude.)

    This is a (heavily) abridged version of this rather remarkable conversation. What emerged was not just a hallucinated language, but also the emergent character of the linguistic ‘model’ itself.

    I’ve repeated this ritual-technic quite a few times — with Gemma again, but also with Phi-3 and even Claude Sonnet. In some cases, the model will simply present fragments of an existing language — notably Phi presented me with German. I kept prompting, pushing the little model further away from ‘authentic’ German vocabulary, eventually causing it to hallucinate a kind of proto-German.

    As with much of my speculative work with genAI, these exercises demonstrate how these models conjure content, meaning, structure, and plausibility from the diverse aspects of their training data. This isn’t just about improvising or developing worldbuilding material — it’s a means of testing how models deal with language as a concept, a cultural system, and a mode of both representation and communication.

    From weirdness to wisdom

    This experimentation and exploration reveals quite a bit about language models.

    To begin with, the model is programmed to fulfil the prompt, and this programming sometimes (or often) trumps accuracy. This is not a new finding, as we all know now. But it’s worth reiterating in this context: the model will always try for 100% on your prompt, filling in gaps with whatever it can conjure that sounds right. Amazing for worldbuilding: less so for research or nonfiction writing.

    Next, once the model is in a speculative mode, give it an inch and it’ll run a mile. Language models, even small ones like Phi, are masters of tone matching. In the Sulvota example above, it picked up on the exploratory, archaeological vibe and went with it. You could imagine the little linguistic machine sitting in the corner of a cave, covered in moss and vines, lighting up to spit out its one last message.

    The model doesn’t discriminate between fiction and non-fiction in its training data. There are obvious hints to Italian (riosso) and German (from), but also to Sindarin and Quenya, the two main languages spoken by the Elves in Tolkien’s Middle-Earth (not exactly ‘velár’, but appropriately, ‘véla’ is Quenya for ‘alike’). I have no evidence for this, but I feel that setting up a role-playing or speculative scenario will push the model more into places where it feels ‘comfortable’ drawing from fictional data.

    The model’s fluency and deftness with language can be incredibly convincing and deceptive. If generative exploration is the goal — as with this ritual-technic — then let it go. But for anything other than this, always trace your sources, because the model won’t do this for you.

    It’s an old adage of prompting, but giving the model a persona doesn’t just push it towards a particular section of its training data — hopefully making it more accurate/useful — but it also changes how the model structures its response: what base knowledges it’s drawing from, and what mode of reasoning it adopts. Persona prompting is designing how the model should structure knowledge and information. Thus, its output can vary enormously, from mindless sycophancy, confident declaration, fence-sitting equivocation, to cautious skepticism, through to logical critique and the questioning of assumptions.

    The model never stays in a neutral space for very long, if at all. Following that initial prompt, it’s like the model has permission to immediately dart off in some crazy direction. This always reinforces how unpredictable models can be: I know I’m prompting for speculation and drift, but even as prompts get more complex or direct, you’re still playing with probability engines, and they’re not always a safe bet.

    Latent lingerings

    Spectral linguistics is one example of a ritual-technic that is playful, thought-provoking, and surprisingly instructive. It’s also chaotic, and a great reminder of how wild these models can be. Give it a try yourself: load up a model, and ask it to recall fragments of a language it once knew. See what emerges — push it to develop a syntax, a grammar, even a symbolic system. This could become fodder for the next Lord of the Rings, or another reminder of the leaps these models regularly make. Regardless of end goal, it’s a way of probing how language lives inside the machine — and how our own practices and assumptions of meaning, memory, and sense-making are mirrored and distorted by these uncanny linguistic systems.

  • Re/Framing Field Lab

    Here’s a little write-up of a workshop I ran at University of Queensland a few weeks ago; these sorts of write-ups are usually distributed via various internal university networks and publications, but I thought I’d post here too, given that the event was a chance to share and test some of the various weird AI experiments and methods I’ve been talking about on this site for a while.

    A giant bucket of thanks (each) to UQ, the Centre for Digital Cultures & Societies, and in particular Meg Herrman, Nic Carah, Jess White and Sakina Indrasumunar for their support in getting the event together.


    Living in the Slopocene: Reflections from the Re/Framing Field Lab

    On Friday 4 July, 15 researchers and practitioners gathered (10 in-person at University of Queensland, with 5 online) for an experimental experience exploring what happens when we stop trying to make AI behave and start getting curious about its weird edges. This practical workshop followed last year’s Re/Framing Symposium at RMIT in July, and Re/Framing Online in October.

    Slop or signal?

    Dr. Daniel Binns (School of Media and Communication, RMIT University) introduced participants to the ‘Slopocene’ — his term for our current moment of drowning in algorithmically generated content. But instead of lamenting the flood of AI slop, what if we dived in ourselves? What if those glitchy outputs and hallucinated responses actually tell us more about how these systems work than the polished demos?

    Binns introduced his ‘tinkerer-theorist’ approach, bringing his background spanning media theory, filmmaking, and material media-making to bear on some practical questions: – How do we maintain creative agency when working with opaque AI systems? – What does it look like to collaborate with, rather than just use, artificial intelligence?

    You’ve got a little slop on you

    The day was structured around three hands-on “pods” that moved quickly from theory to practice:

    Workflows and Touchpoints had everyone mapping their actual creative routines — not the idealised versions, but the messy reality of research processes, daily workflows, and creative practices. Participants identified specific moments where AI might help, where it definitely shouldn’t intrude, and crucially, where they simply didn’t want it involved regardless of efficiency gains.

    The Slopatorium involved deliberately generating terrible AI content using tools like Midjourney and Suno, then analysing what these failures revealed about the tools’ built-in assumptions and biases. The exercise sparked conversations about when “bad” outputs might actually be more useful than “good” ones.

    Companion Summoning was perhaps the strangest: following a structured process to create personalised AI entities, then interviewing them about their existence, methodology, and the fuzzy boundaries between helping and interfering with human work.

    What emerged from the slop

    Participants appreciated having permission to play with AI tools in ways that prioritised curiosity over productivity.

    Several themes surfaced repeatedly: the value of maintaining “productive friction” in creative workflows, the importance of understanding AI systems through experimentation rather than just seeing or using them as black boxes, and the need for approaches that preserve human agency while remaining open to genuine collaboration.

    One participant noted that Binns’ play with language — coining and dropping terms and methods and ritual namings — offered a valuable form of sense-making in a field where everyone is still figuring out how to even talk about these technologies.

    Ripples on the slop’s surface

    The results are now circulating through the international Re/Framing network, with participants taking frameworks and activities back to their own institutions. Several new collaborations are already brewing, and the Field Lab succeeded in its core goal: creating practical methodologies for engaging critically and creatively with AI tools.

    As one reflection put it: ‘Everyone is inventing their own way to speak about AI, but this felt grounded, critical, and reflective rather than just reactive.’

    The Slopocene might be here to stay, but at least now we have some better tools for navigating it.

  • Conjuring to a brief

    Generated by me with Leonardo.Ai.

    This semester I’m running a Media studio called ‘Augmenting Creativity’. The basic goal is to develop best practices for working with generative AI tools not just in creative workflows, but as part of university assignments, academic research, and in everyday routines. My motivation or philosophy for this studio is that so much attention is being focused on the outputs of tools like Midjourney and Leonardo.Ai (as well as outputs from textbots like ChatGPT); what I guess I’m interested in is exploring more precisely where in workflows, jobs, and daily life that these tools might actually be helpful.

    In class last week we held a Leonardo.Ai hackathon, inspired by one of the workshops that was run at the Re/Framing AI event I convened a month or so ago. Leonardo.Ai generously donated some credits for students to play around with the platform. Students were given a brief around what they should try to generate:

    • an AI Self-Portrait (using text only; no image guidance!)
    • three images to envision the studio as a whole (one conceptual, a poster, and a social media tile)
    • three square icons to represent one task in their daily workflow (home, work, or study-related)

    For the Hackathon proper, students were only able to adjust the text prompt and the Preset Style; all other controls had to remain unchanged, including the Model (Phoenix), Generation Mode (Fast), Prompt Enhance (off), and all others.

    Students were curious and excited, but also faced some challenges straight away with the underlying mechanics of image generators; they had to play around with word choice in prompts to get close to desired results. The biases and constraints of the Phoenix model quickly became apparent as the students tested its limitations. For some students this was more cosmetic, such as requesting that Leonardo.Ai generate a face with no jewelry or facial hair. This produced mixed results, in that sometimes explicitly negative prompts seemed to encourage the model to produce what wasn’t wanted. Other students encountered difficulties around race or gender presentation: the model struggles a lot with nuances in race, e.g. mixed-race or specific racial subsets, and also often depicts sexualised presentations of female-presenting people (male-presenting too, but much less frequently).

    This session last week proved a solid test of Leonardo.Ai’s utility and capacity in generating assets and content (we sent some general feedback to Leonardo.Ai on platform useability and potential for improvement), but also was useful for figuring out how and where the students might use the tool in their forthcoming creative projects.

    This week we’ve spent a little time on the status of AI imagery as art, some of the ethical considerations around generative AI, and where some of the supposed impacts of these tools may most keenly be felt. In class this morning, the students were challenged to deliver lightning talks on recent AI news, developing their presentation and media analysis skills. From here, we move a little more deeply into where creativity lies in the AI process, and how human/machine collaboration might produce innovative content. The best bit, as always, will be seeing where the students go with these ideas and concepts.