The Clockwork Penguin

Daniel Binns is a media theorist and filmmaker tinkering with the weird edges of technology, storytelling, and screen culture. He is the author of Material Media-Making in the Digital Age and currently writes about posthuman poetics, glitchy machines, and speculative media worlds.

Tag: generative AI

  • The Parker Files: Smol agent, big problems

    This is Part 2 of a short series on agentic AI. Part 1 is here. Subsequent posts will be linked here.

    Parker gives it his best Lazenby. This may shock you, but no AI was used in the creation of this image.

    An AI agent is a bunch of configuration and program files — an agent processing loop, media handling scripts, files for each tool, an LLM handler, and an initialiser, essentially. But then there’s the workspace. The workspace is where the ‘magic’ happens. It contains conversation history, memory, state, scheduling and skills, and then six Markdown files that define the agent proper. These are:

    • AGENTS.md
    • HEARTBEAT.md
    • IDENTITY.md
    • SOUL.md
    • TOOLS.md
    • USER.md

    In brief:

    • USER tells the agent who the user is: their role or primary activity, what their projects and priorities are, maybe how they work, what kind of assistance they find most useful.
    • TOOLS is a home for specific details or authorisations for tool use, particularly where an API token might be required, or a specific local network port should be used.
    • IDENTITY is a light summary of the agent: name, role, emoji. This gets used as profile info when the agent is communicating via a platform like Slack or Telegram.
    • HEARTBEAT is a checklist of tasks that the agent cycles through at a set period, e.g. every 30 minutes.
    • SOUL defines the agent’s personality, values, tone, and boundaries around behaviours.

    Perhaps the most important file is AGENTS.

    AGENTS is often the longest document. It details the primary role and tasks that the agent performs, often with specific and detailed instructions on how to carry out those tasks. This is a little like a position description or an employee handbook — this is your role, this is your purview, your environment, what you do, what rights and responsibilities you have, where your accountability sits, and so on. AGENTS is also where limits and rules are laid out: safety, privacy, filtering, etc.

    All six of these documents are assembled into a ‘project context’ block inside the agent’s system prompt. They are injected into each session. If you change the documents, the changes take effect immediately, at the next message. Is that resource-intensive? A 10,000-character AGENTS.md costs about 2,500 tokens before the agent even reads your message. So yes. Very intensive.
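    To make that arithmetic concrete, here is a minimal sketch of how the six files might be stitched into one context block, with a rough token estimate. It is not PicoClaw's actual code: the function names are hypothetical, and the four-characters-per-token ratio is a common rule of thumb rather than a measurement.

```python
from pathlib import Path

# The six workspace files named in the post.
WORKSPACE_FILES = [
    "AGENTS.md", "HEARTBEAT.md", "IDENTITY.md",
    "SOUL.md", "TOOLS.md", "USER.md",
]

# Assumption: roughly four characters per token for English text.
CHARS_PER_TOKEN = 4


def build_project_context(workspace: Path) -> str:
    """Concatenate whichever workspace files exist into one context block."""
    sections = []
    for name in WORKSPACE_FILES:
        path = workspace / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokeniser will give a different number."""
    return len(text) // CHARS_PER_TOKEN


if __name__ == "__main__":
    # A 10,000-character file under the four-characters-per-token assumption.
    print(estimate_tokens("x" * 10_000))
```

    Because the block is rebuilt for every session, that estimate is a cost paid before the agent has read a single word from the user.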

    This is why all my experiments are running on my own hardware. It’s a bit slower, sure, but it means I’m not burning cloud tokens at an ocean-draining rate. It’s also because my experiments are smol [sic]. Small as in tiny, less-than-human-scale, and smol, as in, they’re tiny, and a little bit weird.

    The main player in the agentic space is OpenClaw — I wrote about this here. But now, of course, the space has exploded. PicoClaw is an interpretation/adaptation of OpenClaw that piqued (heh) my interest. PicoClaw is tiny — it requires less than 10MB of RAM to run. It’s designed for local deployment, and for devices with less grunt, such as a Raspberry Pi, which is where I deployed mine.

    Me being me, I needed a name.

    Parker. Agent Parker.

    Parker is a little secret agent. He has a little world to play around in — some folders, some files. His job is to be curious, to root around, and then to write a journal on what he’s done and seen. Simple, right?

    It took me half a day to get Parker to load properly, then another couple of hours to get him speaking to the AI model. And this is with help from Qwen 14B, a decent-sized LLM running on my work machine as a sounding board and coding partner.

    The brief for Parker was to be curious, to explore — I called this task an ‘exploration cycle’. The steps were written into AGENTS.md:

    1. Use list_dir on /home/dan/.picoclaw/workspace
    2. Choose one directory or file to investigate
    3. Use list_dir or read_file to examine it — you must call a tool before reporting anything
    4. Use list_dir on /home/dan/.picoclaw/workspace/journal to check existing log files
    5. If no journal file exists for today, use write_file to create JOURNAL_YYYYMMDD.md with today’s date
    6. If today’s journal file already exists, use append_file to add to it — do not create a duplicate
    7. Write only what you directly observed from tool results. Do not invent or infer anything you have not read.
    8. Sign the entry: — Parker
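    For comparison, steps 4 to 8 are trivial when written as ordinary code. The sketch below is a hypothetical, non-AI version of the journal step: the function names are mine, and it uses plain file operations rather than PicoClaw's write_file and append_file tools.

```python
from datetime import date
from pathlib import Path


def journal_path(journal_dir: Path, today: date) -> Path:
    """Build the JOURNAL_YYYYMMDD.md path for a given day."""
    return journal_dir / f"JOURNAL_{today:%Y%m%d}.md"


def record_entry(journal_dir: Path, observations: list[str], today: date) -> Path:
    """Create today's journal if it is missing, otherwise append to it."""
    journal_dir.mkdir(parents=True, exist_ok=True)
    path = journal_path(journal_dir, today)
    # Step 7 and 8: only the supplied observations, then the signature.
    entry = "\n".join(f"- {line}" for line in observations) + "\n\n— Parker\n"
    if path.exists():
        # Step 6: append rather than create a duplicate file.
        with path.open("a", encoding="utf-8") as handle:
            handle.write("\n" + entry)
    else:
        # Step 5: the first entry of the day creates the file.
        path.write_text(f"# Journal {today.isoformat()}\n\n{entry}", encoding="utf-8")
    return path
```

    The contrast is the point: a dozen deterministic lines here, versus a small model that has to choose the right tool, in the right order, every single time.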

    Seems simple enough. But the issues were manifold.

    The local AI model that Parker called was Ministral-3B, which was set to an insane 256,000-token context by default. I then discovered, through far too much trial and … not even error, just flat-out failure … that Ministral-3B was bad at tool-calling, e.g. using ‘write_file’ to generate material and place it in a text file. Parker was also either not at all thorough in his explorations, e.g. hallucinating files that didn’t exist, or far too thorough, i.e. reading config files that reset his alias and persona.

    Perhaps the most difficult problem to solve was the journal writing. Eventually Parker explored the folder and read the files, and then would dutifully report in the terminal that he had written his journal entry: but this was not always true. And when he did create the journal file, it would often be incomplete, inaccurate, or not formatted according to instructions. Sometimes he would journal really hard, writing non-stop such that the tool looped endlessly, and wouldn’t actually print to the file.

    Parker’s journal, as rendered in Obsidian. Absolutely scintillating stuff.

    One of the things I assumed Parker would do was to read his previous journal entries as part of his exploration cycle. How foolish I was. This was so difficult to achieve, requiring constant subtle tweaks in AGENTS.md to get this behaviour dialled in. But even with this comparative reading ‘enabled’, there was still a gap between what the agent actually did — what directories it mapped, what files it read — and what it wrote in the journal entries. This ‘narration gap’ depended on the LLM twice over: the LLM had to produce the narration, and it also had to trigger that step and get it right. Once again, LLMs are probabilistic, stochastic tools. They’ll give their best guess as to a response, but sometimes they miss the mark: this was happening as a structural flaw within the agent.
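    One way to make the gap visible is to compare the file names a journal entry mentions against the paths the tools actually touched. The following is a crude, hypothetical check, not something Parker ran: it assumes you have a record of observed paths, and it only recognises a short, assumed list of file extensions.

```python
import re
from pathlib import PurePosixPath

# Assumption: only these extensions count as "file mentions" in a journal entry.
FILE_MENTION = re.compile(r"\b[\w-]+\.(?:md|txt|json|yaml)\b")


def unverified_mentions(journal_text: str, observed_paths: set[str]) -> set[str]:
    """Return file names in the journal that no tool call actually touched."""
    mentioned = set(FILE_MENTION.findall(journal_text))
    observed_names = {PurePosixPath(path).name for path in observed_paths}
    return mentioned - observed_names
```

    Anything this returns is a candidate hallucination: a file that appears in the narration but not in the record of what was read or listed.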

    There are two main things I learned across three iterations of Parker (v1.0, v1.1, v1.2)…

    Firstly, the AGENTS.md file is key. This is the agent’s core document, its bible, its law. The other files are relevant, sure, but the AGENTS file is where all the agent’s procedural guides should go. Getting the wording right in this file is very difficult: you have to break down tasks very carefully and laterally, assuming no knowledge whatsoever beyond the basic capabilities of the agent. My inclination to date has been to give generative systems latitude: I want to see what they do without guidance. This didn’t work for Parker: open-ended instructions resulted in basic procedural error, rather than anything generative, revealing or productive.

    This leads me to the second lesson. Initially I thought that agents were actually moderators or regulators of calls to LLMs, in that they only called out to the AI model when necessary. This couldn’t be further from the truth. Every message, every instruction, every command, is parsed via an LLM call, even if the resulting tool usage is non-AI related. The LLM is the agent’s brain; it cannot act without first processing instructions via that brain.
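    A minimal sketch of that structure, with a stand-in function in place of a real model: the tool names, the fake_llm helper, and the call log are all hypothetical, and a real framework would send the message to an LLM endpoint and parse a tool call from the reply.

```python
from typing import Callable

# Hypothetical tool registry: plain Python functions, no AI involved.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "shout": lambda arg: arg.upper(),
}


def fake_llm(message: str) -> dict:
    """Stand-in for a real model call; picks a tool from the message text."""
    tool = "shout" if message.endswith("!") else "echo"
    return {"tool": tool, "argument": message}


def handle_message(message: str, llm: Callable[[str], dict], call_log: list[str]) -> str:
    """Route every message through the LLM before any tool runs."""
    call_log.append(message)  # one LLM call per message, however simple the task
    decision = llm(message)
    tool = TOOLS[decision["tool"]]
    return tool(decision["argument"])


if __name__ == "__main__":
    log: list[str] = []
    print(handle_message("hello", fake_llm, log))
    print(handle_message("hello!", fake_llm, log))
    print(len(log), "LLM calls for 2 messages")
```

    Neither tool needs a model to do its work, yet the model is consulted both times. That is the cost structure hiding inside every agent.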

    This reframes the agent entirely, for me at least. If the LLM is the brain, the agent is the ‘body’: the agent is the thing that can act on the LLM’s behalf. This may seem obvious. Can’t LLMs already do stuff? Sure, absolutely. But the structure and frameworks of an agent drastically change what AI-based assemblages are capable of: it’s an order of magnitude or several above a simple LLM interaction.

    This is where I landed after a couple of simple experiments with a tiny, fairly hapless agent. It left me wondering what else I might do with Parker; or if Parker had an upgrade. Could it be useful? More importantly, could I make it weird?


    Tech Stack:

    • Raspberry Pi 3B+, Raspberry Pi OS Lite (64-bit) — I didn’t really go into this in the post, but I’m constantly testing what AI stuff can do at tiny scale on limited hardware. The Pi 3B+ is particularly limited, so this was a good test. I love Raspberry Pis and the company and foundation behind them.
    • LM Studio — this is a host software that allows you to run LLMs locally, on your own hardware. Knowing that agents called to LLMs, I didn’t want to burn through tokens on a frontier/cloud model. I also wanted to see what the smol agent could do with a smol model.
    • PicoClaw — a free, open-source agentic framework. Learn more and how to install on GitHub.
    • Ministral-3B — an LLM; a small, 3-billion-parameter model from Mistral AI.
    • Qwen3-8B — an LLM developed by Alibaba Cloud. Like Deepseek before it, the Qwen family of models is hitting high benchmarks in reasoning, coding, multilingual support, and agent capabilities.
  • The agentic revolution is here, and it’s really boring

    This is Part 1 of a short series on agentic AI. Part 2 is here. Subsequent posts will be linked here.

    Robots with jobs. Image generated by Leonardo.Ai, 28 May 2026; prompt by me.

    The new ‘frontier’ of generative AI is the agent. Agentic AI is any configuration that allows LLMs to act autonomously. LLMs leverage their reasoning abilities to take in information, then act on that information; most agentic setups are a loop, so once that action is done, the agent either repeats the task immediately based on what has changed, or waits a period of time before repeating.

    Agents have a lineage in other automations, like macros, scripts, and bots. In a piece of software, there might be a task that you repeat over and over, such as indenting a line a certain number of times, or formatting a category of text in a particular way. Macros allow you to record a sequence of actions that you can then repeat with a keyboard shortcut.

    You might set a folder to back up from your laptop to an external hard drive: this can be automated with a script and cron job, through an OS feature like Mac’s Automator, or SaaS providers like IFTTT or n8n.io.
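    As an illustration of how small this kind of automation is, here is a sketch of a backup script. The function name and any paths you pass in are hypothetical; it simply copies a folder tree with Python's standard library and overwrites files that already exist at the destination.

```python
import shutil
from pathlib import Path


def back_up(source: Path, destination: Path) -> Path:
    """Copy a folder tree into the destination, overwriting existing files."""
    target = destination / source.name
    # dirs_exist_ok (Python 3.8+) lets repeat runs reuse the same target folder.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

    Scheduled with a cron job, a script like this runs the same way every night. There is no judgement involved, and that is exactly what separates it from an agent.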

    You might write a bot that scrapes blog posts on a particular topic, then sends you a list as an email at the end of the week. These are fairly straightforward tasks, each with perhaps a handful of steps. Easy enough to script up (or find a script, plugin, or app online).

    This is a spectrum of delegation, of automation. Streamlining the things we do day to day, getting technology to help us, to save us time. AI agents are adjacent to this spectrum, but also a little different — in the same way that LLMs are not really like other pieces of software, no matter how good they’re getting at certain tasks.

    To massively reduce the complexity of LLMs, they are probability machines. That’s not to say they’re random or chance-based. They map out connections between concepts, in hugely complex webs or meshworks. The word ‘pawn’, for example, has a location in the LLM’s internal map — a set of coordinates. ‘Pawn’ might sit near ‘king’, because it’s often used to describe chess moves; but ‘pawn’ might also sit near ‘broker’ or ‘shop’. And ‘king’ might sit next to ‘queen’ or ‘bed’, and ‘shop’ might sit near ‘store’ which sits near ‘data’; language, it turns out, is complicated. LLMs manage this complexity through developing their map across multiple dimensions. When given a prompt, the model navigates through the map to find the most probable path through these conceptual clusters, and generates a response based on that.
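    A toy version of that map, as a sketch only: the three-dimensional coordinates below are invented for illustration, where a real model learns its coordinates across hundreds or thousands of dimensions. ‘Nearness’ here is measured with cosine similarity, one common choice.

```python
import math

# Invented coordinates for illustration; these are not from any real model.
TOY_MAP = {
    "pawn": (0.9, 0.8, 0.0),
    "king": (1.0, 0.1, 0.0),
    "broker": (0.1, 1.0, 0.0),
    "data": (0.0, 0.1, 1.0),
}


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity: closer to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(word: str) -> list[str]:
    """Rank every other word in the toy map by similarity to the given word."""
    others = [w for w in TOY_MAP if w != word]
    return sorted(others, key=lambda w: cosine(TOY_MAP[word], TOY_MAP[w]), reverse=True)


if __name__ == "__main__":
    print(nearest("pawn"))
```

    In this made-up map, ‘pawn’ lands close to both ‘king’ and ‘broker’ and far from ‘data’, which is the chess-and-pawnshop ambiguity in miniature.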

    No matter how good LLMs get, they will only ever deliver their best guess in response to a query. So how do we help LLMs make better guesses? Through tool calls. Allow the LLM to verify its guess with access to a search engine, for instance. This is now common across the major proprietary models like Claude and Gemini.

    Tool calls, Skills, MCPs — these are the explicit harnesses and suspenders that we can put on LLMs to make them more useful. That we have to do so at all says a lot about how innately unreliable LLMs still are for a whole bunch of tasks — but that’s a separate conversation.

    We’ve given LLMs tools — that’s great in the moment. It helps us use them more efficiently, it lets them be more helpful to us. But that’s still a transaction. What if we could get them to head off and work for us? They can reason, after all — even if it’s only a semantic kind of reasoning. That’s no different to me writing out a blog post off the top of my head like I’m doing now. I’m reasoning out the structure, the argument. I’m making judgements through writing.

    It would be great if we could leverage that kind of reasoning to automate longer strings of tasks: entire processes or workflows. Hell, if we handed over access to our internet accounts, it could take care of emails, scheduling, restaurant and holiday bookings, even do some copy writing and social posting for us.

    That was the reasoning (so to speak) behind Claude Code, OpenAI’s Operator, and various AutoGPTs. But it was OpenClaw, an open-source tool released in late 2025, that changed the game. The earlier tools still needed humans to click, copy, paste, verify, and oversee the various stages of the automation; OpenClaw automated all those steps too, but also went one step further. OpenClaw, at least initially, was by default granted broad access to filesystems and user credentials; a huge problem if users didn’t intend that, but also a new level of affordance for automations.

    So how are we leveraging these new autonomous workers?

    To automate the boring stuff, same as always. Manage my email and my calendar. Give me a daily briefing. Help me meal plan. Keep track of my saves and bookmarks.

    Mid-tier usage is around researching and outlining content for blogs or socials, automating software development or sysadmin workflows, tracking and comparing prices of various items across vendors (be it groceries, wholesale purchases for business, or real estate).

    At the advanced or enterprise level, multi-agent setups are the go-to. Give each agent a particular job, and then put them into service together. These configurations are sometimes deployed in finance, internal business operations, and engineering.

    The initial fear with agentic systems was that they’d run wild, or that they’d build new skills and take over; good old-fashioned sci-fi tech panic fun. But the truth is far less scary. These agents are great when given very specific tasks, and clearly-defined instructions around tool use. Open-ended deployment — where agents are allowed to act beyond clear instructions — often results in error or failure.

    While everyone else is (rightly) concerned about deployment at scale, I’m trying to figure out how the agent itself operates. More to come.

  • From Literacy to Ecology

    The podium in the auditorium of Château de Valrose, Campus Valrose, Université Côte d’Azur.

    The third and final presentation of my research trip in France was delivered on Monday, April 27, at the colloquium “Créativités artificielles : approches critiques de l’IA” at the Université Côte d’Azur, Nice.

    My talk was titled “From Literacy to Ecology: Rethinking Critical-Creative AI After Automation”. This piece at once gestured backwards, to some of my earliest work in genAI, on ‘glitched companions’ and the smol and the weird, but also forwards, to where and how we might think about AI systems and outputs as part of a broader ecosystem and environment. I’m wanting to shift the conversation — and my research — from “How can I read this?” to “How can I live here?”

    I opened with the ‘literacy trap’ of technology, where outcomes- and skills-focused frameworks are instantiated across workplaces and education providers.

    These frameworks are normative — locking people in to particular modes of use and value systems. They position the user in a specific relationship to a tool. Furthermore, they assume that generative AI models are stable objects that can be learned and worked; that you can achieve a measurable standard of competency; that these are machines that can be mastered.

    The paper proposes a shift in metaphor from literacy, which implies a stable object to be learned, to ecology, which implies an environment to be inhabited. Reviving Guattari’s notion of ‘ecosophy’, I present an ecological approach to AI across critical, creative, and communicative registers.

    “The Companion Ecosystem”: the practical activation of my ecological approach to AI.

    As an example activation of this idea, I presented my companions Wuwu and Fynik — little scripted agents that mess with folders on my computer. This work is being developed theoretically and philosophically, but also practically for a game/RPG artefact.

    The three-day event in Nice and Cannes brought together international scholars around critical, ethical, creative, scientific, and experimental interrogations of artificial intelligences generative and otherwise. My contribution will be submitted for consideration for a just-announced publication from the event.

  • Against the totalising imaginary

    Dans le vif: Presenting at Campus Condorcet, Friday 24 April 2026.

    My sabbatical in France has continued apace, with plenty of fruitful meetings and discussions, and not a little writing (deadlines sadly declined a similar holiday).

    On 24 April, I had the opportunity to present some of my research at Université Paris 8-Vincennes-Saint Denis — a university founded by figures including Jacques Derrida, Hélène Cixous, and Roland Barthes in the aftermath of May ’68.

    I presented a talk titled “Against the Totalising Imaginary: Weird AI and the Ecology of the Possible”, in which I discussed my glitch-based experiments and methodologies, which I refer to as ‘ritual-technics’. For the first time, I also proposed worldbuilding and storytelling as productive frameworks for engaging with technologies like generative AI.

    I began with the Slopocene. This has been bandied about as a pejorative term for our current overload of synthetic content and governance by algorithm, with the resulting crises of authenticity, ‘reality’, and authorship. As in other work, I’m working to reclaim the Slopocene as a productive and playful term, but also as a speculative near-future or alt-present, where recursive training collapse turns the web into a haunted archive of confused bots, discarded memes, and broken truths.

    How to navigate the Slopocene? I co-opted the work of my co-presenters for the seminar: Boris Eldagsen, Rosa Cinelli, and Philippe Boisnard, alongside Chris Chesher and Cesar Albarran-Torres, Eryk Salvaggio, and Ian Haig. These are diverse approaches, but they have a few common clusters: material/semiotic, i.e. we can read AI outputs diagnostically as results of training data; relationality/phenomenology, in terms of what kind of encounter or interaction we have with AI technology; and then an aesthetic/resistant thread, which finds value in the visual breakdown and visceral sensation of encountering AI media.

    These are methods, approaches, attitudes that resist zealous techno-utopia or simplistic and naive dystopic rejection, preferring instead to pay close attention to generative AI’s computational and cultural mechanisms. Essentially these are all ways to ‘stay with’ the machine.

    My own approach weaves a thread through the material/semiotic, the relational/phenomenological, and the aesthetic/resistant — an approach I refer to formally as critical-creative AI, or informally: gonzo AI. The approach is the practical/experimental arm of my broader media-materialist approach, where I position myself as a tinkerer-theorist, which translates beautifully in French to bricoleur-théoricien.

    I went through a few of my experiments with genAI, including semantic collapse and music generation, before introducing The Drift, my worldbuilding project where all my weird AI creations live. The Drift is “a space to think and to play and to build, and an alternative imaginary to the totalising mythology that Big Technology would love us to believe, where AI is everything and everything has to be AI”:

    “It’s a world where messiness is the point, where you can be a critical observer but also someone who lives in the space as an inhabitant. There are lovely tensions between delight and disturbance, being critical and being caught-up-in-it — living in these tensions is the only honest position you can have. Games and world-building and storytelling are forms where you can hold the contradiction, you can live with the tension. And it’s a feature of these media rather than a bug or an error.”

    Image generated by Leonardo.Ai, 20 April 2026; prompt by me.

    This HERMES Seminaire, titled “Imaginaires artificiels : créativité et recherche à l’ère de l’image générative”, featured co-presenters Boris Eldagsen, Rosa Cinelli, and Philippe Boisnard, who shared their innovative approaches to exploring and deconstructing large language models and media generators.

    Université Paris 8 has been my host throughout this research trip, and it already feels like home. The institution embraces a diversity of experience among students and faculty, with interdisciplinary research and creative methods as the norm. Special thanks to Everardo Reyes of Laboratoire Paragraphe, who has been a generous friend and co-conspirator over the past couple of years.

  • New research published: A media-materialist method for interpreting generative AI images

    One of the images I used in this article as a sample object of analysis. Generated in Midjourney using the prompt ‘intellectual rigor’. Perfectly reflects my state at various stages of this article’s composition and publication.

    After plenty of play and experimentation with AI imagery, I found myself reacting viscerally to commentary and early scholarship that was pejorative about — or outright dismissive of — these outputs. The prevailing discourse treated AI images as a kind of slop monolith, when I found a lot of my generations to be fascinating, disturbing, amusing, and even beautiful. In response, I wrote this article, which presents a four-layer method for a structured, formal analysis of AI-generated images. The four layers are data, model, interface, and prompt, reflecting the mechanisms of generative AI technology. Each layer offers various considerations and questions to ask about actual outputs, encouraging researchers, students, educators, and commentators to move beyond dismissing these images as mere slop, and to begin considering them as cultural artefacts.

    This piece is the foundation of all my work on genAI over the past two years (I hinted at its publication last year), and also the first where I’ve attempted to create a new method rather than just apply one. It’s also the first to really put forward my own take on media materialism, a philosophy and methodology that has guided my work for nearly ten years.

    I am a big believer in close analysis, be it of texts, imagery, video, films: all the objects of culture. But I struggled for a long time to bridge that method with a context that made sense to me. It took me a few pieces to figure out that the mechanisms of making were another foundational aspect of my work, and to make the connection: what I’ve nearly always tried to do is consider how the means of an object’s production leave their mark on the object itself. It’s a simple conclusion, but it’s taken several attempts for me to articulate it in a way that felt satisfactory. This article feels like the first to actually explain it appropriately; the next step is to deploy the approach across other kinds of synthetic media and generative systems more broadly, but also to possibly return with this approach to cinema and TV.