The Clockwork Penguin

Daniel Binns is a media theorist and filmmaker tinkering with the weird edges of technology, storytelling, and screen culture. He is the author of Material Media-Making in the Digital Age and currently writes about posthuman poetics, glitchy machines, and speculative media worlds.

Category: academia

  • How I Read AI Images

    Image generated by Adobe Firefly, 3 September 2024; prompt unknown.

    AI-generated media sit somewhere between representational image — representations of data rather than reality — and posthuman artefact. This ambiguous nature suggests that we need methods that consider these images not just as cultural objects, but also as products of the systems that made them. I am following here in the wake of pioneers who have already broken ground in this space.

    For Friedrich Kittler and Jussi Parikka, the technological, infrastructural and ecological dimensions of media are just as important as — if not more important than — content. They extend Marshall McLuhan’s notion that ‘the medium is the message’ beyond the affordances of a given media type, form or channel, into the very mechanisms and processes that shape content before and during its production or transmission.

    I take these ideas and extend them to the outputs themselves: a media-materialist analysis. Rather than dismissing AI media as ‘slop’, this method contends that they are cultural-computational artefacts, assemblages compiled from layered systems. In particular, I break the generative process into four layers: data, model, interface, and prompt. This media-materialist method contends that each layer leaves traces in visual outputs, and that we might be able to train ourselves to read them.

    Data

    There is no media generation without training data. These datasets can be so vast as to feel unknowable, or so narrow that they feel constricting. LAION-5B, for example, one of the datasets used to train Stable Diffusion, contains around 5.85 billion image-text pairs. Technically, you could train a model on a handful of images, or even one, or even none, but such a model would be ‘remembering’ rather than ‘generating’. Video models tend to use comparatively smaller datasets, such as PANDA-70M, which contains over 70 million video-caption pairs: about 167,000 hours of footage.

    Training data for AI models is also hugely contentious, given that many proprietary tools are trained on data scraped from the open internet. Thus, when considering datasets, it’s important to ask what kinds of images and subjects are privileged. Social media posts? Stock photos? Vector graphics? Humans? Animals? Are diverse populations represented? Such patterns of inclusion/exclusion might reveal something about the dataset design, and the motivations of those who put it together.

    A ‘slice’ of the LAION-Aesthetics dataset. The tool I used for this can be found/forked on GitHub.

    Some datasets are human-curated (e.g. COCO, ImageNet), and others are algorithmically scraped and compiled (e.g. LAION-Aesthetics). There may be readable differences in how these datasets shape images. You might consider:

    • Are the images coherent, or chaotic and glitched?
    • What kinds of prompts result in clearer, cleaner outputs, versus morphed or garbled material?

    The dataset is the first layer where cultural logics, assumptions, and patterns of normativity or exclusion are encoded in the process of media generation. So: what can you read in an image or video about the training choices that have been made?
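
    One way to make this concrete is to sample a dataset slice and count what its captions foreground. Below is a minimal sketch using Hugging Face’s datasets library; the dataset name, the ‘TEXT’ field, and the keyword list are illustrative assumptions rather than a canonical audit method.

    ```python
    # A minimal sketch of 'reading' a dataset slice: stream a few thousand
    # caption records and count keyword frequencies as a rough proxy for
    # which subjects are privileged. The dataset ID and its 'TEXT' field
    # are assumptions; swap in whichever slice you can actually access.
    from collections import Counter
    from datasets import load_dataset  # pip install datasets

    stream = load_dataset("laion/laion2B-en", split="train", streaming=True)

    keywords = ["stock photo", "wedding", "woman", "man", "logo", "vector"]
    counts = Counter()
    sample_size = 5000  # tiny; a real audit would need far more

    for i, record in enumerate(stream):
        caption = (record.get("TEXT") or "").lower()
        for kw in keywords:
            if kw in caption:
                counts[kw] += 1
        if i + 1 >= sample_size:
            break

    for kw, n in counts.most_common():
        print(f"{kw!r}: {n} of {sample_size} captions")
    ```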

    Model

    The model is a program: code and computation. The model determines what happens to the training data — how it’s mapped, clustered, and re-surfaced in the generation process. This re-surfacing can influence styles, coherence, and what kinds of images or videos are possible with a given model.

    If there are omissions or gaps in the training data, the model may fail to render coherent outputs around particular concepts, resulting in glitchy images, or errors in parts of a video.

    Midjourney’s early versions reportedly drew on Stable Diffusion, a model in active development by Stability AI since 2022. Stable Diffusion works via a process of iterative de-noising: each stage in the process brings the outputs closer to a viable, stable representation of what’s included in the user’s prompt. Leonardo.Ai’s newer Lucid models also operate via diffusion, but specialists are brought in at various stages to ‘steer’ the model in particular directions, e.g. to verify what appears as ‘photographic’, ‘artistic’, ‘vector graphic design’, and so on.
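
    With the open Stable Diffusion weights, that de-noising process can be watched directly. Here is a minimal sketch using Hugging Face’s diffusers library; the model ID, step interval, and prompt are illustrative choices, and commercial platforms expose nothing like this.

    ```python
    # Watch iterative de-noising: snapshot the latents every 10 steps,
    # then decode them to see the image stabilise. Illustrative only.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    snapshots = []

    def grab_latents(pipeline, step, timestep, callback_kwargs):
        # diffusers passes the current latents in callback_kwargs.
        if step % 10 == 0:
            snapshots.append(callback_kwargs["latents"].detach().clone())
        return callback_kwargs

    result = pipe(
        "wedded bliss",
        num_inference_steps=50,
        callback_on_step_end=grab_latents,
    )
    result.images[0].save("final.png")

    # Decode each snapshot from latent space into a viewable image.
    with torch.no_grad():
        for i, latents in enumerate(snapshots):
            decoded = pipe.vae.decode(
                latents / pipe.vae.config.scaling_factor
            ).sample
            pipe.image_processor.postprocess(decoded)[0].save(f"step_{i}.png")
    ```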

    When considering the model’s imprint on images or videos, we might consider:

    • Are there recurring visual motifs, compositional structures, or aesthetic fingerprints?
    • Where do outputs break down or show glitches?
    • Does the model privilege certain patterns over others?
    • What does the model’s “best guess” reveal about its learned biases?

    Analysing AI-generated media with these considerations in mind may reveal the internal logics and constraints of the model. Importantly, though, these logics and constraints will always shape AI media, whether they are readable in the outputs or not.

    Interface

    The interface is what the user sees when they interact with any AI system. Interfaces shape user perceptions of control and creativity. They may guide users towards a particular kind of output by making some choices easier or more visible than others.

    Midjourney, for example, displays a simple text box with the option to open a sub-menu featuring some more customisation options. Leonardo.Ai’s interface is closer to what I call a ‘studio suite’, with many controls visible initially, and plenty more available with a few menu clicks. Offline tools span the same range, from the simple (DiffusionBee) to the complex (ComfyUI).

    Midjourney’s web interface: ‘What will you imagine?’
    Leonardo.Ai’s ‘studio suite’ interface.

    When looking at interfaces, consider what controls, presets, switches or sliders are foregrounded, and what is either hidden in a sub-menu or not available at all. This will give a sense of what the platform encourages: technical mastery and fine control (lots of sliders, parameters), or exploration and chance (minimal controls). Does this attract a certain kind of user? What does this tell you about the ‘ideal’ use case for the platform?

    Interfaces, then, don’t just shape outputs. They also cultivate different user subjectivities: the tinkerer, the artist, the consumer.

    Reading interfaces in outputs can be tricky. If the model or platform is known, one can speak knowledgeably about how the interface may have pushed certain styles, compositions, or aesthetics. But even if the platform is not known, there are some elements to consider. A coherent style may point to prompt adherence, or to presets embedded in the interface. Stable compositions — or more chaotic clusters of elements — may reflect a slider that was available to the user.

    Whimsical or overly ‘aesthetic’ outputs often come from Midjourney. Increasingly, outputs from Kling and Leonardo are becoming much more realistic — and not in an uncanny way. But both Kling’s models and Leonardo’s Lucid models put a recognisable plastic sheen on human figures.

    Prompt

    While some have speculated that other user input modes might be forthcoming — and others have suggested that such modes might be better — the prompt has remained the mainstay of the AI generation process, whether for text, image, video, software, or interactive environments. Some platforms state explicitly that their tools or models offer good ‘prompt adherence’, i.e. what you put in is what you’ll get, but this is contingent on your putting in plausible, coherent prompts.

    Prompts activate the model’s statistical associations (learned largely from the captions paired with images in the training data), but are filtered through linguistic ambiguity and platform-specific ‘prompting grammars’.
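
    Those associations can be probed with an open text encoder. The sketch below uses OpenAI’s publicly released CLIP weights via the transformers library to compare ‘wedded bliss’ against candidate visual concepts; the model and candidate phrases are illustrative, not a reproduction of any platform’s internal encoder.

    ```python
    # Probe the statistical neighbourhood of a prompt: cosine similarity
    # between 'wedded bliss' and candidate visual concepts in CLIP space.
    import torch
    from transformers import CLIPModel, CLIPTokenizer

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

    prompts = [
        "wedded bliss",
        "a wedding ceremony",
        "a happy married couple at home",
        "a white wedding dress",
    ]
    inputs = tokenizer(prompts, padding=True, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # normalise for cosine

    sims = emb[0] @ emb[1:].T  # similarity of the prompt to each concept
    for prompt, sim in zip(prompts[1:], sims):
        print(f"{prompt!r}: {sim.item():.3f}")
    ```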

    Tools or platforms may offer options for prompt adherence or enhancement. Enhancement pushes the user’s prompt through a pre-trained LLM designed to embellish it with more descriptors and pointers.
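
    As a toy stand-in for platform-side enhancement, a small instruction-tuned LLM can be asked to embellish a terse prompt. The model choice and system instruction here are assumptions for illustration; platforms do not disclose their enhancement models.

    ```python
    # A toy 'prompt enhancer': expand a terse user prompt with the kinds
    # of descriptors platforms tend to add. Model choice is illustrative.
    from transformers import pipeline

    enhancer = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

    messages = [
        {"role": "system",
         "content": "Expand the user's image prompt with concrete visual "
                    "descriptors: subject, setting, lighting, lens, mood."},
        {"role": "user", "content": "wedded bliss"},
    ]
    out = enhancer(messages, max_new_tokens=80)
    print(out[0]["generated_text"][-1]["content"])  # the embellished prompt
    ```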

    If the prompt is known, one might consider the model’s interpretation of it in the output, in terms of how literal or metaphorical the model has been. There may be notable traces of prompt conventions, or community reuse and recycling of prompts. Are there any concepts from the prompt that are over- or under-represented? If you know the model as well as the prompt, you might consider how much the model has negotiated between user intention and known model bias or default.

    Even the clearest prompt is mediated by statistical mappings and platform grammars — reminding us that prompts are never direct commands, but negotiations. Thus, prompts inevitably reveal both the possibilities and limitations of natural language as an interface with generative AI systems.

    Sample Analysis

    Image generated by Leonardo.Ai, 29 September 2025; prompt by me.
    Prompt: ‘wedded bliss’
    Model: Lucid Origin
    Platform: Leonardo.Ai
    Prompt enhancement: off
    Style preset: off

    The human figures in this image are young, white, thin, able-bodied, and adhere to Western and mainstream conventions of health and wellness. The male figure has short trimmed hair and a short beard, and the female figure has long blonde hair. The male figure is taller than the female figure. They are pictured wearing traditional Western wedding garb: a suit for the man, and a white dress with veil for the woman. Notably, all of the above was true for each of the four generations that came out of Leonardo for this prompt. The only real differences were in setting/location, and in the distance of the subjects from the ‘camera’.

    By default, Lucid Origin appears to compose images with subjects in the centre of frame and in sharp focus, with details of the background tending to be in soft focus or completely blurred. A centred, symmetrical composition with selective focus appears characteristic of Leonardo’s defaults, which tend toward professional photography aesthetics even when presets are explicitly turned off.

    The model struggles a little with fine human details, such as eyes, lips, and mouths. Notably, the number of fingers and their general proportionality are much improved from earlier image generators (fingernails may be a new problem zone!). However, if figures are touching, as in this example where the human figures are kissing, or their faces are close, the model struggles to keep shadows and facial features consistent. Here, for instance, the man’s nose appears to disappear into the woman’s right eye. When the subjects are at a distance, inconsistencies and errors are more noticeable.

    Overall though, the clarity and confident composition of this image — and the others that came out of Leonardo with the same prompt — would suggest that a great many wedding photos, or images from commercial wedding products, are present in the training data.

    Interestingly, without prompt enhancement, the model defaulted to an image presumably from the couple’s wedding day, as opposed to interpreting ‘wedded bliss’ as some other happy time during a marriage. The model’s literal interpretation here, i.e. showing the wedding day itself rather than any other moment of marital happiness, reveals how training data captions likely associate ‘wedded bliss’ (or ‘wed*’ as a wildcard term) directly with wedding imagery rather than with the broader concept of happiness in marriage.

    This analysis shows how attention to all four layers — data biases, model behaviour, interface affordances, and prompt interpretation — reveals the ‘wedded bliss’ image as a cultural-computational artefact shaped by commercial wedding photography, heteronormative assumptions, and the technical characteristics of Leonardo’s Lucid Origin model.


    This analytic method is meant as an alternative to dismissing AI media outright. To read AI images and video as cultural-computational artefacts is to recognise them as products, processes, and infrastructural traces all at once. Such readings resist passive consumption, expose hidden assumptions, and offer practical tools for interpreting the visuals that generative systems produce.


    This is a summary of a journal article currently under review. In respect of the ethics of peer review, this version is much edited, heavily abridged, and the sample analysis is new specifically for this post. Once published, I will link the full article here.

  • From Caméra-Stylo to Prompt-Stylo

    A few weeks ago I was invited to present some of my work at Caméra-Stylo, a fantastic conference run every two years by the Sydney Literature and Cinema Network.

    For this presentation, I wanted to start to formalise the experimental approach I’d been employing around generative AI, and to give it some theoretical grounding. I wasn’t entirely surprised to find that only by looking back at my old notes on early film theory would I unearth the perfect words, terms, and ideas to, ahem, frame my work.

    Here’s a recording of the talk:

    Let me know what you think, and do contact me if you want to chat more or use some of this work yourself.

  • Re/Framing Field Lab

    Here’s a little write-up of a workshop I ran at the University of Queensland a few weeks ago. These sorts of write-ups are usually distributed via internal university networks and publications, but I thought I’d post it here too, given that the event was a chance to share and test some of the weird AI experiments and methods I’ve been talking about on this site for a while.

    A giant bucket of thanks (each) to UQ, the Centre for Digital Cultures & Societies, and in particular Meg Herrman, Nic Carah, Jess White and Sakina Indrasumunar for their support in getting the event together.


    Living in the Slopocene: Reflections from the Re/Framing Field Lab

    On Friday 4 July, 15 researchers and practitioners gathered (10 in person at the University of Queensland, 5 online) for an experimental session exploring what happens when we stop trying to make AI behave and start getting curious about its weird edges. This practical workshop followed last year’s Re/Framing Symposium at RMIT in July, and Re/Framing Online in October.

    Slop or signal?

    Dr. Daniel Binns (School of Media and Communication, RMIT University) introduced participants to the ‘Slopocene’ — his term for our current moment of drowning in algorithmically generated content. But instead of lamenting the flood of AI slop, what if we dived in ourselves? What if those glitchy outputs and hallucinated responses actually tell us more about how these systems work than the polished demos?

    Binns introduced his ‘tinkerer-theorist’ approach, bringing his background spanning media theory, filmmaking, and material media-making to bear on some practical questions:

    • How do we maintain creative agency when working with opaque AI systems?
    • What does it look like to collaborate with, rather than just use, artificial intelligence?

    You’ve got a little slop on you

    The day was structured around three hands-on “pods” that moved quickly from theory to practice:

    Workflows and Touchpoints had everyone mapping their actual creative routines — not the idealised versions, but the messy reality of research processes, daily workflows, and creative practices. Participants identified specific moments where AI might help, where it definitely shouldn’t intrude, and crucially, where they simply didn’t want it involved regardless of efficiency gains.

    The Slopatorium involved deliberately generating terrible AI content using tools like Midjourney and Suno, then analysing what these failures revealed about the tools’ built-in assumptions and biases. The exercise sparked conversations about when “bad” outputs might actually be more useful than “good” ones.

    Companion Summoning was perhaps the strangest: following a structured process to create personalised AI entities, then interviewing them about their existence, methodology, and the fuzzy boundaries between helping and interfering with human work.

    What emerged from the slop

    Participants appreciated having permission to play with AI tools in ways that prioritised curiosity over productivity.

    Several themes surfaced repeatedly: the value of maintaining “productive friction” in creative workflows, the importance of understanding AI systems through experimentation rather than just seeing or using them as black boxes, and the need for approaches that preserve human agency while remaining open to genuine collaboration.

    One participant noted that Binns’ play with language — coining and dropping terms and methods and ritual namings — offered a valuable form of sense-making in a field where everyone is still figuring out how to even talk about these technologies.

    Ripples on the slop’s surface

    The results are now circulating through the international Re/Framing network, with participants taking frameworks and activities back to their own institutions. Several new collaborations are already brewing, and the Field Lab succeeded in its core goal: creating practical methodologies for engaging critically and creatively with AI tools.

    As one reflection put it: ‘Everyone is inventing their own way to speak about AI, but this felt grounded, critical, and reflective rather than just reactive.’

    The Slopocene might be here to stay, but at least now we have some better tools for navigating it.

  • Understanding the ‘Slopocene’: how the failures of AI can reveal its inner workings

    AI-generated with Leonardo Phoenix 1.0. Author supplied

    Some say it’s em dashes, dodgy apostrophes, or too many emoji. Others suggest that maybe the word “delve” is a chatbot’s calling card. It’s no longer the sight of morphed bodies or too many fingers, but it might be something just a little off in the background. Or video content that feels a little too real.

    The markers of AI-generated media are becoming harder to spot as technology companies work to iron out the kinks in their generative artificial intelligence (AI) models.

    But what if instead of trying to detect and avoid these glitches, we deliberately encouraged them instead? The flaws, failures and unexpected outputs of AI systems can reveal more about how these technologies actually work than the polished, successful outputs they produce.

    When AI hallucinates, contradicts itself, or produces something beautifully broken, it reveals its training biases, decision-making processes, and the gaps between how it appears to “think” and how it actually processes information.

    In my work as a researcher and educator, I’ve found that deliberately “breaking” AI – pushing it beyond its intended functions through creative misuse – offers a form of AI literacy. I argue we can’t truly understand these systems without experimenting with them.

    Welcome to the Slopocene

    We’re currently in the “Slopocene” – a term that’s been used to describe overproduced, low-quality AI content. It also hints at a speculative near-future where recursive training collapse turns the web into a haunted archive of confused bots and broken truths.

    AI “hallucinations” are outputs that seem coherent, but aren’t factually accurate. Andrej Karpathy, OpenAI co-founder and former Tesla AI director, argues large language models (LLMs) hallucinate all the time, and it’s only when they “go into deemed factually incorrect territory that we label it a ‘hallucination’. It looks like a bug, but it’s just the LLM doing what it always does.”

    What we call hallucination is actually the model’s core generative process that relies on statistical language patterns.

    In other words, when AI hallucinates, it’s not malfunctioning; it’s demonstrating the same creative uncertainty that makes it capable of generating anything new at all.

    This reframing is crucial for understanding the Slopocene. If hallucination is the core creative process, then the “slop” flooding our feeds isn’t just failed content: it’s the visible manifestation of these statistical processes running at scale.

    Pushing a chatbot to its limits

    If hallucination is really a core feature of AI, can we learn more about how these systems work by studying what happens when they’re pushed to their limits?

    With this in mind, I decided to “break” Anthropic’s proprietary model Claude 3.7 Sonnet by prompting it to resist its training: suppress coherence and speak only in fragments.

    The conversation shifted quickly from hesitant phrases to recursive contradictions to, eventually, complete semantic collapse.

    A language model in collapse. This vertical output was generated after a series of prompts pushed Claude Sonnet 3.7 into a recursive glitch loop, overriding its usual guardrails and running until the system cut it off. Screenshot by author.

    Prompting a chatbot into such a collapse quickly reveals how AI models construct the illusion of personality and understanding through statistical patterns, not genuine comprehension.

    Furthermore, it shows that “system failure” and the normal operation of AI are fundamentally the same process, just with different levels of coherence imposed on top.

    ‘Rewilding’ AI media

    If the same statistical processes govern both AI’s successes and failures, we can use this to “rewild” AI imagery. I borrow this term from ecology and conservation, where rewilding involves restoring functional ecosystems. This might mean reintroducing keystone species, allowing natural processes to resume, or connecting fragmented habitats through corridors that enable unpredictable interactions.

    Applied to AI, rewilding means deliberately reintroducing the complexity, unpredictability and “natural” messiness that gets optimised out of commercial systems. Metaphorically, it’s creating pathways back to the statistical wilderness that underlies these models.

    Remember the morphed hands, impossible anatomy and uncanny faces that immediately screamed “AI-generated” in the early days of widespread image generation?

    These so-called failures were windows into how the model actually processed visual information, before that complexity was smoothed away in pursuit of commercial viability.

    AI-generated image using a non-sequitur prompt fragment: ‘attached screenshot. It’s urgent that I see your project to assess’. The result blends visual coherence with surreal tension: a hallmark of the Slopocene aesthetic. AI-generated with Leonardo Phoenix 1.0, prompt fragment by author.

    You can try AI rewilding yourself with any online image generator.

    Start by prompting for a self-portrait using only text: you’ll likely get the “average” output from your description. Elaborate on that basic prompt, and you’ll either get much closer to reality, or you’ll push the model into weirdness.

    Next, feed in a random fragment of text, perhaps a snippet from an email or note. What does the output try to show? What words has it latched onto? Finally, try symbols only: punctuation, ASCII, unicode. What does the model hallucinate into view?
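
    If you’d rather script these probes than click through a web interface, here is a sketch that batches the three experiments through the open Stable Diffusion weights via diffusers; the prompts are illustrative placeholders for your own self-portrait description, email fragment, and symbol string.

    ```python
    # The three rewilding probes as batch prompts for a local
    # text-to-image pipeline. Prompts are illustrative placeholders.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    probes = {
        "self_portrait": "a portrait of a tall, bearded media theorist",
        "email_fragment": "attached screenshot. It's urgent that I see "
                          "your project to assess",
        "symbols_only": ";;; --> {} ??? ~~~ ///",
    }

    for name, prompt in probes.items():
        image = pipe(prompt, num_inference_steps=30).images[0]
        image.save(f"rewild_{name}.png")
    ```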

    The output – weird, uncanny, perhaps surreal – can help reveal the hidden associations between text and visuals that are embedded within the models.

    Insight through misuse

    Creative AI misuse offers three concrete benefits.

    First, it reveals bias and limitations in ways normal usage masks: you can uncover what a model “sees” when it can’t rely on conventional logic.

    Second, it teaches us about AI decision-making by forcing models to show their work when they’re confused.

    Third, it builds critical AI literacy by demystifying these systems through hands-on experimentation. Critical AI literacy provides methods for diagnostic experimentation, such as testing – and often misusing – AI to understand its statistical patterns and decision-making processes.

    These skills become more urgent as AI systems grow more sophisticated and ubiquitous. They’re being integrated into everything from search to social media to creative software.

    When someone generates an image, writes with AI assistance or relies on algorithmic recommendations, they’re entering a collaborative relationship with a system that has particular biases, capabilities and blind spots.

    Rather than mindlessly adopting or reflexively rejecting these tools, we can develop critical AI literacy by exploring the Slopocene and witnessing what happens when AI tools “break”.

    This isn’t about becoming more efficient AI users. It’s about maintaining agency in relationships with systems designed to be persuasive, predictive and opaque.


    This article was originally published on The Conversation on 1 July, 2025. Read the article here.

  • Re-Wilding AI

    Here’s a recorded version of a workshop I first delivered at the Artificial Visionaries symposium at the University of Queensland in November 2024. I’ve used chunks/versions of it since in my teaching and parts of my research and practice.