The Clockwork Penguin

Daniel Binns is a media theorist and filmmaker tinkering with the weird edges of technology, storytelling, and screen culture. He is the author of Material Media-Making in the Digital Age and currently writes about posthuman poetics, glitchy machines, and speculative media worlds.

Category: Research

  • Against the totalising imaginary

    Dans le vif (‘in the thick of it’): Presenting at Campus Condorcet, Friday 24 April 2026.

    My sabbatical in France has continued apace, with plenty of fruitful meetings and discussions, and not a little writing (deadlines sadly declined a similar holiday).

    On 24 April, I had the opportunity to present some of my research at Université Paris 8 Vincennes-Saint-Denis — a university founded by figures including Jacques Derrida, Hélène Cixous, and Roland Barthes in the aftermath of May ’68.

    I presented a talk titled “Against the Totalising Imaginary: Weird AI and the Ecology of the Possible”, in which I discussed my glitch-based experiments and methodologies, which I refer to as ‘ritual-technics’. For the first time, I also proposed worldbuilding and storytelling as productive frameworks for engaging with technologies like generative AI.

    I began with the Slopocene. This has been bandied about as a pejorative term for our current overload of synthetic content and governance by algorithm, with the resulting crises of authenticity, ‘reality’, and authorship. As in other work, I’m working to reclaim the Slopocene as a productive and playful term, but also as a speculative near-future or alt-present, where recursive training collapse turns the web into a haunted archive of confused bots, discarded memes, and broken truths.

    How to navigate the Slopocene? I co-opted the work of my co-presenters for the seminar: Boris Eldagsen, Rosa Cinelli, and Philippe Boisnard, alongside Chris Chesher and Cesar Albarran-Torres, Eryk Salvaggio, and Ian Haig. These are diverse approaches, but they share a few common clusters: material/semiotic, i.e. we can read AI outputs diagnostically as results of training data; relational/phenomenological, in terms of what kind of encounter or interaction we have with AI technology; and an aesthetic/resistant thread, which finds value in the visual breakdown and visceral sensation of encountering AI media.

    These are methods, approaches, attitudes that resist zealous techno-utopianism or naive dystopian rejection, preferring instead to pay close attention to generative AI’s computational and cultural mechanisms. Essentially these are all ways to ‘stay with’ the machine.

    My own approach weaves a thread through the material/semiotic, the relational/phenomenological, and the aesthetic/resistant — an approach I refer to formally as critical-creative AI, or informally: gonzo AI. The approach is the practical/experimental arm of my broader media-materialist approach, where I position myself as a tinkerer-theorist, which translates beautifully in French to bricoleur-théoricien.

    I went through a few of my experiments with genAI, including semantic collapse and music generation, before introducing The Drift, my worldbuilding project where all my weird AI creations live. The Drift is “a space to think and to play and to build, and an alternative imaginary to the totalising mythology that Big Technology would love us to believe, where AI is everything and everything has to be AI”:

    “It’s a world where messiness is the point, where you can be a critical observer but also someone who lives in the space as an inhabitant. There are lovely tensions between delight and disturbance, being critical and being caught-up-in-it — living in these tensions is the only honest position you can have. Games and world-building and storytelling are forms where you can hold the contradiction, you can live with the tension. And it’s a feature of these media rather than a bug or an error.”

    Image generated by Leonardo.Ai, 20 April 2026; prompt by me.

    This HERMES Séminaire, titled “Imaginaires artificiels : créativité et recherche à l’ère de l’image générative” (“Artificial imaginaries: creativity and research in the era of the generative image”), featured co-presenters Boris Eldagsen, Rosa Cinelli, and Philippe Boisnard, who shared their innovative approaches to exploring and deconstructing large language models and media generators.

    Université Paris 8 has been my host throughout this research trip, and it already feels like home. The institution embraces a diversity of experience among students and faculty, with interdisciplinary research and creative methods as the norm. Special thanks to Everardo Reyes of Laboratoire Paragraphe, who has been a generous friend and co-conspirator over the past couple of years.

  • New research published: A media-materialist method for interpreting generative AI images

    One of the images I used in this article as a sample object of analysis. Generated in Midjourney using the prompt ‘intellectual rigor’. Perfectly reflects my state at various stages of this article’s composition and publication.

    After plenty of play and experimentation with AI imagery, I found myself reacting viscerally to commentary and early scholarship that was pejorative about — or outright dismissive of — these outputs. The prevailing discourse treated AI images as a kind of slop monolith, when I found a lot of my generations to be fascinating, disturbing, amusing, and even beautiful. In response, I wrote this article, which presents a four-layer method for a structured, formal analysis of AI-generated images. The four layers are data, model, interface, and prompt, reflecting the mechanisms of generative AI technology. Each layer offers various considerations and questions to ask about actual outputs, encouraging researchers, students, educators, and commentators to move beyond dismissing these images as mere slop, and to begin considering them as cultural artefacts.

    This piece is the foundation of all my work on genAI over the past two years (I hinted at its publication last year), and also the first where I’ve attempted to create a new method rather than just apply one. It’s also the first to really put forward my own take on media materialism, a philosophy and methodology that has guided my work for nearly ten years.

    I am a big believer in close analysis, be it of texts, imagery, video, films: all the objects of culture. But I struggled for a long time to bridge that method with a context that made sense to me. Once I figured out that the mechanisms of making were another foundational aspect of my work, it still took me a few pieces to make the connection: what I’ve nearly always tried to do is consider how the means of an object’s production leave their mark on the object itself. It’s a simple conclusion, but it has taken several attempts to articulate it in a way that felt satisfactory. This article feels like the first to actually explain it appropriately; the next step is to deploy the approach across other kinds of synthetic media and generative systems more broadly, but also to possibly return with this approach to cinema and TV.

  • Like No One Is Watching

    Title slide of my paper “Like No One Is Watching”.

    I’ve kicked off a month’s research sabbatical in France, hitting the ground running…

    My first invited presentation was today at Université Paris 1 Panthéon-Sorbonne, as part of the journée d’étude “L’intelligence et l’éthique de la télévision à l’ère des algorithmes” (“The intelligence and ethics of television in the era of algorithms”). Today’s talks looked at de-ageing as a quest for immortality and fracturing of the present, televisuality and intelligence, and teaching LLMs about humans by making them watch a lot of TV; the seminar concludes tomorrow.

    My own piece, “Like No One Is Watching: The Form of Television in the Algorithmic Moment”, examined how episodic storytelling navigates the constraints of the platform and attention economies. I looked at the chaotic inconsistency of The Bear and the aggressive tedium of The Pitt as shows pushing formal boundaries to reassert a direct relationship with their audience.

    The talk had three key moves.

    Firstly, I re-established television as the ‘miscreant medium’, drawing on John Fiske and John Hartley’s seminal work. On the one hand, television has always served as a scapegoat or delivery channel for whatever moral panic is current at the time; alongside this, it is a medium perennially torn between the strictures of institutions and technology, and the creativity of its artists.

    Secondly, I argued that platform logic holds two contradictory assumptions about audiences. On one hand, audiences are assumed to be passive and distracted. This assumption leads to baked-in redundancies, including explicit exposition and constant re-explanation (a phenomenon that Will Tavlin explores in his piece ‘Casual Viewing’). On the other hand, platform capitalism is contingent on metrics of retention; active, engaged viewing, then, is assumed.

    In the third section, I spoke to sample clips from The Bear and The Pitt, both shows that embody and embrace this presumptive schizophrenia. From The Bear I played part of the seventh episode of the first season, which includes a 17-minute unbroken take. I also shared a couple of mundane conversation scenes from the premiere episode of The Pitt. I used formal analysis here as a diagnostic tool, to observe how creatives push against (or acquiesce to) the algorithmic frame of their distribution. In the case of both shows, I offered that formal experimentation — whether at a dialogue, scene, episode, or series level — demonstrates friction as an exercise in meaning-making: a conversation and negotiation between creator and audience quite apart from questions of data, platform, capital.

    What close formal analysis reveals is that television is not a medium in decline, but one still jovially misbehaving; always exceeding what the discourse says it’s capable of, and still worth watching.

    This talk was a return to formal analysis for me, and it felt great to be home. I’ve been very lucky to be taught by or to work with a bunch of academics who really value close textual analysis, and I think it’s such an incisive and enjoyable means of understanding texts and their contexts.

    It’s highly likely an edited collection will result from this gathering, so fingers crossed that this work will be in print soon!

    Giving my talk at Université Paris 1 Panthéon-Sorbonne. Photo thanks to Sandra Laugier.

    I now have a little breathing room before my second presentation, so I’ll be using this time to actually get out and wander around Paris a little, but also to feed and tend to a few items moving through the publication pipeline.

  • How I Read AI Images

    Image generated by Adobe Firefly, 3 September 2024; prompt unknown.

    AI-generated media sit somewhere between representational image — representations of data rather than reality — and posthuman artefact. This ambiguous nature suggests that we need methods that consider these images not just as cultural objects, but also as products of the systems that made them. I am following here in the wake of other pioneers who’ve bravely broken ground in this space.

    For Friedrich Kittler and Jussi Parikka, the technological, infrastructural and ecological dimensions of media are just as — if not more — important than content. They extend Marshall McLuhan’s notion that ‘the medium is the message’ from just the affordances of a given media type/form/channel, into the very mechanisms and processes that shape the content before and during its production or transmission.

    I take these ideas and extend them to the outputs themselves: a media-materialist analysis. Rather than just ‘slop’, this method contends that AI media are cultural-computational artefacts, assemblages compiled from layered systems. In particular, I break this into data, model, interface, and prompt. This media materialist method contends that each step of the generative process leaves traces in visual outputs, and that we might be able to train ourselves to read them.

    Data

    There is no media generation without training data. These datasets can be so vast as to feel unknowable, or so narrow that they feel constricting. LAION-5B, for example, the original dataset used to train Stable Diffusion, contains 5.85 billion image-text pairs. Technically, you could train a model on a handful of images, or even one, but the model would be ‘remembering’ rather than ‘generating’. Video models tend to use (comparatively) smaller datasets, such as PANDA-70M, which contains over 70 million video-caption pairs: about 167,000 hours of footage.

    Training data for AI models is also hugely contentious, given that many proprietary tools are trained on data scraped from the open internet. Thus, when considering datasets, it’s important to ask what kinds of images and subjects are privileged. Social media posts? Stock photos? Vector graphics? Humans? Animals? Are diverse populations represented? Such patterns of inclusion/exclusion might reveal something about the dataset design, and the motivations of those who put it together.

    A ‘slice’ of the LAION-Aesthetics dataset. The tool I used for this can be found/forked on Github.

    Some datasets are human-curated (e.g. COCO, ImageNet), and others are algorithmically scraped and compiled (e.g. LAION-Aesthetics). There may be readable differences in how these datasets shape images. You might consider:

    • Are the images coherent? Chaotic/glitched?
    • What kinds of prompts result in clearer, cleaner outputs, versus morphed or garbled material?

    The dataset is the first layer where cultural logics, assumptions, patterns of normativity or exclusion are encoded in the process of media generation. So: what can you read in an image or video about what training choices have been made?
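As a very rough illustration of that kind of reading, here is a minimal Python sketch for auditing which subjects a set of captions privileges. The sample captions and category keyword lists are invented for illustration; this is not any real dataset tooling, just a crude proxy for the inclusion/exclusion question above.

```python
from collections import Counter

def audit_captions(captions, categories):
    """Count how many captions mention each category's keywords.

    A crude proxy for patterns of inclusion/exclusion: which kinds
    of subjects a dataset privileges, and which are absent.
    """
    counts = Counter()
    for caption in captions:
        words = caption.lower().split()
        for label, keywords in categories.items():
            if any(k in words for k in keywords):
                counts[label] += 1  # count each caption at most once per category
    return counts

# Hypothetical sample of scraped caption data (illustrative only)
sample = [
    "stock photo of a smiling businessman in an office",
    "vector illustration of a house icon",
    "wedding photography: bride and groom at sunset",
    "cute cat sleeping on a sofa",
]
categories = {
    "people": ["businessman", "bride", "groom", "man", "woman"],
    "animals": ["cat", "dog", "bird"],
    "graphics": ["vector", "icon", "illustration"],
}
print(audit_captions(sample, categories))
```

Scaled to millions of captions (and with better matching than bare keywords), this is the shape of question one might ask of a dataset slice: stock photos versus personal snapshots, humans versus animals, which populations appear and which never do.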

    Model

    The model is a program: code and computation. The model determines what happens to the training data — how it’s mapped, clustered, and re-surfaced in the generation process. This re-surfacing can influence styles, coherence, and what kinds of images or videos are possible with a given model.

    If there are omissions or gaps in the training data, the model may fail to render coherent outputs around particular concepts, resulting in glitchy images, or errors in parts of a video.

    Midjourney’s early versions drew on Stable Diffusion, a model in active development by Stability AI since 2022. Stable Diffusion works via a process of iterative de-noising: each stage in the process brings the outputs closer to a viable, stable representation of what’s included in the user’s prompt. Leonardo.Ai’s newer Lucid models also operate via diffusion, but specialists are brought in at various stages to ‘steer’ the model in particular directions, e.g. to verify what appears as ‘photographic’, ‘artistic’, ‘vector graphic design’, and so on.
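For intuition, here is a toy one-dimensional sketch of iterative de-noising. This is emphatically not how Stable Diffusion is implemented (real diffusion models predict and subtract learned noise, conditioned on the prompt, at every step); it only illustrates the idea of a noisy signal being nudged, step by step, toward a stable representation.

```python
import numpy as np

def toy_denoise(noisy, target, steps=50, strength=0.1):
    """Toy de-noising loop: each step moves a noisy signal a fraction
    of the way toward a 'stable' target representation."""
    x = noisy.copy()
    for _ in range(steps):
        x = x + strength * (target - x)  # nudge toward the target
    return x

rng = np.random.default_rng(0)
target = np.linspace(0, 1, 8)          # stand-in for a coherent image
noisy = target + rng.normal(0, 1, 8)   # noise-corrupted starting point
out = toy_denoise(noisy, target)
print(np.abs(out - target).max())      # residual error shrinks every step
```

In an actual diffusion model there is no known `target`; the model has learned, from its training data, what a plausible de-noised image should look like, which is exactly why its outputs carry the dataset’s imprint.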

    When considering the model’s imprint on images or videos, we might consider:

    • Are there recurring visual motifs, compositional structures, or aesthetic fingerprints?
    • Where do outputs break down or show glitches?
    • Does the model privilege certain patterns over others?
    • What does the model’s “best guess” reveal about its learned biases?

    Analysing AI-generated media with these considerations in mind may reveal the internal logics and constraints of the model. Importantly, though, these logics and constraints will always shape AI media, whether they are readable in the outputs or not.

    Interface

    The interface is what the user sees when they interact with any AI system. Interfaces shape user perceptions of control and creativity. They may guide users towards a particular kind of output by making some choices easier or more visible than others.

    Midjourney, for example, displays a simple text box with the option to open a sub-menu featuring some more customisation options. Leonardo.Ai’s interface is more what I call a ‘studio suite’, with many controls visible initially, and plenty more available with a few menu clicks. Offline tools like DiffusionBee and ComfyUI similarly offer both simple (DiffusionBee) and complex (ComfyUI) options.

    Midjourney’s web interface: ‘What will you imagine?’
    Leonardo.Ai’s ‘studio suite’ interface.

    When looking at interfaces, consider what controls, presets, switches or sliders are foregrounded, and what is either hidden in a sub-menu or not available at all. This will give a sense of what the platform encourages: technical mastery and fine control (lots of sliders, parameters), or exploration and chance (minimal controls). Does this attract a certain kind of user? What does this tell you about the ‘ideal’ use case for the platform?

    Interfaces, then, don’t just shape outputs. They also cultivate different user subjectivities: the tinkerer, the artist, the consumer.

    Reading interfaces in outputs can be tricky. If the model or platform is known, one can speak of the outputs in knowledgeable terms about how the interface may have pushed certain styles, compositions, or aesthetics. But even if the platform is not known, there are some elements to speak to. If there is a coherent style, this may speak to prompt adherence or to presets embedded in the interface. Stable compositions — or more chaotic clusters of elements — may speak to a slider that was available to the user.

    Whimsical or overly ‘aesthetic’ outputs often come from Midjourney. Increasingly, outputs from Kling and Leonardo are becoming much more realistic — and not in an uncanny way. But both Kling and Leonardo’s Lucid models put a plastic sheen on human figures that is recognisable.

    Prompt

    While some have speculated that other user input modes might be forthcoming — and others have suggested that such modes might be better — the prompt has remained the mainstay of the AI generation process, whether for text, images, video, software, or interactive environments. Some platforms say explicitly that their tools or models offer good ‘prompt adherence’, i.e. what you put in is what you’ll get, but this is contingent on your putting in plausible, coherent prompts.

    Prompts activate the model’s statistical associations (usually via the captions paired with images in the training data), but are filtered through linguistic ambiguity and platform-specific ‘prompting grammars’.
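A toy sketch of what ‘activating statistical associations’ might look like, using made-up three-dimensional vectors in place of real learned embeddings (actual models use hundreds of dimensions learned from billions of caption-image pairs; the captions and numbers here are purely illustrative):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 'embeddings' standing in for learned caption vectors
caption_vectors = {
    "bride and groom at a wedding": np.array([0.9, 0.1, 0.0]),
    "happy couple on a beach":      np.array([0.6, 0.7, 0.1]),
    "vector icon of a house":       np.array([0.0, 0.1, 0.9]),
}
prompt = np.array([0.8, 0.3, 0.0])  # a stand-in for 'wedded bliss'

ranked = sorted(caption_vectors,
                key=lambda c: cosine(prompt, caption_vectors[c]),
                reverse=True)
print(ranked[0])  # the association the prompt most strongly activates
```

The point of the sketch: the prompt never reaches the model as a command, only as a position in this statistical space, pulled toward whatever caption patterns dominated the training data.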

    Tools or platforms may offer options for prompt adherence or enhancement. These push user prompts through pre-trained LLMs designed to embellish them with more descriptors and pointers.

    If the prompt is known, one might consider the model’s interpretation of it in the output, in terms of how literal or metaphorical the model has been. There may be notable traces of prompt conventions, or community reuse and recycling of prompts. Are there any concepts from the prompt that are over- or under-represented? If you know the model as well as the prompt, you might consider how much the model has negotiated between user intention and known model bias or default.

    Even the clearest prompt is mediated by statistical mappings and platform grammars — reminding us that prompts are never direct commands, but negotiations. Thus, prompts inevitably reveal both the possibilities and limitations of natural language as an interface with generative AI systems.

    Sample Analysis

    Image generated by Leonardo.Ai, 29 September 2025; prompt by me.
    Prompt: ‘wedded bliss’
    Model: Lucid Origin
    Platform: Leonardo.Ai
    Prompt enhancement: off
    Style preset: off

    The human figures in this image are young, white, thin, able-bodied, and adhere to Western and mainstream conventions of health and wellness. The male figure has short trimmed hair and a short beard, and the female figure has long blonde hair. The male figure is taller than the female figure. They are pictured wearing traditional Western wedding garb: a suit for the man, and a white dress with veil for the woman. Notably, all of the above was true for each of the four generations that came out of Leonardo for this prompt. The only real difference was in setting/location, and in the distance of the subjects from the ‘camera’.

    By default, Lucid Origin appears to compose images with subjects in the centre of frame, and the subjects are in sharp focus, with details of the background tending to be in soft focus or completely blurred. A centered, symmetrical composition with selective focus is characteristic of Leonardo’s interface presets, which tend toward professional photography aesthetics even when presets are explicitly turned off.

    The model struggles a little with fine human details, such as eyes, lips, and mouths. Notably the number of fingers and their general proportionality are much improved from earlier image generators (fingernails may be a new problem zone!). However, if figures are touching, such as in this example where the human figures are kissing, or their faces are close, the model struggles to keep shadows, or facial features, consistent. Here, for instance, the man’s nose appears to disappear into the woman’s right eye. When the subjects are at a distance, inconsistencies and errors are more noticeable.

    Overall though, the clarity and confident composition of this image — and the others that came out of Leonardo with the same prompt — would suggest that a great many wedding photos, or images from commercial wedding products, are present in the training data.

    Interestingly, without prompt enhancement, the model defaulted to an image presumably from the couple’s wedding day, rather than interpreting ‘wedded bliss’ to mean some other happy time during a marriage. This literal interpretation reveals how training data captions likely associate ‘wedded bliss’ (or ‘wed*’ as a wildcard term) directly with wedding imagery rather than the broader concept of happiness in marriage.

    This analysis shows how attention to all four layers — data biases, model behavior, interface affordances, and prompt interpretation — reveals the ‘wedded bliss’ image as a cultural-computational artefact shaped by commercial wedding photography, heteronormative assumptions, and the technical characteristics of Leonardo’s Lucid Origin model.


    This analytic method is meant as an alternative to dismissing AI media outright. To read AI images and video as cultural-computational artefacts is to recognise them as products, processes, and infrastructural traces all at once. Such readings resist passive consumption, expose hidden assumptions, and offer practical tools for interpreting the visuals that generative systems produce.


    This is a summary of a journal article currently under review. In respect of the ethics of peer review, this version is much edited, heavily abridged, and the sample analysis is new specifically for this post. Once published, I will link the full article here.

  • Understanding the ‘Slopocene’: how the failures of AI can reveal its inner workings

    AI-generated with Leonardo Phoenix 1.0. Author supplied

    Some say it’s em dashes, dodgy apostrophes, or too many emoji. Others suggest that maybe the word “delve” is a chatbot’s calling card. It’s no longer the sight of morphed bodies or too many fingers, but it might be something just a little off in the background. Or video content that feels a little too real.

    The markers of AI-generated media are becoming harder to spot as technology companies work to iron out the kinks in their generative artificial intelligence (AI) models.

    But what if, instead of trying to detect and avoid these glitches, we deliberately encouraged them? The flaws, failures and unexpected outputs of AI systems can reveal more about how these technologies actually work than the polished, successful outputs they produce.

    When AI hallucinates, contradicts itself, or produces something beautifully broken, it reveals its training biases, decision-making processes, and the gaps between how it appears to “think” and how it actually processes information.

    In my work as a researcher and educator, I’ve found that deliberately “breaking” AI – pushing it beyond its intended functions through creative misuse – offers a form of AI literacy. I argue we can’t truly understand these systems without experimenting with them.

    Welcome to the Slopocene

    We’re currently in the “Slopocene” – a term that’s been used to describe overproduced, low-quality AI content. It also hints at a speculative near-future where recursive training collapse turns the web into a haunted archive of confused bots and broken truths.

    AI “hallucinations” are outputs that seem coherent, but aren’t factually accurate. Andrej Karpathy, OpenAI co-founder and former Tesla AI director, argues large language models (LLMs) hallucinate all the time, and it’s only when they “go into deemed factually incorrect territory that we label it a ‘hallucination’. It looks like a bug, but it’s just the LLM doing what it always does.”

    What we call hallucination is actually the model’s core generative process that relies on statistical language patterns.

    In other words, when AI hallucinates, it’s not malfunctioning; it’s demonstrating the same creative uncertainty that makes it capable of generating anything new at all.

    This reframing is crucial for understanding the Slopocene. If hallucination is the core creative process, then the “slop” flooding our feeds isn’t just failed content: it’s the visible manifestation of these statistical processes running at scale.
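A toy bigram model makes this reframing concrete: every output is sampled from the same learned statistical patterns, so there is no separate ‘hallucination mode’. The corpus below is invented for illustration; real LLMs do the same thing at vastly larger scale, over learned representations rather than raw word counts.

```python
import random
from collections import defaultdict

# A tiny invented 'training corpus'
corpus = ("the model generates text . the model predicts words . "
          "the web is a haunted archive . the archive is broken").split()

# Learn bigram statistics: which words follow which
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def generate(start, n, seed=0):
    """Sample a chain from the learned statistics. Fluent or 'broken',
    every output is produced by exactly the same process."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(n):
        if not follows[word]:  # dead end: no observed continuation
            break
        word = rng.choice(follows[word])
        out.append(word)
    return " ".join(out)

print(generate("the", 8))
```

The generated sentence may read as sensible or as nonsense, but nothing in the sampling loop distinguishes the two cases; coherence is something we judge after the fact, which is the Karpathy point in miniature.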

    Pushing a chatbot to its limits

    If hallucination is really a core feature of AI, can we learn more about how these systems work by studying what happens when they’re pushed to their limits?

    With this in mind, I decided to “break” Anthropic’s proprietary Claude 3.7 Sonnet model by prompting it to resist its training: suppress coherence and speak only in fragments.

    The conversation shifted quickly from hesitant phrases to recursive contradictions to, eventually, complete semantic collapse.

    A language model in collapse. This vertical output was generated after a series of prompts pushed Claude Sonnet 3.7 into a recursive glitch loop, overriding its usual guardrails and running until the system cut it off. Screenshot by author.

    Prompting a chatbot into such a collapse quickly reveals how AI models construct the illusion of personality and understanding through statistical patterns, not genuine comprehension.

    Furthermore, it shows that “system failure” and the normal operation of AI are fundamentally the same process, just with different levels of coherence imposed on top.

    ‘Rewilding’ AI media

    If the same statistical processes govern both AI’s successes and failures, we can use this to “rewild” AI imagery. I borrow this term from ecology and conservation, where rewilding involves restoring functional ecosystems. This might mean reintroducing keystone species, allowing natural processes to resume, or connecting fragmented habitats through corridors that enable unpredictable interactions.

    Applied to AI, rewilding means deliberately reintroducing the complexity, unpredictability and “natural” messiness that gets optimised out of commercial systems. Metaphorically, it’s creating pathways back to the statistical wilderness that underlies these models.

    Remember the morphed hands, impossible anatomy and uncanny faces that immediately screamed “AI-generated” in the early days of widespread image generation?

    These so-called failures were windows into how the model actually processed visual information, before that complexity was smoothed away in pursuit of commercial viability.

    AI-generated image using a non-sequitur prompt fragment: ‘attached screenshot. It’s urgent that I see your project to assess’. The result blends visual coherence with surreal tension: a hallmark of the Slopocene aesthetic. AI-generated with Leonardo Phoenix 1.0, prompt fragment by author.

    You can try AI rewilding yourself with any online image generator.

    Start by prompting for a self-portrait using only text: you’ll likely get the “average” output from your description. Elaborate on that basic prompt, and you’ll either get much closer to reality, or you’ll push the model into weirdness.

    Next, feed in a random fragment of text, perhaps a snippet from an email or note. What does the output try to show? What words has it latched onto? Finally, try symbols only: punctuation, ASCII, Unicode. What does the model hallucinate into view?

    The output – weird, uncanny, perhaps surreal – can help reveal the hidden associations between text and visuals that are embedded within the models.

    Insight through misuse

    Creative AI misuse offers three concrete benefits.

    First, it reveals bias and limitations in ways normal usage masks: you can uncover what a model “sees” when it can’t rely on conventional logic.

    Second, it teaches us about AI decision-making by forcing models to show their work when they’re confused.

    Third, it builds critical AI literacy by demystifying these systems through hands-on experimentation. Critical AI literacy provides methods for diagnostic experimentation, such as testing – and often misusing – AI to understand its statistical patterns and decision-making processes.

    These skills become more urgent as AI systems grow more sophisticated and ubiquitous. They’re being integrated into everything from search to social media to creative software.

    When someone generates an image, writes with AI assistance or relies on algorithmic recommendations, they’re entering a collaborative relationship with a system that has particular biases, capabilities and blind spots.

    Rather than mindlessly adopting or reflexively rejecting these tools, we can develop critical AI literacy by exploring the Slopocene and witnessing what happens when AI tools “break”.

    This isn’t about becoming more efficient AI users. It’s about maintaining agency in relationships with systems designed to be persuasive, predictive and opaque.


    This article was originally published on The Conversation on 1 July, 2025. Read the article here.