The Clockwork Penguin

Daniel Binns is a media theorist and filmmaker tinkering with the weird edges of technology, storytelling, and screen culture. He is the author of Material Media-Making in the Digital Age and currently writes about posthuman poetics, glitchy machines, and speculative media worlds.

Tag: culture

  • How I Read AI Images

    Image generated by Adobe Firefly, 3 September 2024; prompt unknown.

AI-generated media sit somewhere between representational image and posthuman artefact: they are representations of data rather than of reality. This ambiguous nature suggests that we need methods that consider these images not just as cultural objects, but also as products of the systems that made them. I am following here in the wake of pioneers who have bravely broken ground in this space.

For Friedrich Kittler and Jussi Parikka, the technological, infrastructural and ecological dimensions of media are just as important as their content, if not more so. They extend Marshall McLuhan’s notion that ‘the medium is the message’ beyond the affordances of a given media type, form or channel, into the very mechanisms and processes that shape content before and during its production or transmission.

I take these ideas and extend them to the outputs themselves: a media-materialist analysis. Rather than dismissing AI media as ‘slop’, this method treats them as cultural-computational artefacts: assemblages compiled from layered systems, which I break down into data, model, interface, and prompt. The method contends that each step of the generative process leaves traces in visual outputs, and that we might be able to train ourselves to read them.

    Data

There is no media generation without training data. These datasets can be so vast as to feel unknowable, or so narrow that they feel constricting. LAION-5B, for example, the original dataset used to train Stable Diffusion, contains some 5.85 billion image-text pairs. Technically, you could train a model on a handful of images, or even one, or even none, but such a model would be ‘remembering’ rather than ‘generating’. Video models tend to use comparatively smaller datasets, such as PANDA-70M, which contains over 70 million video-caption pairs: about 167,000 hours of footage.
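A quick back-of-envelope calculation makes the PANDA-70M scale concrete (the figures are the ones quoted above, not taken from the dataset’s own documentation):

```python
# Rough check of the PANDA-70M figures quoted above:
# ~70 million video-caption pairs totalling ~167,000 hours of footage.
clips = 70_000_000
hours = 167_000

seconds_total = hours * 3600
avg_clip_seconds = seconds_total / clips

print(f"{avg_clip_seconds:.1f} seconds per clip on average")
# 167,000 h * 3600 = 601,200,000 s; / 70,000,000 clips ≈ 8.6 s each
```

In other words, video models are typically trained on very short clips, which is worth bearing in mind when reading their outputs.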

    Training data for AI models is also hugely contentious, given that many proprietary tools are trained on data scraped from the open internet. Thus, when considering datasets, it’s important to ask what kinds of images and subjects are privileged. Social media posts? Stock photos? Vector graphics? Humans? Animals? Are diverse populations represented? Such patterns of inclusion/exclusion might reveal something about the dataset design, and the motivations of those who put it together.

A ‘slice’ of the LAION-Aesthetics dataset. The tool I used for this can be found/forked on GitHub.

    Some datasets are human-curated (e.g. COCO, ImageNet), and others are algorithmically scraped and compiled (e.g. LAION-Aesthetics). There may be readable differences in how these datasets shape images. You might consider:

    • Are the images coherent? Chaotic/glitched?
    • What kinds of prompts result in clearer, cleaner outputs, versus morphed or garbled material?

    The dataset is the first layer where cultural logics, assumptions, patterns of normativity or exclusion are encoded in the process of media generation. So: what can you read in an image or video about what training choices have been made?

    Model

    The model is a program: code and computation. The model determines what happens to the training data — how it’s mapped, clustered, and re-surfaced in the generation process. This re-surfacing can influence styles, coherence, and what kinds of images or videos are possible with a given model.

    If there are omissions or gaps in the training data, the model may fail to render coherent outputs around particular concepts, resulting in glitchy images, or errors in parts of a video.

Midjourney’s early versions reportedly drew on Stable Diffusion, a model in active development by Stability AI since 2022. Stable Diffusion works via a process of iterative de-noising: each stage in the process brings the output closer to a viable, stable representation of what is described in the user’s prompt. Leonardo.Ai’s newer Lucid models also operate via diffusion, but specialists are brought in at various stages to ‘steer’ the model in particular directions, e.g. to verify what appears as ‘photographic’, ‘artistic’, ‘vector graphic design’, and so on.
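The iterative de-noising idea can be sketched in a few lines. This is a toy, not Stable Diffusion’s actual sampler (which operates in a learned latent space, with a trained network predicting the noise to remove at each step): here a known `target` vector stands in for what the network would predict, purely to show how each step moves a noisy start closer to a stable output.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy iterative de-noising. Start from pure noise; at each step,
    nudge the sample a fraction of the way toward `target` while the
    injected noise shrinks to zero. In a real diffusion model the
    direction of each step comes from a trained noise-prediction
    network, not from a known target."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]          # start: pure noise
    for t in range(steps):
        remaining = 1 - (t + 1) / steps            # noise left to inject
        x = [xi + 0.2 * (ti - xi) + rng.gauss(0, 0.05 * remaining)
             for xi, ti in zip(x, target)]
    return x

target = [0.3, -1.2, 0.8]   # stands in for the "clean" output the prompt describes
print(toy_denoise(target))  # each value ends up very close to the target
```

The point of the sketch is the shape of the process: early steps are dominated by noise, late steps by small corrections, which is why partially de-noised previews in some interfaces look like blurry clouds that gradually resolve.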

    When considering the model’s imprint on images or videos, we might consider:

    • Are there recurring visual motifs, compositional structures, or aesthetic fingerprints?
    • Where do outputs break down or show glitches?
    • Does the model privilege certain patterns over others?
    • What does the model’s “best guess” reveal about its learned biases?

    Analysing AI-generated media with these considerations in mind may reveal the internal logics and constraints of the model. Importantly, though, these logics and constraints will always shape AI media, whether they are readable in the outputs or not.

    Interface

    The interface is what the user sees when they interact with any AI system. Interfaces shape user perceptions of control and creativity. They may guide users towards a particular kind of output by making some choices easier or more visible than others.

Midjourney, for example, displays a simple text box with the option to open a sub-menu featuring some more customisation options. Leonardo.Ai’s interface is more of what I call a ‘studio suite’, with many controls visible initially, and plenty more available with a few menu clicks. Offline tools span both ends of this spectrum, from the simple (DiffusionBee) to the complex (ComfyUI).

    Midjourney’s web interface: ‘What will you imagine?’
    Leonardo.Ai’s ‘studio suite’ interface.

    When looking at interfaces, consider what controls, presets, switches or sliders are foregrounded, and what is either hidden in a sub-menu or not available at all. This will give a sense of what the platform encourages: technical mastery and fine control (lots of sliders, parameters), or exploration and chance (minimal controls). Does this attract a certain kind of user? What does this tell you about the ‘ideal’ use case for the platform?

    Interfaces, then, don’t just shape outputs. They also cultivate different user subjectivities: the tinkerer, the artist, the consumer.

Reading interfaces in outputs can be tricky. If the model or platform is known, one can speak knowledgeably about how the interface may have pushed certain styles, compositions, or aesthetics. Even if the platform is not known, there are still elements to consider. A coherent style may point to strong prompt adherence or to presets embedded in the interface. Stable compositions — or more chaotic clusters of elements — may indicate a slider that was available to the user.

    Whimsical or overly ‘aesthetic’ outputs often come from Midjourney. Increasingly, outputs from Kling and Leonardo are becoming much more realistic — and not in an uncanny way. But both Kling and Leonardo’s Lucid models put a plastic sheen on human figures that is recognisable.

    Prompt

While some have speculated that other user input modes might be forthcoming — and others have suggested that such modes might be better — the prompt has remained the mainstay of the AI generation process, whether for text, image, video, software, or interactive environments. Some platforms state explicitly that their tools or models offer good ‘prompt adherence’, i.e. what you put in is what you’ll get, but this is contingent on putting in plausible, coherent prompts.

Prompts activate the model’s statistical associations (usually learned from the captions paired with images in the training data), but are filtered through linguistic ambiguity and platform-specific ‘prompting grammars’.
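One way to picture these ‘statistical associations’ is as similarity between a prompt and the captions a model was trained on. The sketch below uses crude bag-of-words cosine similarity rather than the learned text embeddings (e.g. CLIP’s) that real pipelines use, and the captions are invented for illustration:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_captions(prompt, captions):
    """Rank training captions by similarity to the prompt -- a crude
    stand-in for how a prompt activates regions of a model's learned
    caption space."""
    p = Counter(prompt.lower().split())
    scored = [(cosine(p, Counter(c.lower().split())), c) for c in captions]
    return sorted(scored, reverse=True)

# Hypothetical captions, invented for illustration only.
captions = [
    "wedding couple kissing bride and groom",
    "happy couple on honeymoon at the beach",
    "stock photo of an office meeting",
]
for score, cap in nearest_captions("wedded bliss wedding couple", captions):
    print(f"{score:.2f}  {cap}")   # highest-scoring caption first
```

Even this toy shows why phrasing matters: a prompt only ‘reaches’ the parts of the training distribution whose captions share its vocabulary, which is one source of the prompting grammars mentioned above.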

Tools or platforms may offer options for prompt adherence or enhancement. These push user prompts through pre-trained LLMs designed to embellish them with additional descriptors and pointers.

    If the prompt is known, one might consider the model’s interpretation of it in the output, in terms of how literal or metaphorical the model has been. There may be notable traces of prompt conventions, or community reuse and recycling of prompts. Are there any concepts from the prompt that are over- or under-represented? If you know the model as well as the prompt, you might consider how much the model has negotiated between user intention and known model bias or default.

    Even the clearest prompt is mediated by statistical mappings and platform grammars — reminding us that prompts are never direct commands, but negotiations. Thus, prompts inevitably reveal both the possibilities and limitations of natural language as an interface with generative AI systems.

    Sample Analysis

    Image generated by Leonardo.Ai, 29 September 2025; prompt by me.
Prompt: ‘wedded bliss’
Model: Lucid Origin
Platform: Leonardo.Ai
Prompt enhancement: off
Style preset: off

The human figures in this image are young, white, thin, able-bodied, and adhere to Western and mainstream conventions of health and wellness. The male figure has short trimmed hair and a short beard, and the female figure has long blonde hair. The male figure is taller than the female figure. They are pictured wearing traditional Western wedding garb: a suit for the man, and a white dress with veil for the woman. Notably, all of the above was true for each of the four generations that came out of Leonardo for this prompt. The only real difference was in setting/location, and in the distance of the subjects from the ‘camera’.

By default, Lucid Origin appears to compose images with subjects in the centre of frame and in sharp focus, with details of the background tending to be in soft focus or completely blurred. This centred, symmetrical composition with selective focus appears to be a default of the model itself, tending toward professional photography aesthetics even when style presets are explicitly turned off.

The model struggles a little with fine human details, such as eyes, lips, and mouths. Notably, the number of fingers and their general proportionality are much improved over earlier image generators (fingernails may be a new problem zone!). However, if figures are touching, as in this example where the human figures are kissing, or their faces are close, the model struggles to keep shadows and facial features consistent. Here, for instance, the man’s nose appears to disappear into the woman’s right eye. When the subjects are at a distance, inconsistencies and errors are more noticeable.

    Overall though, the clarity and confident composition of this image — and the others that came out of Leonardo with the same prompt — would suggest that a great many wedding photos, or images from commercial wedding products, are present in the training data.

Interestingly, without prompt enhancement, the model defaulted to an image presumably from the couple’s wedding day, as opposed to interpreting ‘wedded bliss’ as some other happy time during a marriage. The model’s literal interpretation here, showing the wedding day itself rather than any other moment of marital happiness, reveals how training data captions likely associate ‘wedded bliss’ (or ‘wed*’ as a wildcard term) directly with wedding imagery rather than with the broader concept of happiness in marriage.

    This analysis shows how attention to all four layers — data biases, model behavior, interface affordances, and prompt interpretation — reveals the ‘wedded bliss’ image as a cultural-computational artefact shaped by commercial wedding photography, heteronormative assumptions, and the technical characteristics of Leonardo’s Lucid Origin model.


    This analytic method is meant as an alternative to dismissing AI media outright. To read AI images and video as cultural-computational artefacts is to recognise them as products, processes, and infrastructural traces all at once. Such readings resist passive consumption, expose hidden assumptions, and offer practical tools for interpreting the visuals that generative systems produce.


    This is a summary of a journal article currently under review. In respect of the ethics of peer review, this version is much edited, heavily abridged, and the sample analysis is new specifically for this post. Once published, I will link the full article here.

  • This algorithmic moment

    Generated by Leonardo AI; prompts by me.

    So much of what I’m being fed at the moment concerns the recent wave of AI. While we are seeing something of a plateauing of the hype cycle, I think (/hope), it’s still very present as an issue, a question, an opportunity, a hope, a fear, a concept. I’ll resist my usual impulse to historicise this last year or two of innovation within the contexts of AI research, which for decades was popularly mocked and institutionally underfunded; I’ll also resist the even stronger impulse to look at AI within the even broader milieu of technology, history, media, and society, which is, apparently, my actual day job.

What I’ll do instead is drop the phrase algorithmic moment, which is what I’ve been trying to explore, define, and work through over the last 18 months. I’m heading back to work next week after an extended period of leave, so this seems as good a way as any of getting my head back into some of the research I left to one side for a while.

The algorithmic moment is what we’re in right now. It’s the current AI bubble, hype cycle, growth spurt, whatever you define this wave as (some have dubbed it the AI spring or boom, to distinguish it from various AI winters over the last century [1]). In trying to bracket it off with concrete times, I’ve settled more or less on the emergence of the GPT-3 beta in 2020. Of course OpenAI and other AI innovations predated this, but it was GPT-3 and its children ChatGPT and DALL-E 2 that really propelled discussions of AI and its possibilities and challenges into the mainstream.

This also means that much of this moment is swept up with the COVID pandemic. While online life had bled into the real world in interesting ways pre-2020, it was really that year, during urban lockdowns, family Zooms, working from home, and a deeply felt global trauma, that online and off felt one and the same. AI innovators capitalised on the moment, seizing capital (financial and cultural) in order to promise a remote revolution built on AI and its now-shunned siblings in discourse, web3 and NFTs.

    How AI plugs into the web as a system is a further consideration — prior to this current boom, AI datasets in research were often closed. But OpenAI and its contemporaries used the internet itself as their dataset. All of humanity’s knowledge, writing, ideas, artistic output, fears, hopes, dreams, scraped and plugged into an algorithm, to then be analysed, searched, filtered, reworked at will by anyone.

The downfall of FTX and the trial of Sam Bankman-Fried more or less marked the death knell of NFTs as the Next Big Thing, if not web3 as a broader notion to be deployed across open-source, federated applications. And as NFTs slowly left the tech conversation, as that hype cycle started falling, the AI boom filled the void, such that one can hardly log on to a tech news site or half of the most popular Substacks without seeing a diatribe or puff piece (not unlike this very blog post) about the latest development.

ChatGPT has become a hit productivity tool, as well as a boon to students, authors, copywriters and content creators the world over. AI is a headache for many teachers and academics, many of whom fail not only to grasp its actual power and operations, but also how to usefully and constructively implement the technology in class activities and assessment. DALL-E, Midjourney and the like remain controversial phenomena in art and creative communities, where some hail them as invaluable aids, and others debate their ethics and value.

As with all previous revolutions, the dust will settle on that of AI. The research and innovation will continue as it always has, but out of the limelight and away from the headlines. It feels currently like we cannot keep up, that it’s all happening too fast, that if only we slowed down and thought about things, we could try to understand how we’ll be impacted, how everything might change. At the risk of historicising, exactly like I said I wouldn’t, people thought the same of the printing press, the aeroplane, and the computer. In 2002, Andrew Murphie and John Potts were trying to capture the flux and flow and tension and release of culture and technology. They were grappling in particular with the widespread adoption of the internet, and how to bring that into line with other systems and theories of community and communication. Jean-François Lyotard had said that new communications networks functioned largely on “language games” between machines and humans. Building on this idea, Murphie and Potts suggested that the information economy “needs us to make unexpected ‘moves’ in these games or it will wind down through a kind of natural attrition. [The information economy] feeds on new patterns and in the process sets up a kind of freedom of movement within it in order to gain access to the new.” [2]

    The information economy has given way, now, to the platform economy. It might be easy, then, to think that the internet is dead and decaying or, at least, kind of withering or atrophying. Similarly, it can be even easier to think that in this locked-down, walled-off, platform- and app-based existence where online and offline are more or less congruent, we are without control. I’ve been dropping breadcrumbs over these last few posts as to how we might resist in some small way, if not to the detriment of the system, then at least to the benefit of our own mental states; and I hope to keep doing this in future posts (and over on Mastodon).

For me, the above thoughts have been gestating for a long time, but they remain immature, unpolished, unfiltered, which, in its own way, is a form of resistance to the popular image of the opaque black box of algorithmic systems. I am still trying to figure out what to do with them; whether to develop them further into a series of academic articles or a monograph, to just keep posting random bits and bobs here on this site, or to seed them into a creative piece, be it a film, book, or something else entirely. Maybe a little of everything, but I’m in no rush.

As a postscript, I’m also publishing this here to resist another system, that of academic publishing, which is monolithic, glacial, frustrating, and usually hidden behind a paywall for a privileged few. Anyway, I’m not expecting anyone to read this, much less use or cite it in their work, but better it be here if someone needs it than locked away.

    As a bookend for the AI-generated image that opened the post, I asked Bard for “a cool sign-off for my blog posts about technology, history, and culture” and it offered the following, so here you go…

    Signing off before the robots take over. (Just kidding… maybe.)


    Notes

    1. For an excellent history of AI up to around 1990, I can’t recommend enough AI: The Tumultuous History of the Search for Artificial Intelligence by Daniel Crevier. Crevier has made the book available for download via ResearchGate. ↩︎
    2. Murphie, Andrew, and John Potts. 2003. Culture and Technology. London: Macmillan Education UK, p. 208. https://doi.org/10.1007/978-1-137-08938-0. ↩︎
  • Critics and creation

    Photo by Leah Newhouse on Pexels.

    I started reading this interview this morning, between Anne Helen Peterson and Betsy Gaines Quammen. I still haven’t finished reading, despite being utterly fascinated, but even before I got to the guts of the interview, I was struck by a thought:

    In the algorithmised world, the creator is the critic.

    This thought is not necessarily happening in isolation; I’ve been thinking about ‘algorithmic culture’ for a couple of years, trying to order these thoughts into academic writing, or even creative writing. But this thought feels like a step in the right direction, even if I’ve no idea what the final output should or will be. Let’s scribble out some notes…

    If there’s someone whose work we enjoy, they’ll probably have an online presence — a blog or social media feed we can follow — where they’ll share what they like.

    It’s an organic kind of culture — but it’s one where the art and vocation of the critic continues to be minimised.

    This — and associated phenomena — is the subject of a whole bunch of recent and upcoming books (including this one, which is at the top of my to-read pile for the next month): a kind of culture where the all-powerful algorithm becomes the sole arbiter of taste, but I also think there is pressure on creatives to be their own kind of critical and cultural hub.

On the inverse, what we may traditionally have called critics — modern-day social media commentators, influencers, your Booktubers or Booktokkers, your video essayists and their ilk — now also feel pressure to create. This pressure comes from their followers and acolytes, but also from random people who encounter them online and say something like “if you know so much, why don’t you just do it yourself?”

    Some critics will leap at the opportunity and they absolutely should — we are hearing from diverse voices that wouldn’t otherwise have thought to try.

    But some should leave the creation to others — not because they’re not worth hearing from, they absolutely are — but because their value, their creativity, their strength, lies in how they shape language, images, metaphor, around the work of others. They don’t realise — as I didn’t for a long time — that being a critic is a vocation, a life’s work, a real skill. Look at any longer-form piece in the London Review of Books or The New Inquiry and it becomes very clear how valuable this work is.

    I’ve always loved the term critic, particularly cultural critic, or commentator, or essayist… they always seemed like wonderful archaic terms that don’t belong in the modern, fragmented, divided, confused world. But to call oneself a critic or essayist, to own that, and only that, is to defy the norms of culture; to refuse the ‘pillars’ of novel, film, press/journalism, and to stand to one side, giving much-needed perspective to how these archaic forms define, reflect, and challenge society.