The Clockwork Penguin

Daniel Binns is a media theorist and filmmaker tinkering with the weird edges of technology, storytelling, and screen culture. He is the author of Material Media-Making in the Digital Age and currently writes about posthuman poetics, glitchy machines, and speculative media worlds.

Tag: genAI

  • Grotesque fascination

    A few weeks back, some colleagues and I were invited to share some new thoughts and ideas on the theme of ‘ecomedia’, as a lovely and unconventional way to launch Simon R. Troon’s newest monograph, Cinematic Encounters with Disaster: Realisms for the Anthropocene. Here’s what I presented: a few scattered scribblings on environmental imaginaries as mediated through AI.


    Grotesque Fascination:

    Reflections from my weekender in the uncanny valley

    In February 2024, OpenAI announced their video generation tool Sora. In the technical paper that accompanied this announcement, they referred to Sora as a ‘world simulator’. Not just Sora, but also DALL-E, Runway, and Midjourney: all of these AI tools further blur and problematise the lines between the real and the virtual. Image and video generation tools re-purpose, re-contextualise, and regurgitate how humans perceive their environments and those around them. These tools offer a carnival mirror’s reflection of what we privilege and prioritise, and what we are prejudiced against, in our collective imaginations. In particular today, I want to talk a little bit about how generative AI tools might offer up new ways to relate to nature, and how they might also call into question the ways that we’ve visualised our environment to date.

    AI media generators work from datasets comprising billions of images, along with text captions and sometimes video samples; the model maps all of this information, using advanced mathematics, into a hyper-dimensional representation often called the latent space, and a neural network (commonly a U-Net) learns to remove noise from images within that space. An image of random noise is then generated and fed through the model, along with a text prompt from the user, and the model gradually de-noises the image in a way it judges appropriate to the given prompt.
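
    For the technically curious, here is a minimal sketch of that text-to-image process using the open-source diffusers library; the model name, step count, and prompt are my own illustrative choices, not anything specific to the tools named above.

    ```python
    # A minimal sketch of the text-to-image process described above, using the
    # open-source diffusers library (pip install diffusers transformers torch).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a temperate rainforest at dawn, mist rising between the trees"

    # Under the hood, the pipeline starts from random noise in the latent space
    # and, guided by the text prompt, removes a little of that noise at each step.
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save("rainforest.png")
    ```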

    In these datasets, there are images of people, of animals, of built and natural environments, of objects and everyday items. These models can generate scenes of the natural world very convincingly. These generations remind me of the open virtual worlds in video games like Skyrim or Horizon Zero Dawn: there is a real, visceral sense of connection to these worlds as you move through them. In a similar way, when you’re playing with tools like Leonardo or Midjourney, there can often be visceral, embodied reactions to the images or media that they generate: Shane Denson has written about this in terms of “sublime awe” and “abject cringe”. Like video games, too, AI media generators allow us to observe worlds that we may never see in person. Indeed, some of the landscapes we generate may be completely alien or biologically impossible, at least on this planet, opening our eyes to different ecological possibilities or environmental arrangements. Visualising or imagining how ecosystems might develop is one way of potentially increasing awareness of those that are remote, unexplored or endangered; we may also be able to imagine how the real natural world might be affected by our actions in the distant future. These alien visions might also, I suppose, prepare us for encountering different ecosystems and modes of life and biology on other worlds.

    It’s worth considering, though, how this re-visualisation, virtualisation, and re-constitution of environments, realistic or not, might change, evolve or hinder our collective mental image of ‘Nature’, or our capacity to imagine what constitutes it. The experience of generating ecosystems and environments may increase appreciation for our own very real, very tangible natural world and the impacts that we’re having on it, but as with all imagined or technically mediated processes, there is always a risk of disconnecting people from that same very real, very tangible world around them. They may well prefer the illusion; they may prefer some kind of perfection, some kind of banal veneer that they can have no real engagement with or impact on. And it’s easy to ignore the staggering environmental impacts of the technology companies pushing these tools when you’re engrossed in an ecosystem of apps and not of animals.

    In previous work, I proposed the concept of virtual environmental attunement, a kind of hyper-awareness of nature that might be enabled or accelerated by virtual worlds or digital experiences. I’m now tempted to revisit that theory in terms of asking how AI tools problematise that possibility. Can we use these tools to materialise or make perceptible something that is intangible, virtual, immaterial? What do we gain or lose when we conceive or imagine, rather than encounter and experience?

    Machine vision puts into sharp relief the limitations of humanity’s perception of the world. But for me there remains a certain romance and beauty and intrigue — a grotesque fascination, if you like — to living in the uncanny valley at the moment, and it’s somewhere that I do want to stay a little bit longer. This is despite the omnipresent feeling of ickiness and uncertainty when playing with these tools, while the licensing of the datasets that they’re trained on remains unclear. For now, though, I’m trying to figure out how connecting with the machine-mind might give some shape or sensation to a broader feeling of dis-connection.

    I also want to understand how my own ideas and my capacity to imagine might be extended or supplemented by these tools, changing the way I relate to myself and the world around me.

  • Conjuring to a brief

    Generated by me with Leonardo.Ai.

    This semester I’m running a Media studio called ‘Augmenting Creativity’. The basic goal is to develop best practices for working with generative AI tools not just in creative workflows, but as part of university assignments, academic research, and everyday routines. My motivation, or philosophy, for this studio is that so much attention is being focused on the outputs of tools like Midjourney and Leonardo.Ai (as well as the outputs of textbots like ChatGPT); what I guess I’m interested in is exploring more precisely where in workflows, jobs, and daily life these tools might actually be helpful.

    In class last week we held a Leonardo.Ai hackathon, inspired by one of the workshops that was run at the Re/Framing AI event I convened a month or so ago. Leonardo.Ai generously donated some credits for students to play around with the platform. Students were given a brief around what they should try to generate:

    • an AI Self-Portrait (using text only; no image guidance!)
    • three images to envision the studio as a whole (one conceptual, a poster, and a social media tile)
    • three square icons to represent one task in their daily workflow (home, work, or study-related)

    For the hackathon proper, students were only able to adjust the text prompt and the Preset Style; all other controls had to remain unchanged, including the Model (Phoenix), Generation Mode (Fast), and Prompt Enhance (off).
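
    Sketched as a config, purely for illustration, the constraints looked something like the following; these field names are my own assumptions, not Leonardo.Ai’s actual parameters.

    ```python
    # Illustrative only: the hackathon's fixed settings sketched as a config.
    # Field names are assumptions, not Leonardo.Ai's actual API parameters.
    HACKATHON_SETTINGS = {
        "model": "Phoenix",         # fixed
        "generation_mode": "Fast",  # fixed
        "prompt_enhance": False,    # fixed
        "preset_style": None,       # students could change this
        "prompt": "",               # students could change this
    }
    ```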

    Students were curious and excited, but they also ran into challenges straight away with the underlying mechanics of image generators; they had to play around with word choice in prompts to get close to their desired results. The biases and constraints of the Phoenix model quickly became apparent as the students tested its limitations. For some students this was more cosmetic, such as requesting that Leonardo.Ai generate a face with no jewellery or facial hair. This produced mixed results: explicitly negative prompts sometimes seemed to encourage the model to produce exactly what wasn’t wanted. Other students encountered difficulties around race or gender presentation: the model struggles with nuances in race (e.g. mixed-race or specific racial subsets), and also often depicts sexualised presentations of female-presenting people (male-presenting too, though much less frequently).

    Last week’s session proved a solid test of Leonardo.Ai’s utility and capacity in generating assets and content (we sent some general feedback to Leonardo.Ai on platform usability and potential improvements), but it was also useful for figuring out how and where the students might use the tool in their forthcoming creative projects.

    This week we’ve spent a little time on the status of AI imagery as art, some of the ethical considerations around generative AI, and where some of the supposed impacts of these tools may most keenly be felt. In class this morning, the students were challenged to deliver lightning talks on recent AI news, developing their presentation and media analysis skills. From here, we move a little more deeply into where creativity lies in the AI process, and how human/machine collaboration might produce innovative content. The best bit, as always, will be seeing where the students go with these ideas and concepts.

  • Generatainment 101

    generated using Leonardo.Ai

    In putting together a few bits and bobs for academic work on generative AI and creativity, I’m poking around in all sorts of strange places, where all manner of undead monsters lurk.

    The notion of AI-generated entertainment is not a new one, but the first recent start-up I found in the space was Hypercinema. The copy on the website is typically vague, but I think the company is attempting to build apps for venues like stores, museums and theme parks that place visitors within virtual experiences or branded narratives.

    After noodling about on Hypercinema’s LinkedIn and X pages, it wasn’t long before I then found Fable Studios and their Showrunner project; from there it was but a hop, skip and a jump to Showrunner’s parent concept, The Simulation.

    Sim Francisco; what I’m assuming is an artist’s rendition. Sourced from The Simulation on X.

    The Simulation is a project being developed by Fable Studios, a group of techies and storytellers who are interested in a seamless blend of their respective knowledges. To quote their recent announcement: “We believe the future is a mix of game & movie. Simulations powering 1000s of Truman Shows populated by interactive AI characters.” I realise this is still all guff. From what I can tell, The Simulation is a sandbox virtual world populated by a huge variety of AI characters. The idea is that you can guide the AI characters, influencing their lives and decisions; you can then also zoom into a particular character or setting, then ask The Simulation to generate an ‘entertainment’ for you of a particular length, e.g. a 20-minute episode.

    In 2023, Fable Studios released a research paper on their initial work on ‘showrunner agents in multi-agent simulations’. To date, one of the largest issues with AI-generated narratives is that character and plot logics nearly always fall apart; machine learning systems cannot maintain consistency over prolonged story arcs. In conventional TV/film production, this sort of continuity is the responsibility of the director, often in conjunction with the continuity team and first assistant director. But genAI systems are by and large predictive content machines: they examine the context of a given moment and then build the next moment from there, then repeat, then repeat. This process isn’t driven by ‘continuity’ in a traditional cinematic or even narrative sense, but by the cold logic of computation:

    “[A] computer running a program, if left powered up, can sit in a loop and run forever, never losing energy or enthusiasm. It’s a metamechanical machine that never experiences surface friction and is never subject to the forces of gravity like a real mechanical machine – so it runs in complete perfection.”

    John Maeda, How to Speak Machine, p3

    The ML system will repeat the same process over and over again, but it does not reframe its entire context from moment to moment in the way that a human storyteller might; it simply starts again with the next moment, and then the next. This is why generating video with ML tools is so difficult (at least, it still is at the time of writing).
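
    A toy sketch makes the problem concrete: each step only sees a bounded window of recent moments, so anything older, including long-arc plot logic, simply falls out of view. The generate_next_moment function here is a purely hypothetical stand-in for any generative model call.

    ```python
    # A toy version of the "predict the next moment, then repeat" loop described
    # above. Each step sees only a bounded window of recent context.
    CONTEXT_WINDOW = 5  # how many recent moments the "model" can see

    def generate_next_moment(recent_moments: list[str]) -> str:
        # Placeholder for a real model call (e.g. an LLM completion).
        return f"A moment that follows on from: {recent_moments[-1]}"

    story = ["Scene 1: two friends meet at a co-working space."]
    for _ in range(10):
        context = story[-CONTEXT_WINDOW:]            # earlier moments are forgotten
        story.append(generate_next_moment(context))  # no memory of what was dropped

    print("\n".join(story))
    ```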

    What if, though, you made a video game with a set of characters who have their own motivations and relationships, and you just let life continue, let characters grow, as per a set of rules? Many sandbox or simulation games can be described in this way. There are also some open-world role-playing games that play out against what feels like a simulated, continuous world that exists with or without the player character. The player character, in this latter example, becomes the focaliser, the lens through which action is framed, or from which the narrative emerges. And in the case of simulators or city-builders, it’s the experience of planning out your little world, the embedding of your gameplay choices into the lives of virtual people (as either biography or extended history), that constitutes the appeal. What The Simulation proposes is similar to both these experiences, but at scale.

    A selection of apparently-upcoming offerings from Showrunner. I believe these are meant to have been generated in/by The Simulation? Sourced from The Simulation on X.

    Sim Francisco is the first megacity that The Simulation has built, and they’re presently working on Neo-Tokyo. These virtual cities are the storyworlds within which you can, supposedly, find your stories. AI creators can jump into these cities, find characters to influence, and then prompt another AI system to capture the ensuing narrative. Again, this is all wild speculation, and the specific mechanics, beyond a couple of vague in-experience clips, are a mystery.

    As is my wont, I’m ever reminded of precedents, not least of which were the types of games discussed above: SimCity, The Sims, The Movies, even back to the old classic Microsoft 3D Movie Maker, but also Skyrim, Grand Theft Auto, Cyberpunk 2077. All of these offer some kind of open-world sandbox element that allows the player to craft their own experience. Elements of these examples seem like they might almost be directly ported to The Simulation: influencing AI characters as in The Sims, or directing them specifically as in 3D Movie Maker? Maybe it’ll be a little less direct, where you simply arrange certain elements and watch the result, like in The Movies. But rather than just the resulting ‘entertainments’, will The Simulation allow users to embody player characters? That way they might then be able to interact with AI characters in single-player, or both AIs and other users in a kind of MMO experience (Fable considers The Simulation to be a kind of Westworld). If this kind of gameplay is combined with graphics like those we’re seeing out of the latest Unreal Engine, this could be Something Else.

    But then, isn’t this just another CyberTown? Another Second Life? Surely the same problems that plagued (sometimes continue to plague) those projects will recur here. And didn’t we just leave some of this nonsense behind us with web3? Even in the last few months, desperate experiments around extended realities have fallen flat; wholesale virtual worlds might not be the goût du moment, er, maintenant. But then, if the generative entertainment feature works well, and the audience becomes invested in their favourite little sim-characters, maybe it’ll kick off.

    It’s hard to know anything for sure without actually seeing the mechanics of it all. That said, the alpha of Showrunner is presently taking applications, so maybe a glimpse under the hood is more possible than it seems.

    If this snippet from a Claude-generated sitcom script is anything to go by, however, even knowing how it works never guarantees quality.

    Claude Burrows? I think not. Screenshot from Claude.Ai.

    Post-script: How the above was made

    With a nod to looking under the hood, and also documenting my genAI adventures as part of the initial research I mentioned, here’s how I reached the above script snippet from the never-to-be-produced Two Girls, A Guy, and a WeWork.

    Initial prompt to Claude:

    I have an idea for a sitcom starring three characters: two girls and a guy. One girl works a high-flying corporate job, the other girl has gone back to school to re-train for a new career after being fired. The guy runs a co-working space where the two girls often meet up: most of the sitcom's scenes take place here. What might some possible conflicts be for these characters? How might I develop these into episode plotlines?

    Of the resulting extended output, I selected this option to develop further:

    Conflict 6: An investor wants to partner with the guy and turn his co-working space into a chain, forcing him to choose between profits and the community vibe his friends love. The girls remind him what really matters.

    I liked the idea of a WeWork-esque storyline, and seeing how that might play out in this format and setting. I asked Claude for a plot outline for an episode, which was fine? I guess? Then I asked it to generate a draft script for the scene between the workspace owner (one of our main characters) and the potential investor.
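
    For anyone wanting to script this kind of prompt chain rather than work in the Claude web interface, a minimal sketch with Anthropic’s Python SDK might look like the following; the model name, token limits and follow-up wording are my assumptions, not a record of what I actually ran.

    ```python
    # Hypothetical sketch: the same prompt chain via Anthropic's Python SDK.
    import anthropic

    client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment
    MODEL = "claude-3-5-sonnet-latest"  # assumed model choice

    def ask(history, prompt, max_tokens=1024):
        """Append a user turn, get the reply, and keep it in the running history."""
        history.append({"role": "user", "content": prompt})
        reply = client.messages.create(model=MODEL, max_tokens=max_tokens, messages=history)
        text = reply.content[0].text
        history.append({"role": "assistant", "content": text})
        return text

    history = []
    ask(history, "I have an idea for a sitcom starring three characters...")  # the full initial prompt quoted above
    ask(history, "Develop Conflict 6 into a plot outline for a single episode.")
    scene = ask(history, "Write a draft script for the scene between the workspace owner and the potential investor.", max_tokens=2048)
    print(scene)
    ```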

    To be fair to the machine, the quality isn’t awful, particularly by sitcom standards. And once I started thinking about sitcom regulars who might play certain characters, the dialogue seemed to make a little more sense, even if said actors would be near-impossible at best, and necromantic at worst.

  • Elusive images

    Generated with Leonardo.Ai, prompts by me.

    Up until this year, AI-generated video was something of a white whale for tech developers. Early experiments resulted in janky-looking acid-dream GIFs: vaguely recognisable frames and figures, but nothing in terms of consistent, logical motion. Then things started to get a little, or rather a lot, better. Through constant experimentation and development, the nerds (and I use this term in a nice way) managed to get the machines (and I use this term in a knowingly reductive way) to produce little videos that could have been clips from a film or a human-made animation. To reduce thousands of hours of math and programming into a pithy quotable, the key was this: they encoded time.

    RunwayML and Leonardo.Ai are probably the current forerunners in the space, allowing text-to-image-to-(short)video as a seamless user-driven process. RunwayML also offers text-to-audio generation, which you can then use to generate an animated avatar speaking those words; this avatar can be yourself, another real human, a generated image, or something else entirely. There’s also Pika, Genmo and many others offering variations on this theme.

    Earlier this year, OpenAI announced Sora, their video generation tool. One assumes this will be built into ChatGPT, the chatbot which is serving as the interface for other OpenAI products like DALL-E and custom GPTs. The published results of Sora are pretty staggering, though it’s an open secret that these samples were chosen from many not-so-great results. Critics have also noted that even the supposed exemplars have their flaws. Similar things were said about image generators only a few years ago, though, so one assumes that the current state of things is the worst it will ever be.

    Creators are now experimenting with AI films. The aforementioned RunwayML is currently running its second AI Film Festival in New York. Many AI films are little better than abstract pieces that lack the dynamism and consideration to be called even avant-garde. However, there are a handful that manage to transcend their technical origins. How this is any different from all media, all art, seems to escape critics and commentators, and, worst of all, my fellow scholars.

    It is currently possible, of course, to use AI tools to generate most components, and even to compile found footage into a complete video. But this is an unreliable method that offers little of the creative control that filmmakers might wish for. Creators employ an endless variety of tools, workflows, and methods. The simplest might prompt ChatGPT with an idea, ask for a fleshed-out treatment, and then use other tools to generate or source audiovisual material that the user then edits in software like Resolve, Final Cut or Premiere. Others build on this post-production workflow by generating music with Suno or Udio; or they might compose music themselves and have it played by an AI band or orchestra.
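
    As a rough sketch of that simplest pipeline, the text steps might be chained together with OpenAI’s Python SDK before the results are handed off to other tools by hand; the model name and prompts here are assumptions, purely for illustration.

    ```python
    # A rough sketch of the idea -> treatment -> shot list chain described above.
    # Each description in the shot list could then become a prompt for an image
    # or video generator, with the results assembled in an editor like Resolve.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def complete(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model choice
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    idea = "A lighthouse keeper starts receiving radio signals from the future."
    treatment = complete(f"Flesh this idea out into a one-page film treatment:\n{idea}")
    shot_list = complete(f"Break this treatment into a numbered shot list with brief visual descriptions:\n{treatment}")
    print(shot_list)
    ```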

    As with everything, though, the tools don’t matter. If the finished product doesn’t have a coherent narrative, theme, or idea, it remains a muddle of modes and outputs that offers nothing to the viewer. ChatGPT may generate some poetic ideas on a theme for you, but you still have to do the cognitive work of fleshing that out, sourcing your media, arranging that media (or guiding a tool to do it for you). Depending on what you cede to the machine, you may or may not be happy with the result — cue more refining, revisiting, more processing, more thinking.

    AI can probably replace us humans for low-stakes media-making, sure. Copywriting, social media ads and posts, the nebulous corporate guff that comprises most of the dead internet. For AI video, the missing component of the formula was time. But for AI film, time-based AI media of any meaning or consequence, encoding time was just the beginning.

    AI media won’t last as a genre or format. Call that wild speculation if you like, but I’m pretty confident in stating it. AI media isn’t a fad, though, I think, in the same way that blockchain and NFTs were. AI is showing itself to be a capable content creator and creative collaborator; events like the AI Film Festival are how these tools test and prove themselves in this regard. To choose a handy analogue, the original ‘film’ — celluloid exposed to light to capture an image — still exists. But that format is distinct from film as a form. It’s distinct from film as a cultural idea. From film as a meme or filter. Film, somehow, remains a complex assemblage of technical, social, material and cultural phenomena. Following that historical logic, I don’t think AI media will last in its current technical or cultural form. That’s not to say we shouldn’t be on it right now: quite the opposite, in fact. But to do that, don’t look to the past, or to textbooks, or even to people like me, to be honest. Look to the true creators: the tinkerers, the experimenters, what Apple might once have called the crazy ones.

    Creators and artists have always pushed the boundaries, have always guessed at what matters and what doesn’t, have always shared those guesses with the rest of us. Invariably, those guesses miss some of the mark, but taken collectively they give a good sense of a probable direction. That instinct to take wild stabs is something that LLMs, or even an artificial general intelligence, will never be truly capable of. Similarly, the complexity of something like a novel or a feature film eludes these technologies. The ways the tools become embedded, the ways the tools are treated or rejected, the ways they become social or cultural; that’s not for AI tools to do. That’s on us. Anyway, right now AI media is obsessed with its own nature and role in the world; it’s little better than a sequel to 2001: A Space Odyssey or Her. But like those films and countless other media objects, it does show us some of the ways we might either lean in to the change, or purposefully resist it. Any thoughts here on your own uses are very welcome!

    The creative and scientific methods blend in a fascinating way with AI media. Developers build tools that do a handful of things; users then learn to daisy-chain those tools together in personal workflows that suit their ideas and processes. To be truly innovative, creators will develop bold, strong, original ideas (themes, stories, experiences), and then leverage their workflows to produce them. It’s not just AI media. It’s AI media folded into everything else we already do, use, produce. That’s where the rubber meets the road, so to speak; where a tool or technique becomes the culture. That’s how it worked with printing and publishing, cinema and TV, computers and the internet, and that’s how it will work with AI. That’s where we’re headed. It’s not the singularity. It’s not the end of the world. It’s far more boring and more fascinating than either of those could ever hope to be.