DALL-E 2, a text-to-image AI generation program, opened to the public this autumn. The initial version of the model, which takes its tongue-in-cheek name from Disney’s lovable 2008 robot WALL-E and the Surrealist artist Salvador Dalí, was unveiled in January 2021 by the OpenAI research lab. An unofficial imitation of the tech, DALL-E Mini, built independently of OpenAI, appeared earlier on the Hugging Face platform, taking Twitter by storm as a meme sensation and springboarding the program to international interest beyond AI experts. Over 1.5 million users are creating more than 2 million images per day with DALL-E.
DALL-E 2 builds on CLIP (Contrastive Language-Image Pre-Training, announced by OpenAI last year), a computer vision system, to generate 1024×1024 pixel images from typed text prompts. (The original DALL-E, by contrast, was built on a version of GPT-3.) The tool was trained using 650 million pairs of images and captions taken from the internet. After collecting image-text pairs, researchers trained the CLIP model to judge how well a piece of text describes an image, turning both into numbers that can be compared mathematically. DALL-E 2 then reverses this process, generating images that are well described by text inputs based on what CLIP has learned. Users can also use DALL-E 2 to “outpaint” images, extending pre-existing images beyond their previous borders, and to edit a pre-existing image using text commands.
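For readers curious about what matching text to images looks like in practice, here is a minimal sketch in Python. It is not OpenAI's code: the three-number "embeddings" below are invented for illustration (a real system learns vectors with hundreds of dimensions), and the function names and the temperature value are assumptions. It only shows the general idea of scoring each caption against an image and picking the closest match.

```python
import math

def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_probabilities(image_vec, caption_vecs, temperature=0.1):
    """Turn image-caption similarities into probabilities with a softmax.

    The temperature is an assumed value; smaller values sharpen the result.
    """
    scores = [cosine(image_vec, c) / temperature for c in caption_vecs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings, invented for this example only.
image = [0.9, 0.1, 0.2]
captions = {
    "a tomato climbing a ladder by the sea": [0.8, 0.2, 0.1],
    "a portrait of a dog": [0.1, 0.9, 0.3],
    "a city skyline at night": [0.2, 0.1, 0.9],
}

probs = match_probabilities(image, list(captions.values()))
best_caption = max(zip(captions, probs), key=lambda pair: pair[1])[0]
print(best_caption)
```

Training, in this picture, means nudging the vectors so that each image scores highest against its own caption. An image generator then works in the opposite direction, searching for an image whose vector sits close to the vector of the typed prompt.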
When inputting your DALL-E 2 request, you’re given the instruction to “start with a detailed description” and the example of “an Impressionist oil painting of sunflowers in a purple vase.” But what does DALL-E 2 actually “understand” by the style of the Impressionists? Or any artistic style or movement, for that matter? Using the same base prompt each time, “a tomato climbing a ladder by the sea,” I put DALL-E 2’s art historical prowess to the test.
For Impressionism (“An Impressionist painting of a tomato climbing a ladder by the sea”), DALL-E 2 seems to identify a style based around loose brushstrokes and color contrasts that indicate the impact of light.
It did a surprisingly good job at pinpointing what was meant by “18th Century” art, too, adding textural elements at the sides and producing a really quite broodingly regal image. What is also interesting is that DALL-E 2 represented what 18th Century artworks look like today, their color palette dulled by time.
My personal favorite was DALL-E 2’s interpretation of Robert Mapplethorpe’s style. The monochrome image gave the tomato a distinctly callipygian look, a sexy nod to Mapplethorpe’s figures. The “Henry Moore sculpture” prompt also made me smile: it would seem second nature to DALL-E 2 that a sculpture requires a plinth.
There were some styles that DALL-E was less adept at recreating, like De Stijl or the Surrealists. It made a good go of interpreting “Mondrian” in the prompt, adding straight lines which cut through the image. Close enough. Warhol’s tomato, too, captured some of the flatness associated with his work, and the Cubist attempt was — in places — angular.