First Prompt Set
Created on August 30 | Last edited on August 30
first off: we have to reference/involve this Cybernetic Tarot I learned about from Genevieve Bell—what an amazing project.
First test results
It worked! It didn't take too long! More detailed notes below the table.
The Hanged Man and the Lovers are actually pretty cool. I would print them. Lots of room for pareidolia :)
Run: fearless-firebrand-1
Initial attempts & observations
Subjects
- human faces and realistic portraits are tough: there's a sense of Beyonce and Scarlett, but it's very approximate. Not sure what happened with Marilyn, I'm getting a mix of Warhol vibes and this Dali painting. For the Major Arcana, this makes me want to stay away from anthropomorphized characters like the Hierophant, Empress, Lovers, Hanged Man, etc. (this is a lot of the Major Arcana, unfortunately) and lean towards concepts like Strength, Justice, Star, Moon, World, etc. (though not the Tower, at least for me; I can tell you why offline). Minor Arcana will be hard for the same reason, but your Ace of X pattern and things like Four of Y could work (modulo counting, which I bet is still hard for this network :) ).
- "yoga" looks to me like it contains lots of wood-grain-parquet-studio floor pixels. This might be from my vast personal experience of yoga practice and yoga photos. My intuition here is that any noun will be strongly correlated with the most frequent pixel type in its photo representations (we could test this with other indoor/outdoor sports—e.g. is there lots of gray and snippets of jerseys for cycling?). Of course I also get a distinct vibe of Yayoi Kusama here and she would be another great artist to add. Lots of polka dots.
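One rough way to probe the "most frequent pixel type" hunch is to quantize each generated image into coarse color buckets and see which buckets dominate. The sketch below is only an illustration: `dominant_buckets` is a made-up helper, and `fake_yoga_image` is a hand-built stand-in (70% wood-brown, 30% blue), not real output from the run.

```python
from collections import Counter

def dominant_buckets(pixels, levels=4, top=3):
    """Quantize RGB pixels into levels**3 coarse buckets; return the most common with their share."""
    counts = Counter(
        (r * levels // 256, g * levels // 256, b * levels // 256)
        for r, g, b in pixels
    )
    total = sum(counts.values())
    return [(bucket, n / total) for bucket, n in counts.most_common(top)]

# Hypothetical stand-in for a generated "yoga" image: mostly parquet-floor brown
wood = (181, 136, 99)
mat = (60, 90, 200)
fake_yoga_image = [wood] * 70 + [mat] * 30

for bucket, share in dominant_buckets(fake_yoga_image):
    print(bucket, f"{share:.0%}")
```

With real outputs, the pixel list could come from any image loader. Comparing the top buckets for "yoga" against "cycling" or another sport would be a first check on whether the background really is what the prompt gets correlated with.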
Artists
- Alphonse Mucha and art nouveau are my dream. What I really, really want (and I accept that I will need to tune this separately) is to be able to generate realistic portraits of humans in the style of Mucha, or Art Nouveau more broadly. However, by the "frequency of pixels/patches/textures" hypothesis, I think this will generate the highly detailed filigree background of his iconic posters and NOT the content-specific view (one might even say the portrait part is more impressionistic).
- Rene Magritte—it appears CLIP has only seen the Pipe and that's it? disappointed
- I misspelled John "Wiliams" Waterhouse but it got some inkling. Again this is me going for dramatic mythological women.
- the palette and hints of style for Alex Grey are impressive: he does gravitate to lots of red & blue, and the right figure's "dress" is super recognizable as his striated muscles/veins/skinless humans.
Questions
- where is most of the data from? this will help me craft better prompts
- how can we better vary/guide the initial noise? Step 99 looks VERY similar across all prompts, and in Mandi's experiments I noticed how a single color palette (and why chartreuse of all colors) starts dominating very quickly.
- would more steps/trying different hyperparameters make sense at this stage, or are we ok with this level of quality?
- how hard would content-aware extension in some form be? I know other generative models do this. E.g. I want the face of a portrait photo-realistic, but the hair and clothes avant-garde.
- typos and edit distance in general are interesting—how do they interact? Here's Janelle Shane on desserts in the CLIP desert
- stalking Janelle Shane's Twitter for other insights is a good next step
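To make the typo question concrete, edit distance between the intended prompt and the typed one can be logged next to each result. This is a minimal sketch of plain Levenshtein distance; `edit_distance` is a helper name chosen here, and it says nothing about how CLIP's tokenizer actually treats misspellings.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]

print(edit_distance("dessert", "desert"))  # 1
print(edit_distance("John Wiliams Waterhouse", "John William Waterhouse"))  # 2
```

Sorting the table by this distance would show whether output quality degrades smoothly with typos or falls off a cliff after one or two edits.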
Next steps I want to try
- I like the pattern of "x by y", or otherwise varying the prompts along two "discretized" dimensions: content and artist, or subject and synonym, scene type and modifier (at night, in the summer), etc. This will enable grouping by one of the fields & easier pattern finding in the table
- More artists, especially more canonical ones, and easier prompts like "landscape" or "flowers" or "a sunny day". No need to get so esoteric so fast.
- Styles/aesthetic directions as easier to mimic than specific artists: minimalist, ethereal, cubist, abstract, expressive, rococo, etc. Oh gosh calling dibs now on generating a rococo basilisk and tweeting it at Grimes.
- can we ask questions and have the output be the answer? can we bring in the fortune-telling aspect here and condition on some random variable, or more user input :) ?
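The "x by y" idea can be sketched as a small prompt grid, with each row keeping its two fields so the results table can be grouped by either one. The subject and artist lists below are just examples pulled from these notes, and `prompt_grid` and `group_by` are hypothetical helpers, not part of any existing tooling.

```python
from collections import defaultdict
from itertools import product

subjects = ["Strength", "Justice", "The Star", "The Moon"]
artists = ["Alphonse Mucha", "Yayoi Kusama", "Rene Magritte"]

def prompt_grid(subjects, artists):
    """Build one row per (subject, artist) pair, keeping both fields alongside the prompt."""
    return [
        {"subject": s, "artist": a, "prompt": f"{s} by {a}"}
        for s, a in product(subjects, artists)
    ]

def group_by(rows, field):
    """Collect prompts under each distinct value of the given field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row["prompt"])
    return dict(groups)

rows = prompt_grid(subjects, artists)
for artist, prompts in group_by(rows, "artist").items():
    print(artist, "->", prompts)
```

Swapping the second list for modifiers ("at night", "in the summer") or styles would need a different template string than "x by y", but the same grid and grouping would apply.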