I’ve been playing around with generative art. There are a bunch of Jupyter notebooks floating around the Internet, and they produce some really fun art. I’ve been focused this past week on getting these to run fast.
If you want to run Jupyter notebooks fast on the cloud, I discovered you basically have a few options: either the Big 3 cloud providers, GCP, Azure and AWS or a few indie players who have better UX and are easier to start with like Papersource, Kaggle (owned by Google now), etc. I tried out Papersource, Azure and AWS. AWS offered the best performance so that’s what I’d recommend. Specifically, AWS Sagemaker turned out to be pretty easy to get up and running.
I used a ml.p3.2xlarge instance which has a Tesla V100 GPU. It costs around $4/hour. For example, generating a 256x256 image on Google Collab took 14 mins but just 4 mins on this 2xlarge Sagemaker instance. The neat thing about these instances is that you can start/stop them whenever you need them and the data stored on the instance will persist across restarts so you don’t need to wait for a large model to re-download for instance. I would happily pay for ml.p3.8xlarge which costs $15/hour but turns out you have to request quota via customer service for that. Same thing on Azure and their customer service actually told me they didn’t have any available at the moment 😞
Enough talk. Let’s look at some fun pictures.
For the prompt, "new york city winter", this model produces some gorgeous output.
But give it the prompt "kanye west" and it looks awful.
And looks like because this model is trained on 256x256 images, it’s not able to generate images that are larger. Here are some attempted 512x512 images.
Being an AI newbie, I didn’t realize that obtaining enough computing resources would be a big bottleneck. In AI, there are essentially two phases: (1) the training phase and (2) the inference phase. I’ve yet to do any training myself but no surprise that requires quite a bit of compute. But it’s a one off task. Once the model is built out, inference e.g. Siri responding to your question or image generation like above, can require quite a bit of compute as well. That was surprising to me.
One idea I’ve been toying with is generating NFTs on demand based on keywords the minter picks so if you put in “NYC in the winter”, you might be get something like the images above. Even at $4/hour, to maintain a continuously running server, you’d have a fixed cost of $3k a month. On top of that, that single server can likely only generate 1 image at a time and each image takes 4 minutes.
Serverless applied to ML inference is the obvious solution to the cost problem. There are some offerings out there that I have yet to explore but it doesn’t really seem super popular like Serverless is for other applications. Serverless would certainly unlock a whole lot of generative AI applications if it was widely available.
Given this update is sorta mid-week and this weekend’s a long weekend, I have a short list of goals plus some rollover this week:
Better understand the model used to generate the images above: I’ve run up against my lack of knowledge of the model + AI fundamentals in answering some questions and making progress: How do I get this to generate large high-res images? Why do the Kanye images look so awful? How do I get this to run faster? I think an actionable outcome here is to write a blogpost explaining how the generation works in simple terms.
[Rollover] Write up 2 commercially viable ideas for API services that wrap around the OpenAI API to do something more specific and useful.
In other news, I’ve been writing more and got some positive feedback on the Querying Information on the Solana blockchain post that I spent a lot of time on last week. That post now ranks #1 on Bing for “querying nft solana” :)