We're at the dawn of an explosion of applications that use AI. The technology is way ahead of the applications at this point, and the invisible hand of the free market will fill in the applications as soon as possible.
In the narrowest sense, large language models (LLMs) are trained to perform the task of completing sentences. But in practice, a single model does translation, sentiment analysis, classification, and a wide variety of other NLP tasks. Their generality is the major technological breakthrough of our time. See my Excel demo for how powerful it is.
Diffusion-based image generation models are the other breakthrough of recent years. When I was experimenting with them about a year ago, they were much slower and much worse. Six months later, there were models that produced really high-quality images in a wide range of styles.
Whisper, an open-source speech-to-text model, is also excellent. And for the converse, text to speech, we have new models like VALL-E that hold a ton of promise.
AI in every app or AI-native apps?
There will certainly be AI in every app. But the big question about the future is whether the new "AI-native" apps will leapfrog and kill the existing apps that are just augmented with AI. Will AI fundamentally change how we solve user problems? Overall, I'd say the answer is yes. Some companies might adapt but it’s hard to completely tear down your product’s interface.
Let's look at the problem of video editing, where the user's goal is to produce a coherent and engaging video, say, for TikTok or YouTube, or a film to be shown in theaters.
Currently, the solution to that problem involves clips, timelines, transitions, animations, colors, etc. See DaVinci Resolve's interface for color editing below:
But do I care about any of this really if my goal is just to be coherent and engaging?
Let's look at Descript, an AI-native product.
Descript is not a timeline editor. Instead, it allows you to edit the script of the video like a doc. A visual might help here.
Want to take out a boring segment in the middle? Just select those words and remove them. Descript will make sure the transition feels natural. This is how I want to edit video. I want to deal directly with content, not frames. Anyone who’s used a timeline editor knows how fussy it can be to clip a segment out at the exact right timestamps.
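To make the idea concrete, here is a minimal sketch of how deleting words in a script could translate into cuts in the underlying video. It assumes a transcript where every word carries start and end timestamps (the kind of output a speech-to-text model can provide). The `Word` class and `keep_segments` function are hypothetical names for illustration, not Descript's actual implementation, and the sketch ignores the harder part of smoothing the transition.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the video
    end: float    # seconds into the video


def keep_segments(words, deleted):
    """Return the (start, end) time ranges to keep after deleting words by index."""
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if i > 0 and (i - 1) not in deleted:
            # The previous word was kept, so extend the current segment.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or first kept word after a deletion: start a new segment.
            segments.append((word.start, word.end))
    return segments


# Hypothetical transcript with a filler word at index 1.
transcript = [
    Word("Hello", 0.0, 0.4),
    Word("um", 0.4, 0.7),
    Word("welcome", 0.7, 1.2),
    Word("back", 1.2, 1.5),
]
cuts = keep_segments(transcript, deleted={1})
```

The user only ever touches words; the timestamps needed for the cut fall out of the transcript.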
Timeline view only allows you to see one frame at a time. When I edit video, I often play it back at 4x to get a better bird’s eye view. But an entire script is even better: a script view allows you to see the full picture at once, and text is easier to skim than video.
Search actually works. Want to find the part where you transition to talking about, say, the pricing of your product? Just search and click.
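Search over video becomes ordinary text search once the transcript has timestamps. A rough sketch, assuming the transcript is a list of `(word, start_seconds)` pairs; `find_phrase` is a hypothetical helper, and a real product would presumably handle fuzzy matches and punctuation more carefully.

```python
def find_phrase(words, phrase):
    """Return the start time of every occurrence of phrase in a timestamped transcript."""
    target = phrase.lower().split()
    if not target:
        return []
    # Normalize case and strip trailing punctuation so "pricing." matches "pricing".
    texts = [text.lower().strip(".,!?") for text, _ in words]
    hits = []
    for i in range(len(texts) - len(target) + 1):
        if texts[i:i + len(target)] == target:
            hits.append(words[i][1])  # jump target: start time of the first matched word
    return hits


# Hypothetical transcript fragment.
transcript = [("Now", 0.0), ("let's", 0.3), ("talk", 0.6), ("about", 0.9), ("pricing.", 1.2)]
jump_to = find_phrase(transcript, "about pricing")
```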
Descript has a bunch of other cool features. One worth pointing out is Overdub which lets you re-record a word or sentence in your own voice by typing out the text you want. That way, you don’t need to re-record the whole video just because you messed up one small part.
How much faster is it to produce a video with Descript vs. another editor? Easily 5x, probably 10x in the future.
Without the advances in AI, a script editor would not be possible, and that’s why the existing heavyweights in video editing are uniformly timeline editors. Sure, some people, like professional movie editors, might still need to deal with frames, but most people making videos (everyday people using YouTube) will never have to deal with frames.
Let’s talk about a different domain now: writing. Most of the AI products I’ve seen so far in the writing space deal with content creation. They solve the “blank page” problem. That’s an obvious use case given how good GPT is at producing decent quality writing.
But in the course of writing this blogpost, I did not use AI. Why? Well, I couldn’t find a product that addressed the needs I had. I wanted technology to help me:
Proofread paragraphs
Show the diffs and let me decide what to accept!
Make edits in my voice (not in a ChatGPT/generic/celebrity voice)
Add images for the content to be more easily understood
Come up with a good title
Suggest better sectioning of the content
Auto-summarize sections for me so I can skim the whole post easily
Tell me if the arguments I’m making are coherent and if I should add more detail to any section.
Generate a good title for Hacker News, one for Reddit, and also, turn this post into a Twitter thread
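The "show the diffs and let me decide" item in the list above is the easiest to sketch with tools that already exist. The example below uses Python's standard `difflib` to compare the original text against an edited version (which would come from a model, though here it is just a hard-coded string) and rebuilds the text using only the changes the writer accepts. The function names are hypothetical, and the word-level granularity is an assumption; whitespace is normalized to single spaces.

```python
import difflib


def word_diff(original, edited):
    """List each changed span as (operation, old words, new words)."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


def apply_accepted(original, edited, accepted):
    """Rebuild the text, taking the edited side only for accepted change indices."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    out, change_idx = [], 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        else:
            out.extend(b[j1:j2] if change_idx in accepted else a[i1:i2])
            change_idx += 1
    return " ".join(out)


original = "We're the dawn of an explosion"
edited = "We're at the dawn of a boom"  # stand-in for a model's suggested edit
changes = word_diff(original, edited)
result = apply_accepted(original, edited, accepted={0})  # accept only the first change
```

Accepting only the first change keeps the grammar fix ("at") while rejecting the rewording, which is the kind of control that preserves the writer's own voice.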
Writing on a Computer
Let’s talk a bit about the evolution of writing on a computer. First came the word processors, where the fundamental construct is just words. You have individual, separate documents where you type in words (and then images, graphs, etc.). You can move these words around. You can make edits and so on. You can spell check.
Then you have collaborative word processors. But Google Docs, while real-time and collaborative and hence a major leap forward, doesn’t fundamentally change the writing/editing paradigm.
Then, you have the current state of the art tools such as Notion, Craft, etc. Notion in particular is a great writing tool. They actually changed the paradigm of writing from words to blocks of content. In Notion, you can move blocks around, you can lay them out visually, you can share blocks across different pages. Another huge difference from word processors is that you can have a network of pages linked to each other in one massive space. You’re no longer constrained to individual documents like you are with Google Docs or Word.
I think what comes next is writing at the layer of ideas/arguments/thoughts.
Words (GDocs, Word) → Blocks (Notion, Craft) → Ideas (??)
Sometimes, you’ll drop down to the lower abstraction level of words, but most of your writing time can be spent at the idea layer. The tools you’ll need to work at this layer probably solve the needs that I highlighted earlier.
I haven’t seen any AI writers that have made this jump yet, but it’s certainly possible with currently available technology. As with GitHub Copilot, you’ll also want the features to be default on rather than default off. Notion AI, Lex, et al. all require the user to proactively ask the AI to, say, summarize a piece of text or generate a title. This is understandable given the high per-token cost of OpenAI APIs, but the future will be cheaper.
What I want is everything (summaries, title suggestions, Twitter thread ideas) to be automatically generated and kept up to date in real-time.
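One way to keep generated artifacts current without paying for a model call on every keystroke is to regenerate only when the underlying text actually changes. This is a sketch under stated assumptions: `DerivedCache` is a hypothetical class, and `first_sentence` is a trivial placeholder standing in for a real summarization call.

```python
import hashlib


class DerivedCache:
    """Regenerate a derived artifact (summary, title, ...) only when its source text changes."""

    def __init__(self, generate):
        self.generate = generate  # hypothetical callable, e.g. a wrapper around a model API
        self._cache = {}          # section id -> (content hash, artifact)
        self.calls = 0            # how many times the expensive generator actually ran

    def get(self, section_id, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = self._cache.get(section_id)
        if cached and cached[0] == digest:
            return cached[1]  # text unchanged: reuse the stored artifact
        self.calls += 1
        artifact = self.generate(text)
        self._cache[section_id] = (digest, artifact)
        return artifact


def first_sentence(text):
    """Placeholder 'summarizer' standing in for a model call."""
    return text.split(".")[0].strip() + "."


summaries = DerivedCache(first_sentence)
summary = summaries.get("intro", "AI is here. Apps will follow.")
summaries.get("intro", "AI is here. Apps will follow.")            # unchanged, served from cache
updated = summaries.get("intro", "AI-native apps win. Apps will follow.")  # changed, regenerated
```

An editor could call `get` for each section on every save; only the sections that changed would trigger a new generation.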
Moving up a layer of abstraction is a big jump and involves fundamental changes and lots of experimentation to get the interface right. I’m not sure if the incumbents will be willing to radically alter their interfaces to make the change (for example, Notion’s AI augmentation v1 is rather disappointing).
That’s why I’m bullish on new players that are AI-native in all of these domains.
Also, love this part: "What I want is everything [...]" :)
With all the writing apps launching, guess you must have found what you were looking for!
I'm using https://saga.so/ai which has all the commands you said in their contextual menu, it's super handy as directly integrated within notes and docs. Similar to Notion but faster and much less complex to use - no need to organize anything :-)
Also finding that writing good prompts has a learning curve. Now I'm getting much better results as I know how detailed to make them, or when to continue the conversation to get the output I want