The AI Landscape and What I'm Currently Learning

The AI Landscape

This post is an attempt to explain what I'm currently working on with my machine learning studies, in a non-technical way.

The AI landscape looks roughly like this:


"AI" is the broadest term, as you can see, but when the media mentions AI, they usually mean Generative AI, and Large Language Models (LLMs) in particular, like ChatGPT.

Generative AI has been around for about five years but only really exploded into popular consciousness about two years ago. Already, everything else in AI is generally referred to as 'good old-fashioned AI' and tends not to look all that intelligent at all.

LLMs like ChatGPT aren't actually intelligent at all; they just do a great job of imitating intelligence because they're trained on such a vast amount of data. Twice this week I've seen articles claiming the Turing Test has now been passed, since LLMs can communicate like humans so convincingly. Whether this is true is a matter of debate, but these are certainly exciting times we live in.


What I'm Studying at the Moment

Over the last two weeks I learned how to create and train my own neural network, which places me in the Deep Learning section in the diagram above. I created NNs to perform basic linear regression, binary classification and multiclass classification. This was part of a practical course by the wonderful Daniel Bourke on Deep Learning with the PyTorch library. My post about all that is here.

This week I've plunged further down the rabbit hole, into the Generative AI section. What interests me above all else is image generation. As well as general text-to-image creation, like DALL-E and Stable Diffusion, I ultimately want to be able to explore what I'll call generative image editing. More on that below.

So my current focus is a technical deep dive trying to get a solid grasp of how general image generation is achieved. I just created and trained a Generative Adversarial Network (GAN) in PyTorch, successfully generating fake handwritten digits that match the MNIST dataset. I'll do a specific post about GANs soon, before I move on to looking at Diffusion models.
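For the technically curious, the adversarial setup behind a GAN can be sketched in a few lines of PyTorch. This is a minimal illustration, not the network I actually trained: the layer sizes, learning rates, and the random tensor standing in for a real batch of MNIST images are all assumptions made just to keep the example self-contained and runnable.

```python
# Minimal GAN sketch: a generator learns to produce 28x28 images that
# a discriminator cannot tell apart from real ones.
import torch
import torch.nn as nn

latent_dim = 64

# Generator: maps a random latent vector to a flattened 28x28 image.
G = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 28 * 28), nn.Tanh(),
)

# Discriminator: scores an image as real (1) or fake (0).
D = nn.Sequential(
    nn.Linear(28 * 28, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(16, 28 * 28)   # stand-in for a real MNIST batch
z = torch.randn(16, latent_dim)  # random latent vectors
fake = G(z)

# Discriminator step: learn to label real images 1 and fakes 0.
d_loss = (loss_fn(D(real), torch.ones(16, 1))
          + loss_fn(D(fake.detach()), torch.zeros(16, 1)))
opt_D.zero_grad()
d_loss.backward()
opt_D.step()

# Generator step: try to fool the discriminator into outputting "real".
g_loss = loss_fn(D(fake), torch.ones(16, 1))
opt_G.zero_grad()
g_loss.backward()
opt_G.step()
```

The two networks play a tug-of-war: each training step, the discriminator gets slightly better at spotting fakes, which forces the generator to produce slightly more convincing images. Repeat over the whole dataset for many epochs and the fakes start to look like handwritten digits.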

Thanks for following my journey!


Generative Image Editing

Inpainting is a technique where a selected part of a photo is edited through generative AI. Say you have a photo of a dog and you want to change it to a photo of a cat, while leaving the rest of the image unchanged. Or you have a photo of a person and you want to change his outfit. Or keep the person intact but move him to a different location. Adobe Photoshop's 'Generative Fill' feature does this pretty well, although it is still a long way from perfect. Some image generators also offer inpainting.

Outpainting is a related technique where you extend the canvas of an image beyond its original edges, using generative AI to seamlessly create the new parts. So you could take the Mona Lisa, for instance, and expand out to see what the rest of her might have looked like. Technically it's the same thing as Inpainting, just applied outside the original edges rather than inside them.

Relighting is a state-of-the-art image editing technique that's still very much being developed. The idea is that you can take an image and change the lighting, or time of day, for example going from day to night. This is extremely challenging, since the lighting in a photo is baked into every pixel, and changing it requires the computer to have a deep understanding of the elements in the image, its depth information and the physics of light. I believe there are partially successful attempts at Relighting already available, but I haven't seen them in action for myself yet. I've seen the company Magnific mentioned in this regard, so they might be one to watch.