When it comes to generative AI, text usually gets the most attention. (See my guide to creating text with AI.) But the advances in images and graphics are just as exciting. And video offers a preview today of what might be possible in the not-so-distant future. In this article, I'll give you an overview of the possibilities and limitations of visual AI tools.
Images, graphics and videos add enormous value to online content. They attract more attention and can not only explain a topic but also give it emotional resonance. This gives you the chance to showcase your brand and corporate identity and stand out from the crowd.
Until now, those who needed visual content had the following options:
1. Create it yourself. In addition to talent and knowledge, you need the right tools and the time for implementation. In many cases, this is unrealistic.
2. Hire someone. This is usually the highest-quality option: a suitable specialist creates the visuals to fit your exact needs. Unsurprisingly, it is also the most expensive.
3. Use stock photos. Platforms such as Shutterstock, Adobe Stock or Depositphotos offer a large selection of good quality at affordable prices. There are even free offerings such as Pexels or Pixelio. The disadvantage: you get off-the-shelf photos and graphics that others also use. Customization is usually not possible; you would have to make those changes yourself or commission them.
At first glance, AI image generators appear to be an exciting new alternative. After all, they deliver visual content quickly and easily using text commands. In theory, you can generate a precisely fitting visualization at low cost or even free of charge.
Well-known AI image generators include Dall-E from OpenAI, which is also behind ChatGPT, MidJourney and Stable Diffusion. They all have free and paid offerings. Stable Diffusion is open source, which is why an active community has already developed around this tool. This means you can use Stable Diffusion directly on your own computer - or even on a smartphone or tablet.
What AI image tools are good for
These image generators create works in all kinds of styles: illustrations, drawings, photos, computer graphics or even the look of an oil painting. The limits here are set by the training material, your imagination and your skill and perseverance in the search for the perfect result.
And that brings us to a weak point of these tools as soon as you actually try them yourself: achieving the desired result is not always as easy as hoped. At least it doesn't happen "at the push of a button", as is often promised. Sometimes you're lucky and land a quick hit. Sometimes you pull your hair out because it just won't work.
Over time, you learn how to achieve the best results. The central element is the prompt, i.e. the written instruction to the AI tool. What works well, however, depends heavily on the tool.
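To make the trial-and-error process more systematic, it can help to treat a prompt as a set of building blocks rather than a free-form sentence. Here is a minimal sketch of that idea; the helper functions and the style keywords are my own illustrative choices, not part of any tool's API:

```python
# Illustrative helpers for assembling image-generation prompts from
# reusable building blocks (subject, style, lighting, extras).
# The structure reflects common community practice; adapt the
# vocabulary to whichever tool you actually use.

def build_prompt(subject, style="flat illustration",
                 lighting="soft studio light", extras=()):
    """Combine prompt building blocks into one comma-separated string."""
    parts = [subject, style, lighting, *extras]
    return ", ".join(parts)

def build_negative_prompt(avoid=("blurry", "extra fingers", "watermark")):
    """Terms to exclude; only some tools (e.g. Stable Diffusion) support this."""
    return ", ".join(avoid)

prompt = build_prompt(
    "a small team celebrating a product launch",
    style="hand-drawn illustration",
    extras=("warm colors", "minimalist background"),
)
print(prompt)
print(build_negative_prompt())
```

A template like this keeps your experiments comparable: you change one building block at a time and can see which part of the prompt actually made the difference.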
Dall-E 3, for example, is very powerful, but ChatGPT stands between you and the application. As with text, you therefore explain in natural language what you have in mind. ChatGPT receives this and translates it into an instruction for Dall-E. If you don't like the result, you explain what needs to be changed, and the cycle continues.
At the other end of the spectrum is Stable Diffusion. Even if you use it via the commercial application DreamStudio, you have various manual options. You get even more freedom if you run Stable Diffusion via an interface on your own computer, such as Automatic1111 or Draw Things.
An analogy: Dall-E is macOS, Stable Diffusion is Linux. Dall-E produces good results quite quickly; in return, you have to accept that the system limits what you can do and how you can do it. Stable Diffusion, on the other hand, is initially confusing and complex, but an enormous amount is possible and you have many levers to pull.
Perhaps MidJourney is then the Windows of the group. However, I have to admit that I don't like MidJourney's interface within the Discord chat service at all, so I have only very limited experience with it. At the same time, MidJourney is quite popular because you can achieve great results with little effort. At the moment, however, I prefer to use Dall-E 3 via ChatGPT.
Typical challenges and mistakes
One mistake I see again and again is that too often people try to create photorealistic images. In my opinion, this is not ideal for two reasons:
- The results often look even more artificial than the stock photos they are based on, and they often lack visual polish. Stock photos are usually designed to be as neutral as possible, which makes them both flexible to use and boring. Photos become interesting through composition, lighting, and the play of sharpness and blur. If you don't specify these things, AI tools tend to produce something mediocre.
- Problems and mistakes are more likely to catch the eye in a photorealistic image, whereas in other styles they pass as an expression of "creative freedom". A relevant term here is the "uncanny valley": the point at which an almost correct human face becomes disturbing because of a small mistake.
That's why I often rely on illustrations and graphics. That doesn't mean that photorealistic images aren't useful at all. But it's good to have other options in mind.
Regardless of the style, it is important to understand the limits of the tools, and these can be surprising. One motif may work straight away, while another idea fails even after dozens of attempts. This often has to do with what the AI knows from its training material: it can create images that do not yet exist anywhere else, but only by recombining what it has already seen.
But at the same time you have to be aware that these tools don't have the slightest understanding of what they are depicting. They have no idea about the world in general or, for example, about human anatomy in particular.
Hands are a well-known example of this problem. Dall-E and Stable Diffusion do not know what a human hand looks like or how it works. They have seen hands during training, but those hands were sometimes visible only from the side, partially obscured, or overlapping. The AI does not understand that an average human hand has five fingers and that, due to perspective or other circumstances, not all of them are always visible.
Complex scenes are also difficult. Example: You want a picture that shows a team of five people and you have specific ideas about what each person should look like. Good luck with that! I hope you have the time and patience ...
The situation is similar when a person needs to strike a clearly defined pose or you have an exact composition in mind. Here it helps to create an image not only from a prompt but also from a template ("image to image" as opposed to "text to image"). Stable Diffusion also has the ControlNet extension, which lets you specify which elements of a template should appear in the new image.
As you can see, the higher your expectations and the more detailed your idea, the more difficult it becomes. It works well, however, if you let the AI inspire you: you describe to ChatGPT the purpose of the image and what it should represent, then see how far you like the result and refine it step by step. With Stable Diffusion, on the other hand, you will experiment with the prompt, but also with numerous other options and settings.
The problematic aspects of image generators
However, this is not the only challenge. Another is that these AIs reflect what is found in their training material, and that includes prejudices and clichés: stereotypical gender roles, for example, or even racist world views. ChatGPT and Dall-E actively try to avoid this, but it is ultimately your responsibility to recognize and weed out such problematic representations.
Another point concerns the "training material" that has already been mentioned several times. Similar to text generators, these tools have also learned their skills from human models. They have been fed with an enormous amount of data. Whether these photos, graphics, illustrations, paintings and other works were allowed to be used for this purpose is a hotly debated question.
Some see it as copyright infringement. Others compare it to how flesh-and-blood artists learn from role models and follow trends. It would be going too far to describe the discussion here. Some providers, such as Adobe, use their own stock photo offerings for their tools and also provide remuneration for this use. This should make it suitable for the commercial sector and, above all, for companies.
Outlook: From image to moving image
The next exciting field for AI tools is already emerging: video. There are a number of new offerings here that use either text input or an image as a starting point.
The quality of the results is quite astonishing. However, the clips are still very short, and the typical artifacts and quirks of the AI image generators show up here as well. The tools currently seem to work best with relatively static scenes; the more complex a scene gets, the more likely absurd details are to creep in.
At the same time, text and image generators were at a similar point not so long ago. A few years ago, for example, we still found it fascinating that any portrait photo could be created. Today, we complain if a detail in our photorealistic output is not one hundred percent correct.
In this respect, there is justified hope that these tools will develop noticeably in the coming months and years.
So while video generators are still a long way off, I think image generators are already useful and sensible today. They have their limits and they have problems. They do not replace manually created photos or graphics. Rather, they offer another option and in creative hands they can be a helpful tool.
I see them as being on a similar level to today's text generators: they support and sometimes inspire. They work best in tandem with a person.
Your questions about creating AI images
What questions do you have about creating images and graphics with AI? Feel free to use the comment function. Would you like to be informed about new articles on web design and AI? Then follow us on Twitter, Facebook, LinkedIn or via our newsletter.