Stable Diffusion_02_A1111_ControlNet_OpenPose

It is time for some ControlNet in A1111. We are currently studying ComfyUI but A1111 has been the tool we played with the most and used to produce these results.

While A1111 has an image-to-image tool built in, ControlNet takes everything to the next level of…Control. We are not experts in ControlNet and are only focused on the potential implementation of a workflow for architectural visualization.

For now, we have found three main use cases:

  1. 3D people / people enhancement, which is the topic we are going to cover today;

  2. Sketch to Image capabilities, coming shortly after this;

  3. Clay render to Image. This was the original target we had in mind. We got there only to discover that real-time options have already been implemented by other studios. We will still document this because it is a good exercise and an efficient way to keep notes.

The real-time goal is interesting for us as well, which is why we are now studying ComfyUI: it seems to be the most powerful and versatile interface to get there.

Without further ado, let’s jump into 3D people / people enhancement and variation.

In this case, we start with an image from our 3D Library.

We are not huge fans of 3D people. They tend to look weird, and we only find them useful far from the camera or behind glass. What if there were a way to improve them, to create a better version, or to change their nationality or clothing?

Let’s have a look at that.

Here is a screenshot of the A1111 ControlNet interface.

What we wanted to do was change the guy, transforming him into a “smiling Nigerian woman in business clothes, a red blazer”. This was our prompt. More than the prompt itself, what matters are the ControlNet settings and the correct choice of SD model and ControlNet model.

Our SD model for this is Realistic Vision V6, and for ControlNet we used OpenPose.
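If you prefer to script this instead of clicking through the UI, the same setup can be driven through the A1111 API (the webui launched with --api). The sketch below is only an approximation of our settings: the /sdapi/v1/txt2img endpoint and the ControlNet "alwayson_scripts" hook exist, but the exact unit keys, checkpoint name, and file paths depend on your install and extension version, so treat those as assumptions.

```python
import base64
import requests

A1111_URL = "http://127.0.0.1:7860"  # assumes the webui was started with --api


def encode_image(path: str) -> str:
    """Read a local image and return it as a base64 string for the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


payload = {
    "prompt": "a smiling Nigerian woman in business clothes, a red blazer",
    "steps": 25,
    "width": 512,
    "height": 768,
    # Checkpoint name as it appears in your models folder (assumption).
    "override_settings": {"sd_model_checkpoint": "realisticVisionV60"},
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    # Reference render from the 3D library (hypothetical path).
                    "input_image": encode_image("3d_person_reference.png"),
                    "module": "openpose",                  # preprocessor
                    "model": "control_v11p_sd15_openpose", # ControlNet model
                    "weight": 1.0,                         # ControlNet influence
                }
            ]
        }
    },
}

response = requests.post(f"{A1111_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
response.raise_for_status()
result_b64 = response.json()["images"][0]
with open("variation.png", "wb") as f:
    f.write(base64.b64decode(result_b64))
```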

Here is a gallery of full-res results with this prompt and settings:

The changed background is a little confusing; however, the pose is consistent across all images, and there is a way to remove the distraction. This is what happens inside A1111: ControlNet extracts the pose outline from the reference image and uses it to build the new image.
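If you want to see that pose skeleton outside A1111, the controlnet_aux package exposes the same OpenPose preprocessor directly. A minimal sketch, with the file names as assumptions:

```python
from controlnet_aux import OpenposeDetector
from PIL import Image

# Load the OpenPose preprocessor used by ControlNet.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")

reference = Image.open("3d_person_reference.png")  # hypothetical reference render
pose_map = openpose(reference)                     # stick-figure pose image
pose_map.save("pose_map.png")                      # this is what guides the new image
```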

To get rid of the background we can introduce a mask. In this case we inpainted one, but it could also be an image or a render element to control all the people in the image.

The painted mask limits the effect of ControlNet to that specific area, which means no changes in the background. As you can see, the mask doesn’t need to be precise; rough is fine. Let’s see a carousel of these results.
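The masked version can be scripted as well. Below is a hedged sketch against the /sdapi/v1/img2img endpoint: it sends the original render as the init image plus the rough painted mask, and keeps the same ControlNet OpenPose unit. The field names match the A1111 API as we know it, but values such as inpainting_fill can differ between versions; the paths and numbers are assumptions.

```python
import base64
import requests

A1111_URL = "http://127.0.0.1:7860"


def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


payload = {
    "prompt": "a smiling Nigerian woman in business clothes, a red blazer",
    "init_images": [encode_image("3d_person_reference.png")],  # original render
    "mask": encode_image("rough_mask.png"),                    # white = area to regenerate
    "denoising_strength": 0.75,  # how far the masked area may drift from the original
    "inpainting_fill": 1,        # 1 = "original" fill mode in the UI
    "inpaint_full_res": True,    # work at full resolution inside the masked region
    "steps": 25,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    "input_image": encode_image("3d_person_reference.png"),
                    "module": "openpose",
                    "model": "control_v11p_sd15_openpose",
                    "weight": 1.0,
                }
            ]
        }
    },
}

response = requests.post(f"{A1111_URL}/sdapi/v1/img2img", json=payload, timeout=300)
response.raise_for_status()
with open("masked_variation.png", "wb") as f:
    f.write(base64.b64decode(response.json()["images"][0]))
```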

Maybe not 100% accurate, but in no time we have some incredible variations. Now, just to prove a point, we want to turn this into “a smiling Chinese man in business clothes, a green blazer”.

We hope this was interesting and useful; here is a full animation with all the generated images! To be continued soon!

As a last extra step, you can upscale and enhance the results you got. We didn’t use any upscaler/enhancer in the Stable Diffusion workflow, so we took the results and brought them into Krea.ai. The results are awe-inspiring, with details that bring it all together.
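If you would rather stay inside A1111 instead of a third-party service like Krea.ai, the webui also exposes its upscalers over the API. This is not what we did here, just a hedged sketch of that alternative; the upscaler name must match one installed in your webui, and the file names are assumptions.

```python
import base64
import requests

A1111_URL = "http://127.0.0.1:7860"

with open("variation.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": image_b64,
    "upscaling_resize": 2,         # 2x upscale
    "upscaler_1": "R-ESRGAN 4x+",  # must match an upscaler installed in the webui
}

response = requests.post(f"{A1111_URL}/sdapi/v1/extra-single-image", json=payload, timeout=300)
response.raise_for_status()
with open("variation_2x.png", "wb") as f:
    f.write(base64.b64decode(response.json()["image"]))
```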

Photobashing_01_Overcast

Here is a slowed-down version of the breakdown you saw on our socials, explaining what was done and (maybe) giving some useful tips. 

The idea was to take a new shot of an existing project to reduce the time spent preparing the 3D model.

We wanted to do an overcast shot and we started by looking for pictures that could inspire us in terms of colors and environment. We opted for the following one, a shot taken by Alex Berger, a great photographer and friend. 

That done, we went back to 3ds Max and placed a camera. We wanted a dynamic effect, so we did not straighten the lines; we made a rough camera match and then matched the light.

As you can see, we wanted the freedom to choose our camera without being constrained by the chosen picture, so we had to rebuild part of the environment. We only cared about the perspective, the lines, and the light, so that the entire picture looked “real”.

Following this first step, we cleaned up the picture and extended the first part of the environment using the new Generative Fill in Photoshop. We think the best way to use it is to let Generative Fill work by itself and keep any part of the image not needed at that moment switched off. In this case, we extended the environment with the building’s layer off so that it didn’t “confuse” the Generative Fill. We also removed the windmill. The following image is the result of this process.

Another useful tip is to build the extra parts in smaller pieces so that Generative Fill takes into account only the relevant part of the image. In this case, we extended the environment to the left, and we also wanted more foreground, so we added a “slice” at the bottom of the image.

From here we brought back some parts of the original image by masking, and added some details where things were not clean enough. In some cases it was easier to add new images than to keep working on the masks.

Now the image is basically done. We just painted some extra highlights and did some color correction to get it where we wanted it.

For the sake of showing what can be achieved in minutes with Generative Fill, we decided to straighten the verticals while still giving the building some room. That left us with the following image.

We used Photoshop to build both sides of the image using only Generative Fill. We divided the triangles into four parts to retain more control, but no prompts were given.

All that was missing was a person and some extra trees, added this time with old-fashioned photobashing.

We hope you found this interesting and useful! Cheers!