Stable Diffusion_02_A1111_ControlNet_OpenPose

It is time for some ControlNet in A1111. We are currently studying ComfyUI but A1111 has been the tool we played with the most and used to produce these results.

While A1111 has an image-to-image tool built in, ControlNet takes everything to the next level of…Control. We are not experts in ControlNet and are only focused on the potential implementation of a workflow for architectural visualization.

For now, we have found three main use cases:

  1. 3D people / people enhancement, which is the topic we are going to cover today;

  2. Sketch to Image capabilities, coming shortly after this;

  3. Clay render to Image. This was the original target we had in mind. We got there only to discover that real-time options have already been implemented by other studios. We will still document this because it is a good exercise and an efficient way to keep notes.

The real-time goal is interesting for us as well, which is why we are now studying ComfyUI: it seems to be the most powerful and versatile interface to get there.

Without further ado, let’s jump into 3D people / people enhancement and variation.

In this case, we start with an image from our 3D Library.

We are not huge fans of 3D people. They tend to look weird, and we only find them useful far from the camera or behind glass. What if there were a way to improve them, to create a better version, or to change their nationality or clothing?

Let’s have a look at that.

Here is a screenshot of the A1111 ControlNet interface.

What we wanted to do was change the guy, transforming him into a “smiling Nigerian woman in business clothes, a red blazer”. This was our prompt. More than the prompt itself, what matters are the ControlNet settings and the correct choice of SD model and ControlNet model.

Our SD model for this is Realistic Vision V6, and for ControlNet we used OpenPose.
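If you prefer to script this instead of clicking through the UI, the same setup can be driven through the A1111 API (the webui launched with --api). The sketch below is only an approximation of our settings: the /sdapi/v1/txt2img endpoint and the ControlNet "alwayson_scripts" hook exist, but the exact unit keys, checkpoint name, and file paths depend on your install and extension version, so treat those as assumptions.

```python
import base64
import requests

A1111_URL = "http://127.0.0.1:7860"  # assumes the webui was started with --api


def encode_image(path: str) -> str:
    """Read a local image and return it as a base64 string for the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


payload = {
    "prompt": "a smiling Nigerian woman in business clothes, a red blazer",
    "steps": 25,
    "width": 512,
    "height": 768,
    # Checkpoint name as it appears in your models folder (assumption).
    "override_settings": {"sd_model_checkpoint": "realisticVisionV60"},
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    # Reference render from the 3D library (hypothetical path).
                    "input_image": encode_image("3d_person_reference.png"),
                    "module": "openpose",                  # preprocessor
                    "model": "control_v11p_sd15_openpose", # ControlNet model
                    "weight": 1.0,                         # ControlNet influence
                }
            ]
        }
    },
}

response = requests.post(f"{A1111_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
response.raise_for_status()
result_b64 = response.json()["images"][0]
with open("variation.png", "wb") as f:
    f.write(base64.b64decode(result_b64))
```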

Here is a gallery of full-res results with this prompt and settings:

The changed background is a little confusing; however, the pose is consistent across all images, and there is a way to remove the distraction. This is what happens inside A1111: ControlNet extracts the pose outline from the reference image and uses it to build the new image.
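If you want to see that pose skeleton outside A1111, the controlnet_aux package exposes the same OpenPose preprocessor directly. A minimal sketch, with the file names as assumptions:

```python
from controlnet_aux import OpenposeDetector
from PIL import Image

# Load the OpenPose preprocessor used by ControlNet.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")

reference = Image.open("3d_person_reference.png")  # hypothetical reference render
pose_map = openpose(reference)                     # stick-figure pose image
pose_map.save("pose_map.png")                      # this is what guides the new image
```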

To get rid of the background we can introduce a mask. In this case we inpainted one, but it could also be an image or a render element to control all the people in the image.

The painted mask limits the effect of ControlNet to that specific area, which means no changes in the background. As you can see, the mask doesn’t need to be precise; rough is fine. Let’s see a carousel of these results.
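The masked version can be scripted as well. Below is a hedged sketch against the /sdapi/v1/img2img endpoint: it sends the original render as the init image plus the rough painted mask, and keeps the same ControlNet OpenPose unit. The field names match the A1111 API as we know it, but values such as inpainting_fill can differ between versions; the paths and numbers are assumptions.

```python
import base64
import requests

A1111_URL = "http://127.0.0.1:7860"


def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


payload = {
    "prompt": "a smiling Nigerian woman in business clothes, a red blazer",
    "init_images": [encode_image("3d_person_reference.png")],  # original render
    "mask": encode_image("rough_mask.png"),                    # white = area to regenerate
    "denoising_strength": 0.75,  # how far the masked area may drift from the original
    "inpainting_fill": 1,        # 1 = "original" fill mode in the UI
    "inpaint_full_res": True,    # work at full resolution inside the masked region
    "steps": 25,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    "input_image": encode_image("3d_person_reference.png"),
                    "module": "openpose",
                    "model": "control_v11p_sd15_openpose",
                    "weight": 1.0,
                }
            ]
        }
    },
}

response = requests.post(f"{A1111_URL}/sdapi/v1/img2img", json=payload, timeout=300)
response.raise_for_status()
with open("masked_variation.png", "wb") as f:
    f.write(base64.b64decode(response.json()["images"][0]))
```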

Maybe not 100% accurate, but in no time we have some incredible variations. Now, just to prove a point, we want to turn this into “a smiling Chinese man in business clothes, a green blazer”.

We hope this was interesting and useful; here is a full animation with all the generated images! To be continued soon!

As a last extra step, you can upscale and enhance the results you got. We didn’t use any upscaler/enhancer in the Stable Diffusion workflow, so we took the results and brought them into Krea.ai. The results are awe-inspiring, with details that bring it all together.
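If you would rather stay inside A1111 instead of a third-party service like Krea.ai, the webui also exposes its upscalers over the API. This is not what we did here, just a hedged sketch of that alternative; the upscaler name must match one installed in your webui, and the file names are assumptions.

```python
import base64
import requests

A1111_URL = "http://127.0.0.1:7860"

with open("variation.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": image_b64,
    "upscaling_resize": 2,         # 2x upscale
    "upscaler_1": "R-ESRGAN 4x+",  # must match an upscaler installed in the webui
}

response = requests.post(f"{A1111_URL}/sdapi/v1/extra-single-image", json=payload, timeout=300)
response.raise_for_status()
with open("variation_2x.png", "wb") as f:
    f.write(base64.b64decode(response.json()["image"]))
```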

Photobashing_01_Overcast

Here is a slowed-down version of the breakdown you saw on our socials, explaining what was done and (maybe) giving some useful tips. 

The idea was to take a new shot of an existing project to reduce the time spent preparing the 3D model.

We wanted to do an overcast shot and we started by looking for pictures that could inspire us in terms of colors and environment. We opted for the following one, a shot taken by Alex Berger, a great photographer and friend. 

That done, we went back to 3ds Max and placed a camera. We wanted a dynamic effect, so we did not straighten the lines; we made a rough camera match and then matched the light.

As you can see, we wanted the freedom to choose our camera without being constrained by the chosen picture, so we had to rebuild part of the environment. We only cared about the perspective, the lines, and the light, so that the entire picture looked “real”.

Following this first step, we cleaned up the picture and extended the first part of the environment using the new Generative Fill in Photoshop. We think the best way to use it is to let Generative Fill work by itself and keep any part of the image not needed at that moment switched off. In this case, we extended the environment with the building’s layer off so that it didn’t “confuse” the Generative Fill. We also removed the windmill. The following image is the result of this process.

Another useful tip is to build the extra parts in smaller pieces so that Generative Fill takes into account only the relevant part of the image. In this case, we extended the environment to the left, and we also wanted more foreground, so we added a “slice” at the bottom of the image.

From here we brought back some parts of the original image by masking, and added some details where things were not clean enough. In some cases it was easier to add new images than to keep working on the masks.

Now the image is basically done. We just painted some extra highlights and did some color correction to get it where we wanted it.

For the sake of showing what can be achieved in minutes with Generative Fill, we decided to straighten the verticals while still giving the building some room. That left us with the following image.

We used Photoshop to build both sides of the image using only Generative Fill. We divided the triangles into four parts to retain more control, but no prompts were given.

All that was missing was a person and some extra trees, added this time with old-fashioned photobashing.

We hope you found this interesting and useful! Cheers!