Lots of odd artifacting, slow creation time, and yes, it had some issues with Sailor Moon.
It probably isn’t worth the effort for most things, but one option (I’m not saying that this will work well, it’s just a thought) might be using both. That is, if Bing Image Creator can generate images with the content that you want but gets some details wrong and can’t do inpainting, while Midjourney can do inpainting, it might be possible to take a Bing-generated image that’s 90% of what you want and then inpaint the particular detail at issue using Midjourney. The inpainting will use the surrounding image as an input, so it should tend to generate a similar image.
I’d guess that the problem is that an image generated with one model probably isn’t going to be terribly stable in another model – like, it probably won’t converge on exactly the same thing – but it might be that surrounding content is enough to hint it to do the right thing, if there’s enough of that context.
I mean, that’s basically – for a limited case – how AI upscaling works. It gets an image that the model didn’t generate, and then it tries to generate a new image, albeit with only slight “pressure” to modify rather than retain the existing image.
It might produce total garbage, too, but might be worth an experiment.
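To put a rough number on that “pressure”: img2img-style pipelines usually expose it as a denoising strength between 0 and 1. Here is a minimal sketch, assuming the rule that Hugging Face diffusers’ img2img pipeline uses as far as I know (run roughly strength × steps of the schedule on the input image). The function name `img2img_steps` is made up for illustration, and other tools may count steps differently.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run on the input image.

    Assumed rule: min(int(steps * strength), steps). A strength of 0 leaves
    the image untouched; a strength of 1 essentially regenerates from noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)
```

Under that assumption, a low strength means only the last few steps are run, which is why an upscaler mostly retains the image it was given.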
What I’d probably try to do if I were doing this locally is to feed my starting image into an image-to-text tool to generate prompt terms that my local model can use to generate a similar-looking image, and include those when doing inpainting, since those prompt terms will be adapted to trying to create a reasonably similar image using the different model. On Automatic1111, there’s an extension called CLIP Interrogator that can do this (“image to text”).
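As a rough sketch of that workflow outside the web UI: the code below assumes the standalone clip-interrogator Python package (not the Automatic1111 extension itself) plus Pillow. The helper names are mine, and `ViT-L-14/openai` is the CLIP model that package’s README suggests for Stable Diffusion 1.x, if I recall correctly. The first helper only merges the interrogated terms into a hand-written inpainting prompt and needs nothing beyond the standard library.

```python
def merge_prompt_terms(own_prompt: str, interrogated: str) -> str:
    """Append interrogated terms to a hand-written prompt, skipping duplicates."""
    seen = set()
    merged = []
    for term in (own_prompt + "," + interrogated).split(","):
        cleaned = term.strip()
        key = cleaned.lower()  # compare case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return ", ".join(merged)


def interrogate(image_path: str) -> str:
    """Describe an image as prompt terms (needs Pillow and clip-interrogator)."""
    # Imported lazily so the merge helper above works without these packages.
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    image = Image.open(image_path).convert("RGB")
    interrogator = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
    return interrogator.interrogate(image)
```

Something like `merge_prompt_terms("detailed hands", interrogate("bing_output.png"))` (the file name is a placeholder) would then give a prompt to use for the inpainting pass.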
Searching online, it looks like Midjourney has similar functionality, the /describe command: https://docs.midjourney.com/docs/describe
It’s not magic – I mean, end of the day, the model can only do what it’s been trained on – but I’ve found that to be helpful locally, since I’d bet that Bing and Midjourney expect different prompt terms for a given image.
Oh, I also tried local generation (forgot the name) and wooooow is my local PC bad at pictures (clearly can’t be my lack of ability in setting it up).
Hmm. Well, that I’ve done. Like, was the problem that it was slow? I can believe it, but just as a sanity check, if you run on a CPU, pretty much everything is mind-bogglingly slow. Do you know if you were running it on a GPU, and if so, how much VRAM it has? And what were you using (like, Stable Diffusion 1.5, Stable Diffusion XL, Flux, etc.)?
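If it helps with that sanity check, here is a small sketch for listing GPUs and their VRAM. It assumes an NVIDIA card with the `nvidia-smi` tool on the PATH; AMD cards would need a different tool (rocm-smi on Linux, for instance), which this does not cover. The function names are made up for illustration.

```python
import shutil
import subprocess


def parse_gpu_line(line: str) -> tuple[str, int]:
    """Parse one 'name, 12288 MiB' line into (name, vram_in_mib)."""
    name, memory = line.rsplit(",", 1)
    return name.strip(), int(memory.strip().split()[0])


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Return (name, VRAM in MiB) per GPU, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]
```

An empty list from `list_nvidia_gpus()` only means no NVIDIA tooling was found, not necessarily that generation ran on the CPU.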
Ran it on my 6900 (nice), and although it was slow, the main issue is that it made things look like this:
It was Stable Diffusion XL.