Oof. Dude. You’re not wrong about what is and isn’t available online. But it’s okay. New frontier or whatever. Haha.
I’ve been mulling over the regularization image thing, so I made a Reddit post asking about it. Basically: are these images supposed to represent what the model thinks “this” is, in which case regularization images serve the role of “this, but not this”? Or is it more like they fill in the gaps where the LoRA is lacking?
I suspect it’s more like the first. That said, it might actually make sense to include all the defective and diverse images, basically to instruct the LoRA/model, “I know you think I’m asking for ‘this,’ but in reality, that’s not what I want.”
If that’s the case, it might make sense to ENSURE your regularization images are way off base, or at least include anything in the class that you know you definitely don’t want.
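For context (and take this with salt, since I’m new): in kohya-style sd-scripts, the training/regularization split is literally just two folder trees, with each folder name encoding repeats and the prompt. A hypothetical layout (the “sks” token and all file names are made up, not from your setup):

```
dataset/
├── img/
│   └── 20_sks cabin/      # your specific cabin, repeated 20x per epoch
│       ├── 0001.png
│       └── 0001.txt       # sidecar caption for 0001.png
└── reg/
    └── 1_cabin/           # generic "cabin" class images, 1 repeat
        ├── reg_0001.png
        └── reg_0002.png
```

If the “this, but not this” theory holds, that reg/ folder is where all the diverse, off-base class imagery would go.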
I don’t have confirmation of any of this. I’m VERY new here (like ran my first LoRA training yesterday).
I like your batch size idea.
Ah. The captioning is something I REALLY need to think about. With the cabin caption approach you used, I’m guessing you lost flexibility but gained accuracy? I wonder if you could tag it ‘cabin, church’ and retain some of both.
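Just to sketch what I mean (assuming kohya-style sidecar captions, where the .txt shares the image’s basename; this caption is purely hypothetical):

```
sks building, cabin, church, steep roof, wooden siding, forest clearing
```

The thinking being that tagging both classes might let the model keep some of each association instead of locking onto just “cabin.”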
The steps sound very high to me, but I can’t say for sure. Ahaha. For people, I’ve heard 1500 to 3000.
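For what it’s worth, the step count usually falls out of simple arithmetic, so here’s the back-of-envelope version in Python (all numbers are placeholder examples, not a recommendation):

```python
# Rough LoRA step math: steps per epoch = images * repeats / batch size.
num_images = 30   # hypothetical training-image count
repeats = 10      # times each image is seen per epoch
epochs = 10
batch_size = 2

steps_per_epoch = num_images * repeats // batch_size  # 150
total_steps = steps_per_epoch * epochs                # 1500
print(total_steps)  # lands at the low end of that 1500-3000 range
```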
I’ll be sure to come back and share findings once I have more. I think to really “do this right” you HAVE to train some of your own shit, and to do it well, as you’ve quickly realized, you’ve got to understand the methodology/philosophy behind it.
Well. Maybe scratch some of what I said above. As with many things, the answer is simply more complicated than that.
I found this video fairly useful in helping me understand the process. I hope it helps.
https://youtube.com/watch?v=EehRcPo1M-Q