I gathered 23 images for the prompt “Lestat from Queen of the Damned” with Stuart Townsend in them. I generated multiple images with the prompts Vampire, Damned Queen, Queen of the Damned, Lestat from Queen of the Damned, and Interview with the Vampire. I used the RealisticVision model because that is the model I will be using for the final composition. I extracted the faces using Faceswap.dev and then captioned the images using BLIP. I edited the caption text to cut down on bad prompts.
I could have pulled frames from actual video clips of Stuart Townsend’s face and picked out the best images, but who has time for that? I could also have included full-body images.
I used the learning rate schedule 0.005:100, 0.0025:250, 0.001:500, 0.0005:1000, 0.00025 for training, with a batch size of 1 and 2 gradient accumulation steps. I switched to the SD 1.5 EMA-only model for training. I am using a single RTX 3080 Ti on the Vast.ai cloud. Training used 11 GB of the card's 12 GB of VRAM. I was paying $0.278 an hour.
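For anyone unsure how to read a stepped learning rate string like that, here is a minimal sketch. It assumes every entry is a comma-separated `rate:last_step` pair, with a bare final rate that applies to all remaining steps. The function names `parse_schedule` and `rate_at` are made up for illustration, and whether the boundary step itself still uses the old rate is my assumption, so check your trainer.

```python
def parse_schedule(spec):
    """Turn 'rate:last_step, ..., rate' into a list of (rate, last_step) pairs.

    The final entry may omit the step, meaning "use this rate from here on".
    """
    schedule = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            rate, last_step = part.split(":")
            schedule.append((float(rate), int(last_step)))
        else:
            schedule.append((float(part), None))  # open-ended final rate
    return schedule


def rate_at(schedule, step):
    """Return the learning rate in effect at a given training step."""
    for rate, last_step in schedule:
        # Assumption: the boundary step still uses the earlier rate.
        if last_step is None or step <= last_step:
            return rate
    return schedule[-1][0]  # past the last listed step: keep the final rate


spec = "0.005:100, 0.0025:250, 0.001:500, 0.0005:1000, 0.00025"
schedule = parse_schedule(spec)

# Batch size 1 with 2 gradient accumulation steps gives an effective batch of 2.
effective_batch = 1 * 2

for step in (50, 200, 400, 800, 5000):
    print(step, rate_at(schedule, step))
```

Read this way, the rate starts high for the first 100 steps and is stepped down four times, ending at 0.00025 for everything past step 1000.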
I don’t know why, but on occasion a hallucination shows up in the training images: a morphed, abstract image that resembles a kaleidoscope. The majority of images have very distinct features such as blue eyes, a gaunt face, pale skin, sunken, dark reddish eyelids, and long, curly brown hair.
I could have upscaled the images before extracting the faces to reduce blur.