The model is part of the framework. It typically expects an input image and a driving video, both resized to 256x256 pixels, to perform its animation tasks. (See "Questions about the pre-trained models of vox #127" on GitHub.)
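The 256x256 resizing mentioned above can be illustrated with a small, dependency-free sketch. It uses nearest-neighbour sampling on a frame stored as a list of pixel rows; the function name `resize_nearest` and the synthetic test frame are assumptions made for illustration, not part of the framework, and a real pipeline would more likely rely on an image library for this step.

```python
TARGET_SIZE = 256  # side length stated in the text above


def resize_nearest(frame, size=TARGET_SIZE):
    """Resize a frame (list of pixel rows) to size x size by nearest-neighbour sampling."""
    src_h = len(frame)
    src_w = len(frame[0])
    resized = []
    for y in range(size):
        # Map the target row back to the closest source row.
        src_y = min(src_h - 1, y * src_h // size)
        row = frame[src_y]
        # Map each target column back to the closest source column.
        resized.append([row[min(src_w - 1, x * src_w // size)] for x in range(size)])
    return resized


# Hypothetical 480x640 grayscale frame whose pixel value encodes its row index.
frame = [[y % 256 for _ in range(640)] for y in range(480)]
small = resize_nearest(frame)
print(len(small), len(small[0]))  # 256 256
```

The same function would be applied to the source image and to every frame of the driving video, so that both inputs share the resolution the model expects.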
Unlike the standard vox-cpk.pth.tar model, which is trained for 100 epochs without a discriminator, the vox-adv-cpk.pth.tar version is fine-tuned for an additional 50 epochs using an adversarial discriminator.
Loading vox-adv-cpk.pth.tar:
    import torch

    from model_definition import VoxAdvModel  # assuming VoxAdvModel is defined in model_definition.py

    # Load the model weights from the checkpoint (the optimizer is not restored here)
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model = VoxAdvModel()
    checkpoint = torch.load('vox-adv-cpk.pth.tar', map_location=device)
    model.load_state_dict(checkpoint['state_dict'])  # assumes the weights are stored under 'state_dict'
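The loading snippet above reads the weights from `checkpoint['state_dict']`, but the key under which a checkpoint stores its weights varies between projects. A defensive lookup can be sketched as follows. The helper name `find_weights`, the candidate key names, and the stand-in dictionary are all assumptions for illustration; the actual keys should be confirmed by printing `checkpoint.keys()` on the real file.

```python
def find_weights(checkpoint, candidate_keys=("state_dict", "generator", "model")):
    """Return (key, weights) for the first candidate key present in the checkpoint."""
    if not isinstance(checkpoint, dict):
        raise TypeError("expected the checkpoint to be a dict of named entries")
    for key in candidate_keys:
        if key in checkpoint:
            return key, checkpoint[key]
    # Report what is actually in the file so the right key can be chosen by hand.
    raise KeyError(
        f"none of {candidate_keys} found; available keys: {sorted(checkpoint)}"
    )


# Stand-in for the dict returned by torch.load(...); real values would be tensors.
fake_checkpoint = {"generator": {"conv.weight": [0.0]}, "kp_detector": {}}
key, weights = find_weights(fake_checkpoint)
print(key)  # generator
```

With a real checkpoint, the returned `weights` would be passed to `model.load_state_dict(...)` in place of the hard-coded `checkpoint['state_dict']` lookup.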