Fast Audio Video World Models: Attempt 1 | OWL Blog