Generate videos from text prompts with the Sora API
A tiny vision-language model
Chat with images and get visual answers