Sora: The New Generation of Text-to-Video Models


By Aaron Jalteco


Over the past few years, Artificial Intelligence (AI) has undergone a rapid transformation in text and image generation, but it has lagged in video creation. Millions of people have undoubtedly come across their fair share of AI-generated content, some of it obvious, some of it subtle enough to be difficult to tell apart from reality.

In March of last year, an AI-generated video of Will Smith eating a plate of spaghetti went viral because of how crude it looked. Today, Sora can create videos that resemble our reality with eerie accuracy.

This breakthrough has prompted concerns from industry professionals and journalists, who warn that the technology could harm their industries and serve as a tool for propaganda. These concerns are not without merit: a number of companies have already cut their workforces to replace employees with AI tools.

In September 2022, Meta unveiled its text-to-video model, Make-A-Video; the following month, Google released Imagen Video. What followed was an arms race in generative video that culminated in 2023's infamous Will Smith spaghetti video. The general consensus was that the technology was little more than a novelty and that no professional would be threatened by such poor renderings.

Afterward, progress largely went quiet, with little shared publicly about the future of these models. That changed in February 2024, when OpenAI showcased Sora to the world. For the first time, AI could create videos that mimicked reality, blurring the line between fact and fiction. The online world was stunned, and many wondered whether Sora was an elaborate hoax.

It was true: OpenAI had solved the problems that plagued previous models. The videos were no longer marred by distortions; at worst they were slightly uncanny, and at best indistinguishable from reality.

This prompted a number of experts to speak openly about the negative implications of such a tool. With the advent of image and video generation, the photos and footage people once relied on to verify that something actually happened can no longer be taken at face value.

Fake bot-run accounts have been proliferating on social media platforms, partly because of AI tools capable of emulating humans. Naturally, text-to-video models hand potential bad actors yet another tool to use against the public.

In the weeks around this year's Super Bowl, numerous harmful AI-generated images of singer Taylor Swift spread across social media platforms. The public was quick to denounce the images and to highlight the dangers of AI tools in the public sphere. In May of last year, a bill to make the nonconsensual sharing of deepfakes illegal was introduced in Congress.

Beyond the ethical implications, there are economic ones. Large corporations have increasingly relied on AI to cut positions they deem redundant within their organizations, particularly in journalism.

There has been much debate over whether these tools are as helpful as their makers claim. Many budding professionals get their start in the industry by doing precisely the entry-level work that AI is now eliminating. This leaves open the question of what will happen to workers whose jobs have been replaced by AI, and to newcomers who can no longer find that first foothold.