With image generation becoming ever more present in our daily lives in constant, innovative ways, one might ask where it is headed, or how it all began. Gen Z is well aware of the role it plays in their lives: Snapchat and Messenger filters that map your face so you can send ridiculous photos to friends, character creators in video games, and even at-home sunglasses try-ons for the late-night shopper. Yuval Alaluf asked himself a similar question: where can image generation go? What he found was that an industry once thought of as dominant in its domain might soon face some real competition.
Yuval began studying AI and machine learning during the second year of his Bachelor’s degree at Tel Aviv University. His ML work started in biology: he co-wrote and helped develop a project that helped uncover malignant and dangerous tumors in the human body. This gave Yuval exposure to deep learning as well as computer vision. After the project was finished, his fascination only grew, and the idea of data as the driving force behind all AI and ML projects would fuel his later endeavors.
Yuval discussed what makes his work possible: GANs (Generative Adversarial Networks) and, in particular, StyleGAN. Like other GANs, StyleGAN produces high-quality, high-resolution images, but it adds a mapping network that transforms a random noise vector into an intermediate latent code, which then controls the style of the generated image. In most use cases, people who know StyleGAN use it to edit images. You can sample a random noise vector and generate an image from it. Once that is done, you can move the corresponding latent code in various directions to change attributes of the image, such as age or its overall look and design. StyleGAN therefore gives you not only high-quality images but also the ability to map and edit them as needed. The real goal, however, is to perform these edits on real images, not generated ones.
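To make the editing idea above concrete, here is a minimal pure-Python sketch of latent-space editing. It assumes a toy stand-in for StyleGAN’s mapping network (`mapping_network` is hypothetical; the real one is a learned network working on 512-dimensional codes) and a made-up `age_direction`. In practice, editing directions are learned or discovered from data, and a synthesis network would render the edited code into an image; that rendering step is omitted here.

```python
import random

LATENT_DIM = 8  # real StyleGAN codes are 512-dimensional; kept tiny here


def mapping_network(z):
    """Hypothetical stand-in for StyleGAN's learned mapping network (z -> w).

    Applies an elementwise leaky ReLU instead of a trained multi-layer network.
    """
    return [max(0.2 * v, v) for v in z]


def edit_latent(w, direction, strength):
    """Move latent code w along an editing direction: w' = w + strength * direction."""
    return [wi + strength * di for wi, di in zip(w, direction)]


random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # random noise vector
w = mapping_network(z)                                    # intermediate latent code

# Hypothetical "age" direction; real editing directions are learned from data.
age_direction = [1.0] + [0.0] * (LATENT_DIM - 1)

older = edit_latent(w, age_direction, 3.0)     # push the code toward "older"
younger = edit_latent(w, age_direction, -3.0)  # push the code toward "younger"
```

The same latent code can be pushed in either direction by flipping the sign of `strength`, and only the components along the chosen direction change.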
To get there, Yuval used GANs to build a network that takes a real image and finds the latent code that reproduces it, so that the image can then be edited; he called it ReStyle. Working on ReStyle, Yuval recognized two predominant patterns in existing approaches: they were either very accurate but too slow, or very fast but not accurate enough. Yuval and his team worked to tailor their StyleGAN-based approach to strike a balance between the two, which marked the beginning of his PhD work on face toonification.
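The speed/accuracy trade-off above, and one way around it, can be illustrated with a toy example loosely modeled on the iterative-refinement idea behind ReStyle: rather than predicting the latent code in a single pass, an encoder repeatedly looks at the target image and the current reconstruction and predicts a residual correction. Everything here (`generator`, `residual_encoder`, the 80% correction factor) is a hypothetical stand-in, not the actual ReStyle architecture.

```python
def generator(w):
    """Hypothetical toy generator: maps a latent code to an 'image' (list of pixel values)."""
    return [2.0 * wi + 1.0 for wi in w]


def residual_encoder(target, current):
    """Hypothetical toy encoder: given the target image and the current reconstruction,
    predict a correction to the latent code. It deliberately recovers only 80% of the
    exact correction, mimicking an imperfect single-pass encoder."""
    return [0.8 * (t - c) / 2.0 for t, c in zip(target, current)]


def invert(target, steps, dim):
    """Iteratively refine a latent code so that generator(w) approaches target."""
    w = [0.0] * dim  # start from an "average" latent code
    for _ in range(steps):
        reconstruction = generator(w)
        delta = residual_encoder(target, reconstruction)
        w = [wi + di for wi, di in zip(w, delta)]  # apply the residual correction
    return w


def reconstruction_error(target, w):
    """Largest absolute pixel difference between the target and the reconstruction."""
    return max(abs(t - r) for t, r in zip(target, generator(w)))


w_true = [0.5, -1.0, 2.0]
target = generator(w_true)  # the "real image" we want to invert

one_pass = invert(target, steps=1, dim=3)  # fast, less accurate
refined = invert(target, steps=5, dim=3)   # a few more passes, much more accurate
```

With these toy numbers, a single pass leaves a reconstruction error of about 0.8, while five refinement steps shrink it below 0.01, at the cost of five generator calls instead of one. That is the kind of middle ground between slow, accurate approaches and fast, one-shot encoding described above.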
Face toonification is the main focus of Yuval’s thesis work. As part of his studies, he has built AI-based models that can turn images into Disney-like animated characters in a matter of seconds. (See Yuval’s work in action on 2-Minute Papers.)
Yuval starts with a proof of concept (POC) on a small set of images to test whether the model performs to his specifications and does what he is asking of it. Once this works properly, he changes the images, or input data, as well as the model architecture, relaxing the assumptions about the data. Whenever Yuval does this, though, he emphasized that he does it one step at a time, to ensure “he’s learning and realizing what needs to happen” or what needs to be corrected based on the output. “When designing for toonification, it’s always a one step forward, two steps back type of exchange.” Yuval was humble in mentioning this, recognizing that toonification requires a lot of tooling, running experiments, analyzing results, and moving forward: it is never as simple as providing an input and expecting the perfect output.
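The one-step-at-a-time workflow described above can be sketched as a simple greedy loop: start from a working proof-of-concept configuration, try each change in isolation, and keep it only if the results do not get worse. The configuration keys and the `toy_evaluate` scoring function are hypothetical placeholders; in a real project, evaluation would mean training or running the model and inspecting its outputs.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def one_change_at_a_time(baseline: Config,
                         changes: List[Tuple[str, object]],
                         evaluate: Callable[[Config], float],
                         tolerance: float = 0.0):
    """Apply candidate changes one at a time, keeping each only if the score
    does not drop by more than `tolerance`. Returns the final config and a log."""
    config = dict(baseline)
    best = evaluate(config)
    log = []
    for key, value in changes:
        trial = dict(config)
        trial[key] = value  # change exactly one thing
        score = evaluate(trial)
        kept = score >= best - tolerance
        log.append((key, value, score, kept))
        if kept:
            config, best = trial, score
    return config, log


def toy_evaluate(cfg: Config) -> float:
    """Made-up scoring, purely for illustration."""
    score = 0.5 + 0.001 * min(cfg["num_images"], 300)  # more data helps, up to a point
    if cfg["resolution"] > 512 and cfg["encoder"] == "small":
        score -= 0.3  # a small encoder struggles at high resolution
    if cfg["encoder"] == "large":
        score += 0.05
    return score


poc = {"num_images": 20, "resolution": 256, "encoder": "small"}  # proof of concept
planned = [("num_images", 1000), ("resolution", 1024), ("encoder", "large")]
final_config, history = one_change_at_a_time(poc, planned, toy_evaluate)
```

Because changes are kept greedily, the order in which they are tried matters: here the higher resolution is rejected, even though it might work once the larger encoder is in place. Revisiting rejected changes like that is one way the "one step forward, two steps back" pattern shows up in practice.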
Yuval and his research team do not take any artistic liberties when working on the generated images. As stated before (and shown in the full 2-Minute Papers video), StyleGAN generates the images shown, and his team then takes StyleGAN models that know how to generate toons based on the features of the images they take in. By spanning an array of similar images, and by pairing the same face encoder with the toon StyleGAN, the model can translate between real images and toons.
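The translation between real images and toons can be sketched as a two-stage pipeline: encode a real face into the latent space of the face generator, then decode that same latent code with a toon generator whose latent space is aligned with it (for example, because it was fine-tuned from the face model). All three functions below are hypothetical toy stand-ins operating on short lists of numbers, not Yuval’s actual models.

```python
def face_generator(w):
    """Hypothetical toy face generator: latent code -> 'photo' (list of pixel values)."""
    return [2.0 * wi + 1.0 for wi in w]


def toon_generator(w):
    """Hypothetical toy toon generator sharing the face generator's latent space.
    It renders the same code but snaps values to whole numbers, standing in
    for the flat shading of a cartoon style."""
    return [float(round(2.0 * wi + 1.0)) for wi in w]


def face_encoder(image):
    """Hypothetical toy encoder: inverts the face generator (photo -> latent code)."""
    return [(p - 1.0) / 2.0 for p in image]


def toonify(real_image):
    """Encode with the face encoder, then decode with the toon generator."""
    return toon_generator(face_encoder(real_image))


real_photo = face_generator([0.3, -0.7])
toon = toonify(real_photo)
```

Because both generators read the same latent code, the toon output keeps the overall structure of the input while changing its style; the encoder never has to be retrained for the toon domain in this toy setup.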
Needless to say, Yuval was ecstatic that his work was featured on 2-Minute Papers! As a longtime viewer of the channel, he said the feature came as a total surprise. Yuval praised Károly Zsolnai-Fehér, creator of 2-Minute Papers, for his ability to explain complex processes to a general audience with nuance, ease, and clarity. When Károly reached out asking to do an overview of his work, “the answer was an immediate yes!”
Yuval mentioned that image generation at the moment focuses primarily on GANs alone. He believes that in the near future we will see more from the video domain, such as encoding and editing videos, applied to image generation as well.
We already see GANs being used in the film industry: VFX and CGI teams alike have applied the technology across various narrative storylines and studios. Though this integration is only the start of what the future holds, it is not yet practical as a solution for everyday use by the average consumer. Yuval believes that this type of innovation, along with a deeper dive into augmented reality and video games, will shape the future of image generation.
About the Metabob Podcast series:
This new weekly series will take a deep dive into the tech ecosystem, putting real people and explanations behind some of the common household ideas and misconceptions about the tech industry.
Metabob's Website: https://www.metabob.com
Metabob on LinkedIn: https://www.linkedin.com/company/metabob
Metabob on Twitter: https://twitter.com/Metabob_App
Guest(s) Featured in this Episode