Synthesis AI, a startup developing a platform that generates synthetic data to train AI systems, today announced that it raised $17 million in a Series A funding round led by 468 Capital with participation from Sorenson Ventures, Strawberry Creek Ventures, Bee Partners, PJC, iRobot Ventures, Boom Capital, and Kubera Venture Capital. CEO and cofounder Yashar Behzadi says that the proceeds will be put toward product R&D, growing the firm’s team, and expanding research, particularly in the area of mixed real and synthetic data.
Synthetic data, or data that’s created artificially rather than captured from the real world, is coming into wider use in data science as the demand for AI systems grows. The benefits are obvious: While collecting real-world data to develop an AI system is costly and labor-intensive, a theoretically infinite amount of synthetic data can be generated to fit any criteria. For example, a developer could use synthetic images of cars and other vehicles to develop a system that can differentiate between makes and models.
Unsurprisingly, Gartner predicts that 60% of the data used for the development of AI and analytics projects will be synthetic by 2024. One survey called the use of synthetic data “one of the most promising general techniques on the rise in [AI].”
But synthetic data has limitations. While it can mimic many properties of real data, it isn’t an exact copy. And the quality of synthetic data is dependent on the quality of the algorithm that created it.
Behzadi, of course, asserts that Synthesis has taken meaningful steps toward overcoming these technical hurdles. A former scientist at government IT services firm SAIC and the creator of PopSlate, a smartphone case with a built-in E Ink display, Behzadi founded Synthesis AI in 2019 with the goal of, in his words, “solving the data issue in AI and transform[ing] the computer vision paradigm.”
“As companies develop new hardware, new models, or expand their geographic and customer base, new training data is required to ensure models perform adequately,” Behzadi said. “Companies are also struggling with ethical issues related to model bias and consumer privacy in human-centered products. It is clear that a new paradigm is required to build the next generation of computer vision.”
In most AI systems, labels, which can come in the form of captions or annotations, are used during the development process to “teach” the system to recognize certain objects. Teams normally have to painstakingly add labels to real-world images, but synthetic data tools like Synthesis’ in theory eliminate that need, because the generated images come with their labels attached.
Synthesis’ cloud-based platform allows companies to generate synthetic image data with labels using a combination of AI, procedural generation, and VFX rendering technologies. For customers developing algorithms to tackle challenges like recognizing faces and monitoring drivers, for instance, Synthesis generated roughly 100,000 “synthetic people” spanning different genders, ages, BMIs, skin tones, and ethnicities. Through the platform, data scientists could customize the avatars’ poses as well as their hair, facial hair, apparel (e.g., masks and glasses), and environmental aspects like the lighting and even the “lens type” of the virtual camera.
“Leading companies in the AR, VR, and metaverse space are using our diverse digital humans and accompanying rich set of 3D facial and body landmarks to build more realistic and emotive avatars,” Behzadi said. “[Meanwhile,] our smartphone and consumer device customers are using synthetic data to understand the performance of various camera modules. Several of our customers are building a car driver and occupant sensing system. They leveraged synthetic data of thousands of individuals in the car cabin across various situations and environments to determine the optimal camera placement and overall configuration to ensure the best performance.”