How AI Happens

Unity SVP of AI Danny Lange: The Industrial Metaverse

Episode Summary

Danny Lange, the Senior Vice President of AI at Unity, discusses simulation, synthetic data, and the opportunity of the Industrial Metaverse.

Episode Notes

Leading AI companies are adopting simulation, synthetic data and other aspects of the metaverse at an incredibly fast rate, and the opportunities for AI/machine learning practitioners are endless. Tune in today for a fascinating conversation about how the real world and the virtual world can be blended in what Danny refers to as “the real metaverse.”

Key Points From This Episode:

Tweetables:

“When you play a game, I don’t need to know your name, your age. I don’t need to know where you live, or how much you earn. All that really matters is that my system needs to learn the way you play and what you are interested in in your gameplay, to make excellent recommendations for other games. That’s what drives the gaming ecosystem.” — @danny_lange [0:03:16]

“Deep learning embedding is something that is really driving a lot of progress right now in the machine learning AI space.” — @danny_lange [0:06:04]

“The world is built on uncertainty and we are looking at simulation in an uncertain world, rather than in a Newtonian, deterministic world.” — @danny_lange [0:23:23]

Links Mentioned in Today’s Episode:

Danny Lange on LinkedIn

Unity

Episode Transcription

EPISODE 43

"DL: When you move into synthetic data, you have the opportunity to correct those biases. The point here is that it's also now your responsibility. You generate the data that you train your system with. You can't blame a labeling company. You can blame the photographer who went out and took pictures on the street of people or whatever. You can't blame anyone. You have to produce the synthetic data. You have to understand."

[00:00:31] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers, as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn How AI Happens.

[00:01:00] RS: Joining me today on How AI Happens is the Senior Vice President of Artificial Intelligence over at Unity, Danny Lange. Danny, welcome to you. How are you today?

[00:01:08] DL: I am good. What about you?

[00:01:10] RS: I am fantastic. Just podcasting my heart out per usual, and excited about this conversation, because Unity is a company I know decently well in other capacities, but I've never really gotten to get under the AI hood of Unity. I'm excited to do some of that with you here today. Before we get to all that, though, would you mind sharing with the folks out there in podcast land a little bit about your background and how you wound up in your current role at Unity?

[00:01:33] DL: Absolutely, yeah. I have been at Unity for almost six years. But I think it's important to get a context of what I did before I came to Unity, because it's highly relevant for what I'm doing at Unity, which is not as much in gaming as in your classical enterprise use cases of machine learning and AI. Prior to Unity, I was head of machine learning at Uber. I actually ran the team that built the Michelangelo platform at Uber. It was a unified machine learning platform to be used across all of Uber. The reason I came to Uber was that I had done exactly the same for Amazon. I ran the machine learning platform team called Elastic Machine Learning at Amazon, which essentially ran the unifying machine learning platform for over 100 teams at Amazon. We also built a version of that machine learning platform, and I launched it as the first AI service on AWS, called Amazon Machine Learning. Prior to that, I ran the machine learning toolkits team at Microsoft. You can see: a long, very strong enterprise background in machine learning and in the application of AI. That is, in many ways, very different from what you see with a game engine like Unity.

[00:02:48] RS: Looking across Unity, Uber, Microsoft and Amazon, very different products, very different use cases, at least at the outset. What were the commonalities in the machine learning challenge that have allowed you to lend your expertise to all these different companies?

[00:03:02] DL: Yeah, awesome question, because there's an answer and it's very specific: it's data. It's data that goes across all of this. One of the reasons I moved from Microsoft to Amazon was that Amazon had basically much more interesting data at that time than what Microsoft had. Then, in my move to Uber, the data changed character. It was not about recommendations, it was not about safety of products and that kind of thing, as it was at Amazon. It was about moving vehicles around, making sure that you predict needs and get the vehicles in the right places at the right time. It was things like detecting dangerous driving, etc.

Then, in the shift to Unity, it was the observation that at Unity, data comes almost for free. It's part of gaming that you're sort of relieved from the real world, and you can simulate, and you can play games, and you can generate vast amounts of data. There's really a common theme here, which is not surprising to machine learning and AI practitioners: it's the data that drives it all.

[00:04:04] RS: Because the data is coming from the activity in games, is it immune to some of the privacy concerns that people have about other data collection methods?

[00:04:14] DL: Yeah, it's behavior-driven. When you play a game, I don't need to know your name, your age. I don't need to know where you live, how much you earn. All that really matters is that my system needs to learn and recognize the way you play and what you're interested in in your gameplay, to make excellent recommendations for other games. That's what actually drives the gaming ecosystem. We have way over three billion monthly players playing games on the Unity platform. We don't develop games. The games are being offered up by studios. What we do is that we sort of bring studios and players together. In one way, you can say, well, my behavior is private too, but it's not you we pinpoint. It's the way you play we pinpoint. In that sense, we are very different from, let's say, a lot of social media companies out there; they are really based on knowing a lot more about you as a person.

[00:05:16] RS: Yes, and you rattled off a couple examples of that: where I live, how much money I make, what my job is, those sorts of things. Any insight based on those is going to be the result of assumptions the tool has made based on my categorization. That's a lot different than my behavior, right? Because you are the actions you take; you're not just a list of statistics about the things in your life. Your behavior in a game feels much more indicative of your future behavior than, say, a list of prior statistics. In that way, do you think it can be more accurate?

[00:05:47] DL: What we do, let me explain this a bit, because it's very technical. We create, basically, deep learning embeddings of player behavior. You play a game; it may be a shooter game, or it may be a word game, or it may be a puzzle game. If you try to represent all these games as deep-learned embeddings, you're going to belong to different parts of the universe. If you consistently play single-player shooter games or multiplayer games, then I'm not going to have a whole lot of interest in trying to get you into the word game group. That's a very different audience. In that sense, we can really dive in, and you can look at subgroups: there are certain word games, or certain puzzle games, you like more than others, and you play more than others. We're basically learning that about you, yet never really – we actually don't even put an identifier on you. It doesn't matter. You come in, you play the game, we can recognize which group you belong to. I think that machine learning folks out there will understand that there's an abstraction here. It's not an identifier. It's not your phone that matters. It's not telling us who you are. It's your behavior that tells us which group you belong to.
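To make the grouping step concrete, here is a minimal, hypothetical sketch: anonymous play-session embeddings are clustered, and a new session is assigned to a behavioral group without any identifier. The encoder that produces the embeddings is assumed to already exist; random 64-dimensional vectors simply stand in for its output.

```python
# Hypothetical sketch: group anonymous play sessions purely by a learned
# behavior embedding, with no player identifiers involved.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for the output of a trained behavior encoder:
# one 64-dimensional embedding per anonymous play session.
session_embeddings = rng.normal(size=(10_000, 64))

# Group sessions into behavioral clusters ("shooter fans", "word-game fans", ...).
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit(session_embeddings)

# A new, unidentified session is assigned to a group; recommendations can be
# driven by the group, not by who the player is.
new_session = rng.normal(size=(1, 64))
group = int(clusters.predict(new_session)[0])
print(f"Session belongs to behavioral group {group}")
```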

[00:06:59] RS: Could you speak a little more about the deep learning embeddings you use to classify people?

[00:07:04] DL: Yeah, deep learning embeddings. Something that is really driving a lot of progress right now in the whole machine learning and AI space is the ability to somehow have a deep learning network summarize who you are, or the shape of an asset, or words in sentences. It's that whole idea of being able to learn a representation, an abstract representation of some input, that has been really surprisingly efficient in many, many domains. We have seen it in recommendation systems. We have also seen it in a lot of graphical understanding, which is highly relevant for Unity. Let me give an example: "If I have a cube, no matter which position you see the cube from, it is still a cube."

What you can do with the deep-learned embedding is to actually train a system to recognize cubes at any angle, so it doesn't matter how you rotate it, it still results in the same embedding. That would, for instance, be a rotation-agnostic embedding for graphical objects, so you can recognize cubes, balls, and so on, and you can recognize various kinds of objects like chairs, and tables, et cetera. It's an extremely powerful concept, whether that is for recommendation engines, for computer vision, or even natural language.
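Here is a hedged, toy sketch of that training objective (not Unity's implementation): two randomly rotated views of the same object are pushed toward the same embedding. Random tensors stand in for rendered views, and a real recipe would also push apart embeddings of different objects (a contrastive term) so the encoder does not collapse.

```python
# Toy sketch of a rotation-agnostic embedding objective: two rotated views of
# the same object should map to (nearly) the same vector. Random tensors stand
# in for rendered views of the objects.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                     # tiny image encoder -> 128-dim embedding
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(100):
    # Stand-ins for two differently rotated renders of the same batch of objects.
    view_a = torch.rand(16, 3, 64, 64)
    view_b = torch.rand(16, 3, 64, 64)

    emb_a = F.normalize(encoder(view_a), dim=1)
    emb_b = F.normalize(encoder(view_b), dim=1)

    # Pull the two views of each object toward the same embedding
    # (a full recipe would add negatives so embeddings do not collapse).
    loss = (1 - (emb_a * emb_b).sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```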

[00:08:30] RS: The alternative to rotation-agnostic training, the way you're describing it, was, I guess, capturing photographs and continually retraining a learner to say, "Okay. Even from this angle, that's a car." This, to me, speaks to the opportunity of simulation, right, of being able to create real-world objects in a simulated environment in which to train learners. Is this a huge push for Unity?

[00:08:52] DL: This is a very, very important push from Unity. That is really, at the core, what attracted me to Unity. I mentioned data, yeah? The fact that you can create a 3D object in Unity, and you can rotate that object as crazy as you want, you can change the lighting conditions, you can change the background, you can even put something in the foreground. Say there is a car, yeah? It's still a car; you know it's a car. Your training data, whether you produce a thousand, a hundred thousand, or a billion frames or images of that car, from all kinds of angles, in all kinds of weather, in all kinds of lighting conditions, etc., is an extremely low-cost, very scalable, and perfectly labeled training set for your machine learning system. That is really attractive, using a game engine for that purpose.
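As an illustration of that generation loop, here is an engine-agnostic sketch in which render_scene is a placeholder for whatever renderer produces the frames; every sample carries a perfect label and full metadata because the generator chose the scene parameters. The parameter names and categories are illustrative assumptions, not a specific Unity API.

```python
# Engine-agnostic sketch of the randomization loop: every frame's label and
# metadata come straight from the generator, no human annotators involved.
# render_scene() is a placeholder for a real renderer (e.g., a Unity scene).
import random

def render_scene(params):
    """Placeholder: a real implementation would return rendered pixels."""
    return f"<image of a {params['object']} at {params['rotation_deg']:.0f} deg>"

def generate_dataset(n_samples):
    dataset = []
    for _ in range(n_samples):
        params = {
            "object": "car",                                  # the thing being labeled
            "rotation_deg": random.uniform(0, 360),           # arbitrary viewpoint
            "lighting": random.choice(["noon", "dusk", "overcast", "night"]),
            "weather": random.choice(["clear", "rain", "fog", "snow"]),
            "background": random.choice(["street", "parking_lot", "highway"]),
        }
        image = render_scene(params)
        dataset.append({"image": image, "label": params["object"], "meta": params})
    return dataset

samples = generate_dataset(100_000)   # scale is limited only by compute, not labeling
```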

[00:09:48] RS: Yeah, of course. It would need to be synthetic data in this case, correct?

[00:09:53] DL: Yeah. We talked about synthetic data here. I think it's important to understand that it's a fairly new concept, and a very important concept, because synthetic data still looks more or less like the real world. But it is synthetic; it comes out of a game engine, yeah. It has a lot of the properties I just mentioned: you have the 3D shapes, you have lighting conditions, but you also have additional things like physics. Things will drop and bounce off the floor. Things will be heavy or light. You will have movement. If you think about it, you're now able to put in NPCs, non-playable characters, that is, humanoids. You can have humanoids walking around. You can train a computer vision system for an autonomous forklift to avoid hitting people, but you don't use real people. You use humanoids in a game engine, and they will step in front of that autonomous forklift a million times and they will be hit. But yeah, nobody will ever be killed; it's all virtual.

That's the whole idea of synthetic data: without putting real humans in danger, you can produce data in situations that you never want to put a real human in. You can also, going into things like bias, now basically train your system on equal amounts of all kinds of skin tones, the whole range of skin tones that we find in the human population. You can dress humans with a variety of cultural and specific clothing, so that, in the end, the autonomous forklift will work anywhere in the world with all kinds of different-looking people, whether that's skin tone, hair, clothing. You're in control of it with synthetic data. You're not just taking a snapshot of the real world, which more often than not turns out to be very biased training data.

[00:11:50] RS: That's kind of where I wanted to go next with this. The obvious criticism, and forgive me if it's naïve, is that, how can you be sure synthetic data represents a one-to-one reproduction of real-world environments? How do you investigate that problem with Unity?

[00:12:05] DL: First, I'm going to use a counterargument. The counterargument is that with real-world data that you hand-label, we know that there's a risk in that labeling. It's not always perfect. There's also this challenge of actually understanding the outcome, the statistics of real-world data. It requires a lot of analysis, and there may be metadata that you just don't have. When you move into synthetic data, you have the opportunity to correct those biases. The point here is that it's also now your responsibility. You generate the data that you train your system with. You can't blame a labeling company. You can't blame the photographer who went out and took pictures on the street of people or whatever. You can't blame anyone. You have to produce the synthetic data. You have to understand.

One of the ways we do that at Unity is to ensure that we hire extremely diverse teams that can bring that aspect in and say, "Hey! Have you thought about this?" It may be the way that people dress; it may be things that people do on the sidewalk; or maybe there are no sidewalks. There are many parts of the world that don't have sidewalks, so you need training data of people walking on the street, et cetera. It's now your responsibility to make the data better than the real-world data. That, we cannot easily solve for you, but we can give you the tools and, in particular, the statistics behind the data produced, because synthetic data is synthetic, so you have all the metadata available. You know exactly: Did I cover all skin tones? Did I cover people walking, running? Did I cover people of small sizes and tall people? Did I cover a variety of body types, et cetera?
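A small sketch of the kind of coverage audit that synthetic metadata makes possible; the attribute names and categories here are illustrative, not a Unity schema.

```python
# Sketch of a coverage audit over synthetic-data metadata. Attribute names and
# categories are illustrative only.
from collections import Counter

# Each synthetic sample carries the metadata it was generated with.
samples = [
    {"image": "<frame>", "meta": {"skin_tone": "type_2", "motion": "walking"}},
    {"image": "<frame>", "meta": {"skin_tone": "type_5", "motion": "running"}},
    # ... millions more in practice
]

def audit_coverage(dataset, attribute, required_values):
    counts = Counter(s["meta"].get(attribute) for s in dataset)
    missing = [v for v in required_values if counts[v] == 0]
    return counts, missing

counts, missing = audit_coverage(
    samples,
    "skin_tone",
    required_values=["type_1", "type_2", "type_3", "type_4", "type_5", "type_6"],
)
print("Coverage:", counts)
print("Missing or uncovered:", missing)   # regenerate data until this is empty
```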

[00:13:57] RS: Do you think synthetic data will replace real-world data as the primary training mechanism?

[00:14:02] DL: Yes, I absolutely think so. It does come with one note, one caveat, which is: for me to know whether my synthetic data is doing a good job, I need to have a baseline. I cannot evaluate my synthetic data up against synthetic data. It makes no sense. There will still be a need for good data collection of real-world data that you use as a baseline. It can be your golden data set that you compare up against, because at the end of the day, these systems are going to have to be deployed to the real world to be useful. That is where your machine learning model, trained on that synthetic data set, meets reality.
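A minimal, runnable sketch of that principle: train on synthetic data, but report performance only against a fixed real-world "golden" set. Random arrays stand in for both datasets here; only the workflow is the point.

```python
# Sketch of the workflow only: train on (cheap, plentiful) synthetic data, but
# report performance against a fixed real-world "golden" set. Random arrays
# stand in for both datasets here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

X_synth, y_synth = rng.normal(size=(50_000, 20)), rng.integers(0, 2, 50_000)   # synthetic
X_real, y_real = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)           # golden set

model = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)

# The number that matters: performance on real-world data,
# never synthetic evaluated against synthetic.
print("Real-world accuracy:", accuracy_score(y_real, model.predict(X_real)))
```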

[00:14:42] RS: You have real-world data as a foundation, a springboard upon which to layer synthetic data, perhaps, right? How frequently do you go back to real-world data? Is it a one-time thing? Is it periodic throughout the process? How often is it important to anchor your synthetic data to its real-world counterpart?

[00:15:01] DL: It requires a nuanced answer. The baseline is that you shall never ignore your real-world data, because that is your baseline. You don't know anything if you don't know your real-world data. You need to keep checking your performance. For instance, you're working on training a model based on synthetic data. You need to evaluate it. In every iteration, you need to evaluate it up against your real-world data. But I also want to put another perspective in here, which is that, to really make progress in the space of AI, to really build very large machine learning models that do the magic we have seen some glimpses of in recent years, you need vast amounts of training data.

I have experience over the years; I've been doing this for longer than I want to spell out loud. I've done this for a long time. What I have noticed is that, just as chip makers have Moore's Law, which says that every 18 months the density of transistors on a chip doubles, I think I have seen a similar law for data in machine learning and AI, which is that about every 18 months, the amount of data doubles. It goes up, but the real world is going to run out of data beyond the baseline level. If you want to run extensive simulations, if you want to get close to AGI, Artificial General Intelligence, you're going to need trillions, and more trillions, of frames of training data to cover that level of general intelligence. Synthetic data is absolutely the key to that. We have DeepMind in London; they're using Unity. They generate incredible amounts of training data for their systems.

With OpenAI using Unity, the Allen Institute using Unity, all these leading AI institutes or companies are using Unity because they can create more data at a lower cost. When I say more data, it's more data than they could ever generate from the real world. Yeah.

[00:17:11] RS: This data is to be used in simulation, and simulation these days has a new name: the metaverse. That strikes me as a key opportunity of the metaverse. I think there's a short-sighted, or perhaps narrow-minded, view of the metaverse, that it's going to be a more immersive Second Life, or The Sims, or just another way to be on Facebook. There's much more to the equation. How is Unity, with this anchor as a gaming company, approaching the metaverse and wrapping in some of these simulation use cases?

[00:17:44] DL: Yeah. I love to talk about the industrial metaverse, but let's take the lowest common denominator of the metaverse first. Some people have introduced the world to the metaverse as having cartoonish avatars sitting around a table like an expanded Zoom meeting. I'm like, sure, that's easy to imagine. But what is way more interesting is when you take it a few leaps further, yeah? Think about having a virtual, simulated world living side by side with the real world. You can say, sort of, we have a digital twin of the real world, or maybe just a digital twin of how the real world is going to be one day. Think about creating an environment in a game engine like Unity. Let's take a sort of industrial example. It can be a manufacturing floor with conveyor belts and robots moving around. I would build that first in Unity. I would have the mobile robots bringing parts in, I would have robotic arms that load them onto a conveyor belt, it flows down, something is being built, and there's an output.

I would simulate that in Unity. Unity has motion, high frame rates, gravity from the physics engine. You have the whole 3D world there. You have all the complexities. You can put humanoids in there that need to cross through the room to get to a virtual bathroom, and ensure that the robots can still move around and not get stuck because humans get in the way. You build that system, you simulate it, you learn from it, you move conveyor belts around, you test throughputs, you see if you can optimize. You may even use reinforcement learning and other techniques to see if a system, an AI system, can train in that environment to operate better, move things around, optimize for you. When you're done, you build that factory floor in the real world. Then what do you do? You put IoT and sensors everywhere in there. Those sensors feed data right back into your simulation, yeah?

Now, you have what I call the industrial metaverse, because now I can see if my simulation runs as expected in the real world. I can play around with scenarios. I may learn something, make my simulation more and more accurate because of the data flow coming back in. There's a data flow going back out, which is, "Hey! I found out that if I move this conveyor belt a bit or if I speed it up, things go better." Let's do that in the real world. Then immediately, we'll see the impact of that.

That metaverse is, in my opinion, a real metaverse, because the real world and the virtual world blend, and they are connected by data flows, things like IoT and 5G networks; all of that technology comes into play. We are able to do stuff in the real world much more efficiently, because we can play with it, experiment with it, and learn from it in the virtual world first. That, to me, is the real metaverse.
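An illustrative sketch of that closed loop (not a Unity API): real sensor readings calibrate an uncertain simulation parameter, and the calibrated twin is then used to test a change virtually before it is pushed back to the real floor. The throughput model and parameter names are invented for the example.

```python
# Illustrative closed loop for a factory-floor digital twin. The throughput
# model and parameter names are invented for the example.
def simulate_throughput(belt_speed, friction):
    """Placeholder for a full simulation; returns predicted parts per hour."""
    return belt_speed * 120.0 * (1.0 - friction)

def calibrate(real_throughput, belt_speed, friction, lr=0.5, steps=50):
    # Nudge the uncertain simulation parameter until the twin matches the sensors.
    for _ in range(steps):
        error = simulate_throughput(belt_speed, friction) - real_throughput
        friction += lr * error / 1000.0
    return friction

# Inbound data flow: IoT sensors on the real conveyor report ~540 parts/hour
# at belt speed 5.0, so calibrate the twin against that measurement.
friction = calibrate(real_throughput=540.0, belt_speed=5.0, friction=0.02)

# Outbound data flow: test a faster belt in the calibrated twin before touching
# the real hardware.
print("Predicted throughput at speed 6.0:", simulate_throughput(6.0, friction))
```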

[00:20:44] RS: Very well said, and it gets more interesting the more you extrapolate this. You're limited only by your imagination here in what you can simulate. This notion of modeling, building something in a simulation, trying to understand how it's going to work before building it in the real world, has been the domain of aeronautical engineering, mechanical engineering, lots of engineering, for a long time. What is the difference now? What is the new opportunity that Unity presents, and simulation technology in general, as it becomes more advanced?

[00:21:13] DL: Yeah. Full respect. When Boeing builds an airplane, and then they and their partners build the flight simulator, it's going to be very accurate. An airman from the Air Force once told me that, with the flight simulator they get from Boeing, Boeing would make sure that they didn't crash their plane, period. Yeah. However, when they're flying missions, they always fly with at least one wingman, so that leaves two planes up there. There's AWACS surveillance, there are satellites, there are troops on the ground. There are other kinds of intelligence. The simulator they actually need is one that creates a whole, complete scenario around them to train the pilot. That was a defense example, but I think it's sort of easy to understand the complexities that a single perfect flight simulator does not satisfy.

Then you just think, but gaming, isn't that the point of gaming? Aren't we having multiplayer games where we can do this? If you take a step back: yes, if you want a simulator for building a skyscraper, and you need to be able to simulate the structures, and you need to make sure they can withstand an earthquake, et cetera, we need those simulators. But that's not what I'm talking about here. When you look at a game engine like Unity for simulation, it has a visual component. I mean, we are designed to thrill players. There's a high visual quality. Well, that visual quality is really beneficial, also, if you want to simulate some areas of computer vision. We have the physical component; I mentioned, we have the physics engine.

If a robot in a simulation doesn't hold on to something, it will drop to the floor. It's a real simulation. There are cognitive aspects when you have a game engine. There are situations that need to be solved, and potentially, you can even have social aspects: you can even simulate human interaction, or you can actually have human players interact with your system. When you look at game engines, they take simulation from that strict mechanical engineering closer to the real world and real-world problems. The real world is not as clean as double-precision floating points in a simulation for construction or in a simulation for an airplane.

But in the real world, those precisions actually never matter. It's a lot of other things that matter. The Unity engine is perfect for representing real-world scenarios with real-world uncertainties. I have many times mentioned that we love to think of the world as being Newtonian, that, as Newton said, I can compute everything. But we have all come to realize that Heisenberg is actually more right: we know very little. There is uncertainty everywhere, and the world is built on uncertainty. I think that what we're looking at here is simulation in an uncertain world, rather than in a Newtonian, deterministic world.

[00:24:36] RS: If I'm an AI practitioner listening to this, what do you think is the opportunity at hand? How should I be thinking about the way that the industrial metaverse, the metaverse writ large, and advanced simulation ought to be impacting my career?

[00:24:49] DL: I think it's a huge opportunity. I want to give you an example. During the pandemic, we saw a huge uptick in robotics practitioners reaching out to Unity. During the pandemic, the labs were shut down. I know a lot of people say, "Yeah, I can work from home." But for some people in the robotics space, it meant that they couldn't get near the robots. So what were they going to do? What they found out is that Unity's engine is a perfect forum for playing with robots. You can play faster, you can play with more, and you don't break anything, at least not in the real world. I think that, if you're an AI or machine learning practitioner, you are always limited by access to data.

We have a few big tech companies; they have lots of data, and they can do cool stuff with it. But for the rest of us, data is the challenge. What you can do in Unity is create these scenarios. It can be a robotic arm, where you want to use reinforcement learning to train a model that would allow this robotic arm to be very resilient and pick up objects from wherever the object is put on the table. Doing that with a physical robot is very, very difficult. But you can actually generate the millions and millions of training scenarios that you need to build your machine learning model in Unity, at very low cost, and you can do it at very high speed. We have one product, which is called Unity Simulation Pro, which uses GPUs to speed up the data generation. I can suddenly generate thousands of training scenarios in a day on a desktop computer. It could be optimizing traffic in a city, or it could be ensuring that everybody can get out of a building in case of a fire. You can simulate that, and you can simulate it at high speed. You can do it with available standard equipment. I mean, that empowers machine learning and AI practitioners to a level that we have never seen before.
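A minimal, hedged sketch of generating many training scenarios by rolling out episodes in a simulated environment. CartPole is used only because it ships with Gymnasium; a robotic-arm scene built in Unity and exposed through a gym-style wrapper could plug into the same loop, which is an assumption about the setup rather than a specific product feature.

```python
# Sketch of generating many training scenarios by rolling out episodes in a
# simulated environment. CartPole is used only because it ships with Gymnasium;
# a Unity-built robotic-arm scene exposed through a gym-style wrapper could
# plug into the same loop (an assumption about the setup, not a product claim).
import gymnasium as gym

env = gym.make("CartPole-v1")
episodes = []

for episode in range(1_000):                         # thousands of scenarios, cheaply
    obs, info = env.reset(seed=episode)              # each reset is a fresh scenario
    trajectory, done = [], False
    while not done:
        action = env.action_space.sample()           # placeholder policy; swap in an RL agent
        next_obs, reward, terminated, truncated, info = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        done = terminated or truncated
    episodes.append(trajectory)

env.close()
print(f"Collected {sum(len(t) for t in episodes)} transitions from {len(episodes)} episodes")
```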

[00:27:01] RS: How essential do you believe this experience to be if I'm looking down the road a few years? Will this be the kind of thing where you as a practitioner would say, "I have X experience simulating in the industrial metaverse," that this is an important skill set on the tool belt of an AI practitioner?

[00:27:18] DL: When I look at the leading AI companies out there, they are adopting simulation, synthetic data, and all these aspects of an industrial metaverse to just an incredible degree. It's moving so fast. When we first started talking about synthetic data a couple of years ago, people would look at us and be like, "What are you talking about?" Two years later, we have conferences on synthetic data. There are startups that get double-digit Series A funding in this space. I think that the whole industry has realized that working in simulations is a way to build highly, highly intelligent systems based on millions and millions of crazy scenarios that we could never create in the real world. Being able to use these new technologies in your AI and machine learning work, I think, is going to be crucial for the next few years.

[00:28:20] RS: The opportunity sounds clear to me, anyway. It's amazing work you're doing over there at Unity, so much more than a video game company. Danny, this has been a blast, sitting down and speaking with you. Thank you for sharing all the awesome work you're doing and your experience. I've loved learning from you today.

[00:28:36] DL: Thank you very much, Rob. I thought your questions were fantastic. I enjoyed it very much.

[00:28:47] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, ecommerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.