Host Rob May sits down with Matthew Mattina, Head of Arm’s Machine Learning Research Lab. Tune in to hear all about his work using the existing Arm designs to help figure out how to do more for customers, what they see in the market in terms of new non-digital technologies, things that need to be solved or built that would help with their goals, what’s missing from the ecosystem, and much more.
CEO and Co-Founder
Head of Arm’s Machine Learning Research Lab
Rob May: Hi, everyone, and welcome to the latest episode of AI at Work. I’m Rob May, the co-founder and CEO of Talla. I’m your host today. I’m here with Matthew Mattina, the head of Arm’s machine learning research lab, which is based here outside of Boston. So, Matthew, welcome to the podcast today. And why don’t you tell us a little bit about your background and how you got to Arm and what kind of stuff you work on there today?
Matthew Mattina: Thanks, Rob. Happy to be here. I came to Arm by way of a local startup called Tilera. It was a semiconductor chip company, and we built a large multi-core processor. Eventually we got acquired in 2014. After that, I decided to do something different, and I moved to Arm, actually not directly working on ML at the time, sort of looking at new business opportunities for Arm. That brought me pretty quickly to look at the ML space. Eventually we decided to kind of build up or found, basically, a research lab inside of Arm to understand ML. Ultimately we also founded a business unit inside of Arm to productize it.
RM: How much of your work at a place like the Arm machine learning research lab is on using the existing Arm designs to help figure out how to do more machine learning for your customers, and how much of it is figuring out how to incorporate new hardware concepts into your chips that speed up and accelerate artificial intelligence?
MM: That’s a good question. The truth is it’s both. About half of our work is around looking at how can we build better hardware for executing state-of-the-art machine learning applications, so things involving vision, object detection, speech translation. So, what should the hardware look like that can execute state-of-the-art CNNs, RNNs, whatever. Kind of a sub bullet to that is, what should the software kernels look like that execute them. There’s both the hardware piece of it, and then there’s the software library piece of it.
The other big thrust of our work is actually on the model design optimization. Let’s say we want to enable real-time translation on the device and the state-of-the-art models that do that maybe require many gigabytes of storage. In a constrained device like a mobile or even something small like a watch, you’re not going to have those resources. The big second part of our work is around actually how can we shrink models, how can we change model architecture, how can we find better model architectures that are more synergistic with mobile platforms, constrained platforms.
RM: Because you guys are going to typically run on smaller, more mobile types of devices for Arm applications, how do you break down the world between the types of machine learning use cases that can deal with the latency of sending things back to the cloud and everything else, and how much do you believe it’s going to be more like, no, the actual inference needs to happen on the device? What are you seeing in the market for the demand for those two types of structures?
MM: I’m glad you brought that up, because I actually should have made that distinction at the outset. Arm and my lab as well, we’re mostly focused on the inference piece of it, first of all. We don’t spend too much effort now on the training side.
RM: Right. Training on mobile devices for machine learning models is probably a long way off from us.
MM: It’s coming, actually, faster than you think. We have some interesting projects in this space around training at the edge. But yeah, so generally, we’re looking at inference. The key drivers for inference at the edge versus we’ll just ship it back in the cloud are kind of the obvious thing, so like latency. Often you care about latency, right, especially, like if I’m speaking Japanese, and you don’t speak Japanese, and we want some real-time translation between us, we probably don’t want to ship that to the cloud.
MM: So latency is one. Security is the other big one, right?
MM: People are, for good reason, uneasy about shipping stuff back to the cloud. Then even things like power, actually. Firing up the radio on a mobile device to ship those bits up is expensive, whereas running the compute locally might actually be more energy-efficient.
RM: Yeah, are there use cases where connectivity itself is sometimes a problem?
RM: Military or something?
MM: Or even just my house, right? So even there, my mobile will drop connectivity. Always-on internet is still not always on.
RM: There’s a perception in the market, and I don’t agree with this perception. But there’s this perception, particularly going back a couple of years of like, wow, NVIDIA was very well-placed for this, because they were the leading manufacturer of GPUs. But there’s still a perception of, well, GPUs are the AI hardware, and they’ve run away with it. As a former hardware guy, I don’t see that at all, because hardware differentiates across power and footprint and a whole bunch of different types of things. When you think about the AI chip market and where you’re seeing it go, what are the main differentiating factors in terms of how the buyers are making their decisions about chips? Like what are they looking for that’s not on the market now or where these things are going? Because it’s not all going to be NVIDIA GPUs.
MM: I mean, on the training side, NVIDIA has sort of a very strong position, to say the least, in that market. Anytime you have a dominant player like that, you’re going to get people wanting a piece of the pie.
MM: The big ones in the space, companies like Graphcore, a UK startup, they’re gunning for NVIDIA. There are many others. I think that’s sort of natural. Now how many will actually be standing at the end of the day, I think probably not that many. Because I think NVIDIA, they’re very good at what they do, and they have the software infrastructure for stickiness. That’s a huge barrier to entry for others in the training space.
RM: Do you think there are applications though– do you think most of the sort of chips that are going to run at the edge, are they going to be more generic inference, or are they going to run a lot of different types of models and neural network architectures? Or, are you going to see chips that are specialized towards language use cases or facial recognition and all that kind of stuff, or does it matter?
MM: For inference at the edge, I think most of these algorithms that are at the core of CNNs, RNNs, whatever, it’s mostly about matrix vector or matrix-matrix multiplication. I don’t think there will be deep specialization for the ML piece of it. Now, you’re still going to want specialized chips for embedding in a microphone. That’s going to have different IOs and that sort of thing. But, I think the ML piece of that is either best handled by software, which gives you adaptability, or with the accelerator. I think it’s more likely to be a general purpose accelerator as opposed to something that’s very domain specific.
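To make the matrix-vector point concrete, here is a minimal sketch in plain Python of a single fully connected layer, whose cost is dominated by one matrix-vector multiply. The function names, weights, and inputs are hypothetical and chosen only for illustration; this is not Arm's software or any particular library's API.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dense_layer(weights, bias, inputs):
    """One fully connected layer: ReLU(W x + b)."""
    pre_activation = [y + b for y, b in zip(matvec(weights, inputs), bias)]
    return [max(0.0, v) for v in pre_activation]


# Hypothetical 2x3 weight matrix, bias, and input vector (illustration only).
weights = [[1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5]]
bias = [0.0, -1.0]
inputs = [2.0, 4.0, 6.0]

outputs = dense_layer(weights, bias, inputs)
print(outputs)
```

Almost all of the arithmetic happens inside `matvec`, which is why a general-purpose multiply-accumulate engine can serve vision, speech, and language models alike.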
RM: Now what are you guys seeing in the market in terms of new non-digital technologies, or how much of the innovation is around materials or reviving old things? I’ve seen resistive RAM and some analog circuitry come back. Are you guys looking at much of that, or what do you think about it?
MM: I think it’s a super interesting space. Again, as a hardware guy, it’s fun to see this stuff being paid attention to and being researched. In particular, there’s a couple of Boston area startups that are looking at photonics.
MM: That’s very interesting work. To answer your question, yes, we have research in that space. We pay attention to that very closely. I think the issue there is, I’m a believer that some of these foundational elements, whether it’s an NVM, non-volatile memory, or a photonics array, do offer a benefit on paper versus digital CMOS in terms of, say, throughput per watt. However, you need a bunch of stuff around that, right, whether it’s A-to-D converters or you need to worry about the noise and control the environment for the analog stuff. So the question we have is, once you actually address these shortcomings, have you washed out the theoretical 50x advantage? I think that’s a question I don’t know the answer to yet.
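A rough way to see how support circuitry can wash out a headline gain is the back-of-the-envelope sketch below. It assumes a very simple model that is not from the conversation: the analog core costs 1/50 of the digital baseline's energy per operation, and converters and noise control add a fixed overhead expressed as a fraction of that baseline. The overhead values are hypothetical, not measurements.

```python
def net_advantage(core_gain, overhead_fraction):
    """Net efficiency gain over a digital baseline whose cost is 1.0.

    core_gain: how many times more efficient the analog core is on its own.
    overhead_fraction: extra cost of converters and control circuitry,
        as a fraction of the digital baseline (a hypothetical parameter).
    """
    analog_cost = 1.0 / core_gain + overhead_fraction
    return 1.0 / analog_cost


# Hypothetical overhead levels, for illustration only.
for overhead in (0.0, 0.02, 0.1, 0.5):
    print(f"overhead {overhead:.2f} -> net gain {net_advantage(50, overhead):.1f}x")
```

Under this model, an overhead of only 2% of the digital baseline already halves the 50x figure, because the overhead is then as large as the core cost itself.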
RM: It reminds me of in the 2005, 2006 time frame, I worked in a mobile wireless space on the software side. One of the technologies that was coming at the time was wireless USB, which was built on top of the ultra-wideband protocol, and it was going to be 100 times faster or whatever. A similar thing happened, right, which was you had all the specs– and of course, there wasn’t interoperable hardware yet, because everybody was trying to get it to work, and it’s fast, and there’s standards still being debated.
We as a software company were working on it. We were kind of waiting on the hardware guys. What ultimately happened was, because it took so long, there were two things that happened in parallel. One was that we realized, wow, to get the theoretical throughput it really only worked when you’re streaming, like, a DVD movie, right? Just for shorter messages, there was so much communication overhead, that it actually was a lot slower.
Then in parallel, 802.11 just kept getting faster and faster. You went from B to G to N to whatever. Then you look around, and you go, wow, instead of being a 100x gain, it’s a 2x gain, and it’s five times the cost. Like, why am I going to do this? So, I don’t know that wireless USB ever became a major thing. At least I don’t own any wireless USB products.
I can see that happening. It’s part of what I’ve tried to– from the investing side, it’s part of what I’ve thought a lot about is how the hardware landscape is going to play out, because I have no doubt there’s going to be a lot of changes. It’s hard to predict what they’re going to be though. What do you think– are you excited or are you nervous about the fact that a lot of these concepts are moving away from the von Neumann architecture that we’ve had for 50, 60, 70 years, however long we’ve been doing these hardware devices, and it’s opening up a lot of new compute opportunities?
MM: Yeah, I mean, I think it’s exciting. I think most people at Arm think it’s exciting. You know, the whole von Neumann thing, I tend not to get too worked up over that. I mean, at the end of the day, right, like if you look at a state-of-the-art accelerator, you’re still getting instructions from somewhere telling you, hey, it’s time to do a big matrix by matrix multiply. And then pretty much everybody goes off and pulls the data from some memory somewhere and multiplies it together and produces a result. And is that von Neumann? I don’t know. The question I have is, does it work? Is it lower power, less area? And, is it programmable, you know?
I think some of the more esoteric stuff maybe you could argue is clearly not von Neumann. But yeah, I think it’s a super exciting time. And as we were talking about before we started, I think what’s neat about what’s happening now in the processor design space or in the hardware space in general is that for many years, speaking as an architect, right, our jobs were fairly easy. The fundamental underlying process technology kept getting better at a predictable rate and by a predictable amount. So we could kind of be lazy and still design things that were twice as fast and cheaper. And everybody was happy. And it’s pretty clear now that those halcyon days are ending. But in a sense, it could usher in kind of a new golden age of hardware design, because now we actually have to work that much harder, because we’re not going to get this free lunch from the process guys anymore.
RM: Yeah, I saw a presentation by a guy from IBM– and I think this has all been publicly available too– that when you extrapolate and you look at the cutting-edge machine learning models that are pushing the limits, so things like AlphaGo, AlphaGo Zero, and whatever, the compute needed to run those models is doubling every 3 and 1/2 months. So it’s very hard to keep up with. And if you go back to AlexNet in 2012 up through like AlphaGo Zero, it’s been a 300,000x increase in compute needed to run those models, which is crazy and exciting and everything else.
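The two figures quoted here can be checked against each other with a few lines of arithmetic. The sketch below assumes steady exponential growth, which is a simplification, and uses only the 300,000x factor and the 3.5-month doubling time mentioned in the conversation.

```python
import math

growth_factor = 300_000        # increase in compute quoted in the conversation
doubling_period_months = 3.5   # doubling time quoted in the conversation

# Number of doublings needed to reach the quoted growth factor.
doublings = math.log2(growth_factor)

# Time those doublings would take at the quoted doubling period.
elapsed_months = doublings * doubling_period_months
elapsed_years = elapsed_months / 12

print(f"{doublings:.1f} doublings, about {elapsed_years:.1f} years")
```

That works out to roughly 18 doublings over a little more than five years, which is in line with the span from AlexNet in 2012 to AlphaGo Zero in 2017.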
Now, how is some of this going to affect– because you guys are all IP? You don’t do any fabrication, right?
MM: That’s right. Arm is an IP company. So we design the stuff, and then we license it to our partners, who then put it into a chip and put other stuff around it.
RM: Yeah, how do you think– because I’ve heard some people say, like, one of the nice things about coming up with a lot of these new architectures and these changes is you’re going to be able to use older fabrication technologies now. So you don’t have to be pushing to 7 nanometers and 5 nanometers and all that. Is that going to be true? Like, are we going to be able to use some of the older fabs, or are you going to need new types of fabs? Like, where do you think that the world is going?
MM: That’s hard to say. I mean, I think for some important areas like the internet of things, right, for your embedded light bulb, you might not need to be on a bleeding-edge, 7 nanometer process node. I think there will be many designs that can stay in older nodes and get the benefit of cost reduction that way, so cheaper foundry costs, et cetera. But, I also think the opposite will be true. You’re still going to want your latest server or mobile device to be on the latest process, because although you’re not getting the inherent speed up from the transistor like we used to, you’re still getting area reductions, since the transistor is still shrinking, at least for now, and some power benefits as well. I think there’ll be a spread. Depending on the market space that you’re in, you’re going to see more bleeding-edge process nodes versus older nodes.
RM: Cool. When you think about things that need to be solved or built that would help you at Arm with the goals that you have around machine learning, what’s missing from the ecosystem or not in the state that you would like it that’s maybe like stuff that you guys aren’t doing or working on, but you’re like, oh, I really wish somebody would solve this problem or build this thing?
MM: I think visualization tools is an interesting area. There are some commercial and open source efforts in this space. But just generally getting better insights into why a network is doing what it’s doing. This is a whole subfield right around explainable AI, like, well, why did the self-driving car decide to jump the curb or avoid the person in the crosswalk or whatever. Insights into how models are reaching the conclusions they are I think would be helpful to the field in general and also for the work that we do. Because we’re doing things like quantizing models down from single precision floating point values down to 8-bit numbers. So while you lose some accuracy in some cases, in other cases you don’t. Getting more insight into why the network is behaving the way it’s behaving would be generally useful.
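For readers unfamiliar with quantization, the sketch below shows one common textbook scheme, symmetric linear quantization, in plain Python. It is not Arm's method; the function names and example weights are hypothetical, and production toolchains typically add per-channel scales, calibration data, and retraining.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.82, -1.27, 0.003, 0.45, -0.6]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)
print(max_error)
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half the scale. In this example the smallest weight, 0.003, rounds to zero, which is the kind of effect that can cost accuracy in some networks and not in others.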
RM: As intelligence is moving to these edge applications, how are you guys thinking about some of the problems around, like, security at the edge and also sort of testing and support and making sure everything is working properly? Like, is that stuff you’re building into your designs or into tools around it, or are you relying on third parties? Like, how is that shaping up?
MM: I mean, security is sort of a whole other topic here. Arm takes security extremely seriously, and it goes all the way from TrustZone security mechanisms in our processors to the software that Arm releases. Security is a huge issue. A big part of that actually starts with running things at the edge as opposed to shipping them off to the cloud. Security is a big one.
From an ML perspective, it’s not something that is front and center for the work that we’re doing. We’re looking at models. We come up with new model architectures. We’re less focused on, well, how does this actually get packaged up into a larger system and how are security provisions made for that. Certainly security is critical.
RM: A lot of the AI changes and use cases have led to some tech companies’ employees having problems, right? You’ve seen walkouts at Google. You’ve seen people at Microsoft and Amazon complain about work they’ve done with the DOD and everything else. When you look at potential applications that Arm chips could end up in, whether they’re, like, autonomous drones that shoot people or facial recognition in China for a social score and all that kind of stuff, what’s the internal and the corporate stance on that? Is that part of your regular discussions, or do you just have a division that thinks about that? What’s the general gist of the employees there?
MM: It’s a good question. Arm basically formed an ethics committee to understand these issues and to figure out what a sensible position on these questions needs to be, all the way from Arm IP showing up in autonomous weapons to killer robots and everything in between. It’s something that Arm takes very seriously. Internally, we’re working on a cross-disciplinary group that’s trying to figure out what is a sensible kind of code of ethics around this that we can both share with our partners and that also governs how we conduct ourselves. I don’t think that’s been released yet, but it will be.
RM: Talk a little bit about hiring and trends you’re seeing in hardware. We’re only a few years into this machine learning hardware trend, but are more students and more universities adapting and teaching some of these things, or do you have to teach people when they come? Do you have to hire senior people who can adapt quickly? Are more people going into hardware design now? What’s that like?
MM: I probably don’t know whether more people are going into hardware design. I don’t have an insight into that. I mean, as you probably know, it’s extraordinarily competitive to hire people in this space. The nice thing about the group that we’ve built at Arm, the research lab at Arm, is that we’ve got a good mix of newly minted PhDs in machine learning along with seasoned processor architects. And so putting them together is super interesting, and good stuff comes out when you’ve got the guy who really understands the nuances of variational dropout next to a hardware guy. I think really nice things are happening in that space. But it’s competitive. I think the advantage Arm has is that we– since Arm has 95% market share on mobile phones, that’s appealing to people who want their new ML ideas to actually be used by billions of people. So that’s kind of one way that we’re able to hire top-notch people in the field.
RM: No doubt that the sort of inference at the edge for machine learning is going to be a huge wave for you guys. I mean, that’s going to be great. It’s funny. I actually use Arm all the time. When I talk about Talla and the NLP work that we do, I give people the example of if you’re just going to train an NLP model on, like, colloquial English, then an arm is just going to be a physical arm. Whereas when you’re looking at business vernacular, if you’re a device company, Arm may be a chip. If you’re a finance company, ARM may be an acronym for adjustable rate mortgage. Part of what our tool does is we try to get customer support agents that go through their workflow to sort of highlight and label those things so that we can build better NLP models that are more accurate around entity recognition and all that. See, that’s funny. I talk about Arm all the time.
MM: That’s sort of the importance of context, right, and the imprecision of the English language. Whether you meant Arm the adjustable rate mortgage or your appendage or the chip probably can be figured out by the surrounding words.
RM: Yeah, but if your corpus that you trained on isn’t trained on that industry language, it might be different, right? It’s kind of interesting.
Last question, one of the things I always like to talk to people about is there’s a lot of concern from the general public about like killer AI, artificial intelligence, killer robots, and all that. And as somebody that works on some of these algorithms and sees what’s happening in the industry, what do you think about that? Are you worried? Do you think we should be worried about it? Do you feel like it’s so far off? It’s not a big deal?
MM: I’m really glad you asked that question actually, because this is something I have changed my mind on. It’s also something I think a fair bit about. If you’d asked me five years ago, I would have said this is just silly, right? This is just fear mongering. But, I actually think it could be closer than people think. By “it,” I mean AGI or not even that dramatic as AGI, artificial general intelligence, but just the disruption that can be caused by AI to society could be quite large actually if many, many jobs become not needed anymore. That’s a societal thing. I think there are some analogies that people have painted in this space around we don’t know exactly when it’s going to happen, so let’s not worry about it. I think we do need to think about the control problem and how do we make sure that as we develop better and better machine learning models or AIs that the goals that are programmed into them sort of align with what we want. I don’t think people should lie awake at night worrying about it. I think there’s more important things to worry about. But it’s something that I worry about and I think could be if not in my lifetime, we could see it in my kids’ lifetime, so it suddenly takes on a new salience.
RM: Well, that’s a very good answer to end on. Matthew Mattina, thanks for joining us today. Those of you listening, thanks for listening to the show. If you have guests you’d like us to have on or questions you’d like us to ask, please send those to firstname.lastname@example.org. We will see you next week.