Episode 48: AI at the MGH Martinos Center, with Matthew Rosen and Bo Zhu

Episode Overview

Tune in to this episode of AI at Work on AI at the MGH Martinos Center. Our two guests, Matthew Rosen and Bo Zhu, are also co-founders of a company called BlinkAI. Hear about their philosophy of “basic research, but with a clinical focus,” learn how their work at the MGH Martinos Center inspired BlinkAI, and much more.

Listen to every episode of AI at Work on iTunes, Google Play, Stitcher, SoundCloud, or Spotify.

Bo Zhu

Accelerating Imaging Performance with Machine Learning, MGH/Martinos Center for Biomedical Imaging
BlinkAI Co-Founder

Matthew Rosen

Director, Low Field MRI and Hyperpolarized Media Laboratory, MGH/Martinos Center for Biomedical Imaging
BlinkAI Co-Founder

Episode Transcription   

Rob May: Welcome, everybody, to the latest episode of AI at Work. I’m Rob May, the co-founder and CEO of Talla. Today my guests are Bo Zhu and Matt Rosen from the MGH Martinos Center. Guys, welcome to the podcast. Why don’t you each give us a little bit about your background and how you got to the Martinos Center. 

Matt Rosen: It’s great to be here. I’m Matt Rosen. I lead a group at the Martinos Center which is really focused on physics, and physics as it pertains to tool-building, really for biomedical imaging. What we try to do in my group is really take the great successes of magnetic resonance imaging and kind of deconstruct it in a certain way and build less expensive, more accessible tools that can really change health care. AI has turned out to be a big component of that. 

Bo Zhu: I’m Bo Zhu. As an undergrad at MIT, one of the things I worked on was improving automatic speech recognition: how can we make speech recognition systems such as Alexa, Siri, and Google Home, these things that we’re all familiar with, more robust to noise? In my PhD, I worked on improving medical imaging from a much more foundational physics perspective, and after graduation, joining Matthew Rosen’s group, also at the Martinos Center, I really thought about combining the two– applying machine learning to the question of how you make more robust signals, essentially, in an imaging context now. One of the great things that we’ve developed together is a way to use machine learning to really enhance medical imaging quality for diagnostics. 

RM: Interesting. Two part question for you guys. What are some of the key things that you work on at the Martinos Center specifically, and then how is the research agenda set? Do the innovative researchers work on what they want? Is it sort of top down that, hey, we’re going to go into this direction next year? 

MR: Work in my lab has to get paid for somehow, and so we typically operate based on grants from federal funding agencies. Those agencies include the National Institutes of Health, DARPA, the Department of Energy, the Army. And a very serious program came about essentially because of funding-agency interest in my lab back in around 2010, when the Army was interested in the idea of taking magnetic resonance imaging tools out of the very highly controlled environment of the hospital and putting them into field-forward situations. 

Now in order to do that safely, because someone may have shrapnel in their body, you need to turn the magnetic field down very, very low, and the consequence of doing that is that– as Bo was saying– the signal goes down very, very low. MRI has been a real physics success story for the last 40 years, and key to that success has been the steadily improving signal that we get by building these more expensive, higher-field machines. 

We were going to sort of turn that whole thing upside down and say, yeah, we know all of that, but we have to turn the magnetic field down. Let’s see what we can learn from all of the years of signal processing, acquisition approaches, and algorithmic approaches– what works and what doesn’t. This was really, I should say, pre-machine learning, at least in my lab; I knew nothing about these things. So we built a program in my group really based around this idea. 

Since then, we’ve kind of gone in several different directions, working very closely with clinical collaborators at MGH. You know, it’s interesting to talk to physician-scientists and say, as a physicist, as a tool builder, how can I solve a problem that is important? Our sort of philosophy is to be clinically focused to some extent– still do basic research, but with a clinical focus. It kind of keeps us real. Then we find challenging problems that other people have had a hard time solving, and that makes it fun. 

RM: One of the things that always comes up when people talk about ML and health care is privacy issues with data, because the data is so important for machine learning. Were you able to get around that because you’re affiliated with MGH, or do you still have certain privacy issues around the data you collect that make some of your work challenging? 

MR: You can’t get around it. There’s no getting around– I’ll say this on microphone– we’re not getting around anything. I mean, HIPAA compliance is really important. Patient confidentiality is really important. At a big hospital like MGH where there are a lot of clinician scientists who are interested in this kind of work, there are systems in place to allow us to work with anonymized data essentially. 

We do have an interesting advantage in my lab though, and Bo will talk about this more, which is that the main algorithmic framework we’ve developed works on the so-called raw data coming right off of the scanner. It doesn’t work on images. And that raw data can be pretty hard to get from a clinical scanner, because it is typically processed, turned right into images on the scanner, and then thrown away.

But since the work we do in my laboratory is done on a homemade scanner, something I built with one of my graduate students back in 2004, we control our own data pipeline. And so that was sort of– that’s one aspect of us being able to get that raw data. 

BZ: Yeah, and when it comes to data privacy and its use in machine learning, I think that becomes a problem when you want to utilize data that’s associated with individual patients– if things like age or gender matter, or any one of many different attributes that could be associated with a person. For the type of work that we do in the lab– 

We’re at this point primarily interested in reconstructing images, no matter who it is. And so because of the nature of the data involved, it’s very much a physics problem, rather than tagging certain things to individual people and their properties. 

MR: You’ve probably seen lots of people working on so-called radiologists in a box. They do things like image segmentation or diagnosis, and these things work incredibly well. They look at labeled data based on the images and expert regions of interest or diagnoses. That’s amazing work, but it’s not what we’re focused on. We’re focused on improving image quality as it comes off of the machine. 

RM: Would you even consider these traditional machine vision approaches– are you using convolutional neural nets, or are you starting to look at other stuff? Do you have to use particular methods– is there a time-series component to this kind of data, for example?

BZ: I think there are a lot of different approaches to tackling this problem. At a very basic level, it’s a transformation of the data from one space or one domain to another. So when you think about, let’s say, speech recognition, you’re taking that raw data, that speech signal, and transforming it into text. 

In our case, we’re taking the raw signal coming from the scanner, as Matt mentioned, which looks nothing like an image, but traditionally there are a lot of handcrafted mathematical algorithms that people use to convert that into an image that is interpretable by radiologists, or even by computer vision algorithms downstream nowadays. 
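
To make the handcrafted transform Bo describes concrete: in conventional MRI, the raw “sensor domain” data is a grid of k-space samples, essentially the 2D Fourier transform of the image, and the classical reconstruction is an inverse FFT. A minimal NumPy sketch (illustrative only; the variable names and toy phantom are ours, not the lab’s code):

```python
import numpy as np

def reconstruct_mri(kspace: np.ndarray) -> np.ndarray:
    """Classical MRI reconstruction: inverse 2D FFT of fully sampled k-space.

    kspace: complex 2D array of raw scanner samples (the "sensor domain").
    Returns the magnitude image a radiologist would view.
    """
    # Move the zero-frequency sample to the corner, invert, then re-center.
    image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
    return np.abs(image)

# Toy usage: a synthetic phantom round-trips through k-space.
phantom = np.zeros((128, 128))
phantom[48:80, 48:80] = 1.0                     # a bright square
kspace = np.fft.fftshift(np.fft.fft2(phantom))  # simulate acquisition
recon = reconstruct_mri(kspace)                 # recovers the phantom
```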

What we’ve been finding is that, much like in speech recognition back in the day– where you had all sorts of experts building in subsystems, speech experts, acoustics experts, linguistics experts, each putting in what they thought was the best way to transform that speech signal into text– with much more naive but flexible neural network approaches, you let the data represent itself, right? 

We’ve gotten far better and more robust speech recognition solutions that way. And so we’re sort of applying that same model to image reconstruction– taking that raw signal straight to an image. And we’ve found that there are great benefits to having the machine solve that problem for you. 

In terms of the actual architecture, I think there are several ways to achieve this. The specific architecture we’ve developed, AUTOMAP, uses a particular method, but people are now extending it and finding new ways to do the same thing. It’s a very flexible approach that I think works across many different types of neural network architectures. 

MR: And many different kinds of data. I mean, the idea of this AUTOMAP framework– it stands for automated transform by manifold approximation– is to go from one intermediate domain to a final domain. As Bo said, when we take MRI signals, we have voltages in our pre-amps and those get digitized, and then you do some sort of transform. Or it could be counts in a photodetector. It could be X-ray astronomy. 

You have some intermediate domain that we call the sensor domain, and you transform that domain into the image domain. We do that in a learned way based on the forward encoding. So it is a very interesting approach. You mentioned convolutional networks. Because, as Bo mentioned, we’d like to think of this as a math problem– and also, I think, because we work in the medical space– we really want to understand what’s under the hood of this approach. 

We’ve all seen the example where horses get turned into zebras and all of those kinds of things, and then Putin riding a horse turns into Putin himself becoming a zebra. We want to be really careful with that. One of the things we talk about quite a bit in our paper is the mathematical underpinnings of each of the pieces, as far as we can describe them. 

We use the universal function approximation property of fully connected networks to do the bulk of the transformation. We enforce sparsity both naturally and explicitly, depending on where you are in the network. We take advantage of things that we understand and that have good mathematical underpinnings. 
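
For readers who want a picture of what Matt is describing, here is a simplified PyTorch-style sketch in the spirit of AUTOMAP: fully connected layers carry the bulk of the sensor-to-image transform, and a small convolutional stage refines the result, with an L1 penalty on its activations encouraging sparsity. This is our own illustrative reduction, with placeholder layer sizes, not the published implementation:

```python
import torch
import torch.nn as nn

class AutomapLike(nn.Module):
    """Illustrative AUTOMAP-style network: sensor domain -> image domain.

    Sketch only: layer sizes, kernel sizes, and activations here are
    placeholders approximating the general shape of the approach.
    """
    def __init__(self, n: int = 64):
        super().__init__()
        self.n = n
        # Fully connected layers do the bulk of the domain transform
        # (2 * n * n inputs: real and imaginary parts of the raw signal).
        self.fc = nn.Sequential(
            nn.Linear(2 * n * n, n * n), nn.Tanh(),
            nn.Linear(n * n, n * n), nn.Tanh(),
        )
        # Convolutional refinement; an L1 penalty on these feature maps
        # (added to the training loss) enforces sparsity explicitly.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 7, padding=3),
        )

    def forward(self, sensor_data: torch.Tensor):
        x = self.fc(sensor_data)           # learned domain transform
        x = x.view(-1, 1, self.n, self.n)  # reshape onto the image grid
        features = self.conv[:-1](x)       # sparse feature maps
        image = self.conv[-1](features)
        return image, features.abs().mean()  # image + L1 sparsity term

# Training would minimize MSE(image, ground_truth) + lambda * sparsity,
# with sensor data simulated from ground-truth images via the forward
# encoding (e.g., Fourier sampling for MRI).
```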

Of course, many things were proved in the machine learning world back when these things were called perceptrons, and we try to rely on some of those results rather than just use, let’s say, U-Net to do something– and of course, U-Net is incredibly powerful; that’s not a diss on U-Net. But since we’re dealing with data transformation between domains, in a space where people already use mathematics, if we’re going to subsume the standard mathematical transformations, we feel like we need to understand what’s going on. 

BZ: Right. Exactly. Especially for mission-critical applications like medical imaging, where if you get something wrong there are real risks. For things like image beautification and computational photography, you have a little more leeway in terms of how your networks perform. But having as much understanding as possible of these networks and their sub-components is quite important. 

RM: If we step away from your work at MGH a little bit and think about it more broadly applied in society– one of the big things going on right now is deep fakes and fake news and all this kind of stuff, and so people are figuring out, how do we watermark things at the point of creation so that we know they’re not GAN-generated or whatever? 

I can see a world where you’re like, when we have the image off the MRI, we’re going to watermark it. And you’re like, oh, we’re going to jump in before the image, at the sensor level. Do you think, in a good-guy, bad-guy machine learning world, we might get there someday– where people are figuring out how to run GANs on sensor-level data, so that by the time your instrumentation gets something, it’s wrong? 

I don’t know. I never thought about that until I talked to you today. 

MR: That’s so uncomfortable to think about. The internet of things applied to MRI scanners, right? 

RM: Yeah. 

MR: Wow. Well, I mean, certainly– not certainly. I think this idea of watermarking and digital signatures is certainly going to be applied to things like electronic medical records, which are moving more and more to include diagnostic imaging, typically in a DICOM format or something like that. I could easily imagine encrypting those or doing things like that. Whether or not you’re going to go all the way back to what’s coming off the scanner, that’s beyond my pay grade. 

BZ: I’ve heard– and this isn’t really a solution, so to speak, but an interesting thing that’s coming up– that insurance companies are increasingly using automated systems, like convolutional neural network based solutions, to check and verify diagnostic results from the images. I’ve heard through the grapevine that one can imagine using some sort of adversarial attack, or some sort of modification of the image, to–  

MR: To generate your own image that was going to get paid by the insurance company? 

BZ: Yeah. Well, right. Or, you know, you have an image, and maybe it’s even hospitals doing it. In the worst case, it’s hospitals who want to have some advantage over the insurance companies, and they modify the image in a way that’s imperceptible but fools the network. These are real issues that I think need to start being thought about and solved. 

RM: We’re going to do a whole podcast episode on that at some point. So, this technology led to a spin-off that you guys have created. Tell us a little bit about how that came about, and what does that spin-off do? 

BZ: One of the ways that we describe AUTOMAP, the medical imaging incarnation of our work, is that it sort of utilizes how we as humans learn to see. How do we, as biological creatures, have such great robustness and accuracy in our own vision? For example, if you go into a dark room, oftentimes you’ll be able to see well, but the picture you take on your camera won’t come out. 

What’s really amazing is that the data coming into your eyes is actually much worse quality. The quantum efficiency of your retinal cells is actually five times worse than that of a pixel on your cell phone sensor. So clearly there’s some sophisticated processing happening in our brains that’s able to take that much lower quality raw data and somehow generate, in our final perception, a really clear image. 

How that happens is essentially through this amazing process called perceptual learning: you’re born essentially blind, and over the first few months of your life, you start to learn, by interacting with the real world, what the important visual features are– edges and certain patterns and textures and so on and so forth. 

These are the fundamental building blocks that your brain has selected to create that final image, and that’s what makes it so clear. So when we talk about AUTOMAP, that’s certainly a good analogy for how we can make medical images look better by training on real data. 

But this analogy applies even more closely to the startup we founded, BlinkAI, where we’re directly using this type of data-driven image reconstruction to make image sensors perform better. Whether it’s your cell phone camera, a camera on a car for ADAS or autonomous driving, or a security camera, these sensors are getting smaller and smaller, both to reduce cost and because they often need to fit in mobile packages. 

They’re going on drones. They’re going on doorbells, on smart fridges. And the problem with them getting smaller is that there’s less light going in, which means a worse signal-to-noise ratio. There’s less information. Is there a way that we can computationally boost the signal-to-noise ratio and allow these cameras to see much better in the dark, when they’re currently just physically unable to? 

So we put a brain right behind the sensor to make it see three to five times brighter essentially. 
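
BlinkAI has not published its internals, so nothing below is their method; it is only a toy simulation of the physics Bo is pointing at. Under shot noise, SNR scales roughly with the square root of the photon count, so a pixel that collects 100x fewer photons is about 10x noisier, and that is the gap a learned reconstruction tries to close computationally:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_capture(scene: np.ndarray, photons_per_pixel: float) -> np.ndarray:
    """Simulate a sensor reading dominated by Poisson shot noise.

    Shot noise gives SNR ~ sqrt(photon count), so a smaller pixel
    (fewer photons) directly degrades the image.
    """
    expected = scene * photons_per_pixel           # mean photon counts
    counts = rng.poisson(expected).astype(float)   # shot noise
    return counts / photons_per_pixel              # rescale back to [0, 1]

scene = rng.random((64, 64))                       # stand-in for a real scene
bright = simulate_capture(scene, photons_per_pixel=1000.0)  # daylight-ish
dark = simulate_capture(scene, photons_per_pixel=10.0)      # low light

for name, img in [("bright", bright), ("dark", dark)]:
    snr = scene.mean() / (img - scene).std()
    print(f"{name}: SNR ~ {snr:.1f}")  # the dark frame is ~10x noisier
```

A learned reconstruction in this setting would typically be trained on paired captures of the same scenes at low and high light; that is a generic recipe for this class of problem, not necessarily BlinkAI’s.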

RM: Interesting. And when you think about the state of that technology today and continuing to move it forward– are we sort of at a plateau where you’re going to see lots of marginal improvements as we learn to tweak things, or is there a big technical breakthrough around the corner in five, six, seven years? 

I’ll give an example. One of the things I’ve started to see in the angel investing I do in AI start-ups is use cases where neural networks hit their limits. I’m looking at a company that does customer support for voice that has all these cognitive architecture approaches, where they have some neural nets and they have some other modules that are sort of predisposed to do things. 

I’m looking at another company that’s combining symbolic logic processing with neural networks to do some stuff. So in the work that you do, how do you feel about the state of the technology today and where it’s going near term? 

MR: Well, we get surprised all the time. I mean– just to go back to MRI for a second– there’s steady progress as people improve the things they know about, and then a step function happens: somebody realizes, oh, how interesting, we can enforce sparsity and we can reconstruct data in the following way. That’s a step change. 

This happens, and it happens quite often. The neat thing about that is that you can enable classes of technology that you haven’t thought about before. So you do static imaging with MRI; suddenly, oh my goodness, the SNR has improved by orders of magnitude; now we can do dynamic imaging. Then that opens up whole new classes of real-time imaging. 

I think, certainly in terms of what’s being done at BlinkAI, you hit a limit and then you can enable a new class of technology. You can reduce the size of a sensor; now suddenly the sensors become even cheaper. Then, oh, now there are multiple sensors on a device. And then you have all this multiplicity of data, which has more sparsity. 

Algorithms improving– that’s always going to happen. Hardware improving– that’s always going to happen. The beautiful thing about these algorithmic approaches is that Moore’s Law applies to chips and physical things, but GPU compute is growing even faster than that. If you can take what was once a hardware problem, like a quantum efficiency problem, and turn it into a software problem, you’re doing really well. You’re going to win, and you’re going to continue to win as compute gets faster. 

RM: We won’t talk about it today, but I’m really watching the AI hardware space– neuromorphic chips and where all that’s going. There’s some really interesting stuff happening there. 

MR: You know, the whole neuromorphic thing is very interesting, because that’s how Bo and I describe AUTOMAP– it’s really biologically inspired. The human biological neural network is a spiking neural network, and we don’t have that at all in synthetic neural networks. As we move in that direction, it’ll be very interesting to see, as you add that time dependence, what the heck is going to happen. 

RM: Do you feel like the industry is coming around to– you guys mentioned sparsity several times, and in general, I think that’s a word that has a lot of negative connotation in the AI space. So I just wonder if you read into that– yeah, some people don’t like it. They don’t like dealing with it. 

MR: For us– both in the startup and in the lab– we’re trying to improve natural images. Not noise, but natural images. Natural images are sparse. That means there’s some domain in which you can represent them with fewer coefficients. People are very familiar with this for images because of, say, the wavelet domain, if you look at image compression. 

If you identify a sparse domain, and some operation that allows you to take, let’s say, noisy data and transform it into that domain where the data is sparse– noise is not sparse in any domain, that’s sort of by definition– then you can eliminate all that noise, and everything is great. You’re looking through the noise and then transforming back. When we say sparsity, that’s what we mean: taking natural, high-dimensional data and transforming it and operating on it in a non-linear space where the thing we care about is sparse. 
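
What Matt describes is classical transform-domain denoising, and the principle fits in a few lines with PyWavelets: move the noisy image into the wavelet domain, where natural images are sparse; soft-threshold the small coefficients, where the evenly spread noise lives; and transform back. This is our illustration of the idea, not code from the lab:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(noisy: np.ndarray, sigma: float) -> np.ndarray:
    """Denoise by exploiting sparsity: a natural image concentrates into a
    few large wavelet coefficients, while white noise spreads evenly, so
    soft-thresholding small coefficients removes mostly noise."""
    coeffs = pywt.wavedec2(noisy, "db4", level=3)
    thresh = sigma * np.sqrt(2 * np.log(noisy.size))  # universal threshold
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(c, thresh, mode="soft") for c in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(denoised, "db4")

# Toy usage: corrupt a smooth image with Gaussian noise, then recover it.
x, y = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
clean = np.sin(6 * x) * np.cos(4 * y)
noisy = clean + 0.2 * np.random.default_rng(1).normal(size=clean.shape)
recovered = wavelet_denoise(noisy, sigma=0.2)
```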

BZ: Perhaps some of the negative connotation you mentioned earlier relates specifically to trying to enforce sparsity in your optimization. There may be reasons why you might not want to do that. 

MR: Yeah, you can certainly do it wrong. 

BZ: Exactly– let the neural network just do what it wants, and it will happen automatically. But I think, like Matt said, even the use of a convolutional neural network is, in some sense, enforcing sparsity in your problem, because you’re saying, hey, look, we care about, for example, images. 

We’re going to utilize an architecture that has certain properties, like spatial invariance. You’re already imposing what you think is important to the problem, and that’s sort of a form of sparsity. 

MR: The convolutional filters have locality to them, and if there’s locality, you’re going to identify some sparsity. And if you do it wrong– if you say, oh, for some reason I want to use less memory in my GPU, so I’m going to enforce sparsity in this place where it doesn’t belong– you might get junk, or overfit, or do some terrible things. But for us, again, because we have been so inspired by this biological vision archetype, sparsity seems to really drive that. There have been a lot of successful mathematical signal processing discoveries going down that route. 

RM: Yeah, definitely. Good. We’ll end on that. Bo and Matt, thank you guys for coming on. Those of you listening, if you have guests you’d like to see on the show or questions you’d like us to ask, please send those to podcast@talla.com. And we’ll see you next week. Thank you.