Fighting Cancer With Data: Featuring Flatiron Health CTO Cat Miller




Jul 11, 2023

“Anyone who knows anything about drug development would say it is a slow and painful process.” Cat Miller is CTO at Flatiron Health, a software company that uses technology to “clean up all the gunk in the trials industry.” Flatiron helps researchers develop new cancer treatments by making clinical trials run smarter and faster. They also identify trial data that lead to new treatments, and help doctors manage and improve patient care.

On this episode, Cat breaks down how Flatiron takes piles of data from “hot garbage” to deep insights that lead to innovative cancer treatments. She also discusses Flatiron’s approach to AI, why she believes there are no wasted skillsets, and how her experience as an actor has helped her steer Flatiron’s team.

Listen to Crafted, Artium's podcast about great products and the people who make them.

Listen and subscribe to Crafted: Apple Podcasts | Spotify | All Podcast Apps

Full transcript below — but we recommend you listen for the best experience. 

Cat Miller: What we do on the data side is going from, "Hey, here's some hot garbage in terms of unharmonized, unnormalized, unstructured data in a giant pool. What are the key elements for us to pull out so that that data becomes actually information at the end of the day?"

Dan Blumberg: That's Cat Miller, the CTO of Flatiron Health. Flatiron helps researchers develop new cancer treatments. Doctors and their offices use Flatiron software to manage patient care, and Flatiron then uses the data from those systems to detect patterns that can improve care. On this episode, Cat explains how Flatiron's tools can speed up drug development.

Cat Miller: Where is all the gunk in the trials industry that we can sort of clean up and make faster with technology?

Dan Blumberg: We'll learn about Flatiron's approach to AI, which has evolved in the past few years from not trusting it to a cautious embrace to what's coming next.

Cat Miller: If I fast forward what it feels like the curve on these things are, getting to a place where you just genuinely trust its ability to structure unstructured text, for our purposes, I think we can imagine over 10 years, 95% of our data would come through models.

Dan Blumberg: Plus, how Cat's previous career as an actor helps her run a data technology organization.

Cat Miller: I tell my engineers take improv classes all the time, because if nothing else, improv makes you less scared of making a mistake.

Dan Blumberg: Welcome to Crafted, a show about great products and the people who make them. I'm your host Dan Blumberg. I'm a product and engagement leader at Artium, where my colleagues and I help companies build incredible products, recruit high performing teams and help you achieve the culture of craft you need to build great software long after we're gone.

You're an engineer with a master's from MIT. Has your interest always been in data?

Cat Miller: I definitely got a CS degree from MIT because that's what the cool kids were doing at the time, and I didn't have a direction or a thing that I wanted out of it. I was like, "Well, this will probably get me a job." I didn't even know that I was interested in healthcare data until I noticed that my first couple jobs were in it. By the time I was in my second job, which was another healthcare data job, I realized, "Okay, maybe this is actually where my interests lie." So, of course I left that job by going to be an actor in California for 15 months, as one does when one realizes something like that.

Dan Blumberg: Wow. So what was it that scared you so much about health data that you ran and went to LA to be an actress?

Cat Miller: Honestly, it was that feeling of I can't do both of these things. I can't stay up until 5:00 AM filming in a warehouse and then try to go to work at 8:00 AM, and I need to pick. And I know what this one looks like, but what does the other one look like? And so I needed to go get it out of my system for a little bit.

Dan Blumberg: You've worked your way up the ranks at Flatiron Health from an IC over nine years to now be the CTO. First of all, congrats, both on the title which you've held for the past year plus, but also on just finding a place to work for almost a decade; that is so rare these days. I'm curious, what is it about Flatiron, about health data, that keeps you so engaged?

Cat Miller: Yeah, it's interesting because I think it's changed over time. I went to Flatiron because I knew I was looking for a health data startup, and there were so many health data startups at the time, and I think probably still, that felt like they were on the fringes of a problem. The product wasn't bad, but is the 15th app that lets diabetics record their A1C going to move the needle? It didn't feel like, "Oh, this is a big deal." And Flatiron's approach as a transformative business made a lot of sense to me. So, that's like why I started there.

And the reason I have stayed so long and the reason I continue to be really proud to work there is it's a company that, at its core, the people who run it I think run it with the values that matter to me. Which is not to say that we never make a mistake or that we'll never have to do something that I don't like or don't want to do, but I think all of our decisions really come from the best possible place, which is what is right for our employees and what is right for patients.

Dan Blumberg: Flatiron's old mission is to reimagine the infrastructure of cancer care and that begins with tackling the snail's pace of drug development. From the initial molecular science to the three phases of human trials, new drugs take years and cost billions. Even when approval is reached, there can be multiple follow-up tests to monitor any issues with drugs after launch.

Cat Miller: Anyone who knows anything about it would say it is a slow and painful process. And when you look at that, you say, "There's so many inefficiencies here." And to start with, Flatiron was taking the approach and still takes the approach of, "Well, here's one thing: real-world data answers some of these questions better than other sources." So, I mentioned what we call post-marketing commitments, or anything that takes place after a drug has been approved, very hard to run a trial for. And what you're asking is, "Hey, in the real world, is this actually causing problems that we didn't see in the trials? Do we need to be worried about this?"

And so that's a place where you can even conceptually think, wow, if we just had the data from actual patients, that would be cheaper, faster, would get us better answers. And it's true along the way as well. There are trials you cannot run because they're unethical. There's all sorts of places where you can imagine speeding it up. And then we've more newly had a direct clinical trials edge. So, real-world data will never be the solution to everything. We're still going to run trials. Where is all the gunk in the trials industry that we can sort of clean up and make faster with technology? And so that's kind of the pieces of the infrastructure that we're thinking about re-imagining.

Dan Blumberg: Flatiron's biggest point of care solution is OncoEMR, which stores health records and lab results and helps create treatment plans. But it also integrates with research tools so that data from clinical trials can be collected and sorted for greater insight.

Cat Miller: OncoEMR is our largest piece of software, so that's used by thousands of clinicians in the United States. We have about 40% of the US market in terms of community oncology sites. So, that's a case where as the data comes in, as the patient is having their experiences, the clinician is documenting and that then becomes available to us. We also work with a number of academic institutions in a slightly different way, so we're integrating directly into their infrastructure, and in return we give them back tools for them in terms of being able to manage their patient populations, or in terms of data and research we give them their own patients back, et cetera. So, those are the two main ways that data comes in. It comes in, as you might imagine, as a giant pile, a giant mess, and then all the rest of what we do on the data side is going from, "Hey, here's some hot garbage in terms of unharmonized, unnormalized, unstructured data in a giant pool. What are the key elements for us to pull out so that data becomes actually information at the end of the day?"

For data products, the primary customer is pharmaceutical companies, so this can be anyone big or small who is developing molecules, and they are using it for everything from the very beginning of that process through how do I build a trial that is likely to be successful and accrue well? Now thinking about things like how do I build a diverse clinical trial, all the way through that kind of, "Okay, my drug's been approved, but I'd like to know more about it or I'd like to expand its label." That kind of whole thing encompasses what our data sets can support. And on the clinical trial side, it's actually a mix of both.

Some of it's that the pharma companies want to be able to run a faster trial, and some of it's that an individual practice would love to be able to run 10 times the number of trials that they run right now, but that is overhead for their staff. There's only kind of a certain amount they can do with the amount of effort that they have to put in. So, if we can make that smoother for them, they can run more trials. So it actually sits between the two sides.

Dan Blumberg: Can you give an example of data that was observed from the point of care software and then pharmaceutical industry used it?

Cat Miller: Yeah, so a great example is breast cancer. So, 1% of breast cancer patients are men, which is not an insignificant number, but it is a small enough amount that having a clinical trial explicitly for men is a problem. And because this is a hormone driven disease, you can imagine that this is a case where we can't conflate men and women. The difference between different hormone levels and different factors is too significant. So, actually men have had very few breast cancer options. So, this was an area where you can imagine very easily going into the data and saying, "Hey, give me all the breast cancer patients in Flatiron's dataset who are men." And so that's a case where we've had a customer come to us and say, "Help us figure out does this drug work in men? Does our drug actually work in men?" And so we went and we found obviously patients who had taken that drug, also patients who had not taken that drug, provided a data set, and helped use that as supportive evidence for getting that label approved for men with breast cancer.

Dan Blumberg: I'm interested, how do you instrument the clinician facing software in such a way that first of all, there's just data that you want to use to just improve the product, but there's also the data itself that you're collecting: what drugs is a patient using, what protocols, et cetera. How have you navigated that process? How have you improved over the years?

Cat Miller: Well, the first thing I'll say is that the physician and the patient experience comes first. So, we cannot and will not add something to the physician process because it helps the rest of our business but doesn't help the physician. And that's been our position from the start. So, a lot of this is done on the backend. You laugh about, well, it's our data and I call it a mess. There's a couple things about that. I mean, one, the database that you build to power an application is not the database you build to analyze data on top of it. Even in terms of things like how normalized is it? How do you structure the tables? What kind of calculated fields? It's different. It's optimized for running an application. So, I think anytime you take data from an application, you should expect it to be structured in a way that does not make sense for analysis. And I suspect that that's a relatively universal experience.

Dan Blumberg: A hundred percent. I mean, I'm a former journalist, worked with a lot of content management systems, and so you're talking about user-generated content to some degree here, and there's no consistency in all that. So, I'm sure this is not unique.

Cat Miller: Well, and that gets into the second problem, which is the reason people like OncoEMR, one of the reasons, is that it's highly customizable. And as a data person, the words highly customizable should strike fear into your heart. And in fact, we've realized over time that maybe it was actually a little too customizable and it let people get themselves into a corner. So, we've tried to give them a little bit more helpful structure. But at the end of the day, still, you have drugs called different things. You have labs called different things. There's both the inherent customization and then there's the weird hacks that people get into because there was something they needed that they didn't have or they didn't know where to find it. And so that data, in and of itself, even before you get to the unstructured data, is actually quite messy and needs a lot of harmonization to fully get at it.

Dan Blumberg: What are some of the adaptations that you've made in response to needs from the researchers who are looking to ingest the data that you're providing and create new cancer solutions?

Cat Miller: Well, the rate of change in the beginning was rapid to say the least. And some of that's because when you're building a new product, I mean MVP philosophy, so our first product had a handful of tables. It had their diagnosis and a couple of things that were very cancer specific, but like shockingly little else. And over time, we got more and more requests to look at different elements of it. So, I think this was maybe the interesting thing is to realize how many different questions our customers were asking.

So, for example, we didn't include labs in our initial cut. Labs are not all that heavily used in cancer care. They're much more heavily used when we're talking about a blood cancer. But for solid tumors, something like lung cancer or breast cancer, they're not as relevant. I think in the beginning it was really like, "Oh, there's a whole table missing. We do not include labs or we're not including height and weight," which is very important if you're trying to understand obesity factors for disease, et cetera. So, it's like things that we added as quickly as we could in the beginning. And then we got into what I would say are more niche questions.

Dan Blumberg: One of those niche questions was the performance of one drug brand name over another.

Cat Miller: I mean, the brand name one is really interesting because what's going on under the hood is that you have this kind of raw data streaming in, and we're creating mappings. So, we're taking some key in that raw data and saying, "Oh, if I see this key, that means that it's going to map to this structured and very rational thing on the other side." And so with the brand name issue, our mapping key wasn't something that could possibly give the brand name. So, our mapping key was at the molecular level.

So, that actually was a change that was actually a really big one in the sense that we couldn't use that mapping key anymore, which fundamentally would mean that we needed to remap the entirety of our drugs, which is a very significant process. What we ended up doing in the beginning at least was to say, "Okay, this is a gigantic change that will take us a very long time to implement. Let us do something that will allow us to answer the specific question that this customer is coming to us with." And we did a little bit more of a hack around it where we knew, "Okay, we only need to worry about this class of molecules and we'll do sort of a second set of mapping for those."
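To make the two-layer mapping Cat describes concrete, here is a minimal sketch, not Flatiron's actual implementation. It assumes raw drug strings from the EMR are the mapping keys, that the main mapping resolves only to a molecule, and that a second, narrower mapping recovers the brand for one class of molecules. Every name and entry here (`MOLECULE_MAP`, `BRAND_MAP`, `molecule_a`, "Brand X", "Brand Y") is hypothetical.

```python
# Hypothetical molecule-level mapping: many raw EMR strings -> one molecule.
MOLECULE_MAP = {
    "brand x 6mg inj": "molecule_a",
    "brandx": "molecule_a",
    "brand y on-body": "molecule_a",
    "mol-a generic": "molecule_a",
    "brand z tab": "molecule_b",
}

# Hypothetical second mapping, maintained only for the one class of
# molecules that a customer question is about.
BRAND_MAP = {
    "brand x 6mg inj": "Brand X",
    "brandx": "Brand X",
    "brand y on-body": "Brand Y",
}

BRAND_TRACKED_MOLECULES = {"molecule_a"}


def normalize(raw: str) -> str:
    """Lowercase and collapse whitespace so trivially different keys match."""
    return " ".join(raw.lower().split())


def map_drug(raw: str) -> dict:
    """Resolve a raw drug string to a molecule and, where tracked, a brand."""
    key = normalize(raw)
    molecule = MOLECULE_MAP.get(key)
    brand = BRAND_MAP.get(key) if molecule in BRAND_TRACKED_MOLECULES else None
    return {"raw": raw, "molecule": molecule, "brand": brand}
```

In this sketch the molecule-level mapping stays untouched, which mirrors the trade-off described above: the narrow second mapping answers the specific question without remapping every drug.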

Dan Blumberg: What's the value of the brand name in this case? If the molecules aren't different from brand X and brand Y, what are they testing for?

Cat Miller: It is definitely possible for the same drug to be packaged differently. So, for example, an injection, versus an IV, versus an on-body, stay-with-you-after-the-fact packaging, that wasn't necessarily reflected in that data. And so that's a case where knowing the brand name tells you kind of more information about how the patient was using it, how they were experiencing it, and what the side effects and the repercussions of that were.

Dan Blumberg: How have you been employing AI over the past several years? And how now that it's achieved all these new recent major headlines and milestones with LLMs and generative AI, how do you expect to use it in the future?

Cat Miller: It's interesting because Flatiron's approach was sort of the anti-AI approach in the beginning. I remember having a conversation with the founders before I joined and I was like, "How are you extracting all this unstructured data?" And they were like, "Humans," and I was like, "Really?" And they were like, "Yep, humans." And it was very emphatic and it was very intentional, and it was actually a very, I think, rational response to a lot of companies claiming a lot of wild and crazy things about what AI could do 10 years ago. And then about five or six years ago we said, "Okay, we can see how this is growing at a large rate. We can see the fundamental limitations of it. We're also very worried about injecting any bias into this data because it's being used for very important decisions, so we need to be extraordinarily careful about how we approach it."

And so we in the most gentle possible way started looking at using ML for basically filtering. And we did a ton of work on bias. We had accuracy that meant that we had 96% similar cohorts between the ML cohort and the non-ML cohort. So, that was the start, and it's been ramping up over time. And then I think about two years ago we finally said, "Okay, the techniques have really advanced to the point where we can consider actually delivering data that was extracted in this fashion." And that's a combination of the world changing, like techniques getting better, a level of comfort, and a recognition that there was no way we were going to extend through our human abstraction process 15 million patients. The cost and the overhead of that is just infeasible.

So, for the last two years we've been starting to develop ML extracted variables. An easy way to think about this is smoking status. An ML extracted smoking status is more accurate than a human's, because humans just have some percentage of messing up. We just make mistakes sometimes. And it's a very easy variable. It's always said in one of the same three ways. You never have a sentence where someone's like, "Not," and then a whole bunch of other words, "A smoker." And I think the revelation or the realization over time is we can do that one and a bunch of other ones, so it's actually worth it. LLMs, I think, add another layer of plausibility on top of it. So, we are still in the early days of experimenting with them. We're definitely on that path of how could these basically make it faster and cheaper for us, or higher quality, or all of the above, to be able to do this ML assisted extraction process, and really increase the number of patients that we're learning from.
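As a rough illustration of why smoking status counts as an "easy" variable, the sketch below uses a plain pattern matcher. This is a deliberately simpler stand-in, not an ML model and not Flatiron's method. The three labels and the phrase lists are assumptions chosen for illustration; real clinical notes would need far more care.

```python
import re

# Hypothetical phrase lists for three assumed labels. A production system
# would learn these from labeled notes rather than hard-code them.
PATTERNS = [
    ("never", re.compile(r"\b(never smoker|never smoked|non-?smoker)\b", re.I)),
    ("former", re.compile(r"\b(former smoker|ex-?smoker|quit smoking)\b", re.I)),
    ("current", re.compile(r"\b(current smoker|currently smokes|smokes daily)\b", re.I)),
]


def smoking_status(note: str) -> str:
    """Return the first matching label, or "unknown" if nothing matches."""
    for label, pattern in PATTERNS:
        if pattern.search(note):
            return label
    return "unknown"
```

Because the variable is phrased in only a few stock ways, even this baseline is consistent on every note it sees, which is the property Cat contrasts with occasional human error.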

Dan Blumberg: What do you think are some potential use cases for it? If you fast forward five, 10 years, how do you think you'll be using it then?

Cat Miller: Well, I definitely think that there's a world where extraction of data from text is considered easy, which is incredible to me to say those words out loud, because it's been so hard for so long. But if I fast forward what it feels like the curve on these things are, getting to a place where you just genuinely trust its ability to structure unstructured text across a wide variety of domains. Today it's going to require a lot of fine tuning and maybe that'll always be the case. But certainly for our purposes, I think we can imagine over 10 years, 95% of our data would come through models.

We're also interested in it for its efficiencies. So, we have a project right now for it to write SQL for us. So, this is a safe use case where we're not giving it any patient data. We're just saying, "Hey, these are our schemas. Here's a random question about how many cancer patients have this particular biomarker. Write me some SQL." And it seems to do a pretty good job of that. So, this is where I think we're in the same boat as every other company right now, which is trying to identify what are actually practical things that actually save us time, versus fun toy random stuff that seems cool but doesn't actually move the needle.
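The schema-only setup Cat describes can be sketched as a prompt builder that sees table and column names but never any rows. This is an assumption-laden sketch: the `SCHEMAS` tables and columns and the `build_sql_prompt` helper are hypothetical, and the call to an actual model is left out on purpose.

```python
# Hypothetical schema description: table name -> column names.
# Only structure is included, never patient rows.
SCHEMAS = {
    "patients": ["patient_id", "diagnosis", "diagnosis_date"],
    "biomarkers": ["patient_id", "biomarker_name", "result", "test_date"],
}


def build_sql_prompt(schemas: dict, question: str) -> str:
    """Assemble a text prompt from schema metadata and a natural-language question."""
    lines = ["You write SQL. Use only these tables and columns:"]
    for table, columns in schemas.items():
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("Return a single SQL query and nothing else.")
    return "\n".join(lines)


prompt = build_sql_prompt(
    SCHEMAS,
    "How many cancer patients have this particular biomarker?",
)
```

The resulting `prompt` string would then be sent to whichever model is in use, and the returned SQL reviewed before it runs against real data.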

Dan Blumberg: Yeah, in a recent interview, Flatiron's CEO described the phase that the company's at as Flatiron 3.0. I'm assuming phase one is the early startup days. Phase two is after Roche acquired Flatiron in 2018. Can you share a bit more about the early and middle years and also what Flatiron 3.0 means?

Cat Miller: Yeah, I mean, the early years were probably the same as they feel at any other startup, although we got very lucky in the sense that I think the first year or two I was there, the product market fit felt tenuous. But from about the second year on, there was a lot of traction. We had customers who churned who came back and said, "No, I tried it the other way. It doesn't work for me." Actually, Roche, before they were an owner, they were a customer, and they've always been seen as a leader in the space. So, when they started using our data, it was also a pretty big signal to the world that it was worth using. And so we had the fortunate, I think, experience of mostly trying to churn out the features that we knew our customers wanted, and get the data to the place that they needed it to be.

And on that and the strength of that sort of pathway, we got acquired by Roche. And our hope at the time was that we would be able to use real world data to supplant a lot of clinical trials use cases. I think since then we've been in what I'd call an inflection period. We had the initial kind of S-curve up. We plateaued where we've saturated the market in a lot of ways for our data products. So, the current curve is what are the... I'm going to use the word synergistic unironically, which I'm embarrassed by, what are the synergistic businesses? So, you've got a point of care business, you've got physicians and you have an expertise in building tools that are easy for clinicians to work with. We have a data business and we have an expertise working in data and with pharma on drug development. And so that's I think 3.0 is it's not going to be a data alone business, and recognizing how do we put the pieces together in a way that makes sense as a business as well as delivering value to our customers.

Dan Blumberg: How have you shifted? You started at Flatiron nine plus years ago as an IC. You've become a manager and then a manager of managers. How has that journey gone for you? What are some surprising things that you've had to learn along the way?

Cat Miller: I definitely remember many years ago telling my manager that I really wanted to be an architect because this management thing seemed like it was just too exhausting and I didn't think I wanted to do it long term. I think for me, higher levels of management have been more fun because I get to do projects with the people who report to me, the people who are around me, and I get to see them really doing better work than I possibly could on a lot of these things. And that's been really just like fun. And I finally reached the point where I'm more excited to see other people crush a project than I am to do it myself. And that was definitely a mental journey for me.

Dan Blumberg: Yeah. We chatted at the outset about your communication skills, your training as an actress, and now you're in the C-suite. I imagine communication is one of the most important skills now.

Cat Miller: I have always had this opinion that there is no wasted skillset and that most of our lives are far more overlapping than we talk about them being. And so I think improv skills, like I tell my engineers, take improv classes all the time because if nothing else, improv makes you less scared of making a mistake, which is I think just incredibly beneficial in life. I think that communication is huge. I think it's always an important skill. I think it's so underrated for engineers. It's something that we absolutely interview for.

But I feel like anyone who's been in leadership in a company, any level of leadership, knows that their job is to tell someone to talk to someone else, like 90% of the time. You are an information hub and a conduit. But it means that even as an IC engineer, as a person in tech, understanding the requirements that your product folks are giving you, and asking the good questions about when this happens, being that person who can help figure things out as early as possible, communicate your status, it's so important even at the junior levels. That's what makes people trust you.

Dan Blumberg: I said I would circle back to the other random things that are helping you in your career, and I'm curious if you could name one or two other things that are helping you in ways you didn't expect or you didn't see coming.

Cat Miller: So, I have the problem that many I think kids of my generation did, which is that things came easily to me in school, and so I never ever learned how to be good at something that I wasn't good at initially. It's crazy to talk about, but I really fundamentally didn't know that I could be bad at something and get better at it. And this is going to sound ridiculous, but I play, there's a game called Dark Souls, which is a computer game that is infamous for being really difficult. There are no difficulty settings, you just do it. The game is as is, and it is known to be incredibly hard. I genuinely think the experience of playing that and going from so bad that I died like 15 times in the first five minutes, to getting through the hardest boss in the game, taught me something about my ability to deal with failure and to adjust to it and learn and find sources of learning.

Dan Blumberg: Amazing. Thank you so much for your time. We could go on for a long time. I'm really interested in the way you tie all these different strands together.

Cat Miller: Well thanks, Dan. This has been fun.

Dan Blumberg: That's Cat Miller and this is Crafted from Artium. If you are building innovative solutions to big problems, let's talk. Artium can help you build great software, recruit high performing teams, and achieve the culture of craft you need to build great software long after we're gone. You can learn more about us at, and start a conversation by emailing If you like today's episode, please subscribe and spread the word because Crafted is at an inflection point.

Cat Miller: What are the synergistic businesses?