Big Data & Machine Learning in Catastrophe Modeling: Dag Lohmann Interview

Data Science Weekly Interview with Dag Lohmann - Co-Founder of KatRisk - on how Big Data and Machine Learning are shaping the Catastrophe Modeling field

We recently caught up with Dag Lohmann, Co-Founder of KatRisk. We were keen to learn more about his background, how the world of Catastrophe Modeling is evolving (and the influence of Big Data and Machine Learning) and what he is working on at KatRisk...

Hi Dag, first, thank you for the interview. Let's start with your background and how you became interested in data and Catastrophe Modeling...

Q - What is your 30 second bio?
A - Before co-founding the risk modeling company KatRisk LLC, I was Vice President of Model Development at Risk Management Solutions in Newark, CA, leading a team of modelers and software engineers building and implementing catastrophe models. I worked for 7.5 years on the development of continental-scale flood risk models at RMS and on RMS' Next Generation risk modeling methodology. Before that, from 1999 to 2004, I was with the National Weather Service, NOAA/NCEP/EMC in Camp Springs, MD, where my main interest was data assimilation with real-time data, forecasting and hydrological modeling.

In terms of education, I received a Physics Diploma (Masters) from the Georg-August University in Goettingen (Germany) and a Ph.D. from Hamburg University (Germany) before working for 2 years as a postdoc at Princeton University. I received the 1999 Tison Award of the IAHS and have published numerous papers on risk modeling, hydrological modeling, model uncertainty, forecasting, data assimilation, and climate change.

Q - How did you get interested in Catastrophe Modeling?
A - I remember having a conversation with my brother (Gerrit Lohmann, Professor at AWI Bremerhaven in Germany) in 1998 about climate and extreme events. We thought about looking into this as a commercial enterprise. I then looked up what the marketplace was and found out pretty quickly (not surprisingly) that there are companies working on this. I then applied to one of these companies, but it wasn't until 2004 that I started working for RMS, the current market leader in catastrophe modeling.

Q - What was the first data set you remember working with? What did you do with it?
A - The first real data set I worked with was in 1992 when I was still a "real physicist". I did my Master's degree in high energy physics then and worked on Coherent Bremsstrahlung. It was a very interesting problem and I had a very good supervisor (Prof. Schumacher, Goettingen). I had written some code (in C) that would do quantum electrodynamic computations of high energy photons that were created by fast electrons hitting a diamond. When the results from the experiment at MAMI B in Mainz (Germany) came back and I saw that they matched the computed predictions almost exactly, I was excited. We had run the experiment all night and in the morning I drove 250 miles back home (on an old Suzuki 400 motorcycle) to show these results to my professor. I still remember riding that bike for 4 hours -- tired, but happy!

Q - Was there a specific "aha" moment when you realized the power of data?
A - That came much later, and in a way that "aha moment" is still happening to me quite a lot today. Data was never too special for me; I always liked simple concepts and models that are able to reflect reality. I am now quite amazed by what is available out there and I always want to do more... It is quite an exciting time for people who like data and models. I feel we are only scratching the surface right now.

Very interesting background - thanks for sharing the personal story - sounds like a memorable bike ride :) Let's talk in more detail about Catastrophe Modeling and how Big Data and Machine Learning are influencing the field...

Q - How is the rise of Big Data and Machine Learning changing the world of Catastrophe Modeling? What is possible now that wasn't 5-10 years ago?
A - I always find it interesting what "Big Data" really means in Catastrophe Modeling. I think many people largely think about Big Data as unstructured data that tracks behavior on the internet. Meteorologists and climate scientists have dealt with very similar problems and tools (EOF/PCA analysis is a good example) for a long time. I think Big Data and Machine Learning will change how we structure code and data, but the math behind it will stay the same (or evolve slowly). There is a lot of well organized meteorological data and many smart people are using it for many different purposes.

Big Data for me also means large computations. We are in a world where I can now build a computer for $50K that has the same compute power as the world's fastest computer in 2002. Quite a change - and cloud computing hasn't even really started yet. Catastrophe models will use more and more information in the future. I can easily imagine how all the different and divergent data sources might be used (in aggregate) for decision making in the future.

Q - What are the biggest areas of opportunity / questions you want to tackle?
A - Scientifically I would like to do risk forecasting. But we first have to get better at modeling the current weather and climate risk before we do future risk. There is still so much work to be done to understand climate and weather. I sometimes think that the scientific outlook on future risk is changing too quickly and is therefore losing credibility. But I find the combination of climate models and risk models very interesting and we are already thinking about the best way to do this soon.

Q - What Machine Learning methods have you found most helpful?
A - Everything that simplifies and classifies. For a long time my favorite algorithms have been PCA/EOF based.
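
For readers who have not met EOF analysis, the following is a minimal sketch of the idea and not code from KatRisk or from the interview. It takes a small synthetic field (three hypothetical grid points, a made-up spatial pattern, random noise), removes the time mean, and finds the leading EOF as the dominant eigenvector of the spatial covariance matrix. Power iteration is used here only to keep the example free of third-party libraries; the function name `leading_eof` and all numbers are assumptions chosen for illustration.

```python
import math
import random


def leading_eof(anomalies, iterations=100, seed=0):
    """Return the leading EOF (unit vector) and its explained-variance fraction.

    anomalies: list of time steps, each a list of values at the grid points,
    with the time mean at every grid point already removed.
    """
    n_time, n_space = len(anomalies), len(anomalies[0])
    # Sample covariance between grid points i and j, taken over time.
    cov = [[sum(step[i] * step[j] for step in anomalies) / (n_time - 1)
            for j in range(n_space)] for i in range(n_space)]

    def apply_cov(vec):
        return [sum(cov[i][j] * vec[j] for j in range(n_space)) for i in range(n_space)]

    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    vec = [rng.uniform(0.5, 1.0) for _ in range(n_space)]
    for _ in range(iterations):
        vec = apply_cov(vec)
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]

    eigenvalue = sum(v * c for v, c in zip(vec, apply_cov(vec)))
    total_variance = sum(cov[i][i] for i in range(n_space))
    return vec, eigenvalue / total_variance


# Synthetic "observations": one made-up spatial pattern whose amplitude
# varies randomly in time, plus a constant offset and small noise.
rng = random.Random(42)
true_pattern = [1.0, 2.0, -1.0]
observations = []
for _ in range(500):
    amplitude = rng.gauss(0.0, 1.0)
    observations.append([10.0 + amplitude * p + rng.gauss(0.0, 0.1) for p in true_pattern])

# Remove the time mean at each grid point to obtain anomalies.
n_steps = len(observations)
means = [sum(step[i] for step in observations) / n_steps for i in range(len(true_pattern))]
anomalies = [[value - means[i] for i, value in enumerate(step)] for step in observations]

eof, explained = leading_eof(anomalies)
print("leading EOF:", [round(v, 3) for v in eof])
print("explained variance fraction:", round(explained, 3))
```

In this toy setup the recovered EOF should point along the made-up pattern and account for nearly all of the variance, which is the "simplify" step the answer refers to: many correlated grid points are summarized by a few dominant patterns. Real applications would typically rely on a linear algebra library (an SVD) instead of hand-written power iteration.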

Q - What are your favorite tools / applications to work with?
A - My toolset is rather simple: R, Fortran, csh, bash, emacs and QGIS on Ubuntu Linux. I have yet to find problems that I can't solve using these. For some reason I never really liked C# and Java, or any of the languages that are popular now (JS, etc.). Too many lines of code to write before something happens. R for me is unbelievable, and one must applaud the people behind RStudio for providing great tools to the community!

Q - And what statistical models do you typically use?
A - We've recently started digging in on much more sophisticated questions. As such, we're deploying most of the newest models that an actuary would be using, plus models that reflect nature at a very basic level (Poisson distributions, etc.).
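
As a rough illustration of what a Poisson frequency assumption means in practice, the sketch below simulates yearly event counts and compares the simulated chance of at least one event in a year with the analytic value 1 - exp(-rate). It is not KatRisk's model: the annual rate of 0.5 events, the number of simulated years, and the helper name `sample_poisson` are all hypothetical choices made for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# Hypothetical rate: on average 0.5 damaging events per year at some location.
annual_rate = 0.5
n_years = 50_000
rng = random.Random(1)

counts = [sample_poisson(annual_rate, rng) for _ in range(n_years)]
mean_count = sum(counts) / n_years
simulated_prob = sum(1 for c in counts if c > 0) / n_years
# For Poisson-distributed counts, P(at least one event in a year) = 1 - exp(-rate).
analytic_prob = 1.0 - math.exp(-annual_rate)

print("mean events per year:", round(mean_count, 3))
print("P(at least one event): simulated", round(simulated_prob, 3),
      "analytic", round(analytic_prob, 3))
```

With enough simulated years the two probabilities should agree closely. A full catastrophe model would pair such an event frequency with a severity (loss) model for each event, which is outside the scope of this sketch.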

Makes sense - interesting to hear how the field is evolving! Let's talk more about what you're working on at KatRisk...

Q - Firstly, how did you come to found KatRisk?
A - I was very fortunate to start KatRisk with two of the smartest people I know (Stefan and Guy). I always wanted to see what I could do with others (without the limits a large company sets) - plus the excitement is hard to beat when you're out on your own. The timing was great when we started 1.8 years ago. We are getting good feedback from the marketplace and hope to be a nimble, smart and agile player.

Q - What different types of catastrophes does KatRisk assess?
A - Right now we're doing flood, storm surge and tropical cyclones. We just released our US and Asia flood maps, and we have online versions of our tropical cyclone models at http://www.katalyser.com

Q - What have you been working on this year, and why / how is it interesting to you?
A - We are working on our US and Asia models. Overall it's an interesting scientific problem - but we also think that we can have commercial success with these offerings. Nature is quite complicated and to describe natural phenomena with a mix of deterministic and statistical methods is quite challenging.

Q - What has been the most surprising insight you have found?
A - I have gained much more insight into the relationship between climate and extreme weather. Another surprise - on a more personal level - was how much you can work when you have to. The pressure when you are really dependent on yourself is quite different from what you experience in a large company.

Q - Sounds like an interesting time to be working in the field! Can you talk about a specific question or problem you're trying to solve?
A - We're solving the problem of adequately pricing insurance based on each individual building's characteristics. That's why we created 10m resolution flood maps for the US. The basic underlying principle we follow is that risk-averse behavior should be rewarded with lower premiums. We like to believe that we can contribute to a more risk-resilient society that way.

Q - What is the analytical process you use? (i.e., what analyses, tools, models etc.)
A - We do everything from finite element modeling on graphics cards written in C, Fortran, CUDA and shell scripts to data analysis with R and QGIS. I like to keep things simple but powerful. As much as I would like to learn new languages and concepts, I can still do everything I need with these tools. We have written all tools and models from scratch by ourselves!

Q - How / where does Machine Learning help?
A - Not as much as I would like to claim. In principle one could apply machine learning to many more problems for insurance pricing. We are more focused on the climate and modeling problem currently. Once we have all these models up and running I would like to go back and take a deep look at ML again. I believe there is a lot of untapped potential.

Q - What are the next steps for this product?
A - We have to start marketing it now. We are essentially just done with our first model and data release. We would like to make these products available to a large audience, but that requires that people understand the value for their business. The development on these models never stops. I also think that we are just at the beginning of the open data revolution and that this whole field will look quite different in 5 years.

Dag, thanks so much for all the insights and details behind KatRisk - sounds like you are building some very valuable tools/products! Finally, it's time to look to the future and share some advice...

Q - What does the future of Catastrophe Modeling look like?
A - In the future we'll debate complicated issues, such as the cost of climate change, more through data and less through opinions. I really like that science seems to be becoming more open and data more accessible. Catastrophe models and their principles are really at the heart of many discussions when it comes to global change and how we can think about it. I really look forward to that!

Q - Any words of wisdom for Machine Learning students or practitioners starting out?
A - Learn statistics and R - and find an interesting problem. There is a whole world out there that needs these skills, not just analytics about ads and what people will buy next.

Dag - Thank you so much for your time! Really enjoyed learning more about your background, how the world of Catastrophe Modeling is evolving and what you are working on at KatRisk. KatRisk can be found online at http://www.katrisk.com/.

Readers, thanks for joining us!

P.S. If you enjoyed this interview and want to learn more about

what it takes to become a data scientist
what skills do I need
what type of work is currently being done in the field

then check out Data Scientists at Work - a collection of 16 interviews with some of the world's most influential and innovative data scientists, who each address all the above and more! :)