AMA with Chris Mattmann, CTO at NASA JPL

Chris Mattmann answers the community's questions on his work at NASA JPL and on Apache Tika. Made by Angelica Pan using Weights & Biases

Questions

Ayush Chaurasia:
Hey Chris. Excited to have you here. In the podcast, you mentioned that on-device computation isn't really possible as the rovers have 'pea-sized' brains to prevent radiation interference. Why are these chips resistant to radiation interference? Is it something to do with chips of silicon being readily affected by radiation?
Chris:
Thanks for your question! Cosmic radiation can cause issues in space, such as messing up core digital signal logic and, as I mentioned, flipping bits (e.g., from 0 to 1 or vice versa), so we have to fly things that have tested well here on Earth as being resistant to radiation. This puts us behind on the actual chipsets that we can use. What you use on your desktop (like a GPU) is not traditionally available in space today. However, tomorrow it will be, with the advent of the High Performance Spaceflight Computer (HPSC) class chips and even things like the Snapdragon available today.
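To make the bit-flip problem concrete, here is a minimal Python sketch. It simulates a single flipped bit and then recovers the original value by keeping three copies and taking a bitwise majority vote (triple modular redundancy, a general fault-tolerance technique). This is an illustration only, not a description of how JPL flight hardware or software handles radiation; the function names and the stored value are made up for the example.

```python
def flip_bit(value: int, bit: int) -> int:
    """Simulate a radiation-induced upset by inverting one bit of an integer."""
    return value ^ (1 << bit)


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: each output bit agrees with at least two inputs."""
    return (a & b) | (a & c) | (b & c)


stored = 0b10110010                   # the value we meant to keep (hypothetical)
copies = [stored, stored, stored]     # keep three redundant copies
copies[1] = flip_bit(copies[1], 6)    # one copy gets corrupted

recovered = majority_vote(*copies)    # the two intact copies outvote the bad one
```

The vote only protects against a single corrupted copy per bit position; if the same bit flips in two copies, the wrong value wins.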

Danny Goldstein:
• How is ML baked into Perseverance? • What are the critical engineering breakthroughs that ML has enabled in JPL missions?
Chris:
Thanks Danny Goldstein for your questions! ML was part of Perseverance in particular during the Entry, Descent and Landing (EDL) phase with our Terrain Relative Navigation (TRN). This uses computer vision and traditional machine learning to guide the EDL system during the “7 minutes of terror” safely down to a suitable landing site in Jezero Crater. There are other places where ML is part of the overall command and control pipeline, but that’s a very famous and recent one! In terms of critical engineering breakthroughs enabled by ML, we are still in the early days of this. Probably one of the most famous ones thus far has been in the area of rock detection and classification, and the AEGIS system on board is an example of this. But we are only now seeing the advancements in these areas.

Ivan:
Hello, Chris! How are you? I really loved listening to you talk with Lukas on Gradient Dissent. I have a few questions for you. 🙂 1. You were mentioning the ability to send more data to Earth from Mars by compressing it via deep learning. I wonder if something is in the works to use deep learning to allow the rover to drive (or perform certain tasks) autonomously altogether. Do you know when this level of autonomy would be of most value, and are you working on something like this? 2. I wonder how much of the development of Perseverance and Ingenuity is done by NASA, and how much is outsourced to contractors. Which tasks are done by NASA and which do you outsource when engineering and building these machines? 3. What qualities/background should a machine learning engineer have to be able to join the NASA JPL team?
Chris:
Thanks @Ivan for your question(s). Yes, deep learning is in the works for future rover(s), in particular for the Fetch rover part of Mars Sample Return. Percy will collect soil and rock samples in tubules, then drop them periodically during its journey so that some years later the Fetch Rover will pick them up and gather them to a rendezvous site where it can then launch them out of the Martian atmosphere, rendezvous with an orbiter, and then get back to Earth. As part of that, Fetch will need to drive “farther and faster”. Right now we are testing what we call “Energy Aware Optimal Autonavigation”, a capability to look at terrain with computer vision and then figure out, e.g., that if it’s rocky, the wheels will catch better and it will use less power, and if it’s sandy you will have the opposite effect. So that’s one area where they are planning for it. In terms of Percy and Ingenuity, JPL is the center of excellence for NASA’s robotic exploration and its FFRDC (federally funded research and development center). If it’s a first-of-a-kind capability we typically maintain the workforce to do that but partner with companies to build specific software or pieces of it. Once it’s commodity we partner to get it out into industry by doing software release and licensing so that industry can build it after. In terms of qualities and backgrounds for ML engineers, we are looking for early career folks, senior folks, and folks across the gamut. You need not have direct experience in, say, robotic exploration, but having commensurate skills, e.g., in Computer Vision, or in Optimization, or in Engineering disciplines, is important. Also having a passion for space!
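The terrain idea in this answer (rocky ground costs less power than sand) can be sketched as a lowest-energy path search. The snippet below is a toy under stated assumptions, not the Energy Aware Optimal Autonavigation system itself: it assumes terrain has already been classified into a grid of labels, the per-cell energy costs are invented numbers, and it uses plain Dijkstra's algorithm to pick the route.

```python
import heapq

# Hypothetical energy cost to drive into a cell; the numbers are made up.
ENERGY_COST = {"R": 1.0, "S": 4.0}   # R = rocky (good traction), S = sandy


def min_energy_path(grid, start, goal):
    """Dijkstra's algorithm over a 4-connected grid of terrain labels."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}          # lowest known energy to reach each cell
    parent = {}                  # back-pointers for path reconstruction
    frontier = [(0.0, start)]
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if cost > best[cell]:
            continue             # stale queue entry, a cheaper route was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + ENERGY_COST[grid[nr][nc]]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (new_cost, (nr, nc)))
    # Walk the back-pointers from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return best[goal], path


terrain = [
    "RSSR",
    "RSSR",
    "RRRR",
]
energy, path = min_energy_path(terrain, (0, 0), (0, 3))
```

With these made-up costs the planner takes the longer all-rock detour along the bottom row rather than the shorter route straight across the sand, which is the trade-off the answer describes.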

Andrea Pessl:
Hi Chris, I’m curious to know what you would recommend to a young ML engineer at the beginning of her/his career. Any wise words from an industry expert are appreciated. Thanks for taking the time to answer our questions!
Chris:
Thank you @Andrea for your question. My recommendation to the beginning ML engineer was some advice the great Dr. Chris White from DARPA gave me early on (he now runs an amazing team at MSFT Research). Always have one or two projects you go DEEP on. That you are _the Subject Matter Expert (SME)_. Then you can fill the remainder of your time with breadth, but make sure you walk backwards from the end user in those “deep projects”, all the way to the solution.

Devin Gribbons:
Hi Chris. What technology has surprised you the most in terms of how quickly it has advanced in recent years? And what, if any, implications does it have for the tech NASA is building?
Chris:
Thanks @Devin Gribbons for your question(s). One technology that is most surprising to me really is neural networks. They rely on _tons_ of data, but in reality there is an infinite supply out there: the Internet. We’re all data factories, and we are all (constantly in some cases) online. The problem with the Internet is that the data is unstructured, which leads to the other surprising area: data wrangling. It’s advanced so much in recent years. I’m so surprised, since it’s relatively cheap per unit to make structured data nowadays; for example, see http://scrapinghub.com. With the infinite supply of data, the ease of converting it to structured data, and neural nets hungry for it, you had a perfect storm. Of course the nets have to be made explainable, which is the next hard problem.

Angelica Pan:
Hi Chris! What were some of the things that were surprisingly easy and surprisingly hard about writing books like “Machine Learning with TensorFlow, Second Edition” and “Tika in Action”?
Chris:
Hi @Angelica Pan thanks for your question(s)! The hardest part about book writing for me has been finding time to think carefully and the energy to put my thoughts on paper. It was a perfect storm: I finished reading the original book, ML with TensorFlow 1st ed, literally in Fall 2019, and had gone through it taking notes, writing Jupyter notebooks and scribbling on the book. My wife heard me at night - OMG - I see why Elon is scared now! 🙂 So really, with 3 kids, 2 dogs, and a house 😉, it was a lot of work that I found the time and passion for at night with ML with TensorFlow 2ed. Once I had that, the pandemic hit in the 1st Q of 2020 and I suddenly found myself having more async time to take those thoughts and put them into book form. For Tika in Action, it was a bigger lift b/c I was about 2-3 years out of my PhD at that point, not really interested in writing, and still in the individual contributor phase. But it was my topical and deep knowledge of Tika and collaboration with an amazing partner, Jukka Zitting, that made that book happen! We had a ton of material and interest b/c Tika was literally our 100% full time job at that point in each of our scenarios.

Sourav Gupta:
Hey Chris. Considering the SotA chip design research, by when would you say these rovers will have the capacity for some serious on-device computation?
Chris:
Hi @Sourav Gupta thanks for your question. Yes, the Qualcomm Snapdragon is on the Ingenuity helicopter! Why/how, you say? It’s b/c it’s a technology demonstration and not part of the “core” mission and science return. Anything we get from it is a super bonus! So, because of that we can have a higher risk profile than that of the core mission with Percy. There are those that would debate what I just said, but that’s a simple provocative point. We need to have some room for tech demonstrations on missions. So yeah, a Snapdragon could very well be a future HPSC-like chip! What can you imagine as something you’d like to do with it? Send ideas my way: chris.a.mattmann@jpl.nasa.gov!

Kyle Goyette:
Hi Chris, could you tell us more about how subject matter experts will interact with creating models in the future of AutoML?
Chris:
Great question @Kyle Goyette, thanks for it: how will SMEs interact with AutoML? You could imagine that instead of building, say, 3 different pipelines to perform a prediction or a classification, you give feedback on, say, the gradients for a particular neural layer and how you see it converging. With traditional explainable models, you could look at the original features and their distributions over a particular Y value set, decide whether to use a different feature or not, and then construct a new pipeline with that updated feature and score the new models. So it’s neural layers, features, hyperOPT and parameter optimization, in all those areas. Instead of building and testing them from scratch, let the machine vary them and provide options and results when it does. Then choose from the results as to which has the best score. You could also rank/rate the pipelines or neural architectures constructed (harder).
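A minimal sketch of that "let the machine vary, then choose from the results" loop is shown below. Everything in it is hypothetical: the dataset is made up, each "pipeline" is just a one-rule classifier defined by a feature index and a threshold, and scoring is done on the same six rows for brevity, where a real setup would score on held-out data.

```python
from itertools import product

# Tiny made-up dataset: (feature_a, feature_b) -> label
rows = [
    ((0.1, 5.0), 0), ((0.3, 1.0), 0), ((0.4, 4.0), 0),
    ((0.6, 2.0), 1), ((0.8, 5.5), 1), ((0.9, 0.5), 1),
]


def accuracy(feature, threshold):
    """Score a one-rule pipeline: predict 1 when the chosen feature exceeds the threshold."""
    hits = sum((x[feature] > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


# The options the machine is allowed to vary (hypothetical search space).
search_space = {"feature": [0, 1], "threshold": [0.5, 2.5, 4.5]}

# Try every combination and rank the candidates by score, best first.
leaderboard = sorted(
    (
        {"feature": f, "threshold": t, "score": accuracy(f, t)}
        for f, t in product(search_space["feature"], search_space["threshold"])
    ),
    key=lambda result: result["score"],
    reverse=True,
)

for result in leaderboard:
    print(result)
```

The subject matter expert's role in this picture is to read the ranked list, decide whether the winning feature and threshold make sense for the domain, and adjust the search space before the next round.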

Tim Sweeney:
Hi Chris, thanks for doing an AMA - loved the podcast and hearing your perspective on ML, technology, and space robotics. Also, fascinating discussion about the multiple levels of data processing and associated retention. I am curious if you can share more about how launch/deploy decisions are made in your domain. I am used to a very iterative process of development, where the cost of a mistake is seemingly minuscule (and often positioned as a learning opportunity) compared to the cost of mistakes deployed to Mars. What sort of simulations, controls, or other safeguards do you have in place to help mitigate the risks associated with these complex software systems?
Chris:
Dear @Tim Sweeney thanks for your question! The best way to explain the cost/benefit scenario from a launch / deploy would be the 2012 MSL Curiosity Rover. It was originally supposed to launch years earlier, but due to a delay in development we had to wait a couple years to send it. We learned a lot during this process, and how to optimize velocity and rank/risk elements of the development lifecycle. These are big decisions that can have big budgetary impacts that may span Presidential administrations too. NASA uses a full up formulation process, and risk process that is one of the most nuanced in the world. There is immense pressure on NASA, different compared with our commercial partners since failure has typically much higher costs. For risk mitigation, the old adage is that time is something there is possibly infinite of, just not for any one of us, so we need to use it wisely 🙂

Stacey Svetlichnaya:
Hi Chris, thanks so much for doing the AMA and for all your projects, this is such fascinating and important work. As I’ve been following—and getting really excited about—the recent achievements in space technology, a contrarian perspective I sometimes encounter is something like, “why don't we direct more of this problem-solving energy, research enthusiasm, and advanced technology work to helping the one planet we already have (e.g. addressing the climate crisis) instead of just exploring new ones”. While I think humanity can fully pursue both directions, I’m curious what you think about this perspective and whether you see overlapping opportunities, especially in the realm of machine learning, that would benefit both paths.
Chris:
@Stacey Svetlichnaya great question! I hear this perspective more often than people think. For me, Earth is a planet and as such we should be similarly focused on it, so doing “Planetary science” shouldn’t necessitate skipping one or the other. In fact, there may come a time in the future where the work we are doing on Mars and the potential for human exploration and even setting foot on Mars may become an important necessity based on factors that we can no longer humanly influence. Besides that, there is the immediate reward and ROI of NASA technology commercialization. Like that iPhone camera? It came from NASA and its work on imaging systems. Use LZW compression? Yep, NASA. Fan of the Perl language (OK, a stretch, I know, I kid, I kid)? That came from Larry Wall, in my old section at JPL. So JPL and NASA produce amazing software that makes it into everyday life. So that investment in Planetary Science (including Earth) has an immediate ROI 🙂

Charles Frye:
Your call to think more about the hard problems in deploying to edge was inspiring! It seems challenging to take the first step on working on those problems -- relative to joining a Kaggle competition or a reproducibility challenge, to say nothing of just writing some blog posts! What's the best way to start out working in that space? Something like diving into a personal hobby project, taking a particular course, or joining a big open project.
Chris:
Thanks for your question @charles-at-wandb. In terms of working in space and getting started, I recommend visiting NASA’s Earth Science data centers (http://earthdata.nasa.gov/) and the Planetary Data System (PDS), http://pds.nasa.gov/, and taking a look at some of the amazing data there. You paid for it, as a taxpayer 🙂 It’s yours. Check it out. Then look at the HDF, NetCDF and FITS file formats. Then think of some hypotheses, things that you would want to predict or theorize about. Then look in the data and explore it. What trends do you see? What are the error bars? Rinse, wash, repeat 🙂 and enjoy!
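As a starting point for the "what trends, what error bars" step, here is a small standard-library sketch that fits a straight line to a series and reports the standard error of the slope. It assumes the values have already been extracted into plain Python lists (reading HDF, NetCDF or FITS files needs third-party libraries and is not shown), and the readings below are invented numbers, not real NASA data.

```python
import math


def linear_trend(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept, slope standard error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual scatter around the fitted line sets the size of the error bar.
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope_se


years = [0, 1, 2, 3, 4, 5]                        # time since first sample
readings = [10.1, 10.4, 10.9, 11.1, 11.6, 11.9]   # made-up measurements

slope, intercept, slope_se = linear_trend(years, readings)
print(f"trend: {slope:.3f} per year, +/- {slope_se:.3f} (1 standard error)")
```

If the standard error is comparable to the slope itself, the apparent trend is not well supported by the data, which is a useful first check before forming a hypothesis.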

Siddhi Vinayak Tripathi:
Hello there, Chris. I was wondering what led to the creation of Apache Tika?
Chris:
Thanks @Siddhi Vinayak Tripathi for your question. Apache Tika came from the work we were doing on Nutch and Hadoop. Basically it became fashionable to componentize and make more modular the behemoth that was Apache Nutch for crawling the web and being like Google in the mid 2000s. So when we split Hadoop out into its own project, we asked where else could we do this and make things more reusable? A natural fit based on user requests at the time was the language identifier, the parsing system, and the MIME type and metadata system. There were other domains like content management, cyber security and so forth that wanted these capabilities in a reusable form, but didn’t need the web crawler. So, Jerome Charron and I tried to split out Tika and failed the first time, not having the needed support from communities like Lucene. But we found a champion in Jukka Zitting, who was very experienced in open source and Apache and who helped us craft the initial Tika proposal in a way that gained consensus. Once we did this, we were on our way. Tika is the name of Jerome Charron’s child’s stuffed animal. It was popular to do that at the time (search Doug Cutting and where he got the Hadoop moniker from). Thanks!

Conclusion

Chris:
OK everyone, thank you for your time and for your questions! Apologies that I didn’t get to all of them. Thank you so much for hosting me @Lavanya and the W&B community! Looking forward to seeing you here in the future!
Thank you Chris for taking the time to talk to our community!

For more AMAs with industry leaders, join the W&B Slack community.