Communication Tips for Data Scientists
In this episode, we talk about the struggles data scientists face when communicating their work and share tips on how to address them. Our guest is Avery Smith, founder of Data Career Jumpstart, who also shares his thoughts on how to use certain data science terms when talking to non-technical audience members.
What You’ll Learn in this episode
- Common struggles data scientists have when communicating their work
- Alternative ways to visualize data, e.g., 3D charts and augmented reality
- Tips on how to use certain data science terms when communicating to non-technical audience members
Avery runs the 21 Days to Data challenge. You can find more info and sign up at datacareerjumpstart.com/challenge.
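You can connect with Avery on LinkedIn (Avery Smith) and on Instagram (Data Career Jumpstart), or find all of his resources and socials at datacareerjumpstart.com.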
Get in Touch with Hana
If you are looking for podcast updates and want additional tips on how to visualize and present data sent straight to your inbox, then make sure to subscribe to my weekly data letters.
When you hit that subscribe button, I’ll be sliding into your inbox every Wednesday with an email.
Love the show? Why not leave a review?
It only takes 2 minutes and provides me with invaluable insight as to what the listeners think.
If you enjoyed this episode, check out my other episode on how to confidently communicate your work.
On today's episode, we are joined by Avery from Data Career Jumpstart. I actually met Avery through Clubhouse last year, and thankfully we've remained connected and in touch since then. I'm really happy to have you on the show today, Avery. Welcome.
Thank you. I'm excited to be here. And yeah, I totally forgot that we met on Clubhouse. That was like a one-month thing, wasn't it?
Yes. I know there are some people who are still active on there, but honestly it was very draining, so I haven't consistently kept up with it. I haven't opened it in, I don't know, a long time. But I know there are some people who are still active and benefiting from it, so good for them. And I'm glad that it led me to connect with people like you. I've met so many people through Clubhouse that I still keep in touch with, so I'm really grateful for that. Could you start by telling the listeners a little bit about yourself?
So hello, listeners. My name is Avery Smith. I'm a data scientist. I'm obsessed with data. I love all things data, and data visualization is one of my passions. A little bit about me: I currently do a couple of different things, but I'm mostly focused on the data science education space. I have an online... I don't like to call it a boot camp; I call it a project camp, where I help people pivot into data careers. I also helped teach MIT's data engineering program. In a past life, I was a data scientist for science and energy companies. I was a data scientist at ExxonMobil and worked in the energy and oil and gas space for about two and a half years, and before that, I worked in biotechnology as a data scientist. So I love data science, and I'm just happy to be here.
I love how you call your program a project-based one. You emphasize projects a lot, which is really great and practical, so I'm really glad that you do that. You've worked as a professional data scientist.
And not only at your former employer, but also as a freelancer; even now you still work on data science projects. Could you tell us about the different ways you've had to communicate your work?
Yeah, this is something I'm definitely really passionate about: communicating work. And I think it's actually really changed in the last, you know, year, or I guess at this point two years. COVID basically messed with every industry and almost every aspect of life, and it's definitely been a big mover in my life. I started my business about a year ago, and like you said, I still do freelancing and consulting on the side. I've actually never had a client in the same state as me in my freelancing and consulting. So presenting online has been huge for me, because I don't get face-to-face time with my clients. The only chance I really have to talk to them is over Zoom or Teams or Google Meet or whatever you use. Presenting in that hour, or 15 minutes, or half hour, whatever time we have allotted, is really big, because that's the only real time I have to (a) meet them and have face time with them, (b) convey insights or learnings that I've had, and (c) get feedback and have a conversation with them. So presenting is really big, and I think it's really changed over the last year. It's a lot different than it was in years past, I think.
Did you have to do any in-person ones back when you were at ExxonMobil?
Yeah, that's probably the last time I had an in-person presentation. I guess I've done some in-person seminars recently. But yeah, at ExxonMobil we definitely had a lot of in-person presentations. Some of those were just to my boss, who was more like a friend, in all honesty. And some of them were to VPs, or people who maybe had a little bit less time.
So those were really challenging and really rewarding. There were definitely a lot of those presentations at Exxon.
So, whether it's the colleagues you worked with before or data professionals now, since you have your own company and work with a variety of different data professionals, where have you noticed data scientists struggle the most when it comes to communicating data?
I think the biggest thing is when they've done analysis with code. For some reason, when we get into code, we just forget that, oh, I actually have to present something; there's, you know, a business insight. I see this with a lot of the students I work with. We have quarterly hackathons. For instance, last quarter, we had a hackathon with a real company called The Column. It's a newsletter company. They had some data business problems that they asked us to solve, and we crowdsourced solutions, pitched them, and presented them to the CEO. One of the things I noticed is that when people get into a Jupyter notebook or some sort of coding IDE, they really struggle to remember that the CEO doesn't know how to code, and even if the CEO does know how to code, they don't want to read code. So one of the biggest struggles I've noticed is: how do you take the code that you've written, the good graphs you've made, maybe some analysis you've done or a model you've built, and turn that into an actionable, digestible presentation? How do you give that information to someone who doesn't want to read through all your code? That's one of the biggest struggles I see: how do you turn code into insights and present them effectively?
Yeah, when I was a new data professional, I thought sharing code on my screen ... in keeping my audience in mind.
And I knew there was a time and place where it may be appropriate to share code: if the purpose of the presentation is to get feedback from a technical audience of my peers, or to share or demo my work to fellow data professionals. But as you mentioned, you're often communicating your work to non-technical audiences, and being able to translate the outputs, the results from your project, so that they can see the benefits and the recommendations is the part that we sometimes struggle with. So that is a very good point. Thank you for sharing that from your experience. Okay, so I have another question for you. I think I know the answer based on some content you've shared, but I'm really excited for the listeners to hear your take on this too. Are there any trends or innovative ways of communicating data that you think are taking off or have the potential to take off?
Oh, I know what you're talking about. I'm really into data visualization; it's something I've always really enjoyed. Recently I found a company called Flow Immersive that has been taking data visualization and putting it into augmented reality, or AR. You can make these 3D graphs that you can display in your boardroom or in your office, and you can walk around the data. I think that is just fascinating. It's really cool to be able to visualize your data literally right in front of your face, in 3D, as if it's in the room with you. So I'm really excited about that. I'm not exactly sure... I haven't seen a ton of good use cases for it, other than that it's really fascinating and it captures attention. But as a data viz whiz, how do you feel about 3D visualizations? I know Tufte's not a fan, but what about you?
So I should first say that I'll give my opinion and distinguish between 3D and immersive, like what you mentioned.
I haven't tried immersive 3D data visualization yet. I'm excited to try it out, but now that I think about it, my concern is about the same for both: is this added dimension going to add more confusion, or is it adding anything of value? But I can see that by immersing ourselves in the data, we get to engage more of our senses, so you can capture your audience's attention more than if you were to just pull up a screen and show a chart, where people could still zone out. So I can see the benefit of that, and I'm eager to try it out, though I can't say too much about it since I haven't tried immersive data visualizations yet. I think when we see so-called 3D charts on a 2D surface, they're not really 3D, if you know what I mean. It's not truly 3D until you have a more immersive experience, or something that is actually, physically three-dimensional. There was an exhibition in New York that had physically visualized data that people could walk through. I don't know how to say it, but it was different from what you're talking about. You could actually touch it.
Yeah, so it's material versus just digital.
Yeah. And one of the pieces was about how small tickets for small crimes, like a broken window, could actually tangle people into the criminal justice system and negatively affect their lives. The data was hanging, and as you walked through it, it tangled you, so you actually got to experience the topic of the project. There were other pieces that did something similar. I thought that was a really interesting way to engage your audience and create a deeper, emotional connection with the topic. So, like you, I'm excited about the potential of this.
Yeah. It's fascinating.
Like you said, I don't know if it's the most rule-following data viz technique, but it sure is really interesting. And Michael, the CEO, is the data guy on TikTok. His videos go viral when he's showing this stuff in augmented reality. So it's sure interesting. And at least if you have boring data, put it in augmented reality and someone will care about it. At least one person.
Yeah. Thank you for sharing your thoughts on that. In the second segment of the podcast, I want to play a game using data science terms. We're going to pretend that you're presenting to a non-technical audience member about a recent data project you completed, and you're sharing your findings with them. You're not stuck or anything, and you're not asking for any input; you're just presenting the end product of your work. I'll say a data term and give you three options to choose from for each term. The first option is to include the term in your presentation as is, without defining or explaining it. The second option is to skip mentioning the term altogether. And the third is to translate or explain the term in a way that a non-technical person will be able to understand. Does that make sense?
Fun. Yeah, I'm in.
Yeah, I think it's a fun game. I know you have a strong technical and math background, so I'm really curious to hear your input on this. The first term I have is P value.
Oh man. This is tough. This is really tough. I think I'm going to vote to leave it out, or I'm going to put in a fourth option. I'm sorry, I just ruined your game.
No, no, no, go for it.
My fourth option is to substitute. So instead of saying P value: if the P value is less than your significance level, which comes from your desired confidence level (usually 95%, so your P value would need to be less than 0.05), that's statistically significant. So I'm going to substitute for P value.
And I'm either going to say it's statistically significant or not statistically significant, if I'm speaking to a non-technical audience.
I like this answer, and it actually ties in with the next term I was going to ask you about, which is statistical significance. You said you would actually present that to your audience. Because significance has a different meaning in a non-data or non-statistical context, have you noticed audience members get confused, or use this term incorrectly, when they have no idea about the P value?
Yeah, that is a good point: people can take statistically significant, cut off the statistically, and take it as significant. It doesn't happen every time, but they're like, oh, that is significant. And significant probably has a different meaning than statistically significant. That's a good point; to be honest, I should probably think about it more often. I mean, you're the best at explaining this, but you obviously need to cater your message to your audience, depending on who you're speaking to. A lot of the time I'm speaking with fairly non-technical, or at least non-data-technical, CEOs, founders, and managers. So a lot of the time I could even substitute. I don't want to say dumb it down, but if I want to speak more for the layman, instead of statistically significant, I'll use a phrase like the data says. Typically I have the rapport and their trust; they kind of just trust that I know how to handle the data. So I'll just say, this is what the data tells us. But at that point you might be running the risk of being too general and not technical enough.
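Avery's substitution, swapping the P value for a plain verdict of statistically significant or not, can be sketched as a small Python helper. This is a minimal illustration under stated assumptions, not code from the episode: the function name `describe_significance` is hypothetical, and the default significance level of 0.05 simply mirrors the 95% confidence example Avery gives. It translates a P value you have already computed; it does not compute one.

```python
def describe_significance(p_value: float, alpha: float = 0.05) -> str:
    """Translate a P value into a phrase for a non-technical audience.

    Hypothetical helper for illustration. alpha is the significance level;
    the default of 0.05 assumes the common 95% confidence level.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must be between 0 and 1")
    # Significant only when the P value falls strictly below the significance level.
    if p_value < alpha:
        return "statistically significant"
    return "not statistically significant"


print(describe_significance(0.01))  # statistically significant
print(describe_significance(0.20))  # not statistically significant
```

Keep in mind the caveat raised in the conversation: statistically significant only means a result this extreme would be unlikely if there were no real effect. It does not mean the effect is large or important, which is what significant usually means in everyday speech.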
So I think if your audience maybe doesn't understand statistical significance, basically say that the data can only tell us so much, based on the stats behind it, and there's a certain level where we have to be confident. I didn't do a great job explaining that, but I think giving them a smaller definition of what the term actually means is a good idea.
You're bringing up a good point, because there are some people, especially higher up in the company, who want certainty, sometimes a hundred percent certainty, because there's a lot at stake. Even when you tell them something is statistically significant, or the P value is a really good number, nothing is ever a hundred percent certain in these cases. Sometimes people want you to say those words, but we can't. We just can't. What about correlation?
Hmm. I think for the most part my audience at least understands the idea behind correlation, so I'm going to use it and not explain it.
Got it. And outlier?
Once again, there's obviously a more technical definition of what is actually an outlier. In a normal distribution, is it plus or minus two standard deviations from the mean? That's maybe what an outlier is to a statistician. But I think there are enough books and culture out there that people know what an outlier is. There's the Malcolm Gladwell book; I can't remember what it's called. I think most people will get the idea, so I'm going to use it and not explain it. The first option.
Yes. I think the book is called Outliers.
Oh, there you go.
Yeah, it's a pretty mainstream word now. Now, what about residuals, or error?
Mm. Yeah, that's a good question. I'm like, well, does this audience know residuals? So if you guys don't know residuals, it's basically the difference between the actual value and the estimate. Okay.
That's how I would explain it right there; that's what a residual is. The majority of the time, I don't think I'll mention the word residual. Instead, I'm often reporting an error metric, like the mean absolute percentage error or the mean absolute error, or one of these other error metrics. Those I do usually have to explain. To be honest, I do a lot of machine learning, so we have a lot of accuracy metrics. Each time, I basically put it into a machine and it spits out a bunch of the errors for me, and I always need to remind myself what all of them mean and why they're different. So I would probably research those before talking about them and then explain the most important one, the one that makes the most sense to the business. I actually had to do this the other day with one of my consulting jobs. We were making a machine learning model to predict prices, and we were trying to see how good it was and whether it was good enough to meet business needs. I had like 10 different error metrics to choose from, and I only wanted to give them one or two and then explain them.
Mm. Okay. Just curious, because I know in the non-data world, error is a much more serious word. When you're presenting about the errors in your model to non-technical people, have you ever noticed them get overly concerned because they heard the word error and associated it with something very negative?
Oh, that's a good question. I don't have that happen too often, and I'm wondering if it's because I deal with a lot of newer companies and people who are maybe just dabbling in data for the first time. Although it might be scary to them at the beginning, I try to be very clear, and I try to tell them that whatever model we make will not be perfect.
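The residual and error-metric ideas above can be made concrete with a short Python sketch. It assumes a residual is computed as actual minus estimate (as Avery describes it) and uses the standard definitions of mean absolute error (MAE) and mean absolute percentage error (MAPE). The function names and the price figures are hypothetical, chosen only for illustration, and are not from Avery's consulting project.

```python
from statistics import mean
from typing import List, Sequence


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual = actual value minus the model's estimate, one per observation."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average size of the miss, in the same units as the data (e.g. dollars)."""
    return mean(abs(r) for r in residuals(actual, predicted))


def mean_absolute_percentage_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average size of the miss as a percentage of the actual value.

    MAPE is undefined when an actual value is zero, so that case raises an error.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    errors = residuals(actual, predicted)
    return 100 * mean(abs(r / a) for r, a in zip(errors, actual))


# Hypothetical prices and predictions, for illustration only.
actual_prices = [100, 200, 300]
predicted_prices = [110, 190, 330]

print(residuals(actual_prices, predicted_prices))  # [-10, 10, -30]
print(mean_absolute_error(actual_prices, predicted_prices))  # about 16.67
print(mean_absolute_percentage_error(actual_prices, predicted_prices))  # about 8.33
```

MAE tells you how far off the predictions are on average, in the data's own units (dollars, for prices). MAPE expresses the same miss as a percentage of the actual value, which a business audience may find easier to interpret. Reporting just one or two metrics like these, rather than every metric a library prints, follows Avery's approach of explaining the metric that makes the most sense to the business.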
And that doesn't necessarily matter. It's just: does it help us make better decisions than we're making today? I try to illustrate that we don't have perfect data, and we don't have unlimited data, so no matter what we make, it's not going to be perfect. I think setting expectations can really help with that in the end, because they know, okay, this model's not going to be perfect.
Right. They have to accept that there will be some degree of error.
Yeah. That's pretty much life. Even when I'm walking, I might take a wrong step or trip. Error's just part of life, I guess.
Yeah. So thank you so much for sharing your advice for people who are looking to communicate better in the data world, and also for playing this game. For those who are listening, there's a lot to learn from this game and from the advice Avery gave on how to present these terms, if at all, and whether they're relevant for your data project. I think this applies mostly to data scientists. As a data analyst, I don't work with models as much, but I have noticed there are times when I have to be careful about how I define terms that have a different meaning colloquially, like significant. This was really interesting. So Avery, I know that you also have a program for people who are interested in getting their feet wet with data and getting some more hands-on practice. Can you tell us more about this program?
Yeah, for sure. I run a challenge called the 21 Days to Data challenge, which is kind of all in the title. It's 21 straight days of data learning, where you set aside a half hour to 45 minutes of your night and learn something new about data. It's a really good introductory course for seeing if data science or data analytics is for you. We cover an intro to Tableau, an intro to SQL, and an intro to Python, and then we actually make our first data analytics project.
It's based on all this analysis we're doing. We follow a crime data set from New York City, summarize everything we've learned over the 21 days, and make a data science project post at the end. So by the end, you'll figure out if you like data, you'll have those three foundations of Tableau, SQL, and Python, and you'll also have your very first data science project. So a lot gets done in just 21 short days.
I love how you said the time commitment is only 30 to 45 minutes each day, which is not a lot. It's really amazing that in just 21 days you're able to do so much. Thank you so much for telling us about this. Where can people find out more about how to join this program?
Go to www.datacareerjumpstart.com/challenge, and that will take you to the video that explains everything, the schedule, and all that good stuff.
Awesome. I will put the link to this challenge in the show notes associated with this podcast, so definitely check it out. I'm so glad we were able to chat today, and I really look forward to having you again in the future. For folks who are listening and want to connect with you, where can they contact you?
Yeah, for sure. I'm glad to be here. I would love to meet and connect with all of you guys. My most active social media platform is LinkedIn; I like to post at least daily there, and you can find me just at my name, Avery Smith. Otherwise, I'm on Instagram at my company's name, Data Career Jumpstart, which is probably my second most active platform. And then I'm all over the place: I have a Twitter, and I have a podcast you can check out, the Data Career Podcast. I need to have Hana on as a guest really soon. But those are probably the best ways. Actually, the very best way is datacareerjumpstart.com, which has all of my resources and all of my socials. So definitely check that out: datacareerjumpstart.com.
Before we started recording this show, Avery and I were chatting, and he mentioned how he has spread himself thin across all these different platforms. So there are definitely a lot of other places to check him out, like YouTube and other social media channels. I will link all the ways that people can connect with you in my show notes. So for those who want to connect with Avery, which you should, I will add all his contact info in the show notes associated with this episode. Thank you again so much, Avery, for coming on.
Yeah, thank you. It was a pleasure, and I had a fun time.