Being a data scientist is hard. Beyond the mix of advanced mathematics and coding skills the job requires, data science is a newer role in many organizations, so data scientists are often called upon to navigate corporate landscapes, source the right IT resources, and establish new cross-departmental workflows before they can work effectively.
These best practices will help data leaders be more effective at their jobs, lead the way for future data scientists, and establish a department that’s innovative and productive.
Because open-source tools are such an important part of the data science technology stack, it’s important that hiring criteria reflect this.
Data scientists with experience contributing to open-source projects will have a better understanding of how to evaluate and manage open-source tools by looking at code activity, package metadata, release history, and project contributors.
They should also understand when and how to submit pull requests when packages can be updated, enhanced, or made more secure to meet an organization’s needs. In addition to hiring data scientists and developers with open-source expertise, consider working with a vendor that provides support for open-source tools and libraries.
When data scientists don’t monitor for potential threats, vulnerabilities inevitably creep into models and the software they depend on over time. Data science leaders must step up and collaborate with IT and security leaders to take charge of their data science and machine learning pipelines.
Because these pipelines usually rely on open-source libraries, it’s important to understand an organization’s risk tolerance for open-source software. Learn about Common Vulnerabilities and Exposures (CVEs), how to look for them, and how to monitor environments for high-risk packages. Ignoring a CVE with a high severity score, as rated by the Common Vulnerability Scoring System (CVSS), can result in data breaches and unstable applications.
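As a rough illustration of monitoring an environment for high-risk packages, the Python sketch below lists installed distributions with the standard library's importlib.metadata and flags any whose exact version appears in an advisory scored at or above a CVSS threshold of 7.0, the lower bound of the CVSS v3 "High" band. The Advisory record type, the package names, and the HYPOTHETICAL-000N identifiers are invented for illustration; a real setup would pull advisories from a vulnerability database or a dedicated scanner rather than hard-coding them.

```python
from dataclasses import dataclass
from importlib import metadata


@dataclass(frozen=True)
class Advisory:
    # Hypothetical advisory record; in practice these would come from a
    # vulnerability feed or scanning tool, not be written by hand.
    package: str
    affected_versions: frozenset[str]
    cvss_score: float
    advisory_id: str


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name (lowercased) to its version."""
    return {
        dist.metadata["Name"].lower(): dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip broken distributions with no name
    }


def high_risk_packages(
    installed: dict[str, str],
    advisories: list[Advisory],
    threshold: float = 7.0,  # CVSS v3 "High" starts at 7.0
) -> list[tuple[str, str, str, float]]:
    """Return (package, version, advisory_id, score) for every installed
    package whose exact version is listed in an advisory at or above
    the threshold, sorted from most to least severe."""
    flagged = []
    for adv in advisories:
        version = installed.get(adv.package.lower())
        if version in adv.affected_versions and adv.cvss_score >= threshold:
            flagged.append((adv.package, version, adv.advisory_id, adv.cvss_score))
    return sorted(flagged, key=lambda item: item[3], reverse=True)


# Usage with a made-up environment and made-up advisories.
advisories = [
    Advisory("examplelib", frozenset({"1.2.0", "1.2.1"}), 9.1, "HYPOTHETICAL-0001"),
    Advisory("otherlib", frozenset({"0.4.0"}), 5.3, "HYPOTHETICAL-0002"),
]
env = {"examplelib": "1.2.1", "otherlib": "0.4.0", "numpy": "1.26.4"}
print(high_risk_packages(env, advisories))
# [('examplelib', '1.2.1', 'HYPOTHETICAL-0001', 9.1)]

# Against the real environment: high_risk_packages(installed_packages(), advisories)
```

Matching exact version strings keeps the sketch free of third-party dependencies; production scanners typically match version ranges and track fixed releases instead.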
Many data scientists don’t start out on dedicated teams; rather, they are scattered across the organization and assigned to specific lines of business to solve particular problems.
This is usually effective for organizations starting their data science journeys because it’s easier to demonstrate business impact with small, focused projects. But over time, data scientists will need to collaborate to develop processes and eliminate redundancy. They will also need to work with IT to understand how to put projects into production, assess the limits of their resources, and understand security standards.
Many organizations have found success with a hub-and-spoke model, in which some data scientists remain within lines of business while others work in a central data science lab or center that supports data scientists and analysts across the organization.
To solve business problems, data science teams need to speak the language of the business units they work with. When presenting to a line of business, use the terms and acronyms that unit already uses. This helps establish common ground for defining and evaluating success.
Alignment with lines of business is also important when building custom dashboards that serve unique needs. Reviewing these dashboard metrics regularly in joint meetings keeps both sides grounded as new data science project goals are discussed and as the effects of decisions based on model output are evaluated.
By some industry estimates, nearly half of data science projects never make it to production.