The Data Daily

Want To Be AI-First? You Need To Be Data-First. | 7wData

Those that implement AI and machine learning projects quickly learn that machine learning projects are not application development projects. Much of the value of a machine learning project rests in the models, training data, and configuration information that guides how the model is applied to the specific machine learning problem. The application code is mostly a means to implement the machine learning algorithms and "operationalize" the model in a production environment. That's not to say that application code is unnecessary; after all, the computer needs some way to operationalize the model. But focusing a machine learning project on the application code misses the big picture. If you want your project to be AI-first, you need a data-first perspective.

It follows that if you're going to have a data-first perspective, you need to use a data-first methodology. There's certainly nothing wrong with Agile methodologies as a way of iterating toward success, but Agile on its own leaves much to be desired, since it focuses on the functionality and delivery of application logic. Data-centric methodologies already exist and have been proven in many real-world scenarios. One of the most popular is the Cross-Industry Standard Process for Data Mining (CRISP-DM), which focuses on the steps needed for successful data projects. In the modern age, it makes sense to merge the notably non-agile CRISP-DM with Agile methodologies to make it more relevant. While this is still a new area for most enterprises implementing AI projects, we see this sort of merged methodology as more successful than trying to shoehorn all the aspects of an AI project into existing application-focused Agile methodologies.
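One way to picture such a merged approach is as a loop: each agile sprint walks the CRISP-DM phases, and an evaluation gate decides whether the sprint ends in deployment or feeds another iteration. The phase names below are CRISP-DM's own; the `run_sprint` and `iterate_until_deployed` functions are hypothetical illustrations for this article, not part of any standard.

```python
# The six CRISP-DM phases, in their standard order.
CRISP_DM_PHASES = [
    "business_understanding",
    "data_understanding",
    "data_preparation",
    "modeling",
    "evaluation",
    "deployment",
]

def run_sprint(sprint_number, evaluate):
    """Walk one agile sprint through the CRISP-DM phases.

    'evaluate' is the sprint's quality gate: it decides whether the
    model is good enough to deploy or needs another iteration.
    """
    log = []
    for phase in CRISP_DM_PHASES[:-1]:  # every phase before deployment
        log.append((sprint_number, phase))
    if evaluate(sprint_number):
        log.append((sprint_number, "deployment"))
    return log

def iterate_until_deployed(evaluate, max_sprints=5):
    """Repeat sprints until the evaluation gate approves deployment."""
    history = []
    for sprint in range(1, max_sprints + 1):
        history.extend(run_sprint(sprint, evaluate))
        if history[-1][1] == "deployment":
            break
    return history
```

For example, `iterate_until_deployed(lambda sprint: sprint >= 2)` models a project whose evaluation gate fails the first sprint and approves the second, so the history ends with `(2, "deployment")`. The point of the sketch is that the data-centric phases, not the application code, drive each iteration.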

It stands to reason that if you take a data-centric perspective on AI, you need to pair your data-centric methodologies with data-centric technologies. This means that your choice of tooling to implement all the artifacts detailed above needs to be, first and foremost, data-focused. Don't use code-centric IDEs when you should be using data notebooks. Don't use enterprise integration middleware platforms when you should be using tools that focus on model development and maintenance. Don't use so-called machine learning platforms that are really just a pile of cloud-based technologies or overgrown big data management platforms. The tools you use should support the machine learning goals you need, which are in turn supported by the activities you need to perform and the artifacts you need to create. Just because a GPU provider has a toolset doesn't mean it's the right one to use. Just because a big enterprise vendor or a cloud vendor has a "stack" doesn't mean it's the right one. Start from the deliverables and the machine learning objectives and work your way backwards.

Another big consideration is where and how machine learning models will be deployed, or in AI-speak, "operationalized". AI models can be implemented in a remarkably wide range of places: from "edge" devices sitting disconnected from the internet to mobile and desktop applications; from enterprise servers to cloud-based instances; and in all manner of autonomous vehicles and craft. Each of these locations is a place where AI models and implementations can and do exist. This degree of operationalization heterogeneity highlights just how ludicrous the idea of a single machine learning platform is. How can one platform simultaneously provide AI capabilities in a drone, a mobile app, an enterprise implementation, and a cloud instance? Even if you source all this technology from a single vendor, it will be a collection of different tools sitting under a single marketing umbrella rather than a single, cohesive, interoperable platform that makes any sense.
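One practical consequence of that heterogeneity is to treat the trained model itself as portable data rather than platform-bound code, which is the role interchange formats such as ONNX play in practice. As a toy illustration only (the JSON layout and function names here are invented for this sketch, not a real interchange format), a linear model's weights can be exported once and then scored on any target able to read the file:

```python
import json

def export_model(weights, bias, path):
    """Serialize a toy linear model as plain data (weights + bias),
    so no particular runtime or platform is baked into the artifact."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)

def load_and_predict(path, features):
    """Load the exported model on any target and score one example."""
    with open(path) as f:
        model = json.load(f)
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]
```

For example, after `export_model([0.5, -1.0], 2.0, "model.json")`, any device that can parse JSON can compute `load_and_predict("model.json", [4.0, 1.0])` and get `3.0`. The model travels as data; only a thin, target-specific scoring layer needs to exist on each device.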

All this methodology and technology can't assemble itself.
