Before building an ML system, we must understand why this system is needed. If this system is built for a business, it must be driven by business objectives, which will need to be translated into ML objectives to guide the development of ML models.
When working on an ML project, data scientists tend to care about the ML objectives: the measurable performance metrics of their models, such as accuracy, F1 score and inference latency. They get excited about improving a model’s accuracy from 94% to 94.2% and might spend a ton of resources (data, compute and engineering time) to achieve that. But the truth is that most companies don’t care about fancy ML metrics. They don’t care about increasing a model’s accuracy from 94% to 94.2% unless it moves some business metric.
The ultimate goal of any project within a business is to increase profits, either directly or indirectly: directly, for example by increasing sales (conversion rates) or cutting costs; indirectly, for example by raising customer satisfaction or increasing time spent on a website.
For an ML project to succeed within a business organisation, it’s crucial to tie the performance of an ML system to the overall business performance.
One of the reasons why predicting ad click-through rates and fraud detection are among the most popular use cases for ML today is that it’s easy to map ML models’ performance to business metrics: every increase in click-through rate results in actual ad revenue and every fraudulent transaction stopped results in actual money saved.
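As a rough illustration of how a fraud model's performance might be mapped to money, the sketch below converts evaluation counts into an estimated net saving. The cost model, the names (`FraudOutcome`, `estimated_net_savings`, `avg_fraud_amount`, `review_cost_per_flag`) and all the numbers are hypothetical, chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass
class FraudOutcome:
    """Counts from evaluating a fraud classifier on labelled transactions."""
    true_positives: int   # fraudulent transactions correctly blocked
    false_positives: int  # legitimate transactions wrongly flagged
    false_negatives: int  # fraudulent transactions missed


def recall(outcome):
    """ML metric: fraction of fraudulent transactions that were caught."""
    total_fraud = outcome.true_positives + outcome.false_negatives
    return outcome.true_positives / total_fraud if total_fraud else 0.0


def estimated_net_savings(outcome, avg_fraud_amount, review_cost_per_flag):
    """Business metric under an assumed (hypothetical) cost model.

    Assumes every blocked fraud saves `avg_fraud_amount` and every flagged
    transaction, correct or not, costs `review_cost_per_flag` to review.
    """
    saved = outcome.true_positives * avg_fraud_amount
    flagged = outcome.true_positives + outcome.false_positives
    return saved - flagged * review_cost_per_flag


# Made-up evaluation results for two model versions.
baseline = FraudOutcome(true_positives=80, false_positives=40, false_negatives=20)
improved = FraudOutcome(true_positives=90, false_positives=60, false_negatives=10)

for name, outcome in [("baseline", baseline), ("improved", improved)]:
    savings = estimated_net_savings(outcome, avg_fraud_amount=250.0,
                                    review_cost_per_flag=5.0)
    print(f"{name}: recall={recall(outcome):.2f}, net savings={savings:.2f}")
```

Under these assumed costs, the version with higher recall also saves more money, even though it flags more legitimate transactions. With a higher review cost the conclusion could flip, which is why the business metric, not the ML metric alone, should drive the comparison.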
Once everyone is on board with the objectives for our ML system, we’ll need to set out some requirements to guide the development of this system.
The specified requirements for an ML system vary from use case to use case. However, most systems should have these four characteristics: reliability, scalability, maintainability and adaptability.
- Reliability
  - The system should continue to perform the correct function at the desired level of performance even in the face of adversity (hardware or software faults and even human error).
  - “Correctness” might be difficult to determine for ML systems.
- Scalability
  - There are multiple ways an ML system can grow.
    - It can grow in complexity.
    - It can grow in traffic volume.
    - It can grow in ML model count.
- Maintainability
  - Code should be documented. Code, data and artefacts should be versioned. Models should be sufficiently reproducible.
- Adaptability
  - To adapt to shifting data distributions and business requirements, the system should have some capacity for both discovering aspects for performance improvement and allowing updates without service interruption.
  - Because ML systems are part code, part data and data can change quickly, ML systems need to be able to evolve quickly.
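One simple way to notice a shifting data distribution is to compare summary statistics of a feature in recent production data against a reference sample. The sketch below is a deliberately crude heuristic (a mean-shift check measured in reference standard deviations), not a method these notes prescribe; the function names, the threshold and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def mean_shift_score(reference, recent):
    """Distance between the recent and reference means,
    measured in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference sample has zero variance")
    return abs(mean(recent) - mean(reference)) / ref_std


def has_drifted(reference, recent, threshold=2.0):
    """Flag drift when the shift exceeds `threshold` (an assumed default)."""
    return mean_shift_score(reference, recent) > threshold


# Made-up values for a single numeric feature.
reference = [10.0, 12.0, 11.0, 9.0, 10.5, 11.5, 10.0, 9.5]
similar = [10.2, 11.1, 9.8, 10.9]
shifted = [15.0, 16.2, 14.8, 15.5]

print(has_drifted(reference, similar))  # recent data close to the reference
print(has_drifted(reference, shifted))  # recent data far from the reference
```

A check like this only catches shifts in the mean of one feature. A real system would likely need per-feature statistical tests and monitoring of model outputs as well, but even a simple signal gives the system a way to discover when an update is needed.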