{"id":3898,"date":"2022-05-02T15:46:40","date_gmt":"2022-05-02T13:46:40","guid":{"rendered":"https:\/\/dev.littlebigcode.fr\/mlops-why-data-model-experiment-tracking-is-important\/"},"modified":"2022-07-04T23:35:36","modified_gmt":"2022-07-04T21:35:36","slug":"mlops-why-data-model-experiment-tracking-is-important","status":"publish","type":"post","link":"https:\/\/dev.littlebigcode.fr\/en\/mlops-why-data-model-experiment-tracking-is-important\/","title":{"rendered":"MLOps : Why data and model experiment tracking is important ? How tools like DVC and MLflow can solve this challenge ?"},"content":{"rendered":"
[et_pb_section fb_built=”1″ admin_label=”section” _builder_version=”4.16″ da_disable_devices=”off|off|off” global_colors_info=”{}” da_is_popup=”off” da_exit_intent=”off” da_has_close=”on” da_alt_close=”off” da_dark_close=”off” da_not_modal=”on” da_is_singular=”off” da_with_loader=”off” da_has_shadow=”on”][et_pb_row admin_label=”row” _builder_version=”4.16″ background_size=”initial” background_position=”top_left” background_repeat=”repeat” custom_padding=”3px||3px||true|false” global_colors_info=”{}”][et_pb_column type=”4_4″ _builder_version=”4.16″ custom_padding=”|||” global_colors_info=”{}” custom_padding__hover=”|||”][et_pb_text admin_label=”Text” _builder_version=”4.17.4″ text_font=”Average Sans||||||||” text_text_color=”#242B57″ link_font=”Average Sans||||||||” link_text_color=”#1CACE4″ ul_font=”Average Sans||||||||” ul_text_color=”#242B57″ ol_text_color=”#242B57″ quote_font=”Average Sans||||||||” quote_text_color=”#242B57″ header_text_color=”#1CACE4″ header_2_text_color=”#1CACE4″ header_3_text_color=”#1CACE4″ header_4_font=”Average Sans||||||||” header_4_text_color=”#1CACE4″ header_5_font=”Century Gothic Bold||||||||” header_5_text_color=”#1CACE4″ header_6_font=”Century Gothic Bold||||||||” header_6_text_color=”#1CACE4″ background_size=”initial” background_position=”top_left” background_repeat=”repeat” custom_padding=”8px|||||” inline_fonts=”Century Gothic Bold,Century Gothic,Average Sans” global_colors_info=”{}”]<\/p>\n
With the growing interest in machine learning and the growing number of projects that use it (self-driving cars, facial recognition, recommendation systems), software development has shifted from hard-coded rules to rules estimated from data, a.k.a. data-driven models (cf. figure 1). This shift raises a new set of challenges for building reliable and stable information systems on top of imperfect data-driven models: model versioning, deployment, monitoring, explainability and reproducibility.<\/p>\n
By Samson ZHANG<\/a>, Data Scientist at LittleBigCode<\/p>\n Figure 1. Machine learning vs traditional software development. Source: http:\/\/datalya.com<\/a><\/span><\/p><\/div>\n Using data-driven models comes with a whole new set of software engineering best practices designed to tackle those challenges, called MLOps. For a broader view of what MLOps is, I recommend taking a look at Jamila Rejeb\u2019s article: Why MLOps is so important to understand ?<\/a> The main purpose of MLOps is to make your entire ML project lifecycle automated and reproducible. In this article, we will mainly focus on data and model experiment tracking and versioning. The main issues that data and model experiment tracking aims to solve are:<\/strong><\/p>\n Code reproducibility<\/p>\n<\/li>\n Data set reproducibility<\/p>\n<\/li>\n Artifact logging (model weights, hyper-parameters)<\/p>\n<\/li>\n Comparison of experiment results<\/p>\n<\/li>\n<\/ul>\n A data-driven model is, by definition, a model that learns from data (cf. figure 2). Versioning a model therefore involves more than versioning the code or algorithm (neural networks, trees, etc.) and its parameters (weights, etc.). It also involves versioning the data used to train it, because in practice the data set can change, and any change affects the model\u2019s performance. Model versioning makes models reproducible across environments and makes collaboration easier.<\/p>\n Being able to reproduce your models does not only benefit you day to day. In some cases, it can also protect you in legal disputes where you need to prove ownership of your models by showing that you can regenerate them from end to end.<\/p>\n
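The idea that a model version depends on code, parameters and training data together can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how DVC or MLflow are implemented: the function name fingerprint and all sample values are hypothetical, and a real project would hash files on disk, not in-memory values.

```python
import hashlib
import json


def fingerprint(code: str, params: dict, data_rows: list) -> str:
    # Hypothetical helper: combine the three ingredients of a data-driven
    # model (training code, hyper-parameters, training data) into one hash.
    digest = hashlib.sha256()
    for part in (code, json.dumps(params, sort_keys=True), json.dumps(data_rows)):
        # Hash each part on its own first so part boundaries stay unambiguous.
        digest.update(hashlib.sha256(part.encode('utf-8')).digest())
    return digest.hexdigest()


# Illustrative values only; none of them come from the article.
code = 'model = fit(X, y)'
params = {'learning_rate': 0.1, 'depth': 3}
data_v1 = [[1.0, 0], [2.0, 1]]
data_v2 = [[1.0, 0], [2.0, 1], [3.0, 1]]  # one extra training row

# Same code, parameters and data give the same version identifier.
same = fingerprint(code, params, data_v1) == fingerprint(code, params, data_v1)
# Changing only the data gives a different identifier.
changed = fingerprint(code, params, data_v1) != fingerprint(code, params, data_v2)
print(same, changed)
```

With identical code and hyper-parameters, adding a single training row is enough to produce a different identifier, which is why versioning the code alone does not pin down a model.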
\n