Data scientists have always used data to gain insight. They develop models that explain or predict an output variable from input features. Sometimes they build pipelines that cover the whole process end to end. Going from raw data to a final prediction or score involves several steps:
1- Data loading
2- Data pre-processing or transformation
3- Model training
4- Model prediction
5- Post-prediction logic: prediction aggregation, indicator calculation and segmentation
Each of these steps generates data, referred to here as ‘intermediate results’ or, for the last step, the ‘final output’.
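As an illustration, here is a minimal sketch of such a pipeline in R. The input file, the column names x and y, the glm model and the 0.5 segmentation threshold are all arbitrary assumptions used only to show where intermediate results appear.

# Minimal pipeline sketch (illustrative only): each step produces an
# intermediate result that could later be compared against the deployed API.

# 1- Data loading (hypothetical input file)
raw <- read.csv("data.csv")

# 2- Data pre-processing or transformation
prepared <- na.omit(raw)
prepared$x <- scale(prepared$x)

# 3- Model training (assumes a binary outcome column y)
model <- glm(y ~ x, data = prepared, family = binomial())

# 4- Model prediction
prepared$score <- predict(model, newdata = prepared, type = "response")

# 5- Post-prediction logic: segmentation then aggregation
prepared$segment <- ifelse(prepared$score > 0.5, "high", "low")
final_output <- aggregate(score ~ segment, data = prepared, FUN = mean)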
In most cases, machine learning models are standalone objects that
are shared with other applications via an API.
To make sure that the deployed API pipeline behaves exactly like the modeler’s pipeline, we need to compare their outputs.
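In R, a quick point check could look like the following sketch. The data frames df_modeler and df_api are hypothetical outputs of the two pipelines; base all.equal() or the waldo package are common options for reporting differences.

# Hypothetical outputs of the two pipelines
df_modeler <- data.frame(id = 1:3, score = c(0.12, 0.87, 0.45))
df_api     <- data.frame(id = 1:3, score = c(0.12, 0.87, 0.44))

# Base R: TRUE if equal within tolerance, otherwise a description of the differences
all.equal(df_modeler, df_api)

# waldo gives a more readable, value-by-value diff
waldo::compare(df_modeler, df_api)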
Data For Know (D4K) provides a graphical tool to compare the two pipelines: dataCompare. It compares the outputs of two machine learning pipelines and helps check whether they match.
dataCompare is a Shiny application developed with the Golem framework. It is used to check value differences between two data frames. The code below shows how to install it from GitHub and CRAN.
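A sketch of the installation, assuming the package is published on CRAN under the name dataCompare; the GitHub account shown is a placeholder to replace with the actual repository owner.

# From CRAN
install.packages("dataCompare")

# From GitHub (replace <account> with the actual repository owner)
# install.packages("remotes")
remotes::install_github("<account>/dataCompare")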