dataCompare

Overview

Data scientists have always used data to gain insight. They develop models to explain or predict an output variable with input characteristics.
Sometimes they develop pipelines to complete an end-to-end process. Going from raw data to final prediction or scoring involves many steps:

1- Data loading
2- Data pre-processing or transformation
3- Model training
4- Model prediction
5- Post-model prediction logic: prediction aggregation, indicator calculation and segmentation

Each of these steps generates data, referred to here as ‘intermediate results’ or ‘final output’.
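To make the idea of intermediate results concrete, the five steps above can be sketched as a toy pipeline in base R. This is only an illustration under stated assumptions: the built-in `mtcars` dataset, the `lm` model, the `results` list and the segmentation threshold of 20 are all hypothetical choices, not part of dataCompare.

```r
# Hypothetical toy pipeline on the built-in mtcars dataset.
# The output of each step is stored so it can be inspected or compared later.
results <- list()

# 1- Data loading
results$raw <- mtcars

# 2- Data pre-processing: keep a few columns and standardise the inputs
inputs <- c("wt", "hp")
prepared <- results$raw[, c("mpg", inputs)]
prepared[inputs] <- as.data.frame(scale(prepared[inputs]))
results$prepared <- prepared

# 3- Model training (a plain linear model, chosen only for illustration)
model <- lm(mpg ~ wt + hp, data = results$prepared)

# 4- Model prediction
results$prediction <- data.frame(
  car = rownames(results$prepared),
  predicted_mpg = unname(predict(model, newdata = results$prepared)),
  stringsAsFactors = FALSE
)

# 5- Post-model prediction logic: segment cars with an arbitrary threshold
results$final <- transform(
  results$prediction,
  segment = ifelse(predicted_mpg >= 20, "efficient", "thirsty")
)

# Intermediate results and final output, in pipeline order
names(results)
```

Keeping every step's output in one named list makes it straightforward to compare the same step across two implementations of the pipeline.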

In most cases, machine learning models are standalone objects that are shared with other applications via an API.
To make sure that the deployed API pipeline behaves exactly like the modeler’s pipeline, we need to compare their outputs.
Data For Know (D4K) provides a graphical tool to compare the two pipelines.
dataCompare is a tool for comparing the outputs of two machine learning pipelines. It helps check whether the two pipelines produce the same results.
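As a rough illustration of what comparing two pipeline outputs involves, the sketch below uses base R only. It is not how dataCompare is implemented; the data frames `modeler_output` and `api_output` and their values are made up for the example, and the cell-level check assumes both outputs have the same shape, the same column order and no missing values.

```r
# Hypothetical outputs of the modeler's pipeline and of the deployed API
modeler_output <- data.frame(
  id = 1:4,
  score = c(0.10, 0.52, 0.73, 0.91),
  segment = c("low", "mid", "high", "high"),
  stringsAsFactors = FALSE
)
api_output <- modeler_output
api_output$score[2] <- 0.55      # simulate a numeric drift
api_output$segment[3] <- "mid"   # simulate a different label

# Global check: all.equal() returns TRUE or a description of the differences
check <- all.equal(modeler_output, api_output)
same_output <- isTRUE(check)

# Cell-level check: list every value that differs, column by column
stopifnot(
  identical(dim(modeler_output), dim(api_output)),
  identical(names(modeler_output), names(api_output))
)
diff_cells <- do.call(rbind, lapply(names(modeler_output), function(col) {
  rows <- which(modeler_output[[col]] != api_output[[col]])
  if (length(rows) == 0) return(NULL)
  data.frame(
    row = rows,
    column = col,
    modeler = as.character(modeler_output[[col]][rows]),
    api = as.character(api_output[[col]][rows]),
    stringsAsFactors = FALSE
  )
}))

print(same_output)
print(diff_cells)
```

For floating-point outputs, an exact `!=` test can flag harmless rounding noise; `all.equal()` accepts a `tolerance` argument for that case.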

Presentation

dataCompare is a Shiny application developed with the golem framework. It is used to check value differences between two data frames. The code below shows how to install it from CRAN or GitHub.

Install and Load

# From CRAN
install.packages('dataCompare')

# From GitHub (requires the remotes package: install.packages('remotes'))
remotes::install_github('seewe/dataCompare')

# Load in the environment
library(dataCompare)

Run the app with the following code:

dataCompare::run_data_compare_app()