RFC: JModels.jl

This document specifies a lightweight interface for statistical models in Julia. The goal is to create a generic interface that many (wildly) different packages can implement.

One main goal of this interface is the following. Suppose there is a Julia cross-validation package, CV.jl, built on the JModels interface. The interface should then make it possible to use CV.jl on models defined by organisations such as

and more.

Model Basics

In the most basic sense, we define a statistical model as an object that can be fitted to some random variables:

fmodel = fit(model, training_data)

This is often called fitting or training a model. Such a fitted model can then be applied to new data, often known as predicting:

predictions = predict(fmodel, data)

where data satisfies the assumptions of fmodel.
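A minimal sketch of this fit/predict cycle, assuming the signatures documented below. The `JModels` module here is a hypothetical stand-in that only declares the generic functions named in this RFC, and `MeanModel` is a hypothetical toy model that always predicts the mean of its training data.

```julia
# Hypothetical stand-in for JModels.jl: only the function names come from this RFC.
module JModels
function fit end
function predict end
end

using Statistics: mean

# Hypothetical toy model: predicts the training mean for every input.
struct MeanModel
    mu::Float64
end

# fit takes the model *type* and the training data and returns a fitted model.
JModels.fit(::Type{MeanModel}, x::AbstractVector{<:Real}; settings = NamedTuple()) =
    MeanModel(mean(x))

# predict returns one prediction per element of `data`.
JModels.predict(fmodel::MeanModel, data::AbstractVector; settings = NamedTuple()) =
    fill(fmodel.mu, length(data))

training_data = [1.0, 2.0, 3.0]
fmodel = JModels.fit(MeanModel, training_data)
predictions = JModels.predict(fmodel, [10.0, 20.0])   # [2.0, 2.0]
```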

Using the Interface

As briefly mentioned in Model Basics, the main functions for consuming compatible models are fit and predict. Details about these and related methods are provided below:

JModels.fit (Function)
JModels.fit(t, x; settings=NamedTuple())

Return a fitted model of type t, trained on the data x. Implementing this function is encouraged but optional. Without it, workflows that repeatedly instantiate, fit, and predict with your model, such as model evaluation, are not possible.
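To illustrate why fit matters for evaluation, here is a sketch of a generic k-fold cross-validation routine in the spirit of the hypothetical CV.jl from the introduction. It calls only JModels.fit and JModels.predict, so it works for any model type that implements them. The stub module, `kfold_mse`, `MeanRegressor`, and the convention of passing training data as an `(x, y)` tuple are all assumptions made for illustration.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function fit end
function predict end
end

# Hypothetical model: predicts the mean of the training targets.
struct MeanRegressor
    mu::Float64
end

# Assumption: training data is passed as an (x, y) tuple.
function JModels.fit(::Type{MeanRegressor}, data::Tuple; settings = NamedTuple())
    _, y = data
    return MeanRegressor(sum(y) / length(y))
end

JModels.predict(fmodel::MeanRegressor, x; settings = NamedTuple()) =
    fill(fmodel.mu, length(x))

# Generic k-fold cross-validation: fits and predicts with model type T once
# per fold, using nothing but the JModels functions.
function kfold_mse(T::Type, x::AbstractVector, y::AbstractVector; k::Int = 3)
    n = length(y)
    errors = Float64[]
    for fold in 1:k
        test_idx = fold:k:n                  # every k-th observation, offset by fold
        train_idx = setdiff(1:n, test_idx)
        fmodel = JModels.fit(T, (x[train_idx], y[train_idx]))
        yhat = JModels.predict(fmodel, x[test_idx])
        push!(errors, sum(abs2, yhat .- y[test_idx]) / length(test_idx))
    end
    return sum(errors) / k                   # mean squared error averaged over folds
end

x = collect(1.0:6.0)
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kfold_mse(MeanRegressor, x, y)   # 3.75
```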

It is advisable to give every keyword argument a default value; this makes it easier to compare different models.
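One way to follow this advice, sketched under assumptions: keep a NamedTuple of defaults and `merge` the caller's `settings` into it, so that a plain `fit(t, x)` always works and callers override only what they need. `ShrunkMean`, its `shrinkage` setting, and `SHRUNKMEAN_DEFAULTS` are hypothetical names.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function fit end
end

# Hypothetical model that shrinks the sample mean towards zero.
struct ShrunkMean
    mu::Float64
end

# Default value for every tunable setting.
const SHRUNKMEAN_DEFAULTS = (shrinkage = 0.0,)

function JModels.fit(::Type{ShrunkMean}, x::AbstractVector{<:Real};
                     settings::NamedTuple = NamedTuple())
    s = merge(SHRUNKMEAN_DEFAULTS, settings)   # caller's values override defaults
    return ShrunkMean((1 - s.shrinkage) * sum(x) / length(x))
end

default_fit = JModels.fit(ShrunkMean, [2.0, 4.0])                                 # mu = 3.0
tuned_fit   = JModels.fit(ShrunkMean, [2.0, 4.0]; settings = (shrinkage = 0.5,))  # mu = 1.5
```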

JModels.fit! (Function)
JModels.fit!(model, data; settings=NamedTuple())

Fit an existing model on data by mutating model. Unlike fit, this method accepts a preconfigured model instance, which gives more flexibility in configuring the model. It can also be more performant when the model is trained in multiple steps.
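A sketch of multi-step training with fit!, assuming a hypothetical mutable `RunningMean` model that accumulates statistics batch by batch instead of refitting from scratch. Returning the mutated model follows a common Julia convention; the RFC does not specify a return value.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function fit! end
function predict end
end

# Hypothetical mutable model that keeps a running mean.
mutable struct RunningMean
    n::Int
    total::Float64
end
RunningMean() = RunningMean(0, 0.0)

# Each call updates the existing object with one more batch of data.
function JModels.fit!(model::RunningMean, data::AbstractVector{<:Real};
                      settings = NamedTuple())
    model.n += length(data)
    model.total += sum(data)
    return model
end

JModels.predict(model::RunningMean, data; settings = NamedTuple()) =
    fill(model.total / model.n, length(data))

model = RunningMean()
JModels.fit!(model, [1.0, 2.0])   # first batch
JModels.fit!(model, [3.0, 6.0])   # second batch updates the same object
JModels.predict(model, [0.0])     # [3.0]
```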

JModels.predict (Function)
JModels.predict(fmodel, data; settings=NamedTuple())

Predict with the fitted model fmodel on data. For example, a k-means clustering model can predict the cluster label of each observation.
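A sketch of predict for the k-means example, assuming a hypothetical `FittedKMeans` type whose centroids are already fitted and stored one per column. Each observation (a column of `data`) gets the index of its nearest centroid as its label.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function predict end
end

# Hypothetical fitted k-means model: d × k matrix, one centroid per column.
struct FittedKMeans
    centroids::Matrix{Float64}
end

# Label every column of `data` with the index of its nearest centroid
# (squared Euclidean distance).
function JModels.predict(fmodel::FittedKMeans, data::AbstractMatrix{<:Real};
                         settings = NamedTuple())
    C = fmodel.centroids
    return [argmin([sum(abs2, view(data, :, i) .- view(C, :, j)) for j in 1:size(C, 2)])
            for i in 1:size(data, 2)]
end

fmodel = FittedKMeans([0.0 10.0; 0.0 10.0])                      # centroids (0, 0) and (10, 10)
labels = JModels.predict(fmodel, [1.0 9.0 0.5; 1.0 8.0 -1.0])    # [1, 2, 1]
```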

JModels.transform (Function)
JModels.transform(fmodel, data; settings=NamedTuple())

Transform data via the fitted model fmodel. For example, a k-means clustering model can reduce dimensionality.
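One way a k-means model can reduce dimensionality, sketched under assumptions: map each d-dimensional observation to its k distances to the centroids, which is smaller whenever k < d (scikit-learn's `KMeans.transform` also returns distances to the cluster centres). `FittedKMeans` is the same hypothetical type as in a predict sketch, redefined so this block stands alone.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function transform end
end

# Hypothetical fitted k-means model: d × k matrix, one centroid per column.
struct FittedKMeans
    centroids::Matrix{Float64}
end

# Map each d-dimensional column of `data` to its k Euclidean distances
# to the centroids, giving a k × n result.
function JModels.transform(fmodel::FittedKMeans, data::AbstractMatrix{<:Real};
                           settings = NamedTuple())
    C = fmodel.centroids
    return [sqrt(sum(abs2, view(data, :, i) .- view(C, :, j)))
            for j in 1:size(C, 2), i in 1:size(data, 2)]
end

fmodel = FittedKMeans([0.0 3.0; 0.0 4.0; 0.0 0.0])   # centroids (0, 0, 0) and (3, 4, 0)
reduced = JModels.transform(fmodel, [0.0 3.0; 0.0 4.0; 0.0 0.0])   # 3 × 2 in, 2 × 2 out
```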

JModels.inverse_transform (Function)
JModels.inverse_transform(fmodel, data; settings=NamedTuple())

Apply the inverse of transform to data via the fitted model fmodel.
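A sketch of the round trip between transform and inverse_transform, assuming a hypothetical `Standardizer` model that centres and scales one-dimensional data. The property illustrated is `inverse_transform(f, transform(f, x)) ≈ x`.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function fit end
function transform end
function inverse_transform end
end

using Statistics: mean, std

# Hypothetical model storing the training mean and standard deviation.
struct Standardizer
    mu::Float64
    sigma::Float64
end

JModels.fit(::Type{Standardizer}, x::AbstractVector{<:Real}; settings = NamedTuple()) =
    Standardizer(mean(x), std(x))

# transform: centre and scale.
JModels.transform(f::Standardizer, data; settings = NamedTuple()) =
    (data .- f.mu) ./ f.sigma

# inverse_transform: undo the scaling and centring.
JModels.inverse_transform(f::Standardizer, data; settings = NamedTuple()) =
    data .* f.sigma .+ f.mu

x = [2.0, 4.0, 6.0]
f = JModels.fit(Standardizer, x)
z = JModels.transform(f, x)              # [-1.0, 0.0, 1.0]
JModels.inverse_transform(f, z) ≈ x      # true
```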


Implementing the Interface

To make a model compatible with JModels.jl, implement the following methods; some of them are optional:

Required

Implementing the following methods is required.

JModels.ismodel (Function)
JModels.ismodel(x) -> Bool

Check whether the object x has declared itself a statistical model that implements the JModels interface.

Example

JModels.ismodel(::ExampleModel) = true
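A slightly fuller sketch of how the trait could be declared and consumed. The fallback `ismodel(x) = false` inside the stub module and the consumer function `require_model` are assumptions, not part of this RFC; a package such as the hypothetical CV.jl could perform a similar check before using an object.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
# Assumed fallback: objects are not models unless they opt in.
ismodel(x) = false
end

struct ExampleModel end
JModels.ismodel(::ExampleModel) = true   # opt in, as in the example above

# Hypothetical consumer that rejects objects not implementing the interface.
function require_model(x)
    JModels.ismodel(x) || throw(ArgumentError("$(typeof(x)) does not implement JModels"))
    return x
end

JModels.ismodel(ExampleModel())   # true
JModels.ismodel(42)               # false
```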

Optional

Implementing the following methods is optional.

Related Packages

  • LearningStrategies.jl provides an abstract interface for iteratively training a model. Specifically, it lets a model define a setup! step, iterative update! steps, and a cleanup! step. It has been the foundation for IterationControl.jl.
  • MLJModelInterface.jl provides an interface for statistical models. In comparison, JModels assumes less, which makes it easier for packages to satisfy the interface.

Data Definition

This interface makes no assumptions about the data type. It is up to the package that implements the interface to decide which data types are allowed, although in most cases the Tables.jl interface is the most suitable. Note that the Tables.jl interface is not suitable for some statistical models; for image classifiers, for example, the data cannot easily be represented as a table.
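Because the interface leaves data types open, a package can express what it accepts through method dispatch. The sketch below assumes a hypothetical `ColumnMeans` model that accepts only a NamedTuple of numeric vectors (a shape Tables.jl also accepts as a column table) without depending on Tables.jl itself; any other input type raises a `MethodError`.

```julia
# Hypothetical stand-in for JModels.jl.
module JModels
function fit end
end

# Hypothetical model storing the mean of every column.
struct ColumnMeans
    means::Dict{Symbol,Float64}
end

# Only a NamedTuple of numeric vectors is accepted; there is no method
# for other data types, so calling fit with them raises a MethodError.
function JModels.fit(::Type{ColumnMeans}, data::NamedTuple; settings = NamedTuple())
    return ColumnMeans(Dict(k => sum(v) / length(v) for (k, v) in pairs(data)))
end

fmodel = JModels.fit(ColumnMeans, (age = [20.0, 30.0], height = [1.6, 1.8]))
fmodel.means[:age]   # 25.0
```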