RFC: JModels.jl
This document specifies a lightweight interface for statistical models in Julia. The goal is to create a generic interface that many (wildly) different packages can share.
One main goal of this interface is as follows. Suppose that there is a Julia cross-validation package called CV.jl satisfying the JModels interface. This interface should make it possible to use CV.jl on models defined by organisations such as
and more.
Model Basics
In the most basic sense, we define a statistical model as an object that can be fitted to some random variables:
fmodel = fit(model, training_data)
This is often called fitting or training a model. Such a fitted model can then be applied to new data, often known as predicting:
predictions = predict(fmodel, data)
where data satisfies the assumptions of fmodel.
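The fit and predict pattern above can be sketched with a toy model. This is only an illustration: MeanModel and FittedMeanModel are hypothetical names, and the fit and predict functions here are local stand-ins rather than the functions of the proposed package.

```julia
# Hypothetical toy model: it predicts the mean of the training targets.
struct MeanModel end                 # unfitted model (no hyperparameters)

struct FittedMeanModel               # fitted model holding the learned state
    mean::Float64
end

# Local stand-ins for the interface functions (assumption of this sketch).
fit(::MeanModel, y::AbstractVector{<:Real}) = FittedMeanModel(sum(y) / length(y))
predict(fm::FittedMeanModel, data::AbstractVector) = fill(fm.mean, length(data))

model = MeanModel()
training_data = [1.0, 2.0, 3.0, 6.0]

fmodel = fit(model, training_data)            # fitting returns a new, fitted object
predictions = predict(fmodel, [10.0, 20.0])   # one prediction per new observation
```

Keeping the unfitted and fitted models as separate types makes it explicit that predict only accepts something that has already been trained.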
Using the Interface
As briefly mentioned in Model Basics, the main functions for consuming compatible models are fit and predict. Details about these and related methods are provided below:
JModels.fit — Function
JModels.fit(t, x; settings=NamedTuple())
Return a fitted model of type t on the data x. Implementing this function is encouraged but optional. However, without it, tasks such as model evaluation, in which your model is instantiated, fitted and used for prediction multiple times, are not possible.
It is advised to assign default values to all keyword arguments. This makes it easier for people to compare different models.
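As a rough sketch of why this matters, the following shows a generic evaluation routine that only relies on fit and predict. The JModels module below is a local stand-in for the proposed package, and ConstantRegressor, the shrinkage setting and holdout_mse are hypothetical names introduced for illustration.

```julia
# Local stand-in for the proposed package; only declares the generic functions.
module JModels
function fit end
function predict end
end

struct ConstantRegressor   # hypothetical fitted-model type
    value::Float64
end

# `settings` has a default, so callers can compare models without configuring them.
function JModels.fit(::Type{ConstantRegressor}, y::AbstractVector{<:Real};
                     settings=NamedTuple())
    # Hypothetical setting `shrinkage` in [0, 1]; 0 means "use the plain mean".
    shrinkage = get(settings, :shrinkage, 0.0)
    return ConstantRegressor((1 - shrinkage) * sum(y) / length(y))
end

JModels.predict(fm::ConstantRegressor, y::AbstractVector; settings=NamedTuple()) =
    fill(fm.value, length(y))

# Generic evaluation: works for any type `t` that implements `fit` and `predict`.
function holdout_mse(t, y::AbstractVector; nfolds::Int=2)
    n = length(y)
    errors = Float64[]
    for k in 1:nfolds
        test_idx = k:nfolds:n                  # every nfolds-th observation
        train_idx = setdiff(1:n, test_idx)     # the remaining observations
        fmodel = JModels.fit(t, y[train_idx])  # a fresh model for every fold
        preds = JModels.predict(fmodel, y[test_idx])
        push!(errors, sum(abs2, preds .- y[test_idx]) / length(test_idx))
    end
    return sum(errors) / nfolds
end

score = holdout_mse(ConstantRegressor, [1.0, 2.0, 3.0, 4.0])
```

Because holdout_mse never passes any settings, it can only work with models whose keyword arguments all have defaults.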
JModels.fit! — Function
JModels.fit!(model, data; settings=NamedTuple())
Fit an existing model on data by mutating model. In contrast to fit, this method offers more flexibility in configuring the model, since a predefined model can be passed in to be fitted. It can also perform better when the model is trained in multiple steps.
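A minimal sketch of training in multiple steps, assuming a hypothetical RunningMean model; the JModels module is again a local stand-in for the proposed package.

```julia
# Local stand-in for the proposed package.
module JModels
function fit! end
end

# Hypothetical model that keeps a running mean and can be trained in batches.
mutable struct RunningMean
    n::Int
    mean::Float64
end
RunningMean() = RunningMean(0, 0.0)   # start with no observations

function JModels.fit!(model::RunningMean, data::AbstractVector{<:Real};
                      settings=NamedTuple())
    for x in data
        model.n += 1
        model.mean += (x - model.mean) / model.n   # incremental mean update
    end
    return model
end

model = RunningMean()
JModels.fit!(model, [1.0, 2.0, 3.0])   # first batch
JModels.fit!(model, [4.0, 5.0])        # second batch updates the same object
```

The second call continues from the state left by the first, so no data has to be revisited and no new model is allocated.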
JModels.predict — Function
JModels.predict(fmodel, data; settings=NamedTuple())
Predict with the fitted model fmodel on data. For example, a k-means clustering model can predict cluster labels for new data points.
JModels.transform — Function
JModels.transform(fmodel, data; settings=NamedTuple())
Transform data via the fitted model fmodel. For example, a k-means clustering model can reduce dimensionality.
JModels.inverse_transform — Function
JModels.inverse_transform(fmodel, data; settings=NamedTuple())
Inversely transform data via the fitted model fmodel.
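To illustrate how transform and inverse_transform relate, here is a sketch using a hypothetical Standardizer. The JModels module is a local stand-in for the proposed package, and the sketch assumes the training data is not constant, so that the scale is nonzero.

```julia
# Local stand-in for the proposed package.
module JModels
function fit end
function transform end
function inverse_transform end
end

# Hypothetical fitted scaler that centers and scales a numeric vector.
struct Standardizer
    center::Float64
    scale::Float64
end

function JModels.fit(::Type{Standardizer}, x::AbstractVector{<:Real};
                     settings=NamedTuple())
    center = sum(x) / length(x)
    scale = sqrt(sum(abs2, x .- center) / length(x))   # population standard deviation
    return Standardizer(center, scale)
end

# Map data to zero mean and unit variance (with respect to the training data).
JModels.transform(fm::Standardizer, data; settings=NamedTuple()) =
    (data .- fm.center) ./ fm.scale

# Undo the transformation, returning to the original scale.
JModels.inverse_transform(fm::Standardizer, data; settings=NamedTuple()) =
    data .* fm.scale .+ fm.center

x = [2.0, 4.0, 6.0, 8.0]
fmodel = JModels.fit(Standardizer, x)
z = JModels.transform(fmodel, x)
x_back = JModels.inverse_transform(fmodel, z)   # recovers x up to rounding
```

Not every transformation has an exact inverse; for lossy transformations, inverse_transform could only return an approximation.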
JModels.verify_model — Function
JModels.verify_model(x)
Throw an error if !ismodel(x).
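One way this check could be defined is sketched below. The fallback ismodel(x) = false and the choice of ArgumentError are assumptions of this sketch, not something this document specifies; the JModels module is a local stand-in for the proposed package.

```julia
# Local stand-in for the proposed package.
module JModels

# Assumed fallback: objects are not models unless a package opts in.
ismodel(x) = false

# One possible definition; the error type is an assumption of this sketch.
function verify_model(x)
    ismodel(x) || throw(ArgumentError(
        "$(typeof(x)) does not implement the JModels interface"))
    return nothing
end

end

struct ExampleModel end
JModels.ismodel(::ExampleModel) = true   # opt in to the interface

JModels.verify_model(ExampleModel())     # passes silently

# An object that has not opted in triggers the error.
rejected = try
    JModels.verify_model(42)
    false
catch err
    err isa ArgumentError
end
```

A consumer such as a cross-validation package could call verify_model once at its entry point to fail early with a clear message.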
Implementing the Interface
To become a JModels.jl source, a package implements the methods below, some of which are optional:
Required
Implementing the following methods is required.
JModels.ismodel — Function
JModels.ismodel(x) -> Bool
Check whether an object x has declared that it is a statistical model and that it implements the JModels interface.
Example
JModels.ismodel(::ExampleModel) = true
Optional
Implementing the following methods is optional.
Related Work
- LearningStrategies.jl provides an abstract interface for iteratively training a model. Specifically, the package allows for a model setup!, an iterative update! and a cleanup!. It has been the foundation for IterationControl.jl.
- MLJModelInterface.jl provides an interface for statistical models. In comparison, JModels assumes less in order to make it easier for packages to satisfy the interface.
Data Definition
This interface makes no assumptions about the data type. It is up to the package that implements the interface to decide which data types are allowed, although in most cases the Tables.jl interface is the most suitable choice. Note that the Tables.jl interface is not suitable for every statistical model. For example, the data for an image classifier cannot easily be contained in a table.