Extendable Models

DeepLincs offers AutoEncoder, MultiClassifier, and SingleClassifier as simple, high-level APIs for building and training a deep neural network on an L1000 Dataset.

While the networks defined for each model above are simple, each of these objects can be subclassed, allowing the user to override the compile_model method and build a more complicated model for an identical task. For example, here is a self-normalizing classifier.

from deep_lincs.models import SingleClassifier
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Activation, AlphaDropout, Input

class SelfNormalizingClassifier(SingleClassifier):
    def __init__(self, dataset, target):
        super().__init__(dataset=dataset, target=target)

    def compile_model(self, hidden_layers, dropout_rate, opt="adam"):
        hidden_layers = list(hidden_layers)  # copy so the caller's list is not mutated
        inputs = Input(shape=(self.in_size,))

        # SELU activations with lecun_normal initialization and AlphaDropout
        # preserve the self-normalizing property of the network
        x = Dense(hidden_layers.pop(0), kernel_initializer="lecun_normal")(inputs)
        x = Activation("selu")(x)
        x = AlphaDropout(dropout_rate)(x)

        for h in hidden_layers:
            x = Dense(h, kernel_initializer="lecun_normal")(x)
            x = Activation("selu")(x)
            x = AlphaDropout(dropout_rate)(x)

        outputs = Dense(self.out_size, activation="softmax")(x)
        model = Model(inputs, outputs)
        model.compile(
            loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy']
        )
        # set model attribute
        self.model = model

The code to train this model on a Dataset is included below. All deep_lincs.models follow the same order of method calls.

snn = SelfNormalizingClassifier(dataset, target="subtype")
snn.prepare_tf_datasets(batch_size=128)
snn.compile_model([128, 128, 128], dropout_rate=0.15)
snn.fit(epochs=20)
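
Once trained, the same interface evaluates and inspects the model. A minimal sketch, continuing with the snn instance from above:

metrics = snn.evaluate()                   # evaluates on the internal test split
activations = snn.predict()                # softmax activations for the test set
chart = snn.plot_confusion_matrix(zero_diag=True)  # returns an altair.Chart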

AutoEncoder

class deep_lincs.models.AutoEncoder(dataset, **kwargs)[source]

Represents a simple autoencoder

Parameters:
dataset : Dataset

An instance of a Dataset intended to train and evaluate a model.

test_sizes : tuple (optional: default (0.2, 0.2))

Sizes of the test splits used to divide the dataset into training, validation, and testing sets.

Attributes:
encoder : tensorflow.keras.Model

Encoder for the AutoEncoder model.

targets : list(str)

Targets for model.

train : Dataset

Dataset used to train the model.

val : Dataset

Dataset used during training as validation.

test : Dataset

Dataset used to evaluate the model.

model : tensorflow.keras.Model

Compiled and trained model.

in_size : int

Size of inputs (generally 978 for L1000 landmark genes).

out_size : int

Same as input size since model is an autoencoder.

__init__(self, dataset, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, dataset, **kwargs) Initialize self.
prepare_tf_datasets(self, batch_size[, …]) Defines how to prepare a prefetch dataset for training and model evaluation
compile_model(self, hidden_layers[, …]) Defines how model is built and compiled
fit(self[, epochs, shuffle]) Trains model on training dataset
evaluate(self[, inputs]) Evaluates model
predict(self[, inputs]) Feeds inputs forward through the network
save(self, file_name) Saves model as hdf5
summary(self) Prints verbose summary of model
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', final_activation='relu', optimizer='adam', l1_reg=None)[source]

Defines how model is built and compiled

Parameters:
hidden_layers : list(int)

A list describing the size of the hidden layers.

dropout_rate : float (optional: default 0.0)

Dropout rate used during training. Applied to all hidden layers.

activation : str, (optional: default "relu")

Activation function used in hidden layers.

final_activation : str (optional: default "relu")

Activation function used in final layer.

optimizer : str, (optional: default "adam")

Optimizer used during training.

l1_reg : float (optional: default None)

Level of L1 regularization applied to the hidden embedding (smallest hidden layer).

Returns:
None
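
As a usage sketch (the dataset variable and layer sizes below are illustrative, not prescribed), compiling and training an AutoEncoder follows the same call order as the classifiers, with the middle hidden layer acting as the embedding:

from deep_lincs.models import AutoEncoder

ae = AutoEncoder(dataset)                      # dataset: an existing Dataset instance
ae.prepare_tf_datasets(batch_size=128)
ae.compile_model([512, 64, 512], l1_reg=1e-5)  # 64-unit bottleneck, L1 on the embedding
ae.fit(epochs=10)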
evaluate(self, inputs=None)

Evaluates model

Parameters:
inputs : tensorflow.data.Dataset (optional: default None)

If no tf.data.Dataset is provided, the model is evaluated on the internal test dataset.

Returns:
list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)

Trains model on training dataset

Parameters:
epochs : int

Number of training epochs

shuffle : bool (default: True)

Whether to shuffle batches during training.

kwargs : (optional)

Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.

Returns:
None
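
For example, an early-stopping callback can be forwarded through kwargs; a sketch continuing the ae example above:

from tensorflow.keras.callbacks import EarlyStopping

# stop when validation loss has not improved for 3 consecutive epochs
ae.fit(epochs=50, callbacks=[EarlyStopping(monitor="val_loss", patience=3)])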
predict(self, inputs=None)

Feeds inputs forward through the network

Parameters:
inputs : tensorflow.data.Dataset, array, or dataframe (optional: default None)

Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.

Returns:
array of final activations.
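
Because the encoder attribute is itself a tensorflow.keras.Model, low-dimensional embeddings can be computed alongside the full reconstructions. A sketch with a hypothetical input array:

import numpy as np

x = np.random.rand(32, 978).astype("float32")  # hypothetical expression profiles
reconstructions = ae.predict(x)                # full forward pass
embeddings = ae.encoder.predict(x)             # bottleneck representation only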
prepare_tf_datasets(self, batch_size, batch_normalize=None)

Defines how to prepare a prefetch dataset for training and model evaluation

Parameters:
batch_size : int

Batch size during training and for model evaluation.

batch_normalize : str (default: None)

Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale". Default is None.

Returns:
None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)

Saves model as hdf5

Parameters:
file_name : str

Name of output file.

Returns:
None
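
Since the output is a standard HDF5 file, the underlying Keras model can be reloaded outside deep_lincs; the file name below is illustrative:

ae.save("autoencoder.h5")

from tensorflow.keras.models import load_model
model = load_model("autoencoder.h5")  # plain tensorflow.keras.Model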
summary(self)

Prints verbose summary of model

Returns:
None

MultiClassifier

class deep_lincs.models.MultiClassifier(dataset, targets, **kwargs)[source]

Represents a classifier for multiple metadata fields

Parameters:
dataset : Dataset

An instance of a Dataset intended to train and evaluate a model.

targets : list(str)

A list of valid metadata fields, each defining a classification task.

test_sizes : tuple (optional: default (0.2, 0.2))

Sizes of the test splits used to divide the dataset into training, validation, and testing sets.

Attributes:
targets : list(str)

Targets for model.

train : Dataset

Dataset used to train the model.

val : Dataset

Dataset used during training as validation.

test : Dataset

Dataset used to evaluate the model.

model : tensorflow.keras.Model

Compiled and trained model.

in_size : int

Size of inputs (generally 978 for L1000 landmark genes).

out_size : int

Total number of classification categories, summed across all targets.

__init__(self, dataset, targets, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, dataset, targets, **kwargs) Initialize self.
prepare_tf_datasets(self, batch_size[, …]) Defines how to prepare a prefetch dataset for training and model evaluation
compile_model(self, hidden_layers[, …]) Defines how model is built and compiled
fit(self[, epochs, shuffle]) Trains model on training dataset
evaluate(self[, inputs]) Evaluates model
predict(self[, inputs]) Feeds inputs forward through the network
save(self, file_name) Saves model as hdf5
summary(self) Prints verbose summary of model
plot_confusion_matrix(self[, normalize, …]) Evaluates model and plots a confusion matrix of classification results
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam', final_activation='softmax', loss='categorical_crossentropy')[source]

Defines how model is built and compiled

Parameters:
hidden_layers : list(int)

A list describing the size of the hidden layers.

dropout_rate : float (optional: default 0.0)

Dropout rate used during training. Applied to all hidden layers.

activation : str, (optional: default "relu")

Activation function used in hidden layers.

optimizer : str, (optional: default "adam")

Optimizer used during training.

final_activation : str (optional: default "softmax")

Activation function used in final layer.

loss : str (optional: default "categorical_crossentropy")

Loss function.

Returns:
None
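
A minimal usage sketch (the metadata field names are illustrative): a MultiClassifier is driven by the same four calls, producing one softmax head per target.

from deep_lincs.models import MultiClassifier

multi = MultiClassifier(dataset, targets=["cell_id", "subtype"])
multi.prepare_tf_datasets(batch_size=128)
multi.compile_model([256, 128], dropout_rate=0.1)
multi.fit(epochs=10)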
evaluate(self, inputs=None)

Evaluates model

Parameters:
inputs : tensorflow.data.Dataset (optional: default None)

If no tf.data.Dataset is provided, the model is evaluated on the internal test dataset.

Returns:
list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)

Trains model on training dataset

Parameters:
epochs : int

Number of training epochs

shuffle : bool (default: True)

Whether to shuffle batches during training.

kwargs : (optional)

Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.

Returns:
None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')[source]

Evaluates model and plots a confusion matrix of classification results

Parameters:
normalize : bool, (optional: default True)

Whether to normalize counts to frequencies.

zero_diag : bool (optional: default False)

Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.

size : int, (optional: default 300)

Size of the plot in pixels.

color_scheme : str, (optional: default "lightgreyteal")

Color scheme in heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.

Returns:
altair.Chart object
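
Since an altair.Chart is returned rather than drawn immediately, the result can be refined or exported with the standard Altair API; for example:

chart = multi.plot_confusion_matrix(normalize=True, zero_diag=True)
chart.save("confusion_matrix.html")  # altair.Chart.save; .png/.svg need extra dependencies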
predict(self, inputs=None)

Feeds inputs forward through the network

Parameters:
inputs : tensorflow.data.Dataset, array, or dataframe (optional: default None)

Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.

Returns:
array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)

Defines how to prepare a prefetch dataset for training and model evaluation

Parameters:
batch_size : int

Batch size during training and for model evaluation.

batch_normalize : str (default: None)

Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale". Default is None.

Returns:
None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)

Saves model as hdf5

Parameters:
file_name : str

Name of output file.

Returns:
None
summary(self)

Prints verbose summary of model

Returns:
None

SingleClassifier

class deep_lincs.models.SingleClassifier(dataset, target, **kwargs)[source]

Represents a classifier for a single metadata field

Parameters:
dataset : Dataset

An instance of a Dataset intended to train and evaluate a model.

target : str

A valid metadata field defining the classification task.

test_sizes : tuple (optional: default (0.2, 0.2))

Sizes of the test splits used to divide the dataset into training, validation, and testing sets.

Attributes:
target : str

Target task of model.

train : Dataset

Dataset used to train the model.

val : Dataset

Dataset used during training as validation.

test : Dataset

Dataset used to evaluate the model.

model : tensorflow.keras.Model

Compiled and trained model.

in_size : int

Size of inputs (generally 978 for L1000 landmark genes).

out_size : int

Total number of target categories.

__init__(self, dataset, target, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, dataset, target, **kwargs) Initialize self.
prepare_tf_datasets(self, batch_size[, …]) Defines how to prepare a prefetch dataset for training and model evaluation
compile_model(self, hidden_layers[, …]) Defines how model is built and compiled
fit(self[, epochs, shuffle]) Trains model on training dataset
evaluate(self[, inputs]) Evaluates model
predict(self[, inputs]) Feeds inputs forward through the network
save(self, file_name) Saves model as hdf5
summary(self) Prints verbose summary of model
plot_confusion_matrix(self[, normalize, …]) Evaluates model and plots a confusion matrix of classification results
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam')[source]

Defines how model is built and compiled

Parameters:
hidden_layers : list(int)

A list describing the size of the hidden layers.

dropout_rate : float (optional: default 0.0)

Dropout rate used during training. Applied to all hidden layers.

activation : str, (optional: default "relu")

Activation function used in hidden layers.

optimizer : str, (optional: default "adam")

Optimizer used during training.

Returns:
None
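
The stock SingleClassifier needs no subclassing for the default architecture; a minimal sketch, with an illustrative target field:

from deep_lincs.models import SingleClassifier

clf = SingleClassifier(dataset, target="cell_id")
clf.prepare_tf_datasets(batch_size=128)
clf.compile_model([128, 64], dropout_rate=0.1)
clf.fit(epochs=10)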
evaluate(self, inputs=None)

Evaluates model

Parameters:
inputs : tensorflow.data.Dataset (optional: default None)

If no tf.data.Dataset is provided, the model is evaluated on the internal test dataset.

Returns:
list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)

Trains model on training dataset

Parameters:
epochs : int

Number of training epochs

shuffle : bool (default: True)

Whether to shuffle batches during training.

kwargs : (optional)

Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.

Returns:
None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')[source]

Evaluates model and plots a confusion matrix of classification results

Parameters:
normalize : bool, (optional: default True)

Whether to normalize counts to frequencies.

zero_diag : bool (optional: default False)

Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.

size : int, (optional: default 300)

Size of the plot in pixels.

color_scheme : str, (optional: default "lightgreyteal")

Color scheme in heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.

Returns:
altair.Chart object
predict(self, inputs=None)

Feeds inputs forward through the network

Parameters:
inputs : tensorflow.data.Dataset, array, or dataframe (optional: default None)

Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.

Returns:
array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)

Defines how to prepare a prefetch dataset for training and model evaluation

Parameters:
batch_size : int

Batch size during training and for model evaluation.

batch_normalize : str (default: None)

Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale". Default is None.

Returns:
None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)

Saves model as hdf5

Parameters:
file_name : str

Name of output file.

Returns:
None
summary(self)

Prints verbose summary of model

Returns:
None