Extendable Models

DeepLincs offers AutoEncoder, MultiClassifier, and SingleClassifier as simple, high-level APIs for building and training a deep neural network on an L1000 Dataset.

While the networks defined for each model above are simple, each of these objects can be subclassed, allowing the user to override the compile_model method and build a more complicated model for an identical task. For example, here is a self-normalizing classifier.

from deep_lincs.models import SingleClassifier
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Activation, AlphaDropout, Input

class SelfNormalizingClassifier(SingleClassifier):
    def __init__(self, dataset, target):
        super().__init__(dataset=dataset, target=target)

    def compile_model(self, hidden_layers, dropout_rate, opt="adam"):
        hidden_layers = list(hidden_layers)  # copy so the caller's list is not mutated
        inputs = Input(shape=(self.in_size,))

        # SELU activations with lecun_normal initialization and AlphaDropout
        # preserve the self-normalizing property of the network
        x = Dense(hidden_layers.pop(0), kernel_initializer="lecun_normal")(inputs)
        x = Activation("selu")(x)
        x = AlphaDropout(dropout_rate)(x)

        for h in hidden_layers:
            x = Dense(h, kernel_initializer="lecun_normal")(x)
            x = Activation("selu")(x)
            x = AlphaDropout(dropout_rate)(x)

        outputs = Dense(self.out_size, activation="softmax")(x)
        model = Model(inputs, outputs)
        model.compile(
            loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy']
        )
        # set model attribute
        self.model = model

The code to train this model on a Dataset is included below. All deep_lincs.models follow the same order of method calls.

snn = SelfNormalizingClassifier(dataset, target="subtype")
snn.prepare_tf_datasets(batch_size=128)
snn.compile_model([128, 128, 128], dropout_rate=0.15)
snn.fit(epochs=20)
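
Once trained, the same interface evaluates and inspects the model. A minimal sketch, continuing with the snn instance from above:

metrics = snn.evaluate()                   # evaluates on the internal test split
activations = snn.predict()                # softmax activations for the test set
chart = snn.plot_confusion_matrix(zero_diag=True)  # returns an altair.Chart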

AutoEncoder

class deep_lincs.models.AutoEncoder(dataset, **kwargs)[source]

Represents a simple autoencoder

Parameters:
dataset : Dataset

An instance of a Dataset intended to train and evaluate a model.

test_sizes : tuple (optional: default (0.2, 0.2))

Sizes of the test splits used to divide the dataset into training, validation, and testing sets.

Attributes:
encoder : tensorflow.keras.Model

Encoder for the AutoEncoder model.

targets : list(str)

Targets for model.

train : Dataset

Dataset used to train the model.

val : Dataset

Dataset used during training as validation.

test : Dataset

Dataset used to evaluate the model.

model : tensorflow.keras.Model

Compiled and trained model.

in_size : int

Size of inputs (generally 978 for L1000 landmark genes).

out_size : int

Same as input size since model is an autoencoder.

__init__(self, dataset, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, dataset, **kwargs) Initialize self.
prepare_tf_datasets(self, batch_size[, …]) Defines how to prepare a prefetch dataset for training and model evaluation
compile_model(self, hidden_layers[, …]) Defines how model is built and compiled
fit(self[, epochs, shuffle]) Trains model on training dataset
evaluate(self[, inputs]) Evaluates model
predict(self[, inputs]) Feeds inputs forward through the network
save(self, file_name) Saves model as hdf5
summary(self) Prints verbose summary of model
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', final_activation='relu', optimizer='adam', l1_reg=None)[source]

Defines how model is built and compiled

Parameters:
hidden_layers : list(int)

A list describing the size of the hidden layers.

dropout_rate : float (optional: default 0.0)

Dropout rate used during training. Applied to all hidden layers.

activation : str, (optional: default "relu")

Activation function used in hidden layers.

final_activation : str (optional: default "relu")

Activation function used in final layer.

optimizer : str, (optional: default "adam")

Optimizer used during training.

l1_reg : float (optional: default None)

Level of L1 regularization applied to the hidden embedding (smallest hidden layer).

Returns:
None
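
As a usage sketch (the dataset variable and layer sizes below are illustrative, not prescribed), compiling and training an AutoEncoder follows the same call order as the classifiers, with the middle hidden layer acting as the embedding:

from deep_lincs.models import AutoEncoder

ae = AutoEncoder(dataset)                      # dataset: an existing Dataset instance
ae.prepare_tf_datasets(batch_size=128)
ae.compile_model([512, 64, 512], l1_reg=1e-5)  # 64-unit bottleneck, L1 on the embedding
ae.fit(epochs=10)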
evaluate(self, inputs=None)

Evaluates model

Parameters:
inputs : tensorflow.data.Dataset (optional: default None)

If no tf.data.Dataset is provided, the model is evaluated on the internal test dataset.

Returns:
list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)

Trains model on training dataset

Parameters:
epochs : int

Number of training epochs

shuffle : bool (default: True)

Whether to shuffle batches during training.

kwargs : (optional)

Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.

Returns:
None
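
For example, an early-stopping callback can be forwarded through kwargs; a sketch continuing the ae example above:

from tensorflow.keras.callbacks import EarlyStopping

# stop when validation loss has not improved for 3 consecutive epochs
ae.fit(epochs=50, callbacks=[EarlyStopping(monitor="val_loss", patience=3)])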
predict(self, inputs=None)

Feeds inputs forward through the network

Parameters:
inputs : tensorflow.data.Dataset, array, or dataframe (optional: default None)

Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.

Returns:
array of final activations.
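
Because the encoder attribute is itself a tensorflow.keras.Model, low-dimensional embeddings can be computed alongside the full reconstructions. A sketch with a hypothetical input array:

import numpy as np

x = np.random.rand(32, 978).astype("float32")  # hypothetical expression profiles
reconstructions = ae.predict(x)                # full forward pass
embeddings = ae.encoder.predict(x)             # bottleneck representation only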
prepare_tf_datasets(self, batch_size, batch_normalize=None)

Defines how to prepare a prefetch dataset for training and model evaluation

Parameters:
batch_size : int

Batch size during training and for model evaluation.

batch_normalize : str (default: None)

Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale". Default is None.

Returns:
None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)

Saves model as hdf5

Parameters:
file_name : str

Name of output file.

Returns:
None
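
Since the output is a standard HDF5 file, the underlying Keras model can be reloaded outside deep_lincs; the file name below is illustrative:

ae.save("autoencoder.h5")

from tensorflow.keras.models import load_model
model = load_model("autoencoder.h5")  # plain tensorflow.keras.Model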
summary(self)

Prints verbose summary of model

Returns:
None

MultiClassifier

class deep_lincs.models.MultiClassifier(dataset, targets, **kwargs)[source]

Represents a classifier for multiple metadata fields

Parameters:
dataset : Dataset

An instance of a Dataset intended to train and evaluate a model.

targets : list(str)

A list of valid metadata fields, each defining a classification task.

test_sizes : tuple (optional: default (0.2, 0.2))

Sizes of the test splits used to divide the dataset into training, validation, and testing sets.

Attributes:
targets : list(str)

Targets for model.

train : Dataset

Dataset used to train the model.

val : Dataset

Dataset used during training as validation.

test : Dataset

Dataset used to evaluate the model.

model : tensorflow.keras.Model

Compiled and trained model.

in_size : int

Size of inputs (generally 978 for L1000 landmark genes).

out_size : int

Total number of classification categories, summed across all targets.

__init__(self, dataset, targets, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, dataset, targets, **kwargs) Initialize self.
prepare_tf_datasets(self, batch_size[, …]) Defines how to prepare a prefetch dataset for training and model evaluation
compile_model(self, hidden_layers[, …]) Defines how model is built and compiled
fit(self[, epochs, shuffle]) Trains model on training dataset
evaluate(self[, inputs]) Evaluates model
predict(self[, inputs]) Feeds inputs forward through the network
save(self, file_name) Saves model as hdf5
summary(self) Prints verbose summary of model
plot_confusion_matrix(self[, normalize, …]) Evaluates model and plots a confusion matrix of classification results
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam', final_activation='softmax', loss='categorical_crossentropy')[source]

Defines how model is built and compiled

Parameters:
hidden_layers : list(int)

A list describing the size of the hidden layers.

dropout_rate : float (optional: default 0.0)

Dropout rate used during training. Applied to all hidden layers.

activation : str, (optional: default "relu")

Activation function used in hidden layers.

optimizer : str, (optional: default "adam")

Optimizer used during training.

final_activation : str (optional: default "softmax")

Activation function used in final layer.

loss : str (optional: default "categorical_crossentropy")

Loss function.

Returns:
None
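
A minimal usage sketch (the metadata field names are illustrative): a MultiClassifier is driven by the same four calls, producing one softmax head per target.

from deep_lincs.models import MultiClassifier

multi = MultiClassifier(dataset, targets=["cell_id", "subtype"])
multi.prepare_tf_datasets(batch_size=128)
multi.compile_model([256, 128], dropout_rate=0.1)
multi.fit(epochs=10)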
evaluate(self, inputs=None)

Evaluates model

Parameters:
inputs : tensorflow.data.Dataset (optional: default None)

If no tf.data.Dataset is provided, the model is evaluated on the internal test dataset.

Returns:
list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)

Trains model on training dataset

Parameters:
epochs : int

Number of training epochs

shuffle : bool (default: True)

Whether to shuffle batches during training.

kwargs : (optional)

Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.

Returns:
None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')[source]

Evaluates model and plots a confusion matrix of classification results

Parameters:
normalize : bool, (optional: default True)

Whether to normalize counts to frequencies.

zero_diag : bool (optional: default False)

Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.

size : int, (optional: default 300)

Size of the plot in pixels.

color_scheme : str, (optional: default "lightgreyteal")

Color scheme in heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.

Returns:
altair.Chart object
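
Since an altair.Chart is returned rather than drawn immediately, the result can be refined or exported with the standard Altair API; for example:

chart = multi.plot_confusion_matrix(normalize=True, zero_diag=True)
chart.save("confusion_matrix.html")  # altair.Chart.save; .png/.svg need extra dependencies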
predict(self, inputs=None)

Feeds inputs forward through the network

Parameters:
inputs : tensorflow.data.Dataset, array, or dataframe (optional: default None)

Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.

Returns:
array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)

Defines how to prepare a prefetch dataset for training and model evaluation

Parameters:
batch_size : int

Batch size during training and for model evaluation.

batch_normalize : str (default: None)

Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale". Default is None.

Returns:
None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)

Saves model as hdf5

Parameters:
file_name : str

Name of output file.

Returns:
None
summary(self)

Prints verbose summary of model

Returns:
None

SingleClassifier

class deep_lincs.models.SingleClassifier(dataset, target, **kwargs)[source]

Represents a classifier for a single metadata field

Parameters:
dataset : Dataset

An instance of a Dataset intended to train and evaluate a model.

target : str

A valid metadata field defining the classification task.

test_sizes : tuple (optional: default (0.2, 0.2))

Sizes of the test splits used to divide the dataset into training, validation, and testing sets.

Attributes:
target : str

Target task of model.

train : Dataset

Dataset used to train the model.

val : Dataset

Dataset used during training as validation.

test : Dataset

Dataset used to evaluate the model.

model : tensorflow.keras.Model

Compiled and trained model.

in_size : int

Size of inputs (generally 978 for L1000 landmark genes).

out_size : int

Total number of target categories.

__init__(self, dataset, target, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, dataset, target, **kwargs) Initialize self.
prepare_tf_datasets(self, batch_size[, …]) Defines how to prepare a prefetch dataset for training and model evaluation
compile_model(self, hidden_layers[, …]) Defines how model is built and compiled
fit(self[, epochs, shuffle]) Trains model on training dataset
evaluate(self[, inputs]) Evaluates model
predict(self[, inputs]) Feeds inputs forward through the network
save(self, file_name) Saves model as hdf5
summary(self) Prints verbose summary of model
plot_confusion_matrix(self[, normalize, …]) Evaluates model and plots a confusion matrix of classification results
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam')[source]

Defines how model is built and compiled

Parameters:
hidden_layers : list(int)

A list describing the size of the hidden layers.

dropout_rate : float (optional: default 0.0)

Dropout rate used during training. Applied to all hidden layers.

activation : str, (optional: default "relu")

Activation function used in hidden layers.

optimizer : str, (optional: default "adam")

Optimizer used during training.

Returns:
None
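
The stock SingleClassifier needs no subclassing for the default architecture; a minimal sketch, with an illustrative target field:

from deep_lincs.models import SingleClassifier

clf = SingleClassifier(dataset, target="cell_id")
clf.prepare_tf_datasets(batch_size=128)
clf.compile_model([128, 64], dropout_rate=0.1)
clf.fit(epochs=10)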
evaluate(self, inputs=None)

Evaluates model

Parameters:
inputs : tensorflow.data.Dataset (optional: default None)

If no tf.data.Dataset is provided, the model is evaluated on the internal test dataset.

Returns:
list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)

Trains model on training dataset

Parameters:
epochs : int

Number of training epochs

shuffle : bool (default: True)

Whether to shuffle batches during training.

kwargs : (optional)

Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.

Returns:
None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')[source]

Evaluates model and plots a confusion matrix of classification results

Parameters:
normalize : bool, (optional: default True)

Whether to normalize counts to frequencies.

zero_diag : bool (optional: default False)

Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.

size : int, (optional: default 300)

Size of the plot in pixels.

color_scheme : str, (optional: default "lightgreyteal")

Color scheme in heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.

Returns:
altair.Chart object
predict(self, inputs=None)

Feeds inputs forward through the network

Parameters:
inputs : tensorflow.data.Dataset, array, or dataframe (optional: default None)

Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.

Returns:
array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)

Defines how to prepare a prefetch dataset for training and model evaluation

Parameters:
batch_size : int

Batch size during training and for model evaluation.

batch_normalize : str (default: None)

Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale". Default is None.

Returns:
None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)

Saves model as hdf5

Parameters:
file_name : str

Name of output file.

Returns:
None
summary(self)

Prints verbose summary of model

Returns:
None