Extendable Models
DeepLincs offers AutoEncoder, MultiClassifier, and
SingleClassifier as simple high-level APIs for building
and training a deep neural network on an L1000 Dataset.
While the network defined by each of these models is simple,
each class can be subclassed, allowing the user to
override the compile_model method and build a more complicated
model for the same task. For example, here is a
self-normalizing classifier.
    from deep_lincs.models import SingleClassifier
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Dense, Activation, AlphaDropout, Input

    class SelfNormalizingClassifier(SingleClassifier):
        def __init__(self, dataset, target):
            super().__init__(dataset=dataset, target=target)

        def compile_model(self, hidden_layers, dropout_rate, opt="adam"):
            inputs = Input(shape=(self.in_size,))
            x = inputs
            # One SELU block per hidden layer. Iterating instead of calling
            # hidden_layers.pop(0) leaves the caller's list unmodified.
            for h in hidden_layers:
                x = Dense(h, kernel_initializer="lecun_normal")(x)
                x = Activation("selu")(x)
                x = AlphaDropout(dropout_rate)(x)
            outputs = Dense(self.out_size, activation="softmax")(x)
            model = Model(inputs, outputs)
            model.compile(
                loss="categorical_crossentropy",
                optimizer=opt,
                metrics=["accuracy"],
            )
            # set model attribute
            self.model = model
The code to train this model on a Dataset is shown below.
All deep_lincs.models follow the same order of method calls: prepare_tf_datasets, compile_model, then fit.
snn = SelfNormalizingClassifier(dataset, target="subtype")
snn.prepare_tf_datasets(batch_size=128)
snn.compile_model([128, 128, 128], dropout_rate=0.15)
snn.fit(epochs=20)
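The override pattern above does not depend on TensorFlow. As a minimal sketch, the stand-in classes below mimic the prepare, compile, fit call order and show a subclass replacing only compile_model while inheriting everything else. BaseModel, WiderModel, and width_factor are hypothetical names used for illustration and are not part of deep_lincs.

```python
# Hypothetical stand-ins; these are NOT the deep_lincs classes.
class BaseModel:
    def __init__(self, in_size, out_size):
        self.in_size = in_size
        self.out_size = out_size
        self.batch_size = None
        self.model = None

    def prepare_tf_datasets(self, batch_size):
        # The real method builds tf.data pipelines; here we only record the batch size.
        self.batch_size = batch_size

    def compile_model(self, hidden_layers):
        # Default architecture, represented as a plain list of layer widths.
        self.model = [self.in_size, *hidden_layers, self.out_size]

    def fit(self, epochs=5):
        if self.model is None:
            raise RuntimeError("call compile_model() before fit()")
        return f"trained {self.model} for {epochs} epochs"


class WiderModel(BaseModel):
    def compile_model(self, hidden_layers, width_factor=2):
        # Only the architecture changes; data preparation and fit() are inherited.
        widened = [h * width_factor for h in hidden_layers]
        self.model = [self.in_size, *widened, self.out_size]


m = WiderModel(in_size=978, out_size=3)
m.prepare_tf_datasets(batch_size=128)
m.compile_model([64, 32])
print(m.fit(epochs=20))
```

Because the subclass only has to set self.model, the inherited training and evaluation methods keep working unchanged.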
AutoEncoder
class deep_lincs.models.AutoEncoder(dataset, **kwargs)
Represents a simple autoencoder.
Parameters:
- dataset : Dataset
  An instance of a Dataset intended to train and evaluate a model.
- test_sizes : tuple (optional, default (0.2, 0.2))
  Sizes of the test splits used to divide the dataset into training, validation, and testing sets.
Attributes:
- encoder : tensorflow.keras.Model
  Encoder for the AutoEncoder model.
- targets : list(str)
  Targets for model.
- train : Dataset
  Dataset used to train the model.
- val : Dataset
  Dataset used during training as validation.
- test : Dataset
  Dataset used to evaluate the model.
- model : tensorflow.keras.Model
  Compiled and trained model.
- in_size : int
  Size of inputs (generally 978 for L1000 landmark genes).
- out_size : int
  Same as input size since model is an autoencoder.
__init__(self, dataset, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, dataset, **kwargs): Initialize self.
- prepare_tf_datasets(self, batch_size[, …]): Defines how to prepare a prefetch dataset for training and model evaluation.
- compile_model(self, hidden_layers[, …]): Defines how the model is built and compiled.
- fit(self[, epochs, shuffle]): Trains the model on the training dataset.
- save(self, file_name): Saves the model as hdf5.
- summary(self): Prints a verbose summary of the model.
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', final_activation='relu', optimizer='adam', l1_reg=None)
Defines how the model is built and compiled.
Parameters:
- hidden_layers : list(int)
  A list describing the size of the hidden layers.
- dropout_rate : float (optional, default 0.0)
  Dropout rate used during training. Applied to all hidden layers.
- activation : str (optional, default "relu")
  Activation function used in hidden layers.
- final_activation : str (optional, default "relu")
  Activation function used in final layer.
- optimizer : str (optional, default "adam")
  Optimizer used during training.
- l1_reg : float (optional, default None)
  Level of L1 regularization applied to the hidden embedding (smallest hidden layer).
Returns: None
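The l1_reg description implies two steps: locating the smallest hidden layer and penalizing its activations. A minimal sketch in plain Python, assuming the embedding is the first layer of minimum width and that the penalty is l1_reg times the sum of absolute activations (the usual form of an L1 activity penalty). embedding_index and l1_penalty are hypothetical helpers, not deep_lincs functions.

```python
def embedding_index(hidden_layers):
    """Index of the smallest hidden layer (the first one if there are ties)."""
    return min(range(len(hidden_layers)), key=lambda i: hidden_layers[i])


def l1_penalty(activations, l1_reg):
    """L1 activity penalty: l1_reg * sum(|a|). Returns 0.0 when l1_reg is None."""
    if l1_reg is None:
        return 0.0
    return l1_reg * sum(abs(a) for a in activations)


hidden_layers = [128, 32, 128]
idx = embedding_index(hidden_layers)  # position of the 32-unit bottleneck
penalty = l1_penalty([0.5, -1.5, 2.0], l1_reg=0.01)
print(idx, penalty)
```

A larger l1_reg pushes more embedding activations toward zero, which is why it is applied only at the bottleneck rather than at every layer.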
evaluate(self, inputs=None)
Evaluates the model.
Parameters:
- inputs : tensorflow.data.Dataset (optional, default None)
  If no dataset is provided, the model is evaluated on the internal test dataset.
Returns: list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)
Trains the model on the training dataset.
Parameters:
- epochs : int
  Number of training epochs.
- shuffle : bool (default True)
  Whether to shuffle batches during training.
- kwargs : (optional)
  Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.
Returns: None
predict(self, inputs=None)
Feeds inputs forward through the network.
Parameters:
- inputs : tensorflow.data.Dataset or array or dataframe (optional, default None)
  Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.
Returns: array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)
Defines how to prepare a prefetch dataset for training and model evaluation.
Parameters:
- batch_size : int
  Batch size during training and for model evaluation.
- batch_normalize : str (default None)
  Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale".
Returns: None
>>> model.prepare_tf_datasets(batch_size=128)
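As a rough illustration of what a "z_score" option conventionally means, the sketch below standardizes each feature (column) of a batch to zero mean and unit variance. Whether deep_lincs normalizes per feature or per sample, and how "standard_scale" differs, is not stated here, so the per-feature choice and the helper name z_score_batch are assumptions.

```python
from statistics import fmean, pstdev


def z_score_batch(batch):
    """Standardize each column of a batch given as a list of equal-length rows."""
    columns = list(zip(*batch))
    means = [fmean(col) for col in columns]
    # Constant columns have zero deviation; fall back to 1.0 so they map to 0.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in batch
    ]


batch = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
normalized = z_score_batch(batch)
print(normalized)
```

Note that the statistics come from the batch itself, so very small batch sizes give noisy estimates of the mean and deviation.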
save(self, file_name)
Saves the model as hdf5.
Parameters:
- file_name : str
  Name of output file.
Returns: None
summary(self)
Prints a verbose summary of the model.
Returns: None
MultiClassifier
class deep_lincs.models.MultiClassifier(dataset, targets, **kwargs)
Represents a classifier for multiple metadata fields.
Parameters:
- dataset : Dataset
  An instance of a Dataset intended to train and evaluate a model.
- targets : list(str)
  Valid list of metadata fields which define multiple classification tasks.
- test_sizes : tuple (optional, default (0.2, 0.2))
  Sizes of the test splits used to divide the dataset into training, validation, and testing sets.
Attributes:
- targets : list(str)
  Targets for model.
- train : Dataset
  Dataset used to train the model.
- val : Dataset
  Dataset used during training as validation.
- test : Dataset
  Dataset used to evaluate the model.
- model : tensorflow.keras.Model
  Compiled and trained model.
- in_size : int
  Size of inputs (generally 978 for L1000 landmark genes).
- out_size : int
  Sum total of classification categories.
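For a MultiClassifier, out_size is described as the sum total of classification categories across all targets. A minimal sketch of that arithmetic on toy metadata follows. The field values are invented for illustration, and count_categories is a hypothetical helper, not a deep_lincs function.

```python
def count_categories(metadata, targets):
    """Number of distinct values observed for each target field."""
    return {t: len({row[t] for row in metadata}) for t in targets}


# Toy sample metadata; field names and values are made up.
metadata = [
    {"cell_id": "A", "subtype": "x"},
    {"cell_id": "B", "subtype": "y"},
    {"cell_id": "C", "subtype": "x"},
]
per_target = count_categories(metadata, ["cell_id", "subtype"])
out_size = sum(per_target.values())
print(per_target, out_size)
```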
__init__(self, dataset, targets, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, dataset, targets, **kwargs): Initialize self.
- prepare_tf_datasets(self, batch_size[, …]): Defines how to prepare a prefetch dataset for training and model evaluation.
- compile_model(self, hidden_layers[, …]): Defines how the model is built and compiled.
- fit(self[, epochs, shuffle]): Trains the model on the training dataset.
- evaluate(self[, inputs]): Evaluates the model.
- predict(self[, inputs]): Feeds inputs forward through the network.
- save(self, file_name): Saves the model as hdf5.
- summary(self): Prints a verbose summary of the model.
- plot_confusion_matrix(self[, normalize, …]): Evaluates the model and plots a confusion matrix of classification results.
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam', final_activation='softmax')
Defines how the model is built and compiled.
Parameters:
- hidden_layers : list(int)
  A list describing the size of the hidden layers.
- dropout_rate : float (optional, default 0.0)
  Dropout rate used during training. Applied to all hidden layers.
- activation : str (optional, default "relu")
  Activation function used in hidden layers.
- optimizer : str (optional, default "adam")
  Optimizer used during training.
- final_activation : str (optional, default "softmax")
  Activation function used in final layer.
- loss : str (optional, default "categorical_crossentropy")
  Loss function.
Returns: None
evaluate(self, inputs=None)
Evaluates the model.
Parameters:
- inputs : tensorflow.data.Dataset (optional, default None)
  If no dataset is provided, the model is evaluated on the internal test dataset.
Returns: list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)
Trains the model on the training dataset.
Parameters:
- epochs : int
  Number of training epochs.
- shuffle : bool (default True)
  Whether to shuffle batches during training.
- kwargs : (optional)
  Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.
Returns: None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')
Evaluates the model and plots a confusion matrix of classification results.
Parameters:
- normalize : bool (optional, default True)
  Whether to normalize counts to frequencies.
- zero_diag : bool (optional, default False)
  Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.
- size : int (optional, default 300)
  Size of the plot in pixels.
- color_scheme : str (optional, default "lightgreyteal")
  Color scheme in heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.
Returns: altair.Chart object
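The effect of normalize and zero_diag can be sketched on a small matrix of counts. The version below assumes rows are true classes, that normalization divides each row by its total, and that the diagonal is zeroed after normalizing; the documentation does not state these details, and prepare_matrix is a hypothetical helper rather than the library's implementation.

```python
def prepare_matrix(counts, normalize=True, zero_diag=False):
    """Return a copy of a square confusion matrix, optionally row-normalized
    and with the diagonal (correct predictions) set to zero."""
    result = []
    for i, row in enumerate(counts):
        total = sum(row)
        if normalize and total > 0:
            new_row = [c / total for c in row]
        else:
            new_row = list(row)
        if zero_diag:
            new_row[i] = 0
        result.append(new_row)
    return result


counts = [[8, 2], [1, 9]]
frequencies = prepare_matrix(counts)
errors_only = prepare_matrix(counts, normalize=False, zero_diag=True)
print(frequencies)
print(errors_only)
```

Zeroing the diagonal removes the dominant correct-prediction cells, so the remaining color range in a heatmap is spent entirely on the misclassifications.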
predict(self, inputs=None)
Feeds inputs forward through the network.
Parameters:
- inputs : tensorflow.data.Dataset or array or dataframe (optional, default None)
  Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.
Returns: array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)
Defines how to prepare a prefetch dataset for training and model evaluation.
Parameters:
- batch_size : int
  Batch size during training and for model evaluation.
- batch_normalize : str (default None)
  Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale".
Returns: None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)
Saves the model as hdf5.
Parameters:
- file_name : str
  Name of output file.
Returns: None
summary(self)
Prints a verbose summary of the model.
Returns: None
SingleClassifier
class deep_lincs.models.SingleClassifier(dataset, target, **kwargs)
Represents a classifier for a single metadata field.
Parameters:
- dataset : Dataset
  An instance of a Dataset intended to train and evaluate a model.
- target : str
  Valid metadata field defining the classification task.
- test_sizes : tuple (optional, default (0.2, 0.2))
  Sizes of the test splits used to divide the dataset into training, validation, and testing sets.
Attributes:
- target : str
  Target task of model.
- train : Dataset
  Dataset used to train the model.
- val : Dataset
  Dataset used during training as validation.
- test : Dataset
  Dataset used to evaluate the model.
- model : tensorflow.keras.Model
  Compiled and trained model.
- in_size : int
  Size of inputs (generally 978 for L1000 landmark genes).
- out_size : int
  Total number of target categories.
__init__(self, dataset, target, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, dataset, target, **kwargs): Initialize self.
- prepare_tf_datasets(self, batch_size[, …]): Defines how to prepare a prefetch dataset for training and model evaluation.
- compile_model(self, hidden_layers[, …]): Defines how the model is built and compiled.
- fit(self[, epochs, shuffle]): Trains the model on the training dataset.
- evaluate(self[, inputs]): Evaluates the model.
- predict(self[, inputs]): Feeds inputs forward through the network.
- save(self, file_name): Saves the model as hdf5.
- summary(self): Prints a verbose summary of the model.
- plot_confusion_matrix(self[, normalize, …]): Evaluates the model and plots a confusion matrix of classification results.
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam')
Defines how the model is built and compiled.
Parameters:
- hidden_layers : list(int)
  A list describing the size of the hidden layers.
- dropout_rate : float (optional, default 0.0)
  Dropout rate used during training. Applied to all hidden layers.
- activation : str (optional, default "relu")
  Activation function used in hidden layers.
- optimizer : str (optional, default "adam")
  Optimizer used during training.
Returns: None
evaluate(self, inputs=None)
Evaluates the model.
Parameters:
- inputs : tensorflow.data.Dataset (optional, default None)
  If no dataset is provided, the model is evaluated on the internal test dataset.
Returns: list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)
Trains the model on the training dataset.
Parameters:
- epochs : int
  Number of training epochs.
- shuffle : bool (default True)
  Whether to shuffle batches during training.
- kwargs : (optional)
  Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.
Returns: None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')
Evaluates the model and plots a confusion matrix of classification results.
Parameters:
- normalize : bool (optional, default True)
  Whether to normalize counts to frequencies.
- zero_diag : bool (optional, default False)
  Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.
- size : int (optional, default 300)
  Size of the plot in pixels.
- color_scheme : str (optional, default "lightgreyteal")
  Color scheme in heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.
Returns: altair.Chart object
predict(self, inputs=None)
Feeds inputs forward through the network.
Parameters:
- inputs : tensorflow.data.Dataset or array or dataframe (optional, default None)
  Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.
Returns: array of final activations.
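Because predict returns the final activations, a class label still has to be derived from each row. A minimal sketch, assuming a softmax output in which the largest activation marks the predicted class. The class names, the activation values, and the helper to_labels are all hypothetical.

```python
def to_labels(activations, class_names):
    """Map each row of activations to the name of its largest entry."""
    labels = []
    for row in activations:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(class_names[best])
    return labels


# Made-up activations for three samples and three classes.
activations = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
labels = to_labels(activations, ["basal", "luminal", "other"])
print(labels)
```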
prepare_tf_datasets(self, batch_size, batch_normalize=None)
Defines how to prepare a prefetch dataset for training and model evaluation.
Parameters:
- batch_size : int
  Batch size during training and for model evaluation.
- batch_normalize : str (default None)
  Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale".
Returns: None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)
Saves the model as hdf5.
Parameters:
- file_name : str
  Name of output file.
Returns: None
summary(self)
Prints a verbose summary of the model.
Returns: None