Extendable Models
DeepLincs offers AutoEncoder, MultiClassifier, and SingleClassifier as simple high-level APIs for building and training a deep neural network on an L1000 Dataset.
While the networks defined for each of these models are simple, each class can be subclassed, allowing the user to override the compile_model method and build a more complicated model for the same task. For example, here is a self-normalizing classifier.
from deep_lincs.models import SingleClassifier
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Activation, AlphaDropout, Input


class SelfNormalizingClassifier(SingleClassifier):
    def __init__(self, dataset, target):
        super().__init__(dataset=dataset, target=target)

    def compile_model(self, hidden_layers, dropout_rate, opt="adam"):
        inputs = Input(shape=(self.in_size,))
        x = inputs
        # Iterate over the layer sizes without mutating the caller's list
        for h in hidden_layers:
            x = Dense(h, kernel_initializer="lecun_normal")(x)
            x = Activation("selu")(x)
            x = AlphaDropout(dropout_rate)(x)
        outputs = Dense(self.out_size, activation="softmax")(x)
        model = Model(inputs, outputs)
        model.compile(
            loss="categorical_crossentropy",
            optimizer=opt,
            metrics=["accuracy"],
        )
        # set model attribute
        self.model = model
The code to train this model with a Dataset is included below. All deep_lincs.models classes follow the same order of method calls.
snn = SelfNormalizingClassifier(dataset, target="subtype")
snn.prepare_tf_datasets(batch_size=128)
snn.compile_model([128, 128, 128], dropout_rate=0.15)
snn.fit(epochs=20)
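The subclass above relies on the SELU activation together with LeCun-normal initialization and AlphaDropout. As a rough illustration of what Activation("selu") computes for a single value, here is a minimal pure-Python sketch. The selu helper name is ours, and the constants are the standard SELU values, not anything defined by DeepLincs.

```python
import math

# Standard SELU constants; these are not DeepLincs-specific values.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit for a single value."""
    if x > 0:
        # Positive inputs are scaled linearly.
        return SELU_SCALE * x
    # Non-positive inputs saturate towards -SELU_SCALE * SELU_ALPHA.
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 1.0):
        print(value, selu(value))
```

Negative inputs saturate at a fixed negative value instead of being clipped to zero, which is what lets activations keep a roughly stable mean and variance across layers.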
AutoEncoder
class deep_lincs.models.AutoEncoder(dataset, **kwargs)
Represents a simple autoencoder.
Parameters:
- dataset : Dataset
  An instance of a Dataset intended to train and evaluate a model.
- test_sizes : tuple (optional, default: (0.2, 0.2))
  Sizes of the test splits used to divide the dataset into training, validation, and testing sets.
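To make the effect of test_sizes concrete, the sketch below computes the resulting set sizes under one plausible reading: the first fraction is held out of the full dataset for testing, and the second is held out of the remainder for validation. That ordering is an assumption on our part, and split_sizes is a hypothetical helper, not part of deep_lincs.

```python
def split_sizes(n_samples, test_sizes=(0.2, 0.2)):
    """Return (n_train, n_val, n_test) for two successive hold-out splits.

    Assumption: the first fraction is taken from the full dataset as the
    test set, and the second is taken from what remains as the validation set.
    """
    test_frac, val_frac = test_sizes
    n_test = round(n_samples * test_frac)
    n_val = round((n_samples - n_test) * val_frac)
    n_train = n_samples - n_test - n_val
    return n_train, n_val, n_test


if __name__ == "__main__":
    print(split_sizes(1000))
```

Under this reading, the default (0.2, 0.2) leaves 64% of the samples for training rather than 60%, because the validation fraction applies to the data remaining after the test split.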
Attributes:
- encoder : tensorflow.keras.Model
  Encoder for the AutoEncoder model.
- targets : list(str)
  Targets for model.
- train : Dataset
  Dataset used to train the model.
- val : Dataset
  Dataset used during training as validation.
- test : Dataset
  Dataset used to evaluate the model.
- model : tensorflow.keras.Model
  Compiled and trained model.
- in_size : int
  Size of inputs (generally 978 for L1000 landmark genes).
- out_size : int
  Same as the input size, since the model is an autoencoder.
__init__(self, dataset, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, dataset, **kwargs): Initialize self.
- prepare_tf_datasets(self, batch_size[, …]): Defines how to prepare a prefetch dataset for training and model evaluation.
- compile_model(self, hidden_layers[, …]): Defines how the model is built and compiled.
- fit(self[, epochs, shuffle]): Trains the model on the training dataset.
- save(self, file_name): Saves the model as HDF5.
- summary(self): Prints a verbose summary of the model.
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', final_activation='relu', optimizer='adam', l1_reg=None)
Defines how the model is built and compiled.
Parameters:
- hidden_layers : list(int)
  A list describing the size of the hidden layers.
- dropout_rate : float (optional, default: 0.0)
  Dropout rate used during training. Applied to all hidden layers.
- activation : str (optional, default: "relu")
  Activation function used in hidden layers.
- final_activation : str (optional, default: "relu")
  Activation function used in the final layer.
- optimizer : str (optional, default: "adam")
  Optimizer used during training.
- l1_reg : float (optional, default: None)
  Level of L1 regularization applied to the hidden embedding (smallest hidden layer).
Returns: None
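The l1_reg description ties the penalty to the smallest hidden layer. The sketch below shows how that layer can be located in a hidden_layers list and how an L1 penalty of the usual form (coefficient times the sum of absolute values) is computed. Both helper names are hypothetical, and whether DeepLincs penalizes the embedding's activations or its weights is not stated here, so treat the activation-based version as an assumption.

```python
def embedding_index(hidden_layers):
    """Index of the smallest hidden layer, treated here as the embedding."""
    return hidden_layers.index(min(hidden_layers))


def l1_penalty(values, l1_reg):
    """L1 penalty: l1_reg times the sum of absolute values."""
    return l1_reg * sum(abs(v) for v in values)


if __name__ == "__main__":
    layers = [128, 32, 128]
    print(embedding_index(layers))
    # Assumed example activations of a tiny embedding layer.
    print(l1_penalty([0.5, -1.5, 0.0], l1_reg=0.01))
```

A larger l1_reg pushes more embedding values towards zero, trading reconstruction quality for a sparser representation.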
evaluate(self, inputs=None)
Evaluates the model.
Parameters:
- inputs : tensorflow.data.Dataset (optional, default: None)
  If no dataset is provided, the model is evaluated on the internal test dataset.
Returns: list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)
Trains the model on the training dataset.
Parameters:
- epochs : int (default: 5)
  Number of training epochs.
- shuffle : bool (default: True)
  Whether to shuffle batches during training.
- kwargs : (optional)
  Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.
Returns: None
predict(self, inputs=None)
Feeds inputs forward through the network.
Parameters:
- inputs : tensorflow.data.Dataset or array or dataframe (optional, default: None)
  Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.
Returns: array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)
Defines how to prepare a prefetch dataset for training and model evaluation.
Parameters:
- batch_size : int
  Batch size during training and for model evaluation.
- batch_normalize : str (default: None)
  Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale".
Returns: None
>>> model.prepare_tf_datasets(batch_size=128)
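The two batch_normalize options are only named above, not defined. The sketch below shows what such normalizations commonly mean when applied to one feature column of a batch: "z_score" as centering and scaling to unit variance, and "standard_scale" as min-max rescaling to [0, 1]. These formulas are assumptions about the intent, and the function names are illustrative, not the deep_lincs implementation.

```python
import statistics


def z_score(values):
    """Center a batch column to zero mean and scale it to unit variance.

    Raises ZeroDivisionError if every value in the column is identical.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def standard_scale(values):
    """Rescale a batch column linearly onto the range [0, 1].

    Raises ZeroDivisionError if every value in the column is identical.
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


if __name__ == "__main__":
    column = [1.0, 2.0, 3.0]
    print(z_score(column))
    print(standard_scale(column))
```

Because the statistics are computed per batch, very small batch sizes give noisy estimates, so the choice of batch_size interacts with this option.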
save(self, file_name)
Saves the model as HDF5.
Parameters:
- file_name : str
  Name of output file.
Returns: None
summary(self)
Prints a verbose summary of the model.
Returns: None
MultiClassifier
class deep_lincs.models.MultiClassifier(dataset, targets, **kwargs)
Represents a classifier for multiple metadata fields.
Parameters:
- dataset : Dataset
  An instance of a Dataset intended to train and evaluate a model.
- targets : list(str)
  List of valid metadata fields which define multiple classification tasks.
- test_sizes : tuple (optional, default: (0.2, 0.2))
  Sizes of the test splits used to divide the dataset into training, validation, and testing sets.
Attributes:
- targets : list(str)
  Targets for model.
- train : Dataset
  Dataset used to train the model.
- val : Dataset
  Dataset used during training as validation.
- test : Dataset
  Dataset used to evaluate the model.
- model : tensorflow.keras.Model
  Compiled and trained model.
- in_size : int
  Size of inputs (generally 978 for L1000 landmark genes).
- out_size : int
  Sum total of classification categories.
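As an illustration of "sum total of classification categories", the sketch below counts the distinct values of each target field in a small metadata table and adds them up. The field names, values, and the total_out_size helper are all hypothetical; the real attribute is computed by the library from the Dataset.

```python
def total_out_size(records, targets):
    """Sum the number of distinct categories across all target fields."""
    return sum(len({record[t] for record in records}) for t in targets)


# Hypothetical metadata: field names and values are illustrative only.
metadata = [
    {"cell_id": "MCF7", "pert_type": "trt_cp"},
    {"cell_id": "PC3", "pert_type": "trt_cp"},
    {"cell_id": "VCAP", "pert_type": "ctl_vehicle"},
]

if __name__ == "__main__":
    # 3 distinct cell_id values + 2 distinct pert_type values
    print(total_out_size(metadata, ["cell_id", "pert_type"]))
```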
__init__(self, dataset, targets, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, dataset, targets, **kwargs): Initialize self.
- prepare_tf_datasets(self, batch_size[, …]): Defines how to prepare a prefetch dataset for training and model evaluation.
- compile_model(self, hidden_layers[, …]): Defines how the model is built and compiled.
- fit(self[, epochs, shuffle]): Trains the model on the training dataset.
- evaluate(self[, inputs]): Evaluates the model.
- predict(self[, inputs]): Feeds inputs forward through the network.
- save(self, file_name): Saves the model as HDF5.
- summary(self): Prints a verbose summary of the model.
- plot_confusion_matrix(self[, normalize, …]): Evaluates the model and plots a confusion matrix of classification results.
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam', final_activation='softmax')
Defines how the model is built and compiled.
Parameters:
- hidden_layers : list(int)
  A list describing the size of the hidden layers.
- dropout_rate : float (optional, default: 0.0)
  Dropout rate used during training. Applied to all hidden layers.
- activation : str (optional, default: "relu")
  Activation function used in hidden layers.
- optimizer : str (optional, default: "adam")
  Optimizer used during training.
- final_activation : str (optional, default: "softmax")
  Activation function used in the final layer.
- loss : str (optional, default: "categorical_crossentropy")
  Loss function.
Returns: None
evaluate(self, inputs=None)
Evaluates the model.
Parameters:
- inputs : tensorflow.data.Dataset (optional, default: None)
  If no dataset is provided, the model is evaluated on the internal test dataset.
Returns: list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)
Trains the model on the training dataset.
Parameters:
- epochs : int (default: 5)
  Number of training epochs.
- shuffle : bool (default: True)
  Whether to shuffle batches during training.
- kwargs : (optional)
  Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.
Returns: None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')
Evaluates the model and plots a confusion matrix of classification results.
Parameters:
- normalize : bool (optional, default: True)
  Whether to normalize counts to frequencies.
- zero_diag : bool (optional, default: False)
  Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.
- size : int (optional, default: 300)
  Size of the plot in pixels.
- color_scheme : str (optional, default: "lightgreyteal")
  Color scheme of the heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.
Returns: altair.Chart object
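The normalize and zero_diag options can be pictured with a small pure-Python confusion matrix. This is a sketch of the general idea only: it assumes rows are true labels, that normalization divides each row by its own total, and that the diagonal is zeroed after normalizing. None of these details are confirmed by the documentation above, and the function is not the deep_lincs implementation.

```python
def confusion_matrix(y_true, y_pred, labels, normalize=True, zero_diag=False):
    """Build a confusion matrix as nested lists (rows: true, columns: predicted)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0.0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1.0
    if normalize:
        # Assumption: each row is divided by its own total.
        for row in matrix:
            total = sum(row)
            if total > 0:
                row[:] = [count / total for count in row]
    if zero_diag:
        # Hide correct predictions so only misclassifications remain visible.
        for i in range(len(labels)):
            matrix[i][i] = 0.0
    return matrix


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    preds = ["a", "b", "b", "b"]
    print(confusion_matrix(truth, preds, ["a", "b"], zero_diag=True))
```

Zeroing the diagonal matters most when accuracy is high: otherwise the correct-prediction cells dominate the color scale and the off-diagonal errors are hard to see.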
predict(self, inputs=None)
Feeds inputs forward through the network.
Parameters:
- inputs : tensorflow.data.Dataset or array or dataframe (optional, default: None)
  Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.
Returns: array of final activations.
prepare_tf_datasets(self, batch_size, batch_normalize=None)
Defines how to prepare a prefetch dataset for training and model evaluation.
Parameters:
- batch_size : int
  Batch size during training and for model evaluation.
- batch_normalize : str (default: None)
  Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale".
Returns: None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)
Saves the model as HDF5.
Parameters:
- file_name : str
  Name of output file.
Returns: None
summary(self)
Prints a verbose summary of the model.
Returns: None
SingleClassifier
class deep_lincs.models.SingleClassifier(dataset, target, **kwargs)
Represents a classifier for a single metadata field.
Parameters:
- dataset : Dataset
  An instance of a Dataset intended to train and evaluate a model.
- target : str
  Valid metadata field defining the classification task.
- test_sizes : tuple (optional, default: (0.2, 0.2))
  Sizes of the test splits used to divide the dataset into training, validation, and testing sets.
Attributes:
- target : str
  Target task of model.
- train : Dataset
  Dataset used to train the model.
- val : Dataset
  Dataset used during training as validation.
- test : Dataset
  Dataset used to evaluate the model.
- model : tensorflow.keras.Model
  Compiled and trained model.
- in_size : int
  Size of inputs (generally 978 for L1000 landmark genes).
- out_size : int
  Total number of target categories.
__init__(self, dataset, target, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, dataset, target, **kwargs): Initialize self.
- prepare_tf_datasets(self, batch_size[, …]): Defines how to prepare a prefetch dataset for training and model evaluation.
- compile_model(self, hidden_layers[, …]): Defines how the model is built and compiled.
- fit(self[, epochs, shuffle]): Trains the model on the training dataset.
- evaluate(self[, inputs]): Evaluates the model.
- predict(self[, inputs]): Feeds inputs forward through the network.
- save(self, file_name): Saves the model as HDF5.
- summary(self): Prints a verbose summary of the model.
- plot_confusion_matrix(self[, normalize, …]): Evaluates the model and plots a confusion matrix of classification results.
compile_model(self, hidden_layers, dropout_rate=0.0, activation='relu', optimizer='adam')
Defines how the model is built and compiled.
Parameters:
- hidden_layers : list(int)
  A list describing the size of the hidden layers.
- dropout_rate : float (optional, default: 0.0)
  Dropout rate used during training. Applied to all hidden layers.
- activation : str (optional, default: "relu")
  Activation function used in hidden layers.
- optimizer : str (optional, default: "adam")
  Optimizer used during training.
Returns: None
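The self-normalizing example at the top of this page compiles a SingleClassifier subclass with loss='categorical_crossentropy'. Whether the default compile_model uses the same loss is not stated here, so the sketch below only illustrates what that loss measures for a single sample with a one-hot target. The helper names and the eps clipping value are our own choices.

```python
import math


def one_hot(label, labels):
    """Encode a label as a one-hot list over the known categories."""
    return [1.0 if label == candidate else 0.0 for candidate in labels]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip probabilities away from zero so log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    classes = ["a", "b", "c"]
    target = one_hot("b", classes)
    print(categorical_crossentropy(target, [0.1, 0.8, 0.1]))
```

With a one-hot target, only the probability assigned to the true class contributes, so the loss shrinks towards zero as that probability approaches one.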
evaluate(self, inputs=None)
Evaluates the model.
Parameters:
- inputs : tensorflow.data.Dataset (optional, default: None)
  If no dataset is provided, the model is evaluated on the internal test dataset.
Returns: list of evaluation metrics.
fit(self, epochs=5, shuffle=True, **kwargs)
Trains the model on the training dataset.
Parameters:
- epochs : int (default: 5)
  Number of training epochs.
- shuffle : bool (default: True)
  Whether to shuffle batches during training.
- kwargs : (optional)
  Additional keyword arguments for tensorflow.keras.Model.fit. This is where tensorflow.keras.callbacks should be supplied, such as TensorBoard or EarlyStopping.
Returns: None
plot_confusion_matrix(self, normalize=True, zero_diag=False, size=300, color_scheme='lightgreyteal')
Evaluates the model and plots a confusion matrix of classification results.
Parameters:
- normalize : bool (optional, default: True)
  Whether to normalize counts to frequencies.
- zero_diag : bool (optional, default: False)
  Whether to zero the diagonal of the matrix. Useful for examining which categories are most frequently misidentified.
- size : int (optional, default: 300)
  Size of the plot in pixels.
- color_scheme : str (optional, default: "lightgreyteal")
  Color scheme of the heatmap. Can be any from https://vega.github.io/vega/docs/schemes/.
Returns: altair.Chart object
predict(self, inputs=None)
Feeds inputs forward through the network.
Parameters:
- inputs : tensorflow.data.Dataset or array or dataframe (optional, default: None)
  Inputs fed through the network. If not provided, the model uses the internal testing data to make a prediction.
Returns: array of final activations.
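Since predict returns final activations rather than labels, a caller usually needs one more step to get a class per sample. The sketch below shows that step for a classifier whose final layer yields one score per category: take the position of the largest activation in each row. The predicted_labels helper and the label order are hypothetical; the actual category order comes from the Dataset.

```python
def predicted_labels(activations, labels):
    """Pick the label with the highest activation in each row."""
    result = []
    for row in activations:
        # Index of the largest value in this row (argmax).
        best = max(range(len(row)), key=row.__getitem__)
        result.append(labels[best])
    return result


if __name__ == "__main__":
    # Assumed example activations for two samples over three categories.
    scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
    print(predicted_labels(scores, ["a", "b", "c"]))
```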
prepare_tf_datasets(self, batch_size, batch_normalize=None)
Defines how to prepare a prefetch dataset for training and model evaluation.
Parameters:
- batch_size : int
  Batch size during training and for model evaluation.
- batch_normalize : str (default: None)
  Normalization applied to each batch during training and evaluation. Can be one of "z_score" or "standard_scale".
Returns: None
>>> model.prepare_tf_datasets(batch_size=128)
save(self, file_name)
Saves the model as HDF5.
Parameters:
- file_name : str
  Name of output file.
Returns: None
summary(self)
Prints a verbose summary of the model.
Returns: None