
TensorFlow

TensorFlow (TF) is an open-source software library which can compile tensor operations to execute very quickly on both CPUs and GPUs. It is often used as a backend for machine learning libraries and models.

We strongly recommend using TensorFlow 2.x. TensorFlow 1 has long been deprecated, and it would probably be difficult to run it on GPUs on our clusters.

Installation

For TensorFlow to work with GPUs, it needs several libraries (CUDA, cuDNN, NCCL, etc.) in mutually compatible versions.

You can load the correct modules with the following command:

$ ml TensorFlow

If you want to use a different TensorFlow version than the one provided by this module, or install additional Python packages, you can create a virtual environment and install TensorFlow inside it:

$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ python3 -m pip install -U setuptools wheel pip
(venv) $ python3 -m pip install tensorflow

However, if you use a newer TensorFlow version than the one included in the TensorFlow module, make sure that it is still compatible with the CUDA version provided by the module. The required CUDA/cuDNN versions for each TensorFlow release are listed in the tested build configurations table of the official TensorFlow installation documentation.
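One way to see which CUDA and cuDNN versions an installed TensorFlow was built against is tf.sysconfig.get_build_info() (available in TensorFlow 2.3 and later). The following is a minimal sketch, not a definitive check: the helper name tf_build_versions is hypothetical, and the cuda_version and cudnn_version keys are assumed to be present only in GPU-enabled builds, so missing keys are reported as None.

```python
def tf_build_versions():
    """Return the TensorFlow version and the CUDA/cuDNN versions it was built with.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # Assumption: these keys are absent in CPU-only builds, hence .get()
        "cuda": build.get("cuda_version"),
        "cudnn": build.get("cudnn_version"),
    }


print(tf_build_versions())
```

You can compare the reported versions with the CUDA and cuDNN modules that are loaded in your session.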

TensorFlow Example

After loading TensorFlow, you can check its functionality by running the following Python script.

import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([2, 4, 6])
c = a + b
print(c.numpy())

Using TensorFlow With GPUs

TensorFlow can use a single GPU or multiple GPUs within a single process, for example to train neural networks much faster.

Loading the provided TensorFlow module should ensure that the required GPU libraries (CUDA, cuDNN, NCCL, etc.) are loaded correctly.
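To confirm that the GPU libraries were actually picked up, you can ask TensorFlow which GPUs it sees. This is a small sketch using tf.config.list_physical_devices; the helper name visible_gpus is hypothetical, and the import is guarded so that the script also runs where TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of the GPUs visible to TensorFlow.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif not gpus:
    print("TensorFlow does not see any GPU")
else:
    print("TensorFlow sees {} GPU(s): {}".format(len(gpus), gpus))
```

An empty list on a GPU node usually suggests that TensorFlow could not load one of the GPU libraries or that the GPUs were hidden from the process.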

Selecting GPUs

You can select how many and which (NVIDIA) GPUs will be used by TensorFlow with the CUDA_VISIBLE_DEVICES environment variable.

# Do not use any GPUs
$ CUDA_VISIBLE_DEVICES=-1 python3 my_script.py
# Use a single GPU with ID 0
$ CUDA_VISIBLE_DEVICES=0 python3 my_script.py
# Use multiple GPUs
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 my_script.py

If you do not set this environment variable, all available GPUs are visible to TensorFlow and can be used by it.
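The same selection can also be made from inside a Python script, as long as the variable is set before TensorFlow initializes its GPU devices (most safely, before tensorflow is imported). The sketch below uses only the standard library; the helper name select_gpus is hypothetical.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given GPU IDs to CUDA; an empty list hides all GPUs."""
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids) if gpu_ids else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before TensorFlow is imported for the first time
select_gpus([0, 1])
print(os.environ["CUDA_VISIBLE_DEVICES"])

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

Setting the variable on the command line, as shown above, has the same effect and does not require changing the script.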

Multi-GPU TensorFlow Example

This script uses Keras (through tf.keras) and TensorFlow to train a simple convolutional neural network on the MNIST dataset. It assumes that the tensorflow (2.x), keras, and tensorflow_datasets Python packages are installed. Training is distributed across all GPUs that are visible to the process.

import tensorflow_datasets as tfds
import tensorflow as tf

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test = datasets['train'], datasets['test']

# Use NCCL reduction if NCCL is available; it should be the most efficient strategy
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce())

# Different reduction strategy, use if NCCL causes errors
# strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples

BUFFER_SIZE = 10000

BATCH_SIZE_PER_REPLICA = 64
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255

  return image, label


train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)

# The following line makes sure that the model will run on multiple GPUs (if they are available)
# Without `strategy.scope()`, the model would only be trained on a single GPU
with strategy.scope():
  model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10)
  ])

  model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

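model.fit(train_dataset, epochs=100)

# Evaluate the trained model on the test set prepared above
eval_loss, eval_acc = model.evaluate(eval_dataset)
print('Eval loss: {}, eval accuracy: {}'.format(eval_loss, eval_acc))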
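The script above scales the global batch size with the number of replicas, and MirroredStrategy then splits each global batch across the GPUs. The arithmetic can be sketched in plain Python; the helper names are hypothetical, 64 is the per-replica batch size from the script, and 60000 is the size of the MNIST training split.

```python
import math


def global_batch_size(per_replica, num_replicas):
    """Batch size seen by the dataset; each replica processes per_replica samples."""
    return per_replica * num_replicas


def steps_per_epoch(num_examples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


for replicas in (1, 2, 4, 8):
    batch = global_batch_size(64, replicas)
    steps = steps_per_epoch(60000, batch)
    print("{} GPU(s): global batch {}, {} steps per epoch".format(replicas, batch, steps))
```

With more GPUs, each epoch takes fewer but larger steps, so hyperparameters such as the learning rate may need to be tuned again.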

Note

If the NCCL strategy causes runtime errors, try running your application with the environment variable TF_FORCE_GPU_ALLOW_GROWTH set to true.
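With this variable set, TensorFlow allocates GPU memory on demand instead of reserving nearly all of it at startup. As a minimal sketch, the variable can also be set at the top of the script itself, assuming that this happens before TensorFlow initializes the GPUs:

```python
import os

# Must be set before TensorFlow initializes its GPU devices
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after this point
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```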

Tip

For real-world multi-GPU training, it might be better to use a dedicated multi-GPU framework such as Horovod.
