HW selection benchmark for deep learning
Our company is doing active research in deep learning; to be a bit more specific, we are building very wide embedding layers. We need to be able to do interactive research and to train many networks on demand.
The number of weights in our networks grew well above 5 million, so training performance on a regular CPU became an issue, and we had to make a decision about hardware.
We decided to run some tests to figure out what works best for us. The main options were:
- Buy Titan X based machines;
- Go for the new AWS P2 instances with K80 accelerators;
- Keep working on existing laptops with built-in 840M graphics cards.
We tested all these hardware options with different batch sizes and different embedding sizes. You can think of the embedding size simply as a scale factor for the network. Since we work with Keras, it also made sense to test both available back-ends, Theano and TensorFlow.
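To give an idea of what such a benchmark looks like, here is a minimal sketch of a timing loop in Keras. It is not our actual research code: the vocabulary size, sequence length, and the trivial classifier head are placeholder assumptions; only the way the batch size, embedding size, and backend are varied mirrors the setup described above.

```python
# Minimal sketch of a per-epoch timing loop (placeholder model and data, not
# our research network). The Keras backend (Theano vs. TensorFlow) is selected
# via the KERAS_BACKEND environment variable or ~/.keras/keras.json.
import time
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

VOCAB_SIZE = 100000   # placeholder; 100k tokens * 50 dims = 5M embedding weights
SEQ_LEN = 20          # placeholder sequence length
N_SAMPLES = 1000000   # about 1 million samples, as in the benchmark

def build_model(embedding_size):
    model = Sequential()
    model.add(Embedding(VOCAB_SIZE, embedding_size, input_length=SEQ_LEN))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model

X = np.random.randint(0, VOCAB_SIZE, size=(N_SAMPLES, SEQ_LEN))
y = np.random.randint(0, 2, size=(N_SAMPLES,))

for embedding_size in (10, 50):
    for batch_size in (100, 500, 1000, 2000, 4000):
        model = build_model(embedding_size)
        start = time.time()
        # Keras 1.x API; newer Keras versions take epochs= instead of nb_epoch=
        model.fit(X, y, batch_size=batch_size, nb_epoch=1, verbose=0)
        print('embedding=%2d batch=%4d epoch time: %.2f s'
              % (embedding_size, batch_size, time.time() - start))
```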
| Batch size | Embedding size | Backend | GeForce 840M | Titan X (Pascal) | Tesla K80 (single GPU) |
|---|---|---|---|---|---|
| 100 | 50 | TensorFlow | 593 | 50.3 | 116 |
| 100 | 50 | Theano | 326 | 65.25 | 134 |
| 100 | 10 | TensorFlow | 121.5 | 30.5 | 70 |
| 100 | 10 | Theano | 117.25 | 44 | 75 |
| 500 | 50 | TensorFlow | 153.75 | 12 | 30.5 |
| 500 | 50 | Theano | 110.75 | 18.5 | 42 |
| 500 | 10 | TensorFlow | 27.75 | 6.25 | 15.25 |
| 500 | 10 | Theano | 29.5 | 9.25 | 18.25 |
| 1000 | 50 | TensorFlow | 100 | 7.25 | 20 |
| 1000 | 50 | Theano | 89.5 | 14 | 30 |
| 1000 | 10 | TensorFlow | 16.5 | 3.25 | 9 |
| 1000 | 10 | Theano | 19.75 | 5.25 | 12 |
| 2000 | 50 | TensorFlow | 73 | 5.25 | 14 |
| 2000 | 50 | Theano | 69 | 7 | 19 |
| 2000 | 10 | TensorFlow | 10.25 | 2.25 | 6 |
| 2000 | 10 | Theano | 13 | 4 | 8 |
| 4000 | 50 | TensorFlow | 60 | 4 | 12 |
| 4000 | 50 | Theano | 62 | 6 | 16 |
| 4000 | 10 | TensorFlow | 7.25 | 1.25 | 4 |
| 4000 | 10 | Theano | 9.75 | 3 | 6 |
Table 1. Benchmark results. The three GPU columns show the average time per epoch, in seconds, measured on about 1 million samples.
We used a p2.xlarge instance, which provides a single GPU (half of a K80 board). It should be mentioned that the Titan X and half of a K80 both have 12 GB of memory.
Main observations:
- On our workload, the K80 accelerator gives roughly half the performance of the Titan X.
- Batch size is a very important optimization factor.
- In our case TensorFlow is considerably faster than Theano.
- A simple graphics card such as the 840M is an order of magnitude slower than the high-end cards on big networks.
- For smaller networks, the gap between the 840M and the high-end cards is much less significant.
Here is an illustration of how batch size affects training time (showing only TensorFlow with embedding size 50, to keep things clear):
Thus, our main decision came down to a choice: go with the K80 on AWS, or buy our own Titan X hardware.
A Titan X costs $1,200, and a good machine to host it brings the total to about $2,500.
An AWS p2.xlarge costs $0.90 per hour.
If we take electricity prices into account, we reach break-even at about 3,000 hours ($2,500 at $0.90 per hour is roughly 2,800 hours, and the electricity a local machine would burn pushes it a bit further). That is a lot: at 8 hours of training per working day, it comes to more than a year.
If we trained networks 24 hours a day, it would take far less calendar time to break even, but that is not how our research works.
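For reference, here is the back-of-the-envelope calculation behind that break-even estimate; the per-hour electricity cost is a rough assumption on our part.

```python
# Back-of-the-envelope break-even calculation (prices from the text above;
# the electricity cost per hour is a rough assumption).
titan_x_machine_cost = 2500.0   # USD: Titan X ($1,200) plus the rest of the machine
aws_hourly_rate = 0.90          # USD per hour for a p2.xlarge (on-demand, at the time)
electricity_per_hour = 0.07     # USD per hour, rough assumption for a GPU workstation

# Owning breaks even once the accumulated AWS bill exceeds the machine cost
# plus the electricity burned while running it locally.
break_even_hours = titan_x_machine_cost / (aws_hourly_rate - electricity_per_hour)
working_days = break_even_hours / 8   # 8 hours of training per working day

print('Break-even after about %.0f GPU-hours (~%.0f working days)'
      % (break_even_hours, working_days))
# -> roughly 3,000 hours, i.e. more than a year of 8-hour working days
```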
So money-wise, AWS was the winner. Moreover, if we want the flexibility to train one network one day and dozens the next, AWS wins hands down.
Another important point is that AWS provides us with a very simple way to share configured environments among our team members.
It has to be mentioned that a local card has the advantage of simpler UI access to data and results. At the same time, a properly configured Jupyter setup should cover most of that (we are currently working on it).
Note also that we train neural networks on big data sets, and the ability to scale is critical for us, which is another vote for the AWS-based solution.
Interactivity of the research was also one of our goals. The fact that the K80 gives us about half the performance of a Titan X is not pleasant, but it does not kill interactivity. If the difference were bigger, we would consider it a problem.
So, eventually, we decided to go with Amazon AWS p2.xlarge instances.
Many thanks to our deep learning ninja Sergii Myskov, who did all the heavy lifting: data preparation, devops, Keras programming, and more.