dcolley/tensorflow-cl

Tensorflow-cl

Run Tensorflow on OpenCL™ devices. UNDER CONSTRUCTION!!!

Summary

This repo was created from the original Tensorflow repository.

Please see the main repository for full Tensorflow documentation. This readme focuses only on the OpenCL porting aspects of Tensorflow.

What works

  • per-element binary operators: add, sub, mul, div, pow, minimum, maximum, squared_difference, as per test_tf3.py
  • per-element unary operators: tanh, abs, acos, asin, atan, ceil, cos, exp, floor, inverse, isfinite, isinf, isnan, log, neg, sign, sin, sqrt, square, tan (test: test_tf4.py)
  • Variables can be placed on GPU
  • matmul (using CLBlast)
  • some gradients
  • reduce_sum, reduce_prod, reduce_max, reduce_mean, reduce_min working (in beta; test: test_reductions.py)
  • training works :-)))
  • device name and memory reported correctly now
  • Aymeric Damien's 2_BasicModels run ok on NVIDIA K520 now (not working on Intel HD5500 yet).
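
The per-element and reduction ops above follow the usual NumPy/TensorFlow elementwise semantics. A minimal NumPy sketch of what the OpenCL kernels are expected to compute (NumPy is used here purely as a reference; it is not the actual OpenCL code path):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
b = np.array([[5.0, 6.0], [7.0, 8.0]], dtype=np.float32)

# per-element binary ops (as exercised by test_tf3.py)
squared_difference = (a - b) ** 2   # elementwise (a - b)^2
minimum = np.minimum(a, b)

# per-element unary ops (as exercised by test_tf4.py)
tanh = np.tanh(a)
inverse = 1.0 / a                   # "inverse" is the elementwise reciprocal

# reductions (as exercised by test_reductions.py): inner, outer, and all axes
inner = a.sum(axis=1)               # reduce over the innermost axis -> [3., 7.]
outer = a.sum(axis=0)               # reduce over the outermost axis -> [4., 6.]
total = a.sum()                     # reduce over all axes -> 10.0
```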

Scope

  • types:
    • float32 is the primary supported type
    • int32 is also supported, as a second priority
    • int8 (or uint8; haven't decided yet) will probably be supported too
    • out of scope: complex, double, half
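
Since double is out of scope and float32 is the primary type, callers with float64 data would need to downcast before feeding tensors. A small NumPy illustration of that assumption:

```python
import numpy as np

x64 = np.linspace(0.0, 1.0, 4)    # NumPy defaults to float64 (out of scope here)
x32 = x64.astype(np.float32)      # downcast to the supported float32 type

assert x64.dtype == np.float64
assert x32.dtype == np.float32
```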

To do

  • fix bugs...
  • add convolutions

Installation

pip install --upgrade tensorflow-0.11.0rc0-py3-none-any.whl

If you want, you can build from source.

Run

cd
source ~/env3/bin/activate
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_tf3.py
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_tf4.py
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_blas.py
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_gradients.py
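
Broadly, these test scripts work by comparing values computed on the OpenCL device against a NumPy reference. A simplified, NumPy-only sketch of that checking pattern (`gpu_result` here is a stand-in for the value a real test would fetch from the device via a session run):

```python
import numpy as np

a = np.random.randn(4, 3).astype(np.float32)
b = np.random.randn(4, 3).astype(np.float32)

expected = a * b     # NumPy reference for elementwise mul
gpu_result = a * b   # stand-in: a real test fetches this from the OpenCL device

# float32 kernels won't match bit-for-bit, so compare with a tolerance
assert np.allclose(gpu_result, expected, atol=1e-5)
```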

Screenshot of running Aymeric Damien's linear_regression.py (image not reproduced here).

Test results, on the v0.11.0 wheel:

| test | Intel HD5500 | NVIDIA K520 |
| --- | --- | --- |
| test_tf.py | ok | ok |
| test_tf2.py | ok | ok |
| test_tf3.py | fails for pow | ok |
| test_tf4.py | fails for all | ok |
| test_blas.py | ok | ok |
| test_reductions.py | fails for all except reduce_mean | ok |
| linear_regression.py | runs, but cost seems wrong | ok |
| logistic_regression.py | epoch 1 ok, then memory error | ok |
| nearest_neighbor.py | accuracy 0.12, seems a bit low... | ok |
| multilayer_perceptron.py | cost is nan | a bit slow, otherwise seems ok |
| recurrent_network.py | loss nan, accuracy broken | cost looks ok, accuracy seems broken |

Design/architecture

Related projects

DNN Libraries

OpenCL middleware

  • CLBlast BLAS for OpenCL
  • cuda-on-cl Compile CUDA apps for OpenCL
  • EasyCL Handles running kernels, passing in arguments etc, on OpenCL
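
The matmul support mentioned above goes through CLBlast's GEMM routine. BLAS GEMM computes C ← α·A·B + β·C; a NumPy reference for what that call produces (a sketch only: the real computation runs in CLBlast's OpenCL kernels, not NumPy):

```python
import numpy as np

def gemm_reference(alpha, A, B, beta, C):
    """NumPy reference for BLAS GEMM: C <- alpha * A @ B + beta * C."""
    return alpha * (A @ B) + beta * C

A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.arange(6, dtype=np.float32).reshape(3, 2)
C = np.zeros((2, 2), dtype=np.float32)

# with alpha=1, beta=0 this reduces to a plain matrix multiply
out = gemm_reference(1.0, A, B, 0.0, C)
```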

News

  • Nov 1:
    • building clew, CLBlast, easycl, cocl as shared libraries now, rather than static
      • hopefully this will make it easier to debug things on the HD5500 on my laptop, since there is no need to build/install the entire wheel for libcocl tweaks
    • turned on clew
      • this means libOpenCL.so is no longer needed during the build process
      • might facilitate building on Mac, since there is no longer a need to link to libOpenCL.so, which was outside the Bazel build tree
  • Oct 30:
    • new wheel v0.11.0
      • fixes critical bug in v0.10.0 release, where the number of devices was hard-coded to be 0 :-P
      • Aymeric Damien's 2_BasicModels all run now on the NVIDIA K520; they still seem broken on the Intel HD5500 for now
      • bunch of fixes underneath to get 2_BasicModels working ok on K520
  • Oct 29:
    • reduce_min working now, and test_reductions.py tests three types of reduction axes: inner, outer, all
    • Wheel v0.10.0 released:
      • Aymeric Damien's linear_regression runs fairly ok now (a bit slow, but not monstrously slow; maybe 3-4 times slower than on CUDA)
      • kernels are cached between kernel launches (this gives a huge speed boost compared to earlier)
      • bunch of behind-the-scenes ops added, like Cast
      • memory and device name reported correctly now
      • reduce_min working now
      • softmax added
  • Oct 24:
    • hmmm, just discovered some new options, to ensure operations really are on the gpu, and ... many are not :-P, so back to the drawing board a bit
      • the good news is that component-wise add really is on the gpu
      • the bad news is that everything else is not :-P
    • (re-)added the following per-element binary operators: sub, mul, div, pow, minimum, maximum, squared_difference. This time they really are running on the gpu :-) (test: test_tf3.py)
    • (re-)added the following per-element unary operators, which really are running on the gpu now :-) (test: test_tf4.py): tanh, abs, acos, asin, atan, ceil, cos, exp, floor, inverse, isfinite, isinf, isnan, log, neg, sign, sin, sqrt, square, tan
    • Variables can be placed on gpu now, test_gradients.py
  • Oct 23:
    • can use component wise addition from Python now :-)
    • fixed critical bug involving float4s, that meant that tensors larger than, say, 3 :-P, could not be added correctly
    • added following per-element binary operators: sub, mul, div, not_equal, minimum, maximum, pow, squared_difference (test: test_tf3.py)
    • added the following per-element unary operators: tanh, abs, acos, asin, atan, ceil, cos, exp, floor, inverse, isfinite, isinf, isnan, log, neg, sigmoid, sign, sin, sqrt, square, tan (test: test_tf4.py)
    • added following comparison operators: equal_to, greater, greater_equal, less, less_equal
    • added BLAS support (using Cedric Nugteren's CLBlast). Not very well tested yet. Test script: test_blas.py
  • Oct 22:
    • componentwise addition working, when called from c++
    • commit 0db9cc2e: re-enabled -fPIC, -pie
      • this is a pre-requisite for being able to run from python at some point
      • but if you built prior to this, you need to do a deep clean and rebuild from scratch:
      rm -Rf third_party/cuda-on-cl/build
      bazel clean --expunge
      
    • python working (as of commit 5e67304c3c)
      • you'll need to do bazel clean, and rebuild from scratch, if you already did a build prior to this commit
  • Oct 18:
    • stream executor up
    • crosstool working
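
The "kernels cached between kernel launches" speedup noted for the v0.10.0 wheel can be illustrated with a simple memoization pattern: compile once per unique kernel source, then reuse the compiled object on later launches. This is a hedged sketch only; the real cache lives in the C++ OpenCL runtime, and `compile_kernel` below is a hypothetical stand-in for the expensive clCreateProgramWithSource/clBuildProgram step:

```python
_kernel_cache = {}
compile_count = 0

def compile_kernel(source):
    # stand-in for the expensive OpenCL program build step
    global compile_count
    compile_count += 1
    return ("compiled", source)

def get_kernel(source):
    # compile each unique kernel source once; later launches hit the cache
    if source not in _kernel_cache:
        _kernel_cache[source] = compile_kernel(source)
    return _kernel_cache[source]

k1 = get_kernel("__kernel void add(...) {}")
k2 = get_kernel("__kernel void add(...) {}")  # cache hit: no recompile
```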
