dcolley/tensorflow-cl

Tensorflow-cl

Run Tensorflow on OpenCL™ devices. UNDER CONSTRUCTION!!!

Summary

This repo was created from the original Tensorflow repository.

Please see the main repository for full Tensorflow documentation. This readme focuses only on the OpenCL porting aspects of Tensorflow.

What works

  • per-element binary operators: add, sub, mul, div, pow, minimum, maximum, squared_difference, as per test_tf3.py
  • per-element unary operators: tanh, abs, acos, asin, atan, ceil, cos, exp, floor, inverse, isfinite, isinf, isnan, log, neg, sign, sin, sqrt, square, tan (test: test_tf4.py)
  • Variables can be placed on GPU
  • matmul (using CLBlast)
  • some gradients
  • reduce_sum, reduce_prod, reduce_max, reduce_mean, reduce_min working (in beta; test: test_reductions.py)
  • training works :-)))
  • device name and memory reported correctly now
  • Aymeric Damien's 2_BasicModels run ok on NVIDIA K520 now (not working on Intel HD5500 yet).
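
The per-element and reduction ops above follow the usual NumPy/TensorFlow elementwise semantics. A minimal NumPy sketch of what the OpenCL kernels are expected to compute (NumPy is used here purely as a reference; it is not the actual OpenCL code path):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
b = np.array([[5.0, 6.0], [7.0, 8.0]], dtype=np.float32)

# per-element binary ops (as exercised by test_tf3.py)
squared_difference = (a - b) ** 2   # elementwise (a - b)^2
minimum = np.minimum(a, b)

# per-element unary ops (as exercised by test_tf4.py)
tanh = np.tanh(a)
inverse = 1.0 / a                   # "inverse" is the elementwise reciprocal

# reductions (as exercised by test_reductions.py): inner, outer, and all axes
inner = a.sum(axis=1)               # reduce over the innermost axis -> [3., 7.]
outer = a.sum(axis=0)               # reduce over the outermost axis -> [4., 6.]
total = a.sum()                     # reduce over all axes -> 10.0
```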

Scope

  • types:
    • float32 is the primary supported type
    • int32 is also supported, as a second priority
    • int8 (or uint8; haven't decided yet) will probably be supported too
    • out of scope: complex, double, half
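
Since double is out of scope and float32 is the primary type, callers with float64 data would need to downcast before feeding tensors. A small NumPy illustration of that assumption:

```python
import numpy as np

x64 = np.linspace(0.0, 1.0, 4)    # NumPy defaults to float64 (out of scope here)
x32 = x64.astype(np.float32)      # downcast to the supported float32 type

assert x64.dtype == np.float64
assert x32.dtype == np.float32
```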

To do

  • fix bugs...
  • add convolutions

Installation

pip install --upgrade tensorflow-0.11.0rc0-py3-none-any.whl

If you want, you can build from source.

Run

cd
source ~/env3/bin/activate
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_tf3.py
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_tf4.py
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_blas.py
python ~/git/tensorflow-cl/tensorflow/stream_executor/cl/test/test_gradients.py
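
Broadly, these test scripts work by comparing values computed on the OpenCL device against a NumPy reference. A simplified, NumPy-only sketch of that checking pattern (`gpu_result` here is a stand-in for the value a real test would fetch from the device via a session run):

```python
import numpy as np

a = np.random.randn(4, 3).astype(np.float32)
b = np.random.randn(4, 3).astype(np.float32)

expected = a * b     # NumPy reference for elementwise mul
gpu_result = a * b   # stand-in: a real test fetches this from the OpenCL device

# float32 kernels won't match bit-for-bit, so compare with a tolerance
assert np.allclose(gpu_result, expected, atol=1e-5)
```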

Screenshot of running Aymeric Damien's linear_regression.py (image not reproduced here).

Test results, on the v0.11.0 wheel:

| test | Intel HD5500 | NVIDIA K520 |
| --- | --- | --- |
| test_tf.py | ok | ok |
| test_tf2.py | ok | ok |
| test_tf3.py | fails for pow | ok |
| test_tf4.py | fails for all | ok |
| test_blas.py | ok | ok |
| test_reductions.py | fails for all except reduce_mean | ok |
| linear_regression.py | runs, but cost seems wrong | ok |
| logistic_regression.py | epoch 1 ok, then memory error | ok |
| nearest_neighbor.py | accuracy 0.12, seems a bit low... | ok |
| multilayer_perceptron.py | cost is nan | a bit slow, otherwise seems ok |
| recurrent_network.py | loss nan, accuracy broken | cost looks ok, accuracy seems broken |

Design/architecture

Related projects

DNN Libraries

OpenCL middleware

  • CLBlast BLAS for OpenCL
  • cuda-on-cl Compile CUDA apps for OpenCL
  • EasyCL Handles running kernels, passing in arguments etc, on OpenCL
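
The matmul support mentioned above goes through CLBlast's GEMM routine. BLAS GEMM computes C ← α·A·B + β·C; a NumPy reference for what that call produces (a sketch only: the real computation runs in CLBlast's OpenCL kernels, not NumPy):

```python
import numpy as np

def gemm_reference(alpha, A, B, beta, C):
    """NumPy reference for BLAS GEMM: C <- alpha * A @ B + beta * C."""
    return alpha * (A @ B) + beta * C

A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.arange(6, dtype=np.float32).reshape(3, 2)
C = np.zeros((2, 2), dtype=np.float32)

# with alpha=1, beta=0 this reduces to a plain matrix multiply
out = gemm_reference(1.0, A, B, 0.0, C)
```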

News

  • Nov 1:
    • building clew, CLBlast, easycl, cocl as shared libraries now, rather than static
      • hopefully this will make it easier to debug things on the HD5500 on my laptop, since there is no need to build/install the entire wheel for libcocl tweaks
    • turned on clew
      • this means libOpenCL.so is no longer needed during the build process
      • might facilitate building on Mac, since there is no longer a need to link to libOpenCL.so, which was outside the Bazel build tree
  • Oct 30:
    • new wheel v0.11.0
      • fixes critical bug in v0.10.0 release, where the number of devices was hard-coded to be 0 :-P
      • Aymeric Damien's 2_BasicModels all run now on the NVIDIA K520; they still seem broken on the Intel HD5500 for now
      • bunch of fixes underneath to get 2_BasicModels working ok on K520
  • Oct 29:
    • reduce_min working now, and test_reductions.py tests three types of reduction axes: inner, outer, all
    • Wheel v0.10.0 released:
      • Aymeric Damien's linear_regression runs fairly ok now (a bit slow, but not monstrously slow; maybe 3-4 times slower than on CUDA)
      • kernels are cached between kernel launches (this gives a huge speed boost compared to earlier)
      • bunch of behind-the-scenes ops added, like Cast
      • memory and device name reported correctly now
      • reduce_min working now
      • softmax added
  • Oct 24:
    • hmmm, just discovered some new options, to ensure operations really are on the gpu, and ... many are not :-P, so back to the drawing board a bit
      • the good news is that component-wise add really is on the gpu
      • the bad news is that everything else is not :-P
    • (re-)added the following per-element binary operators: sub, mul, div, pow, minimum, maximum, squared_difference. This time they really are running on the gpu :-) (test: test_tf3.py)
    • (re-)added the following per-element unary operators, which really are running on the gpu now :-) (test: test_tf4.py): tanh, abs, acos, asin, atan, ceil, cos, exp, floor, inverse, isfinite, isinf, isnan, log, neg, sign, sin, sqrt, square, tan
    • Variables can be placed on gpu now, test_gradients.py
  • Oct 23:
    • can use component wise addition from Python now :-)
    • fixed critical bug involving float4s, that meant that tensors larger than, say, 3 :-P, could not be added correctly
    • added following per-element binary operators: sub, mul, div, not_equal, minimum, maximum, pow, squared_difference (test: test_tf3.py)
    • added the following per-element unary operators: tanh, abs, acos, asin, atan, ceil, cos, exp, floor, inverse, isfinite, isinf, isnan, log, neg, sigmoid, sign, sin, sqrt, square, tan (test: test_tf4.py)
    • added following comparison operators: equal_to, greater, greater_equal, less, less_equal
    • added BLAS support (using Cedric Nugteren's CLBlast). Not very well tested yet. Test script: test_blas.py
  • Oct 22:
    • componentwise addition working, when called from c++
    • commit 0db9cc2e: re-enabled -fPIC, -pie
      • this is a pre-requisite for being able to run from python at some point
      • but if you built prior to this, you need to do a deep clean and rebuild from scratch:
      rm -Rf third_party/cuda-on-cl/build
      bazel clean --expunge
      
    • python working (as of commit 5e67304c3c)
      • you'll need to do bazel clean, and rebuild from scratch, if you already did a build prior to this commit
  • Oct 18:
    • stream executor up
    • crosstool working
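
The "kernels cached between kernel launches" speedup noted for the v0.10.0 wheel can be illustrated with a simple memoization pattern: compile once per unique kernel source, then reuse the compiled object on later launches. This is a hedged sketch only; the real cache lives in the C++ OpenCL runtime, and `compile_kernel` below is a hypothetical stand-in for the expensive clCreateProgramWithSource/clBuildProgram step:

```python
_kernel_cache = {}
compile_count = 0

def compile_kernel(source):
    # stand-in for the expensive OpenCL program build step
    global compile_count
    compile_count += 1
    return ("compiled", source)

def get_kernel(source):
    # compile each unique kernel source once; later launches hit the cache
    if source not in _kernel_cache:
        _kernel_cache[source] = compile_kernel(source)
    return _kernel_cache[source]

k1 = get_kernel("__kernel void add(...) {}")
k2 = get_kernel("__kernel void add(...) {}")  # cache hit: no recompile
```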
