11 Replies Latest reply on Nov 13, 2018 9:22 PM by Intel Corporation

    TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

    subhashis

      Please help:

       

      Traceback (most recent call last):

        File "train.py", line 12, in <module>

          import matplotlib.pyplot as plt

      ModuleNotFoundError: No module named 'matplotlib'

      (tf3) [u14544@c009-n023 BRATS2018]$ python train.py

      Using TensorFlow backend.

      ------------------------------

      Loading and preprocessing train data...

      ------------------------------

      ------------------------------

      Creating and compiling model...

      ------------------------------

      ------------------------------

      Fitting model...

      ------------------------------

      Epoch 1/5

      2018-10-17 13:47:50.450277: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

      2018-10-17 13:47:50.452655: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

      2018-10-17 13:47:52.480229: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at scatter_nd_op.cc:119 : Invalid argument: Invalid indices: [2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]

      Traceback (most recent call last):

        File "train.py", line 96, in <module>

          validation_data=(x_val, y_val), shuffle=True, callbacks=[model_checkpoint])

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper

          return func(*args, **kwargs)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator

          initial_epoch=initial_epoch)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training_generator.py", line 217, in fit_generator

          class_weight=class_weight)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training.py", line 1217, in train_on_batch

          outputs = self.train_function(ins)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__

          return self._call(inputs)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call

          fetched = self._callable_fn(*array_vals)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__

          run_metadata_ptr)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__

          c_api.TF_GetCode(self.status.status))

      tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid indices: [2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]

               [[Node: max_unpooling2d_1/max_unpooling2d_1/ScatterNd = ScatterNd[T=DT_FLOAT, Tindices=DT_INT32, _class=["loc:@train...d/GatherNd"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](max_unpooling2d_1/max_unpooling2d_1/transpose, max_unpooling2d_1/max_unpooling2d_1/Reshape_2, max_unpooling2d_1/max_unpooling2d_1/ScatterNd/shape)]]

      Traceback (most recent call last):

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers

          finalizer()

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/multiprocessing/util.py", line 186, in __call__

          res = self._callback(*self._args, **self._kwargs)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 480, in rmtree

          _rmtree_safe_fd(fd, path, onerror)

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 438, in _rmtree_safe_fd

          onerror(os.unlink, fullname, sys.exc_info())

        File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 436, in _rmtree_safe_fd

          os.unlink(name, dir_fd=topfd)

      OSError: [Errno 16] Device or resource busy: '.nfs0000000f0010fc3a000017ac'

      (tf3) [u14544@c009-n023 BRATS2018]$

       

       

        • 1. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi Subhashis,

          Regarding the 'matplotlib' ModuleNotFoundError, kindly install the package in the active environment with either pip install matplotlib or conda install matplotlib.
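
          Before re-running train.py, it may help to confirm that the packages it imports are visible to the interpreter of the active environment. The following is only a minimal sketch: the helper name is_installed is made up for illustration, and the package list is taken from the imports that appear in the tracebacks in this thread.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by the running interpreter."""
    # find_spec only locates the top-level package; it does not import it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Packages seen in the tracebacks of this thread (an assumption about
    # what train.py needs, not a complete dependency list).
    for name in ("matplotlib", "keras", "tensorflow"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

          Running this with the same python that runs train.py shows whether the install landed in the right environment.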

          Regarding the second issue, it seems that the program is trying to delete a file that is still open.
          Kindly check the following:
          1. Have you kept open any files that you are trying to delete through the program? Are you using tensorboard, which might have kept files open? If so, kindly close all these possibilities before trying to delete.
          2. If you are not sure which processes are keeping these files open, then kindly run the following command.

          lsof +D /<directory_where_nfs_error_file_exists>
          This will list all open files under that directory. If that directory has a large directory tree, then this might not be a feasible solution. After you identify the process that is keeping the files open, please check if these processes could be killed. Kill them if possible and re-run the program.
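
          If lsof is too slow on a large tree, a rough equivalent can be sketched in Python by reading the /proc/<pid>/fd links. This is an illustrative, Linux-only sketch and not a replacement for lsof: the function name find_open_files is hypothetical, and processes owned by other users are skipped because their descriptors are not readable.

```python
import os


def find_open_files(directory):
    """Return (pid, path) pairs for files under `directory` that are held open.

    Linux-only: relies on the /proc/<pid>/fd symlinks. Processes whose fd
    directory cannot be read (other users, already exited) are skipped.
    """
    root = os.path.realpath(directory)
    prefix = root + os.sep
    holders = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while scanning
            if target == root or target.startswith(prefix):
                holders.append((int(entry), target))
    return holders
```

          Each returned pid can then be inspected before deciding whether the process can be stopped.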

          Kindly let us know if the solution helped. If it did not help, kindly reply with the details of the code you are trying to run so that we can recreate the scenario here and check.

          Regards,
          Anju
           

          • 2. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
            subhashis

            (tf3) [u14544@c009-n014 BRATS2018]$ python train.py

            Using TensorFlow backend.

            Traceback (most recent call last):

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>

                from tensorflow.python.pywrap_tensorflow_internal import *

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>

                _pywrap_tensorflow_internal = swig_import_helper()

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper

                _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/imp.py", line 243, in load_module

                return load_dynamic(name, filename, file)

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/imp.py", line 343, in load_dynamic

                return _load(spec)

            ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found (required by /home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)

             

             

            During handling of the above exception, another exception occurred:

             

             

            Traceback (most recent call last):

              File "train.py", line 7, in <module>

                from keras.models import Model

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/__init__.py", line 3, in <module>

                from . import utils

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/utils/__init__.py", line 6, in <module>

                from . import conv_utils

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/utils/conv_utils.py", line 9, in <module>

                from .. import backend as K

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/__init__.py", line 89, in <module>

                from .tensorflow_backend import *

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 5, in <module>

                import tensorflow as tf

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>

                from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>

                from tensorflow.python import pywrap_tensorflow

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>

                raise ImportError(msg)

            ImportError: Traceback (most recent call last):

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>

                from tensorflow.python.pywrap_tensorflow_internal import *

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>

                _pywrap_tensorflow_internal = swig_import_helper()

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper

                _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/imp.py", line 243, in load_module

                return load_dynamic(name, filename, file)

              File "/home/u14544/.conda/envs/tf3/lib/python3.6/imp.py", line 343, in load_dynamic

                return _load(spec)

            ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found (required by /home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)

             

             

             

             

            Failed to load the native TensorFlow runtime.

             

             

            See https://www.tensorflow.org/install/install_sources#common_installation_problems

             

             

            for some common reasons and solutions.  Include the entire stack trace

            above this error message when asking for help.

            • 3. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Hi Subhashis,

              Kindly refer to https://communities.intel.com/thread/128690 for the solution.
              Please let us know if the solution helped.
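
              For context, the ImportError above says that /lib64/libm.so.6 does not provide the GLIBC_2.23 symbol version that the TensorFlow binary requires, which normally means the system glibc is older than 2.23. A minimal sketch for checking the runtime glibc version from Python follows; the helper names are hypothetical, and it assumes a Linux system with glibc (on other C libraries the lookup simply returns None).

```python
import os


def runtime_glibc_version():
    """Return the glibc version string reported by the system, or None."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # format: "glibc <version>"
    except (AttributeError, ValueError, OSError):
        return None  # not glibc, or confstr is unavailable
    parts = value.split() if value else []
    if len(parts) == 2 and parts[0] == "glibc":
        return parts[1]
    return None


def parse_version(text):
    """Turn a dotted version string such as '2.23' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_satisfies(required, found=None):
    """Return True if the `found` glibc version is at least `required`.

    When `found` is None, the version is looked up on the running system.
    """
    if found is None:
        found = runtime_glibc_version()
    if not found:
        return False  # version could not be determined
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    print("runtime glibc:", runtime_glibc_version())
    print("satisfies 2.23:", glibc_satisfies("2.23"))
```

              Versions are compared as integer tuples rather than strings, so that 2.9 is correctly treated as older than 2.23.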

              Regards,
              Anju

              • 4. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                subhashis

                Fitting model...

                ------------------------------

                Epoch 1/5

                2018-10-22 01:40:06.028694: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

                2018-10-22 01:40:06.031594: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

                2018-10-22 01:40:08.024175: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at scatter_nd_op.cc:119 : Invalid argument: Invalid indices: [2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]

                Traceback (most recent call last):

                  File "train.py", line 96, in <module>

                    validation_data=(x_val, y_val), shuffle=True, callbacks=[model_checkpoint])

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper

                    return func(*args, **kwargs)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator

                    initial_epoch=initial_epoch)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training_generator.py", line 217, in fit_generator

                    class_weight=class_weight)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training.py", line 1217, in train_on_batch

                    outputs = self.train_function(ins)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__

                    return self._call(inputs)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call

                    fetched = self._callable_fn(*array_vals)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__

                    run_metadata_ptr)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__

                    c_api.TF_GetCode(self.status.status))

                tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid indices: [2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]

                [[Node: max_unpooling2d_1/max_unpooling2d_1/ScatterNd = ScatterNd[T=DT_FLOAT, Tindices=DT_INT32, _class=["loc:@train...d/GatherNd"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](max_unpooling2d_1/max_unpooling2d_1/transpose, max_unpooling2d_1/max_unpooling2d_1/Reshape_2, max_unpooling2d_1/max_unpooling2d_1/ScatterNd/shape)]]

                Traceback (most recent call last):

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers

                    finalizer()

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/multiprocessing/util.py", line 186, in __call__

                    res = self._callback(*self._args, **self._kwargs)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 480, in rmtree

                    _rmtree_safe_fd(fd, path, onerror)

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 438, in _rmtree_safe_fd

                    onerror(os.unlink, fullname, sys.exc_info())

                  File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 436, in _rmtree_safe_fd

                    os.unlink(name, dir_fd=topfd)

                OSError: [Errno 16] Device or resource busy: '.nfs0000001800e2fa100000012d'

                • 5. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                  Intel Corporation
                  This message was posted on behalf of Intel Corporation

                  Hi Subhashis,

                  Could you please let us know if you have tried the suggestions given earlier? Reposting them here in case you missed them.

                  Usually, the program reports "Device or resource busy" when it tries to delete a file that is still open.

                  1. Have you kept open any files that you are trying to delete through the program? Are you using tensorboard, which might have kept files open? If so, kindly close all these possibilities before trying to delete.
                  2. If you are not sure which processes are keeping these files open, then kindly run the following command.

                  lsof +D /<directory_where_nfs_error_file_exists>
                  This will list all open files under that directory. If that directory has a large directory tree, then this might not be a feasible solution. After you identify the process that is keeping the files open, please check if these processes could be killed. Kill them if possible and re-run the program.

                  Kindly let us know if the solution helped. If it did not help, kindly reply with the details of the code you are trying to run so that we can recreate the scenario here and check.

                  Regards,
                  Anju
                   

                  • 6. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                    subhashis

                    Hi Anju, I am not using tensorboard, and here is the output for step 2:

                     

                    (tf3) [u14544@c009-n014 Intel_Projects]$ lsof +D /BRATS2018

                    lsof: WARNING: can't stat(/BRATS2018): No such file or directory

                    lsof 4.87

                    latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/

                    latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ

                    latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man

                    usage: [-?abhKlnNoOPRtUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s]

                    [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s]

                    [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]

                    Use the ``-h'' option to get more help information.

                    • 7. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                      Intel Corporation
                      This message was posted on behalf of Intel Corporation

                      Hi Subhashis,

                      Is BRATS2018 a folder in your home folder? Try giving the complete path, such as /home/u14544/BRATS2018, and check.
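
                      In the output above, /BRATS2018 was read as a directory directly under the filesystem root, which is why lsof could not stat it. One way to avoid this is to resolve the directory name to an absolute path before handing it to lsof. The sketch below only builds the command; the helper names resolve_directory and lsof_command are hypothetical.

```python
import os


def resolve_directory(path):
    """Expand '~' and make `path` absolute relative to the current directory."""
    return os.path.abspath(os.path.expanduser(path))


def lsof_command(path):
    """Build the lsof argument list for a directory without running it."""
    return ["lsof", "+D", resolve_directory(path)]


if __name__ == "__main__":
    # Run from the parent folder of BRATS2018, this prints the full command.
    print(" ".join(lsof_command("BRATS2018")))
```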

                      Also, since the problem seems to come from an NFS file, try running the code in qsub mode instead of running it directly on the compute node.

                      Regards,
                      Anju

                      • 8. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                        subhashis

                        Hi Anju, here is the output:

                         

                        [u14544@c009 ~]$ qsub -I

                        qsub: waiting for job 188253.c009 to start

                        qsub: job 188253.c009 ready

                         

                         

                        ########################################################################

                        #      Date:           Mon Oct 22 03:37:00 PDT 2018

                        #    Job ID:           188253.c009

                        #      User:           u14544

                        # Resources:           neednodes=1:ppn=2,nodes=1:ppn=2,vmem=92gb,walltime=06:00:00

                        ########################################################################

                         

                         

                        [u14544@c009-n020 ~]$ source activate tf3

                        (tf3) [u14544@c009-n020 ~]$ lsof +D /home/u14544/Intel_Projects/BRATS2018

                        (tf3) [u14544@c009-n020 ~]$ cd Intel_Projects

                        (tf3) [u14544@c009-n020 Intel_Projects]$ cd BRATS2018

                        (tf3) [u14544@c009-n020 BRATS2018]$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH

                        (tf3) [u14544@c009-n020 BRATS2018]$ python train.py

                        Using TensorFlow backend.

                        ------------------------------

                        Loading and preprocessing train data...

                        ------------------------------

                        ------------------------------

                        Creating and compiling model...

                        ------------------------------

                        ------------------------------

                        Fitting model...

                        ------------------------------

                        Epoch 1/5

                        2018-10-22 03:44:11.035210: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

                        2018-10-22 03:44:11.037000: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

                        2018-10-22 03:44:13.184882: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at scatter_nd_op.cc:119 : Invalid argument: Invalid indices: [2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]

                        Traceback (most recent call last):

                          File "train.py", line 96, in <module>

                            validation_data=(x_val, y_val), shuffle=True, callbacks=[model_checkpoint])

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper

                            return func(*args, **kwargs)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator

                            initial_epoch=initial_epoch)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training_generator.py", line 217, in fit_generator

                            class_weight=class_weight)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/engine/training.py", line 1217, in train_on_batch

                            outputs = self.train_function(ins)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__

                            return self._call(inputs)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call

                            fetched = self._callable_fn(*array_vals)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__

                            run_metadata_ptr)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__

                            c_api.TF_GetCode(self.status.status))

                        tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid indices: [2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]

                        [[Node: max_unpooling2d_1/max_unpooling2d_1/ScatterNd = ScatterNd[T=DT_FLOAT, Tindices=DT_INT32, _class=["loc:@train...d/GatherNd"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](max_unpooling2d_1/max_unpooling2d_1/transpose, max_unpooling2d_1/max_unpooling2d_1/Reshape_2, max_unpooling2d_1/max_unpooling2d_1/ScatterNd/shape)]]

                        Traceback (most recent call last):

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers

                            finalizer()

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/multiprocessing/util.py", line 186, in __call__

                            res = self._callback(*self._args, **self._kwargs)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 480, in rmtree

                            _rmtree_safe_fd(fd, path, onerror)

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 438, in _rmtree_safe_fd

                            onerror(os.unlink, fullname, sys.exc_info())

                          File "/home/u14544/.conda/envs/tf3/lib/python3.6/shutil.py", line 436, in _rmtree_safe_fd

                            os.unlink(name, dir_fd=topfd)

                        OSError: [Errno 16] Device or resource busy: '.nfs00000016003a7a3a0000001f'

                        (tf3) [u14544@c009-n020 BRATS2018]$

                        • 9. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                          subhashis

                          Hi Anju,

                           

                          This code runs perfectly with TensorFlow GPU version '1.7.0' and Keras '2.1.5'.

                           

                          Thanks

                          • 10. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                            Intel Corporation
                            This message was posted on behalf of Intel Corporation

                            Hi Subhashis,

                            qsub -I runs the job interactively, directly on a compute node. If you wrap your program in a script file, you can also submit the job from the login node as qsub <script_file>. The request was to try this second way and check whether the issue still persists.
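
                            As a rough illustration of wrapping the program in a script file, the sketch below generates a job script and submits it with qsub. It assumes a PBS/Torque-style scheduler; the #PBS resource lines mirror the values shown in the interactive job banner earlier in this thread, and the function names build_job_script and submit are hypothetical.

```python
import subprocess


def build_job_script(workdir, env_name, command,
                     nodes="1:ppn=2", walltime="06:00:00"):
    """Return the text of a PBS job script that runs `command` in `workdir`."""
    lines = [
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}",        # assumed resource request
        f"#PBS -l walltime={walltime}",  # assumed time limit
        f"cd {workdir}",
        f"source activate {env_name}",
        command,
    ]
    return "\n".join(lines) + "\n"


def submit(script_path):
    """Submit the script with qsub from the login node and return its output."""
    result = subprocess.run(["qsub", script_path], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    script = build_job_script("/home/u14544/Intel_Projects/BRATS2018",
                              "tf3", "python train.py")
    with open("train_job.sh", "w") as handle:
        handle.write(script)
    print(script)
    # submit("train_job.sh")  # uncomment on a node where qsub is available
```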

                            We do not dispute that the code itself runs fine. It is just that a file on NFS is causing this problem. It might be a log file that was accidentally left open.

                            If the qsub job submission did not work, kindly check your mail and respond there.

                            Regards,
                            Anju

                            • 11. Re: TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
                              Intel Corporation
                              This message was posted on behalf of Intel Corporation

                              This is an update based on the conversation with the user through mail. The user had an "Invalid indices" error in addition to the "Device or resource busy" error. After the "Invalid indices" error was solved with the help of https://github.com/ykamikawa/SegNet/issues/4, the "Device or resource busy" issue was also resolved. This thread is now closed after user confirmation.
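
                              For readers hitting the same message: "[2048,0] = [1, 16, 0, 0] does not index into [16,16,16,128]" means that the index stored at position [2048,0] has the value [1, 16, 0, 0], and its second component (16) falls outside the valid range 0..15 of a dimension of size 16. The sketch below only illustrates that bounds rule in plain Python. It is not the fix from the linked issue, and the function name first_invalid_index is hypothetical.

```python
def first_invalid_index(indices, shape):
    """Return (row, index) for the first index that falls outside `shape`.

    An index is valid when it has no more components than `shape` has
    dimensions and every component i satisfies 0 <= i < dimension size.
    Returns None when all indices are valid.
    """
    for row, index in enumerate(indices):
        if len(index) > len(shape):
            return row, list(index)
        for component, size in zip(index, shape):
            if not 0 <= component < size:
                return row, list(index)
    return None


if __name__ == "__main__":
    shape = [16, 16, 16, 128]
    indices = [[0, 0, 0, 0], [1, 16, 0, 0]]
    print(first_invalid_index(indices, shape))
```

                              Running such a check on the unpooling indices before the scatter step can show which entry goes out of range.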

                              Regards,
                              Anju