Ubuntu: lspci returns “Cannot open /sys/bus/pci/devices/xxxxx/resource: No such file or directory”



Question:

My Ubuntu 16.10 server VM in MS Azure (NV6 series) suddenly had a hiccup for unknown reasons (none of my doing). I had to restart it, and when it came back online I was no longer able to use the GPU on the machine.

The nvidia-smi application freezes.

The command lspci yields

lspci: Cannot open /sys/bus/pci/devices/7ec1:00:00.0/resource: No such file or directory  

And of course, that path (no longer?) exists. What does exist is:

$: ls /sys/bus/pci/devices/
0000:00:00.0/    0000:00:07.0/    0000:00:07.1/    0000:00:07.3/    0000:00:08.0/    b717ec1:00:00.0/
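For what it's worth, a few more things can still be checked from inside the VM (standard Ubuntu tooling; the grep pattern is only a guess at the relevant package names):

uname -r                                # currently booted kernel
dpkg -l | grep -E 'linux-image|nvidia'  # installed kernel images and NVIDIA driver packages
timeout 10 nvidia-smi                   # bounded run, so a frozen nvidia-smi doesn't hang the shell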

Some googling turned up a few questions similar to mine, many of which have been asked in the last 24 hours, like this one.

This might be due to Ubuntu or to Azure; I have no idea which is the source of the problem, or how to solve it.

Anyone have any ideas?


Solution:1

I was having the same problem (using Azure NC24 instances) and, after working at it for a few hours, I found this post and decided to submit a support request to Microsoft. Here's what they told me:

Canonical appears to have recently released kernel 4.4.0-75 for Ubuntu 16.04 and this is having an adverse effect on Tesla GPUs on NC-series VMs. Installation of the 4.4.0-75 breaks the 8.0.61-1 version of the NVIDIA CUDA driver that’s currently recommended for use on these systems, resulting in nvidia-smi not showing the adapters and lspci returning an error similar to the following:

root@pd-nvtest2:~# lspci
lspci: Cannot open /sys/bus/pci/devices/2baf:00:00.0/resource: No such file or directory

They suggest backing up the OS drive, running

apt-get remove linux-image-4.4.0-75-generic

and then

update-grub

Reboot, and it should work! At the very least, doing that fixed the lspci output for me; I still needed to fix some CUDA stuff, but that's left over from earlier debugging attempts.
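Put together, and with a sanity check added so you don't remove the only installed kernel, the procedure looks roughly like this (a sketch, run as root or with sudo; the verification commands at the end are my own additions, not part of Microsoft's instructions):

dpkg --list 'linux-image-*' | grep ^ii       # confirm another kernel image is still installed
apt-get remove linux-image-4.4.0-75-generic  # remove the kernel that breaks the NVIDIA driver
update-grub                                  # regenerate the GRUB menu
reboot
# after the reboot:
uname -r                                     # should now show a kernel other than 4.4.0-75
lspci | grep -i nvidia                       # the Tesla GPU should be listed again
nvidia-smi                                   # should no longer freeze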


Solution:2

Maybe this is because you stopped (deallocated) the Azure VM and then started it again. According to [1], the underlying hardware addresses of devices such as the GPU can change when you stop (deallocate) and then start the VM, but the Ubuntu guest still references the old device addresses. Hence, lspci tells you it cannot open the folder for a hardware address that no longer exists.

[1] https://blogs.technet.microsoft.com/gbanin/2015/04/22/difference-between-the-states-of-azure-virtual-machines-stopped-and-stopped-deallocated/
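If that is the cause, the distinction shows up in how you stop the VM. A minimal sketch with the Azure CLI (the resource group and VM names below are placeholders):

az vm stop --resource-group myResourceGroup --name myGpuVm        # power off but stay allocated on the same hardware (still billed)
az vm deallocate --resource-group myResourceGroup --name myGpuVm  # release the hardware; compute billing stops
az vm start --resource-group myResourceGroup --name myGpuVm       # after a deallocate, the VM may come back on different hardware, so device addresses can change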

