Seven Story Rabbit Hole

Sometimes awesome things happen in deep rabbit holes. Or not.


CUDA 6.5 on AWS GPU Instance Running Ubuntu 14.04

Using a pre-built public AMI

Based on the instructions in this blog post, I’ve created an AMI and shared it publicly. So the easiest thing to do is just use that pre-built AMI:

  • Image: ami-2cbf3e44 for US-East or ami-c38babf3 for US-West (Ubuntu Server 14.04 LTS (HVM) – CUDA 6.5)
  • Instance type: g2.2xlarge (if you skip this step, you won’t have an nvidia device)
  • Storage: Use at least 8 GB, 20+ GB recommended

If you use the pre-built AMI, then you can skip down to the Verify CUDA is correctly installed section, since all of the rest of the steps are “baked in” to the AMI.

Note regarding AMI regions: the AMI is currently available only in the US-East and US-West regions. If you need it added to another region, please post a comment below.

Building from scratch

Or if you prefer to build your own instance from scratch, keep reading.

Create a new EC2 instance:

  • Image: ami-9eaa1cf6 (Ubuntu Server 14.04 LTS (HVM), SSD Volume Type)
  • Instance type: g2.2xlarge
  • Storage: Use at least 8 GB, 20+ GB recommended
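If you would rather launch from the command line than the EC2 console, the settings above can be sketched with the AWS CLI. This is only a sketch and not part of the original steps: it assumes the AWS CLI is installed and configured, that `/dev/sda1` is the root device name for this AMI, and `my-key` is a placeholder key pair name. The function just prints the command so you can review it before running anything.

```shell
# Build (but do not run) an `aws ec2 run-instances` command for the settings above.
# Assumptions: "my-key" is a placeholder key pair name; /dev/sda1 is assumed to be
# the AMI's root device; 20 GB follows the storage recommendation.
launch_cmd() {
    ami="${1:-ami-9eaa1cf6}"
    key="${2:-my-key}"
    size_gb="${3:-20}"
    printf '%s\n' "aws ec2 run-instances --image-id $ami --instance-type g2.2xlarge --key-name $key --count 1 --block-device-mappings '[{\"DeviceName\":\"/dev/sda1\",\"Ebs\":{\"VolumeSize\":$size_gb}}]'"
}

# Print the command for review; paste it into a shell to actually launch.
launch_cmd
```

Pass a different AMI ID, key name, or volume size as the first, second, and third arguments.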

Install build-essential:

$ apt-get update && apt-get install build-essential

Get CUDA installer:

$ wget http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda_6.5.14_linux_64.run

Extract CUDA installer:

$ chmod +x cuda_6.5.14_linux_64.run
$ mkdir nvidia_installers
$ ./cuda_6.5.14_linux_64.run -extract=`pwd`/nvidia_installers
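Before moving on, it can help to confirm that the extraction produced the three installers used in the later steps. A small sketch, assuming the file names match the ones that appear further down in this post (`check_installers` is a made-up helper name):

```shell
# Report which of the expected installers are missing from a directory.
# The file names are the ones used later in this post; a different CUDA
# build would produce different names.
check_installers() {
    dir="${1:-nvidia_installers}"
    missing=0
    for f in NVIDIA-Linux-x86_64-340.29.run \
             cuda-linux64-rel-6.5.14-18749181.run \
             cuda-samples-linux-6.5.14-18745345.run; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_installers nvidia_installers` prints nothing and returns 0 when all three files are present.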

Run Nvidia driver installer:

$ cd nvidia_installers
$ ./NVIDIA-Linux-x86_64-340.29.run

At this point it will pop up an 8-bit UI that asks you to accept a license agreement, and then starts installing.

screenshot

At this point, I got an error:

Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or
         improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver
         such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics
         device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

         Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log'
         for more information.
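The error message names two things worth checking: the installer log, and whether a conflicting driver such as nouveau is loaded. A minimal sketch of the second check, assuming the usual `lsmod` layout of a header line followed by one module per line (`module_listed` is a hypothetical helper name):

```shell
# Succeeds if the named module appears in lsmod-style output read from stdin.
# The first line of lsmod output is a header, so it is skipped.
module_listed() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Usage on the instance:
#   lsmod | module_listed nouveau && echo "nouveau is loaded"
#   tail -n 40 /var/log/nvidia-installer.log
```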

After reading this forum post I installed:

$ sudo apt-get install linux-image-extra-virtual

When it prompted me about what to do with the grub changes, I chose “install the package maintainer's version”.

Reboot:

$ reboot

Disable nouveau

At this point you need to disable nouveau, since it conflicts with the nvidia kernel module.

Open a new file

$ vi /etc/modprobe.d/blacklist-nouveau.conf

and add these lines to it

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

and then save the file.
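If you are scripting the setup rather than editing by hand, the same file can be written non-interactively. A sketch, where `write_nouveau_blacklist` is a made-up helper name and the default path is the one used above; it needs root privileges to write under /etc:

```shell
# Write the nouveau blacklist to the given path
# (defaults to the path used in this post).
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    cat > "$target" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}
```

Passing a scratch path first, for example `write_nouveau_blacklist /tmp/blacklist-nouveau.conf`, lets you inspect the result before touching the real file.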

Disable kernel mode setting for nouveau:

$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

Update the initramfs and reboot:

$ update-initramfs -u
$ reboot

One more try — this time it works

Get Kernel source:

$ apt-get install linux-source
$ apt-get install linux-headers-3.13.0-37-generic
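The headers package has to match the kernel that is actually running, and 3.13.0-37-generic was simply the kernel on my instance. A hedged sketch that derives the package name from `uname -r` instead of hard-coding it (`headers_package` is a hypothetical helper; it assumes Ubuntu's `linux-headers-<release>` naming):

```shell
# Print the headers package name for a kernel release.
# With no argument, uses the release of the running kernel.
headers_package() {
    echo "linux-headers-${1:-$(uname -r)}"
}

# Usage on the instance:
#   apt-get install "$(headers_package)"
```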

Rerun Nvidia driver installer:

$ cd nvidia_installers
$ ./NVIDIA-Linux-x86_64-340.29.run

Load nvidia kernel module:

$ modprobe nvidia

Run CUDA + samples installer:

$ ./cuda-linux64-rel-6.5.14-18749181.run
$ ./cuda-samples-linux-6.5.14-18745345.run

Verify CUDA is correctly installed

$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery

You should see the following output:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GRID K520
Result = PASS
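For an automated check, say in a provisioning script, you can test for the final line instead of reading the output by eye. A minimal sketch, assuming the last line has the `Result = PASS` form shown above (`device_query_passed` is a made-up helper name):

```shell
# Succeeds if deviceQuery output read from stdin contains a "Result = PASS" line.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Usage on the instance:
#   ./deviceQuery | device_query_passed && echo "CUDA OK"
```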
