Launch a Linux (Ubuntu 14.04) GPU-Accelerated Machine in AWS

TL;DR

Training a deep learning network requires a GPU-enabled machine. Fortunately, AWS provides GPU-accelerated instances.

https://aws.amazon.com/blogs/aws/new-g2-instance-type-with-4x-more-gpu-power/

Installation scripts:
Install NVIDIA Drivers, cuDNN, Python, TensorFlow on Ubuntu 16.04

Provision Machine

  • AMI

    Ubuntu Server 14.04 LTS (HVM), SSD Volume Type

  • Select Instance Type

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html

  • Deploy it
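The three console steps above can also be done from the command line. A minimal sketch with the AWS CLI, assuming it is installed and configured (`aws configure`): the AMI ID and key-pair name are placeholders you must look up yourself, `launch_gpu_instance` is a hypothetical helper name, and the g2.2xlarge default is an assumption (g2.8xlarge is the four-GPU size described in the linked AWS post).

```shell
# Hypothetical helper: launch one GPU instance with the AWS CLI.
# $1 = AMI ID of an Ubuntu Server 14.04 LTS (HVM) image in your region (placeholder)
# $2 = name of an existing EC2 key pair (placeholder)
# $3 = instance type; defaults to g2.2xlarge (assumption)
launch_gpu_instance() {
    local ami_id=$1
    local key_name=$2
    local instance_type=${3:-g2.2xlarge}
    aws ec2 run-instances \
        --image-id "$ami_id" \
        --instance-type "$instance_type" \
        --key-name "$key_name" \
        --count 1
}
```

For example, `launch_gpu_instance ami-xxxxxxxx my-key g2.8xlarge`. Depending on your VPC setup you may also need `--security-group-ids` or `--subnet-id` so that SSH is reachable.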

About CUDA Cores (2560)

Nvidia GPU Product Matrix

Install TensorFlow with pip

manual

Use Python 3

# ubuntu @ dagama in ~ [2:54:27] C:1
$ cd /usr/local/bin
# ubuntu @ dagama in /usr/local/bin [2:54:46]
$ ls -l|grep pip
-rwxr-xr-x 1 root root 204 Nov  7 11:08 pip
-rwxr-xr-x 1 root root 204 Nov  7 11:08 pip2
-rwxr-xr-x 1 root root 204 Nov  7 11:08 pip2.7
$ sudo mv pip2 ~/bakup1
$ sudo mv pip2.7 ~/bakup1
# ubuntu @ dagama in /usr/local/bin [2:57:46]
$ ls -l|grep pip
-rwxr-xr-x 1 root root 204 Nov  7 11:08 pip
### Try installing a module with pip to check whether pip works ###
$ pip install wheel
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 7, in <module>
    from pip import main
ImportError: No module named 'pip'
### Probably need pip for python3: install it, then upgrade pip ###
$ sudo apt-get install python3-pip
$ sudo pip install --upgrade pip
$ pip --version
pip 9.0.1 from /usr/local/lib/python3.4/dist-packages (python 3.4)
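The last line of that session is what confirms the fix: `pip --version` reports which interpreter pip belongs to. A small sketch that extracts that version so a setup script can check it; `pip_python_version` is a hypothetical helper, and it assumes the `(python X.Y)` suffix format shown above.

```shell
# Hypothetical helper: read `pip --version` output on stdin and print the
# Python version inside the trailing "(python X.Y)", e.g. "3.4".
pip_python_version() {
    sed -n 's/.*(python \([0-9][0-9.]*\)).*/\1/p'
}
```

For example, `pip --version | pip_python_version` should print 3.4 on this machine. Running `python3 -m pip --version` instead sidesteps the question of which `pip` script is first on PATH.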

Install required packages

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
# Note: the python-* packages above are the Python 2 builds; the Python 3 builds are named python3-* (e.g. python3-numpy)
# Installing scikit-learn directly with "pip install -U scikit-learn" fails with "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 52: ordinal not in range(128)"; upgrading setuptools first fixes it:
sudo pip install --upgrade setuptools
sudo pip install -U scikit-learn  # installs successfully
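Because apt and pip can install into different interpreters, it is worth checking that the packages actually import under the Python you plan to use. A minimal sketch; `missing_modules` is a hypothetical helper, and note that scikit-learn is imported as `sklearn`.

```shell
# Hypothetical helper: try to import each module with the given interpreter.
# Prints the name of every module that fails to import, one per line,
# and returns non-zero if any of them is missing.
missing_modules() {
    local py=$1
    shift
    local status=0 mod
    for mod in "$@"; do
        if ! "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "$mod"
            status=1
        fi
    done
    return $status
}
```

For example, `missing_modules python3 numpy scipy matplotlib pandas sympy nose sklearn` prints nothing when everything is in place for Python 3.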

Install TensorFlow 0.9.0 (Python 3.4)

# Ubuntu/Linux 64-bit, GPU enabled, Python 3.4
# Requires CUDA toolkit 7.5 and CuDNN v4. For other versions, see "Install from sources" below.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp34-cp34m-linux_x86_64.whl
# Python3
$ sudo pip3 install --upgrade $TF_BINARY_URL
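The wheel URL encodes both the TensorFlow release and the CPython version. A sketch that builds it from those two pieces, assuming other releases kept the same naming pattern as the 0.9.0 URL above; `tf_gpu_wheel_url` is a hypothetical helper, and the `cpXYm` ABI tag only applies to the Python 3 wheels (Python 2.7 wheels used a different tag).

```shell
# Hypothetical helper: build a TensorFlow Linux GPU wheel URL.
# $1 = TensorFlow release, e.g. 0.9.0
# $2 = CPython version without the dot, e.g. 34 for Python 3.4
tf_gpu_wheel_url() {
    local version=$1
    local pyver=$2
    echo "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${version}-cp${pyver}-cp${pyver}m-linux_x86_64.whl"
}
```

Usage: `export TF_BINARY_URL=$(tf_gpu_wheel_url 0.9.0 34)` followed by `sudo pip3 install --upgrade "$TF_BINARY_URL"`. Quoting the variable also makes a mistyped (empty) name fail loudly instead of silently passing nothing to pip.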

But there is no 'configure' script at the root of the tree (the pip package does not include the TensorFlow source tree), so I cloned the TensorFlow repository as follows:

Clone the TensorFlow repository
$ git clone https://github.com/tensorflow/tensorflow
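Building from source means running `./configure` from the root of the cloned tree. A sketch that clones a release branch and starts the configuration; `clone_tf_source` is a hypothetical helper, and the default branch name `r0.9` (chosen to match the 0.9.0 wheel) is an assumption, so check `git branch -r` for the real names.

```shell
# Hypothetical helper: clone one TensorFlow branch and run ./configure
# from the root of the source tree. Default branch r0.9 is an assumption.
clone_tf_source() {
    local branch=${1:-r0.9}
    git clone -b "$branch" https://github.com/tensorflow/tensorflow || return 1
    cd tensorflow || return 1
    # interactive: prompts for the Python location and, for GPU builds,
    # the CUDA and cuDNN settings
    ./configure
}
```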

Install Drivers

https://aws.amazon.com/blogs/aws/new-g2-instance-type-with-4x-more-gpu-power/

Install utilities

sudo apt-get install wget zsh git curl ack-grep -y

Installing NVIDIA Driver

manual

CUDA Driver

manual

sudo dpkg -i cuda-repo-ubuntu1404_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
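The `dpkg -i` step expects the repository package to be in the current directory already. A sketch that downloads it first; the download URL is an assumption based on NVIDIA's repository layout (confirm it on the CUDA downloads page), and `install_cuda_from_repo` is a hypothetical helper.

```shell
# Repository package from the post; the URL is an assumption, verify it first.
CUDA_REPO_DEB=cuda-repo-ubuntu1404_8.0.44-1_amd64.deb
CUDA_REPO_URL=http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_DEB

# Hypothetical helper: fetch the repo .deb, register it, then install CUDA.
install_cuda_from_repo() {
    wget -N "$CUDA_REPO_URL" || return 1      # -N: skip if the local copy is up to date
    sudo dpkg -i "$CUDA_REPO_DEB" || return 1
    sudo apt-get update || return 1
    sudo apt-get install -y cuda
}
```

After rebooting, `nvidia-smi` should list the GPU, and once /usr/local/cuda/bin is on PATH, `nvcc --version` should report release 8.0.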

Set up CUDA_HOME and PATH

Add the following lines to /etc/profile:

export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64
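Appending these lines by hand is easy to repeat by accident. A sketch that adds each line only if it is not already present; `add_cuda_env` is a hypothetical helper, and its LD_LIBRARY_PATH line is a small variation on the one above that avoids a stray leading colon when LD_LIBRARY_PATH was previously unset.

```shell
# Hypothetical helper: append the CUDA environment lines to a profile file
# only if each exact line is not already there (safe to run repeatedly).
add_cuda_env() {
    local profile=${1:-/etc/profile}
    local line
    for line in \
        'export CUDA_HOME=/usr/local/cuda' \
        'export PATH=$PATH:$CUDA_HOME/bin' \
        'export LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}'
    do
        grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
    done
}
```

Writing /etc/profile needs root, so run it from a root shell (for example after `sudo -i`), then `source /etc/profile` and check that `which nvcc` finds /usr/local/cuda/bin/nvcc.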

cuDNN

Install cuDNN v5.

Uncompress and copy the cuDNN files into the toolkit directory. Assuming the toolkit is installed in /usr/local/cuda, run the following commands (edited to reflect the cuDNN version you downloaded):

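tar xvzf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
# plain cp dereferences the symlinks in the tarball, so recreate them
cd /usr/local/cuda/lib64/
sudo rm -f libcudnn.so libcudnn.so.5
# link to the versioned file that actually exists here (check with: ls libcudnn.so.5.*);
# 5.0.5 is the cuDNN v5.0 file, a v5.1 tarball ships a libcudnn.so.5.1.* file instead
sudo ln -s libcudnn.so.5.0.5 libcudnn.so.5
sudo ln -s libcudnn.so.5 libcudnn.so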
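Hard-coding the full library version is what makes these symlinks fragile when a different cuDNN release is unpacked. A sketch that links to whichever versioned `libcudnn.so.5.*` file is really present; `relink_cudnn` is a hypothetical helper.

```shell
# Hypothetical helper: recreate the cuDNN symlinks from the versioned
# library file that is actually present, rather than a hard-coded version.
relink_cudnn() {
    local libdir=${1:-/usr/local/cuda/lib64}
    local real="" f
    # find a regular (non-symlink) libcudnn.so.5.x.y file; if several match,
    # the last one in glob (alphabetical) order is kept
    for f in "$libdir"/libcudnn.so.5.*; do
        if [ -f "$f" ] && [ ! -L "$f" ]; then
            real=${f##*/}
        fi
    done
    if [ -z "$real" ]; then
        echo "no libcudnn.so.5.* file found in $libdir" >&2
        return 1
    fi
    # replace the two symlink names inside a subshell so the caller's cwd is untouched
    (
        cd "$libdir" || exit 1
        rm -f libcudnn.so libcudnn.so.5
        ln -s "$real" libcudnn.so.5
        ln -s libcudnn.so.5 libcudnn.so
    )
}
```

Run it from a root shell, then `sudo ldconfig` refreshes the dynamic linker cache so the new links are picked up.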

Install bazel

manual

For Ubuntu Trusty (14.04 LTS) users, since OpenJDK 8 is not available on Trusty, please install Oracle JDK 8:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Note: You might need to sudo apt-get install software-properties-common if you don't have the add-apt-repository command. See here.
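That note can be turned into a small guard that installs the package only when the command is actually missing. A sketch; `ensure_add_apt_repository` is a hypothetical helper name.

```shell
# Hypothetical helper: install software-properties-common (which provides
# add-apt-repository) only if the command is not already available.
ensure_add_apt_repository() {
    if ! command -v add-apt-repository >/dev/null 2>&1; then
        sudo apt-get install -y software-properties-common
    fi
}
```

Call it before `sudo add-apt-repository ppa:webupd8team/java`.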

$ sudo apt-get update && sudo apt-get install bazel
# Once installed, you can upgrade to a newer version of Bazel with:
$ sudo apt-get upgrade bazel
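`apt-get install bazel` only works once Bazel's apt repository and signing key are registered. A sketch of that step; the repository line and key URL follow Bazel's Ubuntu instructions of that period and are assumptions to verify against the current Bazel install docs, and `add_bazel_repo` is a hypothetical helper.

```shell
# Repository line and key URL are assumptions based on Bazel's apt docs.
BAZEL_APT_LINE='deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8'
BAZEL_KEY_URL=https://bazel.build/bazel-release.pub.gpg

# Hypothetical helper: write the apt source list entry and import the key.
add_bazel_repo() {
    local list=${1:-/etc/apt/sources.list.d/bazel.list}
    echo "$BAZEL_APT_LINE" | sudo tee "$list" >/dev/null || return 1
    curl -fsSL "$BAZEL_KEY_URL" | sudo apt-key add - || return 1
}
```

After `add_bazel_repo`, the `sudo apt-get update && sudo apt-get install bazel` line can find the package.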

Launch TensorFlow
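A quick way to check the whole stack is to run a tiny graph with device-placement logging enabled, so the log shows whether the ops land on /gpu:0. A minimal sketch using the TensorFlow 0.x `tf.Session` / `tf.ConfigProto` API; `tf_smoke_test` is a hypothetical helper, and `PYTHON` defaults to python3.

```shell
# Hypothetical helper: run a small TensorFlow graph and log op placement.
# Set PYTHON to choose the interpreter (defaults to python3).
tf_smoke_test() {
    "${PYTHON:-python3}" - <<'EOF'
import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0], name="a")
b = tf.constant([4.0, 5.0, 6.0], name="b")
c = a * b
# log_device_placement makes the session log the device chosen for each op
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
EOF
}
```

If the placement log lists only cpu:0 devices, TensorFlow did not find the CUDA/cuDNN libraries, which usually points back to LD_LIBRARY_PATH or the cuDNN symlinks.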
