
Installing Hadoop on Ubuntu

by Lohen 2016. 2. 5.


Source: http://www.cyworld.com/ruo91/8096409


ruo91, 2012-07-22 17:18:08
This guide shows how to install Hadoop in pseudo-distributed mode on Ubuntu Linux. The installation is simpler than you might expect.

First, install the required packages:
root@ruo91:~# apt-get -y install ssh rsync java7-jdk

Create a hadoop user account:
root@ruo91:~# adduser hadoop

Log in as the hadoop user:
root@ruo91:~# su -l hadoop

Try connecting to localhost over ssh without entering a password.
(No luck; it still asks for one!)
hadoop@ruo91:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is bb:91:c8:60:3c:09:86:75:4e:db:e7:c1:77:47:1b:a4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoop@localhost's password:

So, generate a public key to allow passwordless logins to localhost:
hadoop@ruo91:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
hadoop@ruo91:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
hadoop@ruo91:~$ chmod 644 ~/.ssh/authorized_keys
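As a quick sanity check of the key-generation step, you can rehearse it with a throwaway key pair in a temporary directory (a sketch only; it does not touch your real ~/.ssh, and `tmpdir` is just an illustrative name):

```shell
# Create a scratch directory so nothing in ~/.ssh is modified
tmpdir=$(mktemp -d)
# Generate a passphrase-less key pair (RSA here; the tutorial uses DSA)
ssh-keygen -t rsa -b 2048 -N '' -f "$tmpdir/id_rsa" -q
# Append the public key to an authorized_keys file, as the tutorial does
cat "$tmpdir/id_rsa.pub" >> "$tmpdir/authorized_keys"
chmod 644 "$tmpdir/authorized_keys"
# The public key line should now be present exactly once
grep -c 'ssh-rsa' "$tmpdir/authorized_keys"
```

If sshd's StrictModes rejects the login, tightening the file to mode 600 and checking that your home directory is not group-writable are the usual fixes.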

Set the Java environment variables. Note the single quotes: $PATH and $JAVA_HOME must be written into ~/.profile literally, so that they are expanded when the profile is sourced rather than at the moment the echo runs (when $JAVA_HOME would still be empty):
hadoop@ruo91:~$ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386' >> ~/.profile
hadoop@ruo91:~$ echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.profile
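The quoting matters here: if the variable references are expanded while writing the profile, $JAVA_HOME is still empty and the PATH entry is broken. A minimal sketch using a temporary file in place of ~/.profile (the `profile` variable is illustrative):

```shell
# Stand-in for ~/.profile so the real one is untouched
profile=$(mktemp)
# A single-quoted heredoc writes the lines literally, with no expansion
cat >> "$profile" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export PATH=$PATH:$JAVA_HOME/bin
EOF
# Source it, as a login shell would, and check the result
. "$profile"
echo "$JAVA_HOME"   # prints /usr/lib/jvm/java-7-openjdk-i386
```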

Download and unpack Hadoop:
hadoop@ruo91:~$ wget http://mirror.yongbok.net/apache/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz
hadoop@ruo91:~$ tar xzvf hadoop-1.0.3.tar.gz
hadoop@ruo91:~$ cd hadoop-1.0.3

To configure the namenode, the replication factor, and the jobtracker, open each of the following files and adjust it to your environment.

- conf/hadoop-env.sh
hadoop@ruo91:~/hadoop-1.0.3$ nano conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386

- conf/core-site.xml
hadoop@ruo91:~$ nano conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
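To double-check what you entered, you can pull the value back out of the file with sed. A throwaway sketch (`core-site-demo.xml` is a stand-in written here for illustration, so the real conf/core-site.xml is not needed):

```shell
# Write a demo config with the same shape as conf/core-site.xml
cat > core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# Extract the text between <value> and </value>
sed -n 's:.*<value>\(.*\)</value>.*:\1:p' core-site-demo.xml
# prints: hdfs://localhost:9000
```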

- conf/hdfs-site.xml
Set dfs.name.dir and dfs.data.dir to paths under the directory where Hadoop is installed.
ex) hadoop location : /home/hadoop/hadoop-1.0.3
hadoop@ruo91:~$ nano conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop-1.0.3/name</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop-1.0.3/data</value>
  </property>
</configuration>

- conf/mapred-site.xml
hadoop@ruo91:~/hadoop-1.0.3$ nano conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Once the configuration is done, format the distributed filesystem:
hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop namenode -format
12/07/22 06:00:21 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ruo91/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
12/07/22 06:00:22 INFO util.GSet: VM type = 32-bit
12/07/22 06:00:22 INFO util.GSet: 2% max memory = 19.33375 MB
12/07/22 06:00:22 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/07/22 06:00:22 INFO util.GSet: recommended=4194304, actual=4194304
12/07/22 06:00:24 INFO namenode.FSNamesystem: fsOwner=hadoop
12/07/22 06:00:24 INFO namenode.FSNamesystem: supergroup=supergroup
12/07/22 06:00:24 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/07/22 06:00:24 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/07/22 06:00:24 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/07/22 06:00:24 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/07/22 06:00:24 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/07/22 06:00:24 INFO common.Storage: Storage directory /home/hadoop/hadoop-1.0.3/name has been successfully formatted.
12/07/22 06:00:24 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ruo91/127.0.1.1
************************************************************/

Start all the Hadoop daemons:
hadoop@ruo91:~/hadoop-1.0.3$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-namenode-ruo91.out
localhost: starting datanode, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-datanode-ruo91.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-secondarynamenode-ruo91.out
starting jobtracker, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-jobtracker-ruo91.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-tasktracker-ruo91.out

The namenode and jobtracker status pages can be viewed in a web browser:

- namenode : http://localhost:50070/

- jobtracker : http://localhost:50030/



To try out the filesystem we just formatted, copy some files into it:
hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop fs -put conf input

Run the bundled example job to test that Map and Reduce work:
hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
12/07/22 06:05:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/22 06:05:22 WARN snappy.LoadSnappy: Snappy native library not loaded
12/07/22 06:05:22 INFO mapred.FileInputFormat: Total input paths to process : 16
12/07/22 06:05:24 INFO mapred.JobClient: Running job: job_201207220601_0001
12/07/22 06:05:25 INFO mapred.JobClient: map 0% reduce 0%
12/07/22 06:06:33 INFO mapred.JobClient: map 6% reduce 0%
12/07/22 06:06:37 INFO mapred.JobClient: map 12% reduce 0%
12/07/22 06:07:53 INFO mapred.JobClient: map 12% reduce 4%
12/07/22 06:08:01 INFO mapred.JobClient: map 18% reduce 4%
12/07/22 06:08:09 INFO mapred.JobClient: map 25% reduce 4%
12/07/22 06:08:16 INFO mapred.JobClient: map 25% reduce 6%
12/07/22 06:08:21 INFO mapred.JobClient: map 25% reduce 8%
12/07/22 06:08:59 INFO mapred.JobClient: map 31% reduce 8%
12/07/22 06:09:03 INFO mapred.JobClient: map 37% reduce 8%
12/07/22 06:09:12 INFO mapred.JobClient: map 37% reduce 12%
12/07/22 06:10:41 INFO mapred.JobClient: map 50% reduce 12%
12/07/22 06:10:51 INFO mapred.JobClient: map 50% reduce 16%
12/07/22 06:12:04 INFO mapred.JobClient: map 62% reduce 16%
12/07/22 06:12:20 INFO mapred.JobClient: map 62% reduce 20%
12/07/22 06:12:54 INFO mapred.JobClient: map 75% reduce 20%
12/07/22 06:13:06 INFO mapred.JobClient: map 75% reduce 25%
12/07/22 06:13:46 INFO mapred.JobClient: map 81% reduce 25%
12/07/22 06:13:50 INFO mapred.JobClient: map 87% reduce 25%
12/07/22 06:14:01 INFO mapred.JobClient: map 87% reduce 27%
12/07/22 06:14:04 INFO mapred.JobClient: map 87% reduce 29%
12/07/22 06:14:36 INFO mapred.JobClient: map 100% reduce 29%
12/07/22 06:14:52 INFO mapred.JobClient: map 100% reduce 100%
12/07/22 06:15:04 INFO mapred.JobClient: Job complete: job_201207220601_0001
12/07/22 06:15:05 INFO mapred.JobClient: Counters: 30
12/07/22 06:15:05 INFO mapred.JobClient: Job Counters
12/07/22 06:15:05 INFO mapred.JobClient: Launched reduce tasks=1
12/07/22 06:15:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1046529
12/07/22 06:15:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/22 06:15:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/22 06:15:05 INFO mapred.JobClient: Launched map tasks=16
12/07/22 06:15:05 INFO mapred.JobClient: Data-local map tasks=16
12/07/22 06:15:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=496516
12/07/22 06:15:05 INFO mapred.JobClient: File Input Format Counters
12/07/22 06:15:05 INFO mapred.JobClient: Bytes Read=27119
12/07/22 06:15:05 INFO mapred.JobClient: File Output Format Counters
12/07/22 06:15:05 INFO mapred.JobClient: Bytes Written=238
12/07/22 06:15:05 INFO mapred.JobClient: FileSystemCounters
12/07/22 06:15:05 INFO mapred.JobClient: FILE_BYTES_READ=128
12/07/22 06:15:05 INFO mapred.JobClient: HDFS_BYTES_READ=28873
12/07/22 06:15:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=368728
12/07/22 06:15:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=238
12/07/22 06:15:05 INFO mapred.JobClient: Map-Reduce Framework
12/07/22 06:15:05 INFO mapred.JobClient: Map output materialized bytes=218
12/07/22 06:15:05 INFO mapred.JobClient: Map input records=770
12/07/22 06:15:05 INFO mapred.JobClient: Reduce shuffle bytes=212
12/07/22 06:15:05 INFO mapred.JobClient: Spilled Records=10
12/07/22 06:15:05 INFO mapred.JobClient: Map output bytes=112
12/07/22 06:15:05 INFO mapred.JobClient: Total committed heap usage (bytes)=3252158464
12/07/22 06:15:05 INFO mapred.JobClient: CPU time spent (ms)=127030
12/07/22 06:15:05 INFO mapred.JobClient: Map input bytes=27119
12/07/22 06:15:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=1754
12/07/22 06:15:05 INFO mapred.JobClient: Combine input records=5
12/07/22 06:15:05 INFO mapred.JobClient: Reduce input records=5
12/07/22 06:15:05 INFO mapred.JobClient: Reduce input groups=5
12/07/22 06:15:05 INFO mapred.JobClient: Combine output records=5
12/07/22 06:15:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=2256572416
12/07/22 06:15:05 INFO mapred.JobClient: Reduce output records=5
12/07/22 06:15:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=6466576384
12/07/22 06:15:05 INFO mapred.JobClient: Map output records=5
12/07/22 06:15:06 INFO mapred.FileInputFormat: Total input paths to process : 1
12/07/22 06:15:08 INFO mapred.JobClient: Running job: job_201207220601_0002
12/07/22 06:15:09 INFO mapred.JobClient: map 0% reduce 0%
12/07/22 06:15:39 INFO mapred.JobClient: map 100% reduce 0%
12/07/22 06:16:00 INFO mapred.JobClient: map 100% reduce 100%
12/07/22 06:16:11 INFO mapred.JobClient: Job complete: job_201207220601_0002
12/07/22 06:16:11 INFO mapred.JobClient: Counters: 30
12/07/22 06:16:11 INFO mapred.JobClient: Job Counters
12/07/22 06:16:11 INFO mapred.JobClient: Launched reduce tasks=1
12/07/22 06:16:11 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32548
12/07/22 06:16:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/22 06:16:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/22 06:16:12 INFO mapred.JobClient: Launched map tasks=1
12/07/22 06:16:12 INFO mapred.JobClient: Data-local map tasks=1
12/07/22 06:16:12 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19552
12/07/22 06:16:12 INFO mapred.JobClient: File Input Format Counters
12/07/22 06:16:12 INFO mapred.JobClient: Bytes Read=238
12/07/22 06:16:12 INFO mapred.JobClient: File Output Format Counters
12/07/22 06:16:12 INFO mapred.JobClient: Bytes Written=82
12/07/22 06:16:12 INFO mapred.JobClient: FileSystemCounters
12/07/22 06:16:12 INFO mapred.JobClient: FILE_BYTES_READ=128
12/07/22 06:16:12 INFO mapred.JobClient: HDFS_BYTES_READ=355
12/07/22 06:16:12 INFO mapred.JobClient: FILE_BYTES_WRITTEN=42813
12/07/22 06:16:12 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=82
12/07/22 06:16:12 INFO mapred.JobClient: Map-Reduce Framework
12/07/22 06:16:12 INFO mapred.JobClient: Map output materialized bytes=128
12/07/22 06:16:12 INFO mapred.JobClient: Map input records=5
12/07/22 06:16:12 INFO mapred.JobClient: Reduce shuffle bytes=0
12/07/22 06:16:12 INFO mapred.JobClient: Spilled Records=10
12/07/22 06:16:12 INFO mapred.JobClient: Map output bytes=112
12/07/22 06:16:12 INFO mapred.JobClient: Total committed heap usage (bytes)=210632704
12/07/22 06:16:12 INFO mapred.JobClient: CPU time spent (ms)=4680
12/07/22 06:16:12 INFO mapred.JobClient: Map input bytes=152
12/07/22 06:16:12 INFO mapred.JobClient: SPLIT_RAW_BYTES=117
12/07/22 06:16:12 INFO mapred.JobClient: Combine input records=0
12/07/22 06:16:12 INFO mapred.JobClient: Reduce input records=5
12/07/22 06:16:12 INFO mapred.JobClient: Reduce input groups=1
12/07/22 06:16:12 INFO mapred.JobClient: Combine output records=0
12/07/22 06:16:12 INFO mapred.JobClient: Physical memory (bytes) snapshot=182882304
12/07/22 06:16:12 INFO mapred.JobClient: Reduce output records=5
12/07/22 06:16:12 INFO mapred.JobClient: Virtual memory (bytes) snapshot=763695104
12/07/22 06:16:12 INFO mapred.JobClient: Map output records=5
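Conceptually, the grep example's map phase extracts every match of the regex from the input files and its reduce phase counts occurrences per distinct match. The same result on a local filesystem is a one-line pipeline (a sketch with made-up demo files in `conf-demo/`, not part of the Hadoop run):

```shell
# Fabricate two small config-like files to stand in for the conf directory
mkdir -p conf-demo
printf '<name>dfs.replication</name>\n<name>dfs.data.dir</name>\n' > conf-demo/a.xml
printf '<name>dfs.name.dir</name>\n<value>1</value> dfsadmin dfs.replication\n' > conf-demo/b.xml
# "map": emit each regex match on its own line; "reduce": count per distinct match
grep -Eoh 'dfs[a-z.]+' conf-demo/*.xml | sort | uniq -c | sort -rn
# dfs.replication appears twice, so it is listed first with count 2
```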

You can copy the stored results out of HDFS and view them,
hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop fs -get output output
hadoop@ruo91:~/hadoop-1.0.3$ cat output/*
cat: output/_logs: Is a directory
1 dfs.data.dir
1 dfs.name.dir
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin

or check them directly on HDFS with cat:
hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop fs -cat output/*
cat: File does not exist: /user/hadoop/output/_logs
1 dfs.data.dir
1 dfs.name.dir
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin

