Source: http://www.cyworld.com/ruo91/8096409
This is how to install Hadoop in pseudo-distributed mode on Ubuntu Linux. The setup is simpler than you might expect.
First, install the required packages.

root@ruo91:~# apt-get -y install ssh rsync java7-jdk
Create a hadoop account.

root@ruo91:~# adduser hadoop
Log in as the hadoop user.

root@ruo91:~# su -l hadoop
Try connecting to localhost over ssh without entering a password.
(Oops... it still asks for a password!)

hadoop@ruo91:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is bb:91:c8:60:3c:09:86:75:4e:db:e7:c1:77:47:1b:a4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoop@localhost's password:
So generate a key pair so you can log in to localhost without a password.

hadoop@ruo91:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
hadoop@ruo91:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
hadoop@ruo91:~$ chmod 644 ~/.ssh/authorized_keys
Set the Java environment variables. Use single quotes so that $PATH and $JAVA_HOME are expanded at login time rather than at the moment the line is written into the file.

hadoop@ruo91:~$ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386' >> ~/.profile
hadoop@ruo91:~$ echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.profile
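If you want to check the idea without touching your real ~/.profile, you can replay the same two lines against a throwaway file and confirm the variables resolve. This is just a sandbox sketch: the temp file stands in for ~/.profile, and the JDK path is the one this guide assumes.

```shell
# Sandbox check: a mktemp file stands in for ~/.profile (hypothetical setup).
profile=$(mktemp)
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386' >> "$profile"
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> "$profile"
. "$profile"                      # what a new login shell would do
echo "JAVA_HOME=$JAVA_HOME"       # prints the JDK path set above
```

A new login shell sources ~/.profile for you; to apply the change to the current shell, run `. ~/.profile` once.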
Download Hadoop.

hadoop@ruo91:~$ wget http://mirror.yongbok.net/apache/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz
hadoop@ruo91:~$ tar xzvf hadoop-1.0.3.tar.gz
hadoop@ruo91:~$ cd hadoop-1.0.3
To configure the namenode, replication, and jobtracker, open the files below and adjust them to your environment.

- conf/hadoop-env.sh

hadoop@ruo91:~/hadoop-1.0.3$ nano conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
- conf/core-site.xml

hadoop@ruo91:~/hadoop-1.0.3$ nano conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- conf/hdfs-site.xml

Point dfs.name.dir and dfs.data.dir at paths under your Hadoop installation directory.
ex) hadoop location : /home/hadoop/hadoop-1.0.3

hadoop@ruo91:~/hadoop-1.0.3$ nano conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop-1.0.3/name</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop-1.0.3/data</value>
  </property>
</configuration>
- conf/mapred-site.xml

hadoop@ruo91:~/hadoop-1.0.3$ nano conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
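A malformed XML file is a common reason the daemons fail to start, so it can be worth a quick well-formedness check before moving on. A minimal sketch, assuming python3 is installed and using /tmp/mapred-site.xml as a stand-in path:

```shell
# Write the same mapred-site.xml fragment to a scratch path (hypothetical)
cat > /tmp/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
# Parse it with python3's stdlib XML parser; a parse error means bad XML.
python3 -c "import xml.etree.ElementTree as ET; t = ET.parse('/tmp/mapred-site.xml'); print(t.find('property/name').text)"
```

Run the same one-liner against each of the three conf/*.xml files you edited.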
Once configuration is done, format the distributed filesystem (HDFS).

hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop namenode -format
12/07/22 06:00:21 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ruo91/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
12/07/22 06:00:22 INFO util.GSet: VM type       = 32-bit
12/07/22 06:00:22 INFO util.GSet: 2% max memory = 19.33375 MB
12/07/22 06:00:22 INFO util.GSet: capacity      = 2^22 = 4194304 entries
12/07/22 06:00:22 INFO util.GSet: recommended=4194304, actual=4194304
12/07/22 06:00:24 INFO namenode.FSNamesystem: fsOwner=hadoop
12/07/22 06:00:24 INFO namenode.FSNamesystem: supergroup=supergroup
12/07/22 06:00:24 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/07/22 06:00:24 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/07/22 06:00:24 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/07/22 06:00:24 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/07/22 06:00:24 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/07/22 06:00:24 INFO common.Storage: Storage directory /home/hadoop/hadoop-1.0.3/name has been successfully formatted.
12/07/22 06:00:24 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ruo91/127.0.1.1
************************************************************/
Start all the daemons. (Once they are up, `jps` will list the five Hadoop processes.)

hadoop@ruo91:~/hadoop-1.0.3$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-namenode-ruo91.out
localhost: starting datanode, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-datanode-ruo91.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-secondarynamenode-ruo91.out
starting jobtracker, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-jobtracker-ruo91.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-tasktracker-ruo91.out
The namenode and jobtracker can be monitored from their web pages.
- namenode : http://localhost:50070/
- jobtracker : http://localhost:50030/
To exercise the freshly formatted HDFS, copy some files into it.

hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop fs -put conf input
Test that Map and Reduce work.

hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
12/07/22 06:05:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/22 06:05:22 WARN snappy.LoadSnappy: Snappy native library not loaded
12/07/22 06:05:22 INFO mapred.FileInputFormat: Total input paths to process : 16
12/07/22 06:05:24 INFO mapred.JobClient: Running job: job_201207220601_0001
12/07/22 06:05:25 INFO mapred.JobClient:  map 0% reduce 0%
12/07/22 06:06:33 INFO mapred.JobClient:  map 6% reduce 0%
12/07/22 06:06:37 INFO mapred.JobClient:  map 12% reduce 0%
12/07/22 06:07:53 INFO mapred.JobClient:  map 12% reduce 4%
12/07/22 06:08:01 INFO mapred.JobClient:  map 18% reduce 4%
12/07/22 06:08:09 INFO mapred.JobClient:  map 25% reduce 4%
12/07/22 06:08:16 INFO mapred.JobClient:  map 25% reduce 6%
12/07/22 06:08:21 INFO mapred.JobClient:  map 25% reduce 8%
12/07/22 06:08:59 INFO mapred.JobClient:  map 31% reduce 8%
12/07/22 06:09:03 INFO mapred.JobClient:  map 37% reduce 8%
12/07/22 06:09:12 INFO mapred.JobClient:  map 37% reduce 12%
12/07/22 06:10:41 INFO mapred.JobClient:  map 50% reduce 12%
12/07/22 06:10:51 INFO mapred.JobClient:  map 50% reduce 16%
12/07/22 06:12:04 INFO mapred.JobClient:  map 62% reduce 16%
12/07/22 06:12:20 INFO mapred.JobClient:  map 62% reduce 20%
12/07/22 06:12:54 INFO mapred.JobClient:  map 75% reduce 20%
12/07/22 06:13:06 INFO mapred.JobClient:  map 75% reduce 25%
12/07/22 06:13:46 INFO mapred.JobClient:  map 81% reduce 25%
12/07/22 06:13:50 INFO mapred.JobClient:  map 87% reduce 25%
12/07/22 06:14:01 INFO mapred.JobClient:  map 87% reduce 27%
12/07/22 06:14:04 INFO mapred.JobClient:  map 87% reduce 29%
12/07/22 06:14:36 INFO mapred.JobClient:  map 100% reduce 29%
12/07/22 06:14:52 INFO mapred.JobClient:  map 100% reduce 100%
12/07/22 06:15:04 INFO mapred.JobClient: Job complete: job_201207220601_0001
12/07/22 06:15:05 INFO mapred.JobClient: Counters: 30
12/07/22 06:15:05 INFO mapred.JobClient:   Job Counters
12/07/22 06:15:05 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/22 06:15:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1046529
12/07/22 06:15:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/22 06:15:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/22 06:15:05 INFO mapred.JobClient:     Launched map tasks=16
12/07/22 06:15:05 INFO mapred.JobClient:     Data-local map tasks=16
12/07/22 06:15:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=496516
12/07/22 06:15:05 INFO mapred.JobClient:   File Input Format Counters
12/07/22 06:15:05 INFO mapred.JobClient:     Bytes Read=27119
12/07/22 06:15:05 INFO mapred.JobClient:   File Output Format Counters
12/07/22 06:15:05 INFO mapred.JobClient:     Bytes Written=238
12/07/22 06:15:05 INFO mapred.JobClient:   FileSystemCounters
12/07/22 06:15:05 INFO mapred.JobClient:     FILE_BYTES_READ=128
12/07/22 06:15:05 INFO mapred.JobClient:     HDFS_BYTES_READ=28873
12/07/22 06:15:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=368728
12/07/22 06:15:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=238
12/07/22 06:15:05 INFO mapred.JobClient:   Map-Reduce Framework
12/07/22 06:15:05 INFO mapred.JobClient:     Map output materialized bytes=218
12/07/22 06:15:05 INFO mapred.JobClient:     Map input records=770
12/07/22 06:15:05 INFO mapred.JobClient:     Reduce shuffle bytes=212
12/07/22 06:15:05 INFO mapred.JobClient:     Spilled Records=10
12/07/22 06:15:05 INFO mapred.JobClient:     Map output bytes=112
12/07/22 06:15:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=3252158464
12/07/22 06:15:05 INFO mapred.JobClient:     CPU time spent (ms)=127030
12/07/22 06:15:05 INFO mapred.JobClient:     Map input bytes=27119
12/07/22 06:15:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1754
12/07/22 06:15:05 INFO mapred.JobClient:     Combine input records=5
12/07/22 06:15:05 INFO mapred.JobClient:     Reduce input records=5
12/07/22 06:15:05 INFO mapred.JobClient:     Reduce input groups=5
12/07/22 06:15:05 INFO mapred.JobClient:     Combine output records=5
12/07/22 06:15:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2256572416
12/07/22 06:15:05 INFO mapred.JobClient:     Reduce output records=5
12/07/22 06:15:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=6466576384
12/07/22 06:15:05 INFO mapred.JobClient:     Map output records=5
12/07/22 06:15:06 INFO mapred.FileInputFormat: Total input paths to process : 1
12/07/22 06:15:08 INFO mapred.JobClient: Running job: job_201207220601_0002
12/07/22 06:15:09 INFO mapred.JobClient:  map 0% reduce 0%
12/07/22 06:15:39 INFO mapred.JobClient:  map 100% reduce 0%
12/07/22 06:16:00 INFO mapred.JobClient:  map 100% reduce 100%
12/07/22 06:16:11 INFO mapred.JobClient: Job complete: job_201207220601_0002
12/07/22 06:16:11 INFO mapred.JobClient: Counters: 30
12/07/22 06:16:11 INFO mapred.JobClient:   Job Counters
12/07/22 06:16:11 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/22 06:16:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=32548
12/07/22 06:16:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/22 06:16:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/22 06:16:12 INFO mapred.JobClient:     Launched map tasks=1
12/07/22 06:16:12 INFO mapred.JobClient:     Data-local map tasks=1
12/07/22 06:16:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=19552
12/07/22 06:16:12 INFO mapred.JobClient:   File Input Format Counters
12/07/22 06:16:12 INFO mapred.JobClient:     Bytes Read=238
12/07/22 06:16:12 INFO mapred.JobClient:   File Output Format Counters
12/07/22 06:16:12 INFO mapred.JobClient:     Bytes Written=82
12/07/22 06:16:12 INFO mapred.JobClient:   FileSystemCounters
12/07/22 06:16:12 INFO mapred.JobClient:     FILE_BYTES_READ=128
12/07/22 06:16:12 INFO mapred.JobClient:     HDFS_BYTES_READ=355
12/07/22 06:16:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42813
12/07/22 06:16:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=82
12/07/22 06:16:12 INFO mapred.JobClient:   Map-Reduce Framework
12/07/22 06:16:12 INFO mapred.JobClient:     Map output materialized bytes=128
12/07/22 06:16:12 INFO mapred.JobClient:     Map input records=5
12/07/22 06:16:12 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/07/22 06:16:12 INFO mapred.JobClient:     Spilled Records=10
12/07/22 06:16:12 INFO mapred.JobClient:     Map output bytes=112
12/07/22 06:16:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=210632704
12/07/22 06:16:12 INFO mapred.JobClient:     CPU time spent (ms)=4680
12/07/22 06:16:12 INFO mapred.JobClient:     Map input bytes=152
12/07/22 06:16:12 INFO mapred.JobClient:     SPLIT_RAW_BYTES=117
12/07/22 06:16:12 INFO mapred.JobClient:     Combine input records=0
12/07/22 06:16:12 INFO mapred.JobClient:     Reduce input records=5
12/07/22 06:16:12 INFO mapred.JobClient:     Reduce input groups=1
12/07/22 06:16:12 INFO mapred.JobClient:     Combine output records=0
12/07/22 06:16:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=182882304
12/07/22 06:16:12 INFO mapred.JobClient:     Reduce output records=5
12/07/22 06:16:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=763695104
12/07/22 06:16:12 INFO mapred.JobClient:     Map output records=5
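What the example job computes is easy to describe: it greps every file under input for strings matching the regex 'dfs[a-z.]+' and counts each distinct match. You can preview the same pattern locally with plain grep (the sample file and its contents below are made up for illustration):

```shell
# Local, non-distributed preview of what the grep example job computes.
printf 'dfs.replication\ndfs.name.dir\nno match here\n' > /tmp/sample.txt
# -o prints each match on its own line; sort | uniq -c tallies distinct matches,
# giving the same "count<TAB>match" shape as the job's output further below.
grep -oE 'dfs[a-z.]+' /tmp/sample.txt | sort | uniq -c
```

The MapReduce job does this same match-and-count, just with the matching fanned out over map tasks and the counting done in the reduce.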
You can pull the stored results out of HDFS and inspect them,

hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop fs -get output output
hadoop@ruo91:~/hadoop-1.0.3$ cat output/*
cat: output/_logs: Is a directory
1 dfs.data.dir
1 dfs.name.dir
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin
or view them directly with cat.

hadoop@ruo91:~/hadoop-1.0.3$ bin/hadoop fs -cat output/*
cat: File does not exist: /user/hadoop/output/_logs
1 dfs.data.dir
1 dfs.name.dir
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin
