HadoopDB Cluster Configuration

In the steps below, a prompt of hadoop@Cluster0X:~ means the command must be run on every node (Cluster01 through Cluster02).


Reference 1: HadoopDB Quick Start Guide

Reference 2: HadoopDB Installation and Usage

1  First, install Hadoop-0.20.2 on every node.

2  Install and configure PostgreSQL on every node:

Install PostgreSQL and create a hadoop account for the database. The password is assumed to be 1234.

hadoop@Cluster0X:~$ sudo apt-get install postgresql
hadoop@Cluster0X:~$ sudo vim /etc/postgresql/8.4/main/pg_hba.conf

#local   all         all                               ident
local   all         all                               trust
# IPv4 local connections:
#host    all         all         127.0.0.1/32          md5
host    all         all         127.0.0.1/32          password
host    all         all         192.168.0.1/24          password            # add the IP range of the cluster machines
# IPv6 local connections:
#host    all         all         ::1/128               md5
host    all         all         ::1/128               password

hadoop@Cluster0X:~$ sudo /etc/init.d/postgresql-8.4 restart
hadoop@Cluster0X:~$ sudo su - postgres
postgres@Cluster0X:~$ createuser hadoop
Shall the new role be a superuser? (y/n) y
postgres@Cluster0X:~$ psql
psql (8.4.2)
Type "help" for help.

postgres=# alter user hadoop with password '1234';
ALTER ROLE
postgres=# \q

Test whether the other machines can connect:

hadoop@Cluster01:~$ createdb testdb
hadoop@Cluster02:~$ psql -h Cluster01 testdb

If the connection succeeds, the following prompt appears:

Password:
psql (8.4.2)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

testdb=#
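
With more than two nodes, repeating the connection test by hand becomes tedious. The following is a minimal sketch that only prints one psql command per host so the list can be reviewed first. The function name psql_check_commands is hypothetical, and the sketch assumes the hadoop role from the steps above and that the named database exists on each host being tested.

```shell
#!/bin/sh
# Sketch: print a psql connection test for each host (nothing is executed).
# Assumptions: role "hadoop" exists on every host, and the database
# named in the first argument has been created there.

# psql_check_commands DB HOST...
psql_check_commands() {
    db=$1
    shift
    for host in "$@"; do
        # -h selects the remote host, -U the role, -c runs one statement and exits
        printf 'psql -h %s -U hadoop -c "SELECT 1" %s\n' "$host" "$db"
    done
}

# Print the check for Cluster01, where testdb was created above.
psql_check_commands testdb Cluster01
```

Piping the output to sh runs the checks. psql still prompts for the password on each connection, because pg_hba.conf above uses the password method.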

3  Configure HadoopDB

    First, download hadoopdb and extract it; the archive contains hadoopdb.jar.

    Then download postgresql-8.4-701.jdbc4.jar.

hadoop@Cluster0X:~$ cp hadoopdb.jar $HADOOP_HOME/lib/
hadoop@Cluster0X:~$ cp postgresql-8.4-701.jdbc4.jar $HADOOP_HOME/lib/
hadoop@Cluster0X:~$ vim $HADOOP_HOME/conf/core-site.xml

Add the following to core-site.xml:
<property>
<name>hadoopdb.config.file</name>
<value>HadoopDB.xml</value>
<description>The name of the HadoopDB cluster configuration file</description>
</property>

<property>
<name>hadoopdb.fetch.size</name>
<value>1000</value>
<description>The number of records fetched from JDBC ResultSet at once</description>
</property>

<property>
<name>hadoopdb.config.replication</name>
<value>false</value>
<description>Tells HadoopDB Catalog whether replication is enabled.
Replica locations need to be specified in the catalog.
False causes replica information to be ignored.</description>
</property>

hadoop@Cluster01:~$ vim nodes.txt

Write the IP address of every node in the cluster into this file, one per line:

192.168.0.1
192.168.0.2

hadoop@Cluster01:~$ vim Catalog.properties

#Properties for Catalog Generation
##################################
nodes_file=nodes.txt
# Relations Name and Table Name are the same
relations_unchunked=raw
relations_chunked=poi
catalog_file=HadoopDB.xml
##
#DB Connection Parameters
##
port=5432
username=hadoop
password=1234
driver=org.postgresql.Driver
url_prefix=jdbc\:postgresql\://
##
#Chunking properties
##
# the number of databases on a node
chunks_per_node=2
# for udb0 ,udb1 ( 2 nodes = 0 ~ 1 )
unchunked_db_prefix=udb
# for cdb0, cdb1, ..., cdb3 (2 nodes x 2 chunks = 0~3)
chunked_db_prefix=cdb
##
#Replication Properties
##
dump_script_prefix=/root/dump_
replication_script_prefix=/root/load_replica_
dump_file_u_prefix=/mnt/dump_udb
dump_file_c_prefix=/mnt/dump_cdb
##
#Cluster Connection
##
ssh_key=id_rsa-gsg-keypair
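
The chunking properties determine which databases each node is expected to host. The sketch below reproduces that naming for the two-node, chunks_per_node=2 case. The numbering rule (node i gets udb<i> and cdb<i*chunks> onward) is inferred from the sample catalog in this article, not taken from the SimpleCatalogGenerator source, and db_layout is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the database names per node for a given chunks_per_node.
# Assumption: one unchunked database per node and consecutively numbered
# chunked databases, as seen in the sample HadoopDB.xml.

# db_layout CHUNKS_PER_NODE NODE...
db_layout() {
    chunks=$1
    shift
    i=0
    for node in "$@"; do
        # one unchunked database per node: udb0, udb1, ...
        printf '%s udb%d' "$node" "$i"
        j=0
        while [ "$j" -lt "$chunks" ]; do
            # chunked databases are numbered across the whole cluster
            printf ' cdb%d' $((i * chunks + j))
            j=$((j + 1))
        done
        printf '\n'
        i=$((i + 1))
    done
}

db_layout 2 192.168.0.1 192.168.0.2
# prints:
# 192.168.0.1 udb0 cdb0 cdb1
# 192.168.0.2 udb1 cdb2 cdb3
```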

hadoop@Cluster01:~$ java -cp lib/hadoopdb.jar edu.yale.cs.hadoopdb.catalog.SimpleCatalogGenerator Catalog.properties

The generated HadoopDB.xml looks similar to the following:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DBClusterConfiguration xmlns="">
    <Nodes Password="1234" Username="hadoop" Driver="org.postgresql.Driver" Location="192.168.0.1">
        <Relations>
            <Partitions url="jdbc:postgresql://192.168.0.1:5432/udb0"/>
        </Relations>
        <Relations>
            <Partitions url="jdbc:postgresql://192.168.0.1:5432/cdb0"/>
            <Partitions url="jdbc:postgresql://192.168.0.1:5432/cdb1"/>
        </Relations>
    </Nodes>
    <Nodes Password="1234" Username="hadoop" Driver="org.postgresql.Driver" Location="192.168.0.2">
        <Relations>
            <Partitions url="jdbc:postgresql://192.168.0.2:5432/udb1"/>
        </Relations>
        <Relations>
            <Partitions url="jdbc:postgresql://192.168.0.2:5432/cdb2"/>
            <Partitions url="jdbc:postgresql://192.168.0.2:5432/cdb3"/>
        </Relations>
    </Nodes>
</DBClusterConfiguration>
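
Before the catalog is uploaded, it may be worth confirming that it has one Nodes entry per line of nodes.txt. The following is a rough sketch using grep only; it does not parse the XML, and it assumes each Nodes element opens on its own line as in the sample. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the number of <Nodes ...> entries in the catalog
# with the number of non-empty lines in the nodes file.

# catalog_node_count CATALOG_FILE
catalog_node_count() {
    # "</Nodes>" does not match because of the leading "<Nodes " pattern
    grep -c '<Nodes ' "$1"
}

# nodes_file_count NODES_FILE
nodes_file_count() {
    # count lines that contain at least one non-blank character
    grep -c '[^[:space:]]' "$1"
}

# check_catalog CATALOG_FILE NODES_FILE
# Succeeds (exit status 0) only when both counts agree.
check_catalog() {
    [ "$(catalog_node_count "$1")" -eq "$(nodes_file_count "$2")" ]
}

# Usage, run in the directory that holds both files:
#   check_catalog HadoopDB.xml nodes.txt && echo "catalog matches nodes.txt"
```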

Put HadoopDB.xml into HDFS:

hadoop@Cluster01:~$ hadoop dfs -put HadoopDB.xml HadoopDB.xml
