Hadoop-based Cloudbase: Problems/Bugs

Date: 2020-06-06  Category: Programmer's Life

1. \t is a keyword.
2. insert <nonexistent_table> select * from <other_table>;
The Hadoop job runs first, and only afterwards, at the insert step, is the error detected.
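A fail-fast order would validate the target table before any job is launched. Below is a minimal sketch of that ordering. It assumes a hypothetical in-memory set of table names that stands in for the metadata store. `InsertSelectPlanner` and `plan` are illustrative names, not Cloudbase classes.

```java
import java.util.Set;

// Hypothetical sketch: validate INSERT ... SELECT before launching the expensive job.
class InsertSelectPlanner {
    // Hypothetical catalog of existing table names (stand-in for the metadata store).
    private final Set<String> catalog;

    InsertSelectPlanner(Set<String> catalog) {
        this.catalog = catalog;
    }

    // Checks both tables first; only if they exist would a job be submitted.
    String plan(String targetTable, String sourceTable) {
        if (!catalog.contains(targetTable)) {
            throw new IllegalArgumentException("target table does not exist: " + targetTable);
        }
        if (!catalog.contains(sourceTable)) {
            throw new IllegalArgumentException("source table does not exist: " + sourceTable);
        }
        // Placeholder for job submission; a real engine would run the SELECT here.
        return "run job: select * from " + sourceTable + " -> insert into " + targetTable;
    }

    public static void main(String[] args) {
        InsertSelectPlanner planner = new InsertSelectPlanner(Set.of("other_table"));
        try {
            planner.plan("missing_table", "other_table");
        } catch (IllegalArgumentException e) {
            // Reported before any job is started.
            System.out.println(e.getMessage());
        }
    }
}
```

With this order, the missing table costs one catalog lookup instead of a full Hadoop job.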
3. Poor fault tolerance.
4. '\005' cannot be used as a separator. If you need it, you have to modify the source code:
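    // a separator given as the two characters "\t" becomes a real tab
    if (sep.equals("\\t"))
      sep = "\t";
    // anything else only loses one leading backslash, so "\005" ends up as the text "005"
    else
      sep = sep.replaceAll("^\\\\", "");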
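One possible modification, sketched under assumptions: besides "\t", accept a backslash followed by one to three octal digits and turn it into the single character it names. "\005" then becomes the character with code 5. `SeparatorParser` and `parseSeparator` are hypothetical names; where this would be wired into Cloudbase is not covered here.

```java
// Hypothetical replacement for the separator handling above.
class SeparatorParser {
    static String parseSeparator(String sep) {
        if (sep.equals("\\t")) {
            return "\t"; // literal backslash-t -> tab
        }
        // Backslash followed by 1 to 3 octal digits, e.g. "\005" -> (char) 5
        if (sep.matches("\\\\[0-7]{1,3}")) {
            int code = Integer.parseInt(sep.substring(1), 8);
            return String.valueOf((char) code);
        }
        // Original behaviour for everything else: drop one leading backslash
        return sep.replaceAll("^\\\\", "");
    }

    public static void main(String[] args) {
        System.out.println((int) parseSeparator("\\005").charAt(0)); // 5
        System.out.println((int) parseSeparator("\\t").charAt(0));   // 9
        System.out.println(parseSeparator(","));                     // ,
    }
}
```

The octal check runs before the fallback. Existing separators such as "," or "\|" keep their old behaviour.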
5. Updates are slow. The release history on the download page:
Newest file: cloudbase-1.3.1.tar.gz (1.7 MB, 2009-06-16, 823 downloads)
All files (cloudbase folder): 8.2 MB, 2009-06-16, 3,070 downloads

Version   Size       Date         Downloads
1.3.1     1.7 MB     2009-06-16   823
1.3       1.6 MB     2009-04-14   212
1.2.1     1.1 MB     2009-03-02   226
1.2       1.1 MB     2009-02-26   80
1.1       828.4 KB   2008-12-22   514
1.0.1     739.9 KB   2008-10-24   434
1.0       1.0 MB     2008-10-16   781

6. DBLink has only one function: inserting a file into a database.
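7. select c1, sum(c2), min(c2), max(c2) from test_table4 group by c1 order by 1,2,3,4
GROUP BY is restricted: a query that uses GROUP BY must contain an aggregate function. A GROUP BY like the following therefore does not work:
select c1 from test_table4 group by c1;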
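For reference, here is what the two queries mean in standard SQL. The aggregate query returns, for each c1, the sum, minimum and maximum of c2. An aggregate-free GROUP BY c1 returns just the distinct c1 values. The following is a small in-memory sketch with hypothetical sample rows; it needs Java 16+ for records.

```java
import java.util.*;
import java.util.stream.*;

// In-memory illustration of:
//   select c1, sum(c2), min(c2), max(c2) from test_table4 group by c1 order by 1
// Rows below are hypothetical sample data.
class GroupByDemo {
    record Row(String c1, long c2) {}

    static Map<String, LongSummaryStatistics> aggregate(List<Row> rows) {
        // TreeMap keeps the keys sorted, like ORDER BY c1
        return rows.stream().collect(Collectors.groupingBy(
                Row::c1, TreeMap::new, Collectors.summarizingLong(Row::c2)));
    }

    // GROUP BY c1 without aggregates: just the distinct keys
    static SortedSet<String> distinctKeys(List<Row> rows) {
        return rows.stream().map(Row::c1).collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 3), new Row("b", 5), new Row("a", 7));
        // prints "a 10 3 7" then "b 5 5 5"
        aggregate(rows).forEach((k, s) ->
                System.out.println(k + " " + s.getSum() + " " + s.getMin() + " " + s.getMax()));
        System.out.println(distinctKeys(rows)); // [a, b]
    }
}
```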
8. DBLink passwords are all stored in plaintext.
9. The metadata store is a single point of failure, which is pretty nasty.
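10. Queries are split into too many tasks.
   Example: select t1.c1 from test_table4 t1 inner join test_table5 t2 on t1.c1 = t2.c1 order by 1;
   a. Sort the small table and distribute it by the join key.
   b. Distribute the big table by the join key, then combine it with the small table on that key to produce the full joined data.
   c. Select the required columns.
   d. Sort for the ORDER BY.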
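The four stages can be simulated in memory to show how much work one simple join turns into. This is a sketch under stated assumptions:

- Each table is reduced to its c1 column.
- t2 is treated as the small table; the source does not say which one is smaller.
- The partition count is hypothetical.
- Partitioning uses the same formula as Hadoop's default HashPartitioner.

In Cloudbase each stage is a separate task; here each stage is just a block of code.

```java
import java.util.*;

// Stages a-d for:
//   select t1.c1 from test_table4 t1 inner join test_table5 t2 on t1.c1 = t2.c1 order by 1;
class JoinStages {
    // Same formula as Hadoop's default HashPartitioner
    static int partition(String key, int n) {
        return (key.hashCode() & Integer.MAX_VALUE) % n;
    }

    static List<List<String>> distribute(List<String> keys, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
        for (String k : keys) parts.get(partition(k, n)).add(k);
        return parts;
    }

    static List<String> run(List<String> t1, List<String> t2, int n) {
        // a. sort the small table (assumed t2) and distribute it by the join key
        List<String> sortedSmall = new ArrayList<>(t2);
        Collections.sort(sortedSmall);
        List<List<String>> smallParts = distribute(sortedSmall, n);
        // b. distribute the big table the same way, then join partition by partition
        List<List<String>> bigParts = distribute(t1, n);
        List<String[]> joined = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            for (String a : bigParts.get(p)) {
                for (String b : smallParts.get(p)) {
                    if (a.equals(b)) joined.add(new String[] {a, b}); // (t1.c1, t2.c1)
                }
            }
        }
        // c. select: keep only t1.c1
        List<String> result = new ArrayList<>();
        for (String[] row : joined) result.add(row[0]);
        // d. order by 1
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> t1 = List.of("c", "a", "b", "a");
        List<String> t2 = List.of("a", "c", "d");
        System.out.println(run(t1, t2, 3)); // [a, a, c]
    }
}
```

Both tables must be partitioned with the same count, so that equal keys meet in the same partition.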
11. The join is implemented badly:
   the number of reduce tasks for the small table can only be 1; with more than one, problems occur.
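The source does not explain why more than one reducer breaks the join. The sketch below only illustrates one hypothesis, which is not confirmed. Suppose the join step reads a single output file of the small table (say partition 0) and matches every big-table row against it. With one reducer, that file holds the whole small table. With more reducers, it holds only the keys hashed to partition 0, and the other matches are silently lost. `SmallTableReducers` and `matches` are hypothetical names; `matches` counts the big-table rows that find a partner.

```java
import java.util.*;

// Hypothesis sketch only: the join sees just partition 0 of the small table.
class SmallTableReducers {
    // Same formula as Hadoop's default HashPartitioner
    static int partition(String key, int n) {
        return (key.hashCode() & Integer.MAX_VALUE) % n;
    }

    static int matches(List<String> big, List<String> small, int smallReducers) {
        Set<String> part0 = new HashSet<>();
        for (String k : small) if (partition(k, smallReducers) == 0) part0.add(k);
        int count = 0;
        for (String k : big) if (part0.contains(k)) count++;
        return count;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d");
        System.out.println(matches(keys, keys, 1)); // 4: every key is in partition 0
        System.out.println(matches(keys, keys, 3)); // 1: only "c" hashes to partition 0
    }
}
```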

When reposting, please credit the source: http://www.heiqu.com/psjjy.html
