Basic HBase operations

I. Setting up the HBase environment

Before you can work with HBase you obviously need a working environment. Setting HBase up is fairly simple, provided that Hadoop and ZooKeeper are already configured.

1. Unpack the HBase tarball

tar -zxvf hbase-x-x.tar.gz -C <target directory>

2. Create a symlink so the environment variables are easier to configure

ln -s hbase-2.2.0/ hbase

vi ~/.bashrc
# add the following (use your own path)
export HBASE_HOME=/home/master/software/hbase
# append to PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$HBASE_HOME/bin
3. Go into hbase/conf/ and edit two configuration files
vi hbase-env.sh
# add
export JAVA_HOME=/home/master/software/jdk
# do not use the bundled ZooKeeper
export HBASE_MANAGES_ZK=false
# directories for logs and PID files (they need to be created)
export HBASE_LOG_DIR=/home/master/software/data/hbase/logs
export HBASE_PID_DIR=/home/master/software/data/hbase/pids

vi hbase-site.xml
# add
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <!-- the data directory of your own ZooKeeper installation -->
    <value>/home/master/software/data/zk</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- default HTTP port of the HMaster web UI -->
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <!-- default HTTP port of the HRegionServer web UI -->
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
  </property>
  <!-- do not use the bundled ZooKeeper; list every node of your own ZK ensemble here -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>
4. Create the directories referenced in the configuration
mkdir -p ~/software/data/hbase/logs
mkdir -p ~/software/data/hbase/pids
5. Edit the regionservers file under conf/ and add the RegionServer hosts
vi regionservers
# add
hadoop01
hadoop02
hadoop03
6. Synchronize the time on all nodes
sudo date -s "2019-7-2 14:10:10"

II. HBase shell commands

Namespace operations

create_namespace 'hsj'    # create a namespace
list_namespace            # list namespaces
drop_namespace 'hsj'      # drop a namespace (it must be empty before it can be dropped)
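
The same namespace operations are also available from the Java client. Below is a minimal, self-contained sketch using the Admin API; the class name NamespaceDemo is only for illustration, and the ZooKeeper quorum is the one configured above, so adjust it to your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class NamespaceDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // create a namespace
            admin.createNamespace(NamespaceDescriptor.create("hsj").build());
            // list namespaces
            for (NamespaceDescriptor nd : admin.listNamespaceDescriptors()) {
                System.out.println(nd.getName());
            }
            // drop the namespace (it must be empty)
            admin.deleteNamespace("hsj");
        }
    }
}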

Table operations (DDL)

# create tables
create 'hsj:emp',{NAME => 'info'},{NAME => 'extrainfo',VERSIONS => 3}   # keep 3 versions in 'extrainfo'
create 'hsj:student','info'
# alter a table
alter 'hsj:student',{NAME => 'info',VERSIONS => 3}
# drop a table (it must be disabled first)
disable 'hsj:student'
drop 'hsj:student'
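
The shell alter above also has a Java counterpart: build a new ColumnFamilyDescriptor and pass it to Admin.modifyColumnFamily. A minimal sketch, assuming the same cluster addresses as above; the class name AlterTableDemo is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterTableDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // equivalent of: alter 'hsj:student',{NAME => 'info',VERSIONS => 3}
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("info"))
                    .setMaxVersions(3)
                    .build();
            admin.modifyColumnFamily(TableName.valueOf("hsj:student"), cf);
        }
    }
}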

Data operations (DML)

# insert or update data
put 'hsj:student','1001','info:name','zs'
put 'hsj:student','1002','info:name','li'
put 'hsj:student','1002','info:age','18'
# query data
scan 'hsj:student'                        # scan the whole table
get 'hsj:student','1001'
get 'hsj:student','1001','info:name'
# delete data
delete 'hsj:student','1001','info:name'
deleteall 'hsj:student','1002'
# count the rows in the table
count 'hsj:student'
# remove all data from the table
truncate 'hsj:student'

Pre-splitting

# manually specify the split points
create 'staff','info','partition1',SPLITS => ['1000','2000','3000','4000']
# generate a fixed number of regions with the hex-string split algorithm
create 'staff2','info','partition2',{NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
# pre-split according to the keys listed in a file
vi splits.txt
# file contents
aaaaa
bbbbb
ccccc
ddddd

create 'staff3','partition3',SPLITS_FILE => 'splits.txt'
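
Pre-splitting can be done from the Java client as well by handing split keys to Admin.createTable; the createTableAsync example in the Java API section below does the same thing asynchronously. A minimal sketch, assuming the cluster addresses used throughout this post; the table name staff4 and class name PreSplitDemo are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableDescriptor td = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("staff4"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .build();
            // equivalent of: create 'staff4','info',SPLITS => ['1000','2000','3000','4000']
            byte[][] splits = {Bytes.toBytes("1000"), Bytes.toBytes("2000"),
                               Bytes.toBytes("3000"), Bytes.toBytes("4000")};
            admin.createTable(td, splits);
            // alternatively, split a key range evenly into N regions:
            // admin.createTable(td, Bytes.toBytes("0000"), Bytes.toBytes("ffff"), 15);
        }
    }
}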

III. The read and write paths

  • Read path

  1. The client asks the ZooKeeper ensemble which RegionServer holds the meta table.
  2. ZooKeeper returns the RegionServer hosting hbase:meta.
  3. The client reads hbase:meta on that RegionServer and locates the RegionServer responsible for the requested row (the sketch after these lists shows how to inspect this lookup from the client API).
  4. The client reads the Region data on that RegionServer: it first checks the MemStore; if the data is not there it reads the BlockCache, and then the StoreFiles.
  • Write path

  1. The client asks the ZooKeeper ensemble which RegionServer holds the meta table.
  2. ZooKeeper returns the RegionServer hosting hbase:meta.
  3. The client reads hbase:meta on that RegionServer and locates the RegionServer responsible for the write.
  4. The write is first appended to the RegionServer's HLog (the write-ahead log), and then the data is written into the MemStore.
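
Both paths begin with the same meta lookup, which the client library performs transparently. To see which Region and RegionServer a given row key resolves to, the RegionLocator API exposes the result of that lookup. A minimal sketch, assuming the hsj:student table created earlier; the class name LocateRowDemo is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class LocateRowDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("hsj:student"))) {
            // which Region (and therefore which RegionServer) serves row '1001'?
            HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("1001"));
            System.out.println("region: " + loc.getRegion().getRegionNameAsString());
            System.out.println("server: " + loc.getServerName());
        }
    }
}

The client caches these locations, so repeated gets and puts for the same key range do not go back to ZooKeeper or hbase:meta every time.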

IV. Java API

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HbaseOperator {

    private static Admin admin = null;
    private static Connection conn = null;
    private static Configuration conf = null;

    static {
        // 1. build the configuration
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        // 2. create the connection
        try {
            conn = ConnectionFactory.createConnection(conf);
            admin = conn.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // check whether a table exists
    public static boolean isExistTable(String name) throws Exception {
        return admin.tableExists(TableName.valueOf(name));
    }

    // create a table with a single column family
    public static void createTable(String tableName, String cf) throws Exception {
        ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder.newBuilder(cf.getBytes()).build();
        TableDescriptor td = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
                .setColumnFamily(cfd).build();
        admin.createTable(td);
        System.out.println("table created");
    }

    // create a table asynchronously, pre-split and keeping 3 versions
    public static void createTableAsync(String tableName, String cf) throws Exception {
        ColumnFamilyDescriptorBuilder cfdb = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(cf));
        cfdb.setMaxVersions(3);
        ColumnFamilyDescriptor cfd = cfdb.build();
        TableDescriptor tdb = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
                .setColumnFamily(cfd).build();
        byte[][] splits = new byte[][]{Bytes.toBytes("1000"), Bytes.toBytes("2000")};
        Future<Void> tableAsync = admin.createTableAsync(tdb, splits);
        tableAsync.get(1000, TimeUnit.MILLISECONDS);
        System.out.println("table created");
    }

    // insert or update a cell
    public static void put(String tableName, String rowkey, String cf, String column, String data) throws Exception {
        Put p = new Put(rowkey.getBytes());
        p.addColumn(Bytes.toBytes(cf), Bytes.toBytes(column), Bytes.toBytes(data));

        Table table = conn.getTable(TableName.valueOf(tableName));
        table.put(p);
        table.close();
    }

    public static void insert(String tableName, String rowkey, String cf, String column, String data) throws Exception {

    }

    // read a whole row
    public static void get(String tableName, String rowkey) throws IOException {
        Get g = new Get(Bytes.toBytes(rowkey));

        Table table = conn.getTable(TableName.valueOf(tableName));
        Result result = table.get(g);
        showView(result);
    }

    // read a single column of a row
    public static void get(String tableName, String rowkey, String cf, String column) throws IOException {
        Get g = new Get(Bytes.toBytes(rowkey));
        g.addColumn(cf.getBytes(), column.getBytes());
        Table table = conn.getTable(TableName.valueOf(tableName));
        Result result = table.get(g);
        showView(result);
    }

    // scan a whole table
    public static void scan(String tableName) throws IOException {
        Scan scan = new Scan();

        Table table = conn.getTable(TableName.valueOf(tableName));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            showView(result);
        }
    }

    // delete a table
    public static void deleteTable(String tableName) throws Exception {
        admin.disableTable(TableName.valueOf(tableName));
        admin.deleteTable(TableName.valueOf(tableName));
        System.out.println("table deleted");
    }

    /*
     Filters:
       row key filters
       column family filters
       column qualifier filters
       value filters
    */
    public static void scanAndFilter(String tableName) throws Exception {
        Scan scan = new Scan();
        // PrefixFilter filter = new PrefixFilter(Bytes.toBytes("l"));

        RowFilter filter = new RowFilter(CompareOperator.GREATER_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes("3000")));

        scan.setFilter(filter);
        Table table = conn.getTable(TableName.valueOf(tableName));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            showView(result);
        }
    }

    // print every cell of a result as row,family,qualifier,value
    public static void showView(Result result) throws IOException {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + ","
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }

    public static void close() throws IOException {
        admin.close();
        conn.close();
    }

    public static void main(String[] args) throws Exception {
        // System.out.println(HbaseOperator.isExistTable("hsj:emp"));
        // createTable("javaapi1:student", "info");
        // deleteTable("javaapi1:student");
        put("javaapi1:student", "1001", "info", "age", "18");
        put("javaapi1:student", "1001", "info", "gender", "male");
        get("javaapi1:student", "1001");
        get("javaapi1:student", "1001", "info", "age");
        // scan("javaapi1:student");
        // scanAndFilter("javaapi1:student");
    }
}

V. HBase architecture

  • Client: the interface through which data requests are issued
  • ZooKeeper: provides high availability for the HBase cluster and stores the location of the meta table
  • HMaster: monitors the RegionServers, handles RegionServer failover and metadata changes, assigns and moves Regions, and publishes its own location to clients via ZooKeeper
  • RegionServer: stores the actual HBase data, serves the Regions assigned to it, flushes its caches to HDFS, maintains the HLog, and performs StoreFile compactions
  • Write-Ahead Log (HLog): data held only in memory is easily lost, so the write-ahead log records every update before it is applied
  • Store: one Store corresponds to one column family
  • Region: a horizontal shard of an HBase table; a Region is split once it grows beyond the configured maximum region size (hbase.hregion.max.filesize)
  • MemStore: the in-memory store holding the most recent writes; when it reaches its threshold (128 MB by default) the data is flushed to disk
  • HFile: StoreFiles are persisted on HDFS in the HFile format

VI. Flush and compaction

  • flush: when a MemStore reaches its threshold, its contents are flushed into a StoreFile
# default MemStore flush size: 128 MB
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
  • compaction: every time a MemStore is flushed to disk a new StoreFile is created. As the number of StoreFiles grows, HBase read performance degrades, so multiple StoreFiles are merged; this process is called compaction. Its main purpose is to merge files and clean out expired data and excess versions, which improves read and write efficiency.

    compaction has two implementations (a sketch showing how to trigger them from code follows this list):

    • minor compaction: merges only some of the smaller, adjacent StoreFiles into a larger one
    • major compaction: merges all StoreFiles under each HStore of a Region, producing a single file per Store and dropping expired data and excess versions
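
Flushes and compactions normally run automatically, but both can also be requested on demand, either from the shell (flush, compact, major_compact) or through the Admin API. A minimal sketch of the Java side, assuming the hsj:student table and the cluster addresses used earlier; the class name FlushCompactDemo is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushCompactDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName tn = TableName.valueOf("hsj:student");
            admin.flush(tn);         // force the table's MemStores into StoreFiles
            admin.compact(tn);       // request a minor compaction
            admin.majorCompact(tn);  // request a major compaction
        }
    }
}

Note that admin.compact and admin.majorCompact only request the compaction; the RegionServers carry it out asynchronously.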