Basic HBase operations

I. Setting up the HBase environment

Before you can work with HBase you obviously need a working environment. Setting HBase up is fairly simple, provided that Hadoop and ZooKeeper are already configured.

1. Unpack the HBase tarball

tar -zxvf hbase-x-x.tar.gz -C <target directory>

2. Create a symlink so the environment variables are easier to configure

ln -s hbase-2.2.0/ hbase

vi ~/.bashrc
# add the following (use your own path)
export HBASE_HOME=/home/master/software/hbase
# append to PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$HBASE_HOME/bin
3. Go into hbase/conf/ and edit two configuration files
vi hbase-env.sh
# add
export JAVA_HOME=/home/master/software/jdk
# do not use the bundled ZooKeeper
export HBASE_MANAGES_ZK=false
# directories for logs and PID files (they need to be created)
export HBASE_LOG_DIR=/home/master/software/data/hbase/logs
export HBASE_PID_DIR=/home/master/software/data/hbase/pids

vi hbase-site.xml
# add
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <!-- the data directory of your own ZooKeeper installation -->
    <value>/home/master/software/data/zk</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- default HTTP port of the HMaster web UI -->
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <!-- default HTTP port of the HRegionServer web UI -->
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
  </property>
  <!-- do not use the bundled ZooKeeper; list every node of your own ZK ensemble here -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>
4. Create the directories referenced in the configuration
mkdir -p ~/software/data/hbase/logs
mkdir -p ~/software/data/hbase/pids
5. Edit the regionservers file under conf/ and add the RegionServer hosts
vi regionservers
# add
hadoop01
hadoop02
hadoop03
6. Synchronize the time on all nodes
sudo date -s "2019-7-2 14:10:10"

II. HBase shell commands

Namespace operations

create_namespace 'hsj'    # create a namespace
list_namespace            # list namespaces
drop_namespace 'hsj'      # drop a namespace (it must be empty before it can be dropped)
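
The same namespace operations are also available from the Java client. Below is a minimal, self-contained sketch using the Admin API; the class name NamespaceDemo is only for illustration, and the ZooKeeper quorum is the one configured above, so adjust it to your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class NamespaceDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // create a namespace
            admin.createNamespace(NamespaceDescriptor.create("hsj").build());
            // list namespaces
            for (NamespaceDescriptor nd : admin.listNamespaceDescriptors()) {
                System.out.println(nd.getName());
            }
            // drop the namespace (it must be empty)
            admin.deleteNamespace("hsj");
        }
    }
}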

Table operations (DDL)

# create tables
create 'hsj:emp',{NAME => 'info'},{NAME => 'extrainfo',VERSIONS => 3}   # keep 3 versions in 'extrainfo'
create 'hsj:student','info'
# alter a table
alter 'hsj:student',{NAME => 'info',VERSIONS => 3}
# drop a table (it must be disabled first)
disable 'hsj:student'
drop 'hsj:student'
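
The shell alter above also has a Java counterpart: build a new ColumnFamilyDescriptor and pass it to Admin.modifyColumnFamily. A minimal sketch, assuming the same cluster addresses as above; the class name AlterTableDemo is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterTableDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // equivalent of: alter 'hsj:student',{NAME => 'info',VERSIONS => 3}
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("info"))
                    .setMaxVersions(3)
                    .build();
            admin.modifyColumnFamily(TableName.valueOf("hsj:student"), cf);
        }
    }
}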

Data operations (DML)

# insert or update data
put 'hsj:student','1001','info:name','zs'
put 'hsj:student','1002','info:name','li'
put 'hsj:student','1002','info:age','18'
# query data
scan 'hsj:student'                        # scan the whole table
get 'hsj:student','1001'
get 'hsj:student','1001','info:name'
# delete data
delete 'hsj:student','1001','info:name'
deleteall 'hsj:student','1002'
# count the rows in the table
count 'hsj:student'
# remove all data from the table
truncate 'hsj:student'

Pre-splitting

# manually specify the split points
create 'staff','info','partition1',SPLITS => ['1000','2000','3000','4000']
# generate a fixed number of regions with the hex-string split algorithm
create 'staff2','info','partition2',{NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
# pre-split according to the keys listed in a file
vi splits.txt
# file contents
aaaaa
bbbbb
ccccc
ddddd

create 'staff3','partition3',SPLITS_FILE => 'splits.txt'
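
Pre-splitting can be done from the Java client as well by handing split keys to Admin.createTable; the createTableAsync example in the Java API section below does the same thing asynchronously. A minimal sketch, assuming the cluster addresses used throughout this post; the table name staff4 and class name PreSplitDemo are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableDescriptor td = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("staff4"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .build();
            // equivalent of: create 'staff4','info',SPLITS => ['1000','2000','3000','4000']
            byte[][] splits = {Bytes.toBytes("1000"), Bytes.toBytes("2000"),
                               Bytes.toBytes("3000"), Bytes.toBytes("4000")};
            admin.createTable(td, splits);
            // alternatively, split a key range evenly into N regions:
            // admin.createTable(td, Bytes.toBytes("0000"), Bytes.toBytes("ffff"), 15);
        }
    }
}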

III. The read and write paths

  • Read path

  1. The client asks the ZooKeeper ensemble which RegionServer holds the meta table.
  2. ZooKeeper returns the RegionServer hosting hbase:meta.
  3. The client reads hbase:meta on that RegionServer and locates the RegionServer responsible for the requested row (the sketch after these lists shows how to inspect this lookup from the client API).
  4. The client reads the Region data on that RegionServer: it first checks the MemStore; if the data is not there it reads the BlockCache, and then the StoreFiles.
  • Write path

  1. The client asks the ZooKeeper ensemble which RegionServer holds the meta table.
  2. ZooKeeper returns the RegionServer hosting hbase:meta.
  3. The client reads hbase:meta on that RegionServer and locates the RegionServer responsible for the write.
  4. The write is first appended to the RegionServer's HLog (the write-ahead log), and then the data is written into the MemStore.
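
Both paths begin with the same meta lookup, which the client library performs transparently. To see which Region and RegionServer a given row key resolves to, the RegionLocator API exposes the result of that lookup. A minimal sketch, assuming the hsj:student table created earlier; the class name LocateRowDemo is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class LocateRowDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("hsj:student"))) {
            // which Region (and therefore which RegionServer) serves row '1001'?
            HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("1001"));
            System.out.println("region: " + loc.getRegion().getRegionNameAsString());
            System.out.println("server: " + loc.getServerName());
        }
    }
}

The client caches these locations, so repeated gets and puts for the same key range do not go back to ZooKeeper or hbase:meta every time.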

IV. Java API

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HbaseOperator {

    private static Admin admin = null;
    private static Connection conn = null;
    private static Configuration conf = null;

    static {
        // 1. build the configuration
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        // 2. create the connection
        try {
            conn = ConnectionFactory.createConnection(conf);
            admin = conn.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // check whether a table exists
    public static boolean isExistTable(String name) throws Exception {
        return admin.tableExists(TableName.valueOf(name));
    }

    // create a table with a single column family
    public static void createTable(String tableName, String cf) throws Exception {
        ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder.newBuilder(cf.getBytes()).build();
        TableDescriptor td = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
                .setColumnFamily(cfd).build();
        admin.createTable(td);
        System.out.println("table created");
    }

    // create a table asynchronously, pre-split and keeping 3 versions
    public static void createTableAsync(String tableName, String cf) throws Exception {
        ColumnFamilyDescriptorBuilder cfdb = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(cf));
        cfdb.setMaxVersions(3);
        ColumnFamilyDescriptor cfd = cfdb.build();
        TableDescriptor tdb = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
                .setColumnFamily(cfd).build();
        byte[][] splits = new byte[][]{Bytes.toBytes("1000"), Bytes.toBytes("2000")};
        Future<Void> tableAsync = admin.createTableAsync(tdb, splits);
        tableAsync.get(1000, TimeUnit.MILLISECONDS);
        System.out.println("table created");
    }

    // insert or update a cell
    public static void put(String tableName, String rowkey, String cf, String column, String data) throws Exception {
        Put p = new Put(rowkey.getBytes());
        p.addColumn(Bytes.toBytes(cf), Bytes.toBytes(column), Bytes.toBytes(data));

        Table table = conn.getTable(TableName.valueOf(tableName));
        table.put(p);
        table.close();
    }

    public static void insert(String tableName, String rowkey, String cf, String column, String data) throws Exception {

    }

    // read a whole row
    public static void get(String tableName, String rowkey) throws IOException {
        Get g = new Get(Bytes.toBytes(rowkey));

        Table table = conn.getTable(TableName.valueOf(tableName));
        Result result = table.get(g);
        showView(result);
    }

    // read a single column of a row
    public static void get(String tableName, String rowkey, String cf, String column) throws IOException {
        Get g = new Get(Bytes.toBytes(rowkey));
        g.addColumn(cf.getBytes(), column.getBytes());
        Table table = conn.getTable(TableName.valueOf(tableName));
        Result result = table.get(g);
        showView(result);
    }

    // scan a whole table
    public static void scan(String tableName) throws IOException {
        Scan scan = new Scan();

        Table table = conn.getTable(TableName.valueOf(tableName));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            showView(result);
        }
    }

    // delete a table
    public static void deleteTable(String tableName) throws Exception {
        admin.disableTable(TableName.valueOf(tableName));
        admin.deleteTable(TableName.valueOf(tableName));
        System.out.println("table deleted");
    }

    /*
     Filters:
       row key filters
       column family filters
       column qualifier filters
       value filters
    */
    public static void scanAndFilter(String tableName) throws Exception {
        Scan scan = new Scan();
        // PrefixFilter filter = new PrefixFilter(Bytes.toBytes("l"));

        RowFilter filter = new RowFilter(CompareOperator.GREATER_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes("3000")));

        scan.setFilter(filter);
        Table table = conn.getTable(TableName.valueOf(tableName));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            showView(result);
        }
    }

    // print every cell of a result as row,family,qualifier,value
    public static void showView(Result result) throws IOException {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + ","
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }

    public static void close() throws IOException {
        admin.close();
        conn.close();
    }

    public static void main(String[] args) throws Exception {
        // System.out.println(HbaseOperator.isExistTable("hsj:emp"));
        // createTable("javaapi1:student", "info");
        // deleteTable("javaapi1:student");
        put("javaapi1:student", "1001", "info", "age", "18");
        put("javaapi1:student", "1001", "info", "gender", "male");
        get("javaapi1:student", "1001");
        get("javaapi1:student", "1001", "info", "age");
        // scan("javaapi1:student");
        // scanAndFilter("javaapi1:student");
    }
}

V. HBase architecture

  • Client: the interface through which data requests are issued
  • ZooKeeper: provides high availability for the HBase cluster and stores the location of the meta table
  • HMaster: monitors the RegionServers, handles RegionServer failover and metadata changes, assigns and moves Regions, and publishes its own location to clients via ZooKeeper
  • RegionServer: stores the actual HBase data, serves the Regions assigned to it, flushes its caches to HDFS, maintains the HLog, and performs StoreFile compactions
  • Write-Ahead Log (HLog): data held only in memory is easily lost, so the write-ahead log records every update before it is applied
  • Store: one Store corresponds to one column family
  • Region: a horizontal shard of an HBase table; a Region is split once it grows beyond the configured maximum region size (hbase.hregion.max.filesize)
  • MemStore: the in-memory store holding the most recent writes; when it reaches its threshold (128 MB by default) the data is flushed to disk
  • HFile: StoreFiles are persisted on HDFS in the HFile format

VI. Flush and compaction

  • flush: when a MemStore reaches its threshold, its contents are flushed into a StoreFile
# default MemStore flush size: 128 MB
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
  • compaction: every time a MemStore is flushed to disk a new StoreFile is created. As the number of StoreFiles grows, HBase read performance degrades, so multiple StoreFiles are merged; this process is called compaction. Its main purpose is to merge files and clean out expired data and excess versions, which improves read and write efficiency.

    compaction has two implementations (a sketch showing how to trigger them from code follows this list):

    • minor compaction: merges only some of the smaller, adjacent StoreFiles into a larger one
    • major compaction: merges all StoreFiles under each HStore of a Region, producing a single file per Store and dropping expired data and excess versions
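
Flushes and compactions normally run automatically, but both can also be requested on demand, either from the shell (flush, compact, major_compact) or through the Admin API. A minimal sketch of the Java side, assuming the hsj:student table and the cluster addresses used earlier; the class name FlushCompactDemo is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushCompactDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01,hadoop02,hadoop03");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName tn = TableName.valueOf("hsj:student");
            admin.flush(tn);         // force the table's MemStores into StoreFiles
            admin.compact(tn);       // request a minor compaction
            admin.majorCompact(tn);  // request a major compaction
        }
    }
}

Note that admin.compact and admin.majorCompact only request the compaction; the RegionServers carry it out asynchronously.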