Hive (1): Startup and Basic Usage
1. Basic Concepts
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax.
Hive is used for statistical analysis of massive volumes of structured logs.
Hive is a data warehouse tool built on top of Hadoop.
In essence, it translates HQL (Hive's query language) into MapReduce programs.
The data Hive processes is stored in HDFS.
Hive's default execution engine for analysis is MapReduce.
Jobs run on Yarn.
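Because Hive compiles HQL into MapReduce jobs, you can inspect the plan it generates with EXPLAIN. A sketch (the `student` table is the one created later in this article; a query with GROUP BY typically compiles to a map-reduce stage, while a bare SELECT * may compile to a simple fetch task):

```sql
-- Show the stage plan Hive generates for an aggregation query.
EXPLAIN
SELECT name, count(*) AS cnt
FROM student
GROUP BY name;
```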
Pros and cons of Hive
Pros:
Data can be analyzed quickly, without writing MapReduce programs.
MapReduce is suited to large datasets, not small ones.
Cons:
HQL has limited expressiveness: iterative algorithms cannot be expressed, the processing granularity is coarse, and tuning is difficult.
Categories of user-defined functions:
UDF (one row in, one row out)
UDAF (aggregate function: many rows in, one row out)
UDTF (table-generating function: one row in, many rows out)
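The three categories map onto familiar built-in functions, which makes the row-cardinality difference easy to see. A sketch using Hive built-ins (the `student` table is the one created later in this article):

```sql
-- UDF: one row in, one row out, e.g. upper()
SELECT upper(name) FROM student;

-- UDAF: many rows in, one row out, e.g. count()
SELECT count(*) FROM student;

-- UDTF: one row in, many rows out, e.g. explode()
SELECT explode(array(1, 2, 3));
```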
Architecture
Execution pipeline: parser -> compiler -> optimizer -> executor
Hive vs. databases
Compared with a traditional database, Hive is read-heavy and write-light. It has no indexes, so queries must scan all the relevant data, and even with the MapReduce machinery it is not suited to real-time queries. Its scalability matches Hadoop's, which is high.
2. Installation and Startup
Download
Hadoop's HDFS and Yarn must be running first.
Configure conf/hive-env.sh:
export HADOOP_HOME=/usr/local/hadoop   # change to your Hadoop installation path
export HIVE_CONF_DIR=/usr/local/hive/conf
Start Hive:
bin/hive
3. Hive Statements
Show the databases:
show databases;
Run in local mode:
hive> SET mapred.job.tracker=local;
Create a table, insert a record, and query it:
use default;
Create the table:
create table student(id int,name string);
Insert a record:
insert into table student values(1,'fonxian');
Query the records:
select * from student;
View the record on Hadoop.
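From the command line, the record can be found as a file under Hive's warehouse directory on HDFS. A sketch, assuming the default warehouse location (check hive.metastore.warehouse.dir if your setup differs):

```shell
# List and print the files Hive wrote for the student table.
hdfs dfs -ls /user/hive/warehouse/student
hdfs dfs -cat /user/hive/warehouse/student/*
```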
Load data from the file system
Create a data file student.txt:
3,kafka
4,flume
5,hbase
6,zookeeper
Create a table and specify the field delimiter:
create table stu1(id int,name string) row format delimited fields terminated by ',';
Load the data:
load data local inpath '/usr/local/hive/data/student.txt' into table stu1;
Query the table to confirm the load.
4. Using Hive Hooks
Add the dependency:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>hive-hook-example</groupId>
<artifactId>Hive-hook-example</artifactId>
<version>1.0</version>
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0</version>
</dependency>
</dependencies>
</project>
Create HiveExampleHook:
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

public class HiveExampleHook implements ExecuteWithHookContext {
    @Override
    public void run(HookContext hookContext) throws Exception {
        System.out.println("operation name :" + hookContext.getQueryPlan().getOperationName());
        System.out.println(hookContext.getQueryPlan().getQueryPlan());
        System.out.println("Hello from the hook !!");
    }
}
Compile it to obtain Hive-hook-example-1.0.jar, then register it:
hive> add jar Hive-hook-example-1.0.jar;
hive> set hive.exec.pre.hooks=HiveExampleHook;
hive> select * from student;
operation name :QUERY
Query(queryId:fangzhijie_20191221231550_0e949bbf-f8f7-45a8-8726-c1cdd679cef9, queryType:null, queryAttributes:{queryString=select * from student}, queryCounters:null, stageGraph:Graph(nodeType:STAGE, roots:null, adjacencyList:null), ...
Hello from the hook !!
OK
Time taken: 1.718 seconds
Time taken: 1.68 seconds
5. Using MySQL to Store the Metastore
Install MySQL locally and create hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
</configuration>
Run bin/hive and inspect the MySQL database: the metastore tables have been created.
Executing create table aaa(id int); in Hive creates the corresponding directory in HDFS, and the metastore's TBLS table gains a record for it.
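You can confirm this by querying the metastore schema in MySQL directly. A sketch (table and column names are from the metastore schema of Hive 1.x; treat the exact layout as version-dependent):

```sql
USE metastore;
-- Each Hive table gets a row in TBLS; DBS holds the Hive databases.
SELECT t.TBL_ID, t.TBL_NAME, t.TBL_TYPE, d.NAME AS db_name
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID;
```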
6. Beeline
HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline. HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities of HiveServer2.
Connect to HiveServer2 with Beeline:
beeline -u "jdbc:hive2://host:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n username -p password
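For a single-node setup without ZooKeeper service discovery, a direct JDBC URL also works. A sketch, assuming HiveServer2 on its default port 10000:

```shell
# Start HiveServer2 in the background (listens on port 10000 by default).
$HIVE_HOME/bin/hiveserver2 &

# Connect with Beeline using a direct JDBC URL.
beeline -u "jdbc:hive2://localhost:10000" -n username
```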
7. Troubleshooting
Configuration changes do not take effect
This can be a configuration-path problem. Inspecting hive-env.sh revealed that the Hive configuration path was mistyped.
The wrong path, which made the configuration directory unfindable:
export HIVE_CONF_DIR=/ur/local/hive/conf
The correct configuration:
export HIVE_CONF_DIR=/usr/local/hive/conf
Insert fails
hive> insert into table student values(1,'fonxian');
Query ID = fangzhijie_20191205061055_6c8c233e-2d46-470a-972d-38f36bb8068c
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1575495654045_0004, Tracking URL = http://localhost:8088/proxy/application_1575495654045_0004/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1575495654045_0004
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-12-05 06:10:58,803 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1575495654045_0004 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Solution: run the following command:
hive> SET mapred.job.tracker=local;
Analysis:
From the official documentation:
Hive compiler generates map-reduce jobs for most queries. These jobs are then submitted to the Map-Reduce cluster indicated by the variable: mapred.job.tracker.
Hive fully supports local mode execution. To enable this, the user can enable the following option:
hive> SET mapred.job.tracker=local;
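Instead of forcing local mode by hand, Hive can also decide per query whether local execution is worthwhile. These settings exist in Hive's configuration; the threshold values shown are illustrative:

```sql
-- Let Hive choose local mode automatically when the input is small.
SET hive.exec.mode.local.auto=true;
-- Thresholds that gate the automatic choice (bytes and file count).
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
SET hive.exec.mode.local.auto.input.files.max=4;
```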