Commit 9350953

add cn read me doc.
1 parent b8c5127 commit 9350953

2 files changed: +306 -0 lines changed

README-zh.md

Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
# hive-third-functions

[![Build Status](https://travis-ci.org/aaronshan/hive-third-functions.svg?branch=master)](https://travis-ci.org/aaronshan/hive-third-functions)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://github.com/aaronshan/hive-third-functions/tree/master/README.md)
[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://github.com/aaronshan/hive-third-functions/tree/master/README-zh.md)
[![Release](https://img.shields.io/github/release/aaronshan/hive-third-functions.svg)](https://github.com/aaronshan/hive-third-functions/releases)

## Introduction

hive-third-functions provides a collection of useful Hive UDFs, especially array and JSON functions.

> Note:
> hive-third-functions supports hive-0.11.0 or higher.

## Build

### 1. Install dependencies

`jdo2-api-2.3-ec.jar` is currently unavailable in the Maven Central repository, so you have to download it yourself and install it into your local Maven repository:

```
wget http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar -O ~/jdo2-api-2.3-ec.jar
mvn install:install-file -DgroupId=javax.jdo -DartifactId=jdo2-api -Dversion=2.3-ec -Dpackaging=jar -Dfile="$HOME/jdo2-api-2.3-ec.jar"
```

### 2. Package with mvn

```
cd ${project_home}
mvn clean package
```

To skip the unit tests, run:

```
cd ${project_home}
mvn clean package -DskipTests
```

When the command finishes, `hive-third-functions-${version}-shaded.jar` is generated in the `target` directory.

You can also download the latest prebuilt package directly from the [release page](https://github.com/aaronshan/hive-third-functions/releases).

> The latest version is currently `2.1.2`.

## Functions

### 1. String functions

| Function | Description |
|:--|:--|
|pinyin(string) -> string | Converts Chinese characters to pinyin.|
|md5(string) -> string | MD5 hash.|
|sha256(string) -> string | SHA-256 hash.|

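For readers who want to cross-check `md5` and `sha256` results outside Hive, the following is a minimal Python sketch. It assumes the UDFs hash the UTF-8 bytes of the input and return a lowercase hex digest; that encoding choice is an assumption, not something this README states. The helper names `md5_hex` and `sha256_hex` are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Lowercase hex MD5 digest of the UTF-8 bytes of text (encoding is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha256_hex(text: str) -> str:
    """Lowercase hex SHA-256 digest of the UTF-8 bytes of text (encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Print digests so they can be compared with the values Hive returns.
    print(md5_hex("aaronshan"))
    print(sha256_hex("aaronshan"))
```

The printed digests can be compared with the output of the corresponding Hive calls.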
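### 2. Array functions

| Function | Description |
|:--|:--|
|array_contains(array&lt;E&gt;, E) -> boolean | Returns whether the array contains the given value.|
|array_equals(array&lt;E&gt;, array&lt;E&gt;) -> boolean | Returns whether two arrays are equal.|
|array_intersect(array, array) -> array | Returns the intersection of two arrays.|
|array_max(array&lt;E&gt;) -> E | Returns the maximum value in the array.|
|array_min(array&lt;E&gt;) -> E | Returns the minimum value in the array.|
|array_join(array, delimiter, null_replacement) -> string | Joins the array elements using the given delimiter. `null_replacement` is optional and is substituted for null values.|
|array_distinct(array) -> array | Removes duplicate elements from the array.|
|array_position(array&lt;E&gt;, E) -> long | Returns the position of the first occurrence of the given element in the array (0 if not found).|
|array_remove(array&lt;E&gt;, E) -> array | Removes the given element from the array.|
|array_reverse(array) -> array | Reverses the array.|
|array_sort(array) -> array | Sorts the array. The elements must be orderable.|
|array_concat(array, array) -> array | Concatenates two arrays.|
|array_value_count(array&lt;E&gt;, E) -> long | Counts the occurrences of the given element in the array.|
|array_slice(array, start, length) -> array | Slices the array. A positive `start` counts from the beginning, a negative `start` counts from the end, and the slice holds up to `length` elements.|
|array_element_at(array&lt;E&gt;, index) -> E | Returns the element at the given position. If the index is < 0, the position is counted from the end.|
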
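The positional functions in the table above can be modeled with a short Python sketch. It is not the library's implementation: the 1-based treatment of positive `start` and `index` values, and returning an empty list or `None` for out-of-range input, are assumptions. Only the 1-based `array_position` result and the negative-index behavior are taken from this README.

```python
from typing import List, Optional, Sequence, TypeVar

E = TypeVar("E")


def array_position(values: Sequence[E], target: E) -> int:
    """Return the 1-based position of the first match, or 0 if absent."""
    for position, value in enumerate(values, start=1):
        if value == target:
            return position
    return 0


def array_slice(values: Sequence[E], start: int, length: int) -> List[E]:
    """Return up to `length` elements beginning at `start`.

    Assumption: a positive start is 1-based from the front;
    a negative start counts back from the end.
    """
    if start == 0 or length < 0:
        raise ValueError("start must be non-zero and length non-negative")
    begin = start - 1 if start > 0 else len(values) + start
    if begin < 0 or begin >= len(values):
        return []  # assumption: out-of-range start yields an empty slice
    return list(values[begin:begin + length])


def array_element_at(values: Sequence[E], index: int) -> Optional[E]:
    """Return the element at a 1-based index (assumption); negative counts from the end."""
    position = index - 1 if index > 0 else len(values) + index
    if index == 0 or not 0 <= position < len(values):
        return None  # assumption: out-of-range index yields None
    return values[position]


if __name__ == "__main__":
    data = [16, 13, 12, 13, 18, 16, 9, 18]
    print(array_position(data, 13))
    print(array_slice(data, -2, 3))
    print(array_element_at(data, -1))
```

Note that a slice starting two elements from the end can return at most two elements even when `length` is 3, which is why `length` is described as an upper bound.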
### 3. Map functions

| Function | Description |
|:--|:--|
|map_build(x&lt;K&gt;, y&lt;V&gt;) -> map&lt;K, V&gt;| Creates a map from the given arrays of keys and values.|
|map_concat(x&lt;K, V&gt;, y&lt;K, V&gt;) -> map&lt;K,V&gt; | Returns the union of two maps. If a key appears in both `x` and `y`, the value comes from `y`.|
|map_element_at(map&lt;K, V&gt;, key) -> V | Returns the value for `key` if it exists, otherwise `NULL`.|
|map_equals(x&lt;K, V&gt;, y&lt;K, V&gt;) -> boolean | Returns whether map x and map y are equal.|

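The merge rule for `map_concat` (the value from `y` wins on a shared key) maps directly onto Python dictionaries. The sketch below is a rough model of the documented semantics under that reading, not the library's code; raising an error when the key and value arrays differ in length is an assumption.

```python
from typing import Dict, Hashable, Optional, Sequence, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def map_build(keys: Sequence[K], values: Sequence[V]) -> Dict[K, V]:
    """Pair keys with values positionally."""
    if len(keys) != len(values):
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


def map_concat(x: Dict[K, V], y: Dict[K, V]) -> Dict[K, V]:
    """Union of two maps; on a shared key the value from y is kept."""
    merged = dict(x)
    merged.update(y)
    return merged


def map_element_at(mapping: Dict[K, V], key: K) -> Optional[V]:
    """Value for key, or None (standing in for NULL) when the key is missing."""
    return mapping.get(key)


if __name__ == "__main__":
    left = map_build(["key1", "key2"], [16, 12])
    right = map_build(["key1", "key3"], [17, 18])
    print(map_concat(left, right))
```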
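### 4. Date functions

| Function | Description |
|:--|:--|
|day_of_week(date_string \| date) -> int | Day of the week: returns 1 for Monday through 7 for Sunday, or null on error.|
|day_of_year(date_string \| date) -> int | Day of the year. Values range from 1 to 366.|
|zodiac_en(date_string \| date) -> string | Converts a date to its zodiac sign in English.|
|zodiac_cn(date_string \| date) -> string | Converts a date to its zodiac sign in Chinese.|
|type_of_day(date_string \| date) -> string | Returns the type of the day (1: public holiday, 2: regular weekend, 3: regular workday, 4: make-up workday that compensates for a holiday), or -1 on error.|
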
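The numbering used by `day_of_week` (Monday is 1, Sunday is 7) matches ISO weekday numbering, so results can be sanity-checked with Python's standard library. This is a sketch under the assumption that date strings use the `yyyy-MM-dd` form; `None` stands in for Hive's null.

```python
from datetime import date, datetime
from typing import Optional


def _parse(date_string: str) -> Optional[date]:
    """Parse a yyyy-MM-dd string (format is an assumption); None on failure."""
    try:
        return datetime.strptime(date_string, "%Y-%m-%d").date()
    except ValueError:
        return None


def day_of_week(date_string: str) -> Optional[int]:
    """ISO weekday: Monday is 1, Sunday is 7; None for unparseable input."""
    parsed = _parse(date_string)
    return parsed.isoweekday() if parsed else None


def day_of_year(date_string: str) -> Optional[int]:
    """Ordinal day within the year, from 1 to 366; None for unparseable input."""
    parsed = _parse(date_string)
    return parsed.timetuple().tm_yday if parsed else None


if __name__ == "__main__":
    print(day_of_week("2016-07-12"))
    print(day_of_year("2016-01-01"))
```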
### 5. JSON functions

| Function | Description |
|:--|:--|
|json_array_get(json_array, index) -> varchar | Returns the element at the specified index into the `json_array`. The index is zero-based; a negative index counts from the end.|
|json_array_length(json) -> bigint | Returns the array length of `json` (a string containing a JSON array).|
|json_array_extract(json, jsonPath) -> array(varchar) | Applies the given jsonPath to each element of a JSON array and returns the results.|
|json_array_extract_scalar(json, jsonPath) -> array(varchar) | Like `json_array_extract`, but returns the result values as strings (as opposed to being encoded as JSON).|
|json_extract(json, jsonPath) -> varchar | Extracts JSON using the given jsonPath.|
|json_extract_scalar(json, jsonPath) -> varchar | Like `json_extract`, but returns the result value as a string (as opposed to being encoded as JSON).|
|json_size(json, jsonPath) -> bigint | Like `json_extract`, but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.|

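The indexing rule of `json_array_get` and the sizing rule of `json_size` can be modeled in a few lines of Python. This is an illustrative sketch, not the library's implementation: `json_size_of` is a hypothetical helper that takes an already-extracted JSON value rather than a jsonPath, and returning `None` for invalid input is an assumption.

```python
import json
from typing import Optional


def json_array_get(json_array: str, index: int) -> Optional[str]:
    """Element at a zero-based index; negative indexes count from the end."""
    try:
        items = json.loads(json_array)
    except ValueError:
        return None  # assumption: invalid JSON yields None
    if not isinstance(items, list):
        return None
    if index < 0:
        index += len(items)
    if not 0 <= index < len(items):
        return None
    item = items[index]
    # String elements are returned bare; other values are re-encoded as compact JSON.
    return item if isinstance(item, str) else json.dumps(item, separators=(",", ":"))


def json_size_of(value_json: str) -> int:
    """Number of members for objects and arrays, zero for scalars."""
    value = json.loads(value_json)
    return len(value) if isinstance(value, (dict, list)) else 0


if __name__ == "__main__":
    print(json_array_get('["c", "b", "a"]', -1))
    print(json_size_of('{"a": 1, "b": 2}'))
```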
### 6. Bitwise functions

| Function | Description |
|:--|:--|
|bit_count(x, bits) -> bigint | Counts the number of bits set in `x` (treated as a `bits`-bit signed integer) in 2’s complement representation.|
|bitwise_and(x, y) -> bigint | Returns the bitwise AND of `x` and `y` in 2’s complement arithmetic.|
|bitwise_not(x) -> bigint | Returns the bitwise NOT of `x` in 2’s complement arithmetic.|
|bitwise_or(x, y) -> bigint | Returns the bitwise OR of `x` and `y` in 2’s complement arithmetic.|
|bitwise_xor(x, y) -> bigint | Returns the bitwise XOR of `x` and `y` in 2’s complement arithmetic.|

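To make the "`bits`-bit signed integer in 2's complement" wording concrete, here is a small Python model of `bit_count`. Masking with `(1 << bits) - 1` reinterprets a negative number as its 2's complement bit pattern of the given width. Rejecting values that do not fit in `bits` bits is an assumption about error handling, not documented behavior.

```python
def bit_count(x: int, bits: int) -> int:
    """Count set bits of x viewed as a `bits`-bit signed 2's complement integer."""
    if not 2 <= bits <= 64:
        raise ValueError("bits must be between 2 and 64")
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= x <= highest:
        # Assumption: out-of-range values are rejected rather than truncated.
        raise ValueError("x does not fit in the requested width")
    # The mask maps a negative x onto its unsigned 2's complement bit pattern.
    return bin(x & ((1 << bits) - 1)).count("1")


if __name__ == "__main__":
    print(bit_count(9, 64))   # 9 is 0b1001
    print(bit_count(-7, 8))   # -7 is 0b11111001 in 8 bits
    # Python's integer operators already follow 2's complement semantics.
    print(19 & 25, 19 | 25, 19 ^ 25, ~5)
```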
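### 7. Chinese ID card functions

| Function | Description |
|:--|:--|
|id_card_province(string) -> string | Returns the province from an ID card number.|
|id_card_city(string) -> string | Returns the city from an ID card number.|
|id_card_area(string) -> string | Returns the district/county from an ID card number.|
|id_card_birthday(string) -> string | Returns the date of birth from an ID card number.|
|id_card_gender(string) -> string | Returns the gender from an ID card number.|
|is_valid_id_card(string) -> boolean | Checks whether an ID card number is valid.|
|id_card_info(string) -> json | Returns information about an ID card number, including province, city, district/county, and so on.|
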
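As background for these functions, the sketch below shows how the publicly documented 18-character ID format (GB 11643-1999, with an ISO 7064 MOD 11-2 check character) encodes a checksum, birth date, and gender. Whether `is_valid_id_card` performs exactly this check, or additional ones such as region-code or date validation, is an assumption; the helper names are hypothetical.

```python
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)


def has_valid_checksum(id_number: str) -> bool:
    """Verify the 18th character against the weighted sum of the first 17 digits."""
    body = id_number[:17]
    if len(id_number) != 18 or not all(ch in "0123456789" for ch in body):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(body, WEIGHTS))
    return CHECK_CHARS[total % 11] == id_number[17].upper()


def birthday_digits(id_number: str) -> str:
    """Characters 7 to 14 hold the birth date as YYYYMMDD."""
    return id_number[6:14]


def is_male(id_number: str) -> bool:
    """The 17th digit is odd for male and even for female."""
    return int(id_number[16]) % 2 == 1


if __name__ == "__main__":
    sample = "110101198901084517"
    print(has_valid_checksum(sample), birthday_digits(sample), is_male(sample))
```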
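### 8. Coordinate system functions

| Function | Description |
|:--|:--|
|wgs_distance(double lat1, double lng1, double lat2, double lng2) -> double | Computes the distance between two WGS84 coordinates, in meters.|
|gcj_to_bd(double,double) -> json | Converts GCJ-02 (Mars coordinate system) to BD-09 (Baidu coordinate system), i.e. Google/Amap (Gaode) -> Baidu.|
|bd_to_gcj(double,double) -> json | Converts BD-09 (Baidu coordinate system) to GCJ-02 (Mars coordinate system), i.e. Baidu -> Google/Amap (Gaode).|
|wgs_to_gcj(double,double) -> json | Converts WGS84 (earth coordinate system) to GCJ-02 (Mars coordinate system).|
|gcj_to_wgs(double,double) -> json | Converts GCJ-02 (Mars coordinate system) to WGS84 (earth coordinate system). The output is accurate to about 1 to 2 meters.|
|gcj_extract_wgs(double,double) -> json | Converts GCJ-02 (Mars coordinate system) to WGS84. The output is accurate to about 0.5 meters, but the computation takes longer than `gcj_to_wgs`.|

> For background on the coordinate systems used by online maps, see [The current state of coordinate systems in online maps](https://github.com/aaronshan/hive-third-functions/tree/master/README-geo.md)
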
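One common way to compute a great-circle distance in meters between two latitude/longitude pairs is the haversine formula, sketched below. This README does not say which formula or Earth radius `wgs_distance` uses, so both the formula and the `EARTH_RADIUS_M` constant are assumptions, and results may differ slightly from the UDF.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; the library's constant may differ (assumption)


def haversine_distance_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lng2 - lng1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(haversine_distance_m(0.0, 0.0, 0.0, 1.0))
```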
### 9. URL functions

| Function | Description |
|:--|:--|
|url_encode(value) -> string | Escapes `value` by encoding it so that it can be safely included in URL query parameter names and values.|
|url_decode(value) -> string | Unescapes the URL-encoded `value`. This function is the inverse of `url_encode`.|

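A rough Python equivalent of this pair, useful for checking encoded values outside Hive, is sketched below using `urllib.parse`. Encoding a space as `+` is an assumption about the UDF's behavior, and the exact set of characters left unescaped may differ from the library.

```python
from urllib.parse import quote_plus, unquote_plus


def url_encode(value: str) -> str:
    """Percent-encode value for use in a query string (space becomes '+', an assumption)."""
    return quote_plus(value, safe="")


def url_decode(value: str) -> str:
    """Inverse of url_encode."""
    return unquote_plus(value)


if __name__ == "__main__":
    encoded = url_encode("http://shanruifeng.cc/")
    print(encoded)
    print(url_decode(encoded))
```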
## Usage

Put the following statements in your `${HOME}/.hiverc` file, or run the ones you need in the Hive CLI.

```
add jar ${jar_location_dir}/hive-third-functions-${version}-shaded.jar;
create temporary function array_contains as 'cc.shanruifeng.functions.array.UDFArrayContains';
create temporary function array_equals as 'cc.shanruifeng.functions.array.UDFArrayEquals';
create temporary function array_intersect as 'cc.shanruifeng.functions.array.UDFArrayIntersect';
create temporary function array_max as 'cc.shanruifeng.functions.array.UDFArrayMax';
create temporary function array_min as 'cc.shanruifeng.functions.array.UDFArrayMin';
create temporary function array_join as 'cc.shanruifeng.functions.array.UDFArrayJoin';
create temporary function array_distinct as 'cc.shanruifeng.functions.array.UDFArrayDistinct';
create temporary function array_position as 'cc.shanruifeng.functions.array.UDFArrayPosition';
create temporary function array_remove as 'cc.shanruifeng.functions.array.UDFArrayRemove';
create temporary function array_reverse as 'cc.shanruifeng.functions.array.UDFArrayReverse';
create temporary function array_sort as 'cc.shanruifeng.functions.array.UDFArraySort';
create temporary function array_concat as 'cc.shanruifeng.functions.array.UDFArrayConcat';
create temporary function array_value_count as 'cc.shanruifeng.functions.array.UDFArrayValueCount';
create temporary function array_slice as 'cc.shanruifeng.functions.array.UDFArraySlice';
create temporary function array_element_at as 'cc.shanruifeng.functions.array.UDFArrayElementAt';
create temporary function bit_count as 'cc.shanruifeng.functions.bitwise.UDFBitCount';
create temporary function bitwise_and as 'cc.shanruifeng.functions.bitwise.UDFBitwiseAnd';
create temporary function bitwise_not as 'cc.shanruifeng.functions.bitwise.UDFBitwiseNot';
create temporary function bitwise_or as 'cc.shanruifeng.functions.bitwise.UDFBitwiseOr';
create temporary function bitwise_xor as 'cc.shanruifeng.functions.bitwise.UDFBitwiseXor';
create temporary function map_build as 'cc.shanruifeng.functions.map.UDFMapBuild';
create temporary function map_concat as 'cc.shanruifeng.functions.map.UDFMapConcat';
create temporary function map_element_at as 'cc.shanruifeng.functions.map.UDFMapElementAt';
create temporary function map_equals as 'cc.shanruifeng.functions.map.UDFMapEquals';
create temporary function day_of_week as 'cc.shanruifeng.functions.date.UDFDayOfWeek';
create temporary function day_of_year as 'cc.shanruifeng.functions.date.UDFDayOfYear';
create temporary function type_of_day as 'cc.shanruifeng.functions.date.UDFTypeOfDay';
create temporary function zodiac_cn as 'cc.shanruifeng.functions.date.UDFZodiacSignCn';
create temporary function zodiac_en as 'cc.shanruifeng.functions.date.UDFZodiacSignEn';
create temporary function pinyin as 'cc.shanruifeng.functions.string.UDFChineseToPinYin';
create temporary function md5 as 'cc.shanruifeng.functions.string.UDFMd5';
create temporary function sha256 as 'cc.shanruifeng.functions.string.UDFSha256';
create temporary function json_array_get as 'cc.shanruifeng.functions.json.UDFJsonArrayGet';
create temporary function json_array_length as 'cc.shanruifeng.functions.json.UDFJsonArrayLength';
create temporary function json_array_extract as 'cc.shanruifeng.functions.json.UDFJsonArrayExtract';
create temporary function json_array_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonArrayExtractScalar';
create temporary function json_extract as 'cc.shanruifeng.functions.json.UDFJsonExtract';
create temporary function json_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonExtractScalar';
create temporary function json_size as 'cc.shanruifeng.functions.json.UDFJsonSize';
create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince';
create temporary function id_card_city as 'cc.shanruifeng.functions.card.UDFChinaIdCardCity';
create temporary function id_card_area as 'cc.shanruifeng.functions.card.UDFChinaIdCardArea';
create temporary function id_card_birthday as 'cc.shanruifeng.functions.card.UDFChinaIdCardBirthday';
create temporary function id_card_gender as 'cc.shanruifeng.functions.card.UDFChinaIdCardGender';
create temporary function is_valid_id_card as 'cc.shanruifeng.functions.card.UDFChinaIdCardValid';
create temporary function id_card_info as 'cc.shanruifeng.functions.card.UDFChinaIdCardInfo';
create temporary function wgs_distance as 'cc.shanruifeng.functions.geo.UDFGeoWgsDistance';
create temporary function gcj_to_bd as 'cc.shanruifeng.functions.geo.UDFGeoGcjToBd';
create temporary function bd_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoBdToGcj';
create temporary function wgs_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoWgsToGcj';
create temporary function gcj_to_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjToWgs';
create temporary function gcj_extract_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjExtractWgs';
create temporary function url_encode as 'cc.shanruifeng.functions.url.UDFUrlEncode';
create temporary function url_decode as 'cc.shanruifeng.functions.url.UDFUrlDecode';
```

You can use the following statement in the Hive CLI to view the details of a function.

```
hive> describe function zodiac_cn;
zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.
```

or

```
hive> describe function extended zodiac_cn;
zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.
Example:
> select zodiac_cn(date_string) from src;
> select zodiac_cn(month, day) from src;
```

### Examples

```
select pinyin('中国') => zhongguo
select md5('aaronshan') => 95686bc0483262afe170b550dd4544d1
select sha256('aaronshan') => d16bb375433ad383169f911afdf45e209eabfcf047ba1faebdd8f6a0b39e0a32
```

```
select day_of_week('2016-07-12') => 2
select day_of_year('2016-01-01') => 1
select type_of_day('2016-10-01') => 1
select type_of_day('2016-07-16') => 2
select type_of_day('2016-07-15') => 3
select type_of_day('2016-09-18') => 4
select zodiac_cn('1989-01-08') => 魔羯座
select zodiac_en('1989-01-08') => Capricorn
```

```
select array_contains(array(16,12,18,9), 12) => true
select array_equals(array(16,12,18,9), array(16,12,18,9)) => true
select array_intersect(array(16,12,18,9,null), array(14,9,6,18,null)) => [null,9,18]
select array_max(array(16,13,12,13,18,16,9,18)) => 18
select array_min(array(16,12,18,9)) => 9
select array_join(array(16,12,18,9,null), '#','=') => 16#12#18#9#=
select array_distinct(array(16,13,12,13,18,16,9,18)) => [9,12,13,16,18]
select array_position(array(16,13,12,13,18,16,9,18), 13) => 2
select array_remove(array(16,13,12,13,18,16,9,18), 13) => [16,12,18,16,9,18]
select array_reverse(array(16,12,18,9)) => [9,18,12,16]
select array_sort(array(16,13,12,13,18,16,9,18)) => [9,12,13,13,16,16,18,18]
select array_concat(array(16,12,18,9,null), array(14,9,6,18,null)) => [16,12,18,9,null,14,9,6,18,null]
select array_value_count(array(16,13,12,13,18,16,9,18), 13) => 2
select array_slice(array(16,13,12,13,18,16,9,18), -2, 3) => [9,18]
select array_element_at(array(16,13,12,13,18,16,9,18), -1) => 18
```

```
select map_build(array('key1','key2'), array(16,12)) => {"key1":16,"key2":12}
select map_concat(map_build(array('key1','key2'), array(16,12)), map_build(array('key1','key3'), array(17,18))) => {"key1":17,"key2":12,"key3":18}
select map_element_at(map_build(array('key1','key2'), array(16,12)), 'key1') => 16
select map_equals(map_build(array('key1','key2'), array(16,12)), map_build(array('key1','key2'), array(16,12))) => true
```

```
select id_card_info('110101198901084517') => {"valid":true,"area":"东城区","province":"北京市","gender":"男","city":"北京市"}
```

```
select json_array_get("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", 1); => {"a":{"b":"18"}}
select json_array_get('["a", "b", "c"]', 0); => a
select json_array_get('["a", "b", "c"]', 1); => b
select json_array_get('["c", "b", "a"]', -1); => a
select json_array_get('["c", "b", "a"]', -2); => b
select json_array_get('[]', 0); => null
select json_array_get('["a", "b", "c"]', 10); => null
select json_array_get('["c", "b", "a"]', -10); => null
select json_array_length("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]"); => 3
select json_array_extract("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", "$.a.b"); => ["\"13\"","\"18\"","\"12\""]
select json_array_extract_scalar("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", "$.a.b") => ["13","18","12"]
select json_extract("{\"a\":{\"b\":\"12\"}}", "$.a.b"); => "12"
select json_extract_scalar("{\"a\":{\"b\":\"12\"}}", "$.a.b") => 12
select json_extract_scalar('[1, 2, 3]', '$[2]');
select json_extract_scalar(json, '$.store.book[0].author');
select json_size('{"x": {"a": 1, "b": 2}}', '$.x'); => 2
select json_size('{"x": [1, 2, 3]}', '$.x'); => 3
select json_size('{"x": {"a": 1, "b": 2}}', '$.x.a'); => 0
```

```
select gcj_to_bd(39.915, 116.404) => {"lng":116.41036949371029,"lat":39.92133699351022}
select bd_to_gcj(39.915, 116.404) => {"lng":116.39762729119315,"lat":39.90865673957631}
select wgs_to_gcj(39.915, 116.404) => {"lng":116.41024449916938,"lat":39.91640428150164}
select gcj_to_wgs(39.915, 116.404) => {"lng":116.39775550083061,"lat":39.91359571849836}
select gcj_extract_wgs(39.915, 116.404) => {"lng":116.39775549316407,"lat":39.913596801757805}
```

```
select url_encode('http://shanruifeng.cc/') => http%3A%2F%2Fshanruifeng.cc%2F
```

README.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # hive-third-functions
 
 [![Build Status](https://travis-ci.org/aaronshan/hive-third-functions.svg?branch=master)](https://travis-ci.org/aaronshan/hive-third-functions)
+[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://github.com/aaronshan/hive-third-functions/tree/master/README.md)
+[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://github.com/aaronshan/hive-third-functions/tree/master/README-zh.md)
 [![Release](https://img.shields.io/github/release/aaronshan/hive-third-functions.svg)](https://github.com/aaronshan/hive-third-functions/releases)
 
 ## Introduction
