
Commit 487f215

Next release version (#1)

* Prepare the next release version and add new UDF classes.
* Remove the fastutil dependency and copy the `IntArrays` code directly into the project to reduce the size of the generated JAR file.
* Add bitwise functions.
* Add map functions.
* Add `day_of_year`, geo, and URL functions, plus some unit tests.

1 parent 677b542 commit 487f215


51 files changed (+1652, -36 lines)

README-geo.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
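+## Current state of coordinate systems used by online maps
+### Earth coordinates (WGS84)
+- International standard; the coordinate system of data read from GPS devices
+- The coordinate system used by international map providers
+
+### Mars coordinates (GCJ-02), also called the State Bureau of Surveying and Mapping coordinate system
+- Chinese standard; location data obtained on mobile devices sold in mainland China uses this coordinate system
+- By national regulation, every map system published in China (including electronic maps) must at least apply GCJ-02 as a first layer of encryption to geographic positions.
+
+### Baidu coordinates (BD-09)
+- Baidu standard; used by the Baidu SDK, Baidu Maps, and Baidu Geocoding
+- (As if things were not messy enough already, Baidu adds a second layer of encryption on top of Mars coordinates.)
+
+## Things to watch out for during development
+- Getting latitude/longitude (GPS) coordinates from a device
+    * With the Baidu SDK, you can get Baidu coordinates (BD-09) or Mars coordinates (GCJ-02); the default is BD-09
+    * With the native iOS location library, the coordinates are WGS84
+    * With the AMap (Gaode) SDK, the coordinates are GCJ-02
+- Coordinate systems used by online maps
+    * Mars coordinates (GCJ-02):
+        + iOS Maps (actually AMap)
+        + Google Maps
+        + Soso, Aliyun, and AMap
+    * Baidu coordinates (BD-09):
+        + Only Baidu Maps, of course
+    * WGS84:
+        + International standard; Google Maps outside China, OSM, and other international maps generally use it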
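The notes above describe BD-09 as Baidu's second layer of offset on top of GCJ-02. The diff does not show how `UDFGeoGcjToBd` and `UDFGeoBdToGcj` are implemented, so the following is only a minimal Java sketch that assumes the widely circulated public GCJ-02/BD-09 formula. The class and method names are hypothetical, and the inverse is approximate rather than exact.

```java
// Hypothetical sketch (not the project's UDFGeoGcjToBd / UDFGeoBdToGcj code): GCJ-02 <-> BD-09
// using the commonly published offset formula. Points are {lat, lng} arrays in degrees.
final class BdGcjSketch {
    // pi * 3000 / 180, the scale factor used by the published BD-09 formula.
    private static final double X_PI = Math.PI * 3000.0 / 180.0;

    // GCJ-02 -> BD-09: perturb the polar radius and angle, then shift by a fixed offset.
    static double[] gcjToBd(double lat, double lng) {
        double z = Math.sqrt(lng * lng + lat * lat) + 0.00002 * Math.sin(lat * X_PI);
        double theta = Math.atan2(lat, lng) + 0.000003 * Math.cos(lng * X_PI);
        return new double[] {z * Math.sin(theta) + 0.006, z * Math.cos(theta) + 0.0065};
    }

    // BD-09 -> GCJ-02: undo the shift and the perturbation (approximate, not an exact inverse).
    static double[] bdToGcj(double lat, double lng) {
        double x = lng - 0.0065;
        double y = lat - 0.006;
        double z = Math.sqrt(x * x + y * y) - 0.00002 * Math.sin(y * X_PI);
        double theta = Math.atan2(y, x) - 0.000003 * Math.cos(x * X_PI);
        return new double[] {z * Math.sin(theta), z * Math.cos(theta)};
    }

    public static void main(String[] args) {
        double[] bd = gcjToBd(39.915, 116.404);
        System.out.println("BD-09 lat=" + bd[0] + ", lng=" + bd[1]);
        double[] back = bdToGcj(bd[0], bd[1]);
        System.out.println("round trip lat=" + back[0] + ", lng=" + back[1]);
    }
}
```

With this formula, `gcjToBd(39.915, 116.404)` lands within a millionth of a degree of the README's `gcj_to_bd` example output, which suggests (but does not prove) that the project uses the same formula. The round trip is not exact: the inverse evaluates the sine and cosine terms at the shifted point, so it drifts by a small fraction of a thousandth of a degree.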

README.md

Lines changed: 90 additions & 11 deletions
@@ -35,7 +35,7 @@ It will generate hive-third-functions-${version}-shaded.jar in target directory.
 
 You can also download the file directly from the [release page](https://github.com/aaronshan/hive-third-functions/releases).
 
-> current latest version is `2.1.0`
+> current latest version is `2.2.0`
 
 ## Functions
 
@@ -51,31 +51,41 @@ You can also directly download file from [release page](https://github.com/aaron
 
 | function| description |
 |:--|:--|
-|array_contains(array<E>, E) -> boolean | whether array contains value or not.|
+|array_contains(array&lt;E&gt;, E) -> boolean | whether array contains value or not.|
 |array_intersect(array, array) -> array | returns the intersection of the two arrays, without duplicates.|
-|array_max(array<E>) -> E | returns the maximum value of input array.|
-|array_min(array<E>) -> E | returns the minimum value of input array.|
+|array_max(array&lt;E&gt;) -> E | returns the maximum value of input array.|
+|array_min(array&lt;E&gt;) -> E | returns the minimum value of input array.|
 |array_join(array, delimiter, null_replacement) -> string | concatenates the elements of the given array using the delimiter and an optional `null_replacement` to replace nulls.|
 |array_distinct(array) -> array | removes duplicate values from the array.|
-|array_position(array<E>, E) -> long | returns the position of the first occurrence of the element in array (or 0 if not found).|
-|array_remove(array<E>, E) -> array | removes all elements equal to the given element from the array.|
+|array_position(array&lt;E&gt;, E) -> long | returns the position of the first occurrence of the element in array (or 0 if not found).|
+|array_remove(array&lt;E&gt;, E) -> array | removes all elements equal to the given element from the array.|
 |array_reverse(array) -> array | reverses the order of the array elements.|
 |array_sort(array) -> array | sorts and returns the array. The elements of array must be orderable.|
 |array_concat(array, array) -> array | concatenates two arrays.|
-|array_value_count(array<E>, E) -> long | counts the elements of the array that equal the given value.|
+|array_value_count(array&lt;E&gt;, E) -> long | counts the elements of the array that equal the given value.|
 |array_slice(array, start, length) -> array | subsets the array starting from index `start` (or starting from the end if `start` is negative) with a length of `length`.|
-|array_element_at(array<E>, index) -> E | returns the element of the array at the given index. If index < 0, `array_element_at` accesses elements from the last to the first.|
+|array_element_at(array&lt;E&gt;, index) -> E | returns the element of the array at the given index. If index < 0, `array_element_at` accesses elements from the last to the first.|
 
-### 3. date functions
+### 3. map functions
+| function| description |
+|:--|:--|
+|map_build(array&lt;K&gt;, array&lt;V&gt;) -> map&lt;K, V&gt;| returns a map created using the given key/value arrays.|
+|map_concat(x&lt;K, V&gt;, y&lt;K, V&gt;) -> map&lt;K,V&gt; | returns the union of two maps. If a key is found in both `x` and `y`, that key’s value in the resulting map comes from `y`.|
+|map_element_at(map&lt;K, V&gt;, key) -> V | returns the value for the given `key`, or `NULL` if the key is not contained in the map.|
+|map_equals(x&lt;K, V&gt;, y&lt;K, V&gt;) -> boolean | whether map `x` equals map `y`.|
+
+### 4. date functions
 
 | function| description |
 |:--|:--|
 |day_of_week(date_string \| date) -> int | day of week: returns 1 for Monday through 7 for Sunday, or null on error.|
+|day_of_year(date_string \| date) -> int | day of year. The value ranges from 1 to 366.|
 |zodiac_en(date_string \| date) -> string | converts a date to its zodiac sign.|
 |zodiac_cn(date_string \| date) -> string | converts a date to its zodiac sign, in Chinese.|
 |type_of_day(date_string \| date) -> string | for China: returns the type of the date (1: statutory holiday, 2: regular weekend, 3: regular workday, 4: make-up workday, i.e. a weekend day worked in exchange for holiday time off); returns -1 on error.|
 
-### 4. json functions
+### 5. json functions
+
 | function| description |
 |:--|:--|
 |json_array_get(json, jsonPath) -> array(varchar) |returns the element at the specified index into the `json_array`. The index is zero-based.|
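The map functions table describes `map_build`, `map_concat`, `map_element_at`, and `map_equals` in prose. A minimal Java sketch of those semantics on plain `java.util` collections follows. It assumes that a repeated key passed to `map_build` keeps its last value (the README does not say) and that key order does not matter for equality; `MapFunctionsSketch` is a hypothetical name, while the project's UDF classes work on Hive `ObjectInspector`s instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the documented map function semantics, not the project's UDF code.
final class MapFunctionsSketch {
    // map_build: pair keys.get(i) with values.get(i); a repeated key keeps its last value (assumption).
    static <K, V> Map<K, V> mapBuild(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("key and value arrays must have the same length");
        }
        Map<K, V> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), values.get(i));
        }
        return result;
    }

    // map_concat: union of x and y; when a key is in both, the value from y wins.
    static <K, V> Map<K, V> mapConcat(Map<K, V> x, Map<K, V> y) {
        Map<K, V> result = new LinkedHashMap<>(x);
        result.putAll(y);
        return result;
    }

    // map_element_at: the value for key, or null if the key is absent.
    static <K, V> V mapElementAt(Map<K, V> map, K key) {
        return map.get(key);
    }

    // map_equals: same keys mapped to equal values (Map.equals ignores iteration order).
    static <K, V> boolean mapEquals(Map<K, V> x, Map<K, V> y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = mapBuild(Arrays.asList("key1", "key2"), Arrays.asList(16, 12));
        Map<String, Integer> b = mapBuild(Arrays.asList("key1", "key3"), Arrays.asList(17, 18));
        System.out.println(mapConcat(a, b)); // {key1=17, key2=12, key3=18}
        System.out.println(mapElementAt(a, "key1")); // 16
    }
}
```

Using `LinkedHashMap` keeps insertion order, so the printed `mapConcat` result lines up with the README's `{"key1":17,"key2":12,"key3":18}` example.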
@@ -86,7 +96,17 @@ You can also directly download file from [release page](https://github.com/aaron
 |json_extract_scalar(json, jsonPath) -> array(varchar) |like `json_extract`, but returns the result value as a string (as opposed to being encoded as JSON).|
 |json_size(json, jsonPath) -> array(varchar) |like `json_extract`, but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.|
 
-### 5. china id card functions
+### 6. bitwise functions
+
+| function| description |
+|:--|:--|
+|bit_count(x, bits) -> bigint | counts the number of bits set in `x` (treated as a `bits`-bit signed integer) in 2’s complement representation.|
+|bitwise_and(x, y) -> bigint | returns the bitwise AND of `x` and `y` in 2’s complement arithmetic.|
+|bitwise_not(x) -> bigint | returns the bitwise NOT of `x` in 2’s complement arithmetic.|
+|bitwise_or(x, y) -> bigint | returns the bitwise OR of `x` and `y` in 2’s complement arithmetic.|
+|bitwise_xor(x, y) -> bigint | returns the bitwise XOR of `x` and `y` in 2’s complement arithmetic.|
+
+### 7. china id card functions
 
 | function| description |
 |:--|:--|
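The bitwise descriptions mirror Presto's wording. Assuming the same semantics, including Presto-style range checks that `bits` is between 2 and 64 and that `x` fits in that many signed bits, a minimal Java sketch of `bit_count` looks like this. `BitwiseSketch` is a hypothetical name; the `bitwise_*` functions map directly onto Java's `&`, `~`, `|`, and `^` on `long`.

```java
// Hypothetical sketch of the documented bitwise semantics, not the project's UDFBitCount code.
final class BitwiseSketch {
    // bit_count(x, bits): number of 1 bits in x viewed as a bits-bit two's-complement integer.
    static long bitCount(long x, int bits) {
        if (bits < 2 || bits > 64) {
            throw new IllegalArgumentException("bits must be between 2 and 64");
        }
        if (bits == 64) {
            return Long.bitCount(x); // all 64 bits of the long are significant
        }
        long min = -(1L << (bits - 1));    // smallest value representable in `bits` signed bits
        long max = (1L << (bits - 1)) - 1; // largest value representable in `bits` signed bits
        if (x < min || x > max) {
            throw new IllegalArgumentException(x + " cannot be represented with " + bits + " bits");
        }
        long mask = (1L << bits) - 1; // keep only the low `bits` bits of the sign-extended value
        return Long.bitCount(x & mask);
    }

    static long bitwiseAnd(long x, long y) { return x & y; }
    static long bitwiseNot(long x) { return ~x; }
    static long bitwiseOr(long x, long y) { return x | y; }
    static long bitwiseXor(long x, long y) { return x ^ y; }

    public static void main(String[] args) {
        System.out.println(bitCount(9, 64));  // 2
        System.out.println(bitCount(-7, 64)); // 62
        System.out.println(bitCount(-7, 8));  // 6: -7 in 8 bits is 11111001
    }
}
```

Masking to the low `bits` bits is what makes the result depend on `bits` for negative inputs: the sign-extension ones above bit `bits - 1` are dropped before counting.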
@@ -98,6 +118,27 @@ You can also directly download file from [release page](https://github.com/aaron
 |is_valid_id_card(string) -> boolean |determines whether the string is a valid China ID card number.|
 |id_card_info(string) -> json |returns China ID card info, including province, city, area, etc.|
 
+### 8. geographic functions
+
+| function| description |
+|:--|:--|
+|wgs_distance(double lat1, double lng1, double lat2, double lng2) -> double | calculates the distance between two WGS84 coordinates, in meters.|
+|gcj_to_bd(double lat, double lng) -> json | converts GCJ-02 (Mars coordinates) to BD-09 (Baidu coordinates), i.e. Google/AMap to Baidu.|
+|bd_to_gcj(double lat, double lng) -> json | converts BD-09 (Baidu coordinates) to GCJ-02 (Mars coordinates), i.e. Baidu to Google/AMap.|
+|wgs_to_gcj(double lat, double lng) -> json | converts WGS84 (Earth coordinates) to GCJ-02 (Mars coordinates).|
+|gcj_to_wgs(double lat, double lng) -> json | converts GCJ-02 (Mars coordinates) to WGS84 (Earth coordinates); the output is accurate to within 1 to 2 meters.|
+|gcj_extract_wgs(double lat, double lng) -> json | converts GCJ-02 (Mars coordinates) to WGS84; the output is accurate to within 0.5 meters, but the computation takes longer than `gcj_to_wgs`.|
+
+> For an explanation of the coordinate systems used by online maps, see: [Current state of coordinate systems used by online maps](https://github.com/aaronshan/hive-third-functions/tree/master/README-geo.md)
+
+
+### 9. url functions
+
+| function| description |
+|:--|:--|
+|url_encode(value) -> string | escapes `value` by encoding it so that it can be safely included in URL query parameter names and values.|
+|url_decode(value) -> string | unescapes the URL-encoded `value`. This function is the inverse of `url_encode`.|
+
 ## Use
 
 Put these statements into `${HOME}/.hiverc` or execute them in the Hive CLI.
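The geographic table lists `wgs_to_gcj` and two GCJ-02 to WGS84 inverses with different accuracy. The commit does not show the UDF bodies, so below is a minimal Java sketch built on the widely circulated public GCJ-02 offset formula with Krasovsky 1940 ellipsoid constants; treating that formula as the project's is an assumption. It shows why a one-step inverse is only meter-accurate: it evaluates the offset at the GCJ-02 point rather than at the unknown WGS84 point. The iterative refinement is one possible way to tighten the inverse, not necessarily what `gcj_extract_wgs` does. All names are hypothetical, and the usual "outside China, no offset" bounding-box check is omitted.

```java
// Hypothetical sketch of WGS84 <-> GCJ-02 using the widely circulated public offset formula.
// Points are {lat, lng} arrays in degrees. Not the project's UDFGeo* code.
final class WgsGcjSketch {
    private static final double A = 6378245.0;               // Krasovsky 1940 semi-major axis, meters
    private static final double EE = 0.00669342162296594323; // Krasovsky 1940 first eccentricity squared

    private static double transformLat(double x, double y) {
        double ret = -100.0 + 2.0 * x + 3.0 * y + 0.2 * y * y + 0.1 * x * y + 0.2 * Math.sqrt(Math.abs(x));
        ret += (20.0 * Math.sin(6.0 * x * Math.PI) + 20.0 * Math.sin(2.0 * x * Math.PI)) * 2.0 / 3.0;
        ret += (20.0 * Math.sin(y * Math.PI) + 40.0 * Math.sin(y / 3.0 * Math.PI)) * 2.0 / 3.0;
        ret += (160.0 * Math.sin(y / 12.0 * Math.PI) + 320.0 * Math.sin(y * Math.PI / 30.0)) * 2.0 / 3.0;
        return ret;
    }

    private static double transformLng(double x, double y) {
        double ret = 300.0 + x + 2.0 * y + 0.1 * x * x + 0.1 * x * y + 0.1 * Math.sqrt(Math.abs(x));
        ret += (20.0 * Math.sin(6.0 * x * Math.PI) + 20.0 * Math.sin(2.0 * x * Math.PI)) * 2.0 / 3.0;
        ret += (20.0 * Math.sin(x * Math.PI) + 40.0 * Math.sin(x / 3.0 * Math.PI)) * 2.0 / 3.0;
        ret += (150.0 * Math.sin(x / 12.0 * Math.PI) + 300.0 * Math.sin(x / 30.0 * Math.PI)) * 2.0 / 3.0;
        return ret;
    }

    // WGS84 -> GCJ-02: add a position-dependent offset, converted from meters to degrees.
    static double[] wgsToGcj(double lat, double lng) {
        double dLat = transformLat(lng - 105.0, lat - 35.0);
        double dLng = transformLng(lng - 105.0, lat - 35.0);
        double radLat = lat / 180.0 * Math.PI;
        double magic = Math.sin(radLat);
        magic = 1 - EE * magic * magic;
        double sqrtMagic = Math.sqrt(magic);
        dLat = (dLat * 180.0) / ((A * (1 - EE)) / (magic * sqrtMagic) * Math.PI);
        dLng = (dLng * 180.0) / (A / sqrtMagic * Math.cos(radLat) * Math.PI);
        return new double[] {lat + dLat, lng + dLng};
    }

    // One-step inverse: evaluate the offset at the GCJ-02 point itself and subtract it.
    static double[] gcjToWgs(double lat, double lng) {
        double[] shifted = wgsToGcj(lat, lng);
        return new double[] {2 * lat - shifted[0], 2 * lng - shifted[1]};
    }

    // Iterative inverse: nudge a WGS84 guess until its forward transform matches the GCJ-02 input.
    static double[] gcjToWgsIterative(double lat, double lng, int maxIterations, double tolerance) {
        double wLat = lat;
        double wLng = lng;
        for (int i = 0; i < maxIterations; i++) {
            double[] g = wgsToGcj(wLat, wLng);
            double errLat = g[0] - lat;
            double errLng = g[1] - lng;
            if (Math.abs(errLat) < tolerance && Math.abs(errLng) < tolerance) {
                break;
            }
            wLat -= errLat;
            wLng -= errLng;
        }
        return new double[] {wLat, wLng};
    }

    public static void main(String[] args) {
        double[] gcj = wgsToGcj(39.915, 116.404);
        System.out.println("GCJ-02 lat=" + gcj[0] + ", lng=" + gcj[1]);
        double[] exact = gcjToWgsIterative(39.915, 116.404, 20, 1e-10);
        System.out.println("WGS84 (iterative) lat=" + exact[0] + ", lng=" + exact[1]);
    }
}
```

The first iteration of `gcjToWgsIterative` is exactly `gcjToWgs`. The README examples fit this picture: for the input (39.915, 116.404), the `gcj_to_wgs` and `wgs_to_gcj` outputs add up to twice the input (116.39775550083061 + 116.41024449916938 ≈ 2 × 116.404).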
@@ -118,7 +159,17 @@ create temporary function array_concat as 'cc.shanruifeng.functions.array.UDFArr
 create temporary function array_value_count as 'cc.shanruifeng.functions.array.UDFArrayValueCount';
 create temporary function array_slice as 'cc.shanruifeng.functions.array.UDFArraySlice';
 create temporary function array_element_at as 'cc.shanruifeng.functions.array.UDFArrayElementAt';
+create temporary function bit_count as 'cc.shanruifeng.functions.bitwise.UDFBitCount';
+create temporary function bitwise_and as 'cc.shanruifeng.functions.bitwise.UDFBitwiseAnd';
+create temporary function bitwise_not as 'cc.shanruifeng.functions.bitwise.UDFBitwiseNot';
+create temporary function bitwise_or as 'cc.shanruifeng.functions.bitwise.UDFBitwiseOr';
+create temporary function bitwise_xor as 'cc.shanruifeng.functions.bitwise.UDFBitwiseXor';
+create temporary function map_build as 'cc.shanruifeng.functions.map.UDFMapBuild';
+create temporary function map_concat as 'cc.shanruifeng.functions.map.UDFMapConcat';
+create temporary function map_element_at as 'cc.shanruifeng.functions.map.UDFMapElementAt';
+create temporary function map_equals as 'cc.shanruifeng.functions.map.UDFMapEquals';
 create temporary function day_of_week as 'cc.shanruifeng.functions.date.UDFDayOfWeek';
+create temporary function day_of_year as 'cc.shanruifeng.functions.date.UDFDayOfYear';
 create temporary function type_of_day as 'cc.shanruifeng.functions.date.UDFTypeOfDay';
 create temporary function zodiac_cn as 'cc.shanruifeng.functions.date.UDFZodiacSignCn';
 create temporary function zodiac_en as 'cc.shanruifeng.functions.date.UDFZodiacSignEn';
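Among the date functions registered above, `day_of_week` is documented as returning 1 for Monday through 7 for Sunday (null on error) and the new `day_of_year` as a value from 1 to 366. A minimal `java.time` sketch of those semantics follows, assuming ISO `yyyy-MM-dd` input strings; the UDFs also accept `date` values, which this sketch ignores, and `DateFunctionsSketch` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical sketch of the documented date semantics, not the project's UDFDayOfWeek/UDFDayOfYear code.
final class DateFunctionsSketch {
    // day_of_week: ISO numbering, Monday = 1 ... Sunday = 7; null when the string does not parse.
    static Integer dayOfWeek(String date) {
        try {
            return LocalDate.parse(date).getDayOfWeek().getValue();
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // day_of_year: 1 ... 365, or 366 on December 31 of a leap year; null when the string does not parse.
    static Integer dayOfYear(String date) {
        try {
            return LocalDate.parse(date).getDayOfYear();
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(dayOfWeek("2016-07-12")); // 2 (a Tuesday)
        System.out.println(dayOfYear("2016-01-01")); // 1
        System.out.println(dayOfYear("2016-12-31")); // 366 (2016 is a leap year)
    }
}
```

`DayOfWeek.getValue()` already follows the ISO Monday = 1 convention, so no remapping is needed.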
@@ -139,6 +190,14 @@ create temporary function id_card_birthday as 'cc.shanruifeng.functions.card.UDF
 create temporary function id_card_gender as 'cc.shanruifeng.functions.card.UDFChinaIdCardGender';
 create temporary function is_valid_id_card as 'cc.shanruifeng.functions.card.UDFChinaIdCardValid';
 create temporary function id_card_info as 'cc.shanruifeng.functions.card.UDFChinaIdCardInfo';
+create temporary function wgs_distance as 'cc.shanruifeng.functions.geo.UDFGeoWgsDistance';
+create temporary function gcj_to_bd as 'cc.shanruifeng.functions.geo.UDFGeoGcjToBd';
+create temporary function bd_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoBdToGcj';
+create temporary function wgs_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoWgsToGcj';
+create temporary function gcj_to_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjToWgs';
+create temporary function gcj_extract_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjExtractWgs';
+create temporary function url_encode as 'cc.shanruifeng.functions.url.UDFUrlEncode';
+create temporary function url_decode as 'cc.shanruifeng.functions.url.UDFUrlDecode';
 ```
 
 You can use these statements in the Hive CLI to get the details of a function.
@@ -166,6 +225,7 @@ Example:
 
 ```
 select day_of_week('2016-07-12') => 2
+select day_of_year('2016-01-01') => 1
 select type_of_day('2016-10-01') => 1
 select type_of_day('2016-07-16') => 2
 select type_of_day('2016-07-15') => 3
@@ -191,6 +251,13 @@ select array_slice(array(16,13,12,13,18,16,9,18), -2, 3) => [9,18]
 select array_element_at(array(16,13,12,13,18,16,9,18), -1) => 18
 ```
 
+```
+select map_build(array('key1','key2'), array(16,12)) => {"key1":16,"key2":12}
+select map_concat(map_build(array('key1','key2'), array(16,12)), map_build(array('key1','key3'), array(17,18))) => {"key1":17,"key2":12,"key3":18}
+select map_element_at(map_build(array('key1','key2'), array(16,12)), 'key1') => 16
+select map_equals(map_build(array('key1','key2'), array(16,12)), map_build(array('key1','key2'), array(16,12))) => true
+```
+
 ```
 select id_card_info('110101198901084517') => {"valid":true,"area":"东城区","province":"北京市","gender":"男","city":"北京市"}
 ```
@@ -215,3 +282,15 @@ select json_size('{"x": {"a": 1, "b": 2}}', '$.x'); => 2
 select json_size('{"x": [1, 2, 3]}', '$.x'); => 3
 select json_size('{"x": {"a": 1, "b": 2}}', '$.x.a'); => 0
 ```
+
+```
+select gcj_to_bd(39.915, 116.404) => {"lng":116.41036949371029,"lat":39.92133699351022}
+select bd_to_gcj(39.915, 116.404) => {"lng":116.39762729119315,"lat":39.90865673957631}
+select wgs_to_gcj(39.915, 116.404) => {"lng":116.41024449916938,"lat":39.91640428150164}
+select gcj_to_wgs(39.915, 116.404) => {"lng":116.39775550083061,"lat":39.91359571849836}
+select gcj_extract_wgs(39.915, 116.404) => {"lng":116.39775549316407,"lat":39.913596801757805}
+```
+
+```
+select url_encode('http://shanruifeng.cc/') => http%3A%2F%2Fshanruifeng.cc%2F
+```
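The `url_encode` example above percent-encodes `:` and `/` but leaves `.` alone. Java's `URLEncoder` and `URLDecoder` with UTF-8 produce exactly that output; whether `UDFUrlEncode` and `UDFUrlDecode` use them is an assumption. Note that these classes apply `application/x-www-form-urlencoded` rules, so a space becomes `+` rather than `%20`. A minimal sketch, with the hypothetical class name `UrlSketch`:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical sketch of url_encode / url_decode on top of the JDK's form-encoding helpers.
final class UrlSketch {
    static String urlEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported by the JVM", e);
        }
    }

    static String urlDecode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported by the JVM", e);
        }
    }

    public static void main(String[] args) {
        String encoded = urlEncode("http://shanruifeng.cc/");
        System.out.println(encoded);            // http%3A%2F%2Fshanruifeng.cc%2F
        System.out.println(urlDecode(encoded)); // http://shanruifeng.cc/
        System.out.println(urlEncode("a b"));   // a+b (form encoding, not %20)
    }
}
```

The `String` charset-name overloads are used so the sketch also compiles on Java 8; on Java 10 and later, the `Charset` overloads avoid the checked exception.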

pom.xml

Lines changed: 8 additions & 7 deletions
@@ -6,7 +6,7 @@
 
     <groupId>cc.shanruifeng</groupId>
     <artifactId>hive-third-functions</artifactId>
-    <version>2.1.0</version>
+    <version>2.2.0</version>
 
     <properties>
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -23,7 +23,7 @@
         <dep.airlift.version>0.131</dep.airlift.version>
         <dep.jackson.version>2.4.4</dep.jackson.version>
         <dep.jmh.version>1.9.3</dep.jmh.version>
-        <dep.fastutil.version>6.5.9</dep.fastutil.version>
+        <junit.version>4.12</junit.version>
     </properties>
 
     <dependencyManagement>
@@ -83,9 +83,9 @@
             </dependency>
 
             <dependency>
-                <groupId>it.unimi.dsi</groupId>
-                <artifactId>fastutil</artifactId>
-                <version>${dep.fastutil.version}</version>
+                <groupId>junit</groupId>
+                <artifactId>junit</artifactId>
+                <version>${junit.version}</version>
             </dependency>
         </dependencies>
     </dependencyManagement>
@@ -139,8 +139,9 @@
         </dependency>
 
         <dependency>
-            <groupId>it.unimi.dsi</groupId>
-            <artifactId>fastutil</artifactId>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <scope>test</scope>
         </dependency>
     </dependencies>
 
src/main/java/cc/shanruifeng/functions/array/UDFArrayConcat.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
         // Check if two arguments were passed
         if (arguments.length != ARG_COUNT) {
             throw new UDFArgumentLengthException(
-                    "The function array_concat(array, array) takes exactly " + ARG_COUNT + "arguments.");
+                    "The function array_concat(array, array) takes exactly " + ARG_COUNT + " arguments.");
         }
 
         // Check if both arguments are of category LIST
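The `UDFArrayConcat` fix, like the similar ones in the other array UDFs, restores the missing space between the argument count and the word `arguments`. One way to make that kind of concatenation slip less likely is to build these messages from a single format string. The helper below is hypothetical and not part of this commit; a UDF could call something like `ArgCountMessages.exactly("array_concat(array, array)", ARG_COUNT)` when throwing `UDFArgumentLengthException`.

```java
// Hypothetical helper for argument-count error messages; not part of the project.
final class ArgCountMessages {
    // "takes exactly N argument(s)." with fixed spacing and correct pluralization.
    static String exactly(String signature, int count) {
        return String.format("The function %s takes exactly %d argument%s.",
                signature, count, count == 1 ? "" : "s");
    }

    // For functions with an optional trailing argument, such as array_join.
    static String oneOf(String signatures, int min, int max) {
        return String.format("The function %s takes exactly %d or %d arguments.", signatures, min, max);
    }

    public static void main(String[] args) {
        System.out.println(exactly("array_concat(array, array)", 2));
        System.out.println(exactly("array_distinct(array)", 1));
        System.out.println(oneOf("array_join(array, delimiter) or array_join(array, delimiter, null_replacement)", 2, 3));
    }
}
```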

src/main/java/cc/shanruifeng/functions/array/UDFArrayContains.java

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
         // Check if two arguments were passed
         if (arguments.length != ARG_COUNT) {
             throw new UDFArgumentLengthException(
-                    "The function array_contains(array, value) takes exactly " + ARG_COUNT + "arguments.");
+                    "The function array_contains(array, value) takes exactly " + ARG_COUNT + " arguments.");
         }
 
         // Check if ARRAY_IDX argument is of category LIST

src/main/java/cc/shanruifeng/functions/array/UDFArrayDistinct.java

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 package cc.shanruifeng.functions.array;
 
-import it.unimi.dsi.fastutil.ints.IntArrays;
+import cc.shanruifeng.functions.fastuitl.ints.IntArrays;
 import java.util.ArrayList;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
@@ -38,7 +38,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
         // Check if exactly one argument was passed
         if (arguments.length != ARG_COUNT) {
             throw new UDFArgumentLengthException(
-                    "The function array_distinct(array) takes exactly " + ARG_COUNT + "arguments.");
+                    "The function array_distinct(array) takes exactly " + ARG_COUNT + " arguments.");
         }
 
         // Check if the argument is of category LIST

src/main/java/cc/shanruifeng/functions/array/UDFArrayElementAt.java

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
         // Check if two arguments were passed
         if (arguments.length != ARG_COUNT) {
             throw new UDFArgumentLengthException(
-                    "The function array_element_at(array, index) takes exactly " + ARG_COUNT + "arguments.");
+                    "The function array_element_at(array, index) takes exactly " + ARG_COUNT + " arguments.");
         }
 
         // Check if ARRAY_IDX argument is of category LIST
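`UDFArrayElementAt` backs `array_element_at`, which the README documents alongside `array_slice`: in both, a negative position counts from the end of the array. A minimal sketch of both on a `java.util.List` follows. It assumes Presto-style 1-based positive indices and a null result for an out-of-range index, neither of which this diff states; `ArrayIndexSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of array_element_at / array_slice index handling, not the project's UDF code.
final class ArrayIndexSketch {
    // array_element_at: index > 0 counts from the start (1-based, assumed); index < 0 counts from the end.
    static <E> E elementAt(List<E> array, int index) {
        if (index == 0) {
            throw new IllegalArgumentException("index must not be 0");
        }
        int pos = index > 0 ? index - 1 : array.size() + index;
        return pos >= 0 && pos < array.size() ? array.get(pos) : null; // null when out of range (assumed)
    }

    // array_slice: up to `length` elements starting at `start` (1-based, or counted from the end if negative).
    static <E> List<E> slice(List<E> array, int start, int length) {
        if (start == 0 || length < 0) {
            throw new IllegalArgumentException("start must not be 0 and length must not be negative");
        }
        int from = start > 0 ? start - 1 : array.size() + start;
        if (from < 0 || from >= array.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min((long) from + length, array.size()); // clamp at the end of the array
        return new ArrayList<>(array.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(16, 13, 12, 13, 18, 16, 9, 18);
        System.out.println(elementAt(xs, -1)); // 18
        System.out.println(slice(xs, -2, 3));  // [9, 18]
    }
}
```

Both outputs match the README examples for `array_element_at(..., -1)` and `array_slice(..., -2, 3)`; the clamp explains why a requested length of 3 yields only two elements.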

src/main/java/cc/shanruifeng/functions/array/UDFArrayIntersect.java

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 package cc.shanruifeng.functions.array;
 
-import it.unimi.dsi.fastutil.ints.IntArrays;
+import cc.shanruifeng.functions.fastuitl.ints.IntArrays;
 import java.util.ArrayList;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
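`UDFArrayIntersect` implements `array_intersect`, which the README documents as returning the intersection of two arrays without duplicates; this commit only switches its `IntArrays` import to the in-project copy. The diff does not show the algorithm (the `IntArrays` import hints at sorting, but that is a guess), so the sketch below uses hash sets instead and keeps elements in first-array order. Both choices are assumptions, and `ArrayIntersectSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical hash-based sketch of array_intersect semantics, not the project's implementation.
final class ArrayIntersectSketch {
    // Elements of `left` that also occur in `right`, each reported once, in `left` order.
    static <E> List<E> intersect(List<E> left, List<E> right) {
        Set<E> inRight = new HashSet<>(right);
        Set<E> result = new LinkedHashSet<>();
        for (E e : left) {
            if (inRight.contains(e)) {
                result.add(e); // the LinkedHashSet drops repeats while preserving first-seen order
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(intersect(Arrays.asList(1, 2, 2, 3), Arrays.asList(3, 2, 4))); // [2, 3]
    }
}
```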

src/main/java/cc/shanruifeng/functions/array/UDFArrayJoin.java

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
         if (arguments.length > MAX_ARG_COUNT || arguments.length < MIN_ARG_COUNT) {
             throw new UDFArgumentLengthException(
                     "The function array_join(array, delimiter) or array_join(array, delimiter, null_replacement) takes exactly "
-                            + MIN_ARG_COUNT + " or " + MAX_ARG_COUNT + "arguments.");
+                            + MIN_ARG_COUNT + " or " + MAX_ARG_COUNT + " arguments.");
         }
 
         // Check if ARRAY_IDX argument is of category LIST
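`UDFArrayJoin` accepts two or three arguments, matching the README's `array_join(array, delimiter, null_replacement)` with an optional `null_replacement`. Assuming Presto's behavior for the two-argument form (null elements are skipped, which the README does not state), a minimal sketch looks like this. Passing `null` as `nullReplacement` stands in for the two-argument form, and `ArrayJoinSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of array_join semantics, not the project's UDFArrayJoin code.
final class ArrayJoinSketch {
    // array_join(array, delimiter[, null_replacement]); nullReplacement == null models the 2-argument form.
    static String arrayJoin(List<?> array, String delimiter, String nullReplacement) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Object element : array) {
            if (element != null) {
                joiner.add(element.toString());
            } else if (nullReplacement != null) {
                joiner.add(nullReplacement); // replace nulls only when a replacement was supplied
            }
            // otherwise the null element is skipped (assumed, following Presto)
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, null, 3);
        System.out.println(arrayJoin(xs, ",", null)); // 1,3
        System.out.println(arrayJoin(xs, ",", "N"));  // 1,N,3
    }
}
```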

src/main/java/cc/shanruifeng/functions/array/UDFArrayMax.java

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 package cc.shanruifeng.functions.array;
 
-import it.unimi.dsi.fastutil.ints.IntArrays;
+import cc.shanruifeng.functions.fastuitl.ints.IntArrays;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
 import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
