Order By
Order By guarantees the total sorted order of the records from all the input data. This requires the strict mode to be disabled (hive.mapred.mode=nonstrict). Whereas in the Strict mode, 'Order By' clause should be followed by 'limit' clause.
If you know the number of records to be retrieved in advance, then use LIMIT clause. This will reduce the processing time by fetching only n highest or n lowest rows from each node and then coordinator picks up n highest or n lowest rows from this small intermediate result set retrieved from each node.
Syntax of Order By:
colOrder: ( ASC | DESC ) colNullOrder: (NULLS FIRST | NULLS LAST) -- (Note: Available in Hive 2.1.0 and later) orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)* query: SELECT expression (',' expression)* FROM src orderBy |
Node 1
|
Node 2
|
Node 3
|
3
|
4
|
11
|
8
|
6
|
16
|
12
|
2
|
1
|
18
|
17
|
13
|
5
|
15
|
9
|
7
|
10
|
14
|
Ex 1 : Selecting the lowest 2 rows:
hive > Select * from table order by x asc limit 2;
Ex 2 : Selecting the highest 2 rows:
hive > Select * from table order by x desc limit 2;
Now, set the non-strict mode by below command:
hive > set hive.mapred.mode=nonstrict;
Ex 3 : Selecting all rows in ascending order
hive> select * from table order by x asc;
Ex 4 : Selecting all rows in descending order
hive> select * from table order by x desc;
Sort By
guarantees the total sorted order of the records within a reducer. So, if there
are more than one reducer, the final results are only partially ordered.
sortBy: SORT BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src sortBy
hive>
Select * from table order by x asc;
Thanks for looking !!! Have a good day !!!
No comments:
Post a Comment