Thursday 26 January 2017

ORDER BY Vs SORT BY Clause in Hive




Order By

Order By guarantees the total sorted order of the records from all the input data. This requires the strict mode to be disabled (hive.mapred.mode=nonstrict). Whereas in the Strict mode, 'Order By' clause should be followed by 'limit' clause.

If you know the number of records to be retrieved in advance, then use LIMIT clause. This will reduce the processing time by fetching only n highest or n lowest rows from each node and then coordinator picks up n highest or n lowest rows from this small intermediate result set retrieved from each node.

Syntax of Order By:

colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST) -- (Note: Available in Hive 2.1.0 and later)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy

Consider there are 3 nodes, and each node has the data as in below table:

Node 1
Node 2
Node 3
3
4
11
8
6
16
12
2
1
18
17
13
5
15
9
7
10
14

Ex 1 : Selecting the lowest 2 rows: 
hive >  Select * from table order by x asc limit 2;


Ex 2 : Selecting the highest 2 rows: 
hive > Select * from table order by x desc limit 2;

Now, set the non-strict mode by below command:
hive >  set hive.mapred.mode=nonstrict;

Ex 3 : Selecting all rows in ascending order 

hive> select * from table order by x asc;
 




Ex 4 : Selecting all rows in descending order 
hive> select * from table order by x desc;




Sort By
Sort By guarantees the total sorted order of the records within a reducer. So, if there are more than one reducer, the final results are only partially ordered.

Syntax of Sort By:

colOrder: ( ASC | DESC )
sortBy: SORT BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src sortBy

hive> Select * from table order by x asc;


Thanks for looking !!! Have a good day !!! 

No comments:

Post a Comment