Distinct on specific column in Hive - unique

I am running Hive 071.
I have a table with multiple rows that share the same column value,
e.g.
x | y |
---------
1 | 2 |
1 | 3 |
1 | 4 |
2 | 2 |
3 | 2 |
3 | 1 |
I want the x column to be unique, removing the extra rows that have the same x value,
e.g.
x | y |
---------
1 | 2 |
2 | 2 |
3 | 2 |
or
x | y |
---------
1 | 4 |
2 | 2 |
3 | 1 |
Either result is fine.
Since DISTINCT in Hive applies only to the whole row of the result set, I couldn't find a way to do it.
Help please.
Thanks

You can use the DISTINCT keyword, although this returns only the x column and drops y:
SELECT DISTINCT x FROM table

Try the following query to get the result:
SELECT A.x, A.y
FROM (
    SELECT x, y, rank() OVER (PARTITION BY x ORDER BY y) AS ranked
    FROM testingg
) A
WHERE ranked = 1;

Related

MySQL select all pinned and some other flagged record to reach limit

I have a "products" table that stores all of the store's products. Some of them are flagged with the "hp" flag and some also with the "hppin" flag.
I need 2 flags because I pick 10 random products from all of the "hp"-flagged ones (there could be 100 or more), but I need to get all of the pinned ones.
The pinned products must always be among the 10 selected products.
I need 10 records = all "hppin"-flagged + random "hp"-flagged
example table
PRODUCTS
| id | name | hppin | hp |
| 1 | prod1 | y | y |
| 2 | prod2 | n | y |
| 3 | prod3 | y | y |
| 4 | prod4 | n | y |
| 5 | prod5 | n | n |
| 6 | prod6 | y | y |
| 7 | prod7 | n | y |
| 8 | prod8 | n | y |
| 9 | prod9 | n | y |
| 10 | prod10 | n | y |
| 11 | prod11 | n | y |
| 12 | prod12 | n | y |
| 13 | prod13 | n | y |
| 14 | prod14 | n | n |
| 15 | prod15 | n | y |
I could solve this with 2 queries, but I would like to know if it is possible
to do it in just one query.
The result should be records 1, 3, 6 + 7 random records with hp = y
You can phrase your query as a union of the two sets of data you want to obtain. The first half of the union below retrieves the records with hppin = 'y'. The second half retrieves the records with hp = 'y' that are not pinned, so that no pinned record appears twice. We then apply a limit of 10 records using an ordering that gives preference to the hppin matches. The hp records only enter the result set if there are fewer than 10 hppin records, which is the case for your sample data.
SELECT id, name, hppin, hp, 1 AS position
FROM PRODUCTS
WHERE hppin = 'y'
UNION ALL
SELECT id, name, hppin, hp, 2
FROM PRODUCTS
WHERE hp = 'y' AND hppin <> 'y'
ORDER BY
position, RAND()
LIMIT 10
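As a rough way to check the "pinned first, then random fill" idea, here is a minimal sketch that runs it against the sample rows. It assumes an in-memory SQLite database as a stand-in for MySQL, so RANDOM() replaces RAND() and the union is wrapped in a derived table (named candidates here purely for illustration). The second branch skips pinned rows so that no product can be picked twice.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption made so the sketch can run).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, hppin TEXT, hp TEXT)"
)

# Sample rows from the question: ids 1, 3, 6 are pinned; ids 5 and 14 have hp = 'n'.
pinned = {1, 3, 6}
not_hp = {5, 14}
rows = [
    (i, f"prod{i}", "y" if i in pinned else "n", "n" if i in not_hp else "y")
    for i in range(1, 16)
]
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)

# position 1 = pinned rows, position 2 = unpinned hp rows.
# Sorting by position first keeps every pinned row ahead of the random fill.
query = """
SELECT id, name, hppin, hp
FROM (
    SELECT id, name, hppin, hp, 1 AS position
    FROM products
    WHERE hppin = 'y'
    UNION ALL
    SELECT id, name, hppin, hp, 2 AS position
    FROM products
    WHERE hp = 'y' AND hppin <> 'y'
) AS candidates
ORDER BY position, RANDOM()
LIMIT 10
"""
selected = conn.execute(query).fetchall()
selected_ids = [row[0] for row in selected]
print(selected_ids)
```

The three pinned ids always come first (in random order among themselves), followed by seven randomly chosen unpinned hp products.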

Alternate order by logic in MySQL

I'm looking to allow for custom ordering logic in MySQL that allows the following data set:
+----+-----------------+------------+-------+
| ID | item            | Popularity | Views |
+----+-----------------+------------+-------+
| 1  | A special place | 3          | 10    |
| 2  | Another title   | 5          | 12    |
| 3  | Words go here   | 1          | 15    |
| 4  | A wonder        | 2          | 8     |
+----+-----------------+------------+-------+
To return an order that alternates, row by row, by popularity and then by views, so the return results look like:
+----+-----------------+------------+-------+
| ID | item            | Popularity | Views |
+----+-----------------+------------+-------+
| 3  | Words go here   | 1          | 15    |
| 2  | Another title   | 5          | 12    |
| 4  | A wonder        | 2          | 8     |
| 1  | A special place | 3          | 10    |
+----+-----------------+------------+-------+
Where you will see the first row returns the 'most popular', the second row returns the most views, the third row returns the second most popular, and the 4th row returns the 2nd most views.
Currently I'm fetching the entire table from MySQL twice and then merging the results in PHP. This isn't going to cut it when the database is large. Is this possible in MySQL at all?
I guess something along these lines could work. Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,x INT NOT NULL
,y INT NOT NULL
);
INSERT INTO my_table VALUES
(1,3,10),
(2,5,12),
(3,1,15),
(4,2, 8),
(5,4, 1);
We can rank x and y in turn, and then arrange those ranks in a single list, so we will have x1,y1,x2,y2,etc. All rows will appear twice, though: once for the x rank and once for the y rank...
SELECT * FROM
(
( SELECT a.*, COUNT(*) rank FROM my_table a JOIN my_table b ON b.x <= a.x GROUP BY a.id )
UNION ALL
( SELECT a.*, COUNT(*) rank FROM my_table a JOIN my_table b ON b.y <= a.y GROUP BY a.id )
) n
ORDER BY rank
+----+---+----+------+
| id | x | y | rank |
+----+---+----+------+
| 5 | 4 | 1 | 1 |
| 3 | 1 | 15 | 1 |
| 4 | 2 | 8 | 2 |
| 4 | 2 | 8 | 2 |
| 1 | 3 | 10 | 3 |
| 1 | 3 | 10 | 3 |
| 5 | 4 | 1 | 4 |
| 2 | 5 | 12 | 4 |
| 2 | 5 | 12 | 5 |
| 3 | 1 | 15 | 5 |
+----+---+----+------+
Now we can just grab the lowest rank for each id...
SELECT id
, x
, y
FROM
(
( SELECT a.*, COUNT(*) rank FROM my_table a JOIN my_table b ON b.x <= a.x GROUP BY a.id )
UNION ALL
( SELECT a.*, COUNT(*) rank FROM my_table a JOIN my_table b ON b.y <= a.y GROUP BY a.id )
) m
GROUP
BY id,x,y
ORDER
BY MIN(rank);
+----+---+----+
| id | x | y |
+----+---+----+
| 3 | 1 | 15 |
| 5 | 4 | 1 |
| 4 | 2 | 8 |
| 1 | 3 | 10 |
| 2 | 5 | 12 |
+----+---+----+
Incidentally, this should be faster with variables - but I cannot make that solution work at present - senior moment, perhaps.
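For what it's worth, the same "lowest rank per id" idea can be sketched with window functions instead of self-joins. This is only an illustration under a few assumptions: it runs on an in-memory SQLite database (3.25 or newer) rather than MySQL, the aliases x_rank and y_rank are made up, and SQLite's two-argument MIN() plays the role that LEAST() would play in MySQL 8.0. The second sort key is an added tie-break that puts the x-ranked row first when both ranks are equal.

```python
import sqlite3

# In-memory SQLite (3.25+ for window functions) stands in for MySQL 8.0 here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, x INTEGER NOT NULL, y INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 3, 10), (2, 5, 12), (3, 1, 15), (4, 2, 8), (5, 4, 1)],
)

# Rank every row once by x and once by y, then sort by whichever rank is lower.
# Two-argument MIN() is SQLite's scalar minimum; MySQL would use LEAST().
# "x_rank > y_rank" is 0 when the x rank wins, so x-ranked rows come first on ties.
query = """
SELECT id, x, y
FROM (
    SELECT id, x, y,
           ROW_NUMBER() OVER (ORDER BY x) AS x_rank,
           ROW_NUMBER() OVER (ORDER BY y) AS y_rank
    FROM my_table
) AS ranked
ORDER BY MIN(x_rank, y_rank), x_rank > y_rank
"""
alternating = conn.execute(query).fetchall()
for row in alternating:
    print(row)
```

On the five sample rows this yields the ids in the order 3, 5, 4, 1, 2, the same order as the grouped query above.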

MySQL - sort result by specific record

Perhaps someone could tell me how to order MySQL output in this specific scenario:
I have a table like this:
| id | Value1 | Value2 | ...More values that don't matter in this example
|----|--------|--------|
| 1 | 1 | X |
| 2 | 2 | X |
| 3 | 3 | 2 |
| 4 | 1 | X |
| 5 | 2 | X |
| 6 | 3 | 3 |
| 7 | 1 | X |
| 8 | 2 | X |
| 9 | 3 | 1 |
I want to get values from this table ordered by Value2, but only for the rows where Value1 is 3 (the X values don't matter).
What's the best way to do this with good performance?
Thanks in advance!
Hmmm, do you want where and order by?
select t.*
from t
where value1 = 3
order by value2;
EDIT:
Based on the comment:
select t.*
from t
order by (value1 = 3) desc, -- put value 3 first
value2
SELECT * FROM TABLE WHERE Value1=3 ORDER BY Value2
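Try this. MySQL does not accept an unparenthesized ORDER BY before UNION, and it only guarantees the order of a combined result through an outer ORDER BY, so each half gets a sort key:
SELECT * FROM (SELECT *, 0 AS sort_group FROM TABLE WHERE Value1 = 3 UNION ALL SELECT *, 1 AS sort_group FROM TABLE WHERE Value1 != 3) u ORDER BY sort_group, Value2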
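To see what the boolean sort key (value1 = 3) DESC from the answers above does on the sample data, here is a small sketch. It assumes an in-memory SQLite database in place of MySQL, a table named t, and a text-typed value2 column (because the sample mixes digits with 'X'); the trailing id key is an added tie-break so the output is deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the answer's query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value1 INTEGER, value2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, "X"), (2, 2, "X"), (3, 3, "2"),
        (4, 1, "X"), (5, 2, "X"), (6, 3, "3"),
        (7, 1, "X"), (8, 2, "X"), (9, 3, "1"),
    ],
)

# (value1 = 3) evaluates to 1 or 0, so DESC puts the value1 = 3 rows first.
# Within each group rows are sorted by value2 (as text here), then by id.
query = """
SELECT id, value1, value2
FROM t
ORDER BY (value1 = 3) DESC, value2, id
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```

The rows with value1 = 3 come out first, sorted by value2 (ids 9, 3, 6), followed by all the remaining rows. If value2 held multi-digit numbers, it would need a numeric type or a cast to sort numerically.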

LIMIT offset or OFFSET in an UPDATE SQL query

I have a table similar to this:
| 0 | X |
| 1 | X |
| 2 | X |
| 3 | Y |
| 4 | Y |
| 5 | X |
| 6 | X |
| 7 | Y |
| 8 | Y |
| 9 | X |
I'd like to replace the first 2 occurrences of X with X1, and then the next 4 occurrences with X2, so that the resulting table looks like this:
| 0 | X1 |
| 1 | X1 |
| 2 | X2 |
| 3 | Y |
| 4 | Y |
| 5 | X2 |
| 6 | X2 |
| 7 | Y |
| 8 | Y |
| 9 | X2 |
The table in question is of course much bigger and the number of occurrences would thus be higher too so manual editing is not a solution.
I'd like to do something like this:
UPDATE table SET column = 'X' WHERE column = 'X2' LIMIT 90, 88
but unfortunately MySQL doesn't seem to support OFFSET in UPDATE queries... Is there any way to do this?
I don't know whether you have an id field available in the table or not, but if you do, you can use WHERE id BETWEEN 88 AND 90. MySQL does not support OFFSET in an UPDATE query, but you can get the same effect by restricting the rows with the BETWEEN operator.
Try this (the ORDER BY id makes "first" well defined; without it, LIMIT may pick arbitrary rows):
UPDATE table SET column = 'X1' WHERE id IN(SELECT id FROM (SELECT id FROM table WHERE column = 'X' ORDER BY id LIMIT 2) as u);
and then
UPDATE table SET column = 'X2' WHERE id IN(SELECT id FROM (SELECT id FROM table WHERE column = 'X' ORDER BY id LIMIT 4) as u);
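The two-step approach can be tried out on the sample data with a short sketch. It assumes an in-memory SQLite database rather than MySQL, a table named items with columns id and label, and a hypothetical helper relabel_next; none of these names come from the question. SQLite accepts LIMIT directly inside the IN subquery, so the extra derived table that MySQL needs is left out here.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "items", "label" and relabel_next are made-up names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
# Sample data from the question: ids 0..9 with the labels X X X Y Y X X Y Y X.
conn.executemany("INSERT INTO items VALUES (?, ?)", list(enumerate("XXXYYXXYYX")))


def relabel_next(connection, old, new, count):
    """Relabel the first `count` rows (by id) that still carry the label `old`."""
    connection.execute(
        """
        UPDATE items
        SET label = ?
        WHERE id IN (
            SELECT id FROM items WHERE label = ? ORDER BY id LIMIT ?
        )
        """,
        (new, old, count),
    )


# After the first call only four X rows remain, so no OFFSET is needed for the second.
relabel_next(conn, "X", "X1", 2)
relabel_next(conn, "X", "X2", 4)

labels = [row[0] for row in conn.execute("SELECT label FROM items ORDER BY id")]
print(labels)
```

Because each update consumes the rows it relabels, running the steps in order gives the same effect as an offset.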

Distinct on specific column in Hive

I am running Hive 071. I have a table with multiple rows that share the same column value, e.g.
| x | y |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 2 |
| 3 | 1 |
I want the x column to be unique, removing the extra rows that have the same x value, e.g.
| x | y |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
or
| x | y |
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
Either result is fine. Since DISTINCT in Hive applies only to the whole row of the result set, I couldn't find a way to do it.
Help please. Thanks
Some options:
1) This will give you the max value of y for each value of x
select x, max(y) from table1 group by x
Equally you could use avg() or min()
2) OR, you could collect all the values of y in a list:
select x, collect_set(y) from table1 group by x
This will give you:
x|y
1|2,3,4
2|2
3|1,2
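Both options can be checked against the sample rows with a quick sketch. It assumes an in-memory SQLite database in place of Hive, and SQLite has no collect_set(), so GROUP_CONCAT(DISTINCT y) is used as a rough substitute and split back into a sorted list in Python.

```python
import sqlite3

# In-memory SQLite stands in for Hive (an assumption made so the sketch can run).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 2), (3, 2), (3, 1)],
)

# Option 1: one row per x, keeping the largest (or smallest) y.
max_per_x = conn.execute(
    "SELECT x, MAX(y) FROM table1 GROUP BY x ORDER BY x"
).fetchall()
min_per_x = conn.execute(
    "SELECT x, MIN(y) FROM table1 GROUP BY x ORDER BY x"
).fetchall()

# Option 2: collect the distinct y values per x.
# GROUP_CONCAT(DISTINCT y) returns a comma-separated string in SQLite.
collected = {
    x: sorted(int(value) for value in joined.split(","))
    for x, joined in conn.execute(
        "SELECT x, GROUP_CONCAT(DISTINCT y) FROM table1 GROUP BY x"
    )
}

print(max_per_x)
print(min_per_x)
print(collected)
```

Either aggregate leaves exactly one row per x, which is what the question asks for; which y survives depends on the aggregate chosen.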