Distinct on specific column in Hive

I am running Hive 071. I have a table with multiple rows that share the same column value, e.g.:
| x | y |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 2 |
| 3 | 1 |
I want the x column to be unique, removing the other rows that have the same x value, e.g.:
| x | y |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
or
| x | y |
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
Either result is fine. Since DISTINCT in Hive works only on the whole result set, I couldn't find a way to do it.
Any help would be appreciated. Thanks.

Some options:
1) This will give you the max value of y for each value of x:
select x, max(y) from table1 group by x
Equally, you could use min(). avg() also returns one row per x, but its result need not be a value that actually appears in y.
2) Or you could collect all the distinct values of y for each x in a list:
select x, collect_set(y) from table1 group by x
This will give you:
x|y
1|2,3,4
2|2
3|1,2
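
The two options can be tried out without a Hive cluster. The following is a minimal sketch in Python using the standard-library sqlite3 module on the sample data from the question. The table name table1 comes from the queries above; SQLite's group_concat() is used here only as a stand-in for Hive's collect_set(), which SQLite does not provide, so this illustrates the grouping idea rather than Hive's exact behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 2), (3, 2), (3, 1)],
)

# Option 1: one row per x, keeping the largest (or smallest) y.
max_rows = conn.execute(
    "SELECT x, MAX(y) FROM table1 GROUP BY x ORDER BY x"
).fetchall()
min_rows = conn.execute(
    "SELECT x, MIN(y) FROM table1 GROUP BY x ORDER BY x"
).fetchall()

# Option 2: one row per x carrying all of its distinct y values.
# group_concat() is SQLite's function, standing in for Hive's collect_set().
raw = conn.execute(
    "SELECT x, group_concat(DISTINCT y) FROM table1 GROUP BY x ORDER BY x"
).fetchall()
# The order inside each concatenated list is not guaranteed, so sort it here.
collected = [(x, sorted(int(v) for v in ys.split(","))) for x, ys in raw]

print(max_rows)
print(min_rows)
print(collected)
```

Either of the first two results satisfies the question, since any single y per x was acceptable.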

Related

How to write SQL for finding columns with matching string

I have the following table:
TableA
| Column A | Column B |
| -------- | -------- |
| 1234 | Y |
| 2345 | N |
| 3456 | Y |
| 3456 | Y |
| 3456 | N |
| 2345 | N |
| 1234 | N |
| 2345 | N |
Here, 1234 and 3456 have both values Y and N, whereas 2345 has only the value N.
I want to display the values of Column A for which both values (Y and N) appear in Column B. Ideally:
| Column A | Column B |
| -------- | -------- |
| 1234 | Y |
| 1234 | N |
| 3456 | Y |
| 3456 | N |
I tried using
Select * from TableA
where ColumnB = 'Y'
and ColumnB = 'N'
but it doesn't give the desired result.

I prefer aggregation here:
SELECT ColumnA
FROM TableA
GROUP BY ColumnA
HAVING MIN(ColumnB) <> MAX(ColumnB);
The above assumes that the only two values which would ever appear in ColumnB are Y and N.
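
The aggregation above returns only the ColumnA values. As a sketch of how the full rows shown in the question could be obtained, the snippet below runs the same HAVING filter in SQLite through Python's sqlite3 module and then joins the matching keys back to the table. The join and the DISTINCT are additions made for this example, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ColumnA INTEGER, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [
        (1234, "Y"), (2345, "N"), (3456, "Y"), (3456, "Y"),
        (3456, "N"), (2345, "N"), (1234, "N"), (2345, "N"),
    ],
)

# The answer's query: keys whose smallest and largest ColumnB values differ,
# i.e. keys that have both 'N' and 'Y'.
keys = [
    row[0]
    for row in conn.execute(
        """
        SELECT ColumnA
        FROM TableA
        GROUP BY ColumnA
        HAVING MIN(ColumnB) <> MAX(ColumnB)
        ORDER BY ColumnA
        """
    )
]

# Join back to list each distinct (ColumnA, ColumnB) pair for those keys.
rows = conn.execute(
    """
    SELECT DISTINCT t.ColumnA, t.ColumnB
    FROM TableA AS t
    JOIN (
        SELECT ColumnA
        FROM TableA
        GROUP BY ColumnA
        HAVING MIN(ColumnB) <> MAX(ColumnB)
    ) AS k ON k.ColumnA = t.ColumnA
    ORDER BY t.ColumnA, t.ColumnB DESC
    """
).fetchall()

print(keys)
print(rows)
```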

MySQL select all pinned and some other flagged record to reach limit

I have a "products" table that stores all of the store's products. Some of them are flagged with the "hp" flag, and some also with the "hppin" flag.
I need two flags because I pick 10 random products from all the "hp"-flagged ones (there could be 100 or more), but I need to get all the pinned ones.
The pinned products must always be among the 10 selected products.
I need 10 records = all "hppin"-flagged + random "hp"-flagged.
Example table:
PRODUCTS
| id | name | hppin | hp |
| 1 | prod1 | y | y |
| 2 | prod2 | n | y |
| 3 | prod3 | y | y |
| 4 | prod4 | n | y |
| 5 | prod5 | n | n |
| 6 | prod6 | y | y |
| 7 | prod7 | n | y |
| 8 | prod8 | n | y |
| 9 | prod9 | n | y |
| 10 | prod10 | n | y |
| 11 | prod11 | n | y |
| 12 | prod12 | n | y |
| 13 | prod13 | n | y |
| 14 | prod14 | n | n |
| 15 | prod15 | n | y |
I could solve this with two queries, but I would like to know if it is possible to do it in just one query.
The result should be records 1, 3, 6 plus 7 random records with hp = y.

You can phrase your query as a union of the two sets of data you want to obtain. The first half of the union below retrieves the records with hppin = 'y'. The second half retrieves the remaining records with hp = 'y'; it excludes the pinned records because UNION ALL does not remove duplicates, and every pinned record in your sample data also has hp = 'y'. We then apply a limit of 10 records, using an ordering that places the hppin matches first and the other hp records after them in random order. The hp records only enter the result set if there are fewer than 10 hppin records, which is the case for your sample data.
SELECT id, name, hppin, hp, 1 AS position
FROM PRODUCTS
WHERE hppin = 'y'
UNION ALL
SELECT id, name, hppin, hp, 2
FROM PRODUCTS
WHERE hp = 'y' AND hppin <> 'y'
ORDER BY
position, RAND()
LIMIT 10
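
To check the behaviour on the sample table, here is a minimal sketch of the UNION ALL approach using Python's sqlite3 module. It makes a few assumptions that differ from MySQL: SQLite's RANDOM() replaces RAND(), and the union is wrapped in a subquery because SQLite only accepts result columns in the ORDER BY of a compound select. In this sketch the second branch also excludes pinned rows, so that no product can be returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCTS (id INTEGER PRIMARY KEY, name TEXT, hppin TEXT, hp TEXT)"
)

# Sample data from the question: products 1, 3, 6 are pinned;
# products 5 and 14 are not flagged hp.
pinned = {1, 3, 6}
not_hp = {5, 14}
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?, ?)",
    [
        (i, f"prod{i}", "y" if i in pinned else "n", "n" if i in not_hp else "y")
        for i in range(1, 16)
    ],
)

query = """
SELECT id, name, hppin, hp
FROM (
    SELECT id, name, hppin, hp, 1 AS position
    FROM PRODUCTS
    WHERE hppin = 'y'
    UNION ALL
    SELECT id, name, hppin, hp, 2 AS position
    FROM PRODUCTS
    WHERE hp = 'y' AND hppin <> 'y'
)
ORDER BY position, RANDOM()   -- pinned first, then random hp rows
LIMIT 10
"""

rows = conn.execute(query).fetchall()
ids = [row[0] for row in rows]
print(ids)
```

The three pinned ids always occupy the first three positions (in random order among themselves); the remaining seven vary from run to run.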

One table, GROUP_CONCAT and multiple AND with FIND_IN_SET

I want to retrieve items which have all values I want.
The table:
+------+------+
| item | val |
+------+------+
| 1 | x |
| 1 | y |
| 2 | a |
| 2 | b |
| 3 | a |
| 3 | x |
| 3 | y |
+------+------+
For example, I want the items which have x and y vals (items 1 and 3)
My SQL query:
SELECT item, GROUP_CONCAT(DISTINCT val) AS vals
FROM test
GROUP BY item
HAVING
FIND_IN_SET('x', vals) AND
FIND_IN_SET('y', vals)
That works, but I think there is a better solution that doesn't use the FIND_IN_SET function.
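
One commonly used alternative, offered here as a sketch rather than as an answer from the original thread, is to filter to the wanted values first and then require that each item matched all of them. The example below uses Python's sqlite3 module with the table test from the question; the variable name wanted is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (item INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "a"), (2, "b"), (3, "a"), (3, "x"), (3, "y")],
)

wanted = ["x", "y"]  # hypothetical name: the values every item must have
placeholders = ", ".join("?" for _ in wanted)

# Keep only rows with a wanted value, then keep the items for which
# the number of distinct matched values equals the number wanted.
query = f"""
SELECT item
FROM test
WHERE val IN ({placeholders})
GROUP BY item
HAVING COUNT(DISTINCT val) = ?
ORDER BY item
"""

items = [row[0] for row in conn.execute(query, (*wanted, len(wanted)))]
print(items)
```

COUNT(DISTINCT val) rather than COUNT(*) keeps the check correct even if the same (item, val) pair appears more than once.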

LIMIT offset or OFFSET in an UPDATE SQL query

I have a table similar to this:
| 0 | X |
| 1 | X |
| 2 | X |
| 3 | Y |
| 4 | Y |
| 5 | X |
| 6 | X |
| 7 | Y |
| 8 | Y |
| 9 | X |
I'd like to replace the first 2 occurrences of X with X1, and then the next 4 occurrences with X2, so that the resulting table looks like this:
| 0 | X1 |
| 1 | X1 |
| 2 | X2 |
| 3 | Y |
| 4 | Y |
| 5 | X2 |
| 6 | X2 |
| 7 | Y |
| 8 | Y |
| 9 | X2 |
The table in question is of course much bigger and the number of occurrences would thus be higher too so manual editing is not a solution.
I'd like to do something like this:
UPDATE table SET column = 'X' WHERE column = 'X2' LIMIT 90, 88
but unfortunately MySQL doesn't seem to support OFFSET in UPDATE queries... Is there any way to do this?

I don't know whether you have an id field available in the table, but if you do, you can use WHERE id BETWEEN 88 AND 90. MySQL does not support OFFSET in UPDATE queries, but you can get a similar effect by restricting the rows with the BETWEEN operator.

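Try this (the ORDER BY id makes "first" well defined, and the extra derived table is needed because MySQL does not allow LIMIT directly inside an IN subquery):
UPDATE table SET column = 'X1' WHERE id IN (SELECT id FROM (SELECT id FROM table WHERE column = 'X' ORDER BY id LIMIT 2) AS u);
and then
UPDATE table SET column = 'X2' WHERE id IN (SELECT id FROM (SELECT id FROM table WHERE column = 'X' ORDER BY id LIMIT 4) AS u);
The second statement needs no offset, because the first one has already renamed the first two X rows.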
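
The two-step update can be checked against the sample data with a short sketch using Python's sqlite3 module. The table name t, the column name val, and the helper rename_first are hypothetical names chosen for this example (table and column are reserved words), and an ORDER BY id is included so that the "first" occurrences are well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
# Rows 0..9 from the question: X X X Y Y X X Y Y X
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate("XXXYYXXYYX")))


def rename_first(conn, old, new, count):
    """Rename the first `count` rows (by id) whose val equals `old`."""
    conn.execute(
        """
        UPDATE t SET val = ?
        WHERE id IN (
            SELECT id FROM (
                SELECT id FROM t WHERE val = ? ORDER BY id LIMIT ?
            ) AS u
        )
        """,
        (new, old, count),
    )


# Step 1 renames two rows, so step 2 can again take the "first" rows
# that are still 'X' without needing an offset.
rename_first(conn, "X", "X1", 2)
rename_first(conn, "X", "X2", 4)

result = conn.execute("SELECT id, val FROM t ORDER BY id").fetchall()
print(result)
```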

Distinct on specific column in Hive

You can use the DISTINCT keyword, although this returns only the distinct x values, without y:
SELECT DISTINCT x FROM table

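Try the following query to get the result:
select A.x, A.y
from (select x, y, rank() over (partition by x order by y) as ranked from testingg) A
where ranked = 1;
Note that rank() gives tied rows the same rank, so if the table contains duplicate (x, y) rows, use row_number() instead to get exactly one row per x.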
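
The windowed query can be tried locally with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). This sketch swaps in row_number() for rank(), so that exactly one row per x is kept even if an (x, y) pair repeats; the table name testingg is taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testingg (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO testingg VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 2), (3, 2), (3, 1)],
)

# Number the rows within each x by ascending y, then keep the first one.
query = """
SELECT x, y
FROM (
    SELECT x, y,
           row_number() OVER (PARTITION BY x ORDER BY y) AS rn
    FROM testingg
) AS ranked
WHERE rn = 1
ORDER BY x
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Changing the inner ORDER BY y to ORDER BY y DESC would keep the largest y per x instead.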
