Hive - counting distinct CSVs in one column

One of the hive tables looks something like this:
ID listOfcategories
1 ["a","b","b","a","c","d","d"]
2 ["a","a","a","c","c","c","c","e","e","e"]
3 ["a","b","c"]
The number of comma-separated values varies from row to row. I want to query the number of distinct categories for each row/ID.
So, my output should look like:
ID numDistCategories
1 4
2 3
3 3

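You can use explode to output a separate row for each category and then count the distinct values per id. In Hive, a table-generating function such as EXPLODE cannot sit in the SELECT list next to other columns, so it goes in a LATERAL VIEW.
Something like this (assuming listOfcategories is an array column):
SELECT
id,
COUNT(DISTINCT cat) AS numDistCategories
FROM myTable
LATERAL VIEW EXPLODE(listOfcategories) t AS cat
GROUP BY id;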
Hope that helps.
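To make the explode-then-count idea concrete, here is a minimal Python sketch that emulates what the query computes on the sample rows from the question. It is not Hive code; the function name count_distinct_categories is made up for illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, listOfcategories)
rows = [
    (1, ["a", "b", "b", "a", "c", "d", "d"]),
    (2, ["a", "a", "a", "c", "c", "c", "c", "e", "e", "e"]),
    (3, ["a", "b", "c"]),
]

def count_distinct_categories(rows):
    # "Explode" step: one (id, cat) pair per array element.
    exploded = [(row_id, cat) for row_id, cats in rows for cat in cats]
    # "GROUP BY id" + "COUNT(DISTINCT cat)": collect the categories in a set per id.
    groups = defaultdict(set)
    for row_id, cat in exploded:
        groups[row_id].add(cat)
    return {row_id: len(cats) for row_id, cats in groups.items()}

print(count_distinct_categories(rows))  # {1: 4, 2: 3, 3: 3}
```

The result matches the expected output in the question (4, 3 and 3 distinct categories).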

Related

How to get exact rows count of particular column in MySQL table

I want to get exact row count of specified column
Example: Table
Name     Id   Age
_______________________________
Jon      1    30
Merry    2    40
William       50
David
There are 4 rows in the table, but I want to count only the rows that have a value in the Id column.
I am using the query below:
select count(Id) from table;
But it returns 4. I know why it returns 4, but I want the output to be 2, because only two rows have a value in the Id column.
How can I achieve this?
Try this:
select count(Id) from table where id>0;
With the help of #blabla_bingo and #Edwin Dijk I finally achieved it with the query below:
select count(Id) from table where Id!="";
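The behaviour can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL and assumes the missing Ids are stored as empty strings rather than NULL (which would explain why count(Id) returns 4). The table name people is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Id TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    # Assumption: missing Ids are empty strings, not NULL.
    [("Jon", "1", 30), ("Merry", "2", 40), ("William", "", 50), ("David", "", None)],
)

# COUNT(Id) skips NULLs only, so the two empty strings are still counted.
plain_count = conn.execute("SELECT COUNT(Id) FROM people").fetchone()[0]

# NULLIF(Id, '') turns empty strings into NULL, which COUNT then ignores.
non_empty_count = conn.execute(
    "SELECT COUNT(NULLIF(Id, '')) FROM people"
).fetchone()[0]

print(plain_count, non_empty_count)  # 4 2
```

NULLIF is an alternative to the WHERE filter above; both give 2 on this data.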

How to find only id that has different values in MySQL?

I have a table with two columns:
id num
1 2
2 8
1 7
7 3
I want my query to return only the ids that have more than one num.
For example in my table I would want to get as a result:
id
1
How should I express my query?
Thanks in advance.
You might need something like this:
SELECT id
FROM your_table_name
GROUP BY id
HAVING count(DISTINCT num) > 1;
Look up 'aggregate functions'. Here the aggregate function is count(). Combined with GROUP BY it is evaluated once per id, and HAVING then keeps only the groups whose count is greater than 1. Pretty fun.
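A small sketch of why DISTINCT matters in the HAVING clause, run through Python's sqlite3 module as a stand-in for MySQL. The data is the table from the question plus one duplicate row (2, 8) that is not in the question and is added purely for illustration. The helper ids_having is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table_name (id INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO your_table_name VALUES (?, ?)",
    [(1, 2), (2, 8), (1, 7), (7, 3),
     (2, 8)],  # extra duplicate row, NOT in the question, added for illustration
)

def ids_having(condition):
    # condition is a trusted, hard-coded HAVING expression (never user input).
    query = (
        "SELECT id FROM your_table_name "
        f"GROUP BY id HAVING {condition} ORDER BY id"
    )
    return [row[0] for row in conn.execute(query)]

distinct_ids = ids_having("COUNT(DISTINCT num) > 1")  # ids with several different nums
naive_ids = ids_having("COUNT(*) > 1")                # ids with several rows

print(distinct_ids, naive_ids)  # [1] [1, 2]
```

With COUNT(*) the repeated (2, 8) pair would wrongly qualify id 2; COUNT(DISTINCT num) only reports id 1.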

SQL: Repeated records by grouping some columns

I have data like this:
ID Name ItemA ItemB ItemC
OXZ234 Adam 4 4 5
OXZ234 Adam 1 2 3
OXZ345 Tarzen 6 7 8
OXDER2 William 9 8 2
OXDER2 William 0 8 0
I need to find how much food each person eats. For example, from the first two records I can say that Adam (ID OXZ234) ate ItemA-5, ItemB-6 and ItemC-8. For a small amount of data this kind of manual calculation is manageable, but I have a million records like this. So first I need to find the records that have the same ID and name but differing item counts.
I tried this query, which finds duplicate records by grouping on all columns:
select ID,Name,ItemA,ItemB,ItemC, COUNT(*)
from DATA_REFRESH
group by ID,Name,ItemA,ItemB,ItemC
having COUNT(*) > 1
But now I have to identify the records where only the item columns differ.
So the expected output is like:
OXZ234 Adam 2
OXDER2 William 2
OXZ345 Tarzen 1
Any suggestion would be helpful!
You want SUM
select ID,
Name,
sum(ItemA) as ItA,
sum(ItemB) as ItB,
sum(ItemC) as ItC,
count(ID) as Occurrences -- Counts the number of entries per person
from DATA_REFRESH
group by ID,Name
having count(ID) >1 -- restricts this so only those with more than one entry appear
Hi, you can use a simple query without a HAVING clause:
select ID,Name,COUNT(*)
from DATA_REFRESH
group by ID,Name order by COUNT(*) desc ;
Or simply try this:
select ID,Name,COUNT(*)
from Sample_Check
group by ID,Name
having COUNT(*) > 1
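The grouping can be checked against the sample data from the question. This sketch runs the SUM/COUNT query through Python's sqlite3 module as a stand-in for the asker's database (the column types are assumptions). It leaves out the HAVING clause so that Tarzen, who appears in the expected output with a count of 1, is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_REFRESH "
    "(ID TEXT, Name TEXT, ItemA INTEGER, ItemB INTEGER, ItemC INTEGER)"
)
conn.executemany("INSERT INTO DATA_REFRESH VALUES (?, ?, ?, ?, ?)", [
    ("OXZ234", "Adam", 4, 4, 5),
    ("OXZ234", "Adam", 1, 2, 3),
    ("OXZ345", "Tarzen", 6, 7, 8),
    ("OXDER2", "William", 9, 8, 2),
    ("OXDER2", "William", 0, 8, 0),
])

# One output row per person: item totals plus the number of source rows.
summary = conn.execute("""
    SELECT ID, Name,
           SUM(ItemA), SUM(ItemB), SUM(ItemC),
           COUNT(*) AS Occurrences
    FROM DATA_REFRESH
    GROUP BY ID, Name
    ORDER BY Occurrences DESC, ID
""").fetchall()

for row in summary:
    print(row)
```

Adam's totals come out as 5, 6 and 8, matching the manual calculation in the question.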

SQL group by merge results

How can I select "sub" grouped by "cat" so that the results are merged into one list?
SQL query:
select sub
from post
where cat = 1
group by id
It should return something like:
3,4,9,14,33,22
Table "post":
id cat sub
1 1 3,4,9,14
2 2 1,2
3 2 4,5
4 1 33,22
5 3 1,4
thanks,
It is a very bad idea to store lists of things in character strings. For one thing, your ids are integers, but the strings are characters. More importantly, SQL has a great data structure for storing lists -- it is called a table. You should be using a junction table.
But, sometimes you are stuck with the data you have. In that case, you can use group_concat():
select group_concat(sub)
from post
where cat = 1;
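A quick sketch of the group_concat() approach on the sample table, using Python's sqlite3 module as a stand-in for MySQL (SQLite also provides group_concat with a comma as the default separator). The order in which the rows are concatenated is not guaranteed unless you ask for one, so the example does not rely on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER, cat INTEGER, sub TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?, ?)", [
    (1, 1, "3,4,9,14"),
    (2, 2, "1,2"),
    (3, 2, "4,5"),
    (4, 1, "33,22"),
    (5, 3, "1,4"),
])

# Merge the comma-separated strings of every row with cat = 1 into one string.
merged = conn.execute(
    "SELECT group_concat(sub) FROM post WHERE cat = 1"
).fetchone()[0]

# Split the merged string back into integers on the client side.
values = [int(v) for v in merged.split(",")]
print(values)
```

Having to split the string again on the client is one more sign that a junction table would be a better fit.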

SQL GROUP BY - Multiple results in one column?

I am trying to perform a SELECT query using a GROUP BY clause, however I also need to access data from multiple rows and somehow concatenate it into a single column.
Here's what I have so far:
SELECT
COUNT(v.id) AS quantity,
vt.name AS name,
vt.cost AS cost,
vt.postage_cost AS postage_cost
FROM vouchers v
INNER JOIN voucher_types vt
ON v.type_id = vt.id
WHERE
v.order_id = 1 AND
v.sold = 1
GROUP BY vt.id
Which gives me the first four columns I need in the following format.
quantity | name | cost | postage_cost
2 | X | 5 | 1
2 | Y | 6 | 1
However, I would also like a fifth column to be displayed, showing all of the codes associated with each line of the order like this:
code
ABCD, EFGH
IJKL, MNOP
Where the comma separated values are pulled from the voucher table.
Is this possible?
Any advice would be appreciated.
Thanks
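This is what GROUP_CONCAT does.
Assuming the column is called code, you would just add , GROUP_CONCAT(v.code) AS Codes to your select list. The default separator is a comma with no space; to get "ABCD, EFGH" use GROUP_CONCAT(v.code SEPARATOR ', ').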
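Here is a minimal end-to-end sketch of the query with the extra column, run through Python's sqlite3 module as a stand-in for MySQL. The table layouts and sample rows are assumptions inferred from the query and the desired output. Note that SQLite passes the separator as a second argument, GROUP_CONCAT(v.code, ', '), while MySQL uses the SEPARATOR keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data, inferred from the question.
conn.executescript("""
    CREATE TABLE voucher_types (
        id INTEGER PRIMARY KEY, name TEXT, cost INTEGER, postage_cost INTEGER);
    CREATE TABLE vouchers (
        id INTEGER PRIMARY KEY, type_id INTEGER, order_id INTEGER,
        sold INTEGER, code TEXT);
    INSERT INTO voucher_types VALUES (1, 'X', 5, 1), (2, 'Y', 6, 1);
    INSERT INTO vouchers VALUES
        (1, 1, 1, 1, 'ABCD'), (2, 1, 1, 1, 'EFGH'),
        (3, 2, 1, 1, 'IJKL'), (4, 2, 1, 1, 'MNOP');
""")

# Same query as in the question, plus the concatenated codes column.
rows = conn.execute("""
    SELECT COUNT(v.id) AS quantity, vt.name, vt.cost, vt.postage_cost,
           GROUP_CONCAT(v.code, ', ') AS codes
    FROM vouchers v
    INNER JOIN voucher_types vt ON v.type_id = vt.id
    WHERE v.order_id = 1 AND v.sold = 1
    GROUP BY vt.id
    ORDER BY vt.id
""").fetchall()

for row in rows:
    print(row)
```

Each output row carries the four original columns followed by the codes for that voucher type.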