Given a table like this:
id name field
1 walt a
2 hurley b
3 jack c
4 kate a
5 sawyer a
6 john a
7 ben b
8 miles null
9 juliet c
10 jacob d
11 richard null
How would you transfer it into this:
id ids names field
1 1,4,5,6 walt, kate, sawyer, john a
2 2,7 hurley, ben b
3 8 miles null
4 3,9 jack, juliet c
5 10 jacob d
6 11 richard null
It needs to look at all the rows having the same field value. Then it needs to "merge" all other values based on equality of the field value. However if the field value is null, it should do nothing.
I got this to work:
mysql> set #x:= 1;
mysql> select group_concat(id) as ids, group_concat(name) as names, field
from `a table like this` group by coalesce(field, #x:=#x+1);
+---------+-----------------------+-------+
| ids | names | field |
+---------+-----------------------+-------+
| 8 | miles | NULL |
| 11 | richard | NULL |
| 1,4,5,6 | walt,kate,sawyer,john | a |
| 2,7 | hurley,ben | b |
| 3,9 | jack,juliet | c |
| 10 | jacob | d |
+---------+-----------------------+-------+
Basically, I tricked the query into treating each NULL as a non-NULL value that increments each time we evaluate it, so each row with a NULL counts as a distinct group.
Re your comment:
You can also initialize a variable within the query like this:
select group_concat(id) as ids, group_concat(name) as names, field
from (select #x:=1) AS _init
cross join `a table like this`
group by coalesce(field, #x:=#x+1);
GROUP_CONCAT can be used to aggregate data from different rows into a concatenated string (as its name would suggest); it also supports and ORDER BY clause of it's own, so you want make doubly sure corresponding values end up in the same relative position of the list*.
SELECT MIN(id)
, GROUP_CONCAT(id ORDER BY id)
, GROUP_CONCAT(name ORDER BY id)
, field
FROM theTable
WHERE field IS NOT NULL
GROUP BY field
UNION
SELECT id, id, name, field
FROM theTable
WHERE field IS NULL
;
* aggregate functions ignore NULL values, so technically if either id or name contain NULL, the lists will become misaligned; this could be remedied with something like GROUP_CONCAT(IFNULL(concatenated_value, '[null]') ORDER BY ordering_value)
Related
class_table
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1,2,3,4 |
+----+-------+--------------+
student_mark
+----+----------+--------+
| id |student_id| marks |
+----+----------+--------+
| 1 | 1 | 12 |
+----+----------+--------+
| 2 | 2 | 80 |
+----+----------+--------+
| 3 | 3 | 20 |
+----+----------+--------+
I have these two tables and i want to calculate the total marks of student and my sql is:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN
(SELECT `student_id` FROM `class_table` WHERE `teac_id` = '1')
But this will return null, please help!!
DB fiddle
Firstly, you should never store comma separated data in your column. You should really normalize your data. So basically, you could have a many-to-many table mapping teacher_to_student, which will have teac_id and student_id columns.
In this particular case, you can utilize Find_in_set() function.
From your current query, it seems that you are trying to getting total marks for a teacher (summing up marks of all his/her students).
Try:
SELECT SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
In case, you want to get total marks per student, you would need to add a Group By. The query would look like:
SELECT sm.`student_id`,
SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
GROUP BY sm.`student_id`
Just in case you want to know why, The reason it returned null is because the subquery returned as '1,2,3,4' as a whole. What you need is to make it returned 1,2,3,4 separately.
What your query returned
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN ('1,2,3,4')
What you expect is
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN (1,2,3,4)
The best way is it normalize as #madhur said. In your case you need to make the teacher and student as one to many link
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1 |
+----+-------+--------------+
| 2 | 1 | 2 |
+----+-------+--------------+
| 3 | 1 | 3 |
+----+-------+--------------+
| 4 | 1 | 4 |
+----+-------+--------------+
If you want to filter your table based on a comma separated list with ID, my approach is to
append extra commas at the beginning and at the end of a list as well as at the beginning and at the end of an ID, eg.
1 becomes ,1, and list would become ,1,2,3,4,. The reason for that is to avoid ambigious matches like 1 matches 21 or 12 in a list.
Also, EXISTS is well-suited in that situation, which together with INSTR function should work:
SELECT SUM(`marks`)
FROM `student_mark` sm
WHERE EXISTS(SELECT 1 FROM `class_table`
WHERE `teac_id` = '1' AND
INSTR(CONCAT(',', student_id, ','), CONCAT(',', sm.student_id, ',')) > 0)
Demo
BUT you shouldn't store related IDs in one cell as comma separated list - it should be foreign key column to form proper relation. Joins would become trivial then.
Using MySQL 5.7 on Google Cloud, I'm trying to deduplicate MySQL data based on an "EmailAddress" column, but some of the rows have a value in the "FullName" column and some of them don't. I want to keep the ones that have a value in the FullName column, but if none of the rows with that EmailAddress value a FullName value, then just keep the duplicate with the lowest ID number (first column - primary key).
I've finally broken it down into two separate queries, one to first remove the rows with no value in the FullName column IF there's another duplicate row that does have a value in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id!=c2.id
WHERE
(trim(c1.FullName)='' or c1.FullName is NULL)
and c2.FullName is not NULL
and length(trim(c2.FullName))!=0
) t
)
and another query to remove the rows with the bigger IDs where no value was found in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id>c2.id
) t
)
This "works", but not really. It worked one time when I left it running overnight for a smaller segment of the data, and when I woke up there was an error, but I looked at the data and it was complete.
Am I missing something in my query that's making it highly inefficient, or is it just par for the course for this type of query, and there's no optimization possible in my code that would make a tangible improvement? I've maxed out a Google Cloud SQL instance to their db-n1-highmem-32 size, with 32 GB of memory and 1000 GB of storage space, and it still chokes up and spits out a 2013 error after running for an hour. I need to do this for a total of a little over 3 million rows.
For example, this:
id | FullName | EmailAddress |
----------------------------------------------
1 | John Doe | john.doe#email.com |
2 | null | janedoe#box.com |
3 | null | billybob#bobby.com |
4 | null | john.doe#email.com |
5 | John Lennon | jlennon#yoohoo.com |
6 | null | james.smith#coolmail.com|
7 | null | billybob#bobby.com |
8 | Jane Doe | janedoe#box.com |
would result in this:
id | FullName | EmailAddress |
----------------------------------------------
1 | John Doe | john.doe#email.com |
3 | null | billybob#bobby.com |
5 | John Lennon | jlennon#yoohoo.com |
6 | null | james.smith#coolmail.com|
8 | Jane Doe | janedoe#box.com |
using exists() might be simpler in this situation
delete
from customer_info c
where (trim(c.FullName)='' or c.FullName is null)
and exists (
select 1
from customer_info i
where i.Email = c.EmailAddress
and trim(i.FullName)>''
)
delete
from customer_info c
where exists (
select 1
from customer_info i
where i.Email = c.EmailAddress
and i.id < c.id
)
I have inherited a table where one column is a comma-separated list of primary keys for a different table:
id | other_ids | value
---|-----------|-------
1 | a,b,c | 100
2 | d,e | 200
3 | f,g | 3000
I would like to convert this table to one where each other_id gets a column of its own:
id | other_id
---|---------
1 | a
1 | b
1 | c
2 | d
2 | e
3 | f
3 | g
However, I cannot think of a way to do this?
The table is > 10 GB in size, so I would like to do this inside the database, if possible.
first time post, please be kind.
Try this
select id,SUBSTRING_INDEX(other_ids,',',1) as other_id from reverseconcat
UNION
select id,SUBSTRING_INDEX(SUBSTRING_INDEX(other_ids,',',2),',',-1) as other_id from reverseconcat
UNION
select id,SUBSTRING_INDEX(SUBSTRING_INDEX(other_ids,',',3),',',-1) as other_id from reverseconcat
order by id
Although I cant really take any credit. Found this on http://www.programering.com/a/MzMyUzNwATg.html
Unsure how you will go on a huge dataset. Also you will need to add more unions if the other_ids are > 3
If you have the other table, then you can use a join and find_in_set():
select t.id, ot.pk as other_id
from t join
othertable ot
on find_in_set(ot.pk, t.other_ids) > 0;
I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result? :
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try to use User defined variables
select id,w_id
FROM
( select T.*,
if(#id<>id,1,0) as row,
#id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT #id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE Row=1
SQLFiddle demo
Formal SQL
In fact - your solution is correct in terms of normal SQL. Why? Because you have to stick with joining values from original data to grouped data. Thus, your query can not be simplified. MySQL allows to mix non-group columns and group function, but that's totally unreliable, so I will not recommend you to rely on that effect.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(#id!=id, #i:=1, #i:=#i+1) AS num,
#id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM t
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT #i:=-1, #id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found your id and corresponding w_id. The idea is - to count rows and enumerate them, paying attention to the fact, that we're ordering them in subquery. So we need only first row (because it will represent data with highest count).
This may be replaced with single GROUP BY id - but, again, server is free to choose any row in that case (it will work because it will take first row, but documentation says nothing about that for common case).
One little nice thing about this is - you can select, for example, 2-nd by frequency or 3-rd, it's very flexible.
Performance
To increase performance, you can create index on (id, w_id) - obviously, it will be used for ordering and grouping records. But variables and HAVING, however, will produce line-by-line scan for set, derived by internal GROUP BY. It isn't such bad as it was with full scan of original data, but still it isn't good thing about doing this with variables. On the other hand, doing that with JOIN & subquery like in your query won't be much different, because of creating temporery table for subquery result set too.
But to be certain, you'll have to test. And keep in mind - you already have valid solution, which, by the way, isn't bound to DBMS-specific stuff and is good in terms of common SQL.
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
here is the sqlfidddle link
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);
Had a good read through similar topics but I can't quite a) find one to match my scenario, or b) understand others enough to fit / tailor / tweek to my situation.
I have a table, the important fields being;
+------+------+--------+--------+
| ID | Name | Price |Status |
+------+------+--------+--------+
| 1 | Fred | 4.50 | |
| 2 | Fred | 4.50 | |
| 3 | Fred | 5.00 | |
| 4 | John | 7.20 | |
| 5 | John | 7.20 | |
| 6 | John | 7.20 | |
| 7 | Max | 2.38 | |
| 8 | Max | 2.38 | |
| 9 | Sam | 21.00 | |
+------+------+--------+--------+
ID is an auto-incrementing value as records get added throughout the day.
NAME is a Primary Key field, which can repeat 1 to 3 times in the whole table.
Each NAME will have a PRICE value, which may or may not be the same per NAME.
There is also a STATUS field that need to be populated based on the following, which is actually the part I am stuck on.
Status = 'Y' if each DISTINCT name has only one price attached to it.
Status = 'N' if each DISTINCT name has multiple prices attached to it.
Using the table above, ID's 1, 2 and 3 should be 'N', whilst 4, 5, 6, 7, 8 and 9 should be 'Y'.
I think this may well involve some form of combination of JOINs, GROUPs, and DISTINCTs but I am at a loss on how to put that into the right order for SQL.
In order to get the count of distinct Price values per name, we must use a GROUP BY on the Name field, but since you also want to display all names ungrouped but with an additional Status field, we must first create a subselect in the FROM clause which groups by the name and determines whether the name has multiple price values or not.
When we GROUP BY Name in the subselect, COUNT(DISTINCT price) will count the number of distinct price values for each particular name. Without the DISTINCT keyword, it would simply count the number of rows where price is not null.
In conjunction with that, we use a CASE expression to insert N into the Status column if there is more than one distinct Price value for the particular name, otherwise, it will insert Y.
The subselect only returns one row per Name, so to get all names ungrouped, we join that subselect to the main table on the condition that the subselect's Name = the main table's Name:
SELECT
b.ID,
b.Name,
b.Price,
a.Status
FROM
(
SELECT Name, CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
FROM tbl
GROUP BY Name
) a
INNER JOIN
tbl b ON a.Name = b.Name
Edit: In order to facilitate an update, you can incorporate this query using JOINs in the UPDATE like so:
UPDATE
tbl a
INNER JOIN
(
SELECT Name, CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
FROM tbl
GROUP BY Name
) b ON a.Name = b.Name
SET
a.Status = b.Status
Assuming you have an unfilled Status column in your table.
If you want to update the status column, you could do:
UPDATE mytable s
SET status = (
SELECT IF(COUNT(DISTINCT price)=1, 'Y', 'N') c
FROM (
SELECT *
FROM mytable
) s1
WHERE s1.name = s.name
GROUP BY name
);
Technically, it should not be necessary to have this:
FROM (
SELECT *
FROM mytable
) s1
but there is a mysql limitation that prevents you to select from the table you're updating. By wrapping it in parenthesis, we force mysql to create a temporary table and then it suddenly is possible.