I need to clean some data by merging two similar but slightly different dimension field values into one new row that adds together the two metric values, keeping the uid and date intact.
Current setup looks like this:
╔═════╦═════════════╦══════╦═══════════╦═══════════╗
║ id ║ date ║ uid ║ source ║ pageviews ║
╠═════╬═════════════╬══════╬═══════════╬═══════════╣
║ 1 ║ 2013-12-11 ║ 111 ║ source1 ║ 14 ║
║ 3 ║ 2013-12-11 ║ 111 ║ source1a ║ 1 ║
║ 11 ║ 2013-12-11 ║ 222 ║ source1 ║ 3 ║
║ 19 ║ 2013-12-11 ║ 222 ║ source1a ║ 11 ║
╚═════╩═════════════╩══════╩═══════════╩═══════════╝
I'd like to consider source1 and source1a to be equal and merge the two, to get this:
╔═════╦═════════════╦══════╦══════════╦═══════════╗
║ id ║ date ║ uid ║ source ║ pageviews ║
╠═════╬═════════════╬══════╬══════════╬═══════════╣
║ 1 ║ 2013-12-11 ║ 111 ║ source1 ║ 15 ║
║ 2 ║ 2013-12-11 ║ 222 ║ source1 ║ 14 ║
╚═════╩═════════════╩══════╩══════════╩═══════════╝
The id is not important; I had planned to re-increment it in the resulting table.
This is what I tried, but it didn't merge the two records – I am getting matching values but still separate rows:
SELECT date, uid, (SELECT CASE
WHEN source = 'source1a' THEN 'source1'
ELSE source
END) AS 'source', pageviews
FROM trafficSourceMedium
GROUP BY date, source, userid
An aggregation query should do what you want:
select `date`, uid,
(case when source = 'source1a' then 'source1' else source end) as source,
sum(pageviews) as pageviews
from trafficSourceMedium
group by `date`, uid,
(case when source = 'source1a' then 'source1' else source end);
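As a quick check of the aggregation above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the same CASE/SUM/GROUP BY pattern. SQLite stands in for MySQL here, and the alias `merged_source` and the `ORDER BY uid` are additions for the sketch, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trafficSourceMedium "
    "(id INTEGER, date TEXT, uid INTEGER, source TEXT, pageviews INTEGER)"
)
conn.executemany(
    "INSERT INTO trafficSourceMedium VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2013-12-11", 111, "source1", 14),
        (3, "2013-12-11", 111, "source1a", 1),
        (11, "2013-12-11", 222, "source1", 3),
        (19, "2013-12-11", 222, "source1a", 11),
    ],
)

# Map 'source1a' onto 'source1', then sum pageviews per (date, uid, source).
query = """
SELECT date, uid,
       CASE WHEN source = 'source1a' THEN 'source1' ELSE source END AS merged_source,
       SUM(pageviews) AS pageviews
FROM trafficSourceMedium
GROUP BY date, uid,
         CASE WHEN source = 'source1a' THEN 'source1' ELSE source END
ORDER BY uid
"""
merged_rows = conn.execute(query).fetchall()
print(merged_rows)
```

The key point is that the same CASE expression appears in both the select list and the GROUP BY, so the two source values fall into one group before SUM is applied.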
Related
I have the following data:
╔════╦═══════╦═══════╗
║ id ║ group ║ place ║
╠════╬═══════╬═══════╣
║ 1 ║ 1 ║ a ║
║ 2 ║ 1 ║ b ║
║ 3 ║ 1 ║ b ║
║ 4 ║ 1 ║ a ║
║ 5 ║ 1 ║ c ║
║ 6 ║ 2 ║ a ║
║ 7 ║ 2 ║ b ║
║ 8 ║ 2 ║ c ║
╚════╩═══════╩═══════╝
How can I get the path of each group in MySQL?
The expected result is:
╔═══════╦════════════╗
║ group ║ path ║
╠═══════╬════════════╣
║ 1 ║ a-b-a-c ║
║ 2 ║ a-b-c ║
╚═══════╩════════════╝
Assuming that the end goal is to sort by group and id, and then simplify each group's sequence so that consecutive repeated places are only shown once:
Start by determining, for each row, whether the place or the group has changed since the previous row. There's a good solution to this problem in this answer.
Then use GROUP_CONCAT to merge the places together into a path.
Be aware that GROUP_CONCAT has a user-configurable maximum length, which by default is 1,024 characters.
SELECT
    `group`,
    GROUP_CONCAT(place ORDER BY id SEPARATOR '-') path
FROM
    (SELECT
        COALESCE(@place != place OR @group != `group`, 1) changed,
        id,
        @group:=`group` `group`,
        @place:=place place
    FROM
        place_table, (SELECT @place:=NULL, @group:=NULL) s
    ORDER BY `group`, id) t
WHERE
    changed = 1
GROUP BY `group`;
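To make the intended result concrete, here is a small sketch of the same logic outside the database: sort by group and id, collapse consecutive repeats of a place, and join what is left. The function name `group_paths` and the tuple layout `(id, group, place)` are assumptions made for this illustration only.

```python
from itertools import groupby

# Sample rows from the question as (id, group, place) tuples.
rows = [
    (1, 1, "a"), (2, 1, "b"), (3, 1, "b"), (4, 1, "a"), (5, 1, "c"),
    (6, 2, "a"), (7, 2, "b"), (8, 2, "c"),
]


def group_paths(rows):
    """Return {group: path}, collapsing consecutive repeated places."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by group, then id
    paths = {}
    for group_id, members in groupby(ordered, key=lambda r: r[1]):
        # groupby without a key merges runs of equal adjacent values.
        places = [place for place, _ in groupby(m[2] for m in members)]
        paths[group_id] = "-".join(places)
    return paths


paths = group_paths(rows)
print(paths)
```

Only adjacent repeats are removed, so a place that is revisited later (the second "a" in group 1) still appears in the path.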
I've inherited a database that includes a lookup table to find other patents that are related to a given patent.
So it looks like
╔════╦═══════════╦════════════╗
║ id ║ patent_id ║ related_id ║
╠════╬═══════════╬════════════╣
║ 1 ║ 1 ║ 2 ║
║ 2 ║ 1 ║ 3 ║
║ 3 ║ 2 ║ 1 ║
║ 4 ║ 2 ║ 3 ║
║ 5 ║ 3 ║ 2 ║
╚════╩═══════════╩════════════╝
And I want to filter out the reciprocal relationships. 1->2 and 2->1 are the same for my purposes so I only want 1->2.
I don't need to make the edit in the table; I just need a query that returns a list of the unique relationships. I'm sure it's simple, but I've been banging my head against the keyboard for far too long.
Here is a clever query which you can try using. The general strategy is to identify the unwanted duplicate records and then subtract them away from the entire set.
SELECT t.id, t.patent_id, t.related_id
FROM t LEFT JOIN
(
SELECT t1.patent_id AS t1_patent_id, t1.related_id AS t1_related_id
FROM t t1 LEFT JOIN t t2
ON t1.related_id = t2.patent_id
WHERE t1.patent_id = t2.related_id AND t1.patent_id > t1.related_id
) t3
ON t.patent_id = t3.t1_patent_id AND t.related_id = t3.t1_related_id
WHERE t3.t1_patent_id IS NULL
Here is the inner temporary table generated by this query. You can convince yourself that by applying the logic in the WHERE clause you will select the correct records. Non-duplicate records are characterized by t1.patent_id != t2.related_id, and all these records are retained. In the case of duplicates (t1.patent_id = t2.related_id), the record chosen from each pair of duplicates is the one where patent_id < related_id, as you requested in your question.
╔════╦══════════════╦═══════════════╦══════════════╦═══════════════╗
║ id ║ t1.patent_id ║ t1.related_id ║ t2.patent_id ║ t2.related_id ║
╠════╬══════════════╬═══════════════╬══════════════╬═══════════════╣
║ 1 ║ 1 ║ 2 ║ 2 ║ 1 ║ * duplicate
║ 1 ║ 1 ║ 2 ║ 2 ║ 3 ║
║ 2 ║ 1 ║ 3 ║ 3 ║ 2 ║
║ 3 ║ 2 ║ 1 ║ 1 ║ 2 ║ * duplicate
║ 3 ║ 2 ║ 1 ║ 1 ║ 3 ║
║ 4 ║ 2 ║ 3 ║ 3 ║ 2 ║ * duplicate
║ 5 ║ 3 ║ 2 ║ 2 ║ 1 ║
║ 5 ║ 3 ║ 2 ║ 2 ║ 3 ║ * duplicate
╚════╩══════════════╩═══════════════╩══════════════╩═══════════════╝
Click the link below for a running example of this query.
SQLFiddle
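For readers without access to the fiddle, the following sketch runs the same anti-join query against the sample rows in an in-memory SQLite database. SQLite is an assumption here (the question does not depend on it), and the results are sorted in Python only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, patent_id INTEGER, related_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 3), (3, 2, 1), (4, 2, 3), (5, 3, 2)],
)

# t3 holds the "unwanted" half of each reciprocal pair (patent_id > related_id);
# the outer LEFT JOIN ... IS NULL keeps every row that is not in t3.
query = """
SELECT t.id, t.patent_id, t.related_id
FROM t LEFT JOIN
(
    SELECT t1.patent_id AS t1_patent_id, t1.related_id AS t1_related_id
    FROM t t1 LEFT JOIN t t2
        ON t1.related_id = t2.patent_id
    WHERE t1.patent_id = t2.related_id AND t1.patent_id > t1.related_id
) t3
    ON t.patent_id = t3.t1_patent_id AND t.related_id = t3.t1_related_id
WHERE t3.t1_patent_id IS NULL
"""
unique_relationships = sorted(conn.execute(query).fetchall())
print(unique_relationships)
```

Note that a one-directional relationship such as 1->3 survives even though it has no reciprocal row, because only rows that have a mirror image and the larger patent_id are subtracted.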
Try something like
select distinct * from
(select patent_id, related_id from TABLENAME
union
select related_id, patent_id from TABLENAME
) u;
Okay you're right the above won't work. Try
select patent_id, related_id from TABLENAME p1
where p1.patent_id < p1.related_id
or not exists
(select 1 from TABLENAME p2
where p2.patent_id = p1.related_id
and p2.related_id = p1.patent_id);
I'd like to merge rows based on multiple criteria, essentially removing duplicates where I get to define what "duplicate" means. Here is an example table:
╔═════╦═══════╦═════╦═══════╗
║ id* ║ name ║ age ║ grade ║
╠═════╬═══════╬═════╬═══════╣
║ 1 ║ John ║ 11 ║ 5 ║
║ 2 ║ John ║ 11 ║ 5 ║
║ 3 ║ John ║ 11 ║ 6 ║
║ 4 ║ Sam ║ 14 ║ 7 ║
║ 5 ║ Sam ║ 14 ║ 7 ║
╚═════╩═══════╩═════╩═══════╝
In my example, let's say I want to merge on name and age but ignore grade. The result should be:
╔═════╦═══════╦═════╦═══════╗
║ id* ║ name ║ age ║ grade ║
╠═════╬═══════╬═════╬═══════╣
║ 1 ║ John ║ 11 ║ 5 ║
║ 3 ║ John ║ 11 ║ 6 ║
║ 4 ║ Sam ║ 14 ║ 7 ║
╚═════╩═══════╩═════╩═══════╝
I don't particularly care if the id column is updated to be incremental, but I suppose that would be nice.
Can I do this in MySQL?
My suggestion, based on my above comment.
CREATE TABLE tempTable AS
SELECT DISTINCT name, age, grade
FROM theTable;
This ignores the ids and writes only the distinct rows into a new table. (MySQL does not support SELECT ... INTO a new table, hence CREATE TABLE ... AS SELECT.)
Then you can either drop the old table and rename the new one, or truncate the old one and load this data back in.
You could just delete the duplicates in place like this:
delete test
from test
inner join (
select name, age, grade, min(id) as minid
from test
group by name, age, grade
) main on test.name = main.name
and test.age = main.age
and test.grade = main.grade
where test.id > main.minid;
Example: http://sqlfiddle.com/#!9/f1a38/1
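A minimal sketch of in-place de-duplication that keeps the lowest id in each (name, age, grade) group, run against the sample rows. It uses an in-memory SQLite database, and because SQLite has no DELETE ... JOIN, the statement is written with NOT IN; that rewrite is an adaptation for the sketch, not the MySQL syntax from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, age INTEGER, grade INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (1, "John", 11, 5),
        (2, "John", 11, 5),
        (3, "John", 11, 6),
        (4, "Sam", 14, 7),
        (5, "Sam", 14, 7),
    ],
)

# Keep only the smallest id of every (name, age, grade) group.
conn.execute(
    """
    DELETE FROM test
    WHERE id NOT IN (
        SELECT MIN(id) FROM test GROUP BY name, age, grade
    )
    """
)
remaining = conn.execute("SELECT id, name, age, grade FROM test ORDER BY id").fetchall()
print(remaining)
```

Changing the GROUP BY column list is how you redefine what counts as a duplicate.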
I have two tables:
╔════════════════╗ ╔════════════════╗
║ ITEM ║ ║ ITEM_TRACK ║
╠════════════════╣ ╠════════════════╣
║ ID ║ ║ ID ║
║ GUID ║ ║ ITEM_GUID ║
║ COUNT1 ║ ║ CONTEXT ║
║ ENDDATE ║ ║ ║
╚════════════════╝ ╚════════════════╝
╔═════╦══════╦════════╗ ╔═════╦═══════════╦══════════╗
║ ID ║ GUID ║ COUNT1 ║ ║ ID ║ ITEM_GUID ║ CONTEXT ║
╠═════╬══════╬════════╣ ╠═════╬═══════════╬══════════╣
║ 1 ║ aaa ║ ║ ║ 1 ║ abc ║ ITEM ║
║ 2 ║ bbb ║ ║ ║ 2 ║ aaa ║ PAGE ║
║ 3 ║ ccc ║ ║ ║ 3 ║ bbb ║ ITEM ║
║ 4 ║ abc ║ ║ ║ 4 ║ ccc ║ ITEM ║
╚═════╩══════╩════════╝ ║ 5 ║ abc ║ ITEM ║
║ 6 ║ aaa ║ ITEM ║
║ 7 ║ abc ║ ITEM ║
║ 8 ║ ccc ║ PAGE ║
╚═════╩═══════════╩══════════╝
What I'm trying to do is fill in the COUNT1 column in ITEM with the count of the number of times ITEM_GUID appears in ITEM_TRACK for all ITEM.GUIDs where ENDDATE is still in the future. I need to do this once an hour for all GUIDS in ITEM.
I can get the counts I need easily
SELECT ITEM_GUID, COUNT(*) from ITEM_TRACK GROUP BY ITEM_GUID;
What I don't know how to do is, how do I merge this with an INSERT INTO statement to automatically update all the items in the items table with the count based on their ENDDATE?
UPDATE:
I have a working solution based on Aquillo's answer:
UPDATE ITEM a
SET COUNT1 = (SELECT COUNT(*) AS total FROM ITEM_TRACK b WHERE b.item_guid=a.guid);
Is there any other way to do this without a subquery?
You can insert from a select like this:
INSERT INTO myTable (foreignKey, countColumn)
SELECT ITEM_GUID, COUNT(*) from ITEM_TRACK GROUP BY ITEM_GUID;
In case you want to update, try something like this:
UPDATE from SELECT using SQL Server
If you use INSERT INTO you'll put additional rows in your ITEM table, not update the existing ones. If this is what you meant then that's great, but if you want to update the existing ones, you'll need to use update. You do this by joining the table you want to update with the table you want to update from. However, in your case you want to update from an aggregation and so you need to create a table with the aggregated values. Try this:
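UPDATE ITEM SET Count1 = temp.total
FROM Item
INNER JOIN (
SELECT ITEM_GUID, COUNT(*) AS total
FROM ITEM_TRACK
GROUP BY ITEM_GUID) AS temp
ON Item.GUID = temp.ITEM_GUID
WHERE ENDDATE > NOW()
I've tried this on SQL Server (using GETDATE() instead of NOW()) to double check and it worked. MySQL does not accept UPDATE ... FROM; there the join goes before SET, i.e. UPDATE ITEM INNER JOIN (...) AS temp ON ITEM.GUID = temp.ITEM_GUID SET ITEM.COUNT1 = temp.total WHERE ITEM.ENDDATE > NOW().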
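The counting logic in this thread can be checked with a small sketch on the sample data. It uses an in-memory SQLite database and the correlated-subquery form from the question's update, plus the ENDDATE filter. The ENDDATE values and the fixed `now` string are hypothetical, chosen so that one item ('bbb') has already expired.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEM (ID INTEGER, GUID TEXT, COUNT1 INTEGER, ENDDATE TEXT)")
conn.execute("CREATE TABLE ITEM_TRACK (ID INTEGER, ITEM_GUID TEXT, CONTEXT TEXT)")

# ENDDATE values are made up for illustration; 'bbb' is already in the past.
conn.executemany(
    "INSERT INTO ITEM VALUES (?, ?, NULL, ?)",
    [
        (1, "aaa", "2024-06-01"),
        (2, "bbb", "2023-06-01"),
        (3, "ccc", "2024-06-01"),
        (4, "abc", "2024-06-01"),
    ],
)
conn.executemany(
    "INSERT INTO ITEM_TRACK VALUES (?, ?, ?)",
    [
        (1, "abc", "ITEM"), (2, "aaa", "PAGE"), (3, "bbb", "ITEM"),
        (4, "ccc", "ITEM"), (5, "abc", "ITEM"), (6, "aaa", "ITEM"),
        (7, "abc", "ITEM"), (8, "ccc", "PAGE"),
    ],
)

now = "2024-01-01"  # fixed "current date" so the sketch is deterministic
conn.execute(
    """
    UPDATE ITEM
    SET COUNT1 = (SELECT COUNT(*) FROM ITEM_TRACK b WHERE b.ITEM_GUID = ITEM.GUID)
    WHERE ENDDATE > ?
    """,
    (now,),
)
counts = conn.execute("SELECT GUID, COUNT1 FROM ITEM ORDER BY ID").fetchall()
print(counts)
```

Items whose ENDDATE has passed are skipped by the WHERE clause, so their COUNT1 keeps its previous value (NULL in this sketch).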
╔════════╦═══════════╦═══════╗
║ MSG_ID ║ RANDOM_ID ║ MSG ║
╠════════╬═══════════╬═══════╣
║ 1 ║ 22 ║ apple ║
║ 2 ║ 22 ║ bag ║
║ 3 ║ 0 ║ cat ║
║ 4 ║ 0 ║ dog ║
║ 5 ║ 0 ║ egg ║
║ 6 ║ 21 ║ fish ║
║ 7 ║ 21 ║ hen ║
║ 8 ║ 20 ║ glass ║
╚════════╩═══════════╩═══════╝
I want to fetch 3 records per batch in such a way that all the rows for a particular random_id are picked up.
Result Required:
║ MSG_ID ║ RANDOM_ID ║ MSG ║
╠════════╬═══════════╬═══════╣
║ 1 ║ 22 ║ apple ║
║ 2 ║ 22 ║ bag ║
║ 3 ║ 0 ║ cat ║
Current Result:
║ MSG_ID ║ RANDOM_ID ║ MSG ║
╠════════╬═══════════╬═══════╣
║ 1 ║ 22 ║ apple ║
║ 3 ║ 0 ║ cat ║
║ 4 ║ 0 ║ dog ║
______________________________
Query Used:
SELECT ID,Random_ID, GROUP_CONCAT(message SEPARATOR ' ' ),FLAG,mobile,sender_number,SMStype
FROM messagemaster
WHERE Random_ID > 0
GROUP BY Random_ID
UNION
SELECT ID,Random_ID, message,FLAG,mobile,sender_number,SMStype
FROM messagemaster
WHERE Random_ID = 0
order by random_id LIMIT 100;
I don't want to pick up records using GROUP BY. I want to fetch all the records with respect to their random_id values. For example, if there is a random_id with 3 records and the query has LIMIT 3, then I want all the rows for that random_id to be picked up.
In other words, if I fetch rows with LIMIT 100, I don't want the result set to contain only some of the rows for a random_id that appears in it.
For example, if I pick records with LIMIT 3, then for random_id = 22 all records with random_id = 22 should be picked.
Consider the following...
SELECT b.*
FROM
( SELECT x.*, SUM(y.cnt)
FROM
( SELECT random_id,COUNT(*) cnt FROM messagemaster GROUP BY random_id) x
JOIN
( SELECT random_id,COUNT(*) cnt FROM messagemaster GROUP BY random_id) y
ON y.random_id >= x.random_id
GROUP
BY x.random_id
HAVING SUM(y.cnt) < 4
) a
JOIN messagemaster b
ON b.random_id = a.random_id;
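To see what the query above selects, here is a sketch that runs it on the sample table in an in-memory SQLite database. The column names `msg_id` and `msg` follow the table shown in the question (the asker's own query uses different names), and the alias `running_total` is added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messagemaster (msg_id INTEGER, random_id INTEGER, msg TEXT)")
conn.executemany(
    "INSERT INTO messagemaster VALUES (?, ?, ?)",
    [
        (1, 22, "apple"), (2, 22, "bag"), (3, 0, "cat"), (4, 0, "dog"),
        (5, 0, "egg"), (6, 21, "fish"), (7, 21, "hen"), (8, 20, "glass"),
    ],
)

# For each random_id, sum the row counts of all random_ids >= it (a running
# total from the highest id downwards) and keep those whose total stays below 4.
query = """
SELECT b.*
FROM (
    SELECT x.*, SUM(y.cnt) AS running_total
    FROM (SELECT random_id, COUNT(*) cnt FROM messagemaster GROUP BY random_id) x
    JOIN (SELECT random_id, COUNT(*) cnt FROM messagemaster GROUP BY random_id) y
        ON y.random_id >= x.random_id
    GROUP BY x.random_id
    HAVING SUM(y.cnt) < 4
) a
JOIN messagemaster b
    ON b.random_id = a.random_id
"""
batch = sorted(conn.execute(query).fetchall())
print(batch)
```

On this sample only random_id 22 fits under the threshold (adding the two rows of random_id 21 would bring the total to 4), so whole groups are never split, at the cost of sometimes returning fewer rows than the limit.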