I have a MySQL database table that logs site searches. If it encounters a new search phrase it inserts a new row with the phrase and a counter value of 1. If the phrase already exists it increments the counter.
For example:
Id term count
1 boots 14
2 shirts 2031
3 t-shirt 1005
4 tshirt 188
Unfortunately the table contains duplicates for many phrases, for example:
Id term count
12 sneakers 711
26 sneakers 235
27 sneakers 114
108 sneakers 7
What would a MySQL query look like that combines all the duplicates into one row, totaling up their counts?
What I want:
Id term count
12 sneakers 1067
Thank you in advance.
You are looking for the SUM() aggregate function:
-- `count` is also the name of a function, so enclose it in backticks to avoid ambiguity
SELECT MIN(Id) AS Id, term, SUM(`count`) AS `count` FROM tablename GROUP BY term;
In this case however, if this result is what you want on a permanent basis, I would do as follows:
-- Create a table identical to the current one
CREATE TABLE newtable (Id integer not null auto_increment, term ...)
-- Create a UNIQUE index on the term column
CREATE UNIQUE INDEX newtable_term_udx ON newtable(term);
-- Populate the table with the cleaned results
INSERT INTO newtable (term, `count`) SELECT term, SUM(`count`) ...as above
-- rename the old table as backup, rename newtable with the same name as old table
Then whenever you do an INSERT into the table (which is now the new table) do
INSERT INTO tablename (term, `count`) VALUES ('new word', 1)
ON DUPLICATE KEY UPDATE `count`=`count`+1
This way, if the new term does not exist it will be created with an initial count of 1, and if it does exist, its count will be incremented by 1, all automatically.
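As a rough, runnable sketch of the workflow above: the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it is an approximation and not the exact MySQL syntax. SQLite's ON CONFLICT ... DO UPDATE (available from SQLite 3.24) plays the role of ON DUPLICATE KEY UPDATE. The table name searches, the helper log_search and the term 'sandals' are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; "searches" is a hypothetical name for the old table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches (Id INTEGER PRIMARY KEY, term TEXT, "count" INTEGER);
    INSERT INTO searches (Id, term, "count") VALUES
        (1, 'boots', 14), (12, 'sneakers', 711), (26, 'sneakers', 235),
        (27, 'sneakers', 114), (108, 'sneakers', 7);

    -- New table with a UNIQUE index on term, filled with the merged rows.
    CREATE TABLE newtable (Id INTEGER PRIMARY KEY AUTOINCREMENT, term TEXT, "count" INTEGER);
    CREATE UNIQUE INDEX newtable_term_udx ON newtable(term);
    INSERT INTO newtable (term, "count")
        SELECT term, SUM("count") FROM searches GROUP BY term;
""")


def log_search(term):
    """Insert a new term with count 1, or increment the count of an existing one."""
    conn.execute(
        'INSERT INTO newtable (term, "count") VALUES (?, 1) '
        'ON CONFLICT(term) DO UPDATE SET "count" = "count" + 1',
        (term,),
    )


merged = dict(conn.execute('SELECT term, "count" FROM newtable'))
log_search("sneakers")  # existing term: counter goes up by one
log_search("sandals")   # new term: row is created with a count of 1
after = dict(conn.execute('SELECT term, "count" FROM newtable'))
print(merged, after)
```

The unique index is what makes the upsert work: without it there is no conflict to react to, and duplicates would accumulate again.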
select min(Id) as Id, term, sum(count) as count
from your_table
group by term;
Alternatively, you can use max(Id), depending on if you want the first or last Id value for each term.
You can play around with the query here: db-fiddle
Essentially I have the following table, called Table1, with columns OrderNum and Book. There should never be duplicate records of any Book for an OrderNum; if there are, they need to be identified and deleted.
For example:
OrderNum 1 should only have Book1 listed once so the query must identify the other 2 Book1 listed for OrderNum 1 and delete them.
OrderNum 4 should only have Book2 listed once so the query must identify the other Book2 listed for OrderNum 4 and delete it.
After the query runs, Table1 should look like this:
I am working with MS Access queries, but I am looking for a solution that could work as a MySQL query as well.
I don't know how to do this gracefully on either MySQL or Access, because your table doesn't have a primary key column, which it rightfully should have. On Access, you could try creating a new table, then populating it using the following query:
INSERT INTO yourNewTable (OrderNum, Book)
SELECT DISTINCT OrderNum, Book
FROM yourTable;
Then, delete yourTable after you are done with the above query.
If you had a primary key/auto increment column in your table, let's say id, then you could use the following delete statement directly:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.OrderNum = t1.OrderNum AND
t2.Book = t1.Book AND
t2.id < t1.id);
This would leave, for each (OrderNum, Book) combination, the single record among duplicates which happens to have the lowest id value.
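Here is a small sketch of that "keep the lowest id" delete, run through Python's sqlite3 module so it can be tried without a server. SQLite is only a stand-in here, and the sample rows are made up to match the description in the question (three Book1 rows for OrderNum 1, two Book2 rows for OrderNum 4). The outer table is referenced by name instead of an alias, which is a choice made for SQLite and not part of the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, OrderNum INTEGER, Book TEXT);
    -- Hypothetical sample data; ids 1..6 are assigned in insertion order.
    INSERT INTO Table1 (OrderNum, Book) VALUES
        (1, 'Book1'), (1, 'Book1'), (1, 'Book1'),
        (2, 'Book1'), (4, 'Book2'), (4, 'Book2');
""")

# Delete every row for which an identical (OrderNum, Book) row with a lower id exists.
conn.execute("""
    DELETE FROM Table1
    WHERE EXISTS (SELECT 1 FROM Table1 AS t2
                  WHERE t2.OrderNum = Table1.OrderNum
                    AND t2.Book = Table1.Book
                    AND t2.id < Table1.id)
""")

remaining = conn.execute(
    "SELECT id, OrderNum, Book FROM Table1 ORDER BY id"
).fetchall()
print(remaining)
```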
I have the following query to append data into a table if it is unique:
INSERT INTO belgarath.players(tour_id, player_id, player_name_oc)
SELECT DISTINCT 0, ID_P, NAME_P FROM oncourt.players_atp
LEFT JOIN belgarath.players
ON belgarath.players.tour_id = 0
AND belgarath.players.player_id=oncourt.players_atp.ID_P;
I run this once on an empty table and it's fine. I delete a row and run it expecting MySQL to append the one deleted row. However, I get the following error code: Error Code: 1062. Duplicate entry '0-43042' for key 'players.unique_plyrs' . I have a unique key across tour_id and player_id and clearly it's failing because I'm trying to append a duplicate record.
Why would I be getting this if I'm only selecting distinct records to insert? How do I avoid getting this in future?
This should resolve your issue. Add a WHERE clause that keeps only the rows where belgarath.players.player_id IS NULL, i.e. the source rows that have no match in the target table yet.
INSERT INTO belgarath.players(tour_id, player_id, player_name_oc)
SELECT DISTINCT 0, ID_P, NAME_P FROM oncourt.players_atp
LEFT JOIN belgarath.players
ON belgarath.players.tour_id = 0
AND belgarath.players.player_id=oncourt.players_atp.ID_P
WHERE belgarath.players.player_id is NULL;
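To see the anti-join in action, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The schema names are dropped, the player rows are invented, and the count helper is only there for the demonstration. It reproduces the scenario from the question: fill the empty table, delete one row, run the same statement again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players_atp (ID_P INTEGER, NAME_P TEXT);
    CREATE TABLE players (tour_id INTEGER, player_id INTEGER, player_name_oc TEXT,
                          UNIQUE (tour_id, player_id));
    -- Made-up source rows.
    INSERT INTO players_atp VALUES (1, 'Player A'), (2, 'Player B'), (3, 'Player C');
""")

# LEFT JOIN plus IS NULL keeps only source rows with no match in the target table.
APPEND_MISSING = """
    INSERT INTO players (tour_id, player_id, player_name_oc)
    SELECT DISTINCT 0, src.ID_P, src.NAME_P
    FROM players_atp AS src
    LEFT JOIN players AS dst
           ON dst.tour_id = 0 AND dst.player_id = src.ID_P
    WHERE dst.player_id IS NULL
"""


def count_players():
    return conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]


conn.execute(APPEND_MISSING)            # first run on the empty table
after_first_run = count_players()
conn.execute("DELETE FROM players WHERE player_id = 2")
after_delete = count_players()
conn.execute(APPEND_MISSING)            # second run re-adds only the missing row
after_second_run = count_players()
print(after_first_run, after_delete, after_second_run)
```

Without the WHERE clause the second run would try to insert all three rows again and hit the unique key.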
I hope this hint about the DISTINCT keyword helps. DISTINCT selects distinct rows, so we can't expect it to return distinct values for only the one column it is written in front of. The example below explains what I mean.
create table test(id1 int, id2 int);
insert into test values(1,1),(1,2),(1,3);
Here I have created a test table. When I use the DISTINCT keyword as in the query below
select distinct id1, id2 from test;
Then we'll get output like this:
id1 id2
1 1
1 2
1 3
You are inserting tour_id as 0, and you have defined tour_id and player_id together as a unique key on the belgarath.players table. Your SELECT therefore returns tour_id 0 for every row. DISTINCT only removes rows that are identical in every selected column, and it does not compare them against rows already in the target table: if player_id is 1, 2, 3 and the names are john, steve, bill, the SELECT returns three rows, (0, 1, john), (0, 2, steve), (0, 3, bill), and so on.
If your oncourt.players_atp table also contains a tour_id column, you can simply copy tour_id from there. If it does not, and you want to generate the value inside belgarath.players, you can define tour_id as an auto-increment column in the table definition. MySQL will then generate unique ids, and your query only needs to insert player_id and the player name.
Hope this may help you.
I have a table with several columns. The only way to tell which rows belong together is by session_id.
My goal is to update the age column with a random number.
Therefore I used following query
UPDATE userdata
SET age = (FLOOR(18 + RAND() * 62))
WHERE session_id IN
(
SELECT session_id FROM
(
SELECT DISTINCT session_id
FROM userdata
) AS temp
)
This works fine, but every distinct session_id still has multiple rows, and their age values differ. I know this query does what it should do, but how can I change it so that all rows with the same session_id get the same age value?
e.g.
session_id | age
1 25
2 35
2 35
3 51
3 51
3 51
Thank you in advance and please be nice, since this is my first stackoverflow question.
Try this:
update userdata
inner join (select session_id, FLOOR(18 + RAND() * 62) as age2 from userdata group by session_id) T ON T.session_id = userdata.session_id
set userdata.age = T.age2
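The idea behind this answer is to draw one random number per distinct session_id first and only then write it to every row of that session. A hedged sketch of the same idea with Python's sqlite3 module follows. It is not the MySQL statement itself: the random ages are drawn in Python with random.randint(18, 79), which covers the same range as FLOOR(18 + RAND() * 62), and the helper table session_age is a made-up name.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userdata (session_id INTEGER, age INTEGER);
    INSERT INTO userdata (session_id) VALUES (1), (2), (2), (3), (3), (3);
    -- One row per session: this is where the single random age per session lives.
    CREATE TEMP TABLE session_age (session_id INTEGER PRIMARY KEY, age2 INTEGER);
""")

session_ids = [row[0] for row in
               conn.execute("SELECT DISTINCT session_id FROM userdata")]
conn.executemany(
    "INSERT INTO session_age (session_id, age2) VALUES (?, ?)",
    [(sid, random.randint(18, 79)) for sid in session_ids],
)

# Copy each session's age to all of its rows.
conn.execute("""
    UPDATE userdata
    SET age = (SELECT age2 FROM session_age
               WHERE session_age.session_id = userdata.session_id)
""")

# For every session: number of distinct ages, lowest age, highest age.
ages_per_session = conn.execute(
    "SELECT session_id, COUNT(DISTINCT age), MIN(age), MAX(age) "
    "FROM userdata GROUP BY session_id ORDER BY session_id"
).fetchall()
print(ages_per_session)
```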
This seems the wrong way around. It seems session_id is not unique (when perhaps it should be, which is the first place you could correct this), so you are trying to create a unique column based on it. But as you rightly point out, you can't create a new number based on the session id, as it too will be the same. The most obvious solution is to add an auto-increment column:
ALTER TABLE yourtablename ADD id int(11) PRIMARY KEY AUTO_INCREMENT;
I've an 'orders' table structure like this which contains 100,000 records:
date orderid type productsales other
01-Aug-2014 11 order 118 10.12
01-Aug-2014 11 order 118 10.12
18-Aug-2014 11 order 35 4.21
22-Aug-2014 11 Refund -35 -4.21
09-Sep-2014 12 order 56 7.29
15-Sep-2014 12 refund -56 -7.29
23-Oct-2014 13 order 25 2.32
26-Oct-2014 13 refund -25 -2.32
Now, what I want to achieve is to delete the duplicate rows from my table where the orderid, type, productsales and other column values are the same, keeping only one row (look at the first two records for orderid 11).
But if two records of the same 'type' have the same 'orderid' but different 'productsales' and 'other' values, then don't delete those rows. I hope that clarifies my point.
I'm looking for a mysql delete query to perform this task.
You should add an id column. If you don't want to use a temp table, you could probably do something like this (I have NOT tested this, so...):
ALTER TABLE `orders`
ADD COLUMN `id` INT NOT NULL AUTO_INCREMENT FIRST, ADD PRIMARY KEY (`id`);
-- Self-join: delete every row that has an identical twin with a lower id
DELETE o1
FROM orders o1 INNER JOIN orders o2
ON o1.`date` = o2.`date`
AND o1.orderid = o2.orderid
AND o1.type = o2.type
AND o1.productsales = o2.productsales
AND o1.other = o2.other
AND o1.id > o2.id;
This may be a duplicate of this question: MySql: remove table rows depending on column duplicate values?
You can look for the answer there.
The solution there states that adding a unique index on the possibly duplicated columns, with the IGNORE keyword, will remove all duplicate rows.
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX `name` (`col1`, `col2`, `col3`);
Here I also want to mention some points:
A unique index does not treat rows as duplicates if any of the indexed columns (here, 3 columns) is NULL. For example, null,1,"asdsa" can be stored twice.
In the same way, if you have a single column in the unique index, multiple rows with NULL in that column will remain in the table.
The IGNORE keyword is deprecated as of MySQL 5.6 and was removed in MySQL 5.7. The only option now is to create a new table with a query like this:
CREATE TABLE <table_name> AS SELECT * FROM <your_table> GROUP BY col1,col2,col3;
After that you can drop <your_table> and rename <table_name> to your table's name.
You can change the column list in the GROUP BY clause as needed (all columns, one column, or the few columns that hold duplicate values together).
The advantage is that it also works with NULL values.
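The NULL behaviour described in the points above can be checked with a short sketch. It uses Python's sqlite3 module as a stand-in, since SQLite treats NULLs in unique indexes and in GROUP BY the same way for this purpose. The table t and its rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 TEXT);
    CREATE UNIQUE INDEX t_udx ON t (col1, col2, col3);
    INSERT INTO t VALUES (NULL, 1, 'asdsa');
    INSERT INTO t VALUES (NULL, 1, 'asdsa');  -- accepted: NULLs are never equal in the index
    INSERT INTO t VALUES (2, 1, 'x');
""")

# A true duplicate without NULLs is rejected by the unique index.
try:
    conn.execute("INSERT INTO t VALUES (2, 1, 'x')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# GROUP BY puts the two NULL rows into one group, so they collapse to a single row.
grouped = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT col1, col2, col3 FROM t GROUP BY col1, col2, col3)"
).fetchone()[0]
print(duplicate_rejected, stored, grouped)
```

So the unique index leaves both NULL rows in place, while the GROUP BY rebuild keeps only one of them.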
A really easy way to do this is to add a UNIQUE index on the 4 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:
ALTER IGNORE TABLE orders ADD UNIQUE INDEX idx_name (orderid, type, productsales, other);
This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this...
I hope this can help you.
Try this.
Create a temporary table, such as temp, and store the unique rows in it:
CREATE TABLE temp AS SELECT DISTINCT * FROM orders;
Then delete the records of the orders table:
DELETE FROM orders;
After all records are deleted, insert the records from temp back into orders and drop temp:
INSERT INTO orders SELECT * FROM temp;
DROP TABLE temp;
If you have completely duplicated rows, and you want to do this in SQL, then perhaps the best method is to save the rows you want in a temporary table, truncate the table, and insert the data back in:
create temporary table temp_orders as
select distinct *
from orders;
truncate table orders;
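-- orderid already exists in orders, so the new key column needs its own name (id here)
alter table orders add id int not null primary key auto_increment;
insert into orders (`date`, orderid, type, productsales, other)
select `date`, orderid, type, productsales, other
from temp_orders;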
Oh, look, I also added an auto-incrementing primary key so you won't have this problem in the future. This would be a simpler process if you have a unique key on each row.
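For anyone who wants to try the temp-table round trip on the sample rows from the question, here is a small sketch using Python's sqlite3 module. SQLite stands in for MySQL, DELETE is used where MySQL would use TRUNCATE, and the step that adds the auto-increment key is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (date TEXT, orderid INTEGER, type TEXT,
                         productsales INTEGER, other REAL);
    INSERT INTO orders VALUES
        ('01-Aug-2014', 11, 'order', 118, 10.12),
        ('01-Aug-2014', 11, 'order', 118, 10.12),
        ('18-Aug-2014', 11, 'order', 35, 4.21),
        ('22-Aug-2014', 11, 'Refund', -35, -4.21),
        ('09-Sep-2014', 12, 'order', 56, 7.29),
        ('15-Sep-2014', 12, 'refund', -56, -7.29),
        ('23-Oct-2014', 13, 'order', 25, 2.32),
        ('26-Oct-2014', 13, 'refund', -25, -2.32);

    -- Keep one copy of every fully identical row, empty the table, put the copies back.
    CREATE TEMP TABLE temp_orders AS SELECT DISTINCT * FROM orders;
    DELETE FROM orders;
    INSERT INTO orders SELECT * FROM temp_orders;
    DROP TABLE temp_orders;
""")

total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
first_order_copies = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE orderid = 11 AND type = 'order' AND productsales = 118"
).fetchone()[0]
order_11_rows = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE orderid = 11"
).fetchone()[0]
print(total_rows, first_order_copies, order_11_rows)
```

Only the fully identical pair collapses; the other order row for orderid 11 has different productsales and other values, so it stays.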
I am currently writing a query. I want to select all records from a table, where the records are chosen based on multiple values of a foreign key: for example, all records related to both 1 and 2.
E.g. the table might have
id name uid
1 bil 3
2 test 3
3 test 4
4 test 4
5 bil 5
6 bil 5
I want to select all records related to 3 that are also related to 4. In this case it is record number 2.
SELECT id
FROM `table`
WHERE uid = value1 AND like_id
IN (SELECT like_id
FROM likes
WHERE uid = uid2)
LIMIT 0 , 30
It's not at all clear where "value1" is coming from, or "uid2" is coming from, or where the column "like_id" is coming from. Those column names do not appear in your sample table. Your example query references two different table names (table and likes), yet you only show data for one example table, and that table does not have a column named like_id.
Let's assume that "value1" and "uid2" in your query are literals, or bind parameters supplied to the query, which seems reasonable given that your specification variously mentions values of 1, 2, 3 and 4. We're still left with the "like_id" column. Given that it's referenced in the SELECT list of the IN subquery, we're going to presume that's a column in the "likes" table, and given that it's referenced in the outer query, we're going to assume that it's also a column in the (unfortunately named) table table.
(Bottom line: it's not at all clear how your query is returning a "correct" result, given that you've made it impossible to replicate a working test case.)
Given a single table, as shown in your example data, e.g.
CREATE TABLE likes (id INT, name VARCHAR(4), uid INT);
INSERT INTO likes VALUES (1,'bil',3),(2,'test',3),(3,'test',4)
,(4,'test',4),(5,'bil',5),(6,'bil',5);
ALTER TABLE likes ADD PRIMARY KEY (id);
-- A plain index: the sample rows repeat (uid, name), e.g. ids 3 and 4, so a UNIQUE KEY would be rejected
ALTER TABLE likes ADD INDEX likes_ix (uid, name);
Assuming that we're running a query against that single table, and that we're matching "likes" associated with uid=3 to "likes" associated with uid=4, and that the matching is done on the "name" column, then
SELECT t.id
FROM `likes` t
WHERE t.uid = 3
AND EXISTS
( SELECT 1
FROM `likes` s
WHERE s.name = t.name
AND s.uid = 4
)
That will return the id of the row from the likes table for uid=3 where we also find a row in the likes table for uid=4 with a matching name value.
Given a limited number of rows to be inspected from the likes table in the outer query, the correlated subquery only needs to run a limited number of times, which should give reasonable performance.
For large sets, a join operation generally performs better to return an equivalent result:
SELECT t.id
FROM `likes` t
JOIN `likes` s
ON s.name = t.name
AND s.uid = 4
WHERE t.uid = 3
GROUP
BY t.id
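Both queries can be compared on the sample data with a short sketch. It runs them through Python's sqlite3 module, which is only a stand-in for MySQL here, and it also runs the join without its GROUP BY to show why the grouping is there. A plain (non-unique) index on (uid, name) is used for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes (id INTEGER PRIMARY KEY, name TEXT, uid INTEGER);
    INSERT INTO likes VALUES (1,'bil',3),(2,'test',3),(3,'test',4),
                             (4,'test',4),(5,'bil',5),(6,'bil',5);
    CREATE INDEX likes_ix ON likes (uid, name);
""")

# Correlated EXISTS: each uid=3 row is returned at most once.
exists_ids = [r[0] for r in conn.execute("""
    SELECT t.id FROM likes t
    WHERE t.uid = 3
      AND EXISTS (SELECT 1 FROM likes s
                  WHERE s.name = t.name AND s.uid = 4)
""")]

# Self-join with GROUP BY: same result as the EXISTS form.
join_ids = [r[0] for r in conn.execute("""
    SELECT t.id FROM likes t
    JOIN likes s ON s.name = t.name AND s.uid = 4
    WHERE t.uid = 3
    GROUP BY t.id
""")]

# Same join without GROUP BY: one output row per matching uid=4 row.
ungrouped_ids = [r[0] for r in conn.execute("""
    SELECT t.id FROM likes t
    JOIN likes s ON s.name = t.name AND s.uid = 4
    WHERE t.uid = 3
""")]
print(exists_ids, join_ids, ungrouped_ids)
```

Row 2 matches two uid=4 rows (ids 3 and 4), so the ungrouped join returns it twice; the GROUP BY collapses that back to one row.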
The key to optimum performance for either query is going to be appropriate indexes.