compare data in a table - mysql

I have a scenario in which I have to check whether a data already exist in a table or not. If a user n application already exists then we don't need to perform any operation else perform insertion.
CREATE TABLE `table1` (
`ID` INT(11) NOT NULL AUTO_INCREMENT,
`GroupName` VARCHAR(100) DEFAULT NULL,
`UserName` VARCHAR(100) DEFAULT NULL,
`ApplicationName` VARCHAR(100) DEFAULT NULL,
`UserDeleted` BIT DEFAULT 0,
PRIMARY KEY (`ID`)
)
CREATE TABLE `temp_table` (
`ID` INT(11) NOT NULL AUTO_INCREMENT,
`GroupName` VARCHAR(100) DEFAULT NULL,
`UserName` VARCHAR(100) DEFAULT NULL,
`ApplicationName` VARCHAR(100) DEFAULT NULL,
PRIMARY KEY (`ID`)
)
table1 is the main table while temp_table from which I have to perform comparision.
Sample date script:
INSERT INTO `table1`(`ID`,`GroupName`,`UserName`,`ApplicationName`,`userdeleted`)
VALUES
('MidasSQLUsers','Kevin Nikkhoo','MySql Application',0),
('MidasSQLUsers','adtest','MySql Application',0),
('Salesforce','Kevin Nikkhoo','Salesforce',0),
('Salesforce','devendra talmale','Salesforce',0);
INSERT INTO `temp_table`(`ID`,`GroupName`,`UserName`,`ApplicationName`,`userdeleted`)
VALUES
('MidasSQLUsers','Kevin Nikkhoo','MySql Application',0),
('MidasSQLUsers','adtest','MySql Application',0),
('Salesforce','Kevin Nikkhoo','Salesforce',0),
('Salesforce','Kapil Singh','Salesforce',0);
Also if a row of temp_table does not exist int table1 then its status should be as userdeleted 1 as i mentioned in desired output.
Result: table1
ID GroupName UserName ApplicationName Deleted
1 MidasSQLUsers Kevin Nikkhoo MySql Application 0
2 MidasSQLUsers adtest MySql ApplicationName 0
3 Salesforce Kevin Nikkhoo Salesforce 0
4 Salesforce devendra talmale Salesforce 1
5 SalesForce Kapil Singh Salesforce 0
Please help

this should do the trick, change whatever is inside the concat, to met your specification.
first do one update to "delete" the rows that are not in tempTable:
update table t1 set deleted = 'YES'
where concat( t1.groupName, t1.Username, t1.application) NOT IN
(select concat( t2.groupName, t2.Username, t2.application) from tempTable t2);
second: insert the new records
insert into table1 t1 (t1.groupName, t1.username, t1.application_name, t1.deleted)
(select t2.groupName, t2.Username, t2.application, t2.deleted from tempTable t2
where concat(t2.userName, t2.application, t2.groupName, t2.deleted) **not in**
(select concat(t3.userName, t3.application, t3.groupName, t3.deleted)
from table1 t3)
the concat function will make a new row from existing rows... this way I can compare multiple fields at same time as if I was comparing only one...
the "not in" lets make you a query inside a query...

Slight variation on the above queries.
Using JOINs, which if you have an index on the userName, application and groupName fields will likely be faster
UPDATE table1 t1
LEFT OUTER JOIN temp_table t2
ON t1.userName = t2.userName
AND t1.application = t2.application
AND t1.groupName = t2.groupName
SET t1.deleted = CASE WHEN t2.ID IS NULL THEN 1 ELSE 0 END
For the normal insert
INSERT INTO table1 t1 (t1.groupName, t1.username, t1.application_name, t1.deleted)
(SELECT t2.groupName, t2.Username, t2.application, t2.deleted
FROM tempTable t2
LEFT OUTER JOIN table1 t3
ON t2.userName = t3.userName
AND t2.application = t3.application
AND t2.groupName = t3.groupName
WHERE t3.ID IS NULL)

Related

SQL - Left Join and Group By causes row data from second table to get mixed up

I have two tables, the first table has a reference to the id from the second table, I want to make a query involving a left join with fields from the second table as well as with a COUNT function in the select, because of the COUNT function, I am using an GROUP BY clause.
So my query looks something like:
SELECT t1.id, t1.txt, t2.id, t2.txt, COUNT(t2.id)
FROM test_data1 t1
LEFT JOIN test_data2 t2 ON (t1.ref_col = t2.id)
GROUP BY t1.id
In my tables, only the second row of test_data1 has an entry in ref_col, so I would expect that for the first row in the results, the value for t2.id would be NULL, however that is not the case (in my example I see the value 2, but I'm not sure if there might be an element of randomness here).
If I use
SELECT MAX(t1.id), MAX(t1.txt), MAX(t2.id), MAX(t2.txt), COUNT(t2.id)
FROM test_data1 t1
LEFT JOIN test_data2 t2 ON (t1.ref_col = t2.id)
GROUP BY t1.id
I get my expected results, however I am surprised this is necessary given that there will at most only be one entry in test_data2 matching ref_col in test_data1.
Does anyone know why LEFT JOIN + GROUP BY is behaving this way? This is using MySQL version 8 on Linux.
If you want to reproduce this here are the table definitions:
CREATE TABLE test_data1 (id int unsigned NOT NULL AUTO_INCREMENT,
txt VARCHAR(45) DEFAULT NULL,
ref_col int unsigned DEFAULT NULL, PRIMARY KEY (id));
CREATE TABLE test_data1 (id int unsigned NOT NULL AUTO_INCREMENT,
txt VARCHAR(45) DEFAULT NULL,
ref_col int unsigned DEFAULT NULL, PRIMARY KEY (id));
INSERT INTO test_data1 (id, txt, ref_col)
VALUES
(1,'zz',NULL),
(2,'yy',2),
(3,'xx',NULL);
INSERT INTO test_data2 (id, txt)
VALUES
(1,'aa'),
(2,'bb'),
(3,'cc'),
(4,'dd');

Mysql 1093 - Table is specified twice

I have really searched and this question has been answered here
1093 Error in MySQL table is specified twice
but the answer doesn't help me
I have this accounts table
But I am facing error 1093 - Table is specified twice, when trying to update account balance
Although I give the table two names t1 and t2
UPDATE accounts t1
SET Account_Balance = Account_Balance+(
SELECT SUM(Credit)-SUM(Debit)
FROM accounts t2
WHERE Account_Id=1
)
Create accounts table statement
CREATE TABLE `accounts` (
`Account_Id` int(11) NOT NULL AUTO_INCREMENT,
`Account_Name` varchar(100) NOT NULL,
`Account_Name_English` varchar(50) NOT NULL,
`Account_Balance` decimal(15,2) NOT NULL DEFAULT '0.00',
`Credit` decimal(15,2) NOT NULL DEFAULT '0.00',
`Debit` decimal(15,2) NOT NULL DEFAULT '0.00',
PRIMARY KEY (`Account_Id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8
Using an alias doesn't solve the problem of not being able to specify a table that is being updated in a SELECT on the right hand side of an expression. One way to work around this issue is to use a multi-table UPDATE:
UPDATE accounts t1
CROSS JOIN (SELECT SUM(Credit)-SUM(Debit) AS `change`
FROM accounts
WHERE Account_Id=1) t2
SET t1.Account_Balance = t1.Account_Balance + t2.change
Note I'm not sure the location of the WHERE Account_Id = 1 is correct; this will update all Account_Balance fields in accounts to their old balance plus the change from Account_Id 1. If that is what you want, this is fine, otherwise you might need an additional WHERE clause on the UPDATE i.e.
UPDATE accounts t1
CROSS JOIN (SELECT SUM(Credit)-SUM(Debit) AS `change`
FROM accounts
WHERE Account_Id=1) t2
SET t1.Account_Balance = t1.Account_Balance + t2.change
WHERE Account_Id = 1
Or to update all accounts with their own change:
UPDATE accounts t1
JOIN (SELECT Account_Id, SUM(Credit)-SUM(Debit) AS `change`
FROM accounts
GROUP BY Account_Id) t2 ON t2.Account_Id = t1.Account_Id
SET t1.Account_Balance = t1.Account_Balance + t2.change
Here's a demo of all three queries in operation.

MYSQL update into select 3 tables

I use this query (insert into select 3 tables) to insert my rows in the mysql table.
"INSERT INTO test (catid_1, descat_1, catid_2, descat_2, id_user, user)
SELECT '$_POST[cat_1]',t1.desc AS descat_1, '$_POST[cat_2]', t2.desc AS descat_2,
$_POST[id_user]',t3.user FROM t1, t2, t3
WHERE t1.idcat_1='$_POST[cat_1]' and t2.idcat_2='$_POST[cat_2]'
and t3.id_user='$_POST[id_user]'";
Now I'd like to use the same logic to UPDATE my rows (update into select 3 table) in the mysql table.
tables structure
t1
`idcat_1` int(11) NOT NULL AUTO_INCREMENT,
`desc` varchar(100) NOT NULL,
PRIMARY KEY (`idcat_1`)
t2
`idcat_2` int(11) NOT NULL AUTO_INCREMENT,
`idcat_1` int(11) NOT NULL,
`desc` varchar(100) NOT NULL,
PRIMARY KEY (`idcat_2`)
t3
`id_user` int(11) NOT NULL AUTO_INCREMENT,
`user` varchar(40) NOT NULL,
PRIMARY KEY (`id_user`)
Is it possible to do?
Thanks
Like this:
UPDATE test AS t
INNER JOIN t1 ON -- join conditon
INNER JOIN t2 ON ...
INNER JOIN t3 ON ...
SET t.catid_1 = '$_POST[cat_1]',
t.descat_1 = t1.desc,
....
WHERE t1.idcat_1='$_POST[cat_1]'
and t2.idcat_2='$_POST[cat_2]'
and t3.id_user='$_POST[id_user]'
It is not clear how the four tables are joined, you will need to supply the join condition for each JOIN.
Update 1
From the tables' structures that you just posted in your updated question, it seems like neither of the three tables test, t1 nor t2 related to the table t3 by any keys. In this key you didn't need to join with this table t3 and only join the tables test, t1 and t2. Assuming that:
test relates to t1 with test.catid_1 = t1.idcat_1.
t1 relates to t2 with t1.idcat_1 = t2.idcat_1.
Like this:
UPDATE test AS t
INNER JOIN t1 ON t.catid_1 = t1.idcat_1
INNER JOIN t2 ON t1.idcat_1 = t2.idcat_1
SET t.catid_1 = '$_POST[cat_1]',
t.descat_1 = t1.desc,
....
WHERE t1.idcat_1='$_POST[cat_1]'
and t2.idcat_2='$_POST[cat_2]'

MySQL Group Concat Single Column NULL

I have a column that i am looking to retrieve all matches of in one row. I am querying other data as well. Currently i am using group_concat. This has worked great until now. Sometimes there are potential NULL values in this column and this has been preventing anything from being returned.
i have tried various other solutions posted here without success.
CREATE TABLE table1 (
id mediumint(9) NOT NULL AUTO_INCREMENT,
item_num mediumint(9) NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
CREATE TABLE table2 (
id mediumint(9) NOT NULL AUTO_INCREMENT,
oneid mediumint(9) NOT NULL,
item_desc varchar(16) NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
SELECT item_num, GROUP_CONCAT(item_desc) AS alldesc FROM table1 LEFT JOIN table2 ON table1.id = table2.oneid
So basically, there can be several item descripotions that may be NULL; they will be in no particular order either. So i am seeking a list with a placeholder when NULLs arise.
Does this work for you(use description as empty string when it is NULL)?
SELECT item_num,
REPLACE(GROUP_CONCAT(IFNULL(item_desc,' ')), ', ,', ',') AS alldesc
FROM table1
LEFT JOIN table2
ON table1.id = table2.oneid
you are missing GROUP BY in your query. chances are if you have multiple item_num, it will always return one row.
SELECT item_num, GROUP_CONCAT(item_desc) AS alldesc
FROM table1 LEFT JOIN table2
ON table1.id = table2.oneid
GROUP BY item_num
SELECT item_num,
REPLACE(
GROUP_CONCAT(
IFNULL(item_desc,'*!*') -- replace this with something not in a normal item_desc
ORDER BY if(item_desc is null, 1, 0) desc
), '*!*,','') AS alldesc
FROM table1
LEFT JOIN table2
ON table1.id = table2.oneid
GROUP BY item_num
Try the following query
SELECT item_num, GROUP_CONCAT(ISNULL(item_desc,'')) AS alldesc FROM table1 LEFT JOIN table2 ON table1.id = table2.oneid

Super slow query with CROSS JOIN

I have two tables named table_1 (1GB) and reference (250Mb).
When I query a cross join on reference it takes 16hours to update table_1 .. We changed the system files EXT3 for XFS but still it's taking 16hrs.. WHAT AM I DOING WRONG??
Here is the update/cross join query :
mysql> UPDATE table_1 CROSS JOIN reference ON
-> (table_1.start >= reference.txStart AND table_1.end <= reference.txEnd)
-> SET table_1.name = reference.name;
Query OK, 17311434 rows affected (16 hours 36 min 48.62 sec)
Rows matched: 17311434 Changed: 17311434 Warnings: 0
Here is a show create table of table_1 and reference:
CREATE TABLE `table_1` (
`strand` char(1) DEFAULT NULL,
`chr` varchar(10) DEFAULT NULL,
`start` int(11) DEFAULT NULL,
`end` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`name2` varchar(255) DEFAULT NULL,
KEY `annot` (`start`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
CREATE TABLE `reference` (
`bin` smallint(5) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`chrom` varchar(255) NOT NULL,
`strand` char(1) NOT NULL,
`txStart` int(10) unsigned NOT NULL,
`txEnd` int(10) unsigned NOT NULL,
`cdsStart` int(10) unsigned NOT NULL,
`cdsEnd` int(10) unsigned NOT NULL,
`exonCount` int(10) unsigned NOT NULL,
`exonStarts` longblob NOT NULL,
`exonEnds` longblob NOT NULL,
`score` int(11) DEFAULT NULL,
`name2` varchar(255) NOT NULL,
`cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`exonFrames` longblob NOT NULL,
KEY `chrom` (`chrom`,`bin`),
KEY `name` (`name`),
KEY `name2` (`name2`),
KEY `annot` (`txStart`,`txEnd`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
You should index table_1.start, reference.txStart, table_1.end and reference.txEnd table fields:
ALTER TABLE `table_1` ADD INDEX ( `start` ) ;
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txStart` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Cross joins are Cartesian Products, which are probably one of the most computationally expensive things to compute (they don't scale well).
For each table T_i for i = 1 to n, the number of rows generated by crossing tables T_1 to T_n is the size of each table multiplied by the size of each other table, ie
|T_1| * |T_2| * ... * |T_n|
Assuming each table has M rows, the resulting cost of computing the cross join is then
M_1 * M_2 ... M_n = O(M^n)
which is exponential in the number of tables involved in the join.
I see 2 problems with the UPDATE statement.
There is no index for the End fields. The compound indexes (annot) you have will be used only for the start fields in this query. You should add them as suggested by Emre:
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Second, the JOIN may (and probably does) find many rows of table reference that are related to a row of table_1. So some (or all) rows of table_1 that are updated, are updated many times. Check the result of this query, to see if it is the same as your updated rows count (17311434):
SELECT COUNT(*)
FROM table_1
WHERE EXISTS
( SELECT *
FROM reference
WHERE table_1.start >= reference.txStart
AND table_1.`end` <= reference.txEnd
)
There can be other ways to write this query but the lack of a PRIMARY KEY on both tables makes it harder. If you define a primary key on table_1, try this, replacing id with the primary key.
Update: No, do not try it on a table with 34M rows. Check the execution plan and try with smaller tables first.
UPDATE table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id
SET t1.name = good.name;
You can check the query plan by running EXPLAIN on the equivalent SELECT:
EXPLAIN
SELECT t1.id, t1.name, good.name
FROM table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id ;
Try this:
UPDATE table_1 SET
table_1.name = (
select reference.name
from reference
where table_1.start >= reference.txStart
and table_1.end <= reference.txEnd)
Somebody already offered you to add some indexes. But I think the best performance you may get with these two indexes:
ALTER TABLE `test`.`time`
ADD INDEX `reference_start_end` (`txStart` ASC, `txEnd` ASC),
ADD INDEX `table_1_star_end` (`start` ASC, `end` ASC);
Only one of them will be used by MySQL query, but MySQL will decide which is more useful automatically.