Working on MySQL 5.7.
Here is my bugs table:
MySQL [jira_statistics]> describe bugs;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| issue_key     | varchar(45)  | NO   | PRI | NULL    |       |
| release_name  | varchar(45)  | YES  | MUL | NULL    |       |
| issue_summary | varchar(200) | YES  |     | NULL    |       |
| story_points  | int(11)      | NO   |     | 0       |       |
| qa_reopened   | float        | NO   |     | 0       |       |
| done_reopened | float        | NO   |     | 0       |       |
+---------------+--------------+------+-----+---------+-------+
This table is updated by periodic calls to LOAD DATA LOCAL INFILE '<file.csv>' INTO TABLE bugs.
Whenever this update takes place (which may update existing rows and/or insert new ones), I want another table holding derived statistics to be updated via the following trigger:
create trigger update_bugs_stats after insert on `jira_statistics`.`bugs`
for each row
begin
    -- STORY POINTS -------------------------
    SELECT AVG(story_points) INTO @avg_bugs_storypoints FROM `jira_statistics`.`bugs` WHERE release_name = new.release_name;
    SELECT MAX(story_points) INTO @max_bugs_storypoints FROM `jira_statistics`.`bugs` WHERE release_name = new.release_name;
    SELECT MIN(story_points) INTO @min_bugs_storypoints FROM `jira_statistics`.`bugs` WHERE release_name = new.release_name;
    INSERT INTO storypoints_stats (release_name, avg_bugs_storypoints, max_bugs_storypoints, min_bugs_storypoints)
    VALUES (new.release_name, @avg_bugs_storypoints, @max_bugs_storypoints, @min_bugs_storypoints)
    ON DUPLICATE KEY UPDATE
        release_name = new.release_name,
        avg_bugs_storypoints = @avg_bugs_storypoints,
        max_bugs_storypoints = @max_bugs_storypoints,
        min_bugs_storypoints = @min_bugs_storypoints;
end;
However, this gives me the following error whenever I try to create the trigger:
Unknown column 'new.release_name' in 'where clause'.
Why isn't the new keyword being recognized?
Because new is a reserved (system) word.
Ref: https://dev.mysql.com/doc/refman/8.0/en/keywords.html
Please modify
new.release_name ==> `new`.`release_name`
etc.
The error was more stupid than I thought; I was working directly in the SQL query editor instead of the triggers tab of MySQL Workbench, so the new keyword was not parsed correctly.
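For reference, when creating the same trigger from the plain mysql client (rather than the Workbench triggers tab), the client delimiter must be changed so the semicolons inside the body don't end the CREATE TRIGGER statement early. A minimal sketch using the names from the question:

DELIMITER $$

CREATE TRIGGER update_bugs_stats
AFTER INSERT ON `jira_statistics`.`bugs`
FOR EACH ROW
BEGIN
    -- Recompute the per-release aggregates in one pass...
    SELECT AVG(story_points), MAX(story_points), MIN(story_points)
      INTO @avg_sp, @max_sp, @min_sp
      FROM `jira_statistics`.`bugs`
     WHERE release_name = NEW.release_name;

    -- ...and upsert them into the statistics table.
    INSERT INTO storypoints_stats
        (release_name, avg_bugs_storypoints, max_bugs_storypoints, min_bugs_storypoints)
    VALUES (NEW.release_name, @avg_sp, @max_sp, @min_sp)
    ON DUPLICATE KEY UPDATE
        avg_bugs_storypoints = @avg_sp,
        max_bugs_storypoints = @max_sp,
        min_bugs_storypoints = @min_sp;
END$$

DELIMITER ;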
The query below was taking more than 8 minutes for about 900,000 processed rows. It is very slow and affects my product, and I can't identify why it has gotten slow; all the indexes seem to be set up fine.
explain SELECT
COUNT(DISTINCT (cinfo.CONTACT_ID))
FROM
cinfo
INNER JOIN
LTocMapping ON cinfo.CONTACT_ID = LTocMapping.CONTACT_ID
WHERE
(((((((((cinfo.COUNTRY LIKE '%Panama%')
OR (cinfo.COUNTRY LIKE '%PANAMA%'))
AND (((cinfo.CONTACT_EMAIL NOT LIKE '%test%')
AND (cinfo.CONTACT_EMAIL NOT LIKE '%engine%'))
OR (cinfo.CONTACT_EMAIL IS NULL)))
AND ((SELECT
(GROUP_CONCAT(Temp.LIST_ID
ORDER BY Temp.LIST_ID) REGEXP ('.*,*221715000514445053,*.*$'))
FROM
LTocMapping Temp
WHERE
((LTocMapping.CONTACT_ID = Temp.CONTACT_ID)
AND (((Temp.MAPPING_ID >= 221715000000000000)
AND (Temp.MAPPING_ID <= 221715999999999999))
OR ((Temp.MAPPING_ID >= 0)
AND (Temp.MAPPING_ID <= 999999999999))))
GROUP BY Temp.CONTACT_ID) = '0'))
AND ((SELECT
(GROUP_CONCAT(Temp.LIST_ID
ORDER BY Temp.LIST_ID) REGEXP ('.*,*221715000520574130,*.*$'))
FROM
LTocMapping Temp
WHERE
((LTocMapping.CONTACT_ID = Temp.CONTACT_ID)
AND (((Temp.MAPPING_ID >= 221715000000000000)
AND (Temp.MAPPING_ID <= 221715999999999999))
OR ((Temp.MAPPING_ID >= 0)
AND (Temp.MAPPING_ID <= 999999999999))))
GROUP BY Temp.CONTACT_ID) = '0'))
AND (LTocMapping.LIST_ID IN (221715000520574130 , 221715000201569885)))
AND (LTocMapping.STATUS = BINARY 'subscribed'))
AND (((cinfo.CONTACT_STATUS = BINARY 'active')
OR (cinfo.CONTACT_STATUS = BINARY 'softbounce'))
AND (LTocMapping.STATUS = BINARY 'subscribed')))
AND (((cinfo.CONTACT_ID >= 221715000000000000)
AND (cinfo.CONTACT_ID <= 221715999999999999))
OR ((cinfo.CONTACT_ID >= 0)
AND (cinfo.CONTACT_ID <= 999999999999))))
Below are the tables for your reference.
Table 1 :
mysql> desc cinfo;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| CONTACT_ID        | bigint(19)   | NO   | PRI | NULL    |       |
| CONTACT_EMAIL     | varchar(100) | NO   | MUL | NULL    |       |
| TITLE             | varchar(20)  | YES  |     | NULL    |       |
| FIRSTNAME         | varchar(100) | YES  |     | NULL    |       |
| LASTNAME          | varchar(50)  | YES  |     | NULL    |       |
| ADDED_BY          | varchar(20)  | YES  |     | NULL    |       |
| ADDED_TIME        | bigint(19)   | NO   |     | NULL    |       |
| LAST_UPDATED_TIME | bigint(19)   | NO   |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
Table 2 :
mysql> desc LTocMapping;
+----------------+--------------+------+-----+------------+-------+
| Field          | Type         | Null | Key | Default    | Extra |
+----------------+--------------+------+-----+------------+-------+
| MAPPING_ID     | bigint(19)   | NO   | PRI | NULL       |       |
| CONTACT_ID     | bigint(19)   | NO   | MUL | NULL       |       |
| LIST_ID        | bigint(19)   | NO   | MUL | NULL       |       |
| STATUS         | varchar(100) | YES  |     | subscribed |       |
| MAPPING_STATUS | varchar(20)  | YES  |     | connected  |       |
| MAPPING_TIME   | bigint(19)   | YES  |     | NULL       |       |
+----------------+--------------+------+-----+------------+-------+
As far as I can tell, your subqueries are the bottleneck:
For the first subquery, you are using LTocMapping.CONTACT_ID.
For the second subquery, you are using LTocMapping.CONTACT_ID as well.
These references to values of the outer query turn the inner queries into correlated subqueries (also called dependent subqueries). That means: for every row fetched from the outer tables (~970,000), you are firing two additional queries against another table.
So that's about 1.8 million additional (and, by the looks of it, non-trivial) queries you are executing.
Most of the time a correlated subquery can be replaced by a proper join, but this depends on the use case. You can also join the same table twice by using a different alias; one possible rewrite is sketched below.
But to outline concrete join options, you would need to explain why the subqueries behind the condition group_concat(....) = '0' are important - or better, what you want to achieve.
(ps.: You can also see that EXPLAIN flags them as DEPENDENT SUBQUERY.)
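A minimal sketch of such a rewrite, assuming the GROUP_CONCAT(...) REGEXP (...) = '0' test is simply meant to exclude contacts that are mapped to a given list (if it encodes something subtler, this does not apply):

-- Anti-join sketch: count subscribed contacts on the two target lists
-- that have no mapping at all to the excluded list 221715000514445053.
SELECT COUNT(DISTINCT cinfo.CONTACT_ID)
FROM cinfo
INNER JOIN LTocMapping
        ON LTocMapping.CONTACT_ID = cinfo.CONTACT_ID
LEFT JOIN LTocMapping excl
       ON excl.CONTACT_ID = cinfo.CONTACT_ID
      AND excl.LIST_ID = 221715000514445053
WHERE excl.CONTACT_ID IS NULL      -- anti-join: no row on the excluded list
  AND LTocMapping.LIST_ID IN (221715000520574130, 221715000201569885)
  AND LTocMapping.STATUS = 'subscribed';

The LEFT JOIN ... IS NULL pattern lets the optimizer probe the LTocMapping index once per contact instead of running a GROUP_CONCAT plus a regular expression per outer row.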
OR is inefficient; see if you can avoid it.
Leading wildcards in LIKE are inefficient. See if a FULLTEXT index would work for you.
With a proper COLLATION, you don't need to test both upper and lower case, and you can avoid the use of BINARY. In both cases you might then be able to use an index. (What indexes do you have?)
Try to change from
WHERE ( ( SELECT ... ) = '0' )
to
WHERE ( NOT EXISTS ( SELECT ... ) )
(The SELECT will need some modification.)
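A hedged sketch of that modification, again assuming = '0' just means "no mapping to the given list within the MAPPING_ID ranges":

AND NOT EXISTS (
    SELECT 1
    FROM LTocMapping Temp
    WHERE Temp.CONTACT_ID = LTocMapping.CONTACT_ID
      AND Temp.LIST_ID = 221715000514445053
      AND (Temp.MAPPING_ID BETWEEN 221715000000000000 AND 221715999999999999
           OR Temp.MAPPING_ID BETWEEN 0 AND 999999999999)
)

NOT EXISTS can stop at the first matching row, whereas the original subquery concatenates every LIST_ID for the contact and then runs a regular expression over the result.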
(Please get rid of some of the redundant parens; it is hard to read.)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
Perhaps I've been staring at the screen too long but I have the following [legacy] table I'm messing with:
describe t3_test;
+--------------------+------------------+------+-----+---------+----------------+
| Field              | Type             | Null | Key | Default | Extra          |
+--------------------+------------------+------+-----+---------+----------------+
| provnum            | varchar(24)      | YES  | MUL | NULL    |                |
| trgt_mo            | datetime         | YES  |     | NULL    |                |
| mcare              | varchar(2)       | YES  |     | NULL    |                |
| bed2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| trnsfr2prsn_asst   | varchar(2)       | YES  |     | NULL    |                |
| tlt2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| hygn2prsn_asst     | varchar(2)       | YES  |     | NULL    |                |
| bath2psrn_asst     | varchar(2)       | YES  |     | NULL    |                |
| ampmcare2prsn_asst | varchar(2)       | YES  |     | NULL    |                |
| any2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| n                  | float            | YES  |     | NULL    |                |
| pct                | float            | YES  |     | NULL    |                |
| trgt_qtr           | varchar(12)      | YES  |     | NULL    |                |
| recno              | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| enddate            | date             | YES  |     | NULL    |                |
+--------------------+------------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)
I have data that looks like this:
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","5767343","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075309","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075308","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","5767342","2008-12-31"
"555223","2008-10-01 00:00:00","N",NULL,"1",NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075327","2008-12-31"
"555223","2008-10-01 00:00:00","N","1",NULL,NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075323","2008-12-31"
"555223","2008-10-01 00:00:00","Y","1",NULL,NULL,NULL,NULL,NULL,NULL,"4","9.30233","2008Q4","4075325","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"0",NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075310","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"1",NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075311","2008-12-31"
The first two lines of the table clearly appear to be dupes (apart from the auto-increment index recno). I've tried a half dozen dupe-removal routines, and the rows are not removed.
At this point I am not sure what exactly is wrong. Is it possible there's an invisible character somewhere? Is it possible a letter is in a different character encoding? When I dump the data to CSV as listed, it doesn't look any different.
Do you have a delete routine that would work on this file structure and remove anything that is a dupe (minus the recno field)? I have been staring at this for two days and for some reason it escapes me. (btw, I am aware of the column-name anomaly for bath2psrn_asst - that's not it)
This (original) table has over 13 million records and is over 3 GB in size, so I'm looking for the most efficient way to kill dupes. Any ideas?
Here's an example of one of the dupe-killing techniques I used that did not work:
DELETE a FROM t3_test as a, t3_test as b WHERE
(a.provnum=b.provnum)
AND (a.trgt_mo=b.trgt_mo OR a.trgt_mo IS NULL AND b.trgt_mo IS NULL)
AND (a.mcare=b.mcare OR a.mcare IS NULL AND b.mcare IS NULL)
AND (a.bed2prsn_asst=b.bed2prsn_asst OR a.bed2prsn_asst IS NULL AND b.bed2prsn_asst IS NULL)
AND (a.trnsfr2prsn_asst=b.trnsfr2prsn_asst OR a.trnsfr2prsn_asst IS NULL AND b.trnsfr2prsn_asst IS NULL)
AND (a.tlt2prsn_asst=b.tlt2prsn_asst OR a.tlt2prsn_asst IS NULL AND b.tlt2prsn_asst IS NULL)
AND (a.hygn2prsn_asst=b.hygn2prsn_asst OR a.hygn2prsn_asst IS NULL AND b.hygn2prsn_asst IS NULL)
AND (a.bath2psrn_asst=b.bath2psrn_asst OR a.bath2psrn_asst IS NULL AND b.bath2psrn_asst IS NULL)
AND (a.ampmcare2prsn_asst=b.ampmcare2prsn_asst OR a.ampmcare2prsn_asst IS NULL AND b.ampmcare2prsn_asst IS NULL)
AND (a.any2prsn_asst=b.any2prsn_asst OR a.any2prsn_asst IS NULL AND b.any2prsn_asst IS NULL)
AND (a.n=b.n OR a.n IS NULL AND b.n IS NULL)
AND (a.pct=b.pct OR a.pct IS NULL AND b.pct IS NULL)
AND (a.trgt_qtr=b.trgt_qtr OR a.trgt_qtr IS NULL AND b.trgt_qtr IS NULL)
AND (a.enddate=b.enddate OR a.enddate IS NULL AND b.enddate IS NULL)
AND (a.recno>b.recno);
For such a large table, delete can be quite inefficient -- all the logging needed for the deletes is very cumbersome.
I might recommend that you try the truncate/insert approach:
create table temp_t3_test as (
    select provnum, trgt_mo, . . .,
           min(recno) as recno,
           enddate
    from t3_test
    group by provnum, trgt_mo, . . ., enddate
);

truncate table t3_test;

insert into t3_test(provnum, trgt_mo, . . ., recno, enddate)
    select *
    from temp_t3_test;
Try:
CREATE TABLE t3_new AS
(
SELECT provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
Min(recno) AS recno,
enddate
FROM t3_test
GROUP BY provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
enddate
)
When you use MIN(recno) with this GROUP BY, you don't select just one row; you keep one row per distinct combination of the other columns, carrying the smallest recno of each group. To remove the extra rows you can use DISTINCT, or GROUP BY as I have done. You could also drop recno from the temp table entirely and add a fresh auto-increment column when you recreate the table, to avoid gaps in the IDs.
This is to be used with the method suggested by Gordon Linoff.
In this scenario, the problem was not with the SQL statement. It was a problem with the DATA, and it was not visible.
The two fields of type FLOAT held hidden decimal values that were slightly different from each other. Converting those fields to the DECIMAL(a,b) type made the dupes show up, and they were properly deleted by conventional means.
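A minimal sketch of that conversion; the precision and scale here are assumptions, so pick values that cover the real range of your data:

-- Precision/scale are placeholders; choose ones that fit your data.
ALTER TABLE t3_test
    MODIFY n   DECIMAL(12,6),
    MODIFY pct DECIMAL(12,6);

After the conversion, equality tests compare exact decimal values instead of approximate floats, so true dupes finally compare equal.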
Special thanks to Gordon Linoff for suggesting looking into this.
I have this project I am working on, with the table schema below:
+--------+------------+------+-----+---------+----------------+
| Field  | Type       | Null | Key | Default | Extra          |
+--------+------------+------+-----+---------+----------------+
| codeId | int(15)    | NO   | PRI | NULL    | auto_increment |
| code   | varchar(9) | YES  |     | NULL    |                |
| status | varchar(5) | YES  |     | 0       |                |
+--------+------------+------+-----+---------+----------------+
This table is used for authorization of codes; however, some people send codes like dsfffMUBBDG345qwewqe for authorization - note the capitalized part. In the code column there is a code MUBBDG345. I need to be able to check from the table whether any 9-character substring of the code sent matches any of the codes in the db.
I have tried using this query, but it just does not work:
select code, codeId, status from authCodes where 'dsfffMUBBDG345qwewqe' like code;
Is this even possible with a MySQL query only?
You want to use:
SELECT code, codeId, status
FROM authCodes
WHERE 'dsfffMUBBDG345qwewqe' LIKE CONCAT('%', code, '%')
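This reverses the usual LIKE comparison: the submitted string is the subject and each stored code becomes the pattern, so for the row holding MUBBDG345 the condition evaluates as

'dsfffMUBBDG345qwewqe' LIKE '%MUBBDG345%'  -- true: the code is a substring

Note that because the pattern is built from a column value, an index on code cannot be used here; every check scans the whole table, which is usually fine for a small authCodes table.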