Working on MySQL 5.7. Here is my bugs table:
MySQL [jira_statistics]> describe bugs;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| issue_key     | varchar(45)  | NO   | PRI | NULL    |       |
| release_name  | varchar(45)  | YES  | MUL | NULL    |       |
| issue_summary | varchar(200) | YES  |     | NULL    |       |
| story_points  | int(11)      | NO   |     | 0       |       |
| qa_reopened   | float        | NO   |     | 0       |       |
| done_reopened | float        | NO   |     | 0       |       |
+---------------+--------------+------+-----+---------+-------+
This table is updated by periodic calls to LOAD DATA LOCAL INFILE <file.csv> INTO TABLE bugs.
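For reference, such a load might look like the sketch below; the file path, field separators and the REPLACE duplicate-key handling are assumptions, not taken from the original:
LOAD DATA LOCAL INFILE '/path/to/file.csv'    -- path is a placeholder
REPLACE INTO TABLE `jira_statistics`.`bugs`   -- REPLACE assumed, so rows with an existing issue_key are updated
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(issue_key, release_name, issue_summary, story_points, qa_reopened, done_reopened);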
Whenever such an update takes place (which may update existing rows and/or insert new ones), I want another table holding some derived statistics to be updated via the following trigger:
create trigger update_bugs_stats after insert on `jira_statistics`.`bugs` for each row
begin
    -- STORY POINTS -------------------------
    SELECT AVG(story_points) INTO @avg_bugs_storypoints FROM `jira_statistics`.`bugs` WHERE release_name = new.release_name;
    SELECT MAX(story_points) INTO @max_bugs_storypoints FROM `jira_statistics`.`bugs` WHERE release_name = new.release_name;
    SELECT MIN(story_points) INTO @min_bugs_storypoints FROM `jira_statistics`.`bugs` WHERE release_name = new.release_name;

    INSERT INTO storypoints_stats (release_name, avg_bugs_storypoints, max_bugs_storypoints, min_bugs_storypoints)
    VALUES (new.release_name, @avg_bugs_storypoints, @max_bugs_storypoints, @min_bugs_storypoints)
    ON DUPLICATE KEY UPDATE
        avg_bugs_storypoints = @avg_bugs_storypoints,
        max_bugs_storypoints = @max_bugs_storypoints,
        min_bugs_storypoints = @min_bugs_storypoints;
end
However this gives me the following error whenever I try to create the trigger:
Unknown column 'new.release_name' in 'where clause'
Why isn't the new keyword being recognized?
Because new is a reserved keyword.
Ref: https://dev.mysql.com/doc/refman/8.0/en/keywords.html
Please modify
new.release_name ==> `new`.`release_name`
etc.
The error was more stupid than I thought:
I was working directly in the SQL query editor rather than in the Triggers tab of MySQL Workbench, so the new keyword was not parsed correctly.
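For completeness, when creating the trigger from a plain SQL client or the query editor, the usual approach is to switch the statement delimiter so the semicolons inside the body are not treated as end-of-statement; a minimal sketch:
DELIMITER $$

CREATE TRIGGER update_bugs_stats AFTER INSERT ON `jira_statistics`.`bugs`
FOR EACH ROW
BEGIN
    -- trigger body as shown above ...
END$$

DELIMITER ;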
Perhaps I've been staring at the screen too long but I have the following [legacy] table I'm messing with:
describe t3_test;
+--------------------+------------------+------+-----+---------+----------------+
| Field              | Type             | Null | Key | Default | Extra          |
+--------------------+------------------+------+-----+---------+----------------+
| provnum            | varchar(24)      | YES  | MUL | NULL    |                |
| trgt_mo            | datetime         | YES  |     | NULL    |                |
| mcare              | varchar(2)       | YES  |     | NULL    |                |
| bed2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| trnsfr2prsn_asst   | varchar(2)       | YES  |     | NULL    |                |
| tlt2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| hygn2prsn_asst     | varchar(2)       | YES  |     | NULL    |                |
| bath2psrn_asst     | varchar(2)       | YES  |     | NULL    |                |
| ampmcare2prsn_asst | varchar(2)       | YES  |     | NULL    |                |
| any2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| n                  | float            | YES  |     | NULL    |                |
| pct                | float            | YES  |     | NULL    |                |
| trgt_qtr           | varchar(12)      | YES  |     | NULL    |                |
| recno              | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| enddate            | date             | YES  |     | NULL    |                |
+--------------------+------------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)
I have data that looks like this..
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","5767343","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075309","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075308","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","5767342","2008-12-31"
"555223","2008-10-01 00:00:00","N",NULL,"1",NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075327","2008-12-31"
"555223","2008-10-01 00:00:00","N","1",NULL,NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075323","2008-12-31"
"555223","2008-10-01 00:00:00","Y","1",NULL,NULL,NULL,NULL,NULL,NULL,"4","9.30233","2008Q4","4075325","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"0",NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075310","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"1",NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075311","2008-12-31"
The first two lines of the table clearly appear to be dupes (minus the auto-increment index "recno"). I've tried half a dozen dupe-removal routines and they are not automatically removed.
At this point I am not sure what exactly is wrong. Is it possible there's an invisible character somewhere? Is it possible a letter is in a different character encoding? When I dump the data to CSV as listed above, it doesn't look any different.
Do you have a delete routine that would work on this file structure and remove anything that is a dupe (minus the recno field)? I have been staring at this for two days and for some reason it escapes me. (btw, I am aware of the column name anomaly for bath2psrn_asst - that's not it)
This (original) table has over 13 million records in it and is over 3GB in size, so I'm looking for the most efficient way to kill dupes. Any ideas?
Here's an example of one of the dupe-killing techniques I used that did not work:
DELETE a FROM t3_test as a, t3_test as b WHERE
(a.provnum=b.provnum)
AND (a.trgt_mo=b.trgt_mo OR a.trgt_mo IS NULL AND b.trgt_mo IS NULL)
AND (a.mcare=b.mcare OR a.mcare IS NULL AND b.mcare IS NULL)
AND (a.bed2prsn_asst=b.bed2prsn_asst OR a.bed2prsn_asst IS NULL AND b.bed2prsn_asst IS NULL)
AND (a.trnsfr2prsn_asst=b.trnsfr2prsn_asst OR a.trnsfr2prsn_asst IS NULL AND b.trnsfr2prsn_asst IS NULL)
AND (a.tlt2prsn_asst=b.tlt2prsn_asst OR a.tlt2prsn_asst IS NULL AND b.tlt2prsn_asst IS NULL)
AND (a.hygn2prsn_asst=b.hygn2prsn_asst OR a.hygn2prsn_asst IS NULL AND b.hygn2prsn_asst IS NULL)
AND (a.bath2psrn_asst=b.bath2psrn_asst OR a.bath2psrn_asst IS NULL AND b.bath2psrn_asst IS NULL)
AND (a.ampmcare2prsn_asst=b.ampmcare2prsn_asst OR a.ampmcare2prsn_asst IS NULL AND b.ampmcare2prsn_asst IS NULL)
AND (a.any2prsn_asst=b.any2prsn_asst OR a.any2prsn_asst IS NULL AND b.any2prsn_asst IS NULL)
AND (a.n=b.n OR a.n IS NULL AND b.n IS NULL)
AND (a.pct=b.pct OR a.pct IS NULL AND b.pct IS NULL)
AND (a.trgt_qtr=b.trgt_qtr OR a.trgt_qtr IS NULL AND b.trgt_qtr IS NULL)
AND (a.enddate=b.enddate OR a.enddate IS NULL AND b.enddate IS NULL)
AND (a.recno>b.recno);
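As an aside, the pairwise NULL handling above can be written more compactly with MySQL's NULL-safe equality operator <=>, which treats two NULLs as equal. It would not have changed the outcome here (see below), but it shortens the predicate considerably:
DELETE a
FROM t3_test AS a
JOIN t3_test AS b
  ON  a.provnum            <=> b.provnum
  AND a.trgt_mo            <=> b.trgt_mo
  AND a.mcare              <=> b.mcare
  AND a.bed2prsn_asst      <=> b.bed2prsn_asst
  AND a.trnsfr2prsn_asst   <=> b.trnsfr2prsn_asst
  AND a.tlt2prsn_asst      <=> b.tlt2prsn_asst
  AND a.hygn2prsn_asst     <=> b.hygn2prsn_asst
  AND a.bath2psrn_asst     <=> b.bath2psrn_asst
  AND a.ampmcare2prsn_asst <=> b.ampmcare2prsn_asst
  AND a.any2prsn_asst      <=> b.any2prsn_asst
  AND a.n                  <=> b.n
  AND a.pct                <=> b.pct
  AND a.trgt_qtr           <=> b.trgt_qtr
  AND a.enddate            <=> b.enddate
  AND a.recno > b.recno;   -- delete the larger recno, keep the smallest in each duplicate set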
For such a large table, delete can be quite inefficient -- all the logging needed for the deletes is very cumbersome.
I might recommend that you try the truncate/insert approach:
create table temp_t3_test as (
    select provnum, trgt_mo, . . .,
           min(recno) as recno,
           enddate
    from t3_test
    group by provnum, trgt_mo, . . ., enddate
);

truncate table t3_test;

insert into t3_test(provnum, trgt_mo, . . ., recno, enddate)
    select *
    from temp_t3_test;
Try:
CREATE TABLE t3_new AS
(
SELECT provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
Min(recno) AS recno,
enddate
FROM t3_test
GROUP BY provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
enddate
);
When you use min(recno), you don't select just one row; you take the minimum recno within each group and use that same value for all the duplicates in that group. To remove fewer rows, you can use DISTINCT or GROUP BY as I have done here. You could also drop recno from the temp table entirely and add a new auto-increment column when you rebuild the table, to avoid gaps in the ids (see the sketch below).
This is to be used with the method suggested by Gordon Linoff.
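A minimal sketch of that last step, assuming the rebuilt table is the t3_new created above:
-- Discard the carried-over recno values and assign fresh, gap-free ids.
ALTER TABLE t3_new DROP COLUMN recno;
ALTER TABLE t3_new ADD COLUMN recno INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;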
In this scenario, the problem was not with the SQL statement. It was a problem with the data, and it was not visible.
The two fields of type float held floating-point values that differed slightly even though they displayed identically. Converting those fields to DECIMAL(a,b) made the dupes match, and they were then properly deleted by conventional means.
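For reference, a sketch of that conversion; the precision and scale here are assumptions, so pick values wide enough for your data:
ALTER TABLE t3_test
    MODIFY n   DECIMAL(12,5),
    MODIFY pct DECIMAL(12,5);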
Special thanks to Gordon Linoff for suggesting looking into this.
We have an analytics product. We give each of our customers a JavaScript snippet that they put on their web sites. When a user visits a customer's site, the JavaScript code hits our server so that we can store that page visit on behalf of the customer. Each customer has a unique domain name, so a customer is identified by its domain name.
Database server: MySQL 5.6
Table rows: 400 million
Following is our table schema.
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| id            | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| domain        | varchar(50)      | NO   | MUL | NULL    |                |
| guid          | binary(16)       | YES  |     | NULL    |                |
| sid           | binary(16)       | YES  |     | NULL    |                |
| url           | varchar(2500)    | YES  |     | NULL    |                |
| ip            | varbinary(16)    | YES  |     | NULL    |                |
| is_new        | tinyint(1)       | YES  |     | NULL    |                |
| ref           | varchar(2500)    | YES  |     | NULL    |                |
| user_agent    | varchar(255)     | YES  |     | NULL    |                |
| stats_time    | datetime         | YES  |     | NULL    |                |
| country       | char(2)          | YES  |     | NULL    |                |
| region        | char(3)          | YES  |     | NULL    |                |
| city          | varchar(80)      | YES  |     | NULL    |                |
| city_lat_long | varchar(50)      | YES  |     | NULL    |                |
| email         | varchar(100)     | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
In the above table, guid identifies a visitor to our customer's site and sid identifies a visitor session on our customer's site. That means every sid has an associated guid.
We need queries like the following:
Query 1: Find unique and total visitors
SELECT count(DISTINCT guid) AS count, count(guid) AS total FROM page_views WHERE domain = 'abc' AND stats_time BETWEEN '2015-10-05 00:00:00' AND '2015-10-05 23:59:59'
composite index planning: domain, stats_time, guid
Query 2: Find unique and total sessions
SELECT count(DISTINCT sid) AS count, count(sid) AS total FROM page_views WHERE domain = 'abc' AND stats_time BETWEEN '2015-10-05 00:00:00' AND '2015-10-05 23:59:59'
composite index planning: domain, stats_time, sid
Query 3: Find visitors and sessions by country, by region, by city
composite index planning: domain, country
composite index planning: domain, region
Each combination requires a new composite index. That means a huge index file that we can't keep in memory, so query performance is low.
Is there any way to optimize these index combinations to reduce the index size and improve performance?
Just for grins, run this to see what type of spread you have...
select
country, region, city,
DATE_FORMAT(colName, '%Y-%m-%d') DATEONLY, count(*)
from
yourTable
group by
country, region, city,
DATE_FORMAT(colName, '%Y-%m-%d')
order by
count(*) desc
and then see how many rows it returns. Also, what sort of range does the COUNT column generate? Instead of just an index, does it make sense to create a separate aggregation table on the key elements you are trying to mine?
If so, I would recommend looking at a similar post also here on the stack. That shows a SAMPLE of how, but I would first look at the counts before suggesting further. If you have it broken down on a daily basis, what MIGHT this be reduced to?
Additionally, you might want to build the pre-aggregate tables ONCE to get started, then have a nightly procedure that adds new records for the day just completed. This way it is never running through all 400M records.
If your pre-aggregate tables store data by date only (year, month, day), queries rolled up per day become much cheaper. The COUNT(*) is just an example basis; you could add count(distinct whateverColumn) as needed. Then you could query SUM(aggregateColumn) by domain, date range, etc. If your 400M records get reduced to 7M records, I would also put at least an index on (domain, dateOnlyField, and maybe country) to optimize your domain and date-range queries. Once you have something narrowed down at whatever level makes sense, you can always drill into the raw data for the granular level.
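A minimal sketch of such a daily pre-aggregate table and its nightly roll-up; the table name page_views_daily and the column choices are assumptions, not anything from the original schema:
CREATE TABLE page_views_daily (
    domain     VARCHAR(50)  NOT NULL,
    stats_date DATE         NOT NULL,
    country    CHAR(2)      NOT NULL DEFAULT '',
    views      INT UNSIGNED NOT NULL,   -- COUNT(*) for the day
    visitors   INT UNSIGNED NOT NULL,   -- COUNT(DISTINCT guid) for the day
    sessions   INT UNSIGNED NOT NULL,   -- COUNT(DISTINCT sid) for the day
    PRIMARY KEY (domain, stats_date, country)
);

-- Nightly job: roll up the day that just finished.
INSERT INTO page_views_daily (domain, stats_date, country, views, visitors, sessions)
SELECT domain,
       DATE(stats_time),
       IFNULL(country, ''),
       COUNT(*),
       COUNT(DISTINCT guid),
       COUNT(DISTINCT sid)
FROM page_views
WHERE stats_time >= CURDATE() - INTERVAL 1 DAY
  AND stats_time <  CURDATE()
GROUP BY domain, DATE(stats_time), IFNULL(country, '');
One caveat: distinct counts are not additive across days (a visitor seen on two different days is counted twice by SUM(visitors)), so exact unique-visitor numbers over longer ranges still need the raw table or a different structure.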
First, sorry for the title; as I'm not a native English speaker, this is pretty hard to phrase. In other words, what I'm trying to achieve is this:
I'm trying to fetch all domain names from the table virtual_domains for which there is no corresponding entry in the virtual_aliases table starting with "postmaster#".
So if I have two domains:
foo.org
example.org
And they have aliases like:
info#foo.org => admin#foo.org
postmaster#foo.org => user1#foo.org
info#example.org => admin#example.org
I want the query to return only the domain "example.org", as it is missing the postmaster alias.
This is the table layout:
mysql> show columns from virtual_aliases;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| domain_id   | int(11)      | NO   | MUL | NULL    |                |
| source      | varchar(100) | NO   |     | NULL    |                |
| destination | varchar(100) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
mysql> show columns from virtual_domains;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(50) | NO   |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
I tried for many hours with IF, CASE, LIKE queries with no success. I don't need a final solution, maybe just a hint with some explanation. Thanks!
SELECT * FROM virtual_domains AS domains
LEFT JOIN virtual_aliases AS aliases
ON domains.id = aliases.domain_id
WHERE aliases.domain_id IS NULL
LEFT JOIN returns all records from the "left" table, even if they have no corresponding records in the "right" table. Those records will have the right table's fields set to NULL. Use WHERE to strip all the others.
I guess I didn't understand you correctly the first time. You have several entries in aliases for a single domain, and you want to display only those domains that don't have an entry in the aliases table that starts with "postmaster"?
In this case you should use NOT IN, like this:
SELECT * FROM virtual_domains AS domains
WHERE domains.id NOT IN (
SELECT domain_id
FROM virtual_aliases
WHERE source LIKE "postmaster#%"
)
select id, name from virtual_domains
where id not in (select domain_id from virtual_aliases)
SELECT vd.*
FROM virtual_domains vd
LEFT JOIN virtual_aliases va
  ON vd.id = va.domain_id AND va.source LIKE 'postmaster#%'
WHERE va.id IS NULL;  -- keep only domains with no postmaster alias