I have two tables, which go like
t1
alias_id (string, unique)
finished (datetime)
sum (float)
t2
alias_id (string)
sum (float)
tables contain payments, around 800 k records each. t1 contains each payment just one time, while t2 can have several records with same alias_id - for some payments can consist of several transactions.
I need to compare the sum field in t1 to Sum of sum fields in t2, grouped by alias.
Doing it in Excel works, but is painful and takes about 4 hours. I tried uploading tables to mysql and running a query on them, was surprised to see it took like 8 hours to complete.
I have no idea why, maybe my query is bad? Or maybe grouping by time and sum does that? Could really use a general advice on best approach to the task.
Query goes below.
SELECT
s.alias_id AS id,
s.finished AS finished,
s.sum AS sum,
Sum(b.sum_aggr) AS b_sum
FROM report.rep1 s
LEFT JOIN
( SELECT alias_id, SUM(sum) AS sum_aggr
FROM report.rep2
GROUP BY 1
) b
ON b.alias_id = s.alias_id
GROUP BY 1, 2, 3;
Table DDLs:
first:
CREATE TABLE `rep1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`corp_client_id` longtext,
`agr_name` longtext,
`client_id` longtext,
`order_id` longtext,
`alias_id` longtext,
`due` longtext,
`finished` longtext,
`sum` double NOT NULL,
`currency` longtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=720886 DEFAULT CHARSET=utf8
second:
CREATE TABLE `rep2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`client_id` longtext,
`contract` longtext,
`contract_start_dt` longtext,
`contract_end_dt` longtext,
`country` longtext,
`provider` longtext,
`date` longtext,
`alias_id` longtext,
`transaction_id` longtext,
`payment_transaction` longtext,
`transaction_type` longtext,
`sum` double NOT NULL,
`transaction_type_name` longtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=655351 DEFAULT CHARSET=utf8
If you want to compare that the Sums are matching, you can simply do a left join between the tables on alias_id. Now, just compute the SUM on the second table, and then you can compare them.
Try the following instead:
SELECT
s.alias_id AS id,
s.finished AS finished,
s.sum AS sum,
SUM(b.sum) AS b_sum
FROM report.rep1 AS s
LEFT JOIN report.rep2 AS s2 ON s2.alias_id = s.alias_id
GROUP BY s.alias_id, s.finished, s.sum
EDIT: As observed by OP's comments, that alias_id is not indexed on either of the tables. Since the alias_id field is longtext type; it will need proper Indexing, otherwise queries will be slow no matter what. Now, fields with longtext datatype cannot be indexed; so you will need to first convert them into varchar datatype.
ALTER TABLE `rep1` MODIFY COLUMN `alias_id` VARCHAR(255);
ALTER TABLE `rep2` MODIFY COLUMN `alias_id` VARCHAR(255);
You can add the indexing on both the tables as follows:
ALTER TABLE `rep1` ADD INDEX alias_id (`alias_id`);
ALTER TABLE `rep2` ADD INDEX alias_id (`alias_id`);
If alias_id is going to be Unique in the table rep1, you can use the following statement (instead of the first statement above):
ALTER TABLE `rep1` ADD UNIQUE alias_id (`alias_id`);
Related
I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table have 2.5 million records where as sent_log table has 0.25 million records.
I want to filter out the records from open log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is the query is taking too much time to execute. I've waited for an hour for the query completion but didn't succeed.
Kindly, suggest what I can do to make it fast since, I have the big data size in the tables.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM send_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sendlog_emailid_deliveryid on send_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
Exists can be faster than in.
If this request is often on, you can greatly increase it by create bigint id column, enven if it not unique.
For example you can put trigger and create column like this
alter table sent_log for_get bigint;
After that create trigger/ update it to put hash into that bigint
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
If you have such column in both table and index on it, query will be like
SELECT *
FROM open_log ol
left join send_log sl on sl.for_get=ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.
I have two tables, which I need to merge, and they are:
CREATE TABLE IF NOT EXISTS `legacy_bookmarks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` text,
`title` text,
`snippet` text,
`datetime` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `datetime` (`datetime`),
FULLTEXT KEY `title` (`title`,`snippet`)
)
And:
CREATE TABLE IF NOT EXISTS `legacy_links` (
`id` mediumint(11) NOT NULL AUTO_INCREMENT,
`user_id` mediumint(11) NOT NULL,
`bookmark_id` int(11) NOT NULL,
`status` enum('public','private') NOT NULL DEFAULT 'public',
UNIQUE KEY `id` (`id`),
KEY `bookmark_id` (`bookmark_id`)
)
As you can see, "legacy_links" contains the ID for "legacy_bookmarks". Am I able to merge the two, based on this relationship?
I can easily change the name of the ID column in "legacy_bookmarks" to "bookmark_id", if that makes things any easier.
Just so you know, the order of the columns, and their types, must be exact, because the data from this combined table is then to be imported into the new "bookmarks" table.
Also, I'd need to able to include additional columns (a "modification" column, populated with the "datetime" values), and change the order of the ones I have.
Any takers?
[Up to you to change the order of the columns]
CREATE TABLE `legacy_linkss` AS
SELECT l.id, l.url, l.title, l.snippet, l.datetime AS modification, b.user_id, b.status
FROM
`legacy_links` l
JOIN `legacy_bookmarks` b ON b.id = l.bookmark_id
;
Afterwards, after checking the consistency and adding manually the constraints, you may:
DROP TABLE `legacy_links`;
DROP TABLE `legacy_bookmarks`;
RENAME TABLE `legacy_linkss` TO `legacy_links`;
Yes, it's called a join, and you would do it like so:
SELECT *
FROM legacy_bookmarks lb
INNER JOIN legacy_links ll ON ll.bookmark_id = lb.id
I have 2 tables city_sessions_1 and city_sessions_2
Structure of both table are similar
CREATE TABLE `city_sessions_1` (
`city_id` int(11),
`session_date` date,
`start_time` varchar(12),
`end_time` varchar(12) ,
`attendance` int(11) ,
KEY `city` (`city_id`),
KEY `session_date` (`session_date`)
) ENGINE=MyISAM;
Note these tables do not have any primary key, but they have their indexes defined. Both tables have same number of rows. But it is expected that some data would be different.
How can I compare these 2 tables' data?
-- We start with the rows in city_session_1, and their fit in city_session_2
SELECT
* -- or whatever fields you are interested in
FROM city_sessions_1
LEFT JOIN city_sessions_2 ON city_sessions_1.city_id=city_sessions_2.city_id
WHERE
-- Chose only those differences you are intersted in
city_sessions_1.session_date<>city_session_2.session_date
OR city_sessions_1.start_time<>city_session_2.start_time
OR city_sessions_1.end_time<>city_session_2.end_time
OR city_sessions_1.attendance<>city_session_2.attendance
UNION
-- We need those rows in city_session_2, that have no fit in city_session_1
SELECT
* -- or whatever fields you are interested in
FROM city_sessions_2
LEFT JOIN city_sessions_1 ON city_sessions_1.city_id=city_sessions_2.city_id
WHERE city_sessions_1.city_id IS NULL
I need to optimize indexes in a table that stores more than 10 Millions rows. The query that is particularly time consuming takes up to 10 seconds to load (when WHERE clause filters only about 2 Millions rows - 8 Millions must be grouped). I have created a few indexes (some of them are complex, some simpler) and tried to find out how to speed this up. Perhaps I'm doing something wrong. MySQL is using optimized_5 index (based on EXPLAIN).
Here is the table's structure and the query:
CREATE TABLE IF NOT EXISTS `geo_reverse` (
`fid` mediumint(8) unsigned NOT NULL,
`tablename` enum('table1','table2') NOT NULL default 'table1',
`geo_continent` varchar(2) NOT NULL,
`geo_country` varchar(2) NOT NULL,
`geo_region` varchar(8) NOT NULL,
`geo_city` mediumint(8) unsigned NOT NULL,
`type` varchar(30) NOT NULL,
PRIMARY KEY (`fid`,`tablename`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`),
KEY `geo_city` (`geo_city`),
KEY `fid` (`fid`),
KEY `geo_region` (`geo_region`,`geo_city`),
KEY `optimized` (`tablename`,`type`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`,`fid`),
KEY `optimized_2` (`fid`,`tablename`),
KEY `optimized_3` (`type`,`geo_city`),
KEY `optimized_4` (`geo_city`,`tablename`),
KEY `optimized_5` (`tablename`,`type`,`geo_city`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
An example query:
SELECT type, COUNT(*) AS objects FROM geo_reverse WHERE tablename = 'table1' AND geo_city IN (5847207,5112771,4916894,...) GROUP BY type
Do you have any idea of how to speed the computation up?
i would use the following index: (geo_city, tablename, type) - geo_city is obviously more selective than tablename, thus it should be on the left. After the condition is applied, the rest should be sorted by type for grouping.
here goes my two MySQL Queries and can some guide me which is the best query to use as per MYSQl DATABase
the below goes my two sql queries
query 1)
select cast(sum(G1.amount)as decimal(8,2)) as YTDRegularPay,cast(sum(b1.amount)as decimal(8,2))as YTDBonusPay
from tbl_employees_swc_grosswagedetails g1,tbl_employees_swc_grosswagedetails b1
where g1.empid=b1.empid
and g1.PayYear=b1.PayYear
and g1.PayperiodNumber=b1.PayperiodNumber
and g1.Fedtaxid=b1.Fedtaxid
and g1.fedtaxid=998899889
and g1.payyear=2011
and g1.PayperiodNumber<=26
and g1.Wage_code='GRTT'
and g1.Taxing_AuthType=b1.Taxing_AuthType
and g1.empid=1005 and b1.wage_code='GRSP'
and g1.taxing_AuthType='FED' ;
and
Query 2)
select abc.Amount as YTDRegularPay,def.Amount as YTDBonusPay
from (select Cast(sum(EG.Amount) as Decimal(8,2)) as Amount
from tbl_employees_swc_grosswagedetails EG
where EG.FedTaxID=998899889
and EG.EmpID=1005
and PayYear=2011
and EG.PayPeriodNumber<=26
and EG.Wage_code='GRTT'
and Taxing_AuthType='FED') as abc,
(select Cast(sum(EG.Amount) as Decimal(8,2)) as Amount
from tbl_employees_swc_grosswagedetails EG
where EG.FedTaxID=998899889
and EG.EmpID=1005
and PayYear=2011
and EG.PayPeriodNumber<=26
and EG.Wage_code='GRSP'
and Taxing_AuthType='FED') as def ;
Here goes my Table structure
delimiter $$
CREATE TABLE `tbl_employees_swc_grosswagedetails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`empid` int(11) NOT NULL,
`Fedtaxid` varchar(9) NOT NULL,
`Wage_code` varchar(45) NOT NULL,
`Amount` double NOT NULL,
`Hrly_Rate` double DEFAULT NULL,
`Num_hours` double DEFAULT NULL,
`Taxing_AuthType` varchar(10) DEFAULT NULL,
`Taxing_Auth_Name` varchar(10) DEFAULT NULL,
`PayperiodNumber` int(11) NOT NULL,
`PayYear` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `empid` (`empid`),
CONSTRAINT `empid` FOREIGN KEY (`empid`) REFERENCES `tblemployee` (`EmpID`)
ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=359 DEFAULT CHARSET=latin1$$
any good query else these are very much appreciable
Thanks IN adv,
Raghavendra.V
I would say the first one is better, since using JOIN is almost always better than using a subquery. It is also recommended to write the JOIN explicitly (though it does not matter in terms of performance), like this:
SELECT
CAST(SUM(G1.amount) AS decimal(8,2)) AS YTDRegularPay,
CAST(SUM(b1.amount) AS decimal(8,2)) AS YTDBonusPay
FROM
tbl_employees_swc_grosswagedetails g1,
JOIN
tbl_employees_swc_grosswagedetails b1 ON g1.empid = b1.empid
AND g1.PayYear = b1.PayYear
AND g1.PayperiodNumber = b1.PayperiodNumber
AND g1.Taxing_AuthType = b1.Taxing_AuthType
AND g1.Fedtaxid = b1.Fedtaxid
WHERE
g1.fedtaxid = 998899889
AND g1.payyear = 2011
AND g1.PayperiodNumber <= 26
AND g1.Wage_code = 'GRTT'
AND b1.wage_code = 'GRSP'
AND g1.empid = 1005
AND g1.taxing_AuthType = 'FED';
Adding some indexes will probably help as well to make both queries quicker. Since you use many columns in your WHERE clause, you need to choose which ones to index according to the data structure. Try adding a bunch of indexes, run the query with EXPLAIN and see which index is used - this one would be the most effective one and than you can drop the others.