Why does this query take such a long time to execute? - mysql

I have three tables
glSalesJournal
HMISAdd
HMISMain
What I am trying to do is add the glSalesJournal amt to the HMISAdd amt, grouping by several fields, and insert the result into HMISMain.
The glSalesJournal contains 633173 records
The HMISAdd contains 4193 records
HMISAdd and glSalesJournal contain the same columns:
loc
glAcct
glSubAcct
batchNbr
contractNbr
amt
I added indexes to the tables, but the results are still the same.
Here is my code:
INSERT INTO hmismain
(loc,
glacct,
subacct,
batchnbr,
contractnbr,
amt)
SELECT glsalesjournal.loc,
glsalesjournal.glacct,
glsalesjournal.glsubacct,
glsalesjournal.batchnbr,
glsalesjournal.salescontnbr,
( glsalesjournal.amt + hmisadd.amt ) AS sumAmt
FROM glsalesjournal
LEFT OUTER JOIN hmisadd
ON ( glsalesjournal.loc = hmisadd.loc
AND glsalesjournal.glacct = hmisadd.glacct
AND glsalesjournal.glsubacct = hmisadd.subacct
AND glsalesjournal.batchnbr = hmisadd.batchnbr
AND glsalesjournal.salescontnbr = hmisadd.contractnbr )
GROUP BY glsalesjournal.loc,
hmisadd.loc,
glsalesjournal.glacct,
hmisadd.glacct,
glsalesjournal.glsubacct,
hmisadd.subacct,
glsalesjournal.batchnbr,
hmisadd.batchnbr,
glsalesjournal.salescontnbr,
hmisadd.contractnbr
The script takes more than 2 hours to execute. Even when I limit the records to 100, it takes just as long.
Can someone please guide me on how to optimize the script?
Thanks

1) It looks like it's a one-off query, am I correct here? If not, then you are inserting the same data into the hmismain table every time.
2) You are grouping on fields from TWO separate tables, so no single index can serve the GROUP BY. The only index that could is one over a view joining these two tables in the same way, and MySQL does not support indexed views.
Further note:
What is the point of
GROUP BY glsalesjournal.loc,
hmisadd.loc,
glsalesjournal.glacct,
hmisadd.glacct,
glsalesjournal.glsubacct,
hmisadd.subacct,
glsalesjournal.batchnbr,
hmisadd.batchnbr,
glsalesjournal.salescontnbr,
hmisadd.contractnbr
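You are grouping the data by the same fields twice:
glsalesjournal.loc, hmisadd.loc
glsalesjournal.glacct, hmisadd.glacct,
...
The join condition already forces each hmisadd column to equal its glsalesjournal counterpart (or to be NULL when no row matches), so the hmisadd columns add nothing. Remove the duplicates from the GROUP BY and it should run faster.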
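As a rough sketch of a de-duplicated query: the version below groups each table once by the five key columns and then joins the two totals. It assumes the goal is one output row per key holding the journal total plus the matching HMISAdd total, which the question does not state outright. It runs against an in-memory SQLite database only so that the example is self-contained; the sample rows and the `build_totals` helper are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the three tables; column names follow the query in the question.
SCHEMA = """
CREATE TABLE glsalesjournal (loc TEXT, glacct INTEGER, glsubacct INTEGER,
                             batchnbr INTEGER, salescontnbr INTEGER, amt REAL);
CREATE TABLE hmisadd  (loc TEXT, glacct INTEGER, subacct INTEGER,
                       batchnbr INTEGER, contractnbr INTEGER, amt REAL);
CREATE TABLE hmismain (loc TEXT, glacct INTEGER, subacct INTEGER,
                       batchnbr INTEGER, contractnbr INTEGER, amt REAL);
"""

# Aggregate each side by its key first, then join the two small result sets.
INSERT_TOTALS = """
INSERT INTO hmismain (loc, glacct, subacct, batchnbr, contractnbr, amt)
SELECT j.loc, j.glacct, j.glsubacct, j.batchnbr, j.salescontnbr,
       j.amt + COALESCE(a.amt, 0)
FROM (SELECT loc, glacct, glsubacct, batchnbr, salescontnbr, SUM(amt) AS amt
      FROM glsalesjournal
      GROUP BY loc, glacct, glsubacct, batchnbr, salescontnbr) AS j
LEFT JOIN (SELECT loc, glacct, subacct, batchnbr, contractnbr, SUM(amt) AS amt
           FROM hmisadd
           GROUP BY loc, glacct, subacct, batchnbr, contractnbr) AS a
  ON  j.loc = a.loc AND j.glacct = a.glacct AND j.glsubacct = a.subacct
  AND j.batchnbr = a.batchnbr AND j.salescontnbr = a.contractnbr
"""

def build_totals(journal_rows, add_rows):
    """Load the sample rows, run the INSERT ... SELECT, return (loc, amt) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO glsalesjournal VALUES (?,?,?,?,?,?)", journal_rows)
    conn.executemany("INSERT INTO hmisadd VALUES (?,?,?,?,?,?)", add_rows)
    conn.execute(INSERT_TOTALS)
    rows = conn.execute("SELECT loc, amt FROM hmismain ORDER BY loc").fetchall()
    conn.close()
    return rows

totals = build_totals(
    journal_rows=[("A", 1, 1, 10, 100, 5.0), ("A", 1, 1, 10, 100, 7.0),
                  ("B", 2, 1, 11, 101, 3.0)],
    add_rows=[("A", 1, 1, 10, 100, 1.5)],
)
```

The COALESCE matters with a LEFT JOIN: without it, any journal key that has no HMISAdd row would get a NULL amount.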

Did you add indexes on these fields?
glSalesJournal.loc
glSalesJournal.glAcct
glSalesJournal.glSubAcct
glSalesJournal.batchNbr
glSalesJournal.salesContNbr
HMISAdd.Loc
HMISAdd.GlAcct
HMISAdd.SubAcct
HMISAdd.batchNbr
HMISAdd.contractNbr
If these fields are unindexed, MySQL will perform a full table scan for each individual record, which causes the slow performance.
MySQL Create Index Syntax
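To illustrate the point about indexing the join columns: a single composite index over every column in the ON clause of the smaller table is typically what lets the join do an index lookup per row instead of a scan. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN`; the index name `idx_hmisadd_key` is made up, and the plan text MySQL prints looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE glsalesjournal (loc TEXT, glacct INTEGER, glsubacct INTEGER,
                             batchnbr INTEGER, salescontnbr INTEGER, amt REAL);
CREATE TABLE hmisadd (loc TEXT, glacct INTEGER, subacct INTEGER,
                      batchnbr INTEGER, contractnbr INTEGER, amt REAL);
-- One composite index over every column used in the ON clause (name is hypothetical).
CREATE INDEX idx_hmisadd_key
    ON hmisadd (loc, glacct, subacct, batchnbr, contractnbr);
""")

JOIN_SQL = """
SELECT j.loc, j.amt + COALESCE(a.amt, 0)
FROM glsalesjournal AS j
LEFT JOIN hmisadd AS a
  ON  j.loc = a.loc AND j.glacct = a.glacct AND j.glsubacct = a.subacct
  AND j.batchnbr = a.batchnbr AND j.salescontnbr = a.contractnbr
"""

# The last column of each plan row is the human-readable description of that step.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL)]
uses_index = any("idx_hmisadd_key" in step for step in plan_steps)
conn.close()
```

In MySQL the same check would be the CREATE INDEX statement followed by EXPLAIN on the SELECT.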

Related

selecting most recent row from joined table in MySQL

I have two tables, Project and Projectnote
There is a one to many relationship between project and projectnote.
I want to be able to list my projects and select the most recent projectnotes based on the id.
Is this possible in a MySQL query? I can't figure it out.
Thanks for any help!
Edit: so far I have a basic query (below) that joins the two tables. However, this only selects projects where a note exists and I get multiple rows where there are several notes per project.
SELECT `driver_checkins`.*,
`driver_trips`.`id` AS `trip_id`,
`driver_trips`.`trip_num` AS `trip_num`,
`driver_trips`.`status` AS `trip_status`,
`driver_trips`.`ride_date` AS `ride_date`,
`driver_trips`.`today_date` AS `trip_today_date`,
`driver_trips`.`pick_up_time` AS `pick_up_time`,
`driver_trips`.`d_time` AS `d_time`,
`driver_trips`.`trip_type` AS `trip_type`
FROM `driver_checkins`
LEFT JOIN `driver_trips` ON `driver_trips`.`driver_id` = `driver_checkins`.`driver_id`
WHERE `checkin_status` = 1 AND `booking_status` = 0;
Using a GROUP BY clause and the GROUP_CONCAT function should do the trick.
I am assuming that "driver_checkins" is your "Project" table and "driver_trips" is your "Projectnote" table.
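-- MySQL's GROUP_CONCAT has no LIMIT clause (MariaDB 10.3+ does), so SUBSTRING_INDEX keeps the first 3 entries
SELECT `driver_checkins`.*,
SUBSTRING_INDEX(
GROUP_CONCAT(`driver_trips`.`id`, `driver_trips`.`status` ORDER BY `driver_trips`.`id` DESC SEPARATOR ' --- '),
' --- ', 3) AS last_trips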
FROM `driver_checkins`
LEFT JOIN `driver_trips` ON `driver_trips`.`driver_id` = `driver_checkins`.`driver_id`
WHERE `checkin_status` = 1 AND `booking_status` = 0
GROUP BY `driver_checkins`.id;
This should display id and status for the last 3 driver_trips per driver_checkin, separated by " --- ".
Something to consider: while ordering by id will match chronological order in many cases, it is safer to add a timestamp column (e.g. called created) and order by that instead.
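If only the single most recent note is needed, one common alternative to GROUP_CONCAT is a LEFT JOIN whose condition picks the highest note id per project; projects without notes still appear, with NULLs in the note columns. This is a minimal sketch using the Project and Projectnote tables from the question. The column names (`id`, `name`, `project_id`, `body`) are assumptions, and SQLite is used only to make it runnable.

```python
import sqlite3

LATEST_NOTE_SQL = """
SELECT p.id, p.name, n.id AS note_id, n.body
FROM project AS p
LEFT JOIN projectnote AS n
  ON n.id = (SELECT MAX(n2.id)          -- newest note for this project
             FROM projectnote AS n2
             WHERE n2.project_id = p.id)
ORDER BY p.id
"""

def latest_notes(projects, notes):
    """Return one row per project with its most recent note, if any."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE projectnote (id INTEGER PRIMARY KEY, project_id INTEGER, body TEXT);
    """)
    conn.executemany("INSERT INTO project VALUES (?, ?)", projects)
    conn.executemany("INSERT INTO projectnote VALUES (?, ?, ?)", notes)
    rows = conn.execute(LATEST_NOTE_SQL).fetchall()
    conn.close()
    return rows

rows = latest_notes(
    projects=[(1, "alpha"), (2, "beta"), (3, "gamma")],
    notes=[(1, 1, "first"), (2, 1, "second"), (3, 3, "only")],
)
```

Project 2 has no notes and still shows up, which is the behaviour the plain join in the question was missing.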

MySQL Query gets too complex for me

I'm trying to write a MYSQL Query that updates a cell in table1 with information gathered from 2 other tables;
Gathering the data from the other 2 tables works without much trouble. It is slow, but that is because one of the 2 tables has 4601537 records in it (every row of a report is stored as a separate record, so 1 report has more than 200 records).
The Query that I use to Join the two tables together is:
# First Table, containing Report_ID's: RE
# Table that has to be updated: REGI
# Join Table: JT
SELECT JT.report_id as ReportID, REGI.Serienummer as SerialNo FROM Blancco_Registration.TrialTable as REGI
JOIN (SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92) AS JT ON JT.Value_string = REGI.Serienummer
WHERE REGI.HardwareType="PC" AND REGI.BlanccoReport=0 LIMIT 100
This returns 100 records (I limit it because the database is in use during work hours and I don't want to steal all resources).
However, I want to use these results in a Query that updates the REGI table (which it uses to select the 100 records in the first place).
However, I get an error saying that I cannot select from the table itself while updating it (logically). So I tried putting the result of the SELECT statement above into a temp table and then running the UPDATE from that; however, then I get too many results (logically! I only need 1 result and get 100), and I'm getting stuck in my own thoughts. Ultimately I need to fill the ReportID into each record of REGI.
I know it should be possible, but I'm no expert in MySQL. Is there anybody who can point me in the right direction?
Ps. fixing the table containing 400k records is not an option, it's a program from an external developer and I can only read that database.
The errors I'm talking about are as follows:
Error Code: 1093. You can't specify target table 'TrialTable' for update in FROM clause
When I use:
UPDATE TrialTable SET TrialTable.BlanccoReport =
(SELECT JT.report_id as ReportID, REGI.Serienummer as SerialNo FROM Blancco_Registration.TrialTable as REGI
JOIN (SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92) AS JT ON JT.Value_string = REGI.Serienummer
WHERE REGI.HardwareType="PC" AND REGI.BlanccoReport=0 LIMIT 100)
WHERE TrialTable.HardwareType="PC" AND TrialTable.BlanccoReport=0)
Then I tried:
UPDATE TrialTable SET TrialTable.BlanccoReport = (SELECT ReportID FROM (<<and the rest of the SQL>>> ) as x WHERE X.SerialNo = TrialTable.Serienummer)
but that gave me the following error:
Error Code: 1242. Subquery returns more than 1 row
Running the query above with a LIMIT 1 gives every row the same result.
Firstly, your query seems to be functionally identical to the following:
SELECT RE.report_id ReportID
, REGI.Serienummer SerialNo
FROM Blancco_Registration.TrialTable REGI
JOIN Blancco_new.mc_report_Entry RE
ON RE.Value_string = REGI.Serienummer
WHERE REGI.HardwareType = "PC"
AND REGI.BlanccoReport=0
AND RE.path_id=92
LIMIT 100
So, why not use that?
EDIT:
I still don't get it. I can't see what part of the problem the following fails to solve...
UPDATE TrialTable REGI
JOIN Blancco_new.mc_report_Entry RE
ON RE.Value_string = REGI.Serienummer
SET REGI.BlanccoReport = RE.report_id
WHERE REGI.HardwareType = "PC"
AND REGI.BlanccoReport=0
AND RE.path_id=92;
(This is not an answer, but maybe a pointer towards a few points that need further attention)
Your JT sub query looks suspicious to me:
(SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92
GROUP BY RE.report_id)
You use group by but don't actually use any aggregate functions. The column RE.Value_string should strictly be something like MAX(RE.Value_string) instead.
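Both errors from the question can also be sidestepped with a correlated subquery that reads only the report table: error 1093 goes away because the subquery no longer selects from the table being updated, and wrapping the column in an aggregate such as MIN() guarantees a single value, which avoids error 1242. The sketch below is an illustration under assumptions: the table and column names are simplified stand-ins for TrialTable and mc_report_Entry, taking the lowest report_id when a serial number has several reports is an arbitrary choice, and it runs on SQLite rather than MySQL.

```python
import sqlite3

UPDATE_SQL = """
UPDATE trialtable
SET blanccoreport = (SELECT MIN(re.report_id)      -- aggregate => exactly one value
                     FROM mc_report_entry AS re
                     WHERE re.path_id = 92
                       AND re.value_string = trialtable.serienummer)
WHERE hardwaretype = 'PC'
  AND blanccoreport = 0
  AND EXISTS (SELECT 1                              -- skip rows with no matching report
              FROM mc_report_entry AS re
              WHERE re.path_id = 92
                AND re.value_string = trialtable.serienummer)
"""

def fill_report_ids(devices, entries):
    """Apply the update to sample data and return (serial, report id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE trialtable (serienummer TEXT, hardwaretype TEXT, blanccoreport INTEGER);
        CREATE TABLE mc_report_entry (report_id INTEGER, path_id INTEGER, value_string TEXT);
    """)
    conn.executemany("INSERT INTO trialtable VALUES (?, ?, ?)", devices)
    conn.executemany("INSERT INTO mc_report_entry VALUES (?, ?, ?)", entries)
    conn.execute(UPDATE_SQL)
    rows = conn.execute(
        "SELECT serienummer, blanccoreport FROM trialtable ORDER BY serienummer"
    ).fetchall()
    conn.close()
    return rows

updated = fill_report_ids(
    devices=[("SN1", "PC", 0), ("SN2", "PC", 0), ("SN3", "Phone", 0)],
    entries=[(501, 92, "SN1"), (502, 92, "SN1"), (503, 17, "SN2"), (504, 92, "SN3")],
)
```

MySQL also accepts a LIMIT clause at the end of a single-table UPDATE, which would keep the batches small during work hours as the question intended.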

how to perform a table join when the other table has many rows corresponding to one row in mother table

I have two tables: tbl_book_details and tbl_table_traking
tbl_book_details has columns bd_book_code,
bd_isbn,
bd_title,
bd_edition,
bd_author,
bd_publisher,
bd_supplier,
bd_page,
bd_price_type,
bd_cost_price,
bd_price,
bd_Tax,
bd_covering,
bd_availability,
bd_keywords,
bd_notes,
bd_details,
bd_news_latter,
bd_etDate,
bd_weight,
bd_expire_date,
bd_status
tbl_table_traking has columns
tt_id,
tt_action,
tt_table,
tt_record_id,
tt_on_date,
tt_user,
tt_status
A trigger defined on tbl_book_details fires on every insert or modification and writes a row to tbl_table_traking, recording when the record was changed and by whom.
Until now I have been using the following query, which is not a join:
SELECT
tbl_books_details.bd_book_code AS bkid,
tbl_books_details.bd_isbn,
tbl_books_details.bd_title AS title,
-- This part is what I believe is slowing down my query
(SELECT
tt_on_date
FROM
tbl_table_tracking
WHERE tt_action = 'MODIFY'
AND tt_record_id = tbl_books_details.bd_book_code
ORDER BY tt_on_date DESC
LIMIT 1) AS bd_etdate
FROM tbl_books_details
It worked fine while the record count was below 3 million, but now the script times out.
I have created indexes on tbl_table_traking for tt_on_date and for tt_action.
Is there any way I can convert it to a join or otherwise improve the performance?
The table tracking query returns the most recent date on which the record was modified.
My database is MySQL.
You should be able to do something like this.
SELECT
tbl_books_details.bd_book_code AS bkid,
tbl_books_details.bd_isbn,
tbl_books_details.bd_title AS title,
tbl_table_tracking.tt_on_date
FROM
tbl_books_details
INNER JOIN tbl_table_tracking
ON tbl_books_details.bd_book_code = tbl_table_tracking.tt_record_id
WHERE tt_action = 'MODIFY';
Hope this helps...
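Note that a plain join returns one row per MODIFY entry. If only the most recent modification date per book is wanted, one option is to aggregate the tracking table first and join to the result. The following is a sketch under assumptions: it uses the table names from the query in the question, a hypothetical composite index `idx_tracking_lookup`, and SQLite instead of MySQL so that it can run on its own.

```python
import sqlite3

LAST_MODIFIED_SQL = """
SELECT b.bd_book_code AS bkid, b.bd_title AS title, t.last_modified
FROM tbl_books_details AS b
LEFT JOIN (SELECT tt_record_id, MAX(tt_on_date) AS last_modified
           FROM tbl_table_tracking
           WHERE tt_action = 'MODIFY'
           GROUP BY tt_record_id) AS t
  ON t.tt_record_id = b.bd_book_code
ORDER BY b.bd_book_code
"""

def last_modified(books, tracking):
    """Return each book with the date of its latest MODIFY entry (None if never modified)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_books_details (bd_book_code INTEGER PRIMARY KEY, bd_title TEXT);
        CREATE TABLE tbl_table_tracking (tt_id INTEGER PRIMARY KEY, tt_action TEXT,
                                         tt_record_id INTEGER, tt_on_date TEXT);
        -- Hypothetical index: filter column, then join column, then the aggregated date.
        CREATE INDEX idx_tracking_lookup
            ON tbl_table_tracking (tt_action, tt_record_id, tt_on_date);
    """)
    conn.executemany("INSERT INTO tbl_books_details VALUES (?, ?)", books)
    conn.executemany("INSERT INTO tbl_table_tracking VALUES (?, ?, ?, ?)", tracking)
    rows = conn.execute(LAST_MODIFIED_SQL).fetchall()
    conn.close()
    return rows

result = last_modified(
    books=[(1, "SQL Basics"), (2, "Joins")],
    tracking=[(1, "INSERT", 1, "2012-01-01"), (2, "MODIFY", 1, "2012-02-01"),
              (3, "MODIFY", 1, "2012-03-15"), (4, "INSERT", 2, "2012-01-05")],
)
```

The separate single-column indexes on tt_on_date and tt_action mentioned in the question do not cover the lookup by tt_record_id, which is why the sketch indexes all three columns together.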

MySQL: Optimizing SELECT from 6 huge identical tables with different data split up by timestamp

I have the same problem as the one described here:
MySQL - Selecting data from multiple tables all with same structure but different data ,
I have to select data from many MySQL tables with identical structure, but different data (split up into table_0, table_1, table_2 etc to table_5 to distribute millions of records of data).
The hardware generating the data records for each device moves from table to table according to timestamp field, which is NOT unique. e.g. 50 records in table_0 may have the same timestamp. When the data gets to the end of table_5, it goes back to table_0 to start overwriting the data there. I need to get the data on each device within a time range.
Each table's data columns (for table_0, table_1... up to table_5):
timestamp, robotGroupID, robotID, sensor1, sensor2, sensor3, ... (many of them)
However, the tables are HUGE and the UNION ALL (I read it's faster than UNION DISTINCT) takes forever to execute, even with just two tables, let alone 6. I will illustrate for two tables below.
MySQL statement in PHP: (illustrated for just sensor 1, sensor 2 and sensor 3)
(SELECT sensor1, sensor2, sensor3 FROM table_0 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)
UNION ALL
(SELECT sensor1, sensor2, sensor3 FROM table_1 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)
N.B it is the exact same query except for the table name. Sensor data for a robot within a time range may span none, one, or more of the tables at once.
I cannot use LIMIT because the number of reports from robots within each time range cannot be known ahead of time. I can't use the MERGE storage engine because I only have read-only access to the company's database.
I have an idea to use COUNT(robotID) or similar on each table as a check before running the queries, but I'm not sure how to go about this because I'm quite a novice.
How can I make this work faster for 6 tables, given that there are many more columns than illustrated? Thanks in advance!
Are the fields RobotID and Timestamp indexed?
I would add a multi-field index of ( RobotId, timestamp ) at the very least.
You say you have read only access to the tables, so can you request this index to be added? I'm sure it will help in both your original and updated queries posted.
I must confess I'm still a novice PHP/MySQL coder, but with many ideas, so my code is probably "dirty".
I solved the problem this way in order to move forward, but better solutions are welcome. As for any strange syntax, I am using a database class built on PHP PDO because I am using many different RDBMS types on this project.
For the $myQuery_start variable, I added the names of the other columns as well as sensors 1 to 3.
$myQuery_start = "(SELECT sensor1, sensor2, sensor3 FROM ";
$myQueryCount_start = "(SELECT COUNT(*) FROM ";
$myQuery_stop = " WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop.")";
$count_0 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_0".$myQuery_stop)->fetchColumn();
$count_1 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_1".$myQuery_stop)->fetchColumn();
$count_2 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_2".$myQuery_stop)->fetchColumn();
$count_3 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_3".$myQuery_stop)->fetchColumn();
$count_4 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_4".$myQuery_stop)->fetchColumn();
$count_5 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_5".$myQuery_stop)->fetchColumn();
And now I check to see if UNION ALL needs to be appended to each table's query or not. No need to have a UNION ALL if there is no data record to attach in the next table.
$union_0 = (($count_1 + $count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_1 = (($count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_2 = (($count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_3 = (($count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_4 = (($count_5) > 0)?" UNION ALL ":"";
and now we build up the table queries and combine to form the full query
$query_0 = ($count_0 > 0)?$myQuery_start."table_0".$myQuery_stop.$union_0:"";
$query_1 = ($count_1 > 0)?$myQuery_start."table_1".$myQuery_stop.$union_1:"";
$query_2 = ($count_2 > 0)?$myQuery_start."table_2".$myQuery_stop.$union_2:"";
$query_3 = ($count_3 > 0)?$myQuery_start."table_3".$myQuery_stop.$union_3:"";
$query_4 = ($count_4 > 0)?$myQuery_start."table_4".$myQuery_stop.$union_4:"";
$query_5 = ($count_5 > 0)?$myQuery_start."table_5".$myQuery_stop:"";
Then concatenated:
$myQuery = $query_0.$query_1.$query_2.$query_3.$query_4.$query_5;
And finally $myQuery is executed to produce all the data as required.
At least this is roughly 8 times faster than the previous way I used UNION ALL, so I think this is valid. Any suggested further optimization?
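A related sketch, in Python rather than PHP so that it can run standalone: the per-table SELECTs can be generated in a loop, and the user-supplied values passed as bound parameters instead of being concatenated into the SQL string. The `build_union_query` and `fetch_readings` helpers and the sample data are hypothetical, table names come from a fixed list (identifiers cannot be bound as parameters), and SQLite stands in for MySQL. Whether the COUNT(*) pre-check is still worthwhile is not something this sketch measures.

```python
import sqlite3

TABLES = [f"table_{i}" for i in range(6)]  # fixed whitelist, never user input

def build_union_query(tables, columns=("sensor1", "sensor2", "sensor3")):
    """Return one UNION ALL statement with three placeholders per table."""
    cols = ", ".join(columns)
    parts = [
        f"SELECT {cols} FROM {name} WHERE robotID = ? AND timestamp BETWEEN ? AND ?"
        for name in tables
    ]
    return " UNION ALL ".join(parts)

def fetch_readings(conn, robot_id, ts_start, ts_stop, tables=TABLES):
    """Run the combined query, repeating the three bound values once per table."""
    sql = build_union_query(tables)
    params = [robot_id, ts_start, ts_stop] * len(tables)
    return conn.execute(sql, params).fetchall()

# Toy setup: six identical tables, each with the (robotID, timestamp) index
# suggested in the other answer.
conn = sqlite3.connect(":memory:")
for name in TABLES:
    conn.execute(f"CREATE TABLE {name} (timestamp INTEGER, robotID INTEGER, "
                 "sensor1 REAL, sensor2 REAL, sensor3 REAL)")
    conn.execute(f"CREATE INDEX idx_{name}_robot_ts ON {name} (robotID, timestamp)")
conn.execute("INSERT INTO table_0 VALUES (100, 7, 1.0, 2.0, 3.0)")
conn.execute("INSERT INTO table_3 VALUES (150, 7, 4.0, 5.0, 6.0)")
conn.execute("INSERT INTO table_3 VALUES (150, 8, 9.0, 9.0, 9.0)")  # different robot
conn.execute("INSERT INTO table_5 VALUES (900, 7, 7.0, 8.0, 9.0)")  # outside the range

readings = fetch_readings(conn, robot_id=7, ts_start=50, ts_stop=200)
conn.close()
```

With PDO the same idea would be a prepared statement with placeholders, executed with the repeated values.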
If you can convince them to let you change the database structure, you can GREATLY optimize the layout of your database with the help of MySQL Partitioning. You'll want to research "Range Partitioning", and set up partitioning rules that will tell MySQL to automatically sort your data into invisible subtables for way quicker SELECT results. You won't even need multiple tables.
See http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html

indexes in mysql SELECT AS or using Views

I'm in over my head with a big MySQL query (MySQL 5.0), and I'm hoping somebody here can help.
Earlier I asked how to get distinct values from a joined query
mysql count only for distinct values in joined query
The response I got worked (using a subquery with join as)
select *
from media m
inner join
( select uid
from users_tbl
limit 0,30) map
on map.uid = m.uid
inner join users_tbl u
on u.uid = m.uid
Unfortunately, my query has grown more unruly, and though I have it running, joining into a derived table is taking too long because there are no indexes available to the derived query.
my query now looks like this
SELECT mdate.bid, mdate.fid, mdate.date, mdate.time, mdate.title, mdate.name,
mdate.address, mdate.rank, mdate.city, mdate.state, mdate.lat, mdate.`long`,
ext.link,
ext.source, ext.pre, meta, mdate.img
FROM ext
RIGHT OUTER JOIN (
SELECT media.bid,
media.date, media.time, media.title, users.name, users.img, users.rank, media.address,
media.city, media.state, media.lat, media.`long`,
GROUP_CONCAT(tags.tagname SEPARATOR ' | ') AS meta
FROM media
JOIN users ON media.bid = users.bid
LEFT JOIN tags ON users.bid=tags.bid
WHERE `long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND date = '2009-02-23'
GROUP BY media.bid, media.date
ORDER BY media.date, users.rank DESC
LIMIT 0, 30
) mdate ON (mdate.bid = ext.bid AND mdate.date = ext.date)
phew!
So, as you can see, if I understand my problem correctly, I have two derived tables without indexes (and I don't deny that I may have screwed up the JOIN statements somehow, but I kept trying different types, and this ended up giving me the result I wanted).
What's the best way to create a query similar to this which will allow me to take advantage of the indexes?
Dare I say, I actually have one more table to add into the mix at a later date.
Currently, my query is taking 0.8 seconds to complete, but I'm sure it could be significantly faster if I could take advantage of the indexes.
First, check for indices on ext(bid, date), users(bid) and tags(bid); you should really have them.
It seems, though, that it's LONG and LAT that cause you the most problems. You should try storing LONG and LAT together in a single POINT column (coordinate), creating a SPATIAL INDEX on this column, and querying like this:
WHERE MBRContains(@MySquare, coordinate)
If you can't change your schema for some reason, you can try creating additional indices that have date as the first field:
CREATE INDEX ix_date_long ON media (date, `long`);
CREATE INDEX ix_date_lat ON media (date, lat);
These indices will be more efficient for your query, as you combine an exact match on date with a range search on each axis.
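The reasoning behind putting `date` first can be checked on a toy table: with an equality test on the leading column and a range test on the second, the optimizer can narrow the scan using both parts of one index. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN`; the reduced `media` schema is an assumption, and MySQL may choose differently on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (bid INTEGER, date TEXT, lat REAL, `long` REAL, title TEXT);
-- Equality column first, range column second, as suggested above.
CREATE INDEX ix_date_long ON media (date, `long`);
CREATE INDEX ix_date_lat  ON media (date, lat);
""")

QUERY = """
SELECT bid, title
FROM media
WHERE date = '2009-02-23'
  AND `long` BETWEEN -122.52224684058 AND -121.79760915942
  AND lat BETWEEN 37.07500915942 AND 37.79964684058
"""

# The last column of each plan row describes the step in words.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
chosen = [step for step in plan_steps
          if "ix_date_long" in step or "ix_date_lat" in step]
conn.close()
```

Only one of the two indexes is used per table access, so the remaining range condition is still checked row by row; that is the limitation the spatial index is meant to address.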
Starting fresh:
Question - why are you grouping by both media.bid and media.date? Can a bid have records for more than one date?
Here's a simpler version to try:
SELECT
media.bid,
media.fid, -- assumed to be a media column; the original query does not show its source
media.date,
media.time,
media.title,
users.name,
media.address,
users.rank,
media.city,
media.state,
media.lat,
media.`long`,
ext.link,
ext.source,
ext.pre,
users.img,
( SELECT GROUP_CONCAT(tags.tagname SEPARATOR ' | ')
FROM tags
WHERE ext.bid = tags.bid
GROUP BY tags.bid
) AS meta
FROM
ext
LEFT JOIN
media ON ext.bid = media.bid AND ext.date = media.date
JOIN
users ON ext.bid = users.bid
WHERE
`long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND ext.date = '2009-02-23'
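AND users.userid IN
(
-- MySQL rejects LIMIT directly inside an IN subquery, so it is wrapped in a derived table
SELECT userid FROM
( SELECT userid FROM users ORDER BY rank DESC LIMIT 30 ) AS top_users
)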
ORDER BY
media.date,
users.rank DESC
LIMIT 0, 30
You might want to compare your performance against using a temporary table for each selection, and joining those tables together.
CREATE TEMPORARY TABLE whatever AS SELECT ...;
CREATE TEMPORARY TABLE whatever2 AS SELECT ...;
SELECT ... FROM whatever JOIN whatever2 ON ...;
...
DROP TEMPORARY TABLE whatever;
DROP TEMPORARY TABLE whatever2;
If your system has enough memory to hold full tables this might work out much faster. It depends on how big your database is.
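In MySQL terms the outline above would use CREATE TEMPORARY TABLE. Here is a minimal runnable sketch, with SQLite standing in for MySQL and hypothetical table contents: the filtered selection is materialized once, the temporary table gets its own index (which the derived table in the question could not have), and the final query joins against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (bid INTEGER, date TEXT, title TEXT);
CREATE TABLE ext   (bid INTEGER, date TEXT, link TEXT);
INSERT INTO media VALUES (1, '2009-02-23', 'concert'),
                         (2, '2009-02-23', 'lecture'),
                         (3, '2009-02-24', 'parade');
INSERT INTO ext VALUES (1, '2009-02-23', 'http://example.com/a');

-- Materialize the filtered selection once.
CREATE TEMPORARY TABLE tmp_media AS
    SELECT bid, date, title FROM media WHERE date = '2009-02-23';
-- A temporary table can carry its own index on the join columns.
CREATE INDEX ix_tmp_media ON tmp_media (bid, date);
""")

joined = conn.execute("""
    SELECT m.bid, m.title, e.link
    FROM tmp_media AS m
    LEFT JOIN ext AS e ON e.bid = m.bid AND e.date = m.date
    ORDER BY m.bid
""").fetchall()

conn.execute("DROP TABLE tmp_media")
conn.close()
```

Whether this beats the derived-table version depends on data size and available memory, so it is worth timing both.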