Query with EXISTS takes too long - MySQL

From my Debian terminal I'm trying to execute a query like the following in the MySQL client:
SELECT *
FROM stop_times_lazio_set2_withtime2 AS B
WHERE EXISTS
(SELECT *
FROM stop_times_lazio_set2_emptytime2 AS A
WHERE B.trip_id=A.trip_id);
Table A contains around 3 million records.
Table B is a subset of A of around 400,000 records.
I'd like to select every record that has a "parent" row with the same id (no, it's not a unique/primary id).
It has now been running for more than two hours and I still see just a blinking cursor... Is the query correct? I can't even access MySQL from other clients like phpMyAdmin while it runs.
Is there any way to speed up the process?
Is there a way to check how many records have been processed while it runs?

I guess you have already indexed trip_id? There is another way to write the query; maybe it helps:
SELECT *
FROM stop_times_lazio_set2_withtime2
WHERE trip_id IN (SELECT trip_id FROM stop_times_lazio_set2_emptytime2)

I would expect a straight JOIN to be much much faster...
SELECT B.*
FROM stop_times_lazio_set2_withtime2 AS B
JOIN stop_times_lazio_set2_emptytime2 AS A ON B.trip_id=A.trip_id

Why not use a simpler query?
SELECT A.*
FROM stop_times_lazio_set2_emptytime2 AS A, stop_times_lazio_set2_withtime2 AS B
WHERE B.trip_id=A.trip_id;
With that many records, it will obviously take time.
You can process only a few records at a time by adding this at the end of the query:
LIMIT <offset>, <number of records>

Have you tried a LEFT JOIN? For example:
SELECT columns FROM withtime
LEFT JOIN emptytime ON withtime.tripid = emptytime.tripid;
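The semantic difference between the EXISTS form and the plain JOIN form matters here: if trip_id is not unique in the other table, a JOIN multiplies the matching rows while EXISTS returns each row at most once. A minimal sketch with SQLite in Python (tiny hypothetical tables standing in for the real ones) illustrates this:

```python
import sqlite3

# Hypothetical miniature versions of the two tables, to show that a JOIN
# can duplicate rows when trip_id is not unique, while EXISTS does not.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE withtime (trip_id INTEGER, stop TEXT)")
cur.execute("CREATE TABLE emptytime (trip_id INTEGER)")
cur.executemany("INSERT INTO withtime VALUES (?, ?)", [(1, "a"), (2, "b")])
# trip_id 1 appears twice in emptytime -> the JOIN duplicates its match
cur.executemany("INSERT INTO emptytime VALUES (?)", [(1,), (1,)])

exists_rows = cur.execute(
    "SELECT * FROM withtime AS B WHERE EXISTS "
    "(SELECT 1 FROM emptytime AS A WHERE A.trip_id = B.trip_id)"
).fetchall()
join_rows = cur.execute(
    "SELECT B.* FROM withtime AS B "
    "JOIN emptytime AS A ON A.trip_id = B.trip_id"
).fetchall()
print(len(exists_rows))  # 1 -- each matching row exactly once
print(len(join_rows))    # 2 -- duplicated by the two matches
```

If the JOIN rewrite is used for speed, adding DISTINCT (or deduplicating trip_id in the joined table) restores the EXISTS semantics.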

Related

What is the performance difference between these two mysql queries?

select sum(a) from tbl1 where id in (1,2,3) (0.1 seconds)
and
select sum(a) from tbl1 where id in (select id from tbl2) (60 seconds)
select id from tbl2 returns 1,2,3 in 0.001 seconds;
tbl1 has roughly 2.2M entries;
(1,2,3) is already known when the code runs, while the second example queries another table, so it is perfectly normal that the latter takes longer. However, a 600x slowdown needs further explanation. Without more information about your situation we can only guess, but the potential problems are as follows:
there are many ids in the second table and many records in the first table
the inner query runs for each record of the outer query (highly probable)
lack of indexes on columns you are using for filtering
If the result of the inner query should be similar to the first example's set, then you might want to separate your queries into two. You could load the ids separately and then use the result for the second query.
This situation happens on MySQL < 5.5. There seems to be a bug which causes long query times. The solution is to add an index on tbl2.id or to upgrade to a higher version of MySQL.
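The "separate your queries into two" suggestion above can be sketched as follows: load the ids first, then inline them as literal values in the second query. This is a minimal SQLite-backed sketch in Python, with hypothetical tiny tables standing in for tbl1 and tbl2:

```python
import sqlite3

# Two-step workaround: fetch the ids separately, then pass them as a
# literal list to the second query instead of a correlated subquery.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tbl1 (id INTEGER, a INTEGER)")
cur.execute("CREATE TABLE tbl2 (id INTEGER)")
cur.executemany("INSERT INTO tbl1 VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 30), (4, 40)])
cur.executemany("INSERT INTO tbl2 VALUES (?)", [(1,), (2,), (3,)])

# Step 1: the cheap inner query on its own
ids = [row[0] for row in cur.execute("SELECT id FROM tbl2")]
# Step 2: bind the fetched ids as parameters of the outer query
placeholders = ", ".join("?" * len(ids))
total = cur.execute(
    f"SELECT SUM(a) FROM tbl1 WHERE id IN ({placeholders})", ids
).fetchone()[0]
print(total)  # 10 + 20 + 30 = 60
```

This only pays off when the id list is small; for very large id sets the single-query form (with proper indexes) is usually preferable.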

Fast to query, slow to create table

I have an issue creating a table using a SELECT (it runs very slowly). The query is meant to take only the details of the animal with the latest entry date; that result will then be inner-joined to another query.
SELECT *
FROM amusementPart a
INNER JOIN (
SELECT DISTINCT name, type, cageID, dateOfEntry
FROM bigRegistrations
GROUP BY cageID
) r ON a.type = r.cageID
But because of the slow performance, someone suggested steps to improve it: 1) use a temporary table, 2) store the result and join it to the other statement.
USE myzoo;
CREATE TABLE animalRegistrations AS
SELECT DISTINCT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Unfortunately, it is still slow. If I only run the SELECT statement, the result is shown in 1-2 seconds, but if I add the CREATE TABLE, the query takes ages (approx. 25 minutes).
Any good approach to improve the query time?
Edit: the size of the bigRegistrations table is around 3.5 million rows.
Can you please try the query below? It achieves "take only the details of the animal with the latest entry date, to be inner-joined to another query". The query you are using does not fetch the records your requirement asks for, and this one will be faster:
SELECT a.*, b.name, b.type, b.cageID, b.dateOfEntry
FROM amusementPart a
INNER JOIN bigRegistrations b ON a.type = b.cageID
INNER JOIN (SELECT c.cageID, max(c.dateOfEntry) dateofEntry
FROM bigRegistrations c
GROUP BY c.cageID) t ON t.cageID = b.cageID AND t.dateofEntry = b.dateofEntry
I'd suggest indexing cageID and dateOfEntry.
This is a multipart question.
Use Temporary Table
Don't use DISTINCT - grouping all columns already makes the result distinct (don't forget to check for an index)
Check the SQL Execution plans
Here you are not creating a temporary table. Try the following...
CREATE TEMPORARY TABLE IF NOT EXISTS animalRegistrations AS
SELECT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Have you tried doing an explain to see how the plan is different from one execution to the next?
Also, I have found that there can be locking issues in some databases when doing INSERT ... SELECT and table creation using SELECT. I ran this in MySQL, and it solved some deadlock issues I was having.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
The reason the query runs so slowly is probably that it is creating the temp table based on all 3.5 million rows, when really you only need a subset of those, i.e. the bigRegistrations rows that match your join to amusementPart. The single SELECT statement is faster because SQL is smart enough to know it only needs to compute the bigRegistrations rows where a.type = r.cageID.
I'd suggest that you don't need a temp table; your first query is quite simple. Rather, you may just need an index. You can determine this by studying the estimated execution plan, or by running your query through a database tuning advisor. My guess is that you need to create an index similar to the one below. Notice I index by cageID first, since that is what you join to amusementPart, so that would help SQL narrow the results down quickest. But I'm guessing a bit - view the query plan or tuning advisor to be sure.
CREATE INDEX IX_bigRegistrations ON bigRegistrations
(cageId, name, type, dateOfEntry)
Also, if you want the animal with the latest entry date, I think you want this query instead of the one you're using. I'm assuming the PK is all 4 columns.
SELECT name, type, cageID, dateOfEntry
FROM bigRegistrations BR
WHERE BR.dateOfEntry =
(SELECT MAX(BR1.dateOfEntry)
FROM bigRegistrations BR1
WHERE BR1.name = BR.name
AND BR1.type = BR.type
AND BR1.cageID = BR.cageID)
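The greatest-row-per-group pattern used in the answers above (join each row back to the per-group MAX(dateOfEntry)) can be sketched with SQLite in Python. The rows below are purely hypothetical sample data:

```python
import sqlite3

# Greatest-row-per-group: keep only the row with the latest dateOfEntry
# for each cageID, by joining to a per-cageID MAX() derived table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bigRegistrations "
            "(name TEXT, type TEXT, cageID INTEGER, dateOfEntry TEXT)")
cur.executemany(
    "INSERT INTO bigRegistrations VALUES (?, ?, ?, ?)",
    [("rex", "lion", 1, "2020-01-01"),
     ("rex", "lion", 1, "2020-03-01"),   # latest entry for cage 1
     ("momo", "ape", 2, "2020-02-01")],
)
rows = cur.execute(
    "SELECT b.name, b.cageID, b.dateOfEntry "
    "FROM bigRegistrations b "
    "JOIN (SELECT cageID, MAX(dateOfEntry) AS d "
    "      FROM bigRegistrations GROUP BY cageID) t "
    "ON t.cageID = b.cageID AND t.d = b.dateOfEntry "
    "ORDER BY b.cageID"
).fetchall()
print(rows)  # [('rex', 1, '2020-03-01'), ('momo', 2, '2020-02-01')]
```

Note that ties (two rows sharing the same maximum dateOfEntry within a cage) would both survive this join; the correlated-subquery version at the end of the answer behaves the same way.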

Slow query takes up entire HDD space resulting in a "1030 Got error 28 from storage engine"

Fairly new to MySQL.
Slow query takes up the entire HDD space ending up with 1030 error code.
INSERT INTO schema.Table C
SELECT a.`Date`, a.Store, a.SKU,
floor((a.QTY / ((b.CASEQTY * b.CASEPERLAYER) * b.LAYERPERPALLET))) AS Pallets,
floor(((a.QTY / ((b.CASEPERLAYER * b.LAYERPERPALLET) * b.CASEQTY)) / b.CASEQTY)) AS Cases,
(a.QTY * b.CASEQTY) AS Pieces
FROM
(schema.table1 AS a
INNER JOIN schema.table2 AS b)
WHERE a.Description = 'BLAH';
Problem:
When I run the above query I get the results I need in 0.01 sec with a LIMIT of 100 rows. However, when I try to insert the results into a prepared table, it fails.
The above query will basically run for hours until the HDD is full. Table A contains millions of records and table B only a few thousand. The storage engine is InnoDB. I've run a similar query for 3 hours and had it succeed. Any help will be greatly appreciated.
That's something special about MySQL: in spite of calling it INNER JOIN, you can do a CROSS JOIN by leaving out the ON clause, which is exactly what you are doing. (Other DBMSs would raise a syntax error.)
So by not specifying the ON clause to match records from table1 and table2 you match every record in table1 with every record in table2. These can be many :-)
Your inner join contains no join criteria. This results in something (bad) called a "cartesian product". If table A has a million records and table B contains a thousand, the cartesian product will match each row in table A to EVERY row in table B, giving you (at least) a billion records.
To fix this, you need to constrain the relationship between the two tables with an ON clause for your join, or the condition could go in the WHERE clause.
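The row-count blow-up the answers describe is easy to demonstrate at a small scale. A sketch with SQLite in Python (hypothetical 1000-row and 100-row tables in place of the real ones):

```python
import sqlite3

# Joining without an ON clause matches every row of one table with
# every row of the other: |t1| * |t2| rows instead of the matches.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (id INTEGER)")
cur.execute("CREATE TABLE t2 (id INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(1000)])
cur.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(100)])

cross = cur.execute("SELECT COUNT(*) FROM t1, t2").fetchone()[0]
joined = cur.execute(
    "SELECT COUNT(*) FROM t1 JOIN t2 ON t1.id = t2.id"
).fetchone()[0]
print(cross)   # 1000 * 100 = 100000
print(joined)  # 100 -- only the matching ids
```

Scaled up to millions of rows times thousands, the unconstrained version produces billions of output rows, which is exactly what fills the disk during the INSERT ... SELECT.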

SELECT statement issue with OR

I am trying to do a filter query using the following statement:
SELECT * FROM user_jobs, users WHERE user_jobs.job_title LIKE "%some_keyword%" OR user_jobs.job_title LIKE "%another_keyword%" AND users.id = user_jobs.userid
Specs: users.id is PK and user_jobs.userid is FK to users.id
I am trying to filter the users to get the ones with values similar to those specified. When I run it, it loops for a very long time and finally returns a large list of users that contains duplicates (e.g. I only have 300 users and the query shows over 3000 results).
What am I doing wrong, please?
Thanks in advance!
AND takes precedence over OR; use parentheses to achieve the desired result.
SELECT * FROM user_jobs, users
WHERE
(user_jobs.job_title LIKE "%some_keyword%"
OR user_jobs.job_title LIKE "%another_keyword%")
AND users.id = user_jobs.userid
You need to use parentheses in that query.
SELECT * FROM user_jobs, users WHERE users.id = user_jobs.userid
AND (user_jobs.job_title LIKE "%some_keyword%"
OR user_jobs.job_title LIKE "%another_keyword%")
First off, the AND operator takes precedence here. Isolate your logic like so:
SELECT * FROM user_jobs, users WHERE (user_jobs.job_title LIKE "%some_keyword%" OR user_jobs.job_title LIKE "%another_keyword%") AND users.id = user_jobs.userid
Second of all, don't use SELECT * FROM .... It selects all columns, adding network overhead and taking more time to transfer everything from the server.
Reference: https://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html
Even though your table contains 300 records, the query returns around 3000 because you are selecting columns from both tables without giving any join condition, so it will CROSS JOIN the two tables.
For matching the JOB_TITLE pattern you can also use regular expressions:
SELECT * FROM USER_JOBS T1, USERS T2 WHERE REGEXP_LIKE(T1.JOB_TITLE, 'SOME_KEYWORD|OTHER_KEYWORD') AND T2.ID = T1.USERID;
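The precedence problem the answers point out (AND binds tighter than OR, so `A OR B AND C` parses as `A OR (B AND C)`) can be verified with a sketch in Python over SQLite, using hypothetical users/user_jobs rows:

```python
import sqlite3

# Without parentheses, the first LIKE branch escapes the join condition
# entirely, so every row that matches it pairs with EVERY user.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER)")
cur.execute("CREATE TABLE user_jobs (userid INTEGER, job_title TEXT)")
cur.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO user_jobs VALUES (?, ?)",
                [(1, "baker"), (2, "teacher"), (3, "driver")])

bad = cur.execute(
    "SELECT COUNT(*) FROM user_jobs, users "
    "WHERE user_jobs.job_title LIKE '%baker%' "
    "OR user_jobs.job_title LIKE '%teacher%' AND users.id = user_jobs.userid"
).fetchone()[0]
good = cur.execute(
    "SELECT COUNT(*) FROM user_jobs, users "
    "WHERE (user_jobs.job_title LIKE '%baker%' "
    "OR user_jobs.job_title LIKE '%teacher%') AND users.id = user_jobs.userid"
).fetchone()[0]
print(bad, good)  # 4 2
```

The 'baker' row joins all three users in the unparenthesized version (3 rows) plus the properly joined 'teacher' row (1 row), while the parenthesized version returns only the two correctly joined rows. With 300 users the same effect explains the 3000-row result.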

Most efficient way to LIMIT results in a JOIN?

I have a fairly simple one-to-many type join in a MySQL query. In this case, I'd like to LIMIT my results by the left table.
For example, let's say I have an accounts table and a comments table, and I'd like to pull 100 rows from accounts and all the associated comments rows for each.
The only way I can think to do this is with a sub-select in the FROM clause instead of simply selecting FROM accounts. Here is my current idea:
SELECT a.*, c.* FROM
(SELECT * FROM accounts LIMIT 100) a
LEFT JOIN `comments` c on c.account_id = a.id
ORDER BY a.id
However, whenever I need a sub-select of some sort, my intermediate-level SQL knowledge feels like it's doing something wrong.
Is there a more efficient, or faster, way to do this, or is this pretty good?
By the way...
This might be the absolute simplest way to do this, which I'm okay with as an answer. I'm simply trying to figure out if there IS another way to do this that could potentially compete with the above statement in terms of speed.
Looks perfect to me.
Just wondering if it is OK with you to receive more than 100 rows from the above query.
If there are multiple matching rows in the comments table for a row in the accounts table, you may get more than 100 rows in total; and because it's a LEFT JOIN, an account with no comments still produces one row with NULL comment columns.
See: How can I optimize this query, takes more than a min to execute
No, the query is fine just the way it is. You might want to select specific fields instead of a.* and c.*, though.
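The derived-table approach from the question can be checked with a sketch in Python over SQLite (hypothetical accounts/comments rows): the LIMIT caps the number of accounts, while the total row count varies with the number of comments per account.

```python
import sqlite3

# LIMIT inside the derived table restricts the left (accounts) side;
# the joined result then carries all comments for just those accounts.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE comments (account_id INTEGER, body TEXT)")
cur.executemany("INSERT INTO accounts VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO comments VALUES (?, ?)",
                [(1, "x"), (1, "y"), (2, "z")])

rows = cur.execute(
    "SELECT a.id, c.body FROM "
    "(SELECT * FROM accounts ORDER BY id LIMIT 2) a "
    "LEFT JOIN comments c ON c.account_id = a.id "
    "ORDER BY a.id"
).fetchall()
accounts_seen = {r[0] for r in rows}
print(len(rows), sorted(accounts_seen))  # 3 rows spanning accounts [1, 2]
```

Note the ORDER BY inside the derived table: without it, which 100 accounts the LIMIT picks is not guaranteed.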