SQL Query Will Time Out - mysql

I don't know how to stop my query from timing out. I have two tables: one holds payment details, including a postcode field. Example of the payments table:
id (PRI) | company_name | amount | postcode
---------|--------------|--------|---------
1        | ACME         | 10000  | AB1 1AA
2        | Some Int.    | 15000  | ZY9 8XW
The other table is a lookup table which assigns geographical regions to the postcodes. Example of the postcodes table:
postcode | country | county | local_authority | gor
---------|---------|--------|-----------------|----
AB1 1AA  | S99999  | E2304  | X               | 45
AB1 1AB  | S99999  | E2304  | X               | 45
So if a user searches on country = S99999, it should return all the payments for that country.
The payments table has 40,000 rows. The postcode table has over 2,500,000. Even this really simple query will time out:
SELECT t1.company_name, t1.amount, t1.postcode, t2.country, t2.county, t2.local_authority, t2.gor
FROM `payments` as t1 LEFT JOIN `postcodes` AS t2 ON t1.`postcode` = t2.`postcode`
I have an INDEX on the postcode field in both tables. I cannot manually add the lookup fields onto my payments table because there is a different lookup for each quarter.
I am limited in my experience here. I cannot think of alternatives or ways around this. Any ideas?

I'll make an answer from my comments as well:
I think your 'LEFT JOIN' is the killer. For now, change it into a normal 'JOIN'. Does it still time out? :)
And you might want to create a view with the query. The first time it will be slow but then the data will (probably) be available in a cache. But you might want to read the documentation of the DBMS about that. :)
Does a user always have to do a search to trigger this query? You might want to try this:
SELECT * FROM payments WHERE postcode IN (SELECT postcode FROM postcodes WHERE country = 'S99999');
Edit: the nested query might take more memory. :)
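A minimal, runnable sketch of that IN-subquery approach, using SQLite and a few made-up rows as a stand-in for the real MySQL tables:

```python
import sqlite3

# Miniature SQLite stand-in for the two MySQL tables, with made-up rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, company_name TEXT, amount INTEGER, postcode TEXT)")
cur.execute("CREATE TABLE postcodes (postcode TEXT PRIMARY KEY, country TEXT)")
cur.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)",
                [(1, "ACME", 10000, "AB1 1AA"), (2, "Some Int.", 15000, "ZY9 8XW")])
cur.executemany("INSERT INTO postcodes VALUES (?, ?)",
                [("AB1 1AA", "S99999"), ("AB1 1AB", "S99999"), ("ZY9 8XW", "S11111")])
# Filter the lookup table by country first, then probe the small payments
# table by postcode; with an index on postcodes.country this avoids
# materializing a join across all 2.5M lookup rows.
rows = cur.execute(
    "SELECT company_name, amount FROM payments "
    "WHERE postcode IN (SELECT postcode FROM postcodes WHERE country = 'S99999')"
).fetchall()
```

Only the payment for the matching country comes back; whether this beats the join in practice depends on the optimizer and indexes, so check EXPLAIN on the real server.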

It is not clear what DB you're using, but in the case of SQL Server:
Limit the count of returned records (TOP clause)
Use the NOLOCK hint to improve the speed of the query
Increase the command timeout of your query, but to no more than the web page timeout
Don't use LEFT JOIN; it is slow
Use a temp table (or in-memory table) to store the data from both tables, and then just filter the data

Related

MS Access with MySQL backend query with two JOINs results in excessive queries

I have 2 tables, Customers and Countries, like so:
Customers:
+----+------+-----+---------------+----------------+
| ID | Name | ... | OfficeCountry | BillingCountry |
+----+------+-----+---------------+----------------+
| 1  | Bill | ... | 1             | 1              |
| 2  | Joe  | ... | 2             | 1              |
+----+------+-----+---------------+----------------+
Countries:
+----+-------------+
| ID | Name |
+----+-------------+
| 1 | USA |
| 2 | Netherlands |
+----+-------------+
(I stripped some columns from the Customers table to keep only the columns relevant to this question.)
The purpose of those two columns is that the country for billing and the physical office location could be different. We also have more address information in this table, but it is stripped for this example.
I JOIN these two tables with a query resulting in something like this:
SELECT
ID,
some,
fields,
Countries_1.Name AS OfficeCountryName,
Countries_2.Name AS BillingCountryName
FROM
Customers
LEFT JOIN
Countries AS Countries_1
ON
Customers.OfficeCountry = Countries_1.ID
LEFT JOIN
Countries AS Countries_2
ON
Customers.BillingCountry = Countries_2.ID
The application we are using is a MS Access front end with a MySQL back-end. This is done with ODBC.
The Customers table contains roughly 15,000 records.
The problem is that the application has a bad performance. I enabled the query log, and I can see the following queries being executed when I am loading the data (from a DynaSet) into a form:
The Query as written above
An extra query, with an OUTER JOIN, written in the old legacy {oj ...} syntax
30,000 queries (2× the row count of Customers) against the Countries table. This exact query: SELECT ID FROM Countries WHERE ID = 2 (or ID = 1, depending on the Customer).
The last two queries amaze me.
First, WHERE is the OUTER JOIN query coming from? I never specified any OUTER JOIN in Access. Also, the old legacy {oj ...} syntax gives me the feeling that something's up. And this query is not needed: I do not use its data in the Access front end, and I don't know where it's coming from.
Second, WHY is Access querying the Countries table for every record? The data isn't needed, and it's not helpful either. It's only SELECTing the ID, which it already knows (as seen in the WHERE clause).
As you can imagine, 30,000 queries is greatly slowing down the performance.
I know that it's not good practice to load 15,000 records into one form (with navigation controls and such), but it's a very old legacy application and a lot of work to re-write.
EDIT
I see now that even for very simple queries, on a purpose-built, very clean form, it generates a few queries:
A query that selects all IDs (for the Customer table, and twice for the JOINed table)
A query that selects all necessary fields PER RECORD RETURNED from query 1, FOR EVERY TABLE. So a SELECT FROM Customers WHERE Id = record_currently_viewed
Access queries with multiple LEFT JOINs on linked ODBC tables are notorious for bad performance. The MySQL ODBC driver may be even worse than the SQL Server driver, if it fails on such a simple query.
How to fix:
1- In this particular case, there really should be no reason to use LEFT JOIN at all. How can there be country FKs without matching countries?
You should fix any orphaned FKs and create relationships with referential integrity, then use INNER JOIN, and the problem will be gone.
2- If this isn't possible, move the processing to the server. Create a View that does the joining, link it in Access, and base the form on it.
3- If you can't modify the backend DB at all, you can also use a PassThrough query. This will be read-only though.
I have created a small fiddle where you can still use LEFT JOIN, with a SQL example.
You can try to use a WITH statement (a common table expression), which should load the data into memory for further processing:
with memCountry (ID, Name) as (select ID, Name from Countries)
select cu.ID, cu.Name, c1.Name, c2.Name from Customers cu
left join memCountry c1 on cu.OfficeCountry = c1.id
left join memCountry c2 on cu.BillingCountry = c2.id
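A runnable sketch of that two-alias join shape (SQLite here, purely to show the query's structure with the question's sample rows; in the CTE above, the two aliases would point at memCountry instead of Countries):

```python
import sqlite3

# SQLite stand-in for the MySQL tables, using the question's sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT, OfficeCountry INTEGER, BillingCountry INTEGER)")
cur.execute("CREATE TABLE Countries (ID INTEGER PRIMARY KEY, Name TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)",
                [(1, "Bill", 1, 1), (2, "Joe", 2, 1)])
cur.executemany("INSERT INTO Countries VALUES (?, ?)",
                [(1, "USA"), (2, "Netherlands")])
# The same lookup table is joined twice under two aliases, once per FK column.
rows = cur.execute("""
    SELECT cu.Name, c1.Name AS OfficeCountryName, c2.Name AS BillingCountryName
    FROM Customers cu
    LEFT JOIN Countries c1 ON cu.OfficeCountry = c1.ID
    LEFT JOIN Countries c2 ON cu.BillingCountry = c2.ID
    ORDER BY cu.ID
""").fetchall()
```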

Mysql two ways to select where. Which way uses less resources and is faster?

For example, take a URL like domain.com/transport/cars.
Based on the URL, I want to select from MySQL and show a list of ads for cars.
I want to choose the fastest method (the method that takes less time to show results and uses fewer resources).
Comparing 2 ways
First way
A MySQL table transport with rows like:
FirstLevSubcat | Text
---------------|------------------
1              | Text1 car
2              | Text1xx lorry
1              | Text another car
FirstLevSubcat's type is int.
Then another MySQL table, subcategories:
Id | NameOfSubcat
---|-------------
1  | cars
2  | lorries
3  | dogs
4  | flats
Query like
SELECT Text, AndSoOn FROM transport
WHERE
FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'cars')
Or, instead of SELECT Id FROM subcategories, get the Id from an XML file or from a PHP array.
Second way
A MySQL table transport with rows like:
FirstLevSubcat | Text
---------------|------------------
cars           | Text1 car
lorries        | Text1xx lorry
cars           | Text another car
FirstLevSubcat's type is varchar or char.
And query simply
SELECT Text, AndSoOn FROM transport
WHERE FirstLevSubcat = 'cars'
Please advise which way would use fewer resources and take less time to show results. I read that SELECTing WHERE on an int is better than on a varchar (SQL SELECT speed int vs varchar).
So, as I understand it, the first way would be better?
The first design is much better, because you separate two facts in your data:
There is a category 'cars'.
'Text1 car' is in the Category 'cars'.
Imagine that in your second design you enter another car, but type 'cors' instead of 'cars'. The DBMS doesn't see this, and so you have created another category with a single entry. (Well, in MySQL you could use an enum column instead to circumvent this issue, but this is not available in most other DBMSs. And anyhow, whenever you want to rename your category, say from 'cars' to 'vans', you would have to change all existing records plus alter the table, instead of simply renaming the entry once in the subcategories table.)
So stay away from your second design.
As to Praveen Prasannan's comment on sub queries and joins: That is nonsense. Your query is straight forward and good. You want to select from transport where the category is the desired one. Perfect. There are two groups of persons who would prefer a join here:
Beginners who simply don't know better and always join from the start and try to sort things out in the end.
Experienced programmers who know that some dbms often handle joins better than sub-queries. But this is a pessimistic habit. Better write your queries such that they are easy to read and maintain, as you are already doing, and only change this in case grave performance issues occur.
Yup. As the SO link in your question suggests, int comparison is faster than character comparison and yields faster fetches. Keeping this in mind, the first design would be considered the better design. However, subqueries are never recommended; use a join instead.
eg:
SELECT t.Text, t.AndSoOn FROM transport t
INNER JOIN subcategories s ON s.ID = t.FirstLevSubcat
WHERE s.NameOfSubcat = 'cars'

Efficiency of Query to Select Records based on Related Records in Composite Table

Setup
I am creating an event listing where users can narrow down results by several filters. Rather than having a table for each filter (e.g. event_category, event_price), I have the following database structure (to make it easy/flexible to add more filters later):
event
event_id | title | description | [etc...]
---------|-------|-------------|---------
filter
filter_id | name     | slug
----------|----------|---------
1         | Category | category
2         | Price    | price
filter_item
filter_item_id | filter_id | name       | slug
---------------|-----------|------------|-----------
1              | 1         | Music      | music
2              | 1         | Restaurant | restaurant
3              | 2         | High       | high
4              | 2         | Low        | low
event_filter_item
event_id | filter_item_id
---------|---------------
1        | 1
1        | 4
2        | 1
2        | 3
Goal
I want to query the database and apply the filters that users specify. For example, if a user searches for events in 'Music' (category) priced 'Low' (price) then only one event will show (with event_id = 1).
The URL would look something like:
www.site.com/events?category=music&price=low
So I need to query the database with the filter 'slugs' I receive from the URL.
This is the query I have written to make this work:
SELECT ev.* FROM event ev
WHERE
EXISTS (SELECT * FROM event_filter_item efi
JOIN filter_item fi on fi.filter_item_id = efi.filter_item_id
JOIN filter f on f.filter_id = fi.filter_id
WHERE efi.event_id = ev.event_id AND f.slug = 'category' AND fi.slug ='music')
AND EXISTS (SELECT * FROM event_filter_item efi
JOIN filter_item fi on fi.filter_item_id = efi.filter_item_id
JOIN filter f on f.filter_id = fi.filter_id
WHERE efi.event_id = ev.event_id AND f.slug = 'price' AND fi.slug = 'low')
This query is currently hardcoded but would be dynamically generated in PHP based on what filters and slugs are present in the URL.
And the big question...
Is this a reasonable way to go about this? Does anyone see a problem with having multiple EXISTS() with sub-queries, and those subqueries performing several joins? This query is extremely quick with only a couple records in the database, but what about when there are thousands or tens of thousands?
Any guidance is really appreciated!
Best,
Chris
While EXISTS is just a form of JOIN, the MySQL query optimizer is notoriously "stupid" about executing it optimally. In your case, it will probably do a full table scan on the outer table, then execute the correlated subquery for each row, which is bound to scale badly. People often rewrite EXISTS as an explicit JOIN for that reason. Or just use a smarter DBMS.
In addition to that, consider using a composite PK for filter_item, where FK is at the leading edge - InnoDB tables are clustered and you'd want to group items belonging to the same filter physically close together.
BTW, tens of thousands is not a "large" number of rows - to truly test the scalability use tens of millions or more.
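As a sketch of the JOIN rewrite suggested above (SQLite; the filter table is dropped for brevity and slugs are assumed unique across filters, which may not hold in the real schema, where the extra join to filter disambiguates them):

```python
import sqlite3

# SQLite stand-in using the question's sample data, minus the filter table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE event (event_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE filter_item (filter_item_id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE event_filter_item (event_id INTEGER, filter_item_id INTEGER);
    INSERT INTO event VALUES (1, 'Cheap gig'), (2, 'Fancy concert');
    INSERT INTO filter_item VALUES (1, 'music'), (2, 'restaurant'), (3, 'high'), (4, 'low');
    INSERT INTO event_filter_item VALUES (1, 1), (1, 4), (2, 1), (2, 3);
""")
# One pass over the link table: keep events matching ALL requested slugs,
# by counting how many distinct requested slugs each event hit.
rows = cur.execute("""
    SELECT ev.event_id
    FROM event ev
    JOIN event_filter_item efi ON efi.event_id = ev.event_id
    JOIN filter_item fi ON fi.filter_item_id = efi.filter_item_id
    WHERE fi.slug IN ('music', 'low')
    GROUP BY ev.event_id
    HAVING COUNT(DISTINCT fi.slug) = 2
""").fetchall()
```

Only event 1 (music + low) survives; event 2 matches music alone and is filtered out by the HAVING clause.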

Soccer SQL Query Home- and Roadteam Issue

In a soccer environment I want to display the current standings. Meaning: points and goals per team. The relevant tables look similar to the following (simplified).
Match Table
uid (PK) | hometeamid | roadteamid
---------|------------|-----------
Result Table
uid (PK) | matchid | hometeamscore | roadteamscore | resulttype (45min, 90min, ...)
---------|---------|---------------|---------------|-------------------------------
Team Table
uid (PK) | name | shortname | icon
---------|------|-----------|-----
Now I can't get my head around how to write the standings in one query. What I managed was to write a query which returns the "homegames" standings only. I guess that's the easy part. Anyway, here is how it looks:
SELECT ht.name,
Count(*) As matches,
SUM(res.hometeamscore) AS goals,
SUM(res.roadteamscore) AS opponentgoals,
SUM(res.hometeamscore - res.roadteamscore) AS goalDifference,
SUM(res.hometeamscore > res.roadteamscore) * 3 + SUM(res.hometeamscore = res.roadteamscore) As Points
FROM league_league l
JOIN league_gameday gd
ON gd.leagueid = l.uid
JOIN league_match m
ON m.gamedayid = gd.uid
JOIN league_result res
ON res.matchid = m.uid
AND res.resulttype = 2
JOIN league_team ht
ON m.hometeamid = ht.uid
Where l.uid = 1
Group By ht.uid
Order By points DESC, goalDifference DESC
Any idea how to modify this, that it will return home- and roadgames would be big time appreciated.
Many thanks,
Robin
Create views. If your data does not change often and you need performance, create one or more pre-computed tables.
Views in MySQL are just pseudo-tables that are dynamically computed from a SELECT query. Using the SQL in your question, you can create a view of the teams' results at home: CREATE VIEW homegames AS SELECT ...
Then do the same for road games. Then it will be easy to synthesize both views in a third one (you just need to sum up the columns).
Views have at least one flaw: they are slow. A view built on views is like using complex subqueries, and MySQL is quite bad at this. I don't think it's a problem for you as you're probably dealing with hundreds of games at most. But if you find these views to be too slow to query, and provided you don't use any kind of cache that could mitigate this, then use simple tables instead of views. Of course, you'll need to keep them in sync. You can TRUNCATE and INSERT INTO homegames SELECT ... each time you have a new game, or you can be smarter and just UPDATE the tables. Both are right, depending on your needs.
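The "synthesize both views" step can also be done in a single query by UNION ALLing a home leg and a road leg into one derived table. A SQLite sketch with made-up scores (the question's league/gameday tables and resulttype column are omitted for brevity):

```python
import sqlite3

# SQLite stand-in: two teams, one home win for Reds, one draw.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE league_match (uid INTEGER PRIMARY KEY, hometeamid INTEGER, roadteamid INTEGER);
    CREATE TABLE league_result (uid INTEGER PRIMARY KEY, matchid INTEGER, hometeamscore INTEGER, roadteamscore INTEGER);
    CREATE TABLE league_team (uid INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO league_team VALUES (1, 'Reds'), (2, 'Blues');
    INSERT INTO league_match VALUES (1, 1, 2), (2, 2, 1);
    INSERT INTO league_result VALUES (1, 1, 2, 1), (2, 2, 0, 0);
""")
# Each match contributes two rows to the derived table g: one from the home
# team's perspective (gf = goals for, ga = goals against) and one from the
# road team's, so a single GROUP BY covers both legs.
rows = cur.execute("""
    SELECT t.name,
           COUNT(*) AS matches,
           SUM(g.gf) AS goals,
           SUM(g.ga) AS opponentgoals,
           SUM(g.gf > g.ga) * 3 + SUM(g.gf = g.ga) AS points
    FROM (
        SELECT m.hometeamid AS teamid, r.hometeamscore AS gf, r.roadteamscore AS ga
        FROM league_match m JOIN league_result r ON r.matchid = m.uid
        UNION ALL
        SELECT m.roadteamid, r.roadteamscore, r.hometeamscore
        FROM league_match m JOIN league_result r ON r.matchid = m.uid
    ) g
    JOIN league_team t ON t.uid = g.teamid
    GROUP BY g.teamid
    ORDER BY points DESC
""").fetchall()
```

Reds finish on 4 points (a win plus a draw), Blues on 1; the same shape should work in MySQL with the question's extra joins and WHERE clause restored.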
Could you not abstract this out into a stored procedure or stored function to call rather than constructing such a big-ass complicated query?

Showing all duplicates, side by side, in MySQL

I have a table like so:
Table eventlog
user | user_group | event_date | event_dur
-----|------------|------------|----------
xyz  | 1          | 2009-1-1   | 3.5
xyz  | 2          | 2009-1-1   | 4.5
abc  | 2          | 2009-1-2   | 5
abc  | 1          | 2009-1-2   | 5
Notice that in the above sample data, the only things reliable are the date and the user. Through an oversight that is 90% mine to blame, I have managed to allow users to duplicate their daily entries. In some instances the duplicates were intended to be updates to their duration; in others, an attempt to change the user_group they were working with that day; and in other cases, both.
Fortunately, I have a fairly strong idea (since this is an update to an older system) of which records are correct. (Basically, this all happened as an attempt to seamlessly merge the old DB with the new DB).
Unfortunately, I have to more or less do this by hand, or risk losing data that only exists on one side and not the other....
Long story short, I'm trying to figure out the right MySQL query to return all records that have more than one entry for a user on any given date. I have been struggling with GROUP BY and HAVING, but the best I can get is a list of one of the two duplicates, per duplicate, which would be great if I knew for sure it was the wrong one.
Here is the closest I've come:
SELECT *
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(user) > 1
ORDER BY event_date, user
Any help with this would be extremely useful. If need be, I have the list of users/date for each set of duplicates, so I can go by hand and remove all 400 of them, but I'd much rather see them all at once.
Thanks!
Would this work?
SELECT event_date, user
FROM eventlog
GROUP BY event_date, user
HAVING COUNT(*) > 1
ORDER BY event_date, user
What's throwing me off is the COUNT(user) clause you have.
You can list all the field values of the duplicates with GROUP_CONCAT function, but you still get one row for each set.
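For example, GROUP_CONCAT collapses each duplicate set into a single row (SQLite sketch with two of the question's rows; note the concatenation order is not guaranteed without an explicit ORDER BY inside the aggregate):

```python
import sqlite3

# SQLite stand-in with one duplicated (user, date) pair.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE eventlog (user TEXT, user_group INTEGER, event_date TEXT, event_dur REAL)")
cur.executemany("INSERT INTO eventlog VALUES (?, ?, ?, ?)",
                [("xyz", 1, "2009-1-1", 3.5),
                 ("xyz", 2, "2009-1-1", 4.5)])
# One row per duplicate set, with the conflicting values strung together.
rows = cur.execute("""
    SELECT user, event_date,
           GROUP_CONCAT(user_group) AS groups,
           GROUP_CONCAT(event_dur)  AS durations
    FROM eventlog
    GROUP BY event_date, user
    HAVING COUNT(*) > 1
""").fetchall()
```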
I think this would work (untested)
SELECT *
FROM eventlog e1
WHERE 1 <
(
SELECT COUNT(*)
FROM eventlog e2
WHERE e1.event_date = e2.event_date
AND e1.user = e2.user
)
-- AND [maybe an additionnal constraint to find the bad duplicate]
ORDER BY event_date, user;
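Run against a small SQLite copy of the table, the correlated subquery does return every duplicated row side by side, while a day with only a single entry (the extra "solo" row below) is left out:

```python
import sqlite3

# SQLite stand-in: the question's four duplicate rows plus one clean row.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE eventlog (user TEXT, user_group INTEGER, event_date TEXT, event_dur REAL)")
cur.executemany("INSERT INTO eventlog VALUES (?, ?, ?, ?)",
                [("xyz", 1, "2009-01-01", 3.5),
                 ("xyz", 2, "2009-01-01", 4.5),
                 ("abc", 2, "2009-01-02", 5.0),
                 ("abc", 1, "2009-01-02", 5.0),
                 ("solo", 1, "2009-01-03", 1.0)])
# Keep every row whose (user, event_date) pair occurs more than once.
rows = cur.execute("""
    SELECT e1.user, e1.user_group, e1.event_date, e1.event_dur
    FROM eventlog e1
    WHERE 1 < (SELECT COUNT(*) FROM eventlog e2
               WHERE e1.event_date = e2.event_date AND e1.user = e2.user)
    ORDER BY e1.event_date, e1.user, e1.user_group
""").fetchall()
```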