I have been working with PHP and MySQL for a year. I have come across some PHP code where the programmer builds a query inside a for or foreach loop, so the generated query ends up like this (it takes 65.134 seconds):
SELECT * from tbl_name where PNumber IN('p1','p2'.......,'p998','p999','p1000')
In my mind I believe that query is horrible, but the people who were working here before me say there is no other way around it and we have to deal with it. Then I thought it must be wrong database design, but I cannot come up with a better solution. So any suggestions or opinions are welcome.
A couple of things to add: the column is indexed, and the table has almost 3 million records. I gave the example as p1, p2, etc., but those are originally phone numbers like 3371234567 and 5028129456. The worst part, I feel, is that this column is VARCHAR instead of INT or BIGINT, which makes the comparison even worse. My question is: can we call this a good query, or is it wrong to generalize and does it depend on the requirement?
I'm posting this in the hope that we can decrease the turnaround time by providing an example.
Check with your developers and see how they are producing the SQL command.
All those numbers must be coming from somewhere. Ask them where.
If the numbers are coming from a table, then we should simply JOIN those two tables.
EXAMPLE:
Granting that the phone numbers are stored in a table named PHONE_NUMBERS under a column named Phone -- and using your example tbl_name, to which we match the column PNumber:
SELECT t1.*
FROM tbl_name AS t1
INNER JOIN PHONE_NUMBERS AS t2
ON t1.PNumber = t2.Phone
However, even as an example, this is not enough. Like #George said, you'll have to give us more information about the data structure and the data source of the phone numbers. In fact, depending on the data you show us and the results you need, your SQL query might need to keep using an IN clause instead of an INNER JOIN.
Please give us more information...
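If it turns out the numbers only exist in application code (a PHP array, for instance), a common middle ground is to bulk-insert them into a temporary table once and join against it. A minimal sketch, assuming a throwaway table name of my own invention:
CREATE TEMPORARY TABLE wanted_numbers (PNumber VARCHAR(20) PRIMARY KEY);
-- one batched INSERT built from the PHP array (values shown are from your examples)
INSERT INTO wanted_numbers VALUES ('3371234567'), ('5028129456');
SELECT t1.*
FROM tbl_name AS t1
INNER JOIN wanted_numbers AS w
ON t1.PNumber = w.PNumber;
This keeps the statement size constant no matter how many numbers you have, and the PRIMARY KEY gives the join an index to work with.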
What would help us help you is the result of EXPLAIN on your query, i.e.:
explain SELECT * from tbl_name where PNumber IN('p1','p2'.......,'p998','p999','p1000');
This will give us some information on what the database is trying to do.
Related
I have a MySQL (MariaDB) database with numerous tables, and all the tables have the same structure.
For the sake of simplicity, let's assume the structure is as below.
UserID - Varchar (primary)
Email - Varchar (indexed)
Is it possible to query all the tables together for the Email field?
Edit: I have not finalized the db design yet; I could put all the data in a single table. But I am afraid that a large table will slow down operations, and if it crashes, it will be painful to restore. Thoughts?
I have read some answers that suggested dumping all data together in a temporary table, but that is not an option for me.
Mysql workbench or PHPMyAdmin is not useful either, I am looking for a SQL query, not a frontend search technique.
There's no concise way in SQL to say this sort of thing:
SELECT a,b,c FROM <<<all tables>>> WHERE b LIKE 'whatever%'
If you know all your table names in advance, you can write a query like this.
SELECT a,b,c FROM table1 WHERE b LIKE 'whatever%'
UNION ALL
SELECT a,b,c FROM table2 WHERE b LIKE 'whatever%'
UNION ALL
SELECT a,b,c FROM table3 WHERE b LIKE 'whatever%'
UNION ALL
SELECT a,b,c FROM table4 WHERE b LIKE 'whatever%'
...
Or you can create a view like this.
CREATE VIEW everything AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
UNION ALL
SELECT * FROM table3
UNION ALL
SELECT * FROM table4
...
Then use
SELECT a,b,c FROM everything WHERE b LIKE 'whatever%'
If you don't know the names of all the tables in advance, you can retrieve them from MySQL's information_schema and write a program to create a query like one of my suggestions. If you decide to do that and need help, please ask another question.
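For instance, here is a minimal sketch of generating that UNION query inside MySQL itself; it assumes all the lookalike tables live in the current schema, and you would feed the resulting string to a prepared statement (or back to your application):
SELECT GROUP_CONCAT(
         CONCAT('SELECT a,b,c FROM `', table_name, '` WHERE b LIKE ''whatever%''')
         SEPARATOR ' UNION ALL ')
FROM information_schema.tables
WHERE table_schema = DATABASE();
Mind group_concat_max_len: with many tables the generated string can exceed its default limit and be silently truncated.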
These sorts of queries will, unfortunately, always be significantly slower than querying just one table. Why? MySQL must repeat the overhead of running the query on each table, and a single index is faster to use than multiple indexes on different tables.
Pro tip: Try to design your databases so you don't add tables when you add users (or customers or whatever).
Edit: You may be tempted to use multiple tables for query-performance reasons. With respect, please don't do that. Correct indexing will almost always give you better query performance than searching multiple tables. For what it's worth, a "huge" table for MySQL, one which challenges its capabilities, usually has at least a hundred million rows. Truly. Hundreds of thousands of rows are in its performance sweet spot, as long as they're indexed correctly. Here's a good reference about that, one of many: https://use-the-index-luke.com/
Another reason to avoid a design where you routinely create new tables in production: it's a pain in the neck to maintain and optimize databases with large numbers of tables. Six months from now, as your database scales up, you'll almost certainly need to add indexes to help speed up some slow queries. If you have to add many indexes, you, or your successor, won't like it.
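To make that concrete: with the single-table design, adding such an index is one statement (the table name here is assumed; Email is the column from your structure):
ALTER TABLE users ADD INDEX idx_email (Email);
With hundreds of per-user tables, you would have to repeat that for every single one.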
You may also be tempted to use multiple tables to make your database more resilient to crashes. With respect, it doesn't work that way. Crashes are rare, and catastrophic unrecoverable crashes are vanishingly rare on reliable hardware. And crashes can corrupt multiple tables. (Crash resilience comes from decent backups.)
Keep in mind that MySQL has been in development for over a quarter-century (as have the other RDBMSs). Thousands of programmer years have gone into making it fast and resilient. You may as well leverage all that work, because you can't outsmart it. I know this because I've tried and failed.
Keep your database simple. Spend your time (your only irreplaceable asset) making your application excellent so you actually get millions of users.
There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually); a sketch follows at the end of this answer.
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
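Here is the trigger sketch mentioned above, kept deliberately minimal; every table and column name in it is hypothetical, since we don't know your schema:
CREATE TABLE session_flat (
  session_id INT NOT NULL PRIMARY KEY,
  user_name  VARCHAR(100)
);
CREATE TRIGGER sessions_ai AFTER INSERT ON sessions
FOR EACH ROW
  INSERT INTO session_flat (session_id, user_name)
  SELECT NEW.id, u.name
  FROM users u
  WHERE u.id = NEW.user_id;
You would add similar triggers for UPDATE and DELETE, which is exactly the maintenance cost you are trading for faster reads.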
Please post the query with the 11 joins, and its EXPLAIN, in the original question when they are available. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname, to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality of each indexed column.
Of course, more joins kill performance.
But it depends! If your data model is like that, then you can't help yourself here unless a complete data-model redesign happens.
1) Is it an online (real-time transactional) DB or an offline DB (data warehouse)?
If online, it is better to maintain a single table: keep the data in one table and let the columns grow in number.
If offline, it's better to maintain separate tables, because you are not always going to need all the columns.
I have a large database on a server. The database is all about mobile numbers, with about 20 million records at present. I want to match the mobile numbers on my website to filter the DND and non-DND mobile numbers. I am using this query for filtering a small set of numbers:
SELECT phone_number
FROM mobnum_table
WHERE phone_number IN ('7710450282', '76100003451', '8910003402', '9410009850', '7610000191');
But what about when I want to filter 100,000 mobile number records in a few seconds? I have heard about SQL query optimization but don't know much about it. Also, please guide me on which storage engine I should consider in this situation.
I have already googled this, but didn't find a good answer.
Thanks in advance.
I think there is some problem in your requirement itself. If you tell us more about your problem, maybe we can help you. Anyway, it's not a good idea to put all 100,000 numbers in the IN clause. One option is to create another table and do an inner join.
Assume you have another table selectednumbers with columns id and phone_number; you can do an inner join as follows:
SELECT a.phone_number
FROM mobnum_table a
INNER JOIN selectednumbers b ON a.phone_number = b.phone_number
As I mentioned earlier, your question is not complete, so kindly provide some more information and we can suggest an optimized query.
So you're generating a list of 100,000 numbers, and then putting that back into another query?
If you're getting the numbers from a table, take the query that generated the list of numbers in the first place, put it inside the in() brackets, and you'll see a large improvement immediately.
Restructure them both to use a JOIN instead of in(), and you'll see even more.
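For illustration (borrowing the selectednumbers table name from the other answer; adjust to whatever actually produces your list):
-- the list stays inside the database:
SELECT phone_number
FROM mobnum_table
WHERE phone_number IN (SELECT phone_number FROM selectednumbers);
-- restructured as a JOIN, usually faster still:
SELECT m.phone_number
FROM mobnum_table m
INNER JOIN selectednumbers s ON s.phone_number = m.phone_number;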
OR, depending on your DB structure, just do
SELECT phone_number
FROM mobnum_table
WHERE DND = 1
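If mobnum_table doesn't carry such a flag yet, adding one could look like the following sketch (the column and index names are my invention):
ALTER TABLE mobnum_table
  ADD COLUMN DND TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX idx_dnd (DND);
You would then set the flag once per number instead of re-filtering a 100,000-element list on every request.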
I'm trying to optimize a SQL query and I am not sure if further optimization is possible.
Here's my query:
SELECT someColumns
FROM (((smaller_table))) AS a
INNER JOIN (((smaller_table))) AS b
ON a.someColumnProperty = b.someColumnProperty
...the problem with this approach is that my table has half a trillion records in it. In my query, you'll notice (((smaller_table))). I wrote that as an abbreviation for a SELECT statement being run on MY_VERY_LARGE_TABLE to reduce its size.
(((smaller_table))) appears twice, and the code within is exactly the same both times. There's no reason for me to run the same sub-query twice. This table is several TB, and I shouldn't scan through it twice just to get the same results.
Do you have any suggestions on how I can NOT run the exact same reduction twice? I tried replacing the INNER JOIN line with INNER JOIN a AS b but got an "unrecognized table a" warning. Is there any way to store the value of a so I can reuse it?
Thoughts:
Make sure there is an index on userid and dayid.
I would ask you to define better what it is you are trying to find out.
Examples:
What is the busiest time of the week?
Who are the top 25 people who come to the gym the most often?
Who are the top 25 people who utilize the gym the most? (This is different from the one above because maybe I have a user who comes 5 times a month but stays 5 hours per session, vs. a user who comes 30 times a month and stays 0.5 hours per session.)
Maybe laying out all the days horizontally (day1, day2, day3) would be visually better for finding what you are looking for. You could easily put this into Excel or LibreOffice and color the days that are populated to get a visual "picture" of people who come consecutively.
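A conditional-aggregation sketch of that horizontal layout; userid and dayid come from the suggestion above, while the visits table name is assumed:
SELECT userid,
       MAX(CASE WHEN dayid = 1 THEN 1 ELSE 0 END) AS day1,
       MAX(CASE WHEN dayid = 2 THEN 1 ELSE 0 END) AS day2,
       MAX(CASE WHEN dayid = 3 THEN 1 ELSE 0 END) AS day3
FROM visits
GROUP BY userid;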
It might be interesting to run this for multiple months to see what the seasonality looks like.
Alas, CTEs are not available in MySQL. The approximate equivalent is:
CREATE TABLE tmp (
INDEX(someColumnProperty)
)
SELECT ...;
But...
You can't use CREATE TEMPORARY TABLE because such a table can't be referenced twice in the same query. (No, I don't know why.)
Adding the INDEX (or PK or ...) during the CREATE (or afterwards) provides the very necessary key for doing the self join.
You still need to worry about DROPping the table (or otherwise dealing with it).
The choice of ENGINE for tmp depends on a number of factors. If you are sure it will be "small" and has no TEXT/BLOB, then MEMORY may be optimal.
In a Replication topology, there are additional considerations.
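Putting the pieces together, the whole pattern looks something like this; it is a sketch using the placeholder names from the question, with the WHERE clause standing in for whatever filtering produced (((smaller_table))):
CREATE TABLE tmp (
  INDEX(someColumnProperty)
) ENGINE=InnoDB
SELECT someColumns, someColumnProperty
FROM MY_VERY_LARGE_TABLE
WHERE ...;

SELECT a.someColumns
FROM tmp AS a
INNER JOIN tmp AS b
ON a.someColumnProperty = b.someColumnProperty;

DROP TABLE tmp;  -- see the cleanup caveat above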
Consider this as much a theoretical question as a practical one.
One has a table with, say, 1,000,000+ user records and needs to pull data for, say, 50,000 of them from that table, using user_id only. How would you expect IN to behave? If not well, is it the only option, or is there anything else one could try?
You could insert your search values into a single-column temporary table and join on that. I have seen other databases do Bad Things when presented with very large IN clauses.
IN actually has pretty poor performance, so it is something I would avoid. Most of the time you can get by with a joined query, so depending on your database structure you should definitely favor a join over an IN clause.
If IN starts to prove troublesome (as other answerers have suggested it might), you could try rewriting your query using EXISTS instead.
SELECT *
FROM MYTAB
WHERE MYKEY IN (SELECT KEYVAL
FROM MYOTHERTAB
WHERE some condition)
could become
SELECT *
FROM MYTAB
WHERE EXISTS (SELECT *
FROM MYOTHERTAB
WHERE some condition AND
MYTAB.MYKEY = MYOTHERTAB.KEYVAL)
I have often found this speeds things up quite a bit.
Use a JOIN to select the data you need.
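For the 50,000-id case from the question, that could look like the following sketch, where both table names are assumptions (a staging table holding the wanted ids, joined to the main table):
SELECT u.*
FROM users u
INNER JOIN wanted_users w ON w.user_id = u.user_id;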