How to query data in a fast way after table splitting? - MySQL

I have a MySQL table with about 1000 million records. It is very slow when I run a query.
So I split this table by ID into 10 sub-tables with the same structure:
table_1(1-10000000)
table_2(10000001-20000000)
table_3(20000001-30000000)
......
But how can I query data in a fast way after table splitting?
When I query a user like this: SELECT name FROM table WHERE name='mark', I don't know which table to go to, because I can't derive the ID range from the name.
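The only workaround I can see is to search every sub-table, something like this (a sketch using the sub-table names above):

SELECT name FROM table_1 WHERE name = 'mark'
UNION ALL
SELECT name FROM table_2 WHERE name = 'mark'
-- ... and so on for table_3 through table_9 ...
UNION ALL
SELECT name FROM table_10 WHERE name = 'mark';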

Splitting the table this way is not the right approach at all, given your example query. You have actually created more issues than you have solved.
Let's get back to the big table:
Step 1 is to see why it is slow: run EXPLAIN on the query and post the output to get an overview.
Step 2 is to see whether you can improve that query (a sketch of both steps follows below). Claims like "indexes are not a good solution" can be true, but if so, please provide measurements showing it.
Step 3 is to think outside the box. You are running queries against a very big table that constantly receives inserts. Consider an index designed specifically for searching, for example indexing the searchable fields with Solr.
Eventually you might even reach the hardware limit: it just can't get faster on this hardware. But first follow the steps above and add the right information, concrete measurements, and specifications, so you can get more complete help with your case.
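A minimal sketch of steps 1 and 2, assuming the original unsplit table is called big_table (the name is an assumption) and the slow lookup is the name query from the question:

-- Step 1: get the execution plan.
EXPLAIN SELECT name FROM big_table WHERE name = 'mark';

-- Step 2: if the plan shows a full scan (type: ALL), an index on the
-- searched column is the usual first improvement.
CREATE INDEX idx_name ON big_table (name);

-- Re-run the EXPLAIN; the plan should now show an index lookup (type: ref).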


Group by, Order by and Count MySQL performance

I have the following query to get the 15 most-sold plates in a place:
This query takes 12 seconds to execute over 100,000 rows. That seems far too long, so I am looking for a way to optimize the query.
I ran the EXPLAIN command in phpMyAdmin and got this:
[screenshot of the EXPLAIN output]
According to this, the main problem is the p table, which is being scanned in full. How can I fix this? The id of the p table is a primary key; do I need to add a separate index on it as well? Also, is there anything else I can do to make the query run faster?
You can create a relationship between the two tables:
https://database.guide/how-to-create-a-relationship-in-mysql-workbench/
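A rough sketch of what that relationship could look like; the table and column names here are assumptions, since the original query is not shown. In MySQL/InnoDB a foreign key requires an index on the referencing column (and creates one if it is missing), which is what actually helps the join:

-- Hypothetical names: p is the plates table, order_items references it.
ALTER TABLE order_items
  ADD CONSTRAINT fk_order_items_p
  FOREIGN KEY (p_id) REFERENCES p (id);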
Besides this, you can also use a LEFT JOIN so you won't load the whole right-hand table in.
ORDER BY is a slow operation in MySQL; if you are processing the results in code afterwards, you can sort there instead, which is much faster than ORDER BY.
I hope this helped; community, feel free to edit :)
You did include the EXPLAIN plan, but you did not give any information about your table structure, data distribution, cardinality, or volumes. Assuming your indices are accurate and you have an even data distribution, the query has to process over 12 million rows, not 100,000. Even then that is relatively poor performance, but you never told us what hardware this sits on, nor the background load.
A query with so many joins is always going to be slow. Are they all needed?
"the main problem is on the p table which is scanning the entire table"
Full table scans are not automatically bad. Dereferencing an index lookup costs roughly 20 times more than a streaming read. Since the only constraints you apply to this table are its joins to other tables, there is nothing in the question as asked to suggest there is much scope for improving this.

Join 10 tables on a single join id called session_id that's stored in session table. Is this good/bad practice?

There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers, or even manually (see the sketch at the end of this answer).
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
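A minimal sketch of the trigger-maintained "materialized view" mentioned above; all names are invented for illustration, and matching UPDATE/DELETE triggers would be needed to keep it consistent:

-- Pre-joined rows stored in a plain table.
CREATE TABLE session_flat (
  session_id INT PRIMARY KEY,
  page_count INT NOT NULL DEFAULT 0
);

-- Keep it up to date as sessions are created.
CREATE TRIGGER session_flat_ai
AFTER INSERT ON session
FOR EACH ROW
  INSERT INTO session_flat (session_id) VALUES (NEW.id);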
We need the query with its 11 joins and the EXPLAIN output posted in the original question when it is available, please. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname, to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality involved for each indexed column.
Of course more joins kill performance.
But it depends! If your data model is like that, then you can't help yourself here unless a complete re-design of the data model happens.
1) Is it an online (real-time transaction) DB or an offline DB (data warehouse)?
If online, it is better to maintain a single table: keep the data in one table and let the columns grow in number.
If offline, it is better to maintain separate tables, because you will not always need all the columns.

MySQL Query Caching (2)

This is not a problem as such; it is about site optimization. I have 110K hotel records. When I run a SELECT query, it pulls data from all 110K records.
Say I search for hotels with more than a 3-star rating, a price between $100 and $300, and located in Mexico City, and I get 45 matching results.
Is there any way that, when I add more refinements, the data is pulled from only those 45 matches instead of all 110K records?
The key is indexes, my friend... make sure you have indexes on all columns used in the WHERE clause; this will reduce the number of rows examined when selecting...
On a side note... 110K rows is still an extremely small data set for MySQL, so it shouldn't pose much of a performance issue even if you haven't got the correct indexing on the table.
It depends more on how often your data updates.
See:
The MySQL Query Cache
Query Caching in MySQL
Caching question MySQL or Filesystem
"Is there any other way that, when I add more refinement, it will pull data from just the 45 matches and not go through the 110K records?"
Then make a view of those 45 rows and run your further queries against it.
Create the view with a query:
CREATE VIEW refined AS SELECT * FROM ...
and after that run more SELECT queries against that view, like:
SELECT * FROM refined WHERE ...
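Putting that together with the filters from the question (a sketch; the hotels table and its column names are assumptions):

CREATE VIEW refined AS
SELECT * FROM hotels
WHERE rating > 3
  AND price BETWEEN 100 AND 300
  AND city = 'Mexico City';

SELECT * FROM refined WHERE name LIKE 'Grand%';

Note that a MySQL view is not materialized: queries against refined are still resolved against hotels, so the indexing discussed in the other answers still decides how fast this runs.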
First of all, I tend to agree with Brian: indexes matter.
Check what kind(s) of queries are most frequent, and construct multi-column indexes on the table accordingly. Note that the order of columns in an index matters: the index is a tree, and the first column appears at the root, so if your query does not use that column, the whole tree is useless.
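For example, if the most frequent search is the one from the question (filter by city, then rating, then price; the column names are assumptions), a matching multi-column index would be:

CREATE INDEX idx_city_rating_price ON hotels (city, rating, price);

-- Can use the index (the leading column 'city' is constrained):
--   WHERE city = 'Mexico City' AND rating > 3
-- Cannot use it (the leading column is missing):
--   WHERE rating > 3 AND price < 300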
Enable the slow query log to see which queries actually take long (if any) or do not use indexes, so you can improve your indexes over time.
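Turning the slow query log on is just a couple of server settings (a sketch for MySQL 5.1+; the values are examples, and the same settings can go in my.cnf):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;                -- log anything slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 1;  -- also log unindexed queries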
Having said this, the query cache is a real performance boost if your table data is mostly read. Here is a useful article on the MySQL query cache.

Optimizing the Joining of Multiple MySQL Views

I have multiple views in my database that I am trying to perform a JOIN on when certain queries get very complex. As a worst case I would have to join 3 views with the following stats:
View 1 has 60K+ rows with 26 fields.
View 2 has 60K+ rows with 15 fields.
View 3 has 80K+ rows with 8 fields.
Joining views 1 and 2 seems to be no problem, but any time I try to join the third view the query hangs. I'm wondering if there are any best practices I should follow to keep these queries from hanging. I've tried to use the smallest field types possible (medium/small ints where possible, etc.).
We are using MySQL 5.0.92 community edition with MyISAM tables. Not sure if InnoDB would be more efficient.
As a last resort I'm thinking of splitting the one query into two: hitting views 1 & 2 with the first query, and then view 3 separately with the second. Is there any downside to this other than making two queries?
Thanks.
You need to use EXPLAIN to understand why the performance is poor.
I wouldn't think you need to worry about MyISAM vs. InnoDB for this particular read performance just yet. MyISAM versus InnoDB
I am going to post my comments as an answer:
1) Take a look at the EXPLAIN command and see what it says.
2) Check the performance of the individual views. Are they as fast as you think on their own?
3) For the columns you are using in your WHERE or JOIN clauses, do the underlying tables have indexes that apply to them? Something to keep in mind (see the sketch after this list):
"A composite index (an index with more than one column) with columns (a, b) would not help when you query only for b. It helps with a, and with a + b, but not with only b. That's why the single index you added improved the situation."
4) Are you using all the columns and all the views? If not, wouldn't it be simpler to look at the view definitions and write a single query instead?
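The left-prefix rule quoted in point 3 is easy to verify with EXPLAIN (a generic sketch, not your actual schema):

CREATE INDEX idx_ab ON t (a, b);

EXPLAIN SELECT * FROM t WHERE a = 1;            -- can use idx_ab
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;  -- can use idx_ab
EXPLAIN SELECT * FROM t WHERE b = 2;            -- cannot use idx_ab (no leading a)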
If it's possible to find out how the original VIEWs are defined, then using that as a basis to create your own single query might be a better approach. A while back, another person had similar issues with their query: they needed to go back to the raw table behind one such view to ensure it had proper indexes to support the optimization they were attempting. Remember, a view is a subset of something else and has no index of its own to work with. So if you can't take advantage of an index on the root table of a view, you can see exactly this kind of performance hit.

Big tables and analysis in MySQL

For my startup, I track everything myself rather than relying on Google Analytics. This is nice because I can actually have IPs and user IDs and everything.
This worked well until my tracking table grew to about 2 million rows. The table is called acts, and records:
ip
url
note
account_id
...where available.
Now, trying to do something like this:
SELECT COUNT(distinct ip)
FROM acts
JOIN users ON(users.ip = acts.ip)
WHERE acts.url LIKE '%some_marketing_page%';
Basically never finishes. I switched to this:
SELECT COUNT(distinct ip)
FROM acts
JOIN users ON(users.ip = acts.ip)
WHERE acts.note = 'some_marketing_page';
But it is still very slow, despite having an index on note.
I am obviously not a pro at MySQL. My question is:
How do companies with lots of data track things like funnel conversion rates? Is it possible to do in MySQL, and am I just missing some knowledge? If not, what books/blogs can I read about how sites do this?
While getting towards 'respectable', 2 million rows is still a relatively small size for a table, and therefore faster performance is typically possible.
As you found out, the leading wildcard is particularly inefficient, and we'll have to find a solution for it if that use case is common for your application.
It could just be that you do not have the right set of indexes. Before I proceed, however, I wish to stress that while indexes typically improve DBMS performance with SELECT statements of all kinds, they systematically have a negative effect on the performance of "CUD" operations (the SQL INSERT, UPDATE, and DELETE verbs, i.e. the queries that write to the database rather than just read from it). In some cases the negative impact of indexes on "write" queries can be very significant.
My reason for stressing the ambivalent nature of indexes is that your application appears to do a fair amount of data collection as a normal part of its operation, so you will need to watch for possible degradation as the INSERT queries get slowed down. A possible alternative is to perform the data collection into a relatively small table/database with no or very few indexes, and to regularly import the data from this input database into the database where the actual data mining takes place. (After they are imported, the rows can be deleted from the "input database", keeping it small and fast for its INSERT function.)
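A sketch of that collection/reporting split; the table names and the id batching column are invented for illustration:

-- acts_staging: few or no secondary indexes, so INSERTs stay fast.
-- acts_reporting: fully indexed for the mining queries.
SET @last_id = (SELECT MAX(id) FROM acts_staging);

INSERT INTO acts_reporting
SELECT * FROM acts_staging WHERE id <= @last_id;

DELETE FROM acts_staging WHERE id <= @last_id;  -- keeps the input table small and fast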
Another concern/question is the width of a row in the acts table (the number of columns and the sum of the widths of these columns). Bad performance could be tied to rows that are too wide, resulting in too few rows per leaf node and hence a deeper-than-needed tree structure.
Back to the indexes...
In view of the few queries in the question, it appears that you could benefit from an ip + note index (an index made with at least these two keys, in this order). A full analysis of the index situation, and frankly a possible review of the database schema, cannot be done here (not enough info for one...), but the general process is to list the most common use cases and see which database indexes could help with them. One can gather insight into how particular queries are handled, initially or after index(es) are added, with the MySQL EXPLAIN command.
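In concrete terms, the suggested index and the check would look like this (a sketch):

CREATE INDEX idx_ip_note ON acts (ip, note);

EXPLAIN SELECT COUNT(DISTINCT ip)
FROM acts
JOIN users ON (users.ip = acts.ip)
WHERE acts.note = 'some_marketing_page';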
Normalization or denormalization (or indeed a combination of both!) is often a viable idea for improving performance during mining operations as well.
Why the JOIN? If we can assume that no IP makes it into acts without an associated record in users then you don't need the join:
SELECT COUNT(distinct ip) FROM acts
WHERE acts.url LIKE '%some_marketing_page%';
If you really do need the JOIN, it might pay to first select the distinct IPs from acts and then JOIN those results to users; you'll have to look at the execution plan and experiment to see if this is faster.
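That rewrite might look like this (a sketch to compare against the original with EXPLAIN):

SELECT COUNT(DISTINCT a.ip)
FROM (SELECT DISTINCT ip
      FROM acts
      WHERE note = 'some_marketing_page') AS a
JOIN users ON (users.ip = a.ip);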
Secondly, that LIKE with a leading wildcard is going to cause a full table scan of acts and also necessitate some expensive text searching. You have three choices to improve this (sketches of options 2 and 3 follow the list):
Decompose the url into component parts before you store it so that the search matches a column value exactly.
Require the search term to appear at the beginning of the url field, not in the middle.
Investigate a full text search engine that will index the url field in such a way that even an internal LIKE search can be performed against indexes.
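Sketches of options 2 and 3; for option 3, MySQL's own FULLTEXT index (which in this era of MySQL requires a MyISAM table — an assumption about your setup) stands in for a full text search engine:

-- Option 2: anchor the term at the start of url so a normal index applies
-- (a prefix index may be needed if url is long).
CREATE INDEX idx_url ON acts (url);
SELECT COUNT(DISTINCT ip) FROM acts
WHERE url LIKE 'some_marketing_page%';  -- no leading wildcard: index range scan

-- Option 3: a full-text index can find the term anywhere in the field.
-- (The parser splits the URL on punctuation such as '/', so the term
-- must appear as a whole "word".)
ALTER TABLE acts ADD FULLTEXT INDEX ft_url (url);
SELECT COUNT(DISTINCT ip) FROM acts
WHERE MATCH(url) AGAINST ('some_marketing_page');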
Finally, in the case of searching on acts.note: if an index on note doesn't provide sufficient improvement, I'd consider calculating and storing an integer hash of note and searching on that.
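The hash idea as a sketch; CRC32() is a built-in MySQL function, and the final equality check guards against hash collisions:

ALTER TABLE acts
  ADD COLUMN note_hash INT UNSIGNED,
  ADD INDEX idx_note_hash (note_hash);

UPDATE acts SET note_hash = CRC32(note);

SELECT COUNT(DISTINCT ip) FROM acts
WHERE note_hash = CRC32('some_marketing_page')
  AND note = 'some_marketing_page';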
Try running EXPLAIN on your query and look to see whether there are any table scans.
Should this be a LEFT JOIN?
Maybe this site can help.