I have three tables in MySQL:
user(1K)
Campaign(6K)
CamapaignDailyUSes(70K)
If I get the data of all users with
Select User.column1, User.column2, Campaign.column1, Campaign.column2,
DailyUSes.* from User join Campaign join CamapaignDailyUSes
it returns the result in a few seconds, maybe.
But in Couchbase the equivalent N1QL takes more than 1 minute.
What should I do about it? I have even created some proper indexes.
How can I structure my Couchbase data?
Can you post (or mail to prasad.varakur#couchbase.com) the sample docs? Did you explore restructuring/embedding to avoid some JOINs? What is the exact N1QL query? Couchbase 4.5 onwards has two kinds of joins for better performance (leveraging indexes better) and more flexibility in JOINs.
See https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/from.html#story-h2-3 for more info on lookup & index joins.
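For illustration, a hedged sketch of the two flavors against the buckets from the question (the campaignId field and the exact document shapes are assumptions, not taken from your data):

-- Lookup join: the left-hand document carries the key of the right-hand one.
SELECT u.column1, c.column1
FROM `user` u JOIN Campaign c ON KEYS u.campaignId;

-- Index join (4.5+): join from the parent to the children via an index
-- on the referencing field of the child documents.
CREATE INDEX idx_daily_campaign ON CamapaignDailyUSes(campaignId);
SELECT c.*, d.*
FROM Campaign c JOIN CamapaignDailyUSes d ON KEY d.campaignId FOR c;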
And what are the sizes you specify: the size of a document, or the number of documents?
If 70K is the size of a document and you are fetching all of it, then what is the expected result size (based on selectivities)?
If the results are too big, then you may want to use parameters (in 4.5.1) such as pretty=false to minimize the network overhead.
-Prasad
Related
There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually).
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
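As a sketch of the first two points against the schema from the question (child_table, status, and total are hypothetical names; only session_id comes from the question):

-- A plain index on the join column lets each child table be probed cheaply:
CREATE INDEX idx_child_session ON child_table (session_id);

-- A covering index additionally contains every column the query reads from
-- that table, so the base rows are never touched (EXPLAIN shows "Using index"):
CREATE INDEX idx_child_session_cover ON child_table (session_id, status, total);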
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
Please post the query with 11 joins and the EXPLAIN in the original question when it is available. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality involved for each indexed column.
Of course more joins kill performance.
But it depends! If your data model is like that, then you can't help yourself here unless a complete data-model redesign happens.
1) Is it an online (real-time transaction) DB or an offline DB (data warehouse)?
If online, it is better to maintain a single table: keep the data in one table and let the column count grow.
If offline, it is better to maintain separate tables, because you are not always going to need all the columns.
I am working on a database, and it's a pretty big one with 1.3 billion rows and around 35 columns. Here is what I get after checking the status of the table:
Name:Table Name
Engine:InnoDB
Version:10
Row_format:Compact
Rows:12853961
Avg_row_length:572
Data_length:7353663488
Max_data_length:0
Index_length:5877268480
Data_free:0
Auto_increment:12933138
Create_time:41271.0312615741
Update_time:NULL
Check_time:NULL
Collation:utf8_general_ci
Checksum:NULL
Create_options:
Comment:InnoDB free: 11489280 kB
The problem I am facing is that even a single SELECT query takes too much time to process; for example, Select * from Table_Name limit 0,50000 takes around 2.48 minutes.
Is that expected?
I have to make a report in which I have to use the whole historical data, that is, all 1.3 billion rows. I could do this batch by batch, but then I would have to run the queries that take too much time again and again.
When a simple query takes so much time, I am not able to run any other complex query that needs joins and CASE statements.
A common practice, if you have a huge amount of data, is that you...
should not SELECT *: select only the columns you need
should limit your fetch range to a smaller number: I bet you won't handle 50,000 records at the same time. Try to fetch the data batch by batch, as sketched below.
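One way to batch, as a minimal sketch: it assumes the table has an auto-increment primary key (here called id), which the Auto_increment value in the status output above suggests.

-- LIMIT 0,50000 / LIMIT 50000,50000 / ... gets slower with every batch,
-- because each offset re-scans all the skipped rows. Seeking on the
-- primary key keeps every batch equally cheap:
SELECT col1, col2            -- only the columns the report needs
FROM Table_Name
WHERE id > 0                 -- replace 0 with the last id of the previous batch
ORDER BY id
LIMIT 50000;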
A common problem many database administrators face. The solution: caching.
Break the queries into simpler, smaller queries. Use Memcached or other caching techniques and tools. Memcached stores key-value pairs: check for the data in Memcached first; if it is available, use it. If not, fetch it from the database, use it, and cache it. The next time, the data will be available from the cache.
You will have to develop your own logic and change some queries. Memcached is available here:
http://memcached.org/
Many tutorials are available on the Web.
Enable the slow query log in your my.cnf for queries taking more than N seconds, then execute some queries and watch this log. It gives you some clues, and maybe you can add some indexes to this table.
Or run some queries with EXPLAIN. http://hackmysql.com/case1
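For example, a minimal my.cnf fragment (option names as of MySQL 5.1; adjust the threshold and file path to your setup):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 2     # log statements running longer than 2 seconds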
A quick note that is usually an easy win ...
If you have any columns that are large text blobs, try selecting everything except for those fields. I've seen varchar(max) fields absolutely kill query efficiency.
You have a very wide average row size and 35 columns. You could try vertically partitioning the table, that is, split the table up into smaller tables that are related to each other 1:1 with a subset of columns from the table. InnoDB stores rows in pages and is not efficient for very wide rows.
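A minimal sketch of such a split, with hypothetical column names: the narrow, frequently-read columns stay in one table, and the wide, rarely-read ones move to a 1:1 side table sharing the same primary key.

CREATE TABLE t_hot (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  status  TINYINT NOT NULL,
  created DATETIME NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE t_wide (
  id        BIGINT UNSIGNED NOT NULL,    -- same key as t_hot
  long_note TEXT,
  payload   BLOB,
  PRIMARY KEY (id),
  FOREIGN KEY (id) REFERENCES t_hot (id)
) ENGINE=InnoDB;

-- Queries that need only the narrow columns never read the wide rows;
-- JOIN t_wide in only when those columns are actually required.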
If the data is append-only consider looking at ICE.
You might also look at TokuDB because it supports good compression.
You can consider using partitioning and Shard-Query (http://code.google.com/p/shard-query) to access data in parallel. You can also split data over more than one server for parallelism using Shard-Query.
Try adding a WHERE clause: WHERE 1=1
If it doesn't have any effect, then you should change your engine type to MyISAM.
BACKGROUND
I'm working with a MySQL InnoDB database with 60+ tables, and I'm creating different views in order to make dynamic queries fast and easy in the code. I have a couple of views with INNER JOINs (without many-to-many relationships) of 20 to 28 tables, SELECTing 100 to 120 columns, with a row count below 5,000, and they work lightning fast.
ACTUAL PROBLEM
I'm creating a master view with INNER JOINs (without many-to-many relationships) of 34 tables, SELECTing about 150 columns, with a row count below 5,000, and it seems that this is too much. It takes forever to do a single SELECT. I'm wondering if I've hit some kind of view-size limit, and whether there is any way of increasing it, or any trick that would get me past this apparent limit.
It's important to note that I'm NOT USING aggregate functions, because I know about their negative impact on performance, which, by the way, I'm very concerned about.
MySQL does not use the "System R algorithm" (used by PostgreSQL, Oracle, and SQL Server, I think), which considers not only different merge algorithms (MySQL only has nested-loop joins, although you can fake a hash join by using a hash index) but also the possible orders in which the tables are joined and the possible index combinations. The result seems to be that query parsing and execution can be very quick up to a point, but performance can drop off dramatically once the optimizer chooses a wrong path through the data.
Take a look at your EXPLAIN plans and try to see whether a) the drop in performance is due to the number of columns you are returning (just do SELECT 1 or something), or b) it is due to the optimizer choosing a table scan instead of index usage.
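A quick way to run that experiment (master_view is a stand-in for your view name):

-- a) Same FROM/JOINs but a trivial select list: if this is fast while the
--    full select is slow, the cost is in the returned columns.
SELECT 1 FROM master_view LIMIT 1000;

-- b) Inspect the plan: a row with type = ALL means a full table scan.
EXPLAIN SELECT * FROM master_view;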
A view is just a named query. When you refer to a view in MySQL, it simply replaces the name with the actual query and runs it.
It seems that you are confusing views with materialized views, which are tables created from a query; afterwards you can query that table without re-running the original query.
Materialized views are not implemented in MySQL.
To improve performance, use the EXPLAIN keyword to see where you can optimize your query/view.
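Since MySQL lacks materialized views, a common workaround is an ordinary table refreshed by hand (or from a cron job or triggers); a minimal sketch with hypothetical table names:

-- Build the "materialized view" once:
CREATE TABLE mv_report AS
SELECT a.id, a.x, b.y
FROM a JOIN b ON b.a_id = a.id;

-- Index it like any other table:
CREATE INDEX idx_mv_report_x ON mv_report (x);

-- Refresh when the base data changes (coarse but simple):
TRUNCATE mv_report;
INSERT INTO mv_report
SELECT a.id, a.x, b.y FROM a JOIN b ON b.a_id = a.id;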
Suppose I have a table A and create a view V from that table.
Then I run several queries against V. I wonder: will V be re-constructed each time I query it, or will it be constructed only once and saved somewhere in memory by the DBMS for subsequent queries (which I think would be similar to querying a table)?
In general, no. V is a transient set of rows that is computed when requested by a query. Because you can apply additional WHERE and ORDER BY criteria when querying from a view, the execution plan for two queries against the same view could conceivably be quite different. The database generally cannot reuse the results of a previous query against a view to satisfy the next query against that view.
That said, there is a relatively new technology in some engines called materialized views. I have never used them myself, but my understanding is that these views are pre-computed based on updates made to the underlying tables. So with materialized views you do get improved SELECT performance, but at the expense of decreased INSERT, UPDATE, and DELETE performance.
You should also be aware that multi-column indexes can be used to precompute certain selections and sort orders involving individual tables. If you issue a query against a table that can be satisfied using a compound index (only the columns in the index are required by the query, and the sort order matches the index) then the table itself need never be read, only the index.
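For instance, with a hypothetical events table:

CREATE INDEX idx_events_user_ts ON events (user_id, created_at);

-- The filter and the sort are both satisfied by the index, and since only
-- indexed columns are selected, the table rows are never read
-- (EXPLAIN shows "Using index"):
SELECT user_id, created_at
FROM events
WHERE user_id = 42
ORDER BY created_at;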
Views in MySQL are not a de facto caching solution.
MySQL runs the query against the base tables every time you query a view on those base tables. The results of the query are not stored for the view.
As a result, there is no need to "refresh" the view, as there is when using materialized views in Oracle or Microsoft SQL Server. Even the SQL in a MySQL view definition is re-evaluated every time you query the view.
If you need something like materialized views in MySQL, one tool that might help is FlexViews. This stores the results of a query in an ordinary base table, and then monitors changes recorded in MySQL's binary log, applying relevant changes to the base table. This tool can be quite useful, but it has some caveats:
FlexViews is written in PHP, and as such it has some performance limitations. Depending on your write traffic load, FlexViews may not be able to keep up.
It doesn't support every possible type of SELECT query.
FlexViews-managed materialized view tables are not updateable. That is, you can UPDATE this view table, but the change will not apply to the base tables.
According to Pinal Dave, a view must be refreshed in order to reflect changes made to its referenced table(s). I'm not sure this makes a view of a simple 1-table query any more efficient than querying the table directly (it probably doesn't) but I think it means that views containing complex joins and subqueries may be more efficient than their non-view counterparts.
Pinal Dave has more to say about the other limitations of SQL views (or features, if you like). Maybe you can learn something useful there.
MySQL views do not support indexes (unlike Oracle, where you can create an index on a view), but MySQL views can use the indexes of the underlying tables when created with the MERGE algorithm.
If you have to use views, then adjust your join buffer, using something like this:
set global join_buffer_size=314572800;
Do profile the difference before and after changing the buffer size.
I have seen that, after increasing the join buffer, a query on the view executes in the same time (in ms) as a query on a table of the same size.
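A safer way to profile this is at session scope first, promoting the setting to GLOBAL only once the win is confirmed (the value is just the one from above):

SET SESSION join_buffer_size = 314572800;  -- affects only this connection
-- run the view query and note the timing ...
SET SESSION join_buffer_size = DEFAULT;    -- falls back to the global value
-- ... run it again and compare the two timings.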
For my startup, I track everything myself rather than relying on Google Analytics. This is nice because I can actually have IPs and user IDs and everything.
This worked well until my tracking table rose to about 2 million rows. The table is called acts, and records:
ip
url
note
account_id
...where available.
Now, trying to do something like this:
SELECT COUNT(distinct ip)
FROM acts
JOIN users ON(users.ip = acts.ip)
WHERE acts.url LIKE '%some_marketing_page%';
Basically never finishes. I switched to this:
SELECT COUNT(distinct ip)
FROM acts
JOIN users ON(users.ip = acts.ip)
WHERE acts.note = 'some_marketing_page';
But it is still very slow, despite having an index on note.
I am obviously no pro at MySQL. My question is:
How do companies with lots of data track things like funnel conversion rates? Is it possible to do this in MySQL, and am I just missing some knowledge? If not, what books/blogs can I read about how sites do this?
While getting towards 'respectable', 2 million rows is still a relatively small size for a table (and therefore faster performance is typically possible).
As you found out, leading wildcards are particularly inefficient, and we'll have to find a solution for this if that use case is common for your application.
It could just be that you do not have the right set of indexes. Before I proceed, however, I wish to stress that while indexes typically improve DBMS performance with SELECT statements of all kinds, they systematically have a negative effect on the performance of "CUD" operations (i.e. those with the SQL CREATE/INSERT, UPDATE, and DELETE verbs, i.e. the queries which write to the database rather than just read from it). In some cases the negative impact of indexes on "write" queries can be very significant.
My reason for particularly stressing the ambivalent nature of indexes is that your application appears to do a fair amount of data collection as a normal part of its operation, and you will need to watch for possible degradation as the INSERT queries get slowed down. A possible alternative is to perform the data collection in a relatively small table/database with no or very few indexes, and to regularly import the data from this input database into the database where the actual data mining takes place. (After they are imported, the rows can be deleted from the "input database", keeping it small and fast for its INSERT function.)
Another concern/question is the width of a row in the acts table (the number of columns and the sum of the widths of these columns). Bad performance could be tied to the fact that rows are too wide, resulting in too few rows per leaf node of the table, and hence a deeper-than-needed tree structure.
Back to the indexes...
In view of the few queries in the question, it appears that you could benefit from an ip + note index (an index made of at least these two keys, in this order). A full analysis of the index situation, and frankly a possible review of the database schema, cannot be done here (not enough info for that...), but the general process is to list the most common use cases and see which database indexes could help with them. One can gather insight into how particular queries are handled, initially or after index(es) are added, with the MySQL command EXPLAIN.
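A sketch of that suggestion, plus the EXPLAIN check to go with it:

CREATE INDEX idx_acts_ip_note ON acts (ip, note);
-- Depending on which table the optimizer drives the join from, the reversed
-- order (note, ip) may serve the WHERE filter better; compare both plans.
EXPLAIN SELECT COUNT(DISTINCT ip)
FROM acts JOIN users ON users.ip = acts.ip
WHERE acts.note = 'some_marketing_page';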
Normalization OR denormalization (or indeed a combination of both!) is often a viable idea for improving performance during mining operations as well.
Why the JOIN? If we can assume that no IP makes it into acts without an associated record in users then you don't need the join:
SELECT COUNT(distinct ip) FROM acts
WHERE acts.url LIKE '%some_marketing_page%';
If you really do need the JOIN it might pay to first select the distinct IPs from acts, then JOIN those results to users (you'll have to look at the execution plan and experiment to see if this is faster).
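A sketch of that shape, using the same tables as the question:

SELECT COUNT(DISTINCT d.ip)
FROM (SELECT DISTINCT ip
      FROM acts
      WHERE note = 'some_marketing_page') AS d
JOIN users ON users.ip = d.ip;
-- The derived table shrinks acts to unique IPs before the join; check with
-- EXPLAIN whether this actually beats the original plan.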
Secondly, that LIKE with a leading wildcard is going to cause a full table scan of acts and also necessitate some expensive text searching. You have three choices to improve this:
Decompose the url into component parts before you store it so that the search matches a column value exactly.
Require the search term to appear at the beginning of the url field, not in the middle.
Investigate a full text search engine that will index the url field in such a way that even an internal LIKE search can be performed against indexes.
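For the third option, MySQL's built-in full-text search is one candidate: it tokenizes the url on non-word characters, so a path segment can often be matched without a leading wildcard. A hedged sketch (FULLTEXT on InnoDB requires MySQL 5.6+; on older versions the table would have to be MyISAM):

ALTER TABLE acts ADD FULLTEXT INDEX ft_acts_url (url);

SELECT COUNT(DISTINCT ip)
FROM acts
WHERE MATCH(url) AGAINST('some_marketing_page');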
Finally, in the case of searching on acts.note, if an index on note doesn't provide a sufficient improvement, I'd consider calculating and storing an integer hash of note and searching for that.
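That could look like the following (note_hash is a hypothetical added column; CRC32 is one readily available integer hash):

ALTER TABLE acts
  ADD COLUMN note_hash INT UNSIGNED,
  ADD INDEX idx_acts_note_hash (note_hash);
UPDATE acts SET note_hash = CRC32(note);

-- Keep the equality on note to weed out CRC32 collisions:
SELECT COUNT(DISTINCT ip)
FROM acts
WHERE note_hash = CRC32('some_marketing_page')
  AND note = 'some_marketing_page';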
Try running EXPLAIN on your query and look to see if there are any table scans.
Should this be a LEFT JOIN?
Maybe this site can help.