I have read that creating a temporary table is best if the number of parameters passed in the IN criteria is large. This is for select queries. Does this hold true for update queries as well? I have an update query that uses three inner joins and passes 1000 parameters in the IN criteria, and this query runs in a loop 200 or more times. What is the best approach to executing this query?
IN operations are usually slow. Passing 1000 parameters to any query sounds awful. If you can avoid that, do it. Now, I'd really have a go with the temp table. You can even play with the indexing of that table: instead of just putting values in it, add the indexes that would help you optimize your searches.
On the other hand, inserting with indexes is slower than inserting without indexes, so run an empirical test there. One thing I think is a must: bear in mind that when using the other table you don't need the IN clause, because you can use the EXISTS clause, which usually gives better performance. For example:
select * from yourTable yt
where exists (
select * from yourTempTable ytt
where yt.id = ytt.id
)
I don't know your query or your data, but that should give you an idea of how to do it. Note that the inner select * is as fast as select aSingleField, as the database engine optimizes it.
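For the update case in the question, a rough sketch of the same idea might look like this (the table name, column, and value below are made up for illustration; adapt them to your schema):

-- Load the ids once into a temp table instead of repeating a 1000-item IN list.
CREATE TEMPORARY TABLE tmp_ids (id INT PRIMARY KEY);
INSERT INTO tmp_ids (id) VALUES (1), (2), (3);  -- ... the rest of your ids

-- Drive the update off the temp table with EXISTS.
UPDATE yourTable yt
SET yt.someColumn = 'someValue'   -- hypothetical column and value
WHERE EXISTS (
    SELECT 1 FROM tmp_ids t WHERE t.id = yt.id
);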
Those are all my thoughts. But remember, to be 100% sure of what is best for your problem, there is nothing like performing both tests and timing them :) Hope this helps.
I am hitting some performance issues on a Mysql server.
I am trying to query a large table (~500k rows) for a subset of data:
SELECT * FROM `my_table` WHERE `subset_id` = id_value;
This request takes ~80 ms to complete, but I am running it for 20k different id_value values, which brings the total execution time to almost 1 hour.
I was hoping that adding an index on subset_id would help, but it doesn't change anything (understanding how indexes work, that makes sense).
What I am trying to figure out is if there is any way to "index" the table in a way it wouldn't take 80ms to execute this query but something more reasonable?
Or in other words, is ~80 ms for querying a 500k-row table "normal"?
Note: On the larger picture, I am using parallel queries and multiple connections to speed up the process, and have tried various optimizations such as changing the InnoDB buffer size. I'm also considering querying the db once for the 500k rows instead of 20k times, but since my code is designed in a multiprocessed/co-routine/scalable way, I was trying to avoid this and focus on optimizing the query/MySQL server at the lowest level.
Thanks!
Use a single query with IN rather than a zillion queries:
SELECT *
FROM `my_table`
WHERE `subset_id` IN (id1, id2, . . .);
If your ids are already in a table -- or you can put them in one -- then use a table instead. You can still use IN:
SELECT *
FROM `my_table`
WHERE `subset_id` IN (SELECT id FROM idtable);
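If the 20k ids only exist in your application, a minimal sketch of staging them first might look like this (idtable and its id column are assumed names):

-- Stage the ids once; the PRIMARY KEY gives the optimizer something to work with.
CREATE TABLE idtable (id INT PRIMARY KEY);
INSERT INTO idtable (id) VALUES (1), (2), (3);  -- ... batch-insert the 20k ids

-- Equivalent join form; with an index on my_table.subset_id this is one pass
-- instead of 20k separate queries.
SELECT t.*
FROM `my_table` t
JOIN idtable i ON i.id = t.subset_id;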
I have a project (Laravel 5.4) where I need to improve performance as much as I can.
So I was wondering what the performance difference is between:
$model->get()
The get method fetches all columns ('created_at', 'updated_at', etc.), so the select should be faster.
$model->select('many variables to select')->get();
The select method is an additional step in building the query, so maybe it takes more time and plain get is faster?
I want to know whether select followed by get is better in all cases, or whether there are cases where plain get is better.
The difference between Model::get() and Model::select(['f1', 'f2'])->get() is only in the query that gets generated:
// Model::get()
SELECT * FROM table
// Model::select(['f1', 'f2'])->get()
SELECT f1, f2 FROM table
Both run the database query ONCE and prepare the collection of model instances for you. select simply tells Eloquent to select only the fields you need. The performance gain is almost negligible, or it can even be worse. Read about it here: Is it bad for performance to select all columns?
I have the following database structure :
create table Accounting
(
Channel,
Account
)
create table ChannelMapper
(
AccountingChannel,
ShipmentsMarketPlace,
ShipmentsChannel
)
create table AccountMapper
(
AccountingAccount,
ShipmentsComponent
)
create table Shipments
(
MarketPlace,
Component,
ProductGroup,
ShipmentChannel,
Amount
)
I have the following query running on these tables, and I'm trying to optimize it to run as fast as possible:
select Accounting.Channel, Accounting.Account, Shipments.MarketPlace
from Accounting join ChannelMapper on Accounting.Channel = ChannelMapper.AccountingChannel
join AccountMapper on Accounting.Account = AccountMapper.AccountingAccount
join Shipments on
(
ChannelMapper.ShipmentsMarketPlace = Shipments.MarketPlace
and ChannelMapper.AccountingChannel = Shipments.ShipmentChannel
and AccountMapper.ShipmentsComponent = Shipments.Component
)
join (select Component, sum(Amount) from Shipments group by Component) as Totals
on Shipments.Component = Totals.Component
How do I make this query run as fast as possible? Should I use indexes? If so, which columns of which tables should I index?
Here is a picture of my query plan:
Thanks,
Indexes are essential to any database.
Speaking in layman's terms, indexes are... well, precisely that. You can think of an index as a second, hidden table that stores two things: the sorted data and a pointer to its position in the original table.
Some rules of thumb on creating indexes:
Create indexes on every field that is (or will be) used in joins.
Create indexes on every field on which you want to perform frequent where conditions.
Avoid creating indexes on everything. Create index on the relevant fields of every table, and use relations to retrieve the desired data.
Avoid creating indexes on double fields, unless it is absolutely necessary.
Avoid creating indexes on varchar fields, unless it is absolutely necessary.
I recommend that you read this: http://dev.mysql.com/doc/refman/5.5/en/using-explain.html
Your JOINS should be the first place to look. The two most obvious candidates for indexes are AccountMapper.AccountingAccount and ChannelMapper.AccountingChannel.
You should consider indexing Shipments.MarketPlace, Shipments.ShipmentChannel and Shipments.Component as well.
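A sketch of what creating those indexes might look like (the index names are just illustrative; verify with EXPLAIN that the optimizer actually uses them):

-- Hypothetical index names; columns taken from the joins in the question.
CREATE INDEX idx_accountmapper_account ON AccountMapper (AccountingAccount);
CREATE INDEX idx_channelmapper_channel ON ChannelMapper (AccountingChannel);
CREATE INDEX idx_shipments_join ON Shipments (MarketPlace, ShipmentChannel, Component);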
However, adding indexes increases the workload in maintaining them. While they might give you a performance boost on this query, you might find that updating the tables becomes unacceptably slow. In any case, the MySQL optimiser might decide that a full scan of the table is quicker than accessing it by index.
Really the only way to do this is to set up the indexes that would appear to give you the best result and then benchmark the system to make sure you're getting the results you want here, whilst not compromising the performance elsewhere. Make good use of the EXPLAIN statement to find out what's going on, and remember that optimisations made by yourself or the optimiser on small tables may not be the same optimisations you'd need on larger ones.
The other three answers seem to have indexes covered, so this is in addition to indexes. You have no where clause, which means you are always selecting the whole darn database. In fact, your database design doesn't have anything useful in this regard, such as a shipping date. Think about that.
You also have this:
join (select Component, sum(Amount) from Shipments group by Component) as Totals
on Shipments.Component = Totals.Component
That's all well and good, but you don't select anything from this subquery, so why do you have it? If you did want to select something, such as the sum(Amount), you would have to give it an alias to make it available in the select clause.
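For illustration only, one sketch of how the subquery could be made useful, assuming the per-component total is actually wanted (the ComponentTotal alias is made up):

select Accounting.Channel, Accounting.Account, Shipments.MarketPlace,
       Totals.ComponentTotal
from Accounting
join ChannelMapper on Accounting.Channel = ChannelMapper.AccountingChannel
join AccountMapper on Accounting.Account = AccountMapper.AccountingAccount
join Shipments on ChannelMapper.ShipmentsMarketPlace = Shipments.MarketPlace
              and ChannelMapper.AccountingChannel = Shipments.ShipmentChannel
              and AccountMapper.ShipmentsComponent = Shipments.Component
join (select Component, sum(Amount) as ComponentTotal
      from Shipments
      group by Component) as Totals
  on Shipments.Component = Totals.Component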
I have several log tables with the same structure. Each table is related to a site and contains billions of entries. The reason for this split is to keep queries quick and efficient, because 99.99% of the queries are related to a single site.
But now I would like to retrieve the min and max value of a column across these tables.
I can't manage to write the SQL request. Should I use UNION?
I am just looking for the concept of the query, not the final SQL statement.
You could use a UNION, yes. Something like this should do:
SELECT MAX(PartialMax) AS TotalMax
FROM
(
    SELECT MAX(YourColumn) AS PartialMax FROM FirstTable
    UNION ALL
    SELECT MAX(YourColumn) AS PartialMax FROM SecondTable
) AS X;
If you have an index over the column you want to find a MAX inside, you should have very good performance as the query should seek to the end of the index on that column to find the maximum value very rapidly. Without an index on that column, the query has to scan the whole table to find the maximum value since nothing inherently orders it.
Added some details to address a concern about "enormous queries".
I'm not sure what you mean by "enormous". You could create a VIEW that does the UNIONs for you; then, you use the view and it will make the query very small:
SELECT MAX(YourColumn) FROM YourView;
but that just optimizes for the size of your query's text. Why do you believe it is important to optimize for that? The VIEW can be helpful for maintenance -- if you add or remove a partition, just fix the view appropriately. But a long query text shouldn't really be a problem.
Or by "enormous", are you worried about the amount of I/O the query will do? Nothing can help that much, aside from making sure each table has an index on YourColumn so that maximum value on each partition can be found very quickly.
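For completeness, the view mentioned above might be defined along these lines (a sketch reusing the example names; extend the UNION ALL list as partitions are added or removed):

-- Hypothetical view: one partial max per partition, following the earlier pattern.
CREATE VIEW YourView AS
    SELECT MAX(YourColumn) AS YourColumn FROM FirstTable
    UNION ALL
    SELECT MAX(YourColumn) AS YourColumn FROM SecondTable;

-- Then the query stays tiny:
SELECT MAX(YourColumn) FROM YourView;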
I used to write a select clause inside a select clause to avoid joins in the from clause. But I am not sure whether this is good coding practice or whether it will degrade database performance. Below is a query which involves multiple tables, but I have written it using nested select clauses without any join statement. Please let me know if I am making a mistake or if this is okay. At the moment, I am getting accurate results.
SELECT *,
    (select POrderNo from PurchaseOrderMST POM
     where POM.POrderID = CET.POrderID) as POrderNo,
    (select SiteName from SiteTRS ST where ST.SiteID = CET.SiteID) as SiteName,
    (select ParticularName from ParticularMST PM
     where PM.ParticularID = CET.ParticularID) as ParticularName
FROM ClaimExpenseTRS CET
WHERE ClaimID = #ClaimID
I'd use joins for this because it is best practice to do so and will be better for the query optimizer.
But just for learning, try executing the script with joins and without, and see what happens to the query plan and the execution time. Usually this answers your question right away.
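For reference, a join-based sketch of the same query (same tables and aliases as the question; the LEFT JOINs assume each lookup matches at most one row, just like the scalar subqueries):

SELECT CET.*,
       POM.POrderNo,
       ST.SiteName,
       PM.ParticularName
FROM ClaimExpenseTRS CET
LEFT JOIN PurchaseOrderMST POM ON POM.POrderID = CET.POrderID
LEFT JOIN SiteTRS ST ON ST.SiteID = CET.SiteID
LEFT JOIN ParticularMST PM ON PM.ParticularID = CET.ParticularID
WHERE CET.ClaimID = #ClaimID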
Your solution is just fine.
As long as you are only using one column from each "joined" table, and there are no multiple matching rows, it is fine. In some cases, it is even better than joining.
(The db engine can change the direction of a join at any time if you are not using tricks to force a given direction, which can cause performance surprises. This is called query optimization, but as long as you really know your database, you should be the one to decide how the query should run.)
I think you should JOIN indeed.
Right now you are creating your own JOIN with where and select statements.