SQL Server Partition Performance Issue - sql-server-2008

I have the following scenario:
TableA (ID GUID, Type INT) : +60M rows
TableB (ID GUID) : +5M rows
TableA has an Index on ID and Type
TableB the Primary Key is ID
I need to improve the following query:
SELECT * FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID AND A.Type = 5
The query takes about 30 seconds to complete.
We have tried partitioning TableA on the Type column but the query execution time remains the same. Even the execution plan is still the same. As far as I understood partitioning the table should greatly improve the performance?
Do I have to adjust my query to use the partition thus increasing performance?
Are my indexes wrong?
Thanks in advance!

You are one of the people who think partitioning is a magic switch that improves performance when pressed. Partitioning mostly reduces performance and helps in a few narrow cases. It is mostly a management feature for bulk loading and data archiving/deletion.
Partitioning introduces serious consequences and cannot be done without proper understanding, planning and testing.
Create the proper indexes. In your case A(Type, ID) would be a good start. Alternatively, use a filtered index: A(ID) WHERE Type = 5.
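The two index suggestions could be written roughly as follows. This is a sketch only: the dbo schema and the index names are assumptions, and the INCLUDE (Type) is added because the query in the question uses SELECT * and therefore returns the Type column.

```sql
-- Option 1: composite index with the equality column (Type) first,
-- so the Type = 5 rows are contiguous and ordered by ID for the join.
CREATE NONCLUSTERED INDEX IX_TableA_Type_ID      -- hypothetical name
    ON dbo.TableA (Type, ID);

-- Option 2: filtered index holding only the Type = 5 rows
-- (filtered indexes are available from SQL Server 2008 onwards).
CREATE NONCLUSTERED INDEX IX_TableA_ID_Type5     -- hypothetical name
    ON dbo.TableA (ID)
    INCLUDE (Type)                               -- assumption: Type is returned by SELECT *
    WHERE Type = 5;
```

Only one of the two is normally needed. Compare the execution plans before and after to confirm that the new index is actually chosen.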

Related

Performance related to Not exists function

I have two tables, a and b, with 2 million and 3.2 million records respectively. I'm trying to get the ids that exist in a but not in b. I have written the query below:
select a.id from a where not exists (select b.id from b where a.id =b.id)
This takes a long time. Is there a better way to get the results faster?
Update: I just looked at the structure of both tables and found that a.id has a decimal data type while b.id is varchar.
Will this difference in data types cause any issues?
Could you try a LEFT JOIN with an IS NULL check? It returns the Ids that exist in TableA but not in TableB.
SELECT T1.Id
FROM TableA T1
LEFT JOIN TableB T2 ON T2.Id = T1.Id
WHERE T2.Id IS NULL
While you could write your query using an anti-join, it probably would not affect the performance much, and in fact the underlying execution plan could even be the same. The only way I can see to speed up your query would be to add an index to the b table:
CREATE INDEX idx ON b (id);
But if b.id is a primary key, then it should already be part of the clustered index. In this case, your current performance might be as good as you can get.
(this is mostly comment, but it's a bit long)
Please take some time to read some of the many questions about query optimization here on SO. The ones which are downvoted and closed omit table/index definitions and explain plans. The ones which receive upvotes include these along with cardinality, performance and result metrics.
The join to table a in your sub query is redundant. When you remove the second reference to that table you end up with a simpler query. Then you can use a "not in" or a left join.
But the performance is still going to suck. Wherever possible you should try to avoid painting yourself into a corner like this in your data design.
Thanks for your valuable answers, I found the way. It was resolved by giving the lookup IDs the same data type in both tables; the query now returns results in 22 seconds.

Increase performance of a one-to-many join

We're building a model where we're joining a 13-part profile (01_01_resource_utilization_prepared) to a daily record to create 13 records per day; this is a deliberate one-to-many join which grows the size of the table.
It is a simple query and we have tried indexing, but what is the best way to optimise it?
SELECT
a.DATE,
a.RUN_ID,
a.HOURS,
a.HOURS * b.RESOURCE_DISTRIBUTION,
a.SCHEDULE_PROFILE_ID,
a.WEEKDAY_NUMBER,
a.SCHEDULE_DISTRIBUTION,
b.RESOURCE_DISTRIBUTION,
a.LOCATION_DESC,
a.DEPARTMENT_DESC,
a.LANGUAGE_DESC,
a.JOB_TITLE_DESC
FROM
03_01_schedule a
LEFT JOIN 01_01_resource_utilization_prepared b ON (
a.RESOURCE_PROFILE_ID = b.RESOURCE_PROFILE_ID
AND a.DATE >= b.EFFECTIVE_FROM
AND a.DATE <= b.EFFECTIVE_TO
)
Does 01_01 refer to Jan 01? If so, I suggest that is a bad way to lay out the data. But meanwhile...
Checking for within a range, where the range comes from another table is hard to optimize. These composite indexes on b will help a little:
INDEX(RESOURCE_PROFILE_ID, EFFECTIVE_FROM)
INDEX(RESOURCE_PROFILE_ID, EFFECTIVE_TO)
Is LEFT needed? If it can be removed without destroying the semantics, then a much better option avails itself. Removing LEFT would let this be useful on a:
INDEX(RESOURCE_PROFILE_ID, `DATE`)
(Meanwhile, I did not understand the relevance of anything you said in your first paragraph.)
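The composite indexes suggested above might be created like this. This is a sketch assuming MySQL (the backticks in the answer point that way); the index names are made up, and the table names are quoted with backticks because they start with digits.

```sql
-- Indexes on b: equality column first, then one end of the date range.
ALTER TABLE `01_01_resource_utilization_prepared`
    ADD INDEX idx_profile_from (RESOURCE_PROFILE_ID, EFFECTIVE_FROM),  -- hypothetical name
    ADD INDEX idx_profile_to   (RESOURCE_PROFILE_ID, EFFECTIVE_TO);    -- hypothetical name

-- Index on a: only useful if LEFT can be dropped, so that the
-- optimizer is free to start from b and look up matching rows in a.
ALTER TABLE `03_01_schedule`
    ADD INDEX idx_profile_date (RESOURCE_PROFILE_ID, `DATE`);          -- hypothetical name

-- Check which index the optimizer actually picks.
EXPLAIN
SELECT a.`DATE`, a.HOURS * b.RESOURCE_DISTRIBUTION
FROM `03_01_schedule` a
LEFT JOIN `01_01_resource_utilization_prepared` b
       ON a.RESOURCE_PROFILE_ID = b.RESOURCE_PROFILE_ID
      AND a.`DATE` >= b.EFFECTIVE_FROM
      AND a.`DATE` <= b.EFFECTIVE_TO;
```

The optimizer will use at most one of the two indexes on b per lookup, so keep whichever one EXPLAIN shows being chosen and drop the other.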
Without more information I can't say exactly, but performance will depend on indexing the columns you're comparing. Without indexes the join may have to scan every row, a "full table scan".
It's pretty common in MySQL to forget to declare foreign keys. 03_01_schedule.RESOURCE_PROFILE_ID and 01_01_resource_utilization_prepared.RESOURCE_PROFILE_ID should be declared as foreign keys and they will be indexed. This will make the basic join much faster and also supply referential integrity.
03_01_schedule.DATE, 01_01_resource_utilization_prepared.EFFECTIVE_FROM, and 01_01_resource_utilization_prepared.EFFECTIVE_TO should all be indexed. This will make comparisons using those columns much faster.

SQL query taking too much time to execute

I have two tables. Table 1 (for questions) has 2 columns "qid" (the question id) and "question". Table 2 (for answers) has "aid" (answer id), "qid" and "answers". The "qid" in table 2 is a foreign key from table 1 and a single question may have multiple answers.
Both tables have 30,000+ entries.
I have made a loop,
for(each qid)
{
select question from table1 where qid = id;
select answer from table2 where qid = id;
}
The number of ids passed is 10 (to the loop). The query takes about 18 seconds to execute.
Is this normal for this much delay or is there something wrong with this approach. I want to know if there is any way to make the above query faster.
You can do it in a single query which should be a lot faster.
SELECT t1.qid, t1.question, t2.aid, t2.answer
FROM table1 t1
INNER JOIN table2 t2 ON t2.qid = t1.qid
WHERE t1.qid IN (?,?,?,etc.);
Or you can do
SELECT t1.qid, t1.question, t2.aid, t2.answer FROM table1 t1, table2 t2 WHERE t1.qid=t2.qid AND t1.qid IN(...some condition...);
I completely agree with @wvdz.
Additionally this is a general list of things you can do to improve performance of selects:
Analyze & possibly rewrite your query (there's a lot left unsaid here, so I recommend visiting one of the resource links I've included, or both).
If the query includes what is effectively the primary key, make sure you have actually created the primary key constraint for that column (this creates an index).
Consider creating indexes for any columns that will be used in the conditions of the query (similar to point one, you will want to read up on this if you think you need more optimization; this becomes more important the more data you have in a table).
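For the two tables in the question, the primary key and index points could look like the sketch below. It assumes MySQL and the column names given in the question; the index name is made up, and any statement whose key already exists should be skipped.

```sql
-- Primary keys (each one creates an index on the key column).
ALTER TABLE table1 ADD PRIMARY KEY (qid);
ALTER TABLE table2 ADD PRIMARY KEY (aid);

-- Index the column used to look up answers by question.
-- If qid is already declared as a FOREIGN KEY on an InnoDB table,
-- an index on it exists and this statement is unnecessary.
CREATE INDEX idx_table2_qid ON table2 (qid);   -- hypothetical name

-- Confirm that the lookups use the indexes rather than full scans.
EXPLAIN
SELECT t1.qid, t1.question, t2.aid, t2.answer
FROM table1 t1
INNER JOIN table2 t2 ON t2.qid = t1.qid
WHERE t1.qid IN (1, 2, 3);
```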
Also here are a couple of good resources for tuning your sql queries:
http://beginner-sql-tutorial.com/sql-query-tuning.htm
http://www.quest.com/whitepapers/10_SQL_Tips.pdf
NOTE on Primary Keys: Should you want more information on the use of Primary Keys this answer I gave in the past explains how I use primary keys and why... mentioning this because, in my opinion & experience, every table should include a primary key: MySQL - Should every table contain it's own id/primary column?

Should I use count(*) on a very common query or store the number

I have two tables (a lot more actually, but only interested in these two).
USER_ACTION (ID, ID_USER, ID_ACTION, TMST) AS A
ACTION (ID, DATA, NUM_USERS) AS B
However, B.NUM_USERS is actually a representation of the number of records in USER_ACTION that have A.ID_ACTION = B.ID
Is this acceptable as performance optimization (a very common query recovers ACTION data very often), or since this breaks the relational model it is a bad idea and the query should be:
SELECT B.ID, B.DATA, count(*) AS NUM_USERS
FROM ACTION B JOIN USER_ACTION A ON A.ID_ACTION = B.ID
WHERE B.ID = ?
GROUP BY B.ID
If this second option is the correct answer, is there any index I should put in place to optimize this query?
-- EDIT --
After running EXPLAIN with the current model, but anonymized (8 users for the selected action):
id  select_type  table  type   possible_keys   key             key_len  ref    rows  Extra
1   SIMPLE       B      const  PRIMARY         PRIMARY         8        const  1
1   SIMPLE       A      ref    FK_USER_ACTION  FK_USER_ACTION  8        const  8     Using index
I would recommend keeping the query as you described it and adding an index on ID_ACTION in USER_ACTION and ID in ACTION.
Your where filter and group by will both benefit from the index on ACTION, and the join to USER_ACTION should be an eq_ref join (for more info http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/), which will be fast in most cases. Use EXPLAIN EXTENDED in front of your query to verify the execution plan. If you start to notice any slowness, you could also index DATA in ACTION using a compound index on (ID, DATA). That would give you a covering index, but I doubt the cost of indexing DATA with ID is really worth it (for more info: http://www.mysqlperformanceblog.com/2006/11/23/covering-index-and-prefix-indexes/... old, but still applicable)
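A minimal sketch of that check, assuming MySQL. The index name and the literal 42 are placeholders, and the CREATE INDEX can be skipped if the existing FK_USER_ACTION key from the question's EXPLAIN output already covers ID_ACTION.

```sql
-- Index on the join column (skip if FK_USER_ACTION already covers it).
CREATE INDEX idx_user_action_id_action ON USER_ACTION (ID_ACTION);  -- hypothetical name

-- Inspect the plan; 42 stands in for the bound parameter.
-- Where the server supports it, EXPLAIN EXTENDED can be used instead.
EXPLAIN
SELECT B.ID, B.DATA, COUNT(*) AS NUM_USERS
FROM ACTION B
JOIN USER_ACTION A ON A.ID_ACTION = B.ID
WHERE B.ID = 42
GROUP BY B.ID;
```

In the output, look for a key being used on both tables and a low rows estimate for USER_ACTION, as in the plan posted in the question.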
In general, if many of the count()'s covered several thousand rows, you might want to consider a summary table via a materialized view or a cron job or whatever. It's still slower to count() over (say) 100k rows than it is to precompute that. But basically you shouldn't notice much slowness until you're dealing with several thousand rows in USER_ACTION that need to be count(*)'d in order to return the result. I don't think you're going to have that problem, frankly... so you should be fine using the join you described with the indices I described. Use EXPLAIN EXTENDED to verify that. Also note that LRU will probably come into play a bit here, if you're using INNODB (eg http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html). Just something to be aware of, a concept you'll want to realize exists.
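If the counts ever do become expensive, the summary-table idea could be sketched like this. It assumes MySQL; the table name ACTION_USER_COUNT is made up, and BIGINT is a guess based on the key_len of 8 in the question's EXPLAIN output.

```sql
-- Hypothetical summary table, one row per action.
CREATE TABLE ACTION_USER_COUNT (
    ID_ACTION BIGINT NOT NULL PRIMARY KEY,
    NUM_USERS INT    NOT NULL
);

-- Periodic refresh (for example from a cron job): recompute every count
-- and overwrite the stored rows.
REPLACE INTO ACTION_USER_COUNT (ID_ACTION, NUM_USERS)
SELECT ID_ACTION, COUNT(*)
FROM USER_ACTION
GROUP BY ID_ACTION;

-- The frequent read then becomes a primary-key lookup.
SELECT B.ID, B.DATA, C.NUM_USERS
FROM ACTION B
JOIN ACTION_USER_COUNT C ON C.ID_ACTION = B.ID
WHERE B.ID = 42;   -- placeholder id
```

The trade-off is staleness: between refreshes the stored count can lag behind USER_ACTION, and actions with no users have no row in the summary table, so the read would need a LEFT JOIN to return them.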

TSQL Multiple IN queries bad performance

Why queries like
delete from A where F1 IN (1,2,3,5,5) and F2 IN (7,9,10,11)
are so slow (F1 and F2 are indexed, stats updated), and how do you optimize them?
Given your example, I'm not sure there's anything you could do to increase performance.
However, your example is simplistic, and if instead your example were using subqueries in the IN statements, then it would probably have room for improvement, perhaps by using an EXISTS instead or just joining. I think the meat of this question is probably about performance issues with IN statements though, right?
Your best tool when considering performance is to examine the explain plans of different solutions and see which one makes most sense for the amount and types of data you expect.
This SO post explains some about how an IN statement works...
SQL Server IN vs. EXISTS Performance
Here's a blog that also discusses performance factors...
http://sqlknowledgebank.blogspot.com/2012/11/in-exists-clause-and-their-performance.html
My guess is that it runs as a dual loop.
My sample is a select.
It is a lot faster to optimize a select first.
With a join on a PK the query optimizer has more to work with.
But with the PK you cannot insert 5 twice.
create table #tempF1 (ID int primary key);
insert into #tempF1 values (1),(2),(3),(4);
create table #tempF2 (ID int primary key);
insert into #tempF2 values (1),(2),(3),(5);
select *
from tbl
inner merge join #tempF1
on tbl.F1 = #tempF1.ID
inner merge join #tempF2
on tbl.F2 = #tempF2.ID
This may not work in your situation; also test other join hints and no hint at all.
I use this technique on some big tables with complex queries where the query optimizer got stupid.
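Applied to the delete from the question, the same idea might look like the sketch below (T-SQL). The temp table names are made up, and the join hint is left out here so the optimizer's own choice can be compared against the hinted variants.

```sql
-- Temp tables holding the two IN lists from the question.
-- The duplicate 5 is dropped because a primary key cannot hold it twice.
CREATE TABLE #F1List (ID int PRIMARY KEY);   -- hypothetical name
INSERT INTO #F1List VALUES (1),(2),(3),(5);
CREATE TABLE #F2List (ID int PRIMARY KEY);   -- hypothetical name
INSERT INTO #F2List VALUES (7),(9),(10),(11);

-- Tune with a SELECT first: same joins, nothing modified.
SELECT COUNT(*)
FROM A
INNER JOIN #F1List AS f1 ON A.F1 = f1.ID
INNER JOIN #F2List AS f2 ON A.F2 = f2.ID;

-- Then run the delete with the same join shape.
DELETE A
FROM A
INNER JOIN #F1List AS f1 ON A.F1 = f1.ID
INNER JOIN #F2List AS f2 ON A.F2 = f2.ID;

DROP TABLE #F1List;
DROP TABLE #F2List;
```

Whether this beats the plain IN lists depends on the data; compare the actual execution plans of both forms before settling on one.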