I have a query in MySQL like this (simplified):
SELECT col1, SUM(DISTINCT col2) AS S
FROM tbl1
WHERE col1='abbc'
GROUP BY col1
ORDER BY S ASC
I know that an index on col1 would be useful for that kind of query. I would like to know whether a covering index on (col1, col2) would be more useful or whether it makes no difference.
I tried it: it does seem to make a difference, and the covering index is more useful.
Execution plan with the index:
Without DISTINCT:
SELECT col1, SUM(col2) AS S
FROM tbl1
WHERE col1='abbc'
GROUP BY col1
ORDER BY S ASC;
With DISTINCT:
SELECT col1, SUM(distinct col2) AS S
FROM tbl1
WHERE col1='abbc'
GROUP BY col1
ORDER BY S ASC;
Execution plan without the index:
It is no different with or without DISTINCT.
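The effect of a covering index can also be checked locally. The sketch below asks SQLite (through Python's built-in sqlite3 module) for its plan with no index, with an index on col1 only, and with an index on (col1, col2). SQLite is a stand-in for MySQL here, so the wording differs from MySQL's EXPLAIN output, and the table layout (including the extra col3 column) is an assumption. In SQLite, a plan line reading USING COVERING INDEX means the query is answered from the index alone, without visiting table rows.

```python
import sqlite3

# The simplified query from the question, run against SQLite instead of MySQL.
QUERY = """SELECT col1, SUM(DISTINCT col2) AS S
           FROM tbl1
           WHERE col1 = 'abbc'
           GROUP BY col1
           ORDER BY S ASC"""

def plan_details(index_columns=None):
    """Return SQLite's EXPLAIN QUERY PLAN detail lines for QUERY.

    index_columns is a column list such as "col1" or "col1, col2";
    None means the table has no secondary index at all.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: col3 exists so that an index on (col1, col2)
    # does not trivially cover every column of the table.
    conn.execute("CREATE TABLE tbl1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER, col3 TEXT)")
    if index_columns:
        conn.execute(f"CREATE INDEX ix ON tbl1 ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]

for cols in (None, "col1", "col1, col2"):
    print(cols, plan_details(cols))
```

With only col1 indexed, SQLite still has to fetch col2 from the table for every matching row; with (col1, col2) it does not.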
Suppose I have an ORDER BY that is derived from a SELECT statement:
...
WHERE col1 = ...
ORDER BY (SELECT CASE .. END) -- returns col2 or col3, both of which are in the same table as col1
Will the following indices be of use, or does using the SELECT make them useless?
INDEX (col1, col2)
INDEX (col1, col3)
I was having trouble determining a possible unique key in a poorly defined table. The table had 5000 rows. I selected distinct on the fields I thought might be a unique key.
select count(distinct col1, col2)
from tab1;
The result was 4980. Then I checked the remaining 20 records and found that their col2 values were NULL, but adding col3 should give me uniqueness.
select count(distinct col1, col2, col3)
from tab1;
The result was still 4980. What the? So I changed the query to this.
select col1, col2, col3, count(*)
from tab1
group by col1, col2, col3
having count(*) > 1;
With this I got zero rows, so col1, col2, and col3 are unique. So what was wrong with the first three column query? I tried this.
select count(distinct col1, coalesce(col2, ''), col3)
from tab1;
This returned 5000 records.
It is likely that the engine concatenates the multiple fields into one value, and concatenating col1, NULL, col3 results in NULL, which is why it behaves this way. But the result seems to break the NULL standards that MySQL seems to want to follow. Is this a MySQL bug?
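The manual specifically says that COUNT(DISTINCT expr,[expr...]):
Returns a count of the number of rows with different non-NULL expr values.
That is the behaviour you are seeing: a row is counted only if none of the listed expressions is NULL, so the 20 rows with a NULL col2 are skipped whether or not col3 is added. It is documented behaviour, not a bug.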
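SQLite has no multi-argument COUNT(DISTINCT ...), so this sketch (Python's sqlite3, with made-up rows) emulates MySQL's rule by filtering out any row containing a NULL, and compares that with counting distinct tuples through a SELECT DISTINCT subquery, which treats NULLs as equal to each other and keeps those rows. In MySQL, the same SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2, col3 FROM tab1) AS t form should count the NULL rows without needing COALESCE.

```python
import sqlite3

def distinct_counts(rows):
    """Return (mysql_style, tuple_style) counts for rows of (col1, col2, col3).

    mysql_style emulates MySQL's COUNT(DISTINCT col1, col2, col3): rows where
    any of the three is NULL are excluded before counting.
    tuple_style counts distinct tuples via SELECT DISTINCT, which keeps
    rows containing NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (col1 TEXT, col2 TEXT, col3 TEXT)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", rows)
    mysql_style = conn.execute(
        """SELECT COUNT(*) FROM (
               SELECT DISTINCT col1, col2, col3 FROM tab1
               WHERE col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL
           )"""
    ).fetchone()[0]
    tuple_style = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2, col3 FROM tab1)"
    ).fetchone()[0]
    conn.close()
    return mysql_style, tuple_style

rows = [("a", "x", "1"), ("b", "y", "1"), ("c", None, "1"), ("c", None, "2")]
print(distinct_counts(rows))  # (2, 4)
```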
I have a query like this:
select col1, col2 from table1 where col1 = ?
union all
select col1, col2 from table2 where col2 = ?
Now I need to limit the result of the above query. If I put a LIMIT clause after the second SELECT, will it limit only the result of the second SELECT, or the combined result of both?
Which approach is better for limiting the result of a UNION ALL query?
One:
select col1, col2 from table1 where col1 = ?
union all
select col1, col2 from table2 where col2 = ?
limit ?,10
Two:
select * from
(
select col1, col2 from table1 where col1 = ?
union all
select col1, col2 from table2 where col2 = ?
) x
limit ?,10
According to MySQL manual:
To use an ORDER BY or LIMIT clause to sort or limit the entire UNION
result, parenthesize the individual SELECT statements and place the
ORDER BY or LIMIT after the last one.
Hence, you can use:
(select col1, col2 from table1 where col1 = ?)
union all
(select col1, col2 from table2 where col2 = ?)
LIMIT ?, 10
Using a subquery should also work, but it can't be more efficient than the query above.
The first is better from a performance perspective. The second materializes the subquery, which is additional overhead.
Note: You are using limit without an order by, so the results may not be consistent from one execution of the query to the next.
You should be using order by, which probably makes it irrelevant which version you use (because the order by needs to read and write the data anyway).
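The sketch below checks both options using SQLite through Python's sqlite3 as a stand-in for MySQL; the data is hypothetical (8 matching rows per table, 16 in total). SQLite rejects parenthesized members of a compound SELECT, so option One is written without parentheses there; in SQLite the trailing ORDER BY and LIMIT apply to the whole UNION ALL result, which the page sizes confirm. The ORDER BY is what keeps successive pages stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER);
""")
# Hypothetical data: 8 matching rows in each table.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, i) for i in range(8)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(i, 2) for i in range(100, 108)])

def page_direct(offset, size=10):
    # Option One: ORDER BY/LIMIT after the last SELECT apply to the
    # entire UNION ALL result, not just to the second SELECT.
    return conn.execute(
        """SELECT col1, col2 FROM table1 WHERE col1 = ?
           UNION ALL
           SELECT col1, col2 FROM table2 WHERE col2 = ?
           ORDER BY col1, col2
           LIMIT ? OFFSET ?""",
        (1, 2, size, offset)).fetchall()

def page_subquery(offset, size=10):
    # Option Two: wrap the UNION ALL in a derived table and limit that.
    return conn.execute(
        """SELECT * FROM (
               SELECT col1, col2 FROM table1 WHERE col1 = ?
               UNION ALL
               SELECT col1, col2 FROM table2 WHERE col2 = ?
           ) x
           ORDER BY col1, col2
           LIMIT ? OFFSET ?""",
        (1, 2, size, offset)).fetchall()

print(len(page_direct(0)), len(page_direct(10)))  # 10 6
```

If the LIMIT applied only to the second SELECT, the first page would hold all 8 rows from table1 plus up to 10 from table2.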
I'm starting with a query like this:
insert into summary ( col1, col2, Total )
select col1, col2, count(col4) as total from importdata
where col1 = 'abc' and col4 in ('1A', '2A')
group by col1, col2
order by col1, col2
and I haven't been able to work out the correct ON DUPLICATE KEY clause. The clause I think I need is
on duplicate key update total=count(col4)
and I've placed it as the very last line in the query and as the line after the where clause, but both generated errors. Is my clause even correct and where does it need to go?
(Worst case I can use 'insert ignore', but I think doing the update would be better.)
You can't use COUNT or other group functions in the ON DUPLICATE KEY UPDATE clause. What you can do instead is this:
INSERT INTO summary ( col1, col2, Total )
SELECT col1, col2, count(col4)
FROM importdata
WHERE col1 = 'abc' AND col4 IN ('1A', '2A')
GROUP BY col1, col2
ORDER BY col1, col2
ON DUPLICATE KEY UPDATE Total = VALUES(Total)
This says: if there is a duplicate key, instead of inserting a new row, set Total to the value you would have inserted. Note that I got rid of the as total; that would have caused problems as you already have a column named Total, and the names are case-insensitive.
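SQLite has no ON DUPLICATE KEY UPDATE, but its upsert clause (SQLite 3.24.0 or later) does the same job, with excluded.Total playing the role of VALUES(Total). The sketch below is that SQLite analogue run through Python's sqlite3, not MySQL syntax. The PRIMARY KEY (col1, col2) on summary is an assumption: the update only fires when a unique key is violated. SQLite also needs a WHERE clause in the SELECT to parse ON CONFLICT unambiguously, which this query already has.

```python
import sqlite3

def load_summary(conn):
    # SQLite analogue of INSERT ... SELECT ... ON DUPLICATE KEY UPDATE:
    # "excluded" is the row that would have been inserted.
    conn.execute(
        """INSERT INTO summary (col1, col2, Total)
           SELECT col1, col2, COUNT(col4) FROM importdata
           WHERE col1 = 'abc' AND col4 IN ('1A', '2A')
           GROUP BY col1, col2
           ON CONFLICT (col1, col2) DO UPDATE SET Total = excluded.Total"""
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE importdata (col1 TEXT, col2 TEXT, col4 TEXT);
    CREATE TABLE summary (col1 TEXT, col2 TEXT, Total INTEGER,
                          PRIMARY KEY (col1, col2));
""")
conn.executemany("INSERT INTO importdata VALUES (?, ?, ?)",
                 [("abc", "p", "1A"), ("abc", "p", "2A"), ("abc", "q", "1A")])
load_summary(conn)
conn.execute("INSERT INTO importdata VALUES ('abc', 'p', '1A')")
load_summary(conn)  # second run updates existing rows instead of failing
print(conn.execute("SELECT * FROM summary ORDER BY col1, col2").fetchall())
# [('abc', 'p', 3), ('abc', 'q', 1)]
```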
You cannot use aggregate functions in the duplicate key update; however, you could assign the count to a user variable and then use that variable.
INSERT INTO summary (col1, col2, Total)
SELECT col1, col2, @totalCount := COUNT(col4) AS Total FROM importdata
WHERE col1 = 'abc' AND col4 IN ('1A', '2A')
GROUP BY col1, col2
ORDER BY col1, col2
ON DUPLICATE KEY UPDATE Total = @totalCount;
In a stored procedure, I need to INSERT the result of a long UNION into a temp table.
The WHERE clause is the same for all tables: it filters on the result of a SELECT DISTINCT.
Simplified for readability, it goes like this:
INSERT INTO #MyTemp
SELECT col1, col2, col3 FROM tab1 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
SELECT col1, col2, col3 FROM tab2 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
SELECT col1, col2, col3 FROM tab3 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
...
SELECT col1, col2, col3 FROM tab20 WHERE col1 in (SELECT DISTINCT myId FROM TabIds)
Although TabIds is a small temp table, typically 3-6 rows long, this seems pretty inefficient.
Is there a better way to do this?
Summarizing my question:
Is there a way I can do SELECT DISTINCT myId FROM TabIds just once, assign it to a kind of array/list/set (not to another temp table), and use that in the WHERE clauses? And if there is a way, does it really matter for such a small (3-6 row) temp table?
I'm ignoring your requirement ("not to another temp table") because I don't believe it is well-founded. Try and see if this solution gives you better performance:
SELECT i = myId
INTO #x
FROM dbo.TabIds -- please always use schema prefix
GROUP BY myId;
CREATE UNIQUE CLUSTERED INDEX x ON #x(i);
INSERT INTO #MyTemp(col1, col2, col3)
SELECT col1, col2, col3
FROM
(
SELECT col1, col2, col3 FROM dbo.tab1 WHERE EXISTS -- likely better than IN
(SELECT 1 FROM #x WHERE i = tab1.col1)
UNION ALL
SELECT col1, col2, col3 FROM dbo.tab2 WHERE EXISTS
(SELECT 1 FROM #x WHERE i = tab2.col1)
UNION ALL
...
UNION ALL
SELECT col1, col2, col3 FROM dbo.tab20 WHERE EXISTS
(SELECT 1 FROM #x WHERE i = tab20.col1)
) AS x
GROUP BY col1, col2, col3; -- likely more efficient than `UNION` to remove dupes
Of course this will work best if col1 is indexed in all 20 tables, and if that index includes col2 and col3.
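The sketch below runs the same idea in SQLite via Python's sqlite3 as a stand-in for SQL Server: three hypothetical tables replace the twenty, and a temp table ids with an INTEGER PRIMARY KEY stands in for #x and its unique clustered index. It only checks that the EXISTS / UNION ALL / GROUP BY form returns the same rows as the original UNION of IN filters; it says nothing about SQL Server performance.

```python
import sqlite3

TABLES = ("tab1", "tab2", "tab3")  # hypothetical stand-ins for tab1..tab20

def build(conn):
    for t in TABLES:
        conn.execute(f"CREATE TABLE {t} (col1 INTEGER, col2 TEXT, col3 TEXT)")
        # Index on col1 that also includes col2 and col3, as suggested above.
        conn.execute(f"CREATE INDEX ix_{t} ON {t} (col1, col2, col3)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", [(1, "a", "x"), (2, "b", "y"), (9, "z", "z")])
    conn.executemany("INSERT INTO tab2 VALUES (?, ?, ?)", [(1, "a", "x"), (3, "c", "w")])
    conn.executemany("INSERT INTO tab3 VALUES (?, ?, ?)", [(2, "b", "y"), (2, "b", "q")])
    conn.execute("CREATE TABLE TabIds (myId INTEGER)")
    conn.executemany("INSERT INTO TabIds VALUES (?)", [(1,), (2,), (2,)])  # duplicate id on purpose

def original_query(conn):
    # The question's shape: UNION of IN (SELECT DISTINCT ...) filters.
    sql = " UNION ".join(
        f"SELECT col1, col2, col3 FROM {t} WHERE col1 IN (SELECT DISTINCT myId FROM TabIds)"
        for t in TABLES)
    return sorted(conn.execute(sql).fetchall())

def exists_query(conn):
    # Deduplicate the ids once into a keyed temp table, then combine the
    # branches with UNION ALL + EXISTS and remove duplicates with one GROUP BY.
    conn.execute("CREATE TEMP TABLE ids (i INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO ids SELECT myId FROM TabIds GROUP BY myId")
    branches = " UNION ALL ".join(
        f"SELECT col1, col2, col3 FROM {t} WHERE EXISTS (SELECT 1 FROM ids WHERE i = {t}.col1)"
        for t in TABLES)
    sql = f"SELECT col1, col2, col3 FROM ({branches}) AS x GROUP BY col1, col2, col3"
    return sorted(conn.execute(sql).fetchall())

conn = sqlite3.connect(":memory:")
build(conn)
print(original_query(conn) == exists_query(conn))  # True
```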
The reason I suggested a view is not because I thought it would make this code run faster. Just that you could create a view that generates this UNION for you, making this code simpler (and any other code that repeats this monotonous UNION). It was a suggestion for convenience, not for performance, though I need to make it clear that using a view does not magically make things slower. Sometimes it can, but that's a dangerous and illogical reason to avoid views.
Finally, I'd strongly consider normalization. Why are these 20 different tables in the first place, when they could all be in one single table?
CREATE TABLE dbo.Normal
(
SourceTableID INT,
col1 <data type>,
col2 <data type>,
col3 <data type>
);
-- indexes / constraints
INSERT dbo.Normal
SELECT 1, col1, col2, col3 FROM dbo.tab1
UNION ALL
SELECT 2, col1, col2, col3 FROM dbo.tab2
UNION ALL
...
UNION ALL
SELECT 20, col1, col2, col3 FROM dbo.tab20;
Now all your queries can simply reference this new table. If you will commonly look for only one of the sources (e.g. tab5), then indexing or partitioning on SourceTableID would be useful.
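A sketch of what the lookup looks like after normalization, again in SQLite via Python's sqlite3 rather than SQL Server. The rows and both indexes are assumptions: one index covers lookups by col1, the other serves queries restricted to a single SourceTableID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Normal (
        SourceTableID INTEGER NOT NULL,
        col1 INTEGER, col2 TEXT, col3 TEXT
    );
    -- Covering index for lookups by col1; a second index for per-source queries.
    CREATE INDEX ix_normal_col1 ON Normal (col1, col2, col3);
    CREATE INDEX ix_normal_source ON Normal (SourceTableID, col1);
    CREATE TABLE TabIds (myId INTEGER);
""")
# Hypothetical rows (SourceTableID, col1, col2, col3), formerly spread over tab1..tab3.
conn.executemany("INSERT INTO Normal VALUES (?, ?, ?, ?)", [
    (1, 1, "a", "x"), (1, 2, "b", "y"), (1, 9, "z", "z"),
    (2, 1, "a", "x"), (2, 3, "c", "w"),
    (3, 2, "b", "y"), (3, 2, "b", "q"),
])
conn.executemany("INSERT INTO TabIds VALUES (?)", [(1,), (2,), (2,)])

# One statement replaces the 20-branch UNION: filter once, deduplicate once.
rows = conn.execute("""
    SELECT DISTINCT col1, col2, col3
    FROM Normal
    WHERE col1 IN (SELECT myId FROM TabIds)
    ORDER BY col1, col2, col3
""").fetchall()
print(rows)  # [(1, 'a', 'x'), (2, 'b', 'q'), (2, 'b', 'y')]

# Looking at one former table (here tab3) filters on SourceTableID instead.
source3 = conn.execute(
    "SELECT col1, col2, col3 FROM Normal WHERE SourceTableID = ? ORDER BY col2, col3", (3,)
).fetchall()
print(source3)  # [(2, 'b', 'q'), (2, 'b', 'y')]
```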
What you're doing, conceptually, is fine for one-offs and data loads. I hope this isn't part of a bigger pattern in production code, though.
What you're looking for is a Common Table Expression.
My T-SQL is a bit rusty, but with a CTE, your query would go something like:
WITH TabIds_CTE AS (SELECT DISTINCT myId FROM TabIds)
INSERT INTO #MyTemp
SELECT col1, col2, col3 FROM tab1 WHERE col1 IN (SELECT * FROM TabIds_CTE)
UNION ALL ...
I think the following might be better for small tables, but it's still a horrible idea to leave it like this in a production process :)
INSERT INTO #MyTemp (col1,col2,col3)
select distinct
x.col1,x.col2,x.col3
from (
SELECT col1, col2, col3 FROM tab1 union all
SELECT col1, col2, col3 FROM tab2 union all
SELECT col1, col2, col3 FROM tab3 union all
-- ...
SELECT col1, col2, col3 FROM tab20
) x
join (
SELECT DISTINCT myId FROM TabIds
) y
on x.col1=y.myid
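The sketch below checks, in SQLite via Python's sqlite3 with hypothetical data and three tables standing in for twenty, that this join form returns the same rows as the question's UNION of IN filters. The DISTINCT on TabIds keeps a repeated id from multiplying rows in the join, and the outer DISTINCT does the deduplication that UNION did.

```python
import sqlite3

TABLES = ("tab1", "tab2", "tab3")  # hypothetical stand-ins for tab1..tab20

conn = sqlite3.connect(":memory:")
for t in TABLES:
    conn.execute(f"CREATE TABLE {t} (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", [(1, "a", "x"), (2, "b", "y"), (9, "z", "z")])
conn.executemany("INSERT INTO tab2 VALUES (?, ?, ?)", [(1, "a", "x"), (3, "c", "w")])
conn.executemany("INSERT INTO tab3 VALUES (?, ?, ?)", [(2, "b", "y"), (2, "b", "q")])
conn.execute("CREATE TABLE TabIds (myId INTEGER)")
conn.executemany("INSERT INTO TabIds VALUES (?)", [(1,), (2,), (2,)])  # duplicate id on purpose

# Original shape: one IN filter per branch, UNION removes duplicates.
union_in = " UNION ".join(
    f"SELECT col1, col2, col3 FROM {t} WHERE col1 IN (SELECT DISTINCT myId FROM TabIds)"
    for t in TABLES)

# Join shape: UNION ALL everything, then a single join and a single DISTINCT.
all_rows = " UNION ALL ".join(f"SELECT col1, col2, col3 FROM {t}" for t in TABLES)
join_sql = f"""
    SELECT DISTINCT x.col1, x.col2, x.col3
    FROM ({all_rows}) x
    JOIN (SELECT DISTINCT myId FROM TabIds) y ON x.col1 = y.myId
"""

a = sorted(conn.execute(union_in).fetchall())
b = sorted(conn.execute(join_sql).fetchall())
print(a == b, b)  # True [(1, 'a', 'x'), (2, 'b', 'q'), (2, 'b', 'y')]
```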