Indexing a column in an ORDER BY that is dynamically obtained - mysql

Suppose I have an ORDER BY whose sort column is derived by a SELECT statement:
...
WHERE col1 = ...
ORDER BY (SELECT CASE .. END) -- returns col2 or col3, both of which are in the same table as col1
Will the following indices be of use, or does using the SELECT make them useless?
INDEX (col1, col2)
INDEX (col1, col3)

Related

Is MySQL handling null values correctly when count(distinct col1, col2, col3)

I was having trouble determining a possible unique key in a poorly defined table. The table had 5000 rows. I selected distinct on the fields I thought might be a unique key.
select count(distinct col1, col2)
from tab1;
The result was 4980 records. Then I checked the 20 remaining records and found that their values for col2 were NULL, but adding col3 should give me uniqueness.
select count(distinct col1, col2, col3)
from tab1;
The result was still 4980. What the? So I changed the query to this.
select col1, col2, col3, count(*)
from tab1
group by col1, col2, col3
having count(*) > 1;
With this I got zero rows, so col1, col2, and col3 are unique. So what was wrong with the first three column query? I tried this.
select count(distinct col1, coalesce(col2, ''), col3)
from tab1;
This returned 5000 records.
It is likely that the multiple fields are being concatenated together in one field in the engine, and concatenating col1, NULL, col3 is resulting in NULL and that is why it is acting this way. But, the result seems to break the NULL standards that MySQL seems to want to follow. Is this a MySQL bug?
The manual specifically says that COUNT(DISTINCT expr [,expr...])
Returns a count of the number of rows with different non-NULL expr values.
which is the behaviour you are seeing: a combination is left out of the count as soon as any one of its expressions is NULL.
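A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in (an assumption: SQLite's COUNT(DISTINCT ...) accepts only one expression, so the multi-column MySQL form itself cannot be run here). It shows that an aggregate DISTINCT skips NULLs, while counting the rows of a SELECT DISTINCT subquery does not. The subquery form is standard SQL and should behave the same way in MySQL. The three sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(1, None, "a"), (1, None, "b"), (2, "x", "c")],  # hypothetical sample rows
)

# Aggregate DISTINCT ignores NULLs: only 'x' is counted for col2.
distinct_col2 = conn.execute(
    "SELECT COUNT(DISTINCT col2) FROM tab1"
).fetchone()[0]

# SELECT DISTINCT keeps rows that contain NULL, so counting its output
# counts every distinct (col1, col2, col3) combination.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2, col3 FROM tab1) AS d"
).fetchone()[0]

print(distinct_col2, distinct_rows)
```

This avoids the COALESCE workaround, which can merge a real empty string with NULL.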

Merge several rows to own row by value of one column in MySQL

I have a table like this one:
Some rows share the same value in col2, and for each such value (e.g. 't1') there is only one value in col3 (and so on), with no duplicates.
I want a query that produces this result:
With
SELECT * FROM table GROUP BY col2
I do not get all the values for t1, for example.
You can pick the min/max value per group, like this:
SELECT
max(col1) col1,
col2,
max(col3) col3,
max(col4) col4,
max(col5) col5,
max(col6) col6,
max(col7) col7
FROM table
GROUP BY col2
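A small sketch of why this works, run against SQLite through Python's sqlite3 module as a stand-in for MySQL. It assumes the missing cells in the question's table are NULL (the table itself is not shown, so this is a guess); MAX() ignores NULLs, so each group collapses to its single non-NULL value per column. The table name t and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "t1", "a", None),  # the 't1' values are spread over two rows
        (2, "t1", None, "b"),
        (3, "t2", "c", None),
    ],
)

# MAX() skips NULLs, so each col2 group yields its one non-NULL value per column.
merged = conn.execute(
    "SELECT MAX(col1), col2, MAX(col3), MAX(col4) "
    "FROM t GROUP BY col2 ORDER BY col2"
).fetchall()

print(merged)
```

If the gaps are empty strings rather than NULL, MAX() still picks the non-empty value, since any non-empty string sorts after ''.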

Add "on duplicate key update" clause to insert into query

I'm starting with a query like this:
insert into summary ( col1, col2, Total )
select col1, col2, count(col4) as total from importdata
where col1 = 'abc' and col4 in ('1A', '2A')
group by col1, col2
order by col1, col2
and I haven't been able to determine the correct 'on duplicate' clause. The clause I think I need is
on duplicate key update total=count(col4)
and I've placed it as the very last line in the query and as the line after the where clause, but both generated errors. Is my clause even correct and where does it need to go?
(Worst case I can use 'insert ignore', but I think doing the update would be better.)
You can't use COUNT or other group functions in the ON DUPLICATE KEY UPDATE clause. What you can do instead is this:
INSERT INTO summary ( col1, col2, Total )
SELECT col1, col2, count(col4)
FROM importdata
WHERE col1 = 'abc' AND col4 IN ('1A', '2A')
GROUP BY col1, col2
ORDER BY col1, col2
ON DUPLICATE KEY UPDATE Total = VALUES(Total)
This says, if there is a duplicate key, instead of inserting a new row just set the column total to the value you would have inserted in Total. Note that I got rid of the as total -- that would have caused problems as you already have a column named Total, and the names are case-insensitive.
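To see the "use the value you would have inserted" idea in action without a MySQL server, here is a sketch with Python's sqlite3 module. Note the swap: SQLite has no ON DUPLICATE KEY UPDATE, so this uses SQLite's own upsert syntax (ON CONFLICT ... DO UPDATE with the excluded pseudo-table), which plays the role of VALUES(Total). It assumes a bundled SQLite of version 3.24 or later, a primary key on (col1, col2), and made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE summary (
        col1 TEXT, col2 TEXT, Total INTEGER,
        PRIMARY KEY (col1, col2)          -- assumed key; not stated in the question
    );
    CREATE TABLE importdata (col1 TEXT, col2 TEXT, col4 TEXT);
    INSERT INTO summary VALUES ('abc', 'x', 99);   -- stale row that must be updated
    INSERT INTO importdata VALUES
        ('abc', 'x', '1A'), ('abc', 'x', '2A'),
        ('abc', 'y', '1A'), ('abc', 'y', '3B');
""")

# excluded.Total is the value the failed insert would have written,
# i.e. the freshly computed COUNT for that group.
conn.execute("""
    INSERT INTO summary (col1, col2, Total)
    SELECT col1, col2, COUNT(col4)
    FROM importdata
    WHERE col1 = 'abc' AND col4 IN ('1A', '2A')
    GROUP BY col1, col2
    ON CONFLICT (col1, col2) DO UPDATE SET Total = excluded.Total
""")

rows = conn.execute(
    "SELECT col1, col2, Total FROM summary ORDER BY col2"
).fetchall()
print(rows)
```

The existing ('abc', 'x') row is overwritten with the new count and the ('abc', 'y') row is inserted, which is the same outcome the MySQL statement above aims for.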
You cannot use aggregate functions in the ON DUPLICATE KEY UPDATE clause; however, you could assign the count to a user variable and then use that variable.
INSERT INTO summary (col1, col2, Total)
select col1, col2, @totalCount := count(col4) as Total from importdata
where col1 = 'abc' and col4 in ('1A', '2A')
group by col1, col2
order by col1, col2
ON DUPLICATE KEY UPDATE Total = @totalCount;

ON DUPLICATE KEY UPDATE - help needed

My problem is that I have an INSERT that should update the row on a duplicate key, like the one below:
INSERT INTO TABLE
(COL1, COL2, COL3 , ETC...)
SELECT
COLA1, COLA2, COUNT(1) , ETC...
FROM TABLE2
WHERE 'CONDITION'
GROUP BY COL1, COL2, COL3
ON DUPLICATE KEY UPDATE COL1=VALUES(COLA1), COL3=COUNT(1)
This query returns an error: General error: 1111 Invalid use of group function
COL1, COL2, COL3 form a composite key.
Try this:
INSERT INTO TABLE(COL1, COL2, COL3, ETC...)
SELECT COLA1, COLA2, COUNT(1), ETC...
FROM TABLE2
WHERE 'CONDITION'
GROUP BY COL1, COL2, COL3
ON DUPLICATE KEY UPDATE COL1 = VALUES(COL1), COL3 = VALUES(COL3);
That is, inside VALUES() refer to the column names from the INSERT column list, not to the expressions in the SELECT statement.

Similar WHERE clause in a long UNION statement in SQL Server 2008 R2

In a stored procedure, I need to INSERT the result of a long UNION into a temp table.
The WHERE clause is the same for all tables: an IN over a SELECT DISTINCT.
Simplified for readability, it goes like this:
INSERT INTO #MyTemp
SELECT col1, col2, col3 FROM tab1 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
SELECT col1, col2, col3 FROM tab2 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
SELECT col1, col2, col3 FROM tab3 WHERE col1 in (SELECT DISTINCT myId FROM TabIds) UNION
.
.
.
SELECT col1, col2, col3 FROM tab20 WHERE col1 in (SELECT DISTINCT myId FROM TabIds)
Although TabIds is a small temp table, typically 3-6 records long, this seems to be pretty inefficient.
Is there a better way to do this?
Summarizing my question:
Is there a way I can do SELECT DISTINCT myId FROM TabIds just once and assign it to a kind of array/list/set (not to another temp table) and just use that in the WHERE clauses, and if there is a way, does it really matter for such a small (3-6 recs) temp table?
I'm ignoring your requirement ("not to another temp table") because I don't believe it is well-founded. Try and see if this solution gives you better performance:
SELECT i = myId
INTO #x
FROM dbo.TabIds -- please always use schema prefix
GROUP BY myId;
CREATE UNIQUE CLUSTERED INDEX x ON #x(i);
INSERT INTO #MyTemp(col1, col2, col3)
SELECT col1, col2, col3
FROM
(
SELECT col1, col2, col3 FROM dbo.tab1 WHERE EXISTS -- likely better than IN
(SELECT 1 FROM #x WHERE i = tab1.col1)
UNION ALL
SELECT col1, col2, col3 FROM dbo.tab2 WHERE EXISTS
(SELECT 1 FROM #x WHERE i = tab2.col1)
UNION ALL
...
UNION ALL
SELECT col1, col2, col3 FROM dbo.tab20 WHERE EXISTS
(SELECT 1 FROM #x WHERE i = tab20.col1)
) AS x
GROUP BY col1, col2, col3; -- likely more efficient than `UNION` to remove dupes
Of course this will work best if col1 is indexed in all 20 tables, and if that index includes col2 and col3.
The reason I suggested a view is not because I thought it would make this code run faster. Just that you could create a view that generates this UNION for you, making this code simpler (and any other code that repeats this monotonous UNION). It was a suggestion for convenience, not for performance - though I need to make it clear that using a view does not magically make things slower. Sometimes it can, but that's a dangerous and illogical reason to avoid views.
Finally, I'd strongly consider normalization. Why are these 20 different tables in the first place, when they could all be in one single table?
CREATE TABLE dbo.Normal
(
SourceTableID INT,
col1 <data type>,
col2 <data type>,
col3 <data type>
);
-- indexes / constraints
INSERT dbo.Normal
SELECT 1, col1, col2, col3 FROM dbo.tab1
UNION ALL
SELECT 2, col1, col2, col3 FROM dbo.tab2
UNION ALL
...
UNION ALL
SELECT 20, col1, col2, col3 FROM dbo.tab20;
Now all your queries can simply reference this new table. If you will commonly look for only one of the sources (e.g. tab5), then indexing or partitioning on SourceTableID would be useful.
What you're doing, conceptually, is fine for one-offs and data loads. I hope this isn't part of a bigger pattern in production code, though.
What you're looking for is a Common Table Expression.
My T-SQL is a bit rusty, but with a CTE, your query would go something like:
WITH TabIds_CTE AS (SELECT DISTINCT myId FROM TabIds)
INSERT INTO #MyTemp
SELECT col1, col2, col3 FROM tab1 WHERE col1 IN (SELECT * FROM TabIds_CTE)
UNION ALL ...
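A runnable sketch of the CTE shape, using Python's sqlite3 module as a stand-in for SQL Server (an assumption: SQLite also accepts WITH in front of INSERT, and a regular table MyTemp replaces the #MyTemp temp table). Only two of the twenty tables are shown, with made-up rows. Naming the ID list once tidies the statement, but it does not by itself guarantee that the engine evaluates the subquery only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TabIds (myId INTEGER);
    CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE tab2 (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE MyTemp (col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO TabIds VALUES (1), (1), (3);            -- duplicate id on purpose
    INSERT INTO tab1 VALUES (1, 'a', 'b'), (2, 'c', 'd');
    INSERT INTO tab2 VALUES (1, 'a', 'b'), (3, 'e', 'f');
""")

# The id list is written once in the CTE and referenced by every branch.
# UNION (not UNION ALL) removes the row that appears in both tables.
conn.execute("""
    WITH ids AS (SELECT DISTINCT myId FROM TabIds)
    INSERT INTO MyTemp (col1, col2, col3)
    SELECT col1, col2, col3 FROM tab1 WHERE col1 IN (SELECT myId FROM ids)
    UNION
    SELECT col1, col2, col3 FROM tab2 WHERE col1 IN (SELECT myId FROM ids)
""")

result = conn.execute(
    "SELECT col1, col2, col3 FROM MyTemp ORDER BY col1"
).fetchall()
print(result)
```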
I think the following might be better for small tables, but still - it's a horrible idea to leave it like this in some production process :)
INSERT INTO #MyTemp (col1,col2,col3)
select distinct
x.col1,x.col2,x.col3
from (
SELECT col1, col2, col3 FROM tab1 union all
SELECT col1, col2, col3 FROM tab2 union all
SELECT col1, col2, col3 FROM tab3 union all
-- ...
SELECT col1, col2, col3 FROM tab20
) x
join (
SELECT DISTINCT myId FROM TabIds
) y
on x.col1=y.myid