Query with aggregate, subquery and group by not working - mysql

Can you help me, please? I spent about 2 hours trying to understand what is wrong, but I still don't.
SQLSTATE[42S21]: Column already exists: 1060 Duplicate column name 'id'
select count(*) as aggregate
from (
select `cities`.*,
`cities`.`id` as `id`,
`cities`.`country_id` as `country_id`,
`cities`.`name` as `name`,
`cities`.`alias` as `alias`,
`cities`.`active_frontend` as `active_frontend`
from `cities`
where (
cities.alias in (
select `alias`
from `cities`
group by `alias`
having COUNT(`alias`) > 1
)
)
) count_row_table
Please don't ask me what is going on here; most of this query is generated by Laravel.
If I delete this part:
where
(cities.alias IN (SELECT alias FROM cities GROUP BY alias HAVING
COUNT(alias) > 1))
it works. But I really need this part.

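The issue is with cities.*. It already returns id, country_id, name, alias and active_frontend, and the explicit `cities`.`id` as `id` (and so on) adds a second column with each of those names. Every column of a derived table such as count_row_table must have a unique name, so MySQL raises error 1060.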
But you can simplify your query to:
select sum(cnt) as cnt
from (
select COUNT(alias) as cnt
from cities
group by alias
having COUNT(alias) > 1
) t
This avoids reading the table twice, because in the end all you need is the total number of rows whose alias appears more than once.
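A quick way to convince yourself that the SUM(cnt) rewrite counts the same rows as the original IN subquery is to run both side by side. This is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the cities rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, alias TEXT);
    INSERT INTO cities (id, alias) VALUES
        (1, 'paris'), (2, 'paris'), (3, 'rome'),
        (4, 'oslo'), (5, 'oslo'), (6, 'oslo');
""")

# Original shape: count rows whose alias is in the list of duplicated aliases.
via_in = conn.execute("""
    SELECT COUNT(*) FROM cities
    WHERE alias IN (SELECT alias FROM cities GROUP BY alias HAVING COUNT(alias) > 1)
""").fetchone()[0]

# Rewrite: add up the group sizes of the duplicated aliases (one pass over cities).
via_sum = conn.execute("""
    SELECT SUM(cnt) FROM (
        SELECT COUNT(alias) AS cnt FROM cities
        GROUP BY alias
        HAVING COUNT(alias) > 1
    ) t
""").fetchone()[0]

print(via_in, via_sum)  # 5 5
```

One difference worth knowing: when no alias is duplicated, SUM over the empty derived table returns NULL rather than 0, so COALESCE(SUM(cnt), 0) may be what you actually want.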

You don't need to materialize a subquery for this. You can do:
select count(*)
from cities c
where exists (select 1 from cities c2 where c2.alias = c.alias and c2.id <> c.id);
With an index on cities(alias, id), this should have better performance.
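The EXISTS version can be checked the same way. This sketch again uses in-memory SQLite in place of MySQL with made-up rows; the index name idx_cities_alias_id is hypothetical and only mirrors the suggested (alias, id) index.

```python
import sqlite3

# Illustrative data; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, alias TEXT);
    -- Hypothetical name for the suggested composite (alias, id) index.
    CREATE INDEX idx_cities_alias_id ON cities (alias, id);
    INSERT INTO cities (id, alias) VALUES
        (1, 'paris'), (2, 'paris'), (3, 'rome'),
        (4, 'oslo'), (5, 'oslo'), (6, 'oslo');
""")

# A row counts if some other row (different id) shares its alias.
exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM cities c
    WHERE EXISTS (SELECT 1 FROM cities c2
                  WHERE c2.alias = c.alias AND c2.id <> c.id)
""").fetchone()[0]

print(exists_count)  # 5
```

Unlike the SUM(cnt) form, COUNT(*) returns 0 rather than NULL when no alias is duplicated.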

Related

How do I correctly select two columns within a subquery in MySQL?

Hello, I'm trying to use a SELECT inside another SELECT, and I get the error "Operand should contain 1 column(s)". I saw other answers but couldn't figure out how to apply the solution. Here is my query:
SELECT a.date_insert AS date
,HOUR(a.date_insert) AS hour
,AVG(spood)
,AVG(factor)
,(SELECT AVG(dd.spood) as median_val1,AVG(dd.factor) as median_val2
FROM (
SELECT d.spood, d.factor, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM traf d, (SELECT @rownum:=0) r
WHERE d.spood is NOT NULL
ORDER BY d.spood
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) ))
FROM traf a
INNER JOIN mycolumn b
ON a.ref_id = b.ref_id where value_3 > 100
GROUP BY 1,2
Any help would be appreciated.
I get the error at the (SELECT AVG ...) subquery: "Operand should contain 1 column(s)", while I want to retrieve 2 columns.
You are selecting two expressions from the sub-query which is in the SELECT clause.
If you use a sub-query in the SELECT clause, it must have only one expression in its own SELECT list and must return at most one row.
Remove one of the two expressions from the sub-query and the error will go away.
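The one-column rule is easy to reproduce. In this sketch, in-memory SQLite stands in for MySQL and traf is a hypothetical three-row table: a scalar subquery with two expressions is rejected, while the same subquery with one expression works. SQLite's error text differs from MySQL's "Operand should contain 1 column(s)", and unlike MySQL it silently takes the first row when a scalar subquery returns several, but the one-column restriction is the same.

```python
import sqlite3

# Hypothetical sample data; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE traf (spood REAL, factor REAL);
    INSERT INTO traf (spood, factor) VALUES (10, 1), (20, 2), (30, 3);
""")

def run(sql):
    """Return the result rows, or the error message if the statement is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return "error: " + str(exc)

# Two expressions inside a scalar subquery in the SELECT list: rejected.
two_columns = run("SELECT (SELECT AVG(spood), AVG(factor) FROM traf)")
# One expression: accepted.
one_column = run("SELECT (SELECT AVG(spood) FROM traf)")

print(two_columns)
print(one_column)  # [(20.0,)]
```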
The error message is pretty clear. You have a subquery in the SELECT clause, which is a scalar subquery:
(SELECT AVG(dd.spood) as median_val1,AVG(dd.factor) as median_val2
FROM (SELECT d.spood, d.factor, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM traf d, (SELECT @rownum:=0) r
WHERE d.spood is NOT NULL
ORDER BY d.spood
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) ))
A scalar subquery can only return one column and at most one row. The simple solution is to return only one value in the subquery. If you need multiple values, use multiple subqueries.
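A minimal sketch of the "multiple subqueries" fix, with each scalar subquery returning exactly one column and one row. It uses in-memory SQLite as a stand-in for MySQL and plain averages over a hypothetical traf table rather than the median logic.

```python
import sqlite3

# Hypothetical data; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE traf (spood REAL, factor REAL);
    INSERT INTO traf (spood, factor) VALUES (10, 1), (20, 2), (30, 6);
""")

# Two values are needed, so two scalar subqueries, each with a single column.
row = conn.execute("""
    SELECT (SELECT AVG(spood)  FROM traf) AS avg_spood,
           (SELECT AVG(factor) FROM traf) AS avg_factor
""").fetchone()

print(row)  # (20.0, 3.0)
```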
It is probably possible to rewrite your overall query. However, your question doesn't provide sample data, desired results, or an explanation of what the query is supposed to be doing.
-- Update
Try using a LEFT JOIN with the sub-query as follows:
SELECT a.date_insert AS date
,HOUR(a.date_insert) AS hour
,AVG(spood)
,AVG(factor)
,MAX(AVG_VIEW.median_val1) -- usage of the values from sub-query
,MAX(AVG_VIEW.median_val2) -- usage of the values from sub-query
FROM traf a
INNER JOIN mycolumn b
ON a.ref_id = b.ref_id
-- added this
LEFT JOIN (SELECT AVG(dd.spood) as median_val1,AVG(dd.factor) as median_val2
FROM (
SELECT d.spood, d.factor, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM traf d, (SELECT @rownum:=0) r
WHERE d.spood is NOT NULL
ORDER BY d.spood
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) )) AS AVG_VIEW
ON 1=1 -- use proper conditions and accordingly use the correct columns in SELECT of this sub-query
-- till here
where value_3 > 100
GROUP BY 1,2
Note: you may need to adapt this query to your requirements.
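The LEFT JOIN idea can be sketched on a tiny example: a one-row derived table carrying two aggregate columns is joined to every row with ON 1 = 1, and MAX() passes its values through the GROUP BY. To keep the sketch short, it assumes plain averages instead of the user-variable median calculation, in-memory SQLite instead of MySQL, and hypothetical names (a traf table with day, spood and factor columns).

```python
import sqlite3

# Hypothetical table and data; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE traf (day TEXT, spood REAL, factor REAL);
    INSERT INTO traf (day, spood, factor) VALUES
        ('mon', 10, 1), ('mon', 20, 3),
        ('tue', 30, 5), ('tue', 40, 7);
""")

rows = conn.execute("""
    SELECT a.day,
           AVG(a.spood)          AS avg_spood,
           MAX(v.overall_spood)  AS overall_spood,   -- same value on every row
           MAX(v.overall_factor) AS overall_factor   -- second column of the same subquery
    FROM traf a
    LEFT JOIN (SELECT AVG(spood)  AS overall_spood,
                      AVG(factor) AS overall_factor
               FROM traf) AS v
           ON 1 = 1
    GROUP BY a.day
    ORDER BY a.day
""").fetchall()

print(rows)  # [('mon', 15.0, 25.0, 4.0), ('tue', 35.0, 25.0, 4.0)]
```

Because the derived table has exactly one row, MAX() simply returns that row's value for every group.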

Deleting duplicate rows with SQL, CTE and everything else not working

I'm trying to delete a lot of duplicate rows from a SQL table of business codes and business descriptions, but I have to keep one row for each entry. The table has about 1925 rows, and 345 rows with duplicate or triple entries. This is the query I used to find the duplicates and triple entries:
SELECT codice_ateco_2007, descrizione_ateco_2007, COUNT(*) AS CNT FROM codici_ateco_il_leone GROUP BY codice_ateco_2007, descrizione_ateco_2007 HAVING CNT > 1;
I tried the following, but none of it works. When I use a CTE, I get an error saying unknown function after the WITH statement, and when I use other code like
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
it still doesn't work: it says I cannot select from the table inside the IN clause.
Are CTEs and the other code out of date, or what? How can I fix this? By the way, the codici_ateco_il_leone table also has an id PRIMARY KEY.
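One method, which needs MySQL 8.0 or later (the same version that added CTE support, which likely explains the error you got with WITH), is row_number() with a join: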
delete mdt
from MyDuplicateTable mdt join
(select mdt2.*,
row_number() over (partition by DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 order by id) as seqnum
from MyDuplicateTable mdt2
) mdt2
on mdt2.id = mdt.id
where seqnum > 1;
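A minimal sketch of the row_number() approach, assuming in-memory SQLite 3.25+ (for window functions) as a stand-in for MySQL and a hypothetical codes table. SQLite has no multi-table DELETE ... JOIN, so the ranked rows are fed back through an IN subquery instead; the rule is the same: keep the lowest id of each group.

```python
import sqlite3

# Hypothetical table and rows; SQLite in memory stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (id INTEGER PRIMARY KEY, code TEXT, descr TEXT);
    INSERT INTO codes (id, code, descr) VALUES
        (1, 'A01', 'Farming'), (2, 'A01', 'Farming'), (3, 'A01', 'Farming'),
        (4, 'B02', 'Mining'), (5, 'C03', 'Textiles'), (6, 'C03', 'Textiles');
""")

# Number the rows inside each (code, descr) group by id; delete all but the first.
conn.execute("""
    DELETE FROM codes
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY code, descr ORDER BY id) AS seqnum
            FROM codes
        ) ranked
        WHERE seqnum > 1
    )
""")

remaining = conn.execute("SELECT id FROM codes ORDER BY id").fetchall()
print(remaining)  # [(1,), (4,), (5,)]
```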
A similar approach uses aggregation:
delete mdt
from MyDuplicateTable mdt join
(select DuplicateColumn1, DuplicateColumn2, DuplicateColumn3, min(id) as min_id
from MyDuplicateTable mdt2
group by DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
having count(*) > 1
) mdt2
using (DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
where mdt.id > mdt2.min_id;
Both of these assume that id is a globally unique identifier for each row, which seems reasonable given the context. However, both can be tweaked if the same id can appear for different values of the three key columns.
Your DELETE statement is fine and works in almost every DBMS except MySQL, where you get this annoying error. The solution is simple: replace from sometable with from (select * from sometable) somealias:
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM (SELECT * FROM MyDuplicateTable) t
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
);
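The statement's logic can be checked with in-memory SQLite and a hypothetical codes table. SQLite does not raise MySQL's "can't select from the target table" error, so this sketch cannot reproduce the error itself; it only shows that the wrapped statement keeps the row with the highest id in each group, whereas the row_number() answer keeps the lowest.

```python
import sqlite3

# Hypothetical table and rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (id INTEGER PRIMARY KEY, code TEXT, descr TEXT);
    INSERT INTO codes (id, code, descr) VALUES
        (1, 'A01', 'Farming'), (2, 'A01', 'Farming'), (3, 'A01', 'Farming'),
        (4, 'B02', 'Mining'), (5, 'C03', 'Textiles'), (6, 'C03', 'Textiles');
""")

# Keep MAX(id) per (code, descr) group and delete everything else.
# The inner (SELECT * FROM codes) t mirrors the MySQL workaround;
# SQLite would also accept FROM codes directly.
conn.execute("""
    DELETE FROM codes
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM (SELECT * FROM codes) t
        GROUP BY code, descr
    )
""")

remaining = conn.execute("SELECT id FROM codes ORDER BY id").fetchall()
print(remaining)  # [(3,), (4,), (6,)]
```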

MySQL Select from UNION performance issue (kills database)

I have a small problem with MySQL.
I'm trying to make a UNION of two tables like so:
SELECT `user_id`, `post_id`, `requested_on`
FROM `a`
WHERE `status` != 'cancelled'
UNION
SELECT `user_id`, `post_id`, `time` as requested_on
FROM `b`
WHERE `type` = 'ADD'
This query runs fine: Showing rows 0 - 29 (36684 total, Query took 0.0147 sec)
but when I do
SELECT * FROM (
SELECT `user_id`, `post_id`, `requested_on`
FROM `a`
WHERE `status` != 'cancelled'
UNION
SELECT `user_id`, `post_id`, `time` as requested_on
FROM `b`
WHERE `type` = 'ADD'
) tbl1
MySQL dies.
I want to do this so that I can GROUP BY user_id, post_id.
Any ideas why this happens / any workarounds?
Later edit:
This is the SQL Fiddle:
http://sqlfiddle.com/#!2/c7f82d/2
The final query is there; it executes in:
Record Count: 10; Execution Time: 574ms
In my view, 574 ms for 10 records is huge.
I found the cause of the problem.
I was running the queries in phpMyAdmin. With SELECT ... UNION SELECT ... everything was fine, but with
SELECT * FROM (SELECT ... UNION SELECT ...)
phpMyAdmin's pagination failed, and phpMyAdmin tried to send a table of more than 30k rows to my browser. That is why the SQL request hung. :(
It is not clear what the query
SELECT * FROM (
SELECT user_id, post_id, requested_on
FROM a
WHERE status != 'cancelled'
UNION
SELECT user_id, post_id, time as requested_on
FROM b
WHERE type = 'ADD'
) tbl1 GROUP BY user_id, post_id
means. Assume you have:
A, x, t1
A, x, t2
Would you like the row with t1 or t2? If it doesn't matter, let's apply an aggregate function such as MIN:
SELECT user_id, post_id, MIN(requested_on) FROM (
SELECT user_id, post_id, requested_on
FROM a
WHERE status <> 'cancelled'
UNION
SELECT user_id, post_id, time as requested_on
FROM b
WHERE type = 'ADD'
) tbl1
GROUP BY user_id, post_id
MySQL usually doesn't handle derived tables like this very well. Is there any other predicate you can apply to the individual parts of the union?
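A minimal sketch of the MIN variant, assuming in-memory SQLite as a stand-in for MySQL and made-up rows for tables a and b. It shows which requested_on survives per (user_id, post_id); it says nothing about the phpMyAdmin pagination issue or about MySQL's derived-table performance.

```python
import sqlite3

# Illustrative rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (user_id INTEGER, post_id INTEGER, requested_on TEXT, status TEXT);
    CREATE TABLE b (user_id INTEGER, post_id INTEGER, time TEXT, type TEXT);
    INSERT INTO a VALUES
        (1, 10, '2020-01-02', 'active'),
        (1, 10, '2020-01-05', 'cancelled'),
        (2, 20, '2020-01-03', 'active');
    INSERT INTO b VALUES
        (1, 10, '2020-01-01', 'ADD'),
        (2, 20, '2020-01-04', 'REMOVE'),
        (3, 30, '2020-01-06', 'ADD');
""")

# Union both sources, then keep the earliest requested_on per (user_id, post_id).
rows = conn.execute("""
    SELECT user_id, post_id, MIN(requested_on) AS first_requested_on
    FROM (
        SELECT user_id, post_id, requested_on FROM a WHERE status <> 'cancelled'
        UNION
        SELECT user_id, post_id, time AS requested_on FROM b WHERE type = 'ADD'
    ) tbl1
    GROUP BY user_id, post_id
    ORDER BY user_id, post_id
""").fetchall()

print(rows)  # [(1, 10, '2020-01-01'), (2, 20, '2020-01-03'), (3, 30, '2020-01-06')]
```

Swapping MIN for MAX answers the "t1 or t2?" question the other way, keeping the latest request instead.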

Optimizing an SQL query with a subquery and GROUP BY

Could somebody help me optimize this query? The table holds a huge amount of data. Ideally, the optimized version would not use a subquery.
SELECT user_id, scheduled_on_date
FROM
(SELECT user_id, scheduled_on_date
FROM `calls`
ORDER BY scheduled_on_date DESC) AS cinner
GROUP BY user_id
Expected output: one row per user, the one with that user's latest scheduled_on_date for a call.
You can rewrite your query as follows:
SELECT c.user_id, c.scheduled_on_date ,other_fields_max_per_group
FROM `calls` c
JOIN (SELECT user_id, MAX(scheduled_on_date) scheduled_on_date
FROM `calls`
GROUP BY user_id) AS cc
ON(c.user_id =cc.user_id AND c.scheduled_on_date =cc.scheduled_on_date)
Add a composite index:
ALTER TABLE calls ADD INDEX `test` (user_id ,scheduled_on_date )
If you only need the user id and the latest date, this alone is enough:
SELECT user_id, MAX(scheduled_on_date) scheduled_on_date
FROM `calls`
GROUP BY user_id
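Both queries from this answer can be tried on a few rows. This sketch assumes in-memory SQLite in place of MySQL, a hypothetical note column standing in for "other fields", and a hypothetical index name; dates are ISO strings so that MAX() compares them correctly.

```python
import sqlite3

# Hypothetical schema and rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (id INTEGER PRIMARY KEY, user_id INTEGER,
                        scheduled_on_date TEXT, note TEXT);
    -- Composite index from the answer (index name is hypothetical).
    CREATE INDEX idx_calls_user_date ON calls (user_id, scheduled_on_date);
    INSERT INTO calls (id, user_id, scheduled_on_date, note) VALUES
        (1, 1, '2021-03-01', 'a'), (2, 1, '2021-03-05', 'b'),
        (3, 2, '2021-02-10', 'c'), (4, 2, '2021-02-01', 'd');
""")

# Full rows of each user's latest call: join back to the per-user maximum.
latest_rows = conn.execute("""
    SELECT c.user_id, c.scheduled_on_date, c.note
    FROM calls c
    JOIN (SELECT user_id, MAX(scheduled_on_date) AS scheduled_on_date
          FROM calls
          GROUP BY user_id) AS cc
      ON c.user_id = cc.user_id AND c.scheduled_on_date = cc.scheduled_on_date
    ORDER BY c.user_id
""").fetchall()

# Only the user id and the latest date: the aggregate alone is enough.
latest_dates = conn.execute("""
    SELECT user_id, MAX(scheduled_on_date) AS scheduled_on_date
    FROM calls
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(latest_rows)   # [(1, '2021-03-05', 'b'), (2, '2021-02-10', 'c')]
print(latest_dates)  # [(1, '2021-03-05'), (2, '2021-02-10')]
```

If a user has two calls on the same latest date, the join version returns both rows.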
select c1.*
from calls c1 left join calls c2 on c1.user_id = c2.user_id
and c1.scheduled_on_date < c2.scheduled_on_date
where c2.user_id is null;
To optimize it further, make sure you have an index on (user_id, scheduled_on_date) and select only the columns you actually need.
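The self-join (anti-join) version works because a row is kept only when no later row exists for the same user. This sketch assumes in-memory SQLite in place of MySQL and hypothetical rows in a calls table.

```python
import sqlite3

# Hypothetical rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (id INTEGER PRIMARY KEY, user_id INTEGER, scheduled_on_date TEXT);
    CREATE INDEX idx_calls_user_date ON calls (user_id, scheduled_on_date);
    INSERT INTO calls (id, user_id, scheduled_on_date) VALUES
        (1, 1, '2021-03-01'), (2, 1, '2021-03-05'),
        (3, 2, '2021-02-10'), (4, 2, '2021-02-01');
""")

# Pair each call with any later call of the same user; keep calls with no partner.
latest = conn.execute("""
    SELECT c1.id, c1.user_id, c1.scheduled_on_date
    FROM calls c1
    LEFT JOIN calls c2
      ON c1.user_id = c2.user_id
     AND c1.scheduled_on_date < c2.scheduled_on_date
    WHERE c2.user_id IS NULL
    ORDER BY c1.user_id
""").fetchall()

print(latest)  # [(2, 1, '2021-03-05'), (3, 2, '2021-02-10')]
```

As with the join on MAX(), ties on the latest date produce more than one row per user.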

Keep all records in "WHERE IN()" clause, even if they are not found

I have the following mysql query:
SELECT id, sum(views) as total_views
FROM table
WHERE id IN (1,2,3)
GROUP BY id
ORDER BY total_views ASC
If only ids 1 and 3 are found in the database, I still want id 2 to appear, with total_views set to 0.
Is there any way to do that? This cannot use any other table.
This query hard-codes the list of possible IDs in a sub-query made of UNIONs, then left joins this set of ids to the table containing the information to be counted.
This will preserve an ID in your results even if there are no occurrences (COALESCE turns the NULL sum of a missing ID into 0):
SELECT ids.id, COALESCE(SUM(views), 0) as total_views
FROM (
SELECT 1 AS ID
UNION ALL SELECT 2 AS ID
UNION ALL SELECT 3 AS ID
) ids
LEFT JOIN table
ON table.ID = ids.ID
GROUP BY ids.id
ORDER BY total_views ASC
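A minimal sketch of the hard-coded id list, assuming in-memory SQLite as a stand-in for MySQL and a hypothetical page_views table (the question's table is literally named table, which is a reserved word). Without COALESCE, id 2 would come back with NULL instead of 0.

```python
import sqlite3

# Hypothetical table name and rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (id INTEGER, views INTEGER);
    INSERT INTO page_views (id, views) VALUES (1, 5), (1, 3), (3, 4);
""")

# Inline list of wanted ids, left joined so that missing ids still appear.
rows = conn.execute("""
    SELECT ids.id, COALESCE(SUM(pv.views), 0) AS total_views
    FROM (
        SELECT 1 AS id
        UNION ALL SELECT 2
        UNION ALL SELECT 3
    ) ids
    LEFT JOIN page_views pv ON pv.id = ids.id
    GROUP BY ids.id
    ORDER BY total_views ASC
""").fetchall()

print(rows)  # [(2, 0), (3, 4), (1, 8)]
```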
Alternately, if you had a numbers table, you could do the following query:
SELECT numbers.number, COALESCE(SUM(views), 0) as total_views
FROM
numbers
LEFT JOIN table
ON table.ID = numbers.number
WHERE numbers.number IN (1, 2, 3)
GROUP BY numbers.number
ORDER BY total_views ASC
Here's an alternative to Michael's solution (not a bad solution, mind you, even with "a lot" of IDs), as long as you're not querying against a cluster.
create temporary table __ids (
id int unsigned primary key
) engine=MEMORY;
insert into __ids (id) values
(1),
(2),
(3)
;
SELECT __ids.id, COALESCE(SUM(views), 0) AS total_views
FROM __ids left join table using (id)
GROUP BY __ids.id
ORDER BY total_views ASC
And if your query becomes complex, I can even imagine it running more efficiently this way. But if I were you, I'd benchmark this option against Michael's ad hoc UNIONed table using real data.
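The temporary-table variant can be sketched the same way. This assumes in-memory SQLite instead of MySQL, so there is no ENGINE=MEMORY clause, and the hypothetical page_views table stands in for table. Grouping by __ids.id (rather than the joined table's id) is what keeps id 2 in the result, because the joined table's id is NULL for a missing id.

```python
import sqlite3

# Hypothetical table and rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (id INTEGER, views INTEGER);
    INSERT INTO page_views (id, views) VALUES (1, 5), (1, 3), (3, 4);
    -- SQLite has no ENGINE=MEMORY; a TEMP table is private to this connection.
    CREATE TEMP TABLE __ids (id INTEGER PRIMARY KEY);
    INSERT INTO __ids (id) VALUES (1), (2), (3);
""")

rows = conn.execute("""
    SELECT __ids.id, COALESCE(SUM(views), 0) AS total_views
    FROM __ids LEFT JOIN page_views USING (id)
    GROUP BY __ids.id
    ORDER BY total_views ASC
""").fetchall()

print(rows)  # [(2, 0), (3, 4), (1, 8)]
```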
In @Michael's answer, if you do have a table with the ids you care about, you can use it as "ids" in place of Michael's inline data.
Check this fiddle: http://www.sqlfiddle.com/#!2/a9392/3
Select B.ID, COALESCE(sum(A.views), 0) total_views from tableB B
left outer join tableA A
on B.ID = A.ID
group by B.ID
Also check:
http://www.sqlfiddle.com/#!2/a1bb7/1
Try this: handle each id explicitly, returning its views if it exists and 0 otherwise:
SELECT 1 AS id, COALESCE((SELECT SUM(views) FROM mytable WHERE id = 1), 0) AS total_views
UNION ALL
SELECT 2, COALESCE((SELECT SUM(views) FROM mytable WHERE id = 2), 0)
UNION ALL
SELECT 3, COALESCE((SELECT SUM(views) FROM mytable WHERE id = 3), 0)
ORDER BY total_views ASC
Does it have to be rows or could you pivot the data to give you one row and a column for every id?
SELECT
SUM(IF (id=1, views, 0)) views_1,
SUM(IF (id=2, views, 0)) views_2,
SUM(IF (id=3, views, 0)) views_3
FROM table
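A minimal sketch of the pivot, assuming in-memory SQLite instead of MySQL and a hypothetical page_views table. MySQL's IF() is replaced here by the portable CASE WHEN expression, which MySQL also accepts; an id with no rows simply yields 0 in its column.

```python
import sqlite3

# Hypothetical table and rows; SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (id INTEGER, views INTEGER);
    INSERT INTO page_views (id, views) VALUES (1, 5), (1, 3), (3, 4);
""")

# One row, one column per id; CASE WHEN plays the role of MySQL's IF().
pivot = conn.execute("""
    SELECT
        SUM(CASE WHEN id = 1 THEN views ELSE 0 END) AS views_1,
        SUM(CASE WHEN id = 2 THEN views ELSE 0 END) AS views_2,
        SUM(CASE WHEN id = 3 THEN views ELSE 0 END) AS views_3
    FROM page_views
""").fetchone()

print(pivot)  # (8, 0, 4)
```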