'GROUP BY' and subquery optimization - mysql

I have two tables, data and img: each row in data has 0, 1, or n images in the img table.
I want to get all records from data that have more than one image in img. I cannot use JOINs because I cannot edit the first part of the SQL query: SELECT some_defualt_columns FROM data WHERE.
Here are my solutions:
Solution 1: takes a while to run, but works:
SELECT some_defualt_columns FROM data WHERE `id` IN
(SELECT data_id FROM
(SELECT data_id, count(*) as occ FROM
img GROUP BY data_id
HAVING occ >1)
AS tmp)
Solution 2: this should be faster than the previous one (see "MySQL: View with Subquery in the FROM Clause Limitation"), but it literally kills my MySQL server:
SELECT * FROM data WHERE id IN
(SELECT data_id FROM img
GROUP BY `data_id`
HAVING count(`data_id`) > 1)
Solution 3: maybe the fastest, but it requires creating a view:
CREATE VIEW my_data_with_more_than_one_img AS
SELECT all_columns_of_data_table FROM
data JOIN img ON img.data_id = data.id
GROUP BY data.id
HAVING COUNT(img.data_id) > 1
Then execute a simple SELECT on it:
SELECT * FROM my_data_with_more_than_one_img WHERE 1
This last solution is rather fast, but I want to know whether there is any faster (or smarter) way to get this done.
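Some context: MySQL versions before 5.6 executed IN (SELECT ...) as a dependent subquery that was re-run for every row of data. That is the usual explanation for why Solution 2 is so slow. Solution 1's extra derived table forces the grouped subquery to be materialized once instead. The minimal sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only checks that two filters return the same rows; it says nothing about how MySQL times them. It compares Solution 2's IN filter with a correlated COUNT(*) filter, which also fits after the fixed prefix. The title column, the sample rows, and the index on img(data_id) are assumptions.

```python
import sqlite3

# Hypothetical sample data: data rows 1..4 have 0, 1, 2 and 3 images respectively.
SCHEMA = """
CREATE TABLE data (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE img (id INTEGER PRIMARY KEY, data_id INTEGER NOT NULL);
-- Assumed index on img(data_id); both filters below benefit from it.
CREATE INDEX img_data_id ON img (data_id);
INSERT INTO data (id, title) VALUES (1, 'no images'), (2, 'one image'),
                                    (3, 'two images'), (4, 'three images');
INSERT INTO img (data_id) VALUES (2), (3), (3), (4), (4), (4);
"""

# The part of the query that cannot be edited.
PREFIX = "SELECT id, title FROM data WHERE "

# Solution 2 from the question: uncorrelated IN subquery with GROUP BY/HAVING.
IN_FILTER = "id IN (SELECT data_id FROM img GROUP BY data_id HAVING COUNT(*) > 1)"

# Alternative: correlated count, one index range lookup per data row.
COUNT_FILTER = "(SELECT COUNT(*) FROM img WHERE img.data_id = data.id) > 1"


def build_db() -> sqlite3.Connection:
    """Return an in-memory database populated with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def ids_matching(conn: sqlite3.Connection, where_clause: str) -> list:
    """Append a filter to the fixed prefix and return the matching data ids."""
    sql = PREFIX + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_db()
    print(ids_matching(conn, IN_FILTER))     # [3, 4]
    print(ids_matching(conn, COUNT_FILTER))  # [3, 4]
```

On a real MySQL server, run EXPLAIN on each variant to see whether the IN form is executed as a semi-join or materialized subquery, or as a dependent subquery.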

Related

Outer query is very slow when inner query returns no results

I'm trying to fetch a random row, weighted by its weight column, from a table called export. It should then fetch one row from another table, export_chunk, which references the first row. This is the query:
SELECT * FROM export_chunk
WHERE export_id=(
SELECT id FROM export
WHERE schedulable=1
ORDER BY -LOG(1 - RAND())/export.weight LIMIT 1)
AND status='PENDING'
LIMIT 2;
The export table can have 1000 rows while the export_chunk table can have millions of rows.
The query is very fast when the inner query returns a row. However, if there are no rows with schedulable=1, the outer query performs a full table scan on export_chunk. Why does this happen and is there any way to prevent it?
EDIT: Trying COALESCE()
Akina suggested in the comments using COALESCE, i.e.:
SELECT * FROM export_chunk
WHERE export_id=COALESCE(
(SELECT id FROM export
WHERE schedulable=1
ORDER BY -LOG(1 - RAND())/export.weight LIMIT 1)
,-1)
AND status='PENDING'
LIMIT 2;
This should work. When I run:
SELECT COALESCE((SELECT id FROM export WHERE schedulable=1 ORDER BY -LOG(1-RAND())/export.weight LIMIT 1), -1) FROM export;
It does return -1 for each row, as Akina predicted. And if I manually search for -1 instead of using the inner query, it returns no rows very quickly. However, when I wrap the inner query in COALESCE, it is still really slow. I do not understand why.
Test this:
SELECT export_chunk.*
FROM export_chunk
JOIN ( SELECT id
FROM export
WHERE schedulable=1
ORDER BY -LOG(1 - RAND())/export.weight
LIMIT 1 ) AS random_row ON export_chunk.export_id=random_row.id
WHERE export_chunk.status='PENDING'
LIMIT 2;
Does this match the logic you need? In particular, when the subquery returns no rows, do you want no output rows (as now) or any 2 rows?
P.S. LIMIT without ORDER BY in the outer query is strange, because which rows come back is then not defined.
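A likely reason for the full scan is that RAND() makes the = (SELECT ...) subquery non-deterministic. MySQL then cannot evaluate it once and use the result as a constant for an index lookup on export_id; EXPLAIN typically reports such a subquery as UNCACHEABLE SUBQUERY. Wrapping it in COALESCE() does not change that. The derived table in the answer has a LIMIT, so it cannot be merged into the outer query and is materialized once; an empty result then simply produces an empty join.

The minimal sketch below runs the answer's query through Python's sqlite3 module. It registers RAND() and LOG() as Python functions so that the SQL text matches MySQL's. The sample rows and the index on export_chunk(export_id, status) are assumptions. The outer ORDER BY is an addition, following the P.S.

```python
import math
import random
import sqlite3

# Same shape as the answer's query; the outer ORDER BY is an addition so the
# two returned chunks are well defined.
QUERY = """
SELECT export_chunk.*
FROM export_chunk
JOIN (SELECT id
      FROM export
      WHERE schedulable = 1
      ORDER BY -LOG(1 - RAND()) / export.weight
      LIMIT 1) AS random_row ON export_chunk.export_id = random_row.id
WHERE export_chunk.status = 'PENDING'
ORDER BY export_chunk.id
LIMIT 2
"""


def build_db(schedulable_ids):
    """Three exports (ids 1-3), each with chunks PENDING, PENDING, DONE."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no RAND(); register Python stand-ins so the SQL text matches MySQL's.
    conn.create_function("RAND", 0, random.random)
    conn.create_function("LOG", 1, math.log)
    conn.executescript("""
        CREATE TABLE export (id INTEGER PRIMARY KEY, schedulable INTEGER, weight REAL);
        CREATE TABLE export_chunk (id INTEGER PRIMARY KEY, export_id INTEGER, status TEXT);
        CREATE INDEX chunk_export_status ON export_chunk (export_id, status);
    """)
    for export_id in (1, 2, 3):
        flag = 1 if export_id in schedulable_ids else 0
        conn.execute("INSERT INTO export VALUES (?, ?, 1.0)", (export_id, flag))
        for status in ("PENDING", "PENDING", "DONE"):
            conn.execute(
                "INSERT INTO export_chunk (export_id, status) VALUES (?, ?)",
                (export_id, status),
            )
    return conn


if __name__ == "__main__":
    # No schedulable export: the derived table is empty, so the join returns nothing.
    print(build_db(set()).execute(QUERY).fetchall())  # []
    # Only export 2 is schedulable, so the weighted pick must be export 2.
    print(build_db({2}).execute(QUERY).fetchall())  # [(4, 2, 'PENDING'), (5, 2, 'PENDING')]
```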

Getting the last few rows from more than one table in MySQL?

I have been stuck on an SQL query for the past few hours. I need to get the latest few rows from four tables, as follows.
The table names are events, contactinfo, video, and news.
I need the last 3 results from events and news, and the last single result from video and contactinfo.
I tried the following query, but as expected it didn't work:
SELECT * FROM
((SELECT * FROM EVENTS ORDER BY eventid DESC LIMIT 3)EV) INNER JOIN
((SELECT * FROM NEWS ORDER BY newsid DESC LIMIT 3)NE) INNER JOIN
((SELECT * FROM VIDEOS ORDER BY videoid DESC LIMIT 1)VI) INNER JOIN
((SELECT * FROM CONTACTINFO ORDER BY cid DESC LIMIT 1)AB);
Actually, I am not a DB expert; I am a developer and I really don't know much about MySQL.
Any help would be appreciated.
If these tables have the same columns you can do a UNION (instead of your INNER JOIN). If not, I suggest doing 4 queries.
A JOIN implies that the joined data is related, and if that's not the case, then a JOIN seems like the wrong solution.
If you need the result as a single table, then use SELECT and UNION to combine the data, making each query return the same number of columns with the same data types (CAST columns and provide default values if needed). Otherwise, if you need results with different structures, run 4 queries.
JOINs don't make sense for your task, because the last N rows from one table are unlikely to have corresponding rows within the last N rows of another table.
UPDATE
See example:
SELECT * FROM
(SELECT TOP 5 n.ID, n.Content, n.CreatedOn as CreatedOn, n.UserID as NewsUserID, 1 as SourceType FROM News n ORDER BY n.CreatedOn DESC) t1
UNION ALL
SELECT * FROM
(SELECT TOP 5 e.ID, e.Description as Content, e.CreatedAt as CreatedOn, NULL as NewsUserID, 2 as SourceType FROM Events e ORDER BY e.CreatedAt DESC) t2
ORDER BY SourceType, CreatedOn DESC
So I decided I want ID, Content, and CreatedOn from every source, plus UserID from the News table. I built 2 queries that return the same columns with the same data types. Each query takes only the first 5 rows from its source (TOP 5 is MS SQL syntax; in MySQL, use ORDER BY ... LIMIT 5 instead). I also added an extra field, SourceType, that records the type of entity. In the main query, I UNION ALL the results and order by SourceType first, then by CreatedOn.
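Here is the same UNION ALL approach in MySQL syntax, with LIMIT in place of TOP and the limits from the question (3 events, 3 news, 1 video, 1 contactinfo). This minimal sketch builds the query and runs it through Python's sqlite3 module, which also accepts this form. The columns (a key plus a title) are hypothetical. Each branch is wrapped in SELECT * FROM (...) so that its own ORDER BY ... LIMIT is applied before the UNION ALL.

```python
import sqlite3

# Hypothetical schema: each table has its own key column plus a title column.
TABLES = {"events": "eventid", "news": "newsid", "videos": "videoid", "contactinfo": "cid"}
# How many of the newest rows to take from each table.
LIMITS = {"events": 3, "news": 3, "videos": 1, "contactinfo": 1}


def build_db():
    """Create the four tables with five rows each (keys 1..5)."""
    conn = sqlite3.connect(":memory:")
    for table, key in TABLES.items():
        # Table and column names come from the constant dict above, never from user input.
        conn.execute(f"CREATE TABLE {table} ({key} INTEGER PRIMARY KEY, title TEXT)")
        conn.executemany(
            f"INSERT INTO {table} ({key}, title) VALUES (?, ?)",
            [(i, f"{table} #{i}") for i in range(1, 6)],
        )
    return conn


def latest_rows_query():
    """Build one UNION ALL query; every branch returns (id, title, source)."""
    branches = [
        f"SELECT * FROM (SELECT {key} AS id, title, '{table}' AS source "
        f"FROM {table} ORDER BY {key} DESC LIMIT {LIMITS[table]}) AS t_{table}"
        for table, key in TABLES.items()
    ]
    return "\nUNION ALL\n".join(branches) + "\nORDER BY source, id DESC"


if __name__ == "__main__":
    print(latest_rows_query())
    for row in build_db().execute(latest_rows_query()):
        print(row)
```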
This is not a logical way to get data from four tables in one call, since the tables are independent.
I think you want to minimise database calls.
To minimise database hits, you should use memcache instead of such a query.
Memcache:
It stores data as key-value pairs; for each key, you get back a result set.
It's very fast.
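The memcache suggestion is usually applied as a cache-aside pattern: look the key up in the cache first, and only on a miss run the query and store its result under that key. In the minimal sketch below, a plain dict with an expiry time stands in for a memcache client (no real memcache API is used). The key name and load_latest are hypothetical placeholders for the four "latest rows" queries.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for memcache: key -> (expiry time, value)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def cached(cache: TTLCache, key: str, load: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or call load() and cache its result.

    A result of None is treated as a miss, so None values are never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value)
    return value


if __name__ == "__main__":
    calls = []

    def load_latest():
        # Hypothetical placeholder for the database queries.
        calls.append(1)
        return {"events": [5, 4, 3], "news": [5, 4, 3], "videos": [5], "contactinfo": [5]}

    cache = TTLCache(ttl_seconds=60)
    cached(cache, "homepage:latest", load_latest)
    cached(cache, "homepage:latest", load_latest)
    print(len(calls))  # 1: the second call was served from the cache
```

The TTL bounds how stale the cached "latest rows" can get; a real deployment would also delete or overwrite the key when new rows are inserted.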

My SQL Query is very slow with 50K+ records

I have a table called links with 50,000+ records, where three of the fields are indexed in the order link_id, org_id, data_id.
The problem is that when I use a GROUP BY query, it takes a long time to load.
The query is like this:
SELECT DISTINCT `links`.*
FROM `links`
WHERE `links`.`org_id` = 2 AND (link !="")
GROUP BY link
The table has 20+ columns.
Is there any way to speed up the query?
Build an index on (org_id, link)
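org_id serves your WHERE clause, and link serves your GROUP BY (and is also part of the WHERE).
Having link_id in the first position is probably what is holding your query back: MySQL uses a composite index from its leftmost column, so an index that starts with link_id generally cannot be used to look up rows by org_id.
CREATE INDEX LinksByOrgAndLink ON links (org_id, link);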
See the MySQL CREATE INDEX syntax documentation.
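To check that the suggested index is actually used, compare the query plans with and without it. In MySQL, that means running EXPLAIN and looking at the key column. The minimal sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in; the plan text differs between the two engines, and the column definitions are assumed. The query also selects only the needed columns instead of DISTINCT links.*. One caveat for MySQL: if link is a TEXT column, the index needs a prefix length.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE links (
    link_id INTEGER PRIMARY KEY,
    org_id  INTEGER NOT NULL,
    data_id INTEGER,
    link    TEXT NOT NULL
);
-- The existing composite index from the question, with link_id first.
CREATE INDEX links_link_org_data ON links (link_id, org_id, data_id);
"""

# Selects only the needed columns instead of DISTINCT links.*.
QUERY = """
SELECT link, COUNT(*) AS n
FROM links
WHERE org_id = 2 AND link != ''
GROUP BY link
"""


def plan_details(with_org_link_index: bool) -> List[str]:
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    if with_org_link_index:
        # The suggested index: equality column first, GROUP BY column second.
        conn.execute("CREATE INDEX links_org_link ON links (org_id, link)")
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


if __name__ == "__main__":
    print("without:", plan_details(False))
    print("with:   ", plan_details(True))
```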
The problem is the DISTINCT in SELECT DISTINCT `links`.*.
The GROUP BY is already doing the work of DISTINCT, so you are de-duplicating twice: once for SELECT DISTINCT and once for GROUP BY.
Try this:
SELECT *
FROM `links`
WHERE `org_id` = 2 AND (link != "")
GROUP BY link
I guess adding an index to your link column would improve performance.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
Only select the columns that you need.
Why is there a DISTINCT on links.*?
Do you have exact duplicate rows in your table?
On the other hand, changing the value "" to NULL could improve your SELECT statement, but I am not sure about this.

MySQL: Select statement and selectively retrieve data

I have a large dataset in MySQL and I would like to speed up the SELECT statement when reading data. For example, assuming there are 1000 records, I would like to issue a SELECT statement that retrieves half of them, chosen based on the timestamp.
Something like this will not work, because id is not tightly coupled with the timestamp:
select * from table where table.id mod 5 = 0;
Retrieving all the data and selecting the needed rows afterwards is not a solution, because I want to avoid retrieving the large dataset. So I'm looking for something that picks out the records within the SELECT itself.
Thanks
If you need speed, then try paging with LIMIT offset, count:
select * from table ORDER BY table.id DESC LIMIT 0,500;
select * from table ORDER BY table.id DESC LIMIT 500,500;
and so on...
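The paging idea can also be applied to the question's timestamp requirement: order by the timestamp column rather than id, and read one LIMIT window at a time. The minimal sketch below uses Python's sqlite3 module, which accepts MySQL's LIMIT offset, count form. The readings table, its created_at column, and the index on created_at are hypothetical. Note that large offsets still make the server step over every skipped row. This approach limits how much is transferred per call, not how much is scanned.

```python
import sqlite3
from datetime import datetime, timedelta

PAGE_SIZE = 500


def build_db(n_rows=1000):
    """Hypothetical readings table whose ids run opposite to the timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, value REAL)"
    )
    # Assumed index, so ORDER BY created_at can walk the index instead of sorting.
    conn.execute("CREATE INDEX readings_created_at ON readings (created_at)")
    start = datetime(2020, 1, 1)
    rows = [
        (n_rows - i, (start + timedelta(minutes=i)).isoformat(), float(i))
        for i in range(n_rows)
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def fetch_page(conn, page_number, page_size=PAGE_SIZE):
    """Return one page, newest first, using MySQL-style LIMIT offset, count."""
    offset = page_number * page_size
    return conn.execute(
        "SELECT id, created_at FROM readings ORDER BY created_at DESC LIMIT ?, ?",
        (offset, page_size),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for n in range(3):
        rows = fetch_page(conn, n)
        print(n, len(rows), rows[:1])
```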

MYSQL, Subquery Reference in Union

Is there any way to reference a subquery in a union?
I am trying to do something like the following. I would like to avoid a temporary table, but the subquery will be drawn from a much larger dataset, so it makes sense to run it only once:
SELECT * FROM (SELECT * FROM ads WHERE state='FL' AND city='Maitland' AND page='home' ORDER BY RAND()) AS sq WHERE spot = 'full-banner' LIMIT 1
UNION
SELECT * FROM sq WHERE spot = 'leaderboard' LIMIT 1
UNION
SELECT * FROM sq WHERE spot = 'rectangle1' LIMIT 1
UNION
SELECT * FROM sq WHERE spot = 'rectangle2' LIMIT 1
... etc.
It's a shame that DISTINCT can't be specified for a single column of a result set.
Well, there is no way to do what you're trying to do without repeating the creation of the derived table.
If querying ads is really expensive, then you should try adding an index like:
alter table ads add index (state, city, page, spot);
If the query still takes too long after adding that index, I'd recommend creating a table to store this data and then querying that table for each spot.
Depending on your data, you could play around with GROUP BY to get similar results.
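MySQL 8.0 added common table expressions, which postdate the answer above. With a CTE, the shared subquery can be written once in a WITH clause and referenced from every UNION branch. When MySQL does not merge a CTE into the outer query, it should materialize it only once, even if the CTE is referenced several times; EXPLAIN shows which strategy it chose. The minimal sketch below runs such a query through Python's sqlite3 module, with RAND() registered as a Python function so that the SQL matches MySQL's. The ads columns and rows are hypothetical.

Two changes from the question's query are worth noting. First, ORDER BY RAND() moves into each branch, because row order from a CTE or derived table is not guaranteed to carry over into the outer query. Second, UNION ALL replaces UNION, because the branches filter on different spots and cannot produce duplicate rows.

```python
import random
import sqlite3

# The shared subquery is written once in the WITH clause; each branch picks
# one random row for its spot. Derived tables are aliased, as MySQL requires.
QUERY = """
WITH sq AS (
    SELECT * FROM ads
    WHERE state = 'FL' AND city = 'Maitland' AND page = 'home'
)
SELECT * FROM (SELECT * FROM sq WHERE spot = 'full-banner' ORDER BY RAND() LIMIT 1) AS a
UNION ALL
SELECT * FROM (SELECT * FROM sq WHERE spot = 'leaderboard' ORDER BY RAND() LIMIT 1) AS b
UNION ALL
SELECT * FROM (SELECT * FROM sq WHERE spot = 'rectangle1' ORDER BY RAND() LIMIT 1) AS c
"""


def build_db():
    """Hypothetical ads rows: two candidates per spot in Maitland, plus noise."""
    conn = sqlite3.connect(":memory:")
    # SQLite spells it random(); register RAND() so the SQL matches MySQL's.
    conn.create_function("RAND", 0, random.random)
    conn.execute(
        "CREATE TABLE ads (id INTEGER PRIMARY KEY, state TEXT, city TEXT, page TEXT, spot TEXT)"
    )
    conn.execute("CREATE INDEX ads_lookup ON ads (state, city, page, spot)")
    rows = []
    for spot in ("full-banner", "leaderboard", "rectangle1"):
        rows += [("FL", "Maitland", "home", spot)] * 2
        rows.append(("FL", "Orlando", "home", spot))
    conn.executemany("INSERT INTO ads (state, city, page, spot) VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for row in build_db().execute(QUERY):
        print(row)
```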