Hi all,
I have 2 similar, very LARGE tables (1M rows each) with the same layout. I would like to UNION them and sort by a common column, start. I would also put a condition on start, i.e. start > X.
The problem is that the view doesn't use start's index, so the complexity rises sharply: a simple query takes about 15 seconds, and adding a LIMIT doesn't fix it because the results are cut off first.
CREATE VIEW CDR AS
(SELECT start, duration, clid FROM cdr_md ORDER BY start LIMIT 1000)
UNION ALL
(SELECT start, duration, clid FROM cdr_1025 ORDER BY start LIMIT 1000)
ORDER BY start ;
A query such as:
SELECT * FROM CDR WHERE start>10
doesn't return the expected results, because the LIMIT cuts off rows before the WHERE condition is applied.
The expected results would be those of a query like this:
CREATE VIEW CDR AS
(SELECT start, duration, clid FROM cdr_md WHERE start>X ORDER BY start LIMIT 1000)
UNION ALL
(SELECT start, duration, clid FROM cdr_1025 WHERE start>X ORDER BY start LIMIT 1000)
ORDER BY start ;
Is there a way to avoid this problem?
Thanks all,
Fabrizio
I have 2 similar tables ... with the same layout
This is contrary to the Principle of Orthogonal Design.
Don't do it. At least not without very good reason—with suitable indexes, 1 million records per table is easily enough for MySQL to handle without any need for partitioning; and even if one did need to partition the data, there are better ways than this manual kludge (which can give rise to ambiguous, potentially inconsistent data and lead to redundancy and complexity in your data manipulation code).
Instead, consider combining your tables into a single one with suitable columns to distinguish the records' differences. For example:
CREATE TABLE cdr_combined AS
SELECT *, 'md' AS orig FROM cdr_md
UNION ALL
SELECT *, '1025' AS orig FROM cdr_1025
;
DROP TABLE cdr_md, cdr_1025;
If you will always be viewing your data along the previously "partitioned" axis, include the distinguishing columns as index prefixes and performance will generally improve versus having separate tables.
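As a sketch of that index-prefix advice (the index name here is invented; `orig` is the distinguishing column from the example above):

```sql
-- Put the distinguishing column first in the index, so queries
-- along the old "partition" axis can use the index prefix:
ALTER TABLE cdr_combined ADD INDEX idx_orig_start (orig, start);

-- This query can then use the (orig, start) index for both
-- the filter and the sort:
SELECT start, duration, clid
  FROM cdr_combined
 WHERE orig = 'md' AND start > 10
 ORDER BY start;
```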
You then won't need to perform any UNION and your VIEW definition effectively becomes:
CREATE VIEW CDR AS
SELECT start, duration, clid FROM cdr_combined ORDER BY start;
However, be aware that queries on views may not always perform as well as using the underlying tables directly. As documented under Restrictions on Views:
View processing is not optimized:
It is not possible to create an index on a view.
Indexes can be used for views processed using the merge algorithm. However, a view that is processed with the temptable algorithm is unable to take advantage of indexes on its underlying tables (although indexes can be used during generation of the temporary tables).
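To stay on the merge-algorithm path, keep UNION, ORDER BY, and LIMIT out of the view body so the optimizer can push outer conditions down to the base table. A sketch, assuming the combined table from above:

```sql
-- With a plain SELECT in the body, MySQL can use the MERGE
-- algorithm: WHERE start > 10 is merged into the view's query
-- and the index on `start` (if one exists) can be used.
CREATE ALGORITHM = MERGE VIEW CDR AS
  SELECT start, duration, clid FROM cdr_combined;

SELECT * FROM CDR WHERE start > 10 ORDER BY start;
```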
Related
I have 1 query that returns over 180k rows. I need to make a slight change, so that it returns only about 10 less.
How do I show only the 10 rows as a result?
I've tried EXCEPT but it seems to return a lot more than just the 10.
You can use LIMIT. This will show the first n rows. Example:
SELECT * FROM Orders LIMIT 10
If you are trying to implement pagination, add OFFSET. This will skip the first 20 rows and return the next 10. Example:
SELECT * FROM Orders LIMIT 10 OFFSET 20
MySQL doesn't support EXCEPT (to my knowledge).
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
-- The rows returned by Query 1 but not Query 2 become
SELECT * FROM table WHERE `someDate` > '2016-01-01' AND `someDate` <= '2016-01-10'
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT
*
FROM (
SELECT * FROM table WHERE `someDate`>'2016-01-01'
UNION ALL
SELECT * FROM table WHERE `someDate`>'2016-01-10'
) AS sub
GROUP BY
`primaryKey`
HAVING
COUNT(1) = 1;
It's ugly though. And not efficient.
Assuming that the only difference is only that one side (I'll call it the "right hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subs) and filter to right-side-is-null. But that'd be dependent on all those caveats being true.
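That LEFT JOIN approach might be sketched like this (untested, and it assumes the caveats above hold; `primaryKey` and the date filters are carried over from the earlier examples):

```sql
-- Keep rows from the left-hand query that have no match
-- in the right-hand query (an anti-join):
SELECT a.*
  FROM (SELECT * FROM `table` WHERE `someDate` > '2016-01-01') AS a
  LEFT JOIN (SELECT * FROM `table` WHERE `someDate` > '2016-01-10') AS b
    ON a.`primaryKey` = b.`primaryKey`
 WHERE b.`primaryKey` IS NULL;
```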
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;
This query is inefficient and fails to execute. The track and desiredspeed tables each have almost a million records. After this, we want to self-join the track table for further processing. Any efficient approach to executing the query below is appreciated.
select
t_id,
route_id,
t.timestamp,
s_lat,
s_long,
longitude,
latitude,
SQRT(POW((latitude - d_lat),2) + POW((longitude - d_long),2)) as dst,
SUM(speed*18/5)/count(*) as speed,
'20' as actual_speed,
((20-(speed*18/5))/(speed*18/5))*100 as speed_variation
from
track t,
desiredspeed s
WHERE
LEFT(s_lat,6) = LEFT(latitude,6)
AND LEFT(s_long,6)=LEFT(longitude,6)
AND t_id > 53445
group by
route_id,
s_lat,
s_long
order by
t_id asc
Firstly, you are using old-style (Sybase-like, comma-separated) join syntax; I would change that to an explicit JOIN.
You are also performing two computations per joined row across large datasets; this is likely to be inefficient.
The join will not be able to use an index, because you are performing a computation on the columns. Either store the data precomputed, or add a computed column based on the rule applied above, and index it accordingly.
Finally, it may be quicker if you used temporary tables or common table expressions (although I do not know MySQL too well here).
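The precomputed-column suggestion could be sketched as follows (a hypothetical sketch using MySQL 5.7+ generated columns; the new column and index names are invented):

```sql
-- Materialize the truncated coordinates once as STORED generated
-- columns and index them, so the join no longer calls LEFT() per row:
ALTER TABLE track
  ADD COLUMN lat6  VARCHAR(6) AS (LEFT(latitude, 6)) STORED,
  ADD COLUMN long6 VARCHAR(6) AS (LEFT(longitude, 6)) STORED,
  ADD INDEX idx_track_coords (lat6, long6);

ALTER TABLE desiredspeed
  ADD COLUMN s_lat6  VARCHAR(6) AS (LEFT(s_lat, 6)) STORED,
  ADD COLUMN s_long6 VARCHAR(6) AS (LEFT(s_long, 6)) STORED,
  ADD INDEX idx_speed_coords (s_lat6, s_long6);

-- The join condition then becomes index-friendly:
--   FROM track t
--   JOIN desiredspeed s
--     ON s.s_lat6 = t.lat6 AND s.s_long6 = t.long6
```

On versions before 5.7 the same effect can be had with ordinary columns kept up to date by the application or by triggers.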
I am optimizing a query which involves a UNION ALL of two queries.
Both Queries have been more than optimized and run at less than one second separately.
However, when I perform the union of both, it takes around 30 seconds to calculate everything.
I won't bother you with the specific query, since they are optimized as they get, So let's call them Optimized_Query_1 and Optimized_Query_2
Number of rows from Optimized_Query_1 is roughly 100K
Number of rows from Optimized_Query_2 is roughly 200K
SELECT * FROM (
Optimized_Query_1
UNION ALL
Optimized_Query_2
) U
ORDER BY START_TIME ASC
I do require the results to be in order, but I find that with or without the ORDER BY at the end the query takes just as long, so it shouldn't make a difference.
Apparently the UNION ALL creates a temporary table in memory, from which the final results are then returned. Is there any way to work around this?
Thanks
You can't optimize UNION ALL as such: it simply stacks the two result sets on top of each other, with no extra step (unlike UNION, which additionally has to remove duplicates). The ORDER BY is what likely takes the additional time, since it has to sort the combined 300K rows.
You can try creating a VIEW out of this query.
I need to query MySQL with some condition and get five random, distinct rows from the result.
Say, I have a table named 'user', and a field named 'cash'. I can compose a SQL like:
SELECT * FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5
The result is good: totally random, unsorted rows, all different from each other, exactly what I want.
But I learned from Google that the efficiency is bad when the table gets large, because MySQL creates a temporary table with all the result rows and assigns each of them a random sorting index. The results are then sorted and returned.
Then I go on searching and got a solution like:
SELECT * FROM `user` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `user`)- (SELECT MIN(id) FROM `user`))+(SELECT MIN(id) FROM `user`)) AS id) AS t2 WHERE t1.id >= t2.id AND cash < 1000 ORDER BY t1.id LIMIT 5;
This method uses a JOIN and MAX(id), and its efficiency is better than the first one according to my testing. However, there is a problem: since I also need the condition cash < 1000, if the RAND() value is so big that no row after it has cash < 1000, then no rows are returned.
Does anyone have a good idea of how to compose SQL that has the same effect as the first query but with better efficiency?
Or shall I just do a simple query in MySQL and let PHP randomly pick 5 different rows from the result?
Your help is appreciated.
To make the first query faster, just SELECT id: that makes the temporary table rather small (it contains only IDs, not all the fields of each row), so it may fit in memory (temporary tables containing TEXT/BLOB columns are always created on disk, for example). Then, when you get the result, run a second query: SELECT * FROM xy WHERE id IN (a,b,c,d,...). As you mentioned, this approach is not very efficient, but as a quick fix this modification can make it several times faster.
One of the best approaches seems to be: get the total number of matching rows, choose random numbers, and for each one run a new query SELECT * FROM xy WHERE abc LIMIT $random,1. It should be quite efficient for 3-5 random rows, but not good if you want 100 random rows each time :)
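A sketch of that per-row random-offset approach (the actual offsets would be picked in application code; 41287 is just an example value):

```sql
-- 1) Count the matching rows once:
SELECT COUNT(*) FROM `user` WHERE cash < 1000;

-- 2) In application code, pick 5 distinct random integers
--    in [0, count).

-- 3) Fetch one row per random offset; each query only has to
--    skip up to that offset, not sort the whole table:
SELECT * FROM `user` WHERE cash < 1000 LIMIT 41287, 1;
```

Note that LIMIT with a large offset still scans and discards the skipped rows, so this pays off only while the offsets stay reasonably small relative to the table.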
Also consider caching your results. Often you don't need different random rows to be displayed on each page load. Generate your random rows only once per minute. If you will generate the data for example via cron, you can live also with query which takes several seconds, as users will see the old data while new data are being generated.
Here are some of my bookmarks for this problem for reference:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
I am querying a table for a set of data that may or may not have enough results for the correct operation of my page.
Typically, I would just broaden the range of my SELECT statement to ensure enough results, but in this particular case, we need to start with as small a range of results as possible before expanding (if we need to).
My goal, therefore, is to create a query that will search the db, determine whether it got a sufficient number of rows, and continue searching if it didn't. In PHP something like this would be very simple, but I can't figure out how to dynamically add SELECT statements to a query based on the current row count.
Here is a crude illustration of what I'd like to do - in a SINGLE query:
SELECT *, COUNT(`id`) FROM `blogs` WHERE `date` BETWEEN '2011-01-01' AND '2011-01-02' LIMIT 25
IF COUNT(`id`) < 25 {
SELECT * FROM `blogs` WHERE `date` BETWEEN '2011-01-02' AND '2011-01-03' LIMIT 25
}
Is this possible to do with a single query?
You have 2 possible solutions:
Compare the count on the application side, and if there are not enough rows, perform one more query. (It is not as bad as you think: query cache, memcached, proper indexes, enough memory on the server, etc.; there are a lot of possibilities to improve performance.)
Create a stored procedure.
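A sketch of option 2: a stored procedure that widens the date range only when the first range comes up short (the procedure name is invented; the table, dates, and limit are taken from the question):

```sql
DELIMITER //
CREATE PROCEDURE get_blogs()
BEGIN
  DECLARE n INT;

  -- Count rows in the narrow range first.
  SELECT COUNT(*) INTO n
    FROM `blogs`
   WHERE `date` BETWEEN '2011-01-01' AND '2011-01-02';

  IF n >= 25 THEN
    -- Enough rows: return results from the narrow range.
    SELECT * FROM `blogs`
     WHERE `date` BETWEEN '2011-01-01' AND '2011-01-02'
     LIMIT 25;
  ELSE
    -- Not enough: fall back to the wider range.
    SELECT * FROM `blogs`
     WHERE `date` BETWEEN '2011-01-01' AND '2011-01-03'
     LIMIT 25;
  END IF;
END //
DELIMITER ;

CALL get_blogs();
```

The application then issues a single CALL, and the two-step logic runs entirely on the server.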