MYSQL query too big - mysql

Despite having 8 GB of RAM, when I run this MySQL query I get an error because memory runs out. The reason is that I have a huge amount of data:
DELETE FROM bigtable_main where date = '2009-12-31';
Is there a way to split the above query up so that I can do rows 1 to 999,999 in one query, rows 1,000,000 to 1,999,999 in another query, etc.?

You could use the limit keyword:
DELETE FROM bigtable_main where date = '2009-12-31' LIMIT 1000000;
You simply run this query over and over again until there are no rows left to delete.
Deleting rows is more complex than you might guess, because MySQL's transaction semantics go to a lot of trouble to make it possible to roll back the deletion. If you do the deletion in smaller chunks (e.g. LIMIT 1000000 or even LIMIT 1000), you demand less rollback work from the MySQL server for each statement.
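The "run it over and over until nothing is left" loop can be sketched as follows. This is a minimal illustration, not a MySQL client: it uses Python's built-in sqlite3 module as a stand-in, and because stock SQLite builds do not accept DELETE ... LIMIT, the chunk is selected through a rowid subquery. Against MySQL the statement would simply be the DELETE ... LIMIT shown above. The function name delete_in_chunks, the id column and the sample row counts are assumptions made for the example.

```python
import sqlite3

def delete_in_chunks(conn, day, chunk_size=1000):
    """Delete all rows for `day` in batches; return how many batches removed rows."""
    batches = 0
    while True:
        # SQLite stand-in for MySQL's "DELETE ... WHERE date = ? LIMIT ?"
        cur = conn.execute(
            "DELETE FROM bigtable_main WHERE rowid IN ("
            " SELECT rowid FROM bigtable_main WHERE date = ? LIMIT ?)",
            (day, chunk_size),
        )
        conn.commit()  # commit each chunk so no single transaction grows large
        if cur.rowcount == 0:
            break  # nothing left to delete
        batches += 1
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable_main (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO bigtable_main (date) VALUES (?)",
    [("2009-12-31",)] * 2500 + [("2010-01-01",)] * 10,
)
conn.commit()

batches = delete_in_chunks(conn, "2009-12-31", chunk_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM bigtable_main").fetchone()[0]
print(batches, remaining)
```

Committing after every chunk is the point of the exercise: each statement only has to keep rollback information for its own batch.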

Related

Understanding MySQL LIMIT with a non-indexed column

I have this query, which is very simple, but I don't want to use an index here due to some constraints.
My worry is how to avoid a huge load on the server when the WHERE clause references a non-indexed column.
The solution, I feel, will be LIMIT.
I am sure the data I need is within 1000 rows, so if I use LIMIT I can check the available values.
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
Here the student column is not indexed in MySQL, so my worry is that it will cause a huge load on the server.
I tried EXPLAIN and it seems to be OK, but the problem is that the table currently has few rows, and as you know MySQL slows down badly with more data, such as millions of rows.
So what are my options?
Should I add an index for student?
If I add an index, I don't need 1000 rows in the LIMIT; one row is sufficient. But as I said, the table is going to have several million rows, so the index would require a lot of space, and I was thinking of avoiding an index on the student column. My other thought is that a 1000-row query in descending order should not cause load on the server, as id is indexed.
Any help would be great.
You say:
but i dont want to use index here due to some constraints...
and also say:
how to avoid huge load on server...
If you don't use an index, you'll produce "huge load" on the server. If you want this query to be less resource intensive, you need to add an index. For the aforementioned query the ideal index is (MySQL requires an index name; the one used here is arbitrary):
create index idx_student_status_id on tableA (student, status, id);
This index should make your query very fast, even with millions of rows.
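One way to check that such an index is actually picked up is to ask the database for its query plan. The sketch below does this with Python's built-in sqlite3 module as a stand-in for MySQL (where you would use EXPLAIN instead of EXPLAIN QUERY PLAN). The index name idx_student_status_id, the column types and the sample student value are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (id INTEGER PRIMARY KEY, student TEXT, status TEXT)"
)
# Composite index: equality columns first, then the ORDER BY column.
conn.execute(
    "CREATE INDEX idx_student_status_id ON tableA (student, status, id)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM tableA "
    "WHERE status = '1' AND student = ? "
    "ORDER BY id DESC LIMIT 1000",
    ("S123",),  # hypothetical student number, passed as a bound parameter
).fetchall()

# The fourth column of each plan row holds the human-readable step.
plan_text = " | ".join(row[3] for row in plan)
uses_index = "idx_student_status_id" in plan_text
print(plan_text)
```

Passing the student number as a bound parameter, rather than interpolating '$student_no' into the SQL string, also avoids SQL injection.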
LIMIT 1000 doesn't force the database to search only the first 1000 rows.
It just stops searching after 1000 matches are found.
So on its own it does not bound how many rows are examined, and it is not a performance tool here.
In the query below
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
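The query will run until it finds 1000 matches (or runs out of rows).
It doesn't search only the first 1000 rows.
So this is the behaviour of the above query:
int nb_rows_matched = 0;
// has_more_rows() is a hypothetical helper: the scan also ends when the table is exhausted
while (nb_rows_matched < 1000 && has_more_rows()) {
    if (search_for_match()) {
        nb_rows_matched++;  // only rows that satisfy the WHERE clause count towards the limit
    }
}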
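To make the difference between "rows examined" and "rows matched" concrete, here is a small self-contained simulation in Python. It is not how MySQL is implemented; it only models a sequential scan without an index. The function scan_with_limit and the sample data are made up for the example.

```python
def scan_with_limit(rows, predicate, limit):
    """Sequential scan that stops once `limit` matching rows have been found.

    Returns (matches, rows_examined).
    """
    matches = []
    examined = 0
    for row in rows:
        examined += 1
        if predicate(row):
            matches.append(row)
            if len(matches) == limit:
                break  # LIMIT reached: stop scanning
    return matches, examined

# 10,000 rows; only every 100th row belongs to student "S1".
rows = [
    {"id": i, "student": "S1" if i % 100 == 0 else "S2"}
    for i in range(1, 10001)
]

matches, examined = scan_with_limit(rows, lambda r: r["student"] == "S1", limit=10)
print(len(matches), examined)  # prints "10 1000"
```

Ten matches are returned, but a thousand rows had to be read to find them. The rarer the matching rows, the more of the table an unindexed scan has to read before LIMIT can stop it.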

MySQL query takes more time to fetch data [MySQL]

I have a table with 500000 records in my MySQL server. Queries against it take a long time to execute, sometimes more than a minute.
Below are the details of my MySQL machine.
RAM: 16GB
Processor: Intel(R) Core™ i5-4460M CPU @ 3.20GHz
OS: Windows Server 64-bit
I know there is no problem with my machine, since it is a standalone machine with no other applications running on it.
Maybe the problem is with my query. I have gone through the MySQL site and found that I have used the proper syntax, but I don't know the exact reason for the delay in the result.
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY 2 ASC
LIMIT 200 OFFSET 0
Can anyone please let me know whether the duration depends on the table or on the query that I have used?
Note: Even with indexing, there is not much difference in the result time.
Thanks
Two approaches to this:
One approach is to create a covering index on the columns needed to satisfy your query. The correct index for your query contains these columns in this order: (SalesChannel, UnitPrice).
Why does this help? For one thing, the index itself contains all data needed to satisfy your query, and nothing else. This means your server does less work.
For another thing, MySQL's indexes are BTREE-organized. That means they're accessible in order. So your query can be satisfied one SalesChannel at a time, and MySQL doesn't need an internal temporary table. That's faster.
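Whether a covering index is really used can be checked from the query plan. The following is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL; in MySQL the equivalent check is EXPLAIN, where a covering index shows up as "Using index" in the Extra column. The index name idx_channel_price and the extra Region column are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samplesalesdata50000 ("
    " SalesChannel TEXT, UnitPrice REAL, Region TEXT)"  # Region: hypothetical extra column
)
# Covering index: grouping column first, then the aggregated column.
conn.execute(
    "CREATE INDEX idx_channel_price "
    "ON samplesalesdata50000 (SalesChannel, UnitPrice)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(UnitPrice) AS UnitPrice, SalesChannel AS Channel "
    "FROM samplesalesdata50000 "
    "GROUP BY SalesChannel "
    "ORDER BY 2 ASC "
    "LIMIT 200 OFFSET 0"
).fetchall()

plan_text = " | ".join(row[3] for row in plan)
print(plan_text)
```

If the plan mentions the covering index, the query is being answered from the index alone, without touching the table rows.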
A second approach involves recognizing that ORDER BY ... LIMIT is a notorious performance antipattern. You require MySQL to sort a big mess of data, and then discard most of it.
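You could try this (MySQL rejects LIMIT inside an IN subquery, so the limited list of channels is written here as a derived table and joined):
SELECT SUM(s.UnitPrice) UnitPrice,
s.SalesChannel Channel
FROM samplesalesdata50000 s
JOIN (
SELECT DISTINCT SalesChannel
FROM samplesalesdata50000
ORDER BY SalesChannel LIMIT 200 OFFSET 0
) c ON c.SalesChannel = s.SalesChannel
GROUP BY s.SalesChannel
ORDER BY s.SalesChannel
LIMIT 200 OFFSET 0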
If you have an index on SalesChannel (the covering index mentioned above works) this should speed you up a lot, because your aggregate (GROUP BY) query need only consider a subset of your table.
The problem may be with "ORDER BY 2 ASC". Try "ORDER BY Channel" instead.
If it were MS SQL Server you would use WITH (NOLOCK);
the MySQL equivalent is
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY SalesChannel ASC
LIMIT 200 OFFSET 0
COMMIT ;
To improve on OJones's answer, note that
SELECT SalesChannel FROM samplesalesdata50000
ORDER BY SalesChannel LIMIT 200, 1
will quickly (assuming the index given) find the end of the desired list. Then adding this limits the main query to only the rows needed:
WHERE SalesChannel < (that-select)
There is, however, a problem. If there are 200 or fewer rows in the table, the subquery will return nothing.
You seem to be setting up for "paginating"? In that case, a similar technique can be used to find the starting value:
WHERE SalesChannel >= ...
AND SalesChannel < ...
This also avoids using the inefficient OFFSET, which has to read, then toss, all the rows being skipped over.
But the real solution may be to build and maintain a Summary Table of the data. It would contain subtotals for each, say, month. Then run the query against the Summary table -- it might be 10x faster.
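The summary-table idea can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in for MySQL. The OrderMonth column, the sales_summary table name and the sample rows are assumptions; the original table definition is not shown in the question, and a real summary table would also need to be kept up to date as new rows arrive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samplesalesdata50000 ("
    " SalesChannel TEXT, OrderMonth TEXT, UnitPrice REAL)"  # OrderMonth is hypothetical
)
conn.executemany(
    "INSERT INTO samplesalesdata50000 VALUES (?, ?, ?)",
    [
        ("Online", "2017-01", 10.0),
        ("Online", "2017-01", 5.0),
        ("Online", "2017-02", 2.5),
        ("Store", "2017-01", 7.0),
    ],
)

# Build the summary once: one row per (channel, month) instead of one per sale.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT SalesChannel, OrderMonth, SUM(UnitPrice) AS UnitPriceTotal "
    "FROM samplesalesdata50000 "
    "GROUP BY SalesChannel, OrderMonth"
)

# The report now aggregates the much smaller summary table.
totals = conn.execute(
    "SELECT SalesChannel, SUM(UnitPriceTotal) "
    "FROM sales_summary "
    "GROUP BY SalesChannel "
    "ORDER BY SalesChannel"
).fetchall()
print(totals)  # [('Online', 17.5), ('Store', 7.0)]
```

The report query reads one row per channel and month rather than one row per sale, which is where the speed-up would come from on a large table.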

SQL delete query is taking too long , 20k records to be deleted from half million records

In my database I have around 500,000 records. I ran a query to delete around 20,000 of them.
It's been 45 minutes and HeidiSQL is still showing that the command is being executed.
Here is my command:
DELETE FROM DIRECTINOUT_MSQL WHERE MACHINENO LIKE '%TEXAOUT%' AND DATASENT = 1 AND UPDATETIME = NULL AND DATE <= '2017/4/30';
Please advise how to avoid this kind of situation in future, and what I should do now. Should I break this query up with some condition and execute smaller queries?
I have exported my database backup file; it's around 47 MB.
Kindly advise.
Try adding an index. It will improve your query performance.
Database index, or just index, helps speed up the retrieval of data from tables. When you query data from a table, first MySQL checks if the indexes exist, then MySQL uses the indexes to select exact physical corresponding rows of the table instead of scanning the whole table.
https://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
Which engine are you using, MyISAM or InnoDB?
I guess you are using a MySQL database.
What is the isolation level set on the database?
How much time does a SELECT of that data take? Note that some clients apply a row limit, which can make you think the query finished within a few seconds when you actually got only part of the results.
I had this problem once. I solved it using a loop.
You first have to write a fast SELECT to check whether there are records left to delete:
SELECT COUNT(SOME_ID) FROM DIRECTINOUT_MSQL WHERE MACHINENO LIKE '%TEXAOUT%' AND DATASENT = 1 AND UPDATETIME IS NULL AND DATE <= '2017/4/30' LIMIT 1
While this query returns a count greater than zero, run this:
DELETE FROM DIRECTINOUT_MSQL WHERE MACHINENO LIKE '%TEXAOUT%' AND DATASENT = 1 AND UPDATETIME IS NULL AND DATE <= '2017/4/30' LIMIT 1000
Note that the comparison UPDATETIME = NULL never matches any row; use IS NULL to test for NULL.
I just chose 1000 arbitrarily; you have to find the fastest limit for the DELETE statement on your server configuration.
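A detail worth checking in the queries of this thread is the condition on UPDATETIME. In SQL, comparing anything to NULL with = yields NULL rather than true, so such a condition selects no rows; the test has to be written with IS NULL. The sketch below demonstrates the difference using Python's built-in sqlite3 module as a stand-in (MySQL behaves the same way for this comparison). The table is reduced to a hypothetical two-column version with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIRECTINOUT_MSQL (id INTEGER PRIMARY KEY, UPDATETIME TEXT)"
)
conn.executemany(
    "INSERT INTO DIRECTINOUT_MSQL (UPDATETIME) VALUES (?)",
    [(None,), (None,), ("2017-04-01 10:00:00",)],
)

# "= NULL" evaluates to NULL (unknown), so no row passes the filter.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM DIRECTINOUT_MSQL WHERE UPDATETIME = NULL"
).fetchone()[0]

# "IS NULL" is the correct test for missing values.
is_null = conn.execute(
    "SELECT COUNT(*) FROM DIRECTINOUT_MSQL WHERE UPDATETIME IS NULL"
).fetchone()[0]

print(eq_null, is_null)  # prints "0 2"
```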

Orientdb GC overhead limit exceeded/out of memory error and slow performance

My OrientDB database has around 2.3 million records. I'm trying to query all duplicate records (there are around 750,000 of them) using this statement:
SELECT FROM (select PROP1, PROP2, count(*) as c from vin_data group by PROP1 ) where c > 1
When I set the limit to around 200, it takes around 180s to run (which I believe is slow). But when I set the limit to 750000, it gives me an out-of-memory error. My RAM is 4GB and I have set Xms64m and Xmx3600m. I have set indexes on PROP1 and on PROP1+PROP2 (composite). My question is: is 4GB of RAM enough for a 2.3 million record database?
For the query above both indexes are worthless, because they are not used in the GROUP BY. Without any "where" condition, the entire class is scanned. You could try optimizing it by adding the PARALLEL keyword at the end of the statement. If you have multiple cores it should be much faster.
Anyway, with the upcoming release v3.0 (still in pre-alpha) a lot of effort has been put in the new SQL engine and queries like yours should be much faster.

how group by having limit works

Can someone explain exactly how the combination GROUP BY + HAVING + LIMIT works? MySQL query:
SELECT
id,
avg(sal)
FROM
StreamData
WHERE
...
GROUP BY
id
HAVING
avg(sal)>=10.0
AND avg(sal)<=50.0
LIMIT 100
The query without the LIMIT and HAVING clauses executes in 7 seconds; with LIMIT it returns instantly if the condition covers a large amount of data, or in ~7 seconds otherwise.
The documentation says that LIMIT is applied after HAVING, which is applied after GROUP BY; this means that the query should always take ~7 seconds. Please help me figure out what is limited by the LIMIT clause.
Using LIMIT 100 simply tells MySQL to return only the first 100 records from your result set. Assuming that you are measuring the query time as the round trip from Java, then one component of the query time is the network time needed to move the result set from MySQL across the network. This can take a considerable time for a large result set, and using LIMIT 100 should reduce this time to zero or near zero.
Things are logically applied in a certain pipeline in SQL:
Table expressions are generated and executed (FROM, JOIN)
Rows filtered (WHERE)
Projections and aggregations applied (column list, aggregates, GROUP BY)
Aggregations filtered (HAVING)
Results limited (LIMIT, OFFSET)
Now these may be composed into a different execution order by the planner if that is safe but you always get the proper data out if you think through them in this order.
So group by groups, then these are filtered with having, then the results of that are truncated.
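The logical order described above (WHERE, then GROUP BY, then HAVING, then LIMIT) can be observed on a tiny data set. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows, the WHERE condition (the original is elided) and the added ORDER BY id, which makes the LIMIT deterministic, are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StreamData (id INTEGER, sal REAL)")
conn.executemany(
    "INSERT INTO StreamData VALUES (?, ?)",
    [
        (1, 20.0), (1, 40.0),    # avg 30 -> passes HAVING
        (2, 5.0), (2, 5.0),      # avg 5  -> rejected by HAVING
        (3, 60.0),               # avg 60 -> rejected by HAVING
        (4, 10.0), (4, 30.0),    # avg 20 -> passes HAVING
        (5, 50.0), (5, -999.0),  # WHERE drops -999 first, so avg 50 -> passes
    ],
)

result = conn.execute(
    "SELECT id, AVG(sal) "
    "FROM StreamData "
    "WHERE sal >= 0 "                               # 1. rows filtered
    "GROUP BY id "                                  # 2. groups formed, averages computed
    "HAVING AVG(sal) >= 10.0 AND AVG(sal) <= 50.0 " # 3. groups filtered
    "ORDER BY id "
    "LIMIT 2"                                       # 4. result truncated
).fetchall()
print(result)  # [(1, 30.0), (4, 20.0)]
```

Three groups (1, 4 and 5) survive HAVING, and LIMIT then keeps only the first two. Group 5 qualifies only because WHERE removed its negative row before the average was computed.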
As soon as MySQL has sent the required number of rows to the client,
it aborts the query unless you are using SQL_CALC_FOUND_ROWS. The
number of rows can then be retrieved with SELECT FOUND_ROWS(). See
Section 13.14, “Information Functions”.
http://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
This effectively means that if your table has a rather hefty number of rows, the server doesn't need to look at all of them. It can stop as soon as it has found 100 rows, because it knows that's all you need.