My Google Cloud SQL table currently has 1,126,571 rows, with a minimum of 30 thousand added every day. When I execute the query:
select count(distinct sno) as tot from visits
at the SQL prompt, it generates the following error:
Error 0: Unable to execute statement
Is Cloud SQL subject to a 60-second query timeout? How can I overcome the problem as the table grows large?
Break the table into two tables: one to receive new visits (the transaction table) and one for reporting. Index the reporting table, and transfer and clear data on a regular basis.
The transaction table will remain relatively small and thus will be fast to count. The reporting table will be fast to count because of the index.
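A minimal sketch of that layout, assuming sno is monotonically increasing (the table names are illustrative, not from the question):

-- Small table that receives new visits; stays fast to count.
CREATE TABLE visits_current LIKE visits;

-- Reporting table, indexed on sno so COUNT(DISTINCT sno) stays cheap.
CREATE TABLE visits_reporting LIKE visits;
ALTER TABLE visits_reporting ADD INDEX (sno);

-- Periodic transfer: copy rows up to a recorded boundary, then delete them,
-- so visits arriving mid-transfer are not lost.
SET @boundary = (SELECT MAX(sno) FROM visits_current);
INSERT INTO visits_reporting SELECT * FROM visits_current WHERE sno <= @boundary;
DELETE FROM visits_current WHERE sno <= @boundary;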
Add an INDEX on your sno column and it will improve performance:
ALTER TABLE visits ADD INDEX (sno);
Try splitting your select query into many parts: for example, the first select is limited to 50,000 rows, then the second starts at 50,000 and is limited to another 50,000, and so on.
You can do that with the following scenario:
1- Get the record count.
2- Make a loop that ends at the record count.
3- In each iteration, select 50,000 records and append the results to a datatable (depending on your programming language).
4- In the next iteration, start selecting from where the previous one ended; for example, the second query selects the next 50,000 records, and so on.
You can specify the starting index of your select with this SQL statement:
SELECT * FROM mytable ORDER BY somefield LIMIT 50000 OFFSET 0;
Then you will get all the data you want.
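For example, successive chunks would look like this (ordering by a unique column keeps the chunks stable between queries; somefield is illustrative):

SELECT * FROM mytable ORDER BY somefield LIMIT 50000 OFFSET 0;      -- rows 1 to 50000
SELECT * FROM mytable ORDER BY somefield LIMIT 50000 OFFSET 50000;  -- rows 50001 to 100000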
NOTE: run a test to see the maximum number of records that can be loaded in 60 seconds; this will reduce the number of loops and therefore increase performance.
I don't have real code, sorry, only a description of the problem.
I would like to understand the best way to solve it.
I have 3 queries:
The first one is a long transaction which performs an SQL INSERT statement into a table.
The second query COUNTs the number of rows in that table after the INSERT has taken place.
The third query UPDATEs one field of the previously inserted record with the count retrieved by the second query.
So far so good. My 3 queries are executed correctly.
Now suppose these 3 queries are executed inside an API call. What happens is that if multiple API calls run too fast and simultaneously, the second COUNT query retrieves a wrong value, and consequently the third UPDATE writes a wrong value as well.
On top of that, I get deadlocks on the INSERT query, because while one call is making its INSERT, the SELECT COUNT of a second API call tries to read the table at the same time.
My question is: what would be the best approach to solve this kind of problem?
I don't need code; I just would like to understand the best way to go.
Would I need to lock all the tables, for example?
It is unclear what you are doing, but this might be faster:
CREATE TEMPORARY TABLE t ...; -- all columns except count
INSERT INTO t ...; -- the incoming data
SELECT COUNT(*) INTO @ct FROM t;
INSERT INTO real_table
    (...) -- including the count column last
    SELECT ..., @ct FROM t; -- note how the count is tacked on last
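As a concrete sketch with hypothetical columns a and b (the real column lists come from your schema):

CREATE TEMPORARY TABLE t (a INT, b VARCHAR(50));   -- all columns except the count
INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y');    -- the incoming data
SELECT COUNT(*) INTO @ct FROM t;
INSERT INTO real_table (a, b, row_count)           -- count column last
SELECT a, b, @ct FROM t;
DROP TEMPORARY TABLE t;

Because the count is computed from the staged batch rather than from real_table, concurrent API calls no longer read each other's half-finished inserts.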
Assume I want to find the ids that appear with both mode=1 and mode=2:
SELECT id FROM tab a WHERE a.mode=1 AND (SELECT COUNT(*) FROM tab b WHERE b.mode=2 AND a.id=b.id) > 0
and I need this query to run very quickly, even though the table contains millions of rows (I already have an index on id and one on mode). Is there a way to create something like a view that contains this query and is updated automatically every time the table changes, so the results are prepared for me in advance?
You can create a table called summary_tab and use a programming language or the command line to execute a query like this:
insert into summary_tab
select id from ...
Then use a task scheduler like cron to execute the script or command line every few minutes.
The other option is to create an AFTER INSERT trigger on your table that executes a query like this and updates the summary table. However, if the query takes a long time, and/or you insert records into the tab table in bulk, the trigger will slow the inserts down.
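A sketch of the trigger variant, keeping the per-insert work small by refreshing only the id that just changed (trigger and table names assumed):

DELIMITER //
CREATE TRIGGER tab_after_insert AFTER INSERT ON tab
FOR EACH ROW
BEGIN
  -- Recompute the summary row for this id only, not the whole table.
  DELETE FROM summary_tab WHERE id = NEW.id;
  INSERT INTO summary_tab (id)
  SELECT id FROM tab
  WHERE id = NEW.id AND mode IN (1, 2)
  GROUP BY id
  HAVING COUNT(DISTINCT mode) = 2;
END//
DELIMITER ;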
You could also try something like this:
select id
from tab
where mode in (1, 2)
group by id
having count(*) = 2
Check the speed and results of this query. If it is not fast enough, try creating an index on id, another on mode, and yet another on the combination of id and mode, then see whether one of them makes the query fast enough that you don't need a summary table. (If the same id can have duplicate rows for one mode, use HAVING COUNT(DISTINCT mode) = 2 instead.)
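For instance (the index names are illustrative):

ALTER TABLE tab ADD INDEX idx_id (id);
ALTER TABLE tab ADD INDEX idx_mode (mode);
ALTER TABLE tab ADD INDEX idx_id_mode (id, mode);

The combined (id, mode) index can cover the GROUP BY query entirely, so MySQL never has to touch the table rows.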
I have a table with 40,000 rows. On the code side, around 20,000 users will need to run a query at almost exactly the same second to find their related row. What is the better approach here?
Loading all 40,000 rows into a cache and running a for loop over them to find the record?
Or simply querying the database?
Here is what the query will look like, where the parameter is the user's IP:
SELECT * FROM iplist where ipfrom <= INET_ATON('xxx.xxx.xx.xx') limit 1;
MySQL already caches the data, in the form of the InnoDB buffer pool. As pages of data and indexes are requested, they are copied to RAM and used for any subsequent queries.
You should define an index for the column you search on, if you don't already have an index or a primary key defined for that column:
ALTER TABLE iplist ADD INDEX (ipfrom);
Then searching for a specific value in that column won't require a table-scan, it will narrow down the search efficiently.
Note that when you use LIMIT you should also use ORDER BY; otherwise the row you get is simply the first one read in index order, which may not be what you want. For this kind of range lookup you want the greatest ipfrom that is still <= the address, hence the DESC below. If the ORDER BY is redundant (i.e. it matches the order in which the index is read), it will be optimized out.
SELECT * FROM iplist WHERE ipfrom <= INET_ATON(?) ORDER BY ipfrom DESC LIMIT 1;
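A quick way to confirm the index is doing its job (EXPLAIN is standard MySQL; the literal address is just an example):

EXPLAIN SELECT * FROM iplist WHERE ipfrom <= INET_ATON('203.0.113.9') ORDER BY ipfrom DESC LIMIT 1;

The plan should show a range access on the ipfrom index rather than a full table scan.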
I have a table with 10 columns and about 8 million rows. I'm running a statistics job against this table. The problem is that the longer it runs and the higher id grows, the slower the select query becomes.
Here is the query:
select * from transaction
where id > :pointer
AND col_a = :col_a
AND col_b >= :from
order by id ASC limit 5000
All 3 fields in the query have indexes.
After each loop I run the query again with a new pointer value; the pointer is the id of the last row of the previous result set. I don't use OFFSET.
In the end it took half a day to run the script with this query, which is far too long.
How can I fix this performance problem?
All 3 fields in the query have indexes
As a rule, MySQL can use only one index per table in a query. If you created separate indexes for each field, MySQL can use only one of them to speed up your query, not all 3.
I would create a multi-column index on the id, col_a, col_b fields (in this order). That way a single index can satisfy all 3 conditions in the where criteria, as well as the order by.
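Something like this (the index name is illustrative; depending on your data distribution, leading with the equality column instead, i.e. (col_a, col_b, id), is also worth testing):

ALTER TABLE transaction ADD INDEX idx_pointer_scan (id, col_a, col_b);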
After each loop I run the query again with a new pointer value
Your code suggests that you use some kind of parametrised query, but we cannot tell whether it is a proper MySQL prepared statement. If it is not, consider using a MySQL prepared statement for this process.
Prepare the query before the loop, then inside the loop just adjust the parameters and execute the prepared statement again. That way MySQL parses the query only once, not on every iteration.
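In plain MySQL the pattern looks like this (a sketch; the literal parameter values are made up):

PREPARE page_stmt FROM
  'SELECT * FROM transaction
   WHERE id > ? AND col_a = ? AND col_b >= ?
   ORDER BY id ASC LIMIT 5000';

SET @pointer = 0, @col_a = 42, @from = 100;
EXECUTE page_stmt USING @pointer, @col_a, @from;   -- first page
SET @pointer = 987654;                             -- id of last row from the previous page
EXECUTE page_stmt USING @pointer, @col_a, @from;   -- next page, parsed only once
DEALLOCATE PREPARE page_stmt;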
I have a table 'tbl', something like this:
ID bigint(20) - primary key, autoincrement
field1
field2
field3
That table has 600k+ rows.
Query:
SELECT * FROM tbl ORDER BY ID LIMIT 600000, 1 -- takes 1.68 seconds
Query:
SELECT ID, field1 FROM tbl ORDER BY ID LIMIT 600000, 1 -- takes 1.69 seconds
Query:
SELECT ID FROM tbl ORDER BY ID LIMIT 600000, 1 -- takes 0.16 seconds
Query:
SELECT * FROM tbl WHERE ID = xxx -- takes 0.005 seconds
Those queries were tested in phpMyAdmin.
And the result is that query 3 and query 4 together return the necessary data.
Query 1 does the same job, but much slower...
This doesn't look right to me.
Could anyone give any advice?
P.S. I'm sorry for the formatting; I'm new to this site.
New test:
Q5: CREATE TEMPORARY TABLE tmptable AS (SELECT ID FROM tbl ORDER BY ID LIMIT 600030, 30);
SELECT * FROM tbl WHERE ID IN (SELECT ID FROM tmptable); -- takes 0.38 sec
I still don't understand how this is possible. I recreated all the indexes... What else can I do with this table? Delete and refill it manually? :)
Query 1 looks at the table's primary key index, finds the correct 600,000 ids and their corresponding locations within the table, then goes to the table and fetches everything from those 600k locations.
Query 2 looks at the table's primary key index, finds the correct 600k ids and their corresponding locations within the table, then goes to the table and fetches whichever subset of fields is asked for from those 600k rows.
Query 3 looks at the table's primary key index, finds the correct 600k ids, and returns them. It doesn't need to look at the table at all.
Query 4 looks at the table's primary key index, finds the single entry requested, goes to the table, reads that single entry, and returns it.
Time-wise, let's build backwards:
(Q4) The table index allows lookup of a key (id) in O(log n) time, meaning every time the table doubles in size it only takes one extra step to find the key in the index*. If you have 1 million rows, then, it would only take ~20 steps to find it. A billion rows? 30 steps. The index entry includes data on where in the table to go to find the data for that row, so MySQL jumps to that spot in the table and reads the row. The time reported for this is almost entirely overhead.
(Q3) As I mentioned, the table index is very fast; this query finds the first entry and just traverses the tree until it has the requested number of rows. I'm sure I could calculate the precise number of steps it would take, but as a maximum we'll say 20 steps x 600k rows = 12M steps; since it's traversing a tree it would likely be more like 1M steps, but the precise number is largely irrelevant. The most important thing to realize here is that once MySQL has walked the index to pull the ids it needs, it has everything you asked for. There's no need to go look at the table. The time reported for this one is essentially the time it takes MySQL to walk the index.
(Q2) This begins with the same tree-walking as discussed for query 3, but while pulling the IDs it needs, MySQL also pulls their location within the table files. It then has to go to the table file (probably already cached/mmapped in memory), and for every entry it pulled, seek to the proper place in the table and get the fields requested out of those rows. The time reported for this query is the time it takes to walk the index (as in Q3) plus the time to visit every row specified in the index.
(Q1) This is identical to Q2 when all fields are specified. Since the time is essentially the same as Q2, we can see that it doesn't take measurably more time to pull more fields out of the database; any such time is dwarfed by crawling the index and seeking to the rows.
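This also explains the asker's Q5 result: the expensive LIMIT walk is done over the index alone (the fast Q3 case), and full rows are then fetched only for the few surviving ids. The same idea as a single deferred-join query (a sketch):

SELECT t.*
FROM tbl AS t
JOIN (SELECT ID FROM tbl ORDER BY ID LIMIT 600000, 1) AS x ON x.ID = t.ID;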
*: Most databases use an indexing data structure (B-trees for MySQL) that has a log base much higher than 2, meaning that instead of an extra step every time the table doubles, it's more like an extra step every time the table size goes up by a factor of hundreds to thousands. This means that instead of the 20-30 steps I stated in the example, it's more like 2-5.