MySQL and LabVIEW

I have a table with 27 columns and 300,000 rows of data, of which 8 columns contain 0, 1, or NULL. Using LabVIEW, I get the total count for each of these columns using the following query:
select
d_1_result,
d_2_value_1_result,
de_2_value_2_result,
d_3_result,
d_4_value_1_result,
d_4_value_2_result,
d_5_result
from Table_name_vp
where ( insp_time between
"15-02-02 06:00:00" and "15-02-02 23:59:59" or
inspection_time between "15-02-03 00:00:00" and "15-02-03 06:00:00")
and partname = "AbvQuene";
This query runs for the number of days the user inputs, for example 120 days.
I found that the total time taken by the query is 8 secs, which is not good.
I want to reduce the time to 8 millisecs.
I have also changed the engine to MyISAM.
Any suggestions to reduce the time consumed by the query? (The LabVIEW processing is not what takes the time.)

It depends on the data, and how many rows out of the 300,000 are actually selected by your WHERE clause. Obviously if all 300,000 are included, the whole table will need to be read. If it's a smaller number of rows, an index on insp_time or inspection_time (is this just a typo, are these actually the same field?) and/or partname might help. The exact index will depend on your data.
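For example, a composite index covering the filter columns might look like this (the column names are taken from the query above; whether insp_time or inspection_time is the actual field still needs confirming):

CREATE INDEX idx_part_insptime ON Table_name_vp (partname, insp_time);

With partname first, the equality test narrows the rows before the range scan on the time column, so only the matching slice of the index has to be read.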

Update 2:
I can't see any reason why you wouldn't be able to load your whole DB into memory, because it should be less than 60 MB. Do you agree with this?
Please post your answers to the following questions (you can edit a question after you have asked it - that's easier than commenting).
Next steps:
I should have mentioned this before: before you run a query in LabVIEW, I would always test it first using your DB admin tool (e.g. MySQL Workbench). Please post whether that worked or not.
Post your LabVIEW code.
You can try running your query with less than 300K rows - say 50K - and see how much your memory increases. If there's some limitation on how many rows you can query at one time, then you can break your giant query into smaller ones pretty easily and just add up the result sets. A rough sketch of this is shown below.
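A rough sketch of the chunked approach, reusing the query from the question (one statement per slice of the time range, with the result sets added up in LabVIEW afterwards):

SELECT d_1_result, d_2_value_1_result, de_2_value_2_result,
       d_3_result, d_4_value_1_result, d_4_value_2_result, d_5_result
FROM Table_name_vp
WHERE insp_time BETWEEN "15-02-02 06:00:00" AND "15-02-02 23:59:59"
  AND partname = "AbvQuene";

-- next chunk: same columns, next slice of the time range
SELECT d_1_result, d_2_value_1_result, de_2_value_2_result,
       d_3_result, d_4_value_1_result, d_4_value_2_result, d_5_result
FROM Table_name_vp
WHERE insp_time BETWEEN "15-02-03 00:00:00" AND "15-02-03 06:00:00"
  AND partname = "AbvQuene";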
Update:
It sounds like there's something wrong with your schema.
For example, if you had 27 columns of doubles and datetimes (both are 8 bytes each), your total DB size would only be about 60 MB (300K * 27 * 8 / 1048576).
Please post your schema for further help (you can use SHOW CREATE TABLE tablename).
8 millisecs is an extremely low time - I assume that's being driven by some type of hardware timing issue? If not, please explain that requirement, as a typical user requirement is around 1 second.
To get the response time that low you will need to do the following:
Query the DB at the start of your app and load all 300,000 rows into memory (e.g. a LabVIEW array) - a sketch of this initial query is shown below
Update the array with new values (e.g. array append)
Run the "query" against the array (e.g. using a for loop with a case select)
On a separate thread (i.e. LabVIEW "loop") insert the new records into the database, or do it right before the app closes
This approach assumes that only one instance of the app is running at a time because synchronizing database changes across multiple instances will be very hard with that timing requirement.
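A minimal sketch of the initial load query for the first step above (this assumes the whole table is read once at startup, with partname and insp_time included so the in-memory "query" can filter on them):

SELECT partname, insp_time,
       d_1_result, d_2_value_1_result, de_2_value_2_result,
       d_3_result, d_4_value_1_result, d_4_value_2_result, d_5_result
FROM Table_name_vp;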

Related

Optimizing MySQL update query with two tables

I have 14 million rows and 20 columns in a table named weather, and 1900 rows and 15 columns in a table named incident, on a MySQL server. I am trying to set the active column in weather to 1 where the weather date column is between the start and end date columns of the incident table and where the weather location column is equal to the incident location column. I have the following query and I am not sure if it is the most efficient way to do it. It has currently been running for an hour on AWS RDS db.m5.4xlarge (16 vCPU and 64 GB RAM). It is only using 8% CPU according to the AWS Console.
UPDATE dev.weather, dev.incident
SET weather.active = 1
WHERE weather.location = incident.location AND weather.DATE BETWEEN dev.incident.start_date AND dev.incident.end_date
Is there a better way to accomplish this?
By the time we come up with a satisfactory solution, your query will be finished. But here are some thoughts on it.
UPDATE, especially if lots of rows are modified, is very time-consuming. (This is because of the need to save old rows in case of rollback.)
Without seeing the indexes, I cannot advise completely.
This is a one-time query, correct? Future "incidents" will do the update as the incident is stored, correct? That will probably run reasonably fast.
Given that you have a way to update for a single incident, use that as the basis for doing the initial UPDATE (the one you are asking about now). That is, write a special, one-time program to run through the 1900 incidents, performing the necessary Update for each. (Advantage: only one Update need ever be written.)
Be sure to COMMIT after each Update. (Or run with autocommit=ON.) Else the 1900 updates will be a big burden on the system, perhaps worse than the single-Update that started this discussion.
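A minimal sketch of that one-time, per-incident approach (the index name is an assumption based on the columns in your WHERE clause; the ? placeholders stand for one incident row's values):

-- lets each per-incident update seek straight to the matching weather rows
CREATE INDEX idx_weather_loc_date ON dev.weather (location, `DATE`);

-- run once per incident row, committing after each one
UPDATE dev.weather
SET active = 1
WHERE location = ?                -- incident.location
  AND `DATE` BETWEEN ? AND ?;     -- incident.start_date, incident.end_date
COMMIT;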

Best solution for fast insertion and searching in MySQL for large no. of rows?

We are planning to implement a feature in our web application which will give users the ability to do searches and save the IDs of all matched records in the DB (MySQL - InnoDB) as a 'list'. Results can be in the millions. We want users to be able to save up to 1 million IDs. It has to be in real time (at most a 5-10 sec delay is acceptable). This list can then be used later on as another filter in combination with the existing filters.
We don't need to pass these IDs from the client side, as the same search can be done on the server side to retrieve those IDs. However, the same search can't simply be rerun later to get those IDs, as the search results can change.
We have a few thousand active users and don't expect many to create such big lists, but with the passage of time the total number of IDs saved in these lists can grow to hundreds of millions.
The server has more RAM than the complete database (a few hundred GB). It also uses an SSD.
Here are the issues we need to address:
- Saving up to 1 million IDs in the DB (within a few secs)
- Using these IDs as a search criterion with other filters (this additional criterion shouldn't slow down the searches by more than a few secs)
These seem to be some of the possible solutions:
Solution 1:
Have a separate table with User Id, List Id, Doc Id
Save IDs in a separate row (possibly 1 million rows for 1 list)
Partition table after a certain size
Benefit: This table can easily be used later on in a JOIN condition, and with indexes the search performance should be fast (see the sketch below).
Issue: Insertions would be slow - I know there are ways to speed up inserts, but it can still take longer than a few secs, especially once the table grows.
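A sketch of what Solution 1 could look like (the table and column names are only illustrative; documents stands for whatever table the existing filters run against):

CREATE TABLE user_list_docs (
  user_id INT NOT NULL,
  list_id INT NOT NULL,
  doc_id  INT NOT NULL,
  PRIMARY KEY (list_id, doc_id),
  KEY idx_user (user_id)
) ENGINE=InnoDB;

-- multi-row inserts (a few thousand values per statement) are far faster than row-by-row
INSERT INTO user_list_docs (user_id, list_id, doc_id) VALUES
  (42, 7, 1001),
  (42, 7, 1002),
  (42, 7, 1003);

-- later, the saved list becomes just another filter via a join
SELECT d.*
FROM documents AS d
JOIN user_list_docs AS l ON l.doc_id = d.id
WHERE l.list_id = 7
  AND d.some_existing_filter = 'x';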
Solution 2:
Save all IDs in one row
Pass these IDs as an IN parameter in the query, in chunks, using techniques like MapReduce for fast searching
Benefit: Insertions would be quite fast.
Issue: Search performance can be fast using MapReduce, but it can put a lot of load on the server, especially if many users start doing such searches.
Any suggestions on what will be the best way? Are there any other possible approaches to cater to this scenario?
Saving intermediate results in progressive filtering -- I have never seen this used successfully. Simply build the complete query and execute it each time.

Doing SUM() and GROUP BY over millions of rows on mysql

I have this query which only runs once per request.
SELECT SUM(numberColumn) AS total, groupColumn
FROM myTable
WHERE dateColumn < ? AND categoryColumn = ?
GROUP BY groupColumn
HAVING total > 0
myTable has fewer than a dozen columns and can grow to up to 5 million rows, but more likely about 2 million in production. All columns used in the query are numbers, except for dateColumn, and there are indexes on dateColumn and categoryColumn.
Would it be reasonable to expect this query to run in under 5 seconds with 5 million rows on most modern servers if the database is properly optimized?
The reason I'm asking is that we don't have 5 million rows of data, and we won't even hit 2 million within the next few years; if the query doesn't run in under 5 seconds then, it's hard to know where the problem lies. Would it be because the query is not suitable for a large table, or the database isn't optimized, or the server isn't powerful enough? Basically, I'd like to know whether using SUM() and GROUP BY over a large table is reasonable.
Thanks.
As people in the comments under your question suggested, the easiest way to verify is to generate random data and test the query execution time. Please note that using a clustered index on dateColumn can significantly change execution times, due to the fact that with a "<" condition only a contiguous subset of the data on disk has to be read in order to calculate the sums.
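In InnoDB the clustered index is the primary key, so one way to get that effect is to lead the primary key with dateColumn; a composite secondary index matching the WHERE clause is a less intrusive alternative (the id column and index names here are assumptions about the schema):

-- rebuild the primary key so rows are stored in date order (assumes an id column exists)
ALTER TABLE myTable
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (dateColumn, id),
  ADD KEY idx_id (id);   -- keeps id indexed, which is required if id is AUTO_INCREMENT

-- or, less intrusively, a composite secondary index matching the WHERE clause
ALTER TABLE myTable ADD INDEX idx_cat_date (categoryColumn, dateColumn);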
If you are at the beginning of the development process, I'd suggest concentrating not on the structure of the table and indexes that collect the data, but rather on what you expect to need to retrieve from the table in the future. I can share my own experience with presenting a website administrator with web usage statistics. I had several webpages being requested from the server, each of them falling into one or more "categories". My first approach was to collect each request in a log table with some indexes, but the table grew much larger than I had at first estimated. :-) Due to the fact that the statistics were analyzed in constant groups (weekly, monthly, and yearly), I decided to create an additional table that aggregated requests into predefined week/month/year groups. Each request incremented the relevant columns - the columns referred to my "categories". This broke some normalization rules, but allowed me to calculate statistics in the blink of an eye.
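A sketch of that kind of pre-aggregation table (names and periods are only illustrative):

CREATE TABLE request_stats (
  period_type  ENUM('week', 'month', 'year') NOT NULL,
  period_start DATE NOT NULL,
  category     VARCHAR(64) NOT NULL,
  hits         INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (period_type, period_start, category)
);

-- each incoming request bumps the counter for its period and category
INSERT INTO request_stats (period_type, period_start, category, hits)
VALUES ('week', '2015-02-02', 'news', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- the statistics page then reads a handful of pre-summed rows instead of scanning the log
SELECT category, hits
FROM request_stats
WHERE period_type = 'month' AND period_start = '2015-02-01';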
An important question is the dateColumn < ? condition. I am guessing it is filtering out records that are out of date. It doesn't really matter how many records there are in the table. What matters is how many records this condition cuts that down to.
Having aggressive filtering by date combined with partitioning the table by date can give you amazing performance on ridiculously large tables.
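A sketch of what range partitioning by date could look like here (MySQL requires the partitioning column to be part of every unique key, so this assumes dateColumn is included in the primary key; partition boundaries are only illustrative):

ALTER TABLE myTable
PARTITION BY RANGE (TO_DAYS(dateColumn)) (
  PARTITION p2014 VALUES LESS THAN (TO_DAYS('2015-01-01')),
  PARTITION p2015 VALUES LESS THAN (TO_DAYS('2016-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

With a dateColumn < ? filter, the optimizer can skip any partition whose date range lies entirely outside the condition.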
As a side note, if you are not expecting to hit this much data for many years to come, don't bother solving it now. Your business requirements may change a dozen times by then, together with the architecture, DB layout, design and implementation details. Planning ahead is great, but sometimes you want to deliver a good-enough solution as soon as possible and handle the future painful issues in the next release.

MySQL - Basic issue with a large table

In my DB there are two large tables. The first one (A) has 1.7 million rows, the second one (B) has 2.1 million. Records in A and B are of fairly similar size.
I can do any operation on A. It takes time, but it works. On B, I can't do anything. Even a simple SELECT COUNT(*) just hangs forever. The problem is I don't see any error: it just hangs (when I show the process list it just says "updating" forever).
It seems weird to me that the small delta (percentage-wise) between 1.7 and 2.1 million could make such a difference (from being able to do everything, to not even being able to do the simplest operation).
Can there be some kind of 2 million rows hard limit?
I am on Linux 2.6+, and I use InnoDB.
Thanks!
Pierre
It appears it depends more on the amount of data in each row than it does on the total number of rows. If the rows contain little data, then the maximum number of rows you can handle will be higher than for rows with more data. Check this link for more info:
http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html
The row size (the number of bytes needed to store one row) might be much larger for the second table. COUNT(*) may require a full table scan, i.e. reading through the entire table on disk; larger rows mean more I/O and a longer time.
The presence/absence of indexes will likely make a difference too.
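One quick way to compare the two tables is to look at the optimizer's size estimates (replace your_db with the actual schema name; for InnoDB these figures are estimates, not exact counts):

SELECT table_name, table_rows, avg_row_length,
       ROUND(data_length / 1024 / 1024) AS data_mb
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND table_name IN ('A', 'B');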
As I was saying in my initial post, the thing was the two tables were fairly similar, so row size would be fairly close in both tables. That's why I was a bit surprised, and I started to think that maybe, somehow, a 2 million limit was set somewhere.
It turns out my table was corrupted. It is bizarre, since I was still able to access some records (using joins with other tables), and MySQL was not "complaining". I found out by doing a CHECK TABLE: it did not return any error, but it crashed mysqld every time...
Anyway, thank you all for your help on this.
Pierre

MySQL Number of SQL Queries

I just converted an Access DB to MySQL (using Access as the frontend, MySQL as the backend).
It is a simple, 4-table database.
I have a Master table, and 3 tables linked to it.
So an Access form displays data from:
Master table (mainform)
Details1 table (subform)
Details2 table (subform)
Details3 table (subform)
The master table will always be one row, however all the linked tables ("details" tables) can have any number of records, usually around 10-30 rows in each detail table per master record.
Everything runs well; however, when checking MySQL Administrator under Health > Connection Health > Number of SQL Queries, the number of queries jumps to 10 every time I move between Master record pages.
I am running this on my own laptop, and I am worried this will become a performance problem when I put 100+ users on the work server, all working at once.
Could anyone advise whether this high "number of queries" reported by MySQL Administrator will be a problem?
What number is considered "dangerous" for performance purposes?
The idea is to have a fast running system, so I would like to avoid too many queries to the database.
I also don't understand why, for example, it displays 7 queries when there are only 4 tables in total, with only one row per table being displayed.
Any ideas/comments will be appreciated.
Can something be changed in the Access front end to make the number of queries lower?
Thanks so much.
Those 10 queries probably don't take a long time, and they are very likely sequential. I doubt there will be a problem with 100 users, since they won't all be running the queries at once. Even then, MySQL can handle quite a load.
I'm not sure what is going on inside Access. "Queries" can be just about anything (e.g. metadata), not just queries for records from the table. For example, getting the total number of records in a table to display something like "showing 23 of 1,000". If Access is doing this for each table, that's an extra 4 queries right there, leaving only 3 to get actual data to display.
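If you want to see exactly what Access is sending, one option (depending on your MySQL version) is to switch the general query log on for a moment and watch the statements arrive:

-- has some overhead, so turn it back off when done
SET GLOBAL general_log = 'ON';
SET GLOBAL log_output = 'TABLE';

-- the most recent statements received by the server
SELECT event_time, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 20;

SET GLOBAL general_log = 'OFF';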
It's hard to be sure, because it depends on a lot of things like the server's memory and CPU and the complexity of the queries, but...
Supposing the queries for the subforms are directly linked to the master table (with an indexed id field) and do not need to join with other tables (as you have only 4 tables), I think you're OK to run without problems, as the number of queries is not too high.
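For that to hold, each detail table should have an index on the column that links it to the Master table; something along these lines (master_id is an assumed column name):

CREATE INDEX idx_details1_master ON Details1 (master_id);
CREATE INDEX idx_details2_master ON Details2 (master_id);
CREATE INDEX idx_details3_master ON Details3 (master_id);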
As an example, some years ago I had an old machine (an Athlon XP 1600 with only 512 MB or 1 GB of RAM) running MySQL and serving files for 20 users. Most of the queries were small stored procedures using mainly 20 tables but returning a lot of rows (usually around 2000 for the most used query). Everything was fast. This old system ran 14 million queries in 2 months (an average of > 700 per minute), so I think you will be OK.
Anyway, if you have a way to do a partial test, that would be the best option. You could use a small script querying the database in a loop on several machines, for example.