Using MySQL Partitioning to speed up concurrent deletes and selects?

I have a MySQL Innodb table that contains about 8.5 million rows. The table structure basically looks like this:
CREATE TABLE `mydatatable` (
`ext_data_id` int(10) unsigned NOT NULL,
`datetime_utc` date NOT NULL DEFAULT '0000-00-00',
`type` varchar(8) NOT NULL DEFAULT '',
`value` decimal(6,2) DEFAULT NULL,
PRIMARY KEY (`ext_data_id`,`datetime_utc`,`type`),
KEY `datetime_utc` (`datetime_utc`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every night, I delete the expired values from this table with the following query:
delete from mydatatable where datetime_utc < '2013-09-23'
This query does not seem to use indexes, and it takes quite some time to run. However, I also get concurrent updates and selects on the same table. These get locked during that time, making my website unresponsive.
I am looking for ways to speed up this setup. I came across MySQL partitioning and I am wondering if it would be a good fit. I am always adding and selecting the newer data in this table and deleting the old data. I could create partitions based on something like MOD(DAYOFYEAR(datetime),4). Then, when I delete, I would always be deleting values from a different partition than the one I am reading from or writing to.
Will I experience locking with this setup? Will partitioning improve the query speed and availability in my case? Or should I look for another solution, and if so, which one?

Since MySQL 5.5 you can use COLUMNS partitioning (e.g. RANGE COLUMNS), which simplifies partitioning on non-integer columns such as datetime_utc.
As for the performance:
Dropping a partition is a constant-time operation for LIST and RANGE partitioning. It is about as fast as TRUNCATE TABLE or removing a file, so it is practically independent of the size of the partition.
Doing a SELECT on a partitioned table benefits from partition pruning, so you read only from the partitions that match your search criteria. That can also speed up range scans.
Tip:
Do not forget to add a "default" catch-all partition, such as
PARTITION the_last_one VALUES LESS THAN (MAXVALUE)
so that INSERT/UPDATE statements do not fail when no matching partition is found.
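For the table in the question, a RANGE COLUMNS layout might look like the sketch below; the partition names and boundary dates are made up for illustration, and the catch-all partition implements the tip above. This works here because datetime_utc is part of the primary key, which the partitioning column must be.
-- Sketch only: daily partitions on datetime_utc; names and dates are illustrative.
ALTER TABLE mydatatable
PARTITION BY RANGE COLUMNS (datetime_utc) (
    PARTITION p20130922 VALUES LESS THAN ('2013-09-23'),
    PARTITION p20130923 VALUES LESS THAN ('2013-09-24'),
    PARTITION p20130924 VALUES LESS THAN ('2013-09-25'),
    PARTITION the_last_one VALUES LESS THAN (MAXVALUE)
);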

You are absolutely on the right track. You should create daily partitions here and store the data in them; your queries will be revolutionised and will run like a Ferrari. Also take a look at local (per-partition) indexes.
Also, with partitions your older data will not interfere with the newer ones, so whether you keep it or delete it makes little difference. In fact, instead of deleting rows you can simply drop whole partitions, which is also very fast.
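As a sketch, the nightly cleanup could then become a single partition drop instead of a row-by-row DELETE (the partition name is hypothetical, following a daily naming scheme):
-- Sketch: drop an entire expired day at once instead of DELETE ... WHERE datetime_utc < '2013-09-23'.
ALTER TABLE mydatatable DROP PARTITION p20130922;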


Can I kill a process in the "query end" state in Aurora MySql

I have a large table hosted on Aurora at Amazon, using MySQL 5.7.
2 days ago, I ran this command:
insert IGNORE into archiveDataNEW
(`DateTime-UNIX`,`pkl_PPLT_00-PIndex`,`DataValue`)
SELECT `DateTime-UNIX`,`pkl_PPLT_00-PIndex`,`DataValue`
FROM offlineData
order by id
limit 600000000, 200000000
Yesterday afternoon my computer crashed, so the connection to MySQL was severed.
Sometime last night the status of the query changed to "query end".
Today the status of the query is still "query end".
Questions:
Can I stop this process, or will that only make things worse?
Does MySQL InnoDB roll back a query when the connection to the server drops? Is there any way to tell it to proceed instead?
Will I need to re-run the command once it finally finishes the "query end" process?
Here is the table I am loading data into, any thoughts or suggestions will be appreciated.
CREATE TABLE `archiveDataNEW` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`DateTime-UNIX` bigint(20) NOT NULL DEFAULT '0',
`pkl_PPLT_00-PIndex` int(11) NOT NULL DEFAULT '0',
`DataValue` decimal(14,4) NOT NULL DEFAULT '0.0000',
PRIMARY KEY (`id`,`DateTime-UNIX`),
UNIQUE KEY `Unique2` (`pkl_PPLT_00-PIndex`,`DateTime-UNIX`) USING BTREE,
KEY `DateTime` (`DateTime-UNIX`) USING BTREE,
KEY `pIndex` (`pkl_PPLT_00-PIndex`) USING BTREE,
KEY `DataIndex` (`DataValue`),
KEY `pIndex-Data` (`pkl_PPLT_00-PIndex`,`DataValue`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=736142506 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`DateTime-UNIX`)
(PARTITION p2016 VALUES LESS THAN (1483246800) ENGINE = InnoDB,
PARTITION p2017 VALUES LESS THAN (1514782800) ENGINE = InnoDB,
PARTITION p2018 VALUES LESS THAN (1546318800) ENGINE = InnoDB,
PARTITION p2019 VALUES LESS THAN (1577854800) ENGINE = InnoDB,
PARTITION p2020 VALUES LESS THAN (1609477200) ENGINE = InnoDB,
PARTITION p2021 VALUES LESS THAN (1641013200) ENGINE = InnoDB,
PARTITION p2022 VALUES LESS THAN (1672549200) ENGINE = InnoDB,
PARTITION p2023 VALUES LESS THAN (1704085200) ENGINE = InnoDB,
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
There's no way to complete that statement and commit the rows it inserted.
This is apparently a bug in MySQL 5.7 code, discussed here: https://bugs.mysql.com/bug.php?id=91078
The symptom is that a query is stuck in "query end" state and there's no way to kill it or finish it except by restarting the MySQL Server. But this is not possible on AWS Aurora, right?
There's some back and forth in that bug log about whether it's caused by the query cache. The query cache is deprecated, but in Aurora they have reenabled it and changed its implementation. They are convinced that their query cache code solves the disadvantages of MySQL's query cache implementation, so they leave it on in Aurora (this is one of many reasons you should think of Aurora as a fork of MySQL, not necessarily compatible with MySQL itself).
Kill it, if you can. It is either busy committing (which will take a long time) or busy undoing (which will take even longer). If it won't kill, you are stuck with waiting it out.
A better way.
Using OFFSET in limit 600000000, 200000000 will only get slower and slower as you work through the chunks, because each chunk must step over the first 600M rows.
Also, INSERTing 200M rows at a time is quite inefficient. The system must prepare to UNDO the action in case of crash.
So, it is better to "remember where you left off". Or do it in explicit chunks like WHERE id BETWEEN 12345000 AND 12345999. Also, do only 1K rows at a time.
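Under that approach, each chunk of the copy might look like the following sketch, advancing the id range by 1000 on every iteration (boundaries taken from the example above):
-- Sketch: copy one explicit 1K-row id range per statement, then move the range forward.
INSERT IGNORE INTO archiveDataNEW
    (`DateTime-UNIX`, `pkl_PPLT_00-PIndex`, `DataValue`)
SELECT `DateTime-UNIX`, `pkl_PPLT_00-PIndex`, `DataValue`
FROM offlineData
WHERE id BETWEEN 12345000 AND 12345999;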
But what are you trying to do?
If you are adding Partitioning, let's discuss whether there will be any benefit. It looks like you are adding yearly partitioning. Possibly the only advantage is when you need to DROP PARTITION to get rid of "old" data. It is unlikely that any queries will run any faster.
Likely optimizations:
Shrink:
`DateTime-UNIX` bigint(20)
This seems to be a Unix timestamp; it fits nicely in a 4-byte INT or a 5-byte TIMESTAMP, so why use an 8-byte BIGINT? TIMESTAMP has the advantage of allowing lots of datetime functions. A 5-byte DATETIME or a 3-byte DATE will last until the end of the year 9999. We are 17 years from the overflow of TIMESTAMP; what computer systems do you know of that have been around since 2004 (today minus 17 years)? Caveat: there will be timezone issues to address (or ignore) if you switch from TIMESTAMP. (If you need the time part, do not split a DATETIME into two columns; it is likely to add complexity.)
Drop KEY pIndex (pkl_PPLT_00-PIndex) USING BTREE, it is redundant with two other indexes.
Do not pre-build future partitions; it hurts performance (a small amount). At the end of the current year, build the next year's partition with REORGANIZE. Details here: http://mysql.rjweb.org/doc.php/partitionmaint
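A sketch of that yearly maintenance step, splitting the MAXVALUE partition so a real p2024 exists before its data arrives; the boundary value is assumed to follow the same convention as the existing partitions, so verify it for your timezone:
ALTER TABLE archiveDataNEW REORGANIZE PARTITION pMAX INTO (
    PARTITION p2024 VALUES LESS THAN (1735707600),
    PARTITION pMAX VALUES LESS THAN MAXVALUE
);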
This will improve performance in multiple ways:
PRIMARY KEY (`id`,`DateTime-UNIX`),
UNIQUE KEY `Unique2` (`pkl_PPLT_00-PIndex`,`DateTime-UNIX`) USING BTREE,
-->
PRIMARY KEY(`pkl_PPLT_00-PIndex`,`DateTime-UNIX`),
INDEX(id) -- sufficient for AUTO_INCREMENT
It may run faster if you leave off the non-UNIQUE indexes until the table is loaded. Then do ALTER(s) to add them.
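A sketch of that key change, with the non-UNIQUE indexes deferred until after the load (this is a full table rebuild, so test it on a copy first):
-- Sketch: re-cluster on the natural key, keep a minimal index for AUTO_INCREMENT.
ALTER TABLE archiveDataNEW
    DROP PRIMARY KEY,
    DROP KEY `Unique2`,
    ADD PRIMARY KEY (`pkl_PPLT_00-PIndex`, `DateTime-UNIX`),
    ADD KEY `id` (`id`);
-- After the bulk load, add the remaining secondary indexes:
ALTER TABLE archiveDataNEW
    ADD KEY `DataIndex` (`DataValue`),
    ADD KEY `pIndex-Data` (`pkl_PPLT_00-PIndex`, `DataValue`) USING BTREE;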

MySQL: Slow SELECT because of Index / FKEY?

Dear StackOverflow Members
It's my first post, so please be nice :-)
I have a strange SQL behavior which I can't explain and for which I can't find any resources that explain it.
I have built a web honeypot which record all access and attacks and display it on a statistic page.
However, as the data has grown, the generation of the statistic page has been getting slower and slower.
I narrowed it down to some SELECT statements that take quite a long time.
The "issue" seems to be an index on a specific column.
*For sure the real issue is my lack of knowledge :-)
Database: mysql
DB schema
Event table (unrelated columns removed):
Event table size: 30MB
Event table records: 335k
CREATE TABLE `event` (
`EventID` int(11) NOT NULL,
`EventTime` datetime NOT NULL DEFAULT current_timestamp(),
`WEBURL` varchar(50) COLLATE utf8_bin DEFAULT NULL,
`IP` varchar(15) COLLATE utf8_bin NOT NULL,
`AttackID` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
ALTER TABLE `event`
ADD PRIMARY KEY (`EventID`),
ADD KEY `AttackID` (`AttackID`);
ALTER TABLE `event`
ADD CONSTRAINT `event_ibfk_1` FOREIGN KEY (`AttackID`) REFERENCES `attack` (`AttackID`);
Attack Table
attack table size: 32KB
attack Table records: 11
CREATE TABLE attack (
`AttackID` int(4) NOT NULL,
`AttackName` varchar(30) COLLATE utf8_bin NOT NULL,
`AttackDescription` varchar(70) COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
ALTER TABLE `attack`
ADD PRIMARY KEY (`AttackID`);
SLOW Query:
SELECT Count(EventID), IP
-> FROM event
-> WHERE AttackID >0
-> GROUP BY IP
-> ORDER BY Count(EventID) DESC
-> LIMIT 5;
RESULT: 5 rows in set (1.220 sec)
(This seems quite long to me for a simple query.)
QuerySlow
Now the strange thing:
If I remove the foreign key relationship, the performance of the query stays the same.
But if I remove the index on event.AttackID, the same select statement is much faster:
(ALTER TABLE `event` DROP INDEX `AttackID`;)
The result of the SQL SELECT query:
5 rows in set (0.242 sec)
QueryFast
From my understanding indexes on columns which are used in "WHERE" should improve the performance.
Why does removing the index have such an impact on the query?
What can I do to keep the relations between the tables and still get fast SELECT execution?
Cheers
Why does removing the index improve performance?
The query optimizer has multiple ways to resolve a query. For instance, two methods for filtering data are:
Look up the rows that match the where clause in the index and then fetch related data from the data pages.
Scan the index.
This doesn't get into the use of indexes for joins or aggregations or alternative algorithms.
Which is better? Under some circumstances, the first method is horribly slower than the second. This occurs when the data for the table does not fit into memory. Under such circumstances, the index can read a record from page 124 and then from 1068 and then from 124 again and -- well, all sorts of random intertwined reading of pages. Reading data pages in order is usually faster. And when the data doesn't fit into memory, thrashing occurs, which means that a page in memory is aged (overwritten) -- and then needed again.
I'm not saying that is occurring in your case. I am simply saying that what optimizers do is not always obvious. The optimizer has to make judgements based on the nature of the data -- and those judgements are not right 100% of the time. They are usually correct. But there are borderline cases. Sometimes, the issue is out-of-date statistics. Sometimes the issue is that what looks best to the optimizer is not best in practice.
Let me emphasize that optimizers usually do a very good job, and a better job than a person would do. Even if they occasionally come up with suboptimal plans, they are still quite useful.
Get rid of your redundant UNIQUE KEYs. A primary key is a unique key.
Use COUNT(*) rather than COUNT(EventID) in your query. They mean the same thing because EventID is declared NOT NULL.
Your query can be much faster if you stop saying WHERE AttackID > 0. Because that column is a FK to the PK of your other table, those values should be nonzero anyway. But to get that speedup you'll need an index on event(IP), something like this:
CREATE INDEX IpDex ON event (IP)
But you're still summarizing a large table, and that will always take time.
It looks like you want to display some kind of leaderboard. You could add a top_ips table, and use an EVENT to populate it, using your query, every few minutes. Then you could display it to your users without incurring the cost of the query every time. This of course would display slightly stale data; only you know whether that's acceptable in your app.
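A minimal sketch of that approach, assuming a hypothetical table named top_ips and a 5-minute refresh (the event scheduler must be enabled):
-- Sketch: hypothetical leaderboard table refreshed by the event scheduler.
CREATE TABLE top_ips (
    IP   varchar(15) COLLATE utf8_bin NOT NULL,
    hits int NOT NULL,
    PRIMARY KEY (IP)
);

SET GLOBAL event_scheduler = ON;

CREATE EVENT refresh_top_ips
    ON SCHEDULE EVERY 5 MINUTE
    DO
    REPLACE INTO top_ips (IP, hits)
        SELECT IP, COUNT(*)
        FROM event
        WHERE AttackID > 0
        GROUP BY IP
        ORDER BY COUNT(*) DESC
        LIMIT 5;
-- Note: rows that fall out of the top 5 are not removed by REPLACE; a two-statement
-- event (DELETE, then INSERT ... SELECT) would keep the table at exactly 5 rows.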
Pro Tip. Read https://use-the-index-luke.com by Marcus Winand.
Essentially every part of your query, except for the FKey, conspires to make the query slow.
Your query is equivalent to
SELECT Count(*), IP
FROM event
WHERE AttackID >0
GROUP BY IP
ORDER BY Count(*) DESC
LIMIT 5;
Please use COUNT(*) unless you need to avoid NULL.
If AttackID is rarely >0, the optimal index is probably
ADD INDEX(AttackID, -- for filtering
IP) -- for covering
Else, the optimal index is probably
ADD INDEX(IP, -- to avoid sorting
AttackID) -- for covering
You could simply add both indexes and let the Optimizer decide. Meanwhile, get rid of these, if they exist:
DROP INDEX(AttackID)
DROP INDEX(IP)
because any uses of them are handled by the new indexes.
Furthermore, leaving the 1-column indexes around can confuse the Optimizer into using them instead of the covering index. (This seems to be a design flaw in at least some versions of MySQL/MariaDB.)
"Covering" means that the query can be performed entirely in the index's BTree. EXPLAIN will indicate it with "Using index". A "covering" index speeds up a query by 2x -- but there is a very wide variation on this prediction. ("Using index condition" is something different.)
More on index creation: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

partitioning mysql table with 3b records per year

What is a good approach to handling a 3-billion-record table where concurrent reads/writes are very frequent within the last few days?
Linux server, running MySQL v8.0.15.
I have this table that will log device data history. The table needs to retain its data for one year, possibly two years. The growth rate is very high: 8,175,000 rec/day (1 mo = 245M rec, 1 y = 2.98B rec). If the number of devices grows, the table is expected to handle that as well.
Reads are frequent within the last few days; beyond a week, the read frequency drops significantly.
There are multiple concurrent connections reading and writing this table, and their read/write targets are quite close to each other, so deadlocks / lock waits happen, but they have been taken care of (retries, small transaction sizes).
I am using daily partitioning now, since a read rarely spans more than one partition. However, there would be too many partitions to retain one year of data. Creating and dropping partitions is scheduled with cron.
CREATE TABLE `table1` (
`group_id` tinyint(4) NOT NULL,
`DeviceId` varchar(10) COLLATE utf8mb4_unicode_ci NOT NULL,
`DataTime` datetime NOT NULL,
`first_log` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`first_res` tinyint(1) NOT NULL DEFAULT '0',
`last_log` datetime DEFAULT NULL,
`last_res` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`group_id`,`DeviceId`,`DataTime`),
KEY `group_id` (`group_id`,`DataTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
/*!50100 PARTITION BY RANGE (to_days(`DataTime`))
(
PARTITION p_20191124 VALUES LESS THAN (737753) ENGINE = InnoDB,
PARTITION p_20191125 VALUES LESS THAN (737754) ENGINE = InnoDB,
PARTITION p_20191126 VALUES LESS THAN (737755) ENGINE = InnoDB,
PARTITION p_20191127 VALUES LESS THAN (737756) ENGINE = InnoDB,
PARTITION p_future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Inserts are performed in batches of ~1500 rows:
INSERT INTO table1(group_id, DeviceId, DataTime, first_result)
VALUES(%s, %s, FROM_UNIXTIME(%s), %s)
ON DUPLICATE KEY UPDATE last_log=NOW(), last_res=values(first_result);
Selects mostly get counts by DataTime or DeviceId, targeting a specific partition.
SELECT DataTime, COUNT(*) ct FROM table1 partition(p_20191126)
WHERE group_id=1 GROUP BY DataTime HAVING ct<50;
SELECT DeviceId, COUNT(*) ct FROM table1 partition(p_20191126)
WHERE group_id=1 GROUP BY DeviceId HAVING ct<50;
So the question:
According to Rick James's blog, it is not a good idea to have more than 50 partitions in a table, but with monthly partitions there would be 245M records in one partition. What is the best partition range to use here? Does that advice still apply to the current MySQL version?
Is it a good idea to leave the table unpartitioned? (The indexes are performing well at the moment.)
Note: I have read this Stack Overflow question; having multiple tables is a pain, so if it is not necessary I would rather not split the table. Also, sharding is currently not possible.
First of all, INSERTing 100 records/second is a potential bottleneck. I hope you are using SSDs. Let me see SHOW CREATE TABLE. Explain how the data is arriving (in bulk, one at a time, from multiple sources, etc) because we need to discuss batching the input rows, even if you have SSDs.
Retention for 1 or 2 years? Yes, PARTITIONing will help, but only with the deleting via DROP PARTITION. Use monthly partitions and use PARTITION BY RANGE(TO_DAYS(DataTime)). (See my blog which you have already found.)
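A monthly layout under that recommendation could be sketched like this (partition names and boundaries are illustrative; running PARTITION BY on an already-partitioned table rebuilds it with the new scheme):
ALTER TABLE table1
PARTITION BY RANGE (TO_DAYS(DataTime)) (
    PARTITION p_201911 VALUES LESS THAN (TO_DAYS('2019-12-01')),
    PARTITION p_201912 VALUES LESS THAN (TO_DAYS('2020-01-01')),
    PARTITION p_202001 VALUES LESS THAN (TO_DAYS('2020-02-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);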
What is the average length of DeviceID? Normally I would not even mention normalizing a VARCHAR(10), but with billions of rows, it is probably worth it.
The PRIMARY KEY you have implies that a device will not provide two values in less than one second?
What do "first" and "last" mean in the column names?
In older versions of MySQL, the number of partitions had an impact on performance, hence the recommendation of 50. 8.0's Data Dictionary may have a favorable impact on that, but I have not experimented yet to see whether the 50 should be raised.
The size of a partition has very little impact on anything.
In order to judge the indexes, let's see the queries.
Sharding is not possible? Do too many queries need to fetch multiple devices at the same time?
Do you have Summary tables? That is a major way for Data Warehousing to avoid performance problems. (See my blogs on that.) And, if you do some sort of "staging" of the input, the summary tables can be augmented before touching the Fact table. At that point, the Fact table is only an archive; no regular SELECTs need to touch it? (Again, let's see the main queries.)
One table per day (or whatever unit) is a big no-no.
Ingestion via IODKU
For the batch insert via IODKU, consider this:
collect the 1500 rows in a temp table, preferably with a single, 1500-row, INSERT.
massage that data if needed
do one IODKU..SELECT:
INSERT INTO table1 (group_id, DeviceId, DataTime, first_result)
SELECT group_id, DeviceId, DataTime, first_result
FROM tmp_table
ON DUPLICATE KEY UPDATE
    last_log = NOW(), last_res = VALUES(first_result);
If necessary, the SELECT can do some de-dupping, etc.
This approach is likely to be significantly faster than 1500 separate IODKUs.
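A sketch of the staging step, assuming a hypothetical tmp_table and made-up sample values:
-- Sketch: staging table filled with one multi-row INSERT per ~1500-row batch.
CREATE TEMPORARY TABLE tmp_table (
    group_id     tinyint NOT NULL,
    DeviceId     varchar(10) NOT NULL,
    DataTime     datetime NOT NULL,
    first_result tinyint NOT NULL
);

INSERT INTO tmp_table (group_id, DeviceId, DataTime, first_result)
VALUES (1, 'DEV0000001', FROM_UNIXTIME(1574730000), 0),
       (1, 'DEV0000002', FROM_UNIXTIME(1574730000), 1);
-- ... continue up to ~1500 rows in the single statement, then run the IODKU..SELECT above.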
DeviceID
If the DeviceID is always 10 characters and limited to English letters and digits, then make it
CHAR(10) CHARACTER SET ascii
Then pick between COLLATION ascii_general_ci and COLLATION ascii_bin, depending on whether you allow case folding or not.
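The column change might be sketched as follows (ascii_bin is assumed here; this rebuilds the table since DeviceId is part of the primary key):
ALTER TABLE table1
    MODIFY DeviceId CHAR(10) CHARACTER SET ascii COLLATE ascii_bin NOT NULL;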
Just for your reference:
I have a large table right now with over 30B rows, growing by 11M rows daily.
The table is an InnoDB table and is not partitioned.
Data older than 7 years is archived to file and purged from the table.
So if your performance is acceptable, partitioning is not necessary.
From a management perspective, it is easier to manage the table with partitions; you might partition the data by week. That is 52-104 partitions if you keep the last 1 or 2 years of data online.

MySQL table setup for stock information

I am collecting about 3 - 6 millions lines of stock data per day and storing it in a MySQL database.
All of the data comes from Interactive Brokers. Every piece of information comes with these five fields: Symbol, Date, Time, Value and Type (Type being information on what kind of data I am receiving, such as price, volume, etc.).
Here is my create table statement. idticks is just my unique key, but I am almost never able to use it in queries.
CREATE TABLE `ticks` (
`idticks` int(11) NOT NULL AUTO_INCREMENT,
`symbol` varchar(30) NOT NULL,
`date` int(11) NOT NULL,
`time` int(11) NOT NULL,
`value` double NOT NULL,
`type` double NOT NULL,
KEY `idticks` (`idticks`),
KEY `symbol` (`symbol`),
KEY `date` (`date`),
KEY `idx_ticks_symbol_date` (`symbol`,`date`),
KEY `idx_ticks_type` (`type`),
KEY `idx_ticks_date_type` (`date`,`type`),
KEY `idx_ticks_date_symbol_type` (`date`,`symbol`,`type`),
KEY `idx_ticks_symbol_date_time_type` (`symbol`,`date`,`time`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=13533258 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY KEY (`date`)
PARTITIONS 1 */;
As you can see, I have no idea what I am doing because I just keep on creating indexes to make my queries go faster.
Right now the data is being stored on a rather slow computer for testing purposes so I understand that my queries are not nearly as fast as they could be (I have a 6 core, 64gig of ram, SSD machine arriving tomorrow which should help significantly)
That being said, I am running queries like this one
select time, value from ticks where symbol = "AAPL" AND date = 20150522 and type = 8 order by time asc
The query above, if I do not limit it, returns 12928 records for one of my test days and takes 10.2 seconds if I run it from a cleared cache.
I am doing lots of graphing and eventually would like to be able to query the data just as I need it to graph. Right now I haven't noticed much difference in speed between getting part of a day's worth of data and getting the entire day. It would be cool to have those queries respond fast enough that there is barely any delay when moving to the next day/screen.
Another query I am using, for the usability of a program I am writing to interact with the data, is:
String query = "select distinct `date` from ticks where symbol = '" + symbol + "' order by `date` desc";
But most of my need is the ability to pull a certain type of data from a certain day for a certain symbol like my first query.
I've googled all over the place and I think I understand that creating tons of indexes makes the database bigger and slows down the input speed (I get about 300 pieces of information per second on a busy day). Should I just index each column individually?
I am willing to throw more harddrives at things if it means responsive interface.
Basically, my questions relate to the creation/altering of my table. Based on the above query, can you think of anything I could do to make it faster? Or an indexing scheme that would help me out? Is InnoDB even the right engine? I tried googling InnoDB vs. MyISAM, and after a couple of hours I still wasn't sure.
Thanks :)
Combine date and time into a DATETIME field
Assuming Price and Volume always come in together, put them together (2 columns) and get rid of type.
Get rid of the AUTO_INCREMENT; change to PRIMARY KEY(symbol, datetime)
Get rid of any indexes that are the left part of some other index.
Once you are using DATETIME, use date ranges to find everything in a single date (if you need such). Do not use DATE(datetime) = '...'; performance will be terrible. (See the sketch after this list.)
Symbol can probably be ascii, not utf8.
Use InnoDB, the clustering of the Primary Key can be beneficial.
Do you expect to collect (and use) more data than will fit in innodb_buffer_pool_size? If so, we need to discuss your SELECTs and look into PARTITIONing.
Make those changes, then come back for more advice/abuse.
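As a sketch of the date-range point above, assuming the combined column is named dt (a hypothetical name):
-- Index-friendly: a half-open range over one day can use the suggested PRIMARY KEY(symbol, dt).
SELECT dt, value
FROM ticks
WHERE symbol = 'AAPL'
  AND dt >= '2015-05-22'
  AND dt <  '2015-05-23'
ORDER BY dt;
-- Avoid: WHERE DATE(dt) = '2015-05-22' -- wrapping the column in a function defeats the index.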
You're creating a historical database, so MyISAM would work as well as InnoDB. InnoDB is a transactional storage engine and is better suited for databases with multiple tables that must remain synchronized.
Your Stock table looks like this.
Stock
-----
Stock ID (idticks)
Symbol
Date
Time
Value
Type
It would be better if you combine the date and time into a time stamp column, and unpack the types like this.
Stock
-----
Stock ID
Symbol
Time Stamp
Volume
Open
Close
Bid
Ask
...
This makes it easier for the database to return rows for a query on a particular type, like the close value.
As far as indexes, you can create as many indexes as you want. You're adding (inserting) information, so the increased time to add information is offset by the decreased time to query the information.
I'd have a primary index on Stock ID, and a unique index on Symbol and Time Stamp descending. You could also have indexes on the values you query most often, like Close.

InnoDB vs MyIsam on a frequently sorted MySQL 5.5 table

I have a table (currently InnoDB) with roughly 100k records. These records have an order column so they can make up an ordered queue. Actually, these records belong to about 40 departments, each of which has its own queue and, in turn, its own records in this table.
The problem is that we're constantly getting "lock wait time" errors because various departments are sorting their queues (and records) simultaneously.
I know that MyIsam is a table-level lock engine and InnoDB is row-level. The thing is I'm not sure which one is faster for this kind of operation.
The other thing is that this table is joined in various queries with other InnoDB tables, and I don't know what can happen if I switch the table to MyISAM.
Here's the table structure:
CREATE TABLE `ssti` (
`demand_nber` MEDIUMINT(8) UNSIGNED NOT NULL,
`year` CHAR(4) NULL DEFAULT NULL,
`department` CHAR(4) NULL DEFAULT NULL COMMENT '4 caracteres',
-- [other columns ]
`priority` INT(10) UNSIGNED NOT NULL DEFAULT '9999999',
PRIMARY KEY (`NR_DMD`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
And here's the piece of java code that updates the priorities:
PreparedStatement psUpdatePriority = con.prepareStatement("UPDATE `ssti` SET `priority` = ? WHERE demand_nber=?;");
for (int i = 0; i < demands.length(); ++i) {
JSONObject d = demands.getJSONObject(i);
psUpdatePriority.setInt(1, d.getInt("newPriority"));
psUpdatePriority.setInt(2, d.getInt("demandNumber"));
psUpdatePriority.addBatch();
}
int[] totalUpdated = psUpdatePriority.executeBatch();
When investigating performance problems be sure to engage the slow query log so you have a record of the specific queries causing problems.
What it looks like here is you're including a column in your WHERE clause that's not indexed. That's extremely painful on large data sets as it requires a "table scan", or reading every record and evaluating them sequentially.
When indexed your queries should be significantly faster.
If you're really up against the wall, you may want to break out each department into their own table. This is very difficult to undo so I'd only pursue this as a last resort.
Select statements normally do not block each other. Sorting is done separately for each query (in memory or in temporary files). If you are getting waits for locks, then look at the updates that are blocking the selects.
With row-level locking, an UPDATE blocks only the (small number of) rows it needs, allowing other statements to access the other rows. With table-level locking, an UPDATE blocks the whole table and no other statement can access it until the UPDATE is finished. So MyISAM will make your problem worse in any case.
--
It seems that you are using this table for many purposes. Therefore, you need to consider all of them and their importance, when tuning performance of this table.
Case 1: Department queries its own data and needs it sorted
When a result of some data manipulation is reused many times, the general rule is to save it. That allows reading the result straight away, rather than computing it every time.
To allow queries to read sorted data you need to create an index.
However, an index on the sorting column priority alone will not help. As each department can see only its own data, every query also filters on the department number. Hence your index should contain two key columns, as KEY (department, priority).
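A sketch of that index on the table in the question (the index name and department value are illustrative):
-- Composite index: each department's queue can be read already sorted by priority.
ALTER TABLE ssti ADD KEY dept_priority (department, priority);

SELECT demand_nber, priority
FROM ssti
WHERE department = 'D001'
ORDER BY priority;   -- can be satisfied from the index, no filesort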
Case 2: Table is joined to several other tables
To speed up queries with JOINs, you'll need indexes whose key columns match the columns used for the joins.
Case 3: Inserting new, possibly transactional, data
A single table is limited in how well it can handle inserting new data while also processing reporting queries. Usually, transactional and reporting uses are considered alternatives to each other. It is a good practice to use reporting tables that summarise data from transactional tables. Also, joins to dimensions are easier when the data is aggregated (there are fewer rows).