We have a MySQL slave in production that gets heavy additional usage for about 1 hour 15 minutes a day. A number of temporary servers each run multiple threads, and each thread runs a few heavy queries involving data in disk-based and MEMORY tables (all read-only), working in controlled batches of a specified size.
When the process starts, batches of 150 emails per minute complete in 35-40 seconds, and load is around 3-4 on a 4 CPU server, with total CPU usage around 40-50%. The intention is to allow replication to continue once the batch finishes, and to not hit the DB for the full minute. The process continues fairly predictably for around 50 minutes, by which time the load is around 6-8 and total CPU usage is around 50-60%.
After about an hour, the process is running slowly and batches are failing to complete, with maybe 60-80 emails completed each minute. Load is above 10 and CPU usage is at around 90%. The process limps through the last 10 or 15 minutes until all emails are built.
Memory usage is constant throughout, with little free RAM but plenty available. No significant or slow queries run during the process, though the master DB receives increased traffic from responses to the emails, so replication has more to do; this amounts to about +2% CPU usage compared to normal idle levels. And the pattern is exactly the same if the batch size is set to 200: initial batches take longer to complete, but everything still slows to a crawl about an hour in.
I can't make sense of this, but I don't have a deep understanding of how high demand affects load over time. I would've thought the server would quickly get bogged down if it was being overloaded. The fact that the workload is the same throughout the process, and that things don't slow down more quickly when the workload is increased by 33%, throws me.
I'm clearly missing something here, but I have spent some weeks, off and on, investigating what else might be impacting the slave, and there is nothing that I can find. All logging just reinforces the above - the process starts well, and load slowly increases until there is a problem.
Is this just normal behaviour? If so, why? Otherwise, what obvious things should I be considering or looking at?
Thanks
sysstat dump (sample)
time CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle
14:39:01 all 23.36 0.02 1.15 0.82 0.09 0.00 0.21 0.00 0.00 74.36
14:47:01 all 35.39 0.02 1.83 1.01 0.10 0.00 0.42 0.00 0.00 61.22
14:51:01 all 46.43 0.02 2.40 0.85 0.08 0.00 0.50 0.00 0.00 49.71
14:55:01 all 52.07 0.03 2.16 0.68 0.08 0.00 0.52 0.00 0.00 44.47
15:07:01 all 52.45 0.02 2.20 1.36 0.09 0.00 0.48 0.00 0.00 43.40
15:13:01 all 58.39 0.02 2.37 0.64 0.06 0.00 0.54 0.00 0.00 37.98
15:25:01 all 56.92 0.03 2.29 0.66 0.08 0.00 0.52 0.00 0.00 39.51
15:29:01 all 71.71 0.03 2.69 0.57 0.06 0.00 0.52 0.00 0.00 24.42
15:31:01 all 77.22 0.03 3.08 0.39 0.04 0.00 0.59 0.00 0.00 18.65
15:37:01 all 85.00 0.03 2.98 0.16 0.03 0.00 0.65 0.00 0.00 11.16
15:39:01 all 86.66 0.04 2.85 0.11 0.03 0.00 0.65 0.00 0.00 9.67
[EDIT] As always, it's too easy to swamp a question with information, and just as easy to not add enough information. So in response to Jonathan's answer:
Creating the email involves selecting books from a series of MEMORY tables containing differently priced books. A book qualifies where the user has subscribed to its genre, excluding any promoted books already selected for the user, making allowances for whether the user reads romance or erotica, and excluding any book the user has seen in the last 4 weeks; results must be unique. The results must also come in a specific order from the MEMORY tables and subscribed genres, which is why the list is built one book at a time. The queries end up touching a few tables with sub-queries, e.g.:
SELECT SQL_NO_CACHE B.*, A.authorAsin
FROM free_books B
LEFT JOIN book_author A ON A.bookAsin = B.ASIN
WHERE B.BrowseNodeId = {this subscription}
  AND B.number_reviews >= 4
  AND B.erotica = 0
  AND NOT EXISTS
  (
    SELECT 1 FROM user_history H
    WHERE H.userId = {user id}
      AND H.asin = B.ASIN
      AND dateSent BETWEEN DATE_SUB(NOW(), INTERVAL 28 DAY) AND NOW()
  )
  AND NOT EXISTS
  (
    SELECT ASIN FROM free_books K
    INNER JOIN romance_genres R ON R.browsenode = K.BrowseNodeId
    WHERE K.ASIN = B.ASIN
      AND K.BrowseNodeId IN ({all subscriptions})
      AND NOT EXISTS
      (
        SELECT ASIN FROM free_books S
        WHERE S.ASIN = B.ASIN
          AND S.number_reviews > 100
      )
  )
  AND B.ASIN NOT IN ({previouslyselected books})
ORDER BY salesrank
LIMIT 1
We're creating about 2,500 emails per minute at peak, with about 20-40 books being identified for each user with queries similar to the above. Logged completion times for the PHP script that runs this query range from around 0.01 secs to 0.2 secs. From logs on 6th Sep:
14:40 average 0.0232919
14:50 average 0.0332767
14:59 average 0.0323687
15:10 average 0.0212022
15:20 average 0.0207737
15:30 average 0.0221833
15:40 average 0.03252
15:50 average 0.039384
Something I see here is that we pause for 2 mins at 15:00 to trigger a backup, and that's at about +30 minutes. At 15:10 it's running at comparable speeds to 14:40. At +30 minutes again it's slowed down, and without a 2 minute break it continues to slow down. We can live with a 2 minute break every 30 mins as a workaround, so this is something to try though it doesn't address the root problem.
We're not paginating in any way. We identify the next user with this approach, which runs at around 0.003 secs throughout the run:
set last_email field to null for all active users
(loop all users)
    lock user table
    select where last_email is null
    update set last_email=now()
    unlock table
    build email
(endloop)
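For reference, a minimal SQL sketch of the locking sequence inside that loop (assuming the table is named user; names are illustrative):

-- Serialise 'next user' selection across threads and servers
LOCK TABLES user WRITE;
SET @next_user := NULL;
SELECT id INTO @next_user FROM user WHERE last_email IS NULL LIMIT 1;
UPDATE user SET last_email = NOW() WHERE id = @next_user;
UNLOCK TABLES;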
Dropping the batch size is definitely worth a try. When typing the question I noted we'd not tried 100, or 50, and we should explore that.
Indexes should be reviewed. Again :-) But they should be reviewed.
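As a starting point, the predicates in the main query above suggest composite indexes along these lines (hypothetical; they would need verifying with EXPLAIN before trusting them):

-- Covers the BrowseNodeId/erotica/number_reviews filters on the big table
ALTER TABLE free_books ADD INDEX idx_browse (BrowseNodeId, erotica, number_reviews);
-- Covers the 'seen in the last 4 weeks' NOT EXISTS lookup
ALTER TABLE user_history ADD INDEX idx_user_seen (userId, asin, dateSent);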
HD IO monitoring stats show no significant increase during the process. Network IO is around 130MB/min combined during the process (roughly 2.2MB/s, or about 17Mbit/s), which should not be an issue.
That's helped me with a couple of ideas of what to investigate next, which is what I was after for now.
Edit 2:
I've referred to emails being built; these are batched in groups of 10 and sent via a SaaS provider using their API. The response times are consistent and not an issue.
But we also send booklists to an app using push notifications and Amazon's SNS service. In the early days of the product we had equal numbers of signups from app and email, but now it's 90% email. The data for the app excludes some of the data in the email, so the queries are less complex. During the first part of the run we are building for more app users, whereas at the end of the run we are building almost exclusively for email users (users are sorted by primary key, so essentially by creation date). This would be a great explanation for higher load on the DB if it weren't for the fact that the app was introduced after a year, and the very start of the run (which is all email users) is fast. There still might be something in this, though.
MEMORY tables are used on the understanding that these would be fastest, but the process is essentially read-only, and the servers could easily cache the 3 temp tables with indexes if they were InnoDB. There is also an advantage in that a query run to tidy the temp tables after we update them with new books runs significantly faster under InnoDB:
-- Remove the non-fiction copy of any book that appears flagged as both
-- fiction and non-fiction
DELETE #table# FROM #table#
INNER JOIN
(
    SELECT ASIN, MAX(isFiction) AS isFiction, MAX(isNonFiction) AS isNonFiction
    FROM #table#
    GROUP BY ASIN
    HAVING isFiction = 1
    AND isNonFiction = 1
) D
WHERE D.ASIN = #table#.ASIN AND #table#.isNonFiction = 1
Before:
free_books rebuilt after 0.9548 seconds
99_books rebuilt after 17.4050 seconds
299_books rebuilt after 9.5253 seconds
After:
free_books rebuilt after 0.0987 seconds
99_books rebuilt after 0.1214 seconds
299_books rebuilt after 0.1959 seconds
So we'll run with InnoDB for today and see if that makes any difference overall.
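For reference, the engine change being trialled is a single statement per table (a sketch using the table names above):

ALTER TABLE free_books ENGINE = InnoDB;
ALTER TABLE `99_books` ENGINE = InnoDB;
ALTER TABLE `299_books` ENGINE = InnoDB;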
Also, adding a pause every 30 mins made no positive difference. So I'll re-gather data after a few days using InnoDB and try some very low batch sizes.
Edit 3:
Response times from the email SaaS provider follow the same pattern every day. Verbose logs show batch send times are consistent (sampled):
[2018-09-16 14:37:12] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.07 []
[2018-09-16 14:37:15] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 14:45:23] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.07 []
[2018-09-16 14:45:28] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.07 []
[2018-09-16 14:55:20] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 14:55:26] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 15:04:30] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.12 []
[2018-09-16 15:05:07] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 15:11:27] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 15:11:31] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.10 []
[2018-09-16 15:16:31] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.09 []
[2018-09-16 15:17:05] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 15:33:24] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.07 []
[2018-09-16 15:33:30] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
[2018-09-16 15:40:31] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.11 []
[2018-09-16 15:41:11] production.DEBUG: BatchedEmailSender::createSendStats() send average 0.08 []
DB writing does not directly take place, as this is a replication slave. There are replication transactions written to the slave, and there is a slight uptick in load of a few percent, as previously mentioned. This is based on comparing the base level CPU usage for the other 20-odd hours of the day with the CPU usage immediately after the run ends, when the load on the slave reduces to 'normal load plus the additional replication transactions resulting from responses to the emails'. That isn't entirely scientific, but I can't see how to get a better estimate. However, I can see how this explanation would fit the pattern of behaviour being observed. I don't fully grok the internals, but if data is being updated, indexes are being changed, and index pages need to be reloaded, I can see that this could have a large cumulative effect. So this needs investigating.
I'm open to the idea of misconfigured buffers (open to the idea of anything that helps, TBH...) and ways to specifically test that. The configs tend to go through periods of tweaking, but tend to be left alone when everything is working fine. This means I take a "fix it if it breaks" approach to DBA work, and it's not an area where I have a lot of understanding. But it's not like I'm running standard configs on a 32GB RAM instance either.
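One quick way to test the buffer theory uses standard MySQL status counters, so only the interpretation here is an assumption:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- If Innodb_buffer_pool_reads (reads that had to go to disk) climbs sharply
-- during the run relative to Innodb_buffer_pool_read_requests, the pool is
-- likely too small for the working set.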
It should be noted that the InnoDB test failed and had to be reverted to MEMORY tables partway through the run. Individual build times for all emails were hitting 1 sec, rather than only some hitting a few tenths of a second at worst.
Given this is the DB that is experiencing excessive load, I struggle to see how workload on other servers would cause this, except by some inverse relationship where the load on the servers that run the scripts lessens over time, so the scripts are run more often, which overloads the DB. The CPU usage of the webservers shows peak CPU usage at around 4pm when the email run starts at 2:30pm, but these also handle click tracking and other user responses to the emails. The dedicated email servers only run as required and self-terminate, so I'll check the load on those on the next run.
Monitoring the processlist using mytop shows that the slowest queries are more likely to be against the larger table (400k rows, against 200k rows and 80k rows in the others), so I've been looking at indexes and other simple changes in the main query. In local testing, an unexpected speed difference has been noted between using:
AND NOT EXISTS
(
SELECT ASIN FROM free_books K
and
AND NOT EXISTS
(
SELECT 1 FROM free_books K
Whether this makes a significant difference in production remains to be seen, but all changes to the query need to go through a round of testing first.
So for today, the batch will be 100. This should be well within capacity at the start of the run. If the CPU usage still starts to peak at the usual time, it's less likely to be due to the direct demand on the DB by the run, and more likely to do with the side effects of user response to the emails. Though if emails are going out more slowly, user response will probably peak a little later too.
An option to discount the side effects of user response to the emails is to stop replication for the duration of the run. So that will probably be the next thing to try after reducing the batch, maybe going down to 50 depending on what happens at 100.
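The replication pause itself is straightforward (run on the slave; a sketch):

STOP SLAVE;
-- ... email run ...
START SLAVE;
-- then watch Seconds_Behind_Master while it catches up:
SHOW SLAVE STATUS\G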
Edit 4:
Running the batch at 100 resulted in build times increasing at a later time than usual, but essentially the same deterioration was experienced over the course of the run:
"I would be very surprised if queries response time on the same set of tables degrades in time"
The build queries are showing an increase in response time over the course of the run. I thought I'd posted log extracts, but I must have been looking at those before I created this question, so here they are (at 10 minute intervals):
time average max min
14:30 0.0101638 0.0310 0.0049
14:40 0.0125875 0.0383 0.0047
14:50 0.0141314 0.0544 0.0052
15:00 0.0168579 0.1461 0.0051
15:10 0.0150825 0.0529 0.0045
15:20 0.0222632 0.1128 0.0050
15:30 0.0198023 0.0950 0.0060
15:40 0.0326818 0.2564 0.0056
15:50 0.0316094 0.1130 0.0067
16:00 0.03529 0.1181 0.0080
16:07 0.0369679 0.1555 0.0079
However, this logging is not 'main query only' and it should be, along with additional logging for the sub-queries. It is 'average time to identify all books for one user' and includes figuring out which random genre to look for, which table to take the next books from, truncating over-long book attributes, and other tidying. Nothing that looks like a potential bottleneck, but I do need finer detail here.
Monitoring the dedicated email servers on the last run showed they didn't break 20% CPU usage, and load was under 1.
I'm not sure what you mean by buffers here, other than MySQL sort, key, InnoDB pool, join, etc. The 'build and send' loop in more detail is along the lines of:
cronjob triggers PHP script every minute
do
    lock user table, identify user, update user, unlock table
    build booklist
    if app user
        send push notification
    else
        add email to sender
    if sender has 10 emails queued, send them and clear the queue
while (batch is incomplete and time running < 55 seconds)
flush email sender queue
record stats
I see nothing to suggest that garbage is not being collected when the script exits every minute. Occasionally there will be an issue when the mail SaaS provider times out and the script is still running when the next cronjob falls due. It's rare, so it's unlikely to be the underlying factor here.
So what buffers are you referring to here?
The main takeaway from the last couple of days is "unless, of course, there are external factors that affect the whole DB behavior", which I've been getting suspicious about, and it's something that can easily be checked. So today's run will have replication disabled for the duration, along with any other queries that normally run on the slave.
I'd like to do this before doing a dummy run without sending emails, as a dummy run without the increase in DB activity due to user response would still leave unanswered questions.
Edit 5
And I do have response times for the main query after all...
time average max min
14:30 0.00774976 0.1404 0.0015
14:40 0.0114696 0.1646 0.0014
14:50 0.0129792 0.2723 0.0013
15:00 0.0227155 0.5878 0.0013
15:10 0.0172391 0.2312 0.0014
15:20 0.0282831 0.2899 0.0016
15:30 0.0207079 0.3851 0.0016
15:40 0.0390089 0.9131 0.0013
15:50 0.0442052 0.4880 0.0014
16:00 0.0308441 0.6848 0.0013
16:07 0.0322117 0.7899 0.0014
And those are from:
Set timer
Initialise query variable
Run query
Check for return value
Log timer
Edit 6:
1 & 4)
From this earlier pseudocode:
set last_email field to null for all active users
(loop all users)
    lock user table
    select first user where last_email is null
    update set last_email=now()
    unlock table
    build email
(endloop)
The lock allows exclusive access to the user table, so one thread on one server gets the next due user, and it's updated before unlocking for access by the next thread. There is an option to queue all active user IDs to a message broker and work them from there, but as the time taken to identify the next user doesn't seem to vary significantly during the run, the method using LOCK remains in place for now.
The user table is actually MyISAM, which is a legacy issue.
2) The batching of emails is somewhat arbitrary. We use a concurrent connections library that exits on completion of the slowest connection, and emails can be up to 250kb, so having too many in a batch will increase the likelihood of a slow send, increase memory usage, and dump a burst on the SaaS API. The batch size hasn't been experimented with, as this hasn't been seen as the root problem. One server sampled from yesterday on response times:
time avg max min
14:30 0.027 0.09 0.01
14:40 0.08 0.09 0.07
14:50 0.081 0.09 0.07
15:00 0.04 0.04 0.04
15:10 0.086 0.11 0.08
15:20 0.035 0.09 0.01
15:30 0.021 0.03 0.02
15:40 0.206 0.70 0.07
15:50 0.2 0.69 0.07
16:00 0.092 0.13 0.08
3) Stats are recorded in fine detail to logs for debugging and to the DB for totals (mostly). Nearly everything I'm quoting here is coming from log files. DB updates are committed when they are run, not together at the end of a batch. A typical 1 minute batch would:
update user.last_email (150 per minute)
create timing stats for sending each batch of 10 emails (so ~15 times a minute)
create timing stats for average build time over the entire minute (created once)
create stats for total built in that minute (created once)
(all x 16 threads running over all servers, which doesn't seem excessive overall)
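In SQL terms, a rough sketch of those per-batch writes (the stats table and its columns are hypothetical; only user.last_email is named above):

UPDATE user SET last_email = NOW() WHERE id = @user_id;           -- per user, ~150/min
INSERT INTO send_stats (sent_at, emails_in_batch, avg_send_secs)  -- per 10-email batch
VALUES (NOW(), 10, 0.08);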
Looking more into replication, I found that the binlogs on the master doubled in frequency (to the 10M limit) over the run, which suggests there are roughly twice as many transactions hitting the DB over that period. Plans are still on to pause replication during the next run to see what that reveals.
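Binlog growth can be watched directly on the master (the file name below is illustrative):

SHOW BINARY LOGS;  -- lists each binlog with its size
SHOW BINLOG EVENTS IN 'mysql-bin.000123' LIMIT 10;  -- sample the events in one log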
I've also been digging more into the sysstat results, and am confident in discounting disk IO as a problem, though changes in network traffic totals closely resemble the pattern of deterioration of the main query response time. With the lower batch sizes yesterday, load was significantly lower (peaking at 6 rather than 10 when running with a batch of 150, and tending to be about 4 on a 4 CPU server, which suggests it's not CPU-bound), and it also reflected the pattern of the main query response time; but then it would, wouldn't it.
I'll look closer at what network throughput this server should be able to cope with, as I can see how that being a bottleneck would produce this result.
Later:
It wasn't a bottleneck. Tests show throughput available up to about 50x what was reported as peak during the last run.
Edit 7:
To clarify the basic infrastructure:
The slave DB runs in isolation - no other code runs on it beyond a healthcheck.
An email SaaS provider (Mailgun) is used to actually send the emails. We just aim to hit their API in a measured way.
There are 2 webservers running on other instances which serve the front end and also each build 1 thread of emails throughout the day as needed for newly registered users. Most of the time these have nothing to build and run at <5% CPU usage, peaking to ~25% during the main build.
There are 4 'email' servers, each running 4 threads, which are auto-created just before the whole process starts and auto-terminate when load suggests that all emails have been sent. These run at ~25% CPU usage (~1 load on a 2 CPU instance) for about 80 mins or so, then shut down.
They all run the same code for building emails; the email servers just have 4 cronjobs initiating the PHP script, whereas the webservers have 1. The script connects to the master DB for critical reads and updates (user and stats tables), and to the slave DB for the read-only queries for the builds.
Currently, the DB is the only element in the chain presenting a problem. The clients have a requirement that the emails should be built and sent within 2 hours, but as soon as possible is best. Currently the process completes in about 80 mins. As new users register at a steady rate, at some point the 2 hour limit will be hit. Given that the DB is becoming overloaded during the process to the point where queries are taking twice as long after about 80 mins, it's difficult to see this resolving itself; instead the response rate will continue to deteriorate.
Parallelism is mostly required to hit the 2 hour limit. Historical data tells us how long each part of the process typically takes (some parts relying on the DB, others not), and a single thread would spend some of its time not hitting the DB. We're intentionally pushing the DB up to a reasonable operational limit, but we are unintentionally doing something that causes its performance to deteriorate. And this is the thing I'm trying to find out.
So with your suggestion, it seems like you're saying that the load increase may come from the management of the threads and not necessarily from the work being done by the threads? I've been looking closely at the sysstat files and would expect that to appear as high sys CPU usage, but I suppose this is MySQL's thread management at work rather than kernel thread management, so it's all in the MySQL process CPU usage.
The test of turning off replication made no difference yesterday, so now that that's been discounted as a possible cause, it's time to create a test server. Creating an environment that mirrors what goes on in production during the run is not trivial, though, and only being able to test this once a day is limiting.
I'll create a test branch for code that mocks the calls to external APIs, and get a full suite of webservers, email servers, and master and slave DBs set up. I'll run variations of batches and threads, test some code changes that might be a little less demanding, and experiment with indexes in the temporary tables too.
So, once again, thanks for the advice. I'll post back here when I have anything worth adding.
Final edit:
It's nothing to do with the DB. Though it took a while to get back to this as other higher priority issues came up, testing shows quite clearly that this is an issue with the nature of the queries, and not an issue with the DB itself.
The load is always 2-4 with batches of 200 for the first 20k accounts. And it's always 10-12, with only 90-110 completing, for the last 20k accounts. This is after multiple runs with and without allowing time between them, swapping randomly from one to the other. It's unequivocal.
The main cause is that a full list of books up to the user-requested limit must be completed. If there are no books in a particular genre, a different genre is used, and the code loops round and builds a new query to achieve this. If all the genres in one price range are empty, a different genre in a different price table is used, and again the code loops. Empty genres and tables are omitted from future queries for that user. The incidence of 'no matching books found' is about 3x higher in the last 20k users compared to the first 20k users, so more queries are being run for each user, on average, towards the end of the run.
Later users tend to have a higher booklimit and tend to subscribe to fewer genres, which increases the chance of finding no matches and hitting empty tables. We did increase the default booklimit for new users at some point, which explains the former, but I've found no reason for later users subscribing to fewer genres.
Though there probably is some mileage in reviewing queries and indexes on the DB, ultimately this is probably something to be addressed through logic in the code.
Well, the last two edits do shed some light on the possible sources of the problem. I'll start with the 5th.
The main query response time does show an increase over time (see average and max). That does not necessarily imply that there is an issue with the query itself, though. My guess is that there are other side activities affecting it.
Second, the first piece of pseudo-code includes some elements for which I need a brief explanation:
Why do you lock the user table (lock user table, identify user, update user, unlock table)?
You are collecting up to 10 outgoing emails before actually sending them. I guess this is for efficiency. Did you try playing with this threshold?
You are recording statistics. Is this into the DB? If yes, do you commit after each cycle?
How do you make sure that you will not be sending the same list to a user? I mean, from your example, I see no mechanism that would prevent that. Again guessing: you record something in the DB to prevent duplications. If I'm right, how often do you commit transactions?
EDIT 1:
If I understand correctly, all your threads are sharing the same resources (DB and e-mail server), these two being the slowest elements in the whole chain, right?
As such, could you elaborate on the reasons for having parallelism at all?
I'll explain:
Since you are sharing single instances of resources (quote: "We have a mySQL slave in production..."), and multiple threads are sending requests to this single resource against the same tables, it is questionable whether the throughput of the arrangement as a whole can be increased.
In fact, due to the need to prevent different threads from processing the same user, you are adding load to the DB (locks, nulling last_email, etc.).
I'm not sure if it would be possible for you to perform the following test in production, but its result would/could be quite interesting...
Reduce the number of parallel threads by a factor of 2 (say, from 10 to 5),
Increase the batch size by the same factor (say from 150 to 300).
Since CPU load does not appear to be high even when there is a degradation in performance, an increase in the batch size would not bring the CPU to its knees, while it would reduce the work on those parts aimed at maintaining consistency across the parallel threads.
You haven't given enough information to get a complete picture, so I will have to answer you based on experience:
The high CPU can be a symptom of over-processing/filtering due to inadequate indexes.
The tail end of your process can be slow either because you are paginating and you are towards the end of the list (i.e., LIMIT 1000000,150), or because some sort of IO resource is getting slow, which may then mix with a lock.
Immediate things you can do:
Shrink the batch size to 1/10 and see how the tail end performs.
Analyse the queries and table structure using EXPLAIN and, noting the time a query takes to complete, find some better indexes.
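For example, a cut-down version of the question's main query under EXPLAIN (placeholder values substituted for the {...} parameters):

EXPLAIN SELECT SQL_NO_CACHE B.*, A.authorAsin
FROM free_books B
LEFT JOIN book_author A ON A.bookAsin = B.ASIN
WHERE B.BrowseNodeId = 12345
  AND B.number_reviews >= 4
  AND B.erotica = 0
ORDER BY salesrank
LIMIT 1;
-- Watch for type=ALL (full table scans), NULL in the key column, and large
-- 'rows' estimates.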
I am developing an application that consists of a server in Node.js, basically a socket that is listening for incoming connections.
Data arriving at my server comes from GPS trackers (approximately 30), each sending 5 records per minute, so in a minute I will have 5*30 = 150 records, in an hour 150*60 = 9,000 records, in a day 9,000*24 = 216,000, and in a month 216,000*30 = 6,480,000 records.
In addition to the latitude and longitude, I have to store in the database (MySQL) the cumulative distance of each tracker. Each tracker sends positions to the server, and I have to calculate the km between 2 points every time I receive data (to reduce the work for the database once it has millions of records).
So the question is: what is the correct way to sum the kilometers and store them?
I think summing over the entire table is not a solution, because with millions of records it will be very slow. Maybe, every time I have to store a new point (150 times per minute), I can select the last record in the database and then add the new distance calculation to the cumulative kilometers?
2.5 inserts per second is only a modest rate. 6M records/month -- no problem.
How do you compute the KMs? The distance from the previous GPS reading to the current one? Or maybe back to the start? Keep in mind that GPS readings can be a bit flaky; a car going in a straight line may look drunk when plotted every 12 seconds. Meanwhile, I will assume you need some kind of index on (sensor, sequence) to keep track of the previous (or first) reading in order to compute the distance.
But, what will you do with the distance? Is it being continually read out for display somewhere? That is, are you updating some non-MySQL thingie 150 times per minute? If so, you have an app that should receive the new GPS reading, store it into MySQL, read the starting point (or remember it), compute the kms and update the graph. That is, MySQL is not the focus here, but your app is.
As for representation of lat/lng, I reference my cheat sheet to see that FLOAT may be optimal.
Kilometers should almost certainly be stored as FLOAT. That gives you about 7 significant digits of precision. You should decide whether the value represents "meters" or "kilometers". (The precision is the same.)
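A minimal SQL sketch of the "read the last total, add the new leg, insert" approach the question proposes (all names and values are illustrative):

CREATE TABLE gps_readings (
  id          BIGINT AUTO_INCREMENT PRIMARY KEY,
  tracker_id  INT NOT NULL,
  lat         FLOAT NOT NULL,
  lng         FLOAT NOT NULL,
  total_km    FLOAT NOT NULL,   -- running total for this tracker
  recorded_at DATETIME NOT NULL,
  KEY idx_tracker (tracker_id, id)
);

-- For each new reading: fetch the previous total, add the newly computed
-- distance, and insert; there is never a need to sum millions of rows.
SET @prev := NULL;
SELECT total_km INTO @prev
FROM gps_readings WHERE tracker_id = 42 ORDER BY id DESC LIMIT 1;

INSERT INTO gps_readings (tracker_id, lat, lng, total_km, recorded_at)
VALUES (42, 51.5007, -0.1246, COALESCE(@prev, 0) + 0.183, NOW());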
I am using a sort of code_ping to measure the time it takes to process the whole page, on all the pages in my web portal.
I figured that if I set $count_start in the header, initialised with the current timestamp, and $count_end in the footer, likewise, the difference is a meter that roughly lets me know how well optimised the page is (queries, loading time of all the things on that particular page).
Say for one page I get 0.0075 seconds, for others I get 0.045, etc. I'm working on optimising the queries this way.
My question is: if one page has a "rough loading time" of 0.007 seconds by this meter, will 1000 users querying the same page at the same time each get the result in 0.007 * 1000 = 7 seconds? Meaning, will they each get the page after 7 seconds?
thanks
Luckily, it doesn't usually mean that.
The missing variable in your equation is how your database and your application server and anything else in your stack handles concurrency.
To illustrate this strictly from the MySQL perspective, I wrote a test client program that establishes a fixed number of connections to the MySQL server, each in its own thread (and so, able to issue a query to the server at approximately the same time).
Once all of the threads have signaled back that they are connected, a message is sent to all of them at the same time, to send their query.
When each thread gets the "go" signal, it looks at the current system time, then sends the query to the server. When it gets the response, it looks at the system time again, and then sends all of the information back to the main thread, which compares the timings and generates the output below.
The program is written in such a way that it does not count the time required to establish the connections to the server, since in a well-behaved application the connections would be reusable.
The query was SELECT SQL_NO_CACHE COUNT(1) FROM ... (an InnoDB table with about 500 rows in it).
threads 1 min 0.001089 max 0.001089 avg 0.001089 total runtime 0.001089
threads 2 min 0.001200 max 0.002951 avg 0.002076 total runtime 0.003106
threads 4 min 0.000987 max 0.001432 avg 0.001176 total runtime 0.001677
threads 8 min 0.001110 max 0.002789 avg 0.001894 total runtime 0.003796
threads 16 min 0.001222 max 0.005142 avg 0.002707 total runtime 0.005591
threads 32 min 0.001187 max 0.010924 avg 0.003786 total runtime 0.014812
threads 64 min 0.001209 max 0.014941 avg 0.005586 total runtime 0.019841
Times are in seconds. The min/max/avg are the best/worst/average times observed running the same query. At a concurrency of 64, you'll notice the best case wasn't all that different from the best case with only 1 query. But the biggest take-away here is the total runtime column. That value is the difference in time from when the first thread sent its query (they all send their query at essentially the same time, but "precisely" the same time is impossible since I don't have a 64-core machine to run the test script on) to when the last thread received its response.
Observations: the good news is that the 64 queries taking an average of 0.005586 seconds definitely did not require 64 * 0.005586 seconds = 0.357504 seconds to execute... it didn't even require 64 * 0.001089 seconds (the best-case time) = 0.069696 seconds. All of those queries were started and finished within 0.019841 seconds... or only about 28.5% of the time it would have theoretically taken for them to run one after another.
The bad news, of course, is that the average execution time on this query at a concurrency of 64 is over 5 times as high as the time when it's only run once... and the worst case is almost 14 times as high. But that's still far better than a linear extrapolation from the single-query execution time would suggest.
Things don't scale indefinitely, though. As you can see, performance does deteriorate with concurrency, and at some point it would go downhill -- probably fairly rapidly -- as we reached whichever bottleneck occurred first. The number of tables, the nature of the queries, and any locking that is encountered all contribute to how the server performs under concurrent load, as do the performance of your storage, the size, performance, and architecture of the system's memory, and the internals of MySQL -- some of which can be tuned and some of which can't.
But of course, the database isn't the only factor. The way the application server handles concurrent requests can be another big part of your performance under load, sometimes to a larger extent than the database, and sometimes less.
One big unknown from your benchmarks is how much of that time is spent by the database answering the queries, how much is spent by the application server executing the business logic, and how much is spent by the code that renders the page results into HTML.