Message Queue Timelimit warning - message-queue

I'm currently trying to configure my Oro Community Shop, running on Docker (WSL2 backend).
While I was configuring some products and the web catalogue, I noticed that some of them are not showing up on the frontend. I tried running the message consumer with different options, e.g.:
symfony run -d php bin/console oro:message-queue:consume --time-limit="+1200 seconds" --memory-limit=556MB -v --env=dev
But no matter what, this error/warning occurs:
[Application] May 20 08:40:34 |DEBUG | CONSUM Switch to a queue oro.seconds
[Application] May 20 08:40:34 |DEBUG | CONSUM Execution interrupted as limit time has passed. now: "2022-05-20T08:40:34+0200", time-limit: "2022-05-20T08:40:34+0600" extension="Oro\Component\MessageQueue\Consumption\Extension\LimitConsumptionTimeExtension"
[Application] May 20 08:40:34 |DEBUG | DOCTRI SELECT updated_at FROM oro_message_queue_state WHERE id = :id extension="Oro\Bundle\MessageQueueBundle\Consumption\Extension\ResettableExtensionWrapper" id="cache" memory_usage="101.33 MB"
[Application] May 20 08:40:34 |INFO | CONSUM Update the consumer state time. memory_usage="101.36 MB"
[Application] May 20 08:40:34 |DEBUG | DOCTRI UPDATE oro_message_queue_state SET updated_at = :updatedAt WHERE id = :id AND updated_at < :dateWithGap dateWithGap="2022-05-20T06:35:34+00:00" id="consumers" updatedAt="2022-05-20T06:40:34+00:00"
[Application] May 20 08:40:34 |DEBUG | CONSUM Make sure the queue "oro.seconds" exists on a broker side. memory_usage="102.08 MB"
[Application] May 20 08:40:34 |WARNING| CONSUM Consuming interrupted, reason: The limit time has passed. memory_usage="102.43 MB"
Did I forget or fail to configure something?
It would be great if someone with more experience could help me handle this problem.
Thanks, Alex

If it's a local environment, you can omit the time-limit option and restart the process manually when it stops.
In the production environment, it is required to run the message consumer using a supervisor that will automatically restart the consumer when needed. For more details, see https://doc.oroinc.com/backend/mq/supervisord/#supervisord and https://doc.oroinc.com/backend/mq/.
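For production, a minimal sketch of such a supervisord program definition, in the spirit of the linked docs, might look like the following; the program name, install path, user, and process count are placeholders for your environment:
[program:oro_message_consumer]
command=php /var/www/oro/bin/console oro:message-queue:consume --env=prod
process_name=%(program_name)s_%(process_num)02d
numprocs=1
autostart=true
autorestart=true
user=www-data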

Related

SGE Setting to Slow Down Specific Job

One of our SGE jobs was running slowly and was killed by qmaster to enforce h_rt=1200.
Is it possible that an SGE admin dynamically changed a setting to make the job (id=2771780) run slowly? If yes, what could that setting be? If not, what else could cause this?
qname test.q
hostname abc
group domain
owner jenkins
project NONE
department defaultdepartment
jobname top
jobnumber 2771780
taskid undefined
account sge
priority 0
qsub_time Mon Dec 20 11:46:06 2021
start_time Mon Dec 20 11:46:07 2021
end_time Mon Dec 20 12:06:08 2021
granted_pe NONE
slots 1
failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status 137 (Killed)
ru_wallclock 1201s
ru_utime 0.088s
ru_stime 8.797s
ru_maxrss 5.559KB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 23574
ru_majflt 0
ru_nswap 0
ru_inblock 128
ru_oublock 240
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 24156
ru_nivcsw 66
cpu 1454.650s
mem 54.658GBs
io 495.010GB
iow 0.000s
maxvmem 1014.082MB
arid undefined
ar_sub_time undefined
category -U arusers,digital -q test.q -l h_rt=1200
If you are saying that the job usually finishes within 1200s but ran slowly on this particular occasion, this could be due to various external factors, such as contention for storage or network bandwidth. You may also have landed on a different compute node type with a slower CPU. An SGE admin can change various resource settings before the job starts executing, such as the number of cores, but the more likely issue is contention for storage/IO, or even a CPU throttled for thermal reasons.
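To compare what actually happened with what is configured, the standard SGE tools can show both sides; the job id and queue name below are taken from the accounting record above:
# Full accounting record for the killed job (the same data as shown above)
qacct -j 2771780
# Hard runtime/CPU/memory limits currently configured on the queue
qconf -sq test.q | grep -E 'h_rt|h_cpu|h_vmem'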

Google Fit estimated steps through REST API decrease over time for some users

We are using the Google Fit REST API in a process with thousands of users to get daily steps. With most users the process is OK, although we are finding some users with this specific behaviour: their steps increase during the day, but at some point they decrease significantly.
We are finding a few issues related to this, mainly with Huawei Health apps (and some Xiaomi health apps).
We use this dataSourceId to get daily steps: derived:com.google.step_count.delta:com.google.android.gms:estimated_steps
An example of one of our requests to get data for 15th March (Spanish time):
POST https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate
Accept: application/json
Content-Type: application/json;encoding=utf-8
Authorization: Bearer XXXXXXX
{
  "aggregateBy": [{
    "dataTypeName": "com.google.step_count.delta",
    "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
  }],
  "bucketByTime": { "durationMillis": 86400000 },
  "startTimeMillis": 1615762800000,
  "endTimeMillis": 1615849200000
}
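For reference, the same aggregate call as a curl command (just a sketch; the access token is a placeholder):
curl -s -X POST \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json;encoding=utf-8" \
  -d '{"aggregateBy": [{"dataTypeName": "com.google.step_count.delta",
       "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"}],
       "bucketByTime": {"durationMillis": 86400000},
       "startTimeMillis": 1615762800000, "endTimeMillis": 1615849200000}' \
  https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate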
With most users this works well (it returns the same data that the Google Fit app shows to the user), but with some users, as described, the numbers increase during the day at first and decrease later. Some of those users' data in the Google Fit app is significantly greater than what the REST API returns.
We have even traced this with a specific user during one day. Using buckets of "durationMillis": 3600000, we plotted a histogram of hourly steps in one day (with a custom-made process).
For the same day, at different moments in time (a couple of hours apart in this case), we get this for the EXACT SAME USER:
20210315-07 | ########################################################## | 1568
20210315-08 | ############################################################ | 1628
20210315-09 | ########################################################## | 1574
20210315-10 | ####################### | 636
20210315-11 | ################################################### | 1383
20210315-12 | ###################################################### | 1477
20210315-13 | ############################################### | 1284
20210315-14 | #################### | 552
vs. this, which was retrieved A COUPLE OF HOURS LATER:
20210315-08 | ################# | 430
20210315-09 | ######### | 229
20210315-10 | ################# | 410
20210315-11 | ###################################################### | 1337
20210315-12 | ############################################################ | 1477
20210315-13 | #################################################### | 1284
20210315-14 | ###################### | 552
("20210315-14" means 14.00 at 15th March of 2021)
This is the JSON returned in the first case:
[{"startTimeNanos":"1615763400000000000","endTimeNanos":"1615763460000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":6,"mapVal":[]}]},
{"startTimeNanos":"1615788060000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1568,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615795080000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1628,"mapVal":[]}]},
{"startTimeNanos":"1615795200000000000","endTimeNanos":"1615798500000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1574,"mapVal":[]}]},
{"startTimeNanos":"1615798860000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":636,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1383,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
This is the JSON returned in the latter case:
[{"startTimeNanos":"1615788300000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":517,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615794540000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":430,"mapVal":[]}]},
{"startTimeNanos":"1615796400000000000","endTimeNanos":"1615798200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":229,"mapVal":[]}]},
{"startTimeNanos":"1615798980000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":410,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1337,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
As you can see, all points always come from originDataSourceId "raw:com.google.step_count.delta:com.huawei.health".
It looks like some Google Fit process is doing some kind of adjustment, removing some steps or data points, although we cannot find a way to detect what or why, and we cannot explain to the user what is happening, or what he or we can do to make his app data match ours exactly (or the other way around). His Google Fit app shows a number that is not the same as the one the REST API returns.
The user has already disabled the "Google Fit app tracking activities" option.
I would love to know, or to get some hints about:
What can I do to debug this further?
Any hint about why this is happening?
Is there any way, from a configuration point of view (for the user), to prevent this from happening?
Is there any way, from a development point of view, to prevent this from happening?
Thanks and regards.
UPDATE AFTER Andy Turner's question (thanks for the comment!):
We were able to "catch" this over several hours: 18.58 (around 6K steps), 21.58 (around 25K steps), 22.58 (around 17K steps), 23.58 (around 26K steps). We exported the datasets for those, and here is the result.
Another important piece of information: the data comes only from "raw:com.google.step_count.delta:com.huawei.health". We went through other datasets that might look suspicious, and all were empty (apart from the derived ones and so on).
If we interpret this correctly, it is probably Huawei that sends one value at one moment and a different one the next time, so it is probably some misconfiguration on the Huawei side.
Here are the datasets exported:
https://gist.github.com/jmarti-theinit/8d98996873a9c499a14899a9b62162f3
Result of the GIST is:
Length of 18.58 points 165
Length of 21.58 points 503
Length of 22.58 points 294
Length of 23.58 points 537
How many points in 21.58 that exist in 18.58 => 165
How many points in 22.58 that exist in 18.58 => 57
How many points in 22.58 that exist in 21.58 => 294
How many points in 23.58 that exist in 18.58 => 165
How many points in 23.58 that exist in 21.58 => 503
How many points in 23.58 that exist in 22.58 => 294
So our bet is that points are removed and added by the devices behind Huawei (for example, only 57 points are common between 18.58 and 22.58), and that we cannot control anything more from Google Fit's side. Is that correct? Is there anything else we could look at?
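For what it's worth, the overlap counts above can be reproduced with a small shell sketch, assuming each export is a flat JSON array of points like the ones pasted earlier (the file names are hypothetical):
# Extract the start timestamps of two exports and count the ones they share
jq -r '.[].startTimeNanos' export_1858.json | sort -u > a.txt
jq -r '.[].startTimeNanos' export_2258.json | sort -u > b.txt
comm -12 a.txt b.txt | wc -l   # points present in both exports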
We're having similar issues using the REST API.
Here is what coincides with Jordi's case:
we are also from Spain (and our users too), although we use servers in Spain and the US
we get the same daily steps value as the Google Fit app for some users, but not for other users
daily steps increase during the current day, but on following days, when we repeat the request, the daily steps sometimes decrease
we are making the same request, from the start of the day to the end of the day, with 86400000 as the bucket time and the same data type and data source id
We are in the final development phase, so we're testing with a few users only. Our users have Xiaomi Mi Band devices.
We think the problem could be a desynchronization of the servers we're hitting, because if we test with other apps like this one, they show the correct values. We've created new Google Cloud Console OAuth client credentials and new email accounts to test with brand-new users and OAuth clients, but the results are the same.
This is the recommended way to get the daily steps, and we are using exactly the same request:
https://developers.google.com/fit/scenarios/read-daily-step-total
Even with the "Try it" option in the documentation, the results are wrong.
What else can we do to help you resolve the issue?
Thank you very much!

MySQL has gone away: Connection_errors_peer_address with high numbers

We have MySQL 5.7 master-slave replication, and on the slave servers' side our application monitoring tools (Tideways and PHP 7.0) report from time to time:
MySQL has gone away.
Checking the MySQL side:
show global status like '%Connection%';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| Connection_errors_accept          | 0        |
| Connection_errors_internal        | 0        |
| Connection_errors_max_connections | 0        |
| Connection_errors_peer_address    | 323      |
| Connection_errors_select          | 0        |
| Connection_errors_tcpwrap         | 0        |
| Connections                       | 55210496 |
| Max_used_connections              | 387      |
| Slave_connections                 | 0        |
+-----------------------------------+----------+
Connection_errors_peer_address shows 323. How can we further investigate what is causing this issue on both sides:
MySQL has gone away
and
Connection_errors_peer_address
EDIT:
Master Server
net_retry_count = 10
net_read_timeout = 120
net_write_timeout = 120
skip_networking = OFF
Aborted_clients = 151650
Slave Server 1
net_retry_count = 10
net_read_timeout = 30
net_write_timeout = 60
skip_networking = OFF
Aborted_clients = 3
Slave Server 2
net_retry_count = 10
net_read_timeout = 30
net_write_timeout = 60
skip_networking = OFF
Aborted_clients = 3
In MySQL 5.7, when a new TCP/IP connection reaches the server, the server performs several checks, implemented in sql/sql_connect.cc in the function check_connection().
One of these checks is to get the IP address of the client-side connection, as in:
static int check_connection(THD *thd)
{
  ...
  if (!thd->m_main_security_ctx.host().length)   // If TCP/IP connection
  {
    ...
    peer_rc= vio_peer_addr(net->vio, ip, &thd->peer_port, NI_MAXHOST);
    if (peer_rc)
    {
      /*
        Since we can not even get the peer IP address,
        there is nothing to show in the host_cache,
        so increment the global status variable for peer address errors.
      */
      connection_errors_peer_addr++;
      my_error(ER_BAD_HOST_ERROR, MYF(0));
      return 1;
    }
    ...
  }
  ...
}
Upon failure, the status variable connection_errors_peer_addr is incremented, and the connection is rejected.
vio_peer_addr() is implemented in vio/viosocket.c (code simplified to show only the important calls)
my_bool vio_peer_addr(Vio *vio, char *ip_buffer, uint16 *port,
                      size_t ip_buffer_size)
{
  if (vio->localhost)
  {
    ...
  }
  else
  {
    /* Get sockaddr by socked fd. */
    err_code= mysql_socket_getpeername(vio->mysql_socket, addr, &addr_length);
    if (err_code)
    {
      DBUG_PRINT("exit", ("getpeername() gave error: %d", socket_errno));
      DBUG_RETURN(TRUE);
    }
    /* Normalize IP address. */
    vio_get_normalized_ip(addr, addr_length,
                          (struct sockaddr *) &vio->remote, &vio->addrLen);
    /* Get IP address & port number. */
    err_code= vio_getnameinfo((struct sockaddr *) &vio->remote,
                              ip_buffer, ip_buffer_size,
                              port_buffer, NI_MAXSERV,
                              NI_NUMERICHOST | NI_NUMERICSERV);
    if (err_code)
    {
      DBUG_PRINT("exit", ("getnameinfo() gave error: %s",
                          gai_strerror(err_code)));
      DBUG_RETURN(TRUE);
    }
    ...
  }
  ...
}
In short, the only failure path in vio_peer_addr() happens when a call to mysql_socket_getpeername() or vio_getnameinfo() fails.
mysql_socket_getpeername() is just a wrapper on top of getpeername().
The man 2 getpeername manual lists the following possible errors:
NAME
getpeername - get name of connected peer socket
ERRORS
EBADF The argument sockfd is not a valid descriptor.
EFAULT The addr argument points to memory not in a valid part of the process address space.
EINVAL addrlen is invalid (e.g., is negative).
ENOBUFS
Insufficient resources were available in the system to perform the operation.
ENOTCONN
The socket is not connected.
ENOTSOCK
The argument sockfd is a file, not a socket.
Of these errors, only ENOBUFS is plausible.
As for vio_getnameinfo(), it is just a wrapper around getnameinfo(), which, according to its man page (man 3 getnameinfo), can fail for the following reasons:
NAME
getnameinfo - address-to-name translation in protocol-independent manner
RETURN VALUE
EAI_AGAIN
The name could not be resolved at this time. Try again later.
EAI_BADFLAGS
The flags argument has an invalid value.
EAI_FAIL
A nonrecoverable error occurred.
EAI_FAMILY
The address family was not recognized, or the address length was invalid for the specified family.
EAI_MEMORY
Out of memory.
EAI_NONAME
The name does not resolve for the supplied arguments. NI_NAMEREQD is set and the host's name cannot be located, or neither hostname nor service name were requested.
EAI_OVERFLOW
The buffer pointed to by host or serv was too small.
EAI_SYSTEM
A system error occurred. The error code can be found in errno.
The gai_strerror(3) function translates these error codes to a human readable string, suitable for error reporting.
Here many failures can happen, basically due to heavy load or the network.
To understand the process behind this code: what the MySQL server is essentially doing is a reverse DNS lookup, to:
find the hostname of the client
find the IP address corresponding to this hostname
and later convert this IP address to a hostname again (see the call to ip_to_hostname() that follows).
Overall, failures accounted for in Connection_errors_peer_address can be due to system load (causing transient failures like out of memory, etc.) or to network issues affecting DNS.
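Since the failing calls boil down to getpeername() and getnameinfo(), one quick sanity check is to exercise the same reverse lookup from the database host; the client IP below is a placeholder:
# Reverse DNS lookup for a sample client address
dig +short -x 192.0.2.10
# The same lookup through the system resolver, which is what getnameinfo() uses
getent hosts 192.0.2.10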
Disclosure: I happen to be the person who implemented this Connection_errors_peer_address status variable in MySQL, as part of an effort to have better visibility / observability in this area of the code.
[Edit] To follow up with more details and/or guidelines:
When Connection_errors_peer_address is incremented, the root cause is not printed in the logs. That is unfortunate for troubleshooting, but it also avoids flooding the logs and causing even more damage; there is a tradeoff here. Keep in mind that anything that happens before login is very sensitive ...
If the server really goes out of memory, it is very likely that many other things will break, and that the server will go down very quickly. By monitoring the total memory usage of mysqld, and monitoring the uptime, it should be fairly easy to determine if the failure "only" caused connections to be closed with the server staying up, or if the server itself failed catastrophically.
Assuming the server stays up on failure, the more likely culprit is then the second call, to getnameinfo().
Using skip-name-resolve will have no effect, as this check happens later (see specialflag & SPECIAL_NO_RESOLVE in the code in check_connection())
When the check behind Connection_errors_peer_address fails, note that the server cleanly returns the error ER_BAD_HOST_ERROR to the client, and then closes the socket. This is different from abruptly closing a socket (as in a crash): the former should be reported by the client as "Can't get hostname for your address", while the latter is reported as "MySQL has gone away".
Whether the client connector actually treats ER_BAD_HOST_ERROR and a closed socket differently is another story.
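To see which message the client actually received, the error code can be translated with the perror utility that ships with the server; ER_BAD_HOST_ERROR is error 1042:
# Prints the text for "Can't get hostname for your address"
perror 1042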
Given that this failure overall seems related to DNS lookups, I would check the following items:
See how many rows are in the performance_schema.host_cache table.
Compare this with the size of the host cache, see the host_cache_size system variable.
If the host cache appears full, consider increasing its size: this will reduce the number of DNS calls overall, relieving pressure on DNS, in the hope (admittedly, this is just a shot in the dark) that DNS transient failures will disappear.
323 out of 55 million connections indeed seems transient. Assuming the monitoring client does sometimes connect properly, inspect the row in the host_cache table for this client: it may contain other reported failures.
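A sketch of those checks in SQL; the client IP in the last query is a placeholder:
SELECT COUNT(*) FROM performance_schema.host_cache;
SHOW GLOBAL VARIABLES LIKE 'host_cache_size';
-- inspect the cached failure counters for one specific client
SELECT * FROM performance_schema.host_cache WHERE IP = '192.0.2.10'\G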
Table performance_schema.host_cache documentation:
https://dev.mysql.com/doc/refman/5.7/en/host-cache-table.html
Further readings:
http://marcalff.blogspot.com/2012/04/performance-schema-nailing-host-cache.html
[Edit 2] Based on the new data available:
The Aborted_clients status variable shows some connections forcefully closed by the server. This typically happens when a session is idle for a very long time.
A typical scenario for this to happen is:
A client opens a connection, and sends some queries
Then the client does nothing for an extended amount of time (greater than net_read_timeout)
Due to the lack of traffic, the server closes the session and increments Aborted_clients
The client then sends another query, sees a closed connection, and reports "MySQL has gone away"
Note that a client application forgetting to cleanly close sessions will execute steps 1-3; this could be the case for Aborted_clients on the master. Some cleanup here to fix the client applications using the master would help decrease resource consumption, as leaving 151650 sessions open to die on timeout has a cost.
A client application executing steps 1-4 can cause Aborted_clients on the server and "MySQL has gone away" on the client. The client application reporting "MySQL has gone away" is most likely the culprit here.
If a monitoring application, say, checks the server every N seconds, make sure the timeouts (here 30 and 60 sec) are significantly greater than N, or the server will kill the monitoring session.
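To compare the relevant timeouts with the monitoring interval, something like this helps (which variables matter depends on the client; wait_timeout is the usual idle-session timeout):
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('net_read_timeout', 'net_write_timeout',
                        'wait_timeout', 'interactive_timeout');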

Purging Dead Nodes from SGE

My qstat -g c indicates that I have some dead nodes (formally 'cdsuE'):
CLUSTER QUEUE                     CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
all.q                               0.11     18      0      9     37      0     10
Is there an easy way to purge or remove these nodes from the queue?
SGE is smart enough to not allocate work to them but they do clutter up various displays.
I do it the hard way.
Kill the jobs "running" or stuck on the dead nodes.
Run the qconf node-removal pipeline:
qconf -dattr hostgroup hostlist <nodealias> @allhosts
qconf -purge queue slots all.q@<nodealias>
qconf -dconf <nodealias>
qconf -de <nodealias>
If you just want to remove them from the queue, then remove them with:
qconf -dattr queue hostlist <nodename> all.q
or, if they're incorporated via a hostgroup:
qconf -dattr hostgroup hostlist <nodename> <hostgroup>
This does the minimum needed to get them out of the queue, but makes it easy to add them back if you manage to resurrect them later.
If there are any ghost jobs on the nodes, use qdel -f to get rid of them.

Creating an index takes too long

About 2 months ago, I imported the English Wikipedia data (http://dumps.wikimedia.org/enwiki/20120211/) into MySQL.
Since the import finished, I have been creating indexes on the tables of the EnWikipedia database for about 2 months.
Now I have reached the point of creating the index on "pagelinks".
However, it seems to take a nearly infinite amount of time to get past that point.
Therefore, I estimated the time remaining to check whether my intuition was correct.
As a result, the expected time remaining was 60 days (assuming I create the index on "pagelinks" again from the beginning).
My EnWikipedia database has 7 tables:
"categorylinks"(records: 60 mil, size: 23.5 GiB),
"langlinks"(records: 15 mil, size: 1.5 GiB),
"page"(records: 26 mil, size 4.9 GiB),
"pagelinks"(records: 630 mil, size: 56.4 GiB),
"redirect"(records: 6 mil, size: 327.8 MiB),
"revision"(records: 26 mil, size: 4.6 GiB) and "text"(records: 26 mil, size: 60.8 GiB).
My server is...
Linux version 2.6.32-5-amd64 (Debian 2.6.32-39), 16 GB memory, 2.39 GHz Intel 4-core
Is it a common phenomenon for index creation to take so many days?
Does anyone have a good solution to create indexes more quickly?
Thanks in advance!
P.S.: I performed the following operations to estimate the time remaining.
Reference (sorry, the following page is written in Japanese): http://d.hatena.ne.jp/sh2/20110615
1st. I counted the records in "pagelinks".
mysql> select count(*) from pagelinks;
+-----------+
| count(*)  |
+-----------+
| 632047759 |
+-----------+
1 row in set (1 hour 25 min 26.18 sec)
2nd. I measured the number of records written per minute.
getHandler_write.sh
#!/bin/bash
while true
do
cat <<_EOF_
SHOW GLOBAL STATUS LIKE 'Handler_write';
_EOF_
sleep 60
done | mysql -u root -p -N
command
$ sh getHandler_write.sh
Enter password:
Handler_write 1289808074
Handler_write 1289814597
Handler_write 1289822748
Handler_write 1289829789
Handler_write 1289836322
Handler_write 1289844916
Handler_write 1289852226
3rd. I computed the write speed.
According to the result of step 2, the write speed is
7233 records/minute
4th. Then the time remaining is
(632047759/7233)/60/24 = 60 days
Those are pretty big tables, so I'd expect the indexing to be pretty slow. 630 million records is a LOT of data to index. One thing to look at is partitioning: with data sets that large, performance without correctly partitioned tables will be slow. Here's a useful link: "using partitioning on slow indexes". You could also try looking at the buffer size settings for building the indexes (the default is 8 MB, so for your large table that's going to slow you down a fair bit): see the buffer size documentation.
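As a hedged sketch of the buffer-size suggestion (this assumes the stock MyISAM tables of the Wikipedia dumps; the 256M value is illustrative, and pl_from_idx/pl_from follow the standard MediaWiki schema):
-- raise the MyISAM index-build sort buffer for this session only
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;
-- then build the index; MyISAM bulk-builds it by sorting with the larger buffer
ALTER TABLE pagelinks ADD INDEX pl_from_idx (pl_from);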