Ping problems with Windows Server 2012 since Updates on 20/4/2016 - updates

I'm getting a very strange issue with Ping on Windows Server 2012 R2.
I'm running 3 servers - call them A, B and C - and I've also been using a Windows 7 desktop to test this - call it Z.
I can Ping from any of A, B, C or Z to any of the others (i.e. all 4 machines are functioning for both inbound and outbound Pings) - EXCEPT I can't Ping from Server A to Server B (B to A works fine as do C and Z to B).
I've run Wireshark on both Server A and Server B, and both A and B see BOTH the Ping Request packets going from A to B AND the Ping Reply packets going back from B to A. The Request packets have the ICMP ID field set to 0x0001, BUT the Reply packets have the ICMP ID field changed to 0x0100 (both the outbound packet as it leaves B and the inbound packet when it reaches A), so the replies do not match the request and hence the Ping fails.
FWIW, given the particular values, the ID change could be a big-endian vs. little-endian byte swap, though why that might happen (and only happen in one case) makes no sense to me.
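As a sanity check on the byte-swap guess, here is a minimal Python sketch of the arithmetic only. It is not derived from the captures; it just shows that writing the 16-bit ID 0x0001 in network (big-endian) order and reading it back as little-endian gives 0x0100.

```python
import struct

request_id = 0x0001

# Pack the 16-bit ID in network (big-endian) byte order ...
wire_bytes = struct.pack(">H", request_id)

# ... then misread the same two bytes as little-endian.
swapped_id = struct.unpack("<H", wire_bytes)[0]

print(hex(request_id), "->", hex(swapped_id))
```

This only confirms that the two observed values are byte-swapped versions of each other; it says nothing about which component does the swapping.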
Until the Windows Updates this week (which have been applied to all of A, B, C and Z) Pings worked fine with no exceptions that I'm aware of - but the timing might, of course, be complete coincidence.
I've rebooted both A and B but that's made no difference to the behaviour.
Apart from this odd Ping issue, all of A, B, C and Z appear to be functioning normally - no connection problems / errors.
Has anyone got any idea what's going on / what's happened? Has anyone else seen this?
I've also seen the question at https://stackoverflow.com/questions/36764903/fail-to-send-ping-from-server, which might or might not be related.
Many thanks for any ideas / suggestions!

In QEMU, how am I supposed to find the next TranslationBlock in a TB linked-list

I am trying to understand which guest instructions were executed after calling the function cpu_loop_exec_tb().
More specifically, I am trying to understand the relation between jmp_list_head, jmp_list_next and jmp_dest. According to the documentation of these fields in the code, the LSB of the pointer in jmp_list_next[0] or jmp_list_next[1] should be set, which would indicate which branch was executed. But that is not always the case.
I am also unsure which of jmp_dest or jmp_list_next should be used to get the next TranslationBlock pointer value (both contain valid pointers to instantiated TBs). Sometimes both jmp_dest and jmp_list_next have values, while at other times both jmp_list_next entries are NULL but there are two jmp_dest entries.
For example:
cpu_loop_exec_tb()
last_tb address:
tb_exit: 0
EXEC:----------------
IN:
0x00007de4: 88 f8 movb %bh, %al
0x00007de6: 88 fc movb %bh, %ah
0x00007de8: e8 e1 ff callw 0x7dcc
level: 0
tb address: 0x00000253aca11cc0, LSB=0
tb.pc address: 0x0000000000007de4, LSB=0
tb.cflags: 0xff020000.
jmp_target_arg: 0x0000000000000094, 0x0000000000000000.
incoming jumps:
tb->jmp_list_head:
outgoing jumps:
tb->jmp_list_next[0]: 0x00000253aca11300, LSB=0
tb->jmp_list_next[1]:
tb->jmp_list_next[0]->pc: 0x0000000000007dd6, LSB=0
tb->jmp_list_next[1]->pc:
tb->jmp_dest[0]: 0x00000253aca11740, LSB=0
tb->jmp_dest[1]:
tb->jmp_dest[0]->pc: 0x0000000000007dcc, LSB=0
tb->jmp_dest[1]->pc:
STATUS: Following child TB 0x00000253aca11740: only jmp_dest[0] available.
Following child TB: : 0x00000253aca11740, LSB=0
EXEC:----------------
IN:
0x00007dcc: 72 02 jb 0x7dd0
level: 1
tb address: 0x00000253aca11740, LSB=0
tb.pc address: 0x0000000000007dcc, LSB=0
tb.cflags: 0xff020000.
jmp_target_arg: 0x0000000000000048, 0x0000000000000060.
incoming jumps:
tb->jmp_list_head: 0x00000253aca12300, LSB=0
outgoing jumps:
tb->jmp_list_next[0]:
tb->jmp_list_next[1]:
tb->jmp_list_next[0]->pc:
tb->jmp_list_next[1]->pc:
tb->jmp_dest[0]: 0x00000253aca118c0, LSB=0
tb->jmp_dest[1]: 0x00000253aca11e80, LSB=0
tb->jmp_dest[0]->pc: 0x0000000000007dce, LSB=0
tb->jmp_dest[1]->pc: 0x0000000000007dd0, LSB=0
tb->jmp_dest[0]->jmp_list_head: 0x00000253aca18c00, LSB=0
tb->jmp_dest[1]->jmp_list_head: 0x00000253aca13740, LSB=0
WARNING: Don't know which jmp_dest[] to choose from.
cpu_loop_exec_tb()
last_tb address:
tb_exit: 0
EXEC:----------------
IN:
0x00007de4: 88 f8 movb %bh, %al
0x00007de6: 88 fc movb %bh, %ah
0x00007de8: e8 e1 ff callw 0x7dcc
In the log above, the returned TB from tb_find() is 0x00000253aca11cc0. The next TB in the linked list is obvious since jmp_list_next[0] program counter is not 0x7dcc and jmp_dest[0] program counter is 0x7dcc.
When looking at TB address 0x00000253aca11740, I do not understand how to select which TB is next since both jmp_dest are set.
Looking at other places in the code which I do not fully understand, I was expecting to evaluate the two jmp_dest, look at their jmp_list_head and see which of the two has the LSB set to 1. In the log above, both tb->jmp_dest[0]->jmp_list_head and tb->jmp_dest[1]->jmp_list_head have their LSB not set which seems to indicate this TB is a leaf while it is clearly not. To be clear, I sometimes see instances where either tb->jmp_dest[0]->jmp_list_head or tb->jmp_dest[1]->jmp_list_head have their LSB set to 1.
I know there are missing TBs in the list that I was not able to print, since the next executed PC is 0x7de4 (it's not 0x7dd0 or 0x7dce).
The guest source code I am executing is this x86 Space Invaders game stored in the MBR.
Note: This is my first time posting on StackOverflow.
Note: I also read this question but it does not seem to solve my problem.
The TB linking is kind of complicated. Mostly you should be able to just ignore it. In particular, for most easily readable logs you should use '-d nochain' which will disable TB linking entirely. If you don't use 'nochain' then you can to some extent figure out executed TBs from the 'exec' logging by looking at when it says it is "Linking TBs" but this is a lot more painful.
Note also that TB chaining like this is not the only reason why cpu_loop_exec_tb() might execute more than one TB -- as well as this "goto_tb" mechanism which statically chains TBs together, there is also the "goto_ptr" mechanism, which does a dynamic "look up a pointer to the next TB if possible" chaining.
You should read the QEMU developer docs on TCG block chaining if you haven't already.
To answer your "which jmp_dest gets used?" question, this depends on what has happened inside the TB. TCG TBs can have up to 2 exits; the classic use for the 2 exit case is for a conditional branch. The generated code looks like "test the condition; if condition fails, take exit 0; if condition passes, take exit 1". (There's no requirement for 0 and 1 to be used in that order, incidentally.) You can't answer "which TB exit do we take?" just by looking at the TB data structure, because the answer is runtime dependent, and can be different each time the TB is executed. You can see this in your example log output: for TB 0x00000253aca11740, jmp_dest[0] points at a TB for guest PC 0x7dce, which is the "condition fails, execution falls through" case, and jmp_dest[1] points at a TB for guest PC 0x7dd0, which is the "condition passes, take the branch" case.
You have a misunderstanding also about the meaning of the LSB in the jmp_list_head value. The LSB is used as part of following the linked list of TBs which jump into this one: it tells us whether the next element in the list is to be found in jmp_list_next[0] or jmp_list_next[1]. You can walk the linked list of incoming jumps with something like this:
uintptr_t ptr_and_n = this_tb->jmp_list_head;
for (;;) {
    int n = ptr_and_n & 1;
    TranslationBlock *tb = (TranslationBlock *)(ptr_and_n & ~1);
    if (!tb) {
        break; /* end of linked list */
    }
    printf("next TB in incoming list: %p (it gets to us via its exit %d)\n",
           tb, n);
    ptr_and_n = tb->jmp_list_next[n];
}
(In QEMU this is what the TB_FOR_EACH_JMP macro does; I've written out a longhand equivalent here as hopefully a bit easier to understand.)
Finally, be aware that these links between TBs can be created and broken dynamically -- they are an optimization, and if there is no pre-created link from one TB to the next, QEMU will drop back to the main loop to find the next TB, hopefully adding the link if possible. Sometimes existing links are broken (e.g. if the TB being linked to is invalidated). When emulating an SMP guest, some of this may be happening in parallel while your thread is walking the data structures, which is why there is a jmp_lock and why some changes to the fields must be made atomically.
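To make the role of the low bit more concrete, here is a toy Python model of the incoming-jump list. It is a sketch under assumptions, not QEMU code: the TB class, the addresses, and the add_jump() helper are made up for illustration, and only the walk mirrors the C loop shown earlier. The point it illustrates is that the tag bit records which exit of the source TB links in; it does not record which branch was taken at runtime.

```python
class TB:
    """Toy stand-in for a TranslationBlock; not QEMU's real structure."""
    def __init__(self, addr):
        assert addr % 2 == 0        # aligned, so bit 0 is free for the tag
        self.addr = addr
        self.jmp_list_head = 0      # tagged pointer: first TB jumping into us
        self.jmp_list_next = [0, 0] # tagged pointers, one per exit
        self.jmp_dest = [0, 0]      # plain destination address per exit

tbs = {}  # address -> TB, standing in for memory

def new_tb(addr):
    tbs[addr] = TB(addr)
    return tbs[addr]

def add_jump(src, n, dst):
    """Hypothetical helper: link exit n of src to dst."""
    src.jmp_dest[n] = dst.addr
    src.jmp_list_next[n] = dst.jmp_list_head  # old head becomes our successor
    dst.jmp_list_head = src.addr | n          # we become the new head, tagged with n

def incoming(dst):
    """Walk dst's incoming list, yielding (source address, exit index)."""
    ptr_and_n = dst.jmp_list_head
    while True:
        n = ptr_and_n & 1
        addr = ptr_and_n & ~1
        if addr == 0:
            break                             # end of linked list
        yield addr, n
        ptr_and_n = tbs[addr].jmp_list_next[n]

a, b, c = new_tb(0x1000), new_tb(0x2000), new_tb(0x3000)
add_jump(a, 1, c)   # a reaches c via its exit 1
add_jump(b, 0, c)   # b reaches c via its exit 0
for addr, n in incoming(c):
    print(f"TB {addr:#x} jumps into {c.addr:#x} via its exit {n}")
```

In this model c.jmp_list_head ends up with its low bit clear even though c has two incoming jumps, so a clear LSB does not by itself mean the TB is a leaf.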

MySQL has gone away: Connection_errors_peer_address with high numbers

We run MySQL 5.7 master-slave replication, and on the slave servers it happens from time to time that our application monitoring tools (Tideways and PHP 7.0) report
MySQL has gone away.
Checking the MYSQL side:
show global status like '%Connection%';
+-----------------------------------+----------+
| Variable_name | Value |
+-----------------------------------+----------+
| Connection_errors_accept | 0 |
| Connection_errors_internal | 0 |
| Connection_errors_max_connections | 0 |
| Connection_errors_peer_address | 323 |
| Connection_errors_select | 0 |
| Connection_errors_tcpwrap | 0 |
| Connections | 55210496 |
| Max_used_connections | 387 |
| Slave_connections | 0 |
+-----------------------------------+----------+
Connection_errors_peer_address shows 323. How can I investigate further what is causing these two issues:
MySQL has gone away
and
Connection_errors_peer_address
EDIT:
Master Server
net_retry_count = 10
net_read_timeout = 120
net_write_timeout = 120
skip_networking = OFF
Aborted_clients = 151650
Slave Server 1
net_retry_count = 10
net_read_timeout = 30
net_write_timeout = 60
skip_networking = OFF
Aborted_clients = 3
Slave Server 2
net_retry_count = 10
net_read_timeout = 30
net_write_timeout = 60
skip_networking = OFF
Aborted_clients = 3
In MySQL 5.7, when a new TCP/IP connection reaches the server, the server performs several checks, implemented in the function check_connection() in sql/sql_connect.cc.
One of these checks is to get the IP address of the client side connection, as in:
static int check_connection(THD *thd)
{
  ...
  if (!thd->m_main_security_ctx.host().length) // If TCP/IP connection
  {
    ...
    peer_rc= vio_peer_addr(net->vio, ip, &thd->peer_port, NI_MAXHOST);
    if (peer_rc)
    {
      /*
        Since we can not even get the peer IP address,
        there is nothing to show in the host_cache,
        so increment the global status variable for peer address errors.
      */
      connection_errors_peer_addr++;
      my_error(ER_BAD_HOST_ERROR, MYF(0));
      return 1;
    }
    ...
  }
Upon failure, the status variable connection_errors_peer_addr is incremented, and the connection is rejected.
vio_peer_addr() is implemented in vio/viosocket.c (code simplified to show only the important calls)
my_bool vio_peer_addr(Vio *vio, char *ip_buffer, uint16 *port,
                      size_t ip_buffer_size)
{
  if (vio->localhost)
  {
    ...
  }
  else
  {
    /* Get sockaddr by socked fd. */
    err_code= mysql_socket_getpeername(vio->mysql_socket, addr, &addr_length);
    if (err_code)
    {
      DBUG_PRINT("exit", ("getpeername() gave error: %d", socket_errno));
      DBUG_RETURN(TRUE);
    }
    /* Normalize IP address. */
    vio_get_normalized_ip(addr, addr_length,
                          (struct sockaddr *) &vio->remote, &vio->addrLen);
    /* Get IP address & port number. */
    err_code= vio_getnameinfo((struct sockaddr *) &vio->remote,
                              ip_buffer, ip_buffer_size,
                              port_buffer, NI_MAXSERV,
                              NI_NUMERICHOST | NI_NUMERICSERV);
    if (err_code)
    {
      DBUG_PRINT("exit", ("getnameinfo() gave error: %s",
                          gai_strerror(err_code)));
      DBUG_RETURN(TRUE);
    }
    ...
  }
  ...
}
In short, the only failure path in vio_peer_addr() happens when a call to mysql_socket_getpeername() or vio_getnameinfo() fails.
mysql_socket_getpeername() is just a wrapper on top of getpeername().
The man 2 getpeername manual lists the following possible errors:
NAME
    getpeername - get name of connected peer socket
ERRORS
    EBADF     The argument sockfd is not a valid descriptor.
    EFAULT    The addr argument points to memory not in a valid part of the process address space.
    EINVAL    addrlen is invalid (e.g., is negative).
    ENOBUFS   Insufficient resources were available in the system to perform the operation.
    ENOTCONN  The socket is not connected.
    ENOTSOCK  The argument sockfd is a file, not a socket.
Of these errors, only ENOBUFS is plausible.
As for vio_getnameinfo(), it is just a wrapper around getnameinfo(), which according to its man page (man 3 getnameinfo) can fail for the following reasons:
NAME
    getnameinfo - address-to-name translation in protocol-independent manner
RETURN VALUE
    EAI_AGAIN     The name could not be resolved at this time. Try again later.
    EAI_BADFLAGS  The flags argument has an invalid value.
    EAI_FAIL      A nonrecoverable error occurred.
    EAI_FAMILY    The address family was not recognized, or the address length was invalid for the specified family.
    EAI_MEMORY    Out of memory.
    EAI_NONAME    The name does not resolve for the supplied arguments. NI_NAMEREQD is set and the host's name cannot be located, or neither hostname nor service name were requested.
    EAI_OVERFLOW  The buffer pointed to by host or serv was too small.
    EAI_SYSTEM    A system error occurred. The error code can be found in errno.
The gai_strerror(3) function translates these error codes to a human readable string, suitable for error reporting.
Here many failures can happen, basically due to heavy load or the network.
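The two calls discussed above can be tried in isolation. The following is a minimal Python sketch, not MySQL code: the helper names peer_ip_and_port() and peer_of() are made up for illustration, and the address and port are arbitrary. It uses the same NI_NUMERICHOST | NI_NUMERICSERV flags as the excerpt from vio_peer_addr(), and treats a failure of either call the way the server does, as "no peer address available".

```python
import socket

def peer_ip_and_port(sockaddr):
    """Format a socket address as numeric (ip, port) strings via getnameinfo()."""
    flags = socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    return socket.getnameinfo(sockaddr, flags)

def peer_of(sock):
    """Return (ip, port) for sock's peer, or None if getpeername() or
    getnameinfo() fails (the case counted by the status variable)."""
    try:
        return peer_ip_and_port(sock.getpeername())
    except OSError:  # socket.gaierror is a subclass of OSError
        return None

# With the numeric flags, the result is the numeric form of address and port.
print(peer_ip_and_port(("127.0.0.1", 3306)))

# A socket that was never connected has no peer, so getpeername() fails.
unconnected = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(peer_of(unconnected))
unconnected.close()
```

Which errno or EAI_* code shows up on a real server under load is not something this sketch can reproduce; it only shows where each of the two calls sits.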
To understand the process behind this code, what the MySQL server is essentially doing is a Reverse DNS lookup, to:
find the hostname of the client
find the IP address corresponding to this hostname
to later convert this IP address to a hostname again (see the call to ip_to_hostname() that follows).
Overall, failures accounted with Connection_errors_peer_address can be due to system load (causing transient failures like out of memory, etc) or due to network issues affecting DNS.
Disclosure: I happen to be the person who implemented this Connection_errors_peer_address status variable in MySQL, as part of an effort to have better visibility / observability in this area of the code.
[Edit] To follow up with more details and/or guidelines:
When Connection_errors_peer_address is incremented, the root cause is not printed in the logs. That is unfortunate for troubleshooting, but it also avoids flooding the logs and causing even more damage, so there is a tradeoff here. Keep in mind that anything that happens before logging in is very sensitive ...
If the server really goes out of memory, it is very likely that many other things will break, and that the server will go down very quickly. By monitoring the total memory usage of mysqld, and monitoring the uptime, it should be fairly easy to determine if the failure "only" caused connections to be closed with the server staying up, or if the server itself failed catastrophically.
Assuming the server stays up on failure, the more likely culprit is the second call then, to getnameinfo.
Using skip-name-resolve will have no effect, as this check happens later (see specialflag & SPECIAL_NO_RESOLVE in the code in check_connection())
When the Connection_errors_peer_address check fails, note that the server cleanly returns the error ER_BAD_HOST_ERROR to the client, and then closes the socket. This is different from abruptly closing a socket (as in a crash): the former should be reported by the client as "Can't get hostname for your address", while the latter is reported as "MySQL has gone away".
Whether the client connector actually treats ER_BAD_HOST_ERROR and a closed socket differently is another story.
Given that this failure overall seems related to DNS lookups, I would check the following items:
See how many rows are in the performance_schema.host_cache table.
Compare this with the size of the host cache, see the host_cache_size system variable.
If the host cache appears full, consider increasing its size: this will reduce the number of DNS calls overall, relieving pressure on DNS, in the hope (admittedly, this is just a shot in the dark) that transient DNS failures will disappear.
323 out of 55 million connections does indeed seem transient. Assuming the monitoring client sometimes does get connected properly, inspect the row in the host_cache table for this client: it may contain other reported failures.
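The first three checks above boil down to comparing two numbers. Here is a small Python sketch of that comparison, under assumptions: the two SQL strings are meant to be run by hand or through whatever client you already use, the 0.9 threshold is an arbitrary rule of thumb, and the sample figures are made up.

```python
# Queries to run manually; both the table and the variable exist in MySQL 5.7.
COUNT_ROWS_SQL = "SELECT COUNT(*) FROM performance_schema.host_cache"
CACHE_SIZE_SQL = "SELECT @@global.host_cache_size"

def host_cache_fill(row_count, host_cache_size):
    """Return the fraction of the host cache in use (0.0 to 1.0)."""
    if host_cache_size <= 0:
        return 0.0  # a size of 0 means the host cache is disabled
    return min(1.0, row_count / host_cache_size)

def looks_full(row_count, host_cache_size, threshold=0.9):
    """Hypothetical rule of thumb: treat the cache as full above `threshold`."""
    return host_cache_fill(row_count, host_cache_size) >= threshold

# Made-up sample values, for illustration only.
print(looks_full(row_count=279, host_cache_size=279))
print(looks_full(row_count=12, host_cache_size=279))
```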
Table performance_schema.host_cache documentation:
https://dev.mysql.com/doc/refman/5.7/en/host-cache-table.html
Further reading:
http://marcalff.blogspot.com/2012/04/performance-schema-nailing-host-cache.html
[Edit 2] Based on the new data available:
The Aborted_clients status variable shows some connections forcefully closed by the server. This typically happens when a session is idle for a very long time.
A typical scenario for this to happen is:
1. A client opens a connection, and sends some queries
2. Then the client does nothing for an extended amount of time (greater than the net_read_timeout)
3. Due to lack of traffic, the server closes the session, and increments Aborted_clients
4. The client then sends another query, sees a closed connection, and reports "MySQL has gone away"
Note that a client application that forgets to cleanly close its sessions will execute steps 1-3; this could be the cause of Aborted_clients on the master. Some cleanup here to fix the client applications using the master would help to decrease resource consumption, as leaving 151650 sessions open to die on timeout has a cost.
A client application executing steps 1-4 can cause Aborted_clients on the server and "MySQL has gone away" on the client. The client application reporting "MySQL has gone away" is most likely the culprit here.
If a monitoring application, say, checks the server every N seconds, then make sure the timeouts (here 30 and 60 sec) are significantly greater than N, or the server will kill the monitoring session.
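The scenario above can be played through with a toy model. This Python sketch is an illustration only, not a description of how MySQL implements its timeouts: simulate() is a made-up helper, idle_timeout stands in for whichever server-side timeout applies, and the query times are invented.

```python
def simulate(query_times, idle_timeout):
    """Return the times at which the client would find its session closed.

    query_times: sorted times (in seconds) at which the client sends a query.
    idle_timeout: seconds of silence after which the server drops the session.
    """
    failures = []
    last_activity = query_times[0]
    for t in query_times[1:]:
        if t - last_activity > idle_timeout:
            failures.append(t)  # server already closed the session
        last_activity = t       # assume the client reconnects and carries on
    return failures

# A monitor polling every 20 s stays inside a 30 s timeout ...
print(simulate([0, 20, 40, 60, 80], idle_timeout=30))
# ... but a single 50 s gap is enough to lose the session.
print(simulate([0, 10, 20, 70, 80], idle_timeout=30))
```

Each entry in the returned list corresponds to one aborted session on the server side and one "MySQL has gone away" on the client side.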

Defining trips and stop sequences for bidirectional route in GTFS

I'm trying to define a GTFS feed for a ferry crossing between 2 ports (A <-> B). There may be 2 ferries running between these ports.
routes.txt
route_id,route_short_name,route_long_name,route_desc,route_type
AB,A-B,A << >> B,Ferry travelling between A and B,4
calendar.txt
service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
FULLWEEK,1,1,1,1,1,1,1,20180103,20180430
trips.txt
route_id,service_id,trip_id,trip_headsign,direction_id,shape_id,wheelchair_accessible,bikes_allowed
AB,FULLWEEK,a_b,B Dest,0,ab_shape,1,1
AB,FULLWEEK,b_a,B Dest,1,ab_shape,1,1
stops.txt
stop_id,stop_name,stop_desc,stop_lat,stop_lon,location_type
A,B-A,Travelling from B to A,xxxx,xxxx,1
B,A-B,Travelling from A to B,xxxx,xxxx,1
stop_times.txt
trip_id,arrival_time,departure_time,stop_id,stop_sequence
a_b,02:45:00,03:00:00,A,1
a_b,04:45:00,05:00:00,A,1
b_a,00:45:00,01:00:00,B,2
b_a,03:45:00,04:00:00,B,2
^^ this is where the errors appear in the feed validator
Duplicate stop_sequence in trip_id a_b
I can't work out whether I should be using 2 routes instead of 1 (and stop using the direction_id value in trips.txt), or what the stop sequence should be, since the timetables at the two ports may not line up as a single sequence when multiple ferries are running between them.
Thank you.
Figured it out: basically, trips.txt must contain an entry for every scheduled departure. I was treating trips like routes, when in fact every departure is its own "trip".
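For anyone hitting the same validator error, here is a rough Python sketch of what "one trip per departure" looks like when generating the two files. It rests on assumptions: the departure times are the ones from my stop_times.txt, the arrival times at the other port are invented, the trip_id naming scheme is arbitrary, and only the columns needed to show the idea are written.

```python
import csv
import io

# (origin, destination, departure from origin, arrival at destination)
# Departure times come from the question; arrival times are made up.
DEPARTURES = [
    ("A", "B", "03:00:00", "03:45:00"),
    ("A", "B", "05:00:00", "05:45:00"),
    ("B", "A", "01:00:00", "01:45:00"),
    ("B", "A", "04:00:00", "04:45:00"),
]

def build_feed(departures):
    """Create one trip per departure, each with its own 1, 2 stop sequence."""
    trips, stop_times = [], []
    for origin, dest, dep, arr in departures:
        trip_id = f"{origin.lower()}_{dest.lower()}_{dep[:5].replace(':', '')}"
        direction_id = 0 if origin == "A" else 1
        trips.append(["AB", "FULLWEEK", trip_id, f"{dest} Dest", direction_id])
        stop_times.append([trip_id, dep, dep, origin, 1])
        stop_times.append([trip_id, arr, arr, dest, 2])
    return trips, stop_times

def to_csv(header, rows):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

trips, stop_times = build_feed(DEPARTURES)
print(to_csv(["route_id", "service_id", "trip_id", "trip_headsign", "direction_id"], trips))
print(to_csv(["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"], stop_times))
```

Because stop_sequence restarts at 1 for every trip_id, it no longer matters how many ferries are running or whether the two ports' timetables interleave.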

SYS(3050) is throwing Function argument value,type or count is invalid error

Earlier I had an issue with "not enough memory for file mapping".
Then, as advised by a few experts, I used the following code in my main program, which solved the issue and worked fine.
SYS(3050,1,MIN(536870912,VAL(SYS(3050,1,0))))
SYS(3050,2,MIN(536870912,VAL(SYS(3050,1,0))))
But recently one of the client's machines was upgraded from Windows XP 32-bit to Windows 7 64-bit. Since then, when the system starts,
it throws the error "Function argument value, type or count is invalid" at the SYS(3050) line.
If I omit this line and continue, the "not enough memory for file mapping" error occurs.
Can anybody advise what I should do to overcome this issue? Is it because of the 64-bit version of Windows 7 (the other two machines, with 32-bit Windows 7, are working properly)?
As Alan B said, the problems with 'not enough memory for file mapping' tend to go away when switching to VFP9 SP2 (it's the one fly in the ointment when using VFP8 SP1 which is otherwise the most solid of the lot).
If switching to VFP9 is not an option then I would suggest factoring out the nested SYS(3050,1,0) calls and sanitising the result before feeding it into VAL(). At the very least it would pinpoint more accurately the place where the problem occurs, to guide further investigation with the aid of a debugger or a tool like IDA.
The original code already caps the parameter at 536870912, which is well below the threshold of 2^31 at which SYS(3050) throws a range error. However, the parameter must be strictly positive, which requires adding a MAX() term:
local nLimit
nLimit = max(1, min(536870912, val(sys(3050, 1))))
sys(3050, 1, m.nLimit)
sys(3050, 2, m.nLimit)
Background: calling the function with a limit parameter of 0 is the same as calling it without a limit (i.e. it gets the limit instead of setting it). Calling the function with a negative parameter causes it to blow up with a range error.
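To show what the MAX()/MIN() combination does to different inputs, here is the same clamping arithmetic as a small Python sketch. It is an illustration only: clamp_limit() is a made-up name, and the input values are invented examples of what the inner call might report, not measurements from a real machine.

```python
CAP = 536870912  # the cap used in the original code (512 MB)

def clamp_limit(reported, cap=CAP):
    """Keep the value within 1..cap, mirroring max(1, min(cap, value))."""
    return max(1, min(cap, reported))

# Invented sample inputs: a normal value, an oversized one, zero, and a negative one.
for reported in (268435456, 4294967296, 0, -1):
    print(reported, "->", clamp_limit(reported))
```

The last two cases are the ones the extra MAX() term is there for: without it, 0 or a negative number would be passed straight through to SYS(3050).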

Why are register-based virtual machines better than stack-based ones?

Why are register-based virtual machines better than stack-based ones?
Specifically, in the Parrot VM's document, the designer explains the benefits of register machines:
[...] many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right amount of registers and operate on them, which can significantly reduce the amount of operations and CPU time.
but why are the same operands pushed many times?
It seems like they describe a VM which executes the code as described in the language design, bytecode-by-bytecode without compiling or optimisation. In that case it is true. Think about code doing something like this for example:
x = first(a,b,c)
y = second(a,b,c)
third(y,x)
With a register-based system, you might be able to simply put the arguments in whatever position they're expected (if registers can be used to pass arguments). If all registers are "global", not per-function (or at least restored when popping the call stack), you might not need to do anything between the call to first and second.
If you have a stack-based VM, you'd end up with something like (hopefully you do have swap):
push a
push b
push c
call first
push a # pushing same arguments again
push b
push c
call second
swap
call third
Also if you calculate a math expression which reuses the same variables, you might need to do something like this:
push a
push b
add
push a
push c
add
add
instead of (assuming there are registers a,b,c and you can destroy the contents of b and c):
add b, a
add c, a
add b, c # result in b
This avoids reloading a, which needed a separate push in the first case.
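To put rough numbers on the difference, here are two toy interpreters in Python that run the two listings above and count dispatched instructions. They are sketches made up for this answer, not how Parrot or any real VM works, and the values of a, b and c are arbitrary.

```python
def run_stack(program, env):
    """Toy stack machine: returns (result, number of instructions executed)."""
    stack, steps = [], 0
    for op, *args in program:
        steps += 1
        if op == "push":
            stack.append(env[args[0]])
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
    return stack.pop(), steps

def run_register(program, regs):
    """Toy register machine: 'add dst, src' means dst = dst + src."""
    regs, steps = dict(regs), 0
    for op, dst, src in program:
        steps += 1
        if op == "add":
            regs[dst] += regs[src]
    return regs, steps

values = {"a": 1, "b": 2, "c": 3}

stack_program = [("push", "a"), ("push", "b"), ("add",),
                 ("push", "a"), ("push", "c"), ("add",),
                 ("add",)]
register_program = [("add", "b", "a"), ("add", "c", "a"), ("add", "b", "c")]

stack_result, stack_steps = run_stack(stack_program, values)
regs, register_steps = run_register(register_program, values)
print("stack:   ", stack_result, "in", stack_steps, "instructions")
print("register:", regs["b"], "in", register_steps, "instructions")
```

Both compute (a + b) + (a + c); the stack version simply spends extra instructions moving operands onto the stack, which is the overhead the quoted passage seems to be talking about for an interpreter without a JIT.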
Then again, I'm just guessing the examples, maybe they meant some other case...