Why is there a 10 second delay in this HTTP conversation?

I've got a weird situation. The first time I hit an embedded web server (uClinux/Boa) at 10.1.10.29, I get a 10 second delay in the browser window before things start happening. "First time" means I haven't hit the machine in a few days. Browser type/OS doesn't matter (the source is 10.1.10.20).
I've got a Wireshark capture of it happening.
And here is the detail of frame 296:
Note that packet 374 doesn't pop out until around 10 seconds after 296. The packets between those two aren't from the machine in question. It just sits there for 10 seconds and then decides to retransmit. How is this supposed to work?

The main reason is almost certainly that the code was swapped out of memory.
MS-Windows is really bad in that regard. If a program does not get used for "too long", it gets swapped out of memory. Period. When you come back to it, it has to be re-read from the hard drive.
The one upside (and the main reason Windows does this) is that it defragments kernel memory; for that purpose it works well.
You can have similar problems under Linux, but only if your server actually needs the memory. In other words, if you have tons of processes all fighting for as much memory as possible, the kernel is likely to swap out the least-used software. Otherwise it stays in place.
If you were to use the Cassandra database system, you would notice this on any computer that runs anything other than Cassandra. If you run only Cassandra, it remains fast all the time. If you run other software that uses a lot of memory, Cassandra is noticeably slow on first access.

I want to add the answer that solved our version of this problem: a 10 second delay at first, then normal operation, then another 10 second delay after 5 minutes of inactivity.
First of all, we captured everything in Wireshark and tried to find some kind of error in the code, or in the way the computer or server handled the network traffic. We found nothing out of the ordinary.
After much searching we found it was a DNS "problem". The DNS server that the client computer used had two entries for the server's domain name. One was correct and one (the first one in the list) was wrong.
Removing the wrong DNS record solved the problem.
In other words, the client tried the first address it got, waited 10 seconds for a reply, didn't get one, and moved on to the second address in the list. This produces no error messages, because this is how DNS is supposed to work. That is why all our Wireshark logs showed nothing but a 10 second wait with no error and no apparent cause, after which everything sprang to life, worked for as long as the DNS record stayed valid (5 minutes in our case), and then the whole procedure repeated.
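If you suspect something similar, one quick check is to list every address the resolver hands back for the server's name and compare them against the machine's real address. A minimal sketch in Python (the hostname is a placeholder for whatever name your clients actually use):

    import socket

    # Placeholder hostname; replace with the name the clients actually resolve.
    HOST = "embedded-server.example.com"

    # getaddrinfo returns every address the resolver knows about, roughly in
    # the order a client would try them. A wrong or stale first entry would
    # explain a fixed connect timeout before the second address is attempted.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            HOST, 80, proto=socket.IPPROTO_TCP):
        print(sockaddr[0])

If two addresses come back and only one of them answers, you have found the same kind of stale record.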
Hope this helps someone who has a similar problem.

Related

MySQL/MariaDB: does commit() write to disk if there were no changes? Concerned about my Raspberry Pi's SD card life

I'm a little worried about my Raspberry Pi's SD card life. On the Raspberry Pi there's a MySQL (MariaDB) server running. A program of mine reads from the database every second, then looks something up on the internet and only rarely, when something happens, writes to the database. I used to call commit() only once every 5 minutes, but apparently if I don't commit, the program doesn't see changes from other programs, even for tables it doesn't write to.
1) Concerns about Raspberry Pi SD card life are all over the internet, so my question is: how should I best call commit()?
2) If it just reads from the database but doesn't change anything, does commit even access the disk? Is there a way to see the new changes without committing?
3) And if I do have to commit every second in order to see the changes in time, how bad is it?
PS: I'm using Python 3 with mysql-connector and an 8 GB SD card with the OS the Raspberry Pi Imager recommended to me.
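For what it's worth, here is a minimal sketch of the read loop described above (connection details and table/column names are made up). With InnoDB's default REPEATABLE READ isolation, an open transaction keeps reading the same snapshot, which is exactly why the program stops seeing other programs' changes until commit() is called; committing after a read-only SELECT simply closes that snapshot and, since nothing was modified, should not cost meaningful SD card writes.

    import time
    import mysql.connector  # the mysql-connector package mentioned above

    # Credentials, database and table names below are placeholders.
    conn = mysql.connector.connect(
        host="localhost", user="pi", password="secret", database="sensors")
    cur = conn.cursor()

    while True:
        cur.execute("SELECT id, state FROM jobs WHERE state = 'pending'")
        rows = cur.fetchall()

        # Ending the read-only transaction releases the old snapshot, so the
        # next SELECT sees rows committed by other programs in the meantime.
        conn.commit()

        for job_id, state in rows:
            pass  # look something up on the internet; write back only rarely

        time.sleep(1)

Setting conn.autocommit = True is another option: each SELECT then runs in its own short transaction and there is nothing to commit by hand.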
So I guess since it's just a couple hundred writes a day, it's totally fine.
But why have you guys all answered in the comments?
Who am I gonna pick as best answer?

What do Chrome Network Timings really mean, and what affects each timing's length?

I was looking at the Chrome DevTools resource network timings to detect requests that need improving. The linked documentation gives a definition for each timing, but I don't understand what is happening behind the scenes that affects the length of each period.
Below are 3 different images, and here is my understanding of what's going on; please correct me if I'm wrong.
Stalled: Why do some requests get stalled for 1.17 s while others take less?
Request Sent: the time our request took to reach the server
TTFB: the time taken until the server responds with the first byte of data
Content Download: The time until the whole response reaches the client
Thanks
Networking is an area where things vary greatly. A lot of different factors feed into these numbers, and they vary between locations and even at the same location with different types of content.
Here is some more detail on the areas you asked about:
Stalled: This depends on what else is going on in the network stack. One request might not be stalled at all, while others are stalled because six connections to the same origin are already open. There are more reasons for stalling, but the per-origin connection limit is the easiest way to explain why it occurs.
The stalled state means the browser simply can't send the request right now; it has to wait for some reason. Generally this isn't a big deal. If you see it a lot and you are not on HTTP/2, then you should look into minimizing the number of resources being pulled from a given origin. If you are on HTTP/2, don't worry too much about it, since HTTP/2 multiplexes many requests over a single connection.
Look around and see how many requests are going to a single domain. You can use the filter box to trim down the view. If you have a lot of requests going to the same domain, then that is most likely hitting the connection limit. Domain sharding is one way to handle this with HTTP/1.1, but with HTTP/2 it is an anti-pattern and hurts performance.
If you are not hitting the max connection limit, then the problem is more nuanced and needs a more hands-on debugging approach to figure out what is going on.
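If you want to check the per-domain request count outside DevTools, you can save the page load as a HAR file from the Network panel and tally the hosts. A rough sketch in Python (the file name is a placeholder):

    import json
    from collections import Counter
    from urllib.parse import urlparse

    # Placeholder path; point it at a HAR export of the page load.
    with open("page-load.har", encoding="utf-8") as f:
        har = json.load(f)

    # Count requests per host; hosts near or above the per-origin connection
    # limit (typically six on HTTP/1.1) are the likely sources of stalling.
    hosts = Counter(urlparse(entry["request"]["url"]).netloc
                    for entry in har["log"]["entries"])

    for host, count in hosts.most_common():
        print(f"{count:4d}  {host}")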
Request sent: This is not the time to reach the server; that is part of the Time To First Byte. All "request sent" means is that the request has been handed off, and it took the local network stack that long to send it out.
There is nothing you can do to speed this up; it is mostly there for informational and internal debugging purposes.
Time to First Byte (TTFB): This is the total time for the sent request to get to the destination, then for the destination to process the request, and finally for the response to traverse the networks back to the client.
A high TTFB reveals one of two issues. The first is a bad network connection between the client and the server, so data is slow to reach the server and to come back. The second is a server that is slow to process the request, either because the hardware is weak or because the running application is slow. Both problems can also exist at once.
To address a high TTFB, first cut out as much of the network as possible. Ideally, host the application locally on a low-resource virtual machine and see if the TTFB is still large. If it is, then the application needs to be optimized for response speed. If the TTFB is very low locally, then the networks between your client and the server are the problem. There are various ways to handle this that I won't get into, since it is an area of expertise unto itself. Research network optimization, and even try moving hosts to see whether your server provider's network is the issue.
Remember that the entire server stack comes into play here. If Nginx or Apache is configured poorly, or your database is taking a long time to respond, or your cache is having trouble, these can all cause delays. They are also difficult to detect locally, since your local server may be configured differently from the remote stack.
Content Download: This is the total time, after the first byte arrives, for the client to get the rest of the content from the server. This should be short unless you are downloading a large file. Take a look at the size of the file and the condition of the network, and judge roughly how long the download should take.
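If you want a rough, browser-independent sanity check of TTFB versus content download for a single URL, you can time the two phases separately. A sketch using the third-party requests package (the URL is a placeholder); note that this simple version folds DNS, connect, and TLS time into the first number, which DevTools reports separately:

    import time
    import requests  # assumed installed: pip install requests

    URL = "https://example.com/some/page"  # placeholder

    start = time.monotonic()
    # stream=True returns as soon as the response headers arrive, so the time
    # up to this point roughly corresponds to DevTools' TTFB.
    resp = requests.get(URL, stream=True, timeout=30)
    ttfb = time.monotonic() - start

    body = resp.content  # forces the rest of the body to download
    total = time.monotonic() - start

    print(f"TTFB:             {ttfb:.3f} s")
    print(f"Content download: {total - ttfb:.3f} s ({len(body)} bytes)")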

Magento cache is reusing buggy code, resulting in site death with "Too many connections"

In my Magento store there is a piece of code that (it seems) is not closing its DB connection. Normally I don't see any problems. However, the problem arises when I enable caching. After a few hours the site grinds to a halt and eventually dies with the MySQL error "Too many connections".
It seems that the "bad" piece of code is cached and reused and therefore gets worse and worse until...death.
I'm scratching my head to find out where this rogue piece of code is being called from (of course there could be more than one problem).
I doubt it's a core problem, otherwise I probably would have heard of it whilst Googling the issue. So that just leaves third-party modules and code I have written.
One option would be to disable all modules and re-enable them one by one until the problem occurs. But of course it might not happen for several hours (and when it did, you just know it would be in the middle of the night ;)). Then again, it might not be a third-party issue but something I have done. And I also need certain modules in my store for it to function properly (payment gateways etc).
So I'm after suggestions on how to track this down...
I've enabled MySQL logging but it doesn't really tell me all that much.
Any ideas?
Magento 1.7.0.2 and APC with Apache 2
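One way to see what is piling up before the limit is hit is to poll the server's process list and log it over time. A rough sketch (credentials are placeholders; it uses the mysql-connector Python package for convenience, even though the store itself is PHP):

    import time
    import mysql.connector  # assumed available; credentials are placeholders

    conn = mysql.connector.connect(host="localhost", user="root", password="secret")
    cur = conn.cursor()

    while True:
        # SHOW FULL PROCESSLIST lists every open connection with its user,
        # host, database, current command and idle time. Connections stuck in
        # "Sleep" for a long time are the ones opened and never closed.
        cur.execute("SHOW FULL PROCESSLIST")
        rows = cur.fetchall()
        sleepers = [r for r in rows if r[4] == "Sleep"]
        print(f"{time.strftime('%H:%M:%S')} open={len(rows)} sleeping={len(sleepers)}")
        time.sleep(60)

Correlating a jump in sleeping connections with the web server's access log for the same minutes usually narrows down which URL, and therefore which module, is opening the extra connections.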

MS Access: 100% CPU usage

We have an MS Access 2003 ADP application with SQL Server. Sometimes, without any apparent reason, this application starts consuming 100% of CPU time (50% on a dual-core CPU system). This is what Windows Task Manager and other process monitoring/analysis tools are showing, anyway. Usually, the only way to stop such CPU thrashing is to restart the application.
We still don't know how to trigger this problem at will. But I have a feeling that it usually happens when some of the forms get closed by a user.
NB: Recently we noticed that one of the forms consistently makes CPU usage rise to 100% whenever it gets minimized. Most of the time CPU usage goes back to normal when that form is "un-minimized". Perhaps it's a different problem, but we'd like to uncover this mystery, too. :)
Googling for a solution to this problem didn't yield very good results. The most frequent theory is that MS Access gets into some sort of waiting-for-events loop which is practically harmless, performance-wise, because the thread running that loop has very low priority. This doesn't seem to help us, because in our case (a) it certainly does hurt the system's performance and (b) it's still unclear what exactly makes Access get into such a "bad state" and how to avoid it.
I've run into this CPU usage problem in the past, but I don't remember whether we ever discovered a solution or it just went away at some point.
In your post, you didn't mention reviewing the VBA. I'd recommend looking for a loop that under certain conditions becomes an endless loop.
I wonder if it is a hangover from this problem that Access used to have in the "old" days:
http://support.microsoft.com/kb/160819
Whilst the article does say it is fixed in versions >= 2000, it still might be something.

PHP CLI script hangs with no messages

I've written a PHP script that runs via SSH and nohup, meant to process records from a database and do stuff with them (eg. process some images, update some rows).
It works fine with small loads, up to maybe 10k records. I have some larger datasets that process around 40k records (not a lot, I realize, but it adds up to a lot of work when each record requires the download and processing of up to 50 images).
The larger datasets can take days to process. Sometimes I'll see memory errors in my debug logs, which are clear enough; but sometimes the script just appears to "die" or go zombie on me. The tail of the debug log just stops with no error messages, the tail of the nohup log ends with no error, and the process still shows up in a ps listing, looking like this:
26075 pts/0 S 745:01 /usr/bin/php ./import.php
but no work is getting done.
Can anyone give me some ideas on why a process would just quit? The obvious things (like a PHP script timeout and memory issues) are not a factor, as far as I can tell.
Thanks for any tips
PS: this is hosted on a GoDaddy VDS (not my choice). I am sort of suspecting that GoDaddy has some kind of limits that kick in on me despite whatever overrides I put in the code (such as set_time_limit(0);).
Very likely the OOM killer. If you really, really, really want to stay out of its reach, then as root have your process write -17 to /proc/self/oom_adj. Caution: the kernel usually knows better. Evading the OOM killer can actually cripple the same RDBMS that you are trying to query. What a vicious cycle that would be :)
You probably want to stagger your queries instead, based on what you read from /proc/loadavg and /proc/meminfo. If load or swap usage keeps climbing, you need to back off, especially as a background process :)
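A rough sketch of that kind of back-off (the thresholds are arbitrary; the script in question is PHP, but the same /proc reads work from any language):

    import time

    def load_1min():
        # The first field of /proc/loadavg is the 1-minute load average.
        with open("/proc/loadavg") as f:
            return float(f.read().split()[0])

    def mem_available_kb():
        # MemAvailable (on reasonably recent kernels) is the kernel's estimate
        # of memory usable without swapping.
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1])
        return 0

    def throttle(max_load=2.0, min_free_kb=200_000):
        # Sleep while the box is overloaded or short on memory, rather than
        # letting the OOM killer or the host's limits pick the process off.
        while load_1min() > max_load or mem_available_kb() < min_free_kb:
            time.sleep(30)

    # Example: call throttle() before each batch of records or images.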
Additionally, monitor IOWAIT while you run. This can be averaged from /proc/stat when compared with the time the system booted. Note it when you start and as you progress.
Unfortunately, the serial killer known as the OOM killer does not maintain a body count that is accessible beyond parsing kernel messages.
Or, your job keeps hitting the heap limit set by ulimit. Either way, your job needs to back off when appropriate, or prevent its own demise (as noted above) before doing any work.
As a side note, you probably should not be doing this on shared hosting. If the job is that big, it's time to get a VPS (at least), where you have some control over which process gets to do what.