High-availability replicated servers, Tomcat session lost. Firefox and Chrome use 60 seconds as TTL and don't respect the DNS-defined TTL

I have 4 servers for an http service defined on my DNS servers:
app.speednetwork.in. IN A 63.142.255.107
app.speednetwork.in. IN A 37.247.116.68
app.speednetwork.in. IN A 104.251.215.162
app.speednetwork.in. IN A 192.121.166.40
For all of them the DNS server specifies a TTL (time to live) of more than 10 hours:
$ttl 38400
speednetwork.in. IN SOA plugandplay.click. info.plugandplay.click. (
1454402805
3600
3600
1209600
38400 )
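For reference, a lookup against those records should come back roughly like this (output is illustrative, captured with dig's +noall +answer options):
$ dig +noall +answer app.speednetwork.in A
app.speednetwork.in.    38400   IN  A   63.142.255.107
app.speednetwork.in.    38400   IN  A   37.247.116.68
app.speednetwork.in.    38400   IN  A   104.251.215.162
app.speednetwork.in.    38400   IN  A   192.121.166.40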
Firefox ignores the TTL and makes a new DNS query every 60 seconds, as seen in
about:config -> network.dnsCacheExpiration = 60 and in about:networking -> DNS.
Chrome shows a correctly cached DNS entry in chrome://net-internals/#dns, with more than 10 hours until it expires:
apis.google.com IPV4 216.58.210.174 2016-04-12 11:07:07.618 [Expired]
app.speednetwork.in IPV4 192.121.166.40 2016-04-12 21:45:36.592
but it ignores this entry and re-queries DNS every minute, as discussed in https://groups.google.com/a/chromium.org/forum/#!topic/chromium-discuss/655ZTdxTftA and seen in chrome://net-internals/#events.
The conclusion, and the problem: every minute both browsers query DNS again, receive a new IP from the 4 configured in DNS, connect to a new IP/server and LOSE THE TOMCAT SESSION.
Since configuring every user's browser is not an option, my questions are:
1) Is there some other DNS config I can use for high availability?
2) Is there some HTTP header I can use to instruct browsers to keep using the same IP/server for the day?

The DNS TTL value is the maximum time the information may be cached. There is no minimum time, nor any requirement to cache at all. The browser behavior you describe is entirely within the DNS specs, and the browsers are doing nothing wrong. If your server solution depends on the clients remembering a DNS lookup for a certain time, then you need to redesign it. As you have already discovered, it does not work.
Building a load-balancing cluster of Tomcat servers is hardly rocket science these days, and you can easily google a solution yourself.
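For example, a minimal sticky-session sketch, assuming Apache httpd with mod_proxy_balancer in front of two of the four Tomcat nodes (the member addresses, ports and route names below are illustrative, not taken from the question):
# Requires mod_proxy, mod_proxy_ajp, mod_proxy_balancer and mod_lbmethod_byrequests to be enabled
<Proxy "balancer://appcluster">
    # "route" must match the jvmRoute attribute of each Tomcat's <Engine> in server.xml
    BalancerMember "ajp://10.0.0.1:8009" route=app1
    BalancerMember "ajp://10.0.0.2:8009" route=app2
    # Pin each client to the node that issued its JSESSIONID cookie
    ProxySet stickysession=JSESSIONID|jsessionid
</Proxy>
ProxyPass        "/" "balancer://appcluster/"
ProxyPassReverse "/" "balancer://appcluster/"
Each Tomcat's <Engine> element in server.xml would then carry the matching jvmRoute="app1" (or "app2") so the session cookie identifies which node owns the session.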

The Keep-Alive header can do the trick. With a large value such as 65 seconds, browsers reuse the HTTP connection for the whole session and don't try a new DNS query. This is true in my app, where there is a piggyback XMLHttpRequest to the server every minute; you may need a bigger value. The Apache default is 5 seconds.
When using Tomcat directly:
response.setHeader("Keep-Alive", "timeout=65");
When using Apache (and mod_ajp) in front of Tomcat:
nano /etc/apache2/apache2.conf:
MaxKeepAliveRequests 0
KeepAliveTimeout 65
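A quick way to confirm the header is actually being sent (not part of the original answer; the hostname is illustrative):
$ curl -sv -o /dev/null http://app.speednetwork.in/ 2>&1 | grep -i 'keep-alive'
# Expect a response header such as: Keep-Alive: timeout=65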
But this was not a complete solution. After a disconnect the HTTP connection is closed, and under several concurrent requests each one may be opened against a different server, so the results do not end up in the same server session.
Finally I solved this by implementing CORS (cross-domain requests): the client picks one server to work with (app1, app2, etc.) and keeps using it until that server fails.
CORS headers on both server and client let me exchange data even though the initial files were downloaded from app (i.e. another domain).
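The server side can be as simple as a servlet filter on each Tomcat node. This is only a sketch of that idea (the class name, the allowed origin and the header list are my own illustration, not the poster's actual code):
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Adds CORS headers so a page downloaded from one origin can keep calling a
// fixed app1/app2/... node directly.
public class CorsFilter implements Filter {
    public void init(FilterConfig config) throws ServletException { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        // Origin the page was served from; adjust to the real domain.
        response.setHeader("Access-Control-Allow-Origin", "http://app.speednetwork.in");
        response.setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
        response.setHeader("Access-Control-Allow-Headers", "Content-Type");
        // Needed if the XHR sends cookies (e.g. JSESSIONID) cross-origin.
        response.setHeader("Access-Control-Allow-Credentials", "true");
        chain.doFilter(req, res);
    }

    public void destroy() { }
}
The filter would be mapped to /* in web.xml, and the client-side XMLHttpRequest would set withCredentials = true so the JSESSIONID cookie travels with the cross-origin calls.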

Related

Catch 22? Blocked by CORS policy: Same server, internal/external IP, no SSL

My apologies if this is a duplicate. I can find a million results about CORS policy issues, but not about this specific one:
I developed a simple "speed test" site for my users (wfh employees of my company) to access. It tests speeds across the public net to different datacenters we utilize, and via the users' VPN connection to one of our DCs.
There are more complicated elements, but for a basic round-trip "ping" I have an extremely simple PHP script on the server that contains:
<?php
header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Headers: *');
if (isset($_GET['simple']) && $_GET['simple'] == '1')
die('{ }');
?>
It is called like this:
$.ajax({
  type: 'GET',
  url: sURL,
  data: { ignore: (pingCounter.start = new Date().getTime()) },
  dataType: 'text',
  timeout: iTimeout
})
.done(function(ret) {
  pingCounter.end = new Date().getTime();
  [...] (additional code omitted for brevity)
(I know this has additional overhead other than the raw round-trip network traffic timing, but I don't need sub-ms accuracy. I just need to be able to tell users "the problem is on your end" or "ah yes, the problem is the latency between your house and this particular DC".)
The same server running that PHP code is addressable at the following URLs at the DC wherein our VPN server lies:
http://speedtest-int.mycompany.com/ping.php
http://speedtest-ext.mycompany.com/ping.php
Public DNS resolves like this:
speedtest-ext.mycompany.com IN A 1.1.1.1 (Actual public IP redacted)
speedtest-int.mycompany.com IN A 10.1.1.1 (Actual internal IP redacted)
If I access either URL from my browser directly, it loads fine (which is to say it responds with { }).
When loading via the JS snippet above, the call to http://speedtest-ext.mycompany.com/ping.php works fine.
The call to http://speedtest-int.mycompany.com/ping.php fails with "The request client is not a secure context and the resource is in more-private address space 'private'".
Fair enough, the solution is to add Access-Control-Allow-Private-Network: *, right?
EXCEPT that apparently can only be used with SSL:
https://developer.chrome.com/blog/private-network-access-update/
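(For reference, the preflight exchange that article describes looks roughly like this; the page origin and status line below are illustrative:)
OPTIONS /ping.php HTTP/1.1
Host: speedtest-int.mycompany.com
Origin: http://speedtest.mycompany.com
Access-Control-Request-Method: GET
Access-Control-Request-Private-Network: true

HTTP/1.1 204 No Content
Access-Control-Allow-Origin: *
Access-Control-Allow-Private-Network: true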
I have a self-signed cert on there, but that obviously fails by policy for that reason.
I could just get a LetsEncrypt cert for multiple subdomains. EXCEPT it will never validate the URL http://speedtest-int.mycompany.com because the LetsEncrypt servers won't be able to reach that to validate ownership, as it's a private IP.
I have no control over most of my users' machines, so I can't necessarily install trusted internal certs or change browser options. Most users use Chrome.
So is my solution to buy a UCC or wildcard cert?
I feel like I'm in a catch-22, and I don't want to spend however-much on a UCC cert for an internal app that will be very very very occasionally used by one of our 25 home-based employees when I want to prove that their home "internet is bad" and not the corp network.
Thanks in advance; I'm sure there's a stupidly obvious solution I'm not seeing.
(I'm considering pushing a /32 route to my VPN users for another real public IP to be used in place of the internal IP. Then I can have the "internal" test run against an otherwise publicly accessible IP which could be validated by LetsEncrypt, but VPN users would hit it via the VPN. Is that silly?)
Edit: If anyone is curious -- or it helps to clarify my goal here -- this is the output when accessing the speedtest page:
http://s.co.tt/wp-content/uploads/2021/12/Internal_Speedtest_Example-Redacted.png
It repeats for 20 cycles (or until stopped) and runs each element a varying number of times per cycle, collecting the average time for each. It ain't pretty, but it work(ed).

AWS ACM certificate state is pending validation and not changing to issued

I have requested a public ACM certificate and selected the DNS validation method. After requesting the certificate it went to the Pending validation state. I created a hosted zone in Route 53 with the same domain name I used for the certificate. After creating the certificate I got the option "Create record in Route 53". I created the record in Route 53 with the CNAME and it displayed as "Success
The DNS record was written to your Route 53 hosted zone. It can take 30 minutes or longer for the changes to propagate and for AWS to validate the domain and issue the certificate."
But the status of the certificate is not changing and it is still in Pending validation. After some time the "Create record in Route 53" option becomes enabled again. I have tried the same process multiple times over almost a day but the status does not change. Can someone please help me fix this issue?
In the AWS Console (Web UI), on the Certificate Manager page,
Expand the certificate that is pending
Expand the table that has domain and validation status
Click the blue button that says "Create record in Route 53" (you can also do this manually)
Give it about 10 minutes
Or follow these instructions from AWS - Why is my AWS Certificate Manager (ACM) certificate DNS validation status still pending validation?
I had the same issue and found out that my problem was with the NS records for my domain. My mistake was that I didn't update the name servers at my domain registrar; I did the opposite and updated the values of the NS record in Route 53 based on the NS at my registrar. Then I realized the right thing to do is to update the name servers (NS) of your domain at the registrar to the values of the NS record in Route 53.
Just make sure you have the correct name servers and the correct CNAME suggested by ACM. I had waited a day and it was still Pending validation, but once I fixed it, it took only a few minutes for the certificate to be issued.
What I would do is:
Verify that the DNS returns what is expected.
For that you can use dig (Linux) or nslookup (Windows), or, even better, https://www.digwebinterface.com
If you don't get what is expected, you need to reconfigure the DNS.
Once it is verified, wait a little bit (10 min to 2h I'd say).
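For example, a quick check with dig that the validation record is visible (the record name below is a placeholder; use the exact CNAME name ACM shows for your certificate):
$ dig +short _<validation-token>.yourdomain.com CNAME
_<validation-value>.acm-validations.aws.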
Something to read while you wait:
https://docs.aws.amazon.com/acm/latest/userguide/acm-regions.html
https://docs.aws.amazon.com/acm/latest/userguide/dns-validation.html
https://aws.amazon.com/premiumsupport/knowledge-center/acm-certificate-pending-validation/
https://docs.aws.amazon.com/acm/latest/userguide/domain-ownership-validation.html

captive.apple.com/generate_204 hits from Windows 10

We have several Windows 10 workstations - 6 out of 20 - constantly hitting the URL "captive.apple.com/generate_204" over wired internet. It's not causing any issues, but we don't understand why it's happening and we want to turn it off.
Our FW logs give us this info which may be pertinent:
udp:6514
pan:threat
action allowed
app web-browsing
app:default_ports tcp/80
app:has_known_vulnerability yes
app:risk 4
app:subcategory internet-utility
app:technology browser-based
app:tunnels_other_application yes
app:used_by_malware yes
application web-browsing
category computer-and-internet-info
content_type text/html
dest 17.253.63.202
dest_hostname captive.apple.com
dest_interface ethernet1/4
dest_ip 17.253.63.202
dest_port 80
dest_zone dsl
direction client-to-server
filename generate_204
flags 0x42b000
misc captive.apple.com/generate_204
protocol tcp
rule User Internet Access - App
signature URL Filtering log(9999)
signature_id 9999
src_interface ethernet1/5.6
src_port 56363
src_translated_ip 192.168.50.1
src_translated_port 8089
threat_id 9999
threat_name -9999
type THREAT
url captive.apple.com/generate_204
user_agent Mozilla / 4.0
Solved: the GlobalProtect client for VPN access was hitting this "URL" to test for connectivity. I found out by eliminating the services that were active on startup; it was the second one I tried.
Now we can disregard this call-out, as it is a trusted app that's doing it, with no payload anyhow.
So it wasn't a browser but an embedded agent within the VPN client.

Why does Chrome sometimes ask for basic auth a second time and Firefox not?

I am running a React frontend and a Laravel backend on a Nginx server (homestead Vagrant box) behind a basic auth, the Nginx configuration for that looks like:
server {
...
location / {
try_files $uri $uri/ /index.php?$query_string;
auth_basic "Restricted";
auth_basic_user_file /home/vagrant/Code/project/.htpasswd;
}
}
This is basically running all right, but Chrome (v52, Mac OS X) "sometimes" asks for the auth again on subsequent requests, for example to load an image which is defined as a CSS background on a button hover. This behaviour (at least in my research so far) is not consistent and I can't reproduce it reliably; it occurs from time to time and I can't find a reason for the repeated auth request.
In Firefox (v47.0, Mac OS X) I get one auth prompt and then it works as expected.
Do you have any idea how to debug the specific behaviour in Chrome or make sure that the first auth prompt will be the only one?
Note: the frontend sends some further XHR calls to the backend which also have the "authorization" header set, to fulfill the basic auth without showing the prompt.
I suspect the issue here is with how you're storing the authorization token locally and the amount of time for which it's valid. Browsers will handle local storage a little differently from one another, so if you're using local storage or session storage, it may simply be a difference in how the data is persisted.
I believe this SO post would probably help answer the question: How persistent is localStorage?
Basically, Chrome allows the data to have a set timeout period, while in Firefox "it is not possible to specify an expiration period for any of your data".
If you're using Chrome frequently and clear your cache for other reasons, you're likely also clearing your auth token. If you're only using Firefox for testing, you likely have a cached auth token that's not expiring.

What could cause duplicate records to be created by Rails?

We are noticing a lot of duplicate records being created in various tables in our database, but are at a loss as to why this is happening. Interestingly, while the records are otherwise duplicates (down to even the created_at stamps!), on our users table the password salt and hash are different on each record -- which leads me to believe that Rails is somehow running the transactions/save operations twice. Obviously, we are not calling save or create multiple times in the application code.
This duplication does not seem to happen with every record saved in the database, and we cannot seem to infer a pattern yet. There is also a validates_uniqueness_of validation on the User model (though not a unique key on the table yet; we need to clean up all the duplicates to be able to do that) -- so Rails should stop itself if a record already exists, but if the requests are firing simultaneously that's a race condition.
We are currently running Rails 3.2.2 behind Passenger 3.0.11/nginx on our app servers (currently 2 of them), and have one central nginx webserver which sends requests upstream to an app server. Could this setup somehow cause processes to be duplicated or something? Would it matter that requests aren't locked to one upstream server (ie. if one user requests a page that includes static content like images, one or both app servers may be used)? (I feel like that's grasping at straws but I want to cover every possibility)
What else could cause this to happen?
Update: As an example, a user was created today which got duplicate records. Both have the created_at stamp of 2012-03-28 16:48:11, and all columns except for hashed_password and salt are identical. From the request log, I can see the following:
App Server 1:
Started POST "/en/apply/create_user" for 1.2.3.4 at 2012-03-28 12:47:19 -0400
[2012-03-28 12:47:19] INFO : Processing by ApplyController#create_user as HTML
[2012-03-28 12:47:20] INFO : Rendered apply/new_user.html.erb within layouts/template (192.8ms)
Started POST "/en/apply/create_user" for 1.2.3.4 at 2012-03-28 12:48:10 -0400
[2012-03-28 12:48:10] INFO : Processing by ApplyController#create_user as HTML
[2012-03-28 12:48:11] INFO : Redirected to apply/initialize_job_application/3517
[2012-03-28 12:48:11] INFO : /app/controllers/apply_controller.rb:263:in `block (2 levels) in create_user'
App Server 2:
Started POST "/en/apply/create_user" for 1.2.3.4 at 2012-03-28 12:48:10 -0400
[2012-03-28 12:48:10] INFO : Processing by ApplyController#create_user as HTML
Web Server:
1.2.3.4 - - [28/Mar/2012:12:48:10 -0400] "POST /en/apply/create_user HTTP/1.1" 499 0 "en/apply/create_user" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)" "-"
1.2.3.4 - - [28/Mar/2012:12:48:11 -0400] "POST /en/apply/create_user HTTP/1.1" 302 147 "en/apply/create_user" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)" "-"
So the create action was hit three times (returning to the form the first time due to an error, probably), and at least once on each server. The latter two both are registered by the webserver as separate requests, but the first gets status code 499 Client Closed Request (an nginx extension according to wikipedia), and the second gets a 302 as expected. Could the 499 be causing the problems here?
Two possibilities come to mind.
The first one is an odd (and against the RFC) behavior of Nginx when used as a load balancer: it will retry any failed request against the next backend, while the RFC allows that only for safe methods (e.g. GET or HEAD). The result is that if your nginx considers a request failed for some reason, it may be re-sent to the next server. If both servers complete their transaction, you have a duplicate record. Judging from your webserver's log (and the 499 status code which Nginx uses to denote the user clicking abort in their browser), this looks like the most probable cause.
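If you keep nginx in front, one way to stop it from replaying a POST is to disable upstream retries on the proxied location. A sketch, with an illustrative upstream name:
location / {
    proxy_pass http://app_servers;
    # By default nginx may re-send a request that hit an error or timeout to the
    # next upstream; turning this off prevents a POST from being executed twice.
    proxy_next_upstream off;
}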
The second possibility is that your users double-click on the send button. With the right timing, their browsers could send two complete requests nearly at the same time.
To make sure that your user records are really unique, you should create unique indexes on your database. These are then actually enforced (albeit with a worse error message compared to the ActiveRecord check). Because of that, you should always define your uniqueness constraint in both the database schema and your models.
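For example, in plain SQL (assuming the uniqueness you care about is on users.email; adjust the table and column to whatever validates_uniqueness_of checks):
-- Run only after removing the existing duplicate rows.
CREATE UNIQUE INDEX index_users_on_email ON users (email);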
Also, you could look into replacing your frontend nginx with a more conformant loadbalancer. I'd recommend haproxy for that.
It really seems like a race condition. Make sure to lock between requests; it can easily happen that one or two requests are duplicated every now and then. The same can happen when modifying records without transactions, so make sure that you don't have a race between your requests.