Google Chrome version 80 on GalliumOS reliably crashes after several hours

I have about 50 Chromeboxes running GalliumOS, displaying the same HTTP pages all day (dashboards) using Chrome version 80.0.3987.116.
For the past couple weeks, Chrome has been crashing every 18 hours or so, on every single Chromebox. It becomes unresponsive and I have to kill the process or restart the box.
I'm testing Chrome on one of them using this command that loads a given URL:
google-chrome-stable --load-media-router-component-extension=0 --disable-session-crashed-bubble --enable-logging --v=1 --app="http://192.168.x.x/whatever"
(The --load-media-router-component-extension=0 flag is a workaround we put in for a previous issue: the Chrome Media Router was reliably crashing after only a few hours, and we aren't using Chromecast anyway, so that was a no-brainer. The --disable-session-crashed-bubble flag gets rid of a modal dialog; we only have SSH access to these machines.)
I came in this morning and it was locked up on schedule. On STDOUT it kept printing this over and over again:
[12171:1:0100/00000.627388:ERROR:broker_posix.cc(46)] Received unexpected number of handles
[12171:1:0100/00000.627421:ERROR:command_buffer_proxy_impl.cc(94)] ContextResult::kFatalFailure: AllocateAndMapSharedMemory failed
In chrome_debug.log I'm finding this at the end (repeated over and over):
[12171:1:0100/000000.404872:ERROR:command_buffer_proxy_impl.cc(94)] ContextResult::kFatalFailure: AllocateAndMapSharedMemory failed
[12171:1:0100/000000.405038:ERROR:broker_posix.cc(46)] Received unexpected number of handles
Right now, to get around this we're running a cron job that restarts lightdm on each Chromebox after a couple hours, but we're considering downgrading all of them to whatever the last stable version of Chrome was, 79 or 78.
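As an alternative to restarting on a fixed schedule, a watchdog could restart only once the error signatures quoted above start repeating in chrome_debug.log. The following is a minimal sketch of the detection step only; the function names and the threshold of 10 are assumptions for illustration, not something Chrome or GalliumOS provides.

```python
# Error signatures copied from the log excerpts in the question.
FATAL_PATTERNS = (
    "AllocateAndMapSharedMemory failed",
    "Received unexpected number of handles",
)


def count_fatal_lines(log_text: str) -> int:
    """Count log lines that contain any of the known error signatures."""
    return sum(
        1
        for line in log_text.splitlines()
        if any(pattern in line for pattern in FATAL_PATTERNS)
    )


def should_restart(log_text: str, threshold: int = 10) -> bool:
    """Hypothetical policy: restart once the errors have repeated `threshold` times."""
    return count_fatal_lines(log_text) >= threshold
```

A cron job could feed the tail of the log to should_restart() and restart lightdm only when it returns True, which would avoid interrupting dashboards that are still healthy.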

SSRS Desktop Report Builder Application fails to launch

Today, when I start Report Builder from the Start menu, I see the splash screen and then it closes.
I tried deleting temp files and the files under "C:\Users\me\AppData\Local\Apps\2.0".
For testing, I created a new local Windows account and installed it again there, and it launched fine.
Any ideas?
In the event viewer I found this:
Application: MSReportBuilder.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Xml.XmlException
at System.Xml.XmlTextReaderImpl.Throw(System.Exception)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.Load(System.Xml.XmlDocument, System.Xml.XmlReader, Boolean)
at System.Xml.XmlDocument.Load(System.Xml.XmlReader)
at System.Xml.XmlDocument.Load(System.String)
at Microsoft.ReportDesigner.Properties.Settings+RBSettingsProvider.GetReader(System.String)
at Microsoft.ReportDesigner.Properties.Settings+RBSettingsProvider.GetUserScopedReader()
at Microsoft.ReportDesigner.Properties.Settings+RBSettingsProvider.GetPropertyValues(System.Configuration.SettingsContext, System.Configuration.SettingsPropertyCollection)
at System.Configuration.SettingsBase.GetPropertiesFromProvider(System.Configuration.SettingsProvider)
at System.Configuration.SettingsBase.GetPropertyValueByName(System.String)
at System.Configuration.SettingsBase.get_Item(System.String)
at System.Configuration.ApplicationSettingsBase.GetPropertyValue(System.String)
at System.Configuration.ApplicationSettingsBase.get_Item(System.String)
at Microsoft.ReportDesigner.Properties.Settings.get_RecentDataSources()
at Microsoft.ReportDesigner.ApplicationSettings.get_Settings()
at Microsoft.ReportDesigner.ReportDesigner.LoadAppConfigWndSize()
at Microsoft.ReportDesigner.ReportDesigner..ctor()
at Microsoft.ReportDesigner.ReportDesigner..ctor(Microsoft.ReportDesigner.AppArguments)
at Microsoft.ReportDesigner.Program.Main(System.String[])
What fixed this for me: the user.config file in the local Report Builder folder was 0 KB, so I replaced it with one from another computer, and it works fine now.
Yeah, look under the user's home AppData/Local/Microsoft and try deleting or renaming the "Report Builder" folder. Sometimes the app corrupts some preferences and then crashes. That worked for me (it might also be in the LocalLow or Roaming folders). – sproketboy Feb 25 at 22:11
The solution provided by sproketboy in their comment on Feb 25 at 22:11 worked for me. Given that it actually solved this situation for me, I'm re-posting it here as an answer.
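Both fixes point at the same cause: the stack trace ends in XmlDocument.Load, and a zero-byte user.config is not valid XML. A minimal sketch for locating such files is shown below. It assumes you pass in the folder to search yourself (the exact Report Builder settings path is not given in this thread), and the function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def is_broken_config(path: str) -> bool:
    """Return True if the file is empty or is not well-formed XML."""
    if os.path.getsize(path) == 0:
        return True
    try:
        ET.parse(path)
    except ET.ParseError:
        return True
    return False


def find_broken_configs(root: str, name: str = "user.config") -> list[str]:
    """Walk `root` and list every file called `name` that fails the check."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            full_path = os.path.join(dirpath, name)
            if is_broken_config(full_path):
                broken.append(full_path)
    return sorted(broken)
```

Renaming any file this reports, rather than deleting it, keeps a copy around in case the settings are needed later.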

What does the argument --virtual-time-budget of Chrome CLI really mean?

I'm aware of the documentation of the argument --virtual-time-budget in the source of Chromium, but I don't feel I understand it:
// If set the system waits the specified number of virtual milliseconds before
// deeming the page to be ready. For determinism virtual time does not advance
// while there are pending network fetches (i.e no timers will fire). Once all
// network fetches have completed, timers fire and if the system runs out of
// virtual time is fastforwarded so the next timer fires immediately, until the
// specified virtual time budget is exhausted.
const char kVirtualTimeBudget[] = "virtual-time-budget";
I did some experiments, and the results were confusing to me:
# I'm on macOS; you may change this alias according to your own OS
$ alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
$ chrome --version
Google Chrome 70.0.3538.110
$ time chrome --headless --disable-gpu --print-to-pdf https://www.chromestatus.com/
real 0m0.912s
user 0m0.264s
sys 0m0.219s
$ time chrome --headless --disable-gpu --print-to-pdf --virtual-time-budget=10000 https://www.chromestatus.com/
real 0m2.502s
user 0m0.347s
sys 0m0.244s
$ time chrome --headless --disable-gpu --print-to-pdf --virtual-time-budget=100000 https://www.chromestatus.com/
real 0m15.432s
user 0m0.759s
sys 0m0.406s
$ time chrome --headless --disable-gpu --print-to-pdf --virtual-time-budget=1000000 https://www.chromestatus.com/
real 0m15.755s
user 0m0.755s
sys 0m0.401s
I thought Chrome would wait for 0, 10, 100, and 1000 seconds in the above four examples before printing to PDF, but the actual waiting time seemed to be far off. My question is: how can I make Chrome wait exactly X seconds before printing a page to PDF? I'm only considering the Chrome CLI at the moment, and I'm not looking for tools like Puppeteer.
I can answer your title question easily (and that explains your results). --virtual-time-budget states the maximum time the process will wait for a page to load, not that it will always wait that long. If the result of the request is available (no more network requests are pending), it returns the results immediately.
The information returned should be correct, unless there is an AJAX request or other JavaScript in the mix. If so, you must resort to JavaScript/DOM manipulation to resolve the issue.
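The source comment quoted in the question describes virtual time being fast-forwarded from one timer to the next. The toy model below illustrates only that one idea and is not Chromium's implementation: it ignores network fetches entirely, and the function name and sample timer values are made up.

```python
import heapq


def simulate_virtual_time(timer_delays_ms, budget_ms):
    """Toy model: fire timers in order, jumping the virtual clock forward.

    Returns (fired_delays, virtual_clock_ms). The model never sleeps, so
    the wall-clock cost does not depend on how large the budget is.
    """
    pending = list(timer_delays_ms)
    heapq.heapify(pending)
    fired = []
    now = 0
    # Stop when no timer is left or the next one lies beyond the budget.
    while pending and pending[0] <= budget_ms:
        now = heapq.heappop(pending)  # fast-forward straight to the next timer
        fired.append(now)
    return fired, now
```

In this model, a page with timers at 500 ms and 3000 ms is done after two jumps whether the budget is 10000 or 1000000, which is one possible reading of why the last two measurements in the question are nearly identical.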

DIY cartridge stops on git push

I've been developing an application for some weeks, and it's been running in an OpenShift small gear with DIY 0.1 + PostgreSQL cartridges for several days, including ~5 new deployments. Everything was OK, and a new deploy stopped and started everything in seconds.
Today, however, pushing master as usual stops the cartridge and it won't start again. This is the trace:
Counting objects: 2688, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1930/1930), done.
Writing objects: 100% (2080/2080), 10.76 MiB | 99 KiB/s, done.
Total 2080 (delta 1300), reused 13 (delta 0)
remote: Stopping DIY cartridge
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
Logging in with SSH and running the start action hook manually fails because the database is stopped. Restarting the gear makes everything work again.
The failing deployment has nothing to do with it, since it only adds a few lines of code, nothing about configuration or anything that might break the boot.
Logs (at $OPENSHIFT_LOG_DIR) reveal nothing. Quota usage seems fine:
Cartridges             Used   Limit
---------------------- ------ -----
diy-0.1 postgresql-9.2 0.6 GB 1 GB
Any suggestions about what I could check?
Oh, dumb mistake. My last working deployment involved a change in the binary name, which now matches the gear name. The stop script, which used ps and grep to find the process, was wrong: it killed not only the application but also the connection. Changing it fixed the issue.
Solution inspired by this blogpost.
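The failure mode described above is typical of substring matching: once the binary name equals the gear name, a pattern meant for the application can also match other processes whose command line mentions that name. Below is a minimal sketch of a stricter selection, assuming the process list has already been parsed into (pid, argv) pairs; the function name and the sample data in the usage note are hypothetical.

```python
import os


def pids_to_stop(processes, app_name, own_pid=None):
    """Pick PIDs whose executable basename equals `app_name` exactly.

    `processes` is an iterable of (pid, argv) pairs, e.g. parsed from `ps`.
    A substring match would also select any command line that merely
    mentions the name somewhere in its arguments.
    """
    own_pid = os.getpid() if own_pid is None else own_pid
    selected = []
    for pid, argv in processes:
        if pid == own_pid or not argv:
            continue  # never select ourselves or entries without a command
        if os.path.basename(argv[0]) == app_name:
            selected.append(pid)
    return selected
```

With a list such as [(100, ["/some/path/myapp"]), (102, ["git-receive-pack", "/home/myapp/git/myapp.git"])], only PID 100 is selected, so the process serving the git push is left alone.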

Hot reconfiguration of HAProxy still leads to failed requests, any suggestions?

I found that there are still failed requests under high traffic when I use a command like this
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
to hot reload the updated config file.
Below is the pressure test result using webbench:
/usr/local/bin/webbench -c 10 -t 30 targetHProxyIP:1080
Webbench – Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET targetHProxyIP:1080
10 clients, running 30 sec.
Speed=70586 pages/min, 13372974 bytes/sec.
**Requests: 35289 susceed, 4 failed.**
I ran the command
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
several times during the pressure testing.
The HAProxy documentation mentions:
They will receive the SIGTTOU signal to ask them to temporarily stop listening to the ports so that the new process can grab them
So there is a window during which the old process is no longer listening on the port (say 80) and the new process has not yet started listening on it, and new connections arriving in that window will fail. Does that make sense?
Is there a way to reload the HAProxy configuration without impacting either existing or new connections?
On recent kernels where SO_REUSEPORT is finally implemented (3.9+), this dead period does not exist anymore. While a patch has been available for older kernels for something like 10 years, it's obvious that many users cannot patch their kernels. If your system is more recent, the new process will succeed in its attempt to bind() before asking the previous one to release the port, so there is a period where both processes are bound to the port instead of no process.
There is still a very tiny possibility that a connection arrives in the leaving process's queue at the moment it closes the socket. There is no reliable way to stop this from happening, though.
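The overlap described above can be observed with plain sockets. This is a minimal sketch, assuming a Linux 3.9+ kernel and a Python build that exposes socket.SO_REUSEPORT; it stands in for the old and new HAProxy processes with two sockets in one process, and the function names are illustrative.

```python
import socket


def bind_reuseport(host: str, port: int) -> socket.socket:
    """Create a listening TCP socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def overlapping_bind(host: str = "127.0.0.1") -> bool:
    """Bind an "old" and a "new" listener to one port, then retire the old one."""
    old = bind_reuseport(host, 0)       # port 0: let the kernel pick a free port
    port = old.getsockname()[1]
    new = bind_reuseport(host, port)    # succeeds while `old` is still listening
    both_bound = new.getsockname()[1] == port
    old.close()                         # only now does the old listener go away
    new.close()
    return both_bound
```

Without the setsockopt() call, the second bind() would fail with "Address already in use", which corresponds to the gap on older kernels that the question asks about.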

Starting google-chrome via Selenium on headless debian system

I'm trying to start the google-chrome browser in disabled web security mode. The selenium log says:
15:36:33.526 INFO - Command request: getNewBrowserSession[*googlechrome, http://www.myurl.de, , commandLineFlags=--disable-web-security] on session null
However, it just hangs after
15:36:33.600 INFO - Launching Google Chrome...
Here's the stack trace:
16:36:44.605 ERROR - Failed to start new browser session, shutdown browser and clear all session data
org.openqa.selenium.server.RemoteCommandException: timed out waiting for window 'null' to appear
at org.openqa.selenium.server.FrameGroupCommandQueueSet.waitForLoad(FrameGroupCommandQueueSet.java:564)
at org.openqa.selenium.server.FrameGroupCommandQueueSet.waitForLoad(FrameGroupCommandQueueSet.java:521)
at org.openqa.selenium.server.BrowserSessionFactory.createNewRemoteSession(BrowserSessionFactory.java:374)
at org.openqa.selenium.server.BrowserSessionFactory.getNewBrowserSession(BrowserSessionFactory.java:125)
at org.openqa.selenium.server.BrowserSessionFactory.getNewBrowserSession(BrowserSessionFactory.java:87)
at org.openqa.selenium.server.SeleniumDriverResourceHandler.getNewBrowserSession(SeleniumDriverResourceHandler.java:785)
at org.openqa.selenium.server.SeleniumDriverResourceHandler.doCommand(SeleniumDriverResourceHandler.java:422)
at org.openqa.selenium.server.SeleniumDriverResourceHandler.handleCommandRequest(SeleniumDriverResourceHandler.java:393)
at org.openqa.selenium.server.SeleniumDriverResourceHandler.handle(SeleniumDriverResourceHandler.java:146)
at org.openqa.jetty.http.HttpContext.handle(HttpContext.java:1530)
at org.openqa.jetty.http.HttpContext.handle(HttpContext.java:1482)
at org.openqa.jetty.http.HttpServer.service(HttpServer.java:909)
at org.openqa.jetty.http.HttpConnection.service(HttpConnection.java:820)
at org.openqa.jetty.http.HttpConnection.handleNext(HttpConnection.java:986)
at org.openqa.jetty.http.HttpConnection.handle(HttpConnection.java:837)
at org.openqa.jetty.http.SocketListener.handleConnection(SocketListener.java:243)
at org.openqa.jetty.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.openqa.jetty.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
Selenium is started by Robot Framework via the robotframework-maven-plugin. Xvfb is also started by the Maven build script to simulate a display. The startup configuration does not seem to be the problem, though: everything starts fine, but the browser just won't come up.
I hope someone can help me.
Make sure that the user account that is launching the browser has a home directory. Otherwise the browser profile creation will fail.
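A quick way to check this precondition before launching the browser is sketched below. The helper name and the wording of its messages are made up for illustration; it only checks that the directory exists and is writable, which is what profile creation would need at a minimum.

```python
import os


def home_dir_problem(home: str):
    """Return a short description of what is wrong with `home`, or None if it looks usable."""
    if not home:
        return "no home directory configured"
    if not os.path.isdir(home):
        return "home directory does not exist"
    # Creating a profile needs permission to enter the directory and write to it.
    if not os.access(home, os.W_OK | os.X_OK):
        return "home directory is not writable"
    return None
```

On a Unix system, the path to check for the account running the build could be obtained with pwd.getpwuid(os.getuid()).pw_dir and passed to this function.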