Logback messages are getting interleaved in a multithreaded Dropwizard service
We have a multithreaded service in which individual multi-line log messages get broken apart when another thread writes to the log at the same time. How can I prevent this from happening?
We use worker threads to make REST service calls and to receive and process the responses. While doing so, we write to the log using Logback, the default logging engine in Dropwizard.
The specific case I'm looking at is when a thread throws an exception and we log the stack trace. What I see is that the trace is interrupted by entries from at least one other thread:
Example
[pool-4-thread-13] Error!
stack trace line 1
[pool-4-thread-17] Event 1 happened
[pool-4-thread-17] Event 2 happened
stack trace line 2
stack trace line 3
stack trace line 4
What I expected to see in the log was something like this:
[pool-4-thread-13] Error!
stack trace line 1
stack trace line 2
stack trace line 3
stack trace line 4
[pool-4-thread-17] Event 1 happened
[pool-4-thread-17] Event 2 happened
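
To make the setup concrete, here is a minimal sketch of the worker pattern (class, method, and logger names are made up for illustration; this is not our actual code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Minimal sketch, not the real service: worker threads call a REST service
// and log events; a failing call logs the resulting stack trace.
public class LogInterleaveSketch {

    private static final Logger LOG = LoggerFactory.getLogger(LogInterleaveSketch.class);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> {
                try {
                    callRestService(n);                // hypothetical worker task
                    LOG.info("Event {} happened", n);
                } catch (Exception e) {
                    // Throwable passed in the same call: message and trace
                    // belong to a single logging event.
                    LOG.error("Error!", e);
                }
            });
        }
        pool.shutdown();
    }

    private static void callRestService(int n) {
        if (n % 13 == 0) {
            throw new IllegalStateException("simulated REST failure on task " + n);
        }
    }
}

My understanding is that a single LOG.error("Error!", e) call produces one logging event, which a Logback file appender writes as a unit, so I would not expect the trace to be split this way. If the trace were instead emitted line by line (for example via printStackTrace() redirected into the log), each line would be its own event and could interleave exactly as shown above.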
config.yaml
server:
  adminContextPath: /admin
  adminConnectors:
    - type: http
      port: 9193
  applicationContextPath: /
  applicationConnectors:
    - type: http
      port: 9192
  maxRequestHeaderSize: 64KiB
  requestLog:
    timeZone: UTC
    appenders:
      - type: file
        currentLogFilename: ./logs/service-requests.log
        archive: true
        # When the log file rotates, the archived log will be renamed to this and gzipped. The
        # %d is replaced with the previous day (yyyy-MM-dd). Custom rolling windows can be created
        # by passing a SimpleDateFormat-compatible format as an argument: "%d{yyyy-MM-dd-hh}".
        archivedLogFilenamePattern: ./logs/service-requests.log.%d.log.gz
        # The number of archived files to keep.
        archivedFileCount: 30

logging:
  level: INFO
  loggers:
    # We only care about warning log messages from the apache libraries, like hadoop, zookeeper, and phoenix
    org.apache: WARN
  appenders:
    - type: file
      timeZone: UTC
      currentLogFilename: ./logs/service-log.log
      archive: true
      archivedFileCount: 30
      archivedLogFilenamePattern: ./logs/service-log-%d.log.gz
      logFormat: "[%date{dd MMM yyyy HH:mm:ss}] [%thread] %-5level %logger - %message %n"
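
One detail I noticed: the logFormat above has no explicit throwable converter. As far as I know, Logback's PatternLayout appends the stack trace to the event automatically in that case, but for clarity the explicit form would be something like this (a sketch; it assumes Dropwizard hands logFormat through to Logback's PatternLayout):

      # Sketch: same pattern with the throwable converter (%ex) spelled out, so
      # the stack trace is unambiguously rendered as part of the same event.
      logFormat: "[%date{dd MMM yyyy HH:mm:ss}] [%thread] %-5level %logger - %message %n%ex"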