Mosquitto: Outgoing messages are being dropped - message-queue

First, I should say that this question is a refinement of "Mosquitto not propagating messages to AWS IoT using bridge configuration", so a lot of context and logs can be found in that question too. I decided to start a new one because I think I have found a real symptom of the actual problem, which I prefer to handle on its own to avoid confusion with other possible issues:
Mosquitto's log file (/var/log/mosquitto/mosquitto.log) was actually disabled, and the only available logs were in /var/log/syslog, but when we enabled it and ran cat mosquitto.log | grep bridge, some relevant messages appeared:
1. Bridge local.bridgeawsiot doing local SUBSCRIBE on topic #
This tells us that all topics are being bridged.
2. Connecting bridge awsiot (myEndpoint.iot.us-east-1.amazonaws.com:8883)
This tells us that it's using the correct endpoint.
3. Outgoing messages are being dropped for client local.bridgeawsiot.
This one worries me a lot, as I don't really know why it is happening, but it seems like a clear symptom of the problem. After a few searches I found:
"The message 'Outgoing messages are being dropped' is shown when the internal message queue becomes full." So I guess messages are just being enqueued but not actually sent to AWS IoT.
So my questions are:
Why are these messages being dropped?
If they are being dropped because the queue is full, then why aren't the queued messages being sent to the bridged endpoint?
Relevant info:
Version: 1.4.14-0mosquitto1~jessie2
OS: Debian GNU/Linux 9.1 (stretch)

Not sure if we have the same issue, but mine was resolved when I added this to mosquitto.conf:
max_queued_messages 0
From man mosquitto.conf:
The maximum number of QoS 1 or 2 messages to hold in the queue (per client) above those messages that are currently in flight. Defaults to 100. Set to 0 for no maximum (not recommended). See also the queue_qos0_messages and max_queued_bytes options. Reloaded on reload signal.
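For context, the relevant pieces of a bridge setup in mosquitto.conf would look roughly like this. This is only a sketch; the endpoint, certificate paths, and topic pattern below are illustrative placeholders, not values from the original question:

# Allow unlimited queued QoS 1/2 messages per client (0 = no maximum)
max_queued_messages 0

# Bridge section (substitute your own endpoint and certificate paths)
connection awsiot
address myEndpoint.iot.us-east-1.amazonaws.com:8883
topic # out 1
bridge_cafile /etc/mosquitto/certs/rootCA.pem
bridge_certfile /etc/mosquitto/certs/certificate.pem.crt
bridge_keyfile /etc/mosquitto/certs/private.pem.key

Note that max_queued_messages is a global option rather than a per-bridge one, so it goes outside the bridge section.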

Related

Mediasoup: Connection state changed to disconnected a few tens of seconds after connected

We use mediasoup to create our products. However, I am having problems with the transport connection.
The client transport connection state goes to "disconnected" a few tens of seconds after connecting.
The following log is output in the Chrome console:
mediasoup-client:Transport connection state changed to connected
However, a few tens of seconds later, the following log is output:
mediasoup-client:Transport connection state changed to disconnected
If a NewProducer arrives before the disconnection, the above does not happen.
Do you know the possible causes?
Resolved. I changed the AWS security policy according to the topic below and it worked.
You’ll also need to configure your AWS Security Group to allow TCP/UDP on whatever port range you’re using.
https://mediasoup.discourse.group/t/docker-setup-with-listenips/2557/4
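For reference, opening the RTC port range with the AWS CLI would look something like the following sketch; the security group ID and the 40000-49999 range are placeholders and should match whatever you configured in mediasoup's worker settings (rtcMinPort/rtcMaxPort):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 40000-49999 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 40000-49999 --cidr 0.0.0.0/0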

VerneMQ plugin_chain_exhausted Authentication MySQL

I have a running instance of VerneMQ (a cluster of 2 nodes) on Google Kubernetes, using MySQL (Cloud SQL) for auth. The server accepts connections over TLS.
It works fine, but after a few days I start seeing this message in the log:
can't authenticate client {[],<<"Client-id">>} from X.X.X.X:16609 due to plugin_chain_exhausted
The client app (Paho) complains that the server refused the connection for being "not authorized" (code=5 in the Paho error).
After a few retries it finally connects, but every time it gets harder and harder, until it just won't connect anymore.
If I restart VerneMQ, everything gets back to normal.
I have only 3 clients at most connected at the same time.
Clients that are already connected have no issues with pub/sub.
In my configuration I have (among other things):
log.console.level=debug
plugins.vmq_diversity=on
vmq_diversity.mysql.* = all of them set
allow_anonymous=off
vmq_diversity.auth_mysql.enabled=on
It's like the server degrades over time; the status web page reports no problems.
My VerneMQ server was built from the Git repository about a month ago and runs in a Docker container.
What could be the cause?
What else could I check to find possible causes? Maybe a vmq_diversity misconfiguration?
Thanks
To quickly explain the plugin_chain_exhausted log: with Verne you can run multiple authentication/authorization plugins, and they will be checked in a chain. If one plugin allows the client, it will be in. If no plugin allows the client, you'll see the log above.
This does not explain the behaviour you describe, though. I don't think I have seen that.
In any case, the first thing to check is whether you actually run multiple plugins. For instance: have you disabled the vmq.passwd and the vmq.acl plugins?
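If the goal is MySQL-only authentication, the relevant vernemq.conf lines would look roughly like this (a sketch, assuming a stock configuration where the file-based plugins may still be on):

# Disable the default file-based auth/ACL plugins so only vmq_diversity
# is consulted in the plugin chain
plugins.vmq_passwd = off
plugins.vmq_acl = off
plugins.vmq_diversity = on
vmq_diversity.auth_mysql.enabled = on

With the file-based plugins off, a plugin_chain_exhausted then means the MySQL lookup itself is failing or rejecting the client, which narrows down where to look.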

Terminating dataproc cluster with termination protection on instances produces red flag on cluster that never leaves; is cluster safe?

I need to give a Dataproc cluster protection like one can give an AWS EMR cluster. I saw that VM deletion protection is a thing (but I can't find anything about Dataproc cluster protection), so I decided to try that out.
I made a Dataproc cluster and turned on deletion protection for every instance in it.
As a test of the safety of this arrangement, I tried to delete the cluster from the command line. As a result, the cluster now has a permanent red flag on it. The message reads:
Invalid resource usage: 'Resource cannot be deleted if it's
protected against deletion.'.
My question is this: given the persistent error message, is the cluster still OK? Have I accomplished the cluster protection that I sought? As far as I can tell everything is still all right; I just wondered if anyone knows more about the state of the management of the cluster in the presence of this scary red exclamation mark.
While your cluster is probably fine, it is now in an error state and cannot be used for submitting jobs through the API, or for updates.
Dataproc does not currently support delete protection. You can file a feature request here: https://issuetracker.google.com/issues/new?component=187133&template=0
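In the meantime, if you ever do need to delete such a cluster, you would first have to clear deletion protection on each VM, for example with the gcloud CLI (the instance name and zone below are placeholders):

gcloud compute instances update my-cluster-w-0 --zone=us-central1-a --no-deletion-protection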

Jmeter: Getting "java.net.ConnectException: Connection timed out: connect" error

I am trying to hit 350 users, but JMeter fails the script saying "Connection timed out".
I have added the following:
http.connection.stalecheck$Boolean=true in the hc.parameters file
httpclient4.retrycount=1
hc.parameters.file=hc.parameters
Is there anything that I am missing?
This normally indicates a problem on the application under test side, so I would recommend checking the logs of your application for anything suspicious.
If everything seems to be fine there, check the logs of your web and database servers; for instance, Apache HTTP Server allows 150 connections by default, MySQL 100, etc., so you may need to identify whether you are suffering from this form of limit and what needs to be done to raise it.
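For example, raising those limits would look roughly like this; the file locations and values are illustrative, and the right numbers depend on your hardware and workload:

# Apache HTTP Server 2.4 (in the MPM configuration)
MaxRequestWorkers 500

# MySQL (in my.cnf, under the [mysqld] section)
max_connections = 500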
And finally, it may simply be a lack of CPU or free RAM on the application under test side, so next time you run your test keep an eye on baseline OS health metrics, as the application may respond slowly or even hang if it doesn't have any spare headroom. You can use the JMeter PerfMon plugin to integrate this form of monitoring with your load test.

Google Compute Engine: Internal DNS server and issues with the resolving

Since Google Compute Engine does not provide a configurable internal DNS, I created 2 CentOS BIND machines which do the resolving for the machines on GCE and forward the lookups over VPN to my private cloud, and vice versa.
As the Google Cloud help docs suggest, you can have this kind of scenario and edit resolv.conf on each instance to do the resolving.
What I did was edit ifcfg-eth0 to disable PEERDNS, and in /etc/resolv.conf I added the search domain and my 2 nameservers at the top.
Now, after an instance gets rebooted, it won't start again, because it is searching for the metadata.google.internal domain:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
What is the best practice in this kind of scenario?
Thanks
Also, I need the internal DNS to do poor man's round-robin failover, since GCE does not provide internal load balancers.
As mentioned at https://cloud.google.com/compute/docs/networking:
Each instance's metadata server acts as a DNS server. It stores the DNS entries for all network IP addresses in the local network and calls Google's public DNS server for entries outside the network. You cannot configure this DNS server, but you can set up your own DNS server if you like and configure your instances to use that server instead by editing the /etc/resolv.conf file.
So you should be able to just use 169.254.169.254 for your DNS server. If you need to define external DNS entries, you might like Cloud DNS. If you set up a domain with Cloud DNS, or any other DNS provider, the 169.254.169.254 resolver should find it.
If you need something more complex, such as custom internal DNS names, then your own BIND server might be the best solution. Just make sure that metadata.google.internal. resolves to 169.254.169.254.
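In the simple case, pointing an instance at the metadata resolver is just a matter of /etc/resolv.conf containing something like the following (the search domain is an example; GCE normally generates this for you):

search c.my-project.internal. google.internal.
nameserver 169.254.169.254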
OK, I just ran into this, but unfortunately there was no timeout after 30 minutes that got it working. Fortunately, nelasx had correctly diagnosed it and given the fix. I'm adding this to list the steps I had to take, based on his excellent question and commented answer; I've just pulled the info I had to gather together into one place, to get to a solution.
Symptoms: on startup of the Google instance, you get connection refused.
After inspecting the serial console output, you will see:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
You could try waiting (it didn't work for me), and this excerpt from https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google-startup-scripts/usr/share/google/onboot:
# Failed to resolve host or connect to host. Retry indefinitely.
6|7) sleep 1.0
log "Waiting for metadata server, attempt ${count}"
led me to believe that it never would.
So, the solution was to fiddle with the disk, to add in nelasx's fix:
"edit ifcfg-eth0 and change PEERDNS=no; edit /etc/resolv.conf and put your nameservers + search domain on top; edit /etc/hosts and add: 169.254.169.254 metadata.google.internal"
To do this:
Best to create a snapshot backup before you start in case it goes awry
Uncheck "Delete boot disk when instance is deleted" for your instance
Delete the instance
Create a micro instance
Mount the disk
sudo ls -l /dev/disk/by-id/* # this will give you the disk names, which include the instance name
sudo mkdir /mnt/new
sudo mount /dev/disk/by-id/scsi-0Google_PersistentDisk_instance-1-part1 /mnt/new
where instance-1 should be changed as per your setup.
Go in and edit the files as per nelasx's solution. Idiot trap I fell for: edit the files under the mount point; don't just sudo vi /etc/hosts, use /mnt/new/etc/hosts. That cost me 15 more minutes, as I had to go through the got-depressed, scratched-head, kicked-myself cycle.
Delete the debug instance, ensuring your attached disk's delete option is unchecked.
Create a new instance matching your original, with the edited disk as your boot disk, and fire it up.
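For concreteness, the three edits on the mounted disk would look roughly like this; the nameserver addresses and search domain are examples, so substitute your own:

# /mnt/new/etc/sysconfig/network-scripts/ifcfg-eth0
PEERDNS=no

# /mnt/new/etc/resolv.conf (your own nameservers on top, then the search domain)
nameserver 10.240.0.2
nameserver 10.240.0.3
search my.domain.internal

# /mnt/new/etc/hosts (the line that lets the startup scripts find the metadata server)
169.254.169.254 metadata.google.internal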