I have 3 etcd nodes on VMs (not k8s).
The nodes were alive but could not see each other: the health check failed with a "connection timeout" error. Yet every node individually reported an "alive" status, and Zabbix with the "etcd by http" template did not generate any alerts.
Is there any way to check node visibility and to monitor it using Zabbix?
Depending on the version you run, here's an example of how to do this with 3.5.2:
Command:
ETCDCTL_API=3 ./bin/etcdctl endpoint status --cluster -w table --endpoints="member1.etcd:2384,member2.etcd:2384,member3.etcd:2384"
Output:
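+--------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |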
+--------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://member1.etcd:2384 | 17ef476d9d7fec5f | 3.5.2 | 1.5 MB | false | false | 7 | 20033 | 20033 | |
| http://member2.etcd:2384 | 31e0ca30ec3c9d94 | 3.5.2 | 1.5 MB | false | false | 7 | 20033 | 20033 | |
| http://member3.etcd:2384 | 721948abbb0522bd | 3.5.2 | 1.5 MB | false | false | 7 | 20033 | 20033 | |
+--------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
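To turn a check like this into something a monitoring agent can alert on, one option is to parse the JSON form of the related health command (`etcdctl endpoint health --cluster -w json`) and report the endpoints that failed. The sketch below is only an illustration: it assumes the JSON is a list of objects with `endpoint`, `health` and optional `error` fields, as I understand recent 3.4/3.5 releases to print, and the function name is hypothetical. Verify the field names against your own etcdctl output before relying on it.

```python
import json


def unhealthy_endpoints(health_json):
    """Return the endpoints whose health check did not succeed.

    health_json is assumed to be the text printed by
    `etcdctl endpoint health --cluster -w json` (assumed shape:
    a JSON list of {"endpoint": ..., "health": true/false, ...}).
    """
    results = json.loads(health_json)
    # Treat a missing "health" field as unhealthy, to fail loudly.
    return [r.get("endpoint", "?") for r in results if not r.get("health", False)]


# Hand-written sample in the assumed shape, not real cluster output.
sample = json.dumps([
    {"endpoint": "http://member1.etcd:2384", "health": True, "took": "3ms"},
    {"endpoint": "http://member2.etcd:2384", "health": False,
     "took": "5s", "error": "context deadline exceeded"},
])
print(unhealthy_endpoints(sample))
```

A Zabbix item could run a script like this on each node and alert when the returned list is not empty; how that is wired in (for example as a user parameter) depends on your Zabbix setup.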
I'm trying to model a rule for Perseo CEP using Esper time intervals (timer:interval), like the one documented at http://esper.espertech.com/release-6.1.0/esper-reference/html/event_patterns.html#pattern-atoms.
In particular, this example, which should fire every 20 seconds, is never executed:
{
    "name": "timer_interval_update",
    "text": "select *,\"timer_interval_update\" as ruleName from pattern [every timer:interval(20 sec)]",
    "action": {
        "type": "update",
        "parameters": {
            "id": "${id}",
            "type": "A",
            "attributes": [
                {
                    "name": "status",
                    "value": "open",
                    "type": "text"
                }
            ]
        }
    }
}
In the perseo-core log I'm receiving:
time=2018-07-31T19:17:11.821Z | lvl=INFO | from= | corr=n/a | trans=n/a | srv=n/a | subsrv=n/a | op=update | comp=perseo-core | msg=rule fired: WrapperEventBean [event=MapEventBean eventType=com.espertech.esper.event.map.MapEventType#7c584419] [properties={ruleName=timer_interval_update}]
time=2018-07-31T19:17:11.827Z | lvl=ERROR | from= | corr=n/a | trans=n/a | srv=n/a | subsrv=n/a | op=DoHTTPPost | comp=perseo-core | msg=exception IOException: java.net.SocketException: Unexpected end of file from server
time=2018-07-31T19:17:11.828Z | lvl=ERROR | from= | corr=n/a | trans=n/a | srv=n/a | subsrv=n/a | op=update | comp=perseo-core | msg=action post failed
I'm running a dockerized environment based on this docker-compose:
https://github.com/aguirrea/fiware-perseo-test.git
We are using IDAS, Orion, and MongoDB (Docker build) together with Cygnus in order to send data to CKAN and Cosmos.
We simulated 100 sensors that each send data every 3 minutes. This setup stopped working after 2 days. I checked both the IDAS and Orion logs and found the MongoDB errors below; neither component sends notifications any more.
IDAS log:
failed time=2016-05-25T11:30:13,852.191UTC | lvl=ERROR | comp=iota:Manager | op=checkIndexes | file=[140414053451808:admin_service.cc:148] | msg=Check configuration, error in checkIndexes DBException can't connect couldn't connect to server 172.17.0.2:27017 (172.17.0.2), connection attempt failed
time=2016-05-25T11:30:13,853.966UTC | lvl=ERROR | comp=iota:Manager | op=conn | file=[140414053451808:mongo_connection.cc:254] | msg=It has reached the maximum mongo pool
time=2016-05-25T11:30:13,853.993UTC | lvl=ERROR | comp=iota:Manager | op=conn | file=[140414053451808:mongo_connection.cc:258] | msg=create a new con
Orion log:
time=2016-05-25T11:30:04.948UTC | lvl=INFO | trans=N/A | srv=N/A | subsrv=N/A | from=N/A | function=main | comp=Orion | msg=contextBroker.cpp[1719]: Orion Context Broker is running
time=2016-05-25T11:30:04.964UTC | lvl=ERROR | trans=N/A | srv=N/A | subsrv=N/A | from=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[140]: Database Startup Error (cannot connect to mongo - doing 100 retries with a 1000 microsecond interval)
time=2016-05-25T11:30:05.969UTC | lvl=INFO | trans=N/A | srv=N/A | subsrv=N/A | from=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[205]: Successful connection to database
time=2016-05-25T11:30:05.970UTC | lvl=INFO | trans=N/A | srv=N/A | subsrv=N/A | from=N/A | function=setWriteConcern | comp=Orion | msg=connectionOperations.cpp[681]: Database Operation Successful (setWriteConcern: 1)
time=2016-05-25T11:30:05.970UTC | lvl=INFO | trans=N/A | srv=N/A | subsrv=N/A | from=N/A | function=getWriteConcern | comp=Orion | msg=connectionOperations.cpp[724]: Database Operation Successful (getWriteConcern)
Do you think this is related to the amount of data being sent to IDAS, and that MongoDB stopped because the maximum number of connections was exceeded?
Thanks
Orion shows the "cannot connect to mongo - doing 100 retries with a 1000 microsecond interval" error when the DB cannot be accessed, e.g. when the mongod server is down. I'm not an expert on IDAS, but I'd say that the "couldn't connect to server 172.17.0.2:27017 (172.17.0.2), connection attempt failed" error is pointing to the same cause.
Thus, the solution to the problem is to ensure that MongoDB is up and running and accessible from Orion and IDAS.
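A quick way to test the "accessible from Orion and IDAS" part is a plain TCP connection attempt from inside each container. The sketch below is a minimal illustration, not an Orion or IDAS tool: the function name is hypothetical, and the address in the usage note is simply the one that appears in the IDAS log above. A successful TCP connect only shows that something is listening on the port; it does not prove that mongod is healthy or below its connection limit.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

For example, `can_connect("172.17.0.2", 27017)` run from the Orion or IDAS container should return True while MongoDB is reachable.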
My issue is that even though I excluded the root user from audit logging, its activity is still being logged. Can anyone please help? Here is what I did, step by step.
[Step-1] Check the audit log variables.
mysql> SHOW VARIABLES LIKE 'audit_log%';
+-----------------------------+--------------+
| Variable_name | Value |
+-----------------------------+--------------+
| audit_log_buffer_size | 1048576 |
| audit_log_connection_policy | ALL |
| audit_log_current_session | ON |
| audit_log_exclude_accounts | |
| audit_log_file | audit.log |
| audit_log_flush | OFF |
| audit_log_format | OLD |
| audit_log_include_accounts | |
| audit_log_policy | ALL |
| audit_log_rotate_on_size | 0 |
| audit_log_statement_policy | ALL |
| audit_log_strategy | ASYNCHRONOUS |
+-----------------------------+--------------+
12 rows in set (0.00 sec)
[Step-2]
The following statements disable audit logging for the root account.
-- set audit_log_include_accounts to NULL
SET GLOBAL audit_log_include_accounts = NULL;
SET GLOBAL audit_log_exclude_accounts = 'root@%';
Note: I used root@% instead of root@localhost because this database server can be accessed from other IP addresses.
[Step-3] I ran the statement SELECT * FROM SSVR_AUDIT_LOG from a remote PC.
[Step-4] I checked the audit log on the DB server.
<AUDIT_RECORD TIMESTAMP="2016-04-22T03:49:11 UTC" RECORD_ID="593_2016-04-22T01:28:17" NAME="Query" CONNECTION_ID="6" STATUS="0" STATUS_CODE="0" USER="root[root] @ [162.16.22.48]" OS_LOGIN="" HOST="" IP="162.16.22.48" COMMAND_CLASS="show_create_table" SQLTEXT="SHOW CREATE TABLE `SSVR_AUDIT_LOG`"/>
<AUDIT_RECORD TIMESTAMP="2016-04-22T03:49:12 UTC" RECORD_ID="594_2016-04-22T01:28:17" NAME="Query" CONNECTION_ID="7" STATUS="0" STATUS_CODE="0" USER="root[root] @ [162.16.22.48]" OS_LOGIN="" HOST="" IP="162.16.22.48" COMMAND_CLASS="select" SQLTEXT="SELECT * FROM `SSVR_AUDIT_LOG` LIMIT 0, 1000"/>
<AUDIT_RECORD TIMESTAMP="2016-04-22T03:49:12 UTC" RECORD_ID="595_2016-04-22T01:28:17" NAME="Query" CONNECTION_ID="7" STATUS="0" STATUS_CODE="0" USER="root[root] @ [162.16.22.48]" OS_LOGIN="" HOST="" IP="162.16.22.48" COMMAND_CLASS="show_fields" SQLTEXT="SHOW COLUMNS FROM `tldssvr`.`SSVR_AUDIT_LOG`"/>
<AUDIT_RECORD TIMESTAMP="2016-04-22T03:49:13 UTC" RECORD_ID="596_2016-04-22T01:28:17" NAME="Quit" CONNECTION_ID="7" STATUS="0" STATUS_CODE="0" USER="root" OS_LOGIN="" HOST="" IP="162.16.22.48" COMMAND_CLASS="connect"/>
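To check whether the exclusion took effect, it can help to count how many records each user still produces in the audit log. The following is a minimal sketch under some assumptions: the log uses the one-record-per-line OLD XML format shown above, the user name is the part of the USER attribute before the first "[", and the function name and sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_records_by_user(lines):
    """Count audit records per user name in OLD-format audit log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip anything that is not a single self-closing record.
        if not line.startswith("<AUDIT_RECORD"):
            continue
        record = ET.fromstring(line)
        user = record.get("USER", "")
        # "root[root] @ [162.16.22.48]" and plain "root" both yield "root".
        counts[user.split("[", 1)[0].strip()] += 1
    return counts


# Shortened, hand-written record in the same shape as the log above.
sample = [
    '<AUDIT_RECORD NAME="Query" USER="root[root] @ [162.16.22.48]" '
    'COMMAND_CLASS="select" SQLTEXT="SELECT * FROM `SSVR_AUDIT_LOG`"/>',
    '<AUDIT_RECORD NAME="Quit" USER="root" COMMAND_CLASS="connect"/>',
]
print(count_records_by_user(sample))
```

If the exclusion works, the count for the excluded user should stop growing after the setting is changed.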
I found the answer to my question. If you face the same issue, you can follow the steps below.
Audit Log Filtering by Account
List all ‘audit log’ configuration items
> mysql -u root -p
> SHOW VARIABLES LIKE 'audit_log%';
+-----------------------------+--------------+
| Variable_name | Value |
+-----------------------------+--------------+
| audit_log_buffer_size | 1048576 |
| audit_log_connection_policy | ALL |
| audit_log_current_session | OFF |
| audit_log_exclude_accounts | |
| audit_log_file | audit.log |
| audit_log_flush | OFF |
| audit_log_format | OLD |
| audit_log_include_accounts | |
| audit_log_policy | ALL |
| audit_log_rotate_on_size | 0 |
| audit_log_statement_policy | ALL |
| audit_log_strategy | ASYNCHRONOUS |
+-----------------------------+--------------+
Add the remote application server's host name and IP address to /etc/hosts on the database server.
> cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
162.16.22.48 App_PC
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Disable audit logging only for the application database user (root), for both its local-host and remote-host accounts.
> mysql -u root -p
> SET GLOBAL audit_log_include_accounts = NULL;
> SET GLOBAL audit_log_exclude_accounts = 'root@localhost,root@App_PC';
List all ‘audit log’ configuration items again and check the audit_log_exclude_accounts value.
> SHOW VARIABLES LIKE 'audit_log%';
+-----------------------------+----------------------------+
| Variable_name | Value |
+-----------------------------+----------------------------+
| audit_log_buffer_size | 1048576 |
| audit_log_connection_policy | ALL |
| audit_log_current_session | OFF |
| audit_log_exclude_accounts | root@localhost,root@App_PC |
| audit_log_file | audit.log |
| audit_log_flush | OFF |
| audit_log_format | OLD |
| audit_log_include_accounts | |
| audit_log_policy | ALL |
| audit_log_rotate_on_size | 0 |
| audit_log_statement_policy | ALL |
| audit_log_strategy | ASYNCHRONOUS |
+-----------------------------+----------------------------+
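When several application hosts are involved, the exclusion list can be generated instead of typed by hand. The sketch below is only an illustration with a hypothetical helper name. It assumes accounts are written as user@host and joined with commas, as in the statement above, and it rejects the % wildcard because the wildcard attempt in the question did not stop the logging.

```python
def build_exclude_statement(user, hosts):
    """Build a SET GLOBAL statement excluding user@host for each host."""
    for host in hosts:
        # Explicit hosts only: the wildcard form did not work in the question.
        if "%" in host or "'" in host or "," in host:
            raise ValueError(f"unsupported host value: {host!r}")
    accounts = ",".join(f"{user}@{host}" for host in hosts)
    return f"SET GLOBAL audit_log_exclude_accounts = '{accounts}';"


print(build_exclude_statement("root", ["localhost", "App_PC"]))
```

The printed statement can then be run from the mysql client in the same way as the manual one.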
The problem is that whenever I try to connect to the Context Broker, whether to update an entity or to read its values, it only responds OK when I ask a second time.
Context Broker version: 0.24.0 (I updated from 0.20.0 because I thought that was the problem)
Example:
python2.7 GetEntity.py entity_1
Output:
* Asking to http://188.***.**.***:1026/ngsi10/queryContext
* Headers: {'Fiware-Service': 'fiwarefinal', 'content-type': 'application/json', 'accept': 'application/json', 'X-Auth-Token': 'NULL'}
* Sending PAYLOAD:
{
    "entities": [
        {
            "type": "",
            "id": "entity_1",
            "isPattern": "false"
        }
    ],
    "attributes": []
}
...
Traceback (most recent call last):
  File "GetEntity.py", line 73, in <module>
    r = requests.post(URL, data=PAYLOAD, headers=HEADERS)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 109, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
When I do it a second time:
* Asking to http://188.***.**.***:1026/ngsi10/queryContext
* Headers: {'Fiware-Service': 'fiwarefinal', 'content-type': 'application/json', 'accept': 'application/json', 'X-Auth-Token': 'NULL'}
* Sending PAYLOAD:
{
    "entities": [
        {
            "type": "",
            "id": "entity_1",
            "isPattern": "false"
        }
    ],
    "attributes": []
}
...
* Status Code: 200
* Response:
{
  "contextResponses" : [
    {
      "contextElement" : {
        "type" : "",
        "isPattern" : "false",
        "id" : "entity_1",
        "attributes" : [
          {
            "name" : "humidity",
            "type" : "int",
            "value" : "4"
          },
          {
            "name" : "latitude",
            "type" : "float",
            "value" : "3"
          },
          {
            "name" : "longitude",
            "type" : "float",
            "value" : "5"
          },
          {
            "name" : "temperature",
            "type" : "int",
            "value" : "8"
          }
        ]
      },
      "statusCode" : {
        "code" : "200",
        "reasonPhrase" : "OK"
      }
    }
  ]
}
Example No. 2:
curl http://188.***.**.***:1026/version
Output:
curl: (56) Failure when receiving data from the peer
And when I do it a second time:
<orion>
<version>0.24.0</version>
<uptime>0 d, 0 h, 41 m, 27 s</uptime>
<git_hash><deleted for safety reasons></git_hash>
<compile_time>Mon Sep 14 17:52:44 CEST 2015</compile_time>
<compiled_by>fermin</compiled_by>
<compiled_in>centollo</compiled_in>
</orion>
Sometimes I even have to restart it, because checking the status of the Context Broker returns an error. I'm not sure of the exact message, but I think it's something like "the pid file doesn't exist".
EDIT:
EDIT 2:
So I tried using the script from Fiware-Figway:
python2.7 GetEntity.py entity_1
and, as expected given the problem, it worked only half of the time.
Full log trace for the Context Broker:
time=2015-10-01T11:31:09.269EDT | lvl=INFO | trans=N/A | function=main | comp=Orion | msg=contextBroker.cpp[1509]: Orion Context Broker is running
time=2015-10-01T11:31:09.280EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.280EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.281EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.282EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.282EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.283EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.283EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.284EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.284EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.285EDT | lvl=INFO | trans=N/A | function=mongoConnect | comp=Orion | msg=mongoConnectionPool.cpp[196]: Successful connection to database
time=2015-10-01T11:31:09.285EDT | lvl=INFO | trans=N/A | function=mongoInit | comp=Orion | msg=contextBroker.cpp[1289]: Connected to mongo at localhost:orion
time=2015-10-01T11:31:09.286EDT | lvl=INFO | trans=N/A | function=getOrionDatabases | comp=Orion | msg=MongoGlobal.cpp[283]: Database Operation Successful (listDatabases command)
time=2015-10-01T11:31:09.286EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:09.287EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:09.288EDT | lvl=INFO | trans=N/A | function=treatOnTimeIntervalSubscriptions | comp=Orion | msg=MongoGlobal.cpp[553]: Database Operation Successful ({ conditions.type: "ONTIMEINTERVAL" })
time=2015-10-01T11:31:09.288EDT | lvl=INFO | trans=N/A | function=main | comp=Orion | msg=contextBroker.cpp[1597]: Startup completed
time=2015-10-01T11:31:18.112EDT | lvl=INFO | trans=1443713469-276-00000000001 | function=connectionTreat | comp=Orion | msg=rest.cpp[910]: Starting transaction from 127.0.0.1:45867/version
time=2015-10-01T11:31:18.114EDT | lvl=INFO | trans=1443713469-276-00000000001 | function=requestCompleted | comp=Orion | msg=rest.cpp[442]: Transaction ended
time=2015-10-01T11:31:19.289EDT | lvl=INFO | trans=N/A | function=getOrionDatabases | comp=Orion | msg=MongoGlobal.cpp[283]: Database Operation Successful (listDatabases command)
time=2015-10-01T11:31:19.290EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:19.290EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:29.292EDT | lvl=INFO | trans=N/A | function=getOrionDatabases | comp=Orion | msg=MongoGlobal.cpp[283]: Database Operation Successful (listDatabases command)
time=2015-10-01T11:31:29.293EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:29.293EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:39.294EDT | lvl=INFO | trans=N/A | function=getOrionDatabases | comp=Orion | msg=MongoGlobal.cpp[283]: Database Operation Successful (listDatabases command)
time=2015-10-01T11:31:39.295EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:39.295EDT | lvl=INFO | trans=N/A | function=subscriptionsTreat | comp=Orion | msg=MongoGlobal.cpp[2911]: Database Operation Successful ({})
time=2015-10-01T11:31:39.427EDT | lvl=INFO | trans=1443713469-276-00000000002 | function=connectionTreat | comp=Orion | msg=rest.cpp[910]: Starting transaction from 188.***.**.***:58440/ngsi10/queryContext
time=2015-10-01T11:31:39.442EDT | lvl=INFO | trans=1443713469-276-00000000002 | function=entitiesQuery | comp=Orion | msg=MongoGlobal.cpp[1616]: Database Operation Successful ({ query: { $or: [ { _id.id: "entity_1" } ], _id.servicePath: { $in: [ null, /^$/, /^/.*/ ] } }, orderby: { creDate: 1 } })
time=2015-10-01T11:31:39.444EDT | lvl=INFO | trans=1443713469-276-00000000002 | function=registrationsQuery | comp=Orion | msg=MongoGlobal.cpp[2206]: Database Operation Successful ({ query: { $or: [ { contextRegistration.entities: { $in: [] } }, { contextRegistration.entities.id: { $in: [ "entity_1" ] } } ], expiration: { $gt: 1443713499 }, servicePath: { $in: [ null, /^$/, /^/.*/ ] } }, orderby: { _id: 1 } })
time=2015-10-01T11:31:39.453EDT | lvl=INFO | trans=1443713469-276-00000000002 | function=requestCompleted | comp=Orion | msg=rest.cpp[442]: Transaction ended
As mentioned in this post's answer, this problem is related to a RAM issue. My VM was in fact running low on memory after a lot of testing, and that made the Context Broker behave this way. I freed up memory and the issue is gone.
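To spot this condition before the Context Broker starts dropping connections, the VM's available memory can be checked periodically. The sketch below assumes a Linux VM where /proc/meminfo exists. The function names and the threshold are hypothetical, and the sample text is hand-written rather than taken from the affected machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def low_on_memory(text, min_available_kb):
    """Return True if available memory is below the given threshold."""
    info = parse_meminfo(text)
    # Older kernels lack MemAvailable, so fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return available < min_available_kb


# Hand-written sample; on a real VM, read the text from /proc/meminfo.
sample = (
    "MemTotal:        2048000 kB\n"
    "MemFree:           51200 kB\n"
    "MemAvailable:     102400 kB\n"
)
print(low_on_memory(sample, min_available_kb=200000))
```

On the VM itself, the text would come from `open("/proc/meminfo").read()`, with a threshold chosen for that machine.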