Any help is appreciated. Please let me know where I am going wrong!
I am running Loki and Grafana as two separate AWS ECS Fargate tasks, but my Loki container is failing and keeps restarting. The errors are shown in the following image:
My loki-config.yaml:
auth_enabled: true
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0 # Chunk transfers disabled
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  aws:
    s3: s3://XXXXX:YYYY@eu-west-1/logs-loki-test
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: aws
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
In the compactor block, change the shared_store value from aws to s3 so that it matches the shared_store under boltdb_shipper, then try again.
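To catch this kind of mismatch before deploying, a small check can compare the two shared_store values. This is only a sketch: the relevant keys from the question are hard-coded as Python dicts rather than parsed from the YAML file, and find_shared_store_mismatch is a hypothetical helper, not part of Loki.

```python
def find_shared_store_mismatch(config):
    """Return (shipper_value, compactor_value) if they differ, else None."""
    storage = config.get("storage_config", {})
    shipper = storage.get("boltdb_shipper", {}).get("shared_store")
    compactor = config.get("compactor", {}).get("shared_store")
    if shipper != compactor:
        return (shipper, compactor)
    return None


# Relevant keys from the question's loki-config.yaml (hard-coded for illustration).
question_config = {
    "storage_config": {"boltdb_shipper": {"shared_store": "s3"}},
    "compactor": {"shared_store": "aws"},
}

# The same keys after applying the suggested change.
fixed_config = {
    "storage_config": {"boltdb_shipper": {"shared_store": "s3"}},
    "compactor": {"shared_store": "s3"},
}

print(find_shared_store_mismatch(question_config))  # ('s3', 'aws')
print(find_shared_store_mismatch(fixed_config))     # None
```

The check only tells you that the two values disagree. Whether s3 is accepted by your Loki version is something to confirm against its documentation.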
I have Puppet code that is supposed to create one Galera cluster containing two nodes, but instead it creates two clusters with one node each.
The two nodes are named testbox1 and testbox2.
The following is my ./hiera/role/testbox.yaml:
---
classes:
  - '::galera'
selinux::mode: 'permissive'
yum::repos::enabled:
  - percona-x86_64
yum::repos:
  contrail-3.2.1-mitaka:
    enabled: 0
packages:
  - 'Percona-XtraDB-Cluster-shared-compat-57'
  - 'Percona-Server-selinux-56'
galera::configure_repo: false
galera::package_ensure: 'present'
galera::galera_package_ensure: 'absent'
galera::galera_package_name: 'Percona-XtraDB-Cluster-galera-3'
galera::client_package_name: 'Percona-XtraDB-Cluster-client-57'
galera::mysql_package_name: 'Percona-XtraDB-Cluster-server-57'
galera::bootstrap_command: 'systemctl start mysql@bootstrap.service'
galera::mysql_service_name: 'mysql'
mysql::server_service_name: 'mysql'
galera::service_enabled: true
galera::mysql_restart: true
galera::configure_firewall: false
mysql::server::purge_conf_dir: true
galera::purge_conf_dir: true
galera::grep_binary: '/bin/grep'
galera::mysql_binary: '/usr/bin/mysql'
galera::rundir: '/var/run/mysqld'
galera::socket: '/var/lib/mysql/mysql.sock'
galera::create_root_user: true
galera::create_root_my_cnf: true
galera::create_status_user: true
galera::status_check: true
galera::galera_servers: ['testbox-1', 'testbox-2']
galera::galera_master: 'testbox-1'
galera::status_password: 'bla'
galera::bind_address: '0.0.0.0'
galera::override_options:
  mysqld:
    pxc_strict_mode: 'ENFORCING'
    wsrep_provider: '/usr/lib64/galera3/libgalera_smm.so'
    wsrep_slave_threads: 8
    wsrep_sst_method: 'rsync'
    wsrep_cluster_name: 'grafana-galera-cluster'
    wsrep_node_address: "%{ipaddress}"
    wsrep_node_name: "%{hostname}"
    wsrep_sst_auth: "sstuser:%{hiera('galera::sstuser_password')}"
    binlog_format: 'ROW'
    default_storage_engine: 'InnoDB'
    innodb_locks_unsafe_for_binlog: 1
    innodb_autoinc_lock_mode: 2
    innodb_buffer_pool_size: '40000M'
    innodb_log_file_size: '100M'
    query_cache_size: 0
    query_cache_type: 0
    datadir: '/var/lib/mysql'
    socket: '/var/lib/mysql/mysql.sock'
    log-error: '/var/log/mysqld.log'
    pid-file: '/var/run/mysql/mysql.pid'
    max_connections: '10000'
    max_connect_errors: '10000000'
  mysqld_safe:
    log-error: '/var/log/mysqld.log'
galera::status_user: 'clustercheck'
galera::status_allow: '%'
galera::status_available_when_donor: 0
galera::status_available_when_readonly: -1
galera::status_host: 'localhost'
galera::status_log_on_success: ''
galera::status_log_on_success_operator: '='
galera::status_port: 9200
galera::validate::action: 'select count(1);'
galera::validate::catch: 'ERROR'
galera::validate::delay: 3
galera::validate::inv_catch: undef
galera::validate::retries: 20
I am using the fraenki/galera module.
With this code, I end up with testbox1 in one cluster and testbox2 in another, instead of both in the same cluster. After troubleshooting, I believe my issue is related to jira.percona.com/browse/PXC-2258: the Puppet code creates a wsrep.cnf that has no value for wsrep_cluster_address, and this overrides /etc/my.cnf.d/server.cnf, which has the right value. I know how to fix this manually by deleting wsrep.cnf, but I would like Puppet to handle it for me, and I do not know how.
Puppet version: 3.8.7 (open source). I cannot upgrade it.
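The override described above can be illustrated with a minimal sketch. It uses Python's configparser as a stand-in for MySQL's option-file handling, which is an approximation, and it assumes wsrep.cnf is read after server.cnf. The gcomm:// address and both file bodies are hypothetical examples, not copied from my servers.

```python
import configparser

# Hypothetical file contents; the gcomm:// address is an assumed example value.
SERVER_CNF = """
[mysqld]
wsrep_cluster_address = gcomm://testbox-1,testbox-2
"""

WSREP_CNF = """
[mysqld]
wsrep_cluster_address =
"""


def effective_cluster_address(*file_contents):
    """Apply option files in order; a later file overrides an earlier one."""
    parser = configparser.ConfigParser()
    for text in file_contents:
        parser.read_string(text)
    return parser.get("mysqld", "wsrep_cluster_address")


with_wsrep = effective_cluster_address(SERVER_CNF, WSREP_CNF)
without_wsrep = effective_cluster_address(SERVER_CNF)

print(repr(with_wsrep))     # '' (the empty value from wsrep.cnf wins)
print(repr(without_wsrep))  # 'gcomm://testbox-1,testbox-2'
```

With the empty value winning, each node has no cluster address to join, which matches what I see: one cluster per node.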
mysql@bootstrap needs to be executed on only one node. The other node does a normal start and will then SST from the first node.
With only two nodes you will have trouble keeping a quorum, so the setup is unworkable as an HA system.
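The quorum point can be shown with a little arithmetic. This is a minimal sketch that assumes every node has equal weight, that no arbitrator such as garbd is in use, and that nodes disappear unexpectedly (crash or network partition) rather than leaving gracefully. The function names are made up for the illustration.

```python
def has_quorum(reachable_nodes, cluster_size):
    """A partition keeps quorum only with a strict majority of the nodes."""
    return reachable_nodes * 2 > cluster_size


def tolerated_failures(cluster_size):
    """Largest number of nodes that can fail while the rest keep quorum."""
    return (cluster_size - 1) // 2


for size in (2, 3, 5):
    # Columns: cluster size, tolerated failures, quorum kept after losing one node
    print(size, tolerated_failures(size), has_quorum(size - 1, size))
# 2 0 False
# 3 1 True
# 5 2 True
```

With two nodes, the survivor of a failure holds exactly half of the cluster, which is not a majority, so it stops accepting writes. Three nodes is the smallest size that survives the loss of one.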
We have a Magento website.
Sometimes the site shows the error below:
There has been an error processing your request
Exception printing is disabled by default for security reasons.
Error log record number: 855613014442
Based on our logs, MySQL is going down, as shown below:
2019-06-24T04:44:49.542168Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.26' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)
2019-06-24T04:44:50.594943Z 0 [Note] InnoDB: Buffer pool(s) load completed at 190624 4:44:50
2019-06-24T04:45:11.103402Z 0 [Note] Giving 0 client threads a chance to die gracefully
2019-06-24T04:45:11.103429Z 0 [Note] Shutting down slave threads
2019-06-24T04:45:11.103438Z 0 [Note] Forcefully disconnecting 0 remaining clients
2019-06-24T04:45:11.103444Z 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-06-24T04:45:11.103484Z 0 [Note] Binlog end
We have increased innodb_buffer_pool_size, but I am still facing the same issue.
I ran the commands below on my server. Here are their outputs:
1) free -m
Output:
              total        used        free      shared  buff/cache   available
Mem:           7819        1430        4688          81        1701        6009
Swap:             0           0           0
2) dmesg | tail -30
Output:
[ 6.222373] [TTM] Initializing pool allocator
[ 6.241079] [TTM] Initializing DMA pool allocator
[ 6.255768] [drm] fb mappable at 0xF0000000
[ 6.259225] [drm] vram aper at 0xF0000000
[ 6.262574] [drm] size 33554432
[ 6.265475] [drm] fb depth is 24
[ 6.268473] [drm] pitch is 3072
[ 6.289079] fbcon: cirrusdrmfb (fb0) is primary device
[ 6.346169] Console: switching to colour frame buffer device 128x48
[ 6.347151] loop: module loaded
[ 6.357709] cirrus 0000:00:02.0: fb0: cirrusdrmfb frame buffer device
[ 6.364646] [drm] Initialized cirrus 1.0.0 20110418 for 0000:00:02.0 on minor 0
[ 6.722341] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 6.788110] EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
[ 6.802845] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[ 6.841332] cryptd: max_cpu_qlen set to 1000
[ 6.871200] AVX2 version of gcm_enc/dec engaged.
[ 6.873349] AES CTR mode by8 optimization enabled
[ 6.936609] EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
[ 6.949717] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[ 6.964446] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[ 6.984659] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni)
[ 7.084148] intel_rapl: Found RAPL domain package
[ 7.086591] intel_rapl: Found RAPL domain dram
[ 7.088788] intel_rapl: DRAM domain energy unit 15300pj
[ 7.102115] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[ 7.102119] EDAC sbridge: Ver: 1.1.2
[ 7.175339] ppdev: user-space parallel port driver
[ 10.728980] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 10.772307] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
3) ps auxw | grep mysql
Output:
mysql 5056 2.9 10.8 7009056 871240 ? Sl 12:29 0:12 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
root 5538 0.0 0.0 112708 976 pts/0 S+ 12:36 0:00 grep --color=auto mysql
Does anyone have an idea how to resolve this issue?
Thanks
We are testing Galera crashes to reproduce a production issue that occurs while deleting records. The version is 10.1.32.
Can someone please help us with scenarios that crash a Galera cluster while deleting records? When does it crash? Is it related to foreign keys or to 'wsrep_slave_threads'?
Any help is appreciated.
**Below is the actual production log from when the cluster failed:**
2018-08-16 16:24:27 140366938635008 [ERROR] Slave SQL: Error 'Unknown table 'DB_TOTAL4DEV_P.users_last_login_2'' on query. Default database: ''. Query: 'drop table `DB_TOTAL4DEV_P`.`users_last_login_2`', Internal MariaDB error code: 1051
2018-08-16 16:24:27 140366938635008 [Warning] WSREP: RBR event 1 Query apply warning: 1, 309050572
2018-08-16 16:24:27 140366938635008 [Warning] WSREP: Ignoring error for TO isolated action: source: 4533077b-9f57-11e8-94aa-6b070f22a4b5 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 492382 trx_id: -1 seqnos (l: 805550, g: 309050572, s: 309050571, d: 309050571, ts: 223731321595519)
2018-08-16 16:24:28 140366938635008 [ERROR] Slave SQL: Error 'Unknown table 'DB_TOTAL4DEV_P.users_last_login_2'' on query. Default database: ''. Query: 'drop table `DB_TOTAL4DEV_P`.`users_last_login_2`', Internal MariaDB error code: 1051
2018-08-16 16:24:28 140366938635008 [Warning] WSREP: RBR event 1 Query apply warning: 1, 309050575
2018-08-16 16:24:28 140366938635008 [Warning] WSREP: Ignoring error for TO isolated action: source: 56daeaa4-9e8e-11e8-b891-6f203bac329a version: 3 local: 0 state: APPLYING flags: 65 conn_id: 708893 trx_id: -1 seqnos (l: 805553, g: 309050575, s: 309050574, d: 309050574, ts: 310032366214700)
2018-08-16 16:28:45 140366938635008 [ERROR] mysqld: Can't find record in 'users_last_login'
2018-08-16 16:28:45 140366938635008 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB_TOTAL4DEV_P.users_last_login; Can't find record in 'users_last_login', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1862, Internal MariaDB error code: 1032
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 309051933
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: Failed to apply app buffer: seqno: 309051933, status: 1
2018-08-16 16:28:45 140366938635008 [ERROR] WSREP: Failed to apply trx: source: 808e3422-9dcd-11e8-bbc0-3ab028afd1b7 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 893106 trx_id: 15068161908 seqnos (l: 806931, g: 309051933, s: 309051932, d: 309051868, ts: 393116532516870)
2018-08-16 16:28:45 140366938635008 [ERROR] WSREP: Failed to apply trx 309051933 4 times
2018-08-16 16:28:45 140366938635008 [ERROR] WSREP: Node consistency compromised, aborting...
For the last 4 days we have been facing frequent database crashes with the MySQL Infobright engine. There have been no recent changes to the production environment and no updates.
We are currently using version 5.1.40.
The dump is below. Can anyone help figure out the issue?
170520 21:12:08 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=1677721600
read_buffer_size=1048576
max_used_connections=75
max_threads=1000
threads_connected=54
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3696548 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0xc2a4bd000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fc0d0bede58 thread_stack 0x80000
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0xaef849]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x412e13]
/lib64/libpthread.so.0(+0xf7e0) [0x7fc0d48c77e0]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0xb10635]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0xb1f123]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x9a9693]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x76ae0c]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x76b594]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x767ab3]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x7694ea]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x72902b]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x422325]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x427573]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x42b38c]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x42c227]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x42cb05]
/usr/local/infobright-4.7.1-x86_64/bin/mysqld() [0x41f06d]
/lib64/libpthread.so.0(+0x7aa1) [0x7fc0d48bfaa1]
/lib64/libc.so.6(clone+0x6d) [0x7fc0d460caad]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0xc2da5e410 = SELECT DATE_FORMAT(DATETIME,'%Y%m%d') AS YEAR_MONTH_DAY_SK ,HOUR(DATETIME) AS HOUR_SK, IFNULL(DESTINATION,'0') AS DESTINATION, IFNULL(DATETIME,'1970-01-01 00:00:00') AS DATETIME, IFNULL(CLIENTID,'0') AS CLIENTID, IFNULL(GROUPID,'0') AS GROUPID, IFNULL(TEAMID,'0') AS TEAMID, IFNULL(SERVICEID,'0') AS SERVICEID, IFNULL(CHANNELID,'0') AS CHANNELID, IFNULL(STATUSID,'0') AS STATUSID, CASE REASONCODE WHEN '' THEN NULL WHEN NULL THEN NULL ELSE REASONCODE END AS REASONCODE, CASE REASONDESC WHEN '' THEN NULL WHEN NULL THEN NULL ELSE REASONDESC END AS REASONDESC, IFNULL(ACTIONTYPE1ID,'0') AS ACTIONTYPE1ID, CASE ACTIONTYPE1DESC WHEN '' THEN NULL WHEN NULL THEN NULL ELSE ACTIONTYPE1DESC END AS ACTIONTYPE1DESC, IFNULL(ACTIONTYPE2ID,'0') AS ACTIONTYPE2ID, CASE ACTIONTYPE2DESC WHEN '' THEN NULL WHEN NULL THEN NULL ELSE ACTIONTYPE2DESC END AS ACTIONTYPE2DESC, IFNULL(ATTACHMENT,'0') AS ATTACHMENT, CASE MIMETYPE WHEN '' THEN NULL WHEN NULL THEN NULL ELSE MIMETYPE END AS MIMETYPE, CASE VOICEFLOWNAME WH
thd->thread_id=35918
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170520 21:12:08 mysqld_safe Number of processes running now: 0
170520 21:12:08 mysqld_safe mysqld restarted
tcmalloc: large alloc 1365172224 bytes == 0x4518000 @
Loading configuration for Infobright instance ...
Option: AllowMySQLQueryPath, value: 1.
Option: AutoConfigure, value: 0.
Option: CacheFolder, value: /usr/local/infobright-4.7.1-x86_64/cache.
Option: ControlMessages, value: 0.
Option: IBEngineRevision, value: IEE_4.7.1_r30553_31737.
Option: InternalMessages, value: 0.
Option: InternalMessagesFlushPeriod, value: 60.
Option: KNFolder, value: BH_RSI_Repository.
Option: KNLevel, value: 99.
Option: LicenseCheckInterval, value: 0.
Option: LicenseExpireWarningDays, value: 0.
Option: LicenseFile, value: <unknown>.
Option: LicenseServerIPAddr, value: .
Option: LicenseServerType, value: .
Option: LicenseServerWarningNumber, value: .
Option: LoaderMainHeapSize, value: 800.
Option: PushDown, value: 1.
Option: ServerMainHeapSize, value: 48000.
Option: UseMySQLImportExportDefaults, value: 0.
Option: bherrLogLevel, value: 1.
Infobright instance configuration loaded.
tcmalloc: large alloc 40265318400 bytes == 0x687c8000 @
tcmalloc: large alloc 10066329600 bytes == 0x9cff48000 @
170520 21:12:09 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
170520 21:12:09 [ERROR] Do you already have another mysqld server running on port: 5029 ?
170520 21:12:09 [ERROR] Aborting
170520 21:12:09 [Note] /usr/local/infobright-4.7.1-x86_64/bin/mysqld: Shutdown complete
170520 21:12:09 mysqld_safe mysqld from pid file /data/infobright/data/SH-UMP-CINFBRT2.pid ended