[00:05:47] !log ryankemper@cumin2001 START - Cookbook sre.wdqs.data-transfer
[00:05:47] !log ryankemper@cumin2001 END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
[00:05:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:06:20] !log ryankemper@cumin2001 START - Cookbook sre.wdqs.data-transfer
[00:06:20] !log ryankemper@cumin2001 END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
[00:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:06:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:34] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[00:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:10:54] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[00:10:55] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[00:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:10:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:17] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[00:13:18] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[00:13:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:08] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[00:16:08] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[00:16:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:53] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[00:18:53] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[00:18:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:43] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[00:21:44] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[00:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:37] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[00:24:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:38:31] (PS7) Krinkle: Replace stringified class names with ::class [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 (https://phabricator.wikimedia.org/T251841) (owner: Reedy)
[00:38:57] (CR) Krinkle: [C: +2] Replace stringified class names with ::class [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 (https://phabricator.wikimedia.org/T251841) (owner: Reedy)
[00:39:47] (Merged) jenkins-bot: Replace stringified class names with ::class [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 (https://phabricator.wikimedia.org/T251841) (owner: Reedy)
[00:41:27] * Krinkle staging on mwdebug1001
[00:55:16] !log krinkle@deploy1001 Synchronized wmf-config/logging.php: I046868190b472 (duration: 01m 13s)
[00:55:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:50] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[02:09:12] (PS12) Bmansurov: Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230)
[03:53:07] Operations, Commons, MediaWiki-File-management, Thumbor, Traffic: Thumbnail rendering of complex SVG file leads to Error 500 or Error 429 instead of Error 408 - https://phabricator.wikimedia.org/T226318 (AntiCompositeNumber)
[03:53:15] Operations, Commons, MediaWiki-File-management, Thumbor, Traffic: Thumbnail rendering of complex SVG file leads to Error 500 or Error 429 instead of Error 408 - https://phabricator.wikimedia.org/T226318 (AntiCompositeNumber) rsvg-convert 2.40.16 did not process this file in >5 minutes, 2.48.4...
[04:03:53] Operations, Commons, MediaWiki-File-management, Thumbor, Traffic: Thumbnail rendering of complex SVG file leads to Error 500 or Error 429 instead of Error 408 - https://phabricator.wikimedia.org/T226318 (AntiCompositeNumber) rsvg-convert 2.40.16 processed this file at 1000px in 75 seconds, wh...
[04:56:02] (CR) BPirkle: [C: +1] "Looks good, approved for self-merge and deploy." [mediawiki-config] - https://gerrit.wikimedia.org/r/596538 (https://phabricator.wikimedia.org/T245170) (owner: Tim Starling)
[06:29:18] PROBLEM - Check systemd state on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:26] PROBLEM - MD RAID on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:30:00] PROBLEM - Check systemd state on ores1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:08] PROBLEM - Check size of conntrack table on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:30:30] PROBLEM - Check systemd state on ores1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:34] PROBLEM - ores uWSGI web app on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:31:40] PROBLEM - puppet last run on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:31:52] RECOVERY - Check systemd state on ores1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:42:26] PROBLEM - ores_workers_running on ores1006 is CRITICAL: PROCS CRITICAL: 2 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:42:30] RECOVERY - Check systemd state on ores1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:43:16] PROBLEM - ores_workers_running on ores1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:43:22] RECOVERY - Check size of conntrack table on ores1006 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:43:28] RECOVERY - puppet last run on ores1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:46:14] RECOVERY - ores_workers_running on ores1006 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:51:14] RECOVERY - MD RAID on ores1006 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:53:12] RECOVERY - Check systemd state on ores1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:54:36] RECOVERY - ores_workers_running on ores1003 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200516T0700)
[08:13:17] (CR) DannyS712: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/596726 (owner: Jforrester)
[08:34:57] Operations, ORES, Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (elukey) The issue happens from time to time, always at the same time. @Halfak is there any possible solution t...
[09:52:16] PROBLEM - PHP opcache health on mwdebug1001 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:15:22] RECOVERY - PHP opcache health on mwdebug1001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:43:42] (CR) Hashar: "With the last CI jobs moved to Docker containers, we barely rely on puppet anymore. I have done some cleanup a few months ago but the rema" [puppet] - https://gerrit.wikimedia.org/r/596687 (https://phabricator.wikimedia.org/T252190) (owner: Dzahn)
[15:16:38] !log krinkle@mc1020 Looking at why there are still over 2M echo:seen keys in redis main stash
[15:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:20] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[17:17:38] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 566 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:24:27] !log krinkle@mc1020 Prune old echo:seen: keys that have ttl:-1 from Redis main stash, ref T252945
[17:24:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:30] T252945: Avoid constant evictions on Redis main stash - https://phabricator.wikimedia.org/T252945
[17:29:24] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 47 probes of 566 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:34:55] Krinkle: nice!
[17:35:45] Krinkle: I still see some evictions now though :(
[17:35:52] elukey: on mc1020?
[17:36:15] ah nono aggregated
[17:36:28] we could even break down the graph per shard
[17:36:35] It is
[17:36:36] I think
[17:36:46] I also added breakdown by instance earlier today
[17:37:05] Ah I guess shard=instance
[17:37:46] ah ok if you specifically select one, I was thinking of an aggregated graph with all shards
[17:38:00] anyway, evictions are way less, good job :)
[17:39:16] it is also worth noting that memory usage is basically maxed out for almost all shards
[17:39:36] and this is the pre-condition to enter LRU mode
[17:39:39] + evictions
[17:41:29] elukey: yeah, I don't know what it's configured at but the constant 520M line for each shard is certainly suspicious
[17:41:35] that basically just tells me it's at the ceiling
[17:41:42] see T252945
[17:41:43] T252945: Avoid constant evictions on Redis main stash - https://phabricator.wikimedia.org/T252945
[17:42:48] maxmemory 500Mb
[17:42:48] maxmemory-policy volatile-lru
[17:43:00] this is from /etc/redis/tcp_6379.conf
[17:43:15] right
[17:43:31] elukey: hm.. but LRU stands for least-recently-used
[17:43:44] I know not to expect perfection in caching/lru
[17:44:00] but then how come it still hasn't gotten to the millions of unused no-ttl keys from months ago
[17:44:03] yes after redis maxes out memory, it starts evicting, following the LRU policy
[17:44:47] ahh interesting!
[17:44:49] "Evicts the least recently used keys out of all keys with an "expire" field set"
[17:44:58] :facepalm:
[17:45:02] this is the "volatile-lru"
[17:45:09] Right, that's not a bad policy
[17:45:17] if we used it correctly :P
[17:45:19] cool
[17:45:27] otherwise there is allkeys-lru
[17:45:44] so given that we very bravely gave everything a TTL
[17:45:56] and that 90% is used up by legacy Echo values from pre-2019
[17:46:04] that means it's basically only deleting stuff we want to keep
[17:46:06] great
[17:46:19] yep
[17:46:21] (CR) Andrew Bogott: [C: +2] nova-compute: set ceph nodes to use CPU features available on all cloudvirts [puppet] - https://gerrit.wikimedia.org/r/596762 (https://phabricator.wikimedia.org/T225320) (owner: Andrew Bogott)
[17:46:31] are those pre-2019 values droppable?
[17:46:38] see task
[17:46:38] yes
[17:46:39] (PS1) Andrew Bogott: Move cloudvirt-wdqs hosts off of ceph [puppet] - https://gerrit.wikimedia.org/r/596816 (https://phabricator.wikimedia.org/T252784)
[17:46:54] in 2019, Echo was fixed to give its keys a ttl of 1 year
[17:47:04] also, later that year it was migrated to echoseen-kask
[17:47:10] so it's not even reading/writing these at all anymore afaik
[17:47:17] but I'm not willing to make that call on a saturday
[17:47:26] ah ok we are waiting for confirmation, great
[17:47:29] yes yes :)
[17:47:35] wise choice
[17:47:36] but I am dropping the no-ttl ones at least
[17:47:55] which on the shards I looked at is most or all of the echo:seen keys, so it has definitely turned over everything else at least once since October
[17:48:09] (CR) Andrew Bogott: [C: +2] Move cloudvirt-wdqs hosts off of ceph [puppet] - https://gerrit.wikimedia.org/r/596816 (https://phabricator.wikimedia.org/T252784) (owner: Andrew Bogott)
[17:48:10] but I'm keeping the extra ttl check just in case
[17:48:41] s/everything/anything volatile/
[17:49:08] super, really interesting discovery
[17:49:21] going afk, have a good (rest of) the weekend :)
[17:49:25] !log krinkle@mwmaint1002: Running cleanupRemovedModules.php to prune old module_deps rows T113916
[17:49:26] thanks
[17:49:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:28] T113916: Redesign ResourceLoader's file dependency tracking (module_deps) - https://phabricator.wikimedia.org/T113916
[17:51:28] Operations, Pybal, Traffic, Patch-For-Review: Upgrade pybal-test instances to stretch - https://phabricator.wikimedia.org/T190993 (Aklapper)
[17:53:19] Operations, Pybal, Traffic: Upgrade pybal-test instances to stretch - https://phabricator.wikimedia.org/T190993 (Aklapper) Stalled→Resolved a: Vgutierrez >>! In T190993#4126816, @Vgutierrez wrote: > Let's keep pybal-test2001 as jessie till we don't have any LVS on production running jessie...
[17:53:22] Operations, Pybal, Traffic, Patch-For-Review: Upgrade LVS servers to stretch - https://phabricator.wikimedia.org/T177961 (Aklapper)
[17:53:32] Operations, Pybal, Traffic: Upgrade pybal-test instances to stretch - https://phabricator.wikimedia.org/T190993 (Aklapper)
[17:55:04] Operations, JavaScript: Instability on fr.wikiversity project - https://phabricator.wikimedia.org/T112069 (Aklapper)
[17:56:18] !log krinkle@mc1023 Pruning old echo:seen: Redis keys that didn't use a ttl yet, ref T252945
[17:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:21] T252945: Avoid constant evictions on Redis main stash - https://phabricator.wikimedia.org/T252945
[18:24:03] !log krinkle@mc1025 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
[18:24:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:07] T252945: Avoid constant evictions on Redis main stash - https://phabricator.wikimedia.org/T252945
[18:30:07] !log krinkle@mc1024 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
[18:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:11] T252945: Avoid constant evictions on Redis main stash - https://phabricator.wikimedia.org/T252945
[18:48:07] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544 (Aklapper) Stalled→Open
[18:54:55] !log krinkle@mc1026 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
[18:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:59] T252945: Avoid constant evictions on Redis main stash - https://phabricator.wikimedia.org/T252945
[18:58:50] !log krinkle@mc1027 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
[18:58:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:22] !log krinkle@mc1028 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[19:10:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:25:56] !log krinkle@mc1029 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[19:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:07] !log krinkle@mc1030 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[19:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:11] !log krinkle@mc1031 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[19:51:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:12] !log krinkle@mc1032 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[19:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:54] !log krinkle@mc1033 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[20:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
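The volatile-lru behaviour uncovered in the conversation above (eviction only ever considers keys that carry an "expire", so no-TTL keys survive indefinitely while keys we actually want are evicted) can be illustrated with a small self-contained simulation. This is a toy model for illustration only, not Redis internals; the class name, `max_keys` stand-in for `maxmemory`, and the example keys are all invented:

```python
class VolatileLruStore:
    """Toy model of Redis maxmemory-policy volatile-lru.

    Eviction candidates are only the "volatile" keys (those with a TTL
    set); keys stored without a TTL are never reclaimed, which is why
    the pre-2019 echo:seen entries stuck around.
    """

    def __init__(self, max_keys):
        self.max_keys = max_keys   # stand-in for the maxmemory limit
        self.data = {}             # key -> value
        self.ttl = {}              # key -> ttl seconds (absent = no TTL)
        self.last_used = {}        # key -> logical clock of last write
        self.clock = 0

    def set(self, key, value, ttl=None):
        self.clock += 1
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict()
        self.data[key] = value
        self.last_used[key] = self.clock
        if ttl is not None:
            self.ttl[key] = ttl

    def _evict(self):
        # Only keys with an "expire" field set are eviction candidates.
        candidates = [k for k in self.data if k in self.ttl]
        if not candidates:
            return  # nothing volatile to evict
        victim = min(candidates, key=lambda k: self.last_used[k])
        del self.data[victim], self.ttl[victim], self.last_used[victim]


store = VolatileLruStore(max_keys=3)
store.set("echo:seen:legacy", "x")        # no TTL: never a candidate
store.set("session:a", "y", ttl=3600)
store.set("session:b", "z", ttl=3600)
store.set("session:c", "w", ttl=3600)     # at capacity: evicts session:a
assert "echo:seen:legacy" in store.data   # the stale key survived
assert "session:a" not in store.data      # a live key was evicted instead
```

With the store full of no-TTL keys, the LRU machinery only ever churns through the volatile keys, matching the "it's basically only deleting stuff we want to keep" observation.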
[20:23:53] !log krinkle@mc1034,mc1035,mc1036 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[20:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:58:56] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 67 probes of 566 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:04:52] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 46 probes of 566 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:14:47] Operations, DBA, Wikidata, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Fix Wikidata dispatch - https://phabricator.wikimedia.org/T252952 (Addshore) Dispatching is slowing down, as the db server is lagged and the dispatch process has a waitForReplication call in the code. Specifically this...
[21:16:31] Operations, DBA, Wikidata, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag - https://phabricator.wikimedia.org/T252952 (Addshore)
[21:21:29] Operations, DBA, Wikidata, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag - https://phabricator.wikimedia.org/T252952 (Ladsgroup) I contacted @Marostegui
[21:41:31] Operations, DBA, Wikidata, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag - https://phabricator.wikimedia.org/T252952 (Addshore) p: Unbreak!→High
[21:43:25] Operations, DBA, Wikidata, Patch-For-Review, and 2 others: Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag - https://phabricator.wikimedia.org/T252952 (Addshore) Open→Resolved a: Addshore Marking as resolved as the impact on wikidata is now gone > 1...
[21:46:06] Operations, DBA, Wikidata, Patch-For-Review, and 2 others: Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag - https://phabricator.wikimedia.org/T252952 (jcrespo) {P11212}
[21:46:20] Operations, DBA, Wikidata, Patch-For-Review, and 2 others: Wikidata dispatching slow and maxlag high on Wikidata due to db1101 replication lag - https://phabricator.wikimedia.org/T252952 (Marostegui) Some more details. There were a few long running queries ` | 669154401 | wikiuser | 10.64...
[21:46:40] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:50:05] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:56:36] !log krinkle@mc1019 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[21:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:53] !log krinkle@mc1022 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
[22:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:58:19] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:00:13] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:33:03] (PS1) Krinkle: contint: Remove mention of unused global agent script [puppet] - https://gerrit.wikimedia.org/r/596833 (https://phabricator.wikimedia.org/T252955)
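The repeated per-shard prune runs logged above (delete `echo:seen:` keys whose TTL is -1, i.e. no expiry set, while keeping the extra TTL check "just in case") follow a SCAN-style pattern. A minimal sketch of that logic, run against a plain in-memory stand-in rather than a live Redis; the function names, batch size, and sample keys are all invented for illustration:

```python
# Sketch of the prune logic from the !log entries: walk the keyspace in
# batches (like a cursor-based Redis SCAN), and delete echo:seen:* keys
# whose TTL is -1 (no expiry set). Plain dicts stand in for redis here.

def scan_batches(keys, batch_size=100):
    """Yield keys in fixed-size batches, mimicking cursor-based SCAN."""
    keys = list(keys)
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]


def prune_no_ttl(store, ttls, prefix="echo:seen:"):
    """Delete keys matching the prefix that have no TTL (ttl == -1)."""
    deleted = 0
    for batch in scan_batches(list(store)):
        for key in batch:
            # The extra TTL check from the log: only drop keys with ttl -1,
            # so entries that already carry an expiry are left alone.
            if key.startswith(prefix) and ttls.get(key, -1) == -1:
                del store[key]
                deleted += 1
    return deleted


store = {"echo:seen:1": "a", "echo:seen:2": "b", "other:1": "c"}
ttls = {"echo:seen:2": 31536000}   # one-year TTL, as Echo sets since 2019
assert prune_no_ttl(store, ttls) == 1
assert set(store) == {"echo:seen:2", "other:1"}
```

Only the legacy no-TTL key is removed; keys with an expiry (which volatile-lru can already reclaim) and unrelated keys are untouched, matching the conservative approach taken in the log.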