[00:02:35] (PS1) Legoktm: extdist: Drop pre-stretch support [puppet] - https://gerrit.wikimedia.org/r/560957
[02:51:31] PROBLEM - Check systemd state on ores1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:04:31] PROBLEM - ores_workers_running on ores1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[03:13:03] RECOVERY - Check systemd state on ores1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:13:27] RECOVERY - ores_workers_running on ores1001 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[04:16:46] Operations, Maps, Discovery-Search (Current work): Re-import OSM data at eqiad and codfw to temporarily fix current OSM replication issues. - https://phabricator.wikimedia.org/T239728 (Arjunaraoc) @Mathew.onipe @Gehel Thanks for completing this task, I am able to get my recent OSM edits reflect on w...
[10:06:43] !log elukey@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp3061.esams.wmnet
[10:06:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:46] !log powercycle cp3061 - mgmt serial console not showing a working tty, no ssh
[10:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:45] RECOVERY - Host cp3061 is UP: PING OK - Packet loss = 0%, RTA = 83.37 ms
[10:57:03] !log repool cp3061 T238305
[10:57:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:10] T238305: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305
[11:13:56] !log restarted wikibugs to fix phab irc notifications
[11:14:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:49] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (ema)
[11:50:29] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 31882376 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[11:52:15] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 55880 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:45:53] (PS1) Ladsgroup: Offboard Tim [puppet] - https://gerrit.wikimedia.org/r/560972
[12:47:47] (CR) jerkins-bot: [V: -1] Offboard Tim [puppet] - https://gerrit.wikimedia.org/r/560972 (owner: Ladsgroup)
[12:59:07] (CR) Peachey88: "Can I suggest a slightly more descriptive commit message, example might be something like "Off-boarding Tim Eulitz"" [puppet] - https://gerrit.wikimedia.org/r/560972 (owner: Ladsgroup)
[13:11:09] PROBLEM - MD RAID on ms-be2035 is CRITICAL: CRITICAL: State: degraded, Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[13:11:10] ACKNOWLEDGEMENT - MD RAID on ms-be2035 is CRITICAL: CRITICAL: State: degraded, Active: 2, Working: 2, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T241534 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[13:11:14] Operations, ops-codfw: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241534 (ops-monitoring-bot)
[13:22:24] (CR) MarcoAurelio: [C: -1] "See inline comments. I also agree with Peachey88. There's also an apparently easy way to offboard users documented at https://w.wiki/Ech v" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/560972 (owner: Ladsgroup)
[13:30:55] PROBLEM - HP RAID on ms-be2035 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:2 - Failed: 2I:4:1 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[13:30:57] ACKNOWLEDGEMENT - HP RAID on ms-be2035 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:2 - Failed: 2I:4:1 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T241535 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[13:31:01] Operations, ops-codfw: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241535 (ops-monitoring-bot)
[13:34:41] (PS3) Gehel: wdqs: use RecentChanges API for updates on all WDQS servers [puppet] - https://gerrit.wikimedia.org/r/560922 (https://phabricator.wikimedia.org/T241410)
[13:37:13] (CR) Gehel: [C: +2] wdqs: use RecentChanges API for updates on all WDQS servers [puppet] - https://gerrit.wikimedia.org/r/560922 (https://phabricator.wikimedia.org/T241410) (owner: Gehel)
[13:37:33] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241535 (Peachey88)
[14:04:20] PROBLEM - Device not healthy -SMART- on ms-be2035 is CRITICAL: cluster=swift device={cciss,0,cciss,1,cciss,10,cciss,11,cciss,12,cciss,13,cciss,2,cciss,3,cciss,4,cciss,5,cciss,6,cciss,7,cciss,8,cciss,9} instance=ms-be2035:9100 job=node site=codfw https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2035&var-datasource=codfw+prometheus/ops
[14:31:25] (PS2) Ladsgroup: Offboard Tim Eulitz [puppet] - https://gerrit.wikimedia.org/r/560972
[14:32:33] (CR) Ladsgroup: "> Patch Set 1: Code-Review-1" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/560972 (owner: Ladsgroup)
[15:18:34] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 47 probes of 510 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[15:24:26] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 28 probes of 510 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[17:47:59] Hey all - going to deploy a quick security fix for T241410 (https://gerrit.wikimedia.org/r/560978) in a few minutes.
[17:58:18] !log sbassett@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/EventBus/includes/EventFactory.php: Security fix for T241410 (duration: 00m 56s)
[17:58:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:02:49] (CR) MarcoAurelio: [C: +1] "Looks good to me now. Waiting on ldap/ops people to review and merge." [puppet] - https://gerrit.wikimedia.org/r/560972 (owner: Ladsgroup)
[18:48:47] Hi, can someone with +2 in integration/config merge https://gerrit.wikimedia.org/r/#/c/integration/config/+/560982/?
[18:49:39] Jayprakash12345: a cheque would help ;-)
[18:50:20] I'm not sure there's anyone right now though
[18:52:41] hauskatze: Ok, thank you! Maybe someone will merge it when they're online :)
[18:55:25] hashar was online a while ago I think
[18:55:33] you may be lucky :)
[19:03:26] (PS1) Gehel: wdqs: enable async_import on eqiad public cluster [puppet] - https://gerrit.wikimedia.org/r/560987 (https://phabricator.wikimedia.org/T241410)
[19:03:46] (PS2) Gehel: wdqs: enable async_import on eqiad public cluster [puppet] - https://gerrit.wikimedia.org/r/560987 (https://phabricator.wikimedia.org/T241410)
[19:05:44] (CR) Gehel: [C: +2] "PCC looks happy" [puppet] - https://gerrit.wikimedia.org/r/560987 (https://phabricator.wikimedia.org/T241410) (owner: Gehel)
[20:51:28] PROBLEM - traffic_server tls process restarted on cp1083 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqiad+prometheus/ops&var-instance=cp1083&var-layer=tls
[20:53:44] RECOVERY - WDQS high update lag on wdqs1007 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.154e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[22:29:35] !log repooling wdqs1007
[22:29:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log