[00:25:33] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:59] <wikibugs>	 (PS1) Zoranzoki21: flaggedrevs.php: Enable autoreview for bots on bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/547901
[00:33:17] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:36:11] <wikibugs>	 (PS2) Zoranzoki21: flaggedrevs.php: Enable autoreview for bots on bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/547901 (https://phabricator.wikimedia.org/T237170)
[00:36:51] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:53] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:56:18] <TheSandDoctor>	 What's up with commons? I'm getting "[Xb4zmwpAAD4AAHQiSJMAAAAI] 2019-11-03 01:55:39: Fatal exception of type "InvalidArgumentException"" when trying to undo an edit
[02:00:07] <Krenair>	 any edit or just a particular one thcipriani 
[02:00:24] <TheSandDoctor>	 @Krenair https://commons.wikimedia.org/w/index.php?title=File:Tara_Sutaria_at_Sabyasachi_event_in_2019.jpg&diff=373018477&oldid=364761963&diffmode=source
[02:00:25] <Krenair>	 TheSandDoctor* sorry thc.ipriani, hit tab too soon
[02:00:44] * TheSandDoctor was just about to file something on Phabricator.
[02:00:57] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: Update comments on pdns3hack [puppet] - https://gerrit.wikimedia.org/r/547874 (owner: Alex Monk)
[02:02:33] <TheSandDoctor>	 @Krenair https://phabricator.wikimedia.org/T237173
[02:02:37] <TheSandDoctor>	 I subscribed you to it
[02:03:30] <Krenair>	 TheSandDoctor, ack, managed to undo this one manually
[02:03:41] <TheSandDoctor>	 you got same thing>
[02:03:43] <TheSandDoctor>	 ?*
[02:04:11] <Krenair>	 yes
[02:04:19] <TheSandDoctor>	 might want to add that to the ticket?
[02:04:25] <Krenair>	 might be relating to MCR, not sure yet
[02:05:54] <TheSandDoctor>	 MCR @Krenair?
[02:10:09] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:11:07] <Krenair>	 TheSandDoctor, Multi-Content Revisions, being used for SDC
[02:11:29] <TheSandDoctor>	 this is where I sound stupid and ask what SDC stands for....
[02:11:32] <TheSandDoctor>	 :P
[02:11:35] <TheSandDoctor>	 @Krenair
[02:12:27] <Krenair>	 https://meta.wikimedia.org/wiki/Structured_Data_on_Commons
[02:13:52] <TheSandDoctor>	 thanks
[02:37:35] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:03:18] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation
[03:03:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:16] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (duration: 06m 01s)
[03:09:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:10:01] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try)
[03:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:10:25] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@9972ed2]: deploying fix for puppet prefix creation (second try) (duration: 00m 25s)
[03:10:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:13:54] <wikibugs>	 Operations, ops-codfw, Cloud-Services: rack/setup codfw: cloudbackup2001.codfw.wmnet and cloudbackup2002.codfw.wmnet - https://phabricator.wikimedia.org/T224528 (Andrew) I can't PXE boot, so something is broken somewhere.  I haven't dug in much though.  ` Broadcom UNDI PXE-2.1 v214.0.170.0 Copyright...
[03:50:32] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@0c024d4]: one more prefix fix
[03:50:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:54:06] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@0c024d4]: one more prefix fix (duration: 03m 35s)
[03:54:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:23] <icinga-wm>	 PROBLEM - Host ms-be2056 is DOWN: PING CRITICAL - Packet loss = 100%
[04:34:07] <icinga-wm>	 RECOVERY - Host ms-be2056 is UP: PING OK - Packet loss = 0%, RTA = 36.13 ms
[05:00:11] <icinga-wm>	 PROBLEM - rsyslog TLS listener on port 6514 on centrallog1001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Logs
[05:01:47] <icinga-wm>	 RECOVERY - rsyslog TLS listener on port 6514 on centrallog1001 is OK: SSL OK - Certificate centrallog1001.eqiad.wmnet valid until 2024-06-25 15:42:33 +0000 (expires in 1696 days) https://wikitech.wikimedia.org/wiki/Logs
[06:40:33] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:06:19] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:13:26] <wikibugs>	 (CR) Urbanecm: [C: -1] flaggedrevs.php: Enable autoreview for bots on bswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/547901 (https://phabricator.wikimedia.org/T237170) (owner: Zoranzoki21)
[07:50:38] <wikibugs>	 (CR) Masumrezarock100: Add localized Minerva wordmark for Sindhi Wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/547061 (https://phabricator.wikimedia.org/T200870) (owner: Ammarpad)
[08:55:25] <icinga-wm>	 PROBLEM - Long running screen/tmux on snapshot1005 is CRITICAL: CRIT: Long running SCREEN process. (user: ariel PID: 7930, 1733099s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[09:28:05] <wikibugs>	 (PS1) Brian Wolff: Adjust CSP header for pdfs & videos & set enforce on testwiki [puppet] - https://gerrit.wikimedia.org/r/547929 (https://phabricator.wikimedia.org/T117618)
[09:28:53] <wikibugs>	 (CR) Ema: [C: +1] cumin: aliases: cache::text_ats is a thing now [puppet] - https://gerrit.wikimedia.org/r/547800 (https://phabricator.wikimedia.org/T227432) (owner: CDanis)
[09:46:31] <wikibugs>	 Operations, Security-Team, Traffic, Wikimedia-General-or-Unknown, Patch-For-Review: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618 (Bawolff) A small number of browsers seem to want android-webview-video-poster: as a source when viewing videos, but the...
[10:12:19] <wikibugs>	 Operations, ContentSecurityPolicy, Security-Team, Traffic, and 2 others: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618 (Bawolff)
[10:28:21] <icinga-wm>	 PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/media/math/check/{type} (Mathoid - check test formula) timed out before a response was received: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is WARNING: Test Retrieve aggregated feed content for April 29, 2016 responds with unexpected value at path = Missing keys: [mostread] https://wikitech.wikimedia.org/wiki/RESTBase
[10:29:53] <icinga-wm>	 RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[12:39:54] <wikibugs>	 (PS1) MarcoAurelio: Allow FlaggedRevs' 'autoreview' permission to be assigned globally [mediawiki-config] - https://gerrit.wikimedia.org/r/547957
[12:41:06] <wikibugs>	 (PS2) MarcoAurelio: Allow FlaggedRevs' 'autoreview' permission to be assigned globally [mediawiki-config] - https://gerrit.wikimedia.org/r/547957
[12:45:29] <hauskater>	 Urbanecm: can you rebase https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaMessages/+/529037/ please?
[12:45:56] * Urbanecm tried the web button
[12:45:59] <Urbanecm>	 sure, will do hauskater :)
[12:46:19] <hauskater>	 Yeah, I tried that too lol
[12:46:41] * Urbanecm should've guessed you did :D
[12:47:13] <wikibugs>	 (CR) Urbanecm: [C: +1] "Sure" [mediawiki-config] - https://gerrit.wikimedia.org/r/547957 (owner: MarcoAurelio)
[12:48:07] <Urbanecm>	 hauskater: should be done!
[12:50:27] <hauskater>	 Urbanecm: thanks, but it looks like "group-oathauth-tester-member": "two-factor authentication tester", is added there for some reason?
[12:51:05] * Urbanecm was not paying enough attention
[12:51:49] <Urbanecm>	 hauskater: what about now?
[12:52:07] <hauskater>	 checking
[12:52:31] <hauskater>	 still there?
[12:52:45] <hauskater>	 ehm wait
[12:52:48] <hauskater>	 reloading
[12:53:21] <hauskater>	 Urbanecm: looks good
[12:53:24] <hauskater>	 thanks
[12:53:36] <Urbanecm>	 hauskater: yw
[13:10:37] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:54:45] <wikibugs>	 (Abandoned) Zoranzoki21: flaggedrevs.php: Enable autoreview for bots on bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/547901 (https://phabricator.wikimedia.org/T237170) (owner: Zoranzoki21)
[14:06:45] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:26:25] <icinga-wm>	 RECOVERY - Maps tiles generation on icinga1001 is OK: OK: Less than 90.00% under the threshold [10.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[15:00:37] <icinga-wm>	 PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[15:10:19] <icinga-wm>	 PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[15:13:31] <icinga-wm>	 ACKNOWLEDGEMENT - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project andrew bogott investigating https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[15:26:33] <icinga-wm>	 RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 0 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[15:26:44] <wikibugs>	 Operations, serviceops: Kubernetes hosts raid check make facter fail - https://phabricator.wikimedia.org/T237197 (Volans)
[15:35:25] <wikibugs>	 Operations, serviceops: Kubernetes workers frequent oom-killer in action - https://phabricator.wikimedia.org/T237198 (Volans)
[15:40:07] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:07:31] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:53:52] <hauskater>	 davidwbarratt: hi. re. react.i18n if you submit https://gerrit.wikimedia.org/r/#/c/react.i18n/+/547894/ jenkins will be able to submit on the repo.
[19:12:19] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:17:09] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:40:51] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:06:33] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:40:17] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:46:53] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[21:51:31] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:57:29] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:11:09] <wikibugs>	 Operations, ops-codfw, Cloud-Services: rack/setup codfw: cloudbackup2001.codfw.wmnet and cloudbackup2002.codfw.wmnet - https://phabricator.wikimedia.org/T224528 (Papaul) @Andrew  the reason is that cloudbackup2002 is in the .16 network or it supposed to be in the .32 network since it is racked in row...
[23:57:20] <wikibugs>	 (PS1) Alex Monk: cloud-puppetmaster: Prep for new instances [puppet] - https://gerrit.wikimedia.org/r/547992 (https://phabricator.wikimedia.org/T235218)