[00:12:25] * Urbanecm is stashing on mwdebug1001
[00:16:23] * Urbanecm is done with stashing
[01:00:00] <Urbanecm>	 jouncebot, now
[01:00:00] <jouncebot>	 No deployments scheduled for the next 33 hour(s) and 29 minute(s)
[01:06:09] <Urbanecm>	 !log Deployed patch for T228574
[01:06:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:19] <stashbot>	 T228574: 'skipcaptcha' permission is not assigned to autoconfirmed users on Wikispecies - https://phabricator.wikimedia.org/T228574
[02:24:03] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 20116728 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:25:43] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 6376 and 51 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:36:05] <icinga-wm>	 PROBLEM - puppet last run on cp4028 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test] https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[03:39:03] <icinga-wm>	 PROBLEM - puppet last run on mw1339 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[03:50:11] <icinga-wm>	 PROBLEM - puppet last run on elastic1037 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:04:23] <icinga-wm>	 RECOVERY - puppet last run on cp4028 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:07:19] <icinga-wm>	 RECOVERY - puppet last run on mw1339 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:12:49] <icinga-wm>	 RECOVERY - puppet last run on elastic1037 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[05:39:05] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[05:44:07] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:20:12] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on analytics1029 - https://phabricator.wikimedia.org/T228577 (ops-monitoring-bot)
[06:29:21] <icinga-wm>	 PROBLEM - puppet last run on cp5011 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:35:41] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[06:57:33] <icinga-wm>	 RECOVERY - puppet last run on cp5011 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[12:17:34] <wikibugs>	 Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Redirect lzh.wikipedia to zh-classical.wikipedia - https://phabricator.wikimedia.org/T167513 (Viztor) This really shouldn't be difficult. A regular CNAME/redirect would neatly do the job.
[13:07:11] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on helium is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=helium&var-datasource=eqiad+prometheus/ops
[13:46:25] <icinga-wm>	 PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[13:47:45] <icinga-wm>	 ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 7.001 ge 4 Effie Mouzeli Known issue - https://phabricator.wikimedia.org/T215411 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[14:14:47] <icinga-wm>	 RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[14:17:49] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on wtp2013 is CRITICAL: 12 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2013&var-datasource=codfw+prometheus/ops
[17:59:33] <icinga-wm>	 PROBLEM - puppet last run on db1104 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:11:15] <icinga-wm>	 PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:27:51] <icinga-wm>	 RECOVERY - puppet last run on db1104 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:39:31] <icinga-wm>	 RECOVERY - puppet last run on druid1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:42:41] <icinga-wm>	 PROBLEM - HHVM rendering on mw1346 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[18:44:11] <icinga-wm>	 RECOVERY - HHVM rendering on mw1346 is OK: HTTP OK: HTTP/1.1 200 OK - 77026 bytes in 0.172 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:47:03] <wikibugs>	 (PS1) Revi: Enable wgNamespacesWithSubpages on main NS for kowikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481)
[19:55:49] <wikibugs>	 (CR) DannyS712: [C: +1] "Looks good to me" [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481) (owner: Revi)
[20:10:21] <icinga-wm>	 PROBLEM - Check systemd state on elastic1046 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[20:10:43] <icinga-wm>	 PROBLEM - MD RAID on elastic1046 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[20:10:44] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on elastic1046 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T228606 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[20:10:48] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (ops-monitoring-bot)
[20:14:23] <icinga-wm>	 PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[20:17:43] <icinga-wm>	 RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1004 is OK: OK: Less than 20.00% above the threshold [300.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[20:18:03] <icinga-wm>	 PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 8 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[search/mjolnir/deploy] https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[21:24:07] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on elastic1046 is CRITICAL: cluster=elasticsearch device=sdb instance=elastic1046:9100 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1046&var-datasource=eqiad+prometheus/ops
[22:24:33] <wikibugs>	 (CR) Urbanecm: [C: +1] "LGTM, thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481) (owner: Revi)
[22:25:27] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on elastic1046 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1046&var-datasource=eqiad+prometheus/ops
[22:29:20] <wikibugs>	 (CR) Huji: [C: +1] Enable wgNamespacesWithSubpages on main NS for kowikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481) (owner: Revi)
[22:55:15] <wikibugs>	 Operations, Wikimedia-General-or-Unknown: load.php URL hanging sometimes - https://phabricator.wikimedia.org/T213030 (Aklapper) >>! In T213030#4863807, @Ciencia_Al_Poder wrote: > Will keep you updated if I see this problem again  @Ciencia_Al_Poder: Six months later: Has this happened again? :)
[23:39:05] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[23:39:11] <wikibugs>	 (PS1) Aaron Schulz: Move duplicated RDBMS host lists to ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/524695
[23:39:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] Move duplicated RDBMS host lists to ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/524695 (owner: Aaron Schulz)
[23:44:03] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[23:52:30] <wikibugs>	 Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664 (Verdy_p) Note that ALL ISO 15924 scripts marked as encoded in Unicode up to version 9.0 (including historic scripts) have a suitable No...