[00:12:25] * Urbanecm is stashing on mwdebug1001
[00:16:23] * Urbanecm is done with stashing
[01:00:00] jouncebot, now
[01:00:00] No deployments scheduled for the next 33 hour(s) and 29 minute(s)
[01:06:09] !log Deployed patch for T228574
[01:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:19] T228574: 'skipcaptcha' permission is not assigned to autoconfirmed users on Wikispecies - https://phabricator.wikimedia.org/T228574
[02:24:03] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 20116728 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:25:43] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 6376 and 51 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:36:05] PROBLEM - puppet last run on cp4028 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test] https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[03:39:03] PROBLEM - puppet last run on mw1339 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[03:50:11] PROBLEM - puppet last run on elastic1037 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:04:23] RECOVERY - puppet last run on cp4028 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:07:19] RECOVERY - puppet last run on mw1339 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:12:49] RECOVERY - puppet last run on elastic1037 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[05:39:05] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[05:44:07] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:20:12] Operations, ops-eqiad: Degraded RAID on analytics1029 - https://phabricator.wikimedia.org/T228577 (ops-monitoring-bot)
[06:29:21] PROBLEM - puppet last run on cp5011 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:35:41] PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[06:57:33] RECOVERY - puppet last run on cp5011 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[12:17:34] Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Redirect lzh.wikipedia to zh-classical.wikipedia - https://phabricator.wikimedia.org/T167513 (Viztor) This really shouldn't be difficult. A regular CNAME/redirect would neatly do the job.
[13:07:11] RECOVERY - Device not healthy -SMART- on helium is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=helium&var-datasource=eqiad+prometheus/ops
[13:46:25] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[13:47:45] ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 7.001 ge 4 Effie Mouzeli Known issue - https://phabricator.wikimedia.org/T215411 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[14:14:47] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[14:17:49] PROBLEM - Memory correctable errors -EDAC- on wtp2013 is CRITICAL: 12 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2013&var-datasource=codfw+prometheus/ops
[17:59:33] PROBLEM - puppet last run on db1104 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:11:15] PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:27:51] RECOVERY - puppet last run on db1104 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:39:31] RECOVERY - puppet last run on druid1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[18:42:41] PROBLEM - HHVM rendering on mw1346 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[18:44:11] RECOVERY - HHVM rendering on mw1346 is OK: HTTP OK: HTTP/1.1 200 OK - 77026 bytes in 0.172 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:47:03] (PS1) Revi: Enable wgNamespacesWithSubpages on main NS for kowikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481)
[19:55:49] (CR) DannyS712: [C: +1] "Looks good to me" [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481) (owner: Revi)
[20:10:21] PROBLEM - Check systemd state on elastic1046 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[20:10:43] PROBLEM - MD RAID on elastic1046 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[20:10:44] ACKNOWLEDGEMENT - MD RAID on elastic1046 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T228606 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[20:10:48] Operations, ops-eqiad: Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (ops-monitoring-bot)
[20:14:23] PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[20:17:43] RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1004 is OK: OK: Less than 20.00% above the threshold [300.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[20:18:03] PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 8 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[search/mjolnir/deploy] https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[21:24:07] PROBLEM - Device not healthy -SMART- on elastic1046 is CRITICAL: cluster=elasticsearch device=sdb instance=elastic1046:9100 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1046&var-datasource=eqiad+prometheus/ops
[22:24:33] (CR) Urbanecm: [C: +1] "LGTM, thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481) (owner: Revi)
[22:25:27] RECOVERY - Device not healthy -SMART- on elastic1046 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1046&var-datasource=eqiad+prometheus/ops
[22:29:20] (CR) Huji: [C: +1] Enable wgNamespacesWithSubpages on main NS for kowikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/524685 (https://phabricator.wikimedia.org/T228481) (owner: Revi)
[22:55:15] Operations, Wikimedia-General-or-Unknown: load.php URL hanging sometimes - https://phabricator.wikimedia.org/T213030 (Aklapper) >>! In T213030#4863807, @Ciencia_Al_Poder wrote: > Will keep you updated if I see this problem again @Ciencia_Al_Poder: Six months later: Has this happened again? :)
[23:39:05] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[23:39:11] (PS1) Aaron Schulz: Move duplicated RDBMS host lists to ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/524695
[23:39:58] (CR) jerkins-bot: [V: -1] Move duplicated RDBMS host lists to ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/524695 (owner: Aaron Schulz)
[23:44:03] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[23:52:30] Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664 (Verdy_p) Note that ALL ISO 15924 scripts marked as encoded in Unicode up to version 9.0 (including historic scripts) have a suitable No...