[00:01:48] (PS1) Dzahn: Revert "testreduce: use regular package{} instead of require_package" [puppet] - https://gerrit.wikimedia.org/r/482390
[00:02:42] (CR) Dzahn: [C: +2] "Duplicate declaration: Package[nodejs] is already declared in file /etc/puppet/modules/visualdiff/manifests/init.pp:16; cannot redeclare a" [puppet] - https://gerrit.wikimedia.org/r/482390 (owner: Dzahn)
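The quoted failure is Puppet's standard duplicate-resource error: two modules on the same node each declare Package[nodejs] as a plain resource, and catalog compilation aborts at the second declaration. A minimal sketch of the failure mode, with class bodies invented for illustration (the real manifests are in the linked change):

    # Illustrative classes only; the production manifests differ.
    class visualdiff {
      package { 'nodejs': ensure => present }
    }

    class testreduce {
      # A second plain declaration aborts compilation on any node that
      # includes both classes:
      #   Duplicate declaration: Package[nodejs] is already declared ...
      # package { 'nodejs': ensure => present }

      # stdlib's ensure_packages() declares the package only when it is
      # not already defined, so several callers can share it -- but only
      # reliably when every caller uses the same mechanism, which is why
      # the revert above restores the previous require_package style.
      ensure_packages(['nodejs'])
    }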
[00:24:55] Operations, uprightdiff, Parsoid-Tests: stretch version of uprightdiff package - https://phabricator.wikimedia.org/T212987 (Dzahn)
[00:25:49] Operations, uprightdiff, Parsoid-Tests: stretch version of uprightdiff package - https://phabricator.wikimedia.org/T212987 (Dzahn)
[00:25:53] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn)
[00:26:49] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn) some issues solved (no more broken packages, icinga happy), but blocked on T212987 and still has a dependency issue with apt::pin
[00:27:06] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn) a: Dzahn
[00:31:22] Operations, uprightdiff, Parsoid-Tests: stretch version of uprightdiff package - https://phabricator.wikimedia.org/T212987 (ssastry) @Legoktm In case you can help with this packaging of uprightdiff.
[00:57:14] (CR) Bstorm: wmcs::nfs::misc - Refactor into profile/role (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[01:12:38] (CR) Bstorm: "> Aren't these packages really universal and not tied to any particular distribution? Does the distribution really mean anything to us at " [software/tools-manifest] - https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: GTirloni)
[01:15:36] (CR) Paladox: "This is what it will look like: https://phabricator.wikimedia.org/F27793705" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[01:15:55] (CR) Bstorm: [C: +1] "So yeah, presuming we can eliminate some confusion by using "unstable", I like this change :)" [software/tools-manifest] - https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: GTirloni)
[02:03:20] (CR) Krinkle: "The green conflicts with the logo, making the text hard to read and the logo no longer clearly recognisable. The text should probably dark" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[02:04:40] (CR) Krinkle: "(or use an all-white version of the logo and make the green darker still)." [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[02:57:57] Operations, Phabricator, Release-Engineering-Team: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Paladox)
[02:58:26] Operations, Phabricator, Release-Engineering-Team: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Paladox)
[03:33:07] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 903.38 seconds
[03:42:57] (PS1) Paladox: phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400
[03:49:17] (PS2) Paladox: phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400
[03:49:35] (PS3) Paladox: phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989)
[03:50:23] (CR) jerkins-bot: [V: -1] phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[03:50:29] (PS4) Paladox: phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989)
[03:51:33] (PS5) Paladox: phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989)
[03:52:07] (PS6) Paladox: phabricator: Migrate mail config to cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989)
[03:52:12] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[03:55:41] (CR) Paladox: "@20after4 would you be able to review this please? I believe we have to do the same for incoming emails too (though I'm not sure how to do t" [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
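For background, Phabricator's cluster.mailers option replaces the older metamta.* settings with a single ordered list of mailer definitions, each carrying a key, a type, and type-specific options. A hedged sketch of how a manifest might render that structure into Phabricator's local.json; the file path and the to_json_pretty() function (from puppetlabs-stdlib) are assumptions, not the actual production phabricator module:

    # Sketch only: the real module wires this through its own parameters.
    $mailers = [
      {
        'key'     => 'wikimedia-smtp',   # hypothetical mailer name
        'type'    => 'smtp',
        'options' => {
          'host' => 'localhost',
          'port' => 25,
        },
      },
    ]

    file { '/srv/phab/phabricator/conf/local/local.json':
      ensure  => file,
      content => to_json_pretty({ 'cluster.mailers' => $mailers }),
    }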
[04:02:19] PROBLEM - Device not healthy -SMART- on helium is CRITICAL: cluster=misc device=megaraid,6 instance=helium:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=helium&var-datasource=eqiad%2520prometheus%252Fops
[04:21:55] Operations, uprightdiff, Parsoid-Tests: stretch version of uprightdiff package - https://phabricator.wikimedia.org/T212987 (Legoktm) I uploaded uprightdiff to stretch-backports, it'll take a few days for it to get reviewed by the backports FTP masters, if that's not an issue (otherwise I can get a ve...
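While the backport waits in the review queue, the apt::pin dependency issue mentioned for scandium above is the usual prerequisite for this kind of install: a host only prefers a stretch-backports build if a pin raises its priority. A minimal sketch using the puppetlabs-apt interface (Wikimedia's puppet tree has its own apt::pin define, so treat the parameter names as assumptions):

    # Illustrative pin; production uses Wikimedia's own apt module.
    apt::pin { 'uprightdiff':
      packages => 'uprightdiff',
      release  => 'stretch-backports',
      priority => 1001,  # raised so the backports build is preferred
    }

    package { 'uprightdiff':
      ensure  => present,
      require => Apt::Pin['uprightdiff'],
    }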
[04:22:01] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 140.89 seconds
[04:49:23] PROBLEM - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)
[04:49:30] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T212990 (ops-monitoring-bot)
[06:28:22] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:28:56] PROBLEM - puppet last run on authdns2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/nrpe_check_systemd_unit_state]
[06:29:12] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.008 second response time
[06:31:30] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/mysql-ps1.sh]
[06:57:32] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:12] RECOVERY - puppet last run on authdns2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:07:28] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[07:11:06] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:31:22] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.559 second response time
[07:31:37] a puppet run fixed it, it is the recurrent log rotation segfault --^
[07:31:48] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[07:37:06] (PS1) Elukey: Decommission two Hadoop worker nodes from the Analytics cluster [puppet] - https://gerrit.wikimedia.org/r/482401 (https://phabricator.wikimedia.org/T209929)
[07:38:09] (CR) Elukey: [C: +2] Decommission two Hadoop worker nodes from the Analytics cluster [puppet] - https://gerrit.wikimedia.org/r/482401 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[07:40:09] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T212990 (elukey) p: Triage→High a: Cmjohnson
[07:41:34] ACKNOWLEDGEMENT - Device not healthy -SMART- on helium is CRITICAL: cluster=misc device=megaraid,6 instance=helium:9100 job=node site=eqiad Elukey T212990 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=helium&var-datasource=eqiad%2520prometheus%252Fops
[07:41:34] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) Elukey T212990
[11:44:13] Operations, monitoring: Degraded RAID alert not acking notifications - https://phabricator.wikimedia.org/T212969 (Volans) a: Volans
[11:48:38] (PS1) Volans: icinga: raid_handler fix path to command file [puppet] - https://gerrit.wikimedia.org/r/482403 (https://phabricator.wikimedia.org/T212969)
[11:49:08] Operations, monitoring, Patch-For-Review: Degraded RAID alert not acking notifications - https://phabricator.wikimedia.org/T212969 (Volans) The raid handler had the old command file path, which was valid in jessie.
[11:50:09] (CR) Volans: [C: +2] icinga: raid_handler fix path to command file [puppet] - https://gerrit.wikimedia.org/r/482403 (https://phabricator.wikimedia.org/T212969) (owner: Volans)
[11:58:06] Operations, monitoring, Patch-For-Review: Degraded RAID alert not acking notifications - https://phabricator.wikimedia.org/T212969 (Volans) Open→Resolved Patch deployed, resolving for now. Please re-open if that doesn't fix it.
[12:10:21] (CR) Volans: "> Patch Set 1:" (2 comments) [dns] - https://gerrit.wikimedia.org/r/481833 (owner: Volans)
[13:56:02] Hello. This URL doesn't load for me. It stops sending more data after a while, until it times out: https://dpaste.de/eXcB (I put it on dpaste because it's long)
[13:58:11] https://dpaste.de/kreZ/raw
[14:00:05] Well, it works now, maybe a temporary hiccup
[15:02:42] Operations, Phabricator, Release-Engineering-Team, Patch-For-Review: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Paladox) they now include "void-recipient@" in the email https://github.com/phacility/phabricator/blob/73e3057c52f46ec6d...
[16:17:43] Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): Update label and switch to rename labvirt1013 to cloudvirt1013 - https://phabricator.wikimedia.org/T212522 (Andrew) a: Andrew→Cmjohnson
[18:26:30] (CR) Paladox: "> (or use an all-white version of the logo and make the green darker" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[19:01:16] Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Reedy) Luasandbox seems to be segfaulting on vagrant... dunno if it’s more widely replicable yet, or applicable to p...
[19:48:24] PROBLEM - Disk space on analytics-tool1002 is CRITICAL: DISK CRITICAL - free space: / 721 MB (3% inode=89%)
[20:21:21] ouch
[20:21:27] checking an-tool1002
[20:22:30] RECOVERY - Disk space on analytics-tool1002 is OK: DISK OK
[20:23:31] !log manual clean-up of big logs under /var/log/..
[20:23:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:37] sigh
[20:23:43] * elukey amends the sal
[23:51:02] (PS1) BryanDavis: toolforge: Add missing php packages [puppet] - https://gerrit.wikimedia.org/r/482481
[23:52:37] (CR) BryanDavis: toolforge: Add missing php packages (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482481 (owner: BryanDavis)
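The closing patch is the routine counterpart to the morning's duplicate-declaration revert: adding packages to a shared Toolforge manifest. A short sketch of the pattern, with the class name and package list invented for illustration (the real set is in the linked review):

    # Hypothetical class and packages; see the Gerrit change for the real set.
    class toolforge::php_packages {
      ensure_packages([
        'php-curl',
        'php-mbstring',
        'php-xml',
      ])
    }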