[00:02:55] !log twentyafterfour@tin Finished scap: sync new branch, testwiki to php-1.28.0-wmf.8 refs T137492 (duration: 51m 59s)
[00:02:55] T137492: MW-1.28.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T137492
[00:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:27:43] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:34:41] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:44:12] (CR) Paladox: [C: 1] group0 to 1.28.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/296482 (owner: 20after4)
[00:52:21] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:59:02] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:21:31] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:22:42] PROBLEM - MariaDB Slave Lag: m3 on db1048 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1357.23 seconds
[01:28:12] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:51:52] RECOVERY - MariaDB Slave Lag: m3 on db1048 is OK: OK slave_sql_lag Replication lag: 0.01 seconds
[01:57:32] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:00:07] (CR) 20after4: [C: 2] group0 to 1.28.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/296482 (owner: 20after4)
[02:00:42] (Merged) jenkins-bot: group0 to 1.28.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/296482 (owner: 20after4)
[02:04:22] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:16:47] !log promoting group0 to 1.28.0-wmf.8
[02:16:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:18:44] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: sync wikiversions.json - group0 to 1.28.0-wmf.8 refs T137492
[02:18:45] T137492: MW-1.28.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T137492
[02:18:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:22:02] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:26:52] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:28:51] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:29:50] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.7) (duration: 09m 21s)
[02:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:30:48] !log labstore1004 is replicating NFS/DRBD shares to labstore1005 and they are large and it's taking a long time
[02:30:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:34:04] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:50:02] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.8) (duration: 04m 48s)
[02:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:54:19] (PS1) Krinkle: Lower $wgSquidMaxage to 1 day for test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296495 (https://phabricator.wikimedia.org/T124954)
[02:56:32] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Jun 29 02:56:32 UTC 2016 (duration 6m 30s)
[02:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:57:32] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:04:23] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:05:11] (CR) Krinkle: [C: 2] Lower $wgSquidMaxage to 1 day for test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296495 (https://phabricator.wikimedia.org/T124954) (owner: Krinkle)
[03:05:59] (Merged) jenkins-bot: Lower $wgSquidMaxage to 1 day for test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296495 (https://phabricator.wikimedia.org/T124954) (owner: Krinkle)
[03:26:54] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[03:28:27] !log krinkle@tin Synchronized wmf-config/InitialiseSettings.php: test2wiki (duration: 00m 33s)
[03:28:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:29:13] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[03:39:02] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[03:39:04] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2607:f6f0:205::153
[03:45:32] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 3.17 ms
[03:45:32] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 47.02 ms
[03:51:43] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:56:52] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:56:54] Operations, Icinga, ORES, Revision-Scoring-As-A-Service: Monitor production worker nodes in icinga - https://phabricator.wikimedia.org/T138882#2413619 (Dzahn)
[03:58:24] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:03:52] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:22:53] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[04:22:53] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[04:27:43] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:28:48] (PS1) KartikMistry: Deploy Compact Language Links as default (Stage 3.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/296501 (https://phabricator.wikimedia.org/T136677)
[04:33:43] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:40:55] Hey. https://en.wikipedia.org/w/index.php?title=Special%3ACentralAuth&target=Optimus24 is giving Fatal exception of type "Exception"
[04:42:58] JJMC89: is that your account?
[04:43:59] looks like it's https://phabricator.wikimedia.org/T119736
[04:44:15] No, I was attempting to see if the account existed for an ACC request.
[04:45:17] JJMC89: https://phabricator.wikimedia.org/T119736#2413660
[04:46:06] Thanks. Is there another way to check if the account exists?
[04:46:58] Or does that mean that it does?
[04:47:35] I'm not sure either
[04:47:44] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 3.71 ms
[04:48:43] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.48 ms
[04:50:00] Thanks ori
[04:50:13] sorry I couldn't help more
[04:50:23] No worries
[04:52:24] ori: Hey, since you're around; I'm looking for a Phab task involving global account creation failing when a user account is created, so that the account exists locally but doesn't appear in CentralAuth. Do you happen to know the number on that off-hand? My Phab foo isn't good enough
[04:53:36] so the inverse of https://phabricator.wikimedia.org/T119736 ?
[04:54:10] Yeah, though admittedly I only have a vague recollection of the bug existing
[04:54:57] Oh, wait, hmm
[04:55:11] This might actually be something else
[04:57:22] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:57:53] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:04:13] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:04:43] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:06:57] ori: For the user name Karina.Arnold Special:CentralAuth says no global account but Special:CreateAccount says the username is already in use
[05:07:08] Is there a bug for that?
[05:07:24] ori: Okay, so the issue I was thinking of is something else entirely, see JJMC89's message above
[05:07:52] I couldn't find one, but I am not usually subscribed to authentication-related bugs
[05:08:17] I'd file one; if it's a dupe, it can be closed easily. Better to err on the side of overreporting.
[05:15:07] (^ JJMC89)
[05:15:21] https://phabricator.wikimedia.org/T138909
[05:16:35] many thanks
[05:26:34] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:27:03] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:33:33] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:34:03] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:52:02] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:55:53] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:58:43] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:02:53] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:24:20] (PS2) KartikMistry: Deploy Compact Language Links as default (Stage 3.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/296501 (https://phabricator.wikimedia.org/T136677)
[06:27:23] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:31:22] PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:43] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:52] PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:53] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:23] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:22] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:34:53] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:37] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:06] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:26] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:46:06] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:56:27] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:36] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:57:06] RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:57:36] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:57:36] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:45] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:15] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:36] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:53] !log rebooting most snapshot hosts for kernel security update
[06:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:59:06] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:05] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:01:16] RECOVERY - puppet last run on mw1284 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:02:34] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:06:23] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:10:44] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[07:11:09] !log powercycling snapshot1001, reboot stuck
[07:11:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:14:39] PROBLEM - MariaDB disk space on dbstore1001 is CRITICAL: DISK CRITICAL - free space: /srv 385961 MB (5% inode=99%)
[07:15:14] mmmm
[07:16:49] !log powercycling snapshot1002, reboot stuck
[07:16:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:19:23] (PS3) KartikMistry: Deploy Compact Language Links as default (Stage 3.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/296501 (https://phabricator.wikimedia.org/T136677)
[07:23:08] (PS1) Mdann52: Fix missing "'" [mediawiki-config] - https://gerrit.wikimedia.org/r/296517
[07:23:44] (Abandoned) Mdann52: Fix missing "'" [mediawiki-config] - https://gerrit.wikimedia.org/r/296517 (owner: Mdann52)
[07:25:16] (PS1) Mdann52: Fix missing "'" [mediawiki-config] - https://gerrit.wikimedia.org/r/296518
[07:25:36] (PS2) Mdann52: Change $wgMaxRedirects to 3 on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296518
[07:26:44] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:28:25] (Abandoned) Mdann52: Change $wgMaxRedirects to 3 on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296406 (owner: Mdann52)
[07:30:44] (PS1) Mdann52: Change $wgMaxRedirects to 3 on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296520
[07:31:28] (Abandoned) Mdann52: Change $wgMaxRedirects to 3 on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296518 (owner: Mdann52)
[07:33:54] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:38:17] (PS2) Mdann52: Change $wgMaxRedirects to 3 on enwiki and dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296520
[07:43:23] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:47:31] !log rolling reboot of appservers in eqiad for kernel security update
[07:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:48:45] (CR) Alexandros Kosiaris: "As I 've already pointed out in the comment above, worker nodes are already monitored. The swagger spec actually is used to advertise all " [puppet] - https://gerrit.wikimedia.org/r/296054 (owner: Dzahn)
[07:49:23] (PS3) Peachey88: Change $wgMaxRedirects to 3 on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296520 (https://phabricator.wikimedia.org/T67064) (owner: Mdann52)
[07:52:47] (CR) Alexandros Kosiaris: [C: 2] consistently (no) FQDN in DHCP config [puppet] - https://gerrit.wikimedia.org/r/296424 (owner: Dzahn)
[07:52:51] (PS2) Alexandros Kosiaris: consistently (no) FQDN in DHCP config [puppet] - https://gerrit.wikimedia.org/r/296424 (owner: Dzahn)
[07:53:45] Operations, Release-Engineering-Team, Developer-notice, Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2413792 (Danny_B) >>! In T137224#2412573, @Dzahn wrote: > Yes, the...
[07:55:23] (CR) Alexandros Kosiaris: [V: 2] consistently (no) FQDN in DHCP config [puppet] - https://gerrit.wikimedia.org/r/296424 (owner: Dzahn)
[07:56:43] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:02:11] (CR) TTO: "Does this actually work? T122771 is still open, which suggests it might not." [mediawiki-config] - https://gerrit.wikimedia.org/r/296520 (https://phabricator.wikimedia.org/T67064) (owner: Mdann52)
[08:05:04] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:43] !log powercycling mw1092, stuck on reboot
[08:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:08:44] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:12:02] !log powercycling mw1097, stuck on reboot
[08:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:12:49] !log powercycling mw1099, stuck on reboot
[08:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:16:38] (PS1) Urbanecm: Throttling exeption for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296524 (https://phabricator.wikimedia.org/T138167)
[08:29:04] PROBLEM - DPKG on stat1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:31:34] RECOVERY - DPKG on stat1002 is OK: All packages OK
[08:32:48] dpkg alerts on stat* are temporary due to kernel update
[08:38:24] PROBLEM - DPKG on stat1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:39:21] thanks :)
[08:40:19] !log powercycling mw1108, stuck on reboot
[08:40:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:42:56] RECOVERY - DPKG on stat1003 is OK: All packages OK
[08:44:28] !log puppet stopped on analytics1027 to prevent Camus job to run (prep step for Hadoop kernel upgrades)
[08:44:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:55:04] !log powercycling mw1111, stuck on reboot
[08:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:56:05] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:56:55] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:01:47] !log rebooting analytics1028->1057 for kernel upgrades (Hadoop worker nodes)
[09:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:02:56] Operations, LDAP-Access-Requests: LDAP Account required for Transparency Report - https://phabricator.wikimedia.org/T138369#2413944 (hashar) (Added Michelle Paulson, Legal director and most probably the person that signed the contract for Siddharth. Sorry for the spam if that is not the case). I don't...
[09:03:25] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:04:15] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:20:20] !log upgrading diamond to 3.5-6 (T138758)
[09:20:20] T138758: diamond: certain counters always calculated as 0 - https://phabricator.wikimedia.org/T138758
[09:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:21:16] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:21:38] Operations, CirrusSearch, Discovery, Discovery-Search-Backlog, Patch-For-Review: Only use newer (elastic10{16..47}) servers as master capable elasticsearch nodes - https://phabricator.wikimedia.org/T112556#2413965 (Gehel) This is also partly tracked as part of T138329. Masters have moved to n...
[09:22:25] Operations, CirrusSearch, Discovery, Discovery-Search-Backlog, and 2 others: Only use newer (elastic10{16..47}) servers as master capable elasticsearch nodes - https://phabricator.wikimedia.org/T112556#2413967 (Gehel)
[09:23:20] !log banning elastic1001 to 1016 from cluster to prepare their decommissioning (T138329)
[09:23:21] T138329: Install and configure new elasticsearch servers in eqiad - https://phabricator.wikimedia.org/T138329
[09:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:26:54] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:27:54] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:28:26] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:29:15] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[09:34:25] PROBLEM - puppet last run on sca2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:34:54] PROBLEM - puppet last run on francium is CRITICAL: CRITICAL: Puppet has 1 failures
[09:34:55] PROBLEM - puppet last run on mw2132 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:34:55] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Puppet has 1 failures
[09:35:06] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:37:04] PROBLEM - puppet last run on mw2088 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:37:24] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:39:14] RECOVERY - puppet last run on sca2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:39:25] RECOVERY - puppet last run on mw2088 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[09:42:04] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:44:10] Operations, GlobalRename, MediaWiki-extensions-CentralAuth: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2414023 (Steinsplitter)
[09:44:15] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:45:00] Operations, GlobalRename, MediaWiki-extensions-CentralAuth: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2414031 (biplabanand) @Steinsplitter I am also waiting.............
[09:45:37] sorry for the puppetfails, they're transient (diamond upgrade) ^
[09:48:54] RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:49:04] RECOVERY - puppet last run on mw2132 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[09:53:38] Operations, Ops-Access-Requests, Discovery, Wikidata, and 2 others: Enable WDQS admins to enable/disable updater service - https://phabricator.wikimedia.org/T138627#2414045 (akosiaris) I support that as well. The code in wdqs::updater should probably anyway be amended to use base::service_unit at...
[09:54:58] !log powercycling mw1163, stuck on reboot
[09:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:55:04] PROBLEM - Host mw1103 is DOWN: PING CRITICAL - Packet loss = 100%
[09:56:44] PROBLEM - Host mw1120 is DOWN: PING CRITICAL - Packet loss = 100%
[09:57:38] Operations, ops-codfw, media-storage, Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2414049 (fgiunchedi)
[09:58:19] Operations, ops-eqiad: mw1103 and mw1120 stuck after reboot - https://phabricator.wikimedia.org/T138921#2414050 (MoritzMuehlenhoff)
[09:59:04] ACKNOWLEDGEMENT - Host mw1103 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff hardware problems, T138921
[09:59:04] ACKNOWLEDGEMENT - Host mw1120 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff hardware problems, T138921
[10:01:16] moritzm: what happens to those machines usually? the kernel isn't able to reboot the machine?
[10:04:03] just a few chars of garbage output on the serial console, so hard to tell :-)
[10:04:29] is there any chance to grep over all git repos files (master branch is good enough) other than cloning all repos to local machine and grep locally then?
[10:04:40] those are all hosts > 5 years old and with upcoming replacement/decomission, the newer ones worked fine
[10:05:14] Danny_B: via github search comes to mind, all repos under wikimedia
[10:05:39] doubt it's related to the kernel, though, rather something related to idrac. Papaul made several firmware updates in codfw, which made things more reliable
[10:05:44] moritzm: ah ok, thanks!
[10:06:11] godog: would you mind to elaborate a bit for me, pls? or link some example?
[10:07:31] Danny_B: something like this, https://github.com/search?q=org%3Awikimedia+unit+test&type=Code
[10:08:01] nice, thank you very much
[10:08:11] Danny_B: np, not grep but close enough
[10:09:05] (PS1) Nschaaf: Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782)
[10:10:13] (CR) jenkins-bot: [V: -1] Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[10:14:30] (PS2) Dereckson: Throttling exemption for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296524 (https://phabricator.wikimedia.org/T138167) (owner: Urbanecm)
[10:16:23] Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2414097 (elukey) Rebooted by mistake as part of last round of hadoop kernel upgrades, now getting: ``` All of the disks from your previous configuration...
[10:17:07] (PS1) Filippo Giunchedi: install_server: add prometheus2002 [puppet] - https://gerrit.wikimedia.org/r/296537
[10:19:25] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:20:47] (PS1) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:21:47] (CR) jenkins-bot: [V: -1] Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[10:21:55] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:21:57] (CR) Jhernandez: [C: 1] Enable both performance experiments on small tagalog wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/296257 (https://phabricator.wikimedia.org/T137822) (owner: Jdlrobson)
[10:22:03] (PS2) Filippo Giunchedi: install_server: add prometheus2002 [puppet] - https://gerrit.wikimedia.org/r/296537
[10:22:10] (CR) Filippo Giunchedi: [C: 2 V: 2] install_server: add prometheus2002 [puppet] - https://gerrit.wikimedia.org/r/296537 (owner: Filippo Giunchedi)
[10:24:40] (PS1) Muehlenhoff: Remove mw1103 and mw1120 from dsh [puppet] - https://gerrit.wikimedia.org/r/296540 (https://phabricator.wikimedia.org/T138921)
[10:26:31] (PS2) Nschaaf: Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782)
[10:27:44] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:27:58] (CR) jenkins-bot: [V: -1] Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[10:28:36] (PS2) Muehlenhoff: Remove mw1103 and mw1120 from dsh [puppet] - https://gerrit.wikimedia.org/r/296540 (https://phabricator.wikimedia.org/T138921)
[10:29:14] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:29:25] (PS2) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:30:35] (CR) jenkins-bot: [V: -1] Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[10:31:17] !log rebooting analytics100[12] (Hadoop Yarn/HDFS master and standby) - One at the time forcing failover manually with daemon restarts
[10:31:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:33:21] (PS3) Nschaaf: Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782)
[10:34:23] (CR) jenkins-bot: [V: -1] Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[10:35:23] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:36:19] (PS3) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:36:39] (PS4) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:37:48] (CR) jenkins-bot: [V: -1] Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[10:37:50] (PS5) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:38:39] (CR) Muehlenhoff: [C: 2 V: 2] Remove mw1103 and mw1120 from dsh [puppet] - https://gerrit.wikimedia.org/r/296540 (https://phabricator.wikimedia.org/T138921) (owner: Muehlenhoff)
[10:38:52] (CR) jenkins-bot: [V: -1] Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[10:39:25] (PS4) Nschaaf: Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782)
[10:39:34] (PS6) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:40:25] (CR) jenkins-bot: [V: -1] Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[10:43:05] (PS7) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:43:07] (PS5) Nschaaf: Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782)
[10:44:05] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:44:34] (CR) jenkins-bot: [V: -1] Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[10:45:17] (PS8) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[10:48:20] there is a lot of failed requests to En-us-acid.ogg
[10:49:03] but no referrer?
[10:50:02] jynus: not a file recently deleted from Commons
[10:50:11] there is nothing there
[10:50:32] I am considering other options now, if that was a thing, we do not care much
[10:50:48] Oh, file exists on en.wp: https://en.wikipedia.org/wiki/File:En-us-acid.ogg
[10:51:08] Operations, ops-eqiad: Broken memory on mw1217 - https://phabricator.wikimedia.org/T138925#2414178 (MoritzMuehlenhoff)
[10:51:20] but it works for me
[10:51:24] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:51:45] ACKNOWLEDGEMENT - Host mw1217 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff Broken RAM, T138925
[10:53:54] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:55:06] (PS1) Muehlenhoff: Remove mw1217 from dsh [puppet] - https://gerrit.wikimedia.org/r/296544 (https://phabricator.wikimedia.org/T138925)
[10:57:13] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:57:18] (CR) Muehlenhoff: [C: 2 V: 2] Remove mw1217 from dsh [puppet] - https://gerrit.wikimedia.org/r/296544 (https://phabricator.wikimedia.org/T138925) (owner: Muehlenhoff)
[10:58:43] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:03:17] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:04:01] (PS1) Filippo Giunchedi: include 'standard' for prometheus machines [puppet] - https://gerrit.wikimedia.org/r/296545
[11:05:42] (CR) Filippo Giunchedi: [C: 2 V: 2] include 'standard' for prometheus machines [puppet] - https://gerrit.wikimedia.org/r/296545 (owner: Filippo Giunchedi)
[11:09:43] !log deleting broken dewiki_titlesuggest index from codfw (T138811)
[11:09:45] T138811: CVE-2016-4997 - https://phabricator.wikimedia.org/T138811
[11:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:11:06] !log powercycling mw1223, stuck on reboot
[11:11:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:15:44] PROBLEM - MariaDB disk space on dbstore1001 is CRITICAL: DISK CRITICAL - free space: /srv 294738 MB (4% inode=99%)
[11:17:46] Operations, ops-eqiad: mw1223 completely hangs - https://phabricator.wikimedia.org/T138930#2414265 (MoritzMuehlenhoff)
[11:20:34] (PS9) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[11:20:39] (PS1) Muehlenhoff: Remove mw1223 from dsh [puppet] - https://gerrit.wikimedia.org/r/296546 (https://phabricator.wikimedia.org/T138930)
[11:23:42] (CR) Alexandros Kosiaris: [C: -1] Remove otrs backups from dbstore1001, create them on es2001 instead (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[11:25:52] Operations, Monitoring, Patch-For-Review: diamond: certain counters always calculated as 0 - https://phabricator.wikimedia.org/T138758#2414305 (ema) p:Triage>Normal
[11:26:07] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:27:13] !log powercycling elastic1009 - stuck in reboot
[11:27:13] (CR) Muehlenhoff: [C: 2 V: 2] Remove mw1223 from dsh [puppet] - https://gerrit.wikimedia.org/r/296546 (https://phabricator.wikimedia.org/T138930) (owner: Muehlenhoff)
[11:27:16] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:32:18] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [11:33:08] (03PS2) 10Ema: diamond TCP collector: publish TFO-related metrics as gauges [puppet] - 10https://gerrit.wikimedia.org/r/296408 [11:33:17] (03CR) 10Ema: [C: 032 V: 032] diamond TCP collector: publish TFO-related metrics as gauges [puppet] - 10https://gerrit.wikimedia.org/r/296408 (owner: 10Ema) [11:33:35] (03CR) 10Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: 10Jcrespo) [11:38:28] (03PS1) 10Filippo Giunchedi: prometheus: use ordered_yaml for server config [puppet] - 10https://gerrit.wikimedia.org/r/296550 [11:38:40] !log halfway moving otrs backups from dbstore1001 to es2001 [11:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:39:06] they are being moved to /srv/backup/m2 [11:39:20] but its final resting place will be /srv/backups/m2 [11:40:19] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [11:44:33] (03PS2) 10Filippo Giunchedi: prometheus: use ordered_yaml for server config [puppet] - 10https://gerrit.wikimedia.org/r/296550 [11:44:39] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] prometheus: use ordered_yaml for server config [puppet] - 10https://gerrit.wikimedia.org/r/296550 (owner: 10Filippo Giunchedi) [11:47:58] PROBLEM - PyBal backends health check on lvs1012 is CRITICAL: PYBAL CRITICAL - apaches_80 - Could not depool server mw1245.eqiad.wmnet because of too many down! 
[11:50:20] RECOVERY - PyBal backends health check on lvs1012 is OK: PYBAL OK - All pools are healthy
[11:51:02] (PS1) Gehel: Postgresql - allow multiple entries for the same user in pg_hba.conf [puppet] - https://gerrit.wikimedia.org/r/296551 (https://phabricator.wikimedia.org/T138092)
[11:51:29] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:52:08] (CR) jenkins-bot: [V: -1] Postgresql - allow multiple entries for the same user in pg_hba.conf [puppet] - https://gerrit.wikimedia.org/r/296551 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[11:52:11] Operations, ops-eqiad, Patch-For-Review: mw1103, mw1120 , mw1259 stuck after reboot - https://phabricator.wikimedia.org/T138921#2414396 (MoritzMuehlenhoff)
[11:52:14] (CR) Gehel: "Augeas is brand new to me, so I probably did something awful here. Idea on how to properly test / validate that are welcomed!" [puppet] - https://gerrit.wikimedia.org/r/296551 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[11:53:36] (PS2) Gehel: Postgresql - allow multiple entries for the same user in pg_hba.conf [puppet] - https://gerrit.wikimedia.org/r/296551 (https://phabricator.wikimedia.org/T138092)
[11:54:31] (PS1) Muehlenhoff: Remove mw1259 from dsh [puppet] - https://gerrit.wikimedia.org/r/296552
[11:56:00] ACKNOWLEDGEMENT - Host mw1259 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff hardware problem, T138921
[11:56:59] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:57:28] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:57:28] ACKNOWLEDGEMENT - Host mw1223 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff hardware problem, T138930
[11:58:48] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:00:06] Operations, ops-eqiad: mw1145: eth0 has different negotiated speed than requested - https://phabricator.wikimedia.org/T138937#2414415 (MoritzMuehlenhoff)
[12:07:24] ACKNOWLEDGEMENT - Host analytics1049 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff hardware issue, T137273
[12:07:32] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:11:46] !log powercycling mw1260, stuck on reboot
[12:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:14:27] (CR) DCausse: "just a small indentation issue, other than that I think we can swat this one anytime" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/292514 (https://phabricator.wikimedia.org/T135159) (owner: EBernhardson)
[12:18:02] (PS2) Muehlenhoff: Remove mw1259 from dsh [puppet] - https://gerrit.wikimedia.org/r/296552
[12:20:36] (CR) Muehlenhoff: [C: 2 V: 2] Remove mw1259 from dsh [puppet] - https://gerrit.wikimedia.org/r/296552 (owner: Muehlenhoff)
[12:26:23] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:31:11] (PS4) KartikMistry: Deploy Compact Language Links as default (Stage 3.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/296501 (https://phabricator.wikimedia.org/T136677)
[12:32:44] !log powercycling elastic1010, stuck on reboot
[12:32:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:36:30] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:41:36] !log continuing rolling restarts of elastic* in eqiad and codfw for kernel security update
[12:41:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:48:19] Hey. Does anyone know if '-foo' works inside dblists? E.g. can I have a DB list with 'wikipedia\n-enwiki\n-frwiki'?
[12:50:22] sounds like a mess
[12:51:10] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:51:44] Reedy: I want to shorten visualeditor-default.dblist a lot.
[12:52:01] lol
[12:52:08] Reedy: Right now it's "all Wikivoyages except two" and "all Wikipedias except 34", except we're manually listing all the ones it is.
[12:52:25] I guess I could migrate to visualeditor-notdefault.dblist, but… ;-)
[12:52:50] Reedy: Also, merge https://gerrit.wikimedia.org/r/#/c/296560/ whilst you're around. ;-)
[12:53:05] I think that is done at config side, not at dblist side
[12:53:29] but I do not handle that, so do not listen to me
[12:53:44] https://github.com/wikimedia/operations-mediawiki-config/blob/master/multiversion/MWWikiversions.php#L40
[12:53:57] "A dblist expression contains one or more dblist file names separated by '+' and '-'."
[12:54:00] >> No
[12:54:20] but maybe something like default => x, wikipedias => y, enwiki => x
[12:54:50] jynus: This is for dblists not config. Doing in config is fine but tedious.
[12:55:21] Anyway, thanks Reedy, https://github.com/wikimedia/operations-mediawiki-config/blob/master/multiversion/MWWikiversions.php#L75 shows it doesn't work (yet).
[12:55:31] * James_F writes longer .dblist files instead. :-)
[12:58:11] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:58:45] !log rebooting dataset1001 for kernel update
[12:58:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:00:12] (CR) Alexandros Kosiaris: Remove otrs backups from dbstore1001, create them on es2001 instead (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[13:01:21] (PS1) Reedy: else if -> elseif [mediawiki-config] - https://gerrit.wikimedia.org/r/296567
[13:03:51] (CR) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[13:06:17] (PS1) Jforrester: VisualEditor: Move the citation button out of the primary toolbar on Wikivoyages [mediawiki-config] - https://gerrit.wikimedia.org/r/296572 (https://phabricator.wikimedia.org/T133725)
[13:06:19] (PS1) Jforrester: VisualEditor: Move the citation button out of the primary toolbar except on Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/296573
[13:10:23] (CR) Alexandros Kosiaris: Remove otrs backups from dbstore1001, create them on es2001 instead (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[13:12:17] (PS10) Jcrespo: Remove otrs backups from dbstore1001, create them on es2001 instead [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705)
[13:13:51] (CR) Alexandros Kosiaris: [C: 1] "looks fine to me now :)" [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[13:14:02] (CR) Jcrespo: "Alex: I get now what you were saying on the comments- the confusion was that you were pointing to the wrong place." [puppet] - https://gerrit.wikimedia.org/r/296538 (https://phabricator.wikimedia.org/T131705) (owner: Jcrespo)
[13:22:38] !log rebooting analytics1027 for kernel upgrades
[13:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:24:43] (CR) Filippo Giunchedi: "@Gehel, there are some examples in our git repositories but in short once you have a working setup.py then you can use "debdry" to build a" [puppet] - https://gerrit.wikimedia.org/r/290765 (owner: Gehel)
[13:27:38] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:33:56] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:04:04] 404s have grown again
[14:15:34] (CR) Halfak: "The check_ores_worker script enacts a specific and useful route through ORES execution. It ensures that the score generated is not cached" [puppet] - https://gerrit.wikimedia.org/r/296054 (owner: Dzahn)
[14:15:56] PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: puppet fail
[14:21:46] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:22:18] James_F: just reverse the DB list, so have a list of wikis not running VE? /Reedy
[14:26:46] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:29:06] PROBLEM - salt-minion processes on labstore2003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:31:28] (PS2) Jforrester: Cleanup: Move never-altered GlobalBlockingBlockXFF into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292615
[14:32:54] (PS2) Jforrester: Cleanup: Move never-altered CentralAuthUseEventLogging into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292616
[14:33:10] (PS2) Jforrester: Cleanup: Move never-altered DisableUnmergedEdits into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292617
[14:33:26] (PS2) Jforrester: Cleanup: Move never-altered NewUserSuppressRC into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292618
[14:33:44] (PS2) Jforrester: Cleanup: Move never-altered UseDismissableSiteNotice into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292619
[14:33:58] (PS2) Jforrester: Cleanup: Move never-altered UseAbuseFilter into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292620
[14:35:13] p858snake: Yeah, but… eh. I guess. Seems messy.
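For reference, the dblist expression syntax quoted in the discussion above ("one or more dblist file names separated by '+' and '-'") can be read as left-to-right set arithmetic over named wiki lists. The following is a minimal Python sketch of that reading, not the code in MWWikiversions.php: it assumes '+' means union and '-' means difference, it assumes operators are surrounded by whitespace so that hyphenated list names stay intact, and the function name, list names, and wiki names are all hypothetical.

```python
import re


def eval_dblist_expression(expr, dblists):
    """Evaluate an expression such as 'a + b - c' from left to right.

    Assumption: '+' is set union and '-' is set difference.
    `dblists` maps a (hypothetical) list name to a set of wiki names.
    """
    # Require whitespace around operators so that a hyphenated name
    # like "ve-notdefault" is not split into two terms.
    tokens = re.split(r"\s+([+-])\s+", expr.strip())
    result = set(dblists[tokens[0]])
    # After the first name, tokens alternate: operator, name, operator, name...
    for op, name in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            result |= dblists[name]
        else:
            result -= dblists[name]
    return result


# Hypothetical sample data, only for illustration.
dblists = {
    "wikipedia": {"enwiki", "frwiki", "dewiki"},
    "wikivoyage": {"enwikivoyage", "dewikivoyage"},
    "ve-notdefault": {"enwiki", "dewikivoyage"},
}
wikis = eval_dblist_expression("wikipedia + wikivoyage - ve-notdefault", dblists)
print(sorted(wikis))
```

Under these assumptions, "all Wikipedias and Wikivoyages except a short exclusion list" becomes a single expression instead of a long hand-maintained file, which is the shortening asked about in the channel.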
[14:37:15] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:44:15] RECOVERY - puppet last run on mw2160 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:46:26] !log powercycling elastic1012, stuck on reboot
[14:46:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:56:25] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:57:56] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:58:08] greg-g: Double-clash. :-)
[14:59:04] ?
[14:59:29] bah :)
[14:59:37] greg-g: In https://gerrit.wikimedia.org/r/#/c/296593/ you duplicated my cherry-pick and then duplicated my commit message update. :-)
[15:00:04] anomie, ostriches, thcipriani, hashar, twentyafterfour, and Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160629T1500).
[15:00:04] James_F, RoanKattouw, mlitn, and Urbanecm: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:26] I'm here
[15:00:32] according to https://phabricator.wikimedia.org/T138931 I beat you with the commit msg part ;)
[15:00:58] I can SWAT today
[15:00:58] * greg-g fixes
[15:01:27] mlitn: are you needing maintenance scripts run, or just merging fixes for them?
[15:01:28] greg-g: Yeah yeah, but https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=0&oldid=699496 I beat you
[15:01:30] :-)
[15:01:57] Operations, DC-Ops, Continuous-Integration-Infrastructure (phase-out-gallium): Can scandium.eqiad.wmnet receives a couple 500G hard drive in a RAID 1 array? - https://phabricator.wikimedia.org/T138955#2414901 (hashar)
[15:01:58] (PS2) Thcipriani: Add $wmgEchoTransition setting for Echo transition flags [mediawiki-config] - https://gerrit.wikimedia.org/r/296479 (owner: Catrope)
[15:02:01] thcipriani: right now I just need them to exist
[15:02:16] mlitn: ack, just checking
[15:02:17] thcipriani: I’ll be running them later today
[15:02:26] ok, my redundant work here is done :)
[15:02:38] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/296479 (owner: Catrope)
[15:03:34] o/
[15:03:44] (Merged) jenkins-bot: Add $wmgEchoTransition setting for Echo transition flags [mediawiki-config] - https://gerrit.wikimedia.org/r/296479 (owner: Catrope)
[15:03:46] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:03:56] (I am merely lurking, got a call in half an hour)
[15:04:19] (PS3) Thcipriani: Enable Echo transition flags in production for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/296480 (owner: Catrope)
[15:05:16] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:05:38] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/296480 (owner: Catrope)
[15:06:15] (Merged) jenkins-bot: Enable Echo transition flags in production for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/296480 (owner: Catrope)
[15:06:48] (PS1) Filippo Giunchedi: prometheus: generate mysql targets from mw config [puppet] - https://gerrit.wikimedia.org/r/296595 (https://phabricator.wikimedia.org/T126757)
[15:06:50] (PS1) Filippo Giunchedi: prometheus: add mysql mediawiki production db discovery [puppet] - https://gerrit.wikimedia.org/r/296596 (https://phabricator.wikimedia.org/T126757)
[15:08:11] (CR) Jcrespo: "Noooooooooooooooooooooooooooooooo." [puppet] - https://gerrit.wikimedia.org/r/296596 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[15:08:28] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:296479|Add $wmgEchoTransition setting for Echo transition flags]] PART I (duration: 00m 50s)
[15:08:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:08:36] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:09:03] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:296479|Add $wmgEchoTransition setting for Echo transition flags]] PART II (duration: 00m 26s)
[15:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:09:15] ^ RoanKattouw your patch 1 is sync'd
[15:09:28] OK, that's the no-op one I think
[15:09:36] yup, looks like
[15:10:25] jynus: lol, what about the 'no' ? :D not urgent but wanted to put it out there
[15:10:51] you are doing what I did
[15:10:59] parsing mediawiki config
[15:11:28] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:296480|Enable Echo transition flags in production for testing]] (duration: 00m 27s)
[15:11:31] ^ RoanKattouw check please
[15:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:11:54] but you are doing that to find the shards of each db?
[15:12:29] Hi.
[15:12:34] bonjour :-)
[15:12:43] thcipriani: Yup, the wg vars are enabled. Now we wait and see what this does to the API cluster load graphs
[15:12:52] jynus: yeah, and the master/slave, https://gerrit.wikimedia.org/r/296595
[15:13:12] that is bad, what if I depool a server from mediawiki?
[15:13:21] it disappears from prometheus?
[15:13:54] Mr akosiaris, hello :)
[15:14:45] I think we discussed that- puppet is right now the place to get that
[15:15:04] akosiaris: I confirm bulk loading works, it uses port 7000 and not 9160
[15:15:05] it has the shard and the "role"
[15:15:28] thcipriani: there is a throttle change from Urbanecm, for an hackathon later this day, but they can't be there at the SWAT, I checked it, the rule looks fine, could you deploy it too please?
[15:15:28] we export those resources and create a file, like we do with neon
[15:15:38] s/hackathon/editathon
[15:15:41] jynus: from that particular job 'mysql-mw' yeah, but you are right puppet is indeed the way to go, I had the code review anyway, will change it
[15:15:43] Dereckson: yup, will do, thanks for standing in
[15:15:46] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:16:02] mediawiki is the place to search the current state pooled/depooled
[15:16:07] akosiaris: elukey updates the related task T138609
[15:16:07] T138609: Network ACL rules to allow traffic from Analytics to Production for port 9160 - https://phabricator.wikimedia.org/T138609
[15:16:09] (for now)
[15:16:17] but the role should be on puppet
[15:16:21] (for now)
[15:16:33] thcipriani: did you see the patch from James_F for stashedit?
[15:16:34] you're welcome
[15:16:47] akosiaris: I think we're ready to go for performance test now :)
[15:16:55] PROBLEM - MariaDB disk space on dbstore1001 is CRITICAL: DISK CRITICAL - free space: /srv 287240 MB (4% inode=99%)
[15:16:59] o/ mutante
[15:17:11] can you take a look at https://gerrit.wikimedia.org/r/#/c/296535/ ?
[15:17:18] let me handle the exports, godog, and once it is almost done I will let you create the file in the format you want
[15:17:31] greg-g: oh, no, I didn't, added after my last refresh, thanks for the heads up
[15:17:37] I think it's a good solution to a hard problem, but I'm not sure if it's the Right(TM) way to do something like this.
[15:17:45] thcipriani: np, last minute wmf.7 blocker
[15:18:30] jynus: sure that works too, I'll add you on the other code review to export ganglia cluster information which will be similar
[15:18:41] I added yuvipanda too since it seems like this might be a good way to work around the nginx + multiple web nodes monitoring problem.
[15:18:45] yuvipanda, see https://gerrit.wikimedia.org/r/#/c/296535/
[15:19:08] godog - I think the code is useful
[15:19:19] after all, I created the first version
[15:19:29] but not for config
[15:22:49] !log thcipriani@tin Synchronized php-1.28.0-wmf.7/extensions/Flow/maintenance/FlowRestoreLQT.php: SWAT: [[gerrit:296511|Script to restore LQT topics to their pre-import state (T119509)]] (duration: 00m 26s)
[15:22:49] T119509: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509
[15:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:23:06] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:23:18] ^ mlitn maintenance script in place
[15:23:20] (CR) Yuvipanda: [C: -1] "Awesome work, but unfortunately this won't work because icinga can not reach the nodes themselves directly, since they are all private to " [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[15:24:28] jynus: indeed, that's where I got inspired from :D
[15:25:58] thcipriani: thanks!
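For reference, the approach agreed on above (take each database host's shard and role from an exported inventory and "create a file" for Prometheus, rather than parsing mediawiki-config) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the puppet change itself: the host names are hypothetical, port 9104 is assumed as the exporter port, and the output shape is the JSON accepted by Prometheus file-based service discovery (a list of objects with "targets" and "labels").

```python
import json
from collections import defaultdict


def build_file_sd(hosts, port=9104):
    """Group hosts by (shard, role) into file_sd-style target groups.

    Assumption: `hosts` is a list of dicts with "name", "shard" and
    "role" keys, standing in for whatever the exported resources provide.
    """
    groups = defaultdict(list)
    for host in hosts:
        groups[(host["shard"], host["role"])].append(f'{host["name"]}:{port}')
    # One target group per (shard, role), in a stable order.
    return [
        {"targets": sorted(targets), "labels": {"shard": shard, "role": role}}
        for (shard, role), targets in sorted(groups.items())
    ]


# Hypothetical inventory, only for illustration.
hosts = [
    {"name": "db-a.example", "shard": "s1", "role": "master"},
    {"name": "db-b.example", "shard": "s1", "role": "slave"},
    {"name": "db-c.example", "shard": "s1", "role": "slave"},
]
file_sd = build_file_sd(hosts)
print(json.dumps(file_sd, indent=2))
```

Because the inventory here is independent of MediaWiki's pooled/depooled state, depooling a server would not remove it from the generated targets, which is the concern raised in the channel.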
[15:26:56] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:29:30] !log thcipriani@tin Synchronized php-1.28.0-wmf.7/extensions/Flow: SWAT: [[gerrit:296512|Do not reimport existing header (T119509)]] (duration: 00m 46s)
[15:29:31] T119509: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509
[15:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:29:53] ^ mlitn 2nd patch sync'd
[15:30:11] perfect :)
[15:30:17] PROBLEM - salt-minion processes on labstore2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:33:04] moritzm akosiaris I was looking at the salt minion failures, I think those were broken by https://gerrit.wikimedia.org/r/#/c/295782/ but it looks like we're missing some labs support subnets in codfw in hieradata
[15:33:16] (CR) Nschaaf: "With this change, icinga would still be going through the load balancer, but it would be hitting https://ores.wmflabs.org/ores-web-03/v2/." [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[15:33:22] PROBLEM - salt-minion processes on labstore2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:34:44] anyways I've silenced those until tomorrow
[15:34:48] !log thcipriani@tin Synchronized php-1.28.0-wmf.7/extensions/Flow/maintenance/FlowRemoveOldTopics.php: SWAT: [[gerrit:296513|Also delete topics that have more recent updates by (only) talk page manager (T119509)]] (duration: 00m 25s)
[15:34:48] T119509: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509
[15:34:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:34:58] ^ mlitn final maintenance script sync'd
[15:35:25] AbuseFilter sure is making a lot of logspam :(
[15:36:03] thcipriani: ok that’s all I needed, thanks a lot :)
[15:36:23] mlitn: yw :)
[15:38:22] !log thcipriani@tin Synchronized php-1.28.0-wmf.7/resources/Resources.php: SWAT: [[gerrit:296593|mediawiki.action.edit.stash: Restore dependency to "jquery.getAttrs" (T138931)]] (duration: 00m 26s)
[15:38:23] T138931: JavaScript crashes frequently due to stashEdit calls - https://phabricator.wikimedia.org/T138931
[15:38:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:38:27] ^ James_F check please
[15:39:41] (PS3) Thcipriani: Throttling exemption for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296524 (https://phabricator.wikimedia.org/T138167) (owner: Urbanecm)
[15:39:53] thcipriani: greg-g Can one of you? Sorry.
[15:41:33] Operations, netops: Network ACL rules to allow traffic from Analytics to Production for port 9160 - https://phabricator.wikimedia.org/T138609#2414972 (elukey) >>! In T138609#2411926, @akosiaris wrote: > Can we please verify that ? I 'd like to remove port 9160 if it is not really needed. That's the Thrif...
[15:44:06] James_F: can confirm jQuery.fn.serializeObject is a function when editing
[15:44:26] godog: thanks, I'll have a look
[15:44:53] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:44:54] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/296524 (https://phabricator.wikimedia.org/T138167) (owner: Urbanecm)
[15:45:15] thcipriani: yay, thank you.
[15:45:30] (Merged) jenkins-bot: Throttling exemption for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296524 (https://phabricator.wikimedia.org/T138167) (owner: Urbanecm)
[15:46:28] (PS1) Elukey: Revert "Enable Thrift RCP service on aqs100[456]" [puppet] - https://gerrit.wikimedia.org/r/296597
[15:46:48] (PS2) Elukey: Revert "Enable Thrift RCP service on aqs100[456]" [puppet] - https://gerrit.wikimedia.org/r/296597
[15:49:03] !log thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:296524|Throttling exemption for enwiki (T138167)]] (duration: 00m 25s)
[15:49:04] T138167: IP exemption for six account limit for conducting a Wikipedia workshop - https://phabricator.wikimedia.org/T138167
[15:49:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:49:15] (CR) Elukey: [C: 2] Revert "Enable Thrift RCP service on aqs100[456]" [puppet] - https://gerrit.wikimedia.org/r/296597 (owner: Elukey)
[15:49:16] ^ Dereckson throttling exemption sync'd
[15:51:33] thanks for the deploy thcipriani
[15:51:49] moritzm: np, let me know if I can help
[15:51:53] PROBLEM - Host elastic2005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:51:59] Dereckson: yw, thanks for being available :)
[15:52:27] elastic2005 is me, downtime lapsed
[15:52:42] RECOVERY - Host elastic2005 is UP: PING OK - Packet loss = 0%, RTA = 42.46 ms
[15:55:28] (PS7) ArielGlenn: dump url shorteners for wiki projects [puppet] - https://gerrit.wikimedia.org/r/278400 (https://phabricator.wikimedia.org/T116986)
[15:56:39] (CR) jenkins-bot: [V: -1] dump url shorteners for wiki projects [puppet] - https://gerrit.wikimedia.org/r/278400 (https://phabricator.wikimedia.org/T116986) (owner: ArielGlenn)
[15:57:33] mutante, twentyafterfour: I'm currently amending the patch to fix the issue, I believe we can use this time slot...
[15:58:41] jzerebecki: which issue?
[15:59:02] (PS8) ArielGlenn: dump url shorteners for wiki projects [puppet] - https://gerrit.wikimedia.org/r/278400 (https://phabricator.wikimedia.org/T116986)
[15:59:07] twentyafterfour: T137224#2407860
[15:59:08] T137224: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224
[16:00:01] Operations, Fundraising-Backlog: Allow Fundraising to A/B test wikipedia.org as send domain - https://phabricator.wikimedia.org/T135410#2415000 (Jgreen) This ticket follows up on T94052 where the idea of A:B testing was raised.
[16:00:04] mutante and twentyafterfour: Dear anthropoid, the time has come. Please deploy Redirect Gitblit to Phabricator (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160629T1600).
[16:01:01] jzerebecki: ah, ok ... just lowering the priority of the git.wikimedia.org vhost? or some smarter fix than that? ;)
[16:04:00] Operations, DC-Ops, Continuous-Integration-Infrastructure (phase-out-gallium): Can scandium.eqiad.wmnet receives a couple 500G hard drive in a RAID 1 array? - https://phabricator.wikimedia.org/T138955#2415016 (hashar)
[16:04:26] (CR) Filippo Giunchedi: [C: -1] "this will need reworking to make puppet exported resources the configuration source, not mediawiki-config" [puppet] - https://gerrit.wikimedia.org/r/296595 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[16:04:32] (CR) Filippo Giunchedi: [C: -1] "this will need reworking to make puppet exported resources the configuration source, not mediawiki-config" [puppet] - https://gerrit.wikimedia.org/r/296596 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[16:08:39] (PS3) JanZerebecki: Rewrite rules for git.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: Paladox)
[16:09:04] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[16:09:16] twentyafterfour: the bugzilla and bugs ServerAliases were missing, I added them
[16:09:43] James_F: hashar_: thanks for taking care of https://phabricator.wikimedia.org/T138931
[16:09:44] jzerebecki oh, thanks for noticing that :)
[16:09:51] mutante ^^
[16:10:32] Operations, Release-Engineering-Team, Developer-notice, Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2415030 (JanZerebecki) Those server aliases were missing, I amend...
[16:10:45] MatmaRex: thank you for the detailed task and links to the commits ;-} Eventually Greg cherry picked it, James added it to deployment window and Tyler pushing it \O/
[16:10:49] MatmaRex: team work!
[16:11:54] twentyafterfour: mutante might not be here as he couldn't have known beforehand that someone would fix the problem for deployment [16:12:33] jzerebecki mutante will be back in less than 45 mins [16:12:36] He said [16:21:43] RECOVERY - salt-minion processes on labstore2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:26:33] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:28:01] MatmaRex: No worries. In return, could I get CR from you on https://gerrit.wikimedia.org/r/#/c/285428/ ? ;-) [16:28:12] RECOVERY - salt-minion processes on labstore2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:29:18] jzerebecki hi, wouldn't this bit ServerName <%= @gitblit_servername %> not work [16:29:37] since it is set in settings => { [16:29:52] but i thought it has to be set outside of that for it to work [16:30:02] https://phabricator.wikimedia.org/rOPUP161616de99c5fd439d2fc84486c272f75611e581 [16:30:08] mutante ^^ [16:30:28] phedenskog: $gitblit_servername = $phab_settings['gitblit.hostname'] [16:31:16] oops wrong nick. paladox ^^ [16:31:29] jzerebecki but then it would need to be changed to ServerName <%= @phab_settings['gitblit.hostname'] %> [16:31:44] I'm new at puppet [16:31:49] so i may be wrong [16:32:42] James_F: hm [16:33:12] paladox: in the current patch, what do you think the variable referenced by @gitblit_servername in gitblit_vhost.conf.erb is set to? [16:33:46] Well it is referencing gitblit.hostname but I'm not sure if it should be phab_settings['gitblit.hostname'] [16:34:32] jzerebecki ^ [16:35:16] paladox: where is a reference to gitblit.hostname?
[16:35:39] jzerebecki here https://phabricator.wikimedia.org/diffusion/OPUP/change/production/modules/phabricator/templates/gitblit_vhost.conf.erb;161616de99c5fd439d2fc84486c272f75611e581 [16:35:53] and https://phabricator.wikimedia.org/diffusion/OPUP/change/production/modules/role/manifests/phabricator/main.pp;161616de99c5fd439d2fc84486c272f75611e581 [16:36:17] paladox: which lines? [16:36:43] MatmaRex: Thank you. :_) [16:37:05] actually in the second link it is 'gitblit.hostname' => 'git.wikimedia.org', [16:37:16] that's line 59 [16:37:34] and in the first link it's line 2, set as ServerName <%= @gitblit_servername %> [16:38:09] which means we could have set the wrong setting for it unless that is what is set in gitblit puppet [16:38:14] jzerebecki ^^ [16:38:54] Nope, doesn't look like there are any references to that https://github.com/wikimedia/operations-puppet/search?utf8=%E2%9C%93&q=gitblit_servername [16:39:43] back [16:39:54] jzerebecki: what's up [16:40:14] paladox: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/role/manifests/phabricator/main.pp;8856c7a42439088943f3111c0cbefa4fca41b680$40 is setting https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/phabricator/manifests/init.pp;8856c7a42439088943f3111c0cbefa4fca41b680$56 [16:40:49] mutante: I amended the patch to add the missing ServerAliases for bugs and bugzilla [16:41:05] Yep but how is the server name set [16:41:09] in the apache vhost file [16:41:19] since i can't find anywhere where it is set.
[16:41:24] mutante: but wait let me fix the author [16:41:49] But https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/role/manifests/phabricator/main.pp;8856c7a42439088943f3111c0cbefa4fca41b680$40 is setting 'gitblit.hostname' => 'git.wikimedia.org', [16:41:51] jzerebecki: the redirects for bugzilla are not in an Apache virtual host though [16:42:00] jzerebecki: i just see templates/redirect_config.json.erb in phabricatoritself [16:42:06] (03CR) 10ArielGlenn: "It's after Wikimania ;-) I'd like to do the first run manually (with the deployed script), so let me know when the ext is live!" [puppet] - 10https://gerrit.wikimedia.org/r/278400 (https://phabricator.wikimedia.org/T116986) (owner: 10ArielGlenn) [16:42:18] mutante: yes that was the error, I fixed that in the reuploaded patch [16:42:49] oh, ok, thank you! looking [16:44:08] jzerebecki but 'gitblit.hostname' => 'git.wikimedia.org', is not being set in the apache file [16:44:46] paladox: what's the gerrit url please [16:44:56] oh, nevermind [16:45:07] mutante https://gerrit.wikimedia.org/r/#/c/296138/ [16:45:09] Oh [16:45:32] paladox: i should abandon https://gerrit.wikimedia.org/r/#/c/293221/ ? [16:45:47] (03Abandoned) 10Dzahn: git.wikimedia.org -> Diffusion redirects [puppet] - 10https://gerrit.wikimedia.org/r/293221 (https://phabricator.wikimedia.org/T137224) (owner: 10Dzahn) [16:45:48] mutante yep [16:45:52] (03PS4) 10JanZerebecki: Rewrite rules for git.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: 10Paladox) [16:46:27] mutante: ok done, author fixed [16:46:50] jzerebecki seems to be setting in https://gerrit.wikimedia.org/r/#/c/296138/4/modules/phabricator/manifests/init.pp [16:47:01] setting = set [16:49:53] now i see. ServerAliases in phabricator-default.conf.erb ! not in Apache . 
yes [16:50:11] paladox: https://gerrit.wikimedia.org/r/#/c/296138/4/modules/phabricator/templates/phabricator-default.conf.erb [16:50:16] thats where he uses it [16:50:21] Ok [16:50:28] thanks [16:50:33] it's on the phabricator side [16:50:45] Yep [16:51:00] well, Apache for Phab, but as opposed to the ones setup with apache::site [16:51:17] Oh [16:54:37] mutante want to give it another shot at the patch. [16:55:59] (03PS2) 10Jforrester: VisualEditor: Move cite out of primary toolbar except on WP/WB/WV [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296573 [16:57:03] PROBLEM - mediawiki-installation DSH group on mw2098 is CRITICAL: Host mw2098 is not in mediawiki-installation dsh group [16:57:44] PROBLEM - puppet last run on mw2098 is CRITICAL: CRITICAL: Puppet has 4 failures [16:58:43] RECOVERY - Host mw2134 is UP: PING OK - Packet loss = 0%, RTA = 36.26 ms [16:58:47] paladox: yes, i'm compiling puppet right now [16:58:58] mutante ok, thanks. [16:58:59] :) [17:00:05] (03CR) 10Dzahn: [C: 04-1] "Error: Failed to parse template phabricator/phabricator-default.conf.erb:" [puppet] - 10https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: 10Paladox) [17:00:46] http://puppet-compiler.wmflabs.org/3222/iridium.eqiad.wmnet/change.iridium.eqiad.wmnet.err [17:00:49] jzerebecki ^^ [17:00:53] looking [17:01:00] thanks [17:01:40] Line 21 [17:01:52] needs to be updated. [17:01:58] <% if !@serveralias.empty? -%> [17:02:40] including RewriteCond "%{HTTP_HOST}" "<%= @serveralias.gsub('.', '\.') %>" [17:02:46] paladox: i would like to copy/paste that list of URLs to test [17:03:04] mutante ok, the urls of the phab task? [17:03:21] paladox: i forgot if it was phab etherpad or gerrit :p [17:03:34] mutante it's phab. 
[17:03:45] mutante but replace git.wikimedia.org with git.wmflabs.org [17:04:38] ok, in this case i want the actual .wikimedia.org ones [17:04:59] mutante ok, https://phabricator.wikimedia.org/T137224 [17:05:11] (03PS5) 10JanZerebecki: Rewrite rules for git.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: 10Paladox) [17:05:33] thanks [17:05:48] mutante, paladox: I think that fixes it [17:06:15] jzerebecki thanks [17:06:20] cool,rebuilding [17:07:48] yep, does not fail anymore. http://puppet-compiler.wmflabs.org/3223/ [17:08:04] adds aliases http://puppet-compiler.wmflabs.org/3223/iridium.eqiad.wmnet/ [17:08:05] mutante: otoh that is changing robots.txt for bugs and bugzilla [17:09:37] hmm, eh [17:09:41] mutante: before bugs and bugzilla were not serving a robots.txt, but instead would redirect to phabricator.w.o/ from /robots.txt [17:09:49] i do see the diff, yea [17:10:21] mutante: we would deindex these https://www.google.de/search?q=site:bugs.wikimedia.org&ie=utf-8&oe=utf-8&gws_rd=cr&ei=3wB0V-m3GMy6UZCRrpAI#q=site:bugzilla.wikimedia.org [17:12:57] mutante: i think that is fine the results do not seem useful, they duplicate things to be found on phabricator.w.o anyway [17:13:44] it seems ok to me.. but i also dont recall being involved in any of those when doing BZ work [17:13:49] yea [17:14:23] PROBLEM - mediawiki-installation DSH group on mw2123 is CRITICAL: Host mw2123 is not in mediawiki-installation dsh group [17:15:16] 06Operations, 10ops-codfw, 13Patch-For-Review: mw2098 / mw2123 / mw2134 not coming up after reboot - https://phabricator.wikimedia.org/T138812#2415347 (10Papaul) 05Open>03Resolved a:05Papaul>03MoritzMuehlenhoff @MoritzMuehlenhoff all the systems are back online. 
[17:16:39] 06Operations, 10ops-codfw, 06DC-Ops: Codfw-mw* IDRAC firmware upgrade - https://phabricator.wikimedia.org/T125088#2415355 (10Papaul) 05Open>03Resolved Closing this task since all the systems IDRAC firmware are up to date [17:17:44] 06Operations, 10ops-codfw: codfw: return one intel ssd to dasher for warranty replacement - https://phabricator.wikimedia.org/T132210#2415364 (10Papaul) Logo *** This is a system-generated email from an unmonitored mailbox. Please do not reply *** Dear Papaul Tshibamba, Your Sa... [17:21:54] RECOVERY - puppet last run on mw2098 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:23:03] (03PS6) 10Dzahn: Rewrite rules for git.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: 10Paladox) [17:33:52] 06Operations, 10DBA, 06Labs, 07Tracking: Database replication services (tracking) - https://phabricator.wikimedia.org/T50930#2415469 (10jcrespo) [17:34:54] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/3224/" [puppet] - 10https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: 10Paladox) [17:41:09] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2385660 (10CerKill) I will let you wait during one week... Holidays time now... 
see you ;) [17:44:10] (03PS1) 10Alexandros Kosiaris: Add labs-hosts1-b-codfw and labs-support1-b-codfw [puppet] - 10https://gerrit.wikimedia.org/r/296610 [17:46:02] (03CR) 10Alexandros Kosiaris: [C: 032] Add labs-hosts1-b-codfw and labs-support1-b-codfw [puppet] - 10https://gerrit.wikimedia.org/r/296610 (owner: 10Alexandros Kosiaris) [17:46:07] (03PS2) 10Alexandros Kosiaris: Add labs-hosts1-b-codfw and labs-support1-b-codfw [puppet] - 10https://gerrit.wikimedia.org/r/296610 [17:46:12] (03CR) 10Alexandros Kosiaris: [V: 032] Add labs-hosts1-b-codfw and labs-support1-b-codfw [puppet] - 10https://gerrit.wikimedia.org/r/296610 (owner: 10Alexandros Kosiaris) [17:47:44] (03CR) 10Dzahn: "https://phabricator.wikimedia.org/P3315" [puppet] - 10https://gerrit.wikimedia.org/r/296138 (https://phabricator.wikimedia.org/T137224) (owner: 10Paladox) [17:48:13] !log gerrit: flushed all caches to pick up rename, things may be slow for the next 15m or so [17:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:49:26] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2415511 (10Dzahn) URLs tested: P3315 tested with apache-fast-test f... [17:50:05] 06Operations, 06Labs, 06Release-Engineering-Team, 10wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech and Gerrit - https://phabricator.wikimedia.org/T133968#2415513 (10demon) I've done the rename of "Luis Felipe Schenone" to "Felipe Schenone" per the request. Sorry it took so long, I got c... 
[17:51:10] PROBLEM - mediawiki-installation DSH group on mw2134 is CRITICAL: Host mw2134 is not in mediawiki-installation dsh group [17:53:12] (03PS3) 10Paladox: varnish: git.wm.org to iridium, remove related config/tests/monitoring [puppet] - 10https://gerrit.wikimedia.org/r/293789 (https://phabricator.wikimedia.org/T137224) (owner: 10Dzahn) [17:56:09] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2415517 (10Dzahn) @DannyB It was because the Bugzilla redirects are... [17:59:55] godog: I 'd say fixed in https://gerrit.wikimedia.org/r/296610 [18:00:51] (03CR) 10Dzahn: [C: 032] "rewrite rules are in iridium now, tested:" [puppet] - 10https://gerrit.wikimedia.org/r/293789 (https://phabricator.wikimedia.org/T137224) (owner: 10Dzahn) [18:03:32] akosiaris: nice, thanks! yeah looks like it recovered [18:03:37] 06Operations, 06Labs, 06Release-Engineering-Team, 10wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech and Gerrit - https://phabricator.wikimedia.org/T133968#2415541 (10lfschenone) Thanks a lot demon! About lfs to lfschenone, do you mean we should wait for someone from labs. Maybe when the t... [18:05:01] !log git.wm.org URLs switched from gitblit to phab redirects [18:05:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:06:09] !log we stopped using gitblit. 
git.wikimedia.org URLs P3318 T137224 [18:06:10] P3318 git.wm / phab.wm / bz.wm AFTER change 296138 - https://phabricator.wikimedia.org/P3318 [18:06:10] T137224: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224 [18:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:07:51] 06Operations, 06Labs, 06Release-Engineering-Team, 10wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech and Gerrit - https://phabricator.wikimedia.org/T133968#2415555 (10demon) >>! In T133968#2415541, @lfschenone wrote: > Thanks a lot demon! About lfs to lfschenone, do you mean we should wait... [18:13:26] (03PS2) 10Yurik: Maps: Limit query exec time for kartotherian user [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) [18:15:06] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate, 13Patch-For-Review: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415575 (10Dzahn) [18:15:11] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2415573 (10Dzahn) 05Open>03Resolved a:03Dzahn [18:16:30] (03PS1) 10Muehlenhoff: Readd fixed codfw appserver hosts to dsh [puppet] - 10https://gerrit.wikimedia.org/r/296612 (https://phabricator.wikimedia.org/T138812) [18:17:48] (03CR) 10Muehlenhoff: [C: 032 V: 032] Readd fixed codfw appserver hosts to dsh [puppet] - 10https://gerrit.wikimedia.org/r/296612 (https://phabricator.wikimedia.org/T138812) (owner: 10Muehlenhoff) [18:18:48] 06Operations, 10ops-codfw, 13Patch-For-Review: mw2098 / mw2123 / mw2134 not coming up after reboot - https://phabricator.wikimedia.org/T138812#2415591 (10MoritzMuehlenhoff) I've repooled the hosts and added 
them back to dsh. [18:20:56] (03CR) 10Alexandros Kosiaris: [C: 031] "After discussions with Guillaume, we decided that the best place for this to be enforced, would be kartotherian itself, essentially runnin" [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik) [18:21:37] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate, 13Patch-For-Review: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415614 (10demon) [18:23:10] (03PS1) 10Chad: Gerrit: stop replicating to antimony for gitblit [puppet] - 10https://gerrit.wikimedia.org/r/296613 [18:23:16] mutante: ^^ :) [18:24:17] (03CR) 10jenkins-bot: [V: 04-1] Gerrit: stop replicating to antimony for gitblit [puppet] - 10https://gerrit.wikimedia.org/r/296613 (owner: 10Chad) [18:24:29] Oh jenkins [18:24:50] ok, cool! back in 5 [18:28:19] !log demon@tin Synchronized php-1.28.0-wmf.8/includes/export/WikiExporter.php: Deploy I94ca4a06 (duration: 00m 34s) [18:28:21] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2415631 (10greg) \o/ Thanks all for the work here. Lovely working wi... [18:28:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:29:33] !log demon@tin Synchronized php-1.28.0-wmf.8/maintenance/backup.inc: Deploy I94ca4a06 (duration: 00m 33s) [18:29:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:30:11] !log demon@tin Synchronized php-1.28.0-wmf.8/maintenance/dumpBackup.php: Deploy I94ca4a06 (duration: 00m 27s) [18:30:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:30:16] apergos: Done for wmf.8, doing wmf.7 once jenkins is done ^^^ [18:30:24] I see it. 
[18:30:29] \o/ [18:35:39] (03PS2) 10Chad: Gerrit: stop replicating to antimony for gitblit [puppet] - 10https://gerrit.wikimedia.org/r/296613 [18:37:06] !log demon@tin Synchronized php-1.28.0-wmf.7/includes/export/WikiExporter.php: Deploy I94ca4a06 (duration: 00m 25s) [18:37:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:39:07] !log demon@tin Synchronized php-1.28.0-wmf.7/maintenance/backup.inc: Deploy I94ca4a06 (duration: 00m 24s) [18:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:39:42] !log demon@tin Synchronized php-1.28.0-wmf.7/maintenance/dumpBackup.php: Deploy I94ca4a06 (duration: 00m 25s) [18:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:40:18] thanks a lot ostriches and also Max Sem for the review [18:41:20] yw! [18:48:20] PROBLEM - puppet last run on maps2002 is CRITICAL: CRITICAL: puppet fail [18:49:39] (03PS3) 10Chad: Gerrit: stop replicating to antimony for gitblit [puppet] - 10https://gerrit.wikimedia.org/r/296613 [18:49:41] (03PS1) 10Chad: Remove antimony [puppet] - 10https://gerrit.wikimedia.org/r/296616 [18:50:21] (03PS1) 10Chad: Remove antimony from dns [dns] - 10https://gerrit.wikimedia.org/r/296617 [18:52:17] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2415676 (10Dzahn) Talked with Paulis, the owner of FKraus bot and xqt, pywikibot-Framework author. Paulis has reinstalled python and bot an... 
[18:52:31] RECOVERY - mediawiki-installation DSH group on mw2134 is OK: OK [18:53:18] (03CR) 10Dzahn: [C: 032] Gerrit: stop replicating to antimony for gitblit [puppet] - 10https://gerrit.wikimedia.org/r/296613 (owner: 10Chad) [18:56:41] 06Operations, 10netops: Network ACL rules to allow traffic from Analytics to Production for port 9160 - https://phabricator.wikimedia.org/T138609#2415677 (10akosiaris) Really glad we sorted this out and figured out we don't need 9160. I also appreciate the cleaning up of the unnecessarily opened thrift interfa... [18:57:47] elukey: ^ [18:57:55] (03PS2) 10Dzahn: Remove antimony [puppet] - 10https://gerrit.wikimedia.org/r/296616 (https://phabricator.wikimedia.org/T123718) (owner: 10Chad) [18:58:23] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate, 13Patch-For-Review: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415680 (10Dzahn) stop replicating to gitblit https://gerrit.wikimedia.org/r/296613 [18:59:50] RECOVERY - mediawiki-installation DSH group on mw2098 is OK: OK [19:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160629T1900). Please do the needful. 
[19:00:22] dear jouncebot, deploying [19:00:38] (03PS1) 10Chad: Gerrit: minor (no-op) manifest cleanup [puppet] - 10https://gerrit.wikimedia.org/r/296619 [19:03:37] (03PS2) 10Dzahn: Gerrit: minor (no-op) manifest cleanup [puppet] - 10https://gerrit.wikimedia.org/r/296619 (owner: 10Chad) [19:04:40] (03CR) 10Dzahn: [C: 032] Remove antimony [puppet] - 10https://gerrit.wikimedia.org/r/296616 (https://phabricator.wikimedia.org/T123718) (owner: 10Chad) [19:05:07] (03PS1) 1020after4: group1 wikis to 1.28.0-wmf.8T138555 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296621 [19:05:31] (03CR) 1020after4: [C: 032] group1 wikis to 1.28.0-wmf.8T138555 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296621 (owner: 1020after4) [19:06:04] (03Merged) 10jenkins-bot: group1 wikis to 1.28.0-wmf.8T138555 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296621 (owner: 1020after4) [19:06:21] (03CR) 10Dzahn: [C: 032] Gerrit: minor (no-op) manifest cleanup [puppet] - 10https://gerrit.wikimedia.org/r/296619 (owner: 10Chad) [19:07:11] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.8T138555 [19:07:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:07:55] (03PS2) 10Dzahn: graphite/production: Limit to PRODUCTION_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/296202 (owner: 10Muehlenhoff) [19:08:46] (03CR) 10Dzahn: [C: 032] graphite/production: Limit to PRODUCTION_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/296202 (owner: 10Muehlenhoff) [19:13:16] (03PS1) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [19:13:41] !log antimony - stopping gitblit service [19:13:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:13:54] mutante :) [19:14:08] !log ytterbium: running puppet and reloading replication plugin [19:14:14] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:14:41] mutante: "antimony - stopping gitblit service" <------ well done!!!!!!!!!!! [19:14:50] RECOVERY - puppet last run on maps2002 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [19:14:54] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [19:15:03] hashar: thanks:) [19:17:09] RECOVERY - mediawiki-installation DSH group on mw2123 is OK: OK [19:17:58] (03PS2) 10Dzahn: Remove antimony from dns [dns] - 10https://gerrit.wikimedia.org/r/296617 (https://phabricator.wikimedia.org/T123718) (owner: 10Chad) [19:18:19] PROBLEM - MariaDB disk space on dbstore1001 is CRITICAL: DISK CRITICAL - free space: /srv 287055 MB (4% inode=99%) [19:20:19] PROBLEM - gitblit process on antimony is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar gitblit.jar [19:21:44] ACKNOWLEDGEMENT - gitblit process on antimony is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar gitblit.jar daniel_zahn decom - T123718 [19:22:19] !log antimony puppetstoredconfigclean.rb to remove icinga monitor remnants [19:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:24:56] (03PS2) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [19:25:52] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate, 13Patch-For-Review: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415739 (10Dzahn) [palladium:~] $ puppetstoredconfigclean.rb antimony.wikimedia.org Killing antimony.wikimedia.org...done. [... 
[19:29:20] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master). [19:32:10] who is doing the train today? [19:32:21] twentyafterfour: ? [19:32:29] aude: yep [19:33:08] we didn't deploy new wikidata code this week but somehow it's not possible now to add new statements [19:33:13] same on test.wikidata [19:33:34] aude: hmmm [19:33:36] so maybe put wikidata back on wmf.7 ? [19:33:45] and i can look into it, since test.wikidata is also broken [19:33:58] ok [19:34:03] thanks [19:34:08] hopefully it's a trivial fix [19:36:06] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: Roll back wikidata and testwikidata to 1.28.0-wmf.7 per request by @aude [19:36:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:36:32] err wait you wanted test.wikidata to stay at wmf.8 [19:36:42] yeah [19:36:54] still need to debug [19:37:31] (03PS1) 1020after4: Roll back wikidata to 0.28.0-wmf.7 per request by @aude [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296624 [19:38:23] (03CR) 1020after4: [C: 032] Roll back wikidata to 0.28.0-wmf.7 per request by @aude [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296624 (owner: 1020after4) [19:38:55] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: testwikidata back to wmf.8 [19:38:56] !log antimony - shutdown -h now (since it's gone from Icinga now) [19:39:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:39:04] (03Merged) 10jenkins-bot: Roll back wikidata to 0.28.0-wmf.7 per request by @aude [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296624 (owner: 1020after4) [19:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:39:13] mutante: awesome! 
[19:39:14] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail [19:39:25] twentyafterfour: :)! [19:40:09] Hello [19:40:13] thank [19:40:16] s [19:40:36] 06Operations, 13Patch-For-Review, 07Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2415772 (10Dzahn) [19:40:38] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate, 13Patch-For-Review: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415770 (10Dzahn) 05Open>03Resolved 12:44 < mutante> !log antimony - shutdown -h now (since it's gone from Icinga now)... [19:41:24] 06Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration, 13Patch-For-Review: Redirect yue.wikipedia.org to zh-yue.wikipedia.org - https://phabricator.wikimedia.org/T105999#2415775 (10Krenair) 05Open>03Resolved [19:41:34] Is there anyone that could help me with a cap lift on es.wiki? [19:41:47] The throttling lift should be enabled by now [19:42:04] But now that we've tried creating more than six accounts, the filter was activated [19:42:21] https://phabricator.wikimedia.org/T137917 This is the task on phabricator and it was solved [19:45:21] hashar: are you still here? ^ [19:45:57] :((( [19:46:09] (03CR) 10Chad: [C: 04-1] WIP: Move more of the gerrit config out of role class and into hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [19:46:42] Hmm, close. I'm not getting variable resolution in hiera right tho [19:46:45] Edjoerv: can you reopen that ticket and find the public IP being used? [19:46:54] Edjoerv: like make a user go to whatsmyip.com or so [19:47:12] should be raised to 40 for 181.39.138.146 [19:47:23] 181.39.138.146 [19:47:25] That one. [19:47:40] I already reopened the ticket [19:48:30] ok! 
just saw [19:48:47] and fluorine.eqiad.wmnet /a/mw-log/ratelimit.log would have the rate warning [19:48:53] i can make a change [19:50:17] oh, it already is 40 [19:50:25] (03PS1) 10Urbanecm: Fix for eswiki throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296627 (https://phabricator.wikimedia.org/T137917) [19:50:27] so, wrong IP? [19:50:47] Edjoerv, is 181.39.138.146 what you see on whatismyip.com? [19:50:51] Edjoerv: on which wiki do you have the issue? eswiki or commons ? [19:51:00] eswiki now [19:51:06] And yes it is the same ip [19:51:16] that IP at least does not trigger any edit rate limit [19:51:34] i dont see it in the ratelimit.log (thanks for the hint hashar) [19:51:40] Hi, can somebody deploy https://gerrit.wikimedia.org/r/296627 ? This is fix for T137917 [19:51:40] T137917: Temporary IP Cap Lift on es.wiki - https://phabricator.wikimedia.org/T137917 [19:51:41] Thanks [19:52:05] holy fuck [19:52:08] PHH [19:52:12] we really need tests for that [19:52:13] I swear I double checked it [19:52:15] ooh :p [19:52:20] (03PS3) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [19:52:22] (03PS1) 10Chad: Gerrit replication: ensure group exists before user and before ssh key [puppet] - 10https://gerrit.wikimedia.org/r/296629 [19:52:24] (03CR) 10Hashar: [C: 032] Fix for eswiki throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296627 (https://phabricator.wikimedia.org/T137917) (owner: 10Urbanecm) [19:52:26] (03CR) 10MaxSem: [C: 032] Fix for eswiki throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296627 (https://phabricator.wikimedia.org/T137917) (owner: 10Urbanecm) [19:52:40] MaxSem: mind deploying it ? 
[19:52:46] yeah [19:53:05] Urbanecm: Edjoerv sorry about that :( [19:53:15] though I dont see any rate limiting being triggered [19:53:38] (03Merged) 10jenkins-bot: Fix for eswiki throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296627 (https://phabricator.wikimedia.org/T137917) (owner: 10Urbanecm) [19:54:20] also did not see it in the log [19:54:43] !log maxsem@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/296627/ (duration: 00m 31s) [19:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:55:05] Synced? [19:55:13] ^ [19:55:32] Thanks. I'm going to reclose the ticket. Sorry Edjoerv [19:55:46] still, that IP wasn't in logs [19:55:53] (???) [19:56:03] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [19:56:54] the few IPs I have found are all from Mexico [19:57:00] but the event is apparently in Ecuador [19:57:16] 06Operations, 10ops-eqiad: decom antimony (datacenter) - https://phabricator.wikimedia.org/T138978#2415845 (10Dzahn) [19:57:46] Maybe the filters wasn't logging for short time? It could happen... [19:57:51] 06Operations, 10ops-eqiad: decom antimony (datacenter) - https://phabricator.wikimedia.org/T138978#2415859 (10Dzahn) [19:57:53] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate, 13Patch-For-Review: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415860 (10Dzahn) [19:58:41] (03CR) 10Dzahn: [C: 032] "server is shut down" [dns] - 10https://gerrit.wikimedia.org/r/296617 (https://phabricator.wikimedia.org/T123718) (owner: 10Chad) [19:59:17] Edjoerv, does it work now? [19:59:39] Yes it is working! [19:59:41] Thanks a lot! [19:59:43] :O [20:00:04] gwicke, cscott, arlolra, subbu, bearND, and mdholloway: Dear anthropoid, the time has come. 
Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160629T2000). [20:00:19] Thanks, Hashar and Urbanecm and all of you! [20:00:32] no parsoid deploy today [20:00:37] You're welcome :). [20:00:44] Edjoerv: all credits to Urbanecm to have fixed it :) please comment / close the related task! [20:00:54] 06Operations, 10ops-eqiad: decom antimony (datacenter) - https://phabricator.wikimedia.org/T138978#2415878 (10Dzahn) [20:01:01] Urbanecm: I should have caught that additional space on review. Sorry about that :/ [20:01:08] mutante ^^ :) [20:01:19] I am moving out now *wave* [20:01:23] I've already closed the task. [20:01:59] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [20:02:32] 06Operations, 07Blocked-on-RelEng, 05Gitblit-Deprecate: Phase out antimony.wikimedia.org (git.wikimedia.org / gitblit) - https://phabricator.wikimedia.org/T123718#2415880 (10Dzahn) [20:03:17] 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2415881 (10Papaul) @jcrespo any new update on es2017? [20:03:34] 06Operations, 13Patch-For-Review, 07Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2415882 (10Dzahn) [20:03:49] 06Operations, 13Patch-For-Review, 07Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#1936497 (10Dzahn) antimony down (and one other). down to 20 [20:05:10] (03CR) 10MaxSem: "I tried it on maps-scratch5 and saw no changes in pg_hba.conf." 
[puppet] - 10https://gerrit.wikimedia.org/r/296551 (https://phabricator.wikimedia.org/T138092) (owner: 10Gehel) [20:05:46] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2415897 (10Dzahn) [20:05:56] 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2415898 (10jcrespo) No. No crash. But I was expecting to hear their response from support. [20:06:00] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, 07Notice: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2388282 (10Dzahn) [20:09:19] (03PS4) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [20:10:07] (03PS1) 10Brian Wolff: Add Content-Security-Policy to images from test[2]wiki [puppet] - 10https://gerrit.wikimedia.org/r/296634 (https://phabricator.wikimedia.org/T117618) [20:11:44] (03PS5) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [20:13:59] subbu, i'll do a few deployes [20:15:03] ok ... i am not deploying today. 
[20:15:18] !log starting mobileapps deploy [20:15:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:21:03] !log mobileapps deployed 1da6bf0 [20:21:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:31:15] 06Operations, 10Icinga, 10ORES, 06Revision-Scoring-As-A-Service: Monitor production worker nodes in icinga - https://phabricator.wikimedia.org/T138882#2416006 (10Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=ores+worker [20:35:56] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/3226/" [puppet] - 10https://gerrit.wikimedia.org/r/291024 (https://phabricator.wikimedia.org/T136301) (owner: 10Hashar) [20:37:01] (03PS2) 10Dzahn: admin: add new sectools-roots admin group [puppet] - 10https://gerrit.wikimedia.org/r/296438 (https://phabricator.wikimedia.org/T138873) [20:40:23] (03CR) 10Dpatrick: [C: 031] admin: add new sectools-roots admin group [puppet] - 10https://gerrit.wikimedia.org/r/296438 (https://phabricator.wikimedia.org/T138873) (owner: 10Dzahn) [20:40:42] (03CR) 10Dzahn: [C: 032] "adds new, but empty group. (needs to be 2 steps anyways)" [puppet] - 10https://gerrit.wikimedia.org/r/296438 (https://phabricator.wikimedia.org/T138873) (owner: 10Dzahn) [20:42:10] I may be deploying a VE wmf.8 fix some time between the end of this deployment window and the start of the next [20:48:16] (03PS1) 10Dzahn: admin: add dpatrick to sectools-roots [puppet] - 10https://gerrit.wikimedia.org/r/296651 (https://phabricator.wikimedia.org/T138873) [20:50:10] !log deployed Kartotherian https://gerrit.wikimedia.org/r/#/c/296646/ [20:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:53:57] Krenair, is that related to the linkupdate flood in fatalmon? 
[20:54:12] no [20:54:18] it's https://phabricator.wikimedia.org/T138980 [20:55:02] !log deployed Tilerator https://gerrit.wikimedia.org/r/#/c/296647/ [20:55:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:58:23] (03PS2) 10Dzahn: admin: add dpatrick to sectools-roots, put group in role [puppet] - 10https://gerrit.wikimedia.org/r/296651 (https://phabricator.wikimedia.org/T138873) [20:59:01] 06Operations, 06Release-Engineering-Team, 07Developer-notice, 05Gitblit-Deprecate, 07Notice: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2416101 (10Danny_B) Not sure if this has been notified by #developer-r... [21:00:26] (03PS2) 10Dzahn: Gerrit replication: ensure group exists before user and before ssh key [puppet] - 10https://gerrit.wikimedia.org/r/296629 (owner: 10Chad) [21:01:59] !log deployed Graphoid https://gerrit.wikimedia.org/r/#/c/296498/ [21:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:02:32] thcipriani, do you know if anyone is looking at the falatmon flood? [21:03:20] yurik: I don't. I haven't seen anything about it, either. [21:03:46] * thcipriani looks [21:03:46] 06Operations, 06Project-Admins, 05Gitblit-Deprecate: Archive #Gitblit-Deprecate - https://phabricator.wikimedia.org/T138986#2416121 (10Danny_B) [21:04:39] 06Operations, 06Project-Admins, 05Gitblit-Deprecate: Archive #Gitblit-Deprecate - https://phabricator.wikimedia.org/T138986#2416121 (10Dzahn) quite the opposite. happy to see it being archived [21:05:07] hmm. Lots of notices from Linker.php in wmf.7 :\ [21:06:06] * thcipriani files task [21:06:16] 06Operations, 06Project-Admins, 05Gitblit-Deprecate: Archive #Gitblit-Deprecate - https://phabricator.wikimedia.org/T138986#2416143 (10Dzahn) The last remaining task is not blocked by deprecating gitblit. gitblit is already deprecated right now, the server is down. 
I think the tag should be removed from that... [21:07:12] (03CR) 10Dzahn: [C: 032] Gerrit replication: ensure group exists before user and before ssh key [puppet] - 10https://gerrit.wikimedia.org/r/296629 (owner: 10Chad) [21:08:28] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 05Security: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2416159 (10Dzahn) a:03Dzahn [21:08:44] 06Operations, 06Project-Admins, 05Gitblit-Deprecate: Archive #Gitblit-Deprecate - https://phabricator.wikimedia.org/T138986#2416161 (10Danny_B) I guess @Paladox and myself can finish the last one by the end of month or week so then I'll do the archiving... I guess no need to rush... [21:09:33] 07Blocked-on-Operations, 06Operations, 07Graphite: "unexpected error" on graphite-web - https://phabricator.wikimedia.org/T138541#2416165 (10Dzahn) p:05Triage>03Normal [21:09:51] 07Blocked-on-Operations, 06Operations, 07Graphite: "unexpected error" on graphite-web - https://phabricator.wikimedia.org/T138541#2403434 (10Dzahn) a:03yuvipanda [21:10:28] (03CR) 10Dpatrick: [C: 031] admin: add dpatrick to sectools-roots, put group in role [puppet] - 10https://gerrit.wikimedia.org/r/296651 (https://phabricator.wikimedia.org/T138873) (owner: 10Dzahn) [21:12:06] 06Operations, 10vm-requests, 13Patch-For-Review, 05Security: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2416185 (10Dzahn) The machine has been installed with a stub role and is ready to be used. The blocked task is handling the access request for Darian to get... 
[21:12:16] 06Operations, 10vm-requests, 13Patch-For-Review, 05Security: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2416187 (10Dzahn) 05Open>03Resolved a:03Dzahn [21:12:40] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 05Security: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2416190 (10Dzahn) [21:12:42] 06Operations, 10vm-requests, 13Patch-For-Review, 05Security: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2406322 (10Dzahn) [21:12:48] 06Operations, 10vm-requests, 13Patch-For-Review, 05Security: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2406322 (10Dzahn) [21:12:50] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 05Security: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2412705 (10Dzahn) [21:13:51] 06Operations, 10Ops-Access-Requests: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2412705 (10Dzahn) [21:14:35] 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2416213 (10Dzahn) a:03Bawolff [21:14:48] 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2405970 (10Dzahn) 05Open>03stalled [21:14:54] Oh, I should do that [21:18:51] (03PS1) 10Dzahn: admin: add wdqs-admins to deploy-service group [puppet] - 10https://gerrit.wikimedia.org/r/296658 (https://phabricator.wikimedia.org/T138628) [21:20:34] (03PS2) 10Dzahn: admin: add wdqs-admins to deploy-service group [puppet] - 10https://gerrit.wikimedia.org/r/296658 (https://phabricator.wikimedia.org/T138628) [21:20:55] (03PS3) 10Dzahn: admin: add wdqs-admins to deploy-service group [puppet] - 
10https://gerrit.wikimedia.org/r/296658 (https://phabricator.wikimedia.org/T138628) [21:21:13] 06Operations, 10Ops-Access-Requests, 10Deployment-Systems, 06Discovery, and 5 others: Add wdqs-admins to deploy-services group - https://phabricator.wikimedia.org/T138628#2416241 (10Dzahn) a:03Dzahn [21:22:40] 06Operations, 10Ops-Access-Requests, 06Discovery, 10Wikidata, and 2 others: Enable WDQS admins to enable/disable updater service - https://phabricator.wikimedia.org/T138627#2405780 (10Dzahn) Patch needs to be amended to reflect the latest comments? [21:24:19] (03CR) 10Dzahn: [C: 04-1] "ticket says "grant both mask/unmask and enable/disable" and that seems alright. please amend to reflect that" [puppet] - 10https://gerrit.wikimedia.org/r/295968 (https://phabricator.wikimedia.org/T138627) (owner: 10Smalyshev) [21:24:53] (03CR) 10Smalyshev: "OK, I will, I was waiting for consensus on the ticket." [puppet] - 10https://gerrit.wikimedia.org/r/295968 (https://phabricator.wikimedia.org/T138627) (owner: 10Smalyshev) [21:27:07] PROBLEM - Host cp3022 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:27] PROBLEM - Host cp3030 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:37] PROBLEM - Host cp3032 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:37] PROBLEM - Host cp3015 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:37] PROBLEM - Host cp3008 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:47] PROBLEM - Host ms-be3001 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:47] PROBLEM - Host cp3043 is DOWN: PING CRITICAL - Packet loss = 100% [21:27:57] RECOVERY - Host cp3032 is UP: PING OK - Packet loss = 0%, RTA = 82.73 ms [21:27:57] RECOVERY - Host cp3030 is UP: PING WARNING - Packet loss = 66%, RTA = 84.49 ms [21:27:57] RECOVERY - Host ms-be3001 is UP: PING OK - Packet loss = 0%, RTA = 83.00 ms [21:27:59] what is with cp3, is ther maintenance? 
[21:28:06] RECOVERY - Host cp3008 is UP: PING OK - Packet loss = 0%, RTA = 82.87 ms [21:28:06] RECOVERY - Host cp3015 is UP: PING OK - Packet loss = 0%, RTA = 83.01 ms [21:28:06] RECOVERY - Host cp3022 is UP: PING OK - Packet loss = 0%, RTA = 83.17 ms [21:28:07] RECOVERY - Host cp3043 is UP: PING OK - Packet loss = 0%, RTA = 82.66 ms [21:28:24] (03CR) 10Dzahn: "Thanks, it seems ok to me and if it comes up in the meeting we can link to it like this" [puppet] - 10https://gerrit.wikimedia.org/r/295968 (https://phabricator.wikimedia.org/T138627) (owner: 10Smalyshev) [21:30:11] jynus: dont know, but it seems over [21:33:51] I am going to disable dbstore1001 notifications and delete some files [21:34:41] !log removing /srv/backups/m2-otrs-* (tranferred to es2001) to make space [21:34:41] will be deploying https://gerrit.wikimedia.org/r/#/c/296661/ soon [21:34:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:41:26] !log cleared phab 2fa for ebernhardson for lost phone [21:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:43:29] !log krenair@tin Synchronized php-1.28.0-wmf.8/extensions/VisualEditor/ApiVisualEditor.php: https://gerrit.wikimedia.org/r/296661 - VE namespaces issue (duration: 00m 26s) [21:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:44:15] (03PS6) 10Dzahn: Check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [21:44:52] (03CR) 10Dzahn: "amended to split out the define into a separate file and with that fix the lint issue" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [21:45:13] (03CR) 10jenkins-bot: [V: 04-1] Check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [21:46:44] seems to have fixed 
the issue [21:48:39] (03PS7) 10Dzahn: Check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [21:49:09] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: Puppet has 1 failures [21:53:24] mutante: It probably doesn't save much, but we could prune the gitblit package from apt too I think? [21:53:30] Or had we already given up packaging? [21:53:31] (03CR) 10Dzahn: "Error: Could not find data item role::labs::ores::lb::realservers in any Hiera data file and no default supplied at /mnt/jenkins-workspace" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [21:53:33] * ostriches can't remember [21:54:43] hmmm https://apt.wikimedia.org/wikimedia/pool/main/g/ [21:55:34] https://apt.wikimedia.org/wikimedia/pool/main/g/gerrit/ :) [21:55:39] mutante ^^ [21:55:52] thats gerrit, but not gitblit [21:56:00] Yeah, looks like there isn't one. [21:56:06] Perfect then [21:56:11] root@carbon:~# reprepro -C universe ls gitblit [21:56:15] dont see it there either [21:56:49] 06Operations, 06Discovery, 10Elasticsearch, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Logstash elasticsearch mapping does not allow err.code to be a string - https://phabricator.wikimedia.org/T137400#2367183 (10EBernhardson) Not sure the best solution here. We could force err.code to a string from... 
[21:57:08] nor with -C thirdparty [22:02:10] PROBLEM - Host cp3044 is DOWN: PING CRITICAL - Packet loss = 100% [22:02:19] PROBLEM - Host cp3012 is DOWN: PING CRITICAL - Packet loss = 100% [22:02:20] PROBLEM - Host lvs3001 is DOWN: PING CRITICAL - Packet loss = 100% [22:02:40] PROBLEM - Host cp3048 is DOWN: PING CRITICAL - Packet loss = 100% [22:02:40] PROBLEM - Host eeden is DOWN: PING CRITICAL - Packet loss = 100% [22:02:40] PROBLEM - Host cp3009 is DOWN: PING CRITICAL - Packet loss = 100% [22:03:30] RECOVERY - Host eeden is UP: PING WARNING - Packet loss = 86%, RTA = 83.37 ms [22:03:30] RECOVERY - Host lvs3001 is UP: PING WARNING - Packet loss = 86%, RTA = 83.40 ms [22:03:30] RECOVERY - Host cp3044 is UP: PING WARNING - Packet loss = 37%, RTA = 83.54 ms [22:03:30] RECOVERY - Host cp3048 is UP: PING WARNING - Packet loss = 37%, RTA = 83.40 ms [22:03:31] RECOVERY - Host cp3012 is UP: PING WARNING - Packet loss = 37%, RTA = 83.63 ms [22:03:39] RECOVERY - Host cp3009 is UP: PING OK - Packet loss = 0%, RTA = 83.31 ms [22:05:08] (03CR) 10Dzahn: "now jenkins-bot is happy, but it cant find the servers in hiera yet" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [22:05:48] (03PS6) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [22:05:59] @seen nschaaf [22:05:59] mutante: I have never seen nschaaf [22:06:41] Icinga and esams do not seem to love eachother now [22:06:58] schana: ^ [22:08:09] PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Puppet has 2 failures [22:08:15] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:08:20] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: Puppet has 1 failures [22:12:35] Dereckson hi, could you approve translations for 
https://www.mediawiki.org/w/index.php?title=Template:DownloadMediaWiki&oldid=2088171&diff=2167348 [22:12:37] please [22:13:12] and https://www.mediawiki.org/w/index.php?title=Template:GerritLog&oldid=1434810&diff=2169774 please [22:15:34] Hi paladox, could you apply for translation permissions on mw? [22:16:00] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:16:02] Dereckson ok, how do i do that [22:16:04] please [22:18:11] paladox: done for DownloadMediaWiki, GerritLog apparently didn't touch to translated text (between tags). [22:18:29] Oh, thanks [22:19:23] ask a mediawiki bureaucrat perhaps, as I'm not sure we've a formal process to require that [22:19:28] (03PS7) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [22:19:39] Dereckson sorry to ask, but could you also do https://www.mediawiki.org/wiki/Template:WikimediaDownload please [22:19:44] ok [22:20:21] Done. [22:20:32] Thanks for caring about these templates. [22:20:32] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:21:20] Dereckson your welcome, and sorry to ask again and i promise this is the last one. But this https://www.mediawiki.org/wiki/Template:WikimediaGitCheckout one please. [22:21:36] (03PS8) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [22:21:43] Dereckson and it was Danny_B who found those templates, i just updated them. [22:21:45] Done. [22:22:02] Thankyou very much. 
[22:22:35] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:28:37] (03PS9) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [22:29:49] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:31:05] (03PS10) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [22:31:19] I'll get it eventually :p [22:32:12] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:33:17] (03PS11) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [22:33:27] Closeeeee [22:35:12] RECOVERY - puppet last run on cp3049 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:35:50] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [22:43:16] (03CR) 10Chad: "Close, but something's not quite right with the $host parameter being set in hiera." [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:43:33] Any hiera masters about? 
:) [22:45:17] (03PS1) 10Paladox: Promote REL1_27 to stable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296669 [22:46:47] ostriches ^^ [22:47:50] (03CR) 10Chad: [C: 032] Promote REL1_27 to stable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296669 (owner: 10Paladox) [22:48:01] ostriches thanks [22:48:01] Could've waited til swat but w/e [22:48:08] oh [22:49:13] (03Merged) 10jenkins-bot: Promote REL1_27 to stable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296669 (owner: 10Paladox) [22:50:13] !log demon@tin Synchronized wmf-config/CommonSettings.php: extdist config for 1.27/1.25 (duration: 00m 31s) [22:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:51:11] (03CR) 10Chad: "Or do I need to use hiera() in the proxy and jetty classes?" [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [22:56:11] (03PS3) 10Smalyshev: Allow wdqs admins to control wdqs-updater service [puppet] - 10https://gerrit.wikimedia.org/r/295968 (https://phabricator.wikimedia.org/T138627) [22:58:48] (03CR) 10Smalyshev: [C: 031] admin: add wdqs-admins to deploy-service group [puppet] - 10https://gerrit.wikimedia.org/r/296658 (https://phabricator.wikimedia.org/T138628) (owner: 10Dzahn) [23:00:05] RoanKattouw, ostriches, Krenair, MaxSem, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160629T2300). Please do the needful. [23:00:05] RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
[23:00:36] * RoanKattouw waves [23:01:27] hey [23:01:53] there's some trivial stuff in the queue like https://gerrit.wikimedia.org/r/296567 [23:01:58] but not listed for swat [23:06:26] (03PS12) 10Chad: WIP: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [23:07:21] (03CR) 10BryanDavis: [C: 031] else if -> elseif [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296567 (owner: 10Reedy) [23:08:45] Let's add it to SWAT so. [23:08:59] (03CR) 10BryanDavis: [C: 031] Cleanup: Move never-altered GlobalBlockingBlockXFF into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292615 (owner: 10Jforrester) [23:09:30] (03CR) 10BryanDavis: [C: 031] Cleanup: Move never-altered CentralAuthUseEventLogging into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292616 (owner: 10Jforrester) [23:10:49] (03CR) 10BryanDavis: [C: 031] Cleanup: Move never-altered DisableUnmergedEdits into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292617 (owner: 10Jforrester) [23:12:28] (03CR) 10BryanDavis: [C: 031] Cleanup: Move never-altered UseLocalisationUpdate into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292621 (owner: 10Jforrester) [23:13:02] (03CR) 10BryanDavis: [C: 031] Cleanup: Move never-altered CommonsMetadata* into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292622 (owner: 10Jforrester) [23:13:21] bd808: how many are there in total? [23:13:53] (I added the first three to the SWAT, I'll wait until you're done adding the remaining) [23:13:53] 10 from James_F -- https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/mediawiki-config+branch:master+topic:cleanup,n,z [23:15:09] PROBLEM - check_puppetrun on tellurium is CRITICAL: CRITICAL: Puppet has 1 failures [23:15:21] Dereckson: I'll stop and let you look at them.
Take whatever ones you are comfortable with now and I'll look at any that are left later [23:15:44] Okay. [23:17:12] (03PS1) 10Dereckson: Revert "Enable Echo transition flags in production for testing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296672 [23:18:04] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296672 (owner: 10Dereckson) [23:18:38] (03Merged) 10jenkins-bot: Revert "Enable Echo transition flags in production for testing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296672 (owner: 10Dereckson) [23:19:06] RoanKattouw: revert live on mw1017 [23:20:08] Dereckson: Thanks. The original deploy 8 hours ago caused no visible effects on any wikis so we should probably just proceed with deploying it to the cluster [23:20:09] PROBLEM - check_puppetrun on tellurium is CRITICAL: CRITICAL: Puppet has 1 failures [23:20:19] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 1 failures [23:20:21] (03PS1) 10Alex Monk: Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296673 (https://phabricator.wikimedia.org/T138999) [23:21:04] (This was an 8-hour test to see what enabling that setting does to performance. 
It doesn't appear to impact performance measurably and there seems to be no way to tell if it's on or off other than mwscript eval.php) [23:21:20] ack [23:21:21] (03PS13) 10Chad: Move more of the gerrit config out of role class and into hiera [puppet] - 10https://gerrit.wikimedia.org/r/296622 [23:21:32] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Revert "Enable Echo transition flags in production for testing" (duration: 00m 25s) [23:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:22:07] (03CR) 10Dereckson: [C: 031] "no op" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292615 (owner: 10Jforrester) [23:22:26] Someone needs to go through https://gerrit.wikimedia.org/r/#/q/project:operations/mediawiki-config+status:open+-label:Code-Review%253C%253D-1,p,003dad540004797b again [23:23:08] in particular you have a couple on the second page Dereckson [23:24:40] (03PS3) 10Dereckson: Cleanup: Move never-altered GlobalBlockingBlockXFF into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292615 (owner: 10Jforrester) [23:24:51] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292615 (owner: 10Jforrester) [23:25:09] PROBLEM - check_puppetrun on tellurium is CRITICAL: CRITICAL: Puppet has 1 failures [23:25:09] PROBLEM - check_puppetrun on payments1005 is CRITICAL: CRITICAL: Puppet has 1 failures [23:25:09] PROBLEM - check_puppetrun on saiph is CRITICAL: CRITICAL: puppet fail [23:25:10] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 1 failures [23:25:10] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:25:16] frack again [23:25:17] uff. that's me. fixing.... 
[23:25:27] (03Merged) 10jenkins-bot: Cleanup: Move never-altered GlobalBlockingBlockXFF into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292615 (owner: 10Jforrester) [23:25:35] (03PS2) 10Dereckson: Update logo settings for Adyghe Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296460 (owner: 10Odder) [23:25:38] sorry, i did some puppet reorg and one misplaced file caused cascading failhose [23:26:05] (03CR) 10Chad: [C: 031] "https://puppet-compiler.wmflabs.org/3232/ shows only manifest changes, no actual config changes on disk yay!!! :D" [puppet] - 10https://gerrit.wikimedia.org/r/296622 (owner: 10Chad) [23:26:18] yay go me [23:26:21] gerrit puppet way nicer now [23:27:00] ($wgGlobalBlockingBlockXFF still true on mw1017) [23:27:19] does your puppet not have CI checks Jeff_Green? [23:27:29] (03CR) 10Dereckson: "PS2: optipng -o7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296460 (owner: 10Odder) [23:27:39] Krenair: correct [23:27:51] although I usually do a much better job of testing in virtualbox [23:27:54] correct that it has no CI checks?
[23:28:01] (03PS3) 10Dereckson: Cleanup: Move never-altered CentralAuthUseEventLogging into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292616 (owner: 10Jforrester) [23:28:06] correct, it has no CI checks [23:28:42] (03CR) 10Dereckson: [C: 032] "no op" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292616 (owner: 10Jforrester) [23:29:25] (03Merged) 10jenkins-bot: Cleanup: Move never-altered CentralAuthUseEventLogging into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292616 (owner: 10Jforrester) [23:29:56] 06Operations, 10Gerrit, 06Release-Engineering-Team: reinstall/upgrade gerrit server (ytterbium) from precise to jessie - https://phabricator.wikimedia.org/T125018#2416563 (10greg) [23:30:00] 06Operations, 10Gerrit: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2416564 (10greg) [23:30:09] PROBLEM - check_puppetrun on tellurium is CRITICAL: CRITICAL: Puppet has 1 failures [23:30:09] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:30:09] RECOVERY - check_puppetrun on saiph is OK: OK: Puppet is currently enabled, last run 191 seconds ago with 0 failures [23:30:09] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 1 failures [23:30:10] PROBLEM - check_puppetrun on payments1005 is CRITICAL: CRITICAL: Puppet has 1 failures [23:30:10] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 1 failures [23:30:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:30:19] RECOVERY - check_puppetrun on payments2002 is OK: OK: Puppet is currently enabled, last run 155 seconds ago with 0 failures [23:30:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:31:44] 06Operations, 10scap, 03Scap3 (Scap3-MediaWiki-MVP): Depool proxies temporarily while scap is ongoing to avoid taxing those nodes - 
https://phabricator.wikimedia.org/T125629#2416567 (10greg) [23:31:59] Jeff_Green, do changes get reviewed by anyone else? [23:32:58] Krenair: generally no [23:33:13] there are no other opsen who work on frack with any regularity [23:33:44] (03PS3) 10Dereckson: Cleanup: Move never-altered DisableUnmergedEdits into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292617 (owner: 10Jforrester) [23:34:24] (03CR) 10Dereckson: [C: 031] "no-op, but move setting responsibility from config to extension" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292617 (owner: 10Jforrester) [23:34:53] Jeff_Green, couldn't it be put in gerrit and get CI from jenkins, same as prod puppet? [23:34:58] Do we want to use default extension values? [23:35:09] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:35:09] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 1 failures [23:35:09] PROBLEM - check_puppetrun on tellurium is CRITICAL: CRITICAL: Puppet has 1 failures [23:35:10] RECOVERY - check_puppetrun on payments1005 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [23:35:10] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 1 failures [23:35:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:35:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:35:22] Krenair: this is a long discussion, and not a good time all things considered. can we discuss perhaps tomorrow? [23:35:37] sure [23:35:41] (file a task!) 
[23:35:47] ha [23:36:01] (03PS3) 10Dereckson: Cleanup: Move never-altered NewUserSuppressRC into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292618 (owner: 10Jforrester) [23:36:17] (03CR) 10Dereckson: [C: 032] "no op, SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292618 (owner: 10Jforrester) [23:36:20] greg-g, i'm not sure if that was intended for me or Jeff [23:36:23] greg-g, btw, PM [23:36:36] Krenair: short answer is that having the cluster details exposed is problematic in terms of PCI [23:37:05] Krenair: either :) [23:37:10] your main puppet repository contains private data Jeff_Green ? [23:37:15] shouldn't private data be separated? [23:37:37] that's where the discussion gets long [23:37:55] (03PS3) 10Dereckson: Cleanup: Move never-altered UseDismissableSiteNotice into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292619 (owner: 10Jforrester) [23:37:59] the definition of private is different [23:39:46] (03CR) 10Dereckson: [C: 032] "SWAT, take two" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292618 (owner: 10Jforrester) [23:40:09] RECOVERY - check_puppetrun on tellurium is OK: OK: Puppet is currently enabled, last run 177 seconds ago with 0 failures [23:40:09] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:10] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:10] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:10] RECOVERY - check_puppetrun on payments2001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [23:40:17] How can the definition be different? [23:40:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:43:17] Krenair: because the PCI-DSS standard is not rational [23:43:17] So, we've a little Zuul issue.
CentralAuthUseEventLogging / GlobalBlockingBlockXFF live on mw1017 [23:43:29] (and mwrepl gives expected results) [23:45:09] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 1 failures [23:45:09] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:45:10] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 1 failures [23:45:10] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:46:24] (03PS3) 10Dereckson: Update logo settings for Adyghe Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296460 (https://phabricator.wikimedia.org/T139005) (owner: 10Odder) [23:46:38] (03CR) 10Dereckson: [C: 031] Update logo settings for Adyghe Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296460 (https://phabricator.wikimedia.org/T139005) (owner: 10Odder) [23:46:39] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master). [23:47:41] Dereckson: and zuul isn't picking up your CR+2? [23:48:13] bd808: Zuul doesn't seem to pick anything right now, https://integration.wikimedia.org/zuul/ empty [23:49:18] ah, it picked a CR+2 [23:50:09] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:50:09] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 1 failures [23:50:10] RECOVERY - check_puppetrun on alnitak is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [23:50:10] RECOVERY - check_puppetrun on payments2003 is OK: OK: Puppet is currently enabled, last run 181 seconds ago with 0 failures [23:53:03] Dereckson: I don't see anything scary in the zuul log on gallium [23:54:10] bd808: my fault, was a dependency issue + a suspicious low activity on Zuul, I'm used to check Depends-On:, I've forgotten about same repo dependency [23:54:30] ah. 
out of order in the patch chain then [23:54:43] (see also (minus my phab task noise) in -releng [23:54:46] so question about https://gerrit.wikimedia.org/r/#/c/292617 > do we really want to move this setting from the config to the (default value) extension back? [23:54:47] ) [23:54:48] frack puppet should start recovering [23:55:09] RECOVERY - check_puppetrun on bismuth is OK: OK: Puppet is currently enabled, last run 134 seconds ago with 0 failures [23:55:09] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [23:55:50] Dereckson: we certainly don't set every MW config var in wmf-config. Seems sane to me to use extension defaults when they are what we want. [23:59:02] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292617 (owner: 10Jforrester) [23:59:46] (03Merged) 10jenkins-bot: Cleanup: Move never-altered DisableUnmergedEdits into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292617 (owner: 10Jforrester) [23:59:49] (03Merged) 10jenkins-bot: Cleanup: Move never-altered NewUserSuppressRC into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292618 (owner: 10Jforrester)