[00:00:32] (Draft1) Paladox: test [puppet] - https://gerrit.wikimedia.org/r/325062
[00:00:34] (Draft2) Paladox: test [puppet] - https://gerrit.wikimedia.org/r/325062
[00:01:30] (CR) jenkins-bot: [V: -1] test [puppet] - https://gerrit.wikimedia.org/r/325062 (owner: Paladox)
[00:01:33] (Abandoned) Paladox: test [puppet] - https://gerrit.wikimedia.org/r/325062 (owner: Paladox)
[00:02:44] (PS33) Paladox: Phabricator: allow rsyncing /srv/repos from active to passive server [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928)
[00:02:58] (PS34) Dzahn: Phabricator: allow rsyncing /srv/repos from active to passive server [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[00:03:29] (PS35) Paladox: Phabricator: allow rsyncing /srv/repos from active to passive server [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928)
[00:06:19] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/4780/" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[00:07:46] (PS36) Dzahn: Phabricator: allow rsyncing /srv/repos from active to passive server [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[00:07:53] (CR) Dzahn: [C: 2] Phabricator: allow rsyncing /srv/repos from active to passive server [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[00:10:43] (CR) Paladox: "Thanks." [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[00:10:43] paladox: ehm..
almost :)
[00:10:54] we need one follow-up, looking
[00:11:29] no problems in prod, just on 2001
[00:12:08] Ok
[00:13:34] paladox: ahh, so our problem is we resolve "iridium"
[00:13:41] oh
[00:13:47] and not iridium.eqiad.wmnet
[00:13:53] yep
[00:16:33] a lot of machines will search eqiad.wmnet
[00:16:37] but I wouldn't count on it
[00:16:48] yep, this one doesnt
[00:16:59] but the search stuff can also be puppetized
[00:17:10] on it
[00:17:51] Yep
[00:25:58] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[00:27:33] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, MW-1.28-release-notes, Patch-For-Review: Create Wikipedia Tulu - https://phabricator.wikimedia.org/T140898#2480135 (Krenair) Is this completed now @Dereckson?
[00:28:02] (PS1) Dzahn: phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067
[00:29:00] (CR) jenkins-bot: [V: -1] phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[00:30:48] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:31:07] (CR) Paladox: phabricator: use FQDN instead of short hostname in ferm rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[00:31:17] paladox: something different first.
phab2001 - add 'search eqiad.wmnet' to /etc/resolv.conf
[00:31:31] and puppet removed that again, hold on
[00:31:47] (CR) Paladox: phabricator: use FQDN instead of short hostname in ferm rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[00:32:00] mutante yep
[00:32:12] mutante you will need to create a new variable with hiera
[00:32:26] maybe we can avoid it
[00:32:43] $phabricator_active_server_fqdn = hiera('phabricator_active_server_fqdn')
[00:32:56] But wont that fail because we want it to run on phab2001
[00:33:01] so it wont get phab2001 ip
[00:33:07] mutante ^^
[00:33:41] yes, i saw that, it will fail but i want another way altogether
[00:33:49] Ok
[00:34:02] mutante oh so different than your current patch?
[00:34:08] yes
[00:34:45] oh
[00:35:02] btw, the puppet run and rsync setup is already fixed now
[00:35:13] but give me a minute
[00:35:16] ok
[00:35:24] how did you manage to fix it?
[00:35:56] edited /etc/resolv.conf manually
[00:35:59] then puppet reverted me
[00:36:05] oh
[00:36:07] but it was enough for the ferm rule that it works once
[00:36:14] ok
[00:36:16] then it could create it and fine ..
so far
[00:36:21] :)
[00:38:16] paladox: do ./puppet/hieradata$ grep -r domain_search
[00:38:24] ok
[00:38:48] i'm adding that for phabricator
[00:39:06] ok
[00:39:08] :)
[00:39:53] i'm just thinking about where in the hierarchy
[00:40:18] Oh
[00:42:00] (PS1) Dzahn: phabricator: add eqiad and codfw to domain search [puppet] - https://gerrit.wikimedia.org/r/325069
[00:42:51] (CR) Paladox: [C: 1] phabricator: add eqiad and codfw to domain search [puppet] - https://gerrit.wikimedia.org/r/325069 (owner: Dzahn)
[00:46:41] (PS2) Dzahn: phabricator: add eqiad and codfw to domain search [puppet] - https://gerrit.wikimedia.org/r/325069
[00:47:59] (CR) Dzahn: [C: 2] phabricator: add eqiad and codfw to domain search [puppet] - https://gerrit.wikimedia.org/r/325069 (owner: Dzahn)
[00:51:21] paladox: doesnt work like that :p
[00:51:29] oh
[00:51:31] back to the first version
[00:51:35] well, both
[00:51:35] mutante should we do the ip
[00:51:36] for now
[00:52:07] i think we should do FQDN
[00:52:16] and later remove the usage of the existing var
[00:52:25] so that we dont have both in hiera
[00:52:47] oh
[00:52:54] ok, thanks for working on it :)
[00:56:59] Krenair: around?
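The short-hostname problem discussed above (resolving "iridium" on a host whose /etc/resolv.conf has no search entry for eqiad.wmnet) can be illustrated with a small sketch. This is not code from the puppet repository: `candidate_names` is a hypothetical helper that only approximates how a stub resolver expands a name using the search list and the `ndots` option described in resolv.conf(5).

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the names a stub resolver would try, in order (approximation)."""
    # A trailing dot marks the name as absolute: no search list is applied.
    if name.endswith("."):
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search_domains]
    # With at least `ndots` dots the name is tried as-is first; otherwise
    # the search domains are tried first and the bare name comes last.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


if __name__ == "__main__":
    # Host without a search list: only the bare short name is tried.
    print(candidate_names("iridium", []))
    # Host with a search list such as eqiad.wmnet and codfw.wmnet.
    print(candidate_names("iridium", ["eqiad.wmnet", "codfw.wmnet"]))
    # The FQDN is tried first regardless of the search list.
    print(candidate_names("iridium.eqiad.wmnet", []))
```

Under these assumptions the FQDN is the only form whose first lookup does not depend on per-host resolver settings, which matches the decision in the log to use the FQDN in the ferm rule.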
[00:58:48] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:59:06] (PS2) Dzahn: phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067
[01:00:45] (CR) Paladox: [C: 1] phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[01:05:17] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/4781/" [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[01:05:40] (CR) Dzahn: "follow-up https://gerrit.wikimedia.org/r/#/c/325067" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[01:06:05] (CR) Paladox: "Thanks." [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[01:06:29] (CR) Dzahn: [C: 2] phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[01:06:31] (CR) Paladox: "It works."
[puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[01:06:52] (CR) Paladox: "srange => @resolve((iridium.eqiad.wmnet))/32" [puppet] - https://gerrit.wikimedia.org/r/325067 (owner: Dzahn)
[01:07:11] (PS3) Dzahn: phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067 (https://phabricator.wikimedia.org/T137928)
[01:07:16] (PS4) Dzahn: phabricator: use FQDN instead of short hostname in ferm rules [puppet] - https://gerrit.wikimedia.org/r/325067 (https://phabricator.wikimedia.org/T137928)
[01:15:56] (PS1) Dzahn: phab2001: fix ferm rule, remove /32 [puppet] - https://gerrit.wikimedia.org/r/325073
[01:16:10] mutante it fails on labs
[01:16:11] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item phabricator_active_server_fqdn in any Hiera data file and no default supplied at /etc/puppet/modules/role/manifests/phabricator/rsync.pp:5 on node phabricator.phabricator.eqiad.wmflabs
[01:16:11] Warning: Not using cache on failed catalog
[01:16:11] Error: Could not retrieve catalog; skipping run
[01:17:31] paladox: either set it in the wiki page, or i add it in repo
[01:17:40] oh
[01:17:44] looks like we are currently setting "active_server" in wiki page there?
[01:17:54] btw, the /32 needs to be removed
[01:18:04] (CR) Aude: build: require-dev phpunit in composer.json (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/325055 (owner: Krinkle)
[01:20:43] works now phabricator_active_server_fqdn: phabricator.phabricator.eqiad.wmflabs
[01:21:03] (CR) Paladox: [C: 1] phab2001: fix ferm rule, remove /32 [puppet] - https://gerrit.wikimedia.org/r/325073 (owner: Dzahn)
[01:21:10] *nod* good!
i slightly prefer in repo over wiki page but either works
[01:21:25] oh
[01:22:15] (PS2) Dzahn: phab2001: fix ferm rule, remove /32 [puppet] - https://gerrit.wikimedia.org/r/325073
[01:22:21] (CR) Dzahn: [C: 2] phab2001: fix ferm rule, remove /32 [puppet] - https://gerrit.wikimedia.org/r/325073 (owner: Dzahn)
[01:24:25] (CR) Dzahn: "follow-up 2 https://gerrit.wikimedia.org/r/#/c/325073/" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[01:25:28] (CR) Dzahn: "works now:" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[01:25:39] paladox: that fixed the ferm thing
[01:25:53] mutante :)
[01:25:57] is it syncing the repos
[01:26:00] ?
[01:26:01] the /32 wasn't supposed to be there, that was just with the IP address
[01:26:02] and thanks
[01:26:07] yep
[01:27:12] it does not make puppet run a sync, it lets puppet setup the config so that humans are permitted to do so
[01:27:20] oh
[01:27:26] ok
[01:29:06] just saw another thing we need to fix , but no rush to it
[01:29:24] progress was made
[01:33:35] Operations, Analytics-Kanban, Discovery, Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2843314 (mpopov) Open>Resolved >>! In T147682#2837271, @Ottomata wrote: > YESSHHHHH...
[01:41:42] (PS1) Dzahn: phab2001: fix hosts_allowed in rsync config [puppet] - https://gerrit.wikimedia.org/r/325077 (https://phabricator.wikimedia.org/T137928)
[01:42:05] (CR) Dzahn: "the ferm part worked, but the rsyncd part did not.
follow-up 3: https://gerrit.wikimedia.org/r/#/c/325077/" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[01:42:56] (CR) jenkins-bot: [V: -1] phab2001: fix hosts_allowed in rsync config [puppet] - https://gerrit.wikimedia.org/r/325077 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[01:43:32] (PS2) Dzahn: phab2001: fix hosts_allowed in rsync config [puppet] - https://gerrit.wikimedia.org/r/325077 (https://phabricator.wikimedia.org/T137928)
[01:47:19] (CR) Dzahn: [C: 2] phab2001: fix hosts_allowed in rsync config [puppet] - https://gerrit.wikimedia.org/r/325077 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[01:54:16] (CR) Dzahn: "rsync ok, better now, but we still need to add IPv6 too" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[02:05:19] (PS1) Dzahn: phab2001: allow rsync from iridium over IPv6 too [puppet] - https://gerrit.wikimedia.org/r/325079 (https://phabricator.wikimedia.org/T137928)
[02:06:18] (CR) jenkins-bot: [V: -1] phab2001: allow rsync from iridium over IPv6 too [puppet] - https://gerrit.wikimedia.org/r/325079 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[02:07:41] (PS2) Dzahn: phab2001: allow rsync from iridium over IPv6 too [puppet] - https://gerrit.wikimedia.org/r/325079 (https://phabricator.wikimedia.org/T137928)
[02:08:39] (CR) jenkins-bot: [V: -1] phab2001: allow rsync from iridium over IPv6 too [puppet] - https://gerrit.wikimedia.org/r/325079 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[02:09:55] (PS3) Dzahn: phab2001: allow rsync from iridium over IPv6 too [puppet] - https://gerrit.wikimedia.org/r/325079 (https://phabricator.wikimedia.org/T137928)
[02:10:29] (CR) Paladox: phab2001: allow rsync from iridium over IPv6 too (1 comment) [puppet] - https://gerrit.wikimedia.org/r/325079
(https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[02:11:05] (CR) Dzahn: [C: 2] phab2001: allow rsync from iridium over IPv6 too [puppet] - https://gerrit.wikimedia.org/r/325079 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[02:12:08] RECOVERY - Check systemd state on phab2001 is OK: OK - running: The system is fully operational
[02:15:08] PROBLEM - Check systemd state on phab2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:16:15] why do you keep forgetting my ACK icinga
[02:18:04] (CR) Dzahn: "follow-up 4: https://gerrit.wikimedia.org/r/#/c/325079/" [puppet] - https://gerrit.wikimedia.org/r/324796 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[02:19:18] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:19:48] !log phab2001 - deleted outdated contents of /srv/repos
[02:20:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:20:10] !log iridium - starting fresh rsync of /srv/repos over to phab2001 as backup
[02:20:16] paladox: ^ resolved.. and good night
[02:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:20:37] mutante thanks and you too
[02:20:40] :)
[02:21:08] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy
[02:21:51] paladox: it's running in a screen, cya
[02:22:00] Oh :) :)
[02:22:01] ok
[02:26:11] ACKNOWLEDGEMENT - Check systemd state on phab2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn not the active server
[02:32:28] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[02:32:51] Operations: sftp gives bogus "Couldn't stat remote file: No such file or directory" - https://phabricator.wikimedia.org/T146509#2843426 (Mattflaschen-WMF) Open>Invalid Sorry, this was indeed a local issue.
[02:46:28] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[03:08:01] !log catrope@tin Synchronized php-1.29.0-wmf.4/extensions/PageImages: SWAT: return any image, not just the non-free image (duration: 01m 31s)
[03:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:10:49] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:25:44] Operations, Phabricator, Release-Engineering-Team, Patch-For-Review: Setup test domain for phab2001 - https://phabricator.wikimedia.org/T152132#2843493 (Dzahn) p:Triage>Normal
[03:38:48] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[03:40:58] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:18] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[04:08:58] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[04:09:49] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=303.60 Read Requests/Sec=350.20 Write Requests/Sec=826.70 KBytes Read/Sec=43740.40 KBytes_Written/Sec=8155.20
[04:18:48] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=11.80 Read Requests/Sec=155.80 Write Requests/Sec=6.20 KBytes Read/Sec=1544.40 KBytes_Written/Sec=512.00
[04:28:18] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[04:47:28] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:48:51] (PS1) Yuvipanda: ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998)
[04:50:50] (PS2) Yuvipanda: ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998)
[04:52:06] (PS3) Yuvipanda: ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998)
[04:57:44] (CR) Aude: "@daniel so we can deploy it asap (e.g.
on Monday) and get the config moved" [mediawiki-config] - https://gerrit.wikimedia.org/r/323556 (https://phabricator.wikimedia.org/T111023) (owner: Aude)
[04:57:46] (PS4) Yuvipanda: ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998)
[04:58:45] (CR) jenkins-bot: [V: -1] ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998) (owner: Yuvipanda)
[04:59:11] (PS7) Aude: Move interwiki sorting orders to config [mediawiki-config] - https://gerrit.wikimedia.org/r/323556 (https://phabricator.wikimedia.org/T111023)
[05:01:08] (PS5) Yuvipanda: ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998)
[05:05:01] (CR) Yuvipanda: [C: 2] ssh: Disable challenge response auth for labs [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998) (owner: Yuvipanda)
[05:05:21] (CR) Yuvipanda: "puppet compiler says nop on prod, tested on tools." [puppet] - https://gerrit.wikimedia.org/r/325086 (https://phabricator.wikimedia.org/T147998) (owner: Yuvipanda)
[05:15:28] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:24:28] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[06:53:28] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:56:28] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures.
Failed resources (up to 3 shown): Package[tree]
[07:24:28] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[08:14:48] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:43:48] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[09:02:08] RECOVERY - Check systemd state on phab2001 is OK: OK - running: The system is fully operational
[09:24:28] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:52:28] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[10:50:48] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:52:48] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:58:48] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:22:18] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:22:48] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:31:28] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[14:42:08] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[14:43:08] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4826371 keys, up 33 days 6 hours - replication_delay is 38
[14:51:18] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[14:51:48] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[15:00:28] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[15:04:58] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[15:32:58] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[16:04:08] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[16:05:08] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4826455 keys, up 33 days 7 hours - replication_delay is 0
[16:38:50] Blocked-on-Operations, DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#2844183 (Krenair)
[16:55:11] Puppet, Labs, Labs-Infrastructure: realm.pp: "Data retrieved from Toolsbeta is String not Hash" if not defined in Hiera - https://phabricator.wikimedia.org/T152142#2844202 (scfc) This happens with standalone puppetmasters as well, and `/var/log/syslog` then says: ``` Dec 3 16:29:41
toolsbeta-puppet...
[17:18:13] any recruiters here around for wikimedia?
[17:20:04] this is the wrong sort of channel for that, mostly sysadmins, devs and volunteers hang out here
[17:20:30] got it
[17:31:48] RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover servers up 2 down 0
[17:32:12] that's me^
[17:34:10] (CR) MarcoAurelio: "I'd rather not complicate more this scenario and would say to finish the whole table population on all the wikis that need to be done and " [mediawiki-config] - https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: MarcoAurelio)
[17:35:47] ok
[18:00:48] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:03:58] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:19:50] ton31337: fwiw, https://wikimediafoundation.org/wiki/Work_with_us has a list of open positions
[18:21:21] (PS1) Alex Monk: Follow-up I3b706396: no more wg = wmg for this variable [mediawiki-config] - https://gerrit.wikimedia.org/r/325116
[18:24:06] valhallasw`cloud: +1
[18:28:48] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[18:32:58] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[18:39:01] (PS2) Alex Monk: deployment-prep: Follow-up I3b706396: no more wg = wmg for this variable [mediawiki-config] - https://gerrit.wikimedia.org/r/325116
[18:39:03] (PS1) Alex Monk: deployment-prep: Follow-up Iaff51065: CONTENT_MODEL_FLOW_BOARD is no longer set by Flow.php [mediawiki-config] - https://gerrit.wikimedia.org/r/325119
[18:40:11] (CR) Alex Monk: [C: 2] deployment-prep: Follow-up I3b706396: no more wg = wmg for this variable [mediawiki-config] -
https://gerrit.wikimedia.org/r/325116 (owner: Alex Monk)
[18:40:19] (CR) Alex Monk: [C: 2] deployment-prep: Follow-up Iaff51065: CONTENT_MODEL_FLOW_BOARD is no longer set by Flow.php [mediawiki-config] - https://gerrit.wikimedia.org/r/325119 (owner: Alex Monk)
[18:40:50] (Merged) jenkins-bot: deployment-prep: Follow-up I3b706396: no more wg = wmg for this variable [mediawiki-config] - https://gerrit.wikimedia.org/r/325116 (owner: Alex Monk)
[18:41:10] (Merged) jenkins-bot: deployment-prep: Follow-up Iaff51065: CONTENT_MODEL_FLOW_BOARD is no longer set by Flow.php [mediawiki-config] - https://gerrit.wikimedia.org/r/325119 (owner: Alex Monk)
[18:43:24] !log krenair@tin Synchronized wmf-config/CommonSettings-labs.php: no-op in prod, this file is not loaded, for https://gerrit.wikimedia.org/r/#/c/325119/ (duration: 00m 45s)
[18:43:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:29] !log krenair@tin Synchronized wmf-config/mobile-labs.php: no-op in prod, this file is not loaded, for https://gerrit.wikimedia.org/r/#/c/325116/ (duration: 00m 45s)
[18:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:14] okay, that made it much quieter
[20:12:15] (PS1) Alex Monk: Follow-up I863367b8, Ic9db0829: These two commits conflicted [puppet] - https://gerrit.wikimedia.org/r/325122
[20:13:09] (CR) Alex Monk: "This was broken upon merge due to conflict with I863367b8" [puppet] - https://gerrit.wikimedia.org/r/304263 (https://phabricator.wikimedia.org/T141785) (owner: Thcipriani)
[20:13:43] (PS2) Alex Monk: Follow-up I863367b8, Ic9db0829: These two commits conflicted [puppet] - https://gerrit.wikimedia.org/r/325122 (https://phabricator.wikimedia.org/T141785)
[20:54:29] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 23 failures. Last run 2 minutes ago with 23 failures.
Failed resources (up to 3 shown): Package[apt-listchanges],Package[ethtool],Package[tshark],Package[atop]
[21:22:28] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[21:41:18] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[21:42:18] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4825016 keys, up 33 days 13 hours - replication_delay is 45
[22:11:38] Puppet, Labs, Labs-Infrastructure: mwyaml chokes on existing, but empty Hiera: pages on wikitech - https://phabricator.wikimedia.org/T152142#2844932 (scfc) p:Triage>High a:scfc
[22:21:32] Puppet, Labs, Labs-Infrastructure: mwyaml chokes on existing, but empty Hiera: pages on wikitech - https://phabricator.wikimedia.org/T152142#2844937 (scfc) The issue is caused by existing, but empty `Hiera:` pages and a very misleading (:-)) difference between code and error message in `modules/wmfli...
[22:46:18] (PS1) Tim Landscheidt: mwyaml: Accept existing, but empty "Hiera:" pages as well [puppet] - https://gerrit.wikimedia.org/r/325131
[22:47:26] (CR) Tim Landscheidt: "Tested on Toolsbeta." [puppet] - https://gerrit.wikimedia.org/r/325131 (owner: Tim Landscheidt)
[22:48:28] (CR) jenkins-bot: [V: -1] mwyaml: Accept existing, but empty "Hiera:" pages as well [puppet] - https://gerrit.wikimedia.org/r/325131 (owner: Tim Landscheidt)
[22:52:35] Puppet, Labs, Labs-Infrastructure: mwyaml chokes on existing, but empty Hiera: pages on wikitech - https://phabricator.wikimedia.org/T152142#2844998 (scfc) (AFAIUI, after deploying the change, the Labs puppetmaster needs to be restarted (`service apache2 restart`) because Puppet/Ruby does not reload...
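The mwyaml problem reported in T152142 (an existing but empty `Hiera:` page producing a value that is not a Hash) comes down to normalising whatever the YAML parser returns. The sketch below illustrates that idea in Python; it is not the Puppet/Ruby code changed in https://gerrit.wikimedia.org/r/325131, and `normalize_hiera_data` is a hypothetical name. It assumes only that an empty page parses to an empty or false-like value.

```python
def normalize_hiera_data(parsed):
    """Map a parsed Hiera page to a dict, treating an empty page as {}."""
    # An empty page may parse to None, False or "" depending on the YAML
    # library; treat all of these as "no data" rather than as an error.
    if parsed is None or parsed is False or parsed == "":
        return {}
    # Anything else that is not a mapping is a genuine data problem.
    if not isinstance(parsed, dict):
        raise TypeError(f"Hiera data is {type(parsed).__name__}, not a mapping")
    return parsed


if __name__ == "__main__":
    print(normalize_hiera_data(None))        # empty page
    print(normalize_hiera_data({"a": "b"}))  # regular page
```

Separating "empty" from "wrong type" keeps the error message accurate for the second case, which is the kind of mismatch between code and message that the task comment points at.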
[22:53:23] Operations, Parsing-Team, Release-Engineering-Team, HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2828601 (Theklan) Hello, I'm Theklan, one of the admins of eu:wp. We have been making some changes in Modulu:Wikidata, in order to fully use Wikidata's pow...
[22:53:48] (PS2) Tim Landscheidt: mwyaml: Accept existing, but empty "Hiera:" pages as well [puppet] - https://gerrit.wikimedia.org/r/325131
[22:58:15] (CR) Tim Landscheidt: "(And tested PS2 as well :-).)" [puppet] - https://gerrit.wikimedia.org/r/325131 (owner: Tim Landscheidt)
[23:27:20] (PS1) Legoktm: admin: Update my (=legoktm) dotfiles [puppet] - https://gerrit.wikimedia.org/r/325134