[00:35:14] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:54:34] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:14] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [01:22:34] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [01:31:44] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:57:24] PROBLEM - puppet last run on db1095 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:59:44] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [02:10:44] 06Operations, 10Ops-Access-Requests: Request to access hadoop (stat1004) for Ladsgroup - https://phabricator.wikimedia.org/T155303#2944015 (10Ladsgroup) Would this be helpful? {T134651} [02:16:14] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [02:21:45] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 07m 20s) [02:21:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:26:07] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 17 02:26:07 UTC 2017 (duration 4m 22s) [02:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:26:24] RECOVERY - puppet last run on db1095 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [02:44:14] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [03:23:24] PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:23:34] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 612.64 seconds [03:26:34] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 270.82 seconds [03:51:24] RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [04:39:54] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [05:04:24] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [05:06:54] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [05:33:24] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:01:44] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:06:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [06:07:44] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave]BR [06:07:54] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave]BR [06:08:54] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [06:09:24] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:09:54] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:12:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [06:12:54] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [06:16:44] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [06:16:54] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, 
dormant: 0, excluded: 0, unused: 0 [06:18:24] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:18:45] 06Operations, 10Traffic: convert dumps to use Letsencrypt for SSL cert (deadline: 2017-04-26) - https://phabricator.wikimedia.org/T154940#2944273 (10Dzahn) a:03Dzahn [06:19:54] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:29:44] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [06:33:34] PROBLEM - Check HHVM threads for leakage on mw1260 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:37:24] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:39:01] <_joe_> wat? [06:39:19] <_joe_> ulsfo issues? [06:42:25] <_joe_> yup, I'd say a network blip probably [06:44:24] PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [06:44:44] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:46:24] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:49:07] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2944299 (10Marostegui) @Papaul ping me today once you are around and have time so we can do all the updates and get this ticket over with Thanks! 
[06:58:53] !log Compressing cebwiki/templatelinks (215G) table on db1038 - T154465 [06:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:59] T154465: Defragment db1038 - https://phabricator.wikimedia.org/T154465 [07:08:52] (03PS2) 10Marostegui: mariadb: Split dbstore role classes [puppet] - 10https://gerrit.wikimedia.org/r/332228 (https://phabricator.wikimedia.org/T130128) [07:09:08] 06Operations, 10MediaWiki-API, 10Traffic: Varnish does not cache Action API responses when logged in - https://phabricator.wikimedia.org/T155314#2944333 (10Tgr) Yeah, [[https://github.com/wikimedia/operations-puppet/blob/fb4d97d7dab38124a0a32a0a6c728f033fac0abf/modules/varnish/templates/text-common.inc.vcl.e... [07:13:34] RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [07:13:53] (03CR) 10Marostegui: "These changes compile fine: https://puppet-compiler.wmflabs.org/5102/" [puppet] - 10https://gerrit.wikimedia.org/r/332228 (https://phabricator.wikimedia.org/T130128) (owner: 10Marostegui) [07:26:17] !log Compressing revision tables db1035 (depooled) [07:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:29:06] (03PS6) 10Giuseppe Lavagetto: base: move to profile [puppet] - 10https://gerrit.wikimedia.org/r/332355 [07:50:47] !log Remove partitions from enwiktionary.templatelinks on dbstore2001 - T154097 [07:50:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:50:51] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [08:00:29] (03PS2) 10Marostegui: mariadb: Enable gtid_domain_id - phabricator hosts [puppet] - 10https://gerrit.wikimedia.org/r/326446 (https://phabricator.wikimedia.org/T149418) [08:05:49] (03CR) 10Alexandros Kosiaris: [C: 032] contint: import rewrite rule from integration/docroot [puppet] - 10https://gerrit.wikimedia.org/r/332385 
(https://phabricator.wikimedia.org/T150727) (owner: 10Hashar) [08:05:55] (03PS2) 10Alexandros Kosiaris: contint: import rewrite rule from integration/docroot [puppet] - 10https://gerrit.wikimedia.org/r/332385 (https://phabricator.wikimedia.org/T150727) (owner: 10Hashar) [08:05:59] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] contint: import rewrite rule from integration/docroot [puppet] - 10https://gerrit.wikimedia.org/r/332385 (https://phabricator.wikimedia.org/T150727) (owner: 10Hashar) [08:09:55] (03PS2) 10Muehlenhoff: hhvm: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/316550 [08:16:24] 06Operations, 10ORES, 10Revision-Scoring-As-A-Service-Backlog: Set up monitoring for ORES redis database - https://phabricator.wikimedia.org/T155482#2944388 (10akosiaris) [08:16:54] (03PS1) 10Alexandros Kosiaris: ores: Monitor the state of the redis databases [puppet] - 10https://gerrit.wikimedia.org/r/332431 (https://phabricator.wikimedia.org/T155482) [08:22:44] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave]BR [08:22:54] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave]BR [08:27:14] (03CR) 10Alexandros Kosiaris: [C: 032] ores: Monitor the state of the redis databases [puppet] - 10https://gerrit.wikimedia.org/r/332431 (https://phabricator.wikimedia.org/T155482) (owner: 10Alexandros Kosiaris) [08:31:30] apache for god sake [08:31:44] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [08:31:54] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, 
dormant: 0, excluded: 0, unused: 0 [08:33:16] !log installing tiff security updates [08:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:04] (03PS1) 10Hashar: contint: properly set REQUEST_FILENAME in vhost [puppet] - 10https://gerrit.wikimedia.org/r/332432 (https://phabricator.wikimedia.org/T150727) [08:41:18] akosiaris: good morning! thx for the apache doc.wm.o merge, though I screwed up the change yesterday :(( [08:41:39] I sent a test/dev copy instead of the one I wanted (I pushed the wrong branch for review) [08:41:47] https://gerrit.wikimedia.org/r/332432 would fix it for realy [08:42:09] !log Compressing wikidatawiki on db1026 - https://phabricator.wikimedia.org/T154929 [08:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:17] hashar: ah indeed. Ok thanks [08:42:21] will merge [08:42:26] (03CR) 10Alexandros Kosiaris: [C: 032] contint: properly set REQUEST_FILENAME in vhost [puppet] - 10https://gerrit.wikimedia.org/r/332432 (https://phabricator.wikimedia.org/T150727) (owner: 10Hashar) [08:43:48] PROBLEM - Redis status tcp_6379 on oresrdb1001 is CRITICAL: Return code of 255 is out of bounds [08:44:08] PROBLEM - Redis status tcp_6380 on oresrdb1001 is CRITICAL: Return code of 255 is out of bounds [08:44:18] PROBLEM - Redis status tcp_6379 on oresrdb1002 is CRITICAL: Return code of 255 is out of bounds [08:44:28] PROBLEM - Redis status tcp_6380 on oresrdb1002 is CRITICAL: Return code of 255 is out of bounds [08:45:27] 06Operations, 10Revision-Scoring-As-A-Service-Backlog: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#2944445 (10Joe) [08:45:29] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2944444 (10Joe) [08:47:28] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK [08:47:47] <_joe_> akosiaris: 
what's up with ores redis? [08:47:56] <_joe_> did you look into it already? [08:48:00] (03PS1) 10Alexandros Kosiaris: Revert "redis::monitoring::instance: partially disable replication checks" [puppet] - 10https://gerrit.wikimedia.org/r/332434 [08:48:07] <_joe_> heh [08:48:09] yes [08:51:34] 06Operations, 10Continuous-Integration-Infrastructure: (Nodepool) CI is really slow tonight - https://phabricator.wikimedia.org/T155444#2944453 (10hashar) 05Open>03Resolved a:03hashar I guess that explains it oojs/ui is an heavy consumer with long running jobs. Taking https://gerrit.wikimedia.org/r/#/c/... [08:52:46] akosiaris: apache made me crazy yesterday afternoon : / Until I actually read mod_rewrite doc. It is all fine now thx! [08:55:33] 06Operations, 10Continuous-Integration-Infrastructure: (Nodepool) CI is really slow tonight - https://phabricator.wikimedia.org/T155444#2943438 (10hashar) ^^ filled as T155483 for later consideration. [08:56:35] hashar: heh, there is a reason mod_rewrite's docs say it's voodoo [08:57:16] akosiaris: I think I will dish out all the crap and use mod_autoindex :] [08:57:26] (03PS7) 10Giuseppe Lavagetto: base: move to profile [puppet] - 10https://gerrit.wikimedia.org/r/332355 [08:58:32] <_joe_> akosiaris: what, mod_rewrite? 
[08:58:41] (03PS3) 10Juniorsys: authdns: Add trailing comma [puppet] - 10https://gerrit.wikimedia.org/r/332093 (https://phabricator.wikimedia.org/T93645) [08:58:46] _joe_: yup [08:58:55] <_joe_> voodoo has at least its own internal logic [08:59:49] (03PS3) 10Juniorsys: bacula module: Trailing commas, full class names [puppet] - 10https://gerrit.wikimedia.org/r/332094 (https://phabricator.wikimedia.org/T93645) [09:00:06] (03PS3) 10Juniorsys: conftool module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332095 (https://phabricator.wikimedia.org/T93645) [09:00:18] (03PS3) 10Juniorsys: contint module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) [09:00:35] (03PS3) 10Juniorsys: diamond module: Add trailing commas [puppet] - 10https://gerrit.wikimedia.org/r/332098 (https://phabricator.wikimedia.org/T93645) [09:00:46] (03PS3) 10Juniorsys: druid module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332099 (https://phabricator.wikimedia.org/T93645) [09:00:56] (03PS3) 10Juniorsys: ganglia module: Use full names for class names [puppet] - 10https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645) [09:01:05] (03PS4) 10Juniorsys: geowiki module: Lint changes + modes/umask quoting [puppet] - 10https://gerrit.wikimedia.org/r/332101 (https://phabricator.wikimedia.org/T93645) [09:01:14] (03PS3) 10Juniorsys: install_server module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332102 [09:01:21] (03PS3) 10Juniorsys: mediawiki module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332103 (https://phabricator.wikimedia.org/T93645) [09:01:29] (03PS3) 10Juniorsys: puppetmaster module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332105 (https://phabricator.wikimedia.org/T93645) [09:01:35] (03PS3) 10Juniorsys: role analytics_cluster: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332106 
(https://phabricator.wikimedia.org/T93645) [09:01:44] (03PS3) 10Juniorsys: site.pp - Use full class names, not relative ones [puppet] - 10https://gerrit.wikimedia.org/r/332107 (https://phabricator.wikimedia.org/T93645) [09:01:51] (03PS3) 10Juniorsys: statistics module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332109 (https://phabricator.wikimedia.org/T93645) [09:01:58] (03PS3) 10Juniorsys: toollabs role modules: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645) [09:02:05] (03PS3) 10Juniorsys: toollabs module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332111 [09:02:10] (03PS3) 10Juniorsys: torrus module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332112 (https://phabricator.wikimedia.org/T93645) [09:02:17] (03PS3) 10Juniorsys: varnish module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332113 (https://phabricator.wikimedia.org/T93645) [09:03:17] (03PS3) 10Juniorsys: postgresql module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645) [09:05:38] (03PS1) 10Alexandros Kosiaris: redis: Allow specifying password for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/332436 [09:05:45] Should I always be adding reviewers to my changes in operations/puppet, or? [09:06:25] there are a few people auto-added as reviewers via https://www.mediawiki.org/wiki/Git/Reviewers#operations.2Fpuppet [09:06:42] friendly12345: I usually look at the last few authors for a given module [09:07:24] Sorry, should I be having this conversation here, or in wikimedia-labs? It seems as though this channel is mostly for icinga alert spam. 
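The reviewer-picking approach in this exchange (look at who recently authored changes to a module, then add them as reviewers) could be scripted roughly as below. This is only a sketch: the function names `parse_shortlog` and `recent_authors` are made up for illustration, the sample addresses are placeholders, and it assumes `git` is on the PATH and the script runs inside a checkout of the repository. It relies on the standard `git shortlog` flags `-s` (commit counts only) and `-e` (include email addresses).

```python
import re
import subprocess
from typing import List, NamedTuple


class Author(NamedTuple):
    commits: int
    name: str
    email: str


# One line of `git shortlog -s -e` looks like: "    12\tSome Name <some@addr>"
_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s+<([^>]*)>\s*$")


def parse_shortlog(output: str) -> List[Author]:
    """Parse `git shortlog -s -e` output, most active author first."""
    authors = []
    for line in output.splitlines():
        match = _LINE.match(line)
        if match:
            authors.append(
                Author(int(match.group(1)), match.group(2), match.group(3))
            )
    return sorted(authors, key=lambda a: a.commits, reverse=True)


def recent_authors(path: str, since: str = "1 year ago", top: int = 2) -> List[Author]:
    """Hypothetical helper: top recent authors of `path` in the current repo."""
    # HEAD is passed explicitly because shortlog otherwise reads a log
    # from stdin when stdin is not a terminal.
    result = subprocess.run(
        ["git", "shortlog", "-s", "-e", f"--since={since}", "HEAD", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_shortlog(result.stdout)[:top]


if __name__ == "__main__":
    for author in recent_authors("modules/postgresql"):
        print(f"{author.commits:5d}  {author.name} <{author.email}>")
```

The printed addresses can then be added by hand as reviewers in Gerrit; nothing here talks to Gerrit itself.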
[09:07:33] with something like: git shortlog -s --since "1 year ago" modules/postgresql [09:07:57] for that module that would list Alexandros Kosiaris and Guillaume Lederrey as top authors [09:08:35] if you pass '-e' to 'git shortlog' it would show the author email address [09:08:42] which you can then use to add that person as a reviewer [09:09:01] (I think there is a Gerrit plugin to do that automatically but it is rather unstable) [09:20:50] hashar: if apache makes you crazy, I suggest the #httpd channel.. A lot of very nice and helpful people in there :) [09:21:28] elukey: will definitely remember about it when I start playing with mod_autoindex ! [09:21:32] you will go crazy anyway for things like mod_rewrite, but at least with some group therapy support :P [09:22:08] let me know if you want anybody to listen to your ideas, always interested in httpd issues :) [09:22:11] !log installing tomcat security updates [09:22:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:25:14] (03CR) 10Alexandros Kosiaris: "Various comments here and there. 
Overall, I think moving the base module to a profile makes sense but this change incorporates more than t" (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/332355 (owner: 10Giuseppe Lavagetto) [09:26:43] (03CR) 10Alexandros Kosiaris: [C: 032] bacula module: Trailing commas, full class names [puppet] - 10https://gerrit.wikimedia.org/r/332094 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [09:27:00] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] bacula module: Trailing commas, full class names [puppet] - 10https://gerrit.wikimedia.org/r/332094 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [09:30:28] RECOVERY - Check HHVM threads for leakage on mw1260 is OK: OK [09:40:54] (03PS4) 10Alexandros Kosiaris: authdns: Add trailing comma [puppet] - 10https://gerrit.wikimedia.org/r/332093 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [09:40:59] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] authdns: Add trailing comma [puppet] - 10https://gerrit.wikimedia.org/r/332093 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [09:44:35] (03PS4) 10Alexandros Kosiaris: conftool module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332095 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [09:44:40] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] conftool module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332095 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [09:47:21] (03PS1) 10Muehlenhoff: Update SSH key for Petr Pechelko [puppet] - 10https://gerrit.wikimedia.org/r/332439 (https://phabricator.wikimedia.org/T155449) [09:49:39] (03CR) 10Alex Monk: [C: 04-1] "Shall we just set openstack::version to mitaka in shinken-01's hiera data in horizon then?" 
[puppet] - 10https://gerrit.wikimedia.org/r/328611 (owner: 10Alex Monk) [09:52:34] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2944557 (10MoritzMuehlenhoff) p:05Triage>03High [09:53:38] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK [09:54:33] (03PS1) 10Marostegui: db-codfw.php: Depool db2041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332441 (https://phabricator.wikimedia.org/T154097) [09:56:38] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332441 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [09:58:15] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332441 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [09:58:26] (03CR) 10jenkins-bot: db-codfw.php: Depool db2041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332441 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [09:58:28] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK [09:59:43] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2041 - T154097 (duration: 00m 38s) [09:59:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:59:47] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [10:02:02] !log Remove partitions from enwiktionary.templatelinks on db2041 - T154097 [10:02:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:00] !log installing libwmf security updates [10:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:18] !log Updating CI Jessie image for NodeJs 4 -> 6 upgrade. 
T155443 [10:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:23] T155443: Update ci to nodejs 6 - https://phabricator.wikimedia.org/T155443 [10:22:25] !log installing jq security updates [10:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:18] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2041" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332444 [10:27:30] (03CR) 10Marostegui: [C: 04-2] "Wait for the alter to finish" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332444 (owner: 10Marostegui) [10:29:36] (03PS8) 10Giuseppe Lavagetto: base: move to profile [puppet] - 10https://gerrit.wikimedia.org/r/332355 [10:31:48] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 24 failures. Last run 2 minutes ago with 24 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [10:32:28] PROBLEM - graphite.wikimedia.org on graphite1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:34:35] 06Operations, 10Analytics, 10ChangeProp, 10Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2944616 (10hashar) [10:34:41] 06Operations, 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: Update ci to nodejs 6 - https://phabricator.wikimedia.org/T155443#2944614 (10hashar) 05Open>03Resolved a:03Paladox [10:34:45] (03PS4) 10Juniorsys: contint module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) [10:34:47] (03PS4) 10Juniorsys: diamond module: Add trailing commas [puppet] - 10https://gerrit.wikimedia.org/r/332098 (https://phabricator.wikimedia.org/T93645) [10:35:35] !log CI switched NodeJS from v4 to v6 T155443 T149331 [10:35:39] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:40] T149331: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331 [10:35:40] T155443: Update ci to nodejs 6 - https://phabricator.wikimedia.org/T155443 [10:37:18] RECOVERY - graphite.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1572 bytes in 2.638 second response time [10:43:25] !log installing file/libmagic security updates [10:43:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:22] (03PS4) 10Juniorsys: druid module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332099 (https://phabricator.wikimedia.org/T93645) [10:48:38] (03PS4) 10Juniorsys: ganglia module: Use full names for class names [puppet] - 10https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645) [10:48:52] (03PS5) 10Juniorsys: geowiki module: Lint changes + modes/umask quoting [puppet] - 10https://gerrit.wikimedia.org/r/332101 (https://phabricator.wikimedia.org/T93645) [10:48:55] 06Operations, 10Analytics, 10ChangeProp, 10Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2944635 (10hashar) The CI Jessie instances are now using NodeJS version 6 as provided by apt.wikimedia.org. I kind of freaked out yesterday until I saw this task and all th... 
[10:49:06] (03PS4) 10Juniorsys: install_server module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332102 [10:49:20] (03PS4) 10Juniorsys: mediawiki module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332103 (https://phabricator.wikimedia.org/T93645) [10:50:46] 06Operations, 06Analytics-Kanban: Open temporary access from analytics vlan to new-labsdb one - https://phabricator.wikimedia.org/T155487#2944637 (10JAllemandou) [10:51:18] (03PS4) 10Juniorsys: postgresql module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645) [10:52:57] (03PS4) 10Juniorsys: puppetmaster module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332105 (https://phabricator.wikimedia.org/T93645) [10:54:15] (03CR) 10Hashar: [C: 031] diamond module: Add trailing commas [puppet] - 10https://gerrit.wikimedia.org/r/332098 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [10:55:13] (03PS4) 10Juniorsys: role analytics_cluster: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645) [10:55:26] (03PS4) 10Juniorsys: site.pp - Use full class names, not relative ones [puppet] - 10https://gerrit.wikimedia.org/r/332107 (https://phabricator.wikimedia.org/T93645) [10:55:34] (03CR) 10Marostegui: [C: 032] "Maintenance finished and lag gone" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332444 (owner: 10Marostegui) [10:56:36] (03CR) 10Hashar: [C: 04-1] "Almost :] proxy_common is a relative one and part of the contint module. Beside that looks good." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [10:57:12] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2041" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332444 (owner: 10Marostegui) [10:58:02] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2041" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332444 (owner: 10Marostegui) [10:58:26] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2041 - T154097 (duration: 00m 38s) [10:58:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:31] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [10:58:45] (03PS4) 10Juniorsys: statistics module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332109 (https://phabricator.wikimedia.org/T93645) [10:58:48] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:59:11] 06Operations, 06Analytics-Kanban, 10netops: Open temporary access from analytics vlan to new-labsdb one - https://phabricator.wikimedia.org/T155487#2944653 (10elukey) p:05Triage>03Low a:05MoritzMuehlenhoff>03None [11:01:16] (03PS4) 10Juniorsys: toollabs role modules: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645) [11:01:27] (03PS4) 10Juniorsys: toollabs module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332111 [11:01:44] (03PS4) 10Juniorsys: torrus module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332112 (https://phabricator.wikimedia.org/T93645) [11:02:59] (03PS4) 10Juniorsys: varnish module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332113 (https://phabricator.wikimedia.org/T93645) [11:05:53] 06Operations, 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling: 
Update npm to 3 or 4 - https://phabricator.wikimedia.org/T155488#2944663 (10Paladox) [11:07:31] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "redis::monitoring::instance: partially disable replication checks" [puppet] - 10https://gerrit.wikimedia.org/r/332434 (owner: 10Alexandros Kosiaris) [11:07:41] (03PS2) 10Alexandros Kosiaris: Revert "redis::monitoring::instance: partially disable replication checks" [puppet] - 10https://gerrit.wikimedia.org/r/332434 [11:07:45] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "redis::monitoring::instance: partially disable replication checks" [puppet] - 10https://gerrit.wikimedia.org/r/332434 (owner: 10Alexandros Kosiaris) [11:08:11] 06Operations, 06Analytics-Kanban, 10netops: Open temporary access from analytics vlan to new-labsdb one - https://phabricator.wikimedia.org/T155487#2944677 (10JAllemandou) [11:08:17] (03PS1) 10Marostegui: db-codfw.php: Depool db2049 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332445 (https://phabricator.wikimedia.org/T154097) [11:08:21] (03CR) 10Hashar: [C: 031] "Looks fine and the puppet noop tests in modules/torrus/tests/ pass as well." 
[puppet] - 10https://gerrit.wikimedia.org/r/332112 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [11:10:45] (03PS2) 10ArielGlenn: make (most) snapshot shell scripts files instead of templates [puppet] - 10https://gerrit.wikimedia.org/r/328158 [11:10:52] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2049 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332445 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [11:12:27] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2049 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332445 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [11:13:42] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2049 - T154097 (duration: 00m 38s) [11:13:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:46] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [11:14:41] (03PS3) 10ArielGlenn: make (most) snapshot shell scripts files instead of templates [puppet] - 10https://gerrit.wikimedia.org/r/328158 [11:15:01] !log Remove partitions from enwiktionary.templatelinks on db2049 - T154097 [11:15:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:49] (03CR) 10ArielGlenn: [C: 032] make (most) snapshot shell scripts files instead of templates [puppet] - 10https://gerrit.wikimedia.org/r/328158 (owner: 10ArielGlenn) [11:19:18] snapshots will whine in a minute, I'm on it [11:19:28] !log installing python-werkzeug security updates [11:19:28] PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:40] (03CR) 10jenkins-bot: db-codfw.php: Depool db2049 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332445 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [11:22:54] !log installing tre security updates [11:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:48] 06Operations, 10DNS, 10Traffic, 07Beta-Cluster-reproducible: Ferm/DNS library weirdness on deployment-mediawiki boxes - https://phabricator.wikimedia.org/T153468#2881386 (10MoritzMuehlenhoff) Could report/send the patch upstream? [11:25:57] (03PS1) 10ArielGlenn: change snapshots manifests to use the shell scripts instead of templates [puppet] - 10https://gerrit.wikimedia.org/r/332447 [11:31:45] (03CR) 10ArielGlenn: [C: 032] change snapshots manifests to use the shell scripts instead of templates [puppet] - 10https://gerrit.wikimedia.org/r/332447 (owner: 10ArielGlenn) [11:32:43] !log installing w3m security updates [11:32:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:34:28] RECOVERY - puppet last run on snapshot1007 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:39:27] (03PS1) 10ArielGlenn: convert dump run script to (mostly) use shell vars instead of hiera [puppet] - 10https://gerrit.wikimedia.org/r/332448 [11:40:28] PROBLEM - puppet last run on labstore1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:40:47] (03CR) 10ArielGlenn: [C: 032] convert dump run script to (mostly) use shell vars instead of hiera [puppet] - 10https://gerrit.wikimedia.org/r/332448 (owner: 10ArielGlenn) [11:50:01] (03PS2) 10Giuseppe Lavagetto: base: fix pick_initscript spec [puppet] - 10https://gerrit.wikimedia.org/r/331494 (owner: 10Hashar) [11:51:38] (03CR) 10Giuseppe Lavagetto: [C: 032] base: fix pick_initscript spec [puppet] - 10https://gerrit.wikimedia.org/r/331494 (owner: 10Hashar) [11:53:28] PROBLEM - puppet last run on aluminium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [11:54:17] !log installing potrace security updates [11:54:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:24] PROBLEM - graphite.wikimedia.org on graphite1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:02:14] RECOVERY - graphite.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1572 bytes in 1.481 second response time [12:08:34] RECOVERY - puppet last run on labstore1002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [12:09:09] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332450 [12:12:26] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2049" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332450 (owner: 10Marostegui) [12:14:06] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332450 (owner: 10Marostegui) [12:14:22] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332450 (owner: 10Marostegui) [12:14:54] (03CR) 10Giuseppe Lavagetto: "Some answers to akosiaris' commments, I will incorporate some of his suggestions in my work." 
(035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/332355 (owner: 10Giuseppe Lavagetto) [12:15:08] (03PS3) 10Filippo Giunchedi: cassandra: add jmx_exporter to Cassandra in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/331911 (https://phabricator.wikimedia.org/T155120) [12:16:08] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2049 - T154097 (duration: 00m 47s) [12:16:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:16:13] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [12:17:27] (03CR) 10Filippo Giunchedi: [C: 032] cassandra: add jmx_exporter to Cassandra in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/331911 (https://phabricator.wikimedia.org/T155120) (owner: 10Filippo Giunchedi) [12:18:52] (03CR) 10Giuseppe Lavagetto: "Production seems to compile fine (I corrected the few errors encountered in https://puppet-compiler.wmflabs.org/5103/); I will however do " [puppet] - 10https://gerrit.wikimedia.org/r/332355 (owner: 10Giuseppe Lavagetto) [12:21:34] RECOVERY - puppet last run on aluminium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:23:03] (03PS1) 10Marostegui: db-codfw.php: Depool db2056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332451 (https://phabricator.wikimedia.org/T154097) [12:24:34] (03CR) 10Hashar: [C: 04-1] "That is dirty and imho solely for Mac OS X / Darwin. I would like to find a better fix." 
[puppet] - 10https://gerrit.wikimedia.org/r/331632 (owner: 10Hashar) [12:24:54] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332451 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [12:25:30] (03Abandoned) 10Hashar: (DO NOT SUBMIT) Octopus merge of spec fixes [puppet] - 10https://gerrit.wikimedia.org/r/331850 (owner: 10Hashar) [12:26:02] (03Restored) 10Hashar: (DO NOT SUBMIT) Octopus merge of spec fixes [puppet] - 10https://gerrit.wikimedia.org/r/331850 (owner: 10Hashar) [12:26:22] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332451 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [12:26:45] (03Abandoned) 10Filippo Giunchedi: prometheus: add graphite_exporter support [puppet] - 10https://gerrit.wikimedia.org/r/257860 (https://phabricator.wikimedia.org/T92813) (owner: 10Filippo Giunchedi) [12:26:57] (03Abandoned) 10Filippo Giunchedi: labs: tap metrics towards graphite_exporter [puppet] - 10https://gerrit.wikimedia.org/r/257861 (https://phabricator.wikimedia.org/T92813) (owner: 10Filippo Giunchedi) [12:27:54] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2056 - T154097 (duration: 00m 42s) [12:27:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:58] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [12:28:01] (03PS1) 10Filippo Giunchedi: swift: add ms-fe200[5-8] to site and conftool [puppet] - 10https://gerrit.wikimedia.org/r/332452 (https://phabricator.wikimedia.org/T152612) [12:28:03] (03CR) 10jenkins-bot: db-codfw.php: Depool db2056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332451 (https://phabricator.wikimedia.org/T154097) (owner: 10Marostegui) [12:28:39] (03PS3) 10Hashar: (DO NOT SUBMIT) Octopus merge of spec fixes [puppet] - 10https://gerrit.wikimedia.org/r/331850 [12:29:05] 
!log Remove partitions from enwiktionary.templatelinks on db2056 - T154097 [12:29:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:13] (03PS14) 10Hashar: Modification of Rakefile spec entry point [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [12:29:39] (03PS9) 10Hashar: Use rake tasks to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223 [12:35:01] (03PS4) 10Giuseppe Lavagetto: Add 'webp' package to ImageMagick role [puppet] - 10https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: 10Brion VIBBER) [12:35:50] (03CR) 10Filippo Giunchedi: [C: 032] swift: add ms-fe200[5-8] to site and conftool [puppet] - 10https://gerrit.wikimedia.org/r/332452 (https://phabricator.wikimedia.org/T152612) (owner: 10Filippo Giunchedi) [12:41:07] (03PS5) 10Giuseppe Lavagetto: Add 'webp' package to ImageMagick role [puppet] - 10https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: 10Brion VIBBER) [12:41:15] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add 'webp' package to ImageMagick role [puppet] - 10https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397) (owner: 10Brion VIBBER) [12:51:44] godog: we have already a jmx_exporter? \o/ [12:52:37] elukey: heheh not yet deployed no, but I've put in the scaffolding [12:52:47] very ince [12:52:49] *nice [12:52:50] elukey: it is running for cassandra in deployment-prep though [12:52:55] <_joe_> is it using jmxtrans? [12:53:07] <_joe_> I didn't check the code tbh [12:53:24] let me know when you are ready to test it on other java deployments, looking forward to test it on hadoop and kafka [12:53:37] no we've tried jmx_exporter first, https://grafana-labs.wikimedia.org/dashboard/db/cassandra [12:54:11] elukey: ok! 
the missing bits are essentially deploying the jar and adding it to the jvm command line [12:54:35] the related task is T155120 [12:54:36] T155120: Enable Prometheus metrics export for Cassandra - https://phabricator.wikimedia.org/T155120 [12:54:46] super [13:00:15] (03PS1) 10Muehlenhoff: Grant temporary access to labsdb replica from Hadoop cluster [puppet] - 10https://gerrit.wikimedia.org/r/332457 (https://phabricator.wikimedia.org/T155487) [13:01:05] (03CR) 10jerkins-bot: [V: 04-1] Grant temporary access to labsdb replica from Hadoop cluster [puppet] - 10https://gerrit.wikimedia.org/r/332457 (https://phabricator.wikimedia.org/T155487) (owner: 10Muehlenhoff) [13:01:46] (03PS2) 10Muehlenhoff: Grant temporary access to labsdb replica from Hadoop cluster [puppet] - 10https://gerrit.wikimedia.org/r/332457 (https://phabricator.wikimedia.org/T155487) [13:03:41] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2056" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332458 [13:03:54] (03CR) 10Marostegui: [C: 04-2] "Wait for the alter to finish and the server to catch up" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332458 (owner: 10Marostegui) [13:16:40] 06Operations, 06Commons, 10MediaWiki-File-management, 06Multimedia, 05MW-1.28-release-notes: File does not thumbnail, doesn't have extracted metadata, has reported zero width/height (due to garbage bytes between JPEG sections) - https://phabricator.wikimedia.org/T148606#2944817 (10MoritzMuehlenhoff) [13:18:11] (03CR) 10Giuseppe Lavagetto: "Looking at all the code in base, two other classes are properly profiles, imho:" [puppet] - 10https://gerrit.wikimedia.org/r/332355 (owner: 10Giuseppe Lavagetto) [13:18:19] 06Operations, 10DNS, 10Traffic, 07Beta-Cluster-reproducible, 07Upstream: Ferm/DNS library weirdness on deployment-mediawiki boxes - https://phabricator.wikimedia.org/T153468#2944820 (10Krenair) Yeah. 
The real bug seems to be in Net::DNS, the trailing full stop config change + patch above for ferm is just... [13:23:44] 06Operations, 10DNS, 10Traffic, 07Beta-Cluster-reproducible, 07Upstream: Ferm/DNS library weirdness on deployment-mediawiki boxes - https://phabricator.wikimedia.org/T153468#2944830 (10Krenair) > You have successfully confirmed your subscription request to the mailing list net-dns-users, however final ap... [13:27:50] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:30:07] (03PS5) 10Juniorsys: contint module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) [13:32:03] (03CR) 10Juniorsys: "Should be fixed now" [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [13:33:42] !log installing libpng security updates [13:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:23] Still no deployment calendar on wikitech? [13:56:50] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:06:26] (03PS5) 10Dzahn: install_server module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332102 (owner: 10Juniorsys) [14:09:00] (03CR) 10Dzahn: [C: 032] install_server module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332102 (owner: 10Juniorsys) [14:10:31] !log installing bind9 security updates [14:10:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:45] (03PS5) 10Dzahn: torrus module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332112 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:12:05] Urbanecm: did you get some changes to push today? 
[14:13:39] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/5108/" [puppet] - 10https://gerrit.wikimedia.org/r/332112 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:15:46] (03CR) 10Dzahn: "yea, what is the reason for this?" [puppet] - 10https://gerrit.wikimedia.org/r/331998 (owner: 10Paladox) [14:18:43] (03CR) 10Dzahn: [C: 04-1] ganglia module: Use full names for class names (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:21:55] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/5109/planet1001.eqiad.wmnet/ compiling fails" [puppet] - 10https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:23:26] (03PS5) 10Dzahn: diamond module: Add trailing commas [puppet] - 10https://gerrit.wikimedia.org/r/332098 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:25:23] (03CR) 10Dzahn: [C: 032] diamond module: Add trailing commas [puppet] - 10https://gerrit.wikimedia.org/r/332098 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:28:07] (03CR) 10Dzahn: [C: 031] druid module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332099 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:29:24] (03CR) 10Dzahn: [C: 031] geowiki module: Lint changes + modes/umask quoting [puppet] - 10https://gerrit.wikimedia.org/r/332101 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:30:29] (03CR) 10Hashar: [C: 031] "Noop on the three production servers https://puppet-compiler.wmflabs.org/5110/" [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:30:45] (03CR) 10Hashar: [C: 031] "Thank you JuniorSys :]" [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:32:12] (03PS2) 10Dzahn: Update SSH key for Petr Pechelko [puppet] - 
10https://gerrit.wikimedia.org/r/332439 (https://phabricator.wikimedia.org/T155449) (owner: 10Muehlenhoff) [14:34:06] (03PS1) 10Urbanecm: [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) [14:37:42] (03CR) 10Dzahn: "Labs DNS? both are NXDOMAIN from a random labs instance" [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [14:38:00] PROBLEM - Redis status tcp_6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 602 600 - REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 2958425 keys, up 78 days 6 hours - replication_delay is 602 [14:38:16] hashar, sorry for my lateness. I wish to deploy: T155309, T155321, T152296, T155493. Patch numbers: 332053 ,332325, 332329, 332468 [14:38:17] T152296: Review the 'botadmin' group at mlwiktionary and mlwikisource - https://phabricator.wikimedia.org/T152296 [14:38:17] T155321: Change the name "Wikipedia" from Latin to Cyrillic "Википедия" on Avar Wikipedia - https://phabricator.wikimedia.org/T155321 [14:38:17] T155309: Please add www.leventhalmap.org to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T155309 [14:38:17] T155493: Account creation throttle exception request for 2017-01-24 (#1Lib1Ref) - https://phabricator.wikimedia.org/T155493 [14:38:50] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 653 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2958248 keys, up 78 days 6 hours - replication_delay is 653 [14:39:14] (03PS3) 10Dzahn: restbase: add wikimania2018 [puppet] - 10https://gerrit.wikimedia.org/r/331523 (https://phabricator.wikimedia.org/T155038) [14:39:47] jouncebot: next [14:41:57] (03Abandoned) 10Dzahn: (debug) test removing is_virtual check for ipmi [puppet] - 10https://gerrit.wikimedia.org/r/331574 (owner: 10Dzahn) [14:45:00] RECOVERY - Redis 
status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 2957879 keys, up 78 days 6 hours - replication_delay is 0 [14:46:32] (03CR) 10Dzahn: [C: 032] beta::autoupdater: Stop wmf-beta-mwconfig-update being a template just to get the staging dir [puppet] - 10https://gerrit.wikimedia.org/r/322408 (owner: 10Alex Monk) [14:46:38] (03PS4) 10Dzahn: beta::autoupdater: Stop wmf-beta-mwconfig-update being a template just to get the staging dir [puppet] - 10https://gerrit.wikimedia.org/r/322408 (owner: 10Alex Monk) [14:46:50] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2957861 keys, up 78 days 6 hours - replication_delay is 0 [14:47:19] (03PS4) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [14:49:58] 06Operations, 13Patch-For-Review: Remote IPMI doesn't work for ~17% of the fleet - https://phabricator.wikimedia.org/T150160#2944978 (10Dzahn) a:05Dzahn>03Volans Do you still have the commands you ran for P4379? That should now be empty. [14:52:34] mutante: next won't work as the deployment calendar is empty. [14:52:40] hashar: did you receive my message? [14:52:46] Urbanecm_: gotcha, thx [14:52:52] mutante: yw [14:53:54] (03PS1) 10Dzahn: install_server: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/332470 (https://phabricator.wikimedia.org/T154164) [14:54:03] (03CR) 10Elukey: [C: 04-1] "Still WIP" [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: 10Elukey) [14:55:12] (03CR) 10Dzahn: "when is the next deployment here?" 
[puppet] - 10https://gerrit.wikimedia.org/r/331523 (https://phabricator.wikimedia.org/T155038) (owner: 10Dzahn) [14:57:05] (03PS1) 10Dzahn: remove stat1001, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/332472 (https://phabricator.wikimedia.org/T154164) [14:57:20] hashar, are you here? [14:57:36] (03CR) 10Hashar: "Tyler/Chad would confirm, but I think we can remove the whole hieradata/labs/staging/ hierarchy. The project has been abandoned iirc." [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [14:57:39] (03CR) 10Dzahn: [C: 032] install_server: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/332470 (https://phabricator.wikimedia.org/T154164) (owner: 10Dzahn) [14:57:45] (03PS2) 10Dzahn: install_server: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/332470 (https://phabricator.wikimedia.org/T154164) [15:00:06] 06Operations, 10ops-eqiad, 10hardware-requests, 13Patch-For-Review: Reclaim/Decommission (specify) stat1001 - https://phabricator.wikimedia.org/T154164#2944993 (10Dzahn) p:05Triage>03Normal [15:01:43] !log installing bash security updates [15:01:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:00] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. 
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [15:06:33] (03Abandoned) 10Hashar: aptrepo: fix spec on Mac OS X [puppet] - 10https://gerrit.wikimedia.org/r/331632 (owner: 10Hashar) [15:11:19] Dereckson poke [15:14:23] (03PS2) 10Filippo Giunchedi: swift: add systemd unit file for proxy-server [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) [15:14:55] (03PS4) 10Hashar: Jenkins integration of rspec [puppet] - 10https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342) [15:16:38] Steinsplitter: ping [15:16:46] How can I help you? [15:16:59] (03PS5) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [15:18:57] (03CR) 10Muehlenhoff: swift: add systemd unit file for proxy-server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [15:19:04] (03PS6) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [15:19:21] Dereckson: hi :), Are you familiar with global rename? [15:23:08] (03CR) 10Filippo Giunchedi: swift: add systemd unit file for proxy-server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [15:27:17] Steinsplitter: generally, lego.ktm is happy to overview the rename I think. If it's urgent, I've watched one with Linedwell in October, we can do that together. If it's less urgent, you could prefer to ask lego.ktm. [15:28:26] okay, i poked lego.ktm a few hours ago, he seems away. 
i have one with 52k edits (two k over the limit) [15:28:46] (03PS1) 10Hashar: wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 [15:28:48] (03PS1) 10Hashar: wmflib: update spec still using old install1001 IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/332476 [15:29:10] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/5111/" [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [15:30:01] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [15:31:22] (03CR) 10jerkins-bot: [V: 04-1] wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [15:31:47] 06Operations, 10ops-codfw, 10media-storage: Degraded RAID on ms-be2003 - https://phabricator.wikimedia.org/T155363#2945017 (10Papaul) p:05Triage>03Normal [15:32:44] (03CR) 10jerkins-bot: [V: 04-1] wmflib: update spec still using old install1001 IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/332476 (owner: 10Hashar) [15:33:53] (03Abandoned) 10Hashar: wmflib: update spec still using old install1001 IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/332476 (owner: 10Hashar) [15:34:30] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:09] (03PS2) 10Hashar: wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 [15:37:28] (03CR) 10jerkins-bot: [V: 04-1] wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [15:37:38] (03PS5) 10Hashar: kafka: fix Unrecognized escape sequence '\.' [puppet] - 10https://gerrit.wikimedia.org/r/331451 [15:45:02] back in a while... trying to beat the jetlag by doing a hardware upgrade. 
this can only end badly [15:45:25] (03CR) 10Hashar: "Fails due to conftool which is not available on the instances https://integration.wikimedia.org/ci/job/operations-puppet-rake-jessie/2839/" [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [15:45:26] 06Operations, 10Ops-Access-Requests: Requesting to change the production public key - https://phabricator.wikimedia.org/T155449#2943623 (10mobrovac) Confirming that Petr's laptop broke down and that this is a genuine request. [15:45:28] (03PS1) 10Hashar: contint: add python-conftool [puppet] - 10https://gerrit.wikimedia.org/r/332477 [15:45:44] (03CR) 10Hashar: [C: 04-1] "Needs https://gerrit.wikimedia.org/r/332477 contint: add python-conftool" [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [15:47:02] (03CR) 10Hashar: "The conftool spec fails on https://gerrit.wikimedia.org/r/#/c/332475/2 because the CI nodes lack python-conftool :" [puppet] - 10https://gerrit.wikimedia.org/r/332477 (owner: 10Hashar) [15:49:35] (03CR) 10Muehlenhoff: [C: 032] Update SSH key for Petr Pechelko [puppet] - 10https://gerrit.wikimedia.org/r/332439 (https://phabricator.wikimedia.org/T155449) (owner: 10Muehlenhoff) [15:49:41] (03PS3) 10Muehlenhoff: Update SSH key for Petr Pechelko [puppet] - 10https://gerrit.wikimedia.org/r/332439 (https://phabricator.wikimedia.org/T155449) [15:55:10] (03PS6) 10Dzahn: contint module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [15:55:23] (03CR) 10Andrew Bogott: [C: 032] delete unused ppa keys in files/ppa/ [puppet] - 10https://gerrit.wikimedia.org/r/318451 (owner: 10Dzahn) [15:55:33] (03PS4) 10Andrew Bogott: delete unused ppa keys in files/ppa/ [puppet] - 10https://gerrit.wikimedia.org/r/318451 (owner: 10Dzahn) [15:56:20] 06Operations, 10Ops-Access-Requests: Requesting to change the production public key - https://phabricator.wikimedia.org/T155449#2945102 (10MoritzMuehlenhoff) 
05Open>03Resolved a:03MoritzMuehlenhoff The key change has merged, it may take up to 30 minutes until Puppet has effected the change on all servers. [15:58:32] (03CR) 10Dzahn: [C: 032] contint module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [15:58:41] (03PS7) 10Dzahn: contint module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332096 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [16:00:00] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:02:10] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:02:10] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:02:10] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:02:21] (03CR) 10Andrew Bogott: "> Shall we just set openstack::version to mitaka in shinken-01's hiera data in horizon then" [puppet] - 10https://gerrit.wikimedia.org/r/328611 (owner: 10Alex Monk) [16:02:30] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:02:37] 06Operations: Integrate jessie 8.7 point release - https://phabricator.wikimedia.org/T155401#2945131 (10MoritzMuehlenhoff) These updates are fully rolled out: libwmf irssi python-werkzeug file jq tre potrace w3m ieee-data isn't needed, we had previously already installed the version from stretch into jessie-wi... 
[16:03:00] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [16:03:00] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [16:03:00] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [16:04:08] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [16:05:03] (03CR) 10Paladox: [C: 031] "> yea, what is the reason for this?" [puppet] - 10https://gerrit.wikimedia.org/r/331998 (owner: 10Paladox) [16:06:00] (03PS3) 10Filippo Giunchedi: swift: add systemd unit file for proxy-server [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) [16:08:20] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] swift: add systemd unit file for proxy-server [puppet] - 10https://gerrit.wikimedia.org/r/294517 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [16:10:20] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:13:10] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [16:13:41] (03Abandoned) 10Hashar: Octopus merge of linting fixes [puppet] - 10https://gerrit.wikimedia.org/r/331460 (owner: 10Hashar) [16:14:37] (03Abandoned) 10Paladox: Gerrit: Allow running sudo service gerrit * under the gerrit2 user [puppet] - 10https://gerrit.wikimedia.org/r/331998 (owner: 10Paladox) [16:16:31] !log filippo@puppetmaster1001 conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet [16:16:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:58] (03CR) 10Dzahn: "eh, the entire "staging" has been abandoned??" 
[puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [16:17:00] (03PS7) 10Hashar: puppet parse validate from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) [16:18:01] jouncebot: next [16:18:10] nope, not yet [16:18:56] jouncebot: refresh [16:18:57] I refreshed my knowledge about deployments. [16:19:02] jouncebot: next [16:19:02] In 0 hour(s) and 40 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T1700) [16:19:06] mutante ^^ [16:19:06] :) thanks! [16:19:08] neat :) [16:19:31] 06Operations, 10Traffic, 10Wikidata, 07HTTPS: wikiba.se should use HTTPS - https://phabricator.wikimedia.org/T155359#2945239 (10Izno) [16:19:40] PROBLEM - Swift HTTP frontend on ms-fe3002 is CRITICAL: connect to address 10.20.0.16 and port 80: Connection refused [16:19:50] PROBLEM - Swift HTTP backend on ms-fe3002 is CRITICAL: connect to address 10.20.0.16 and port 80: Connection refused [16:21:10] that's me ^ ms-fe3002 not in service [16:21:59] 06Operations, 10Traffic, 10Wikidata, 07HTTPS: wikiba.se should use HTTPS - https://phabricator.wikimedia.org/T155359#2941299 (10Dzahn) wikiba.se is not owned by WMF and doesn't point to WMF DNS servers and the IP it points to is also not under our control. Therefore i don't really see how this is an Opera... 
[16:22:48] !log Powering off db2060 for maintenance - T154031 [16:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:54] T154031: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031 [16:23:48] 06Operations, 10Traffic, 10Wikidata, 07HTTPS: wikiba.se should use HTTPS - https://phabricator.wikimedia.org/T155359#2945287 (10Dzahn) 05Open>03stalled [16:24:45] jouncebot: next [16:24:46] In 0 hour(s) and 35 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T1700) [16:25:40] RECOVERY - Swift HTTP frontend on ms-fe3002 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.174 second response time [16:25:46] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2056" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332458 (owner: 10Marostegui) [16:25:50] RECOVERY - Swift HTTP backend on ms-fe3002 is OK: HTTP OK: HTTP/1.1 200 OK - 393 bytes in 0.188 second response time [16:26:13] I see no patches for puppet swat, if some line up in the next 35 min I won't be able to attend puppet swat today :( [16:27:26] (03CR) 10Dzahn: "Thanks Andrew :)" [puppet] - 10https://gerrit.wikimedia.org/r/318451 (owner: 10Dzahn) [16:27:48] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2056" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332458 (owner: 10Marostegui) [16:28:04] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2056" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332458 (owner: 10Marostegui) [16:29:00] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:29:31] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2056 - T154097 (duration: 00m 48s) [16:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:35] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 
[16:29:49] (03PS4) 10Dzahn: varnish misc: add phab2001 as a backend for phab-new [puppet] - 10https://gerrit.wikimedia.org/r/324797 (https://phabricator.wikimedia.org/T137928) [16:30:09] (03CR) 10Dzahn: "how's the varnish refactoring going?" [puppet] - 10https://gerrit.wikimedia.org/r/324797 (https://phabricator.wikimedia.org/T137928) (owner: 10Dzahn) [16:34:38] (03CR) 10Dzahn: "so if i see it right "if loghost is set (to any value), then logging to logstash gets enabled". Is that right? In that case this is back t" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:35:21] (03CR) 10Paladox: "Oh, so we want to disable it again?" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:35:59] (03PS2) 10Dzahn: puppetmaster: Indent @ssl_settings in Apache and NGINX configurations [puppet] - 10https://gerrit.wikimedia.org/r/329745 (owner: 10Tim Landscheidt) [16:36:21] (03CR) 10Elukey: [C: 04-1] "So multiplying a hiera value with a float is not well liked by puppet 3.8, since it needs two numbers (and there is not auto-conversion st" [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: 10Elukey) [16:36:49] (03CR) 10Dzahn: "i don't know, you are the one who wants a change:) i'm just pointing out you said last time this is just "adding support" without enabling" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:38:01] (03PS15) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [16:38:07] (03PS16) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [16:38:16] (03CR) 10Paladox: "> i don't know, you are the one who wants a change:) i'm just" 
[puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:39:24] (03PS5) 10Ema: varnish module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332113 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [16:40:14] (03CR) 10Ema: [V: 032 C: 032] "LGTM and to pcc https://puppet-compiler.wmflabs.org/5114/" [puppet] - 10https://gerrit.wikimedia.org/r/332113 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [16:43:23] (03PS5) 10Elukey: druid module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332099 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [16:44:37] (03CR) 10Dzahn: "compiler says it's a change http://puppet-compiler.wmflabs.org/5115/cobalt.wikimedia.org/ because log_host is set in labs??" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:46:04] (03CR) 10Elukey: [C: 032] "LGTM and PCC agrees: https://puppet-compiler.wmflabs.org/5116/" [puppet] - 10https://gerrit.wikimedia.org/r/332099 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [16:47:27] (03CR) 10Paladox: "@Dzahn, yeh it should show the port in jetty, but it won't change log host." 
[puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:47:45] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2060" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332495 [16:48:25] (03CR) 10Dzahn: "nope:" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:49:07] (03CR) 10Dzahn: [C: 032] puppetmaster: Indent @ssl_settings in Apache and NGINX configurations [puppet] - 10https://gerrit.wikimedia.org/r/329745 (owner: 10Tim Landscheidt) [16:49:13] (03PS3) 10Dzahn: puppetmaster: Indent @ssl_settings in Apache and NGINX configurations [puppet] - 10https://gerrit.wikimedia.org/r/329745 (owner: 10Tim Landscheidt) [16:49:16] (03CR) 10Paladox: "Yep, that may be adding an extra line." [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:49:33] (03PS17) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [16:49:53] (03CR) 10Paladox: "@Dzahn could you re run puppet compiler please?" 
[puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:51:53] 06Operations, 10Traffic, 06Wikipedia-iOS-App-Backlog, 10iOS-app-feature-Links: Fix universal link support in iOS when the OS requests the site association file from m.wikipedia.org - https://phabricator.wikimedia.org/T155504#2945413 (10Fjalapeno) [16:51:57] (03PS2) 10Marostegui: Revert "db-codfw.php: Depool db2060" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332495 [16:55:25] (03PS4) 10Dzahn: restbase: add wikimania2018 [puppet] - 10https://gerrit.wikimedia.org/r/331523 (https://phabricator.wikimedia.org/T155038) [16:56:39] (03CR) 10Dzahn: [C: 032] restbase: add wikimania2018 [puppet] - 10https://gerrit.wikimedia.org/r/331523 (https://phabricator.wikimedia.org/T155038) (owner: 10Dzahn) [16:56:43] (03CR) 10Paladox: "@Dzahn looking at https://puppet-compiler.wmflabs.org/5117/cobalt.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:57:41] (03CR) 10Paladox: "Looking at the bottom of puppet compiler i see" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:58:38] (03PS6) 10Anomie: Set $wgSoftBlockRanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324215 [16:59:13] (03CR) 10Dzahn: "done, still creates changes http://puppet-compiler.wmflabs.org/5117/" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:59:21] (03PS18) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [16:59:30] (03CR) 10Paladox: "> done, still creates changes http://puppet-compiler.wmflabs.org/5117/" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [16:59:56] (03CR) 10Paladox: "It's just 
adding a extra line where it is meant to be." [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [17:00:04] godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T1700). Please do the needful. [17:00:28] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2945458 (10Papaul) Firmware update complete. [17:00:45] (03CR) 10Dzahn: "your template says <% if @log_host %> around all the changes, so why would this change if log_host is not true" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [17:01:02] (03PS5) 10Elukey: statistics module: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332109 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [17:01:29] (03CR) 10Paladox: "> your template says <% if @log_host %> around all the changes, so" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [17:02:10] (03CR) 10Dzahn: "line 22 in the erb?" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [17:02:40] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:02:59] (03CR) 10Paladox: "@Dzahn would you be able to re run puppet please?" 
[puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [17:03:34] (03CR) 10Elukey: [C: 032] "LGTM and pcc agrees https://puppet-compiler.wmflabs.org/5118/" [puppet] - 10https://gerrit.wikimedia.org/r/332109 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [17:05:30] RECOVERY - MegaRAID on ms-be2003 is OK: OK: optimal, 13 logical, 13 physical [17:05:52] (03PS19) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [17:08:56] (03CR) 10Chad: [C: 04-1] "Yes, this whole file should be deleted instead." [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [17:10:02] (03PS20) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [17:12:00] 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Interactive-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2945471 (10debt) Hi @timautin - yes, these things can be done as long as there isn't excessive usage, as noted by @Slaporte in the above [[ https://phabricator.wikime... 
[17:13:53] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2945473 (10Papaul) {F5300978} [17:15:54] 06Operations, 10ops-codfw, 10media-storage: Degraded RAID on ms-be2003 - https://phabricator.wikimedia.org/T155363#2945478 (10Papaul) a:05Papaul>03fgiunchedi Disk replacement complete [17:18:49] (03PS1) 10DCausse: [cirrus] Increase weigths for content namespace weight on mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332513 (https://phabricator.wikimedia.org/T155142) [17:19:26] (03PS2) 10DCausse: [cirrus] Increase weigths for content namespaces on mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332513 (https://phabricator.wikimedia.org/T155142) [17:21:48] (03PS4) 10Andrew Bogott: Use LE for wikitech [puppet] - 10https://gerrit.wikimedia.org/r/331638 (https://phabricator.wikimedia.org/T154913) (owner: 10Alex Monk) [17:23:25] (03CR) 10Andrew Bogott: [C: 032] Use LE for wikitech [puppet] - 10https://gerrit.wikimedia.org/r/331638 (https://phabricator.wikimedia.org/T154913) (owner: 10Alex Monk) [17:23:59] (03PS1) 10Urbanecm: Enable subpages in NS_MAIN in eswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332516 (https://phabricator.wikimedia.org/T155498) [17:24:48] jouncebot, next [17:24:48] In 0 hour(s) and 35 minute(s): Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T1800) [17:26:00] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:27:21] 06Operations, 10ops-eqiad, 10hardware-requests, 13Patch-For-Review: decommission stat1001 - https://phabricator.wikimedia.org/T154164#2945535 (10RobH) [17:27:28] (03PS2) 10RobH: remove stat1001, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/332472 (https://phabricator.wikimedia.org/T154164) (owner: 10Dzahn) [17:28:46] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 41 seconds ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ssl/localcerts/wikitech.wikimedia.org.crt] [17:30:26] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:35:34] 06Operations, 10ops-eqiad, 10hardware-requests, 13Patch-For-Review: decommission stat1001 - https://phabricator.wikimedia.org/T154164#2945571 (10RobH) [17:38:37] (03PS1) 10Andrew Bogott: Work around possible bug in ensure=>absent [puppet] - 10https://gerrit.wikimedia.org/r/332519 [17:39:10] (03CR) 10RobH: [C: 032] remove stat1001, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/332472 (https://phabricator.wikimedia.org/T154164) (owner: 10Dzahn) [17:40:45] 06Operations, 10ops-eqiad, 10hardware-requests, 13Patch-For-Review: decommission stat1001 - https://phabricator.wikimedia.org/T154164#2945592 (10RobH) a:03Cmjohnson [17:42:43] (03PS2) 10Andrew Bogott: sslcert::certificate: Clarify ensure=>absent behavior [puppet] - 10https://gerrit.wikimedia.org/r/332519 [17:43:07] 06Operations, 10ops-eqiad, 10hardware-requests: decommission stat1001 - https://phabricator.wikimedia.org/T154164#2945602 (10RobH) [17:44:56] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/etc/ssl/localcerts/labtestwikitech.wikimedia.org.crt] [17:48:15] !log restbase installing node v6.9.1 on the cluster T149331 [17:48:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:19] T149331: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331 [17:49:08] 06Operations, 10Analytics, 10ChangeProp, 10Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2945609 (10mobrovac) a:03mobrovac [17:54:06] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:55:35] (03PS1) 10RobH: remove stat1001 from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/332522 [17:56:49] (03CR) 10Andrew Bogott: [C: 032] "Holy cow, this seems to actually be necessary." [puppet] - 10https://gerrit.wikimedia.org/r/332519 (owner: 10Andrew Bogott) [17:58:44] (03PS2) 10RobH: remove stat1001 from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/332522 [17:59:25] (03CR) 10RobH: [C: 032] remove stat1001 from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/332522 (owner: 10RobH) [18:00:01] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [18:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T1800). Please do the needful. 
[18:00:34] no deploys today [18:00:47] 06Operations, 10ops-eqiad, 10hardware-requests: decommission stat1001 - https://phabricator.wikimedia.org/T154164#2945644 (10RobH) [18:03:27] !log restbase deploying a0e542b, switching to Node v6 T149331 [18:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:32] T149331: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331 [18:04:57] sorry, i meant: no *parsoid* deploys today [18:05:28] (03CR) 10Paladox: "This https://puppet-compiler.wmflabs.org/5120/cobalt.wikimedia.org/ looks better :)" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:09:25] 06Operations, 10MediaWiki-API, 10Traffic: Varnish does not cache Action API responses when logged in - https://phabricator.wikimedia.org/T155314#2945672 (10Anomie) No, because the module might not know in the first place. Look at the crazy things that cause trouble for {T127233}, it's basically the same thin... 
[18:12:01] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [18:18:46] (03PS3) 10Madhuvishy: nfs: Clean up post tools nfs migration [puppet] - 10https://gerrit.wikimedia.org/r/329707 [18:20:02] (03CR) 10Dzahn: [C: 031] "yes, http://puppet-compiler.wmflabs.org/5120/cobalt.wikimedia.org/ does look better" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:20:13] (03PS21) 10Dzahn: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:21:40] (03CR) 10Dzahn: [C: 032] Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:21:59] (03CR) 10Madhuvishy: [C: 032] nfs: Clean up post tools nfs migration [puppet] - 10https://gerrit.wikimedia.org/r/329707 (owner: 10Madhuvishy) [18:22:19] (03PS4) 10Madhuvishy: nfs: Clean up post tools nfs migration [puppet] - 10https://gerrit.wikimedia.org/r/329707 [18:22:30] (03CR) 10Madhuvishy: [V: 032 C: 032] nfs: Clean up post tools nfs migration [puppet] - 10https://gerrit.wikimedia.org/r/329707 (owner: 10Madhuvishy) [18:24:23] (03PS2) 10Tim Landscheidt: openstack: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329744 [18:25:27] (03PS3) 10Dzahn: staging: delete staging/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [18:26:26] (03PS3) 10Dzahn: openstack: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329744 (owner: 10Tim Landscheidt) [18:26:48] (03PS1) 10Ema: Revert "Route around codfw, network issues there" [puppet] - 10https://gerrit.wikimedia.org/r/332526 (https://phabricator.wikimedia.org/T154758) [18:27:22] (03PS1) 
10Filippo Giunchedi: site: add fluorine's roles to mwlog2001 [puppet] - 10https://gerrit.wikimedia.org/r/332527 (https://phabricator.wikimedia.org/T123728) [18:28:29] (03CR) 10Dzahn: [C: 032] openstack: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329744 (owner: 10Tim Landscheidt) [18:29:01] 06Operations, 10Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2945767 (10Andrew) [18:29:01] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:29:04] 06Operations, 10Traffic, 13Patch-For-Review: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2945766 (10Andrew) 05Open>03Resolved [18:29:42] (03CR) 10Chad: [C: 031] staging: delete staging/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [18:30:47] (03PS1) 10Madhuvishy: nfs: Remove tools mount from labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332528 [18:32:16] (03PS2) 10Madhuvishy: nfs: Remove tools mount from labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332528 [18:32:28] (03CR) 10Madhuvishy: [V: 032 C: 032] nfs: Remove tools mount from labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332528 (owner: 10Madhuvishy) [18:34:39] (03PS1) 10Filippo Giunchedi: puppet_compiler: replace rhodium with puppet.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/332529 [18:37:02] (03CR) 10Filippo Giunchedi: [C: 032] puppet_compiler: replace rhodium with puppet.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/332529 (owner: 10Filippo Giunchedi) [18:37:09] (03PS2) 10Filippo Giunchedi: puppet_compiler: replace rhodium with puppet.eqiad.wmnet [puppet] - 
10https://gerrit.wikimedia.org/r/332529 [18:37:18] (03PS1) 10Dzahn: wikitech: remove pre-LE sslcert class [puppet] - 10https://gerrit.wikimedia.org/r/332530 [18:39:00] (03Draft1) 10Paladox: Gerrit: Enable logstash by default for prod gerrit [puppet] - 10https://gerrit.wikimedia.org/r/332531 (https://phabricator.wikimedia.org/T141324) [18:39:04] (03PS2) 10Paladox: Gerrit: Enable logstash by default for prod gerrit [puppet] - 10https://gerrit.wikimedia.org/r/332531 (https://phabricator.wikimedia.org/T141324) [18:41:27] (03PS2) 10Dzahn: wikitech: remove pre-LE sslcert class [puppet] - 10https://gerrit.wikimedia.org/r/332530 (https://phabricator.wikimedia.org/T154913) [18:41:36] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/5123/silver.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/332530 (https://phabricator.wikimedia.org/T154913) (owner: 10Dzahn) [18:42:05] (03CR) 10BBlack: [C: 031] Revert "Route around codfw, network issues there" [puppet] - 10https://gerrit.wikimedia.org/r/332526 (https://phabricator.wikimedia.org/T154758) (owner: 10Ema) [18:42:42] (03CR) 10Dzahn: [C: 032] wikitech: remove pre-LE sslcert class [puppet] - 10https://gerrit.wikimedia.org/r/332530 (https://phabricator.wikimedia.org/T154913) (owner: 10Dzahn) [18:43:04] (03Abandoned) 10Andrew Bogott: Add clientlib.pp and mwopenstackclients.py [puppet] - 10https://gerrit.wikimedia.org/r/325828 (https://phabricator.wikimedia.org/T150092) (owner: 10Andrew Bogott) [18:43:11] (03PS3) 10Dzahn: wikitech: remove pre-LE sslcert class [puppet] - 10https://gerrit.wikimedia.org/r/332530 (https://phabricator.wikimedia.org/T154913) [18:44:48] (03PS1) 10Urbanecm: [throttle] Add a new rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332533 (https://phabricator.wikimedia.org/T155510) [18:48:01] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [18:48:21] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently 
enabled, last run 56 seconds ago with 0 failures [18:49:02] (03CR) 10Dzahn: "confirmed it's also gone from /etc/ssl/private/ on silver" [puppet] - 10https://gerrit.wikimedia.org/r/332530 (https://phabricator.wikimedia.org/T154913) (owner: 10Dzahn) [18:49:23] (03CR) 10Dzahn: "puppet works fine again on silver now" [puppet] - 10https://gerrit.wikimedia.org/r/332530 (https://phabricator.wikimedia.org/T154913) (owner: 10Dzahn) [18:49:33] (03PS1) 10Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - 10https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) [18:49:58] (03PS1) 10Chad: group0 to wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332536 [18:50:04] (03CR) 10Eevans: [C: 04-1] WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - 10https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) (owner: 10Eevans) [18:50:15] (03CR) 10Dzahn: [C: 032] staging: delete staging/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [18:50:17] (03PS2) 10Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - 10https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) [18:50:20] (03PS4) 10Dzahn: staging: delete staging/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [18:50:22] (03CR) 10Chad: [C: 04-2] "Prep for later" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332536 (owner: 10Chad) [18:52:04] (03CR) 10Chad: [C: 032] [throttle] Add a new rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332533 (https://phabricator.wikimedia.org/T155510) (owner: 10Urbanecm) [18:52:11] (03PS2) 10Ema: Revert "Route around codfw, network issues there" [puppet] - 10https://gerrit.wikimedia.org/r/332526 (https://phabricator.wikimedia.org/T154758) [18:52:17] (03CR) 10Chad: [C: 032] 
Change the NS_PROJECT name to "Википедия" on avwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332325 (https://phabricator.wikimedia.org/T155321) (owner: 10Urbanecm) [18:52:20] (03CR) 10Ema: [V: 032 C: 032] Revert "Route around codfw, network issues there" [puppet] - 10https://gerrit.wikimedia.org/r/332526 (https://phabricator.wikimedia.org/T154758) (owner: 10Ema) [18:52:39] (03CR) 10Chad: [C: 032] Add *.leventhalmap.org to the copyupload whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332053 (https://phabricator.wikimedia.org/T155309) (owner: 10Urbanecm) [18:52:43] (03CR) 10Chad: [C: 032] [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) (owner: 10Urbanecm) [18:52:49] (03CR) 10Chad: [C: 032] Enable subpages in NS_MAIN in eswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332516 (https://phabricator.wikimedia.org/T155498) (owner: 10Urbanecm) [18:53:29] 06Operations, 10Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2945933 (10Dzahn) [18:54:23] Urbanecm_: I'm doing your swat early. [18:54:24] (03Merged) 10jenkins-bot: [throttle] Add a new rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332533 (https://phabricator.wikimedia.org/T155510) (owner: 10Urbanecm) [18:54:29] ostriches: thanks [18:54:32] (I hate the 11am tuesday swat, it's a terrible time of day) [18:54:40] (03CR) 10Dzahn: [C: 031] "we can give it a try during the next maintenance, see if the logstash restart from last time was all that was needed or not" [puppet] - 10https://gerrit.wikimedia.org/r/332531 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:54:42] (03CR) 10jenkins-bot: [throttle] Add a new rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332533 (https://phabricator.wikimedia.org/T155510) (owner: 10Urbanecm) [18:54:45] I have 7pm... 
[18:55:13] (03PS1) 10Madhuvishy: nfs: Cleanup maps on labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332537 [18:55:20] (03Merged) 10jenkins-bot: Change the NS_PROJECT name to "Википедия" on avwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332325 (https://phabricator.wikimedia.org/T155321) (owner: 10Urbanecm) [18:55:25] Hehe, well $TIME_OF_DAY on tuesdays are pretty rough, since the train starts an hour later :) [18:55:28] (03CR) 10jerkins-bot: [V: 04-1] [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) (owner: 10Urbanecm) [18:55:44] (03CR) 10jenkins-bot: Change the NS_PROJECT name to "Википедия" on avwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332325 (https://phabricator.wikimedia.org/T155321) (owner: 10Urbanecm) [18:56:07] Should I rebase 332468? [18:56:34] (03PS2) 10Chad: Add *.leventhalmap.org to the copyupload whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332053 (https://phabricator.wikimedia.org/T155309) (owner: 10Urbanecm) [18:56:52] (03PS2) 10Chad: Enable subpages in NS_MAIN in eswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332516 (https://phabricator.wikimedia.org/T155498) (owner: 10Urbanecm) [18:57:12] Urbanecm_: Yeah that one needs a manual rebase [18:57:18] Working on it. 
[18:58:31] (03PS2) 10Urbanecm: [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) [18:58:47] (03CR) 10Rush: [C: 032] nfs: Cleanup maps on labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332537 (owner: 10Madhuvishy) [18:58:57] (03PS2) 10Rush: nfs: Cleanup maps on labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332537 (owner: 10Madhuvishy) [18:59:37] (03CR) 10Madhuvishy: [V: 032] nfs: Cleanup maps on labstore-svc dependency in nfsclient [puppet] - 10https://gerrit.wikimedia.org/r/332537 (owner: 10Madhuvishy) [18:59:59] !log demon@tin Synchronized wmf-config/throttle.php: T155510 throttle rule (duration: 01m 36s) [19:00:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:03] T155510: Requesting temporary lift of IP cap - https://phabricator.wikimedia.org/T155510 [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T1900). [19:00:04] Urbanecm: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
[19:00:08] (03CR) 10jenkins-bot: Add *.leventhalmap.org to the copyupload whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332053 (https://phabricator.wikimedia.org/T155309) (owner: 10Urbanecm) [19:01:00] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: avwiki namespace tweaks, T155321 (duration: 00m 39s) [19:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:05] T155321: Change the name "Wikipedia" from Latin to Cyrillic "Википедия" on Avar Wikipedia - https://phabricator.wikimedia.org/T155321 [19:01:41] (03PS3) 10Chad: Enable subpages in NS_MAIN in eswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332516 (https://phabricator.wikimedia.org/T155498) (owner: 10Urbanecm) [19:01:53] (03CR) 10Chad: [C: 032] [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) (owner: 10Urbanecm) [19:02:14] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Add *.leventhalmap.org to the copyupload whitelist (duration: 00m 39s) [19:02:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:04] 06Operations, 10Traffic, 13Patch-For-Review: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2946029 (10RobH) [19:03:13] (03Merged) 10jenkins-bot: [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) (owner: 10Urbanecm) [19:03:27] (03CR) 10jenkins-bot: [throttle] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332468 (https://phabricator.wikimedia.org/T155493) (owner: 10Urbanecm) [19:04:35] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Enable subpages in NS_MAIN in eswikiversity (duration: 00m 39s) [19:04:38] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:05:34] (03CR) 10jenkins-bot: Enable subpages in NS_MAIN in eswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332516 (https://phabricator.wikimedia.org/T155498) (owner: 10Urbanecm) [19:05:40] !log demon@tin Synchronized wmf-config/throttle.php: throttle rule for T155493 (duration: 00m 40s) [19:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:05:44] T155493: Account creation throttle exception request for 2017-01-24 (#1Lib1Ref) - https://phabricator.wikimedia.org/T155493 [19:05:46] Urbanecm: And that's the last of them. Thanks for playing the swat game :) [19:06:05] !log demon@tin Started scap: testwiki to wmf.8 + rebuild l10n [19:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:19] ostriches, thanks for your deployment! [19:06:28] It was really fast. [19:07:33] (03PS9) 10Andrew Bogott: Move shinkengen from using LDAP to the OpenStack APIs [puppet] - 10https://gerrit.wikimedia.org/r/328611 (owner: 10Alex Monk) [19:07:37] Urbanecm: I was in a rush. 
I don't like that particular swat window because it runs up against the weekly tuesday train :) [19:07:51] (in the future, I'm thinking of killing that *particular* swat window on tuesdays) [19:11:00] (03CR) 10Andrew Bogott: [C: 032] Move shinkengen from using LDAP to the OpenStack APIs [puppet] - 10https://gerrit.wikimedia.org/r/328611 (owner: 10Alex Monk) [19:13:43] (03PS11) 10Volans: Cumin: allow connection to the targets [puppet] - 10https://gerrit.wikimedia.org/r/330436 (https://phabricator.wikimedia.org/T154588) [19:16:08] 06Operations, 10Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2946097 (10RobH) [19:16:51] 06Operations, 10Traffic: convert stream.wikimedia.org from GS to LE certificate - https://phabricator.wikimedia.org/T155524#2946101 (10RobH) [19:18:36] (03PS4) 10Madhuvishy: nfs: Dual mount misc projects from labstore-secondary cluster [puppet] - 10https://gerrit.wikimedia.org/r/329711 (https://phabricator.wikimedia.org/T154336) [19:20:15] (03PS5) 10Dzahn: staging: delete staging/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/328456 (https://phabricator.wikimedia.org/T153608) (owner: 10Tim Landscheidt) [19:24:28] (03PS1) 10Eevans: Prometheus JMX exporter deploy repository [software/prometheus_jmx_exporter] - 10https://gerrit.wikimedia.org/r/332542 (https://phabricator.wikimedia.org/T155120) [19:25:16] .away brb [19:40:16] (03PS1) 10Dzahn: dumps: switch to Letsencrypt for TLS cert [puppet] - 10https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) [19:41:13] (03CR) 10jerkins-bot: [V: 04-1] dumps: switch to Letsencrypt for TLS cert [puppet] - 10https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) (owner: 10Dzahn) [19:43:11] PROBLEM - Nginx local proxy to apache on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:41] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:01] 
PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:45:23] (PS2) Dzahn: dumps: switch to Letsencrypt for TLS cert [puppet] - https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) [19:51:51] (PS3) Dzahn: dumps: switch to Letsencrypt for TLS cert [puppet] - https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) [19:52:27] !log demon@tin Finished scap: testwiki to wmf.8 + rebuild l10n (duration: 46m 21s) [19:52:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:51] (PS4) Dzahn: dumps: switch to Letsencrypt for TLS cert [puppet] - https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) [20:00:04] ostriches: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T2000). [20:00:14] (CR) Chad: [C: 2] group0 to wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/332536 (owner: Chad) [20:00:21] jouncebot: I was born ready [20:00:29] ostriches: hi :] [20:00:38] ostriches: sorry I have just CR+2 a bunch of mw/extension changes [20:01:13] I already created the branch ~1.5h ago [20:01:20] nice [20:01:57] If they're important, we can backport [20:02:01] Otherwise, missed the train :) [20:02:03] nop [20:02:16] that is to add banana i18n and jsonlint [20:02:23] so the CI queue is a bit overloaded [20:02:35] Ohhh, ok [20:02:36] Gotcha [20:02:43] we need moaar nodes [20:02:46] Yeah, only thing that I'll wait on is the one mw-config change [20:02:53] but I cant find a good metrics reflecting the slowdown / delay [20:03:20] mw-config has its own queue and is 2/3 done already [20:03:26] Shouldn't take long [20:07:58] (Merged) jenkins-bot: group0 to wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/332536 (owner: Chad) [20:08:07] hashar: Another option: throttle how fast people can click +2 :P :P
[20:08:09] (CR) jenkins-bot: group0 to wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/332536 (owner: Chad) [20:08:43] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.8 [20:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:45] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2946364 (madhuvishy) [20:09:57] !log restarting hhvm on mw1227 - hhvm-dump-debug in /tmp/hhvm.25127.bt [20:10:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:32] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [20:12:51] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 72554 bytes in 0.446 second response time [20:13:01] RECOVERY - Nginx local proxy to apache on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.025 second response time [20:16:17] F'ing liquidthreads [20:28:01] (CR) RobH: "I've commented in line regarding the certificate path info." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) (owner: Dzahn) [20:30:05] (PS1) Andrew Bogott: Labs salt: Remove salt::reactors [puppet] - https://gerrit.wikimedia.org/r/332554 [20:33:28] (CR) Rush: [C: 2] Labs salt: Remove salt::reactors [puppet] - https://gerrit.wikimedia.org/r/332554 (owner: Andrew Bogott) [20:38:33] (CR) Andrew Bogott: "Puppet compiler run looks basically good to me:" [puppet] - https://gerrit.wikimedia.org/r/332107 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys) [20:39:51] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [20:41:22] !log demon@tin Synchronized php-1.29.0-wmf.8/extensions/LiquidThreads/classes/Hooks.php: Fix warning about pass-by-ref (duration: 00m 40s) [20:41:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:01] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:44:29] (CR) Andrew Bogott: [C: 2] "On a second run that analytics box is a no-op, so probably something just changed upstream mid-test. This patch is fine with me as long a" [puppet] - https://gerrit.wikimedia.org/r/332107 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys) [20:45:09] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2907785 (Neil_P._Quinn_WMF) @madhuvishy Do you know what `editor-engagement` is? Is is the project for http://ee-dashboards.wmflabs.org? I'm trying to figure out... [20:45:31] PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:46:21] RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy [20:47:36] (CR) Alex Monk: [C: -1] "We should be able to set openstack::version safely on labmon1001 and do this, I think" [puppet] - https://gerrit.wikimedia.org/r/328608 (https://phabricator.wikimedia.org/T104575) (owner: Alex Monk) [20:48:01] (CR) Alex Monk: [C: -1] "Not sure if we can safely do this, don't know what openstack-related stuff goes on in labstore hosts" [puppet] - https://gerrit.wikimedia.org/r/328609 (https://phabricator.wikimedia.org/T104575) (owner: Alex Monk) [20:48:36] (CR) Alex Monk: [C: -1] "should be able to make this work by setting openstack::version to mitaka" [puppet] - https://gerrit.wikimedia.org/r/329021 (https://phabricator.wikimedia.org/T104575) (owner: Alex Monk) [20:49:03] (PS5) Andrew Bogott: site.pp - Use full class names, not relative ones [puppet] - https://gerrit.wikimedia.org/r/332107 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys) [20:54:40] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/5126/" [puppet] - https://gerrit.wikimedia.org/r/332527 (https://phabricator.wikimedia.org/T123728) (owner: Filippo Giunchedi) [20:54:51] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [20:58:03] Operations, ops-codfw, media-storage: Degraded RAID on ms-be2003 - https://phabricator.wikimedia.org/T155363#2946635 (fgiunchedi) Open>Resolved Thanks @Papaul !
disk is rebuilding [21:04:00] (PS3) Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) [21:05:39] (PS2) Eevans: Prometheus JMX exporter deploy repository [software/prometheus_jmx_exporter] - https://gerrit.wikimedia.org/r/332542 (https://phabricator.wikimedia.org/T155120) [21:06:51] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:07:25] (PS4) Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) [21:10:01] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:20:11] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 602 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2976056 keys, up 78 days 12 hours - replication_delay is 602 [21:22:51] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:25:21] PROBLEM - Redis status tcp_6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 656 600 - REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 2967329 keys, up 78 days 12 hours - replication_delay is 656 [21:27:21] RECOVERY - Redis status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 2967953 keys, up 78 days 12 hours - replication_delay is 0 [21:28:11] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2967383 keys, up 78 days 13 hours - replication_delay is 0 [21:28:48] (PS1) Chad: MWMultiVersion: Better handling for bogus host headers [mediawiki-config] - https://gerrit.wikimedia.org/r/332643 [21:29:36] (PS3) Filippo Giunchedi: udp2log: move to
service_unit and systemd [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) [21:32:45] bblack: If you've got a second, your thoughts on 332643 would be appreciated. Basically I'm wanting to swap a class of (very very low volume) 500s to 400. [21:32:46] (CR) jerkins-bot: [V: -1] udp2log: move to service_unit and systemd [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) (owner: Filippo Giunchedi) [21:32:55] (CR) 20after4: [C: 1] MWMultiVersion: Better handling for bogus host headers (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/332643 (owner: Chad) [21:33:03] They're low enough volume I'm not anticipating problems, but wanna make sure they won't bust any monitoring you may have in place [21:35:37] (PS1) Andrew Bogott: ldaplist: remove host lookup [puppet] - https://gerrit.wikimedia.org/r/332645 [21:36:46] (PS4) Filippo Giunchedi: udp2log: move to service_unit and systemd [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) [21:43:48] (CR) Chad: MWMultiVersion: Better handling for bogus host headers (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/332643 (owner: Chad) [21:45:31] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2946953 (madhuvishy) @Neil_P._Quinn_WMF This is the link to the project - https://wikitech.wikimedia.org/wiki/Nova_Resource:Editor-engagement. It's described as...
[21:45:36] (CR) Chad: [C: 2] Remove getMediaWikiCli() entry point, unused [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad) [21:47:03] (Merged) jenkins-bot: Remove getMediaWikiCli() entry point, unused [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad) [21:48:18] !log demon@tin Synchronized multiversion/MWVersion.php: Removing old getMediaWikiCli() entry point, unused (duration: 00m 39s) [21:48:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:28] (PS1) Andrew Bogott: Designate: Rename the nova_ldap sink handler to wmf_sink [puppet] - https://gerrit.wikimedia.org/r/332646 [21:54:05] (CR) jenkins-bot: Remove getMediaWikiCli() entry point, unused [mediawiki-config] - https://gerrit.wikimedia.org/r/331933 (owner: Chad) [21:54:41] (PS5) Filippo Giunchedi: udp2log: move to service_unit and systemd [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) [21:55:48] (PS1) Chad: Remove ori's `mw` script [puppet] - https://gerrit.wikimedia.org/r/332648 [21:56:51] PROBLEM - puppet last run on maps1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:59:00] (PS6) Filippo Giunchedi: udp2log: move to service_unit and systemd [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) [21:59:03] (PS1) Chad: Copy getMediaWiki() to MWMultiVersion, test usage [mediawiki-config] - https://gerrit.wikimedia.org/r/332651 [22:00:04] yurik, maxsem, and jgirault: Dear anthropoid, the time has come. Please deploy Interactive teamm depl (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170117T2200).
[22:00:27] i suspect it got autocopied from last week [22:01:00] (PS12) Volans: Cumin: allow connection to the targets [puppet] - https://gerrit.wikimedia.org/r/330436 (https://phabricator.wikimedia.org/T154588) [22:01:38] (PS2) Chad: Copy getMediaWiki() to MWMultiVersion, test usage [mediawiki-config] - https://gerrit.wikimedia.org/r/332651 [22:06:29] AlexZ, by any chance are you around? [22:07:19] (Abandoned) Paladox: Redirect /changes/ to /r/changes/ [puppet] - https://gerrit.wikimedia.org/r/330858 (https://phabricator.wikimedia.org/T154760) (owner: Paladox) [22:07:47] (PS1) BBlack: Offboarding Yuri Astrakhan [puppet] - https://gerrit.wikimedia.org/r/332656 [22:07:55] volans: hi [22:08:25] AlexZ: hi, can I bother you a second about the wikimedia cloak? [22:08:28] (CR) BBlack: [V: 2 C: 2] Offboarding Yuri Astrakhan [puppet] - https://gerrit.wikimedia.org/r/332656 (owner: BBlack) [22:09:32] volans: sure [22:13:22] AlexZ: thanks! See query [22:14:52] (PS7) Filippo Giunchedi: udp2log: move to service_unit and systemd [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) [22:17:26] (PS13) Volans: Cumin: allow connection to the targets [puppet] - https://gerrit.wikimedia.org/r/330436 (https://phabricator.wikimedia.org/T154588) [22:17:42] (CR) Filippo Giunchedi: "PCC for fluorine https://puppet-compiler.wmflabs.org/5134/" [puppet] - https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) (owner: Filippo Giunchedi) [22:21:54] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[enforce-users-groups-cleanup] [22:23:14] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [22:25:54] RECOVERY - puppet last run on maps1002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:26:05] (CR) Chad: [C: 2] Copy getMediaWiki() to MWMultiVersion, test usage [mediawiki-config] - https://gerrit.wikimedia.org/r/332651 (owner: Chad) [22:26:54] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[enforce-users-groups-cleanup] [22:27:15] Operations, Traffic, Wikipedia-iOS-App-Backlog, iOS-app-feature-Links: Fix universal link support in iOS when the OS requests the site association file from m.wikipedia.org - https://phabricator.wikimedia.org/T155504#2947316 (JMinor) p:Triage>Normal [22:27:37] (Merged) jenkins-bot: Copy getMediaWiki() to MWMultiVersion, test usage [mediawiki-config] - https://gerrit.wikimedia.org/r/332651 (owner: Chad) [22:28:06] (CR) jenkins-bot: Copy getMediaWiki() to MWMultiVersion, test usage [mediawiki-config] - https://gerrit.wikimedia.org/r/332651 (owner: Chad) [22:30:00] !log demon@tin Synchronized multiversion: Step 1/∞ of multiversion cleanups (duration: 00m 55s) [22:30:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:34:54] PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[enforce-users-groups-cleanup] [22:35:14] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:37:54] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:39:14] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [22:51:32] (PS1) Chad: Swap rpc/RunJobs.php to use MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332664 [22:51:34] (PS1) Chad: Swap most remaining entry points to using MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/332665 [22:51:36] (PS1) Chad: Turn getMediaWiki() into back-compat to MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332666 [22:58:54] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [23:06:33] (PS1) Chad: Allow MWScript to be called for initialization purposes only [mediawiki-config] - https://gerrit.wikimedia.org/r/332670 [23:07:14] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [23:07:28] (PS2) Chad: Allow MWScript to be called for initialization purposes only [mediawiki-config] - https://gerrit.wikimedia.org/r/332670 [23:08:46] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2947504 (Neil_P._Quinn_WMF) @madhuvishy: okay, thanks! None of that looks like it impacts my work. I'm not sure who else one could talk to about that project, bu...
[23:10:54] (PS1) Chad: Stop using MWMinimalScriptInit [puppet] - https://gerrit.wikimedia.org/r/332673 [23:11:07] (PS1) Chad: Remove MWMinimalScriptInit [mediawiki-config] - https://gerrit.wikimedia.org/r/332674 [23:13:20] (Abandoned) Chad: mwrepl: Use MWScript.php directly [puppet] - https://gerrit.wikimedia.org/r/309605 (owner: Chad) [23:16:30] (CR) Chad: [C: 2] Swap rpc/RunJobs.php to use MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332664 (owner: Chad) [23:18:56] (Merged) jenkins-bot: Swap rpc/RunJobs.php to use MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332664 (owner: Chad) [23:19:06] (CR) jenkins-bot: Swap rpc/RunJobs.php to use MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332664 (owner: Chad) [23:20:21] (PS5) Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) [23:21:12] !log demon@tin Synchronized rpc/RunJobs.php: Step 2/∞ of multiversion cleanups (duration: 00m 39s) [23:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:19] (PS6) Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) [23:21:31] heh, inf [23:26:50] (PS1) Eevans: fix incorrect port in ferm rule [puppet] - https://gerrit.wikimedia.org/r/332682 (https://phabricator.wikimedia.org/T155120) [23:31:46] (Restored) Chad: WIP: Remove mobileportal docroot, unused [mediawiki-config] - https://gerrit.wikimedia.org/r/323999 (owner: Chad) [23:31:59] (PS2) Chad: Remove extra layer of symlink indirection [mediawiki-config] - https://gerrit.wikimedia.org/r/323999 [23:34:02] Operations, CirrusSearch, Discovery, Discovery-Search, Elasticsearch: Puppet changes required for
elasticsearch 5.x upgrade - https://phabricator.wikimedia.org/T155578#2947598 (EBernhardson) [23:37:39] Operations, CirrusSearch, Discovery, Discovery-Search, Elasticsearch: Puppet changes required for elasticsearch 5.x upgrade - https://phabricator.wikimedia.org/T155578#2947612 (EBernhardson) I poked around how this works some. I don't think there will be any problems updating. But someone els... [23:39:08] (CR) Chad: [C: 2] Allow MWScript to be called for initialization purposes only [mediawiki-config] - https://gerrit.wikimedia.org/r/332670 (owner: Chad) [23:40:38] (Merged) jenkins-bot: Allow MWScript to be called for initialization purposes only [mediawiki-config] - https://gerrit.wikimedia.org/r/332670 (owner: Chad) [23:40:49] (CR) jenkins-bot: Allow MWScript to be called for initialization purposes only [mediawiki-config] - https://gerrit.wikimedia.org/r/332670 (owner: Chad) [23:42:02] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: [epic] System level upgrade for cirrus / elasticsearch - https://phabricator.wikimedia.org/T151324#2947636 (EBernhardson) [23:42:04] Operations, CirrusSearch, Discovery, Discovery-Search, Elasticsearch: Puppet changes required for elasticsearch 5.x upgrade - https://phabricator.wikimedia.org/T155578#2947634 (EBernhardson) [23:42:34] !log demon@tin Synchronized multiversion/MWScript.php: Step 3/∞ of multiversion cleanups (duration: 00m 39s) [23:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:43:01] ori: Any chance I could get a +/- 1 (or 2, if you're feeling particularly generous :)) for https://gerrit.wikimedia.org/r/#/c/332648/?
[23:44:29] (CR) Chad: [C: 2] Turn getMediaWiki() into back-compat to MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332666 (owner: Chad) [23:44:43] (CR) Chad: [C: 2] Swap most remaining entry points to using MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/332665 (owner: Chad) [23:46:11] (Merged) jenkins-bot: Swap most remaining entry points to using MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/332665 (owner: Chad) [23:46:23] (Merged) jenkins-bot: Turn getMediaWiki() into back-compat to MWMultiVersion::getMediaWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/332666 (owner: Chad) [23:47:36] !log demon@tin Synchronized w: Step 4/∞ of multiversion cleanups (duration: 00m 39s) [23:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:47:43] ostriches: is mwrepl available on all app servers? [23:47:46] or just deployment hosts? [23:47:50] Hmm, good question! [23:48:00] (CR) jenkins-bot: Swap most remaining entry points to using MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/332665 (owner: Chad) [23:48:56] I would bet only deploy and task servers. That has historically been the only places with mwscript [23:48:59] ori: Looks like everywhere [23:49:17] mwrepl is in mediawiki/init.pp [23:49:26] huh. /me loves being wrong about that [23:49:30] # == Class: mediawiki [23:49:35] include ::mediawiki::mwrepl [23:49:43] does it actually work? [23:49:55] I don't have my yubikey in this computer, so I can't check [23:50:29] I know it works on masters [23:50:32] * ostriches tries a random apache [23:51:03] and can you use it to invoke a php file located anywhere?
(outside the source tree) [23:51:09] /usr/local/bin/mwrepl: line 32: expanddblist: command not found [23:51:10] Error: Unknown wiki: enwiki [23:51:11] Nerppp [23:51:53] yeouch [23:52:11] if it's actively blocking some interface change or removal, sure, nuke it, but if not I'd rather keep it unless mwrepl provides the same functionality [23:52:37] I'll BBL but if you leave a comment on the CL i'll check it later [23:55:33] Operations, fundraising-tech-ops, procurement: ssl renewal : *.frdev.wikimedia.org expires on 2017-02-10 - https://phabricator.wikimedia.org/T155584#2947717 (RobH) [23:58:57] (CR) Chad: "So per IRC and here, this isn't strictly identical to mwrepl, which also isn't available everywhere. It doesn't have to be outright delete" [puppet] - https://gerrit.wikimedia.org/r/332648 (owner: Chad)