[00:32:57] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:00:57] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [01:17:35] 06Operations, 10netops: Packet loss from Voxel to text load balancers - https://phabricator.wikimedia.org/T153998#2927180 (10faidon) 05Open>03stalled This is impossible to debug further without more information. Can we get a complete traceroute (ICMP or UDP, although TCP in addition to those won't hurt) a... [01:18:19] 06Operations, 10ops-ulsfo, 10netops: lvs4002 power supply failure - https://phabricator.wikimedia.org/T151273#2927187 (10faidon) a:03RobH @RobH what's the status of this? [01:18:50] 06Operations, 10netops: Investigate why disabling an uplink port did not deprioritize VRRP on cr2-eqiad - https://phabricator.wikimedia.org/T119759#2927190 (10faidon) 05Open>03Resolved a:03faidon Resolving as per the above. [01:19:33] 06Operations, 10Monitoring, 10netops, 13Patch-For-Review: Juniper monitoring - https://phabricator.wikimedia.org/T83992#2927193 (10faidon) [01:34:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [01:37:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[01:39:57] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave] [01:39:57] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave] [01:57:57] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [01:57:57] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0 [02:08:37] PROBLEM - puppet last run on rdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:36:40] RECOVERY - puppet last run on rdb1008 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [02:41:21] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 27m 03s) [02:41:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:45:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 9 02:45:57 UTC 2017 (duration 4m 36s) [02:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:37:57] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave] [03:37:58] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave] [03:50:57] RECOVERY - Router interfaces on 
cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [03:50:57] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0 [06:33:06] (03PS1) 10Urbanecm: Add en.wikinews and es.wikinews as import source in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331229 (https://phabricator.wikimedia.org/T154879) [06:36:47] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:39:37] PROBLEM - Check HHVM threads for leakage on mw1260 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:42:47] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:44:37] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apt-transport-https] [07:12:05] !log restart elasticsearch on relforge100[12] to adjust ltr logging settings [07:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:37] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [07:31:47] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [07:34:37] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [07:37:37] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:22:47] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK [08:59:47] RECOVERY - Check HHVM threads for leakage on mw1260 is OK: OK [09:04:32] (03CR) 10Alexandros Kosiaris: [C: 032] Fix parameter alignment [puppet] - 10https://gerrit.wikimedia.org/r/330844 (owner: 10KartikMistry) [09:04:37] (03PS2) 10Alexandros Kosiaris: Fix parameter alignment [puppet] - 10https://gerrit.wikimedia.org/r/330844 (owner: 10KartikMistry) [09:04:45] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Fix parameter alignment [puppet] - 10https://gerrit.wikimedia.org/r/330844 (owner: 10KartikMistry) [09:21:35] (03CR) 10Alexandros Kosiaris: [C: 032] puppetmaster: fail on private post-commit hook [puppet] - 10https://gerrit.wikimedia.org/r/330824 (owner: 10Filippo Giunchedi) [09:21:41] (03PS2) 10Alexandros Kosiaris: puppetmaster: fail on private post-commit hook [puppet] - 10https://gerrit.wikimedia.org/r/330824 (owner: 10Filippo Giunchedi) [09:21:45] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] puppetmaster: fail on private post-commit hook [puppet] - 10https://gerrit.wikimedia.org/r/330824 (owner: 10Filippo Giunchedi) [09:30:57] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:37] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [09:36:37] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[09:40:13] (03PS1) 10Hashar: (WIP) puppet parse from rake (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) [09:41:47] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK [09:46:34] (03PS1) 10Odder: Add Collection namespace to the Polish Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331240 (https://phabricator.wikimedia.org/T154711) [09:47:08] (03CR) 10jerkins-bot: [V: 04-1] Add Collection namespace to the Polish Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331240 (https://phabricator.wikimedia.org/T154711) (owner: 10Odder) [09:52:10] (03PS2) 10Odder: Add Collection namespace to the Polish Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331240 (https://phabricator.wikimedia.org/T154711) [09:52:46] (03CR) 10jerkins-bot: [V: 04-1] Add Collection namespace to the Polish Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331240 (https://phabricator.wikimedia.org/T154711) (owner: 10Odder) [09:53:37] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:58:57] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:01:10] (03PS3) 10Odder: Add Collection namespace to the Polish Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331240 (https://phabricator.wikimedia.org/T154711) [10:03:37] PROBLEM - puppet last run on mc1029 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:08:12] (03PS1) 10Alexandros Kosiaris: tegmen: Set do_acme: false [puppet] - 10https://gerrit.wikimedia.org/r/331242 [10:17:04] (03CR) 10Alexandros Kosiaris: [C: 032] Partial revert of I89bd171e (LE part) [puppet] - 10https://gerrit.wikimedia.org/r/330694 (owner: 10Alex Monk) [10:17:09] (03PS2) 10Alexandros Kosiaris: Partial revert of I89bd171e (LE part) [puppet] - 10https://gerrit.wikimedia.org/r/330694 (owner: 10Alex Monk) [10:17:17] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Partial revert of I89bd171e (LE part) [puppet] - 10https://gerrit.wikimedia.org/r/330694 (owner: 10Alex Monk) [10:18:04] 06Operations, 10Citoid, 06Services, 10VisualEditor: NIH db misbehaviour causing problems to Citoid - https://phabricator.wikimedia.org/T133696#2927679 (10Mvolz) I had a case yesterday where I had two different responses for the same request, because in one of them, PubMed timed out. Happy to reduce the tim... [10:19:36] (03CR) 10Alexandros Kosiaris: [C: 032] tegmen: Set do_acme: false [puppet] - 10https://gerrit.wikimedia.org/r/331242 (owner: 10Alexandros Kosiaris) [10:19:40] (03PS2) 10Alexandros Kosiaris: tegmen: Set do_acme: false [puppet] - 10https://gerrit.wikimedia.org/r/331242 [10:19:43] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] tegmen: Set do_acme: false [puppet] - 10https://gerrit.wikimedia.org/r/331242 (owner: 10Alexandros Kosiaris) [10:22:37] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:30:37] RECOVERY - puppet last run on mc1029 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:33:37] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [10:36:37] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[10:38:19] !log restart nginx and rcstream on rcs1001.eqiad.wmnet to debug issue with prematurely closed connections and 502 returned to clients. No change witnessed. [10:38:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:57] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 10 failures. Last run 2 minutes ago with 10 failures. Failed resources (up to 3 shown): Service[salt-minion],Service[ssh],Service[nagios-nrpe-server],Package[tzdata] [10:47:47] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:07:57] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:15:47] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [11:25:56] (03CR) 10Hashar: "Or maybe try using the tasks from 'puppetlabs_spec_helper/rake_tasks'" [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [11:29:28] (03CR) 10Hashar: [C: 04-1] (WIP) puppet parse from rake (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [11:33:37] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [11:36:37] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [11:38:15] (03CR) 10Hashar: [C: 04-1] "So turns out that I have reinvented the wheel. 
puppet-syntax/tasks/puppet-syntax has tasks to check manifests/templates and even hiera fil" [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [11:40:15] (03PS1) 10Juniorsys: Add missing trailing commas to Puppet resources [puppet] - 10https://gerrit.wikimedia.org/r/331247 (https://phabricator.wikimedia.org/T93645) [11:52:13] 06Operations, 10Citoid, 10RESTBase, 10RESTBase-API, and 4 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#2927705 (10Mvolz) [12:33:00] jouncebot: next [12:33:00] In 1 hour(s) and 26 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170109T1400) [12:33:55] hasharLunch: busy eu swat today ^ [12:41:04] 06Operations, 10OTRS, 07Wikimedia-Incident: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2927772 (10akosiaris) Turns out the issue was created by 2 modified GenericAgent tasks ran that ended up, some time after they were ran, consuming most of memory, CPU and finally disk... [12:41:09] 06Operations, 10OTRS, 07Wikimedia-Incident: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2927773 (10akosiaris) 05Open>03Resolved a:03akosiaris [12:43:31] (03CR) 10Alexandros Kosiaris: [C: 032] delete icinga SSL cert, not needed anymore [puppet] - 10https://gerrit.wikimedia.org/r/330957 (owner: 10Dzahn) [12:43:40] (03PS2) 10Alexandros Kosiaris: delete icinga SSL cert, not needed anymore [puppet] - 10https://gerrit.wikimedia.org/r/330957 (owner: 10Dzahn) [12:43:43] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] delete icinga SSL cert, not needed anymore [puppet] - 10https://gerrit.wikimedia.org/r/330957 (owner: 10Dzahn) [12:51:47] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:19:47] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [13:35:57] jouncebot, next [13:35:57] In 0 hour(s) and 24 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170109T1400) [13:55:57] hasharLunch: Urbanecm was busy for eu swat today :) want to do the swat, or should I? [14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170109T1400). [14:00:04] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [14:01:41] Ready for SWAT! [14:02:16] (03CR) 10Alexandros Kosiaris: "Minor comments inline, +1 otherwise" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/330829 (https://phabricator.wikimedia.org/T133717) (owner: 10Dzahn) [14:02:27] o/ [14:02:37] hasharLunch: around for swat? [14:03:00] Urbanecm: if hasharLunch is not around, I will do the swat [14:04:26] ok, looks like he is not around... [14:04:30] I can SWAT today! [14:04:57] Okay :) [14:07:30] Urbanecm: except for the throttle commit, can the rest be tested on mwdebug1002? (once the commits are there) [14:08:28] Except for the whitelist and throttle patches, yes. [14:08:33] (03PS2) 10Zfilipin: Add digitalmedia.fws.gov to the whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330685 (https://phabricator.wikimedia.org/T154671) (owner: 10Urbanecm) [14:09:07] great, will ping you then when the commits are deployed, so you can test [14:09:16] Okay [14:09:47] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:12:32] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330685 (https://phabricator.wikimedia.org/T154671) (owner: 10Urbanecm) [14:13:08] (03Merged) 10jenkins-bot: Add digitalmedia.fws.gov to the whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330685 (https://phabricator.wikimedia.org/T154671) (owner: 10Urbanecm) [14:13:20] (03CR) 10jenkins-bot: Add digitalmedia.fws.gov to the whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330685 (https://phabricator.wikimedia.org/T154671) (owner: 10Urbanecm) [14:14:41] (03PS3) 10Zfilipin: Add HD logos for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330719 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm) [14:16:56] Urbanecm: 330685 is at mwdebug1002, please test [14:19:29] zeljkof, works [14:19:34] didn't gerrit display logos?! I remember seeing them before, but now all I get is "Binary files differ" [14:19:39] https://gerrit.wikimedia.org/r/#/c/330719/3/static/images/project-logos/cswikiquote-1.5x.png,unified [14:19:57] Seems it run "diff" only... [14:20:04] but it did display them. [14:20:22] Urbanecm: ok, deploying 330685 [14:20:46] so, it's not just me not able to see logos in gerrit? it's at least you :) [14:20:50] (you too) [14:21:00] Yes, I get the same message. 
[14:25:07] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:330685|Add digitalmedia.fws.gov to the whitelist (T154671)]] (duration: 02m 38s) [14:25:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:10] T154671: Please add to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T154671 [14:25:53] Urbanecm: ok, scap is a bit slower today, but the first patch is deployed, please test [14:27:23] Works at least at beta [14:29:08] zeljkof, ^ [14:29:16] Urbanecm: great [14:29:21] working on the second one [14:29:24] ok [14:32:41] Urbanecm: had to clone the repo and get the changes to my machines, but the logos look fine, deploying [14:32:58] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330719 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm) [14:33:01] That's great! [14:33:32] (03Merged) 10jenkins-bot: Add HD logos for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330719 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm) [14:33:33] Urbanecm: should I push 330719 (logos) directly to production? there is no need to put them on canary first, right? [14:33:47] (03CR) 10jenkins-bot: Add HD logos for multiple projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330719 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm) [14:33:47] Yeah, you can. [14:37:47] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [14:38:29] (03PS2) 10Zfilipin: Enable Extension:Babel's category on cswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330847 (https://phabricator.wikimedia.org/T67211) (owner: 10Urbanecm) [14:39:00] not sure what happened to scap, it is really slow today... 
[14:39:20] sync-masters usually took a second, now a few minutes [14:39:46] !log zfilipin@tin Synchronized static/images/project-logos: SWAT: [[gerrit:330719|Add HD logos for multiple projects (T150618)]] (duration: 02m 36s) [14:39:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:50] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [14:40:07] Urbanecm: ok, logos synced to prod, please test [14:41:20] Logos are there, I have no HD display so I can't check it fully. [14:41:26] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330847 (https://phabricator.wikimedia.org/T67211) (owner: 10Urbanecm) [14:41:43] Urbanecm: ok, good enough I think :) [14:42:01] (03Merged) 10jenkins-bot: Enable Extension:Babel's category on cswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330847 (https://phabricator.wikimedia.org/T67211) (owner: 10Urbanecm) [14:42:14] (03CR) 10jenkins-bot: Enable Extension:Babel's category on cswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330847 (https://phabricator.wikimedia.org/T67211) (owner: 10Urbanecm) [14:42:57] (03PS3) 10Zfilipin: [throttle] Lift for 2017-01-10/12 + minor cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330855 (https://phabricator.wikimedia.org/T154312) (owner: 10Urbanecm) [14:44:29] Hi. [14:44:45] Urbanecm: 330847 is at mwdebug1002, please test [14:44:46] Urbanecm: do we have everything in the config for https://phabricator.wikimedia.org/T154312 (Indian workshops)? [14:45:50] I see you took care of that with https://gerrit.wikimedia.org/r/#/c/330855/3/wmf-config/throttle.php thanks Urbanecm and zeljkof [14:46:04] Dereckson: working on it now [14:46:21] Dereckson, yes, seems we have everything. [14:46:24] You're welcome :) [14:46:36] BTW they should learn a better way to request this... 
[14:47:07] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330855 (https://phabricator.wikimedia.org/T154312) (owner: 10Urbanecm) [14:47:38] Urbanecm: checked 330847 on mwdebug? [14:47:42] (03Merged) 10jenkins-bot: [throttle] Lift for 2017-01-10/12 + minor cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330855 (https://phabricator.wikimedia.org/T154312) (owner: 10Urbanecm) [14:47:44] can I deploy it? [14:47:45] zeljkof, checking. [14:47:54] Works, please deploy. [14:47:54] (03CR) 10jenkins-bot: [throttle] Lift for 2017-01-10/12 + minor cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330855 (https://phabricator.wikimedia.org/T154312) (owner: 10Urbanecm) [14:48:00] Urbanecm: great, deploying [14:48:18] Thanks [14:51:25] (03PS2) 10Zfilipin: Add DW alias for NS_PROJECT_TALK in frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/328719 (https://phabricator.wikimedia.org/T153952) (owner: 10Urbanecm) [14:51:33] zeljkof, just a notify. After 328719 a script will be needed as there are some pages with DW in the begin. [14:52:03] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:330847|Enable Extension:Babel s category on cswikiversity (T67211)]] (duration: 02m 36s) [14:52:05] Urbanecm: thanks for the heads up [14:52:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:06] T67211: Enable Extension:Babel's category on Czech Wikiversity - https://phabricator.wikimedia.org/T67211 [14:52:31] Some docs are in https://wikitech.wikimedia.org/wiki/Adding_Namespaces [14:52:35] (if you need it) [14:52:43] Urbanecm: I do! :) [14:52:57] 330847 deployed, please test [14:54:28] zeljkof: dereckson@terbium:~$ mwscript namespaceDupes.php frwiki gives *before the change* 0 pages to fix, 0 were resolvable. 0 links to fix, 0 were resolvable. Looks good! as output, so all is good, you won't have any issue running that. 
[14:55:02] Sometimes there are odd cases like pages created twice, one in (main) as "Foo:Bar", one in the namespace "Foo" as "Bar" [14:55:31] in such case, --merge gives good results, but you need to ping a local sysop to check and clean afterwards [14:55:31] Dereckson: thanks! [14:56:15] (generally dumping the list of the output on the task and tell "Please check histories are coherent." works well for that) [14:56:19] zeljkof, 330847 works. [14:58:02] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/328719 (https://phabricator.wikimedia.org/T153952) (owner: 10Urbanecm) [14:58:34] (03Merged) 10jenkins-bot: Add DW alias for NS_PROJECT_TALK in frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/328719 (https://phabricator.wikimedia.org/T153952) (owner: 10Urbanecm) [14:58:45] (03CR) 10jenkins-bot: Add DW alias for NS_PROJECT_TALK in frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/328719 (https://phabricator.wikimedia.org/T153952) (owner: 10Urbanecm) [14:59:41] Urbanecm: great [15:00:02] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:330855|[throttle] Lift for 2017-01-10/12 + minor cleanup (T154312)]] (duration: 02m 36s) [15:00:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:06] T154312: Request for a temporary lift of account creation cap on IPs (2017-01-04,2017-01-06,2017-01-10,2017-01-12) - https://phabricator.wikimedia.org/T154312 [15:00:38] Urbanecm: 330855 (throttle) deployed, nothing to check, moving on... 
[15:01:01] (03PS2) 10Zfilipin: Enable import from cswiki to arbcom_cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330983 (https://phabricator.wikimedia.org/T154799) (owner: 10Urbanecm) [15:04:25] Urbanecm: 328719 is at mwdebug1002, please test [15:04:59] !log extending EU SWAT, three more patches left to deploy [15:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:44] zeljkof, if you didn't run the script it is ok. [15:05:55] Urbanecm: not yet [15:06:03] Okay. So please deploy and run :) [15:06:16] ok, so first deploy to prod, then run the script? [15:06:31] I've done it only once I think... [15:06:59] Yes. [15:10:07] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:328719|Add DW alias for NS_PROJECT_TALK in frwiki (T153952)]] (duration: 02m 36s) [15:10:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:11] T153952: Create namespace redirection "DW:" for fr.wp - https://phabricator.wikimedia.org/T153952 [15:10:32] Urbanecm: 328719 deployed, running the script... [15:10:53] ok [15:14:11] (03CR) 10Zfilipin: "https://phabricator.wikimedia.org/T153952#2927974" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/328719 (https://phabricator.wikimedia.org/T153952) (owner: 10Urbanecm) [15:14:26] Urbanecm: the script worked fine ^ [15:14:54] please test [15:15:08] two more patches... 
[15:15:18] Works :) [15:16:41] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330983 (https://phabricator.wikimedia.org/T154799) (owner: 10Urbanecm) [15:17:16] (03Merged) 10jenkins-bot: Enable import from cswiki to arbcom_cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330983 (https://phabricator.wikimedia.org/T154799) (owner: 10Urbanecm) [15:17:42] (03PS2) 10Zfilipin: Add en.wikinews and es.wikinews as import source in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331229 (https://phabricator.wikimedia.org/T154879) (owner: 10Urbanecm) [15:18:03] (03CR) 10jenkins-bot: Enable import from cswiki to arbcom_cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330983 (https://phabricator.wikimedia.org/T154799) (owner: 10Urbanecm) [15:20:40] Urbanecm: 330983 is at mwdebug1002, please test [15:21:14] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331229 (https://phabricator.wikimedia.org/T154879) (owner: 10Urbanecm) [15:21:44] Works [15:21:46] (03Merged) 10jenkins-bot: Add en.wikinews and es.wikinews as import source in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331229 (https://phabricator.wikimedia.org/T154879) (owner: 10Urbanecm) [15:21:57] (03CR) 10jenkins-bot: Add en.wikinews and es.wikinews as import source in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331229 (https://phabricator.wikimedia.org/T154879) (owner: 10Urbanecm) [15:22:06] ok, deploying [15:25:40] ok [15:26:05] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:330983|Enable import from cswiki to arbcom_cswiki (T154799)]] (duration: 02m 38s) [15:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:09] T154799: Enable import from cswiki to arbcom_cswiki - https://phabricator.wikimedia.org/T154799 [15:26:58] Urbanecm: 330983 deployed, please check [15:27:00] (03PS1) 10Hashar: Introduce 
linters using rake [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) [15:27:23] works [15:27:38] the last commit for eu swat, 331229... [15:28:51] Urbanecm: 331229 is at mwdebug1002, please test [15:29:13] (03CR) 10Hashar: "check experimental" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [15:29:34] zeljkof, sadly I have no rights for testing... [15:29:52] Urbanecm: will you be able to test once it is deployed? [15:30:33] zeljkof, no. I must have a sysop bit there and I do not have it. [15:30:47] Urbanecm: ok, in that case, deploying [15:30:56] (03CR) 10Hashar: "check experimental" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [15:31:06] Okay. I'll ask the author of the request for testing. [15:31:08] could you please leave a comment in gerrit/phab that somebody needs to test it? [15:31:20] I think I already answered. [15:31:28] I'll put in the resolved comment. 
[15:31:34] great, sorry, did not see it [15:31:50] :) [15:32:02] (03PS2) 10Hashar: Introduce linters using rake [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) [15:32:23] (03CR) 10Hashar: "check experimental" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [15:33:10] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:331229|Add en.wikinews and es.wikinews as import source in testwiki (T154879)]] (duration: 02m 38s) [15:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:14] T154879: Add en.wikinews and es.wikinews as import source in testwiki - https://phabricator.wikimedia.org/T154879 [15:34:09] (03PS1) 10Dereckson: Set Translation namespace on ml.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331313 (https://phabricator.wikimedia.org/T154087) [15:35:13] Seems the backlog is empty! [15:35:13] Urbanecm: all deployed, thanks for flying with #releng! :) [15:35:26] !log finished EU SWAT! [15:35:26] Thank you for the deployment! 
[15:35:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:00] * zeljkof is out of lunch [15:40:20] (03PS3) 10Hashar: Introduce linters using rake [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) [15:40:41] (03CR) 10Hashar: "check experimental" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [15:42:14] 06Operations, 10MediaWiki-Vagrant, 06Release-Engineering-Team, 07Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2928039 (10bd808) [15:46:43] (03PS4) 10Hashar: Introduce linters using rake [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) [15:47:03] (03CR) 10Hashar: "check experimental" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:04:49] (03PS5) 10Hashar: Introduce linters using rake [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) [16:04:51] (03PS1) 10Hashar: Introduce linters using rake [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/331327 (https://phabricator.wikimedia.org/T154894) [16:04:56] (03PS1) 10Hashar: Introduce linters using rake [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) [16:04:58] (03PS1) 10Hashar: Introduce linters using rake [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/331329 (https://phabricator.wikimedia.org/T154894) [16:05:00] (03PS1) 10Hashar: Introduce linters using rake [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/331330 (https://phabricator.wikimedia.org/T154894) [16:06:33] (03PS1) 10Hashar: Add .gitreview [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/331331 [16:06:35] (03PS1) 10Hashar: Introduce linters using rake [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/331332 
(https://phabricator.wikimedia.org/T154894) [16:08:05] (03CR) 10Hashar: "check experimental" [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/331327 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:08:08] (03CR) 10Hashar: "check experimental" [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:08:11] (03CR) 10Hashar: "check experimental" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/331329 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:08:14] (03CR) 10Hashar: "check experimental" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/331330 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:08:17] (03CR) 10Hashar: "check experimental" [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/331332 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:08:26] (03CR) 10jerkins-bot: [V: 04-1] Introduce linters using rake [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:14:36] (03CR) 10jerkins-bot: [V: 04-1] Introduce linters using rake [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:16:25] (03CR) 10Hashar: "check experimental" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/331312 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:19:30] (03PS2) 10Hashar: Introduce linters using rake [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) [16:19:32] (03PS1) 10Hashar: Ignore flake8 error about duplicate keys in dict [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331333 [16:20:30] (03CR) 10Hashar: "check experimental" [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [16:33:47] RECOVERY - Check systemd state on restbase-test1001 is OK: 
OK - running: The system is fully operational [16:36:47] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:45:29] (03CR) 10Alex Monk: "(it looks like the LE class handles this internally anyway)" [puppet] - 10https://gerrit.wikimedia.org/r/330694 (owner: 10Alex Monk) [17:14:48] (03PS2) 10Hashar: puppet parse from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) [17:18:34] (03CR) 10jerkins-bot: [V: 04-1] puppet parse from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [17:23:34] (03PS3) 10Hashar: puppet parse from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) [17:23:36] (03PS1) 10Hashar: openstack: capitalize service resource in keystone [puppet] - 10https://gerrit.wikimedia.org/r/331352 [17:25:27] (03CR) 10jerkins-bot: [V: 04-1] puppet parse from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [17:27:53] grlblblblb [17:27:56] import realm;pp [17:27:58] bahhh [17:29:45] hashar: There's a problem in your MyIRC syntax; check the manual... [17:30:02] ;-D [17:33:57] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:38:47] PROBLEM - puppet last run on mc1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:38:52] 06Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2928202 (10Joe) Slides for the starting the discussion available here https://docs.google.com/presentation/d/1DCofLYbP1dWnTb1JWNNn... 
[17:51:54] !log Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds [17:51:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:51:58] T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [18:02:57] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:03:47] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [18:04:20] (03CR) 10Dzahn: "I thought logstash wasn't running when we tried last time, because it was crashed, and that explained it? Why are we changing other stuff" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:04:50] (03CR) 10Paladox: "> I thought logstash wasn't running when we tried last time, because" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:05:05] 06Operations, 10Traffic: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2928272 (10RobH) [18:05:47] RECOVERY - puppet last run on mc1019 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:06:47] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[18:08:35] (03PS12) 10Paladox: Gerrit: Add support for logstash in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) [18:08:37] 06Operations, 10Traffic: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2928308 (10RobH) [18:09:09] 06Operations, 10Traffic: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2928272 (10RobH) [18:09:11] (03CR) 10Paladox: "@Chad im re enabling it by default again (Only for prod gerrit) so that we can try again when ever you decide to :)" [puppet] - 10https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:11:50] 06Operations, 07Puppet, 10Continuous-Integration-Config: Get rid of "import realm.pp" in manifests/site.pp - https://phabricator.wikimedia.org/T154915#2928315 (10hashar) [18:16:58] !log rebooting and powercycling mira, CPU frequency throttled, suspecting firmware bug [18:17:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:27] PROBLEM - Host mira is DOWN: PING CRITICAL - Packet loss = 100% [18:20:27] RECOVERY - Host mira is UP: PING OK - Packet loss = 0%, RTA = 36.12 ms [18:21:14] 06Operations, 10Traffic, 10Wikimedia-Mailing-lists: convert lists.wikimedia.org certificate to LetsEncrypt (deadline:2017-03-02) - https://phabricator.wikimedia.org/T154917#2928370 (10RobH) [18:21:17] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok. 
[18:22:58] 06Operations, 10Citoid, 10ContentTranslation-CXserver, 10MediaWiki-extensions-ContentTranslation, and 4 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001#2928388 (10mobrovac) [18:23:04] 06Operations, 10Citoid, 10RESTBase, 10RESTBase-API, and 4 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#2928385 (10mobrovac) 05Open>03Resolved a:03mobrovac This can now be considered done. There is still {T133001} to deal with, though. [18:23:07] PROBLEM - Keyholder SSH agent on mira is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [18:30:57] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:33:47] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [18:34:13] 06Operations, 10Citoid, 06Services, 10VisualEditor: NIH db misbehaviour causing problems to Citoid - https://phabricator.wikimedia.org/T133696#2928415 (10mobrovac) p:05High>03Normal The imminent problem with production alerts was //dealt with// in [PS 295678](https://gerrit.wikimedia.org/r/#/c/295678/)... [18:36:47] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[18:37:17] (03PS2) 10Tim Landscheidt: Tools: Fully qualify hostnames [puppet] - 10https://gerrit.wikimedia.org/r/328451 (https://phabricator.wikimedia.org/T153608) [18:40:36] 06Operations, 10Traffic: convert librenms.wikimedia.org from GS to LE cert (expires: 2017-09-11) - https://phabricator.wikimedia.org/T154919#2928434 (10RobH) [18:41:34] 06Operations, 10Traffic: convert librenms.wikimedia.org from GS to LE cert (expires: 2017-02-11) - https://phabricator.wikimedia.org/T154919#2928449 (10RobH) [18:42:14] (03PS1) 10RobH: Convert librenms.wikimedia.org to LE cert use [puppet] - 10https://gerrit.wikimedia.org/r/331370 [18:50:22] 06Operations, 10Analytics, 10ChangeProp, 10Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2928474 (10mobrovac) [18:51:12] 06Operations, 10Analytics, 10ChangeProp, 10Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2748991 (10mobrovac) Scheduled to happen the week of 2017-01-16 [18:51:24] 06Operations, 10ops-eqiad: investigate lead hardware issue - https://phabricator.wikimedia.org/T147905#2928476 (10RobH) 05Resolved>03Open I'm re-opening this task, as there was a CPU frequency issues on this that were never resolved. This came up today, as mira had similar frequency issues that were resol... [18:53:47] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:58:57] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170109T1900). 
[19:00:28] not it [19:00:39] is empty [19:01:14] In which case I'll take full credit for a successful swat ;-) [19:01:21] (03CR) 10Mobrovac: [C: 031] Include hhvm fatals and exceptions in scap canary checks [puppet] - 10https://gerrit.wikimedia.org/r/304327 (https://phabricator.wikimedia.org/T142784) (owner: 10Thcipriani) [19:04:57] PROBLEM - Check systemd state on elastic2035 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:04:57] PROBLEM - Check systemd state on elastic2031 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:04:57] PROBLEM - Check systemd state on elastic2025 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:04:57] PROBLEM - Check systemd state on elastic2028 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:04:57] PROBLEM - Check systemd state on elastic2027 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:04:58] PROBLEM - Check systemd state on elastic2034 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:04:58] PROBLEM - Check systemd state on elastic2036 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:05:07] PROBLEM - Check systemd state on elastic2030 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:05:07] PROBLEM - Check systemd state on elastic2026 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:05:07] PROBLEM - Check systemd state on elastic2029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:05:07] PROBLEM - Check systemd state on elastic2032 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[19:08:02] 06Operations, 10Traffic: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2928540 (10RobH) [19:11:33] (03PS2) 10RobH: reclaim nobelium to spares [puppet] - 10https://gerrit.wikimedia.org/r/325439 [19:11:50] (03CR) 10RobH: [C: 032] reclaim nobelium to spares [puppet] - 10https://gerrit.wikimedia.org/r/325439 (owner: 10RobH) [19:16:57] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:17:44] (03CR) 10Dzahn: tendril: use Letsencrypt for SSL cert (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/330829 (https://phabricator.wikimedia.org/T133717) (owner: 10Dzahn) [19:21:33] (03CR) 10Dzahn: Convert librenms.wikimedia.org to LE cert use (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/331370 (owner: 10RobH) [19:22:47] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:29:09] (03CR) 10Muehlenhoff: [C: 032] Update to 4.4.40 [debs/linux44] - 10https://gerrit.wikimedia.org/r/330926 (owner: 10Muehlenhoff) [19:34:07] PROBLEM - Check systemd state on elastic2033 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:36:40] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2928629 (10Marostegui) Hi, Which tables would you need converted in the end? looking at: T145885#2896928 looks like you tried t... [19:40:05] (03PS2) 10RobH: Convert librenms.wikimedia.org to LE cert use [puppet] - 10https://gerrit.wikimedia.org/r/331370 [19:40:08] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:44:15] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2928698 (10Paladox) @Marostegui hi, the main table we want converted is patch_comment. I haven't really tested if converted that... [19:44:22] (03PS3) 10RobH: Convert librenms.wikimedia.org to LE cert use [puppet] - 10https://gerrit.wikimedia.org/r/331370 [19:44:57] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:45:51] (03CR) 10Dzahn: [C: 031] "looks good to me now. should work. one extra thing i checked is we have mod_rewrite loaded already, but we do." [puppet] - 10https://gerrit.wikimedia.org/r/331370 (owner: 10RobH) [19:46:36] (03PS1) 10Muehlenhoff: Update to 4.4.41 [debs/linux44] - 10https://gerrit.wikimedia.org/r/331388 [19:46:51] 06Operations, 06TCB-Team, 10Two-Column-Edit-Conflict-Merge, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2928708 (10Addshore) [19:47:12] 06Operations, 06TCB-Team, 10Two-Column-Edit-Conflict-Merge, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2928708 (10Addshore) a:03Addshore [19:49:30] (03CR) 10RobH: [C: 032] Convert librenms.wikimedia.org to LE cert use [puppet] - 10https://gerrit.wikimedia.org/r/331370 (owner: 10RobH) [19:49:58] !log updating librenms.wikimedia.org cert, netmon1001 only system affected [19:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:07] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:51:17] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [19:53:28] 06Operations, 10DBA, 13Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2928738 (10Marostegui) [19:53:52] (03PS1) 10RobH: fixing typo for cert filename [puppet] - 10https://gerrit.wikimedia.org/r/331390 [19:58:29] (03CR) 10RobH: [C: 032] fixing typo for cert filename [puppet] - 10https://gerrit.wikimedia.org/r/331390 (owner: 10RobH) [20:00:47] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 49 seconds ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms] [20:04:40] (03CR) 10Muehlenhoff: [C: 032] Update to 4.4.41 [debs/linux44] - 10https://gerrit.wikimedia.org/r/331388 (owner: 10Muehlenhoff) [20:09:07] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [20:10:47] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 49 seconds ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms] [20:11:47] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [20:12:44] 06Operations: Package the next LTS kernel (likely 4.9) - https://phabricator.wikimedia.org/T154934#2928855 (10MoritzMuehlenhoff) [20:13:30] 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 06Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2928878 (10srishakatux) Note-taker(s) of this session: Follow the instructions here: https://... 
[20:13:59] 06Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2928885 (10srishakatux) Note-taker(s) of this session: Follow the instructions here: https://www.mediawiki.org/wiki/Wikimedia_De... [20:14:53] (03PS1) 10RobH: the key path differs than the cert path [puppet] - 10https://gerrit.wikimedia.org/r/331392 [20:15:19] (03CR) 10RobH: [C: 032] the key path differs than the cert path [puppet] - 10https://gerrit.wikimedia.org/r/331392 (owner: 10RobH) [20:19:17] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [20:22:00] (03CR) 10Jcrespo: "This looks good, but as it touches pt-heartbeat, which could bring down all wikis at once, I would like to personally deploy it, very slow" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/331329 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [20:25:47] (03PS1) 10RobH: decommission use of librenms.wikimedia.org globalsign certificate [puppet] - 10https://gerrit.wikimedia.org/r/331394 [20:28:58] (03CR) 10RobH: [C: 032] decommission use of librenms.wikimedia.org globalsign certificate [puppet] - 10https://gerrit.wikimedia.org/r/331394 (owner: 10RobH) [20:33:13] 06Operations, 10Traffic: convert wikitech.wikimedia.org from globalsign to letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2928975 (10Krenair) a:03Krenair [20:33:47] (03PS2) 10Alex Monk: ruby-httpclient callers: Use the operating system's certificate store [puppet] - 10https://gerrit.wikimedia.org/r/311048 (https://phabricator.wikimedia.org/T145808) [20:36:41] 06Operations, 10Traffic, 13Patch-For-Review: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2240497 (10RobH) [20:36:56] 06Operations, 10Traffic, 13Patch-For-Review: convert wikitech.wikimedia.org from globalsign to 
letsencrypt certificate (deadline 2017-02-24) - https://phabricator.wikimedia.org/T154913#2929003 (10Dzahn) [20:36:58] 06Operations, 10Traffic, 13Patch-For-Review: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2929004 (10Dzahn) [20:39:57] 06Operations, 10Traffic, 13Patch-For-Review: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2929006 (10Dzahn) [20:39:59] 06Operations, 10Traffic, 10Wikimedia-Mailing-lists: convert lists.wikimedia.org certificate to LetsEncrypt (deadline:2017-03-02) - https://phabricator.wikimedia.org/T154917#2929005 (10Dzahn) [20:41:42] 06Operations, 10Traffic: convert tendril to use Letsencrypt for SSL cert - https://phabricator.wikimedia.org/T154938#2929008 (10Dzahn) [20:42:23] (03PS2) 10Tim Landscheidt: librenms: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329741 [20:42:31] 06Operations, 10Traffic: convert tendril to use Letsencrypt for SSL cert (deadline 2017-03-17) - https://phabricator.wikimedia.org/T154938#2929008 (10Dzahn) [20:43:35] (03CR) 10Dzahn: [C: 031] librenms: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329741 (owner: 10Tim Landscheidt) [20:49:39] 06Operations, 10Traffic: convert ganglia to use Letsencrypt for SSL cert (deadline: 2017-02-07) - https://phabricator.wikimedia.org/T154939#2929027 (10Dzahn) [20:50:09] (03CR) 10RobH: [C: 032] librenms: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329741 (owner: 10Tim Landscheidt) [20:50:11] (03PS2) 10Dzahn: ganglia: use Letsencrypt for SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/331085 (https://phabricator.wikimedia.org/T154939) [20:50:37] (03PS3) 10Dzahn: ganglia: use Letsencrypt for SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/331085 (https://phabricator.wikimedia.org/T154938) [20:51:58] 06Operations, 10Traffic, 13Patch-For-Review: Letsencrypt 
all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2929049 (10Dzahn) [20:54:12] 06Operations, 10Traffic: convert dumps to use Letsencrypt for SSL cert (deadline: 2017-04-26) - https://phabricator.wikimedia.org/T154940#2929052 (10Dzahn) [21:01:53] 06Operations, 10Traffic: convert archiva to use Letsencrypt for SSL cert (deadline 2017-05-08) - https://phabricator.wikimedia.org/T154942#2929084 (10Dzahn) [21:13:16] (03PS3) 10Dzahn: install_server: Indent @ssl_settings in NGINX configuration [puppet] - 10https://gerrit.wikimedia.org/r/329740 (owner: 10Tim Landscheidt) [21:13:52] (03CR) 10Dzahn: [C: 032] install_server: Indent @ssl_settings in NGINX configuration [puppet] - 10https://gerrit.wikimedia.org/r/329740 (owner: 10Tim Landscheidt) [21:15:29] (03PS2) 10Dzahn: tendril: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329747 (owner: 10Tim Landscheidt) [21:17:17] (03CR) 10Dzahn: [C: 032] tendril: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329747 (owner: 10Tim Landscheidt) [21:23:19] (03CR) 10Andrew Bogott: [C: 032] openstack: capitalize service resource in keystone [puppet] - 10https://gerrit.wikimedia.org/r/331352 (owner: 10Hashar) [21:23:23] (03PS2) 10Andrew Bogott: openstack: capitalize service resource in keystone [puppet] - 10https://gerrit.wikimedia.org/r/331352 (owner: 10Hashar) [21:26:46] andrewbogott: thank you :] [21:41:57] PROBLEM - puppet last run on mw1265 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:45:27] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:27] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:27] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:45:37] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:57] PROBLEM - zotero on sca1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:46:49] (03CR) 10Dzahn: [C: 032] ganglia: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329737 (owner: 10Tim Landscheidt) [21:46:55] (03PS2) 10Dzahn: ganglia: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329737 (owner: 10Tim Landscheidt) [21:47:02] PROBLEM - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:48:18] grumble zotero grumble [21:48:26] <_joe_> akosiaris: you on it? [21:48:31] <_joe_> I'm around if needed [21:48:33] yeah [21:49:15] is the fix simply restarting service? [21:49:17] RECOVERY - Check systemd state on elastic2032 is OK: OK - running: The system is fully operational [21:49:23] with zotero? usually yes [21:49:27] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [21:49:27] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [21:49:27] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [21:49:27] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [21:49:39] it's that high quality [21:49:42] :P [21:49:51] <_joe_> robh: yeah zotero is a piece of garbage third party thing [21:49:51] so restarting it on the endpoints recovers? [21:49:52] RECOVERY - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.008 second response time [21:49:52] RECOVERY - zotero on sca1004 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.017 second response time [21:49:54] ok [21:49:59] just fyi for me for next time [21:50:20] !log service restart zotero on sca1003, sca1004. 
Zotero OOMed again as usual [21:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:23] thanks guys :) [21:51:33] we should just set a timeline for killing this service [21:51:53] (03CR) 10Hashar: "That looked like a noop to me since the end result after lexing should be identical. But we never know with puppet." [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/331329 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [22:04:07] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [22:05:28] (03PS2) 10Dzahn: requesttracker: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329746 (owner: 10Tim Landscheidt) [22:07:08] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [22:11:01] (03CR) 10Dzahn: [C: 032] requesttracker: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329746 (owner: 10Tim Landscheidt) [22:11:07] RECOVERY - puppet last run on mw1265 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [22:12:41] (03PS2) 10Dzahn: mirrors: Indent @ssl_settings in NGINX configuration [puppet] - 10https://gerrit.wikimedia.org/r/329743 (owner: 10Tim Landscheidt) [22:16:33] 06Operations: make deployment SSH keys use the same passphrase - https://phabricator.wikimedia.org/T154943#2929207 (10Reedy) [22:17:16] 06Operations, 06TCB-Team, 10Two-Column-Edit-Conflict-Merge, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2928708 (10Reedy) Are you JFDI-ing this? 
I'm about if you want a hand [22:19:45] 06Operations, 06TCB-Team, 10Two-Column-Edit-Conflict-Merge, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2929215 (10Addshore) [22:20:37] 06Operations, 06TCB-Team, 10Two-Column-Edit-Conflict-Merge, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2928708 (10Addshore) @Reedy we need to patches for the few minor things from the security review first :) (see attache... [22:25:36] 06Operations, 10Traffic, 07Wikimedia-Incident: Investigate varnishd child crashes when multiple nodes get depooled/pooled concurrently - https://phabricator.wikimedia.org/T154801#2929228 (10greg) [22:25:48] 06Operations, 10Monitoring, 10Traffic, 07Wikimedia-Incident: Plot number of cached objects on a per-server per-DC basis - https://phabricator.wikimedia.org/T154864#2929229 (10greg) [22:28:17] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:29:23] (03PS1) 10RobH: convert lists.wikimedia.org from GS to LE certificate [puppet] - 10https://gerrit.wikimedia.org/r/331403 [22:30:13] (03CR) 10jerkins-bot: [V: 04-1] convert lists.wikimedia.org from GS to LE certificate [puppet] - 10https://gerrit.wikimedia.org/r/331403 (owner: 10RobH) [22:31:49] (03PS2) 10RobH: convert lists.wikimedia.org from GS to LE certificate [puppet] - 10https://gerrit.wikimedia.org/r/331403 [22:34:15] (03CR) 10RobH: [C: 031] tendril: use Letsencrypt for SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/330829 (https://phabricator.wikimedia.org/T133717) (owner: 10Dzahn) [22:45:57] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:47:48] 06Operations, 10DNS, 10Domains, 10Traffic: Donate wiktionary.pl to the Foundation - https://phabricator.wikimedia.org/T154826#2929270 (10tomasz) The domain is now the property of the Foundation, and MarkMonitor have set the DNS to ns[0, 1, 2].wikimedia.org. Does anything else need to be done here? [22:49:50] (03CR) 10Dzahn: [C: 031] convert lists.wikimedia.org from GS to LE certificate [puppet] - 10https://gerrit.wikimedia.org/r/331403 (owner: 10RobH) [22:50:16] (03PS3) 10RobH: convert lists.wikimedia.org from GS to LE certificate [puppet] - 10https://gerrit.wikimedia.org/r/331403 [22:50:42] I'm going to go ahead and merge my lists.w.o change since it should only add in the challenge file and create new files (not change how the files are served yet) [22:51:54] (03CR) 10RobH: [C: 032] convert lists.wikimedia.org from GS to LE certificate [puppet] - 10https://gerrit.wikimedia.org/r/331403 (owner: 10RobH) [22:52:27] (03PS4) 10Dzahn: ganglia: use Letsencrypt for SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/331085 (https://phabricator.wikimedia.org/T154938) [22:53:16] !log updating lists.w.o to use LE cert [22:53:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:55:07] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:56:17] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [23:00:04] (03PS1) 10RobH: remove group from LE addition for lists.w.o [puppet] - 10https://gerrit.wikimedia.org/r/331406 [23:01:33] 06Operations, 06TCB-Team, 10Two-Column-Edit-Conflict-Merge, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2929306 (10Addshore) a:05Addshore>03Reedy [23:01:47] (03CR) 10Dzahn: [C: 031] "yep, group is not a valid parameter of letsencrypt::cert::integrated. I see why you used it, it was there before (for some reason that i a" [puppet] - 10https://gerrit.wikimedia.org/r/331406 (owner: 10RobH) [23:03:23] (03CR) 10RobH: [C: 032] remove group from LE addition for lists.w.o [puppet] - 10https://gerrit.wikimedia.org/r/331406 (owner: 10RobH) [23:03:56] (03PS1) 10Reedy: Deploy TwoColConflict on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331408 (https://phabricator.wikimedia.org/T154927) [23:04:17] PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:04:46] (03CR) 10Reedy: [C: 032] Deploy TwoColConflict on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331408 (https://phabricator.wikimedia.org/T154927) (owner: 10Reedy) [23:05:13] (03PS1) 10RobH: Revert "remove group from LE addition for lists.w.o" [puppet] - 10https://gerrit.wikimedia.org/r/331409 [23:05:21] (03Merged) 10jenkins-bot: Deploy TwoColConflict on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331408 (https://phabricator.wikimedia.org/T154927) (owner: 10Reedy) [23:05:42] (03CR) 10jenkins-bot: Deploy TwoColConflict on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331408 (https://phabricator.wikimedia.org/T154927) (owner: 10Reedy) [23:05:53] (03Abandoned) 10RobH: Revert "remove group from LE addition for lists.w.o" [puppet] - 10https://gerrit.wikimedia.org/r/331409 (owner: 10RobH) [23:05:59] clicked abandon by accident. [23:06:10] there is "restore" [23:07:17] !log reedy@tin Synchronized wmf-config/extension-list-labs: T154927 (duration: 00m 41s) [23:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:21] T154927: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927 [23:08:27] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/1/3: down - Core: cr2-eqiad:xe-4/1/3 (Level3, BDFS2448, 84ms) {#A0010621} [10Gbps wave]BR [23:08:27] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/1/3: down - Core: cr2-esams:xe-0/1/3 (Level3, BDFS2448, 84ms) {#2013} [10Gbps wave]BR [23:08:45] !log reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: T154927 (duration: 00m 42s) [23:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:09:40] 06Operations, 06TCB-Team, 
Two-Column-Edit-Conflict-Merge, Patch-For-Review, and 2 others: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2928708 (greg) >>! In T154927#2929215, @Addshore wrote: > @Reedy we need to patches for the few minor things from the security...
[23:09:46] !log reedy@tin Synchronized wmf-config/CommonSettings-labs.php: T154927 (duration: 00m 41s)
[23:09:48] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, User-Addshore, WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to production - https://phabricator.wikimedia.org/T150184#2929339 (Addshore)
[23:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:50] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, Patch-For-Review, and 2 others: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2929338 (Addshore)
[23:09:56] Reedy: https://phabricator.wikimedia.org/T154927#2929333 hasty? :)
[23:10:21] greg-g: bawolff fixed the security issue, and I merged it
[23:10:39] cool (sorry, just going off of state of the tasks :) )
[23:10:41] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, Patch-For-Review, and 2 others: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2929342 (Addshore) >>! In T154927#2929333, @greg wrote: >>>! In T154927#2929215, @Addshore wrote: >> @Reedy we need to patches...
[23:11:08] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, Patch-For-Review, and 2 others: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2929343 (greg) Thanks :)
[23:12:03] silly Reedy confusing everyone
[23:12:10] ;)
[23:12:17] addshore: YOU'RE WELCOME
[23:13:57] RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[23:15:56] it's not a dev summit if something gets deployed by Reedy in a slightly concerning way :P
[23:16:04] doesn't*
[23:16:08] whatever, grammar
[23:16:16] greg-g: I could've deployed to prod from a plane yesterday
[23:16:19] I have a few changes on my radar :P
[23:16:34] ostriches: deploy party
[23:16:43] Reedy: that just sounds like a bad idea :P
[23:16:44] here we go...
[23:16:50] (the plane)
[23:16:52] addshore: I've done it before
[23:16:59] Reedy: Ain't no party like a deploy party
[23:17:29] Aww, Krinkle hasn't re-reviewed my patch yet :(
[23:17:39] bad Krinkle
[23:17:43] Reedy: twocolconflict appears on beta, looks like i18n is missing?
[23:17:55] bad Reedy
[23:18:05] (CR) Krinkle: [C: 1] Remove MWVersion, fold its two functions into MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/309363 (owner: Chad)
[23:18:28] Krinkle: <3
[23:19:02] (CR) Chad: [C: 2] Remove MWVersion, fold its two functions into MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/309363 (owner: Chad)
[23:19:34] (Merged) jenkins-bot: Remove MWVersion, fold its two functions into MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/309363 (owner: Chad)
[23:19:50] addshore: i18n wfm?
[23:19:50] https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version
[23:19:55] TwoColConflict 0.0.1 (db9a200) 22:34, 9 January 2017 GPL-2.0+ Showing a side-by-side edit merge screen for edit conflict resolution WMDE
[23:20:07] oh... wfm now too
[23:20:23] *closes the task*....
[23:20:33] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, User-Addshore, WMDE-QWERTY-Team-Board: Deploy TwoColConflict extension to production - https://phabricator.wikimedia.org/T150184#2929349 (Addshore)
[23:20:36] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, Patch-For-Review, and 2 others: Deploy TwoColConflict extension to beta - https://phabricator.wikimedia.org/T154927#2929348 (Addshore) Open>Resolved
[23:22:38] (PS1) RobH: lists.w.o new le not to require apche mod ssl [puppet] - https://gerrit.wikimedia.org/r/331412
[23:22:44] (CR) jenkins-bot: Remove MWVersion, fold its two functions into MWMultiVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/309363 (owner: Chad)
[23:22:53] beta didn't break :D
[23:23:23] hey, testing, sweet
[23:23:34] (CR) RobH: [C: 2] lists.w.o new le not to require apche mod ssl [puppet] - https://gerrit.wikimedia.org/r/331412 (owner: RobH)
[23:24:56] Puppet, MediaWiki-Vagrant, Patch-For-Review: mediawiki/vagrant puppet classes "3d" are illegal with puppet - https://phabricator.wikimedia.org/T154594#2929362 (hashar) Open>Resolved @Juniorsys thank you very much! That unlock the generation of mediawiki/vagrant documentation https://gerrit.w...
[23:25:07] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[23:25:09] !log demon@tin Synchronized multiversion/MWMultiVersion.php: Cleanup cleanup everybody everywhere (duration: 00m 40s)
[23:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:24] (CR) Dzahn: [C: 2] ganglia: use Letsencrypt for SSL cert [puppet] - https://gerrit.wikimedia.org/r/331085 (https://phabricator.wikimedia.org/T154938) (owner: Dzahn)
[23:25:30] (PS5) Dzahn: ganglia: use Letsencrypt for SSL cert [puppet] - https://gerrit.wikimedia.org/r/331085 (https://phabricator.wikimedia.org/T154938)
[23:26:54] !log demon@tin Synchronized w: Cleanup cleanup everybody do your share (duration: 00m 40s)
[23:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:20] !log ganglia web - replacing SSL cert with Letsencrypt
[23:28:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:25] !log demon@tin Synchronized rpc/RunJobs.php: More cleanup songs (duration: 00m 40s)
[23:28:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:04] (PS1) RobH: lists.wikimedia.org update to LE from GS [puppet] - https://gerrit.wikimedia.org/r/331415
[23:29:53] !log demon@tin Synchronized multiversion: Final batch of MWVersion cleanup (in song form) (duration: 00m 56s)
[23:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:30:17] (CR) RobH: [C: 2] lists.wikimedia.org update to LE from GS [puppet] - https://gerrit.wikimedia.org/r/331415 (owner: RobH)
[23:30:24] (PS2) RobH: lists.wikimedia.org update to LE from GS [puppet] - https://gerrit.wikimedia.org/r/331415
[23:30:46] (PS2) Reedy: 3 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/328482 (https://phabricator.wikimedia.org/T139800)
[23:31:46] (CR)
Reedy: [C: 2] 3 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/328482 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[23:32:19] RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[23:32:22] (Merged) jenkins-bot: 3 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/328482 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[23:32:36] (CR) jenkins-bot: 3 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/328482 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[23:34:09] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[23:34:45] !log reedy@tin Synchronized wmf-config/extension-list: More to extension.json (duration: 00m 40s)
[23:34:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:01] Operations, Traffic, Patch-For-Review: convert ganglia to use Letsencrypt for SSL cert (deadline: 2017-02-07) - https://phabricator.wikimedia.org/T154939#2929401 (Dzahn) {F5263611}
[23:35:40] (PS2) Reedy: Use wfLoadExtension for 3 more extensions too [mediawiki-config] - https://gerrit.wikimedia.org/r/328484 (https://phabricator.wikimedia.org/T140852)
[23:35:44] Operations, Traffic, Patch-For-Review: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2929403 (Dzahn)
[23:35:46] Operations, Traffic, Patch-For-Review: convert ganglia to use Letsencrypt for SSL cert (deadline: 2017-02-07) - https://phabricator.wikimedia.org/T154939#2929402 (Dzahn) Open>Resolved
[23:36:05] Operations, Traffic: convert ganglia to use Letsencrypt for SSL cert (deadline: 2017-02-07) - https://phabricator.wikimedia.org/T154939#2929027 (Dzahn)
[23:36:30] Operations, Traffic,
Patch-For-Review: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2240497 (Dzahn)
[23:36:34] (CR) Chad: "This can land safely now -- the Depends-On landed, with back-compat to the old version in case it goes sideways." [puppet] - https://gerrit.wikimedia.org/r/309366 (owner: Chad)
[23:37:09] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:37:14] (CR) Reedy: [C: 2] Use wfLoadExtension for 3 more extensions too [mediawiki-config] - https://gerrit.wikimedia.org/r/328484 (https://phabricator.wikimedia.org/T140852) (owner: Reedy)
[23:37:43] (Merged) jenkins-bot: Use wfLoadExtension for 3 more extensions too [mediawiki-config] - https://gerrit.wikimedia.org/r/328484 (https://phabricator.wikimedia.org/T140852) (owner: Reedy)
[23:38:47] !log reedy@tin Synchronized wmf-config/CommonSettings.php: wfLoadExtension (duration: 00m 40s)
[23:38:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:41:32] Operations, Traffic, Wikimedia-Mailing-lists: convert lists.wikimedia.org certificate to LetsEncrypt (deadline:2017-03-02) - https://phabricator.wikimedia.org/T154917#2929416 (RobH) So the old call for the certificate file also set group ownership to debian-exim. I had to remove that call from the L...
[23:41:52] Operations, Traffic, Patch-For-Review: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2929417 (RobH)
[23:42:52] (Abandoned) Alex Monk: maintain-meta_p: style improvements [software] - https://gerrit.wikimedia.org/r/308590 (owner: Alex Monk)
[23:43:13] My review dashboard fits onto my screen again
[23:47:58] Operations, netops: cr2-esams<->cr2-eqiad link down - https://phabricator.wikimedia.org/T154952#2929438 (faidon)
[23:49:14] (CR) jenkins-bot: Use wfLoadExtension for 3 more extensions too [mediawiki-config] - https://gerrit.wikimedia.org/r/328484 (https://phabricator.wikimedia.org/T140852) (owner: Reedy)
[23:50:59] Operations, Discovery, Maps, WMF-Legal, Interactive-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2929465 (Slaporte) We clarified the tile usage policy in the [Maps Terms of Use](https://wikimediafoundation.org/wiki/Maps_Terms_of_Use): > **Using maps in third...
[23:54:12] Krenair: Well color me surprised...MWVersion cleanups broke nothing :p
[23:54:18] s/Krenair/Krinkle
[23:54:33] Kr is ambiguous!
[23:55:25] congrats