[00:01:04] RECOVERY - puppet last run on stat1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:04:25] robh: heh, 10 times in the last month according to my irc logs
[00:05:02] that is farrrr more than on server admin log
[00:05:07] why aren't folks server admin logging reboots?
[00:05:10] =[
[00:05:16] PROBLEM - jmxtrans on analytics1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:05:44] robh: some could be flaps though
[00:06:17] oh, just irc log note of the alert
[00:06:25] not someone speaking and saying they reboot, gotcha
[00:06:45] PROBLEM - jmxtrans on analytics1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:06:48] heh yeah, still weird tho
[00:07:23] PROBLEM - jmxtrans on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:07:34] PROBLEM - jmxtrans on analytics1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:08:06] (PS1) Milimetric: Add a new limn datafile generator: extdist [puppet] - https://gerrit.wikimedia.org/r/221801 (https://phabricator.wikimedia.org/T101194)
[00:10:07] RECOVERY - jmxtrans on analytics1022 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:16:14] (PS1) Alex Monk: Standardise a ton of ticket comments [mediawiki-config] - https://gerrit.wikimedia.org/r/221803 (https://phabricator.wikimedia.org/T31902)
[00:16:24] RECOVERY - jmxtrans on analytics1012 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:17:03] RECOVERY - jmxtrans on analytics1018 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:17:13] RECOVERY - jmxtrans on analytics1021 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:23:21] (CR) Jforrester: [C: 1] Standardise a ton of ticket comments [mediawiki-config] - https://gerrit.wikimedia.org/r/221803 (https://phabricator.wikimedia.org/T31902) (owner: Alex Monk)
[00:34:32] (CR) Springle: "Some of this (which slaves get purged, and when) is waiting on an email discussion with analytics. Will !log once done." [software] - https://gerrit.wikimedia.org/r/221561 (owner: Springle)
[00:34:46] (CR) Springle: [C: 2] eventlogging purge no longer on m2-master [software] - https://gerrit.wikimedia.org/r/221561 (owner: Springle)
[00:35:04] (CR) Springle: [V: 2] eventlogging purge no longer on m2-master [software] - https://gerrit.wikimedia.org/r/221561 (owner: Springle)
[00:36:46] operations: jmxtrans log rotation failure - https://phabricator.wikimedia.org/T104271#1412359 (fgiunchedi) NEW a:Ottomata
[00:37:26] !log restbase1* upgrade to cassandra 2.1.7 completed
[00:37:31] Logged the message, Master
[00:40:59] (PS2) Alex Monk: Standardise a ton of ticket comments [mediawiki-config] - https://gerrit.wikimedia.org/r/221803 (https://phabricator.wikimedia.org/T31902)
[00:41:55] operations, RESTBase, RESTBase-Cassandra: begin testing Cassandra 2.1.6 - https://phabricator.wikimedia.org/T101745#1412384 (fgiunchedi) since we're running `2.1.7` in production I've imported it into apt.wikimedia.org
[00:42:28] operations, Traffic: Switch to explicit ciphersuite - https://phabricator.wikimedia.org/T104274#1412385 (BBlack) NEW
[00:42:41] (PS1) BBlack: switch to explicit ciphersuite lists [puppet] - https://gerrit.wikimedia.org/r/221805 (https://phabricator.wikimedia.org/T104274)
[00:42:59] operations, Traffic, Patch-For-Review: Switch to explicit ciphersuite - https://phabricator.wikimedia.org/T104274#1412396 (BBlack)
[00:47:00] operations, Traffic, Patch-For-Review: Switch to explicit ciphersuite - https://phabricator.wikimedia.org/T104274#1412407 (BBlack)
[00:49:05] PROBLEM - puppet last run on xenon is CRITICAL Puppet has 1 failures
[00:53:44] (CR) Filippo Giunchedi: jobchron: log rotate (3 comments) [puppet] - https://gerrit.wikimedia.org/r/218905 (owner: Matanya)
[00:54:38] (CR) Filippo Giunchedi: [C: 1] logstash-logback-encoder setup [software/logstash-logback-encoder] - https://gerrit.wikimedia.org/r/220764 (owner: Eevans)
[00:57:40] (PS8) Filippo Giunchedi: configure additional Cassandra metric alerts [puppet] - https://gerrit.wikimedia.org/r/218408 (https://phabricator.wikimedia.org/T101764) (owner: Eevans)
[00:58:04] (CR) Filippo Giunchedi: [C: 2 V: 2] configure additional Cassandra metric alerts [puppet] - https://gerrit.wikimedia.org/r/218408 (https://phabricator.wikimedia.org/T101764) (owner: Eevans)
[00:59:24] (PS1) Alex Monk: Remove wmgUseXAnalytics and wgAjaxEditStash override, other random cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/221808 (https://phabricator.wikimedia.org/T31902)
[00:59:26] (PS1) Alex Monk: Standardise remaining ticket comments I could find [mediawiki-config] - https://gerrit.wikimedia.org/r/221809 (https://phabricator.wikimedia.org/T31902)
[01:06:43] RECOVERY - puppet last run on xenon is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:10:05] (Abandoned) Filippo Giunchedi: Include cronolog in Apache module; use for Pybal logs [puppet] - https://gerrit.wikimedia.org/r/220631 (owner: Ori.livneh)
[01:14:15] Krenair: still around?
[01:14:19] yep
[01:14:55] I think something in that last patch caused a wikitech regression. I’m happy to bisect, but — have any instincts about where I should start?
[01:15:07] what's the issue?
[01:15:09] It’s something to do with nova not being able to dig out my user rights from keystone
[01:15:15] or, possibly, just a failure of caching.
[01:15:27] You can’t see it right now because I reverted the file by hand :)
[01:15:55] last wikitech change affected two files
[01:16:06] yep, I reverted both.
[01:16:38] do you want me to switch things back to broken so you can see what’s happening? I haven’t dug in at all yet really.
[01:16:48] okay...
[01:16:56] we did test basic editing etc. after the deploy
[01:17:13] some people had reported some normal-sounding issues, but editing etc. worked
[01:17:18] Yeah, it’s a bit subtle. If you visit https://wikitech.wikimedia.org/wiki/Special:NovaInstance
[01:18:31] Oh.
[01:18:34] Very subtle.
[01:18:51] ok, I returned it to the broken state
[01:18:58] See how you can’t enter anything in the project filter?
[01:19:29] yep
[01:20:21] how does that data get from ldap to the client?
[01:21:19] hm
[01:21:26] that’s a good question :) It might be via keystone or via ldap, let me check.
[01:23:04] ah, it’s in a cookie! Is it possible that it’s just that cookies are broken?
[01:23:12] And/or stale?
[01:25:51] (CR) Filippo Giunchedi: [C: -1] "some nits, overall looks sane" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/221747 (owner: Ori.livneh)
[01:26:37] andrewbogott, maybe....
[01:26:49] this->userLDAP->getProjects() does return nothing
[01:27:18] ok, brb and then I’ll turn on ldap debugging
[01:27:19] andrewbogott, actually, I think that's just the last used project filter?
[01:27:29] in the cookie
[01:27:40] my value is deployment-prep
[01:28:08] operations, Traffic: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1412502 (BBlack) NEW
[01:30:06] ldap logging does not appear to be working?
[01:30:10] debug logging*
[01:31:04] bd808, those logging changes wouldn't be breaking $wgDebugLogGroups["ldap"] = "/tmp/ldap-s-1-debug.log"; would they?
[01:31:08] I agree, I don’t know what that’s about
[01:31:15] So this patch removed /srv/mediawiki/private/WikitechPrivateLdapSettings.php right?
[01:31:22] no
[01:31:29] it moved it up the file to be with the other ldap stuff
[01:31:31] oh, just moved it
[01:32:03] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors
[01:33:59] ^ checking
[01:35:02] operations, Traffic: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1412509 (BBlack) A notable counter-opinion: ssllabs.com caps us down from "A+" to "B" for allowing DHE-1024 as they consider it "weak". Obviously our other two options are "Status Quo: leave...
[01:35:04] andrewbogott, mwscript shows all of the important settings are the same
[01:35:10] (mwscript eval.php labswiki)
[01:36:10] So, we survived the leap second then, aye?
[01:36:18] isn't that tomorrow?
[01:36:33] Is it?
[01:36:43] Well,
[01:36:46] Deployment schedule says it's..
[01:36:46] later today
[01:36:50] oh, it's past midnight
[01:36:56] depending on your time zone ;)
[01:37:31] yes, 21 or so hours from now
[01:37:47] andrewbogott, you confirmed that reverting this fixes wikitech though?
[01:38:14] it doesn’t fix the sidebar but does fix the manage instance page. I assumed the sidebar is due to memcached persistence.
[01:38:33] If you look in the config, I made copies. So .phpgood for the pre-patch files and .phpbak for the post-patch ones.
[01:40:02] Krenair: I’m going to hand-edit the live files a bit, if you aren’t already doing that :)
[01:40:21] I already uncommented the ldap debug lines, but nothing other than that
[01:40:29] ok
[01:40:44] RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0]
[01:41:23] !log krinkle Synchronized php-1.26wmf11/includes/resourceloader/ResourceLoader.php: I7761242f01 (duration: 00m 14s)
[01:41:29] Logged the message, Master
[01:41:37] (PS1) Filippo Giunchedi: restbase: drop illegal characters from alarm description [puppet] - https://gerrit.wikimedia.org/r/221815 (https://phabricator.wikimedia.org/T101764)
[01:42:42] (CR) Filippo Giunchedi: [C: 2 V: 2] restbase: drop illegal characters from alarm description [puppet] - https://gerrit.wikimedia.org/r/221815 (https://phabricator.wikimedia.org/T101764) (owner: Filippo Giunchedi)
[01:46:30] Krenair: we don't install a logging handler that reads from $wgDebugLogGroups in prod any more. See $wmgMonologChannels in InitialiseSettings for the replacement.
[01:47:03] we don't want to send the logs off anywhere, just dump them to a local file on silver
[01:47:43] That needs to be done by modifying $wmgMonologConfig then
[01:48:17] I can work up the right changes.
[01:48:26] This wouldn't have been working for quite a while
[01:48:53] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[01:49:44] bd808: it’s only for an optional debug log that’s usually turned off, so we wouldn’t have noticed.
[01:50:03] Krenair: if I move require_once( '/srv/mediawiki/private/WikitechPrivateLdapSettings.php' ) to the bottom of wikitech.php it fixes things.
[01:50:04] RECOVERY - Host mw1085 is UPING OK - Packet loss = 0%, RTA = 1.17 ms
[01:50:07] So, that’s dumb
[01:50:09] wat.
[01:50:22] Oooh!
[01:50:24] I know why.
[01:50:45] something must be redefined
[01:50:59] urgh, let me find the version I can reference publicly
[01:51:14] andrewbogott: any good reason that log channel can't go to fluorine with all the other logs instead of to a local file on silver?
[01:51:21] operations, Traffic: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1412545 (BBlack) More info on Java6: 20% of JVM marketshare and declining: https://plumbr.eu/blog/java/java-version-statistics-2015-edition Oracle's Java Roadmap says public release updates/...
[01:51:38] bd808: only general wikitech isolation reasons.
[01:52:07] https://github.com/wikimedia/operations-puppet/blob/HEAD/modules/wikitech/templates/wikitech_ldap.php.erb
[01:52:12] but silver is a deploy target now so ... not really more isolated
[01:52:23] security by obscurity I guess
[01:52:32] it’s not about security...
[01:52:33] So what happens is we load this
[01:52:45] It's about being able to still use/test/debug wikitech when the rest of the cluster is on fire :)
[01:52:46] And then it gets overwritten by the OSM defaults
[01:52:52] Oh, of course.
[01:53:13] Krenair: want to write a patch or shall I?
[01:53:19] by https://github.com/wikimedia/mediawiki-extensions-OpenStackManager/blob/master/OpenStackManager.php#L69
[01:53:29] I'll
[01:53:58] thanks
[01:56:01] !log krenair Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 11s)
[01:56:06] Logged the message, Master
[01:56:07] andrewbogott, now it should work
[01:56:43] Krenair: looks ok, other than the sidebar.
[01:56:50] (PS1) Alex Monk: Unbreak wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/221816
[01:57:17] (CR) Alex Monk: [C: 2] "Oops." [mediawiki-config] - https://gerrit.wikimedia.org/r/221816 (owner: Alex Monk)
[01:57:22] (Merged) jenkins-bot: Unbreak wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/221816 (owner: Alex Monk)
[01:58:02] yeah
[01:58:12] andrewbogott, so what's up with the sidebar?
[01:58:18] I see all my normal sections
[01:58:21] you do?
[01:58:34] Do you have ‘manage instances’ in the sidebar?
[01:58:41] lemme log out/in
[01:58:43] yup
[01:58:47] under projectadmins
[01:59:05] this is a DynamicSidebar thing... an extension I've never actually looked into
[01:59:27] yeah, I still don’t have a projectadmin section
[01:59:42] * andrewbogott tries a different browser
[02:00:20] hm, broken in new browser too
[02:01:07] It must be cached someplace if it works for you… I think I’m happy to wait until morning and assume it’ll sort itself out.
[02:02:19] Krenair: sidebar is fixed for the user who reported the issue as well. So…
[02:02:26] I don’t know what to make of this :)
[02:02:39] operations, Traffic: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1412549 (BBlack) Redhat seems to have patched around the Java6/1024 problem late last year: https://rhn.redhat.com/errata/RHSA-2014-1634.html . The picture I'm getting is vendors other than...
[02:04:52] Krenair: I’m about out of nickels for the night anyway, I’ll sort out the sidebar in the morning if it’s still broken for me. Since it seems to be working for everyone else :)
[02:04:55] Thanks for the fix.
[02:05:01] ok
[02:05:07] sorry for breaking that in the first place
[02:05:14] np, I’m still glad for the cleanup
[02:06:39] yeah, it caches: https://github.com/wikimedia/mediawiki-extensions-OpenStackManager/blob/89b249662fb1192047e6d6104e749331475890d2/nova/OpenStackNovaUser.php#L613
[02:06:43] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct
[02:11:03] !log krenair Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 12s)
[02:11:09] Logged the message, Master
[02:12:14] (PS1) Alex Monk: wikitech: Fix semantic links too [mediawiki-config] - https://gerrit.wikimedia.org/r/221820
[02:12:57] (CR) Alex Monk: [C: 2] wikitech: Fix semantic links too [mediawiki-config] - https://gerrit.wikimedia.org/r/221820 (owner: Alex Monk)
[02:13:03] (Merged) jenkins-bot: wikitech: Fix semantic links too [mediawiki-config] - https://gerrit.wikimedia.org/r/221820 (owner: Alex Monk)
[02:13:15] I had never noticed that namespace before
[02:13:17] https://wikitech.wikimedia.org/w/index.php?title=Ops:Templateeditor_right_for_enwiki&action=history
[02:17:56] OSM role cache is only an hour btw, andrewbogott_afk
[02:18:08] https://github.com/wikimedia/mediawiki-extensions-OpenStackManager/blob/89b249662fb1192047e6d6104e749331475890d2/nova/OpenStackNovaUser.php#L219
[02:18:26] current value for your user is [], which explains why you don't see much in your sidebar. probably due to the ldap issue
[02:18:45] !log l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 06m 09s)
[02:18:53] Logged the message, Master
[02:20:23] PROBLEM - puppet last run on cp3018 is CRITICAL puppet fail
[02:22:01] !log LocalisationUpdate completed (1.26wmf11) at 2015-06-30 02:22:00+00:00
[02:22:05] Logged the message, Master
[02:35:24] RECOVERY - puppet last run on cp3018 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures
[02:38:34] (PS1) BryanDavis: wikitech: Local logging config for ldap debugging [mediawiki-config] - https://gerrit.wikimedia.org/r/221825
[02:40:07] (CR) BryanDavis: "Untested, but I think this will work. It's disabled by default so it should be low risk to deploy but may need tweaking." [mediawiki-config] - https://gerrit.wikimedia.org/r/221825 (owner: BryanDavis)
[03:00:49] (CR) Eevans: [C: 2 V: 2] logstash-logback-encoder setup [software/logstash-logback-encoder] - https://gerrit.wikimedia.org/r/220764 (owner: Eevans)
[03:45:48] (CR) Ori.livneh: Add tessera module and role; apply on graphite1001 (3 comments) [puppet] - https://gerrit.wikimedia.org/r/221747 (owner: Ori.livneh)
[03:48:55] (PS4) Ori.livneh: Add tessera module and role; apply on graphite1001 [puppet] - https://gerrit.wikimedia.org/r/221747
[03:50:00] godog: ^ (if you're still around)
[04:10:51] operations, Traffic, HTTPS, HTTPS-by-default, Patch-For-Review: Switch to ECDSA hybrid certificates - https://phabricator.wikimedia.org/T86654#1412714 (BBlack) Tested OCSP Stapling this evening as best I can: my test env was the latest stable FF release on Mac with strict OCSP prefs set. If I di...
[04:18:57] (CR) Prtksxna: "Thanks Catrope!" [mediawiki-config] - https://gerrit.wikimedia.org/r/220121 (https://phabricator.wikimedia.org/T103283) (owner: Prtksxna)
[04:37:00] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 30 04:37:00 UTC 2015 (duration 36m 59s)
[04:37:05] Logged the message, Master
[05:15:39] (PS1) Springle: depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/221830
[05:16:09] (CR) Springle: [C: 2] depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/221830 (owner: Springle)
[05:16:14] (Merged) jenkins-bot: depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/221830 (owner: Springle)
[05:17:07] !log springle Synchronized wmf-config/db-eqiad.php: depool db1034 (duration: 00m 12s)
[05:17:12] Logged the message, Master
[05:21:10] !log restarting cassandra instance on restbase1004; was in small-write mode
[05:21:16] Logged the message, Master
[05:33:53] i know you're out there, godog
[05:33:58] you can't hide forever
[06:17:19] (PS1) KartikMistry: CX: Enable CX except enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/221831 (https://phabricator.wikimedia.org/T103531)
[06:20:03] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[06:21:55] (PS1) KartikMistry: CX: Add languages for deployment on 20150630 [puppet] - https://gerrit.wikimedia.org/r/221832 (https://phabricator.wikimedia.org/T103531)
[06:31:35] PROBLEM - puppet last run on mw2082 is CRITICAL Puppet has 1 failures
[06:32:14] PROBLEM - puppet last run on cp2013 is CRITICAL Puppet has 2 failures
[06:35:06] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures
[06:35:44] PROBLEM - puppet last run on cp2014 is CRITICAL Puppet has 1 failures
[06:37:04] PROBLEM - puppet last run on mw2143 is CRITICAL Puppet has 1 failures
[06:37:04] PROBLEM - puppet last run on labcontrol2001 is CRITICAL Puppet has 1 failures
[06:37:43] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures
[06:37:53] PROBLEM - puppet last run on ms-fe2003 is CRITICAL Puppet has 1 failures
[06:37:54] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures
[06:37:55] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 1 failures
[06:39:13] PROBLEM - puppet last run on mw2206 is CRITICAL Puppet has 1 failures
[06:39:14] PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 1 failures
[06:39:14] PROBLEM - puppet last run on mw2050 is CRITICAL Puppet has 1 failures
[06:39:14] PROBLEM - puppet last run on mw1144 is CRITICAL Puppet has 1 failures
[06:46:25] RECOVERY - puppet last run on mw2082 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:46:43] RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:46:44] RECOVERY - puppet last run on mw1144 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on cp2013 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:47:26] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:34] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:43] RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:47:44] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:44] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:05] RECOVERY - puppet last run on cp2014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:49:05] RECOVERY - puppet last run on mw2206 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:49:13] RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:49:24] RECOVERY - puppet last run on mw2143 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:56:16] (PS27) Alexandros Kosiaris: WIP: lvs: hieraize lvs_services variable [puppet] - https://gerrit.wikimedia.org/r/221065
[06:56:56] !log initiating query profiling on db1018
[06:57:00] Logged the message, Master
[07:10:40] (CR) Muehlenhoff: "I'd rather modify puppetsigner.py to amend /etc/puppet/autosign.conf, that way we're still in control which clients are accepted." [puppet] - https://gerrit.wikimedia.org/r/220305 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[07:13:02] (PS28) Alexandros Kosiaris: lvs: hieraize lvs_services and lvs::monitor [puppet] - https://gerrit.wikimedia.org/r/221065
[07:14:24] (CR) Muehlenhoff: "Similar to my comment on puppet we could amend a file of accepted hosts specified by the "autosign_file" option in /etc/salt/master" [puppet] - https://gerrit.wikimedia.org/r/220306 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[07:14:35] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60550 bytes in 5.282 second response time
[07:22:08] (PS1) Alexandros Kosiaris: Fix for lvs::monitor_service_http_https [puppet] - https://gerrit.wikimedia.org/r/221836
[07:23:18] (CR) Alexandros Kosiaris: [C: 2] Fix for lvs::monitor_service_http_https [puppet] - https://gerrit.wikimedia.org/r/221836 (owner: Alexandros Kosiaris)
[07:24:43] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[07:32:33] (PS29) Alexandros Kosiaris: lvs: hieraize lvs_services and lvs::monitor [puppet] - https://gerrit.wikimedia.org/r/221065
[07:47:10] (CR) Giuseppe Lavagetto: [C: -1] "A couple of questions, and obviously I guess you verified this with the puppet compiler, but LGTM." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/221065 (owner: Alexandros Kosiaris)
[07:47:40] <_joe_> ori: can't we use a VM for tessera?
[07:47:55] <_joe_> and godog too
[07:49:43] <_joe_> ostriches: will take a look, I promise
[08:04:04] (CR) Giuseppe Lavagetto: [C: 2] redirects: use separate ServerAlias directives for each alias [puppet] - https://gerrit.wikimedia.org/r/221291 (owner: BBlack)
[08:05:47] !log applying schema changes for Gather extension
[08:05:51] Logged the message, Master
[08:06:23] ^that is issue T103611, look out for errors on gather
[08:06:56] (although I made sure to make only compatible changes)
[08:11:23] operations, ContentTranslation-Deployments, ContentTranslation-cxserver, Services, and 3 others: Standardise CXServer deployment - https://phabricator.wikimedia.org/T101272#1412988 (KartikMistry) p:High>Normal
[08:39:53] (CR) Muehlenhoff: [C: 1] "Looks good to me. That's much more readable than looking up various sets of excluded cipher sets.. And if there are changes in the OpenSSL" [puppet] - https://gerrit.wikimedia.org/r/221805 (https://phabricator.wikimedia.org/T104274) (owner: BBlack)
[08:52:26] i'm going to restart cassandra on rb1004, it's misbehaving
[08:53:26] !log restbase restarting cassandra on restbase1004
[08:53:30] Logged the message, Master
[09:03:28] operations, Gather, Database, Schema-change: Update Gather DB schema for flagging backend - https://phabricator.wikimedia.org/T103611#1413042 (jcrespo) @Tgr: > I doubt locking would be a concern for Gather; we are talking about a table with 5000 rows and a couple hundred write queries a day. Pleas...
[09:30:49] (PS2) Giuseppe Lavagetto: confd: monitor template failures and file removals [puppet] - https://gerrit.wikimedia.org/r/221140 (https://phabricator.wikimedia.org/T103360)
[09:32:54] (PS3) Giuseppe Lavagetto: confd: monitor template failures and file removals [puppet] - https://gerrit.wikimedia.org/r/221140 (https://phabricator.wikimedia.org/T103360)
[09:33:17] <_joe_> mobrovac: how's cassandra now?
[09:33:42] _joe_: euh, spitting blood but marching on
[09:35:01] (CR) Giuseppe Lavagetto: [C: 2] confd: monitor template failures and file removals [puppet] - https://gerrit.wikimedia.org/r/221140 (https://phabricator.wikimedia.org/T103360) (owner: Giuseppe Lavagetto)
[09:35:31] Hi, i created an account on Appveyor (testing tool) using 'wikimedia', and want to give the user/pass to someone at WMF for safe keeping. who do I contact? (T101807)
[09:36:13] moritzm: _joe_ : do you know where / how we store our user/passwords ?
[09:36:15] out of puppet
[09:36:31] <_joe_> hashar: what do you mean?
[09:36:36] once upon a time we had /root/doc/passwords on fenari. But I think we have a proper solution now
[09:36:37] <_joe_> also, wrong channel?
[09:36:46] <_joe_> yes
[09:36:52] jayvdb created a "wikimedia" account for a software as a service system
[09:36:54] that runs CI
[09:37:02] and we would like to store the password in some shared place
[09:37:11] so we don't depend on jayvdb to recover it :-D
[09:37:16] <_joe_> hashar: moritzm is your man than
[09:37:21] \O/
[09:37:22] <_joe_> *then
[09:37:32] do we also have a standard email for such accounts?
[09:37:36] <_joe_> hashar: who will need to access this password?
[09:37:49] #releng I guess
[09:37:52] <_joe_> ok
[09:38:01] but we can funnel the password request through ops if needed
[09:38:08] <_joe_> so we'd need to get you all to create a gpg key and get it verified by us
[09:38:14] jayaround: so gotta poke moritzm about it :-}
[09:38:22] ohh
[09:40:38] operations, Traffic: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1413108 (MoritzMuehlenhoff) > I wonder if we're using any Java6 stuff in-house? I checked that as part of the leap second preparations (the Java bugfix for timer handling wasn't backported t...
[09:42:50] hashar: we started to use pwstore in ops (as outlined here: https://phabricator.wikimedia.org/T96130). pwstore is flexible in terms of groups, so we can easily make some password files accessible for releng
[09:43:09] operations, HHVM, Tracking: Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking) - https://phabricator.wikimedia.org/T86081#1413114 (JeroenDeDauw) When can we expect this to be done?
[09:43:11] please create a Phab task with the people involved and assign it to me
[09:43:39] (access to the pwstore is based on PGP keys, so people will need to create a key if they don't have one)
[09:45:15] there's some existing docs on PGP in wikitech and those in SF could also reach out to ops people with a key (e.g. mutante)
[09:50:48] operations: Manage Appveyor account - https://phabricator.wikimedia.org/T104306#1413120 (jayvdb) NEW a:MoritzMuehlenhoff
[09:53:54] bah cassandra dying on us today
[09:54:46] !log restbase restarting cassandra on restbase1004
[09:54:50] Logged the message, Master
[09:56:34] operations, Labs, wikitech.wikimedia.org, HHVM: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1413134 (Joe) This ticket makes no sense, IMO. We have no reason to migrate wikitech to HHVM at the moment. I'm resolving this ticket as invalid until someone explains me why should we e...
[09:57:45] (CR) Alexandros Kosiaris: lvs: hieraize lvs_services and lvs::monitor (3 comments) [puppet] - https://gerrit.wikimedia.org/r/221065 (owner: Alexandros Kosiaris)
[09:59:07] lol, cassandra is using 1200% CPU on rb1004
[09:59:28] too much data to compact and flush
[09:59:36] i fear we'll have to ride it out
[10:01:39] operations: Manage Appveyor account - https://phabricator.wikimedia.org/T104306#1413138 (hashar) @MoritzMuehlenhoff we would like to hold the password / email in the gpg backed credential store that got recently introduced for WMF. Do we have a generic/ops email we can use for the account? I am not sure who...
[10:02:20] moritzm: thank you :-}
[10:03:14] operations, HHVM, Tracking: Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking) - https://phabricator.wikimedia.org/T86081#1413142 (Joe)
[10:03:18] operations, Labs, wikitech.wikimedia.org, HHVM: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1413140 (Joe) Open>Invalid
[10:06:11] (CR) Giuseppe Lavagetto: lvs: hieraize lvs_services and lvs::monitor (3 comments) [puppet] - https://gerrit.wikimedia.org/r/221065 (owner: Alexandros Kosiaris)
[10:07:26] PROBLEM - Confd template for /etc/pybal/pools/osm on lvs1005 is CRITICAL: File not found: /etc/pybal/pools/osm
[10:08:18] PROBLEM - Confd template for /etc/pybal/pools/osm on lvs1002 is CRITICAL: File not found: /etc/pybal/pools/osm
[10:08:45] <_joe_> uhm interesting problem
[10:08:52] <_joe_> osm is declared but has no backend
[10:09:16] <_joe_> so of course our checker script thinks the file is invalid and refuses to write it
[10:13:08] ACKNOWLEDGEMENT - Confd template for /etc/pybal/pools/osm on lvs1002 is CRITICAL: File not found: /etc/pybal/pools/osm Giuseppe Lavagetto Osm has no backends, preventing the file to be generated.
[10:13:08] ACKNOWLEDGEMENT - Confd template for /etc/pybal/pools/osm on lvs1005 is CRITICAL: File not found: /etc/pybal/pools/osm Giuseppe Lavagetto Osm has no backends, preventing the file to be generated.
[10:13:21] operations: Manage Appveyor account - https://phabricator.wikimedia.org/T104306#1413159 (MoritzMuehlenhoff) > @MoritzMuehlenhoff we would like to hold the password / email in the gpg backed credential store that got recently introduced for WMF. > Do we have a generic/ops email we can use for the account? pws...
[10:16:44] <_joe_> mobrovac: I see very scary alarms on restbase, I guess linked to cassandra
[10:16:56] <_joe_> can we assist you in any way?
[10:17:12] you mean the 5xx ones?
[10:17:21] ignore that, known
[10:17:35] cassandra on rb1004 is choking
[10:17:39] like, completely
[10:18:42] <_joe_> mobrovac: that and the two SLAs as well
[10:19:12] _joe_: i'm thinking it wouldn't be a bad idea perhaps to put it out of the restbase config to let it recover with minimal external disturbance
[10:19:14] thoughts?
[10:19:52] i.e. remove its IP from restbase::seeds so that only the other C* nodes are contacted
[10:20:07] rb1004 cass has got 150-250 GB more data than the others
[10:20:14] so also more stuff to do
[10:20:55] <_joe_> why is it so?
[10:21:06] <_joe_> I mean why are data so unevenly balanced?
[10:21:51] <_joe_> mobrovac: seems like a sensible idea in general, but removing it from restbase::seeds won't mean a failover will happen of some sorts?
[10:22:04] it's not "that" uneven - rb1004: 1.35 TB, the others ~ 1.15 - 1.20TB
[10:22:08] <_joe_> I mean if we start writing data to other nodes, we must make it rejoin
[10:22:24] <_joe_> maybe I didn't get what restbase::seeds is
[10:22:37] _joe_: no, i don't mean to kick it out of the cass cluster, jsut not let rb bother it with requests
[10:23:16] _joe_: the 5xx from RB we are seeing are (mostly) from contacting cass on rb1004
[10:23:25] which times out
[10:23:27] <_joe_> ok
[10:23:31] <_joe_> yeah let's do that
[10:23:36] <_joe_> I'll brew the patch
[10:23:40] k cool thnx
[10:25:39] <_joe_> we will still see those 5xx given other nodes will proxy a certain number of requests to it
[10:25:48] (PS1) Giuseppe Lavagetto: restbase: remove rb1004 from the seeds temporarily [puppet] - https://gerrit.wikimedia.org/r/221846
[10:25:53] <_joe_> mobrovac: ^^
[10:26:35] <_joe_> also, what's up with puppet failing on rb1003?
[10:26:38] <_joe_> I'll look
[10:26:52] _joe_: yes, but at least we'll alleviate the pressure off of rb1004
[10:27:05] <_joe_> ok
[10:28:44] (CR) Mobrovac: [C: 1] "I'll at least alleviate (if only slightly) the pressure off of rb1004 temporarily" [puppet] - https://gerrit.wikimedia.org/r/221846 (owner: Giuseppe Lavagetto)
[10:29:16] should have been "it'll"
[10:29:17] but well
[10:29:18] :)
[10:31:22] (CR) Giuseppe Lavagetto: [C: 2] restbase: remove rb1004 from the seeds temporarily [puppet] - https://gerrit.wikimedia.org/r/221846 (owner: Giuseppe Lavagetto)
[10:37:48] RECOVERY - puppet last run on restbase1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:41:42] !log restbase restarting on all nodes
[10:41:46] Logged the message, Master
[10:42:42] (PS30) Alexandros Kosiaris: lvs: hieraize lvs_services and lvs::monitor [puppet] - https://gerrit.wikimedia.org/r/221065
[10:50:34] <_joe_> mobrovac: puppet is still running
[10:50:40] euh?
[10:50:44] <_joe_> sorry, done
[10:50:49] <_joe_> my connection was flapping
[10:55:01] _joe_: i restarted cass on rb1004 yet again, but getting messages from RB nodes: Setting host 10.64.32.160 as DOWN
[10:55:10] it seems our dance amounted to nothing
[10:56:55] (CR) Giuseppe Lavagetto: [C: 1] lvs: hieraize lvs_services and lvs::monitor [puppet] - https://gerrit.wikimedia.org/r/221065 (owner: Alexandros Kosiaris)
[10:57:40] <_joe_> mobrovac: I feared that
[11:03:27] _joe_: hm, maybe we're good, those may have been errors received from the other cass nodes, and not rb workers, since rb1004 is indeed down
[11:03:35] <_joe_> mobrovac: as far as the system is concerned, 1004 seems healthy
[11:03:36] s/good/"good"/
[11:04:04] k, rb1004 is up again
[11:04:23] <_joe_> cassandra is telling us it's operating normally, apparently
[11:04:27] <_joe_> just very busy
[11:04:30] there was an inexplicable "connection reset" error on rb1004
[11:04:38] _joe_: yes, it's up now
[11:04:46] wasn't for a couple of minutes, though
[11:04:51] (on rb1004 only)
[11:05:08] <_joe_> ot
[11:05:24] <_joe_> it's doing a lot of compaction, but that's expected I guess
[11:05:42] yup
[11:06:16] that's what happens when you put CodeThatWorks(TM) on top of a looot of data
[11:06:24] <_joe_> eheh
[11:06:41] <_joe_> I always warned you guys that cassandra is an uneasy beast
[11:06:47] since in the previous version TheCodeDidntWorkButWeThoughtItDid(TM)
[11:07:13] <_joe_> at $JOB~1 I've seen a lot of ops losing hair and sleep over cassandra
[11:07:14] so it didn't remove all of the re-renderings we thought it should
[11:07:21] <_joe_> I see
[11:07:22] yeah i know
[11:07:31] <_joe_> so you were growing undefinitely?
[11:07:33] <_joe_> I see
[11:07:59] not really, on average data would get deleted, just not for the busiest pages
[11:08:03] <_joe_> classic "we now evict, but have 3 months of unevicted shit that will get evicted immediately"
[11:08:21] something like that
[11:08:25] <_joe_> I think jynus has seen that pattern in the past
[11:08:43] <_joe_> it's a pretty common nightmare for all datastores
[11:09:11] but yeah, the biggest problem with Cassandra (and ES for that matter) is that only when the shit hits the fan you realise something in terribly wrong everywhere - in the app settings, in your data desc, in your data itself
[11:09:19] but by then the ship has sailed ...
[11:09:31] <_joe_> usually is someone doing a "DELETE from some_scratch_table WHERE is_stale=1" withoug a limit :)
[11:09:38] :)
[11:10:06] <_joe_> mobrovac: yeah all those distributed clusters are way less resilient than I'd like them to be
[11:10:21] yup
[11:10:40] k, i see 5xx rate dropping
[11:10:54] possibly the effort has paid off
[11:10:58] let's wait and see
[11:11:00] <_joe_> that's why since 3-4 years my first question is "Tell me five reasons why we can't use mysql"
[11:11:03] <_joe_> yes
[11:11:09] hehe :)
[11:11:25] 1. sql, 2. mysql
[11:11:30] * mobrovac trolling
[11:11:39] _joe_: it's not webscale
[11:11:52] oh wait, webscalesql
[11:11:53] * _joe_ throws mysql's changelogs since 4.0 to moborovac
[11:11:57] there problem solved
[11:11:59] <_joe_> akosiaris: yeah.
[11:12:17] _joe_: clearly because it's not a current buzzword
[11:12:29] <_joe_> mobrovac: you were trolling, but I think 1 is a reason for sure.
[11:12:33] p858snake: let's make it one!
[11:13:03] operations, Gather, Database, Schema-change: Update Gather DB schema for flagging backend - https://phabricator.wikimedia.org/T103611#1394461 (Tgr) Thanks, Jaime! Created T104314 about the primary keys.
[11:13:18] <_joe_> p858snake: was there a time when mysql was an unreliable wreck and a joke as a "real" database. It was probably a buzzword back then
[11:13:45] php was the buzzword back then, mysql was its companion
[11:13:57] <_joe_> still, a good piece of google was based on mysql 4.x for years, soo...
[11:21:32] (PS2) KartikMistry: CX: Enable CX all wikipedias except enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/221831 (https://phabricator.wikimedia.org/T103531)
[11:24:29] <_joe_> out for lunch
[11:29:09] !log restbase restarting cassandra on restbase1005
[11:29:12] Logged the message, Master
[11:36:40] operations, Phabricator, Project-Creators, Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1413346 (samuwmde) I would like to join Project-Creators. Our tech communcation and event team is switching to phabricator for their plannin...
[11:36:44] operations, Phabricator, Project-Creators, Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1413344 (samuwmde) I would like to join Project-Creators. Our tech communcation and event team is switching to phabricator for their plannin...
[11:38:29] (PS4) Matanya: jobchron: log rotate [puppet] - https://gerrit.wikimedia.org/r/218905
[11:50:54] (CR) Santhosh: [C: 1] CX: Enable CX all wikipedias except enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/221831 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[12:23:49] (PS7) Yuvipanda: Labs: small race condition fix in replica-addusers.pl [puppet] - https://gerrit.wikimedia.org/r/218880 (https://phabricator.wikimedia.org/T92561) (owner: coren)
[12:30:21] PROBLEM - Cassanda CQL query interface on restbase1004 is CRITICAL: Connection refused
[12:30:28] PROBLEM - Cassandra database on restbase1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon
[12:32:56] RECOVERY - Cassandra database on restbase1004 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon
[12:33:39] ^^ known
[12:33:48] (unfortunately)
[12:34:25] (PS9) Yuvipanda: Labs: Rewrite of manage-nfs-volumes-daemon [puppet] - https://gerrit.wikimedia.org/r/217861 (https://phabricator.wikimedia.org/T102782) (owner: coren)
[12:34:28] (CR) Steinsplitter: "@" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/221600 (owner: Matanya)
[12:34:40] (CR) Steinsplitter: "without "»" after url" [mediawiki-config] - https://gerrit.wikimedia.org/r/221600 (owner: Matanya)
[12:35:16] RECOVERY - Cassanda CQL query interface on restbase1004 is OK: TCP OK - 0.004 second response time on port 9042
[12:37:40] (PS2) Matanya: add unibas.ch to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/221600
[12:39:10] matanya: you missed *.ub.unibas.ch (https://gerrit.wikimedia.org/r/#/c/221600/2/wmf-config/InitialiseSettings.php) sorry for being annoying :(
[12:39:29] thanks Steinsplitter no worries
[12:40:03] (PS3) Matanya: add unibas.ch to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/221600
[12:41:40] (CR) Steinsplitter: [C: 1] add unibas.ch to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/221600 (owner: Matanya)
[12:44:10] Puppet, Labs, Labs-Sprint-104: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1413471 (yuvipanda) A couple of methods: # Pages named Hiera:/host/hostname # a 'host' key in the Hiera: page itself.
[12:44:37] PROBLEM - puppet last run on mw2133 is CRITICAL puppet fail
[13:01:57] RECOVERY - puppet last run on mw2133 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:10:03] _joe_: i think we're out of the danger zone
[13:10:15] stats and logs are back to normal
[13:10:48] (PS2) Muehlenhoff: Update to 3.19.8-ckt2 [debs/linux] - https://gerrit.wikimedia.org/r/221642
[13:16:26] (PS1) Mobrovac: Revert "restbase: remove rb1004 from the seeds temporarily" [puppet] - https://gerrit.wikimedia.org/r/221854
[13:16:36] _joe_: ^^
[13:22:55] (CR) Ottomata: [C: 1] "Should I merge?" [puppet] - https://gerrit.wikimedia.org/r/221801 (https://phabricator.wikimedia.org/T101194) (owner: Milimetric)
[13:23:18] PROBLEM - puppet last run on sca1001 is CRITICAL Puppet has 1 failures
[13:27:37] operations: jmxtrans log rotation failure - https://phabricator.wikimedia.org/T104271#1413582 (Ottomata) SIGHHHHHHH STUPID JMXTRANS BUG, ok! On it, thanks!
[13:28:38] (PS1) ArielGlenn: change directory where wikidata json dumps are synced to in labs [puppet] - https://gerrit.wikimedia.org/r/221855
[13:32:28] (PS1) Yuvipanda: labstore: Simplify (and expand!) projects-config.yaml [puppet] - https://gerrit.wikimedia.org/r/221856
[13:32:58] paravoid: ^, part of finishing up the manage-nfs-volumes deamon. it can be vastly simplified if it doesn't need to use LDAP at all, and this is part of that.
[13:33:01] should be a noop.
[13:33:07] (CR) jenkins-bot: [V: -1] labstore: Simplify (and expand!) projects-config.yaml [puppet] - https://gerrit.wikimedia.org/r/221856 (owner: Yuvipanda)
[13:33:28] (CR) Milimetric: [C: 1] "Yes, this is good to merge now." [puppet] - https://gerrit.wikimedia.org/r/221801 (https://phabricator.wikimedia.org/T101194) (owner: Milimetric)
[13:33:47] I can then use https://wikitech.wikimedia.org/w/api.php?action=query&list=novainstances&niproject=tools&niregion=eqiad&format=json (and similar) to get individual IP addresses...
[13:34:44] operations, vm-requests: eqiad: (1) VM for static-bugzilla - https://phabricator.wikimedia.org/T103604#1413599 (JohnLewis) there will very likely just be a single host so staticbugs.eqiad.wmmet or something would be best following the misc naming style as this isn't really a 'service' perse.
[13:36:22] akosiaris: could you look over https://phabricator.wikimedia.org/T103604 when you have time? :)
[13:38:07] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:39:23] (PS2) Yuvipanda: labstore: Simplify (and expand!) projects-config.yaml [puppet] - https://gerrit.wikimedia.org/r/221856
[13:41:56] (PS10) Yuvipanda: Labs: Rewrite of manage-nfs-volumes-daemon [puppet] - https://gerrit.wikimedia.org/r/217861 (https://phabricator.wikimedia.org/T102782) (owner: coren)
[13:41:58] (PS3) Yuvipanda: labstore: Simplify (and expand!) projects-config.yaml [puppet] - https://gerrit.wikimedia.org/r/221856
[13:42:25] operations, HHVM, Tracking: Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking) - https://phabricator.wikimedia.org/T86081#1413623 (Krenair)
[13:42:29] operations, Labs, wikitech.wikimedia.org, HHVM: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1413617 (Krenair) Invalid>Open It's an absolute blocker for {T91590}, {T94149}, and perhaps {T75901}. If you want to close this, you'll have to reject at least those first two.
[13:44:30] operations, Labs, wikitech.wikimedia.org, HHVM: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1413626 (Reedy) >>! In T98813#1413617, @Krenair wrote: > It's an absolute blocker for {T91590}, {T94149}, and perhaps {T75901}. If you want to close this, you'll have to reject at least t...
[13:55:05] Puppet, Labs, Labs-Sprint-104: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1413686 (scfc) I prefer 1. so that it is possible to just copy the project Hiera page to a host's Hiera page to start customizing that.
[14:00:15] (PS1) Ottomata: Redirect stderr of this cron to the same logfile [puppet] - https://gerrit.wikimedia.org/r/221858
[14:03:24] Puppet, Labs, Labs-Sprint-104: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1413702 (yuvipanda) Yeah, I prefer $1 too, since that mirrors closely what we have in the operations/puppet git repository
[14:04:23] (CR) Muehlenhoff: [C: 2 V: 2] Update to 3.19.8-ckt2 [debs/linux] - https://gerrit.wikimedia.org/r/221642 (owner: Muehlenhoff)
[14:10:11] !log restbase restarting cassandra on restbase1005
[14:11:37] (CR) Addshore: [C: 1] "yes!" [puppet] - https://gerrit.wikimedia.org/r/221855 (owner: ArielGlenn)
[14:12:03] hashar: https://gerrit.wikimedia.org/r/#/c/217466/ is failing to rebase
[14:12:14] hashar: do you have an up-to-date running on the integration puppetmaster?
[14:12:23] hashar: I can review/merge now if you rebase :)
[14:12:26] PROBLEM - puppet last run on mw2003 is CRITICAL puppet fail
[14:16:53] (CR) ArielGlenn: [C: 2] change directory where wikidata json dumps are synced to in labs [puppet] - https://gerrit.wikimedia.org/r/221855 (owner: ArielGlenn)
[14:20:06] (PS2) Ottomata: Redirect stderr of this cron to the same logfile [puppet] - https://gerrit.wikimedia.org/r/221858
[14:20:45] moritzm: I don’t understand the difference between what you suggest on https://gerrit.wikimedia.org/r/#/c/220305/ and the status quo.
[14:20:53] Or, rather, I understand what you’re suggesting but not how it would make a difference.
[14:21:25] (CR) Ottomata: [C: 2] Redirect stderr of this cron to the same logfile [puppet] - https://gerrit.wikimedia.org/r/221858 (owner: Ottomata)
[14:24:35] (CR) Faidon Liambotis: [C: 1] "Yup." [puppet] - https://gerrit.wikimedia.org/r/221805 (https://phabricator.wikimedia.org/T104274) (owner: BBlack)
[14:27:36] RECOVERY - puppet last run on mw2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:27:37] (PS2) Giuseppe Lavagetto: Revert "restbase: remove rb1004 from the seeds temporarily" [puppet] - https://gerrit.wikimedia.org/r/221854 (owner: Mobrovac)
[14:27:45] <_joe_> mobrovac: merging it
[14:27:54] k thnx
[14:27:55] (CR) Giuseppe Lavagetto: [C: 2] Revert "restbase: remove rb1004 from the seeds temporarily" [puppet] - https://gerrit.wikimedia.org/r/221854 (owner: Mobrovac)
[14:28:41] (CR) Giuseppe Lavagetto: [V: 2] Revert "restbase: remove rb1004 from the seeds temporarily" [puppet] - https://gerrit.wikimedia.org/r/221854 (owner: Mobrovac)
[14:30:05] (PS11) Yuvipanda: [WIP] labstore: Rewrite of manage-nfs-volumes-daemon [puppet] - https://gerrit.wikimedia.org/r/217861 (https://phabricator.wikimedia.org/T102782) (owner: coren)
[14:30:09] (CR) Andrew Bogott: "In that case why not just leave puppetsigner.py as it is? Either way we're polling" [puppet] - https://gerrit.wikimedia.org/r/220305 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[14:30:12] paravoid: ^ almost a full rewrite.
[14:30:27] aha
[14:30:53] before I look deeper into this
[14:31:05] I had agreed with Coren to move everything labstore-related out of ldap/openstack
[14:31:11] paravoid: yes, that's a WIP
[14:31:14] and he prepared patches for that
[14:31:14] paravoid: I'm doing that now
[14:31:30] didn't want to do that and code changes in one patchset
[14:31:39] yeah, but I thought he already did that
[14:31:45] https://gerrit.wikimedia.org/r/#/c/218666/
[14:31:56] paravoid: nope, not for that file :)
[14:32:00] nor for replica-addusers...
[14:32:16] so no merge conflicts there.
[14:32:37] I would really like to do this in 3 steps
[14:33:02] 1) Move everything that *is* puppetized into its own module, so we can read it and audit it
[14:33:21] 2) Add to that module everything that is not puppetized, as it is right now on the systems
[14:33:39] 3) Start making changes like rewrites and such, in puppet
[14:33:56] I can't really help with (2)
[14:33:57] does that make sense?
[14:34:12] <_joe_> paravoid: that would be boring!
[14:34:13] operations: Investigate Ubuntu fork of ttf-indic-fonts and bring it in Jessie - https://phabricator.wikimedia.org/T103328#1413811 (Aklapper) > We should examine the differences to reach parity. For some background, see https://bugs.launchpad.net/ubuntu/+source/ttf-indic-fonts/+bug/958345 comment 10 and later.
[14:34:39] paravoid: it does, but I can't help with (2) and new instances in projects with NFS enabled don't get NFS atm...
[14:34:42] paravoid: I can do (1) though.
[14:34:59] (1) is supposedely that patchset above, but from what you're saying it needs further work
[14:35:09] (2) was documented somewhere by Marc and it shouldn't be that hard to figure out
[14:35:13] paravoid: which patchset? https://gerrit.wikimedia.org/r/#/c/218666/8?
[14:35:19] yes
[14:35:34] paravoid: those are all additions, so I think that's (2)
[14:35:34] ah no, that's (2)
[14:35:54] yes
[14:36:03] paravoid: so I can do (1)
[14:36:17] (1) was supposed to be https://gerrit.wikimedia.org/r/#/c/220618/
[14:36:20] which would move things from the openstack and ldap modules
[14:36:31] but I gather it was incomplete
[14:36:34] it is
[14:36:41] alright, let's complete it then
[14:36:42] ther's lots more in the openstack and ldap module
[14:36:58] then let's review/merge 218666
[14:37:07] then rebase the rewrite(s) on top of this
[14:37:24] it sounds convoluted and I'm sorry
[14:37:38] but I saw you moved it from modules/ldap to modules/openstack
[14:37:45] akosiaris: around?
[14:37:49] paravoid: paravoid no, that was what Coren|Away had done :)
[14:37:56] I am moving it to labstore module now.
[14:38:15] I didn't want to move in the same patchset as the rewrite so as to not confuse the diffs...
[14:38:16] operations, Continuous-Integration-Infrastructure: Investigate usage of ttf-ubuntu-font-family which is not available on Jessie - https://phabricator.wikimedia.org/T103325#1413825 (Aklapper)
[14:38:28] yes, that's not what I said either :)
[14:38:30] and rename them too
[14:38:33] indeed, indeed.
[14:38:38] let's move it in a separate patchset
[14:38:40] I'll switch ordering around and do rebases
[14:38:42] yeah
[14:38:49] awesome
[14:38:50] thanks :)
[14:39:16] paravoid: :) do you think you can still look at my rewrite and validate my assumptions / directions? ok to tell me to wait until everything else is in order too :)
[14:39:19] but that won't change the code...
[14:39:26] yup, will do
[14:39:42] paravoid: also, I'm not sure what effects merging https://gerrit.wikimedia.org/r/#/c/218666/8 will have as well. theoretically none - I guess I can do a diff on what's in there and what's in the patch and verify
[14:40:04] yes :)
[14:40:18] * YuviPanda does
[14:42:35] operations, Continuous-Integration-Infrastructure: Investigate usage of ttf-ubuntu-font-family which is not available on Jessie - https://phabricator.wikimedia.org/T103325#1413834 (Dzahn) https://gerrit.wikimedia.org/r/#/c/218640/
[14:43:43] (PS1) Ottomata: Need to append stderr to log file for aggregator cron job [puppet] - https://gerrit.wikimedia.org/r/221862
[14:44:59] (CR) Yuvipanda: [C: -1] "diffs :'(" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478) (owner: coren)
[14:45:05] paravoid: ^ :(
[14:45:05] (CR) Ottomata: [C: 2] Need to append stderr to log file for aggregator cron job [puppet] - https://gerrit.wikimedia.org/r/221862 (owner: Ottomata)
[14:46:28] paravoid: I will amend the patches to match what's on labstore1002 now? major difference seems to be that in labstore1002 nfs2/3 are disabled
[14:46:42] !log Update cxserver to 0d21a80
[14:46:47] Logged the message, Master
[14:47:02] YuviPanda: yes
[14:47:08] paravoid: doing so now.
[14:47:10] YuviPanda: the /exp/keys line is for /public/keys
[14:47:15] godog: can you merge, https://gerrit.wikimedia.org/r/#/c/221832/
[14:47:18] paravoid: which we no longer need...
[14:47:18] which I guess it was removed in the meantime
[14:47:21] right
[14:47:22] right
[14:47:30] jouncebot, next
[14:47:30] In 0 hour(s) and 12 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T1500)
[14:47:33] operations, Traffic, hardware-requests: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1413840 (Cmjohnson) I received the LVS servers. I am assuming these are going to be split across 3 10G rows (A,C,D). Please advise
[14:50:45] (PS9) Yuvipanda: labstore: More puppetization fixes for labstore* [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478) (owner: coren)
[14:50:48] paravoid: ^
[14:50:57] can you +1? I'll babysit and make sure there are no diffs
[14:50:59] s/make sure/pray/
[14:51:14] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1413851 (Dzahn)
[14:52:02] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1410752 (Dzahn) I assume by sudo you mean root, as in sudo ALL ALL, not a defined set of a commans that can be executed as a specific user.
[14:52:03] ok. Who can merge, https://gerrit.wikimedia.org/r/#/c/221832/
[14:52:49] paravoid is on duty.. :)
[14:53:17] * paravoid delegates to akosiaris
[14:53:28] * kart_ is looking for akosiaris :)
[14:54:05] * kart_ is happy that this kind of boring changes are not going to happen every week now!
[14:55:10] (PS2) Dzahn: static-bugzilla: add bug number comments [puppet] - https://gerrit.wikimedia.org/r/221351
[14:55:55] paravoid: wow, so much crap in ldap::client...
[14:56:30] (CR) Faidon Liambotis: [C: -1] "First pass :)" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/217861 (https://phabricator.wikimedia.org/T102782) (owner: coren)
[14:57:58] YuviPanda: can you merge 221832? We need it in 3 minutes :D
[14:58:18] (PS2) Yuvipanda: CX: Add languages for deployment on 20150630 [puppet] - https://gerrit.wikimedia.org/r/221832 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[14:58:20] kart_: sure
[14:58:31] YuviPanda: thanks for saving the world.
[14:58:31] operations, Analytics-Cluster, hardware-requests: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1413865 (Cmjohnson) I received the servers but do not have any place to put most of them. 4 are destined to go to Row D2 but the others I am not sure. I know Andrew B sa...
[14:58:31] kart_: you guys need to move it out of there soon however...
[14:58:43] kart_: shall I merge it now?
[14:58:43] YuviPanda: yes. We will.
[14:59:09] YuviPanda: languages will be picked up from config.defaults.js
[14:59:24] I guess that's a 'yes, merge it now'?
[14:59:27] Krenair: are you SWATing today, or am I?
[14:59:39] can you do it please?
[14:59:59] yup, np, kart_ you look like you're around, but ping anyway :P
[15:00:04] manybubbles anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T1500). Please do the needful.
[15:00:08] I have a hopefully-no-op config change
[15:00:15] (CR) Dzahn: [C: 2] static-bugzilla: add bug number comments [puppet] - https://gerrit.wikimedia.org/r/221351 (owner: Dzahn)
[15:00:23] I'm here.
[15:00:35] that's not really important but am trying to make the config a little more clean and standard where possible
[15:01:10] YuviPanda: yes!
[15:01:28] operations, Analytics-Cluster, hardware-requests: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1413869 (Ottomata) Hm. We don't have any plans to replace any analytics nodes, but we might be able to move some of them around, as some will be repurposed as non distri...
[15:01:38] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/221831 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[15:01:44] (Merged) jenkins-bot: CX: Enable CX all wikipedias except enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/221831 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[15:01:51] (CR) Yuvipanda: [C: 2 V: 2] CX: Add languages for deployment on 20150630 [puppet] - https://gerrit.wikimedia.org/r/221832 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[15:02:29] (PS3) Yuvipanda: CX: Add languages for deployment on 20150630 [puppet] - https://gerrit.wikimedia.org/r/221832 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[15:02:39] (CR) Yuvipanda: [V: 2] CX: Add languages for deployment on 20150630 [puppet] - https://gerrit.wikimedia.org/r/221832 (https://phabricator.wikimedia.org/T103531) (owner: KartikMistry)
[15:02:48] kart_: done
[15:02:57] cool
[15:03:22] you deserve leap beer
[15:03:41] (CR) Yuvipanda: [WIP] labstore: Rewrite of manage-nfs-volumes-daemon (3 comments) [puppet] - https://gerrit.wikimedia.org/r/217861 (https://phabricator.wikimedia.org/T102782) (owner: coren)
[15:04:01] !log thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable CX all wikipedias except enwiki [[gerrit:221831]] (duration: 00m 13s)
[15:04:08] ^ kart_ check please
[15:04:11] Logged the message, Master
[15:04:16] \O/
[15:04:29] (CR) Dzahn: [C: 2] tendril: sync changes from github repo [software/tendril] - https://gerrit.wikimedia.org/r/221184 (https://phabricator.wikimedia.org/T98816) (owner: Dzahn)
[15:04:58] kart_: you're doing it wrong
[15:05:03] you're supposed to test on enwiki first
[15:05:21] Reedy: really?
[15:05:36] (PS1) Yuvipanda: labstore: Move stuff into module from ldap module [puppet] - https://gerrit.wikimedia.org/r/221864 (https://phabricator.wikimedia.org/T102478)
[15:05:42] I got cx on frwiki at least
[15:05:45] thcipriani: thanks!
[15:06:05] kart_: looks good then?
[15:06:19] Reedy: it was deployed incrementally in other wikis. enwiki next week.
[15:06:29] kart_: I'm trolling you :)
[15:06:37] s/you//
[15:06:45] You're always trolling :p
[15:06:57] I don't get paid not to troll
[15:06:57] PROBLEM - jmxtrans on analytics1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[15:07:09] If it doesn't work on enwiki, it won't work anywhere else ;D
[15:07:43] i am messing with jmxtrans
[15:07:46] trying to fix
[15:08:39] :)
[15:09:14] thcipriani: looks fine, we will find where CX will break gadgets and common.js now
[15:09:31] kart_: ok!
[15:09:41] (PS1) Yuvipanda: labs: Remove manage-keys-nfs and friends [puppet] - https://gerrit.wikimedia.org/r/221865
[15:09:46] (CR) jenkins-bot: [V: -1] labs: Remove manage-keys-nfs and friends [puppet] - https://gerrit.wikimedia.org/r/221865 (owner: Yuvipanda)
[15:09:50] paravoid: ^ I think these two cover most of it?
[15:09:51] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/221803 (https://phabricator.wikimedia.org/T31902) (owner: Alex Monk)
[15:09:55] (CR) Dzahn: [V: 2] tendril: sync changes from github repo [software/tendril] - https://gerrit.wikimedia.org/r/221184 (https://phabricator.wikimedia.org/T98816) (owner: Dzahn)
[15:10:07] I wonder if touching all the config files will make hhvm freak out.
[15:10:25] (Merged) jenkins-bot: Standardise a ton of ticket comments [mediawiki-config] - https://gerrit.wikimedia.org/r/221803 (https://phabricator.wikimedia.org/T31902) (owner: Alex Monk)
[15:12:32] !log thcipriani Synchronized wmf-config: SWAT: Standardise a ton of ticket comments [[gerrit:221803]] (duration: 00m 13s)
[15:12:38] Logged the message, Master
[15:12:48] (PS2) Yuvipanda: labstore: Move stuff into module from ldap module [puppet] - https://gerrit.wikimedia.org/r/221864 (https://phabricator.wikimedia.org/T102478)
[15:12:50] ^ Krenair synced :)
[15:12:50] (PS10) Yuvipanda: labstore: More puppetization fixes for labstore* [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478) (owner: coren)
[15:12:52] (PS2) Yuvipanda: labs: Remove manage-keys-nfs and friends [puppet] - https://gerrit.wikimedia.org/r/221865
[15:13:28] thcipriani, great, thanks
[15:13:47] paravoid: am not going to rebase my rewrite until these are merged, to prevent rebase-hell :)
[15:13:53] let me know if I've missed any files.
[15:16:57] RECOVERY - jmxtrans on analytics1012 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[15:17:51] (CR) Faidon Liambotis: [C: -1] "Woot!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/221865 (owner: Yuvipanda)
[15:18:45] YuviPanda: do we use homedirectorymanager.py anywhere anymore?
[15:19:01] paravoid: it's included in a couple of other scripts.
[15:19:07] ah right
[15:19:11] yeah, I just git greped
[15:19:14] dammit :)
[15:19:25] Krenair: the past couple SWAT deploys you've just been going through the mediawiki-config patches that are verified +1 and getting them out: have you been pinging people to see if they're around? Or just doing it if it seems innocuous?
[15:20:05] I pick off ones that I know are fine to do and can verify myself
[15:20:32] (CR) Faidon Liambotis: [C: 1] labstore: Move stuff into module from ldap module [puppet] - https://gerrit.wikimedia.org/r/221864 (https://phabricator.wikimedia.org/T102478) (owner: Yuvipanda)
[15:20:42] Most of the requests from individual wikis for small config tweaks are easy, for example
[15:21:01] (PS3) Yuvipanda: labs: Remove manage-keys-nfs and friends [puppet] - https://gerrit.wikimedia.org/r/221865
[15:21:42] (PS1) Yuvipanda: labstore: Split monitoring out into its own class [puppet] - https://gerrit.wikimedia.org/r/221866
[15:21:49] paravoid: updated and another commit as well.
[15:22:05] (CR) Faidon Liambotis: [C: 1] "If this is no-diff from what's @ labstore1002, it's a +2 ;)" [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478) (owner: coren)
[15:22:47] (CR) Faidon Liambotis: [C: 1] labstore: Split monitoring out into its own class [puppet] - https://gerrit.wikimedia.org/r/221866 (owner: Yuvipanda)
[15:22:57] paravoid: andrewbogott https://phabricator.wikimedia.org/T104342 (cleanup of the ldap module scripts)
[15:23:23] (CR) Yuvipanda: [C: 2] labstore: More puppetization fixes for labstore* [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478) (owner: coren)
[15:27:19] kart_: now, yes, I am around
[15:27:22] paravoid: cool, that patch was a noop.
[15:27:25] * YuviPanda merges others
[15:27:28] :D
[15:27:31] akosiaris: :)
[15:27:57] so, Yuvi merged the cxserver patch ?
[15:28:04] or did I read the backscroll wrong ?
[15:28:19] (CR) Yuvipanda: [C: +2] labstore: Move stuff into module from ldap module [puppet] - https://gerrit.wikimedia.org/r/221864 (https://phabricator.wikimedia.org/T102478) (owner: Yuvipanda)
[15:28:20] paravoid: https://phabricator.wikimedia.org/T100503 may be a nice ops duty task if you have time :)
[15:29:20] akosiaris: he did.
[15:29:53] JohnFLewis: also the bugzilla vm is on my radar, I've just not found the time yet
[15:30:02] JohnFLewis: I'll leave it to robh :)
[15:30:05] akosiaris: okay, just checking :)
[15:30:13] paravoid: sneaky
[15:30:15] I am finishing something and I will address it
[15:30:32] :P
[15:32:11] paravoid: wanna +1 https://gerrit.wikimedia.org/r/#/c/221865/? addressed your concerns
[15:33:05] (CR) Faidon Liambotis: [C: +2] labs: Remove manage-keys-nfs and friends [puppet] - https://gerrit.wikimedia.org/r/221865 (owner: Yuvipanda)
[15:33:41] good stuff.
[15:34:10] am waiting for run to complete before merging the last in the series.
[15:34:20] paravoid: anything else before I rebase the rewrite?
[15:35:07] nope!
[15:35:13] (PS3) BBlack: redirects: use separate ServerAlias directives for each alias [puppet] - https://gerrit.wikimedia.org/r/221291
[15:36:26] PROBLEM - puppet last run on holmium is CRITICAL Puppet has 1 failures
[15:36:42] paravoid: whoopsy, lots of churn in puppet due to that force => true, as it kills all the files in /public/keys by generating a delete for each one...
[15:37:04] that's why I said to require the unmount first
[15:37:07] (CR) BBlack: [C: +2 V: +2] redirects: use separate ServerAlias directives for each alias [puppet] - https://gerrit.wikimedia.org/r/221291 (owner: BBlack)
[15:37:16] PROBLEM - puppet last run on labcontrol2001 is CRITICAL Puppet has 1 failures
[15:37:34] paravoid: no, these aren't mounted.
[15:37:48] paravoid: during the outage, me and _joe_ changed it so that the script ran on the instances themselves.
[15:37:50] (PS2) BBlack: switch to explicit ciphersuite lists [puppet] - https://gerrit.wikimedia.org/r/221805 (https://phabricator.wikimedia.org/T104274)
[15:37:56] paravoid: and just created /public/keys
[15:37:58] on each of them
[15:38:02] this allowed us to bring precise instances back
[15:38:06] so they're all on the local system
[15:38:10] paravoid: I also did require the unmount :)
[15:38:11] (CR) Faidon Liambotis: [C: +1] create ssl::unified as a non-SNI alternative to ssl::sni [puppet] - https://gerrit.wikimedia.org/r/221741 (owner: BBlack)
[15:38:46] PROBLEM - puppet last run on nembus is CRITICAL Puppet has 1 failures
[15:38:49] (CR) Faidon Liambotis: create ssl::unified as a non-SNI alternative to ssl::sni (1 comment) [puppet] - https://gerrit.wikimedia.org/r/221741 (owner: BBlack)
[15:38:55] (CR) BBlack: [C: +2] switch to explicit ciphersuite lists [puppet] - https://gerrit.wikimedia.org/r/221805 (https://phabricator.wikimedia.org/T104274) (owner: BBlack)
[15:39:07] (CR) Faidon Liambotis: [C: +1] primary ssl services -> unified-only, not SNI [puppet] - https://gerrit.wikimedia.org/r/221670 (owner: BBlack)
[15:39:21] bblack: I didn't actually check if ssl::unified is going to clean up the SNI vhosts
[15:39:37] PROBLEM - puppet last run on db1011 is CRITICAL Puppet has 1 failures
[15:39:37] other than that, fine with me
[15:40:59] (PS2) Ottomata: Add a new limn datafile generator: extdist [puppet] - https://gerrit.wikimedia.org/r/221801 (https://phabricator.wikimedia.org/T101194) (owner: Milimetric)
[15:41:14] paravoid: it doesn't
[15:41:16] (CR) Ottomata: [C: +2 V: +2] Add a new limn datafile generator: extdist [puppet] - https://gerrit.wikimedia.org/r/221801 (https://phabricator.wikimedia.org/T101194) (owner: Milimetric)
[15:41:26] I figured I'll do it minimally/manually at first
[15:41:46] (PS2) Yuvipanda: labstore: Split monitoring out into its own class [puppet] - https://gerrit.wikimedia.org/r/221866
[15:41:51] I think it cleans up sites-available, just not the rest of it
[15:41:53] (CR) Yuvipanda: [C: +2 V: +2] labstore: Split monitoring out into its own class [puppet] - https://gerrit.wikimedia.org/r/221866 (owner: Yuvipanda)
[15:42:11] ok
[15:42:16] andrewbogott: when monitoring a puppet run on nembus, I saw
[15:42:16] > Notice: /Stage[main]/Ldap::Server/Service[opendj]/ensure: ensure changed 'stopped' to 'running'
[15:42:17] just now
[15:42:57] Try again, does it say it on every run?
[15:43:10] Ops-Access-Requests, operations, Patch-For-Review: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1414015 (Niedzielski) @RobH, thanks! Who is responsible for merging the patch? I wasn't sure if I should +2 it or not.
[15:43:22] yeah, running again
[15:43:56] RECOVERY - puppet last run on nembus is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:44:27] (PS7) John F. Lewis: install-server: rename module to install_server [puppet] - https://gerrit.wikimedia.org/r/221787
[15:44:30] Ops-Access-Requests, operations, Patch-For-Review: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1414027 (Reedy) >>! In T103871#1414015, @Niedzielski wrote: > @RobH, thanks! Who is responsible for merging the patch? I wasn't sure if I should +2 it or not. I'd s...
[15:44:40] (PS8) John F. Lewis: install-server: rename module to install_server [puppet] - https://gerrit.wikimedia.org/r/221787
[15:44:53] Ops-Access-Requests, operations, Patch-For-Review: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1414029 (RobH) Only operations can merge in the patch.
[15:44:56] (PS2) BBlack: create ssl::unified as a non-SNI alternative to ssl::sni [puppet] - https://gerrit.wikimedia.org/r/221741
[15:45:24] paravoid / akosiaris: reviews for https://gerrit.wikimedia.org/r/#/c/221787/ welcome when you have time
[15:45:41] andrewbogott: yes, on every puppet run
[15:46:09] (CR) BBlack: [C: +2] create ssl::unified as a non-SNI alternative to ssl::sni [puppet] - https://gerrit.wikimedia.org/r/221741 (owner: BBlack)
[15:46:13] YuviPanda: ok, looking...
[15:46:18] (PS5) BBlack: primary ssl services -> unified-only, not SNI [puppet] - https://gerrit.wikimedia.org/r/221670
[15:46:36] RECOVERY - puppet last run on holmium is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[15:47:27] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures
[15:48:09] (CR) RobH: [C: +1] "seems like a straightforward rename to adopt puppet naming standards to me. I see no mistakes, but even one could cause some major issues" [puppet] - https://gerrit.wikimedia.org/r/221787 (owner: John F. Lewis)
[15:48:12] (PS1) Yuvipanda: labstore: Do not pass monitor_iface to labstore::fileserver [puppet] - https://gerrit.wikimedia.org/r/221870
[15:48:22] (PS2) Yuvipanda: labstore: Do not pass monitor_iface to labstore::fileserver [puppet] - https://gerrit.wikimedia.org/r/221870
[15:48:30] (CR) Yuvipanda: [C: +2 V: +2] labstore: Do not pass monitor_iface to labstore::fileserver [puppet] - https://gerrit.wikimedia.org/r/221870 (owner: Yuvipanda)
[15:48:50] YuviPanda: it’s happening on neptunium too. It must be harmless, but… strange.
[15:49:35] paravoid: re 221670 (switch to unified): it seems like even if this sticks and we keep things this way through our next cert orders, etc... we'd probably keep one separate SNI cert for *.wm.o to use for misc-lb and those sorts of cases anyways, maybe?
[15:49:52] if we would, no point moving upload/bits off of it either, pointless cert size increase for them
[15:50:36] paravoid: HAHAHAAHAAAAAAAAAAAAAAAA
[15:50:40] labstore/manifests/fileserver.pp
[15:50:45] has class named labstore::fileserve
[15:50:46] (CR) Alex Monk: "Does not seem to work." [mediawiki-config] - https://gerrit.wikimedia.org/r/221825 (owner: BryanDavis)
[15:50:49] (no r)
[15:51:15] (PS1) Yuvipanda: labstore: Fix typo [puppet] - https://gerrit.wikimedia.org/r/221871
[15:51:19] (CR) jenkins-bot: [V: -1] labstore: Fix typo [puppet] - https://gerrit.wikimedia.org/r/221871 (owner: Yuvipanda)
[15:51:32] (PS2) Yuvipanda: labstore: Fix typo [puppet] - https://gerrit.wikimedia.org/r/221871
[15:51:38] (CR) Yuvipanda: [C: +2 V: +2] labstore: Fix typo [puppet] - https://gerrit.wikimedia.org/r/221871 (owner: Yuvipanda)
[15:52:58] PROBLEM - puppet last run on labstore1001 is CRITICAL puppet fail
[15:53:24] Glaisher, Dereckson: I have given up trying to come to any sort of understanding on https://phabricator.wikimedia.org/T96523
[15:53:33] there there, puppet
[15:53:52] YuviPanda: so, the init script for opendj is dumb and can’t detect when opendj is already running. It’s harmless; when ‘service start’ tries to restart it it hits a logfile and backs out.
[15:54:01] heh ouch
[15:54:05] ok
[15:54:12] we should migrate to openldap at some point...
[15:54:39] andrewbogott: the nfs daemon no longer depends on LDAP https://gerrit.wikimedia.org/r/#/c/217861/
[15:54:41] (still WIP)
[15:54:44] we should migrate to
[15:54:47] RECOVERY - puppet last run on db1011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:55:21] andrewbogott: :D so off LDAP, you mean?
[15:55:28] maybe :(
[15:55:52] andrewbogott: heh. does horizon have an API?
[15:56:15] keystone you mean?
[15:56:29] https://wiki.openstack.org/wiki/Horizon/RESTAPI
[15:57:27] andrewbogott: oh, for queries and stuff, but yeah, keystone. ugh, this gonna be fun, I guess.
[15:57:28] eh..but see that comment about not being supported for external use
[15:57:49] andrewbogott: we should move off LDAP for puppet roles at some point, I guess.
[15:58:15] Yeah, we’re not too far from that.
[15:58:48] yeah
[15:58:58] there's a fairly complete ENC that I need to finish up...
[15:59:08] and then per-host and per-role hiera support...
[16:03:08] (CR) BryanDavis: [C: -1] "Blerg. Yeah this won't work (at least not deterministically) due to the way in which the logging system is wired up internally. Once Logge" [mediawiki-config] - https://gerrit.wikimedia.org/r/221825 (owner: BryanDavis)
[16:03:54] (PS31) Alexandros Kosiaris: lvs: hieraize lvs_services and lvs::monitor [puppet] - https://gerrit.wikimedia.org/r/221065
[16:05:07] PROBLEM - puppet last run on wtp1001 is CRITICAL Puppet has 1 failures
[16:05:28] <_joe_> bd808: still working on scap restarting the appservers?
[16:05:41] (CR) BryanDavis: "Hmm.. COW may kick in before the array is used via $wgMWLoggerDefaultSpi. We should try updating $wgMWLoggerDefaultSpi['args'][0] instead " [mediawiki-config] - https://gerrit.wikimedia.org/r/221825 (owner: BryanDavis)
[16:05:47] PROBLEM - puppet last run on terbium is CRITICAL Puppet has 1 failures
[16:06:16] PROBLEM - puppet last run on mw1241 is CRITICAL Puppet has 1 failures
[16:06:24] <_joe_> a lot of puppet failures
[16:06:28] <_joe_> looking
[16:06:42] _joe_: I implemented the cave man solution and it didn't work very well. I can give it another try when you have the better way to talk to pybal sorted
[16:07:22] <_joe_> bd808: I do have it, actually
[16:07:27] PROBLEM - puppet last run on mw1215 is CRITICAL Puppet has 1 failures
[16:07:42] <_joe_> we're just ~1 week away to have it everywhere, it's already in place in codfw
[16:07:54] <_joe_> that's why I asked :)
[16:07:58] !log restarting cassandra instance on restbase1004
[16:08:03] Logged the message, Master
[16:08:05] <_joe_> gwicke: again?
[16:08:17] RECOVERY - puppet last run on labstore1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:08:18] it's our new favorite game
[16:08:28] <_joe_> I think mobrovac already did that three hours ago
[16:08:37] <_joe_> gwicke: cassandra whack-a-mole?
[16:08:59] _joe_: the pybal depool just needs to replace this section of scap -- https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/main.py#L499-L518
[16:08:59] yeah did that a couple of times today
[16:09:12] those instances have very little headroom left, which means that every small perturbance can push them over the edge
[16:09:34] basically, too much storage per instance
[16:09:39] (PS4) Yuvipanda: labstore: Simplify (and expand!) projects-config.yaml [puppet] - https://gerrit.wikimedia.org/r/221856
[16:09:41] (PS1) Yuvipanda: [WIP] labstore: Rewrite of manage-nfs-volumes-daemon [puppet] - https://gerrit.wikimedia.org/r/221872 (https://phabricator.wikimedia.org/T102782)
[16:09:51] !log disabling puppet on all lvs and neon
[16:09:55] Logged the message, Master
[16:10:00] (Abandoned) Yuvipanda: [WIP] labstore: Rewrite of manage-nfs-volumes-daemon [puppet] - https://gerrit.wikimedia.org/r/217861 (https://phabricator.wikimedia.org/T102782) (owner: coren)
[16:10:33] (CR) Alexandros Kosiaris: [C: +2] "Since this got a +1 and a very lengthy discussion yesterday on IRC about the internals, I've run a fleet wide catalogcompilation with the " [puppet] - https://gerrit.wikimedia.org/r/221065 (owner: Alexandros Kosiaris)
[16:10:53] <_joe_> bd808: my idea is: 1) get a list of all servers 2) create batches based on dc and size of the pool, 3) For every batch, do: a) acquire a lock on changes b) depool the servers c) release the lock d) wait N seconds e) restart hhvm
[16:11:07] <_joe_> bd808: this will require a significant amount of time, I think
[16:11:31] !log enabling and running puppet on lvs1006
[16:11:35] Logged the message, Master
[16:12:03] <_joe_> akosiaris: cool
[16:12:17] noop on lvs1006 :-)
[16:12:26] _joe_: we should start a phab task, work out the details and give it a shot
[16:12:27] _joe_: swap 3c and 3d? you might want to make sure the changes are there before release, no?
[16:12:46] (CR) John F. Lewis: "was going to run a puppet compile test on this but it's broken https://phabricator.wikimedia.org/T96802" [puppet] - https://gerrit.wikimedia.org/r/221787 (owner: John F. Lewis)
[16:13:20] <_joe_> mobrovac: nope, the lock is needed to mock a transaction on etcd and not make confd reload the file 10 times
[16:13:25] <_joe_> but we can do without it
[16:13:40] <_joe_> it's harmless to pybal
[16:13:45] ah k
[16:14:23] _joe_: A subtask of https://phabricator.wikimedia.org/T103886 would be ideal for tracking
[16:14:29] <_joe_> bd808: ok
[16:14:51] <_joe_> bd808: you tricked me into writing something that's not code, you _are_ a damn good manager
[16:15:36] shhhh... I'm a horrible manager who should be demoted to just writing code and maybe a few RfCs
[16:15:38] RECOVERY - puppet last run on terbium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:16:08] RECOVERY - puppet last run on mw1241 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:16:20] Krenair: looking
[16:17:17] RECOVERY - puppet last run on mw1215 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[16:17:27] RECOVERY - puppet last run on wtp1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:17:46] RECOVERY - Host analytics1016 is UPING OK - Packet loss = 0%, RTA = 2.53 ms
[16:17:49] (PS1) Ottomata: Revert previous changes for upgrade to v250. v250 has bugs with log files. 242-1 worked fine, reverting to that. [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/221873
[16:18:36] (CR) Ottomata: [C: +2] Revert previous changes for upgrade to v250. v250 has bugs with log files. 242-1 worked fine, reverting to that. [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/221873 (owner: Ottomata)
[16:19:04] (CR) Ottomata: "Bug: T104271" [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/221873 (owner: Ottomata)
[16:19:23] !log enabled and running puppet on lvs1005
[16:19:27] Logged the message, Master
[16:19:31] (PS2) Yuvipanda: [WIP] labstore: Rewrite of manage-nfs-volumes-daemon [puppet] - https://gerrit.wikimedia.org/r/221872 (https://phabricator.wikimedia.org/T102782)
[16:19:40] paravoid: ^
[16:19:53] addressed your questions.
[16:22:02] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1414228 (Manybubbles) Yes, thanks.
[16:22:21] (PS1) Ottomata: Update jmxtrans module with rollback to 242 configs [puppet] - https://gerrit.wikimedia.org/r/221874
[16:22:38] !log enabled and ran puppet on lvs1004. noop as well
[16:22:39] (CR) Ottomata: [C: +2 V: +2] Update jmxtrans module with rollback to 242 configs [puppet] - https://gerrit.wikimedia.org/r/221874 (owner: Ottomata)
[16:22:41] Logged the message, Master
[16:33:05] robh: awesome, thanks
[16:34:37] (PS6) BBlack: primary ssl services -> unified-only, not SNI [puppet] - https://gerrit.wikimedia.org/r/221670
[16:35:11] uuuh
[16:35:21] YuviPanda: I'd swear I reviewed it before
[16:35:31] paravoid: hmm?
[16:35:44] wth
[16:35:46] paravoid: oh, I made it a new patch because I gave up on rebasing
[16:35:49] paravoid: the old one.
[16:35:55] oh!
[16:36:26] paravoid: just copied the appropriate bits over after spending 5 minutes attempting to rebase. this patch was already moving it from ldap to openstack, and then my earlier patch moved it from ldap to labstore, and...
[16:36:38] paravoid: https://gerrit.wikimedia.org/r/#/c/217861/ was the older patch
[16:37:11] well, I don't see my previous comments being fixed
[16:37:25] oh getuid is fixed
[16:37:53] self.volumes/volumes is not being used at all as far as I can see
[16:38:12] paravoid: I responded about volumes in the older patch. I need it to create projectname/home or projectname/project if they don't exist (for new projects)
[16:38:21] that was functionality that's missing in the rewrite
[16:38:32] so it will be used in a subsequent commit?
[16:39:09] subsequent PS, I guess. I left a bunch of TODOs in the commit message that I'll continue to work on
[16:39:38] PROBLEM - DPKG on analytics1027 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:39:57] PROBLEM - DPKG on analytics1010 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:40:23] (CR) Muehlenhoff: "(we followed up on IRC: The idea behind the change is to avoid polling for the cron run on the puppetmaster). Given that the entire puppet" [puppet] - https://gerrit.wikimedia.org/r/220305 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[16:40:34] (CR) Muehlenhoff: [C: +1] Turn on puppet autosigning on labs. [puppet] - https://gerrit.wikimedia.org/r/220305 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[16:40:36] PROBLEM - DPKG on analytics1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:41:09] heh switched to python3
[16:41:11] sneaky!
[16:41:16] hey, one less package :P
[16:41:16] PROBLEM - DPKG on analytics1026 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:41:18] ipaddress is built in
[16:41:19] for ipaddress I assume
[16:41:21] yeah :)
[16:41:26] PROBLEM - DPKG on analytics1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:41:29] besides, the future, etc.
[16:41:35] also, I need to understand https://github.com/wikimedia/operations-puppet/blob/production/modules/labstore/files/sync-exports
[16:41:39] and how that fits into this whole picture
[16:42:05] ugh :)
[16:42:12] I've inquired about this too
[16:42:29] so the concept is, essentially, that data resides under /srv but we only export from /exp
[16:42:51] so as to give a separate mountpoint to each export and disallow cross-mount accesses
[16:43:04] I hate it but I haven't researched it properly
[16:43:27] so that means when a new project is added we need to run sync-exports?
[16:43:34] yes
[16:43:53] we haven't done that in the past...
[16:44:01] also, https://gerrit.wikimedia.org/r/#/c/217861/8/modules/openstack/files/manage-nfs-volumes-daemon suggests it should be invoked here?
[16:44:22] line 178
[16:44:37] yup
[16:44:40] _joe_: -# Description: Main MediaWiki application server cluster, appservers.svc.eqiad.wmnet
[16:44:40] +# Description: Main MediaWiki application server cluster, appservers.svc.codfw.wmnet
[16:44:42] plus you also need to run exportfs -a
[16:44:48] (PS2) BryanDavis: wikitech: Local logging config for ldap debugging [mediawiki-config] - https://gerrit.wikimedia.org/r/221825
[16:44:51] paravoid: ah, the old code called sync-exports.
[16:44:51] that on a codfw lvs server
[16:45:02] the only change around
[16:46:03] paravoid: so looks like I now either need to shell out to that or port that over to python as well?
[16:46:17] both ideally
[16:46:26] they can't be merged though
[16:46:26] paravoid: or we can do this in two steps - step 1 has it working for new instances in existing projects, and step 2 has it working for new projects as well.
[16:46:29] so, we got some description right at least
[16:46:37] sync-exports pretty much requires root
[16:46:41] and I don't expect any new NFS-bearing projects in this week.
[16:46:43] paravoid: oh, I see.
[16:46:50] both for mount and exportfs
[16:47:26] operations: jmxtrans log rotation failure - https://phabricator.wikimedia.org/T104271#1414312 (Ottomata) Open>Resolved Ok, I downgraded back to v 242-1, and reverted the puppet changes. I manually removed the offending cron job. What a dumb bug! Thanks.
[16:47:31] hrm, I wonder if we actually have to do all that
[16:47:44] or if we can just add an fsid= but export /srv subdirs
[16:48:00] !log enabled and ran puppet on all lvs servers @ codfw
[16:48:04] Logged the message, Master
[16:48:17] operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414314 (Joe) NEW
[16:48:48] <_joe_> bd808: ^^
[16:49:40] paravoid: I think splitting off the new project scenario separately (and tackling that next) will help things. so this daemon just maintains /etc/exports.d and nothing else (as of this patch)...
[16:49:51] (CR) Alex Monk: "That doesn't seem to work either." [mediawiki-config] - https://gerrit.wikimedia.org/r/221825 (owner: BryanDavis)
[16:49:56] and then I'll rewrite sync-exports to python, and then we can add code to detect new / changed projects into this one.
[16:49:59] how does that sound?
[16:51:10] YuviPanda: sounds okay!
[16:51:45] paravoid: cool. that leaves me actual testing and systemd unit file.
[16:51:57] paravoid: I guess I should have systemd handle logging for me too? vs explicitly writing to syslog
[16:52:07] RECOVERY - DPKG on analytics1027 is OK: All packages OK
[16:52:17] !log disabling puppet on cache clusters
[16:52:18] RECOVERY - DPKG on analytics1010 is OK: All packages OK
[16:52:21] Logged the message, Master
[16:52:57] RECOVERY - DPKG on analytics1003 is OK: All packages OK
[16:53:22] (CR) BBlack: [C: +2] primary ssl services -> unified-only, not SNI [puppet] - https://gerrit.wikimedia.org/r/221670 (owner: BBlack)
[16:53:37] RECOVERY - DPKG on analytics1026 is OK: All packages OK
[16:53:47] RECOVERY - DPKG on analytics1004 is OK: All packages OK
[16:53:59] operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414324 (bd808) Is there any way we can discover the topography from conftool? We have a list of mw servers from th...
[16:54:06] (PS1) Glaisher: Redirect wikipedia.is to is.wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/221877 (https://phabricator.wikimedia.org/T103915)
[16:54:42] (CR) Glaisher: "Second attempt https://gerrit.wikimedia.org/r/#/c/221877/" [puppet] - https://gerrit.wikimedia.org/r/221148 (https://phabricator.wikimedia.org/T103915) (owner: Glaisher)
[16:56:14] going afk for a while now
[16:58:09] operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414332 (Joe) Note that this problem statement could very well be expanded to all of our application clusters. I ju...
[16:58:20] !log re-enabling puppet on caches
[16:58:24] Logged the message, Master
[16:59:59] operations, ops-eqiad: analytics1016 down due to power issue(?) - https://phabricator.wikimedia.org/T103544#1414345 (Cmjohnson) Open>Resolved The new board is now in place and the server is back online. Added the idrac license
[17:00:29] (CR) Ori.livneh: "ping" [puppet] - https://gerrit.wikimedia.org/r/221747 (owner: Ori.livneh)
[17:02:38] !log enabled and ran puppet on lvs400X, lvs300X, lvs100[123]. noops
[17:02:42] Logged the message, Master
[17:15:24] YuviPanda: how do I add something to ‘blocked’ on a workboard if my browser window isn’t tall enough for me to drag it?
[17:16:19] (PS1) RobH: Oliver's key may be compromised [puppet] - https://gerrit.wikimedia.org/r/221879
[17:17:42] ori: wanna +1 me so it's not just me saying 'this is gonna work' ;]
[17:17:53] paravoid: or you if you are about, but it's late for you so wasn't sure
[17:17:54] ^
[17:18:35] (CR) Filippo Giunchedi: Add tessera module and role; apply on graphite1001 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/221747 (owner: Ori.livneh)
[17:18:40] (CR) RobH: [C: +1] "my understanding is this will pull his existing key off all boxes. I've left his group assignments in place, as once he has a new key set" [puppet] - https://gerrit.wikimedia.org/r/221879 (owner: RobH)
[17:18:48] no
[17:18:50] remove the key
[17:18:52] not the account
[17:18:55] ?
[17:19:07] so just blank the key line?
[17:19:13] (CR) Filippo Giunchedi: [C: +1] Add tessera module and role; apply on graphite1001 [puppet] - https://gerrit.wikimedia.org/r/221747 (owner: Ori.livneh)
[17:19:33] yes
[17:19:39] glad i asked, pushing now
[17:19:40] er, actually
[17:19:43] make it []
[17:19:44] (PS2) RobH: Oliver's key may be compromised [puppet] - https://gerrit.wikimedia.org/r/221879
[17:19:45] robh: yeah that will absent his account and do funny things since he's probably still in groups. If you empty the key line to be []
[17:19:46] yep
[17:19:47] ssh_keys: []
[17:19:48] it should wipe it out
[17:20:15] done
[17:20:21] +1 me someone =]
[17:20:22] (CR) Faidon Liambotis: [C: +2] Oliver's key may be compromised [puppet] - https://gerrit.wikimedia.org/r/221879 (owner: RobH)
[17:20:25] yayyyy
[17:20:26] thanks
[17:20:29] thank you
[17:20:38] (CR) RobH: [C: +2] Oliver's key may be compromised [puppet] - https://gerrit.wikimedia.org/r/221879 (owner: RobH)
[17:20:47] needs a rebase, probably
[17:21:08] ok, it's merged live
[17:21:16] poor oliver =[
[17:21:20] (PS4) Dzahn: tendril: git clone from wmf repo via puppet [puppet] - https://gerrit.wikimedia.org/r/221172 (https://phabricator.wikimedia.org/T98816)
[17:21:26] (always sucks to have laptop issues when traveling)
[17:22:20] (CR) BBlack: [C: +1] Redirect wikipedia.is to is.wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/221877 (https://phabricator.wikimedia.org/T103915) (owner: Glaisher)
[17:26:38] Starting the new branch cut for the train deploy from master, FYI
[17:32:00] operations: jmxtrans log rotation failure - https://phabricator.wikimedia.org/T104271#1414473 (fgiunchedi) dumb indeed! thanks for fixing!
[17:35:05] operations, MediaWiki-Sites, SEO, HTTPS-by-default, and 3 others: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1414489 (Nemo_bis) Does the patch fix http://de.wikipedia.org/wiki/de:Rosa%20Luxemburg?uselang=en too?
[17:37:50] (PS5) Dzahn: tendril: git clone from wmf repo via puppet [puppet] - https://gerrit.wikimedia.org/r/221172 (https://phabricator.wikimedia.org/T98816)
[17:38:49] operations, RESTBase, RESTBase-Cassandra: test Cassandra 2.1.7 - https://phabricator.wikimedia.org/T101745#1414495 (fgiunchedi)
[17:39:47] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1414498 (RobH) David, We need a few things from you, as it seems you don't yet have shell access to the cluster. All of these requirements are detailed on https://wikitech.wikimedia.org/...
[17:39:59] Ops-Access-Requests, operations, Patch-For-Review: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1414499 (Dzahn) a:DarTar>None
[17:40:49] PROBLEM - puppet last run on lvs4004 is CRITICAL puppet fail
[17:42:48] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1414501 (RobH) Additionally, all sudo requests require that they be approved in our Monday operations meeting. So this doesn't adhere to the typical 3 day wait, but will instead be review...
[17:43:58] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 14.29% of data above the critical threshold [500.0]
[17:44:17] nice 503 spike!
[17:47:13] (CR) Dzahn: [C: +2] tendril: git clone from wmf repo via puppet [puppet] - https://gerrit.wikimedia.org/r/221172 (https://phabricator.wikimedia.org/T98816) (owner: Dzahn)
[17:48:17] (PS1) Aaron Schulz: Set $wgMainStash to redis instead of the DB default [mediawiki-config] - https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493)
[17:52:22] thcipriani: marxarelli: Please be aware of https://gerrit.wikimedia.org/r/221795
[17:53:22] (PS5) Ori.livneh: Add tessera module and role; apply on graphite1001 [puppet] - https://gerrit.wikimedia.org/r/221747
[17:53:30] (CR) Ori.livneh: [C: +2 V: +2] Add tessera module and role; apply on graphite1001 [puppet] - https://gerrit.wikimedia.org/r/221747 (owner: Ori.livneh)
[17:54:20] (CR) Filippo Giunchedi: [C: +2 V: +2] Add tessera.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/221767 (owner: Ori.livneh)
[17:54:41] yet another frontend...
[17:54:42] hoo: k, I'll have to make that change as the first update to wmf/1.26wmf12 since I already started cutting the branch.
[17:54:51] install all the frontends? :)
[17:55:24] also, can we switch to graphite.wm.org/$frontend/ if we're about to do a bunch of them?
[17:55:43] separate hostname for gdash was already a stretch when it was introduced :)
[17:55:45] thcipriani: It's fine, just need to make sure we pick up the right Wikidata branch
[17:55:59] paravoid: you should have more confidence in me and in filippo
[17:56:36] this isn't about confidence
[17:56:44] the plan is to get rid of gdash, and decide between grafana and tessera after some period of side-by-side usage
[17:56:57] the only task that mentions tessera is one called "Upgrade to newer version of gdash"
[17:57:04] misc-web config is missing for that
[17:57:13] I thought the end goal here is every employee gets their own custom-built dashboard on its own public endpoint with unique software???
[17:57:25] bblack: that's Q2
[17:57:28] (T98134)
[17:57:32] which still hasn't happened :)
[17:57:32] slackers
[17:57:42] i'll document this
[17:58:30] it's just my opinion that we should cut down on the fanciness and first create some solid foundations/a minimum product that works
[17:59:01] godog has been very reluctant to introduce another piece of software to the stack and relented only after we agreed on a plan to cut back elsewhere (gdash)
[17:59:18] yep, gdash isn't really maintained upstream anymore, see my related question on github paravoid
[17:59:19] RECOVERY - puppet last run on lvs4004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:00:04] thcipriani marxarelli greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T1800). Please do the needful.
[18:00:08] fwiw I'm not in love with gdash, I was mentioning the other day how I'd be happy if Krinkle's work could totally replace it :)
[18:00:26] Krinkle moved to Grafana
[18:00:29] Yep
[18:00:32] but to this day I still construct graphite URLs by hand all the time
[18:00:55] You'll like this one even more
[18:01:04] I'll see about converting nav timing to grafana later.
[18:01:09] Should be pretty straightforward
[18:01:13] i 'od' whisper files
[18:01:38] (kidding.)
[18:02:04] no, seriously, this is also a problem
[18:02:11] this == ?
[18:02:24] the fact that perf.wm.org doesn't use graphite suggests that we're doing something wrong :)
[18:02:27] I'm not blaming you
[18:02:52] just saying, we should fix the causes, not just write and/or deploy more frontends
[18:03:25] and if tessera is going to be "it", I'm okay with it
[18:03:47] it just happened to happen a bit under the radar and I haven't heard of any longer term plan for all this
[18:04:12] and it's not like I haven't asked, that task above about gdash is mine :)
[18:04:45] indeed, the puppet+repo work from ori happened essentially yesterday
[18:05:31] it's a Q1 deliverable for mobile to have dashboards, ditto perf, so we're seeing more and more people use grafana because it's there
[18:06:07] that said, there is a case to be made for tessera being better, but we can't yank out grafana from under people
[18:06:14] so some period of side-by-side evaluation seems warranted to me
[18:06:46] you know... grafana was installed as a "test" as well
[18:06:49] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: move tendril to gerrit repo and puppetize cloning - https://phabricator.wikimedia.org/T98816#1414577 (Dzahn) merged, but the task should stay open for now. i still would like to delete existing files and let puppet re-create them to test it. did n...
[18:06:56] "let's see how this frontend works" kind of thing
[18:07:23] operations, Database: move tendril to gerrit repo and puppetize cloning - https://phabricator.wikimedia.org/T98816#1414580 (Dzahn)
[18:07:25] There are SO MANY graphing frontends and URLs
[18:07:27] It's crazy
[18:07:30] I lost track a long long time ago
[18:07:40] (PS1) Ori.livneh: Fix-up for I2e47c63c0: correct Trebuchet deployment path for Tessera [puppet] - https://gerrit.wikimedia.org/r/221891
[18:07:45] And there appears to be no central list of all of them
[18:08:06] (CR) Ori.livneh: [C: 2 V: 2] Fix-up for I2e47c63c0: correct Trebuchet deployment path for Tessera [puppet] - https://gerrit.wikimedia.org/r/221891 (owner: Ori.livneh)
[18:08:14] RoanKattouw: There is, in fact.
[18:08:24] https://wikitech.wikimedia.org/wiki/Graphite
[18:08:27] I don't think we need a list, I think we just need to coalesce into a few
[18:08:41] why?
[18:09:02] They all use Graphite. Some use the PNG api, some (like Grafana) use the JSON api and render interactive graphs.
[18:09:04] parsimony is nice, but the software ecosystem sucks
[18:09:10] different ones have different strengths
[18:09:46] because currently none work for a lot of our use cases and in my mind this is happening due to lack of maintenance/engineering resources
[18:10:14] grafana is being used by mobile / readership, services, performance, and techops
[18:11:26] (Abandoned) Andrew Bogott: Turn on puppet autosigning on labs. [puppet] - https://gerrit.wikimedia.org/r/220305 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[18:11:57] PROBLEM - Cassanda CQL query interface on restbase1005 is CRITICAL: Connection refused
[18:11:57] PROBLEM - Cassandra database on restbase1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon
[18:12:02] I guess what I'm not okay with is time being spent into a new frontend when my bug to upgrade one of the existing frontends hasn't been fixed nor rejected :)
[18:12:15] that's fair
[18:13:27] paravoid: I imagine for the majority use case, grafana will be primary for a while. I certainly exceeds all features and use cases of gdash.
[18:13:31] It
[18:13:52] then someone should create all the gdash dashboards* in grafana
[18:13:57] (*: that are useful)
[18:13:59] and drop it
[18:14:02] But there are many analytical applications where it doesn't provide enough flexibility, or too much.
[18:14:14] Yep, sounds like a plan.
[18:14:26] I'l document how to use it, and anyone who cares about a dashboard can convert it.
[18:14:37] I personally like the simplicity of gdash better but I'm fine with not getting my way
[18:15:02] The "PIck a predefined query" of gdash exists in grafana too.
[18:15:13] http://grafana.wikimedia.org/#/dashboard/db/resourceloader
[18:15:17] http://grafana.wikimedia.org/#/dashboard/db/https
[18:15:49] Especially things like zooming into a subset range in super simple in grafana.
[18:16:05] and toggling metrics (e.g. disable "p95" so you can focus on median)
[18:16:18] !log restarted cassandra on restbase1004 with g1gc GC and larger heap
[18:16:22] Logged the message, Master
[18:16:29] and of course, changing the overal range between hour/day/month without rendering all combinations
[18:16:31] just this past week I've used at least reqsum/reqerror from gdash, plus perf.wm.org for perf graphs, plus your new perf.wm.org navtiming selector, plus constructing URLs by hand so I can monitor the navtiming medians
[18:16:46] oh and I guess grafana for http://grafana.wikimedia.org/#/dashboard/db/activity that ori pointed me at
[18:16:58] (not to mention ganglia!)
[18:17:06] Yeah
[18:17:20] I can see how it's overwhelming.
[18:17:28] !log restarted cassandra on restbase1005 with g1gc GC and larger heap
[18:17:32] Logged the message, Master
[18:17:47] and librenms :)
[18:17:55] getting that data into one of these would be awesome
[18:18:01] if gdash isn't maintained anymore, let's migrate it's dashboard definitions and kill it
[18:18:08] s/it's/its/
[18:18:29] gdash reqerr/reqsum is the one (of many many) graph things I look at the most every day
[18:18:34] same here
[18:18:47] Just for the record, none of "these" store actual data other than graphite. Bar a few exceptions with caching. They all query graphite, most even from the client-side.
[18:19:40] Perhaps open a task and gather the dashboards people actually want so we can convert those into usable alternatives in grafana.
[18:19:48] RECOVERY - Cassanda CQL query interface on restbase1005 is OK: TCP OK - 0.008 second response time on port 9042
[18:19:48] RECOVERY - Cassandra database on restbase1005 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon
[18:20:50] true, I'll try converting some to tessera as well, particularly the ones relevant to ops
[18:22:35] I don't think leaving this to "people" is going to cut it :)
[18:23:17] it's people!
[18:23:37] ( https://www.youtube.com/watch?v=8Sp-VFBbjpE )
[18:25:14] godog: so, actionable :) if you've lost hope on gdash can you figure out a deprecation plan for it?
[18:25:47] (and reject T98134, I guess?)
[18:26:06] (PS1) Andrew Bogott: Update toolserver.org.crt. [puppet] - https://gerrit.wikimedia.org/r/221896 (https://phabricator.wikimedia.org/T104211)
[18:26:21] hoo: I made a bump for the Wikidata extension since I merged your change to tools after I cut the branch https://gerrit.wikimedia.org/r/#/c/221895/ look good to you?
[18:27:41] paravoid: I think I did, yeah, I'll reject it and open one for tessera and one to deprecate gdash
[18:28:04] thanks
[18:28:19] as for tessera, is it being pitched as a grafana replacement or in addition to it?
[18:28:38] have we settled to it or are we still in the evaluation phase?
[18:28:53] if it's the latter, how do we know that people won't start accidentally relying on it like it happened for grafana?
[18:29:00] fwiw http://tessera.wmflabs.org/ is there but nginx error page
[18:29:08] operations, ops-eqiad, Traffic: eqiad: investigate thermal issues with some cp10xx machines - https://phabricator.wikimedia.org/T103226#1414644 (Cmjohnson) I polled a view of the system board temperatures on the those listed with highest temps and the system boards are well within their range The co...
[18:29:08] don't answer those now, that's just food for thought for the ticket :)
[18:29:32] paravoid: hehe ok, thanks, I'll add those
[18:29:56] paravoid, jessie on db* - 1/150 :-)
[18:29:56] also, T87840 should probably be acted upon soon (not necessarily by godog)
[18:30:01] thcipriani: Yeah, that's the right commit
[18:30:06] shall I +1/+2?
[18:30:42] hoo: I'll go ahead and +2 thank!
[18:30:51] bblack: I'm guessing you don't use torrus :)
[18:30:51] *thanks
[18:31:41] operations: Decommission virt1001-1009 - https://phabricator.wikimedia.org/T98376#1414649 (Cmjohnson) Can I finish with the decom process now? Chris
[18:32:41] operations, Monitoring: deprecate gdash - https://phabricator.wikimedia.org/T104365#1414652 (fgiunchedi) NEW a:fgiunchedi
[18:32:46] operations, ops-eqiad: What to do with decommissioned ciscos? - https://phabricator.wikimedia.org/T103374#1414663 (Cmjohnson) a:mark>Cmjohnson I am taking this and will figure out what we need to do with them.
[18:33:14] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[18:33:40] !log tendril - short downtime for switch to new repo
[18:33:44] Logged the message, Master
[18:34:04] operations, Monitoring: Upgrade to newer version of gdash - https://phabricator.wikimedia.org/T98134#1414666 (fgiunchedi) Open>declined since gdash is unmaintained, and alternatives exists we should invest in those instead, see also T104365
[18:35:13] mutante: yeah that was my earlier experiment, I guess we can use ori's puppet class now to have an instance in labs too
[18:36:02] !log labcontrol1002 going down for a few minutes
[18:36:06] Logged the message, Master
[18:36:52] paravoid: nope :)
[18:37:11] I don't think it even works anymore
[18:37:26] (PS1) Ori.livneh: Tessera: Restrict POST / PUT / DELETE to wmf / ops / nda LDAP groups [puppet] - https://gerrit.wikimedia.org/r/221897
[18:37:41] operations, Monitoring: evaluate tessera dashboards - https://phabricator.wikimedia.org/T104366#1414678 (fgiunchedi) NEW a:fgiunchedi
[18:39:15] PROBLEM - Host labcontrol1002 is DOWN: PING CRITICAL - Packet loss = 100%
[18:41:39] (CR) Ori.livneh: [C: 2 V: 2] Tessera: Restrict POST / PUT / DELETE to wmf / ops / nda LDAP groups [puppet] - https://gerrit.wikimedia.org/r/221897 (owner: Ori.livneh)
[18:46:42] is git::clone in puppet broken?
[18:46:49] it used to work and now it does ...nothing
[18:47:14] also no error, just nothing
[18:47:50] operations, Monitoring: evaluate tessera dashboards - https://phabricator.wikimedia.org/T104366#1414719 (ori) So far, the aspects I find superior to Grafana are: * It has a faster UI. Grafana feels sluggish by comparison. * It encourages documentation and discoverability: ** Dashboards are classified by c...
[18:48:16] mutante: what is the onlyif / unless / creates conditional?
[18:48:29] and is there a reason for it to evaluate to the wrong value?
[18:49:15] operations, Monitoring: Port important gdash dashboards to Tessera - https://phabricator.wikimedia.org/T104369#1414729 (ori) NEW a:fgiunchedi
[18:49:50] operations, Monitoring: evaluate tessera dashboards - https://phabricator.wikimedia.org/T104366#1414737 (faidon) Copying from IRC: - Is this being pitched as a Grafana replacement or something that will run in parallel to it? - I saw a tessera.wmflabs.org link in another task but it was 503ing; was this te...
[18:50:01] operations, Monitoring: Port important gdash dashboards to Tessera - https://phabricator.wikimedia.org/T104369#1414729 (ori) a:fgiunchedi>None
[18:50:04] ori: is this in response to git::clone? i don't think there is a conditional
[18:50:24] PROBLEM - puppet last run on mw1117 is CRITICAL Puppet has 1 failures
[18:50:51] mutante: there is; exec { "git_clone_${title}" } has creates => "${directory}/.git/config",
[18:51:24] RECOVERY - Host labcontrol1002 is UPING OK - Packet loss = 0%, RTA = 1.50 ms
[18:51:43] ori: ! that might be it.. if .git/config already existed from manually cloning in the same place !
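The diagnosis above hinges on Puppet's `creates` attribute: an `exec` with `creates` runs only when that file is absent, so a leftover `.git/config` from a manual clone makes `git::clone` skip silently. A small Python model of that guard; the helper names are hypothetical, and this simulates the check rather than calling Puppet:

```python
import os
import tempfile


def exec_would_run(creates: str) -> bool:
    """Model of Puppet's `creates` guard on an exec: the command runs only
    if the named file does not already exist."""
    return not os.path.exists(creates)


def git_clone_would_run(directory: str) -> bool:
    # Same guard as quoted for git::clone: creates => "${directory}/.git/config"
    return exec_would_run(os.path.join(directory, ".git", "config"))


with tempfile.TemporaryDirectory() as d:
    print(git_clone_would_run(d))   # True: nothing cloned yet
    # Simulate a leftover manual clone in the same place.
    os.makedirs(os.path.join(d, ".git"))
    open(os.path.join(d, ".git", "config"), "w").close()
    print(git_clone_would_run(d))   # False: exec is skipped, no error reported
```

Because the guard only tests for existence, it cannot tell a stale manual checkout from a Puppet-managed one; deleting the existing files, as Dzahn planned for tendril, lets the exec run again.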
[18:51:56] there you go :)
[18:52:04] ori: thx
[18:55:38] !log demon Synchronized php-1.26wmf11/includes/WebResponse.php: (no message) (duration: 00m 12s)
[18:55:42] Logged the message, Master
[18:59:03] PROBLEM - puppet last run on labcontrol1002 is CRITICAL Puppet has 1 failures
[19:00:20] !log demon Synchronized php-1.26wmf11/includes/WebResponse.php: rv my test (duration: 00m 12s)
[19:00:24] Logged the message, Master
[19:03:04] (PS1) JanZerebecki: Enable WikibaseQualit extensions on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/221905 (https://phabricator.wikimedia.org/T103814)
[19:05:54] RECOVERY - puppet last run on mw1117 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures
[19:07:22] (PS2) JanZerebecki: Enable WikibaseQuality extensions on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/221905 (https://phabricator.wikimedia.org/T103814)
[19:09:00] operations, Monitoring: evaluate tessera dashboards - https://phabricator.wikimedia.org/T104366#1414782 (ori) >>! In T104366#1414737, @faidon wrote: > Copying from IRC: > - Is this being pitched as a Grafana replacement or something that will run in parallel to it? Initially in parallel. I think it will u...
[19:15:22] (CR) Alex Monk: Set $wgMainStash to redis instead of the DB default (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) (owner: Aaron Schulz)
[19:16:38] (PS1) Cmjohnson: Swapping servers for labnet1002 [dns] - https://gerrit.wikimedia.org/r/221906
[19:21:10] operations: Decommission virt1001-1009 - https://phabricator.wikimedia.org/T98376#1414863 (Andrew) Yes, these can be removed any time.
[19:21:33] (CR) Cmjohnson: [C: 2] Swapping servers for labnet1002 [dns] - https://gerrit.wikimedia.org/r/221906 (owner: Cmjohnson)
[19:22:46] (Abandoned) Cmjohnson: adding labnet1002 to dhcpd with 10G Mac [puppet] - https://gerrit.wikimedia.org/r/220556 (owner: Cmjohnson)
[19:23:26] Failed to load resource: the server responded with a status of 404 (Not Found)
[19:23:48] for https://wikitech.wikimedia.org/beacon/statsv?ve.mwTarget.performance.system�aveDialogOpen=842ms&ve.mwTarget.performance.system.serializeforcache=349ms
[19:23:55] ori is that something to do with your bits changes?
[19:25:20] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1414894 (dcausse) Full Name: David Causse My wikitech user name is DCausse (https://wikitech.wikimedia.org/wiki/User:DCausse) My labs ssh account is dcausse. Preferred shell username : dca...
[19:25:44] Krenair: I haven't changed anything recently (past couple of weeks) but looking anyhow
[19:26:04] (PS1) Cmjohnson: Fixing typo in host entries [puppet] - https://gerrit.wikimedia.org/r/221952
[19:26:15] (PS1) Thcipriani: Add 1.26wmf12 symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/221953
[19:26:17] (PS1) Thcipriani: group0 to php-1.26wmf12 [mediawiki-config] - https://gerrit.wikimedia.org/r/221954
[19:27:13] Is that something that's supposed to be be handled by varnish?
[19:27:43] hm, bblack -- https://wikitech.wikimedia.org/beacon/statsv ought to 204 but it 404s instead. The rule is if (req.url ~ "^/beacon\/[^/?]+") { error 204; } in wikimedia.vcl.erb.
[19:27:53] wikitech does not go through varnish
[19:28:07] oh
[19:28:20] yeah
[19:28:22] (CR) Cmjohnson: [C: 2] Fixing typo in host entries [puppet] - https://gerrit.wikimedia.org/r/221952 (owner: Cmjohnson)
[19:28:24] by design or by accident which we now pass off as design?
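For reference, the Varnish rule quoted above can be exercised with Python's `re`, assuming VCL's `~` behaves like an unanchored PCRE search (so `re.search` is the analogue). Since wikitech is not behind Varnish, this rule never sees its beacon requests, which is why they 404:

```python
import re

# Pattern quoted from the wikimedia.vcl.erb rule. Assumption: VCL's `~` is an
# unanchored PCRE match, so re.search is the closest Python analogue.
BEACON_RULE = re.compile(r"^/beacon\/[^/?]+")


def varnish_would_204(req_url: str) -> bool:
    """True if the beacon rule would answer this request with a 204."""
    return BEACON_RULE.search(req_url) is not None


for url in ["/beacon/statsv?ve.mwTarget.performance.system.serializeforcache=349ms",
            "/beacon/",          # no endpoint name after the slash
            "/wiki/beacon/x"]:   # the rule is anchored at the start of the path
    print(url, varnish_would_204(url))
```

The `[^/?]+` part requires at least one character of endpoint name and stops at the query string, so `/beacon/statsv?...` matches while a bare `/beacon/` does not.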
[19:28:28] (CR) Thcipriani: [C: 2] "Train Deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/221953 (owner: Thcipriani)
[19:28:33] (Merged) jenkins-bot: Add 1.26wmf12 symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/221953 (owner: Thcipriani)
[19:28:34] wikitech is (intentionally) not part of our normal infrastructure
[19:28:38] https://phabricator.wikimedia.org/T102178#1364317
[19:28:55] PROBLEM - Cassanda CQL query interface on restbase1001 is CRITICAL: Connection refused
[19:29:06] ok. we don't need stats from wikitech, so it's not hurting our data
[19:29:11] I think the primary rationale is we need it to work when other things are broken
[19:29:12] and they should not be visible to users except in the js console
[19:29:17] yeah
[19:29:17] but honestly I don't know the whole history
[19:29:20] I got this from the js console
[19:29:23] bblack: the reverse has more often been the case ;)
[19:29:32] I wonder if we can disable the things that call this on wikitech
[19:29:35] it would be more elegant to have a 204 for wikitech as well
[19:29:54] Krenair: yeah just disable wikimediaevents and eventlogging on wikitech
[19:30:06] did you just use elegant and wikitech in the same sentence?
[19:30:13] well, don't disable eventlogging, too many other extensions depend on it and i'm not sure that all dependencies are soft
[19:30:42] so disable eventlogging and file bugs for things that break? :p
[19:30:53] !log thcipriani Started scap: testwiki to php-1.26wmf12 and rebuild l10n cache
[19:30:57] Logged the message, Master
[19:30:58] ok according to the internets RedirectMatch 204 beacon/(.*)$ should work
[19:31:10] so don't disable anything, i'll submit a patch adding that rule to apache on wikitech
[19:31:20] ok
[19:31:25] RECOVERY - Cassanda CQL query interface on restbase1001 is OK: TCP OK - 0.010 second response time on port 9042
[19:32:17] (PS5) BBlack: No need for wgSecureLogin on our wikis, HTTPS is forced everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/219265 (https://phabricator.wikimedia.org/T103021)
[19:32:56] (PS1) Matanya: access: add DCausse shell account [puppet] - https://gerrit.wikimedia.org/r/221967
[19:33:10] (CR) BBlack: "Undid the PS4 bit (put labs back in). I failed to comprehend my own patch when I did that earlier. The default when unset in config is t" [mediawiki-config] - https://gerrit.wikimedia.org/r/219265 (https://phabricator.wikimedia.org/T103021) (owner: BBlack)
[19:36:34] (PS1) Matanya: access: Grant dcausse root on the search cluster [puppet] - https://gerrit.wikimedia.org/r/221968
[19:36:54] operations, HTTPS: Replace SHA1 certificates with SHA256 - https://phabricator.wikimedia.org/T73156#1414942 (RobH) I'm not quite certain about the existence or replacement of the fundraising related certificates. Perhaps @Jgreen can advise?
[19:37:15] (CR) jenkins-bot: [V: -1] access: Grant dcausse root on the search cluster [puppet] - https://gerrit.wikimedia.org/r/221968 (owner: Matanya)
[19:37:55] operations, HTTPS: audit and replace all fundraising certificates in sha1 to sha256 - https://phabricator.wikimedia.org/T104378#1414947 (RobH) NEW a:Jgreen
[19:38:22] (PS1) Ori.livneh: Wikitech: rebuff requests to beacon/(.*) with an HTTP 204 [puppet] - https://gerrit.wikimedia.org/r/221969
[19:38:35] (CR) Matanya: "depends on https://gerrit.wikimedia.org/r/221967" [puppet] - https://gerrit.wikimedia.org/r/221968 (owner: Matanya)
[19:38:46] Ops-Access-Requests, operations, Patch-For-Review: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1414956 (RobH) We'll need your manager to approve this request on this task.
[19:39:07] bblack: ^^. note that all the examples i found online (e.g. http://www.lowlevelmanager.com/2009/07/returning-apache-204.html) don't have a slash in the rule, so if I got anything wrong in that patch it's probably inappropriate escaping. I think it should be OK, though.
[19:39:36] beacon(.*) vs beacon/(.*)
[19:40:32] beacon(.*) would technically be too broad, but practically OK -- as in, unlikely to ever match anything it shouldn't. So I could go either way.
[19:41:58] !log OAI: disabled unused accounts
[19:42:03] Logged the message, Master
[19:44:34] (CR) BBlack: [C: 1] Wikitech: rebuff requests to beacon/(.*) with an HTTP 204 [puppet] - https://gerrit.wikimedia.org/r/221969 (owner: Ori.livneh)
[19:44:41] operations, ops-eqiad, Traffic: eqiad: investigate thermal issues with some cp10xx machines - https://phabricator.wikimedia.org/T103226#1415011 (Cmjohnson) I may have some thermal paste in storage. Let's pick one and see if that helps.
[19:45:41] robh: am i stepping on your toes ?
[19:45:47] ?
[19:45:54] the access patches
[19:46:06] nope!
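The `beacon(.*)` vs `beacon/(.*)` trade-off can be checked the same way. This sketch assumes, as with mod_alias `RedirectMatch`, that the regex is searched anywhere in the URL path; `/wiki/beacons` and `/w/foo/beacon/x` are hypothetical paths chosen to show where the patterns differ:

```python
import re

# The two candidate patterns from the discussion. Assumption: as with
# mod_alias RedirectMatch, each is searched anywhere in the URL path.
NARROW = re.compile(r"beacon/(.*)$")
BROAD = re.compile(r"beacon(.*)")


def hits(pattern, path: str) -> bool:
    return pattern.search(path) is not None


for path in ["/beacon/statsv", "/wiki/beacons", "/w/foo/beacon/x"]:
    print(f"{path:20} narrow={hits(NARROW, path)} broad={hits(BROAD, path)}")
```

Only the broad pattern catches `/wiki/beacons`, which is the "technically too broad" case. Neither pattern is anchored, so both also match `beacon/` deeper in a path; a leading `^/beacon/` would mirror the anchored Varnish rule more closely.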
[19:46:12] if i wanted to do them i should have claimed the task =]
[19:46:18] ok
[19:46:28] (which when i wanna do them i totally do =)
[19:46:59] the opposite is the truth, pretty sure everyone appreciates the patches =]
[19:47:16] ok, glad to hear
[19:51:16] PROBLEM - Disk space on snapshot1001 is CRITICAL: DISK CRITICAL - free space: / 606 MB (2% inode=59%)
[19:51:43] operations, ops-eqiad, Traffic: eqiad: investigate thermal issues with some cp10xx machines - https://phabricator.wikimedia.org/T103226#1415028 (BBlack) Is cp1067 acceptable? If so I'll downtime/depool it.
[19:55:25] I wonder why https://gerrit.wikimedia.org/r/#/c/220760/ is not scheduled at https://integration.wikimedia.org/zuul/
[19:55:59] I was expecting a gate-and-submit job
[19:57:24] hmm scap is been hung up on this last apache server for a minute or so :\
[19:58:05] thcipriani: Give it some more time
[19:58:10] thcipriani: mw1086.eqiad.wmnet is the stuck host
[19:58:27] If it is stuck, you can kill the ssh to it
[19:58:37] cpu load there is 16.76
[19:58:52] oh boy, ok, so can I just kill scap and then run sync-wikiverions?
[19:59:02] better not
[19:59:10] or, wait, it's got to do some rebuilding on these instances too
[19:59:23] no, but you can open another ssh session to tin and kill the ssh to mw1086
[19:59:46] then you can run sync-common from mw1086 manually to get it caught up
[20:00:05] hoo: Respected human, time to deploy WikibaseQuality/Constraints to testwikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2000). Please do the needful.
[20:00:24] bd808: ah, kk, doing now
[20:00:56] hoo: https://gerrit.wikimedia.org/r/#/c/221905/
[20:01:05] seen that
[20:01:12] we will need to wait until after the train
[20:01:16] sure
[20:01:17] k, it's rebuilding cdbs now
[20:02:49] one failed with no space left :(
[20:03:04] ugh
[20:03:11] thcipriani: one of the sanpshoot servers?
[20:03:17] looking
[20:03:17] *snapshot
[20:03:20] Do we have to many MW versions, again
[20:03:39] they ride on the ragged edge of running out of space all the time
[20:03:53] cleaning up l10n cdbs and old versions helps
[20:03:54] yup sudo -u mwdeploy -n -- /srv/deployment/scap/scap/bin/scap-rebuild-cdbs on snapshot1001.eqiad.wmnet returned [70]:
[20:04:20] hoo: because pages can live in Varnish with static asset references for 30 days
[20:04:51] so we need to keep the branch around until it is not likely to be referenced from varnish
[20:04:53] bd808: I know, but still we only need to keep so many around
[20:05:15] 5 weeks worth bascially
[20:05:40] which with the new cadence should reduce the number of branches
[20:05:45] I think...
[20:05:51] !log thcipriani Finished scap: testwiki to php-1.26wmf12 and rebuild l10n cache (duration: 34m 58s)
[20:06:01] Logged the message, Master
[20:06:02] bd808: So we could purge at least two
[20:06:42] (8 are currently around)
[20:07:10] what branch was on enwiki on 2015-05-26?
[20:07:29] that is the oldest one we could have use for
[20:07:43] 1.26wmf6
[20:07:49] yeah, 6
[20:08:05] Ok, so we could only purge of wmf5
[20:08:36] *nod* and cleanup l10n from which is a lot of the disk usage actually
[20:08:46] (PS1) BBlack: HTTPS: redirect POST with 307 [puppet] - https://gerrit.wikimedia.org/r/221974
[20:09:08] bd808: So we can delete the l10n in advance?
[20:09:11] That sounds handy
[20:09:48] hoo: yeah. there is a process for it
[20:10:16] that was one of the first things I worked out last year when I ran the train for a while
[20:10:24] (PS2) Andrew Bogott: Update toolserver.org.crt. [puppet] - https://gerrit.wikimedia.org/r/221896 (https://phabricator.wikimedia.org/T104211)
[20:10:38] ok, testwiki looks ok, if I'm reading logstash right, I'm going to run sync-wikiversions and then train should be done.
[20:10:55] (CR) RobH: [C: 1] Update toolserver.org.crt. [puppet] - https://gerrit.wikimedia.org/r/221896 (https://phabricator.wikimedia.org/T104211) (owner: Andrew Bogott)
[20:11:11] thcipriani: Maybe the old stuff should be pruned?
[20:11:17] (PS3) Andrew Bogott: Update toolserver.org.crt. [puppet] - https://gerrit.wikimedia.org/r/221896 (https://phabricator.wikimedia.org/T104211)
[20:11:17] scap-purge-l10n-cache does most of the work
[20:11:22] https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Purge_localization_cache_for_now_unused_versions
[20:11:23] (CR) Andrew Bogott: [C: 2] Update toolserver.org.crt. [puppet] - https://gerrit.wikimedia.org/r/221896 (https://phabricator.wikimedia.org/T104211) (owner: Andrew Bogott)
[20:11:31] * thcipriani looks
[20:11:49] I'll quickly prepare some food
[20:11:57] will be back once you're done (watching from my phone)
[20:12:01] ok, I can run that: scap-purge-l10n-cache --version 1.26wmf10
[20:12:44] thcipriani: 1.26wmf9 hasn't had an l10n purge either
[20:12:53] and 1.26wmf8
[20:12:57] !log thcipriani Purged l10n cache for 1.26wmf10
[20:13:04] Logged the message, Master
[20:13:05] no wonder snapshot is out of disk
[20:13:10] oh boy, ok, running for all of 'em
[20:13:13] and 1.26wmf7
[20:13:21] and 1.26wmf6
[20:13:50] !log thcipriani Purged l10n cache for 1.26wmf9
[20:13:55] RECOVERY - Disk space on snapshot1001 is OK: DISK OK
[20:13:58] Logged the message, Master
[20:14:14] !log thcipriani Purged l10n cache for 1.26wmf8
[20:14:24] Logged the message, Master
[20:14:26] (PS1) Rush: confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977
[20:14:44] !log thcipriani Purged l10n cache for 1.26wmf7
[20:14:50] Logged the message, Master
[20:15:08] (CR) jenkins-bot: [V: -1] confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977 (owner: Rush)
[20:15:10] !log thcipriani Purged l10n cache for 1.26wmf6
[20:15:20] Logged the message, Master
[20:17:29] bd808: how do I get snapshot1001 up-to-date now that it has space? I can run sync-common but it still needs to rebuild some cdbs locally, right? Is there a command for that?
[20:17:41] sync-common takes care of that
[20:17:45] Run sync-common
[20:18:04] kk, doing that, thanks
[20:19:30] heh. got some disk back -- https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=snapshot1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1435695521&v=13.123&m=disk_free&vl=GB&ti=Disk%20Space%20Available&z=large
[20:19:51] (PS2) Rush: confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977
[20:20:32] (CR) jenkins-bot: [V: -1] confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977 (owner: Rush)
[20:20:47] ok, ran sync-common on mw1086 and snapshot1001
[20:21:36] subbu: it was an fyi, not a don't-merge-this
[20:21:46] subbu: (sorry, meant that for -dev)
[20:21:47] (CR) Thcipriani: [C: 2] "Train deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/221954 (owner: Thcipriani)
[20:21:53] (Merged) jenkins-bot: group0 to php-1.26wmf12 [mediawiki-config] - https://gerrit.wikimedia.org/r/221954 (owner: Thcipriani)
[20:22:27] (PS3) Rush: confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977
[20:23:40] !log thcipriani rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf12
[20:23:44] Logged the message, Master
[20:24:04] (PS4) Rush: confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977
[20:24:30] OK, and with that my first deployment train is complete
[20:24:43] Ok
[20:24:51] Be there in a sec
[20:25:42] (CR) Rush: [C: 2] confd: wrap linting and manage error state file [puppet] - https://gerrit.wikimedia.org/r/221977 (owner: Rush)
[20:26:33] thcipriani: are you sure it's complete? we didn't have an outage yet
[20:26:37] ;)
[20:27:00] anticlimactic
[20:29:10] thcipriani: ^5
[20:29:26] :)
[20:29:45] running the train is more like using a chain saw or other power tool. When you are new at it and paying attention you are much less likely to cause harm then later when it seems like a routine thing to do.
[20:30:04] RoanKattouw legoktm matt_flaschen: Respected human, time to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2030). Please do the needful.
[20:30:32] @Flow people: We have a bit of a delay right now
[20:30:45] hoo, it's okay, us too.
[20:31:16] We're almost ready, though.
[20:31:26] How long do you need for your deploy?
[20:31:34] If so, I might let you go first
[20:31:42] * If it's fast
[20:31:54] Waiting for the students that wrote the extensions we want to enable
[20:32:37] Probably not fast enough.
[20:39:32] (PS1) Rush: confd: alert on invalid template generation [puppet] - https://gerrit.wikimedia.org/r/221978
[20:39:58] !log Created `wbqc_constraints` on testwikidatawiki (s3).
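The branch-retention reasoning above (cached pages may reference a branch for up to 30 days, rounded up to about five weeks) is simple date arithmetic. A Python sketch, assuming the 2015-06-30 train date from the deployment-calendar link; `oldest_needed_date` and `purge_commands` are hypothetical helpers, not part of scap:

```python
from datetime import date, timedelta

# From the discussion: cached pages can reference a branch's static assets
# for up to 30 days, rounded up to "5 weeks worth" of branches.
RETENTION = timedelta(weeks=5)


def oldest_needed_date(train_day: date) -> date:
    """Earliest date whose live branch may still be referenced from cache."""
    return train_day - RETENTION


def purge_commands(versions):
    # Command form as run in the log: scap-purge-l10n-cache --version <branch>
    return [f"scap-purge-l10n-cache --version {v}" for v in versions]


print(oldest_needed_date(date(2015, 6, 30)))   # 2015-05-26
for cmd in purge_commands(f"1.26wmf{n}" for n in range(10, 5, -1)):
    print(cmd)
```

The cutoff matches the 2015-05-26 date bd808 asked about, when 1.26wmf6 was live on enwiki; the commands follow the `scap-purge-l10n-cache --version` form used in the log, in the order the purges were run (wmf10 down to wmf6).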
[20:40:03] Logged the message, Master
[20:44:09] (PS3) Hoo man: Enable WikibaseQuality extensions on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/221905 (https://phabricator.wikimedia.org/T103814) (owner: JanZerebecki)
[20:45:14] Let's do this
[20:45:21] (CR) Hoo man: [C: 2] Enable WikibaseQuality extensions on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/221905 (https://phabricator.wikimedia.org/T103814) (owner: JanZerebecki)
[20:45:27] (Merged) jenkins-bot: Enable WikibaseQuality extensions on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/221905 (https://phabricator.wikimedia.org/T103814) (owner: JanZerebecki)
[20:46:36] !log hoo Synchronized wmf-config/InitialiseSettings.php: Enable WikibaseQuality extensions on testwikidata (duration: 00m 14s)
[20:46:40] Logged the message, Master
[20:47:02] hmm, no links to old bugzilla attachments anymore ?
[20:47:06] https://static-bugzilla.wikimedia.org/attachment.cgi?id=10856
[20:47:55] Ops-Access-Requests, operations, Patch-For-Review: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1415234 (Manybubbles) >>! In T104222#1414956, @RobH wrote: > We'll need your manager to approve this request on this task. As Tomasz's delegate while he's on paternit...
[20:50:05] Ops-Access-Requests, operations, Patch-For-Review: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1415235 (RobH) a:RobH @Manybubbles: I didn't realize you were his delegate, good enough! thanks! So this still cannot be merged until its approved in our operati...
[20:51:10] (CR) RobH: [C: -2] "The actual patchset looks good, and would be a +2, except we still need to get this access approved during our Monday operations meeting. " [puppet] - https://gerrit.wikimedia.org/r/221967 (owner: Matanya)
[20:51:21] (CR) RobH: [C: -2] "The actual patchset looks good, and would be a +2, except we still need to get this access approved during our Monday operations meeting. " [puppet] - https://gerrit.wikimedia.org/r/221968 (owner: Matanya)
[20:51:44] manybubbles: didnt realize you were also his delegate, sorry about that ;D
[20:52:01] robh: its cool! Its only temporary
[20:52:09] i dont even try to keep up to who is in charge of what when folks are out, i barely keep up to it when everyone is around =]
[20:52:15] (PS1) Hoo man: Log 'wbq_evaluation' [mediawiki-config] - https://gerrit.wikimedia.org/r/221984
[20:52:28] so yea, no point in merging his bastion access until we also merge his enhanced access
[20:52:45] (CR) Hoo man: [C: 2] Log 'wbq_evaluation' [mediawiki-config] - https://gerrit.wikimedia.org/r/221984 (owner: Hoo man)
[20:52:52] (Merged) jenkins-bot: Log 'wbq_evaluation' [mediawiki-config] - https://gerrit.wikimedia.org/r/221984 (owner: Hoo man)
[20:53:01] and i'll add to our list for ops meeting on monday (i claimed the task)
[20:53:08] after matanya did all the work ;D
[20:53:38] !log hoo Synchronized wmf-config/InitialiseSettings.php: Log 'wbq_evaluation' (duration: 00m 12s)
[20:53:42] Logged the message, Master
[20:54:22] Ok, I think I'm done
[20:58:20] !log catrope Synchronized php-1.26wmf11/maintenance/: Add populateContentModel maintenance script (duration: 00m 17s)
[20:58:24] Logged the message, Master
[20:58:48] !log catrope Synchronized php-1.26wmf12/maintenance/: Add populateContentModel maintenance script (duration: 00m 13s)
[20:58:52] Logged the message, Master
[21:00:34] centralnotice will bypass varnish cache right? but not a normal sitenotice?
[21:01:12] the rule is req.url !~ "^/wiki/Special\:Banner"
[21:01:19] well, ~
[21:01:20] !log Running populateContentModel.php on officewiki for page table in namespaces occupied by Flow (1,3,5,7,9,11,13,15,91,93,101,111,113,829)
[21:01:24] Logged the message, Mr. Obvious
[21:02:53] (PS1) Odder: Add localised logo for Marathi Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/221989 (https://phabricator.wikimedia.org/T103655)
[21:08:31] (PS4) EBernhardson: Patch to uniqify filename of eval()'d code [debs/hhvm] - https://gerrit.wikimedia.org/r/219125
[21:09:06] (CR) EBernhardson: [C: 1] "Patch updated to match the one accepted upstream. No functionality changes, just updates existing test cases and adds a couple more." [debs/hhvm] - https://gerrit.wikimedia.org/r/219125 (owner: EBernhardson)
[21:09:25] (PS1) GWicke: Move Cassandra to g1gc collector and increase heap size [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161)
[21:10:54] Ops-Access-Requests, operations: Grant dcausse root on the search cluster - https://phabricator.wikimedia.org/T104222#1415364 (RobH) Open>stalled p:Triage>Normal
[21:14:23] !log catrope Synchronized php-1.26wmf11/extensions/Flow: (no message) (duration: 00m 13s)
[21:14:27] Logged the message, Master
[21:14:37] !log catrope Synchronized php-1.26wmf12/extensions/Flow: (no message) (duration: 00m 14s)
[21:14:41] Logged the message, Master
[21:15:25] (CR) Eevans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161) (owner: GWicke)
[21:19:08] !log catrope Synchronized php-1.26wmf11/extensions/Flow: (no message) (duration: 00m 14s)
[21:19:12] Logged the message, Master
[21:19:22] !log catrope Synchronized php-1.26wmf12/extensions/Flow: (no message) (duration: 00m 14s)
[21:21:14] (PS1) Catrope: Make project talk namespace on cawiki Flow-occupied [mediawiki-config] - https://gerrit.wikimedia.org/r/221999 (https://phabricator.wikimedia.org/T99117)
[21:22:22] (CR) Catrope: [C: -2] "Holding for just a little bit" [mediawiki-config] - https://gerrit.wikimedia.org/r/221999 (https://phabricator.wikimedia.org/T99117) (owner: Catrope)
[21:24:40] godog: you around?
[21:26:51] (PS1) BBlack: tlsproxy: rename protoproxy to tlsproxy globally [puppet] - https://gerrit.wikimedia.org/r/222001
[21:26:53] (PS1) BBlack: tlsproxy: fold ssl::beta::common into ssl::beta [puppet] - https://gerrit.wikimedia.org/r/222002
[21:26:55] (PS1) BBlack: tlsproxy: move role::tlsproxy::ssl::common to auto-required tlsproxy::instance [puppet] - https://gerrit.wikimedia.org/r/222003
[21:26:57] (PS1) BBlack: tlsproxy: move sslcert stuff inside of tlsproxy::localssl [puppet] - https://gerrit.wikimedia.org/r/222004
[21:26:59] (PS1) BBlack: tlsproxy: remove unused ganglia/localhost stuff [puppet] - https://gerrit.wikimedia.org/r/222005
[21:27:01] (PS1) BBlack: tlsproxy: rename beta-only things to betassl for clarity [puppet] - https://gerrit.wikimedia.org/r/222006
[21:27:03] (PS1) BBlack: tlsproxy: remove remaining ipv6 hacks from beta [puppet] - https://gerrit.wikimedia.org/r/222007
[21:27:05] (PS1) BBlack: tlsproxy: gut esams cases from beta-only template [puppet] - https://gerrit.wikimedia.org/r/222008
[21:27:07] (PS1) BBlack: tlsproxy: move template into module (only user) [puppet] - https://gerrit.wikimedia.org/r/222009
[21:27:09] (PS1) BBlack: tlsproxy: move logrotate into module (only user) [puppet] - https://gerrit.wikimedia.org/r/222010
[21:27:11] (PS1) BBlack: tlsproxy: remove pointless use_ssl + jessie conditionals [puppet] - https://gerrit.wikimedia.org/r/222011
[21:27:13] (PS1) BBlack: tlsproxy: remove dead udplog comments [puppet] - https://gerrit.wikimedia.org/r/222012
[21:28:05] heh, did I kill grrrit-wm? :)
[21:29:23] !log Ran populateContentModel.php --table=page --ns=5 on cawiki
[21:29:27] Logged the message, Mr. Obvious
[21:45:58] !log Ran populateContentModel.php --table=archive --ns=5 on officewiki
[21:46:03] Logged the message, Mr. Obvious
[21:46:21] !log Also ran populateContentModel.php --table=archive for talk namespaces on officewiki
[21:46:25] Logged the message, Mr. Obvious
[21:49:44] &win 2
[21:51:00] (PS1) BBlack: tlsproxy: add 2048-bit dhparam file to nginx [puppet] - https://gerrit.wikimedia.org/r/222016
[21:55:07] (PS2) RobH: access: Grant Ellery Wulczyn @ellery access to terbium via the restricted group [puppet] - https://gerrit.wikimedia.org/r/221006 (owner: Matanya)
[21:56:24] (PS3) RobH: access: Grant Ellery Wulczyn @ellery access to terbium via the restricted group [puppet] - https://gerrit.wikimedia.org/r/221006 (owner: Matanya)
[21:58:08] (CR) RobH: [C: 2] access: Grant Ellery Wulczyn @ellery access to terbium via the restricted group [puppet] - https://gerrit.wikimedia.org/r/221006 (owner: Matanya)
[21:59:02] https://commons.wikimedia.org/w/index.php?title=Special:UploadWizard&campaign=tos-rs
[21:59:12] Now that's an interesting error
[21:59:16] Ops-Access-Requests, operations, Patch-For-Review: Grant Ellery Wulczyn @ellery access to terbium via the restricted group. - https://phabricator.wikimedia.org/T103782#1415529 (RobH)
[22:01:44] PROBLEM - puppet last run on mw2156 is CRITICAL Puppet has 1 failures
[22:03:13] !log Started convertNamespaceFromWikitext.php for Project_talk on Catalan Wikipedia
[22:03:17] Logged the message, Master
[22:03:59] Ops-Access-Requests, operations, Patch-For-Review: Grant Ellery Wulczyn @ellery access to terbium via the restricted group. - https://phabricator.wikimedia.org/T103782#1415542 (RobH) Open>Resolved a:RobH @Ellery's access to terbium has been merged and is live on terbium & bast1001 (so you shou...
[22:08:29] operations, Traffic, Patch-For-Review: Switch to explicit ciphersuite - https://phabricator.wikimedia.org/T104274#1415552 (BBlack) Open>Resolved a:BBlack
[22:09:39] !log Done converting wikitext namespace to Flow on Catalan Wikipedia
[22:09:43] Logged the message, Master
[22:10:14] jouncebot, next
[22:10:14] In 0 hour(s) and 19 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2230)
[22:12:04] Log at people.wikimedia.org/~mattflaschen/2015_06_30_Catalan_Project_talk.log (it's 1.1 MB)
[22:13:04] (CR) Catrope: [C: 2] Make project talk namespace on cawiki Flow-occupied [mediawiki-config] - https://gerrit.wikimedia.org/r/221999 (https://phabricator.wikimedia.org/T99117) (owner: Catrope)
[22:13:09] (Merged) jenkins-bot: Make project talk namespace on cawiki Flow-occupied [mediawiki-config] - https://gerrit.wikimedia.org/r/221999 (https://phabricator.wikimedia.org/T99117) (owner: Catrope)
[22:13:19] * andrewbogott will be back for the leap-second vigil
[22:13:29] (PS2) Alex Monk: Remove wmgUseXAnalytics and wgAjaxEditStash override, other random cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/221808 (https://phabricator.wikimedia.org/T31902)
[22:13:56] !log catrope Synchronized wmf-config/InitialiseSettings.php: Flow-occupy Wikipedia talk namespace on cawiki (duration: 00m 11s)
[22:14:00] Logged the message, Master
[22:19:04] RECOVERY - puppet last run on mw2156 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[22:23:08] odder: did you file a bug already?
[22:24:18] (PS2) Ori.livneh: Wikitech: rebuff requests to beacon/(.*) with an HTTP 204 [puppet] - https://gerrit.wikimedia.org/r/221969
[22:24:28] (CR) Ori.livneh: [C: 2 V: 2] Wikitech: rebuff requests to beacon/(.*) with an HTTP 204 [puppet] - https://gerrit.wikimedia.org/r/221969 (owner: Ori.livneh)
[22:24:34] PROBLEM - puppet last run on cp3044 is CRITICAL puppet fail
[22:30:04] RoanKattouw ostriches Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2230).
[22:30:04] Krenair: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[22:30:34] guess I'll do it then
[22:31:11] early today for the leap second?
[22:31:23] bd808: yep
[22:32:15] * bd808 leaps into the future
[22:32:32] oh that's today
[22:35:03] (PS1) Ori.livneh: Make Coal's whisper files accessible to Graphite front-ends. [puppet] - https://gerrit.wikimedia.org/r/222020
[22:35:18] ^ godog: I did this manually on graphite1001, following up to make sure it is puppetized
[22:35:40] it has a cool-looking change # (222020) so it must be good
[22:35:53] I'm not convinced that my change will work as expected, actually. Something quite strange seems to be happening in my tests on tin.
[22:37:00] fatal: unable to access 'https://git.wikimedia.org/git/operations/puppet.git/': The requested URL returned error: 504
[22:37:00] error: Could not fetch origin
[22:37:06] (PS1) Rush: pybal: remove eqiad applictaion for osm [puppet] - https://gerrit.wikimedia.org/r/222021
[22:37:06] anybody knows why ^?
[22:40:00] ori: coal-looking?
[22:40:07] SMalyshev: the trailing /?
[22:40:09] SMalyshev: can only say its not jsut you, someone in mobile was also complaing about that recently
[22:40:24] ori: trainilng / is something git has added
[22:40:36] ori: remote is origin https://git.wikimedia.org/git/operations/puppet.git (fetch)
[22:40:40] Does tin always runs scripts from mediawiki-staging?
[22:40:46] yes
[22:40:55] which worked some time ago but not anymore :(
[22:41:07] PROBLEM - puppet last run on cp2016 is CRITICAL puppet fail
[22:41:22] So I thought that wmgUseXAnalytics looks redundant - it's always set to true, and then there's just a simple if ( $wmgUseXAnalytics ) { include XAnalytics extension } check
[22:41:26] should I just use gerrit one?
[22:41:27] Krenair: you doing OK?
[22:41:43] oh, I was scrolled up and didn't notice
[22:41:45] nvm me
[22:41:50] yeah, I was double checking some simple cleanup
[22:41:59] I'm not convinced my patch will be OK now
[22:42:06] don't do it then :)
[22:42:12] it's behaving strangely on tin
[22:42:27] ok, looks like git.wikimedia.org is toast: Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.
[22:42:27] SMalyshev: git.wm.o afaik is often down and problematic
[22:42:57] and yet, checking on tin, I made it include the extension unconditionally and it still seems loaded. But then if I remove the block for the wmgUse variable, it stops being loaded
[22:43:02] that explains it
[22:43:51] !log running eval.php (along the lines of https://gerrit.wikimedia.org/r/#/c/221783) on commonswiki to fix T104395
[22:43:55] Logged the message, Master
[22:43:56] CRITICAL - Socket timeout after 10 seconds
[22:44:08] from icinga about antimony (git dot's server)
[22:44:37] (CR) BBlack: [C: 1] pybal: remove eqiad applictaion for osm [puppet] - https://gerrit.wikimedia.org/r/222021 (owner: Rush)
[22:44:58] gitblit sucks!
[22:45:47] it's just our way of getting opsen to help with the gerrit/gitblit -> phab migration :)
[22:46:07] !log restarting gitblit on antimony, because Java is so 1996
[22:46:12] Logged the message, Master
[22:46:15] RECOVERY - puppet last run on cp2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:47:06] (PS1) BBlack: ciphersuites: refactor further, add compat-dhe option [puppet] - https://gerrit.wikimedia.org/r/222022
[22:47:08] (PS1) BBlack: tlsproxy: enable DHE-2048 FS for Android 2.x, etc. [puppet] - https://gerrit.wikimedia.org/r/222023 (https://phabricator.wikimedia.org/T104281)
[22:47:28] (CR) Rush: [C: 2] pybal: remove eqiad applictaion for osm [puppet] - https://gerrit.wikimedia.org/r/222021 (owner: Rush)
[22:47:50] bblack: Java was crap until 2001 and then back to crap by 2007 or so
[22:48:44] what made it wonderful in the middle? :)
[22:48:50] ha same question
[22:50:24] that was when I used it every day :/
[22:50:30] (PS1) CSteipp: Log privileged users with short passwords [mediawiki-config] - https://gerrit.wikimedia.org/r/222025 (https://phabricator.wikimedia.org/T94774)
[22:50:42] :)
[22:51:10] I think I must be going crazy
[22:51:15] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60578 bytes in 0.137 second response time
[22:51:55] My change to the repository in tin:/srv/mediawiki-staging *should not* actually change whether that extension is loaded
[22:51:55] Krenair: no comment
[22:52:00] I think java was really awesome and cool when its primary use-case was to demonstrate the idea of cross-platform code running in browsers at Sun demos, usually involving a little animated bean in a Netscape browser.
[22:52:07] after that, it was pretty much all downhill
[22:52:15] and yet... mwscript eval.php shows it disappears
[22:52:26] !log updated restbase1004 to openjdk-8
[22:52:30] Logged the message, Master
[22:52:57] !log Pooled HHVM image scaler (mw1152) at weight 1 for testing.
[22:53:00] Logged the message, Master
[22:54:21] I don't think I'm going to sync my originally planned config changes
[22:54:22] or whatever he was, apparently "Duke":
[22:54:56] Who's SWATing?
[22:54:58] Well
[22:54:59] https://upload.wikimedia.org/wikipedia/commons/thumb/4/40/Wave.svg/1000px-Wave.svg.png
[22:55:01] I was
[22:55:07] Or at least, I was going to
[22:55:12] hoo, why do you ask?
[22:55:28] !log depooled mw1152
[22:55:32] Logged the message, Master
[22:55:33] I would like to push a minor update to Wikidata for wmf12
[22:55:59] <_joe_> hoo: swat is suspendend until after the leap second, IIRC
[22:55:59] go ahead
[22:56:11] _joe_: Isn't that far ahead?
[22:56:27] <_joe_> 1 hour ahead
[22:56:43] <_joe_> but so was afternoon swat IIRC
[22:56:51] nope
[22:57:18] https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2230
[22:57:20] jouncebot, next
[22:57:20] In 0 hour(s) and 32 minute(s): Leap Second (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2330)
[22:57:26] ^ not happening
[22:57:40] uh?
[22:57:41] the leap second is not happening?
[22:57:52] no, that was just a gut reaction to jouncebot heh
[22:58:22] <_joe_> it is happening, or you won't see me around at this hour :)
[22:58:28] did the evening one finish?
[22:58:40] no
[22:58:50] the window is 22:30 - 23:30 UTC
[22:59:04] the time is 22:59
[22:59:14] :)
[22:59:34] So... can I push my stuff, or do I need to wait until one hour after the leap second
[22:59:44] (CR) Mobrovac: Move Cassandra to g1gc collector and increase heap size (1 comment) [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161) (owner: GWicke)
[22:59:56] We should *really* move the deploys to test to Monday
[23:00:05] RECOVERY - puppet last run on cp3044 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:00:36] hoo, according to the calendar it's fine, if you doubt the calendar ask greg-g
[23:00:56] I don't doubt the calendar, just don't want to make people angry
[23:01:26] so, basically nothing was scheduled for the window (well, before this?)
[23:01:58] if you're going to do something, sooner is better than later
[23:02:01] I had something scheduled for it, and decided not to do it
[23:02:07] It looks like hoo has something to add, however
[23:02:08] hoo: there was push back on that idea (moving test wikis to Monday) when it was proposed
[23:02:17] greg-g: Why?
[23:02:20] hoo: really, the next change I want to make is to go daily :)
[23:02:32] hoo: I'll explain later if you want to deploy now :)
[23:02:34] Everything daily?
[23:02:47] It's fine now... waiting for lazy jenkins
[23:02:53] (yes)
[23:03:06] but, deplo now or forever hold your peace (until tomoorrow)
[23:03:10] +y
[23:03:12] bblack: I have the comic book -- http://comicbookdb.com/issue.php?ID=242211
[23:03:41] * greg-g sings "tomorrow, tomorrow, you're only a day (and a second) away"
[23:03:42] woah, was there more than one?
[23:04:00] (CR) Alex Monk: "I was going to do this today, but ran into some weirdness on tin. It seems like this somehow disables that extension, even though it's def" [mediawiki-config] - https://gerrit.wikimedia.org/r/221808 (https://phabricator.wikimedia.org/T31902) (owner: Alex Monk)
[23:06:51] oh jenkins...
[23:07:05] 7 minutes and counting...
[23:08:19] PROBLEM - etherpad.wikimedia.org HTTP on etherpad1001 is CRITICAL - Socket timeout after 10 seconds
[23:08:51] ^ etherpad's MyISAM is so fast, it got to the leap second early and crashed before everything else
[23:09:18] hoo: looks like your tests just take that long :P https://integration.wikimedia.org/ci/job/mwext-Wikidata-testextension-zend/390/
[23:09:21] (CR) GWicke: Move Cassandra to g1gc collector and increase heap size (1 comment) [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161) (owner: GWicke)
[23:09:27] ( https://github.com/ether/etherpad-lite/wiki/Converting-from-InnoDB-to-MyISAM )
[23:11:06] greg-g: :S
[23:11:18] Need to do more test profiling... but that's just such a time eater
[23:11:35] Our testing infrastructure is not exactly perfect
[23:12:16] (CR) GWicke: Move Cassandra to g1gc collector and increase heap size (1 comment) [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161) (owner: GWicke)
[23:12:33] operations, Traffic, Patch-For-Review: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1415940 (BBlack) ^ That's assuming we're all ok with possibly breaking outdated and/or Oracle Java. I think I am.
[23:13:19] RECOVERY - etherpad.wikimedia.org HTTP on etherpad1001 is OK: HTTP OK: HTTP/1.1 200 OK - 7926 bytes in 0.028 second response time
[23:15:14] (CR) Alex Monk: [C: -1] Log privileged users with short passwords (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/222025 (https://phabricator.wikimedia.org/T94774) (owner: CSteipp)
[23:17:22] hoo: nothing is perfect
[23:17:53] operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1415962 (bd808) For the current workflow of the scap family of tools, it would be easiest if we could select a list...
[23:20:24] operations, HHVM: Custom session handler corrupted by session_destroy, "Failed to initialize storage module" - https://phabricator.wikimedia.org/T97675#1415971 (bd808) @joe do we have a "next HHVM build" tracking bug that we can attach this too so it doesn't get lost?
[23:21:08] greg-g: Ok to push now (Will be done before :30)
[23:21:15] yeah
[23:21:21] god speed
[23:21:36] operations, Fundraising-Backlog, Security, fundraising-tech-ops: Delete gadolinium:/a/log/fundraising/ - https://phabricator.wikimedia.org/T92336#1415981 (atgo)
[23:30:04] akosiaris moritzm: Respected human, time to deploy Leap Second (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150630T2330). Please do the needful.
[23:30:04] !log hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Fix EntityParserOutputGenerator (duration: 00m 21s)
[23:30:09] Logged the message, Master
[23:31:34] heh, a split second over time
[23:32:01] also, please deploy the lead second sounds awesome
[23:32:47] greg-g: You can tell me about the daily deployments now :D
[23:34:06] heh, I mean, there's not much to tell yet,other than I think that is a good goal. It'll make us do things like: improve speed of our tests, improve confidence of our tests, change to deploy based on percentage of pageviews, not per wiki, etc
[23:34:53] really, I want to do 3x/day (8am Pacific, 12noon, and 4pm)
[23:35:07] services will do their own thing
[23:35:36] (PS2) GWicke: Move Cassandra to g1gc collector and increase heap size [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161)
[23:36:14] (CR) GWicke: Move Cassandra to g1gc collector and increase heap size (1 comment) [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161) (owner: GWicke)
[23:36:41] greg-g: Woah... that would mean a lot of change
[23:36:55] Given Wikibase updates usually take 2-4 weeks to go live
[23:37:23] 3x/day + atomic update to each server + % traffic to each version + "instant" rollback == modern practices
[23:37:48] hoo: what would be the blockers for wikibase?
[23:37:56] it also means no need for backports and one-off fixes mostly
[23:38:33] Despite us not taking master into an unusable state (Which we did in the past, but are over that, I think)
[23:38:40] there's *a lot* to do to get there, honestly, but that's what's awesome about goals: they make you do the good things
[23:38:46] Well... keeping third party extensions up to date
[23:39:03] various extensions just use Wikidata stuff without having proper tests for that
[23:39:12] by that you mean your labrythine dependency chain? :P
[23:39:22] Also not sure we run these tests while branching
[23:39:30] (PS1) Filippo Giunchedi: cassandra: remove restbase100[89] from cassandra seeds [puppet] - https://gerrit.wikimedia.org/r/222034
[23:39:31] that'd be good to do
[23:39:47] anyone know why mw1031's eth0 is running at 100mbit instead of 1000?
[23:39:56] (CR) GWicke: [C: 1] cassandra: remove restbase100[89] from cassandra seeds [puppet] - https://gerrit.wikimedia.org/r/222034 (owner: Filippo Giunchedi)
[23:40:39] (PS1) Rush: confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035
[23:41:21] jgage: only thing in SAL is from April
[23:41:22] (CR) jenkins-bot: [V: -1] confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035 (owner: Rush)
[23:42:07] it's been like that for 77 days, so that fits. *looks*
[23:42:12] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: remove restbase100[89] from cassandra seeds [puppet] - https://gerrit.wikimedia.org/r/222034 (owner: Filippo Giunchedi)
[23:42:22] greg-g: Speaking as someone who takes quite a bit of pain from our current deployment process (I mean we need to be around three evenings per week, build our own build, ...)... having stuff as a more automatic and fail safe process would be awesome
[23:42:28] also rapid delivery :D
[23:44:10] yep :)
[23:45:04] (PS3) GWicke: Move Cassandra to g1gc collector and increase heap size [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161)
[23:45:14] (PS2) Rush: confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035
[23:47:07] mutante, are you having issues with phabricator?
[23:47:20] Krenair: yes
[23:47:33] i thought it was my connection first
[23:47:33] (PS3) Rush: confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035
[23:47:35] (PS2) Rush: confd: alert on invalid template generation [puppet] - https://gerrit.wikimedia.org/r/221978
[23:47:40] you appear to have creates two duplicates of your own task from hours ago
[23:47:43] created*
[23:47:58] oooh, did i.. weird
[23:48:04] (Abandoned) Rush: confd: alert on invalid template generation [puppet] - https://gerrit.wikimedia.org/r/221978 (owner: Rush)
[23:48:10] it appeared to me as if they never got created
[23:48:28] (CR) Rush: [C: 2] confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035 (owner: Rush)
[23:48:52] #wikimedia-operations.04-13.log:14:59 < wikibugs> operations, ops-eqiad: mw1031 has a bad uplink - https://phabricator.wikimedia.org/T95896#1202993 (Joe) NEW
[23:48:55] #wikimedia-operations.04-13.log:15:00 < _joe_> !log depooled mw1031
[23:48:58] jgage: ^
[23:49:21] thanks greg-g
[23:49:44] Krenair: thanks, i see you already cleaned it up!
[23:49:47] * greg-g likes grep'ing through irclogs
[23:49:49] (CR) Filippo Giunchedi: [C: 2 V: 2] Move Cassandra to g1gc collector and increase heap size [puppet] - https://gerrit.wikimedia.org/r/221993 (https://phabricator.wikimedia.org/T103161) (owner: GWicke)
[23:49:50] i wonder what flea power is
[23:50:01] (PS1) GWicke: Require openjdk-8-jdk [puppet] - https://gerrit.wikimedia.org/r/222037
[23:50:18] <_joe_> can we not merge patches to cassandra right now?
[23:50:46] <_joe_> I mean 10 minutes away from the leap second, it's the single most delicate thing we have as far as timing is concerned
[23:51:08] _joe_: sure, I won't puppet-merge however the puppet recipe won't go near cassandra
[23:51:17] <_joe_> oh ok
[23:51:22] this is the biggest deal since y2k
[23:51:33] the end of the world as we know it
[23:51:41] <_joe_> ahah
[23:51:42] epoch, actually
[23:51:48] <_joe_> ori: well last time it was
[23:52:10] actually I think the pragmatic fallout was bigger than y2k, even though the panic buildup was lesser
[23:52:11] <_joe_> I am pretty confident it won't be the same shitshow
[23:52:12] fact: the mayan calendar ends today
[23:52:15] good thing we'll all be old by 2038 when the epoch ends
[23:52:34] ori: oh shite, /me finds the nearest mormon with 3 years of reserves
[23:52:50] fact: Children born during the leap second start out at age 2048
[23:52:52] <_joe_> jgage: I think given the retirement laws in italy, I'll still be working in 23 years
[23:53:06] jgage, bad thing is that nobody younger than us will remember C ;)
[23:53:11] heheh
[23:53:40] I assume it’ll just be like the y2k — we’ll all get laid off in the next decade, spend years unemployed, then suddenly get hired back as high-priced contractors during 2037
[23:53:50] greg-g: so, I'd like to deploy the MF sitenotice fix relatively soon...
[23:54:04] legoktm: guh, of course
[23:54:14] legoktm: but, isn't it (the notice) not on right now?
[23:54:31] greg-g: on enwp yes, but other wikis use notices all the time :)
[23:54:38] (PS4) Rush: confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035
[23:54:40] (that wasn't a "OK" "of course", btw)
[23:54:45] right
[23:54:47] :)
[23:54:54] should I wait until after the leap second?
[23:54:57] ideally
[23:54:58] (PS5) Rush: confd: track per template run error state files [puppet] - https://gerrit.wikimedia.org/r/222035
[23:55:27] The 32-bit rollover will be a week after my 65th birthday. I vowed in 1999 that I would have retired from computer nerd work by then.
[23:56:00] bd808: it'll be fun to watch from your chair on the beach, sipping a cuba libre
[23:56:06] bd808: you can still retire two weeks before your 65th birthday
[23:56:26] greg-g: and the planes fall from the sky and the ocean boils
[23:56:53] see you down in Arizona Bay
[23:57:27] sick bill hicks reference
[23:57:44] :)
[23:59:37] (PS2) CSteipp: Log privileged users with short passwords [mediawiki-config] - https://gerrit.wikimedia.org/r/222025 (https://phabricator.wikimedia.org/T94774)