[00:01:44] I would be surprised if pc1002 was suffering from a high read rate, because we have memcached in front in a multiwrite configuration
[00:01:56] the most likely thing, it seems to me, is a high write rate
[00:02:07] TimStarling: right now springle is collecting some data that will probably help
[00:02:17] sorry I didn't get back to that thread
[00:02:27] springle: how's things on that?
[00:04:24] also I'd like to know whether the apaches were hanging inside a query or in connect -- I would suspect in a query
[00:04:35] timeouts are longer in queries
[00:05:26] that suggests a very high max_connections on pc1002 but maybe that's the case
[00:06:08] !log Running clearMessageBlobs.php
[00:06:13] Logged the message, Master
[00:14:02] I guess you guys know about (Cannot contact the database server: Too many connections (10.64.48.26))
[00:14:13] greg-g: essay almost done
[00:14:57] :)
[00:14:58] NotASpy: wiki?
[00:15:11] NotASpy: which wiki, to be exact
[00:15:19] hoo: yeah. en.wp trying to load my contributions.
[00:15:49] does it work again?
[00:15:53] TimStarling: you're correct about the high write-rate
[00:17:06] hoo: yeah, it's fine now.
[00:17:29] or rather, I've not been able to recreate the error.
[00:17:51] dberror is full with that, but it stopped 3 min. ago
[00:18:22] not sure what happened, but there are only ~350 conns. on that machine now, so it's fine again
[00:19:21] springle: Do you have any clue why that happens from time to time? We had that on Wikidata before a few times and I haven't been able to track it down
[00:19:31] probably something is making a lot of apache workers sleep
[00:22:01] hoo: yes. recently we've seen those too many connections bursts at the same time as puppet runs
[00:22:17] which infact is directly related to pc1002 earlier, too
[00:23:01] ok... what's puppet doing that leads to this?
[00:23:13] although, there is something else affecting S[1-7] slaves as well as puppet; sometimes spikes are of slow special page queries which look more like some sort of cache invalidation fallingback on bd
[00:23:17] db*
[00:23:45] a lot of wait io, and a network spike
[00:24:04] but honestly we can't pin it down to puppet alone yet
[00:24:31] puppet runs are much more frequent than this happens
[00:24:44] yep
[00:27:22] just happened again with 10.64.48.22 :/
[00:29:01] (CR) BryanDavis: "> 'type' isn't part of the gelf spec, I think that field is being added by Logstash" [operations/puppet] - https://gerrit.wikimedia.org/r/140623 (owner: Gage)
[00:29:37] hoo: https://tendril.wikimedia.org/report/slow_queries?host=^db1067&user=wikiuser&schema=wik&qmode=eq&query=&hours=1
[00:29:53] that's sampled back quite a bit
[00:30:01] Revision::fetchFromConds doesn't usually take 2s
[00:30:03] don't think I have access to that
[00:30:17] oh
[00:31:02] also I can't login to ishmael for ages
[00:31:58] yeah that's the same auth as ishmael
[00:32:56] oh ndas
[00:33:15] probably because I'm not in the ops or wmf ldap group (not sure which one)
[00:34:02] springle: Is there another way to connect to them? Can I bypass auth by requesting from the cluster?
[00:35:16] well both are hosted on neon; i suppose one could socks forward
[00:36:31] get a 401 even if I use curl on the cluster, meh
[00:36:43] and don't have direct shell on the machine as far as I see on site.pp
[00:38:25] ldap_authurl => 'ldaps://virt0.wikimedia.org virt1000.wikimedia.org/ou=people,dc=wikimedia,dc=org?cn',
[00:38:25] ldap_group => 'cn=wmf,ou=groups,dc=wikimedia,dc=org',
[00:38:50] that's very org. specific :/
[00:40:05] yeah, welcome to the convoluted world of ndas and access to private user data as a nonWMFer
[00:40:20] I've been asking questions about it for months now and .... yeah
[00:44:15] never rains but it pours
[00:50:43] mh, guess I could modify the apache conf, but that would look horrible as that's not really something that's commonly done
[00:51:03] modify in a way that lets me/ us bypass auth from inside the cluster
[00:51:51] PROBLEM - MySQL Processlist on db1019 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 69 statistics
[00:52:41] seriously wtf
[00:52:47] today is wierd
[00:53:04] wtf
[00:53:07] wtf
[00:53:51] RECOVERY - MySQL Processlist on db1019 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 2 statistics
[01:03:01] greg-g: mail sent. i'd like to get feedback before wikifying it
[01:03:10] also i need to check out these other spikes^
[01:04:48] * greg-g nods
[01:09:02] thanks springle
[01:17:19] springle: If I change the apache config, which range should I allow to bypass auth? Or only a single host (bastion? terbium?)
[01:18:37] hoo: i really have no idea who is allowed to view what. we should just figure out the correct ldap group to use for auth
[01:19:23] the "correct" ones are wmf and ops (right now wmf)... but I'm neither :P
[01:20:06] then they're surely not correct since i'd benefit from you having access to links i post
[01:21:00] maybe we could create a new one which covers all shell users, or just use the "wmde" group (although I'm not in that one atm :P)
[01:21:24] greg-g: Who's deciding about these things?
[01:39:28] (PS1) Hoo man: Allow ldap "wmde" users to access ishmael [operations/puppet] - https://gerrit.wikimedia.org/r/140881
[01:40:19] :P Well, if there's no reply I'm just bold
[01:42:49] (CR) Chad: "What's broken about it?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140666 (owner: PleaseStand)
[01:44:57] hoo: "no one"
[01:45:17] ehm... awesome... not :/
[01:45:23] tell me about it
[01:45:34] but the change in gerrit will hopefully at least bring this into view
[01:45:37] I'll wrangle the people I can
[01:45:51] (CR) Hoo man: "I'm not sure this works (the multiple require clauses should be fulfilled if at least one is)..." [operations/puppet] - https://gerrit.wikimedia.org/r/140881 (owner: Hoo man)
[01:46:46] greg-g: Thanks :)
[01:47:02] getting close to 4am, so I better call it a day... good night ;)
[01:47:35] hoo: thanks for the merge earlier
[01:47:41] good night
[01:48:17] :)
[02:01:45] (CR) MZMcBride: "includes/IP.php doesn't exist any longer. I'm too lazy to look up when it went away." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140666 (owner: PleaseStand)
[02:32:31] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:33:31] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: Fetching readonly
[02:35:17] !log LocalisationUpdate completed (1.24wmf9) at 2014-06-20 02:34:14+00:00
[02:35:23] Logged the message, Master
[03:19:40] !log LocalisationUpdate completed (1.24wmf10) at 2014-06-20 03:18:36+00:00
[03:19:45] Logged the message, Master
[03:25:33] (PS3) PleaseStand: Remove use of deprecated wfGetIP() [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140666
[03:35:12] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 20 03:34:06 UTC 2014 (duration 34m 5s)
[03:35:17] Logged the message, Master
[04:13:31] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[05:39:11] (CR) Nuria: "As you pointed out I also need to change error messages." (6 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[07:16:47] (CR) Alexandros Kosiaris: "Cool. thanks for clearing these out" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit)
[07:32:55] (PS18) KartikMistry: cxserver configuration for beta labs [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit)
[07:40:25] (PS1) KartikMistry: Add firewall rules for contint Jenkins slaves [operations/puppet] - https://gerrit.wikimedia.org/r/140891
[07:41:27] oops. wrong dependencies between two patches.
[07:47:31] (PS2) KartikMistry: Add firewall rules for contint Jenkins slaves [operations/puppet] - https://gerrit.wikimedia.org/r/140891
[07:54:46] (PS1) Giuseppe Lavagetto: puppet3: tpa databases to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/140893
[07:56:12] (CR) Springle: [C: 1] puppet3: tpa databases to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/140893 (owner: Giuseppe Lavagetto)
[07:57:30] (CR) Giuseppe Lavagetto: [C: 2] puppet3: tpa databases to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/140893 (owner: Giuseppe Lavagetto)
[08:26:35] (PS1) Giuseppe Lavagetto: puppet3: all databases to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/140896
[08:28:49] (CR) QChris: [WIP] Add backup role and scripts to wikimetrics (3 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[08:37:13] (PS1) Gilles: Reduce MediaViewer EventLogging rate [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140897
[08:42:50] (CR) QChris: [WIP] Add backup role and scripts to wikimetrics (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[08:46:24] (CR) Springle: [C: 1] puppet3: all databases to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/140896 (owner: Giuseppe Lavagetto)
[08:47:51] (CR) Giuseppe Lavagetto: [C: 2] puppet3: all databases to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/140896 (owner: Giuseppe Lavagetto)
[08:48:49] (PS1) Kmosher: Allow specifying default replication factor [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/140898
[08:52:55] (CR) Alexandros Kosiaris: [C: 2] "I am gonna coordinate with Antoine during merging this, just to make sure no problems arise. LGTM" [operations/puppet] - https://gerrit.wikimedia.org/r/140891 (owner: KartikMistry)
[08:59:35] (CR) Alexandros Kosiaris: [C: -1] "Nitpicks. Otherwise looks fine." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/140749 (owner: Matanya)
[09:00:38] (PS3) Filippo Giunchedi: add swift eqiad-prod cluster dashboard [operations/puppet] - https://gerrit.wikimedia.org/r/140685
[09:00:45] (CR) Filippo Giunchedi: [C: 2 V: 2] add swift eqiad-prod cluster dashboard [operations/puppet] - https://gerrit.wikimedia.org/r/140685 (owner: Filippo Giunchedi)
[09:07:46] (PS3) Matanya: kafkatee: convert logrotate script into a template [operations/puppet] - https://gerrit.wikimedia.org/r/140749
[09:10:11] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.002 second response time
[09:10:34] <_joe_> godog: is this the new dashboard going live? :P
[09:10:47] haha no
[09:10:56] I hope at least
[09:11:21] looking into it
[09:19:45] usual problem btw, one of the big metrics being requested and the uwsgi workers spending all of their time fetching it
[09:21:09] (CR) Alexandros Kosiaris: [C: 2] kafkatee: convert logrotate script into a template [operations/puppet] - https://gerrit.wikimedia.org/r/140749 (owner: Matanya)
[09:22:11] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time
[09:26:17] (PS1) Giuseppe Lavagetto: puppet3: ssl terminators [operations/puppet] - https://gerrit.wikimedia.org/r/140899
[09:30:27] (PS1) Filippo Giunchedi: fix swift cache directory permissions [operations/puppet] - https://gerrit.wikimedia.org/r/140900
[09:31:29] (CR) Giuseppe Lavagetto: [C: 2] puppet3: ssl terminators [operations/puppet] - https://gerrit.wikimedia.org/r/140899 (owner: Giuseppe Lavagetto)
[09:32:08] (CR) Tobias Gritschacher: [C: 1] Icinga: Check Dispatch command for Wikidata notification [operations/puppet] - https://gerrit.wikimedia.org/r/136095 (owner: Christopher Johnson (WMDE))
[09:52:55] (PS1) Giuseppe Lavagetto: puppet3: pc caches [operations/puppet] - https://gerrit.wikimedia.org/r/140901
[09:53:48] (CR) Giuseppe Lavagetto: [C: 2] puppet3: pc caches [operations/puppet] - https://gerrit.wikimedia.org/r/140901 (owner: Giuseppe Lavagetto)
[10:05:42] (CR) Nikerabbit: cxserver configuration for beta labs (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit)
[10:43:44] (PS10) Nuria: Add backup role and scripts to wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[10:45:38] (CR) Nuria: "I think I have address all points. If error messages are not good enough please be so kind as to suggest an specific implementation." (7 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[10:48:01] (PS11) Nuria: Add backup role and scripts to wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[10:49:45] (PS1) Giuseppe Lavagetto: puppet3: migrate swift (be and fe) [operations/puppet] - https://gerrit.wikimedia.org/r/140904
[10:58:11] PROBLEM - puppetmaster backend https on palladium is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8141: HTTP/1.1 500 Internal Server Error
[10:58:48] <_joe_> mmmh
[10:59:21] <_joe_> it's passenger down
[10:59:34] <_joe_> no point in debugging, it's ruby glue crap
[11:00:11] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.027 second response time
[11:00:14] <_joe_> !log restarted apache on palladium, passenger was dead and filling error logs
[11:00:19] Logged the message, Master
[11:08:09] (CR) Giuseppe Lavagetto: [C: 2] puppet3: migrate swift (be and fe) [operations/puppet] - https://gerrit.wikimedia.org/r/140904 (owner: Giuseppe Lavagetto)
[11:11:36] (Abandoned) Faidon Liambotis: Labs: Fix beta to work with role::mediawiki [operations/puppet] - https://gerrit.wikimedia.org/r/134519 (owner: BryanDavis)
[11:14:09] (CR) Faidon Liambotis: "This is all Alex's domain :) " [operations/puppet] - https://gerrit.wikimedia.org/r/138632 (owner: Dzahn)
[11:33:02] * matanya is looking for ops reviews
[11:59:01] PROBLEM - RAID on ms-be3003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:59:51] RECOVERY - RAID on ms-be3003 is OK: OK: optimal, 12 logical, 12 physical
[12:01:37] godog: that you I assume?
[12:08:45] (CR) QChris: Add backup role and scripts to wikimetrics (3 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[12:14:01] Hi opsen, is there any information on how long the video scalers outage took and how extensive it was?
[12:14:48] twkozlowski: https://wikitech.wikimedia.org/wiki/Incident_documentation/20140613-Videoscalers
[12:15:41] Timeline is broken.
[12:15:55] Wednesday June 13, and Sunday June 14...
[12:16:01] * twkozlowski fixing :-)
[12:16:04] (CR) Alexandros Kosiaris: [C: -2] "NAK, this is already packaged for ubuntu and contained in nagios-plugins-contrib. We should anyway install the package to enjoy updates (n" [operations/puppet] - https://gerrit.wikimedia.org/r/138632 (owner: Dzahn)
[12:16:53] Thanks paravoid
[12:20:27] paravoid: on a similar note, I think you are still discussing the yesterday's outage somewhere? I noticed there is no incident documentation for that one yet
[12:20:39] I also searched Wikitech-l, and noticed nothing there
[12:21:16] (I'm thinking of adding a link for Tech News in case people want to read more about it, but we can definitely live without any link at all.)
[12:24:03] (CR) Odder: "As a matter of interest, this patch effectively enables CirrusSearch as the primary search engine for 70 new wikis (according to the comm " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136338 (owner: Chad)
[12:34:56] paravoid: uhm no, coincidence
[12:49:06] jenkins-bot is behaving weird: "Merge Failed." but it is already based on latest change. Known problem?
[12:53:41] se4598: yes, for almost a year now. https://bugzilla.wikimedia.org/show_bug.cgi?id=hash-mismatch
[12:54:55] springle: would you like to be automatically notified of patches that introduce schema changes?
[12:55:33] something like this can do it I'm told
[12:55:34] https://www.mediawiki.org/w/index.php?title=Git%2FReviewers&diff=1041804&oldid=1041082
[13:07:51] _joe_: finally there sorry
[13:07:53] (PS3) Filippo Giunchedi: Add roles for testing swift in labs [operations/puppet] - https://gerrit.wikimedia.org/r/137803 (owner: Andrew Bogott)
[13:08:14] _joe_: we can get lanthanum / gallium (contint servers) switched to puppet 3 whenever you want
[13:08:14] <_joe_> hashar: np
[13:08:20] (CR) Filippo Giunchedi: Add roles for testing swift in labs (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/137803 (owner: Andrew Bogott)
[13:08:27] <_joe_> hashar: I'm ready
[13:10:15] _joe_: go ahead with puppet change / merge :-]
[13:10:38] i can run puppet on those machines if it can offload you
[13:11:17] <_joe_> nah don't worry
[13:19:13] (PS1) Giuseppe Lavagetto: puppet3: contint servers [operations/puppet] - https://gerrit.wikimedia.org/r/140908
[13:25:49] (CR) Hashar: [C: 1] "Thanks!" [operations/puppet] - https://gerrit.wikimedia.org/r/140908 (owner: Giuseppe Lavagetto)
[13:27:12] (CR) Giuseppe Lavagetto: [C: 2] puppet3: contint servers [operations/puppet] - https://gerrit.wikimedia.org/r/140908 (owner: Giuseppe Lavagetto)
[13:30:58] <_joe_> hashar: puppet has ran on both servers, now puppet 3 will be used to validate commits
[13:31:18] awesome
[13:31:24] _joe_: want me to do the announce?
[13:31:33] (with appropriate crediting obviously)
[13:31:39] <_joe_> hashar: first let's verify it works
[13:31:52] <_joe_> ;)
[13:32:24] (Restored) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - https://gerrit.wikimedia.org/r/89002 (owner: Hashar)
[13:32:28] (PS10) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - https://gerrit.wikimedia.org/r/89002
[13:32:34] <_joe_> also, I think this is expected as all our critical servers are on puppet 3 now.
[13:32:41] <_joe_> hashar: eheh good idea
[13:32:59] lame job is https://integration.wikimedia.org/ci/job/operations-puppet-validate/16584/
[13:34:04] <_joe_> phew
[13:34:09] V+1 \O/
[13:34:16] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - https://gerrit.wikimedia.org/r/89002 (owner: Hashar)
[13:37:01] (PS1) Odder: Change some user group rights on ruwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140910 (https://bugzilla.wikimedia.org/66871)
[13:59:26] (CR) Milimetric: [C: -1] Add backup role and scripts to wikimetrics (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[14:17:26] (PS2) Thiemo Mättig (WMDE): Icinga: IRC notification event handler and Wikidata configuration file [operations/puppet] - https://gerrit.wikimedia.org/r/139193 (owner: Christopher Johnson (WMDE))
[14:19:02] (CR) Dzahn: "oh totally, i just didn't realize it was packaged, i agree" [operations/puppet] - https://gerrit.wikimedia.org/r/138632 (owner: Dzahn)
[14:19:11] (Abandoned) Dzahn: add check_snmp_environment from Nagios exchange [operations/puppet] - https://gerrit.wikimedia.org/r/138632 (owner: Dzahn)
[14:48:11] PROBLEM - MySQL Processlist on db1021 is CRITICAL: CRIT 1 unauthenticated, 0 locked, 1 copy to table, 601 statistics
[14:49:01] RECOVERY - MySQL Processlist on db1021 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 10 statistics
[15:06:00] (PS3) KartikMistry: Enable ContentTranslation extension on beta labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[15:06:47] (CR) KartikMistry: Enable ContentTranslation extension on beta labs (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[15:07:54] hashar: ^^ if you've time :)
[15:08:26] (CR) Reedy: [C: -1] "Needs adding to extension-list-labs so you get i18n stuff included" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[15:08:44] Reedy is faster than me :]
[15:09:40] heh
[15:09:58] :)
[15:10:26] And my internet access is sucking badly :P
[15:10:31] Reedy: are you on a boat?
[15:10:47] At home
[15:10:59] YuviPanda is bot :)
[15:11:01] aww
[15:11:01] I think my router might not be liking the heat too much
[15:11:07] Reedy: hehe 'heat'
[15:12:06] It's also nearly 3 years old :/
[15:12:24] I can imagine what Yuvi must be thinking about 'heat'.
[15:12:32] kart_: I was in the UK until one week ago
[15:12:47] Now, I can surely imagine ;)
[15:13:25] (PS1) Reedy: Remove MobileApp, already in extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140920
[15:13:32] * Reedy kicks YuviPanda
[15:13:56] fine, fine.
[15:13:57] I forgot
[15:14:02] it's been there for ages now :|
[15:14:06] HOW DARE YOU
[15:14:33] (PS12) Nuria: Add backup role and scripts to wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[15:15:13] (CR) Reedy: [C: 2] Remove MobileApp, already in extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140920 (owner: Reedy)
[15:15:20] (Merged) jenkins-bot: Remove MobileApp, already in extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140920 (owner: Reedy)
[15:16:15] Reedy: so much self merging
[15:16:24] !log reedy Synchronized wmf-config/extension-list-labs: (no message) (duration: 00m 16s)
[15:16:26] I'm special
[15:16:29] Logged the message, Master
[15:16:47] Reedy: are you a Special Page?
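The MySQL Processlist alerts for db1019 (00:51:51) and db1021 (14:48:11) above summarise connection states in a fixed order. A minimal Python sketch of how such a summary line could be built from processlist rows follows. The matching rules in `classify` and the `crit_threshold` value are assumptions for illustration only; the log does not show how the real Icinga check classifies rows or where it draws the line between OK and CRIT.

```python
from collections import Counter

# Labels in the order they appear in the alert text from the log.
LABELS = ("unauthenticated", "locked", "copy to table", "statistics")


def classify(user, state):
    """Map one processlist row to an alert label, or None.

    Hypothetical rules: the real check's matching logic is not in the log.
    """
    state = (state or "").lower()
    if user == "unauthenticated user":  # connection still in handshake
        return "unauthenticated"
    if state.startswith("locked"):
        return "locked"
    if "copy" in state and "table" in state:  # e.g. "copying to tmp table"
        return "copy to table"
    if state == "statistics":
        return "statistics"
    return None


def summarize(rows):
    """Count rows per label; rows are (user, state) pairs."""
    counts = Counter()
    for user, state in rows:
        label = classify(user, state)
        if label is not None:
            counts[label] += 1
    return {label: counts[label] for label in LABELS}


def format_status(counts, crit_threshold=50):
    """Render counts like the alert line; crit_threshold is an assumed value."""
    level = "CRIT" if any(v >= crit_threshold for v in counts.values()) else "OK"
    return level + " " + ", ".join(f"{counts[label]} {label}" for label in LABELS)
```

Feeding it 69 rows in the `statistics` state reproduces the shape of the db1019 alert text. A tally like this also speaks to the question raised at 00:04:24, since connections stuck in connect would show up as unauthenticated rather than in a query state.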
[15:17:37] (CR) Dzahn: [C: 1] Grant bearND ability to upload mobile releases [operations/puppet] - https://gerrit.wikimedia.org/r/140646 (owner: Yuvipanda)
[15:21:02] (PS1) Filippo Giunchedi: don't automatically restart swift backend services [operations/puppet] - https://gerrit.wikimedia.org/r/140922
[15:21:06] (CR) Nuria: Add backup role and scripts to wikimetrics (3 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[15:25:30] (CR) Reedy: Enable ContentTranslation extension on beta labs (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[15:32:52] (CR) BryanDavis: "I will resurrect this patch once we get bug 65591 fixed in labs. At that point all it will contain is the addition of ::mediawiki::users i" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/134519 (owner: BryanDavis)
[15:40:47] (CR) BryanDavis: Enable ContentTranslation extension on beta labs (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[15:48:33] (CR) Greg Grossmeier: "Just to confirm this in a place that isn't easily missed in future patchsets:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[15:50:32] (PS5) Nuria: Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[15:51:50] (CR) jenkins-bot: [V: -1] Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[15:55:26] (CR) QChris: Add backup role and scripts to wikimetrics (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[15:56:54] (PS1) Rush: add admin users to default node definition [operations/puppet] - https://gerrit.wikimedia.org/r/140930
[15:59:34] (CR) Filippo Giunchedi: [C: 1] "with the understanding that it'll be merged next week :)" [operations/puppet] - https://gerrit.wikimedia.org/r/140930 (owner: Rush)
[16:02:31] (CR) Nikerabbit: "I'll try to clarify few things:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[16:06:54] (CR) Greg Grossmeier: "> 1) We need to use cxserver-beta for the initial beta feature launch unless we suddenly get space from production." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[16:09:08] (CR) Dzahn: [C: 2] rancid.pp - lint and tidy, quoting, arrows, retab [operations/puppet] - https://gerrit.wikimedia.org/r/139464 (owner: Dzahn)
[16:09:49] (PS4) Dzahn: rancid - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/138005 (owner: Rush)
[16:10:10] (CR) jenkins-bot: [V: -1] rancid - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/138005 (owner: Rush)
[16:10:47] (PS1) Rush: admin yaml mc* [operations/puppet] - https://gerrit.wikimedia.org/r/140934
[16:11:06] Not authorized to call find on /file_metadata/smokeping/smokeping.fcgi .. hmmm
[16:11:26] is that the -1? no idea what that means :)
[16:11:33] <_joe_> mutante: where is it?
[16:11:40] _joe_: netmon1001
[16:11:52] <_joe_> mutante: it's a wrongly-defined file path
[16:12:09] <_joe_> it's the classic thing that puppet 2 allowed and puppet 3 does not
[16:12:30] ah, but it appears to be still puppet2 on that node
[16:12:37] i'll look closer
[16:12:48] <_joe_> puppet 3 master
[16:12:49] <_joe_> :)
[16:12:56] oh, makes sense
[16:13:08] <_joe_> mutante: let me take a look
[16:13:39] modules/smokeping/manifests/web.pp: source => "puppet:///${module_name}/smokeping.fcgi",
[16:14:06] <_joe_> you need a modules in front of the module name
[16:14:09] <_joe_> I think
[16:14:17] yes
[16:14:21] agreed
[16:14:21] (PS2) Rush: admin yaml mc* [operations/puppet] - https://gerrit.wikimedia.org/r/140934
[16:14:21] <_joe_> 99.99% sure but look at the docs
[16:14:30] (CR) Rush: [C: 2] admin yaml mc* [operations/puppet] - https://gerrit.wikimedia.org/r/140934 (owner: Rush)
[16:14:53] (CR) Rush: [V: 2] admin yaml mc* [operations/puppet] - https://gerrit.wikimedia.org/r/140934 (owner: Rush)
[16:19:08] (PS1) Dzahn: fix path to smokeping.fcgi [operations/puppet] - https://gerrit.wikimedia.org/r/140935
[16:21:42] (CR) Dzahn: [C: 2] fix path to smokeping.fcgi [operations/puppet] - https://gerrit.wikimedia.org/r/140935 (owner: Dzahn)
[16:22:06] (CR) Hashar: "Niklas wrote:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[16:22:31] Jun 20 16:22:10 netmon1001 kernel: [14481049.594951] init: ganglia-monitor main process (7725) terminated with status 1
[16:22:34] Jun 20 16:22:10 netmon1001 kernel: [14481049.594983] init: ganglia-monitor main process ended, respawning
[16:22:37] Jun 20 16:22:10 netmon1001 kernel: [14481049.619690] init: ganglia-monitor respawning too fast, stopped
[16:24:32] _joe_: smokeping part fixed. smokeping.wikimedia.org]: Scheduling refresh of Service[apache2]
[16:25:12] <_joe_> mutante: :)
[16:26:26] (CR) Dzahn: "smokeping.wikimedia.org]: Scheduling refresh of Service[apache2]" [operations/puppet] - https://gerrit.wikimedia.org/r/140935 (owner: Dzahn)
[16:30:29] (CR) Hashar: "I will have to review this a bit more carefully. The ssh from beta bastion might not be needed anymore since the Jenkins job now runs dir" [operations/puppet] - https://gerrit.wikimedia.org/r/140891 (owner: KartikMistry)
[16:32:10] (CR) GWicke: ">> 3) Gabriel told us that parsoid-beta is not well maintained currently. I do not know if they can fix it easily." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140723 (owner: Nikerabbit)
[16:33:02] gwicke: (to not munge up gerrit reviews with mostly superfluous comments) thanks for the clarification there
[16:33:22] (PS5) Dzahn: rancid - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/138005 (owner: Rush)
[16:33:52] confusion between "beta cluster" "beta labs" "wmflabs" and "beta feature"
[16:34:07] I heard you liked pre-production software, so we made everything beta
[16:34:39] yo dog I heard you like beta so I beta's your beta while you were beta'ing
[16:34:52] dawg*
[16:34:57] I put a beta in your beta so you can test while you test
[16:35:25] (PS6) Dzahn: rancid - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/138005 (owner: Rush)
[16:37:11] (PS7) Dzahn: rancid - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/138005 (owner: Rush)
[16:37:15] greg-g: you are welcome!
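The smokeping exchange above (16:11 to 16:24) comes down to a `source` URI that lacks the `modules/` mount point, which the Puppet 3 master rejected. Here is a minimal Python sketch of a checker for that pattern, assuming the fix was to insert `modules/` as _joe_ suggested; the merged diff of change 140935 is not shown in the log. The function names and the `mounts` allow-list are hypothetical, and a site with custom fileserver mounts would need to extend that list.

```python
import re

# Captures the first path segment after "puppet:///".
SOURCE_RE = re.compile(r"puppet:///([^/\s\"']+)/")


def missing_mount_point(source, mounts=("modules",)):
    """Return True if a puppet:/// URI does not start with a known mount.

    `mounts` is an assumed allow-list; only "modules" is taken from the log.
    """
    match = SOURCE_RE.match(source)
    return bool(match) and match.group(1) not in mounts


def add_modules_mount(source):
    """Insert the "modules/" mount point when it is missing."""
    if missing_mount_point(source):
        return source.replace("puppet:///", "puppet:///modules/", 1)
    return source


if __name__ == "__main__":
    # The source line quoted at 16:13:39.
    print(add_modules_mount("puppet:///${module_name}/smokeping.fcgi"))
```

A check like this could run as a lint step before a Puppet 2 to Puppet 3 migration, since _joe_ describes the missing mount point as something Puppet 2 tolerated.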
[16:39:56] (CR) Dzahn: [C: 2] rancid - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/138005 (owner: Rush)
[16:41:42] (CR) Dzahn: [C: 2] add update_functions file to wikistats [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/140606 (owner: Dzahn)
[16:42:59] akosiaris: remember "until backup::client has a ferm rule" ?
[16:43:24] does that make sense that it should be there?
[16:45:55] greg-g: Is it ok with you if I run `sync-common` on fenari to verify that bug 66844 was fixed by the update to scap I pushed yesterday?
[16:46:50] bd808: sync-common also exists on tin
[16:47:09] mutante: Yes, but the bug was specific to fenari
[16:47:17] bd808: yeah
[16:52:12] bd808: greg-g : fenari.wikimedia.org : Jun 20 16:48:25 : mwdeploy : user NOT in sudoers ;
[16:52:16] is that the issue maybe?
[16:52:43] (and do we have to care about fenari still?)
[16:52:52] mutante: Yeah that was my typing the wrong command. Now I'm on santa's naughty list
[16:52:53] if we don't, decom it
[16:53:10] bd808: gotcha,ok, i thought maybe that was the cause
[16:53:29] !log Ran /usr/local/bin/sync-common on fenari to verify fix for bug 66844. It works!
[16:53:34] Logged the message, Master
[16:55:29] I can't answer the "do we care about fenari" question. Was it the deploy server before the eqiad move?
[16:55:40] * bd808 is still sort of new here
[16:55:41] yeah
[16:56:00] bd808: yes, i think i know the answer
[16:56:20] for deployment it wouldnt matter but we still need to move noc.wm
[16:56:23] ^
[16:56:39] and it still needs to recieve updates to keep noc up to date
[16:56:58] It took 3:42 to run the rsync; slow connection is slow
[16:58:58] did it transfer much?
[16:59:14] It should have been a no-op [16:59:24] btw palladium's disk is filling more than we are emptying it, it might alarm again this week, I believe it is related to the salt master not cleaning up jobs (rt 7721) [16:59:28] https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=palladium.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1403282440&v=78.8&m=part_max_used&vl=%25&ti=Maximum%20Disk%20Space%20Used&z=large [17:01:31] (03CR) 10Dzahn: "please fix the path conflict, otherwise it looks good" [operations/puppet] - 10https://gerrit.wikimedia.org/r/139786 (owner: 10Matanya) [17:02:19] (03CR) 10Dzahn: [C: 032] solr: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/140703 (owner: 10Matanya) [17:05:32] (03CR) 10Filippo Giunchedi: "note that this is to get a minimal swift in labs in place, the whole swift + puppet needs a cleanup/overhaul anyway" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 (owner: 10Andrew Bogott) [17:08:54] (03CR) 10BryanDavis: "\o/ Will we be able to use this to add a swift storage cluster for images in beta? The current usage of a shared folder on NFS is slow and" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 (owner: 10Andrew Bogott) [17:11:29] !log expanded palladium's root to avoid filling up, suspected salt-master (RT #7721) [17:11:33] Logged the message, Master [17:12:25] (03CR) 10Filippo Giunchedi: "yes! this would make it simpler to allocate a swift cluster in labs/beta to be used" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 (owner: 10Andrew Bogott) [17:15:16] (03PS1) 10Dzahn: lint backups.pp - part 1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/140942 [17:25:07] (03CR) 10Dzahn: "andrewbogott, still wanna do this? 
i think it should report 20 minutes after merge" [operations/puppet] - https://gerrit.wikimedia.org/r/140745 (owner: Andrew Bogott) [17:25:41] mutante: yes, but after lunch :) [17:26:23] andrewbogott_afk: 'k [17:40:11] (PS1) Dzahn: puppet module for a tor relay (WIP) [operations/puppet] - https://gerrit.wikimedia.org/r/140948 [17:41:21] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:41:53] ^ tungsten seems really slow [17:41:58] login takes looong [17:42:36] yea, graphite is doing tons of stuff and --debug start [17:43:03] where do you see --debug start? [17:43:09] htop [17:43:22] gotcha [17:53:11] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time [18:04:18] (CR) Dzahn: [C: 2] pmacct - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/137989 (owner: Rush) [18:04:48] (PS4) Dzahn: pmacct - replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/137989 (owner: Rush) [18:16:40] (CR) Dzahn: "noop on rhenium" [operations/puppet] - https://gerrit.wikimedia.org/r/137989 (owner: Rush) [18:19:45] (CR) Alex Monk: "... why?" [operations/puppet] - https://gerrit.wikimedia.org/r/140948 (owner: Dzahn) [18:20:35] you know what's worse than "if $realm = 'labs'" ? "if $realm != 'labs'" , easy to overlook [18:21:28] (CR) Dzahn: "to support https://www.eff.org/torchallenge/" [operations/puppet] - https://gerrit.wikimedia.org/r/140948 (owner: Dzahn) [18:23:41] where's the deploy bot [18:23:56] .next [18:24:14] morebots and logmsgbot are both here [18:24:15] I am a logbot running on tools-exec-09. [18:24:15] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:24:15] To log a message, type !log <msg>.
[18:24:26] jouncebot: .next [18:24:30] jouncebot: netxt [18:24:32] arg :P [18:24:35] jouncebot: next [18:24:35] In 68 hour(s) and 35 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140623T1500) [18:24:36] oh, new bot! [18:24:37] that :) [18:24:42] weee [18:24:45] i wanted to know the next window:) [18:24:56] i didn't realize it responded to commands [18:25:14] in 68 hours, of course it's Friday [18:25:42] hmmm, i wanted to make this change https://gerrit.wikimedia.org/r/#/c/137993/3/modules/deployment/manifests/deployment_server.pp [18:26:01] ori: It can do "next" and "refresh" at least. There's a command to restart it too, but mwalker may have to give that one. [18:26:14] it worked fine (noop) on other boxes [18:27:40] hmm? [18:27:47] mutante: generic::systemuser sets up the group too tho, no? [18:28:06] mwalker: ori just noticed jouncebot [18:28:29] ori: yes, but user does as well . groups => $extra_groups [18:28:30] ah; yep -- ori have you not noticed it poking you for deploy windows? [18:29:09] mwalker: i noticed the announcements, not the interactive commands [18:29:29] eh, of course groups => $deployer_groups, , the other one was the same thing on gerrit [18:29:50] mutante: but $deployer_groups may not include the 'trebuchet' group [18:30:08] in which case the user resource won't create it, iirc [18:30:12] but systemuser would [18:30:20] ori, ah; ya; the grand master plan is to eventually allow people to schedule with it and mark successful / failed deployments; but for now its just a simple thing [18:30:33] mwalker: that would be extremely useful [18:30:45] mwalker: have you seen https://github.com/etsy/deployinator ? [18:31:22] mwalker: oops, i meant https://github.com/etsy/PushBot [18:31:50] ori, interesting; no /me reads more [18:32:58] pushbot seemed like a great idea but it was cumbersome when we tried it out. It's definitely not well suited out of the box for living on a public channel. 
[18:33:20] mutante: in systemuser.pp, L43: if ($default_group_gid) { [18:33:29] that's false, so we enter the else block on L51 [18:33:38] if $default_group == $name { [18:33:38] group { $default_group: [18:33:38] ensure => present, [18:33:40] name => $default_group, [18:33:42] } [18:33:44] } [18:34:18] $default_group defaults to $name, and deployment_server.pp did not override it [18:34:35] which means it declared a group { 'trebuchet': } [18:34:40] which may not be getting declared with your change [18:34:54] i could be getting this wrong, the logic is a bit torturous [18:35:33] ori: then i'd think just adding "gid =>" would be the fix [18:35:38] chasemp: ^ [18:36:04] gid autorequires, but does it also autocreates? [18:36:08] yeah trying to figure out the problem. so is it we think removing the systemuser usage means the trebuchet group won't be created? [18:36:09] the type reference says for "groups" that [18:36:09] *autocreate [18:36:15] it doesn't autocreate [18:36:17] chasemp: yeah [18:36:25] " The primary group should not be listed" [18:36:34] if it was a case of $default_group being ensured in systemuser [18:36:39] so that's "gid" , not groups [18:36:50] yeah, i think we need an explicit group resource [18:36:53] we need a ....group { 'foo': ensure => present } [18:36:58] to accompany the user now [18:37:00] I thought I got them all [18:37:03] * ori nods [18:37:04] but maybe not :) [18:37:30] mwalker: I might still have a vm that has a working pushbot (java ahoy!) [18:37:58] or add "trebuchet" to $deployer_groups ?
[18:38:08] mwalker: down side: no ACLs and it hated pipes in people's nicks (I fixed that, I think) [18:38:32] mutante: in this change https://gerrit.wikimedia.org/r/#/c/137993/3/modules/deployment/manifests/deployment_server.pp [18:38:38] mutante: btw, _joe_'s catalog compiler jenkins job thingy is great for answering questions like that [18:38:42] I don't see any groups being created [18:38:53] mwalker: really, jouncebot is already more powerful, all pushbot did really was just change the /topic based on commands. Adding that to jouncebot would probably be semi-trivial [18:38:57] it should pick up on the fact that a formerly-declared resource disappeared [18:39:23] chasemp: then we need to add the group AND also specify "gid =>" in addition to "groups +>" [18:39:34] mwalker: well, it had some safeguards in "nope, someone else deploying now" and such, but, that logic should be fairly clear [18:39:48] greg-g, *nods* [18:40:04] mutante: I don't think so? The group wasn't created in the old case [18:40:04] which may also be in error [18:40:10] so creating it now would be more than a straight port? [18:40:21] and the groups definition should work the same [18:40:56] as far as I can tell that preserves functionality, it's a matter of where do the groups get created further up the logic stack [18:40:59] that I don't know [18:41:10] chasemp: it was created in the old case, i think [18:41:13] chasemp: ori just found out where it gets created [18:41:25] 11:34 < ori> that's false, so we enter the else block on L51 [18:41:25] 11:34 < ori> if $default_group == $name { [18:41:26] 11:34 < ori> group { $default_group: [18:41:30] sure [18:41:39] but there is no default_group def there?
[18:41:52] unless I'm blind it wasn't created there [18:41:58] ^possible [18:42:17] $default_group defaults to $name [18:42:25] and it wasn't getting overridden [18:42:35] so $default_group == $name was true [18:43:04] give me a moment to look before I go off and disagree prematurely :) [18:43:19] as i said before, i could be getting this wrong, it's a little tricky [18:44:01] I understand the confusion now [18:44:18] systemusers is in the state I changed it to, must not have been in the list of things faidon rolled back [18:44:33] I was thinking in terms of the original setup pre-me-doing-stuff [18:45:38] it's slightly insane puppet lets you declare a variable that doesn't exist yet. i.e. $foo ='' and $bar = $foo in the same class [18:45:58] but assuming that logic works, as long as default_group isn't overridden by something that does not match $name it will create it [18:46:10] by confusion I mean mine :) [18:46:40] so I believe https://gerrit.wikimedia.org/r/#/c/137993/3/modules/deployment/manifests/deployment_server.pp needs a group def [18:48:13] yep [18:48:24] that's where i'm at too i think [18:50:04] ok, so if we need to create the group we should also use "gid" for the primary group [18:50:15] and then add the secondary ones with $deployer_groups as before [18:50:24] agreed [18:50:29] ori? [18:51:20] yep [18:52:02] we now have a greater than 10% chance of being right :) [18:52:06] I may give us 11 [18:52:20] j/k I'm sure that's the deal [18:52:20] gid can be specified numerically OR by name, groups ONLY by name [18:52:29] we always use names anyways though [18:52:42] I believe so [18:56:59] (PS4) Dzahn: deployment,replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/137993 (owner: Rush) [18:57:14] https://gerrit.wikimedia.org/r/#/c/137993/4/modules/deployment/manifests/deployment_server.pp [18:57:26] (or don't specify numeric gid for group as well?)
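The outcome agreed above (declare the group explicitly, make it the user's primary group via "gid", keep the secondary groups in "groups") might look roughly like the fragment below. This is only a minimal sketch under stated assumptions, not the content of change 137993: the value of $deployer_groups and the use of "system => true" are hypothetical and chosen for illustration.

```puppet
# Hypothetical value for illustration; the real list is set elsewhere
# in deployment_server.pp and is not shown in this log.
$deployer_groups = ['wikidev']

# The user type autorequires its primary group but does not create it,
# so the group needs its own resource once generic::systemuser is gone.
group { 'trebuchet':
  ensure => present,
  system => true,   # assumption: treat it as a system group
}

user { 'trebuchet':
  ensure => present,
  gid    => 'trebuchet',        # primary group, given by name
  groups => $deployer_groups,   # secondary groups only
  system => true,               # assumption: system account
}
```

Because the user resource autorequires the group named in "gid" when Puppet manages it, no explicit "require" should be needed for ordering here.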
[18:57:42] also, group has "system" as well [18:58:09] "Whether the group is a system group with lower GID." is 10004 lower ? [18:58:24] i guess it needs to use a different number too [18:59:14] ah that makes sense to me? [18:59:22] if this is a group not meant for human users [19:00:41] fwiw nowhere in the cards have we said we'll do group cleanup [19:00:56] but making the distinction is probably a good thing imo [19:13:41] (CR) Ori.livneh: [C: 1] deployment,replace generic::systemuser with user [operations/puppet] - https://gerrit.wikimedia.org/r/137993 (owner: Rush) [20:22:05] (CR) Chad: [C: 2] Remove use of deprecated wfGetIP() [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140666 (owner: PleaseStand) [20:22:18] (Merged) jenkins-bot: Remove use of deprecated wfGetIP() [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/140666 (owner: PleaseStand) [20:22:47] !log demon Synchronized wmf-config/throttle.php: wfGetIP removal, code cleanup (duration: 00m 05s) [20:22:52] Logged the message, Master [20:23:04] !log demon Synchronized wmf-config/CommonSettings.php: wfGetIP removal, code cleanup (duration: 00m 04s) [20:23:08] Logged the message, Master [20:25:10] (CR) Chad: [C: 1] lucene: lint [operations/puppet] - https://gerrit.wikimedia.org/r/140665 (owner: Matanya) [20:31:02] (PS1) Jforrester: Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141053 [20:31:21] (PS1) Jforrester: Remove whitelist entry for now-graduated MediaViewer BetaFeature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141054 [20:33:04] (PS1) Jforrester: Remove references to the Math VisualEditor BetaFeature (now graduated) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141055 [20:33:06] (CR) jenkins-bot: [V: -1] Remove references to the Math VisualEditor BetaFeature (now graduated)
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141055 (owner: Jforrester) [20:33:44] (CR) Jforrester: [C: 1] Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141053 (owner: Jforrester) [20:34:11] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Fri 20 Jun 2014 17:33:39 UTC [20:34:14] (PS2) Jforrester: Remove references to the Math VisualEditor BetaFeature (now graduated) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141055 [20:53:31] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Fri Jun 20 20:53:30 UTC 2014 [20:55:12] (CR) Bartosz Dziewoński: [C: 1] Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141053 (owner: Jforrester) [20:56:37] (PS1) Ori.livneh: [HAT] Load mod_version on application servers [operations/puppet] - https://gerrit.wikimedia.org/r/141059 [20:56:44] ^^ mutante [20:56:52] I don't expect that to be deployed on a Friday, just flagging it for review :) [20:59:09] let me know if you want to talk it through or anything [21:01:37] (PS1) Ori.livneh: [HAT] Add configuration guards [operations/apache-config] - https://gerrit.wikimedia.org/r/141062 [21:02:01] (CR) Alex Monk: [C: 1] Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141053 (owner: Jforrester) [21:12:31] (CR) Ori.livneh: [C: -1] "Do not deploy before https://gerrit.wikimedia.org/r/#/c/141059/" [operations/apache-config] - https://gerrit.wikimedia.org/r/141062 (owner: Ori.livneh) [21:31:06] aude: heya, just saw the deploy calendar edit, mind replying to the roadmap email I just sent?
or, tell me what's up and I can :) [21:31:38] aude: /me may have gotten confused given the two wikidata reviews going on right now :/ [21:32:29] aude: nvm! there's your email [21:32:33] :p [21:32:36] :) [21:33:02] * greg-g only syncs email every 10 minutes, sometimes things like this happen :) [21:33:42] greg-g: Moral of the story; sync faster :D [21:34:19] bah, that just means email more often [21:34:34] but it means important emails, faster [21:34:54] that's what IRC is for [21:34:57] :) [21:36:49] true, true :p [21:43:03] greg-g: sorry about that [21:43:12] aude: no worries :) [21:43:45] we have too much new stuff right now that we want to skip this week [21:43:52] and ensure it's ready when we deploy [21:44:04] * James_F approves of that tactic. :-) [21:44:12] * greg-g nods [21:44:18] we still want to do the property / entity suggester though, onto beta and then ready [21:44:33] +1 [21:53:31] (PS13) QChris: Add backup role and scripts to wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [21:54:17] (PS6) QChris: Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [21:55:42] (CR) jenkins-bot: [V: -1] Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [21:56:13] (CR) QChris: "Jenkins will probably fail, as I made the dependency on" [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:04:15] (CR) Nuria: Enable the new backup role in wikimetrics if set (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:06:33] (CR) Milimetric: "only one nit" (1
comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:07:55] (CR) Milimetric: [C: 1] Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:10:46] (CR) QChris: Add backup role and scripts to wikimetrics (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:40:12] (PS14) QChris: Add backup role and scripts to wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:41:04] (PS7) QChris: Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:41:48] (CR) QChris: "Jenkins will fail again, as the dependency on change 139557 (which is not yet merged) is made explicit by a submodule" [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:42:46] (CR) Milimetric: [C: 1] "cool" [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:43:37] (CR) Milimetric: [C: 2] "cool" [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:43:48] (CR) jenkins-bot: [V: -1] Enable the new backup role in wikimetrics if set [operations/puppet] - https://gerrit.wikimedia.org/r/139558 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:46:58] (CR) QChris: Enable the new backup role in wikimetrics if set (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139558
(https://bugzilla.wikimedia.org/66119) (owner: Milimetric) [22:56:07] (CR) Chad: [C: 2] Replace the Nostalgia extension with the Nostalgia skin [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137200 (https://bugzilla.wikimedia.org/61256) (owner: Bartosz Dziewoński) [22:57:14] (Merged) jenkins-bot: Replace the Nostalgia extension with the Nostalgia skin [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137200 (https://bugzilla.wikimedia.org/61256) (owner: Bartosz Dziewoński) [22:58:17] !log demon Synchronized wmf-config/CommonSettings.php: Load nostalgia from skins rather than extensions when it exists (duration: 00m 04s) [22:58:23] Logged the message, Master [22:58:37] ^demon|lunch: thanks :) [22:59:06] <^demon|lunch> np. [23:05:57] (PS1) Dzahn: pmacct - also create group for systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/141074 [23:16:45] (PS1) Dzahn: rancid: also create system group for system user [operations/puppet] - https://gerrit.wikimedia.org/r/141078