[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141112T0000).
[00:02:59] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[00:03:23] (PS1) Alexandros Kosiaris: Change postgresql dir to have version and cluster [puppet] - https://gerrit.wikimedia.org/r/172659
[00:10:28] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[00:44:09] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:48:19] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57905 bytes in 0.214 second response time
[00:52:11] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 327 seconds
[00:52:12] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 330 seconds
[00:53:28] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds
[00:53:32] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds
[01:12:47] (PS1) Matanya: (bug 73197) allow admins to give patroller right on Hebrew Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/172663
[01:14:34] Reedy: I forgot this ^ part earlier, if you don't mind; it's quite useless without it
[01:49:13] apergos, DNS is down for mail.wikipedia.org: https://bugzilla.wikimedia.org/show_bug.cgi?id=73290
[01:50:54] PROBLEM - Disk space on logstash1002 is CRITICAL: DISK CRITICAL - free space: / 16707 MB (3% inode=99%):
[01:52:47] superm401: https://github.com/wikimedia/operations-dns/commit/a1607cedac1454ad9cad4f1e2742655f6bb13e59 https://github.com/wikimedia/operations-dns/commit/3a7f472cb3e9bcd03f0492cfdd8c0a2156f448d3
[01:52:49] Presumably
[01:54:45] Reedy, hmm, thanks.
People have been working on fixing SSL on mail.wikipedia.org quite recently, on the bug I see also-ed. [02:09:15] !log LocalisationUpdate completed (1.25wmf6) at 2014-11-12 02:09:14+00:00 [02:09:21] Logged the message, Master [02:12:52] RECOVERY - Disk space on logstash1002 is OK: DISK OK [02:15:49] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-12 02:15:49+00:00 [02:15:53] Logged the message, Master [02:20:00] greg-g: bd808|BUFFER ^^ That's promising [02:20:09] 16 minutes to do 2 l10n updates [03:35:17] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 12 03:35:17 UTC 2014 (duration 35m 16s) [03:35:21] Logged the message, Master [03:54:13] PROBLEM - puppetmaster https on virt1000 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error [04:56:23] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:32] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57901 bytes in 0.112 second response time [05:12:33] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4276 MB (3% inode=94%): [05:21:52] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4227 MB (3% inode=94%): [06:29:04] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 3 failures [06:29:04] RECOVERY - Disk space on vanadium is OK: DISK OK [06:29:15] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:46:13] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:46:54] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:49:23] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.017 second response time [07:25:15] PROBLEM - git.wikimedia.org on antimony is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:29:35] you're heavy, antimony
[07:42:55] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57901 bytes in 0.323 second response time
[08:06:07] (PS2) Giuseppe Lavagetto: HAT: remove addencoding directives that are harmful on apache 2.4 [puppet] - https://gerrit.wikimedia.org/r/172578
[08:06:20] (CR) Giuseppe Lavagetto: [C: 2 V: 2] HAT: remove addencoding directives that are harmful on apache 2.4 [puppet] - https://gerrit.wikimedia.org/r/172578 (owner: Giuseppe Lavagetto)
[08:17:53] <_joe_> !log stress testing a group of HHVM servers in anticipation of the move to 20% of traffic
[08:17:57] Logged the message, Master
[08:29:45] (PS2) Yuvipanda: diamond: Don't choke on puppet syntax error failures [puppet] - https://gerrit.wikimedia.org/r/172592
[08:29:52] anyone wanna +1 ^?
[08:32:27] (CR) Yuvipanda: [C: 2] diamond: Don't choke on puppet syntax error failures [puppet] - https://gerrit.wikimedia.org/r/172592 (owner: Yuvipanda)
[08:40:15] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:26] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57901 bytes in 6.325 second response time
[08:50:05] (PS7) Nemo bis: Make BounceHandler extension work on Meta-Wiki [puppet] - https://gerrit.wikimedia.org/r/168622 (owner: 01tonythomas)
[08:50:49] (PS5) Nemo bis: Deploy BounceHandler extension to production [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[08:51:36] (CR) Nemo bis: "This certainly needs something to be done as regards mail relay etc.; I *think* that's what I7badb2383a3e4d6d78e28bea6a97c5a51d5be64d is f" [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[08:57:52] (PS4) Jalexander: Add SecurePoll specific dblist and
allow SecurePoll to use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) [09:02:16] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [09:03:56] (03CR) 10Nemo bis: [C: 04-1] "Not clear why a new list would be needed, please reply to Reedy. https://bugzilla.wikimedia.org/show_bug.cgi?id=73245#c4" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [09:14:26] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [09:27:24] (03PS1) 10ArielGlenn: re-add mail.wp.org, lost in I1489f35f9fd335da9b509cc21015177020732d87. [dns] - 10https://gerrit.wikimedia.org/r/172677 [09:29:22] (03CR) 10ArielGlenn: [C: 032] re-add mail.wp.org, lost in I1489f35f9fd335da9b509cc21015177020732d87. [dns] - 10https://gerrit.wikimedia.org/r/172677 (owner: 10ArielGlenn) [10:16:35] <_joe_> !log depooling mw1189 from the api pool for reimaging [10:16:37] Logged the message, Master [10:28:03] <_joe_> !log repooling mw1189 with a reduced hhvm thread count for testing (puppet disabled, as well) [10:28:06] Logged the message, Master [10:50:06] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: puppet fail [11:01:56] (03CR) 10Alexandros Kosiaris: [C: 031] swift: report statsd data to localhost [puppet] - 10https://gerrit.wikimedia.org/r/172549 (owner: 10Filippo Giunchedi) [11:07:42] (03CR) 10TTO: [C: 04-1] "Per the bug, we don't need yet another dblist..." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [11:08:45] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [11:16:51] (03CR) 10Alexandros Kosiaris: [C: 032] Introduce role::ganglia classes [puppet] - 10https://gerrit.wikimedia.org/r/172553 (owner: 10Alexandros Kosiaris) [11:16:51] (03CR) 10Alexandros Kosiaris: [C: 032] Assign role::ganglia::web to uranium [puppet] - 10https://gerrit.wikimedia.org/r/172555 (owner: 10Alexandros Kosiaris) [11:21:06] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: puppet fail [11:25:28] (03PS3) 10Filippo Giunchedi: carbon-c-relay: add debian packaging [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/172228 [11:27:49] (03PS4) 10Filippo Giunchedi: carbon-c-relay: add debian packaging [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/172228 [11:28:06] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] carbon-c-relay: add debian packaging [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/172228 (owner: 10Filippo Giunchedi) [11:28:45] akosiaris: thanks! [11:34:23] (03PS1) 10Alexandros Kosiaris: Don't include misc::monitoring::views on uranium [puppet] - 10https://gerrit.wikimedia.org/r/172689 [11:36:58] (03CR) 10Alexandros Kosiaris: [C: 032] Don't include misc::monitoring::views on uranium [puppet] - 10https://gerrit.wikimedia.org/r/172689 (owner: 10Alexandros Kosiaris) [11:38:59] <_joe_> misc::monitoring? [11:41:41] <_joe_> I can't see ganglia.wikimedia.org [11:41:47] <_joe_> is someone on it? 
[11:42:14] yes [11:42:15] me [11:42:26] <_joe_> ok [11:42:28] <_joe_> :) [11:42:46] (03PS1) 10Alexandros Kosiaris: Fixes for role::ganglia::web [puppet] - 10https://gerrit.wikimedia.org/r/172690 [11:42:58] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Fixes for role::ganglia::web [puppet] - 10https://gerrit.wikimedia.org/r/172690 (owner: 10Alexandros Kosiaris) [11:44:26] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [11:52:59] (03PS1) 10Alexandros Kosiaris: role::ganglia::web rrdcached fixes [puppet] - 10https://gerrit.wikimedia.org/r/172693 [11:57:58] (03CR) 10Alexandros Kosiaris: [C: 032] role::ganglia::web rrdcached fixes [puppet] - 10https://gerrit.wikimedia.org/r/172693 (owner: 10Alexandros Kosiaris) [12:02:35] PROBLEM - Disk space on uranium is CRITICAL: DISK CRITICAL - free space: / 326 MB (3% inode=46%): [12:04:33] akosiaris: ^ [12:05:32] yeah I know [12:08:45] PROBLEM - HTTP on uranium is CRITICAL: Connection refused [12:12:17] (03PS1) 10Alexandros Kosiaris: role::ganglia::web rrdcached socket support [puppet] - 10https://gerrit.wikimedia.org/r/172695 [12:15:04] (03CR) 10Alexandros Kosiaris: [C: 032] role::ganglia::web rrdcached socket support [puppet] - 10https://gerrit.wikimedia.org/r/172695 (owner: 10Alexandros Kosiaris) [12:17:00] RECOVERY - HTTP on uranium is OK: HTTP OK: HTTP/1.1 302 Found - 426 bytes in 0.008 second response time [12:18:35] _joe_: paravoid: ganglia fixed [12:18:42] sorry, I should have tested it more :-( [12:18:45] RECOVERY - Disk space on uranium is OK: DISK OK [12:19:00] it was meant to be way less problematic [12:19:15] <_joe_> akosiaris: well, I'd be worried if it was :P [12:19:16] at least now we got rrdcached support :-) [12:26:56] Preparing for firmware update. Please wait [12:27:05] atftpd is a PITA ... [12:29:23] (03CR) 10Faidon Liambotis: [C: 032] "LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/165779 (owner: 10Ori.livneh) [12:34:25] PROBLEM - Disk space on uranium is CRITICAL: DISK CRITICAL - free space: / 337 MB (3% inode=59%): [12:39:25] RECOVERY - Disk space on uranium is OK: DISK OK [12:53:12] (03PS1) 10ArielGlenn: draft notifier to delete salt keys of a labs instance when it's deleted [puppet] - 10https://gerrit.wikimedia.org/r/172700 [13:15:41] (03PS1) 10QChris: Extract EventLoggings log directory into a variable [puppet] - 10https://gerrit.wikimedia.org/r/172705 [13:15:43] (03PS1) 10QChris: Move Eventlogging logs underneath /srv, which has more free space [puppet] - 10https://gerrit.wikimedia.org/r/172706 [13:15:45] (03PS1) 10QChris: Retain 90 days of EventLogging logs [puppet] - 10https://gerrit.wikimedia.org/r/172707 (https://bugzilla.wikimedia.org/69029) [13:18:08] (03PS2) 10Filippo Giunchedi: swift: report statsd data to localhost [puppet] - 10https://gerrit.wikimedia.org/r/172549 [13:18:25] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: report statsd data to localhost [puppet] - 10https://gerrit.wikimedia.org/r/172549 (owner: 10Filippo Giunchedi) [13:19:45] (03CR) 10QChris: Move Eventlogging logs underneath /srv, which has more free space (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [13:32:25] !log rolling reload of swift on ms-fe1* to pick up statsd changes [13:32:32] Logged the message, Master [13:36:46] PROBLEM - Disk space on uranium is CRITICAL: DISK CRITICAL - free space: / 307 MB (3% inode=87%): [13:42:56] RECOVERY - Disk space on uranium is OK: DISK OK [13:46:57] (03CR) 10QChris: Link aggregator dataset into wikimetrics public webspace (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [13:49:35] mark: around? 
tony & reedy are ready to turn on VERP addresses [13:49:49] I am fwiw :) [13:49:54] ok [13:50:30] paravoid: I can deal with the mail puppet merge and watch logs to confirm that things are working as expected [13:50:39] which one? [13:50:45] https://gerrit.wikimedia.org/r/#/c/168622/7/manifests/role/mail.pp [13:51:08] give me a sec [13:51:11] k [13:55:00] !log rolling reload of swift on ms-be1* to pick up statsd changes [13:55:04] Logged the message, Master [13:55:55] (03CR) 10Faidon Liambotis: [C: 031] Make BounceHandler extension work on Meta-Wiki [puppet] - 10https://gerrit.wikimedia.org/r/168622 (owner: 1001tonythomas) [13:56:32] Jeff_Green: yay :) thanks paravoid ! [13:56:47] although I would start from mediawiki.org, as Aaron noted [13:57:08] but if you're feeling confident enough and Greg is okay with it, meta works for me as well [13:57:39] i think aaron is referring to the hostname for the HTTP request [13:57:59] or is there another comment I'm not seeing? [13:58:12] oh wait, aaron said metawiki [13:58:20] yeah :) [13:58:42] paravoid: there's been some back and forth on the best HTTP endpoint [13:58:51] right, I see that [14:01:34] Jeff_Green: we can have Reedy finish https://gerrit.wikimedia.org/r/#/c/172322/ ? [14:01:34] how come we're not using each user's homewiki? [14:02:21] paravoid: i think because we don't know it until we decode the verp address [14:03:22] paravoid: we have the extension installed on every wiki in prod - and the exim talking back only to meta wiki [14:03:45] and once a bounce from some other wiki arrives - the $wikiId in the prefix part can help us look in the right table https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/includes/ApiBounceHandler.php#L36 [14:03:51] (03PS1) 10Alexandros Kosiaris: Allow specifying journal_dir for rrdcached [puppet] - 10https://gerrit.wikimedia.org/r/172715 [14:03:56] and get the user details! [14:04:25] do emails originate from a common domain, e.g. @wikimedia.org, though? 
[14:04:42] they do
[14:05:06] same as before
[14:05:10] what's an example address? I saw the $wikiId-base36()... on the bug report but I'm not sure if it was accurate
[14:05:25] paravoid: wiki-testwiki-2-nevvrp-Ru3q8IfTlVGXczYk@wikimedia.org
[14:05:49] that will make sure that the bounce gets redirected to the wiki with wikiId = testwiki
[14:05:53] right
[14:06:18] Jeff_Green: so why did you submit that SPF change for wikipedia.org then?
[14:06:41] they changed the plan
[14:06:54] when I submitted that, they were talking about sending out with per-wiki domains
[14:07:02] right, because that's what I remembered
[14:07:02] that's gone back and forth too
[14:07:06] hence my confusion :)
[14:07:10] yeah
[14:07:25] I didn't object b/c I think a single domain is cleaner anyway
[14:07:32] yup, agreed
[14:07:48] i'll purge that commit
[14:07:55] "abandon" :P
[14:08:15] stab
[14:08:19] hah
[14:08:28] where is git stab when you need it
[14:09:18] tonythomas: are any special provisions taken to ensure the localpart is going to be < 64 chars?
[14:09:44] (Abandoned) Jgreen: add a neutral SPF record for wikipedia.org and other domains of template [dns] - https://gerrit.wikimedia.org/r/171324 (owner: Jgreen)
[14:10:16] paravoid: of course ! as per https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/includes/VerpAddressGenerator.php#L67 : we have "The generated hash is cut down to 12 ( 96 bits ) instead of the full 120 bits."
[14:12:02] we made sure that VERP gets < 64 in this commit https://github.com/wikimedia/mediawiki-extensions-BounceHandler/commit/d325bb411370c7aca4f3d5e1fef2e2efc0a1fd50
[14:12:20] that's what I'd like to hear :)
[14:12:55] :) thanks to the reviewers !
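Editor's sketch, not part of the log: the exchange above gives one example VERP address and asks whether the local part stays within the 64-character limit. The snippet below is a minimal illustration of both points, assuming the local part is laid out as prefix, wiki ID, base-36 user ID, base-36 timestamp, and a truncated hash, separated by hyphens. That layout is inferred from the single example address and the "$wikiId-base36()" remark; it is not taken from the BounceHandler source, and the names `parse_verp`, `VerpAddress`, and `VERP_RE` are hypothetical.

```python
import re
from dataclasses import dataclass

# RFC 5321 caps the local part (the text before "@") at 64 octets.
MAX_LOCAL_PART = 64

# Assumed layout: prefix-wikiId-userId(base36)-timestamp(base36)-hash.
# This is a guess from the one address quoted in the log.
VERP_RE = re.compile(
    r"^(?P<prefix>[^-]+)-(?P<wiki_id>[^-]+)-(?P<user>[0-9a-z]+)"
    r"-(?P<ts>[0-9a-z]+)-(?P<digest>.+)$"
)


@dataclass
class VerpAddress:
    prefix: str
    wiki_id: str
    user_id: int
    timestamp: int
    digest: str


def parse_verp(address: str) -> VerpAddress:
    """Split a VERP return address into its assumed components."""
    local_part, _, domain = address.rpartition("@")
    if not local_part or not domain:
        raise ValueError("not an email address: %r" % address)
    if len(local_part.encode("utf-8")) > MAX_LOCAL_PART:
        raise ValueError("local part exceeds %d octets" % MAX_LOCAL_PART)
    match = VERP_RE.match(local_part)
    if match is None:
        raise ValueError("local part does not match the assumed VERP layout")
    return VerpAddress(
        prefix=match["prefix"],
        wiki_id=match["wiki_id"],
        user_id=int(match["user"], 36),   # base-36 decode
        timestamp=int(match["ts"], 36),   # base-36 decode
        digest=match["digest"],
    )


if __name__ == "__main__":
    print(parse_verp("wiki-testwiki-2-nevvrp-Ru3q8IfTlVGXczYk@wikimedia.org"))
```

Under these assumptions the wiki ID recovered from the address is what lets a single endpoint route a bounce back to the right wiki, which is the behaviour described in the conversation.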
[14:14:03] ok
[14:14:06] looks reasonable
[14:14:45] (CR) Anomie: "I note that Nemo's -1, and the discussion on the bug that is the basis for TTO's -1, seems to be based entirely on a faulty understanding " [mediawiki-config] - https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: Jalexander)
[14:14:48] great
[14:15:01] we're ready to fire it up then?
[14:15:07] yup, go ahead :)
[14:15:11] thx
[14:15:33] I didn't review the mediawiki extension of course
[14:15:36] I don't think I need to :)
[14:15:59] yep, there have been a lot of eyes on that
[14:16:14] should we coordinate the config changes here or in #dev?
[14:16:39] Reedy: hey ! you around ?
[14:16:40] mediawiki-config?
[14:16:56] yeah - this one https://gerrit.wikimedia.org/r/#/c/172322/
[14:16:58] on which wikis do you plan to roll it out today?
[14:17:10] meta !
[14:18:06] tonythomas: which wikis start using VERP addresses as of today's changes?
[14:18:12] yes, that was the question :)
[14:18:29] all, as far as I can see
[14:18:31] let's not do that
[14:18:34] Jeff_Green: we are doing this on meta right ?
[14:19:54] (CR) Faidon Liambotis: [C: -1] Deploy BounceHandler extension to production (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:20:23] tonythomas: hmm.
I haven't looked at the wiki configuration since last week
[14:20:43] let's look that over now
[14:20:50] my understanding is that "meta" will be special, in the sense that its API will accept action=bouncehandler
[14:21:09] but that the other wikis will also change their behavior, by sending VERP emails
[14:21:11] paravoid: true - once we have our extension over there
[14:21:22] those two rollouts should be independent, though
[14:21:41] paravoid: true that - in that case we should have https://gerrit.wikimedia.org/r/#/c/172322/ rolling out only on meta
[14:21:57] meta can start accepting action=bouncehandler, but it shouldn't send VERP emails yet
[14:22:07] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:22:11] these two shouldn't be tied to each other
[14:22:31] I thought the same of the exim config actually
[14:22:58] i.e. it would be good to have a toggle to turn off HTTP processing and revert :blackhole:
[14:23:14] err revert to :blackhole: for wikimail bounces
[14:23:25] it should be rolled out per-wiki imho
[14:23:34] paravoid: we should have VERP generation first - and later meta receiving action=bouncehandler, right ?
[14:23:46] no, the other way around
[14:23:47] as we did with beta
[14:24:08] paravoid: we will have nothing arriving at action=bouncehandler though
[14:24:13] <_joe_> !log load test on hhvm done
[14:24:18] Logged the message, Master
[14:24:27] if we are not generating VERP addresses in meta
[14:24:47] (PS1) Gilles: Set different ImageMetrics sampling factor for logged-in users [mediawiki-config] - https://gerrit.wikimedia.org/r/172720
[14:27:27] !log restarting elastic1016 to pick up new plugins....
half way done [14:27:29] Logged the message, Master [14:29:23] gi11es: You seem to have linked the wrong change for this morning's SWAT [14:29:39] anomie: the first one was done overnight by Reedy [14:29:42] it seems [14:30:12] gi11es: You linked to a Flow bugfix, nothing to do with FSFileBackend [14:30:22] ah, ok, let me fix that [14:31:57] PROBLEM - swift-object-auditor on ms-be2010 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [14:32:45] (03CR) 10Ottomata: "I am confued about /var/log/eventlogging/archive and /srv/eventlogging-logs. They look like they have the same files, but /srv/eventloggi" [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [14:32:52] ms-be2010 is me [14:33:12] (03PS1) 10Giuseppe Lavagetto: Have 20% of the anonymous traffic sent to HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172724 [14:33:34] (03PS2) 10Ottomata: Extract EventLoggings log directory into a variable [puppet] - 10https://gerrit.wikimedia.org/r/172705 (owner: 10QChris) [14:34:17] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57929 bytes in 0.314 second response time [14:36:05] (03CR) 10Ottomata: "Ah I see. eventlogging-logs is some old artifact we can delete." [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [14:36:23] (03CR) 10Ottomata: [C: 032] Extract EventLoggings log directory into a variable [puppet] - 10https://gerrit.wikimedia.org/r/172705 (owner: 10QChris) [14:38:07] RECOVERY - swift-object-auditor on ms-be2010 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [14:39:09] (03CR) 10Ottomata: [C: 032] "All good points. I concede!" 
[puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [14:39:27] PROBLEM - swift-object-replicator on ms-be2010 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [14:40:49] Jeff_Green, tonythomas: so anyway; I'd strongly prefer it if this was done in an incremental rollout, not an on/off toggle [14:41:16] paravoid: agreed [14:51:57] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Have 20% of the anonymous traffic sent to HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172724 (owner: 10Giuseppe Lavagetto) [14:55:27] !log oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 20% of anons (duration: 00m 06s) [14:55:31] Logged the message, Master [15:03:37] (03PS12) 10Ottomata: Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [15:09:56] PROBLEM - Disk space on uranium is CRITICAL: DISK CRITICAL - free space: / 306 MB (3% inode=87%): [15:16:37] <_joe_> akosiaris: are you looking into this? 
^^ [15:20:14] yup [15:21:17] RECOVERY - Disk space on uranium is OK: DISK OK [15:21:55] (03CR) 10Alexandros Kosiaris: [C: 032] Allow specifying journal_dir for rrdcached [puppet] - 10https://gerrit.wikimedia.org/r/172715 (owner: 10Alexandros Kosiaris) [15:23:07] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: puppet fail [15:26:07] (03PS1) 10Alexandros Kosiaris: role::ganglia::web fix invocation of rrdcached class [puppet] - 10https://gerrit.wikimedia.org/r/172747 [15:27:37] (03CR) 10Alexandros Kosiaris: [C: 032] role::ganglia::web fix invocation of rrdcached class [puppet] - 10https://gerrit.wikimedia.org/r/172747 (owner: 10Alexandros Kosiaris) [15:29:17] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:29:36] _joe_: ^ this should fix the uranium disk issues permanently [15:29:46] https://gerrit.wikimedia.org/r/172715 [15:31:38] <_joe_> akosiaris: nice :) [15:32:19] I was not expecting the journal to grow at a rate of 1,1G per 20mins [15:32:41] not that I had run the numbers anyway... [15:37:11] <_joe_> akosiaris: eheh [15:37:25] <_joe_> 1 gb/20 mins is _a_lot_ [15:42:51] (03CR) 10QChris: "> I think we should symlink /var/log/eventlogging ->" [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [15:48:19] I suppose I may as well SWAT today [15:50:09] gi11es, hoo, anomie: Ping for SWAT in 10 minutes [15:50:11] * anomie is here [15:50:19] anomie: pong [15:50:25] * hoo around [15:53:13] (03CR) 10Ottomata: "Hm, good point, so these logs are a data output of eventlogging. Hm." [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [16:00:04] manybubbles, anomie, ^d, marktraceur, gi11es: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141112T1600). Please do the needful. [16:00:09] * anomie starts SWAT [16:00:12] in meeting today! [16:00:19] thanks anomie! 
[16:00:19] hoo: I'll do yours first because it should be quick [16:00:33] (03PS2) 10Anomie: Add "featured portal" badge (Q17580674) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172580 (https://bugzilla.wikimedia.org/73193) (owner: 10Hoo man) [16:00:41] ok [16:00:42] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172580 (https://bugzilla.wikimedia.org/73193) (owner: 10Hoo man) [16:00:53] (03Merged) 10jenkins-bot: Add "featured portal" badge (Q17580674) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172580 (https://bugzilla.wikimedia.org/73193) (owner: 10Hoo man) [16:01:13] !log anomie Synchronized wmf-config/Wikibase.php: SWAT: Add "featured portal" badge (Q17580674) [[gerrit:172729]] (duration: 00m 10s) [16:01:14] hoo: ^ Test please [16:01:16] Logged the message, Master [16:01:23] gi11es: You're next [16:01:29] on that [16:03:16] anomie: Looks fine, thanks [16:03:32] (03PS2) 10Anomie: Set different ImageMetrics sampling factor for logged-in users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172720 (owner: 10Gilles) [16:03:39] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172720 (owner: 10Gilles) [16:03:47] (03Merged) 10jenkins-bot: Set different ImageMetrics sampling factor for logged-in users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172720 (owner: 10Gilles) [16:04:10] !log anomie Synchronized wmf-config: SWAT: Set different ImageMetrics sampling factor for logged-in users [[gerrit:172720]] (duration: 00m 12s) [16:04:11] gi11es: ^ Test please [16:04:11] Logged the message, Master [16:07:33] gi11es: Are you there? [16:08:17] PROBLEM - HHVM busy threads on mw1114 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [90.0] [16:11:10] hmm. Well, moving on then. 
[16:12:21] anomie: yeah I'm testing the change [16:14:28] RECOVERY - HHVM busy threads on mw1114 is OK: OK: Less than 1.00% above the threshold [60.0] [16:14:35] I see the value in eval.php on terbium but not in JS yet [16:14:46] (03CR) 10Nemo bis: Deploy BounceHandler extension to production (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: 10Legoktm) [16:16:01] ah, nevermind... it's because the code that sends it to JS is only on master right now :) [16:16:07] anomie: change deployed fine [16:16:47] gi11es: ok [16:17:54] (03CR) 10Nuria: [C: 031] Link aggregator dataset into wikimetrics public webspace [puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [16:19:36] PROBLEM - Disk space on db1017 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=74%): [16:21:53] yeah yeah [16:22:37] RECOVERY - Disk space on db1017 is OK: DISK OK [16:25:09] !log anomie Synchronized php-1.25wmf7/extensions/MultimediaViewer/: SWAT: Backport MediaViewer options menu layout fix [[gerrit:172737]] (duration: 00m 09s) [16:25:11] gi11es: ^ Test please [16:25:16] Logged the message, Master [16:25:33] anomie: testing [16:25:55] anomie: works fine, all clear [16:26:05] thanks! [16:26:11] anomie: Your turn [16:26:13] !log anomie Started scap: SWAT: SecurePoll fix for jump-text and title on create/edit [[gerrit:172718]] [16:26:16] Logged the message, Master [16:30:25] (03CR) 10Rush: [C: 04-1] "just -1'ing as we decided to wait on this for the moment, only to keep our changes to a minimum pre-migration." [puppet] - 10https://gerrit.wikimedia.org/r/168509 (owner: 1020after4) [16:40:20] paravoid: is lead another mail server? 
[16:40:34] (PS2) Giuseppe Lavagetto: hiera: allow regex-based searches [puppet] - https://gerrit.wikimedia.org/r/172552
[16:40:45] legoktm: yes
[16:40:54] and it's going to be replaced soonish as well
[16:41:11] with yet another one
[16:41:16] ok, what's its IP? (or where should I look to find it?)
[16:41:23] lead.wikimedia.org? :)
[16:41:43] both of those mailservers have IPv6 as well
[16:41:58] right now it doesn't matter I guess, since the exim config just hits appservers.svc.eqiad.wmnet, which is IPv4
[16:42:28] but if e.g. we switched it over to our frontends, then you'd see its IPv6 address instead (in XFF even!)
[16:42:38] so I have to say, why are we doing IP-based authentication here in the first place?
[16:43:19] what would you suggest instead?
[16:45:48] token-based maybe?
[16:45:55] using a header?
[16:48:26] !log anomie Finished scap: SWAT: SecurePoll fix for jump-text and title on create/edit [[gerrit:172718]] (duration: 22m 13s)
[16:48:29] anomie: ^ Test please
[16:48:29] Logged the message, Master
[16:49:26] anomie: d'oh, it didn't work. Forgot to actually pull the revision :(
[16:50:00] paravoid: how would that work on the MW side? I'm not really following your idea :/
[16:51:03] !log anomie Synchronized php-1.25wmf7/extensions/SecurePoll/: SWAT: SecurePoll fix for jump-text and title on create/edit [[gerrit:172718]] (for real this time) (duration: 00m 09s)
[16:51:06] Logged the message, Master
[16:51:09] anomie: ^ Try that. i18n might not be there though.
[16:51:31] anomie: ^ Yes, change is there except for i18n. :/
[16:51:44] legoktm: send a secret via curl...
and verify it in the API [16:52:03] would be defined as a private var for MediaWiki and probably be in puppet private for exim [16:52:07] ah [16:52:30] yeah, that would be easy to implement [16:53:09] (03PS1) 10Ottomata: Add README.md with Usage info [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/172763 [16:53:35] anomie: I think we'll just live with the i18n being wrong until this afternoon's train deploy. [16:53:41] (03PS2) 10Ottomata: Add README.md with Usage info [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/172763 [16:53:43] * anomie is more-or-less done with SWAT [16:54:15] (03CR) 10Ottomata: [C: 032 V: 032] Add README.md with Usage info [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/172763 (owner: 10Ottomata) [17:00:04] manybubbles, ^d: Dear anthropoid, the time has come. Please deploy Search (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141112T1700). [17:00:36] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:27] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 1 failures [17:03:24] (03PS1) 10Chad: dewiki gets cirrus as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172766 [17:05:11] (03CR) 10Manybubbles: [C: 031] dewiki gets cirrus as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172766 (owner: 10Chad) [17:05:13] (03CR) 10Chad: [C: 032] dewiki gets cirrus as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172766 (owner: 10Chad) [17:05:19] (03Merged) 10jenkins-bot: dewiki gets cirrus as default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172766 (owner: 10Chad) [17:05:41] <_joe_> manybubbles: \o/ [17:05:51] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s) [17:05:53] Logged the message, Master [17:05:54] <_joe_> ^d: you as well [17:06:01] <_joe_> congrats [17:06:05] _joe_: wait a few hours to see if people complain! 
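Editor's sketch, not part of the log: the 16:45 to 16:52 exchange proposes replacing IP-based authentication with a shared secret that the mail server sends in a request header and the API verifies. A minimal way to picture the receiving side is below. The header name `X-BounceHandler-Secret` and the function `is_authorized` are hypothetical, chosen only for illustration; nothing in the log says how (or whether) this was implemented.

```python
import hmac
from typing import Mapping

# Hypothetical header name; the log only says "using a header".
SECRET_HEADER = "X-BounceHandler-Secret"


def is_authorized(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Return True only if the request carries the configured shared secret."""
    # Refuse everything if no secret is configured, so an empty
    # configuration value cannot match an empty or missing header.
    if not expected_secret:
        return False
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(SECRET_HEADER.lower(), "")
    # compare_digest avoids leaking how many leading bytes matched via timing.
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_secret.encode("utf-8")
    )


if __name__ == "__main__":
    print(is_authorized({"x-bouncehandler-secret": "example"}, "example"))
```

As suggested in the conversation, the secret itself would live in private configuration on both sides (a private variable for MediaWiki, the private puppet repository for exim) rather than in code.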
[17:06:15] <^d> Dieses Wiki verwendet eine neue Suchmaschine. (Mehr erfahren) [17:06:16] <^d> :D [17:06:44] yay [17:07:06] Suchmaschine? that doesn't sound pleasant [17:07:14] <_joe_> manybubbles: yeah, just being able to pull the lever is an accomplishment in itself [17:07:19] yeah [17:07:21] <_joe_> manybubbles: "search engine" [17:07:24] yeah [17:07:59] I had two years of German more than ten years ago. So it *looks* familiar and sometimes if I squint I can guess and be almost right. [17:08:41] <_joe_> I never actually studied german [17:09:28] <_joe_> but the words there were of quite common use [17:10:24] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera: allow regex-based searches [puppet] - 10https://gerrit.wikimedia.org/r/172552 (owner: 10Giuseppe Lavagetto) [17:12:59] (03PS1) 10Giuseppe Lavagetto: hiera: use lookup_key, not key [puppet] - 10https://gerrit.wikimedia.org/r/172768 [17:13:21] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] hiera: use lookup_key, not key [puppet] - 10https://gerrit.wikimedia.org/r/172768 (owner: 10Giuseppe Lavagetto) [17:16:56] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:17:07] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [17:30:13] Is hashar the only maintainer for CI Jenkins? [17:31:14] I need to make a job non-voting, https://gerrit.wikimedia.org/r/#/c/172765/ [17:38:09] awight: you can add Timo as well [17:38:26] greg-g: thx! [17:38:27] they're both pretty responsive [17:38:38] filing a bug would also help :) [17:38:49] Krinkle|detached: hehe, when you are feeling less detached, I could use non-voting on this CI job: https://gerrit.wikimedia.org/r/#/c/172765/ [17:47:13] (03CR) 10GWicke: [C: 031] "LGTM." 
[puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [17:47:33] akosiaris: ping [17:47:56] gwicke: pong [17:48:19] hey; are you planning to review the restbase puppetization? [17:48:29] yup [17:48:35] cool [17:48:40] chasemp: has the day arrived where when I want a new procurement ticket I should just enter it directly in phab? Or is RT still the right place for that? [17:48:51] still RT [17:49:06] ok! [17:50:51] akosiaris: am happy to chat about it any time [17:51:13] gwicke: cool. Good to know, thanks! [17:53:26] Fatal log is full of "PHP Fatal error: Base lambda function for closure not found in /srv/mediawiki/php-1.25wmf7/extensions/Wikidata/extensions/Wikibase/lib/config/WikibaseLib.default.php on line 18" [17:53:38] question, for anyone.... [17:53:43] 1721 times in last hour [17:54:12] does bits.beta.wmflabs.org have its own varnish hosts? [17:54:56] <_joe_> bd808: from which servers? [17:55:10] _joe_: Trying to figure that out. [17:55:13] <_joe_> bd808: usually happens on API after a graceful, or scap [17:56:33] (03CR) 10GWicke: "This has been tested against a cassandra cluster set up with https://gerrit.wikimedia.org/r/#/c/166888/ in the services labs project. The " [puppet] - 10https://gerrit.wikimedia.org/r/167213 (owner: 10GWicke) [17:56:46] _joe_: Just from the top20 hosts I'd guess mw1195 mw1196 mw1205 mw1147 mw1141 [17:57:15] <_joe_> ok look how that matches http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=API%2520application%2520servers%2520eqiad&tab=m&vn=&hide-hf=false [17:57:31] <_joe_> those servers need a reload/restart of apache [17:57:43] <_joe_> bd808: on it [17:57:56] Thanks _joe_ [17:58:46] <_joe_> !log gracefulling apache on problematic API hosts [17:58:54] Logged the message, Master [18:11:24] <_joe_> bd808: errors should've stopped [18:13:05] _joe_: Cool. 
Looks like the error rate looks much better in last 5 mins [18:14:12] (03PS1) 10Yuvipanda: cache: Don't setup ganglia monitoring on labs [puppet] - 10https://gerrit.wikimedia.org/r/172776 (https://bugzilla.wikimedia.org/73263) [18:14:24] ^ this fixes some betalabs puppet errors that've been erroring for a month... [18:14:26] can someone +1? [18:14:41] bblack: ^ (since it touches some varnish code) [18:16:41] <_joe_> sigh [18:16:55] _joe_: It's realm branching, and I'm not fully sure how to do this with hiera [18:17:16] <_joe_> YuviPanda: well you'd have to refactor a few things [18:17:18] yeah [18:17:21] <_joe_> for now, it is good [18:17:31] argh.. realm branching in varnish too ? [18:17:38] (03CR) 10Reedy: Deploy BounceHandler extension to production (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: 10Legoktm) [18:17:52] I was more than happy to kill them in the ganglia_new module tbh... [18:18:07] (03CR) 10Reedy: [C: 031] "Yup" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [18:18:44] akosiaris: labs doesn't have ganglia, and the cache module includes them indiscriminately [18:19:02] _joe_: akosiaris another option is to include gmond in the labs base class, but then it's just running there uselessly, trying to send metrics that are never received... [18:19:35] <_joe_> YuviPanda: nah [18:19:40] I get the problem you are trying to solve... I don't really have a better solution ready, I just dislike all the labs branching [18:19:49] sigh, there's going to be a *lot* more branching... [18:19:49] <_joe_> YuviPanda: I can take a shot at that, but tomorrow please :) [18:19:57] lots of ganglia in the varnish code... [18:20:16] <_joe_> YuviPanda: so let's solve this once and for all in a beter way [18:21:02] What repo do integration/slave-scripts come from? 
[18:22:20] nvm, found it: integration-jenkins/bin/ [18:22:30] so puppet hasn't run in any of the cache nodes in betalabs for like a month now [18:22:31] damn, instance.pp has a dependency on an exec from inside the ganglia varnish class that's included in manifests/role/cache.pp.... [18:22:45] (CR) BBlack: [C: 1] cache: Don't setup ganglia monitoring on labs [puppet] - https://gerrit.wikimedia.org/r/172776 (https://bugzilla.wikimedia.org/73263) (owner: Yuvipanda) [18:22:46] <_joe_> YuviPanda: lol [18:23:04] YuviPanda: not surprised [18:23:07] etherpad just hiccuped [18:23:09] I'm not going to realm branch inside the varnish module... [18:23:17] Back pretty fast [18:23:23] marktraceur: not surprised as well [18:23:27] <_joe_> YuviPanda: ping me tomorrow [18:23:30] not the best software out there [18:23:32] _joe_: ok, will do. [18:23:34] <_joe_> we can work on this together [18:23:51] hey btw, I am fighting with the ganglia stuff as well [18:24:03] _joe_: I'll note on the bug. I'm sure there's plenty of other similar things going on that we'll uncover as time goes by. [18:24:14] I'll help too [18:24:16] * YuviPanda hopes ganglia in our infra dies in the next few years, replaced by graphite [18:24:20] akosiaris: :D will ping you too! [18:24:36] * akosiaris actually likes ganglia [18:24:41] well the gmond part [18:24:46] on that note, anyone wants to +1 https://gerrit.wikimedia.org/r/#/c/172420/? [18:24:48] and parts of gmetad [18:24:51] it's already running on the shinken. [18:24:53] host [18:25:12] <_joe_> YuviPanda: if we want that, we need to do a few things on both graphite AND grafana (our chosen frontend, apparently) [18:25:37] _joe_: yeah, true. I'm still not too much of a fan of grafana, tbh. feels more complex than necessary. [18:25:42] WHY use ES to store configs?! [18:25:47] *config [18:25:52] webscale!
[18:26:06] <_joe_> YuviPanda: no idea, I am frontend-agnostic [18:26:13] * YuviPanda quite likes nagf [18:26:22] <_joe_> no please :) [18:26:25] hehe :) [18:26:39] <_joe_> not that it's *bad*, it's just raw [18:26:52] well, it *could* be unraw, but NIH etc. [18:27:05] <_joe_> I'd really like to have time to work on monitoring [18:27:07] for a 3h hack it's pretty good. [18:27:21] <_joe_> I think it's awesome for its intended scope [18:27:33] (CR) BBlack: [C: 1] "Seems like either both or neither of the text and mobile lb destinations should be template args (vs hardcoded)? I'd probably just hardco" [dns] - https://gerrit.wikimedia.org/r/171769 (https://bugzilla.wikimedia.org/38799) (owner: Dzahn) [18:27:34] yeah, I perhaps should ignore the failures I notice from the shinken work and just continue with the shinken work... [18:29:12] (CR) Yuvipanda: [C: -2] "Ok, this just led us down a rabbit hole of too many things in the varnish code being intertwined with ganglia. Need to fix properly."
[puppet] - 10https://gerrit.wikimedia.org/r/172776 (https://bugzilla.wikimedia.org/73263) (owner: 10Yuvipanda) [18:29:36] (03CR) 10Giuseppe Lavagetto: [C: 031] shinken: Add basic service checks for all of labs [puppet] - 10https://gerrit.wikimedia.org/r/172420 (owner: 10Yuvipanda) [18:29:48] (03PS14) 10Yuvipanda: shinken: Add basic service checks for all of labs [puppet] - 10https://gerrit.wikimedia.org/r/172420 [18:30:40] (03PS1) 10Dzahn: remove nickel from network.pp [puppet] - 10https://gerrit.wikimedia.org/r/172778 [18:32:04] (03CR) 10Yuvipanda: [C: 032] shinken: Add basic service checks for all of labs [puppet] - 10https://gerrit.wikimedia.org/r/172420 (owner: 10Yuvipanda) [18:32:12] _joe_: ty [18:32:54] * _joe_ off [18:38:22] !log turned off yurik's zerosms cronjob on stat1002 (already discussed with him, he was ok with it being stopped until he could find time to fix it) [18:38:25] Logged the message, Master [18:41:14] (03PS1) 10Cscott: Give parsoid-roots access to ruthenium and xenon. [puppet] - 10https://gerrit.wikimedia.org/r/172780 [18:46:44] (03CR) 10Nuria: [C: 031] Retain 90 days of EventLogging logs [puppet] - 10https://gerrit.wikimedia.org/r/172707 (https://bugzilla.wikimedia.org/69029) (owner: 10QChris) [18:52:12] (03PS1) 10Anomie: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 [19:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141112T1900). 
[19:01:09] (03CR) 10Anomie: Add api-feature-usage to logstash (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [19:01:55] "Anthropoid" [19:02:52] http://en.wiktionary.org/wiki/anthropoid#Noun [19:03:15] (03CR) 10BryanDavis: "Cherry-picked to deployment-salt for testing in beta" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [19:04:04] guillom: it actually picks a random one from multiple terms [19:04:16] !log installing package upgrades on iron [19:04:18] I like it :) [19:04:20] Logged the message, Master [19:05:32] !log installing package upgrades on bast1001 (incl. PHP version) [19:05:35] Logged the message, Master [19:08:27] PROBLEM - DPKG on bast1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:08:32] Configuration file `/etc/php5/conf.d/fss.ini' ==> Modified (by you or by a script) since installation. [19:08:49] +; Enable FastStringSearch extension [19:08:51] yes / no ? [19:09:19] installs package maintainers version [19:10:30] mutante: bast1001? [19:10:36] RECOVERY - DPKG on bast1001 is OK: All packages OK [19:10:42] It shouldn't be needed on bast1001 [19:10:45] but shouldn't do any harm [19:10:46] akosiaris: can i do this then? re: status of nickel https://gerrit.wikimedia.org/r/#/c/172778/1/manifests/network.pp [19:11:20] Reedy: yes, bast1001. 
the package upgrade enabled it now, since i let it install the maintainer's version [19:12:42] replace php5-fss 1.0-1 (using .../php5-fss_1.0-2_amd64.deb [19:12:48] ah, yeah [19:12:55] it fixes the old comment style [19:12:59] perfectly fine [19:13:07] ok:) [19:18:54] (03PS1) 10Reedy: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172789 [19:18:56] (03PS1) 10Reedy: testwiki to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172790 [19:18:58] (03PS1) 10Reedy: wikipedias to 1.25wmf7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172791 [19:19:00] (03PS1) 10Reedy: group0 to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172792 [19:19:12] (03CR) 10Reedy: [C: 032] Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172789 (owner: 10Reedy) [19:19:20] (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172790 (owner: 10Reedy) [19:19:23] (03Merged) 10jenkins-bot: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172789 (owner: 10Reedy) [19:19:27] (03Merged) 10jenkins-bot: testwiki to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172790 (owner: 10Reedy) [19:19:42] !log reedy Started scap: testwiki to 1.25wmf8 and build l10n cache [19:19:47] Logged the message, Master [19:19:52] Reedy: was my branch update to the release tool too late? [19:19:58] * aude can update submodule [19:20:04] aude: I merged one at least... [19:20:14] hmmm, ok [19:20:29] https://gerrit.wikimedia.org/r/172774 [19:20:33] ah, yes [19:21:04] didn't see the ping in irc from gerrit bot [19:21:22] maybe it was broken [19:32:41] (03Abandoned) 10Dzahn: gerrit: configure sshd to not listen on gerrit IP [puppet] - 10https://gerrit.wikimedia.org/r/172476 (owner: 10Dzahn) [19:37:16] !log Made myself oauthadmin on mediawikiwiki [19:37:20] Logged the message, Master [19:40:16] csteipp: ^ Ok with you if I keep it on this time? 
[19:40:49] (the right) [19:42:13] (03PS1) 10Dzahn: ssh server: make listening port configurable [puppet] - 10https://gerrit.wikimedia.org/r/172799 [19:47:51] (03PS2) 10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [19:50:58] (03CR) 10BryanDavis: "Cherry-picked patch set #2 to deployment-salt for testing in beta" [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [19:57:18] hashar, YuviPanda: do either of you know anything about the gerrit/github mirroring? how it's set up, what credentials it uses, etc? [19:57:33] hmm, you need ^d or qchris [19:57:55] ^d, qchris, ^ [19:57:55] <^d> It uses secret credsss [19:58:10] are they available in puppet? [19:58:20] <^d> For what? [19:58:28] presumably the password isn't plain text in the public repo ;) [19:58:30] i wrote https://github.com/cscott/npm-travis yesterday which lets you trigger travis jobs from jenkins. [19:59:00] but it needs to be able to push to a temporary branch on github to trigger travis. presumably the mirroring already has access to creds which let it do that. [19:59:18] i'm wondering if they are accessible from jenkins [19:59:57] otherwise i can set up a new github user with its own forks of all the relevant projects, and configure travis against that user. but that seems a bit of a waste. [20:00:00] <^d> Ok, so we've got a role account @ github. [20:00:12] <^d> Gerrit's public key is added to that user. [20:00:37] (03PS1) 10Yuvipanda: shinken: Don't say 'icinga' in shinken notification emails [puppet] - 10https://gerrit.wikimedia.org/r/172802 [20:00:58] can a jenkins job get at gerrit's public key? [20:01:18] <^d> Yeah, jenkins slaves already have it. 
[20:01:23] <^d> See gerrit::replicationdest [20:01:27] <^d> in puppet [20:01:44] ok, cool, let me read through that and i'll come back w/ more questions [20:01:49] (03PS3) 10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [20:01:52] <^d> That's just the public key though. [20:02:02] <^d> To be able to push from jenkins you'd need the private key. [20:02:11] <^d> Which...doesn't exist off of gerrit host afaik. [20:02:14] <^d> (shouldn't) [20:02:14] <^d> :) [20:02:30] heh [20:02:30] <^d> We might be able to setup another key pair. [20:02:43] yeah, should have another keypair with more limited access, I guess [20:03:26] yeah, that sounds more or less correct. [20:03:52] there's a github user that corresponds to gerrit? is that right? [20:04:00] yeah, wmfgerrit I think [20:04:02] (03PS2) 10Yuvipanda: shinken: Don't say 'icinga' in shinken notification emails [puppet] - 10https://gerrit.wikimedia.org/r/172802 [20:05:19] <^d> cscott: Yes, wmfgerrit is right [20:05:25] so i can maybe create a wmftravis user for these pushes [20:05:29] (03PS3) 10Yuvipanda: shinken: Don't say 'icinga' in shinken notification emails [puppet] - 10https://gerrit.wikimedia.org/r/172802 [20:05:54] <^d> cscott: That could work, yeah. [20:06:15] it looks like wmfgerrit is a member of the 'owners' team, so they should be able to push to every repo. i'll probably manually add wmftravis to the particular repos i'm experimenting with travis integration with, to start. [20:06:49] how does the gerrit->git sync actually work? is there any chance that gerrit will wipe out a temporary branch i create if the sync happens at just the wrong time? 
[20:07:24] (03PS1) 10Dzahn: ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) [20:08:13] (03CR) 10Ori.livneh: [C: 031] Move Eventlogging logs underneath /srv, which has more free space (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [20:08:17] <^d> cscott: I don't think it'll delete remote branches it doesn't recognize. [20:08:25] <^d> It deletes remote branches when they're deleted on gerrit. [20:08:47] (03Abandoned) 10Ori.livneh: add auditd module; add auditd rules for keyholder [puppet] - 10https://gerrit.wikimedia.org/r/165862 (owner: 10Ori.livneh) [20:08:51] ok, well i guess we'll try it and see! [20:08:54] <^d> I lie :( [20:08:59] <^d> If true, replication will remove remote branches that are absent locally or invisible to the replication (for example read access denied via authGroup option). [20:09:26] hm, how often does sync occur? [20:09:38] <^d> Whenever a repo's refs/* change. [20:09:47] (03PS3) 10Ori.livneh: tox env to build test coverage [debs/pybal] - 10https://gerrit.wikimedia.org/r/172243 (owner: 10Hashar) [20:09:54] (03CR) 10Ori.livneh: [C: 032] tox env to build test coverage [debs/pybal] - 10https://gerrit.wikimedia.org/r/172243 (owner: 10Hashar) [20:10:11] (03Merged) 10jenkins-bot: tox env to build test coverage [debs/pybal] - 10https://gerrit.wikimedia.org/r/172243 (owner: 10Hashar) [20:10:14] so maybe i could just push the temporary branch to the gerrit repo, and let the sync take care of the github side, assuming this doesn't cause hella delay [20:11:34] <^d> Hmm, should work. [20:12:57] then i'd need to create a gerrit wmftravis user, with appropriate push permissions. 
[20:13:06] (03PS1) 10Dzahn: ssh server: make PermitRootLogin configurable [puppet] - 10https://gerrit.wikimedia.org/r/172804 [20:13:08] (03PS1) 10Ori.livneh: Tests for `pybal.monitor` [debs/pybal] - 10https://gerrit.wikimedia.org/r/172805 [20:13:28] <^d> cscott: This is getting complicated! :( [20:13:39] !log reedy Finished scap: testwiki to 1.25wmf8 and build l10n cache (duration: 53m 57s) [20:13:43] Logged the message, Master [20:13:47] that was slow :( [20:13:58] well, i wouldn't need to make the github user any more, so it's the same complexity just placed differently [20:14:06] let me experiment though [20:14:12] (03CR) 10Ori.livneh: "Thanks for the feedback! I'll definitely try to develop it even further." [puppet] - 10https://gerrit.wikimedia.org/r/165779 (owner: 10Ori.livneh) [20:14:17] (03PS4) 10Ori.livneh: add `keyholder` module for managing a shared ssh-agent [puppet] - 10https://gerrit.wikimedia.org/r/165779 [20:14:24] (03CR) 10Ori.livneh: [C: 032 V: 032] add `keyholder` module for managing a shared ssh-agent [puppet] - 10https://gerrit.wikimedia.org/r/165779 (owner: 10Ori.livneh) [20:14:40] (03CR) 10Reedy: [C: 032] wikipedias to 1.25wmf7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172791 (owner: 10Reedy) [20:14:47] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/172799" [puppet] - 10https://gerrit.wikimedia.org/r/172476 (owner: 10Dzahn) [20:15:14] (03CR) 10Alexandros Kosiaris: [C: 032] "LGTM but why not removing everything concerning nickel in one batch ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/172778 (owner: 10Dzahn) [20:15:27] (03Merged) 10jenkins-bot: wikipedias to 1.25wmf7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172791 (owner: 10Reedy) [20:16:37] (03CR) 10Dzahn: "i wasn't sure if that should be done when it is reclaimed for something else, but yes if the name will change anyways" [puppet] - 10https://gerrit.wikimedia.org/r/172778 (owner: 10Dzahn) [20:20:11] (03PS2) 10Dzahn: remove nickel from network.pp [puppet] - 10https://gerrit.wikimedia.org/r/172778 [20:20:22] (03CR) 10Dzahn: [C: 032] remove nickel from network.pp [puppet] - 10https://gerrit.wikimedia.org/r/172778 (owner: 10Dzahn) [20:20:55] (03PS1) 10Ori.livneh: role::deployment: retab [puppet] - 10https://gerrit.wikimedia.org/r/172809 [20:21:10] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf7 [20:21:15] Logged the message, Master [20:21:32] (03PS2) 10Ori.livneh: role::deployment: retab [puppet] - 10https://gerrit.wikimedia.org/r/172809 [20:22:42] (03CR) 10Reedy: [C: 032] group0 to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172792 (owner: 10Reedy) [20:22:46] (03CR) 10Ori.livneh: [C: 032] role::deployment: retab [puppet] - 10https://gerrit.wikimedia.org/r/172809 (owner: 10Ori.livneh) [20:22:59] (03Merged) 10jenkins-bot: group0 to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172792 (owner: 10Reedy) [20:24:09] (03PS5) 10Reedy: Add SecurePoll specific dblist and allow SecurePoll to use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [20:24:21] (03CR) 10Reedy: [C: 032] Add SecurePoll specific dblist and allow SecurePoll to use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [20:24:29] (03Merged) 10jenkins-bot: Add SecurePoll specific dblist and allow SecurePoll to use [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [20:24:54] (03PS2) 10Reedy: (bug 73197) allow admins to give patroller right on Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172663 (owner: 10Matanya) [20:24:59] (03CR) 10Reedy: [C: 032] (bug 73197) allow admins to give patroller right on Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172663 (owner: 10Matanya) [20:25:07] (03Merged) 10jenkins-bot: (bug 73197) allow admins to give patroller right on Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172663 (owner: 10Matanya) [20:25:21] !log restarting elastic1021 to pick up new plugins [20:25:26] Logged the message, Master [20:25:30] ^demon|lunch: ^^^ I'm not really being quick about this [20:25:52] (03PS2) 10Reedy: wgCopyUploadsDomains configuration for Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172557 (https://bugzilla.wikimedia.org/73045) (owner: 10Dereckson) [20:26:32] ^demon|lunch: hey, it works! 
https://travis-ci.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/builds/40816032 [20:26:56] adds about 15s latency, not too bad though [20:27:22] (03CR) 10Reedy: [C: 032] wgCopyUploadsDomains configuration for Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172557 (https://bugzilla.wikimedia.org/73045) (owner: 10Dereckson) [20:27:30] (03Merged) 10jenkins-bot: wgCopyUploadsDomains configuration for Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172557 (https://bugzilla.wikimedia.org/73045) (owner: 10Dereckson) [20:28:02] !log reedy Synchronized database lists: (no message) (duration: 00m 15s) [20:28:05] Logged the message, Master [20:28:24] (03PS1) 10Ori.livneh: role::deployment: apply keyholder class [puppet] - 10https://gerrit.wikimedia.org/r/172812 [20:28:28] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 15s) [20:28:31] Logged the message, Master [20:29:05] (03PS2) 10Ori.livneh: role::deployment: apply keyholder class [puppet] - 10https://gerrit.wikimedia.org/r/172812 [20:29:22] (03CR) 10Ori.livneh: [C: 032 V: 032] role::deployment: apply keyholder class [puppet] - 10https://gerrit.wikimedia.org/r/172812 (owner: 10Ori.livneh) [20:29:27] <^demon|lunch> manybubbles: I can pick it up after lunch. [20:29:33] <^demon|lunch> cscott: \o/ [20:29:48] (03CR) 10Dzahn: "@ John F. Lewis: It actually already has it:" [dns] - 10https://gerrit.wikimedia.org/r/172452 (https://bugzilla.wikimedia.org/71262) (owner: 10Dzahn) [20:30:34] so i just need to make a new gerrit user, and then give the jenkins jobs access to that user's credentials, securely. [20:34:27] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [20:35:31] (03PS1) 10Ori.livneh: keyholder: provision /run/keyholder [puppet] - 10https://gerrit.wikimedia.org/r/172815 [20:35:40] ^demon|lunch: do you have gerrit admin bits? who can make a new gerrit user for me? 
[20:35:51] (03CR) 10Ori.livneh: [C: 032 V: 032] keyholder: provision /run/keyholder [puppet] - 10https://gerrit.wikimedia.org/r/172815 (owner: 10Ori.livneh) [20:36:37] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:37:41] (03PS1) 10Ori.livneh: keyholder: require_package('python3') [puppet] - 10https://gerrit.wikimedia.org/r/172816 [20:37:52] (03CR) 10Ori.livneh: [C: 032 V: 032] keyholder: require_package('python3') [puppet] - 10https://gerrit.wikimedia.org/r/172816 (owner: 10Ori.livneh) [20:45:26] (03CR) 10Dzahn: "yep, nickel is going to keep the name nickel, so we weren't supposed to remove everything" [puppet] - 10https://gerrit.wikimedia.org/r/172778 (owner: 10Dzahn) [20:45:34] <^demon|lunch> cscott: Just create in wikitech. [20:45:36] <^demon|lunch> Same user store. [20:46:14] (03PS1) 10Dzahn: remove nickel's public IP [dns] - 10https://gerrit.wikimedia.org/r/172819 [20:49:05] (03CR) 10Dzahn: [C: 04-1] "but for this it needs to be removed from puppet as "nickel.wikimedia.org" and renamed/re-added as "nickel.eqiad.wmnet"" [dns] - 10https://gerrit.wikimedia.org/r/172819 (owner: 10Dzahn) [20:53:53] (03PS1) 10Dzahn: nickel: remove ganglia, re-add in eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/172862 [20:54:22] Reedy: did test.wikidata / test2 get wmf8 yet or still doing that? [20:54:59] !log Restarting Jenkins, deadlock on deployment-bastion [20:55:06] Logged the message, Master [20:55:29] aude: According to SAL nothing happened in these regards [20:55:30] yet [20:55:30] aude: all done as of half an hour or so ago [20:55:55] wait... what? [20:55:57] Special:version says otherwise :/ [20:56:28] did I not do a final ync? 
[20:56:30] sync [20:56:41] not yet [20:56:46] Jenkins is restarting [20:56:46] 20:21:09 rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf7 [20:56:47] ffs [20:56:51] hoo@tin:~$ grep testwikidata /srv/mediawiki-staging/wikiversions.json [20:56:51] "testwikidatawiki": "php-1.25wmf8", [20:56:52] mh [20:57:12] hoo@tin:~$ grep testwikidata /srv/mediawiki/wikiversions.json [20:57:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf8 [20:57:13] "testwikidatawiki": "php-1.25wmf7", [20:57:14] ah, yep [20:57:15] Logged the message, Master [20:57:18] you forget that [20:57:22] * forgot [20:57:57] thanks :) [20:59:19] (03CR) 10Dzahn: [C: 04-1] "needs assignment of internal IP and wmnet entry" [puppet] - 10https://gerrit.wikimedia.org/r/172862 (owner: 10Dzahn) [20:59:53] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/172862/" [dns] - 10https://gerrit.wikimedia.org/r/172819 (owner: 10Dzahn) [21:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141112T2100). [21:02:07] PROBLEM - DPKG on analytics1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:02:39] ^^ that's me [21:05:15] (03CR) 10Dzahn: "so... 
see these:" [dns] - 10https://gerrit.wikimedia.org/r/172452 (https://bugzilla.wikimedia.org/71262) (owner: 10Dzahn) [21:05:16] RECOVERY - DPKG on analytics1011 is OK: All packages OK [21:06:04] (03CR) 10Dzahn: "follow-up to add the matching forward entry:" [dns] - 10https://gerrit.wikimedia.org/r/107854 (owner: 10Jeremyb) [21:06:22] (03CR) 10Dzahn: "follow-up to add it to DNS" [puppet] - 10https://gerrit.wikimedia.org/r/94111 (owner: 10Jeremyb) [21:06:36] (03PS4) 10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [21:11:14] (03CR) 10Dereckson: "Depends instead of Icb4cbadb2d3766869dbc5310121b69fc9e450bf2" [puppet] - 10https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: 10Dereckson) [21:15:25] (03CR) 10Dzahn: [C: 032] add IPv6 record for iodine (OTRS) [dns] - 10https://gerrit.wikimedia.org/r/172452 (https://bugzilla.wikimedia.org/71262) (owner: 10Dzahn) [21:22:04] (03PS2) 10QChris: Move Eventlogging logs underneath /srv, which has more free space [puppet] - 10https://gerrit.wikimedia.org/r/172706 [21:22:06] (03PS1) 10QChris: Link EventLogging logs into /var/log/eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/172884 [21:23:49] (03CR) 10QChris: "This depends on" [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [21:23:55] (03CR) 10QChris: [C: 04-1] Link EventLogging logs into /var/log/eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [21:26:25] (03PS5) 10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [21:30:10] (03CR) 10QChris: Move Eventlogging logs underneath /srv, which has more free space (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [21:30:37] (03PS2) 10QChris: Retain 90 days of EventLogging logs [puppet] - 10https://gerrit.wikimedia.org/r/172707 (https://bugzilla.wikimedia.org/69029) [21:31:07] (03PS6) 
10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [21:31:11] (03CR) 10Ottomata: [C: 032] Move Eventlogging logs underneath /srv, which has more free space [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [21:31:33] (03CR) 10Ottomata: Move Eventlogging logs underneath /srv, which has more free space [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [21:34:16] (03CR) 10QChris: Move Eventlogging logs underneath /srv, which has more free space (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172706 (owner: 10QChris) [21:38:30] (03PS7) 10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [21:48:31] test2wiki Wikibase\Client\Usage\Sql\SqlUsageTracker::queryUsagesForPage 10.64.16.27 1146 Table 'test2wiki.wbc_entity_usage' doesn't exist [21:48:37] Reedy: hrm [21:48:53] sweet [21:48:57] aude: hoo ^ [21:49:04] New table? [21:49:32] (03CR) 10Yuvipanda: "So /var anything is terrible on labs, since they are all tiny (~2G, including for logs (by default)). Caused /var to fill up on a couple o" [puppet] - 10https://gerrit.wikimedia.org/r/171206 (owner: 10Ori.livneh) [21:49:48] * StupidPanda vaguely pokes ori with ^ which is already merged [21:50:17] ahm [21:50:21] that was merged? [21:50:27] apparently :/ [21:50:37] bblack: ^^ [21:50:42] (the puppet change, that is) [21:50:53] omg, table is not tob e used yet [21:51:02] we might need a setting change [21:51:05] (03CR) 10BryanDavis: "Stripping backslashes still not working as expected." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [21:51:19] I didn't follow that change [21:51:24] is it setting protected? [21:51:36] it is [21:51:42] Great [21:51:56] might have been changed to default, use the table [21:51:58] so we need the setting [21:53:26] aude: Are you on it or shall I do? 
[21:54:22] doing [21:54:52] (03PS1) 10Hoo man: Set useLegacyUsageIndex = true for Wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172894 [21:55:00] (03PS1) 10Aude: Set useLegacyUsageIndex to true for Wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172895 [21:55:17] aaaa [21:55:41] Surprisingly similar commit summaries [21:56:09] hoo's patch is fine [21:56:12] * hoo shouldn't go full screen terminal [21:56:20] (03Abandoned) 10Aude: Set useLegacyUsageIndex to true for Wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172895 (owner: 10Aude) [21:56:22] I mean not in front of HexCaht [21:56:45] heh [21:57:11] Reedy: ^ [21:58:25] (03CR) 10Reedy: [C: 032] Set useLegacyUsageIndex = true for Wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172894 (owner: 10Hoo man) [21:58:33] (03Merged) 10jenkins-bot: Set useLegacyUsageIndex = true for Wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172894 (owner: 10Hoo man) [21:58:50] greg-g: i'm going to postpone today's parsoid deploy until tomorrow, if that's all right [21:59:06] !log reedy Synchronized wmf-config/: Set useLegacyUsageIndex = true for Wikibase client (duration: 00m 17s) [21:59:10] Logged the message, Master [21:59:13] cscott: sure thing [21:59:19] cscott: edit the wiki page plz :) [21:59:56] it looks like we've already got a thursday window. ;) [22:00:04] yurik: Respected human, time to deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141112T2200). Please do the needful. [22:02:47] dammit, shinken can no longer seem to send emails [22:05:36] ^demon|lunch: what about gerrit groups? how are those created? [22:15:37] ^demon|lunch: I finished up with elastic1022. want to work on the rest? 
[22:22:31] cscott: admins can create Gerrit groups [22:23:16] (03PS2) 10QChris: Add jobs for aggregating hourly projectcount files to daily per wiki csvs [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) [22:23:25] cscott: on Gerrit I have People > Create New Group. That needs admin apparently: https://gerrit.wikimedia.org/r/#/admin/create-group/ [22:24:24] cscott: if all fails fill a bug :] I am off [22:25:09] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: puppet fail [22:26:20] hashar: who are gerrit admins? [22:28:52] (03PS3) 10QChris: Add jobs for aggregating hourly projectcount files to daily per wiki csvs [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) [22:29:40] (03PS1) 10Ottomata: Add Adam Baso to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/172898 [22:30:01] (03PS2) 10Ottomata: Add Adam Baso to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/172898 [22:30:17] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 298 seconds ago with 0 failures [22:31:09] (03CR) 10QChris: Add jobs for aggregating hourly projectcount files to daily per wiki csvs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [22:31:54] (03CR) 10Ottomata: [C: 032] Add Adam Baso to researchers group [puppet] - 10https://gerrit.wikimedia.org/r/172898 (owner: 10Ottomata) [22:31:57] cscott: https://gerrit.wikimedia.org/r/#/admin/groups/1,members [22:34:03] (03PS4) 10QChris: Add jobs for aggregating hourly projectcount files to daily per wiki csvs [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) [22:35:23] hashar: https://bugzilla.wikimedia.org/show_bug.cgi?id=73334 [22:36:18] cscott: Gerrit issue aside, you might want to write the travis integration as a RFC :-D [22:36:35] cscott: that might be reused to trigger CI jobs from 
Phabricator on Travis and thus overhaul our CI infrastructure [22:36:41] well, i want to experiment with it first to see if it's feasible [22:37:15] cscott: you might be able to trigger a Travis build using the change reference [22:37:24] if it seems to work for mw-ocg-latexer and gwicke's cassandra projects (restbase, etc) then i'm happy to try to turn my evil hack into a production service. ;) [22:37:50] +1 :D [22:38:32] cscott: in Gerrit the ref for patch 123,42 would be: refs/changes/23/123/42 (the last two digits of the change number are used for namespacing) [22:40:10] yes, but those aren't synced to github [22:40:26] and i'm not sure whether it's a good idea to do so [22:40:46] but one option is to convert gerrit refs into github PRs, and then run travis on the PRs. [22:40:47] cscott: there might be an entry point in Travis API to trigger a build given a ref / commit [22:40:58] (03PS1) 10Dzahn: uranium: add IPv6 entries [dns] - 10https://gerrit.wikimedia.org/r/172899 [22:40:58] hashar: nope, i looked and there is not. [22:41:17] hashar: i can trigger a build on a branch, tag, or PR. [22:41:34] throwaway branches seem to work fine at the moment. [22:41:53] i did think hard about using PRs, and that might be a long-term option, but it opens up a bunch of issues. [22:43:22] cscott: well github sends some notification to Travis, so there must be a way [22:43:34] cscott: could even make zuul send the notification :D [22:46:28] hashar: travis is hardwired to use github as the backend storage. you can't run a build unless the commit in question is in github. [22:46:47] cscott: :-/ [22:47:29] anyway time to get to bed *wave* [22:47:30] hashar: it's not really a problem. you don't even have to use the canonical repos. you could push to forked repos of a special 'wmftravis' github user. [22:47:47] the push is just an efficient way to name the collection of files you want tested.
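The ref layout hashar describes above (refs/changes/23/123/42 for change 123, patch set 42) can be sketched in a few lines. This is a minimal illustration only: the function name `gerrit_change_ref` is hypothetical, and the zero-padding of change numbers below 10 is an assumption based on how Gerrit shards refs by the last two digits.

```python
def gerrit_change_ref(change: int, patchset: int) -> str:
    """Build the Git ref name for one patch set of a Gerrit change.

    Layout: refs/changes/<last two digits of change>/<change>/<patchset>.
    The two-digit shard is zero-padded (assumption for changes < 10).
    """
    if change < 1 or patchset < 1:
        raise ValueError("change and patchset numbers start at 1")
    shard = change % 100  # last two digits of the change number
    return f"refs/changes/{shard:02d}/{change}/{patchset}"


# The example from the channel: patch 123,42
print(gerrit_change_ref(123, 42))
```

A throwaway branch for Travis, as cscott describes, could then be created by fetching that ref from Gerrit and pushing it to a GitHub fork under a temporary branch name.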
[22:50:53] qchris: i had to use the account name 'npmtravis' since 'wmftravis' violated some account-naming rule. [22:53:16] cscott: I added npmtravis. [22:53:31] qchris: thanks! [22:53:38] yw [22:59:41] (03PS8) 10BryanDavis: Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [23:02:05] (03CR) 10Anomie: [C: 031] Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [23:05:09] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: puppet fail [23:08:24] (03PS5) 10QChris: Add jobs for aggregating hourly projectcount files to daily per wiki csvs [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) [23:10:02] (03CR) 10QChris: "More code review happened in IRC" [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [23:10:09] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [23:14:58] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:16:56] (03CR) 10Ottomata: [C: 032] Add jobs for aggregating hourly projectcount files to daily per wiki csvs [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [23:17:17] (03PS2) 10Dzahn: uranium: add IPv6 entries [dns] - 10https://gerrit.wikimedia.org/r/172899 [23:18:21] (03CR) 10Dzahn: [C: 032] uranium: add IPv6 entries [dns] - 10https://gerrit.wikimedia.org/r/172899 (owner: 10Dzahn) [23:19:01] ottomata: you forgot to merge? 
https://gerrit.wikimedia.org/r/#/c/172285/ [23:19:41] no, didn't forget to merge [23:19:51] the python code isn't merged yet [23:19:53] (03CR) 10BryanDavis: [C: 031] "Cherry-picked to deployment-salt and working at " [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [23:20:00] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:20:01] qchris just wanted it to be approved and ready for when it happens [23:20:05] so someone else could feel good about merging [23:20:06] (03CR) 10Dzahn: "ganglia.wikimedia.org is an alias for uranium.wikimedia.org." [dns] - 10https://gerrit.wikimedia.org/r/172899 (owner: 10Dzahn) [23:20:31] (03CR) 10Ottomata: "This is ready to merge once the dependent aggregator repository change is merged." [puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [23:20:35] ottomata: ah, ok. shouldn't have been +2'd, I think... [23:20:50] jgage: Tested and ready for +2 logstash patch at https://gerrit.wikimedia.org/r/#/c/172781/ :) [23:21:16] cool, ok. *looks* [23:21:34] no? [23:21:39] hard to say! [23:21:49] +1 says "LGTM but someone else must approve" [23:22:00] but i give it my 100% stamp of approval, and would merge it now if the other change was ready :) [23:22:07] ideally, if something shouldn't be merged until X happens, it should actually be -2'd :) [23:22:18] I'll let qchris -2 it then :) [23:22:23] :) [23:22:29] I was just confused going through open patchsets :) [23:22:49] we need to bring this up at allhands or something and resolve it once and for all [23:23:00] (the way to use +1 and +2 and to merge/submit and all that) [23:23:12] (03CR) 10QChris: [C: 04-1] "Waiting for the corresponding changes in aggregator to get merged." [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [23:23:19] YuviPanda: Better?
;-) [23:23:23] :D yeah [23:23:28] (03CR) 10Gage: [C: 032] Add api-feature-usage to logstash [puppet] - 10https://gerrit.wikimedia.org/r/172781 (owner: 10Anomie) [23:23:33] mutante: heh, 'once and for all' [23:24:16] bd808, running puppet agent on logstash100* now.. [23:24:29] YuviPanda: +1 ? [23:24:35] mutante: ? [23:25:00] people use it differently [23:25:01] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:25:16] i'm not sayin which is right, i'm just saying we do it differently [23:25:44] for example, some people use +2 without submit, some tell you not to [23:26:17] and i didn't want to start the IRC discussion yet another time [23:26:25] because we just repeat ourselves [23:26:34] bd808: ok, your change is live on logstash1001-1003 [23:27:04] mutante: heh, true. ops definitely use gerrit differently [23:27:10] (03CR) 10TTO: "Sorry, I was too hasty. I didn't notice the change was in CommonSettings. If it has been in InitialiseSettings my -1 would have been more " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: 10Jalexander) [23:28:16] YuviPanda: some people want you to _never_ merge changes they uploaded, some people want wiki-style collab, some ask for permission to upload a new PS on somebody else's change and so on.. [23:28:33] :) +2 + merge definitely means 'I will babysit this change' [23:29:06] +2 != submit [23:29:21] mutante and I have meditated on this many times :) [23:29:24] well, that's confusing, I think. 
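The "+1 says LGTM but someone else must approve" and "+2 != submit" points can be made concrete with a small model of how a review label gates submission. This is a sketch under assumptions: it imitates what Gerrit documents as the MaxWithBlock label function (the default for Code-Review, as far as I know), where votes are never summed. The function name `label_allows_submit` is hypothetical, and a project may configure different rules.

```python
def label_allows_submit(votes, min_value=-2, max_value=2):
    """Model a MaxWithBlock-style review label.

    - Any vote at the minimum value (-2) blocks submission.
    - At least one vote at the maximum value (+2) is required.
    - Votes are not added together, so +1 and +1 never make a +2.
    """
    if any(v == min_value for v in votes):
        return False  # a single veto blocks the change
    return any(v == max_value for v in votes)


print(label_allows_submit([1, 1]))   # two +1 votes
print(label_allows_submit([2, -1]))  # a +2 with a non-blocking -1
print(label_allows_submit([2, -2]))  # a +2 with a veto
```

Even when this check passes, the change is only eligible for submission; someone still has to submit it, which is the distinction the channel is arguing about.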
[23:29:33] you guys should write this up in a medium that's not an IRC channel :0 [23:29:34] :) [23:29:39] clearly we need a +3 [23:29:50] +4 [23:29:53] hehe [23:29:54] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:29:54] +42 [23:30:01] do you know how confused I was for a week [23:30:03] that two +1's [23:30:06] is not a +2 [23:30:09] that wtf'ed me to no end [23:30:18] check_gerrit_rules on gerrit is CRITICAL: CRITICAL: Too many people use gerrit in too many ways [23:30:40] useradd -G youguys YuviPanda [23:30:52] (03PS1) 10Ori.livneh: Groundwork work for keyholder-based MediaWiki deployments [puppet] - 10https://gerrit.wikimedia.org/r/172911 [23:30:58] jgage: It's working. \o/ thanks. [23:31:02] yay [23:31:24] you can have a million +1 and still never get it merged :) [23:31:31] +1 is cheap [23:31:36] * YuviPanda merges mutante and chasemp [23:31:52] I don't +2 anything I don't feel fully comfortable babysitting [23:32:00] the mysql_wmf autolayout patch, for example [23:32:03] lol [23:32:08] the full authoritative from mark I got on this is that a +1 from an ops person [23:32:14] makes something merge-worthy [23:32:19] +1 [23:32:22] :D [23:32:22] (03CR) 10Ori.livneh: [C: 032] Groundwork work for keyholder-based MediaWiki deployments [puppet] - 10https://gerrit.wikimedia.org/r/172911 (owner: 10Ori.livneh) [23:32:26] the rub is some people only want to merge their own changes (me) [23:32:34] and some people think all changes should be mergeable by all [23:32:48] mainly I am just too high strung to not merge my own stuff [23:33:30] i think that leads to less actual reviewing [23:33:56] everybody will be tempted to just self-merge their own stuff and rarely look at something else [23:35:02] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:35:09] then you get heartbleed [23:35:24] but the bigger problem is to get reviews without having to beg on IRC which introduces a need for realtime-ness [23:35:52] we
could have merge duty like RT duty [23:35:56] but nobody would like that i suspect [23:36:32] i think we just need better mail filters [23:36:50] so that people dont get all the "gerrit spam" but still the ones where you have been personally added as reviewer [23:37:07] making jenkins bot not email unless it's a -1 or a -2 would be nice [23:37:14] or.. get in the habit of looking at web ui queue more often [23:37:31] * YuviPanda does that instead [23:37:41] or couple tickets and review tighter and make it part of a dashboard that is included in normal work flow :) [23:37:56] hehe 'normal workflow' [23:37:59] it would also be nice to have a bot trigger like !reviewers which adds some random people *g*:) [23:39:22] or invite everyone in ops to review changes from !ops for equal review opportunity [23:39:39] chasemp: i have been pasting a ton of gerrit links into RT... that doesnt do anything [23:40:01] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:40:14] agree w/ jgage and also sure mutante in theory that's a good plan [23:40:22] on that note, anyone wanna +1 https://gerrit.wikimedia.org/r/#/c/172802/? [23:40:28] I think in practice rt is so divorced from the rest of things it doesn't work out [23:40:29] no, make a ticket :) [23:40:31] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:54] indeed, RT isn't even part of my workflow [23:40:56] what do you mean by divorced? [23:41:01] like, people don't use it [23:41:38] YuviPanda: could you make an RT for shinken? [23:41:47] I've been using phabricator [23:41:51] hopefully the switch to phab will be a good opportunity to reset our workflows [23:42:30] it may be naive but I'm also hopeful [23:43:22] YuviPanda: "lover_name" ? nice [23:43:42] wha? [23:43:43] !ops usually calls IRC channel operators. [23:44:03] yes Carmela, how may we assist you in your kickban requirements today ? [23:44:11] mutante: it does say that in the email... 
[23:44:23] NotASpy: :-) [23:45:01] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:45:08] mutante: we also have a lot more 'kill' in our repo than 'love' :) [23:45:19] * YuviPanda moves to a hippie commune [23:45:42] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [23:46:52] mutante: shinken doesn't actually seem to deliver emails anymore tho :( I need to see what's up with that [23:47:11] so why are shinken commands in module called nagios [23:47:39] where are shinken commands? [23:47:45] it's called nagios_common :) [23:47:50] in that change you just linked [23:47:51] and shinken and icinga are nagios compatible implementations [23:47:57] that re-use the same notification commands [23:48:04] which are defined in nagios format, which shinken and icinga also use :) [23:48:29] there's nothing shinken specific there [23:49:57] (03PS1) 10Rush: phab email pipe cleanup and allow maint [puppet] - 10https://gerrit.wikimedia.org/r/172915 [23:50:04] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:50:11] PROBLEM - check_puppetrun on samarium is CRITICAL: CRITICAL: puppet fail [23:50:12] PROBLEM - check_puppetrun on silicon is CRITICAL: CRITICAL: puppet fail [23:50:13] PROBLEM - check_puppetrun on payments1003 is CRITICAL: CRITICAL: puppet fail [23:50:26] (03PS2) 10Rush: phab email pipe cleanup and allow maint [puppet] - 10https://gerrit.wikimedia.org/r/172915 [23:50:31] (03PS3) 10Rush: phab email pipe cleanup and allow maint [puppet] - 10https://gerrit.wikimedia.org/r/172915 [23:51:13] (03CR) 10jenkins-bot: [V: 04-1] phab email pipe cleanup and allow maint [puppet] - 10https://gerrit.wikimedia.org/r/172915 (owner: 10Rush) [23:51:49] (03CR) 10Dzahn: [C: 031] "http://unicodeheart.com/" [puppet] - 10https://gerrit.wikimedia.org/r/172802 (owner: 10Yuvipanda) [23:52:00] mutante: :D [23:52:13] YuviPanda: ^ i remmeber when notpeter added the hearts :) 
[23:52:49] 😻😻😻 [23:53:00] mutante: :D hearts good [23:53:01] (03PS1) 10Yuvipanda: icinga: Move ircecho code into module [puppet] - 10https://gerrit.wikimedia.org/r/172916 [23:53:03] mutante: also ^? :) [23:53:06] (03PS1) 10Filippo Giunchedi: swift: expand txstats complete configuration [puppet] - 10https://gerrit.wikimedia.org/r/172917 [23:53:12] (03PS4) 10Yuvipanda: shinken: Don't say 'icinga' in shinken notification emails [puppet] - 10https://gerrit.wikimedia.org/r/172802 [23:53:13] re, the puppet fails on fundraising.. eh.. that should have been fixed a litle while ago [23:53:14] the things you don't discover by chance at midnight ^ [23:53:35] damn, it's almost 6... [23:53:58] nooo. ircecho .. arg [23:54:02] YuviPanda: am right? you might need some sleep [23:54:06] yeah. [23:54:07] I should. [23:54:15] I do quite like my new sleep cycle tho [23:54:22] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: expand txstats complete configuration [puppet] - 10https://gerrit.wikimedia.org/r/172917 (owner: 10Filippo Giunchedi) [23:54:34] PDT hours? [23:54:50] 4AM - 12 noon, work for 3h in the afternoon, then go hang out with friends in the evening, then another 6-7h after 11pm [23:55:02] (03CR) 10Yuvipanda: [C: 032] shinken: Don't say 'icinga' in shinken notification emails [puppet] - 10https://gerrit.wikimedia.org/r/172802 (owner: 10Yuvipanda) [23:55:03] PROBLEM - check_puppetrun on backup4001 is CRITICAL: CRITICAL: puppet fail [23:55:13] godog: not intentionally, at least. [23:55:21] PROBLEM - check_puppetrun on thulium is CRITICAL: CRITICAL: puppet fail [23:55:22] PROBLEM - check_puppetrun on samarium is CRITICAL: CRITICAL: puppet fail [23:55:23] PROBLEM - check_puppetrun on silicon is CRITICAL: CRITICAL: puppet fail [23:55:24] RECOVERY - check_puppetrun on payments1003 is OK: OK: Puppet is currently enabled, last run 300 seconds ago with 0 failures [23:55:34] mutante: yeah, ircecho. 
we don't have anything else atm, sadly [23:55:50] oh, we do, we have too many :) [23:55:53] mutante: I should spend some time writing a simple redis based service that does all IRC filtering, sends metrics / events somewhere, logs things as well... [23:56:05] and then all bots in prod just relay through that [23:56:06] YuviPanda: oh ok, so basically no mornings [23:56:09] every time somebody wants an IRC bot we have to use something different :) [23:56:32] godog: true :) that's ok. I don't remember waking up before 10AM for quite a while... [23:57:02] YuviPanda: i should make you a member of the bots labs project [23:57:02] heheh [23:57:12] mutante: prod / labs split, etc :) [23:57:17] NOOO [23:58:22] plus putting it on labs would mean having an auth mechanism, etc. [23:58:32] I'm not going to replace ircecho when I'm setting up shinken :) [23:58:44] replace one thing after another [23:59:10] well, I'll write it in the 4weeks of free time I get with every week I have :) [23:59:53] mutante: so... +1? :)