[00:58:59] (03CR) 10Ori.livneh: "Petrb: is a working example that subscribes to the rcstream feed and writes each incom" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (owner: 10Ori.livneh) [01:12:29] (03PS7) 10Ori.livneh: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 [01:17:16] (03PS8) 10Ori.livneh: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/16599) [02:13:01] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3792 MB (3% inode=99%): [02:14:22] !log LocalisationUpdate completed (1.24wmf2) at 2014-05-05 02:13:18+00:00 [02:14:31] Logged the message, Master [02:22:01] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3434 MB (3% inode=99%): [02:26:26] (03PS9) 10Ori.livneh: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/16599) [02:26:37] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-05 02:25:34+00:00 [02:26:43] Logged the message, Master [02:32:22] <^demon|away> !log antimony: ran very very aggressive repacking on mediawiki/core, operations/puppet, mediawiki/extensions/{UploadWizard,CentralAuth,CentralNotice,DonationInterface,FlaggedRevs,AbuseFilter,BlueSpiceExtensions,Translate,WikimediaMessages,EducationProgram,UniversalLanguageSelector,Wikibase}, pywikibot/{core,compat}, operations/dumps/tests. Basically anything taking up >90MB on disk. Probably not the cause of gitb [02:32:22] <^demon|away> lit's wonkiness but they're certainly not helping matters. [02:32:29] Logged the message, Master [02:32:50] <^demon|away> !log [gitb]lit's wonkiness but they're certainly not helping matters. 
[02:32:56] Logged the message, Master [02:40:19] (03PS6) 10Springle: MariaDB multi-source replication monitoring. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131015 [02:41:42] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [02:42:41] (03CR) 10Springle: [C: 032] MariaDB multi-source replication monitoring. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131015 (owner: 10Springle) [02:47:31] (03PS10) 10Krinkle: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/16599) (owner: 10Ori.livneh) [02:48:53] (03PS11) 10Krinkle: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/16599) (owner: 10Ori.livneh) [02:50:36] (03PS12) 10Krinkle: Add 'rcstream' module for broadcasting recent changes over WebSockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [02:51:27] (03CR) 10Krinkle: "Referring to bug 14045 instead of bug 16599 (tracker bug), and removed duplicate RFC (one is marked as superseded by the other and already" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131040 (https://bugzilla.wikimedia.org/14045) (owner: 10Ori.livneh) [02:56:44] (03PS1) 10Springle: Use correct replication channel name s/m1/m2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131414 [02:58:30] (03CR) 10Springle: [C: 032] Use correct replication channel name s/m1/m2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131414 (owner: 10Springle) [03:01:00] RECOVERY - Disk space on virt0 is OK: DISK OK [03:12:22] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon May 5 03:11:15 UTC 2014 (duration 11m 14s) [03:12:28] Logged the message, Master [03:34:23] (03PS1) 10Springle: Remove db68 from 
site.pp. To be rebuilt. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131416 [03:36:09] (03CR) 10Springle: [C: 032] Remove db68 from site.pp. To be rebuilt. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131416 (owner: 10Springle) [04:10:55] (03PS1) 10Springle: Use m2-master CNAME to make DB rotations neater. This allows a master switch to be a DNS change plus a simple port 3306 tcp redirect with socat until TTL. Should also help if we switch to a haproxy configuration in the future. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131418 [04:13:22] (03CR) 10Springle: "Is this likely to work with consumers after TTL and at least one DB reconnect, or are they going to need an explicit restart if CNAME chan" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131418 (owner: 10Springle) [04:19:59] (03PS1) 10Springle: Use m2-master CNAME to make DB rotations neater. This allows a master switch to be a DNS change plus a simple port 3306 tcp redirect with socat until TTL. Should also help if we switch to a haproxy configuration in the future. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131419 [04:20:36] (03CR) 10Springle: "Is this likely to work with gerrit after TTL and at least one DB reconnect, or will it need an explicit restart if CNAME changes?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131419 (owner: 10Springle) [04:25:50] (03PS1) 10Springle: Use m2-master CNAME to make DB master rotations neater. This allows a master switch to be a DNS change plus a simple port 3306 tcp redirect with socat until TTL. Should also help if we switch to a haproxy configuration in the future. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131420 [04:28:32] (03CR) 10Springle: "Does this sound ok? The scholarships app doesn't appear to use persistent connections (great!) 
so I'm expecting the DNS/socat process woul" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131420 (owner: 10Springle) [04:37:24] (03PS1) 10Springle: Useful to have one slave CNAME for m1, m2, and x1 shards [operations/dns] - 10https://gerrit.wikimedia.org/r/131421 [04:43:10] (03CR) 10Springle: "Want these to put in puppet in a few places rather than actual db host names. Trying to think of any catches that could arise some day?" [operations/dns] - 10https://gerrit.wikimedia.org/r/131421 (owner: 10Springle) [05:42:43] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [06:29:23] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4018 MB (3% inode=95%): [06:30:23] RECOVERY - Disk space on vanadium is OK: DISK OK [07:52:23] (03PS1) 10Hashar: Retab role/gerrit.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/131432 [07:52:26] (03PS1) 10Hashar: Remove a level of indentation from role/gerrit.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/131433 [07:52:27] (03PS1) 10Hashar: role::gerrit::labs::ci for CI project on labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131434 [07:54:13] <_joe_> /win 27 [07:54:19] <_joe_> oh I hate this [07:54:24] you won! [07:58:58] (03Abandoned) 10Hashar: role::gerrit::labs::ci for CI project on labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131434 (owner: 10Hashar) [08:43:43] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [09:13:46] (03PS1) 10Hashar: Adjust role::zuul::labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131438 [09:20:13] (03PS5) 10Hashar: contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 [09:20:31] (03CR) 10Hashar: "Pass $git_dir to zuul::merger since it needs it for git repack." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 (owner: 10Hashar) [09:29:10] (03PS6) 10Hashar: contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 [09:37:28] (03PS2) 10Hashar: Adjust role::zuul::labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131438 [10:56:45] (03PS2) 10Giuseppe Lavagetto: Move cluster definition to the node level. [operations/puppet] - 10https://gerrit.wikimedia.org/r/130591 [11:44:35] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [11:45:07] (03PS1) 10Gage: initial debianization [operations/debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 [12:25:20] (03CR) 10Gage: "* Uses git-buildpackage and pristine-tar" [operations/debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [12:39:52] (03PS3) 10Nemo bis: We have APC, let's use it in $wgMainCacheType [operations/puppet] - 10https://gerrit.wikimedia.org/r/119102 [12:43:25] (03PS1) 10Alexandros Kosiaris: WIP: Remove various sdtpa leftover DNS records [operations/dns] - 10https://gerrit.wikimedia.org/r/131453 [12:45:19] !log removing various sdtpa devices from LibreNMS [12:45:26] Logged the message, Master [12:49:17] akosiaris: there are now three such patchsets [12:50:24] :-( [12:50:48] the other two: https://gerrit.wikimedia.org/r/#/c/130573/ and https://gerrit.wikimedia.org/r/#/c/131348/ [12:50:49] oh and [12:51:06] lol [12:51:07] https://gerrit.wikimedia.org/r/#/c/131347/ [12:51:20] choose any, don't care, let's jut merge before there's a fourth :-D [12:52:24] in other news ff29 is weirding me out a bit (where's the ___ button? oh it's morphed into _that_ thing...) [12:53:30] Yeah, I heard they did major redesign. 
[12:54:11] I installed the "classic theme restorer" extension [12:54:15] and all is well [12:54:26] I'm going to try to stick it out for a little while [13:06:52] I couldn't stand the tabs on the top [13:09:09] <_joe_> I love it [13:09:11] <_joe_> :) [13:27:00] manybubbles|away: I'm wondering if it's now safe to change your UID or if you still have running jobs… ping me when you get in? [13:27:56] (03PS1) 10Andrew Bogott: Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131462 [13:27:58] (03CR) 10jenkins-bot: [V: 04-1] Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131462 (owner: 10Andrew Bogott) [13:34:36] (03PS1) 10Andrew Bogott: Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131463 [13:48:59] (03CR) 10Faidon Liambotis: [C: 032] Remove cr1-sdtpa/csw1-sdtpa, fix crosslinks DNS [operations/dns] - 10https://gerrit.wikimedia.org/r/131347 (owner: 10Faidon Liambotis) [13:49:03] (03CR) 10Faidon Liambotis: [C: 032] Kill DNS for sdtpa mgmt hosts (asw, cam, ps, scs) [operations/dns] - 10https://gerrit.wikimedia.org/r/131348 (owner: 10Faidon Liambotis) [13:49:21] apergos: merged mine, as it was fixing (and not removing) crosslinks [13:49:41] ok, off to abandon [13:50:39] (03Abandoned) 10ArielGlenn: sdtpa is gone, remove all entries to devices, links, etc [operations/dns] - 10https://gerrit.wikimedia.org/r/130573 (owner: 10ArielGlenn) [13:51:04] (03Abandoned) 10Faidon Liambotis: WIP: Remove various sdtpa leftover DNS records [operations/dns] - 10https://gerrit.wikimedia.org/r/131453 (owner: 10Alexandros Kosiaris) [13:53:15] (03PS1) 10Andrew Bogott: Rename aaharoni to amire80 to match labs. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/131464 [13:53:37] (03Abandoned) 10Andrew Bogott: Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131462 (owner: 10Andrew Bogott) [13:56:41] (03CR) 10Andrew Bogott: [C: 032] Rename aaharoni to amire80 to match labs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131464 (owner: 10Andrew Bogott) [13:56:49] (03PS1) 10Andrew Bogott: Rename gdubuc to 'gilles' to match labs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131465 [13:57:42] (03PS2) 10Andrew Bogott: Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131463 [13:59:12] (03CR) 10Andrew Bogott: [C: 032] Rename gdubuc to 'gilles' to match labs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131465 (owner: 10Andrew Bogott) [13:59:46] (03PS3) 10Andrew Bogott: Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131463 [14:01:44] (03CR) 10Andrew Bogott: [C: 032] Update dartar's UID to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131463 (owner: 10Andrew Bogott) [14:30:09] andrewbogott_afk: I'm now totally logged out - sorry, wasn't paying attention to pings for a while there [14:30:50] (03PS1) 10Alexandros Kosiaris: torrus: csw2-esams in accessswitches, not corerouters [operations/puppet] - 10https://gerrit.wikimedia.org/r/131473 [14:36:02] aharoni: For the SWAT, please prepare gerrit changes that actually backport the changes to the relevant deployment branches (wmf/1.24wmf2 and/or wmf/1.24wmf3), and link them from the Deployments page. The "Cherry Pick To" button in Gerrit is helpful for this, although note for extensions you'll additionally need to prepare a revision updating the submodule in the appropriate mediawiki/core branch. Thanks. [14:43:19] anomie: is https://bugzilla.wikimedia.org/show_bug.cgi?id=64727 going in this SWAT ? [14:43:50] matanya: Planning on it, why? 
[14:44:21] my bot is not running for a week now, and people are complaining [14:45:12] a week == 3 days [14:45:36] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [14:45:50] anomie: so, if this gets in it would help me a lot. thanks [14:47:06] matanya: Ok. (: I thought you were asking because you saw some problem with it from an ops perspective. [14:47:33] well, i'm not an op, so no :) [14:54:14] manybubbles: I'll do the SWAT today [14:55:06] hi anomie [14:55:17] (03CR) 10BryanDavis: [C: 031] "Scholarships does actually use persistent connections (see https://github.com/wikimedia/wikimedia-wikimania-scholarships/blob/master/src/W" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131420 (owner: 10Springle) [14:55:27] hi aharoni [14:56:00] looking at the SWAT stuff [14:56:46] back in a couple of minutes [15:00:29] * anomie is beginning the SWAT deploy, with his own backport [15:00:41] anomie: sweet [15:01:02] (03PS3) 10BBlack: Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130634 [15:01:45] I guess I should wait for SWAT before trying to merge in the above [15:01:56] (or if it's easy to include it, feel free) [15:03:44] anomie: back [15:03:56] had to reconnect [15:04:22] so, do I have to do the "Cherry pick to" thing? [15:05:43] to wmf/1.24wmf3 ? [15:09:31] anomie: ^ [15:09:40] manybubbles: ok if I move you right now? [15:10:23] aharoni: It'd be helpful, yes. I can do it for you in a few minutes, but for the future it saves time to do it yourself so I don't have to ask which version you're wanting to backport to. [15:10:38] !log anomie synchronized php-1.24wmf2/includes/api/ApiLogin.php 'SWAT: Backport change 131056 to 1.24wmf2 to fix bug 64727' [15:10:43] anomie: I should learn. 
[15:10:45] Logged the message, Master [15:10:52] only wmf/1.24wmf3 , or also wmf/1.24wmf2 / [15:10:54] (03PS1) 10Andrew Bogott: Move Nik Everett/manybubbles to UID 3304 to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131477 [15:10:55] ? [15:11:33] aharoni: It depends if it needs to go to all the wikis or just the ones on 1.24wmf3 already. [15:11:59] !log anomie synchronized php-1.24wmf3/includes/api/ApiLogin.php 'SWAT: Backport change 131056 to 1.24wmf3 to fix bug 64727' [15:12:05] Logged the message, Master [15:12:07] matanya: Bug fix is deployed now, hopefully that makes your bot work [15:12:33] * anomie has verified that the bug fix fixed the bug it was supposed to fix [15:12:53] (03PS2) 10Andrew Bogott: Move Nik Everett/manybubbles to UID 3304 to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131477 [15:13:01] my guess is that only wmf3 [15:13:25] aharoni: I'm ready to deploy as soon as you have the patches ready [15:15:16] one day... if I have another lifetime left in me, i'm rewriting half the plugins of jenkins... [15:16:07] (03CR) 10Andrew Bogott: [C: 032] Move Nik Everett/manybubbles to UID 3304 to match labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/131477 (owner: 10Andrew Bogott) [15:16:41] OK, preparing [15:17:36] anomie: does this look OK - https://gerrit.wikimedia.org/r/#/c/131479/ ? [15:18:06] * anomie +2s [15:18:19] anomie - the other one: https://gerrit.wikimedia.org/r/#/c/131480/ [15:18:42] aharoni: if the bug is only present in one wmf branch, you only need one backport [15:19:03] that was my question in my email on that thread, in fact :) (if this was only wmf3) [15:19:47] mutante: you around? Can you give me a hand with an RT access question? [15:19:58] aharoni: The second one looks good too, although the backport also needs the instructions at https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Updating_the_submodule followed. I can do it, but if you want to learn how you're welcome to. 
[15:20:25] anomie: I'll look. I should know such things at least a bit. [15:24:23] !log anomie synchronized php-1.24wmf3/includes/specials/SpecialAllmessages.php 'SWAT: Backport change 131041 to 1.24wmf3 to fix bug in Special:AllMessages' [15:24:28] aharoni: The Special:AllMessages change is deployed now, please verify that things are now working [15:24:29] Logged the message, Master [15:24:38] thanks [15:24:55] anomie: checking. [15:25:02] in the meantime, `git submodule update --init --recursive` takes some time... [15:26:05] :) a bit [15:26:30] anomie: Special:Allmessages doesn't seem to work. [15:26:43] Caching issue, or something broken? [15:26:54] aharoni: First, which wiki are you testing on? [15:27:22] English, Hebrew [15:27:25] should it be test? [15:27:52] aharoni: IIRC, only mediawiki.org and the testwikis are on wmf3. Check Special:Version. [15:28:14] If you need it backported to wmf2 as well, we have time to do that. [15:29:18] Yes, only mediawiki.org and the testwikis are on wmf3. Non-Wikipedias go tomorrow, and Wikipedias on Thursday. [15:29:58] anomie: it's broken on enwiki. [15:30:03] so wmf2? [15:30:25] yes, looks like it in Special:Version [15:30:32] (I know so little about deployment.) [15:30:34] aharoni: Yes, we'll need to do enwiki as well. Same cherry-pick process, just to the 1.24wmf2 branch [15:31:13] OK [15:31:42] anomie: https://gerrit.wikimedia.org/r/#/c/131481/ [15:32:06] anomie: https://gerrit.wikimedia.org/r/#/c/131482/ [15:32:07] aharoni: +2ed [15:32:35] (and git submodule is still running for that MobileFrontend thing on wmf3) [15:36:13] git submodule done, going on... 
[15:37:47] !log anomie synchronized php-1.24wmf2/includes/specials/SpecialAllmessages.php 'SWAT: Backport change 131041 to 1.24wmf2 to fix bug in Special:AllMessages' [15:37:54] Logged the message, Master [15:37:59] anomie: about the MobileFrontend change: [15:38:01] aharoni: enwiki has the Special:AllMessages fix now [15:38:17] on wmf3 the Git procedure didn't seem to do anything. [15:38:31] git review said error: failed to push some refs to 'ssh://amire80@gerrit.wikimedia.org:29418/mediawiki/core.git' [15:39:49] aharoni: No details on why it failed? [15:40:21] pretty straightforward [15:40:25] remote: Processing changes: refs: 1, done [15:40:27] To ssh://amire80@gerrit.wikimedia.org:29418/mediawiki/core.git [15:40:28] ! [remote rejected] HEAD -> refs/publish/wmf/1.24wmf3 (no new changes) [15:40:30] error: failed to push some refs to 'ssh://amire80@gerrit.wikimedia.org:29418/mediawiki/core.git' [15:41:15] aharoni: There you go, "no new changes". Did you commit your change to the extension before trying to update the submodule ref in core? [15:42:06] git commit -a -m "Update MyCoolExtension" [15:42:20] do I have to run `git commit -a -m "Update MyCoolExtension"` in the core directory or in the extension directory? [15:42:36] aharoni: I see that https://gerrit.wikimedia.org/r/#/c/131480/ isn't merged, so the git fetch and git checkout in the extension directory didn't do anything. [15:43:36] right. [15:43:50] are you supposed to merge it first? [15:44:07] You need to for the submodule update to work right [15:45:06] I'm not sure that I understand. [15:45:15] I cannot merge https://gerrit.wikimedia.org/r/#/c/131480/ . [15:45:17] no +2 right [15:45:28] aharoni: Ah, ok. I'll take care of that then. [15:45:41] thanks [15:46:29] and Special:Allmessages is still broken on English and Hebrew Wikipedia. [15:48:51] aharoni: Ugh. That would be my fault, I forgot one step in the process. 
:( [15:49:52] !log anomie synchronized php-1.24wmf2/includes/specials/SpecialAllmessages.php 'SWAT: Backport change 131041 to 1.24wmf2 to fix bug in Special:AllMessages' [15:49:57] \o/ seems to work [15:50:06] Good! [15:50:22] Now, we've got about 10 minutes to deploy and verify your other backport. [15:50:31] OK, is there anything more I can do to help with it? [15:50:36] (I'm here, in any case.) [15:50:56] Well, are you still wanting to try preparing the submodule update for wmf3? Or should I just do it? [15:51:32] * andre__ also has a question for anomie but this looks more urgent, so better not interrupting here [15:51:42] uno momento por favor [15:51:48] anomie: I will have the same problem with +2 [15:52:10] aharoni: I'd +2 it anyway, I was just wondering if you still wanted to put the change in Gerrit [15:52:21] Oh. [15:52:23] * anomie will do the backport to wmf2 in the mean time [15:52:24] a sec [15:54:27] now it seems to work [15:55:17] anomie: does this make sense? - https://gerrit.wikimedia.org/r/#/c/131486/ [15:56:13] aharoni: Yes [15:58:22] aharoni: Doing the wmf2 one now, please check as soon as the bot says it's done [15:58:28] Then I'll do the wmf3 version [15:59:03] !log anomie synchronized php-1.24wmf2/extensions/MobileFrontend/ 'SWAT: Backport change 131237 to 1.24wmf2 to fix bug in MobileFrontend' [15:59:10] Logged the message, Master [15:59:25] * anomie is doing the wmf3 MobileFrontend sync now [16:00:25] !log anomie synchronized php-1.24wmf3/extensions/MobileFrontend/ 'SWAT: Backport change 131237 to 1.24wmf3 to fix bug in MobileFrontend' [16:00:31] Logged the message, Master [16:00:53] * anomie is done with this morning's SWAT [16:01:10] aharoni: let us know when you've confirmed the fix [16:03:20] Sorry! This site is experiencing technical difficulties. [16:03:24] Try waiting a few minutes and reloading. [16:03:26] (Cannot contact the database server: Too many connections (10.64.16.10)) [16:04:03] twkozlowski: ? [16:04:05] which wiki? 
[16:04:08] wfm [16:04:18] * aude assumes wikidata [16:04:28] greg-g: the MobileFrontend change looks good [16:04:36] aharoni: great, thanks [16:04:37] Wikidata [16:04:48] twkozlowski: any particular page? [16:04:57] https://www.wikidata.org/wiki/Q9381500#sitelinks-wikipedia [16:05:12] Too many connections, it seems. [16:05:13] :-D [16:05:17] wfm [16:08:26] wfm2, only saying cause you like to know when things break [16:08:52] twkozlowski: I do I do :) [16:09:03] if it happens again, do let us know [16:10:13] oh right, gold coast for them this week [16:11:59] damn, I missed it again :/ [16:34:24] (03PS1) 10Giuseppe Lavagetto: Adding ability to compute change-based diffs. [operations/software] - 10https://gerrit.wikimedia.org/r/131495 [16:57:28] (03PS1) 10Alexandros Kosiaris: Move torrus to netmon1001 [operations/dns] - 10https://gerrit.wikimedia.org/r/131499 [17:00:13] (03PS1) 10BryanDavis: Configure git submodules to rebase on update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131500 (https://bugzilla.wikimedia.org/61562) [17:01:21] <^d> <3 [17:01:26] <^d> bd808: ^ [17:01:45] * bd808 basks in ^d's adoration :) [17:02:29] (03CR) 10BryanDavis: Configure git submodules to rebase on update (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131500 (https://bugzilla.wikimedia.org/61562) (owner: 10BryanDavis) [17:02:31] (03CR) 10Chad: [C: 032] Configure git submodules to rebase on update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131500 (https://bugzilla.wikimedia.org/61562) (owner: 10BryanDavis) [17:02:38] (03Merged) 10jenkins-bot: Configure git submodules to rebase on update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131500 (https://bugzilla.wikimedia.org/61562) (owner: 10BryanDavis) [17:02:48] (03CR) 10Aaron Schulz: [C: 032] Revert "Increased htmlCacheUpdate throttle" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131408 (owner: 10Aaron Schulz) [17:02:57] 
(03Merged) 10jenkins-bot: Revert "Increased htmlCacheUpdate throttle" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131408 (owner: 10Aaron Schulz) [17:03:09] <^d> bd808: We'll test it this week's deploy. [17:03:29] <^d> In theory, could test it on currently deployed branches too. [17:03:31] <^d> Let's do that. [17:03:44] Oh yeah. You're deploying this week [17:04:28] <^d> Can we clean up 1.23wmf*s now? [17:04:58] On tin or in gerrit? [17:05:14] !log aaron synchronized wmf-config/CommonSettings.php 'Revert "Increased htmlCacheUpdate throttle"' [17:05:21] Logged the message, Master [17:05:34] <^d> The latter definitely, can turn them to tags. [17:05:36] On tin we need to keep branches for ~30 days after they are rolled off the 'pedias to support varnish caches [17:05:56] * bd808 goes to dig up the email where he analyzed that [17:06:06] <^d> Is that automated yet? [17:06:09] <^d> Cleaning up old ones? [17:06:45] There's a cleanup script, but it's not hooked into anything. I was doing it manually during the Tuesday deploy window [17:09:51] <^d> Ah ok. [17:11:07] ^d: I wrote some emails about the process in early March. They went to the old core-l, ops-l and engineering. [17:13:20] <^d> bd808: Updated currently deployed branches wmf[23] [17:13:35] <^d> With the submodule rebase thingie. [17:13:44] oh cool [17:45:49] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [18:02:51] ori: is gdash broken? [18:21:43] (03CR) 10Umherirrender: [C: 031] Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: 10Withoutaname) [18:41:45] (03PS1) 10Chad: Remove $wmgCirrusIsBuilding. 
Cirrus is always on everywhere now [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131513 [18:57:40] (03PS4) 10BBlack: Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130634 [18:58:06] ^ going to push this through tin in a minute, if no objections [18:58:21] (to the timing of the deploy, I mean!) [18:59:25] (03CR) 10BBlack: [C: 032 V: 032] Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130634 (owner: 10BBlack) [18:59:30] :) [18:59:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1585.733276 [18:59:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1427.366699 [19:00:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1336.033325 [19:00:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1084.266724 [19:01:03] ?? % [19:01:06] ^ even [19:01:19] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1315.866699 [19:01:45] bblack: have you done a mw-config deploy before? [19:01:47] !log bblack updated /a/common to {{Gerrit|I5a2d86ef0}}: Use whole subnets in squid.php list for XFF acceptance [19:01:48] (03PS5) 10Dzahn: full sudo ALL for jkrauska on pmacct (rhenium) [operations/puppet] - 10https://gerrit.wikimedia.org/r/130589 [19:01:49] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1172.400024 [19:01:49] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1231.133301 [19:01:54] Logged the message, Master [19:01:55] paravoid: kinda, I'm following the wiki! 
[19:02:06] :) [19:02:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1163.766724 [19:02:53] what's with the kafka stuff? [19:02:59] should be pretty accurate, ask if you have any questions [19:03:00] who knows :) [19:03:42] I basically followed this: https://wikitech.wikimedia.org/wiki/How_to_do_a_configuration_change#In_your_own_repo_via_gerrit [19:03:45] !log bblack synchronized wmf-config/squid.php 'Update wgSquidServersNoPurge to use whole subnets for XFF checking' [19:03:49] seems to be working fine so far [19:03:51] Logged the message, Master [19:04:03] (03CR) 10Dzahn: [C: 032] full sudo ALL for jkrauska on pmacct (rhenium) [operations/puppet] - 10https://gerrit.wikimedia.org/r/130589 (owner: 10Dzahn) [19:04:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:04:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:04:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:04:48] bblack: nope! 
[19:04:58] bblack: revert [19:05:10] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=cpu_report is bad enough [19:05:16] but http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=API+application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=cpu_report is really bad [19:05:26] yep, revert [19:05:36] (03PS1) 10BBlack: Revert "Use whole subnets in squid.php list for XFF acceptance" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131519 [19:05:43] heh :/ [19:05:50] (03CR) 10BBlack: [C: 032 V: 032] Revert "Use whole subnets in squid.php list for XFF acceptance" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131519 (owner: 10BBlack) [19:05:57] (03CR) 10Dzahn: "Admins::Pmacct/Sudo_user[jkrauska]/File[/etc/sudoers.d/jkrauska]/ensure...." [operations/puppet] - 10https://gerrit.wikimedia.org/r/130589 (owner: 10Dzahn) [19:05:59] (03Merged) 10jenkins-bot: Revert "Use whole subnets in squid.php list for XFF acceptance" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131519 (owner: 10BBlack) [19:05:59] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:06:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:06:24] bd808: ^^^ fwiw :) [19:06:27] yeah I saw the CPU too [19:06:39] blerg [19:06:49] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:07:14] fyi - http://blog.wikimedia.org/2014/05/05/wikimedia-foundation-selects-cyrusone-in-dallas-as-new-data-center/ [19:07:16] !log bblack updated /a/common to {{Gerrit|Iaf4d57d54}}: Revert "Use whole subnets in squid.php list for XFF acceptance" [19:07:21] the merge step takes forever :P [19:07:23] Logged the message, Master [19:07:35] paravoid: So the existing CIDR handling in core it too slow to use this way I guess 
[19:07:39] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:07:50] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:08:00] I could trim the CIDR list too, there are several currently-unused subnets in it [19:08:01] Eloquence: \o/ [19:08:13] but... checking an IP against a few CIDR ranges shouldn't be hard to do efficiently :P [19:08:31] bblack: even if you cut it by half it'd still be pretty bad [19:08:33] (modulo php being php and all that) [19:08:39] yeah you would know wouldn't you :P [19:08:46] !log bblack synchronized wmf-config/squid.php 'REVERT: Update wgSquidServersNoPurge to use whole subnets for XFF checking' [19:08:52] Logged the message, Master [19:09:01] bblack: Agreed. But the code in core that I was leveraging is ... chock full of regex madness [19:09:26] php5-gdnsd [19:09:46] it's kind of awesome when someone engineers this brilliant binary tree address space for efficient routing and filtering implementations, and then someone uses regexes on them :) [19:09:47] !log disabled puppet on cp40XX, ssl10XX, and ssl30XX [19:09:53] Logged the message, RobH [19:10:17] paravoid: I'm going to commit the known good key to the private repo and then merge my other patchset (after removing mark as reviewer since we fixed that issue) [19:10:18] php5-libgdmaps! 
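Editor's note on the CIDR discussion above: the point that checking an IP against a few CIDR ranges "shouldn't be hard to do efficiently" can be illustrated with a minimal sketch. This is Python, not the PHP in MediaWiki core, and it is not the fix that was applied. The function names `compile_ranges` and `ip_in_ranges` and the ranges in `TRUSTED` are hypothetical (the ranges are documentation-only address blocks, not the contents of squid.php). The idea is to parse each range once into an integer network and mask, so that each lookup is a bitwise AND and a comparison rather than string or regex work.

```python
import ipaddress


def compile_ranges(cidrs):
    """Parse CIDR strings once into (version, network_int, mask_int) tuples."""
    compiled = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        compiled.append((net.version, int(net.network_address), int(net.netmask)))
    return compiled


def ip_in_ranges(ip, compiled):
    """Return True if ip falls inside any pre-compiled range."""
    addr = ipaddress.ip_address(ip)
    value = int(addr)
    # Compare only within the same address family; the check itself is
    # one AND and one equality test per range.
    return any(
        version == addr.version and (value & mask) == network
        for version, network, mask in compiled
    )


# Hypothetical placeholder ranges (reserved documentation blocks).
TRUSTED = compile_ranges(["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"])
```

A caller would build `TRUSTED` once at startup and then call `ip_in_ranges(client_ip, TRUSTED)` per request. For a long list, a sorted structure or prefix trie would scale better than this linear scan, but for a handful of subnets the scan is already cheap.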
[19:10:40] figured may as well review it in public channel, its not a security issue =] [19:10:50] Eloquence, guillom: the link to the second photo on the dallas blog post is broken [19:11:04] paravoid: damn; thanks; fixing now [19:11:06] so, basically a 20% cpu jump: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=cpu_report [19:11:19] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1543.900024 [19:11:27] also see the api appservers, that was way worse [19:11:54] paravoid: Fixed! [19:11:57] wow, like 40% there [19:12:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1569.633301 [19:12:22] :) [19:13:19] PROBLEM - Varnishkafka Delivery Errors on amssq62 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1340.0 [19:13:50] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1625.966675 [19:14:19] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1499.133301 [19:14:19] PROBLEM - Varnishkafka Delivery Errors on amssq56 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1491.233276 [19:14:35] bd808: do you have a quick pointer to where the php code for the network-checking is? [19:14:40] bblack, paravoid: I updated the bug I opened for the original CPU spike. This is fixable by either changing IP::isInRange() or by adding a more performant alternate code path. [19:14:47] (03CR) 10RobH: [C: 032] "I've addressed the private key concern with Mark already. 
He was right, and the key had been accidentally updated by applying the newest " [operations/puppet] - 10https://gerrit.wikimedia.org/r/130797 (owner: 10RobH) [19:14:59] PROBLEM - Varnishkafka Delivery Errors on amssq53 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1093.93335 [19:15:10] PROBLEM - Varnishkafka Delivery Errors on amssq58 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1495.666626 [19:15:19] PROBLEM - Varnishkafka Delivery Errors on amssq60 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1517.266724 [19:15:42] paravoid: Ok, I've merged my key and cert update. any preference on where its updated first? [19:15:49] nope [19:15:59] is ottomata in zurich yet? [19:16:04] bblack, you around? [19:16:08] bblack: https://github.com/wikimedia/mediawiki-core/blob/master/includes/utils/IP.php#L709 [19:16:10] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1202.266724 [19:16:11] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1341.833374 [19:16:14] maybe we should page him for all the varnishkafka errors [19:16:18] I'll do that [19:16:19] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:17:11] PROBLEM - Varnishkafka Delivery Errors on amssq50 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1363.0 [19:17:11] PROBLEM - Varnishkafka Delivery Errors on amssq54 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1643.93335 [19:17:19] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1304.233276 [19:17:19] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:17:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second 
CRITICAL: 1216.466675 [19:17:50] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1556.666626 [19:18:05] !log depooling ssl1001 to test new certs live on system [19:18:12] Logged the message, RobH [19:18:19] PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1780.199951 [19:18:19] RECOVERY - Varnishkafka Delivery Errors on amssq62 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:18:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1531.633301 [19:18:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1127.5 [19:18:50] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:18:50] RECOVERY - Varnishkafka Delivery Errors on amssq53 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:18:52] bd808: maybe we could do something like the first user-comment here? 
http://www.php.net/manual/en/ref.network.php [19:19:10] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1447.43335 [19:19:10] PROBLEM - Varnishkafka Delivery Errors on amssq61 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1332.699951 [19:19:10] (but with v6 support) [19:19:19] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:19:19] RECOVERY - Varnishkafka Delivery Errors on amssq56 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:19:23] paged otto [19:20:10] RECOVERY - Varnishkafka Delivery Errors on amssq58 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:20:19] RECOVERY - Varnishkafka Delivery Errors on amssq60 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:20:39] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1465.666626 [19:20:49] PROBLEM - Varnishkafka Delivery Errors on amssq59 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1650.366699 [19:21:10] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:21:10] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:21:19] RECOVERY - Varnishkafka Delivery Errors on amssq54 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:21:19] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:21:49] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:21:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1249.0 [19:22:05] arggghhhh goddamn newlines [19:22:19] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:22:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:22:39] PROBLEM - Varnishkafka Delivery Errors on cp4009 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 327.866669 [19:22:49] PROBLEM - Varnishkafka Delivery Errors on cp1060 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 101.633331 [19:23:09] RECOVERY - Varnishkafka Delivery Errors on amssq50 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:23:10] PROBLEM - Varnishkafka Delivery Errors on cp4018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 360.0 [19:23:10] RECOVERY - Varnishkafka Delivery Errors on amssq61 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:23:10] RECOVERY - Varnishkafka Delivery Errors on cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:23:10] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1802.533325 [19:23:19] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1567.099976 [19:23:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:23:19] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1138.56665 [19:23:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:23:49] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1121.06665 [19:24:19] PROBLEM - Varnishkafka Delivery Errors on cp4010 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 334.799988 [19:24:19] PROBLEM - Varnishkafka Delivery Errors on cp4017 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 350.166656 
[19:24:19] PROBLEM - Varnishkafka Delivery Errors on amssq52 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1688.400024 [19:24:22] (03PS1) 10RobH: missing newline on unified cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/131528 [19:24:27] bd808: I'm going to play with the PHP code offline and do some benchmarking and see if I can replace all that with something faster and more efficient, it may take a little while [19:24:39] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:24:40] (03CR) 10RobH: [C: 032 V: 032] missing newline on unified cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/131528 (owner: 10RobH) [19:24:49] RECOVERY - Varnishkafka Delivery Errors on amssq59 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:24:49] PROBLEM - Varnishkafka Delivery Errors on cp1059 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 107.833336 [19:24:50] bblack: Cool. Let me know if I can help. 
[19:25:10] PROBLEM - Varnishkafka Delivery Errors on cp4011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 41.933334 [19:25:19] PROBLEM - Varnishkafka Delivery Errors on cp4003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 376.333344 [19:25:19] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1290.43335 [19:25:39] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1489.866699 [19:25:49] PROBLEM - Varnishkafka Delivery Errors on cp1046 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 107.900002 [19:25:49] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 890.799988 [19:25:56] (03PS1) 10Dzahn: add shell account for Monte Hurd [operations/puppet] - 10https://gerrit.wikimedia.org/r/131529 [19:26:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1158.366699 [19:26:52] bblack: The check that is going crazy is https://github.com/wikimedia/mediawiki-core/blob/master/includes/GlobalFunctions.php#L4167 . If changing IP::isInRange() seems too scary/invasive a new check just for this usage could be added. 
[19:26:59] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:27:19] PROBLEM - Varnishkafka Delivery Errors on cp4004 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 318.399994 [19:27:19] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:28:19] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:28:19] RECOVERY - Varnishkafka Delivery Errors on amssq52 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:28:24] bd808: I think my basic plan is to "precompile" the configured list into some prepped integer stuff (for IPs, CIDRs, explicit ranges), and then have a much faster replacement for isInRange() that operates on that prepped data [19:28:50] PROBLEM - Varnishkafka Delivery Errors on cp4001 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 370.933319 [19:28:53] probably not a real tree or anything, just something that doesn't involve a ton of text parsing every check [19:28:58] paravoid: would you be so kind as to check ssl1001.wikimedia.org and let me know if it passes your chain test [19:29:15] i get the same results on it i get on the unpatched ssl1002, so i'd think its ok, but perhaps i need to test further items [19:29:19] PROBLEM - Varnishkafka Delivery Errors on cp4002 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 400.766663 [19:29:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:29:35] im just reviewing the openssl s_client connect [19:29:49] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:30:19] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:30:39] PROBLEM - Varnishkafka Delivery Errors on cp4008 is CRITICAL: 
kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 465.600006 [19:30:49] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:30:49] PROBLEM - Varnishkafka Delivery Errors on cp4016 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 357.633331 [19:31:39] PROBLEM - Varnishkafka Delivery Errors on cp4020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 51.900002 [19:32:26] RobH: that's the old cert [19:32:39] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [19:32:49] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:32:51] ...eh? [19:33:19] i merged it and saw it update, checking [19:33:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1497.833374 [19:33:19] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1418.033325 [19:33:25] you haven't restarted nginx though [19:33:39] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:33:40] bblack: Just caching the output of IP::parseRange() in the php process might make a big difference. [19:33:44] i thought it did so automatically at the time of replacement of main file [19:33:45] lemme do so [19:33:49] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1567.900024 [19:33:50] no, on purpose :) [19:33:54] greg-g: Reedy "dologmsg" doesn't work anymore on tin? 
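The plan discussed above (prep the configured list of IPs, CIDRs and explicit ranges into integers once, then run each check against the prepped data instead of re-parsing text) could look roughly like the sketch below. It is written in Python purely for illustration and is not the MediaWiki code. The names compile_ranges and is_trusted, the "a - b" syntax for explicit ranges, and the example addresses (documentation-only ranges) are all assumptions.

```python
import ipaddress
from bisect import bisect_right


def compile_ranges(entries):
    """Parse every entry once into sorted, merged integer ranges per IP version.

    Each entry is a single IP, a CIDR block, or an explicit "low - high" range
    (the range syntax is a hypothetical choice for this sketch).
    """
    by_version = {4: [], 6: []}
    for entry in entries:
        if "-" in entry:
            low_text, high_text = (part.strip() for part in entry.split("-", 1))
            low = ipaddress.ip_address(low_text)
            high = ipaddress.ip_address(high_text)
            if low.version != high.version or int(low) > int(high):
                raise ValueError("bad range: " + entry)
            by_version[low.version].append((int(low), int(high)))
        else:
            # A bare IP becomes a /32 (or /128) network.
            net = ipaddress.ip_network(entry, strict=False)
            by_version[net.version].append(
                (int(net.network_address), int(net.broadcast_address))
            )

    compiled = {}
    for version, ranges in by_version.items():
        ranges.sort()
        merged = []
        for low, high in ranges:
            # Merge overlapping or adjacent ranges so a lookup touches one range.
            if merged and low <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], high))
            else:
                merged.append((low, high))
        compiled[version] = (
            [low for low, _ in merged],
            [high for _, high in merged],
        )
    return compiled


def is_trusted(compiled, ip_text):
    """Check one address against the prepped data with a binary search."""
    addr = ipaddress.ip_address(ip_text)
    starts, ends = compiled[addr.version]
    value = int(addr)
    index = bisect_right(starts, value) - 1
    return index >= 0 and value <= ends[index]


# Hypothetical configuration, using documentation-only address ranges.
TRUSTED = compile_ranges([
    "192.0.2.0/24",
    "198.51.100.7",
    "203.0.113.10 - 203.0.113.20",
    "2001:db8::/32",
])
```

The list is parsed once per process, which also covers the suggestion of caching the parsed ranges. Each check then costs one address parse plus a binary search, with no regex work per configured entry.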
[19:33:57] my bad =P [19:34:00] so that when it breaks, it doesn't break spectacularly ;) [19:34:04] mutante tested dologmsg script on tin [19:34:09] oh, heh, it does [19:34:10] PROBLEM - Varnishkafka Delivery Errors on cp4019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 47.866665 [19:34:19] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1411.5 [19:34:19] PROBLEM - Varnishkafka Delivery Errors on amssq62 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1275.533325 [19:34:19] PROBLEM - Varnishkafka Delivery Errors on cp4012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 45.366665 [19:34:26] jgage: ping? [19:34:49] PROBLEM - Varnishkafka Delivery Errors on amssq53 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1363.266724 [19:34:57] is gage at the office? [19:35:00] paravoid: yes, but NOW its the new cert [19:35:02] he wasnt no [19:35:05] paravoid: no [19:35:10] PROBLEM - Varnishkafka Delivery Errors on amssq58 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1365.833374 [19:35:10] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1549.733276 [19:35:19] PROBLEM - Varnishkafka Delivery Errors on amssq56 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1212.400024 [19:35:19] PROBLEM - Varnishkafka Delivery Errors on amssq60 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1265.93335 [19:35:29] paravoid: and also seems to be working [19:35:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1430.333374 [19:35:39] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 993.790875712 [19:35:53] if you confirm i'll start repooling and depooling them in ones or twos and 
rolling the update [19:36:19] PROBLEM - Varnishkafka Delivery Errors on amssq54 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1213.0 [19:36:19] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1526.0 [19:36:19] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1229.93335 [19:36:19] PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1129.099976 [19:36:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1503.599976 [19:36:35] (03CR) 10Dzahn: [C: 031] "there are no more "home-mounted Apaches" and dologmsg works fine on tin" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130600 (owner: 10Dzahn) [19:36:49] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1450.866699 [19:37:11] PROBLEM - Varnishkafka Delivery Errors on amssq50 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1357.333374 [19:37:12] PROBLEM - Varnishkafka Delivery Errors on amssq61 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1513.266724 [19:37:12] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1439.366699 [19:37:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1372.06665 [19:37:40] RobH: looks good [19:37:48] huzzzzzzaaaaahhhhhhhh [19:38:06] i'll repool ssl1001 and then depool the rest of eqiad in pairs (seems ok to you?) 
[19:38:19] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:38:22] except esams, thats one by one [19:38:25] there arent many there [19:38:29] yup, sounds fine [19:38:39] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1266.400024 [19:38:41] cool thanks [19:38:49] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:38:49] PROBLEM - Varnishkafka Delivery Errors on amssq59 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1360.033325 [19:39:19] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:39:19] RECOVERY - Varnishkafka Delivery Errors on amssq62 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:39:19] RECOVERY - Varnishkafka Delivery Errors on amssq56 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:39:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:39:49] RECOVERY - Varnishkafka Delivery Errors on amssq53 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:40:10] RECOVERY - Varnishkafka Delivery Errors on amssq58 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:40:10] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:40:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:40:19] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:40:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1243.333374 [19:41:19] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:41:19] 
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1344.233276 [19:41:19] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:41:19] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1458.466675 [19:41:19] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:41:20] PROBLEM - Varnishkafka Delivery Errors on amssq52 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1570.466675 [19:41:20] RECOVERY - Varnishkafka Delivery Errors on amssq60 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:41:50] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:42:38] RECOVERY - Varnishkafka Delivery Errors on amssq61 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:42:38] RECOVERY - Varnishkafka Delivery Errors on cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:42:38] RECOVERY - Varnishkafka Delivery Errors on amssq54 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:42:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:43:10] RECOVERY - Varnishkafka Delivery Errors on amssq50 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:43:19] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1540.133301 [19:43:39] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1647.466675 [19:43:50] RECOVERY - Varnishkafka Delivery Errors on amssq59 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:44:41] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second 
OKAY: 0.0 [19:45:19] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:45:19] RECOVERY - Varnishkafka Delivery Errors on amssq52 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:46:00] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:46:19] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:46:19] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1179.93335 [19:47:39] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:47:54] can someone page jgage about all that [19:48:06] otto hasn't responded [19:48:19] RECOVERY - Varnishkafka Delivery Errors on cp4003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:48:19] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:48:49] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1034.366699 [19:49:11] RECOVERY - Varnishkafka Delivery Errors on cp4018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:50:12] paravoid: only XFPs here [19:50:31] paravoid: I thought we were going to connect to a switch in ulsfo (not mx) [19:50:39] RECOVERY - Varnishkafka Delivery Errors on cp4009 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:50:40] yes [19:50:42] that's correct [19:50:56] switch doesn't have 10/100/1000 ports? [19:51:13] fiber problem... 
[19:51:19] RECOVERY - Varnishkafka Delivery Errors on cp4004 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:51:19] RECOVERY - Varnishkafka Delivery Errors on cp4017 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:51:27] Would any ops people be available to meet with Ray King at the office today at 3pm? He's the guy that owns the .wiki TLD. [19:51:50] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 830.833313 [19:51:52] have an SFP fiber embedded into the chassis? :) [19:52:12] media converter a bad idea? [19:52:18] what's the switch in ulsfo? [19:52:19] RECOVERY - Varnishkafka Delivery Errors on cp4002 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:52:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:52:23] mutante: ^ ? [19:52:25] it has SFP ports? [19:52:49] RECOVERY - Varnishkafka Delivery Errors on cp4001 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:53:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1692.466675 [19:53:22] we have an EX4500 and an EX4550 [19:53:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1432.56665 [19:53:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1080.300049 [19:54:10] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1496.43335 [19:54:16] kaldari: ? [19:54:18] do you want a juniper brand SFP or do you use third party? 
[19:54:19] RECOVERY - Varnishkafka Delivery Errors on cp4010 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:54:19] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1392.233276 [19:54:49] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1348.5 [19:55:01] kaldari: if it's about the Varnishkafka i believe Ottomata has already been paged [19:55:19] PROBLEM - Varnishkafka Delivery Errors on amssq62 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1160.56665 [19:55:19] PROBLEM - Varnishkafka Delivery Errors on amssq56 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1011.666687 [19:55:49] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:55:57] cajoel: third parties are fine, do you have spares? [19:56:11] PROBLEM - Varnishkafka Delivery Errors on amssq58 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1151.466675 [19:56:11] PROBLEM - Varnishkafka Delivery Errors on amssq61 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1212.033325 [19:56:11] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1320.56665 [19:56:11] paravoid: only found XFPs in the boxes here. 
[19:56:19] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1354.599976 [19:56:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 700.400024 [19:56:19] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1532.133301 [19:56:19] PROBLEM - Varnishkafka Delivery Errors on amssq60 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1266.599976 [19:56:22] ok, so we'll make an order [19:56:39] RECOVERY - Varnishkafka Delivery Errors on cp4008 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:56:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1359.266724 [19:56:48] I'll file an RT [19:56:49] PROBLEM - Varnishkafka Delivery Errors on amssq53 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1082.333374 [19:56:49] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1203.199951 [19:57:09] but before I do that, can you confirm with reliance that they'll terminate in a 1Gbps SFP [19:57:10] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1444.333374 [19:57:19] PROBLEM - Varnishkafka Delivery Errors on amssq54 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1220.866699 [19:57:19] PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1014.56665 [19:57:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:57:21] and whether we'll use one or two? 
we can order two of course, they're going to be cheap [19:57:34] should be <$100 [19:57:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:57:43] yup [19:57:48] I'd order 2, just so we have a backup [19:57:49] PROBLEM - Varnishkafka Delivery Errors on amssq59 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1164.133301 [19:57:52] I'll call reliance right now.. [19:58:08] double check that the 4500 doesn't have any SFPs in it already? [19:58:11] PROBLEM - Varnishkafka Delivery Errors on amssq50 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1065.733276 [19:58:13] !log ssl1001 back in service, ssl1002-1003 set to disabled in pybal [19:58:18] Logged the message, RobH [19:58:47] faidon@asw-ulsfo> show chassis hardware | match SFP | except SFP\+ Xcvr 16 REV 01 740-013113 BYJOHNSJ221987C SFP-T Xcvr 27 REV 01 740-013113 BYJOHNSJ221987D SFP-T Xcvr 18 REV 01 740-013113 BYJOHNSJ221987A SFP-T Xcvr 20 REV 01 740-013113 BYJOHNSJ221987B SFP-T [19:58:49] RECOVERY - Varnishkafka Delivery Errors on cp4016 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:58:49] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:58:54] it's full of XFPs :) [19:58:58] (03CR) 10Dzahn: [C: 032] add shell account for Monte Hurd [operations/puppet] - 10https://gerrit.wikimedia.org/r/131529 (owner: 10Dzahn) [19:59:02] (ironically) [19:59:19] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1367.866699 [19:59:39] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1268.766724 [19:59:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1237.633301 [20:00:19] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:00:19] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1468.633301 [20:00:19] PROBLEM - Varnishkafka Delivery Errors on amssq52 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1438.266724 [20:00:19] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:20] RECOVERY - Varnishkafka Delivery Errors on amssq62 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:29] RECOVERY - Varnishkafka Delivery Errors on amssq56 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:39] (03PS1) 10Dzahn: add account mhurd to stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131537 [20:01:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:40] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:49] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1695.133301 [20:01:50] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:50] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:01:59] RECOVERY - Varnishkafka Delivery Errors on amssq53 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:11] RECOVERY - Varnishkafka Delivery Errors on amssq61 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:19] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:19] RECOVERY - Varnishkafka Delivery Errors on amssq58 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:19] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:19] RECOVERY - Varnishkafka Delivery Errors on amssq54 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:19] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:20] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:20] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1558.366699 [20:02:21] RECOVERY - Varnishkafka Delivery Errors on amssq60 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:02:32] greg-g, getting ready to deploy parsoid. .. any reason to wait? [20:02:49] RECOVERY - Varnishkafka Delivery Errors on amssq59 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:03:11] RECOVERY - Varnishkafka Delivery Errors on cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:03:19] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:03:19] (03PS2) 10Dzahn: add account mhurd to stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131537 [20:03:35] subbu: don't think so, I've grown to ignore varnichkhafka errors unfortunately :/ [20:03:42] ok. 
[20:03:52] * subbu logs on to bastion [20:04:15] cajoel: 4500/4550 doesn't support 100BASE SFPs [20:04:19] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:04:19] RECOVERY - Varnishkafka Delivery Errors on amssq52 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:04:41] cajoel: only the 4200 does; so please convince them to do GigE [20:04:59] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:05:04] paravoid: well nasty --- so get a 1G no matter what, and if we can't get GigE, we'll have to do some sort of media converter/cheapo switch anyway (ugly) [20:05:11] RECOVERY - Varnishkafka Delivery Errors on amssq50 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:05:19] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:05:39] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:06:03] cajoel: they said okay for a single 1gbps port in your initial mail, so I see no reason why not [20:06:39] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:07:24] (03CR) 10Dzahn: [C: 032] "had approval, waiting period over" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131537 (owner: 10Dzahn) [20:07:46] cajoel, RobH: filed procurement request RT #7416 [20:08:19] PROBLEM - Varnishkafka Delivery Errors on cp1047 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 119.833336 [20:08:19] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:08:19] mutante: Would you be available to meet with Ray King at the office today at 3pm? He's the guy that owns the .wiki TLD.
[20:08:44] would be good to have someone from ops at the meeting [20:09:16] (03PS1) 10Dr0ptp4kt: Test support for Nokia proxy fronting assigned cellular IP for one subdomain. [operations/puppet] - 10https://gerrit.wikimedia.org/r/131540 [20:09:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1709.966675 [20:09:27] kaldari: oh.. i had no idea he was coming to office.. if you can tell me what the plan is, sure [20:09:36] kaldari: can we meet at office before? [20:10:11] PROBLEM - Varnishkafka Delivery Errors on cp4018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 225.833328 [20:10:19] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1211.300049 [20:10:19] PROBLEM - Varnishkafka Delivery Errors on cp4017 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 248.53334 [20:10:28] mutante: sure, I can meet with you now or whenever. Meeting with Ray is scheduled for 3-4pm. I'll add you to the invite. [20:10:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1434.266724 [20:11:04] (03CR) 10Dzahn: "Mhurd/Unixaccount[Monte Hurd]/User[mhurd]/ensure: created.." [operations/puppet] - 10https://gerrit.wikimedia.org/r/131537 (owner: 10Dzahn) [20:11:19] PROBLEM - Varnishkafka Delivery Errors on cp4003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 195.666672 [20:11:58] RobH: how is it going? 
[20:13:19] PROBLEM - Varnishkafka Delivery Errors on cp4004 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 222.833328 [20:13:39] PROBLEM - Varnishkafka Delivery Errors on cp4008 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 69.599998 [20:13:49] PROBLEM - Varnishkafka Delivery Errors on cp4001 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 211.300003 [20:14:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:14:39] RECOVERY - Varnishkafka Delivery Errors on cp4020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:14:43] sorry got a phone call, im updating ssl1002-1003 now [20:14:49] PROBLEM - Varnishkafka Delivery Errors on cp4016 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 93.099998 [20:15:17] !log deployed parsoid f2f1f1d7 (with deploy sha 71072f8a) [20:15:19] PROBLEM - Varnishkafka Delivery Errors on cp4002 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 184.199997 [20:15:23] Logged the message, Master [20:15:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1384.43335 [20:15:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:15:50] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1009.966675 [20:16:19] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1371.666626 [20:17:10] PROBLEM - Varnishkafka Delivery Errors on amssq61 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1302.266724 [20:17:19] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1538.166626 [20:17:19] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: 
kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1792.533325 [20:17:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:17:19] PROBLEM - Varnishkafka Delivery Errors on amssq56 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1316.133301 [20:17:19] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1144.866699 [20:17:39] PROBLEM - Varnishkafka Delivery Errors on cp4020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 31.9 [20:17:40] PROBLEM - Varnishkafka Delivery Errors on cp4009 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 200.666672 [20:17:41] mutante/robh: can you call gage and tell him about all this varnishkafka stuff? [20:17:49] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1430.466675 [20:17:49] PROBLEM - Varnishkafka Delivery Errors on amssq59 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1282.099976 [20:18:08] !log putting ssl1002/3 back into service [20:18:10] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1287.866699 [20:18:15] Logged the message, RobH [20:18:19] PROBLEM - Varnishkafka Delivery Errors on amssq62 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1095.866699 [20:18:19] PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1054.366699 [20:18:19] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1439.599976 [20:18:49] PROBLEM - Varnishkafka Delivery Errors on amssq53 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1182.533325 [20:18:49] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 
973.733337 [20:18:49] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1264.766724 [20:18:53] paravoid: i just texted him [20:18:59] thx [20:18:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1383.866699 [20:19:00] if he doesnt reply to text shortly i'll call [20:19:00] paravoid: calling [20:19:04] oops [20:19:10] PROBLEM - Varnishkafka Delivery Errors on amssq58 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1081.633301 [20:19:11] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1184.5 [20:19:19] PROBLEM - Varnishkafka Delivery Errors on amssq54 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1150.900024 [20:19:19] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1365.5 [20:19:19] PROBLEM - Varnishkafka Delivery Errors on cp4010 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 143.766663 [20:19:19] PROBLEM - Varnishkafka Delivery Errors on amssq52 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1464.666626 [20:19:19] PROBLEM - Varnishkafka Delivery Errors on amssq60 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1165.866699 [20:19:21] i had him in my recent texts from him texting me about his dropping connection in meeting [20:19:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:19:55] !log ssl1004/5 disabled for update [20:20:02] Logged the message, RobH [20:20:06] paravoid: he'll look, talked to him [20:20:08] bah, depooled, whatevs [20:20:14] hi [20:20:30] weird rob, i just got your text now [20:20:39] <^d> mutante: Sooo, silver as the formey ldap admin box replacement. I can't ssh to it because my account isn't on it. 
[20:20:39] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 903.233337 [20:20:39] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1470.800049 [20:20:49] <^d> (we tried to fix this friday but broke) [20:21:40] !log puppet agent has been re-enabled on ssl1001-1003 [20:21:47] Logged the message, RobH [20:21:53] jgage: yay multi-carrier text delays =] [20:22:10] yeah [20:22:11] PROBLEM - Varnishkafka Delivery Errors on amssq50 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1269.233276 [20:22:19] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:22:19] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:22:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1333.766724 [20:22:19] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1622.56665 [20:22:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 980.633362 [20:23:10] RECOVERY - Varnishkafka Delivery Errors on amssq61 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:23:20] RECOVERY - Varnishkafka Delivery Errors on amssq56 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:23:47] so, about the varnichkhafka spam.... 
[20:23:49] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:23:49] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:23:49] RECOVERY - Varnishkafka Delivery Errors on amssq59 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:00] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:00] ahh, I see you're pinging gage [20:24:03] sorry, cause not yet identified [20:24:10] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:18] can we make icinga shut up temporarily? [20:24:19] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:19] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:19] RECOVERY - Varnishkafka Delivery Errors on amssq62 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:19] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:19] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:20] RECOVERY - Varnishkafka Delivery Errors on amssq52 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:20] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:49] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:24:50] RECOVERY - Varnishkafka Delivery Errors on amssq53 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:25:11] RECOVERY - Varnishkafka Delivery Errors on amssq58 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:25:19] RECOVERY - Varnishkafka Delivery Errors on 
cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:25:19] RECOVERY - Varnishkafka Delivery Errors on amssq54 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:25:19] RECOVERY - Varnishkafka Delivery Errors on amssq60 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:25:27] !log ssl1004/5 returned to service (and puppet agents enabled) [20:25:34] Logged the message, RobH [20:25:39] !log depooled ssl1006/7 for update [20:25:47] Logged the message, RobH [20:25:49] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:25:50] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:26:16] paravoid: Bahhhhh [20:26:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1551.233276 [20:26:24] what? [20:26:25] so somehow the puppet agent disable systems called in [20:26:37] they have the updated unified cert (but not chained since i didnt get to them yet) [20:26:42] you don't have to disable puppet anymore [20:26:43] it doesnt restart nginx, so we should be ok [20:26:47] exactly [20:27:11] well, then im just going to salt remove the chain on them all and force puppet runs, since nginx doesnt restart there isnt a good reason for me to do one by one anymore right? [20:27:18] (not doing that until you confirm mind you) [20:27:19] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:27:39] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:28:23] the only thing i need to manually do is restart nginx after depooling them and draining traffic to each [20:28:27] seems to me. [20:28:30] well, one by one.
[20:28:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1464.5 [20:29:19] RECOVERY - Varnishkafka Delivery Errors on amssq50 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:29:19] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:30:15] * AaronSchulz wonders why Template:WPBannerMeta is so slow [20:30:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:30:39] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:31:20] (03CR) 10Hashar: [C: 031 V: 032] "Deployed on contint puppetmaster and proven to be working :]" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131438 (owner: 10Hashar) [20:31:24] well, now its not all, its just ssl1006-1009, so just rm'd the old chain file and running puppet on them. 1006/7 are already depooled so once thats done i'll restart their nginx and test them, repool [20:32:00] (03CR) 10Hashar: [C: 04-1] "Seems to be working on labs :-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 (owner: 10Hashar) [20:32:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1413.866699 [20:33:05] AaronSchulz: is that the template for CN magic (disabling banners on multiple sites and the like)? [20:33:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:34:26] hrmm, ssl1006 is being bitchy. 
[20:34:37] i'll flag it and come back since ssl1007 is fine with same exact steps [20:34:50] !log ssl1007 going back into service, ssl1008 depooling [20:34:57] Logged the message, RobH [20:35:19] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1813.733276 [20:35:19] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1179.800049 [20:36:05] gah [20:36:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1261.56665 [20:37:19] PROBLEM - Varnishkafka Delivery Errors on amssq52 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1392.5 [20:38:11] PROBLEM - Varnishkafka Delivery Errors on amssq61 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1333.966675 [20:38:15] !log forced kafka broker reelection [20:38:22] Logged the message, Master [20:38:33] (03PS1) 10Dzahn: add mhurd to bast1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131592 [20:38:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:38:49] RECOVERY - Varnishkafka Delivery Errors on cp1060 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:38:49] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1347.733276 [20:38:50] PROBLEM - Varnishkafka Delivery Errors on amssq59 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2547.100098 [20:39:19] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2819.866699 [20:39:19] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3146.133301 [20:39:19] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:39:19] 
PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2131.399902 [20:39:19] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3275.333252 [20:39:39] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3136.266602 [20:39:39] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 1925.8109877 [20:39:49] RECOVERY - Varnishkafka Delivery Errors on cp1059 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:39:49] RECOVERY - Varnishkafka Delivery Errors on cp1046 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:39:55] !log ssl1008 back into service, ssl1009 already depooled [20:40:00] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:40:02] Logged the message, RobH [20:40:10] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2584.93335 [20:40:19] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2216.533447 [20:40:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:40:19] PROBLEM - Varnishkafka Delivery Errors on amssq56 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1759.599976 [20:40:50] PROBLEM - Varnishkafka Delivery Errors on amssq53 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1762.866699 [20:41:11] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2375.399902 [20:41:11] PROBLEM - Varnishkafka Delivery Errors on amssq58 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2004.599976 [20:41:19] PROBLEM - 
Varnishkafka Delivery Errors on amssq54 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2196.666748 [20:41:19] PROBLEM - Varnishkafka Delivery Errors on amssq62 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1633.533325 [20:41:19] RECOVERY - Varnishkafka Delivery Errors on amssq52 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:41:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3310.5 [20:41:19] PROBLEM - Varnishkafka Delivery Errors on amssq60 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2147.199951 [20:41:19] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3060.56665 [20:41:19] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1912.233276 [20:41:19] interesting, ssl1009 has same refusal error as ssl1006 (but wasnt pooled before) [20:41:39] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2714.699951 [20:41:49] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:41:49] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2013.599976 [20:41:50] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2204.300049 [20:41:50] RECOVERY - Varnishkafka Delivery Errors on amssq59 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:42:19] RECOVERY - Varnishkafka Delivery Errors on amssq61 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:42:19] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:42:19] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:42:22] ^d: re: LDAP access. i don't think you had that access before.. just robla,reedy,sumanah. https://gerrit.wikimedia.org/r/#/c/125721/8/manifests/site.pp [20:42:36] ^d: when it was on formey i mean [20:42:46] ^d: oh, i see, because you had root on formey for gerrit? [20:42:49] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:00] !log ssl1009 was refusing connections both before and after my ssl cert update. ssl1006 is presently refusing connections post update. they are set to disabled in pubal [20:43:03] <^d> mutante: Yeah I did because it was formey. [20:43:04] !log pybal [20:43:06] Logged the message, RobH [20:43:11] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:11] PROBLEM - Varnishkafka Delivery Errors on amssq50 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1651.366699 [20:43:11] Logged the message, RobH [20:43:19] RECOVERY - Varnishkafka Delivery Errors on amssq54 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:19] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:19] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:19] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:19] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:19] RECOVERY - Varnishkafka Delivery Errors on amssq56 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:43:22] ^d: looking .. 
[20:43:49] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1005.06665 [20:44:11] RECOVERY - Varnishkafka Delivery Errors on cp4018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:44:19] RECOVERY - Varnishkafka Delivery Errors on amssq58 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:44:19] RECOVERY - Varnishkafka Delivery Errors on cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:44:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1428.166626 [20:44:19] RECOVERY - Varnishkafka Delivery Errors on amssq60 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:44:39] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:44:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2076.899902 [20:44:59] RECOVERY - Varnishkafka Delivery Errors on amssq53 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:45:19] RECOVERY - Varnishkafka Delivery Errors on amssq62 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:45:19] RECOVERY - Varnishkafka Delivery Errors on cp4017 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:45:19] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:45:49] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:45:49] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:46:19] RECOVERY - Varnishkafka Delivery Errors on cp4004 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:46:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2113.899902 
[20:46:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:46:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 989.06665 [20:46:39] RECOVERY - Varnishkafka Delivery Errors on cp4008 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:46:49] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [20:47:19] RECOVERY - Varnishkafka Delivery Errors on cp4003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:47:39] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [20:47:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:48:19] RECOVERY - Varnishkafka Delivery Errors on amssq50 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:48:49] RECOVERY - Varnishkafka Delivery Errors on cp4016 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:48:50] RECOVERY - Varnishkafka Delivery Errors on cp4001 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:48:54] sorry for the icinga spam folks, i'd turn it off if i knew how [20:48:59] restarting kafka brokers now [20:49:00] PROBLEM - Varnishkafka Delivery Errors on cp1046 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 37.166668 [20:49:19] RECOVERY - Varnishkafka Delivery Errors on cp4002 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:49:49] PROBLEM - Varnishkafka Delivery Errors on cp1059 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 42.333332 [20:49:49] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:50:16] (03PS1) 10Dzahn: include accounts in LDAP admin role, add demon [operations/puppet] - 10https://gerrit.wikimedia.org/r/131597 [20:50:19] RECOVERY - Varnishkafka Delivery 
Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:50:39] !log ssl1006 and ssl1009 are responsive to nginx and back in service [20:50:39] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:50:46] Logged the message, RobH [20:50:49] PROBLEM - Varnishkafka Delivery Errors on cp1060 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 6076.366699 [20:50:59] (03PS2) 10Dzahn: add mhurd to bast1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/131592 [20:51:10] RECOVERY - Varnishkafka Delivery Errors on cp4011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:19] RECOVERY - Varnishkafka Delivery Errors on cp4019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:19] RECOVERY - Varnishkafka Delivery Errors on cp1047 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:19] RECOVERY - Varnishkafka Delivery Errors on cp4010 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:19] RECOVERY - Varnishkafka Delivery Errors on cp4012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:39] RECOVERY - Varnishkafka Delivery Errors on cp4020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:39] RECOVERY - Varnishkafka Delivery Errors on cp4009 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:49] RECOVERY - Varnishkafka Delivery Errors on cp1060 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:49] RECOVERY - Varnishkafka Delivery Errors on cp1059 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:51:59] RECOVERY - Varnishkafka Delivery Errors on cp1046 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:52:05] (03PS2) 10Manybubbles: WIP: Add some analyzers to labs [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 [20:52:19] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1021 is 
CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 44.0 [20:52:50] (03CR) 10Dzahn: [C: 032] "..or he can't get to stat1003" [operations/puppet] - 10https://gerrit.wikimedia.org/r/131592 (owner: 10Dzahn) [20:52:56] hrm i can't get it to reelect analytics1022 [20:54:10] PROBLEM - Varnishkafka Delivery Errors on cp4018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 307.866669 [20:54:22] !log cp4001-4020 unified cert and nginx service reloaded, back in service [20:54:24] (03CR) 10Chad: "I have no idea how gwtorm behaves here." [operations/puppet] - 10https://gerrit.wikimedia.org/r/131419 (owner: 10Springle) [20:54:29] Logged the message, RobH [20:55:10] RECOVERY - Varnishkafka Delivery Errors on cp4018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:55:47] (03PS3) 10Manybubbles: WIP: Add some analyzers to labs [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 [20:56:10] PROBLEM - Varnishkafka Delivery Errors on amssq50 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 268.799988 [20:58:10] RECOVERY - Varnishkafka Delivery Errors on amssq50 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:58:19] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 5.56206809584 [20:58:19] !log both kafka brokers back in service [20:58:21] !log ssl1001-1003 now have updated unified cert in service [20:58:26] Logged the message, Master [20:58:32] Logged the message, RobH [20:58:45] how come it calls you robh and me master? [20:59:07] <^d> People hack the script to make it say funny things. 
[20:59:11] yep [20:59:32] leslie and notpeter have custom messages [20:59:39] PROBLEM - Varnishkafka Delivery Errors on cp4008 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 646.799988 [20:59:47] RobH: wikipedia now has new certs with old values but new fingerprint only? [20:59:58] just added zero to SANs [21:00:05] ok well i have confirmed that the kafka brokers are both acting as expected, time to drill down.. [21:00:08] for *.zero.wikipedia.org support [21:00:19] PROBLEM - Varnishkafka Delivery Errors on cp4004 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 435.966675 [21:00:39] PROBLEM - Varnishkafka Delivery Errors on cp4020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 428.399994 [21:00:49] PROBLEM - Varnishkafka Delivery Errors on cp1046 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 504.0 [21:00:50] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 92.400002 [21:01:10] PROBLEM - Varnishkafka Delivery Errors on cp4019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 512.400024 [21:01:19] PROBLEM - Varnishkafka Delivery Errors on cp4003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 411.600006 [21:01:49] PROBLEM - Varnishkafka Delivery Errors on cp4001 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 319.200012 [21:01:49] PROBLEM - Varnishkafka Delivery Errors on cp1059 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 445.200012 [21:01:49] PROBLEM - Varnishkafka Delivery Errors on cp4016 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 646.799988 [21:02:19] PROBLEM - Varnishkafka Delivery Errors on cp4010 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1901.400024 [21:02:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 352.799988 [21:02:19] PROBLEM - 
Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 546.0 [21:02:19] PROBLEM - Varnishkafka Delivery Errors on cp4002 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 554.400024 [21:02:49] PROBLEM - Varnishkafka Delivery Errors on cp1060 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 504.0 [21:03:10] PROBLEM - Varnishkafka Delivery Errors on cp4011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 520.799988 [21:03:39] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 781.200012 [21:03:39] PROBLEM - Varnishkafka Delivery Errors on cp4009 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 596.400024 [21:04:19] PROBLEM - Varnishkafka Delivery Errors on cp1047 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 392.933319 [21:04:19] PROBLEM - Varnishkafka Delivery Errors on cp4012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 495.600006 [21:04:39] RECOVERY - Varnishkafka Delivery Errors on cp4008 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:04:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1503.599976 [21:04:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:06:19] RECOVERY - Varnishkafka Delivery Errors on cp4010 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:06:19] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:06:49] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:06:49] RECOVERY - Varnishkafka Delivery Errors on cp4016 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:06:49] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: 
kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 714.0 [21:06:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 184.800003 [21:07:19] RECOVERY - Kafka Broker Messages In on analytics1022 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 1592.54232073 [21:07:19] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1021 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0 [21:07:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:07:35] (PS1) EBernhardson: Drop wgFlowCacheKey from CommonSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131609 [21:07:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 277.200012 [21:07:39] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:07:39] RECOVERY - Varnishkafka Delivery Errors on cp4009 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:07:40] !log trying on analytics1022: https://wikitech.wikimedia.org/wiki/Analytics/Kraken/Kafka/Administration#Recovering_a_laggy_broker_replica [21:07:48] Logged the message, Master [21:07:49] RECOVERY - Varnishkafka Delivery Errors on cp1046 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:07:52] (CR) Chad: "Very similar to what we tried friday: https://gerrit.wikimedia.org/r/#/c/131010/ which didn't work." 
[operations/puppet] - https://gerrit.wikimedia.org/r/131597 (owner: Dzahn) [21:07:59] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:08:19] RECOVERY - Varnishkafka Delivery Errors on cp4004 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:08:19] RECOVERY - Varnishkafka Delivery Errors on cp4003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:08:29] RECOVERY - Varnishkafka Delivery Errors on cp4002 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:08:39] RECOVERY - Varnishkafka Delivery Errors on cp4020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:08:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:08:49] RECOVERY - Varnishkafka Delivery Errors on cp4001 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:09:19] RECOVERY - Varnishkafka Delivery Errors on cp4019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:10:19] PROBLEM - Varnishkafka Delivery Errors on cp4010 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 113.933334 [21:10:49] RECOVERY - Varnishkafka Delivery Errors on cp1059 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:10:49] RECOVERY - Varnishkafka Delivery Errors on cp1060 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:10:59] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1015.533325 [21:11:19] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 257.600006 [21:11:39] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 860.866638 [21:11:39] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 782.733337 [21:12:09] (CR) Dzahn: "should be 
the missing $gid = 500 and "wikidev" group, i just had that same issue with admins::pmacct" [operations/puppet] - https://gerrit.wikimedia.org/r/131597 (owner: Dzahn) [21:12:19] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0 [21:12:19] RECOVERY - Varnishkafka Delivery Errors on cp1047 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:12:19] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1321.266724 [21:12:49] PROBLEM - Varnishkafka Delivery Errors on cp4009 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 412.700012 [21:12:49] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 229.899994 [21:13:11] (PS2) Dzahn: include accounts in LDAP admin role, add demon [operations/puppet] - https://gerrit.wikimedia.org/r/131597 [21:13:19] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 239.100006 [21:14:19] RECOVERY - Varnishkafka Delivery Errors on cp4011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:14:19] RECOVERY - Kafka Broker Messages In on analytics1022 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2039.08236657 [21:14:59] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:15:19] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:15:29] RECOVERY - Varnishkafka Delivery Errors on cp4012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:15:39] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:15:39] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:16:19] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:16:49] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:16:49] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:17:02] now stay that way, dammit [21:17:40] RECOVERY - Varnishkafka Delivery Errors on cp4009 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:19:21] :) [21:19:35] I created an instance in betalabs; and then realized I don't actually need it. Is there a 'production'esque way to remove the puppet and salt keys from the master? or is it just dropping the box off the face of the planet and having puppet/salt clean themselves automatically? [21:20:19] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:20:59] ah; yes; found it on wikitech [21:21:05] \O/ [21:23:19] RECOVERY - Varnishkafka Delivery Errors on cp4010 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:27:52] * jgage works on varnishkafka incident report, while keeping an eye on it [21:28:30] thanks much jgage [21:50:44] Hi all, I'm wondering how you manage deployment of the wmf/1.NNwmfNN in relation to submodules [21:51:34] At webplatform.org we are running wmf/1.20 (ish) and I'd like to make a better way to upgrade versions [21:51:57] (without re-engineering what's been figured out, or already in place) [21:57:03] <^d> renoirb_: Lots of hang-wringing. We've got a script called make-wmf-branch which handles creating new wmf branches with their appropriate submodules. [21:57:42] <^d> Then we'll update any submodules as needed during a cycle when the branch is live. Rather manual and painful. [22:01:09] ^d so, how do you deploy then. You have a builder VM and once its done, you rsync files? 
[22:02:39] <^d> We've got a deployment box that we pull the repos to then we sync the files, yeah. [22:14:41] ^d can you give me a link to that script make-wmf-branch? [22:14:58] and/or any useful utility that are available for me to understand how you handle deployment, pls :) [22:15:50] <^d> renoirb_: https://git.wikimedia.org/tree/mediawiki%2Ftools%2Frelease.git/HEAD/make-wmf-branch [22:15:54] <^d> https://git.wikimedia.org/tree/mediawiki%2Ftools%2Frelease.git/HEAD/make-wmf-branch [22:15:56] <^d> renoirb: ^ [22:16:38] oh, sweet! ^ [22:16:39] ^d [22:20:35] (PS3) Chad: include accounts in LDAP admin role, add demon [operations/puppet] - https://gerrit.wikimedia.org/r/131597 (owner: Dzahn) [22:22:27] (CR) Rush: "hey man this seems...awesome but I'm not sure how to review it, mainly because I'm not sure how exactly to use it? Can you share with me " [operations/software] - https://gerrit.wikimedia.org/r/131495 (owner: Giuseppe Lavagetto) [22:32:07] ori: mwalker, fyi, I'll be afk during the beginning of swat. do your thing. [22:42:18] mwalker: i can do it if you like [22:42:38] ori, yes please; I'm stuck in a sprint review [22:47:25] If I may ask for confirmation, If I get it correctly ^d, I could use make-wmf-branch config's default.conf to set what extension my site uses, right? [22:50:36] <^d> renoirb: Should be able to :) [22:56:03] I do not see anything to prepare a version for deployment though ^d [22:56:27] I guess i'd have to change the default behavior, isnt' it? [22:56:33] <^d> Ah, that's another script that's 100% wmf-specific. [22:56:49] <^d> checkoutMediaWiki. There's no way that'll work as-is for you. [22:58:51] (PS1) Andrew Bogott: Rename mflaschen to mattflaschen and change UID to match labs. [operations/puppet] - https://gerrit.wikimedia.org/r/131624 [23:00:28] (CR) Andrew Bogott: [C: 2] Rename mflaschen to mattflaschen and change UID to match labs. 
[operations/puppet] - https://gerrit.wikimedia.org/r/131624 (owner: Andrew Bogott) [23:04:52] bblack, when you have a minute, would you please review and, if appropriate, +2 https://gerrit.wikimedia.org/r/#/c/131540/ [23:04:54] ? [23:07:35] (PS2) BBlack: Test support for Nokia proxy fronting assigned cellular IP for one subdomain. [operations/puppet] - https://gerrit.wikimedia.org/r/131540 (owner: Dr0ptp4kt) [23:07:45] (CR) BBlack: [C: 2 V: 2] Test support for Nokia proxy fronting assigned cellular IP for one subdomain. [operations/puppet] - https://gerrit.wikimedia.org/r/131540 (owner: Dr0ptp4kt) [23:08:07] BBlack, thx [23:08:11] np [23:08:40] (CR) Rush: [C: -1] "Just a few minor things, I end up with a package named something like: python-statsd_0.1.10-chasemp20140227-1_all.deb. Which based on the" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage) [23:14:06] ori: You're doing SWAT today then? [23:15:33] (PS4) Dzahn: include accounts in LDAP admin role, add demon [operations/puppet] - https://gerrit.wikimedia.org/r/131597 [23:17:23] RoanKattouw: yes [23:17:33] waiting on jenkins [23:17:33] Sweet [23:17:37] doing ve next, i think [23:18:53] (PS1) Dzahn: add w.wiki, link to wikipedia.org [operations/dns] - https://gerrit.wikimedia.org/r/131627 [23:19:54] (CR) Dzahn: [C: 2] "not actually adding new permissions, just finishing RT #6134 people had this on formey before" [operations/puppet] - https://gerrit.wikimedia.org/r/131597 (owner: Dzahn) [23:20:50] (CR) Krinkle: [C: 1] "LGTM. Makes no effective change (resulting setting remains the same)." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131513 (owner: Chad) [23:21:44] (CR) Chad: "I should probably have Nik +1 this as well. Only reason I see us needing it is some emergency fallback but we're almost past that point." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131513 (owner: Chad) [23:22:11] (CR) Krinkle: "Plus, the original config setting is still there in CommonSettings.php if needed." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131513 (owner: Chad) [23:22:25] Or is that an anti-pattern? [23:22:36] wmg should only be needed per wiki I suppose. [23:22:53] <^d> Hmm. [23:23:29] !log ori synchronized php-1.24wmf2/extensions/EventLogging 'Update EventLogging for Id23b37fbe for SWAT.' [23:23:31] (CR) Dzahn: "also see: https://gerrit.wikimedia.org/r/#/c/125721/8/manifests/site.pp" [operations/puppet] - https://gerrit.wikimedia.org/r/131597 (owner: Dzahn) [23:23:34] Logged the message, Master [23:26:20] !log ori synchronized php-1.24wmf3/extensions/EventLogging 'Update EventLogging for Id23b37fbe for SWAT.' [23:26:26] Logged the message, Master [23:44:05] !log ori Started scap: SWAT deploy for VisualEditor and Flow cherry-picks [23:44:11] Logged the message, Master [23:47:49] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sat May 3 05:29:08 2014 [23:48:49] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Mon May 5 20:48:34 2014 [23:54:01] !log ori Finished scap: SWAT deploy for VisualEditor and Flow cherry-picks (duration: 09m 55s) [23:54:08] Logged the message, Master [23:54:20] ^ RoanKattouw, ebernhardson [23:54:25] ebernhardson: I still have to sync the config change [23:54:26] doing that now [23:54:38] ok, perfect [23:54:49] (CR) Ori.livneh: [C: 2] Drop wgFlowCacheKey from CommonSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131609 (owner: EBernhardson) [23:55:11] (Merged) jenkins-bot: Drop wgFlowCacheKey from CommonSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131609 (owner: EBernhardson) [23:57:18] !log ori updated /a/common to {{Gerrit|Id1f2e0acf}}: Drop wgFlowCacheKey 
from CommonSettings.php [23:57:25] Logged the message, Master