[00:40:24] (03PS1) 10TTO: Add enwikisource to global abuse filters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179864 [01:40:14] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 4 failures [02:07:34] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [02:09:16] !log l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s) [02:09:22] !log LocalisationUpdate completed (1.25wmf11) at 2014-12-15 02:09:21+00:00 [02:09:24] Logged the message, Master [02:09:26] Logged the message, Master [02:13:11] (03PS2) 10Andrew Bogott: Support bootstrap-vz for buildign labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [02:14:58] !log l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s) [02:15:01] !log LocalisationUpdate completed (1.25wmf12) at 2014-12-15 02:15:01+00:00 [02:15:05] Logged the message, Master [02:15:08] Logged the message, Master [03:36:24] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 15 03:36:24 UTC 2014 (duration 36m 23s) [03:36:27] Logged the message, Master [05:12:45] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [05:21:35] (03PS3) 10Ori.livneh: Add 'xenon' module for aggregating ext_xenon-produced traces [puppet] - 10https://gerrit.wikimedia.org/r/179791 [05:24:53] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [05:50:39] <_joe_> andrewbogott: ping [05:50:59] _joe_: hello! [05:51:14] <_joe_> hi :) [05:51:23] <_joe_> did you see my request in the backlog? [05:51:54] Hm… must've missed it. What's up? [05:52:02] <_joe_> I have 2 VMs, I need to use them for performance tests, so I'd need those to be on the same physical host [05:52:17] ah, ok. I'll move them. Project/instance name? 
[05:52:22] <_joe_> hhvm-img and lamp-img if I'm not wrong [05:52:36] <_joe_> 1 sec for the project name [05:53:28] <_joe_> project: hat-imagescalers [05:53:29] Hang on, I probably don't need it. [05:53:34] ok :) [05:53:48] <_joe_> thanks :) [05:54:39] I'm going to move them both to virt1006 since that one's underutilized at the moment. Will take a few minutes -- if you have sessions open they may freeze for a bit. [05:59:07] <_joe_> andrewbogott: nope no open sessions [06:00:04] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [06:03:21] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:33:45] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:40] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:01] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:01] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:10] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:29] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:23] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 76224 MB (3% inode=99%): [06:37:44] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:51] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:45] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:46:05] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:47:19] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:16] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet 
is currently enabled, last run 1 minute ago with 0 failures [06:48:34] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:48:40] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:48:57] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:48:58] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:53:16] <_joe_> disk on fluorine again? [06:53:32] we never merged the 1:1000 patch [06:53:41] <_joe_> oh, my [06:53:45] <_joe_> let's do it [06:53:52] <_joe_> (hi) [06:54:05] hey, good morning [06:54:33] <_joe_> api.log is 147 G [06:54:53] pv comes in super-handy for verifying this [06:55:23] pv < api.log > /dev/null [06:55:29] ~100 mb/s [06:55:39] (03CR) 10Ori.livneh: [C: 032] Sample 'api' debug log group at 1:1000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179412 (owner: 10Ori.livneh) [06:56:16] (03Merged) 10jenkins-bot: Sample 'api' debug log group at 1:1000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179412 (owner: 10Ori.livneh) [06:57:00] _joe_: migrate is fininished, both instances are on virt1006 now. 
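For context on the 1:1000 sampling change merged above: sampling a log group at 1:N keeps roughly one event in N, so a 147 G api.log would shrink to something on the order of 0.15 G for the same traffic. A minimal Python sketch of that idea follows. It is not MediaWiki's actual implementation; `should_log`, `expected_volume_gb` and the `sample_rate` parameter are hypothetical names used only for illustration.

```python
import random


def should_log(sample_rate, rng=random):
    """Return True for roughly one call in `sample_rate` (hypothetical helper)."""
    if sample_rate <= 1:
        # A rate of 1 (or less) means "log everything".
        return True
    # randrange(n) is uniform over 0..n-1, so this is true with probability 1/n.
    return rng.randrange(sample_rate) == 0


def expected_volume_gb(full_volume_gb, sample_rate):
    """Estimate the sampled log size, assuming entries are of similar size."""
    return full_volume_gb / sample_rate


if __name__ == "__main__":
    rng = random.Random(0)
    kept = sum(should_log(1000, rng) for _ in range(100000))
    print("kept", kept, "of 100000 events")
    print("estimated size in GB:", expected_volume_gb(147, 1000))
```

The estimate only holds if log entries are of broadly similar size and the traffic mix stays the same.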
[06:57:12] !log ori Synchronized wmf-config/InitialiseSettings.php: Id9023e66c: Sample "api" debug log group at 1:1000 (duration: 00m 06s) [06:57:20] Logged the message, Master [06:58:14] er, tail -f api.log | pv >/dev/null, that is [06:58:14] <_joe_> andrewbogott: thanks a lot [06:58:51] <_joe_> ori: much better now [06:58:59] yesterday's log is getting gzipped now, so it should recover [07:00:06] <_joe_> yep [07:00:49] <_joe_> ori: my plan for the day, apart from ops onduty tasks, is to convert one jobrunner [07:00:55] <_joe_> and see how it behaves [07:01:10] <_joe_> when we last had one, the problems were basically hhvm crashing [07:01:25] yeah, it should be fine now [07:01:54] <_joe_> in the meanwhile I'm doing tests for imagescalers [07:02:36] do you think you might have a chance to review ? flamegraph.pl is imported, so the patch is not nearly as big as it looks [07:04:20] <_joe_> I will for sure [07:06:30] i tested it on vagrant [07:16:43] (03CR) 10Nemo bis: "Nice!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179412 (owner: 10Ori.livneh) [07:18:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [07:18:27] <_joe_> you have to start coding in PHP again after a few years to really understand how much it sucks [07:23:13] ori: https://phabricator.wikimedia.org/T78517 [07:23:25] not api.log related at all [07:24:07] i'm too tired to look :( just about to call it a night [07:24:19] also, hi [07:24:26] hi :) [07:24:44] i read the task -- too tired to review the patches, i mean [07:25:00] i can do so tomorrow (or you can go ahead and merge them if you feel confident) [07:25:43] them == the reverts [07:25:48] i should sleep. good night! 
[07:25:53] <_joe_> night [07:27:21] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [07:27:36] oh it's not about confidence [07:27:44] this is a real error presumably [07:27:58] I'm wondering if people want to fix this before I strip out the debug log calls [07:29:57] <_joe_> paravoid: I'll poke people in core this evening [07:31:56] <_joe_> sigh, image scaling is slightly slower on hhvm that it is on zend mod_php - even when using hhvm's light processes [07:33:11] (03PS3) 10Andrew Bogott: Support bootstrap-vz for buildign labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [07:33:12] <_joe_> well, it's about 5%, so that is well below our potential systematic errors [07:33:49] andrewbogott: s/buildign/building/ :) [07:34:03] andrewbogott: also, did you see my comments here on saturday? [07:34:23] paravoid: mmmmaybe? Regarding what? [07:34:37] (Joe asked me that too, so I must've lost my backscroll at some point) [07:35:07] Grrrrrrr how many people died in the holy war that made nano the default on new distros? [07:35:59] <_joe_> "not nearly enough" [07:36:50] I don't think I can argue that vi is a sensible editor. But I can argue that everytime git launches nano instead of vi I want to stab someone. [07:37:35] andrewbogott: basically your hp-virt change wasn't going to work [07:37:50] it says /dev/xvdb and we have no such disk [07:37:57] (that's a Xen paravirt drive) [07:38:41] and I was saying that in any case, I don't see much point with fighting with partman for this [07:38:49] we can partition/format the nova disk post-install [07:38:53] manually or even with puppet [07:39:05] <_joe_> mmmh very very bad power outage at codfw [07:39:15] ? [07:39:24] <_joe_> they've been without utility power since 1 hour [07:39:36] paravoid: or just switch to /dev/sdb? 
[07:39:42] <_joe_> no impact to us, still, but 1 hour is bad [07:39:54] andrewbogott: this late_command thing is a very big hack [07:40:08] _joe_: RT says it's fixed [07:40:20] _joe_: and, well, I assume they have generators [07:40:29] <_joe_> yeah the last one, I was reading those in a row [07:40:32] it's also Texas, so I assume fuel isn't a problem :P [07:40:35] <_joe_> and yes they do have :) [07:40:42] paravoid: It's a very big hack, but it's also partman. [07:40:55] andrewbogott: right, so why do this there? [07:41:07] paravoid: Doesn't matter -- doing it in puppet is fine. [07:41:08] just let d-i/partman install the basic system [07:41:14] that's what we do in swift fwiw [07:41:20] The immediate issue that I'm having with partman is that it insists I don't have a swap volume. [07:41:22] we don't partition/format all those disks in d-i [07:41:23] Even though… I do [07:43:04] paravoid: I'll try a reinstall without that section… hours from now we will see if that makes it stop complaining about swap. [07:43:11] Unless you see the issue with swap already? [07:43:47] not immediately, no [07:43:51] and I'm a bit crippled atm [07:43:59] I don't even have an up-to-date ops/puppet locally :/ [07:44:11] Backhoe problem? [07:44:16] (03PS1) 10Andrew Bogott: Leave the second volume out of partman. [puppet] - 10https://gerrit.wikimedia.org/r/179869 [07:44:20] I'm on 3g at the moment [07:44:35] carefully counting every bit I transfer [07:44:52] I'm gonna move to a cafe soon I think [07:45:03] <_joe_> I have a cumulative 17Gb of 3G traffic right now [07:45:13] paravoid: yesterday I got a full install on one of the Hps (after clicking manually through the complaint about swap). It took > 12 hours to install the OS. Is that expected? [07:45:19] <_joe_> that's because rome has very few cafes with 3g [07:45:20] of course not :) [07:45:22] <_joe_> or wifi [07:45:48] paravoid: it almost felt like it was downloading packages via the serial port… so slow! 
[07:45:53] andrewbogott: the complaint about swap can be preseeded, if you decide to go without swap (i did something like that for the Ciscos, if memory serves0 [07:46:02] but that's definitely not n ormal [07:46:15] paravoid: I want a swap, and I have one in the partman recipe. I don't know why it says I don't :( [07:46:46] paravoid: If you don't mind, I'll hold off on the reinstall and let you watch it after you relocate. Maybe you'll see why it's so slow. [07:46:57] Also, btw, once the install finished, the box rejected my root key and the root password. [07:47:03] Although possibly my root passwd is out of date. [07:47:28] no, the install doesn't provision our keys [07:47:33] it only provisions the new_install key [07:47:50] so you have to go via palladium with ssh -i .ssh/new_install (or whatever the actual name is) [07:48:13] paravoid: yes, sorry, that's what I meant. Rejected the new install key. [07:48:16] oh [07:49:03] paravoid: I believe that virt1011 is still in that state, if you want to look. [07:49:28] I didn't try very hard, may have been a dumb mistake. [07:51:04] it doesn't even ping [07:51:49] (03PS4) 10Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [07:52:18] oh? Maybe I restarted an install then, sorry. [07:52:19] * andrewbogott checks [07:53:12] (03CR) 10Andrew Bogott: [C: 032] Leave the second volume out of partman. [puppet] - 10https://gerrit.wikimedia.org/r/179869 (owner: 10Andrew Bogott) [07:54:06] paravoid: yeah, serial console is totally unresponsive on virt1011 now. I don't know what it's doing. [07:54:48] Oh! Yeah, it was waiting for me to confirm the install w/out swap. So I must've restarted the install there. [07:55:01] When it takes 10 hours to install, it's pretty easy to think of new things to test before the old tests are done :) [07:55:57] is it a network issue or a disk issue? 
[07:56:12] try ssh'ing while it installs (I added this "feature" last week) [07:56:16] and wget'ing something [07:56:39] with the new_install key? [07:57:19] yes [07:57:47] cool -- I'm restarting the install on virt1010 now, will check once the install is underway. [08:04:57] (03PS1) 10Legoktm: Sample the GlobalTitleFail log at 1:10000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179870 [08:06:25] (03PS5) 10Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [08:20:20] PROBLEM - HTTPS_m.wikimediafoundation.org on cp4002 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:20:20] PROBLEM - HTTPS_m.wikimediafoundation.org on cp1038 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:20:20] PROBLEM - HTTPS_wikimediafoundation.org on cp3022 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:20:20] PROBLEM - HTTPS_m.wiktionary.org on cp1065 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:20:20] PROBLEM - HTTPS_wikimediafoundation.org on cp4019 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:20:20] PROBLEM - HTTPS_wikiversity.org on amssq37 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:20:20] PROBLEM - HTTPS_wikivoyage.org on cp1062 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:22:34] RECOVERY - HTTPS_m.wikibooks.org on cp4010 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 342 days) [08:22:34] RECOVERY - HTTPS_m.wikimediafoundation.org on cp4002 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 342 days) [08:22:34] RECOVERY - HTTPS_m.wikimediafoundation.org on cp1038 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization 
Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 342 days) [08:22:35] RECOVERY - HTTPS_wikinews.org on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:09 2015 GMT (expires in 342 days) [08:22:35] RECOVERY - HTTPS_wikivoyage.org on cp1062 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 342 days) [08:22:35] RECOVERY - HTTPS_wikibooks.org on cp3015 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 342 days) [08:22:35] RECOVERY - HTTPS_m.wikiquote.org on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.m.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:07 2015 GMT (expires in 342 days) [08:22:36] RECOVERY - HTTPS_m.wiktionary.org on cp1065 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 342 days) [08:22:36] RECOVERY - HTTPS_zero.wikipedia.org on cp4011 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 342 days) [08:33:51] greetings [08:39:32] (03PS2) 10Giuseppe Lavagetto: apt: fix for failure case [puppet] - 10https://gerrit.wikimedia.org/r/179472 [08:44:05] (03PS6) 10Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [08:45:22] (03CR) 10Giuseppe Lavagetto: [C: 032] apt: fix for failure case [puppet] - 10https://gerrit.wikimedia.org/r/179472 (owner: 10Giuseppe Lavagetto) [08:46:46] RECOVERY - Disk space on fluorine is OK: DISK OK [08:48:25] PROBLEM - HTTPS on 
antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:50:01] (03PS2) 10Giuseppe Lavagetto: exim: fix compilation warnings [puppet] - 10https://gerrit.wikimedia.org/r/179485 [08:51:33] paravoid: ok, virt1010 is installing again. You think I should be able to # ssh -i /root/.ssh/new_install root@virt1010.eqiad.wmnet ? [08:52:32] (03CR) 10Giuseppe Lavagetto: [V: 032] exim: fix compilation warnings [puppet] - 10https://gerrit.wikimedia.org/r/179485 (owner: 10Giuseppe Lavagetto) [08:52:44] (03CR) 10Giuseppe Lavagetto: [C: 032] exim: fix compilation warnings [puppet] - 10https://gerrit.wikimedia.org/r/179485 (owner: 10Giuseppe Lavagetto) [08:55:05] <_joe_> ok so now the big offender in terms of puppet warnings is openstack [08:55:18] <_joe_> andrewbogott: use @variable in templates :) [08:55:44] ok :) Some of those templates are ancient. [08:56:22] <_joe_> andrewbogott: I'll fix those this week, I've set "no puppet warnings by the end of 2014" as a personal goal of mine :) [08:56:29] great! [09:11:04] (03PS1) 10Hashar: mwdeploy private key is only for production [puppet] - 10https://gerrit.wikimedia.org/r/179875 [09:14:46] (03CR) 10Hashar: [V: 031] "Applied on the beta cluster puppetmaster. That solves the puppet issue on deployment-bastion." [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [09:17:13] <_joe_> hashar: I don't like this at all [09:17:16] <_joe_> tbh [09:17:58] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "Does beta have a completely different deployment mechanism than production? 
I think we should just find a way to keep the deployment mecha" [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [09:18:11] <_joe_> hashar: so keep it as a cherry-pick on beta for now [09:18:24] _joe_: yeah that is the idea :D [09:18:35] I have no idea what the key holder is nor how it is used in prod [09:18:49] <_joe_> hashar: ok, I will take a look at this [09:18:55] nor do I have any idea how we hold the mwdeploy key on beta. Most probably it is on the homedir of beta cluster deployment and has been generated manually [09:18:56] <_joe_> whenever I have time [09:19:59] mukunda is going to help a lot for the beta cluster now that Phabricator is {done} [09:20:44] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 14 data above and 9 below the confidence bounds [09:29:47] (03CR) 1020after4: "Giuseppe: While I agree that we should endeavour to keep production and beta similar, the problem seems to be that the key isn't actually " [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [09:31:06] (03CR) 10Hashar: "Beta is using scap as well (thanks to Bryan) from the instance deployment-bastion, triggered by a Jenkins job and relying on the mwdeploy " [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [09:32:35] (03PS1) 10Giuseppe Lavagetto: install-server: add relevant entries for einsteinium [puppet] - 10https://gerrit.wikimedia.org/r/179876 [09:34:00] (03CR) 1020after4: "Then the real solution is to completely remove keyholder from beta?" [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [09:34:20] (03CR) 10Giuseppe Lavagetto: "My point is exactly we should provide a key in beta as well, I don't think that's too hard to do. 
So my advice is - keep this cherry-pick" [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [09:37:49] (03PS2) 10Giuseppe Lavagetto: install-server: add relevant entries for einsteinium [puppet] - 10https://gerrit.wikimedia.org/r/179876 [09:38:24] paravoid: I'm stepping away for a while -- if you have the inclination, please investigate the s-l-o-w install happening on virt1010 (or clarify how I can ssh in and see for myself). Thanks! [09:38:26] (03CR) 10Giuseppe Lavagetto: [C: 032] install-server: add relevant entries for einsteinium [puppet] - 10https://gerrit.wikimedia.org/r/179876 (owner: 10Giuseppe Lavagetto) [09:38:51] (03CR) 10Hashar: "Assuming unattended upgrade is a Debian mechanism to let apt automatically upgrade a package when it see it: I am not sure how different i" [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [09:40:23] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 629 [09:40:27] (03PS1) 10Giuseppe Lavagetto: install-server: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/179877 [09:40:58] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] install-server: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/179877 (owner: 10Giuseppe Lavagetto) [09:45:17] RECOVERY - check_mysql on db1008 is OK: Uptime: 5257105 Threads: 73 Questions: 149053371 Slow queries: 35915 Opens: 99968 Flush tables: 2 Open tables: 64 Queries per second avg: 28.352 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [09:48:39] !log Upgrading composer on CI to v1.0.0-alpha9 {{gerrit|178550}} [09:48:42] Logged the message, Master [09:53:14] andrewbogott: not pinging again... [09:53:16] I'll debug [10:02:15] paravoid: thanks. 
it might not be far enough along yet… it is /very/ slow [10:02:54] andrewbogott: when the swap question comes [10:02:57] don't hit yet [10:03:02] just try to go back [10:03:12] to the menu, where you can spawn a shell [10:10:14] paravoid: I'm about to go to dinner -- I can try when I get back, or you can try if/when the prompt arrives. [10:10:22] New partman script, it's vaguely possible it will get past this time. [10:14:22] (03PS7) 10Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [10:16:15] sorry to run off in the middle of this :( [10:16:37] lol, don't worry at all :) [10:19:36] I'm gonna guess andrewbogott_afk is in Asia somewhere? :O [10:40:24] (03CR) 10Faidon Liambotis: "Of course it will be different, the puppet code will remain the same. The fact that you need up-to-date software in CI it's not something " [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [10:44:31] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [10:50:35] (03PS1) 10Faidon Liambotis: phabricator: strip Ubuntu 12.04 (precise) support [puppet] - 10https://gerrit.wikimedia.org/r/179882 [10:57:26] (03PS1) 10Faidon Liambotis: graphite: remove $::lsbdistcodename branch [puppet] - 10https://gerrit.wikimedia.org/r/179884 [11:02:44] (03CR) 10Filippo Giunchedi: [C: 032] graphite: remove $::lsbdistcodename branch [puppet] - 10https://gerrit.wikimedia.org/r/179884 (owner: 10Faidon Liambotis) [11:03:00] paravoid: gah @ that if/else [11:03:45] gah @ the proliferation of $::lsb branches [11:04:27] hehe indeed, have you come across many so far with the debian work? 
[11:04:32] yaeh [11:04:33] yeah [11:04:59] well, not with the debian work specifically, it's not like I've tried running graphite on Debian [11:05:18] but starting with that, I started grepping the tree for $lsb and friends [11:06:12] *nod* [11:07:08] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: puppet fail [11:09:37] bah, 502 from puppet server [11:10:01] I would have expected puppet to retry [11:10:42] (03PS1) 10Faidon Liambotis: ganglia: convert $lsb check to os_version() [puppet] - 10https://gerrit.wikimedia.org/r/179885 [11:19:42] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [11:22:25] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [11:23:38] (03PS1) 10Faidon Liambotis: (WIP) Remove support for Ubuntu Lucid/10.04 [puppet] - 10https://gerrit.wikimedia.org/r/179888 [11:26:55] (03PS1) 10Faidon Liambotis: rcstream: remove requires_os('ubuntu >= trusty') [puppet] - 10https://gerrit.wikimedia.org/r/179889 [11:31:12] (03PS3) 10Dzahn: openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 [11:31:15] (03CR) 10jenkins-bot: [V: 04-1] openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 (owner: 10Dzahn) [11:32:57] (03PS4) 10Dzahn: openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 [11:34:58] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:35:00] (03PS1) 10Yuvipanda: Remove stupidest ever 80col limitation [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179891 [11:35:02] (03PS1) 10Yuvipanda: Split tableschema.yaml into whitelisted and greylisted [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179892 [11:35:22] (03CR) 10Yuvipanda: [C: 032 V: 032] Remove stupidest ever 80col limitation 
[software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179891 (owner: 10Yuvipanda) [11:35:37] (03CR) 10Yuvipanda: [C: 032 V: 032] Split tableschema.yaml into whitelisted and greylisted [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179892 (owner: 10Yuvipanda) [11:38:18] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/179444 (owner: 10Dzahn) [11:43:09] pfff [11:43:14] gotta restart Jenkins [11:44:42] or not [11:44:46] it is bugged beyond repair [11:45:46] (03PS2) 10Dereckson: Add en.wikisource to global abuse filters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179864 (owner: 10TTO) [11:46:10] (03CR) 10Dereckson: [C: 031] Add en.wikisource to global abuse filters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179864 (owner: 10TTO) [11:51:35] !log Zuul: clearing out some old zuul git references ( https://phabricator.wikimedia.org/T70481 ). Running in a screen on gallium [11:51:39] (03PS1) 10Yuvipanda: Remove all replacewith: null from greylists [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179896 [11:51:41] Logged the message, Master [11:52:11] (03CR) 10Yuvipanda: [C: 032 V: 032] Remove all replacewith: null from greylists [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179896 (owner: 10Yuvipanda) [11:55:19] (03PS1) 10Yuvipanda: Don't use the default terrible pyyaml flow style [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179897 [11:55:38] (03CR) 10Yuvipanda: [C: 032 V: 032] Don't use the default terrible pyyaml flow style [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179897 (owner: 10Yuvipanda) [11:55:49] i think you can use jenkins again, YuviPanda [11:55:59] mutante: this doesn’t have jenkins set up yet [11:56:04] i see [11:56:20] mostly because laaazzyzyyy [12:06:05] (03PS1) 10Yuvipanda: Fix where clauses in greylist [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179899 [12:06:07] (03PS1) 10Yuvipanda: Remove unused sys import in 
bootstrapper [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179900 [12:06:22] (03CR) 10Yuvipanda: [C: 032 V: 032] Fix where clauses in greylist [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179899 (owner: 10Yuvipanda) [12:06:35] (03CR) 10Yuvipanda: [C: 032 V: 032] Remove unused sys import in bootstrapper [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179900 (owner: 10Yuvipanda) [12:52:17] (03CR) 10Alexandros Kosiaris: [C: 032] "I can't stop thinking of systemd when I see stdout redirection. Anyway, I see you have already tested it on (at least) xenon and it works " [puppet] - 10https://gerrit.wikimedia.org/r/179764 (owner: 10GWicke) [12:55:54] godog: did you have to restart diamond everywhere after you rolled out the new version? [12:56:03] I see a lot of metric instability over the last few days on labs [12:56:24] (03CR) 10Dzahn: "uploaded 11-26 but comments on gerrit or phab task, set task to stalled, then suggesting to abandon, can be recreated by anyone" [puppet] - 10https://gerrit.wikimedia.org/r/175889 (owner: 10Dzahn) [12:57:06] YuviPanda: upgrading the package should take care of that, so "yes" since the upgrade isn't automatic [12:57:15] YuviPanda: what do you mean by metric instability? 
[12:57:17] hmm [12:57:31] godog: lots of ‘stuck’ metrics (same ones as from some time ago, repeated by txstatsd) [12:57:37] restarting diamond on affected host ‘fixes’ it [12:57:43] and it starts properly reporting metrics again [12:58:00] ack [12:58:09] nothing of note found in the diamond logs (there was one stacktrace, but that seemed related to the sudo outage on labs from a while ago) [12:58:57] godog: the graphite machine itself doesn’t seem to be suffering, has fairly low io wait (~3%) [12:58:58] YuviPanda: I'm about to merge https://gerrit.wikimedia.org/r/#/c/179423/ which is related to the (non) logs [12:59:43] (03Abandoned) 10Dzahn: add virt1000 to scap dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/175889 (owner: 10Dzahn) [13:02:31] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] send log to stdout with upstart and systemd [debs/python-diamond] - 10https://gerrit.wikimedia.org/r/179423 (owner: 10Filippo Giunchedi) [13:04:14] godog: so a restart fixed the issue, at least on deployment-prep. I don’t think they’ve been upgraded at all either [13:04:21] since I presume we don’t have ensure => latest [13:04:46] we do not indeed [13:06:17] I’m tempted to do a clusterwide restart... [13:07:51] paravoid: I'm back for a few minutes… turn up anything? [13:07:51] YuviPanda: in labs it might have to do with python-statsd dep missing also [13:07:58] godog: did you do the upgrade via salt? are all prod machines upgraded? [13:08:00] godog: oh? [13:08:05] but why does a restart fix it [13:08:10] and why would it be missing in labs... [13:10:33] in labs because for changes like this we have to chase individual puppetmasters [13:11:26] not sure of the specific problem though, is there an instance where metrics are stuck? [13:12:02] (03CR) 10Alexandros Kosiaris: [C: 032] Added initial Debian packaging [debs/contenttranslation/hfst] - 10https://gerrit.wikimedia.org/r/179153 (owner: 10KartikMistry) [13:12:06] godog: yeah, a few. look at limn1? 
[13:12:09] (03CR) 10Alexandros Kosiaris: [V: 032] Added initial Debian packaging [debs/contenttranslation/hfst] - 10https://gerrit.wikimedia.org/r/179153 (owner: 10KartikMistry) [13:12:22] or integration-slave1001 [13:13:16] !log uploaded hfst_3.8.1~r4088-1 to apt.wikimedia.org (trusty) [13:13:23] Logged the message, Master [13:13:24] YuviPanda: EACCESS [13:13:32] godog: heh, let me add you. [13:14:24] godog: what’s your labs username again? [13:14:27] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Added initial Debian packaging [debs/contenttranslation/apertium-sv-da] - 10https://gerrit.wikimedia.org/r/179428 (owner: 10KartikMistry) [13:15:00] YuviPanda: filippo :) [13:15:14] ah, I had two ls this time [13:15:27] godog: added you, limn1 should be accessible [13:17:27] godog: all of this goes into labmon1001, which hosts txstatsd and graphite [13:18:13] YuviPanda: mmhh ok if the instances are precise then no, diamond hasn't changed there [13:18:33] hmm [13:18:49] I wonder if labmon1001 is getting overloaded [13:20:53] check if it is dropping udp [13:21:09] godog: huh, interesting. labmon1001 itself hasn’t been sending data to graphite.wikimedia.org for a while?! [13:21:15] https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1418649636.81&target=servers.labmon1001.cpu.total.iowait.value&from=00%3A00_20141209&until=23%3A59_20141215&format=json is all null [13:22:25] since the 9th looks like? 
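The "all null" observation above can be checked mechanically. Graphite's render endpoint with `format=json` returns a list of series, each with a `target` and a list of `[value, timestamp]` datapoints. The sketch below, in Python, finds the last timestamp at which each series reported a real value. The sample payload is made up for illustration (including the `servers.example...` target), and `last_reported` is a hypothetical helper, not part of Graphite.

```python
import json


def last_reported(render_json):
    """Map each target to the timestamp of its last non-null datapoint, or None."""
    result = {}
    for series in json.loads(render_json):
        stamps = [ts for value, ts in series["datapoints"] if value is not None]
        result[series["target"]] = max(stamps) if stamps else None
    return result


# Made-up payload in the shape returned by render?format=json.
sample = json.dumps([
    {"target": "servers.labmon1001.cpu.total.iowait.value",
     "datapoints": [[None, 1418083200], [None, 1418083260]]},
    {"target": "servers.example.cpu.total.iowait.value",
     "datapoints": [[2.9, 1418083200], [None, 1418083260]]},
])

if __name__ == "__main__":
    for target, ts in last_reported(sample).items():
        print(target, "last reported at", ts)
```

A target that maps to `None` has no data in the requested window, which matches what was seen for labmon1001 here.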
[13:22:49] yeah [13:24:18] have to step back for a few, bbiab [13:24:31] (03PS5) 10Dzahn: openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 [13:24:39] ok [13:24:41] other hosts seem fine [13:24:53] godog: I’m going to let it rest like that for a while in case you want to investigate [13:24:57] (rather than restarting diamond) [13:25:18] because I see statsd related errors in the diamond logs [13:25:24] (03CR) 10Dzahn: [C: 032] openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 (owner: 10Dzahn) [13:29:04] godog: actually, I think diamond stopped reporting metrics on 12/05, since everything after that is just a repeated value, courtesy txstatsd [13:29:08] plus this is trusty [13:36:28] (03PS4) 10Dzahn: udp2log: rsync, add ferm service for rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/179438 [13:41:12] (03PS6) 10Dzahn: let bastion hosts have base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/96424 [13:50:01] (03PS7) 10Dzahn: let bastion hosts have base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/96424 [13:50:28] (03CR) 10Dzahn: let bastion hosts have base::firewall (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [13:50:51] !log reclaiming zinc to spares, stopped puppet agent [13:50:55] Logged the message, Master [13:54:18] YuviPanda: looking [13:56:58] YuviPanda: mhh diamond isn't running? 
[13:57:43] godog: yeah, look at the log, it says it can’t find settings [13:59:58] !log zinc removed from icinga, system is now shutdown for reclaim per RT8939 [14:00:04] Logged the message, Master [14:01:07] !log reinstall python-twisted-bin python-twisted-core python-twisted-web on labmon1001 [14:01:12] Logged the message, Master [14:01:12] YuviPanda: sigh, an instance of https://phabricator.wikimedia.org/P151 [14:01:32] aaah, bah [14:01:41] I wonder which other hosts have the same thing [14:01:51] YuviPanda: still have to track it down exactly what makes it behave like that, seems twisted related [14:02:01] diamond uses twisted? [14:02:07] only machines running graphite when I checked [14:02:13] ah, hmm [14:02:15] right [14:02:27] but how does that affect *diamond*, shouldn’t that affect txstatsd? [14:03:32] sigh I got that backwards, sec [14:05:53] (03CR) 10Dzahn: "https://phabricator.wikimedia.org/T40799" [dns] - 10https://gerrit.wikimedia.org/r/171769 (https://bugzilla.wikimedia.org/38799) (owner: 10Dzahn) [14:06:56] YuviPanda: ok not twisted by statsd like you pointed out, as it turns out the previous version of python-statsd had a bug with django, fixed by the latest python-statsd release I've uploaded last week [14:07:07] however I completely missed labmon1001 being trusty and thus affected [14:07:17] ‘aaah [14:07:18] aaah [14:14:23] (03PS7) 10Dzahn: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 [14:30:18] (03PS3) 10RobH: adding Matthias Mullie to statstics-users [puppet] - 10https://gerrit.wikimedia.org/r/179140 [14:31:41] (03CR) 10RobH: [C: 032] adding Matthias Mullie to statstics-users [puppet] - 10https://gerrit.wikimedia.org/r/179140 (owner: 10RobH) [15:01:16] (03CR) 10Filippo Giunchedi: [C: 031] let bastion hosts have base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [15:05:32] (03CR) 10Faidon Liambotis: [C: 031] "Nice!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/179493 (owner: 10Alexandros Kosiaris) [15:07:49] (03CR) 10Faidon Liambotis: [C: 031] Delete the install-server::caching-proxy class [puppet] - 10https://gerrit.wikimedia.org/r/179487 (owner: 10Alexandros Kosiaris) [15:07:53] (03CR) 10Faidon Liambotis: [C: 031] Remove module url_downloader [puppet] - 10https://gerrit.wikimedia.org/r/179488 (owner: 10Alexandros Kosiaris) [15:19:43] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [15:21:35] (03CR) 10Rush: "I can really give this some attention after wednesday, but in general thank you! Been meaning to clean this up." [puppet] - 10https://gerrit.wikimedia.org/r/179882 (owner: 10Faidon Liambotis) [15:25:20] (03CR) 10Alexandros Kosiaris: [C: 032] Remove module url_downloader [puppet] - 10https://gerrit.wikimedia.org/r/179488 (owner: 10Alexandros Kosiaris) [15:25:26] anomie: I can do swat today [15:26:03] manybubbles: Ok! [15:30:43] gi11es: it looks like https://gerrit.wikimedia.org/r/#/c/179918 is already merged but it is slated for deploy in half an hour. what is up? [15:30:53] wait, sorry, reading it wrong [15:30:55] ignore me [15:30:57] (03CR) 10Alexandros Kosiaris: [C: 032] Delete the install-server::caching-proxy class [puppet] - 10https://gerrit.wikimedia.org/r/179487 (owner: 10Alexandros Kosiaris) [15:31:10] !log upload diamond 3.5-3 to trusty-wikimedia [15:31:15] Logged the message, Master [15:31:23] manybubbles: I can do it instead; I have another patch on the way and gi11es has his [15:31:32] marktraceur: if you want! [15:31:45] (03CR) 10Manybubbles: [C: 031] Update entity suggester blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179469 (owner: 10Hoo man) [15:31:59] Assuming I can get the submodule patches ready in time -.- [15:32:25] :) [15:32:50] I'm in the I section! [15:33:38] swat time .... 
almost [15:33:53] 1 + 1 + 1 + 1 :) [15:34:21] <3 [15:34:37] (03PS1) 10Alexandros Kosiaris: Move the top level variables in phabricator role in a class [puppet] - 10https://gerrit.wikimedia.org/r/179930 [15:35:03] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:42:16] (03PS1) 10Alexandros Kosiaris: Followup commit to 8f1ddeb [puppet] - 10https://gerrit.wikimedia.org/r/179935 [15:43:34] PROBLEM - url_downloader on chromium is CRITICAL: Connection refused [15:43:59] (03CR) 10Alexandros Kosiaris: [C: 032] Followup commit to 8f1ddeb [puppet] - 10https://gerrit.wikimedia.org/r/179935 (owner: 10Alexandros Kosiaris) [15:45:25] and that icinga url_downloader thing should go just about ... now [15:45:55] \o/ [15:46:17] comment icinga-wm... you are making me look bad ... [15:46:23] come on... [15:46:48] RECOVERY - url_downloader on chromium is OK: TCP OK - 0.013 second response time on port 8080 [15:47:03] there you go [15:47:14] the bribing worked! [15:47:55] :-) [15:48:52] <_joe_> lol [15:54:31] (03PS2) 10Alexandros Kosiaris: Add README, RSpecs and tests for squid3 module [puppet] - 10https://gerrit.wikimedia.org/r/179493 [15:56:24] OK, gi11es, aude, y'all ready for this? [15:56:33] yep [15:56:34] * aude here [15:56:45] (03CR) 10Alexandros Kosiaris: Add README, RSpecs and tests for squid3 module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/179493 (owner: 10Alexandros Kosiaris) [15:57:01] Sitelink UI fixes? [15:57:04] yes [15:57:10] :) [15:57:19] and view entity action / deferred deserialization [15:57:21] * marktraceur puts on the Space Jam theme to get psyched [15:57:31] * aude doesn't want to wait until after new years for those [15:57:44] definitely [15:58:25] aude: Any particular order for these two patches of yours? [15:58:31] no [15:58:38] Cool. [15:58:59] Do you plan to push my config. change? 
[15:59:00] (03CR) 10Alexandros Kosiaris: [C: 032] udp2log: rsync, add ferm service for rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/179438 (owner: 10Dzahn) [15:59:13] that too [15:59:18] argh [15:59:19] don't [15:59:20] it has +4 :) [15:59:24] ? [15:59:33] marktraceur: ^ [15:59:43] aude: don't [15:59:50] (03CR) 10Hoo man: [C: 04-1] "Shouldn't go live before we update the entity suggester data, as otherwise the now deleted properties might come up..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179469 (owner: 10Hoo man) [16:00:05] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141215T1600). [16:00:05] marktraceur: hoo wants to wait on the config patch [16:00:08] Yup [16:00:20] I'm merging your wmf12 patch first. [16:00:24] Not sure that's an actual issue, but no benefit in risking anything [16:00:28] ok [16:03:36] Jenkins is derping. [16:04:50] hoo: or could remove those entries from the table [16:04:54] <_joe_> hey DO NOT DEPLOY [16:04:54] we can wait though [16:05:12] aude: We can just finish the update stuff [16:05:12] _joe_: OK! [16:05:14] <_joe_> beta is down because of a bug [16:05:15] hoo: ok [16:05:19] <_joe_> see https://gerrit.wikimedia.org/r/#/c/179932/ [16:05:20] the script is not that broken [16:05:22] Gotcha [16:05:36] _joe_: I have one patch getting merged into wmf12, but I can wait [16:05:45] Let me know when it's ready [16:06:01] only P71 has entries [16:06:06] <_joe_> marktraceur: the problem is here probably https://gerrit.wikimedia.org/r/#/c/142046/16 [16:06:11] _joe_: It's OK [16:06:26] That code isn't in any branch that will be deployed in 2014 [16:06:44] <_joe_> RoanKattouw: oh it's in master, true [16:07:02] So...go? 
[16:07:04] It only broke in master this morning [16:07:05] <_joe_> marktraceur: so, go on then :) [16:07:07] Yeah never mind us [16:07:35] _joe_: You wanna just merge that patch and see if it fixes beta? [16:07:45] <_joe_> RoanKattouw: yep [16:07:55] I checked and eswiki does in fact have the VipsScaler extension [16:08:09] <_joe_> mh [16:08:11] So it would seem that the segfault is caused by a specific access pattern of $params [16:08:18] <_joe_> yes [16:08:35] <_joe_> I was trying to make hhvm crash with a test script and I couldn't until now [16:09:00] (03CR) 10BryanDavis: "Keys can be placed in beta via local commits in deployment-salt:/var/lib/git/labs/private. That is how the ssh keypair for beta's scap wra" [puppet] - 10https://gerrit.wikimedia.org/r/179875 (owner: 10Hashar) [16:09:38] <_joe_> RoanKattouw: wait for jenkins and it will be merged [16:09:44] Cool [16:09:53] Once merged it'll automatically be deployed to beta [16:09:59] There's a job that runs every 5-10 minutes on a timer [16:11:19] <_joe_> RoanKattouw: it appears the jenkins bot is extremely slow there [16:11:49] Yes [16:11:54] aude: OK, going. [16:11:56] There are some slow jobs it runs on merge [16:11:59] ok [16:12:08] !log marktraceur Synchronized php-1.25wmf12/extensions/Wikidata/: [SWAT] [wmf12] - Update test.wikidata (fixes/polish for changes to the site link section, and performance improvements for page views). (duration: 00m 24s) [16:12:10] aude: Test! [16:12:14] testing [16:12:14] Logged the message, Master [16:12:15] A desk near me has a monitor just showing https://integration.wikimedia.org/zuul/ all day [16:12:33] (near me as in near my normal desk in SF, I'm in Europe right now) [16:13:12] looks ok [16:13:42] Sweet! [16:13:44] gi11es is next. [16:13:51] although i wonder if we have more patches to go in [16:13:59] we might come back at next swat [16:14:09] Well, hoo -1'd his. 
If you have more, I'm sure the evening folks can help :) [16:14:11] just seems like something is missing but nothing broken [16:14:14] yeah :) [16:18:23] (03PS1) 10Giuseppe Lavagetto: wikidata-query: give root on the test machine to the group [puppet] - 10https://gerrit.wikimedia.org/r/179943 [16:18:25] (03PS1) 10Giuseppe Lavagetto: admins: grant access to Stas Malyshev [puppet] - 10https://gerrit.wikimedia.org/r/179944 [16:20:03] gi11es: Is that patch already in wmf12? [16:20:14] marktraceur: yes [16:20:19] Awesome. [16:21:17] !log marktraceur Synchronized php-1.25wmf11/extensions/MultimediaViewer/: [SWAT] [wmf11] - Track the most recent upload time for performance events (Media Viewer) (duration: 00m 05s) [16:21:18] gi11es: Test? [16:21:24] Logged the message, Master [16:22:10] marktraceur: the data's started pouring in, it works [16:22:11] thanks [16:22:15] Sweet. [16:22:34] On to marktraceur's patches. [16:22:39] marktraceur: Ready to test? [16:22:42] marktraceur: Yup. [16:23:13] It occurs to me I can't test without commons adminship. 
[16:28:48] (03CR) 10Ori.livneh: [C: 031] rcstream: remove requires_os('ubuntu >= trusty') [puppet] - 10https://gerrit.wikimedia.org/r/179889 (owner: 10Faidon Liambotis) [16:29:26] (03CR) 10Ori.livneh: [C: 031] ganglia: convert $lsb check to os_version() [puppet] - 10https://gerrit.wikimedia.org/r/179885 (owner: 10Faidon Liambotis) [16:30:15] (03PS2) 10Faidon Liambotis: ganglia: convert $lsb check to os_version() [puppet] - 10https://gerrit.wikimedia.org/r/179885 [16:30:22] (03CR) 10Faidon Liambotis: [C: 032 V: 032] ganglia: convert $lsb check to os_version() [puppet] - 10https://gerrit.wikimedia.org/r/179885 (owner: 10Faidon Liambotis) [16:30:31] !log marktraceur Synchronized php-1.25wmf11/extensions/UploadWizard/: [SWAT] [wmf11] Fix Flickr imports in UploadWizard (duration: 00m 05s) [16:30:34] (03PS2) 10Faidon Liambotis: rcstream: remove requires_os('ubuntu >= trusty') [puppet] - 10https://gerrit.wikimedia.org/r/179889 [16:30:35] Logged the message, Master [16:30:42] (03CR) 10Faidon Liambotis: [C: 032 V: 032] rcstream: remove requires_os('ubuntu >= trusty') [puppet] - 10https://gerrit.wikimedia.org/r/179889 (owner: 10Faidon Liambotis) [16:30:51] Waiting on a fellow in -commons to help me test it [16:31:47] (03PS4) 10Ori.livneh: Add 'xenon' module for aggregating ext_xenon-produced traces [puppet] - 10https://gerrit.wikimedia.org/r/179791 [16:31:56] (03CR) 10Ori.livneh: [C: 032 V: 032] Add 'xenon' module for aggregating ext_xenon-produced traces [puppet] - 10https://gerrit.wikimedia.org/r/179791 (owner: 10Ori.livneh) [16:32:53] ori: that seems large enough to warrant an extra set of eyes [16:33:07] a look by* [16:33:12] !log marktraceur Synchronized php-1.25wmf12/extensions/UploadWizard/: [SWAT] [wmf12] Fix Flickr imports in UploadWizard (duration: 00m 05s) [16:33:17] Logged the message, Master [16:33:20] paravoid: do you feel like reviewing? [16:33:22] Deploys done, still waiting on help to test it. 
[16:33:23] <_joe_> RoanKattouw_away: we did it [16:33:42] <_joe_> ori: I would have loved to review that, but I spent the afternoon chasing segfaults in beta :/ [16:33:48] paravoid: the role isn't applied yet [16:34:12] _joe_: sucks, what was the cause? [16:34:16] <_joe_> ori: and btw, this crashes are something we should reproduce and fix [16:34:40] yeah, what were the crashes? [16:34:40] <_joe_> ori: cause https://gerrit.wikimedia.org/r/#/c/142046/16 fix https://gerrit.wikimedia.org/r/#/c/179932/ [16:35:03] <_joe_> sorry, wrong link [16:35:08] Seems fine now [16:35:12] SWAT is OVER!!! [16:35:13] <_joe_> uhm, no [16:35:33] Hrm? [16:35:43] <_joe_> marktraceur: not related :) [16:35:47] Okay then. [16:36:07] <_joe_> ori: I've also been trying to create a repro case now, but no dice for now [16:37:13] <_joe_> a simpler case that is, the enwiki main page was crashing beta :P [16:37:41] * ori nods [16:38:14] paravoid: ? [16:38:35] (03PS1) 10Ori.livneh: Apply role::xenon on fluorine [puppet] - 10https://gerrit.wikimedia.org/r/179949 [16:43:50] (03CR) 10Nemo bis: [C: 031] Sample the GlobalTitleFail log at 1:10000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179870 (owner: 10Legoktm) [16:45:03] * hoo cries about logrotate [16:52:58] (03CR) 10Ori.livneh: [C: 032] Apply role::xenon on fluorine [puppet] - 10https://gerrit.wikimedia.org/r/179949 (owner: 10Ori.livneh) [16:53:06] (03PS3) 10BryanDavis: Optional MWLoggerMonologSpi configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179369 [16:55:38] ori: sorry, no, not right now [16:56:07] (03PS1) 10Ori.livneh: Typo fix ('xenon' -> 'xenon-log') [puppet] - 10https://gerrit.wikimedia.org/r/179953 [16:56:30] (03CR) 10Ori.livneh: [C: 032 V: 032] Typo fix ('xenon' -> 'xenon-log') [puppet] - 10https://gerrit.wikimedia.org/r/179953 (owner: 10Ori.livneh) [16:58:59] (03PS1) 10Mark Bergsma: Revert "Add 'xenon' module for aggregating ext_xenon-produced traces". 
[puppet] - 10https://gerrit.wikimedia.org/r/179954 [16:59:07] (03CR) 10jenkins-bot: [V: 04-1] Revert "Add 'xenon' module for aggregating ext_xenon-produced traces". [puppet] - 10https://gerrit.wikimedia.org/r/179954 (owner: 10Mark Bergsma) [16:59:56] (03PS1) 10Mark Bergsma: Revert "Apply role::xenon on fluorine" [puppet] - 10https://gerrit.wikimedia.org/r/179955 [17:00:17] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:42] mark: why? [17:01:53] because such changes should see review first? [17:02:11] as we agreed on a million times now [17:02:39] this is a perf monitoring role, squarely within what we said was ok for me to do [17:02:52] "small fixes" yes [17:02:52] no impact on an existing production service [17:03:04] there's no need to rush this through [17:03:51] of course you've already applied this on the host so i'm not going to make a bigger mess [17:04:08] it'll unapply cleanly with ensure => absent [17:04:13] would you like me to do that? [17:04:17] there's no point [17:04:26] i just want you to wait for reviews. [17:04:38] (03PS2) 10Alexandros Kosiaris: Move the top level variables in phabricator role in a class [puppet] - 10https://gerrit.wikimedia.org/r/179930 [17:04:46] (03Abandoned) 10Mark Bergsma: Revert "Add 'xenon' module for aggregating ext_xenon-produced traces". [puppet] - 10https://gerrit.wikimedia.org/r/179954 (owner: 10Mark Bergsma) [17:04:56] (03Abandoned) 10Mark Bergsma: Revert "Apply role::xenon on fluorine" [puppet] - 10https://gerrit.wikimedia.org/r/179955 (owner: 10Mark Bergsma) [17:05:14] jouncebot: next [17:05:15] In 3 hour(s) and 54 minute(s): Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141215T2100) [17:05:48] jouncebot: Why did you forget to announce my slot? [17:05:57] jouncebot: refresh [17:05:59] I refreshed my knowledge about deployments. [17:07:01] I'm going to deploy my monolog config changes now. 
[17:07:23] (03PS2) 10BryanDavis: Introduce wmgUseMonologLogger feature flag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179368 [17:08:25] (03CR) 10BryanDavis: [C: 032] Introduce wmgUseMonologLogger feature flag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179368 (owner: 10BryanDavis) [17:08:30] (03Merged) 10jenkins-bot: Introduce wmgUseMonologLogger feature flag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179368 (owner: 10BryanDavis) [17:08:33] (03CR) 10Rush: "Alex, you are a gentlemen and a scholar. thanks." [puppet] - 10https://gerrit.wikimedia.org/r/179930 (owner: 10Alexandros Kosiaris) [17:10:03] !log bd808 Synchronized wmf-config/InitialiseSettings.php: Introduce wmgUseMonologLogger feature flag [I61fa967] (duration: 00m 07s) [17:10:07] Logged the message, Master [17:10:42] (03CR) 10GWicke: "@Alex: I dislike the piping part too, but didn't see another solution that reliably captures output when nothing else is working any more " [puppet] - 10https://gerrit.wikimedia.org/r/179764 (owner: 10GWicke) [17:11:13] (03PS4) 10BryanDavis: Optional MWLoggerMonologSpi configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179369 [17:12:25] (03CR) 10BryanDavis: [C: 032] Optional MWLoggerMonologSpi configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179369 (owner: 10BryanDavis) [17:12:30] (03Merged) 10jenkins-bot: Optional MWLoggerMonologSpi configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179369 (owner: 10BryanDavis) [17:13:44] !log bd808 Synchronized wmf-config: Optional MWLoggerMonologSpi configuration [I720f2cb] (duration: 00m 05s) [17:13:51] Logged the message, Master [17:14:08] !log bd808 Synchronized docroot/noc/createTxtFileSymlinks.sh: Optional MWLoggerMonologSpi configuration [I720f2cb] (duration: 00m 05s) [17:14:12] Logged the message, Master [17:15:37] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:17:36] ori: 
Dec 15 17:14:51 mw1199: #012Warning: Invalid argument supplied for foreach() in /srv/mediawiki/wmf-config/StartProfiler.php on line 111 -- Looks like that comes from xenon samples that don't have a phpStack array key [17:18:00] bd808: i think the cause for that is that hhvm doesn't pick up changes for a changed closure somehow [17:18:19] yuck. [17:18:39] yeah. [17:18:40] <^d> :( [17:18:48] The HHVM bug _joe_ and I found today was interesting too [17:18:57] (03PS2) 10BryanDavis: Enable MWLoggerMonologSpi for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179370 [17:19:41] (03CR) 10BryanDavis: [C: 032] Enable MWLoggerMonologSpi for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179370 (owner: 10BryanDavis) [17:19:47] (03Merged) 10jenkins-bot: Enable MWLoggerMonologSpi for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179370 (owner: 10BryanDavis) [17:19:50] Apparently if you have a function f($a, &$b, &$c) { ... } but then you call it as call_user_func_array( 'f', array( $a, $b, &$c ) ); (not passing $b by reference) and then f() accesses $b in a certain way (we don't know that part yet), HHVM segfaults [17:20:15] or maybe f() has to throw an exception (the segfault appeared to come from the exception handler) [17:20:39] RoanKattouw: can you file a task? 
i will chase this down [17:20:46] Sure [17:21:04] I'll attach the backtrace as well [17:22:49] !log bd808 Synchronized wmf-config: Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time) (duration: 00m 06s) [17:22:54] Logged the message, Master [17:23:07] ^ the last time I did that I had forgotten to rebase on tin :/ [17:23:20] sync-file docroot/noc/createTxtFileSymlinks.sh "Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time)" [17:23:31] !log bd808 Synchronized docroot/noc/createTxtFileSymlinks.sh: Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time) (duration: 00m 06s) [17:23:45] Logged the message, Master [17:24:57] Hit the intermittent error with the shared ssh-agent -- mw1047 returned [255]: Error reading response length from authentication socket. [17:26:11] !log bd808 Synchronized wmf-config/InitialiseSettings.php: Enable MWLoggerMonologSpi for testwiki [I419eb0d] (duration: 00m 05s) [17:26:15] Logged the message, Master [17:30:21] ori: https://phabricator.wikimedia.org/T78558 [17:30:46] RoanKattouw: the ' // Do something with $y (we don't know what exactly yet) ' is essential, since i can't reproduce it [17:31:01] I know :( sorry [17:31:14] _joe_ tried to reproduce it too but couldn't [17:31:35] I will add a comment to the bug explaining where the receiving code lives [17:31:46] <_joe_> yeah pretty annoying [17:32:02] <_joe_> ori: beta has full php traces in /var/log/hhvm/error.log btw [17:32:17] * ori looks [17:33:10] (03PS1) 10BryanDavis: monolog: Enable on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179958 [17:33:32] The thing is, esbetawiki also runs VipsScaler and it didn't break [17:34:14] <_joe_> yeah, that is quite strange as well...
[17:34:38] Maybe they have different logos or something, I don't know [17:36:58] (03CR) 10BryanDavis: [C: 032] monolog: Enable on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179958 (owner: 10BryanDavis) [17:37:02] (03Merged) 10jenkins-bot: monolog: Enable on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179958 (owner: 10BryanDavis) [17:37:36] !log bd808 Synchronized wmf-config/InitialiseSettings.php: Enable MWLoggerMonologSpi for group0 wikis [I2f72f97] (duration: 00m 05s) [17:37:42] Logged the message, Master [17:41:49] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures [17:42:01] (03PS1) 10Ori.livneh: Fix-ups for I09926c8c2 [puppet] - 10https://gerrit.wikimedia.org/r/179960 [17:42:50] w00t. Monolog logging is finally working as designed -- https://logstash.wikimedia.org/#/dashboard/elasticsearch/monolog [17:43:00] bd808: \o/ [17:43:27] yay [17:43:47] (03CR) 10Ori.livneh: [C: 032] Fix-ups for I09926c8c2 [puppet] - 10https://gerrit.wikimedia.org/r/179960 (owner: 10Ori.livneh) [17:44:18] The events in that dashboard are coming from monolog directly to the redis instances on the logstash cluster. No udp2log bits in between [17:44:48] bd808: that's awesome [17:44:52] congrats [17:45:42] thanks! [17:47:25] one more nail in udp2log's coffin [17:47:33] jgage: When you get a chance, https://gerrit.wikimedia.org/r/#/c/179758/ would be nice to have to go along with this. [17:48:34] ok [17:49:50] I piled up a few logstash config changes on Friday -- https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:logstash,n,z -- They are all running on beta and seem to make the events there a bit nicer [17:50:41] speaking of udp2log, are there any plans for sqstats pushing to graphite via udp? I'll need to tackle that too [17:50:49] bd808 ori ^ [17:50:55] sqstats? 
[17:52:27] there's a perl script named sqstat running on analytics1026 that writes reqstats.* metrics to graphite [17:52:48] I'm assuming "squid stats" [17:52:53] perl? *shudder* [17:52:58] godog, i would love to get rid of that if there were some way...eventually it will probably run via kafkatee when we get rid of udp2log [17:53:37] (03CR) 10Gage: "just one nitpick, inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/179758 (owner: 10BryanDavis) [17:53:43] ottomata: sweet! how far it is from being doable? [17:53:47] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:55:38] godog, um, well, i need to get some udp2log outputs moved into hadoop [17:55:45] that's the first step [17:56:00] we're going to try to move as many of them to hadoop as we can, rather than relying on filtering and streaming to disk [17:56:11] qchris: how soon do you think we can do that? [17:57:05] Has anyone tried to figure out what's up with the memcached entry that keeps spamming the hhvm.log? nable to unserialize: [a:1777:{s:27:"Gadget-SidebarTranslate.css";s:8:"!TOO BIG";s:25:"Abusefilter-warning-skype";s:8:"!TOO BIG"; ... [17:59:05] ottomata: Not sure how pressing it is. If we need to, we can do it in 2014 (like in 2 days) [17:59:30] <_joe_> bd808: I think tim and ori may have a clue, I remember that being one of the issues we had to tackle with HHVM [17:59:30] ottomata: But if it's not extra-pressing, I'd probably wait until 2015. [17:59:53] qchris, it is not super pressing, but it is something i'd like to get done sooner rather than later [18:00:03] +1 what ottomata said [18:00:13] ottomata: me too.
But soooo many things to do :-) [18:08:53] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Waiting for ops meeting and a new key" [puppet] - 10https://gerrit.wikimedia.org/r/179944 (owner: 10Giuseppe Lavagetto) [18:09:09] (03PS2) 10BryanDavis: Sample the GlobalTitleFail log at 1:10000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179870 (owner: 10Legoktm) [18:09:34] (03CR) 10BryanDavis: [C: 032] Sample the GlobalTitleFail log at 1:10000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179870 (owner: 10Legoktm) [18:09:47] (03Merged) 10jenkins-bot: Sample the GlobalTitleFail log at 1:10000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179870 (owner: 10Legoktm) [18:10:40] (03PS1) 10Ottomata: Install python-netaddr on stat1002 and stat1003 [puppet] - 10https://gerrit.wikimedia.org/r/179965 [18:10:49] !log bd808 Synchronized wmf-config/InitialiseSettings.php: Sample the GlobalTitleFail log at 1:10000 [I280ac3d] (duration: 00m 07s) [18:10:54] Logged the message, Master [18:12:37] (03CR) 10Ottomata: [C: 032] Install python-netaddr on stat1002 and stat1003 [puppet] - 10https://gerrit.wikimedia.org/r/179965 (owner: 10Ottomata) [18:15:13] (03PS2) 10BryanDavis: logstash: port udp2log rules to monolog input [puppet] - 10https://gerrit.wikimedia.org/r/179758 [18:26:01] (03PS2) 10BryanDavis: Set wgTranslateTranslationServices['TTMServer']['cutoff'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179566 [18:26:36] (03CR) 10BryanDavis: [C: 032] Set wgTranslateTranslationServices['TTMServer']['cutoff'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179566 (owner: 10BryanDavis) [18:26:47] (03Merged) 10jenkins-bot: Set wgTranslateTranslationServices['TTMServer']['cutoff'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179566 (owner: 10BryanDavis) [18:27:51] !log bd808 Synchronized wmf-config/CommonSettings.php: Set wgTranslateTranslationServices['TTMServer']['cutoff'] [I138b22a] (duration: 00m 07s) [18:27:56] Logged the message, Master [18:28:26] 
(03PS1) 10Ori.livneh: xenon: add title to graphs; set mindwidth=2 [puppet] - 10https://gerrit.wikimedia.org/r/179967 [18:30:41] (03CR) 10Ori.livneh: [C: 032] xenon: add title to graphs; set mindwidth=2 [puppet] - 10https://gerrit.wikimedia.org/r/179967 (owner: 10Ori.livneh) [18:58:25] dr0ptp4kt: what's the statsu of this? [18:58:25] https://phabricator.wikimedia.org/T76626 [18:58:43] are you waiting for ops to do something about it? [19:00:52] (03CR) 10Ori.livneh: [C: 031] mediawiki: enhancements to hhvm_cleanup_cache [puppet] - 10https://gerrit.wikimedia.org/r/179102 (owner: 10Giuseppe Lavagetto) [19:04:13] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [19:12:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [19:30:09] (03PS1) 10Hashar: hhvm: fix include_path [puppet] - 10https://gerrit.wikimedia.org/r/179974 [19:31:34] ottomata: nothing other than meetings yet. bblack and i have a meeting tomorrow with people to go over it [19:33:03] ok cool, just checking that it is moving [19:38:06] (03CR) 10Hashar: [V: 031] "Cherry picked on contint puppetmaster and it fixed the build of TimedMediaHandler under HHVM https://integration.wikimedia.org/ci/job/mwex" [puppet] - 10https://gerrit.wikimedia.org/r/179974 (owner: 10Hashar) [19:40:49] (03PS2) 10Giuseppe Lavagetto: admins: grant access to Stas Malyshev [puppet] - 10https://gerrit.wikimedia.org/r/179944 [19:41:46] ottomata: thanks for checking on that. i appreciate it! 
[19:43:19] (03PS2) 10Giuseppe Lavagetto: wikidata-query: give root on the test machine to the group [puppet] - 10https://gerrit.wikimedia.org/r/179943 [19:45:31] (03CR) 10Giuseppe Lavagetto: [C: 032] wikidata-query: give root on the test machine to the group [puppet] - 10https://gerrit.wikimedia.org/r/179943 (owner: 10Giuseppe Lavagetto) [19:45:51] (03PS3) 10Giuseppe Lavagetto: admins: grant access to Stas Malyshev [puppet] - 10https://gerrit.wikimedia.org/r/179944 [19:46:03] (03CR) 10Giuseppe Lavagetto: [C: 032] admins: grant access to Stas Malyshev [puppet] - 10https://gerrit.wikimedia.org/r/179944 (owner: 10Giuseppe Lavagetto) [19:56:38] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: puppet fail [19:57:41] Coren: Mark just mentioned something about the dhcp settings being being wrong for the labs subnet -- have you taken care of that already? [19:58:19] <_joe_> that ^^ is me and it's expected since I did a small wtf [19:58:22] (03PS1) 10Giuseppe Lavagetto: admins: use spaces in the yaml file [puppet] - 10https://gerrit.wikimedia.org/r/179980 [19:58:45] (03CR) 10Giuseppe Lavagetto: [C: 032] admins: use spaces in the yaml file [puppet] - 10https://gerrit.wikimedia.org/r/179980 (owner: 10Giuseppe Lavagetto) [19:58:57] (03CR) 10Giuseppe Lavagetto: [V: 032] admins: use spaces in the yaml file [puppet] - 10https://gerrit.wikimedia.org/r/179980 (owner: 10Giuseppe Lavagetto) [19:59:29] andrewbogott: i believe rob confirmed it was already merged [19:59:32] but better check anyway [20:01:07] Hm, I don't see anything obvious. Where am I looking, specifically -- in install_server? [20:02:09] robh, same question [20:02:54] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:03:54] so the change that papaul did is merged [20:04:27] https://gerrit.wikimedia.org/r/#/c/179152/ [20:04:39] andrewbogott: So this is when its fetching packages and software during the installer right? 
[20:04:41] (PS1) Mattflaschen: Enable EventLogging for Flow [mediawiki-config] - https://gerrit.wikimedia.org/r/179981
[20:04:46] it takes like 5 minutes per 1% minimum?
[20:04:48] robh: yep
[20:05:02] so somewhere in the install server file, last time this happened, subnets didnt match
[20:05:12] (CR) Mattflaschen: [C: -1] "We'll put in today's SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/179981 (owner: Mattflaschen)
[20:05:12] or wait... no it was ipv6 last time
[20:05:20] last time it attempts ipv6 by default, and it wasnt on the system
[20:05:28] so it would try it and fail with every fetch
[20:05:40] andrewbogott: were you putting ipv6 on these?
[20:05:54] robh: not intentionally
[20:06:01] i recall having this issue with coren
[20:06:03] and we found it
[20:06:28] (PS1) Hoo man: Fix up logging (and log rotation) for dumpwikidatajson [puppet] - https://gerrit.wikimedia.org/r/179982
[20:06:44] andrewbogott: that being said, im going to audit the network info in the netboot file real quick
[20:07:06] robh: where would I specify v6? My config in dhcpcd is very simple so far
[20:07:11] * hoo created a log mess on snapshot1003 :S
[20:07:29] well, i think in the last case it was due to the fqdn having an ipv6 address in dns
[20:07:37] and it not having it bound at time of install
[20:07:58] but i dont recall exactly, too bad coren is still in line for smiling
[20:09:19] andrewbogott: ok, this is for eqiad server too right?
[20:09:28] robh: yep
[20:09:39] which one (so i can find row)
[20:09:45] i dont know off top of my head, sorry
[20:09:56] virt1010, 1011, 1012
[20:10:05] Those are the ones giving me trouble
[20:10:53] ok, so labs row b
[20:12:32] hoo: is the mess just the missing .log extension?
[20:12:45] andrewbogott: so yea, the network settings in the install server for row b are good
[20:12:57] and the patchset mark was referring to was for row b codfw, not row b eqiad
[20:13:09] YuviPanda: Yes...
the trailing wildcard makes it go crazy
[20:13:14] robh: dang
[20:13:17] oh, how so?
[20:13:18] and re-rotate stuff that is already rotated
[20:13:26] aaah
[20:13:26] right
[20:13:28] dumpwikidatajson-1.1.gz.1.gz
[20:13:33] producing such things
[20:13:45] andrewbogott: im racking my brain trying to recall the exact specific of this issue, because it happened to coren and i
[20:13:51] and i know it had to do with ipv6
[20:13:51] I only played it through for one iteration, thus didn't think that far :S
[20:14:09] hoo: heh, ok. want me to merge? or do you have someone else pigeonholed for this?
[20:14:12] i just dont recall what the fix was ;_;
[20:14:29] YuviPanda: Would be great, if you could take care
[20:14:35] (so at least i know that marc and i figured it out once, and he had the issue once before that with another project someplace iirc)
[20:14:41] (CR) Yuvipanda: [C: 2] Fix up logging (and log rotation) for dumpwikidatajson [puppet] - https://gerrit.wikimedia.org/r/179982 (owner: Hoo man)
[20:14:44] hoo: ^
[20:14:51] Also feel free to just wipe out the existing /var/log/wikidatadump/
[20:15:05] nothing of interest there, today's run was just fine
[20:15:13] hoo: ok
[20:17:53] robh: I see coren saying "There are issues with the autoinstall that I do not have the time to debug." in a commit log.
[20:18:21] hrmm, we flipped some bit or something and it worked right away at the time
[20:18:26] And I was only installing Trusty as an experiment, maybe I should just skip to precise
[20:18:35] but, that was months and months ago
[20:18:57] and it now escapes me (which is really annoying me now, since normally i can recall things if i focus)
[20:19:02] =P
[20:19:14] " Add gold and platinum MAC to dhcp" ?
[20:22:12] YuviPanda: Please don't forget that... don't want to have that rot around forever
[20:22:54] hoo: yeah, am about to delete it. are you sure they won’t be needed?
[20:23:08] robh: I need to go back to sleep… I will look forward to a complete solution in my backscroll when I wake up :) Thanks for looking.
[20:23:10] We didn't even have those until last week
[20:23:15] hoo: hehe
[20:23:20] and I only introduced them to help me debugging
[20:23:20] so, yes :P
[20:23:25] andrewbogott: if coren is online later i'll ask him about it =]
[20:23:34] !log cleaned out /var/log/wikidatadumps on snapshot1003 because hoo needs them anywhay?
[20:23:37] Logged the message, Master
[20:23:46] hahaha :D
[20:24:25] hoo: I bet at some point you’ll get tired of the hoo puns :)
[20:24:42] Yeah, like in 2011
[20:24:52] heh
[21:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141215T2100).
[21:04:41] greg-g, no parsoid deploy today /cc cscott gwicke
[21:10:10] (PS1) Rush: phab route issues by localpart to project [puppet] - https://gerrit.wikimedia.org/r/179989
[21:10:55] (CR) jenkins-bot: [V: -1] phab route issues by localpart to project [puppet] - https://gerrit.wikimedia.org/r/179989 (owner: Rush)
[21:12:52] (PS2) Rush: phab route issues by localpart to project [puppet] - https://gerrit.wikimedia.org/r/179989
[21:13:34] (CR) jenkins-bot: [V: -1] phab route issues by localpart to project [puppet] - https://gerrit.wikimedia.org/r/179989 (owner: Rush)
[21:18:25] (PS3) Rush: phab route issues by localpart to project [puppet] - https://gerrit.wikimedia.org/r/179989
[21:20:50] (CR) Rush: [C: 2] phab route issues by localpart to project [puppet] - https://gerrit.wikimedia.org/r/179989 (owner: Rush)
[21:37:44] (PS1) Rush: phab email routing add reference not localport as project [routing] foo = bar foo@p.wm.o == #bar and not #foo [puppet] - https://gerrit.wikimedia.org/r/180018
[21:41:21] (Abandoned) EBernhardson: Allow enwiki bots to create flow boards [mediawiki-config] -
https://gerrit.wikimedia.org/r/178388 (owner: EBernhardson)
[21:41:34] (CR) Rush: [C: 2] phab email routing add reference not localport as project [routing] foo = bar foo@p.wm.o == #bar and not #foo [puppet] - https://gerrit.wikimedia.org/r/180018 (owner: Rush)
[21:50:46] (PS2) MaxSem: Don't collapse sections on mobile WD [mediawiki-config] - https://gerrit.wikimedia.org/r/179513
[22:31:10] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: puppet fail
[22:46:21] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:49:31] (PS4) Spage: $wgContentHandlerUseDB true everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/170129 (https://bugzilla.wikimedia.org/49193)
[22:49:53] (PS5) Spage: $wgContentHandlerUseDB true everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/170129 (https://bugzilla.wikimedia.org/49193)
[23:16:23] (PS1) Ori.livneh: xenon: gzip svgs; add custom header/footer + stylesheet [puppet] - https://gerrit.wikimedia.org/r/180045
[23:18:08] (CR) Ori.livneh: [C: 2 V: 2] xenon: gzip svgs; add custom header/footer + stylesheet [puppet] - https://gerrit.wikimedia.org/r/180045 (owner: Ori.livneh)
[23:18:26] !log redeploy patches for T77624 & T76195
[23:18:33] Logged the message, Master
[23:18:54] !log deploy patch for T71209
[23:18:57] Logged the message, Master
[23:21:16] (PS1) Ori.livneh: Xenon: Add AddType / AddEncoding directives for .svgz files [puppet] - https://gerrit.wikimedia.org/r/180047
[23:21:36] (CR) Ori.livneh: [C: 2 V: 2] Xenon: Add AddType / AddEncoding directives for .svgz files [puppet] - https://gerrit.wikimedia.org/r/180047 (owner: Ori.livneh)
[23:27:09] Here's a strange error: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/37/console
[23:27:18] "TextPassDumperTest::testCheckpointGzip
[23:27:19] 23:25:21 expected more than 1 checkpoint to have been created.
Checkpoint interval is 0.5 seconds, maybe your computer is too fast?
[23:27:21] "
[23:36:53] Ah yes, that happened to me as well
[23:37:20] awight: in fact it's filed https://phabricator.wikimedia.org/T70653
[23:38:56] Nemo_bis: well, at least it was an entertaining error...
[23:39:21] :)
[23:41:51] I also like the "failed asserting that 1 is greater than 1" premise