[00:01:40] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 114.800003 [00:02:50] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.345 second response time [00:04:40] !log To investigate bug 63579, manually patched "grunt-lib-phantomjs/phantomjs/main.js" in "/srv/deployment/integration/slave-scripts" on gallium [00:04:45] Logged the message, Master [00:07:08] !log graphite.wikimedia.org (e.g. https://graphite.wikimedia.org/render/?) is serving 502 Bad Gateway, ori is investigating [00:07:13] Logged the message, Master [00:13:30] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [00:14:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 268395 bytes in 7.928 second response time [00:15:12] !log graphite webapp 502 caused by uwsgi's init script not restarting the service correctly [00:15:17] Logged the message, Master [00:15:30] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [00:33:30] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 185.933334 [00:35:30] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 297.566681 [00:38:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [00:54:30] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [00:56:40] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [01:09:05] !log Debugging uWSGI init scripts on tungsten; expect some Graphite / Gdash flapping. [01:09:13] Logged the message, Master [01:25:20] !log Bug 63579 is still happening occasionally. 
Leaving patch on gallium in place for now. [01:25:26] Logged the message, Master [01:37:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [01:47:00] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 997.768861401 [01:54:21] (03PS1) 10Ori.livneh: Replace uWSGI's broken init.d scripts with Upstart job def [operations/puppet] - 10https://gerrit.wikimedia.org/r/124786 [02:00:12] (03CR) 10Ori.livneh: [C: 032] Replace uWSGI's broken init.d scripts with Upstart job def [operations/puppet] - 10https://gerrit.wikimedia.org/r/124786 (owner: 10Ori.livneh) [02:03:03] (03PS1) 10Ori.livneh: Update graphite::web for I8c214e0fd [operations/puppet] - 10https://gerrit.wikimedia.org/r/124788 [02:03:20] (03CR) 10Ori.livneh: [C: 032 V: 032] Update graphite::web for I8c214e0fd [operations/puppet] - 10https://gerrit.wikimedia.org/r/124788 (owner: 10Ori.livneh) [02:12:10] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:12:10] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:12:10] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:12:10] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:13:51] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2777 MB (2% inode=99%): [02:17:24] (03PS1) 10Ori.livneh: Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 [02:18:09] (03PS2) 10Ori.livneh: Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 [02:19:25] !log LocalisationUpdate completed (1.23wmf20) at 2014-04-09 02:19:25+00:00 [02:19:28] 
(CR) jenkins-bot: [V: -1] Add 'uwsgictl' tool for managing services [operations/puppet] - https://gerrit.wikimedia.org/r/124789 (owner: Ori.livneh) [02:19:33] Logged the message, Master [02:19:50] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3701 MB (3% inode=99%): [02:20:20] (PS3) Ori.livneh: Add 'uwsgictl' tool for managing services [operations/puppet] - https://gerrit.wikimedia.org/r/124789 [02:21:54] (CR) Ori.livneh: [C: 2] Add 'uwsgictl' tool for managing services [operations/puppet] - https://gerrit.wikimedia.org/r/124789 (owner: Ori.livneh) [02:31:47] (PS1) Ori.livneh: Add service checks for mwprof, uwsgi, graphite-web & gdash [operations/puppet] - https://gerrit.wikimedia.org/r/124790 [02:33:24] (CR) Ori.livneh: [C: 2] Add service checks for mwprof, uwsgi, graphite-web & gdash [operations/puppet] - https://gerrit.wikimedia.org/r/124790 (owner: Ori.livneh) [02:39:59] (PS1) Ori.livneh: uWSGI service: specify --autoload [operations/puppet] - https://gerrit.wikimedia.org/r/124791 [02:40:08] (CR) Ori.livneh: [C: 2 V: 2] uWSGI service: specify --autoload [operations/puppet] - https://gerrit.wikimedia.org/r/124791 (owner: Ori.livneh) [02:43:52] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-09 02:43:52+00:00 [02:43:57] Logged the message, Master [02:53:16] greg-g: l10nupdate ran for 1.23wmf21 and mw.o isn't broken. 
\o/ [03:00:50] RECOVERY - Disk space on virt0 is OK: DISK OK [03:05:18] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: Connection refused [03:06:18] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: Connection refused [03:16:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [03:23:58] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [03:33:29] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 9 03:33:25 UTC 2014 (duration 33m 24s) [03:33:34] Logged the message, Master [03:34:24] (03PS1) 10Ori.livneh: Make (graphite|gdash).wm.o go through misc-eqiad-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/124792 [03:41:30] (03CR) 10Ori.livneh: [C: 032] Make (graphite|gdash).wm.o go through misc-eqiad-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/124792 (owner: 10Ori.livneh) [04:13:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [05:12:18] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:12:18] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:12:18] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:12:18] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:13:58] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [05:59:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [06:57:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [07:09:23] (03PS4) 10Rush: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 
[07:14:55] (03CR) 10Andrew Bogott: [C: 04-1] "Two comments, both simple" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 (owner: 10Rush) [07:20:07] (03CR) 10Rush: module to manage new python-diamond package (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 (owner: 10Rush) [07:25:30] rfarrand: This is the security thing we were talking about at breakfast: https://xkcd.com/ [07:30:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [07:38:52] (03CR) 10Matanya: module to manage new python-diamond package (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 (owner: 10Rush) [07:52:17] @notify hashar [07:52:17] I'll let you know when I see hashar around here [07:52:19] ok [08:02:12] hello [08:04:41] hi hashar [08:04:43] I added more checks :D [08:04:47] hashar: how do I help getting https://wiki.jenkins-ci.org/display/JENKINS/Checkstyle+Plugin installed with our jenkins install? if it already isn't [08:05:08] yuvipanda: I think we got it already [08:05:15] oh cool! [08:05:42] yuvipanda: and it is in jjb: http://ci.openstack.org/jenkins-job-builder/publishers.html#publishers.checkstyle :-] [08:05:49] oooh! cool! [08:05:58] https://gerrit.wikimedia.org/r/#/c/124749/ and https://gerrit.wikimedia.org/r/#/c/124748/ for my python jobs [08:06:07] i deployed it already :) [08:06:15] looking [08:06:27] for check style you can use the publisher macro "checkstyle-xml" [08:06:41] it is defined in macro.yaml [08:06:41] is it being used anywhere else? [08:06:44] * yuvipanda greps [08:06:49] basically look for any file named checkstyle-*.xml [08:06:58] yeah grep is your best bet [08:07:08] if you get the check style report generated in the root of the workspace [08:07:12] you can get it added to the job using: [08:07:21] publishers: [08:07:21] - checkstyle-xml [08:07:46] cool! 
[08:08:11] so flake8 [08:08:12] https://gerrit.wikimedia.org/r/#/c/124749/1/mobile.yaml [08:08:27] that change is fine, it will create a job which executes: tox -e flake8 [08:08:38] which also mean that the target repository needs the flake8 integration hehe [08:08:40] and tox [08:08:51] hashar: yeah, I added tox.ini [08:09:00] there is https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python :D [08:09:04] and fixed flake errors [08:09:36] hashar: ah. so this is just two scripts, no setup.py or modules [08:09:46] hmm, I just need flake8, so maybe I should use a different job [08:09:53] na it is fine [08:10:02] tox will take care of installing the latest version of flake8 [08:10:09] right [08:10:14] potentially we could just run the flake8 command [08:10:22] but I find defining a list of even easier to handle [08:10:26] :D [08:10:33] and you can then tweak what flake8 env is doing by simply editing your tox.ini [08:10:36] and in the future if we do end up having tests... [08:10:37] yeah [08:10:42] that saves you from having to refresh the jenkins job configuration [08:10:49] yup. [08:11:25] hashar: I also want to get the continuous builds working again. [08:11:31] for this repo [08:11:32] what is the change adding tox.ini ? [08:11:37] hashar: looking, moment [08:11:42] ahh nm [08:11:46] hashar: https://gerrit.wikimedia.org/r/#/c/124746/ [08:11:46] it is under scripts/tox.ini [08:11:59] yeah, all py files are there [08:12:02] should I move it up? [08:12:11] if you dont mind [08:12:16] sure [08:12:18] you also need to define a tox env named flake8 [08:12:59] don't I have that in the tox.ini? 
[08:13:18] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [08:13:18] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [08:13:18] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [08:13:18] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [08:13:48] yuvipanda: there is a [flake8] section but that simply gives settings for flake8 [08:13:57] oh [08:14:00] yuvipanda: tox needs an env defined with [testenv:flake8], let me find an example [08:14:08] aaah [08:14:14] ah it is on https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python [08:14:44] there is a dummy tox file which has sections: [tox] (which configures tox itself), [testenv] which provides the defaults for each environment [08:14:59] and then [testenv:flake8] which tells what to do when one invokes tox with the 'flake8' environment [08:15:13] think about it like an npm target which would be npm flake8 :-D [08:15:20] :) [08:15:32] deploying zuul change [08:16:53] done [08:17:21] wooo! 
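For readers following the tox discussion above, a minimal tox.ini along those lines might look like the sketch below. It is an illustration under stated assumptions, not the file from the Gerrit change: the section names come from the conversation, while the `minversion`, `skipsdist`, `deps`, `commands`, and `exclude` values are standard tox/flake8 options picked here for a repository that holds plain scripts and no setup.py.

```ini
# Hypothetical tox.ini sketch; not the actual file from the change under review.

[tox]
# Settings for tox itself.
minversion = 1.6
# Assumption: the repository has no setup.py, so do not try to build an sdist.
skipsdist = True
envlist = flake8

[testenv:flake8]
# What runs when tox is invoked with the 'flake8' environment (tox -e flake8).
deps = flake8
commands = flake8

[flake8]
# Settings read by flake8 itself; this section alone does not define a tox env.
exclude = .venv,.tox,dist,doc,build,*.egg
```

With a file like this at the repository root, `tox -e flake8` runs only the lint environment, which matches the command the Jenkins job is described as executing.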
[08:18:24] hashar: https://gerrit.wikimedia.org/r/#/c/124797/ has tox.ini fixes [08:18:59] hmm [08:18:59] 08:18:13 tox.MissingFile: MissingFile: /mnt/jenkins-workspace/workspace/apps-android-wikipedia-tox-flake8/setup.py [08:19:23] I do have skipDist=true [08:22:14] hmm [08:22:17] * yuvipanda tries skipsdist [08:25:05] sorry, back [08:25:06] k [08:25:23] ah [08:25:27] yuvipanda: you should add some ignores [08:25:35] yeah, am doing [08:25:36] the wiki page is not very up to date hehe [08:25:47] hashar: it is currently running flake8 on tox itself and complaining :D [08:25:49] https://integration.wikimedia.org/ci/job/apps-android-wikipedia-tox-flake8/3/console [08:26:00] [flake8] [08:26:00] exclude = .venv,.tox,dist,doc,build,*.egg [08:27:12] \O/ [08:27:33] hashar: \o/ https://gerrit.wikimedia.org/r/#/c/124797/ that seems to work :) [08:27:44] hashar: want to +2? :) [08:27:47] press rebase and merge! [08:27:52] I rebased :) [08:28:18] let me comment on it [08:28:23] ok! [08:29:40] https://gerrit.wikimedia.org/r/#/c/124797/8/tox.ini,unified [08:29:42] a couple more [08:29:58] [tox] [08:29:58] minversion = 1.6 [08:29:58] envlist = flake8 [08:31:47] kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 1.97148582555e-09 [08:31:50] ? [08:31:57] hashar: updated! :) [08:32:57] yuvipanda: rebased it and giving it a try [08:33:04] hashar: ok! [08:33:08] * hashar whistles [08:33:17] want me to +2 it? [08:33:54] hashar: yes! :) [08:34:03] hashar: ty! [08:34:10] hashar: now comes the hard part which is checkstyle for java [08:34:15] I'll go have lunch and come back to it :) [08:34:19] yuvipanda: also python has a VERY nice feature, which is to add unit tests straight in the function/method doc block [08:34:20] https://docs.python.org/2/library/doctest.html [08:34:28] yeah! I've used that in the past [08:34:42] hashar: but these are just plain scripts that are run once in a while. 
I'll figure out how to add tests for them at some point [08:34:50] found that a couple weeks ago, bryan davis did demo it to me :] [08:34:54] I was in choke [08:35:02] yeah, doctests are fun! [08:35:16] hashar: we don't have anything else that uses checkstyle on Java repos do we? [08:35:28] not sure [08:35:40] isn't maven taking care of generating the check style xml file? [08:35:44] (I am maven agnostic sorry) [08:35:44] yeah [08:35:51] so I just have to run maven then [08:35:55] or have a job that runs maven [08:36:00] and setup a local maven thing going [08:36:06] hashar: I'll do that after lunch! [08:36:10] we'll have to make that non-voting for now [08:36:11] but still [08:36:23] yuvipanda: I think there is already a maven job in our jjb conf [08:36:29] oh? [08:36:31] * yuvipanda greps [08:36:35] yeah in mobile.yaml [08:36:41] yeah, [08:37:04] yuvipanda: and http://ci.openstack.org/jenkins-job-builder/project_maven.html :-] [08:37:12] woot! [08:37:19] hashar: this should be easier since this is just a check and not a build [08:37:20] and indeed we can make it non voting while you play with it [08:37:22] I'll add build later [08:37:48] hashar: brb! And thanks for the help! [08:37:49] I am not sure where the checkstyle.xml file will end up being generated to though [08:38:02] ping me anytime, I am at home the whole day [08:38:28] ah [08:38:45] yuvipanda: jenkins-bot can't merge on the apps/android/wikipedia :-] adding it in [08:39:46] !log Gerrit Letting JenkinsBot submit changes on apps/android/* [08:39:51] Logged the message, Master [08:50:27] hashar: woo! 
Thanks for fixing that [08:57:28] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [09:01:00] <_joe_> nd graphite is *not* working, again [09:01:26] (03CR) 10Odder: [C: 031] Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:09:52] (03PS2) 10Hashar: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:10:15] (03CR) 10Hashar: "You might want to whitelist both *.musees.cg70.fr and musses.cg70.fr don't you? :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:10:34] odder: do you have any contact with jean-frederic ? [09:16:56] (03PS3) 10Hashar: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:17:12] hashar: java checkstyle! https://gerrit.wikimedia.org/r/#/c/124802/ and https://gerrit.wikimedia.org/r/#/c/124801/ [09:17:20] I have to make it non voting tho, will figure that [09:17:42] (03CR) 10Hashar: [C: 032] "Restored *.musees.cg70.fr , that is not going to do any harm :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:17:49] (03Merged) 10jenkins-bot: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:19:09] !log hashar synchronized wmf-config/InitialiseSettings.php '[] = 'musees.cg70.fr'; {{gerrit|124754}} {{bug|63449}}' [09:19:14] Logged the message, Master [09:19:42] (03CR) 10Hashar: "The change has been deployed on Wikimedia production cluster. Feel free to ping me on irc (hashar) if it needs further tweaks." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:27:12] (03CR) 10Hashar: [C: 031] "We should probably find a way to have scap to install it for us or update the /usr/local/bin paths to point to /srv/scap/scap/bin or somet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124763 (owner: 10BryanDavis) [09:32:01] (03PS1) 10Andrew Bogott: Hammer down a few more bogus https failures. [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 [09:35:35] ACKNOWLEDGEMENT - RAID on dataset1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) daniel_zahn RT #7238 [09:35:56] (03PS2) 10Andrew Bogott: Hammer down a few more bogus https failures. [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 [09:36:04] (03CR) 10Alexandros Kosiaris: [C: 032] Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 10https://gerrit.wikimedia.org/r/124763 (owner: 10BryanDavis) [09:37:09] * yuvipanda pokes hashar [09:38:40] (03CR) 10Andrew Bogott: [C: 032] Hammer down a few more bogus https failures. [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 (owner: 10Andrew Bogott) [09:40:25] ori: ping [09:42:36] (03CR) 10Dzahn: "+1" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 (owner: 10Andrew Bogott) [09:45:52] yuvipanda: pong [09:46:34] hashar: java checkstyle! https://gerrit.wikimedia.org/r/#/c/124802/ and https://gerrit.wikimedia.org/r/#/c/124801/ [09:46:40] bad panda [09:46:45] andrewbogott: sorry haven't followed up on ganglia on labs. The instance is too small indeed and not sure how to fix it. I would attempt to resize the instance to something bigger, else recreate it from scratched + do a bunch of puppet changes to point to the new instance :] [09:46:52] what did I do now, odder [09:47:03] * odder hugs yuvipanda [09:47:07] telephone :) [09:47:12] hashar: it's puppetized, right? So just kill the instance and build a new one with the same name? 
[09:47:18] Oh, I guess you need to reuse the ip... [09:47:36] odder: :) It was a bit irritating considering how some decisions are made on Trello in a much less transparent manner and you need accounts to comment there... [09:47:48] odder: and even worse is Mingle which sometimes isn't even public [09:48:09] PROBLEM - HTTPS on cp4007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [09:48:16] andrewbogott: yeah I would ideally need the same IP but there is no good way to achieve that apparently. And since I am too lazy to do all the puppet changes, I thought we could try resizing the instance hehe [09:48:34] andrewbogott: but you probably have better things to do today @ops summit (moaaaaar monitoring!) [09:48:54] Can try resizing but instances often do not survive [09:48:59] But yeah, next week we'll try it [09:49:09] andrewbogott: another way would be to have two instances. One to collect all metrics and another one acting as a webfrontend [09:49:26] andrewbogott: but I don't know ganglia well enough to figure out how to split web/aggregators services [09:50:05] It's ok, we'll just move it to a great big instance next week :) [09:50:23] so resizing is highly risky (instance disappear) but if it works that would only have taken a few minutes + that grant you yet another skill on your resume ("I managed to resize an instance on Nova yeahhhh") [09:50:30] hehe [09:51:32] yuvipanda: looking [09:52:54] (03CR) 10Dzahn: [C: 032] remove all Tampa appservers from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/123211 (owner: 10Dzahn) [09:54:45] yuvipanda: you got spam https://gerrit.wikimedia.org/r/#/c/124802/3/mobile.yaml,unified [09:55:11] yay mutante, one step closer to getting the word 'Tampa' blacklisted in here? 
:-) [09:57:29] hashar: updated both patches [09:57:33] :-] [09:57:45] waiting for Jenkins to show the diff [09:58:34] yuvipanda: get the maven job deployed :) I am deploying zuul change [09:58:37] hashar: woo! deploying [09:59:10] hashar: deployed! [09:59:28] and feel free to join #wikimedia-qa where all the spam is sent :] [09:59:47] hashar: I was on it but the selenium spam was a fair bit so I left [10:00:06] you can ignore the selenium bot :] [10:00:17] that is true [10:00:19] I shall do that :) [10:00:24] * yuvipanda adds to autojoin [10:00:24] zuul reloaded! [10:01:28] hashar: woo! now to test. [10:04:38] hashar: \o/ https://integration.wikimedia.org/ci/job/apps-android-wikipedia-maven-checkstyle/2/console but has a maven error not a checkstyle one [10:04:39] investigating [10:04:59] at least it started downloading the whole internet [10:05:05] that is usually a good sign [10:05:33] hehe [10:05:39] 10:03:54 [INFO] There are 3898 checkstyle errors. [10:05:42] but I can't see them? [10:06:00] we have to find out whether maven write them to some xml file [10:07:07] yuvipanda: http://paste.openstack.org/show/75396/ [10:08:25] I guess we want Jenkins Checkstyle plugin to look at files matching **/target/checkstyle-result.xml [10:09:07] (follow up on #wikimedia-qa) [10:51:23] (03PS1) 10Hashar: contint: apply maven settings on labs slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/124822 [10:56:53] (03CR) 10Hashar: [C: 031 V: 032] "Cherry picked on the puppetmaster for the integration project. 
That would hopefully work on the production slaves gallium.wikimedia.org " [operations/puppet] - 10https://gerrit.wikimedia.org/r/124822 (owner: 10Hashar) [11:03:16] (03PS2) 10Hashar: contint: get rid of misc::pbuilder on slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/122707 [11:03:18] (03PS3) 10Hashar: contint: directory to hold debian-glue packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/122712 [11:13:39] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [11:13:39] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [11:13:39] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [11:13:39] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [12:01:59] PROBLEM - Host amssq58 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:20] PROBLEM - Host amssq57 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:20] PROBLEM - Host amssq52 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:28] is this someone here? 
^^ [12:02:30] PROBLEM - Host lvs3001 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:30] PROBLEM - Host amssq47 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:30] PROBLEM - Host amssq56 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:50] PROBLEM - Host amslvs3 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:59] PROBLEM - Host amslvs1 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:00] PROBLEM - Host cp3004 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:00] PROBLEM - Host text-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:03:03] PROBLEM - Host amslvs4 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:12] PROBLEM - Host amssq59 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:21] PROBLEM - Host upload-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:03:30] RECOVERY - Host amssq47 is UP: PING WARNING - Packet loss = 93%, RTA = 86.26 ms [12:03:30] RECOVERY - Host cp3004 is UP: PING WARNING - Packet loss = 93%, RTA = 85.57 ms [12:03:30] RECOVERY - Host lvs3001 is UP: PING WARNING - Packet loss = 93%, RTA = 85.32 ms [12:03:30] RECOVERY - Host amssq59 is UP: PING WARNING - Packet loss = 93%, RTA = 87.04 ms [12:03:30] RECOVERY - Host upload-lb.esams.wikimedia.org is UP: PING WARNING - Packet loss = 28%, RTA = 87.24 ms [12:03:32] RECOVERY - Host amssq57 is UP: PING WARNING - Packet loss = 50%, RTA = 87.85 ms [12:03:39] RECOVERY - Host amssq52 is UP: PING OK - Packet loss = 0%, RTA = 87.41 ms [12:03:39] RECOVERY - Host amssq58 is UP: PING OK - Packet loss = 0%, RTA = 87.04 ms [12:03:39] RECOVERY - Host amslvs3 is UP: PING OK - Packet loss = 0%, RTA = 87.60 ms [12:03:59] PROBLEM - Host cp3010 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:59] PROBLEM - RAID on nescio is CRITICAL: Timeout while attempting connection [12:04:09] RECOVERY - Host cp3010 is UP: PING OK - Packet loss = 0%, RTA = 94.45 ms [12:04:09] RECOVERY - Host amslvs1 is UP: PING OK - Packet loss = 0%, RTA = 95.24 ms [12:04:09] RECOVERY - 
Host amssq56 is UP: PING OK - Packet loss = 0%, RTA = 95.58 ms [12:04:49] RECOVERY - Host amslvs4 is UP: PING OK - Packet loss = 0%, RTA = 96.66 ms [12:04:49] RECOVERY - Host text-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 94.73 ms [12:22:55] (CR) Hashar: "rebased to fix conflict" [operations/puppet] - https://gerrit.wikimedia.org/r/122707 (owner: Hashar) [12:23:04] (CR) Hashar: "rebased" [operations/puppet] - https://gerrit.wikimedia.org/r/122712 (owner: Hashar) [12:52:11] (CR) Dzahn: [C: 2] beta: reenable fatalmonitor script on eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/124624 (owner: Hashar) [12:54:12] mutante: should i push ms6 decom patches ? [13:08:46] (PS1) Andrew Bogott: Partial revert of a87afea676375aba4f0ec28228e28df0502e5321 [operations/puppet] - https://gerrit.wikimedia.org/r/124834 [13:12:08] (CR) Andrew Bogott: [C: 2] Partial revert of a87afea676375aba4f0ec28228e28df0502e5321 [operations/puppet] - https://gerrit.wikimedia.org/r/124834 (owner: Andrew Bogott) [13:19:31] apergos: online? [13:19:39] yes [13:19:58] Steinsplitter: what's up? [13:20:49] apergos: it is back. so resolved O_O [13:22:35] ok then [13:25:16] apergos: is ms6 in esams or tampa ? [13:26:15] mutante: can we close https://rt.wikimedia.org/Ticket/Display.html?id=6979 now that you've done #80? [13:27:44] (PS1) Cmjohnson: adding ipv6 for neon/icinga rt 4602 [operations/dns] - https://gerrit.wikimedia.org/r/124837 [13:31:48] matanya: yes [13:31:49] andrewbogott: yes [13:32:14] mutante: is m6 in esams or tampa ? [13:32:26] matanya: ms5 is gone isn't it? I mean long gone [13:32:29] pmtpa [13:32:45] oh sorry [13:32:59] so why is it node 'ms6.esams.wikimedia.org' in site.pp ? [13:33:04] ms6 (I have one vertical line of pixels that doesn't function, and once in a while it lands in exactly the place to change a letter [13:33:11] or make a letter/number ambiguous [13:33:13] esams. 
[13:33:17] it's in esams [13:33:31] while files/dsh/group/pmtpa includes ms6 ? [13:33:37] who knows [13:33:41] i'm confused [13:33:49] i'll just remove all [13:33:50] well ms6 is definitely in esams [13:34:06] ms5 if it still exists (which it shouldn't) is/was definitely in pmtpa [13:34:33] anyways ms6 is no longer used for anything so it can go from dsh and all. [13:35:32] (CR) Cmjohnson: [C: 2] adding ipv6 for neon/icinga rt 4602 [operations/dns] - https://gerrit.wikimedia.org/r/124837 (owner: Cmjohnson) [13:35:59] (PS1) Dzahn: create shell account for Filippo Giunchedi [operations/puppet] - https://gerrit.wikimedia.org/r/124838 [13:36:11] (PS1) Matanya: decom: ms6 [operations/puppet] - https://gerrit.wikimedia.org/r/124839 [13:36:12] uuuh [13:36:49] (PS1) Andrew Bogott: Revert "Make (graphite|gdash).wm.o go through misc-eqiad-lb" [operations/puppet] - https://gerrit.wikimedia.org/r/124840 [13:38:39] (CR) Filippo Giunchedi: [C: 1] create shell account for Filippo Giunchedi [operations/puppet] - https://gerrit.wikimedia.org/r/124838 (owner: Dzahn) [13:38:52] (CR) Andrew Bogott: [C: 2] Revert "Make (graphite|gdash).wm.o go through misc-eqiad-lb" [operations/puppet] - https://gerrit.wikimedia.org/r/124840 (owner: Andrew Bogott) [13:39:12] matanya: ms6.esams.wikimedia.org [13:39:13] (PS1) Alexandros Kosiaris: Improve the check_eth check [operations/puppet] - https://gerrit.wikimedia.org/r/124841 [13:39:26] matanya: that's why it did not show up in Tampa tickets [13:39:33] mutante: see above :) [13:41:12] andre__: Do you have plans to log out all BZ users? 
[13:41:47] hoo: I myself didn't have plans so far; I'd hope that ops told me if that's recommended [13:41:54] (plus no idea how I'd do that, to be honest) [13:43:10] (03CR) 10Dzahn: [C: 032] create shell account for Filippo Giunchedi [operations/puppet] - 10https://gerrit.wikimedia.org/r/124838 (owner: 10Dzahn) [13:44:58] (03CR) 10Alexandros Kosiaris: [C: 032] Improve the check_eth check [operations/puppet] - 10https://gerrit.wikimedia.org/r/124841 (owner: 10Alexandros Kosiaris) [13:46:02] andre__: Prune the logincookies table? [13:46:02] Maybe also the tokens table? [13:46:13] I'm not into BZ enough :/ [13:46:21] (03PS5) 10Rush: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 [13:46:33] I guess I might ask upstream what they plan to do [13:47:37] andre__: Ok, I can help with killing stuff from the DB or so (or springl.e, ... can do that) [13:48:26] (03CR) 10Matanya: module to manage new python-diamond package (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 (owner: 10Rush) [13:48:39] (03PS1) 10Alexandros Kosiaris: Fix bug introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124842 [13:49:24] (03CR) 10Alexandros Kosiaris: [C: 032] Fix bug introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124842 (owner: 10Alexandros Kosiaris) [13:49:33] (03CR) 10Alexandros Kosiaris: [V: 032] Fix bug introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124842 (owner: 10Alexandros Kosiaris) [13:49:54] (03PS6) 10Rush: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 [13:55:38] (03PS1) 10RobH: replace ticket.wikimedia.org certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/124843 [13:56:39] (03CR) 10RobH: [C: 032 V: 032] replace ticket.wikimedia.org certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/124843 (owner: 10RobH) 
[13:58:04] !log updating otrs cert [13:58:10] Logged the message, RobH [13:58:44] (PS1) Alexandros Kosiaris: Fix another introduced in I4a7a2c71be [operations/puppet] - https://gerrit.wikimedia.org/r/124844 [13:59:23] (CR) Alexandros Kosiaris: [C: 2 V: 2] Fix another introduced in I4a7a2c71be [operations/puppet] - https://gerrit.wikimedia.org/r/124844 (owner: Alexandros Kosiaris) [14:00:11] hoo: might be a good question for security@. I can't really judge if that's needed :-/ [14:00:39] !log adding filippo to ops/wmf LDAP groups [14:00:44] Logged the message, Master [14:01:34] andre__: I don't know Bugzilla, I can only tell you that users should be logged out and advised to change their passwords [14:01:48] but I can't tell you how BZ handles this [14:02:44] !log yes, otrs is totally ssl borked, robh is working on it [14:02:49] Logged the message, RobH [14:05:30] (CR) Alexandros Kosiaris: "The ideas in this were incorporated in:" [operations/puppet] - https://gerrit.wikimedia.org/r/124606 (owner: Cmjohnson) [14:08:17] andre__: I'd suggest pruning the logincookies table, it has about 5.5k entries atm [14:08:55] PROBLEM - HTTPS on cp4009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:08:55] PROBLEM - check configured eth on tridge is CRITICAL: NRPE: Command check_check_eth not defined [14:09:36] hoo, I'll ask on security@ as I'd love to get input [14:09:42] OK, more https errors coming up -- please ignore! 
[14:09:44] PROBLEM - HTTPS on cp4015 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:09:44] PROBLEM - HTTPS on cp4008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:09:45] PROBLEM - HTTPS on cp4005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:09:45] PROBLEM - HTTPS on cp4002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:09:45] PROBLEM - HTTPS on amssq47 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:09:45] PROBLEM - HTTPS on cp4020 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:09:47] andre__: Which security? [14:09:52] mozilla? [14:10:00] hoo: WMF mailing list [14:10:04] PROBLEM - HTTPS on cp4010 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:04] PROBLEM - HTTPS on cp4012 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:05] PROBLEM - HTTPS on cp4001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:05] PROBLEM - HTTPS on cp4019 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:10] meh [14:10:14] I'm not on that [14:10:14] PROBLEM - HTTPS on cp4017 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:14] PROBLEM - HTTPS on cp4004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:20] but opening a bug isn't an option [14:10:25] PROBLEM - HTTPS on cp4013 is CRITICAL: SSL_CERT CRITICAL
*.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:34] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: Connection refused [14:10:34] PROBLEM - HTTPS on cp4003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:34] PROBLEM - HTTPS on cp4016 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:34] PROBLEM - HTTPS on cp4018 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:34] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: Connection refused [14:10:35] PROBLEM - check configured eth on pdf3 is CRITICAL: Connection refused by host [14:10:44] PROBLEM - HTTPS on cp4014 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:10:54] PROBLEM - HTTPS on cp4006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:11:04] PROBLEM - HTTPS on cp4011 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:12:32] (PS1) Giuseppe Lavagetto: Adding two carbon-cache workers to graphite.
[operations/puppet] - https://gerrit.wikimedia.org/r/124846 [14:13:34] RECOVERY - HTTPS on iodine is OK: SSL_CERT OK - X.509 certificate for ticket.wikimedia.org from RapidSSL CA valid until Feb 1 04:15:29 2016 GMT (expires in 663 days) [14:14:04] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [14:14:04] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [14:14:04] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [14:14:04] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [14:14:12] !log otrs back up, live hacked apache change, now working permanent puppet change (puppet is disabled on iodine at present) [14:14:18] Logged the message, RobH [14:16:07] (CR) Rush: [C: 1] "hooray!" [operations/puppet] - https://gerrit.wikimedia.org/r/124846 (owner: Giuseppe Lavagetto) [14:16:31] (CR) Gage: [C: 1] Adding two carbon-cache workers to graphite. [operations/puppet] - https://gerrit.wikimedia.org/r/124846 (owner: Giuseppe Lavagetto) [14:16:33] (PS1) RobH: fixing ticket.wikimedia.org apache vhost file [operations/puppet] - https://gerrit.wikimedia.org/r/124848 [14:17:02] (CR) RobH: [C: 2 V: 2] fixing ticket.wikimedia.org apache vhost file [operations/puppet] - https://gerrit.wikimedia.org/r/124848 (owner: RobH) [14:22:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [14:24:28] (PS1) Andrew Bogott: This is still wrong but slightly better! [operations/puppet] - https://gerrit.wikimedia.org/r/124850 [14:27:25] (PS2) Andrew Bogott: This is still wrong but slightly better!
[operations/puppet] - https://gerrit.wikimedia.org/r/124850 [14:29:21] (PS1) Alexandros Kosiaris: Some extra improvements for check_eth [operations/puppet] - https://gerrit.wikimedia.org/r/124854 [14:29:35] (CR) Andrew Bogott: [C: 2] This is still wrong but slightly better! [operations/puppet] - https://gerrit.wikimedia.org/r/124850 (owner: Andrew Bogott) [14:30:47] (PS1) Filippo Giunchedi: RT #7243 add filippo to admins::roots [operations/puppet] - https://gerrit.wikimedia.org/r/124855 [14:31:53] (CR) Alexandros Kosiaris: [C: 2] Some extra improvements for check_eth [operations/puppet] - https://gerrit.wikimedia.org/r/124854 (owner: Alexandros Kosiaris) [14:33:37] (PS2) Dzahn: RT #7243 add filippo to admins::roots [operations/puppet] - https://gerrit.wikimedia.org/r/124855 (owner: Filippo Giunchedi) [14:35:32] (CR) Matanya: [C: 1] "welcome!" [operations/puppet] - https://gerrit.wikimedia.org/r/124855 (owner: Filippo Giunchedi) [14:35:34] (CR) Dzahn: [C: 1] RT #7243 add filippo to admins::roots [operations/puppet] - https://gerrit.wikimedia.org/r/124855 (owner: Filippo Giunchedi) [14:35:56] PROBLEM - check configured eth on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:36:36] akosiaris: ^ [14:36:46] PROBLEM - check configured eth on dobson is CRITICAL: Connection refused by host [14:37:08] matanya: yeah, thanks [14:37:21] oh, you know.
ok :) [14:38:58] ACKNOWLEDGEMENT - check configured eth on dobson is CRITICAL: Connection refused by host alexandros kosiaris soon to be decom, hardy [14:40:45] RECOVERY - HTTPS on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [14:40:45] RECOVERY - HTTPS on sodium is OK: SSL_CERT OK - X.509 certificate for lists.wikimedia.org from RapidSSL CA valid until Jan 31 02:58:36 2016 GMT (expires in 662 days) [14:40:55] RECOVERY - HTTPS on cp4020 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [14:41:22] RECOVERY - HTTPS on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [14:41:32] RECOVERY - HTTPS on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [14:43:12] PROBLEM - check configured eth on pdf2 is CRITICAL: Connection refused by host [14:43:40] looking for a good place to run sqstat, [14:43:45] paravoid, know of one? [14:44:06] i tried moving this to analytics1003 yesterday, but it didn't work, and before I try again to figure out why [14:44:12] was just wondering if maybe there is a better place [14:45:08] (PS1) Giuseppe Lavagetto: Fix graphite-web cronjob.
[operations/puppet] - https://gerrit.wikimedia.org/r/124858 [14:47:52] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [14:48:25] !log disabling puppet to test sqstat on analytics1003 [14:48:29] Logged the message, Master [14:53:32] (PS1) Andrew Bogott: Check the proper cert on the cache boxes [operations/puppet] - https://gerrit.wikimedia.org/r/124859 [14:54:38] (PS2) Andrew Bogott: Check the proper cert on the cache boxes [operations/puppet] - https://gerrit.wikimedia.org/r/124859 [14:55:11] <_joe_> ori: please review my change to graphite conf and commit that if it seems good to you - graphite really needs some attention right now [14:56:20] (CR) Filippo Giunchedi: [C: 1] RT #7243 add filippo to admins::roots [operations/puppet] - https://gerrit.wikimedia.org/r/124855 (owner: Filippo Giunchedi) [14:56:38] (CR) Andrew Bogott: [C: 2] Check the proper cert on the cache boxes [operations/puppet] - https://gerrit.wikimedia.org/r/124859 (owner: Andrew Bogott) [14:56:44] (CR) Filippo Giunchedi: [C: 2] RT #7243 add filippo to admins::roots [operations/puppet] - https://gerrit.wikimedia.org/r/124855 (owner: Filippo Giunchedi) [14:56:51] (PS2) Ottomata: Fix graphite-web cronjob. [operations/puppet] - https://gerrit.wikimedia.org/r/124858 (owner: Giuseppe Lavagetto) [14:57:04] (CR) Ottomata: [C: 2 V: 2] Fix graphite-web cronjob. [operations/puppet] - https://gerrit.wikimedia.org/r/124858 (owner: Giuseppe Lavagetto) [14:57:15] I gotcha _joe_ [14:58:03] <_joe_> ottomata: what did I do wrong? [14:58:30] eh? [14:58:40] looks good to me [14:58:41] i merged it [14:58:42] eh? [14:58:46] <_joe_> Oh ok you just rebased [14:58:49] <_joe_> :) [14:59:07] <_joe_> I couldn't find the difference between the two revisions [14:59:08] yup [14:59:14] <_joe_> thanks, it fixes the labs instance [14:59:32] graphite.wikimedia.org is down right now too?
[14:59:47] <_joe_> ottomata: If you happen to see ori around, please point it to the other change [14:59:51] <_joe_> ottomata: yes :( [14:59:58] <_joe_> s/it/him/ [15:00:18] what other change? [15:00:51] <_joe_> ottomata: https://gerrit.wikimedia.org/r/124858 [15:00:59] <_joe_> oh no sorry [15:01:01] <_joe_> 1 sec [15:01:10] <_joe_> https://gerrit.wikimedia.org/r/#/c/124846/ [15:02:06] !log stopped puppet on emery to test sqstat on analytics1003 [15:02:11] Logged the message, Master [15:02:35] <_joe_> graphite goes down because somebody added a lot of metrics to it in the last week, and it cannot handle the load, apparently [15:02:44] hm, aye ok [15:03:31] _joe_, anything we can do to make graphite just kinda come back online until ori gets to review that? [15:03:37] can I kick a service or two? [15:03:47] <_joe_> so one possible solution is to add a few more carbon-cache workers [15:03:49] <_joe_> ottomata: kick uwsgi on tungsten [15:03:52] RECOVERY - HTTPS on cp4009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:03:53] RECOVERY - HTTPS on amssq47 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:03:56] <_joe_> it will last like 10 mins [15:04:02] RECOVERY - HTTPS on cp4014 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:02] RECOVERY - HTTPS on cp4011 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:12] RECOVERY - HTTPS on cp4007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:12] RECOVERY - HTTPS on cp4001 is OK: SSL_CERT OK -
X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:13] RECOVERY - HTTPS on cp4017 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:13] RECOVERY - HTTPS on cp4002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:13] RECOVERY - HTTPS on cp4015 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:22] RECOVERY - HTTPS on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:22] RECOVERY - HTTPS on cp4019 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:22] RECOVERY - HTTPS on cp4016 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:23] RECOVERY - HTTPS on cp4018 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:32] RECOVERY - HTTPS on cp4013 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:32] RECOVERY - HTTPS on cp4010 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:04:32] RECOVERY - HTTPS on cp4005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 
2016 GMT (expires in 651 days) [15:04:46] <_joe_> ottomata: if you feel adventurous enough, /sbin/carbonctl restart [15:04:48] (PS1) Alexandros Kosiaris: WIP: Introducing Shinken module [operations/puppet] - https://gerrit.wikimedia.org/r/124861 [15:07:42] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [15:07:49] ottomata: fyi kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 7.09796838796e-21 [15:07:55] dunno, just seeing it in icinga [15:14:28] danke [15:17:27] ottomata: oh, i just saw your comment on that RT for Filippo.. but we already did most [15:17:40] (because we're sitting next to each other in Athens) [15:17:49] thanks for offering [15:18:27] yeah, saw that [15:18:29] thanks! [15:18:31] (PS1) Filippo Giunchedi: RT #7246 add fgiunchedi to icinga contact groups [operations/puppet] - https://gerrit.wikimedia.org/r/124863 [15:23:38] (CR) Dzahn: [C: 2] RT #7246 add fgiunchedi to icinga contact groups [operations/puppet] - https://gerrit.wikimedia.org/r/124863 (owner: Filippo Giunchedi) [15:23:52] RECOVERY - DPKG on virt1000 is OK: All packages OK [15:24:23] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2322.77923734 [15:24:50] (PS1) Ottomata: Moving sqstat to analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/124864 [15:25:02] ottomata: :) [15:25:14] (CR) Ottomata: [C: 2 V: 2] Moving sqstat to analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/124864 (owner: Ottomata) [15:27:14] (PS1) Filippo Giunchedi: RT #7246 add Filippo Giunchedi to icinga cgi allowed commands [operations/puppet] - https://gerrit.wikimedia.org/r/124865 [15:28:49] (CR) Dzahn: [C: 2] RT #7246 add Filippo Giunchedi to icinga cgi allowed commands [operations/puppet] - https://gerrit.wikimedia.org/r/124865 (owner: Filippo Giunchedi) [15:33:56]
RECOVERY - HTTPS on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 651 days) [15:39:40] (Abandoned) Cmjohnson: add interface speed check for all hosts [operations/puppet] - https://gerrit.wikimedia.org/r/124606 (owner: Cmjohnson) [15:41:09] (CR) Cmjohnson: [C: 2] hume: decom, left mgmt [operations/dns] - https://gerrit.wikimedia.org/r/122609 (owner: Matanya) [15:44:02] ACKNOWLEDGEMENT - gdash.wikimedia.org on tungsten is CRITICAL: Connection refused andrew bogott These are due to icinga bugs of some sort... something to do with https://gerrit.wikimedia.org/r/#/c/124840/ and related patches. [15:44:02] ACKNOWLEDGEMENT - graphite.wikimedia.org on tungsten is CRITICAL: Connection refused andrew bogott These are due to icinga bugs of some sort... something to do with https://gerrit.wikimedia.org/r/#/c/124840/ and related patches. [15:46:34] (CR) Dzahn: [C: 2] decom: ms6 [operations/puppet] - https://gerrit.wikimedia.org/r/124839 (owner: Matanya) [15:50:01] (CR) Giuseppe Lavagetto: "As a reference: we moved from 35k metrics/min to 65k metrics/min in the last few weeks:" [operations/puppet] - https://gerrit.wikimedia.org/r/124846 (owner: Giuseppe Lavagetto) [15:50:19] hey [15:51:39] (PS2) Ori.livneh: Adding two carbon-cache workers to graphite. [operations/puppet] - https://gerrit.wikimedia.org/r/124846 (owner: Giuseppe Lavagetto) [15:51:47] (CR) Ori.livneh: [C: 2 V: 2] Adding two carbon-cache workers to graphite.
[operations/puppet] - https://gerrit.wikimedia.org/r/124846 (owner: Giuseppe Lavagetto) [15:52:00] !log ms6 - revoke puppet cert, salt key, remove from icinga [15:52:05] Logged the message, Master [15:54:15] (PS1) Ottomata: Adding $data_dir parameter to set path.data [operations/puppet] - https://gerrit.wikimedia.org/r/124869 [16:02:39] (PS3) Ottomata: Adding logster module and using it to monitor CirrusSearch-slow.log [operations/puppet] - https://gerrit.wikimedia.org/r/123466 [16:03:13] jgage: mind looking that one over real quick? ^ [16:03:33] wanna merge but would appreciate one other opsen nitpicking at it before I do [16:07:21] (PS1) Ottomata: Installing make on stat nodes [operations/puppet] - https://gerrit.wikimedia.org/r/124871 [16:07:25] greg-g: we would like to deploy quick thing, if there is time open [16:07:37] aude: what quick thing? [16:07:39] g'morning! [16:07:40] https://gerrit.wikimedia.org/r/#/c/124870/ [16:07:41] (CR) Ottomata: [C: 2 V: 2] Installing make on stat nodes [operations/puppet] - https://gerrit.wikimedia.org/r/124871 (owner: Ottomata) [16:08:00] some issues with formatting dates (as we switched to backend /api handling vs. js ) [16:08:05] aude: gotcha, yeah [16:08:20] since we had no time on test.wikidata etc., we think this is the best action [16:08:33] we can wait until next time we deploy in a few weeks [16:08:41] aude: I think manybubbles (not here) and MaxSem were going to do something starting now-ish [16:08:44] users won't miss much of anything [16:08:48] k [16:08:49] ok [16:09:02] it's not horribly broken [16:09:10] not super urgent but sooner better [16:09:34] greg-g, aude, since Nik is away ATM, feel free to deploy [16:09:38] ok [16:09:52] thanks MaxSem [16:10:04] won't take more than 5 min [16:12:06] waits for jenkins [16:12:31] * aude impatient [16:13:52] ...
[16:14:20] almost done [16:17:04] !log aude synchronized php-1.23wmf21/extensions/Wikidata 'Switch Wikidata back to previous version of Wikibase' [16:17:09] Logged the message, Master [16:17:15] shall verify it's good [16:17:28] see no reason why not [16:21:52] go ahead [16:27:41] hey manybubbles, ready to do GeoData? [16:27:56] sure! you want me to sync or you want to sync? [16:27:56] if so, I'll prepare commits [16:27:58] sure [16:28:47] !log fiddling with Elasticsearch cluster balancing options trying to get enwiki better balanced [16:28:52] Logged the message, Master [16:32:12] (CR) Matanya: Adding logster module and using it to monitor CirrusSearch-slow.log (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/123466 (owner: Ottomata) [16:32:26] !log maxsem synchronized php-1.23wmf21/extensions/GeoData [16:32:31] Logged the message, Master [16:33:23] (PS1) Ori.livneh: mwprof: change collector port to 3811 [operations/puppet] - https://gerrit.wikimedia.org/r/124875 [16:33:38] (CR) Ottomata: Adding logster module and using it to monitor CirrusSearch-slow.log (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/123466 (owner: Ottomata) [16:33:58] (PS1) MaxSem: Enable $wgGeoDataUseCirrusSearch on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124876 [16:35:01] (CR) Matanya: [C: 1] Adding logster module and using it to monitor CirrusSearch-slow.log [operations/puppet] - https://gerrit.wikimedia.org/r/123466 (owner: Ottomata) [16:35:06] (PS1) Ori.livneh: Change default port to 3811 [operations/software/mwprof/reporter] - https://gerrit.wikimedia.org/r/124877 [16:36:06] (CR) MaxSem: [C: 2] Enable $wgGeoDataUseCirrusSearch on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124876 (owner: MaxSem) [16:36:15] (Merged) jenkins-bot: Enable $wgGeoDataUseCirrusSearch on testwiki [operations/mediawiki-config] -
https://gerrit.wikimedia.org/r/124876 (owner: MaxSem) [16:36:32] ottomata: https://gerrit.wikimedia.org/r/#/c/85337 (the eventlogging kafka patch) should be good to go. I set the topic to 'eventlogging-test01' for now since we might as well reserve the name 'eventlogging' for some future iteration of this setup [16:37:01] i moved the actual python code to eventlogging itself rather than keep it a plugin [16:37:32] (CR) Ori.livneh: [C: 2 V: 2] Change default port to 3811 [operations/software/mwprof/reporter] - https://gerrit.wikimedia.org/r/124877 (owner: Ori.livneh) [16:37:48] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/124876' [16:37:52] Logged the message, Master [16:38:39] manybubbles, done - now an index rebuild is needed, but I don't dare myself:P [16:38:50] MaxSem: on test wiki? [16:38:55] test2 or test? [16:39:05] (PS2) Ori.livneh: mwprof: change collector port to 3811 [operations/puppet] - https://gerrit.wikimedia.org/r/124875 [16:39:10] (CR) Ori.livneh: [C: 2 V: 2] mwprof: change collector port to 3811 [operations/puppet] - https://gerrit.wikimedia.org/r/124875 (owner: Ori.livneh) [16:39:14] gerrit says test [16:40:10] ori: , cool [16:40:13] re topic name [16:40:20] there is currently no way to delete topic names in kafka [16:40:26] there is in 0.8.1, but apparently it is kinda buggy [16:40:42] ottomata: so should we just go with 'eventlogging'? [16:40:48] just for organizational reasons, i'd prefer to add fewer topics if possible [16:40:59] well, i actually have been thinking about starting to version topic names [16:41:03] !log reindexed testwiki to soak up geo changes [16:41:06] MaxSem: ^^^ [16:41:09] Logged the message, Master [16:41:11] for example, I'm going to have to create new topics when I add a new replica node [16:41:14] ottomata: what do the versions represent?
[16:41:19] because I can't change replica settings for existing topics [16:41:23] that is, what changes from one to the next [16:41:25] just iterations i guess [16:41:46] like, whenever you HAVE to change something about the topic, you basically have to use a new topic [16:41:48] oh, that must be frustrating [16:42:15] yeah, that's why they made that feature for 0.8.1, but there wasn't really anything in 0.8.1 that we needed, and i hear tell that it is kinda buggy right now [16:42:20] so i'm not thinking about upgrading yet [16:42:21] so: 'eventlogging-0'? [16:42:26] maybe 00? [16:42:33] sure [16:43:01] ok cool [16:43:06] i'll look at the change in a bit [16:43:48] (PS1) MaxSem: GeoData to elasticsearch in debug mode on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124880 [16:43:56] manybubbles, sorry - I had a phone call [16:44:10] ottomata: yep. the change itself is pretty simple, but i'll need your help to confirm that things are getting written. but it's not urgent at all. [16:44:20] k [16:44:20] MaxSem: np - I think it is ready for you to test [16:44:34] manybubbles: I was going to start in on reformatting a node [16:44:45] but then realized we needed to puppetize path.data, eh? [16:44:47] ottomata: fine! [16:44:49] see your gerrit [16:44:50] :) [16:44:57] ? [16:45:08] multiple data directories now, right? [16:45:12] https://gerrit.wikimedia.org/r/#/c/124869/ [16:45:15] stripe [16:45:24] raid 0?
[16:45:28] i thought we were going to JBOD it and let es do it [16:45:35] I thought raid 0 [16:45:40] it's easier on the monitoring [16:45:49] I check rt [16:46:04] ok reading [16:46:06] me too [16:46:08] (CR) MaxSem: [C: 2] GeoData to elasticsearch in debug mode on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124880 (owner: MaxSem) [16:46:15] (Merged) jenkins-bot: GeoData to elasticsearch in debug mode on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124880 (owner: MaxSem) [16:46:42] looks like rt says raid 0. I think raid 0 will be easier to manage and we're already setting up raid so not a big deal [16:46:43] ok, it is unclear, you do say you prefer raid 0 in the RT [16:46:44] but not sure why [16:46:47] hm [16:47:17] easier to manage, a bit faster, and we'll have to do nasty stuff if one disk fails anyway, so why not make that nasty stuff repartition [16:47:21] we already have the redundancy [16:48:08] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/124880' [16:48:13] Logged the message, Master [16:48:46] sounds more annoying to manage to me [16:48:50] disk fails [16:48:51] raid alert [16:48:54] ah, what's wrong? [16:48:54] which disk [16:48:57] can we rebuild raid? [16:49:04] man mdadm, uhhhh ok go! [16:49:05] ...wait [16:49:08] is it working? [16:49:09] hmmm [16:49:17] vs. [16:49:25] 1 disk fails.
[16:49:30] es on node is down [16:49:33] replace disk [16:49:36] let es rebuild stuff [16:49:41] i'm googling for opinions [16:49:43] found this: [16:49:44] https://groups.google.com/forum/#!msg/elasticsearch/RrFCex8JJ28/mcHu1LSV-TAJ [16:49:49] but it isn't very helpful [16:50:31] if ES says RAID 0 is better than their own JBOD balancing [16:50:32] then sure [16:50:33] but hm [16:51:02] they do say it is better [16:51:04] (PS4) Ori.livneh: Add EventLogging Kafka writer plug-in [operations/puppet] - https://gerrit.wikimedia.org/r/85337 [16:51:15] their jbod stuff is supposed to be for folks who don't want to mess with raid [16:51:35] yeah, they say that, but that's about it [16:51:36] ok ok ok [16:51:38] raid 0 it is then [16:51:47] here, we have hangout first [16:51:56] ? [17:01:31] so manybubbles, we should abandon that data.path change then? [17:01:41] yah, I think so [17:01:45] k [17:02:07] (Abandoned) Ottomata: Adding $data_dir parameter to set path.data [operations/puppet] - https://gerrit.wikimedia.org/r/124869 (owner: Ottomata) [17:15:00] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [17:15:00] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [17:15:00] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [17:15:00] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [17:20:14] hmm manybubbles, does it require a forced reindex? cuz I started receiving one coordinate in results after I null edited the page in question [17:20:28] MaxSem: it wasn't working? [17:21:04] yep [17:21:17] ah, yea, let me do this [17:22:01] !log regenerating Elasticsearch index from mediawiki for testwiki to soak up geo changes.
[17:22:04] that probably was required [17:22:06] Logged the message, Master [17:22:11] well, you tested it and it was [17:23:35] for the reference, what commands did you use? [17:27:50] (PS1) Yurik: Add Zero ext to zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124886 [17:28:27] manybubbles, for the reference - what commands did you use? [17:29:48] MaxSem: https://wikitech.wikimedia.org/wiki/Search - I did both an In Place Reindex (to get config changes) then a Full Reindex [17:29:56] they are poorly named [17:30:07] one is fast and atomic and only does config changes [17:30:16] and the other is slow and regenerates the index from mediawiki [17:30:45] but it isn't atomic- so if you have to get config changes out you do one then the other [17:30:59] greg-g: deploying setting changes [17:31:20] (PS1) Dr0ptp4kt: Add proxy support for 437-01. [operations/puppet] - https://gerrit.wikimedia.org/r/124887 [17:31:26] and "atomic" isn't quite the right word. it rebuilds the index by spinning all the text through the php process without asking mediawiki to render the pages [17:31:34] then it swaps two pointers in quick succession [17:31:38] each swap is atomic [17:31:43] but the whole process isn't [17:32:22] bblack, when you have a minute would you please review and, if appropriate, +2 https://gerrit.wikimedia.org/r/124887 ? [17:32:41] and technically the swaps are part of distributed cluster state - so I'm not 100% clear on the details of how out of sync each node can get.
"not very" is my experience though [17:34:12] (CR) Yurik: [C: 2] Add Zero ext to zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124886 (owner: Yurik) [17:34:20] (Merged) jenkins-bot: Add Zero ext to zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124886 (owner: Yurik) [17:38:56] !log yurik synchronized wmf-config/InitialiseSettings.php [17:39:02] Logged the message, Master [17:41:48] MaxSem: the rebuild finished [17:42:03] thanks manybubbles [17:47:40] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:48:31] PROBLEM - MySQL Recent Restart on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:48:40] PROBLEM - MySQL Slave Running on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:49:21] RECOVERY - MySQL Recent Restart on db1047 is OK: OK 1276608 seconds since restart [17:49:30] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [17:49:31] RECOVERY - MySQL Slave Running on db1047 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [17:51:20] Hi, I'm being asked if you have changed the ssl certificates of all potentially affected WMF sites already (including Bugzilla). Can you tell? [17:51:24] (CR) Ottomata: [C: 2] module to manage new python-diamond package [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush) [17:53:27] pajz: yes. [17:56:49] Ok, thanks. [17:57:54] pajz: np, it is now safe to change your password and not worry about residual issues [17:58:30] pajz: you may have seen that we are also logging everyone out of the site (forcing people to relogin). It takes a while but should be done in the next couple days for everyone. [18:01:20] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1523 MB (2% inode=86%): [18:02:11] I see, thanks.
[18:28:37] ottomata: hey, can I have a few minutes with the elasticsearch cluster before you zap a node? [18:28:53] sure, i am not doing it right now anyway manybubbles [18:28:56] other meetings and stuff [18:38:45] (PS1) Matanya: decom: ms6 [operations/dns] - https://gerrit.wikimedia.org/r/124901 [18:44:11] (PS1) Manybubbles: Elasticsearch config to better spread shards [operations/puppet] - https://gerrit.wikimedia.org/r/124903 [18:45:00] (PS1) Yurik: Default wg(Add|Remove)Groups for zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/124904 [18:45:46] RobH: Has the cert. for Bugzilla already been changed? [18:46:03] (CR) Manybubbles: "These settings have already been applied using the curl api. This persists them across cluster restarts." [operations/puppet] - https://gerrit.wikimedia.org/r/124903 (owner: Manybubbles) [18:54:28] greg-g: i need to adjust group rights for zero wiki, when would be a good time? should be very quick https://gerrit.wikimedia.org/r/#/c/124904/ [18:56:51] for deployment::target generally, it'd be nice to make trebuchet a custom provider for Package resources [18:57:05] that was @ottomata [18:57:53] ha, whoa that would be fancy [18:57:56] so you'd have something like: package { 'eventlogging': provider => 'trebuchet' } or something like that. and if you wanted to keep things truly wmf-agnostic, you could parametrize the provider, so that someone could use pip / apt instead [18:58:39] class eventlogging( $package_provider = 'pip' ) { package { 'eventlogging': provider => $package_provider, } } [18:58:40] etc. [18:59:09] and then it would simply be the role class that passes trebuchet in as a parameter [19:00:01] ottomata: Got a clue whether the BZ cert. has already been changed?
[19:00:10] custom providers are pretty easy btw: http://docs.puppetlabs.com/guides/provider_development.html [19:01:56] yurik: yeah, now's fine (sorry, was on a call) [19:02:11] greg-g: oki, thx [19:02:29] hoo, no I don't, does RobH know? [19:02:31] (03CR) 10Yurik: [C: 032] Default wg(Add|Remove)Groups for zerowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124904 (owner: 10Yurik) [19:02:39] (03Merged) 10jenkins-bot: Default wg(Add|Remove)Groups for zerowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124904 (owner: 10Yurik) [19:04:05] ottomata: he doesn't respond atm [19:04:38] I got the suspicion that cert. is locally on circonium only... (or in the private puppet)... mind to check? [19:04:51] ./files/apache/sites/bugzilla.wikimedia.org: SSLCertificateFile /etc/ssl/certs/bugzilla.wikimedia.org.pem [19:08:13] yurik: let csteipp know when you're done, he also needs to get in before parsoid ;) [19:10:37] !log yurik synchronized wmf-config/InitialiseSettings.php [19:12:17] csteipp: greg-g, go ahead. Need to figure out group security further - not working too well for me [19:12:45] I'm going to get some lunch, bbiab [19:12:47] yurik: Thanks! [19:15:43] !log csteipp synchronized php-1.23wmf21/extensions/CentralAuth/maintenance [19:15:48] Logged the message, Master [19:16:27] greg-g: I'm done. Thanks! [19:27:15] ottomata: Could you please check the BZ cert? I guess it's in the private puppet which I have no access to... [19:29:30] hoo, I imagine the public key would change too, no? 
[19:29:38] the last time it changed in public puppet repo is Nov 4 [19:29:40] 2013 [19:30:07] Oh well, crap :( [19:55:02] (03PS1) 10Ottomata: Using kraken/deploy repository for deployments instead of kraken [operations/puppet] - 10https://gerrit.wikimedia.org/r/124993 [19:58:20] (03CR) 10Ottomata: [C: 032 V: 032] Using kraken/deploy repository for deployments instead of kraken [operations/puppet] - 10https://gerrit.wikimedia.org/r/124993 (owner: 10Ottomata) [20:07:30] mwalker|food, RoanKattouw, ebernhardson: I'm going to be out during SWAT today, so count me out please [20:07:39] OK [20:08:22] RoanKattouw: I still owe you a reply re: scheduling. Since I haven't found something readymade that would help us automate things, I'm OK with being more explicit. I can note that in a reply to your e-mail. [20:16:00] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [20:16:00] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [20:16:00] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [20:16:00] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [20:18:00] PROBLEM - MySQL Processlist on db1056 is CRITICAL: CRIT 28 unauthenticated, 0 locked, 71 copy to table, 12 statistics [20:19:00] RECOVERY - MySQL Processlist on db1056 is OK: OK 2 unauthenticated, 0 locked, 0 copy to table, 2 statistics [21:10:57] hey Reedy, yt? [21:14:27] bd808: yt? [21:14:37] ottomata: pong [21:14:59] know anything about this?
[21:14:59] https://gerrit.wikimedia.org/r/#/c/119677/ [21:15:14] this broke analytics (and probably other) deployments [21:15:21] because it limits the machines that can be deployed to [21:15:24] and doesn't include analytics nodes [21:15:31] Blerg [21:15:47] https://gerrit.wikimedia.org/r/#/c/119677/11/manifests/role/deployment.pp [21:15:52] What subnets did we miss? I thought we were making it larger? [21:15:56] this should probably be more than just mw appservers, no? [21:16:37] analytics needs 10.64.5.0/24, 10.64.21.0/24, 10.64.36.0/24, 10.64.53.0/24 [21:16:45] 10.64.36.0/24 isn't in there [21:16:48] that's what I was just checking [21:17:15] 10.64.32.0/22 ends at 10.64.35.254 [21:17:29] It wasn't on purpose. There is a comment inline in one of the revisions where I changed 10.64.0.0/16 to 10.64.16.0/22 [21:17:58] I couldn't find a reference to /16 in other locations within puppet [21:18:24] yeah, dunno of other /16s [21:18:30] On the plus side it should be easier to fix now; just add the right subnets to network::constants [21:18:35] just curious, why are we limiting this? [21:18:40] yeah, i can add to mw_appserver [21:18:40] although [21:18:47] they aren't mw_appservers :p [21:19:19] * bd808 shrugs [21:19:23] seems weird to use that constant here [21:19:32] trebuchet should be used for lots more deployments than just mw_app [21:19:34] right? [21:19:35] The limits were there before, we were just trying to consolidate [21:19:40] If we allow all of 10.*.*.*... Labs in theory can read PrivateSettings.php [21:19:44] aye ok [21:19:48] i see [21:20:00] so limiting is cool [21:20:07] just not sure why we'd use a variable called mw_* [21:20:19] there's no mw stuff on analytics nodes :p [21:20:32] {{sofixit}} [21:20:32] :D [21:20:38] haha, oook [21:20:46] deployable_networks? [21:20:47] :p [21:20:48] Mostly because we made the variable for scap rsync and then realized it applied to trebuchet too [21:23:02] ottomata: What row is 10.64.53.0/24 in?
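The subnet arithmetic in the exchange above (10.64.32.0/22 ending at 10.64.35.254, so 10.64.36.0/24 falls outside it) can be checked with Python's standard `ipaddress` module. This is only a sketch: the helper names `last_host` and `covered` are made up for illustration, and the single-entry allow-list is an assumption, since the full contents of the puppet variable are not shown in the log.

```python
import ipaddress

# Range quoted in the discussion. Assumption: treated here as the only
# allow-list entry, because the full list is not shown in the log.
allowed = [ipaddress.ip_network("10.64.32.0/22")]

def last_host(net):
    """Last usable host address: one below the broadcast address."""
    return net.broadcast_address - 1

def covered(net, allow_list):
    """True if `net` lies entirely inside any network in `allow_list`."""
    return any(net.subnet_of(a) for a in allow_list)

analytics_c = ipaddress.ip_network("10.64.36.0/24")

print(last_host(allowed[0]))          # last usable address of the /22
print(covered(analytics_c, allowed))  # is 10.64.36.0/24 inside the /22?
```

A /22 spans four consecutive /24 blocks (here 10.64.32 through 10.64.35), which is why the next block, 10.64.36.0/24, needs its own entry.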
[21:23:36] * bd808 found it: analytics1-d-eqiad [21:23:44] yup [21:24:02] trying to find out if I can concat arrays in puppet... [21:24:43] It would be awesome if you figured out how to generate that list. My puppet fu wasn't strong enough. [21:25:53] (03PS7) 10BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674 [21:27:46] oo bd808, you can concat arrays [21:27:48] it looks like this [21:28:01] $big_array = [$array1, $array2] [21:28:03] pretty yyyyy dumb [21:28:11] o_0 [21:28:29] An array of arrays is just a flat array? [21:29:17] * bd808 hides his head from ruby dsl magic [21:29:53] yeah [21:30:06] http://weblog.etherized.com/posts/175 [21:30:29] um wut no, ruby doesn't do that [21:30:33] * MatmaRex calls puppet magic [21:31:27] "For all practical purposes, within puppet's DSL, arrays that contain multiple sub-arrays function as if they are a single array containing all elements of the sub arrays." [21:31:33] * bd808 still blames ruby [21:31:49] yeah, naw, puppet's fault [21:32:02] sigh unnghghhh [21:32:07] bd808: still not sure best way to do this [21:32:42] ottomata: I think it would be fine to add your subnets to the current variable. [21:32:50] You can rename if you want [21:32:58] ergh, it's used in other places though [21:33:09] and your subnets kinda overlap with some of mine [21:33:16] scap's rsync servers [21:33:24] um, i think hang on [21:34:32] ah nope [21:34:33] they don't [21:34:34] sigh ok [21:34:37] i'll add them and add a note [21:38:23] (03PS1) 10Ottomata: Adding analytics networks to $mw_appserver_networks variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/125011 [21:38:35] bd808: ^ [21:38:36] s'ok? [21:39:31] * bd808 looks [21:42:01] ottomata: I'm ok with it, but I'm not the keeper of ops/puppet prettiness. If you want to rename/split or roll back the change that affects trebuchet I'm fine with any of those options too.
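The flattening behaviour quoted at 21:31:27 (nested arrays acting as one flat array) can be pictured with a small Python analogy. This is not Puppet code and makes no claim about how Puppet implements it; `flatten` is a hypothetical helper, the subnet values are placeholders picked from the discussion rather than the real variable contents, and `deployable_networks` is just the name floated in the chat.

```python
def flatten(items):
    """Recursively expand nested lists into one flat list, keeping order."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# Placeholder values taken from the discussion, not the real puppet variables.
mw_networks = ["10.64.16.0/22", "10.64.32.0/22"]
analytics_networks = ["10.64.36.0/24", "10.64.53.0/24"]

# Analogue of `$big_array = [$array1, $array2]` being usable as one list.
deployable_networks = flatten([mw_networks, analytics_networks])
print(deployable_networks)
```

In plain Python (as in plain Ruby) the literal `[mw_networks, analytics_networks]` stays a list of two lists, which is the surprise being discussed; the explicit `flatten` step stands in for what the quoted post says Puppet's DSL does in effect.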
[21:42:17] i really want to just get this deploy out and make sure it works and quit working today [21:42:20] so ummmm [21:42:21] hm [21:42:27] i will send an email with a note [21:42:29] we can figure this out shortly [21:42:37] i'm not sure the best to do, i don't touch network.pp too much [21:42:46] hopefully faidon will have an opinion [21:43:06] There is little chance of him not having one :) [21:43:09] hahah [21:43:10] true! [21:43:23] would appreciate a +1 on that though, so I don't look like a lone crazy person :p [21:43:33] (too late) [21:43:38] NOooooooo [21:43:38] hah [21:43:44] :) [21:43:52] (03CR) 10BryanDavis: [C: 031] "I'm ok with it, but I'm not the keeper of ops/puppet prettiness. If you want to rename/split or rollback the change that effects trebuchet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125011 (owner: 10Ottomata) [21:44:43] We're all crazy here! Right? (/me looks for reassurances) [21:44:55] (03CR) 10Ottomata: [C: 032 V: 032] "Faidon," [operations/puppet] - 10https://gerrit.wikimedia.org/r/125011 (owner: 10Ottomata) [21:44:58] bd808: it's why were here together [21:45:09] Gooble gobble [21:46:35] bd808: chicken chicken https://www.youtube.com/watch?v=yL_-1d9OSdk [21:47:08] <^demon|away> bd808: Yes, we're all crazy. [21:47:10] <^demon|away> Sanity's overrated. [21:47:51] RobH: ping [21:47:51] hahaha [21:55:48] greg-g: Oh man. The chicken chicken paper was a huge meme at my last job [22:00:00] bd808: hah [22:16:02] (03CR) 10Faidon Liambotis: "The comments are extensive, so this looks fine to me." [operations/puppet] - 10https://gerrit.wikimedia.org/r/125011 (owner: 10Ottomata) [22:16:19] paravoid: thank you! I was just about a good explanation on the cert question [22:17:02] yvw :) [22:18:04] paravoid: Can I haz new cert. for Bugzilla? 
:P [22:18:53] there is no evidence that it's possible to extract private keys with heartbleed on Linux systems [22:19:16] lots of people are trying, none have succeeded so far aiui [22:19:23] (but they have on freebsd) [22:19:46] so we decided to not replace all of our one-off certificates [22:19:49] paravoid: mh... so should we kill the user sessions and are good for now? [22:20:02] yeah [22:20:05] Not a fan of that, but ok [22:20:07] doing then [22:20:16] why not? [22:20:42] paravoid: From time to time where's some awry stuff on bugzilla (security wise), so I'd rather see us paranoid about it [22:21:06] paravoid: Really? I had heard it was possible... but I hadn't seen otherwise. [22:21:52] I would say that some people spend way too much time on line [22:21:59] https://twitter.com/neelmehta/status/453625474879471616 [22:22:02] but that line would backfire right now... [22:22:20] <^d> apergos: Indeed! We should take a break :) [22:22:21] (that's the person that found this vuln, not just some random dude :)) [22:22:32] yep [22:24:09] even on freebsd, the only way they've managed to exploit this is by restarting apache and using the exploit right after that [22:24:58] ^d: I will be taking a break soon. it's called "sleep" [22:25:07] I've heard it's good for one's health [22:25:16] <^d> mmm, sleep [22:25:54] paravoid: I read this earlier today: https://twitter.com/puellavulnerata/status/453652636055519233 [22:26:03] but I guess openssl.org runs openbsd [22:26:10] which has a different malloc and whatnot [22:26:49] looks like Ubuntu [22:27:47] (03PS1) 10BryanDavis: Fully qualify reference to hhvm class [operations/puppet] - 10https://gerrit.wikimedia.org/r/125014 [22:29:21] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [22:29:41] ok, I'd then suggest to kill the sessions on BZ? Any objections? 
[22:29:43] csteipp: ^ [22:30:18] hoo: No, I think the reset is fine, if we're sure our key isn't out there [22:30:48] If we decide to get a new key later on, we can throw out the users again, it's not that painful to re-login [22:31:36] (03CR) 10BryanDavis: "Cherry-picked on deployment-salt. Resolves error." [operations/puppet] - 10https://gerrit.wikimedia.org/r/125014 (owner: 10BryanDavis) [22:33:08] !log Logged out all Bugzilla users by deleting all session cookie data from mysql [22:33:13] Logged the message, Master [22:33:53] Anyone with access to private data should change their passwords and the usual stuffs... *sigh* [22:34:26] andre__: ^ fyi [22:47:33] !log Jenkins slaves in labs seem to be down. Zuul is stacking up jobs for hasNpm nodes (integration slaves in labs). Both slaves have 7/7 executors idle. [22:47:39] Logged the message, Master [22:50:43] ebernhardson: Could you do today's SWAT window? [22:50:46] Ori is away and I'm busy [22:50:56] RoanKattouw: yea i can do that [22:51:00] Sweet thanks man [22:54:10] !log Zuul has lots of queued jobs for npm slaves, but neither Jenkins nor integration-slave1001.eqiad.wmflabs and 1002 themselves have anything queued. They're idle, responsive and waiting for jobs. [22:54:15] Logged the message, Master [22:55:03] (03CR) 10Ori.livneh: [C: 032 V: 032] Fully qualify reference to hhvm class [operations/puppet] - 10https://gerrit.wikimedia.org/r/125014 (owner: 10BryanDavis) [22:55:14] ebernhardson: I'm back and can help if you like [22:57:59] Hmm [22:58:05] Is the Gerrit-to-Bugzilla bot broken [22:58:06] ? [22:58:13] Could this be related to logging all users out? [22:59:03] <^d> RoanKattouw: No clue, maybe. (in that order) [22:59:08] <^d> Also, s/bot/plugin/ [22:59:12] <^d> It's not a bot :) [23:00:18] * ^d is checking now [23:00:27] !log Restarting Jenkins because I have no clue what is going on and have no time to investigate yet another random clogging of all jobs. Restart ought to fix it. 
[23:00:32] Logged the message, Master [23:01:05] <^d> !log gerrit: reloaded bugzilla plugin to force it to log back in [23:01:10] Logged the message, Master [23:01:13] <^d> RoanKattouw: Fix-ded? [23:01:19] Maybe? [23:01:22] <^d> Log says it logged in now [23:01:33] I'll find out once Jenkins stops being broken and starts merging things [23:01:40] <^d> [2014-04-09 22:59:43,280] INFO com.google.gerrit.server.plugins.PluginLoader : Reloading plugin hooks-bugzilla [23:01:40] <^d> [2014-04-09 22:59:43,325] INFO com.googlesource.gerrit.plugins.hooks.bz.BugzillaModule : Bugzilla is configured as ITS [23:01:40] <^d> [2014-04-09 22:59:44,085] INFO com.googlesource.gerrit.plugins.hooks.bz.BugzillaItsFacade : Connected to https://bugzilla.wikimedia.org as username=gerritadmin@wikimedia.org, userid=16761 [23:01:42] <^d> [2014-04-09 22:59:44,604] INFO com.googlesource.gerrit.plugins.hooks.bz.BugzillaItsFacade : Connected to Bugzilla at https://bugzilla.wikimedia.org/xmlrpc.cgi, reported version is 4.4.1 [23:01:47] <^d> ^ There's my logspam from the reload [23:03:39] (03CR) 10EBernhardson: [C: 032] Enable math VE plugin on labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/120838 (owner: 10Catrope) [23:03:57] (03Merged) 10jenkins-bot: Enable math VE plugin on labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/120838 (owner: 10Catrope) [23:04:02] !log Jenkins and Zuul are back up. Queues have not been preserved. 
[23:04:05] RoanKattouw: ^ [23:04:07] Logged the message, Master [23:05:06] !log ebernhardson synchronized wmf-config/InitialiseSettings-labs.php 'Enable math VE plugin on labs' [23:05:11] Logged the message, Master [23:07:30] Krinkle: Awesome, thanks [23:10:07] (03PS7) 10Krinkle: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 (owner: 10Rush) [23:10:10] ebernhardson: for SWAT, https://gerrit.wikimedia.org/r/#/c/125018/ [23:16:12] ebernhardson: Another for SWAT – https://gerrit.wikimedia.org/r/#/c/125023/ [23:16:21] ebernhardson: Sorry for the late notice, finally got it merged. [23:16:47] James_F: ok, no problem [23:17:00] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [23:17:00] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [23:17:00] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [23:17:00] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [23:18:13] (03PS1) 10Ori.livneh: Corrections to the inline documentation for role::graphite [operations/puppet] - 10https://gerrit.wikimedia.org/r/125025 [23:18:15] (03PS1) 10Ori.livneh: Increase count of carbon cache instances to eight [operations/puppet] - 10https://gerrit.wikimedia.org/r/125026 [23:19:48] ^ paravoid [23:19:59] going to figure out which metrics are responsible for the increase in load now [23:20:31] I'm so tired I'm not in any position to find bugs in code [23:21:11] you have my blessing to self-merge whatever you think is best for this :) [23:21:27] kk [23:22:23] !log ebernhardson synchronized php-1.23wmf21/extensions/Flow 'Backport fix DB-to-cache pipeline for 1.23wmd21' [23:22:28] Logged the message, Master [23:25:39] James_F: does jenkins-bot run in the Math extension? 
Not getting any gate-and-submit after +2 [23:25:53] oh there it goes, i'm just impatient :P [23:26:07] ebernhardson: Yes. [23:26:12] ebernhardson: Hah, lag. [23:26:13] :-) [23:28:07] (03CR) 10Ori.livneh: [C: 032] Corrections to the inline documentation for role::graphite [operations/puppet] - 10https://gerrit.wikimedia.org/r/125025 (owner: 10Ori.livneh) [23:28:24] (03CR) 10Ori.livneh: [C: 032] Increase count of carbon cache instances to eight [operations/puppet] - 10https://gerrit.wikimedia.org/r/125026 (owner: 10Ori.livneh) [23:28:52] (03CR) 10EBernhardson: [C: 032] Increase Flow cache version [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124646 (owner: 10Matthias Mullie) [23:29:49] jenkins is just generally slow right now :( [23:30:07] I love that Bugzilla says "Bugzilla needs a legitimate login and password to continue. " [23:30:15] like, that it specifies "legitimate", as though that needed clarification. [23:30:56] ori: My Android 2.3 used to ask me for the RIGHT password whenever I entered the wrong one first [23:31:04] <^d> ori: "Totes legit" would be better :) [23:31:11] * a wrong one [23:31:15] (03Merged) 10jenkins-bot: Increase Flow cache version [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124646 (owner: 10Matthias Mullie) [23:31:26] ^d: heh, +1 [23:31:36] hoo: "oooooh, you meant the *right* password." [23:32:30] !log ebernhardson synchronized wmf-config/CommonSettings.php 'Update Flow cache version' [23:32:35] Logged the message, Master [23:33:52] James_F: https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/18911/console [23:34:05] James_F: checking to see what, but thats from merging your patch [23:34:15] ebernhardson: Yeah, qunit fails every now and then right now; just re-+2. [23:34:21] lol, ok [23:34:28] (03PS1) 10MarkTraceur: First pilot site for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125031 [23:34:29] ebernhardson: Krinkle's been working on it for days. 
:-( [23:34:30] (03PS1) 10MarkTraceur: FUTURE: Second batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125032 [23:34:32] (03PS1) 10MarkTraceur: FUTURE: Third batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125033 [23:34:34] (03PS1) 10MarkTraceur: FUTURE: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 [23:34:36] (03PS1) 10MarkTraceur: FUTURE: Fifth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125035 [23:34:48] marktraceur: The future, man, it's awesome! [23:34:50] LO, AND MARKTRACEUR SAID LET THERE BE PLANS [23:35:17] And now I don't have to worry about it. [23:36:53] !log ebernhardson synchronized php-1.23wmf21/extensions/Math/modules/VisualEditor/ve.ui.MWMathInspectorTool.js 'Update Math VE tool to use a command in 1.23wmf21' [23:36:57] Logged the message, Master [23:37:02] James_F: ok should be live now [23:37:13] ebernhardson: Thanks! [23:37:15] RoanKattouw: yours is also live if you want to test [23:37:24] ebernhardson: I tested that already; thanks! [23:37:31] ok awsome, SWAT is done then. [23:54:42] (03PS8) 10BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/123674