[00:01:40] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 114.800003 [00:02:50] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.345 second response time [00:04:40] !log To investigate bug 63579, manually patched "grunt-lib-phantomjs/phantomjs/main.js" in "/srv/deployment/integration/slave-scripts" on gallium [00:04:45] Logged the message, Master [00:07:08] !log graphite.wikimedia.org (e.g. https://graphite.wikimedia.org/render/?) is serving 502 Bad Gateway, ori is investigating [00:07:13] Logged the message, Master [00:13:30] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [00:14:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 268395 bytes in 7.928 second response time [00:15:12] !log graphite webapp 502 caused by uwsgi's init script not restarting the service correctly [00:15:17] Logged the message, Master [00:15:30] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [00:33:30] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 185.933334 [00:35:30] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 297.566681 [00:38:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [00:54:30] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [00:56:40] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [01:09:05] !log Debugging uWSGI init scripts on tungsten; expect some Graphite / Gdash flapping. [01:09:13] Logged the message, Master [01:25:20] !log Bug 63579 is still happening occasionally. 
Leaving patch on gallium in place for now. [01:25:26] Logged the message, Master [01:37:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [01:47:00] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 997.768861401 [01:54:21] (03PS1) 10Ori.livneh: Replace uWSGI's broken init.d scripts with Upstart job def [operations/puppet] - 10https://gerrit.wikimedia.org/r/124786 [02:00:12] (03CR) 10Ori.livneh: [C: 032] Replace uWSGI's broken init.d scripts with Upstart job def [operations/puppet] - 10https://gerrit.wikimedia.org/r/124786 (owner: 10Ori.livneh) [02:03:03] (03PS1) 10Ori.livneh: Update graphite::web for I8c214e0fd [operations/puppet] - 10https://gerrit.wikimedia.org/r/124788 [02:03:20] (03CR) 10Ori.livneh: [C: 032 V: 032] Update graphite::web for I8c214e0fd [operations/puppet] - 10https://gerrit.wikimedia.org/r/124788 (owner: 10Ori.livneh) [02:12:10] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:12:10] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:12:10] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:12:10] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [02:13:51] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2777 MB (2% inode=99%): [02:17:24] (03PS1) 10Ori.livneh: Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 [02:18:09] (03PS2) 10Ori.livneh: Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 [02:19:25] !log LocalisationUpdate completed (1.23wmf20) at 2014-04-09 02:19:25+00:00 [02:19:28] 
(03CR) 10jenkins-bot: [V: 04-1] Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 (owner: 10Ori.livneh) [02:19:33] Logged the message, Master [02:19:50] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3701 MB (3% inode=99%): [02:20:20] (03PS3) 10Ori.livneh: Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 [02:21:54] (03CR) 10Ori.livneh: [C: 032] Add 'uwsgictl' tool for managing services [operations/puppet] - 10https://gerrit.wikimedia.org/r/124789 (owner: 10Ori.livneh) [02:31:47] (03PS1) 10Ori.livneh: Add service checks for mwprof, uwsgi, graphite-web & gdash [operations/puppet] - 10https://gerrit.wikimedia.org/r/124790 [02:33:24] (03CR) 10Ori.livneh: [C: 032] Add service checks for mwprof, uwsgi, graphite-web & gdash [operations/puppet] - 10https://gerrit.wikimedia.org/r/124790 (owner: 10Ori.livneh) [02:39:59] (03PS1) 10Ori.livneh: uWSGI service: specify --autoload [operations/puppet] - 10https://gerrit.wikimedia.org/r/124791 [02:40:08] (03CR) 10Ori.livneh: [C: 032 V: 032] uWSGI service: specify --autoload [operations/puppet] - 10https://gerrit.wikimedia.org/r/124791 (owner: 10Ori.livneh) [02:43:52] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-09 02:43:52+00:00 [02:43:57] Logged the message, Master [02:53:16] greg-g: l10nupdate ran for 1.23wmf21 and mw.o isn't broken. 
\o/ [03:00:50] RECOVERY - Disk space on virt0 is OK: DISK OK [03:05:18] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: Connection refused [03:06:18] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: Connection refused [03:16:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [03:23:58] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [03:33:29] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 9 03:33:25 UTC 2014 (duration 33m 24s) [03:33:34] Logged the message, Master [03:34:24] (03PS1) 10Ori.livneh: Make (graphite|gdash).wm.o go through misc-eqiad-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/124792 [03:41:30] (03CR) 10Ori.livneh: [C: 032] Make (graphite|gdash).wm.o go through misc-eqiad-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/124792 (owner: 10Ori.livneh) [04:13:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [05:12:18] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:12:18] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:12:18] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:12:18] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [05:13:58] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [05:59:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [06:57:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [07:09:23] (03PS4) 10Rush: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 
[07:14:55] (CR) Andrew Bogott: [C: -1] "Two comments, both simple" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush)
[07:20:07] (CR) Rush: module to manage new python-diamond package (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush)
[07:25:30] rfarrand: This is the security thing we were talking about at breakfast: https://xkcd.com/
[07:30:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
[07:38:52] (CR) Matanya: module to manage new python-diamond package (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/124608 (owner: Rush)
[07:52:17] @notify hashar
[07:52:17] I'll let you know when I see hashar around here
[07:52:19] ok
[08:02:12] hello
[08:04:41] hi hashar
[08:04:43] I added more checks :D
[08:04:47] hashar: how do I help get https://wiki.jenkins-ci.org/display/JENKINS/Checkstyle+Plugin installed on our jenkins install, if it isn't already?
[08:05:08] yuvipanda: I think we got it already
[08:05:15] oh cool!
[08:05:42] yuvipanda: and it is in jjb: http://ci.openstack.org/jenkins-job-builder/publishers.html#publishers.checkstyle :-]
[08:05:49] oooh! cool!
[08:05:58] https://gerrit.wikimedia.org/r/#/c/124749/ and https://gerrit.wikimedia.org/r/#/c/124748/ for my python jobs
[08:06:07] i deployed it already :)
[08:06:15] looking
[08:06:27] for checkstyle you can use the publisher macro "checkstyle-xml"
[08:06:41] it is defined in macro.yaml
[08:06:41] is it being used anywhere else?
[08:06:44] * yuvipanda greps
[08:06:49] basically look for any file named checkstyle-*.xml
[08:06:58] yeah grep is your best bet
[08:07:08] if you get the checkstyle report generated in the root of the workspace
[08:07:12] you can get it added to the job using:
[08:07:21] publishers:
[08:07:21] - checkstyle-xml
[08:07:46] cool!
[08:08:11] so flake8
[08:08:12] https://gerrit.wikimedia.org/r/#/c/124749/1/mobile.yaml
[08:08:27] that change is fine, it will create a job which executes: tox -e flake8
[08:08:38] which also means that the target repository needs the flake8 integration hehe
[08:08:40] and tox
[08:08:51] hashar: yeah, I added tox.ini
[08:09:00] there is https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python :D
[08:09:04] and fixed flake errors
[08:09:36] hashar: ah. so this is just two scripts, no setup.py or modules
[08:09:46] hmm, I just need flake8, so maybe I should use a different job
[08:09:53] na it is fine
[08:10:02] tox will take care of installing the latest version of flake8
[08:10:09] right
[08:10:14] potentially we could just run the flake8 command
[08:10:22] but I find defining a list of even easier to handle
[08:10:26] :D
[08:10:33] and you can then tweak what the flake8 env is doing by simply editing your tox.ini
[08:10:36] and in the future if we do end up having tests...
[08:10:37] yeah
[08:10:42] that saves you from having to refresh the jenkins job configuration
[08:10:49] yup.
[08:11:25] hashar: I also want to get the continuous builds working again.
[08:11:31] for this repo
[08:11:32] what is the change adding tox.ini ?
[08:11:37] hashar: looking, moment
[08:11:42] ahh nm
[08:11:46] hashar: https://gerrit.wikimedia.org/r/#/c/124746/
[08:11:46] it is under scripts/tox.ini
[08:11:59] yeah, all py files are there
[08:12:02] should I move it up?
[08:12:11] if you don't mind
[08:12:16] sure
[08:12:18] you also need to define a tox env named flake8
[08:12:59] don't I have that in the tox.ini?
[08:13:18] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:13:18] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:13:18] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:13:18] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:13:48] yuvipanda: there is a [flake8] section but that simply gives settings for flake8
[08:13:57] oh
[08:14:00] yuvipanda: tox needs an env defined with [testenv:flake8], let me find an example
[08:14:08] aaah
[08:14:14] ah it is on https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python
[08:14:44] there is a dummy tox file which has sections [tox] (that configures tox itself) and [testenv] which provides the defaults for each environment
[08:14:59] and then [testenv:flake8] which tells what to do when one invokes tox with the 'flake8' environment
[08:15:13] think about it like a npm target which would be npm flake8 :-D
[08:15:20] :)
[08:15:32] deploying zuul change
[08:16:53] done
[08:17:21] wooo!
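The distinction hashar describes above, a `[flake8]` section that only holds flake8 settings versus a `[testenv:flake8]` section that defines the tox environment, can be checked mechanically. What follows is a minimal Python sketch under stated assumptions, not the configuration from the change under review: the tox.ini text is a hypothetical example held in a string, the `deps`, `commands` and `skipsdist` values are illustrative guesses, and `tox_envs` / `has_env` are made-up helper names that are not part of tox.

```python
import configparser

# Hypothetical tox.ini for a repository that only holds a few scripts.
# The values are illustrative assumptions, not the real file.
TOX_INI = """\
[tox]
minversion = 1.6
skipsdist = True
envlist = flake8

[testenv:flake8]
deps = flake8
commands = flake8

[flake8]
exclude = .venv,.tox,dist,doc,build,*.egg
"""


def tox_envs(ini_text):
    """Return the environment names declared as [testenv:NAME] sections."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    prefix = "testenv:"
    return [s[len(prefix):] for s in parser.sections() if s.startswith(prefix)]


def has_env(ini_text, name):
    """True only if tox would find an environment called `name`."""
    return name in tox_envs(ini_text)


if __name__ == "__main__":
    # A bare [flake8] section configures flake8 but declares no tox env.
    print(has_env("[flake8]\nexclude = .tox\n", "flake8"))
    print(has_env(TOX_INI, "flake8"))
```

With a file shaped like this, `tox -e flake8` has an environment to run, while the `[flake8]` section keeps flake8 from linting the virtualenvs that tox itself creates.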
[08:18:24] hashar: https://gerrit.wikimedia.org/r/#/c/124797/ has tox.ini fixes
[08:18:59] hmm
[08:18:59] 08:18:13 tox.MissingFile: MissingFile: /mnt/jenkins-workspace/workspace/apps-android-wikipedia-tox-flake8/setup.py
[08:19:23] I do have skipDist=true
[08:22:14] hmm
[08:22:17] * yuvipanda tries skipsdist
[08:25:05] sorry, back
[08:25:06] k
[08:25:23] ah
[08:25:27] yuvipanda: you should add some ignores
[08:25:35] yeah, am doing
[08:25:36] the wiki page is not very up to date hehe
[08:25:47] hashar: it is currently running flake8 on tox itself and complaining :D
[08:25:49] https://integration.wikimedia.org/ci/job/apps-android-wikipedia-tox-flake8/3/console
[08:26:00] [flake8]
[08:26:00] exclude = .venv,.tox,dist,doc,build,*.egg
[08:27:12] \O/
[08:27:33] hashar: \o/ https://gerrit.wikimedia.org/r/#/c/124797/ that seems to work :)
[08:27:44] hashar: want to +2? :)
[08:27:47] press rebase and merge !
[08:27:52] I rebased :)
[08:28:18] let me comment on it
[08:28:23] ok!
[08:29:40] https://gerrit.wikimedia.org/r/#/c/124797/8/tox.ini,unified
[08:29:42] a couple more
[08:29:58] [tox]
[08:29:58] minversion = 1.6
[08:29:58] envlist = flake8
[08:31:47] kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 1.97148582555e-09
[08:31:50] ?
[08:31:57] hashar: updated! :)
[08:32:57] yuvipanda: rebased it and giving it a try
[08:33:04] hashar: ok!
[08:33:08] * hashar whistles
[08:33:17] want me to +2 it ?
[08:33:54] hashar: yes! :)
[08:34:03] hashar: ty!
[08:34:10] hashar: now comes the hard part which is checkstyle for java
[08:34:15] I'll go have lunch and come back to it :)
[08:34:19] yuvipanda: also python has a VERY nice feature which is to add unit tests straight in the function/method doc block
[08:34:20] https://docs.python.org/2/library/doctest.html
[08:34:28] yeah! I've used that in the past
[08:34:42] hashar: but these are just plain scripts that are run once in a while. 
I'll figure out how to add tests for them at some point
[08:34:50] found that a couple weeks ago, bryan davis did demo it to me :]
[08:34:54] I was in shock
[08:35:02] yeah, doctests are fun!
[08:35:16] hashar: we don't have anything else that uses checkstyle on Java repos do we?
[08:35:28] not sure
[08:35:40] isn't maven taking care of generating the checkstyle xml file?
[08:35:44] (I am maven agnostic sorry)
[08:35:44] yeah
[08:35:51] so I just have to run maven then
[08:35:55] or have a job that runs maven
[08:36:00] and set up a local maven thing
[08:36:06] hashar: I'll do that after lunch!
[08:36:10] we'll have to make that non-voting for now
[08:36:11] but still
[08:36:23] yuvipanda: I think there is already a maven job in our jjb conf
[08:36:29] oh?
[08:36:31] * yuvipanda greps
[08:36:35] yeah in mobile.yaml
[08:36:41] yeah,
[08:37:04] yuvipanda: and http://ci.openstack.org/jenkins-job-builder/project_maven.html :-]
[08:37:12] woot!
[08:37:19] hashar: this should be easier since this is just a check and not a build
[08:37:20] and indeed we can make it non-voting while you play with it
[08:37:22] I'll add build later
[08:37:48] hashar: brb! And thanks for the help!
[08:37:49] I am not sure where the checkstyle.xml file will end up being generated though
[08:38:02] ping me anytime, I am at home the whole day
[08:38:28] ah
[08:38:45] yuvipanda: jenkins-bot can't merge on the apps/android/wikipedia :-] adding it in
[08:39:46] !log Gerrit Letting JenkinsBot submit changes on apps/android/*
[08:39:51] Logged the message, Master
[08:50:27] hashar: woo! 
Thanks for fixing that [08:57:28] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [09:01:00] <_joe_> nd graphite is *not* working, again [09:01:26] (03CR) 10Odder: [C: 031] Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:09:52] (03PS2) 10Hashar: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:10:15] (03CR) 10Hashar: "You might want to whitelist both *.musees.cg70.fr and musses.cg70.fr don't you? :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:10:34] odder: do you have any contact with jean-frederic ? [09:16:56] (03PS3) 10Hashar: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:17:12] hashar: java checkstyle! https://gerrit.wikimedia.org/r/#/c/124802/ and https://gerrit.wikimedia.org/r/#/c/124801/ [09:17:20] I have to make it non voting tho, will figure that [09:17:42] (03CR) 10Hashar: [C: 032] "Restored *.musees.cg70.fr , that is not going to do any harm :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:17:49] (03Merged) 10jenkins-bot: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:19:09] !log hashar synchronized wmf-config/InitialiseSettings.php '[] = 'musees.cg70.fr'; {{gerrit|124754}} {{bug|63449}}' [09:19:14] Logged the message, Master [09:19:42] (03CR) 10Hashar: "The change has been deployed on Wikimedia production cluster. Feel free to ping me on irc (hashar) if it needs further tweaks." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/124754 (owner: 10Jean-Frédéric) [09:27:12] (03CR) 10Hashar: [C: 031] "We should probably find a way to have scap to install it for us or update the /usr/local/bin paths to point to /srv/scap/scap/bin or somet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124763 (owner: 10BryanDavis) [09:32:01] (03PS1) 10Andrew Bogott: Hammer down a few more bogus https failures. [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 [09:35:35] ACKNOWLEDGEMENT - RAID on dataset1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) daniel_zahn RT #7238 [09:35:56] (03PS2) 10Andrew Bogott: Hammer down a few more bogus https failures. [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 [09:36:04] (03CR) 10Alexandros Kosiaris: [C: 032] Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 10https://gerrit.wikimedia.org/r/124763 (owner: 10BryanDavis) [09:37:09] * yuvipanda pokes hashar [09:38:40] (03CR) 10Andrew Bogott: [C: 032] Hammer down a few more bogus https failures. [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 (owner: 10Andrew Bogott) [09:40:25] ori: ping [09:42:36] (03CR) 10Dzahn: "+1" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124806 (owner: 10Andrew Bogott) [09:45:52] yuvipanda: pong [09:46:34] hashar: java checkstyle! https://gerrit.wikimedia.org/r/#/c/124802/ and https://gerrit.wikimedia.org/r/#/c/124801/ [09:46:40] bad panda [09:46:45] andrewbogott: sorry haven't followed up on ganglia on labs. The instance is too small indeed and not sure how to fix it. I would attempt to resize the instance to something bigger, else recreate it from scratched + do a bunch of puppet changes to point to the new instance :] [09:46:52] what did I do now, odder [09:47:03] * odder hugs yuvipanda [09:47:07] telephone :) [09:47:12] hashar: it's puppetized, right? So just kill the instance and build a new one with the same name? 
[09:47:18] Oh, I guess you need to reuse the ip... [09:47:36] odder: :) It was a bit irritating considering how some decisions are made on Trello in a much less transparent manner and you need accounts to comment there... [09:47:48] odder: and even worse is Mingle which sometimes isn't even public [09:48:09] PROBLEM - HTTPS on cp4007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [09:48:16] andrewbogott: yeah I would ideally need the same IP but there is no good way to achieve that apparently. And since I am too lazy to do all the puppet changes, I thought we could try resizing the instance hehe [09:48:34] andrewbogott: but you probably have better things to do today @ops summit (moaaaaar monitoring!) [09:48:54] Can try resizing but instances often do not survive [09:48:59] But yeah, next week we'll try it [09:49:09] andrewbogott: another way would be to have two instances. One to collect all metrics and another one acting as a webfrontend [09:49:26] andrewbogott: but I don't know ganglia well enough to figure out how to split web/aggregators services [09:50:05] It's ok, we'll just move it to a great big instance next week :) [09:50:23] so resizing is highly risky (instance disappear) but if it works that would only have taken a few minutes + that grant you yet another skill on your resume ("I managed to resize an instance on Nova yeahhhh") [09:50:30] hehe [09:51:32] yuvipanda: looking [09:52:54] (03CR) 10Dzahn: [C: 032] remove all Tampa appservers from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/123211 (owner: 10Dzahn) [09:54:45] yuvipanda: you got spam https://gerrit.wikimedia.org/r/#/c/124802/3/mobile.yaml,unified [09:55:11] yay mutante, one step closer to getting the word 'Tampa' blacklisted in here? 
:-)
[09:57:29] hashar: updated both patches
[09:57:33] :-]
[09:57:45] waiting for Jenkins to show the diff
[09:58:34] yuvipanda: get the maven job deployed :) I am deploying zuul change
[09:58:37] hashar: woo! deploying
[09:59:10] hashar: deployed!
[09:59:28] and feel free to join #wikimedia-qa where all the spam is sent :]
[09:59:47] hashar: I was on it but the selenium spam was a fair bit so I left
[10:00:06] you can ignore the selenium bot :]
[10:00:17] that is true
[10:00:19] I shall do that :)
[10:00:24] * yuvipanda adds to autojoin
[10:00:24] zuul reloaded!
[10:01:28] hashar: woo! now to test.
[10:04:38] hashar: \o/ https://integration.wikimedia.org/ci/job/apps-android-wikipedia-maven-checkstyle/2/console but has a maven error not a checkstyle one
[10:04:39] investigating
[10:04:59] at least it started downloading the whole internet
[10:05:05] that is usually a good sign
[10:05:33] hehe
[10:05:39] 10:03:54 [INFO] There are 3898 checkstyle errors.
[10:05:42] but I can't see them?
[10:06:00] we have to find out whether maven writes them to some xml file
[10:07:07] yuvipanda: http://paste.openstack.org/show/75396/
[10:08:25] I guess we want the Jenkins Checkstyle plugin to look at files matching **/target/checkstyle-result.xml
[10:09:07] (follow up on #wikimedia-qa)
[10:51:23] (PS1) Hashar: contint: apply maven settings on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/124822
[10:56:53] (CR) Hashar: [C: 1 V: 2] "Cherry picked on the puppetmaster for the integration project. 
That would hopefully work on the production slaves gallium.wikimedia.org " [operations/puppet] - 10https://gerrit.wikimedia.org/r/124822 (owner: 10Hashar) [11:03:16] (03PS2) 10Hashar: contint: get rid of misc::pbuilder on slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/122707 [11:03:18] (03PS3) 10Hashar: contint: directory to hold debian-glue packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/122712 [11:13:39] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [11:13:39] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [11:13:39] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [11:13:39] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [12:01:59] PROBLEM - Host amssq58 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:20] PROBLEM - Host amssq57 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:20] PROBLEM - Host amssq52 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:28] is this someone here? 
^^ [12:02:30] PROBLEM - Host lvs3001 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:30] PROBLEM - Host amssq47 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:30] PROBLEM - Host amssq56 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:50] PROBLEM - Host amslvs3 is DOWN: PING CRITICAL - Packet loss = 100% [12:02:59] PROBLEM - Host amslvs1 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:00] PROBLEM - Host cp3004 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:00] PROBLEM - Host text-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:03:03] PROBLEM - Host amslvs4 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:12] PROBLEM - Host amssq59 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:21] PROBLEM - Host upload-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:03:30] RECOVERY - Host amssq47 is UP: PING WARNING - Packet loss = 93%, RTA = 86.26 ms [12:03:30] RECOVERY - Host cp3004 is UP: PING WARNING - Packet loss = 93%, RTA = 85.57 ms [12:03:30] RECOVERY - Host lvs3001 is UP: PING WARNING - Packet loss = 93%, RTA = 85.32 ms [12:03:30] RECOVERY - Host amssq59 is UP: PING WARNING - Packet loss = 93%, RTA = 87.04 ms [12:03:30] RECOVERY - Host upload-lb.esams.wikimedia.org is UP: PING WARNING - Packet loss = 28%, RTA = 87.24 ms [12:03:32] RECOVERY - Host amssq57 is UP: PING WARNING - Packet loss = 50%, RTA = 87.85 ms [12:03:39] RECOVERY - Host amssq52 is UP: PING OK - Packet loss = 0%, RTA = 87.41 ms [12:03:39] RECOVERY - Host amssq58 is UP: PING OK - Packet loss = 0%, RTA = 87.04 ms [12:03:39] RECOVERY - Host amslvs3 is UP: PING OK - Packet loss = 0%, RTA = 87.60 ms [12:03:59] PROBLEM - Host cp3010 is DOWN: PING CRITICAL - Packet loss = 100% [12:03:59] PROBLEM - RAID on nescio is CRITICAL: Timeout while attempting connection [12:04:09] RECOVERY - Host cp3010 is UP: PING OK - Packet loss = 0%, RTA = 94.45 ms [12:04:09] RECOVERY - Host amslvs1 is UP: PING OK - Packet loss = 0%, RTA = 95.24 ms [12:04:09] RECOVERY - 
Host amssq56 is UP: PING OK - Packet loss = 0%, RTA = 95.58 ms [12:04:49] RECOVERY - Host amslvs4 is UP: PING OK - Packet loss = 0%, RTA = 96.66 ms [12:04:49] RECOVERY - Host text-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 94.73 ms [12:22:55] (03CR) 10Hashar: "rebased to fix conflict" [operations/puppet] - 10https://gerrit.wikimedia.org/r/122707 (owner: 10Hashar) [12:23:04] (03CR) 10Hashar: "rebased" [operations/puppet] - 10https://gerrit.wikimedia.org/r/122712 (owner: 10Hashar) [12:52:11] (03CR) 10Dzahn: [C: 032] beta: reenable fatalmonitor script on eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/124624 (owner: 10Hashar) [12:54:12] mutante: should i push ms6 decom patches ? [13:08:46] (03PS1) 10Andrew Bogott: Partial revert of a87afea676375aba4f0ec28228e28df0502e5321 [operations/puppet] - 10https://gerrit.wikimedia.org/r/124834 [13:12:08] (03CR) 10Andrew Bogott: [C: 032] Partial revert of a87afea676375aba4f0ec28228e28df0502e5321 [operations/puppet] - 10https://gerrit.wikimedia.org/r/124834 (owner: 10Andrew Bogott) [13:19:31] apergos: online? [13:19:39] yes [13:19:58] Steinsplitter: what's up? [13:20:49] apergos: it is back. so resolved O_O [13:22:35] ok then [13:25:16] apergos: is ms6 in esams or tampa ? [13:26:15] mutante: can we close https://rt.wikimedia.org/Ticket/Display.html?id=6979 now that you've done #80? [13:27:44] (03PS1) 10Cmjohnson: adding ipv6 for neon/icinga rt 4602 [operations/dns] - 10https://gerrit.wikimedia.org/r/124837 [13:31:48] matanya: yes [13:31:49] andrewbogott: yes [13:32:14] mutante: is m6 in esams or tampa ? [13:32:26] matanya: ms5 is gone isn't it? I mean long gone [13:32:29] pmtpa [13:32:45] oh sorry [13:32:59] so why is it node 'ms6.esams.wikimedia.org' in site.pp ? [13:33:04] ms6 (I have one veerical line of pixels that doesn't function, and once in awhile it lands in exactly the place to change a letter [13:33:11] or make a ltter/number ambiguous [13:33:13] esams. 
[13:33:17] it's in esams
[13:33:31] while files/dsh/group/pmtpa includes ms6 ?
[13:33:37] who knows
[13:33:41] i'm confused
[13:33:49] i'll just remove all
[13:33:50] well ms6 is definitely in esams
[13:34:06] ms5 if it still exists (which it shouldn't) is/was definitely in pmtpa
[13:34:33] anyways ms6 is no longer used for anything so it can go from dsh and all.
[13:35:32] (CR) Cmjohnson: [C: 2] adding ipv6 for neon/icinga rt 4602 [operations/dns] - https://gerrit.wikimedia.org/r/124837 (owner: Cmjohnson)
[13:35:59] (PS1) Dzahn: create shell account for Filippo Giunchedi [operations/puppet] - https://gerrit.wikimedia.org/r/124838
[13:36:11] (PS1) Matanya: decom: ms6 [operations/puppet] - https://gerrit.wikimedia.org/r/124839
[13:36:12] uuuh
[13:36:49] (PS1) Andrew Bogott: Revert "Make (graphite|gdash).wm.o go through misc-eqiad-lb" [operations/puppet] - https://gerrit.wikimedia.org/r/124840
[13:38:39] (CR) Filippo Giunchedi: [C: 1] create shell account for Filippo Giunchedi [operations/puppet] - https://gerrit.wikimedia.org/r/124838 (owner: Dzahn)
[13:38:52] (CR) Andrew Bogott: [C: 2] Revert "Make (graphite|gdash).wm.o go through misc-eqiad-lb" [operations/puppet] - https://gerrit.wikimedia.org/r/124840 (owner: Andrew Bogott)
[13:39:12] matanya: ms6.esams.wikimedia.org
[13:39:13] (PS1) Alexandros Kosiaris: Improve the check_eth check [operations/puppet] - https://gerrit.wikimedia.org/r/124841
[13:39:26] matanya: that's why it did not show up in Tampa tickets
[13:39:33] mutante: see above :)
[13:41:12] andre__: Do you have plans to log out all BZ users?
[13:41:47] hoo: I myself didn't have plans so far; I'd hope that ops told me if that's recommended [13:41:54] (plus no idea how I'd do that, to be honest) [13:43:10] (03CR) 10Dzahn: [C: 032] create shell account for Filippo Giunchedi [operations/puppet] - 10https://gerrit.wikimedia.org/r/124838 (owner: 10Dzahn) [13:44:58] (03CR) 10Alexandros Kosiaris: [C: 032] Improve the check_eth check [operations/puppet] - 10https://gerrit.wikimedia.org/r/124841 (owner: 10Alexandros Kosiaris) [13:46:02] andre__: Prune the logincookies table? [13:46:02] Maybe also the tokens table? [13:46:13] I'm not into BZ enough :/ [13:46:21] (03PS5) 10Rush: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 [13:46:33] I guess I might ask upstream what they plan to do [13:47:37] andre__: Ok, I can help with killing stuff from the DB or so (or springl.e, ... can do that) [13:48:26] (03CR) 10Matanya: module to manage new python-diamond package (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 (owner: 10Rush) [13:48:39] (03PS1) 10Alexandros Kosiaris: Fix bug introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124842 [13:49:24] (03CR) 10Alexandros Kosiaris: [C: 032] Fix bug introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124842 (owner: 10Alexandros Kosiaris) [13:49:33] (03CR) 10Alexandros Kosiaris: [V: 032] Fix bug introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124842 (owner: 10Alexandros Kosiaris) [13:49:54] (03PS6) 10Rush: module to manage new python-diamond package [operations/puppet] - 10https://gerrit.wikimedia.org/r/124608 [13:55:38] (03PS1) 10RobH: replace ticket.wikimedia.org certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/124843 [13:56:39] (03CR) 10RobH: [C: 032 V: 032] replace ticket.wikimedia.org certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/124843 (owner: 10RobH) 
[13:58:04] !log updating otrs cert [13:58:10] Logged the message, RobH [13:58:44] (03PS1) 10Alexandros Kosiaris: Fix another introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124844 [13:59:23] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Fix another introduced in I4a7a2c71be [operations/puppet] - 10https://gerrit.wikimedia.org/r/124844 (owner: 10Alexandros Kosiaris) [14:00:11] hoo: might be a good question for security@. I can't really judge if that's needed :-/ [14:00:39] !log adding filippo to ops/wmf LDAP groups [14:00:44] Logged the message, Master [14:01:34] andre__: I don't know Bugzilla, I can only tell you that users should be logged out and adviced to change their Passwords [14:01:48] but I can't tell you how BZ handles this [14:02:44] !log yes, otrs is totally ssl borked, robh is working on it [14:02:49] Logged the message, RobH [14:05:30] (03CR) 10Alexandros Kosiaris: "The ideas in this were incorporated in:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124606 (owner: 10Cmjohnson) [14:08:17] andre__: I'd suggest to prune the logincookies table, it has about 5.5k entries atm [14:08:55] PROBLEM - HTTPS on cp4009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org) [14:08:55] PROBLEM - check configured eth on tridge is CRITICAL: NRPE: Command check_check_eth not defined [14:09:36] hoo, I'll ask on security@ as I'd love to get input [14:09:42] OK, more https errors coming up -- please ignore! [14:09:44]