[00:00:43] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [00:10:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [00:11:53] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [00:24:43] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 10:18:04 UTC [00:31:43] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [01:44:03] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Epic puppet fail [02:01:43] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [02:03:03] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [02:05:23] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3873 MB (3% inode=99%): [02:10:53] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [02:11:53] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [02:17:22] !log LocalisationUpdate completed (1.24wmf18) at 2014-09-01 02:16:18+00:00 [02:17:32] Logged the message, Master [02:25:43] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 10:18:04 UTC [02:28:42] !log LocalisationUpdate completed (1.24wmf19) at 2014-09-01 02:27:39+00:00 [02:28:49] Logged the message, Master [02:30:53] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [02:31:53] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [02:32:43] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [03:00:23] RECOVERY - Disk space on virt0 is OK: DISK OK [03:15:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 1 03:14:22 UTC 2014 (duration 14m 21s) [03:15:34] Logged the message, Master [03:16:03] RECOVERY - OCG health on ocg1003 is OK: OK: /mnt/tmpfs 0B: /srv/deployment/ocg/output 4132919304B: /srv/deployment/ocg/postmortem 1321920B: ocg_job_status 9515 msg: ocg_render_job_queue 0 msg [03:44:13] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Epic puppet fail [04:02:43] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [04:03:14] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [04:26:43] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 10:18:04 UTC [04:33:43] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [05:10:24] (03PS1) 10Springle: assign db1044 to s3 [puppet] - 10https://gerrit.wikimedia.org/r/157641 [05:12:07] (03CR) 10Springle: [C: 032] assign db1044 to s3 [puppet] - 10https://gerrit.wikimedia.org/r/157641 (owner: 10Springle) [05:26:24] !log springle Synchronized wmf-config/db-eqiad.php: depool db1027 while cloning (duration: 00m 07s) [05:26:31] Logged the message, Master [05:28:22] !log xtrabackup 
clone db1027 to db1044 [05:28:27] Logged the message, Master [05:37:28] wikibugs seems slow. [05:51:14] (03PS1) 10Springle: Mariadb 10 slave config changes. - Prepare for engine-independent index statistics - Enable extra port 3307 in case of emergency - Remove deprecated innodb setting [puppet] - 10https://gerrit.wikimedia.org/r/157644 [05:52:52] (03CR) 10Springle: [C: 032] Mariadb 10 slave config changes. - Prepare for engine-independent index statistics - Enable extra port 3307 in case of emergency - Remove de [puppet] - 10https://gerrit.wikimedia.org/r/157644 (owner: 10Springle) [06:03:43] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [06:25:34] RECOVERY - Disk space on elastic1015 is OK: DISK OK [06:27:43] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 10:18:04 UTC [06:28:14] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Epic puppet fail [06:28:33] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:03] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:03] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:13] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 4 failures [06:34:43] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [06:40:13] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Puppet has 2 failures [06:46:03] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:46:03] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:46:13] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:46:34] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:47:14] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [06:58:13] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [07:10:58] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [07:11:07] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [07:11:42] !log powercycle ms-be1010 "cpu soft lockup" on console [07:11:49] Logged the message, Master [07:12:57] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [07:13:07] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl [07:13:10] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] mailman: leave used_languages alone, use site_languages [puppet] - 10https://gerrit.wikimedia.org/r/156766 (owner: 10Filippo Giunchedi) [07:13:18] PROBLEM - Host ms-be1010 is DOWN: PING CRITICAL - Packet loss = 100% [07:14:19] <_joe_> good morning [07:14:22] <_joe_> ciao godog [07:14:37] RECOVERY - check if dhclient is running on ms-be1010 is OK: PROCS OK: 0 processes with command name dhclient [07:14:37] RECOVERY - swift-account-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args 
^/usr/bin/python /usr/bin/swift-account-replicator [07:14:37] RECOVERY - Disk space on ms-be1010 is OK: DISK OK [07:14:38] RECOVERY - swift-object-server on ms-be1010 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [07:14:38] RECOVERY - swift-account-reaper on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [07:14:38] RECOVERY - swift-container-auditor on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:14:38] RECOVERY - swift-object-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [07:14:38] RECOVERY - swift-account-auditor on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [07:14:38] RECOVERY - swift-container-updater on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [07:14:39] RECOVERY - swift-account-server on ms-be1010 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [07:14:40] RECOVERY - RAID on ms-be1010 is OK: OK: optimal, 14 logical, 14 physical [07:14:47] RECOVERY - Host ms-be1010 is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms [07:14:48] RECOVERY - swift-container-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [07:14:48] RECOVERY - swift-container-server on ms-be1010 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [07:14:57] RECOVERY - SSH on ms-be1010 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [07:14:58] RECOVERY - swift-object-auditor on ms-be1010 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [07:14:58] RECOVERY - Puppet freshness on ms-be1010 is OK: puppet ran at Mon Sep 1 07:14:54 UTC 2014 [07:14:58] RECOVERY - swift-object-updater on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [07:15:00] hey _joe_ [07:15:07] RECOVERY - DPKG on ms-be1010 is OK: All packages OK [07:15:07] RECOVERY - check configured eth on ms-be1010 is OK: NRPE: Unable to read output [07:15:07] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [07:23:55] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] weight ms-be1013/14/15 to 2800 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/156252 (owner: 10Filippo Giunchedi) [07:27:02] !log deploy latest ring to swift eqiad-prod [07:27:09] Logged the message, Master [07:32:39] (03PS1) 10Springle: mariadb events for s1-7 masters [software] - 10https://gerrit.wikimedia.org/r/157649 [07:33:01] RECOVERY - NTP on ms-be1010 is OK: NTP OK: Offset 0.0009207725525 secs [07:34:02] (03CR) 10Springle: [C: 032] mariadb events for s1-7 masters [software] - 10https://gerrit.wikimedia.org/r/157649 (owner: 10Springle) [07:37:19] (03PS1) 10Springle: script to generate engine-independent statistics [software] - 10https://gerrit.wikimedia.org/r/157650 [07:39:13] (03CR) 10Springle: [C: 032] script to generate engine-independent statistics [software] - 10https://gerrit.wikimedia.org/r/157650 (owner: 10Springle) [07:41:26] (03PS1) 10Springle: fix comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157651 [07:44:55] (03PS6) 10Giuseppe Lavagetto: beta: manage virtualhosts via puppet [puppet] - 
10https://gerrit.wikimedia.org/r/156762 [07:45:06] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] beta: manage virtualhosts via puppet [puppet] - 10https://gerrit.wikimedia.org/r/156762 (owner: 10Giuseppe Lavagetto) [07:46:01] PROBLEM - swift eqiad-prod object availability on tungsten is CRITICAL: CRITICAL: 1.67% of data under the critical threshold [90.0] [07:47:24] (03CR) 10Springle: [C: 032] fix comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157651 (owner: 10Springle) [07:47:28] (03Merged) 10jenkins-bot: fix comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157651 (owner: 10Springle) [07:50:23] ACKNOWLEDGEMENT - swift eqiad-prod object availability on tungsten is CRITICAL: CRITICAL: 8.33% of data under the critical threshold [90.0] Filippo Giunchedi rebalance in progress [07:52:20] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] mediawiki::packages: remove libtidy-0.99-0 [puppet] - 10https://gerrit.wikimedia.org/r/156843 (owner: 10Ori.livneh) [07:52:26] (03PS2) 10Filippo Giunchedi: mediawiki::packages: remove libtidy-0.99-0 [puppet] - 10https://gerrit.wikimedia.org/r/156843 (owner: 10Ori.livneh) [07:52:34] (03CR) 10Filippo Giunchedi: [V: 032] mediawiki::packages: remove libtidy-0.99-0 [puppet] - 10https://gerrit.wikimedia.org/r/156843 (owner: 10Ori.livneh) [08:03:50] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [08:29:52] (03CR) 10Filippo Giunchedi: "does it fully replace this? sounds like we can abandon this code review then" [puppet] - 10https://gerrit.wikimedia.org/r/138292 (owner: 10Ori.livneh) [08:34:50] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [08:37:16] (03CR) 10Filippo Giunchedi: Icinga: Check Dispatch command for Wikidata notification (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/136095 (owner: 10Christopher Johnson (WMDE)) [09:00:34] (03CR) 10Filippo Giunchedi: "no it was introduced earlier in Icf4240e08" [puppet] - 10https://gerrit.wikimedia.org/r/156260 (owner: 10Filippo Giunchedi) [09:01:26] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] elasticsearch: deploy shard percentage check [puppet] - 10https://gerrit.wikimedia.org/r/156260 (owner: 10Filippo Giunchedi) [09:11:30] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [09:25:19] btw an easy way to fix the above is from strontium: sudo -u gitpuppet ssh strontium.eqiad.wmnet [09:25:33] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [09:31:54] !log springle Synchronized wmf-config/db-eqiad.php: repool db1027 (duration: 00m 07s) [09:32:00] Logged the message, Master [09:56:02] (03CR) 10Alexandros Kosiaris: [C: 032] Add codfw subnets (now: remove pmtpa IPv6 networks) [puppet] - 10https://gerrit.wikimedia.org/r/156090 (owner: 10Mark Bergsma) [10:04:33] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [10:08:15] perhaps a silly question, what needs to happen after I've merged https://gerrit.wikimedia.org/r/#/c/156260/ and puppet has run to make the check appear? [10:11:13] godog: nothing ? [10:11:26] well review that the check is working ? 
[10:11:49] the check is in checkcommands.cfg but not in puppet_services.cfg on neon mmhhh [10:12:03] monitor_service will achieve this [10:12:06] 2 puppet runs [10:12:21] 1 on whatever includes elasticsearch::nagios::check [10:12:24] and on 1 neon [10:12:37] 40 minutes at most, often less [10:13:06] after that, puppet_services will have the stanza [10:14:13] ah okay, so worst/unlucky case is 80m [10:14:24] there we go! thanks akosiaris [10:15:01] no worst case is 40min [10:15:09] 20 + 20 [10:15:47] assuming your host runs puppet at the exact same time that neon does [10:18:46] ah! okay just bad timing then (it just generated the right puppet_services) [10:21:10] (03PS1) 10Alexandros Kosiaris: Assign all hosts to a nagios hostgroup [puppet] - 10https://gerrit.wikimedia.org/r/157658 [10:22:38] (03PS1) 10Filippo Giunchedi: elasticsearch: add python-requests dependency [puppet] - 10https://gerrit.wikimedia.org/r/157659 [10:23:07] akosiaris: ^ got 2min for an easy one? :) [10:23:19] (03CR) 10Alexandros Kosiaris: "Submitting this in the form of an RFC, trying to grasp (probably for a second time) the reason for the old behaviour" [puppet] - 10https://gerrit.wikimedia.org/r/157658 (owner: 10Alexandros Kosiaris) [10:23:46] <_joe_> godog: I can take a look [10:24:14] (03CR) 10Alexandros Kosiaris: [C: 04-1] "pedantic note, otherwise LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/157659 (owner: 10Filippo Giunchedi) [10:24:36] godog: there you go, _joe_ https://gerrit.wikimedia.org/r/157658 [10:24:44] need your wisdom please :-) [10:25:02] cause I can't for the life of me remeber why we had that [10:26:07] _joe_: sure, thanks! [10:26:10] akosiaris: ok! [10:26:42] <_joe_> guys, how good you are with apache2 configs? [10:26:49] <_joe_> I'm getting *mad* [10:26:58] shoot [10:27:10] (03PS2) 10Filippo Giunchedi: elasticsearch: add python-requests dependency [puppet] - 10https://gerrit.wikimedia.org/r/157659 [10:27:34] (03CR) 10Filippo Giunchedi: elasticsearch: add python-requests dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/157659 (owner: 10Filippo Giunchedi) [10:27:43] <_joe_> ok, HHVM: we will have 100 ProxyPass directives everywhere, and all will include the DOCUMENT_ROOT of the single virtualhost [10:27:56] <_joe_> I figured a way to make it work with ProxyPass [10:28:14] <_joe_> I mean a way to use a single include everywhere [10:28:36] <_joe_> I don't want to use a RewriteRule as it's a performance hit [10:28:41] <_joe_> comapred to proxypass [10:28:58] <_joe_> proxypass uses persistent connections to the fcgi server, rewrite does not [10:29:04] <_joe_> so, I tried this [10:29:31] <_joe_> RewriteRule ^ - [E=doc_root:%{DOCUMENT_ROOT}] [10:29:42] <_joe_> ProxyPassInterpolateEnv On [10:30:17] <_joe_> ProxyPass /wiki fcgi://localhost:9000/${doc_root}/something.php [10:30:29] <_joe_> ProxyPass /wiki fcgi://localhost:9000/${doc_root}/something.php interpolate [10:30:32] <_joe_> sorry [10:30:38] <_joe_> but, this does not work apparently [10:30:52] <_joe_> and I can't see what I got wrong [10:30:57] <_joe_> btw, using [10:31:33] <_joe_> RewriteRule /wiki fcgi://localhost:9000%{DOCUMENT_ROOT}/something.php [P,L] [10:31:36] <_joe_> it works [10:32:51] (03CR) 10Giuseppe Lavagetto: [C: 031] "LGTM, one minor caveat." 
[puppet] - 10https://gerrit.wikimedia.org/r/157659 (owner: 10Filippo Giunchedi) [10:33:04] (03CR) 10Giuseppe Lavagetto: elasticsearch: add python-requests dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/157659 (owner: 10Filippo Giunchedi) [10:33:37] GET /_cluster/health HTTP/1.1 [10:34:24] <_joe_> akosiaris: re your patch: at the time, I just reproduced what was in puppet [10:34:42] <_joe_> so it may well be a bad behaviour [10:34:51] _joe_: that is what I thought too, just wanted some confirmation [10:35:03] <_joe_> but I didn't want to change anything as I was just working on puppet 3 compatibility [10:35:09] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [10:35:15] _joe_: no you are right, it doesn't work with precise's requests, fixing :( [10:35:31] <_joe_> godog: I got burnt by that repeatedly [10:36:20] le sigh, anyways trusty is close enough [10:36:21] <_joe_> I bet no one ever saw ProxyPassInterpolateEnv before :P [10:37:08] _joe_: btw Keep this turned off (for server performance) unless you need it! [10:37:17] for ProxyPassInterpolateEn [10:37:25] and yes I had not used it ever before [10:37:31] <_joe_> akosiaris: yes I read that [10:37:50] <_joe_> but from what I see it gets computed once [10:37:59] <_joe_> it's still better than the RewriteRule [10:38:13] <_joe_> mmmh I guess I gotta take another route [10:38:16] is it not working perhaps because you do not have an interpolate parameter to the proxypass directives ? [10:38:23] <_joe_> that is, using puppet templates [10:38:29] <_joe_> no no it's there [10:38:41] <_joe_> I must have screwed something up somewhere anyway [10:38:46] ah yes, I saw the second c/p now [10:39:13] are they both there or just one? [10:39:49] <_joe_> godog: what? [10:40:30] both proxypass directives that you pasted, are they both in the config? [10:41:15] <_joe_> no just the second one [10:42:01] ${doc_root} [10:42:03] ??? [10:42:26] should it be %{doc_root} ? [10:42:44] <_joe_> akosiaris: no [10:42:53] <_joe_> btw, File does not exist: /var/www/wiki [10:42:56] <_joe_> ARGH [10:43:08] * _joe_ headbangs [10:43:17] <_joe_> ok I'm going to do that via puppet [10:43:27] <_joe_> and make all vhosts templates [10:43:53] <_joe_> I'm not going to copy-paste the hhvm block everywhere [10:44:04] <_joe_> it's against my religion :) [10:44:33] <_joe_> I just want to understand what I'm doing wrong [10:46:28] <_joe_> seeing http://stackoverflow.com/questions/4583235/apache-proxypass-using-a-variable-url-with-interpolate it seems my code should work [10:47:11] (03CR) 10Filippo Giunchedi: elasticsearch: add python-requests dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/157659 (owner: 10Filippo Giunchedi) [10:47:20] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] elasticsearch: add python-requests dependency [puppet] - 10https://gerrit.wikimedia.org/r/157659 (owner: 10Filippo Giunchedi) [10:47:59] <_joe_> lol If I replace /wiki with / it works [10:50:17] <_joe_> ok solved [10:50:34] (03PS1) 10Filippo Giunchedi: elasticsearch: fix check_elasticsearch.py to work on precise [puppet] - 10https://gerrit.wikimedia.org/r/157662 [10:51:44] _joe_: ^ easy enough to fix at least [10:52:06] <_joe_> :) [10:52:07] <_joe_> it is [10:55:30] <_joe_> godog: why the import re there? 
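A minimal sketch of the two Apache approaches _joe_ compares above, for a virtual host handing PHP requests to HHVM's FastCGI port. Both directives are taken from the discussion, including its conclusion that the ProxyPass form only matched once the prefix was / rather than /wiki; the ServerName and DocumentRoot are placeholders, and something.php is the chat's own placeholder, not the production MediaWiki configuration. Assumes mod_rewrite, mod_proxy and mod_proxy_fcgi are loaded.

    <VirtualHost *:80>
        ServerName wiki.example.org          # placeholder
        DocumentRoot /srv/example/docroot    # placeholder

        RewriteEngine On

        # Variant 1: proxy via RewriteRule. Works as-is, but mod_rewrite's [P]
        # proxying does not reuse persistent connections to the FastCGI backend.
        RewriteRule /wiki fcgi://localhost:9000%{DOCUMENT_ROOT}/something.php [P,L]

        # Variant 2: ProxyPass with environment interpolation, so one shared
        # include can serve vhosts with different document roots. Commented out
        # here so it does not overlap with variant 1; per the discussion it only
        # matched with the bare "/" prefix.
        # RewriteRule ^ - [E=doc_root:%{DOCUMENT_ROOT}]
        # ProxyPassInterpolateEnv On
        # ProxyPass / fcgi://localhost:9000/${doc_root}/something.php interpolate
    </VirtualHost>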
[10:56:50] _joe_: they weren't sorted before [10:56:50] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [10:57:14] <_joe_> oh ok [10:57:16] <_joe_> sorry [10:57:27] <_joe_> tired, time for a break [10:57:40] <_joe_> apache has the ability to really piss me off [10:57:55] <_joe_> s/apache/technology/ [10:59:29] * _joe_ laters [11:17:47] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] elasticsearch: fix check_elasticsearch.py to work on precise [puppet] - 10https://gerrit.wikimedia.org/r/157662 (owner: 10Filippo Giunchedi) [11:18:48] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [11:29:24] (03PS1) 10Springle: pool db1044 in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157667 [11:30:27] (03CR) 10Springle: [C: 032] pool db1044 in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157667 (owner: 10Springle) [11:30:32] (03Merged) 10jenkins-bot: pool db1044 in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157667 (owner: 10Springle) [11:31:28] !log springle Synchronized wmf-config/db-eqiad.php: pool db1044, warm up (duration: 00m 06s) [11:31:33] Logged the message, Master [11:36:22] (03CR) 10Alexandros Kosiaris: [C: 032] Add DNS views in ganglia [puppet] - 10https://gerrit.wikimedia.org/r/157045 (owner: 10Alexandros Kosiaris) [12:02:17] PROBLEM - DPKG on carbon is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:04:11] !log springle Synchronized wmf-config/db-eqiad.php: depool db1044 (duration: 00m 06s) [12:04:17] Logged the message, Master [12:04:47] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [12:05:17] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures [12:10:29] !log springle Synchronized wmf-config/db-eqiad.php: repool db1044, take 2 (duration: 00m 06s) [12:10:35] Logged the message, Master [12:12:17] RECOVERY - DPKG on carbon is OK: All packages OK [12:22:18] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [12:30:38] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures [12:35:47] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [12:47:37] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [12:58:22] (03PS1) 10Filippo Giunchedi: swift: check high load average on backend machines [puppet] - 10https://gerrit.wikimedia.org/r/157672 [13:15:13] (03PS1) 10Filippo Giunchedi: swift: add dispersion to swift dashboards [puppet] - 10https://gerrit.wikimedia.org/r/157674 [13:17:40] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: add dispersion to swift dashboards [puppet] - 10https://gerrit.wikimedia.org/r/157674 (owner: 10Filippo Giunchedi) [13:33:25] (03PS1) 10Filippo Giunchedi: image scalers: bump workers limits [puppet] - 10https://gerrit.wikimedia.org/r/157678 [13:38:28] (03CR) 10Giuseppe Lavagetto: [C: 031] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/157678 (owner: 10Filippo Giunchedi) [13:38:34] <_joe_> :) [13:39:24] haha even nicer between "" [13:48:13] Hi. We have an issue on Wikimedia Commons. 
When we try to delete https://commons.wikimedia.org/wiki/File:Pheliperodrigues.jpg we got the following error message:* [13:48:16] Error deleting file: Could not delete file "mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg". [13:49:21] is jenkins stuck? [13:49:30] there are pending jobs queued an hour ago [13:49:34] https://integration.wikimedia.org/zuul/ [13:52:24] Dereckson: mhh does retrying work? [13:53:04] <_joe_> MatmaRex: that's because hashar is on leave [13:53:07] <_joe_> lemme see [13:53:10] godog: nope [13:53:16] godog: nor for me [13:53:37] ack [13:53:56] there's a bug for it somewhere. [13:54:55] <_joe_> MatmaRex: not jenkins AFAICS [13:55:17] _joe_: i use "jenkins" as the general term for our entire CI workflow :) [13:55:24] <_joe_> oh ok :) [13:55:31] seems to indeed be moving, though [13:55:35] just incredibly slow [13:55:45] stupid mediawiki-vendor-integration jobs [13:56:21] <_joe_> MatmaRex: well, in that case I'm not sure I can troubleshoot that satisfactorily [13:59:50] NotASpy Dereckson I couldn't find anything obviously wrong with swift itself, I see the failed attempts at deleting the file, do you remember what was the bug? [14:00:14] just looking for the bug now [14:00:24] _joe_: filed https://bugzilla.wikimedia.org/show_bug.cgi?id=70256 [14:03:43] _joe_: I'm here [14:03:46] Antoine is on leave? [14:05:47] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [14:07:35] godog: had trouble finding the bug since it was closed RESOLVEDFIXED, but it's at https://bugzilla.wikimedia.org/show_bug.cgi?id=69760 [14:08:51] <_joe_> Krinkle: happy to see you around :) [14:09:04] <_joe_> Krinkle: yes, see @wmfall [14:10:16] Ugh, people coming out of other people. [14:16:15] and it's been going on for a while, I'm told [14:18:43] Yeah, I wasn't expecting a "well, back in *my* day we used to.. " [14:19:55] NotASpy Dereckson yeah that is supposed to work upon retry usually, can you file a new bug with what you have seen? I'll followup there (likely tomorrow [14:20:49] will do [14:21:02] thanks! [14:33:56] akosiaris: are you around ? [14:35:31] matanya_: yes [14:36:29] i see something i would like another eye to comment on: in manifests/dns.pp line 65 i see: $listen_addresses = [$::ipaddress], [14:36:47] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [14:37:30] which is used in templates/powerdns/recursor.conf.erb as local-address=<%= flatten_ips(listen_addresses).sort.join(" ") %> [14:37:50] akosiaris: wouldn't it be better to just use @ipaddress directly ? [14:38:05] ipaddress is a fact, but i'm sure i'm missing something here [14:38:25] that is the default.. what if you want to override it ? [14:38:38] and you want to pass an array of ip addresses for pdns to listen on ? [14:38:39] such as ? [14:39:00] like ['10.10.10.10', '192.168.0.1'] ? [14:39:05] oh, got it [14:39:09] address of the top of my head obviously [14:39:14] ok [14:39:20] yeah, i knew i was missing something [14:39:33] but couldn't figure it out. Thank you! 
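A condensed Puppet sketch of the listen_addresses pattern akosiaris explains above: the $::ipaddress fact is only the default value of the parameter, so a caller can still override it with an array of addresses for the pdns recursor to listen on. The class name and body are illustrative stand-ins, not the full manifests/dns.pp; flatten_ips is the site-specific function used by the real template, replaced here by a plain join.

    # Illustrative only; the real code lives in manifests/dns.pp and
    # templates/powerdns/recursor.conf.erb.
    class dns::recursor (
        $listen_addresses = [$::ipaddress],   # default: just this host's primary IP
    ) {
        # The real template renders:
        #   local-address=<%= flatten_ips(listen_addresses).sort.join(" ") %>
        notify { 'pdns-listen-addresses':
            message => inline_template('local-address=<%= @listen_addresses.sort.join(" ") %>'),
        }
    }

    # Overriding the default with several addresses (example IPs from the chat):
    class { 'dns::recursor':
        listen_addresses => ['10.10.10.10', '192.168.0.1'],
    }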
[14:39:45] you 're welcome [14:43:31] (03PS1) 10Matanya: pdns: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/157681 [14:59:41] (03PS1) 10Matanya: deployment: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/157685 [15:00:07] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: /mnt/tmpfs 0B: /srv/deployment/ocg/output 5.0GB (= 5.0GB critical): /srv/deployment/ocg/postmortem 1471364B: ocg_job_status 9627 msg: ocg_render_job_queue 0 msg [15:04:28] (03PS2) 10Matanya: deployment: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/157685 [15:04:30] jouncebot: you still dead, buddy? [15:04:35] who's swatting? [15:04:53] marktraceur: ^d: who's swatting? you're the only ones online [15:05:08] oh wait. [15:05:24] who put the swat on tomorrow instead of today? :P [15:06:16] MatmaRex: it is labor day in US [15:06:21] no deploys today [15:06:40] blergh, silly americans [15:06:46] that too :) [15:07:43] i only don't get why they don't work on "labor" day. maybe it would make more sense to call it vacation day [15:16:07] PROBLEM - Disk space on elastic1004 is CRITICAL: DISK CRITICAL - free space: / 1064 MB (3% inode=96%): [15:45:17] PROBLEM - Disk space on elastic1009 is CRITICAL: DISK CRITICAL - free space: / 1066 MB (3% inode=96%): [15:52:17] RECOVERY - Disk space on elastic1009 is OK: DISK OK [16:06:47] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [16:20:46] * marktraceur looks up from his pina colada [16:20:50] No deploys today. [16:20:57] * marktraceur goes back to hammock [16:37:47] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [16:42:57] RECOVERY - Disk space on elastic1007 is OK: DISK OK [16:44:07] RECOVERY - Disk space on elastic1004 is OK: DISK OK [16:44:28] !log removed some large slow query logs from elastic* nodes, need to look into this... [16:44:34] Logged the message, Master [18:07:47] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [18:38:05] (03PS8) 10ArielGlenn: data retention audit script for logs, /root and /home dirs [software] - 10https://gerrit.wikimedia.org/r/141473 [18:38:09] (03CR) 10jenkins-bot: [V: 04-1] data retention audit script for logs, /root and /home dirs [software] - 10https://gerrit.wikimedia.org/r/141473 (owner: 10ArielGlenn) [18:38:47] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [18:40:59] I guess I'm just being blind.. [18:41:15] Where the code that stages/sets up the wikitech.wikimedia.org.erb apache config [18:42:22] manifests/role/nova.pp maybe.. [18:43:25] (03PS1) 10Reedy: Add temporary virt1000.wikimedia.org.erb for wikitech migration to multiversion [puppet] - 10https://gerrit.wikimedia.org/r/157704 [18:48:10] (03CR) 10Reedy: Add temporary virt1000.wikimedia.org.erb for wikitech migration to multiversion (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/157704 (owner: 10Reedy) [18:48:18] (03CR) 10Reedy: [C: 04-1] Add temporary virt1000.wikimedia.org.erb for wikitech migration to multiversion [puppet] - 10https://gerrit.wikimedia.org/r/157704 (owner: 10Reedy) [18:55:52] Reedy, so waht's happening with wikitech exactly? [18:56:44] It's being moved over to using multiversion etc [18:56:59] more wikis going to be added there? [18:57:05] It's presumably not going to be moved into production, is it? 
[18:57:24] I don't think there's more wikis to be added [18:57:27] And no, I don't think so either [18:58:31] !log reedy Purged l10n cache for 1.24wmf17 [18:58:38] Logged the message, Master [18:59:32] !log reedy Purged l10n cache for 1.24wmf16 [18:59:38] Logged the message, Master [19:00:29] !log reedy Purged l10n cache for 1.24wmf15 [19:00:36] Logged the message, Master [19:01:03] !log reedy Purged l10n cache for 1.24wmf14 [19:01:10] Logged the message, Master [19:01:23] * Reedy scratches his head [19:01:36] !log reedy Purged l10n cache for 1.24wmf13 [19:01:42] Logged the message, Master [19:06:51] (03PS1) 10Reedy: Deleted 1.24wmf[6-8] symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157706 [19:08:24] (03CR) 10Reedy: [C: 032] Deleted 1.24wmf[6-8] symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157706 (owner: 10Reedy) [19:08:28] (03Merged) 10jenkins-bot: Deleted 1.24wmf[6-8] symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157706 (owner: 10Reedy) [19:12:24] !log Deleted php-1.24wmf[6-8] from apaches via dsh [19:12:30] Logged the message, Master [19:29:58] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: Puppet has 1 failures [19:44:41] (03PS1) 10John F. Lewis: Mailman: Fix a few encoding issues for languages [puppet] - 10https://gerrit.wikimedia.org/r/157708 [19:47:07] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [20:03:08] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: /mnt/tmpfs 0B: /srv/deployment/ocg/output 5.0GB (= 5.0GB critical): /srv/deployment/ocg/postmortem 1475651B: ocg_job_status 9673 msg: ocg_render_job_queue 0 msg [20:08:47] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [20:12:11] (03CR) 10Stryn: [C: 031] Mailman: Fix a few encoding issues for languages [puppet] - 10https://gerrit.wikimedia.org/r/157708 (owner: 10John F. Lewis) [20:17:35] (03PS19) 10Reedy: Add wikitech config. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:24:23] (03CR) 10Reedy: [C: 04-1] "So a few points... Where are the Wikitech specific extensions actually included/required? I can't obviously see anything" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:39:23] (03PS20) 10Reedy: Add wikitech config. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:39:25] (03PS1) 10Reedy: Add virt1000.wikimedia.org static mapping for wikitech migration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157757 [20:39:47] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [20:43:50] (03CR) 10Reedy: Add wikitech config. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:21:42] (03CR) 10Andrew Bogott: Add wikitech config. 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:24:26] (03CR) 10Andrew Bogott: "Given that you're reviewing your own code, I'm going to assume this is a work in progress :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/157704 (owner: 10Reedy) [21:25:45] (03CR) 10Reedy: "If you can tell me where the puppet config side of stuff needs to go that'd be helpful ;)" [puppet] - 10https://gerrit.wikimedia.org/r/157704 (owner: 10Reedy) [21:30:52] (03CR) 10Andrew Bogott: Add temporary virt1000.wikimedia.org.erb for wikitech migration to multiversion (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/157704 (owner: 10Reedy) [21:42:35] (03PS2) 10Reedy: Add temporary virt1000.wikimedia.org.erb for wikitech migration to multiversion [puppet] - 10https://gerrit.wikimedia.org/r/157704 [21:42:37] (03CR) 10Reedy: Add temporary virt1000.wikimedia.org.erb for wikitech migration to multiversion (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/157704 (owner: 10Reedy) [21:50:33] !log disabled gerrit account Caothu9669; spam [21:50:39] Logged the message, Master [22:09:47] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 01:50:18 UTC [22:40:47] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Sun 31 Aug 2014 06:23:11 UTC [22:44:32] bd808: \o/ docker [22:44:56] yuvipanda: I have it running on a labs instance. Kind of fun. [22:45:06] niiiice! [22:45:14] I should learn more about docker at some point [22:46:26] It's interesting. Basically a fairly opinionated way to prepare lxc images and then manage running them. [22:46:37] yeah [22:46:55] opinionated ways to do things perhaps are great for kicking a new technology off the ground [22:47:14] * jeremyb has been playing with docker too [22:49:13] what is that, in relation to the vagrant change? or context? [22:49:40] jeremyb: Yeah. Support for Docker as a provider in MWV [22:50:26] MWV is a new one for me. but i guess that's mediawiki vagrant [22:50:32] It seems to work pretty well so far. One surprising thing is that `vagrant reload` creates a whole new image [22:50:41] huh [22:50:46] what does it normally do? [22:51:25] If you are using Virtualbox it just stops the VM, changes some config and starts it again [22:51:58] But with Docker the config (like forward ports) is baked into the image [22:52:14] So you have to start all over to add a new port forward [22:52:21] so is it essentially the same as vagrant halt && vagrant provision ? [22:52:46] reload is normally the same as halt and up [22:53:02] ok [22:54:36] bd808: what happens to the db, etc state that's in there? [22:54:38] all gonneeeee? [22:54:49] yuvipanda: Yup. [22:54:51] depends where there is [22:54:54] heh [22:55:21] and does vagrant delete the old image or just stops and runs a new one? [22:56:30] I think it just makes a new one. `vagrant destroy` does remove the Docker image, but I didn't really check to see what happened after `vagrant reload` [23:00:40] !log Running extensions/GlobalCssJs/removeOldManualUserPages.php per m:GlobalCssJs [23:00:46] Logged the message, Master
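A small Vagrantfile sketch of the Docker-provider behaviour bd808 describes above: settings such as forwarded ports are fixed when the container is created, which is why `vagrant reload` ends up building a whole new container rather than just restarting it. The values below are illustrative assumptions, not MediaWiki-Vagrant's actual configuration.

    # Illustrative Vagrantfile fragment (Ruby); not MediaWiki-Vagrant's real config.
    Vagrant.configure('2') do |config|
      # With the VirtualBox provider this forward can be changed with a plain
      # stop/start; with the Docker provider it is baked into the container at
      # creation time, so adding one forces a rebuild.
      config.vm.network 'forwarded_port', guest: 8080, host: 8080

      config.vm.provider 'docker' do |docker|
        docker.build_dir = '.'      # assumes a Dockerfile sits next to this Vagrantfile
        docker.has_ssh   = true     # assumes the image runs sshd so `vagrant ssh` works
      end
    end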