[00:01:42] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 6.651 second response time [00:09:42] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 172372 bytes in 6.788 second response time [00:56:50] (03CR) 10Jeremyb: [C: 04-1] "bad regex. otherwise needs further review/comments." (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/113687 (owner: 10Brion VIBBER) [02:09:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [02:12:31] (03PS1) 10Springle: tendril.wikimedia.org along side ishmael [operations/dns] - 10https://gerrit.wikimedia.org/r/113890 [02:13:04] (03CR) 10Springle: [C: 032] tendril.wikimedia.org along side ishmael [operations/dns] - 10https://gerrit.wikimedia.org/r/113890 (owner: 10Springle) [02:14:08] !log LocalisationUpdate completed (1.23wmf13) at 2014-02-18 02:14:08+00:00 [02:14:19] Logged the message, Master [02:26:05] !log LocalisationUpdate completed (1.23wmf14) at 2014-02-18 02:26:05+00:00 [02:26:12] Logged the message, Master [02:35:47] (03PS1) 10Springle: tendril.wikimedia.org, borrow basic structure from ishmael which is already on neon. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113893 [02:39:40] (03CR) 10Springle: [C: 032] tendril.wikimedia.org, borrow basic structure from ishmael which is already on neon. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113893 (owner: 10Springle) [02:45:38] (03PS2) 10MZMcBride: Disable and remove ContactPageFundraiser [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110292 (owner: 10Reedy) [02:48:15] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-02-18 02:48:14+00:00 [02:48:21] Logged the message, Master [02:50:26] (03PS1) 10Springle: tendril add missing manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/113895 [02:51:58] (03CR) 10Springle: [C: 032] tendril add missing manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/113895 (owner: 10Springle) [02:53:20] !log reindexing s1 slaves abuse_filter_log [02:53:28] Logged the message, Master [02:58:34] (03PS1) 10Reedy: Disable ContactPageFundraiser from testwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113896 [03:10:22] (03PS2) 10Reedy: Remove ContactPageFundraiser from testwiki and foundationwikio [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113896 [03:13:08] (03PS3) 10Reedy: Remove ContactPageFundraiser from testwiki and foundationwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113896 [03:17:57] (03PS1) 10Reedy: Remove UserThrottle extension and log group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113897 [03:32:36] (03PS4) 10MZMcBride: Remove ContactPageFundraiser from testwiki and foundationwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113896 (owner: 10Reedy) [03:37:28] (03PS2) 10MZMcBride: Remove UserThrottle extension and log group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113897 (owner: 10Reedy) [03:37:54] !log reedy updated /a/common to {{Gerrit|Ifd09130b4}}: Ignore PhpStorm files [03:37:59] (03PS1) 10Reedy: Symlink in the extension-list files [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 [03:38:02] Logged the message, Master [03:38:37] (03PS1) 
10Ori.livneh: Refactor GeoIP lookup code [operations/puppet] - 10https://gerrit.wikimedia.org/r/113900 [03:39:59] (03PS2) 10MZMcBride: Symlink in the extension-list files [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 (owner: 10Reedy) [03:44:50] (03PS3) 10Reedy: Symlink in the extension-list files [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 [03:50:53] (03CR) 10Alex Monk: Symlink in the extension-list files (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 (owner: 10Reedy) [03:53:47] (03PS1) 10Springle: limit to ops for now [operations/puppet] - 10https://gerrit.wikimedia.org/r/113901 [03:55:04] (03CR) 10Springle: [C: 032] limit to ops for now [operations/puppet] - 10https://gerrit.wikimedia.org/r/113901 (owner: 10Springle) [03:59:48] (03CR) 10Reedy: Symlink in the extension-list files (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 (owner: 10Reedy) [04:09:52] (03PS4) 10Reedy: Symlink in the extension-list files [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 [04:10:25] (03CR) 10Reedy: [C: 032] Symlink in the extension-list files [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 (owner: 10Reedy) [04:10:33] (03Merged) 10jenkins-bot: Symlink in the extension-list files [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113899 (owner: 10Reedy) [04:11:10] !log reedy synchronized docroot/noc/ [04:11:18] Logged the message, Master [04:30:40] (03PS1) 10Andrew Bogott: In Havana, switch back to a driver-based wikinotifier. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113903 [04:31:47] (03CR) 10jenkins-bot: [V: 04-1] In Havana, switch back to a driver-based wikinotifier. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113903 (owner: 10Andrew Bogott) [04:34:03] (03PS2) 10Andrew Bogott: In Havana, switch back to a driver-based wikinotifier. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113903 [04:46:49] (03CR) 10Andrew Bogott: [C: 032] In Havana, switch back to a driver-based wikinotifier. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113903 (owner: 10Andrew Bogott) [05:10:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [05:20:22] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:21:12] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.157 second response time [05:35:42] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100% [05:48:55] (03PS1) 10Andrew Bogott: Update wikinotifier to work on Havana. First of many, I expect... [operations/puppet] - 10https://gerrit.wikimedia.org/r/113908 [05:50:06] (03CR) 10jenkins-bot: [V: 04-1] Update wikinotifier to work on Havana. First of many, I expect... [operations/puppet] - 10https://gerrit.wikimedia.org/r/113908 (owner: 10Andrew Bogott) [05:52:09] (03PS2) 10Andrew Bogott: Update wikinotifier to work on Havana. First of many, I expect... [operations/puppet] - 10https://gerrit.wikimedia.org/r/113908 [05:53:34] (03CR) 10Andrew Bogott: [C: 032] Update wikinotifier to work on Havana. First of many, I expect... 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113908 (owner: 10Andrew Bogott) [06:22:47] (03PS1) 10Andrew Bogott: Further wikinotifier updates for havana [operations/puppet] - 10https://gerrit.wikimedia.org/r/113910 [06:26:01] (03PS2) 10Andrew Bogott: Further wikinotifier updates for havana [operations/puppet] - 10https://gerrit.wikimedia.org/r/113910 [06:29:23] (03CR) 10Andrew Bogott: [C: 032] Further wikinotifier updates for havana [operations/puppet] - 10https://gerrit.wikimedia.org/r/113910 (owner: 10Andrew Bogott) [06:33:24] (03PS1) 10Andrew Bogott: FLAGS are now called CONF [operations/puppet] - 10https://gerrit.wikimedia.org/r/113911 [06:34:33] (03CR) 10jenkins-bot: [V: 04-1] FLAGS are now called CONF [operations/puppet] - 10https://gerrit.wikimedia.org/r/113911 (owner: 10Andrew Bogott) [06:35:44] (03PS2) 10Andrew Bogott: FLAGS are now called CONF [operations/puppet] - 10https://gerrit.wikimedia.org/r/113911 [06:37:20] (03CR) 10Andrew Bogott: [C: 032] FLAGS are now called CONF [operations/puppet] - 10https://gerrit.wikimedia.org/r/113911 (owner: 10Andrew Bogott) [07:36:34] (03PS1) 10Andrew Bogott: Havana wikinotified needs to use conductor &c. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113917 [07:37:41] (03CR) 10jenkins-bot: [V: 04-1] Havana wikinotified needs to use conductor &c. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113917 (owner: 10Andrew Bogott) [07:39:09] (03PS2) 10Andrew Bogott: Havana wikinotified needs to use conductor &c. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113917 [07:40:25] (03PS3) 10Andrew Bogott: Havana wikinotifier needs to use conductor &c. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113917 [07:42:02] (03CR) 10Andrew Bogott: [C: 032] Havana wikinotifier needs to use conductor &c. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113917 (owner: 10Andrew Bogott) [08:11:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [08:25:44] (03PS1) 10Andrew Bogott: Fix up ip reporting for havana [operations/puppet] - 10https://gerrit.wikimedia.org/r/113920 [08:29:11] (03CR) 10Andrew Bogott: [C: 032] Fix up ip reporting for havana [operations/puppet] - 10https://gerrit.wikimedia.org/r/113920 (owner: 10Andrew Bogott) [09:11:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [09:35:33] PROBLEM - Varnish HTCP daemon on cp1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:36:02] PROBLEM - Varnish HTTP text-backend on cp1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:02] PROBLEM - Varnish traffic logger on cp1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:37:43] (03PS2) 10Ori.livneh: Refactor GeoIP lookup code [operations/puppet] - 10https://gerrit.wikimedia.org/r/113900 [10:06:02] hi akosiaris do you think manifests/misc/statistics should be a module (i know i should ask otto :)) [10:07:56] matanya: looks like it [10:09:58] thanks [10:10:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [10:11:08] cp1054 locked up again, probably [10:30:28] (03PS1) 10Ori.livneh: Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/113935 [10:41:35] (03CR) 10Alexandros Kosiaris: [C: 032] appserver php: remove files moved to appserver module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113784 (owner: 10Matanya) [11:12:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [11:20:21] hashar: I tried to build mathoid with jenkins and it failed.. 
https://integration.wikimedia.org/ci/job/mathoidsvc-debian-glue/6/console even though I could build exactly the same source on launchpad [11:22:28] hashar: as far as I understand the problem is 00:02:31.215 ok 9 ['chroot', '/tmp/tmpcmd3CV', 'dpkg', '-i', 'tmp/mathoid_0.1.3+0~20140218111509.6~1.gbpedf733_amd64.deb'] # skip Ignoring failed command [11:32:30] i'm trying to convert iptables rules to ferm rules, any nice guide i can read? it seems like not all the functionlity is on ferm module, or i don't understand it 100% [11:33:10] akosiaris: ^ ? [11:33:38] not really [11:33:53] best approach is probably to have a look at how others use it [11:34:42] yes, done that. didn't find a replace for iptables_purge_service [11:34:49] but i'll dig deeper [11:34:53] you dont need one [11:35:02] ferm is not a state machine. [11:35:13] ah, that is why i can't find it :) [11:35:20] each time it is notified it will try to reload the entire conf [11:35:32] you only list what you need and that's it [11:35:47] great, thanks. all is clear now :) [11:35:48] btw the default behaviour is DROP so extra care is needed [11:36:03] that is what i intend to keep [11:40:05] (03PS1) 10ArielGlenn: releases role indentation and style fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/113942 [11:41:23] (03CR) 10ArielGlenn: [C: 032] releases role indentation and style fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/113942 (owner: 10ArielGlenn) [11:41:52] akosiaris: to verify: ferm::rule { 'redis_internal': [11:41:52] rule => 'proto tcp dport 6379 { saddr $INTERNAL ACCEPT; }', [11:42:16] will allow redis on port 6379 from internal lan ? 
[11:58:24] matanya: yes [12:01:22] thanks [12:06:37] (03PS1) 10ArielGlenn: alert earlier for disk space on parsoid servers, rt 6851 [operations/puppet] - 10https://gerrit.wikimedia.org/r/113944 [12:07:23] (03CR) 10ArielGlenn: "This leaves the regular disk alert (5%, 3%) in place, not sure if that's a problem or not" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113944 (owner: 10ArielGlenn) [12:23:29] physikerwelt: that is piuparts [12:25:21] hashar: maybe I should install that, run locally and see what happens? [12:26:47] physikerwelt: I guess so. I am not sure how to fix those errors though [12:28:30] physikerwelt: I dont think piuparts warnings are much of an issue [12:28:38] physikerwelt: you might want to look at the lintian one though: http://integration.wikimedia.org/ci/job/mathoidsvc-debian-glue/6/testReport/ [12:29:23] some perl scripts have #!/usr/local/bin/perl [12:29:37] might need #!/usr/bin/env perl [12:30:38] ok I agree... I'm just wondering why those errors do not occur in the math extension [12:31:10] CRLF is a really bad problem with git and windows [12:31:31] for some reason I get CRLF over and again [12:31:54] Testing that is really a great idea [12:34:38] but we have to exclude node_modules from test [12:34:51] they are somehow out of control [12:36:05] (03PS1) 10ArielGlenn: nuria access to erbium, rt #6836 [operations/puppet] - 10https://gerrit.wikimedia.org/r/113946 [12:37:45] (03CR) 10ArielGlenn: [C: 032] nuria access to erbium, rt #6836 [operations/puppet] - 10https://gerrit.wikimedia.org/r/113946 (owner: 10ArielGlenn) [12:47:05] (03PS1) 10ArielGlenn: Revert "nuria access to erbium, rt #6836" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113949 [12:48:44] (03CR) 10ArielGlenn: [C: 032] Revert "nuria access to erbium, rt #6836" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113949 (owner: 10ArielGlenn) [13:54:20] (03CR) 10Manybubbles: [C: 031] "Sounds good to me. I'm fine with syncing this whenever." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113294 (owner: 10Chad) [13:59:54] (03PS1) 10coren: Labs: Add labstore100[12] node entry [operations/puppet] - 10https://gerrit.wikimedia.org/r/113952 [14:02:01] (03CR) 10coren: [C: 032] "Safe to merge: cannot affect anything else by definition" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113952 (owner: 10coren) [14:11:29] (03PS1) 10coren: Labs: Tweaks to labstore config for migration [operations/puppet] - 10https://gerrit.wikimedia.org/r/113953 [14:13:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [14:28:06] (03PS1) 10Manybubbles: Turn on checkDelay for a cirrus job [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113954 [14:32:16] (03PS1) 10Andrew Bogott: Add cron for manage-keys. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 [14:32:34] Coren: ^ might be all it takes, presuming that the mount on the nfs server is in the same place... [14:32:54] (03CR) 10jenkins-bot: [V: 04-1] Add cron for manage-keys. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 (owner: 10Andrew Bogott) [14:33:00] awww [14:33:32] well, ok, not quite so simple [14:38:21] (03CR) 10coren: [C: 032] "Mini tweaks." [operations/puppet] - 10https://gerrit.wikimedia.org/r/113953 (owner: 10coren) [14:39:31] (03PS2) 10Andrew Bogott: Add cron for manage-keys. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 [14:40:10] (03CR) 10jenkins-bot: [V: 04-1] Add cron for manage-keys. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 (owner: 10Andrew Bogott) [14:45:37] (03PS4) 10Andrew Bogott: Add cron for manage-keys. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 [14:48:00] (03PS5) 10Andrew Bogott: Add cron for manage-keys. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 [14:49:43] (03CR) 10coren: [C: 032] "Should work fine." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113955 (owner: 10Andrew Bogott) [14:51:17] (03PS1) 10coren: Labs: Use service groups in eqiad always [operations/puppet] - 10https://gerrit.wikimedia.org/r/113959 [14:52:08] andrewbogott: Can you sanity check my change? ^^ [14:52:21] petan: around? [14:52:29] . [14:52:47] petan: https://gerrit.wikimedia.org/r/113755 What do you mean with resolve helper over there [14:53:17] hoo: that thing which takes the correct name from /etc/hosts in case user provides wrong one [14:53:30] line 100 [14:53:54] um [14:54:10] I didn't mean resolve helper that uses /etc/hosts you didn't remove that, but that _p suffix [14:54:13] that is useful as well [14:54:31] it's not, as we always remove _p in the script [14:54:39] line 10)/92 [14:54:43] * 109/92 [14:56:10] https://gerrit.wikimedia.org/r/#/c/113755/2/modules/toollabs/files/sql [14:56:12] Coren: I'm pretty sure it does what you say it does. As to whether or not that's the right thing, I'll leave to you [14:56:56] (03PS2) 10coren: Labs: Use service groups in eqiad always [operations/puppet] - 10https://gerrit.wikimedia.org/r/113959 [14:57:42] petan: In which line does it use that _p? sorry, but I just don't see why we should append it [14:57:59] because the databases are called like that [14:58:37] Sure... but that's not a reason for the script to make superfluous calls to itself [14:58:47] server=`echo "$1" | sed 's/_p//'`.labsdb [14:59:08] The script removes the _p again, so the additional _p is of no use AFAIS [14:59:28] I think it was called again for some reason... dunno what [15:00:14] I don't really care how you do that, but I don't like idea of loosing that functionality :P [15:00:32] I don't see how we loose functionality that way [15:00:43] Can you give me an example where my version is broken? 
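Annotation on the `_p` exchange above: the quoted `sed` line derives the server name by dropping `_p` and appending `.labsdb`, while the databases themselves carry the `_p` suffix. A minimal sketch of that mapping, assuming the goal is for `enwiki` and `enwiki_p` to resolve to the same target. The function name `resolve_target` and its parameters are hypothetical and are not taken from the actual `sql` script. Unlike the `sed` expression, which removes the first `_p` anywhere in the name, this sketch strips only a trailing suffix.

```python
from typing import Tuple


def resolve_target(name: str, suffix: str = "_p", domain: str = "labsdb") -> Tuple[str, str]:
    """Return (server, database) for a wiki name given with or without the suffix.

    Hypothetical helper: the names and defaults are illustrative assumptions.
    """
    # Strip the suffix only when it is at the end of the name.
    base = name[: -len(suffix)] if name.endswith(suffix) else name
    # The server host drops the suffix; the database name always carries it.
    return f"{base}.{domain}", f"{base}{suffix}"


if __name__ == "__main__":
    for wiki in ("enwiki", "enwiki_p"):
        print(wiki, "->", resolve_target(wiki))
```

Computing both values in one step avoids the script calling itself again just to add the suffix back.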
[15:00:48] this sql script was meant to be idiot proof, so that users with minimal knowledge of unix can operate it [15:01:02] when I do "sql enwiki" with my version, it will connect me to enwiki_p [15:01:11] So will my version do [15:01:12] if I do sql enwiki with your version, it crash [15:01:17] howcome [15:01:38] (03PS1) 10Dzahn: fix hardcoded db name in BZ reporter script [operations/puppet] - 10https://gerrit.wikimedia.org/r/113961 [15:01:41] (03CR) 10coren: [C: 032] "Does what it means to do, in the way it intended." [operations/puppet] - 10https://gerrit.wikimedia.org/r/113959 (owner: 10coren) [15:01:48] aha [15:01:52] I didn't notice that [15:02:55] (03PS1) 10Andrew Bogott: Move manage-keys-nfs cron to the actual nfs class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113962 [15:03:07] hoo: is that file somewhere so that I can test it [15:03:21] I am lazy to extract from gerrit, it has shitty interface [15:04:09] /home/hoo/sql [15:04:15] on tool labs [15:04:41] (03CR) 10Andrew Bogott: [C: 032] Move manage-keys-nfs cron to the actual nfs class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/113962 (owner: 10Andrew Bogott) [15:05:15] (03CR) 10Dzahn: [C: 032] fix hardcoded db name in BZ reporter script [operations/puppet] - 10https://gerrit.wikimedia.org/r/113961 (owner: 10Dzahn) [15:06:19] hoo: I don't know what is difference between enwiki and enwiki_p databases [15:06:30] I was thinking that we only have enwiki_p or we had in past [15:06:52] using your script with "sql enwiki" and "sql enwiki_p" connects me to different databases according to client [15:07:48] I don't really know if it's good or bad, but it could break existing scripts that rely on that [15:09:10] mh [15:11:42] petan: What if we remove the recursion and just append _p in that step? [15:12:54] it won't go throught that switch but I think that's ok [15:12:57] you can try that :P [15:13:59] ok, will amend later (will also fix the hard coded $2 for verbose... 
not sure why I did that) [15:15:51] petan: Just checked that out... enwiki_p are the views and enwiki are base tables AFAIS [15:16:23] ok I think that normal users shouldn't have access to enwiki then? [15:16:32] in which case they should be autoredirected to _p [15:16:42] MariaDB [enwiki_p]> SELECT * FROM enwiki.user WHERE user_name = 'Hoo man'; [15:16:42] ERROR 1142 (42000): SELECT command denied to user 'u2133'@'10.4.0.220' for table 'user' [15:17:20] (03PS1) 10Matanya: statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 [15:17:21] this script must be idiot-proof, it needs to get user to proper db [15:17:29] so it should be _p [15:17:57] right... I only tested the script with SELECT DATABASE(); (to make sure it makes mysql use the right one) [15:18:03] but didn't test actual queries [15:18:21] (03CR) 10jenkins-bot: [V: 04-1] statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 (owner: 10Matanya) [15:18:41] didn't believe it will work [15:19:58] (03PS2) 10Matanya: statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 [15:20:14] (03PS1) 10coren: Labs: make a generic "every instance" class [operations/puppet] - 10https://gerrit.wikimedia.org/r/113967 [15:20:59] (03CR) 10jenkins-bot: [V: 04-1] statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 (owner: 10Matanya) [15:21:58] (03PS2) 10coren: Labs: make a generic "every instance" class [operations/puppet] - 10https://gerrit.wikimedia.org/r/113967 [15:24:29] (03CR) 10Andrew Bogott: [C: 031] "This looks good, although I don't relish the great LDAP search and replace that this will entail." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113967 (owner: 10coren) [15:24:44] (03PS3) 10Matanya: statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 [15:25:39] (03CR) 10coren: [C: 032] "At least it can be done at leisure." [operations/puppet] - 10https://gerrit.wikimedia.org/r/113967 (owner: 10coren) [15:25:52] (03CR) 10jenkins-bot: [V: 04-1] statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 (owner: 10Matanya) [15:26:57] (03PS4) 10Matanya: statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 [15:27:28] why puppet doesn't alert when i run validate? :/ [15:27:59] (03CR) 10jenkins-bot: [V: 04-1] statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 (owner: 10Matanya) [15:30:24] (03PS5) 10Matanya: statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 [15:31:26] (03CR) 10jenkins-bot: [V: 04-1] statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 (owner: 10Matanya) [15:36:47] matanya: different puppet versions maybe ? [15:36:53] matanya: jenkins has puppet 2.7 iirc [15:36:55] i think so [15:37:01] yes, i have 3.3 [15:37:53] (03CR) 10Hashar: statistics: convert into a module (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 (owner: 10Matanya) [15:38:23] LOL hashar :) fixed that already :P [15:41:21] :-] [15:41:37] i don't think there is much syntax difference between puppet 2.7 and 3.3 [15:41:47] as far as "puppet parser validate " is involved [15:45:06] paravoid :D :D [15:45:12] ? [15:45:15] librdkafka [15:45:18] ah [15:45:21] yes :) [15:45:25] uploaded to Debian too [15:45:27] thank you! [15:45:29] awesooome [15:45:30] will you do the backport? [15:45:37] sure, can do [15:45:50] i forget, do I need to commit it or just build and add to our apt? 
[15:46:53] just build and reprepro include [15:47:07] great, can do [15:47:09] danke [15:48:20] (03PS6) 10Matanya: statistics: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/113966 [15:48:36] ottomata: ^ [15:48:53] :O [15:48:56] hashar: no the issue is puppet parser validate returens nothing [15:49:00] you are a crazy person! [15:49:10] indeed :) [15:49:20] phew, uhhhhhh [15:49:33] if you are going to modularlize that, I probably would make multiple modules [15:49:37] needs a great overhaul, but this is a start [15:49:47] misc::statistics was more of just a big ol' misc file of classes [15:49:59] lots of stuff there isn't related at all [15:50:16] please, enlighten me ottomata [15:50:53] hm, i mean, i'm not sure exactly how I would do it, but [15:51:10] in the same way I wouldn't make a 'misc' module [15:51:26] i probably wouldn't turn misc/statistics.pp into a module all of its own [15:51:37] there are at least a few separate pieces I can think of: [15:52:16] stats.wikimedia.org [15:52:17] geowiki [15:52:17] public datasets [15:52:28] probably many of those rsync jobs could be part of the udp2log module, maybe? [15:52:33] or maybe not udp2log [15:52:34] mind looking at the commit and suggest as code review? [15:52:37] maybe something more generic [15:52:40] oright [15:52:42] hm, yeah [15:52:42] hm [15:52:45] ok, yeah [15:52:46] will do [15:53:06] now it is just a load of structured mess [15:53:22] might be useful to move around stuff as needed [15:55:33] (03PS1) 10Thiemo Mättig (WMDE): Redirect all *.wikidata.org subdomains to www.wikidata.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/113972 [16:01:08] mutante: ping ? [16:03:31] mark: what box is going to host noc.wikimedia.org ? tin? [16:03:48] i don't know :) [16:04:58] matanya, fenari? 
[16:05:27] MaxSem: I think he's asking post-tampa [16:06:46] hoo is right MaxSem [16:06:55] (03CR) 10Thiemo Mättig (WMDE): Redirect all *.wikidata.org subdomains to www.wikidata.org (031 comment) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/113972 (owner: 10Thiemo Mättig (WMDE)) [16:07:31] paravoid, ready when you are [16:07:38] <^demon|breakfast> I thought tin was internal-only on purpose. [16:07:52] MaxSem: I am now [16:08:05] (03PS3) 10Faidon Liambotis: Varnish: disable WAP on mobile frontends [operations/puppet] - 10https://gerrit.wikimedia.org/r/108738 [16:08:23] * MaxSem chuckles his fingers [16:08:58] (03CR) 10Faidon Liambotis: [C: 032] Varnish: disable WAP on mobile frontends [operations/puppet] - 10https://gerrit.wikimedia.org/r/108738 (owner: 10Faidon Liambotis) [16:09:12] bye bye wap [16:09:23] if jenkins decided to V+2 [16:17:17] paravoid: nice work on the debian package :) [16:17:50] Snaps and jgage, lets chat here [16:17:52] hi [16:17:58] hey jgage [16:18:08] so jgage, Snaps is the author of librdkafka (Kafka C lib) and varnishkafka [16:18:21] cool :) [16:18:22] i was just chatting to both of you about the same thing [16:18:25] thought we shoudl move in here :) [16:18:33] hehe good idea [16:18:40] wmf is all about multiple communication channels [16:19:08] heh, ok i'm looking for ResponseSendTimeMs Snaps…doesn't look like i'm; graphing that right now [16:19:13] thats on the broker side, right? [16:19:32] should be, yeah [16:19:43] will that give us insight into leader elections? [16:19:55] there is also TotalTimeMs [16:20:13] produce? [16:20:14] https://kafka.apache.org/documentation.html#monitoring [16:20:18] ottomata: yeah [16:20:20] "kafka.network":type="RequestMetrics",name="Produce-ResponseSendTimeMs" [16:20:47] both TotalTimeMs and ResponeSendTimeMs are of interest [16:21:18] jgage: are you in charge of monitoring? [16:21:28] jgage: no, this is for finding the cause of latency peaks, which might cause rdkafka queueing issues. 
It's just a theory at this point but all we got [16:22:05] matanya: well i'm still a newbie on the team but i am trying to coordinate work on monitoring [16:22:14] snaps, ok [16:22:40] Snaps: [16:22:59] mean Produce-ResponseSendTimeMs is twice as high on analytics102 [16:23:11] almost. [16:23:11] 0.38 vs 0.63 [16:23:19] stddev is much higher on an22 though as well [16:23:27] jgage: please let me know when you have some time, i'd like to share some ideas and see what is needed in thsi area [16:23:27] 3.58 vs 6.18 [16:23:46] ottomata: interesting! [16:24:03] matanya, great ok! you can also email me: jgerard@wikimedia.org [16:24:17] I recently asked myself whether we still save new uploads on nas-1 [16:24:24] (03PS1) 10coren: Labs: NFS mounts for labs instances [operations/puppet] - 10https://gerrit.wikimedia.org/r/113976 [16:24:30] and uhh [16:24:31] mails get lost in the queue :) [16:24:40] true [16:24:44] opposite for Produce-TotalTimeMs [16:25:14] analytics1021: mean: 15.16, StdDev: 55.86 [16:25:14] analytics1022: mean: 9.63, StdDev: 25.67 [16:25:22] ottomata: hey [16:25:27] apergos: --^ [16:25:28] heyo [16:25:34] (the image upload bit) [16:25:46] ottomata: wanna have a look to see why reqstats got broken and started showing a very erratic behavior approx 2 weeks ago? [16:25:49] https://graphite.wikimedia.org/render/?title=Wiki%20Pageviews/sec&from=-336hours&until=-288hours&width=1024&height=500&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=color%28cactiStyle%28alias%28scale%28reqstats.pageviews,%220.01666%22%29,%20%22pageviews/sec%22%29%29,%22blue%22%29 [16:25:54] shows the pattern [16:26:24] ? [16:26:35] oh [16:26:38] oh taht is from sqstat? [16:26:39] right? 
[16:26:47] paravoid, if that is so, I think I know why [16:26:48] it is [16:27:02] (03PS2) 10coren: Labs: NFS mounts for labs instances [operations/puppet] - 10https://gerrit.wikimedia.org/r/113976 [16:27:05] matanya and I moved sqstat from emery to erbium [16:27:13] previously it was running on the direct udp2log stream to emery [16:27:14] hoo, no, nas-1 isn't used for file backend storage any more [16:27:17] now it is running on the multicast relay stream [16:27:26] yeah, i thought you are going to say that [16:28:30] although, the sporadic drops is totally weird [16:29:27] (03CR) 10coren: [C: 032] "NFS FTW!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113976 (owner: 10coren) [16:29:38] apergos: Ok... we recently had a bug about an (archived) image getting lost (DB inconsistencies)... any chance I could like search for that image? (more of a theoretical question, I don't really care about that specific bug much) [16:29:50] by the way, how comes that reqerror often has more errors than reqsum has requests? like, 300 k/s errors vs. 100 k/s requests [16:30:40] errors are not /s, they're /min [16:30:56] (or /hour in the longer duration graphs) [16:31:05] 300k/s errors would be really really bad :) [16:31:17] ottomata: ok, are you going to investigate and fix then? :) [16:31:23] paravoid, we merged that on 2014-01-30 17:48:42 [16:31:49] you would have to ask one of us to look arounf for it, hoo; we would check a few places in swift, posibly check public mirrors, etc. 
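Annotation on the units question above: the error graphs count per minute while the request graphs count per second, and the Graphite URL quoted earlier scales `reqstats.pageviews` by 0.01666 (roughly 1/60) for the same reason. A small sketch of the conversion, using the 300k figure from the question purely as an illustration. The function name is hypothetical.

```python
def per_minute_to_per_second(count_per_minute: float) -> float:
    """Convert a per-minute counter value to a per-second rate."""
    return count_per_minute / 60.0


# Reading the quoted "300k" as errors per minute gives a per-second rate
# that can be compared directly with a requests-per-second figure.
errors_per_second = per_minute_to_per_second(300_000)

if __name__ == "__main__":
    print(errors_per_second)
```

Once both series are in the same unit, the error rate falls well below the 100k requests per second mentioned in the question.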
[16:32:02] paravoid, if this is the cause, which it probably is [16:32:05] i'm not sure what to do [16:32:21] ottomata: we can move it to oxygen to test it [16:32:31] we moved stuff off of emery because it is in tampa [16:32:49] matanya, i doubt that will help, since oxygen uses the same lossy multicast stream [16:32:58] http://ganglia.wikimedia.org/latest/graph_all_periods.php?hreg[]=emery%7Coxygen%7Cerbium&mreg[]=%5Epacket_loss_average%24&z=large&gtype=line&title=%5Epacket_loss_average%24&aggregate=1&r=hour [16:33:10] it can't be the cause, it can only be the trigger [16:33:18] maybe configure direct stream? [16:33:33] the cause is probably packet loss, or resource exhaustion or something like that [16:34:41] paravoid, we are trying to deprecate udp2log soon, hence kafkatee stuff [16:35:43] the hope is to turn off varnishncsa sooner rather than later [16:36:13] hm, i mean [16:36:22] we could move this back to emery to test it, emery is still online [16:36:30] please don't [16:36:32] apergos: Ok, so the swift storage isn't like mounted anywhere (using cloudfuse or so)... and only accessible for ops?! [16:36:33] and i was hoping to keep emery online until we get kafkatee replacing it [16:37:32] direct access to the back end happens from a host on the production cluster, someone with access to those credentials can delete as well as look at things so yes it's limited to roots [16:37:54] there's not a way to ls a flat directory or anything like that [16:38:01] (03PS1) 10coren: Labs: remove unneeded (and broken) autofs stop [operations/puppet] - 10https://gerrit.wikimedia.org/r/113978 [16:38:35] I wonder, is it possible for a volunteer to have access to ops-l ? [16:38:52] matanya, why not move sqstat back? [16:39:23] matanya: With an NDA in theory, yes [16:39:29] ottomata: i spent about 4 days to find out who owns what and where, now go back ? :/ [16:39:39] (03CR) 10coren: [C: 032] "Trivial fix." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/113978 (owner: 10coren) [16:39:53] hoo: in theory ? [16:40:39] just sqstat which is ops only [16:40:44] matanya: Well, I subscribed to it some time back and nobody ever approved that (did that cause some page on wikitech said everyone with shell should be subscribed) [16:41:24] ottomata: if you have to, do it, but i really prefer not to. [16:41:38] apergos: I see... thanks for your help ;) [16:41:38] better find the root cause and fix it [16:41:39] paravoid, looks good to me: curl -I -H 'Accept: text/vnd.wap.wml' en.m.wikipedia.org/wiki/Main_Page [16:42:10] well, lets move it back and at least see if that fixes it [16:42:18] will put some heavy comments along with it [16:42:24] ottomata: maybe try to netcat on both sides and see if you have pack losses [16:42:31] ottomata, are you the author of sqstat? [16:42:36] no [16:42:39] no idea who [16:42:42] maybe asher was? [16:42:43] dunno [16:42:46] yeah the file doesn't say [16:42:50] hoo, yw, just if there's a bug like that don't be shy about asking for help on the bz report and it will likely get forwarded to rt for one of us to poke [16:42:50] it was before my time [16:42:55] no matter, just curious [16:43:52] apergos: Basically it's this https://bugzilla.wikimedia.org/show_bug.cgi?id=54776 [16:44:54] ah, not just one or two small images :-D [16:45:00] (03PS1) 10Ottomata: Moving sqstat back to emery [operations/puppet] - 10https://gerrit.wikimedia.org/r/113980 [16:45:14] ;_; [16:45:15] (03PS2) 10BryanDavis: Send Vary header on http to http redirect [operations/puppet] - 10https://gerrit.wikimedia.org/r/111917 [16:45:22] (03CR) 10Ottomata: [C: 032 V: 032] Moving sqstat back to emery [operations/puppet] - 10https://gerrit.wikimedia.org/r/113980 (owner: 10Ottomata) [16:46:09] Yeah... 
:/ Luckily the sha1 sums in the oldimage table are correct, so that we don't have unrecoverable data loss [16:46:22] ottomata: lets check back tomorrow and see what happened [16:48:57] sounds like a bit of scripting work is needed.. to search the various places where the image might have been (temp locations or whatever), I'm not familiar with upload wizard anymore so I'm not sure what that would look like [16:59:35] (03PS1) 10coren: Move labs-specific stuff out of base [operations/puppet] - 10https://gerrit.wikimedia.org/r/113982 [17:00:49] apergos: Can you take a look at ^^? It should be fairly straightforward but since this touches base I need a review. [17:03:28] (03PS1) 10Ottomata: Adding more Kafka network request stats to jmxtrans monitoring [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/113983 [17:03:46] (03CR) 10Ottomata: [C: 032 V: 032] Adding more Kafka network request stats to jmxtrans monitoring [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/113983 (owner: 10Ottomata) [17:04:11] (03PS1) 10Ottomata: Updating kafka module with more stats from jmx [operations/puppet] - 10https://gerrit.wikimedia.org/r/113984 [17:04:20] (03CR) 10Ottomata: [C: 032 V: 032] Updating kafka module with more stats from jmx [operations/puppet] - 10https://gerrit.wikimedia.org/r/113984 (owner: 10Ottomata) [17:05:07] (03CR) 10Daniel Kinzler: Redirect all *.wikidata.org subdomains to www.wikidata.org (031 comment) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/113972 (owner: 10Thiemo Mättig (WMDE)) [17:14:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [17:37:44] there are a couple hung submit_check_result processes from the snmp trap translator going back to Jan 2 on neon, anybody familiar with this issue? shall i just kill them? 
(root@neon:~# ps auxw | grep submit_check_result) [17:38:01] jgage: ah actually sometimes that gets caught [17:38:15] usually what actually solves it, stupidly, is doing a strace -p on the stuck process [17:38:21] either on the very parent or the last child [17:38:25] groan [17:38:27] ok, thanks [17:38:27] yeah [17:39:03] yep, that worked [17:39:11] wtf [17:39:12] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Last successful Puppet run was Wed 12 Feb 2014 10:05:10 PM UTC [17:39:19] that's me [17:40:12] PROBLEM - Puppet freshness on cp4010 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:57:14 PM UTC [17:40:12] PROBLEM - Puppet freshness on vanadium is CRITICAL: Last successful Puppet run was Thu 02 Jan 2014 02:40:49 PM UTC [17:40:17] also those [17:40:28] ok they're all cleared [17:40:43] thanks lesliecarr, i'll make a note of this for future reference [17:41:02] RECOVERY - Puppet freshness on vanadium is OK: puppet ran at Tue Feb 18 17:40:54 UTC 2014 [17:41:21] yw [17:44:14] Can someone take a peek at https://gerrit.wikimedia.org/r/113982 please? It's straightforward but I don't want to self +2 since it touches base. [17:47:23] ottomata|lunch: it's completely broken now [17:47:42] ottomata|lunch: no data at all for approx. the past hour [17:49:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx [17:57:12] RECOVERY - Puppet freshness on cp4010 is OK: puppet ran at Tue Feb 18 17:57:09 UTC 2014 [18:05:22] RECOVERY - Puppet freshness on analytics1004 is OK: puppet ran at Tue Feb 18 18:05:14 UTC 2014 [18:07:02] * Coren lives dangerously and +2s his changeset since it's blocking him. [18:07:17] Coren: single quotes unless interpolating! [18:07:28] ... just in time! 
:-) [18:07:49] boom goes the site [18:08:00] :P [18:10:04] (03PS2) 10coren: Move labs-specific stuff out of base [operations/puppet] - 10https://gerrit.wikimedia.org/r/113982 [18:10:06] (03CR) 10Thiemo Mättig (WMDE): Redirect all *.wikidata.org subdomains to www.wikidata.org (031 comment) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/113972 (owner: 10Thiemo Mättig (WMDE)) [18:11:07] ori: Anything to add on substance? :-) [18:14:34] (03PS3) 10coren: Move labs-specific stuff out of base [operations/puppet] - 10https://gerrit.wikimedia.org/r/113982 [18:19:31] (03CR) 10coren: [C: 032] "Safe enough in practice. Blame me if all hell breaks loose." [operations/puppet] - 10https://gerrit.wikimedia.org/r/113982 (owner: 10coren) [18:22:41] paravoid: looking [18:22:43] sorry [18:25:55] (03PS1) 10Ottomata: Adding missing newline at end of filters.emery.erb to fix sqstat [operations/puppet] - 10https://gerrit.wikimedia.org/r/113999 [18:26:09] (03CR) 10Ottomata: [C: 032 V: 032] Adding missing newline at end of filters.emery.erb to fix sqstat [operations/puppet] - 10https://gerrit.wikimedia.org/r/113999 (owner: 10Ottomata) [18:26:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [18:26:59] ah coren, sorry, I mentally checked out a couple minutes before your ping... [18:27:08] end of day brain shutdown [18:27:22] It's okay; I like to live dangerously. :-) [18:32:15] ottomata: meeting? 
[18:33:16] (03CR) 10Aaron Schulz: Turn on checkDelay for a cirrus job (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113954 (owner: 10Manybubbles) [18:33:42] AH MANYBUBBLES [18:33:58] Scrum of Scrum is right now it was supposed to have been moved [18:34:06] but it wasn't this week due to the holiday yesterday [18:34:07] (03CR) 10Aaron Schulz: [C: 031] Turn on checkDelay for a cirrus job (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113954 (owner: 10Manybubbles) [18:34:58] (03PS1) 10coren: Labs: fix inversion in ldap::role::client [operations/puppet] - 10https://gerrit.wikimedia.org/r/114001 [18:35:21] sorry about that manybubbles, i think the meeting go moved around strangely this week [18:35:40] it was supposed to be moved but was decided to still happen at this time this week [18:44:28] (03CR) 10coren: [C: 032] "trivial fix" [operations/puppet] - 10https://gerrit.wikimedia.org/r/114001 (owner: 10coren) [18:44:52] PROBLEM - Disk space on virt6 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 42268 MB (3% inode=99%): [18:46:37] andrewbogott_afk, Coren: ^^^ [18:46:43] warning turned critical [18:46:49] Bleh. [18:46:59] We are out of spacez! [18:48:34] Just to make things more fun, I'm not sure I even got place to shuffle instances to. [18:58:37] paravoid: Oh fun times. I could move an instance or two... to other boxen that are just as full and put them in the same situation. [19:25:56] (03CR) 10GWicke: [C: 031] alert earlier for disk space on parsoid servers, rt 6851 (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/113944 (owner: 10ArielGlenn) [19:28:01] !log Jenkins jobs for npm are broken because the new integration-slave02 and integration-slave03 instances have SSL issues (different npm version and no certificates). And integration-slave01 (which was working) was deleted. 
[19:28:09] Logged the message, Master [19:39:11] !log reedy updated /a/common to {{Gerrit|I641a25ef9}}: Symlink in the extension-list files [19:39:15] (03PS1) 10Reedy: All non wikipedias to 1.23wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114007 [19:39:19] Logged the message, Master [19:39:43] (03PS1) 10EBernhardson: Enable Flow on 'm:Talk:Flow/Developer test page' [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114009 [19:40:20] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: All non wikipedias to 1.23wmf14 [19:40:29] Logged the message, Master [19:41:05] (03CR) 10Reedy: [C: 032] All non wikipedias to 1.23wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114007 (owner: 10Reedy) [19:41:12] (03Merged) 10jenkins-bot: All non wikipedias to 1.23wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114007 (owner: 10Reedy) [19:42:41] hmm I might have to change that topic [19:43:26] I'll have it back soon enough... [19:45:06] !log repooling cp3022 into bits esams. varnishkafka has emptied its outbuf since last night [19:45:13] Logged the message, Master [19:45:50] how long did it take? [19:51:42] paravoid: Absolutely fuckall I can do about it except hurry up the migration so we can start moving instances off. [19:52:35] then i suggest you do that ;) [19:53:10] . o O ( management can be so easy sometimes... ) [19:53:34] mark: We got an instance running w/ storage and all now. 
[19:53:42] excellent [19:53:46] i'm glad we're back on track [19:55:39] !log aaron synchronized php-1.23wmf14/includes/filebackend/SwiftFileBackend.php '58fa613a75c2730cbf8f60e9e3f283a3f043f00b' [19:55:48] Logged the message, Master [20:15:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [20:21:11] !log upgraded librdkafka1 to 0.8.3 on cp1056, restarting varnishkafka there [20:21:19] Logged the message, Master [20:23:41] (03PS1) 10BBlack: update testsuite for 3.0.5 [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/114063 [20:23:43] (03PS1) 10BBlack: port nlist bugfix from gdnsd d97bacc8 [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/114064 [20:23:59] (03CR) 10BBlack: [C: 032 V: 032] update testsuite for 3.0.5 [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/114063 (owner: 10BBlack) [20:24:14] (03CR) 10BBlack: [C: 032 V: 032] port nlist bugfix from gdnsd d97bacc8 [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/114064 (owner: 10BBlack) [20:34:10] (03CR) 10Chad: [C: 032] Remove optional inclusion of Elastica. Not like it would do any good if it was missing [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113294 (owner: 10Chad) [20:34:18] (03Merged) 10jenkins-bot: Remove optional inclusion of Elastica. 
Not like it would do any good if it was missing [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113294 (owner: 10Chad) [20:35:02] !log demon synchronized wmf-config/CirrusSearch-common.php 'Elastica is always included now' [20:35:10] Logged the message, Master [20:38:56] (03PS1) 10BBlack: varnish (3.0.5plus~wmftest-wm4) unstable; urgency=low [operations/debs/varnish] - 10https://gerrit.wikimedia.org/r/114065 [20:39:28] (03Abandoned) 10BBlack: varnish (3.0.5plus~wmftest-wm4) unstable; urgency=low [operations/debs/varnish] - 10https://gerrit.wikimedia.org/r/114065 (owner: 10BBlack) [20:40:38] (03PS1) 10BBlack: varnish (3.0.5plus~wmftest-wm4) unstable; urgency=low [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/114067 [20:40:53] (03CR) 10BBlack: [C: 032 V: 032] varnish (3.0.5plus~wmftest-wm4) unstable; urgency=low [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/114067 (owner: 10BBlack) [20:43:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [21:06:03] (03PS1) 10coren: Labs: tweak manage-nfs-volumes for eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/114069 [21:07:48] (03CR) 10coren: [C: 032] "Just tweaks." [operations/puppet] - 10https://gerrit.wikimedia.org/r/114069 (owner: 10coren) [21:08:57] (03PS1) 10EBernhardson: Enable Flow on two mediawikiwiki pages for design [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114071 [21:25:14] apergos: hey, shall we shutdown locke or was there a blocker left [21:25:36] i mean actually typing shutdown, the rest looks all done, or doesit [21:25:42] I dunno [21:25:56] I mean I +1 taking it out of puppet as 'everything's gone off there' right? [21:26:03] oh no, you sent messages to peole [21:26:05] people [21:26:07] ohh [21:26:15] and likely not all have replied [21:26:21] uh this was home dirs? 
[21:26:24] but see I took copies [21:26:39] I thought locke was already in the decomm queue now [21:26:40] yea, the ones you copied to iron though [21:26:43] yes [21:26:46] per RT 6168 [21:26:58] well, it is up, that's why [21:27:01] I took copies so we didn't have to wait [21:27:08] i also just said "thanks,so once it's actually down ... " :P [21:27:15] cool,ok, well [21:27:29] I got replies from at least three people but that doesn't matter for this [21:27:41] just follow the decomm steps [21:27:56] will do.thanks [21:39:28] <^d> Someone got a minute or two to look at a super quick puppet change? [21:41:51] ^d: removing your old key? [21:42:55] <^d> mutante: Yep [21:48:55] (03CR) 10Dzahn: [C: 032] "confirmed with salt i don't see this key anywhere (as it has been set to absent before)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113326 (owner: 10Chad) [21:50:01] mutante: hello there :) [21:50:11] <^d> mutante: Thanks. [21:50:50] mutante: regarding bugzilla, do we still need the .pem file in the files directory? [21:51:12] i guess not as it is now in the module, but would like to confirm with you [21:51:45] (03PS3) 10Ori.livneh: Refactor GeoIP lookup code; add tests [operations/puppet] - 10https://gerrit.wikimedia.org/r/113900 [21:51:58] ^d: yw, i don't expect either that we want to keep everything in 'absent' until the end of time [21:52:20] matanya: is that a change you already suggested? [21:52:27] that andre_ mentioned [21:52:37] have it localy not pushed yet [21:52:37] then i think no..but let me look [21:53:00] like other 30 changes :) [21:53:35] (03PS2) 10Ori.livneh: Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/113935 [21:54:43] matanya: we still need it [21:54:55] if you mean puppet::/files/ssl/ [21:55:07] i did not move the cert into the module if you exected that [21:55:09] yes, but not there, should i be in the bugzilla module? 
[21:55:15] and i'm not sure that i should either [21:55:29] we should find that out as a general question [21:55:33] and then either move all or none [21:55:43] so far they are all neatly in files/ssl/ next to each other [21:55:49] i see [21:55:58] and that has also some advantage [21:56:11] but ..it's not like i'm really sure about that question [21:56:13] much like loads of stuff in the apache dir in files [22:03:17] about to start deploying some small updates in our deploy window, double checking no-one else is doing deploys [22:05:02] mutante: how about reviewing https://gerrit.wikimedia.org/r/#/c/113775/ ? [22:06:22] (03CR) 10Dzahn: [C: 032] "oh yea, totally, thx" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113775 (owner: 10Matanya) [22:06:38] thanks [22:08:14] mutante: there are more if you feel like :) https://gerrit.wikimedia.org/r/#/q/project:operations/puppet+owner:%22Matanya+%253Cmatanya%2540foss.co.il%253E%22+status:open,n,z [22:10:21] !log ebernhardson synchronized php-1.23wmf13/extensions/Flow/includes/Data/RevisionStorage.php [22:10:29] Logged the message, Master [22:11:24] bblack: you about ? 
[22:17:56] (03PS5) 10Matanya: remove shell account for lwelling and access to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/113627 [22:17:58] !log ebernhardson synchronized php-1.23wmf14/extensions/Flow [22:18:07] Logged the message, Master [22:18:45] (03CR) 10EBernhardson: [C: 032] Add qa_automation group and grant it Flow rights [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113311 (owner: 10Spage) [22:18:53] (03Merged) 10jenkins-bot: Add qa_automation group and grant it Flow rights [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113311 (owner: 10Spage) [22:18:56] (03CR) 10EBernhardson: [C: 032] Enable Flow on two mediawikiwiki pages for design [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114071 (owner: 10EBernhardson) [22:19:03] (03CR) 10EBernhardson: [C: 032] Enable Flow on 'm:Talk:Flow/Developer test page' [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114009 (owner: 10EBernhardson) [22:19:11] (03Merged) 10jenkins-bot: Enable Flow on 'm:Talk:Flow/Developer test page' [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114009 (owner: 10EBernhardson) [22:19:13] (03Merged) 10jenkins-bot: Enable Flow on two mediawikiwiki pages for design [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114071 (owner: 10EBernhardson) [22:20:08] (03CR) 10Dzahn: [C: 032] "one of the accounts under "definitely" not active anymore on the tickets per Eloquence" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113627 (owner: 10Matanya) [22:20:28] !log ebernhardson synchronized wmf-config/InitialiseSettings.php [22:20:35] Logged the message, Master [22:21:00] (03PS3) 10Matanya: remove shell access and key for mgrover [operations/puppet] - 10https://gerrit.wikimedia.org/r/113636 [22:23:05] !log ebernhardson synchronized wmf-config/InitialiseSettings.php [22:23:13] Logged the message, Master [22:23:34] (03CR) 10Dzahn: [C: 04-1] "actually, matanya, can you set those to 
"access revoked" but leave it on the node and just ensure the key absent? thanks" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113636 (owner: 10Matanya) [22:25:21] !log ebernhardson synchronized wmf-config/InitialiseSettings.php [22:25:30] Logged the message, Master [22:27:39] (03CR) 10Dzahn: [C: 04-1] "i'm just voting it down per the explanation from Alex why it can't be merged yet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110880 (owner: 10Matanya) [22:27:45] (03PS3) 10Matanya: nrpe: remove hard coded disk checks [operations/puppet] - 10https://gerrit.wikimedia.org/r/110880 [22:27:50] (03CR) 10Dzahn: [C: 04-1] nrpe: remove hard coded disk checks [operations/puppet] - 10https://gerrit.wikimedia.org/r/110880 (owner: 10Matanya) [22:33:59] (03PS1) 10EBernhardson: Enable Flow on one page of meta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114088 [22:34:20] (03CR) 10EBernhardson: [C: 032] Enable Flow on one page of meta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114088 (owner: 10EBernhardson) [22:34:27] (03Merged) 10jenkins-bot: Enable Flow on one page of meta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114088 (owner: 10EBernhardson) [22:35:23] !log ebernhardson synchronized wmf-config/InitialiseSettings.php [22:35:31] Logged the message, Master [22:40:24] tfinc: yes [22:48:59] (03PS1) 10Matthias Mullie: Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114091 [22:50:47] (03CR) 10jenkins-bot: [V: 04-1] Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114091 (owner: 10Matthias Mullie) [22:51:31] bblack: great. 
yurikR had some questions for you [22:51:42] tfinc: I see that, but I can't reach him now [22:52:00] tfinc: I've got two video chat requests and 3 pings about something being urgent, but zero actual data on what the problem is :P [22:52:27] (03PS1) 10EBernhardson: Disable Flow on meta pending parsoid config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114092 [22:52:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [22:52:48] (03PS3) 10Matthias Mullie: Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114091 [22:52:50] bblack: yeah, the zero team has a launch issue that they want to ask you about. thanks for responding. now lets see if can align the moon and the sun to have both of you active at the same time [22:52:54] (03CR) 10jenkins-bot: [V: 04-1] Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114091 (owner: 10Matthias Mullie) [22:53:15] (03Abandoned) 10EBernhardson: Disable Flow on meta pending parsoid config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114092 (owner: 10EBernhardson) [22:54:54] (03PS1) 10Matthias Mullie: Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114095 [22:55:46] (03Abandoned) 10Matthias Mullie: Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114091 (owner: 10Matthias Mullie) [22:57:27] (03CR) 10EBernhardson: [C: 032] Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114095 (owner: 10Matthias Mullie) [22:57:34] (03Merged) 10jenkins-bot: Seperate Flow Parsoid config from VE [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114095 (owner: 10Matthias Mullie) [22:58:04] (03CR) 10Dzahn: [C: 032] alert earlier for disk space on parsoid servers, rt 6851 [operations/puppet] - 
10https://gerrit.wikimedia.org/r/113944 (owner: 10ArielGlenn) [22:59:25] !log ebernhardson synchronized wmf-config/CommonSettings.php [22:59:34] Logged the message, Master [22:59:44] gwicke: saw this before? info: /Service[parsoid]: Provider upstart does not support features enableable; not managing attribute enable [23:00:12] yes, it says enableable [23:00:37] mutante, afaik no [23:00:44] (03CR) 10Dzahn: "info: /Stage[main]/Role::Parsoid::Production/Nrpe::Monitor_service[parsoid_disk_space]/Nrpe::Check[check_parsoid_disk_space]/File[/etc/nag" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113944 (owner: 10ArielGlenn) [23:01:04] gwicke: i'm merging the disk space monitoring change [23:01:13] gwicke: that's why i saw it on puppet run, just "info" though [23:01:18] mutante, I switched back to the init script only for the deb as upstart is on the way out in the longer term & there were some more precedence issues [23:01:29] but i wanted to make sure, because of the issues with that [23:01:40] gwicke: ah, ok [23:01:45] mutante, the fix for the restart issues is not yet deployed [23:01:52] our next window is tomorrow [23:02:21] gwicke: gotcha, this just touches nagios-nrpe-service not the parsoid service itself, that is just something i saw on a normal puppet run [23:02:24] * gwicke is happy about debian going with systemd & ubuntu following [23:04:34] mutante, I don't see any enableable in the code [23:04:51] the only enable in parsoid.pp is for the service in general [23:05:21] gwicke: that is probably a puppet bug about what it outputs.. [23:05:38] yeah [23:05:48] it's not a puppet bug [23:05:48] root@wtp1001:~# grep enableable /var/log/syslog [23:05:54] but it really says that [23:05:58] upstart jobs specify their start conditions in the job definition file [23:06:06] it's not something that can be managed externally to the definition itself [23:06:14] oh, it's called "enableable" .. thats a thing? 
[23:06:14] which is why the upstart provider for the Service type doesn't support enableable [23:06:15] ok [23:06:32] so we should just remove that parameter from the service resource [23:06:46] oh, heh, even in Urban Dictionary.. blames not being native [23:07:10] ori: thx [23:07:40] (03PS1) 10Ori.livneh: Drop enabled => true param on Parsoid upstart Service resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/114097 [23:07:41] ^ mutante [23:07:59] Tool Labs routing looks utterly broken [23:08:03] Coren: ^ [23:08:39] mutante: (it'll be a no-op) [23:08:55] puppet error messages could use some improvement [23:09:18] hear! [23:09:23] (03PS2) 10Ori.livneh: Drop enabled => true param on Parsoid upstart Service resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/114097 [23:10:44] (03CR) 10Dzahn: [C: 032] "yep, i reported this error seeing it on puppet runs on parsoid and since i know "enableable" is a thing this makes sense :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/114097 (owner: 10Ori.livneh) [23:15:43] greg-g: around? [23:15:52] yessir [23:16:00] any hope for a fire depl in a bit? [23:16:09] LIGHTNING [23:16:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [23:16:12] yurikR: what's up? [23:16:22] PM [23:20:39] (03PS1) 10Matthias Mullie: De-duplicate Parsoid config from VE & Flow [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114099 [23:22:12] (03PS1) 10Jforrester: Factor out $wgParsoidURL [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114100 [23:22:29] gwicke: additional disk space checks in icinga are there now in status "pending" [23:22:58] mutante, on all hosts? 
[23:23:04] gwicke: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=parsoid+disk+space [23:23:07] yes [23:23:17] wtp1001-1024 [23:23:35] now it says ok for me [23:24:28] not all parsoid hosts have a disk space entry though [23:24:33] gwicke: yea, it literally just came up, that was a good thing [23:24:58] yet, I'm guessing [23:25:33] everything that has the role/parsoid.pp [23:25:42] the others will just not be done yet in that case [23:25:49] *nod* [23:25:53] but we edited the role.. [23:26:24] how long can it take for puppet to apply such changes? [23:26:53] (03Abandoned) 10Matthias Mullie: De-duplicate Parsoid config from VE & Flow [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114099 (owner: 10Matthias Mullie) [23:27:33] gwicke: something around 30 min I'd say [23:27:38] because it's a cron [23:27:50] and in those cases, it also needs a run on neon, but i already did [23:28:14] k, thx [23:33:16] (03PS2) 10Jforrester: Factor out Parsoid config from VisualEditor config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114100 [23:35:24] gwicke: they are all there now [23:36:36] mutante: cool, thanks! [23:37:04] lets me sleep better after deploying changes to our logging infrastructure [23:37:18] cool, yw! [23:38:29] !log locke - disable puppet, puppetstoredconfigclean on master, revoke puppet cert and salt key.. [23:38:38] Logged the message, Master [23:42:25] !log shutting down locke - killing 757 days of uptime and one more Tampa classic host [23:42:33] Logged the message, Master [23:42:52] ;_; [23:42:57] Connection to locke closed. 
[23:42:59] bye [23:43:56] (03PS1) 10BBlack: improve netmapper_update.sh error-handling [operations/puppet] - 10https://gerrit.wikimedia.org/r/114105 [23:43:58] (03PS1) 10BBlack: disable netmapper update cron temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/114106 [23:44:06] PROBLEM - Host locke is DOWN: PING CRITICAL - Packet loss = 100% [23:44:12] (03CR) 10BBlack: [C: 032 V: 032] improve netmapper_update.sh error-handling [operations/puppet] - 10https://gerrit.wikimedia.org/r/114105 (owner: 10BBlack) [23:44:24] (03CR) 10BBlack: [C: 032 V: 032] disable netmapper update cron temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/114106 (owner: 10BBlack) [23:46:21] icinga-wm: and you didnt let me ACK that anymore before the host is gone from your config, alright:)