[00:27:10] (CR) Legoktm: "g1" [operations/puppet] - https://gerrit.wikimedia.org/r/122621 (owner: Reedy)
[01:17:40] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sun 15 Jun 2014 22:17:06 UTC
[01:47:10] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jun 16 01:46:59 UTC 2014
[02:09:10] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:15:41] !log LocalisationUpdate completed (1.24wmf8) at 2014-06-16 02:14:38+00:00
[02:15:49] Logged the message, Master
[02:20:40] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[02:27:08] !log LocalisationUpdate completed (1.24wmf9) at 2014-06-16 02:26:05+00:00
[02:27:14] Logged the message, Master
[02:31:00] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.004 second response time
[03:00:50] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 16 02:59:43 UTC 2014 (duration 59m 42s)
[03:00:55] Logged the message, Master
[03:46:38] (PS1) Withoutaname: New namespace "Carte" for rowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139766 (https://bugzilla.wikimedia.org/66530)
[04:01:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:03:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:05:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:07:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:09:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:11:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun
2014 03:58:39 UTC
[04:13:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:15:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:17:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:19:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:21:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:23:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:25:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:26:54] 1051?
[04:27:17] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 03:58:39 UTC
[04:28:57] RECOVERY - Puppet freshness on mw1051 is OK: puppet ran at Mon Jun 16 04:28:47 UTC 2014
[05:21:18] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[06:14:56] (PS1) QChris: Fix log target for mobile apps data job [operations/puppet] - https://gerrit.wikimedia.org/r/139775 (https://bugzilla.wikimedia.org/66600)
[06:17:13] springle: It seems puppet on stat1003 cannot run.
[06:17:33] I cannot see the error message, but might https://gerrit.wikimedia.org/r/139775 fix it?
[06:18:00] (It seems the time since puppet fails matches the time https://gerrit.wikimedia.org/r/#/c/138884 got merged)
[06:20:22] (CR) Springle: [C: 2] Fix log target for mobile apps data job [operations/puppet] - https://gerrit.wikimedia.org/r/139775 (https://bugzilla.wikimedia.org/66600) (owner: QChris)
[06:21:59] Thanks!
[06:22:08] RECOVERY - Puppet freshness on stat1003 is OK: puppet ran at Mon Jun 16 06:22:04 UTC 2014
[06:22:11] qchris: thank you :)
[06:28:35] (PS1) Springle: Add Leila's new ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/139779
[06:38:02] (CR) Matanya: apt/pin.pp - retab and mini quoting fix (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139458 (owner: Dzahn)
[06:38:24] (CR) Matanya: misc/management.pp - retab and lint fixes (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139460 (owner: Dzahn)
[06:40:36] (CR) Matanya: noc.pp - various lint fixes (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139462 (owner: Dzahn)
[06:41:45] (CR) Matanya: rancid.pp - lint and tidy, quoting, arrows, retab (5 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139464 (owner: Dzahn)
[06:42:35] (CR) Matanya: "Please abandon." [operations/puppet] - https://gerrit.wikimedia.org/r/126941 (owner: Dzahn)
[06:46:34] (PS1) Matanya: ganglia_new: retab instance.pp [operations/puppet] - https://gerrit.wikimedia.org/r/139781
[06:52:04] (PS1) Matanya: labsdebrepo: lint [operations/puppet] - https://gerrit.wikimedia.org/r/139782
[07:00:17] (PS1) QChris: Ensure log file for mobile data job exists [operations/puppet] - https://gerrit.wikimedia.org/r/139785 (https://bugzilla.wikimedia.org/66600)
[07:08:29] (PS1) Matanya: torrus: lint [operations/puppet] - https://gerrit.wikimedia.org/r/139786
[07:09:32] (CR) Matanya: Ensure log file for mobile data job exists (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139785 (https://bugzilla.wikimedia.org/66600) (owner: QChris)
[07:12:14] (CR) QChris: Ensure log file for mobile data job exists (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139785 (https://bugzilla.wikimedia.org/66600) (owner: QChris)
[07:19:06] (CR) Matanya: Ensure log file for
mobile data job exists (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139785 (https://bugzilla.wikimedia.org/66600) (owner: QChris)
[07:22:21] ola
[07:23:32] do someone think i can start an upload with gwtoolset ? about 40 files of 50 mo
[07:24:26] _joe_: ^ ?
[07:29:57] tounoki: AFAIK not much has changed since a similar upload attempt brought the thumbnail servers down
[07:30:20] there are a few patchsets pending, which might help, but nothing important has been merged yet
[07:35:24] tounoki: you still didn't do it? just use uploadwizard
[07:35:29] or uploader.py
[07:36:21] it's silly to waste volunteer time waiting for who-knows-what for such a minuscule upload
[07:37:17] the goal is to perform a process and his configuration. after i have about 8000 files to upload
[07:38:16] and we use gwtoolset to keep formatting metadata
[07:39:47] <_joe_> Nemo_bis: last time a slightly larger upload brought down the imagescalers
[07:39:57] <_joe_> still, I think now you can do that
[07:40:20] tounoki: treasure the log line above and go ahead :D
[07:40:41] <_joe_> the fact that in a month nothing has changed on this respect kinda frustrates me, but still...
[07:41:02] <_joe_> (I know it's not easy to fix, btw)
[08:08:29] (PS1) Nemo bis: Add Terry Chay to English Wikimedia Planet [operations/puppet] - https://gerrit.wikimedia.org/r/139791
[08:31:28] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[08:32:05] <_joe_> mh
[08:32:58] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[08:38:48] RECOVERY - Lucene on search1015 is OK: TCP OK - 3.004 second response time on port 8123
[08:39:47] <_joe_> oh well
[08:40:02] lol
[08:40:03] <_joe_> I love when problems heal themselves before I understood how to manage them
[08:45:18] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.997 second response time on port 8123
[08:48:28] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[08:48:41] <_joe_> uhm, again
[08:49:18] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.996 second response time on port 8123
[08:49:58] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[08:52:25] mind if I restart lucene search on that box?
[08:52:28] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[08:52:38] <_joe_> apergos: go on
[08:52:46] <_joe_> I'm not sure if that will change something
[08:53:48] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.000 second response time on port 8123
[08:54:18] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.001 second response time on port 8123
[08:54:18] <_joe_> 2014-06-16 08:49:31.680185 [search_pool4_8123 ProxyFetch] search1015.eqiad.wmnet (enabled/partially up/pooled): Fetch failed, 0.005 s
[08:54:26] <_joe_> pybalk disagreed
[08:54:30] I'm nt sure how much it will change but
[08:54:37] it wasn't logging before and now it is
[08:54:47] so I think it was in an unahppy(ier) state
[08:55:05] <_joe_> apergos: pybal still dislikes this state
[08:56:07] hi apergos
[08:56:17] morning
[08:57:16] <_joe_> 2014-06-16 08:57:04,727 [pool-1-thread-52] ERROR org.wikimedia.lsearch.search.SearchEngine - Internal error in SearchEngine trying to make WikiSearcher: fiwiki is being deployed or is not searched by this host
[08:57:19] <_joe_> java.lang.RuntimeException: fiwiki is being deployed or is not searched by this host
[08:57:32] <_joe_> apergos: I see *thousands* of these
[08:57:43] yep I've been watching them too
[08:58:24] <_joe_> apergos: so, it's not working
[08:58:25] my restart was an orderly stop and then start (checking that the process was actually gone) so nothing should have broken from that
[08:58:49] <_joe_> apergos: from my past lucene experience, that's not always the case
[08:59:14] <_joe_> so I know nothing about our search infrastructure, else we could just point pybal to other hosts
[09:00:06] maybe there's an rsync of indexes that didn't happen/complete or something
[09:00:29] https://wikitech.wikimedia.org/wiki/Search/Old here's all the notes about the old search
[09:00:30] <_joe_> I know nothing about this :(
[09:01:09] 1015 and 1016 are the two in this pool
[09:01:37] <_joe_> and they
both seem to be dead
[09:16:03] (PS3) Filippo Giunchedi: enable statsd reporting for swift proxy [operations/puppet] - https://gerrit.wikimedia.org/r/138574
[09:25:24] _joe_: search is now a user facing issue
[09:25:35] <_joe_> matanya: meaning what?
[09:26:12] on he.wiki help desk user reports : Internal error in SearchEngine: hewiki is being deployed or is not searched by this host
[09:26:34] <_joe_> matanya: can they try again now?
[09:26:37] matanya: yeah we know. Looking at it
[09:26:42] yes
[09:26:47] <_joe_> we restarted some indices and that may be the reason for that
[09:27:10] yes, i saw above. will ask, thanks
[09:29:45] !log restarted search1015 about 15 mns ago, it's now recovered afaict, restarted search1016, it's doing index setup now
[09:29:50] Logged the message, Master
[10:00:04] hoo: The time is nigh to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140616T1000)
[10:03:56] * hoo waits for jenkins
[10:06:21] (CR) Filippo Giunchedi: [C: 2 V: 2] enable statsd reporting for swift proxy [operations/puppet] - https://gerrit.wikimedia.org/r/138574 (owner: Filippo Giunchedi)
[10:12:57] !log hoo Synchronized php-1.24wmf9/extensions/Wikidata/: Update Wikidata to fix a suggester bug (duration: 00m 13s)
[10:13:02] Logged the message, Master
[10:14:01] works :)
[10:16:19] !log restarting swift-proxy-server on ms-fe3001 to test statsd metrics
[10:16:24] Logged the message, Master
[10:16:34] <_joe_> \o/
[10:16:57] I have this feeling tungsten won't be amused
[10:17:14] <_joe_> we should ask for moar servers for graphite
[10:17:22] <_joe_> if we're replacing ganglia
[10:18:30] !log hoo Synchronized php-1.24wmf8/extensions/Wikidata/: Update Wikidata to fix a suggester bug (duration: 00m 09s)
[10:18:35] Logged the message, Master
[10:19:25] yay, may favorite bug is back -.-
[10:20:38] PROBLEM - Apache HTTP on mw1070 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 50422 bytes in 0.037 second
response time
[10:21:38] RECOVERY - Apache HTTP on mw1070 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.072 second response time
[10:21:47] that's me
[10:23:24] !log Touched all 1.24wmf8 extension/wikidata files and ran sync-common after that on mw1070
[10:23:29] Logged the message, Master
[10:23:58] mh :/ I thought touching them all wouldn't lead top problems (opposed to deleting)
[10:47:39] !log restarting swift-proxy-server on ms-fe3002 to test statsd metrics
[10:47:44] Logged the message, Master
[11:06:49] btw _joe_ user report search is ok now
[11:10:45] <_joe_> matanya: I was pretty confident that was the case
[11:10:58] thank you for this :)
[11:13:48] <_joe_> thank apergos
[11:14:05] <_joe_> i just complained, basically
[11:18:47] thank you apergos :)
[11:18:56] yw :-D
[12:16:48] PROBLEM - MySQL Processlist on db1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:18:38] PROBLEM - MySQL Processlist on db1021 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 6 copy to table, 533 statistics
[12:20:38] RECOVERY - MySQL Processlist on db1021 is OK: OK 0 unauthenticated, 0 locked, 2 copy to table, 3 statistics
[12:45:01] Krinkle|detached: ping
[12:57:53] (PS1) Gilles: Reduce EventLogging sampling rate for MediaViewer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139817
[12:58:43] (PS2) Gilles: Reduce EventLogging sampling rate for MediaViewer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139817
[13:03:01] !log restarting swift-proxy-server on ms-fe1001 to test statsd metrics
[13:03:06] Logged the message, Master
[13:08:56] (PS2) Ottomata: Ensure log file for mobile data job exists [operations/puppet] - https://gerrit.wikimedia.org/r/139785 (https://bugzilla.wikimedia.org/66600) (owner: QChris)
[13:09:36] (CR) Ottomata: [C: 2 V: 2] "I think the linting of other things can be done in another commit. Thanks Christian!"
[operations/puppet] - https://gerrit.wikimedia.org/r/139785 (https://bugzilla.wikimedia.org/66600) (owner: QChris)
[13:12:59] <_joe_> !log removing chip-l mailing list as for bug #63877
[13:13:04] Logged the message, Master
[13:14:17] * twkozlowski hugs _joe_
[13:15:09] <_joe_> twkozlowski: I'm doing these small chores in the after-lunch interval when I'm sleepy
[13:15:50] Thank you! The lists have been waiting to be closed for some time now.
[13:17:35] <_joe_> twkozlowski: yes in some cases though the requestor is not the list admin
[13:17:41] <_joe_> I won't close those lists
[13:18:50] ottomata: a shame you for merging that :P
[13:19:23] *I
[13:19:26] <_joe_> matanya: I'm sure ottomata will fix that before the end of the day ;)
[13:19:31] :)
[13:20:06] my lint-fu is really hurt now :)
[13:20:16] <_joe_> trust other ops to fear shame and implicit fingerpointing. That usually works well.
[13:20:34] * _joe_ points his finger randomly around the chan
[13:21:22] _joe_: As in, which bugs precisely?
[13:21:29] RECOVERY - Kafka Broker Messages In on analytics1012 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2490.61489154
[13:22:25] <_joe_> twkozlowski: 66003
[13:22:59] (PS1) Odder: Add Help namespace to default search on all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139819 (https://bugzilla.wikimedia.org/66066)
[13:23:33] _joe_: Asaf Bartov is a WMF staffer, head of Grants department. Think you can trust him. :-)
[13:24:20] <_joe_> twkozlowski: well, it's not about trust really. I think list owners should decide on a list fate
[13:24:47] <_joe_> twkozlowski: I do trust you, and I assumed the tickets were written with the best intentions
[13:24:58] <_joe_> the moment you assigned them to me.
[13:25:12] well it depends what "close" means; if you delete the archives, the authors should have a say too
[13:25:17] <_joe_> I just don't want to step on anyone's toes
[13:25:28] <_joe_> Nemo_bis: absolutely
[13:25:39] <_joe_> (by default I don't)
[13:25:52] Well, https://bugzilla.wikimedia.org/show_bug.cgi?id=66003#c0 is pretty clear that people are too lazy to do anything.
[13:25:54] (CR) Nemo bis: [C: -1] "Needs to be added to all individual wikis below as well, the two are not merged automatically." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139819 (https://bugzilla.wikimedia.org/66066) (owner: Odder)
[13:26:42] Nemo_bis: That's exactly why the plus sign is there.
[13:29:24] (PS1) Ottomata: Lint fixes for misc/statistics.pp [operations/puppet] - https://gerrit.wikimedia.org/r/139820
[13:29:32] mutante, _joe_: public shaming apparently works: ^ :p
[13:29:41] sorry
[13:29:43] matanya: ^
[13:29:51] (not mu tan te )
[13:30:13] (CR) Odder: "The two arrays are merged automatically, that's what the plus sign is there for." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139819 (https://bugzilla.wikimedia.org/66066) (owner: Odder)
[13:30:39] ottomata: i'm preparing you a public punishment, give me a minute or too. thank you for fixing your sins :)
[13:30:41] two
[13:30:48] heheh
[13:37:10] <_joe_> !log closed wikimedia-de-by list
[13:37:15] Logged the message, Master
[13:37:48] hmm _joe_
[13:38:26] <_joe_> twkozlowski: I need to log it
[13:38:41] <_joe_> to the wall of shame
[13:38:47] !log _joe_ also working on recovering the list which was deleted by mistake
[13:38:52] Logged the message, Master
[13:39:42] <_joe_> I wanted to log on success :)
[13:49:07] (PS1) Matanya: udp2log: lint [operations/puppet] - https://gerrit.wikimedia.org/r/139826
[13:49:17] ottomata: here is your way to redemption ^
[13:52:32] (CR) Ottomata: [C: 2] "Looks good, thanks!
I will merge this shortly and make sure puppet behaves as we expect." [operations/puppet] - https://gerrit.wikimedia.org/r/139826 (owner: Matanya)
[13:53:23] ottomata: https://commons.wikimedia.org/wiki/File:Indulgence.jpg <-- for you :)
[13:54:14] hehe
[13:55:13] _joe_: you need to sign there, rep of the pope, you are the closest, geography wise
[13:56:48] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data exceeded the critical threshold [500.0]
[13:57:42] ottomata: do you want a review on the lint patch you pushed, or you will run puppet-lint yourself and redo the patch ?
[13:58:07] (PS1) QChris: Lint statistics.pp: Make 'ensure' first item [operations/puppet] - https://gerrit.wikimedia.org/r/139828
[13:58:19] haha
[13:58:38] qchris: https://gerrit.wikimedia.org/r/#/c/139820/
[13:59:02] * matanya is so going to use public shaming in the future
[13:59:02] matanya: a glance over it would be good
[13:59:03] just in case
[13:59:09] :-)
[14:00:51] (Abandoned) QChris: Lint statistics.pp: Make 'ensure' first item [operations/puppet] - https://gerrit.wikimedia.org/r/139828 (owner: QChris)
[14:10:18] hashar, regarding https://bugzilla.wikimedia.org/show_bug.cgi?id=66575 -- is that a user that also exists in prod?
[14:15:34] (CR) Matanya: [C: -1] "a few inline comments." (29 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/139820 (owner: Ottomata)
[14:17:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0]
[14:26:45] chasemp: will service users (e.g. l10nupdate) eventually be yamlized as well? Or are they a separate case?
[14:28:04] No plan for service users other than to consolidate how they are defined. Yamlizing them is a TBD.
[14:28:17] ok
[14:29:03] Its maybe the right thing? No strong opinion yet as long as whatever is consistent
[14:30:04] I don't think I much care… just, I may be about to add a new one and want to make sure I do it right.
[14:32:00] System => true...is about the only thing I know for certain
[14:33:15] paravoid: pong
[14:33:25] how's CVN's I/O?
[14:33:39] (PS1) Giuseppe Lavagetto: puppet: switch masters to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/139832
[14:38:07] (CR) Giuseppe Lavagetto: [C: -1] "Do not merge until a decision has been taken." [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[14:46:12] <_joe_> got to bail now, I'll be back in ~ 2 hours I guess.
[14:48:05] (PS1) Matanya: firewall: lint [operations/puppet] - https://gerrit.wikimedia.org/r/139836
[14:50:26] (CR) Andrew Bogott: "The cxserver user is now in ldap (10060) so this patch shouldn't create it." [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit)
[14:51:32] * anomie sees no SWAT deploys requested for this morning
[14:52:09] anomie: Looks like you will have an uneventful morning then
[14:52:34] JohnLewis: Lets me catch up on email from the weekend, anyway
[14:52:56] I guess it does
[14:56:05] (PS1) Alexandros Kosiaris: Define special jobs priorities [operations/puppet] - https://gerrit.wikimedia.org/r/139841
[14:58:12] anomie: you can always do a standard shell request if you want :) (or at least review) https://gerrit.wikimedia.org/r/#/c/134400/
[15:00:04] manybubbles, anomie: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140616T1500)
[15:00:19] anomie: nothing today, I think
[15:00:39] anomie: yeah, review nemo's thing:)
[15:01:41] bawolff whipped me and I made the commit message much more explanatory
[15:03:04] manybubbles: are you into the old search too ?
[15:03:25] matanya: in that I've read the source and understand it to some degree, yes
[15:03:37] but in that I can change it if there is something wrong with it - no
[15:03:55] it'd be quite an excursion to release a new copy of it
[15:04:01] read: we'd break stuff
[15:04:24] thanks, just wondering, since it broke earlier today
[15:05:21] matanya: I saw that. I was going to reply but I don't have anything really constructive
[15:05:36] for hewiki at least we should look at doing cirrus, I think
[15:05:56] @seen springle
[15:06:06] I think i agree, but I fear the missing results bug i opened
[15:06:17] oooh, springle is here :-) while the bot is dead
[15:11:13] (CR) Alexandros Kosiaris: [C: 1] "Well, I'd +2 it but since you ask for consensus +1" [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[15:12:32] (CR) Andrew Bogott: [C: 1] puppet: switch masters to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[15:12:55] matanya: ah, yeah, that one. I'll have to wait until thursday to run the saneitizer and see what we get
[15:13:17] I ran it last thursday on the test systems and came up with two things:
[15:13:42] pages that turned into redirects (fixed release rolling onto wikis now)
[15:13:53] and pages in the wrong index - I haven't run those down yet
[15:14:11] there is no rush anyway
[15:15:14] matanya: I've got plenty to do in the mean time!
[15:20:24] paravoid: The bot I converted seems to work fine, including the syncing.
[15:21:15] The others are a little more difficult as they need log files to be aggregated and a backup made of the directory (I guess I'll setup a cron to create a tarball of the directory and sync it to /data/project/cvn/backups/tar.gz or something)
[15:21:47] and rsync -a --delete /srv/cvn/log /data/project/cvn/log/$(hostname)
[15:22:07] (+ -c --no-perms --compress)
[15:23:42] (CR) Matanya: [C: -1] "This will fail on trusty boxes with ruby 1.9. ruby 1.9 removed "to_a" method from strings. and we still have the following manifests with " [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[15:27:04] (CR) Nemo bis: [C: 1] "True, I had forgotten that we had finally made wgNamespacesToBeSearchedDefault sane. :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139819 (https://bugzilla.wikimedia.org/66066) (owner: Odder)
[15:30:22] !log reinstalling analytics1018
[15:30:28] Logged the message, Master
[15:32:08] PROBLEM - Host analytics1018 is DOWN: PING CRITICAL - Packet loss = 100%
[15:32:32] ACKNOWLEDGEMENT - Host analytics1018 is DOWN: PING CRITICAL - Packet loss = 100% ottomata This node is being reinstalled.
[15:33:54] anyone looking at the ubuntu sync alert?
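The CVN backup plan described at 15:21 (a cron job that tarballs the bot directory, plus an rsync of the log directory into a per-host folder) could be sketched roughly as below. This is only an illustration under assumptions: the function names, the `cvn-<stamp>.tar.gz` file name and the example host name `cvn-app1` are hypothetical; only the paths and rsync flags come from the messages above. The sketch builds the rsync argument list without executing it.

```python
import os
import socket
import tarfile
import tempfile
import time


def make_backup_tarball(src_dir, backup_dir, stamp=None):
    """Write a gzip-compressed tarball of src_dir into backup_dir; return its path.

    The file name pattern is an assumption made for this sketch.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(backup_dir, exist_ok=True)
    dest = os.path.join(backup_dir, "cvn-%s.tar.gz" % stamp)
    # Store entries under the directory's own name rather than its full path.
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(src_dir, arcname=arcname)
    return dest


def rsync_log_command(src="/srv/cvn/log", dest_root="/data/project/cvn/log", host=None):
    """Build (but do not run) the rsync invocation quoted in the chat.

    Flags are the ones mentioned there: -a --delete, plus -c --no-perms --compress.
    """
    host = host or socket.gethostname()
    return ["rsync", "-a", "-c", "--no-perms", "--compress", "--delete",
            src, os.path.join(dest_root, host)]


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real /srv/cvn tree.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cvn")
        os.makedirs(src)
        with open(os.path.join(src, "bot.log"), "w") as fh:
            fh.write("example\n")
        print(make_backup_tarball(src, os.path.join(tmp, "backups")))
    print(" ".join(rsync_log_command(host="cvn-app1")))
```

To actually run the sync, the returned list could be handed to `subprocess.run(cmd, check=True)` from whatever cron wrapper is used; that step is left out here so the sketch has no side effects outside its temporary directory.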
[15:35:13] (PS1) Ottomata: Fix for filename of analytics1-d-eqiad.cfg [operations/puppet] - https://gerrit.wikimedia.org/r/139850
[15:35:35] (CR) Ottomata: [C: 2 V: 2] Fix for filename of analytics1-d-eqiad.cfg [operations/puppet] - https://gerrit.wikimedia.org/r/139850 (owner: Ottomata)
[15:37:18] RECOVERY - Host analytics1018 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[15:39:29] PROBLEM - SSH on analytics1018 is CRITICAL: Connection timed out
[15:39:38] PROBLEM - check if dhclient is running on analytics1018 is CRITICAL: Timeout while attempting connection
[15:39:38] PROBLEM - Hadoop DataNode on analytics1018 is CRITICAL: Timeout while attempting connection
[15:39:38] PROBLEM - RAID on analytics1018 is CRITICAL: Timeout while attempting connection
[15:39:38] PROBLEM - DPKG on analytics1018 is CRITICAL: Timeout while attempting connection
[15:39:48] PROBLEM - check configured eth on analytics1018 is CRITICAL: Timeout while attempting connection
[15:40:04] paravoid: I'll take a look now
[15:40:08] PROBLEM - puppet disabled on analytics1018 is CRITICAL: Timeout while attempting connection
[15:40:08] PROBLEM - Disk space on analytics1018 is CRITICAL: Timeout while attempting connection
[15:47:23] ACKNOWLEDGEMENT - DPKG on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:47:23] ACKNOWLEDGEMENT - Disk space on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:47:23] ACKNOWLEDGEMENT - Hadoop DataNode on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:47:23] ACKNOWLEDGEMENT - NTP on analytics1018 is CRITICAL: NTP CRITICAL: No response from NTP server ottomata This node is being reinstalled.
[15:47:23] ACKNOWLEDGEMENT - RAID on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:47:23] ACKNOWLEDGEMENT - SSH on analytics1018 is CRITICAL: Connection refused ottomata This node is being reinstalled.
[15:47:24] ACKNOWLEDGEMENT - check configured eth on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:47:24] ACKNOWLEDGEMENT - check if dhclient is running on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:47:25] ACKNOWLEDGEMENT - puppet disabled on analytics1018 is CRITICAL: Connection refused by host ottomata This node is being reinstalled.
[15:58:35] file has vanished: "/Archive-Update-in-Progress-obake.canonical.com" (in ubuntu)
[15:58:38] (FTR)
[15:58:40] (CR) Giuseppe Lavagetto: "@matanya: both puppet masters are precises, so as long as we don't make the move for trusty boxes we'll be good. Thanks for pointing that " [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[15:59:17] !log manually ran update-ubuntu-mirror on carbon, successful
[15:59:21] Logged the message, Master
[16:02:13] (CR) Anomie: [C: -1] "Doesn't work correctly." (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134400 (owner: Nemo bis)
[16:04:54] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 13:03:55 UTC
[16:10:46] (PS1) Ottomata: Repurpose analytics1018 as kafka broker [operations/puppet] - https://gerrit.wikimedia.org/r/139856
[16:18:01] Hi ops. I was looking through pageview stats to identify pages in userspace that were being abused for search rank, when I noticed an anomaly.
[16:18:17] User:Cunard is one of the top pages... on the entire site. And it doesn't exist. http://stats.grok.se/en/201406/User:Cunard
[16:19:26] I asked Cunard about this and it was news to him as well. He has no idea.
He did find another non-existent userpage with high views though, User:The Philip72
[16:20:25] (CR) Ottomata: [C: 2 V: 2] Repurpose analytics1018 as kafka broker [operations/puppet] - https://gerrit.wikimedia.org/r/139856 (owner: Ottomata)
[16:20:46] If someone with squid log access could look into this we'd appreciate it. We don't know if the hits are real, or some problem in the reporting.
[16:21:25] This is all on en btw
[16:21:54] Gigs-: try pinging in #wikimedia-analytics, someone there might be able to check something for you quickly
[16:22:02] ok I'll take it over there, thanks
[16:28:26] (PS1) Ottomata: Set num.io.threads to configuration default [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/139862
[16:28:37] (CR) Ottomata: [C: 2 V: 2] Set num.io.threads to configuration default [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/139862 (owner: Ottomata)
[16:29:10] (PS1) Ottomata: Update kafka module with num.io.threads default change [operations/puppet] - https://gerrit.wikimedia.org/r/139863
[16:30:24] (CR) Ottomata: [C: 2 V: 2] Update kafka module with num.io.threads default change [operations/puppet] - https://gerrit.wikimedia.org/r/139863 (owner: Ottomata)
[16:33:04] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Mon Jun 16 16:32:54 UTC 2014
[16:36:14] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0
[16:37:14] RECOVERY - Kafka Broker Messages In on analytics1022 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2349.29231937
[16:38:51] (PS2) Ottomata: Add Leila's new ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/139779 (owner: Springle)
[16:39:04] (CR) Ottomata: [C: 2 V: 2] Add Leila's new ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/139779 (owner: Springle)
[16:44:46] paravoid:
Moving the remaining bots now
[16:44:59] There's 1000s of .nfs#### files in various of the subdirectories I'm moving
[16:45:25] seems they can't be deleted even with sudo, I'll move them along and try again afterwards (if they're still there)
[16:50:34] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1022 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 56.0
[16:54:12] (PS1) Ottomata: Set num.replica.fetchers to 4 [operations/puppet] - https://gerrit.wikimedia.org/r/139864
[16:56:34] PROBLEM - Kafka Broker Messages In on analytics1012 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0
[16:58:21] .nfs files: http://serverfault.com/questions/201294/nfsxxxx-files-appearing-what-are-those
[17:01:34] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1022 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0
[17:06:13] (CR) QChris: "Only discussion items." (23 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[17:12:34] RECOVERY - Kafka Broker Messages In on analytics1012 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 5392.1469763
[17:13:57] hm. in puppet, i have a variable $hadoop_namenodes='hadooplogstash05.eqiad.wmflabs' which correctly affects a node when i declare it in global context, but it doesn't work when i declare it within the node definition. this surprises me.
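The behaviour described at 17:13:57 is what one would expect if the consuming code reads the variable through a top-scope-qualified name (`$::hadoop_namenodes`) rather than an unqualified one. The toy model below is a deliberately simplified sketch of that lookup difference, not Puppet's actual implementation: the `Scope` class and its method names are invented for illustration, and the scope chain (top, then node, then class) is an assumption about how the manifests in question are laid out.

```python
class Scope:
    """A minimal stand-in for a variable scope with an optional parent scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.vars = {}

    def lookup(self, var):
        """Unqualified lookup ($var): search this scope, then each parent in turn."""
        scope = self
        while scope is not None:
            if var in scope.vars:
                return scope.vars[var]
            scope = scope.parent
        return None

    def lookup_top(self, var):
        """Top-scope-qualified lookup ($::var): consult only the outermost scope."""
        scope = self
        while scope.parent is not None:
            scope = scope.parent
        return scope.vars.get(var)


if __name__ == "__main__":
    top = Scope("top")
    node = Scope("node", parent=top)
    module_class = Scope("class", parent=node)

    # Declared inside the node definition: invisible to a $::-style lookup.
    node.vars["hadoop_namenodes"] = "hadooplogstash05.eqiad.wmflabs"
    print(module_class.lookup_top("hadoop_namenodes"))  # None
    print(module_class.lookup("hadoop_namenodes"))      # the node-scoped value

    # Declared in global context: the $::-style lookup now finds it.
    top.vars["hadoop_namenodes"] = "hadooplogstash05.eqiad.wmflabs"
    print(module_class.lookup_top("hadoop_namenodes"))
```

In this model the node-scoped assignment is reachable only through the unqualified lookup, which matches the symptom reported: the global declaration takes effect, the node-level one does not.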
[17:15:23] ah i see, the module is explicitly checking for $::hadoop_namenodes
[17:34:02] !log csteipp Synchronized php-1.24wmf9/extensions/EducationProgram/includes/api/ApiAddStudents.php: (no message) (duration: 00m 05s)
[17:34:07] Logged the message, Master
[17:34:17] (CR) Filippo Giunchedi: [C: 1] scap: ensure=>absent /usr/local/bin/sync-common-file [operations/puppet] - https://gerrit.wikimedia.org/r/135924 (owner: BryanDavis)
[17:36:00] !log csteipp Synchronized php-1.24wmf8/extensions/EducationProgram/includes/api/ApiAddStudents.php: Bug66631 (duration: 00m 05s)
[17:36:04] Logged the message, Master
[18:05:48] (Draft1) Alexandros Kosiaris: osm.planet sync up [operations/puppet] - https://gerrit.wikimedia.org/r/136740
[18:17:37] (CR) Tychay: [C: 1] Add Terry Chay to English Wikimedia Planet [operations/puppet] - https://gerrit.wikimedia.org/r/139791 (owner: Nemo bis)
[18:23:14] was gerrit upgraded recently?
[18:23:35] the emails it sends out for code reviews are now missing inline comments. e.g.
all of the mail i received for https://gerrit.wikimedia.org/r/#/c/139809/ does
[18:24:57] (PS2) Milimetric: Add backup role and scripts [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119)
[18:25:13] (CR) Milimetric: Add backup role and scripts (22 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[18:25:55] (PS3) Milimetric: Add backup role and scripts [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119)
[18:48:04] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time
[18:49:01] (PS2) Ottomata: Set num.replica.fetchers to 4 [operations/puppet] - https://gerrit.wikimedia.org/r/139864
[18:49:03] (PS1) Ottomata: Fix analytics1018 typo [operations/puppet] - https://gerrit.wikimedia.org/r/139883
[18:49:04] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.005 second response time
[18:51:12] (CR) Ottomata: [C: 2 V: 2] Set num.replica.fetchers to 4 [operations/puppet] - https://gerrit.wikimedia.org/r/139864 (owner: Ottomata)
[18:51:23] (CR) Ottomata: [C: 2 V: 2] Fix analytics1018 typo [operations/puppet] - https://gerrit.wikimedia.org/r/139883 (owner: Ottomata)
[19:02:10] PROBLEM - RAID on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:20] PROBLEM - check configured eth on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:20] PROBLEM - check if dhclient is running on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:35] PROBLEM - DPKG on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:35] PROBLEM - jmxtrans on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:35] PROBLEM - Disk space on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:35] PROBLEM - puppet disabled on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:02:50] PROBLEM - Kafka Broker Server on analytics1018 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:05:57] first puppet run
[19:05:58] shhhh
[19:07:49] (PS1) MaxSem: Math: fool-proof configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139888
[19:08:30] RECOVERY - jmxtrans on analytics1018 is OK: PROCS OK: 1 process with command name java, args -jar jmxtrans-all.jar
[19:08:30] RECOVERY - puppet disabled on analytics1018 is OK: OK
[19:08:31] RECOVERY - Disk space on analytics1018 is OK: DISK OK
[19:09:10] RECOVERY - RAID on analytics1018 is OK: OK: no disks configured for RAID
[19:09:20] RECOVERY - check configured eth on analytics1018 is OK: NRPE: Unable to read output
[19:09:20] RECOVERY - check if dhclient is running on analytics1018 is OK: PROCS OK: 0 processes with command name dhclient
[19:12:57] greg-g, I'm gonna SWAT ^^ today, cuz it looks like the next outage begging to happen
[19:14:00] PROBLEM - NTP on analytics1018 is CRITICAL: NTP CRITICAL: Offset unknown
[19:14:26] MaxSem: good call, thank you
[19:15:40] PROBLEM - Kafka Broker Messages In on analytics1018 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0
[19:17:06] (PS2) Ori.livneh: mediawiki: small clean-ups [operations/puppet] - https://gerrit.wikimedia.org/r/139065
[19:19:00] RECOVERY - NTP on analytics1018 is OK: NTP OK: Offset -0.02820360661 secs
[19:21:30] RECOVERY - DPKG on analytics1018 is OK: All packages OK
[19:21:50] RECOVERY - Kafka Broker Server on analytics1018 is OK: PROCS OK: 1 process with command name java, args kafka.Kafka
/etc/kafka/server.properties
[19:23:01] ACKNOWLEDGEMENT - Kafka Broker Messages In on analytics1018 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0 ottomata This broker does not yet have any partitions allocated to it.
[19:27:49] (PS1) Dr0ptp4kt: Update HTTPS settings for a number of operators. [operations/puppet] - https://gerrit.wikimedia.org/r/139893
[19:28:18] bblack: when you have a moment, would you please review and, if appropriate, +2 merge and deploy ^^ ?
[19:29:55] (CR) Faidon Liambotis: [C: 2] "Sounds good to me." [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[19:31:40] (CR) Matanya: [C: 1] "Thanks for clarifying this." [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[19:31:47] (CR) Ori.livneh: "added some ops so this can be merged" [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle)
[19:34:00] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 16:32:54 UTC
[19:38:47] Will anything bad happen if I truncate /var/log/diamond/diamond.log on a labs (beta) host?
[19:39:16] /var/log/diamond/diamond.log is 755M on deployment-prep
[19:39:25] (PS2) Faidon Liambotis: mwgrep: Add namespace prefix in output [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle)
[19:39:49] (CR) Faidon Liambotis: [C: 2] mwgrep: Add namespace prefix in output [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle)
[19:40:45] (CR) Faidon Liambotis: [V: 2] mwgrep: Add namespace prefix in output [operations/puppet] - https://gerrit.wikimedia.org/r/139059 (owner: Krinkle)
[19:44:50] (PS1) Nemo bis: Add a couple new blogs to the English Wikimedia Planet [operations/puppet] - https://gerrit.wikimedia.org/r/139897
[19:49:36] (PS1) 20after4: Move the ordered_json parser function to a shared module and add appropriate require calls to classes that use the function [operations/puppet] - https://gerrit.wikimedia.org/r/139921
[19:50:43] (CR) jenkins-bot: [V: -1] Move the ordered_json parser function to a shared module and add appropriate require calls to classes that use the function [operations/puppet] - https://gerrit.wikimedia.org/r/139921 (owner: 20after4)
[19:52:30] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1012 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 57.0
[19:52:40] PROBLEM - Kafka Broker Server on analytics1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args kafka.Kafka /etc/kafka/server.properties
[19:52:42] (PS2) 20after4: Move the ordered_json parser function to a shared module and add appropriate require calls to classes that use the function [operations/puppet] - https://gerrit.wikimedia.org/r/139921
[19:53:40] RECOVERY - Kafka Broker Server on analytics1022 is OK: PROCS OK: 1 process with command name java, args kafka.Kafka /etc/kafka/server.properties
[19:56:39] ori or bd808, I have an issue with settings.php.erb in vagrant, can one of you advise?
[19:56:49] I want to either set up a key = value pair where value is not quoted, or...
[19:57:02] or just have a way to insert a full, literal line without erb doing any fancy parsing.
[19:57:12] Are either of those cases supported already?
[19:57:16] andrewbogott: it's meant to work for simple cases. if you want to do that, you can either have an array of string literals or just a string
[19:57:37] ok, can't mix string literals with key,value pairs though, right?
[19:57:43] No :(
[19:57:48] * ori nods
[19:57:52] but you can have two config sections
[19:58:11] I did that in the role for central auth
[19:59:01] andrewbogott: example: https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/manifests/roles/translate.pp#L12
[19:59:07] thanks, reading...
[19:59:10] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0
[19:59:28] andrewbogott: and another: https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/manifests/roles/visualeditor.pp#L13
[20:00:04] gwicke, subbu: The time is nigh to deploy Parsoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140616T2000)
[20:00:12] ok, not so bad. Thanks.
[20:03:31] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1012 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0
[20:03:40] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Mon Jun 16 20:03:35 UTC 2014
[20:06:06] we are not doing a parsoid deploy today. investigating some regressions found in testing.
[20:06:30] greg-g. fyi. in case anyone else wants the window.
[20:06:49] i dont see anyone scheduled after us.
[20:06:57] in the deployment calendar.
[20:08:48] subbu: thanks for the headsup
[20:08:52] (PS1) Rush: diamond not in labs and on trusty [operations/puppet] - https://gerrit.wikimedia.org/r/140009
[20:10:47] chasemp: \o/ nice!
[20:10:53] (PS1) Ori.livneh: declare apache::mod::cgi [operations/puppet] - https://gerrit.wikimedia.org/r/140010
[20:13:00] (PS1) Ori.livneh: rcstream: un-comment diamond collectors [operations/puppet] - https://gerrit.wikimedia.org/r/140012
[20:13:10] RECOVERY - Kafka Broker Messages In on analytics1022 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2310.59081456
[20:13:23] chasemp: maybe you could +1 that ^^ it just un-comments-out the diamond config for rcstream, which was previously commented out since diamond wasn't available for trusty
[20:15:07] (CR) Rush: [C: 1] "YES BUT...depends on https://gerrit.wikimedia.org/r/#/c/140009/" [operations/puppet] - https://gerrit.wikimedia.org/r/140012 (owner: Ori.livneh)
[20:15:28] cool :) i'll wait for that one
[20:15:33] thank you
[20:15:39] np man
[20:16:19] (CR) Ori.livneh: [C: 1] diamond not in labs and on trusty [operations/puppet] - https://gerrit.wikimedia.org/r/140009 (owner: Rush)
[20:16:40] (CR) Andrew Bogott: [C: 1] "We might want Diamond on labs at some point, but this seems fine for now. Have you verified that this class works on Trusty?" [operations/puppet] - https://gerrit.wikimedia.org/r/140009 (owner: Rush)
[20:17:05] (CR) Ori.livneh: "@mutante, godog: remember that neither Bryan nor I are permitted to merge this, so please go ahead if it looks good to you." [operations/puppet] - https://gerrit.wikimedia.org/r/135924 (owner: BryanDavis)
[20:17:34] paravoid: Should be all migrated now
[20:17:41] NFS happy?
[20:18:07] andrewbogott: would appreciate review / +1 of https://gerrit.wikimedia.org/r/#/c/139065/ whenever you have the chance
[20:19:14] (PS2) BBlack: Update HTTPS settings for a number of operators.
[operations/puppet] - https://gerrit.wikimedia.org/r/139893 (owner: Dr0ptp4kt)
[20:19:22] (CR) Ori.livneh: [C: 2 V: 2] profiler-to-carbon: set a timeout on the connection [operations/software/mwprof/reporter] - https://gerrit.wikimedia.org/r/139093 (owner: Giuseppe Lavagetto)
[20:19:24] (CR) BBlack: [C: 2 V: 2] Update HTTPS settings for a number of operators. [operations/puppet] - https://gerrit.wikimedia.org/r/139893 (owner: Dr0ptp4kt)
[20:19:26] (CR) Rush: [C: 2] "There have been changes and cleanup so not this exact module version in trusty, I verified the package seems ok, etc...today. post merge " [operations/puppet] - https://gerrit.wikimedia.org/r/140009 (owner: Rush)
[20:21:31] okay ori you're gtg on those collectors
[20:21:43] wee! thanks again
[20:21:57] (CR) Rush: "for posterity, verified on osmium. all good" [operations/puppet] - https://gerrit.wikimedia.org/r/140009 (owner: Rush)
[20:21:59] (CR) Ori.livneh: [C: 2] rcstream: un-comment diamond collectors [operations/puppet] - https://gerrit.wikimedia.org/r/140012 (owner: Ori.livneh)
[20:26:39] ori: oooh, is diamond https://github.com/BrightcoveOS/Diamond
[20:26:40] ?
[20:26:53] * YuviPanda gets interested in setting up that for toollabs
[20:26:58] YuviPanda: yep
[20:27:01] ori: cool
[20:27:51] ori: btw, I dropped mongo from toollabs for now. not good enough support for the kind of multi-tenancy / 'shared hosting' we do.
[20:28:02] YuviPanda: i saw
[20:28:21] no way to create users without attaching them to a db, and no way to not have it pre-allocate a lot of space per-GB in a supported way
[20:29:08] ori: but 'tis k, think we'll get postgres for all tools in a while anyway, and I can re-use the script that I wrote for generating user accounts.
should hopefully replace the current script that generates user accounts for mysql with that too (current one is in bash-style perl)
[20:30:47] (CR) QChris: [C: -1] "I vetoed being added to the alert list with Toby in private" [operations/puppet] - https://gerrit.wikimedia.org/r/139335 (owner: Alexandros Kosiaris)
[20:31:10] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: Fetching origin
[20:32:10] RECOVERY - Unmerged changes on repository puppet on strontium is OK: Fetching origin
[20:39:27] (PS1) QChris: Remove spetrea from icinga's analytics contact group [operations/puppet] - https://gerrit.wikimedia.org/r/140016
[20:41:03] (CR) Physikerwelt: [C: 1] Math: fool-proof configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139888 (owner: MaxSem)
[20:41:23] (PS1) Rush: diamond trusty disable default IPVS [operations/puppet] - https://gerrit.wikimedia.org/r/140017
[20:41:30] (CR) QChris: "Thanks for the patch." [operations/puppet] - https://gerrit.wikimedia.org/r/139335 (owner: Alexandros Kosiaris)
[20:42:38] (CR) jenkins-bot: [V: -1] diamond trusty disable default IPVS [operations/puppet] - https://gerrit.wikimedia.org/r/140017 (owner: Rush)
[20:44:38] (PS2) Rush: diamond trusty disable default IPVS [operations/puppet] - https://gerrit.wikimedia.org/r/140017
[20:45:26] !log updated eventlogging to b4b42effc6
[20:45:31] Logged the message, Master
[20:46:30] (CR) Rush: [C: 2 V: 2] "to resolve cron spam" [operations/puppet] - https://gerrit.wikimedia.org/r/140017 (owner: Rush)
[20:48:15] (CR) Ori.livneh: [C: 1] webperf/deprecate: Log jqmigrate to statsd under mw.js.deprecate [operations/puppet] - https://gerrit.wikimedia.org/r/137484 (owner: Krinkle)
[20:53:00] ori: just merge that :)
[20:55:14] ori: assuming you have plans for diamond -> graphite for prod, do you plan on seting that up on labs as well?
[20:55:25] (PS1) BryanDavis: beta: add role::cache::configuration::backends['labs']['bits'] [operations/puppet] - https://gerrit.wikimedia.org/r/140019
[20:56:15] (PS1) Ori.livneh: diamond: default collectors to enabled = true [operations/puppet] - https://gerrit.wikimedia.org/r/140020
[20:56:37] (CR) Ori.livneh: [C: 2] "per paravoid" [operations/puppet] - https://gerrit.wikimedia.org/r/137484 (owner: Krinkle)
[20:56:56] bblack, are you available to do the tablet switchover tomorrow?
[20:59:19] (CR) Rush: [V: -1] "did you try this? We had a similar thing where enabled = true was set here, but I couldn't actually set a collector with no settings whic" [operations/puppet] - https://gerrit.wikimedia.org/r/140020 (owner: Ori.livneh)
[21:00:31] (CR) BryanDavis: "Cherry-picked to deployment-salt." [operations/puppet] - https://gerrit.wikimedia.org/r/140019 (owner: BryanDavis)
[21:02:03] (PS2) Ori.livneh: diamond: default collectors to enabled = true [operations/puppet] - https://gerrit.wikimedia.org/r/140020
[21:03:01] chasemp: updated ^
[21:05:22] MaxSem: yes, probably, what time?
[21:05:40] (CR) Rush: [C: 1] "rubber stamping, I didn't try to install a new collector but this seems right. Need to go through and clean out the explicity enabled = t" [operations/puppet] - https://gerrit.wikimedia.org/r/140020 (owner: Ori.livneh)
[21:05:46] (PS3) Ori.livneh: mw/apache 2.4 compat: remove DefaultType directive [operations/puppet] - https://gerrit.wikimedia.org/r/138891
[21:05:58] (CR) Ori.livneh: [C: 2] diamond: default collectors to enabled = true [operations/puppet] - https://gerrit.wikimedia.org/r/140020 (owner: Ori.livneh)
[21:08:30] (CR) Rush: "Really not sure, but do you need to require wmflib? I can use the function now in puppet without requiring the current module.
Unsure if" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/139921 (owner: 20after4)
[21:15:45] (PS1) Ori.livneh: diamond: clean out collector enabled => true settings [operations/puppet] - https://gerrit.wikimedia.org/r/140022
[21:23:37] (CR) 20after4: "I'm not sure either. It seems like puppet must pre-load everything in the modules/*/lib directories. Perhaps there is no point requiring " [operations/puppet] - https://gerrit.wikimedia.org/r/139921 (owner: 20after4)
[21:30:28] !log upgraded eventlogging to 3012aad
[21:30:28] ^ qchris
[21:30:29] Logged the message, Master
[21:30:40] Thanks ori \o/
[21:31:09] hey bblack, i'm not going to get to it today, but just in case
[21:31:21] its safe to puppetize those new cache nodes, ja?
[21:31:27] they won't get traffic unless we manually put them in pybal config?
[21:31:29] correct?
[21:33:10] paravoid: do I have your permission to merge Krinkle's changes to asset-check.js? I'm in favor of moving it out of operatins/puppet too but I don't want to have to make Krinkle resubmit his commits
[21:33:42] look what we can do!
[21:33:45] greg-g: ping
[21:33:54] greg-g: damn, doesn't work outside -staff?
[21:34:01] hmm, should?
[21:34:12] * greg-g reads the code
[21:34:34] ah, rate limited per nick
[21:34:44] greg-g: clever
[21:35:24] (CR) Ottomata: [C: 1] declare apache::mod::cgi [operations/puppet] - https://gerrit.wikimedia.org/r/140010 (owner: Ori.livneh)
[21:35:31] thanks
[21:38:03] greg-g: ping
[21:38:04] NeuroticPanda: You sent me a contentless ping. This is a contentless pong. Please provide a bit of information about what you want and I will respond when I am around.
[21:38:09] greg-g: yeah, seems to work
[21:38:14] Hah
[21:38:32] greg-g: You should hook it up to just do a CTCP PING on the person in reply
[21:53:10] (PS1) Matanya: fundraising: lint [operations/puppet] - https://gerrit.wikimedia.org/r/140029
[22:04:25] (PS2) Ori.livneh: mediawiki: small clean-ups [operations/puppet] - https://gerrit.wikimedia.org/r/140010
[22:06:44] (CR) Ori.livneh: [C: 2] mediawiki: small clean-ups [operations/puppet] - https://gerrit.wikimedia.org/r/140010 (owner: Ori.livneh)
[22:21:50] (PS3) 20after4: Move the ordered_json parser function to a shared module and add appropriate require calls to classes that use the function [operations/puppet] - https://gerrit.wikimedia.org/r/139921
[22:30:42] (CR) BryanDavis: "When I tried to cherry-pick this into deployment-salt it ended up producing an empty commit. Git seems to think this is the same change as" [operations/puppet] - https://gerrit.wikimedia.org/r/139065 (owner: Ori.livneh)
[22:30:57] ori: ^
[22:31:17] * ori looks
[22:32:15] (CR) GWicke: "@Filippo: That would be great!" [operations/puppet] - https://gerrit.wikimedia.org/r/136128 (owner: Filippo Giunchedi)
[22:32:53] are there sources of ci jobs available somewhere?
[22:33:23] yes
[22:33:32] stuff like mediawiki-core-phpcs-strict etc
[22:33:37] Reedy: where, pls?
[22:34:10] https://github.com/wikimedia/integration-jenkins-job-builder-config
[22:34:44] Danny_B: See https://www.mediawiki.org/wiki/CI/JJB for how the config gets turned into jobs
[22:40:41] thanks guys
[22:42:30] hmm, am i blind?
[22:42:44] i'm looking for that particular script to check cs
[22:44:23] Danny_B: Is it one of the scripts in https://github.com/wikimedia/integration-jenkins/tree/master/bin ?
[22:45:59] why do we have this stuff on github and not on our own git server?
[22:46:26] github is just a mirror.
This is all in gerrit
[22:46:42] It's just easier to browse on github in most cases
[22:47:08] ah, i see
[22:47:19] eg https://gerrit.wikimedia.org/r/#/admin/projects/integration/jenkins and https://git.wikimedia.org/log/integration%2Fjenkins/HEAD
[22:47:43] so according to https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-strict-HEAD/11626/console i'm perhaps looking for checkstyle-phpcs.xml i guess
[22:48:34] basically i'm looking for the tool which checks coding conventions
[22:48:55] and its configuration of course
[22:49:19] https://github.com/wikimedia/mediawiki-tools-codesniffer
[22:50:04] is this all documented somewhere?
[22:51:47] I'm just connecting the dots from the scripts. run-phpcs-mw.sh is the runner script and it references a checkout of mediawiki-tools-codesniffer for the standard
[22:53:52] so in https://github.com/wikimedia/mediawiki-tools-codesniffer/blob/master/MediaWiki/ruleset.xml i see some names of the rules but still i can't find the definition of those rules
[22:56:05] Many/most of those are "standard" rules from PHP CodeSniffer
[22:56:14] http://www.squizlabs.com/php-codesniffer
[22:57:05] Actually they all seem to be standard rules
[22:58:59] ori, RoanKattouw_MSc, mwalker: I'll do the SWAT
[22:59:04] it's my stuff anyway
[22:59:07] OK cool
[22:59:27] thanks
[22:59:52] MaxSem, are you around, I'm going to deploy your patches
[23:00:02] yes I am
[23:00:05] mwalker, ori, MaxSem: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140616T2300)
[23:00:14] bd808: thanks, i'll dig through
[23:04:00] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Mon 16 Jun 2014 20:03:35 UTC
[23:04:08] (CR) MaxSem: [C: 2] Math: fool-proof configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139888 (owner: MaxSem)
[23:04:19] (Merged) jenkins-bot: Math: fool-proof configuration [operations/mediawiki-config] -
https://gerrit.wikimedia.org/r/139888 (owner: MaxSem)
[23:05:33] !log maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139888/ (duration: 00m 08s)
[23:05:37] Logged the message, Master
[23:11:12] !log maxsem Synchronized php-1.24wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/139562/ (duration: 00m 06s)
[23:11:16] Logged the message, Master
[23:12:17] !log maxsem Synchronized php-1.24wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/139562/ (duration: 00m 05s)
[23:12:22] Logged the message, Master
[23:13:15] jgonera, can you verify that new changes look ok?
[23:14:34] MaxSem, in prod?
[23:14:38] yep
[23:15:44] MaxSem, I can't see most of the changes on enwiki, just some of them
[23:16:12] interaction iwth old styles?
[23:16:22] MaxSem, logging in helped
[23:16:24] weird
[23:16:30] those were only CSS changes
[23:16:54] 5 minutes haven't passed yet
[23:17:50] MaxSem, I logged out and now everything seems to be OK
[23:17:59] cool
[23:18:03] thanks!
[23:18:25] greg-g, I'm done
[23:20:43] coolio
[23:33:49] (CR) BBlack: [C: 1] puppet: switch masters to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/139832 (owner: Giuseppe Lavagetto)
[23:34:33] (PS1) BryanDavis: beta: Small scap fixes [operations/puppet] - https://gerrit.wikimedia.org/r/140045
[23:46:48] (Abandoned) Ori.livneh: mediawiki: small clean-ups [operations/puppet] - https://gerrit.wikimedia.org/r/139065 (owner: Ori.livneh)
[23:59:19] (CR) BryanDavis: "Cherry-picked to deployment-salt and applied on deployment-jobrunner01 to replace manual application of beta::scap::target." [operations/puppet] - https://gerrit.wikimedia.org/r/140045 (owner: BryanDavis)