[00:01:08] (03PS7) 10Dzahn: add annual.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/165927 [00:06:38] (03Abandoned) 10Dzahn: add annual.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/165927 (owner: 10Dzahn) [00:11:36] RECOVERY - ElasticSearch health check on logstash1003 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 42: active_shards: 116: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [00:14:24] !log Jenkins queue is stuck (99% free executors, but it's not running any of Zuul's pending jobs) [00:14:30] Logged the message, Master [00:16:12] James_F: https://integration.wikimedia.org/ (first graph on top right) [00:16:22] 23:50 is when it started. Interesting, this is new information :) [00:16:41] Oh, you mean FR-tech complaining isn't for nothing? [00:17:07] Do ping me earlier. First I heard was from James in PM just moments ago [00:17:21] I thought they were just being their usual complaining selves. [00:18:20] marktraceur: :-P [00:18:31] They may have all left already. [00:18:43] I have https://integration.wikimedia.org/zuul/ open full-time in a huge screen right next to me, so I notice things being stuck. [00:18:53] so many huge screens [00:19:01] Only one. [00:19:11] * YuviPanda misses huge screens [00:19:12] Well. There's also my personal second screen. [00:19:16] heh :) [00:19:17] * James_F counts pixels. [00:19:48] 2560+1440 on the laptop; 2560 again on the monitoring screen. [00:20:01] That was fast counting [00:20:04] Krinkle: Do you want to get SMS alerts about this? :-) [00:20:11] marktraceur: What can I say? It's a gift. [00:21:04] James_F: The main metric is "queued x min ago". If that's more than 30 minutes (max job run time) it generally means one or more jobs are stuck and blocking the pipeline [00:21:13] Yeah. 
[00:21:17] In this case it's donate interface not building [00:21:19] no idea [00:21:28] it didn't create the job yet, so there's nothing I can abort [00:21:30] Krinkle: YuviPanda can set you up with a custom SMS alert thing. ;-) [00:21:46] Krinkle: Can you kill the +2 request? [00:21:52] Or will that not work? [00:22:23] * James_F goes. [00:23:44] Nah, doesn't work like that [00:27:53] James_F|Away: heh :) [00:32:59] Found the curlpit [00:33:01] https://gerrit.wikimedia.org/r/#/c/166889/ [00:33:13] is the lowest of a 3-change stack [00:33:22] which can't be merged [00:33:29] and somehow Zuul then decides to wait indefinitely [00:33:31] lovely [00:36:58] (03PS4) 10Yuvipanda: labs: reduce acct archiving retention [puppet] - 10https://gerrit.wikimedia.org/r/164520 (https://bugzilla.wikimedia.org/69604) (owner: 10Hashar) [00:39:52] (03PS5) 10Yuvipanda: labs: reduce acct archiving retention [puppet] - 10https://gerrit.wikimedia.org/r/164520 (https://bugzilla.wikimedia.org/69604) (owner: 10Hashar) [00:40:29] (03CR) 10Yuvipanda: [C: 031] "Moved the puppet code into base as well, with realm branching..." 
[puppet] - 10https://gerrit.wikimedia.org/r/164520 (https://bugzilla.wikimedia.org/69604) (owner: 10Hashar) [00:46:01] (03PS2) 10Dzahn: wikimedia.org service aliases - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/166914 [00:47:07] (03CR) 10jenkins-bot: [V: 04-1] wikimedia.org service aliases - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/166914 (owner: 10Dzahn) [00:49:14] !log Zuul queue made unstuck by fixing the clogged build (see bug 72113) [00:49:22] Logged the message, Master [00:50:40] (03PS3) 10Dzahn: wikimedia.org service aliases - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/166914 [00:51:41] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 315 seconds [00:51:43] (03CR) 10Dzahn: "fwiw, on PS2 , the dns-tabs lint check said OK even though there were indeed tabs" [dns] - 10https://gerrit.wikimedia.org/r/166914 (owner: 10Dzahn) [00:52:26] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 358 seconds [00:52:36] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 40 seconds [00:53:18] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds [00:56:08] (03CR) 10Dzahn: [C: 032] "yep, the actual change is only inside if ($::realm == 'labs')" [puppet] - 10https://gerrit.wikimedia.org/r/164520 (https://bugzilla.wikimedia.org/69604) (owner: 10Hashar) [00:59:55] (03CR) 10Dzahn: "on random labs instance:" [puppet] - 10https://gerrit.wikimedia.org/r/164520 (https://bugzilla.wikimedia.org/69604) (owner: 10Hashar) [01:29:09] (03CR) 10BBlack: [C: 031] ssl_ciphersuite - add new compat mode [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [01:31:29] !log springle Synchronized wmf-config/db-eqiad.php: depool db1061 (duration: 00m 07s) [01:31:38] Logged the message, Master [01:40:48] (03PS1) 10Springle: s1 db loads changed from tin during schema changes; make them stick [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/166933 [01:41:21] (03CR) 10Springle: [C: 032] s1 db loads changed from tin during schema changes; make them stick [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166933 (owner: 10Springle) [01:41:28] (03Merged) 10jenkins-bot: s1 db loads changed from tin during schema changes; make them stick [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166933 (owner: 10Springle) [02:24:56] !log LocalisationUpdate completed (1.25wmf2) at 2014-10-16 02:24:55+00:00 [02:25:08] Logged the message, Master [02:29:28] (03PS1) 10Yurik: Zero: Updated 410-01 to support Opera https [puppet] - 10https://gerrit.wikimedia.org/r/166936 [02:29:52] bblack, ^ [02:46:47] !log LocalisationUpdate completed (1.25wmf3) at 2014-10-16 02:46:46+00:00 [02:46:53] Logged the message, Master [03:25:49] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [03:43:29] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [04:24:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 16 04:24:21 UTC 2014 (duration 24m 20s) [04:24:26] Logged the message, Master [04:37:13] (03PS1) 10Springle: prepare es2001 [puppet] - 10https://gerrit.wikimedia.org/r/166945 [04:38:29] (03CR) 10Springle: [C: 032] prepare es2001 [puppet] - 10https://gerrit.wikimedia.org/r/166945 (owner: 10Springle) [04:44:09] !log xtrabackup clone es1004 to es2001 [04:44:15] Logged the message, Master [04:55:25] (03PS2) 10Ori.livneh: trebuchet: derive the grain name from the repo name [puppet] - 10https://gerrit.wikimedia.org/r/166736 [05:18:16] <_joe_> ori: ygpm [05:23:25] (03CR) 10KartikMistry: "Ping :)" [puppet] - 10https://gerrit.wikimedia.org/r/166535 (owner: 10KartikMistry) [06:28:21] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: puppet fail [06:28:31] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail [06:28:41] PROBLEM - puppet last run on db1067 is CRITICAL: 
CRITICAL: puppet fail [06:28:43] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: puppet fail [06:29:13] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: puppet fail [06:29:13] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: puppet fail [06:29:32] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:11] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:32] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:51] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:51] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:01] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:03] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 6 failures [06:31:04] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:04] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:12] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:13] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:13] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:14] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:15] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:24] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:27] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:31] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:32] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:02] PROBLEM - puppet last run on mw1042 is CRITICAL: 
CRITICAL: Puppet has 2 failures [06:32:02] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:02] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:02] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:45:52] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:45:52] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [06:45:52] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:45:53] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:45:53] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:46:01] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:46:02] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:46:11] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:46:21] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:46:31] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 32 seconds ago 
with 0 failures [06:46:32] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:46:32] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:46:32] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:46:41] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:46:42] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:47:01] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [06:47:09] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:47:09] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:47:11] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:47:23] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [06:47:31] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:47:41] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [06:47:53] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [07:49:54] (03PS2) 10BBlack: Zero: Updated 410-01 to support Opera https [puppet] - 10https://gerrit.wikimedia.org/r/166936 (owner: 10Yurik) [07:50:02] (03CR) 10BBlack: [C: 032 V: 032] Zero: Updated 410-01 to support Opera https [puppet] - 10https://gerrit.wikimedia.org/r/166936 (owner: 10Yurik) [07:53:27] 
!log Jenkins: upgrading PHPUnit from 3.7.28 to 3.7.37 {{gerrit|164683}} [https://lists.wikimedia.org/pipermail/wikitech-l/2014-October/079049.html wikitech-l announce] [07:53:39] Logged the message, Master [08:29:34] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [08:33:52] (03PS3) 10Hashar: Get betalabs localsettings.js file from deploy repo (just like prod) [puppet] - 10https://gerrit.wikimedia.org/r/166610 (owner: 10Subramanya Sastry) [08:34:31] (03CR) 10Hashar: [C: 031] "I adjusted the commit message to mention this puppet patch depends on having the file being made available in the deploy repo ( https://ge" [puppet] - 10https://gerrit.wikimedia.org/r/166610 (owner: 10Subramanya Sastry) [08:44:02] !log powercycle ms-be1007, no ssh and no console [08:44:09] Logged the message, Master [08:44:12] oh [08:44:14] :( [08:44:29] sadness indeed :( [08:46:17] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 2.44 ms [08:46:44] RECOVERY - very high load average likely xfs on ms-be1007 is OK: OK - load average: 16.36, 3.76, 1.24 [08:47:45] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [08:49:54] RECOVERY - Swift HTTP backend on ms-fe2004 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.098 second response time [08:49:55] RECOVERY - Swift HTTP backend on ms-fe2002 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.101 second response time [08:50:04] RECOVERY - Swift HTTP backend on ms-fe2003 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.104 second response time [08:50:16] RECOVERY - LVS HTTP IPv4 on ms-fe.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.105 second response time [08:50:38] RECOVERY - Swift HTTP backend on ms-fe2001 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.104 second response time [08:51:23] those recoveries are unrelated of course, there was an acl missing from the container used by the nagios check [08:52:00] 
<_joe_> eheh [08:57:40] (03CR) 10Filippo Giunchedi: [C: 031] "ack, thanks John for the explanation!" [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. Lewis) [09:01:07] (03CR) 10JanZerebecki: [C: 031] ssl_ciphersuite - add new compat mode [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [09:20:20] (03PS1) 10Glaisher: Enable wgCopyUploadsFromSpecialUpload on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166964 (https://bugzilla.wikimedia.org/71897) [09:21:14] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [500.0] [09:22:27] PROBLEM - Host tmh1002 is DOWN: PING CRITICAL - Packet loss = 100% [09:24:58] ori: I have conducted my survey on virt package declarations vs name them in the role (trebuchet changeset discussion) and the users agree that just doing it in the role is better. I'll look at your gerrit changeset here in a bit [09:26:40] apergos: cool, thanks very much for checking with folk! I haven't had a chance to test my latest PS so it may be buggy. [09:27:23] ok, I'll keep that in mind [09:28:37] anyone investigating tmh1002? [09:29:10] ok, I logged in [09:30:00] !log powercycling tmh1002, unresponsive, stuck, no vga output [09:30:08] Logged the message, Master [09:32:49] The disk drive for /tmp is not ready yet or not present. [09:32:49] Continue to wait, or Press S to skip mounting or M for manual recovery [09:32:58] RECOVERY - Host tmh1002 is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms [09:33:27] interesting [09:33:31] tmh1002 doesn't have a separate /tmp [09:33:57] PROBLEM - MySQL Replication Heartbeat on db1061 is CRITICAL: CRIT replication delay 28809 seconds [09:34:08] PROBLEM - MySQL Slave Delay on db1061 is CRITICAL: CRIT replication delay 28819 seconds [09:34:38] oh, raid's resyncing [09:34:52] [>....................] 
resync = 0.0% (136768/975652672) finish=18562.7min speed=875K/sec [09:34:55] lolol [09:35:53] springle: hey [09:36:01] springle: db1061 errors just a minute ago [09:36:14] paravoid: yep [09:36:15] (slave delay) [09:36:21] phone buzzed me [09:36:24] haha really? [09:36:37] I'm impressed :) [09:37:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [09:38:38] SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10] [09:38:46] I love the caps [09:39:11] ACKNOWLEDGEMENT - MySQL Replication Heartbeat on db1061 is CRITICAL: CRIT replication delay 29119 seconds Sean Pringle schema change blocking [09:39:12] ACKNOWLEDGEMENT - MySQL Slave Delay on db1061 is CRITICAL: CRIT replication delay 29068 seconds Sean Pringle schema change blocking [09:39:32] (03PS1) 10Gilles: Revert "Revert "Prerender thumbnails at upload time on all wikis except commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166968 [09:39:51] gi11es: rationale? [09:40:09] paravoid: the code has been fixed to not generate noisy 500s anymore [09:40:26] oh Revert Revert [09:40:29] just saw that [09:40:40] those 500s were harmless in the first place, but I understand it made monitoring difficult [09:41:01] yup this has been a long-standing issue [09:41:25] thanks :) [09:41:38] I worry about the prerender tbh [09:42:05] usual worries -- disk space, i/o [09:42:11] but I guess it's worth a try [09:42:48] PROBLEM - very high load average likely xfs on ms-be1003 is CRITICAL: CRITICAL - load average: 232.33, 125.69, 59.97 [09:44:00] it's a sort of trial anyway, because we have to redo the logic based on rectangular buckets instead of width-based in both media viewer and the prerendering [09:44:27] how are we going to cleanup the width-based thumbs? 
:) [09:45:16] when we get to thumbnails that expire when they're not used, I guess [09:45:36] what this trial will tell us is how much of an impact prerendering new uploads has, compared to re-processing existing files [09:46:09] the latter means creating a job, a bunch of extra work, etc. for the real deal. and maybe in practice doing just new uploads will reduce the bulk of the render-on-miss [09:46:12] we just don't know [09:46:17] godog: ping [09:46:17] paravoid: ping detected, please leave a message! [09:46:20] haha [09:46:44] [4899575.904105] BUG: soft lockup - CPU#10 stuck for 22s! [xfsaild/sdn3:1148] [09:46:47] grumble [09:47:11] once we know that in a few weeks (how much there is to gain for prerendering new uploads and not processing the backlog of files), I'm fine with pulling this width-based prerendering [09:47:38] PROBLEM - NTP on tmh1002 is CRITICAL: NTP CRITICAL: Offset unknown [09:47:45] because I don't expect the multimedia team to have the bandwidth to do the proper rectangular-based bucketing until next year [09:47:51] paravoid: :( 1003 eh? [09:48:04] unless we get someone else on the team, but no word on that at the moment [09:48:15] yeah [09:48:22] that's the SSD [09:48:43] the only certainty is that we're losing our PM, which means more PM-ish work for me and less time writing code... [09:49:42] paravoid: the "fix" afaict is just to reboot the machine when xfs gets into a funny state [09:50:58] sounds like my level of ops skills [09:51:47] RECOVERY - NTP on tmh1002 is OK: NTP OK: Offset -0.001677393913 secs [09:51:54] hehe gi11es "have you tried turning off and on again?" 
[09:52:07] I'll do that, but this isn't the first time this happens on this box [09:52:24] !log rebooting ms-be1003, sdn3/xfs troubles [09:52:31] Logged the message, Master [09:53:00] indeed, it happens from time to time on random ms-be boxes (hence the lame alarm), the trace hints at some locks too [09:53:37] that alarm is also how we discovered that you can't have commas in alert descriptions [09:53:44] (true story) [09:54:13] (03PS3) 10ArielGlenn: trebuchet: derive the grain name from the repo name [puppet] - 10https://gerrit.wikimedia.org/r/166736 (owner: 10Ori.livneh) [09:54:18] PROBLEM - swift-object-replicator on ms-be1003 is CRITICAL: Connection refused by host [09:54:27] PROBLEM - SSH on ms-be1003 is CRITICAL: Connection refused [09:54:28] PROBLEM - swift-object-updater on ms-be1003 is CRITICAL: Connection refused by host [09:54:47] PROBLEM - swift-account-reaper on ms-be1003 is CRITICAL: Connection refused by host [09:54:58] PROBLEM - check if salt-minion is running on ms-be1003 is CRITICAL: Connection refused by host [09:54:58] PROBLEM - swift-account-replicator on ms-be1003 is CRITICAL: Connection refused by host [09:55:10] PROBLEM - swift-account-auditor on ms-be1003 is CRITICAL: Connection refused by host [09:55:28] PROBLEM - swift-container-auditor on ms-be1003 is CRITICAL: Connection refused by host [09:55:28] PROBLEM - puppet last run on ms-be1003 is CRITICAL: Connection refused by host [09:55:47] PROBLEM - swift-container-updater on ms-be1003 is CRITICAL: Connection refused by host [09:55:47] PROBLEM - swift-container-server on ms-be1003 is CRITICAL: Connection refused by host [09:55:48] also, word of warning, since the fix for the 500s is going to be deployed in today's train for wikipedias, I didn't bother backporting it for the SWAT that happens 3 hours earlier [09:56:06] so do expect a spike of harmless 500s between the SWAT and the deploy train [09:56:09] PROBLEM - swift-container-replicator on ms-be1003 is CRITICAL: Connection refused by 
host [09:56:09] PROBLEM - DPKG on ms-be1003 is CRITICAL: Connection refused by host [09:56:10] PROBLEM - swift-object-auditor on ms-be1003 is CRITICAL: Connection refused by host [09:56:10] PROBLEM - check if dhclient is running on ms-be1003 is CRITICAL: Connection refused by host [09:56:10] PROBLEM - Disk space on ms-be1003 is CRITICAL: Connection refused by host [09:56:10] PROBLEM - swift-account-server on ms-be1003 is CRITICAL: Connection refused by host [09:56:17] PROBLEM - check configured eth on ms-be1003 is CRITICAL: Connection refused by host [09:56:17] PROBLEM - RAID on ms-be1003 is CRITICAL: Connection refused by host [09:56:27] PROBLEM - swift-object-server on ms-be1003 is CRITICAL: Connection refused by host [09:56:44] if that's bad, let me know and I'll add the backport to the SWAT [09:57:58] (03CR) 10ArielGlenn: "it needed changes to files/modules/deploy.py as well." [puppet] - 10https://gerrit.wikimedia.org/r/166736 (owner: 10Ori.livneh) [09:59:17] PROBLEM - Host ms-be1003 is DOWN: CRITICAL - Plugin timed out after 15 seconds [10:00:07] RECOVERY - very high load average likely xfs on ms-be1003 is OK: OK - load average: 5.30, 1.17, 0.38 [10:00:10] RECOVERY - check if salt-minion is running on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:00:10] RECOVERY - swift-account-replicator on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [10:00:10] RECOVERY - DPKG on ms-be1003 is OK: All packages OK [10:00:10] RECOVERY - swift-container-replicator on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [10:00:17] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms [10:00:17] RECOVERY - swift-account-auditor on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [10:00:17] RECOVERY - swift-object-auditor on ms-be1003 is OK: PROCS OK: 
3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [10:00:17] RECOVERY - check if dhclient is running on ms-be1003 is OK: PROCS OK: 0 processes with command name dhclient [10:00:17] RECOVERY - Disk space on ms-be1003 is OK: DISK OK [10:00:18] RECOVERY - swift-account-server on ms-be1003 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [10:00:27] RECOVERY - check configured eth on ms-be1003 is OK: NRPE: Unable to read output [10:00:37] RECOVERY - RAID on ms-be1003 is OK: OK: optimal, 14 logical, 14 physical [10:00:47] RECOVERY - swift-object-server on ms-be1003 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [10:00:47] RECOVERY - swift-object-replicator on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [10:00:47] RECOVERY - SSH on ms-be1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [10:00:48] RECOVERY - swift-object-updater on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [10:00:53] RECOVERY - swift-container-auditor on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:00:53] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:01:07] RECOVERY - swift-account-reaper on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [10:01:09] RECOVERY - swift-container-updater on ms-be1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [10:01:09] RECOVERY - swift-container-server on ms-be1003 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [10:05:52] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add initial Debian packaging [debs/contenttranslation/cg3] - 
10https://gerrit.wikimedia.org/r/163579 (owner: 10KartikMistry) [10:06:17] (03PS1) 10Glaisher: Add commons to wgImportSources for sewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166969 (https://bugzilla.wikimedia.org/72125) [10:10:17] gi11es: ack! [10:11:02] !log Updated our Jenkins Job Builder forked repository ( ee80dbc..7ad4386 ). No job configuration impact. [10:11:07] Logged the message, Master [10:15:08] PROBLEM - NTP on ms-be1003 is CRITICAL: NTP CRITICAL: Offset unknown [10:19:09] RECOVERY - NTP on ms-be1003 is OK: NTP OK: Offset -0.004389405251 secs [11:06:44] springle: still here? [11:07:32] springle: ah, nevermind, found it [11:17:18] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:18] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:18] PROBLEM - puppet last run on analytics1011 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:28] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:29] PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:39] PROBLEM - puppet last run on analytics1014 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:39] PROBLEM - puppet last run on db2011 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:39] PROBLEM - puppet last run on elastic1005 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:48] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:56] PROBLEM - puppet last run on protactinium is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:58] PROBLEM - puppet last run on wtp1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:17:58] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:08] PROBLEM - puppet last run on search1013 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:10] PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:11] 
PROBLEM - puppet last run on elastic1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:11] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:18] PROBLEM - puppet last run on pc1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:18] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:19] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:19] PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:19] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:30] PROBLEM - puppet last run on ssl1007 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:39] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:42] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:44] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:44] PROBLEM - puppet last run on mw1043 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:45] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 281570 msg: ocg_render_job_queue 1173 msg (=500 critical) [11:18:59] PROBLEM - puppet last run on db1006 is CRITICAL: CRITICAL: Puppet has 1 failures [11:18:59] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [11:19:18] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Puppet has 1 failures [11:19:18] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Puppet has 1 failures [11:19:18] PROBLEM - puppet last run on mw1122 is CRITICAL: CRITICAL: Puppet has 1 failures [11:19:19] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 282304 msg: ocg_render_job_queue 1468 msg (=500 critical) [11:19:28] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 282382 msg: ocg_render_job_queue 1512 msg (=500 critical) [11:19:29] 
PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:19:59] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures [11:31:09] RECOVERY - puppet last run on elastic1005 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [11:31:09] RECOVERY - puppet last run on protactinium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [11:31:31] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:31:31] RECOVERY - OCG health on ocg1003 is OK: OK: ocg_job_status 288151 msg: ocg_render_job_queue 63 msg [11:31:32] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [11:31:38] RECOVERY - OCG health on ocg1002 is OK: OK: ocg_job_status 288155 msg: ocg_render_job_queue 40 msg [11:31:48] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [11:31:49] RECOVERY - puppet last run on db1064 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:31:50] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [11:31:59] RECOVERY - OCG health on ocg1001 is OK: OK: ocg_job_status 288197 msg: ocg_render_job_queue 0 msg [11:32:00] RECOVERY - puppet last run on analytics1014 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [11:32:18] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:32:18] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:32:19] RECOVERY - puppet last run on wtp1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:32:19] RECOVERY - puppet last run on mw1186 is OK: OK: 
Puppet is currently enabled, last run 7 seconds ago with 0 failures [11:32:20] RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:32:20] RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:32:20] RECOVERY - puppet last run on elastic1002 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:32:29] RECOVERY - puppet last run on pc1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:32:39] RECOVERY - puppet last run on analytics1011 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [11:32:39] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:32:39] RECOVERY - puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [11:32:40] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [11:32:40] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [11:32:40] RECOVERY - puppet last run on ssl1007 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [11:32:48] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:32:53] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:33:00] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:33:18] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [11:33:19] RECOVERY - puppet last run on search1013 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 
failures [11:33:28] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:33:40] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [11:33:40] RECOVERY - puppet last run on mw1122 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [11:33:59] RECOVERY - puppet last run on mw1043 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:34:08] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [11:34:19] RECOVERY - puppet last run on db1006 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [11:53:14] (03CR) 10Calak: [C: 031] Add commons to wgImportSources for sewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166969 (https://bugzilla.wikimedia.org/72125) (owner: 10Glaisher) [13:00:05] K4: Respected human, time to deploy Fundraising (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141016T1300). Please do the needful. [14:08:45] <_joe_> !log depooling mw1189 from the api pool, reimaging with hhvm [14:08:56] Logged the message, Master [14:11:23] Wellp, time to look at SWAT. [14:12:04] I see I was correct in predicting that I would be SWATting gi11es's patch [14:38:50] _joe_: hiyaaa [14:38:55] i've got some time today for some app server installs [14:39:03] can I help? (you had an etherpad..right?) [14:39:40] <_joe_> ottomata: no, wait :) [14:39:43] <_joe_> maybe tomorrow [14:39:55] daw, i'm not working tomorrow [14:40:05] LAZY [14:40:07] <_joe_> ottomata: next week is going to be really full [14:40:17] of app server install time? 
[14:40:19] <_joe_> so you can recover next week [14:40:22] haha, ok [14:40:23] <_joe_> ottomata: yes :) [14:40:32] yeah, that's good too, i think i'm on RT duty next week too [14:40:42] so i can focus a little less on analytics stuff then and do some app servers too [14:42:29] ^demon|away: are you the one with the scheme for making a staff irc bouncer? [14:52:11] Ookayyyyy [14:52:22] aude, gi11es, are you both around to verify your patches? [14:53:16] marktraceur: ahoy [14:59:02] OK, we'll do the thumbnail thing first [14:59:17] aude needs to answer before I can commit to doing the badges stuff. [14:59:27] Ooh, and there are Glaisher patches - Glaisher, are you around to verify things? [14:59:37] yeah [14:59:40] No hoo either [15:00:04] manybubbles, anomie, ^d, marktraceur, gi11es: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141016T1500). Please do the needful. [15:00:31] Is hoo on the list? [15:00:37] hoom? [15:01:47] He'd be able to take aude's swat request at least [15:01:51] Right. [15:02:16] > Revert "Revert "Prerender..."" [15:02:17] Exciting [15:02:24] !log removed 'publicKey' and 'accessKey' from ldap user records -- they were obsolete and making everyone nervous [15:02:31] Logged the message, Master [15:02:43] Heh [15:03:04] (03CR) 10MarkTraceur: [C: 032] Revert "Revert "Prerender thumbnails at upload time on all wikis except commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166968 (owner: 10Gilles) [15:03:17] gi11es: OK, here we go. Do I need to do anything special, or just push out the change/ [15:03:19] (03Merged) 10jenkins-bot: Revert "Revert "Prerender thumbnails at upload time on all wikis except commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166968 (owner: 10Gilles) [15:03:20] ? [15:03:48] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [15:03:48] marktraceur: just push it [15:04:00] Cool beans.
[15:05:18] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: Re-enable prerendering of thumbnails for new files. (duration: 00m 05s) [15:05:20] gi11es: Time to test! [15:05:25] Logged the message, Master [15:05:31] alright, I shall do that [15:06:01] mw1189 returned 'no such file or directory' for sync-common, anyone noticed that before? [15:06:57] Glaisher: You're next! [15:07:04] \o/ [15:07:21] (03PS2) 10MarkTraceur: Enable wgCopyUploadsFromSpecialUpload on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166964 (https://bugzilla.wikimedia.org/71897) (owner: 10Glaisher) [15:07:29] (03CR) 10MarkTraceur: [C: 032] Enable wgCopyUploadsFromSpecialUpload on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166964 (https://bugzilla.wikimedia.org/71897) (owner: 10Glaisher) [15:07:34] marktraceur: here [15:07:36] (03Merged) 10jenkins-bot: Enable wgCopyUploadsFromSpecialUpload on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166964 (https://bugzilla.wikimedia.org/71897) (owner: 10Glaisher) [15:07:39] sorry, just got home [15:07:41] Awesome [15:07:50] aude: No problem, I can do Glaisher's and then you're up.
[15:07:55] ok [15:08:02] (03PS2) 10MarkTraceur: Add commons to wgImportSources for sewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166969 (https://bugzilla.wikimedia.org/72125) (owner: 10Glaisher) [15:08:13] aude: You've got the most complicated ones anyway [15:09:31] marktraceur: works according to my test on enwiki [15:09:39] (03CR) 10MarkTraceur: [C: 032] Add commons to wgImportSources for sewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166969 (https://bugzilla.wikimedia.org/72125) (owner: 10Glaisher) [15:09:41] Awesomesauce [15:09:46] (03Merged) 10jenkins-bot: Add commons to wgImportSources for sewikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166969 (https://bugzilla.wikimedia.org/72125) (owner: 10Glaisher) [15:09:51] gi11es: Let me know if you see any issues with it and I'll hop in to revert. [15:10:31] ok, the only expected thing is a rise in 500s until the deploy train (when the fix lands on wikipedias) [15:11:09] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Enable wgCopyUploadsFromSpecialUpload on testwiki, Add commons to wgImportSources for sewikimedia (duration: 00m 05s) [15:11:10] gi11es: That's exciting. [15:11:17] Logged the message, Master [15:11:20] gi11es: But it won't have any effect? [15:11:33] they're harmless 500s, just the php backend saying "nope, can't make an image that size" [15:11:44] Glaisher: Test, go, fight, team. :) [15:11:53] it just screws with monitoring 500s, which is why this got reverted last time [15:12:04] since it's only going to last 3 hours this time, I figured the backport would be overkill [15:12:19] Well, here's hoping [15:12:45] I'll keep a sharp eye on icinga-wm and this channel for the next three hours in any case [15:14:04] marktraceur: works [15:14:06] thanks a lot [15:14:12] aude: I assume I do the extension updates first, then the config changes [15:14:21] Glaisher: My pleasure! 
[15:14:26] order does not matter [15:14:42] extensions first is ok [15:15:35] OK [15:15:58] Waiting on Jenkins, should be fun [15:16:19] eh. the testwiki one does not. [15:16:20] Error fetching URL: Received HTTP code 403 from proxy after CONNECT [15:16:31] Glaisher: Worse than before? [15:16:41] didn't break anything [15:16:50] Well, I won't worry too much [15:16:54] Think you can fix 'er? [15:17:49] i'll look into it [15:19:08] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:19:21] <_joe_> \o/ [15:23:59] OK aude, I'm doing wmf2 first [15:24:08] ok [15:25:49] !log marktraceur Synchronized php-1.25wmf2/extensions/Wikidata/: [SWAT] [wmf2] Update CSS for Wikidata badges (duration: 00m 11s) [15:25:56] Logged the message, Master [15:25:57] checking [15:25:58] <_joe_> !log repooling mw1189 [15:26:06] Logged the message, Master [15:26:28] And wmf3 on the way [15:26:37] !log marktraceur Synchronized php-1.25wmf3/extensions/Wikidata/: [SWAT] [wmf3] Update CSS for Wikidata badges (duration: 00m 11s) [15:26:42] Logged the message, Master [15:26:54] it will need the config [15:27:00] but nothing obvious broken [15:27:02] (03CR) 10MarkTraceur: [C: 032] Define client CSS classes for new wikidata badges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166801 (owner: 10Hoo man) [15:27:06] Cool, config coming soon [15:27:10] (03Merged) 10jenkins-bot: Define client CSS classes for new wikidata badges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166801 (owner: 10Hoo man) [15:27:16] (03CR) 10MarkTraceur: [C: 032] Wikidata: Also search in NS_PROPERTY per default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166785 (owner: 10Hoo man) [15:27:24] (03Merged) 10jenkins-bot: Wikidata: Also search in NS_PROPERTY per default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166785 (owner: 10Hoo man) [15:27:29] wmf3 looks good [15:29:45] !log marktraceur Synchronized 
wmf-config/InitialiseSettings.php: [SWAT] Enable wgCopyUploadsFromSpecialUpload on testwiki (duration: 00m 05s) [15:29:51] Logged the message, Master [15:29:54] ...wait. [15:29:59] That was the wrong log entry [15:30:29] !log Sorry, that was in fact adding NS_PROPERTY to the search configuration, mistyped. [15:30:36] Logged the message, Master [15:31:13] !log marktraceur Synchronized wmf-config/Wikibase.php: [SWAT] Define client CSS classes for new wikidata badges (duration: 00m 05s) [15:31:15] aude: You should be set now! [15:31:19] ok [15:31:19] Logged the message, Master [15:33:33] marktraceur: looks good [15:33:37] aude: Excellent! [15:33:43] * aude sees stars next to featured lists [15:33:51] I now declare the GRAND AND BENEVOLENT SWAT HOUR to be completed! [15:33:58] thanks [15:34:06] Return to your homes [15:34:07] and search looks good [15:34:38] eeeeh [15:34:48] we might have a follow-up for search later [15:35:11] So demanding [15:35:17] But it won't be my problem, so demand away [15:35:27] lol [15:35:29] * Reedy hids [15:35:31] hides even [15:35:34] Too late! [15:35:45] it finds only properties and not items ....ooops [15:38:43] (03PS1) 10Giuseppe Lavagetto: mediawiki: avoid having mysqld running on trusty. [puppet] - 10https://gerrit.wikimedia.org/r/167007 [15:39:31] for wgNamespacesToBeSearchedDefault when it has '+eswiki' => array( 100 => 1, 104 => 1 ), [15:39:43] then that means in addition to the 'default' ones? [15:40:10] 'default' => array( 0 => 1, 1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 0, 6 => 0, 7 => 0, 8 => 0, 9 => 0, 10 => 0, 11 => 0, 12 => 0, 13 => 0 ), [15:40:12] aude: I believe so [15:40:23] then it seems odd and i shall investigate [15:40:29] yeah [15:40:37] aude: var_dump via eval.php to check? [15:40:54] ooooh [15:41:00] think i know [15:42:36] or not [15:44:06] (03CR) 10Chad: "Why on earth would that be a dependency?
*sigh*" [puppet] - 10https://gerrit.wikimedia.org/r/167007 (owner: 10Giuseppe Lavagetto) [15:44:09] array ( 120 => 1, 0 => 1, ) [15:44:13] looks correct [15:45:07] (03CR) 10Giuseppe Lavagetto: "It's not strictly a dependency. The mediawiki package recommends it, and it gets installed." [puppet] - 10https://gerrit.wikimedia.org/r/167007 (owner: 10Giuseppe Lavagetto) [15:51:20] (03PS2) 10Chad: Configure Elasticsearch for statsd [puppet] - 10https://gerrit.wikimedia.org/r/166690 [15:51:22] (03PS1) 10Chad: Decom deployment-elastic01 from beta [puppet] - 10https://gerrit.wikimedia.org/r/167010 [15:53:25] search works logged out [15:56:27] (03PS2) 10Chad: Decom deployment-elastic01 from beta [puppet] - 10https://gerrit.wikimedia.org/r/167010 [15:59:15] Reedy: my search preferences http://dpaste.com/2SNAS0G have 'searchNs0' => '', [15:59:23] i wonder how it got like that? [15:59:33] i am sure it used to work [16:00:04] ejegg, AndyRussG: Respected human, time to deploy Central Notice (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141016T1600). Please do the needful. [16:01:05] searchNs0 = null (empty) in the database [16:15:52] (03CR) 10Subramanya Sastry: "https://gerrit.wikimedia.org/r/#/c/166608/ is now merged." 
[puppet] - 10https://gerrit.wikimedia.org/r/166610 (owner: 10Subramanya Sastry) [16:22:34] (03CR) 10Manybubbles: [C: 031] Configure Elasticsearch for statsd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/166690 (owner: 10Chad) [16:22:51] (03CR) 10Manybubbles: [C: 031] Decom deployment-elastic01 from beta [puppet] - 10https://gerrit.wikimedia.org/r/167010 (owner: 10Chad) [16:25:39] (03CR) 10Chad: Configure Elasticsearch for statsd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/166690 (owner: 10Chad) [16:31:06] (03PS1) 10BBlack: Disable SSLv3 completely [puppet] - 10https://gerrit.wikimedia.org/r/167015 [16:38:41] PROBLEM - DPKG on tungsten is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:41:50] RECOVERY - DPKG on tungsten is OK: All packages OK [16:43:40] (03CR) 10Filippo Giunchedi: [C: 031] Configure Elasticsearch for statsd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/166690 (owner: 10Chad) [16:48:49] (03PS1) 10Ori.livneh: Apt::Conf['no-recommends'] -> Package <| provider == 'apt' |> [puppet] - 10https://gerrit.wikimedia.org/r/167020 [16:50:24] So I'm trying to do a Central Notice deployment, and I seem to have lost some privileges since I last deployed Sept. 15th [16:50:47] (03PS4) 10Ori.livneh: trebuchet: derive the grain name from the repo name [puppet] - 10https://gerrit.wikimedia.org/r/166736 [16:50:49] I can't +2 my submodule update, and I can't get from the bastion onto tin [16:51:20] Did I do something bad? [16:51:47] <_joe_> ori: so we already do 'no-recommends', just we didn't enforce it to happen before apt was running? [16:51:51] <_joe_> LOL [16:51:51] (03CR) 10Ori.livneh: [C: 032 V: 032] trebuchet: derive the grain name from the repo name [puppet] - 10https://gerrit.wikimedia.org/r/166736 (owner: 10Ori.livneh) [16:52:05] ejegg: Cast out the heretic! :) [16:52:12] * marktraceur looks [16:52:22] ejegg: What bastion are you using? [16:52:33] bast1001 - did that change? 
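The '+eswiki' question from earlier in the log (whether a '+'-prefixed per-wiki key adds its namespaces on top of the 'default' entry rather than replacing it, which aude's eval.php result of array ( 120 => 1, 0 => 1, ) suggests) can be modelled with a short sketch. This is an illustrative Python model of that merge behaviour only, not MediaWiki's actual SiteConfiguration code; the setting name and values are taken from the discussion above.

```python
# Rough model of per-wiki settings resolution: a '+wiki' key merges its
# entries on top of 'default', while a plain 'wiki' key replaces the
# default outright. Illustrative sketch, not the real implementation.
def resolve_setting(setting, wiki):
    default = setting.get('default', {})
    if '+' + wiki in setting:
        merged = dict(default)
        merged.update(setting['+' + wiki])
        return merged
    return setting.get(wiki, default)

wgNamespacesToBeSearchedDefault = {
    'default': {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0,
                7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0},
    '+eswiki': {100: 1, 104: 1},
}

resolved = resolve_setting(wgNamespacesToBeSearchedDefault, 'eswiki')
print(resolved[0], resolved[100], resolved[104])  # 1 1 1 -- defaults plus the extras
```

Under this model a wiki with no override simply gets the 'default' dict back, which matches the "in addition to the 'default' ones" reading.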
[16:52:39] PROBLEM - nutcracker port on mw1189 is CRITICAL: Cannot assign requested address [16:52:45] _joe_: yep [16:52:54] ejegg: Don't think so, I'm getting through on it [16:53:00] <_joe_> wut? [16:53:02] ejegg: Can you log in to bast1001 though? [16:53:29] i mean, i ssh -A bast1001 and get in fine, then ssh -A tin gets me permission denied (publickey) [16:53:36] Hm. [16:53:48] RECOVERY - nutcracker port on mw1189 is OK: TCP OK - 0.000 second response time on port 11212 [16:53:57] which definitely wasn't the case on 9/15 [16:54:02] That also works for me [16:54:22] gwicke: yt? [16:57:30] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Puppet has 1 failures [16:57:45] ottomata: pong [16:57:51] PROBLEM - puppet last run on mw1104 is CRITICAL: CRITICAL: Puppet has 1 failures [16:57:56] (03PS2) 10Ottomata: [WIP] Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [16:57:58] (03CR) 10Ottomata: [WIP] Initial commit of Cassandra puppet module (032 comments) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [16:58:08] gwicke: ok uHm [16:58:09] oh yeah [16:58:14] (03PS1) 10Ori.livneh: Revert "trebuchet: derive the grain name from the repo name" [puppet] - 10https://gerrit.wikimedia.org/r/167022 [16:58:20] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:20] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:20] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:20] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:21] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:21] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:28] (03CR) 10Ori.livneh: [C: 032 V: 032] Revert "trebuchet: derive the grain name from the repo name" [puppet] - 
10https://gerrit.wikimedia.org/r/167022 (owner: 10Ori.livneh) [16:58:29] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:30] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:31] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:32] q about libjna-java [16:58:34] and libjemalloc1 [16:58:37] ejegg: What are the first 8 or so characters of your public key? [16:58:39] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:41] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures [16:58:41] does cassandra just automatically use them if they are installed? [16:58:48] the puppet failures were my change, no actual production impact [16:58:48] one sec, let me find that [16:58:50] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 2 failures [16:58:57] i reverted so they'll go away [16:58:58] ejegg: Actually, give me a paste with the whole thing [16:59:00] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:00] PROBLEM - puppet last run on mw1113 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:01] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:09] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:10] PROBLEM - puppet last run on mw1037 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:10] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:50] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:50] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:50] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 1 failures [16:59:55] ah, the cassandra package depends on libjna-java [16:59:57] so that will be installed. 
[16:59:59] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:04] what about libjemalloc1 [17:00:06] gwicke: ^ [17:00:11] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:11] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:11] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:15] marktraceur: http://pastebin.com/4HscLZ1c [17:00:15] ottomata: IIRC it'll use it if installed [17:00:20] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:21] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:30] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:30] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:30] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:31] hm, ook [17:00:39] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:39] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:39] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:49] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: Puppet has 2 failures [17:00:50] Looks right to me, ejegg. 
[17:00:51] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:51] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Puppet has 1 failures [17:00:51] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: Puppet has 2 failures [17:01:00] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:00] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:01] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:01] PROBLEM - puppet last run on elastic1009 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:01] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:08] ottomata: cassandra tries to do as much off-heap allocation as possible to avoid GC scaling limits [17:01:09] ejegg: Can you pastebin your ssh config too? [17:01:10] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:10] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:19] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:21] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:29] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:29] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:01:29] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:30] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:39] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:39] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 6 failures [17:01:39] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:42] PROBLEM - puppet 
last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 2 failures [17:01:49] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Puppet has 1 failures [17:01:59] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 1 failures [17:02:00] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 1 failures [17:02:08] ottomata: in early cassandra 2.0.x versions the packaged jna wasn't picked up properly either, but I think that's fixed now [17:02:09] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 1 failures [17:02:35] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 1 failures [17:02:39] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 1 failures [17:02:39] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 2 failures [17:02:59] marktraceur: D'oh! I hadn't ssh-added the key, just unlocked it for the first hop. with ssh-add I'm onto tin. Sorry for wasting your time! [17:03:01] ottomata: the cassandra 2.1.0 package does not depend on jna [17:03:16] Ah! [17:03:17] oh it does not? i'm looking at 2.0.10 [17:03:18] i think [17:03:21] ejegg: No problem [17:03:41] ottomata: 2.0.9 seems to list it [17:04:34] marktraceur: Can I ask for another minute of your time to +2 for a submodule bump? https://gerrit.wikimedia.org/c/167017 [17:04:50] ottomata: so might be worth installing explicitly, unless it's not needed any more in 2.1.0 [17:05:01] https://gerrit.wikimedia.org/r/167017 [17:05:18] ejegg: You're deploying it right now? [17:05:20] Wait. [17:05:29] You have access to tin but not +2 in deploy branches [17:05:31] That's wrong [17:05:31] um, hoping to [17:05:43] yeah, I def. had +2 in deploy branches for 1.24 [17:05:46] uhhhhhh ottomata can you fix this? [17:05:46] back in Sept [17:06:01] Multiple weirdnesses with ejegg's account. [17:06:04] waassup? [17:06:25] Oh, wait, let me try. ejegg needs +2 in deployment branches.
[17:06:28] it's a different pubkey for gerrit and for tin, could that be it? [17:06:37] ottomata: looking at https://issues.apache.org/jira/browse/CASSANDRA-5872, maybe it's bundled in 2.1.0 [17:07:10] ejegg: Try now [17:07:29] marktraceur: not sure if I am the right person to ask, (as I'm not totally sure what deployment branches are). not sure who or what is supposed to have access to them [17:07:34] marktraceur: still just +1 / 0 / -1 [17:07:53] ejegg: Hm. Maybe try logging in and out, I think that may be necessary with perms changes [17:07:58] will do [17:08:09] You're part of wmf-deployment now for sure [17:08:38] and... fixed! Thanks for all the help, marktraceur! [17:08:42] Woot [17:08:46] cool(?¿) :) [17:08:46] ottomata: so not installing it separately seems to be fine in both 2.0.x and 2.1.x; let's just double-check that all is well after doing the first install through puppet [17:08:48] I can do things, I'm a real boy [17:08:52] ok cool [17:09:04] hehe. So glad I grabbed 2 hrs for this slot!
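ejegg's two-hop login failure above came down to agent state: `ssh -A` forwards only keys already loaded into the local agent, so unlocking the key for the first hop is not enough — the forwarded agent on bast1001 had nothing to offer tin until `ssh-add` was run. A speculative ~/.ssh/config sketch for the same two-hop path; the host names come from the discussion, and your own setup (key names, usernames) may differ:

```
# ~/.ssh/config -- a hedged sketch, not an official recommended config.
Host bast1001
    HostName bast1001.wikimedia.org
    ForwardAgent yes        # forwards only keys the local agent already
                            # holds, so run `ssh-add` before connecting

Host tin
    HostName tin.eqiad.wmnet
    ProxyCommand ssh -W %h:%p bast1001   # tunnel through the bastion
```

With this in place `ssh tin` handles the bastion hop in one step; `ssh -W` requires OpenSSH 5.4 or later.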
[17:09:46] ejegg: If you need any more help don't hesitate to ask of course [17:09:50] thanks again [17:14:32] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:15:11] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:15:11] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:15:24] (03CR) 10GWicke: [WIP] Initial commit of Cassandra puppet module (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [17:15:31] RECOVERY - puppet last run on mw1113 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:15:42] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:16:01] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:16:01] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:16:06] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:16:06] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:16:06] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:16:11] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:16:11] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:16:21] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:16:22] RECOVERY - puppet last run on mw1110 is OK: OK: 
Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:16:22] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:16:22] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:16:41] RECOVERY - puppet last run on mw1104 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:16:42] RECOVERY - puppet last run on mw1037 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:16:43] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:17:02] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:17:22] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:17:22] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:17:32] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:17:32] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:17:41] RECOVERY - puppet last run on elastic1009 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:17:41] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:17:42] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [17:17:42] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:17:42] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures 
[17:17:42] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:17:42] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:17:43] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:17:52] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:17:52] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:18:02] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:18:02] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:18:03] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:18:03] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:18:11] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:18:11] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:18:21] gwicke: i'm trying to find some info about jemalloc and cassandra [17:18:21] RECOVERY - puppet last run on elastic1010 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:18:21] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:18:21] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:18:22] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 
0 failures [17:18:22] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:18:22] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:18:32] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:18:33] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:18:39] as far as I can tell, i've installed it, but cassandra is not using it? or maybe it does so more dynamically than I know about? [17:18:41] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:19:02] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:19:02] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [17:19:22] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:19:22] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet last ran 80977 seconds ago, expected 14400 [17:19:30] i'm trying to set various things in the cassandra-env.sh file that might make it load it [17:19:31] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:19:32] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:19:32] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:19:41] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [17:19:44] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last
run 61 seconds ago with 0 failures [17:19:52] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet last ran 81079 seconds ago, expected 14400 [17:20:02] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:20:03] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:20:12] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:20:22] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:22:03] gwicke: i'm also googling for info about how jemalloc helps with cassandra, is this documented somewhere? [17:29:01] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:29:09] !log depooled mw1114 for api [17:29:14] Logged the message, Master [17:29:33] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:29:43] !log ejegg Synchronized php-1.25wmf3/extensions/CentralNotice/: Update CentralNotice (duration: 00m 08s) [17:29:48] Logged the message, Master [17:30:21] ottomata: I'm told you are the one to ask about labcontrol2001: is it a live salt master actually talking to any minions or is it 'for future use' with a future codfw labs cluster, or something else entirely? [17:30:38] uhhh, apergos, i betcha someone told you to ask andrew [17:30:41] probably the other one [17:30:41] :) [17:30:58] woops [17:31:05] in fact I know which one and yet my brain [17:31:17] must have clocked out already :-D [17:34:09] hah :) [17:42:35] (03PS1) 10MaxSem: Add m.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/167027 (https://bugzilla.wikimedia.org/69765) [17:44:08] can someone take a look at ^^^ ? paravoid?
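One way to settle ottomata's question of whether the Cassandra JVM actually picked up jemalloc is to look for the library among the process's mapped files. A minimal Python sketch (a hypothetical helper, not anything in the puppet module) that greps /proc/&lt;pid&gt;/maps on Linux:

```python
def maps_mention_jemalloc(maps_text):
    """True if any mapping in /proc/<pid>/maps output names a jemalloc library."""
    return any("jemalloc" in line for line in maps_text.splitlines())

def pid_uses_jemalloc(pid):
    """Check a live Linux process, e.g. the Cassandra JVM's pid."""
    with open("/proc/%d/maps" % pid) as maps:
        return maps_mention_jemalloc(maps.read())
```

If libjemalloc never shows up even though the package is installed, the JVM was simply never told to load it, which matches the conclusion the channel reaches later.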
[17:46:31] (03PS2) 10MaxSem: Add m.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/167027 (https://bugzilla.wikimedia.org/69765) [17:47:22] !log ejegg Synchronized php-1.25wmf2/extensions/CentralNotice/: Update CentralNotice (duration: 00m 04s) [17:47:28] Logged the message, Master [17:48:11] (03PS3) 10Ottomata: [WIP] Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [17:50:12] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [17:50:15] PROBLEM - check_mysql on payments1002 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [17:50:16] PROBLEM - check_mysql on payments1003 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [17:50:40] grr. stupid package update. [17:50:49] aware & fixing ^^^ [17:54:41] (03CR) 10Faidon Liambotis: [C: 032] Add m.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/167027 (https://bugzilla.wikimedia.org/69765) (owner: 10MaxSem) [17:55:12] RECOVERY - check_mysql on payments1004 is OK: Uptime: 98787 Threads: 2 Questions: 64988 Slow queries: 110 Opens: 444 Flush tables: 1 Open tables: 61 Queries per second avg: 0.657 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:55:29] RECOVERY - check_mysql on payments1002 is OK: Uptime: 96821 Threads: 2 Questions: 678994 Slow queries: 3090 Opens: 286 Flush tables: 1 Open tables: 40 Queries per second avg: 7.012 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:55:29] RECOVERY - check_mysql on payments1003 is OK: Uptime: 98020 Threads: 2 Questions: 682945 Slow queries: 3214 Opens: 293 Flush tables: 1 Open tables: 46 Queries per second avg: 6.967 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:57:14] thanks paravoid:) [17:57:30] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 909.06665 [17:58:21] :) [18:00:04] Reedy, greg-g: Respected human, time to 
deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141016T1800). Please do the needful. [18:00:38] OO! [18:01:55] (03PS1) 10MaxSem: Enable mobile redirect for old Wikisource (http://wikisource.org) [puppet] - 10https://gerrit.wikimedia.org/r/167032 (https://bugzilla.wikimedia.org/69765) [18:04:35] really? wikisource has no subdomain? [18:05:00] oh wikisource is the crazy case where both $language.wikisource.org and wikisource.org coexist [18:05:21] oh, I guess not [18:05:40] eh, the comment could've been clearer indeed [18:06:42] just add it to the first regexp [18:06:55] alongside mediawiki & wikimediafoundation.org [18:07:20] it'll match wikisource.org, and it'll also match www.wikisource.org rather than going to wikisource.org first, then redirecting again [18:07:48] right [18:08:31] (03PS2) 10MaxSem: Enable mobile redirect for old Wikisource (http://wikisource.org) [puppet] - 10https://gerrit.wikimedia.org/r/167032 (https://bugzilla.wikimedia.org/69765) [18:09:03] ugh, will cause m.m.wikisource [18:09:30] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [18:10:20] iiiiinteresting [18:11:38] (03PS3) 10MaxSem: Enable mobile redirect for old Wikisource (http://wikisource.org) [puppet] - 10https://gerrit.wikimedia.org/r/167032 (https://bugzilla.wikimedia.org/69765) [18:11:54] I miss my test suite for redirector.c [18:13:14] ooh good catch [18:15:33] ottomata: jemalloc is a slightly faster memory allocator than the libc malloc [18:15:40] gwicke: i found it [18:15:44] patch updated [18:15:47] ottomata: cassandra transparently falls back to libc malloc if jemalloc is not installed [18:15:56] naw, you need to specify it [18:16:09] http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__memory_allocator [18:16:23] hm, okay [18:16:39] eating some lunch and will start to look into the security
stuff [18:17:44] has been a while since I last looked into it ;) [18:18:48] ottomata: I believe the main complication about the security stuff might be getting the CA into the Java trust store [18:18:48] or whatever it's actually called [18:19:19] keystore & truststore it seems [18:21:30] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 882.200012 [18:21:45] hm hm mhm [18:21:58] !log restarted varnishkafka on cp3021 [18:22:07] Logged the message, Master [18:24:41] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [18:26:04] pfft, it's reporting a negative value? http://ganglia.wikimedia.org/latest/graph.php?r=4hr&z=xlarge&hreg[]=%28amssq|cp%29.%2B&mreg[]=kafka.varnishkafka\.kafka_drerr.per_second&gtype=line&title=kafka.varnishkafka\.kafka_drerr.per_second&aggregate=1 [18:27:34] i just restarted it, jgage [18:27:37] so it reset the seq to 0 [18:28:02] ok, so just a monitoring artifact [18:35:33] (03PS1) 10Yurik: Zero: unified everything, no opera for 293-41 [puppet] - 10https://gerrit.wikimedia.org/r/167037 [18:35:33] bblack, :) ^ [18:37:19] where can i see mobile cache fragmentation levels [18:39:47] paravoid and mark, you will be happy to know it too :) ^ [18:39:58] what we have been all waiting for for a very very very long time [18:41:12] \o/ [18:41:39] MaxSem: m.wikisource.org redirects to wikisource.org for me [18:43:33] yurikR: now, can we get rid of all this logic entirely? [18:44:01] paravoid, not yet -- we are working on switching our analytics to hadoop, once that is done, yes [18:44:24] that's why we have to do it in the delivery phase - to set the x-analytics correctly [18:44:43] analytics is the only component that needs it now, not the backend [18:46:30] chasemp: , yt? [18:47:45] I see an andrewbogott...
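The negative value jgage spotted is the classic artifact of rating a monotonic counter across a process restart: varnishkafka came back with its sequence at 0, so a naive delta goes negative. A defensive version of the rate calculation looks like this (an illustrative sketch only, not Ganglia's actual code):

```python
def rate(prev, curr, dt):
    """Per-second rate from a monotonic counter sampled dt seconds apart.
    A decrease means the process restarted and the counter reset to ~0,
    so treat the current value as the delta instead of going negative."""
    delta = curr - prev
    if delta < 0:  # counter reset after restart
        delta = curr
    return delta / float(dt)
```

With this clamp, a restart shows up as a small positive blip rather than a nonsensical negative spike on the graph.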
[18:47:52] I'm told you are the one to ask about labcontrol2001: is it a live salt master actually talking to any minions or is it 'for future use' with a future codfw labs cluster, or something else entirely? [18:48:06] (see, otto mata, asking the right andrew this time :-D) [18:48:34] apergos: No minions so far. It's a future codfw labs controller, as you say [18:48:42] (03PS4) 10MaxSem: Enable mobile redirect for old Wikisource (http://wikisource.org) [puppet] - 10https://gerrit.wikimedia.org/r/167032 (https://bugzilla.wikimedia.org/69765) [18:48:51] Right now it's doing ldap and dns for production though [18:48:52] paravoid, ^^ should fix it [18:49:34] blergh, right [18:50:00] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 425.533325 [18:50:14] apergos: why do you ask? [18:50:23] hm hm hm [18:50:33] esams bits is having trouble producing to kafka [18:50:34] no likey [18:51:12] MaxSem: so... www.wikisource.org from a phone :) [18:51:21] -> www.m.wikisource.org -> NXDOMAIN [18:51:49] which is the case right now I suppose? [18:52:39] yup, it should be gone after the varnish fix [18:52:50] it won't? [18:53:05] the www redirect is after the \w+ one [18:53:29] (03PS1) 10Ottomata: Grant access to Magnus Edenhill on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/167042 [18:53:30] mindfuck [18:53:33] haha [18:53:45] actually [18:53:47] move it up again [18:53:53] (03PS1) 10Gage: Enable GELF for MRAppManager [puppet/cdh] - 10https://gerrit.wikimedia.org/r/167043 [18:54:19] and let's add |m to the second regexp? [18:55:13] (03CR) 10Gage: [C: 032] Enable GELF for MRAppManager [puppet/cdh] - 10https://gerrit.wikimedia.org/r/167043 (owner: 10Gage) [18:56:01] (03PS2) 10Ottomata: Grant access to Magnus Edenhill on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/167042 [18:57:04] mark: https://gerrit.wikimedia.org/r/#/c/167042/2 [18:57:09] ok to give magnus access to analytics1003?
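The breakage paravoid and MaxSem are chasing comes down to rule order in the mobile-redirect host rewrites: with the generic language-subdomain rule first, "www" is treated as a language code and www.wikisource.org becomes www.m.wikisource.org (NXDOMAIN), and a bare m. host becomes m.m.wikisource.org. A toy first-match-wins model (hypothetical patterns, not the actual VCL/redirector rules) shows why moving the www rule up, and letting it also swallow m., fixes both:

```python
import re

def mobilize(host, rules):
    """Apply the first matching (pattern, replacement) rule to a hostname."""
    for pattern, repl in rules:
        m = re.match(pattern, host)
        if m:
            return m.expand(repl)
    return host

# Buggy order: the generic subdomain rule fires on "www" (and "m") first.
buggy = [
    (r"^(\w+)\.wikisource\.org$", r"\1.m.wikisource.org"),
    (r"^(www\.)?wikisource\.org$", "m.wikisource.org"),
]

# Fixed order: handle bare/www/m hosts before the language subdomains.
fixed = [
    (r"^(www\.|m\.)?wikisource\.org$", "m.wikisource.org"),
    (r"^(\w+)\.wikisource\.org$", r"\1.m.wikisource.org"),
]
```

Under `buggy`, `mobilize("www.wikisource.org", buggy)` yields the dead www.m.wikisource.org; under `fixed`, www and m. collapse to m.wikisource.org while en.wikisource.org still maps to en.m.wikisource.org.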
[18:58:30] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 418.333344 [18:58:53] (03PS5) 10MaxSem: Enable mobile redirect for old Wikisource (http://wikisource.org) [puppet] - 10https://gerrit.wikimedia.org/r/167032 (https://bugzilla.wikimedia.org/69765) [18:59:28] MaxSem: that's right, right? :) [18:59:48] (03PS1) 10Gage: Enable GELF for MRAppManager part 2 [puppet] - 10https://gerrit.wikimedia.org/r/167044 [19:00:03] paravoid, prrrrrrobably [19:01:03] heh [19:03:14] (03PS2) 10Gage: Enable GELF for MRAppManager part 2 [puppet] - 10https://gerrit.wikimedia.org/r/167044 [19:03:19] _joe_: i think the issue is unrelated to memory, but io deadlocks when shelling out to /usr/bin/diff [19:05:00] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:07:19] andrewbogott: going to be doing a salt upgrade and want to know what services I might impact there (seems like none) [19:08:40] PROBLEM - DPKG on palladium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:09:14] (03PS1) 10Reedy: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167048 [19:09:16] (03PS1) 10Reedy: testwiki to 1.25wmf4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167049 [19:09:18] (03PS1) 10Reedy: Wikipedias to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167050 [19:10:07] (03CR) 10Reedy: [C: 032] Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167048 (owner: 10Reedy) [19:10:08] apergos: Yep, I agree -- none :) [19:10:15] (03Merged) 10jenkins-bot: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167048 (owner: 10Reedy) [19:10:20] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: puppet fail [19:10:20] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: puppet fail [19:10:23] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: puppet fail [19:10:23] PROBLEM - puppet
last run on lvs3004 is CRITICAL: CRITICAL: Puppet has 29 failures [19:10:30] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: puppet fail [19:10:30] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: puppet fail [19:10:30] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail [19:10:31] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 75 failures [19:10:31] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: puppet fail [19:10:40] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: puppet fail [19:10:40] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: puppet fail [19:10:40] RECOVERY - DPKG on palladium is OK: All packages OK [19:10:40] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: puppet fail [19:10:50] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: puppet fail [19:10:50] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Puppet has 38 failures [19:10:52] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 48 failures [19:10:52] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 47 failures [19:10:53] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: puppet fail [19:10:53] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Puppet has 43 failures [19:10:53] PROBLEM - puppet last run on mw1049 is CRITICAL: CRITICAL: Puppet has 119 failures [19:10:53] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Puppet has 31 failures [19:10:53] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 45 failures [19:10:54] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 21 failures [19:10:56] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: puppet fail [19:10:56] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail [19:10:56] PROBLEM - puppet last run on search1005 is CRITICAL: CRITICAL: Puppet has 86 failures 
[19:10:56] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: puppet fail [19:11:00] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 99 failures [19:11:00] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: puppet fail [19:11:00] PROBLEM - puppet last run on plutonium is CRITICAL: CRITICAL: Puppet has 50 failures [19:11:00] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: puppet fail [19:11:00] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 45 failures [19:11:01] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 49 failures [19:11:01] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 11 failures [19:11:02] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: puppet fail [19:11:02] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: puppet fail [19:11:03] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: puppet fail [19:11:03] PROBLEM - puppet last run on argon is CRITICAL: CRITICAL: puppet fail [19:11:04] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: puppet fail [19:11:04] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 119 failures [19:11:05] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: puppet fail [19:11:10] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Puppet has 44 failures [19:11:10] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: Puppet has 46 failures [19:11:10] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: puppet fail [19:11:11] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: puppet fail [19:11:11] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 29 failures [19:11:12] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: puppet fail [19:11:12] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: puppet fail [19:11:12] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 
20 failures [19:11:12] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: Puppet has 99 failures [19:11:13] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: puppet fail [19:11:13] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: puppet fail [19:11:14] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: puppet fail [19:11:14] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [19:11:15] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [19:11:15] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 52 failures [19:11:16] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 119 failures [19:11:16] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: puppet fail [19:11:17] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [19:11:22] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet has 43 failures [19:11:22] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: puppet fail [19:11:22] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: puppet fail [19:11:22] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 119 failures [19:11:23] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: puppet fail [19:11:23] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: puppet fail [19:11:23] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: Puppet has 46 failures [19:11:30] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: puppet fail [19:11:31] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: puppet fail [19:11:31] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet has 54 failures [19:11:31] PROBLEM - puppet last run on search1002 is CRITICAL: CRITICAL: Puppet has 86 failures [19:11:31] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [19:11:31] PROBLEM - puppet last run on mc1014 is 
CRITICAL: CRITICAL: Puppet has 44 failures [19:11:31] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: Puppet has 119 failures [19:11:32] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: puppet fail [19:11:32] PROBLEM - puppet last run on virt1001 is CRITICAL: CRITICAL: Puppet has 46 failures [19:11:33] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: puppet fail [19:11:33] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 119 failures [19:11:34] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: puppet fail [19:11:34] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: puppet fail [19:11:35] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: puppet fail [19:11:35] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Puppet has 53 failures [19:11:36] PROBLEM - puppet last run on search1023 is CRITICAL: CRITICAL: puppet fail [19:11:36] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: puppet fail [19:11:37] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 47 failures [19:11:37] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: puppet fail [19:11:40] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: puppet fail [19:11:42] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: puppet fail [19:11:43] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: Puppet has 119 failures [19:11:50] PROBLEM - puppet last run on ssl1009 is CRITICAL: CRITICAL: puppet fail [19:11:50] PROBLEM - puppet last run on elastic1014 is CRITICAL: CRITICAL: puppet fail [19:11:50] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: puppet fail [19:11:50] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [19:11:50] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: puppet fail [19:11:51] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: puppet fail [19:11:51] PROBLEM - puppet last run on cp1048 is CRITICAL: 
CRITICAL: puppet fail [19:11:52] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: puppet fail [19:11:52] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: puppet fail [19:11:53] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: puppet fail [19:11:53] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: puppet fail [19:11:54] PROBLEM - puppet last run on mw1030 is CRITICAL: CRITICAL: puppet fail [19:11:54] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: puppet fail [19:11:55] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: puppet fail [19:12:00] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: puppet fail [19:12:00] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: puppet fail [19:12:11] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: puppet fail [19:12:11] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: puppet fail [19:12:11] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: puppet fail [19:12:12] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [19:12:20] PROBLEM - puppet last run on search1024 is CRITICAL: CRITICAL: puppet fail [19:12:40] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [19:14:00] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 573.06665 [19:14:21] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [19:14:32] (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167049 (owner: 10Reedy) [19:14:40] (03Merged) 10jenkins-bot: testwiki to 1.25wmf4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167049 (owner: 10Reedy) [19:14:43] ^ that recovery, i ran puppet manually there [19:14:50] dunno why icinga hiccup though [19:14:55] saw no error [19:14:57] <_joe_> ok so it's a puppetmaster failure maybe [19:15:31]
Oct 16 19:09:03 ms-be2007 puppet-agent[26691]: (/File[/var/lib/puppet/lib]) Failed to generate additional resources using 'eval_generate': Connection refused - connect(2) [19:15:35] Oct 16 19:09:03 ms-be2007 puppet-agent[26691]: (/File[/var/lib/puppet/lib]) Could not evaluate: Connection refused - connect(2) Could not retrieve file metadata for puppet://puppet/plugins: Connection refused - connect(2) [19:16:20] [Thu Oct 16 19:09:32 2014] [notice] Apache/2.2.22 (Ubuntu) Phusion_Passenger/2.2.11 mod_ssl/2.2.22 OpenSSL/1.0.1 configured -- resuming normal operations [19:16:48] from /usr/lib/phusion_passenger/passenger-spawn-server:61 [19:17:02] seems like it crashed again but was already restarted [19:17:04] I saw some mention of puppetmaster failures happening occasionally? [19:17:40] yes, i restarted apache at least twice in the past after passenger crashes [19:17:41] <_joe_> paravoid: yes [19:17:50] then it spams apache error log [19:17:53] <_joe_> paravoid: the solution is, upgrade to trusty and ruby 1.9 [19:17:55] but after a restart all is back to normal [19:18:05] <_joe_> or, add one more puppetmaster [19:18:09] but i didnt actually restart it this time.. just saw it in the logs [19:18:21] <_joe_> we surely need one in codfw I think [19:18:22] paravoid: s/occasionally/at least once a day for weeks or months now/ [19:19:03] @palladium:~# grep -i passenger /var/log/apache2/error.log [19:19:33] <_joe_> we just need someone with the time to create a codfw puppetmaster with trusty :) [19:19:42] !log reedy Started scap: testwiki to 1.25wmf4 [19:19:47] Logged the message, Master [19:19:48] ori: fine, give us back _joe_ and I'm sure he'll fix it in a day ;) [19:19:57] *** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) [19:20:01] * ori clutches to _joe_'s leg. [19:20:23] <_joe_> oh I shouldn't have imagined that. 
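mutante's `grep -i passenger` on palladium can be taken a step further: since Apache 2.2 error-log lines carry a bracketed timestamp (as in the lines quoted above), counting matching lines per calendar day would put numbers behind ori's "at least once a day" estimate. A rough sketch; the log-line format assumption is mine, based only on the samples pasted in-channel:

```python
import re
from collections import Counter

# Matches the Apache 2.2 error-log timestamp prefix, e.g.
# "[Thu Oct 16 19:09:32 2014] ..."
APACHE_TS = re.compile(r"^\[(\w{3}) (\w{3}) (\d+) [\d:]+ (\d{4})\]")

def crashes_per_day(log_lines, needle="Passenger"):
    """Count error-log lines mentioning `needle`, grouped by calendar day."""
    days = Counter()
    for line in log_lines:
        if needle in line:
            m = APACHE_TS.match(line)
            if m:
                # Key by "Mon DD YYYY"
                days["%s %s %s" % (m.group(2), m.group(3), m.group(4))] += 1
    return days
```

Feeding it the output of that grep would show whether the passenger crashes are really daily, and whether they cluster around puppet-run spikes like the one above.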
[19:22:52] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:24:00] (03PS2) 10Reedy: Moved zerowiki to group0 depl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166800 (owner: 10Yurik) [19:24:07] (03CR) 10Reedy: [C: 032] Moved zerowiki to group0 depl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166800 (owner: 10Yurik) [19:24:14] (03Merged) 10jenkins-bot: Moved zerowiki to group0 depl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166800 (owner: 10Yurik) [19:27:21] MaxSem: so [19:27:45] MaxSem: this change needs to be split :) [19:28:00] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [19:28:05] or else we'll do infinite loops depending on which varnish('s cache) you hit [19:28:14] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [19:28:14] RECOVERY - puppet last run on plutonium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [19:28:14] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [19:28:20] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [19:28:21] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [19:28:28] so we need to first deploy the mobile-frontend change [19:28:30] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:28:31] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:28:31] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [19:28:40] RECOVERY - puppet last run on search1002 is OK: OK: 
Puppet is currently enabled, last run 37 seconds ago with 0 failures [19:28:41] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:28:41] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [19:28:41] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [19:28:41] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [19:28:50] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:28:50] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [19:28:51] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:28:51] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:28:51] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:28:51] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [19:29:00] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [19:29:10] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:29:10] (sometimes i kill the bot in these cases, sometimes people say they actually like seeing the recoveries) [19:29:13] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:29:13] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 
failures [19:29:13] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [19:29:13] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [19:29:14] and wait for those redirects to expire from cache [19:29:14] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:29:14] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [19:29:14] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:29:15] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [19:29:15] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [19:29:16] RECOVERY - puppet last run on argon is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [19:29:16] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:29:17] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [19:29:17] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [19:29:22] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [19:29:22] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [19:29:22] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [19:29:22] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0
failures [19:29:30] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [19:29:30] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [19:29:31] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:29:31] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [19:29:31] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [19:29:31] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [19:29:31] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [19:29:32] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:29:32] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [19:29:41] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [19:29:42] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [19:29:46] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:29:50] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [19:29:50] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [19:29:51] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [19:29:51] RECOVERY - puppet last run on hafnium is 
OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:29:51] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [19:29:54] icinga-wm: sssh [19:30:33] (it will come back by itself on next puppet run but by then most of the spam is usually over) [19:32:38] recovered [19:34:01] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [19:35:01] (03CR) 10Dzahn: [C: 032] "if you're a native speaker of ar, hr, it no, pt, sk or vi, feel free to cross check these" [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. Lewis) [19:35:37] mutante: +2 to that :p [19:36:47] (03CR) 10Dzahn: [V: 032] "ah, needs manual verify for jenkins. we can always submit fixes" [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. Lewis) [19:37:00] (03PS2) 10BBlack: Zero: unified everything, no opera for 293-41 [puppet] - 10https://gerrit.wikimedia.org/r/167037 (owner: 10Yurik) [19:37:10] (03CR) 10BBlack: [C: 032 V: 032] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/167037 (owner: 10Yurik) [19:37:27] bblack, can we monitor cache hit ratio somewhere? [19:37:42] hit ratio yes, fragmentation no [19:38:32] JohnLewis: puppet applied on the server .. looks good to you ? [19:38:56] mutante: looks good [19:39:04] (looked at ar) [19:39:18] great [19:39:20] and confirmed with vi :) [19:40:09] godog/joe: how does "Lista di Mailing" sound in Italian [19:41:00] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:41:14] godog/joe: blame the volunteers if it is wrong :D [19:41:20] yurikR: e.g. 
this for cp4011 in ulsfo: http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp4011.ulsfo.wmnet&mreg%5B%5D=frontend.cache_(hit%7Cmiss)%24&gtype=line&glegend=show&aggregate=1 [19:41:31] JohnLewis: thanks for the contribution [19:41:42] mutante: is there anything realistic we can do about https://bugzilla.wikimedia.org/show_bug.cgi?id=37817 besides poking the listadmins to just change it themselves? [19:42:29] or eqiad might be a better measure: http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp1060.eqiad.wmnet&mreg%5B%5D=frontend.cache_(hit%7Cmiss)%24&gtype=line&glegend=show&aggregate=1 [19:43:31] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1032.866699 [19:44:04] esams: http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp3014.esams.wmnet&mreg%5B%5D=frontend.cache_(hit%7Cmiss)%24&gtype=line&glegend=show&aggregate=1 [19:44:18] JohnLewis: it makes me think of that idea of a "list of list admins". i honestly don't know. can we confirm if that really fixes it on one list? [19:44:32] JohnLewis: i remember there once was also an upstream issue with encoding in mailman [19:45:14] looks at that attachment now [19:45:40] (I updated cp4011 a few minutes ago, and 3014 in esams just updated right now) [19:45:44] mutante: we have a list of list admins now - although we have to rely on list admins checking the email :) [19:46:15] mutante: https://lists.wikimedia.org/mailman/listinfo/listadmins -- on a side note, I'm a listadmin yet don't have the listadmin password (yay) [19:46:27] mutante: "lista di mailing" sounds awful [19:47:11] JohnLewis: i think it's indeed better to ask the admins to change their descriptions.. but .. at least it's not like one has to find all those email addresses.
there is automatically an -owner@lists address for each list and the admins will all receive that [19:47:33] Nemo_bis: heh, what would you say :) [19:48:11] JohnLewis: mailto: listadmins-owner@lists.wikimedia.org ?:) [19:48:19] mutante: yeah true. I'll add it to my todo-type-list-thingie and see what I can do about spamming people [19:48:28] mutante: huh? [19:49:17] JohnLewis: that way Thehelpfulone will get a message to share the password with you [19:49:27] because he's the other admin [19:49:30] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:49:38] I'll just email him the old fashioned way [19:49:58] i could reset it, but we would have to mail him one way or another [19:49:59] mutante: lista di discussione [19:50:06] at least to tell him about it being changed [19:50:37] omg a list for listadmin, poor me [19:51:17] i think it's actually good. there have been several occasions when people wanted to discuss how to configure their lists [19:51:54] it's set up in a rather hacky way though. Every -owner list is subbed so.. [19:52:43] paravoid, can't we just purge the redirects instead of waiting (30 days?) [19:52:48] andrewbogott: I'm about now dude [19:53:05] chasemp: ? [19:53:17] mutante: it's non-urgent until we need to spam everyone - so until that day comes, I'll leave him to try and get back to me. [19:53:21] thought you were looking for me, maybe wrong person :D [19:53:54] The idea is also all those with the power to create lists (well, those who do it regularly I should say) are also added to the -owner list giving a way to easily prod us [19:54:16] JohnLewis: +1 [19:54:45] JohnLewis: RT support was appreciated as well [19:54:48] Currently it is just THO and me - adding Philippe and James would be the next step :) [19:55:30] are you going to want phabricator tasks for new list requests?
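The -owner convention discussed above means every Mailman list automatically exposes a contact address that reaches all of its admins. A minimal sketch of deriving those addresses from list names (the list names and the helper are illustrative, not any real tooling):

```python
# Sketch: derive the auto-created -owner contact address for each
# Mailman list, per the listname-owner@domain convention mentioned
# above. The list names used here are just examples.

def owner_addresses(list_names, domain="lists.wikimedia.org"):
    """Map each list name to its -owner alias, which reaches all admins."""
    return [f"{name}-owner@{domain}" for name in list_names]

print(owner_addresses(["listadmins", "wikitech-l"]))
# → ['listadmins-owner@lists.wikimedia.org', 'wikitech-l-owner@lists.wikimedia.org']
```

This is why "one has to find all those email addresses" isn't necessary: the alias set is fully determined by the list names.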
[19:55:38] (03PS3) 10Ottomata: Grant access to Magnus Edenhill on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/167042 [19:55:46] We handle it via BZ currently so yeah [19:55:56] *nod* [19:58:06] (03CR) 10Ottomata: [C: 032] Grant access to Magnus Edenhill on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/167042 (owner: 10Ottomata) [19:59:11] (03PS1) 10Ottomata: Adding edenhill to bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/167057 [19:59:24] (03CR) 10Ottomata: [C: 032 V: 032] Adding edenhill to bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/167057 (owner: 10Ottomata) [20:00:04] gwicke, cscott, subbu: Respected human, time to deploy Parsoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141016T2000). Please do the needful. [20:00:10] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: puppet fail [20:01:10] RECOVERY - check if salt-minion is running on analytics1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:06:04] !log reedy Finished scap: testwiki to 1.25wmf4 (duration: 46m 21s) [20:06:09] Logged the message, Master [20:06:21] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
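The scap logged above ("testwiki to 1.25wmf4") is a branch switch: wikiversions is a dbname-to-deployment-branch mapping that gets edited for the affected group of wikis and then rebuilt into wikiversions.cdb and synced. A hedged sketch using a plain dict as a stand-in for the real wikiversions file (group membership and branch names here are illustrative):

```python
# Sketch: the dbname -> branch mapping behind wikiversions.cdb.
# A switch like "testwiki to 1.25wmf4" updates the affected entries;
# the real process then rebuilds the CDB and syncs it to the cluster.

def switch_group(wikiversions, group, new_branch):
    """Point every wiki dbname in `group` at `new_branch`."""
    for dbname in group:
        wikiversions[dbname] = new_branch
    return wikiversions

versions = {"testwiki": "php-1.25wmf3", "enwiki": "php-1.25wmf3"}
switch_group(versions, ["testwiki"], "php-1.25wmf4")
print(versions["testwiki"])  # → php-1.25wmf4
```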
[20:06:38] (03PS2) 10Reedy: Wikipedias to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167050 [20:06:41] !log reedy Synchronized database lists: (no message) (duration: 00m 19s) [20:06:43] (03CR) 10Reedy: [C: 032] Wikipedias to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167050 (owner: 10Reedy) [20:06:46] Logged the message, Master [20:06:51] (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167050 (owner: 10Reedy) [20:07:22] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf3 [20:07:28] Logged the message, Master [20:07:56] (03PS1) 10Reedy: group0 to 1.24wmf4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167061 [20:09:01] (03CR) 10Reedy: [C: 032] group0 to 1.24wmf4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167061 (owner: 10Reedy) [20:09:08] (03Merged) 10jenkins-bot: group0 to 1.24wmf4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167061 (owner: 10Reedy) [20:10:23] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf4 [20:10:27] Logged the message, Master [20:11:49] (03PS2) 10Reedy: Enable uploads on he.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166887 (https://bugzilla.wikimedia.org/72060) (owner: 10Nemo bis) [20:11:54] (03CR) 10Reedy: [C: 032] Enable uploads on he.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166887 (https://bugzilla.wikimedia.org/72060) (owner: 10Nemo bis) [20:12:01] (03Merged) 10jenkins-bot: Enable uploads on he.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166887 (https://bugzilla.wikimedia.org/72060) (owner: 10Nemo bis) [20:13:08] (03PS3) 10Reedy: Lots of rights changes for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166687 (https://bugzilla.wikimedia.org/72055) (owner: 10Calak) [20:13:12] (03CR) 10Reedy: [C: 032] Lots of rights changes for huwiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/166687 (https://bugzilla.wikimedia.org/72055) (owner: 10Calak) [20:13:19] (03Merged) 10jenkins-bot: Lots of rights changes for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166687 (https://bugzilla.wikimedia.org/72055) (owner: 10Calak) [20:14:20] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [20:15:03] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 17s) [20:15:09] Logged the message, Master [20:15:36] !log reedy Synchronized database lists: (no message) (duration: 00m 18s) [20:15:41] Logged the message, Master [20:29:18] yurikR: we won't see full impact for a while I think, but already I think the leading edge of the miss rate in esams is dropping off a bit: [20:29:20] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&title=hit%3Amiss+on+mobile+frontend&vl=N%2Fs&x=&n=&hreg[]=cp3014.esams.wmnet&mreg[]=frontend.cache_%28hit%7Cmiss%29&gtype=line&glegend=show&aggregate=1&embed=1&_=1413488892071 [20:29:41] (relative to the usual pattern I mean) [20:30:11] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [20:32:17] thx bblack, will look at it tomorrow i guess :) [20:35:10] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: puppet fail [20:40:18] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 205 seconds ago with 0 failures [20:43:47] (03PS1) 10Calak: Create "templateeditor" user group on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167065 (https://bugzilla.wikimedia.org/72146) [20:47:01] (03CR) 10John F.
Lewis: [C: 031] ssl_ciphersuite - add new compat mode [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [20:53:42] (03CR) 10Dzahn: [C: 032] ssl_ciphersuite - add new compat mode [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [20:56:12] (03CR) 10Dzahn: "thanks for the many +1's, now we can use this, starting with gerrit" [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [21:01:08] PROBLEM - nutcracker port on mw1189 is CRITICAL: Cannot assign requested address [21:02:08] RECOVERY - nutcracker port on mw1189 is OK: TCP OK - 0.000 second response time on port 11212 [21:10:33] (03PS4) 10Ottomata: [WIP] Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [21:11:12] (03PS5) 10Ottomata: [WIP] Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [21:11:33] (03CR) 10Ottomata: [WIP] Initial commit of Cassandra puppet module (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [21:14:10] (03PS1) 10John F. Lewis: gerrit: Disable SSLv3 [puppet] - 10https://gerrit.wikimedia.org/r/167105 [21:14:37] (03Restored) 10Dzahn: add annual.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/165927 (owner: 10Dzahn) [21:15:45] (03CR) 10Dzahn: "yes, this is what uses Change-Id: I0cff61a2d86060e8c now" [puppet] - 10https://gerrit.wikimedia.org/r/167105 (owner: 10John F. Lewis) [21:18:26] (03CR) 10Dzahn: [C: 032] gerrit: Disable SSLv3 [puppet] - 10https://gerrit.wikimedia.org/r/167105 (owner: 10John F. Lewis) [21:19:40] JohnLewis: i suggest we get you added to list of trusted users, that way we can get jenkins verify your stuff [21:20:17] that's the reason why jenkins will just +1 and insist on "needs verify" [21:20:45] mutante: yeah. 
If you want - I can upload a patch for it and you +1 it since I assume it is out of your area (unless I'm mistaken :) ) [21:20:53] (03CR) 10Dzahn: [V: 032] gerrit: Disable SSLv3 [puppet] - 10https://gerrit.wikimedia.org/r/167105 (owner: 10John F. Lewis) [21:21:08] JohnLewis: if you already know where it is.. sure go ahead [21:21:33] there are 2 places in 1 file [21:21:43] i kept forgetting one of the 2 each time [21:21:50] :D [21:22:01] That file is funny :P [21:22:16] Just grep for an existing user's email and see where it is referenced [21:22:16] the regex, heh, yea, i wish it was one line per user [21:23:19] :p [21:23:43] hoo: I just grep'd addshore's email because I know he's listed :p [21:23:53] (03CR) 10Dzahn: "now used by Gerrit in I134fe31d0faca2d8" [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [21:24:02] So am I :P [21:24:18] (not using my WMDE email) [21:24:20] (03CR) 10Dzahn: "- SSLProtocol all -SSLv2" [puppet] - 10https://gerrit.wikimedia.org/r/167105 (owner: 10John F. Lewis) [21:24:33] hoo: I didn't even know you had a wmde email tbh [21:25:31] JohnLewis: Everyone working for WMDE has one... I guess :P [21:25:43] hoo: use it then! :p [21:27:31] meh [21:29:59] mutante: https://gerrit.wikimedia.org/r/#/c/167107/ [21:30:13] The user account "Hoo" does not exist. [21:30:49] mutante: huh? :D [21:32:33] hoo: on wikimedia.de, but not entirely serious :) [21:33:22] mutante: :P Never had an account on the public wiki... I had one to our private one, once, but forgot my user name :D [21:34:14] gerrrit bot stopped talking?
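The review comment quoted above ("- SSLProtocol all -SSLv2") shows the Apache directive being replaced: disabling SSLv3 means the exclusion list must carry both -SSLv2 and -SSLv3. A toy lint for that check (this is an assumption-laden sketch, not how the puppet ssl_ciphersuite module actually validates anything):

```python
# Sketch: verify an Apache "SSLProtocol all -X -Y" directive excludes
# the protocols we want dead. The old "SSLProtocol all -SSLv2" fails
# once SSLv3 is on the banned list; "all -SSLv2 -SSLv3" passes.

def excludes(directive, banned=("SSLv2", "SSLv3")):
    """Return True iff every banned protocol is explicitly removed."""
    tokens = directive.split()
    assert tokens[0] == "SSLProtocol" and tokens[1] == "all"
    removed = {t[1:] for t in tokens[2:] if t.startswith("-")}
    return all(b in removed for b in banned)

print(excludes("SSLProtocol all -SSLv2"))         # → False
print(excludes("SSLProtocol all -SSLv2 -SSLv3"))  # → True
```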
[21:34:39] hoo: i see :) [21:35:01] JohnLewis: ironically if you add yourself to the list of those users, that change itself will also not be verified yet [21:35:20] Gotta love irony [21:35:20] because yaml lint clearly is dangerous :P [21:35:25] well, not really ironically :) [21:36:33] (03CR) 10Dzahn: [C: 032] add annual.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/165927 (owner: 10Dzahn) [21:53:18] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 1 failures [22:01:13] (03PS2) 10BBlack: Disable SSLv3 completely [puppet] - 10https://gerrit.wikimedia.org/r/167015 [22:06:29] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:52:38] (03CR) 10Faidon Liambotis: [C: 031] Disable SSLv3 completely [puppet] - 10https://gerrit.wikimedia.org/r/167015 (owner: 10BBlack) [22:54:32] (03CR) 10Faidon Liambotis: "This leaves the "mediawiki" package installed, which is… funny, to say the least. I'd prefer a solution that would result in that not bein" [puppet] - 10https://gerrit.wikimedia.org/r/167007 (owner: 10Giuseppe Lavagetto) [23:00:05] RoanKattouw, ^d, marktraceur, MaxSem: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141016T2300). [23:00:46] nothing to deploy, whee [23:02:32] MaxSem: You can call yourself lucky that jenkins is too slow for our backports today [23:02:35] :D [23:02:44] WIN [23:02:53] I'll deploy them myself once stuff is ready... maybe 30-45 more minutes [23:03:00] greg-g: Is that ok with you? ^ [23:03:24] * greg-g is not here [23:04:22] Ok, if greg is not here, we can do whatever we want!!11 :D [23:05:25] hoo: I do anyway ;) [23:05:49] Ok... we need even more time to get stuff ready...
yay :P [23:07:40] !log RT - removed global permission for privileged users to create tickets - should not affect anyone because users are either not privileged or get this from other groups - need it to be flexible about readonly queues in RT - let me know if any issues [23:07:47] Logged the message, Master [23:08:12] chasemp: ^ got it [23:08:29] i was able to make one queue readonly while not affecting the others after that [23:41:42] (03PS1) 10coren: [WIP] Labs: puppetize gridengine [puppet] - 10https://gerrit.wikimedia.org/r/167126 [23:42:52] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Labs: puppetize gridengine [puppet] - 10https://gerrit.wikimedia.org/r/167126 (owner: 10coren) [23:45:11] (03PS2) 10coren: [WIP] Labs: puppetize gridengine [puppet] - 10https://gerrit.wikimedia.org/r/167126 [23:46:22] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Labs: puppetize gridengine [puppet] - 10https://gerrit.wikimedia.org/r/167126 (owner: 10coren) [23:47:12] (03PS3) 10coren: [WIP] Labs: puppetize gridengine [puppet] - 10https://gerrit.wikimedia.org/r/167126
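The trusted-users whitelist discussed earlier keeps all the emails inside one big regex that has to be edited "in 2 places in 1 file", which is why people wished for one line per user. A hedged sketch of generating such an alternation from a one-address-per-line list (the addresses, and the idea that generation rather than hand-editing is acceptable here, are assumptions):

```python
import re

# Sketch: build the email-whitelist regex from a plain list instead of
# hand-editing the alternation. Addresses below are made up.

def build_whitelist(emails):
    """Compile an anchored alternation matching exactly these emails."""
    alternation = "|".join(re.escape(e) for e in sorted(emails))
    return re.compile(f"^(?:{alternation})$")

trusted = build_whitelist(["addshore@example.org", "hoo@example.org"])
print(bool(trusted.match("hoo@example.org")))   # → True
print(bool(trusted.match("evil@example.org")))  # → False
```

`re.escape` matters here: emails contain dots, which are regex metacharacters if pasted in raw.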