[00:01:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:02:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [00:04:04] PROBLEM - SSH on fenari is CRITICAL: Connection refused [00:08:24] !log maxsem synchronized php-1.22wmf10/extensions/GeoData/solrupdate.php [00:08:35] Logged the message, Master [00:10:05] !log maxsem synchronized php-1.22wmf9/extensions/GeoData/solrupdate.php [00:10:16] Logged the message, Master [00:15:00] LeslieCarr: I think that we can close https://rt.wikimedia.org/Ticket/Display.html?id=3066 [00:15:07] everything is stopped on spence at this point [00:15:13] it's about to decom [00:15:24] monitoring requires refinement, but ya know [00:15:31] that's a whole different can of worms [00:15:39] New patchset: Dzahn; "re-kill spence from site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73370 [00:15:40] notpeter: [00:16:09] eh, it's already also in decom [00:16:09] heh [00:16:10] awesome [00:16:18] yeah, but should clean it up from site.pp [00:16:28] I'm sure it's hardcoded all over the place.... [00:16:39] ah, that makes me notice the comment on srv193 [00:16:45] that is also outdated ..now [00:17:06] git grep spence (notpeter) all over [00:17:21] don't grep me, bro ;) [00:17:22] but yeah [00:17:23] oh i did [00:17:25] :~/wmf/puppet/manifests$ git grep spence [00:17:25] decommissioning.pp:'spence', [00:17:27] it should only be in site.pp now [00:17:33] after my change above [00:17:45] Risker, you're getting extra technical if you're joining this channel now :O [00:17:48] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [00:18:18] THO, I just logged in and saw the server lag flag. At least now I know where to get the downlow :) [00:21:43] New patchset: Dzahn; "add comment to srv193 that test has been switched to mw1017" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73371 [00:22:10] :-) [00:22:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73371 [00:25:01] !request help Thehelpfulone RT-4880 [00:25:06] :D [00:26:10] I was going to follow up with someone else on that first mutante - I'm not sure whether we necessarily /need/ to mark lists as private - we've not had a problem with it in at least the past couple of years (AFAIK) and it would mean crossposting to lists for announcements etc would be more work [00:26:17] mutante: Might be worth adding something to the srv193 MOTD to say don't use me or something [00:27:06] mutante, can you give me a list n that RT ticket of the lists that are marked as private? [00:27:09] in that* [00:27:45] Thehelpfulone: yea, after moving it to core queue [00:29:12] Reedy: alright, done :) [00:29:22] :) [00:30:10] mutante, sure [00:30:47] Thehelpfulone: done [00:30:57] thanks [00:39:35] Dzahn, Ryan_Lane thanks for test.w.o. I updated https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Trying_tin.27s_code_on_testwiki to mw1017, but don't fully understand that paragraph. [00:42:51] ah. cool. thanks [00:43:17] seems fine to me spagewmf [00:44:18] don't see hardcoded srv193 in /usr/local/bin/ on fenari where the sync scripts are either [00:44:33] mutante/dzahn (?!) I was hoping you could clear the fog around how to rebuild 18n cache on on test.w.o only :) [00:44:36] besidea apache-fast-test which i already fixed [00:46:16] spagewmf: oh.ehm.. 
afraid not really [00:46:53] but i didnt see that in that section [00:47:30] ah.. yea.. "only way to update these messages is to do a full scap" ..hmm [00:47:48] !log paused updating change_tags indexes on remaining wikis in bug 40867 comment 6 (adds done, drops todo) [00:47:59] Logged the message, Master [00:54:14] mutante, passed it on, I'll let you know (and update the RT ticket) when I have an answer :) [00:54:40] Thehelpfulone: thank you! and cu later then [00:55:02] no problem, when you get a minute perhaps you can !help with RT-2494 ;-) [00:55:26] By help I mean give your comments, I'll do all the hard work! [00:56:44] agree on streamlining it, feel free to remove Jimbo's access , hehehe [00:57:07] i wonder.. does he ever show up on #mediawiki nowadays? [00:58:08] -Notice- {from NickServ} Last seen : Dec 16 00:51:30 2008 (4 years, 29 weeks, 6 days, 01:06:26 ago) [00:58:17] :) [00:58:34] oh correction, that was the wrong nick, but -Notice- {from NickServ} User seen : Dec 01 19:05:06 2012 (31 weeks, 5 days, 06:53:15 ago) [00:59:22] that being said, i can't remember actually needing ops to "deal with trolls" as it says [00:59:39] but looking at it, keeping it updated and possibly removing people not working for us anymore makes sense [01:00:10] one thing you would want ops for is to set the topic, but since we are wiki we should just have mode -t and make it editable [01:00:19] ok, ttyl [01:07:52] spagewmf: (or anyone) you don't happen to roughly know the total number of server the WMF has, do you? [01:08:01] New patchset: Dzahn; "comment another apache module for duplicate definition, add FIXME" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73373 [01:08:07] physical servers, not virtual instances or anything like that [01:26:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:27:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.162 second response time [01:38:05] RECOVERY - SSH on fenari is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [01:56:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:57:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [01:59:08] Prodego: see http://ganglia.wikimedia.org/latest/ [01:59:16] notice hosts up/hosts down [01:59:30] of course, that's only the systems that are actually in use [01:59:38] we have some that aren't turned on yet [02:16:00] !log LocalisationUpdate completed (1.22wmf9) at Fri Jul 12 02:16:00 UTC 2013 [02:16:11] Logged the message, Master [02:25:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [02:34:20] !log LocalisationUpdate completed (1.22wmf10) at Fri Jul 12 02:34:20 UTC 2013 [02:34:30] Logged the message, Master [02:46:49] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: No successful Puppet run in the last 10 hours [02:47:23] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 12 02:47:23 UTC 2013 [02:47:34] Logged the message, Master [02:52:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 
- 336 bytes in 0.124 second response time [02:56:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:57:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [03:23:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [03:28:56] Elsie: bah, you know what I mean [03:50:52] PROBLEM - Puppet freshness on mw1001 is CRITICAL: No successful Puppet run in the last 10 hours [03:58:54] :-) [03:59:36] Scrollback in here was cute today. Lots of collaboration and updating docs and shit. It was nice to read. [04:01:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:02:52] PROBLEM - Puppet freshness on db78 is CRITICAL: No successful Puppet run in the last 10 hours [04:03:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [04:11:10] Elsie: :P [04:22:57] New patchset: Ori.livneh; "EventLogging module: tabs -> spaces; no significant changes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73387 [04:32:42] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: No successful Puppet run in the last 10 hours [04:39:42] PROBLEM - Puppet freshness on rubidium is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on manganese is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on mw1007 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on mw1043 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on mw1171 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on ekrem is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on mw121 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:42] PROBLEM - Puppet freshness on mw1197 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:43] PROBLEM - Puppet freshness on mw1041 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:43] PROBLEM - Puppet freshness on mw1063 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:44] PROBLEM - Puppet freshness on mw1087 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:44] PROBLEM - Puppet freshness on solr1003 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:45] PROBLEM - Puppet freshness on search1024 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:45] PROBLEM - Puppet freshness on mw58 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:46] PROBLEM - Puppet freshness on sq76 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:46] PROBLEM - Puppet freshness on srv292 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:47] PROBLEM - Puppet freshness on search18 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:47] PROBLEM - Puppet freshness on solr3 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:48] PROBLEM - Puppet freshness on mw1210 is CRITICAL: No successful Puppet run in the last 10 hours [04:40:48] PROBLEM - Puppet 
freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [… 04:40:49 to 04:59:42: icinga-wm floods the channel with well over a hundred further alerts of the form "PROBLEM - Puppet freshness on <host> is CRITICAL: No successful Puppet run in the last 10 hours", covering hosts across the fleet: the mw*, db*, cp*, sq*, srv*, amssq*, amslvs*, analytics*, search*, solr*, es*, mc*, ms-be*/ms-fe*, wtp*, lvs*, ssl*, virt*, rdb*, labstore*, pc* pools plus assorted single hosts (stat1, antimony, brewster, calcium, capella, colby, dataset1001, dataset2, ersch, helium, labsdb1001, ms10, potassium, praseodymium, professor, searchidx2, snapshot4, tarin) …]
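The alerts above come from a freshness check: a host goes CRITICAL once it has not completed a successful Puppet run in the last 10 hours. Below is a minimal sketch of what such a check can look like in the Nagios plugin convention (exit 0 = OK, 2 = CRITICAL), assuming the agent's /var/lib/puppet/state/last_run_summary.yaml file as the freshness marker; this is illustrative only, not the actual mechanism feeding icinga-wm here.

    #!/usr/bin/env python
    # Hypothetical Puppet freshness check (Nagios plugin convention).
    # Assumes the puppet agent rewrites last_run_summary.yaml after every run;
    # not the check actually used to generate the alerts in this log.
    import os
    import sys
    import time

    STATE_FILE = "/var/lib/puppet/state/last_run_summary.yaml"
    MAX_AGE = 10 * 3600  # 10 hours, matching the alert text

    def main():
        try:
            age = time.time() - os.path.getmtime(STATE_FILE)
        except OSError:
            print("CRITICAL: no Puppet run recorded (%s missing)" % STATE_FILE)
            return 2
        if age > MAX_AGE:
            print("CRITICAL: No successful Puppet run in the last 10 hours "
                  "(last run %.1f hours ago)" % (age / 3600.0))
            return 2
        print("OK: last Puppet run %.1f hours ago" % (age / 3600.0))
        return 0

    if __name__ == "__main__":
        sys.exit(main())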
[05:00:11] New review: Ori.livneh; "Thanks for the review. Replies in-line; updated patch forthcoming." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71927 [… 05:00:42 to 05:05:42: the Puppet freshness flood continues (amssq*, analytics*, cp*, db*, es8, mc1003, ms1002, mw*, srv241, ssl3002, virt5, virt7, wtp1023, …) …] [05:05:42] PROBLEM -
Puppet freshness on cp3007 is CRITICAL: No successful Puppet run in the last 10 hours [05:06:12] New patchset: Ori.livneh; "EventLogging module: tabs -> spaces; no significant changes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73387 [05:06:12] New patchset: Ori.livneh; "Rewrite of EventLogging module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73388 [05:07:00] PROBLEM - Puppet freshness on amssq55 is CRITICAL: No successful Puppet run in the last 10 hours [05:07:00] PROBLEM - Puppet freshness on amssq57 is CRITICAL: No successful Puppet run in the last 10 hours [05:07:00] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: No successful Puppet run in the last 10 hours [05:07:00] PROBLEM - Puppet freshness on cp1003 is CRITICAL: No successful Puppet run in the last 10 hours [05:07:00] PROBLEM - Puppet freshness on cp1051 is CRITICAL: No successful Puppet run in the last 10 hours [05:07:32] oh, god damn it [05:07:40] i must have deleted the change-id line when amending the commit [05:08:33] Is it just me or does icinga-wm voice itself like a million times? [05:09:00] PROBLEM - Puppet freshness on db1020 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:00] PROBLEM - Puppet freshness on lvs1 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:00] PROBLEM - Puppet freshness on mc1009 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:00] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:00] PROBLEM - Puppet freshness on mw1111 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:00] PROBLEM - Puppet freshness on mw1124 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:00] PROBLEM - Puppet freshness on mw1130 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:01] PROBLEM - Puppet freshness on mw1125 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:01] PROBLEM - Puppet freshness on mw1161 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:02] PROBLEM - Puppet freshness on mw1147 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:02] PROBLEM - Puppet freshness on mw1190 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:03] PROBLEM - Puppet freshness on mw1214 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:03] PROBLEM - Puppet freshness on mw8 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:04] PROBLEM - Puppet freshness on mw38 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:04] PROBLEM - Puppet freshness on search36 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:05] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:05] PROBLEM - Puppet freshness on solr2 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:06] PROBLEM - Puppet freshness on sq51 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:06] PROBLEM - Puppet freshness on srv301 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:07] PROBLEM - Puppet freshness on wtp1011 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:07] PROBLEM - Puppet freshness on wtp1018 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:47] the channel is configured to auto-voice it on join, and it's flooding itself off and re-joining in a loop [05:10:00] PROBLEM - Puppet freshness on amssq44 is CRITICAL: No 
successful Puppet run in the last 10 hours [05:10:00] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: No successful Puppet run in the last 10 hours [05:10:00] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [05:10:00] PROBLEM - Puppet freshness on db1004 is CRITICAL: No successful Puppet run in the last 10 hours [05:10:00] PROBLEM - Puppet freshness on db43 is CRITICAL: No successful Puppet run in the last 10 hours [05:10:06] I only see it voicing, no parts or joins. [05:10:14] * Jasper_Deng thinks Elsie has conference mode on [05:10:18] I see the quits and joins [05:10:21] 22:10 icinga-wm has left IRC (Excess Flood) [05:10:21] 22:10 I only see it voicing, no parts or joins. [05:10:27] I don't think irssi has conference mode. [05:10:30] Oh. [05:10:35] I have it ignored. [05:10:38] Heh. [05:10:44] So I'm only seeing ChanServ actions. [05:10:48] Okay, that makes more sense. [05:10:49] right. [05:11:54] i think leslie is working on fixing it [05:12:22] !log restarting ircecho on neon [05:12:23] :p [05:12:34] Logged the message, Master [05:12:48] the /var/spool/snmpttt dir looks harmless this time.. that's all i know at this point [05:12:55] just a handful of files [05:13:31] (because that's been the issue in the past, it ran out of inodes) [05:19:24] i doesn't seem like a particularly helpful alert [05:20:00] PROBLEM - Puppet freshness on amssq34 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:00] PROBLEM - Puppet freshness on cp1002 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:00] PROBLEM - Puppet freshness on cp1012 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:00] PROBLEM - Puppet freshness on db34 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:00] PROBLEM - Puppet freshness on db1023 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:00] PROBLEM - Puppet freshness on es1005 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:00] PROBLEM - Puppet freshness on mc1005 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:01] PROBLEM - Puppet freshness on mw1139 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:01] PROBLEM - Puppet freshness on pc1003 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:02] PROBLEM - Puppet freshness on search1018 is CRITICAL: No successful Puppet run in the last 10 hours [05:20:02] PROBLEM - Puppet freshness on srv258 is CRITICAL: No successful Puppet run in the last 10 hours [05:21:00] PROBLEM - Puppet freshness on cp1046 is CRITICAL: No successful Puppet run in the last 10 hours [05:21:00] PROBLEM - Puppet freshness on db40 is CRITICAL: No successful Puppet run in the last 10 hours [05:21:00] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [05:21:00] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [05:21:00] PROBLEM - Puppet freshness on mw20 is CRITICAL: No successful Puppet run in the last 10 hours [05:21:00] PROBLEM - Puppet freshness on search14 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet 
freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:00] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [06:08:20] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:09:10] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [06:31:10] New patchset: Ori.livneh; "Rewrite of EventLogging module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71927 [07:31:06] evil puppet [07:31:28] hm? [07:32:27] torturing icinga-wm [07:33:16] oh; /ignore [08:31:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:32:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [08:38:05] apergos: I found out an old puppet patch related to reverting puppetmaster::self https://gerrit.wikimedia.org/r/#/c/64031/ [08:38:15] do we really need that ? [08:38:25] not that old :-D [08:38:47] well it depends if we expect to use environments any time soon [08:39:10] if it's going to be 6 months or longer then I will want to look at that again [08:39:22] k I am leaving that change open so [08:39:32] ok cool [08:43:02] um so at some point gluster home was set maybe to read-only... do you remember about that? ( hashar ) [08:43:14] the /home I think [08:43:21] to force people to migrate to nfs [08:43:28] yeah, I looked for it in puppet and didn't find it [08:43:44] the reason is that this has been undone in part as of yesterday (and again I don't know where) [08:44:01] apparently the mount are defined in ldap [08:44:04] at least I looked in git log and didn't see an obvious reference [08:44:12] ugh [08:44:13] I see [08:45:29] * apergos tries the labs admin log [08:46:08] nope [08:47:11] so wm-bot was gone again, I couldn't get on the instance, blah blah, it was determined that gluster was serving /home and was read only over there, they aren't ready to move to nfs because they have public_html on gluster too [08:47:37] so coren (maybe) set it to rw mount, at least there, I dunno about other projects, guess I could check [08:48:48] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours [08:49:18] I am not sure why wm-bot is on labs [08:49:25] since we rely on it [08:51:05] I guess that bots used on the projects are supposed to run from there [08:51:11] not really clear on the matter [08:51:53] yeah on bastion-restricted it's rw too now [08:52:00] so it must have been a global change [08:56:48] PROBLEM - Puppet freshness on mw56 is CRITICAL: No successful Puppet run in the last 10 hours [08:59:36] New patchset: Hashar; "contint: publish Zuul git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968 [09:32:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:33:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [10:18:20] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [10:39:43] New review: Nikerabbit; "(1 comment)" [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/71927 [10:42:36] !log Sending European wikidata and wikivoyage traffic to esams [10:42:47] Logged the message, Master [10:44:33] |log lunch time [10:49:02] mark: ? [10:49:12] yes? [10:49:16] * aude having trouble accessing wikidata [10:49:19] yep [10:49:23] i made a mistake in dns [10:49:26] so it's down for some people [10:49:29] ok [10:49:35] should be back in an hour I hope [10:49:36] ok, it's not me [10:49:38] ttl expiring [10:49:54] was sending to wikidata-lb.wikidata.org instead of wikidata-lb.wikimedia.org :( [10:49:58] ok ok [10:50:35] anyway, once it works again you'll get it from europe [10:50:39] ok [10:50:49] even if i'm proxying to the us? [10:51:13] what do you mean? [10:51:35] well, it's as if i am in new jersey :) [10:51:48] if you have a proxy for all your internet traffic via the US, then likely not ;) [10:51:49] usually, though sometimes i'm not proxying [10:51:52] ok [10:52:03] then it wouldn't make sense either [10:52:10] going via NJ and back to ams [10:52:20] yeah! [10:52:25] only to go back to DC as wikidata pages aren't really cached much anyway ;) [10:52:56] right [10:53:58] let me know when it works again for you [10:54:32] some people say it works [10:54:49] if i turn off my proxy it works [10:55:21] so means it does not work for north americans\ [10:57:41] no, it just doesn't work for people unlucky enough to have done a wikidata dns lookup in the few minutes that it was broken :( [10:57:46] ok [10:57:48] (or their ISP) [10:57:53] in theory it should work after 10 mins [10:57:59] makes sense [10:58:00] in practice, often it takes longer :( [10:58:02] * aude was on wikidata  [10:58:12] yeah [10:58:28] alright, i can wait and sent a note to the community [10:58:29] did anyone discuss the possibility of testing text varnish on wikidata with you guys yet? [10:58:38] yes it was mentioned yesterday [10:58:41] ok good [10:58:44] what was the response? [10:58:47] i think it would be good for us [10:58:55] just not sure what i would need to do to help :) [10:59:01] probably nothing [10:59:03] ok [10:59:07] just didn't want to do it without you guys being aware [10:59:11] and if you notice anything, do let us know [10:59:16] wikidata community is generally more tolerant of any snags [10:59:38] ok then maybe we can do that next week [10:59:41] ok [10:59:44] i am around [10:59:49] excellent, thanks [10:59:55] and sorry for the dns fuckup :( [11:00:12] if it eventually can allow the language selector thing to work nicer someday, that would be great [11:00:21] it's a step in the right direction to use varnish [11:00:28] yes [11:00:36] varnish can't solve all those problems yet, but will certainly make it easier [11:00:40] right [11:01:55] watchmouse also reported the disruption [11:02:02] now says 'performance issues', probably means it's improving [11:07:29] still broken for you? [11:17:29] mark: it's still broken [11:17:38] grr [11:17:38] other people seeing that now also [11:17:51] e.g. 
in the office, where it worked for them before [11:19:04] interesting that test.w [11:19:09] test.wikidata works [11:19:16] that's not via geodns [11:19:19] ok [12:47:40] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: No successful Puppet run in the last 10 hours [13:50:56] PROBLEM - Puppet freshness on mw1001 is CRITICAL: No successful Puppet run in the last 10 hours [14:02:56] PROBLEM - Puppet freshness on db78 is CRITICAL: No successful Puppet run in the last 10 hours [14:08:59] New patchset: Faidon; "Rewrite of EventLogging module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71927 [14:10:47] New review: Faidon; "Fixed tabs on deployment.pp and also git mv packages.pp package.pp (the class was renamed but the fi..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/71927 [14:12:01] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73387
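A side note on the wikidata DNS incident discussed above (mark briefly pointed European traffic at wikidata-lb.wikidata.org instead of wikidata-lb.wikimedia.org, and stale answers lingered in resolver caches until the TTL expired): the sketch below shows one quick way to see what the local resolver currently hands out for the geodns-served name. It assumes the dnspython package; the host names come from the conversation, and the exact record layout is an assumption, not a description of the real zone.

    # Illustrative only: show the CNAME target and remaining TTL the local
    # resolver currently returns for www.wikidata.org. During the incident the
    # wrong target was wikidata-lb.wikidata.org rather than
    # wikidata-lb.wikimedia.org.
    import dns.resolver  # pip install dnspython

    def show_cname(name="www.wikidata.org"):
        try:
            answer = dns.resolver.resolve(name, "CNAME")
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            print("%s: no CNAME answer from the local resolver" % name)
            return
        for rr in answer.rrset:
            target = str(rr.target).rstrip(".")
            print("%s -> %s (cached TTL %ds)" % (name, target, answer.rrset.ttl))
            if not target.endswith("wikimedia.org"):
                print("  unexpected target; expected a *-lb.wikimedia.org name")

    if __name__ == "__main__":
        show_cname()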
[… 14:32:59 to 14:54:59: ten hours on, the morning's Puppet freshness flood repeats; icinga-wm reports "PROBLEM - Puppet freshness on <host> is CRITICAL: No successful Puppet run in the last 10 hours" for well over a hundred largely identical hosts (searchidx1001, rubidium, ekrem, manganese, titanium, stat1, helium, praseodymium, aluminium, colby, professor, virt2, snapshot4, labsdb1001, bast1001, plus the mw*, db*, cp*, srv*, sq*, amssq*, analytics*, search*, solr*, ms-*, wtp*, labstore*, mc*, es*, rdb* pools); the excerpt ends mid-flood at 14:54:59 …]
freshness on cp1054 is CRITICAL: No successful Puppet run in the last 10 hours [14:54:59] PROBLEM - Puppet freshness on cp3019 is CRITICAL: No successful Puppet run in the last 10 hours [14:54:59] PROBLEM - Puppet freshness on cp3021 is CRITICAL: No successful Puppet run in the last 10 hours [14:55:59] PROBLEM - Puppet freshness on cp1008 is CRITICAL: No successful Puppet run in the last 10 hours [14:55:59] PROBLEM - Puppet freshness on amssq36 is CRITICAL: No successful Puppet run in the last 10 hours [14:55:59] PROBLEM - Puppet freshness on cp1009 is CRITICAL: No successful Puppet run in the last 10 hours [14:55:59] PROBLEM - Puppet freshness on cp1064 is CRITICAL: No successful Puppet run in the last 10 hours [14:55:59] PROBLEM - Puppet freshness on cp1068 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on amssq41 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on amssq60 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on db38 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on cp1057 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:59] PROBLEM - Puppet freshness on db1028 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:00] PROBLEM - Puppet freshness on db68 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:00] PROBLEM - Puppet freshness on ersch is CRITICAL: No successful Puppet run in the last 10 hours [14:57:01] PROBLEM - Puppet freshness on lvs3 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:01] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:02] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:02] PROBLEM - Puppet freshness on mw1039 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:03] PROBLEM - Puppet freshness on mw1177 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:03] PROBLEM - Puppet freshness on mw1184 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:04] PROBLEM - Puppet freshness on mw120 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:04] PROBLEM - Puppet freshness on mw96 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:05] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:05] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:16] ffs [14:57:30] how hard is it to have monitoring that doesn't break every 3rd day [14:57:59] PROBLEM - Puppet freshness on amssq58 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:59] PROBLEM - Puppet freshness on cp1019 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:59] PROBLEM - Puppet freshness on db1003 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:59] PROBLEM - Puppet freshness on db1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:59] PROBLEM - Puppet freshness on db1043 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:59] PROBLEM - Puppet freshness on amslvs3 
is CRITICAL: No successful Puppet run in the last 10 hours [14:59:59] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:59] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:59] PROBLEM - Puppet freshness on cp1004 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:59] PROBLEM - Puppet freshness on cp1011 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on amssq46 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on amssq62 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on cp1055 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on db1053 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on es8 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on ms1002 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:59] PROBLEM - Puppet freshness on mw102 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:07] PROBLEM - Puppet freshness on srv241 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:07] PROBLEM - Puppet freshness on mw6 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:07] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:07] PROBLEM - Puppet freshness on virt5 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:07] PROBLEM - Puppet freshness on virt7 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:07] PROBLEM - Puppet freshness on wtp1023 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:59] PROBLEM - Puppet freshness on amssq38 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:59] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:59] PROBLEM - Puppet freshness on cp1052 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:59] PROBLEM - Puppet freshness on db1006 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:59] PROBLEM - Puppet freshness on cp1059 is CRITICAL: No successful Puppet run in the last 10 hours [15:03:59] PROBLEM - Puppet freshness on amssq39 is CRITICAL: No successful Puppet run in the last 10 hours [15:03:59] PROBLEM - Puppet freshness on amssq61 is CRITICAL: No successful Puppet run in the last 10 hours [15:03:59] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:03:59] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: No successful Puppet run in the last 10 hours [15:03:59] PROBLEM - Puppet freshness on cp1045 is CRITICAL: No successful Puppet run in the last 10 hours [15:05:59] PROBLEM - Puppet freshness on amssq52 is CRITICAL: No successful Puppet run in the last 10 hours [15:05:59] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: No successful Puppet run in the last 10 hours [15:05:59] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: No successful Puppet run in the last 10 hours [15:05:59] PROBLEM - Puppet freshness on cp1007 is CRITICAL: No successful Puppet run in the last 10 hours [15:05:59] PROBLEM - Puppet freshness on cp1056 is CRITICAL: No successful Puppet run in the last 10 hours [15:07:59] PROBLEM - 
Puppet freshness on amssq55 is CRITICAL: No successful Puppet run in the last 10 hours [15:08:00] PROBLEM - Puppet freshness on amssq57 is CRITICAL: No successful Puppet run in the last 10 hours [15:08:00] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: No successful Puppet run in the last 10 hours [15:08:00] PROBLEM - Puppet freshness on cp1003 is CRITICAL: No successful Puppet run in the last 10 hours [15:08:00] PROBLEM - Puppet freshness on cp1051 is CRITICAL: No successful Puppet run in the last 10 hours [15:09:59] PROBLEM - Puppet freshness on db1020 is CRITICAL: No successful Puppet run in the last 10 hours [15:09:59] PROBLEM - Puppet freshness on lvs1 is CRITICAL: No successful Puppet run in the last 10 hours [15:09:59] PROBLEM - Puppet freshness on mc1009 is CRITICAL: No successful Puppet run in the last 10 hours [15:09:59] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [15:09:59] PROBLEM - Puppet freshness on mw1111 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:00] PROBLEM - Puppet freshness on mw1124 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:00] PROBLEM - Puppet freshness on mw1125 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:01] PROBLEM - Puppet freshness on mw1130 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:01] PROBLEM - Puppet freshness on mw1147 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:02] PROBLEM - Puppet freshness on mw1161 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:02] PROBLEM - Puppet freshness on mw1190 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:03] PROBLEM - Puppet freshness on mw1214 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:03] PROBLEM - Puppet freshness on mw38 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:04] PROBLEM - Puppet freshness on mw8 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:04] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:05] PROBLEM - Puppet freshness on search36 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:05] PROBLEM - Puppet freshness on solr2 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:06] PROBLEM - Puppet freshness on sq51 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:06] PROBLEM - Puppet freshness on srv301 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:07] PROBLEM - Puppet freshness on wtp1011 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:07] PROBLEM - Puppet freshness on wtp1018 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:59] PROBLEM - Puppet freshness on amssq44 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:59] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:59] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:59] PROBLEM - Puppet freshness on db1004 is CRITICAL: No successful Puppet run in the last 10 hours [15:10:59] PROBLEM - Puppet freshness on db43 is CRITICAL: No successful Puppet run in the last 10 hours [15:11:51] New patchset: coren; "Add new labsudb role for Labs users' database" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73171 [15:13:32] Reedy: So what would be the proper way to update an e-mailaddress? 
I know the User object has a method for this, but I'm not sure it deals with CentralAuth properly (like the Special page does when the user changes it himself) [15:14:38] Krinkle: https://wikitech.wikimedia.org/wiki/Password_reset [15:14:50] Aha [15:15:57] Just do it from tin/terbium rather than fenari [15:16:36] New patchset: Reedy; "Update noc symlinks for display" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73350 [15:16:45] yeah, from tin [15:16:53] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73350 [15:17:50] is anybody who deals with Notifications around? [15:19:19] !log Resetting [[commons:User:CommonsDelinker]] account e-mailaddress (bug 51016) [15:19:30] Logged the message, Master [15:20:21] ori-l: bug 1234 is getting matched: https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&diff=77280&oldid=77277 [15:20:28] for {{Gerrit}} .. [15:20:59] PROBLEM - Puppet freshness on amssq34 is CRITICAL: No successful Puppet run in the last 10 hours [15:20:59] PROBLEM - Puppet freshness on cp1002 is CRITICAL: No successful Puppet run in the last 10 hours [15:20:59] PROBLEM - Puppet freshness on cp1012 is CRITICAL: No successful Puppet run in the last 10 hours [15:20:59] PROBLEM - Puppet freshness on db1023 is CRITICAL: No successful Puppet run in the last 10 hours [15:20:59] PROBLEM - Puppet freshness on db34 is CRITICAL: No successful Puppet run in the last 10 hours [15:20:59] PROBLEM - Puppet freshness on es1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:00] PROBLEM - Puppet freshness on mc1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:00] PROBLEM - Puppet freshness on mw1139 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:01] PROBLEM - Puppet freshness on pc1003 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:01] PROBLEM - Puppet freshness on search1018 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:02] PROBLEM - Puppet freshness on srv258 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:59] PROBLEM - Puppet freshness on cp1046 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:59] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [15:21:59] PROBLEM - Puppet freshness on db40 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:59] PROBLEM - Puppet freshness on mw20 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:59] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:59] PROBLEM - Puppet freshness on search14 is CRITICAL: No successful Puppet run in the last 10 hours [15:22:10] Hmm. Why isn't mc.php appearing on noc.. [15:22:53] indeed [15:23:12] https://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/1046e864c1d6323dde82bcb9f2c0d5eee592bfa1/docroot%2Fnoc%2FcreateTxtFileSymlinks.sh#L34 [15:23:14] it is listed [15:23:17] Yup [15:23:24] re-run the script? [15:23:33] DocumentRoot /usr/local/apache/common/docroot/noc [15:23:34] Nope [15:23:39] Needs updating on tin and sync-docroot [15:23:43] Seems it's not using it from NFS [15:24:09] hashar_, if you're still working can we catch up about https://gerrit.wikimedia.org/r/#/c/72721/ ? [15:24:46] Reedy: It is also in version control (not just in the script, but the output of the script) and mc is listed there, so something got currupted? 
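A minimal sketch of the e-mail reset discussed above (the CommonsDelinker case), run from tin or terbium via the multiversion wrapper and core's eval.php; the wiki name and address here are placeholders, and per Krinkle's caveat a CentralAuth-attached account may need its global record handled separately, so treat the wikitech Password_reset page as the canonical procedure:

    # run on tin/terbium, not fenari; address is illustrative
    echo '$u = User::newFromName( "CommonsDelinker" ); $u->setEmail( "new-address@example.org" ); $u->saveSettings();' \
        | php /usr/local/apache/common/multiversion/MWScript.php eval.php --wiki=commonswiki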
[15:24:47] !log reedy synchronized docroot and w [15:24:58] Logged the message, Master [15:25:01] Fixed now [15:26:31] -rw-rw-r-- 1 dzahn wikidev 637 Jun 12 21:30 twemproxy-eqiad.yaml [15:26:32] -rw-rw-r-- 1 reedy wikidev 637 Jul 12 15:16 twemproxy-pmtpa.yaml [15:26:32] -rw-rw-r-- 1 reedy wikidev 637 Jul 12 15:16 twemproxy.yaml [15:26:57] Wonder why we've got 3.. Especially as they're identical [15:28:04] andrewbogott: I am kind of fighting with python :( [15:28:18] hashar_, anything I can help with? [15:28:27] I am giving up [15:28:40] will send patch upstream and ask them for help :-] [15:28:45] * hashar_ context switch to puppet [15:29:26] andrewbogott: so looking at your patchset 2 https://gerrit.wikimedia.org/r/#/c/72721/1..2/rakefile,unified [15:29:54] It seems like a cheap trick, but symlinking is what it does for the actual module being tested anyway. [15:29:54] andrewbogott: have you injected the GitHub fixtures repositories under operations/puppet.git ? [15:29:59] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [15:29:59] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [15:29:59] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:29:59] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [15:29:59] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [15:29:59] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:19] !log reedy synchronized docroot and w [15:30:23] hashar_: I don't know what that is [15:31:43] hashar_: My understanding is that the spec setup does two things... [15:31:51] a) clones the modules listed in .fixtures.yml [15:31:59] b) Creates fixtures as scripted by the test [15:32:13] So, I'm dealing with a) by removing all of our .fixtures.yml and symlinking modules instead [15:32:23] and, b) doesn't seem to require any git interaction. [15:32:24] oh [15:32:29] !log reedy synchronized docroot and w [15:32:43] the reason I am calling a git pull is that rake spec_prep would do the git clone but never update [15:32:44] hashar_: But I might be missing something on account of re-running tests on the same system... [15:33:03] so after a while you might be missing fixtures [15:33:10] that have been send by upstream [15:33:14] hashar_: Right, I think we don't need it to git pull at all. Because we want to test vs. our actual modules rather than vs. github's [15:33:32] if you provide all the fixtures directly in our operations/puppet repository, we can just skip the call to 'rake spec_prep' [15:33:44] ok, back up... [15:33:47] what do you mean by 'fixtures'? [15:33:58] the stuff being cloned from github [15:34:15] And do you believe that to differ from a) above? [15:34:43] !log reedy synchronized docroot and w [15:35:52] andrewbogott: that would work if you provide the github stuff in our repo [15:36:22] hashar_: as far as I know, 'the github stuff' is just the list of modules listed in .fixtures.yml [15:38:33] I am lost [15:38:46] ok :) [15:39:11] I don't understand the change you made in https://gerrit.wikimedia.org/r/#/c/72721/1..2/rakefile,unified :-] [15:39:17] lets start with that. [15:39:24] the code used to do simply rake spec_prep [15:39:32] Yep. 
[15:39:43] that looks at the fixtures.yaml and fetch the modules/manifests from github [15:39:51] just the modules, I believe. But, ok. [15:40:51] So -- fetching those things from github means that we are then testing the behavior of the module against upstream modules, some of which may not exist in our rep. [15:41:12] That seems wrong. So I want to prevent /anything/ from being fetched. Tests should pass against our modules, or not at all. [15:41:37] ahhh [15:41:37] so: https://gerrit.wikimedia.org/r/#/c/72996/ [15:42:22] New patchset: Reedy; "Add all-labs.dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73419 [15:42:25] not to mention our modules might differ from those in github which, again, we care about tests passing w/respect to our versions. [15:42:39] andrewbogott: yeah that make a lot more sense now [15:43:09] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73419 [15:45:17] removing .fixtures.yaml seems a bit of a hack, I'd rather explicitly tell spec_prep not to do any cloning. But that doesn't seem possible short of mucking with the code. [15:45:28] maybe we do not need to use spec_prep [15:46:39] * hashar_ tries [15:46:59] It does other things related to fixtures, although I don't know quite how it works yet [15:47:39] maybe all it does is create an empty site.pp? [15:47:51] New patchset: Mark Bergsma; "Revert "Allow persistent connections for HTTP PURGE (error) responses" - causing problems with vhtcpd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73421 [15:48:56] andrewbogott: seems it is only the empty site.pp [15:49:04] well that's easy enough :) [15:49:07] Hang on, I'll revise [15:49:15] Could not parse for environment production: No file(s) found for import of '/Users/amusso/projects/operations/puppet/modules/apache/spec/fixtures/manifests/site.pp' at line 3 on node aeriale [15:49:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73421 [15:49:43] hashar_: want to say something about https://bugzilla.wikimedia.org/show_bug.cgi?id=50335 ? it's getting stale... [15:49:50] andrewbogott: I am touching some files [15:49:59] andrewbogott: and removed the rake rspec_prep [15:50:10] ok! We'll see how it goes [15:50:14] Of course several tests fail anyway [15:50:17] that unblock apache :-) [15:50:19] but apache should pass, and rsync [15:50:23] !log reedy synchronized docroot and w [15:53:13] ran across a 502 bad gateway error while saving just now [15:58:19] andrewbogott: patchincoming [15:58:25] andrewbogott: will press da rebase button [15:58:27] cool [15:58:31] New patchset: Hashar; "rake wrapper to run puppet module tests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72721 [15:58:50] hashar_: That means I can/should revert https://gerrit.wikimedia.org/r/#/c/72996/ since it's not longer needed? [15:58:52] rebasing [15:58:52] New patchset: Hashar; "rake wrapper to run puppet module tests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72721 [15:59:11] andrewbogott: na it is fine :-) [15:59:23] well, it increases the diff with upstream, so best to avoid if we don't need it. [15:59:35] andrewbogott: this way people running locally 'rake spec' in a module will have the tests run against our modules instead of upstream ones [15:59:47] hm, true. 
[16:00:01] or, actually, it'll just fail if they don't run the top-level script first [16:04:04] New review: Hashar; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/72721 [16:04:12] hashar, I think that the rakefile should create the empty site.pp [16:04:19] andrewbogott: https://gerrit.wikimedia.org/r/#/c/72721/4/rakefile,unified [16:04:26] andrewbogott: so yeah I am wondering what we should do [16:04:39] either we git add the site.pp and modules symlinks [16:05:03] or have the rake file take care of deleting the fixtures files and creating the site.pp and modules symlinks [16:05:13] though deleting the fixtures is a bit evil since they are tracked in git [16:05:28] are they tracked? [16:05:47] Anyway, why not just have the script create things if they don't exist, and do nothing if they do? [16:07:15] hashar_, stay tuned, I'm fiddling with the patch [16:08:44] New patchset: coren; "Labs: Create /etc/wmflabs-project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73426 [16:08:52] anomie: ^^ [16:09:22] !log reedy synchronized docroot and w [16:09:43] New patchset: coren; "Labs: Create /etc/wmflabs-project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73426 [16:10:01] This one is the same, without the stoopid missing brace. :-) [16:10:20] New patchset: Reedy; "Display twemproxy.yaml" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73427 [16:10:35] Why do I only see these things /after/ the git review? [16:10:56] meanwhile… hashar_, do you have an opinion about the importance of puppet/util/colors? It causes the script to fail for me, I've been commenting that part out [16:10:56] Coren: Except for using spaces instead of tabs, seems ok to me. But I don't really know puppet. [16:11:41] Oh bleh. We have so many different space/tabs conventions in there that no editor setting manages to get them right. [16:12:12] New patchset: coren; "Labs: Create /etc/wmflabs-project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73426 [16:12:37] anomie: I wasn't trolling for a +2, just showing what I meant. :-) [16:12:38] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73427 [16:13:19] Coren: Good, since I don't have (or want right now) +2 on that repo ;) [16:13:48] andrewbogott: you get an old version of puppet maybe ? [16:14:03] I have whatever labs+precise gives me [16:14:13] .. [16:14:17] that is a good thing [16:14:24] that also means it is not going to work properly on jenkins [16:14:53] irb(main):001:0> require'puppet/util/colors' [16:14:54] LoadError: no such file to load -- puppet/util/colors [16:14:55] seriously [16:15:35] ok, I'll just snip out that part :) [16:17:44] hmm [16:17:47] git log --reverse -n1 lib/puppet/util/colors.rb gives me the commit [16:17:55] that introduced the feature (puppet bug 18986 ) [16:18:21] ahno (#12080) Implement a rich console logging prototype. 
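A rough shell equivalent of what the wrapper settles on in this exchange, instead of letting 'rake spec_prep' clone fixtures from GitHub: symlink the repo's own modules into the fixtures directory and give rspec-puppet the empty site.pp it expects. The module and dependency names below are illustrative; the real logic lives in the rakefile from change 72721:

    cd modules/apache
    mkdir -p spec/fixtures/modules spec/fixtures/manifests
    ln -sfn ../../.. spec/fixtures/modules/apache      # the module under test, from our own tree
    # ln -sfn ../../../../some_dep spec/fixtures/modules/some_dep   # any dependency module, likewise
    touch spec/fixtures/manifests/site.pp              # rspec-puppet only needs this file to exist
    rake spec                                          # tests now run against our modules, not upstream's

With the fixtures prepared this way, running 'rake spec' inside a module directory exercises the same code the top-level wrapper does, which is what makes local runs match the Jenkins job.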
[16:18:32] $ git describe 4720a9456daa4b03af7641b006f63add081e47dd [16:18:33] 2.7.10-218-g4720a94 [16:18:42] and we run 2.7.11 [16:18:44] gmmh [16:20:56] no idea [16:23:48] New patchset: Andrew Bogott; "rake wrapper to run puppet module tests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72721 [16:24:00] apparently having a filename and a branch name match in git causes some trouble to git review :) [16:24:09] Anyway… hashar_, ^^ [16:25:39] New patchset: BBlack; "Allow persistent connections for HTTP PURGE (error) responses via 204" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73429 [16:26:08] !log Graceful reload of Zuul to forward deployment to I4de1670e11d58e [16:26:20] Logged the message, Master [16:28:42] New review: BBlack; "Pushing this through, was tested manually on cp1038 and seems to work correctly." [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/73429 [16:28:43] Change merged: BBlack; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73429 [16:31:16] New patchset: Hashar; "rake skip colorization when unsupported by puppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73430 [16:31:23] Any idea why wget https://upload.wikimedia.org/wikipedia/commons/archive/f/f9/20121117150147D0%A6%D0%B5%D1%80%D0%BA%D0%BE%D0%B2%D1%8C_%D0%90%D1%80%D1%85%D0%B0%D0%BD%D0%B3%D0%B5%D0%BB%D0%B0_%D0%9C%D0%B8%D1%85%D0%B0%D0%B8%D0%BB%D0%B0_%D0%B2_%D0%9A%D1%83%D0%B1%D0%B8%D0%BD%D0%BA%D0%B5.jpg returns 501 Not Implemented? [16:31:44] andrewbogott: https://gerrit.wikimedia.org/r/73430 should skip loading the puppet color module when it is not found. That would get rid of the warning you have. [16:32:54] hashar, ok, that's better. Go ahead and merge that and I'll reconcile it with the rest of my patch [16:33:08] um… hashar_ that is [16:33:47] New review: Hashar; "The color change is unrelated :-] see https://gerrit.wikimedia.org/r/73430 " [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/72721 [16:33:54] andrewbogott: so just restore the color [16:34:06] my patch should apply to it cleanly :-] [16:34:09] ok [16:34:49] I am not sure whether it should be enabled in Zuul yet [16:34:50] I will most probably create a dumb job that is triggered manually for us to test with [16:34:59] and just have it to run on maste^Wproduction [16:35:01] branch [16:35:49] I am leaving in like 10-15 minutes :-D [16:37:13] hashar_: We shouldn't enable it today, at least :) [16:37:38] I do want to get these bits merged though so I can send another email "Now here's how to run tests" [16:37:58] New patchset: Andrew Bogott; "rake wrapper to run puppet module tests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72721 [16:38:02] hi [16:38:14] any way to create prehooks for an automake target ? [16:38:18] custom prehooks [16:38:38] Anyone up for a quick +2 for https://gerrit.wikimedia.org/r/#/c/73426/ ? It's a trivial addition, but I don't self +2 on base.pp. :-) [16:40:01] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73426 [16:40:21] New review: Hashar; "go ahead and merge it :-}" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/72721 [16:40:21] andrewbogott: excellent :-] [16:40:26] Thanks, andrewbogott. 
[16:40:28] andrewbogott: you can get https://gerrit.wikimedia.org/r/#/c/72721/ merged in [16:40:34] 'k [16:40:37] andrewbogott: I have added a test job in jenkins :-] [16:40:52] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72721 [16:41:15] hashar_: One that's only invoked by hand? [16:41:20] andrewbogott: https://integration.wikimedia.org/ci/job/test-puppet-rspec/2/ :-D [16:41:30] so if you go to https://integration.wikimedia.org/ci/job/test-puppet-rspec/ [16:41:45] and logged in with your labs account in the 'wmf' ldap group, you should see a "Build now" link [16:41:48] New patchset: Andrew Bogott; "rake skip colorization when unsupported by puppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73430 [16:41:48] that let you trigger the build manually [16:42:00] and jenkins is supposed to poll the repo from time to time to trigger build. [16:42:15] andrewbogott: the job is failing because gallium does not have the ruby packages though [16:42:31] oh, I have a pending patch for that [16:42:47] ah https://gerrit.wikimedia.org/r/#/c/71921/ [16:43:01] andrewbogott: I did amend it because 'rake' ended up being included twice. [16:43:07] yep, I saw. Seems good. [16:43:23] andrewbogott: so if you merge that, sock puppet it, update puppet on gallium and then build now in the jenkins job, you should get something useful :-] [16:43:25] \O/ [16:43:29] hashar_, what about polling the repo? Does that run the risk of us testing patches that haven't been reviewed yet? [16:43:32] and thus write the nice email ahehe [16:43:42] andrewbogott: it poll the tip of the production branch [16:43:48] so if its merged, it is probably safe [16:43:50] ah, that's fine then. [16:43:55] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73430 [16:44:53] running puppet :D [16:45:28] Oh, I didn't merge the one with the packages yet [16:45:32] yeah huhu [16:45:43] anyway once merged you can try out in the jenkins job :-] [16:45:56] I might connect later on to verify but that should be fine [16:46:01] I will have to find a way to build a nice test report out of the spec output [16:46:12] ok! [16:46:32] New patchset: Andrew Bogott; "Add packages to the jenkins slave for puppet rspec." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71921 [16:46:54] I am out :-]  ping me by email if there is any trouble! [16:46:59] home + daughter time [16:47:38] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71921 [16:58:56] hi kaldari. [16:59:53] kaldari: is it technically possible to get Echo aka Notifications installed on a Wikimedia wiki? asking you since i think you worked on it [17:01:33] MatmaRex: i think there is a "not yet" thing going on after other wikis requested it.. [17:03:00] hmph. [17:03:04] because of what? [17:03:45] not ready, I think [17:03:47] let me find the bug [17:04:06] https://bugzilla.wikimedia.org/show_bug.cgi?id=50064#c9 [17:06:44] how can it not be ready if it's deployed and working on en.wp? D: [17:07:05] Besides the need to complete core feature, our rationale for waiting is to [17:07:05] avoid overwhelming the communities with too many changes taking place all at [17:07:05] once -- and make sure that we have some resources to respond quickly to [17:07:05] inevitable questions from each community. [17:07:16] well,except they are asking for it themselves [17:07:34] oh well. i'll relay. 
[17:11:37] MatmaRex: Certainly file a bug [17:12:13] Reedy: we don't have"consensus" yet (no strawpoll), and i don't want to bother people if it will not be enabled anyway [17:12:17] (we're speaking of pl.wp, btw) [17:12:41] MatmaRex: I'm not sure what's going on with Echo (I've moved on to mobile development currently). It seems more feature requirements keep getting added :( [17:13:09] eh [17:26:14] New patchset: Odder; "Clean up $wgExtraNamespaces in InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72059 [17:36:13] New patchset: Springle; "default to newly added change_tags/tag_summary indexes on all remaining wikis in bug 40867 comment 6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73435 [17:39:02] gwicke: git diff 6a6cfa9a395cb053b18bdbeec0fd23e0477e05fb 4209296244575a13cafa58ec8d5fd3e8425108e2 --name-only [17:39:05] lots of stuff :) [17:40:22] I am trying to understand how https://git.wikimedia.org/tree/operations%2Fmediawiki-config.git is merged with mediawiki software [17:40:22] New review: Reedy; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73435 [17:40:27] is there any documentation for it? [17:40:44] AaronSchulz: git diff 6a6cfa9a395cb053b18bdbeec0fd23e0477e05fb 4209296244575a13cafa58ec8d5fd3e8425108e2 -- php [17:40:44] LocalSettings loads CommonSettings [17:40:49] CommonSettings loads InitialiseSettings [17:41:01] AaronSchulz: the JS stuff is not used at all in core [17:41:16] only the php subdir is, and there is exactly one commit in there.. [17:42:09] gwicke: yeah it doesn't matter [17:42:13] AaronSchulz: git log 6a6cfa9a395cb053b18bdbeec0fd23e0477e05fb..4209296244575a13cafa58ec8d5fd3e8425108e2 -- php [17:47:33] AaronSchulz: I'd like to deploy this soon so that I am still around for the afternoon [17:47:55] gwicke: you can :) I'm not objecting [17:48:44] AaronSchulz: I don't have +2 though [17:48:52] oh, wow [17:49:24] so deploying requires some coordination for me.. [17:49:33] you can still sync right? [17:49:36] yes [17:49:43] maybe James can fix that [17:50:12] AaronSchulz: is James a gerrit admin? 
[17:50:39] RoanKattouw: ^^ [17:57:58] !log gwicke synchronized php-1.22wmf9/extensions/Parsoid/php/Parsoid.php 'updating Parsoid master' [17:58:08] Logged the message, Master [17:59:08] !log gwicke synchronized php-1.22wmf9/extensions/Parsoid/php/ParsoidCacheUpdateJob.php 'updating Parsoid master' [17:59:19] Logged the message, Master [18:04:14] AaronSchulz: in https://graphite.wikimedia.org/render/?width=1486&height=641&_salt=1371161654.988&from=-24hours&target=stats.job-insert-ParsoidCacheUpdateJob.count&target=stats.job-pop-ParsoidCacheUpdateJob.count it looks as if there have been more Parsoid job insertions than pops recently [18:10:35] New patchset: Reedy; "Remove $wgOldChangeTagsIndex on all remaining wikis in bug 40867 comment 6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73435 [18:11:25] New patchset: Reedy; "Remove $wgHtml5, both deprecated and same as MW default" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73443 [18:18:11] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73435 [18:19:10] !log reedy synchronized wmf-config/InitialiseSettings.php 'Remove wgOldChangeTagsIndex on all remaining wikis' [18:19:21] Logged the message, Master [18:32:18] AaronSchulz, ParsoidCacheUpdateJob: 127366 queued; 131 claimed (0 active, 131 abandoned) [18:32:57] maybe we need more runners? [18:33:23] cpu still looks low, so I guess [18:33:27] or was another IO-bound job added that decreases the parsoid run frequency? [18:33:46] it's just parsoid in its own loop [18:33:53] ah, k [18:34:09] many of the updates are just a single title [18:34:35] for which only more runners or (maybe) more jobs per runner help [18:35:28] we can also play with the number of titles per job for template updates, but most of those jobs are probably single-title updates [18:37:25] I should just kill procs_per_iobound_type and put it in the subshell command [18:38:06] gah, well, it would need a bool in it's place [18:38:23] * AaronSchulz can use a factor instead [18:40:06] gwicke: maybe 4x? [18:40:24] AaronSchulz: yes, the Parsoid cluster is currently at 2% load ;) [18:42:18] it seems that the push/pop rate diverged on Wednesday: https://graphite.wikimedia.org/render/?width=1486&height=641&_salt=1371161654.988&from=-100hours&target=stats.job-insert-ParsoidCacheUpdateJob.count&target=stats.job-pop-ParsoidCacheUpdateJob.count [18:42:44] around the time Tim investigated the HTMLCacheUpdate issue [18:44:11] !log authdns-update all ulsfo now 4XXX [18:44:22] Logged the message, RobH [18:46:53] AaronSchulz: might be related to https://gerrit.wikimedia.org/r/#/c/72681/ [18:47:27] upped the Varnish backend timeout from 60 seconds to 5 minutes [18:48:19] so slow pages can block job runners for longer [18:48:57] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours [18:56:57] PROBLEM - Puppet freshness on mw56 is CRITICAL: No successful Puppet run in the last 10 hours [19:00:24] gwicke: I'll try 5x :) [19:01:32] AaronSchulz: alright, go for it ;) [19:02:29] hmm, that might be too many db connections for now [19:02:43] I'll have to start a bit lower [19:02:50] ok [19:02:59] I could also increase the number of titles per job [19:03:03] in the config [19:06:13] is there a way to install a hook executing after an Automake target ? 
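The backlog figures quoted here ("127366 queued; 131 claimed ...") are in the output format of the job queue maintenance scripts; a hedged example of checking the Parsoid queue by hand, using the multiversion wrapper path that appears later in this log (the wiki name is illustrative):

    # per-type breakdown for one wiki: queued / claimed (active, abandoned)
    php /usr/local/apache/common/multiversion/MWScript.php showJobs.php --wiki=enwiki --group
    # aggregate queue lengths across wikis, as used by the icinga job_queue check mentioned below
    php /usr/local/apache/common/multiversion/MWScript.php extensions/WikimediaMaintenance/getJobQueueLengths.php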
[19:07:08] New patchset: Aaron Schulz; "3X increase to parsoid pipeline" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73450 [19:19:40] springle: not seeing any script already there, oh well [19:22:50] hey paravoid, would sometime monday work for you for deploying 71927? i don't think it'd take much. i may have a couple of "oh fuck" fix-ups that i'd need merged but it shouldn't take very long and you can be doing other things during that time. [19:23:02] how about now? [19:23:34] hmmm. let me ping a couple of folks who'd be affected and ask! [19:24:00] monday is possible too [19:24:23] actually, yeah, let's do monday if that's OK. i'm a bit eager but it'd be better to give some more advance notice. [19:24:37] what time UTC works best for you? [19:25:18] we have a meeting on monday [19:25:21] so let's say after the meeting? [19:25:31] meeting is at 18:00 UTC [19:25:37] paravoid: hey Faidon [19:25:39] how long does it usually run? two hours? [19:25:41] hi [19:25:48] less usually [19:25:57] paravoid: just switched to upstream jni [19:26:01] but say two to be sure? [19:26:10] paravoid: please let me know when you have a minute [19:26:18] ok cool, so 20:00 UTC. i really appreciate your help with this. [19:26:27] no worries [19:26:30] average: sure, hit me :) [19:27:13] New patchset: Anomie; "Update protection configs for core change I6bf650a3" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71538 [19:27:31] paravoid: https://gerrit.wikimedia.org/r/#/c/68711/ [19:27:48] paravoid: I used your #13 patch, and went forward with that one, modified the Makefile.am [19:28:25] wait [19:28:34] that doesn't belong in the packaging patchset [19:29:20] let's start over [19:29:28] what happened with the issue you mentioned yesterday? [19:29:29] that's what I thought, ok [19:29:59] paravoid: well, I will say that basically we will not have any *.class in our package [19:30:00] the embedding-so-in-jar? [19:30:31] I don't understand :) [19:31:14] I'll write in pm, because this will be long [19:45:09] New review: Ori.livneh; "To be deployed Monday, Jul 15 @ 20:00 UTC." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/71927 [19:47:38] springle: 20130712002944 is the timestamp on the oldest row in enwiki tag_summary [19:56:05] !log gwicke synchronized php-1.22wmf9/extensions/Parsoid/php/Parsoid.php 'updating Parsoid' [19:56:15] Logged the message, Master [19:56:50] !log gwicke synchronized php-1.22wmf9/extensions/Parsoid/php/ParsoidCacheUpdateJob.php 'updating Parsoid' [19:57:02] Logged the message, Master [20:02:52] AaronSchulz: the number of 'active' jobs for ParsoidCacheUpdate tends to be around 1, is that ok? [20:03:01] 141204 queued; 132 claimed (1 active, 131 abandoned) [20:10:30] gwicke: I don't see many parsoid job procs on the runners [20:10:31] not sure why [20:11:09] odd [20:11:19] the other main types look ok [20:12:36] the backlog for refreshLinks seems to be pretty low currently- was something changed to speed that up recently that the parsoid jobs could contend on? [20:12:51] db connections for example [20:13:23] on m1016, there is no parsoid job coordinator running at all for example [20:13:36] oh, wait [20:14:05] yeah, doesn't seem to be there [20:14:13] I see the main loop and the immediate priority loop [20:14:44] notpeter: can you restart all the runners? 
:) [20:15:23] notpeter: maybe after merging https://gerrit.wikimedia.org/r/#/c/73450/ ;) [20:15:25] New patchset: Jdlrobson; "Remove usage of wgMFLogEvents" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73488 [20:16:59] New patchset: Reedy; "Point php symlink to wmf9" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73500 [20:17:00] AaronSchulz: maybe Tim killed some after discovering the HTMLCacheUpdate bug to throttle the cache updates [20:18:23] https://gerrit.wikimedia.org/r/#/c/72064/ [20:18:25] heh [20:18:28] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/73500 [20:18:33] you know I fixed that years ago and it was changed back [20:18:44] * AaronSchulz wonders if it was by tim :) [20:19:06] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [20:19:20] "[19:01] TimStarling: I fixed this once already, but it looks like it has been very carefully unfixed" :D [20:20:03] sounds like pingpong [20:23:52] https://gerrit.wikimedia.org/r/#/c/68937/ [20:23:58] what's up with that change? [20:24:06] it's been merged and then reverted, I see [21:01:31] New patchset: Andrew Bogott; "Remove uses of the 'firewall' module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/73521 [21:06:02] !log stopped db32 mysql slave threads [21:06:05] !log stopped db32 mysql slave threads [21:06:06] New review: MaxSem; "Waiting for dependency." [operations/mediawiki-config] (master) C: -2; - https://gerrit.wikimedia.org/r/73488 [21:06:16] Logged the message, Master [21:48:34] notpeter: ping [21:53:28] AaronSchulz: what does it take to restart the job runners? [21:54:29] the ParsoidCacheUpdate backlog continues to grow: 154987 queued; 131 claimed (0 active, 131 abandoned) [22:01:38] mutante: ping [22:02:17] gwicke: ping [22:02:29] ahh ;) [22:02:39] do you know how to restart job runners? [22:02:51] Aaron asked Peter earlier about it [22:03:03] [13:13] on m1016, there is no parsoid job coordinator running at all for example [22:03:07] ehm.. not without checking wikitech ..even though i might have done it in the past [22:03:37] I can search wikitech for you ;) [22:04:04] https://wikitech.wikimedia.org/wiki/Job_queue_runner [22:04:12] /etc/init.d/mw-job-runner restart on all jobbers [22:04:25] thanks:) [22:04:39] binasher: thanks too! [22:05:49] gwicke: looking at the dsh groups, specific ones? [22:05:58] dsh -g job-runners [22:06:04] according to https://wikitech.wikimedia.org/wiki/Job_queue_runner [22:06:11] ddsh -g job-runners "/etc/init.d/mw-job-runner restart" [22:06:15] yea, we have eqiad and pmtpa and .old [22:06:19] cool [22:06:40] !log restarting job runners [22:06:51] Logged the message, Master [22:07:06] mutante: thanks! [22:07:10] it's like they werent running, failed to kill existing ones [22:07:14] but otherwise, looks done [22:07:37] mutante: yes, that was our suspicion too [22:07:39] oh, wait. unable to open pidfile '/var/run/mw-jobs.pid' for writing (Permission denied) [22:07:44] on some [22:08:16] doing that one more time as root [22:08:22] looks better [22:08:38] yep, done [22:09:03] yay! 
the number of queued parsoid jobs starts to shrink ;) [22:09:08] :) [22:10:24] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Parsoid+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [22:10:59] 35% API load now [22:11:23] will be a good load test until the queue is drained [22:15:57] meta is giving me errors on API requests [22:16:11] hmm.because it's so busy now? [22:17:26] 42% according to ganglia [22:19:32] abartov: the load seems to have leveled off a bit, do you still get errors? [22:19:41] https://ganglia.wikimedia.org/latest/?r=4hr&cs=&ce=&m=cpu_report&s=by+name&c=API+application+servers+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [22:19:44] btw, Could not open input file: /usr/local/apache/common/multiversion/MWScript.php [22:19:52] is a problem the job queue monitoring has [22:19:59] mutante: fenari? tin? spence? [22:20:05] neon [22:20:09] because it is not a mediawiki install [22:20:40] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=job_queue [22:21:39] gwicke: yes, just now [22:22:06] gwicke: ...10.64.0.128 via cp1001.eqiad.wmnet (squid/2.7.STABLE9) to () Error: ERR_SOCKET_FAILURE, errno (98) Address already in use at Fri, 12 Jul 2013 22:21:33 GMT [22:22:20] ahh [22:22:20] RoanKattouw: done < <( php /usr/local/apache/common/multiversion/MWScript.php extensions/WikimediaMaintenance/getJobQueueLengths.php ) [22:22:32] we had something like that before, was a faulty squid then [22:22:44] restarting it fixed that issue [22:23:04] cp1001 it seems [22:23:21] mutante: If neon wants to run MW maintenance scripts it needs to be an MW install [22:23:38] mutante: ^^ [22:23:49] or whoever has squid restart powers ;) [22:24:25] * abartov avoids squids in general [22:24:29] RoanKattouw: yea, but afaik never was like spence was and this was the only check like that [22:24:38] !log restarting squid on cp1001 [22:24:46] abartov: they have served us well over the years [22:24:49] Logged the message, Master [22:24:54] so don't be ungrateful ;) [22:25:21] !log rebuilding tag_summary on enwiki [22:25:25] gwicke: oh, far from it! I approve of squid, krakens, and other Cthulhuoid critters wholeheartedly! [22:25:32] Logged the message, Master [22:25:38] gwicke: (as long as other people actually deal with them...:)) [22:25:59] haha [22:26:58] gwicke, mutante, almost but no cigar. My query went further, but then I got booted again, same error (errno 98 socket already in use) on cp1001.eqiad [22:27:07] or did it not restart yet? [22:27:19] it did [22:27:57] is there another squid process lingering? [22:27:59] hmm, i see those errors popping up in syslog .yea [22:28:42] trying to completely stop and start [22:30:42] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: Connection refused [22:31:06] so, this was definitely off, no other processes left [22:31:14] and then started back and frontend again [22:31:42] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK: HTTP/1.0 200 OK - 1293 bytes in 0.002 second response time [22:32:12] https://ganglia.wikimedia.org/latest/?c=Text%20squids%20eqiad&h=cp1001.eqiad.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [22:32:52] don't see socket errors in syslog anymore so far [22:33:45] !log completely stopped and then started squid front and back on cp1001 (had socket already in use errors) [22:33:55] Logged the message, Master [22:33:59] arg, and back it is.. sigh [22:34:31] or at least for a second ?! 
[22:38:56] !log aaron cleared profiling data [22:39:55] Logged the message, Master [22:44:52] > Error: ERR_SOCKET_FAILURE, errno (98) Address already in use at Fri, 12 Jul 2013 22:44:21 GMT [22:45:06] cp1001.eqiad.wmnet [22:45:14] only got it to happen the one time [22:48:02] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: No successful Puppet run in the last 10 hours [23:06:37] so it seems that the API Squids are short of ports? [23:07:10] Ryan_Lane: ^^ [23:07:12] there are no actual API squids, it's a text squid [23:07:19] ah, ok [23:07:21] but yea [23:07:30] and cp1001 is busier than all the others [23:07:37] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Text%2520squids%2520eqiad&tab=m&vn= [23:07:43] is ulimit too low? [23:08:30] * gwicke fondly remember the days when getting past 1024 sockets was an accomplishment [23:09:02] well, we checked ip_local_port_range [23:09:05] 1024 65535 [23:09:42] http://paste.debian.net/15819/ [23:10:48] -n 1024? [23:10:58] nah [23:11:06] that would fall over immediately [23:11:09] * abartov nods [23:11:26] squid is started with a different -n normally [23:11:45] not sure if that is in the init script these days [23:12:38] [ -f /proc/sys/fs/file-max ] || return 0 [23:12:38] [ $SQUID_MAXFD -le 65536 ] || SQUID_MAXFD=65536 [23:12:46] * abartov remembers having to write code to render a 130k JPEG file in multiple passes, because the 64k DOS segment kept filling up. [23:12:47] minimal_file_max=$(($SQUID_MAXFD + 4096)) [23:12:47] if [ "$global_file_max" -lt $minimal_file_max ] [23:12:47] then [23:12:48] echo $minimal_file_max > /proc/sys/fs/file-max [23:12:52] <-- all from the init script [23:12:55] * abartov is old. [23:14:37] is /proc/sys/fs/file-max 65536? [23:14:56] 3220417 [23:15:21] that might be sufficient [23:16:11] the load on the API is below 30% now [23:16:17] are there still error on cp1001? [23:16:22] *errors [23:16:34] gwicke: so... [23:16:40] gwicke: just talked with RoanKattouw about this [23:16:57] gwicke: you guys are sending requests through the squids, just to hit the MW api cluster [23:17:09] there's no caching occuring at the squid level for this. [23:17:10] yes, there are still errors on cp1001 [23:17:18] you should be making requests directly to the api cluster [23:17:19] ok [23:17:29] Basically Parsoid is just requesting http://en.wikipedia.org/w/api.php which goes LVS -> frontend Squid -> backend Squid -> LVS -> Apache [23:17:32] is this for parsoid? [23:17:33] we can change the API config [23:17:37] i guess so [23:17:37] jeremyb: yes [23:17:38] But we have an internal IP we can use to get to that second LVS [23:17:51] is it all just hitting cp1001 but no other one? [23:17:54] this would eliminate 2 proxies and an lvs connection [23:18:03] mutante: it should be hitting more than just cp1001 [23:18:16] something seems to be not quite right with the load balancing for the API [23:18:20] but just this one looks so much busier than the others [23:18:24] per ganglia and those errors [23:18:27] mutante: Is it busy on the frontend or backend? 
[23:18:30] we should eliminate that in any case, but worth investigating later [23:18:35] I wonder if this is due to CARP [23:18:43] If it's the backend that would be it [23:18:52] If, say, http://en.wikipedia.org/w/api.php hashes to cp1001 [23:19:09] I remember there was some discussion about CARP not handling similarly named backends very well [23:19:21] RoanKattouw: that sounds very plausible [23:19:25] gwicke: Anyway, we need a way for Parsoid to connect to a different IP but still use the same Host header [23:19:26] that's also possible [23:19:26] most of those requests are posts [23:19:41] so all have the same URL [23:19:44] Yeah [23:19:54] gwicke: this isn't a matter of load, but a matter of socket starvation [23:20:00] Ryan_Lane: api.svc.eqiad.wmnet has address 10.2.2.22 ---> is the one they need to hit, right? [23:20:03] too many connections [23:20:08] yeah, as everything is funneled through one box [23:20:09] you may also starve the api backends, too [23:20:15] but it's less likely [23:20:29] the load is at 28% [23:20:33] you should also look at ways to do connection pooling, if possible [23:20:47] RoanKattouw: yeah [23:21:00] And that too [23:21:08] we can change this to an internal host name [23:21:11] Most likely, the problem is starvation due to TCP_WAIT timeouts [23:21:22] by just changing the Parsoid config [23:21:32] gwicke: That option already exists?! [23:21:37] RoanKattouw: yes, sure [23:21:42] just change localsettings.js [23:21:47] I mean you can't just use http://api.svc.eqiad.wmnet/.... because that would send the wrong Host header, right? [23:21:56] it would have to be different hosts [23:22:03] Ugh [23:22:10] I'd like to not add 800 DNS entries for this [23:22:12] or we'd need to hack something up [23:22:28] that explicitly sets the host header [23:22:42] ^^ that :) [23:22:42] do that :) [23:22:58] Yes [23:22:58] hehe [23:22:59] how can you set the host header from client JS? [23:23:05] jeremyb: Parsoid = server JS [23:23:15] jeremyb: nodejs [23:23:31] ohhhhhhhhhhh [23:23:33] nvm! [23:24:13] i was thinking the issue was with the way clients were hitting the API when talking to parsoid. but it's instead the way parsoid gets page text... [23:24:19] Yes [23:24:24] I wonder if we could add a workaround that just appends a random ?1 ?2... to the URL [23:24:26] And also does template expansions in certain cases [23:24:33] and then test the header stuff [23:24:55] Sounds ... reasonable to me [23:25:17] or even something like the worker id [23:25:25] We'd still want to do the header stuff just so we can avoid going through the whole stack (should help with latency too), but as a workaround it sounds like that should spread the pain [23:25:33] gwicke: Worker ID or worker PID? [23:25:35] yes [23:25:43] any source of entropy ;) [23:25:47] Right [23:25:50] the cheaper the better [23:25:55] RoanKattouw: do we have latency for this anywhere? I'd love to see how much this decreases it [23:26:09] process.pid looks good [23:26:11] Just saying, PID > ID because IDs presumably are much more likely to be duplicated between hosts [23:26:19] Ryan_Lane: I don't know if we have data on this. gwicke would know [23:26:26] I assume latency is minor though [23:26:31] yeah. 
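A quick way to sanity-check both ideas from a shell, using the internal service name quoted above (api.svc.eqiad.wmnet); the siteinfo query and the parameters are only illustrative:

    # hit the API LVS directly but keep the Host header MediaWiki expects,
    # skipping the frontend/backend squid hops entirely
    curl -s -H 'Host: en.wikipedia.org' \
         --data 'action=query&meta=siteinfo&format=json' \
         http://api.svc.eqiad.wmnet/w/api.php | head -c 200; echo

    # interim workaround: per-worker entropy in the URL, so identical POST
    # targets no longer CARP-hash onto the same backend squid (cp1001)
    curl -s --data 'action=query&meta=siteinfo&format=json' \
         "http://en.wikipedia.org/w/api.php?random=$$"

    # and for the latency question: connect time / time to first byte / total
    curl -s -o /dev/null -w '%{time_connect} %{time_starttransfer} %{time_total}\n' \
         -H 'Host: en.wikipedia.org' --data 'action=query&meta=siteinfo&format=json' \
         http://api.svc.eqiad.wmnet/w/api.php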
should be [23:26:39] PHP instantiation plus the work we're asking it to do should take much more time [23:26:56] yeah, likely 30ms PHP startup, 1ms work [23:27:16] and 0.5ms connection [23:27:24] Depends on whether it needs to talk to the database though :) [23:29:06] much of that data should be in memcached [23:29:16] but yeah, some more IO will be in there too [23:31:12] !log finished tag_summary rebuild [23:31:23] Logged the message, Master [23:31:31] just pushed a patch and will test that a bit in rt testing, should be online in ~5 minutes [23:31:42] cool [23:31:43] thanks [23:31:51] great [23:32:14] binasher: ^^ not sure if you saw the proposed solution [23:32:54] binasher: parsoid will connect directly to the api load balancer IP rather than going through the entire stack [23:33:52] Ryan_Lane: +1 [23:33:58] hmm, I'm testing against the API but am getting some failures [23:34:01] locally it works [23:34:09] now this might just be poor cp1001 [23:35:14] yeah: Request: POST http://en.wikipedia.org/w/api.php?random=15749, from 108.94.30.165 via cp1001.eqiad.wmnet (squid/2.7.STABLE9) to ()
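Since the question of how much latency the direct route saves comes up above and no numbers are at hand, a hedged way to measure it from any host that can reach both paths (repeat a few times and compare, a single sample is noisy; the request body is the same illustrative siteinfo query as before):

    BODY='action=query&meta=siteinfo&format=json'

    # Through the full stack: LVS -> frontend squid -> backend squid -> LVS -> Apache
    curl -s -o /dev/null -w 'via squids: %{time_total}s\n' \
         --data "$BODY" 'http://en.wikipedia.org/w/api.php'

    # Direct to the API service IP noted earlier, with the public Host header
    curl -s -o /dev/null -w 'direct:     %{time_total}s\n' \
         --data "$BODY" -H 'Host: en.wikipedia.org' 'http://10.2.2.22/w/api.php'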
[23:37:10] nothing in syslog now [23:37:14] on cp1001 [23:37:32] !log restarted db32 slave [23:37:43] tail -f /var/log/squid-frontend/cache.log [23:37:43] 2013/07/12 23:35:51| Detected REVIVED Parent: 10.64.0.136 [23:37:43] Logged the message, Master [23:38:03] tail -f /var/log/squid/cache.log [23:38:03] 2013/07/12 23:31:11| commBind: Cannot bind socket FD 18 to *:0: (98) Address already in use [23:38:20] but that stopped [23:38:27] started again [23:39:24] !log updated Parsoid to f6d3742 [23:39:37] !log updated Parsoid to f6d3742 [23:39:48] Logged the message, Master [23:40:45] mutante: even after this parsoid change, it may be an issue until all current connections completely die [23:41:12] that seems to have helped [23:41:18] nod [23:41:21] 2013/07/12 23:40:14| squidaio_queue_request: WARNING - Queue congestion [23:46:17] no more errors [23:47:11] \o/ [23:47:12] Yay [23:47:12] great [23:47:30] is the URL the only thing that goes into the hash? [23:47:32] RoanKattouw, gwicke: thanks for the help guys [23:47:38] well, that queue congestion warning is last line in squid/cache.log [23:47:41] gwicke: hm. not sure how it works on posts [23:47:51] yea, thanks [23:47:56] gwicke: Yes :( we've had problems before with the caching server that enwiki api.php mapped to falling over [23:48:12] urlParse: Illegal character in hostname '*.wikipedia.org' [23:48:19] random ones from frontend [23:48:26] RoanKattouw: it makes sense for hit rates [23:48:29] some invalid requests [23:48:46] See [Ops] cp1004 & cp1005 , posted May 16, 2012 [23:48:46] except when there is no hits on POST.. [23:48:50] Digging up mailman URL for that [23:49:19] for situations like this it's really not helpful :) [23:49:29] though going through squid isn't really useful either for this case [23:49:34] yeah [23:49:46] https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Text%20squids%20eqiad&h=cp1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1373672957&v=0.85&m=load_fifteen&vl=%20&ti=Fifteen%20Minute%20Load%20Average&z=large [23:49:54] yeah, not load issue, but load also changed [23:50:16] http://ganglia.wikimedia.org/latest/?r=4hr&cs=&ce=&c=Text+squids+eqiad&h=cp1001.eqiad.wmnet&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS [23:50:59] https://lists.wikimedia.org/mailman/private/ops/2012-May/005058.html [23:51:18] PROBLEM - Puppet freshness on mw1001 is CRITICAL: No successful Puppet run in the last 10 hours [23:52:36] RoanKattouw: thx [23:53:02] "Actually it seems that cp1004 has a rate of backend requests about 20 times higher than the other servers, because it is the CARP backend for http://en.wikipedia.org/w/api.php, which is commonly posted to." -- Tim back in May 2012 [23:53:32] Somewhat similar problem: "Squid was reporting bind() errors, due to an excessive number (~95k) of client-side connections in the TIME_WAIT state." [23:54:00] actually that last line sounds exactly like it [23:54:38] Yeah [23:55:33] "So the problem is ephemeral port exhaustion for the cp1004/appservers.svc host pair. You would think that using HTTP keep-alive would fix this, and I see that Mark enabled it recently. But when Squid receives a POST request, it forces any previously open keepalive connection to close, before opening a new connection to send the POST." [23:55:54] link from ryan earlier http://www.squid-cache.org/mail-archive/squid-users/200702/0054.html .. and 2) is already maxed out [23:56:36] You have run out of free ports, all available ports occupied by [23:56:37] TIME_WAIT sockets. 
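A quick way to confirm the ephemeral-port / TIME_WAIT exhaustion described in that May 2012 thread, on a box like cp1001 -- a sketch; the awk field numbers assume classic netstat -tn output:

    # Total sockets stuck in TIME_WAIT (the 2012 thread cites ~95k as the failure point)
    netstat -tn | awk '$6 == "TIME_WAIT"' | wc -l

    # Break the count down by remote peer, to see whether one destination
    # (e.g. the appservers/api service IP) is absorbing nearly all of them
    netstat -tn | awk '$6 == "TIME_WAIT" {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head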
[23:56:52] Hmm, so [23:57:02] The thread ended with Tim suggesting a few things, and Asher responding with: [23:57:11] so_reuseaddr? [23:57:12] "I don't believe that squid caches the responses to POST requests, even if they are technically cacheable. So instead of letting CARP send all /api.php POST requests to cp1004, why don't we add an acl to the frontend squids that just matches api.php POST requests, and sends them directly to api.svc.pmtpa.wmnet?" [23:57:28] POST is not cacheable normally [23:57:31] He wrote that a year ago, I have no idea whether that was ever implemented [23:57:36] at least when following the HTTP spec [23:57:39] And I can't ask him now because he's not in the office or on IRC [23:58:38] that sounds like a good idea
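The ACL change itself would live in the frontend squid configuration, which is not quoted here; but whether api.php POSTs still traverse a backend text squid can be checked from the response headers, since each squid that handles a request names itself in Via/X-Cache (as in the cp1001 error page quoted earlier). A hedged one-liner:

    # Inspect which caches a test POST passes through; if a cp10xx backend still
    # shows up for api.php, the direct-to-api.svc ACL is not (or not yet) in place.
    curl -s -D - -o /dev/null \
         --data 'action=query&meta=siteinfo&format=json' \
         'http://en.wikipedia.org/w/api.php' | grep -iE '^(via|x-cache)'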