[00:05:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [00:22:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [00:26:04] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:30:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:31:54] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [00:32:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [00:32:44] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 00:32:40 UTC 2013 [00:32:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [00:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:53:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [00:56:04] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:16] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [01:08:46] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [01:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [01:32:46] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 01:32:38 UTC 2013 [01:33:16] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [01:36:31] 
PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:37:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [01:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [02:00:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:02:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [02:06:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [02:06:52] !log LocalisationUpdate completed (1.22wmf12) at Thu Aug 8 02:06:51 UTC 2013 [02:07:04] Logged the message, Master [02:11:38] (03CR) 10GWicke: "Ping!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/72653 (owner: 10Pyoungmeister) [02:16:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Aug 8 02:16:52 UTC 2013 [02:17:03] Logged the message, Master [02:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [02:33:21] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 02:33:11 UTC 2013 [02:33:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [02:49:11] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [02:53:36] (03CR) 10MaxSem: "We might want to use https://github.com/lkarsten/libvmod-cookie here." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/75316 (owner: 10Mark Bergsma) [03:01:31] PROBLEM - Puppetmaste [03:08:27] eh, did icinga-wm just choke? [03:18:16] MaxSem: strange, checking [03:26:40] weird, the log line is complete [03:26:44] lemme check something [03:26:54] oh wait! [03:26:57] partition is full [03:27:05] heh [03:27:15] same thing as before [03:27:53] it doesn't monitor its own host? [03:28:54] MaxSem: i think there aren't a lot of disk use checks in general [03:29:25] jeremyb: is there an explicit list somewhere of "checks we should have but don't yet" [03:29:32] like, where should we add this one? :) [03:29:36] * MaxSem recalls that they actually helped not so long ago [03:30:13] https://wikitech.wikimedia.org/wiki/Projects#Basic_monitoring_.26_alerting [03:30:56] greg-g: idk, but there's https://rt.wikimedia.org/Ticket/Display.html?id=4728 :) [03:39:02] doesn't monitor own disk use [03:40:30] (03PS1) 10Lcarr: icinga needs logrotate [operations/puppet] - 10https://gerrit.wikimedia.org/r/78185 [03:41:36] (03CR) 10Lcarr: [C: 032] icinga needs logrotate [operations/puppet] - 10https://gerrit.wikimedia.org/r/78185 (owner: 10Lcarr) [03:47:12] also strangely enough our puppet logrotate doesn't actually rotate the puppet file [03:48:08] jeremyb: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=DISK [03:48:15] (03PS1) 10Lcarr: rotate puppet.log in logrotate [operations/puppet] - 10https://gerrit.wikimedia.org/r/78187 [03:48:31] mutante: Ryan_Lane ^^ ? [03:49:06] LeslieCarr: ah, has it been called /var/log/puppet but now its puppet.log ? 
[03:49:10] looks like it
[03:49:23] yeah
[03:49:50] (03CR) 10Dzahn: [C: 032] "makes sense, used to be /var/log/puppet and now changed to puppet.log it seems" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78187 (owner: 10Lcarr)
[04:08:02] (03PS1) 10Lcarr: fixing icinga.log logrotate [operations/puppet] - 10https://gerrit.wikimedia.org/r/78189
[04:08:33] (03CR) 10Dzahn: [C: 032] fixing icinga.log logrotate [operations/puppet] - 10https://gerrit.wikimedia.org/r/78189 (owner: 10Lcarr)
[04:10:41] thanks mutante
[04:11:58] done on sockpuppet
[04:13:56] (03PS9) 10Dzahn: add virtual language subdomain redirects for wikidata [operations/apache-config] - 10https://gerrit.wikimedia.org/r/65443
[04:21:02] (03PS6) 10Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/76654
[04:22:06] (03CR) 10Dzahn: "was supposed to be a patch file: https://gerrit.wikimedia.org/r/#/c/76654/6/debian/patches/log4js.js.patch" [operations/debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/76661 (owner: 10Dzahn)
[04:38:11] LeslieCarr: < wikibugs_> (NEW) please change mail FROM address to not use noc@ - https://bugzilla.wikimedia.org/52628 normal; Wikimedia: OTRS; ()
[04:38:24] so we don't get all the otrs mail
[04:38:58] yay
[04:39:00] *heart*
[04:39:16] andre_ told me to open it in component OTRS :)
[04:41:29] that looks annoying
[04:42:10] Oops.
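The logrotate fixes merged above (r78185, r78187, r78189) were prompted by icinga filling its own partition. As a rough illustration only — not the contents of those actual changes — a Puppet-managed logrotate rule for the offending logs could look like this; the file path, log locations, and weekly/4 rotation schedule are assumptions:

```puppet
# Hypothetical sketch, not the real Gerrit change. Install a logrotate
# rule so icinga.log and the renamed puppet.log stop filling the disk,
# as happened in the incident above. Paths and schedule are illustrative.
file { '/etc/logrotate.d/icinga':
  ensure  => present,
  owner   => 'root',
  group   => 'root',
  mode    => '0444',
  content => "/var/log/icinga/icinga.log /var/log/puppet.log {\n  weekly\n  rotate 4\n  compress\n  missingok\n  notifempty\n}\n",
}
```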
[04:42:56] can create any new alias, like otrs-admins@ if you like [04:57:04] hrm i think that there's some things that expect the old nagios.cmd file in nagios plugins being submitted by nsca [04:58:43] may need to revert that old change [05:00:11] (03PS1) 10Lcarr: Revert "Fixed the path to icinga cmd file all around +file" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78200 [05:00:27] (03PS2) 10Lcarr: Revert "Fixed the path to icinga cmd file all around +file" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78200 [05:02:01] (03CR) 10Lcarr: [C: 032] "Reverting for now - need to fix and repackage the nagios plugins" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78200 (owner: 10Lcarr) [05:22:55] anybody around who can approve a new gerrit account? [05:25:39] How is the list on noc.wikimedia.org generated? I hope not by hand? [05:28:01] greg-g: Which list? [05:28:08] I think he means the conf/ index. [05:28:09] oh, /conf/ [05:28:14] I'm a mindreader. [05:28:16] Yes, it's manually [05:28:18] :) [05:28:27] Well, add a line to a file, run script to create symlinks [05:28:32] heh [05:28:35] gotcha [05:28:41] Tim and Krinkle discussed making that whole index page pointers to GitHub. [05:28:52] Or git.wm.o, I suppose, if we can keep that running for more than ten minutes. [05:28:53] I was just looking for our version of https://github.com/pediapress/mwlib.rl/blob/master/example-mwlib.config [05:29:04] We shell out to PediaPress (the company). [05:29:07] As I understand it. [05:29:19] Or maybe we have (also?) internal PDF servers? [05:29:21] I thought we had our own servers doing it [05:29:24] yeah [05:29:29] pdfXXXX [05:29:36] greg-g: Elsie: noc.wm.o/conf is mostly generated. 
The file contents are live from wmf-deployment, the html overview is generated based on a white list
[05:29:46] greg-g: that would be hiding away on the pdf boxes most likely
[05:29:51] it's all in version control, so look it up if you want to know more
[05:29:52] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=PDF%2520servers%2520pmtpa&tab=m&vn=
[05:29:55] which may or may not be a mess still
[05:29:55] Or perhaps in Puppet.
[05:30:04] wikitech.wm.o probably has docs about this.
[05:30:07] (Or really ought to.)
[05:30:25] https://github.com/wikimedia/operations-mediawiki-config/tree/master/docroot/noc/conf
[05:30:38] https://github.com/wikimedia/operations-mediawiki-config/blob/master/docroot/noc/conf/index.php
[05:31:11] https://github.com/wikimedia/operations-mediawiki-config/blob/master/docroot/noc/createTxtFileSymlinks.sh
[05:31:13] Krinkle: greg-g was investigating a PDF bug. noc.wm.o was a secondary question. ;-)
[05:31:20] Krinkle knows the symlinks.sh file well. ^_^
[05:33:39] but, like I said, sleep is a good idea
[05:33:41] g'night
[05:40:26] Elsie: yea, it's pediapress operated
[05:47:32] Elsie: /home/pp .. heiko@duck, volker@caramba .. and others
[05:50:20] Elsie: docs _would_ be here if they were http://simple.pediapress.com/wiki/RenderServerOperation
[05:50:35] been like that for months ..
[05:52:23] https://wikitech.wikimedia.org/wiki/Pdf1.wikimedia.org [06:00:39] test ircecho [06:00:45] yay [06:01:00] i'm a bot [06:08:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [06:29:15] (03CR) 10Lcarr: [C: 04-1] "still missing firewall rules" [operations/puppet] - 10https://gerrit.wikimedia.org/r/75777 (owner: 10Demon) [06:30:22] (03CR) 10Yuvipanda: [C: 04-1] "Needs a newer nginx-extras to work, hoping I can convince Kartik to work on it :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 (owner: 10Yuvipanda) [06:32:55] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 06:32:47 UTC 2013 [06:33:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:05:45] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:32:25] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [07:32:25] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [07:32:25] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [07:32:25] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours [07:32:25] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [07:32:26] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [07:33:05] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 07:32:59 UTC 2013 [07:33:45] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:45:45] Elsie: jeluf and elian are otrsadmin.. isn't that outdated? 
[07:46:04] mail alias that is [07:57:19] !log reedy synchronized docroot and w [07:57:30] Logged the message, Master [07:59:30] !log reedy synchronized wmf-config/ [07:59:41] Logged the message, Master [08:04:19] (03CR) 10Dzahn: [C: 032] "Rangilo, feel free to add me as reviewer next time you have changes." [operations/puppet] - 10https://gerrit.wikimedia.org/r/78009 (owner: 10Rangilo Gujarati) [08:04:40] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:40] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:40] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:40] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:40] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:41] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:05:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [08:08:44] (03CR) 10Dzahn: [C: 032] "http://docs.python-requests.org/en/latest/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/75632 (owner: 10Hashar) [08:12:57] (03CR) 10Dzahn: "ii python-requests 0.8.2-1" [operations/puppet] - 10https://gerrit.wikimedia.org/r/75632 (owner: 10Hashar) [08:15:05] (03CR) 10Dzahn: [C: 031] Add remaining skins (Cologne Blue and Modern) to sync script. [operations/puppet] - 10https://gerrit.wikimedia.org/r/69444 (owner: 10Mattflaschen) [08:16:20] let's block Googlebot from git.wikimedia ?! [08:16:41] https://gerrit.wikimedia.org/r/#/c/77909/1 [08:17:13] hmm. 
it's kind of down too
[08:20:27] !log attempted to restart gitblit on antimony
[08:20:37] Logged the message, Master
[08:20:44] !log git.wikimedia.org back
[08:20:55] Logged the message, Master
[08:24:44] mutante: I had suggested blocking the useragent for the zip urls
[08:25:08] looking at access.log , indeed it's pretty fast
[08:25:12] and full of Googlebot
[08:25:33] and java CPU usage high
[08:25:56] I'd say block it for now
[08:26:06] ^demon will fix it later
[08:26:11] (03CR) 10Dzahn: [C: 032] "indeed access.log is full of Googlebot" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77909 (owner: 10Demon)
[08:26:18] alright, yeay, d
[08:36:14] (03CR) 10Reedy: [C: 032] cswikiquote: Add custom namespace for works [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78142 (owner: 10Danny B.)
[08:36:15] (03Merged) 10jenkins-bot: cswikiquote: Add custom namespace for works [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78142 (owner: 10Danny B.)
[08:37:25] Ryan_Lane:
[08:37:30] apergos: ?
[08:38:02] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 08:37:55 UTC 2013
[08:38:03] google supposedly has reduced their queries / sec and the number of parallel connections to prevent this
[08:38:14] til we get robots.txt fixed, which demon was gonna do today
[08:38:21] Ryan_Lane: ping?
[08:38:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[08:38:51] yuvipanda: ?
[08:39:01] apergos: ah, ok
[08:39:11] Ryan_Lane: I added KartikMistry to the project-proxy project, but he's still not able to ssh in to even bastion
[08:39:12] he has shell
[08:39:15] and the key works for gerrit
[08:39:18] is there more that needs to be done?
[08:39:26] what's the shell account name?
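The user-agent block merged just above (change 77909) lived in the git.wikimedia.org Apache template. A minimal sketch of one way to express that kind of block as a Puppet-managed Apache fragment — the file path, the choice of mod_rewrite, and the rule details are illustrative assumptions, not the contents of the real change:

```puppet
# Hypothetical sketch, not the actual git.wikimedia.org.erb change.
# Return 403 Forbidden to Googlebot on the expensive /zip/ archive URLs
# that were overloading gitblit, while leaving the rest of the site
# crawlable by other paths and agents.
file { '/etc/apache2/conf.d/gitblit-zip-block.conf':
  ensure  => present,
  content => "RewriteEngine On\nRewriteCond %{HTTP_USER_AGENT} Googlebot\nRewriteRule ^/zip/ - [F]\n",
}
```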
[08:39:28] if it's still falling over that's not good, I can let em know (but it might be a while before I get a response, wrong tz)
[08:39:44] otherwise we can *cough* ignore it til demon's fix later today
[08:39:47] Ryan_Lane: KartikMistry
[08:40:07] it's kartik
[08:40:09] ;)
[08:40:17] aaah
[08:40:21] so it is a username problem
[08:40:22] yuvipanda: FAIL
[08:40:45] yuvipanda: can the user try sshing in?
[08:41:00] (03PS2) 10Reedy: Set up flood flag for zhwikinews [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78018 (owner: 10TTO)
[08:41:06] (03CR) 10Reedy: [C: 032] Set up flood flag for zhwikinews [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78018 (owner: 10TTO)
[08:41:08] !log on overloaded antimony: stopped gitblit, ran puppet, blocked Googlebot, restarted apache and gitblit
[08:41:11] apergos:
[08:41:16] (03Merged) 10jenkins-bot: Set up flood flag for zhwikinews [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78018 (owner: 10TTO)
[08:41:18] Logged the message, Master
[08:41:19] it doesn't care so far :p
[08:41:29] Googlebot/2.1; +http://www.google.com/bot.html)"
[08:41:31] ok well lemme tell em we've blocked them temporarily
[08:41:35] Ryan_Lane: still doesn't work
[08:41:36] yuvipanda: yeah, using the wrong username
[08:41:41] Ryan_Lane: even with kartik@
[08:41:44] same error
[08:41:52] Invalid user KartikMistry from
[08:41:52] until demon can get his end fixed
[08:41:59] so, yeah, wrong username
[08:42:07] (03PS2) 10Reedy: (bug 52578) New user group 'botadmin' on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78056 (owner: 10Dereckson)
[08:42:53] (03CR) 10Siebrand: "Added some platform engineers as in I3c156792." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/66339 (owner: 10Liangent)
[08:42:54] apergos: thanks, it seems to ignore the block
[08:42:56] now it's working
[08:43:02] Ryan_Lane: yeah, it is
[08:43:02] it does?
[08:43:04] I'm an idiot.
[08:43:10] er how did you block them?
[08:43:32] apergos: this is definitely in Apache https://gerrit.wikimedia.org/r/#/c/77909/1/templates/apache/sites/git.wikimedia.org.erb but access.log still ..
[08:43:39] (03CR) 10Reedy: [C: 032] (bug 52578) New user group 'botadmin' on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78056 (owner: 10Dereckson)
[08:43:46] (03Merged) 10jenkins-bot: (bug 52578) New user group 'botadmin' on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78056 (owner: 10Dereckson)
[08:43:47] apergos: oh.. Googlebot-Image/1.0 != Googlebot ?
[08:43:53] no
[08:44:00] googlebot is the one that got rate limited
[08:44:09] that should be enough, it's the /zip/ paths primarily that were the problem
[08:44:17] Googlebot-Image is the one still being very active
[08:44:21] ok
[08:44:25] it's minor compared to the rest
[08:44:34] alright
[08:44:35] I did a count from the logs yesterday
[08:44:42] ah:) good
[08:44:53] (03PS2) 10Reedy: Add autopatrol protection level for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77701 (owner: 10TTO)
[08:44:56] (03CR) 10Reedy: [C: 032] Add autopatrol protection level for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77701 (owner: 10TTO)
[08:45:16] (03Merged) 10jenkins-bot: Add autopatrol protection level for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77701 (owner: 10TTO)
[08:45:29] please note it on the bz report so we don't forget
[08:46:09] java CPU usage isn't > 500% anymore though :)
[08:46:39] (03PS7) 10Reedy: Clean up headers in CommonSettings and InitialiseSettings [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/76342 (owner: 10TTO)
[08:46:51] (03CR) 10Reedy: [C: 032] Clean up headers in CommonSettings and InitialiseSettings [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/76342 (owner: 10TTO)
[08:46:56] I bet it isn't
[08:47:01] (03Merged) 10jenkins-bot: Clean up headers in CommonSettings and InitialiseSettings [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/76342 (owner: 10TTO)
[08:48:27] apergos: done
[08:53:09] !log reedy synchronized wmf-config/
[08:59:30] thanks
[09:02:42] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 09:02:37 UTC 2013
[09:03:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[09:10:59] (03PS1) 10Dzahn: fix atom.xml redirect on planet.wikimedia.org without language subdomain (bug 46721) [operations/puppet] - 10https://gerrit.wikimedia.org/r/78220
[09:12:46] (03PS2) 10Dzahn: fix atom.xml redirect on planet.wikimedia.org without language subdomain (bug 46721) [operations/puppet] - 10https://gerrit.wikimedia.org/r/78220
[09:13:25] (03CR) 10Dzahn: [C: 032] fix atom.xml redirect on planet.wikimedia.org without language subdomain (bug 46721) [operations/puppet] - 10https://gerrit.wikimedia.org/r/78220 (owner: 10Dzahn)
[09:23:38] apergos: is adding a (readable) robots.txt the long term solution for gitblit, or something else?
[09:23:54] the current status of the bugzilla ticket is not super-clear
[09:24:07] fixing the issue with the serving of the robots.txt file is the fix, yes
[09:24:24] at that point if we want to add a few other paths in there we can
[09:33:07] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 09:32:59 UTC 2013
[09:33:17] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[09:43:30] (03CR) 10Mark Bergsma: "No, I had looked at its code and it sucks. Large static arrays, linear searches, no sorting, etc.
But an improved/fixed up version of it c" [operations/puppet] - 10https://gerrit.wikimedia.org/r/75316 (owner: 10Mark Bergsma)
[09:48:30] (03CR) 10Mark Bergsma: [C: 032] Setup a misc services varnish cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/78090 (owner: 10Mark Bergsma)
[09:50:39] (03PS1) 10Mark Bergsma: Rename role::cache::ssl::text to ssl::unified [operations/puppet] - 10https://gerrit.wikimedia.org/r/78224
[09:55:09] apergos: should it be added to the bug summary?
[09:55:48] (03PS1) 10Mark Bergsma: Install cp1043/1044 as misc varnish servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/78225
[09:55:49] (03PS1) 10Mark Bergsma: Decommission cp1041-1044 [operations/puppet] - 10https://gerrit.wikimedia.org/r/78226
[09:55:59] I don't think it's necessary, unless demon doesn't get to it today for some reason
[09:56:03] (03CR) 10Mark Bergsma: [C: 032] Rename role::cache::ssl::text to ssl::unified [operations/puppet] - 10https://gerrit.wikimedia.org/r/78224 (owner: 10Mark Bergsma)
[09:56:41] (03CR) 10Mark Bergsma: [C: 032] Install cp1043/1044 as misc varnish servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/78225 (owner: 10Mark Bergsma)
[09:57:04] (03CR) 10Mark Bergsma: [C: 032] Decommission cp1041-1044 [operations/puppet] - 10https://gerrit.wikimedia.org/r/78226 (owner: 10Mark Bergsma)
[09:57:45] oki doki
[09:57:58] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[10:08:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[10:09:41] (03PS1) 10TTO: Clean up first part of InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78227
[10:31:15] (03PS1) 10Mark Bergsma: Move cp1043 and cp1044 internal [operations/puppet] - 10https://gerrit.wikimedia.org/r/78228
[10:31:52] (03CR) 10Mark Bergsma: [C: 032] Move cp1043 and cp1044 internal [operations/puppet] - 10https://gerrit.wikimedia.org/r/78228 (owner:
10Mark Bergsma)
[10:33:21] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 10:33:16 UTC 2013
[10:33:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[10:34:20] (03CR) 10Mark Bergsma: "Why not just put a single file resource and use the "recurse" attribute? :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/76678 (owner: 10Tim Starling)
[10:39:10] I should resurrect this grnet script...
[10:39:40] file { "/home/faidon":
[10:39:40] ensure => directory,
[10:39:40] source => [
[10:39:40] "puppet:///files/userfiles/faidon/",
[10:39:40] "puppet:///files/userfiles/skel/",
[10:39:42] ],
[10:39:44] sourceselect => first,
[10:39:47] recurse => remote,
[10:39:49] is what we had
[10:39:58] yeah
[10:43:07] (03CR) 10Faidon: "I've used this previously and has worked well for me:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/76678 (owner: 10Tim Starling)
[11:06:06] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[11:35:16] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 11:35:09 UTC 2013
[11:36:06] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:01:53] PROBLEM - SSH on sq50 is CRITICAL: Server answer:
[12:02:53] RECOVERY - SSH on sq50 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[12:03:13] PROBLEM - SSH on sq60 is CRITICAL: Server answer:
[12:04:13] RECOVERY - SSH on sq60 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[12:05:23] PROBLEM - SSH on sq59 is CRITICAL: Server answer:
[12:06:14] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:06:24] RECOVERY - SSH on sq59 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[12:06:34] PROBLEM - SSH on virt1000 is CRITICAL: Server answer:
[12:07:34] RECOVERY - SSH on virt1000 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[12:11:34]
PROBLEM - SSH on virt1000 is CRITICAL: Server answer: [12:12:34] RECOVERY - SSH on virt1000 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:14:14] PROBLEM - SSH on lvs1003 is CRITICAL: Server answer: [12:15:14] RECOVERY - SSH on lvs1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:32:44] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 12:32:40 UTC 2013 [12:33:14] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:34:00] (03CR) 10MZMcBride: "This is a small chunk? :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78227 (owner: 10TTO) [12:34:32] mutante: Super-outdated. :-) [12:34:52] mutante: JeLuF and Elian haven't been involved since like 2005, I don't think. [12:49:44] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [13:05:36] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [13:14:45] have you seen the size of the whole config file lately ( Elsie )? [13:32:56] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 13:32:46 UTC 2013 [13:33:36] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [13:56:56] (03CR) 10Ottomata: "(3 comments)" [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78160 (owner: 10Edenhill) [14:06:29] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [14:32:49] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 14:32:47 UTC 2013 [14:33:29] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:08:08] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:12:28] (03CR) 10Demon: "Yeah just hadn't gotten back to it yet." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/75777 (owner: 10Demon) [15:13:15] <^d> manybubbles: I had a dream that we did something very very wrong in CirrusSearch [15:13:19] <^d> And it broke things. [15:13:29] <^d> Sadly it was a dream, so I'm having trouble remembering what we did wrong. [15:13:39] <^d> (And if it was a reflection on what we're actually doing) [15:13:47] ! [15:14:19] the only thing I'm worried about right now is making sure we run side by side correctly.... if that is ok then I think we'll be fine. [15:14:33] <^d> So I think today I'm going to look at a bunch of things with my paranoid hat on. [15:14:40] <^d> And assume the worst :) [15:14:41] but yeah, if you remember what your dream said it might be useful. unless it was about rainbows. [15:14:47] sounds good [15:17:12] <^d> Heh, I see what you were trying to do in https://gerrit.wikimedia.org/r/#/c/78151/ [15:32:48] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 15:32:43 UTC 2013 [15:33:08] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:40:41] (03PS1) 10Demon: Don't proxy robots.txt [operations/puppet] - 10https://gerrit.wikimedia.org/r/78239 [15:51:57] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Thu Aug 8 15:51:55 UTC 2013 [15:52:59] (03PS2) 10Demon: Don't proxy robots.txt [operations/puppet] - 10https://gerrit.wikimedia.org/r/78239 [15:55:08] (03CR) 10ArielGlenn: [C: 032] Don't proxy robots.txt [operations/puppet] - 10https://gerrit.wikimedia.org/r/78239 (owner: 10Demon) [16:06:37] (03PS1) 10Demon: Revert "Poor googlebot, spidering git too fast" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78243 [16:07:50] (03PS1) 10Manybubbles: Turn down elasticsearch heap usage in production. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/78244
[16:08:29] ^d: ^^^
[16:08:37] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:09:23] (03CR) 10Demon: [C: 031] "Looks good and needed. Merge me plz :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78244 (owner: 10Manybubbles)
[16:09:44] (03PS2) 10Demon: Revert "Poor googlebot, spidering git too fast" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78243
[16:11:32] (03CR) 10ArielGlenn: [C: 032] Revert "Poor googlebot, spidering git too fast" [operations/puppet] - 10https://gerrit.wikimedia.org/r/78243 (owner: 10Demon)
[16:13:04] (03PS3) 10Akosiaris: Refactor nrpe to a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/77720
[16:13:36] <^d> akosiaris: Hey, got a backups question for you :)
[16:13:47] hit me
[16:14:19] <^d> So until yesterday, we had daily backups of svn copied off to tridge. We discontinued those since SVN is now a read-only service.
[16:14:32] <^d> So A) If you've got old backups laying around on tridge, feel free to kill them
[16:14:54] <^d> and B) I've got a final backup of SVN, it's about 14G. Where would be a safe place to store that $practicallyForever?
[16:16:30] A) cool. We are redesigning backup using a new infrastructure so that part will be implicit (just not migrated to the new infra)
[16:16:54] B) The new infra. I can create a $forever volume for such things
[16:17:13] let's call it archival (sounds better)
[16:17:59] <^d> Sounds like a plan.
[16:18:18] :-)
[16:19:44] I wonder what we want to do with rather larger data sets that we should keep around forever (a few T, more than a few T)
[16:20:24] RAID1 over the internet ?
[16:20:48] old slashdot joke...
[16:20:56] apergos: like ?
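The file-resource fragment Faidon pasted at 10:39 earlier in the log was cut off before its closing brace. Reassembled, it would read roughly as below; only the closing brace is supplied here, the attributes are as pasted. It populates a home directory from a per-user file tree, falling back to a shared skel tree when no per-user file exists:

```puppet
# Reassembly of the manifest fragment pasted at 10:39 (closing brace added).
# "sourceselect => first" makes Puppet take each file from the first source
# that provides it; "recurse => remote" copies the remote tree recursively.
file { "/home/faidon":
  ensure       => directory,
  source       => [
    "puppet:///files/userfiles/faidon/",
    "puppet:///files/userfiles/skel/",
  ],
  sourceselect => first,
  recurse      => remote,
}
```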
[16:21:05] page view stats for example [16:21:10] we keep them forever [16:21:29] having a safe place besides just "on spinning disks in two places" might be nice [16:21:30] <^d> akosiaris: We did an NFS mount between Amsterdam and Tampa once. ma rk didn't like that. [16:21:51] i would not like that either [16:22:22] wait til you hear where the labs (in tamp) gluster filesystem is mounted... [16:23:26] but i was referring to allowing people to mirror the data freely (through whatever technology). That's what CommanderTaco called RAID1 over the internet [16:23:50] we have a couple copies off site but they aren't guaranteed to be there [16:29:51] Am I in the right place to ask about logo changes? [16:29:54] To Wikidata [16:30:03] Or is it -tech [16:32:02] Wrong place, apparently. [16:32:47] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 16:32:43 UTC 2013 [16:33:37] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:09:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:15:50] ^d: antimony unhappy again [17:16:02] <^d> dangit. [17:17:16] <^d> apergos: We're gonna have to block some other things too, /zip/ wasn't enough :\ [17:18:15] (03PS1) 10Manybubbles: Turn down suggestion caching in labs. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78252 [17:18:17] block all the things [17:32:52] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 17:32:44 UTC 2013 [17:33:12] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [17:33:12] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [17:33:12] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [17:33:12] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours [17:33:12] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [17:33:13] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [17:33:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:37:16] <^d> paravoid: So, testsearch1001 is freaking out or something. [17:37:21] <^d> http://ganglia.wikimedia.org/latest/?r=week&cs=08%2F06%2F2013+00%3A00+&ce=08%2F08%2F2013+00%3A00+&c=Miscellaneous+eqiad&h=testsearch1001.eqiad.wmnet&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS [17:37:25] <^d> 1002 and 1003 are fine [17:37:47] <^d> elastic search isn't running yet on any of them (unrelated issue, fix in puppet for that) [17:38:33] wow [17:38:44] <^d> top is reporting power_saving and watchdog as taking up >100% cpu in spikes. [17:41:00] cmjohnson1: around? [17:47:38] (03CR) 10Demon: "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78252 (owner: 10Manybubbles) [17:50:13] <^d> paravoid: Thinking it's hardware? [17:51:52] that [17:52:00] or the friggin power settings again [17:53:13] (03PS2) 10Manybubbles: Turn down suggestion caching in labs. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78252 [17:53:20] i hate hardware [17:53:48] (03CR) 10Demon: [C: 032] Turn down suggestion caching in labs. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78252 (owner: 10Manybubbles) [17:53:57] (03Merged) 10jenkins-bot: Turn down suggestion caching in labs. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78252 (owner: 10Manybubbles) [17:54:31] <^d> manybubbles: Ok, should have your 30m cache expiry in labs soon. [17:59:59] (03PS1) 10Mark Bergsma: Correct LVS service IP name [operations/puppet] - 10https://gerrit.wikimedia.org/r/78255 [18:00:10] <^d> Well that was annoying. [18:01:06] so should we put gitblit behind varnish? [18:01:12] (03CR) 10Mark Bergsma: [C: 032] Correct LVS service IP name [operations/puppet] - 10https://gerrit.wikimedia.org/r/78255 (owner: 10Mark Bergsma) [18:01:33] <^d> mark: It doesn't output sane enough caching headers yet. [18:01:49] what does it output? [18:02:18] <^d> Take https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FSpamBlacklist.git/29f69a35e7e115455796c8c04ab2f9ee7f53f213 for example. [18:02:31] <^d> Cache-Control: private, must-revalidate [18:02:32] <^d> Last-Modified: Thu, 08 Aug 2013 17:23:22 GMT [18:02:32] <^d> Expires: Thu, 08 Aug 2013 18:07:06 GMT [18:02:50] <^d> That's a diff page. The content will *never* change. [18:03:00] that's stupid indeed [18:03:05] we can override it in varnish, but it's hacky of course [18:03:12] is there anything that is actually private? [18:03:20] does it allow logins and such? [18:03:20] <^d> No, there's not. [18:03:26] <^d> Yes, but we have that disabled. [18:03:38] then we might as well force caching on it [18:03:53] hehe [18:03:58] varnish by default doesn't honor CC: private at all [18:04:01] I guess this is why [18:04:06] <^d> Upstream is nice and responsive.
I can ask him to fix this :) [18:04:13] yeah that would be nice [18:04:59] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:04:59] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:04:59] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:04:59] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [18:04:59] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [18:05:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [18:05:51] <^d> mark: So I'll prod him on this. But generally yes, I think gitblit is a good candidate for your "misc varnish cluster" you were talking about. [18:06:08] i'm setting up two boxes for that now [18:06:10] so we can start on that [18:07:29] <^d> I'm probably going to move the old svn stuff to that same box at some point, which could also sit behind varnish. [18:07:50] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:08:14] another advantage is that many services won't need public IPs anymore [18:08:18] mark: out of pure curiosity, how come went gitblit instead of cgit? [18:08:33] you're asking me? [18:08:37] er [18:08:38] <^d> That was me. [18:08:38] sorry ^d [18:08:40] <^d> :) [18:09:31] <^d> So, the original plan was to have gitblit as a gerrit plugin. But that basically didn't work well and increased gerrit's load. I still liked gitblit and the upstream was nice & active so I went for it. 
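[editor's note: the varnish override mentioned above ("we might as well force caching on it") could, in outline, look like the following Varnish 3-era VCL. This is a hedged sketch, not the config that was actually merged in gerrit 78262; the /commitdiff/ URL pattern and the 30-day TTL are illustrative assumptions.]

```vcl
# Hypothetical VCL 3.x sketch: force caching of immutable gitblit
# diff pages despite the backend's "Cache-Control: private" header.
# URL pattern and TTL are guesses, not the deployed configuration.
sub vcl_fetch {
    if (req.url ~ "^/commitdiff/") {
        # A diff is addressed by commit hash, so it never changes;
        # drop the backend's private/must-revalidate headers.
        unset beresp.http.Cache-Control;
        unset beresp.http.Expires;
        set beresp.ttl = 30d;
    }
}
```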
[18:09:39] ah [18:09:50] mark: see http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&h=testsearch1001.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=cpu_report&c=Miscellaneous+eqiad btw [18:09:57] looks like http://comments.gmane.org/gmane.linux.hardware.dell.poweredge/43926 [18:10:05] i saw [18:10:07] {hard,firm}ware bug [18:10:14] <^d> paravoid: Also possibly I've drank the java cool-aid and I figured "what's another jvm app running behind apache?" ;-) [18:10:15] annoying [18:10:23] ^d: haha [18:11:19] mark: so, I think we need to change our mail infrastructure if we want to confuse google less... [18:11:42] we need to separate lists & wikimedia.org handling completely [18:12:32] it's architecturally more sound for us too [18:12:50] but it'd help with google's spam scoring too... [18:18:30] PROBLEM - Squid on brewster is CRITICAL: Connection timed out [18:19:30] RECOVERY - Squid on brewster is OK: TCP OK - 1.026 second response time on port 8080 [18:20:24] perhaps we should put squid behind varnish too ;) [18:20:53] whoops [18:20:54] broken disk in brewster [18:21:46] for 3 months already [18:21:47] wtf [18:23:30] PROBLEM - Squid on brewster is CRITICAL: Connection timed out [18:24:43] !log Removed /dev/sda from sw raid arrays on brewster [18:24:55] Logged the message, Master [18:26:47] (03PS1) 10Demon: Also disallow blobdiff from indexing [operations/puppet] - 10https://gerrit.wikimedia.org/r/78256 [18:28:06] (03CR) 10ArielGlenn: [C: 032] Also disallow blobdiff from indexing [operations/puppet] - 10https://gerrit.wikimedia.org/r/78256 (owner: 10Demon) [18:30:30] RECOVERY - Squid on brewster is OK: TCP OK - 0.026 second response time on port 8080 [18:31:05] also squid ran out of space [18:32:43] * mark wipes everything [18:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Aug 8 18:32:51 UTC 2013 [18:33:30] PROBLEM - Squid on brewster is CRITICAL: Connection refused [18:33:50] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful 
Puppet run in the last 10 hours [18:35:30] RECOVERY - Squid on brewster is OK: TCP OK - 0.027 second response time on port 8080 [18:39:59] (03PS1) 10Mark Bergsma: Set memory_storage_size for the misc cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/78257 [18:43:37] (03CR) 10Mark Bergsma: [C: 032] Set memory_storage_size for the misc cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/78257 (owner: 10Mark Bergsma) [18:44:01] yay misc cluster yay [18:46:02] heh [18:46:06] what's the default backend for the misc cluster [18:49:25] ^d: does gitblit care about the source ip at all? [18:49:32] and/or does it parse XFF? [18:50:23] <^d> Nope, shouldn't care at all. [18:50:43] so it lives on antimony port 8080? [18:50:48] i need something to test against, might as well use that [18:50:50] even if it doesn't cache yet [18:51:45] <^d> Oh, one minor weird thing if you're pointing directly at 8080. [18:52:04] <^d> I have to set the following in apache: [18:52:04] <^d> RequestHeader set X-Forwarded-Proto https [18:52:11] <^d> RequestHeader set X-Forwarded-Port 443 [18:52:38] i can do the same in varnish [18:53:40] blog is also a good candidate [18:53:45] there's a varnish there already [18:54:07] yeah [18:54:27] also, it's broken wrt XFF [18:54:38] has been since it was moved into a different server [18:54:47] :) [18:54:56] (i.e. all comments appear from 127.0.0.1) [18:55:14] the old machine had mod-rpaf I think [18:56:03] * Jeff_Green seeks opinions on where to store some of the OTRS config [18:56:23] they're giving us what was done before in the form of a single OTRS package [18:56:28] oh hi Jeff_Green [18:56:39] oh hai [18:56:42] it'd help with spam scoring if we implemented dkim across our mailservers [18:56:45] are you interested? 
:) [18:56:57] i think it's a great idea [18:57:02] take it to the bridge :-P [18:57:22] I've done it before but frankly I'm too busy with other engagements atm [18:57:54] so I was wondering if you had some spare cycles, since you're probably the only other person in the team that has done so before :) [18:58:00] me too. i already put in my 20 hour day this week, so I'm pretty well out of hours before my vacation starts next monday [18:58:11] oh, shame [18:58:20] I'm happy to do it when I return. iirc i puppetized it already, just a matter of making the keys and stuff [18:58:30] and DNS entries of course [18:59:45] huh, i don't see it...where [18:59:58] (03PS1) 10Mark Bergsma: Configure the misc varnish cluster for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78262 [18:59:59] ^d: ^ [19:00:16] (03CR) 10jenkins-bot: [V: 04-1] Configure the misc varnish cluster for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78262 (owner: 10Mark Bergsma) [19:01:04] <^d> mark: Missing a comma in cache.pp, but otherwise lgtm. [19:01:07] ah ha. it's in templates/exim/exim4.donate.erb [19:01:54] (03PS2) 10Mark Bergsma: Configure the misc varnish cluster for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78262 [19:02:00] note that I'll need to setup ssl too [19:02:11] but i'll make this work first [19:02:48] (03CR) 10Mark Bergsma: [C: 032] Configure the misc varnish cluster for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78262 (owner: 10Mark Bergsma) [19:03:17] paravoid: put in an RT ticket for the DKIM/domainkeys thing and if nobody has looked at it I can do so when I get back [19:04:03] well [19:04:05] i wonder how that would work with gmail though? 
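[editor's note: the DKIM signing discussed above is configured in exim (4.70 or newer — which fits the remark below that exim on mchenry is too old) as options on the outbound SMTP transport. A minimal sketch; the selector name and key path are placeholders, not the actual puppetized config.]

```
# Hypothetical exim (>= 4.70) DKIM signing on the outbound transport.
# Selector "wikimedia" and the key path are placeholders.
remote_smtp:
  driver = smtp
  dkim_domain = wikimedia.org
  dkim_selector = wikimedia
  dkim_private_key = /etc/exim4/dkim/wikimedia.org.key
  dkim_canon = relaxed
```

[the matching public key would then be published in DNS as a TXT record at wikimedia._domainkey.wikimedia.org — the "DNS entries of course" mentioned above.]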
[19:04:07] exim on mchenry is too old [19:04:13] so we need to move that to eqiad anyway [19:04:13] true [19:04:22] i sort of said i'd do that [19:04:28] so I can take care of it, next week or the week after [19:04:37] even better :-P [19:04:38] it's been on my list for a while ;) [19:05:44] most people seem scared of exim [19:05:57] i don't get it, it's like the best documented piece of software, just about ;) [19:06:01] it'd be nice to do a few other changes along the way [19:06:27] like split into three clusters: incoming, incoming wikimedia.org, outgoing [19:06:42] mark i'm not scared of it but I generally prefer postfix [19:06:54] or, rather, incoming *, outgoing *, incoming/outgoing lists.wikimedia.org [19:06:55] and I have a good history of breaking exim :-P [19:07:16] paravoid: that's pretty much already the case [19:07:16] sbernardin: can you check to see if we have 1TB Seagate Barracuda Model ST31000333AS in the cabinet. I recall having a spare [19:07:22] it's not really [19:07:26] yeah it is [19:07:31] mchenry is a backup of lists and lists is a backup of mchenry [19:07:43] yes [19:07:44] mchenry is a backup of lists (for lists.wm.org) and lists/sodium is a backup of mchenry (for wikimedia.org) [19:07:52] yeah, that has presented a problem now [19:08:08] so, apparently, on google apps [19:08:13] yeah but i mean, the config is pretty well prepared for that [19:08:17] you declare your "inbound gateways" [19:08:17] it just needs to be moved to other boxes [19:08:26] it's not like it's hard to untangle [19:08:45] oh yeah I didn't mean that [19:09:12] it's not that complicated in general, exim makes it all easy (to me at least) [19:09:14] mark do you see any reason to keep the separation of lists vs wikimedia ? [19:09:22] hell yes [19:09:26] i made that separation [19:09:28] yes, they need to be separated [19:09:30] because it was hell before [19:09:42] hahahaha. what was so hellish? 
[19:10:19] mark: also, having two wikimedia.org mtas that can both perform deliveries would be nice [19:10:34] so the plan for that was, one in eqiad, one in pmtpa [19:10:37] that's still the plan [19:10:42] of course the pmtpa one is going away now :) [19:10:44] i.e. not just queue for later delivery but actually deliver to e.g. google [19:10:54] right, that sounds nice [19:11:23] but lists only in e.g. eqiad, right? [19:11:33] syncing mailman would be a pita :) [19:11:34] lists is harder to scale yeah [19:13:24] and the other thing is splitting off labs :) [19:13:43] we should quarantine labs from the world [19:13:48] (03CR) 10Michał Łazowik: "Per JanZerebecki:" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/65443 (owner: 10Dzahn) [19:14:00] make it just deliver to an imap server people can access :-P [19:17:34] back to my original question: what's the appropriate place to keep small package files for OTRS, which are applied through the web UI? [19:18:21] Jeff_Green: the idea that was floating was for using ldap to route emails, so project admins would get their boxes' emails [19:19:19] basically the modern way to deal with OTRS site changes is to apply an OTRS package, which any admin user can install from their web browser as a file upload [19:19:40] so I'd like to put the current package somewhere I can point to from wikitech [19:19:41] that sounds horrible [19:19:48] I'm guessing that's code, right? [19:19:59] seems to be XML [19:20:28] i don't know that it's worse than quilt? [19:20:47] most of the customization is done through the web UI anyway [19:20:53] what's the contents of this XML? [19:22:13] the current one installs two files within the install directory, sets perms, etc [19:22:29] what kind of files? [19:22:51] looks like XML to me [19:23:29] I'm not sure if they're clobbered or patched [19:24:33] ^demon|lunch: i'm getting a redirect to https?
[19:32:59] (03PS1) 10Mark Bergsma: Don't run the default VCL for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78266 [19:34:26] doh [19:35:59] (03PS2) 10Mark Bergsma: Connect to gitblit directly, don't run the default VCL for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78266 [19:36:51] (03CR) 10Mark Bergsma: [C: 032] Connect to gitblit directly, don't run the default VCL for gitblit [operations/puppet] - 10https://gerrit.wikimedia.org/r/78266 (owner: 10Mark Bergsma) [19:37:24] typo [19:37:37] s/antimoney/antimony/ [19:37:43] oh thanks [19:37:54] are you antimoney? [19:38:04] I found myself very pro-money [19:38:13] ok, bad joke [19:38:15] (03PS1) 10Mark Bergsma: Fix typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/78267 [19:38:16] before or after joining the wmf [19:38:26] and before or after the last bernanke speech [19:38:40] (03CR) 10Mark Bergsma: [C: 032] Fix typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/78267 (owner: 10Mark Bergsma) [19:38:51] (03CR) 10Mark Bergsma: [V: 032] Fix typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/78267 (owner: 10Mark Bergsma) [19:41:04] analytics will be happy [19:43:33] we should have a concept of merit badges here, like the scouts [19:43:46] where you get one for each system you've suffered through dealing with [19:44:05] I'd get . . . fundraising, civicrm, otrs, exim, mysql . . . [19:51:31] we will be happy for money? [20:24:33] !log working on carbon [20:24:44] Logged the message, Master [20:26:51] PROBLEM - Host carbon is DOWN: CRITICAL - Host Unreachable (208.80.154.10) [20:27:21] RECOVERY - Host carbon is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [20:33:36] <^d> mark: For gitblit? Yeah I forced https.
[20:43:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:44:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [20:47:07] what's the point of forcing https? [20:48:06] <^d> Not much I suppose. [20:49:23] root@cp1044:~# telnet antimony.wikimedia.org 8080 [20:49:23] Trying 208.80.154.7... [20:49:23] telnet: Unable to connect to remote host: Connection refused [20:49:32] localhost only? :) [20:50:34] <^d> Quite possibly. [20:55:04] yeah it's listening on localhost [21:02:43] (03PS1) 10Mark Bergsma: Correct gitblit file modes [operations/puppet] - 10https://gerrit.wikimedia.org/r/78314 [21:08:17] (03CR) 10Demon: [C: 031] Correct gitblit file modes [operations/puppet] - 10https://gerrit.wikimedia.org/r/78314 (owner: 10Mark Bergsma) [21:08:28] cmjohnson1: Ok, so carbon is up [21:08:34] of course I don't want to leave gitblit listening open on eth0 [21:11:23] robh: you will see there is no sdb [21:11:36] why isnt there? [21:11:47] did you add hw raid? [21:12:03] cmjohnson1: ... uh [21:12:09] Disk /dev/sdb: 500.1 GB, 500107862016 bytes [21:12:12] there is an sdb. [21:12:21] fdisk -l [21:12:26] cat /proc/mdstat [21:12:59] right, it fell out [21:13:02] sdb died [21:13:04] you replaced it [21:13:07] yes [21:13:08] now the new sdb is there [21:13:13] but it doesnt have the partition table of the old one [21:13:19] so it cannot be placed into an array [21:13:28] this is normal and will happen every single time we have a sw raid disk failure [21:13:29] right when i tried to add it ...it failed [21:13:33] (03PS1) 10Jgreen: remove /opt/otrs/bin/otrs.GenericAgent.pl cron job b/c it's barfing [operations/puppet] - 10https://gerrit.wikimedia.org/r/78315 [21:13:39] did you just try to add without making the partition mapping? 
cuz you cannot do that [21:13:51] it has no sdb1 or sdb2 yet [21:14:43] (03PS2) 10Mark Bergsma: Correct gitblit file modes [operations/puppet] - 10https://gerrit.wikimedia.org/r/78314 [21:14:49] (03CR) 10Jgreen: [C: 032 V: 031] remove /opt/otrs/bin/otrs.GenericAgent.pl cron job b/c it's barfing [operations/puppet] - 10https://gerrit.wikimedia.org/r/78315 (owner: 10Jgreen) [21:15:09] robh: [21:15:10] (03CR) 10Mark Bergsma: [C: 032 V: 032] Correct gitblit file modes [operations/puppet] - 10https://gerrit.wikimedia.org/r/78314 (owner: 10Mark Bergsma) [21:15:21] ? [21:15:23] are you doing anything on carbon? [21:15:26] yes [21:15:43] trying to recall the damned command to copy the entire partition mapping including raid superblock info from sda to sdb [21:15:49] so we can just add it back to array and rebuild [21:15:52] ok...so i tried this awhile ago but yes I tried to create the same partition table [21:15:55] and its escaping me [21:15:56] sfdisk -d /dev/sda | sfdisk /dev/sdb [21:16:27] but will try again unless you are doing something now [21:17:09] you don't need to copy the superblock info [21:17:09] im doin for test [21:17:14] just partitions [21:17:26] cmjohnson1: so that works for me [21:17:34] had to force cuz the disks are not the identical same [21:17:40] but the newer disk is larger, so its ok. [21:17:48] hrm ...maybe i had too much coffee that day [21:18:00] so now you can just rebuild the array with mdadm [21:18:09] i bet i forgot the part about copying the partition table [21:18:27] mark: yea i guess mdadm handles that [21:18:38] i misread something and misrecalled [21:19:02] cmjohnson1: if you need help with the raid rebuild lemme know [21:19:17] but anytime we have a raid system, and we can get it to boot [21:19:23] we can resurrect without reinstall [21:19:45] yeah..i've replaced a few before...but i must've missed a step w/carbon.
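[editor's note: pieced together, the sw-raid disk replacement walked through above amounts to the following command sketch. It assumes, per the conversation, that sda is the surviving member and sdb the blank replacement, and that md0/md1 are built from the first and second partitions; do not run blindly — sfdisk here overwrites sdb's partition table.]

```shell
# Copy the partition layout (not the md superblocks) from the good
# disk to the replacement; --force because the disks aren't identical,
# as noted above (the new disk is larger, which is fine).
sfdisk -d /dev/sda | sfdisk --force /dev/sdb

# Re-add the new partitions; mdadm writes fresh raid superblocks.
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2

# Watch the rebuild progress (as shown in the mdstat paste below).
cat /proc/mdstat
```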
[21:19:49] (03PS1) 10Mark Bergsma: Serve the Wikimedia error page on misc [operations/puppet] - 10https://gerrit.wikimedia.org/r/78317 [21:19:52] thx for helping [21:21:11] md1 : active raid1 sdb2[2] sda2[0] [21:21:11] 482526144 blocks [2/1] [U_] [21:21:11] resync=DELAYED [21:21:12] [21:21:14] md0 : active raid1 sdb1[2] sda1[0] [21:21:16] 5858240 blocks [2/1] [U_] [21:21:18] [========>............] recovery = 42.8% (2513088/5858240) finish=0.5min speed=96657K/sec [21:21:29] robh ^ [21:21:40] (03CR) 10Mark Bergsma: [C: 032] Serve the Wikimedia error page on misc [operations/puppet] - 10https://gerrit.wikimedia.org/r/78317 (owner: 10Mark Bergsma) [21:28:28] sweet [21:40:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:41:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [22:50:34] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:45] AaronSchulz: paravoid: (not sure who else) I wanted to make sure you knew about a swift hackathon we're having. http://swifthackathon.eventbrite.com