[00:15:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:24:09] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: HTTP CRITICAL - No data received from host [00:28:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.355 seconds [00:40:23] Any sysadmins around that can answer a questions? [00:40:30] a question* [00:48:30] how can they know if they can answer your question if they haven't heard it? [00:48:57] Well, I mean, can attempt to answer a question [00:49:57] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sun Dec 16 00:49:50 UTC 2012 [01:02:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:09:39] MaxSem: good sysadmins can [01:10:10] j/k. [01:10:12] "NO" [01:10:26] "Don't do that" [01:17:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.015 seconds [01:20:05] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: HTTP CRITICAL - No data received from host [01:28:11] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 256 seconds [01:31:30] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 9 seconds [01:37:38] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 196 seconds [01:38:42] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 204 seconds [01:40:20] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 4 seconds [01:41:05] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [01:50:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:06:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds [02:08:23] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 
hours [02:16:20] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [02:21:49] tofutwitch11: still around with the question ? [02:21:59] yes [02:22:11] what is it and i will see if i can answer [02:26:23] !log LocalisationUpdate completed (1.21wmf6) at Sun Dec 16 02:26:23 UTC 2012 [02:26:35] Logged the message, Master [02:30:09] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:10] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:10] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:17] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:18] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:18] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:26] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:26] PROBLEM - Apache HTTP on mw51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:27] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:27] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:27] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:27] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:35] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:35] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:44] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:45] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:45] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [02:30:45] PROBLEM - Apache HTTP on mw19 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:45] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:45] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:45] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:53] PROBLEM - Apache HTTP on srv213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:53] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:54] PROBLEM - Apache HTTP on srv266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:54] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:02] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:03] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:03] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:03] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:03] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:03] PROBLEM - Apache HTTP on mw26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:12] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:12] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:20] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:20] PROBLEM - Apache HTTP on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:21] PROBLEM - Apache HTTP on mw17 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:21] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:29] PROBLEM - 
Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:29] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:29] PROBLEM - Apache HTTP on mw35 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:30] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:30] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:38] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:47] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:47] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:32:05] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:32:05] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:32:14] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:32:14] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.182 second response time [02:32:23] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.991 second response time [02:32:24] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.411 second response time [02:32:24] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.638 second response time [02:32:32] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [02:32:33] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.112 second response time [02:32:41] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [02:32:50] RECOVERY - Apache HTTP on srv237 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 
0.071 second response time [02:32:50] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.210 second response time [02:32:59] RECOVERY - Apache HTTP on srv196 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.065 second response time [02:32:59] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.537 second response time [02:33:08] RECOVERY - Apache HTTP on srv259 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time [02:33:09] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.485 second response time [02:33:17] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time [02:33:18] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time [02:33:18] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.060 second response time [02:33:18] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [02:33:18] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.820 second response time [02:33:27] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.060 second response time [02:33:27] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.053 second response time [02:33:27] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.065 second response time [02:33:35] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [02:33:35] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.072 second response time [02:33:36] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.284 second response time [02:33:36] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - 
HTTP/1.1 301 Moved Permanently - 0.054 second response time [02:33:36] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [02:33:36] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.066 second response time [02:33:36] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.062 second response time [02:33:44] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [02:33:44] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [02:33:44] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.064 second response time [02:33:53] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [02:33:53] RECOVERY - Apache HTTP on mw19 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.055 second response time [02:33:54] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.053 second response time [02:33:54] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.071 second response time [02:33:54] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.157 second response time [02:34:02] RECOVERY - Apache HTTP on srv266 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.059 second response time [02:34:11] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [02:34:20] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time [02:34:20] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.059 second response time [02:34:21] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [02:34:21] RECOVERY - Apache 
HTTP on mw49 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.072 second response time [02:34:21] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.081 second response time [02:34:29] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time [02:34:29] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.060 second response time [02:34:39] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.044 second response time [02:36:53] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time [02:37:56] RECOVERY - Apache HTTP on mw35 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.043 second response time [02:40:26] !log checking memcache servers due to spate of mediawiki servers flapping [02:40:35] Logged the message, Mistress of the network gear. [02:45:38] !log mwscript.php appears to be broken [02:45:47] Logged the message, Mistress of the network gear. [02:46:20] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [02:47:42] !log LocalisationUpdate completed (1.21wmf5) at Sun Dec 16 02:47:42 UTC 2012 [02:47:50] Logged the message, Master [02:48:15] i'm outtie, g'night folks [02:53:23] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [02:59:36] Hi. I have a random question: A high-res (11.6 MB) image was just posted to reddit and it's getting popular. What would be the effect on costs for thousands of hits on an image of that size? (http://www.reddit.com/r/space/comments/14wng9/highres_pic_of_the_galactic_center_an/) [03:00:34] brianpeiris: well there's essentially no bandwidth cost [03:00:59] Really? Interesting, why is that? [03:01:02] brianpeiris: i don't know if size is an issue there. 
assume it's cached by the squids then it should be fine [03:01:44] assuming* [03:01:49] Hmm, I don't understand how this bandwidth would be free, can you link me to more information about these "squids"? [03:02:03] Squid is our HTTP caching software [03:02:16] Jasper_Deng: one of [03:02:25] the other is Memcached [03:02:26] Ok, but how does that result in free bandwidth? [03:02:47] if it's cached the Apache servers are free [03:03:13] brianpeiris: i think all of the WMF transit/peering is unmetered. so use a little or use a lot and it's the same price. and the majority is also completely free. (at least for carrying cost. maybe not for initial setup) [03:03:23] Jasper_Deng: err, no. many more than that [03:03:24] Um, no. [03:03:37] Amgine: ? [03:03:56] brianpeiris: (as in donated) [03:04:26] brianpeiris: anyway, there's a lot of speculation there. others know more but are probably not online right now [03:04:49] Ok, thanks for the information! [03:05:56] Keep in mind that even donated bandwidth is a finite item. [03:06:01] brianpeiris: http://wikitech.wikimedia.org/view/Squid http://en.wikipedia.org/wiki/Squid_%28software%29 [03:06:35] brianpeiris: anyway, would be much better to link to the file description page [03:06:47] I agree [03:07:10] brianpeiris: in this case that is https://commons.wikimedia.org/wiki/File:Galactic_Cntr_full_cropped.jpg [03:07:30] brianpeiris: (for reasons that have nothing to do with our servers or perf or bandwidth) [03:08:29] It would be preferred, of course, to use either of these links: http://www.ipac.caltech.edu/2mass/gallery/showcase/galcen/fullres.html [03:09:07] Well, I'm glad to hear that it's not a concern, since 100,000 views of that image on reddit would result in a petabyte of transfer [03:11:06] brianpeiris: i'm certain transit / peering won't be a concern. and other things are probably fine too. we have recently been working on making it fast for video files which are mostly (i guess) larger than that file. 
and video is the same infra as what serves that file [03:12:03] true [04:20:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:23:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.152 seconds [04:57:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.045 seconds [05:20:04] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: HTTP CRITICAL - No data received from host [05:23:35] Ryan_Lane: sounds like gerrit's kinda broken? [05:23:51] 16 05:14:52 < legoktm> ! [remote rejected] HEAD -> refs/publish/master/master (database error) [05:32:47] 16 05:32:27 < gerrit-wm> New patchset: Legoktm; "Add variant to A which is currently being exploited on enwiki" [mediawiki/extensions/AntiSpoof] (master) - https://gerrit.wikimedia.org/r/38901 [05:32:54] so it finally went through for him [05:33:04] idk how many times he tried but it sounds like more than a couple [05:46:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:47:10] (sounds like it was actually a change in commit msg that made it eventually go through ok. 
waiting on him to provide the original commit he was trying so i can try to repro) [05:59:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.119 seconds [06:15:19] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [06:20:08] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: HTTP CRITICAL - No data received from host [06:24:19] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [06:24:19] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [06:24:20] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [06:24:20] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [06:33:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:49:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.342 seconds [07:23:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:36:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.494 seconds [07:41:49] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [07:41:49] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [07:57:52] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [07:57:53] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [08:12:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:20:04] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: HTTP CRITICAL - No data received from host [08:27:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 
bytes in 0.019 seconds [08:59:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds [09:41:04] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:47:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:00:52] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [10:01:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.237 seconds [10:35:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:52:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.070 seconds [11:19:37] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 202 seconds [11:20:04] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: HTTP CRITICAL - No data received from host [11:20:31] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 230 seconds [11:23:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:27:07] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [11:27:43] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [11:38:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.900 seconds [12:09:52] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [12:12:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:17:49] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [12:25:37] RECOVERY - Puppetmaster HTTPS on 
stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.882 seconds [12:47:49] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [12:54:52] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [12:59:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:16:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds [13:49:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:03:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds [14:36:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:52:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.073 seconds [15:23:46] New review: Reedy; "Wheee :D" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/38724 [15:25:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:29:11] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38724 [15:32:04] New patchset: Reedy; "Losslessly compress sul images" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38909 [15:32:23] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38909 [15:32:49] New review: Reedy; "reedy@ubuntu64-web-esxi:~/git/operations/mediawiki-config$ git checkout master" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38909 [15:36:05] !log reedy synchronized images/sul/ [15:36:14] Logged the message, Master [15:38:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.226 seconds [15:50:54] New patchset: 
Reedy; "(bug 40036) add central auth icon for wikimania2013 wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38722 [15:51:06] New review: Reedy; "PS3: Losslessly compressed image" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38722 [15:52:40] New review: Reedy; "Re: All wikimania wikis" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/38722 [15:52:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38722 [15:53:29] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38677 [15:54:08] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38452 [15:54:55] !log reedy synchronized images/sul/wikimania.png [15:55:04] Logged the message, Master [15:55:28] !log reedy synchronized wmf-config/InitialiseSettings.php [15:55:36] Logged the message, Master [16:14:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:16:31] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [16:17:52] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:31] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:25:31] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [16:25:32] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [16:25:32] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [16:26:16] PROBLEM - Apache HTTP on srv219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:27:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.806 seconds [16:27:55] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [16:28:04] PROBLEM - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:28:31] PROBLEM - Apache HTTP on srv222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:36:46] New patchset: Nikerabbit; "Bug 43075 - Config wm2013 sidebar translation group" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38910 [16:41:35] PROBLEM - Apache HTTP on srv220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:42:46] RECOVERY - Apache HTTP on srv219 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.454 second response time [16:44:45] anyone around? [16:45:03] apaches bouncing and LVS did once too [16:45:16] still not recovered (in here at least) [16:46:22] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:46:59] watchmouse is all green... [16:47:43] PROBLEM - Apache HTTP on srv219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:51:19] hey [16:51:21] looking [16:52:32] ok, danke [16:59:41] RECOVERY - Apache HTTP on srv220 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [17:02:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:14] RECOVERY - Apache HTTP on srv219 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.078 second response time [17:07:47] RECOVERY - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61366 bytes in 1.555 seconds [17:10:22] New patchset: Jeremyb; "bug 38543 - set kowikisource wgLogo to local upload" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38911 [17:11:59] RECOVERY - Apache HTTP on srv222 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [17:12:26] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [17:12:53] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1
301 Moved Permanently - 0.052 second response time [17:13:20] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.065 second response time [17:14:43] !log "killall -9 convert" on all imagescalers [17:14:51] Logged the message, Master [17:17:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.037 seconds [17:38:32] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 217 seconds [17:38:59] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 233 seconds [17:42:44] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [17:42:44] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [17:46:48] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [17:47:05] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [17:50:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:58:47] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [17:58:47] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [18:06:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.056 seconds [18:38:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:45] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:54:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds [19:26:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:41:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.141 seconds [20:02:23] PROBLEM - 
Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [20:15:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:24:08] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: HTTP CRITICAL - No data received from host [20:28:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.354 seconds [21:04:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:19:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.018 seconds [21:51:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:07:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds [22:10:35] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [22:18:32] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [22:24:05] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: HTTP CRITICAL - No data received from host [22:39:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:48:32] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [22:55:35] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [22:56:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds [23:28:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:44:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds [23:53:56] PROBLEM - Apache HTTP on srv219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:55:26] RECOVERY - Apache HTTP on srv219 is OK: HTTP 
OK - HTTP/1.1 301 Moved Permanently - 1.449 second response time
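
Note on the 03:09:07 estimate in the log above ("100,000 views of that image on reddit would result in a petabyte of transfer"): the figure looks high by roughly three orders of magnitude. A minimal sketch of the arithmetic follows, assuming every view downloads the full 11.6 MB original (no thumbnails, no client-side caching) and using decimal units (1 TB = 10**12 bytes). The function and constant names are illustrative and do not come from the log.

```python
# Back-of-the-envelope transfer estimate for the image discussed at 02:59-03:11.
# Assumptions (not from the log): each view fetches the whole file once,
# and sizes use decimal prefixes (1 MB = 10**6 bytes, 1 TB = 10**12 bytes).

IMAGE_SIZE_BYTES = 11_600_000   # 11.6 MB, as quoted in the log
VIEWS = 100_000                 # view count quoted in the log


def total_transfer_bytes(size_bytes: int, views: int) -> int:
    """Return total bytes served if every view downloads the full file."""
    return size_bytes * views


total = total_transfer_bytes(IMAGE_SIZE_BYTES, VIEWS)

print(f"{total / 10**12:.2f} TB")   # 1.16 TB
print(f"{total / 10**15:.5f} PB")   # 0.00116 PB
```

Under these assumptions the total is about 1.16 TB. Reaching a petabyte would take on the order of 86 million full-size views.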