[00:00:19] !log mflaschen synchronized php-1.21wmf12/extensions/GuidedTour/GuidedTour.php 'Small bug fix to GuidedTour; removing unneeded dependency. https://gerrit.wikimedia.org/r/#/c/56546/1/GuidedTour.php'
[00:00:25] Logged the message, Master
[00:01:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[00:03:12] !log mflaschen synchronized php-1.21wmf12/extensions/GuidedTour/GuidedTour.php 'Small bug fix to GuidedTour; removing unneeded dependency. https://gerrit.wikimedia.org/r/#/c/56546/1/GuidedTour.php'
[00:03:19] Logged the message, Master
[00:06:29] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:07:49] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 00:07:43 UTC 2013
[00:08:09] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 30698 MB (3% inode=99%):
[00:08:29] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:08:39] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[00:12:12] !log olivneh synchronized php-1.21wmf12/extensions/MoodBar 'Updating MoodBar to remove ClickTracking integration'
[00:12:12] Logged the message, Master
[00:14:39] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 00:14:33 UTC 2013
[00:15:29] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:26:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:27:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.194 second response time
[00:28:28] New patchset: Reedy; "Update php symlink to 1.21wmf12" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56556
[00:28:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56556
[01:07:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[01:09:28] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 30251 MB (3% inode=99%):
[01:09:58] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[01:16:56] ori-l: hey
[01:17:02] paravoid: hey
[01:17:14] I completely missed the patchset
[01:17:22] sorry
[01:17:25] no worries at all
[01:17:26] ok to merge now?
[01:17:35] only if you want to make me super happy
[01:17:41] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324
[01:17:48] :)))
[01:18:06] done
[01:18:06] ^ ori-l eating a burger emoticon
[01:18:08] if i sudo puppetd -tv on vanadium, will it pull the changes, or do i need to wait 30 minutes?
[01:18:19] YuviPanda: heh
[01:18:22] it will
[01:18:27] I was about to do that, but better if you do :)
[01:18:34] ok, will do
[01:18:51] you're more likely to notice something going wrong in the diffs
[01:19:10] running..
[01:24:02] !log Stopping EventLogging daemons to allow Puppet to change 'eventlogging' user's home dir
[01:24:09] Logged the message, Master
[01:24:27] omg events dropping!
[01:24:36] pid 21038, uptime 34 days, 15:18:55
[01:24:37] :(
[01:24:41] it was a good run
[01:26:29] hrm: err: /Stage[main]/Eventlogging::Archive/File[/etc/logrotate.d/eventlogging]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///files/eventlogging/logrotate at /var/lib/git/operations/puppet/modules/eventlogging/manifests/archive.pp:25
[01:27:26] whoops
[01:27:30] but modules/eventlogging/files/logrotate is there
[01:27:43] no, that's not how you refer to files in modules
[01:27:48] my bad, should have spotted that
[01:28:01] yes, damn you for not fixing my bugs
[01:28:15] what should it be?
[01:30:31] c'mon gerrit
[01:30:37] New patchset: Faidon; "eventlogging: fix file path" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56561
[01:31:20] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56561
[01:31:28] ok, try again
[01:31:36] btw, you had it right on ganglia.pp :)
[01:31:41] oh, right
[01:31:45] yeah, i just saw your change
[01:33:10] New patchset: Dr0ptp4kt; "Unified default lang redirect from m. & zero." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[01:34:14] dr0ptp4kt: and who are you? :)
[01:35:25] hi there - adam baso here - sitting next to yurik this week while he's on site.
[01:35:48] ah
[01:36:00] I'm Faidon, as /whois says :)
[01:36:50] I should probably update, LOL, will do.
[01:37:42] paravoid: http://dpaste.org/9T1Gn/raw/ worked
[01:37:44] New patchset: Dzahn; "add a script and cron to mail out bugzilla audit log and move bugzilla scripts to files/bugzilla instead of misc" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56562
[01:38:49] great
[01:38:57] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[01:38:57] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[01:38:57] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[01:39:10] New patchset: Dzahn; "add a script and cron to mail out bugzilla audit log and move bugzilla scripts to files/bugzilla instead of misc" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56562
[01:42:09] paravoid: the zpubmon ganglia module doesn't appear to be reporting metrics http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=EventLogging&vl=events+%2F+sec&x=&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=%5E%28client-generated-raw%7Cserver-generated-raw%7Cvalid-events%29%24&gtype=stack&glegend=show&aggregate=1
[01:42:17] but everything seems to be running, so i'll debug from home
[01:42:21] but if you have any ideas, let me know
[01:42:28] thanks very much again
[01:45:40] actually, paravoid, got a second?
[01:46:07] yes
[01:46:17] if so, have a look at file { '/usr/lib/ganglia/python_modules/zpubmon.py' in modules/eventlogging/manifests/ganglia.pp
[01:46:28] i tried to make it a symlink to /srv/deployment/eventlogging/EventLogging/ganglia/python_modules/zpubmon.py
[01:46:35] the destination is there, but instead of a symlink i got a directory
[01:47:02] /usr/lib/ganglia/python_modules/zpubmon.py: directory
[01:47:16] recurse => true
[01:47:18] why?
[01:47:28] why why?
[01:47:35] sorry, bad recursion joke
[01:47:39] haha
[01:48:02] oh, i was probably thinking that it would create parent dirs as necessary
[01:48:10] but instead it's causing puppet to interpret the resource as a directory
[01:48:33] yeah
[01:48:51] recurse means "recursively copy directory"
[01:49:58] New patchset: Ori.livneh; "Drop "recurse => true" from EventLogging Ganglia resources" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56563
[01:50:25] puppet has more booby traps than an indiana jones movie
[01:51:02] paravoid: ^^ patch
[01:51:09] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56563
[01:53:51] paravoid: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=EventLogging&vl=events+%2F+sec&x=&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=%5E%28client-generated-raw%7Cserver-generated-raw%7Cvalid-events%29%24&gtype=stack&glegend=show&aggregate=1
[01:53:57] THANK YOU, weee
[01:54:00] what a relief
[01:54:17] i felt like a thief in the night having this stuff running but unpuppetized
[01:55:47] hahaha
[01:56:37] :) thanks again and see you later
[01:56:38] * ori-l runs home
[01:57:46] bye!
[02:06:09] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[02:08:19] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[02:08:49] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 29655 MB (3% inode=99%):
[02:10:49] PROBLEM - Puppet freshness on cp3010 is CRITICAL: Puppet has not run in the last 10 hours
[02:17:06] !log LocalisationUpdate completed (1.21wmf12) at Fri Mar 29 02:17:06 UTC 2013
[02:17:14] Logged the message, Master
[02:22:49] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[02:26:41] New patchset: Odder; "(bug 46154) Override $wgGroupPermissions for thwiki Add abusefilter-log-detail and patrol for autoconfirmed on thwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56564
[03:04:41] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[03:06:51] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[03:07:21] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 29194 MB (3% inode=99%):
[03:10:41] PROBLEM - Apache HTTP on mw1171 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:10:41] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:10:41] PROBLEM - Apache HTTP on mw1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:11] PROBLEM - Apache HTTP on mw1172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:11:31] RECOVERY - Apache HTTP on mw1171 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.336 second response time
[03:11:32] RECOVERY - Apache HTTP on mw1054 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.061 second response time
[03:11:32] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.149 second response time
[03:12:01] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[03:18:12] Waiting for 10.64.16.145: 140 seconds lagged
[03:19:08] db1050
[03:20:30] binasher: ^
[03:20:43] lots of Waiting for the slave SQL thread to advance position
[03:21:57] a lot of wait cpu
[03:22:08] db1051 and db1052 are more loaded
[03:23:47] Reedy: gah, looking
[03:24:20] thanks, it's in the 200s now
[04:01:33] 1 million jobs queued on enwiki in the last hour, 150k in one min in large write queries…
[04:01:54] this caused a partial site outage when it happened earlier today, and it's likely to again
[04:02:09] hmmm?
[04:03:12] let's reparse 1/3rd of enwiki, shall we?
[04:03:12] what inserts them?
[04:03:20] template edits?
[04:03:42] these are refreshLinks jobs
[04:04:16] Reedy: around?
[04:04:17] do you know the cause?
[04:04:25] or should I start investigating too?
[04:06:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:06:36] paravoid: i don't
[04:07:02] Aaron|home: any changes in wmf12 that would be relevant?
[04:07:28] no, just usage changes probably
[04:07:43] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 28590 MB (3% inode=99%):
[04:07:53] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 04:07:51 UTC 2013
[04:08:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:08:13] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[04:08:33] it looks like wikidata utilizes refreshlinks jobs
[04:08:59] hrm, link?
[04:09:03] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 04:08:57 UTC 2013
[04:10:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:10:03] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 04:09:57 UTC 2013
[04:11:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:11:43] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 04:11:39 UTC 2013
[04:12:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:12:23] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 04:12:18 UTC 2013
[04:13:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:14:43] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 04:14:34 UTC 2013
[04:15:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:15:37] !log aaron synchronized php-1.21wmf12/includes/job/JobQueueGroup.php 'deployed 651884d4bdc76c29dd717dfdbbd698632223f3b5 '
[04:15:45] Logged the message, Master
[04:16:18] !log aaron synchronized php-1.21wmf12/maintenance/runJobs.php 'deployed 651884d4bdc76c29dd717dfdbbd698632223f3b5'
[04:16:25] Logged the message, Master
[04:18:07] Aaron|home: the wikidata updater is actually just inserting ChangeNotification jobs
[04:19:07] WikiPageUpdater has scheduleRefreshLinks
[04:19:13] which might resolve to refreshlinks
[04:19:13] PROBLEM - Puppet freshness on mw1160 is CRITICAL: Puppet has not run in the last 10 hours
[04:19:56] well that's directly what it inserts
[04:20:01] 02:49:58 Posted 1000 changes to enwiki, up to ID 14970265, timestamp 20130328102638. Lag is 59000 seconds. Next ID is 14970265.
[04:20:02] 02:58:28 Posted 1000 changes to enwiki, up to ID 14971265, timestamp 20130328103006. Lag is 59302 seconds. Next ID is 14971265.
[04:20:02] 02:58:31 Posted 1000 changes to enwiki, up to ID 14972265, timestamp 20130328103325. Lag is 59106 seconds. Next ID is 14972265.
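
For readers following the zpubmon exchange above ([01:42]-[01:53]): a gmond Python metric module is just a file in /usr/lib/ganglia/python_modules that exposes metric_init() and metric_cleanup(). The sketch below is a minimal, hypothetical skeleton in the same shape, not the real zpubmon.py; the metric name mirrors one from the Ganglia URL ori-l pasted, but the value-producing callback here is a placeholder rather than EventLogging's actual counter logic.

    # zpubmon_sketch.py -- hypothetical skeleton, NOT the real zpubmon module
    import random

    def events_per_sec(name):
        # The real module would read counters from the EventLogging stream;
        # this placeholder just returns a random value for illustration.
        return random.randint(0, 100)

    def metric_init(params):
        # gmond calls this once at startup; return a list of metric descriptors.
        return [{
            'name': 'valid-events',
            'call_back': events_per_sec,
            'time_max': 90,
            'value_type': 'uint',
            'units': 'events/sec',
            'slope': 'both',
            'format': '%u',
            'description': 'EventLogging valid events per second',
            'groups': 'eventlogging',
        }]

    def metric_cleanup():
        pass  # gmond calls this on shutdown

    if __name__ == '__main__':
        # conventional standalone smoke test for gmond python modules
        for d in metric_init({}):
            print('%s = %s' % (d['name'], d['call_back'](d['name'])))
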
[04:20:30] refreshLinks per-title jobs for several titles
[04:20:37] I wonder what the batch limit is
[04:20:46] those are each individual change notif jobs that i suppose could turn into thousands of refreshlinks jobs
[04:21:05] 1000 batch limit for change notifications
[04:21:18] so it seems
[04:21:21] what log are you looking at?
[04:21:59] hume:/var/log/wikidata/dispatcher.log /var/log/wikidata/dispatcher2.log
[04:22:13] ah, hume, right
[04:22:39] Tim-away asked them not to reparse everything every time a langlink is changed
[04:22:45] I'm hoping that change I deployed will reduce the instances of RL jobs being spawned before the existing ones finish
[04:23:26] I was aiming to do that before but the code didn't handle the case where runJobs already started, which should be handled now
[04:23:40] will that slow the actual insertion of refreshlinks jobs?
[04:23:42] * Aaron|home was wondering why sometimes the green/blue lines on graphite wildly didn't match
[04:24:10] it will slow it down to the rate of "how fast it can parse and finish all the existing jobs" before converting more RL2 jobs to RL jobs
[04:24:23] that and the profiling collector was fairly broken for a while
[04:24:30] I originally wanted that just to reduce the width of the queue, but it also slows it down
[04:24:52] I still have to get someone to review https://gerrit.wikimedia.org/r/#/c/56522/
[04:25:25] are the ChangeNotification jobs also low priority?
[04:26:47] yes
[04:29:30] re: the comment in that review to use casting for $count, isn't it already cast as an int from incr()?
[04:30:16] he meant use (int) instead of intval()
[04:30:33] is either necessary though?
[04:30:37] otherwise the world might end
[04:30:56] you know I just moved that code from elsewhere
[04:31:10] I didn't want to change it around for fun reasons in that same commit
[04:32:13] you touched it though! heh
[04:32:24] i would delete that line for fun
[04:33:01] this looks reasonable to me though
[05:04:21] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[05:06:31] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[05:07:01] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 29060 MB (3% inode=99%):
[05:13:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:14:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[05:28:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:29:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[05:33:40] New patchset: Rfaulk; "mod. use get_project_host_map method to generate map for project to host key." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56576
[05:46:08] New patchset: J; "install lilypond on apache nodes (used by Score extension)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56577
[05:51:34] PROBLEM - SSH on lvs1001 is CRITICAL: Connection timed out
[05:51:35] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[05:51:36] PROBLEM - LVS HTTPS IPv4 on foundation-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[05:51:36] PROBLEM - LVS HTTPS IPv4 on wikibooks-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[05:51:37] PROBLEM - LVS HTTP IPv4 on mediawiki-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[05:51:37] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[05:51:38] PROBLEM - LVS HTTP IPv4 on wikibooks-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[05:51:38] PROBLEM - LVS HTTPS IPv4 on mobile-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[05:52:22] uh
[05:52:34] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[05:52:34] RECOVERY - LVS HTTPS IPv4 on wikibooks-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 62746 bytes in 0.038 second response time
[05:52:34] RECOVERY - LVS HTTP IPv4 on mediawiki-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 62740 bytes in 0.004 second response time
[05:52:34] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 16451 bytes in 0.010 second response time
[05:52:36] RECOVERY - LVS HTTP IPv4 on wikibooks-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 62740 bytes in 0.014 second response time
[05:52:36] RECOVERY - LVS HTTPS IPv4 on foundation-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 62746 bytes in 0.047 second response time
[05:52:36] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 16485 bytes in 0.024 second response time
[05:52:38] RECOVERY - LVS HTTPS IPv4 on mobile-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 16492 bytes in 0.019 second response time
[05:53:11] ok, why'd you kill mobile asher ?
[05:53:13] and lvs1001
[05:55:46] so many reasons but here goes.. 1 - heeeyyy
[05:58:06] hehe
[05:58:11] okay, i sent off a quick email
[05:59:17] bye
[05:59:20] ah, I just saw it (I did not even
[05:59:26] see these til now)
[05:59:36] still waking up..
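
On the refreshLinks fan-out discussed above ([04:20]-[04:33]): one wikidata change notification can expand into thousands of per-title refreshLinks jobs, and the fix Aaron describes is to stop converting more parent jobs while children from a previous expansion are still draining. As a language-neutral illustration only (MediaWiki's real implementation is PHP, in JobQueueGroup/runJobs, and differs in detail), here is a toy Python sketch of that idea; all names are invented:

    from collections import deque

    class ThrottledExpander(object):
        """Toy model: expand 'parent' jobs (e.g. ChangeNotification)
        into many 'child' jobs (e.g. refreshLinks), but never fan out
        again while children from the previous expansion are queued."""
        def __init__(self, expand_fn):
            self.parents = deque()
            self.children = deque()
            self.expand_fn = expand_fn  # parent -> list of children

        def push_parent(self, job):
            self.parents.append(job)

        def pop_child(self):
            if not self.children and self.parents:
                # Only expand the next parent once the batch drained.
                self.children.extend(self.expand_fn(self.parents.popleft()))
            return self.children.popleft() if self.children else None

Draining each expansion fully before the next parent fans out caps the queue's width the way Aaron describes ("reduce the width of the queue"), at the cost of expanding more slowly.
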
[06:04:53] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:06:03] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[06:06:33] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 29582 MB (3% inode=99%):
[06:30:13] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 06:30:03 UTC 2013
[06:30:53] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:30:53] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 06:30:46 UTC 2013
[06:31:53] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:31:53] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 06:31:52 UTC 2013
[06:32:53] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:59:47] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[07:05:25] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[07:07:05] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 29131 MB (3% inode=99%):
[07:07:35] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[07:15:02] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 07:14:46 UTC 2013
[07:15:25] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[07:21:35] PROBLEM - Apache HTTP on mw1111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:36] PROBLEM - Apache HTTP on mw1112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:36] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:36] PROBLEM - Apache HTTP on mw1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:36] PROBLEM - Apache HTTP on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:36] PROBLEM - Apache HTTP on mw1167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:55] PROBLEM - Apache HTTP on mw1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:06] PROBLEM - Apache HTTP on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:06] PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:15] PROBLEM - Apache HTTP on mw1174 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:15] PROBLEM - Apache HTTP on mw1073 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:15] PROBLEM - Apache HTTP on mw1113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:15] PROBLEM - Apache HTTP on mw1078 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:15] PROBLEM - Apache HTTP on mw1100 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:16] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:16] PROBLEM - Apache HTTP on mw1105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:17] PROBLEM - Apache HTTP on mw1074 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:17] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:18] site is rather slow right now...
[07:22:19] PROBLEM - Apache HTTP on mw1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:19] PROBLEM - Apache HTTP on mw1212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:19] PROBLEM - Apache HTTP on mw1040 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:19] PROBLEM - Apache HTTP on mw1166 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:20] PROBLEM - Apache HTTP on mw1028 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:21] PROBLEM - Apache HTTP on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:21] PROBLEM - Apache HTTP on mw1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:26] PROBLEM - Apache HTTP on mw1086 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:26] PROBLEM - Apache HTTP on mw1188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:26] PROBLEM - Apache HTTP on mw1187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:26] PROBLEM - Apache HTTP on mw1214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:26] PROBLEM - Apache HTTP on mw1171 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:27] PROBLEM - Apache HTTP on mw1068 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:27] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:27] PROBLEM - Apache HTTP on mw1106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:27] PROBLEM - Apache HTTP on mw1101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:28] PROBLEM - Apache HTTP on mw1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:29] PROBLEM - Apache HTTP on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:32] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:32] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:33] PROBLEM - Apache HTTP on mw1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:34] PROBLEM - Apache HTTP on mw1216 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:34] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1164 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1097 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:37] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:39] PROBLEM - Apache HTTP on mw1096 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:39] PROBLEM - Apache HTTP on mw1103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:40] PROBLEM - Apache HTTP on mw1080 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:40] PROBLEM - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:40] RECOVERY - Apache HTTP on mw1104 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.617 second response time
[07:22:40] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.011 second response time
[07:22:42] PROBLEM - Apache HTTP on mw1057 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:42] PROBLEM - Apache HTTP on mw1185 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:44] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:44] PROBLEM - Apache HTTP on mw1032 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:44] PROBLEM - Apache HTTP on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:44] PROBLEM - Apache HTTP on mw1099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:44] PROBLEM - Apache HTTP on mw1039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:45] RECOVERY - Apache HTTP on mw1082 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.052 second response time
[07:22:45] PROBLEM - Apache HTTP on mw1102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:55] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.125 second response time
[07:23:05] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.049 second response time
[07:23:05] RECOVERY - Apache HTTP on mw1174 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time
[07:23:05] RECOVERY - Apache HTTP on mw1078 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[07:23:05] RECOVERY - Apache HTTP on mw1073 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.087 second response time
[07:23:05] RECOVERY - Apache HTTP on mw1098 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.393 second response time
[07:23:06] RECOVERY - Apache HTTP on mw1074 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[07:23:06] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.061 second response time
[07:23:07] RECOVERY - Apache HTTP on mw1212 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time
[07:23:07] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time
[07:23:08] RECOVERY - Apache HTTP on mw1083 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time
[07:23:08] RECOVERY - Apache HTTP on mw1040 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.085 second response time
[07:23:09] RECOVERY - Apache HTTP on mw1100 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.642 second response time
[07:23:09] RECOVERY - Apache HTTP on mw1166 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[07:23:11] RECOVERY - Apache HTTP on mw1067 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[07:23:11] RECOVERY - Apache HTTP on mw1081 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[07:23:11] RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.076 second response time
[07:23:15] RECOVERY - Apache HTTP on mw1086 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.048 second response time
[07:23:15] RECOVERY - Apache HTTP on mw1188 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.053 second response time
[07:23:15] RECOVERY - Apache HTTP on mw1187 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.093 second response time
[07:23:15] RECOVERY - Apache HTTP on mw1214 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time
[07:23:15] RECOVERY - Apache HTTP on mw1068 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time
[07:23:16] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.093 second response time
[07:23:17] RECOVERY - Apache HTTP on mw1106 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.099 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1171 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.104 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1101 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.476 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1026 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.106 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1075 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.281 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.065 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.092 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1050 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time
[07:23:23] RECOVERY - Apache HTTP on mw1216 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.327 second response time
[07:23:23] RECOVERY - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 62900 bytes in 0.330 second response time
[07:23:25] RECOVERY - Apache HTTP on mw1186 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.053 second response time
[07:23:25] RECOVERY - Apache HTTP on mw1064 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time
[07:23:25] RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.075 second response time
[07:23:25] RECOVERY - Apache HTTP on mw1164 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.077 second response time
[07:23:25] RECOVERY - Apache HTTP on mw1097 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.078 second response time
[07:23:27] RECOVERY - Apache HTTP on mw1055 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time
[07:23:27] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.095 second response time
[07:23:27] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.110 second response time
[07:23:27] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time
[07:23:28] RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[07:23:29] RECOVERY - Apache HTTP on mw1096 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.069 second response time
[07:23:29] RECOVERY - Apache HTTP on mw1103 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[07:23:29] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.066 second response time
[07:23:30] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[07:23:30] RECOVERY - Apache HTTP on mw1080 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.085 second response time
[07:23:32] RECOVERY - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 757 bytes in 0.202 second response time
[07:23:32] RECOVERY - Apache HTTP on mw1032 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.049 second response time
[07:23:33] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.054 second response time
[07:23:33] RECOVERY - Apache HTTP on mw1185 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[07:23:33] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time
[07:23:33] RECOVERY - Apache HTTP on mw1039 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[07:23:35] RECOVERY - Apache HTTP on mw1052 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.073 second response time
[07:23:35] RECOVERY - Apache HTTP on mw1099 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time
[07:23:35] RECOVERY - Apache HTTP on mw1057 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.088 second response time
[07:23:35] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.792 second response time
[07:23:36] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time
[07:23:55] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.050 second response time
[08:07:54] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[08:09:34] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 28581 MB (3% inode=99%):
[08:10:05] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[08:14:34] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 08:14:31 UTC 2013
[08:15:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[08:20:35] New patchset: Odder; "(bug 45643) Add new user groups to urwiki with specific rights Add abusefilter and rollbacker user groups, modify $wgAddGroups for crats and sysops, modify $wgRemoveGroups for crats" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56578
[08:26:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:27:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time
[09:01:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:02:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time
[09:05:32] New patchset: Dereckson; "(bug 46686) Throttle rule for gu. workshop" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56582
[09:06:15] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[09:07:25] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[09:07:55] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 29012 MB (3% inode=99%):
[09:20:14] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56582
[09:25:30] !log olivneh synchronized wmf-config/throttle.php 'Updating throttle rules for guwiki workshop (Bug 46686)'
[09:25:36] Logged the message, Master
[09:40:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:41:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
[10:06:07] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[10:07:17] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[10:07:47] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 28434 MB (3% inode=99%):
[10:18:11] New review: PleaseStand; "(5 comments)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/56408
[10:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time
[10:23:31] hello
[11:05:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[11:07:24] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 27861 MB (3% inode=99%):
[11:07:54] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[11:21:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:22:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[11:39:10] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[11:39:10] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[11:39:10] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[11:52:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:52:37] New patchset: Hashar; "beta: restore commonswiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56593
[11:53:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[11:53:52] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56593
[11:57:28] New patchset: Hashar; "beta: commonswiki was missing the MediaWiki version" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56594
[11:58:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56594
[11:58:15] New review: Hashar; "Typo in wikiversions is fixed by https://gerrit.wikimedia.org/r/#/c/56594/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56593
[12:04:03] New review: Aklapper; "Not sure if bugzillaadmin@ an existing alias and if really every Bugzilla admin wants to get spammed..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56562
[12:06:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:08:02] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 12:07:54 UTC 2013
[12:08:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:08:32] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[12:09:02] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 12:08:52 UTC 2013
[12:09:02] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 27392 MB (3% inode=99%):
[12:09:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:09:52] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 12:09:45 UTC 2013
[12:10:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:10:39] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 12:10:31 UTC 2013
[12:11:02] PROBLEM - Puppet freshness on cp3010 is CRITICAL: Puppet has not run in the last 10 hours
[12:11:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:11:52] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 12:11:43 UTC 2013
[12:12:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:14:42] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 12:14:34 UTC 2013
[12:15:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:21:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:22:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[12:23:02] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[13:05:27] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[13:07:36] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[13:07:41] poor db11
[13:08:06] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 26936 MB (3% inode=99%):
[13:09:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:10:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[13:14:12] New patchset: Hashar; "0.6.1-2 gbp.conf and tweaks" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/56168
[13:14:34] New review: Hashar; "I have moved the PHONY statement at the end of debian/rules" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/56168
[13:26:02] New review: Hashar; "(4 comments)" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[13:26:22] New patchset: Hashar; "Inital deb packaging" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[13:27:27] New review: Hashar; "PS2 fix all the minor issues reported by Faidon. That brings this package up to par with the python-..." [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[13:51:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:52:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time
[14:06:06] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[14:08:16] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[14:08:46] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 27405 MB (3% inode=99%):
[14:19:56] PROBLEM - Puppet freshness on mw1160 is CRITICAL: Puppet has not run in the last 10 hours
[14:22:16] hmm, can I get second eyes on this change from Ryan F?
[14:22:16] https://gerrit.wikimedia.org/r/#/c/56576/1/templates/misc/e3-metrics.settings.py.erb
[14:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:22:43] it downloads config from operations/mediawiki-config.git gerrit gitweb
[14:23:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time
[14:23:27] is that ok? should I tell him to clone that repo and just read from a file? is that even any better?
[14:28:07] New patchset: Hashar; "Inital deb packaging" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[14:33:57] New review: Hashar; "PS3:" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[14:34:49] New patchset: Hashar; "Inital deb packaging" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[14:35:07] New review: Hashar; "PS4: removes override_dh_auto_test , there are no doc tests :-]" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[14:49:51] New review: Hashar; "And the package no more works :(" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[14:58:03] ottomata: is it a cron?
[14:58:17] no
[14:58:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:58:29] its a python web app
[14:58:32] runs in apache as the stats user
[14:58:52] so what triggers that function?
[14:59:02] app startup
[14:59:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time
[14:59:18] or maybe even first page request, i'm not sure. i'm pretty sure its app startup
[14:59:39] the app caches the configs in a persistent file somehow
[14:59:49] so most of the time it won't make the request
[14:59:51] but still
[15:02:11] hrmmm, so 2 thoughts
[15:02:22] 1) if you're going to use HTTP then just use noc.wm.o/conf
[15:02:54] (and then you can even use HTTP headers like if-modified-since)
[15:03:00] yeah that's nice, i'll tell him that
[15:03:09] cool! i doubt either of us know that exists
[15:03:25] oh, do you want to just comment on the change? (I can do it if you don't want to)
[15:03:35] https://gerrit.wikimedia.org/r/#/c/56576/
[15:03:43] 2) not sure about using a git clone or not
[15:05:25] New patchset: Hashar; "Reimported Upstream version 1.5.8" [operations/debs/python-statsd] (upstream) - https://gerrit.wikimedia.org/r/56600
[15:05:51] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[15:06:00] ARGHHHH
[15:06:12] erm? :)
[15:06:46] I pushed to gerrit some changes
[15:06:49] but later on did a git push
[15:06:57] and abandoned the changes
[15:07:03] thus gerrit considers them unmerged
[15:07:04] :(
[15:07:07] but they are in the repo
[15:07:29] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 27955 MB (3% inode=99%):
[15:07:44] ^demon: ping pong :-] Are you able to alter the Gerrit database to flag a change as being merged?
[15:07:49] ottomata: i'm quoting you
[15:07:56] <^demon> hashar: Yeah, change #?
[15:07:57] ^demon: they are registered changes but got them pushed directly in the repo.
[15:07:59] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[15:08:03] New review: Ottomata; "Hiiii Ryan!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56576
[15:08:13] New review: Jeremyb; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56576
[15:08:21] ^demon: https://gerrit.wikimedia.org/r/#/c/55038/ and https://gerrit.wikimedia.org/r/#/c/55039
[15:08:36] erm
[15:08:45] ^demon: I checked their latest patchset sha1. They are present in the repo already (branch `upstream`)
[15:08:46] * jeremyb_ meets ottomata in midair
[15:09:07] hehe
[15:09:33] ottomata: btw, idk why people do that, in nick form i'm all lowercase :-)
[15:09:43] haha
[15:09:45] ok good to know
[15:09:48] you like the _ too?
[15:09:48] * jeremyb_ glares @ Amgine
[15:09:51] no
[15:09:59] no _. that's only for Logan
[15:12:49] <^demon> hashar: Update status on both.
[15:12:52] <^demon> *updated
[15:14:44] ^demon: my hero
[15:14:48] <^demon> yw
[15:15:40] now my change is mergeable yeahhh
[15:16:12] New patchset: Lcarr; "fixing script to use /var/run/icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56601
[15:16:23] ottomata: would you mind merging a change for me ? My python-statsd package has a wrong `upstream` branch. When running git-import origin I used an incorrect tar ball. Change is https://gerrit.wikimedia.org/r/#/c/56600/ :-]
[15:16:49] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56601
[15:17:12] Change merged: Ottomata; [operations/debs/python-statsd] (upstream) - https://gerrit.wikimedia.org/r/56600
[15:17:17] hashar, done
[15:17:22] danke
[15:23:13] New patchset: Hashar; "Merge `upstream` Reimported Upstream version 1.5.8" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/56602
[15:24:27] New patchset: Hashar; "Inital deb packaging" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55069
[15:27:09] New review: Hashar; "Turns out the upstream branch was having an incorrect package." [operations/debs/python-statsd] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/55069
[15:27:09] ah, i just figured out why my ganglia thing wasn't working
[15:27:23] it needs to read a proc file and doesn't have permissions
[15:27:50] that is unfortunate
[15:27:58] mmhmmm
[15:28:13] not sure if I can fix that one, i can't just change file perms on /proc
[15:28:14] ori-l
[15:28:18] any thoughts on that?
[15:28:21] this is for the udp2log socket stats
[15:28:28] what is the /proc file ?
[15:28:34] /proc/26970/fd
[15:28:40] udp2log proc file
[15:28:44] ah
[15:28:51] maybe add ganglia to the udp2log group?
[15:28:55] maybe not ideal though
[15:29:27] naw, the proc file is owned by root 500
[15:30:42] hm I think I can do this differently, just much less elegantly
[15:34:11] ottomata: is what you want not available from `netstat` ?
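
On jeremyb_'s noc.wm.o/conf suggestion at [15:02]: the point of If-Modified-Since is that the app keeps its cached copy and pays for a full download only when the file actually changed. A minimal sketch, assuming the `requests` library is available and using a hypothetical conf file name (the exact path under noc's conf listing is an assumption here):

    import requests  # assumed available; any HTTP client works

    # Hypothetical example path under noc's conf listing.
    CONF_URL = 'http://noc.wikimedia.org/conf/InitialiseSettings.php.txt'

    def fetch_conf(cached_body=None, cached_lastmod=None):
        headers = {}
        if cached_lastmod:
            # Ask the server to skip the body if nothing changed.
            headers['If-Modified-Since'] = cached_lastmod
        resp = requests.get(CONF_URL, headers=headers, timeout=10)
        if resp.status_code == 304:
            # Unchanged since our copy: reuse the cache, no body was sent.
            return cached_body, cached_lastmod
        resp.raise_for_status()
        return resp.text, resp.headers.get('Last-Modified')
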
[15:35:19] sorta, i mean, not on a per process level (is it?)
[15:35:48] this is code ori-l wrote to get udp socket stats from udp2log processes
[15:35:57] i can get the info out of /proc/net/udp
[15:36:31] but
[15:36:41] ori-l is using /proc to find the socket inodes
[15:36:50] to figure out which lines in /proc/net/udp are udp2log processes
[15:36:59] there are other ways to figure that out
[15:37:10] grepping for hex of listen port is one?
[15:42:07] ottomata: can't you shell out to lsof ?
[15:43:37] same deal i think, (lsof uses /proc???)
[15:43:38] udp2log 26970 udp2log NOFD /proc/26970/fd (opendir: Permission denied)
[15:44:11] ottomata: suid udp2log ?
[15:44:27] naw, udp2log doesn't have perms either
[15:44:28] just root
[15:44:34] idk
[15:44:43] ignore what it says about 500
[15:44:47] try it for real
[15:44:52] sudo su - udp2log
[15:45:06] (or leave out the sudo if you're already root!)
[15:45:55] oh, hrmmm
[15:46:05] naw, even as udp2log user
[15:46:06] udp2log 26970 udp2log NOFD /proc/26970/fd (opendir: Permission denied)
[15:46:23] grrr
[15:46:24] it looks like lsof reads /proc/.../fd to figure its stuff out
[15:46:24] so
[15:46:32] also, netstat wants me to be root too :p
[15:51:06] i think i'm going to have to do this:
[15:51:18] grep either ps or udp2log init.d file for the port that udp2log is listening on
[15:51:25] convert port to hex
[15:51:31] grep /proc/net/udp for hex port
[15:55:47] New patchset: Jeremyb; "one more script to use /var/run/icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56604
[15:56:21] New review: Dzahn; "it is an alias. i added it, to keep things generic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56562
[15:58:16] * jeremyb_ repokes about https://rt.wikimedia.org/Ticket/Display.html?id=4761
[16:00:08] jeremyb_: thank you
[16:01:28] New review: Jeremyb; "fu Iaf9dcb9ab7574ce79c74f559c48d0c1d31fb51be" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56601
[16:06:01] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:06:47] greg-g: did you get sorted??
[16:07:10] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[16:07:10] jeremyb_: didn't work last I checked yesterday, let's see if it is a midnight cronjob ;)
[16:07:14] * jeremyb_ wonders again about reopening graphite (or adding a group or something)
[16:07:22] greg-g: i don't think it is
[16:07:36] :) nope
[16:07:40] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 27384 MB (3% inode=99%):
[16:08:00] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:07:57 UTC 2013
[16:08:37] mutante: if I can bug you again about my credentials... graphite still won't let me in. Also, relatedly, robla and I agree it'd be good to get me a RT account/access. Should I email rt these issues? :)
[16:09:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:09:10] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:09:02 UTC 2013
[16:10:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:10:10] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:10:01 UTC 2013
[16:10:57] greg-g: i don't know about graphite, but i do about RT, you already have an account, created yesterday:) PMing you details
[16:11:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:11:00] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:10:55 UTC 2013
[16:11:08] oh, nice!
[16:12:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:12:25] heh
[16:12:30] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:12:22 UTC 2013
[16:13:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:13:00] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:12:55 UTC 2013
[16:14:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:14:40] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 16:14:32 UTC 2013
[16:15:00] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:16:02] jeremyb_: what was that other site you had me test yesterday?
[16:16:15] icinga-admin/ishmael
[16:16:19] ishmael, right
[16:16:20] thanks
[16:16:45] New patchset: Lcarr; "fixing pid file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56605
[16:18:22] * greg-g made his first ever RT ticket
[16:24:22] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56605
[16:24:39] greg-g, congratulations. next step would be to receive a notification from a ticket you can't see:P
[16:24:54] the way to nirvana is long
[16:26:38] MaxSem: sounds... wonderful.
[16:28:15] * jeremyb_ stabs LeslieCarr!
[16:28:22] ow!
[16:28:30] hi, who should I talk to about temporarily getting a new public SSH key added to stat1? my laptop is dead and I'd like to be able to access stat1 before it's fixed
[16:28:40] LeslieCarr: https://gerrit.wikimedia.org/r/56604
[16:28:54] jgonera: put in an rt ticket
[16:29:15] hashar: i also made the change and committed it
[16:29:16] hehe
[16:30:35] LeslieCarr, how do I register in rt?
[16:30:43] jgonera, if your laptop is in repair, all keys you have on it should be eternally invalidated
[16:30:52] jgonera: just send an email to ops-requests@rt.wikimedia.org
[16:31:05] MaxSem, that's actually a good point
[16:31:07] jgonera: don't need to register
[16:31:12] oh, ok
[16:31:19] New review: Hashar; "(2 comments)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/56348
[16:31:34] jgonera: once the mail is received you can then do a password reset @ rt.wikimedia.org
[16:33:06] yay i like it when jeremyb_ answers questions :)
[16:33:34] ++:)
[16:33:41] hehe
[16:36:48] New review: Hashar; "Sorry. I was referring to the JJB job `operations-puppet-doc ` which is https://gerrit.wikimedia.org..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53958
[16:36:54] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55872
[16:39:03] weird (I finally cleaned out all of the Ops Gerrit messages so now I'm seeing) the svn messages are html-only email? That seems odd to me.
[16:53:38] New patchset: Hashar; "contint: Move docs.pp into contint" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958
[16:53:51] New review: Hashar; "manual rebase" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958
[16:56:05] New review: Demon; "This is ready now." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/53759
[16:56:34] New review: Demon; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54989
[16:59:57] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[17:00:15] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53759
[17:00:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54989
[17:01:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55057
[17:02:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:02:36] greg-g: i don't follow
[17:02:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55259
[17:03:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.037 second response time
[17:03:28] jeremyb_: remove the bits between parens, the emails from svn@sockpuppet are html only, it seems
[17:03:44] yeah "svn messages" was a little vague
[17:04:48] Change merged: Ryan Lane; [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/56485
[17:05:28] New patchset: Hashar; "contint: puppet doc now handled by Jenkins" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958
[17:05:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[17:07:58] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[17:08:28] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 26587 MB (3% inode=99%):
[17:08:31] jeremyb_: the part between parens was just me saying how I am finally cleaning up the ops mailing list mail I get that is just noise to me (I created an OpsNoise folder for gerrit, puppet, amanda, mscloud messages ;-) )
[17:08:50] greg-g: oh, i've never gotten mail from sockpuppet nor stafford so idk :)
[17:08:58] greg-g: do you get stafford too?
[17:09:13] greg-g: you don't even get the really noisy stuff ;)
[17:09:47] cronjob every minute!
[17:10:03] greg-g: i don't see how gerrit is opsnoise
[17:10:11] i wonder what mscloud is
[17:10:57] microsoft cloud!
[17:11:05] New review: Hashar; "The first succesful jenkins job is https://integration.wikimedia.org/ci/view/Operations/job/operatio..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958
[17:11:51] but why is it used? :)
[17:12:01] I've no idea if it is that
[17:12:12] Nimsoft Cloud Monitor
[17:12:29] <^demon> Replacing etherpad with ms word...in the cloud?
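
ottomata's three-step plan at [15:51] (find the listen port, convert it to hex, grep /proc/net/udp) is what his later patch title describes ("Looking up udp2log socket stats by port instead of socket inode"), and unlike /proc/<pid>/fd it needs no special permissions, since /proc/net/udp is world-readable. A sketch of the idea, assuming a 2.6.27+ kernel where the last /proc/net/udp column is the per-socket drop counter:

    def udp_socket_stats(port):
        """Sum (rx_queue, drops) for UDP sockets bound to `port`, by
        scanning /proc/net/udp; no root needed, unlike /proc/<pid>/fd."""
        hex_port = '%04X' % port  # /proc uses uppercase hex
        rx_queue = drops = 0
        with open('/proc/net/udp') as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                local = fields[1]  # formatted as 'HEXIP:HEXPORT'
                if local.split(':')[1] != hex_port:
                    continue
                # fields[4] is 'tx_queue:rx_queue', both hex
                rx_queue += int(fields[4].split(':')[1], 16)
                drops += int(fields[-1])  # last column: drops (2.6.27+)
        return rx_queue, drops

For example, udp_socket_stats(8420) would match the line whose local address ends in ':20E4'; the udp2log port itself would come from the init script or ps, as ottomata says above.
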
[17:12:32] used for Icinga stuff [17:12:32] sounds more feasible :p [17:12:35] hah [17:12:42] marktraceur: see ^demon ^ [17:12:46] <^demon> Nimsoft != icinga [17:12:49] ^demon: Office 365 [17:12:53] <^demon> icinga == nagios replacement. [17:13:05] <^demon> nimsoft == site formerly known as watchmouse, does third-party monitoring [17:13:09] Thehelpfulone: yeah, no connection to icinga. i hope. if there's a connection than that's a bug [17:13:18] (besides that nimsoft monitors icinga) [17:13:25] monitor the monitoring [17:13:28] test the tests [17:13:32] <^demon> monitor all the monitors? [17:13:53] LeslieCarr: just wondering if you saw this... (it's your week) * jeremyb_ repokes about https://rt.wikimedia.org/Ticket/Display.html?id=4761 [17:14:33] jeremyb_: well, the gerrit messages sent to ops is noise for me, there's a ton of it and I don't need to read it [17:14:44] reedy and ^demon got your other questions :) [17:14:56] greg-g: The commits to operations/puppet? [17:15:06] There are some svn commits for DNS entries [17:15:15] cause that's still oldschool [17:15:21] greg-g: errr, ops gets gerrit on list in addition to personal subscriptions? weird [17:15:22] Reedy: yeah [17:15:31] jeremyb_: yeah, I thought so too ;) [17:15:40] It's how it's always been [17:15:48] ^demon: but doesn't tell us what the problem is when we're running at 60% in france :( [17:15:51] I'm guessing it's wanted, as otherwise someone would've complained and/or fixed it [17:16:21] Reedy: yeah, I mean, it probably makes sense for their use case, but not for platform's [17:16:29] Tough, it's their mailing list? ;) [17:16:31] <^demon> I'm curious if it's actually all that wanted. [17:17:07] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958 [17:17:12] I should go back inside [17:17:32] i need better mail filtering [17:17:37] jeremyb_: imapfilter! [17:17:40] /dev/null WFM [17:17:43] heh [17:17:52] 0 inbox too [17:17:54] Bonus! [17:18:07] #inboxzeroalldaymuthereffer [17:18:10] <^demon> greg-g: Nobody's deploying anything right now are they? [17:18:13] greg-g: do you use google apps? or the old setup? [17:18:26] ^demon: it's friday, they shouldn't be [17:18:49] jeremyb_: my wmf email is hosted by google, yes, but I don't use the web interface [17:19:18] <^demon> greg-g: We've got a gerrit update ready to roll. Activity seems pretty slow, and this update's long overdue. Any objections? [17:19:22] greg-g: i mean there's wmf people that have imap not at google [17:19:29] ^ [17:19:40] < 5 I think [17:19:51] ^demon: none from me, gerrit's not WP... ;) [17:19:52] possible :) [17:20:12] yeah, I think all new people get google :/ [17:20:16] <^demon> sweet, here we go. [17:20:21] brb, real work for a bit [17:23:03] <^demon> !log gerrit offline for an update, brb [17:23:10] Logged the message, Master [17:25:16] marktraceur: http://blogs.law.harvard.edu/sj/2013/03/28/annotation-hacks-hypothesis-xxx-begins-to-converge/ [17:33:37] New patchset: Demon; "Need single quotes since ${name} is literal" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56612 [17:36:19] jeremyb_: Ha! I would love to give SJ his always-up always-reliable EPL instance. [17:38:11] New patchset: Ottomata; "Looking up udp2log socket stats by port instead of socket inode." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56614 [17:40:10] * jeremyb_ stabs gerrit [17:40:15] ^demon: are you done? 
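[An aside on the mail-filtering tangent above: imapfilter itself is configured in Lua, but the same "shunt the noisy senders into an OpsNoise folder" filter can be sketched with only the Python standard library. Host, credentials, folder name, and the sender list here are placeholders, not the real setup.]

    import imaplib

    # Placeholder values; the real host/credentials/folders are assumptions.
    HOST, USER, PASSWORD = 'imap.example.org', 'greg', 'secret'
    NOISE_SENDERS = ['gerrit', 'svn', 'amanda']
    DEST = 'OpsNoise'

    conn = imaplib.IMAP4_SSL(HOST)
    conn.login(USER, PASSWORD)
    conn.select('INBOX')
    for sender in NOISE_SENDERS:
        # Find messages from each noisy sender and move them out of the inbox.
        typ, data = conn.search(None, '(FROM "%s")' % sender)
        for num in data[0].split():
            conn.copy(num, DEST)
            conn.store(num, '+FLAGS', '\\Deleted')
    conn.expunge()
    conn.logout()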
[17:40:17] New patchset: Demon; "Need single quotes since ${name} is literal" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56612 [17:40:37] <^demon> Mostly. [17:41:43] Change abandoned: Demon; "The hell?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56612 [17:42:17] New patchset: Aaron Schulz; "Re-enabled async upload for commons." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56615 [17:42:48] i wonder if you logged everyone out or just me [17:43:35] http://i.imgur.com/GGZWWKs.png :) [17:43:38] <^demon> Caches don't survive a schema upgrade. [17:44:07] PROBLEM - Host db29 is DOWN: PING CRITICAL - Packet loss = 100% [17:44:10] gerrit took a while to make up its mind whether I was logged in or not :) [17:44:54] heh [17:44:58] !log aaron synchronized wmf-config/InitialiseSettings.php 'Re-enabled async upload for commons' [17:45:05] Logged the message, Master [17:45:35] <^demon> Ugh. Things are merging but not saying they're merged. [17:45:52] I just noticed that [17:46:07] you mean in channel or even not saying it on the web? [17:46:09] what about email? [17:46:17] I saw "merge pending" but it pulled down and I was like "wahht?" [17:46:24] ;) [17:47:37] !log taking down db29 to replace bad hw [17:47:43] Logged the message, Master [17:48:34] <^demon> Ah, it's a plugin. [17:49:47] RECOVERY - Host db29 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms [17:51:05] marktraceur: so there is no way to re-upload using UW? [17:51:38] AaronSchulz: Not to my knowledge, no [17:51:45] gah [17:54:36] Reedy can you create the wiki for https://rt.wikimedia.org/Ticket/Display.html?id=4850 - I believe you've done the last few? [17:55:33] mutante: ^ Could you do the DNS and apache entries please? [17:57:48] <^demon> AaronSchulz: Ok, merging fixed. [17:59:32] New patchset: Demon; "I hate when people change parameter names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56616 [17:59:43] ok folks, some raid troubleshooting going on, so doing it in here in case anyone is interested in how to do it [18:00:02] in this case, it's a sun server, so adaptec controller, and using arcconf [18:00:18] sbernardin: So basically, you just have to memorize what utilities run which controllers [18:00:29] or in the worst case, recall all the utilities and just try them [18:00:32] !log DNS update - adding transitionteam wiki [18:00:33] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56616 [18:00:40] cuz if you try to use megacli (what dell perc uses) on this, it simply won't work [18:00:43] Logged the message, Master [18:00:44] you won't actually kill something. [18:00:47] (usually) [18:00:52] what's this udp2log crap? [18:00:59] Reedy: DNS done .. Apache soon [18:01:02] Ok [18:01:11] ottomata: did you merge something and not merge it on sockpuppet? [18:01:20] so in this case: arcconf getconfig 1 [18:01:28] that shows you all the configuration on the controller [18:01:31] and lists every disk [18:01:38] you can type in just arcconf to get all the options [18:01:55] (i also cheat and google arcconf cheat sheet if i have to do things like swap disks without autorebuild) [18:02:17] sbernardin: So we can see the controller is ok, but the raid is degraded [18:02:27] and the 3rd disk (under logical device info) is missing [18:03:00] Ryan_Lane, maybe? gerrit was being real weird
[18:03:04] didn't look like it merged [18:03:07] but it is safe to merge [18:03:19] ah, yeah, refresh of gerrit shows it is merged now [18:03:20] sbernardin: So, i would copy down the output of that commend [18:03:21] command [18:03:24] lemme know if you did [18:03:32] specifically what is listed under Logical device segment information [18:03:49] sbernardin: then, there should be a disk that isn't flashing its device io light when the others do [18:04:04] if you watch it for about 30 seconds, usually every disk will flash once [18:04:25] <^demon> !log gerrit back up, now running 2.6-rc0-76-g52fb5ae [18:04:27] then you should be able to see which is dead, pull it, and ensure its Serial Number is NOT one of the ones listed in the logical device segment [18:04:32] Logged the message, Master [18:04:38] sbernardin: but if you cannot tell by seeing which one doesn't flash, then we can do other things [18:04:40] but that's the easiest [18:04:43] OK [18:04:53] now, if the disk wasn't totally dead [18:04:59] we would be able to see its serial to confirm it when you remove [18:05:06] but since it is, we just have to confirm the serial ISN'T on our output list [18:05:10] which means we have the right disk [18:05:11] make sense? [18:05:29] hey guys, do I need approval/RT ticket to change someone's ssh key? [18:05:30] Reedy: any opinion on which Apache conf? main.conf vs. remnant.conf? both would work, other private wikis are in remnant, but if it's new, it can hardly be a remnant:) [18:05:55] Yup....got it [18:05:59] ottomata: we need some way to confirm it's really that person who is asking, so RT is usually first step [18:06:13] k [18:06:14] but it's a hard question, since without handoff, how are you sure it's their key ;] [18:06:38] if you can do an RT, you can detail how you know it's their key, etc. [18:06:50] ottomata: if you think you can ask them to do that, gpg signing is a really nice way to confirm [18:06:52] i'd put it in RT and then file a changeset and link it to the RT ticket personally [18:06:54] ah, RobH, there is an RT [18:06:54] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [18:06:54] https://rt.wikimedia.org/Ticket/Display.html?id=4854 [18:07:12] ok [18:07:24] ottomata: hrmm, i hope this dude makes this key not temp but just his key [18:07:32] but whatevs, hrmm [18:07:40] yeah, his most recent email (i'll paste in RT) says he's going to keep it [18:07:56] ottomata: it's just a little annoying to have to check to make sure it isn't malicious [18:08:11] ottomata: So yea, if you know this is legit, because you have had long conversations with him on other things to prove it's him [18:08:15] (see what i mean?) [18:08:20] then you should comment in ticket that it's legit [18:08:25] (from your perspective) [18:08:30] because none of us know this dude. [18:08:40] (no offense to him intended) [18:09:04] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [18:09:05] hm, i guess dario should comment then?
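[A rough sketch of automating the check RobH walks through above: run arcconf against the controller and pull out the logical-device status plus the serial numbers of disks still in the array (a pulled disk's serial must NOT appear in that list). The exact "Status of logical device" / "Serial number" label text varies by arcconf version and firmware, so treat the string matching as illustrative.]

    import subprocess

    def degraded_devices(controller=1):
        """Parse `arcconf getconfig <controller>` output and return
        (logical device status lines, serial numbers of present disks)."""
        out = subprocess.check_output(['arcconf', 'getconfig', str(controller)])
        statuses, serials = [], []
        for line in out.splitlines():
            line = line.strip()
            # Label text is an assumption; adjust to the actual output.
            if line.lower().startswith('status of logical device'):
                statuses.append(line)
            elif line.lower().startswith('serial number'):
                serials.append(line.split(':', 1)[1].strip())
        return statuses, serials

    if __name__ == '__main__':
        statuses, serials = degraded_devices(1)
        print(statuses)
        print(serials)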
[18:09:10] I don't know him either [18:09:19] well, someone who knows that it is a legit key for him should [18:09:33] better off would be as mutante says and sign keys [18:09:33] k [18:09:34] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 27073 MB (3% inode=99%): [18:09:43] but at minimum we want a social check in place [18:09:47] (not great, but meh) [18:12:51] hmk [18:13:06] New patchset: Ottomata; "Removing debug statement" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56617 [18:13:50] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56617 [18:20:01] RobH: I am almost afraid to ask but what can we do to bring an07 (that cisco box) to life? [18:20:12] anybody avail to help me figure out why my new ganglia stats aren't working? [18:20:19] i've got this error message on ganglia-monitor startup [18:20:25] Unable to find the metric information for 'rx_queue'. Possible that the module has not been loaded.#012 [18:20:31] but I can run the python module as ganglia user just fine now [18:21:13] drdee: Someone needs to troubleshoot the installer and see why it's failing to make a raid array [18:21:28] cuz something isn't right, and it should be identical to the rest [18:22:01] drdee: if analytics doesn't have someone who can do this, we can find an ops person [18:22:12] which i'm totally going to make cmjohnson1's problem. [18:22:19] cmjohnson1: this is what you get for wanting to learn. [18:22:46] robh: yeah..i was hoping to have a box i could take a raid controller from to test it first [18:23:05] the raid isn't doing anything [18:23:05] it's software raid in the installer [18:23:11] so i think yer gonna have to troubleshoot in place [18:23:21] we don't really know why it's failing [18:23:24] binasher: around? do you need help diagnosing that jobqueue craziness that happened yesterday? We have a wikidata deploy next week on Wed on enwp, and, well, would love to know if we need to fix something before that ;) [18:23:26] i wonder if i can use the one from labsdb1003 since it's not in service yet (binasher) [18:23:27] someone needs to parse the installer logs manually and read them [18:23:31] and it's painfully boring [18:23:40] but yea, you can try if you want [18:23:52] RobH: can you email me the installer logs? [18:23:55] this is me, handing you the issue, to correct as you see fit [18:24:01] drdee: well, cmjohnson1 can [18:24:04] cuz now, it's his problem [18:24:08] * RobH runs away laughing evilly [18:24:13] nicely done [18:24:20] * cmjohnson1 filling with h8 [18:24:25] no srsly, i am having to stifle evil laughter. [18:24:36] i can feel chris's hate from across the country [18:24:37] cmjohnson1 feel free to email me the installer logs [18:24:41] an1007 is my dataset1 [18:25:02] cmjohnson1: you know the usual, you can ping me to help, or anyone else you need [18:25:08] i'm just making sure you know yer the point person is all [18:25:14] uh huh..i know [18:25:15] thx [18:25:37] cmjohnson1: your hate for me is like sunshine, i revel in it. [18:25:57] :-P [18:26:19] ssl3004? [18:26:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [18:28:50] greg-g: can we drop to running a single wikidata change dispatcher and also reduce the batch size? this only has to be for a week or two, once the jobqueue is moved to redis, the problem goes away
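[On the gpg-signing suggestion above: one way to script that check, assuming the requester clearsigns the new public key with a GPG key whose fingerprint you already trust. This uses the third-party python-gnupg package; the filename and fingerprint are placeholders.]

    import gnupg

    TRUSTED_FPR = '0123456789ABCDEF0123456789ABCDEF01234567'  # placeholder

    gpg = gnupg.GPG()
    with open('new_key.pub.asc', 'rb') as f:
        verified = gpg.verify_file(f)
    # Only accept the key if the signature is valid AND was made by the
    # fingerprint we already associate with this person.
    if verified.valid and verified.fingerprint == TRUSTED_FPR:
        print('signature checks out; key swap can proceed')
    else:
        print('refusing: signature missing or from an unknown key')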
[18:29:15] New review: Ryan Lane; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56104 [18:29:36] ^demon: ^^ [18:29:43] tin can't connect to irc, it's an internal host [18:30:34] I made git-deploy write its log messages into the redis queue on tin, and was going to add support to ircecho to pull from the queue [18:31:08] <^demon> Hmm. [18:31:13] binasher: could be reasonable, I'd just double check with the wikidata people [18:31:32] it should be amazingly easy to add. I can probably do so today, if needed [18:32:29] could probably made ircecho push into a queue as well [18:32:31] *make [18:32:35] <^demon> *nod* [18:32:48] then tin would read the file, push in, and another system would pop and write into irc [18:33:19] alternatively, we could have ircecho listen and repeat [18:33:31] I used the queue because I was already using it [18:34:18] listen + repeat is likely less reliable [18:34:36] cmjohnson1: it's ok if labsdb1003 is down for a bit [18:34:57] ori-l, around? [18:34:57] cool thx ...that helps [18:35:06] hey [18:35:09] ottomata: ^ [18:35:32] did you have any success sending your udp2log stats to ganglia? [18:35:46] ^demon: thoughts? [18:35:58] ^demon: any approach you think is better? [18:36:30] I know we had some udp listener for a while [18:37:03] <^demon> For listen + repeat, where would you repeat from? [18:37:10] ottomata: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=EventLogging&vl=events+%2F+sec&x=&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=%5E%28client-generated-raw%7Cserver-generated-raw%7Cvalid-events%29%24&gtype=stack&glegend=show&aggregate=1 [18:37:23] some host with a public IP [18:37:26] binasher: I just asked in #wikimedia-wikidata and the initial response is "lets not do that unless we really have to" basically. Can you email the wikidata mailing list about it? [18:37:35] the same will be needed with ircecho pulling from a queue [18:37:50] <^demon> Ryan_Lane: Figured. I suppose that's probably the path of least resistance. [18:37:54] ottomata: happy to take a look and help debug; what's the issue? [18:38:01] what, listener? [18:38:10] if we already have a listener written, yeah [18:38:11] <^demon> Yeah [18:38:14] <^demon> In fact, depending on how we set it up it could be a general service for "Inside things that want to post crap to IRC" [18:38:21] ori-l, doh, i may have just found it, one sec [18:38:56] greg-g: maybe.. let's see if we can avoid it by other means. aaron deployed a change last night that throttles how quickly the wikidata change jobs get resolved into refreshLinks jobs but it's unclear if that's enough [18:39:00] <^demon> brb [18:39:01] ^demon: I know we had one for a while. not sure what happened to it [18:39:16] RoanKattouw_away: any idea what happened to that udp listener that talked to irc? [18:39:41] if it is, great. if it's not, we'll cut the wikidata dispatch capacity and then notify them [18:39:50] yeahhh, totally something dumb on my part, i just saw it. i renamed the module but didn't change it in the .pyconf file. [18:39:54] but, ori-l, while we're talking about it [18:39:55] my only issue with a listener is that when it's down all messages are dropped [18:40:00] binasher: gotcha. thanks. So, in your opinion (I'll consider you 'owning' this issue right now) should we continue as planned with the deploy on wed and just monitor, with no more changes between now and then?
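[The consumer side of the queue-backed relay Ryan_Lane describes above could look roughly like this — a sketch using the redis-py client and a bare-socket IRC connection. Queue name, host names, nick, and channel are all made up; a real relay would also need to answer server PINGs and wait for registration to finish.]

    import socket
    import redis

    QUEUE = 'deploy-log'                           # assumed queue name
    r = redis.StrictRedis(host='tin.eqiad.wmnet')  # hostname is illustrative

    irc = socket.create_connection(('irc.example.org', 6667))
    irc.sendall('NICK logbot\r\nUSER logbot 0 * :logbot\r\n')
    irc.sendall('JOIN #wikimedia-operations\r\n')

    while True:
        # BLPOP blocks until a message is queued; because messages sit in
        # redis while the bot is down, nothing is lost across restarts --
        # the advantage over listen + repeat noted above.
        _, msg = r.blpop(QUEUE)
        irc.sendall('PRIVMSG #wikimedia-operations :%s\r\n' % msg)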
[18:40:07] I had to modify the udp2log module to not use /proc/<pid>/fd [18:40:14] using the fd and looking up the inodes is way more elegant [18:40:17] if we write into a queue, when the bot comes back up, it'll log things that happened while it was down [18:40:19] greg-g: what are the implications of the deploy [18:40:19] but ganglia user doesn't have permissions to read that [18:40:50] instead i'm looking up the udp2log listen port in the cmdline, and using that to get data out of /proc/net/udp [18:40:55] works, but is much less elegant [18:40:58] and more udp2log specific [18:41:02] binasher: many more things getting in the queue, so if the queue is slow, many users not getting notifications [18:41:07] also, we can combine ircecho and adminbot functions, so, if someone writes into a specific queue, it'll write directly to wikitech, rather than spitting a message into irc [18:41:27] I hate that we have a bot that logs to a bot. it's dirty [18:41:36] and noisy [18:41:47] New patchset: Ottomata; "Fixing .pyconf module name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56621 [18:41:58] greg-g: getting things in the queue caused two site outages in the last 24 hours [18:42:01] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56621 [18:42:36] binasher: oh, I didn't see your email until just now.... [18:42:47] * greg-g reads [18:42:53] binasher: can this go to -wikidata list? [18:42:55] greg-g: ah, i thought that was why you asked :) [18:43:04] ottomata: hrm, dunno. i imagine it's the sort of thing for which there is some kind of best-practice permission scheme (giving non-root daemons the ability to poll system stats on /proc) [18:43:15] ask binasher :P [18:43:19] greg-g: certainly, would you mind forwarding? [18:43:23] will do [18:43:43] yeah, it works for some of the files in /proc/<pid>/, just not the /proc/<pid>/fd directory [18:44:07] dr-x------ 2 root root 0 2013-03-28 20:25 /proc/26970/fd [18:44:13] MaxSem: why's it necessary to direct load.php to mobile? [18:44:15] * binasher needs to run to make a lunch meeting.. back later, then i can try to help you ottomata [18:44:23] mmk, no worries binasher, thanks! [18:44:35] so that it delivers a different module? [18:44:38] what I have works now, it's just not as elegant as ori-l's solution [18:44:51] Ryan_Lane, because X-Device is set only for mobile Varnishes [18:45:49] this is put bits content into the mobile varnish, right? [18:45:52] *will [18:46:52] yes, but only mobile load.php requests, which would take very little storage capacity [18:46:58] * Ryan_Lane nods [18:47:39] is it not possible to pass a param to load.php that bits can detect for delivering x-device? [18:47:51] either way we're going to need to reconfigure varnish [18:47:58] for either mobile or bits [18:49:27] !log hot-swapping disk on db29 [18:49:33] I'm just asking, really [18:49:34] Logged the message, Master [18:49:45] Ryan_Lane, eg s/mobile.device.detect/mobile.device.{$x0device}/ ? [18:49:46] my only real concern is troubleshooting [18:50:46] well, and splitting bits config into two spots [18:52:16] New patchset: Ottomata; "Adding class misc::monitoring::net::udp for generic udp statistics monitoring." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56623 [18:52:31] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56623 [18:52:41] MaxSem: you are delivering links to bits from the html. the html is mobile specific [18:53:04] yes...
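[Back to the udp2log stats thread above: the port-based workaround ottomata describes amounts to scanning /proc/net/udp for a socket whose local port matches. A minimal sketch, following the kernel's documented field layout (local_address is hex ip:port, the queue sizes are a hex tx:rx pair, drops is the last column); this is not the actual udp2log module, just the idea.]

    def udp_socket_stats(port):
        """Return (tx_queue, rx_queue, drops) for the UDP socket bound to
        the given local port, or None if no such socket exists. Needs no
        access to /proc/<pid>/fd, so it works as a non-root ganglia user."""
        with open('/proc/net/udp') as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                local_port = int(fields[1].split(':')[1], 16)
                if local_port != port:
                    continue
                tx_hex, rx_hex = fields[4].split(':')
                drops = int(fields[-1])
                return int(tx_hex, 16), int(rx_hex, 16), drops
        return None

    print(udp_socket_stats(8420))  # 8420 is just an example port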
[18:53:24] so, it should be possible to tell it to do mobile detection [18:53:50] tell what? [18:54:26] bleh. let me reorganize my thoughts so I can ask more clearly [18:55:19] detecting device in load.php is easy, but it will not cache properly without X-Device header set in frontend varnish [18:55:34] cause it needs to vary by X-Device [18:56:42] does the module delivered need to be different for every device? [18:57:30] hi, could someone review https://gerrit.wikimedia.org/r/#/c/56333/ (a few more acls) and https://gerrit.wikimedia.org/r/#/c/55302/ ? [18:57:53] paravoid, i hope i addressed your questions in the email :) [18:58:16] Ryan_Lane, yes, that's the point of it [18:58:25] (I'm not actually suggesting a different approach, btw, I'm just trying to get a better understanding of how it's changing and why it's necessary) [18:58:39] ah. ok. that makes sense, then [18:59:00] yeah, mobile stuff is insane in places [18:59:04] heh [18:59:14] (in a lot of them) [19:00:10] cool. this change sounds really great, btw [19:00:32] good job tackling this problem [19:03:32] Ryan_Lane, wait until we pile up all the various banners for zero... talking of "insane" :) [19:06:14] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [19:06:55] RECOVERY - RAID on db29 is OK: OK: 1 logical device(s) checked [19:07:24] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [19:07:54] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 26606 MB (3% inode=99%): [19:23:06] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51797 [19:24:48] !log aaron synchronized php-1.21wmf12/includes 'deployed fa5c0a6a82c9a1cb63f3cedd87e6c4aba59c994d' [19:24:49] New patchset: Demon; "Show notice to users who are using legacy skins" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56408 [19:24:55] Logged the message, Master [19:25:29] New review: Demon; "(5 comments)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56408 [19:25:38] are there any bugzilla admins awake? [19:25:56] I'd like to make a canned query publicly accessible [19:26:20] New patchset: Dzahn; "add transitionteam wiki conf - RT-4850" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/56631 [19:29:28] gwicke: ? [19:29:31] !log creating transitionteam docroot [19:29:38] Logged the message, Master [19:30:02] Reedy: is there a way to make a canned ('saved') query in bugzilla publicly available? [19:30:03] !log dzahn synchronized ./docroot/transitionteam [19:30:05] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/56631 [19:30:09] Logged the message, Master [19:30:14] gwicke: https://bugzilla.wikimedia.org/userprefs.cgi?tab=saved-searches [19:30:26] Share with group bz_canusewhines [19:30:36] Or do you not have that shown? [19:30:50] ah, nice! [19:30:53] thanks for that tip! [19:32:10] dzahn is doing a graceful restart of all apaches [19:32:34] Reedy: it does not seem to be available to ordinary users still, much less publicly [19:32:46] oh well, will have to paste long urls instead I guess [19:32:52] !log dzahn gracefulled all apaches [19:32:59] Logged the message, Master [19:33:28] Reedy: DNS and Apache done for new wiki, incl. docroot
[19:33:45] just mkdir and sync-common-file [19:35:33] That's great, thanks [19:37:42] yw [19:47:30] ^demon: http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20pmtpa&h=professor.pmtpa.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [19:47:41] packets received & bytes received :) [19:48:22] <^demon> Ha, great. [19:59:22] New patchset: Diederik; "Dropping all search and preview url's" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56633 [20:02:12] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56633 [20:04:18] ori-l, your stuff works great! [20:04:21] :D :D :D [20:04:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:04:47] pure uncut colombian [20:04:54] haha [20:04:56] lol [20:05:06] ottomata: easy [20:05:16] so, i haven't done too much ganglia stuff, but! [20:05:17] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=oxygen.wikimedia.org&v=390342&m=SndbufErrors&r=hour&z=default&jr=&js=&st=1364583635&vl=packets&ti=UDP%20Send%20Buffer%20Errors&z=large [20:05:30] do you think delta would be more useful? [20:05:35] send buffer errors / sec? [20:05:40] etc.? [20:06:12] hrm, yeah, that looks slightly misconfigured [20:06:15] probably my fault [20:06:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:06:25] well, it just reports the current value [20:06:27] ganglia only shows rate over time [20:06:30] which is correct, so that's ok [20:06:34] oh, hm [20:06:34] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [20:06:53] where's your config (and which script are you using)? [20:07:04] it's in puppet right now, but it's the same as yours was, with added metrics [20:07:13] db11 just doesn't want to die [20:07:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [20:07:16] files/ganglia/plugins/udp_stats.py [20:08:04] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:07:56 UTC 2013 [20:08:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:09:14] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:09:03 UTC 2013 [20:09:23] * ori-l looks [20:09:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:10:15] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:10:04 UTC 2013 [20:10:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:11:04] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:10:57 UTC 2013 [20:11:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:11:45] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:11:43 UTC 2013 [20:12:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:12:34] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:12:24 UTC 2013 [20:13:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:13:52] !log scheduling a 2-year downtime for db11 :p [20:13:59] Logged the message, Master [20:14:44] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Fri Mar 29 20:14:33 UTC 2013 [20:15:10] arr, whatever
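[For context on where a counter like SndbufErrors comes from: a plugin such as the udp_stats.py mentioned above would typically read the kernel's whole-host UDP counters out of /proc/net/snmp, which carries a header line and a values line for each protocol. A sketch of that parse — not the actual module:]

    def read_udp_counters():
        """Parse the two 'Udp:' lines of /proc/net/snmp into a dict,
        e.g. {'InDatagrams': ..., 'InErrors': ..., 'SndbufErrors': ...}.
        These values are monotonically increasing counters since boot."""
        with open('/proc/net/snmp') as f:
            lines = [l.split() for l in f if l.startswith('Udp:')]
        header, values = lines[0][1:], lines[1][1:]
        return dict(zip(header, (int(v) for v in values)))

    print(read_udp_counters())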
[20:15:22] ottomata: yep, i need to fix it. just a moment [20:17:33] ooook, cool [20:17:33] danke [20:17:41] check out udp2log_socket.py as well then [20:17:48] i think it has the same need [20:20:24] well, ok, so let's think about this. I think it's not strictly-speaking incorrect. you're right that it just reports the value. but it's not a very good fit for the metric. [20:20:55] because suppose there's some weird udp issue tonight that causes 100k missed packets. your graph will just permanently rise by 100k [20:21:04] well, until you reboot or whatever [20:21:19] right [20:21:30] New review: Yurik; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52606 [20:21:33] since these are just counters, i think the delta should just be reported, right? [20:21:40] the ganglia docs are absolutely terrible on this particular point, but I think counter type should be 'positive' [20:21:43] that should be the only fix you need [20:22:02] New patchset: Dzahn; "old key of jgonera: ensure => absent" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56638 [20:23:02] i'll make a commit [20:23:09] i can try real quick to test first, if you want [20:23:12] got it on a labs instance [20:24:03] naw, i'm sure this is right [20:24:13] ok cool [20:24:15] dooo oit [20:24:44] i love the ganglia docs (http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_gmond_python_modules): [20:24:46] "The exact nature of this element is unclear, as is its relationship to the 'collect_every' configuration directive in your pyconf for the module. For all intents and purposes, this element seems... useless. " [20:26:49] New patchset: Ori.livneh; "Change slope to "positive" for UDP metrics." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56639 [20:27:09] haha [20:27:36] hmm, not all are counters though [20:27:42] udp2log ReceiveQueue varies [20:27:55] it should probably be an exception, right? [20:27:56] New patchset: Krinkle; "gerrit: Change repeat from 8-40 to 7-40 for hash in commentlink" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56640 [20:28:11] oh, that's a good point [20:30:13] so let's see [20:30:43] so udp2log_socket.py was actually correct [20:30:48] because it already special-cases it [20:30:54] oh [20:30:56] ok cool [20:31:09] er, wait [20:31:13] nice yeah i see [20:31:16] drops is positive [20:31:17] that's right [20:31:23] no wait [20:31:34] New review: Dzahn; "RT-4854" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/56638 [20:31:35] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56638 [20:31:36] yeha [20:31:37] yeah [20:31:44] rx_queue and tx_queue are not just counters, they vary [20:31:57] they are set to 'both', and drops is special-cased to 'positive' [20:32:38] New patchset: Ori.livneh; "Change slope to "positive" for UDP metrics." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56639 [20:35:43] cool, danke, merging [20:35:51] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56639 [20:36:47] fyi: puppet started mongodb on stat1 [20:37:04] hipster puppet [20:37:51] mutante, it was running before, no? [20:39:48] ottomata: hmm, it looked like no: ensure changed 'stopped' to 'running' [20:40:12] i just ran puppet for an unrelated thing, ensure an old key is absent and noticed that [20:40:39] ori-l: PBRuppet ?
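[What the slope fix amounts to inside a gmond python module: with slope 'positive', ganglia graphs a raw counter's rate of change rather than its ever-growing absolute value, while queue depths that rise and fall keep 'both'. A trimmed sketch of the descriptor shape — metric names and values here are illustrative, not the actual udp2log_socket.py.]

    def metric_handler(name):
        # Stub: a real module would read the current value for `name`
        # from /proc here (see the /proc sketches earlier in the log).
        return 0

    def metric_init(params):
        common = {
            'call_back': metric_handler,
            'time_max': 60,
            'value_type': 'uint',
            'units': 'packets',
            'format': '%d',
            'description': 'udp metric',
            'groups': 'udp',
        }
        descriptors = []
        # Raw kernel counters only ever increase: 'positive' makes ganglia
        # store and graph the delta per second instead of the total.
        for name in ('InErrors', 'SndbufErrors', 'drops'):
            descriptors.append(dict(common, name=name, slope='positive'))
        # Queue depths vary up and down, so they stay slope 'both'.
        for name in ('rx_queue', 'tx_queue'):
            descriptors.append(dict(common, name=name, slope='both',
                                    units='bytes'))
        return descriptors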
[20:41:09] hm, maybe it's one of those where puppet doesn't know how to tell if the proc is running [20:41:40] oh yea, possible [20:42:46] yeah [20:42:50] i just ran puppet and it said the same [20:42:56] udp2log does that too :/ [20:43:56] gotcha, ok [20:46:10] yeah, the upstart config is wrong [20:46:37] doesn't know about the pid file specified in /etc/mongodb.conf (which is /var/run/mongodb/mongod.pid) [20:54:18] New patchset: Ottomata; "Re-enabling htaccess on metrics.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56691 [20:55:14] ori-l hmm, seems like positive didn't change anything [20:55:20] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=oxygen.wikimedia.org&v=390342&m=SndbufErrors&r=hour&z=default&jr=&js=&st=1364583635&vl=packets&ti=UDP%20Send%20Buffer%20Errors&z=large [20:56:23] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56691 [20:56:27] ottomata: give it time. i don't recall specifics but ganglia isn't very graceful about handling metric changes [20:57:08] hm ok [20:57:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:00:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [21:02:03] !log jenkins: raised the number of executors for deployment-bastion from 2 to 4. Will let us execute more jobs in parallel [21:02:09] Logged the message, Master [21:02:59] New patchset: Ori.livneh; "Puppetize supervisor configs for EventLogging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56692 [21:04:53] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52606 [21:05:55] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [21:07:05] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [21:07:35] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 25567 MB (3% inode=99%): [21:10:01] New review: PleaseStand; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56408 [21:10:24] binasher: thanks for merging https://gerrit.wikimedia.org/r/52606! that will find its way into prod sometime in the next 30 minutes or so, correct? [21:13:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:14:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [21:20:45] binasher, hi, could you review the other two patches for the same file please? It seems there are no objections to them either. [21:21:00] https://gerrit.wikimedia.org/r/#/c/56333/ (a few more acls) and https://gerrit.wikimedia.org/r/#/c/55302/ [21:21:34] New patchset: Ori.livneh; "upstart: pass '--pid-file' param to udp2log" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56693 [21:21:40] ^ ottomata [21:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:03] coool! thanks
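[The mongodb and udp2log symptoms above are the same underlying problem: if the init/upstart job doesn't know the daemon's real pid file, the "is it running?" status check fails and puppet keeps "starting" a running service. The usual liveness check, sketched in Python (path is the mongodb one quoted above):]

    import errno
    import os

    def pidfile_running(path='/var/run/mongodb/mongod.pid'):
        """True if the pid recorded in the pid file is a live process."""
        try:
            with open(path) as f:
                pid = int(f.read().strip())
        except (IOError, ValueError):
            return False          # no pid file, or garbage in it
        try:
            os.kill(pid, 0)       # signal 0: existence check, sends nothing
        except OSError as e:
            # EPERM means the process exists but belongs to another user.
            return e.errno == errno.EPERM
        return True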
[21:23:08] FYI, init.d != upstart :p [21:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [21:24:15] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56693 [21:24:43] er, right [21:24:46] binasher, ok to merge the X-Analytics stuff on sockpuppet? [21:28:21] awjr, the X-Analytics change isn't yet on the puppetmaster [21:28:27] i was about to merge another change and I see it waiting there [21:28:38] I don't want to touch it unless I get a real ok from binasher though :/ [21:28:45] ah ok, thanks for the heads-up [21:28:52] i didn't realize there was a separate step there [21:29:10] ottomata: can you ping me if/when that goes out? [21:29:30] sure, it will still take some minutes after it is merged in, for puppet to run on varnishes, so I can't give you the exact timing [21:29:38] for sure [21:29:49] brb [21:30:31] New patchset: awjrichards; "Enable X-Analytics logging in MobileFrontend" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56696 [21:31:10] New patchset: awjrichards; "Enable X-Analytics logging in MobileFrontend" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56696 [21:32:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:33:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time [21:36:38] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54970 [21:39:17] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [21:39:17] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [21:39:17] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [21:50:39] anybody here know how this was made? [21:50:39] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=swift+frontend+proxies [21:50:45] I don't see a view .json file on nickel [21:50:59] so I imagine it was created with ganglia web gui somehow [21:51:01] I want to make one too! [22:05:18] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [22:07:28] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [22:07:58] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 25137 MB (3% inode=99%): [22:10:01] New patchset: awjrichards; "Support client-side caching for mobile" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56701 [22:11:08] PROBLEM - Puppet freshness on cp3010 is CRITICAL: Puppet has not run in the last 10 hours [22:18:36] heya, Ryan_Lane, where is ganglia.wmflabs.org hosted? [22:18:47] in the ganglia project [22:18:51] on aggregator1 [22:19:13] coool, i'm working on a ganglia::view define to more easily create ganglia views [22:19:18] aggregator2 is obviously the 2nd one :) [22:19:37] ganglia views? [22:19:38] hm, can you add me to the ganglia project? [22:19:40] yeah, like [22:19:45] yeah [22:19:45] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=swift+frontend+proxies [22:20:02] note that aggregator1 and aggregator2 are now too small and occasionally OOM [22:20:08] it's basically just json defining which hosts and metrics to show [22:20:10] hmm, k [22:20:15] ah. cool.
cool [22:20:18] http://sourceforge.net/apps/trac/ganglia/wiki/ganglia-web-2#Views [22:20:20] that's awesome [22:22:38] ja cool, just wanna test it in labs first, easier if I can test it on a running ganglia web instance, lemme know when i'm in the project [22:22:59] New review: MaxSem; "This overrides explicit no cache instructions from MW when it's really needed, e.g. on special pages." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56701 [22:23:08] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [22:23:26] what's your username? [22:23:37] ah Ottomata [22:23:45] yup [22:24:08] done [22:24:34] yay danke [22:25:37] !log reedy synchronized php-1.21wmf12/extensions/Wikibase [22:25:44] Logged the message, Master [22:33:55] Change abandoned: awjrichards; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56701 [22:36:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:36:53] Change abandoned: Jeremyb; "I2a9fbe5f7522ba9fed64415b5f7b230ee50cfc23 was merged instead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56604 [22:37:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [22:38:25] !log upgrading packages on aluminium [22:38:32] Logged the message, Mistress of the network gear. [22:52:05] no krinkle, no hashar :( [22:56:44] ganglia docs are so bad! [22:58:47] New patchset: QChris; "Reference used images by absolute path in gerrit's css" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56705 [22:59:12] New patchset: QChris; "Style gerrit's LDAP login page according to other gerrit pages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56706 [23:06:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [23:07:21] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [23:07:51] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 25704 MB (3% inode=99%): [23:10:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [23:16:49] New review: QChris; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56640 [23:36:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:37:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [23:37:44] New review: Brion VIBBER; "Ok, this version looks fine to me. Faidon etc, what say you?" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/55302 [23:44:41] hrmm, zirconium has a higher load on the hour for its updates [23:44:50] i wonder if it could handle being our irc server for eqiad. [23:45:02] i guess i can do the includes and just not change dns and see what load is [23:45:18] (on monday, pushing anything this late on friday is horrible idea) [23:45:36] better than pushing it *after* we start drinking [23:45:39] it also technically should be moved into a role rather than just a misc class i suppsoe [23:45:50] i would have started, there are no proper tumblers for scotch [23:45:59] im a civilized man damn it. 
[23:46:18] drinking booze out of coffee cups is for the plebes! [23:46:31] we should bring in some proper glasses [23:49:32] brion: http://www.amazon.com/Sagaform-Rocking-Whiskey-Glasses-4-Ounces/dp/B001JANQRY/ref=sr_1_4?ie=UTF8&qid=1364600951&sr=8-4&keywords=scotch+tumblers [23:51:03] !log stopping mailman to remove sensitive message [23:51:09] Logged the message, Mistress of the network gear. [23:54:24] oh lord, removing messages from mailman [23:54:31] we have GOT to replace mailman with something from this century [23:56:48] brion: NEVA [23:56:54] mailman 4 life yo [23:57:08] we are also going to migrate away from google apps [23:57:20] and back to sendmail [23:57:21] and dns back to bind? [23:57:21] good old sendmail. [23:57:29] nah, who needs FQDN [23:57:50] we'll just use IP addresses