[12:29:47] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[12:29:47] (03PS1) 10BBlack: turn on amssq43-46 esams text varnish backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/139573
[12:29:47] (03CR) 10BBlack: [C: 032 V: 032] turn on amssq43-46 esams text varnish backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/139573 (owner: 10BBlack)
[12:29:48] !log LocalisationUpdate completed (1.24wmf8) at 2014-06-14 02:16:38+00:00
[12:29:48] Logged the message, Master
[12:29:50] !log enabled amssq43-46 frontends (esams text varnish) in pybal
[12:29:50] Logged the message, Master
[12:29:50] !log LocalisationUpdate completed (1.24wmf9) at 2014-06-14 02:36:35+00:00
[12:29:50] Logged the message, Master
[12:29:50] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 14 03:07:14 UTC 2014 (duration 7m 13s)
[12:29:50] Logged the message, Master
[12:29:52] PROBLEM - Disk space on palladium is CRITICAL: DISK CRITICAL - free space: / 1417 MB (3% inode=50%):
[12:29:53] RECOVERY - Disk space on palladium is OK: DISK OK
[12:29:55] (03CR) 10Santhosh: "Not sure what happened when you tested, but my blog feed http://thottingal.in/blog/tag/wikipedia/ is working." [operations/puppet] - 10https://gerrit.wikimedia.org/r/139465 (owner: 10Odder)
[12:29:55] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[12:29:59] does anyone here have the ability to give wm-bot a good kick in the goolies?
[12:29:59] seems to not be listening or talking
[12:29:59] _joe|away: ^^
[12:29:59] https://wikitech.wikimedia.org/wiki/wm-bot
[12:30:09] PROBLEM - LighttpdHTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:30:09] RECOVERY - LighttpdHTTP on dataset1001 is OK: HTTP OK: HTTP/1.1 200 OK - 5122 bytes in 0.332 second response time
[12:30:10] (03PS1) 10Withoutaname: Remove echowikis.dblist [operations/puppet] - 10https://gerrit.wikimedia.org/r/139581
[12:30:10] _joe|away: ^^ or someone who can kick wm-bot
[12:30:10] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[12:30:11] (03CR) 10Withoutaname: [C: 031] "Looks sane so far" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 (owner: 10Nemo bis)
[12:30:16] (03CR) 10Odder: "The domain expired — I'll re-add your blog in a moment." [operations/puppet] - 10https://gerrit.wikimedia.org/r/139465 (owner: 10Odder)
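(For context on the 12:29:50 pybal entry above: pooling amssq43-46 as esams text frontends means adding entries to the corresponding pybal pool file. A minimal sketch follows; the file path, hostnames and weights are illustrative assumptions, not taken from this log.)

    # Sketch only: append pool entries for the new esams text frontends.
    # Path, hostnames and weights are assumptions for illustration.
    sudo tee -a /srv/pybal-config/pybal/esams/text <<'EOF'
    { 'host': 'amssq43.esams.wikimedia.org', 'weight': 10, 'enabled': True }
    { 'host': 'amssq44.esams.wikimedia.org', 'weight': 10, 'enabled': True }
    { 'host': 'amssq45.esams.wikimedia.org', 'weight': 10, 'enabled': True }
    { 'host': 'amssq46.esams.wikimedia.org', 'weight': 10, 'enabled': True }
    EOF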
[12:30:16] (03PS1) 10Odder: Re-add Santhosh Thottingal's blog to English Planet [operations/puppet] - 10https://gerrit.wikimedia.org/r/139586
[12:30:18] (03CR) 10Santhosh: [C: 031] Re-add Santhosh Thottingal's blog to English Planet [operations/puppet] - 10https://gerrit.wikimedia.org/r/139586 (owner: 10Odder)
[12:30:19] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[12:30:21] (03CR) 10Nemo bis: Dead Blogs Are Dead, Part II (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139465 (owner: 10Odder)
[13:26:42] (03CR) 10SPQRobin: [C: 031] Gather all soft-disabled uploads wikis in one config item [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 (owner: 10Nemo bis)
[14:08:40] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[15:13:30] PROBLEM - Disk space on palladium is CRITICAL: DISK CRITICAL - free space: / 1422 MB (3% inode=50%):
[15:25:26] (03PS2) 10Alex Monk: Remove echowikis.dblist [operations/puppet] - 10https://gerrit.wikimedia.org/r/139581 (owner: 10Withoutaname)
[15:29:17] (03CR) 10Alex Monk: "What happens when this is run on wikis where Echo is disabled?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/139581 (owner: 10Withoutaname)
[15:54:53] what version of graphite are we at currently?
[16:27:30] RECOVERY - Disk space on palladium is OK: DISK OK
[17:09:40] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[18:46:56] !log rebooting ms-be1001, XFS: Internal error XFS_WANT_CORRUPTED_RETURN, lots of processes in D
[18:47:01] Logged the message, Master
[18:48:50] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100%
[18:53:20] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms
[19:00:20] PROBLEM - swift-container-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[19:00:31] PROBLEM - swift-object-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[19:00:31] PROBLEM - swift-account-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[19:00:31] PROBLEM - swift-container-updater on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[19:00:31] PROBLEM - swift-object-updater on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[19:00:31] PROBLEM - swift-account-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[19:00:32] PROBLEM - swift-object-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[19:00:32] PROBLEM - swift-account-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[19:01:00] PROBLEM - swift-account-reaper on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:01:00] PROBLEM - swift-container-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:01:10] PROBLEM - swift-container-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
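(For context on the swift alerts above: they come from a Nagios-style process check that counts processes matching a full argument string. A minimal sketch of the kind of check_procs invocation that yields "PROCS CRITICAL: 0 processes with regex args ..." follows; the plugin path and threshold are assumptions, not read from the production config.)

    # Sketch: go critical when fewer than 1 matching process is running.
    # Plugin path and threshold are illustrative assumptions.
    /usr/lib/nagios/plugins/check_procs -c 1: \
        --ereg-argument-array='^/usr/bin/python /usr/bin/swift-container-server'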
[19:05:50] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100%
[19:07:00] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[19:07:00] RECOVERY - swift-account-reaper on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:07:00] RECOVERY - swift-container-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:07:10] RECOVERY - swift-container-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[19:07:20] RECOVERY - swift-container-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[19:07:30] RECOVERY - swift-object-auditor on ms-be1001 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[19:07:30] RECOVERY - swift-account-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[19:07:30] RECOVERY - swift-account-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[19:07:30] RECOVERY - swift-object-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[19:07:30] RECOVERY - swift-object-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[19:07:31] RECOVERY - swift-account-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[19:07:31] RECOVERY - swift-container-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[19:11:10] PROBLEM - swift-container-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[19:11:20] PROBLEM - swift-container-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[19:11:30] PROBLEM - swift-object-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[19:11:30] PROBLEM - swift-account-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[19:11:30] PROBLEM - swift-account-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[19:11:30] PROBLEM - swift-object-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[19:11:30] PROBLEM - swift-container-updater on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[19:11:31] PROBLEM - swift-account-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[19:11:31] PROBLEM - swift-object-updater on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[19:11:58] uhhh
[19:12:00] PROBLEM - swift-account-reaper on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:12:00] PROBLEM - swift-container-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:12:08] ah, you're here :)
[19:12:24] greg-g: 'paravoid is here, no fear'?
[19:12:29] yep
[19:12:35] :)
[19:12:43] greg-g: :)
[19:13:03] yeah it's me
[19:14:00] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100%
[19:15:30] RECOVERY - swift-object-auditor on ms-be1001 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[19:15:30] RECOVERY - swift-container-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[19:15:30] RECOVERY - swift-account-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[19:15:30] RECOVERY - swift-object-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[19:15:30] RECOVERY - swift-account-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[19:15:31] RECOVERY - swift-account-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[19:15:31] RECOVERY - swift-object-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[19:15:40] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[19:16:00] RECOVERY - swift-account-reaper on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:16:00] RECOVERY - swift-container-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:16:10] RECOVERY - swift-container-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[19:16:20] RECOVERY - swift-container-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[19:19:06] ok, that was sdj1 then
[19:19:20] !log unmounting ms-be1001's sdj1, corrupted filesystem
[19:19:24] Logged the message, Master
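(For context on the sdj1 unmount just logged: a typical triage sequence for an XFS "Internal error XFS_WANT_CORRUPTED_RETURN" on a swift backend is sketched below. The device name comes from the log; the exact commands are illustrative assumptions, not a record of what was actually run.)

    # Sketch: confirm which device XFS is complaining about, take it offline,
    # then dry-run a repair (-n checks only, without modifying the filesystem).
    dmesg | grep -i xfs
    umount /dev/sdj1
    xfs_repair -n /dev/sdj1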
[20:03:14] (03CR) 10Withoutaname: "Bsitu, https://gerrit.wikimedia.org/r/#/c/139581/" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139326 (owner: 10Withoutaname)
[20:10:37] !log ran "delete from ep_users_per_course where upc_user_id=0 limit 1" on enwiki for bug 66624
[20:10:40] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[20:10:41] Logged the message, Master
[20:24:48] !log ran "delete from ep_students where student_user_id =0 limit 1;" on enwiki for bug 66624
[20:24:51] Logged the message, Master
[21:59:42] There's reports at commons that video scalers, are no longer scaling videos, and their ganglia graphs do look like they stopped doing things: http://ganglia.wikimedia.org/latest/?r=custom&cs=06%2F12%2F2014+00%3A00+&ce=&m=cpu_report&s=by+name&c=Video+scalers+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4
[22:00:30] anyone around who could maybe poke them?
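(For context on the video-scaler report above: one quick way to confirm whether the transcode queue is backing up is MediaWiki's showJobs.php maintenance script, run through the mwscript wrapper. The choice of commonswiki here is an illustrative assumption.)

    # Sketch: show per-type job counts and look for a growing webVideoTranscode backlog.
    mwscript showJobs.php --wiki=commonswiki --group | grep webVideoTranscode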
[22:07:40] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Last successful Puppet run was Sat 14 Jun 2014 19:07:17 UTC
[22:24:55] bawolff: worth a !log at least
[22:25:21] I filed an RT ticket
[22:25:47] * bawolff feels nervous invoking the !log magic
[22:25:57] as if its something I'm not supposed to touch
[22:27:16] !log video scalers seem to have stopped doing webVideoTranscode jobs
[22:27:22] Logged the message, Master
[22:28:03] bawolff: See? That wasn't so bad. People aren't killing you (yet) :p
[22:28:21] :P
[23:11:40] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Fri 13 Jun 2014 20:03:25 UTC
[23:17:40] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 14 Jun 2014 20:17:19 UTC
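(For context on the recurring Puppet freshness alerts on stat1003, ms-be1001 and db1009: the usual first step is to check when the agent last ran on the affected host and then re-run it by hand to surface the failure. A minimal sketch, assuming a Puppet 3 agent layout.)

    # Sketch: when the agent last ran and with how many failures, then a one-off verbose run.
    sudo cat /var/lib/puppet/state/last_run_summary.yaml
    sudo puppet agent --test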