[00:02:40] New review: Reedy; "It's not needed, due to the following line in MessagesPl.php" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47348 [00:07:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 00:07:45 UTC 2013 [00:08:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [00:08:41] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 00:08:36 UTC 2013 [00:08:49] New patchset: Andrew Bogott; "Update changelog for 'Fix adminbot's postrm'" [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47376 [00:09:09] Change merged: Andrew Bogott; [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47376 [00:09:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [00:09:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 00:09:37 UTC 2013 [00:10:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [00:13:50] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [00:22:32] New review: Leinad; "Yes, it's needed." 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/47348 [00:22:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [00:26:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [00:28:31] New review: Reedy; "Ah, because the Wikipedia part changes, not the "talk" part" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47348 [00:37:08] New patchset: Reedy; "Bug 39652 - Add "autoreviewer" to $wgRestrictionLevels on ptwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47377 [00:37:48] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47377 [00:38:22] !log reedy synchronized wmf-config/InitialiseSettings.php [00:38:24] Logged the message, Master [00:39:29] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47351 [00:40:02] New patchset: Reedy; "(bug 44412) Enable NewUserMessage extension on ptwiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47359 [00:40:08] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47359 [00:40:17] New patchset: Reedy; "(bug 43524) Allow itwikivoyage sysops to upload locally" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47361 [00:40:22] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47361 [00:40:35] New patchset: Reedy; "(bug 44615) Add Index namespace to idwiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47369 [00:40:40] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47369 [00:40:51] New patchset: Reedy; "(bug 43465) Only the main namespace should be searched by default on enwikivoyage" [operations/mediawiki-config] (master) - 
https://gerrit.wikimedia.org/r/47367 [00:40:57] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47367 [00:41:09] New patchset: Reedy; "(bug 44048) Create patroller group on enwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47358 [00:41:14] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47358 [00:42:45] !log reedy synchronized wmf-config/InitialiseSettings.php [00:42:47] Logged the message, Master [00:51:10] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [01:20:10] PROBLEM - Varnish traffic logger on cp1031 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:20:21] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:21:21] PROBLEM - Varnish traffic logger on cp1024 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:22:28] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:22:38] PROBLEM - Varnish traffic logger on cp1031 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:23:20] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:23:49] PROBLEM - Varnish traffic logger on cp1024 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:24:20] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:25:19] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:27:07] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:32:22] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa 
[01:32:24] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa [01:34:54] RECOVERY - Varnish traffic logger on cp1034 is OK: PROCS OK: 3 processes with command name varnishncsa [01:35:14] RECOVERY - Varnish traffic logger on cp1034 is OK: PROCS OK: 3 processes with command name varnishncsa [01:37:25] RECOVERY - Varnish traffic logger on cp1024 is OK: PROCS OK: 3 processes with command name varnishncsa [01:38:21] RECOVERY - Varnish traffic logger on cp1024 is OK: PROCS OK: 3 processes with command name varnishncsa [01:41:54] RECOVERY - Varnish traffic logger on cp1031 is OK: PROCS OK: 3 processes with command name varnishncsa [01:43:00] RECOVERY - Varnish traffic logger on cp1031 is OK: PROCS OK: 3 processes with command name varnishncsa [01:46:24] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [01:47:21] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [02:00:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [02:25:58] !log LocalisationUpdate completed (1.21wmf8) at Mon Feb 4 02:25:58 UTC 2013 [02:26:01] Logged the message, Master [02:36:27] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 183 seconds [02:36:38] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 185 seconds [02:37:12] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 190 seconds [02:37:39] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 197 seconds [03:08:50] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [03:11:40] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [03:12:21] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [03:12:36] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK 
replication delay 0 seconds [03:12:37] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [04:19:02] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [04:39:46] New patchset: saper; "Bug 44414 - Create Wikivoyage Polish" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47348 [04:40:27] New review: saper; "Added NS_PROJECT_TALK per CR" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/47348 [04:42:32] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [04:46:22] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 199 seconds [04:46:23] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 200 seconds [04:47:02] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 206 seconds [04:47:11] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 208 seconds [05:07:52] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [05:20:02] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [05:20:20] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [05:20:21] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [05:20:30] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [05:24:10] New review: MZMcBride; "There are no performance issues with this?" 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/47353 [05:54:21] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [05:54:22] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [05:54:22] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [05:54:22] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [05:54:22] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [05:56:19] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [06:01:32] New patchset: saper; "Refactor mail-instance-creator.py puppetsigner.py" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47385 [06:16:10] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [06:28:45] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 06:28:41 UTC 2013 [06:28:45] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [06:29:05] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 06:28:56 UTC 2013 [06:29:45] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [06:32:45] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 06:32:36 UTC 2013 [06:32:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [07:35:17] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [07:36:01] PROBLEM - Host labstore2 is DOWN: PING CRITICAL - Packet loss = 100% [07:36:42] RECOVERY - Host labstore2 is UP: PING OK - Packet loss = 0%, RTA = 26.83 ms [07:37:09] !log upgrading labstore2 to precise [07:37:12] Logged the message, Master [07:47:59] New patchset: Ryan Lane; "Set limit via upstart stanza for gluster" [operations/puppet] 
(production) - https://gerrit.wikimedia.org/r/47389 [07:49:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47389 [07:51:41] PROBLEM - Host labstore2 is DOWN: PING CRITICAL - Packet loss = 100% [07:52:37] PROBLEM - Host labstore2 is DOWN: PING CRITICAL - Packet loss = 100% [07:53:41] RECOVERY - Host labstore2 is UP: PING OK - Packet loss = 0%, RTA = 27.15 ms [07:54:07] RECOVERY - Host labstore2 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [08:02:31] PROBLEM - SSH on labstore2 is CRITICAL: Connection refused [08:04:19] PROBLEM - SSH on labstore2 is CRITICAL: Connection refused [08:04:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [08:07:36] RECOVERY - SSH on labstore2 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [08:07:46] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 08:07:37 UTC 2013 [08:07:55] RECOVERY - SSH on labstore2 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [08:08:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [08:10:37] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [08:16:35] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [08:42:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Mon Feb 4 08:42:06 UTC 2013 [08:42:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [08:44:40] PROBLEM - swift-container-updater on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:44:41] PROBLEM - swift-container-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:05] PROBLEM - swift-account-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:05] PROBLEM - swift-object-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:46:06] PROBLEM - swift-account-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:10] PROBLEM - swift-object-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:11] PROBLEM - swift-object-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:19] PROBLEM - swift-object-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:19] PROBLEM - swift-container-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:45] PROBLEM - swift-account-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:46] PROBLEM - swift-container-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:46] PROBLEM - swift-object-updater on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:55] PROBLEM - swift-object-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:55] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:55] PROBLEM - swift-container-updater on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:55] PROBLEM - swift-object-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:46:55] PROBLEM - swift-container-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:48:16] PROBLEM - swift-account-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:48:34] PROBLEM - swift-account-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:48:43] PROBLEM - swift-object-updater on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:48:57] RECOVERY - swift-object-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [08:48:58] RECOVERY - swift-account-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [08:48:58] RECOVERY - swift-account-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [08:49:01] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:49:37] RECOVERY - swift-object-server on ms-be5 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [08:49:37] RECOVERY - swift-object-auditor on ms-be5 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [08:49:39] RECOVERY - swift-account-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [08:49:39] RECOVERY - swift-container-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [08:49:39] RECOVERY - swift-object-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [08:49:46] RECOVERY - swift-container-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [08:49:46] RECOVERY - swift-object-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [08:49:48] RECOVERY - swift-object-server on ms-be5 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [08:49:48] RECOVERY - swift-object-auditor on ms-be5 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [08:49:48] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args 
^/usr/bin/python /usr/bin/swift-container-auditor [08:49:48] RECOVERY - swift-container-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [08:49:48] RECOVERY - swift-container-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [08:49:55] RECOVERY - swift-account-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [08:49:55] RECOVERY - swift-container-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [08:49:56] RECOVERY - swift-container-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [08:50:14] RECOVERY - swift-account-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [08:50:22] RECOVERY - swift-object-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [08:50:40] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:28:56] Change abandoned: Hashar; "Superseded by the --verbose change Tim proposed in Ic6db1d8a" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42970 [09:42:27] New patchset: MaxSem; "WIP: advanced Solr monitoring script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47111 [09:55:14] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 182 seconds [09:55:45] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 197 seconds [09:56:32] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 220 seconds [09:56:41] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 223 seconds [09:57:33] PROBLEM - Puppet freshness on ms2 is 
CRITICAL: Puppet has not run in the last 10 hours
[09:59:41] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[10:00:44] New review: Hashar; "getopt -l might not always be available (ex on Mac OS X). I proposed a simpler argument handler." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/46907
[10:02:12] apergos: if you got any minutes to spare, I got some trivial changes for you to merge in :-]
[10:03:35] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 29 seconds
[10:03:48] New patchset: ArielGlenn; "tool for converting MW xml files to page/rev/txt sql files, initial commit" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/47390
[10:03:58] sure
[10:04:13] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[10:04:14] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[10:04:30] all related to the contint server gallium
[10:04:41] install the pylint package : https://gerrit.wikimedia.org/r/#/c/46466/
[10:04:53] mercurial package: https://gerrit.wikimedia.org/r/#/c/46931/
[10:05:14] enabling an apache site https://gerrit.wikimedia.org/r/#/c/47365/
[10:05:14] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[10:05:17] that is all
[10:05:27] I already have installed the two packages manually
[10:05:41] and tested the site enabling manually :-]
[10:05:47] I can run puppet on gallium by myself :-]
[10:05:48] ok good
[10:05:52] oh yeah :-D
[10:06:00] I was gonna do the whole thing but right :-)
[10:06:15] yeah simply merge on sock puppet and I do the guinea pig for the rest hehe
[10:06:22] uh huh
[10:07:15] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/46466
[10:07:38] PROBLEM - Puppet freshness on mw1128 is CRITICAL: Puppet has not run in the last 10 hours
[10:08:22] why do we need mercurial anyways?
[10:08:39] oh
[10:08:53] Chad needs it to build some Java package
[10:09:03] I am not entirely sure which one though
[10:09:08] gah really?
[10:09:20] *sigh*
[10:09:44] most probably a plugin for Gerrit
[10:10:03] PROBLEM - Puppet freshness on mw1128 is CRITICAL: Puppet has not run in the last 10 hours
[10:10:10] so we're building out of trunk?
[10:10:41] oh yeah
[10:10:44] I see what you mean
[10:10:46] mmm
[10:10:59] why doesn't everybody in the universe use Fossil instead of that lame army of crappy VCSes?
[10:11:18] I'm going to skip that change for right now
[10:11:47] will ask Chad about it
[10:11:53] ok cool
[10:12:25] New review: Hashar; "Chad do you remember which Java package it was for ? We might not want to build from a third party." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/46931
[10:13:46] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47365
[10:14:03] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[10:14:34] reconnecting
[10:16:11] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Unknown function apache_site at /var/lib/git/operations/puppet/manifests/misc/contint.pp:338 on node gallium.wikimedia.org
[10:16:11] bahhh
[10:16:32] New review: ArielGlenn; "alpha code is alpha" [operations/dumps] (ariel); V: 0 C: 2; - https://gerrit.wikimedia.org/r/47390
[10:16:55] New review: ArielGlenn; "alpha code is alpha" [operations/dumps] (ariel); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47390
[10:16:55] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/47390
[10:20:32] New patchset: Hashar; "contint: fix apache_site definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47391
[10:20:42] New review: Hashar; "apache_site is not a function, fixed by https://gerrit.wikimedia.org/r/47391" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47365
[10:20:59] apergos: sorry did a mistake in a previous patch https://gerrit.wikimedia.org/r/#/c/47391/ :-D
[10:21:11] apergos: apache_site is not a function
[10:21:37] grrr, of course I did not check that
[10:21:46] puppet did for us :-]
[10:21:51] * apergos should never assume the other guy knows what they are doing :-P
[10:23:03] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours
[10:23:13] is it allowed (or a good idea) to have spaces in the name like that?
[10:23:36] yeah did that multiple times already
[10:23:46] so you can write something like:
[10:23:57] mount { "better mount vdb on labs": …. }
[10:24:03] which would let you later reference it as:
[10:24:14] require => Mount["better mount vdb on labs"]
[10:24:22] generally sure
[10:24:29] I mean for the specific function
[10:25:56] I guess it's ok (having looked at it now)
[10:26:11] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47391
[10:26:22] we will see :-D
[10:26:38] yep, we will
[10:28:00] that works
[10:28:02] !!
[10:28:25] good
[10:28:32] notice: /Stage[main]/Misc::Docs::Puppet/Exec[generate puppet docsite]/returns: executed successfully
[10:28:33] bah
[10:28:46] I need to migrate that doc generation to Jenkins
[10:30:11] apergos: thanks!
:-] [10:30:28] sure [10:30:29] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [10:35:24] and finally got my monthly report entry in *whew* [10:52:01] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [10:53:00] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [10:59:31] New patchset: Dereckson; "(bug 44634) Enable micro design on ml.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47393 [11:07:12] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [11:09:34] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47393 [11:10:37] New patchset: Hashar; "(Bug 41350) The translation of Wikipedia is said to be wrong on pa.wikipedia.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29792 [11:11:03] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29792 [11:11:53] deploying that [11:14:33] New review: Hashar; "deployed live" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29792 [11:14:34] !log hashar synchronized wmf-config/InitialiseSettings.php 'cbd432f..82ed797 {{gerrit|47393}} Vector micro design on mlwiki, {{gerrit|29792}} wikipedia translation on pawiki' [11:14:35] New review: Hashar; "deployed live" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47393 [11:14:35] Logged the message, Master [11:33:43] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100% [11:34:49] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100% [11:38:32] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 26.62 ms [11:38:37] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms [11:39:49] PROBLEM - NTP on labstore4 is CRITICAL: NTP CRITICAL: Offset 
unknown [11:42:14] PROBLEM - NTP on labstore4 is CRITICAL: NTP CRITICAL: Offset unknown [11:58:23] RECOVERY - NTP on labstore4 is OK: NTP OK: Offset -0.08374726772 secs [12:00:23] PROBLEM - SSH on labstore4 is CRITICAL: Connection refused [12:03:04] PROBLEM - SSH on labstore4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:04:23] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100% [12:09:04] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100% [12:11:23] RECOVERY - SSH on labstore4 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [12:11:33] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [12:11:55] RECOVERY - SSH on labstore4 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [12:12:04] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [12:13:53] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Mon Feb 4 12:13:46 UTC 2013 [12:14:01] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Mon Feb 4 12:13:45 UTC 2013 [12:15:23] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 189 seconds [12:15:43] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 195 seconds [12:16:25] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 216 seconds [12:16:34] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 220 seconds [12:19:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [12:19:56] New patchset: ArielGlenn; "update gluster client packagename for dataset1001 gluster mount" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47394 [12:20:53] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47394 [12:21:40] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [12:21:44] RECOVERY - MySQL Slave Delay on db53 is 
OK: OK replication delay 0 seconds [12:21:49] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [12:22:25] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [12:22:52] RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Mon Feb 4 12:22:38 UTC 2013 [12:57:45] New patchset: Hashar; "beta: all wikis to wmf8" [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/47397 [12:58:12] Change merged: jenkins-bot; [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/47397 [13:17:05] New patchset: Hashar; "beta: /usr/local/apache dupe definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/45115 [13:17:10] New patchset: Hashar; "sync mediawiki only in production" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47398 [13:17:44] booo [13:24:26] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 186 seconds [13:24:39] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 190 seconds [13:24:56] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 199 seconds [13:26:00] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 235 seconds [13:29:36] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [13:30:27] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.001 second response time on port 8123 [13:31:24] RECOVERY - Puppet freshness on magnesium is OK: puppet ran at Mon Feb 4 13:31:03 UTC 2013 [13:46:52] New patchset: MaxSem; "rm anti-labs safeguard" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47400 [13:48:53] New review: Hashar; "yeah lets kill beta now!" 
[operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/47400
[13:49:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47400
[13:59:26] PROBLEM - SSH on ms-be3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:59:27] PROBLEM - swift-account-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:27] PROBLEM - swift-object-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:27] PROBLEM - swift-object-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:27] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:36] PROBLEM - swift-container-updater on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:37] PROBLEM - swift-account-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:37] PROBLEM - swift-container-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:37] PROBLEM - swift-object-updater on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:59:56] PROBLEM - swift-container-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:00:06] PROBLEM - swift-account-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:00:06] PROBLEM - swift-object-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:00:48] PROBLEM - swift-container-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:01:06] PROBLEM - SSH on ms-be3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:01:06] PROBLEM - swift-container-updater on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:01:15] PROBLEM - swift-account-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:01:24] PROBLEM - swift-object-updater on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:01:33] PROBLEM - swift-account-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:01:42] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:01:51] PROBLEM - swift-object-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:02:00] PROBLEM - swift-container-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:02:00] PROBLEM - swift-object-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:02:09] PROBLEM - swift-object-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:02:18] PROBLEM - swift-account-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:07:42] !log upgrading labstore3 to precise
[14:07:43] Logged the message, Master
[14:08:36] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[14:10:25] he killed labstore3!!
[14:11:06] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.64 ms
[14:14:36] New patchset: Hashar; "beta: all wiki now uses php-master" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47403
[14:15:02] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47403
[14:17:44] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[14:17:45] PROBLEM - NTP on ms-be3 is CRITICAL: NTP CRITICAL: No response from NTP server
[14:17:53] New patchset: MaxSem; "Add vars from private.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47404
[14:18:03] hashar, ^^^
[14:20:23] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47404
[14:20:28] MaxSem: thx!
[14:27:12] PROBLEM - NTP on ms-be3 is CRITICAL: NTP CRITICAL: No response from NTP server
[14:31:14] PROBLEM - NTP on labstore3 is CRITICAL: NTP CRITICAL: Offset unknown
[14:32:14] RECOVERY - NTP on labstore3 is OK: NTP OK: Offset -0.06643784046 secs
[14:32:15] !log rebooting labstore2
[14:32:16] Logged the message, Master
[14:32:24] !log make that rebooting labstore3
[14:32:25] Logged the message, Master
[14:33:46] !log powercycled ms-be3
[14:33:47] Logged the message, Master
[14:33:49] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[14:35:02] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms
[14:36:12] RECOVERY - swift-object-server on ms-be3 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[14:36:14] RECOVERY - swift-container-updater on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[14:36:15] RECOVERY - swift-account-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[14:36:15] RECOVERY - SSH on ms-be3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[14:36:15] RECOVERY - swift-object-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[14:36:15] RECOVERY - swift-object-server on ms-be3 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[14:36:15] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:36:21] RECOVERY - swift-account-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[14:36:25] RECOVERY - swift-account-server on ms-be3 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[14:36:39] RECOVERY - swift-container-server on ms-be3 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[14:36:44] RECOVERY - swift-container-server on ms-be3 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[14:36:55] RECOVERY - swift-account-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[14:36:55] RECOVERY - swift-object-auditor on ms-be3 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[14:36:55] RECOVERY - swift-container-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[14:36:55] RECOVERY - swift-object-updater on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[14:36:57] RECOVERY - swift-container-updater on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[14:37:06] RECOVERY - swift-account-server on ms-be3 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[14:37:15] RECOVERY - swift-object-updater on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[14:37:24] RECOVERY - swift-account-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[14:37:33] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:37:42] RECOVERY - SSH on ms-be3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[14:37:42] RECOVERY - swift-object-auditor on ms-be3 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[14:37:51] RECOVERY - swift-object-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[14:37:51] RECOVERY - swift-container-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[14:39:45] RECOVERY - NTP on ms-be3 is OK: NTP OK: Offset 0.0005950927734 secs
[15:03:24] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out