[01:23:58] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[01:28:01] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[01:40:37] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 227 seconds
[01:41:14] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 264 seconds
[01:48:16] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 688s
[01:51:52] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[01:52:46] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 45s
[01:52:46] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 1 seconds
[02:24:44] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[04:08:29] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[04:17:57] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[04:18:33] RECOVERY - swift-object-auditor on ms-be8 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[04:23:48] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[04:40:27] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[04:54:33] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[05:53:34] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[06:29:26] jesusaurus: Did you get the help you needed?
[06:29:36] There's an operations git repo somewhere.
[06:29:55] I assume Roan or someone showed you, but if you still need help, feel free to ask in here or in #wikimedia-tech.
[07:06:34] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[07:17:44] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[07:18:23] Brooke: thanks, it's less needing technical help, and more not wanting to step on other people's toes as i change all the things
[07:18:57] Will there be breaking changes or can things be migrated incrementally?
[07:19:10] Obviously the latter is preferable unless there's a really high level of trust.
[07:19:12] so i was hoping to get access to push to some sort of testing branch, however you have your workflow set up, so that others can see how i am changing the repo as i go from module to module
[07:20:08] You found the operations repo, I assume?
[07:20:34] things can definitely be changed incrementally, but i need to know more about how the puppet master and agents are configured to know if things will break
[07:20:49] It's all public code.
[07:21:28] well, roan gave me a URI to the git repo, but that doesn't necessarily include the puppet.conf file on the master or agents
[07:22:17] Someone in here should be able to help.
[07:22:17] and if they don't have a path set for modules, and are only looking for the manifests directory, then things will break
[07:22:21] When it's not the weekend.
[07:23:00] right, and with wikimania coming up, i'm expecting it to be a couple weeks before i'm actually doing any real work
[07:26:44] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[07:39:47] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[08:51:22] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[09:34:59] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[10:59:08] seems like european squids have a hard time with images https://bugzilla.wikimedia.org/show_bug.cgi?id=38242
[11:24:35] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[11:28:38] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[12:12:05] ms6 is the culprit: http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=&c=Miscellaneous+esams&h=ms6.esams.wikimedia.org&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS
[12:14:06] out of space on root, trying to reclaim some by removing HTCPpurger.log.1
[12:16:47] apergos:
[12:16:56] https://bugzilla.wikimedia.org/38242
[12:17:22] !log removed HTCPpurger.log.1 and current log after restart of purger on ms6, / was full. people reporting thumb issues from europe
[12:17:32] Logged the message, Master
[12:24:57] apergos: no nagios complaints on ms6?
[12:25:20] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[12:49:38] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[12:54:35] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[12:56:59] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[12:56:59] RECOVERY - swift-object-auditor on ms-be8 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[12:59:14] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[13:01:20] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[14:41:24] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[14:55:21] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[15:40:33] woosters (or anyone else): Who would be a good person to ask for an interview regarding general performance/"uptime" issues for the Signpost Technology Report? [apologies for crossposting]
[15:54:34] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[17:03:51] New patchset: Alex Monk; "(bug 38247) Add WP and WT namespace aliases to ilowiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14678
[17:07:46] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[17:18:43] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[17:27:52] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[17:41:03] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[18:52:20] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[19:36:26] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[20:07:57] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:11:48] New patchset: Andrew Bogott; "Add an ext3 cisco partman recipe." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14743
[20:12:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14743
[20:12:43] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14743
[20:24:54] PROBLEM - Host virt1001 is DOWN: PING CRITICAL - Packet loss = 100%
[20:30:27] RECOVERY - Host virt1001 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms
[20:31:30] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:34:21] PROBLEM - SSH on virt1001 is CRITICAL: Connection refused
[20:43:43] New patchset: Krinkle; "bits/robots.txt: Remove commented out rules, re-instate noindex." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14744
[20:50:02] What is todo.dblist ?
[20:50:49] ^demon: do you know?
[20:51:01] <^demon> not a clue
[20:51:02] seems like a semi-random list of wikis to me
[20:51:13] <^demon> could just be a list of wikis someone made at some point.
[20:51:54] PROBLEM - NTP on virt1001 is CRITICAL: NTP CRITICAL: No response from NTP server
[20:52:34] New patchset: Krinkle; "(bug 34370) Remove chwikimedia from SiteMatrix" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14746
[21:14:38] New patchset: Krinkle; "Remove broken docroot/foundation/extract.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14747
[21:17:18] New patchset: Krinkle; "Remove old bits/test files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14748
[21:20:38] New patchset: Krinkle; "Remove old bits/test.txt file" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14748
[21:23:55] New review: Krinkle; "pybal-test-file is still used:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14748
[21:26:03] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[21:29:57] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[21:30:15] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:31:45] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:34:00] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:36:06] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:36:15] RECOVERY - swift-object-auditor on ms-be8 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:41:21] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[21:43:45] PROBLEM - Host mw1012 is DOWN: PING CRITICAL - Packet loss = 100%
[21:44:39] PROBLEM - Host virt1001 is DOWN: PING CRITICAL - Packet loss = 100%
[21:45:51] RECOVERY - SSH on virt1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[21:46:00] RECOVERY - Host virt1001 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms
[22:25:50] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[23:38:21] New review: Reedy; "Indenting looks wrong" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/14678
[23:41:22] New patchset: Alex Monk; "(bug 38247) Add WP and WT namespace aliases to ilowiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14678