[00:13:32] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[00:24:11] RECOVERY - Lucene on search15 is OK: TCP OK - 0.007 second response time on port 8123
[00:32:26] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[00:38:01] RECOVERY - Lucene on search15 is OK: TCP OK - 2.993 second response time on port 8123
[00:46:52] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[00:51:40] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[01:31:52] RECOVERY - Lucene on search15 is OK: TCP OK - 8.991 second response time on port 8123
[01:32:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:33:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.929 seconds
[01:37:43] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[01:40:07] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[01:56:01] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 600s
[01:56:37] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 639s
[02:01:16] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:02:28] RECOVERY - Lucene on search15 is OK: TCP OK - 0.006 second response time on port 8123
[02:02:28] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:06:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:10:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.944 seconds
[02:16:24] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[02:18:03] RECOVERY - Lucene on search15 is OK: TCP OK - 0.001 second response time on port 8123
[02:19:42] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[02:26:18] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[03:06:25] !log re-enabling notifications for search-pool1 and search-pool3, search-pool2 still flapping very badly
[03:06:32] Logged the message, and now dispaching a T1000 to your position to terminate you.
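Throughout this log the Lucene check on search15 alternates between OK and CRITICAL every few minutes, and the !log entry notes that search-pool2 is "still flapping very badly". Nagios detects flapping by computing the percentage of state changes over its recent check history (weighting newer results more heavily) and comparing it against a high and a low threshold. The sketch below is a simplified, unweighted version of that idea; the function names and the 20%/30% thresholds are hypothetical illustration values, not settings taken from this installation.

```python
from typing import Sequence

def percent_state_change(states: Sequence[str]) -> float:
    """Percentage of consecutive check results that differ from the previous one.

    Simplified and unweighted: real Nagios flap detection gives recent
    state changes more weight. `states` is ordered oldest to newest.
    """
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)

def is_flapping(states: Sequence[str], currently_flapping: bool,
                low: float = 20.0, high: float = 30.0) -> bool:
    """Hysteresis: start flapping above `high`, stop only once below `low`.

    The default thresholds are hypothetical illustration values.
    """
    pct = percent_state_change(states)
    if currently_flapping:
        return pct >= low
    return pct > high
```

For a history like search15's (OK, CRITICAL, OK, CRITICAL, ...) every check is a state change, so the percentage is 100 and the service counts as flapping. The two thresholds keep a service from bouncing in and out of the "flapping" state itself.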
[03:08:28] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[03:08:55] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[03:11:37] RECOVERY - Lucene on search15 is OK: TCP OK - 0.002 second response time on port 8123
[03:19:52] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[03:30:22] RECOVERY - Lucene on search15 is OK: TCP OK - 0.002 second response time on port 8123
[03:38:46] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[04:05:29] RECOVERY - Lucene on search15 is OK: TCP OK - 2.995 second response time on port 8123
[04:17:56] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[04:29:47] RECOVERY - Lucene on search15 is OK: TCP OK - 2.997 second response time on port 8123
[04:34:35] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[04:38:11] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[04:56:20] RECOVERY - Lucene on search15 is OK: TCP OK - 0.003 second response time on port 8123
[05:04:44] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[06:16:12] RECOVERY - Lucene on search15 is OK: TCP OK - 9.004 second response time on port 8123
[06:24:27] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[06:40:21] RECOVERY - Lucene on search15 is OK: TCP OK - 3.009 second response time on port 8123
[06:48:19] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[07:06:36] PROBLEM - Lucene on search9 is CRITICAL: Connection timed out
[07:06:54] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out
[07:17:42] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours
[07:21:09] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[07:24:36] RECOVERY - Lucene on search4 is OK: TCP OK - 0.009 second response time on port 8123
[07:25:30] RECOVERY - Lucene on search3 is OK: TCP OK - 0.002 second response time on port 8123
[07:26:15] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.004 second response time on port 8123
[07:26:33] RECOVERY - Lucene on search9 is OK: TCP OK - 0.012 second response time on port 8123
[07:59:39] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[08:02:39] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[08:05:39] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[08:05:39] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[08:25:45] PROBLEM - SSH on db1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:25:45] PROBLEM - MySQL Replication Heartbeat on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:25:54] PROBLEM - Disk space on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:03] PROBLEM - MySQL Slave Delay on db1040 is CRITICAL: CRIT replication delay 200 seconds
[08:26:03] PROBLEM - MySQL Slave Delay on db1022 is CRITICAL: CRIT replication delay 201 seconds
[08:26:12] PROBLEM - MySQL Slave Running on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:12] PROBLEM - mysqld processes on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:13] PROBLEM - RAID on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:13] PROBLEM - MySQL disk space on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:30] PROBLEM - MySQL Recent Restart on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:39] PROBLEM - MySQL Idle Transactions on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:39] PROBLEM - DPKG on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:39] PROBLEM - MySQL Slave Delay on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:48] PROBLEM - MySQL Replication Heartbeat on db1040 is CRITICAL: CRIT replication delay 246 seconds
[08:26:57] PROBLEM - Full LVS Snapshot on db1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:27:06] PROBLEM - MySQL Replication Heartbeat on db1022 is CRITICAL: CRIT replication delay 264 seconds
[08:28:45] RECOVERY - MySQL Slave Delay on db1022 is OK: OK replication delay NULL seconds
[08:30:51] PROBLEM - Host db1006 is DOWN: PING CRITICAL - Packet loss = 100%
[08:33:06] RECOVERY - MySQL Slave Delay on db1040 is OK: OK replication delay NULL seconds
[08:36:24] RECOVERY - Lucene on search15 is OK: TCP OK - 2.999 second response time on port 8123
[08:44:48] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[09:11:30] RECOVERY - Lucene on search15 is OK: TCP OK - 0.003 second response time on port 8123
[09:19:53] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[10:14:20] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[10:24:05] RECOVERY - Lucene on search15 is OK: TCP OK - 0.005 second response time on port 8123
[10:26:38] PROBLEM - Disk space on mw48 is CRITICAL: DISK CRITICAL - free space: /tmp 61 MB (3% inode=89%):
[10:32:30] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[10:44:02] RECOVERY - Disk space on mw48 is OK: DISK OK
[10:47:56] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[11:28:51] RECOVERY - Lucene on search15 is OK: TCP OK - 8.999 second response time on port 8123
[11:37:06] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[11:38:36] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[11:53:44] hello
[12:13:05] Nikerabbit: sorry I don't have the rights to create a git repo. Maybe mutante or another op
[12:18:39] RECOVERY - Lucene on search15 is OK: TCP OK - 0.008 second response time on port 8123
[12:20:36] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[12:25:39] mediawiki - is very slow
[12:27:03] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[12:43:53] hashar: I'll wait demon then
[12:51:43] <^demon|away> Nikerabbit: I won't be around much today, what's up?
[12:52:04] ^demon|away: need a place for new extension
[12:52:25] <^demon|away> I think I can manage that.
[12:52:38] <^demon|away> Name? Description?
[12:53:17] ^demon|away: working name is lcadft: tools for collecting contact data for translators and sending them updates
[12:53:45] <^demon|away> All lowercase like that?
[12:53:59] <^demon|away> Pick a name you like, it's damn near impossible to rename them afterwards.
[12:55:59] why?
[12:56:07] <^demon|away> Because.
[12:56:21] :-D
[12:56:24] it's not hard in svn
[12:56:34] <^demon|away> Those are directories, not repositories.
[12:56:49] a bunch of other stuff is hard in svn as a tradeoff
[12:56:57] okay, let's talk again in a week after I've reached out the naming committee
[12:57:58] <^demon|away> Not renaming them is kind of good in a way too. Keeps people's urls from arbitrarily breaking when the author decides to rename his extension :)
[12:58:55] look, it's a qchris!
[12:59:15] :) Hi apergos
[12:59:21] how's things?
[12:59:28] <^demon|away> Alrighty, I'm going to celebrate president's day.
[12:59:46] Quite good, except relatives seem to die at a quite disturbing rate these days :(
[12:59:52] waitbeforeyougo ^demon|away
[13:00:01] <^demon|away> Hm?
[13:00:11] is qchris set up for git for the dumps stuff? (if not and you can't right now, ok)
[13:00:23] qchris: I'm sorry to hear this
[13:00:31] not yet. I haven't asked him to do that yet
[13:00:35] ok
[13:00:52] But i am having ssh git labs account and will ask him after our meeting ;)
[13:01:08] ok. he'll be doing the wmf holiday then
[13:01:15] <^demon|away> apergos: Once he's set up in gerrit, he should be good to go. I made dumps "anyone can push for review" like puppet is.
[13:01:21] RECOVERY - Lucene on search15 is OK: TCP OK - 8.994 second response time on port 8123
[13:01:23] ok great
[13:01:30] <^demon|away> But if he's got any problems, lemme know and we can sort it out
[13:01:34] Gerrit works for me. Yes. Thanks
[13:01:35] fab
[13:09:23] ^demon|away: one can rename repos in github :E
[13:09:36] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[13:30:50] I am going to deploy a change
[13:32:32] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111832
[13:32:40] which fix (bug 34421) duplicate Subject / wrong To: headers in mail
[13:36:12] that should fix the funny mail headers
[13:36:18] at least it did on testwiki!
[13:39:43] this will be exciting, maybe :-D
[13:40:46] might make our mail servers gone wild :D
[13:40:49] or something
[13:41:01] nice trick while most people are on wmf holiday :-D
[13:41:11] (not me, I thought I would try to get some code done)
[13:41:40] no I am going to look at that syslog-ng file
[13:41:44] :D
[13:41:57] have fun :-D
[13:42:20] are you observing USA holiday day?
[14:16:22] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:17:34] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[14:21:59] New patchset: Pyoungmeister; "adding mediawiki install to search indexers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2672
[14:23:05] New patchset: Hashar; "send Swift syslogs to their own file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2673
[14:23:30] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2672
[14:23:30] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2672
[14:23:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2673
[14:23:36] apergos: here we have. Swift log moved to their own file ( change 2672 )
[14:23:58] yay
[14:24:19] any idea whom I could ask to review it ?
[14:24:24] beside mark / ryan :-D
[14:25:33] they are the primary reviewers I guess
[14:27:26] 7 of my changes pending now :D
[14:35:25] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[14:41:37] listening to the radio, the journalist is interviewing a family in hungary regretting communism :-)
[14:42:02] quote "everyone had a job" "if you did not like it you could just change" "you could travel" ...
[14:49:27] if you did not like communism you could change?
[14:49:34] O_O
[15:09:32] RECOVERY - Lucene on search15 is OK: TCP OK - 2.995 second response time on port 8123
[15:17:56] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[15:32:29] RECOVERY - Lucene on search15 is OK: TCP OK - 2.993 second response time on port 8123
[15:40:53] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[15:50:47] New patchset: Pyoungmeister; "some more indexer bits and pieces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2674
[15:54:24] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2674
[15:54:25] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2674
[15:56:09] !log initial test-spinup of searchidx1001 and search1001-1006 (en cluster)
[15:56:11] Logged the message, and now dispaching a T1000 to your position to terminate you.
[15:56:47] RECOVERY - Lucene on search15 is OK: TCP OK - 8.999 second response time on port 8123
[15:57:57] apergos: weren't you fixing stuff in redirects.conf lately?
[15:58:14] no. we were putting things in it and then taking them right back out :-D
[15:58:34] that is EVIL!!
[15:58:49] well they were broken!
[15:58:57] so I claim it was not evil at all but very good!!
[15:58:59] just like the European Union is giving cash to greek just to have it pay back Europe :D
[15:59:09] that's
[15:59:13] well I was going to say "just dumb"
[15:59:21] but the problem is that there are all of these evil side effects
[15:59:27] so yeah I guess it's evil too
[16:00:16] well I am triaging wikimedia bugs in bugzilla, one of them is : https://mediawiki.org redirects to http://www.mediawiki.org/
[16:00:24] aka HTTPS without a trailing / sent you to HTTP
[16:00:29] but that seems fixed to me now
[16:00:30] ah
[16:00:32] it is?
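The bug being triaged here (HTTPS without a trailing slash redirecting to plain HTTP) comes down to whether a redirect's Location header downgrades the scheme. A minimal offline sketch of that check, assuming you already have the request URL and the Location header value (for example from `curl -I`); `redirect_downgrades` is a hypothetical helper name, and fetching the headers is deliberately left out so the sketch runs without network access.

```python
from urllib.parse import urljoin, urlsplit

def redirect_downgrades(request_url: str, location: str) -> bool:
    """True if following `location` from `request_url` moves from https to http."""
    # A Location header may be relative; resolve it against the request URL first.
    target = urljoin(request_url, location)
    return urlsplit(request_url).scheme == "https" and urlsplit(target).scheme == "http"
```

With the URLs from the bug report, `redirect_downgrades("https://mediawiki.org", "http://www.mediawiki.org/")` is True, while a redirect to `https://www.mediawiki.org/wiki/MediaWiki` is not a downgrade.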
[16:00:33] https://bugzilla.wikimedia.org/show_bug.cgi?id=31369
[16:00:36] well we didn't touch that
[16:00:39] just office.wp
[16:01:00] unless mutante did something later
[16:01:02] $ curl -IL https://www.mediawiki.org 2>/dev/null|grep Location
[16:01:02] Location: https://www.mediawiki.org/wiki/MediaWiki
[16:01:18] will check redirects.conf
[16:01:35] PROBLEM - Host search1002 is DOWN: PING CRITICAL - Packet loss = 100%
[16:02:38] PROBLEM - Host search1003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:02:47] RewriteCond %{HTTPS} off !!!!!!!!!
[16:02:51] apergos: seems fixed
[16:02:58] huh
[16:03:02] ok
[16:03:06] well maybe he put it in
[16:03:10] great
[16:03:29] /home/wikipedia/conf should really be versioned with git / svn :D
[16:03:42] I thought it was
[16:04:06] I was able to see who added what for the apache conf files
[16:05:11] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[16:05:29] PROBLEM - SSH on search1001 is CRITICAL: Connection refused
[16:06:59] RECOVERY - Host search1002 is UP: PING OK - Packet loss = 0%, RTA = 30.86 ms
[16:07:01] wmf-config is versioned in SVN
[16:07:05] apergos: you should be able to close RT 1668 http://rt.wikimedia.org/Ticket/Display.html?id=1668
[16:07:14] I want to put that in git some time, after taking out PrivateSettings.php etc
[16:07:19] I have just marked related bug https://bugzilla.wikimedia.org/show_bug.cgi?id=31369 fixed
[16:07:32] I wonder why I own this
[16:07:33] weird
[16:07:40] RoanKattouw: /home/wikipedia/conf on fenari is not versioned
[16:07:53] the only stuff is the good old RCS system :D
[16:07:55] That's right, that's not
[16:08:02] RECOVERY - Host search1003 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms
[16:08:25] but those conf files are absolutely in svn
[16:08:51] maybe I am not looking at the correct place
[16:08:59] ariel@fenari:/home/wikipedia/conf/httpd$ svn info
[16:08:59] Path: .
URL: file:///home/wikipedia/conf-svn/httpd
[16:09:08] or they are added to svn by whatever sync script is used
[16:09:51] ohhh
[16:09:56] PROBLEM - Disk space on search1002 is CRITICAL: Connection refused by host
[16:10:06] we have several redirects.conf files :-(
[16:10:10] yes
[16:10:20] the ones in there are the ones to work from
[16:10:59] PROBLEM - DPKG on search1002 is CRITICAL: Connection refused by host
[16:11:08] PROBLEM - SSH on search1002 is CRITICAL: Connection refused
[16:11:53] PROBLEM - RAID on search1002 is CRITICAL: Connection refused by host
[16:12:11] PROBLEM - SSH on search1003 is CRITICAL: Connection refused
[16:13:02] 200 more mails to go
[16:15:20] RECOVERY - Lucene on search15 is OK: TCP OK - 0.002 second response time on port 8123
[16:19:27] New patchset: Hashar; "remove gerrit/nagios bots from #wikimedia-tech" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2675
[16:23:35] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[16:28:05] PROBLEM - NTP on search1002 is CRITICAL: NTP CRITICAL: No response from NTP server
[17:12:32] New patchset: Pyoungmeister; "get the password to the file. boop boop" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2676
[17:13:04] boop boop?
[17:14:16] apergos: that's the sound a computer makes.
[17:14:39] I had a stressful weekend, ok ?
[17:14:47] :-D
[17:19:59] PROBLEM - Host search1002 is DOWN: PING CRITICAL - Packet loss = 100%
[17:20:00] PROBLEM - Host search1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:22:05] !log stopping puppet on brewster
[17:22:07] Logged the message, and now dispaching a T1000 to your position to terminate you.
[17:25:23] RECOVERY - Host search1002 is UP: PING OK - Packet loss = 0%, RTA = 31.01 ms
[17:25:23] RECOVERY - Host search1001 is UP: PING OK - Packet loss = 0%, RTA = 31.28 ms
[17:28:53] anyone know if we can get https on prototype? https://bugzilla.wikimedia.org/34520
[17:34:31] New patchset: Hashar; "update WiktionaryMobile github URL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2677
[17:36:18] New patchset: Pyoungmeister; "that hostname so doesn't exist..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2678
[17:37:27] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2676
[17:37:27] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2676
[17:37:36] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2678
[17:37:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2678
[17:44:04] hexmode: what is prototype ? :-(
[17:45:33] New patchset: Pyoungmeister; "clean up requires" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2679
[17:46:32] hexmode: it is not puppetized, probably nothing I can do to bring https on prototype :-(
[17:46:41] prototype should go away
[17:46:46] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2679
[17:46:46] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2679
[17:46:51] No sense in putting in effort to https-ify it
[17:47:04] RoanKattouw_away: can you add a comment on https://bugzilla.wikimedia.org/show_bug.cgi?id=34520 ? :-)
[17:48:43] RECOVERY - Lucene on search15 is OK: TCP OK - 0.000 second response time on port 8123
[17:48:53] There's a separate bug about prototype HTTPS IIRC, probably closed as WONTFIX
[17:49:53] yeah there is https://bugzilla.wikimedia.org/show_bug.cgi?id=32457 mwEmbed is hosted on prototype.mw, meaning that it does not support https
[17:49:54] hehe
[17:56:53] New patchset: Pyoungmeister; "regex" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2680
[17:57:21] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2680
[17:57:22] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2680
[18:01:01] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[18:03:25] RECOVERY - Squid on brewster is OK: TCP OK - 0.004 second response time on port 8080
[18:03:52] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[18:05:13] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[18:06:52] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[18:06:52] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[18:09:52] PROBLEM - Host search1002 is DOWN: PING CRITICAL - Packet loss = 100%
[18:15:16] RECOVERY - Host search1002 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms
[18:26:04] RECOVERY - SSH on search1001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[18:26:13] PROBLEM - Host search1003 is DOWN: PING CRITICAL - Packet loss = 100%
[18:27:43] RECOVERY - SSH on search1002 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[18:28:55] RECOVERY - Puppet freshness on search1001 is OK: puppet ran at Mon Feb 20 18:28:38 UTC 2012
[18:28:55] RECOVERY - SSH on search1003 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[18:29:04] RECOVERY - Host search1003 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms
[18:31:02] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:31:28] RECOVERY - RAID on search1001 is OK: OK: no RAID installed
[18:31:46] RECOVERY - DPKG on search1001 is OK: All packages OK
[18:31:46] RECOVERY - Disk space on search1001 is OK: DISK OK
[18:32:13] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[18:34:37] RECOVERY - NTP on search1001 is OK: NTP OK: Offset -0.09592020512 secs
[18:36:07] RECOVERY - Puppet freshness on search1002 is OK: puppet ran at Mon Feb 20 18:35:55 UTC 2012
[18:37:19] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.06784634783 (gt 8.0)
[18:37:37] hashar: could anyone else do it w/o puppet?
[18:38:22] RECOVERY - DPKG on search1002 is OK: All packages OK
[18:38:49] RECOVERY - Disk space on search1002 is OK: DISK OK
[18:39:16] RECOVERY - RAID on search1002 is OK: OK: no RAID installed
[18:42:07] RECOVERY - Puppet freshness on search1003 is OK: puppet ran at Mon Feb 20 18:41:55 UTC 2012
[18:43:43] gerrit's interface is punishment to the eyes.
[18:43:52] Is there a different skin that it can use or something?
[18:44:01] The vomit yellow and green really is quite awful.
[18:44:49] RECOVERY - RAID on search1003 is OK: OK: no RAID installed
[18:45:16] RECOVERY - Disk space on search1003 is OK: DISK OK
[18:45:16] RECOVERY - DPKG on search1003 is OK: All packages OK
[18:46:37] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 1.26315605263
[18:49:19] RECOVERY - NTP on search1002 is OK: NTP OK: Offset 0.0131098032 secs
[18:49:44] I wish Gerrit were skinnable
[18:50:42] I found https://labsconsole.wikimedia.org/wiki/Gerrit_bugs_that_matter
[18:50:51] And a bug about moving those page to Bugzilla, I think.
[18:50:55] A bit of a mess.
[18:51:03] But yeah, it needs UI work and visual work.
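The "Puppet freshness" alerts fire when a host's last Puppet run is more than 10 hours old, and the recoveries report the run time in a form like "puppet ran at Mon Feb 20 18:35:55 UTC 2012". A minimal sketch of that comparison, assuming the last-run timestamp is available in that format (how the real check obtains it is not shown in this log); `parse_puppet_ran` and `puppet_is_stale` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Threshold quoted in the "Puppet freshness" alerts.
FRESHNESS_LIMIT = timedelta(hours=10)

def parse_puppet_ran(stamp: str) -> datetime:
    """Parse a timestamp like 'Mon Feb 20 18:35:55 UTC 2012' as an aware UTC datetime."""
    naive = datetime.strptime(stamp, "%a %b %d %H:%M:%S UTC %Y")
    return naive.replace(tzinfo=timezone.utc)

def puppet_is_stale(last_run: datetime, now: datetime,
                    limit: timedelta = FRESHNESS_LIMIT) -> bool:
    """True if the last Puppet run happened more than `limit` before `now`."""
    return now - last_run > limit
```

Using aware UTC datetimes on both sides avoids mixing the monitoring host's local time with the UTC stamp reported by Puppet.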
[18:54:15] yup
[18:54:46] What Gerrit lacks on the frontend it makes up for at the backend, and unfortunately the reverse is true for the alternatives that we've seen
[18:54:52] RECOVERY - NTP on search1003 is OK: NTP OK: Offset 0.0687738657 secs
[19:47:28] hexmode: sorry I am back
[19:47:42] hexmode: looks like prototype is obsolete and need to be deleted / decommissioned
[19:47:58] hexmode: so we should migrate whatever is left there to a new system
[19:50:47] New patchset: Pyoungmeister; "well, that sure doesn't exist..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2681
[19:51:18] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2681
[19:51:19] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2681
[20:01:36] PROBLEM - Lucene on search1005 is CRITICAL: Connection refused
[20:01:36] PROBLEM - Lucene on search1004 is CRITICAL: Connection refused
[20:01:36] PROBLEM - Lucene on search1006 is CRITICAL: Connection refused
[20:01:36] PROBLEM - Lucene on searchidx1001 is CRITICAL: Connection refused
[20:03:41] New patchset: Danakim; "Modified the git-setup script to not use "git config --global" when setting the environment up for the user. This changes any .gitconfig file the user might have in his/her home and might not actually be in sync with the data the user might want to use fo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2682
[20:13:54] RECOVERY - Lucene on search15 is OK: TCP OK - 8.998 second response time on port 8123
[20:14:39] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[20:22:09] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[20:36:51] RECOVERY - Lucene on search15 is OK: TCP OK - 9.004 second response time on port 8123
[20:45:06] PROBLEM - Lucene on search15 is CRITICAL: Connection timed out
[20:49:44] New patchset: Hashar; "git-setup script no more use "git config --global"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2682
[20:50:12] OH YEAHHHHHHhhhh
[20:50:26] I HAVE MANAGED TO EDIT SOMEONE ELSE PATCHSET IN GERRIT !!!!!
[20:50:30] ohuouuuuhouuu
[20:50:42] and I did not even had to look at the git cheat sheet! !!!!!
[20:59:14] New review: Hashar; "We try to keep tabulations, this way anyone can use their preferred tab size (2,3,4, 8 ...)." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2682
[21:00:52] hashar: Did you use git review -d for that too?
[21:01:16] New review: Hashar; "Finally, you might want to look at git-review https://labsconsole.wikimedia.org/wiki/Git-review it ..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2682
[21:02:25] New review: Catrope; "Yeah git-review does make everything simpler... unless you're on Windows :D . Might be worth keeping..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/2682
[21:07:31] RoanKattouw_away: I do my stuff and then just: git review
[21:07:37] never looked at the switch
[21:07:43] Oh OK
[21:07:45] err never looked at the arguments / options
[21:07:55] So you copypasted the command to download the revision from gerrit?
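`git review -d`, asked about in this exchange, downloads a change from Gerrit. Gerrit publishes each patch set of a change under a ref of the form refs/changes/<last two digits of the change number, zero-padded>/<change number>/<patch set number>, which plain `git fetch` can also retrieve. A small sketch that builds such a ref; `gerrit_change_ref` is a hypothetical helper name, not part of git-review.

```python
def gerrit_change_ref(change: int, patchset: int) -> str:
    """Build the Gerrit ref under which a given patch set of a change is published."""
    if change <= 0 or patchset <= 0:
        raise ValueError("change and patch set numbers must be positive")
    # Gerrit shards change refs by the last two digits of the change number.
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"
```

Assuming Hashar's upload of change 2682 is its second patch set, it would be published at refs/changes/82/2682/2 and could be fetched with `git fetch <remote> refs/changes/82/2682/2` followed by `git checkout FETCH_HEAD`, which is roughly the manual equivalent of `git review -d 2682`.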
[21:08:08] git review -d 2682 means "download change 2682"
[21:08:22] IIRC: git fetch-all && git cherry-pick
[21:08:49] or maybe I copy pasted, can't remember
[21:08:52] will use -d now :-)))))))))))
[21:08:54] thanks Roan!
[21:09:28] also it seems git-review find out you are working on a bug and can create the topic branch automatically
[21:09:51] i.e. you can be locally in "production", if your commit message contains bug (\d+), it will create a topic branch $1 :-)
[21:39:22] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[23:30:54] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[23:36:54] PROBLEM - Lucene on search1007 is CRITICAL: Connection refused
[23:36:54] PROBLEM - Lucene on search1010 is CRITICAL: Connection refused
[23:36:54] PROBLEM - Lucene on search1011 is CRITICAL: Connection refused
[23:36:54] PROBLEM - Lucene on search1009 is CRITICAL: Connection refused
[23:36:54] PROBLEM - Lucene on search1012 is CRITICAL: Connection refused
[23:36:54] PROBLEM - Lucene on search1013 is CRITICAL: Connection refused
[23:36:54] PROBLEM - Lucene on search1015 is CRITICAL: Connection refused
[23:36:55] PROBLEM - Lucene on search1016 is CRITICAL: Connection refused
[23:36:55] PROBLEM - Lucene on search1020 is CRITICAL: Connection refused
[23:36:56] PROBLEM - Lucene on search1017 is CRITICAL: Connection refused
[23:36:56] PROBLEM - Lucene on search1018 is CRITICAL: Connection refused
[23:36:58] PROBLEM - Lucene on search1019 is CRITICAL: Connection refused
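The git-review behaviour described at the end of this exchange (when committing on "production", a commit message mentioning bug N yields a topic branch named after N) can be approximated with a regular expression. This is a hypothetical re-creation of what is described in the chat, not git-review's actual code; the pattern, the use of the bare number as the branch name, and the `topic_from_commit_message` name are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: the word "bug", an optional "#", then a number,
# e.g. "(bug 34421)". Case-insensitive; "debug 5" does not match because
# \b requires a word boundary before "bug".
BUG_RE = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)

def topic_from_commit_message(message: str, current_branch: str,
                              main_branch: str = "production") -> Optional[str]:
    """Return a topic-branch name taken from a bug number, or None.

    Only applies when committing directly on `main_branch`, mirroring the
    behaviour described in the log; the real tool may name branches differently.
    """
    if current_branch != main_branch:
        return None
    match = BUG_RE.search(message)
    return match.group(1) if match else None
```

For the mail-header fix mentioned earlier in the log, a message such as "Fix duplicate Subject / wrong To: headers (bug 34421)" committed on "production" would yield the topic "34421".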