[00:13:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:14:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [00:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [00:31:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [00:52:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [01:22:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:19:33] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:33] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:33] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:33] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:33] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:33] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
[07:19:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:19:36] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100% [07:19:36] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [07:19:40] !log LocalisationUpdate completed (1.22wmf12) at Fri Aug 2 02:14:20 UTC 2013 [07:19:40] Logged the message, Master [07:19:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:19:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [07:19:44] (PS2) Tim Landscheidt: Tools: Manage obsolete files in /usr/local/bin [operations/puppet] - https://gerrit.wikimedia.org/r/77234 [07:19:44] (CR) jenkins-bot: [V: -1] Tools: Manage obsolete files in /usr/local/bin [operations/puppet] - https://gerrit.wikimedia.org/r/77234 (owner: Tim Landscheidt) [07:19:44] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 2 02:34:08 UTC 2013 [07:19:44] Logged the message, Master [07:19:45] (PS3) Tim Landscheidt: Tools: Manage obsolete files in /usr/local/bin [operations/puppet] - https://gerrit.wikimedia.org/r/77234 [07:19:47] (PS4) Tim Landscheidt: Tools: Manage obsolete files in /usr/local/bin [operations/puppet] - https://gerrit.wikimedia.org/r/77234 [07:19:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:19:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [07:19:53] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:19:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 
- 336 bytes in 0.128 second response time [07:20:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time [07:20:04] (PS1) Catrope: VE config changes for https://gerrit.wikimedia.org/r/77165 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77268 [07:20:04] (PS1) Catrope: [DO NOT MERGE] Turn VisualEditor beta welcome on everywhere [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77269 [07:20:04] (CR) Catrope: [C: -2] "Don't deploy this until the beta welcome message has been translated into more than just English" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77269 (owner: Catrope) [07:20:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [07:20:07] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [07:20:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:20:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [07:20:23] (CR) Catrope: [C: 2] VE config changes for https://gerrit.wikimedia.org/r/77165 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77268 (owner: Catrope) [07:20:23] (Merged) jenkins-bot: VE config changes for https://gerrit.wikimedia.org/r/77165 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77268 (owner: Catrope) [07:20:26] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [07:20:30] !log catrope Started syncing Wikimedia installation... : Update VE to master [07:20:30] Logged the message, Master [07:20:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [07:20:32] !log catrope Finished syncing Wikimedia installation... : Update VE to master [07:20:32] Logged the message, Master [07:20:32] !log catrope synchronized wmf-config/InitialiseSettings.php 'Add VE config to set Edit source / Edit beta on enwiki' [07:20:32] Logged the message, Master [07:20:32] !log catrope synchronized wmf-config/CommonSettings.php 'Add plumbing for new VE config vars' [07:20:32] Logged the message, Master [07:20:33] !log catrope synchronized php-1.22wmf12/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ViewPageTarget.init.js 'touch' [07:20:33] Logged the message, Master [07:20:33] !log catrope synchronized php-1.22wmf12/resources/startup.js 'touch' [07:20:33] Logged the message, Master [07:20:46] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [07:20:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [07:20:50] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:50] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [07:20:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:58] RECOVERY - Puppetmaster 
HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [07:20:59] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [07:21:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:21:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [07:32:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:33:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time [07:42:23] hey apergos, yt? [07:42:32] yes [07:42:34] what's up? [07:43:14] i'm trying to decide what to do with the 'high bandwidth rsync' sysctl config, which sets 'vm.min_free_kbytes' => 262144 [07:43:29] googling around i came across https://wikitech.wikimedia.org/wiki/Dataset1001 [07:43:30] ah [07:43:47] yeah I added that [07:44:26] both andrew and faidon agreed that it was not good that i included it under a 'role::rsync' class [07:44:32] can you think of a more descriptive name? [07:44:51] no, it's probably not just about rsync, it just happens that rsync trips that more easily [07:46:24] morning [07:46:28] hey hashar [07:46:30] morning [07:47:02] apergos: also, you noted 'This is still well below the max recommended 6% for a host with 16GB ram.' -- should it be set to a % of ram? facter exposes a memorysize fact so i can make it be a % of ram size. 
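Editor's note: the percentage under discussion can be checked with a little arithmetic. The following is a minimal Python sketch, assuming the "16GB" quoted from the wiki page means 16 GiB and taking the 6% ceiling as quoted; the function name `min_free_share` is ours and is not part of Puppet, facter or the kernel.

```python
KIB_PER_GIB = 1024 * 1024  # vm.min_free_kbytes is expressed in KiB


def min_free_share(min_free_kbytes: int, ram_gib: float) -> float:
    """Return vm.min_free_kbytes as a percentage of total RAM."""
    return 100.0 * min_free_kbytes / (ram_gib * KIB_PER_GIB)


share = min_free_share(262144, 16)  # value from the sysctl config, assumed 16 GiB host
print(f"{share:.4f}% of RAM")       # prints "1.5625% of RAM"
print(share < 6.0)                  # prints "True": well below the quoted 6% ceiling
```

On those assumptions the fixed value is about a quarter of the quoted ceiling, which fits the view in the channel that a comment is enough and a computed percentage is not needed.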
[07:47:20] maybe that's excessively fancy and a comment would do, though [07:47:37] no I don't think you need to do that [07:47:50] it's well below = this is a good thing [07:48:00] ori-l: sorry about the human readable exception.log , I didn't mean to be mean to your change :( [07:48:24] or what ever, damn I am not awake yet [07:48:34] happy coffee [07:48:42] I ran out of coffee :( [07:48:46] ugh [07:48:47] hashar: i didn't take it as mean; the current format really is quite readable and familiar to people debugging, so i understand the resistance [07:49:00] sleepwalk to the nearest corner store? [07:49:11] apergos: it is too far away :-] [07:54:18] apergos: how's 'early_page_reclaim' [07:54:20] ? [07:56:20] not loving it, but not coming up with anything good either [08:01:39] ori-l: the more review others patches the more I find our code base to be horribly organized :( [08:05:44] core, you mean? [08:12:02] ori-l: in core yup [08:12:20] example [08:12:34] i like spaghetti, specially uncooked one nicely aligned in their box [08:12:44] in MediaWiki it is more like each person cook ONE spaghetti [08:12:49] hahaha [08:12:56] and try to insert the result in the plate [08:13:16] i think i've been guilty of that myself [08:13:17] and we then send to the outside world that multi colored spaghetti thing [08:13:24] we are all [08:14:13] i think with visualeditor and parsoid it is reaching the point where not even tim knows his way around everywhere [08:14:50] so i guess we each have to make sure we cook our one spaghetti just right :) [08:17:48] VE is surely difficult [08:17:55] !log restarting slave threads db32 db52 db39 db51 db45 db43 db56 after OSC bug 49194 [08:18:06] Logged the message, Master [08:18:10] for Parsoid I consider it merely a web service which hopefully has a nice documentation for its entry points :-] [08:18:39] springle: at first spot I thought you were a volunteer and wondered how you could restart slave threads :D Hello! 
[08:18:49] heh :) [08:19:11] ori-l: I am reviewing the code design. Will not be able to follow up tomorrow though since I pack my luggages for vacations. [08:19:54] you should relax and prepare for your vacation :P [08:20:02] there's no deployment anytime soon anyway [08:20:14] you can think JSON on the plane [08:20:15] I am gone till Aug 26th :D [08:20:49] it can wait, really, but up to you [08:21:31] * hashar reads compact() php documentation [08:21:36] never heard of that one before [08:22:27] it is evil :D [08:22:34] yes, but the best kind of evil [08:27:54] we need php_python [08:28:02] * hashar look up in pecl [08:29:29] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [08:42:03] ori-l: review complete :] Have a good night! [08:42:18] hashar: you too, thanks! [08:42:50] next target fatal.log :-D [08:44:32] hashar: https://bugzilla.wikimedia.org/show_bug.cgi?id=52209 [08:45:17] :-] [08:46:27] enjoy your vacation [08:46:49] I will ! [08:48:55] ori-l: aren't you sleeping? :) [08:49:17] almost [09:03:05] PROBLEM - SSH on pdf3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:06:01] RECOVERY - SSH on pdf3 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [09:12:04] (CR) Ori.livneh: "(4 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [09:14:11] PROBLEM - SSH on pdf3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:15:01] RECOVERY - SSH on pdf3 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [09:17:11] (PS5) Ori.livneh: Clean up sysctl parameters. [operations/puppet] - https://gerrit.wikimedia.org/r/75087 [09:17:39] (CR) jenkins-bot: [V: -1] Clean up sysctl parameters. [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [09:18:38] (PS6) Ori.livneh: Clean up sysctl parameters. 
[operations/puppet] - https://gerrit.wikimedia.org/r/75087 [09:26:23] (CR) Ori.livneh: "PS6 bites the bullet and instructs Puppet to manage /etc/sysctl.d exclusively. It will purge some files that are installed by defaults, bu" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [09:32:22] (PS7) Ori.livneh: Clean up sysctl parameters. [operations/puppet] - https://gerrit.wikimedia.org/r/75087 [09:33:26] ugh. [09:35:29] (PS8) Ori.livneh: Clean up sysctl parameters. [operations/puppet] - https://gerrit.wikimedia.org/r/75087 [11:36:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:37:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [11:40:23] * paravoid grumbles [11:57:17] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:17] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:17] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:17] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:17] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:17] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [13:02:05] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [13:05:26] !log upgrading ceph to 0.67-rc3 [13:05:35] PROBLEM - DPKG on ms-fe1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:05:37] Logged the message, Master [13:06:35] RECOVERY - DPKG on ms-fe1004 is OK: All packages OK [13:10:47] PROBLEM - Host ms-fe1002 is DOWN: PING CRITICAL - Packet loss = 100% [13:11:47] !log rolling reboot ceph nodes for kernel upgrades [13:11:47] 
PROBLEM - Host ms-fe1003 is DOWN: PING CRITICAL - Packet loss = 100% [13:11:47] PROBLEM - DPKG on ms-be1012 is CRITICAL: Timeout while attempting connection [13:11:57] Logged the message, Master [13:12:27] RECOVERY - Host ms-fe1002 is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms [13:13:27] RECOVERY - Host ms-fe1003 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms [13:13:47] PROBLEM - Host ms-be1012 is DOWN: PING CRITICAL - Packet loss = 100% [13:14:07] PROBLEM - Host ms-be1011 is DOWN: PING CRITICAL - Packet loss = 100% [13:15:07] RECOVERY - Host ms-be1012 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [13:15:37] RECOVERY - Host ms-be1011 is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms [13:15:47] RECOVERY - DPKG on ms-be1012 is OK: All packages OK [13:16:37] PROBLEM - Host ms-fe1004 is DOWN: PING CRITICAL - Packet loss = 100% [13:17:17] PROBLEM - Host ms-be1010 is DOWN: PING CRITICAL - Packet loss = 100% [13:17:17] PROBLEM - Host ms-be1009 is DOWN: PING CRITICAL - Packet loss = 100% [13:17:27] PROBLEM - Host ms-be1008 is DOWN: PING CRITICAL - Packet loss = 100% [13:17:57] RECOVERY - Host ms-fe1004 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [13:19:37] RECOVERY - Host ms-be1009 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms [13:19:47] RECOVERY - Host ms-be1008 is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms [13:19:47] RECOVERY - Host ms-be1010 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [13:21:07] PROBLEM - Host ms-fe1001 is DOWN: PING CRITICAL - Packet loss = 100% [13:21:57] yo qchris [13:22:04] Hi ottomata [13:22:12] do you remember what it was I have to do on manganese to reload the gerrit replication config? [13:22:36] Reloading the plugin should do the trick. [13:22:37] RECOVERY - Host ms-fe1001 is UP: PING OK - Packet loss = 0%, RTA = 0.93 ms [13:22:41] ja don't remember how to do that [13:23:12] ssh gerrit.wikimedia.org plugin reload replication [13:23:39] bash: plugin: command not found [13:23:51] Whoops ... 
ssh gerrit.wikimedia.org gerrit plugin reload replication [13:24:17] PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100% [13:24:17] PROBLEM - Host ms-be1006 is DOWN: PING CRITICAL - Packet loss = 100% [13:24:17] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100% [13:24:44] hm ok that worked (had to ssh -p 29418) [13:24:55] are there logs I can look at? [13:25:41] the error_log holds the relevant logs. [13:25:47] Let me get the path to that ... [13:25:47] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [13:25:53] i found it [13:25:57] RECOVERY - Host ms-be1006 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms [13:26:07] i see an error, but i think it is irrelevant: Cannot replicate to gerritslave@gallium.wikimedia.org:/var/lib/git/mediawiki/extensions/WikibaseQuery.gi [13:26:17] /var/lib/gerrit2/review_site/logs/error_log [13:26:17] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [13:26:23] Oh I was too slow :-) [13:27:07] PROBLEM - NTP on ms-fe1002 is CRITICAL: NTP CRITICAL: Offset unknown [13:29:37] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100% [13:29:40] ottomata: Yes, that looks unrelated [13:29:47] PROBLEM - Host ms-be1004 is DOWN: PING CRITICAL - Packet loss = 100% [13:29:47] PROBLEM - Host ms-be1002 is DOWN: PING CRITICAL - Packet loss = 100% [13:29:47] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100% [13:30:09] ottomata: Was the final "t" in the log line dropped in copy/paste, is there no final "t"? 
[13:30:17] PROBLEM - NTP on ms-be1011 is CRITICAL: NTP CRITICAL: Offset unknown [13:30:47] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms [13:31:07] RECOVERY - NTP on ms-fe1002 is OK: NTP OK: Offset -0.000833272934 secs [13:31:17] RECOVERY - Host ms-be1004 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [13:31:27] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 4.05 ms [13:31:47] RECOVERY - Host ms-be1002 is UP: PING OK - Packet loss = 0%, RTA = 4.37 ms [13:31:54] ha, yes [13:34:17] RECOVERY - NTP on ms-be1011 is OK: NTP OK: Offset 0.001428723335 secs [13:34:27] PROBLEM - NTP on ms-be1010 is CRITICAL: NTP CRITICAL: Offset unknown [13:38:28] RECOVERY - NTP on ms-be1010 is OK: NTP OK: Offset -0.001274585724 secs [13:38:37] (PS1) Ottomata: Updating README [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77313 [13:38:48] (CR) Ottomata: [C: 2 V: 2] Updating README [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77313 (owner: Ottomata) [13:39:21] yay, great qchris, its working, thank you! [13:39:35] yw [13:40:19] PROBLEM - NTP on ms-be1005 is CRITICAL: NTP CRITICAL: Offset unknown [13:45:07] (PS19) Ottomata: Puppetizing HA NameNode via Quorum Based JournalNode. [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/76018 [13:45:19] RECOVERY - NTP on ms-be1005 is OK: NTP OK: Offset 0.00024497509 secs [13:45:58] PROBLEM - NTP on ms-be1001 is CRITICAL: NTP CRITICAL: Offset unknown [13:47:29] (PS20) Ottomata: Puppetizing HA NameNode via Quorum Based JournalNode. [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/76018 [13:49:58] RECOVERY - NTP on ms-be1001 is OK: NTP OK: Offset -0.00491964817 secs [13:56:48] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [14:08:30] (CR) Faidon: [C: 1] Puppetizing HA NameNode via Quorum Based JournalNode. 
[operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/76018 (owner: Ottomata) [14:13:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [14:20:55] PROBLEM - SSH on pdf3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:21:11] (PS1) Ottomata: Updated README [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77320 [14:21:20] (CR) Ottomata: [C: 2 V: 2] Updated README [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77320 (owner: Ottomata) [14:21:45] RECOVERY - SSH on pdf3 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [14:23:21] (PS1) Ottomata: Updated README [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77321 [14:23:32] (CR) Ottomata: [C: 2 V: 2] Updated README [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77321 (owner: Ottomata) [14:28:55] PROBLEM - SSH on pdf3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:00] (CR) Lcarr: [C: 1] Puppetizing HA NameNode via Quorum Based JournalNode. [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/76018 (owner: Ottomata) [14:29:45] RECOVERY - SSH on pdf3 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [14:30:03] meh, i'll check out pdf3 [14:31:31] !log pdf3 unresponsive on console , rebooting [14:31:32] (CR) Ottomata: [C: 2 V: 2] Puppetizing HA NameNode via Quorum Based JournalNode. [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/76018 (owner: Ottomata) [14:31:42] Logged the message, Mistress of the network gear. [14:35:10] PROBLEM - Host pdf3 is DOWN: PING CRITICAL - Packet loss = 100% [14:35:32] LeslieCarr: is that your first time ever touching a pdf box? :) [14:35:57] hehehe, willingly! 
[14:36:37] a few times when i started i looked at them because "eep hardy" but i gave up on them , like everyone else [14:37:06] wow, i have to look up hardy [14:37:59] hahaha [14:38:04] ubuntu 8.04 [14:38:09] kids these days... [14:38:09] :P [14:39:00] RECOVERY - Host pdf3 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [14:39:31] yeah, it's about etch era [14:58:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [15:00:58] YuviPanda: pshaw, hardy was probably before you were born! [15:08:48] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [15:24:22] (PS1) Jgreen: vhost file for ticket.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/77329 [15:26:38] (CR) Jgreen: [C: 2 V: 2] vhost file for ticket.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/77329 (owner: Jgreen) [15:37:59] (PS1) ArielGlenn: rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 [15:38:17] (CR) jenkins-bot: [V: -1] rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 (owner: ArielGlenn) [15:38:28] bah humbug [15:40:29] (PS2) ArielGlenn: rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 [15:40:46] (CR) jenkins-bot: [V: -1] rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 (owner: ArielGlenn) [15:42:06] (PS1) Andrew Bogott: Make our checks for definitions a bit more explicit. 
[operations/puppet] - https://gerrit.wikimedia.org/r/77331 [15:42:07] (PS1) Andrew Bogott: Move base class and subclasses into a 'base' module. [operations/puppet] - https://gerrit.wikimedia.org/r/77332 [15:42:23] (CR) Andrew Bogott: "Obviously this should not be merged without serious babysitting!" [operations/puppet] - https://gerrit.wikimedia.org/r/77332 (owner: Andrew Bogott) [15:42:57] (CR) jenkins-bot: [V: -1] Move base class and subclasses into a 'base' module. [operations/puppet] - https://gerrit.wikimedia.org/r/77332 (owner: Andrew Bogott) [15:44:25] (PS3) ArielGlenn: rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 [15:44:42] (CR) jenkins-bot: [V: -1] rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 (owner: ArielGlenn) [15:45:03] (PS1) Jgreen: new manifest for otrs role [operations/puppet] - https://gerrit.wikimedia.org/r/77333 [15:46:04] (CR) Jgreen: [C: 2 V: 2] new manifest for otrs role [operations/puppet] - https://gerrit.wikimedia.org/r/77333 (owner: Jgreen) [15:46:11] (PS4) ArielGlenn: rewrite of admins.pp, changes how accounts are enabled/disabled, bug fixes [operations/puppet] - https://gerrit.wikimedia.org/r/77330 [15:46:21] 8:39, puppet repo silence [15:46:25] 8:45, puppet repo rewritten [15:51:14] ? [15:57:21] (PS1) Jgreen: add role::otrs::webserver to iodine [operations/puppet] - https://gerrit.wikimedia.org/r/77339 [15:57:22] (CR) Jgreen: [C: 2 V: 2] add role::otrs::webserver to iodine [operations/puppet] - https://gerrit.wikimedia.org/r/77339 (owner: Jgreen) [15:57:26] (CR) Faidon: [C: -2] "+2500, -3288 is unreviewable and you seem to do multiple changes in one go. 
This is quite an important piece and needs to be thorougly rev" [operations/puppet] - https://gerrit.wikimedia.org/r/77330 (owner: ArielGlenn) [16:01:57] (CR) Faidon: "Looking a bit at the code, it's also quite horrible with all those ifs, so I guess this is a -3 or -4 :) You probably need to look at how " [operations/puppet] - https://gerrit.wikimedia.org/r/77330 (owner: ArielGlenn) [16:06:18] (CR) ArielGlenn: "It turns out that due to http://projects.puppetlabs.com/issues/4151 I can't get done what I want with virtual resources (if you mean the a" [operations/puppet] - https://gerrit.wikimedia.org/r/77330 (owner: ArielGlenn) [16:11:42] (PS1) Jgreen: remove duplicate package in role::otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77342 [16:11:44] (CR) Jgreen: [C: 2 V: 2] remove duplicate package in role::otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77342 (owner: Jgreen) [16:16:56] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [16:40:26] On Labs, I get "err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse for environment production: Syntax error at '}' at /etc/puppet/manifests/role/otrs.pp:39 on node i-0000069a.pmtpa.wmflabs". [16:40:40] (For "puppetd -tv", that is.) [16:41:17] scfc_de: is that a self instance? [16:41:26] there was a mail or something about how to fix your self instances [16:41:49] jeremyb: Yes, but a new one, and that sounds like a syntax error. [16:42:04] No, it's not even a self instance, sorry. Regular. [16:42:23] ok. then yeah, it's a server-side problem with a manifest [16:42:39] oh, it's Jeff_Green's fault i guess based on file name :) [16:42:58] i really have to run away now or i'd look closer [16:43:05] bbl [16:43:23] Yes, operations/puppet:manifests/role/otrs.pp looks faulty. But how did that get past Jenkins? We do have lint tests there that should catch this. 
[16:44:05] Jeff_Green: ^ [16:44:44] ssl[1-3]0[0-9][0-9] covers 1000-3099 right..cuz 1009 is not picking up the partitioner...am i missing something? [16:50:14] (CR) Tim Landscheidt: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/77342 (owner: Jgreen) [16:51:38] (PS1) Tim Landscheidt: Fix syntax error [operations/puppet] - https://gerrit.wikimedia.org/r/77348 [16:55:50] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [16:58:41] If someone could +2 https://gerrit.wikimedia.org/r/#/c/77348/, much appreciated. [16:59:01] (PS1) Demon: Fixing syntax error that broke puppet [operations/puppet] - https://gerrit.wikimedia.org/r/77350 [16:59:17] <^d> scfc_de: That and ^ [16:59:18] (CR) jenkins-bot: [V: -1] Fixing syntax error that broke puppet [operations/puppet] - https://gerrit.wikimedia.org/r/77350 (owner: Demon) [16:59:25] <^d> There's two syntax errors. [17:00:24] <^d> Jeff_Green: There's some syntax errors in that otrs manifest you were working on, broke puppet. [17:00:45] <^d> Patches above from scfc_de and myself. [17:01:50] d^: Isn't Puppet liberal with trailing commas? [17:02:48] <^d> Maybe, I'm just going based on what I'm used to doing. [17:02:55] <^d> The ; would be needed if the line below wasn't commented. [17:03:36] d^: You're right. [17:05:30] BTW, Jenkins *caught* the syntax error, but the change was already merged then (https://gerrit.wikimedia.org/r/#/c/77342/). Can Jenkins check that condition ("patch merged") and complain "louder" then? :-) [17:08:51] <^d> scfc_de: No, not really. People shouldn't merge like that though :\ [17:09:47] Needz a revert? [17:10:05] <^d> scfc_de and I put in 2 changes to fix the syntax errors. [17:10:10] <^d> Change is probably fine as is. 
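Editor's note: the hostname pattern asked about at 16:44:44 can be checked directly. This is a minimal Python sketch, assuming the pattern is applied to the whole hostname; the log does not say whether the real config treats it as a regular expression or a shell glob (the bracket classes behave the same way in both), and the helper name `matches` is ours.

```python
import re

# Pattern exactly as quoted in the channel.
PATTERN = re.compile(r"ssl[1-3]0[0-9][0-9]")


def matches(host: str) -> bool:
    """True if the whole hostname matches the quoted pattern."""
    return PATTERN.fullmatch(host) is not None


for host in ("ssl1009", "ssl3099", "ssl1100", "ssl4000"):
    print(host, matches(host))
```

The second digit is pinned to 0, so the pattern covers 1000-1099, 2000-2099 and 3000-3099 rather than the whole of 1000-3099. ssl1009 does match, which suggests the missing partitioner has some other cause than the pattern itself.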
[17:10:31] <^d> https://gerrit.wikimedia.org/r/#/c/77348/ and https://gerrit.wikimedia.org/r/#/c/77350/ [17:10:35] (03CR) 10Tim Landscheidt: "My OCD says that this should have https://gerrit.wikimedia.org/r/#/c/77348/ as a dependency so it isn't merged prematurely :-)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/77350 (owner: 10Demon) [17:11:19] <^d> scfc_de: I'll rebase on top, sec. [17:11:39] Anything I can do to help? [17:12:03] (03PS2) 10Demon: Fixing syntax error that broke puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/77350 [17:12:30] Coren: Review and merge https://gerrit.wikimedia.org/r/#/c/77348/ and https://gerrit.wikimedia.org/r/#/c/77350/?! [17:13:38] (03CR) 10Tim Landscheidt: [C: 031] Fixing syntax error that broke puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/77350 (owner: 10Demon) [17:14:02] (03CR) 10coren: [C: 032] "No brainer fix" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77348 (owner: 10Tim Landscheidt) [17:15:04] (03CR) 10coren: [C: 032] "Jenkins likes it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/77350 (owner: 10Demon) [17:15:55] Coren: Thanks. [17:16:07] Merged. [17:16:27] <^d> Thank you Coren :) [17:17:00] "notice: Finished catalog run in 26.11 seconds" and everything's working again. [17:20:21] Hi- I'm having trouble using 'git review' to put my commit up for review -- I keep getting a message that 'We don't know where your gerrit is. Please manually create [17:20:21] a remote named gerrit and try again.' Any help much appreciated. [17:21:18] kma500: If you originally cloned with "git clone", you (may) need to rename the remote "origin" to "gerrit". [17:21:43] kma500: "git remote rename origin gerrit" [17:21:56] I did, I believe. I'll try that [17:23:53] scfc_de- I tried and now get a message reading "Problems encountered installing commit-msg hook [17:23:53] ssh: connect to host git.wikimedia.org port 22: No route to host" [17:24:51] kma500: Hmmm. Don't know. 
Network error? Try again? [17:25:23] same thing. [17:26:55] <^d> If you cloned from git.wikimedia.org then that's not gerrit. [17:26:59] <^d> That's just the repo browser. [17:27:01] kma500: Ah, ok, you had cloned from the anonymous repo. [17:27:41] Oh. What do you recommend I do? [17:27:50] kma500: What's your Gerrit username? [17:28:09] Kmenger, I believe. [17:28:33] kma500: And you want to submit to labs/toollabs? [17:28:43] yes [17:29:40] "why don't I clone from git.wikimedia?" "because that's just a browser, not the real repo" "uh, what? it says 'git'" "shuddup" [17:30:12] kma500: Then "git remote set-url gerrit ssh://kmenger@gerrit.wikimedia.org:29418/labs/toollabs" should fix that. [17:31:12] That and then 'git review -s' again? [17:31:29] kma500: Yes, if the first one didn't succeed. [17:31:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:03] Well, now I get 'Permission denied (publickey).' [17:32:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [17:32:57] kma500: Sure about "kmenger"? [17:33:38] Kmenger is my wikitech account. [17:34:22] kma500: What's your "Instance shell account name" in the wikitech preferences? [17:34:27] kmenger [17:34:41] Hmmm. [17:35:52] kma500: And your key is listed on https://gerrit.wikimedia.org/r/#/settings/ssh-keys? [17:36:17] *public key [17:37:26] I have a key there [17:38:13] 'Algorithm': ssh-rsa 'Key' ... long key... [17:38:18] Is that what you mean? [17:41:57] (03PS2) 10Faidon: Add an authdns module & associated role classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/74119 [17:41:58] Yes. Do you find the first letters in the key in your ~/.ssh/id_rsa.pub? [17:47:15] yes--on my local machine. I am actually working from my tools account, so I don't have that file there. 
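The remote fix suggested at 17:30:12 can be sketched as a small helper that builds the Gerrit SSH URL and the two commands to run. The host, port 29418, user name and project are the ones quoted in the channel; the function names are hypothetical and the sketch only builds the command lists, it does not run git:

```python
def gerrit_remote_url(user, project, host="gerrit.wikimedia.org", port=29418):
    """Build the SSH URL for a Gerrit project, in the form quoted in the log."""
    return f"ssh://{user}@{host}:{port}/{project}"


def fix_remote_commands(user, project, remote="gerrit"):
    """Return the commands (as argv lists) discussed in the channel.

    First point the existing remote at Gerrit instead of the repo browser,
    then re-run the git-review setup step.
    """
    return [
        ["git", "remote", "set-url", remote, gerrit_remote_url(user, project)],
        ["git", "review", "-s"],
    ]


if __name__ == "__main__":
    for argv in fix_remote_commands("kmenger", "labs/toollabs"):
        print(" ".join(argv))
```

As the rest of the exchange shows, the URL alone is not enough: the SSH key on the machine running git also has to be registered in Gerrit.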
[17:51:22] kma500: That makes everything much more complicated :-), because then the key on Tools is used. *My* work pattern is that I test stuff on Tools and then copy them to my private machine and submit from there. You can do very fancy stuff with ssh key forwarding & Co., but that's on the far side of my horizon, I'm afraid. [17:53:10] ^d: uff. fixing... [17:53:22] <^d> Already fixed. [17:53:31] You mean it won't just magically work? :) Thanks for all your help. I will try your work pattern. [17:53:53] i see that now [17:54:03] it was a trailing comma? [17:54:16] <^d> :) [17:54:25] grumble grumble grumble [17:55:43] kma500: Actually, it can :-), but the magic isn't very transparent, so there's always a risk that your *private* key gets exposed to somewhere where you didn't expect it to be, so I shy away from that. [17:56:12] Jeff_Green: No, the problem was the commented "{", but uncommented "}" = unbalanced parentheses. [17:56:52] right but that's because puppet's parser is stupid [17:57:03] the typo was a , where there should have been a ; [17:57:06] scfc_de: see, this is why you should use password authentication :P [17:57:13] * YuviPanda adds some more smileys after that [17:58:44] Jeff_Green: No, the typo occured when you inserted a newline before "}" in line ... 19 (https://gerrit.wikimedia.org/r/#/c/77342/1/manifests/role/otrs.pp). [18:01:32] oh there were two typos, I see the second patch now. 
fail [18:30:12] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:26] (03PS1) 10Jgreen: add webserver::apache to iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/77363 [18:39:18] (03CR) 10jenkins-bot: [V: 04-1] add webserver::apache to iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/77363 (owner: 10Jgreen) [18:40:59] (03PS2) 10Jgreen: add webserver::apache to iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/77363 [18:42:20] (03CR) 10Jgreen: [C: 032 V: 032] add webserver::apache to iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/77363 (owner: 10Jgreen) [18:51:37] (03PS1) 10Jgreen: iodine switching from webserver::apache2 to webserver::apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/77365 [18:57:08] (03CR) 10Andrew Bogott: "Jenkins failure is fixed by this: https://gerrit.wikimedia.org/r/#/c/77364/1" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77332 (owner: 10Andrew Bogott) [18:57:09] (03CR) 10Jgreen: [C: 032 V: 032] iodine switching from webserver::apache2 to webserver::apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/77365 (owner: 10Jgreen) [18:57:41] (03PS1) 10Pyoungmeister: WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - 10https://gerrit.wikimedia.org/r/77366 [18:58:29] (03CR) 10jenkins-bot: [V: 04-1] WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - 10https://gerrit.wikimedia.org/r/77366 (owner: 10Pyoungmeister) [18:58:38] I know, jenkins. I know. [19:00:17] quit point out my faults, man, it ain't nice [19:00:20] pointing* [19:00:42] hmm, perhaps I shoudlnt' relay jenkins -1 for WIP things? 
[19:00:45] nah, too specific [19:03:21] (03PS1) 10Jgreen: fix OTRS apache vhost, enable ssl for iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/77367 [19:03:26] YuviPanda: nah, I want to see all of my pep8 failings :) [19:06:29] (03CR) 10Jgreen: [C: 032 V: 032] fix OTRS apache vhost, enable ssl for iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/77367 (owner: 10Jgreen) [19:10:37] Ryan_Lane, is it harmful to have the projectgid.rb fact present on production machines? [19:10:45] YuviPanda: Having a Work-In-Progress: True keyword might be a good idea. [19:10:55] Or some other way to programmatically mark commits as draft. [19:11:01] Besides, y'know, actual drafts. [19:12:12] Openstack's gerrit has a 'work in progress' feature, maybe there's a patch we could grab [19:13:01] Elsie: that was already wontfixed by ^demon|lunch [19:13:19] Was it? [19:13:21] andrewbogott: probably not [19:13:24] Elsie: yes [19:13:36] andrewbogott: does it need to be included? [19:13:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:47] Well, a link would be handy. :-) [19:14:22] Ryan_Lane, I'm going to rip out our code that installs custom facts and just let pluginsync do it. That means we'll get the same facts everywhere (which is probably a good thing, overall.) [19:14:32] ah [19:14:33] right [19:14:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [19:15:12] I'm pretty sure that fact just does a gid lookup via the system [19:15:20] so it should be fine [19:16:17] Anybody going to lunch soon? 
[19:16:30] (for people that are in the SF office that is) [19:17:46] (03PS1) 10Yuvipanda: Add initial classes for dynamic http routing instance in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77368 [19:18:09] preilly: oh hey [19:18:31] preilly: I remember a story you were telling me about an airline CEO flying as a flight attendant every once in a while to see if their customers were happy [19:18:47] preilly: do you happen to remember the CEO/airline's name? :) [19:19:36] preilly: likely soon, yes [19:19:47] Elsie: search bugzilla? :) [19:20:20] I'm pretty sure I overturned the wontfix. [19:20:25] !log installing mexia [19:20:48] Elsie: "overturrned" [19:20:49] hahaha [19:21:00] :-) [19:21:10] @paravoid — Ben Baldanza Spirit Airlines’ CEO did it [19:21:31] Logged the message, Master [19:21:44] https://bugzilla.wikimedia.org/show_bug.cgi?id=50842 [19:21:45] preilly: thanks! [19:22:05] I generalized the bug. I'm not sure a keyword has been rejected. [19:22:06] Elsie: get the work in progress stuff upstreamed and then it'll likely get considered [19:22:08] But maybe. [19:22:19] @paravoid — I'm also pretty sure that others have done it too I don't think that it was his idea to begin with [19:22:21] Well, we can upstream via Christian and Chad. :-) [19:22:23] otherwise it probably isn't going to happen [19:22:43] I think there's a legitimate bug. Whether anyone will ever have the time/inclination to fix it is another matter, as always. [19:23:38] Does anybody know if nginx has an easy way to enforce that a certain header be set on every request? [19:23:52] Try google? [19:24:29] I've been looking at http://wiki.nginx.org/HeadersManagement [19:24:41] Elsie: was that your serious response to me? [19:24:59] * preilly leave it to MZMcBride to troll me [19:25:20] Well, it would be the first place I would look. [19:25:54] That's fair [19:25:59] Usually there's a StackOverflow answer in the first few results with a usable answer... 
http://stackoverflow.com/questions/11973047/adding-and-using-header-http-in-nginx [19:26:44] Elsie: the thing is that I want to enforce that a header is present in a request [19:27:04] if it isn't there I want to not process the request [19:28:07] > If you do not explicitly set underscores_in_headers on;, nginx will silently drop HTTP headers with underscores (which are perfectly valid according to the HTTP standard). [19:28:12] That's fun. [19:28:35] LeslieCarr, can you read https://rt.wikimedia.org/Ticket/Display.html?id=2816 and fill me in on what's happening there? [19:28:35] Yeah that's a bit lame to say the least [19:28:50] It seems like that fact isn't installed (and doesn't compile) but is nonetheless referenced someplace... [19:28:54] Ryan_Lane: want to grab some food in a bit? [19:29:00] There's #nginx on freenode, apparently. [19:29:14] preilly: sure [19:29:39] preilly: yes, you can force a header on every request [19:29:49] we do so with X-forwarded-proto [19:30:28] proxy_set_header [19:31:05] though I guess that's only when proxying [19:31:19] I'm not sure about adding headers when not proxying [19:32:06] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100% [19:32:24] andrewbogott: Ryan_Lane +2 for https://gerrit.wikimedia.org/r/77368 [19:32:25] ? [19:32:26] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [19:33:44] YuviSplit: merged [19:33:47] @Elsie — I'm in that channel now [19:34:04] Ryan_Lane: ty! [19:34:06] yw [19:34:13] why no grrit-wm? [19:34:18] ah, netsplit? [19:34:22] preilly: Ah, nice. [19:35:02] Ryan_Lane: I want to force the request to have a header or not process it [19:35:53] preilly: and this is not through a proxy, I'm assuming? 
[19:36:13] !log dns update [19:36:24] Logged the message, Master [19:36:27] if not, I have no clue as I've only been using nginx as a proxy so far [19:36:28] Ryan_Lane: that's correct [19:36:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:36:44] It's okay I'll figure it out [19:36:51] * Ryan_Lane nods [19:36:55] let me know if you figure it out [19:37:00] andrewbogott: ok [19:37:01] I may need the same thing for web platform [19:37:10] and the new ops person there is switching to nginx [19:37:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [19:37:29] ah map $http_x_header $file_suffix { [19:37:29] default "2"; [19:37:30] OK "1"; [19:37:31] }; [19:37:45] LeslieCarr, actually, maybe it isn't referenced… yesterday mark convinced me that it is but I can no longer see where :) [19:37:55] huh? [19:38:33] it's referenced in templates/misc/initcwnd.erb [19:38:41] and installed with base.pp [19:39:58] Oh! Yesterday I started out talking about default_interface.pp and mark changed the subject to default_gateway without me noticing. Sneaky! [19:40:28] So… https://gerrit.wikimedia.org/r/#/c/76120/ <- that is what I am actually wondering about [19:41:29] hehe [19:41:31] (03Restored) 10Andrew Bogott: Remove default_interface fact [operations/puppet] - 10https://gerrit.wikimedia.org/r/76120 (owner: 10Andrew Bogott) [19:41:55] So, I'm confused, but… do you agree that default_interface.rb can be scrapped? [19:41:55] so it is needed [19:42:29] i do not agree [19:42:29] (03PS1) 10Cmjohnson: adding testsearch1001-3 to dhcpd and netboot.cfg [operations/puppet] - 10https://gerrit.wikimedia.org/r/77371 [19:42:40] ok... [19:42:41] it is referenced in a template that's installed on many machines [19:42:48] default_interface is? 
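The map block pasted at 19:37:29 picks a value based on a request header. The sketch below models that logic, plus the "refuse the request if the header is missing" behaviour preilly asked for, in plain Python. It is not nginx configuration. The header name X-Header is inferred from the $http_x_header variable, and exact string comparison is a simplification of how nginx matches map keys:

```python
def file_suffix(headers):
    """Model of the pasted map: "1" when X-Header is "OK", otherwise "2".

    A missing header is treated as an empty string and falls through to
    the default, as an unset variable would.
    """
    value = headers.get("X-Header", "")
    return "1" if value == "OK" else "2"


def should_process(headers, required="X-Header"):
    """Return True only if the required header is present and non-empty."""
    return bool(headers.get(required))


if __name__ == "__main__":
    for request_headers in ({"X-Header": "OK"}, {"X-Header": "other"}, {}):
        print(request_headers, file_suffix(request_headers), should_process(request_headers))
```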
[19:42:51] templates/misc/initcwnd.erb [19:42:55] oh wait [19:42:57] default interface [19:43:00] (03PS1) 10Ottomata: Access to stat1 for Kenan Wang. RT 5520 [operations/puppet] - 10https://gerrit.wikimedia.org/r/77372 [19:43:04] not default_gateway_interface ? [19:43:39] * andrewbogott 's head spins [19:44:54] LeslieCarr, https://gerrit.wikimedia.org/r/#/c/76120/ <- removes a file called default_interface.rb which I think you wrote [19:44:56] (03PS2) 10Cmjohnson: adding testsearch1001-3 to dhcpd and netboot.cfg [operations/puppet] - 10https://gerrit.wikimedia.org/r/77371 [19:45:18] And which doesn't parse... [19:45:20] oh look [19:45:23] i did add that [19:45:25] why did i do that [19:45:53] (03CR) 10Cmjohnson: [C: 032 V: 032] adding testsearch1001-3 to dhcpd and netboot.cfg [operations/puppet] - 10https://gerrit.wikimedia.org/r/77371 (owner: 10Cmjohnson) [19:46:13] oh looks like an earlier version [19:46:15] ok [19:46:20] (03PS2) 10Ottomata: Access to stat1 for Kenan Wang. RT 5520 [operations/puppet] - 10https://gerrit.wikimedia.org/r/77372 [19:46:27] (03CR) 10Ottomata: [C: 032 V: 032] Access to stat1 for Kenan Wang. RT 5520 [operations/puppet] - 10https://gerrit.wikimedia.org/r/77372 (owner: 10Ottomata) [19:46:49] (03CR) 10Lcarr: [C: 031] "correct, this file - default_interface.rb , is unused" [operations/puppet] - 10https://gerrit.wikimedia.org/r/76120 (owner: 10Andrew Bogott) [19:47:00] there we go [19:47:09] thanks! [19:47:42] Ottomata: do you want me to merge your changes? [19:48:00] i just did I think [19:48:31] oh..okay must've been convenient timing [19:49:14] (03CR) 10Andrew Bogott: [C: 032] Remove default_interface fact [operations/puppet] - 10https://gerrit.wikimedia.org/r/76120 (owner: 10Andrew Bogott) [19:50:57] LeslieCarr, as for what that RT ticket is actually about… I have a patch in the works that should fix it. 
[19:51:32] oh awesome [19:52:51] * ^d hands Ryan_Lane a small basket of puppet changes [19:53:37] Ryan_Lane, here's an easy one: https://gerrit.wikimedia.org/r/#/c/77331/ [19:59:18] (03PS1) 10Hashar: rake validate learned how to force colorization [operations/puppet] - 10https://gerrit.wikimedia.org/r/77375 [19:59:23] (03PS2) 10Hashar: rake validate learned how to force colorization [operations/puppet] - 10https://gerrit.wikimedia.org/r/77375 [19:59:25] (03PS1) 10Dzahn: add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 [20:00:10] (03PS2) 10Andrew Bogott: Move base class and subclasses into a 'base' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/77332 [20:00:11] (03PS2) 10Andrew Bogott: Make our checks for definitions a bit more explicit. [operations/puppet] - 10https://gerrit.wikimedia.org/r/77331 [20:00:12] (03PS1) 10Andrew Bogott: Turn on pluginsync. [operations/puppet] - 10https://gerrit.wikimedia.org/r/77378 [20:00:23] (03PS2) 10Dzahn: add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 [20:01:07] (03CR) 10Lcarr: [C: 031] "oo i love the proper global scoping!" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/77331 (owner: 10Andrew Bogott) [20:01:49] (03CR) 10Andrew Bogott: [C: 032] rake validate learned how to force colorization [operations/puppet] - 10https://gerrit.wikimedia.org/r/77375 (owner: 10Hashar) [20:02:19] \O/ [20:03:41] jenkins verified but "publish and submit" greyed out [20:04:18] (03PS3) 10Dzahn: add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 [20:04:19] rebasing [20:04:34] (03CR) 10Dzahn: [C: 032] add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 (owner: 10Dzahn) [20:04:45] (03CR) 10jenkins-bot: [V: 04-1] add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 (owner: 10Dzahn) [20:04:54] hashar: do we know more why it sometimes wants the rebases only seconds after initial submit? [20:05:11] rgohghghg [20:05:20] mutante, I think it's just a race, if something gets merged while submitting. [20:05:22] why does the change works for me but no there [20:05:34] andrewbogott: ahh. ok [20:05:38] hashar, the color thing? [20:06:30] wait, it's Verified -1 after rebase while it was +2 before :) [20:06:35] (03PS1) 10Hashar: Revert "rake validate learned how to force colorization" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77379 [20:06:38] andrewbogott: yup sorry :( [20:06:47] I got a different puppet 2.7 version on my VM [20:06:51] want me to merge the revert or do you want to tinker with it more? [20:07:00] so yeah just revert, which is https://gerrit.wikimedia.org/r/77379 [20:07:01] damn [20:07:12] I should build myself a VM with the material from apt.wm.o [20:07:22] na just revert [20:07:25] (03CR) 10Andrew Bogott: [C: 032] "Tragic!" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/77379 (owner: 10Hashar) [20:07:27] I will fill a bug about it for later on [20:07:27] paravoid: if you're still here: anything I/we can do about google marking a certain mailing list's messages as spam (specifically, wikitech-ambassadors)? [20:07:38] friday fix at 10pm are definitely a bad idea :-] [20:07:41] ah, this is "undefined method `console_color' for main:Object" right? [20:08:05] mutante, rebase yet again? [20:08:18] mutante: yup caused by a tragic change I made a minute ago [20:08:27] and social engineered andrew to merge in a hurry hehe [20:08:52] (03PS4) 10Dzahn: add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 [20:09:00] sorry mutante :( [20:09:05] hey, I read it! [20:09:08] heh, no problem :) [20:09:53] (03CR) 10Dzahn: [C: 032] add needed directory for RT-Shredder plugin to reactivate it (RT #5534) [operations/puppet] - 10https://gerrit.wikimedia.org/r/77377 (owner: 10Dzahn) [20:10:07] all good now [20:11:15] (03CR) 10Andrew Bogott: "What was generating the docs before? And, don't we need to turn that off at the same time?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77090 (owner: 10Hashar) [20:11:23] (03PS1) 10Jgreen: start puppetizing otrs exim config [operations/puppet] - 10https://gerrit.wikimedia.org/r/77380 [20:12:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:15:24] (03CR) 10Jgreen: [C: 032 V: 032] start puppetizing otrs exim config [operations/puppet] - 10https://gerrit.wikimedia.org/r/77380 (owner: 10Jgreen) [20:16:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.658 second response time [20:18:58] (03CR) 10Hashar: "This is merely a helper for local use. 
I copy pasted the shell part from the Jenkins script at https://integration.wikimedia.org/ci/job/op" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77090 (owner: 10Hashar) [20:21:11] (03CR) 10Andrew Bogott: [C: 032] role::nova: package[] -> Package[] [operations/puppet] - 10https://gerrit.wikimedia.org/r/77124 (owner: 10Hashar) [20:22:46] (03PS1) 10Jgreen: fix class name otrs --> role::otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77382 [20:23:06] and now I am really off *waves* [20:23:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:32] (03CR) 10Jgreen: [C: 032 V: 032] fix class name otrs --> role::otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77382 (owner: 10Jgreen) [20:28:00] (03PS1) 10Cmjohnson: removing colby bellin and blondel entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/77384 [20:28:59] hey, if my public key is ssh-rsa XXX== user@hostname, does user@hostname have to be the key name in puppet? 
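On the key question at 20:28:59: in the one-line OpenSSH public key format, the trailing user@hostname is a free-form comment and not part of the key material. Whether the Puppet manifest in this repo expects the key name to equal that comment is not answered in the log. The sketch below only shows how the three fields split apart; the function name is hypothetical and it assumes a plain .pub line with no leading options:

```python
def parse_public_key(line):
    """Split a one-line public key into (key_type, base64_blob, comment).

    The comment may contain spaces, so only the first two fields are split
    off; everything after them is returned as the comment.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected at least a key type and key data")
    key_type, blob = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    return key_type, blob, comment


if __name__ == "__main__":
    print(parse_public_key("ssh-rsa XXX== user@hostname"))
```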
[20:29:22] (03PS1) 10Jgreen: remove erroneous class call [operations/puppet] - 10https://gerrit.wikimedia.org/r/77385 [20:29:31] (03CR) 10Cmjohnson: [C: 032 V: 032] removing colby bellin and blondel entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/77384 (owner: 10Cmjohnson) [20:30:18] (03CR) 10Jgreen: [C: 032 V: 032] remove erroneous class call [operations/puppet] - 10https://gerrit.wikimedia.org/r/77385 (owner: 10Jgreen) [20:35:12] (03PS1) 10MaxSem: Update my SSH key [operations/puppet] - 10https://gerrit.wikimedia.org/r/77386 [20:35:31] hey, can someone review plz^^:) [20:35:42] (03CR) 10Ryan Lane: [C: 032] Provide reviewer counts per patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/76945 (owner: 10Demon) [20:44:35] (03PS1) 10Jgreen: more exim config for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77388 [20:45:41] (03CR) 10Jgreen: [C: 032 V: 032] more exim config for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77388 (owner: 10Jgreen) [20:49:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.288 second response time [20:51:05] Bah, MaxSem I'm trying to create a hangout and invite you but I don't know how to work this computamajig. Do you want to create one? [20:51:13] voice/face contact to confirm key change [20:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:54:01] andrewbogott, calling [20:55:17] (03CR) 10Andrew Bogott: [C: 032] "Confirmed via google hangout that this is the real Max." [operations/puppet] - 10https://gerrit.wikimedia.org/r/77386 (owner: 10MaxSem) [20:55:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.303 second response time [20:58:29] MaxSem, well, now that I've merged that… what's with the new key for 'Darrell'? [20:58:48] (03CR) 10Yuvipanda: "awwww, that's what you think! 
*evillaugh*" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77386 (owner: 10MaxSem) [20:59:02] andrewbogott, ?? [20:59:20] andrewbogott: re: keys and travelling, my keyes are in a truecrypt volume. I assume that is good enough? [21:01:42] (03PS1) 10Jgreen: otrs gets spamassassin, otrs system user [operations/puppet] - 10https://gerrit.wikimedia.org/r/77391 [21:03:01] (03CR) 10Jgreen: [C: 032 V: 032] otrs gets spamassassin, otrs system user [operations/puppet] - 10https://gerrit.wikimedia.org/r/77391 (owner: 10Jgreen) [21:03:34] yuvipanda, I /think/ so… ryan_lane, do you take any extraordinary security measures when traveling? [21:03:59] (03PS1) 10MaxSem: Rm dupe I accidentally created [operations/puppet] - 10https://gerrit.wikimedia.org/r/77393 [21:04:02] My prod key uses a passphrase and is on an encrypted drive… good enough? [21:04:32] (03CR) 10Andrew Bogott: [C: 032] "*blush*" [operations/puppet] - 10https://gerrit.wikimedia.org/r/77393 (owner: 10MaxSem) [21:05:36] jeff_green, want me to merge that? [21:05:49] andrewbogott: yespls [21:07:26] andrewbogott: do you not use full disk encryption? [21:07:33] Ryan_Lane, I do. [21:07:36] ah, ok [21:07:49] well, just let us know if anyone takes your laptop and gives it back [21:07:58] keep your laptop turned off [21:08:01] Yep, OK. [21:08:08] do you have two-factor enabled for google? [21:08:33] I don't. My staff account you mean? [21:08:45] you should! also for wikitech [21:08:51] I do for wikitech [21:09:12] I use thunderbird as my gmail client, will it be able to cope with two-factor? 
[21:10:06] * MaxSem (contribs | log | block) moved #wikimedia-operations to ##paranoia [21:10:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:10:58] andrewbogott: you'd set up an "app specific password" for thunderbird [21:11:07] which is what it says on the tin [21:11:19] * andrewbogott tries it [21:11:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.376 second response time [21:11:26] a one use password that you should only use for thunderbird. you'd also create one for eg: pidgin [21:11:29] have we already discussed rubber hose cryptanalysis? [21:11:35] :( [21:11:42] * andrewbogott thinks he will wait until a printer is nearby [21:11:50] !log dns update [21:12:15] andrewbogott: gpg encrypt the backup passwords and keep it on your laptop, as well :) [21:12:34] MaxSem, I think our goal is to protect ourselves from being hacked and not knowing it. If they beat a password out of me at least I'll know. [21:13:28] but you won't be able to do anything due to being in prison;) [21:13:46] MaxSem, I didn't say anyone else would know :( [21:14:07] (03PS1) 10Jgreen: grr exim4 != exim [operations/puppet] - 10https://gerrit.wikimedia.org/r/77394 [21:17:14] (03CR) 10Jgreen: [C: 032 V: 032] grr exim4 != exim [operations/puppet] - 10https://gerrit.wikimedia.org/r/77394 (owner: 10Jgreen) [21:31:30] hm… two-factor-auth is unrelated to the issue of laptop theft, right? Or should I be careful not to carry my phone in the same bag as my laptop? (I can't think how that matters, really.) [21:32:24] andrewbogott: well, if someone has your phone and your phone is locked with a good keyswipe/code, then they could easily get in to your google account, yeah [21:32:50] well, if they had your password (if you don't set lastpass/whatever to autofill) [21:33:06] I assume you mean 'not locked with a good...' 
[21:33:34] sure, intersperse/remove random negatives as needed ;) [21:33:43] * MaxSem simply does not use his WMF account from phone [21:33:57] But, right, my laptop won't know the password anyway. I guess since one of the factors is my head then the laptop+phone in bag issue is moot. [21:34:08] thats the idea [21:34:20] know+have [21:39:46] how long does it take for my ssh key I just added to wikitech to get to gerrit? [21:41:51] (03PS1) 10Tim Landscheidt: Grid Engine: Link to accounting instead of pulling it [operations/puppet] - 10https://gerrit.wikimedia.org/r/77452 [21:43:43] (03PS1) 10Yuvipanda: Implement instanceproxy replacement with lua [operations/puppet] - 10https://gerrit.wikimedia.org/r/77454 [21:43:48] andrewbogott: ^ [21:44:10] andrewbogott: probably needs work, also needs testing. [21:44:27] greg-g: it doesn't [21:44:34] greg-g: gerrit is separate, unfortunately [21:45:02] can someone with puppet experience look at https://gerrit.wikimedia.org/r/77454 [21:45:03] ? [21:45:04] Ryan_Lane: oh, 'twas confused then... [21:45:19] specifically, I am hardcoding a DNS resolver in there, I suppose I shouldn't do that [21:45:27] also perhaps different organizatio [21:45:27] n [21:45:39] greg-g: wikitech's ssh key interface syncs to labs [21:45:48] we have an open bug for this in gerrit [21:45:52] who knows if it'll ever get fixed [21:45:56] Ryan_Lane: :) [21:46:24] (03CR) 10Tim Landscheidt: "Should shave about 1 GB/5 minutes off disk IO :-)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/77452 (owner: 10Tim Landscheidt) [21:47:45] * yuvipanda pokes scfc_de with https://gerrit.wikimedia.org/r/77454 [21:48:43] yuvipanda: Way outside my comfort zone :-). [21:48:52] aww [21:49:19] also no way to get DNS ip address from facter :( [21:50:10] Which IP address do you mean? [21:50:18] Not 10.*? [21:50:27] 10.4.0.1; [21:50:55] Ah, you mean the DNS server? 
[21:51:11] scfc_de: yeah [21:51:18] scfc_de: nginx needs to be told where it is [21:51:21] explicitly [21:51:23] I can't find a global [21:51:28] and i see it being hard coded in other places [21:51:30] so I guess it is ok [21:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:04] Looking at base.pp, isn't that $::nameservers & Co. that gets inserted into templates/base/resolv.conf.erb? [21:53:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [21:53:35] looking [21:54:04] scfc_de: well, that sets /etc/resolv [21:54:10] and doesn't do that for labs [21:54:21] Not? [21:54:30] if $::realm != "labs" { [21:55:27] Where gets Labs's resolv.conf then set? [21:55:32] (03PS1) 10Jgreen: parameterize spamassassin and use it in role::otrs::mailserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/77456 [21:55:48] scfc_de: no idea :D [21:55:50] (03CR) 10jenkins-bot: [V: 04-1] parameterize spamassassin and use it in role::otrs::mailserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/77456 (owner: 10Jgreen) [21:56:00] scfc_de: I see a similar line [21:56:01] templates/nginx/sites/labs-proxy.erb: resolver 10.4.0.1; [21:56:01] "git grep resolv.conf" comes up empty. [21:56:09] scfc_de: that's the current instanceproxy [21:56:46] Ryan_Lane: Where does /etc/resolv.conf on Labs come from? Is it part of the initial instance image? 
[21:57:13] (03PS2) 10Jgreen: parameterize spamassassin and use it in role::otrs::mailserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/77456 [21:57:17] it comes from dhcp, I believe [21:57:41] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [21:57:41] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [21:57:41] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [21:57:41] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [21:57:41] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [21:57:41] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [21:58:03] yuvipanda: I believe you can refer to other files on the local file system in Puppet, if that helps. [21:58:18] (03CR) 10Jgreen: [C: 032 V: 032] parameterize spamassassin and use it in role::otrs::mailserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/77456 (owner: 10Jgreen) [21:58:23] scfc_de: I think current one is good enough, since that is what andrewbogott's initial code uses :D [21:58:59] Ryan_Lane: andrewbogott should I make this into a module, rather than just scattering them in the repo? [21:59:19] yuvipanda, probably. I haven't read it yet though :) [21:59:55] andrewbogott: can you tell me when you'll have time to look at it? Would help me figure out when to sleep [21:59:59] yuvipanda: Probably. [22:01:24] yuvipanda, not sure, my workday is close to its end. I may get to it this evening but you should not wait up for me right now at least. [22:01:40] andrewbogott: heh, okay. When do you arrive in hong kong? 
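Since /etc/resolv.conf on Labs is said to come from DHCP, one alternative to hardcoding the resolver would be to read the nameserver entries from that file. This is only a sketch of the parsing step, with a hypothetical function name and sample content; it is not how the labsproxy change does it, and the thread settles on keeping the hardcoded value:

```python
def nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines of resolv.conf content."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';' at the start).
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    # Hypothetical file content; 10.4.0.1 is the address quoted in the log.
    sample = "# generated by dhclient\nsearch example.invalid\nnameserver 10.4.0.1\n"
    print(nameservers(sample))
```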
[22:08:00] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [22:08:29] (03PS1) 10Jgreen: fix use of systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/77457 [22:09:40] PROBLEM - search indices - check lucene status page on testsearch1001 is CRITICAL: Connection refused [22:09:50] PROBLEM - spamassassin on iodine is CRITICAL: Connection refused by host [22:09:50] PROBLEM - search indices - check lucene status page on testsearch1002 is CRITICAL: Connection refused [22:10:00] PROBLEM - search indices - check lucene status page on testsearch1003 is CRITICAL: Connection refused [22:10:18] !log pooling ssl1007-9 [22:10:29] Logged the message, Master [22:10:55] (03CR) 10Jgreen: [C: 032 V: 032] fix use of systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/77457 (owner: 10Jgreen) [22:13:00] PROBLEM - Lucene on testsearch1001 is CRITICAL: Connection refused [22:13:10] PROBLEM - Lucene on testsearch1002 is CRITICAL: Connection refused [22:13:20] PROBLEM - Lucene on testsearch1003 is CRITICAL: Connection refused [22:18:18] (03PS2) 10Yuvipanda: Add labsproxy module for replacing instanceproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/77454 [22:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:23:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [22:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Fri Aug 2 22:32:36 UTC 2013 [22:33:00] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [22:34:37] !log depooling ssl1001-4 to measure load on new hardware [22:34:48] Logged the message, Master [22:41:41] !log depooling ssl1005-6 to further test load [22:41:52] Logged the message, Master [22:50:48] !log deppoling ssl1007 for more testing [22:50:58] Logged the message, Master [22:54:37] !log 
repooling ssl1001-1007
[22:54:47] Logged the message, Master
[22:55:48] hmmmm, since the past hour(s?) my bot edits are not being marked as bot edits
[22:56:05] * aude changed absolutely nothing in my bot code and it worked earlier
[22:56:30] are there any issues that might be related?
[22:57:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:58:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.205 second response time
[22:58:57] it has clearly grown sentient
[22:59:21] aude: Have you tried turning it off and on again?
[22:59:47] logging out and in again?
[22:59:52] unplugging it
[23:00:05] aude: Example edit?
[23:00:09] there's a little reset button on the bottom
[23:02:07] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[23:02:21] Wouldn't keeping your laptop off just make it easier to mirror the drive?
[23:02:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:03:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[23:03:43] Elsie: http://www.wikidata.org/w/index.php?title=Q1133516&curid=1080380&diff=63303474&oldid=63303469
[23:04:16] my last edits from earlier bot run were all marked as bot
[23:04:44] Let's look...
[23:05:35] i have assert=bot
[23:05:41] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[23:06:22] http://p.defau.lt/?Dbp5i1jV3aWAIJUs7wlViw
[23:06:50] yep
[23:07:06] Q7946865 was from 4-5 hours ago
[23:07:40] ori-l: The server admin log looks drunk. It auto-links "2013" as a Gerrit changeset.
[23:08:57] aude: Sorry, dunno.
[23:09:17] You're sure you're specifying bot=1?
[23:09:29] http://dpaste.com/1328571/ is my bot edit
[23:09:39] which has not changed from 5 hours ago
[23:09:47] Elsie: yes i have bot=1
[23:10:52] https://www.wikidata.org/w/index.php?title=Special%3ALog&type=&user=&page=User%3AAudeBot&year=&month=-1&tagfilter=
[23:11:00] Looks like your bot's user groups haven't changed.
[23:11:06] And the bot user right is still assigned to bots.
[23:11:13] https://www.wikidata.org/wiki/Special:ListGroupRights
[23:11:18] * aude tries to edit enwiki
[23:11:26] see if it's wikidata specific
[23:13:20] (CR) Tim Landscheidt: "Tested on toolsbeta." [operations/puppet] - https://gerrit.wikimedia.org/r/77144 (owner: Tim Landscheidt)
[23:13:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:13:42] (CR) Tim Landscheidt: "Tested on toolsbeta." [operations/puppet] - https://gerrit.wikimedia.org/r/77234 (owner: Tim Landscheidt)
[23:13:54] (CR) Tim Landscheidt: "Tested on toolsbeta." [operations/puppet] - https://gerrit.wikimedia.org/r/77452 (owner: Tim Landscheidt)
[23:14:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[23:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:27:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time
[23:27:33] (PS3) Yuvipanda: Add labsproxy module for replacing instanceproxy [operations/puppet] - https://gerrit.wikimedia.org/r/77454
[23:30:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Fri Aug 2 23:32:42 UTC 2013
[23:33:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[23:33:41] PROBLEM - Puppet
freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[23:50:04] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[23:52:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:53:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[23:57:04] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[23:59:46] (Abandoned) Reedy: Add DataTypes as requirement for Wikibase [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/74291 (owner: Jeroen De Dauw)