[00:10:05] here is 2AM. Bye guys!
[00:15:36] here it's over 4AM
[00:15:41] so what?:P
[01:23:25] (PS1) Springle: warm up db1060 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106856
[01:23:36] (CR) jenkins-bot: [V: -1] warm up db1060 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106856 (owner: Springle)
[01:27:31] (PS2) Springle: warm up db1060 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106856
[01:27:49] (CR) Springle: [C: 2] warm up db1060 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106856 (owner: Springle)
[01:27:58] (Merged) jenkins-bot: warm up db1060 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106856 (owner: Springle)
[01:29:42] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1060 in s2'
[01:29:49] Logged the message, Master
[02:14:22] !log LocalisationUpdate completed (1.23wmf9) at Sat Jan 11 02:14:21 UTC 2014
[02:14:29] Logged the message, Master
[02:28:04] !log LocalisationUpdate completed (1.23wmf10) at Sat Jan 11 02:28:04 UTC 2014
[02:28:11] Logged the message, Master
[02:47:58] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jan 11 02:47:58 UTC 2014
[02:48:04] Logged the message, Master
[03:21:48] !log Reloading zuul to deploy I1053812d8acfe9
[03:21:55] Logged the message, Master
[04:32:58] PROBLEM - MySQL Processlist on db1021 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 154 statistics
[04:34:58] RECOVERY - MySQL Processlist on db1021 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 3 statistics
[04:44:14] (CR) Ori.livneh: [C: -1] "The Upstart job configuration contains erb syntax (it renders @deploy_path) so it has to be a template, which means it has to be moved to " [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[05:10:03] (PS1) Springle: db1060 to full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106862
[05:10:45] (CR) Springle: [C: 2] db1060 to full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106862 (owner: Springle)
[05:10:53] (Merged) jenkins-bot: db1060 to full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106862 (owner: Springle)
[05:11:47] !log springle synchronized wmf-config/db-eqiad.php 'db1060 to full steam'
[05:11:56] Logged the message, Master
[05:18:50] (PS1) Springle: assign db1023 to s6 [operations/puppet] - https://gerrit.wikimedia.org/r/106863
[05:20:07] (CR) Springle: [C: 2] assign db1023 to s6 [operations/puppet] - https://gerrit.wikimedia.org/r/106863 (owner: Springle)
[05:27:30] !log xtrabackup clone db1022 to db1023
[05:27:35] Logged the message, Master
[05:37:08] PROBLEM - mysqld processes on db1023 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[06:55:49] (PS5) BryanDavis: Kibana puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/106169
[07:40:53] good evening, would it be OK to troubleshoot an exception on testwiki by ssh to mw1017 and modifying a file in the Flow extension?
[07:42:12] * spage glad to see everybody's at the disco on Friday night :)
[07:45:59] :-D
[07:47:38] have you tested it locally first?
[07:47:50] and. what's the change?
[07:58:14] apergos: a file has Fatal error: Class 'Flow\Model\FlowException' not found because it doesn't have "use Flow\Exception\FlowException;". So that's masking the actual problem.
[07:58:34] ok
[07:58:57] so you're going to fix that? go ahead then while I'm here
[07:59:19] sp
[07:59:22] er
[07:59:24] spage:
[07:59:28] actually, the stack trace shows what the actual problem likely is, so I don't need to unmask it. No worries, go back to the dance floor :)
[07:59:39] heh
[08:00:02] if only! nah, I'm saturday morning hacking... in my pjs yet
[08:31:57] (PS1) Faidon Liambotis: librensm: adjust syslog blacklist [operations/puppet] - https://gerrit.wikimedia.org/r/106870
[08:32:14] (CR) Faidon Liambotis: [C: 2 V: 2] librensm: adjust syslog blacklist [operations/puppet] - https://gerrit.wikimedia.org/r/106870 (owner: Faidon Liambotis)
[08:48:22] thanks Tim Landscheidt to give me shell access
[09:17:02] (PS1) Faidon Liambotis: librenms: minor fixes to phpdump [operations/puppet] - https://gerrit.wikimedia.org/r/106876
[09:21:40] (CR) Ori.livneh: [C: 1] librenms: minor fixes to phpdump [operations/puppet] - https://gerrit.wikimedia.org/r/106876 (owner: Faidon Liambotis)
[09:35:28] (CR) Faidon Liambotis: [C: 2] librenms: minor fixes to phpdump [operations/puppet] - https://gerrit.wikimedia.org/r/106876 (owner: Faidon Liambotis)
[09:35:30] (PS1) Faidon Liambotis: librenms: another syslog blacklist entry [operations/puppet] - https://gerrit.wikimedia.org/r/106877
[09:35:47] (CR) Faidon Liambotis: [C: 2] librenms: another syslog blacklist entry [operations/puppet] - https://gerrit.wikimedia.org/r/106877 (owner: Faidon Liambotis)
[10:36:48] PROBLEM - SSH on eeden is CRITICAL: Server answer:
[10:37:48] RECOVERY - SSH on eeden is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:51:59] (PS1) Tinaj1234: Changed date format in l10nupdate-1 [operations/puppet] - https://gerrit.wikimedia.org/r/106892
[16:21:04] (PS1) Faidon Liambotis: librenms: workaround JunOS stupidness [operations/puppet] - https://gerrit.wikimedia.org/r/106895
[16:21:06] (PS1) Faidon Liambotis: librenms: make main class' "config" option simpler [operations/puppet] - https://gerrit.wikimedia.org/r/106896
[16:24:07] (CR) Faidon Liambotis: [C: 2] librenms: workaround JunOS stupidness [operations/puppet] - https://gerrit.wikimedia.org/r/106895 (owner: Faidon Liambotis)
[16:24:16] (CR) Faidon Liambotis: [C: 2] librenms: make main class' "config" option simpler [operations/puppet] - https://gerrit.wikimedia.org/r/106896 (owner: Faidon Liambotis)
[18:44:23] (PS1) Stwalkerster: dynamicproxy: Pass through existing XFF data too [operations/puppet] - https://gerrit.wikimedia.org/r/106907
[19:01:02] (PS9) Matanya: svn: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/100760
[19:04:28] PROBLEM - Varnish HTCP daemon on cp1055 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:04:28] PROBLEM - Varnish HTTP text-backend on cp1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:04:28] PROBLEM - Varnish traffic logger on cp1055 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:32:49] (PS1) Odder: Set $wgExportFromNamespaces to true on MediaWiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106911
[19:33:58] (CR) jenkins-bot: [V: -1] Set $wgExportFromNamespaces to true on MediaWiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106911 (owner: Odder)
[19:35:24] ouch
[19:36:30] (PS2) Odder: Set $wgExportFromNamespaces to true on MediaWiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106911
[19:42:59] twkozlowski: pm?
[20:07:58] PROBLEM - Varnish traffic logger on cp1066 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:08:28] PROBLEM - Varnish HTTP text-backend on cp1066 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:08:38] PROBLEM - Varnish HTCP daemon on cp1066 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:13:59] andrewbogott: around?
[20:14:23] matanya, what's up?
[20:14:27] hi there
[20:14:57] i'm planning to start a kind of a big change and want your opinion before i start
[20:15:16] andrewbogott: it is converting manifest/webserver into a module.
[20:16:18] RECOVERY - Varnish HTTP text-backend on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.006 second response time
[20:16:28] RECOVERY - Varnish HTCP daemon on cp1066 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[20:16:48] RECOVERY - Varnish traffic logger on cp1066 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:16:49] matanya, what's your email?
[20:18:14] ok, I just sent you selections from a recent email discussion about this...
[20:18:38] I think that moving webserver into a module is a good first step for the topic of that email.
[20:18:54] But probably I won't merge your work directly but will use it as a base for a more dramatic reorg.
[20:19:00] (Unless you want to do that too :) )
[20:19:15] I started that conversation but then got bogged down with labs migration stuff so haven't gotten back to it.
[20:19:25] oh, lengthy theads :)
[20:19:34] *r
[20:20:12] The upshot is that we're probably going to keep alive the webserver classes. So that's good new for your plans...
[20:20:29] I haven't thought a whole lot about how it should be rearranged. It's pretty baffling at the moment.
[20:20:54] ok, i'll take this under my hands. i'll see what i can do here. any other stuff i need to know before i take this any further?
[20:21:14] I don't think so… just that it'll be complex and controversial :)
[20:21:47] oh, that is ok. stuff get better with controversial
[20:22:20] sometimes :)
[20:28:16] andrewbogott: HHVM?
[20:28:29] hiphopvm
[20:28:33] Facebook's php engine
[20:29:27] oh that. do we use it?
[20:30:17] we want to
[20:30:38] any page on why?
[20:31:43] Not sure. I think it's always been a long-term goal, not sure if much progress has been made.
[20:31:49] are you not on wikitech-l?
[20:31:56] It's rumored to be much faster.
[20:32:59] MaxSem: i was, but way to many mailing-lists cluttered my mail
[20:33:25] i reduced it to the minimum 56 needed
[20:33:37] and that is also too many
[20:36:13] You are on 56 mailing lists?!
[20:36:44] sadly, yes
[20:37:02] not all wiki related though :)
[20:37:23] I can think of three I'm on :-)
[20:38:38] the more buttons you have the more mailing lists you are on
[20:40:55] I have no idea how many buttons I have
[20:41:03] my keyboard has about 100 I guess
[20:41:19] do I count my shitty phone too? :-P
[20:41:51] LOL
[20:42:28] * twkozlowski winks
[21:19:01] !log ytterbium: CPU %user spike starting 19:40, gerrit very slow.
[21:19:07] Logged the message, Master
[21:19:36] LeslieCarr: ^
[21:23:28] RECOVERY - Varnish traffic logger on cp1055 is OK: PROCS OK: 2 processes with command name varnishncsa
[21:24:18] RECOVERY - Varnish HTCP daemon on cp1055 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[21:24:18] RECOVERY - Varnish HTTP text-backend on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.002 second response time
[21:29:41] I just got a page for gerrit
[21:29:52] timeout
[21:31:30] If I pull from gerrit.wikimedia.org: "Timeout, server gerrit.wikimedia.org not responding.", "fatal: The remote end hung up unexpectedly"
[21:32:02] yeah
[21:32:31] "ssh scfc@gerrit.wikimedia.org -p 29418" works.
[21:34:07] guess I'll restart gerrit
[21:34:11] scfc_de: Repeatedly? Or just intermittently?
[21:34:17] it's certainly hogging the cpu over there
[21:34:22] apergos: Which process(es) are taking all the CPU?
[21:34:28] well
[21:34:33] or maybenot
[21:34:35] java
[21:34:49] load is starting to drop, I'll wait for a bit
[21:35:04] https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=ytterbium.wikimedia.org&m=cpu_report&s=by+name&mc=2&g=cpu_report&c=Miscellaneous+eqiad
[21:35:07] Weird.
[21:35:56] Gloria: Now "git pull" succeeded, so: intermittently.
[21:36:01] mm
[21:36:36] Yeah, I had weirdness earlier. I think it's just overloaded and consequently slow.
[21:38:28] undecided
[21:38:34] restart or leave alone?
[21:39:00] Now works for me repeatedly, so leave alone?
[21:39:25] The linked graph is interesting.
[21:39:45] well my thinking is more like this: rather have demon look at it if it's actually a problem (and I am not really here to babysit it, it's late night for me)
[21:39:58] so if it's mostly limping along I will let it alone for now
[21:40:27] Does jenkins and such run on that same host?
[21:40:30] Or does it run elsewhere?
[21:41:20] elsewhere
[21:41:28] Hmm, there went that theory.
[21:48:13] Haha, wow siebrand: https://gerrit.wikimedia.org/r/#/dashboard/92
[21:48:44] I filed https://bugzilla.wikimedia.org/show_bug.cgi?id=59957