[00:00:00] ori-l: although another reason, was because we arn't sure how to measure our memcache usage if its mixed in with the standard servers. With dedicated memcache instances we can use the stats generated by them to extrapolate what kind of memory usage we would have on a larger deployment [00:00:15] mem/cpu/etc. [00:00:17] ah, that makes sense [00:00:54] PiRSquared: I'm not really sure what the protocol is for resolving such issues -- presumably you need to pick a suffix for dupes [00:00:57] <^demon|away> ori-l: Did we ever figure out what magic gnomes were increasing memc usage last time around? [00:01:11] ^demon|away: yeah, it was a wikibase config object [00:01:18] ori-l: how about "/old" or "broken" or something? [00:01:20] <^demon|away> Ah, that sites thing, yes. [00:01:30] It doesn't really matter, as long as you can restore the content of those pages [00:01:36] <^demon|away> werdna: Oh, and while we're sorta on the subject...what kind of unholy global-scoped hell is this container.php? :D [00:02:15] ^demon|away: Well, it's not included globally scoped [00:02:16] ask ebernhardson [00:02:30] <^demon|away> Yeah, that's completely unclear unless you start grepping where it's used. [00:02:34] <^demon|away> That's scary as hell. [00:02:37] <^demon|away> :D [00:02:47] but basically the idea is that you initialise Flow by reading from the container [00:02:53] get all the objects you need [00:03:02] namespaces! [00:03:06] and then pass them up the stack to get somewhere [00:03:32] ebernhardson: in that case, too, you probably explicitly don't want an existing MC host [00:03:48] ori-l: is there another channel for requests like mine (angwiki)? I'm sorry for bothering you... [00:04:51] PiRSquared: Ok, I used '-old' as the suffix [00:04:58] okay [00:04:59] let me paste the output, so you know which pages were affected [00:05:14] Thanks. [00:05:23] PiRSquared: https://dpaste.de/WOD0/raw [00:06:08] ebernhardson: let me poke around for a sec to see if there's an existing node that would be an obvious fit for this purpose [00:06:17] Thanks so much, ori-l. [00:06:17] if not, you'll probably want to file an RT ticket [00:08:31] ebernhardson: actually, you should just get a dedicated host for this; I think there are a bunch of spares, so there's no reason to pile this on a host that is already doing something else. That way system metrics will correspond exclusively to your usage [00:08:44] so file an RT ticket and ask for one [00:09:22] and poke RobH about it when he's around :P [00:09:35] <^demon|away> ori-l: Bunch is relative :) https://wikitech.wikimedia.org/wiki/Server_Spares [00:09:46] <^demon|away> But yeah, RT is the way to go. [00:09:55] well, there arent a tton of spares [00:09:59] there are spares to go around yes [00:10:05] ebernhardson: I see the conversation moved on a bit, but I was going to suggest either an email to ops@ or an RT ticket outlining your needs and we can follow up from there. [00:10:08] but with the tampa migration, we have to be reasonable. [00:10:15] rt procurement ticket is best yes [00:10:39] ^demon|away: ah, I forgot about that page, thanks [00:10:46] <^demon|away> yw [00:13:13] (03CR) 10Chad: "This should be able to go in now now that the exception issue is resolved." 
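On the dedicated-memcache reasoning at the top of this log: the whole point of separate instances is that memcached's own stats counters then describe only the workload being measured and can be scaled up for capacity planning. A minimal sketch of pulling those counters, assuming a hypothetical test host and that netcat is installed:

    # "stats" dumps the counters, "quit" closes the connection so nc exits;
    # keep only the fields relevant to memory/traffic extrapolation.
    printf 'stats\nquit\n' | nc mc-test.example.wmnet 11211 | tr -d '\r' \
        | awk '$2 ~ /^(bytes|curr_items|cmd_get|cmd_set|get_hits|get_misses)$/ {print $2, $3}'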
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/93622 (owner: 10Chad) [00:14:04] ebernhardson: what you missed: https://dpaste.de/iE5t/raw [00:32:46] !log stopping replication on sanitarium db1054:3308 and labsdb1002:3308 while restoring dewiki to labs [00:33:01] Logged the message, Master [00:39:52] (03CR) 10Aklapper: "Just FYI, keeping audit (for admins like me), metrics (for Quim Gil and the Tech Community Metrics folks) and bugzilla_report (for everybo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/94075 (owner: 10Dzahn) [00:49:31] keen: what exactly is the issue? [00:49:58] jasper_deng: http://paste.lopsa.org/133 [00:50:17] keen: I see that, but I see no real issue w/ that [00:50:23] just passing it along really. [00:50:42] randy seems to think it was enough of a problem to chase down..but I dont know what the original issue was. [00:51:03] (03CR) 10Bsitu: Enable Flow discussions on a few test wiki pages (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [00:52:19] looks like matt walker has since responded though, so I'll leave it alone now. ;) [00:53:01] werdna: Security and ops review are incomplete. December 4 seems unlikely. [00:53:46] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (218292) [00:56:26] * keen . o O ( my guess would be that he clicked a donate link that landed him on the "anti spam and abuse statement" instead of a page where he could actually donate... if I follow the his "success" at the bottom, I can arrive at an actual donation form.. ) [00:58:20] ori-l, did you add an ability to do "vagrant dbupdate" or something like that by any chance? easier than to do vagrant ssh / php update.php :) [00:59:59] yurik-road: mediawiki::extension types take a boolean needs_update parameter that will automatically run update.php when the extension is enabled [01:00:17] but there is no facility for generic, unscheduled db updates [01:01:08] ori-l,the most common case i have is doing various "git pull" in core and extensions, and having to vagrant ssh to do the php update [01:01:42] it would be great if we have something like "vagrant pull-and-update" :) [01:02:43] yurik-road: easy to do; 'vagrant run-tests' runs core's phpunit tests; it's implemented in lib/mediawiki-vagrant/run-tests.rb [01:02:56] you could follow that example to implement pull-and-update [01:03:16] which would go through all mediawiki core & extensions, do git pull for master branch (not sure if git pull master is the right command), and do php update [01:03:23] note that it invokes /usr/local/bin/run-mediawiki-tests [01:03:36] oki, might have a crack at it :) [01:03:41] which is just: [01:04:02] . /etc/profile.d/puppet-managed/set_mw_install_path.sh ; cd "$MW_INSTALL_PATH" ; php tests/phpunit/phpunit.php --testdox "$@" [01:05:17] yurik-road: sorry, that's lib/mediawiki-vagrant/run-tests.rb ; i had an old branch checked out [01:07:47] ori-l, forgot to tell you btw - what do you think about renaming all the *-role into role-* ? this would allow quick autocomplete :) [01:08:04] more importantly it would group them together in the help list [01:08:30] sure, especially if we retained the older syntax as a fallback [01:21:47] (03CR) 10Spage: "Why I'm guessing that officewiki should be a separate database." 
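On the "vagrant pull-and-update" idea above, which yurik goes off to write: a rough sketch of the helper script such a plugin command could shell out to, modeled on the run-mediawiki-tests one-liner quoted just above. The script name, git options and the update.php --quick flag are assumptions about how one might do it, not what was actually committed:

    #!/bin/bash
    # Hypothetical /usr/local/bin/run-mediawiki-update: pull master in core and
    # in every checked-out extension, then run the schema updater once.
    . /etc/profile.d/puppet-managed/set_mw_install_path.sh
    cd "$MW_INSTALL_PATH" || exit 1
    git pull --ff-only origin master
    for ext in extensions/*/; do
        [ -d "$ext/.git" ] && ( cd "$ext" && git pull --ff-only origin master )
    done
    php maintenance/update.php --quick

A pull-and-update.rb under lib/mediawiki-vagrant/ would then invoke it the same way run-tests.rb invokes run-mediawiki-tests.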
(031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [01:28:07] (03CR) 10Aude: Enable Flow discussions on a few test wiki pages (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [02:08:29] !log LocalisationUpdate completed (1.23wmf4) at Wed Nov 27 02:08:28 UTC 2013 [02:08:46] Logged the message, Master [02:15:19] !log LocalisationUpdate completed (1.23wmf5) at Wed Nov 27 02:15:19 UTC 2013 [02:15:34] Logged the message, Master [02:24:51] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [02:28:51] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (203586) [02:29:51] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [02:39:53] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (207734) [02:40:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 27 02:40:52 UTC 2013 [02:41:07] Logged the message, Master [02:43:53] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [02:47:53] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (223879) [02:55:03] PROBLEM - Puppet freshness on professor is CRITICAL: No successful Puppet run for 0d 18h 3m 27s [03:01:23] PROBLEM - Disk space on db1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:21:54] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [03:27:54] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202245) [03:30:54] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [03:35:00] ori-l, almost done with the script (had to step away for a bit) - but it fails because ssh doesn't pass in my credentials. Is there an option to pipe my creds to vagrant? 
[03:50:52] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202285) [03:51:52] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [04:26:25] (03PS2) 10MZMcBride: Create "Draft" namespace on the English Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 [04:37:35] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202729) [04:47:35] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [04:54:35] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (203247) [04:56:36] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [04:59:36] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (205451) [05:15:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [05:18:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201108) [05:20:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [05:25:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (210614) [05:55:25] PROBLEM - Puppet freshness on professor is CRITICAL: No successful Puppet run for 0d 21h 3m 49s [06:53:01] (03PS1) 10ArielGlenn: depool db1019 (s3) temporarily for lvm resize [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97888 [06:59:08] (03CR) 10ArielGlenn: [C: 032] depool db1019 (s3) temporarily for lvm resize [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97888 (owner: 10ArielGlenn) [07:01:15] !log ariel updated /a/common to {{Gerrit|I4372bb602}}: depool db1019 (s3) temporarily for lvm resize [07:01:32] Logged the message, Master [07:02:26] !log ariel synchronized wmf-config/db-eqiad.php 'depool db1019 (s3) temporarily for lvm resize' [07:02:41] Logged the message, Master [07:10:36] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [07:13:36] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202900) [07:33:36] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [07:36:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (211383) [07:38:26] !log rebooting db1019 after kernel upgrade, fix for broken xfs_growfs [07:38:41] Logged the message, Master [07:39:43] PROBLEM - Host db1019 is DOWN: PING CRITICAL - Packet loss = 100% [07:41:13] RECOVERY - Host db1019 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms [07:53:34] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [07:58:08] (03PS1) 10ArielGlenn: warm up db1019 (s3) after lvm resize [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97890 [07:59:01] (03CR) 10ArielGlenn: [C: 032] warm up db1019 (s3) after lvm resize [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97890 (owner: 10ArielGlenn) [07:59:21] !log ariel updated /a/common to {{Gerrit|I50354e622}}: warm up db1019 (s3) after lvm resize 
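For context on the depool/reboot/repool sequence above: the underlying maintenance is an online LVM extend followed by an XFS grow, which only works once the fixed kernel is running. A minimal sketch, with the volume group, logical volume and mount point names made up for illustration:

    # Grow the logical volume holding the MySQL data, then grow the
    # filesystem into the new space; xfs_growfs operates on the mounted fs.
    lvextend -L +100G /dev/tank/mysqldata
    xfs_growfs /srv/sqldata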
[07:59:36] Logged the message, Master [08:00:03] !log ariel synchronized wmf-config/db-eqiad.php 'warm up db1019 (s3) aftr lvm resize' [08:00:19] Logged the message, Master [08:05:42] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [08:15:42] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201108) [08:16:32] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [08:46:50] (03CR) 10Ori.livneh: "Ottomata: OK, no objection then." [operations/puppet] - 10https://gerrit.wikimedia.org/r/94169 (owner: 10Ottomata) [08:47:36] (03CR) 10Ori.livneh: "Yay! Thanks!" [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 (owner: 10Ottomata) [08:48:16] PROBLEM - Disk space on labstore4 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=98%): /exp/public/datasets 0 MB (0% inode=98%): /exp/public/keys 0 MB (0% inode=98%): /exp/public/repo 0 MB (0% inode=98%): [08:49:29] ahhh [08:51:06] apergos: paravoid: labstore4 out of disk space :/ /ext/public/datasets filled :/ [08:51:19] not sure what that machine is, my labs project has labstore1 [08:51:26] uugghhh [08:51:48] guess we need to clear some things [08:51:52] lemme look [09:17:46] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [09:29:57] found it, now how to fix... [09:34:16] RECOVERY - Disk space on labstore4 is OK: DISK OK [09:48:30] PROBLEM - RAID on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:48:57] !log there was no mount /srv/pagecounts on labstore4, so rsync to /exp/pagecounts wrote to and filled /; did the mkdir and now things seem ok [09:49:11] Logged the message, Master [09:49:50] PROBLEM - DPKG on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:49:50] PROBLEM - Disk space on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:50:31] PROBLEM - mysqld processes on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:51:30] PROBLEM - RAID on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:20] PROBLEM - SSH on db9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:53:21] PROBLEM - MySQL disk space on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:30] PROBLEM - puppet disabled on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:30] RECOVERY - mysqld processes on db9 is OK: PROCS OK: 1 process with command name mysqld [09:53:40] RECOVERY - DPKG on db9 is OK: All packages OK [09:56:31] PROBLEM - mysqld processes on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:56:40] RECOVERY - Disk space on db9 is OK: DISK OK [09:59:20] RECOVERY - SSH on db9 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [09:59:50] PROBLEM - DPKG on db9 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:01:15] apergos: oooh, pagecounts got done? 
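The labstore4 incident above (an rsync destination that was not actually mounted, so the copy filled /) is the classic case for guarding batch copies with a mountpoint check. A sketch, with the destination path taken from the log and the source left as a placeholder:

    # Refuse to sync unless the destination really is a mounted filesystem,
    # so a missing mount can't silently fill the root disk again.
    DEST=/exp/pagecounts
    SRC=/path/to/pagecounts/    # placeholder; the real source isn't shown in the log
    mountpoint -q "$DEST" || { echo "$DEST is not mounted, aborting" >&2; exit 1; }
    rsync -a "$SRC" "$DEST/"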
[10:01:16] nice [10:01:37] I think they've been there for some time [10:01:48] there was just a hiccup on labstore4 this time, not sure why [10:01:50] RECOVERY - DPKG on db9 is OK: All packages OK [10:01:51] hmm, I remember my patch got reverted [10:02:10] RECOVERY - MySQL disk space on db9 is OK: DISK OK [10:02:20] RECOVERY - puppet disabled on db9 is OK: OK [10:02:21] RECOVERY - mysqld processes on db9 is OK: PROCS OK: 1 process with command name mysqld [10:08:17] !log shot some old puppet processes hogging memory on db9 (from march and earlier) [10:08:30] Logged the message, Master [10:17:32] (03PS1) 10ArielGlenn: db1019 (s3) back to full weight in pool [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97896 [10:17:59] YuviPanda: that was due to using nfs instead of rsyncing to the remote server [10:18:05] right [10:18:07] that was fixed the same or the next day [10:18:10] aaah [10:18:15] i didn't know that :) [10:18:19] thanks! [10:18:25] I didn't know you didn't know :-) [10:19:01] (03CR) 10ArielGlenn: [C: 032] db1019 (s3) back to full weight in pool [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97896 (owner: 10ArielGlenn) [10:19:27] !log ariel updated /a/common to {{Gerrit|If5ebd6194}}: db1019 (s3) back to full weight in pool [10:19:43] Logged the message, Master [10:20:14] !log ariel synchronized wmf-config/db-eqiad.php 'db1019 (s3) back to full weight in the pool' [10:20:27] Logged the message, Master [10:50:18] (03CR) 10Faidon Liambotis: [C: 031] "FWIW, this is good to go, I'm just waiting for the dependency." [operations/puppet] - 10https://gerrit.wikimedia.org/r/97004 (owner: 10Dr0ptp4kt) [10:53:57] PROBLEM - Host ssl1 is DOWN: PING CRITICAL - Packet loss = 100% [10:55:07] (03CR) 10Faidon Liambotis: [C: 032] Zero: Changed 470-01 to whitelist all languages [operations/puppet] - 10https://gerrit.wikimedia.org/r/97860 (owner: 10Yurik) [10:55:42] (03CR) 10Faidon Liambotis: [C: 032] Serve gdash.wikimedia.org on misc varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/97693 (owner: 10Ori.livneh) [10:55:56] thanks [10:56:07] RECOVERY - Host ssl1 is UP: PING OK - Packet loss = 0%, RTA = 35.44 ms [11:00:07] works, too [11:00:16] mind if i do the dns change? [11:01:22] (03CR) 10Faidon Liambotis: [C: 032] gdash.wm.o: noc -> misc varnish [operations/dns] - 10https://gerrit.wikimedia.org/r/97698 (owner: 10Ori.livneh) [11:01:30] faidon speaks in gerrit [11:03:30] sorry, I wasn't watching irc [11:03:43] and you didn't use my name/nick, so it didn't notify me [11:04:04] what, you don't get pinged by 'dns'? :P [11:04:07] :P [11:06:03] ori-l / paravoid mind merging: https://gerrit.wikimedia.org/r/#/c/97697/1 ? [11:06:42] the arrow alignment fix is welcome, the line break isn't [11:07:04] nod [11:07:12] languages that don't have multi-line strings lose the right to complain about line length in my book :P [11:07:28] i usually break long lines, but sometimes it's clearer [11:07:45] if you want to do the line break, you should do [\n File[...],\n File[...]\n, ] [11:08:05] i.e. keep the [ & ] in their own lines and indent the File resources [11:08:11] maybe I missed an indentation level there [11:08:46] https://dpaste.de/Boc3/raw [11:09:39] right [11:09:43] that's what I meant, thanks :-) [11:11:29] !log ssl1 rebooted itself about 15 mins ago, no idea why [11:11:43] Logged the message, Master [11:11:45] because I rebooted it? 
[11:11:58] it was kinda of obvious if you looked at auth.log :) [11:11:59] I didn't see it in the logs (I did look) [11:12:05] SAL that is [11:12:51] well.. wanna log that then? :-P [11:12:57] no [11:13:21] rebooting a random depooled ssl server in tampa, who cares [11:14:48] as long as icinga is still notifying in here, it's worth logging for that reason [11:23:13] (03PS2) 10Matanya: webperf :lint-clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/97697 [11:23:32] paravoid: ^ [11:23:57] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (206939) [11:24:06] (03CR) 10Ori.livneh: [C: 032 V: 032] "Thank you very much" [operations/puppet] - 10https://gerrit.wikimedia.org/r/97697 (owner: 10Matanya) [11:24:32] thanks ori-l :) [11:24:37] (03PS1) 10Ori.livneh: decom gdash on professor [operations/puppet] - 10https://gerrit.wikimedia.org/r/97900 [11:24:37] damn [11:24:40] he's faster [11:24:55] i think i'm going on a lint project [11:25:06] it drives me made the tabs/spaces mess [11:25:09] ori-l: don't merge yet [11:25:18] cached dns entries? [11:25:21] yes [11:25:22] ttl is 1h [11:25:31] *mad [11:25:42] I deployed the change at 11:00 UTC [11:25:45] that leaves 35' more minutes [11:25:57] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [11:26:04] not that I expect gdash to have many users, at least at this hour [11:26:08] but you never kno [11:26:15] yeah, still nice to do the right thing [11:28:36] !log faidon switched gdash.wm.o from professor.pmtpa -> tungsten.eqiad behind misc-varnish & rebooted ssl1 in tampa [11:28:45] lol [11:28:50] Logged the message, Master [11:28:55] polite nudge? [11:29:09] I'll take it :) [11:29:14] i was alarmed by the alert too [11:29:44] fair enough [11:29:57] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (205308) [11:31:49] ok, going to get some sleep for real now, thanks for the merges / puppet runs / dns change [11:31:57] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [11:33:34] and for udpprofiler! :D [11:33:51] matanya: and the lint! [11:37:43] (03PS1) 10Aude: Fix Wikibase noc symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97903 [11:39:48] (03PS6) 10Aude: Enable Wikidata build on beta labs [WIP] [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 [11:40:34] (03CR) 10Aude: [C: 04-1] "needs more testing and scrutiny to ensure this doesn't break localisation cache stuff on beta" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 (owner: 10Aude) [11:50:41] (03CR) 10ArielGlenn: [C: 031] "As a first step this is ok. It desperately needs instance-specific stuff separated out into a role module in later steps." (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [12:42:02] (03PS5) 10Addshore: Start wikidata puppet module for builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 [12:42:57] (03CR) 10Addshore: "Tested on Labs and so far this all seems to work as expected, Now to try and implement the next stage ontop of this!" 
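Back on the gdash DNS switch above: the "35 more minutes" comes from the old record's 1-hour TTL still counting down in resolver caches. How long a particular cache will keep serving the stale answer can be read directly; a sketch, with the resolver address standing in for whichever cache matters:

    # The second column of the answer is the remaining TTL in seconds on that
    # resolver's cached copy; once it reaches zero the new record takes over.
    dig @127.0.0.1 +noall +answer gdash.wikimedia.org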
[operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [13:29:50] (03PS1) 10Matanya: varnish : lint clean up [operations/puppet] - 10https://gerrit.wikimedia.org/r/97910 [13:30:43] (03CR) 10jenkins-bot: [V: 04-1] varnish : lint clean up [operations/puppet] - 10https://gerrit.wikimedia.org/r/97910 (owner: 10Matanya) [13:31:47] (03PS2) 10Matanya: varnish : lint clean up [operations/puppet] - 10https://gerrit.wikimedia.org/r/97910 [13:39:29] paravoid: now ori-l isn't here, you can be faster :) ^ [14:02:17] (03CR) 10Faidon Liambotis: [C: 032] decom gdash on professor [operations/puppet] - 10https://gerrit.wikimedia.org/r/97900 (owner: 10Ori.livneh) [14:45:50] (03CR) 10ArielGlenn: "What changed in the apache configs that would make this work now?" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/91209 (owner: 10Reedy) [15:06:59] hello, is meta having issues? i just got: Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes. If you report this error to the Wikimedia System Administrators, please include the details below. [15:07:00] Request: POST http://meta.wikimedia.org/wiki/Special:CentralNoticeBanners/edit/B13_1125_nmsrt_enYY, from 208.80.154.134 via cp1066 frontend ([10.2.2.25]:80), Varnish XID 1146839306 [15:07:01] Forwarded for: 199.83.222.237, 208.80.154.134 [15:07:01] Error: 503, Service Unavailable at Wed, 27 Nov 2013 15:03:23 GMT [15:09:55] i am trying to clone a banner and just got: Internal error - Meta [15:09:56] [d16b359b] 2013-11-27 15:08:49: Fatal exception of type BannerExistenceException [15:11:16] oh i guess it actually already cloned [15:12:01] Jeff_Green: have you seen anything weird? [15:14:00] meganhernandez: looking [15:17:56] meganhernandez: can you try it again? [15:18:07] hi akosiaris! I have a couple of .deb reviews that need some love, whenever you have a few minutes [15:18:37] the big one is the varnishkafka .deb, I'd like to get final approval from either you or faidon before we tag and merge [15:18:38] https://gerrit.wikimedia.org/r/#/c/78782/ [15:18:40] ottomata: gime a couple of minutes and I 'll be there [15:18:45] suuure thanks [15:19:16] it seems ok now Jeff_Green [15:21:59] https://gdash.wikimedia.org/dashboards/reqerror/ so there's that [15:26:18] apergos: I haven't really been tuned in on the 503 issue, been heads-down on fundraising cluster prep [15:26:47] ottomata: so [15:27:08] varnishkafka is number 1. What other deb ? [15:27:21] lkoigster [15:27:23] logster [15:27:23] https://gerrit.wikimedia.org/r/#/c/95556/ [15:27:29] and python-kafka [15:27:29] https://gerrit.wikimedia.org/r/#/c/97848/ [15:27:43] both of those are pretty simple —with-python2 packages [15:28:00] apergos: I've seen and skimmed the various email, is that a decent summary? [15:28:10] 20 patch sets ? [15:28:12] man... [15:28:14] haha, yeah [15:28:23] well, faidon originally just committed a skeleton [15:28:34] and we added logging, and logrotate, and a postinst, etc. etc. 
[15:29:23] (03PS2) 10ArielGlenn: mark stuff in decomm.pp with their rt tickets for easier tracking [operations/puppet] - 10https://gerrit.wikimedia.org/r/93930 [15:29:35] oo, actually, i think I just thought of something I might need to add….not sure, [15:29:41] it's been a variety of issues, mostly covered in the emails [15:29:45] Jeff_Green: [15:29:52] so, akosiaris, varnishkafka will log periodic json stats to /var/cache/varnishkafka.stats.json [15:29:53] (sorry, tuned out for a minute in another window) [15:30:05] varnishkafka starts up as varnishlog user [15:30:31] i *think* that it doesn't have permissions to write to /var/cache at first [15:31:04] hmm [15:31:14] maybe I should make varnishkafka itself just write to /tmp/varnishkafka.stats.json by default [15:31:20] and have puppet move it to /var/cache or something? [15:31:23] with proper permissions? [15:31:45] that is the fast way out. You can do that... but what are those json stats? [15:31:49] hello i was going to put up a 100% test. should i wait? i got that error about 30 mins ago but haven't got it again since then [15:32:09] (03PS3) 10ArielGlenn: mark stuff in decomm.pp with their rt tickets for easier tracking [operations/puppet] - 10https://gerrit.wikimedia.org/r/93930 [15:32:46] meganhernandez: should be fine, go for it [15:32:54] heyhey meganhernandez [15:33:26] akosiaris: https://gist.github.com/ottomata/7677672 [15:33:35] just stats about running varnishkafka and librdkafka stuff [15:33:40] hi there werdna [15:33:42] want to parse that to send to ganglia [15:33:46] ok will do Jeff_Green [15:33:49] things like txbytes, errors,e tc. [15:35:04] (03CR) 10ArielGlenn: [C: 032] mark stuff in decomm.pp with their rt tickets for easier tracking [operations/puppet] - 10https://gerrit.wikimedia.org/r/93930 (owner: 10ArielGlenn) [15:35:11] ottomata: and this gets rewritten often right ? [15:35:27] its appended to [15:35:32] every 60 seconds by default [15:35:56] so not a cache [15:36:05] suppose not? faidon told me to put it there :p [15:36:05] right ? [15:36:19] but maybe he didn't realize that it was a log? [15:36:19] can it be deleted without data loss ? [15:36:22] yes [15:36:23] no problem [15:36:26] well i mean [15:36:31] its just running stats [15:36:39] if you restart varnishkafka they will all reset [15:36:47] you'll lose history if you delete it, but no biggy [15:37:01] its only stats, has nothing to do with real operation of varnishkafka [15:37:09] as always with kafka a cornercase... [15:37:18] ha, eh? [15:37:25] ok cache it is [15:37:34] since it can be deleted without data loss [15:37:36] yeah [15:37:44] but i mean, you can delete /var/log/syslog without dataloss [15:37:46] otherwise i 'd advise spool [15:37:59] of course that is dataloss [15:38:13] but you said it gets recreated on restart [15:38:18] hmm [15:38:20] on restart [15:38:24] oh, no it will still be appended to [15:38:25] truncated ? [15:38:27] just the counters will reset [15:38:56] just out of curiosity [15:39:04] the program want to append data [15:39:07] wants* [15:39:15] how does it append ? [15:39:21] it json we are talking about [15:39:22] fopen "a" [15:39:45] https://gist.github.com/ottomata/7677672 [15:39:58] how many appends have happened here ? 
[15:39:59] https://github.com/wikimedia/varnishkafka/blob/master/varnishkafka.c#L1773 [15:40:07] oh that's 2 [15:40:13] each line is a full json object [15:40:19] varnishkafka will log its own stats [15:40:24] and librdkafka logs its own as well [15:40:48] gistfile2.json is just 2 lines from that file [15:40:51] it logs one line at a time [15:41:12] but those 2 are completely different from each other [15:41:42] yes [15:42:04] those are the only 2 different objects that are (currently) logged [15:42:12] but true, librdkafka keeps its own stats [15:42:19] and varnishkafka keeps its own as well [15:42:32] librdkafka ? [15:42:34] librdkafka takes a callback from which it will perodically log [15:42:37] yeah, the kafka C library [15:42:41] varnishkafka just uses it [15:42:43] wait [15:42:55] varnishkafka writes that file [15:42:58] haha, oh man [15:43:01] yes. [15:43:05] and periodically appends to it [15:43:14] what has librdkafka to do with that file ? [15:43:23] sorry, ok, librdkafka is just the C kafka library [15:43:33] varnishkafka uses it to produce varnish log messages to kafka [15:44:12] librdkafka maintains its own stats about its runtime usage [15:44:24] please tell me in a different file [15:44:31] not in a file [15:44:32] just in memory [15:44:36] :-) [15:44:38] if you want to get that data out of it, you pass it a callback that it will periodically get called [15:44:47] we pass it this vk_log_stats function [15:44:50] to get the json stats into this file [15:44:54] pull the data and write it yourself [15:45:01] https://github.com/wikimedia/varnishkafka/blob/master/varnishkafka.c#L1361 [15:45:27]                 rd_kafka_conf_set_stats_cb(conf.rk_conf, kafka_stats_cb); [15:45:36] (line 1975) [15:45:49] we are pulling the data and writing it ourself [15:45:55] rdkafka calls OUR callback [15:45:57] and we write it [15:46:19] so in that gist [15:46:21] s/we/varnishkafka/g :) [15:46:25] there are two json documents [15:46:38] which is which ? [15:46:57] the kafka is from rdkafka [15:46:59] kafka == librdkafka (we can change that top level key to rdkafka or something, been kinda wanting to do that) [15:47:02] and varnishkafka == varnishkafka [15:47:03] yes [15:47:05] okok [15:47:13] so it will append those two types of documents [15:47:17] periodically [15:47:21] yup [15:47:21] correct ? [15:47:31] finally I understood :-) [15:47:36] haha :) [15:48:00] so /var/cache sounds ok with it [15:48:12] ok, cool [15:48:19] and you want to create the directory with the correct user ? [15:48:32] either way, right now we were just creating the file directly in /var/cache [15:48:37] /var/cache/vanrishkafka.stats.json [15:48:56] directory or whatever, varnishkafka should be able to write there on install [15:49:20] so have the package create the directory [15:49:27] .dirs file [15:49:47] k, and how to chown it to varnishlog user? postinst? [15:50:51] yes [15:51:22] hm, should this be done in the Makefile as part of make install? since that is the default location? [15:52:04] it should be done after you create the user [15:52:19] and make install does not create users :-) [15:52:33] postinst is your friend [15:52:38] …this package doesn't create the user :/ the user is varnishlog which uhhhhh, hm maybe it shouldn't be? [15:52:43] varnishlog i think comes with varnish package? [15:52:44] not sure [15:53:13] ah.. hmm [15:53:31] varnishkafka needs to run as varnishlog I suppose ? [15:53:44] hm, i don't know actually [15:53:47] Snaps? 
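Since the plan a bit further up is to parse this appended-JSON stats file and feed the counters to ganglia, here is a rough sketch of that step. The grep filter, jq path and metric name are assumptions (the actual layout of the "kafka" object isn't shown in the log), and jq plus gmetric are assumed to be installed:

    # Take the newest librdkafka stats line from the append-only file and
    # push one counter to ganglia via gmetric.
    STATS=/var/cache/varnishkafka.stats.json
    last=$(grep '"kafka":' "$STATS" | tail -n 1)
    txbytes=$(printf '%s\n' "$last" | jq '.kafka.txbytes // 0')
    gmetric --name varnishkafka_txbytes --value "$txbytes" --type double --units bytes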
^ [15:53:51] i don't think it does [15:57:33] akosiaris: let's assume it doesn't need to be varnishlog [15:57:37] should it have its own user? [15:57:43] yes [15:58:00] hm [15:58:05] varnishncsa runs as varnishlog [15:58:17] maybe for a reason ? [15:58:24] doesn't seem to [15:58:24] I am unfortunately unaware... [15:58:27] you can run that as any user [15:58:31] i just ran it as ganglia and it works [15:58:59] i think that's why we (faidon? snaps?) made varnishkafka run as varnishlog [15:59:02] because varnishncsa does [15:59:06] and this is a replacement for varnisihncsa [15:59:37] I have the suspicion that it *needs* to run as varnishncsa [15:59:39] ok, alternatively, if it does need to run as varnishlog, you think I should just use .dirs and .postinstall to chown? [15:59:46] I was in a meeting [15:59:50] I'm here now [15:59:56] hiiiii! :0 [16:00:11] if you need me, backlog's TL;DR please :) [16:00:14] how you doing, sleepybusy paravoid? [16:00:41] ok, paravoid, do you know, does varnishkafka NEED to run as varnishlog user [16:00:46] varnishncsa runs as varnishlog user [16:00:50] but doesn't seem to need to [16:01:05] and, if vanrishkafka does not need to run as varnishlog user, should we create a user for it? [16:01:29] I don't think there's any such requirement [16:01:35] but why would you create a new user? [16:01:46] ha, i don't know, because akosiaris thinks we shoudl? [16:02:17] IMHO to keep them separate... why should a daemon share another daemons UID ? [16:02:30] it not wise from a security POV [16:02:30] we won't run both [16:02:51] varnishkafka is a heavily modified fork of varnishncsa [16:03:15] well, we'll run both for a little bit, buuuut, what creates teh varnishlog user? [16:03:21] but yeah, they /can/ coexist [16:03:21] is that from varnish package? [16:03:30] dunno, probably? [16:03:36] I don't see a harm in a separate user, though [16:03:40] so if it makes things easier, by all means [16:04:01] it makes things a wee more difficult [16:04:04] (03CR) 10ArielGlenn: role and module structure for ishmael (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [16:04:05] i'd rather it just be varnishlog [16:04:21] it does not i think.. [16:04:32] mostly because you don't have a dependency on another package [16:04:38] that creates the user and blah blah [16:04:39] if it depends on varnish, that's fine [16:04:50] it keeps things slighlty cleaner [16:04:51] Depends: varnish, ${shlibs:Depends}, ${misc:Depends} [16:06:33] yes, varnishlog is created by varnish package [16:06:57] https://gist.github.com/ottomata/7678234 [16:07:36] yes it does.... in that case [16:07:39] ah dpkg-statoverride! [16:07:44] and since you depend on it [16:07:44] didn't know about that one [16:07:59] its fine [16:08:16] still, my pedantic self would be happier with a different user as long as it works [16:08:51] haha, well, i'd rather keep it since it depends on it anyway, varnishncsa uses it, and paravoid likes it :) [16:10:22] ok [16:11:07] btw.. 
don't use dpkg-statoverride in this case [16:11:22] ok [16:11:25] it's meant as a tool for the sysadmin not the package builder [16:11:28] oh ok [16:11:28] hm [16:11:42] (03PS21) 10Ottomata: (WIP) Initial Debian version [operations/software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/78782 (owner: 10Faidon Liambotis) [16:12:11] (03PS22) 10Ottomata: (WIP) Initial Debian version [operations/software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/78782 (owner: 10Faidon Liambotis) [16:12:15] oop, forgot dirs, there we go [16:12:22] (03CR) 10Faidon Liambotis: [C: 04-2] "Let's just fix it in the extension IMHO. We can wait." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97540 (owner: 10Yurik) [16:12:32] ottomata: I should add you a debian builder job :D [16:12:43] ottomata: that is https://github.com/mumrah/kafka-python ? [16:12:50] yes [16:13:09] missing debian/copyright :P [16:13:28] yeah, i never know what to put for that [16:14:24] shoudl I just put the LICENSE file there? [16:15:04] nope... it has a format. [16:15:10] c/p kafka's one [16:15:18] and modify it accordingly [16:15:27] http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ [16:15:27] ok [16:18:08] (03CR) 10Akosiaris: [C: 04-1] "And missing debian/copyright" (034 comments) [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 (owner: 10Ottomata) [16:20:49] (03PS3) 10Ottomata: Initial deb packaging [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 [16:20:56] (03CR) 10Ottomata: Initial deb packaging (034 comments) [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 (owner: 10Ottomata) [16:23:36] (03PS4) 10Ottomata: Initial deb packaging [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 [16:24:49] ok, akosiaris. there we go [16:26:04] mutante, are you working already? [16:26:43] andrewbogott: was reading backlog , kind of [16:26:47] whats up [16:27:00] I lost my backscroll and need you to talk me through nagios/puppet-freshness again :( [16:27:11] If you're not working yet I can check in again later in the day :) [16:27:43] andrewbogott: hold on, i'll find an old list mail and forward it to you [16:29:34] mutante: Thank you! [16:30:01] andrewbogott: http://lists.wikimedia.org/pipermail/labs-l/2012-March/000128.html and forwarded [16:30:40] check if base.pp still has [16:30:45] exec { "puppet snmp trap": [16:30:59] then you got the answer to _how_ the puppet runs trigger it [16:31:29] yep, it's still there. [16:31:55] I'm pretty confused by ".1.3.6.1.4.1.33298" but I guess I can just ignore that part :) [16:32:24] check this part "See how `hostname` is being used in there. This simply works in [16:32:28] production because production hosts return the same string for [16:32:30] hostname that Nagios uses to define the hosts it knows about. On labs [16:32:34] though, hostname returns the resource name (f.e. i-000000f8)," [16:33:16] yeah that is weird [16:33:22] andrewbogott: yes, that's the OID for {iso(1) identified-organization(3) dod(6) internet(1) private(4) enterprise(1) 33298} [16:34:09] and that should work fine [16:34:23] First Registration Authority Mark Bergsma [16:34:40] http://oid-info.com/get/1.3.6.1.4.1.33298 [16:35:02] that's not gonna be the issue, but see my mail, i went through several issues one by one back then [16:35:31] mutante: ok, looks like this email has everything I need! 
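Back on the varnishkafka packaging thread, to make the ".dirs file plus postinst" plan concrete: debian/varnishkafka.dirs would carry a single var/cache/varnishkafka line (the exact directory name is an assumption, since only the flat /var/cache/varnishkafka.stats.json path is named in the log), and the postinst would hand that directory to the varnishlog user created by the varnish dependency. A sketch of such a postinst:

    #!/bin/sh
    # debian/varnishkafka.postinst (sketch): the directory itself is shipped
    # via the .dirs file; on configure we only fix its ownership.
    set -e
    case "$1" in
        configure)
            chown varnishlog:varnishlog /var/cache/varnishkafka
            ;;
    esac
    #DEBHELPER#
    exit 0

As akosiaris notes above, dpkg-statoverride stays a sysadmin tool; the package just does a plain chown.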
Now I just have to figure out how to convey that info to wikitech... [16:35:46] it could be the hostname mismatch [16:35:47] which, I guess I could just make wikitech the nagios host, but that seems like overkill. [16:35:49] or firewalling [16:36:33] andrewbogott: haha, saw that yet? [16:36:35] So when trying to find out if it is even possible for an instance to [16:36:38] know it's own instance name with a local command, Andrew Bogott [16:36:41] pointed me to this (thanks!:): [16:36:43] http://aws.amazon.com/code/1825 (EC2 Instance Metadata Query Tool ), [16:37:25] yeah -- from wikitech's perspective I have all that info already. So solving /my/ problem may be easier than solving the actual nagios alert issue [16:37:37] (I'm trying to generate a report page on wikitech that includes puppet freshness info) [16:37:48] aah [16:38:00] thought you just wanted to fix the current icinga [16:38:28] (03CR) 10Akosiaris: [C: 04-1] "Minor stuff plus the debian/copyright file missing" (032 comments) [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/95556 (owner: 10Ottomata) [16:38:29] They may or may not turn out to be the same problem [16:39:49] ah, also see replies on that thread , like when petan said he'd fix it and "I changed the address in nagios to that ec weird string." [16:39:57] yes [16:41:20] so for your report, you could also just parse the icinga log to see the incoming passive checks there (if they work) or listen to the packets yourself (snmptrapd) [16:44:21] oh a report page, hmmm [16:44:40] if you had report output what would you do with it? [16:45:38] I ask because I have thought of having various status reports automatically show up on wikitech but no idea how to go about it [16:46:02] sounds like trying to make puppet dashboard homebrew version? [16:46:29] well for output that puppet doesn't have [16:47:32] i would like (the functionality of) this https://www.datadoghq.com/ [16:48:00] https://www.datadoghq.com/integrations/ [16:48:28] correlate jenkins event with ganglia metrics..bla [16:50:15] (03PS2) 10Reedy: Switch www portals to using one docroot [operations/apache-config] - 10https://gerrit.wikimedia.org/r/91209 [16:50:17] that would be nice [16:50:21] (03CR) 10Akosiaris: Initial deb packaging (031 comment) [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 (owner: 10Ottomata) [16:50:36] ah there is a Reedy [16:50:50] perhaps with an answer to my question [16:53:14] (03CR) 10Akosiaris: Initial deb packaging (031 comment) [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 (owner: 10Ottomata) [16:54:27] (03PS2) 10Ottomata: Initial .deb packaging [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/95556 [16:58:56] (03PS5) 10Ottomata: Initial deb packaging [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 [16:59:18] (03CR) 10Ottomata: Initial deb packaging (031 comment) [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 (owner: 10Ottomata) [17:00:30] ok akosiaris, commetns on all 3 .debs responded to and patchsets up [17:01:38] ottomata: why 0.8.1 [17:01:57] the guy says it will be 0.8.0-1 [17:02:19] (at least I hope it will be true) [17:04:36] (03CR) 10Reedy: "The problem last time around was that I moved the www.wikimedia.org "portal" to the wwwportals config." 
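Back on the labs instance-name question from the puppet-freshness discussion above: labs instances sit behind an EC2-compatible metadata service, so an instance can look up its own identifiers with a local command. A sketch, assuming the standard link-local metadata address is reachable from the instance:

    # Prints the EC2-style id for this instance (e.g. i-000000f8).
    curl -s http://169.254.169.254/latest/meta-data/instance-id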
[operations/apache-config] - 10https://gerrit.wikimedia.org/r/91209 (owner: 10Reedy) [17:06:56] hmmmmMMM [17:07:02] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (208326) [17:07:24] dunno, good catch [17:07:43] (03PS6) 10Ottomata: Initial deb packaging [operations/debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/97848 [17:10:20] (03CR) 10Ottomata: Initial .deb packaging (031 comment) [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/95556 (owner: 10Ottomata) [17:12:02] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [17:16:03] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202908) [17:20:02] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [17:39:01] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201723) [17:39:34] akosiaris: i am hoping to get a +1 from you today on at least the varnishkafka deb today so I can finalize that, ehhhhhh? :) [17:39:59] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [17:41:58] ottomata: having a meeting really soon unfortunately. Can't promise anything :-( [17:42:27] sooook, maybe paravoid will help :D? [17:42:35] I'm on the same meeting [17:43:14] rats! [17:43:15] k [17:44:59] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201917) [17:48:59] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [17:54:59] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (207218) [17:57:59] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [18:07:04] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (211654) [18:08:34] PROBLEM - Varnish HTTP mobile-frontend on cp3012 is CRITICAL: Connection timed out [18:10:05] PROBLEM - LVS HTTP IPv4 on mobile-lb.esams.wikimedia.org is CRITICAL: Connection timed out [18:11:34] RECOVERY - Varnish HTTP mobile-frontend on cp3012 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 0.191 second response time [18:11:55] RECOVERY - LVS HTTP IPv4 on mobile-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 23050 bytes in 0.291 second response time [18:12:02] !log kill -9 gdb on cp3012, attached to varnish frontend [18:12:16] Logged the message, Master [18:13:08] paravoid: I was still there! 
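On the version question raised earlier in this stretch (why 0.8.1 when upstream says the next python-kafka release will be 0.8.0): if that release does land, the package would normally track it with a first Debian revision in the changelog. Purely illustrative:

    # Record the new upstream version plus the -1 Debian revision (devscripts' dch).
    dch --newversion 0.8.0-1 --distribution unstable "Track upstream 0.8.0 release"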
[18:13:26] in any case, it seems it was 3011 this time, I should have set up both [18:13:57] (at least, 3011 restarted its child) [18:14:01] oh, sorry [18:14:16] I never got any kind of stop on the gdb on 3012 [18:18:06] (03PS1) 10Matthias Mullie: Fix config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 [18:20:39] (03PS2) 10Chad: Fix config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [18:26:33] (03PS3) 10Jeremyb: Fix Flow config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [18:36:48] (03CR) 10Cmcmahon: [C: 031] "merge at will please, this is required for Flow on beta labs" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [18:41:12] http://translatewiki.net/ is down?! [18:41:25] are we hosting it? [18:41:38] 1. it's up, 2. no. [18:42:11] twkozlowski, i am getting domain not configured wikimedia message [18:42:50] yurik-road: strange, it works fine for me [18:43:49] hmm, it just came back up [18:51:33] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:53:24] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [18:54:38] (03PS1) 10Manybubbles: Show "using new search engine" when using Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97939 [18:55:37] (03CR) 10Manybubbles: "Might be worth deploying when we re-enable Cirrus on Monday. Not before." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97939 (owner: 10Manybubbles) [18:56:05] <^d> That can be deployed any time without harm :p [18:56:33] (03PS2) 10Manybubbles: Show "using new search engine" when using Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97939 [18:56:58] yeah [18:57:04] I suppose just not today [19:00:13] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [19:01:46] <^d> manybubbles: I merged the other one about betafeatures. [19:01:54] <^d> I had reviewed it before, and Lego made the change I wanted. [19:02:21] sweet! [19:02:25] wonderful, really! [19:02:28] I'm excited about that [19:03:51] oh no, do I really have to submit a bug with a penis image as an example :-( [19:04:13] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (213298) [19:04:39] bblack:so, anything useful? [19:05:01] or did my kill mess with it too much [19:08:33] paravoid: I didn't get anything, I didn't even see a stop in gdb before the kill, so that doesn't even seem right [19:08:46] :/ [19:09:24] where did you pick up the original tpp.c assert from? [19:09:45] syslog [19:11:26] yeah I guess it was only on cp3011 this time [19:11:32] cp3012 died, but not with the assert [19:12:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [19:15:06] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (203480) [19:17:15] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [19:18:15] hashar, Reedy : https://gerrit.wikimedia.org/r/#/c/97933 is a fix for labs that should have no effect in production. But my understanding is it's bad to have config changes in master that aren't deployed since it confuses someone trying to push a fix. So do we do? 
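On the concern just raised about config changes merged in master but not deployed: the next person deploying can spot that state from the deployment checkout before syncing anything. A sketch, assuming the default remote and branch names:

    git fetch origin
    git log --oneline HEAD..origin/master   # merged in Gerrit but not yet pulled/synced here
    git status --short                      # live hacks that were never committed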
[19:18:27] s/do we/what do/ [19:19:26] (03Abandoned) 10Yuvipanda: tools: Remove redundant declration [operations/puppet] - 10https://gerrit.wikimedia.org/r/97629 (owner: 10Yuvipanda) [19:19:59] (03CR) 10Spage: [C: 031] "Looks good, I'd like to +2 but will undeployed config changes confuse people working on fixing production?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [19:21:51] (03CR) 10Chad: "Yes. Please don't merge unless you plan to deploy :)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [19:22:06] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202254) [19:23:08] meh, job_queue, limit 199,999 and we're always just slightly above [19:23:31] but that limit keeps going up over time :P [19:24:08] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [19:27:44] ^d: about merging config changes ^^ [19:28:34] ^d (you're Chad, right?) Thanks for responding. My understanding is mediawiki-config changes get automatically deployed to beta. So we would wait for an LD window to +2 and deploy the config changes to production? [19:29:06] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202185) [19:29:11] <^d> Yeah, because then if someone else needs to deploy to production (ie: site's crashing), you're in their way. [19:30:17] <^d> Config changes that span prod/beta files are fun that way :) [19:31:30] ^d if the change was only to mediawiki-config/*-labs.php files would that be OK to +2? Or chrismcmahon we could put betalabs on a branch. [19:31:49] <^d> No branches here, that'll make them diverge :) [19:32:03] <^d> And yes, if we were only touching *-labs* it'd be fine :) [19:32:38] <^d> The production impact here is basically zero, so I have no real opposition to deploying, just saying that merging without deploying is bad. [19:33:04] <^d> (Ask Ryan_Lane to tell you how awesome the production/test branches of puppet were :p) [19:33:30] * Ryan_Lane pukes [19:33:34] haha [19:33:54] +1 merging without deploy is bad [19:34:16] the next one deploying will be surprised and track you down [19:35:02] <^d> In my case, if I'm the next one it involves a fair bit of verbal assault after tracking you down :p [19:35:21] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [19:35:47] <^d> And if you make livehacks, I commit them locally with embarrassing commit summaries like "NOT COMMITTING YOUR WORK IS BAD, YO" ;-) [19:36:44] and if you see remnants of the 'test' branch, remove it :p [19:40:08] we should just use svn [19:41:08] ori-l, around? [19:41:13] YuviPanda: get a rack in Tampa for that ;p [19:41:19] YuviPanda, VSS is the best [19:41:37] we should just have index.php, index2.php, index.final.php and so on [19:41:42] SUPER EFFICIENT! [19:42:01] nah, index.php is not compressed [19:42:14] anyway, where's ori? not having him around is like... the end of the world! [19:42:23] yurik: too early in the day, probably [19:42:31] 11am? 
[19:42:34] yeah, you are right [19:43:10] ^d so the net effect on production should be zero if that change is deployed, but I don't have +2 or deploy rights either [19:43:20] even oribot needs sleep [19:43:23] he was around until something like 3-4am his time [19:43:28] ^d: but it's got us stopped cold in beta labs right now [19:43:33] give the man some time to *gasp* sleep [19:43:56] <^d> You're one to talk, you non-sleeping robot man :p [19:45:24] <^d> chrismcmahon: Gonna merge + deploy now. [19:45:31] (03CR) 10Chad: [C: 032] Fix Flow config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [19:45:32] thanks ^d! [19:46:34] * ^d patiently waits on jenkinsbot. [19:47:28] <^d> Any time you feel like merging, jenkins, just dandy with me. [19:47:34] ^d "HELPING OUT beta-labs IS GOOD, YO" ;-) [19:48:18] <^d> "I SAVED YOUR CODE FOR YOU" <- Next one I'll use ;-) [19:48:29] (03Merged) 10jenkins-bot: Fix Flow config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97933 (owner: 10Matthias Mullie) [19:49:20] hassle it may be but it is in the service of a smooth first deployment Real Soon Now [19:49:32] ^d, are you deploying flow? i thought it was next week :) can't wait to crit^H^H^H praise it! [19:49:37] !log demon synchronized wmf-config/CommonSettings.php 'Fixes for Flow config, no-op in prod' [19:49:51] ^d: 'IT IS DANGEROUS TO GO ALONE, HERE TAKE THIS: git commit'? :) [19:49:51] Logged the message, Master [19:50:09] yurik: Flow is out next week but it's nice to work out config issues before it goes live, eh? [19:50:13] !log demon synchronized wmf-config/InitialiseSettings.php 'Fixes for Flow config, no-op in prod' [19:50:29] Logged the message, Master [19:50:32] hehe, true that. is it available anywhere public yet? [19:50:32] <^d> yurik: Yeah, I decided to deploy it to all pages on all wikis the day before Thanksgiving. Living life on the edge like that ;-) [19:50:53] ^d: also enable AFT on all of them while you are at it [19:51:02] and LQT at the same time... [19:51:12] so where can i see it? [19:51:24] <^d> Let's put CodeReview everywhere. What wiki couldn't make use of SVN-based code review systems? [19:51:34] yurik: http://ee-flow.wmflabs.org/wiki/Main_Page [19:52:19] yei, thx YuviPanda [19:52:31] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (205353) [20:00:25] kinda confusing - why is the "flag" icon and "pencil" icon both bring up the editor when i point to my edits, but for other posts flag shows "hide". Also my name everywhere is kinda confusing ... I guess it copies after facebook in that [20:00:36] but overall - damn! I like!!! :) [20:02:31] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [20:04:02] hmm, paravoid, did we know that default-jdk/jre is java 6? [20:04:07] openjdk6? [20:04:08] yes [20:04:12] i thought it was 7 on precise [20:05:10] ottomata, next week? any ideas of which place we should try next? btw, is there a spreadsheet of places we went to already? 
[20:05:27] naw i don't have any other locations, thought you had some [20:05:31] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201836) [20:05:32] i'm sure there are plenty more [20:05:57] there are - i saw something niceish in the village on google maps, will check [20:06:07] but we need a spreadsheet to keep track [20:06:32] ottomata: default-jdk is openjdk-6 because openjdk-7 didn't work in some of the more niche architectures [20:06:35] i think there was one, don't remember [20:06:36] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [20:06:42] at least that was the case for Debian, not sure if Ubuntu is just following the lead [20:07:04] oof, i'm looking at this because I did a ps recently on a hadoop node, and noticed that /usr/lib/jvm/j2sdk1.6-oracle/bin/java is running :/ [20:07:11] thought we uninstalled this [20:07:13] looking at it now [20:07:28] i think we are supposed to run ahdoop on openjdk 7, not 6 [20:07:45] yes [20:07:58] j2sdk1.6 isn't openjdk 6 [20:08:03] yeah i know [20:08:04] it's sun/oracle javas [20:08:33] right, so I saw oracle running, was all like 'whaaa', and then started looking at update-java-alternatives and default-jdk [20:08:39] was confused by the default being 6 but ok [20:15:36] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (212242) [20:38:32] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [20:44:16] yurik: what's up? [20:45:29] ori-l: I merged the gdash professor patchset but didn't clean up manually [20:45:33] I'm not sure you should bother [20:45:40] we're going to kill the box soon [20:45:41] hey ori-l ! you are back :) I tried to do git-update, but it fails because it can't pass through the creds from the host [20:47:38] when i enable the passthrough, it complains because only the insecure pubkey is being added. When i enable all keys to be used, it complains that i need to change the host. What do you think would be the best way to do a git pull under my host machine credentials? [20:49:14] paravoid: danke danke [20:49:31] in the spirit of yurik: udpprofile? [20:49:46] yurik: hmm [20:50:20] you could execute the git-pull on the host, but then you don't have the benefit of a homogenous environment [20:50:45] why would it need to pass through creds? anonymous fetch? [20:51:03] good question [20:51:13] it can do the fetch the first time without issue [20:51:41] gwicke: RT 6408,6409,6410 have been created by cmjohnson because he saw disk failures. i addd you and info that we just talked about them the other day when /tmp was full etc.. and you confirmed they are constantly used for tests, not prod.. so fyi [20:53:27] that is cerium,xenon,praseodymium.. software raid [20:54:11] is it legal to have a CNAME for both foo.bar.quux.tld and bar.quux.tld ? I thought CNAME would delegate all the children? [20:54:27] (hint: WMF does that now) [20:56:04] ori-l & YuviPanda, yes, but usually i have the git remote set to ssh+username, so are you saying there is a way to git pull anonymously? [20:56:06] ^d: who was 3 and 4 from? [20:56:25] <^d> I don't remember. Would have to look at logs. [20:56:32] (03CR) 10Edenhill: (WIP) Initial Debian version (031 comment) [operations/software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/78782 (owner: 10Faidon Liambotis) [20:57:08] (03PS1) 10John F. 
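For the Oracle-JDK surprise above: the alternatives system shows what the java symlink chain currently resolves to and can repoint it at OpenJDK 7. The set name below is the usual one on precise, so treat it as an assumption to verify against the -l output:

    readlink -f "$(which java)"        # where the java symlink chain ends up
    update-java-alternatives -l        # JVM sets the machine knows about
    sudo update-java-alternatives -s java-1.7.0-openjdk-amd64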
Lewis: Enable AbuseFilter block option on Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 [20:57:10] yurik: ori-l hmm, I have origin set to https and a remote 'gerrit' set to ssh. [20:57:19] I think vagrant by default sets origin to https [20:57:21] (03CR) 10Faidon Liambotis: [C: 04-1] varnish : lint clean up (0317 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/97910 (owner: 10Matanya) [20:58:32] YuviPanda, i only have one remote - gerrit [20:59:25] yurik: git pull takes either the name of the remote ('origin' by default) or an explicit URL [20:59:38] yurik: in a pinch, you could get the remote URL and munge it to be anonymous [20:59:59] ottomata: I dont know, guess it depends on the varnish shm log file permissions [21:00:11] hmm, i guess that could be done :) [21:00:25] now i need to learn how to do string manipulations in bash :) [21:00:29] bleh [21:00:37] but thanks for the idea ): [21:01:14] yurik: git pull $(git config --get remote.origin.url | sed 's|//.*@|//|g') [21:01:32] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200522) [21:01:35] (03CR) 10Ottomata: (WIP) Initial Debian version (031 comment) [operations/software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/78782 (owner: 10Faidon Liambotis) [21:03:06] yurik: another possibility is to have an interactive prompt on provisioning [21:03:21] yurik: "Would you like me to generate a key and upload it to Gerrit?" [21:03:40] yurik: gerrit has an http API that could plausibly be used [21:03:41] ori-l, yes, but i will leave that for you :) [21:03:43] anyways, needs a bit of thought [21:04:32] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [21:05:43] (03CR) 10Odder: [C: 04-1] Enable AbuseFilter block option on Wikidata (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [21:06:38] (03CR) 10Vogone: [C: 04-1] "In the discussion finite block durations were proposed as the block targets will be IPs." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [21:06:50] ori-l: would need to double check with ^d, but afaik keys should never be touched in gerrit, only uploaded into the thing on wikitech [21:07:03] <^d> {{cn}} [21:07:12] <^d> You do have to upload to gerrit, because it doesn't use the ldap keys. [21:07:14] <^d> (Which is stupid) [21:09:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (205760) [21:09:36] Gerrit is stupid, news at 11 [21:10:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [21:11:54] (03CR) 10Odder: "What's the acceptable block duration? One month, half a year?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [21:17:01] (03PS1) 10Ori.livneh: serve graphite.wikimedia.org via misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/98003 [21:17:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (204450) [21:18:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [21:19:26] (03CR) 10Vogone: "There does not seem to be any consensus regarding this. 
I proposed this to be discussed within the community before 1-2 people here start " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [21:21:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (206829) [21:26:02] RECOVERY - Disk space on analytics1023 is OK: DISK OK [21:26:12] RECOVERY - SSH on analytics1023 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [21:26:13] RECOVERY - puppet disabled on analytics1023 is OK: OK [21:26:13] RECOVERY - Host analytics1023 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [21:26:22] RECOVERY - DPKG on analytics1023 is OK: All packages OK [21:26:32] RECOVERY - RAID on analytics1023 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [21:28:32] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [21:40:43] (03PS4) 10Milimetric: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 [21:41:15] (03CR) 10jenkins-bot: [V: 04-1] [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [22:03:07] (03PS1) 10Hashar: (WIP) beta: autoupdate should restart parsoid (WIP) [operations/puppet] - 10https://gerrit.wikimedia.org/r/98007 [22:06:30] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200821) [22:07:30] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:18:30] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (207638) [22:24:42] (03PS1) 10Hashar: parsoid: role class for beta and factor out common code [operations/puppet] - 10https://gerrit.wikimedia.org/r/98014 [22:26:30] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:30:30] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202288) [22:31:30] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:32:26] (03PS1) 10Andrew Bogott: Change fflorin's uid... [operations/puppet] - 10https://gerrit.wikimedia.org/r/98016 [22:32:32] (03CR) 10jenkins-bot: [V: 04-1] Change fflorin's uid... [operations/puppet] - 10https://gerrit.wikimedia.org/r/98016 (owner: 10Andrew Bogott) [22:32:36] apergos: so… ^ ? [22:32:38] oops [22:32:47] heh [22:33:21] I rebased seconds ago! Does this mean someone else is fixing the same thing? [22:34:16] not touching it [22:34:19] the heck? Git tells me I'm fully up to date, jenkins tells me it can't rebase [22:35:09] uh [22:35:27] (03CR) 10Mattflaschen: [C: 04-1] "There are several issues with this patch:" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 (owner: 10MZMcBride) [22:35:27] my box thinks it's october, that could be an issue. [22:36:06] well your parent is right, I checked that [22:39:52] (03PS2) 10Andrew Bogott: Change fflorin's uid. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/98016 [22:41:16] Well, change some punctuation in the commit log and Jenkins loves me again [22:41:35] andrewbogott: happened to me yesterday too, btw [22:41:41] hahaha [22:42:05] * apergos wonders how many gratuitous commit message edits jenkins has caused [22:42:54] lgtm [22:43:04] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (204534) [22:45:04] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:49:04] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (207111) [22:51:05] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:54:05] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (206355) [22:55:16] (03PS1) 10Ori.livneh: Add .gitreview [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/98019 [22:55:27] (03CR) 10Ori.livneh: [C: 032 V: 032] Add .gitreview [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/98019 (owner: 10Ori.livneh) [22:59:28] (03CR) 10Dzahn: [C: 031] "+1 for avoiding duplicate UID, merge when the existing files are also fixed with find -exec" [operations/puppet] - 10https://gerrit.wikimedia.org/r/98016 (owner: 10Andrew Bogott) [22:59:45] bah, I have Andrew on notify [22:59:50] awww [22:59:53] (03CR) 10Andrew Bogott: [C: 032] Change fflorin's uid. [operations/puppet] - 10https://gerrit.wikimedia.org/r/98016 (owner: 10Andrew Bogott) [22:59:55] that's gonna get annoying [23:00:02] also hi werdna, how are things? [23:00:09] heyhey apergos, things are okay [23:00:17] except I am hung over today and somebody started using a LEAF BLOWER at 8am [23:00:23] eeewwwww [23:00:37] it's a weekday, they shouldn't even be out in their yard [23:00:50] they should be drinking coffe or on their morning commute already [23:01:09] <^demon|meeting> Maybe it's not their own lawn? [23:01:14] * YuviPanda drinks from a bottle of single malt scotch [23:01:24] except I've been using it as a water bottle for a few months now [23:01:29] hahaha [23:01:31] nice.... [23:01:52] ^demon|meeting: in that casse you get a kid to come do the lawn for 75 cents, after school [23:01:54] (03PS1) 10Ori.livneh: Add mwprof to git-deploy; apply to tungsten as role::mwprof [operations/puppet] - 10https://gerrit.wikimedia.org/r/98022 [23:01:56] still not at 8am [23:02:26] you pay kids to come to your lawn so you can tell them to get off it? [23:02:54] (03CR) 10Ori.livneh: [C: 032] Add mwprof to git-deploy; apply to tungsten as role::mwprof [operations/puppet] - 10https://gerrit.wikimedia.org/r/98022 (owner: 10Ori.livneh) [23:03:34] no, not unless they hang around on the lawn after they are done raking it [23:03:41] <^demon|meeting> YuviPanda: No, but I will pay kids to come to my lawn so I don't have to do yard work myself ;-) [23:03:45] <^demon|meeting> Yard work. Bleh. [23:03:57] solution is to not have lawns [23:04:08] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:04:16] <^demon|meeting> Indeed. After I moved out of home I swore I'd never do yard work unless I could afford to pay someone to do it for me. [23:04:23] <^demon|meeting> I *refuse* to ever mow a damn lawn again. [23:04:41] :D [23:04:42] ok. so wait. 
you now have a decent income and can afford to pay someone to do it [23:04:50] so you will now do it yourself? :-P [23:05:05] he also lives in SF, so I bet he doesn't have a lawn :P [23:05:12] <^demon|meeting> Touché, both of you. [23:05:44] I have plants on the balcony (and inside now, winter), no lawn. worksforme [23:06:16] <^demon|meeting> I liked having plants on my balcony in Richmond. [23:06:21] <^demon|meeting> I *do* miss my grape vines. [23:07:42] ^demon|meeting: http://www.vineviewer.co/?search=%23grape vines that have #grape in them [23:08:01] <^demon|meeting> YuviPanda: I've yet to watch a vine. Don't think I'll start now :) [23:08:01] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201957) [23:08:31] ^demon|meeting: me neither. I've so far only been using it to make similarly themed jokes [23:08:50] vine, whine, wine... [23:09:01] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:12:37] Ryan_Lane: what's the proper workaround for fatal: Unknown commit none/master for a new git-deploy setup? [23:15:20] (03PS1) 10Ori.livneh: Remove a line of commented-out code to touch repo for git-deploy [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/98026 [23:15:33] (03CR) 10Ori.livneh: [C: 032 V: 032] Remove a line of commented-out code to touch repo for git-deploy [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/98026 (owner: 10Ori.livneh) [23:31:31] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [23:39:31] (03CR) 10Tim Starling: "> Can we move the www.wikimedia.org portal to the wwwportals file (as a later thing), and still have some sort of catch all and redirect t" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/91209 (owner: 10Reedy) [23:43:07] (03PS1) 10Andrew Bogott: Remove user accounts from the labstore boxes. [operations/puppet] - 10https://gerrit.wikimedia.org/r/98030 [23:43:12] apergos: ^ [23:43:19] heh [23:44:22] (03PS2) 10Andrew Bogott: Remove user accounts from the labstore boxes. [operations/puppet] - 10https://gerrit.wikimedia.org/r/98030 [23:44:27] (03CR) 10ArielGlenn: [C: 031] "yep, and let's find out from Coren how/if we get users onto those boxes, given the existing ldap accounts (shudder)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/98030 (owner: 10Andrew Bogott) [23:45:31] (03CR) 10Andrew Bogott: [C: 032] Remove user accounts from the labstore boxes. [operations/puppet] - 10https://gerrit.wikimedia.org/r/98030 (owner: 10Andrew Bogott) [23:51:17] andrewbogott: do you know how to debug git-deploy issues? [23:51:28] nope! Never used it. [23:51:35] Someday I hope to know how, though :) [23:52:05] who was it that was picking up some git-deploy maintenance responsibilities from ryan? [23:55:14] I've been looking at the code some but no testing yet [23:55:34] I think hashar and... forget who, were using it recently [23:55:55] also if I keep typing in here I will never go to bed (2 am) so.. [23:56:01] signing off, have a good rest of the day [23:56:59] ori-l: I announced that I would like to pick up those responsibilities, but I so far have not, and am also rankly unqualified for the role (so far). [23:58:14] If you want to poke at it together at some point, let me know. I know a bit about it by know, and maybe it's better than nothing.
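A note on the git-deploy question left hanging above ("fatal: Unknown commit none/master" on a new setup): with the Trebuchet/git-deploy tooling of this period, one plausible reading is that the repo has never been through a deployment, so there is no previous deploy commit for the tool to compare against. A minimal sketch of the usual first sync, assuming the repo (here mwprof, per the patches above) is already defined in puppet and checked out on the deployment host; the path and the idea that an initial start/sync clears the error are assumptions, not confirmed in this log:

    cd /srv/deployment/mwprof/mwprof   # deployment checkout; exact path is an assumption
    git deploy start                   # begin a deployment (takes the deploy lock)
    git deploy sync                    # publish the current commit to the deploy targets
    # git deploy abort                 # bail out instead if the sync goes wrong

The log itself ends without a confirmed fix, so comparing a fresh repo's state against one that already deploys cleanly is probably the safer way to debug it.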