[00:00:32] <grrrit-wm>	 (CR) TTO: "Note, needs a manual rebase now :( Sorry, this is kind of my fault" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355 (owner: Odder)
[00:01:19] <MaxSem>	 LD time! I'm frist!
[00:05:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[00:05:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[00:06:52] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 00:06:46 UTC 2013  
[00:07:02] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 00:06:56 UTC 2013  
[00:07:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 12:06:46 AM UTC  
[00:07:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[00:07:51] <logmsgbot>	 !log maxsem synchronized php-1.23wmf8/extensions/MobileFrontend  'https://gerrit.wikimedia.org/r/104682'
[00:08:06] <morebots>	 Logged the message, Master
[00:08:44] <grrrit-wm>	 (CR) MaxSem: [C: 2] Updating schema ID for EchoInteraction [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103611 (owner: Kaldari)
[00:08:55] <grrrit-wm>	 (Merged) jenkins-bot: Updating schema ID for EchoInteraction [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103611 (owner: Kaldari)
[00:11:52] <logmsgbot>	 !log maxsem synchronized wmf-config/CommonSettings.php  'https://gerrit.wikimedia.org/r/103611'
[00:12:07] <morebots>	 Logged the message, Master
[00:22:48] <MaxSem>	 I'm done
[00:24:33] <Aaron|home>	 !log Running a swift eqiad->tampa sync script (no-deletes) to catch any inconsistencies that built up
[00:24:50] <morebots>	 Logged the message, Master
[00:49:19] <icinga-wm>	 PROBLEM - Disk space on terbium is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=79%):  
[00:53:19] <icinga-wm>	 RECOVERY - Disk space on terbium is OK: DISK OK  
[00:54:02] <grrrit-wm>	 (PS1) Aaron Schulz: Avoid md5sum calls in MergeCdbFileUpdates [operations/puppet] - https://gerrit.wikimedia.org/r/104699
[00:55:41] <LeslieCarr>	 who just cleared off the space on terbium ?
[00:55:44] <LeslieCarr>	 and what'd you delete ?
[00:55:50] <grrrit-wm>	 (PS2) Aaron Schulz: Disable MessageBlobStore::clear() via hook [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104191
[00:56:28] <Aaron|home>	 Reedy: that should avoid any notices
[00:56:34] <Aaron|home>	 I guess that can be merged anytime then
[01:04:24] <grrrit-wm>	 (CR) Aaron Schulz: [C: 2] Disable MessageBlobStore::clear() via hook [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104191 (owner: Aaron Schulz)
[01:04:48] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[01:04:48] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[01:05:22] <grrrit-wm>	 (Merged) jenkins-bot: Disable MessageBlobStore::clear() via hook [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104191 (owner: Aaron Schulz)
[01:06:31] <logmsgbot>	 !log aaron synchronized wmf-config/CommonSettings.php  'Disable MessageBlobStore::clear() via hook'
[01:06:49] <morebots>	 Logged the message, Master
[01:06:59] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 01:06:55 UTC 2013  
[01:07:18] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 01:07:10 UTC 2013  
[01:07:48] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 01:07:10 AM UTC  
[01:07:48] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 01:06:55 AM UTC  
[01:08:50] <grrrit-wm>	 (PS2) Aaron Schulz: Setup and enabled redisLockManager for all file backends in use [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104317
[01:26:59] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:27:48] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[01:33:58] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:34:51] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[01:38:01] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:38:02] <icinga-wm>	 PROBLEM - DPKG on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:39:01] <icinga-wm>	 PROBLEM - SSH on terbium is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[01:39:52] <icinga-wm>	 PROBLEM - puppet disabled on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:40:31] <icinga-wm>	 PROBLEM - twemproxy process on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:41:02] <icinga-wm>	 RECOVERY - DPKG on terbium is OK: All packages OK  
[01:41:31] <icinga-wm>	 RECOVERY - twemproxy process on terbium is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker  
[01:41:51] <icinga-wm>	 RECOVERY - puppet disabled on terbium is OK: OK  
[01:43:52] <icinga-wm>	 RECOVERY - SSH on terbium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)  
[01:45:52] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[01:50:01] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:51:51] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[01:55:01] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[01:55:52] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[01:59:01] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[02:04:46] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[02:04:46] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[02:06:56] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 02:06:47 UTC 2013  
[02:07:36] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 02:07:33 UTC 2013  
[02:07:46] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 02:07:33 AM UTC  
[02:07:46] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 02:06:47 AM UTC  
[02:11:19] <grrrit-wm>	 (PS1) Aaron Schulz: Cleaned up RSYNC_ARGS generation in scap-2 [operations/puppet] - https://gerrit.wikimedia.org/r/104704
[02:11:54] <Aaron|home>	 ori: ^ in my local faffing around with rsync, it seems like the include is needed first
[02:14:00] <logmsgbot>	 !log aaron started scap: php-1.23wmf6 testing scap timing
[02:17:31] <logmsgbot>	 !log aaron started scap: php-1.23wmf6 testing scap timing
[02:20:06] <icinga-wm>	 PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[02:24:56] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[02:25:57] <icinga-wm>	 PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[02:27:56] <icinga-wm>	 PROBLEM - puppet disabled on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[02:28:06] <icinga-wm>	 PROBLEM - SSH on terbium is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[02:28:06] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[02:28:06] <icinga-wm>	 PROBLEM - DPKG on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[02:28:46] <icinga-wm>	 RECOVERY - puppet disabled on terbium is OK: OK  
[02:28:56] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[02:28:56] <icinga-wm>	 RECOVERY - SSH on terbium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)  
[02:28:56] <icinga-wm>	 RECOVERY - DPKG on terbium is OK: All packages OK  
[02:30:52] <logmsgbot>	 !log LocalisationUpdate completed (1.23wmf8) at Tue Dec 31 02:30:52 UTC 2013
[02:33:11] * Aaron|home  cancels...probably hanging on terbium
[02:39:47] <Aaron|home>	 looks like Cirrus jobs eating cpu on terbium
[02:54:25] <logmsgbot>	 !log LocalisationUpdate completed (1.23wmf7) at Tue Dec 31 02:54:25 UTC 2013
[02:59:31] <Aaron|home>	 ori: so that run definitely was hitting all active versions, so the MW_VERSIONS_SYNC must not be getting set
[03:01:56] <icinga-wm>	 RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.071 second response time  
[03:01:57] <icinga-wm>	 RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.165 second response time  
[03:02:53] <logmsgbot>	 !log aaron started scap: php-1.23wmf6 Timing test
[03:05:28] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[03:05:41] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[03:06:38] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 03:06:36 UTC 2013  
[03:06:49] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 03:06:46 UTC 2013  
[03:07:28] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 03:06:36 AM UTC  
[03:07:38] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 03:06:46 AM UTC  
[03:20:57] <logmsgbot>	 !log aaron finished scap: php-1.23wmf6 Timing test
[03:21:13] <morebots>	 Logged the message, Master
[03:23:18] <logmsgbot>	 !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 31 03:23:18 UTC 2013
[03:23:35] <morebots>	 Logged the message, Master
[03:24:51] <grrrit-wm>	 (CR) Ori.livneh: [C: 2] Cleaned up RSYNC_ARGS generation in scap-2 [operations/puppet] - https://gerrit.wikimedia.org/r/104704 (owner: Aaron Schulz)
[03:26:22] <Aaron|home>	 TimStarling: odd, the rsync command for sync-common stalls on terbium (even just running it there directly)
[03:26:28] <Aaron|home>	 no other server has that problem
[03:29:00] <Aaron|home>	 you can see the command in <<ps -ef | grep resync>>, if I add -v to it I can see that it receives a few .git files for the rsync listing and then stalls
[03:29:08] <icinga-wm>	 PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[03:29:29] <Aaron|home>	 ori: ;)
[03:29:45] <Aaron|home>	 it sucks that one server hangs all of scap forever though
[03:29:59] <icinga-wm>	 RECOVERY - RAID on terbium is OK: OK: optimal, 1 logical, 2 physical  
[03:30:35] * Aaron|home  wonders if it would be too evil to add timeout to the remote dsh commands...
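
Editor's note: the idea of putting a timeout on the remote commands can be sketched roughly as below. This is only an illustration in Python, not how scap or dsh is implemented; `run_with_timeout` is a hypothetical helper, and the sleeping child process stands in for an rsync that stalls on one host.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd and return (ok, detail).

    If the command does not finish within `seconds`, subprocess.run kills
    it and raises TimeoutExpired, so one stalled host cannot block forever.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=seconds
        )
    except subprocess.TimeoutExpired:
        return False, "timed out after %ss" % seconds
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical stand-in for a remote command that hangs.
    stalled = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(stalled, 0.5))
```

A wrapper like this would also give the deployer a message naming the slow host, which is the logging gap raised later in the discussion.
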
[03:52:00] <TimStarling>	 Aaron|home: apparently terbium is slow because of copyFileBackend or forceSearchIndex or both
[03:52:03] <TimStarling>	 apparently it was swapping
[03:53:23] <TimStarling>	 copyFileBackend is writing very heavily to the local drive
[03:55:04] <TimStarling>	 domas was complaining about it in #mediawiki_security but apparently you are only in that channel as AaronSchulz
[04:04:41] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[04:04:51] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[04:06:41] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 04:06:37 UTC 2013  
[04:06:49] <Aaron|home>	 TimStarling: it only writes directly if there are things to copy, of which there are very few...maybe from virtual memory on disk though
[04:06:51] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 04:06:37 AM UTC  
[04:06:51] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 04:06:42 UTC 2013  
[04:07:29] <TimStarling>	 well, it seems pretty busy in top
[04:07:41] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 04:06:42 AM UTC  
[04:09:42] <TimStarling>	 seems very busy in iotop, which I just managed to install
[04:10:58] <Aaron|home>	 yeah, the output just shows it comparing json file listings and finding very little to copy, so I assume the disk i/o is swap (of course there is good network i/o though)
[04:12:09] <TimStarling>	 check vmstat
[04:12:30] <TimStarling>	 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
[04:12:30] <TimStarling>	  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
[04:12:34] <TimStarling>	  3  6 597540 469320  71652 1394104    0    0 11443 25329 31851 24294 48  4 28 20
[04:12:41] <TimStarling>	 no swap, much disk activity
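
Editor's note: the reading of the vmstat line above ("no swap, much disk activity") can be reproduced with a small parser. This is a minimal sketch; the header and sample row are copied from the log, while `parse_vmstat`, `diagnose`, and the 1000 blocks/s threshold are assumptions made for illustration.

```python
HEADER = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa"
SAMPLE = " 3  6 597540 469320  71652 1394104    0    0 11443 25329 31851 24294 48  4 28 20"


def parse_vmstat(header, row):
    """Pair each vmstat column name with its integer value."""
    return dict(zip(header.split(), (int(v) for v in row.split())))


def diagnose(stats, busy_blocks=1000):
    """Classify a sample; busy_blocks is an arbitrary illustrative threshold."""
    if stats["si"] > 0 or stats["so"] > 0:
        return "swapping"
    if stats["bi"] + stats["bo"] > busy_blocks:
        return "disk I/O without swapping"
    return "quiet"


if __name__ == "__main__":
    print(diagnose(parse_vmstat(HEADER, SAMPLE)))
```

Here `si` and `so` are both 0 while `bi` and `bo` are large, so the sample is classified as disk I/O without swapping. The non-zero `swpd` column only says that some pages were swapped out at some earlier point.
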
[04:14:49] <TimStarling>	 if I strace one of them at random (6565) it shows a lot of files being created and deleted
[04:21:20] * Aaron|home  restarted with less concurrency
[04:21:40] <Aaron|home>	 TimStarling: any pattern to the file names?
[04:21:56] <domas>	 there was lots of swapping before
[04:22:04] <domas>	 but few more processes got killed by oom killer
[04:22:24] <domas>	 Aaron|home: do those processes have to take 1.5G ?
[04:22:38] <TimStarling>	 http://paste.tstarling.com/p/UqzmQl.html
[04:22:52] <TimStarling>	 let me see if I can get the actual file names in swift
[04:23:01] <domas>	 Aaron|home: technically, if one puts the files on shm, then it is much faster!
[04:25:31] <domas>	 but for that one needs not to waste RAM in memory leaks
[04:25:32] <domas>	 :)
[04:29:25] <TimStarling>	 oh, it's stopped now
[04:33:23] <Aaron|home>	 I think it's the --missingonly used on the deleted files containers
[04:34:54] <Aaron|home>	 oh, wait, no, I just misread the code
[04:35:06] * Aaron|home  is still confused then
[04:50:16] <TimStarling>	 Aaron|home: it's happening again...
[04:50:33] <Aaron|home>	 yeah, now it actually is copying files
[04:54:12] <TimStarling>	 what would have happened when it ran out of disk space earlier? would any files have been corrupted?
[05:03:54] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[05:04:04] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[05:06:55] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 05:06:52 UTC 2013  
[05:07:04] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 05:06:57 UTC 2013  
[05:07:05] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[05:07:54] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 05:06:52 AM UTC  
[05:08:19] <Aaron|home>	 TimStarling: cloudfiles seems to ignore the result of fwrite()...not sure
[05:09:04] <TimStarling>	 fclose() is the most common place for disk exhaustion to be reported
[05:11:25] <Aaron|home>	 or fflush() too I suppose
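
Editor's note: the point about where disk exhaustion gets reported can be illustrated with a Python analogue. This is a sketch only; the code under discussion is PHP (cloudfiles), and `write_checked` is a hypothetical helper. With buffered I/O, the write call may only fill a userspace buffer, so an error such as ENOSPC can surface at flush or close instead.

```python
import os
import tempfile


def write_checked(path, data):
    """Write data to path and return (ok, errno).

    Errors are caught at every stage: write, flush, fsync, and the
    implicit close when the with-block exits.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)          # may only fill a userspace buffer
            f.flush()              # hands the buffer to the OS
            os.fsync(f.fileno())   # asks the OS to commit it to disk
    except OSError as exc:
        return False, exc.errno
    return True, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_checked(os.path.join(tmp, "example.bin"), b"payload"))
```
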
[05:16:00] <Aaron|home>	 meh, I'll run this later...I think the mtimes are higher in eqiad slightly for lots of files due to the previous write order, so more copying happens than needed
[05:16:58] <springle>	 !log package upgrade sanitarium hosts db1053 db1054 db1057
[05:17:15] <morebots>	 Logged the message, Master
[05:17:21] <Aaron|home>	 using the md5 value would avoid that...I added that in passing to a patch still in gerrit
[05:20:33] <logmsgbot>	 !log aaron started scap: php-1.23wmf6 Trying timing test again
[05:20:52] <morebots>	 Logged the message, Master
[05:25:52] <logmsgbot>	 !log aaron finished scap: php-1.23wmf6 Trying timing test again
[05:26:13] <morebots>	 Logged the message, Master
[05:26:35] <Aaron|home>	 1/2 that time was searchidx1001
[05:30:10] <Aaron|home>	 TimStarling: ok, any idea what's wrong with searchidx1001?
[05:30:54] * Aaron|home  waits an age for << time sync-common >> to finish
[05:31:51] <TimStarling>	 looks like disk activity again
[05:32:04] <TimStarling>	 this reminds me, you know how you said dsh is slow?
[05:32:59] <TimStarling>	 it shouldn't be that slow, it may be your agent, or the network connection to it, that's at fault
[05:34:13] <Aaron|home>	 TimStarling: the scap took 5 min, and really would be far less without searchidx1001 (and with some more perf fixes too)
[05:34:36] <Aaron|home>	 I'm running it on a kde install on a USB on my gaming desktop
[05:34:59] <Aaron|home>	 since using windows clients (git-bash/putty) gives slow garbage results
[05:38:13] <Aaron|home>	 does searchidx even use MW?
[05:38:50] <TimStarling>	 it's java
[05:38:55] <TimStarling>	 says iotop
[05:40:52] <TimStarling>	 as you might expect
[05:41:57] * Aaron|home  will be glad when elastic replaces all this garbage
[05:42:20] <TimStarling>	 indexing should really run at low priority
[05:43:03] <TimStarling>	 I don't think it makes sense to stall web processes for scap, but it probably makes sense to stall search indexing and cron jobs
[05:44:44] <TimStarling>	 "The priority within the best-effort class will be dynamically derived from the CPU nice level of the process: io_priority = (cpu_nice + 20) / 5. "
[05:45:32] <TimStarling>	 so it should be enough to just run it with nice -n10 or something
[05:46:22] <TimStarling>	 ah, right
[05:47:19] <Aaron|home>	 lol, my pipe broke before sync-common finished ;)
[05:48:13] <TimStarling>	 appservers run sshd at nice level -10, so scap will get an IO priority boost there since it runs from sshd
[05:48:33] <TimStarling>	 it'd be funny if salt turned out to be slower for that reason, and nobody could work out why ;)
[05:48:48] <TimStarling>	 on searchidx1001, sshd is running at nice 0
[05:49:09] <TimStarling>	 so is java
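
Editor's note: the formula quoted above, io_priority = (cpu_nice + 20) / 5, can be worked through for the nice levels mentioned in this discussion. A minimal sketch, assuming integer division and the usual nice range of -20 to 19; `best_effort_io_priority` is a hypothetical name.

```python
def best_effort_io_priority(cpu_nice):
    """Derive the best-effort I/O priority level from a CPU nice value.

    Uses the quoted formula with integer division. Lower results mean
    higher I/O priority.
    """
    if not -20 <= cpu_nice <= 19:
        raise ValueError("nice value must be between -20 and 19")
    return (cpu_nice + 20) // 5


if __name__ == "__main__":
    for nice in (-10, 0, 10):
        print(nice, best_effort_io_priority(nice))
```

Under these assumptions, sshd at nice -10 maps to level 2, while sshd and java at nice 0 on searchidx1001 both map to level 4, so scap gets no I/O advantage over indexing there. Running the indexer with `nice -n10` would move it to level 6.
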
[05:52:36] * Aaron|home  sees rsync in top now
[06:04:43] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[06:04:43] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[06:05:25] <Aaron|home>	 TimStarling: so there is a reasonable fix for this?
[06:06:33] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 06:06:31 UTC 2013  
[06:06:43] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 06:06:31 AM UTC  
[06:07:03] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 06:06:57 UTC 2013  
[06:07:27] <TimStarling>	 well, we could extend the sshd nice level hack to all mediawiki-installation servers, instead of just the ones with applicationserver::service
[06:07:44] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 06:06:57 AM UTC  
[06:20:46] <Aaron|home>	 TimStarling: any reason not to?
[06:22:36] <TimStarling>	 well, I'm not sure what else runs via ssh
[06:22:41] <TimStarling>	 backups etc.
[06:23:45] <TimStarling>	 but if there are low-priority things running via ssh, we can fix them individually
[06:24:27] <TimStarling>	 it's probably the right approach
[06:25:17] <TimStarling>	 maybe we should renice sshd everywhere, in the base module
[07:04:19] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[07:04:39] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[07:06:50] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 07:06:45 UTC 2013  
[07:07:19] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 07:06:45 AM UTC  
[07:07:29] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 07:07:21 UTC 2013  
[07:07:39] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 07:07:21 AM UTC  
[07:11:54] <Aaron|home>	 ori: what do you think of that?
[07:12:09] <ori>	 renicing sshd in the base module?
[07:12:21] <Aaron|home>	 yes
[07:13:18] <Aaron|home>	 that said searchidx1001 is really having a hard time (all commands are slow to start there), so it's a bit of an outlier, but it should help
[07:14:02] <ori>	 have you tried renicing sshd on searchidx1001? what was the effect on scap runtime?
[07:14:25] <Aaron|home>	 can I even do that?
[07:14:39] <ori>	 i think so
[07:16:07] <ori>	 doing it everywhere might be worth doing, but it will still be the case that scap will hang if a host is saturated, right?
[07:16:23] <ori>	 without indicating to the deployer that this is what is happening
[07:16:27] <Aaron|home>	 I don't have permission
[07:17:01] <Aaron|home>	 ori: yeah, that server was like 90% of the wait
[07:17:32] <Aaron|home>	 it was terbium earlier due to some overactive script there
[07:18:28] <ori>	 i reniced it
[07:18:29] <ori>	 old priority 0, new priority -10
[07:19:48] <ori>	 so there's a logging problem that wouldn't be resolved by renicing
[07:20:20] <Aaron|home>	 ori: btw, do you have any clue why $DSH_EXPORTS might not get applied to scap-2 on the remote hosts?
[07:22:00] <logmsgbot>	 !log aaron started scap: php-1.23wmf6 Timing test
[07:22:18] <morebots>	 Logged the message, Master
[07:26:33] <logmsgbot>	 !log aaron finished scap: php-1.23wmf6 Timing test
[07:26:48] <ori>	 4 minutes
[07:26:55] <Aaron|home>	 4min41 sec, with that search service taking like 1min
[07:27:03] <Aaron|home>	 much better though
[07:27:10] <morebots>	 Logged the message, Master
[07:27:41] <Aaron|home>	 though the list of servers is shuffled, so if it alternates between being near the first or last the long-tail will show differently
[07:28:00] <Aaron|home>	 the earlier that box is synced, the faster scap will randomly seem
[07:30:47] <ori>	 i sometimes do careless things, like running zgrep with a regexp on several large rotated log files
[07:30:53] <ori>	 with the default priority
[07:31:50] <ori>	 if i were to do that on fluorine, and sshd was reniced -10
[07:32:13] <ori>	 that would put the demuxer in danger, right?
[07:33:14] <ori>	 i'm not suggesting i should be careless with impunity, just pointing out that people do resource-intensive things in ssh sessions sometimes without realizing it
[07:33:43] <Aaron|home>	 what if the scap commands just have nice in them?
[07:33:54] <ori>	 yeah, that seems better
[07:35:30] <Aaron|home>	 so<<nice -10 /usr/local/bin/scap-1 >> ?
[07:37:49] <Aaron|home>	 well nice -n 10 I mean
[07:38:11] <Aaron|home>	 -10 that is
[07:38:57] <ori>	 i think so, but let's wait and see what paravoid thinks
[07:39:47] <springle>	 !log resume schema changes for gerrit 51675 externallinks.el_id
[07:39:51] <Aaron|home>	 that will give perm errors though
[07:40:05] <morebots>	 Logged the message, Master
[07:40:07] <ori>	 if renicing sshd everywhere allows us to debug overloads it could be useful but i figure ops use the console in such cases
[07:43:09] <ori>	 Aaron|home: maybe in limits.conf
[08:03:10] <grrrit-wm>	 (CR) Ori.livneh: [C: -1] Avoid md5sum calls in MergeCdbFileUpdates (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/104699 (owner: Aaron Schulz)
[08:04:21] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[08:04:31] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[08:05:34] <grrrit-wm>	 (PS2) Aaron Schulz: Avoid md5sum calls in MergeCdbFileUpdates [operations/puppet] - https://gerrit.wikimedia.org/r/104699
[08:07:01] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 08:06:56 UTC 2013  
[08:07:21] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 08:07:12 UTC 2013  
[08:07:21] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 08:06:56 AM UTC  
[08:07:31] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 08:07:12 AM UTC  
[09:04:32] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[09:04:32] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[09:07:02] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 09:06:54 UTC 2013  
[09:07:12] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 09:07:11 UTC 2013  
[09:07:32] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 09:07:11 AM UTC  
[09:07:32] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 09:06:54 AM UTC  
[09:56:36] <grrrit-wm>	 (PS2) Hashar: Central OAuth wiki for Labs (metawiki) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104666 (owner: CSteipp)
[10:01:35] <grrrit-wm>	 (CR) Hashar: [C: -1] "Copy pasted bug first comment as a commit summary to get some context attached to the code." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104666 (owner: CSteipp)
[10:04:30] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[10:04:31] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[10:05:56] <matanya>	 akosiaris: https://gerrit.wikimedia.org/r/#/c/104504/ i hope this is what you meant :)
[10:06:50] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 10:06:46 UTC 2013  
[10:07:20] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 10:07:11 UTC 2013  
[10:07:31] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 10:06:46 AM UTC  
[10:07:31] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 10:07:11 AM UTC  
[10:14:12] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] "The approach is sound. Mostly minor fixes plus one big fix (scopes thingy)" (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104504 (owner: Matanya)
[10:26:38] <grrrit-wm>	 (PS5) Matanya: ircecho: move to a module [operations/puppet] - https://gerrit.wikimedia.org/r/104504
[10:28:04] <matanya>	 thanks akosiaris. comments done
[10:45:19] <grrrit-wm>	 (PS2) Odder: Create Chinese Wikivoyage (zhwikivoyage) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355
[10:45:28] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Create Chinese Wikivoyage (zhwikivoyage) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355 (owner: Odder)
[10:50:07] <grrrit-wm>	 (PS3) Odder: Create Chinese Wikivoyage (zhwikivoyage) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355
[10:57:19] <grrrit-wm>	 (PS1) Odder: Close wikimania2013 wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104726
[11:04:51] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[11:05:01] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[11:06:41] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 11:06:34 UTC 2013  
[11:07:01] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 11:06:34 AM UTC  
[11:07:32] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 11:07:29 UTC 2013  
[11:07:51] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 11:07:29 AM UTC  
[11:22:33] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] "There is one breaking change. Other than that LGTM. Adding a" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104504 (owner: Matanya)
[11:25:08] <grrrit-wm>	 (CR) Matanya: "yeah, will fix those and add the description, good idea." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/104504 (owner: Matanya)
[11:28:25] <grrrit-wm>	 (PS6) Matanya: ircecho: move to a module [operations/puppet] - https://gerrit.wikimedia.org/r/104504
[11:55:15] <grrrit-wm>	 (PS7) Alexandros Kosiaris: ircecho: move to a module [operations/puppet] - https://gerrit.wikimedia.org/r/104504 (owner: Matanya)
[11:55:47] <grrrit-wm>	 (PS2) Hashar: adding '*.raa.se' to the wgCopyUploadsDomains array. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104683 (owner: Dan-nl)
[11:57:04] <grrrit-wm>	 (CR) Hashar: [C: 2] "deploying. We might want to make that list configurable by some user group via a new special page :-)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104683 (owner: Dan-nl)
[11:57:13] <grrrit-wm>	 (Merged) jenkins-bot: adding '*.raa.se' to the wgCopyUploadsDomains array. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104683 (owner: Dan-nl)
[11:58:45] <logmsgbot>	 !log hashar synchronized wmf-config/InitialiseSettings.php  'adding *.raa.se to the wgCopyUploadsDomains array {{gerrit|104683}}'
[11:58:56] <grrrit-wm>	 (CR) Hashar: "deployed in production." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104683 (owner: Dan-nl)
[11:59:26] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] ircecho: move to a module [operations/puppet] - https://gerrit.wikimedia.org/r/104504 (owner: Matanya)
[11:59:36] <morebots>	 Logged the message, Master
[12:01:36] <hashar>	 hello
[12:04:54] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[12:05:04] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[12:05:30] <akosiaris>	 buona sera signore 
[12:06:32] <hashar>	 akosiaris: mind merging in a change for ryan deployment system please ? : -D  https://gerrit.wikimedia.org/r/#/c/103095
[12:06:39] <hashar>	 adds in the integration/kss.git repo in git-deploy
[12:06:42] <akosiaris>	 rgist ?
[12:06:49] <hashar>	 oh rgist
[12:06:55] <hashar>	 we need a wiki page to list all the possible names
[12:07:04] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 12:06:54 UTC 2013  
[12:07:04] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 12:06:59 UTC 2013  
[12:07:05] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[12:07:28] <akosiaris>	 i firmly believe rgist is the best 
[12:07:31] <Betacommand>	 icinga-wm: make your mind up
[12:07:41] <akosiaris>	 it has all the previous names deeply embedded :-)
[12:07:54] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 12:06:59 PM UTC  
[12:08:06] <akosiaris>	 ryan will disagree obviously :-D
[12:08:12] <akosiaris>	 hashar: Do not +2 until the repo integration/kss has been created on gerrit.
[12:08:33] <hashar>	 akosiaris: yeah it has been created and I forged a commit for it let me  clarify
[12:08:52] <akosiaris>	 forging ????? isn't that illegal ? :P
[12:08:56] <akosiaris>	 ok merging 
[12:09:11] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Add integration/kss to deployment repo config [operations/puppet] - https://gerrit.wikimedia.org/r/103095 (owner: Spage)
[12:09:13] <grrrit-wm>	 (CR) Hashar: "The repository has been created and I added a dummy commit that introduce the .gitreview file with https://gerrit.wikimedia.org/r/104730 :" [operations/puppet] - https://gerrit.wikimedia.org/r/103095 (owner: Spage)
[12:09:34] <akosiaris>	 there you go
[12:09:43] <hashar>	 akosiaris: thanks, no urgency to have it deployed on tin since  the repository is empty for now.  S will follow up later I guess
[12:12:57] <grrrit-wm>	 (PS2) Hashar: contint: package curl on slaves [operations/puppet] - https://gerrit.wikimedia.org/r/97526
[12:17:23] <grrrit-wm>	 (PS1) Alexandros Kosiaris: Migrate ircecho module's nrpe checks to nrpe module [operations/puppet] - https://gerrit.wikimedia.org/r/104733
[12:29:34] <icinga-wm>	 PROBLEM - Auth DNS on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call  
[12:30:24] <icinga-wm>	 RECOVERY - Auth DNS on labs-ns0.wikimedia.org is OK: DNS OK: 0.078 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219  
[12:37:49] <grrrit-wm>	 (CR) Hashar: "bunch of nitpicking, haven't looked at apache/kibana configuration." (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104172 (owner: BryanDavis)
[12:47:48] <grrrit-wm>	 (CR) Hashar: [C: 1] "I am not a big fan of adding new extension-list but the change is fine and temporary. So I guess it is not a big deal." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/95996 (owner: Aude)
[12:48:48] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Migrate ircecho module's nrpe checks to nrpe module [operations/puppet] - https://gerrit.wikimedia.org/r/104733 (owner: Alexandros Kosiaris)
[12:56:37] <icinga-wm>	 PROBLEM - Auth DNS on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call  
[12:57:27] <icinga-wm>	 RECOVERY - Auth DNS on labs-ns0.wikimedia.org is OK: DNS OK: 0.149 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219  
[12:59:56] <grrrit-wm>	 (CR) Aude: "how about Thursday afternoon? or someone choose when :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/95996 (owner: Aude)
[13:23:41] <grrrit-wm>	 (CR) Hashar: "Thursday afternoon is fine to me. Want to do that before elastic search upgrade which is at 5pm UTC. Adding an entry on https://wikitech." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/95996 (owner: Aude)
[13:29:50] <hashar>	 akosiaris: got a dns entry for you if you got time https://gerrit.wikimedia.org/r/104376  for kr.wikimedia.org
[13:48:36] <icinga-wm>	 PROBLEM - Auth DNS on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call  
[13:49:26] <icinga-wm>	 RECOVERY - Auth DNS on labs-ns0.wikimedia.org is OK: DNS OK: 5.333 seconds response time. nagiostest.beta.wmflabs.org returns  
[13:57:36] <icinga-wm>	 PROBLEM - Auth DNS on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call  
[13:58:26] <icinga-wm>	 RECOVERY - Auth DNS on labs-ns0.wikimedia.org is OK: DNS OK: 0.149 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219  
[14:05:39] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[14:05:49] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[14:06:59] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 14:06:50 UTC 2013  
[14:07:09] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 14:07:00 UTC 2013  
[14:07:39] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 02:06:50 PM UTC  
[14:07:49] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 02:07:00 PM UTC  
[14:18:39] <icinga-wm>	 PROBLEM - Auth DNS on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call  
[14:20:29] <icinga-wm>	 RECOVERY - Auth DNS on labs-ns0.wikimedia.org is OK: DNS OK: 0.320 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219  
[14:47:47] <grrrit-wm>	 (PS1) Dan-nl: beta: gwtoolset filebackend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104739
[14:52:57] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Add kr.wikimedia DNS entry [operations/dns] - https://gerrit.wikimedia.org/r/104376 (owner: John F. Lewis)
[14:56:03] <grrrit-wm>	 (PS1) Dan-nl: production: add gwtoolset to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104741
[14:56:29] <grrrit-wm>	 (CR) Hashar: [C: 2] beta: gwtoolset filebackend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104739 (owner: Dan-nl)
[14:56:50] <grrrit-wm>	 (Merged) jenkins-bot: beta: gwtoolset filebackend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104739 (owner: Dan-nl)
[15:04:33] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[15:05:33] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[15:07:12] <hashar>	 >>> new Date
[15:07:15] <hashar>	 bah
[15:07:33] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 15:07:24 UTC 2013  
[15:07:33] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 15:07:24 UTC 2013  
[15:07:33] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[15:08:33] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 03:07:24 PM UTC  
[15:48:11] <grrrit-wm>	 (PS1) Hashar: retab certs.pp [operations/puppet] - https://gerrit.wikimedia.org/r/104742
[15:48:12] <grrrit-wm>	 (PS1) Hashar: certs.pp puppet lint fixes [operations/puppet] - https://gerrit.wikimedia.org/r/104743
[15:49:32] <grrrit-wm>	 (CR) Hashar: "I am not sure of the impacts in production if I have made any mistake in there :(" [operations/puppet] - https://gerrit.wikimedia.org/r/104743 (owner: Hashar)
[15:53:41] <jeremyb>	 whoops, missed hashar
[16:04:59] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[16:05:18] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[16:07:28] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 16:07:18 UTC 2013  
[16:07:38] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 16:07:28 UTC 2013  
[16:07:59] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 04:07:28 PM UTC  
[16:08:18] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 04:07:18 PM UTC  
[17:04:09] <manybubbles>	 something seems unhappy with centralnotice
[17:04:50] <manybubbles>	 mwalkerz: are you centralnotice?
[17:05:04] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[17:05:05] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[17:05:30] <manybubbles>	  No banner exists where tmp_name = B13_123115_yer_enUS
[17:07:14] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 17:07:06 UTC 2013  
[17:07:14] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 17:07:11 UTC 2013  
[17:08:04] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 05:07:11 PM UTC  
[17:08:05] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 05:07:06 PM UTC  
[17:09:30] <Nemo_bis>	 manybubbles: does it need to be disabled? any meta-wiki admin or staffer can do so until it's fixed
[17:09:58] <manybubbles>	 Nemo_bis: I dunno if it needs to be disabled, just that it is logging more errors than normal
[17:10:08] <manybubbles>	 I can't tell if those errors are catastrophic from here
[17:10:48] <manybubbles>	 it might just need some configuration fixed
[17:10:53] <manybubbles>	 or a banner uploaded, or something
[17:29:33] <grrrit-wm>	 (CR) Chad: "Code's live, so this can probably go live whenever." [operations/puppet] - https://gerrit.wikimedia.org/r/103768 (owner: BryanDavis)
[18:04:01] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[18:04:11] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[18:06:47] <halfak>	 ^ Epoch Fail
[18:07:01] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 18:06:57 UTC 2013  
[18:07:11] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 06:06:57 PM UTC  
[18:07:21] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 18:07:18 UTC 2013  
[18:08:01] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 06:07:18 PM UTC  
[18:12:31] <icinga-wm>	 PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.  
[18:13:21] <icinga-wm>	 RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical  
[18:27:04] <eranroz>	 please revert https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FParserFunctions/164a6469b6d95c447869a1a427a66150b76a2c58 and deploy it as hotfix to hewikisource
[18:27:30] <eranroz>	 this change breaks magic words and many templates are now broken
[18:38:42] <eranroz>	 *HOTFIX* please revert  https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FParserFunctions/164a6469b6d95c447869a1a427a66150b76a2c58 - Hebrew Wikisource is currently broken
[18:39:24] <greg-g>	 eranroz: is there a bug for this?
[18:39:44] <eranroz>	 https://bugzilla.wikimedia.org/46613
[18:39:49] <greg-g>	 siebrand: ^^
[18:40:08] <greg-g>	 ah, I see
[18:40:11] <greg-g>	 he's commented
[18:42:08] <greg-g>	 eranroz: have you commented on the bug (are you kipod)?
[18:42:20] <eranroz>	 no
[18:42:38] <greg-g>	 it seems there is disagreement about the next step here
[18:42:40] <eranroz>	 but this bug happened in March and then there was a revert for the specific change
[18:42:51] <eranroz>	 the site is currently broken
[18:42:59] <eranroz>	 we can take everything back and then discuss it
[18:44:32] <greg-g>	 Reedy: thoughts ^^
[18:46:54] <eranroz>	 https://gerrit.wikimedia.org/r/#/c/56196/
[18:47:03] <eranroz>	 Reedy reverted such a change a few months ago
[18:47:07] <eranroz>	 when it first happened
[18:47:26] <eranroz>	 there is probably a bug in the Translate extension that automatically breaks the Hebrew translations
[18:47:36] <eranroz>	 whenever someone changes other messages there
[18:51:17] <greg-g>	 eranroz: can you show me a broken page?
[18:51:22] <greg-g>	 also, interesting http://i.imgur.com/ChiXrgS.jpg
[18:51:54] <twkozlowski>	 You have a really unreadable screen, greg-g
[18:52:04] <eranroz>	 well this is because of RTL issues
[18:52:13] <eranroz>	 i'm not sure it is a bug :)
[18:52:21] <greg-g>	 twkozlowski: sorry, dumb default of jpg there
[18:52:22] <greg-g>	 http://imgur.com/LuL2wwh
[18:52:33] <greg-g>	 well, my username is incorrectly displayed :)
[18:53:18] <eranroz>	 for example https://he.wikisource.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%98%D7%A7%D7%A1%D7%98:%D7%9E%D7%96%D7%A0%D7%95%D7%9F
[18:53:22] <eranroz>	 https://he.wikisource.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%98%D7%A7%D7%A1%D7%98:%D7%9E%D7%96%D7%A0%D7%95%D7%9F
[18:53:40] <eranroz>	 (the village pump message at the top)
[18:53:54] <greg-g>	 ah, I see
[18:54:03] <eranroz>	 we can fix the specific template there, but it occurs in all templates
[18:54:04] <greg-g>	 Reedy: around?
[18:54:09] <greg-g>	 eranroz: yeah
[18:58:40] <^d>	 Force running l10n update now.
[19:00:43] <eranroz>	 thanks (https://gerrit.wikimedia.org/r/#/c/104752/)
[19:00:57] <logmsgbot>	 !log LocalisationUpdate completed (1.23wmf8) at Tue Dec 31 19:00:56 UTC 2013
[19:02:16] <logmsgbot>	 !log LocalisationUpdate completed (1.23wmf7) at Tue Dec 31 19:02:16 UTC 2013
[19:02:22] <^d>	 I think that's a lie though.
[19:04:02] <^d>	 So we're changing i18n file format but nobody decided to update the LocalisationUpdate extension?
[19:04:22] <eranroz>	 ?
[19:05:15] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[19:05:16] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[19:05:32] <greg-g>	 ^d: ugggggggggggggggggggg
[19:06:45] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 19:06:35 UTC 2013  
[19:06:54] <AaronSchulz>	 ori: does https://gerrit.wikimedia.org/r/#/c/104699/ look ok now?
[19:07:16] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 07:06:36 PM UTC  
[19:07:35] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 19:07:26 UTC 2013  
[19:08:11] <logmsgbot>	 !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 31 19:08:11 UTC 2013
[19:08:15] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 07:07:26 PM UTC  
[19:08:16] <^d>	 greg-g: Nevermind. I was misreading the exception.
[19:08:23] <^d>	 In any case, it's sorta broke
[19:08:44] <greg-g>	 just a guess, because of AaronSchulz's changes?
[19:08:59] <^d>	 What changes?
[19:10:08] <^d>	 I'm getting stuff like http://p.defau.lt/?HkROozC11wjb2kUyQdGWhw
[19:10:32] <greg-g>	 huh
[19:12:38] <logmsgbot>	 !log LocalisationUpdate completed (1.23wmf8) at Tue Dec 31 19:12:38 UTC 2013
[19:14:02] <logmsgbot>	 !log LocalisationUpdate completed (1.23wmf7) at Tue Dec 31 19:14:02 UTC 2013
[19:14:42] <AaronSchulz>	 ^d: errors reading the i18n files? I never touched that stuff
[19:15:04] <greg-g>	 I thought that's what the cdb file stuff was, sorry
[19:16:17] <^d>	 AaronSchulz: I know, that's why I said what changes :p
[19:19:10] <logmsgbot>	 !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 31 19:19:10 UTC 2013
[19:21:54] <greg-g>	 ^d: so, still broken, what's next? :(
[19:22:01] <^d>	 I'm trying something else.
[19:22:04] <greg-g>	 k
[19:23:06] <logmsgbot>	 !log demon synchronized php-1.23wmf7/extensions/ParserFunctions  'Updating PFuncs to master'
[19:23:33] <logmsgbot>	 !log demon synchronized php-1.23wmf8/extensions/ParserFunctions  'Updating PFuncs to master'
[19:23:52] <^d>	 Well, that's not enough either :\
[19:25:17] <AaronSchulz>	 ^d: what are you trying to do?
[19:25:28] <^d>	 Playing whack a mole.
[19:25:37] <AaronSchulz>	 well, besides that
[19:25:57] <^d>	 hebrew wikis are broken with parser functions.
[19:26:05] * AaronSchulz  volunteers Reedy to make that make-wmf-branch change for extensions
[19:26:09] <^d>	 A patch was committed, which I merged to master.
[19:27:54] <^d>	 Tried to run l10nupdate, but that broke
[19:33:09] <ori>	 AaronSchulz: reviewing
[19:35:26] <AaronSchulz>	 ^d: what is an example page?
[19:35:38] <^d>	 https://he.wikisource.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%98%D7%A7%D7%A1%D7%98:%D7%9E%D7%96%D7%A0%D7%95%D7%9F
[19:35:53] <^d>	 https://gerrit.wikimedia.org/r/#/c/104752/ was the revert
[19:39:01] <eranroz>	 i just noticed the revert has changed something in the last line of the diff
[19:39:35] <eranroz>	 (dont know what)
[19:39:53] <eranroz>	 no new line at the EOF
[19:40:08] <eranroz>	 can it cause problems for parsers
[19:40:12] <eranroz>	 ?
[19:40:12] <logmsgbot>	 !log aaron started scap: active
[19:42:43] <ori>	 AaronSchulz: change looks ok if you commit to adding detailed usage notes (explaining, for example, when you would want to specify --no-checksum, and when not) in a follow-up commit
[19:44:30] <grrrit-wm>	 (CR) Ori.livneh: "Looks ok if you commit to adding detailed usage notes (explaining, for example, when you would want to specify --no-checksum, and when not" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104699 (owner: Aaron Schulz)
[19:51:36] <grrrit-wm>	 (PS1) Aaron Schulz: Made RefreshCdbJsonFiles include newlines in the JSON [operations/puppet] - https://gerrit.wikimedia.org/r/104762
[20:01:27] <grrrit-wm>	 (PS3) Aaron Schulz: Avoid md5sum calls in MergeCdbFileUpdates [operations/puppet] - https://gerrit.wikimedia.org/r/104699
[20:01:28] <AaronSchulz>	 ori: there we go ^
[20:02:43] <manybubbles>	 ottomata: hey, you around?  can we talk about elasticsearch plugins.  I want them so bad
[20:03:21] <Reedy>	 ...
[20:03:42] <ottomata>	 manybubbles: hey ja but not working right now
[20:03:52] <ottomata>	 i'm helping my little cousin install linux, wooo!
[20:04:07] <ottomata>	 manybubbles: but i need to get that stuff worked out soon too
[20:04:11] <ottomata>	 so let's talk about it next week
[20:04:29] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[20:04:29] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[20:07:09] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 20:07:04 UTC 2013  
[20:07:29] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 08:07:04 PM UTC  
[20:07:39] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 20:07:29 UTC 2013  
[20:08:29] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 08:07:29 PM UTC  
[20:20:16] <grrrit-wm>	 (PS1) Chad: Prioritize priority CirrusSearch jobs [operations/puppet] - https://gerrit.wikimedia.org/r/104763
[20:20:53] <AaronSchulz>	 ^d: are those jobs really fast?
[20:21:07] <^d>	 Should be, they're single page updates.
[20:21:11] <logmsgbot>	 !log aaron finished scap: active
[20:21:14] <^d>	 Triggered by page edits
[20:21:22] <^d>	 (Rather than bulk updates)
[20:21:23] <AaronSchulz>	 I'd imagine just the fact that it's a separate queue will prioritize it better than before
[20:21:42] <AaronSchulz>	 ^d: so they take like 100s of ms?
[20:21:49] <AaronSchulz>	 or less
[20:22:29] <^d>	 Yeah, at most a few 100ms each
[20:23:16] <^d>	 http://p.defau.lt/?fF9p9rtdKx_vnVVRhfCrWQ
[20:23:56] <halfak>	 Hey folks.  I need a package installed on stat1.wikimedia.org.  Can someone tell me the right way to request this?
[20:24:45] <AaronSchulz>	 ^d: https://he.wikisource.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%98%D7%A7%D7%A1%D7%98:%D7%9E%D7%96%D7%A0%D7%95%D7%9F looks fine now
[20:25:05] <^d>	 Thanks.
[20:25:41] <AaronSchulz>	 they .122 sec/job
[20:25:45] <AaronSchulz>	 *they are
[20:26:00] <AaronSchulz>	 according to getJobProfileTimes on fluorine
[20:26:28] <grrrit-wm>	 (CR) Aaron Schulz: [C: 1] "Average runtime is .122sec/job, which is fine" [operations/puppet] - https://gerrit.wikimedia.org/r/104763 (owner: Chad)
[20:26:37] * halfak  will just file a ticket with ops-requests and hope for the best. 
[20:28:06] <RobH>	 halfak: so thats the right first step
[20:28:24] <RobH>	 if its already an ubuntu package, thats much much easier.
[20:28:31] <halfak>	 RobH: Thanks.  It is. 
[20:28:42] <RobH>	 someone in ops reviews the package and ensures its ok to run, then we can add into system via puppet manifests
[20:29:04] <RobH>	 if you are comfortable doing the puppet work, you can check it into gerrit and attach a reference to it in the rt request
[20:29:16] <RobH>	 but everything after 'ops-request' is optional
[20:29:23] <RobH>	 more folks can do though, the faster stuff goes
[20:29:43] <RobH>	 then you can follow up with whoever is on RT triage duty that week normally
[20:30:36] <halfak>	 What's "RT triage duty" and how do I find out who is on it?
[20:30:54] <RobH>	 RT: name in topic
[20:30:57] <RobH>	 this week it is alex
[20:31:14] <RobH>	 https://wikitech.wikimedia.org/wiki/RT_Triage_Duty
[20:31:34] <RobH>	 Basically operations department now has a volunteer from our team every week as a point person for all operations requests
[20:31:43] <RobH>	 they triage the ops-request queue and other queues as well
[20:32:01] <RobH>	 so if you email and don't get a response for something, you have a go to person
[20:32:19] <halfak>	 Cool.  So if this is a blocker for me (it is) I should ping the triager after filing the ticket?
[20:32:35] <RobH>	 its not a bad idea =]
[20:33:05] <RobH>	 though rt triage doesnt mean  he has to personally do every task, but he will either do it or find someone to do it
[20:33:19] <halfak>	 RobH: Makes sense.  Thanks for the help. 
[20:33:27] <RobH>	 or if no one is available he can ping ken and see if we have to free up another project in progress
[20:34:24] <Reedy>	 halfak: There's also the shortcut
[20:34:33] <Reedy>	 You just need to bribe someone with a whisky
[20:34:47] <Reedy>	 s/bribe/encourage/
[20:35:04] <halfak>	 I would totally do that.  How's the promise of whiskey?  Also what type is preferred?
[20:35:08] <RobH>	 harder to do with ops though
[20:35:17] <RobH>	 since right now alex is triage in greece
[20:35:21] <grrrit-wm>	 (CR) Manybubbles: [C: 1] Prioritize priority CirrusSearch jobs [operations/puppet] - https://gerrit.wikimedia.org/r/104763 (owner: Chad)
[20:35:30] <RobH>	 next week could be sean in AU, or me in SF, etc...
[20:35:37] <RobH>	 thats a LOT of whiskey to ship about
[20:36:45] <grrrit-wm>	 (PS4) Ori.livneh: Avoid md5sum calls in MergeCdbFileUpdates [operations/puppet] - https://gerrit.wikimedia.org/r/104699 (owner: Aaron Schulz)
[20:36:50] <grrrit-wm>	 (CR) Ori.livneh: [C: 2 V: 2] Avoid md5sum calls in MergeCdbFileUpdates [operations/puppet] - https://gerrit.wikimedia.org/r/104699 (owner: Aaron Schulz)
[20:36:59] <halfak>	 Hmmm.  For SF, I can always just recruit a collaborator to obtain and deliver whiskey. 
[20:37:05] * halfak  is in MN. 
[20:37:51] <ori>	 halfak: what's the package?
[20:38:01] <halfak>	 python3-dev
[20:38:09] <ori>	 oh, that's hardly objectionable
[20:38:09] <grrrit-wm>	 (PS1) Chad: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104765
[20:38:10] <halfak>	 I just need the headers to compile oursql for python3
[20:38:32] <RobH>	 ?
[20:38:33] <ori>	 RobH: I'm going to add it to puppet. stat1 is explicitly cordoned off for analyst tooling
[20:38:42] <RobH>	 hrmm
[20:38:51] <RobH>	 running a db on it wha?
[20:38:59] <RobH>	 stat1 is insanity
[20:39:01] <grrrit-wm>	 (PS2) Chad: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104765
[20:39:08] <RobH>	 i think someone will want to review that ;]
[20:39:17] <ori>	 RobH: analysts typically load datasets from $datastore and process them
[20:39:21] <halfak>	 I just created the ticket.  See https://rt.wikimedia.org/Ticket/Display.html?id=6561
[20:39:32] <RobH>	 well, whatever reasoning you guys need it for please list it
[20:39:48] <RobH>	 so good enough, not saying no or nothin
[20:39:55] <RobH>	 just saying more info the better, makes it faster
[20:40:14] * ori  nods
[20:56:22] <AaronSchulz>	 ori: https://gerrit.wikimedia.org/r/#/c/104762/ is fairly simple
[20:56:41] <AaronSchulz>	 and the files don't crash vim with that ;)
[21:05:29] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[21:05:39] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[21:07:09] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 21:07:07 UTC 2013  
[21:07:19] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 21:07:17 UTC 2013  
[21:07:29] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 09:07:07 PM UTC  
[21:07:39] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 09:07:17 PM UTC  
[21:10:27] <dan-nl>	 hey Reedy, GWToolset on Commons no longer has i18n available to it. are you able to take a look at https://gerrit.wikimedia.org/r/#/c/104741/ and merge/deploy it if you're okay with it?
[21:10:35] <ori>	 AaronSchulz: reviewing
[21:10:51] <AaronSchulz>	 ori: oh, and you can look at https://gerrit.wikimedia.org/r/#/c/103619/ (not related) :)
[21:10:57] <ori>	 also, re: --trustmtime, i forgot to actually merge it earlier
[21:11:00] <ori>	 so i just merged it now
[21:11:06] <ori>	 in case you're wondering why it's not showing up on tin
[21:11:11] <ori>	 AaronSchulz: kk
[21:12:40] <grrrit-wm>	 (PS2) Reedy: production: add gwtoolset to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104741 (owner: Dan-nl)
[21:12:56] <Reedy>	 dan-nl: That means it needs a scap run...
[21:13:19] <dan-nl>	 thanks reedy … what's a scap run?
[21:13:30] <Reedy>	 rebuilds the message caches etc
[21:13:41] <dan-nl>	 ah, okay ...
[21:13:43] <Reedy>	 Or... I merge it now and wait for localisation update to fix it in a few hours
[21:14:37] <Reedy>	 I wonder if extension-list-1.23wmf7 should just be deleted rather than a blank line
[21:14:41] <dan-nl>	 whichever you feel is best is fine
[21:14:47] <dan-nl>	 probably
[21:14:56] <Reedy>	 AaronSchulz: Does scap work atm?
[21:15:20] <AaronSchulz>	 Reedy: yeah, I ran it earlier today, I can run again
[21:15:29] * AaronSchulz  is always curious what the timing is
[21:15:50] <Reedy>	 Running it with no changes?
[21:16:56] <AaronSchulz>	 ?
[21:17:19] <AaronSchulz>	 that would be post merge of course
[21:17:28] <dan-nl>	 re: extension-list-1.23wmf7, i'm not sure why we needed it … with nothing in it i imagine you're right and the file can be deleted now. 
[21:18:39] <dan-nl>	 Reedy: if you'd like i can delete it and upload another patch set. just let me know
[21:19:07] <grrrit-wm>	 (CR) Ori.livneh: [C: 2] Made RefreshCdbJsonFiles include newlines in the JSON [operations/puppet] - https://gerrit.wikimedia.org/r/104762 (owner: Aaron Schulz)
[21:25:36] <grrrit-wm>	 (PS3) Reedy: production: add gwtoolset to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104741 (owner: Dan-nl)
[21:27:45] <dan-nl>	 Reedy: should wmf-config/extension-list-labs stay in the config for future extensions that want to transition from labs to production or is there no consequence to placing new extensions in the extension-list immediately?
[21:27:50] <grrrit-wm>	 (Merged) jenkins-bot: production: add gwtoolset to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104741 (owner: Dan-nl)
[21:28:19] <Reedy>	 dan-nl: It's wrapped in a file existence check
[21:28:40] <Reedy>	 dan-nl: Putting it in extension-list is a problem if the extension is not in all of the 2-3 currently active branches
[21:29:08] <AaronSchulz>	 Reedy: when did that become the case? It used to not be a long time ago afaik
[21:29:23] <Reedy>	 ~I
[21:29:29] <Reedy>	 It's been a problem for a while
[21:30:19] <Reedy>	 Which is what the extension-list-1.XXwmfY was the workaround for
[21:30:46] <AaronSchulz>	 it used to give a notice and move on
[21:31:20] <Reedy>	 It did
[21:31:23] <Reedy>	 reedy@tin:/a/common$ time scap "Rebuild 1.23wmf8 l10n cache with gwtoolset in it"
[21:31:23] <Reedy>	 Invalid MediaWiki version "Rebuild 1.23wmf8 l10n cache with gwtoolset in it"
[21:31:27] <Reedy>	 Say wut?
[21:31:44] <Reedy>	 reedy@tin:/a/common$ scap --help
[21:31:44] <Reedy>	 Invalid MediaWiki version "--help"
[21:31:47] <Reedy>	 Helpful.
[21:31:53] <AaronSchulz>	 scap never had --help
[21:31:59] <dan-nl>	 hmm … so how can i find out if gwtoolset is in all current active branches?
[21:32:00] <AaronSchulz>	 just do "scap active <summary>"
[21:32:03] <twkozlowski>	 Reedy: lol
[21:32:08] <grrrit-wm>	 (CR) Ori.livneh: "Looks good to me. What impact do you expect this to have? Should we notify someone in ops?" [operations/puppet] - https://gerrit.wikimedia.org/r/104763 (owner: Chad)
[21:32:12] <twkozlowski>	 59169
[21:32:52] <Reedy>	 dan-nl: It's only usually a problem around first deploying an extension
[21:33:05] <Reedy>	 It then goes into a config to make sure it's branched for every version going forward (until removed)
[21:34:28] <greg-g>	 AaronSchulz: mind sending out a note about how scap has changed to engineering?
[21:35:06] <ori>	 greg-g: it might be good to let things settle a bit first
[21:35:20] <logmsgbot>	 !log reedy started scap: active Rebuild 1.23wmf8 l10n cache with gwtoolset in it
[21:35:39] <greg-g>	 ori: when will that be?
[21:35:45] <ori>	 we merged a lot of stuff over the past couple of days, and we don't have a good grasp of the impact quite yet
[21:36:09] <ori>	 I'm not sure; at least a few days, I think. AaronSchulz, what do you think?
[21:36:15] <greg-g>	 a few days?!
[21:36:31] <greg-g>	 so, there's people planning to push stuff out on Thursday
[21:36:44] <greg-g>	 just saying :)
[21:36:46] <Reedy>	 That would've been fun waiting for SF to wake up
[21:36:49] <ori>	 they don't need to do anything differently
[21:36:54] <Reedy>	 Y U NO LET ME PREP DEPLOYMENT
[21:36:56] <Reedy>	 ori: active?
[21:37:02] <Reedy>	 [21:31:19] <Reedy> reedy@tin:/a/common$ time scap "Rebuild 1.23wmf8 l10n cache with gwtoolset in it"
[21:37:02] <Reedy>	 [21:31:19] <Reedy> Invalid MediaWiki version "Rebuild 1.23wmf8 l10n cache with gwtoolset in it"
[21:37:07] <greg-g>	 yeah, Reedy wasn't aware of how to deploy now
[21:37:07] <Reedy>	 [21:31:57] <AaronSchulz> just do "scap active <summary>"
[21:37:10] <Reedy>	 That's something different
[21:37:22] <greg-g>	 so, ie: something needs to either be communicated, or done differently
[21:37:25] <ori>	 ah, hah, right. That's a good point
[21:37:48] <greg-g>	 would have been better to test with a different command
[21:38:06] <ori>	 "git deploy" was taken
[21:38:13] <greg-g>	 leave scap alone, create scaptest or something and only switch when ready
[21:39:02] <ori>	 meh, I don't really agree
[21:39:26] <greg-g>	 why?
[21:39:44] <greg-g>	 things were broken on Monday for manybubbles, now they are different for Reedy without notice...
[21:39:49] <greg-g>	 seems things were done incorrectly to me
[21:40:33] <ori>	 ok, i'm going to be slightly mean for a moment, because i feel strongly about this
[21:40:39] <greg-g>	 cool
[21:41:44] <ori>	 the official deployment sprint was done "correctly" from that perspective because things didn't break or change without notice, but the reason for that is that we didn't really make substantial changes
[21:41:58] * greg-g  nods
[21:42:16] <ori>	 something very clearly is being done correctly at the moment because aaron managed to make substantial improvements to the deployment process
[21:42:33] <ori>	 in some sense, my argument is a false dichotomy, because you can make substantial changes *and* communicate them
[21:42:53] <greg-g>	 right
[21:43:00] <greg-g>	 I was going to say those things are not related
[21:43:03] <ori>	 and i was wrong above to suggest we should wait; i did forget about the subargument change
[21:43:26] <ori>	 but people often fail to appreciate the cost of running a parallel setup, and the cost of simulating production conditions, and the cost of architecting on dry land
[21:43:28] <greg-g>	 so why not make big major changes in a way that won't break other people's work?
[21:43:37] <greg-g>	 I mean, we talk about feature flags all the time, why not here?
[21:44:09] <ori>	 because bash is not a nice programming language and even trivial logic breaks so the overhead of adding that would not be worth it
[21:44:16] <greg-g>	 (obviously not "scap --new blah", but that'd be neat, you know what I mean ;) )
[21:44:26] <ori>	 it's a question of standards and priorities and organizational culture
[21:44:30] <greg-g>	 I mean what I said before, scaptest or something
[21:44:50] <ori>	 we are not launching rockets into space; wikimedia is an experiment and we are where we are because we were bold and tried things out
[21:44:57] * greg-g  sighs
[21:45:13] <greg-g>	 sure, everything is an experiment so it doesn't matter when anyone breaks anything
[21:45:17] <greg-g>	 argument not valid
[21:45:38] <ori>	 that is not license to break things with abandon, but man, i get tired of this protocol
[21:45:42] <greg-g>	 sorry, I should have said the same thing you did about being mean
[21:45:48] <ori>	 it is exaggerated and it stifles work and creativity
[21:46:01] <greg-g>	 kind of, so, tell me why doing this in scaptest wouldn't work?
[21:46:06] <grrrit-wm>	 (03CR) 10Manybubbles: "Andrew Otto is who we've been working with mostly in ops but I believe he is out this week. I've added him as a reviewer." [operations/puppet] - 10https://gerrit.wikimedia.org/r/104763 (owner: 10Chad)
[21:46:43] <ori>	 maybe it would, but that's hindsight; the scope of the work didn't start out being substantial, it's just that aaron kept discovering more opportunities for optimizing
[21:47:04] <greg-g>	 sure, I understand and appreciate that momentum aspect
[21:47:25] <greg-g>	 just sometimes there's a line, is all
[21:47:50] <ori>	 did the site go down or something?
[21:48:10] <Nemo_bis>	 on the bright side, we had already disabled pt.wiki emergency captcha earlier :D
[21:48:18] <greg-g>	 Nemo_bis: :) :)
[21:48:48] <greg-g>	 ori: no, just complaining that if you two are offline for any amount of time when someone else is trying to do something, they're stuck through no fault of their own
[21:49:20] <greg-g>	 that's it really.
[21:49:37] <ori>	 that is regrettable and in hindsight it is almost always the case that any breakage, even minor, could have been avoided
[21:49:50] <greg-g>	 just maybe a heads up to the list like "hey, so, we started making some changes to scap, got bigger than we planned, not done, but we'll let you know when it's all clear" would be really nice
[21:49:54] <greg-g>	 :)
[21:50:04] <greg-g>	 I mean, scap is kind of important ;)
[21:50:09] <ori>	 but harping on that reflects lopsided priorities, imo
[21:50:21] <ori>	 the big news story is that things are finally moving with it
[21:50:22] <greg-g>	 informed deployers vs progress?
[21:50:28] <greg-g>	 yay! to that (honestly)
[21:50:47] <greg-g>	 (also, my vs up there was just trying to clarify, not continue an attack)
[21:51:14] <twkozlowski>	 Nemo_bis: yay! \o/
[21:51:42] <ori>	 IIRC facebook senior engs issued tshirts for the staff that say something like "failures happen"
[21:51:53] * greg-g  nods
[21:52:01] <dan-nl>	 Reedy: just an fyi, the i18n for gwtoolset is now back on production … thanks!
[21:52:19] <ori>	 out of recognition that creativity and momentum are rare and fleeting and easily suffocated by an exaggerated emphasis on safety and control
[21:52:28] <greg-g>	 fair
[21:52:31] <Reedy>	 dan-nl: Yeah, not completely finished though ;)
[21:53:01] <twkozlowski>	 Reedy: on an unrelated note
[21:53:11] <Reedy>	 twkozlowski: No
[21:53:13] <Reedy>	 No you can't.
[21:53:22] <ori>	 anyways </rant>, thanks for hearing me out.
[21:53:30] <greg-g>	 ori: so, I apologize for over reacting, I just see worst-case scenarios and people asking me the "shoulda coulda" things
[21:53:39] <greg-g>	 ori: ditto
[21:53:42] <twkozlowski>	 Reedy: OK
[21:54:01] <Reedy>	 :D
[21:54:38] <ori>	 greg-g: yeah, me too (overreacting). i understand your perspective.
[21:55:21] <greg-g>	 ori: hifive
[21:55:50] * greg-g  was gonna type "hi5" but thought that was too lolspeek
[21:56:41] <logmsgbot>	 !log reedy finished scap: active Rebuild 1.23wmf8 l10n cache with gwtoolset in it
[21:57:38] <Reedy>	 twkozlowski: Wassup?
[21:57:42] <Reedy>	 dan-nl: Should be all fixed now
[21:57:49] <dan-nl>	 thanks again :)
[21:59:13] <ori>	 AaronSchulz: OK, so I think that if -n "$1" but "$1" != "active" and not -d "$MW_COMMON_SOURCE/$1", we should assume $1 is a log message rather than a version
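The heuristic ori describes here could look roughly like the sketch below. It is not scap's actual code: the function name `classify_first_arg` is hypothetical, and the `/a/common` default for `MW_COMMON_SOURCE` is only an assumption taken from the shell prompt Reedy pasted earlier.

```shell
# Hypothetical sketch of the "is $1 a version or a log message?" check.
# classify_first_arg is an illustrative name, not a function in scap.
# The default path below is an assumption for the sketch.
MW_COMMON_SOURCE="${MW_COMMON_SOURCE:-/a/common}"

classify_first_arg() {
    if [ -z "${1:-}" ]; then
        # No first argument at all.
        echo "none"
    elif [ "$1" = "active" ]; then
        # The literal keyword "active" selects the active versions.
        echo "version"
    elif [ -d "$MW_COMMON_SOURCE/$1" ]; then
        # A matching directory exists, so treat $1 as a version.
        echo "version"
    else
        # Anything else is assumed to be a log message.
        echo "message"
    fi
}
```

Under this rule a mistyped keyword such as "acitve" is classified as a message and nothing complains about it.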
[21:59:20] <twkozlowski>	 I noticed https://wikitech.wikimedia.org/wiki/Add_a_wiki is seriously outdated or plainly wrong
[21:59:33] <twkozlowski>	 when I went through the instructions for zhwikivoyage
[21:59:40] <AaronSchulz>	 ori: I was actually making a regex
[21:59:55] <AaronSchulz>	 it's taking forever to get working in my test script though
[22:00:06] <AaronSchulz>	 even though it's trivial
[22:00:52] <ori>	 maybe require --flag / --option=value
[22:04:02] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[22:04:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC  
[22:05:22] <grrrit-wm>	 (03PS3) 10Manybubbles: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104765 (owner: 10Chad)
[22:06:52] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 22:06:48 UTC 2013  
[22:07:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 10:06:48 PM UTC  
[22:07:04] <grrrit-wm>	 (03PS1) 10Aaron Schulz: Allow "scap <message>" calls as they previously worked [operations/puppet] - 10https://gerrit.wikimedia.org/r/104772 
[22:07:13] <AaronSchulz>	 ori: ^
[22:07:22] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 22:07:18 UTC 2013  
[22:07:38] <ori>	 AaronSchulz: hmmm
[22:08:03] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 10:07:18 PM UTC  
[22:08:05] <ori>	 AaronSchulz: better, IMO: https://dpaste.de/mQA2
[22:08:47] <ori>	 actually, after VERSIONS="${1#--versions=}", there should be a call to "shift"
[22:09:41] <grrrit-wm>	 (03PS4) 10Chad: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104765 
[22:10:07] <ori>	 the way you have it, if i mean to type active but instead type "acitve", scap proceeds without complaining and 'acitve' is logged
[22:10:43] <ori>	 this way if the first condition is true ($1 starts with '--versions=') you can fail if the parameter value is invalid
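A minimal sketch of the flag-based parsing ori is arguing for, including the `shift` he mentions. The contents of the dpaste link are not reproduced in this log, so this is a guess at the shape rather than the merged patch; `parse_scap_args`, the `MESSAGE` variable, and the default path are hypothetical.

```shell
# Hypothetical sketch; parse_scap_args is not the real scap code.
# The default path is an assumption for the sketch.
MW_COMMON_SOURCE="${MW_COMMON_SOURCE:-/a/common}"

parse_scap_args() {
    # With no flag, fall back to the active versions.
    VERSIONS="active"
    case "${1:-}" in
        --versions=*)
            # Strip the flag prefix, then drop the flag from the argument list.
            VERSIONS="${1#--versions=}"
            shift
            # An explicit value must be "active" or an existing directory.
            if [ "$VERSIONS" != "active" ] && [ ! -d "$MW_COMMON_SOURCE/$VERSIONS" ]; then
                echo "Invalid MediaWiki version \"$VERSIONS\"" >&2
                return 1
            fi
            ;;
    esac
    # Whatever remains is the log message.
    MESSAGE="$*"
}
```

With this shape, `parse_scap_args "some message"` keeps the old calling convention, while `parse_scap_args --versions=acitve "some message"` fails instead of logging the typo.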
[22:10:45] <AaronSchulz>	 ori: can you commit that then?
[22:10:54] <ori>	 sure, i'll update your patch
[22:11:05] <AaronSchulz>	 most of my reading on options parsing in bash involved piles of messy code
[22:12:17] <AaronSchulz>	 though I guess with this the order is pretty well known
[22:18:30] <grrrit-wm>	 (03PS5) 10Manybubbles: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104765 (owner: 10Chad)
[22:26:20] <logmsgbot>	 !log aaron started scap: active timing test
[22:29:47] <spagewmf>	 hey, Flow has some code that only fails in production.  We need to run some debugging code on test2wiki (mw1063).  Is that OK?  Do we need an LD slot to mess with it?
[22:29:52] <logmsgbot>	 !log aaron finished scap: active timing test
[22:30:18] <AaronSchulz>	 3m45sec, 63 seconds was on two slow boxes
[22:31:41] <Reedy>	 spagewmf: That sounds pretty wrong
[22:31:47] <Reedy>	 Test2wiki runs on any apache
[22:31:51] <Reedy>	 testwiki runs on one apache
[22:31:58] <Reedy>	 and that apache isn't mw1063
[22:31:59] <Reedy>	 it's mw1017
[22:32:16] <AaronSchulz>	 yeah 1063 does not look unique
[22:32:28] <grrrit-wm>	 (03PS2) 10Ori.livneh: Allow "scap <message>" calls as they previously worked [operations/puppet] - 10https://gerrit.wikimedia.org/r/104772 (owner: 10Aaron Schulz)
[22:32:49] <ori>	 ^ AaronSchulz
[22:33:12] <spagewmf>	 Reedy: fine, we can run on testwiki rather than test2wiki.  It's a crazy "This shouldn't happen" PHP thing, so we're stumped.
[22:33:26] <Reedy>	 spagewmf: should be fine
[22:33:38] <Reedy>	 Just didn't want you to be looking in the wrong place and wtfing more
[22:34:13] <ori>	 spagewmf: might be worth investigating out loud on #wikimedia-dev to increase the chance that someone overhears the analysis and has an idea
[22:34:49] <spagewmf>	 Reedy:  Thanks.  ori: good point
[22:35:36] <ori>	 AaronSchulz: I'll wait for you to +1 (or -1, if you see issues) before merging
[22:36:02] <AaronSchulz>	 seems to work
[22:36:10] <grrrit-wm>	 (03CR) 10Aaron Schulz: [C: 031] Allow "scap <message>" calls as they previously worked [operations/puppet] - 10https://gerrit.wikimedia.org/r/104772 (owner: 10Aaron Schulz)
[22:36:39] <grrrit-wm>	 (03CR) 10Ori.livneh: [C: 032] Allow "scap <message>" calls as they previously worked [operations/puppet] - 10https://gerrit.wikimedia.org/r/104772 (owner: 10Aaron Schulz)
[22:36:52] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 22:36:48 UTC 2013  
[22:37:12] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 10:36:48 PM UTC  
[22:37:15] <ori>	 running puppet on tin
[22:37:32] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 22:37:23 UTC 2013  
[22:38:12] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 10:37:23 PM UTC  
[22:39:48] <ori>	 AaronSchulz: want to try it?
[22:40:15] <AaronSchulz>	 ok
[22:41:33] <logmsgbot>	 !log aaron started scap: testing scap with no --versions flag
[22:42:43] <ori>	 where is morebots?
[22:43:02] <AaronSchulz>	 leaving early for the New Years?
[22:43:45] <ori>	 celebrating another undeserved year of life
[22:43:55] <AaronSchulz>	 ori: heh, it will be nice when the tampa boxes are gone from the dsh groups... >:)
[22:44:16] <logmsgbot>	 !log aaron finished scap: testing scap with no --versions flag
[22:44:30] <AaronSchulz>	 2m55sec
[22:44:50] <AaronSchulz>	 who decides what bots are worthy of life? oh...wait...
[22:45:35] <ori>	 labs-morebots is alive and well on #wikimedia-labs, so it must be a failure of this specific instance
[22:45:40] * ori  investigates
[22:46:10] <AaronSchulz>	 Reedy: do you want to make that extension branch change or should I stab at it?
[22:46:46] <Reedy>	 AaronSchulz: You can just cut the extension list and move it down to fix the problem ;)
[22:46:56] <logmsgbot>	 Meanwhile, I'm doing just fine, because my implementation doesn't suck.
[22:47:22] <spagewmf>	 logmsgbot passes Turing test
[22:48:50] <spagewmf>	 Reedy or anyone, how can we put a modified PHP file just on mw1017?
[22:49:16] <Reedy>	 change it on tin, sync-common on mw1017
[22:49:17] <ori>	 modified how?
[22:49:24] <ori>	 but yes, that
[22:49:28] <Reedy>	 or sudo -u mwdeploy EDITOR /path/to/file
[22:53:49] <spagewmf>	 ori, wfDebugLog("huh") 
[22:59:18] <ori>	 !log morebots went missing. job was active on toollabs but stuck. nothing useful in logs. restarted.
[23:00:05] <morebots>	 Logged the message, Master
[23:01:29] <ori>	 AaronSchulz: i quibbled with the tone, but greg-g is probably right to ask for an e-mail
[23:07:02] <icinga-wm>	 RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Tue Dec 31 23:06:51 UTC 2013  
[23:07:03] <icinga-wm>	 RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Tue Dec 31 23:06:56 UTC 2013  
[23:07:12] <icinga-wm>	 PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 11:06:56 PM UTC  
[23:07:12] <icinga-wm>	 PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Tue 31 Dec 2013 11:06:51 PM UTC  
[23:27:21] <spagewmf>	 glory be, problem is a condition someone's adding to watchlist query
[23:27:24] <spagewmf>	      AND (page_namespace != '90')
[23:27:35] <ebernhardson>	 null propagates everywhere in sql :(
[23:27:45] <spagewmf>	 should be AND (page_namespace IS NULL OR page_namespace != '90')