[02:16:56] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1284s
[02:22:26] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 712s
[02:26:36] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:32:07] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[03:59:55] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[04:40:15] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:53:54] no notpeter, and OrenBochman is presumably asleep
[05:20:40] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[05:37:14] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[06:12:34] PROBLEM - Disk space on hume is CRITICAL: DISK CRITICAL - free space: /a/static/uncompressed 35276 MB (3% inode=99%):
[07:55:11] RECOVERY - Disk space on hume is OK: DISK OK
[09:44:07] RECOVERY - MySQL slave status on es1004 is OK: OK:
[09:58:41] re
[10:11:47] OrenBochman: hey
[10:12:04] I am in IL
[10:12:18] till jan 5
[10:15:31] OrenBochman: so, FYI, i'm also interested in search. i even added puppetizing it to the wiki but ryan seemed to think it was already done. (and now notpeter's doing it?)
[10:15:39] https://www.mediawiki.org/w/index.php?title=WMF_Projects/Wikimedia_Labs&diff=460321&oldid=460318
[10:17:57] I'm a little out of touch with ops
[10:23:36] OrenBochman: i'm acquainted with maybe half of stuff happening in ops. but not so much with search.
[10:24:02] well search has been dormant
[10:26:30] well, stuff still breaks periodically (e.g. searchidx\d dies and we notice because someone complains that there's no propagation to search results after edits). sometimes several times in a week. so mostly i know from paying attention to what gets booted and from seeing new wikis added and then something gets logged about adding an index for the new wiki
[10:30:46] well I expect to learn to operate the ops side of search within the next two weeks
[10:32:08] also once it's migrated to Solr it will be possible for ops to respond to problems without dev intervention
[10:32:38] i thought there was a reason not to use solr?
[10:32:51] anyway, elasticsearch should also be evaluated
[10:33:17] but i'm all for using something that someone else is using too not just our own custom glue
[10:33:53] elastic search would require more work to migrate
[10:34:16] no surprise there :)
[10:34:44] elasticsearch.org and lucene.apache.org/solr for ppl that want links :)
[10:34:58] * jeremyb is going afk shortly
[10:38:16] using solr should reduce our code base in search by 30%
[10:41:38] ttl
[10:44:07] OrenBochman: ad machar (see you tomorrow)
[10:44:22] sabbaba (cool)
[14:09:27] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[15:29:45] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[15:49:39] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[16:01:29] PROBLEM - mobile traffic loggers on cp1044 is CRITICAL: PROCS CRITICAL: 5 processes with args varnishncsa
[16:11:10] RECOVERY - mobile traffic loggers on cp1044 is OK: PROCS OK: 2 processes with args varnishncsa
[20:12:59] PROBLEM - mobile traffic loggers on cp1041 is CRITICAL: PROCS CRITICAL: 7 processes with args varnishncsa
[20:12:59] PROBLEM - mobile traffic loggers on cp1042 is CRITICAL: PROCS CRITICAL: 7 processes with args varnishncsa
[20:22:39] RECOVERY - mobile traffic loggers on cp1042 is OK: PROCS OK: 4 processes with args varnishncsa
[20:41:59] RECOVERY - mobile traffic loggers on cp1041 is OK: PROCS OK: 1 process with args varnishncsa
[21:08:13] PROBLEM - mobile traffic loggers on cp1043 is CRITICAL: PROCS CRITICAL: 5 processes with args varnishncsa
[21:12:23] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[21:13:04] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 1.90 ms
[21:30:43] RECOVERY - mobile traffic loggers on cp1043 is OK: PROCS OK: 1 process with args varnishncsa
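Note: the bot lines above come from Nagios/Icinga-style checks. As an illustrative aside only (not the actual plugin code, and the thresholds below are assumptions; the log itself only quotes the 10-hour Puppet freshness window), the following Python sketch reproduces by hand the two measurements the bot keeps reporting: MySQL replication lag (Seconds_Behind_Master, as in the storage3 alerts) and the count of varnishncsa processes behind the "mobile traffic loggers" check. It assumes a local mysql client with permission to run SHOW SLAVE STATUS and that pgrep is available on the host.

#!/usr/bin/env python3
"""Hand re-creation of two checks seen in the log above (illustrative only)."""
import subprocess

def replication_lag_seconds():
    # Ask the local MySQL server for its slave status and pull out
    # Seconds_Behind_Master; returns None if replication is not running.
    out = subprocess.run(
        ["mysql", "-e", "SHOW SLAVE STATUS\\G"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        key, _, value = line.strip().partition(":")
        if key.strip() == "Seconds_Behind_Master":
            value = value.strip()
            return None if value == "NULL" else int(value)
    return None

def varnishncsa_process_count():
    # Count processes whose command line mentions varnishncsa,
    # roughly what the "PROCS ... with args varnishncsa" check reports.
    result = subprocess.run(["pgrep", "-c", "-f", "varnishncsa"],
                            capture_output=True, text=True)
    return int(result.stdout.strip() or 0)

if __name__ == "__main__":
    lag = replication_lag_seconds()
    print("replication lag:", "not running" if lag is None else f"{lag}s")
    print("varnishncsa processes:", varnishncsa_process_count())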