[00:00:44] <mutante>	 or i can remove myself but not save that because "You no longer have permission to access this document, so your changes can't be saved. "  hahah
[00:20:08] <grrrit-wm>	 (CR) Dzahn: [C: 2] Add djvu tools for OCG. [puppet] - https://gerrit.wikimedia.org/r/165329 (owner: Cscott)
[00:25:37] <grrrit-wm>	 (CR) Dzahn: "@ocg1001:~# /usr/bin/ddjvu --help" [puppet] - https://gerrit.wikimedia.org/r/165329 (owner: Cscott)
[00:44:54] <icinga-wm>	 RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected  
[01:05:59] <grrrit-wm>	 (PS1) Dzahn: elasticsearch - delete pmtpa remnants [puppet] - https://gerrit.wikimedia.org/r/165672
[01:07:06] <grrrit-wm>	 (PS1) Dzahn: facilities - remove Tampa power strip monitors [puppet] - https://gerrit.wikimedia.org/r/165673
[01:07:47] <grrrit-wm>	 (PS2) Dzahn: elasticsearch - delete pmtpa remnants [puppet] - https://gerrit.wikimedia.org/r/165672
[01:10:16] <grrrit-wm>	 (PS1) Dzahn: rancid - remove pmtpa devices from router.db [puppet] - https://gerrit.wikimedia.org/r/165674
[01:16:56] <grrrit-wm>	 (PS1) Dzahn: remove pdf servers and role::pdf [puppet] - https://gerrit.wikimedia.org/r/165676
[01:18:32] <grrrit-wm>	 (PS2) Dzahn: remove pdf servers,role::pdf and misc pdf class [puppet] - https://gerrit.wikimedia.org/r/165676
[01:20:56] <grrrit-wm>	 (PS1) Dzahn: rolematcher - remove pmtpa [puppet] - https://gerrit.wikimedia.org/r/165677
[01:26:44] <grrrit-wm>	 (PS2) Chad: Adding tools for banning/unbanning an ES node [puppet] - https://gerrit.wikimedia.org/r/164617
[01:26:46] <grrrit-wm>	 (PS5) Chad: First of (hopefully many) es-tool commands [puppet] - https://gerrit.wikimedia.org/r/163945
[01:26:48] <grrrit-wm>	 (PS3) Chad: Another es-tool function: restart a node the fast & easy way [puppet] - https://gerrit.wikimedia.org/r/164401
[01:26:50] <grrrit-wm>	 (PS4) Chad: More elasticsearch tools [puppet] - https://gerrit.wikimedia.org/r/164270
[01:26:52] <grrrit-wm>	 (PS1) Dzahn: redis - remove pmtpa monitoring group [puppet] - https://gerrit.wikimedia.org/r/165678
[01:34:20] <icinga-wm>	 PROBLEM - Disk space on analytics1035 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/h 149638 MB (3% inode=99%):  
[02:17:53] <logmsgbot>	 !log LocalisationUpdate completed (1.25wmf1) at 2014-10-09 02:17:53+00:00
[02:18:04] <morebots>	 Logged the message, Master
[02:30:03] <logmsgbot>	 !log LocalisationUpdate completed (1.25wmf2) at 2014-10-09 02:30:03+00:00
[02:30:11] <morebots>	 Logged the message, Master
[03:26:59] <grrrit-wm>	 (PS3) KartikMistry: WIP: apertium service for Beta [puppet] - https://gerrit.wikimedia.org/r/165485
[03:33:47] <logmsgbot>	 !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct  9 03:33:47 UTC 2014 (duration 33m 46s)
[03:33:53] <morebots>	 Logged the message, Master
[03:55:02] <grrrit-wm>	 (PS3) Chad: Adding tools for banning/unbanning an ES node [puppet] - https://gerrit.wikimedia.org/r/164617
[04:43:30] <grrrit-wm>	 (CR) Krinkle: "Per https://bugzilla.wikimedia.org/show_bug.cgi?id=71761, this doesn't seem to stop it from existing instances. Looks like it might need a" [puppet] - https://gerrit.wikimedia.org/r/165360 (owner: Yuvipanda)
[04:44:57] <grrrit-wm>	 (CR) Krinkle: "I don't mind, but I don't see why we'd do it different here. Seems simple enough to just ensure absent and it'll just be removed automatic" [puppet] - https://gerrit.wikimedia.org/r/165204 (https://bugzilla.wikimedia.org/54393) (owner: Zfilipin)
[04:58:20] <icinga-wm>	 PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140  
[04:59:52] <^d>	 ^ known, I'm doing stuff.
[05:00:16] <springle>	 :)
[05:00:39] <^d>	 Should recover in a minute or two. icinga happened to hit 1008 *right* as the service was bouncing :)
[05:09:30] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[05:19:07] <grrrit-wm>	 (PS2) KartikMistry: WIP: Added initial Debian package [debs/contenttranslation/apertium-apy] - https://gerrit.wikimedia.org/r/165528
[06:07:59] <icinga-wm>	 PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[06:16:50] <icinga-wm>	 PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: puppet fail  
[06:28:40] <icinga-wm>	 PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail  
[06:29:02] <icinga-wm>	 PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:29:31] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: mediawiki: consolidate hosts in site.pp, convert mw1053 [puppet] - https://gerrit.wikimedia.org/r/165696
[06:36:29] <icinga-wm>	 RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures  
[06:41:40] <icinga-wm>	 RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 18: number_of_data_nodes: 18: active_primary_shards: 2029: active_shards: 6082: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0  
[06:46:40] <icinga-wm>	 RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures  
[06:47:10] <icinga-wm>	 RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures  
[06:53:51] <_joe_>	 ori: "pcc
[06:53:59] <_joe_>	 damn keyboard
[06:54:08] <_joe_>	 "pcc" is really _awesome_
[06:57:57] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] mediawiki: consolidate hosts in site.pp, convert mw1053 [puppet] - https://gerrit.wikimedia.org/r/165696 (owner: Giuseppe Lavagetto)
[07:00:47] <ori>	 _joe_: :)
[07:01:29] <icinga-wm>	 PROBLEM - puppet last run on mw1053 is CRITICAL: Timeout while attempting connection  
[07:02:19] <icinga-wm>	 PROBLEM - Host mw1053 is DOWN: CRITICAL - Plugin timed out after 15 seconds  
[07:02:35] <_joe_>	 !log reinstalling mw1053
[07:02:42] <morebots>	 Logged the message, Master
[07:03:21] <_joe_>	 the problem with writing procedures is you can't miss even one point
[07:05:34] <_joe_>	 (like "schedule downtime in icinga")
[07:07:29] <icinga-wm>	 RECOVERY - Host mw1053 is UP: PING OK - Packet loss = 0%, RTA = 5.30 ms  
[07:08:39] <icinga-wm>	 RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54251 bytes in 7.403 second response time  
[07:11:40] <icinga-wm>	 PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[07:12:35] <_joe_>	 hiera doesn't work as expected on the puppet compiler... damn, I really got to work on it.
[07:14:41] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: mediawiki: convert three more appservers to HAT. [puppet] - https://gerrit.wikimedia.org/r/165697
[07:15:09] <_joe_>	 !log reimaging mw102[3-5] to hhvm
[07:15:16] <morebots>	 Logged the message, Master
[07:24:39] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] mediawiki: convert three more appservers to HAT. [puppet] - https://gerrit.wikimedia.org/r/165697 (owner: Giuseppe Lavagetto)
[07:31:22] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[07:43:19] <icinga-wm>	 PROBLEM - DPKG on mw1053 is CRITICAL: Connection refused by host  
[07:43:24] <icinga-wm>	 PROBLEM - nutcracker process on mw1053 is CRITICAL: Connection refused by host  
[07:43:38] <icinga-wm>	 PROBLEM - puppet last run on mw1053 is CRITICAL: Connection refused by host  
[07:43:38] <icinga-wm>	 PROBLEM - Disk space on mw1053 is CRITICAL: Connection refused by host  
[07:44:18] <icinga-wm>	 PROBLEM - RAID on mw1053 is CRITICAL: Connection refused by host  
[07:44:39] <icinga-wm>	 PROBLEM - check configured eth on mw1053 is CRITICAL: Connection refused by host  
[07:45:01] <icinga-wm>	 PROBLEM - check if dhclient is running on mw1053 is CRITICAL: Connection refused by host  
[07:45:02] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1053 is CRITICAL: Connection refused by host  
[07:45:18] <_joe_>	 didn't I schedule downtime there?
[07:45:24] <_joe_>	 ok, whatever.
[07:45:38] <icinga-wm>	 PROBLEM - nutcracker port on mw1053 is CRITICAL: Connection refused by host  
[07:48:09] <icinga-wm>	 RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54230 bytes in 0.370 second response time  
[07:56:09] <icinga-wm>	 PROBLEM - NTP on mw1053 is CRITICAL: NTP CRITICAL: Offset unknown  
[07:59:28] <icinga-wm>	 RECOVERY - RAID on mw1053 is OK: OK: no RAID installed  
[07:59:29] <icinga-wm>	 RECOVERY - DPKG on mw1053 is OK: All packages OK  
[07:59:29] <icinga-wm>	 RECOVERY - nutcracker process on mw1053 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker  
[07:59:38] <icinga-wm>	 RECOVERY - nutcracker port on mw1053 is OK: TCP OK - 0.000 second response time on port 11212  
[07:59:38] <icinga-wm>	 RECOVERY - Disk space on mw1053 is OK: DISK OK  
[07:59:49] <icinga-wm>	 RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output  
[08:00:10] <icinga-wm>	 RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient  
[08:00:19] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[08:01:37] <icinga-wm>	 PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 1 failures  
[08:02:46] <icinga-wm>	 PROBLEM - puppet last run on mw1024 is CRITICAL: Connection refused by host  
[08:02:47] <icinga-wm>	 PROBLEM - Disk space on mw1024 is CRITICAL: Connection refused by host  
[08:03:01] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[08:03:01] <icinga-wm>	 PROBLEM - RAID on mw1023 is CRITICAL: Connection refused by host  
[08:03:06] <icinga-wm>	 PROBLEM - nutcracker port on mw1025 is CRITICAL: Connection refused by host  
[08:03:16] <icinga-wm>	 PROBLEM - DPKG on mw1025 is CRITICAL: Connection refused by host  
[08:03:16] <icinga-wm>	 PROBLEM - nutcracker process on mw1025 is CRITICAL: Connection refused by host  
[08:03:27] <icinga-wm>	 PROBLEM - Disk space on mw1025 is CRITICAL: Connection refused by host  
[08:03:27] <icinga-wm>	 PROBLEM - check configured eth on mw1023 is CRITICAL: Connection refused by host  
[08:03:27] <icinga-wm>	 PROBLEM - puppet last run on mw1025 is CRITICAL: Connection refused by host  
[08:03:39] <icinga-wm>	 PROBLEM - RAID on mw1024 is CRITICAL: Connection refused by host  
[08:03:39] <icinga-wm>	 PROBLEM - check if dhclient is running on mw1023 is CRITICAL: Connection refused by host  
[08:03:58] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1023 is CRITICAL: Connection refused by host  
[08:04:08] <_joe_>	 oh I hate you icinga
[08:04:09] <icinga-wm>	 PROBLEM - check configured eth on mw1024 is CRITICAL: Connection refused by host  
[08:04:19] <icinga-wm>	 PROBLEM - check if dhclient is running on mw1024 is CRITICAL: Connection refused by host  
[08:04:19] <icinga-wm>	 PROBLEM - RAID on mw1025 is CRITICAL: Connection refused by host  
[08:04:41] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1024 is CRITICAL: Connection refused by host  
[08:04:42] <icinga-wm>	 RECOVERY - check configured eth on mw1023 is OK: NRPE: Unable to read output  
[08:04:48] <icinga-wm>	 PROBLEM - check configured eth on mw1025 is CRITICAL: Connection refused by host  
[08:04:48] <icinga-wm>	 RECOVERY - check if dhclient is running on mw1023 is OK: PROCS OK: 0 processes with command name dhclient  
[08:04:49] <icinga-wm>	 RECOVERY - RAID on mw1024 is OK: OK: no RAID installed  
[08:04:58] <icinga-wm>	 PROBLEM - check if dhclient is running on mw1025 is CRITICAL: Connection refused by host  
[08:04:58] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1023 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[08:04:59] <icinga-wm>	 RECOVERY - Disk space on mw1024 is OK: DISK OK  
[08:05:09] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1025 is CRITICAL: Connection refused by host  
[08:05:09] <icinga-wm>	 RECOVERY - check configured eth on mw1024 is OK: NRPE: Unable to read output  
[08:05:19] <icinga-wm>	 RECOVERY - RAID on mw1023 is OK: OK: no RAID installed  
[08:05:29] <icinga-wm>	 RECOVERY - nutcracker port on mw1025 is OK: TCP OK - 0.000 second response time on port 11212  
[08:05:37] <icinga-wm>	 RECOVERY - check if dhclient is running on mw1024 is OK: PROCS OK: 0 processes with command name dhclient  
[08:05:37] <icinga-wm>	 RECOVERY - RAID on mw1025 is OK: OK: no RAID installed  
[08:05:39] <icinga-wm>	 RECOVERY - DPKG on mw1025 is OK: All packages OK  
[08:05:39] <icinga-wm>	 RECOVERY - nutcracker process on mw1025 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker  
[08:05:50] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[08:05:50] <icinga-wm>	 RECOVERY - Disk space on mw1025 is OK: DISK OK  
[08:05:59] <icinga-wm>	 RECOVERY - check configured eth on mw1025 is OK: NRPE: Unable to read output  
[08:05:59] <icinga-wm>	 PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: Puppet has 1 failures  
[08:06:17] <icinga-wm>	 RECOVERY - check if dhclient is running on mw1025 is OK: PROCS OK: 0 processes with command name dhclient  
[08:06:18] <icinga-wm>	 PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: Puppet has 1 failures  
[08:06:27] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[08:06:56] <icinga-wm>	 PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures  
[08:08:00] <_joe_>	 for the record, this happens because the host gets removed from icinga when we clean up puppet facts, thus the downtime gets yanked
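A toy model of the effect _joe_ describes, offered only as an illustration: if scheduled downtime is stored on the host entry, removing the host and re-adding it leaves a fresh entry with no downtime, so checks notify again. The `Monitor` class and its methods are hypothetical and are not Icinga's real data model or API.

```python
class Monitor:
    """Hypothetical monitoring registry; not Icinga's actual implementation."""

    def __init__(self):
        # host name -> True if a downtime is currently scheduled
        self.hosts = {}

    def add_host(self, name):
        # A newly (re)created host entry starts with no downtime.
        self.hosts[name] = False

    def remove_host(self, name):
        # Dropping the host drops everything attached to it.
        self.hosts.pop(name, None)

    def schedule_downtime(self, name):
        self.hosts[name] = True

    def would_notify(self, name):
        # A failing check notifies only if the host exists and has no downtime.
        return name in self.hosts and not self.hosts[name]


if __name__ == "__main__":
    mon = Monitor()
    mon.add_host("mw1053")
    mon.schedule_downtime("mw1053")
    print(mon.would_notify("mw1053"))  # downtime is in effect

    # Host is removed during cleanup, then re-created after the reinstall.
    mon.remove_host("mw1053")
    mon.add_host("mw1053")
    print(mon.would_notify("mw1053"))  # downtime is gone
```

In this model the only way to keep the downtime is to schedule it again after the host reappears, or to store it somewhere that survives host removal.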
[08:10:07] <icinga-wm>	 RECOVERY - NTP on mw1053 is OK: NTP OK: Offset -0.001644015312 secs  
[08:14:16] <icinga-wm>	 RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures  
[08:14:26] <icinga-wm>	 RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures  
[08:14:27] <icinga-wm>	 RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures  
[08:15:07] <icinga-wm>	 RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures  
[08:15:16] <icinga-wm>	 PROBLEM - HHVM rendering on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[08:15:26] <icinga-wm>	 PROBLEM - NTP on mw1023 is CRITICAL: NTP CRITICAL: Offset unknown  
[08:15:37] <icinga-wm>	 PROBLEM - HHVM rendering on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[08:16:06] <icinga-wm>	 RECOVERY - HHVM rendering on mw1053 is OK: HTTP OK: HTTP/1.1 200 OK - 66889 bytes in 0.291 second response time  
[08:16:07] <icinga-wm>	 PROBLEM - NTP on mw1024 is CRITICAL: NTP CRITICAL: Offset unknown  
[08:16:37] <icinga-wm>	 RECOVERY - HHVM rendering on mw1023 is OK: HTTP OK: HTTP/1.1 200 OK - 66889 bytes in 0.301 second response time  
[08:16:43] <icinga-wm>	 PROBLEM - NTP on mw1025 is CRITICAL: NTP CRITICAL: Offset unknown  
[08:17:22] <_joe_>	 !log repooling mw102[3-5],mw1053 in the hhvm pool
[08:17:27] <icinga-wm>	 RECOVERY - NTP on mw1023 is OK: NTP OK: Offset -0.01114153862 secs  
[08:17:27] <morebots>	 Logged the message, Master
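For readers unfamiliar with the `mw102[3-5],mw1053` shorthand in the !log line above, here is a minimal sketch of how such a host spec can be expanded into individual names. The function `expand_hosts` is a hypothetical helper written for this note, and it assumes at most one numeric `[a-b]` range per comma-separated item.

```python
import re

# Matches a single numeric range such as [3-5].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(spec):
    """Expand 'mw102[3-5],mw1053' into a list of host names."""
    names = []
    for part in spec.split(","):
        match = _RANGE.search(part)
        if match is None:
            # No range in this item, keep it unchanged.
            names.append(part)
            continue
        low, high = int(match.group(1)), int(match.group(2))
        width = len(match.group(1))  # preserve zero padding like [01-12]
        prefix, suffix = part[:match.start()], part[match.end():]
        for number in range(low, high + 1):
            names.append(prefix + str(number).zfill(width) + suffix)
    return names


if __name__ == "__main__":
    print(expand_hosts("mw102[3-5],mw1053"))
```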
[08:18:08] <icinga-wm>	 RECOVERY - NTP on mw1024 is OK: NTP OK: Offset -0.02184319496 secs  
[08:18:36] <icinga-wm>	 RECOVERY - NTP on mw1025 is OK: NTP OK: Offset -0.0204795599 secs  
[08:21:05] <grrrit-wm>	 (CR) Nemo bis: "At this stage in the setup, is it possible to check if the LanguageConverter class is loaded for that wiki?" [mediawiki-config] - https://gerrit.wikimedia.org/r/165490 (https://bugzilla.wikimedia.org/71416) (owner: Reedy)
[08:45:20] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: -1] "Small correction, LGTM." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[08:46:38] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]  
[08:59:17] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]  
[09:14:35] <grrrit-wm>	 (PS1) Glaisher: Add 'abusefilter-modify-restricted' to sysops at zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/165704 (https://bugzilla.wikimedia.org/71854)
[09:15:56] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[09:19:07] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[09:23:57] <icinga-wm>	 PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: Puppet has 1 failures  
[09:38:19] <icinga-wm>	 RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures  
[09:43:47] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[09:45:05] <grrrit-wm>	 (CR) Filippo Giunchedi: First of (hopefully many) es-tool commands (3 comments) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[09:47:01] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[09:48:57] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[09:51:49] <mark>	 that was planned maintenance
[09:51:58] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 1] "first time ever I hear about rolematcher, wikitech shows no hits too" [puppet] - https://gerrit.wikimedia.org/r/165677 (owner: Dzahn)
[09:59:18] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[10:04:16] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[10:07:27] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[10:11:41] <mark>	 ...and that's outside the maintenance window
[10:20:43] <grrrit-wm>	 (PS7) Giuseppe Lavagetto: mediawiki: consolidate apache configs [puppet] - https://gerrit.wikimedia.org/r/164358
[10:22:43] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] mediawiki: consolidate apache configs [puppet] - https://gerrit.wikimedia.org/r/164358 (owner: Giuseppe Lavagetto)
[10:35:03] <apergos>	 ah godog
[10:35:14] <_joe_>	 !log disabling puppet on most mw* hosts while testing apache changes
[10:35:20] <morebots>	 Logged the message, Master
[10:35:25] <godog>	 apergos: hey!
[10:35:50] <apergos>	 so while folks are touching apache (well afterwards), there's "rotate all the logs and only keep two weeks' worth"
[10:36:05] <apergos>	 which would mean a daily apache graceful on all the apaches as the current patchset has it
[10:36:36] <apergos>	 https://gerrit.wikimedia.org/r/#/c/130296/   when you're done with the current stuff, mind taking a look?
[10:36:58] <_joe_>	 apergos: yeah I am aware of that patch
[10:37:02] <apergos>	 I'm just unsure about all of them gracefulling at the same time
[10:37:09] <apergos>	 ah _joe_, good... 
[10:37:12] <_joe_>	 (godog == Filippo)
[10:37:21] <apergos>	 bah
[10:37:23] <_joe_>	 tu quoque, apergos :P
[10:37:34] <apergos>	 I just remember that he has a nick that goes to some unrelated handle :-D
[10:37:37] <_joe_>	 Giuseppe == joseph => joe
[10:37:46] <apergos>	 yeah yours makes sense
[10:38:24] <godog>	 haha indeed mine doesn't
[10:39:09] * apergos shuts up and waits for the config stuff to be tested and happy first
[10:39:47] <godog>	 apergos: patch still looks good to me though :)
[10:40:34] <apergos>	 :-)
[10:41:07] <apergos>	 what do you think about all the apaches gracefulling at once?  I guess a scap does that (or does it any more?)
[10:43:14] <_joe_>	 it does not
[10:43:35] <_joe_>	 apache gracefulling is nice anyway, but why do we need that?
[10:44:34] <apergos>	 well either you copytruncate or you do that, otherwise the clients will write to the wrong logs
[10:44:41] <apergos>	 after rotation
[10:45:12] <apergos>	 unless there's some other thing you had in mind
[10:45:46] <_joe_>	 I don't remember if there is an internal way to rotate logs in apache
[10:48:22] <apergos>	 it looks like not so much 
[10:48:36] <apergos>	 there's the rotatelogs thing for use with piped logs, it's an external program 
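The trade-off apergos raises (copytruncate versus a graceful restart after rotation) can be reproduced with plain files. This is a minimal sketch assuming a POSIX filesystem; the file names and both helper functions are made up for the demonstration and involve neither Apache nor logrotate. It shows that a writer holding the log open keeps writing to the renamed file until it reopens, whereas copy-then-truncate lets the same open handle carry on in the live file.

```python
import os
import shutil
import tempfile


def rotate_by_rename(workdir):
    """Rename the live log while a writer still has it open."""
    live = os.path.join(workdir, "access.log")
    rotated = live + ".1"
    writer = open(live, "a")
    writer.write("before rotation\n")
    writer.flush()
    os.rename(live, rotated)          # rotation without telling the writer
    writer.write("after rotation\n")  # lands in the renamed file
    writer.flush()
    writer.close()
    with open(rotated) as handle:
        rotated_lines = handle.read().splitlines()
    # The live path only reappears once the writer reopens its log.
    return rotated_lines, os.path.exists(live)


def rotate_by_copytruncate(workdir):
    """Copy the live log aside, then truncate it in place."""
    live = os.path.join(workdir, "error.log")
    rotated = live + ".1"
    writer = open(live, "a")          # append mode: writes go to end of file
    writer.write("before rotation\n")
    writer.flush()
    shutil.copyfile(live, rotated)
    os.truncate(live, 0)
    writer.write("after rotation\n")  # same handle, still the live file
    writer.flush()
    writer.close()
    with open(live) as handle:
        live_lines = handle.read().splitlines()
    with open(rotated) as handle:
        rotated_lines = handle.read().splitlines()
    return live_lines, rotated_lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(rotate_by_rename(workdir))
        print(rotate_by_copytruncate(workdir))
```

Copy-then-truncate avoids the restart but can lose lines written between the copy and the truncate, which is one reason to prefer making the writer reopen its logs.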
[10:49:48] <icinga-wm>	 PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Puppet has 1 failures  
[10:51:07] <grrrit-wm>	 (PS5) Filippo Giunchedi: swift-synctool: enable/disable/show sync [software] - https://gerrit.wikimedia.org/r/160428
[10:51:48] <grrrit-wm>	 (CR) Filippo Giunchedi: swift-synctool: enable/disable/show sync (3 comments) [software] - https://gerrit.wikimedia.org/r/160428 (owner: Filippo Giunchedi)
[10:58:08] <_joe_>	 damn apache configs; they're full of subtleties
[11:00:46] * YuviPanda switches everything to nginx+fastcgi rather than apache+fastcgi
[11:04:27] <icinga-wm>	 RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures  
[11:08:34] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: mediawiki: restore redirect to https for donatewiki robots.txt [puppet] - https://gerrit.wikimedia.org/r/165706
[11:08:54] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2 V: 2] mediawiki: restore redirect to https for donatewiki robots.txt [puppet] - https://gerrit.wikimedia.org/r/165706 (owner: Giuseppe Lavagetto)
[11:11:39] <godog>	 !log disabled puppet in ms-fe/ms-be in eqiad/codfw to merge container-sync changes
[11:11:46] <morebots>	 Logged the message, Master
[11:11:56] <_joe_>	 !log reenabled puppet on mw*
[11:12:03] <morebots>	 Logged the message, Master
[11:12:17] <grrrit-wm>	 (PS3) Filippo Giunchedi: swift: add container sync [puppet] - https://gerrit.wikimedia.org/r/160430
[11:13:16] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] swift: add container sync [puppet] - https://gerrit.wikimedia.org/r/160430 (owner: Filippo Giunchedi)
[11:13:48] <icinga-wm>	 PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: Puppet last ran 88061 seconds ago, expected  14400  
[11:14:57] <icinga-wm>	 RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures  
[11:18:58] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[11:20:42] <mark>	 :P
[11:22:58] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[11:27:28] <godog>	 srsly?
[11:39:02] <manybubbles>	 !log starting upgrade of elastic1009
[11:39:12] <morebots>	 Logged the message, Master
[11:42:34] <grrrit-wm>	 (PS1) Filippo Giunchedi: swift: move hiera params to the right place [puppet] - https://gerrit.wikimedia.org/r/165708
[11:43:23] <godog>	 _joe_: ^
[11:43:59] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 1] swift: move hiera params to the right place [puppet] - https://gerrit.wikimedia.org/r/165708 (owner: Filippo Giunchedi)
[11:52:17] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] swift: move hiera params to the right place [puppet] - https://gerrit.wikimedia.org/r/165708 (owner: Filippo Giunchedi)
[11:52:27] <godog>	 thanks!
[11:57:19] <springle>	 !log xtrabackup db1016 to db2010
[11:57:26] <morebots>	 Logged the message, Master
[11:59:46] <springle>	 !log converted some librenms tables to innodb on db1001 m1-master. should be a no-op
[11:59:51] <morebots>	 Logged the message, Master
[12:01:55] <logmsgbot>	 !log reedy Purged l10n cache for 1.24wmf22
[12:02:01] <morebots>	 Logged the message, Master
[12:02:53] <icinga-wm>	 PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 338 seconds  
[12:13:23] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[12:16:43] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[12:52:23] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[12:57:34] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[12:57:40] <paravoid>	 what the hell
[13:00:04] <jouncebot>	 K4: Dear anthropoid, the time has come. Please deploy Fundraising (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141009T1300).
[13:03:45] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[13:07:03] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[13:17:54] <grrrit-wm>	 (CR) Springle: [C: 1] "DB bits are ready. See RT." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/165231 (https://bugzilla.wikimedia.org/71597) (owner: BryanDavis)
[13:19:19] <RoanKattouw>	 akosiaris: Argh crap I'm an idiot
[13:19:32] <RoanKattouw>	 akosiaris: My site.pp patch installs role::citoid , but it needs to be role::citoid::production :S
[13:19:37] * RoanKattouw writes a patch
[13:20:03] <icinga-wm>	 PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 337 seconds  
[13:20:33] <icinga-wm>	 PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 344 seconds  
[13:20:35] <RoanKattouw>	 But... it is? Huh?
[13:21:09] <RoanKattouw>	 OK salt documentation time then
[13:21:15] <akosiaris>	 salt ? 
[13:21:22] <akosiaris>	 trebuchet you mean ?
[13:21:25] <RoanKattouw>	 For git-deploy
[13:21:27] <RoanKattouw>	 Yeah trebuchet
[13:21:45] <akosiaris>	 it should be enough to do a git-deploy start ; git-deploy sync
[13:21:53] <akosiaris>	 on /srv/deployment/citoid/deploy on tin
[13:21:53] <RoanKattouw>	 I'll try that now
[13:22:03] <RoanKattouw>	 I just checked and /srv/deployment/citoid doesn't even exist on sca1001 yet
[13:22:07] <RoanKattouw>	 I had expected puppet to create that already
[13:22:09] <RoanKattouw>	 But I'll try
[13:22:36] <akosiaris>	 I kind of did too. Not absolutely sure yet about how ori's trebuchet package provider works
[13:23:03] <akosiaris>	 and since this is a first in production, we are kind of guinea pigs :-)
[13:23:20] <RoanKattouw>	 Well that worked beautifully
[13:23:24] <icinga-wm>	 RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds  
[13:23:34] <icinga-wm>	 RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay -0 seconds  
[13:23:36] <grrrit-wm>	 (CR) Chad: First of (hopefully many) es-tool commands (5 comments) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[13:24:02] <akosiaris>	 ori: Thank you !
[13:24:11] <akosiaris>	 RoanKattouw: thanks as well :-)
[13:24:23] <icinga-wm>	 RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures  
[13:24:31] <RoanKattouw>	 OK citoid is running on both sca1001 and sca1002 now
[13:24:33] <icinga-wm>	 RECOVERY - citoid on sca1001 is OK: HTTP OK: HTTP/1.1 200 OK - 745 bytes in 0.021 second response time  
[13:24:43] <RoanKattouw>	 I had to start it on 1001 but it had already started on 1002, maybe because puppet was just running there
[13:24:47] <RoanKattouw>	 Sweet!
[13:24:56] <akosiaris>	 Cool!
[13:25:00] <akosiaris>	 so ... LVS time then
[13:25:05] <icinga-wm>	 RECOVERY - citoid on sca1002 is OK: HTTP OK: HTTP/1.1 200 OK - 745 bytes in 0.033 second response time  
[13:25:06] <RoanKattouw>	 Yes exactly
[13:25:11] <RoanKattouw>	 That's not working yet
[13:25:13] * RoanKattouw looks at LVS logs
[13:25:15] <akosiaris>	 ok, reviewing one last time and merging
[13:25:40] <RoanKattouw>	 10.2.2.19 is Destination Unreachable
[13:26:34] <RoanKattouw>	 Oh right that change isn't merged yet
[13:26:38] * RoanKattouw glares at Gerrit search
[13:26:49] <grrrit-wm>	 (PS2) Alexandros Kosiaris: Add LVS for citoid [puppet] - https://gerrit.wikimedia.org/r/164759 (owner: Catrope)
[13:28:41] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Add LVS for citoid [puppet] - https://gerrit.wikimedia.org/r/164759 (owner: Catrope)
[13:36:33] <icinga-wm>	 RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures  
[13:37:58] <akosiaris>	 RoanKattouw: I dare say that LVS is done and works fine as well
[13:37:58] <grrrit-wm>	 (CR) Cscott: "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/165329 (owner: Cscott)
[13:38:26] <RoanKattouw>	 Yup, works for me
[13:38:33] <RoanKattouw>	 Awesome! Thank you so much!
[13:39:36] <grrrit-wm>	 (PS4) Chad: Adding tools for banning/unbanning an ES node [puppet] - https://gerrit.wikimedia.org/r/164617
[13:39:37] <akosiaris>	 RoanKattouw: thanks as well! Seems like we got a new service
[13:39:38] <grrrit-wm>	 (PS6) Chad: First of (hopefully many) es-tool commands [puppet] - https://gerrit.wikimedia.org/r/163945
[13:39:40] <grrrit-wm>	 (PS4) Chad: Another es-tool function: restart a node the fast & easy way [puppet] - https://gerrit.wikimedia.org/r/164401
[13:39:42] <grrrit-wm>	 (PS5) Chad: More elasticsearch tools [puppet] - https://gerrit.wikimedia.org/r/164270
[13:40:13] <RoanKattouw>	 Indeed we did
[13:40:32] <RoanKattouw>	 Now I just need to refactor the code that uses it and implement the new UI for it and and and :D
[13:40:46] <icinga-wm>	 PROBLEM - Apache HTTP on mw1155 is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:41:04] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]  
[13:41:13] <icinga-wm>	 PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds  
[13:41:15] <akosiaris>	 ETOOMANYANDS
[13:41:34] <icinga-wm>	 RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.084 second response time  
[13:45:19] <grrrit-wm>	 (CR) Chad: Adding tools for banning/unbanning an ES node (1 comment) [puppet] - https://gerrit.wikimedia.org/r/164617 (owner: Chad)
[13:50:23] <icinga-wm>	 PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333  
[13:50:23] <grrrit-wm>	 (CR) Filippo Giunchedi: First of (hopefully many) es-tool commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[13:54:08] <grrrit-wm>	 (CR) Filippo Giunchedi: First of (hopefully many) es-tool commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[13:54:15] <grrrit-wm>	 (CR) Filippo Giunchedi: First of (hopefully many) es-tool commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[13:54:36] <Coren>	 !log begin reimaging of mw1029
[13:54:41] <morebots>	 Logged the message, Master
[13:55:23] <icinga-wm>	 RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0  
[13:56:34] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]  
[14:02:02] <akosiaris>	 !log updated pybal on palladium for citoid
[14:02:07] <morebots>	 Logged the message, Master
[14:03:16] <grrrit-wm>	 (PS1) coren: Reimaging mw1029 as appserver_hhvm [puppet] - https://gerrit.wikimedia.org/r/165719
[14:03:18] <grrrit-wm>	 (PS5) Chad: Adding tools for banning/unbanning an ES node [puppet] - https://gerrit.wikimedia.org/r/164617
[14:03:20] <grrrit-wm>	 (PS7) Chad: First of (hopefully many) es-tool commands [puppet] - https://gerrit.wikimedia.org/r/163945
[14:03:22] <grrrit-wm>	 (PS5) Chad: Another es-tool function: restart a node the fast & easy way [puppet] - https://gerrit.wikimedia.org/r/164401
[14:03:24] <grrrit-wm>	 (PS6) Chad: More elasticsearch tools [puppet] - https://gerrit.wikimedia.org/r/164270
[14:03:28] <grrrit-wm>	 (CR) Chad: First of (hopefully many) es-tool commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[14:03:41] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 1] Reimaging mw1029 as appserver_hhvm [puppet] - https://gerrit.wikimedia.org/r/165719 (owner: coren)
[14:04:01] <grrrit-wm>	 (CR) Chad: First of (hopefully many) es-tool commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[14:04:17] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on citoid.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL - No data received from host  
[14:04:41] <grrrit-wm>	 (CR) coren: [C: 2] Reimaging mw1029 as appserver_hhvm [puppet] - https://gerrit.wikimedia.org/r/165719 (owner: coren)
[14:05:19] <akosiaris>	 RoanKattouw: that you ? ^
[14:07:49] <RoanKattouw>	 akosiaris: I haven't touched anything
[14:07:55] <RoanKattouw>	 Let me look at the sca boxes
[14:08:36] <RoanKattouw>	 Hmm I found some nice error logs here
[14:09:18] <grrrit-wm>	 (PS1) Filippo Giunchedi: swift: fix hiera variables naming [puppet] - https://gerrit.wikimedia.org/r/165722
[14:09:27] <RoanKattouw>	 HTTP to localhost is working fine on both sca1001 and 1002 though
[14:09:41] <RoanKattouw>	 Because Zotero is what's failing and the welcome page (which is what the LVS health checks use) doesn't use Zotero
[14:09:48] <RoanKattouw>	 Resolving citoid.svc.eqiad.wmflabs (citoid.svc.eqiad.wmflabs)... failed: Temporary failure in name resolution.
[14:09:51] <RoanKattouw>	 Ahm...
[14:10:09] <RoanKattouw>	 akosiaris: Did the Citoid DNS change get deployed during the codfw network cut maybe?
[14:10:50] <RoanKattouw>	 Oh nm I'm an idiot, I need to learn how to type
[14:11:03] <RoanKattouw>	 wmnet not wmflabs
[14:11:22] <RoanKattouw>	 akosiaris: I don't know man, icinga-wm is on crack. I can hit citoid just fine
[14:11:39] <RoanKattouw>	 $ wget http://citoid.svc.eqiad.wmnet:1970/ -O-    is super fast
[14:12:42] <akosiaris>	 RoanKattouw: yeah, I am noticing the same ....
[14:13:08] <akosiaris>	 maybe a check issue...
[14:15:04] <grrrit-wm>	 (PS2) Manybubbles: Elasticsearch Drop number of concurrent merges [puppet] - https://gerrit.wikimedia.org/r/163188
[14:15:51] <RoanKattouw>	 I checked the pybal log and those checks seem to be happy
[14:16:30] <_joe_>	 akosiaris: IP issue?
[14:16:37] <_joe_>	 no if the checks work
[14:17:30] <akosiaris>	 found it
[14:18:05] <akosiaris>	 /usr/lib/nagios/plugins/check_http -H citoid.svc.eqiad.wmnet -p 1970 -I 10.2.2.19 -u "" 
[14:18:06] <akosiaris>	 HTTP CRITICAL - No data received from host
[14:18:06] <akosiaris>	 /usr/lib/nagios/plugins/check_http -H citoid.svc.eqiad.wmnet -p 1970 -I 10.2.2.19 -u "/"
[14:18:06] <akosiaris>	 HTTP OK: HTTP/1.1 200 OK - 745 bytes in 0.002 second response time |time=0.001569s;;;0.000000 size=745B;;;0
[14:18:29] <akosiaris>	 so the check needs another !/ at the end
[14:18:38] <grrrit-wm>	 (CR) Faidon Liambotis: [C: 1] Remove all IMAP configuration and Puppet manifests (2 comments) [puppet] - https://gerrit.wikimedia.org/r/164940 (owner: Mark Bergsma)
[14:18:42] <grrrit-wm>	 (PS3) Faidon Liambotis: Remove all IMAP configuration and Puppet manifests [puppet] - https://gerrit.wikimedia.org/r/164940 (owner: Mark Bergsma)
[14:18:57] <grrrit-wm>	 (CR) Faidon Liambotis: [C: 2] Remove all IMAP configuration and Puppet manifests [puppet] - https://gerrit.wikimedia.org/r/164940 (owner: Mark Bergsma)
[14:19:03] <akosiaris>	 mathoid's url displatching is probably different which is why it is working
[14:19:11] <akosiaris>	 RoanKattouw: _joe_ ^
[14:19:14] <paravoid>	 displatching?
[14:19:21] <akosiaris>	 dispatching 
[14:20:14] <akosiaris>	 could be displatching as well
[14:20:18] <akosiaris>	 http://www.urbandictionary.com/define.php?term=displatch
[14:20:36] <akosiaris>	 it is node.js after all
[14:20:40] <_joe_>	 translated: "we need a rewrite rule"
[14:20:54] <RoanKattouw>	 WTF
[14:21:04] <RoanKattouw>	 citoid breaks on "" but works for "/" , that's odd
[14:21:27] <RoanKattouw>	 The web server part of citoid is <50 lines so let me see if I can quickly fix that
[14:21:34] <akosiaris>	  -u, --url=PATH
[14:21:35] <akosiaris>	     URL to GET or POST (default: /)
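The two check_http runs above differ only in the -u argument: an empty string gets "No data received from host", while "/" returns 200, and the plugin help quoted by akosiaris lists "/" as the default path. A minimal sketch of a guard that a wrapper script could apply before calling the plugin follows. `normalize_url_path` is a hypothetical helper name, not part of the Nagios plugins, and this is not necessarily how the workaround in the health check was written.

```shell
# Hypothetical helper: never hand an empty URL path to an HTTP check.
normalize_url_path() {
    # ${1:-/} expands to "/" when the argument is unset or empty.
    printf '%s\n' "${1:-/}"
}

# Example invocation (host, port and IP are the ones shown in the log):
# /usr/lib/nagios/plugins/check_http -H citoid.svc.eqiad.wmnet -p 1970 \
#     -I 10.2.2.19 -u "$(normalize_url_path "")"
```

With this guard an empty path is replaced by "/" and any non-empty path is passed through unchanged.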
[14:21:51] <grrrit-wm>	 (PS2) Filippo Giunchedi: swift: fix hiera variables naming [puppet] - https://gerrit.wikimedia.org/r/165722
[14:21:56] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] swift: fix hiera variables naming [puppet] - https://gerrit.wikimedia.org/r/165722 (owner: Filippo Giunchedi)
[14:31:10] <icinga-wm>	 ACKNOWLEDGEMENT - LVS HTTP IPv4 on citoid.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL - No data received from host alexandros kosiaris citoid seems to not like being queried without a URL at the request. Investigated at citoid level by Roan, fallback plan is to adjust the check
[14:31:21] <grrrit-wm>	 (CR) Filippo Giunchedi: First of (hopefully many) es-tool commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[14:31:39] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[14:32:02] <bblack>	 hey an ack page, neat
[14:34:13] <akosiaris>	 ahahaha
[14:34:22] <akosiaris>	 that's a first
[14:34:27] <akosiaris>	 useful though 
[14:34:59] <akosiaris>	 btw... 2 minutes delay between you, bblack getting the page and me getting the page
[14:38:02] <apergos>	 it says 14:31 but it was a lie, it came later
[14:38:25] <apergos>	 prolly about a minute after you a kosiaris
[14:40:33] <RoanKattouw>	 akosiaris: Hmm I'm out of my league here trying to figure out this problem, so I'd rather bounce it to Gabriel. I'll see if I can change the check in the meantime
[14:40:49] <_joe_>	 I didn't get the page...
[14:40:56] <akosiaris>	 _joe_: great! nothing to worry about
[14:40:57] <akosiaris>	 seriously now, this is something to investigate 
[14:43:16] <grrrit-wm>	 (PS1) Catrope: Work around Citoid bug in health check [puppet] - https://gerrit.wikimedia.org/r/165731
[14:43:16] <RoanKattouw>	 akosiaris: ---^^
[14:43:47] <icinga-wm>	 PROBLEM - RAID on mw1029 is CRITICAL: Connection refused by host  
[14:44:07] <icinga-wm>	 PROBLEM - check configured eth on mw1029 is CRITICAL: Connection refused by host  
[14:44:18] <icinga-wm>	 PROBLEM - check if dhclient is running on mw1029 is CRITICAL: Connection refused by host  
[14:44:28] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1029 is CRITICAL: Connection refused by host  
[14:45:02] <icinga-wm>	 PROBLEM - nutcracker port on mw1029 is CRITICAL: Connection refused by host  
[14:45:08] <icinga-wm>	 PROBLEM - nutcracker process on mw1029 is CRITICAL: Connection refused by host  
[14:45:09] <icinga-wm>	 PROBLEM - DPKG on mw1029 is CRITICAL: Connection refused by host  
[14:45:18] <icinga-wm>	 PROBLEM - Disk space on mw1029 is CRITICAL: Connection refused by host  
[14:45:18] <icinga-wm>	 PROBLEM - puppet last run on mw1029 is CRITICAL: Connection refused by host  
[14:45:54] <akosiaris>	 I am wondering....
[14:45:57] <icinga-wm>	 PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 6706 seconds  
[14:46:18] <YuviPanda>	 is that the server being re-imaged?
[14:47:28] <icinga-wm>	 RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected  
[14:48:06] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[14:48:27] <icinga-wm>	 RECOVERY - RAID on mw1029 is OK: OK: no RAID installed  
[14:48:28] <icinga-wm>	 RECOVERY - nutcracker port on mw1029 is OK: TCP OK - 0.000 second response time on port 11212  
[14:48:46] <icinga-wm>	 RECOVERY - DPKG on mw1029 is OK: All packages OK  
[14:48:46] <icinga-wm>	 RECOVERY - nutcracker process on mw1029 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker  
[14:48:47] <icinga-wm>	 RECOVERY - check configured eth on mw1029 is OK: NRPE: Unable to read output  
[14:48:56] <icinga-wm>	 RECOVERY - Disk space on mw1029 is OK: DISK OK  
[14:48:56] <icinga-wm>	 RECOVERY - check if dhclient is running on mw1029 is OK: PROCS OK: 0 processes with command name dhclient  
[14:49:18] <icinga-wm>	 RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds  
[14:49:50] <icinga-wm>	 RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds  
[14:51:19] <anomie>	 manybubbles, marktraceur, ^d: So who wants to SWAT this morning?
[14:51:32] <anomie>	 prtksxna: Ping for SWAT in about 8.5 minutes
[14:51:34] <manybubbles>	 either is fine - I can do it if no one wants it
[14:51:47] <anomie>	 manybubbles: I don't want it
[14:51:55] <manybubbles>	 I'll do it
[14:52:13] <RoanKattouw>	 Oh crap, SWAT time
[14:52:14] <RoanKattouw>	 anomie: Can I SWAT a VE patch?
[14:52:39] <anomie>	 RoanKattouw: manybubbles is going to do the SWAT today. There should be time to add it to the list.
[14:52:39] <manybubbles>	 RoanKattouw: sure
[14:52:48] <RoanKattouw>	 OK
[14:52:52] * RoanKattouw starts cherry-picking
[14:54:18] <grrrit-wm>	 (PS1) Alexandros Kosiaris: Default to UNKNOWN when NRPE checks timeout [puppet] - https://gerrit.wikimedia.org/r/165732
[14:54:37] <grrrit-wm>	 (PS1) Filippo Giunchedi: swift: fix container-sync template [puppet] - https://gerrit.wikimedia.org/r/165733
[14:55:10] <grrrit-wm>	 (PS2) Filippo Giunchedi: swift: fix container-sync template [puppet] - https://gerrit.wikimedia.org/r/165733
[14:55:18] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] swift: fix container-sync template [puppet] - https://gerrit.wikimedia.org/r/165733 (owner: Filippo Giunchedi)
[14:55:55] <marktraceur>	 I could have, but take it away manybubbles
[15:00:05] <jouncebot>	 manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141009T1500).
[15:00:21] <manybubbles>	 prtksxna: it's time to deploy your SWAT
[15:00:27] <manybubbles>	 ready to verify that it worked?
[15:00:45] <manybubbles>	 marktraceur and anomie: those less files are just recompiled on change, right?
[15:01:14] <anomie>	 manybubbles: I have no idea how that works.
[15:01:26] <YuviPanda>	 manybubbles: they should be, yeah.
[15:01:29] <RoanKattouw>	 manybubbles: I added one just in the nick of time there
[15:01:39] <manybubbles>	 anomie: thanks.  will ready while I wait for prtksxna or RoanKattouw to be ready
[15:01:39] <RoanKattouw>	 ( https://gerrit.wikimedia.org/r/165738 )
[15:01:45] <RoanKattouw>	 And I declare myself ready :)
[15:02:20] <icinga-wm>	 PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail  
[15:03:08] <godog>	 ^ that's me testing
[15:04:16] <godog>	 manybubbles: https://gerrit.wikimedia.org/r/#/c/163188 good to merge I'm assuming?
[15:04:29] <manybubbles>	 godog: fine by me!
[15:05:12] <manybubbles>	 RoanKattouw: I'll get that merged and deployed then
[15:05:34] <grrrit-wm>	 (PS3) Filippo Giunchedi: Elasticsearch Drop number of concurrent merges [puppet] - https://gerrit.wikimedia.org/r/163188 (owner: Manybubbles)
[15:05:40] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] Elasticsearch Drop number of concurrent merges [puppet] - https://gerrit.wikimedia.org/r/163188 (owner: Manybubbles)
[15:05:57] <godog>	 manybubbles: ack, it's done!
[15:06:03] <grrrit-wm>	 (CR) Alexandros Kosiaris: swift-synctool: enable/disable/show sync (1 comment) [software] - https://gerrit.wikimedia.org/r/160428 (owner: Filippo Giunchedi)
[15:06:07] <manybubbles>	 thanks!
[15:07:01] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: puppet fail  
[15:07:23] <godog>	 yeah yeah icinga-wm
[15:07:50] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: puppet fail  
[15:09:57] <manybubbles>	 prtksxna: around?  I'd like to do your SWAT deploy in a few minutes
[15:13:45] <manybubbles>	 oh crap.  I rebased against 1.2_4_wmf2 accidentally.  now it's sad
[15:13:46] <manybubbles>	 ^^
[15:13:48] <manybubbles>	 anomie: ^^
[15:14:28] <anomie>	 manybubbles: unhappy as in?
[15:14:35] <manybubbles>	 git now hates me
[15:14:45] <manybubbles>	 fatal: bad config file line 70 in .gitmodules
[15:15:32] <manybubbles>	 I wonder if I can rm php-1.25wmf2 and rebuild it.  that is what I do on my laptop when I make git this mad
[15:15:39] <manybubbles>	 but I can't if there are security patches
[15:15:45] <manybubbles>	 I suppose
[15:15:49] <RoanKattouw>	 manybubbles: Wait, on the cluster?
[15:15:49] <anomie>	 manybubbles: hang on a second
[15:16:00] <RoanKattouw>	 OK I'll let anomie deal with this
[15:16:03] <manybubbles>	 RoanKattouw: yeah - I was doing the rebase step and mistyped
[15:16:05] <anomie>	 manybubbles: Is it fixed-ish now?
[15:16:18] <manybubbles>	 anomie: looks much better
[15:16:20] <manybubbles>	 what did you do?
[15:16:24] <RoanKattouw>	 manybubbles: You don't really need to rebase any more, we made 'git pull' an alias for 'git pull --rebase'
[15:16:38] <anomie>	 manybubbles: I saw you were on a detached head, so just git checkout -f wmf/1.25wmf2
[15:16:52] <manybubbles>	 anomie: oh, well, thats probably right
[15:16:57] <RoanKattouw>	 Wait
[15:16:57] <manybubbles>	 well, lets just move on then
[15:16:59] <manybubbles>	 oh now
[15:17:00] <manybubbles>	 no
[15:17:06] <RoanKattouw>	 Hold on let me check something
[15:17:17] <RoanKattouw>	 If there were security patches before, they have to be restored
[15:17:21] <RoanKattouw>	 I forget whether we had any
[15:17:43] <RoanKattouw>	 (and once I find out whether we do, I can't tell this channel anyway)
[15:18:16] <manybubbles>	 nice
[15:18:31] <anomie>	 RoanKattouw: Aren't the security patches locally committed to the wmf branch when we have them?
[15:18:42] <RoanKattouw>	 Maybe?
[15:18:58] <RoanKattouw>	 I'm checking reflog just in case
[15:19:01] <icinga-wm>	 RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures  
[15:21:03] <guillom>	 Reminder: if you have interesting or disruptive stuff that will be deployed next week, please add them to https://meta.wikimedia.org/wiki/Tech/News/2014/42
[15:21:12] <RoanKattouw>	 manybubbles: OK so you're OK to proceed now on the security patch front, but you're still in a rebase conflict state, so I'm gonna clean that up for you
[15:21:29] <manybubbles>	 RoanKattouw: oh, I was just doing it I think
[15:21:32] <manybubbles>	 but you can take over
[15:21:34] <RoanKattouw>	 Or a merge conflict state or *something*. git status is crazy
[15:21:39] <manybubbles>	 yeah
[15:21:58] * anomie was observing that too
[15:22:28] <RoanKattouw>	 Ugh, it's because the local wmf/1.25wmf2 branch is hosed
[15:22:29] <manybubbles>	 when I do this locally I always reclone but that isn't an option for us
[15:22:32] <RoanKattouw>	 It's pointing to 24
[15:22:45] <manybubbles>	 RoanKattouw: that is my fault - that is how I rebased it
[15:22:52] <manybubbles>	 mistype one fucking number
[15:23:45] <RoanKattouw>	 OK here we go
[15:23:47] <RoanKattouw>	 You're all set now
[15:23:59] <RoanKattouw>	 Your next step is to run git submodule update --recursive extensions/VisualEditor
[15:24:03] <RoanKattouw>	 (don't forget --recursive)
[15:24:05] <anomie>	 RoanKattouw: Still not on a branch?
[15:24:06] <marktraceur>	 manybubbles: Ooh, I did that last week, that's fun
[15:24:27] <RoanKattouw>	 anomie: Ugh, fixing
[15:24:30] <manybubbles>	 RoanKattouw: k.  can you document how you fix that shit?
[15:25:22] <RoanKattouw>	 manybubbles: I don't really recall how I did it and I'd rather not spend hours writing it up
[15:25:30] <manybubbles>	 ah
[15:25:42] <RoanKattouw>	 Instead, you can just run 'git pull'. It's aliased to 'git pull --rebase' in that clone
[15:25:58] <anomie>	 Deep magic is how I fix stuff like that, I never remember exactly how either
[15:26:14] <manybubbles>	 RoanKattouw: I thought we always wanted to git log HEAD..origin/XXX to check what we're getting?
[15:26:55] <RoanKattouw>	 manybubbles: You can do that after git fetch and before git pull
[15:26:56] <andrewbogott>	 !log upgraded wikitech-static to 1.25wmf2
[15:26:56] <manybubbles>	 ok to sync then?
[15:26:56] <anomie>	 manybubbles: That's about what I do: git fetch && git log HEAD..origin/XXX, then if that looks good git pull
[15:26:56] <RoanKattouw>	 pull will fetch again, but meh
[15:26:56] <morebots>	 Logged the message, Master
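The routine anomie and RoanKattouw describe above (fetch, inspect what is incoming with `git log HEAD..origin/XXX`, then pull) can be sketched as a small shell function. `review_incoming` is a hypothetical name introduced for illustration; the branch argument stands in for the XXX placeholder in the conversation, and the remote is assumed to be called `origin`.

```shell
# Hypothetical helper: list commits that a pull would bring in.
# Usage: review_incoming <branch>
review_incoming() {
    branch="$1"
    # Update remote-tracking refs without touching the working tree.
    git fetch --quiet origin || return 1
    # Commits reachable from origin/<branch> but not from the current HEAD.
    git log --oneline "HEAD..origin/${branch}"
}

# Typical use, once the listed commits look right:
# review_incoming wmf/1.25wmf2 && git pull
```

As RoanKattouw notes, the later `git pull` fetches again, so the extra fetch only costs a little time. Empty output means there is nothing new to pull.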
[15:28:31] <logmsgbot>	 !log manybubbles Synchronized php-1.25wmf2/extensions/VisualEditor/: SWAT deploy VE cherry-pick (duration: 00m 06s)
[15:28:37] <morebots>	 Logged the message, Master
[15:28:44] <manybubbles>	 RoanKattouw: thanks for fixing it.  Here is sync^^
[15:28:59] <manybubbles>	 prtksxna: you are next!  ready for swat?
[15:29:06] <grrrit-wm>	 (CR) Alexandros Kosiaris: Added initial Debian packaging (2 comments) [debs/contenttranslation/apertium-pt-ca] - https://gerrit.wikimedia.org/r/165475 (owner: KartikMistry)
[15:29:32] <RoanKattouw>	 Thanks manybubbles 
[15:29:57] <manybubbles>	 thank you!
[15:30:53] <manybubbles>	 prtksxna: I'm giving you another ten minutes to ping me as ready for SWAT or I'm booting your change.
[15:30:56] <^d>	 I was afk, sorry guys.
[15:31:38] <Coren>	 !log done reimaging of mw1029.  Now hhvm_appserver
[15:31:43] <morebots>	 Logged the message, Master
[15:31:55] <Coren>	 !log begin reimaging of mw1028
[15:31:59] <morebots>	 Logged the message, Master
[15:32:41] <icinga-wm>	 PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: Puppet has 2 failures  
[15:34:08] <hashar>	 !log restarted Zuul
[15:34:13] <morebots>	 Logged the message, Master
[15:34:35] <grrrit-wm>	 (CR) Alexandros Kosiaris: "Same questions as in https://gerrit.wikimedia.org/r/#/c/165475/" [debs/contenttranslation/apertium-es-pt] - https://gerrit.wikimedia.org/r/165473 (owner: KartikMistry)
[15:37:02] <grrrit-wm>	 (PS1) coren: Switch mw1028 to appserver_hhvm [puppet] - https://gerrit.wikimedia.org/r/165746
[15:40:08] <grrrit-wm>	 (CR) coren: [C: 2] Switch mw1028 to appserver_hhvm [puppet] - https://gerrit.wikimedia.org/r/165746 (owner: coren)
[15:42:52] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] "Typo, plus all the questions from https://gerrit.wikimedia.org/r/#/c/165475/" (1 comment) [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/165471 (owner: KartikMistry)
[15:44:51] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 1] Default to UNKNOWN when NRPE checks timeout [puppet] - https://gerrit.wikimedia.org/r/165732 (owner: Alexandros Kosiaris)
[15:46:46] <grrrit-wm>	 (CR) Alexandros Kosiaris: "Same questions as for https://gerrit.wikimedia.org/r/#/c/165475/" [debs/contenttranslation/apertium-es-ca] - https://gerrit.wikimedia.org/r/163578 (owner: KartikMistry)
[15:47:35] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Add .gitreview file [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/164057 (owner: KartikMistry)
[15:47:42] <grrrit-wm>	 (CR) Alexandros Kosiaris: [V: 2] Add .gitreview file [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/164057 (owner: KartikMistry)
[15:50:01] <icinga-wm>	 RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures  
[16:00:50] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] Added initial Debian packaging (3 comments) [debs/contenttranslation/apertium] - https://gerrit.wikimedia.org/r/163577 (owner: KartikMistry)
[16:06:48] <grrrit-wm>	 (PS2) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/165471
[16:08:32] <grrrit-wm>	 (PS2) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-pt-ca] - https://gerrit.wikimedia.org/r/165475
[16:10:39] <grrrit-wm>	 (CR) Filippo Giunchedi: swift-synctool: enable/disable/show sync (1 comment) [software] - https://gerrit.wikimedia.org/r/160428 (owner: Filippo Giunchedi)
[16:18:51] <ottomata>	 yo akosiaris, yt?
[16:22:30] <logmsgbot>	 oblivian is doing a graceful restart of all apaches
[16:22:40] <icinga-wm>	 PROBLEM - Disk space on mw1028 is CRITICAL: Connection refused by host  
[16:22:40] <icinga-wm>	 PROBLEM - puppet last run on mw1028 is CRITICAL: Connection refused by host  
[16:22:41] <grrrit-wm>	 (PS2) Filippo Giunchedi: base: add checks for 127.0.1.1 in /etc/hosts [puppet] - https://gerrit.wikimedia.org/r/157795
[16:22:47] <grrrit-wm>	 (PS1) Jgreen: remove role::mail::sender from role::labs::instance, it's already included via standard [puppet] - https://gerrit.wikimedia.org/r/165751
[16:22:51] <logmsgbot>	 oblivian is doing a graceful restart of all apaches
[16:23:13] <logmsgbot>	 !log oblivian gracefulled all apaches
[16:23:21] <morebots>	 Logged the message, Master
[16:23:29] <icinga-wm>	 PROBLEM - RAID on mw1028 is CRITICAL: Connection refused by host  
[16:23:42] <grrrit-wm>	 (CR) KartikMistry: Added initial Debian packaging (3 comments) [debs/contenttranslation/apertium] - https://gerrit.wikimedia.org/r/163577 (owner: KartikMistry)
[16:23:49] <icinga-wm>	 PROBLEM - check configured eth on mw1028 is CRITICAL: Connection refused by host  
[16:24:10] <icinga-wm>	 PROBLEM - check if dhclient is running on mw1028 is CRITICAL: Connection refused by host  
[16:24:30] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1028 is CRITICAL: Connection refused by host  
[16:24:43] <icinga-wm>	 PROBLEM - nutcracker port on mw1028 is CRITICAL: Connection refused by host  
[16:24:50] <icinga-wm>	 PROBLEM - nutcracker process on mw1028 is CRITICAL: Connection refused by host  
[16:24:50] <icinga-wm>	 PROBLEM - DPKG on mw1028 is CRITICAL: Connection refused by host  
[16:27:17] <^d>	 YuviPanda: [2014-10-09 16:19:18,635][WARN ][org.elasticsearch.service.graphite.GraphiteReporter] Error writing to Graphite: Connection timed out 
[16:27:21] <^d>	 Ok, that's something ^
[16:27:38] <grrrit-wm>	 (PS2) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-es-pt] - https://gerrit.wikimedia.org/r/165473
[16:27:59] <YuviPanda>	 ^d: In what I consider facepalm, I think the problem might be that we've statsd sitting before graphite here, and so technically we need an ES statsd plugin...
[16:28:12] <^d>	 ...
[16:28:20] <YuviPanda>	 I know. I'm an idiot.
[16:28:39] <^d>	 There is one, luckily.
[16:28:42] <^d>	 https://github.com/swoop-inc/elasticsearch-statsd-plugin
[16:28:45] <YuviPanda>	 oh?
[16:29:19] <^d>	 Basically identical structure.
[16:29:31] <YuviPanda>	 yeah
[16:30:05] <YuviPanda>	 ^d: can we try that?
[16:30:10] <^d>	 Ok, I gotta dump out for a few hours. I'll have a look at deploying this one instead.
[16:30:12] <YuviPanda>	 there's only minor differences between statsd and graphite...
[16:30:16] <^d>	 *dip out
[16:30:22] <YuviPanda>	 ^d: cool, thanks!
[16:30:25] <grrrit-wm>	 (PS4) KartikMistry: Add initial Debian packaging [debs/contenttranslation/apertium-es-ca] - https://gerrit.wikimedia.org/r/163578
[16:30:27] <YuviPanda>	 ^d: and sorry about the wildish goose chase.
[16:35:03] <grrrit-wm>	 (CR) coren: [C: 1] "It is good to remove redundancy and removing redundancy is good." [puppet] - https://gerrit.wikimedia.org/r/165751 (owner: Jgreen)
[16:35:49] <icinga-wm>	 PROBLEM - NTP on mw1028 is CRITICAL: NTP CRITICAL: Offset unknown  
[16:35:49] <icinga-wm>	 RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures  
[16:36:01] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures  
[16:36:26] <grrrit-wm>	 (CR) Jgreen: [C: 2 V: 1] remove role::mail::sender from role::labs::instance, it's already included via standard [puppet] - https://gerrit.wikimedia.org/r/165751 (owner: Jgreen)
[16:36:40] <Coren>	 Why is mw1028 whining?  It was in maintenance.
[16:36:49] <icinga-wm>	 PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet last ran 19598 seconds ago, expected  14400  
[16:36:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: Puppet last ran 19630 seconds ago, expected  14400  
[16:36:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet last ran 19622 seconds ago, expected  14400  
[16:36:59] <icinga-wm>	 PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: Puppet last ran 20292 seconds ago, expected  14400  
[16:37:00] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: Puppet last ran 19887 seconds ago, expected  14400  
[16:37:00] <icinga-wm>	 PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet last ran 19543 seconds ago, expected  14400  
[16:37:00] <icinga-wm>	 PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: Puppet last ran 19951 seconds ago, expected  14400  
[16:37:01] <Coren>	 Oooooh.  Stupid icinga.  Flexible maintenance.
[16:37:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: Puppet last ran 20059 seconds ago, expected  14400  
[16:37:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Puppet last ran 19657 seconds ago, expected  14400  
[16:37:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: Puppet last ran 20027 seconds ago, expected  14400  
[16:37:40] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2002 is CRITICAL: CRITICAL: Puppet last ran 20658 seconds ago, expected  14400  
[16:37:40] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: Puppet last ran 20056 seconds ago, expected  14400  
[16:37:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet last ran 20372 seconds ago, expected  14400  
[16:37:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet last ran 19815 seconds ago, expected  14400  
[16:38:37] <godog>	 mh missed it by ~5000s :(
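The alerts above fire because each host's last Puppet run is older than the expected 14400 seconds (4 hours), and godog's "~5000s" is roughly the gap between the reported ages and that threshold. A small sketch of the arithmetic, using two of the values from the alerts; `puppet_overshoot` is a hypothetical helper name, not an Icinga or Puppet command.

```shell
# Seconds by which a host exceeds the staleness threshold (0 if within it).
puppet_overshoot() {
    last_run_age="$1"   # seconds since the last Puppet run
    threshold="$2"      # maximum allowed age in seconds
    over=$(( last_run_age - threshold ))
    [ "$over" -gt 0 ] || over=0
    printf '%s\n' "$over"
}

puppet_overshoot 19598 14400   # ms-be2012: prints 5198
puppet_overshoot 20658 14400   # ms-fe2002: prints 6258
```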
[16:39:01] <godog>	 !log re-enable puppet on ms-fe/ms-be in codfw
[16:39:06] <morebots>	 Logged the message, Master
[16:40:09] <icinga-wm>	 RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures  
[16:41:09] <icinga-wm>	 RECOVERY - NTP on mw1028 is OK: NTP OK: Offset -0.01753103733 secs  
[16:41:24] <_joe_>	 Coren: because wmf-reimage wipes the host from puppet
[16:41:40] <_joe_>	 so when you reinstall it it's a fresh host
[16:41:41] <paravoid>	 wmf-reimage?
[16:41:45] <_joe_>	 without downtime
[16:42:00] <Coren>	 _joe_: Oh!  Duh!  It's obvious in retrospect.
[16:42:02] <_joe_>	 paravoid: ah, a small script that does the cleaning on puppet/salt
[16:42:20] <_joe_>	 and then polls them both to ask you to sign the new keys
[16:42:21] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures  
[16:42:39] <paravoid>	 nice
[16:42:46] <_joe_>	 Coren: I realized that this morning
[16:43:14] <paravoid>	 just add the idrac/ipmitool commands over there as well
[16:43:21] <paravoid>	 and you're pretty much done ;)
[16:43:28] <_joe_>	 paravoid: mmmh almost, yes
[16:43:34] <paravoid>	 plus a loop to run puppet on boot
[16:43:46] <_joe_>	 the procedure is pretty boring right now, I
[16:43:52] <godog>	 !log re-enable puppet on ms-fe/ms-be in eqiad
[16:43:53] <icinga-wm>	 RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures  
[16:43:55] <_joe_>	 m trying to make it simpler and simpler
[16:43:59] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures  
[16:43:59] <icinga-wm>	 PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet last ran 20716 seconds ago, expected  14400  
[16:43:59] <icinga-wm>	 PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Puppet last ran 20053 seconds ago, expected  14400  
[16:43:59] <morebots>	 Logged the message, Master
[16:44:05] <godog>	 [sorry alarm storm inbound]
[16:44:09] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: Puppet last ran 20576 seconds ago, expected  14400  
[16:44:10] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Puppet last ran 20376 seconds ago, expected  14400  
[16:44:10] <icinga-wm>	 PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Puppet last ran 20956 seconds ago, expected  14400  
[16:44:10] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1002 is CRITICAL: CRITICAL: Puppet last ran 20947 seconds ago, expected  14400  
[16:44:10] <icinga-wm>	 PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet last ran 20586 seconds ago, expected  14400  
[16:44:10] <icinga-wm>	 PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: Puppet last ran 20465 seconds ago, expected  14400  
[16:44:19] <icinga-wm>	 PROBLEM - puppet last run on ms-be1011 is CRITICAL: CRITICAL: Puppet last ran 21081 seconds ago, expected  14400  
[16:44:29] <icinga-wm>	 PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Puppet last ran 20827 seconds ago, expected  14400  
[16:44:33] <icinga-wm>	 PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet last ran 20871 seconds ago, expected  14400  
[16:44:33] <icinga-wm>	 PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet last ran 20019 seconds ago, expected  14400  
[16:44:39] <icinga-wm>	 PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Puppet last ran 20985 seconds ago, expected  14400  
[16:44:40] <icinga-wm>	 PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: Puppet last ran 20538 seconds ago, expected  14400  
[16:44:49] <icinga-wm>	 PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet last ran 20124 seconds ago, expected  14400  
[16:44:49] <icinga-wm>	 PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Puppet last ran 20083 seconds ago, expected  14400  
[16:44:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Puppet last ran 20601 seconds ago, expected  14400  
[16:44:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Puppet last ran 20515 seconds ago, expected  14400  
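[editor's note] The alarm storm above comes from a staleness check: each alert compares the seconds since the last Puppet run with an expected maximum of 14400 seconds (4 hours). A minimal sketch of that comparison follows. The function name and return strings are hypothetical, and the real Icinga check plugin is not shown in this log.

```python
# Hypothetical staleness check modeled on the alert text above.
# The name and return values are illustrative, not the real plugin's API.
EXPECTED_MAX_AGE = 14400  # seconds; the "expected" value quoted in the alerts


def puppet_freshness(seconds_since_last_run, max_age=EXPECTED_MAX_AGE):
    """Return 'CRITICAL' when the last run is older than max_age, else 'OK'."""
    if seconds_since_last_run > max_age:
        return "CRITICAL"
    return "OK"


print(puppet_freshness(20716))  # ms-be1005 at 16:43:59 -> CRITICAL
print(puppet_freshness(32))     # ms-be2003 after re-enable -> OK
```

With puppet disabled on the ms-fe/ms-be hosts for more than five hours, every host crosses the threshold at about the same time, which is why the alerts arrive in a burst and clear one by one as each host completes a run.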
[16:44:59] <icinga-wm>	 RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures  
[16:45:31] <icinga-wm>	 RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures  
[16:46:10] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures  
[16:48:11] <icinga-wm>	 RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures  
[16:49:09] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures  
[16:49:49] <icinga-wm>	 RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures  
[16:50:00] <icinga-wm>	 RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures  
[16:50:09] <icinga-wm>	 RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures  
[16:50:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures  
[16:50:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures  
[16:50:40] <icinga-wm>	 RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures  
[16:50:50] <icinga-wm>	 RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures  
[16:51:22] <icinga-wm>	 RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures  
[16:51:39] <icinga-wm>	 RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures  
[16:53:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures  
[16:53:30] <icinga-wm>	 RECOVERY - puppet last run on ms-be1011 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures  
[16:54:49] <icinga-wm>	 RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures  
[16:55:29] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures  
[16:55:44] <prtksxna>	 manybubbles: o/
[16:56:13] <prtksxna>	 manybubbles: Did I mess up time zones? :(
[16:56:29] <manybubbles>	 prtksxna: musta been!  SWAT was two hours ago
[16:56:29] <icinga-wm>	 RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures  
[16:56:43] <prtksxna>	 argghh! :(
[16:56:44] <manybubbles>	 8am SF time/11am my time
[16:56:46] <manybubbles>	 sorry!
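[editor's note] The missed window is a time zone conversion. A small sketch, assuming the SWAT slot started at 15:00 UTC on 2014-10-09 (inferred from "two hours ago"; the date is taken from the deploy calendar link later in the log) and using fixed daylight-time offsets rather than a tz database:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for October 2014, when US daylight time was in effect.
PDT = timezone(timedelta(hours=-7), "PDT")  # San Francisco
EDT = timezone(timedelta(hours=-4), "EDT")  # US East Coast

# Assumed start of the SWAT window; not stated exactly in the log.
swat_utc = datetime(2014, 10, 9, 15, 0, tzinfo=timezone.utc)


def local_time(moment, tz):
    """Format an aware datetime as HH:MM plus the zone name in tz."""
    return moment.astimezone(tz).strftime("%H:%M %Z")


print(local_time(swat_utc, PDT))  # 08:00 PDT
print(local_time(swat_utc, EDT))  # 11:00 EDT
```

Storing the schedule in UTC and converting only for display avoids this kind of mix-up.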
[16:56:49] <icinga-wm>	 RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures  
[16:56:53] <manybubbles>	 you can reschedule for the next one?
[16:57:34] <prtksxna>	 manybubbles: I read it wrong. I wanted to sleep and was up to get this done now :|
[16:57:48] <prtksxna>	 manybubbles: Yeah, I'll do that, but it won't go on all the Wikipedias now
[16:57:50] <icinga-wm>	 RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures  
[16:57:54] <icinga-wm>	 RECOVERY - Disk space on mw1028 is OK: DISK OK  
[16:57:54] <icinga-wm>	 RECOVERY - nutcracker process on mw1028 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker  
[16:57:54] <icinga-wm>	 RECOVERY - DPKG on mw1028 is OK: All packages OK  
[16:57:59] <icinga-wm>	 RECOVERY - check configured eth on mw1028 is OK: NRPE: Unable to read output  
[16:58:12] <icinga-wm>	 RECOVERY - check if dhclient is running on mw1028 is OK: PROCS OK: 0 processes with command name dhclient  
[16:58:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures  
[16:58:32] <icinga-wm>	 RECOVERY - RAID on mw1028 is OK: OK: no RAID installed  
[16:58:50] <_joe_>	 !log gracefully restarted again api apaches to recover 500s
[16:58:50] <icinga-wm>	 RECOVERY - nutcracker port on mw1028 is OK: TCP OK - 0.000 second response time on port 11212  
[16:58:51] <icinga-wm>	 PROBLEM - puppet last run on mw1028 is CRITICAL: CRITICAL: Puppet has 1 failures  
[16:58:55] <morebots>	 Logged the message, Master
[16:59:32] <prtksxna>	 manybubbles: If I get it done in the next one by when does it show up on the Wikipedias?
[17:00:21] <manybubbles>	 prtksxna: those are in wmf2 so wikipedias will get it immediately
[17:00:39] <prtksxna>	 manybubbles: Oh right. I'll do that then
[17:00:47] <manybubbles>	 wmf3 (going onto test wikis in an hour) won't have it unless you backport it for wmf3 or it's already included there
[17:00:49] <prtksxna>	 manybubbles: Sorry if I wasted your time, making you wait
[17:01:05] <manybubbles>	 its cool - I mostly just pinged you and got back to work
[17:01:40] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures  
[17:01:45] <icinga-wm>	 RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures  
[17:02:09] <icinga-wm>	 RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures  
[17:02:10] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[17:03:02] <icinga-wm>	 RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures  
[17:03:11] <icinga-wm>	 RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures  
[17:04:00] <icinga-wm>	 RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures  
[17:04:40] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures  
[17:07:20] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]  
[17:11:13] <subbu>	 greg-g, ping.
[17:11:13] <greg-g>	 subbu: You sent me a contentless ping.  This is a contentless pong.  Please provide a bit of information about what you want and I will respond when I am around.
[17:12:17] <subbu>	 aha .. greg-g i assume you've seen my full-of-content ping on #mediawiki-parsoid :) will wait for a pong on it.
[17:12:27] <subbu>	 or rather that you'll see it.
[17:14:23] <greg-g>	 subbu: hey, sorry, in meetings
[17:14:28] <greg-g>	 :/
[17:14:34] <greg-g>	 subbu: I assume it's ok :)
[17:16:29] <subbu>	 greg-g, ok .. is there a good time when we should do it? 1pm PST good?
[17:27:32] <greg-g>	 subbu: that should be fine, yeah
[17:28:08] <subbu>	 k, thanks.
[17:37:29] <apergos>	 bd808: holler when you have some free time, I'm eating dinner right now but after that ready to try some salt install and play on deployment-prep
[17:38:48] <bd808>	 apergos: "free" is a relative term. :) I should be available after ~19:00Z
[17:40:02] <apergos>	 so 10 pm here... that works for me
[17:40:35] <apergos>	 I'll check in around then, thanks!
[17:44:40] <icinga-wm>	 RECOVERY - Disk space on analytics1035 is OK: DISK OK  
[17:55:49] <Coren>	 !log done reimaging of mw1028.  Now hhvm_appserver
[17:55:56] <morebots>	 Logged the message, Master
[17:56:28] <Coren>	 !log begin reimaging of mw1027
[17:56:33] <morebots>	 Logged the message, Master
[17:57:09] <icinga-wm>	 PROBLEM - HHVM rendering on mw1028 is CRITICAL: Connection refused  
[17:57:34] <Coren>	 Stupid pybal faster than I intended.
[17:57:41] <Coren>	 (Already depooled ^^)
[17:59:30] <icinga-wm>	 PROBLEM - Apache HTTP on mw1028 is CRITICAL: Connection refused  
[18:00:05] <jouncebot>	 Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141009T1800).
[18:25:39] <Reedy>	 Anyone around for some puppet advice?
[18:26:16] <greg-g>	 (please) :)
[18:26:30] <Reedy>	 https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/beta.pp#L101
[18:26:53] <Reedy>	 can I just move that code into another class, and include the other class in multiple places in beta?
[18:27:50] <hoo>	 Reedy: Yes
[18:28:13] <hoo>	 Puppet 3 doesn't have dynamic scoping so it's rather sane
[18:28:24] <grrrit-wm>	 (03PS1) 10Reedy: Make beta jobrunners use beta nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/165770 
[18:28:32] <Reedy>	 So that's bug fix number 1
[18:28:39] <icinga-wm>	 PROBLEM - Swift HTTP backend on ms-fe2003 is CRITICAL: Connection timed out  
[18:29:14] <Reedy>	 Just noticed that jobrunners in beta are using production memcached and spamming tonnes of errors
[18:29:49] <Reedy>	 DAMN IT
[18:30:26] <hoo>	 Reedy: they shouldn't even be able to connect
[18:30:31] <hoo>	 or is that what you're seeing
[18:30:35] <hoo>	 if so, lulz
[18:30:43] <Reedy>	 they're trying to
[18:30:44] <Reedy>	 And failing
[18:30:46] <Reedy>	 miserably
[18:31:06] <grrrit-wm>	 (03PS2) 10Reedy: Make beta jobrunners use beta nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/165770 
[18:31:51] <Reedy>	 sigh
[18:31:53] <grrrit-wm>	 (03PS3) 10Reedy: Make beta jobrunners use beta nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/165770 
[18:32:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 1 failures  
[18:32:59] <grrrit-wm>	 (03PS1) 10Reedy: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165773 
[18:33:01] <grrrit-wm>	 (03PS1) 10Reedy: testwiki to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165774 
[18:33:03] <grrrit-wm>	 (03PS1) 10Reedy: Wikipedias to 1.25wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165775 
[18:33:05] <grrrit-wm>	 (03PS1) 10Reedy: group0 to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165776 
[18:33:24] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165773 (owner: 10Reedy)
[18:33:31] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165773 (owner: 10Reedy)
[18:33:44] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165774 (owner: 10Reedy)
[18:33:51] <grrrit-wm>	 (03Merged) 10jenkins-bot: testwiki to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165774 (owner: 10Reedy)
[18:34:01] <logmsgbot>	 !log reedy Started scap: testwiki to 1.25wmf3 and build l10n cache
[18:34:10] <morebots>	 Logged the message, Master
[18:38:45] <grrrit-wm>	 (03CR) 10Reedy: [C: 031] "Cherry picked onto beta" [puppet] - 10https://gerrit.wikimedia.org/r/165770 (owner: 10Reedy)
[18:42:34] <logmsgbot>	 !log reedy scap failed: TypeError bufsize must be an integer (duration: 08m 33s)
[18:42:41] <Reedy>	 bd808: ^^ lol
[18:42:42] <morebots>	 Logged the message, Master
[18:42:48] <greg-g>	 eek
[18:43:00] <bd808>	 bah. what's that from.
[18:43:07] * bd808 goes to look at logs
[18:43:19] <Reedy>	 bd808: http://p.defau.lt/?NoGn1xPjNmVe_T1t5b6yrw
[18:44:06] <bd808>	 ugh. Ori's fix for the bug Tim hit
[18:45:21] <grrrit-wm>	 (03PS1) 10Reedy: Setting $wgMemCachedServers = array(); [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165778 
[18:45:32] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Setting $wgMemCachedServers = array(); [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165778 (owner: 10Reedy)
[18:45:41] <grrrit-wm>	 (03Merged) 10jenkins-bot: Setting $wgMemCachedServers = array(); [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165778 (owner: 10Reedy)
[18:46:08] <Reedy>	 See if that does anything for beta...
[18:48:22] <grrrit-wm>	 (03PS1) 10Ori.livneh: add `keyholder` module for managing a shared ssh-agent [puppet] - 10https://gerrit.wikimedia.org/r/165779 
[18:48:26] <bd808>	 Reedy: I see the bug. missing parens
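[editor's note] The actual scap patch is not quoted in this log, so the following is only a guess at the class of bug. One common way to get "bufsize must be an integer" from Python is to pass a command's words to `subprocess.Popen` as separate positional arguments instead of as one tuple or list, i.e. with a pair of parentheses missing. The second word then lands in the `bufsize` parameter:

```python
import subprocess
import sys

# Buggy call: the inner parentheses are missing, so "-c" is taken as
# Popen's second positional parameter, bufsize, and the type check fails.
try:
    subprocess.Popen(sys.executable, "-c", "print('ok')")
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # bufsize must be an integer

# Fixed call: the whole command is a single tuple argument.
output = subprocess.check_output((sys.executable, "-c", "print('ok')"))
print(output.decode().strip())  # ok
```

The error is raised before any child process starts, which is consistent with a failure that only shows up once that code path actually runs.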
[18:50:30] <icinga-wm>	 RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures  
[18:51:19] <bd808>	 Reedy: I think I fixed it. Give it a shot
[18:51:23] <Reedy>	 Thanks
[18:51:29] <ori>	 thanks
[18:51:39] <bd808>	 !log cherry-picked I3ae9edab2505c37945fe66863721913a6d33223c to scap
[18:51:45] <morebots>	 Logged the message, Master
[18:51:47] <logmsgbot>	 !log reedy Started scap: testwiki to 1.25wmf3 and build l10n cache (take 2)
[18:51:52] <morebots>	 Logged the message, Master
[18:56:33] <grrrit-wm>	 (03CR) 10Reedy: "deployment-videoscaler01 also uses production nutcracker. Probably I should move include ::role::beta::nutcracker into role::b" [puppet] - 10https://gerrit.wikimedia.org/r/165770 (owner: 10Reedy)
[18:59:17] <arlolra>	 akosiaris, paravoid: is there any timeline for upgrading the parsoid boxes to trusty? (just curious ... no pressure, I'm sure you're busy)
[18:59:23] <grrrit-wm>	 (03CR) 10BryanDavis: "Pretty fancy. What kind of audit logging should this do?" [puppet] - 10https://gerrit.wikimedia.org/r/165779 (owner: 10Ori.livneh)
[19:00:50] <grrrit-wm>	 (03PS4) 10Reedy: Make beta jobrunners use beta nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/165770 
[19:01:19] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).  
[19:01:36] <Reedy>	 18:58:16 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on mw1028 returned [127]: bash: /srv/deployment/scap/scap/bin/sync-common: No such file or directory
[19:02:02] <Reedy>	 Coren: Have you finished with mw1028?
[19:02:04] <bd808>	 mw1028 is missing scap parts?
[19:02:14] <hoo>	 Coren reimaged that one AFAIR
[19:02:49] <hoo>	 https://gerrit.wikimedia.org/r/165746
[19:03:03] <Reedy>	 Saw in the SAL
[19:03:04] <Reedy>	 hence asking :)
[19:03:26] <bd808>	 Reedy: trebuchet hasn't run there. Maybe needs salt key signed?
[19:03:46] <bd808>	 There's no /srv/deployment directory
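[editor's note] The `[127]` in Reedy's paste is the shell's exit status for a command that could not be found, which matches the missing /srv/deployment tree on the freshly reimaged host. A minimal reproduction, using a made-up path that is assumed not to exist on the machine running it:

```python
import subprocess

# Hypothetical path, chosen so that it does not exist locally.
missing = "/nonexistent/deployment/bin/sync-common"

# Run the path through a POSIX shell, capturing stderr so the shell's
# "not found" message does not clutter the terminal.
result = subprocess.run(["sh", "-c", missing], stderr=subprocess.PIPE)

print(result.returncode)  # 127
```

A deploy tool that sees 127 from a remote host can therefore tell "the script is not installed" apart from "the script ran and failed".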
[19:12:50] <_joe_>	 Reedy: not yet
[19:12:58] <_joe_>	 Reedy: I can finish it now if needed
[19:13:18] <Reedy>	 _joe_: mw1028?
[19:13:26] <_joe_>	 yes
[19:13:32] <Reedy>	 I'm guessing it's not currently pooled?
[19:14:09] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:14:12] <Reedy>	 It's not urgent
[19:14:46] <Reedy>	 ie it doesn't need doing now
[19:16:01] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1040 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:16:09] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1055 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:16:12] <icinga-wm>	 PROBLEM - check if salt-minion is running on tungsten is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:16:16] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1042 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:16:27] <_joe_>	 wat?
[19:16:29] <icinga-wm>	 PROBLEM - check if salt-minion is running on mw1047 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:16:47] <cscott>	 hey-lo!
[19:16:57] <cscott>	 can anyone here process access-requests ?
[19:17:09] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:17:09] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1055 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:17:10] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:17:18] <_joe_>	 cscott: whoever's on duty
[19:17:21] <cscott>	 i'd like to make sure that arlolra can deploy ocg before i go on vacation next week, but his shell access request has been in limbo for a week.
[19:17:26] <cscott>	 so who's on duty?
[19:17:26] <_joe_>	 look @topic
[19:17:29] <icinga-wm>	 RECOVERY - check if salt-minion is running on mw1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:17:36] <cscott>	 andrewbogott: you're it!
[19:17:51] <icinga-wm>	 PROBLEM - DPKG on tungsten is CRITICAL: DPKG CRITICAL dpkg reports broken packages  
[19:18:02] <bblack>	 heh I guess the salt-minion check shouldn't fail just because a salt command is currently running
[19:18:05] <andrewbogott>	 cscott: I am!  Do you have a ticket # by chance?
[19:18:17] <bblack>	 (probably also the dpkg check shouldn't fail just because an apt command is currently being run)
[19:18:32] <cscott>	 andrewbogott: rt 8505 i think?
[19:18:50] <icinga-wm>	 RECOVERY - DPKG on tungsten is OK: All packages OK  
[19:19:11] <icinga-wm>	 RECOVERY - check if salt-minion is running on tungsten is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion  
[19:19:26] <_joe_>	 Reedy: FYI, it's running scap now
[19:19:27] <Reedy>	 _joe_: A slightly more important thing would be how we do hiera for labs... https://github.com/wikimedia/operations-puppet/blob/9ad61aa3c94169e4c5d376371766b2e6983bb46b/modules/puppetmaster/files/labs.hiera.yaml#L12
[19:19:50] <Reedy>	 [20:11:24] <bd808> So labs/deployment-prep.yaml I think
[19:19:59] <_joe_>	 yes
[19:20:24] <_joe_>	 Reedy: maybe tomorrow? I started working at 8 am :)
[19:20:29] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[19:20:55] <_joe_>	 Reedy: I should be here around 14Z tomorrow if we want to work a little on this
[19:21:07] <Reedy>	 I'm not gonna be around a lot of tomorrow
[19:21:48] <logmsgbot>	 !log reedy Finished scap: testwiki to 1.25wmf3 and build l10n cache (take 2) (duration: 30m 00s)
[19:21:49] <Reedy>	 I noticed a lot of labs was using production nutcracker config :(
[19:21:54] <morebots>	 Logged the message, Master
[19:22:00] <Reedy>	 uh
[19:22:02] <Reedy>	 s/labs/beta/
[19:23:30] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]  
[19:23:49] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.  
[19:24:04] <grrrit-wm>	 (03PS2) 10Reedy: Wikipedias to 1.25wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165775 
[19:24:17] <grrrit-wm>	 (03PS1) 10Dzahn: salt_minion monitoring - only CRIT if > 2 [puppet] - 10https://gerrit.wikimedia.org/r/165840 
[19:24:22] <mutante>	 bblack: ^
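[editor's note] The flapping alerts above fired because a second salt-minion process appears briefly while a salt command runs. The change title "only CRIT if > 2" suggests raising the upper bound on the process count. The sketch below illustrates that idea only; the contents of change 165840 are not shown here, and treating zero processes as critical is an assumption.

```python
def salt_minion_status(process_count, crit_above=2):
    """Classify a salt-minion process count.

    Hypothetical helper: counts from 1 to crit_above are OK; zero
    processes (assumed) or more than crit_above are CRITICAL.
    """
    if process_count < 1 or process_count > crit_above:
        return "CRITICAL"
    return "OK"


print(salt_minion_status(1))  # steady state -> OK
print(salt_minion_status(2))  # a salt command is running -> OK
print(salt_minion_status(3))  # more than expected -> CRITICAL
```

Under the earlier behavior seen in the log, a count of 2 was already CRITICAL, which is what produced the PROBLEM/RECOVERY pairs a minute apart.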
[19:26:34] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Wikipedias to 1.25wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165775 (owner: 10Reedy)
[19:26:43] <grrrit-wm>	 (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165775 (owner: 10Reedy)
[19:27:20] <grrrit-wm>	 (03CR) 10Jforrester: [C: 031] "Due now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) (owner: 10Jforrester)
[19:27:27] <andrewbogott>	 cscott: can you explain about the request for 'deployment-prep' for arlo?  That should be something that you or another beta admin can give them.
[19:27:39] <andrewbogott>	 Ah, so it says in the request.  nevermind
[19:27:39] <cscott>	 yes, i didn't know that at the time
[19:27:40] <James_F>	 Reedy: Ping. https://gerrit.wikimedia.org/r/#/c/157477/ is scheduled for this deploy window; sorry it was still C-1'ed.
[19:27:56] <cscott>	 andrewbogott: i think i've already given him deployment-prep
[19:28:04] <Reedy>	 James_F: That's fine, still going through this deploy, and attempting to fix beta at the same time etc
[19:28:13] <_joe_>	 mutante: thanks
[19:28:13] <James_F>	 Reedy: My sympathies.
[19:28:23] <cscott>	 James_F: yes, i'm preparing to deploy the jjb config change now
[19:28:41] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] salt_minion monitoring - only CRIT if > 2 [puppet] - 10https://gerrit.wikimedia.org/r/165840 (owner: 10Dzahn)
[19:28:44] <James_F>	 cscott: Cool. Different channel for that conversation, though. :-)
[19:28:54] <grrrit-wm>	 (03PS1) 10Andrew Bogott: Provide access for Arlo Breault: parsoid-admin and ocg-render-admin [puppet] - 10https://gerrit.wikimedia.org/r/165847 
[19:30:59] <grrrit-wm>	 (03CR) 10Cscott: [C: 031] Provide access for Arlo Breault: parsoid-admin and ocg-render-admin [puppet] - 10https://gerrit.wikimedia.org/r/165847 (owner: 10Andrew Bogott)
[19:31:04] <logmsgbot>	 !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf2
[19:31:14] <morebots>	 Logged the message, Master
[19:31:43] <cscott>	 andrewbogott: i don't have +2 rights on puppet, so you'll need to find another reviewer.
[19:31:56] <grrrit-wm>	 (03CR) 10Arlolra: [C: 031] Provide access for Arlo Breault: parsoid-admin and ocg-render-admin [puppet] - 10https://gerrit.wikimedia.org/r/165847 (owner: 10Andrew Bogott)
[19:32:28] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] Provide access for Arlo Breault: parsoid-admin and ocg-render-admin [puppet] - 10https://gerrit.wikimedia.org/r/165847 (owner: 10Andrew Bogott)
[19:32:47] <grrrit-wm>	 (03PS2) 10Reedy: group0 to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165776 
[19:32:53] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] group0 to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165776 (owner: 10Reedy)
[19:33:07] <grrrit-wm>	 (03Merged) 10jenkins-bot: group0 to 1.25wmf3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165776 (owner: 10Reedy)
[19:34:06] <logmsgbot>	 !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf3
[19:34:11] <morebots>	 Logged the message, Master
[19:34:56] <grrrit-wm>	 (03CR) 10Reedy: [C: 04-1] "Needs rebasing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) (owner: 10Jforrester)
[19:34:58] <Reedy>	 James_F: ^^
[19:35:09] <James_F>	 Bleh.
[19:35:52] <hoo>	 Reedy: https://gerrit.wikimedia.org/r/164884 easy peasy
[19:36:04] <James_F>	 What on Earth was done to need that?
[19:36:17] <grrrit-wm>	 (03CR) 10Reedy: "This should probably be fixed by using a hiera file for nutcracker for labs. Then the config from here can presumably be removed completel" [puppet] - 10https://gerrit.wikimedia.org/r/165770 (owner: 10Reedy)
[19:36:32] <grrrit-wm>	 (03CR) 10Jforrester: "PS2 is a rebase." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) (owner: 10Jforrester)
[19:36:33] <Reedy>	 James_F: Not sure. A week or 2 ago gerrit wouldn't rebase anything for some stupid reason
[19:36:35] <grrrit-wm>	 (03PS2) 10Jforrester: Enable TemplateData GUI on remaining big Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) 
[19:36:39] <Reedy>	 Then it started working again
[19:36:53] <James_F>	 Reedy: Maybe that's it. git rebase origin/master && git review just worked.
[19:36:57] <Reedy>	 yeah
[19:37:03] <Reedy>	 We almost need a rebase bot for that
[19:37:08] <arlolra>	 andrewbogott: thanks
[19:37:12] <Reedy>	 for these trivial rebases that jgit fails on
[19:37:25] <grrrit-wm>	 (03CR) 10Dzahn: "please link access changes to a ticket. was actually reviewing" [puppet] - 10https://gerrit.wikimedia.org/r/165847 (owner: 10Andrew Bogott)
[19:37:40] <andrewbogott>	 arlolra: it'll take 30 mins or so for the change to spread.  Please let me know if things are not working in an hour.
[19:37:51] <grrrit-wm>	 (03PS3) 10Reedy: Enable TemplateData GUI on remaining big Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) (owner: 10Jforrester)
[19:37:54] <James_F>	 Reedy: Or maybe just wait for Phabricator to come along and solve everything? :-)
[19:37:55] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Enable TemplateData GUI on remaining big Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) (owner: 10Jforrester)
[19:38:02] <andrewbogott>	 mutante: yeah, I noticed that I didn't add the ticket # a second after I merged.
[19:38:08] <andrewbogott>	 The ticket, at least, links to the change.
[19:38:12] * James_F grins.
[19:38:15] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable TemplateData GUI on remaining big Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157477 (https://bugzilla.wikimedia.org/70217) (owner: 10Jforrester)
[19:38:28] <Reedy>	 brb, going to find something to drink
[19:38:57] <grrrit-wm>	 (03PS2) 10Reedy: Remove unused $wmgMediaViewerBeta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164884 (owner: 10Hoo man)
[19:39:03] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Remove unused $wmgMediaViewerBeta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164884 (owner: 10Hoo man)
[19:39:14] <grrrit-wm>	 (03Merged) 10jenkins-bot: Remove unused $wmgMediaViewerBeta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164884 (owner: 10Hoo man)
[19:40:53] <mutante>	 andrewbogott: i see the ticket now. gotcha, also confirmed by gwicke,thx
[19:41:49] <grrrit-wm>	 (03CR) 10Jforrester: [C: 04-1] "Waiting for wmf4." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158121 (owner: 10Jforrester)
[19:42:14] <andrewbogott>	 ok
[19:42:20] <manybubbles>	 !log upgrading elastic1014
[19:42:20] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).  
[19:42:20] <icinga-wm>	 RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures  
[19:42:24] <Reedy>	 James_F: You could've just used 'large' => true ;)
[19:42:26] <morebots>	 Logged the message, Master
[19:42:30] <icinga-wm>	 PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: No response from remote host 198.35.26.192 for 1.3.6.1.2.1.2.2.1.8  with snmp version 2  
[19:42:45] <James_F>	 Reedy: Eh, but the follow-up just sets default => true instead. :-)
[19:42:57] <James_F>	 Reedy: (Follow-up due in a couple of weeks.)
[19:42:59] <icinga-wm>	 RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.085 second response time  
[19:43:08] <mutante>	 hmm. has gerrit been updated lately or something?
[19:43:29] <icinga-wm>	 PROBLEM - BGP status on cr1-ulsfo is CRITICAL: CRITICAL: No response from remote host 198.35.26.192  
[19:43:29] <icinga-wm>	 RECOVERY - HHVM rendering on mw1028 is OK: HTTP OK: HTTP/1.1 200 OK - 66959 bytes in 0.300 second response time  
[19:44:20] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0]  
[19:44:45] <grrrit-wm>	 (03PS2) 10Reedy: Add 'abusefilter-modify-restricted' to sysops at zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165704 (https://bugzilla.wikimedia.org/71854) (owner: 10Glaisher)
[19:44:49] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Add 'abusefilter-modify-restricted' to sysops at zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165704 (https://bugzilla.wikimedia.org/71854) (owner: 10Glaisher)
[19:44:57] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add 'abusefilter-modify-restricted' to sysops at zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165704 (https://bugzilla.wikimedia.org/71854) (owner: 10Glaisher)
[19:44:59] <grrrit-wm>	 (03PS1) 10Andrew Bogott: Change wikitech backup crons to use new, proper dirs. [puppet] - 10https://gerrit.wikimedia.org/r/165859 
[19:45:10] <icinga-wm>	 PROBLEM - BGP status on cr1-ulsfo is CRITICAL: CRITICAL: No response from remote host 198.35.26.192  
[19:45:19] <grrrit-wm>	 (03PS3) 10Reedy: Create new user groups on fa.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165371 (https://bugzilla.wikimedia.org/71760) (owner: 10Calak)
[19:45:23] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Create new user groups on fa.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165371 (https://bugzilla.wikimedia.org/71760) (owner: 10Calak)
[19:45:30] <icinga-wm>	 PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: No response from remote host 198.35.26.192 for 1.3.6.1.2.1.2.2.1.8  with snmp version 2  
[19:45:32] <grrrit-wm>	 (03Merged) 10jenkins-bot: Create new user groups on fa.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165371 (https://bugzilla.wikimedia.org/71760) (owner: 10Calak)
[19:45:54] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] rolematcher - remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/165677 (owner: 10Dzahn)
[19:45:58] <grrrit-wm>	 (03CR) 10Reedy: [C: 04-1] "Need rebasing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160494 (owner: 10Awight)
[19:46:14] <grrrit-wm>	 (03PS2) 10Reedy: Remove unused log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165404 (owner: 10MaxSem)
[19:46:18] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Remove unused log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/165404 (owner: 10MaxSem)
[19:46:20] <icinga-wm>	 RECOVERY - BGP status on cr1-ulsfo is OK: OK: host 198.35.26.192, sessions up: 9, down: 0, shutdown: 0  
[19:46:29] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.  
[19:46:30] <grrrit-wm>	 (Merged) jenkins-bot: Remove unused log group [mediawiki-config] - https://gerrit.wikimedia.org/r/165404 (owner: MaxSem)
[19:47:27] <grrrit-wm>	 (PS2) Reedy: Prevent search engines from indexing user pages and all talk pages on ckbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/164766 (https://bugzilla.wikimedia.org/71663) (owner: Calak)
[19:47:30] <grrrit-wm>	 (CR) Reedy: [C: 2] Prevent search engines from indexing user pages and all talk pages on ckbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/164766 (https://bugzilla.wikimedia.org/71663) (owner: Calak)
[19:47:42] <grrrit-wm>	 (Merged) jenkins-bot: Prevent search engines from indexing user pages and all talk pages on ckbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/164766 (https://bugzilla.wikimedia.org/71663) (owner: Calak)
[19:48:19] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 6 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).  
[19:48:21] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Change wikitech backup crons to use new, proper dirs. [puppet] - https://gerrit.wikimedia.org/r/165859 (owner: Andrew Bogott)
[19:49:09] <grrrit-wm>	 (CR) Dzahn: [C: 2] "not used, checked on neon" [puppet] - https://gerrit.wikimedia.org/r/165678 (owner: Dzahn)
[19:49:13] <grrrit-wm>	 (CR) Reedy: "I note we seemed to have something similar on beta... I just did https://gerrit.wikimedia.org/r/165778 instead... I'm not sure which is be" [mediawiki-config] - https://gerrit.wikimedia.org/r/161005 (owner: BryanDavis)
[19:51:19] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.  
[19:51:35] <logmsgbot>	 !log reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
[19:51:40] <morebots>	 Logged the message, Master
[19:52:07] <mutante>	 !change 165676 | cscott
[19:52:18] <mutante>	 (that bot trigger was actually nice)
[19:52:32] <mutante>	 it used to get the gerrit link and ping 
[19:53:38] <MatmaRex>	 mutante: probably doesn't work on this channel. try on #-dev
[19:54:31] <James_F>	 mutante: Indeed!
[19:55:12] <mutante>	 MatmaRex: that works, thanks
[19:55:22] <mutante>	 at some point it was here i think
[19:56:22] <grrrit-wm>	 (CR) Manybubbles: [C: 1] "Burn it with fire." [puppet] - https://gerrit.wikimedia.org/r/165672 (owner: Dzahn)
[19:56:55] <grrrit-wm>	 (CR) Cscott: [C: 1] "LGTM, i don't have +2 rights in puppet though." [puppet] - https://gerrit.wikimedia.org/r/165676 (owner: Dzahn)
[19:57:17] <mutante>	 wow, that's really effective
[19:57:20] <mutante>	 thanks
[19:57:36] <grrrit-wm>	 (CR) Chad: [C: 1] elasticsearch - delete pmtpa remnants [puppet] - https://gerrit.wikimedia.org/r/165672 (owner: Dzahn)
[19:57:48] <grrrit-wm>	 (CR) Dzahn: [C: 2] elasticsearch - delete pmtpa remnants [puppet] - https://gerrit.wikimedia.org/r/165672 (owner: Dzahn)
[19:58:30] <grrrit-wm>	 (PS3) Dzahn: remove pdf servers,role::pdf and misc pdf class [puppet] - https://gerrit.wikimedia.org/r/165676
[19:59:02] <grrrit-wm>	 (CR) Dzahn: [C: 2] remove pdf servers,role::pdf and misc pdf class [puppet] - https://gerrit.wikimedia.org/r/165676 (owner: Dzahn)
[19:59:42] <grrrit-wm>	 (PS1) Ori.livneh: add auditd module; add auditd rules for keyholder [puppet] - https://gerrit.wikimedia.org/r/165862
[19:59:48] <MatmaRex>	 mutante: :D you could get one of the bot's admins to share the bot's brain from #-dev with here, it's already shared with #mediawiki and maybe some others
[19:59:50] <wm-bot>	 I trust:  petan|w.*wikimedia/Petrb (admin), .*@wikimedia/.* (trusted), .*@mediawiki/.* (trusted), .*@mediawiki/Catrope (admin), .*@wikimedia/RobH (admin), .*@wikimedia/Ryan-lane (admin), petan!.*@wikimedia/Petrb (admin), .*@wikimedia/Krinkle (admin),
[19:59:50] <MatmaRex>	 @trusted
[20:00:14] <grrrit-wm>	 (CR) Ori.livneh: "@bd808: See follow-up patch, https://gerrit.wikimedia.org/r/#/c/165862/" [puppet] - https://gerrit.wikimedia.org/r/165779 (owner: Ori.livneh)
[20:00:15] <MatmaRex>	 but i suppose there might have been a reason it wasn't done
[20:00:35] <mutante>	 yea, it's always controversial which bot should be where
[20:00:40] <mutante>	 either works for me
[20:01:15] <^d>	 mutante: Would you mind having a look at https://gerrit.wikimedia.org/r/#/c/165602/?
[20:01:18] <MatmaRex>	 !botbrain
[20:01:32] <MatmaRex>	 (hmph, that doesn't work here either. nevermind.)
[20:01:40] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]  
[20:01:50] <mutante>	 ^d: oh, i saw that yesterday, yea. and i see +1 from chris, so .. sure
[20:02:02] <^d>	 thx
[20:02:33] <grrrit-wm>	 (CR) Dzahn: [C: 2] Gerrit: explicitly whitelist image formats we want to display [puppet] - https://gerrit.wikimedia.org/r/165602 (https://bugzilla.wikimedia.org/70892) (owner: Chad)
[20:02:54] <grrrit-wm>	 (PS1) Jhobs: Add 437-05 to unified baselining [puppet] - https://gerrit.wikimedia.org/r/165863
[20:03:02] <Jeff_Green>	 !log rebooting samarium
[20:03:08] <morebots>	 Logged the message, Master
[20:05:10] <grrrit-wm>	 (PS2) Jforrester: Enable TemplateData GUI for all wikis; move config to CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/157478 (https://bugzilla.wikimedia.org/60158)
[20:05:35] <grrrit-wm>	 (CR) Jforrester: "Scheduled for 6 November." [mediawiki-config] - https://gerrit.wikimedia.org/r/157478 (https://bugzilla.wikimedia.org/60158) (owner: Jforrester)
[20:05:59] <icinga-wm>	 PROBLEM - CI tmpfs disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 21 MB (4% inode=99%):  
[20:06:08] <Reedy>	 FUUUUUUUUUUUUUUUU
[20:06:25] <Reedy>	 hashar: ^^
[20:06:44] <bd808>	 that's the job thing
[20:06:46] <dr0ptp4kt>	 bblack: mind taking a look at https://gerrit.wikimedia.org/r/#/c/165863/ and then approving and merging and deploying if it looks good? jhobs is the new addition to partners / zero team.
[20:07:06] <bd808>	 Reedy: there's a bug for it
[20:07:12] <grrrit-wm>	 (CR) Ori.livneh: First of (hopefully many) es-tool commands (4 comments) [puppet] - https://gerrit.wikimedia.org/r/163945 (owner: Chad)
[20:07:18] <hashar>	 Reedy: cleaning it
[20:07:20] <icinga-wm>	 PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 19 MB (3% inode=99%):  
[20:07:42] <Reedy>	 bd808: Right. But I was presuming it was a sign jenkins was going to crap out :)
[20:09:19] <icinga-wm>	 RECOVERY - Disk space on lanthanum is OK: DISK OK  
[20:09:33] <hashar>	 bug is logged and there is a way to nicely garbage collect them
[20:10:00] <icinga-wm>	 RECOVERY - CI tmpfs disk space on lanthanum is OK: DISK OK  
[20:10:29] <icinga-wm>	 PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: puppet fail  
[20:17:30] <subbu>	 deploying new version of parsoid ...
[20:29:50] <icinga-wm>	 RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures  
[20:33:42] <subbu>	 !log deployed parsoid version 644071d2
[20:33:49] <morebots>	 Logged the message, Master
[20:35:32] <grrrit-wm>	 (PS1) Cscott: Give cscott the ability to deploy zuul changes. [puppet] - https://gerrit.wikimedia.org/r/165867
[20:36:04] <cscott>	 hashar: ^ although i suspect an RT ticket # and an email to access-requests is probably the way I *should* be doing this
[20:36:59] <mutante>	 cscott: what kind of access do you need?
[20:37:08] <mutante>	 ah
[20:37:21] <cscott>	 mutante: https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Deploy_configuration :)
[20:37:39] <grrrit-wm>	 (CR) Hashar: [C: 1] "I trust C. Scott :)" [puppet] - https://gerrit.wikimedia.org/r/165867 (owner: Cscott)
[20:37:42] <hashar>	 cscott: +1ed :°
[20:38:13] <cscott>	 mutante: hashar and i have been fixing up parsoid's jenkins jobs over on #-qa
[20:38:46] <mutante>	 sounds very reasonable to me, just that quick mail to access-requests please for the trail
[20:39:09] <matanya>	 mutante: thanks a lot for tampa work
[20:40:11] <mutante>	 matanya: :) getting closer
[20:55:17] <grrrit-wm>	 (PS7) Ori.livneh: base::standard-packages: install `perf` [puppet] - https://gerrit.wikimedia.org/r/164883
[20:55:20] <ori>	 ^ _joe_
[20:57:38] <grrrit-wm>	 (CR) Krinkle: [C: 1] "Familiarise with https://www.mediawiki.org/wiki/CI/JJB and https://www.mediawiki.org/wiki/CI/Z if not already. Assume the docs are perfect" [puppet] - https://gerrit.wikimedia.org/r/165867 (owner: Cscott)
[21:00:30] <grrrit-wm>	 (CR) Cscott: "Krinkle -- I already have JJB access. But thanks for the warnings re zuul." [puppet] - https://gerrit.wikimedia.org/r/165867 (owner: Cscott)
[21:13:58] <Krinkle>	 cscott: ldap/wmf can push to Jenkins, but that's not by design and doesn't mean everyone should actually access it :)
[21:14:15] <Krinkle>	 so JJB access is kind of granted implicitly/socially. most ppl just don't know how.
[21:14:16] <Krinkle>	 :)
[21:14:51] <Krinkle>	 always push to gerrit first, and merge right after pushing to jenkins from your local machine. 
[21:16:22] <cscott>	 Krinkle: yup.
[21:16:28] <hashar>	 cscott: +2 everything Timo says :]
[21:20:39] <grrrit-wm>	 (PS8) Giuseppe Lavagetto: base::standard-packages: install `perf` [puppet] - https://gerrit.wikimedia.org/r/164883 (owner: Ori.livneh)
[21:20:51] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] base::standard-packages: install `perf` [puppet] - https://gerrit.wikimedia.org/r/164883 (owner: Ori.livneh)
[21:21:24] <ori>	 _joe_: thanks!
[21:21:54] <_joe_>	 ori: I'll merge that
[21:22:04] <grrrit-wm>	 (PS2) BBlack: Add 437-05 to unified baselining [puppet] - https://gerrit.wikimedia.org/r/165863 (owner: Jhobs)
[21:22:14] <grrrit-wm>	 (CR) BBlack: [C: 2 V: 2] Add 437-05 to unified baselining [puppet] - https://gerrit.wikimedia.org/r/165863 (owner: Jhobs)
[21:22:55] <arlolra>	 andrewbogott: ssh bast1001.wikimedia.org 
[21:22:55] <arlolra>	 Permission denied (publickey).
[21:23:05] <arlolra>	 :(
[21:23:53] <Reedy>	 What username arlolra?
[21:24:15] <andrewbogott>	 arlolra: try again while I watch the logs?
[21:24:25] <arlolra>	 whoami
[21:24:25] <arlolra>	 arlolra
[21:24:50] <Reedy>	 You don't have a home directory on bast1001
[21:24:51] <arlolra>	 andrewbogott: 4 attempts just now
[21:24:52] <andrewbogott>	 ok, says Invalid user arlolra, I'll investigate
[21:25:27] <arlolra>	 Reedy, andrewbogott: thanks
[21:30:30] <icinga-wm>	 PROBLEM - DPKG on labmon1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages  
[21:32:30] <icinga-wm>	 RECOVERY - DPKG on labmon1001 is OK: All packages OK  
[21:33:10] <andrewbogott>	 arlolra: it doesn't look like the groups you requested membership in include access to virt1001.  Did you mean to request access to 'deployment' as well?
[21:33:13] <andrewbogott>	 cscott, any idea?
[21:33:34] <dr0ptp4kt>	 bblack: thanks for looking at that vcl change
[21:33:55] <cscott>	 andrewbogott: it's possible the docs i wrote about the set of groups required to deploy ocg are incomplete.
[21:34:19] <cscott>	 virt1001 is a bastion, right?  (i should read the scrollback)
[21:34:59] <andrewbogott>	 chasemp: does the admin module support group-in-group?  It would sort of make sense to have many groups include an implicit 'bastion' membership rather than having the bastions enumerate all groups that need access.
[21:35:02] <bblack>	 dr0ptp4kt: np :)
[21:35:20] <andrewbogott>	 cscott: what about virt1001?
[21:35:20] <_joe_>	 .win 22
[21:35:27] <chasemp>	 andrewbogott: it doesn't and I've thought about it, but never did it
[21:35:27] <_joe_>	 grrr
[21:35:32] <andrewbogott>	 chasemp: ok
[21:35:36] <chasemp>	 mainly because bastion was the only example I could think of
[21:35:41] <chasemp>	 and it was too abstracted for one case
[21:35:44] <cscott>	 andrewbogott: you need access to tin and deployment-bastion in order to deploy ocg.
[21:35:45] <chasemp>	 it would confuse more than help I thought
[21:36:14] <andrewbogott>	 cscott: ok, so arlolra is trying to connect to bast1001 because…?
[21:36:37] <andrewbogott>	 Is tin public or do you need to go via bast1001 to get there?
[21:36:41] * andrewbogott should really know this
[21:37:00] <andrewbogott>	 well, anyway, the only group with access to tin is 'deployment'
[21:37:05] <mutante>	 andrewbogott: you need bastion
[21:37:08] <cscott>	 andrewbogott: i think i go via bastion
[21:37:26] <^d>	 tin is not public.
[21:37:26] <arlolra>	 andrewbogott: that's what they're telling me to do
[21:37:41] <mutante>	 andrewbogott: there is also a "bastion-only" group if needed
[21:38:00] <arlolra>	 maybe deployment-prep should have just been deployment in that email
[21:38:07] <cscott>	 so i'm checking puppet -- i'm a member of parsoid-admin, deployment, ocg-render-admins, (and pdf-qa-users, which i have no idea what it does)
[21:38:15] <cscott>	 so i guess 'deployment' is the odd dog out there.
[21:39:25] <mutante>	 that's what used to be "mortals" in the past
[21:39:36] <cscott>	 andrewbogott: you added me to parsoid-admin in 0ef350163984322e3d99b09ac1cecc7d855eb6d9 but i was already a member of deployment at that time
[21:39:39] <mutante>	 the group who deploys mediawiki
[21:39:46] <grrrit-wm>	 (PS1) Andrew Bogott: Add arlolra to deployment as well. [puppet] - https://gerrit.wikimedia.org/r/165892
[21:39:50] <^d>	 We should've kept mortals.
[21:40:17] <arlolra>	 mere mortals
[21:40:32] <mutante>	 are you going to deploy mw?
[21:40:50] <arlolra>	 I hope not
[21:40:59] <cscott>	 i was added to deployment in RT #7542
[21:41:05] <matanya>	 andrewbogott: you could add him to bastion only
[21:41:10] <arlolra>	 ocg and parsoid
[21:41:23] <grrrit-wm>	 (CR) Hoo man: [C: -1] "If they are not supposed to be a deployer, don't make them one. We have a "bastiononly" group these days." [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:41:38] <grrrit-wm>	 (CR) Dzahn: "if all you want is to add bastion to the existing groups, use "bastiononly" group. deployment not really needed" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:41:40] <andrewbogott>	 Currently the only group with access to tin is 'deployment'
[21:41:42] <cscott>	 just tell me what group you choose so i can add it to the "you must be members of these groups to deploy" documentation for OCG and Parsoid
[21:41:58] <hoo>	 andrewbogott: Why is tin needed?
[21:42:08] <cscott>	 tin is where we stage git-deploy for ocg and parsoid
[21:42:16] <YuviPanda>	 > Tin has many uses. It takes a high polish and is used to coat other metals to prevent corrosion, such as in tin cans which are made of tin-coated steel. Alloys of tin are important, such as soft solder, pewter, bronze and phosphor bronze. 
[21:42:16] <andrewbogott>	 that sounds like deploying to me
[21:42:41] <hoo>	 YuviPanda: Correct answer, yet totally useless :D
[21:42:48] <cscott>	 https://wikitech.wikimedia.org/wiki/Parsoid#Deploying_changes and https://wikitech.wikimedia.org/wiki/OCG#Deploying_changes
[21:42:52] <YuviPanda>	 hoo: unlike Tin! :)
[21:43:09] <grrrit-wm>	 (CR) Hoo man: "<hoo> andrewbogott: Why is tin needed?" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:43:28] <arlolra>	 ha
[21:44:33] <grrrit-wm>	 (CR) Dzahn: "i guess we make no difference so far between _what_ is deployed and this is deployment, just another kind because it's not mediawiki.. shr" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:45:49] <icinga-wm>	 PROBLEM - DPKG on analytics1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages  
[21:46:01] <grrrit-wm>	 (CR) Dzahn: [C: 1] "...unless we want to introduce parsoid-deployers and add it to tin, which would also seem good" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:46:09] <hoo>	 andrewbogott: Creating a custom group would be rather easy these days, if you care enough
[21:47:49] <icinga-wm>	 PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: puppet fail  
[21:48:17] <cscott>	 there's already parsoid-admin and ocg-admin
[21:48:24] <cscott>	 if you add those to tin we'd be good i think.
[21:49:46] <grrrit-wm>	 (CR) Cscott: "There's already parsoid-admin and ocg-render-admins -- let's use those instead of inventing a new parsoid-deployers group (if we wanted to" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:50:44] <grrrit-wm>	 (CR) Dzahn: "excellent point. yea, let's add one of those to tin, where they are needed" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:51:07] <andrewbogott>	 and bastion
[21:51:08] <grrrit-wm>	 (PS1) Andrew Bogott: Add parsoid-admin ocg-render-admin to tin and bast1001. [puppet] - https://gerrit.wikimedia.org/r/165897
[21:51:10] <grrrit-wm>	 (CR) Cscott: "Oh, but note that the /srv/deployment/ocg and /srv/deployment/parsoid directories are both setgid wikidev. So we'd need to create new uni" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:51:12] <andrewbogott>	 and whatever else we're forgetting
[21:51:32] <cscott>	 andrewbogott: we need new unix groups to own the code on tin as well.
[21:51:57] <hoo>	 Groups other than wikidev?
[21:52:07] <hoo>	 Nobody needs groups other than wikidev :D
[21:53:34] <grrrit-wm>	 (CR) Dzahn: [C: 1] Add parsoid-admin ocg-render-admin to tin and bast1001. [puppet] - https://gerrit.wikimedia.org/r/165897 (owner: Andrew Bogott)
[21:57:42] <cscott>	 hoo: is wikidev connected to the deployment group?  or are we all wikidev?
[21:58:05] <grrrit-wm>	 (CR) Cscott: "Just for reference, here are the install directions for Parsoid: https://wikitech.wikimedia.org/wiki/Parsoid#Deploying_changes" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[21:58:21] <hoo>	 cscott: Everyone is wikidev
[21:58:26] <cscott>	 ok then
[21:58:27] <hoo>	 in the past it was the only gid we had
[21:58:28] <hoo>	 :P
[21:58:33] <hoo>	 That's the point of that joke
[21:59:04] <grrrit-wm>	 (CR) Cscott: "It looks like https://gerrit.wikimedia.org/r/165897 is the preferred solution now." [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[22:00:10] <andrewbogott>	 chasemp: have any reservations about https://gerrit.wikimedia.org/r/#/c/165897/ ?
[22:00:56] <cscott>	 andrewbogott: only that ocg-render-admins is plural.
[22:01:54] <mutante>	 cscott: andrewbogott . oh. heh, i just saw this
[22:01:57] <grrrit-wm>	 (CR) Cscott: [C: -1] Add parsoid-admin ocg-render-admin to tin and bast1001. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/165897 (owner: Andrew Bogott)
[22:01:58] <mutante>	 "cscott is requesting access to tmh* hosts. such as tmh1001."
[22:02:11] <mutante>	 i should have moved that to access requests queue earlier.. does now
[22:02:34] <mutante>	 probably because duplicate tickets were merged
[22:02:37] <grrrit-wm>	 (PS2) Andrew Bogott: Add parsoid-admin ocg-render-admin to tin and bast1001. [puppet] - https://gerrit.wikimedia.org/r/165897
[22:02:46] <cscott>	 mutante: oh, right -- last time i actually tried to deploy a *mediawiki* config (since i am a member of deployment, doncha know) it failed on tmh*
[22:03:47] <grrrit-wm>	 (PS2) Dzahn: Give cscott the ability to deploy zuul changes. [puppet] - https://gerrit.wikimedia.org/r/165867 (owner: Cscott)
[22:04:33] <mutante>	 cscott: while at it.. linked that too. so you have 2 tickets, one for each
[22:04:41] <grrrit-wm>	 (CR) Cscott: [C: 1] Add parsoid-admin ocg-render-admin to tin and bast1001. [puppet] - https://gerrit.wikimedia.org/r/165897 (owner: Andrew Bogott)
[22:04:49] <icinga-wm>	 PROBLEM - check if salt-minion is running on analytics1003 is CRITICAL: NRPE: Command check_check_salt_minion not defined  
[22:08:41] <grrrit-wm>	 (CR) Dzahn: [C: 1] Give cscott the ability to deploy zuul changes. [puppet] - https://gerrit.wikimedia.org/r/165867 (owner: Cscott)
[22:09:44] <chasemp>	 andrewbogott, not sure on giving them perms from policy angle, but syntax is good
[22:10:21] <grrrit-wm>	 (CR) Dzahn: [C: 1] "yea, this seems nicer than Change-Id: Ie9fd2d3f5358" [puppet] - https://gerrit.wikimedia.org/r/165897 (owner: Andrew Bogott)
[22:10:48] <gwicke>	 https://gerrit.wikimedia.org/r/#/c/165903/
[22:10:56] <gwicke>	 sorry, wrong channel
[22:11:17] <grrrit-wm>	 (CR) Dzahn: "agree, Change-Id: Ib519628c3f33e seems nicer" [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[22:12:30] <grrrit-wm>	 (Abandoned) Andrew Bogott: Add arlolra to deployment as well. [puppet] - https://gerrit.wikimedia.org/r/165892 (owner: Andrew Bogott)
[22:12:46] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Add parsoid-admin ocg-render-admin to tin and bast1001. [puppet] - https://gerrit.wikimedia.org/r/165897 (owner: Andrew Bogott)
[22:12:57] <cscott>	 whoo
[22:13:22] <cscott>	 of course i just noticed that the commit summary uses ocg-render-admin instead of ocg-render-admins.  oh well.
[22:16:00] <andrewbogott>	 arlolra: try now
[22:16:41] <arlolra>	 :)
[22:17:18] <arlolra>	 andrewbogott, et al.: thanks
[22:19:40] <cscott>	 does the puppet change require some time to propagate?
[22:20:16] <hoo>	 Yep... every server affected needs at least one puppet run
[22:20:27] <hoo>	 might take up to ~30 mins
[22:20:50] <mutante>	 that, unless we speed it up by manually running it
[22:22:27] <andrewbogott>	 I ran it on bast1001 and tin
[22:23:47] <grrrit-wm>	 (PS1) Ottomata: Grant analytics shell account access to Marcel Ruiz Forns [puppet] - https://gerrit.wikimedia.org/r/165909
[22:27:33] <arlolra>	 verified I could access both. thanks
[22:27:43] <grrrit-wm>	 (PS2) Ottomata: Grant analytics shell account access to Marcel Ruiz Forns [puppet] - https://gerrit.wikimedia.org/r/165909
[22:29:59] <grrrit-wm>	 (PS1) Christopher Johnson (WMDE): fix typo in static yaml phab priority settings file [puppet] - https://gerrit.wikimedia.org/r/165911
[22:45:08] <icinga-wm>	 PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail  
[23:00:05] <jouncebot>	 RoanKattouw, ^d, marktraceur, MaxSem: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141009T2300).
[23:00:20] * hoo waves to whoever is going to do SWAT
[23:00:37] * hoo has https://gerrit.wikimedia.org/r/165913 https://gerrit.wikimedia.org/r/165914
[23:00:38] * MaxSem will do
[23:00:42] <hoo>	 :)
[23:01:05] <MaxSem>	 hoo, I patches from wiki only XD
[23:01:11] <hoo>	 What
[23:01:13] <hoo>	 ?
[23:01:13] <MaxSem>	 *I get patches
[23:01:19] <grrrit-wm>	 (CR) Ori.livneh: "needs rebase" [puppet] - https://gerrit.wikimedia.org/r/147487 (owner: Reedy)
[23:01:20] <hoo>	 It's in the Wiki
[23:01:24] <hoo>	 but not the core bumps
[23:01:30] <ori>	 Reedy: wanna amend that? ^^
[23:01:37] <hoo>	 I can +2 myself if you don't want to
[23:03:07] <MaxSem>	 prtksxna, yt?
[23:03:15] <prtksxna>	 MaxSem: o/
[23:03:18] <icinga-wm>	 RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures  
[23:10:15] <MaxSem>	 andrewbogott or legoktm, yt?
[23:10:22] <andrewbogott>	 I am
[23:10:23] <legoktm>	 I am
[23:10:26] <MaxSem>	 cool
[23:11:58] <cajoel>	 any elasticsearch/logstash magicians available?
[23:12:11] <cajoel>	 jgage: you?
[23:12:20] <bd808>	 cajoel: you summoned?
[23:12:25] <cajoel>	 awesome
[23:12:35] <ebernhardson>	 MaxSem: am here :)
[23:12:52] <cajoel>	 I have weeks of old crufty logs that I'd like to import into my elasticsearch (my house -- oit -- not production)
[23:13:02] <grrrit-wm>	 (PS1) Dzahn: gerrit - add 'phab' short link to phabricator [puppet] - https://gerrit.wikimedia.org/r/165923
[23:13:15] <cajoel>	 even though I /think/ I'm using a filter to READ the actual timestamps, the data keeps showing up at the time I import it
[23:13:27] <bd808>	 heh.
[23:13:33] <cajoel>	 instead of re-aligned to the actual dates the events happened
[23:13:39] <cajoel>	 any guidelines on ingesting old logs?
[23:13:52] <cajoel>	 syslog for the most part
[23:13:56] <bd808>	 paste of your filter config?
[23:14:06] <grrrit-wm>	 (PS1) Ori.livneh: mediawiki: remove cruft from apache2.conf [puppet] - https://gerrit.wikimedia.org/r/165924
[23:14:28] <cajoel>	 hrm
[23:14:34] <grrrit-wm>	 (CR) Dzahn: [C: 1] "so that we can start linking from gerrit to phab tasks, k?" [puppet] - https://gerrit.wikimedia.org/r/165923 (owner: Dzahn)
[23:14:46] <cajoel>	 sent in a pm
[23:16:08] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[23:17:50] <grrrit-wm>	 (CR) Ori.livneh: [C: 1] "I thought the way you made the pattern case-insensitive was a little funny, but sure enough, the docs say: "To match case insensitive stri" [puppet] - https://gerrit.wikimedia.org/r/165923 (owner: Dzahn)
[23:19:16] * MaxSem bites Zuul
[23:19:24] <grrrit-wm>	 (PS2) Dzahn: Linkify Phabricator Task references in gerrit [puppet] - https://gerrit.wikimedia.org/r/164880 (owner: QChris)
[23:19:53] <legoktm>	 MaxSem: yeah, this is taking really really long.
[23:20:22] <grrrit-wm>	 (CR) Dzahn: "heh, yea, just copied from existing patterns. but it seems i'm a duplicate of https://gerrit.wikimedia.org/r/#/c/164880/1 more or less, s" [puppet] - https://gerrit.wikimedia.org/r/165923 (owner: Dzahn)
[23:20:49] <MaxSem>	 can we just run it on a dozen supercomputers? :|
[23:21:47] <ebernhardson>	 i do wonder what the bottleneck is (processor, cpu, io bandwidth, etc)
[23:21:57] <ebernhardson>	 because some of those are rather cheap(relative to engineer time waiting) to fix
[23:22:17] <grrrit-wm>	 (CR) Dzahn: [C: 2] Linkify Phabricator Task references in gerrit [puppet] - https://gerrit.wikimedia.org/r/164880 (owner: QChris)
[23:23:30] <ori>	 ebernhardson: it does everything at least twice, processes queues synchronously and serially even when the jobs have no interdependency, and has a shit-ton of peacock jobs that are non-voting and which are of no value to anyone
[23:24:21] <MaxSem>	 wheeee
[23:24:26] <MaxSem>	 gerrit is a roast
[23:24:33] <ebernhardson>	 yea, that doesn't sound like something throwing hardware $$ at will fix :(
[23:25:01] <ori>	 if you aim the hardware well and throw it hard enough, it might
[23:25:15] <ebernhardson>	 :)
[23:25:56] <grrrit-wm>	 (PS1) Dzahn: add reports.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/165927
[23:26:17] <grrrit-wm>	 (PS2) Dzahn: add reports.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/165927
[23:27:10] <grrrit-wm>	 (CR) Dzahn: "works! :)" [puppet] - https://gerrit.wikimedia.org/r/164880 (owner: QChris)
[23:27:42] <grrrit-wm>	 (CR) Dzahn: [C: -2] "duplicate of https://gerrit.wikimedia.org/r/#/c/164880/ - use "T" now to link to phab" [puppet] - https://gerrit.wikimedia.org/r/165923 (owner: Dzahn)
[23:27:51] <grrrit-wm>	 (Abandoned) Dzahn: gerrit - add 'phab' short link to phabricator [puppet] - https://gerrit.wikimedia.org/r/165923 (owner: Dzahn)
[23:29:19] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core:  cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]  
[23:29:45] <ori>	 ^ bblack?
[23:31:07] <andrewbogott>	 MaxSem: are we still mid-swat?
[23:31:18] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[23:33:56] <grrrit-wm>	 (CR) Dzahn: "this is to discuss if the name is good for this purpose. it will need follow-up to add varnish config on misc-web if we take it. i would s" [dns] - https://gerrit.wikimedia.org/r/165927 (owner: Dzahn)
[23:34:21] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core:  cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]  
[23:37:31] <MaxSem>	 andrewbogott, yep - too many submodule updates and zuul silliness
[23:37:51] <andrewbogott>	 ok
[23:38:24] <hoo>	 Are you going to scap it all at the end?
[23:38:34] <MaxSem>	 is it required?
[23:38:48] <hoo>	 No, just want to get an idea of how much time it will take
[23:39:09] <MaxSem>	 hoo, does WD need a recursive update?
[23:39:26] <hoo>	 No, it's only one repo
[23:39:33] <hoo>	 Everything else is embedded there
[23:40:05] <hoo>	 That's what's needed to use composer for prod.
[23:40:09] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf2/extensions/Wikidata/: (no message) (duration: 00m 10s)
[23:40:15] <MaxSem>	 hoo, ^^
[23:40:16] <morebots>	 Logged the message, Master
[23:40:19] <MaxSem>	 please test
[23:40:25] <hoo>	 Already done :)
[23:40:29] <hoo>	 Fatals are easy to test
[23:40:37] <hoo>	 wmf3 as well, please
[23:41:24] <hoo>	 apergos: ping
[23:43:07] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf3/extensions/Wikidata/: (no message) (duration: 00m 10s)
[23:43:11] <MaxSem>	 hoo, ^^
[23:43:12] <morebots>	 Logged the message, Master
[23:43:14] <hoo>	 Thanks :)
[23:43:39] <hoo>	 Reaching apergos at this time is not possible, I guess?
[23:43:46] <MaxSem>	 yup
[23:43:49] <hoo>	 :S
[23:44:07] <hoo>	 mutante: Want to do me a favour and do something on snapshot for me?
[23:44:37] * hoo would be so happy if we could finally get this access request through *sigh*
[23:45:51] <ori>	 hoo: what do you need?
[23:45:54] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf2/extensions/Flow/: (no message) (duration: 00m 09s)
[23:46:00] <morebots>	 Logged the message, Master
[23:46:06] <hoo>	 ori: Two empty files deleted and one cron started per hand
[23:46:14] <ori>	 hoo: go on
[23:46:29] <hoo>	 Nice, give me a moment
[23:46:53] <MaxSem>	 ebernhardson, ^^^
[23:46:59] <grrrit-wm>	 (Abandoned) Dzahn: remove 10.0.0.0/16 Tampa subnet from DHCP [puppet] - https://gerrit.wikimedia.org/r/164241 (owner: Dzahn)
[23:47:25] <hoo>	 hoo@snapshot1003:~$ sudo -u datasets rm /mnt/data/xmldatadumps/public/other/wikidata/2014100*
[23:47:28] <hoo>	 ori: ^ do that
[23:48:07] <hoo>	 those are both broken due to a fatal
[23:48:19] <ori>	 yes, they're 4k
[23:48:21] <hoo>	 s/broken/empty/
[23:48:22] <ori>	 done
[23:48:25] <hoo>	 :)
[23:49:09] <hoo>	 sudo -u datasets /usr/local/bin/dumpwikidatajson.sh
[23:49:14] <hoo>	 that's needed to create a new one
[23:49:22] <hoo>	 you probably want to run that in a screen or so
[23:49:26] <hoo>	 takes ~10h
[23:49:37] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0  
[23:50:22] <ori>	 i'll just stare at it intently for 10 hours
[23:50:50] <hoo>	 Awesome :D
[23:50:50] <ori>	 hoo: running
[23:50:58] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf2/extensions/MobileApp: (no message) (duration: 00m 04s)
[23:51:02] <hoo>	 Yay :)
[23:51:03] <hoo>	 Thanks
[23:51:04] <morebots>	 Logged the message, Master
[23:52:02] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf3/extensions/MobileApp: (no message) (duration: 00m 03s)
[23:52:06] <morebots>	 Logged the message, Master
[23:53:39] <legoktm>	 MaxSem: time for OSM now? :)
[23:54:18] <MaxSem>	 no, we're not deploying OpenStreetMaps today
[23:55:23] <legoktm>	 lol
[23:55:41] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf2/extensions/OpenStackManager/: (no message) (duration: 00m 04s)
[23:55:48] <morebots>	 Logged the message, Master
[23:56:33] <legoktm>	 andrewbogott: ^ 
[23:56:35] <grrrit-wm>	 (PS3) Ottomata: Grant analytics shell account access to Marcel Ruiz Forns [puppet] - https://gerrit.wikimedia.org/r/165909
[23:56:43] <andrewbogott>	 legoktm: isn't there another one?
[23:56:54] <legoktm>	 no, it's just one submodule update
[23:56:58] <andrewbogott>	 ok
[23:57:06] <MaxSem>	 and no wmf3?
[23:57:17] <MaxSem>	 (too many commits for me today)
[23:57:23] <legoktm>	 oh, we probably should backport to wmf3
[23:57:28] <legoktm>	 wikitech is still on wmf2
[23:57:35] <grrrit-wm>	 (CR) Dzahn: [C: 2] "these are down and all UNKNOWN in icinga meanwhile. cleaning that up https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=" [puppet] - https://gerrit.wikimedia.org/r/165673 (owner: Dzahn)
[23:57:43] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf3/resources/: (no message) (duration: 00m 04s)
[23:57:48] <morebots>	 Logged the message, Master
[23:57:51] <logmsgbot>	 !log maxsem Synchronized php-1.25wmf2/resources/: (no message) (duration: 00m 03s)
[23:57:57] <morebots>	 Logged the message, Master
[23:58:00] <MaxSem>	 prtksxna ^^
[23:58:05] <legoktm>	 but we don't want the API modules to disappear on thursday..
[23:58:26] <MaxSem>	 pfff
[23:58:31] <MaxSem>	 I think I'm done
[23:58:33] <grrrit-wm>	 (PS4) Ottomata: Grant analytics shell account access to Marcel Ruiz Forns [puppet] - https://gerrit.wikimedia.org/r/165909
[23:58:42] <prtksxna>	 MaxSem: Checking…
[23:58:43] <grrrit-wm>	 (CR) Ottomata: [C: 2 V: 2] Grant analytics shell account access to Marcel Ruiz Forns [puppet] - https://gerrit.wikimedia.org/r/165909 (owner: Ottomata)
[23:58:52] <legoktm>	 MaxSem: yeah, we can backport them another day.
[23:58:59] <hoo>	 legoktm: It's a bi-weekly API... you just need to plan your actions ;)
[23:59:18] <legoktm>	 lol
[23:59:38] <prtksxna>	 Thanks MaxSem!
[23:59:54] <legoktm>	 andrewbogott: let me know once you sync wikitech and then I'll test it there