[00:00:47] PROBLEM - puppet last run on mw2047 is CRITICAL puppet fail
[00:02:35] (CR) Yuvipanda: "Attempting to test this on toolsbeta." [puppet] - https://gerrit.wikimedia.org/r/204193 (https://phabricator.wikimedia.org/T96059) (owner: Yuvipanda)
[00:03:05] (PS1) BBlack: remove more torrus api-cluster refs (followup fix for 6254a447?) [puppet] - https://gerrit.wikimedia.org/r/204198
[00:04:41] (CR) BBlack: [C: 2 V: 2] remove more torrus api-cluster refs (followup fix for 6254a447?) [puppet] - https://gerrit.wikimedia.org/r/204198 (owner: BBlack)
[00:08:07] RECOVERY - puppet last run on netmon1001 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[00:12:32] bblack: yes, ^ recovery confirmed
[00:12:34] thx
[00:16:58] (PS9) BBlack: r::c::config::active_nodes -> hiera cache::nodes [puppet] - https://gerrit.wikimedia.org/r/204068
[00:18:00] (PS1) Ori.livneh: Create Application class [software/brrd] - https://gerrit.wikimedia.org/r/204199
[00:18:17] (CR) Ori.livneh: [C: 2 V: 2] Create Application class [software/brrd] - https://gerrit.wikimedia.org/r/204199 (owner: Ori.livneh)
[00:18:37] RECOVERY - puppet last run on mw2047 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[00:21:52] (PS1) Ori.livneh: Update upstart job def for brrd [puppet] - https://gerrit.wikimedia.org/r/204200
[00:22:02] (PS2) Ori.livneh: Update upstart job def for brrd [puppet] - https://gerrit.wikimedia.org/r/204200
[00:22:36] (CR) Ori.livneh: [C: 2 V: 2] Update upstart job def for brrd [puppet] - https://gerrit.wikimedia.org/r/204200 (owner: Ori.livneh)
[00:28:41] (CR) Chad: Add submodules to master checkoutMediaWiki (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) (owner: Thcipriani)
[00:32:35] (PS10) BBlack: r::c::config::active_nodes -> hiera cache::nodes [puppet] - https://gerrit.wikimedia.org/r/204068
[00:35:10] (PS1) Dzahn: integration: move redirect out of .htaccess [puppet] - https://gerrit.wikimedia.org/r/204202
[00:36:13] (CR) Dzahn: "[gallium:/srv/org/wikimedia/integration] $ cat .htaccess" [puppet] - https://gerrit.wikimedia.org/r/204202 (owner: Dzahn)
[00:41:33] (PS2) Dzahn: various role classes: moar small lint fixes [puppet] - https://gerrit.wikimedia.org/r/202653 (https://phabricator.wikimedia.org/T93645)
[00:41:49] (CR) Dzahn: [C: 2] various role classes: moar small lint fixes [puppet] - https://gerrit.wikimedia.org/r/202653 (https://phabricator.wikimedia.org/T93645) (owner: Dzahn)
[00:43:36] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60633 bytes in 1.156 second response time
[00:45:59] (PS3) Dzahn: drop shop & store entries from most projects [dns] - https://gerrit.wikimedia.org/r/196605 (https://phabricator.wikimedia.org/T92438)
[00:46:51] (PS1) Alex Monk: Add AffCom user group application contact page on meta [mediawiki-config] - https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789)
[00:46:53] (CR) Dzahn: "@Faidon ok:)" [dns] - https://gerrit.wikimedia.org/r/196605 (https://phabricator.wikimedia.org/T92438) (owner: Dzahn)
[00:50:17] (Abandoned) Dzahn: color root shell in red [puppet] - https://gerrit.wikimedia.org/r/198425 (owner: Dzahn)
[00:51:02] (Abandoned) Dzahn: use 208.80.153.224 for text-lb.codfw.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/196075 (https://phabricator.wikimedia.org/T92377) (owner: Dzahn)
[00:55:01] (PS2) Alex Monk: Add AffCom user group application contact page on meta [mediawiki-config] - https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789)
[00:56:19] (CR) Alex Monk: [C: -1] Add AffCom user group application contact page on meta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789) (owner: Alex Monk)
[01:08:04] <^d> mutante: Has anything changed with the gerrit ssl cert recently?
[01:08:12] <^d> (like, last couple of days?)
[01:08:35] not that I've heard about
[01:08:43] ^d: same, not that i heard about
[01:08:54] why?
[01:09:43] <^d> I have a stupid little PHP script on my machine that fetches some JSON from Gerrit for me. It started blowing up like yesterday
[01:10:04] <^d> Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages:
[01:10:05] <^d> error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed in fetch_missing_repos.php on line 17
[01:11:37] there is https://phabricator.wikimedia.org/T82319 but that's not a new thing, it doesnt explain things changing just a few days ago
[01:12:14] <^d> I'm totally willing to accept it's possibly something I screwed up on my machine :p
[01:12:47] ^d: i see "ssl3" in there
[01:13:04] we disabled SSLv3
[01:13:04] <^d> Yeah, hmm
[01:13:10] also not a few days ago.. but .. wait..
[01:14:08] I think openssl always calls that function ssl3_get_server_certificate even when it's using TLS?
[01:15:48] <^d> https://phabricator.wikimedia.org/P520 - my openssl config for PHP
[01:16:02] cant find the other bug i meant
[01:16:09] anyways that change on gerrit was in Oct 2014
[01:18:34] ^d: is this using curl from php?
[01:18:51] <^d> fopen equivalent
[01:18:58] shouldn't your openssl have a capath? how else would it have a set of root certs to validate us against?
[01:19:02] "curl used to include a list of accepted CAs, but no longer bundles ANY CA certs. So by default it'll reject all SSL certificates as unverifiable."
[01:19:11] like /etc/ssl/certs/
[01:19:15] so you would have to set the capath, yea
[01:19:42] <^d> Gah, default config changed under me at some point
[01:21:03] (PS11) BBlack: r::c::config::active_nodes -> hiera cache::nodes [puppet] - https://gerrit.wikimedia.org/r/204068
[01:22:34] operations, Wikimedia-Mailing-lists: Update mailman listinfo.txt template - https://phabricator.wikimedia.org/T96108#1208549 (Dzahn) a:Dzahn
[01:40:48] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=223.60 Read Requests/Sec=133.20 Write Requests/Sec=36.10 KBytes Read/Sec=873.60 KBytes_Written/Sec=448.75
[01:41:39] now that might have been me running a script
[01:42:27] RECOVERY - mailman I/O stats on sodium is OK - I/O stats: Transfers/Sec=3.50 Read Requests/Sec=1.10 Write Requests/Sec=8.40 KBytes Read/Sec=6.40 KBytes_Written/Sec=136.40
[02:30:31] (PS1) Chad: Use Diffusion to support r1234 links in Gerrit [puppet] - https://gerrit.wikimedia.org/r/204211
[02:32:02] operations, Interdatacenter-IPsec: Strongswan: security association reauthentication failure - https://phabricator.wikimedia.org/T96111#1208595 (Gage) NEW a:Gage
[02:32:19] !log l10nupdate Synchronized php-1.25wmf24/cache/l10n: (no message) (duration: 09m 03s)
[02:32:19] blah
[02:32:41] Logged the message, Master
[02:39:02] !log LocalisationUpdate completed (1.25wmf24) at 2015-04-15 02:37:59+00:00
[02:39:14] Logged the message, Master
[02:52:07] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=189.40 Read Requests/Sec=95.70 Write Requests/Sec=3.30 KBytes Read/Sec=11046.00 KBytes_Written/Sec=32.55
[02:54:32] operations, Interdatacenter-IPsec: Strongswan: security association reauthentication failure - https://phabricator.wikimedia.org/T96111#1208607 (Gage)
[02:56:57] RECOVERY - mailman I/O stats on sodium is OK - I/O stats: Transfers/Sec=84.40 Read Requests/Sec=21.00 Write Requests/Sec=78.80 KBytes Read/Sec=692.40 KBytes_Written/Sec=533.30
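[Editor's note] The SSL failure ^d debugged above comes down to PHP's file_get_contents() having no CA store configured, and the fix discussed is pointing openssl.capath at a directory of system certs such as /etc/ssl/certs. The same verification model, sketched in Python rather than PHP (the capath line is commented out because /etc/ssl/certs is the Debian/Ubuntu convention mentioned in the log, not guaranteed everywhere):

```python
import ssl

# A default-configured context verifies the peer certificate against the
# system CA store and checks the hostname -- the part ^d's PHP setup lost
# when its default openssl config changed under him.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True

# Rough equivalent of setting openssl.capath=/etc/ssl/certs in php.ini:
# ctx.load_verify_locations(capath="/etc/ssl/certs")
# and then, e.g.:
# urllib.request.urlopen("https://gerrit.wikimedia.org/", context=ctx)
```

Without a capath/cafile, verification fails for every server, which is exactly the "certificate verify failed" symptom in the warning above.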
[03:04:18] !log l10nupdate Synchronized php-1.26wmf1/cache/l10n: (no message) (duration: 09m 22s)
[03:04:32] Logged the message, Master
[03:05:17] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=201.80 Read Requests/Sec=136.80 Write Requests/Sec=11.60 KBytes Read/Sec=3515.20 KBytes_Written/Sec=604.70
[03:08:27] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=88.70 Read Requests/Sec=96.60 Write Requests/Sec=11.20 KBytes Read/Sec=1232.00 KBytes_Written/Sec=292.85
[03:11:11] !log LocalisationUpdate completed (1.26wmf1) at 2015-04-15 03:10:08+00:00
[03:11:20] Logged the message, Master
[03:13:17] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=44.00 Read Requests/Sec=93.70 Write Requests/Sec=5.20 KBytes Read/Sec=5864.00 KBytes_Written/Sec=25.35
[03:18:16] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=207.00 Read Requests/Sec=153.20 Write Requests/Sec=27.20 KBytes Read/Sec=15444.00 KBytes_Written/Sec=283.35
[03:21:37] RECOVERY - mailman I/O stats on sodium is OK - I/O stats: Transfers/Sec=69.70 Read Requests/Sec=1.80 Write Requests/Sec=7.60 KBytes Read/Sec=9.60 KBytes_Written/Sec=828.20
[03:26:27] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=101.60 Read Requests/Sec=146.80 Write Requests/Sec=2.90 KBytes Read/Sec=4511.60 KBytes_Written/Sec=32.55
[03:28:06] RECOVERY - mailman I/O stats on sodium is OK - I/O stats: Transfers/Sec=97.30 Read Requests/Sec=18.80 Write Requests/Sec=72.80 KBytes Read/Sec=79.60 KBytes_Written/Sec=547.45
[03:34:27] PROBLEM - mailman I/O stats on sodium is CRITICAL - I/O stats: Transfers/Sec=181.90 Read Requests/Sec=94.80 Write Requests/Sec=1.30 KBytes Read/Sec=755.60 KBytes_Written/Sec=30.80
[03:36:06] RECOVERY - mailman I/O stats on sodium is OK - I/O stats: Transfers/Sec=5.90 Read Requests/Sec=0.40 Write Requests/Sec=2.80 KBytes Read/Sec=2.00 KBytes_Written/Sec=135.60
[04:12:00] (PS1) EBernhardson: Invalidate flow cache by bumping cache version [mediawiki-config] - https://gerrit.wikimedia.org/r/204215
[04:13:51] (CR) Mattflaschen: [C: 2] Invalidate flow cache by bumping cache version [mediawiki-config] - https://gerrit.wikimedia.org/r/204215 (owner: EBernhardson)
[04:13:56] (Merged) jenkins-bot: Invalidate flow cache by bumping cache version [mediawiki-config] - https://gerrit.wikimedia.org/r/204215 (owner: EBernhardson)
[04:16:41] !log ebernhardson Synchronized wmf-config/CommonSettings.php: Bump flow cache version (duration: 00m 11s)
[04:16:48] Logged the message, Master
[06:01:57] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds
[06:14:47] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 9.398 second response time
[06:18:04] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 15 06:17:01 UTC 2015 (duration 17m 0s)
[06:19:46] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds
[06:29:57] PROBLEM - puppet last run on db1028 is CRITICAL Puppet has 1 failures
[06:30:57] PROBLEM - puppet last run on db1021 is CRITICAL Puppet has 1 failures
[06:31:16] PROBLEM - puppet last run on cp1056 is CRITICAL Puppet has 1 failures
[06:32:06] PROBLEM - puppet last run on logstash1002 is CRITICAL Puppet has 2 failures
[06:32:07] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures
[06:33:47] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 1 failures
[06:35:07] PROBLEM - puppet last run on mw2143 is CRITICAL Puppet has 1 failures
[06:35:27] PROBLEM - puppet last run on mw1144 is CRITICAL Puppet has 3 failures
[06:35:37] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures
[06:35:47] PROBLEM - puppet last run on mw2126 is CRITICAL Puppet has 1 failures
[06:35:47] PROBLEM - puppet last run on mw2123 is CRITICAL Puppet has 1 failures
[06:35:47] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures
[06:35:47] PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 2 failures
[06:35:47] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures
[06:35:57] PROBLEM - puppet last run on mw1061 is CRITICAL Puppet has 1 failures
[06:36:07] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 1 failures
[06:36:17] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 1 failures
[06:36:27] PROBLEM - puppet last run on mw1118 is CRITICAL Puppet has 1 failures
[06:36:28] PROBLEM - puppet last run on mw1065 is CRITICAL Puppet has 1 failures
[06:36:47] PROBLEM - puppet last run on mw2050 is CRITICAL Puppet has 1 failures
[06:36:56] PROBLEM - puppet last run on mw1175 is CRITICAL Puppet has 1 failures
[06:45:37] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 1.293 second response time
[06:45:47] RECOVERY - puppet last run on cp1056 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on logstash1002 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:46:47] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:07] RECOVERY - puppet last run on db1021 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:18] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:50:36] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds
[07:06:38] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[07:06:46] RECOVERY - puppet last run on mw1061 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:06:46] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures
[07:06:46] RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[07:06:57] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:07:07] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures
[07:07:07] RECOVERY - puppet last run on db1028 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[07:07:16] RECOVERY - puppet last run on mw1118 is OK Puppet is currently enabled, last run 1 second ago with 0 failures
[07:07:17] RECOVERY - puppet last run on mw1065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:07:37] RECOVERY - puppet last run on mw2143 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures
[07:07:37] RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[07:07:47] RECOVERY - puppet last run on mw1175 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:07:47] RECOVERY - puppet last run on mw1144 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures
[07:08:07] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:08:17] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[07:08:17] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:19:19] operations, Services, Service-Architecture: Set up monitoring automation for services - https://phabricator.wikimedia.org/T94821#1208803 (mobrovac) After sleeping on it, I realised it's just a matter of format and we can go either way. I still think we should expose a proper endpoint for this (either sp...
[07:19:31] _joe_: ^^
[07:25:16] operations, Services, Service-Architecture: Set up monitoring automation for services - https://phabricator.wikimedia.org/T94821#1208804 (Joe) Having an endpoint exposing this gives us a lot of flexibility/autodiscovery ability that does NOT depend on people using swagger/spec to define what we should m...
[07:25:23] <_joe_> mobrovac: thanks
[07:26:34] (CR) Alexandros Kosiaris: [C: -1] "Is there a plan to have something in the main namespaces of the module (or the role) ? If yes, just introduce it in this commit so we can " [puppet] - https://gerrit.wikimedia.org/r/204161 (owner: MaxSem)
[07:28:26] <_joe_> mobrovac: I don't care about having to parse a slightly more complicated yaml than a json already taylored to my needs
[07:29:17] _joe_: ok cool, i can also go either way, really don't care
[07:29:32] i cared yesterday but today i don't any more
[07:29:46] too much discussion
[07:29:53] <_joe_> eheh
[07:30:26] <_joe_> still, I'd prefer the application to do that translation work
[07:30:36] <_joe_> it's an API and I'm your client, right?]
[07:30:47] oh please write that in the ticket
[07:30:56] (poossibly also explaining why)
[07:31:17] _joe_: yes, it's a contract between services and the monitoring tool
[07:31:59] i've been basing my comments on that, why gabriel argues for exposing full specs so that any entity may use them as they wish
[07:32:06] s/why/but
[07:32:39] and i agree that eventually all services should expose their full specs
[07:32:47] but the template is not yet there to offer that
[07:32:58] <_joe_> I think one doesn't exclude the other
[07:33:23] <_joe_> but in my experience, you will have situations where you want to define what to monitor as a "decorator" to your method
[07:33:47] <_joe_> (sorry for the java/pythonesque terminology, I have no idea how you call those in nodejs)
[07:34:16] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.090 second response time
[07:34:18] (CR) Giuseppe Lavagetto: [C: -1] "I think this patch goes completely in the right direction. I have a few comments, but I guess we may go in a slightly different direction:" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/204068 (owner: BBlack)
[07:34:21] no worries
[07:35:06] <_joe_> and I had to write some nodejs code at $WORK~1
[07:35:52] <_joe_> because well, some libraries were so ugly that even I knew how to make them better :P
[07:36:10] hahaha
[07:36:52] * mobrovac has to think of a project where he can write some c++ just for the pleasure of it
[07:38:33] where were you during the HHVM migration!
[07:38:53] <_joe_> oh well. I have a wishlist for HHVM :P
[07:39:29] <_joe_> mobrovac: btw, something tells me you'll have to work on that code of mine sooner than later. It's a node library for etcd. I really hoped someone did something better but it doesn't look like it. There is one in coffeescript which is obviously better, though
[07:40:07] :)
[07:40:52] _joe_: https://www.npmjs.com/package/node-etcd ?
[07:41:08] <_joe_> that one is in coffeescript and it's better
[07:41:45] <_joe_> mobrovac: but devs refused to use coffeescript
[07:42:16] ah right
[07:42:28] but it compiles to a nodejs module, so ...
[07:42:32] <_joe_> yes
[07:42:43] <_joe_> I didn't say it made sense :)
[07:42:49] <_joe_> so yes, I'd use that
[07:43:26] btw are we set on etcd or are options still being explored?
[07:45:32] both ;)
[07:45:42] <_joe_> ahah
[07:45:54] <_joe_> nope, not true. I like a _lot_ zookeeper
[07:46:50] <_joe_> I just find it unconfortable its api, and the fact you can't query it via curl
[07:47:25] <_joe_> also, I'm a bit scared by it being java, of course :)
[07:47:25] zk still suffers from SPOF IIRC
[07:47:57] <_joe_> mobrovac: I don't think so, but I may remember incorrectly. I plan to start testing next week
[07:48:16] PROBLEM - RAID on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:48:36] "unconfortable its api, and the fact you can't query it via curl" -> seems like reason enough not to use it
[07:49:16] PROBLEM - SSH on ms-be1016 is CRITICAL - Socket timeout after 10 seconds
[07:49:27] PROBLEM - configured eth on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:49:28] PROBLEM - swift-account-replicator on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:49:36] PROBLEM - swift-object-updater on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:49:47] PROBLEM - very high load average likely xfs on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:49:57] PROBLEM - swift-account-auditor on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:06] PROBLEM - swift-account-reaper on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:07] PROBLEM - swift-container-auditor on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:18] PROBLEM - swift-container-replicator on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:27] PROBLEM - swift-container-server on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:27] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:36] PROBLEM - swift-account-server on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:36] PROBLEM - swift-container-updater on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:46] PROBLEM - swift-object-auditor on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:46] PROBLEM - dhclient process on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:47] PROBLEM - swift-object-server on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:57] PROBLEM - Disk space on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:51:06] PROBLEM - salt-minion processes on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:51:37] PROBLEM - DPKG on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:51:37] PROBLEM - swift-object-replicator on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:58:31] <_joe_> !log powercycling ms-be1016, console shows BUG: soft lockup - CPU#XX stuck for YYs! being emitted continuously
[08:02:57] RECOVERY - DPKG on ms-be1016 is OK: All packages OK
[08:02:57] RECOVERY - swift-object-replicator on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[08:02:57] RECOVERY - swift-account-reaper on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[08:03:07] RECOVERY - swift-container-auditor on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:03:17] RECOVERY - swift-container-replicator on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[08:03:17] RECOVERY - swift-container-server on ms-be1016 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[08:03:18] RECOVERY - puppet last run on ms-be1016 is OK Puppet is currently enabled, last run 24 minutes ago with 0 failures
[08:03:27] RECOVERY - swift-account-server on ms-be1016 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[08:03:36] RECOVERY - swift-container-updater on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[08:03:37] RECOVERY - swift-object-auditor on ms-be1016 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[08:03:46] RECOVERY - dhclient process on ms-be1016 is OK: PROCS OK: 0 processes with command name dhclient
[08:03:47] RECOVERY - SSH on ms-be1016 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[08:03:47] RECOVERY - swift-object-server on ms-be1016 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[08:03:48] RECOVERY - Disk space on ms-be1016 is OK: DISK OK
[08:03:57] RECOVERY - configured eth on ms-be1016 is OK - interfaces up
[08:03:57] RECOVERY - swift-account-replicator on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[08:03:57] RECOVERY - salt-minion processes on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:04:08] RECOVERY - swift-object-updater on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[08:04:26] RECOVERY - very high load average likely xfs on ms-be1016 is OK - load average: 8.70, 2.86, 1.01
[08:04:27] RECOVERY - RAID on ms-be1016 is OK Active: 6, Working: 6, Failed: 0, Spare: 0
[08:04:27] RECOVERY - swift-account-auditor on ms-be1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[08:05:07] seems the log bot is dead again
[08:05:15] !log testing !log
[08:05:36] Didn't log the message, master.
[08:05:46] stupid bot
[08:05:57] restarting it
[08:07:05] !log testing !log
[08:07:12] logmsgbot: ping
[08:07:27] LoginError: (, {u'result': u'WrongPass'})
[08:07:32] so that is wikitech being dead again
[08:07:34] or ldap
[08:08:35] if I am right, someone need to restart keystone or ldap on virt1000
[08:09:09] Sigh
[08:09:14] I'm in a bus
[08:09:22] Someone else with root needs to
[08:11:00] I'll take a look
[08:12:02] !log bounce keystone on virt1000
[08:12:07] hashar: https://integration.wikimedia.org/ci/job/mwext-testextension-zend/114/console is dead too. known?
[08:12:16] sad_trombone.wav
[08:12:35] Logged the message, Master
[08:14:31] godog: our error!
[08:14:38] err
[08:14:39] our hero
[08:15:39] haha it is a fine line between the two
[08:27:16] operations: Java security updates (CPU 2014) - https://phabricator.wikimedia.org/T96125#1208853 (MoritzMuehlenhoff)
[08:27:38] godog: hey! I have a graphite / statsd question (from pov of a developer) got a moment?
[08:34:37] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds
[08:37:21] YuviPanda: sure, in 5
[08:37:55] godog: cool
[08:43:33] YuviPanda: hey, shoot
[08:43:44] godog: so if you look at http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1429086474.982&target=tools.tools-services-02.WebServiceMonitor.manifestscollected
[08:43:54] it says something about 1900 a minute of sorts, right?
[08:43:58] but what’s actually happening
[08:44:05] is that it’s reporting about 300 something every 10 seconds
[08:44:13] and I assume that it would get averaged every flush period
[08:44:15] instead of ‘added'
[08:44:31] which is what seems to be happening instead (added every 60s?)
[08:44:36] I’m wondering what I’m doing wrong
[08:45:27] https://github.com/wikimedia/operations-software-tools-manifest/blob/master/tools/manifest/collector.py#L60
[08:45:31] I wonder if the ‘incr’ is the problem
[08:45:36] and I should ‘set’ it somehow instead
[08:45:43] this is using your python-statsd package for client
[08:46:07] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 6.222 second response time
[08:46:11] YuviPanda: looking
[08:48:42] godog: cool
[08:48:53] YuviPanda: we're flushing at 60s ATM btw
[08:49:03] right
[08:49:04] YuviPanda: how often does the collector run?
[08:49:11] godog: approximately every 10s
[08:49:17] this isn’t a diamond collector tho
[08:50:58] YuviPanda: but yeah in this case it'll accumulate the counter until the flush and then reset it, so you might get duplicates in there if it is collecting every 10s
[08:51:06] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds
[08:51:09] booo :(
[08:51:26] godog: but isn’t it supposed to flush the value as being the average of all the values it gets?
[08:52:24] YuviPanda: counters no, they are only increasing
[08:52:40] right, so that’s part of the problem - I’m not sure which bit to use :D
[08:53:00] godog: so if I want it to ‘flush average of all values received since last flush’ what should I use?
[08:54:28] YuviPanda: why the average though? you can use a set and get an unique count
[08:55:14] godog: because the deamon works like: 1. do things, 2. sleep 10s, 3. go back to 1
[08:55:37] so a set would all be unique only inside (1) and there can be upto 6 of them every minute
[08:55:41] so average is my best bet
[08:56:41] YuviPanda: the count will be unique inside a flush interval btw
[08:57:04] true, but I don’t want to be sending potentially count packets when I can send only 1 packet
[08:57:27] and I do think average is what actually makes more sense than set here
[08:57:48] if first 3 report 300 and the last 2 report 200, I want to see that drop I think
[08:58:35] operations, MediaWiki-extensions-Graph, Services, service-template-node, service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1208877 (mobrovac) @akosiaris has started work on this, ETA: end of April
[09:00:42] operations, Mobile-Apps, Services: Deployment of Mobile App's service on the SCA cluster - https://phabricator.wikimedia.org/T92627#1208880 (mobrovac) Another status update: T95533 has been resolved, allowing us to move forward on this front. ETA for deployment: end of April.
[09:01:42] YuviPanda: I think I'm missing what you are interested in knowning from the metric
[09:01:57] godog: it’s basically ‘number of valid service manifests in all of toollabs'
[09:02:19] godog: if you want something that’s more concrete
[09:02:21] godog: look at http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1429088531.664&target=tools.tools-services-02.WebServiceMonitor.startsuccess
[09:02:26] that’s number of webservices restarted
[09:02:37] hmm actually
[09:02:41] *that* is averaging itself
[09:02:59] * YuviPanda and giving it floating point values?!
[09:05:24] operations, Mobile-Apps, Services: Deployment of Mobile App's service on the SCA cluster - https://phabricator.wikimedia.org/T92627#1208896 (mobrovac)
[09:06:38] godog: alright, I gotta sleep. I’ll think about it some more and read up docs and write an email if needed :)
[09:06:39] thanks
[09:06:52] godog: thanks for statsite - it’s made graphite in general a lot more usable \o/
[09:07:49] YuviPanda: np, I think we'll need to make some adjustments too, also consider gauges for what you're trying to do
[09:08:06] YuviPanda: it won't average though, just update
[09:08:08] godog: yeah, but wouldn’t that only count the last entry before flush?
[09:08:09] yeah
[09:08:21] so if I have a drop in the first 5 runs in a minute it’ll just no twork
[09:08:23] *not
[09:08:24] I love how null are really null
[09:08:38] operations, Continuous-Integration, Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1208904 (hashar) Should we start drawing a network diagram representing the different lan / vlan we have and the traffic flows between...
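[Editor's note] The behaviour YuviPanda and godog are circling above is standard statsd aggregation: within one flush interval a counter sums every incr(), a gauge keeps only the last value set, and neither emits an average. A small simulation of one 60s window with six ~10s collector runs; the per-run values are invented to match the ~300-per-run / ~1900-per-minute figures from the discussion, and the flush logic is a sketch of the semantics, not statsite's actual code:

```python
# One 60s flush window: six collector runs at ~10s intervals, each
# reporting its manifest count. Illustrative values, not real data.
runs = [300, 320, 310, 320, 330, 320]

counter_total = sum(runs)        # incr(): values accumulate until flush
gauge_value = runs[-1]           # gauge: only the last set() survives
mean = sum(runs) / len(runs)     # the per-run average YuviPanda wanted

assert counter_total == 1900     # the ~6x inflation seen on the graph
assert gauge_value == 320        # a dip early in the window is invisible
assert round(mean, 2) == 316.67
```

A timer metric is a third option here: statsd-style timers emit per-window statistics (including a mean), which gets the desired average at the cost of extra emitted series.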
[09:10:40] hashar: yup
[09:10:42] best feature
[09:22:17] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 1.251 second response time
[09:26:32] operations, Continuous-Integration, Continuous-Integration-Isolation, Patch-For-Review, Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1208920 (hashar) From upstream at http://lists.openstack.org/pipermail/openstack-infra/2015-April/0026...
[09:32:06] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds
[09:36:56] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[09:41:26] <_joe_> we don't keep a list of all our datacenters anywhere in our code?
[09:41:43] <_joe_> meh
[09:42:10] what do you mean?
[09:42:28] <_joe_> in puppet, we don't have any list that includes all our datacenters
[09:42:53] should be easy to add to realm.pp
[09:42:53] <_joe_> or even dividing between "caching" ones and "main" ones
[09:43:05] well that distinction is only meaningful per context
[09:43:14] <_joe_> yes, it's just strange it's not there :)
[09:43:17] what does that mean even :)
[09:44:07] <_joe_> well, a distinction is that some datacenters are serving mediawiki/services directly, and the others don't
[09:44:50] <_joe_> but I don't have a compelling use of it, right now, so nevermind
[09:45:09] <_joe_> I do have an use for the list of our datacenters though, adding it.
[09:46:47] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[09:48:05] even the list is a bit dependent on context :)
[09:48:47] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds
[09:52:22] what are you trying to do?
[09:52:54] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1208952 (10MoritzMuehlenhoff) 3NEW [09:53:21] <_joe_> paravoid: I'm trying to elaborate on https://gerrit.wikimedia.org/r/#/c/204068/ [09:53:28] <_joe_> which has a couple of issues [09:54:27] <_joe_> and in writing one function, I would've loved not to hardcode the list of all our datacenters inside the function itself [09:55:32] <_joe_> but it was completely secondary to my problem, and I got derailed as usual :) [09:57:05] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, 5Patch-For-Review: Provide Debian package python-pymysql for jessie-wikimedia - https://phabricator.wikimedia.org/T96131#1208959 (10hashar) 3NEW [09:59:34] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1208966 (10Joe) I think it would be nice/necessary to be able to have ACLs on different sections of the store, and be able to select what each user/group of users will be able to read and/or write. I know pwstore allows this, just... 
[10:00:00] (03PS1) 10Filippo Giunchedi: eventlogging: adjust counters thresholds [puppet] - 10https://gerrit.wikimedia.org/r/204237 (https://phabricator.wikimedia.org/T90111) [10:04:47] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.104 second response time [10:05:47] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [10:07:47] PROBLEM - puppet last run on mw2013 is CRITICAL puppet fail [10:13:48] (03PS5) 10Hashar: Initial Debian packaging [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) [10:14:47] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60626 bytes in 1.898 second response time [10:15:09] (03CR) 10Hashar: "Added python-pymysql and python-daemon to Build-Deps since dh_python2 does not find them." [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) (owner: 10Hashar) [10:16:05] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation: Provide Debian package python-pymysql for jessie-wikimedia - https://phabricator.wikimedia.org/T96131#1208971 (10hashar) [10:19:56] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [10:21:36] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60647 bytes in 0.784 second response time [10:25:47] RECOVERY - puppet last run on mw2013 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:25:50] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1208974 (10hashar) [10:26:22] !log bounce jobchron on mw1001 [10:26:34] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, 5Patch-For-Review, 
7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1028174 (10hashar) [10:26:36] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [10:26:44] morebots: I am disappoint [10:26:44] I am a logbot running on tools-exec-02. [10:26:44] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [10:26:44] To log a message, type !log . [10:26:59] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1028174 (10hashar) I have updated the dependency table in the task details to take into account Jessie ins... [10:27:20] godog: let me look at the logs [10:27:45] ssh tools-login.eqiad.wmflabs [10:27:47] become morebots [10:28:04] LoginError: (, {u'result': u'WrongPass'}) [10:28:14] godog: I guess keystone / ldap whatever needs yet another restart [10:28:32] we should probably get a monitoring probe for that service [10:29:37] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60634 bytes in 1.297 second response time [10:30:04] hashar: indeed that would be nice [10:30:17] !log restart keystone on virt1000 (#2) [10:30:39] Logged the message, Master [10:30:41] ! [10:30:43] magic [10:31:02] !log bounce jobchron on mw1001 [10:31:07] Logged the message, Master [10:31:19] meanwhile I feel really more at ease with debian packaging [10:36:24] 6operations, 10MediaWiki-JobRunner: jobchron logs are not rotated - https://phabricator.wikimedia.org/T96132#1208986 (10fgiunchedi) 3NEW [10:37:58] (03CR) 10Alexandros Kosiaris: [C: 032] "Stupid mistake. Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/203228 (owner: 10Hashar) [10:44:17] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [10:48:33] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor dependency comment, other LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) (owner: 10Hashar) [10:50:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [10:51:14] <_joe_> why is the puppet master in labs going down repeatedly? has anyone looked? [10:56:39] <_joe_> looking [10:57:16] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 1.720 second response time [11:02:27] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [11:16:07] <_joe_> !log restarted apache2 on virt1000, passenger gone to hell [11:16:57] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.085 second response time [11:22:16] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [11:23:39] (03CR) 10Alexandros Kosiaris: [C: 032] ssh::userkey: Allow a prefix to be specified for a key [puppet] - 10https://gerrit.wikimedia.org/r/202731 (owner: 10Alexandros Kosiaris) [11:38:26] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 1.355 second response time [11:41:53] (03CR) 10Alexandros Kosiaris: [C: 032] Specify ssh userkey policy for ganeti clusters [puppet] - 10https://gerrit.wikimedia.org/r/202730 (owner: 10Alexandros Kosiaris) [11:43:37] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [11:45:03] (03PS7) 10Alexandros Kosiaris: Specify ssh userkey policy for ganeti clusters [puppet] - 10https://gerrit.wikimedia.org/r/202730 [11:45:31] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Specify ssh userkey 
policy for ganeti clusters [puppet] - 10https://gerrit.wikimedia.org/r/202730 (owner: 10Alexandros Kosiaris) [11:47:00] (03PS7) 10Alexandros Kosiaris: ssh::userkey: Allow a prefix to be specified for a key [puppet] - 10https://gerrit.wikimedia.org/r/202731 [11:48:31] (03PS1) 10Filippo Giunchedi: graphite: switch remaining machines to statsdlb [puppet] - 10https://gerrit.wikimedia.org/r/204247 [11:49:44] (03PS2) 10Filippo Giunchedi: graphite: switch remaining machines to statsdlb [puppet] - 10https://gerrit.wikimedia.org/r/204247 [11:49:52] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] graphite: switch remaining machines to statsdlb [puppet] - 10https://gerrit.wikimedia.org/r/204247 (owner: 10Filippo Giunchedi) [11:51:37] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.042 second response time [12:01:21] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [12:02:52] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [12:06:02] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [12:06:11] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [12:10:19] (03PS7) 10Alexandros Kosiaris: Provision the ssh key added in 3c8c524 [puppet] - 10https://gerrit.wikimedia.org/r/201462 [12:10:49] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Provision the ssh key added in 3c8c524 [puppet] - 10https://gerrit.wikimedia.org/r/201462 (owner: 10Alexandros Kosiaris) [12:10:52] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [12:20:43] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.149 second response time [12:21:08] (03CR) 10Hashar: "Thanks Alexandros. 
I think we can express the resources dependency explicitly:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) (owner: 10Hashar) [12:26:01] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [12:26:01] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [12:31:37] 7Puppet, 6operations: Prepend timestamp in /var/log/puppet.log - https://phabricator.wikimedia.org/T75989#1209049 (10hashar) 5Open>3declined Shelling out to date for each line is not clever. Maybe we could extend the puppet console log formatter with our own class but I have no idea whether it is doable no... [12:35:32] PROBLEM - puppetmaster https on virt1000 is CRITICAL - Socket timeout after 10 seconds [12:37:02] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.077 second response time [12:37:02] akosiaris: do you know anything about what’s happening, other than a million alerts firing? 
[12:37:30] abogott: keystone is not answering, obviously neither is nova [12:37:37] but that is not standard, rather intermittent [12:38:01] at first due to a keystone debug log I thought it was LDAP not answering [12:38:07] but it seems this is working fine [12:38:21] I have gone through restarting keystone half a dozen times [12:38:29] ok [12:38:37] also nova-* services [12:38:56] a few minutes ago I killed a couple of mysqldumps on nova and keystone databases [12:39:22] 7Puppet, 6Labs: Puppet logs should be timestamped in a human-readable way - https://phabricator.wikimedia.org/T88108#1209056 (10scfc) [12:39:25] 7Puppet, 6operations: Prepend timestamp in /var/log/puppet.log - https://phabricator.wikimedia.org/T75989#1209057 (10scfc) [12:39:48] but that was like a shotgun approach, I did not really believe mysql was the problem [12:39:58] akosiaris: is it intermittent? [12:40:03] yes [12:40:08] ok [12:40:38] for example now nova list is returning just fine [12:40:53] a few minutes ago it would stall and never return a single result [12:41:15] yeah, naturally it’s working fine now that I’m trying to see the issue :) [12:41:31] how was memory on virt1000 when things were at their worst? [12:42:02] oom did not come out if that is what you are asking [12:42:24] ok [12:42:46] btw, we noticed this due to puppetmaster failing [12:43:09] while it seems it was a more deeply hidden problem [12:43:33] did you restart opendj by chance? [12:43:39] the machine was in heavy iowait on one of the CPUs [12:43:45] yes, on neptunium [12:43:52] ah, ok, that explains the dns issue... [12:44:11] and of course it did not help [12:44:32] (there’s a dumb issue with pdns where it can’t recover if ldap restarts.) [12:44:51] 7Puppet, 6operations: Prepend timestamp in /var/log/puppet.log - https://phabricator.wikimedia.org/T75989#1209071 (10scfc) I had the same wish because I had assumed that "manual" Puppet runs (`sudo puppet agent -tv`) do not log to `/var/log/syslog`. 
But they do, so all Puppet runs are logged with individually... [12:48:33] andrewbogott: so, how long are the mysqldumps in /usr/local/sbin/db-bak.sh supposed to last ? [12:49:27] There’s a huge amount of data in that db that we can just drop. [12:49:49] Since there’s latent wikitech data there that’s no longer used. [12:49:54] 9:45 mysqldump --single-transaction -u root keystone -c [12:50:07] almost 10 hours... I 'd say we should [12:50:10] springle: is there any reason I can’t just drop the wikitech data from the virt1000 db? [12:50:36] akosiaris: so, that almost fits. If mysql was hammered such that keystone couldn’t query it... [12:50:49] Although that doesn’t explain the puppet issue, since as far as I know puppet doesn’t use mysql [12:51:13] <_joe_> puppet was blocked communicating with keystone [12:51:13] 7Puppet, 6operations, 5Patch-For-Review: Make Puppet repository pass lenient and strict lint checks - https://phabricator.wikimedia.org/T87132#1209083 (10hashar) a:5hashar>3None [12:51:19] <_joe_> to set some nova values [12:51:55] puppet is calling nova directly, right ? [12:51:55] Are we talking about the local puppet run on virt1000, or the puppetmaster? [12:52:02] hmmm [12:52:03] <_joe_> puppetmaster [12:52:09] keystone | Query | 40302 | Sending data | SELECT /*!40001 SQL_NO_CACHE */ * FROM `token` [12:52:17] virt1000 is the puppetmaster for labs too, right? [12:52:21] yes [12:52:27] ok that would explain the heavy IOwait [12:52:49] I’m trying to think why the puppetmaster would hit keystone. It should be talking straight to ldap. [12:53:08] On the other hand, puppet client runs have a fact which hits the metadata service and /that/ probably hits keystone. Is it possible that that’s what you were seeing? 
[12:53:27] /usr/bin/python /usr/bin/nova --os-region-name eqiad --os-auth-url http://virt1000.wikimedia.org:35357/v2.0 --os-password --os-username novaadmin --os-tenant-name editor-engagement meta mwui set puppetstatus=failed [12:53:41] that was seen all over for all VMs more or less [12:53:46] ah! [12:53:57] Yes, puppetmaster hitting metadata [12:53:57] puppet calls it for some reason [12:54:02] So, ok, this all adds up. [12:54:21] puppetstatus=failed is set by labstatus.rb isn't it ? [12:54:31] so me killing mysqldumps actually fixed the problem ? [12:54:42] akosiaris: maybe :) [12:54:44] <_joe_> lol [12:54:52] oh, come on... [12:55:06] then, that mysql is misbehaving somehow... :-( [12:55:26] or being abused! [12:55:36] yeah, point taken [12:55:46] as always, I fix applications, not databases [12:55:52] :) [12:56:26] Probably https://phabricator.wikimedia.org/T92693 is the proper fix for this [12:56:31] that and thinning out the obsolete data in that db [12:56:47] Which I’ve talked to sean about a few times but probably he’s waiting for me to do it and I’m waiting for him to do it. [12:56:52] So I’ll just do it right now :) [12:57:25] heh, I guess i know it’s backed up at least [12:57:33] I would say purging all expired tokens from that table regularly would be a sane thing to do [12:57:50] akosiaris: keystone tokens you mean? 
[12:58:09] !log dropping labswiki and labswiki_eqiad from mysql on virt1000 [12:58:12] I am afraid to do a select count(*) but show table status says 4763172 rows with a data length of 71531757568 bytes [12:58:16] andrewbogott: yes [12:58:18] Logged the message, Master [12:58:31] * andrewbogott is more-or-less terrified of the ‘drop database’ command [12:58:31] !log restarted opendj on neptunium [12:58:36] Logged the message, Master [12:58:42] !log restarted keystone, nova services on virt1000 [12:58:47] Logged the message, Master [12:59:00] this is actually not the correct timestamp but at least here it is [12:59:09] !log restarted pdns on virt1000 and labcontrol2001 to recover from the opendj restart [12:59:15] Logged the message, Master [12:59:15] <_joe_> andrewbogott: don't do it, then [12:59:29] `expires` datetime DEFAULT NULL, [12:59:44] there... the table even has a nice field for the DELETE from where query ;-) [12:59:52] _joe_: I suspect that backing up those tables was part of what made the mysql backup job run forever and gobble resources [13:00:21] andrewbogott: that is not what I witnessed [13:00:33] I saw the token table being backed up for like forever [13:00:38] well, half a day [13:01:37] and /usr/local/sbin/db-bak.sh is not backing up everything, it is specifically backing up labswiki, keystone, nova and then glance and mysql [13:02:01] funny thing is, it is done in a nice -n 19 wrapper [13:02:08] akosiaris: I just typed ‘select * from token’ and now it’s hanging. That’s because that table is… super big? [13:02:25] andrewbogott: yup [13:02:30] I suggest ctrl-c [13:02:35] so keystone doesn’t clean up after itself, ever [13:02:39] and reissuing with something like limit 10; [13:02:53] if that is true (which it might very well be), it is unfortunate [13:03:31] http://www.sebastien-han.fr/blog/2012/12/12/cleanup-keystone-tokens/ [13:03:32] ahahaha [13:03:37] deja vu ? 
[13:04:03] don't forget whoever mentioned earlier that keystone fix -> restart pdns as well [13:04:04] ha! Yes, that certainly fits [13:04:19] The setup runs for 2 months now and already 1970938 and I don’t run a public cloud. I can’t imagine the nightmare with a public cloud… [13:04:58] andrewbogott: sorry about waking you up btw [13:05:13] what puzzles me is why today and not yesterday or last week or something though [13:05:18] akosiaris: np, I woke up around 5 mins before you called [13:05:40] akosiaris: I think this has been happening, a little bit, periodically. And we just hit a limit where it was finally too much and the problem became more severe. [13:06:32] akosiaris: so, for the moment I think I support a cron that deletes expired tokens. are you deep enough in that you can feed me a mysql command that will do that? [13:06:55] I think so [13:07:03] oh, I guess that page includes the exact command for that, huh? [13:07:07] so, that guy proposes mysql -u${mysql_user} -p${mysql_password} -h${mysql_host} -e 'USE keystone ; DELETE FROM token WHERE NOT DATE_SUB(CURDATE(),INTERVAL 2 DAY) <= expires;' [13:07:23] but there are some improvements we can do [13:08:01] for example use keystone is not needed in the query [13:08:23] mysql keystone -e 'DELETE etc' is slightly better [13:08:51] also that 2 days thing should be more configurable [13:09:01] but otherwise the basic premise seems correct to me [13:10:23] btw, I am starting to think keystone actually cleans up after itself [13:10:42] 2015-03-06 05:16:06 seems to be the earliest of tokens we got [13:11:05] no scratch that [13:11:15] it was me misreading some fields [13:11:21] 2012-08-18 01:34:54 [13:11:35] is the earliest we got, so no, it does not cleanup [13:12:59] btw andrewbogott driver = keystone.token.backends.memcache.Token [13:13:08] we could keep the tokens in memcached as well [13:13:35] not that I like the idea of adding one more part in that machinery, but someone obviously has done it [13:14:11] 
yeah, I was thinking about memcached, but… since wikitech is on a different server now we don’t currently depend on memcached on virt1000. Nice to keep it that way [13:21:02] (03PS1) 10Andrew Bogott: Clean up expired keystone tokens. [puppet] - 10https://gerrit.wikimedia.org/r/204256 [13:21:08] akosiaris: ^ [13:21:32] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [13:21:42] (03PS2) 10Andrew Bogott: Clean up expired keystone tokens. [puppet] - 10https://gerrit.wikimedia.org/r/204256 [13:22:34] (03CR) 10jenkins-bot: [V: 04-1] Clean up expired keystone tokens. [puppet] - 10https://gerrit.wikimedia.org/r/204256 (owner: 10Andrew Bogott) [13:23:02] (03CR) 10Thcipriani: Add submodules to master checkoutMediaWiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) (owner: 10Thcipriani) [13:23:56] akosiaris: is there any reason why this isn’t a keystone bug? [13:24:39] (03PS1) 10ArielGlenn: html dumps will be served from host where they are produced, via proxy [puppet] - 10https://gerrit.wikimedia.org/r/204257 [13:25:08] (03PS3) 10Andrew Bogott: Clean up expired keystone tokens. [puppet] - 10https://gerrit.wikimedia.org/r/204256 [13:25:30] (03CR) 10jenkins-bot: [V: 04-1] html dumps will be served from host where they are produced, via proxy [puppet] - 10https://gerrit.wikimedia.org/r/204257 (owner: 10ArielGlenn) [13:25:40] 6operations, 5Interdatacenter-IPsec: Strongswan: security association reauthentication failure - https://phabricator.wikimedia.org/T96111#1209143 (10Gage) I spoke with Tobias from Strongswan on IRC about this: <+ecdsa> jgage: The log you posted shows a rekey collision for the IPv6 SA, but that seems to be han... 
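The predicate in the cleanup query proposed at 13:07 can be sanity-checked in isolation. The sketch below assumes the blog post's 2-day grace period and mirrors SQL's three-valued NULL semantics, under which a row with a NULL `expires` never matches the WHERE clause and is therefore kept:

```python
from datetime import date, timedelta

def would_delete(expires, today, grace_days=2):
    """Model of the row filter in:
    DELETE FROM token WHERE NOT DATE_SUB(CURDATE(), INTERVAL 2 DAY) <= expires

    In SQL, NULL <= anything is NULL, and NOT NULL is still NULL,
    so a NULL `expires` (None here) never matches and the row is kept.
    """
    if expires is None:
        return False
    cutoff = today - timedelta(days=grace_days)
    return not (cutoff <= expires)  # i.e. expires is older than the cutoff

today = date(2015, 4, 15)
print(would_delete(date(2012, 8, 18), today))  # True: the 2012-era token is purged
print(would_delete(date(2015, 4, 14), today))  # False: expired yesterday, inside the grace window
print(would_delete(None, today))               # False: NULL expires is left alone
```

The double negation in the original query is just "expires < cutoff" written awkwardly; the grace window means tokens stay around for two days past expiry, which matches akosiaris's "it deletes expired tokens 2 days after they are expired" reading.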
[13:25:59] (03CR) 10Alexandros Kosiaris: contint: make Jessie slaves package builders (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) (owner: 10Hashar) [13:26:37] andrewbogott: a bug ? well a missing feature I would say (other would too I suppose) [13:26:48] ok, logging [13:27:00] (03PS3) 10Thcipriani: Add submodules to master checkoutMediaWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) [13:27:42] andrewbogott: are you sure about those 2 days ? [13:27:51] perhaps it should be more ? [13:27:56] akosiaris: no, that’s just a c/p [13:28:04] I guess we can make it a lot bigger and still get the benefit. [13:28:11] yup [13:28:46] hmm so it deletes expired tokens 2 days after they are expired [13:29:08] it actually feels right... [13:29:17] yeah, 2 days is quite a bit. [13:29:26] if they're expired they're useless for ongoing access for a long-running whatever somehow? [13:29:37] (03PS2) 10ArielGlenn: html dumps will be served from host where they are produced, via proxy [puppet] - 10https://gerrit.wikimedia.org/r/204257 [13:29:52] or is it possible for $something to use a token to open some connection and keep it alive past expiry so long as the token isn't deleted? [13:30:11] I don’t know. I certainly can’t think of anything like that. [13:30:22] This is all REST stuff so nothing should persist. [13:30:31] ok [13:30:34] I think so too [13:30:39] then again, famous last words [13:31:11] (03CR) 10BBlack: [C: 031] Clean up expired keystone tokens. [puppet] - 10https://gerrit.wikimedia.org/r/204256 (owner: 10Andrew Bogott) [13:32:22] (03CR) 10Alexandros Kosiaris: [C: 031] "Premise seems just fine." [puppet] - 10https://gerrit.wikimedia.org/r/204256 (owner: 10Andrew Bogott) [13:34:12] (03CR) 10Andrew Bogott: [C: 032] Clean up expired keystone tokens. 
[puppet] - 10https://gerrit.wikimedia.org/r/204256 (owner: 10Andrew Bogott) [13:34:32] akosiaris: “Provision the ssh key added” shall I merge? [13:35:06] akosiaris: for the package builder role on ci, should I just require => Class['contint::packages::labs'] so ? [13:35:52] should realize it before role::package::builder , but then the module uses ensure_package and I am afraid 'cowbuilder' will end up being realized earlier [13:37:57] 7Blocked-on-Operations, 6operations, 5Patch-For-Review: Install nodejs, nginx and other dependencies on francium - https://phabricator.wikimedia.org/T94457#1209183 (10ArielGlenn) https://gerrit.wikimedia.org/r/#/c/204257/ nginx setup for the html dumps producing host, which will temporarily be serving its du... [13:38:38] andrewbogott: yes please [13:39:17] akosiaris: you disabled puppet on virt1000 earlier? [13:39:24] hashar: yeah that require seems fine. I doubt cowbuilder will be realized beforehand though [13:39:35] andrewbogott: ah yes, I did, should have enabled it [13:39:41] ok, I’ll re-enable [13:39:52] akosiaris: giving it a try [13:40:04] (03CR) 10Hashar: contint: make Jessie slaves package builders (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) (owner: 10Hashar) [13:40:13] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [13:40:13] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [13:42:11] hashar: btw, I think force => true would force that symlink anyway [13:42:44] (03PS6) 10Hashar: contint: make Jessie slaves package builders [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) [13:42:58] no wait... it would be deleting an entire directory if the race condition you describe happens... hmmm not sure what will happen [13:43:11] akosiaris: yeah I thought about that. 
But if the dir is created, the cow images are created which takes a while then they are deleted and recreated. that is annoying [13:43:42] (03PS2) 10Hashar: package_builder: fix dependency order for hooks [puppet] - 10https://gerrit.wikimedia.org/r/203228 [13:43:48] (03PS1) 10Andrew Bogott: Move the keystone token cron into openstack::database-server [puppet] - 10https://gerrit.wikimedia.org/r/204259 [13:44:07] (03CR) 10Hashar: "Cherry picked to production to get rid of the parent change that needs further work." [puppet] - 10https://gerrit.wikimedia.org/r/203228 (owner: 10Hashar) [13:44:28] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] package_builder: fix dependency order for hooks [puppet] - 10https://gerrit.wikimedia.org/r/203228 (owner: 10Hashar) [13:45:41] (03CR) 10Andrew Bogott: [C: 032] Move the keystone token cron into openstack::database-server [puppet] - 10https://gerrit.wikimedia.org/r/204259 (owner: 10Andrew Bogott) [13:47:49] (03CR) 10Chad: Add submodules to master checkoutMediaWiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) (owner: 10Thcipriani) [13:48:59] (03PS12) 10BBlack: r::c::config::active_nodes -> hiera cache::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 [13:49:49] (03CR) 10jenkins-bot: [V: 04-1] r::c::config::active_nodes -> hiera cache::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 (owner: 10BBlack) [13:50:33] <_joe_> bblack: I'll take a look shortly [13:51:12] _joe_: I need to iterate at least once, and then push it through puppet-compiler yet, to find stupid things :) [13:52:29] (03PS3) 10Alexandros Kosiaris: ganeti: Reference correctly the ganeti cluster nodes [puppet] - 10https://gerrit.wikimedia.org/r/203035 [13:52:32] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60609 bytes in 0.180 second response time [13:52:51] <_joe_> bblack: I'm dealing with yet-another-small-change in hhvm 3.6 vs 3.3 [13:53:05] 
akosiaris: thanks for sorting out the keystone issue. I guess we’ll check back tomorrow to make sure the cron actually worked. [13:53:19] hm, or I could change it to ’14’ so that it fires in five minutes [13:53:20] * andrewbogott does that [13:53:33] andrewbogott: run it manually [13:53:43] the very first time at least [13:54:04] !log purging expired keystone tokens on virt1000 [13:54:10] Logged the message, Master [13:54:28] (03CR) 10Alexandros Kosiaris: [C: 032] ganeti: Reference correctly the ganeti cluster nodes [puppet] - 10https://gerrit.wikimedia.org/r/203035 (owner: 10Alexandros Kosiaris) [13:55:47] * andrewbogott runs a big expensive query on virt1000, most likely reproducing the same failure that got us here to begin with… [13:56:55] 6operations, 5Interdatacenter-IPsec: Strongswan: security association reauthentication failure - https://phabricator.wikimedia.org/T96111#1209206 (10Gage) I reduced some timeouts in order to recreate the problem; config changes suggested by ecdsa have not yet been made: ``` conn %default ikelifetime=6m... [14:00:04] chasemp: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150415T1400). Please do the needful. [14:00:11] nope^ [14:01:39] (03PS5) 10Filippo Giunchedi: graphite: introduce carbon-c-relay [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) [14:01:43] 6operations, 10ops-eqiad, 10ops-fundraising: barium has a failed HDD - https://phabricator.wikimedia.org/T93899#1209209 (10Cmjohnson) New disk is on-line nclosure Device ID: N/A Slot Number: 3 Drive's position: DiskGroup: 1, Span: 0, Arm: 0 Enclosure position: N/A Device Id: 3 WWN: 5000c500794334bb Sequence... [14:03:54] (03PS1) 10Alexandros Kosiaris: Followup fix for 7ba51bc [puppet] - 10https://gerrit.wikimedia.org/r/204263 [14:04:54] paravoid: thanks for fixing labvirt networking. Looks good now. 
[14:05:23] great [14:05:31] (03CR) 10Alexandros Kosiaris: [C: 032] Followup fix for 7ba51bc [puppet] - 10https://gerrit.wikimedia.org/r/204263 (owner: 10Alexandros Kosiaris) [14:08:53] (03CR) 10Hashar: [C: 04-1] "I have applied patchset 6 and ran it on integration-slave-jessie-1001.eqiad.wmflabs and cowbuilder ends up being installed first :(" [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) (owner: 10Hashar) [14:09:13] (03PS13) 10BBlack: r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 [14:09:20] akosiaris: packages are still realized first regardless of the require => Class[..] :-(((( [14:10:30] hashar: ok, gimme a sec to sort something out and I 'll look into it [14:10:41] (03PS1) 10Alexandros Kosiaris: Typo fix in ssh::userkey [puppet] - 10https://gerrit.wikimedia.org/r/204264 [14:10:42] PROBLEM - puppet last run on cp3041 is CRITICAL puppet fail [14:12:08] (03CR) 10Alexandros Kosiaris: [C: 032] Typo fix in ssh::userkey [puppet] - 10https://gerrit.wikimedia.org/r/204264 (owner: 10Alexandros Kosiaris) [14:13:19] (03PS1) 10Filippo Giunchedi: gdash: display udp errors in graphite dashboard [puppet] - 10https://gerrit.wikimedia.org/r/204265 [14:13:51] RECOVERY - puppet last run on ganeti1002 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:14:00] (03PS2) 10Filippo Giunchedi: gdash: display udp errors in graphite dashboard [puppet] - 10https://gerrit.wikimedia.org/r/204265 [14:14:19] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: display udp errors in graphite dashboard [puppet] - 10https://gerrit.wikimedia.org/r/204265 (owner: 10Filippo Giunchedi) [14:14:22] RECOVERY - puppet last run on ganeti2004 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:14:22] RECOVERY - puppet last run on ganeti2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:22] RECOVERY - puppet last 
run on ganeti2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:22] RECOVERY - puppet last run on ganeti2002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:04] (03CR) 10Hashar: "Found out package_builder init has:" [puppet] - 10https://gerrit.wikimedia.org/r/203073 (https://phabricator.wikimedia.org/T95545) (owner: 10Hashar) [14:15:22] RECOVERY - puppet last run on ganeti1003 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [14:16:02] RECOVERY - puppet last run on ganeti2006 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [14:16:52] RECOVERY - puppet last run on ganeti2005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:31] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [14:21:11] 6operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-JobRunner, 7Graphite: jobrunner metrics audit - https://phabricator.wikimedia.org/T95913#1209253 (10fgiunchedi) picking this up, related https://gerrit.wikimedia.org/r/204237 https://gerrit.wikimedia.org/r/203839 https://gerrit.wikimedia.org/r/203847... [14:21:18] 6operations, 10ops-eqiad, 10ops-fundraising: barium has a failed HDD - https://phabricator.wikimedia.org/T93899#1209254 (10Cmjohnson) 5Open>3Resolved package updates were successful..resolving this ticket [14:26:10] (03CR) 10Hashar: "So it is all good to me and I would +2 it but I prefer who ever handles the deployment / maintenance of jouncebot to trigger the merge. Ju" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/203985 (owner: 10BryanDavis) [14:28:52] RECOVERY - puppet last run on cp3041 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:29:24] 6operations, 5Interdatacenter-IPsec: Strongswan: security association reauthentication failure - https://phabricator.wikimedia.org/T96111#1209294 (10Gage) Ok, good news. 
Further discussion with ecdsa has revealed that this problem is fixed in 5.3.0, which is released but not yet packaged for Debian. Bug: http... [14:30:54] (03PS14) 10BBlack: r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 [14:32:17] (03CR) 10Thcipriani: Add submodules to master checkoutMediaWiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204080 (https://phabricator.wikimedia.org/T88442) (owner: 10Thcipriani) [14:32:22] RECOVERY - puppet last run on ganeti1004 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [14:32:34] 10Ops-Access-Requests, 6operations, 10Continuous-Integration: Add user wmde-fisch to LDAP group wmde - https://phabricator.wikimedia.org/T95546#1209309 (10hashar) The Jenkins account shows up with the 'wmde' group at https://integration.wikimedia.org/ci/user/wmde-fisch/ @WMDE-Fisch should thus be able to co... [14:33:42] RECOVERY - puppet last run on ganeti1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:37:42] (03PS1) 10Filippo Giunchedi: restbase: add ganglia cluster [puppet] - 10https://gerrit.wikimedia.org/r/204274 [14:43:33] ori: are VE preconnects to rest.wikimedia.org active already? 
[14:49:58] (03PS1) 10Filippo Giunchedi: statsite: default to localhost, override as needed [puppet] - 10https://gerrit.wikimedia.org/r/204275 [14:52:07] * anomie sees nothing for SWAT this morning [14:54:36] !log running deleteEmptyAccounts.php --fix on metawiki (CentralAuth) [14:54:41] Logged the message, Master [14:57:12] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [14:57:52] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 3 below the confidence bounds [15:00:04] manybubbles, anomie, ^d, thcipriani, marktraceur: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150415T1500). Please do the needful. [15:00:28] <^d> I'll take it [15:00:30] <^d> No patches! [15:00:39] (03PS15) 10BBlack: r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 [15:01:04] <^d> bblack: hiera refactors usually take 10+ patches but they're so worth it in the end :) [15:13:34] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Other than removing the default value for mission-critical lookups, I think this patch is now good to be merged." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/204068 (owner: 10BBlack) [15:15:30] (03PS1) 10Andrew Bogott: Install labvirt-star cert on labvirt nodes. 
[puppet] - 10https://gerrit.wikimedia.org/r/204279 [15:15:57] (03PS16) 10BBlack: r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 [15:16:35] 6operations, 5Interdatacenter-IPsec: Update 3.19 kernel to 3.19.4 - https://phabricator.wikimedia.org/T96146#1209407 (10MoritzMuehlenhoff) 3NEW [15:16:41] (03PS2) 10Filippo Giunchedi: statsite: default to localhost, override as needed [puppet] - 10https://gerrit.wikimedia.org/r/204275 [15:17:59] 6operations, 6Labs: One instance hammering on NFS should not make it unavailable to everyone else - https://phabricator.wikimedia.org/T95766#1209414 (10coren) NFS indeed does not allow us to know which enduser is responsible for any specific traffic, as an unavoidable consequence of the levels of abstraction t... [15:18:55] 6operations, 5Interdatacenter-IPsec: Update 3.19 kernel to 3.19.4 - https://phabricator.wikimedia.org/T96146#1209416 (10BBlack) In practice, getting this to the to-be-ipsec nodes will take quite some time for cache reboots once it's in the repo and package updated on the hosts... [15:19:54] 6operations, 5Interdatacenter-IPsec: Update 3.19 kernel to 3.19.4 - https://phabricator.wikimedia.org/T96146#1209417 (10BBlack) (I mention the above mainly as a side note about having ipsec rollout date depend on the fix or not) [15:20:43] (03CR) 10Andrew Bogott: [C: 032] Install labvirt-star cert on labvirt nodes. 
[puppet] - 10https://gerrit.wikimedia.org/r/204279 (owner: 10Andrew Bogott) [15:22:11] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [15:27:33] (03PS1) 10Andrew Bogott: Use the already-existing $certname var in libvirtd.conf [puppet] - 10https://gerrit.wikimedia.org/r/204282 [15:27:52] !log disabling puppet on caches JIC for https://gerrit.wikimedia.org/r/204068 merge [15:27:57] Logged the message, Master [15:28:34] (03PS17) 10BBlack: r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 [15:30:06] (03CR) 10Giuseppe Lavagetto: [C: 031] r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 (owner: 10BBlack) [15:30:36] (03CR) 10BBlack: [C: 032] r::c::config::active_nodes -> hiera cache::$cluster::nodes [puppet] - 10https://gerrit.wikimedia.org/r/204068 (owner: 10BBlack) [15:30:49] (03PS1) 10Alexandros Kosiaris: Typo fixes in role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/204283 [15:31:49] (03CR) 10Andrew Bogott: [C: 032] Use the already-existing $certname var in libvirtd.conf [puppet] - 10https://gerrit.wikimedia.org/r/204282 (owner: 10Andrew Bogott) [15:32:00] (03CR) 10Alexandros Kosiaris: [C: 032] Typo fixes in role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/204283 (owner: 10Alexandros Kosiaris) [15:35:08] !log re-enabling puppet on caches, canary nodes were no-op \o/ [15:35:15] Logged the message, Master [15:39:01] ottomata: want to try sending varnish stats straight to graphite? 
[15:39:02] PROBLEM - nova-compute process on labvirt1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [15:40:54] (03PS1) 10Alexandros Kosiaris: Typo fix [puppet] - 10https://gerrit.wikimedia.org/r/204285 [15:44:17] (03CR) 10Alexandros Kosiaris: [C: 032] Typo fix [puppet] - 10https://gerrit.wikimedia.org/r/204285 (owner: 10Alexandros Kosiaris) [15:45:07] godog: today is a bad day :( analytics cluster is really unhappy right now with too many jobs running, am trying to help the production ones through, then will figure out some better queues for users [15:45:18] but, you are welcome to just try it on your own [15:45:23] and i can help with any qs you might have [15:46:53] (03PS3) 10Filippo Giunchedi: statsite: default to localhost, override as needed [puppet] - 10https://gerrit.wikimedia.org/r/204275 [15:47:54] ottomata: ack, let me know if you free up today or we can pick it up tomorrow too! [15:56:11] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [15:58:08] (03PS4) 10Gage: mailman: SENDER_HEADERS use from only [puppet] - 10https://gerrit.wikimedia.org/r/154846 (https://bugzilla.wikimedia.org/46049) (owner: 10John F. Lewis) [16:00:14] (03CR) 10Gage: [C: 032] mailman: SENDER_HEADERS use from only [puppet] - 10https://gerrit.wikimedia.org/r/154846 (https://bugzilla.wikimedia.org/46049) (owner: 10John F. Lewis) [16:04:11] (03CR) 10Mobrovac: restbase: add ganglia cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204274 (owner: 10Filippo Giunchedi) [16:05:46] Coren: so, I’m trying to make a self-signed cert, and no matter what I try the Subject and Issuer are the same… but the cert I’m trying to replicate has issuer CN=Wikimedia CA [16:05:48] any idea? [16:06:32] Needs moar contekts. [16:07:40] I need a replacement for https://dpaste.de/FJnj [16:07:43] As a rule, the steps are (a) generate key, (b) create csr, (c) sign csr with same key.
[16:07:48] that uses labvirt* instead of virt* [16:08:11] That's... not a self-signed cert. :-) [16:08:47] Wikimedia CA is… us, isn’t it? [16:09:05] Or is ‘signed by us’ different from self-signed? [16:09:29] wikimedia CA ? [16:09:33] Yep. :-) "self-signed cert" means a certificate that signs /itself/. You want a cert signed by our CA. :-) [16:09:43] (03CR) 10Nuria: eventlogging: adjust counters thresholds (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204237 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [16:09:47] Coren: ok! How do I do that? :) [16:10:06] we have a CA now? [16:10:07] (Which, afaik, we don't *have*). Someone somewhere has created a cert with that issuer. You need to locate it and use its key. [16:10:13] greg-g: not really [16:10:24] akosiaris: We totally should, though. [16:10:34] a real CA ? [16:10:43] as in a CA that is in browsers ? [16:10:44] akosiaris: An internal one. [16:10:45] Um… it’s surely not a real CA [16:10:48] not for browsers [16:11:04] we got an internal CA for a few very specific things [16:11:07] we actually got 2 [16:11:20] one that I personally lost the keys for like 1.5 years ago [16:11:27] akosiaris: Heh. [16:11:27] and one that I created to replace that first one [16:11:37] akosiaris: What's its dn? [16:11:44] so andrewbogott you have me to talk to [16:11:53] Coren: hmm lemme check [16:12:18] Coren: is that a different key from virt-star.eqiad.wmnet.key?
Subject: C=US, ST=California, L=San Francisco, O=Wikimedia Foundation, OU=Operations, CN=WMF CA 2014-2017 [16:12:36] Coren: ^ [16:12:47] openssl x509 -in files/ssl/wmf_ca_2014_2017.crt -text [16:13:10] andrewbogott: It is - the virt-star one was /signed/ by one with a dn of C=US, ST=California, L=San Francisco, O=Wikimedia Foundation, CN=Wikimedia CA [16:13:11] this one is actually the new one and it lives entirely in the private puppet repo [16:13:26] andrewbogott: If you need to have the same signer, you need to locate that cert and key [16:13:34] the one without 2014-2017 is the old one [16:13:43] akosiaris: Do we deploy that root cert and trust it? [16:13:44] and I suggest killing it [16:13:50] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms [16:13:51] Coren: yes [16:14:04] (03CR) 10Filippo Giunchedi: eventlogging: adjust counters thresholds (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204237 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [16:14:05] andrewbogott: You can use a cert signed by that key then. [16:14:14] require certificates::wmf_ca [16:14:22] and require certificates::wmf_ca_2014_2017 respectively [16:14:23] (03CR) 10Filippo Giunchedi: restbase: add ganglia cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204274 (owner: 10Filippo Giunchedi) [16:14:24] old and new [16:14:34] It doesn’t have to be the same, it just has to play nice with wmf-ca.pem [16:14:38] s/require/class/ but you get the picture [16:14:57] andrewbogott: Wait. wmf-ca.pem? Where does that come from? [16:15:05] * andrewbogott is pretty lost [16:15:17] 6operations, 10ops-eqiad, 6Labs: labvirt100x boxes 'no carrier' on eth1 - https://phabricator.wikimedia.org/T95973#1209547 (10Cmjohnson) 5Open>3Resolved This should be resolved now thanks to Faidon's fix. [16:15:23] andrewbogott: what are you trying to do ? [16:15:25] andrewbogott: OKay.
The nutshell: [16:15:29] * akosiaris reading backlog [16:15:41] andrewbogott: You need a cert that is signed by an authority the clients will recognize. [16:16:01] andrewbogott: If the clients have our internal CA certs, then any cert signed with them will work. [16:16:29] ottomata: an1020 has been fixed. [16:16:45] So… on the old virt nodes, we have libvirtd settings: key_file = "/var/lib/nova/virt-star.eqiad.wmnet.key" cert_file = "/etc/ssl/localcerts/virt-star.eqiad.wmnet.crt" ca_file = "/etc/ssl/certs/wmf-ca.pem" [16:16:48] andrewbogott: So you can create a CSR for your labvirt* cert, and sign it with our CA [16:16:53] That doesn’t work on labvirt100x because of the wrong hostname [16:17:08] So I presume I need a new, similar cert but for labvirt* instead of virt* [16:17:18] you presume correctly [16:17:19] That ca_file, where does it come from? That's the one whose key you need. [16:18:25] You need to sign your labvirt* cert with the key to that one. :-) [16:18:41] PROBLEM - Hadoop NodeManager on analytics1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [16:18:41] PROBLEM - puppet last run on analytics1020 is CRITICAL puppet fail [16:18:41] ok… and it sounds like akosiaris thinks that key is lost forever, yes? [16:19:05] hm [16:19:08] I would say is pretty sure vs thinks [16:19:11] I'm not sure it's the same, but if it is then yes - you'll need to also change the ca_file everywhere. [16:19:42] actually, I have to stop working on this and go to the quarterly review meeting.
[16:19:51] akosiaris, if you have time to sort this out, the patch that needs fixing is https://gerrit.wikimedia.org/r/#/c/204279/ [16:19:59] (03CR) 10Filippo Giunchedi: [C: 031] * Simplify package build, also the stepping stone for adding a systemd unit file (T95055) [debs/ircecho] - 10https://gerrit.wikimedia.org/r/204045 (owner: 10Muehlenhoff) [16:20:07] otherwise I will return to this later on and struggle :/ [16:20:28] andrewbogott: where did that cert come from ? [16:20:46] andrewbogott: actually, get to your meeting and we will talk later [16:20:53] I made it, it’s signed with labvirt-star.eqiad.wmnet.key [16:20:59] which is a copy of virt-star.eqiad.wmnet.key [16:21:10] and the cert doesn’t work because… wrong CA [16:21:22] (03CR) 10Filippo Giunchedi: Add a systemd unit file (T95055) (032 comments) [debs/ircecho] - 10https://gerrit.wikimedia.org/r/204054 (owner: 10Muehlenhoff) [16:21:31] you self-signed the certificate ? [16:21:37] that will not work [16:21:45] So I see! [16:21:53] * andrewbogott reboots in hopes of getting laptop camera to work [16:22:17] (03CR) 10Filippo Giunchedi: * Simplify package build, also the stepping stone for adding a systemd unit file (T95055) (031 comment) [debs/ircecho] - 10https://gerrit.wikimedia.org/r/204045 (owner: 10Muehlenhoff) [16:22:23] akosiaris: Terminology woes; andrewbogott thought "signed ourselves" and "self-signed" were the same. [16:22:31] !log running revision render thin-out script on wikipedia HTML [16:22:38] Logged the message, Master [16:22:53] Which, admittedly, sounds reasonable unless you are familiar with PKI.
:-) [16:23:23] (03PS2) 1020after4: Trebuchet: run all state changing git commands with umask 002 [puppet] - 10https://gerrit.wikimedia.org/r/201344 (https://phabricator.wikimedia.org/T94754) (owner: 10BryanDavis) [16:23:50] RECOVERY - Hadoop NodeManager on analytics1020 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [16:23:51] RECOVERY - puppet last run on analytics1020 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:27:23] (03CR) 10Alexandros Kosiaris: "This will not work. The certificate needs to be signed by a valid CA and not be self-signed. We got 2 WMF internal CAs, one we try to depr" [puppet] - 10https://gerrit.wikimedia.org/r/204279 (owner: 10Andrew Bogott) [16:29:08] !log demon Synchronized php-1.26wmf1/extensions/CentralAuth/: (no message) (duration: 00m 13s) [16:29:13] Logged the message, Master [16:29:38] (03CR) 10Filippo Giunchedi: [C: 031] "minor nit but LGTM otherwise" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [16:31:12] (03PS1) 10Alex Monk: Note which LDAP groups are allowed in HTTP login prompts mentioning labs [puppet] - 10https://gerrit.wikimedia.org/r/204291 [16:33:58] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1209635 (10Dzahn) We had an existing ticket for this in RT, it used to be https://rt.wikimedia.org/Ticket/Display.html?id=6665 which was imported over to phab as T83410 Let's merge them? [16:37:49] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1209651 (10csteipp) >>! In T95229#1207763, @GWicke wrote: >> Graphoid is 530 kloc's of javascript. > > If the codebase is too large to review, then why don't w... 
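The cert confusion above comes down to self-signing versus signing with an internal CA: a self-signed cert has Subject == Issuer, while a CA-signed cert carries the CA's DN as Issuer. A minimal sketch of the three steps Coren outlines (generate key, create CSR, sign the CSR), driving the openssl CLI from Python with a throwaway CA — the CA name, hostname, and file names here are all made up for illustration, not the actual WMF CA or labvirt material:

```python
import os
import subprocess
import tempfile

def openssl(*args):
    """Run an openssl subcommand, raising on failure, returning stdout."""
    return subprocess.run(
        ("openssl",) + args, check=True, capture_output=True, text=True
    ).stdout

with tempfile.TemporaryDirectory() as d:
    ca_key, ca_crt = os.path.join(d, "ca.key"), os.path.join(d, "ca.crt")
    host_key, csr, crt = (os.path.join(d, n)
                          for n in ("host.key", "host.csr", "host.crt"))

    # A CA root cert is the one case where self-signing is intended:
    # here Subject == Issuer by construction.
    openssl("req", "-x509", "-newkey", "rsa:2048", "-nodes", "-days", "1",
            "-keyout", ca_key, "-out", ca_crt,
            "-subj", "/CN=Example Internal CA")

    # (a) generate key and (b) create a CSR for the host cert in one step.
    openssl("req", "-new", "-newkey", "rsa:2048", "-nodes",
            "-keyout", host_key, "-out", csr,
            "-subj", "/CN=labvirt-star.example.wmnet")

    # (c) sign the CSR with the *CA's* key, not the host's own key:
    # the resulting cert's Issuer is the CA, so it is not self-signed.
    openssl("x509", "-req", "-in", csr, "-days", "1",
            "-CA", ca_crt, "-CAkey", ca_key, "-CAcreateserial", "-out", crt)

    issuer = openssl("x509", "-in", crt, "-noout", "-issuer")
    subject = openssl("x509", "-in", crt, "-noout", "-subject")
    print(issuer.strip())
    print(subject.strip())
```

Signing the host CSR with a copy of the *host* key (as happened with labvirt-star above) reproduces step (c) with the wrong key, which is exactly what yields Subject == Issuer again.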
[16:44:48] !log restarted eventlogging && deployed d241d75ee2fab554bc47cf8d1ba83f5df2130633 [16:45:57] Logged the message, Master [16:47:00] PROBLEM - NTP on analytics1020 is CRITICAL: NTP CRITICAL: Offset unknown [16:48:24] hey cmjohnson1 [16:48:24] you working on an20? [16:48:24] that's an odd error [16:48:40] no, I just plugged the eth cable in ...you may want to reboot it [16:49:18] ooo [16:49:19] ok [16:49:31] !log rebooting analytics1020 [16:49:37] Logged the message, Master [16:49:49] too bad you didn't do an apt-get upgrade first :) [16:50:50] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [16:50:54] haha [16:50:56] oh? [16:53:04] it has pending updates for openjdk, among many others [16:53:21] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 1.29 ms [16:57:30] 10Ops-Access-Requests, 6operations: Requesting access to tin.eqiad.wmnet for mforns - https://phabricator.wikimedia.org/T96163#1209702 (10mforns) 3NEW [16:59:30] 10Ops-Access-Requests, 6operations: Requesting access to hafnium for mforns - https://phabricator.wikimedia.org/T96164#1209712 (10mforns) 3NEW [17:04:19] thanks cmjohnson1, an20 is looking much better [17:04:52] cool, i'm pretty sure just plugging the network cable in willy-nilly was the problem [17:06:29] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1209731 (10GWicke) @csteipp, it's not an either-or. If we have doubts about the XSS cleanliness of the output, then additional sanitization can help to further...
[17:12:30] 6operations, 5Interdatacenter-IPsec: Update 3.19 kernel to 3.19.4 - https://phabricator.wikimedia.org/T96146#1209737 (10MoritzMuehlenhoff) AFAICS the aes256gcm bug is bypassed with https://phabricator.wikimedia.org/rOPUP1ab5d2ccdb85b37c220c49a3e6678688098dcaeb so that shouldn't be a blocker [17:15:00] !log running migrateAccount.php --auto (CentralAuth) [17:15:07] Logged the message, Master [17:26:05] (03CR) 10Filippo Giunchedi: "some general comments" (031 comment) [software/sentry] - 10https://gerrit.wikimedia.org/r/201006 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [17:31:36] 6operations, 6Phabricator, 10Wikimedia-Bugzilla, 7Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1209804 (10JohnLewis) [17:42:15] (03CR) 10Mobrovac: [C: 031] restbase: add ganglia cluster [puppet] - 10https://gerrit.wikimedia.org/r/204274 (owner: 10Filippo Giunchedi) [17:45:05] bd808: I have tested https://gerrit.wikimedia.org/r/#/c/204098/ and it works well on labs-vagrant feel free to merge it [18:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150415T1800). Please do the needful. 
[18:16:12] twentyafterfour: will be back in ~30 min, in case anything needed for wikidata [18:25:38] !log running forceRenameUsers.php (SUL finalization) on test* wikis [18:25:44] Logged the message, Master [18:27:08] (03PS2) 10Yuvipanda: tools: Remove remnants of portgranter code [puppet] - 10https://gerrit.wikimedia.org/r/204014 (https://phabricator.wikimedia.org/T93046) [18:27:35] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Remove remnants of portgranter code [puppet] - 10https://gerrit.wikimedia.org/r/204014 (https://phabricator.wikimedia.org/T93046) (owner: 10Yuvipanda) [18:30:29] (03PS3) 10Yuvipanda: tools: Separate registration / unregistreation for proxylistener [puppet] - 10https://gerrit.wikimedia.org/r/204193 (https://phabricator.wikimedia.org/T96059) [18:30:35] (03CR) 10Yuvipanda: tools: Separate registration / unregistreation for proxylistener [puppet] - 10https://gerrit.wikimedia.org/r/204193 (https://phabricator.wikimedia.org/T96059) (owner: 10Yuvipanda) [18:30:44] Coren: ^ can you +1? [18:31:00] * Coren reads [18:33:15] (03CR) 10coren: [C: 031] "Reasonably sane, but not tested by me. :-)" [puppet] - 10https://gerrit.wikimedia.org/r/204193 (https://phabricator.wikimedia.org/T96059) (owner: 10Yuvipanda) [18:39:53] (03CR) 10Yuvipanda: [C: 032] "Alright, i'm slowly and very carefully doing this now :)" [puppet] - 10https://gerrit.wikimedia.org/r/204193 (https://phabricator.wikimedia.org/T96059) (owner: 10Yuvipanda) [18:41:42] twentyafterfour: have you started deploying yet? [18:42:52] legoktm: haven't started scapping yet no, everything ok? [18:44:03] twentyafterfour: I have a i18n update (WikimediaMessages) that should go out asap, I'm putting up the patch now, can you include it in your scap? [18:44:36] legoktm: sure thing [18:47:29] twentyafterfour: the active branches are wmf1 and wmf2 right? 
[18:48:37] legoktm: yes I just cut wmf2 a little while ago and I'm about to phase out 1.25wmf24 [18:49:08] wmf2 isn't yet active anywhere [18:49:33] (It's currently checking out all the submodules for wmf2) [18:50:23] * aude back [18:51:07] hmm, wikimedia.org has become very laggy for me all of a sudden [18:51:19] tinet.ams05.atlas.cogentco.com (130.117.14.50) 14.209 ms 17.279 ms 13.724 ms [18:51:25] 10 xe-7-2-2.was10.ip4.gtt.net (141.136.111.14) 887.090 ms 907.144 ms 783.867 ms [18:51:33] twentyafterfour: the submodule bumps are wmf2: https://gerrit.wikimedia.org/r/204317 and wmf1: https://gerrit.wikimedia.org/r/204318 [18:51:48] seems somewhere in between those two, because it makes a huge jump in ping response from there [18:53:39] twentyafterfour: do you want me to merge those or will you take care of it? [18:53:42] thedj: do you have a traceroute? [18:53:58] legoktm: I can take care of it for you no problem [18:54:14] * aude can't help but know we've had problems with the route to stuff like gerrit (ssh) and labs [18:54:37] aude: aude http://pastebin.ca/2973441 [18:55:01] thanks :D [18:56:07] thedj: hmm [18:56:37] not sure but if it's an ongoing problem then probably ask paravoid and/or create a task [18:56:50] sometimes we can work around the issue or try to deal with it somehow [18:57:23] i can't even fetch right now :( [18:57:46] your IP in private please :) [18:59:47] GTT ? 
[19:01:12] gtt/cogent [19:01:44] or maybe just ziggo/gtt, unsure yet [19:03:51] (03PS2) 10Dereckson: Set meta namespace and site name on or.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203785 (https://phabricator.wikimedia.org/T94142) [19:05:09] (03PS1) 1020after4: Add 1.26wmf2 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204319 [19:05:11] (03PS1) 1020after4: Wikipedias to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204320 [19:05:13] (03PS1) 1020after4: Group0 to 1.26wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204321 [19:05:15] (03PS1) 1020after4: Remove 1.25wmf20 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204322 [19:06:50] (03PS3) 10Dereckson: Set meta namespace and site name on or.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203785 (https://phabricator.wikimedia.org/T94142) [19:08:13] (03CR) 10Dereckson: "PS2: Rebased" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203785 (https://phabricator.wikimedia.org/T94142) (owner: 10Dereckson) [19:09:54] 6operations, 10hardware-requests, 5Continuous-Integration-Isolation: eqiad: 2 hardware access request for CI isolation on labsnet - https://phabricator.wikimedia.org/T93076#1210155 (10hashar) labnodepool1001 has been installed and is ready for service implementation scandium (zuul mergers) should land in la... [19:11:24] (03PS1) 10Yuvipanda: Revert "tools: Separate registration / unregistreation for proxylistener" [puppet] - 10https://gerrit.wikimedia.org/r/204323 [19:11:40] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "tools: Separate registration / unregistreation for proxylistener" [puppet] - 10https://gerrit.wikimedia.org/r/204323 (owner: 10Yuvipanda) [19:15:35] Coren: I reverted it for now, needs identd debugging [19:16:03] What issue did you run into? 
[19:16:11] PROBLEM - puppet last run on cp3049 is CRITICAL puppet fail [19:17:00] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 21.43% of data above the critical threshold [500.0] [19:17:07] 6operations, 10Wikimedia-Mailing-lists: scrub non-free PDF from list archives - https://phabricator.wikimedia.org/T95195#1210167 (10Jalexander) IANAL but my recommendation is to leave it, the risks are too high until we get a demand, if we get a demand then we have a legal requirement. [19:17:17] Coren: basically identd can't figure out which user a connection is coming from [19:17:58] Coren: I think the problem is that the client closes connection too early [19:18:14] Ah. Right, identd must have an actively open socket to work. [19:18:29] 6operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1210171 (10vshchepakina) 3NEW a:3Jgreen [19:18:41] PROBLEM - puppet last run on cp3019 is CRITICAL puppet fail [19:19:11] PROBLEM - puppet last run on cp3047 is CRITICAL puppet fail [19:19:34] Coren: yeah, am reworking it all now :) [19:20:10] PROBLEM - puppet last run on cp3017 is CRITICAL puppet fail [19:21:50] well [19:21:52] not ‘all’ [19:21:54] but enough bits [19:26:26] !log tools -exec-03 drained, rebooting [19:27:21] PROBLEM - puppet last run on analytics1027 is CRITICAL Puppet last ran 4 hours ago [19:33:21] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [19:33:22] RECOVERY - puppet last run on cp3019 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [19:33:50] RECOVERY - puppet last run on analytics1027 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [19:34:02] RECOVERY - puppet last run on cp3049 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [19:35:31] RECOVERY - puppet last run on cp3047 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [19:35:49] anyone have a clue about
https://phabricator.wikimedia.org/T96114 CSS isn't loading [19:35:57] see the last comment from Krenair [19:36:21] RECOVERY - puppet last run on cp3017 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:43:48] (03PS1) 10Yuvipanda: Revert "Revert "tools: Separate registration / unregistreation for proxylistener"" [puppet] - 10https://gerrit.wikimedia.org/r/204329 [19:44:01] Coren: ^ wanna take a look? :) [19:44:28] YuviPanda: Can you give me a minute? I'm in the middle of shuffling jobs around. [19:44:40] (03CR) 10jenkins-bot: [V: 04-1] Revert "Revert "tools: Separate registration / unregistreation for proxylistener"" [puppet] - 10https://gerrit.wikimedia.org/r/204329 (owner: 10Yuvipanda) [19:44:43] Coren: sure [19:45:34] (03PS2) 10Yuvipanda: Revert "Revert "tools: Separate registration / unregistreation for proxylistener"" [puppet] - 10https://gerrit.wikimedia.org/r/204329 [19:46:44] (03PS1) 10Ottomata: Add 2 new FairScheduler queues: priority and production [puppet] - 10https://gerrit.wikimedia.org/r/204330 [19:47:18] YuviPanda: Looking now while I wait for draining. [19:47:24] (03CR) 10Ottomata: [C: 032 V: 032] Add 2 new FairScheduler queues: priority and production [puppet] - 10https://gerrit.wikimedia.org/r/204330 (owner: 10Ottomata) [19:47:25] grr.. gnome-terminal crashed and took out weechat. I should really use screen :-o [19:47:44] Coren: cool. 
I just added a recv on the client and a send on the server [19:47:50] * YuviPanda hasn’t really done socket programming as such [19:48:55] (03CR) 1020after4: [C: 032] Add 1.26wmf2 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204319 (owner: 1020after4) [19:49:04] (03CR) 1020after4: [C: 032] Remove 1.25wmf20 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204322 (owner: 1020after4) [19:49:42] YuviPanda: I don't know about python, but if you want to make certain you don't have half-closed sockets with pending data, in C you'd normally do an explicit shutdown(sock, 2) when you want to make something synchronous. [19:50:00] 6operations, 10Wikimedia-Mailing-lists: scrub non-free PDF from list archives - https://phabricator.wikimedia.org/T95195#1210258 (10Slaporte) 5Open>3declined a:3Slaporte Hi @jeremyb, please report cases of potential copyright infringement through our standard DMCA process where appropriate: http://wikime... [19:50:01] Coren: I think the .close() would be the equivalent [19:50:11] oooh no [19:50:24] (03PS1) 10Chad: Hiera-ize the mediawiki-installation dsh group [puppet] - 10https://gerrit.wikimedia.org/r/204331 [19:50:37] YuviPanda: Python might do it implicitly in the .close() though I wouldn't know for certain. [19:50:46] (03Merged) 10jenkins-bot: Add 1.26wmf2 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204319 (owner: 1020after4) [19:51:05] ori: is the VE preconnect patch live? [19:51:15] (03CR) 10jenkins-bot: [V: 04-1] Hiera-ize the mediawiki-installation dsh group [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad) [19:51:52] Coren: read docs, doesn’t :) good catch [19:52:16] I don't know python all that well, but I know sockets. 
:-) [19:52:35] (03PS2) 10Chad: Hiera-ize the mediawiki-installation dsh group [puppet] - 10https://gerrit.wikimedia.org/r/204331 [19:53:12] (03PS3) 10Yuvipanda: Revert "Revert "tools: Separate registration / unregistreation for proxylistener"" [puppet] - 10https://gerrit.wikimedia.org/r/204329 [19:53:13] Coren: ^ [19:54:41] (03CR) 10coren: [C: 031] "Unreverse not unnaproved." [puppet] - 10https://gerrit.wikimedia.org/r/204329 (owner: 10Yuvipanda) [19:55:06] :-) Because "revert revert" :-) [19:55:20] (03PS4) 10Yuvipanda: Revert "Revert "tools: Separate registration / unregistreation for proxylistener"" [puppet] - 10https://gerrit.wikimedia.org/r/204329 [19:55:21] 6operations, 10Wikimedia-Mailing-lists: scrub non-free PDF from list archives - https://phabricator.wikimedia.org/T95195#1210276 (10Slaporte) 5declined>3Open a:5Slaporte>3None [19:55:39] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "Revert "tools: Separate registration / unregistreation for proxylistener"" [puppet] - 10https://gerrit.wikimedia.org/r/204329 (owner: 10Yuvipanda) [19:58:30] !log twentyafterfour Started scap: testwiki to php-1.26wmf2 and rebuild l10n cache [19:59:52] I negate your negation with negation. ftw [20:00:10] <3 triple negatives [20:01:59] 6operations, 10MediaWiki-extensions-Sentry, 6Multimedia, 10hardware-requests, 3Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138#1210290 (10RobH) a:5RobH>3Tgr @tgr: has the setup in labs been puppetized at this time? We tend to not allocate bare met... 
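The half-closed-socket point Coren makes maps to Python roughly like this: close() simply tears the connection down, while shutdown() (the Python equivalent of C's shutdown(sock, 2), or SHUT_WR for the write side only) signals EOF to the peer while the socket object stays alive; the "recv on the client and a send on the server" YuviPanda added makes the exchange synchronous. A toy sketch of that pattern — the message and ack format are invented here, this is not the actual proxylistener code:

```python
import socket
import threading

def server(listener):
    conn, _ = listener.accept()
    with conn:
        # Read until the client half-closes its write side
        # (recv then returns b"").
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:
                break
            chunks.append(data)
        # The client socket is still open, so a service that needs a live
        # connection (like an identd lookup) can still inspect it here,
        # and we can still send an acknowledgement back.
        conn.sendall(b"ok:" + b"".join(chunks))

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=server, args=(listener,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"register tool-x")
# Half-close: delivers EOF to the server but keeps our read side usable.
# A bare close() at this point could drop the connection before the
# server (or identd) is done with it.
client.shutdown(socket.SHUT_WR)
reply = client.recv(1024)   # block until the server acknowledges
client.close()
print(reply)
```

Waiting for the ack before close() is what makes the registration synchronous: the client cannot race ahead of the server's bookkeeping.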
[20:03:06] 6operations, 7Monitoring, 5Patch-For-Review: remove ganglia(old), replace with ganglia_new - https://phabricator.wikimedia.org/T93776#1210295 (10RobH)
[20:03:08] 6operations, 10hardware-requests: hardware for global ganglia aggregator in eqiad - https://phabricator.wikimedia.org/T95792#1210292 (10RobH) 5Open>3declined a:3RobH update from in person & irc conversations: the ganglia aggregator for codfw is install2001, so this should ideally go on carbon (which som...
[20:04:19] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1210296 (10csteipp) @gwicke and I talked in person and agreed that if all service output that is html, svg, or any other xml-derived format is run through an ex...
[20:05:21] 6operations, 10Wikimedia-Mailing-lists: scrub non-free PDF from list archives - https://phabricator.wikimedia.org/T95195#1210297 (10Krenair) 5Open>3declined a:3Krenair
[20:13:12] (03CR) 10Chad: "Seems like it'll work: http://puppet-compiler.wmflabs.org/715/change/204331/html/ :)" [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad)
[20:14:39] <^d> YuviPanda: ^ was deceptively easy :)
[20:15:05] (03CR) 10Thcipriani: [C: 031] "Seems like the best way to make this work across N environments vs beta + prod." [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad)
[20:16:52] (03PS3) 10Chad: Hiera-ize the mediawiki-installation dsh group [puppet] - 10https://gerrit.wikimedia.org/r/204331
[20:21:42] 6operations, 10ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1210313 (10Papaul) db2051 to db2070 Rack table update racking complete wiring complete
[20:25:45] (03PS1) 10Yuvipanda: tools: Fix scope issue + do not explicitly shutdown socket [puppet] - 10https://gerrit.wikimedia.org/r/204333
[20:25:56] (03CR) 10jenkins-bot: [V: 04-1] tools: Fix scope issue + do not explicitly shutdown socket [puppet] - 10https://gerrit.wikimedia.org/r/204333 (owner: 10Yuvipanda)
[20:25:58] (03PS2) 10Yuvipanda: tools: Fix scope issue + do not explicitly shutdown socket [puppet] - 10https://gerrit.wikimedia.org/r/204333
[20:26:04] (03PS3) 10Alex Monk: Add AffCom user group application contact page on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204205 (https://phabricator.wikimedia.org/T95789)
[20:27:25] (03CR) 10Yuvipanda: [C: 032] tools: Fix scope issue + do not explicitly shutdown socket [puppet] - 10https://gerrit.wikimedia.org/r/204333 (owner: 10Yuvipanda)
[20:28:01] !log deployed parsoid version ac7a01b9
[20:31:40] YuviPanda, hmm ... why is that not getting logged you know?
[20:31:49] morebots is probably dead
[20:32:08] (https://wikitech.wikimedia.org/wiki/Morebots)
[20:32:15] I’ll restart it in a while if nobody does
[20:32:25] ah, ok.
[20:32:26] but someone needs to own that and fix it. nobody really does atm
[20:35:45] YuviPanda, I'll update the SAL page directly for now
[20:46:23] !log twentyafterfour Finished scap: testwiki to php-1.26wmf2 and rebuild l10n cache (duration: 47m 53s)
[20:47:33] testwiki still shows 1.26wmf1
[20:47:47] * twentyafterfour wonders where I screwed up
[20:48:28] ?
[20:49:44] i see 1.26wmf2
[20:49:51] oh weird - https://test.wikipedia.org/wiki/Special:Version shows "This is a test of release of MediaWiki 1.26wmf1 (49cbab3). " at the top but "MediaWiki 1.26wmf2 (8e57fcd)
[20:49:54] 18:50, 15 April 2015" further down
[20:50:10] strange
[20:50:22] yeah
[20:50:23] that's a central notice banner
[20:50:29] not sure how it gets set
[20:50:49] (03CR) 1020after4: [C: 032] Wikipedias to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204320 (owner: 1020after4)
[20:50:54] It's not, it's a sitenotice https://test.wikipedia.org/wiki/MediaWiki:Sitenotice
[20:51:02] (03Merged) 10jenkins-bot: Wikipedias to 1.26wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204320 (owner: 1020after4)
[20:51:15] mere {{CURRENTVERSION}}
[20:51:45] and where is CURRENTVERSION defined? maybe it's cached?
[20:53:17] 6operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1210351 (10Jgreen) p:5Normal>3Triage a:5Jgreen>3None
[20:54:25] seems like it's cached somewhere - I logged in to testwiki and now the sitenotice shows wmf2 at the top of the page.. but https://test.wikipedia.org/wiki/MediaWiki:Sitenotice shows 1.26wmf1 (with a different hash now!) in the page body
[20:55:10] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1210357 (10Philippe-WMF) FWIW, and late, but... approved from my end. pb ___________________ Philippe Beaudette Director, Community Advocacy Wikimedia Foundation, Inc. 415-839-6885, x 664...
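[Editor's note] The exchange above (the sitenotice still showing 1.26wmf1 after the deploy) comes down to parser caching: {{CURRENTVERSION}} is expanded at render time and the rendered output is cached, so readers keep seeing the old string until the page is re-rendered. The following is a toy model only, not MediaWiki code; names like `render` and `purge` are illustrative stand-ins for the parser cache and action=purge behaviour.

```ruby
# Toy model of parser-cache staleness: the first render of a page is
# memoized, so a later call with a newer version string still returns the
# old cached output until the cache entry is purged.
PARSER_CACHE = {}

def render(title, wikitext, current_version)
  PARSER_CACHE[title] ||= wikitext.gsub('{{CURRENTVERSION}}', current_version)
end

def purge(title)
  PARSER_CACHE.delete(title)
end

notice = 'This is a test of release of MediaWiki {{CURRENTVERSION}}'
puts render('MediaWiki:Sitenotice', notice, '1.26wmf1')
puts render('MediaWiki:Sitenotice', notice, '1.26wmf2')  # still the wmf1 render
purge('MediaWiki:Sitenotice')
puts render('MediaWiki:Sitenotice', notice, '1.26wmf2')  # fresh wmf2 render
```

This matches what was observed: logging in bypassed one cache layer (the page top updated), while the stored render of MediaWiki:Sitenotice itself still carried the old version.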
[20:57:21] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.26wmf1
[20:59:26] (03CR) 1020after4: [C: 032] Group0 to 1.26wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204321 (owner: 1020after4)
[20:59:56] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: enable authenticated access to Cassandra JMX - https://phabricator.wikimedia.org/T92471#1210388 (10Eevans) Some additional information: The new local-only option introduced in 2.1.4 does //not// support authentication, or encryption. So th...
[21:00:19] (03Merged) 10jenkins-bot: Group0 to 1.26wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204321 (owner: 1020after4)
[21:01:17] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf2
[21:03:11] !log twentyafterfour Purged l10n cache for 1.25wmf24
[21:04:18] (03PS5) 10Andrew Bogott: Have sink create ldap host entries. [puppet] - 10https://gerrit.wikimedia.org/r/202582
[21:08:36] (03PS1) 10Yuvipanda: tools: Fix missing import [puppet] - 10https://gerrit.wikimedia.org/r/204337
[21:10:01] PROBLEM - puppet last run on mw1008 is CRITICAL Puppet has 1 failures
[21:10:49] (03PS2) 10Yuvipanda: tools: Fix missing import [puppet] - 10https://gerrit.wikimedia.org/r/204337
[21:11:01] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Fix missing import [puppet] - 10https://gerrit.wikimedia.org/r/204337 (owner: 10Yuvipanda)
[21:12:21] !log cleaned up /srv/mediawiki/php-1.25wmf20
[21:13:50] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1210443 (10BBlack) >>! In T95229#1210296, @csteipp wrote: > Ops, from my perspective, it would be really great to be able to plan for using alternate, unauthent...
[21:16:05] (03PS7) 10Andrew Bogott: Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067
[21:17:54] (03CR) 10Rush: [C: 031] Set up ssh keys so that designate can clear salt and puppet certs. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204067 (owner: 10Andrew Bogott)
[21:24:41] RECOVERY - puppet last run on mw1008 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures
[21:24:56] (03PS8) 10Andrew Bogott: Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067
[21:25:20] 6operations, 10hardware-requests, 5Patch-For-Review: Decom/repurpose rbf* hosts - https://phabricator.wikimedia.org/T95153#1210535 (10RobH) 5Open>3Resolved added back to spares
[21:25:59] (03PS1) 10Yuvipanda: tools: Fix more silly copy paste errors [puppet] - 10https://gerrit.wikimedia.org/r/204340
[21:26:10] (03CR) 10jenkins-bot: [V: 04-1] tools: Fix more silly copy paste errors [puppet] - 10https://gerrit.wikimedia.org/r/204340 (owner: 10Yuvipanda)
[21:26:16] (03PS2) 10Yuvipanda: tools: Fix more silly copy paste errors [puppet] - 10https://gerrit.wikimedia.org/r/204340
[21:27:56] (03CR) 10Yuvipanda: [C: 032] tools: Fix more silly copy paste errors [puppet] - 10https://gerrit.wikimedia.org/r/204340 (owner: 10Yuvipanda)
[21:34:21] <^d> Can someone kick hhvm on mw1191? Complaints of full TC cache on fluorine.
[21:37:29] ^d: done
[21:37:33] <^d> thx
[21:37:34] !log restarted hhvm on mw1191
[21:41:11] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1210586 (10csteipp) > What we're already possibly-planning around is including *.wikimedia.org in all of the certs so that potentially one IP + one cert can han...
[21:42:03] (03PS1) 10Yuvipanda: tools: Set portreleaser to be epilog script for web queues [puppet] - 10https://gerrit.wikimedia.org/r/204366 (https://phabricator.wikimedia.org/T96059)
[21:42:42] (03CR) 10Dzahn: Hiera-ize the mediawiki-installation dsh group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204331 (owner: 10Chad)
[21:44:51] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1210595 (10GWicke) It would be great to use this for upload especially.
[21:47:38] (03PS9) 10Andrew Bogott: Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067
[21:48:46] (03CR) 10jenkins-bot: [V: 04-1] Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067 (owner: 10Andrew Bogott)
[21:49:22] (03CR) 10Rush: [C: 031] "niiiice" [puppet] - 10https://gerrit.wikimedia.org/r/204067 (owner: 10Andrew Bogott)
[21:51:05] (03CR) 10Dzahn: [C: 04-1] Set up ssh keys so that designate can clear salt and puppet certs. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204067 (owner: 10Andrew Bogott)
[21:51:38] andrewbogott: "@resolve" is specific to ferm
[21:51:50] so I have to hardcode an ip?
[21:51:51] http://ferm.foo-projects.org/download/2.1/ferm.html#_resolve__hostname1_hostname2________type__
[21:53:13] that, or you have to do the DNS lookup differently
[21:53:21] in an .erb template
[21:54:36] mutante: scope.function_ipresolve?
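[Editor's note] The point being made above is that ferm's `@resolve` only works inside ferm configuration; when a Puppet-managed file needs an IP, the lookup has to happen in Ruby (an ERB template or a custom parser function). A minimal sketch of that approach, using Ruby's stdlib resolver; the helper name `ipresolve` only echoes the strongswan module's custom function mentioned in the discussion, and is not that module's actual code.

```ruby
require 'resolv'

# Resolve a hostname to an IP at catalog-compile time, the way an ERB
# template (or a custom parser function) would, instead of relying on
# ferm's @resolve. Resolv consults /etc/hosts before DNS and raises
# Resolv::ResolvError if the name cannot be resolved.
def ipresolve(hostname)
  Resolv.getaddress(hostname)
end

# In a template this would look something like:
#   from="<%= ipresolve(@designate_host) %>" ssh-rsa AAAA... designate
puts ipresolve('localhost')
```

As bblack notes in the discussion, compile-time resolution bakes the answer into the catalog, which is why passing a hostname down to the target host and resolving there is preferable whenever the consuming tool accepts names.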
[21:54:37] Socket.gethostbyname("hal")
[21:56:06] http://grokbase.com/p/gg/puppet-users/136n1jtdcg/how-to-resolve-hostnames-to-ip-addresses-in-templates
[21:57:23] andrewbogott: i suppose that works too, because i see we use it in the strongswan module
[21:57:32] one is in puppet itself the other in .erb
[21:57:52] well, no, both are used in templates
[21:57:55] which is very new :)
[21:58:13] in general, it would be better to avoid resolving DNS inside of puppet if we can, in cases where possible
[21:58:21] modules/strongswan/lib/puppet/parser/functions/ipresolve.rb: newfunction(:ipresolve,
[21:58:32] bblack: really? It’s /better/ to have the ip hard-coded?
[21:58:34] ^ ah, so that is our own function
[21:58:41] That seems fragile if we want to move something
[21:59:27] how about still avoiding to have it in the manifests, so put it in hiera, but use the IP?
[21:59:27] andrewbogott: no, it's /better/ to pass a hostname down to whatever-configuration on the host, and let it resolve that on the target host :)
[21:59:43] can an ssl cert’s “from=“ resolve?
[21:59:53] but if unavoidable, we can do that in puppet because the tool requires IPs as inputs
[22:00:57] (it would be much better if the tool didn't, though, but I can see how a firewall is kind of a special case. Still...)
[22:01:44] (03PS10) 10Andrew Bogott: Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067
[22:02:36] (03CR) 10jenkins-bot: [V: 04-1] Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067 (owner: 10Andrew Bogott)
[22:03:09] goddamn I am tired of this patch
[22:05:14] (03PS11) 10Andrew Bogott: Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067
[22:06:21] (03PS2) 10Yuvipanda: tools: Set portreleaser to be epilog script for web queues [puppet] - 10https://gerrit.wikimedia.org/r/204366 (https://phabricator.wikimedia.org/T96059)
[22:09:44] (03CR) 10Andrew Bogott: [C: 032] Set up ssh keys so that designate can clear salt and puppet certs. [puppet] - 10https://gerrit.wikimedia.org/r/204067 (owner: 10Andrew Bogott)
[22:10:01] Coren: wanna +1 https://gerrit.wikimedia.org/r/#/c/204366/?
[22:10:07] then it’s only the monitoring script left
[22:13:32] (03CR) 10MaxSem: "I also think that having 2 "api" entry points on the same domain is going to be misleading, can we use "rest" or something?" [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke)
[22:13:45] (03CR) 10coren: [C: 031] tools: Set portreleaser to be epilog script for web queues [puppet] - 10https://gerrit.wikimedia.org/r/204366 (https://phabricator.wikimedia.org/T96059) (owner: 10Yuvipanda)
[22:14:05] (03PS3) 10Yuvipanda: tools: Set portreleaser to be epilog script for web queues [puppet] - 10https://gerrit.wikimedia.org/r/204366 (https://phabricator.wikimedia.org/T96059)
[22:14:22] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Set portreleaser to be epilog script for web queues [puppet] - 10https://gerrit.wikimedia.org/r/204366 (https://phabricator.wikimedia.org/T96059) (owner: 10Yuvipanda)
[22:15:00] PROBLEM - puppet last run on virt1000 is CRITICAL puppet fail
[22:15:11] (03CR) 10GWicke: "@MaxSem: The idea is to have all APIs share the same root eventually, so that clients can just point to http://project.org/api/ for all th" [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke)
[22:16:12] (03PS1) 10Andrew Bogott: Avoid duplicate definition of puppetmaster::certmanager [puppet] - 10https://gerrit.wikimedia.org/r/204397
[22:17:25] (03CR) 10Andrew Bogott: [C: 032] Avoid duplicate definition of puppetmaster::certmanager [puppet] - 10https://gerrit.wikimedia.org/r/204397 (owner: 10Andrew Bogott)
[22:20:00] RECOVERY - puppet last run on virt1000 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:29:21] PROBLEM - puppet last run on labcontrol2001 is CRITICAL puppet fail
[22:34:59] (03PS2) 10Dereckson: User rights configuration on ne.wikipedia - Filemover [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103)
[22:36:44] (03PS2) 10Dzahn: sshd: set Message Authentication Code ciphers [puppet] - 10https://gerrit.wikimedia.org/r/185329
[22:58:21] (03CR) 10Nuria: [C: 031] eventlogging: adjust counters thresholds (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204237 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi)
[23:01:07] Hmm, no jouncebot
[23:01:08] (03PS3) 10Dzahn: sshd: use Chacha20-poly1305,AES-GCM ciphers [puppet] - 10https://gerrit.wikimedia.org/r/185325
[23:01:12] I'm doing SWAT today
[23:01:17] YuviPanda: No jouncebot?
[23:03:08] Dereckson: You around for your config patches?
[23:03:32] Hi. Yup.
[23:04:39] And my test plan is ready to check the changes when deployed. http://etherpad.wikimedia.org/p/deploy-20150416-SWAT-evening
[23:05:43] Awesome
[23:05:46] I'll start merging them now
[23:05:52] And once they merge I'll deploy them all at once
[23:06:00] (03CR) 10Catrope: [C: 032] Namespace configuration on ru.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202912 (https://phabricator.wikimedia.org/T95110) (owner: 10Dereckson)
[23:06:01] RoanKattouw: possibly. No idea why everyone asks me :)
[23:06:05] (03CR) 10Catrope: [C: 032] User rights configuration on ne.wikipedia - Filemover [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103) (owner: 10Dereckson)
[23:06:13] I can take a look in an hour or so
[23:06:15] YuviPanda: Do you know who owns it?
[23:06:21] (03CR) 10Catrope: [C: 032] Namespace configuration on it.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203354 (https://phabricator.wikimedia.org/T93870) (owner: 10Dereckson)
[23:06:22] Nobody atm
[23:06:25] (03CR) 10Catrope: [C: 032] Logo configuration on he.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203422 (https://phabricator.wikimedia.org/T75424) (owner: 10Dereckson)
[23:06:30] (03CR) 10Catrope: [C: 032] Set meta namespace and site name on or.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203785 (https://phabricator.wikimedia.org/T94142) (owner: 10Dereckson)
[23:11:36] RoanKattouw: it is unowned and used very similar to morebots
[23:11:52] Which is also dead atm
[23:12:39] (03PS1) 10Andrew Bogott: Install the cert_manager with a file resource [puppet] - 10https://gerrit.wikimedia.org/r/204411
[23:12:41] (03PS1) 10Andrew Bogott: Just hardcode the designate ip. [puppet] - 10https://gerrit.wikimedia.org/r/204412
[23:15:34] (03CR) 10Andrew Bogott: [C: 032] Install the cert_manager with a file resource [puppet] - 10https://gerrit.wikimedia.org/r/204411 (owner: 10Andrew Bogott)
[23:17:29] (03Merged) 10jenkins-bot: Namespace configuration on ru.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202912 (https://phabricator.wikimedia.org/T95110) (owner: 10Dereckson)
[23:17:32] (03Merged) 10jenkins-bot: Namespace configuration on it.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203354 (https://phabricator.wikimedia.org/T93870) (owner: 10Dereckson)
[23:17:35] (03Merged) 10jenkins-bot: Logo configuration on he.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203422 (https://phabricator.wikimedia.org/T75424) (owner: 10Dereckson)
[23:17:37] (03Merged) 10jenkins-bot: Set meta namespace and site name on or.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203785 (https://phabricator.wikimedia.org/T94142) (owner: 10Dereckson)
[23:18:09] (03CR) 10Andrew Bogott: [C: 032] Just hardcode the designate ip. [puppet] - 10https://gerrit.wikimedia.org/r/204412 (owner: 10Andrew Bogott)
[23:21:59] (03CR) 10Catrope: [C: 032] "Jenkins?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203335 (https://phabricator.wikimedia.org/T95103) (owner: 10Dereckson)
[23:22:32] Oh, right
[23:22:43] Oh sorry
[23:22:45] seen the dep
[23:22:50] Yeah
[23:22:58] Do you want me to do the dependencies too? Or just the one?
[23:23:13] I offer to postpone this deploy with the other two another day.
[23:23:24] OK cool
[23:23:31] I'll do the other 4 now then
[23:25:40] !log catrope Synchronized wmf-config/InitialiseSettings.php: SWAT (duration: 00m 14s)
[23:26:16] Dereckson: There you go
[23:26:31] Thank you. Checking.
[23:27:14] superm401: Around for your SWAT?
[23:27:22] RoanKattouw, yep
[23:27:46] !log catrope Synchronized php-1.26wmf1/extensions/Citoid: SWAT (duration: 00m 12s)
[23:27:46] Cool
[23:28:35] !log catrope Synchronized php-1.26wmf1/extensions/Flow: SWAT (duration: 00m 16s)
[23:28:43] superm401: There you go ---^^
[23:29:12] RoanKattouw, works, thanks.
[23:29:34] Sweet
[23:29:44] !log catrope Synchronized php-1.26wmf2/extensions/Citoid: SWAT (duration: 00m 14s)
[23:32:41] Changes verified, all seems to work fine.
[23:33:42] I were a little afraid on or.wikt, as they don't have any test on the community portal, but [[Special:All pages]] gives me some results.
[23:34:20] (including a village pump, I will suggest to create a redirect from the community portal link to this page on the Phabricator task)
[23:34:32] s/any test/any text
[23:39:00] PROBLEM - puppet last run on holmium is CRITICAL Puppet has 1 failures
[23:39:01] PROBLEM - puppet last run on cp3017 is CRITICAL puppet fail
[23:39:27] !log catrope Synchronized php-1.26wmf1/extensions/VisualEditor: SWAT (duration: 00m 12s)
[23:39:36] !log catrope Synchronized php-1.26wmf2/extensions/VisualEditor: SWAT (duration: 00m 12s)
[23:40:36] MaxSem: you might want to weigh in at https://phabricator.wikimedia.org/T95229
[23:40:41] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1210921 (10Dzahn) now, also see T96182, which says " A Record to point to our new IP address: 23.227.38.32" in contradiction to what Andrew and myself ha...
[23:41:48] (03CR) 10Andrew Bogott: [C: 032] Create the .ssh dir before sticking a key in it [puppet] - 10https://gerrit.wikimedia.org/r/204418 (owner: 10Andrew Bogott)
[23:43:20] gwicke, I agree with you however all my reasons are already mentioned
[23:44:24] MaxSem: a 'me too' is fine as well ;)
[23:45:16] !log catrope Synchronized php-1.26wmf1/extensions/VisualEditor: Revert SWAT for VE wmf1, caused JS errors (duration: 00m 12s)
[23:45:40] RECOVERY - puppet last run on holmium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:49:20] 6operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1210927 (10Dzahn) At this point there are 3 different answers we have recevied: a) use the existing CNAME shopwikipedia.myshopify.com. and just move that from "shop" to "store" this seems logical because i...
[23:49:50] (03PS1) 10Andrew Bogott: Fix insertion of the designate ip into the certmanager key [puppet] - 10https://gerrit.wikimedia.org/r/204422
[23:50:37] 6operations: Update DNS for the Wikipedia store, before May 31 - https://phabricator.wikimedia.org/T96182#1210936 (10Dzahn) Since it's already so confusing, It's probably better if we keep all the updates in one place, i'd suggest we keep using T92438.
[23:50:57] (03CR) 10Andrew Bogott: [C: 032] Fix insertion of the designate ip into the certmanager key [puppet] - 10https://gerrit.wikimedia.org/r/204422 (owner: 10Andrew Bogott)
[23:51:12] RoanKattouw: thank you for the deploy.
[23:51:30] No problem
[23:58:40] RECOVERY - puppet last run on cp3017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures