[00:00:04] <jouncebot>	 RoanKattouw ostriches Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160126T0000). Please do the needful.
[00:00:04] <jouncebot>	 James_F AaronSchulz ebernhardson jgirault jan_drewniak bd808: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:19] <jgirault>	 here o/
[00:00:21] <bd808>	 mega ping
[00:00:33] * Krenair grumbles
[00:00:35] <Krenair>	 10 patches
[00:00:54] <Krenair>	 there is an *8* patch limit
[00:00:56] <James_F>	 I'm happy to skip mine for space.
[00:01:01] <James_F>	 Even if mine were there first
[00:01:03] <Krenair>	 not that I have time to do it today anyway, but still
[00:01:48] <bd808>	 I can slide mine to tomorrow AM with no harm
[00:01:50] * bd808 does so
[00:02:59] <Dereckson>	 There is already 8 I think tomorrow AM
[00:03:47] <ebernhardson>	 they are entirely config patches, so it's an easy deploy
[00:03:49] <ebernhardson>	 i can just do them all
[00:04:16] <AaronSchulz>	 that's the spirit ;)
[00:04:26] <Dereckson>	 okay
[00:05:21] <grrrit-wm>	 (CR) EBernhardson: [C: 2] VisualEditor: Provide framework for enabling an A/B test for IPs [mediawiki-config] - https://gerrit.wikimedia.org/r/258404 (owner: Jforrester)
[00:05:25] <grrrit-wm>	 (CR) EBernhardson: [C: 2] VisualEditor: Don't set ShowBetaWelcome, now set in repo [mediawiki-config] - https://gerrit.wikimedia.org/r/258405 (owner: Jforrester)
[00:05:40] <bd808>	 ebernhardson: mine can stay or go as you see fit. I would have done it this afternoon but I was busy with other stuffs
[00:05:56] <ebernhardson>	 James_F: actually, you had a -1 on the second patch. But I'm assuming since you asked for deploy the dependency has been shipped out?
[00:06:15] <James_F>	 ebernhardson: Yeah, the C-1 is now removed.
[00:06:17] <grrrit-wm>	 (Merged) jenkins-bot: VisualEditor: Provide framework for enabling an A/B test for IPs [mediawiki-config] - https://gerrit.wikimedia.org/r/258404 (owner: Jforrester)
[00:06:43] <grrrit-wm>	 (Merged) jenkins-bot: VisualEditor: Don't set ShowBetaWelcome, now set in repo [mediawiki-config] - https://gerrit.wikimedia.org/r/258405 (owner: Jforrester)
[00:08:52] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT James_F (duration: 01m 35s)
[00:08:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:09:59] <James_F>	 ebernhardson: Nothing seems broken here.
[00:10:25] <grrrit-wm>	 (CR) EBernhardson: [C: 2] filebackend: add configuration for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto)
[00:10:28] <AaronSchulz>	 "SWAT James F"...sounds dastardly 
[00:10:30] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT James_F (duration: 01m 26s)
[00:10:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:10:33] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Set $wgCentralAuthUseSlaves for loginwiki, mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/265178 (https://phabricator.wikimedia.org/T119689) (owner: Aaron Schulz)
[00:10:36] <wikibugs>	 operations, Wikimedia-Language-setup, Wikimedia-Site-Requests: Rename cbk-zamwiki to cbkwiki - https://phabricator.wikimedia.org/T124657#1964245 (Liuxinyu970226)
[00:10:47] <ebernhardson>	 James_F: well, not broken is a success i suppose ;)
[00:11:02] <James_F>	 ebernhardson: :-)
[00:11:13] <AaronSchulz>	 ebernhardson: https://en.wikipedia.org/wiki/Swatting :)
[00:11:16] <grrrit-wm>	 (Merged) jenkins-bot: filebackend: add configuration for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto)
[00:11:25] <ebernhardson>	 lol :)
[00:11:39] <grrrit-wm>	 (Merged) jenkins-bot: Set $wgCentralAuthUseSlaves for loginwiki, mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/265178 (https://phabricator.wikimedia.org/T119689) (owner: Aaron Schulz)
[00:11:54] <wikibugs>	 operations, Wikimedia-Site-Requests: Rename cbk-zamwiki to cbkwiki - https://phabricator.wikimedia.org/T124657#1964250 (Liuxinyu970226)
[00:12:50] <wikibugs>	 operations, Wikimedia-Site-Requests: Rename cbk-zamwiki to cbkwiki - https://phabricator.wikimedia.org/T124657#1961729 (Liuxinyu970226) Sorry
[00:13:32] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/filebackend-production.php: SWAT AaronSchulz (duration: 01m 26s)
[00:13:35] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:14:00] <ebernhardson>	 AaronSchulz: first out, second is syncing now
[00:14:50] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Adjust cirrus titlesuggest index shard counts [mediawiki-config] - https://gerrit.wikimedia.org/r/261287 (https://phabricator.wikimedia.org/T124332) (owner: EBernhardson)
[00:14:56] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Remove variables for unused experiment [mediawiki-config] - https://gerrit.wikimedia.org/r/260176 (owner: EBernhardson)
[00:15:01] <AaronSchulz>	 ok
[00:15:05] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215) (owner: EBernhardson)
[00:15:13] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Add popularity_score field to cirrussearch indices [mediawiki-config] - https://gerrit.wikimedia.org/r/265927 (owner: EBernhardson)
[00:15:17] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT AaronSchulz (duration: 01m 26s)
[00:15:20] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:15:31] <grrrit-wm>	 (Merged) jenkins-bot: Adjust cirrus titlesuggest index shard counts [mediawiki-config] - https://gerrit.wikimedia.org/r/261287 (https://phabricator.wikimedia.org/T124332) (owner: EBernhardson)
[00:15:56] <grrrit-wm>	 (Merged) jenkins-bot: Remove variables for unused experiment [mediawiki-config] - https://gerrit.wikimedia.org/r/260176 (owner: EBernhardson)
[00:15:58] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215) (owner: EBernhardson)
[00:16:01] <ebernhardson>	 :S
[00:16:24] <ebernhardson>	 oh, merge conflict with another patch i just merged...sec
[00:16:52] <grrrit-wm>	 (Merged) jenkins-bot: Add popularity_score field to cirrussearch indices [mediawiki-config] - https://gerrit.wikimedia.org/r/265927 (owner: EBernhardson)
[00:19:28] <grrrit-wm>	 (PS4) EBernhardson: Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215)
[00:19:50] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215) (owner: EBernhardson)
[00:20:09] * ebernhardson hates merge conflicts..
[00:21:05] <grrrit-wm>	 (PS5) EBernhardson: Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215)
[00:22:21] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215) (owner: EBernhardson)
[00:22:58] <grrrit-wm>	 (Merged) jenkins-bot: Change CirrusSearch sharding values for codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/265372 (https://phabricator.wikimedia.org/T124215) (owner: EBernhardson)
[00:25:19] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/CommonSettings.php: SWAT ebernhardson (duration: 01m 27s)
[00:25:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:26:10] <wikibugs>	 operations, Discovery, Wikidata, Wikidata-Query-Service: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline. - https://phabricator.wikimedia.org/T124627#1964292 (Tfinc) If we have lost a DC then we should not be doing maintenance on a node. Stability is key then. I'm us...
[00:27:19] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/CirrusSearch-common.php: SWAT ebernhardson (duration: 01m 26s)
[00:27:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:28:04] <ebernhardson>	 jgirault: you're up next
[00:28:13] <jgirault>	 o/
[00:28:28] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Bump portals to master (remove A/B/C test from production) [mediawiki-config] - https://gerrit.wikimedia.org/r/266278 (https://phabricator.wikimedia.org/T124245) (owner: JGirault)
[00:28:54] <jgirault>	 are you cleaning varnish?
[00:29:02] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT ebernhardson (duration: 01m 26s)
[00:29:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:29:36] <grrrit-wm>	 (Merged) jenkins-bot: Bump portals to master (remove A/B/C test from production) [mediawiki-config] - https://gerrit.wikimedia.org/r/266278 (https://phabricator.wikimedia.org/T124245) (owner: JGirault)
[00:30:57] <wikibugs>	 Blocked-on-Operations, operations, RESTBase, hardware-requests: Expand SSD space in Cassandra cluster - https://phabricator.wikimedia.org/T121575#1964309 (RobH)
[00:32:04] <logmsgbot>	 !log ebernhardson@mira Synchronized portals/: SWAT jgirault (duration: 01m 28s)
[00:32:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:32:10] <ebernhardson>	 jgirault: yours is out, please check
[00:32:24] <ebernhardson>	 jgirault: might need to avoid cache headers somehow, or i can purge
[00:32:42] <jgirault>	 ebernhardson: can you purge?
[00:32:50] <ebernhardson>	 jgirault: just the main url, or assets too?
[00:33:00] <jgirault>	 actually it works, just tried
[00:33:15] <ebernhardson>	 does that mean caching is broken? :)
[00:33:31] <jgirault>	 ebernhardson: I tried https://www.wikipedia.org/?abc123#abtest2
[00:33:41] <ebernhardson>	 ahh, ok
[00:33:51] <jgirault>	 ebernhardson: we’re good !
[00:34:10] <wikibugs>	 operations: Metrics not reaching Graphite - https://phabricator.wikimedia.org/T124639#1964356 (Krinkle) Open>Resolved Monitoring (mid-long term) statsv is {T117994}.  Restart (one-time) has been down. Closing task (assuming that's all for now).
[00:34:12] <grrrit-wm>	 (CR) EBernhardson: [C: 2] Only send warning and higher session logs to logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/266307 (owner: BryanDavis)
[00:34:16] <ebernhardson>	 bd808: last up is you
[00:34:27] <wikibugs>	 operations, Graphite, Monitoring: Add monitoring for analytics-statsv service - https://phabricator.wikimedia.org/T117994#1788765 (Krinkle)
[00:34:46] <wikibugs>	 operations, Performance-Team, Graphite, Monitoring: Add monitoring for analytics-statsv service - https://phabricator.wikimedia.org/T117994#1788765 (Krinkle)
[00:34:53] <grrrit-wm>	 (Merged) jenkins-bot: Only send warning and higher session logs to logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/266307 (owner: BryanDavis)
[00:35:19] <bd808>	 sweet
[00:37:43] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/InitialiseSettings.php: SWAT bd808 (duration: 01m 34s)
[00:37:45] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:37:49] <ebernhardson>	 bd808: all synced out
[00:38:38] <ebernhardson>	 minor jump in kibana fatalmonitor, but looks to be related to earlier patch
[00:38:56] <ebernhardson>	 looks like sync order issue perhaps...
[00:39:04] <bd808>	 ebernhardson: looks good. Log volume for the session channel dropped as expected
[00:40:33] <logmsgbot>	 !log ebernhardson@mira Synchronized wmf-config/CommonSettings.php: (no message) (duration: 01m 25s)
[00:40:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:41:13] <ori>	 hoo: yt?
[00:41:43] <ebernhardson>	 looks to have stopped with the resync, i think with that SWAT is complete
[00:42:10] <hoo>	 ori: Yes
[00:43:01] <grrrit-wm>	 (PS4) Nuria: Removing code that generates pageviews using legacy definition [puppet] - https://gerrit.wikimedia.org/r/265656 (https://phabricator.wikimedia.org/T124244)
[00:44:16] <ori>	 hoo: I proposed a change to how test.wikipedia.org is configured in Varnish in <https://lists.wikimedia.org/pipermail/engineering/2016-January/000017.html>; bblack pointed out (correctly) that it would impact test.wikidata.org as well, so I wanted to check if that would be OK.
[00:44:55] <hoo>	 I saw that… I don't think we do anything that is impacted by that
[00:45:17] <Krenair>	 someone remind me what the distinction between engineering and wikitech-l is now?
[00:45:31] <mobrovac>	 !log mobileapps deploying c2318b6
[00:45:33] <ebernhardson>	 Krenair: engineering gets read more often
[00:45:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:45:41] <ori>	 ebernhardson: heh
[00:45:46] <mobrovac>	 Krenair: less noise on engineering
[00:45:54] <Krenair>	 really? :| wow
[00:46:19] <ori>	 Krenair: I sent it to engineering@ because I figured the change would only meaningfully impact people with shell access
[00:46:41] <Krenair>	 did you announce to shell users that engineering@ was opened to subscription?
[00:48:24] <YuviPanda>	 yeah, I have wikitech-l set to digest and engineering not set to digest
[00:48:24] <ori>	 Krenair: No; I wasn't sure if I should. When I proposed to close the list, several people said they wanted to have a low-volume list that was scoped (in subject-matter, if not visibility) to things that pertain to staff
[00:48:48] <Krenair>	 shell access is not limited to staff
[00:48:58] <Krenair>	 shell access has never been limited to staff
[00:49:23] <YuviPanda>	 what does engineering@ have to do with shell access?
[00:52:37] <wikibugs>	 operations, CirrusSearch, Discovery, Elasticsearch: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#1964464 (EBernhardson) Moving this into the ops column, as this is almost entirely backend infrastructure. Switching the connection from http to https between ap...
[00:57:58] <Krenair>	 ori, ^
[00:59:17] <ori>	 jesus, I don't know
[00:59:28] <ori>	 forward it to anyone you think might care
[01:00:10] <ori>	 it is insanely exhausting to be grilled about a single e-mail like that, especially now that the list is in principle accessible to anyone
[01:00:52] <YuviPanda>	 ori: BUT YOU MUST EXPLAIN YOURSELF!
[01:02:08] <grrrit-wm>	 (PS1) Tim Landscheidt: Tools: Allow proxymanager to add and remove proxy forward entries [puppet] - https://gerrit.wikimedia.org/r/266448
[01:02:47] <ori>	 that kind of scrutiny feels hostile to me, and I think that it is ultimately counterproductive, in that it makes people more likely to avoid public lists altogether.
[01:03:03] <grrrit-wm>	 (CR) Tim Landscheidt: "Tests:" [puppet] - https://gerrit.wikimedia.org/r/266448 (owner: Tim Landscheidt)
[01:09:03] <Krenair>	 ori, it's about being inclusive of non-staff, I'm not trying to personally attack you
[01:10:26] <ostriches>	 Clearly what we need...is another list :D
[01:10:28] <Krenair>	 the reasons for keeping engineering@ don't seem particularly strong to me
[01:10:33] <ostriches>	 shell-users-l :D
[01:13:42] <grrrit-wm>	 (PS1) Andrew Bogott: Create /etc/mediawiki/WikitechPrivateSettings.php [puppet] - https://gerrit.wikimedia.org/r/266451 (https://phabricator.wikimedia.org/T124732)
[01:13:44] <grrrit-wm>	 (PS1) Andrew Bogott: Remove puppet classes and files associated with /srv/mediawiki/private/WikitechPrivateLdapSettings.php [puppet] - https://gerrit.wikimedia.org/r/266452 (https://phabricator.wikimedia.org/T124732)
[01:15:44] <grrrit-wm>	 (PS1) Andrew Bogott: Get wikitech private settings from a new location: [mediawiki-config] - https://gerrit.wikimedia.org/r/266453 (https://phabricator.wikimedia.org/T124732)
[01:16:19] <grrrit-wm>	 (CR) Alex Monk: Create /etc/mediawiki/WikitechPrivateSettings.php (1 comment) [puppet] - https://gerrit.wikimedia.org/r/266451 (https://phabricator.wikimedia.org/T124732) (owner: Andrew Bogott)
[01:17:05] <grrrit-wm>	 (CR) Alex Monk: [C: 1] Remove puppet classes and files associated with /srv/mediawiki/private/WikitechPrivateLdapSettings.php [puppet] - https://gerrit.wikimedia.org/r/266452 (https://phabricator.wikimedia.org/T124732) (owner: Andrew Bogott)
[01:19:41] <grrrit-wm>	 (PS2) Andrew Bogott: Create /etc/mediawiki/WikitechPrivateSettings.php [puppet] - https://gerrit.wikimedia.org/r/266451 (https://phabricator.wikimedia.org/T124732)
[01:19:43] <grrrit-wm>	 (PS2) Andrew Bogott: Remove puppet classes and files associated with /srv/mediawiki/private/WikitechPrivateLdapSettings.php [puppet] - https://gerrit.wikimedia.org/r/266452 (https://phabricator.wikimedia.org/T124732)
[01:20:34] <ori>	 the Wikimedia movement has a great many communication channels, with quite a lot of overlap in terms of subject matter. I don't think that it is important that discussions like these (i.e., changes which are reversible and which carry no real political implications) reach anyone who is conceivably interested. Information gets around; public information doubly so.
[01:20:40] <ori>	 It's more important that anyone conceivably be affected be able to retrace the thinking behind the change and respond to it if they object. 
[01:21:04] <ori>	 *anyone conceivably affected
[01:22:18] <ori>	 I have no problem whatsoever with you forwarding it to wikitech-l if you think it would be of interest to readers of that list
[01:22:38] <Krenair>	 I tried to forward it but I suppose it didn't work because I'm not subscribed from that address
[01:22:54] <ori>	 would you like me to forward it?
[01:24:21] <Krenair>	 the problem is that someone is sending something relevant to shell users to a list about staff stuff
[01:24:44] <Krenair>	 people shouldn't really be doing that
[01:25:32] <grrrit-wm>	 (PS1) Cenarium: Move account creation throttle to ping limiter and remove noratelimit from account creators [mediawiki-config] - https://gerrit.wikimedia.org/r/266454 (https://phabricator.wikimedia.org/T85538)
[01:25:52] * YuviPanda rakes ori over the coals some more
[01:25:53] <Krenair>	 it doesn't matter whether it's you or someone else
[01:25:59] <YuviPanda>	 IT DID NOT EVEN HAVE A PGP SIGNATURE!
[01:26:06] <Krenair>	 YuviPanda, you are not being helpful
[01:26:12] <YuviPanda>	 neither are you, Krenair
[01:26:13] <Leah>	 Krenair: I agree that having engineering@ and wikitech-l@ is stupid.
[01:26:16] <Leah>	 I said so.
[01:26:18] <Leah>	 But shrug.
[01:26:33] <Leah>	 It's also dumb that we have #wikimedia-tech and #wikimedia-dev.
[01:26:39] <YuviPanda>	 (and #mediawiki)
[01:27:11] <Leah>	 And #mediawiki-core and #wikimedia-devtools and...
[01:27:18] <Leah>	 Such fragmentation. Oh well.
[01:32:00] <YuviPanda>	 now everyone goes back quietly to status quo, and one less person will attempt to even try anything.
[01:32:12] * YuviPanda goes back to finding things to eat
[01:34:21] <Leah>	 Are we still beating Ori up? I'm curious about the HTML attachments.
[01:36:01] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Create /etc/mediawiki/WikitechPrivateSettings.php [puppet] - https://gerrit.wikimedia.org/r/266451 (https://phabricator.wikimedia.org/T124732) (owner: Andrew Bogott)
[01:38:31] <ebernhardson>	 the complaining is wildly unhelpful. Whenever anyone tries to do something a few people just bicker and whine as if they should be in charge of everything...
[01:41:07] <andrewbogott>	 Krenair: I’ve verified that the .php files in the before and after of https://gerrit.wikimedia.org/r/#/c/266453/ are the same.  Willing to merge that patch?
[01:42:21] <grrrit-wm>	 (CR) Alex Monk: [C: 2] Get wikitech private settings from a new location: [mediawiki-config] - https://gerrit.wikimedia.org/r/266453 (https://phabricator.wikimedia.org/T124732) (owner: Andrew Bogott)
[01:42:30] <andrewbogott>	 thank you!
[01:42:54] <grrrit-wm>	 (Merged) jenkins-bot: Get wikitech private settings from a new location: [mediawiki-config] - https://gerrit.wikimedia.org/r/266453 (https://phabricator.wikimedia.org/T124732) (owner: Andrew Bogott)
[01:43:00] <Krenair>	 woops, almost did it from tin :)
[01:43:09] <Krenair>	 I should use the deployment.(eqiad|codfw).wmnet thing
[01:44:37] <Krenair>	 mw1019 HHVM unhappy?
[01:45:00] <logmsgbot>	 !log krenair@mira Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/266453/ (duration: 01m 27s)
[01:45:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:46:09] <Krenair>	 andrewbogott, LGTM
[01:47:08] <andrewbogott>	 yep, looks like wikitech survived.  Thanks.
[01:47:20] <andrewbogott>	 we’ll see what labtestwiki is doing these days...
[01:48:36] <Krenair>	 andrewbogott, not much, it seems?
[01:49:01] <Krenair>	 oh, wait, HSTS
[01:49:21] <Krenair>	 nope, http 500
[01:49:31] <andrewbogott>	 Krenair: it’s still trying to load the old config file which is not there for some reason…
[01:49:40] <andrewbogott>	 I’m syncing and doing a puppet run and we’ll see
[01:49:41] <Krenair>	 oh
[01:49:55] <Krenair>	 I need to run a command
[01:49:56] <andrewbogott>	 PHP Fatal error:  require_once(): Failed opening required '/srv/mediawiki/private/WikitechPrivateLdapSettings.php'
[01:49:59] <andrewbogott>	 seems probably related :)
[01:50:02] <Krenair>	 because we never added it to the scap list
[01:50:13] <andrewbogott>	 I’m doing sync-common on labtestweb2001 right now
[01:50:17] <Krenair>	 so am I
[01:50:21] <andrewbogott>	 great :)
[01:50:39] <andrewbogott>	 I think I’m good with it not getting pushes from scap, since that could clobber development work in progress
[01:50:56] <andrewbogott>	 it loads now
[01:51:01] <wikibugs>	 operations, Discovery, Wikidata, Wikidata-Query-Service: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline. - https://phabricator.wikimedia.org/T124627#1964744 (Smalyshev) OK then, I would then suggest imaging a server in codfw, and once it is complete we can proceed t...
[01:51:41] <Krenair>	 oh dear
[01:51:49] <Krenair>	 sync-common was really not happy
[01:51:51] <andrewbogott>	 hm, [b162446a] 2016-01-26 01:51:38: Fatal exception of type "PasswordError"
[01:51:53] <andrewbogott>	 that’s a new one
[01:52:08] <andrewbogott>	 Krenair: you think doing two at once broke things?
[01:52:11] <Krenair>	 rsync: rename failed for "/srv/mediawiki/extract2.php" (from .~tmp~/extract2.php): No such file or directory (2)
[01:52:11] <Krenair>	 rsync: rename failed for "/srv/mediawiki/mobilelanding.php" (from .~tmp~/mobilelanding.php): No such file or directory (2)
[01:52:11] <Krenair>	 rsync: rename failed for "/srv/mediawiki/wikiversions.json" (from .~tmp~/wikiversions.json): No such file or directory (2)
[01:52:11] <Krenair>	 rsync: rename failed for "/srv/mediawiki/wikiversions.php" (from .~tmp~/wikiversions.php): No such file or directory (2)
[01:52:13] <Krenair>	 etc.
[01:52:23] <andrewbogott>	 I will stop syncing and let you do another one
[01:52:28] <Krenair>	 I ran it again and it finished in 3 seconds
[01:53:05] <andrewbogott>	 ok.
[01:53:16] <Krenair>	 where did you get that exception?
[01:53:27] <andrewbogott>	 creating an account.
[01:53:31] <andrewbogott>	 I’m going to try again
[01:53:45] <Krenair>	 mysql> select user_name from user;
[01:53:46] <Krenair>	 Empty set (0.00 sec)
[01:53:48] <Krenair>	 yeah
[01:54:26] <andrewbogott>	 ok, same results
[01:54:35] <andrewbogott>	 so, we’re somewhere new at least :)
[01:55:04] <Krenair>	 so what was the rest of the exception?
[01:55:19] <andrewbogott>	 tragically, that’s all it says
[01:55:30] <grrrit-wm>	 (PS6) Krinkle: [WIP] Implement /w/static.php [mediawiki-config] - https://gerrit.wikimedia.org/r/263566 (https://phabricator.wikimedia.org/T99096)
[01:55:46] <wikibugs>	 operations, MediaWiki-Cache, MediaWiki-JobQueue, MediaWiki-JobRunner, and 2 others: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan - https://phabricator.wikimedia.org/T124418#1964746 (JanZerebecki) I added it so I can look at the things related to this ticket in one graph (queue s...
[01:57:11] <Krenair>	 andrewbogott, fluorine:/a/mw-log/exception.log has more details
[01:57:32] <andrewbogott>	 'Failed to add user because LDAPSetCreationValues returned false'
[01:57:33] <Krenair>	 2016-01-26 01:51:38 labtestweb2001 labtestwiki exception ERROR: [b162446a] /w/index.php?title=Special:UserLogin&action=submitlogin&type=signup&returnto=Main+Page   PasswordError from line 2389 of /srv/mediawiki/php-1.27.0-wmf.10/includes/user/User.php: There was either an authentication database error or you are not allowed to update your external account. {"exception_id":"b162446a"} 
[01:57:33] <Krenair>	 [Exception PasswordError] (/srv/mediawiki/php-1.27.0-wmf.10/includes/user/User.php:2389) There was either an authentication database error or you are not allowed to update your external account.
[01:57:33] <Krenair>	   #0 /srv/mediawiki/php-1.27.0-wmf.10/includes/specials/SpecialUserlogin.php(676): User->setPassword(string)
[01:57:36] <andrewbogott>	 that’s what ldap says at least
[01:57:51] <Krenair>	 from the ldap extension debug log?
[01:58:04] <andrewbogott>	 oh, wait...
[01:58:06] <Krenair>	 LdapAuthentication.php:				$this->printDebug( "Failed to add user because LDAPSetCreationValues returned false", NONSENSITIVE );
[01:58:13] <andrewbogott>	 from /a/mw-log/ldap.log
[01:58:24] <andrewbogott>	 but that exception is from your attempt, not from mine...
[01:58:29] <andrewbogott>	 in your case the user already existed in ldap
[01:59:25] <Krenair>	 andrewbogott, weren't we using a separate ldap server?
[01:59:27] <Krenair>	 oh, right
[01:59:31] <Krenair>	 because I tried to create my account already
[01:59:40] <Krenair>	 and it failed and broke everything and gave me a user account with an IP as a name
[01:59:45] <andrewbogott>	 also the new ldap server has an almost-complete import of the old one
[01:59:50] <Krenair>	 right
[02:00:24] <Krenair>	 So just before 'Failed to add user because LDAPSetCreationValues returned false' there should be something else
[02:00:29] <andrewbogott>	 there is
[02:00:37] <Krenair>	 One of these:
[02:00:38] <Krenair>	                         $auth->printDebug( "Unable to allocate a UID", NONSENSITIVE );
[02:00:41] <Krenair>	                         $auth->printDebug( "Invalid shell name $shellaccountname", NONSENSITIVE );
[02:00:42] <andrewbogott>	 just a minute, though, I’m going to try to do a fresh test
[02:00:45] <Krenair>	                         $auth->printDebug( "$shellaccountname is not a creatable name.", NONSENSITIVE );
[02:00:46] <andrewbogott>	 with a new username
[02:00:52] <Krenair>	                                 $auth->printDebug( "User $shellaccountname already exists.", NONSENSITIVE );
[02:00:52] <Krenair>	 ok
[02:02:23] <andrewbogott>	 here’s everything:  https://dpaste.de/AT9u
[02:02:25] <andrewbogott>	 not very helpful
[02:03:48] <Krenair>	 2016-01-26 02:01:21 labtestweb2001 labtestwiki ldap INFO: 2.1.0 Failed to bind as cn=proxyagent,ou=profile,dc=wikimedia,dc=org 
[02:03:52] <Krenair>	 that can't be right
[02:03:58] <andrewbogott>	 Successfully added user, and then later… Failed to modify the user's password 
[02:04:06] <andrewbogott>	 oh?  I missed that, that’s something
[02:04:28] <andrewbogott>	 hm
[02:06:16] <andrewbogott>	 well, sure enough, I can’t ldapsearch with that cn and the password from /etc/mediawiki/
[02:06:19] <wikibugs>	 operations, MediaWiki-Cache, MediaWiki-JobQueue, MediaWiki-JobRunner, and 2 others: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan - https://phabricator.wikimedia.org/T124418#1964774 (BBlack) Yeah but the rate increase we're looking at is actually in the htmlCacheUpdate job insert...
[02:06:43] <Krenair>	 andrewbogott, I get ldap_bind: Invalid credentials (49) from terbium
[02:06:54] <Krenair>	 with that password and '-h labtestservices2001.wikimedia.org'
[02:06:58] <andrewbogott>	 yeah, me too
[02:07:15] <Krenair>	 ldapsearch didn't exist for me on labtestweb2001?
[02:07:34] <andrewbogott>	 yeah, ok, I just never updated that password.  Stay tuned :)
[02:07:46] <Krenair>	 I was about to say in the security channel that it's using the public password still, but ok :P
[02:09:35] <andrewbogott>	 ok, password changed
[02:09:42] <andrewbogott>	 so, once more, I will create an account
[02:10:25] <Krenair>	 the password seems to be set correctly now
[02:10:43] <andrewbogott>	 and account creation works!  Or at least reports that it works
[02:11:18] <andrewbogott>	 yeah, I can log in a second time with that account
[02:11:32] <Krenair>	 andrewbogott, btw, what happened with TLS certificates?
[02:11:39] <andrewbogott>	 I made a new one
[02:11:42] <Krenair>	 did you find some way to get it signed in a trusted way?
[02:12:13] <andrewbogott>	 we have a home-made CA for internal services.  That’s what moritz uses, so I signed with the same authority for this.
[02:12:30] <Krenair>	 aha, so there was an internal CA I wasn't aware of :)
[02:12:35] <andrewbogott>	 yeah
[02:12:39] <Krenair>	 I think I suggested making one without considering there might already be one
[02:12:54] <Krenair>	 makes sense for internal-facing stuff like this
[02:12:59] <andrewbogott>	 Well, unfortunately, there are multiples.  So I was waiting on moritz to figure out which one to use
[02:13:04] <Krenair>	 haha
[02:13:08] <Krenair>	 what's the other one?
[02:13:18] <andrewbogott>	 I think there’s one that’s the official “from now on only use this one” ca
[02:13:23] <andrewbogott>	 but I couldn’t tell which was which
[02:13:43] <andrewbogott>	 all that I know is here: https://phabricator.wikimedia.org/T124374
[02:13:47] <andrewbogott>	 (which isn’t much)
[02:14:02] <wikibugs>	 operations, Patch-For-Review: labtestservices2001.wikimedia.org.crt - https://phabricator.wikimedia.org/T124374#1964775 (Andrew) Open>Resolved
[02:14:21] <andrewbogott>	 so… next, I guess is to make a real account and figure out how to make it a cloud-admin...
[02:15:25] <andrewbogott>	 Krenair: want me to delete your existing ldap from labtest user so you can make a fresh one?
[02:15:39] <Krenair>	 yes please
[02:15:47] <Krenair>	 I can promote a user to cloudadmin
[02:17:14] <Krenair>	 andrewbogott, would you be comfortable merging https://gerrit.wikimedia.org/r/#/c/265907/ later?
[02:19:28] <andrewbogott>	 sure
[02:19:41] <andrewbogott>	 ok, I didn’t find a user named ‘krenair’ but I deleted a bunch of test accounts
[02:20:10] <Krenair>	 uid=krenair, cn is Alex Monk
[02:20:44] <Krenair>	 I managed to sign up
[02:21:06] <andrewbogott>	 and if you want to promote me, I’m labtestwikitech
[02:22:20] <Krenair>	 you mean labtestandrew, andrewbogott?
[02:22:41] <andrewbogott>	 yes
[02:23:06] <andrewbogott>	 copied the wrong field from my password vault, fortunately not the password
[02:23:11] <Krenair>	 haha
[02:23:58] <logmsgbot>	 !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 36s)
[02:24:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:24:51] <Krenair>	 > var_dump( User::newFromName( 'labtestandrew' )->addGroup( 'cloudadmin' ) );
[02:24:51] <Krenair>	 bool(true)
[02:25:27] <Krenair>	 ^^ please don't use that sort of thing on normal production wikis, stewards should usually be doing such things there... but this is (lab test) wikitech and the DB/network rules would prevent it
[02:25:49] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] "Verified that this is from the same krenair who can log in to bast1001 :)" [puppet] - 10https://gerrit.wikimedia.org/r/265907 (owner: 10Alex Monk)
[02:25:55] <grrrit-wm>	 (03PS2) 10Andrew Bogott: admin: Replace my prod yubikey SSH key [puppet] - 10https://gerrit.wikimedia.org/r/265907 (owner: 10Alex Monk)
[02:26:09] <Krenair>	 thanks
[02:26:18] <andrewbogott>	 Have to rebase before I can merge
[02:27:06] <andrewbogott>	 Krenair: we also need ‘admin’ or bureaucrat or something in order to bestow rights to other accounts, right?
[02:27:25] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] admin: Replace my prod yubikey SSH key [puppet] - 10https://gerrit.wikimedia.org/r/265907 (owner: 10Alex Monk)
[02:28:05] <Krenair>	 Special:ListGroupRights is the page which shows you which groups can grant/take away what
[02:29:10] <Krenair>	 Huh.
[02:29:18] <Krenair>	 it doesn't appear to be set up correctly.
[02:29:56] <Krenair>	 oh no wait
[02:30:00] <Krenair>	 andrewbogott, so cloudadmin gets userrights
[02:30:07] <Krenair>	 'userrights'
[02:30:15] <Krenair>	 the right in itself which lets you give/remove any local group
[02:30:44] <Krenair>	 so labtestandrew can let anyone do anything on labtestwikitech now, careful with it :p
[02:30:48] <Krenair>	 well, 'anything'
[02:30:59] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 26 02:30:58 UTC 2016 (duration 7m 0s)
[02:31:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:20] <andrewbogott>	 great
[02:31:34] <andrewbogott>	 ok, next step is for me to set up an admin project in keystone, I think...
[02:31:38] <andrewbogott>	 and roles and such
[03:00:43] <Krenair>	 meh... this yubikey is not quite working as expected for anything except slot 9a :(
[03:08:57] <bblack>	 Krenair: yeah apparently it has trouble with switching modes.  kinda driver/OS-dependent how easily it resets
[03:09:29] <bblack>	 if you insert it and use it for just 2FA, everything's fine.  In most cases I've heard of, you can then also use it for 9a ssh stuff without issue.
[03:09:42] <bblack>	 but once you touch 9a ssh stuff, 2FA button pushes are dead
[03:10:07] <Krenair>	 that's not the issue I've found
[03:10:18] <bblack>	 some software can reset that state easily.  e.g. if I remove->reinsert key and launch LastPass and have it query yubi 2FA, it resets fine.
[03:10:46] <bblack>	 but remove->reinsert and go try Google 2FA first, and it fails to do anything useful :/
[03:11:01] <Krenair>	 I can use SSH on slot 9a and I can use it for Google 2FA
[03:11:11] <Krenair>	 but I can't use a separate SSH key on a different slot
[03:11:21] <Krenair>	 I thought I had this working at one point
[03:11:23] <bblack>	 oh I wasn't aware you can do more than one ssh key in a single yubi
[03:11:30] <bblack>	 I don't think that's possible, but I don't know
[03:12:24] <andrewbogott>	 Krenair: I created an instance and can see the console!  It works in project ‘labtestproject’ but not in ‘testlabs’ for some reason.
[03:13:46] <icinga-wm>	 PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[03:14:24] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[03:15:12] <Krenair>	 bblack, do you know what happens when you delete-certificate and reimport cert.pem?
[03:15:30] <bblack>	 Krenair: no idea
[03:16:34] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:17:52] <Krenair>	 Actually I'm beginning to wonder whether I just made this key wrongly somehow
[03:18:04] <icinga-wm>	 RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:28:09] <Krenair>	 Well, this new one seems like it works against my VPS
[03:31:48] <Krenair>	 bblack, andrewbogott: would either of you mind helping me try again?
[03:31:59] <andrewbogott>	 Krenair: new key,  you mean?
[03:32:01] <Krenair>	 yes
[03:32:04] <andrewbogott>	 sure
[03:33:29] <grrrit-wm>	 (03PS1) 10Alex Monk: admin: Replace my prod yubikey SSH key (take 2) [puppet] - 10https://gerrit.wikimedia.org/r/266465 
[03:35:24] <grrrit-wm>	 (03PS2) 10Andrew Bogott: admin: Replace my prod yubikey SSH key (take 2) [puppet] - 10https://gerrit.wikimedia.org/r/266465 (owner: 10Alex Monk)
[03:36:51] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] admin: Replace my prod yubikey SSH key (take 2) [puppet] - 10https://gerrit.wikimedia.org/r/266465 (owner: 10Alex Monk)
[03:37:24] <andrewbogott>	 Krenair: want me to speed up the roll-out of that anyplace?  bast1001?
[03:37:37] <Krenair>	 one of the bastions would be enough for me to test it, thanks
[03:37:47] <andrewbogott>	 ok, refreshing puppet on 1001
[03:37:57] <Krenair>	 I can just wait though, I don't want to waste your time
[03:38:06] <andrewbogott>	 no worries
[03:38:22] <andrewbogott>	 I’m about to head out though… hopefully this one takes :)
[03:38:53] <Krenair>	 andrewbogott, it worked
[03:38:53] <andrewbogott>	 ok, bast1001 should have the new key now.
[03:38:56] <andrewbogott>	 cool
[03:39:24] <andrewbogott>	 ok, I’m off.  I’m excited to have labtestwikitech up and running now — thanks for all your help with that.
[03:39:53] <Krenair>	 you're welcome
[03:52:27] <grrrit-wm>	 (03CR) 10Legoktm: [C: 031] Get rid of $wg = $wmg for Graph [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266433 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson)
[03:53:42] <grrrit-wm>	 (03CR) 10Legoktm: Get rid of $wg = $wmg for Graph (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266433 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson)
[04:27:29] <legoktm>	 !log restarted resetGlobalUserTokens.php after it lost mysql connection again
[04:27:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:29:52] <grrrit-wm>	 (03PS2) 10Dereckson: Use extension registration for Graph [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266433 (https://phabricator.wikimedia.org/T119117) 
[04:36:20] <grrrit-wm>	 (03PS1) 10Dereckson: Get rid of $wg = $wmg for BetaFeatures [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266470 (https://phabricator.wikimedia.org/T119117) 
[06:05:54] <icinga-wm>	 PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:16] <icinga-wm>	 PROBLEM - puppet last run on mw2052 is CRITICAL: CRITICAL: puppet fail
[06:31:25] <icinga-wm>	 PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:46] <icinga-wm>	 PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:05] <icinga-wm>	 PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:06] <icinga-wm>	 PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:07] <icinga-wm>	 PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:25] <icinga-wm>	 PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:35] <icinga-wm>	 PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:45] <icinga-wm>	 PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:15] <icinga-wm>	 RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:56:35] <icinga-wm>	 RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:57:06] <icinga-wm>	 RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:15] <icinga-wm>	 RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:57:16] <icinga-wm>	 RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:57:34] <icinga-wm>	 RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:57:45] <icinga-wm>	 RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:55] <icinga-wm>	 RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:58:35] <icinga-wm>	 RECOVERY - puppet last run on mw2052 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:59:05] <icinga-wm>	 RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:15:35] <icinga-wm>	 PROBLEM - configured eth on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:15:55] <icinga-wm>	 PROBLEM - dhclient process on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:16:44] <icinga-wm>	 PROBLEM - puppet last run on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:16:44] <icinga-wm>	 PROBLEM - RAID on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:16:56] <icinga-wm>	 PROBLEM - nutcracker port on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:16:56] <icinga-wm>	 PROBLEM - salt-minion processes on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:17:04] <icinga-wm>	 PROBLEM - SSH on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:17:14] <icinga-wm>	 PROBLEM - DPKG on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:18:04] <icinga-wm>	 RECOVERY - dhclient process on mw1161 is OK: PROCS OK: 0 processes with command name dhclient
[08:18:55] <icinga-wm>	 RECOVERY - nutcracker port on mw1161 is OK: TCP OK - 0.000 second response time on port 11212
[08:18:55] <icinga-wm>	 RECOVERY - salt-minion processes on mw1161 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:18:56] <icinga-wm>	 RECOVERY - SSH on mw1161 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.4 (protocol 2.0)
[08:19:14] <icinga-wm>	 RECOVERY - DPKG on mw1161 is OK: All packages OK
[08:19:35] <icinga-wm>	 PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: puppet fail
[08:19:45] <icinga-wm>	 RECOVERY - configured eth on mw1161 is OK: OK - interfaces up
[08:22:21] <wikibugs>	 6operations, 10ops-eqiad, 5Patch-For-Review: mw1172, mw1178,mw1217, mw1228, mw1257 are unresponsive, mgmt interface unreachable - https://phabricator.wikimedia.org/T124642#1965069 (10Joe)
[08:23:10] <wikibugs>	 6operations, 10ops-eqiad, 5Patch-For-Review: mw1172, mw1178,mw1217,  mw1257 are unresponsive, mgmt interface unreachable - https://phabricator.wikimedia.org/T124642#1965071 (10Joe)
[08:24:06] <icinga-wm>	 ACKNOWLEDGEMENT - Host mw1257 is DOWN: PING CRITICAL - Packet loss = 100% Giuseppe Lavagetto T124642
[08:24:06] <icinga-wm>	 ACKNOWLEDGEMENT - Host mw1217 is DOWN: PING CRITICAL - Packet loss = 100% Giuseppe Lavagetto T124642
[08:24:06] <icinga-wm>	 ACKNOWLEDGEMENT - Host mw1178 is DOWN: PING CRITICAL - Packet loss = 100% Giuseppe Lavagetto T124642
[08:24:06] <icinga-wm>	 ACKNOWLEDGEMENT - Host mw1172 is DOWN: PING CRITICAL - Packet loss = 100% Giuseppe Lavagetto T124642
[08:25:14] <icinga-wm>	 ACKNOWLEDGEMENT - Host mw1228 is DOWN: PING CRITICAL - Packet loss = 100% Giuseppe Lavagetto T122005
[08:25:24] <icinga-wm>	 PROBLEM - Disk space on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:25:35] <icinga-wm>	 PROBLEM - SSH on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:27:25] <icinga-wm>	 RECOVERY - Disk space on mw1161 is OK: DISK OK
[08:29:54] <icinga-wm>	 PROBLEM - nutcracker port on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:29:54] <icinga-wm>	 PROBLEM - salt-minion processes on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:31:09] <grrrit-wm>	 (03PS4) 10Elukey: Add the moving average function to the event logging's insert rate alarming metric. Bug: T124204 [puppet] - 10https://gerrit.wikimedia.org/r/266264 (https://phabricator.wikimedia.org/T124204) 
[08:32:05] <icinga-wm>	 RECOVERY - nutcracker port on mw1161 is OK: TCP OK - 0.000 second response time on port 11212
[08:34:05] <icinga-wm>	 PROBLEM - Disk space on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:38:35] <icinga-wm>	 PROBLEM - nutcracker port on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:40:24] <icinga-wm>	 RECOVERY - Disk space on mw1161 is OK: DISK OK
[08:42:12] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Add the moving average function to the event logging's insert rate alarming metric. Bug: T124204 [puppet] - 10https://gerrit.wikimedia.org/r/266264 (https://phabricator.wikimedia.org/T124204) (owner: 10Elukey)
[08:42:24] <icinga-wm>	 RECOVERY - RAID on mw1161 is OK: OK: no RAID installed
[08:42:44] <icinga-wm>	 RECOVERY - salt-minion processes on mw1161 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:42:44] <icinga-wm>	 RECOVERY - nutcracker port on mw1161 is OK: TCP OK - 0.000 second response time on port 11212
[08:47:14] <icinga-wm>	 PROBLEM - DPKG on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:47:24] <icinga-wm>	 RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:48:44] <icinga-wm>	 PROBLEM - RAID on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:49:45] <icinga-wm>	 PROBLEM - configured eth on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:51:04] <icinga-wm>	 PROBLEM - Disk space on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:51:24] <icinga-wm>	 RECOVERY - DPKG on mw1161 is OK: All packages OK
[08:51:54] <icinga-wm>	 RECOVERY - configured eth on mw1161 is OK: OK - interfaces up
[08:53:06] <icinga-wm>	 RECOVERY - Disk space on mw1161 is OK: DISK OK
[08:59:45] <icinga-wm>	 PROBLEM - salt-minion processes on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:45] <icinga-wm>	 PROBLEM - nutcracker port on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:01:18] <grrrit-wm>	 (03PS1) 10Ema: esams: add text nodes to mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266475 (https://phabricator.wikimedia.org/T109286) 
[09:01:55] <icinga-wm>	 PROBLEM - Disk space on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:02:16] <icinga-wm>	 PROBLEM - DPKG on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:02:25] <icinga-wm>	 PROBLEM - puppet last run on mw2024 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:02:55] <grrrit-wm>	 (03Abandoned) 10Giuseppe Lavagetto: neodymium: add role::deployment::salt_masters [puppet] - 10https://gerrit.wikimedia.org/r/266218 (owner: 10Giuseppe Lavagetto)
[09:06:15] <icinga-wm>	 RECOVERY - nutcracker port on mw1161 is OK: TCP OK - 0.000 second response time on port 11212
[09:06:35] <icinga-wm>	 RECOVERY - DPKG on mw1161 is OK: All packages OK
[09:07:31] <wikibugs>	 6operations: reinstall eqiad memcache servers with jessie - https://phabricator.wikimedia.org/T123711#1965105 (10Joe) All memcached hosts have both memcached and the session-related redis.  So reinstalling them has a small but non-trivial effect: when a server goes down, we lose 1/18th of the current user sessio...
[09:08:06] <icinga-wm>	 RECOVERY - Disk space on mw1161 is OK: DISK OK
[09:12:14] <icinga-wm>	 RECOVERY - RAID on mw1161 is OK: OK: no RAID installed
[09:12:35] <icinga-wm>	 RECOVERY - SSH on mw1161 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.4 (protocol 2.0)
[09:12:44] <icinga-wm>	 RECOVERY - salt-minion processes on mw1161 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:20:45] <icinga-wm>	 PROBLEM - RAID on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:21:14] <icinga-wm>	 PROBLEM - nutcracker port on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:21:14] <icinga-wm>	 PROBLEM - salt-minion processes on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:21:42] <grrrit-wm>	 (03PS2) 10Muehlenhoff: Remove debdeploy::master from palladium [puppet] - 10https://gerrit.wikimedia.org/r/266219 
[09:21:51] <grrrit-wm>	 (03CR) 10Muehlenhoff: [C: 032 V: 032] Remove debdeploy::master from palladium [puppet] - 10https://gerrit.wikimedia.org/r/266219 (owner: 10Muehlenhoff)
[09:21:54] <icinga-wm>	 PROBLEM - configured eth on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:23:15] <icinga-wm>	 RECOVERY - salt-minion processes on mw1161 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:23:15] <icinga-wm>	 RECOVERY - nutcracker port on mw1161 is OK: TCP OK - 0.000 second response time on port 11212
[09:23:16] <icinga-wm>	 PROBLEM - SSH on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:23:35] <icinga-wm>	 PROBLEM - DPKG on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:24:54] <icinga-wm>	 RECOVERY - RAID on mw1161 is OK: OK: no RAID installed
[09:25:15] <icinga-wm>	 RECOVERY - SSH on mw1161 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.4 (protocol 2.0)
[09:25:25] <icinga-wm>	 RECOVERY - DPKG on mw1161 is OK: All packages OK
[09:25:55] <icinga-wm>	 RECOVERY - configured eth on mw1161 is OK: OK - interfaces up
[09:28:04] <grrrit-wm>	 (03CR) 10Hoo man: [C: 031] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264461 (owner: 10Suriyaa Kudo)
[09:28:50] <_joe_>	 !log finishing reboots of appservers in eqiad
[09:28:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:29:04] <icinga-wm>	 RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:30:04] <icinga-wm>	 RECOVERY - puppet last run on mw2024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:30:53] <hashar>	 !log restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
[09:30:56] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:36:17] <icinga-wm>	 PROBLEM - HHVM rendering on mw1048 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:36:26] <icinga-wm>	 PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:16] <icinga-wm>	 RECOVERY - HHVM rendering on mw1048 is OK: HTTP OK: HTTP/1.1 200 OK - 64744 bytes in 0.113 second response time
[09:38:17] <icinga-wm>	 RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.041 second response time
[09:38:56] <icinga-wm>	 PROBLEM - Host mw1171 is DOWN: PING CRITICAL - Packet loss = 100%
[09:40:36] <icinga-wm>	 RECOVERY - Host mw1171 is UP: PING OK - Packet loss = 0%, RTA = 1.37 ms
[09:42:37] <icinga-wm>	 PROBLEM - Host mw1111 is DOWN: PING CRITICAL - Packet loss = 100%
[09:42:56] <icinga-wm>	 PROBLEM - puppet last run on mw2078 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:43:36] <icinga-wm>	 RECOVERY - Host mw1111 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[09:58:46] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[09:59:58] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[10:06:03] <_joe_>	 uh let's see this
[10:06:21] <wikibugs>	 6operations, 10Salt: Salt minions randomly crashing when the deployment server grain gets changed - https://phabricator.wikimedia.org/T124646#1965213 (10ArielGlenn) Forgot to mention, this is actually an issue with the pillar refresh after the grain is set.
[10:06:27] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:06:46] <_joe_>	 ema, elukey when you see those alarms about 5xx reqs/min you should look at https://grafana-admin.wikimedia.org/dashboard/db/varnish-http-errors
[10:07:24] <_joe_>	 as you can see from the 4th graph (HTTP 5xx Responses)
[10:07:27] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:07:33] <_joe_>	 it was just a spike
[10:07:43] <_joe_>	 if it wasn't, it's worth investigating more
[10:08:36] <icinga-wm>	 RECOVERY - puppet last run on mw2078 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:09:37] <yurik>	 hmm, for a moment i couldn't get to mediawiki.org
[10:09:51] <yurik>	 too bad i refreshed and didn't copy the bottom error message
[10:10:03] <_joe_>	 yurik: was it a 503?
[10:10:07] <yurik>	 i think so
[10:13:27] <icinga-wm>	 PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:14:27] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[10:14:47] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[10:14:49] <grrrit-wm>	 (03PS1) 10ArielGlenn: make default log rotation for apache be 30 days [puppet] - 10https://gerrit.wikimedia.org/r/266480 
[10:15:16] <icinga-wm>	 PROBLEM - Mobile HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:15:28] <grrrit-wm>	 (03Abandoned) 10ArielGlenn: apache: keep two weeks' worth of logs, rather than 1yr [puppet] - 10https://gerrit.wikimedia.org/r/130296 (owner: 10ArielGlenn)
[10:16:14] <grrrit-wm>	 (03Abandoned) 10Muehlenhoff: Move debdeploy::master off palladium [puppet] - 10https://gerrit.wikimedia.org/r/266215 (owner: 10Muehlenhoff)
[10:18:53] <wikibugs>	 6operations, 10Salt: Move salt master to separate host from puppet master - https://phabricator.wikimedia.org/T115287#1965246 (10ArielGlenn) git-deploy moved to neodymium yesterday, debdeploy was moved by moritz today.  Giving a couple of days for any problems to shake out, on Thursday palladium will be remove...
[10:19:27] <icinga-wm>	 RECOVERY - Mobile HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:20:47] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:20:54] <ema>	 _joe_: similar spike in esams, what could be the cause?
[10:21:07] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:22:47] <wikibugs>	 6operations, 10Salt: Move salt master to separate host from puppet master - https://phabricator.wikimedia.org/T115287#1965250 (10Joe) Please note that we will need to find a way to allow salt key signing during the reimaging/imaging of a server; this isn't a blocker for the decommission of palladium, but I'd k...
[10:30:31] <wikibugs>	 7Puppet, 6operations, 10Salt: Make it possible for wmf-reimage to work seamlessly with a non-local salt master - https://phabricator.wikimedia.org/T124761#1965260 (10Joe) 3NEW a:3ArielGlenn
[10:31:10] <wikibugs>	 7Puppet, 6operations, 10Salt: Make it possible for wmf-reimage to work seamlessly with a non-local salt master - https://phabricator.wikimedia.org/T124761#1965260 (10Joe) a:5ArielGlenn>3Joe
[10:32:35] <wikibugs>	 6operations, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Switchover of the application servers to codfw - https://phabricator.wikimedia.org/T124671#1965272 (10Joe)
[10:35:16] <icinga-wm>	 PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 782
[10:37:18] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Use the logical redis definition for GettingStarted. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266481 (https://phabricator.wikimedia.org/T124671) 
[10:39:07] <icinga-wm>	 RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[10:40:16] <icinga-wm>	 RECOVERY - check_mysql on db1008 is OK: Uptime: 586915 Threads: 2 Questions: 4505529 Slow queries: 3909 Opens: 1610 Flush tables: 2 Open tables: 417 Queries per second avg: 7.676 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:40:51] <wikibugs>	 6operations, 10CirrusSearch, 6Discovery, 7Elasticsearch: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#1965282 (10faidon) >>! In T124444#1964147, @EBernhardson wrote: > I realizes it's a ton more work, hardware, and I honestly don't even know what would be involved....
[10:43:03] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 031] esams: add text nodes to mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266475 (https://phabricator.wikimedia.org/T109286) (owner: 10Ema)
[10:44:03] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] esams: add text nodes to mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266475 (https://phabricator.wikimedia.org/T109286) (owner: 10Ema)
[10:46:00] <grrrit-wm>	 (03PS2) 10Ema: esams: add text nodes to mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266475 (https://phabricator.wikimedia.org/T109286) 
[10:46:18] <grrrit-wm>	 (03CR) 10Ema: [C: 032 V: 032] esams: add text nodes to mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266475 (https://phabricator.wikimedia.org/T109286) (owner: 10Ema)
[10:50:51] <ema>	 !log Starting migration of mobile traffic to text cluster in esams https://phabricator.wikimedia.org/T109286
[10:50:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:10:35] <wikibugs>	 6operations, 10MediaWiki-Cache, 10MediaWiki-JobQueue, 10MediaWiki-JobRunner, and 2 others: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan - https://phabricator.wikimedia.org/T124418#1965334 (10Addshore) As far as I can tell in Wikibase....     - WikiPgaeUpdater::scheduleRefereshLinks creat...
[11:23:26] <icinga-wm>	 PROBLEM - Host multatuli is DOWN: PING CRITICAL - Packet loss = 100%
[11:24:07] <icinga-wm>	 RECOVERY - Host multatuli is UP: PING OK - Packet loss = 0%, RTA = 85.94 ms
[11:41:05] <moritzm>	 multatuli was me (reboot I forgot to ack in icinga)
[11:46:07] <moritzm>	 !log rebooting bromine for kernel update
[11:46:10] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:50:52] <moritzm>	 !log rebooting etherpad1001 for kernel update
[11:50:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:05:11] <wikibugs>	 6operations, 10ContentTranslation-cxserver, 6Services, 10Traffic: Remove cxserver from parsoidcache cluster - https://phabricator.wikimedia.org/T110478#1965406 (10BBlack) Does that imply that **nothing** should be using the hostnames `cxserver.wikimedia.org` and/or `cxserver.eqiad.wikimedia.org`, which map...
[12:05:41] <wikibugs>	 6operations, 10Graphoid, 6Services, 10Traffic: Remove graphoid from parsoidcache - https://phabricator.wikimedia.org/T110477#1965407 (10BBlack) Are things still using the hostnames `graphoid.wikimedia.org` and/or `graphoid.eqiad.wikimedia.org`, which map to the cache_parsoid cluster rather than through res...
[12:06:20] <wikibugs>	 6operations, 10Citoid, 6Services, 10Traffic: Remove citoid from parsoidcache - https://phabricator.wikimedia.org/T110476#1965408 (10BBlack) Are things still using the hostnames `citoid.wikimedia.org` and/or `citoid.eqiad.wikimedia.org`, which map to the cache_parsoid cluster rather than through restbase?
[12:06:58] <wikibugs>	 6operations, 10RESTBase, 6Services, 10Traffic: Remove restbase from parsoidcache - https://phabricator.wikimedia.org/T110475#1965410 (10BBlack) Are things still using the hostnames `rest.wikimedia.org` and/or `restbase.wikimedia.org` and/or `restbase.eqiad.wikimedia.org`, which map to the cache_parsoid clu...
[12:08:55] <grrrit-wm>	 (03PS1) 10BBlack: VCL: do not use illegal "trusted" XFF values for XCIP [puppet] - 10https://gerrit.wikimedia.org/r/266486 (https://phabricator.wikimedia.org/T120121) 
[12:10:11] <moritzm>	 !log rebooting mx2001/mx1001 (with a delay in between) for kernel update
[12:10:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:29:06] <icinga-wm>	 PROBLEM - puppet last run on ganeti2002 is CRITICAL: CRITICAL: Puppet has 3 failures
[12:29:16] <icinga-wm>	 PROBLEM - NTP peers on nescio is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
[12:31:13] <grrrit-wm>	 (03PS1) 10BBlack: make cache_parsoid LVS IPs slightly more intuitive [puppet] - 10https://gerrit.wikimedia.org/r/266488 
[12:31:15] <grrrit-wm>	 (03PS1) 10BBlack: cache_parsoid: use local backends in codfw [puppet] - 10https://gerrit.wikimedia.org/r/266489 
[12:34:31] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] make cache_parsoid LVS IPs slightly more intuitive [puppet] - 10https://gerrit.wikimedia.org/r/266488 (owner: 10BBlack)
[12:39:38] <akosiaris>	 !log rolling reboot of ganeti200{1,2,3,4,5,6}.codfw.wmnet for kernel upgrade
[12:39:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:42:48] <icinga-wm>	 PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: puppet fail
[12:44:26] <icinga-wm>	 RECOVERY - NTP peers on nescio is OK: NTP OK: Offset 7e-06 secs
[12:48:19] <wikibugs>	 6operations, 10Graphoid, 6Services, 10Traffic: Remove graphoid from parsoidcache - https://phabricator.wikimedia.org/T110477#1965474 (10Yurik) Not to my knowledge. I sometimes use it for debugging, eg when restbase has a bad day, but I can ssh directly
[12:54:36] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[12:54:36] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[12:54:56] <bblack>	 ^ me
[12:56:46] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[12:56:46] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[13:03:57] <wikibugs>	 6operations, 10Parsoid: Need databases provisioned for parsoid-rt testing, visual diff testing - https://phabricator.wikimedia.org/T124703#1965507 (10jcrespo) The database has been exported, the 3 databases are being imported now into m5-master.
[13:05:17] <grrrit-wm>	 (03PS1) 10Jcrespo: Depool pc1002 for maintenance (clone to pc1005) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266490 (https://phabricator.wikimedia.org/T121888) 
[13:10:58] <icinga-wm>	 RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[13:11:38] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Depool pc1002 for maintenance (clone to pc1005) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266490 (https://phabricator.wikimedia.org/T121888) (owner: 10Jcrespo)
[13:14:23] <logmsgbot>	 !log jynus@mira Synchronized wmf-config/db-eqiad.php: Depool pc1002 for maintenance (clone to pc1005) (duration: 01m 39s)
[13:14:26] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:21:47] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[13:23:45] <jynus>	 I am probably affecting some job working on terbium if mediawiki doesn't know how to connect, but I really need to bring down pc1002, and I cannot wait
[13:25:57] <grrrit-wm>	 (03PS2) 10Bmansurov: Add sampling rates for mobile web language switcher [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265292 (https://phabricator.wikimedia.org/T123932) 
[13:30:04] <icinga-wm>	 PROBLEM - mysqld processes on pc1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[13:30:52] <icinga-wm>	 PROBLEM - mysqld processes on pc1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[13:31:00] <_joe_>	 jynus: is this expected?
[13:31:04] <_joe_>	 (1005)
[13:31:09] <_joe_>	 if so, disregard
[13:31:23] <akosiaris>	 both of them at the same time ?
[13:31:31] <apergos>	 1002 is him
[13:31:53] <apergos>	 clone to pc1005
[13:31:58] <apergos>	 I see.  I bet they are both him
[13:32:27] <akosiaris>	 ok
[13:32:40] <moritzm>	 !log rebooted nescio/maerlant for kernel update
[13:32:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:33:40] <icinga-wm>	 ACKNOWLEDGEMENT - High load average on labstore1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [24.0] cpettet yes I see
[13:38:49] <jynus>	 oh s***
[13:39:41] <bblack>	 that must be "oh sure" as in everything's fine, right? :P
[13:39:46] <apergos>	 :-D
[13:42:13] <jynus>	 yes, everything is fine
[13:43:04] <jynus>	 there are 2 factors here, I am getting older
[13:43:21] <jynus>	 and I have too many things going on at the same time
[13:44:32] <jynus>	 the third is- where is my expert-system controlled alter system?
[13:44:53] <apergos>	 don't even try that 'getting older' thing on me, young whipper-snapper! 
[13:45:26] <icinga-wm>	 PROBLEM - Host alsafi is DOWN: PING CRITICAL - Packet loss = 100%
[13:47:53] <jynus>	 at this point I am happy with bringing down the right hosts, and not the wrong ones
[13:48:04] <mark>	 should I get concerned?
[13:48:38] <jynus>	 no, it was regular maintenance that was not downtimed properly
[13:48:42] <jynus>	 sorry
[13:49:29] <jynus>	 pc1002 is the depooled host, pc1005 is the new host, I am cloning them
[13:49:47] <jynus>	 before decommissioning pc1002 and pooling pc1005
[13:50:24] <jynus>	 there is a better solution- paging based on service, not on servers
[13:50:26] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[13:51:17] <icinga-wm>	 PROBLEM - DPKG on cp4019 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:51:42] <jynus>	 to be clear here, there was no user notice at all
[13:53:36] <icinga-wm>	 PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:55:08] <jynus>	 I think I inherited a page config useful for working in a different timezone
[13:55:27] <apergos>	 probably
[13:55:33] <apergos>	 easy to change that
[13:55:57] <apergos>	 so I'm trying root@ganeti2001:~# gnt-instance console alsafi.wikimedia.org
[13:55:57] <apergos>	    given that ssh in fails
[13:56:04] <apergos>	 and it's not doing much.  any ideas?
[13:56:20] * apergos eyes akosiaris
[13:58:50] <akosiaris>	 hmm
[13:59:59] <apergos>	 I saw a note in SAL from mutante yesterday that he 'logged in as though it were hibernating' and it came back up
[14:01:19] <akosiaris>	 if it's that, some VMs need to be rebooted to get the KVM disk_aio setting applied and alsafi is probably one of them
[14:01:48] <apergos>	 oh
[14:02:06] <akosiaris>	 but it doesn't look like that though
[14:02:19] <akosiaris>	 I just migrated it though, lemme check
[14:02:23] <akosiaris>	 I might be the cause
[14:02:58] <apergos>	 ok
[14:03:10] <apergos>	 I gave up on the console ting, it hung forever
[14:03:12] <apergos>	 *thing
[14:03:36] <akosiaris>	 alsafi login: 
[14:03:36] <akosiaris>	 Debian GNU/Linux 8 alsafi ttyS0
[14:03:40] <akosiaris>	 nope it did not
[14:03:43] <akosiaris>	 I just got a console
[14:03:49] <apergos>	 well I sure did not. meh
[14:03:54] <akosiaris>	 so this is probably network related misconfiguration on my side
[14:04:19] <apergos>	 I was on ganeti2001 at the time, was that a mistake?
[14:04:30] <akosiaris>	 nope
[14:04:34] <apergos>	 huh
[14:04:49] <akosiaris>	 did you press a couple of enters though ?
[14:04:55] <apergos>	 no
[14:05:04] <apergos>	 one but not two
[14:05:16] <akosiaris>	 one should have been enough
[14:05:36] <akosiaris>	 ok definitely network related 
[14:05:45] <apergos>	 I'm repeating the experiment with the same results
[14:05:56] <akosiaris>	 oh, I am attached to it right now
[14:06:00] <apergos>	 ah :-D
[14:06:01] <apergos>	 nm then
[14:08:20] <icinga-wm>	 ACKNOWLEDGEMENT - Host mw2173 is DOWN: PING CRITICAL - Packet loss = 100% Giuseppe Lavagetto T124408
[14:08:46] <icinga-wm>	 PROBLEM - Host mx2001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:11:01] <akosiaris>	 that's the exact same issue
[14:11:05] <akosiaris>	 ^
[14:11:12] <akosiaris>	 fixing both as we speak
[14:11:18] <apergos>	 awesome
[14:12:26] <icinga-wm>	 RECOVERY - Host mx2001 is UP: PING WARNING - Packet loss = 86%, RTA = 384.85 ms
[14:13:07] <icinga-wm>	 RECOVERY - Host alsafi is UP: PING OK - Packet loss = 0%, RTA = 36.21 ms
[14:13:45] <apergos>	 \o/
[14:14:11] <akosiaris>	 !log migrate alsafi,mx2001 back from ganeti2004 to fix a network misconfiguration
[14:14:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:14:34] <akosiaris>	 wrong assumption on my part btw
[14:14:47] <apergos>	 oh?
[14:15:18] <akosiaris>	 I never put eth0.2002 into the auto lo line in e/n/i. I assumed vlan2002, which depends on it, would bring the slave interface up as well
[14:15:40] <apergos>	 ouch
[14:16:17] <akosiaris>	 yeah, that part needs some better puppetization. And it is actually possible these days in jessie
[14:16:24] <akosiaris>	 using /etc/network/interfaces.d/
[14:16:36] <akosiaris>	 I'll start concocting something to handle these things better
[14:16:37] <apergos>	 another impetus to get stuff moved
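The failure mode described above (a VLAN sub-interface defined but never listed on an `auto` line, so it is not brought up at boot) can be caught with a small check. This is only a sketch: the sample stanza is made up for illustration and is not the actual configuration of the ganeti hosts, and `auto_interfaces` / `missing_auto` are hypothetical helper names.

```python
def auto_interfaces(config_text):
    """Return the set of interface names listed on 'auto' / 'allow-auto' lines."""
    names = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # interfaces(5) only supports comments on lines of their own
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] in ("auto", "allow-auto"):
            names.update(fields[1:])
    return names


def missing_auto(config_text, required):
    """Return the required interfaces that no 'auto' line brings up, sorted."""
    return sorted(set(required) - auto_interfaces(config_text))


# Hypothetical sample: eth0.2002 has an iface stanza but no 'auto' entry.
SAMPLE = """\
auto lo
iface lo inet loopback

iface eth0.2002 inet manual

auto vlan2002
iface vlan2002 inet manual
"""

print(missing_auto(SAMPLE, ["lo", "eth0.2002", "vlan2002"]))
```

A check like this could run against each file under /etc/network/interfaces.d/ once that part is puppetized; which interfaces count as required is an assumption the caller has to supply.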
[14:17:34] <akosiaris>	 the thing daniel talked about is evident here btw https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=alsafi.wikimedia.org&m=cpu_report&s=descending&mc=2&g=load_report&c=Miscellaneous+codfw
[14:17:46] <icinga-wm>	 PROBLEM - puppet last run on mx2001 is CRITICAL: CRITICAL: puppet fail
[14:17:47] <icinga-wm>	 PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: puppet fail
[14:18:14] <akosiaris>	 you get to see a huge load increase for no apparent reason. Then https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=alsafi.wikimedia.org&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Miscellaneous+codfw
[14:18:19] <akosiaris>	 and you see the IOwait
[14:18:50] <akosiaris>	 so that seems to be fixed by disk_aio=native, which I've applied throughout the clusters, but we still need a reboot in some VMs
[14:18:59] <akosiaris>	 after that we will hopefully not see it ever again
[14:21:10] <akosiaris>	 !log migrating alsafi,mx2001 back to 2004 for testing
[14:21:13] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:21:31] <apergos>	 what on earth was it doing during that time to have that high load
[14:21:37] <akosiaris>	 nothing
[14:21:41] <akosiaris>	 it's a kvm/qemu bug
[14:21:46] <apergos>	 ahahaha
[14:22:09] <apergos>	 and the native setting is the workaround then
[14:22:13] <akosiaris>	 yes
[14:23:04] <apergos>	 which vms still need a reboot, can they be on a list to be scheduled? 
[14:24:12] <akosiaris>	 quite a few (more than 70%), but yeah, I'll create one. I've been testing the workaround and I think it works fine, so... what better time to do it than now?
[14:24:22] <apergos>	 +1
[14:26:41] <bblack>	 debian gurus? how do you *really* force apt-get to *never* ask questions at all (even if that means it has to just fail)
[14:27:06] <bblack>	 I tried:
[14:27:10] <bblack>	 apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold upgrade </dev/null
[14:27:17] <bblack>	 (and it's running via salt, so it has no terminal anyways)
[14:27:35] <bblack>	 and it's still hanging on asking a question that a package wants answered, which has an appropriate default answer and just needs enter pressed :P
[14:28:31] <ema>	 DEBIAN_FRONTEND
[14:28:37] <icinga-wm>	 RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[14:28:39] <ema>	 maybe? :)
[14:28:54] <bblack>	 no idea :)
[14:29:00] <ema>	 export DEBIAN_FRONTEND='noninteractive' or something like that
[14:29:08] <apergos>	 oh yeah it's a setting like that
[14:29:14] <apergos>	 used in docker containers a lot
[14:29:14] <bblack>	 ok
[14:30:02] * bblack grumbles something about how that should be a simple flag like "--non-interactive", which should maybe activate itself when there's no terminal attached to the initial command
[14:30:04] <apergos>	 that is exactly the right var and setting
[14:30:22] * bblack and wonders why apt-get actually creates a fake terminal for the dpkg it invokes...
[14:32:17] <apergos>	 I got no answer for that
[14:32:42] <apergos>	 I wonder how much stuff in software is "oh yeah we never removed that workaround, it's obsolete"
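The suggestion above combines the `DEBIAN_FRONTEND=noninteractive` environment variable with the dpkg options bblack already tried. A minimal sketch of building that invocation follows; `noninteractive_apt` is a hypothetical helper name, and the snippet only constructs the command and environment rather than running apt-get.

```python
import os


def noninteractive_apt(action="upgrade", base_env=None):
    """Build (argv, env) for an apt-get run that should not prompt."""
    env = dict(os.environ if base_env is None else base_env)
    # Tells debconf not to ask questions and to use default answers
    env["DEBIAN_FRONTEND"] = "noninteractive"
    argv = [
        "apt-get", "-y",
        # Same dpkg options as in the command quoted in the log
        "-o", "Dpkg::Options::=--force-confdef",
        "-o", "Dpkg::Options::=--force-confold",
        action,
    ]
    return argv, env


argv, env = noninteractive_apt()
print("DEBIAN_FRONTEND=" + env["DEBIAN_FRONTEND"], " ".join(argv))
# To actually run it (needs root), something like:
#   subprocess.run(argv, env=env, stdin=subprocess.DEVNULL, check=True)
```

Passing the variable through `env` rather than exporting it globally keeps the setting scoped to the one command, which matters when the call is made from a remote-execution tool such as salt.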
[14:34:37] <jynus>	 I think I now remember I left a screen session on some mysql server doing an important task
[14:34:57] <icinga-wm>	 RECOVERY - DPKG on cp4019 is OK: All packages OK
[14:35:10] <jynus>	 I only now have to find which one
[14:35:22] <jynus>	 this was before vacations
[14:39:12] <grrrit-wm>	 (03PS1) 10Alex Monk: Change ukwikinews logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266497 (https://phabricator.wikimedia.org/T124778) 
[14:39:34] <bblack>	 !log upgrading packages (incl kernel) on all ulsfo caches (cp4xxx)
[14:39:37] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:41:37] <icinga-wm>	 RECOVERY - puppet last run on mx2001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[14:42:12] <grrrit-wm>	 (03PS4) 10Rush: Enable RPS on eth0 on labstores [puppet] - 10https://gerrit.wikimedia.org/r/261598 (owner: 10Mark Bergsma)
[14:42:28] <apergos>	 jynus: it's not on db* because none of them have screen running on them :-P
[14:43:41] <grrrit-wm>	 (03CR) 10Rush: [C: 032] Enable RPS on eth0 on labstores [puppet] - 10https://gerrit.wikimedia.org/r/261598 (owner: 10Mark Bergsma)
[14:44:31] <jynus>	 apergos, that is false, I am running 1 right now
[14:44:36] <apergos>	 where?
[14:44:52] <apergos>	 jynus: 
[14:46:11] <jynus>	 https://phabricator.wikimedia.org/P2527
[14:47:40] <jynus>	 been there for 2 hours at least
[14:48:03] <apergos>	 ugh can't help it if debian / ubuntu calls the process SCREEN instead of screen :-/
[14:48:06] <icinga-wm>	 RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[14:48:06] <chasemp>	 !log RPS on eth0 on labstores
[14:48:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:11] <jynus>	 lol
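The miss above came from matching the process name `screen` case-sensitively while the hosts report it as `SCREEN`. A sketch of a case-insensitive match over `ps` output; the sample text and user names are invented, and `screen_sessions` is a hypothetical helper.

```python
def screen_sessions(ps_output):
    """Return (user, command) pairs whose command name is 'screen' in any case.

    Expects one "user command" pair per line, for example the output of
    `ps -e -o user= -o comm=`.
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        user, comm = parts[0], parts[1].strip()
        if comm.lower() == "screen":
            found.append((user, comm))
    return found


# Invented sample output, not taken from the db hosts in the log.
SAMPLE_PS = """\
root     SCREEN
alice    SCREEN
mysql    mysqld
bob      screen
"""

print(screen_sessions(SAMPLE_PS))
```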
[14:48:39] <apergos>	 huh some of these are running as springle :-D
[14:49:10] <jynus>	 yes, do not touch them if you do not want all db infrastructure to collapse
[14:49:37] <apergos>	 "only" 22 hosts to check, good luck...
[14:50:27] <jynus>	 you have a paste so I do not have to rerun it?
[14:50:41] <jynus>	 I suppose it will be on the log, doesn't matter
[14:53:06] <apergos>	 yes I have the info
[14:53:16] <apergos>	 where would you like a paste? 
[14:53:42] <jynus>	 file on salt master?
[14:54:09] <grrrit-wm>	 (03PS1) 10Ema: esams: remove varnish-fe,nginx services from mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266499 (https://phabricator.wikimedia.org/T109286) 
[14:54:16] <apergos>	 /root/dbscreens.txt
[14:54:40] <apergos>	 I think some of those springle screens can go but that's a task for another time
[14:54:59] <apergos>	 on neodymium is the file, of course
[14:55:05] <jynus>	 of course
[14:55:41] <jynus>	 no need to search "SCREEN -S partitioning"
[14:56:24] <apergos>	 only ten hosts left then :-D
[14:56:53] <_joe_>	 uhm just got a page?
[14:57:03] <ema>	 !log Finished migration of mobile traffic to text cluster in esams https://phabricator.wikimedia.org/T109286
[14:57:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:57:27] <apergos>	 er?
[14:57:31] <apergos>	 I got nothin
[14:57:33] <jynus>	 _joe_, I didn't
[14:57:39] <apergos>	 what was it?
[14:58:17] <_joe_>	 nope, an old one got delivered again
[14:58:24] <apergos>	 ah whew
[14:59:00] <jynus>	 it happened to me constantly when in the US, I received the same page 20 times
[15:02:11] <yurik>	 bblack, hi, re mobile merge - i saw you commented about merging ip ranges - weren't we trying to stabilize those ranges so that some of our zero partners can identify mobile traffic via ips?
[15:04:29] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] esams: remove varnish-fe,nginx services from mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266499 (https://phabricator.wikimedia.org/T109286) (owner: 10Ema)
[15:04:38] <bblack>	 yurik: yes, kinda
[15:05:06] <grrrit-wm>	 (03CR) 10Ema: [C: 032 V: 032] esams: remove varnish-fe,nginx services from mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266499 (https://phabricator.wikimedia.org/T109286) (owner: 10Ema)
[15:08:05] <grrrit-wm>	 (03PS1) 10Jcrespo: m4-master is now the eventlogging master (pointed by dbproxy1004) [software] - 10https://gerrit.wikimedia.org/r/266500 
[15:08:53] <grrrit-wm>	 (03CR) 10Ottomata: [C: 031] m4-master is now the eventlogging master (pointed by dbproxy1004) [software] - 10https://gerrit.wikimedia.org/r/266500 (owner: 10Jcrespo)
[15:08:53] <ottomata>	 :)
[15:09:19] <jynus>	 ?
[15:09:25] <jynus>	 are you a wizard?
[15:09:45] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032 V: 032] m4-master is now the eventlogging master (pointed by dbproxy1004) [software] - 10https://gerrit.wikimedia.org/r/266500 (owner: 10Jcrespo)
[15:10:07] <jynus>	 FYI, it has been running for a few ours already
[15:10:10] <jynus>	 *hours
[15:11:49] <ottomata>	 jynus:  hi ja
[15:11:52] <ottomata>	 ah ok cool
[15:12:01] <ottomata>	 how's it look?  worried it might be super slow with just one process
[15:12:17] <ottomata>	 (sorry the post office line was very long this morning :/ )
[15:12:28] <jynus>	 it is not slow, but it looks like we were missing 20% of events
[15:12:41] <ottomata>	 over the weekend?
[15:12:43] <ottomata>	 whatcha mean?
[15:12:48] <jynus>	 overall
[15:12:51] <ottomata>	 ?
[15:13:11] <ottomata>	 like forever?
[15:13:20] <jynus>	 after I finish with it, I will run it from 1 Jan to see how many differences we get
[15:13:34] <ottomata>	 ha, crazy ok, like the sync.sh script has been lazy?
[15:13:45] <jynus>	 I do not know, really
[15:13:47] <ottomata>	 hm
[15:13:54] <jynus>	 maybe it is just false positives
[15:15:30] <wikibugs>	 6operations, 6Discovery: Elasticsearch health and capacity planning FY2016-17 - https://phabricator.wikimedia.org/T124626#1965732 (10dcausse) Yes it's extremely hard to guess, the morelike problem makes it hard to evaluate.  Cluster wide: Without serving morelike queries tp95 starts to move at 1200qps (prefix)...
[15:16:15] <jynus>	 I will know more when it finishes
[15:16:53] <ottomata>	 ok
[15:16:56] <ottomata>	 thanks
[15:19:27] <icinga-wm>	 PROBLEM - NTP on cygnus is CRITICAL: NTP CRITICAL: Offset -2.074615479 secs
[15:22:06] <grrrit-wm>	 (03CR) 10Dereckson: "PS2: load extension through wfLoadExtension" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266433 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson)
[15:28:56] <icinga-wm>	 PROBLEM - NTP on pollux is CRITICAL: NTP CRITICAL: Offset -9.967888832 secs
[15:31:53] <grrrit-wm>	 (03PS3) 10Ottomata: Refactor MirrorMaker puppetization [puppet/kafka] - 10https://gerrit.wikimedia.org/r/265789 (https://phabricator.wikimedia.org/T124077) 
[15:32:07] <grrrit-wm>	 (03PS4) 10Ottomata: Refactor MirrorMaker puppetization [puppet/kafka] - 10https://gerrit.wikimedia.org/r/265789 (https://phabricator.wikimedia.org/T124077) 
[15:40:38] <grrrit-wm>	 (03PS1) 10Ema: eqiad: add text nodes to mobile cluster [puppet] - 10https://gerrit.wikimedia.org/r/266503 (https://phabricator.wikimedia.org/T109286) 
[15:41:24] <wikibugs>	 6operations, 6Discovery: Elasticsearch health and capacity planning FY2016-17 - https://phabricator.wikimedia.org/T124626#1965764 (10dcausse) Side note: I think we can really optimize server usage by splitting cluster by feature. From what I understand in this paper[1]: parallelization of slow queries can real...
[15:46:39] <wikibugs>	 6operations, 6Project-Creators: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#1965793 (10matmarex) >>! In T119944#1960512, @Aklapper wrote: >>>! In T119944#1950162, @matmarex wrote: >> empty up #Wikimedia-Media-Storage, moving the reports in it to #swift or else...
[15:46:49] <grrrit-wm>	 (03PS1) 10Jcrespo: Updated partitioning for s1 and s4 [software] - 10https://gerrit.wikimedia.org/r/266504 (https://phabricator.wikimedia.org/T120513) 
[15:47:01] <grrrit-wm>	 (03PS5) 10Ottomata: Refactor MirrorMaker puppetization [puppet/kafka] - 10https://gerrit.wikimedia.org/r/265789 (https://phabricator.wikimedia.org/T124077) 
[15:48:30] <grrrit-wm>	 (03CR) 10Jcrespo: [V: 032] Updated partitioning for s1 and s4 [software] - 10https://gerrit.wikimedia.org/r/266504 (https://phabricator.wikimedia.org/T120513) (owner: 10Jcrespo)
[15:48:39] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Updated partitioning for s1 and s4 [software] - 10https://gerrit.wikimedia.org/r/266504 (https://phabricator.wikimedia.org/T120513) (owner: 10Jcrespo)
[15:51:06] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Refactor MirrorMaker puppetization [puppet/kafka] - 10https://gerrit.wikimedia.org/r/265789 (https://phabricator.wikimedia.org/T124077) (owner: 10Ottomata)
[15:54:04] <wikibugs>	 6operations, 6Commons, 7Swift: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#1965815 (10matmarex)
[15:54:07] <wikibugs>	 6operations, 6Commons, 5MW-1.27-release-notes, 7Swift: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#1965816 (10matmarex)
[15:57:17] <grrrit-wm>	 (03PS2) 10Hashar: contint: stop cloning mediawiki/tools/codesniffer.git [puppet] - 10https://gerrit.wikimedia.org/r/260018 (https://phabricator.wikimedia.org/T66371) 
[15:57:49] <grrrit-wm>	 (03CR) 10Hashar: [C: 031 V: 032] "Simple rebase. Still cherry-picked on integration puppetmaster." [puppet] - 10https://gerrit.wikimedia.org/r/260018 (https://phabricator.wikimedia.org/T66371) (owner: 10Hashar)
[15:58:16] <wikibugs>	 6operations, 6Commons, 7Monitoring, 7Swift: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails - https://phabricator.wikimedia.org/T106937#1965841 (10matmarex)
[15:58:19] <wikibugs>	 6operations, 10RESTBase, 6Services, 10Traffic: Remove restbase from parsoidcache - https://phabricator.wikimedia.org/T110475#1965843 (10GWicke) @bblack, there are still users for rest.wikimedia.org. I sent a reminder and announced a shut-down date for March. If we set up a redirect (or rewrite) for the dom...
[16:00:05] <jouncebot>	 anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160126T1600).
[16:00:05] <jouncebot>	 Dereckson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:10] <Dereckson>	 Hello.
[16:01:05] <icinga-wm>	 RECOVERY - mysqld processes on pc1005 is OK: PROCS OK: 1 process with command name mysqld
[16:01:11] <thcipriani>	 Dereckson: Hiya, I can SWAT for you this morning.
[16:01:22] <Dereckson>	 Okay.
[16:01:39] <grrrit-wm>	 (03PS1) 10Ottomata: Rotate kafka-mirror GC logs too [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266508 
[16:02:45] <icinga-wm>	 RECOVERY - mysqld processes on pc1002 is OK: PROCS OK: 1 process with command name mysqld
[16:02:47] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Pass flake8 and add it to tox envlist [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/265252 (owner: 10Hashar)
[16:03:05] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:03:25] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Add .gitreview [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/265250 (owner: 10Hashar)
[16:03:54] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Introduce tox as a test entry point [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/265251 (owner: 10Hashar)
[16:04:16] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Add .gitreview [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/264010 (owner: 10Hashar)
[16:04:33] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Introduce tox as a test entry point [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/264011 (owner: 10Hashar)
[16:04:41] <Dereckson>	 thcipriani: 265893 depends on 265892, which depends on 265891
[16:04:57] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Pass flake8 and add it to tox envlist [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/264012 (owner: 10Hashar)
[16:05:31] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Rotate kafka-mirror GC logs too [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266508 (owner: 10Ottomata)
[16:05:57] <grrrit-wm>	 (03PS2) 10Giuseppe Lavagetto: Use the logical redis definition for GettingStarted. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266481 (https://phabricator.wikimedia.org/T124671) 
[16:05:59] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Rationalize definition of service hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266509 (https://phabricator.wikimedia.org/T114273) 
[16:06:01] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) 
[16:06:03] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Reduce poolcounter configuration complexity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266511 (https://phabricator.wikimedia.org/T114273) 
[16:06:05] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Add references to wmfServices for Cirrusearch. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266512 (https://phabricator.wikimedia.org/T114273) 
[16:06:07] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Use wmfMasterDatacenter for picking the master redis config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266513 (https://phabricator.wikimedia.org/T114273) 
[16:06:09] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: Configure redis LockManager in both DCs, use the master everywhere. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266514 
[16:06:12] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265891 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:07:46] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[16:07:54] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Reduce poolcounter configuration complexity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266511 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[16:08:06] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Add references to wmfServices for Cirrusearch. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266512 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[16:08:08] <thcipriani>	 hmm, zuul still being slow about picking up that change...
[16:08:17] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Use wmfMasterDatacenter for picking the master redis config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266513 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[16:08:27] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Configure redis LockManager in both DCs, use the master everywhere. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266514 (owner: 10Giuseppe Lavagetto)
[16:08:32] <_joe_>	 ugh, and ofc
[16:08:55] <grrrit-wm>	 (03Merged) 10jenkins-bot: Namespace configuration for wuu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265891 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:08:58] <Dereckson>	 It's taking 265891
[16:12:02] <logmsgbot>	 !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration for wuu.wikipedia [[gerrit:265891]] (duration: 01m 29s)
[16:12:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:12:08] <thcipriani>	 ^ Dereckson check please
[16:12:28] <Dereckson>	 Testing.
[16:12:47] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265892 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:13:24] <grrrit-wm>	 (03PS1) 10Jcrespo: Pool new parsercache pc1005 after cloning it from pc1002 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266516 (https://phabricator.wikimedia.org/T121888) 
[16:13:46] <Dereckson>	 Tested. Works fine.
[16:14:16] <icinga-wm>	 PROBLEM - NTP on serpens is CRITICAL: NTP CRITICAL: Offset unknown
[16:15:35] <grrrit-wm>	 (03Merged) 10jenkins-bot: Remove Tranwiki namespace on wuu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265892 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:16:03] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add Portal namespace on wuu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:18:13] <wikibugs>	 6operations, 6Commons, 10MediaWiki-File-management, 6Multimedia: image magick stripping colour profile of PNG files [probably regression] - https://phabricator.wikimedia.org/T113123#1965913 (10matmarex)
[16:18:37] <icinga-wm>	 RECOVERY - NTP on serpens is OK: NTP OK: Offset 0.005597949028 secs
[16:19:16] <logmsgbot>	 !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove Tranwiki namespace on wuu.wikipedia [[gerrit:265892]] and Add Portal namespace on wuu.wikipedia [[gerrit:265893]] (duration: 01m 27s)
[16:19:19] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:19:21] <thcipriani>	 ^ Dereckson check plase
[16:19:24] <thcipriani>	 *please
[16:19:41] <Dereckson>	 Testing.
[16:21:18] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265896 (https://phabricator.wikimedia.org/T122175) (owner: 10Dereckson)
[16:21:44] <grrrit-wm>	 (03CR) 10Ema: "This one should be merged *after* the pybal+etcd setup is done in eqiad." [puppet] - 10https://gerrit.wikimedia.org/r/266503 (https://phabricator.wikimedia.org/T109286) (owner: 10Ema)
[16:21:59] <grrrit-wm>	 (03Merged) 10jenkins-bot: Namespaces configuration on sk.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265896 (https://phabricator.wikimedia.org/T122175) (owner: 10Dereckson)
[16:22:06] <Dereckson>	 265892 and 265893 Verified.
[16:22:07] <icinga-wm>	 PROBLEM - HHVM rendering on mw1258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:22:53] <thcipriani>	 Dereckson: thank you
[16:23:27] <icinga-wm>	 PROBLEM - Apache HTTP on mw1258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:23:35] <grrrit-wm>	 (03CR) 10Hashar: "recheck" [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/264012 (owner: 10Hashar)
[16:23:38] <grrrit-wm>	 (03CR) 10Hashar: "recheck" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/265252 (owner: 10Hashar)
[16:23:46] <grrrit-wm>	 (03CR) 10RobH: [C: 031] "This is slated for puppetswat today, and looks acceptable for merge at that time (pending rebase)." [puppet] - 10https://gerrit.wikimedia.org/r/265427 (https://phabricator.wikimedia.org/T120843) (owner: 10EBernhardson)
[16:24:39] <grrrit-wm>	 (03CR) 10RobH: [C: 031] "Looks good for puppetswat later today." [puppet] - 10https://gerrit.wikimedia.org/r/238850 (owner: 10EBernhardson)
[16:25:11] <logmsgbot>	 !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespaces configuration on sk.wikipedia [[gerrit:265896]] (duration: 01m 27s)
[16:26:04] <thcipriani>	 ^ Dereckson check please
[16:26:04] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [24.0]
[16:26:04] <grrrit-wm>	 (03PS1) 10Ottomata: Add nagios_servicegroup parameter to kafka::mirror::monitoring [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266517 
[16:26:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:26:44] <grrrit-wm>	 (03CR) 10Hashar: "CI is enabled :-}" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/265252 (owner: 10Hashar)
[16:26:46] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Add nagios_servicegroup parameter to kafka::mirror::monitoring [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266517 (owner: 10Ottomata)
[16:26:51] <grrrit-wm>	 (03CR) 10Hashar: "CI is enabled :-}" [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/264012 (owner: 10Hashar)
[16:27:03] <grrrit-wm>	 (03CR) 10RobH: [C: 031] "slated for puppetswat shortly." [puppet] - 10https://gerrit.wikimedia.org/r/266299 (https://phabricator.wikimedia.org/T123869) (owner: 10Eevans)
[16:27:08] <Dereckson>	 thcipriani: Tested
[16:27:14] <grrrit-wm>	 (03CR) 10Ottomata: [V: 032] Add nagios_servicegroup parameter to kafka::mirror::monitoring [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266517 (owner: 10Ottomata)
[16:27:27] <icinga-wm>	 PROBLEM - Host alsafi is DOWN: PING CRITICAL - Packet loss = 100%
[16:27:57] <thcipriani>	 Dereckson: I'll circle back to the category collation and do that one at the end, since it requires running a script
[16:28:20] <Dereckson>	 Okay.
[16:28:38] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265666 (https://phabricator.wikimedia.org/T124167) (owner: 10Dereckson)
[16:29:18] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable SandboxLink on nl.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265666 (https://phabricator.wikimedia.org/T124167) (owner: 10Dereckson)
[16:29:29] <grrrit-wm>	 (03PS1) 10Ottomata: Puppetize Kafka MirrorMaker on analytics1021 mirroring from main-eqiad to analytics-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/266518 
[16:30:46] <wikibugs>	 6operations, 6Performance-Team, 10Traffic: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#1965960 (10BBlack) Quick update, I did a small re-check on just a single text node in esams (mobile + desktop text traffic, random subsample of IPs, mostly in Europe) for 5 minutes:    | Protocol | Percentage...
[16:31:33] <logmsgbot>	 !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable SandboxLink on nl.wikiquote [[gerrit:265666]] (duration: 01m 26s)
[16:31:36] <thcipriani>	 ^ Dereckson check please
[16:31:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:31:47] <icinga-wm>	 PROBLEM - puppet last run on mw2052 is CRITICAL: CRITICAL: puppet fail
[16:31:47] <icinga-wm>	 RECOVERY - NTP on cygnus is OK: NTP OK: Offset 0.0002664327621 secs
[16:32:12] <Dereckson>	 thcipriani: works
[16:32:24] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265623 (https://phabricator.wikimedia.org/T124154) (owner: 10Dereckson)
[16:32:42] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Puppetize Kafka MirrorMaker on analytics1021 mirroring from main-eqiad to analytics-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/266518 (owner: 10Ottomata)
[16:33:17] <grrrit-wm>	 (03Merged) 10jenkins-bot: Update et.wikiquote logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265623 (https://phabricator.wikimedia.org/T124154) (owner: 10Dereckson)
[16:35:02] <grrrit-wm>	 (03PS1) 10Ottomata: Remove invalid parameter jmx_port from kafka::mirror::monitoring use [puppet] - 10https://gerrit.wikimedia.org/r/266519 
[16:35:15] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Remove invalid parameter jmx_port from kafka::mirror::monitoring use [puppet] - 10https://gerrit.wikimedia.org/r/266519 (owner: 10Ottomata)
[16:36:38] <logmsgbot>	 !log thcipriani@mira Synchronized w/static/images/project-logos/etwikiquote.png: SWAT: Update et.wikiquote logo [[gerrit:265623]] (duration: 01m 27s)
[16:36:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:36:45] <thcipriani>	 ^ Dereckson check please
[16:38:05] <grrrit-wm>	 (03PS1) 10Ottomata: Fix dependency for nrpe::monitor_service { "kafka-mirror-${title}" [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266521 
[16:39:09] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Fix dependency for nrpe::monitor_service { "kafka-mirror-${title}" [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266521 (owner: 10Ottomata)
[16:39:27] <Dereckson>	 https://et.wikiquote.org/w/static/images/project-logos/etwikiquote.png is live and okay, but not yet https://et.wikiquote.org/static/images/project-logos/etwikiquote.png
[16:39:44] <grrrit-wm>	 (03Merged) 10jenkins-bot: Namespace configuration on ur.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265888 (https://phabricator.wikimedia.org/T122045) (owner: 10Dereckson)
[16:40:31] <thcipriani>	 Dereckson: hmm, I see both updated
[16:41:15] <thcipriani>	 oh wait, in incognito I see the non-w link as not updated, too
[16:41:28] <Dereckson>	 With wget, I've the former version too.
[16:41:28] <thcipriani>	 ...doublechecking
[16:45:30] <grrrit-wm>	 (03PS1) 10Ottomata: Use brokers_string instead of brokers_array so each hostname is suffixed with broker port [puppet] - 10https://gerrit.wikimedia.org/r/266525 
[16:45:49] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Use brokers_string instead of brokers_array so each hostname is suffixed with broker port [puppet] - 10https://gerrit.wikimedia.org/r/266525 (owner: 10Ottomata)
[16:45:51] <thcipriani>	 Dereckson: hmm with ?debug=true I get the correct logo
[16:46:21] <Dereckson>	 cache issue so I imagine
[16:48:31] <thcipriani>	 Indeed. I tried purgeList, didn't seem to have an effect.
[16:48:43] <Dereckson>	 We wait to see if it takes the new files later and revisit the issue if not?
[16:49:05] <bblack>	 thcipriani: what would purgeList do exactly in this case?
[16:49:06] <icinga-wm>	 RECOVERY - Host alsafi is UP: PING OK - Packet loss = 0%, RTA = 36.40 ms
[16:49:11] <thcipriani>	 Dereckson: yeah, I'm syncing your urwiki change now.
[16:53:33] <bblack>	 I don't know what interfaces you have for purging, but /static/ assets are all virtually under the same hostname for caching purposes
[16:53:33] <bblack>	 they're all https://www.wikimedia.org/static/.... regardless of what wiki they're referenced from, if you're purging
[16:53:33] <vvv>	 apergos: are you still the person whom I should email if I want to mirror XML dumps?
[16:53:33] <thcipriani>	 bblack: kk, yeah, I saw X-Cache headers, cache busting worked, that was the rationale behind purgeList
[16:53:33] <apergos>	 vvv: yep I'm the one
[16:53:33] <grrrit-wm>	 (03CR) 10Aklapper: [C: 031] "fine with me" [puppet] - 10https://gerrit.wikimedia.org/r/266316 (https://phabricator.wikimedia.org/T123581) (owner: 10Dzahn)
[16:53:33] <icinga-wm>	 PROBLEM - RAID on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:53:33] <icinga-wm>	 PROBLEM - puppet last run on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:53:33] <icinga-wm>	 PROBLEM - configured eth on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:53:33] <thcipriani>	 ^ sync-proxies hung up on mw1161 as well :(
[16:53:33] <icinga-wm>	 PROBLEM - SSH on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:33] <icinga-wm>	 PROBLEM - dhclient process on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:55:07] <icinga-wm>	 PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: puppet fail
[16:55:37] <icinga-wm>	 PROBLEM - nutcracker process on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:55:47] <logmsgbot>	 !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration on ur.wikipedia [[gerrit:265888]] (duration: 07m 10s)
[16:55:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:55:53] <wikibugs>	 6operations, 6Project-Creators: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#1966063 (10matmarex) #wikimedia-media-storage has no more open tasks and it's been archived. #swift has twenty or so new ones :), and #mediawiki-file-management has also gained a few p...
[16:55:58] <thcipriani>	 ^ Dereckson check please
[16:56:26] <icinga-wm>	 RECOVERY - RAID on mw1161 is OK: OK: no RAID installed
[16:56:27] <icinga-wm>	 RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures
[16:56:45] <Dereckson>	 Testing.
[16:56:57] <icinga-wm>	 RECOVERY - configured eth on mw1161 is OK: OK - interfaces up
[16:57:26] <icinga-wm>	 RECOVERY - SSH on mw1161 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.4 (protocol 2.0)
[16:57:26] <icinga-wm>	 RECOVERY - dhclient process on mw1161 is OK: PROCS OK: 0 processes with command name dhclient
[16:57:31] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266427 (https://phabricator.wikimedia.org/T123627) (owner: 10Dereckson)
[16:57:37] <icinga-wm>	 RECOVERY - nutcracker process on mw1161 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[16:58:12] <grrrit-wm>	 (03Merged) 10jenkins-bot: Set category collation to uca-lt on lt.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266427 (https://phabricator.wikimedia.org/T123627) (owner: 10Dereckson)
[16:58:52] <Dereckson>	 Tested.
[16:59:27] <icinga-wm>	 RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:59:57] <icinga-wm>	 RECOVERY - puppet last run on mw2052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:00:05] <jouncebot>	 RobH cmjohnson1: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160126T1700).
[17:00:05] <jouncebot>	 Krenair ebernhardson urandom: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[17:00:28] <robh>	 cmjohnson1: you about?
[17:00:40] <cmjohnson1>	 yes
[17:01:29] <vvv>	 apergos: email sent
[17:01:32] <apergos>	 thanks!
[17:01:36] <robh>	 ok puppet swat time.  so first thing is all these apache patches
[17:01:39] <logmsgbot>	 !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: Set category collation to uca-lt on lt.wikipedia [[gerrit:266427]] (duration: 01m 33s)
[17:01:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:01:48] <robh>	 apache config patches that is.
[17:02:01] <robh>	 cmjohnson1: so, it used to be that https://wikitech.wikimedia.org/wiki/Application_servers was accurate
[17:02:08] <Krenair>	 there's only 3 now
[17:02:18] <thcipriani>	 !log running updateCollation on ltwiki
[17:02:21] <robh>	 but i found last time that manually pushing puppet for apache was painful.  so best to do the testing as the page says
[17:02:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:02:24] <robh>	 Krenair: cool
[17:02:37] <grrrit-wm>	 (03PS1) 10Ottomata: Move analytics-eqiad kafka mirror test to kafka1001 [puppet] - 10https://gerrit.wikimedia.org/r/266528 
[17:03:01] <robh>	 cmjohnson1: so review https://wikitech.wikimedia.org/wiki/Application_servers#Deploying_config
[17:03:35] <robh>	 you should be able to walk through each change via those steps (for Krenair's config changes)
[17:03:51] <robh>	 (so you'll be disabling puppet on app servers during this run)
[17:03:58] <robh>	 via salt on neodymium
[17:04:19] <thcipriani>	 hmm well that's not good, tried: mwscript updateCollation.php --wiki=ltwiki --previous-collation=uppercase: "processing...Database is read-only: Brief Database Maintenance in progress, please try again in 3 minutes"
[17:04:29] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Move analytics-eqiad kafka mirror test to kafka1001 [puppet] - 10https://gerrit.wikimedia.org/r/266528 (owner: 10Ottomata)
[17:04:44] <robh>	 cmjohnson1: oh i forgot the most important part, before we touch shit we need to ensure the previous swat window is done
[17:04:48] <Krenair>	 surely those two beta patches don't need prod app server puppets disabling?
[17:04:57] <robh>	 (they dont appear to be ;)
[17:05:26] <thcipriani>	 robh: just trying to run the maintenance script, then I'll be out of your way :P
[17:05:31] <robh>	 no worries =]
[17:05:48] <robh>	 Since swat is intentionally regular, minor disruptions are no big deal.
[17:06:24] <thcipriani>	 Krenair: ever run into the "Database is read-only" message above? Never seen it before running this script...
[17:06:24] <robh>	 (plus im just shocked to have puppet swat patches)
[17:07:32] <Krenair>	 jynus, ^
[17:07:45] <Krenair>	 thcipriani, I've certainly seen DBs go read-only, not sure about that particular maintenance message
[17:07:46] <jynus>	 in theory that happens when there is general lag, but that is not the case
[17:08:17] <icinga-wm>	 PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors
[17:08:19] <thcipriani>	 jynus: hmm, lemme try again.
[17:08:38] <Krenair>	 it's s3
[17:09:06] <jynus>	 yes, but I do not see it
[17:09:11] <Krenair>	 that script only uses master, so db1038
[17:09:21] <grrrit-wm>	 (03PS2) 10Giuseppe Lavagetto: Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) 
[17:10:04] <thcipriani>	 well I got a slow query message: https://phabricator.wikimedia.org/P2528
[17:10:06] <icinga-wm>	 PROBLEM - NTP on fermium is CRITICAL: NTP CRITICAL: Offset 28.98543322 secs
[17:10:38] <marxarelli>	 thcipriani: heads up, i'm going to start merging backports for wmf.11 on mira
[17:10:58] <icinga-wm>	 PROBLEM - NTP on krypton is CRITICAL: NTP CRITICAL: Offset 19.68465185 secs
[17:11:17] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[17:11:37] <Krenair>	 are you deploying marxarelli?
[17:11:40] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[17:11:46] <marxarelli>	 Krenair: not yet
[17:12:00] <marxarelli>	 just wanted to get wmf.11 ready for when swat is finished
[17:12:13] <Krenair>	 puppet swat is immediately following normal swat
[17:12:59] <marxarelli>	 Krenair: should i wait? the tentative plan was to sync up to mw1017 for targeting of wmf.11 patches
[17:13:12] <marxarelli>	 *targeted testing*
[17:13:30] <marxarelli>	 i can hold off if that's a problem
[17:13:58] <robh>	 dont interrupt puppet swat with unplanned swatting ;]
[17:14:04] <ebernhardson>	 :)
[17:14:06] <robh>	 this week we actually have patches!
[17:15:07] <robh>	 thcipriani: we're still standing by until you give us the all clear to proceed =]
[17:15:14] <marxarelli>	 robh: kk. i'll just rebase wmf.11 to get it ready but hold off on syncing
[17:15:17] <grrrit-wm>	 (03PS1) 10Ottomata: Run kafka mirror on both kafka1001 and kafka1002 [puppet] - 10https://gerrit.wikimedia.org/r/266530 
[17:15:22] <robh>	 (not rushing you just ensuring you know we arent going to start pushing on top of ya!)
[17:16:01] <bd808>	 The HHVM process on mw1019 is dying every 5-7 minutes like a clockwork. 1743 HHVM errors from that host in logstash for the last hour.
[17:16:08] <Krenair>	 thcipriani, when static changes are made like https://gerrit.wikimedia.org/r/#/c/265623/ please send the URL (with www.wikimedia.org hostname) to purgeList.php
[17:16:21] <Krenair>	 like this: echo 'https://www.wikimedia.org/static/images/project-logos/etwikiquote.png' | mwscript purgeList.php
[17:16:33] <bd808>	 Is mw1019 the same host I was whining about yesterday?
[17:16:42] <Krenair>	 I did it this time
[17:16:44] <thcipriani>	 Krenair: gotcha, thanks.
[17:16:58] <Krenair>	 bd808, I was whining about that one yesterday
[17:17:05] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Run kafka mirror on both kafka1001 and kafka1002 [puppet] - 10https://gerrit.wikimedia.org/r/266530 (owner: 10Ottomata)
[17:17:11] <grrrit-wm>	 (03PS3) 10Cmjohnson: beta: Remove deployment.wmflabs.org VHost that doesn't actually resolve [puppet] - 10https://gerrit.wikimedia.org/r/265548 (owner: 10Alex Monk)
[17:17:39] <marxarelli>	 bd808, anomie, tgr: merging wmf.11 backports but holding off on deploying to mw1017 until after puppet swat
[17:17:43] <thcipriani>	 hmm, so I can connect with db1038 via the sql script from mira, but I can't run the updateCollation script...
[17:17:57] <icinga-wm>	 PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: puppet fail
[17:18:04] <Krenair>	 yeah, the sql script doesn't take that sort of thing into account AFAIK
[17:18:06] <bd808>	 Krenair: confirmed that we were both complaining about it. J.oe said not to worry about it for now.
[17:18:23] <jynus>	 is it trying to run it on the wrong host? db1038 is not in read only
[17:18:29] <bd808>	 marxarelli: sweet. Have fun with our pal Jenkins
[17:18:35] <jynus>	 maybe it is in read only at mediawiki level?
[17:18:36] <grrrit-wm>	 (03CR) 10Cmjohnson: [C: 032] beta: Remove deployment.wmflabs.org VHost that doesn't actually resolve [puppet] - 10https://gerrit.wikimedia.org/r/265548 (owner: 10Alex Monk)
[17:18:45] <Krenair>	 jynus, I think MW has its own 'read-only' status
[17:18:47] <Krenair>	 yes
[17:18:50] <robh>	 cmjohnson: So depending on how long it takes for us to get into our swat window, we may end up rolling some of the proposed patches from today to thursday.
[17:19:41] <Dereckson>	 thcipriani / Krenair > et.wikiquote logo now works
[17:20:06] <icinga-wm>	 RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:20:18] <thcipriani>	 Dereckson: cool. Krenair ran the right purgeList command :)
[17:20:33] <cmjohnson>	 !log disabling puppet on mw cluster
[17:20:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:21:06] <thcipriani>	 robh: go ahead with puppet swat. not sure what's going on with updateCollation, probably take a few to figure it out.
[17:21:23] <Krenair>	 I'm looking into updateCollation
[17:22:16] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [24.0]
[17:22:57] <robh>	 ok, cmjohnson1 is handling the apache updates and im assisting today, you can see chris is now proceeding =]
[17:23:32] <wikibugs>	 6operations, 10vm-requests: request VM for releases.wm.org - https://phabricator.wikimedia.org/T124261#1966143 (10Dzahn) a:3Dzahn
[17:24:37] <robh>	 cmjohnson1: While you are working on the apache changes, I'll start on the search ones
[17:24:49] <cmjohnson1>	 okay
[17:25:01] <robh>	 ebernhardson: Do either one of your patches require me to restart search service?
[17:25:14] <robh>	 they seem like they will simply puppet change into place and roll, but I want to be certain
[17:25:43] <grrrit-wm>	 (03PS2) 10RobH: [cirrus maint] redirect stderr to log and use full mwscript path [puppet] - 10https://gerrit.wikimedia.org/r/265427 (https://phabricator.wikimedia.org/T120843) (owner: 10EBernhardson)
[17:26:20] <ebernhardson>	 robh: well, one of them does eventually (minimum cluster nodes), but we can't just restart the search service. It requires a 3 day rolling restart across the cluster
[17:26:24] <ebernhardson>	 robh: me and dcausse will work that out 
[17:26:28] <grrrit-wm>	 (03PS1) 10Dzahn: releases: add role on bromine [puppet] - 10https://gerrit.wikimedia.org/r/266531 (https://phabricator.wikimedia.org/T124261) 
[17:26:31] <ebernhardson>	 (we both have root in elastic*)
[17:26:43] <robh>	 ebernhardson: ok, so as long as I see it roll in puppet successfully then the puppet swat portion is done for these?
[17:26:49] <ebernhardson>	 robh: yes
[17:26:51] <robh>	 (the two search ones, the icinga one i understand ;)
[17:26:58] <robh>	 awesome, I'm rebasing and merging them for you now
[17:27:13] <grrrit-wm>	 (03PS3) 10Giuseppe Lavagetto: Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) 
[17:27:40] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] releases: add role on bromine [puppet] - 10https://gerrit.wikimedia.org/r/266531 (https://phabricator.wikimedia.org/T124261) (owner: 10Dzahn)
[17:27:51] <grrrit-wm>	 (03PS1) 10Ottomata: Set group_prefix on kafka-mirror jmxtrans metrics [puppet] - 10https://gerrit.wikimedia.org/r/266532 
[17:27:56] <jynus>	 are you executing that from mira?
[17:28:09] <grrrit-wm>	 (03PS3) 10Cmjohnson: mediawiki: Move www.wikimedia.org portal into wwwportals [puppet] - 10https://gerrit.wikimedia.org/r/265642 (owner: 10Alex Monk)
[17:28:11] <grrrit-wm>	 (03CR) 10RobH: [C: 032] [elasticsearch] Update recover_after_nodes value [puppet] - 10https://gerrit.wikimedia.org/r/238850 (owner: 10EBernhardson)
[17:28:12] <jynus>	 can you try from terbium?
[17:28:27] <thcipriani>	 jynus: the updateCollation script? yes.
[17:28:32] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[17:28:34] <jynus>	 to both?
[17:28:38] <Krenair>	 So that string is from wikimedia's config
[17:28:41] <Krenair>	 But it's all commented out
[17:28:50] <Krenair>	 # This key must exist for the master switch script to work
[17:28:50] <Krenair>	 'readOnlyBySection' => array(
[17:28:51] <Krenair>	 #       'DEFAULT' => 'Brief Database Maintenance in progress, please try again in 3 minutes', #s3
[17:28:51] <Krenair>	 etc.
[17:28:55] <grrrit-wm>	 (03PS2) 10Ottomata: Set group_prefix on kafka-mirror jmxtrans metrics [puppet] - 10https://gerrit.wikimedia.org/r/266532 
[17:29:08] <grrrit-wm>	 (03PS3) 10Ottomata: Set group_prefix on kafka-mirror jmxtrans metrics [puppet] - 10https://gerrit.wikimedia.org/r/266532 
[17:29:25] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Set group_prefix on kafka-mirror jmxtrans metrics [puppet] - 10https://gerrit.wikimedia.org/r/266532 (owner: 10Ottomata)
[17:29:39] <grrrit-wm>	 (03PS2) 10Dzahn: releases: add role on bromine [puppet] - 10https://gerrit.wikimedia.org/r/266531 (https://phabricator.wikimedia.org/T124261) 
[17:29:55] <Dereckson>	 Krenair: commented in db-eqiad.php, but uncommented in db-codfw.php
[17:30:02] <grrrit-wm>	 (03CR) 10Dzahn: [V: 032] releases: add role on bromine [puppet] - 10https://gerrit.wikimedia.org/r/266531 (https://phabricator.wikimedia.org/T124261) (owner: 10Dzahn)
[17:30:16] <jynus>	 yep
[17:30:21] <jynus>	 I was going to say that
[17:30:22] <thcipriani>	  running from terbium seems to work
[17:30:27] <thcipriani>	 since, because, eqiad.
[17:30:32] <Krenair>	 Ohhh.
[17:30:33] <Krenair>	 Yep
[17:30:38] <_joe_>	 thcipriani: what's the issue?
[17:30:48] <jynus>	 not server-related, _joe_
[17:30:53] <jynus>	 mediawiki-config
[17:31:01] <jynus>	 although I am unsure how to fix
[17:31:01] <_joe_>	 I am interested anyways
[17:31:10] <_joe_>	 what is the issue specifically?
[17:31:13] <jynus>	 because effectively, codfw is read only
[17:31:16] <_joe_>	 with databases?
[17:31:16] <grrrit-wm>	 (03PS4) 10Cmjohnson: mediawiki: Move www.wikimedia.org portal into wwwportals [puppet] - 10https://gerrit.wikimedia.org/r/265642 (owner: 10Alex Monk)
[17:31:25] <_joe_>	 heh, we should pair up on those
[17:31:39] <_joe_>	 I have ideas on how to configure appservers
[17:32:25] <jynus>	 but mediawiki tries to write to the local master that is now the eqiad master
[17:32:36] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on bromine is CRITICAL: CRITICAL: puppet fail daniel_zahn fixing puppet roles
[17:32:39] <jynus>	 but mediawiki won't allow
[17:32:57] <jynus>	 even if it is rw in reality, from the point of view of the main master
[17:33:05] <jynus>	 *main datacenter
[17:33:26] <grrrit-wm>	 (03PS1) 10Ottomata: Lint fixes for role::kafka::analytics::mirror [puppet] - 10https://gerrit.wikimedia.org/r/266533 
[17:33:36] <jynus>	 I actually thought about this before: db-eqiad.php and db-codfw.php make no sense
[17:33:54] <Krenair>	 I thought db-codfw points to the eqiad masters?
[17:33:54] <grrrit-wm>	 (03CR) 10Cmjohnson: [C: 032] mediawiki: Move www.wikimedia.org portal into wwwportals [puppet] - 10https://gerrit.wikimedia.org/r/265642 (owner: 10Alex Monk)
[17:34:03] <mutante>	 hmm. more NTP issues than usual in icinga
[17:34:13] <mutante>	 but restarting daemons usually fixes them
[17:34:14] <jynus>	 because right now db-codfw's master is in eqiad
[17:34:31] <grrrit-wm>	 (03PS3) 10RobH: [cirrus maint] redirect stderr to log and use full mwscript path [puppet] - 10https://gerrit.wikimedia.org/r/265427 (https://phabricator.wikimedia.org/T120843) (owner: 10EBernhardson)
[17:34:40] <Krenair>	 so we can comment out the read-only part?
[17:34:45] <grrrit-wm>	 (03PS2) 10Ottomata: Lint fixes for role::kafka::analytics::mirror [puppet] - 10https://gerrit.wikimedia.org/r/266533 
[17:34:54] <grrrit-wm>	 (03CR) 10RobH: [C: 032] [cirrus maint] redirect stderr to log and use full mwscript path [puppet] - 10https://gerrit.wikimedia.org/r/265427 (https://phabricator.wikimedia.org/T120843) (owner: 10EBernhardson)
[17:35:00] <mutante>	 !log mw1258 - restart hhvm
[17:35:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:35:16] <jynus>	 no, we do not want to accidentally write to eqiad from codfw, right?
[17:35:19] <mutante>	 icinga config is broken
[17:35:47] <icinga-wm>	 RECOVERY - Apache HTTP on mw1258 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.025 second response time
[17:36:06] <mutante>	 Error: Contact group 'analytics_eqiad' specified in service 'Kafka MirrorMaker analytics-eqiad' for host 'kafka1001' is not defined anywhere!
[17:36:38] <icinga-wm>	 RECOVERY - HHVM rendering on mw1258 is OK: HTTP OK: HTTP/1.1 200 OK - 64840 bytes in 0.084 second response time
[17:36:46] <jynus>	 actually, probably yes, because only some shards are read-only
[17:36:50] <ottomata>	 ?
[17:36:53] <ottomata>	 oook
[17:37:01] <ottomata>	 unsure of how that stuff works, with you shortly
[17:37:17] <ottomata>	 are contact groups not available to use everywhere?
[17:37:18] <jynus>	 but that is a security concern
[17:37:35] <mutante>	 ottomata: is it maybe just a - vs _ or so?
[17:37:43] <mutante>	 looks
[17:37:47] <jynus>	 because we do not want cross-wiki queries until we setup SSL there
[17:37:57] <robh>	 ok, rolling the maint script update onto elastic1001 (the rest will get on normal call in, this is a paranoid post merge puppet run)
[17:38:42] <mutante>	 ottomata: so analytics_eqiad is a service group, defined and ready to use, but the error says it looks for the same thing as a contact group
[17:38:48] <mutante>	 like group of people to notify
[17:39:11] <jynus>	 I am unsure on how to proceed, Krenair, _joe_ I accept suggestions
[17:39:56] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[17:40:13] <jynus>	 maybe set the master to a local master, but make sure it is read-only at mysql side?
[17:40:17] <icinga-wm>	 RECOVERY - NTP on pollux is OK: NTP OK: Offset 0.001584768295 secs
[17:41:06] <grrrit-wm>	 (03PS1) 10Dereckson: Document db-codfw readOnlyBySection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266534 
[17:41:17] <Dereckson>	 jynus: so should we add some comment like this one ? ^
[17:41:50] <jynus>	 well, first decide what to do :-)
[17:42:09] <Krenair>	 jynus, db-codfw's master DB lines point to eqiad... therefore, shouldn't it be safe to remove read-only mode?
[17:42:23] <mutante>	 ottomata: i think it's this   modules/role/manifests/graphite/alerts.pp:        group        => 'analytics_eqiad',
[17:42:32] <jynus>	 Krenair, I explained why not- we do not want cross-datacenter writes
[17:42:45] <mutante>	 that kind of group is probably a contact group, because they are alerts
[17:42:51] <Krenair>	 well
[17:43:00] <thcipriani>	 !log ltwiki collation updated 503623 rows processed
[17:43:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:43:10] <Krenair>	 let's just leave it as is and have people run maint scripts from terbium instead?
[17:43:13] <jynus>	 that is in general- we do not want cross-datacenter queries
[17:43:27] <Krenair>	 if they need to write
[17:43:35] <thcipriani>	 ^ Dereckson ltwiki is updated.
[17:43:42] <thcipriani>	 jynus: Krenair thank you for the help!
[17:43:44] <Krenair>	 we should also change the read only reason to explain this
[17:44:04] <Dereckson>	 Thank you for the deploy.
[17:44:20] <jynus>	 we should fix it, both with the comment and I think it is not fully read-only
[17:44:28] <ottomata>	 mutante:  what's the difference between contact group and nagios service group?  
[17:44:38] <jynus>	 only s3 (twice), s1 and s5
[17:44:41] <grrrit-wm>	 (03PS1) 10Dzahn: graphite alerts, fix analytics monitoring group name [puppet] - 10https://gerrit.wikimedia.org/r/266535 
[17:45:02] <ottomata>	 oh
[17:45:04] <ottomata>	 i should just change it to analytics
[17:45:05] <ottomata>	 hm
[17:45:12] <ottomata>	 or really not
[17:45:13] <mutante>	 ottomata: contact group is a group of people, only used for notifications. service group is a group of services, groups service checks together in the web ui
[17:45:13] <jynus>	 is there a way to check that s2 is not read-only?
[17:45:14] <ottomata>	  at all
[17:45:14] <ottomata>	 ok.
[17:45:15] <ottomata>	 nm
[17:45:21] <mutante>	 ottomata: did just that
[17:45:22] <ottomata>	 oh
[17:45:29] <Krenair>	 cmjohnson1, robh: got a bit distracted by the DB stuff, how's swat going?
[17:45:39] <ottomata>	 hmmm
[17:45:46] <ottomata>	 mutante:  i think that is not what is causing the error though
[17:45:50] <robh>	 chris has two of the three pushed to the test apache (the rest have puppet halted)
[17:45:57] <robh>	 and is doing the tests now, so it's moving along =]
[17:46:05] <ottomata>	 group is actually monitoring::service group
[17:46:06] <jynus>	 let me open a ticket, even if this is trivial, because it is important for the future failover
[17:46:07] <Krenair>	 cool
[17:46:16] <ottomata>	 which is a service group
[17:46:19] <ottomata>	 mutante:  i'm on it...
[17:46:52] <mutante>	 ottomata: i think it ends up being a contact group because that is a graphite alert
[17:47:00] <mutante>	 ottomata: ok, cool
[17:47:42] <ottomata>	 the problem is a recent commit of mine for kafka mirror stuff
[17:47:51] <ottomata>	 hm, why does nrpe::monitor_service not take a service group param!?
[17:48:42] <grrrit-wm>	 (03PS1) 10Ottomata: Remove incorrect nagios_servicegroup param for kafka::mirror::monitoring [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266536 
[17:49:09] <grrrit-wm>	 (03PS4) 10Giuseppe Lavagetto: Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) 
[17:49:55] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Define Production service entries for InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto)
[17:50:10] <mutante>	 ottomata: it probably should, our use of service groups could be improved, we have some but a lot is missing
[17:50:16] <grrrit-wm>	 (03PS3) 10Cmjohnson: beta: Move login and bits apache configs into wikimedia.conf, like prod [puppet] - 10https://gerrit.wikimedia.org/r/265659 (owner: 10Alex Monk)
[17:50:39] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Remove incorrect nagios_servicegroup param for kafka::mirror::monitoring [puppet/kafka] - 10https://gerrit.wikimedia.org/r/266536 (owner: 10Ottomata)
[17:51:05] <wikibugs>	 6operations, 10DBA, 10MediaWiki-Configuration, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: codfw is in read only according to mediawiki - https://phabricator.wikimedia.org/T124795#1966220 (10jcrespo) 3NEW
[17:51:13] <jynus>	 ^
[17:51:32] <grrrit-wm>	 (03PS1) 10Ottomata: Update kafka submodule and remove incorrect use of nagios_servicegroup [puppet] - 10https://gerrit.wikimedia.org/r/266538 
[17:51:43] <grrrit-wm>	 (03PS3) 10Ottomata: Lint fixes for role::kafka::analytics::mirror [puppet] - 10https://gerrit.wikimedia.org/r/266533 
[17:51:45] <grrrit-wm>	 (03PS2) 10Ottomata: Update kafka submodule and remove incorrect use of nagios_servicegroup [puppet] - 10https://gerrit.wikimedia.org/r/266538 
[17:51:47] <grrrit-wm>	 (03PS2) 10Dzahn: graphite alerts, fix analytics monitoring group name [puppet] - 10https://gerrit.wikimedia.org/r/266535 
[17:52:01] <grrrit-wm>	 (03CR) 10Cmjohnson: [C: 032] beta: Move login and bits apache configs into wikimedia.conf, like prod [puppet] - 10https://gerrit.wikimedia.org/r/265659 (owner: 10Alex Monk)
[17:52:03] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Lint fixes for role::kafka::analytics::mirror [puppet] - 10https://gerrit.wikimedia.org/r/266533 (owner: 10Ottomata)
[17:52:09] <grrrit-wm>	 (03PS4) 10Ottomata: Lint fixes for role::kafka::analytics::mirror [puppet] - 10https://gerrit.wikimedia.org/r/266533 
[17:52:17] <grrrit-wm>	 (03CR) 10Ottomata: [V: 032] Lint fixes for role::kafka::analytics::mirror [puppet] - 10https://gerrit.wikimedia.org/r/266533 (owner: 10Ottomata)
[17:52:35] <grrrit-wm>	 (03PS3) 10Ottomata: Update kafka submodule and remove incorrect use of nagios_servicegroup [puppet] - 10https://gerrit.wikimedia.org/r/266538 
[17:52:37] <grrrit-wm>	 (03PS5) 10RobH: [elasticsearch] Update recover_after_nodes value [puppet] - 10https://gerrit.wikimedia.org/r/238850 (owner: 10EBernhardson)
[17:52:49] <wikibugs>	 operations, DBA, MediaWiki-Configuration, codfw-rollout, codfw-rollout-Jan-Mar-2016: codfw is in read only according to mediawiki - https://phabricator.wikimedia.org/T124795#1966245 (jcrespo) My personal recommendation is to make it 100% read only, point to local masters, and force maintenance f...
[17:52:54] <grrrit-wm>	 (CR) Ottomata: [C: 2 V: 2] Update kafka submodule and remove incorrect use of nagios_servicegroup [puppet] - https://gerrit.wikimedia.org/r/266538 (owner: Ottomata)
[17:53:37] <wikibugs>	 operations, DBA, MediaWiki-Configuration, Release-Engineering-Team, and 2 others: codfw is in read only according to mediawiki - https://phabricator.wikimedia.org/T124795#1966247 (Krenair) phab doesn't auto-add projects like that anymore
[17:53:47] <grrrit-wm>	 (CR) RobH: [C: 2] [elasticsearch] Update recover_after_nodes value [puppet] - https://gerrit.wikimedia.org/r/238850 (owner: EBernhardson)
[17:53:51] <grrrit-wm>	 (PS6) RobH: [elasticsearch] Update recover_after_nodes value [puppet] - https://gerrit.wikimedia.org/r/238850 (owner: EBernhardson)
[17:54:07] <jynus>	 Krenair, let me wish!
[17:54:20] <Dereckson>	 jynus: on Phabricator, you can cc team projects in the subscribers field by the way
[17:54:39] <jynus>	 yeah, but editing is too much work
[17:54:52] <Krenair>	 you can CC any project in subscribers
[17:54:56] <Krenair>	 that doesn't mean you should though
[17:55:06] <jynus>	 yep
[17:55:17] <Krenair>	 although it wouldn't affect me much, others would probably not like to see it abused :p
[17:55:23] <Dereckson>	 jynus: you have a 'add CCs' section (will be 'Change subscribers' on next update)
[17:55:55] <jynus>	 I know, was a mistake
[17:56:01] <wikibugs>	 Blocked-on-Operations, operations, Wikimedia-General-or-Unknown: Invalidate all users sessions - https://phabricator.wikimedia.org/T124440#1966254 (Legoktm) It's still running :/
[17:56:28] <jynus>	 can we focus on the task itself?
[17:56:45] <Krenair>	 I don't have anything useful to add to it
[17:57:09] <jynus>	 as in you agree with my suggestion?
[17:57:15] <Krenair>	 I'd be happy with fixing the read-only message in the config to be useful
[17:57:41] <jynus>	 question is that I think the read only config is also wrong
[17:58:23] <wikibugs>	 operations, Citoid, Services, Traffic: Remove citoid from parsoidcache - https://phabricator.wikimedia.org/T110476#1966263 (mobrovac) >>! In T110476#1965408, @BBlack wrote: > Are things still using the hostnames `citoid.wikimedia.org` and/or `citoid.eqiad.wikimedia.org`, which map to the cache_pars...
[17:59:33] <grrrit-wm>	 (CR) Jcrespo: [C: -1] "We need to fix the config first before freezing it." [mediawiki-config] - https://gerrit.wikimedia.org/r/266534 (owner: Dereckson)
[18:02:59] <grrrit-wm>	 (PS1) Cscott: Add missing `.deployment-prep` to redis server hostname. [puppet] - https://gerrit.wikimedia.org/r/266539
[18:03:59] <jynus>	 let me focus on our primary infrastructure first
[18:04:26] <icinga-wm>	 PROBLEM - NTP on mendelevium is CRITICAL: NTP CRITICAL: Offset 5.305729508 secs
[18:04:33] <grrrit-wm>	 (PS2) Jcrespo: Pool new parsercache pc1005 after cloning it from pc1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/266516 (https://phabricator.wikimedia.org/T121888)
[18:04:57] <icinga-wm>	 PROBLEM - NTP on technetium is CRITICAL: NTP CRITICAL: Offset 21.0652746 secs
[18:05:20] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Pool new parsercache pc1005 after cloning it from pc1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/266516 (https://phabricator.wikimedia.org/T121888) (owner: Jcrespo)
[18:05:26] <grrrit-wm>	 (PS5) RobH: Add alert for elasticsearch 50th percentile prefix search time [puppet] - https://gerrit.wikimedia.org/r/265942 (https://phabricator.wikimedia.org/T124542) (owner: EBernhardson)
[18:06:11] <grrrit-wm>	 (CR) RobH: [C: 2] Add alert for elasticsearch 50th percentile prefix search time [puppet] - https://gerrit.wikimedia.org/r/265942 (https://phabricator.wikimedia.org/T124542) (owner: EBernhardson)
[18:08:38] <logmsgbot>	 !log jynus@mira Synchronized wmf-config/db-eqiad.php: Pool new parsercache pc1005 after cloning it from pc1002 (duration: 01m 28s)
[18:08:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:09:18] <robh>	 ebernhardson: all of your swat patches are merged, the neon one for icinga alerts is going live now.
[18:09:42] <grrrit-wm>	 (PS1) Dereckson: Set WikidataPageBanner namespaces on fr.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/266541 (https://phabricator.wikimedia.org/T123084)
[18:12:55] <wikibugs>	 operations, Graphoid, Services, Traffic: Remove graphoid from parsoidcache - https://phabricator.wikimedia.org/T110477#1966309 (mobrovac) AFAIK, `graphoid.(eqiad.)wikimedia.org` can be safely removed.
[18:14:00] <robh>	 !log i broke icinga, fixing
[18:14:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:14:26] <cmjohnson1>	 !log starting puppet on mw cluster 
[18:14:29] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:14:30] <wikibugs>	 operations, Parsoid: Need databases provisioned for parsoid-rt testing, visual diff testing - https://phabricator.wikimedia.org/T124703#1966329 (jcrespo) Open>Resolved a:jcrespo The 3 databases have been successfully imported into m5-master. Use T124704 to request access and puppetizing it.
[18:15:42] <wikibugs>	 Ops-Access-Requests, operations, Parsing-Team: Getting parsing-team members sudo access to manage (start, stop, restart) services on ruthenium - https://phabricator.wikimedia.org/T124701#1966370 (jcrespo)
[18:16:59] <grrrit-wm>	 (PS1) RobH: Revert "Add alert for elasticsearch 50th percentile prefix search time" [puppet] - https://gerrit.wikimedia.org/r/266543
[18:17:17] <robh>	 rolling back my change to unbreak icinga =P
[18:17:36] <grrrit-wm>	 (CR) RobH: [C: 2] Revert "Add alert for elasticsearch 50th percentile prefix search time" [puppet] - https://gerrit.wikimedia.org/r/266543 (owner: RobH)
[18:18:14] <legoktm>	 !log running mwscript updateArticleCount.php --wiki=jawiki --update=1
[18:18:16] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:18:42] <mutante>	 robh: it was broken before i think
[18:18:50] <robh>	 .....
[18:18:55] <robh>	 someone left it in a broken state live?
[18:19:12] <robh>	 mutante: if so then my rollback wont resurrect icinga
[18:19:43] <mutante>	 robh: see backlog from about 9:38 
[18:19:57] <mutante>	 https://gerrit.wikimedia.org/r/#/c/266535/
[18:20:06] <robh>	 ok...
[18:20:18] <robh>	 mutante: the backlog for me is very cluttered, can you summarize?
[18:20:44] <robh>	 ottomata: so did you break it before i got to it?
[18:20:55] <grrrit-wm>	 (PS1) TheDJ: Raise file upload limit to 2,5 GB [mediawiki-config] - https://gerrit.wikimedia.org/r/266544 (https://phabricator.wikimedia.org/T116514)
[18:21:46] <robh>	 !log icinga is broken, it seems it was from a change before mine, but my forced reload broke it
[18:21:47] <mutante>	 robh: i saw the icinga check for icinga config itself reported it as broken, then i ran icinga -v and saw it was looking for a contact group called "analytics_eqiad", there is a service group called analytics_eqiad but not a contact group
[18:21:49] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:21:51] <ottomata>	 i haven't checked, but i think i fixed it
[18:21:53] <ottomata>	 no?
[18:22:03] <ottomata>	 https://gerrit.wikimedia.org/r/#/c/266538/
[18:22:07] <robh>	 i pushed a change and it broke it again then
[18:22:50] <robh>	 ok, back.
[18:22:59] <robh>	 my change also broke it, sorry for the bad ping ottomata
[18:23:09] <robh>	 it just took a few moments to catch up =P
[18:24:19] <icinga-wm>	 RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct
[18:24:20] <Josve05away>	 I'm getting "404 File Not Found" when creating pages
[18:24:30] <mutante>	 ah, unrelated things then and fixed, cool !
[18:24:41] <icinga-wm>	 RECOVERY - NTP on krypton is OK: NTP OK: Offset -0.004741430283 secs
[18:25:01] <robh>	 ebernhardson: i take it back, i had to roll the icinga patch back cuz something broke
[18:25:05] <grrrit-wm>	 (Abandoned) Dzahn: graphite alerts, fix analytics monitoring group name [puppet] - https://gerrit.wikimedia.org/r/266535 (owner: Dzahn)
[18:25:10] <robh>	 but i've taken the liberty to move it to thursdays puppetswat
[18:25:16] <robh>	 and i'm going to poke at it before then
[18:25:34] <robh>	 urandom: we didnt get to yours today but its now on thursday
[18:25:40] <urandom>	 robh: kk
[18:25:55] <grrrit-wm>	 (PS1) Dzahn: bugzilla-static: ensure_resource to fix duplicates [puppet] - https://gerrit.wikimedia.org/r/266546
[18:26:08] <robh>	 and cmjohnson1 is still pushing his apache changes to the rest of the cluster (so we arent out of the window quite yet)
[18:26:32] <grrrit-wm>	 (PS2) Dzahn: bugzilla-static: ensure_resource to fix duplicates [puppet] - https://gerrit.wikimedia.org/r/266546
[18:27:07] <Vito>	 Houston, we've got a big problem
[18:27:27] <Vito>	 There is no user by the name "Vituzzu". Check your spelling. <-- while trying to log in to meta
[18:27:33] <Helder>	 Does anyone know why https://meta.wikimedia.org/wiki/Special:BlankPage redirects to https://wikimediafoundation.org/wiki/Special:BlankPage ?
[18:27:53] <sjoerddebruin>	 Hm meta seems broken yes
[18:27:54] <robh>	 !log i broke icinga, but then i fixed it, icinga back to normal.
[18:27:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:28:06] <Josve05away>	 and commons. Can't save things. get 404's
[18:28:12] <robh>	 cmjohnson1: we have an issue.
[18:28:15] <robh>	 with logging in
[18:28:33] <Josve05away>	 also got redirected to wikimediafoundation-wiki when trying to do so sometimes
[18:29:18] <robh>	 cmjohnson1: i think one of those changes broke things
[18:29:20] <Leah>	 Hi. Meta-Wiki seems broken. Known issue?
[18:29:22] <ebernhardson>	 robh: kk
[18:29:24] <Vito>	 https://phabricator.wikimedia.org/T124804
[18:29:33] <sjoerddebruin>	 Vito: It's not a CA-thing.
[18:29:37] <ebernhardson>	 robh: was it the prior brokenness i saw mentioned, or was it mine as well?
[18:29:43] <Vito>	 I think so sjoerddebruin
[18:29:47] <mutante>	 robh: that seems bad and related to that change that moved wwwportals
[18:29:50] <icinga-wm>	 PROBLEM - NTP on planet1001 is CRITICAL: NTP CRITICAL: Offset 1.700246453 secs
[18:29:51] <mutante>	 i'd revert that
[18:29:54] <robh>	 ebernhardson: not sure but rolling it back fixed it
[18:30:00] <ebernhardson>	 ok
[18:30:00] <robh>	 and now we're in outage condition
[18:30:02] <hoo>	 Could someone please rv group0?
[18:30:07] <robh>	 cmjohnson1: revert https://gerrit.wikimedia.org/r/#/c/265659/
[18:30:14] <robh>	 its breaking things
[18:30:15] <cmjohnson1>	 robh: did you or should i 
[18:30:24] <robh>	 i did not, please do so
[18:30:36] <grrrit-wm>	 (PS1) Cmjohnson: Revert "beta: Move login and bits apache configs into wikimedia.conf, like prod" [puppet] - https://gerrit.wikimedia.org/r/266549
[18:30:54] <subbu>	 jynus, thanks for the quick response in provisioning the erstwhile-ruthenium-dbs. what is involved in getting access to them and puppetizing it (T124704)?
[18:31:11] <robh>	 Since its login related I don't think we have to do anything to caching due to this.
[18:31:22] <robh>	 but im not certain.
[18:31:29] <Leah>	 We're 301ing meta.wikimedia.org to wikimediafoundation.org currently.
[18:31:35] <jynus>	 subbu, please wait some seconds, there are some issues going on
[18:31:44] <Leah>	 So there might be bad cache at some level.
[18:31:46] <mutante>	 revert this https://gerrit.wikimedia.org/r/#/c/265642/4/modules/mediawiki/files/apache/sites/wikimedia.conf
[18:31:49] <mutante>	 cmjohnson1: 
[18:32:11] <cmjohnson1>	 robh: it's reverting modules/role/manifests/elasticsearch/alerts.pp
[18:32:14] <mutante>	 i think that because it touched wikimedia.org docroot
[18:32:30] <robh>	 wait
[18:32:31] <robh>	 fuck
[18:32:33] <mutante>	 that would be in response to Leah's comment
[18:32:35] <robh>	 mutante: which one to revert?
[18:32:39] <robh>	 oh
[18:32:49] <robh>	 mutante: so all of the changes broke things?
[18:33:03] <robh>	 so multiple issues
[18:33:05] <mutante>	 i don't know if more than one broke something, i just suspected that one
[18:33:12] <mutante>	 it's possible , yes
[18:33:13] <subbu>	 jynus, k
[18:33:53] <mutante>	 Reedy: ^ hey are you here
[18:33:59] <hoo>	 Ugh
[18:34:02] <hoo>	 we send 301s?
[18:34:10] <icinga-wm>	 PROBLEM - NTP on dubnium is CRITICAL: NTP CRITICAL: Offset unknown
[18:34:31] <dr0ptp4kt>	 hey all, is officewiki looking broken and otherwise is redirecting to foundationwiki. related?
[18:34:41] <Leah>	 dr0ptp4kt: Yes.
[18:34:46] <dr0ptp4kt>	 Leah: thx
[18:34:47] <Vito>	 same happens for other wikis dr0ptp4kt
[18:34:52] <dr0ptp4kt>	 Vito: thx
[18:34:56] <dr0ptp4kt>	 thx all around :)
[18:34:57] <Leah>	 Everything on wikimedia.org is probably borked currently, including login, office, and meta.
[18:35:04] <Vito>	 we just need to wait for the tech guys to fix it
[18:35:07] <dr0ptp4kt>	 yikes
[18:35:20] <Vito>	 meanwhile we can say silly funny stuffs about this outage
[18:35:27] <Leah>	 In another channel, sure. :-)
[18:35:32] <Vito>	 hehehehe
[18:36:03] <grrrit-wm>	 (PS1) Cmjohnson: Revert "mediawiki: Move www.wikimedia.org portal into wwwportals" [puppet] - https://gerrit.wikimedia.org/r/266551
[18:36:12] <Vito>	 btw I fear many bots/tasks will need to be restarted as soon as the outage ends
[18:36:37] <grrrit-wm>	 (PS2) Cmjohnson: Revert "beta: Move login and bits apache configs into wikimedia.conf, like prod" [puppet] - https://gerrit.wikimedia.org/r/266549
[18:38:05] <grrrit-wm>	 (PS2) Cmjohnson: Revert "mediawiki: Move www.wikimedia.org portal into wwwportals" [puppet] - https://gerrit.wikimedia.org/r/266551
[18:38:16] <grrrit-wm>	 (CR) Cmjohnson: [C: 2] Revert "beta: Move login and bits apache configs into wikimedia.conf, like prod" [puppet] - https://gerrit.wikimedia.org/r/266549 (owner: Cmjohnson)
[18:39:13] <grrrit-wm>	 (PS3) Cmjohnson: Revert "mediawiki: Move www.wikimedia.org portal into wwwportals" [puppet] - https://gerrit.wikimedia.org/r/266551
[18:39:57] <cscott>	 greg-g, robh: do you think I could sneak an OCG deploy either before or after the train deploy today?
[18:40:06] <robh>	 we're in an outage.
[18:40:16] <icinga-wm>	 PROBLEM - NTP on bohrium is CRITICAL: NTP CRITICAL: Offset 1.762993813 secs
[18:40:17] <cscott>	 robh: ah.  that's why I ask!
[18:40:20] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: wikimedia.org seems to be gone - https://phabricator.wikimedia.org/T124804#1966496 (Vituzzu)
[18:40:27] <grrrit-wm>	 (CR) Cmjohnson: [C: 2] Revert "mediawiki: Move www.wikimedia.org portal into wwwportals" [puppet] - https://gerrit.wikimedia.org/r/266551 (owner: Cmjohnson)
[18:40:52] <cmjohnson1>	 changes reverted
[18:41:31] <sjoerddebruin>	 :)
[18:41:32] <hoo>	 salt run puppet?
[18:41:53] <cscott>	 robh: i'll check back after the train deploy then, hopefully things will not be on fire.
[18:42:22] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: wikimedia.org seems to be gone - https://phabricator.wikimedia.org/T124804#1966502 (Aklapper) @Vituzzu: Thanks for reporting this. https://gerrit.wikimedia.org/r/#/c/266551/ got reverted so things should be back to normal. Can you confirm (by bypas...
[18:42:48] <_joe_>	 hoo: I'm on it
[18:43:12] <hoo>	 :)
[18:43:32] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: wikimedia.org seems to be gone - https://phabricator.wikimedia.org/T124804#1966507 (I_JethroBT) Agreed, meta.wikimedia.org has been completely replaced with a broken-ish landing page for Wikimedia projects:  {F3283563}
[18:43:43] <andre__>	 Meh, and https://commons.wikimedia.org/wiki/Commons:Village_pump seems to redirect me to wmf:
[18:43:57] <doctaxon>	 yes it does
[18:44:05] <AndyRussG>	 meta is down? https://meta.wikimedia.org/
[18:44:11] <Leah>	 AndyRussG: Yes, known.
[18:44:18] <andre__>	 Also, yes. https://phabricator.wikimedia.org/T124804
[18:44:20] <AndyRussG>	 Leah: ah K thx :)
[18:44:21] <Wiki13>	 AndyRussG: changes causing it are being reverted
[18:44:33] <AndyRussG>	 K thx!
[18:44:49] <_joe_>	 !log running salt --batch-size=20 -C 'G@cluster:appserver and G@site:eqiad' cmd.run 'puppet agent -t --tags mw-apache-config'
[18:44:52] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: wikimedia.org seems to be gone - https://phabricator.wikimedia.org/T124804#1966511 (Aklapper) and https://commons.wikimedia.org/wiki/Commons:Village_pump redirects me to wmf:
[18:45:03] <mutante>	 it affects everything in wikimedia.org, but not wikipedia.org 
[18:45:05] <AndyRussG>	 Leah: Wiki13: Can u tell me when this happened about? It would affect FR
[18:45:12] <cscott>	 wikitech seems fine.
[18:45:23] <Wiki13>	 i think about 15 mins ago
[18:45:35] <Wiki13>	 according to the log of this channel
[18:45:39] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: Meta and Commons seem to redirect to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966518 (Aklapper)
[18:45:40] <AndyRussG>	 Or I should say, might affect fr (checking)
[18:45:45] <AndyRussG>	 Wiki13: thx!
[18:45:50] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: Meta and Commons seem to redirect to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966520 (I_JethroBT) @Aklapper after bypassing my cache, meta.wikimedia.org is still gone.
[18:45:51] <Wiki13>	 nah it wont affect FR
[18:45:58] <Wiki13>	 only *.wikimedia.org sites
[18:46:02] <mutante>	 AndyRussG: doesnt affect wikipedia.org
[18:46:03] <Wiki13>	 like meta and commons
[18:46:04] <AndyRussG>	 Wiki13: FR uses CentralNotice which depends on banners from meta
[18:46:05] <_joe_>	 it's running, it will take some time to be applied though
[18:46:11] <_joe_>	 about 10 minutes at least
[18:46:17] <Wiki13>	 then its borked AndyRussG
[18:46:24] <AndyRussG>	 ;p
[18:46:25] <Leah>	 AndyRussG: I don't see an exact culprit at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:25] <the-wub>	 AndyRussG Wiki13 : yeah, donate.wikimedia.org is redirecting too. and CN is inaccessible
[18:46:26] <hoo>	 _joe_: What about varnish? These were 301s
[18:46:46] <_joe_>	 hoo: I'm first fixing the app layer
[18:46:49] <Wiki13>	 we have to wait until the techs here fix it
[18:46:50] <AndyRussG>	 the-wub: what campaigns are (were) up?
[18:46:52] <MatmaRex>	 (for those tuning in now, this is https://phabricator.wikimedia.org/T124804)
[18:47:03] <hoo>	 Sure, but that should go on its own now... hopefully
[18:47:21] <MatmaRex>	 Leah: AndyRussG: it was the puppetswat
[18:47:28] <mutante>	 Leah: probably https://gerrit.wikimedia.org/r/#/c/265642  it was in the puppet swat
[18:47:31] <the-wub>	 AndyRussG: only low level banners. let's take FR discussion to our channel
[18:47:35] <i_jethrobot>	 Thanks to folks for fixing the problem. : )
[18:47:45] <AndyRussG>	 the-wub: yep, thx!
[18:47:52] <MatmaRex>	 whichhhh i suppose isn't on SAL, that kind of sucks, eh?
[18:47:57] <Leah>	 MatmaRex: Right.
[18:48:00] <_joe_>	 hoo: it will take some time to recover from though
[18:48:14] <_joe_>	 it's a _ton_ of stuff to figure out
[18:49:59] <mafk>	 ehm, yes, is Meta down, right?
[18:50:01] <i_jethrobot>	 _joe_ Are we talking a matter of hours?
[18:50:06] <i_jethrobot>	 mafk - Correct.
[18:50:26] <mafk>	 i_jethrobot: thanks
[18:50:37] <_joe_>	 i_jethrobot: 20 minutes tops
[18:50:39] <_joe_>	 I hope
[18:51:28] <i_jethrobot>	 _joe_ OK.  Good time for some lunch, then.  : )
[18:52:37] <icinga-wm>	 RECOVERY - NTP on dubnium is OK: NTP OK: Offset -0.02100622654 secs
[18:52:52] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: Meta and Commons seem to redirect to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966553 (Vituzzu) @Aklapper still doesn't work for me. I'm currently served by Amsterdam's cluster btw.
[18:52:54] <ottomata>	 jynus:  https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=db1047
[18:53:00] <ottomata>	 PROCS CRITICAL: 2 processes with UID = 0 (root), args '/bin/bash /usr/local/bin/eventlogging_sync.sh' 
[18:53:00] <ottomata>	 ok?
[18:53:05] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: Meta, Commons, Wikispecies seem to redirect to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966554 (OhanaUnited)
[18:53:26] <jynus>	 ottomata, not a concern now
[18:53:32] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: Meta, Commons, Wikispecies seem to redirect to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966462 (OhanaUnited) Wikispecies also has the same issue
[18:53:41] <ottomata>	 k
[18:54:47] <wikibugs>	 operations, MediaWiki-extensions-CentralAuth, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966572 (matmarex)
[18:55:41] <wikibugs>	 operations, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966580 (matmarex)
[18:55:57] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966462 (matmarex)
[18:55:59] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966592 (Dzahn) The remaining issues are because a tagged puppet run is now executed on all appservers, which...
[18:56:02] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966593 (Tbayer) Office.wikimedia.org is affected too, just for the record.
[18:56:39] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966597 (Dzahn) >>! In T124804#1966593, @Tbayer wrote: > Office.wikimedia.org is affected too, just for the r...
[18:57:04] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966603 (MZMcBride) This issue is definitely going to require incident documentation (<https://wikitech.wikim...
[18:57:51] <grrrit-wm>	 (PS1) EBernhardson: Put more like query load back on eqiad for codfw load testing [mediawiki-config] - https://gerrit.wikimedia.org/r/266559
[18:58:15] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966615 (Izno) >>! In T124804#1966597, @Dzahn wrote: >  > everything under .wikimedia.org is affected but not...
[18:58:19] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966618 (Mike_Peel) >>! In T124804#1966597, @Dzahn wrote: >  > everything under .wikimedia.org is affected bu...
[18:59:06] <icinga-wm>	 RECOVERY - NTP on planet1001 is OK: NTP OK: Offset -0.009496450424 secs
[19:00:04] <jouncebot>	 marxarelli: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160126T1900).
[19:00:07] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966649 (Dzahn) >>! In T124804#1966618, @Mike_Peel wrote: >>>! In T124804#1966597, @Dzahn wrote: >>  >> every...
[19:00:19] <grrrit-wm>	 (CR) Aaron Schulz: Use the logical redis definition for GettingStarted. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/266481 (https://phabricator.wikimedia.org/T124671) (owner: Giuseppe Lavagetto)
[19:02:18] <marxarelli>	 !log backports to wmf.11 ready on mira but delaying train due to wikimedia.org outage
[19:02:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:04:35] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966668 (Mike_Peel) This kind of outage should probably appear on http://status.wikimedia.org/ ... (unless th...
[19:06:45] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966674 (Pine) @Mike_Peel agreed.
[19:08:26] <icinga-wm>	 RECOVERY - NTP on fermium is OK: NTP OK: Offset 0.01764667034 secs
[19:08:56] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966694 (Pine) Update: Commons is working now, but not Meta.
[19:11:06] <icinga-wm>	 PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors
[19:11:11] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966709 (matmarex) >>! In T124804#1966668, @Mike_Peel wrote: > This kind of outage should probably appear on...
[19:12:24] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966713 (Pine) Commons is down again for me.
[19:12:38] <icinga-wm>	 RECOVERY - NTP on bohrium is OK: NTP OK: Offset -0.009258747101 secs
[19:13:43] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966722 (Vituzzu) "There is no user by the name "Vituzzu". Check your spelling." again at meta.
[19:13:59] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966723 (Aklapper) >>! In T124804#1966713, @Pine wrote: > Commons is down again for me.  Please see T124804#1...
[19:14:41] <wikibugs>	 operations, Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1966725 (Dzahn) NEW
[19:14:42] <akosiaris>	 !log issuing a varnish ban on all eqiad backend varnish for req.http.host .*wikimedia.org
[19:14:43] <wikibugs>	 operations, Discovery: Elasticsearch health and capacity planning FY2016-17 - https://phabricator.wikimedia.org/T124626#1966735 (TJones) David mentioned this ticket, and I had to take a peek.  > If my math is right, a 100% increase in 12 months extrapolated to 18 months gives >  > current capacity = 1 > in...
[19:14:45] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:15:06] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966736 (matmarex)
[19:15:15] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966738 (Dzahn) >>! In T124804#1966668, @Mike_Peel wrote: > This kind of outage should probably ap...
[19:16:18] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966744 (RobH) Operations is still working on this issue.  At this time the underlying issue has b...
[19:17:37] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966746 (Pine) @Dzahn @RobH thank you.
[19:18:55] <wikibugs>	 operations, Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1966753 (Dzahn)
[19:18:57] <wikibugs>	 operations, Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1966755 (Mike_Peel) Checking for specific strings would make sense - standard HTTP tokens or headers perhaps? But beyond that, the user expectation...
[19:19:31] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966757 (jcrespo) At 7:36 PM, for reasons operations team has not yet investigated, a wrong config...
[19:20:11] <MatmaRex>	 (hmm, is loginwiki actually down?)
[19:21:39] <mafk>	 I can access that MatmaRex 
[19:21:45] <mafk>	 and meta
[19:21:52] <MatmaRex>	 hmm, actually, things might just be fixed now
[19:22:07] <MatmaRex>	 i'm just idly wondering if loginwiki was actually affected
[19:22:14] <MatmaRex>	 anyway. not important.
[19:22:39] <lestaty>	 login, meta and commons ok on brazil.
[19:22:54] <_joe_>	 can anyone still having issues please state so?
[19:22:58] <marxarelli>	 MatmaRex: there was a report in the wm-l thread about login to mw.org not working
[19:23:06] <_joe_>	 because the problems should be fixed at least on desktop
[19:23:30] <Pine>	 _joe_: I still can't get Meta or Commons on desktop
[19:23:30] <_joe_>	 Pine: as of now?
[19:23:31] <Pine>	 Yes
[19:23:35] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966774 (Mike_Peel) >>! In T124804#1966757, @jcrespo wrote: > @Mike_Peel That panel is not handled...
[19:23:38] <_joe_>	 can you open a new browser window to test?
[19:23:49] <cursive>	 _joe_: Works for me
[19:23:52] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966776 (jcrespo) Correction, it was 19:23 UTC.
[19:23:57] <_joe_>	 because browsers tend to cache pages
[19:24:37] <Pine>	 _joe_: I just refreshed, they're both working now for me
[19:24:50] <_joe_>	 Pine: :) glad to hear
[19:24:51] <wikibugs>	 operations, Wikimedia-Apache-configuration, netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966779 (jeblad) ...and from Oslo, 10 points for well-done cleanup! :)
[19:25:06] <cursive>	 _joe_: Yup, all clear in Chrome incognito
[19:25:27] <Pine>	 To stay up, or not to stay up, that is the question.
[19:25:28] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966782 (10jcrespo) > The webpage showing the status of operations isn't handled by the operations t...
[19:25:52] <wikibugs>	 6operations, 7Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1966785 (10Grendelkhan) Additionally, are there presubmit/integration checks that would have caught this? The builds looked green on push.
[19:26:00] <andre__>	 Commons and Meta do not redirect for me anymore.
[19:26:27] <icinga-wm>	 PROBLEM - NTP on rutherfordium is CRITICAL: NTP CRITICAL: Offset 12.7241199 secs
[19:27:25] <akosiaris>	 !log issuing a varnish ban on all eqiad frontend varnish for req.http.host .*wikimedia.org
[19:27:35] <akosiaris>	 !log issuing a varnish ban on all codfw backend varnish for req.http.host .*wikimedia.org
[19:27:54] <akosiaris>	 !log issuing a varnish ban on all codfw frontend varnish for req.http.host .*wikimedia.org
[19:28:00] <marxarelli>	 _joe_: is it ok to proceed with the train to mw1017 and group0?
[19:28:07] <akosiaris>	 !log issuing a varnish ban on all ulsfo backend varnish for req.http.host .*wikimedia.org
[19:28:15] <akosiaris>	 !log issuing a varnish ban on all ulsfo frontend varnish for req.http.host .*wikimedia.org
[19:28:20] <akosiaris>	 marxarelli: no, please not yet
[19:28:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:28:22] <marxarelli>	 following the fun invalidation stuff
[19:28:25] <_joe_>	 marxarelli: please not yet
[19:28:28] <marxarelli>	 ack
[19:28:32] <akosiaris>	 !log issuing a varnish ban on all ulsfo backend varnish for req.http.host .*wikimedia.org
[19:28:38] <akosiaris>	 !log issuing a varnish ban on all esams backend varnish for req.http.host .*wikimedia.org
[19:28:51] <_joe_>	 (we're just back logging, we already did all of that)
[19:28:51] <akosiaris>	 !log issuing a varnish ban on all esams frontend varnish for req.http.host .*wikimedia.org
[19:28:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:28:55] <akosiaris>	 !log all of the above already done, back logging
[19:28:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:28:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:29:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:29:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:29:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:29:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:30:49] <Leah>	 _joe_, akosiaris: Thanks for the fast response. :-)
[19:32:16] <icinga-wm>	 RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct
[19:33:28] <grrrit-wm>	 (03PS1) 10Eevans: Enable EventBus on remaining (applicable) wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266564 (https://phabricator.wikimedia.org/T116786) 
[19:34:09] <icinga-wm>	 PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[19:34:20] <grrrit-wm>	 (03CR) 10Jhobs: [C: 031] "Assuming that 0.01 -> 0.1 change was intentional, then this LGTM." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265292 (https://phabricator.wikimedia.org/T123932) (owner: 10Bmansurov)
[19:35:33] <akosiaris>	 !log all of the above referred to cache_text
[19:36:13] <akosiaris>	 !log issuing a varnish ban on all eqiad mobile backend varnish for req.http.host .*wikimedia.org
[19:36:20] <akosiaris>	 !log issuing a varnish ban on all eqiad mobile frontend varnish for req.http.host .*wikimedia.org
[19:36:29] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:36:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:36:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:36:54] <akosiaris>	 !log issuing a varnish ban on all codfw mobile backend varnish for req.http.host .*wikimedia.org
[19:36:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:38:23] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966833 (10MZMcBride) >>! In T124804#1966722, @Vituzzu wrote: > "There is no user by the name "Vituz...
[19:38:28] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966835 (10Vituzzu) >>! In T124804#1966833, @MZMcBride wrote: >>>! In T124804#1966722, @Vituzzu wrot...
[19:38:34] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966836 (10jcrespo) Update: while we believe most issues have been solved now, the caching purge has...
[19:39:21] <wikibugs>	 6operations, 7Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1966839 (10jayvdb)
[19:39:38] <icinga-wm>	 RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 772570 bytes in 6.513 second response time
[19:39:40] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1966845 (10Harej)
[19:40:51] <wikibugs>	 6operations, 7Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1966853 (10Dzahn) There is a script called apache-fast-test. (modules/apache/files/apache-fast-test) but it's not run automatically by integration. I...
[19:43:23] <akosiaris>	 !log issuing a varnish ban on all codfw mobile frontend varnish for req.http.host .*wikimedia.org
[19:43:42] <akosiaris>	 !log issuing a varnish ban on all ulsfo mobile backend varnish for req.http.host .*wikimedia.org
[19:43:48] <akosiaris>	 !log issuing a varnish ban on all ulsfo mobile frontend varnish for req.http.host .*wikimedia.org
[19:43:52] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:44:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:44:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:45:52] <akosiaris>	 !log issuing a varnish ban on all esams mobile backend varnish for req.http.host .*wikimedia.org
[19:45:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:46:00] <akosiaris>	 !log issuing a varnish ban on all esams mobile frontend varnish for req.http.host .*wikimedia.org
[19:46:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
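The ban sequence logged above covers every combination of cache cluster, site, and layer by hand. A minimal sketch of enumerating that checklist, so no combination is missed, might look like the following. The site and layer names come from the log; the function names, the cluster labels, and the backend-before-frontend ordering are assumptions for illustration, and nothing here issues a real ban.

```python
# Hypothetical checklist generator; it only builds log messages, it does not talk to varnish.
CLUSTERS = ["text", "mobile"]          # cluster labels are illustrative
SITES = ["eqiad", "codfw", "ulsfo", "esams"]
LAYERS = ["backend", "frontend"]       # assumed order: backend first, then frontend


def ban_plan(clusters=CLUSTERS, sites=SITES, layers=LAYERS):
    """Return every (cluster, site, layer) combination in a fixed order."""
    return [(c, s, l) for c in clusters for s in sites for l in layers]


def describe(step, host_pattern=".*wikimedia.org"):
    """Format one step in the style of the !log lines above."""
    cluster, site, layer = step
    return (
        f"issuing a varnish ban on all {site} {cluster} {layer} "
        f"varnish for req.http.host {host_pattern}"
    )


def missing_steps(done, plan=None):
    """Return the planned steps that have not been recorded as done."""
    plan = ban_plan() if plan is None else plan
    done = set(done)
    return [step for step in plan if step not in done]
```

Comparing the steps already logged against `ban_plan()` with `missing_steps` gives a quick answer to "did we ban everything?" during back logging.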
[19:49:02] <grrrit-wm>	 (03PS1) 10EBernhardson: Keep daily graphite data for 5 years [puppet] - 10https://gerrit.wikimedia.org/r/266567 
[19:49:20] <bgerstle>	 we don't have any S3 buckets, do we?
[19:49:58] <Leah>	 bgerstle: What's your real question? :-)
[19:50:09] <bgerstle>	 if we have any S3 buckets i can use
[19:50:12] <akosiaris>	 marxarelli|brb: red alert is done. You can proceed with deploy
[19:50:14] <bgerstle>	 to upload build artifacts from travis
[19:50:39] <bgerstle>	 https://docs.travis-ci.com/user/uploading-artifacts/
[19:50:47] <grrrit-wm>	 (03CR) 10EBernhardson: "I was looking over the data stored in graphite to get some ideas for capacity planning for elasticsearch, and was a bit disappointed to on" [puppet] - 10https://gerrit.wikimedia.org/r/266567 (owner: 10EBernhardson)
[19:50:59] <cscott>	 bgerstle: why not use github's "releases" feature?
[19:51:07] <bgerstle>	 cscott: because this is specifically for test runs
[19:51:17] <bgerstle>	 cscott: i.e. to see failed visual test images
[19:51:20] <icinga-wm>	 RECOVERY - NTP on rutherfordium is OK: NTP OK: Offset -1.537799835e-05 secs
[19:51:52] <cscott>	 bgerstle: you could commit them to a repo and git push
[19:56:19] <bgerstle>	 cscott: others have already asked about uploading directly to GitHub, but apparently github upload API is deprecated
[19:56:20] <bgerstle>	 cscott: don't think i'd have push access from Travis VM
[19:56:20] <bgerstle>	 or Travis environment in general (not sure if OS X is actually a VM)
[19:57:15] <bgerstle>	 cscott: and unfortunately, these tests are only failing in travis :-(
[19:57:18] <cscott>	 well, if it helps https://github.com/cscott/node-icu-bidi/blob/master/scripts/publish.js is my script to push release binaries to github from travis.
[19:57:19] <cscott>	 i'm pretty sure you can push to git via github's api as well, you just need an OAuth token.  which you'd store in a secure env variable in travis.
[19:57:19] <bgerstle>	 cscott: i see, you're using an access token (which i assume has that privilege)
[19:57:19] <grrrit-wm>	 (03PS1) 10Ottomata: Make all kafka broker metrics prefixed with kafka.cluster.$cluster_name [puppet] - 10https://gerrit.wikimedia.org/r/266568 (https://phabricator.wikimedia.org/T121643) 
[19:57:19] <bgerstle>	 yeah, or i can just use build artifacts that upload to S3
[19:57:19] <grrrit-wm>	 (03PS2) 10Ottomata: Make all kafka broker metrics prefixed with kafka.cluster.$cluster_name [puppet] - 10https://gerrit.wikimedia.org/r/266568 (https://phabricator.wikimedia.org/T121643) 
[19:57:19] <cscott>	 bgerstle: yeah.  it just seems a shame to pay for storage when there's so much free storage floating around.
[19:57:20] <cscott>	 but developer time probably costs more than S3 does
[19:57:20] <bgerstle>	 and people have already built cool tools that visualize the images in S3: https://github.com/ashfurrow/second_curtain
[19:57:20] <cscott>	 you could upload to commons. ;)
[19:57:20] <bgerstle>	 cscott: e.g. getting this when a visual test fails would be _really_ nice https://eigen-ci.s3.amazonaws.com/snapshots/2014-08-04--15-47/index.html
[19:57:21] <bgerstle>	 just need an AWS bucket :-)
[19:57:26] <bgerstle>	 Leah: so whaddya say? :-)
[19:59:00] <Leah>	 I don't know if the Wikimedia Foundation has any S3 buckets currently.
[19:59:09] <Leah>	 I guess file a ticket in Phabricator and mark it operations?
[20:00:32] <bgerstle>	 Leah: sure, was hoping i could get a quick answer, but the absence of a "hell no" to S3 is enough to keep me going for now :-)
[20:00:51] <grrrit-wm>	 (03PS3) 10Chad: Update debian package for gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 
[20:01:10] <Leah>	 bgerstle: I can't really tell if the Travis builds just need storage space to live on or if they're really S3-specific.
[20:01:21] <Leah>	 We have the former, of course.
[20:01:49] <Leah>	 The whole Travis/mobile mess won't really be any worse by incorporating Amazon hosting, I don't think.
[20:01:59] <marxarelli>	 !log proceeding with train deploy. wmf.11 to mw1017, then group0
[20:02:20] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:02:23] <bgerstle>	 Leah: unfortunately it's S3 specific at the moment :-(
[20:02:37] <bgerstle>	 there's no mention of an adapter to different storage repos
[20:02:39] <marxarelli>	 tgr, bd808, anomie: ^
[20:02:55] <Leah>	 All right.
[20:03:09] <bgerstle>	 Leah: i was specifically wondering about any existing S3 buckets
[20:03:29] <bgerstle>	 (since it's S3 specific)
[20:03:43] <Leah>	 I haven't heard of the Wikimedia Foundation having any S3 buckets.
[20:03:47] <bgerstle>	 Leah: sorry, have another meeting, bbl
[20:03:53] <bd808>	 marxarelli: cool beans. anomie and I are in a meeting but he can jump out to test things when you are ready
[20:03:53] <Leah>	 No worries, bye.
[20:05:33] <bgerstle>	 Leah: thanks!
[20:07:09] <grrrit-wm>	 (03PS3) 10Ottomata: Make all kafka broker metrics prefixed with kafka.cluster.$cluster_name [puppet] - 10https://gerrit.wikimedia.org/r/266568 (https://phabricator.wikimedia.org/T121643) 
[20:09:28] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Make all kafka broker metrics prefixed with kafka.cluster.$cluster_name [puppet] - 10https://gerrit.wikimedia.org/r/266568 (https://phabricator.wikimedia.org/T121643) (owner: 10Ottomata)
[20:10:11] <icinga-wm>	 RECOVERY - NTP on mendelevium is OK: NTP OK: Offset -0.002828121185 secs
[20:11:07] <grrrit-wm>	 (03PS1) 10Ottomata: Remove extra group_prefix assignment [puppet] - 10https://gerrit.wikimedia.org/r/266571 
[20:11:20] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Remove extra group_prefix assignment [puppet] - 10https://gerrit.wikimedia.org/r/266571 (owner: 10Ottomata)
[20:14:57] <marxarelli>	 !log running 'sync-common --verbose deployment.eqiad.wmnet' on mw1017 to sync wmf.11 for initial testing
[20:14:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:16:11] <icinga-wm>	 RECOVERY - NTP on technetium is OK: NTP OK: Offset -0.001200199127 secs
[20:18:02] <marxarelli>	 !log locally modified wikiversions.php and wikiversions.json on mw1017 for testing
[20:18:19] <grrrit-wm>	 (03CR) 10Mobrovac: [C: 031] Enable EventBus on remaining (applicable) wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266564 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans)
[20:18:26] <grrrit-wm>	 (03PS1) 10Ottomata: End kafka group_prefix propertly with . [puppet] - 10https://gerrit.wikimedia.org/r/266575 
[20:18:47] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] End kafka group_prefix propertly with . [puppet] - 10https://gerrit.wikimedia.org/r/266575 (owner: 10Ottomata)
[20:18:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:19:17] <marxarelli>	 er, actually, bd808, anomie|meeting is that the best course? (locally modifying wikiversions on mw1017)
[20:19:38] <grrrit-wm>	 (03PS4) 10Chad: Update debian package for gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 
[20:20:50] <bd808>	 marxarelli: yeah. Just changing wikiversions.php locally on mw1017 is all it takes
[20:21:32] <grrrit-wm>	 (03CR) 10Aaron Schulz: [C: 031] Raise file upload limit to 2,5 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266544 (https://phabricator.wikimedia.org/T116514) (owner: 10TheDJ)
[20:22:22] <bd808>	 marxarelli: https://www.mediawiki.org/wiki/Special:Version from mw1017 is still showing .10 for me
[20:22:39] <marxarelli>	 bd808: whoops. just did testwiki
[20:22:55] <marxarelli>	 sec
[20:23:04] <bd808>	 marxarelli: ah ok
[20:24:35] <marxarelli>	 bd808: k. all of group0 should be on wmf.11 (mw1017) now
[20:28:31] <bd808>	 marxarelli: *nod*
[20:28:32] <grrrit-wm>	 (03PS3) 10Dzahn: bugzilla-static: ensure_resource to fix duplicates [puppet] - 10https://gerrit.wikimedia.org/r/266546 
[20:28:32] <grrrit-wm>	 (03PS5) 10Chad: Update debian package for gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 
[20:28:32] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] bugzilla-static: ensure_resource to fix duplicates [puppet] - 10https://gerrit.wikimedia.org/r/266546 (owner: 10Dzahn)
[20:28:32] <grrrit-wm>	 (03PS1) 10Ottomata: Pass group_prefix to analytics-eqiad kafka jmxtrans [puppet] - 10https://gerrit.wikimedia.org/r/266578 
[20:28:32] <grrrit-wm>	 (03PS2) 10Ottomata: Pass group_prefix to analytics-eqiad kafka jmxtrans [puppet] - 10https://gerrit.wikimedia.org/r/266578 
[20:28:32] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Pass group_prefix to analytics-eqiad kafka jmxtrans [puppet] - 10https://gerrit.wikimedia.org/r/266578 (owner: 10Ottomata)
[20:28:32] <bd808>	 marxarelli: for really testing things we need to move all wikis on mw1017 to .11. anomie will be done in this meeting in a few minutes and can help
[20:28:32] <bd808>	 tgr: group0 via mw1017 has .11 again now
[20:28:40] <marxarelli>	 bd808: ah, ok. will do
[20:43:37] <marxarelli>	 !log modified wikiversions.php locally on mw1017 to promote all wikis to wmf.11 for initial testing
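The local wikiversions change described above amounts to rewriting a mapping from wiki database name to code version. A rough sketch, assuming wikiversions.json is a flat object of the form `{"dbname": "php-<version>"}` (the layout, the helper names, and the group membership in the usage note are all assumptions for illustration):

```python
import json


def promote_all(wikiversions, version):
    """Return a copy of the mapping with every wiki pointed at `version`."""
    return {dbname: version for dbname in wikiversions}


def promote_group(wikiversions, group, version):
    """Return a copy with only the wikis named in `group` pointed at `version`."""
    return {
        dbname: (version if dbname in group else current)
        for dbname, current in wikiversions.items()
    }


def dump(wikiversions):
    """Serialize with sorted keys so local edits produce stable diffs."""
    return json.dumps(wikiversions, indent=4, sort_keys=True)
```

For example, `promote_group(current, {"testwiki", "mediawikiwiki"}, "php-1.27.0-wmf.11")` would model moving a hypothetical group0 only, while `promote_all` models the "all wikis on mw1017" step used for initial testing.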
[20:43:37] <icinga-wm>	 PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is inactive
[20:43:38] <grrrit-wm>	 (03PS1) 10Dzahn: releases: use ensure_resource, avoid duplicate defs [puppet] - 10https://gerrit.wikimedia.org/r/266581 
[20:43:38] <chasemp>	 !log stopping nfs on labstore1001
[20:43:38] <grrrit-wm>	 (03PS2) 10Dzahn: releases: use ensure_resource, avoid duplicate defs [puppet] - 10https://gerrit.wikimedia.org/r/266581 
[20:43:38] <bd808>	 marxarelli: mw1017 interactions are looking good to me. logout/login worked, edit worked
[20:43:38] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] releases: use ensure_resource, avoid duplicate defs [puppet] - 10https://gerrit.wikimedia.org/r/266581 (owner: 10Dzahn)
[20:43:39] <tgr>	 login status after account creation seems unreliable
[20:43:40] <tgr>	 I'll try to come up with reproduction steps
[20:43:40] <marxarelli>	 bd808, tgr: k. will wait for y'all to give the green light before continuing with group0 promotion
[20:43:42] <bd808>	 tgr: would that be a group0 blocker or just a group1 blocker?
[20:43:42] <bd808>	 mostly meaning how long do we need to keep marxarelli hanging on the line
[20:43:42] <tgr>	 group1, I tested with loginwiki on mw1017
[20:43:43] <chasemp>	 !log starting nfsd on labstore1001
[20:48:54] <marxarelli>	 bd808, tgr: k. i'll proceed with group0 then
[20:50:14] <grrrit-wm>	 (03PS3) 10Andrew Bogott: Remove puppet classes and files associated with /srv/mediawiki/private/WikitechPrivateLdapSettings.php [puppet] - 10https://gerrit.wikimedia.org/r/266452 (https://phabricator.wikimedia.org/T124732) 
[20:51:40] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] Remove puppet classes and files associated with /srv/mediawiki/private/WikitechPrivateLdapSettings.php [puppet] - 10https://gerrit.wikimedia.org/r/266452 (https://phabricator.wikimedia.org/T124732) (owner: 10Andrew Bogott)
[20:53:15] <wikibugs>	 7Puppet, 10Beta-Cluster-Infrastructure, 5Patch-For-Review, 7Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#1967239 (10Krenair)
[20:53:58] <grrrit-wm>	 (03PS3) 10Andrew Bogott: Don't send puppet nags to the novaadmin user. [puppet] - 10https://gerrit.wikimedia.org/r/266192 (https://phabricator.wikimedia.org/T124516) 
[20:56:24] <tgr>	 I'm hotwiring mw1017 to not throttle account creations from my ip
[20:58:34] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] Don't send puppet nags to the novaadmin user. [puppet] - 10https://gerrit.wikimedia.org/r/266192 (https://phabricator.wikimedia.org/T124516) (owner: 10Andrew Bogott)
[20:58:34] <chasemp>	 !log drop labstore1001 nfs threads down to 192
[20:58:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:59:08] <wikibugs>	 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review, 7WorkType-NewFunctionality: Phase out operations-puppet-pep8 Jenkins job and tools/puppet_pep8.py - https://phabricator.wikimedia.org/T114887#1967267 (10hashar) The job `operations-puppet-tox-pep8-jessie` is always triggered. It will...
[20:59:23] <bd808>	 tgr: when marxarelli syncs the wikiversions bump you'll have to set it back up probably
[20:59:37] <tgr>	 yeah
[20:59:50] <marxarelli>	 !log getting 'Lost parent, LightProcess exiting' when running sync-dir
[20:59:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:00:23] <bd808>	 marxarelli: from which servers?
[21:00:30] <marxarelli>	 bd808: mira
[21:00:47] <marxarelli>	 whole msg is '[Tue Jan 26 20:59:06 2016] [hphp] [5613:7f79e535fd00:0:000001] [] Lost parent, LightProcess exiting'
[21:00:58] <bd808>	 o_O locally hhvm is puking and dying?
[21:01:16] <jynus>	 known issue, ignore it for now, it is not affecting sync
[21:01:17] <marxarelli>	 i can't tell if it's remote execution or not
[21:01:34] <marxarelli>	 well, it is
[21:01:37] <marxarelli>	 for all servers
[21:02:15] <marxarelli>	 jynus: rgr that. i'll resume it then
[21:02:32] <marxarelli>	 !log resuming sync-dir and ignoring error as a known issue
[21:02:35] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:02:56] <jynus>	 I can tell you because I personally tested it
[21:07:23] <bgerstle>	 hi again ops! a quick search on wikitech revealed "WebPageTest" which appears to use S3: https://wikitech.wikimedia.org/wiki/WebPageTest#Setup_S3
[21:07:29] * marxarelli is glad he upgraded his dsl service, helps with the error stream
[21:08:06] <ori>	 bgerstle: and?
[21:08:08] <wikibugs>	 6operations, 6Labs, 10Labs-Infrastructure, 5Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1967301 (10Andrew) 5Open>3Resolved
[21:08:22] <bgerstle>	 i'd like to talk to the people involved to see if i can use S3 too
[21:08:48] <bgerstle>	 sorry gotta go... meeting multitasking doesn't work
[21:08:56] <ori>	 ping phedenskog or me sometime
[21:08:59] <wikibugs>	 6operations, 10RESTBase, 5Patch-For-Review: Reduce log spam by removing non-operational cassandra IPs from seeds - https://phabricator.wikimedia.org/T123869#1967306 (10Eevans) This didn't go out in today's Puppet SWAT, and has been rescheduled for Thursday.
[21:09:04] <bgerstle_afk>	 ori will do, thanks!
[21:09:50] <cscott>	 marxarelli: you're working on the train deploy?
[21:09:57] <marxarelli>	 cscott: yep
[21:10:24] <cscott>	 marxarelli: could you ping me when it's done?  i'd like to sneak in an OCG deploy afterwards, if there's time.
[21:10:36] <marxarelli>	 cscott: sure thing
[21:11:22] <marxarelli>	 !log sync-dir php linting failed
[21:11:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:11:49] <marxarelli>	 ah, it was local b0rking
[21:12:09] <marxarelli>	 ok, that other error is related to the lint failure
[21:15:04] <marxarelli>	 bd808: so i have no idea if sync-dir was successful because of the hhvm `php -l` madness
[21:15:32] <bd808>	 marxarelli: if lint failed then it never got around to trying to sync
[21:16:17] <marxarelli>	 well, i can hack scap to use php5 -l for now
[21:16:34] <bd808>	 that might be worth a shot
[21:16:45] <marxarelli>	 or manually lint and temporarily remove check_valid_syntax
[21:17:00] <marxarelli>	 k. i'll try option 1
[21:18:59] <marxarelli>	 alright. hack worked but the lint still failed. might be real
[21:19:02] * marxarelli checks
[21:21:22] <marxarelli>	 !log lint error found when running sync-dir 'Errors parsing /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Echo/includes/iterator/CallbackFilterIterator.php'
[21:21:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:22:33] <marxarelli>	 !log Fatal error: Cannot redeclare class CallbackFilterIterator in /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Echo/includes/iterator/CallbackFilterIterator.php on line 24
[21:22:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:24:09] <Reedy|ChaosMonke>	 That's not good
[21:24:12] <marxarelli>	 bd808: who can i ping about that? ^ Roan?
[21:24:21] <Reedy|ChaosMonke>	 Roan's on holiday AFAIK
[21:24:30] <marxarelli>	 might only be a php 5.4 issue
[21:24:35] <bd808>	 marxarelli: legoktm?
[21:24:39] <Reedy|ChaosMonke>	 5.4? :P
[21:24:46] <bd808>	 5.5
[21:24:47] <Krenair>	  * This class is implemented as part of SPL starting at PHP5.4.  This
[21:24:47] <Krenair>	  * re-implementation provides backwards compatibility to mediawiki
[21:24:47] <Krenair>	  * running on PHP5.3.
[21:24:56] <Krenair>	 huh
[21:25:01] <bd808>	 oh fun
[21:25:02] <marxarelli>	 i'm linting with 5.5 on mira
[21:25:11] <marxarelli>	 but CallbackFilterIterator was introduced in 5.4
[21:25:34] <Reedy|ChaosMonke>	 Shouldn't it be wrapped in an if !exists?
[21:25:45] <marxarelli>	 yeah, seems like
[21:27:15] <Reedy|ChaosMonke>	 Wonder how well that works in an autoloader
[21:27:27] <marxarelli>	 strange that this didn't occur last week, but perhaps that's because the linter was running via hhvm
[21:27:40] <Reedy|ChaosMonke>	 or php 5.3?
[21:27:45] <Reedy|ChaosMonke>	 Did tin have hhvm?
[21:27:56] <marxarelli>	 whatever the debian default php is on tin
[21:27:57] <MaxSem>	 nope, tin was on 5.3
[21:27:59] <marxarelli>	 *was*
[21:28:02] <MaxSem>	 which explains a lot
[21:28:21] <hashar>	 tin was 5.3 definitely
[21:28:40] * marxarelli sighs
[21:28:48] <hashar>	 that is a reason we still have CI job running Zend 5.3 (there are more reasons)
[21:28:51] <Reedy|ChaosMonke>	 Needs a task filing at least
[21:29:05] <MaxSem>	 nope
[21:29:13] <MaxSem>	 we should just go 5.5 only!
[21:29:25] <hashar>	 so I am really wondering how the hell a Zend 5.4+ method has been introduced in the code base 
[21:29:47] <MaxSem>	 it's a shim for poor old 5.3
[21:29:58] <Reedy|ChaosMonke>	 heh
[21:30:05] <MaxSem>	 autoloader never gets called for it, so no fatals
[21:30:13] <MaxSem>	 but linter just lints all .php
[21:30:15] <Reedy|ChaosMonke>	 Considering, it's only tin on WMF servers...
[21:30:22] <marxarelli>	 i'll file a task unless someone else is already on it
[21:30:29] <Reedy|ChaosMonke>	 Yeah, so an if !class_exists() should fix it?
[21:30:36] <MaxSem>	 new bug: nuke tin from the orbit
[21:30:49] <Reedy|ChaosMonke>	 MaxSem: I think _joe_ is getting on with it
[21:30:58] <Reedy|ChaosMonke>	 MaxSem: Then we have the community to deal with
[21:33:16] <Krenair>	 tin is no longer the main deployment host
[21:33:22] <Krenair>	 it is still receiving deployments though
[21:33:34] <hashar>	 we still need a Zend 5.3 for CI regardless, since we run test for release branches
[21:33:40] <Krenair>	 arguably we still need 5.3 support
[21:33:57] <Reedy|ChaosMonke>	 Until tin is reinstalled...
[21:34:35] <Reedy|ChaosMonke>	 then, if/when we bump core, we only need 5.3 for < 1.27
[21:35:06] <hashar>	 oh
[21:39:52] <subbu>	 jynus, pinging again about https://phabricator.wikimedia.org/T124704 to see what is involved in getting this done.
[21:41:05] <marxarelli>	 !log filed https://phabricator.wikimedia.org/T124828 for fatal in extensions/Echo
[21:41:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:42:14] <Nemo_bis>	 well done scap :)
[21:45:37] <marxarelli>	 legoktm, matt_flaschen: anyone around to take care of ^ ?
[21:48:26] <wikibugs>	 6operations, 10Traffic, 7Documentation: Automate and/or better-document varnish ban procedure for operations staff, so it can be accomplished with more speed and confidence in outage conditions - https://phabricator.wikimedia.org/T124835#1967490 (10RobH) 3NEW a:3BBlack
[21:48:58] <grrrit-wm>	 (03PS1) 10Ottomata: Fix alert for eventlogging raw - valid rate [puppet] - 10https://gerrit.wikimedia.org/r/266597 
[21:49:02] <marxarelli>	 bd808: fwiw, this is a long-standing bug that doesn't occur with hhvm -l
[21:49:18] <bd808>	 marxarelli: heh. their tests exclude that file from parallel-lint
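The mismatch discussed above is that the deploy-time linter checks every .php file while the extension's own tests skip the shim. A minimal sketch of a lint pass with an explicit exclusion list, assuming `php -l` as the syntax checker; the function names and the exclusion mechanism are hypothetical and are not scap's actual implementation:

```python
import os
import subprocess


def php_files(root, exclude=()):
    """Return sorted .php paths under root, skipping root-relative paths in exclude."""
    excluded = {os.path.normpath(p) for p in exclude}
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.relpath(path, root))
            if name.endswith(".php") and rel not in excluded:
                found.append(path)
    return sorted(found)


def lint(paths, linter=("php", "-l")):
    """Run the linter on each path and return the paths that exit nonzero."""
    failed = []
    for path in paths:
        result = subprocess.run(
            [*linter, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Passing `exclude=["includes/iterator/CallbackFilterIterator.php"]` would mirror the test-side exclusion, though wrapping the shim in a `class_exists` check, as suggested in the channel, addresses the cause rather than hiding the file from the linter.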
[21:49:22] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Fix alert for eventlogging raw - valid rate [puppet] - 10https://gerrit.wikimedia.org/r/266597 (owner: 10Ottomata)
[21:49:23] <wikibugs>	 6operations, 10Traffic, 7Documentation: Automate and/or better-document varnish ban procedure for operations staff, so it can be accomplished with more speed and confidence in outage conditions - https://phabricator.wikimedia.org/T124835#1967498 (10RobH) I initially assigned this to @bblack, but it can be ac...
[21:51:57] <grrrit-wm>	 (03PS1) 10Ori.livneh: New set of speed experiments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266599 
[21:52:57] <grrrit-wm>	 (03CR) 10Ori.livneh: [C: 032] New set of speed experiments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266599 (owner: 10Ori.livneh)
[21:53:29] <grrrit-wm>	 (03Merged) 10jenkins-bot: New set of speed experiments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266599 (owner: 10Ori.livneh)
[21:55:38] <logmsgbot>	 !log ori@mira Synchronized docroot and w: I9b054d847a: New set of speed experiments (duration: 01m 29s)
[21:55:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:01:06] <Leah>	 robh: Did you mean to put the date twice in the page title of https://wikitech.wikimedia.org/wiki/Incident_documentation/20160126-20160126-WikimediaDomainRedirection ?
[22:01:59] <Krenair>	 obviously that was an oversight, I moved it
[22:02:00] <robh>	 Leah: nope, fixed!
[22:02:03] <robh>	 uhh
[22:02:06] <robh>	 i moved it already...
[22:02:11] <Krenair>	 uhhh
[22:02:15] <Krenair>	 looks like we both moved it
[22:02:21] <robh>	 we both did? whatevs as long as it's right
[22:02:21] <robh>	 heh
[22:02:35] <robh>	 and it is
[22:02:50] <Krenair>	 interesting, I assumed MW prevented that sort of weird conflict
[22:02:50] <Krenair>	 ok
[22:02:57] <Leah>	 It's... supposed to.
[22:03:24] <icinga-wm>	 PROBLEM - Kafka Cluster analytics-eqiad Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 13 data above and 46 below the confidence bounds
[22:03:27] <Leah>	 Thanks. I was looking at https://wikitech.wikimedia.org/wiki/Category:Incident_documentation and we actually do use date ranges sometimes, I guess. that's why I asked.
[22:03:32] <ottomata>	 !
[22:03:41] <ottomata>	 i think that's just because i changed the metrics ^^
[22:03:50] <Krenair>	 although it did tell me it suppressed a redirect, which I don't remember pressing the button for, and it didn't log
[22:04:00] <ragesoss>	 It looks like OAuth is broken on test.wikipedia.org
[22:04:03] <ottomata>	 anomaly will take a while to go away
[22:04:15] <Krenair>	 tgr, anomie: ^
[22:04:19] <robh>	 i left a redirect in place
[22:04:27] <robh>	 odd mw behavior
[22:04:27] <Krenair>	 I believe testwiki went to wmf.11 again today
[22:04:46] <marxarelli>	 it's about to, after this linter business gets cleared up
[22:04:53] <Krenair>	 oooh
[22:05:00] * marxarelli is the king of slow trains
[22:05:02] <Krenair>	 I have a cross-wiki notification
[22:05:22] <Krenair>	 nice, forgot that was enabled
[22:05:36] <tgr>	 ragesoss: https://tools.wmflabs.org/oauth-hello-world/ works for me
[22:06:00] <ragesoss>	 tgr: that doesn't auth to test.wikipedia.org
[22:06:20] <ragesoss>	 tgr try https://dashboard-testing.wikiedu.org/
[22:06:38] <ragesoss>	 When I click 'Allow' there, it redirects back to the same page (with all the params stripped)
[22:06:53] <tgr>	 that seems to work too
[22:07:03] <ragesoss>	 hm...
[22:07:19] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1967560 (10MZMcBride) >>! In T124804#1966836, @jcrespo wrote: > Followup will be on this ticket and...
[22:07:47] <tgr>	 ragesoss: did you use a newly created account?
[22:07:53] <ragesoss>	 tgr: no.
[22:08:01] <ragesoss>	 I'm just trying to log in with my usual account.
[22:08:47] <marxarelli>	 ragesoss: there was an outage for all of *.wikimedia.org. if there's a 301 in there, the Location: could be cached, maybe?
[22:09:03] <marxarelli>	 ragesoss: try in a clear browser session
[22:09:16] <ragesoss>	 it worked after I logged out and back in.
[22:09:20] <ragesoss>	 sorry for the false alarm.
[22:09:21] <ragesoss>	 thanks!
[22:09:43] <greg-g>	 ragesoss: thank you (things have been odd lately, so better safe than sorry ;) )
[22:10:48] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1967564 (10RobH) 5Open>3Resolved a:3RobH resolving as I've sent the outage notification to the...
[22:11:04] <grrrit-wm>	 (03PS1) 10Aaron Schulz: Set $wgCentralAuthUseSlaves on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266603 
[22:11:24] <marxarelli>	 ok, everything is linting ok now. proceeding with the train
[22:11:46] * bd808 wipes the lint off of marxarelli's back
[22:12:13] <wikibugs>	 6operations, 10vm-requests, 5Patch-For-Review: move releases.wm.org to bromine (was: request VM for releases.wm.org) - https://phabricator.wikimedia.org/T124261#1967569 (10Dzahn)
[22:12:18] <marxarelli>	 bd808: is that a hairy man joke? ;P
[22:12:53] <tgr>	 ragesoss: was that with User:Ragesoss?
[22:12:58] <bd808>	 If it is then I'm throwing bricks from inside my house of unwanted hair
[22:13:07] <mutante>	 one hair can stop an entire train
[22:13:16] <ragesoss>	 tgr: It was not working with "User:Sage (Wiki Ed)"
[22:13:37] <marxarelli>	 it's a very little train
[22:14:08] <marxarelli>	 yay, we're syncing finally
[22:14:23] <mutante>	 jouncebot: next release
[22:14:23] <jouncebot>	 In 1 hour(s) and 45 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160127T0000)
[22:15:40] <mutante>	 i wanted: In 17 days ... Mediawiki 1.27 , heh
[22:15:43] <logmsgbot>	 !log dduvall@mira Synchronized php-1.27.0-wmf.11: syncing wmf.11 backports of session fixes (duration: 03m 55s)
[22:15:45] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:16:41] <mutante>	 is there a way to estimate how much longer until somebody needs to upload new mediawiki files on releases.wikimedia.org 
[22:17:02] <mutante>	 or any release files actually
[22:17:19] <logmsgbot>	 !log dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11
[22:17:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:17:42] <marxarelli>	 whoops ... sec
[22:17:50] <bd808>	 mutante: ask ostriches and csteipp. They are the folks who cut new tarballs AFAIK
[22:18:16] <mutante>	 bd808: makes sense, thx
[22:18:18] <ostriches>	 mutante: Hm why do you ask?
[22:18:33] <csteipp>	 mutante: Sometime in the next 2 months we will, I'm pretty sure.
[22:18:36] <mutante>	 ostriches: i am moving the releases.wm site to a different place
[22:19:02] <mutante>	 on a virtual machine.. and we save one physical server
[22:19:09] <mutante>	 and get rid of ubuntu
[22:19:17] <ostriches>	 mmk
[22:20:11] <grrrit-wm>	 (03PS1) 10Dduvall: Group0 to 1.27.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266606 
[22:21:11] <marxarelli>	 oh boy, that's a big diff
[22:21:22] <mutante>	 i'll also copy your home dirs. do you do other stuff there (caesium) besides uploading? gpg ?
[22:21:26] <marxarelli>	 i guess hhvm on mira pretty prints json
[22:21:37] <bd808>	 marxarelli: yes!
[22:22:27] <grrrit-wm>	 (03CR) 10Dduvall: [C: 032] "Diff is larger than expected due to pretty printed JSON on mira." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266606 (owner: 10Dduvall)
[22:22:52] <grrrit-wm>	 (03Merged) 10jenkins-bot: Group0 to 1.27.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266606 (owner: 10Dduvall)
[22:22:57] <greg-g>	 I like it so much better!
[22:23:07] * greg-g is kinda back in the realm of awareness
[22:23:13] <Krenair>	 nice
[22:24:07] <marxarelli>	 and i just found python -m json.tool ... :)
[22:24:19] <grrrit-wm>	 (03CR) 10BryanDavis: "So glad to see a version built with a modern PHP runtime that knows how to pretty print JSON!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266606 (owner: 10Dduvall)
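A minimal sketch of why the diff in the exchange above grew, assuming the generated file is a JSON mapping that used to be written compactly and is now pretty-printed. The wiki names and the mapping below are hypothetical, for illustration only. Python's `json.dumps` with `indent` gives the same kind of output as `python -m json.tool`.

```python
import json


def compact(data):
    """Serialize to a single line, as a runtime without pretty printing might."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def pretty(data):
    """Serialize with one key per line, similar to `python -m json.tool`."""
    return json.dumps(data, sort_keys=True, indent=4)


# Hypothetical wiki-to-version mapping (not the real wikiversions data).
wikiversions = {
    "testwiki": "php-1.27.0-wmf.11",
    "mediawikiwiki": "php-1.27.0-wmf.11",
}

one_line = compact(wikiversions)
many_lines = pretty(wikiversions)

# Same data, different layout: a line-based diff between the two
# touches every line even though no value changed.
print(one_line)
print(many_lines)
```

Once the file is stored pretty-printed, later version bumps should only touch the lines whose values change, which is why the one-time large diff is usually worth it.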
[22:25:03] <logmsgbot>	 !log dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11, for real this time
[22:25:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:26:19] <wikibugs>	 6operations, 10vm-requests, 5Patch-For-Review: move releases.wm.org to bromine (was: request VM for releases.wm.org) - https://phabricator.wikimedia.org/T124261#1967740 (10Dzahn) @akosiaris It does mean that all shell users who are in "releasers-mediawiki" or "releasers-mobile" now get access to a machine wi...
[22:26:41] <marxarelli>	 bd808, anomie, tgr: group0 has been promoted finally!
[22:26:57] <Krenair>	 oh
[22:27:01] <Krenair>	 for some reason I thought that was done earlier
[22:27:14] <bd808>	 Krenair: we just did it on mw1017 before
[22:27:19] <Krenair>	 ahhh
[22:27:21] <Krenair>	 that explains it
[22:27:33] <bd808>	 which gets testwiki by accident mostly
[22:27:40] <marxarelli>	 thcipriani, ostriches: fyi, there's a local modification to scap on mira to make `php -l` function without spewing errors
[22:27:41] <Krenair>	 yeah, I think I saw it on testwiki
[22:27:49] <tgr>	 bd808: the OAuth channel should be in logstash, right?
[22:27:51] <marxarelli>	 just changed php -l to php5 -l
[22:27:55] <tgr>	 even debug level events?
[22:28:11] <bd808>	 tgr: I don't think we send debug
[22:29:09] <tgr>	 not even when $wmgMonologChannels['OAuth'] === 'debug' ?
[22:29:40] <marxarelli>	 my first deploy week(s) have felt a bit like The Last Crusade
[22:29:45] <tgr>	 is there a local log or do I have to filter through fluorine?
[22:30:12] <marxarelli>	 "don't look now, Indy, Jehovah is spelt with an i"
[22:31:16] <bd808>	 tgr: logstash only gets debug when it is explicitly configured with "logstash=>debug" -- https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/logging.php#L148
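The rule bd808 describes can be sketched roughly as follows. This is an illustration in Python, not the actual logging.php logic: the function name and the dict-based representation of a channel setting are assumptions. It encodes only what the conversation states, namely that a plain 'debug' setting for a channel is not enough and that logstash receives debug events only when the channel says "logstash=>debug" explicitly.

```python
def logstash_gets_debug(channel_setting):
    """Return True only when the channel explicitly requests debug for logstash.

    channel_setting is a hypothetical stand-in for one channel's entry:
    either a plain level string such as "debug", or a dict of
    per-destination levels such as {"logstash": "debug"}.
    """
    if isinstance(channel_setting, dict):
        return channel_setting.get("logstash") == "debug"
    # A bare level string does not opt logstash in to debug events.
    return False


print(logstash_gets_debug("debug"))                # False
print(logstash_gets_debug({"logstash": "debug"}))  # True
```

Under that reading, a channel set to plain 'debug' would still need its debug output looked up in the local logs rather than in logstash.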
[22:37:14] <wikibugs>	 6operations, 10Wikimedia-Apache-configuration, 10incident-20160126-WikimediaDomainRedirection, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1967786 (10greg)
[22:37:26] <grrrit-wm>	 (03PS1) 10Dzahn: releases: setup rsyncd to copy release files [puppet] - 10https://gerrit.wikimedia.org/r/266608 (https://phabricator.wikimedia.org/T124261) 
[22:37:33] <wikibugs>	 6operations, 10incident-20160126-WikimediaDomainRedirection, 7Monitoring: add icinga and watchmouse https checks for content on commons. or other wikimedia.org sites - https://phabricator.wikimedia.org/T124812#1967793 (10greg)
[22:38:29] <wikibugs>	 6operations, 10Continuous-Integration-Config, 10incident-20160126-WikimediaDomainRedirection, 7Regression: operations-apache-config-lint replacement doesn't check syntax - https://phabricator.wikimedia.org/T114801#1967805 (10greg)
[22:38:50] <wikibugs>	 6operations, 10Traffic, 10incident-20160126-WikimediaDomainRedirection, 7Documentation: Automate and/or better-document varnish ban procedure for operations staff, so it can be accomplished with more speed and confidence in outage conditions - https://phabricator.wikimedia.org/T124835#1967813 (10greg)
[22:39:12] <grrrit-wm>	 (03PS2) 10Dzahn: releases: setup rsyncd to copy release files [puppet] - 10https://gerrit.wikimedia.org/r/266608 (https://phabricator.wikimedia.org/T124261) 
[22:39:25] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] releases: setup rsyncd to copy release files [puppet] - 10https://gerrit.wikimedia.org/r/266608 (https://phabricator.wikimedia.org/T124261) (owner: 10Dzahn)
[22:39:56] <bd808>	 tgr: I think you have to dig in fluorine's logs. Testwiki does log full debug there
[22:41:04] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[22:44:13] <grrrit-wm>	 (03PS1) 10Aaron Schulz: Enable deferred writes to codfw swift cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266609 (https://phabricator.wikimedia.org/T91869) 
[22:44:53] <greg-g>	 icinga-wm: on tin?
[22:47:37] <ostriches>	 if we're using mira, then yeah. mira's usually complaining a tad about it.
[22:47:55] <ostriches>	 (during the time when you've fetched locally, but haven't sync'd yet so the other one hasn't caught up)
[22:49:51] <grrrit-wm>	 (03PS1) 10Dzahn: releases: add ferm rule to allow rsync to bromine [puppet] - 10https://gerrit.wikimedia.org/r/266613 (https://phabricator.wikimedia.org/T124261) 
[22:50:26] <grrrit-wm>	 (03PS2) 10Dzahn: releases: add ferm rule to allow rsync to bromine [puppet] - 10https://gerrit.wikimedia.org/r/266613 (https://phabricator.wikimedia.org/T124261) 
[22:51:00] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] releases: add ferm rule to allow rsync to bromine [puppet] - 10https://gerrit.wikimedia.org/r/266613 (https://phabricator.wikimedia.org/T124261) (owner: 10Dzahn)
[22:52:05] <grrrit-wm>	 (03CR) 10QChris: [C: 04-2] "CR-2 since it seems the comment from 2016-01-24T21:18 is getting" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/263631 (owner: 10Chad)
[22:56:16] <wikibugs>	 6operations, 10vm-requests, 5Patch-For-Review: move releases.wm.org to bromine (was: request VM for releases.wm.org) - https://phabricator.wikimedia.org/T124261#1967914 (10Dzahn) setup rsync, copying the release files over to bromine now ... running in screen ..
[23:00:20] <greg-g>	 bd808: tgr anomie: good start on https://etherpad.wikimedia.org/p/SessionManagerRolloutFailure, can we get that cleaned up and put on wikitech, please?
[23:00:46] <bd808>	 greg-g: yeah. that's on my todo list
[23:00:52] <greg-g>	 word, thank you
[23:11:27] <wikibugs>	 6operations, 10netops: Peer with SFMIX at ULSFO with 200 Paul - https://phabricator.wikimedia.org/T124843#1967980 (10Reedy) 3NEW
[23:16:41] <wikibugs>	 6operations, 10netops: Peer with SFMIX at ULSFO with 200 Paul - https://phabricator.wikimedia.org/T124843#1968012 (10Dzahn) {meme, src=votecat}   let me know if you need smart hands at ulsfo for this
[23:23:30] <marxarelli>	 Krenair, ostriches: it's possible that https://phabricator.wikimedia.org/T124828 will need a backport to wmf.10 for the evening swat
[23:23:50] <marxarelli>	 s/possible/probable/
[23:25:02] <ostriches>	 lgtm
[23:25:53] <wikibugs>	 6operations, 10Incident-20160126-WikimediaDomainRedirection, 10Wikimedia-Apache-configuration, 10netops: All wikis under wikimedia.org (Meta, Commons, loginwiki, others) are redirecting to wikimediafoundation.org - https://phabricator.wikimedia.org/T124804#1968040 (10TheDJ) >>! In T124804#1966782, @jcrespo...
[23:28:02] <grrrit-wm>	 (03PS1) 10Dzahn: releases: also rsync /home dirs with user tools [puppet] - 10https://gerrit.wikimedia.org/r/266616 (https://phabricator.wikimedia.org/T124261) 
[23:28:04] <Krenair>	 marxarelli, ok
[23:28:24] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] releases: also rsync /home dirs with user tools [puppet] - 10https://gerrit.wikimedia.org/r/266616 (https://phabricator.wikimedia.org/T124261) (owner: 10Dzahn)
[23:34:06] <grrrit-wm>	 (03PS2) 10Dzahn: releases: also rsync /home dirs with user tools [puppet] - 10https://gerrit.wikimedia.org/r/266616 (https://phabricator.wikimedia.org/T124261) 
[23:38:24] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] releases: also rsync /home dirs with user tools [puppet] - 10https://gerrit.wikimedia.org/r/266616 (https://phabricator.wikimedia.org/T124261) (owner: 10Dzahn)
[23:45:44] <wikibugs>	 6operations, 10netops: Peer with SFMIX at ULSFO in 200 Paul - https://phabricator.wikimedia.org/T124843#1968096 (10Reedy)
[23:46:21] <grrrit-wm>	 (03PS7) 10Andrew Bogott: Keystone: Adopt a multi-domain model [puppet] - 10https://gerrit.wikimedia.org/r/244350 
[23:48:35] <grrrit-wm>	 (03PS8) 10Andrew Bogott: Keystone: Adopt a multi-domain model with ldap users but mysql role assignment [puppet] - 10https://gerrit.wikimedia.org/r/244350 
[23:51:21] <grrrit-wm>	 (03PS1) 10Rush: nfsd: bump threads avail to 192 [puppet] - 10https://gerrit.wikimedia.org/r/266622 
[23:51:36] <wikibugs>	 6operations, 10Traffic: update the multicast purging documentation - https://phabricator.wikimedia.org/T82096#1968101 (10BBlack) 5Open>3Resolved Fixed up https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging
[23:51:40] <grrrit-wm>	 (03PS2) 10Rush: nfsd: bump threads avail to 192 [puppet] - 10https://gerrit.wikimedia.org/r/266622 
[23:51:42] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] nfsd: bump threads avail to 192 [puppet] - 10https://gerrit.wikimedia.org/r/266622 (owner: 10Rush)
[23:51:53] <grrrit-wm>	 (03PS2) 10EBernhardson: Put more like query load back on eqiad for codfw load testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266559 
[23:53:05] <grrrit-wm>	 (03CR) 10Rush: [C: 032] nfsd: bump threads avail to 192 [puppet] - 10https://gerrit.wikimedia.org/r/266622 (owner: 10Rush)
[23:55:12] <wikibugs>	 6operations, 10vm-requests, 5Patch-For-Review: move releases.wm.org to bromine (was: request VM for releases.wm.org) - https://phabricator.wikimedia.org/T124261#1968108 (10Dzahn) @csteipp @demon This is the ticket re: moving the releases server.  The purpose is to replace another Ubuntu system (caesium). The...
[23:56:13] <icinga-wm>	 RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[23:59:35] <mutante>	 that recovery sounds good, but also something i never saw before, heh
[23:59:57] <mutante>	 new?