[00:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170112T0000). Please do the needful.
[00:00:04] <jouncebot>	 MarcoAurelio: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:05] <wikibugs>	 (CR) Reedy: [C: 2] Update Kafka analytics broker list for deployment-prep [mediawiki-config] - https://gerrit.wikimedia.org/r/287741 (owner: Ottomata)
[00:00:29] <TabbyCat>	 jouncebot: my patch already deployed tnx
[00:00:40] <Reedy>	 gj
[00:00:43] <ostriches>	 jouncebot: go away
[00:00:44] <Reedy>	 Shall we do the other rename one?
[00:01:17] <TabbyCat>	 the trwiki one? I have not asked them yet so I'd wait to avoid any drhamah
[00:02:05] <wikibugs>	 (PS4) Reedy: Update Kafka analytics broker list for deployment-prep [mediawiki-config] - https://gerrit.wikimedia.org/r/287741 (owner: Ottomata)
[00:02:10] <wikibugs>	 (CR) Reedy: [C: 2] Update Kafka analytics broker list for deployment-prep [mediawiki-config] - https://gerrit.wikimedia.org/r/287741 (owner: Ottomata)
[00:02:22] <Reedy>	 TabbyCat: But wikimedians love teh dramas! :D
[00:02:37] <TabbyCat>	 I have enough of it for this month
[00:03:02] <TabbyCat>	 I'll ask them tomorrow
[00:03:42] <wikibugs>	 (CR) Aaron Schulz: [C: 2] Add DB "shard" column to logstash log entries for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/330612 (owner: Aaron Schulz)
[00:04:16] <wikibugs>	 (Merged) jenkins-bot: Update Kafka analytics broker list for deployment-prep [mediawiki-config] - https://gerrit.wikimedia.org/r/287741 (owner: Ottomata)
[00:04:40] <wikibugs>	 (CR) jenkins-bot: Update Kafka analytics broker list for deployment-prep [mediawiki-config] - https://gerrit.wikimedia.org/r/287741 (owner: Ottomata)
[00:05:28] <wikibugs>	 (PS3) Reedy: Add transitionary config for EducationProgram [mediawiki-config] - https://gerrit.wikimedia.org/r/303383
[00:05:34] <wikibugs>	 (CR) Reedy: [C: 2] Add transitionary config for EducationProgram [mediawiki-config] - https://gerrit.wikimedia.org/r/303383 (owner: Reedy)
[00:05:47] <wikibugs>	 (Merged) jenkins-bot: Add DB "shard" column to logstash log entries for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/330612 (owner: Aaron Schulz)
[00:05:57] <wikibugs>	 (CR) jenkins-bot: Add DB "shard" column to logstash log entries for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/330612 (owner: Aaron Schulz)
[00:06:22] <wikibugs>	 (CR) Chad: [C: 2] Adding language name configuration for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/315912 (https://phabricator.wikimedia.org/T113408) (owner: Jon Harald Søby)
[00:07:04] <wikibugs>	 (Merged) jenkins-bot: Add transitionary config for EducationProgram [mediawiki-config] - https://gerrit.wikimedia.org/r/303383 (owner: Reedy)
[00:07:10] <wikibugs>	 (PS4) Reedy: Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:07:15] <wikibugs>	 (CR) Reedy: [C: 2] Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:08:09] <wikibugs>	 (CR) jenkins-bot: Add transitionary config for EducationProgram [mediawiki-config] - https://gerrit.wikimedia.org/r/303383 (owner: Reedy)
[00:09:20] <wikibugs>	 (Merged) jenkins-bot: Adding language name configuration for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/315912 (https://phabricator.wikimedia.org/T113408) (owner: Jon Harald Søby)
[00:10:15] <wikibugs>	 (CR) jenkins-bot: Adding language name configuration for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/315912 (https://phabricator.wikimedia.org/T113408) (owner: Jon Harald Søby)
[00:12:09] <logmsgbot>	 !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Wikidata lang config (duration: 00m 38s)
[00:12:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:48] <nuria>	 !Log restarted apache2 and mysql on bohrium to see if mysql no connection errors disappear
[00:12:58] <nuria>	 !log restarted apache2 and mysql on bohrium to see if mysql no connection errors disappear
[00:13:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:48] <wikibugs>	 (PS5) Reedy: Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:13:59] <wikibugs>	 (CR) Reedy: Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:14:04] <wikibugs>	 (CR) Reedy: [C: 2] Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:15:20] <wikibugs>	 (Merged) jenkins-bot: Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:15:35] <wikibugs>	 (CR) jenkins-bot: Update gallery image bounding box on svwiki to 150x150 [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[00:15:50] <wikibugs>	 (CR) Addshore: [C: -1] Enable ElectronPdfService extension on metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/324488 (https://phabricator.wikimedia.org/T150943) (owner: Addshore)
[00:15:53] <wikibugs>	 (CR) Addshore: [C: -1] Enable ElectronPdfService extension on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/324489 (https://phabricator.wikimedia.org/T150942) (owner: Addshore)
[00:16:16] <ostriches>	 addshore: I was just about to ask about those 2 :)
[00:16:47] <addshore>	 ostriches: reedy came over ;)
[00:17:09] <addshore>	 They'll get out eventually over the next month!
[00:17:13] <ostriches>	 Okie dokie
[00:17:33] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:33] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:47] <logmsgbot>	 !log reedy@tin Synchronized wmf-config: More consistency for various commits (duration: 00m 40s)
[00:17:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:23] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[00:18:23] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[00:18:26] <wikibugs>	 (CR) Chad: [C: 2] noc: Implement noc.wikimedia.org/db.php?format=json [mediawiki-config] - https://gerrit.wikimedia.org/r/331091 (owner: Krinkle)
[00:20:02] <wikibugs>	 (Merged) jenkins-bot: noc: Implement noc.wikimedia.org/db.php?format=json [mediawiki-config] - https://gerrit.wikimedia.org/r/331091 (owner: Krinkle)
[00:20:12] <wikibugs>	 (CR) jenkins-bot: noc: Implement noc.wikimedia.org/db.php?format=json [mediawiki-config] - https://gerrit.wikimedia.org/r/331091 (owner: Krinkle)
[00:20:52] <wikibugs>	 (PS1) Addshore: Enable ElectronPdfService on testwikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/331807
[00:20:57] <addshore>	 Reedy: ostriches ^^ that one would be nice though!
[00:21:02] <logmsgbot>	 !log demon@tin Synchronized docroot/noc/db.php: (no message) (duration: 00m 39s)
[00:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:13] <wikibugs>	 (CR) Reedy: [C: 2] Enable ElectronPdfService on testwikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/331807 (owner: Addshore)
[00:21:53] <icinga-wm>	 RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[00:22:41] <wikibugs>	 (Merged) jenkins-bot: Enable ElectronPdfService on testwikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/331807 (owner: Addshore)
[00:22:56] <wikibugs>	 (CR) jenkins-bot: Enable ElectronPdfService on testwikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/331807 (owner: Addshore)
[00:23:03] <Reedy>	 kaldari: https://gerrit.wikimedia.org/r/#/c/324672/ can that go out?
[00:23:22] <kaldari>	 Reedy: Nope
[00:23:23] <kaldari>	 not yet
[00:23:49] <Reedy>	 kaldari: Mind dropping a -1 on it? :)
[00:23:54] <kaldari>	 sure
[00:24:06] <wikibugs>	 (CR) Kaldari: [C: -1] "Not ready yet" [mediawiki-config] - https://gerrit.wikimedia.org/r/324672 (https://phabricator.wikimedia.org/T152076) (owner: Kaldari)
[00:24:10] <wikibugs>	 (PS2) Reedy: Use internal url for Ores, move to ProductionServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/316317 (owner: Giuseppe Lavagetto)
[00:24:34] <Reedy>	 thanks!
[00:24:41] <wikibugs>	 (PS2) Chad: wikitech: Add oathauth group with oathauth-api-all right [mediawiki-config] - https://gerrit.wikimedia.org/r/327852 (https://phabricator.wikimedia.org/T153487) (owner: BryanDavis)
[00:25:03] <wikibugs>	 (PS1) Aaron Schulz: Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808
[00:25:26] <matanya>	 kaldari: i will buy you a beer when it is out
[00:26:54] <wikibugs>	 (CR) Chad: [C: 2] wikitech: Add oathauth group with oathauth-api-all right [mediawiki-config] - https://gerrit.wikimedia.org/r/327852 (https://phabricator.wikimedia.org/T153487) (owner: BryanDavis)
[00:27:37] <wikibugs>	 (PS2) Aaron Schulz: Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808
[00:28:03] <wikibugs>	 (CR) Reedy: Use internal url for Ores, move to ProductionServices.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/316317 (owner: Giuseppe Lavagetto)
[00:28:37] <wikibugs>	 (Merged) jenkins-bot: wikitech: Add oathauth group with oathauth-api-all right [mediawiki-config] - https://gerrit.wikimedia.org/r/327852 (https://phabricator.wikimedia.org/T153487) (owner: BryanDavis)
[00:28:50] <wikibugs>	 (CR) jenkins-bot: wikitech: Add oathauth group with oathauth-api-all right [mediawiki-config] - https://gerrit.wikimedia.org/r/327852 (https://phabricator.wikimedia.org/T153487) (owner: BryanDavis)
[00:29:58] <logmsgbot>	 !log demon@tin Synchronized wmf-config/InitialiseSettings.php: oathauth group for wikitech (duration: 00m 38s)
[00:29:59] <wikibugs>	 (PS2) Reedy: Set wgSemiprotectedRestrictionLevels for de.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/282471 (https://phabricator.wikimedia.org/T132249) (owner: Dereckson)
[00:30:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:30:19] <wikibugs>	 (PS3) Reedy: Set wgSemiprotectedRestrictionLevels for de.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/282471 (https://phabricator.wikimedia.org/T132249) (owner: Dereckson)
[00:30:36] <wikibugs>	 (CR) Reedy: [C: -1] "Consensus needed?" [mediawiki-config] - https://gerrit.wikimedia.org/r/282471 (https://phabricator.wikimedia.org/T132249) (owner: Dereckson)
[00:31:13] <icinga-wm>	 PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:57:52] <wikibugs>	 (CR) MZMcBride: "I wasn't speaking hypothetically, of course. You're almost certainly noticing the same behavior that I'm seeing, with bot edits such as th" [mediawiki-config] - https://gerrit.wikimedia.org/r/324215 (https://phabricator.wikimedia.org/T154698) (owner: Anomie)
[01:00:04] <jouncebot>	 Deploy window No Deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170112T0100)
[01:00:13] <icinga-wm>	 RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[01:02:00] <wikibugs>	 (PS3) Krinkle: Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808 (owner: Aaron Schulz)
[01:07:24] <wikibugs>	 (PS3) Volans: Initial import with the first version [software/cumin] - https://gerrit.wikimedia.org/r/330425 (https://phabricator.wikimedia.org/T154588)
[01:08:47] <wikibugs>	 (PS1) Dereckson: Use directly wgGalleryOptions without wmg [mediawiki-config] - https://gerrit.wikimedia.org/r/331819
[01:09:36] <wikibugs>	 (CR) Krinkle: [C: -1] "Looking at logstash-beta it seems this field is showing up fine, but it does have a warning next to it about "No cache mapping for this fi" [mediawiki-config] - https://gerrit.wikimedia.org/r/331808 (owner: Aaron Schulz)
[01:10:11] <Krinkle>	 elukey: https://gerrit.wikimedia.org/r/#/c/327686/ :)
[01:14:14] <wikibugs>	 (PS4) Dereckson: Configure he.wiki images size [mediawiki-config] - https://gerrit.wikimedia.org/r/31580 (https://phabricator.wikimedia.org/T43712)
[01:15:17] <wikibugs>	 (CR) Dereckson: "PS4: short array syntax, rebased against gilles change, rebased against wmg/wg cleaning change" [mediawiki-config] - https://gerrit.wikimedia.org/r/31580 (https://phabricator.wikimedia.org/T43712) (owner: Dereckson)
[01:16:36] <wikibugs>	 (CR) Dereckson: "To the deployer: this change touches CS and IS. Need a kludge (copy $wg/$wmg in IS or CS) to deploy it. Order matters." [mediawiki-config] - https://gerrit.wikimedia.org/r/331819 (owner: Dereckson)
[01:16:44] <wikibugs>	 (PS1) Brion VIBBER: Add 'webp' package to ImageMagick role [puppet] - https://gerrit.wikimedia.org/r/331820 (https://phabricator.wikimedia.org/T27397)
[01:22:03] <wikibugs>	 (PS2) Dereckson: Explicit dblist name for compact language links [mediawiki-config] - https://gerrit.wikimedia.org/r/315983
[01:22:15] <wikibugs>	 (CR) Dereckson: "PS2: Rebased" [mediawiki-config] - https://gerrit.wikimedia.org/r/315983 (owner: Dereckson)
[01:23:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Explicit dblist name for compact language links [mediawiki-config] - https://gerrit.wikimedia.org/r/315983 (owner: Dereckson)
[01:27:46] <Revent>	 https://commons.wikimedia.org/w/index.php?title=File:Education_sounds.ogg&action=delete
[01:27:58] <Revent>	 “A database query error has occurred. This may indicate a bug in the software.[WHbbdQpAADsAAOUkYz8AAAAU] 2017-01-12 01:27:33: Fatal exception of type "DBQueryError””
[01:28:36] <Revent>	 File is a 2GB+ ‘audio file’ with an embedded rar archive, and needs to go away.
[01:29:03] <ostriches>	 "Lock wait timeout exceeded; try restarting transaction"
[01:29:06] <ostriches>	 Is the error in question
[01:29:18] <Revent>	 Odd.
[01:29:45] <ostriches>	 Revent: That shouldn't be impacted by filesize, that's just the DB query part of deleting...try again?
[01:29:48] * ostriches shrugs
[01:29:54] <Revent>	 ostriches: It eventually was deleted, after a significant time delay.
[01:30:22] <ostriches>	 Yeah, just a slow query :(
[01:30:26] <ostriches>	 Glad it's gone now
[01:30:34] <Revent>	 logged deletion time was 1:24, it gave me the error, and then went away about a minute ago.
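The "try again?" advice for "Lock wait timeout exceeded; try restarting transaction" can be sketched as a small retry wrapper. This is a minimal illustration, not MediaWiki's own handling: `DBQueryError` here is a hypothetical stand-in exception, `run_with_retry` and its parameters are invented for the example, and 1205 is the MySQL/MariaDB error code for a lock wait timeout. It assumes the transaction is safe to run again; as the exchange above shows, a delete that reports this error may still complete later.

```python
import time

LOCK_WAIT_TIMEOUT = 1205  # MySQL/MariaDB ER_LOCK_WAIT_TIMEOUT


class DBQueryError(Exception):
    """Hypothetical stand-in for a driver error that carries an error code."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def run_with_retry(transaction, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run transaction(); retry only when it fails with a lock wait timeout."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DBQueryError as err:
            # Other errors, or a timeout on the final attempt, are re-raised.
            if err.errno != LOCK_WAIT_TIMEOUT or attempt == attempts:
                raise
            # Wait a little longer before each new attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.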
[01:31:29] <ostriches>	 Reedy: Ugh, the MassMessage fix(es) are spamming a bit.
[01:31:38] <Reedy>	 really?
[01:31:46] <Reedy>	 Would've thought they would've timed out about then
[01:31:50] <ostriches>	 https://phabricator.wikimedia.org/P4739
[01:32:01] <ostriches>	 Saw on cli when running `sql`
[01:32:20] <Reedy>	 some cache not updated?
[01:32:36] <Dereckson>	 tests/cirrusTest.php also has the list of dblists, joy!
[01:33:04] <Reedy>	 ostriches: /srv/mediawiki out of date on tin?
[01:33:17] <ostriches>	 Maybe? Shouldn't be tho with all the syncs we've done
[01:33:21] <Reedy>	 !log running scap pull on tin
[01:33:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:34:04] <Reedy>	 I'm sure I've seen this before
[01:34:05] <wikibugs>	 (PS3) Dereckson: Explicit dblist name for compact language links [mediawiki-config] - https://gerrit.wikimedia.org/r/315983
[01:34:13] <Reedy>	 Something weird with it not syncing -staging to not staging
[01:34:38] <Dereckson>	 Reedy: yes I've seen it to once, when I was creating a wiki
[01:34:41] <Dereckson>	 too
[01:34:55] <Reedy>	 ostriches: scap pull fixed it on tin
[01:34:56] <wikibugs>	 (CR) Dereckson: "PS3: +tests/cirrusTest.php" [mediawiki-config] - https://gerrit.wikimedia.org/r/315983 (owner: Dereckson)
[01:34:59] <Reedy>	 stupid thing
[01:35:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] Explicit dblist name for compact language links [mediawiki-config] - https://gerrit.wikimedia.org/r/315983 (owner: Dereckson)
[01:35:09] <ostriches>	 Yep, looks clean now
[01:35:16] <Reedy>	 Should file a bug about that
[01:35:20] <Reedy>	 if there's not one already
[01:35:38] <Reedy>	 I thought staging to no staging was the first thing
[01:36:05] <Dereckson>	 Reedy: https://phabricator.wikimedia.org/T152005
[01:36:45] <Reedy>	 Oh look
[01:37:40] <wikibugs>	 (PS3) Chad: beta: Set $wgLinterStatsdSampleFactor [mediawiki-config] - https://gerrit.wikimedia.org/r/327438 (owner: Legoktm)
[01:38:09] <ostriches>	 I could've sworn we fixed that, hmm
[01:38:13] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[01:38:13] <ostriches>	 Reedy: Rebased ^
[01:38:50] <Dereckson>	 Krenair: NocDblistTest::testNocDblists has caught a change I rebased, works like a charm so
[01:39:09] <Krenair>	 nice
[01:39:13] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2798855 keys, up 72 days 17 hours - replication_delay is 0
[01:39:39] <wikibugs>	 (PS4) Dereckson: Explicit dblist name for compact language links [mediawiki-config] - https://gerrit.wikimedia.org/r/315983
[01:40:02] <ostriches>	 Dereckson, Reedy: Raised priority on T152005
[01:40:03] <stashbot>	 T152005: /srv/mediawiki on tin not being updated when using scap sync-file - https://phabricator.wikimedia.org/T152005
[01:40:18] * Dereckson nods
[01:40:40] <Krenair>	 thanks for merging that btw guys
[01:40:56] <Krenair>	 what about https://gerrit.wikimedia.org/r/#/c/298397/ ? :p
[01:40:57] <ostriches>	 Merged all the things today!
[01:41:50] <wikibugs>	 (CR) Dereckson: "PS4: +noc" [mediawiki-config] - https://gerrit.wikimedia.org/r/315983 (owner: Dereckson)
[01:45:20] <Krenair>	 https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/mediawiki-config+-label:Code-Review%253C%253D-1+-label:Verified-1 is actually somehow only one page
[01:45:22] <Krenair>	 impressive
[01:45:58] <Dereckson>	 and your watching query only has two changes
[01:46:29] <ostriches>	 Krenair: Yep, that was the goal :)
[01:46:57] <ostriches>	 Heck, even removing the label query still is one page, no Next link :D
[01:47:01] <ostriches>	 Reedy: Go us!
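The Gerrit search link pasted above is percent-encoded twice (`%253C%253D` stands for `<=`), which makes the query hard to read in the log. A small sketch for decoding it, assuming the search expression is everything after `#/q/` and that `+` separates terms; `decode_gerrit_query` is a name made up for this example.

```python
from urllib.parse import unquote, unquote_plus


def decode_gerrit_query(url):
    """Return the readable search expression from a Gerrit '#/q/...' URL."""
    # Everything after '#/q/' is the search expression.
    _, _, fragment = url.partition("#/q/")
    # First pass: '+' becomes a space and '%25' becomes a literal '%'.
    once = unquote_plus(fragment)
    # Second pass: resolve the escapes that were themselves escaped.
    return unquote(once)


url = ("https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/"
       "mediawiki-config+-label:Code-Review%253C%253D-1+-label:Verified-1")

# Print one search term per line.
for term in decode_gerrit_query(url).split():
    print(term)
```

Read this way, the query is: open changes in operations/mediawiki-config, excluding anything with a Code-Review vote of -1 or lower and anything with Verified-1.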
[01:47:22] <wikibugs>	 (Abandoned) Dereckson: Throttle user edits to 1000 per minute [mediawiki-config] - https://gerrit.wikimedia.org/r/316980 (https://phabricator.wikimedia.org/T56515) (owner: Dereckson)
[01:48:03] <wikibugs>	 (CR) Alex Monk: [C: -1] "temporary -1 while task is still being dealt with by DBA" [mediawiki-config] - https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: Dereckson)
[01:51:20] <wikibugs>	 (Abandoned) Chad: MWMultiversion cleanups [puppet] - https://gerrit.wikimedia.org/r/309366 (owner: Chad)
[01:52:04] <Dereckson>	 https://gerrit.wikimedia.org/r/#/c/309742/ ←  so how do we call squid.php? Last time we had ReverseProxy.php and CachingProxy.php as proposals
[01:52:06] <Krenair>	 Why do we repeat the dblist tag reading code in tests/cirrusTest.php
[01:52:17] <Dereckson>	 Krenair: I'm writing a class to store this list
[01:52:21] <Krenair>	 ok
[01:52:49] <Dereckson>	 Was asking myself the same question when I updated clldefault → compact-language-links dblist change
[01:53:40] <Krenair>	 yeah that's where I noticed
[01:55:03] <wikibugs>	 (Abandoned) Chad: Enable Education Program extension at urwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/309062 (https://phabricator.wikimedia.org/T144927) (owner: محمد شعیب)
[01:55:34] <wikibugs>	 (CR) Alex Monk: "Is this ready to go?" [mediawiki-config] - https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: Jforrester)
[01:56:22] <ostriches>	 Dereckson: "cachestuff.php" ;-)
[01:56:54] <wikibugs>	 (CR) Chad: "Talked in person, tldr: no" [mediawiki-config] - https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: Jforrester)
[01:56:57] <Krenair>	 cp.php
[01:57:00] <Krenair>	 varnish.php
[01:57:04] <Krenair>	 I don't mind really
[01:57:12] <ostriches>	 squidvarnishcp.php
[01:57:12] <ostriches>	 :D
[01:57:20] <ostriches>	 cache-flavor-of-the-year.php
[01:57:49] <wikibugs>	 (CR) Alex Monk: "Let's leave either a commit message marker or a negative CR to indicate that then?" [mediawiki-config] - https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: Jforrester)
[01:58:23] <ostriches>	 Krenair: Context was something something reading team.
[01:58:26] <ostriches>	 I dunno, ask James_F 
[01:58:41] <James_F>	 :-)
[01:59:21] <wikibugs>	 (CR) Chad: [C: -2] "Per talking to Krinkle in person, this probably isn't a great idea" [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[01:59:53] <wikibugs>	 Operations, ops-codfw: decommission mw2075-2089 to make room for new systems - https://phabricator.wikimedia.org/T154621#2935519 (Papaul)
[02:00:13] <wikibugs>	 Operations, ops-codfw: decommission mw2075-2089 to make room for new systems - https://phabricator.wikimedia.org/T154621#2918062 (Papaul) a:Papaul>RobH
[02:01:21] <Krenair>	 James_F: Is https://gerrit.wikimedia.org/r/301129 ready to go?
[02:01:56] <James_F>	 What Chad said.
[02:02:28] <wikibugs>	 (CR) Alex Monk: "Is this change done/ready?" [mediawiki-config] - https://gerrit.wikimedia.org/r/329762 (owner: Matěj Suchánek)
[02:02:42] <Krenair>	 James_F: Something something reading team?
[02:02:50] <Krenair>	 Or I dunno?
[02:02:54] <James_F>	 Yup.
[02:03:07] <Krenair>	 Okay, this greatly clarifies things.
[02:03:09] <James_F>	 It's Reading's responsibility and it's stuck in limbo.
[02:03:56] <Krenair>	 Adding -ownerin:wmf-deployment also hugely reduces the size of this gerrit query
[02:04:32] <Krenair>	 Though if I recall correctly, also excludes James_F's changes
[02:04:51] <James_F>	 Indeed. :-(
[02:04:56] * James_F sniffs.
[02:05:38] <wikibugs>	 (Abandoned) Chad: Allow 'block' AbuseFilterAction on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/239455 (https://phabricator.wikimedia.org/T113096) (owner: Platonides)
[02:06:08] <ostriches>	 James_F clutters the review list with his patches that won't land for a year :p
[02:06:38] <James_F>	 True.
[02:06:44] <James_F>	 Statements of intent. :-)
[02:07:03] <James_F>	 See also half the code in VisualEditor.
[02:08:10] <ostriches>	 At least the backlog is manageable now :D
[02:09:00] <Dereckson>	 Krenair: seems the test doesn't care about wiktionary, wikiquote, etc.
[02:09:14] <Krenair>	 this is the cirrusTest thing?
[02:09:18] <Dereckson>	 yes
[02:09:49] <Krenair>	 still probably best to just run the same code
[02:10:20] <Dereckson>	 I concur and list of lists will be more maintainable.
[02:10:33] <Krenair>	 Dereckson: What's your thoughts on https://gerrit.wikimedia.org/r/308281 ?
[02:11:25] <Dereckson>	 they have more rights than other interface-editor groups
[02:11:38] <Dereckson>	 limits aren't important, abuse filter is
[02:12:50] <Dereckson>	 I'd create a 'technical administrator' group, and use it for ru., here, and all other "we want some sysop but only for technical stuff" variants
[02:14:05] <Dereckson>	 I agree with MarcoAurelio: the fewer groups we have, the better the l10n will be
[02:14:23] <Dereckson>	 (more a matter to reuse an already well translated label)
[02:15:57] <wikibugs>	 (CR) Chad: "If we still want this, needs a major rebase against master. The arrays have long since been fixed, for example." [mediawiki-config] - https://gerrit.wikimedia.org/r/271936 (owner: Jforrester)
[02:17:04] <wikibugs>	 (CR) Chad: "Is there a reason we can't enable this in beta?" [mediawiki-config] - https://gerrit.wikimedia.org/r/256967 (owner: Paladox)
[02:17:38] <wikibugs>	 (CR) Chad: [C: 2] beta: Set $wgLinterStatsdSampleFactor [mediawiki-config] - https://gerrit.wikimedia.org/r/327438 (owner: Legoktm)
[02:19:05] <wikibugs>	 (Merged) jenkins-bot: beta: Set $wgLinterStatsdSampleFactor [mediawiki-config] - https://gerrit.wikimedia.org/r/327438 (owner: Legoktm)
[02:19:15] <wikibugs>	 (CR) jenkins-bot: beta: Set $wgLinterStatsdSampleFactor [mediawiki-config] - https://gerrit.wikimedia.org/r/327438 (owner: Legoktm)
[02:21:22] * ostriches throws a rock at l10nupdate
[02:27:33] <Krenair>	 ostriches, https://gerrit.wikimedia.org/r/#/c/330709/ needs rebasing
[02:27:49] <ostriches>	 Can't land yet anyway
[02:27:52] <ostriches>	 Dependency hasn't
[02:28:11] <ostriches>	 Oh, that's the puppet bit
[02:28:25] <ostriches>	 Well, rebasing doesn't make a difference, Filippo said he wasn't gonna land +deploy it today
[02:30:03] <Krenair>	 ok
[02:31:08] <Krenair>	 would be good if ops could do something similar to today but with the puppet repo
[02:31:08] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 11m 12s)
[02:31:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:36:24] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Jan 12 02:36:23 UTC 2017 (duration 5m 15s)
[02:36:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:36:35] <Krenair>	 ostriches, hey you know what needs doing
[02:36:39] <Krenair>	 interwiki.php update
[02:36:40] <Krenair>	 https://phabricator.wikimedia.org/T154920#2930522
[02:36:52] <Krenair>	 also https://phabricator.wikimedia.org/T154225
[02:37:15] <ostriches>	 Oh snap, how do I do that again? Been awhile :p
[02:37:31] <logmsgbot>	 !log demon@tin Synchronized wmf-config/InitialiseSettings-labs.php: no-op, completeness (duration: 00m 38s)
[02:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:44] <wikibugs>	 (PS1) Dereckson: Consolidate database lists list in one place [mediawiki-config] - https://gerrit.wikimedia.org/r/331825
[02:37:47] <Krenair>	 gotta run dumpInterwiki.php
[02:38:16] <Krenair>	 specify output file, download it to your mediawiki-config dir, upload as commit
[02:38:18] <Krenair>	 then deploy
[02:38:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] Consolidate database lists list in one place [mediawiki-config] - https://gerrit.wikimedia.org/r/331825 (owner: Dereckson)
[02:39:14] <ostriches>	 Download, then upload?
[02:39:22] * ostriches does it all from tin like a boss
[02:39:46] <Krenair>	 well
[02:40:08] <Krenair>	 I guess you can use a temporary HTTPS password to upload from tin
[02:40:45] <wikibugs>	 (PS1) Chad: Updating interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/331826
[02:41:02] <ostriches>	 Krenair: I upload changes from tin all the time :p
[02:41:05] <ostriches>	 Saves me round-trips
[02:41:11] <ostriches>	 Plz review ^
[02:44:04] <wikibugs>	 (CR) Chad: [C: 2] Updating interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/331826 (owner: Chad)
[02:44:15] <ostriches>	 ostriches: Thanks for the review
[02:44:18] <ostriches>	 You're welcome
[02:45:24] <wikibugs>	 (Merged) jenkins-bot: Updating interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/331826 (owner: Chad)
[02:45:43] <wikibugs>	 (CR) jenkins-bot: Updating interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/331826 (owner: Chad)
[02:45:58] <Krenair>	 lgtm
[02:46:30] <logmsgbot>	 !log demon@tin Synchronized wmf-config/interwiki.php: T154225 (duration: 00m 38s)
[02:46:32] <Dereckson>	 why is multiversion/vendor/ committed in the repo? As composer.lock is committed, it should recreate the same content
[02:46:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:35] <stashbot>	 T154225: Update interwiki map, following edit - https://phabricator.wikimedia.org/T154225
[02:47:35] <ostriches>	 Because deployment servers can't/won't/shouldn't be downloading things from packagist :)
[02:47:42] <ostriches>	 cf: mediawiki/vendor
[02:52:02] <Krenair>	 we should change the topic back to status up
[02:53:29] <ostriches>	 And with that, I'm out for the night. Later
[03:07:37] <wikibugs>	 (PS2) Dereckson: Consolidate database lists list in one place [mediawiki-config] - https://gerrit.wikimedia.org/r/331825
[03:08:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] Consolidate database lists list in one place [mediawiki-config] - https://gerrit.wikimedia.org/r/331825 (owner: Dereckson)
[03:23:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 693.15 seconds
[03:29:03] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 220.98 seconds
[04:37:43] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:38:33] <icinga-wm>	 PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:39:33] <icinga-wm>	 RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy
[04:39:33] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[04:45:23] <icinga-wm>	 PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:45:33] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:45:33] <icinga-wm>	 PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:45:43] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:46:33] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy
[04:46:33] <icinga-wm>	 RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy
[04:46:33] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[04:57:43] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:58:23] <icinga-wm>	 PROBLEM - Host labstore1004 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:41] <Waggie>	 ...
[04:58:43] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:58:43] <icinga-wm>	 PROBLEM - Host ms-be1006 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:43] <icinga-wm>	 PROBLEM - Host db1055 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:43] <icinga-wm>	 PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:43] <icinga-wm>	 PROBLEM - Host db1056 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:43] <icinga-wm>	 PROBLEM - Host db1088 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:44] <icinga-wm>	 PROBLEM - Host db1051 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:44] <icinga-wm>	 PROBLEM - Host analytics1029 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:45] <icinga-wm>	 PROBLEM - Host db1060 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:45] <icinga-wm>	 PROBLEM - Host db1054 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:46] <icinga-wm>	 PROBLEM - Host db1057 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:46] <icinga-wm>	 PROBLEM - Host db1059 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:47] <icinga-wm>	 PROBLEM - Host es1015 is DOWN: PING CRITICAL - Packet loss = 100%
[04:58:47] <icinga-wm>	 PROBLEM - Host analytics1030 is DOWN: PING CRITICAL - Packet loss = 100%
[04:59:03] <icinga-wm>	 PROBLEM - configured eth on lvs1001 is CRITICAL: eth2 reporting no carrier.
[04:59:05] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on prometheus.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.25 and port 80: No route to host
[04:59:13] <icinga-wm>	 PROBLEM - configured eth on lvs1002 is CRITICAL: eth2 reporting no carrier.
[04:59:13] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - apaches_80 - Could not depool server mw1186.eqiad.wmnet because of too many down!: api-https_443 - Could not depool server mw1198.eqiad.wmnet because of too many down!: api_80 - Could not depool server mw1205.eqiad.wmnet because of too many down!: appservers-https_443 - Could not depool server mw1179.eqiad.wmnet because of too many down!
[04:59:23] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - apaches_80 - Could not depool server mw1265.eqiad.wmnet because of too many down!: api-https_443 - Could not depool server mw1235.eqiad.wmnet because of too many down!: zotero_1969 - Could not depool server sca1003.eqiad.wmnet because of too many down!: api_80 - Could not depool server mw1282.eqiad.wmnet because of too many down!
[04:59:23] <icinga-wm>	 RECOVERY - Host labstore1004 is UP: PING WARNING - Packet loss = 61%, RTA = 0.46 ms
[04:59:23] <icinga-wm>	 PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[04:59:33] <icinga-wm>	 RECOVERY - Host db1057 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[04:59:33] <icinga-wm>	 RECOVERY - Host db1054 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[04:59:37] <icinga-wm>	 RECOVERY - Host analytics1029 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[04:59:43] <icinga-wm>	 RECOVERY - Host analytics1031 is UP: PING OK - Packet loss = 0%, RTA = 26.94 ms
[04:59:55] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.006 second response time
[05:00:03] <icinga-wm>	 RECOVERY - configured eth on lvs1001 is OK: OK - interfaces up
[05:00:05] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on prometheus.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 156 bytes in 0.005 second response time
[05:00:05] <icinga-wm>	 RECOVERY - configured eth on lvs1003 is OK: OK - interfaces up
[05:00:13] <icinga-wm>	 RECOVERY - configured eth on lvs1002 is OK: OK - interfaces up
[05:00:13] <icinga-wm>	 RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 10.568 second response time
[05:00:23] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy
[05:00:23] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy
[05:01:03] <icinga-wm>	 PROBLEM - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 0.003 second response time
[05:01:13] <icinga-wm>	 PROBLEM - Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 0.004 second response time
[05:01:23] <icinga-wm>	 PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:01:33] <icinga-wm>	 PROBLEM - Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/redis - 341 bytes in 0.003 second response time
[05:01:33] <icinga-wm>	 PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 341 bytes in 0.002 second response time
[05:01:33] <icinga-wm>	 PROBLEM - showmount succeeds on a labs instance on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/showmount - 341 bytes in 0.005 second response time
[05:01:35] <icinga-wm>	 PROBLEM - NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 341 bytes in 0.003 second response time
[05:01:37] <icinga-wm>	 PROBLEM - toolschecker service itself needs to return OK on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/self - 341 bytes in 0.003 second response time
[05:01:43] <icinga-wm>	 PROBLEM - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 341 bytes in 0.002 second response time
[05:02:03] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[05:02:03] <icinga-wm>	 PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:02:13] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[05:02:13] <icinga-wm>	 PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:02:13] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[05:02:23] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[05:02:23] <icinga-wm>	 RECOVERY - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 24.844 second response time
[05:02:23] <icinga-wm>	 RECOVERY - Verify internal DNS from within Tools on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 15.919 second response time
[05:02:33] <icinga-wm>	 RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time
[05:02:33] <icinga-wm>	 RECOVERY - Redis set/get on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.010 second response time
[05:02:33] <icinga-wm>	 RECOVERY - showmount succeeds on a labs instance on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.024 second response time
[05:02:35] <icinga-wm>	 RECOVERY - NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.032 second response time
[05:02:37] <icinga-wm>	 RECOVERY - toolschecker service itself needs to return OK on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.032 second response time
[05:02:43] <icinga-wm>	 RECOVERY - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time
[05:03:13] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:03:32] <robh>	 irc from my phone!  (it seems like this was a monitoring bug with false positive alerts and not a site failure.)
[05:04:06] <robh>	 i'll be on laptop in about 10 minutes (unless another opsen shows up) im not comfortable taking laptop out on ac transit bus ;)
[05:04:16] <Bsadowski1>	 Haha ><
[05:04:26] <Bsadowski1>	 Yeah, that wouldn't be good...
[05:04:36] <madhuvishy>	 ummm hello
[05:05:08] <robh>	 also seems nothing is down, just a bunch of alerts and clears, so not worth getting off the bus at a random stop that isnt mine to try to troubleshoot.
[05:05:10] <madhuvishy>	 looks like a monitoring bug
[05:05:11] <madhuvishy>	 yeah
[05:05:30] <madhuvishy>	 robh: nope i'm around etc, do your thing :)
[05:05:58] <robh>	 cool, i just setup irc on my phone yesterday to connect to my bouncer, so was excuse to have something to do on the bus ;]
[05:06:13] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[05:06:38] <robh>	 ok, out until im home and things look ok anyhow.
[05:08:13] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:10:33] <icinga-wm>	 PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[05:11:03] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:12:23] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:13:13] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:14:46] <Krenair>	 was that a network issue or something?
[05:15:23] <icinga-wm>	 RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[05:15:50] <madhuvishy>	 not sure - bunch of NTP CRITICAL: No response from NTP server reports on icinga
[05:17:06] <Krenair>	 I just got a user complaining about replag
[05:17:18] <Krenair>	 db servers becoming unavailable might cause that?
[05:17:26] <volans>	 looks like almost all the hosts that went "down" are on C2 on eqiad, so maybe the switch got restarted?
[05:17:33] <volans>	 Krenair: let me check 
[05:18:11] <volans>	 Krenair: no DB has lag right now, when was the complaint?
[05:18:27] <Krenair>	 I know
[05:18:49] <Krenair>	 around about 05:01-05:06 ish
[05:19:19] <Krenair>	 I'd check if the hosts were on the same rack, yeah
[05:21:10] <volans>	 they are
[05:23:20] <volans>	 including the master of s1 :(
[05:24:13] <Krenair>	 yeah that going down would drop enwiki into read-only mode
[05:24:55] <Krenair>	 <Cameron11598> https://status.wikimedia.org/178333/Wiki-platform-[[w:dsb:Main-Page]]-(s3)---UNCACHED  <--- FYI ?
[05:24:57] <Krenair>	 from -tech
[05:25:28] <Krenair>	 am guessing that was the result of one of the lvs hosts being there
[05:26:35] <wikibugs>	 (03PS1) 10Andrew Bogott: Keystone: Turn down default log levels [puppet] - 10https://gerrit.wikimedia.org/r/331830
[05:27:11] <volans>	 Krenair: no, no lvs is in that rack
[05:27:51] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Keystone: Turn down default log levels [puppet] - 10https://gerrit.wikimedia.org/r/331830 (owner: 10Andrew Bogott)
[05:28:03] <Krenair>	 hmm
[05:28:10] <Krenair>	 there were some lvs-related alerts in icinga
[05:28:16] <Krenair>	 none for cp hosts
[05:28:42] <volans>	 yes, I saw that too, it should be because the eth2 of lvs1001 is connected there but I need to verifyEinsOlogy9-CosmIvity1+RelatTein5$
[05:28:51] <volans>	 pastefail
[05:29:03] <icinga-wm>	 RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[05:29:13] <icinga-wm>	 RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[05:29:23] <icinga-wm>	 RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[05:30:26] <ema>	 cache_text was affected between 4:56 and 5:01, all good on cache_upload 
[05:37:33] <icinga-wm>	 RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:34:23] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1260 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:37:59] <Waggie>	 Bsadowski1: You've never taken AC Transit, you REALLY don't want to take a laptop out on it, for a wide variety of reasons.. :)
[06:39:00] <Waggie>	 robh: You were wise to not do so. :)
[06:40:13] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:44:03] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:51:43] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:51:45] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:33] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[06:52:33] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[06:56:03] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:04:03] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:04:03] <icinga-wm>	 PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:22:03] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK
[07:27:23] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1260 is OK: OK
[07:28:03] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[07:32:03] <icinga-wm>	 RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[07:48:13] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[08:36:42] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] osm: Use LABS_NETWORKS in ferm rsync rule [puppet] - 10https://gerrit.wikimedia.org/r/331622 (owner: 10Alexandros Kosiaris)
[08:36:48] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: osm: Use LABS_NETWORKS in ferm rsync rule [puppet] - 10https://gerrit.wikimedia.org/r/331622
[08:36:54] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] osm: Use LABS_NETWORKS in ferm rsync rule [puppet] - 10https://gerrit.wikimedia.org/r/331622 (owner: 10Alexandros Kosiaris)
[08:37:12] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: puppetdb: Do not set up Ganglia in Labs [puppet] - 10https://gerrit.wikimedia.org/r/329329 (https://phabricator.wikimedia.org/T154104) (owner: 10Tim Landscheidt)
[08:37:19] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] puppetdb: Do not set up Ganglia in Labs [puppet] - 10https://gerrit.wikimedia.org/r/329329 (https://phabricator.wikimedia.org/T154104) (owner: 10Tim Landscheidt)
[08:38:26] <hashar>	 good morning
[08:38:41] <hashar>	 akosiaris: I went crazy yesterday and fixed a bunch of rspec changes :D
[08:38:51] <hashar>	 rspec tests
[08:40:57] <wikibugs>	 (03CR) 10Thiemo Mättig (WMDE): [C: 031] "I had a look at the PropertySuggester source code. It does *not* check if a property exists, because this would be too expensive. As long a" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/329762 (owner: 10Matěj Suchánek)
[08:41:53] <icinga-wm>	 PROBLEM - Check if rsync server is running on labsdb1006 is CRITICAL: PROCS CRITICAL: 0 processes with command name rsync, regex args /usr/bin/rsync --no-detach --daemon
[08:43:24] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "It's a system group that exists by default on all installations, and the idea is to use it as is, so there isn't much we can do about the " [puppet] - 10https://gerrit.wikimedia.org/r/331602 (owner: 10Alexandros Kosiaris)
[08:47:09] <akosiaris>	 hashar: yes I noticed. I am looking at https://gerrit.wikimedia.org/r/#/c/331677/1/modules/nrpe/spec/defines/monitor_service_spec.rb,unified right now
[08:47:24] <akosiaris>	 trying to remember why I had those "pending"
[08:48:37] <akosiaris>	 so.. now rspec despite the "pending"
[08:48:46] <akosiaris>	 runs the tests ? 
[08:49:01] <akosiaris>	 and if it is ok it just errors out, if it is not it honors the pending ?
[08:49:02] <akosiaris>	 lol
[08:49:52] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] nrpe: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331677 (owner: 10Hashar)
[08:49:55] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: nrpe: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331677 (owner: 10Hashar)
[08:49:59] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] nrpe: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331677 (owner: 10Hashar)
[08:50:12] <hashar>	 akosiaris: so yeah rspec 3 always runs the test
[08:50:26] <hashar>	 pending() merely flags the test as one that is expected to fail
[08:50:44] <hashar>	 if the spec passes and it is flagged pending(), then it is marked as failing
[08:50:53] <hashar>	 because since it passes, it is no longer pending :}  
[08:50:56] <hashar>	 something fixed it up somehow
[08:51:00] <akosiaris>	 yeah makes sense
[08:51:23] <akosiaris>	 I think nrpe is one of my first rspec module tests
[08:51:29] <hashar>	 so in theory before refactoring code, someone can write the expected behaviors to achieve and mark them all as pending
[08:51:38] <akosiaris>	 probably not the very best ones
[08:51:45] <akosiaris>	 thanks for handling this!
[08:51:51] <hashar>	 and whenever they fail, that means a feature has been implemented properly 
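The pending() behavior hashar describes above can be modeled in a few lines. This is only a Python sketch of the semantics as stated in the chat, not RSpec code; `Outcome`, `run_example`, and the two sample test bodies are hypothetical names made up for illustration.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    PENDING = "pending"


def run_example(body: Callable[[], None], pending: bool = False) -> Outcome:
    """Run one test body and report it the way the chat describes for rspec 3."""
    try:
        body()  # the body is always executed, pending or not
        body_passed = True
    except AssertionError:
        body_passed = False

    if not pending:
        return Outcome.PASSED if body_passed else Outcome.FAILED
    # For a pending example a failure is expected and reported as pending;
    # a pass means the pending flag is stale, so it is reported as a failure.
    return Outcome.FAILED if body_passed else Outcome.PENDING


def not_yet_implemented() -> None:
    # Hypothetical spec for behavior that does not exist yet.
    assert False, "feature missing"


def already_fixed() -> None:
    # Hypothetical spec whose behavior has since been implemented.
    assert 1 + 1 == 2


if __name__ == "__main__":
    print(run_example(not_yet_implemented, pending=True).value)
    print(run_example(already_fixed, pending=True).value)
```

Under this model, marking the expected behaviors as pending before a refactor means each one flips to a failure as soon as its feature works, which is the signal to drop the pending flag.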
[08:52:13] <akosiaris>	 omg did I just write "not the very best ones" ?
[08:52:13] <hashar>	 there are a few patches to run all the specs from the root of the repo.  Will attempt to have one final nice patch for review
[08:52:21] <hashar>	 then probably write some doc
[08:52:26] <akosiaris>	 that was a translation straight from greek and it's wrong also
[08:52:27] <akosiaris>	 lol
[08:52:29] <hashar>	 and hopefully get Jenkins to run the spec finally
[08:52:31] <hashar>	 haha
[08:52:46] * akosiaris needs more coffee
[08:52:54] <hashar>	 none of us are native english speakers anyway :)
[08:53:11] <akosiaris>	 hashar: so yeah.. generally I 've altered slightly my opinion about rspec tests
[08:53:21] <akosiaris>	 so e.g. that test is practically stupid
[08:53:33] <akosiaris>	 all it does is re implement the actual class
[08:53:38] <akosiaris>	 which is a really simple class
[08:53:42] <akosiaris>	 there isn't much to test there
[08:53:56] <akosiaris>	 tests in that very simple case only hinder refactoring
[08:54:10] <akosiaris>	 and given we don't enforce them anyway via jenkins
[08:54:15] <akosiaris>	 people just utterly ignore them
[08:54:29] <akosiaris>	 they don't even remember/know they exist
[08:54:46] <akosiaris>	 in other cases, with many codepaths (case statements, if/thens and so on)
[08:54:52] <akosiaris>	 tests make way more sense
[08:55:09] <akosiaris>	 or puppet parser functions and so on
[08:55:34] <akosiaris>	 I am thinking we should go through with a process of cleaning up our tree
[08:55:42] <akosiaris>	 kill the useless tests
[08:55:45] <akosiaris>	 like this one
[08:55:53] <akosiaris>	 and then enable the tests in jenkins
[08:55:56] <wikibugs>	 (03CR) 10Hashar: "That was supposed to be covered by a RewriteRule https://gerrit.wikimedia.org/r/#/c/322019/5/modules/contint/templates/apache/doc.wikimedi" [puppet] - 10https://gerrit.wikimedia.org/r/331558 (https://phabricator.wikimedia.org/T150727) (owner: 10Krinkle)
[08:57:51] <hashar>	 akosiaris: yeah  duplicating the implementation in a test is worthless
[08:57:55] <hashar>	 I agree with that and other ops have the same concern
[08:58:12] <hashar>	 when I refactored the Zuul class to use hiera, I wrote a set of specs and that greatly helped
[08:58:29] <hashar>	 especially to assert the proper hiera key got loaded and the resulting erb template compiled/expanded properly
[08:59:30] <hashar>	 !log disabling puppet on contint1001 to live hack apache conf ( T150727 )
[08:59:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:35] <stashbot>	 T150727: doc.wikimedia.org displays "403 Forbidden" for coverage sub directories - https://phabricator.wikimedia.org/T150727
[09:02:27] <wikibugs>	 06Operations, 10MediaWiki-Vagrant: Upgrade Vagrant to 1.9.1 in Wikimedia apt for both Trusty and Jessie - https://phabricator.wikimedia.org/T155112#2935815 (10akosiaris)
[09:16:35] <akosiaris>	 !log T155112 upload Vagrant 1.9.1 to apt.wikimedia.org/jessie-wikimedia/thirdparty and apt.wikimedia.org/trusty-wikimedia/thirdparty
[09:16:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:40] <stashbot>	 T155112: Upgrade Vagrant to 1.9.1 in Wikimedia apt for both Trusty and Jessie - https://phabricator.wikimedia.org/T155112
[09:16:58] <wikibugs>	 07Puppet, 06Labs, 10MediaWiki-Vagrant, 13Patch-For-Review, 15User-bd808: Make role::labs::mediawiki_vagrant work on Debian Jessie host systems - https://phabricator.wikimedia.org/T154340#2935839 (10akosiaris)
[09:17:00] <wikibugs>	 06Operations, 10MediaWiki-Vagrant: Upgrade Vagrant to 1.9.1 in Wikimedia apt for both Trusty and Jessie - https://phabricator.wikimedia.org/T155112#2935836 (10akosiaris) 05Open>03Resolved a:03akosiaris @bd808, yes in fact I 've done it already.
[09:24:57] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: oresrdb.svc.eqiad.wmnet: Point to oresrdb1002 [dns] - 10https://gerrit.wikimedia.org/r/331835
[09:31:23] <icinga-wm>	 PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:38:32] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Revert "oresrdb.svc.eqiad.wmnet: Point to oresrdb1002" [dns] - 10https://gerrit.wikimedia.org/r/331837
[09:43:22] <wikibugs>	 06Operations, 10Citoid, 06Services, 10VisualEditor: NIH db misbehaviour causing problems to Citoid - https://phabricator.wikimedia.org/T133696#2935854 (10Mvolz) a:05Mvolz>03None
[09:56:30] <wikibugs>	 (03CR) 10Hashar: [V: 031 C: 031] "I have downloaded the pson catalogs for each of the hosts and manually did a diff.  They are all noop :-}" [puppet] - 10https://gerrit.wikimedia.org/r/331457 (owner: 10Hashar)
[09:59:23] <icinga-wm>	 RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[10:00:13] <wikibugs>	 (03PS3) 10Hashar: kafka: fix Unrecognized escape sequence '\.' [puppet] - 10https://gerrit.wikimedia.org/r/331451
[10:01:25] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] oresrdb.svc.eqiad.wmnet: Point to oresrdb1002 [dns] - 10https://gerrit.wikimedia.org/r/331835 (owner: 10Alexandros Kosiaris)
[10:03:43] <wikibugs>	 (03CR) 10Hashar: "Rebased to run the puppet compiler against the kafka1001-1003 and kafka2001-2003 hosts: https://puppet-compiler.wmflabs.org/5083/ it is no" [puppet] - 10https://gerrit.wikimedia.org/r/331451 (owner: 10Hashar)
[10:05:00] <wikibugs>	 (03PS4) 10Hashar: kafka: fix Unrecognized escape sequence '\.' [puppet] - 10https://gerrit.wikimedia.org/r/331451
[10:20:03] <icinga-wm>	 PROBLEM - DPKG on oresrdb1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:23:13] <icinga-wm>	 RECOVERY - DPKG on oresrdb1001 is OK: All packages OK
[10:23:13] <icinga-wm>	 PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:24:19] <wikibugs>	 (03CR) 10Hashar: "$whitelist_tail was missing the underscore. kafka2003.codfw.wmnet catalog compilation failed twice, might be unrelated(?)" [puppet] - 10https://gerrit.wikimedia.org/r/331451 (owner: 10Hashar)
[10:24:43] <wikibugs>	 (03CR) 10Hashar: [V: 031 C: 031] "Ah Unable to find facts for host kafka2003.codfw.wmnet, skipping :}" [puppet] - 10https://gerrit.wikimedia.org/r/331451 (owner: 10Hashar)
[10:36:13] <icinga-wm>	 PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:37:48] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] Revert "oresrdb.svc.eqiad.wmnet: Point to oresrdb1002" [dns] - 10https://gerrit.wikimedia.org/r/331837 (owner: 10Alexandros Kosiaris)
[10:48:06] <wikibugs>	 (03PS1) 10Odder: Add noratelimit user right to translation admins on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331841 (https://phabricator.wikimedia.org/T155162)
[10:50:29] <wikibugs>	 (03CR) 10Odder: [C: 04-1] "Let's wait for community consensus or at least an announcement on a local Village Pump before this is deployed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331841 (https://phabricator.wikimedia.org/T155162) (owner: 10Odder)
[10:50:48] <wikibugs>	 07Puppet, 10Continuous-Integration-Config, 13Patch-For-Review: rake-jessie tests check .pp files but are not triggered by .pp file changes - https://phabricator.wikimedia.org/T153013#2935931 (10hashar) 05Open>03Resolved a:03Paladox
[10:51:13] <icinga-wm>	 RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[10:53:58] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: ores: Add redis database to client_hosts hiera key [puppet] - 10https://gerrit.wikimedia.org/r/331843
[10:55:18] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] ores: Add redis database to client_hosts hiera key [puppet] - 10https://gerrit.wikimedia.org/r/331843 (owner: 10Alexandros Kosiaris)
[11:05:13] <icinga-wm>	 RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[11:34:23] <icinga-wm>	 PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[11:52:23] <icinga-wm>	 PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:57:44] <wikibugs>	 (03PS1) 10Urbanecm: [cleanup] Remove old throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331844
[12:01:29] <wikibugs>	 (03Abandoned) 10Urbanecm: [cleanup] Remove old throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331844 (owner: 10Urbanecm)
[12:03:23] <icinga-wm>	 RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[12:09:33] <icinga-wm>	 PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[12:20:23] <icinga-wm>	 RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[12:37:33] <icinga-wm>	 RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[12:42:27] <wikibugs>	 (03PS2) 10Hashar: mirrors: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331639
[12:45:20] <wikibugs>	 (03PS1) 10Urbanecm: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331846 (https://phabricator.wikimedia.org/T150618)
[13:04:42] <wikibugs>	 (03PS1) 10Hashar: bacula: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331847
[13:10:18] <wikibugs>	 (03PS1) 10Hashar: backup: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331848
[13:31:55] <icinga-wm>	 PROBLEM - BGP status on cr1-eqdfw is CRITICAL: BGP CRITICAL - No response from remote host 208.80.153.198
[13:35:36] <icinga-wm>	 PROBLEM - Redis status tcp_6381 on rdb2004 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.16.123 on port 6381
[13:36:36] <icinga-wm>	 RECOVERY - Redis status tcp_6381 on rdb2004 is OK: OK: REDIS 2.8.17 on 10.192.16.123:6381 has 1 databases (db0) with 7423080 keys, up 73 days 5 hours - replication_delay is 0
[13:36:36] <icinga-wm>	 PROBLEM - Redis status tcp_6379 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6379
[13:36:36] <icinga-wm>	 PROBLEM - Redis status tcp_6380 on rdb2004 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.16.123 on port 6380
[13:37:06] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:37:06] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:37:26] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[13:37:26] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 37 probes of 400 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[13:37:26] <icinga-wm>	 RECOVERY - Redis status tcp_6379 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6379 has 1 databases (db0) with 2808630 keys, up 73 days 5 hours - replication_delay is 0
[13:37:36] <icinga-wm>	 RECOVERY - Redis status tcp_6380 on rdb2004 is OK: OK: REDIS 2.8.17 on 10.192.16.123:6380 has 1 databases (db0) with 7510819 keys, up 73 days 5 hours - replication_delay is 0
[13:37:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[13:37:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[13:38:16] <icinga-wm>	 PROBLEM - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 55 not-conn: cp2005_v4
[13:38:16] <icinga-wm>	 PROBLEM - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 55 not-conn: cp2011_v4
[13:38:16] <icinga-wm>	 PROBLEM - IPsec on cp1071 is CRITICAL: Strongswan CRITICAL - ok: 53 not-conn: cp2005_v4, cp2011_v4, cp2017_v4
[13:38:26] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2807764 keys, up 73 days 5 hours - replication_delay is 0
[13:38:26] <icinga-wm>	 PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2004_v4, cp2016_v4
[13:38:36] <icinga-wm>	 PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 40 not-conn: cp2001_v4, cp2004_v4, cp2019_v4, cp2023_v4
[13:39:06] <icinga-wm>	 PROBLEM - IPsec on cp1046 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp2003_v4, cp2015_v4
[13:39:06] <icinga-wm>	 PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 43 not-conn: cp2019_v4
[13:39:06] <icinga-wm>	 PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 43 not-conn: cp2016_v4
[13:39:06] <icinga-wm>	 PROBLEM - IPsec on cp1047 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2009_v4
[13:39:06] <icinga-wm>	 PROBLEM - IPsec on cp1059 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp2003_v4, cp2021_v4
[13:39:07] <icinga-wm>	 PROBLEM - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 55 not-conn: cp2022_v4
[13:39:07] <icinga-wm>	 PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 40 not-conn: cp2004_v4, cp2010_v4, cp2013_v4, cp2023_v4
[13:39:08] <icinga-wm>	 PROBLEM - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 55 not-conn: cp2022_v4
[13:39:08] <icinga-wm>	 PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2010_v4, cp2019_v4
[13:39:16] <icinga-wm>	 PROBLEM - IPsec on cp1099 is CRITICAL: Strongswan CRITICAL - ok: 55 not-conn: cp2008_v4
[13:39:16] <icinga-wm>	 PROBLEM - IPsec on cp1074 is CRITICAL: Strongswan CRITICAL - ok: 55 not-conn: cp2011_v4
[13:39:16] <icinga-wm>	 PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2008_v4, cp2024_v4
[13:42:26] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 400 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[13:43:16] <icinga-wm>	 RECOVERY - IPsec on cp1047 is OK: Strongswan OK - 24 ESP OK
[13:46:06] <icinga-wm>	 RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 44 ESP OK
[13:46:47] <wikibugs>	 (03PS12) 10Hashar: Modification of Rakefile spec entry point [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko)
[13:46:50] <wikibugs>	 (03PS1) 10Hashar: (DO NOT SUBMIT) Octopus merge of spec fixes [puppet] - 10https://gerrit.wikimedia.org/r/331850
[13:48:18] <wikibugs>	 (03PS5) 10Hashar: Use task to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223
[13:48:26] <icinga-wm>	 RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 44 ESP OK
[13:48:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Modification of Rakefile spec entry point [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko)
[13:50:36] <icinga-wm>	 RECOVERY - IPsec on cp1068 is OK: Strongswan OK - 44 ESP OK
[13:50:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Use task to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223 (owner: 10Hashar)
[13:51:16] <icinga-wm>	 RECOVERY - IPsec on cp1050 is OK: Strongswan OK - 56 ESP OK
[13:51:43] <wikibugs>	 06Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#2936094 (10hashar)
[13:52:16] <icinga-wm>	 RECOVERY - IPsec on cp1059 is OK: Strongswan OK - 24 ESP OK
[13:52:38] <wikibugs>	 06Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#842870 (10hashar) I have been sprinting that a bit this week.  Namely fixed a few spec and rebased Nico patch.  Results are under the Gerrit topic...
[13:53:16] <icinga-wm>	 RECOVERY - IPsec on cp1048 is OK: Strongswan OK - 56 ESP OK
[13:53:36] <icinga-wm>	 PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:57:07] <wikibugs>	 (03PS1) 10Hashar: (DO NOT SUBMIT) test git submodule update [puppet] - 10https://gerrit.wikimedia.org/r/331853
[13:57:16] <icinga-wm>	 RECOVERY - IPsec on cp1063 is OK: Strongswan OK - 56 ESP OK
[13:59:16] <icinga-wm>	 RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 44 ESP OK
[13:59:24] <wikibugs>	 (03Abandoned) 10Hashar: (DO NOT SUBMIT) test git submodule update [puppet] - 10https://gerrit.wikimedia.org/r/331853 (owner: 10Hashar)
[14:00:16] <icinga-wm>	 RECOVERY - IPsec on cp1049 is OK: Strongswan OK - 56 ESP OK
[14:03:01] <wikibugs>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko)
[14:03:16] <icinga-wm>	 RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 44 ESP OK
[14:03:16] <icinga-wm>	 RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 56 ESP OK
[14:03:16] <icinga-wm>	 RECOVERY - IPsec on cp1071 is OK: Strongswan OK - 56 ESP OK
[14:04:06] <icinga-wm>	 RECOVERY - IPsec on cp1046 is OK: Strongswan OK - 24 ESP OK
[14:04:06] <icinga-wm>	 RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 44 ESP OK
[14:05:34] <hashar>	 akosiaris: rspec of all modules on CI and passing!! https://integration.wikimedia.org/ci/job/operations-puppet-rake-jessie/2726/console :}
[14:06:16] <icinga-wm>	 RECOVERY - IPsec on cp1099 is OK: Strongswan OK - 56 ESP OK
[14:09:16] <icinga-wm>	 RECOVERY - IPsec on cp1074 is OK: Strongswan OK - 56 ESP OK
[14:21:36] <icinga-wm>	 RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[14:22:03] <wikibugs>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/307223 (owner: 10Hashar)
[14:55:32] <wikibugs>	 (03PS1) 10Hashar: (WIP) Jenkins integration of rspec [puppet] - 10https://gerrit.wikimedia.org/r/331856
[15:03:16] <icinga-wm>	 PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:27:18] <wikibugs>	 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2936350 (10zhuyifei1999) (Un-stalled the requests for server side uploads)
[15:28:27] <wikibugs>	 (03PS2) 10Hashar: Jenkins integration of rspec [puppet] - 10https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342)
[15:31:36] <icinga-wm>	 RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[15:34:27] <wikibugs>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/330470 (owner: 10Hashar)
[15:35:16] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] build: update rubocop to 0.39 and tweak config [puppet] - 10https://gerrit.wikimedia.org/r/330470 (owner: 10Hashar)
[15:39:02] <wikibugs>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/330470 (owner: 10Hashar)
[15:40:18] <wikibugs>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar)
[15:45:27] <wikibugs>	 06Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#2936364 (10hashar)
[15:45:42] <wikibugs>	 06Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#842870 (10hashar) 05stalled>03Open a:03hashar
[15:51:16] <wikibugs>	 (03PS1) 10Paladox: Update mysql-connector-java to 5.1.40 [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863
[15:54:01] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] bacula: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331847 (owner: 10Hashar)
[15:54:23] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] backup: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331848 (owner: 10Hashar)
[15:54:30] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: backup: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331848 (owner: 10Hashar)
[15:54:32] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] backup: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331848 (owner: 10Hashar)
[15:55:08] <hashar>	 akosiaris: I am going to write a note to the ops list
[15:55:33] <hashar>	 but in short the holy grail is having rspec craft JUnit reports for Jenkins to interpret
[15:55:35] <hashar>	 https://integration.wikimedia.org/ci/job/operations-puppet-rake-jessie/2732/testReport/  \O/
[15:56:03] <akosiaris>	 hashar: \o/
[15:56:22] <hashar>	 I doubt our specs are of any help though :(
[15:56:39] <hashar>	 and there are a few modules that are highly coupled with everything.  tilerator is an example
[15:56:52] <hashar>	 you end up needing base / conftool / etcd etc
[15:56:52] <hashar>	 :(
[15:57:10] <hashar>	 maybe in the modules we could use mock modules that are essentially empty
[15:57:26] <hashar>	 and at the root of puppet.git,  have integration tests that play with all modules
[15:59:59] <akosiaris>	 so, have you seen the new RFC ?
[16:00:08] <akosiaris>	 all that coupling should become more loose
[16:00:16] <akosiaris>	 the role/profile/module paradigm
[16:00:26] <akosiaris>	 allows such things to be coupled more loosely
[16:00:34] <hashar>	 yeah
[16:00:50] <hashar>	 I thought Joe's change was rather complicated
[16:00:57] <hashar>	 and merely moving bits around / introducing yet another level of inception
[16:01:08] <akosiaris>	 at first yeah, that's what it looks like
[16:01:12] <hashar>	 but now that I have looked a bit more at our modules' coupling it makes total sense
[16:01:21] <akosiaris>	 exactly
[16:01:22] <hashar>	 so each module would have specs that at worst just use stdlib and wmflib
[16:01:50] <akosiaris>	 maybe some other very intrusive ones like some defines from apache
[16:01:53] <akosiaris>	 or monitoring
[16:02:05] <akosiaris>	 but the scope of these is meant to be very very limited
[16:02:13] <hashar>	 for the roles using several modules, the spec helper would point to /modules  (i.e. do not use the fixture system and just use any module)
[16:02:25] <hashar>	 and at root of the profile module, we would get some end to end  integration tests
[16:02:52] <hashar>	 like for profile mediawiki::appserver:  it { should contain_package('hhvm').with_ensure('3.4.4') }
[16:02:55] <hashar>	 or something like that
[16:03:20] <hashar>	 monitoring I don't think it should be done in modules
[16:03:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.13 seconds
[16:03:33] <hashar>	 but yeah overall that is exciting
[16:04:36] <icinga-wm>	 PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[16:06:16] <akosiaris>	 hashar: generally speaking? yes, it shouldn't
[16:06:34] <akosiaris>	 but it may not be easy to find the easiest construct for that
[16:06:36] <hashar>	 that is a thing to fix for the future generation
[16:06:45] <akosiaris>	 monitoring is a very weird thing
[16:06:58] <akosiaris>	 you want it to be very pervasive and present without even caring about it
[16:07:13] <akosiaris>	 which when testing, makes everything go awry
[16:07:46] <akosiaris>	 it will need some better experience with the paradigm and some design
[16:10:09] <wikibugs>	 (03CR) 10Paladox: "I got the jar directly from https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.40/" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[16:13:40] <wikibugs>	 (03PS2) 10Hashar: (DO NOT SUBMIT) Octopus merge of spec fixes [puppet] - 10https://gerrit.wikimedia.org/r/331850
[16:13:43] <wikibugs>	 (03Draft1) 10Paladox: Gerrit: Remove mysql-connection-java apt package [puppet] - 10https://gerrit.wikimedia.org/r/331864
[16:13:46] <wikibugs>	 (03PS2) 10Paladox: Gerrit: Remove mysql-connection-java apt package [puppet] - 10https://gerrit.wikimedia.org/r/331864
[16:14:27] <wikibugs>	 (03PS13) 10Hashar: Modification of Rakefile spec entry point [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko)
[16:14:43] <wikibugs>	 (03PS6) 10Hashar: Use task to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223
[16:14:51] <wikibugs>	 (03PS3) 10Hashar: Jenkins integration of rspec [puppet] - 10https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342)
[16:15:11] <wikibugs>	 (03CR) 10Paladox: "@Chad should i remove depends on since we can merge this as it won't remove mysql-connection-java jar as it would need manual removal." [puppet] - 10https://gerrit.wikimedia.org/r/331864 (owner: 10Paladox)
[16:15:22] <hashar>	 akosiaris: I will let things settle a bit and write about rspec tomorrow.  Thank you for the reviews! ;}
[16:15:36] <akosiaris>	 hashar: thanks as well
[16:15:37] <akosiaris>	 !
[16:15:40] <akosiaris>	 :-)
[16:18:59] <hashar>	 eek
[16:19:08] <hashar>	 the lvm module relies on stdlib
[16:19:27] <hashar>	 and the .fixtures.yml file uses the github repo https://github.com/puppetlabs/puppetlabs-stdlib.git
[16:19:31] <hashar>	 not quite what we have ;}
[16:23:04] <wikibugs>	 (03CR) 10Chad: "Is this available via debian unstable or testing perhaps? Using the provided debian package is kind of nice...less stuff to bundle in the " [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[16:23:32] <wikibugs>	 (03CR) 10Chad: [C: 04-1] "See comments I left on the dependency" [puppet] - 10https://gerrit.wikimedia.org/r/331864 (owner: 10Paladox)
[16:24:16] <wikibugs>	 (03CR) 10Paladox: "> Is this available via debian unstable or testing perhaps? Using the" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[16:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 46.95 seconds
[16:32:36] <icinga-wm>	 RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[16:34:38] <wikibugs>	 (03CR) 10Chad: "Yeah, I saw the release notes...I'm just curious though in the delta between 5.1.21 and 5.1.40, are there particular bugfixes or features " [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[16:35:26] <wikibugs>	 (03CR) 10Paladox: "Well between those releases it fixes something to do if you set utf8mb4 on the server jdbc did not work correctly." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[16:35:36] <wikibugs>	 (03CR) 10Paladox: "also adds something to do with alter table." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[16:54:43] <wikibugs>	 06Operations, 10ops-codfw: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#2936454 (10Papaul)
[16:55:35] <wikibugs>	 06Operations, 10ops-codfw: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#2936470 (10Papaul)
[16:57:53] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: osm: Fix osm rsync server check [puppet] - 10https://gerrit.wikimedia.org/r/331878
[17:04:36] <icinga-wm>	 PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:09:54] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] osm: Fix osm rsync server check [puppet] - 10https://gerrit.wikimedia.org/r/331878 (owner: 10Alexandros Kosiaris)
[17:10:02] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: osm: Fix osm rsync server check [puppet] - 10https://gerrit.wikimedia.org/r/331878
[17:10:45] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] osm: Fix osm rsync server check [puppet] - 10https://gerrit.wikimedia.org/r/331878 (owner: 10Alexandros Kosiaris)
[17:12:46] <wikibugs>	 06Operations, 10ops-codfw: codfw:mw2251-mw2260 switch port configuration - https://phabricator.wikimedia.org/T155181#2936476 (10Papaul)
[17:17:25] <wikibugs>	 06Operations, 10ops-codfw: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#2936490 (10Papaul) p:05Triage>03Normal
[17:19:36] <icinga-wm>	 PROBLEM - puppet last run on restbase1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:26:28] <wikibugs>	 (03CR) 10Anomie: "> but suddenly they're making edits while logged out with a higher frequency." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324215 (https://phabricator.wikimedia.org/T154698) (owner: 10Anomie)
[17:32:36] <icinga-wm>	 RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[17:33:16] <icinga-wm>	 RECOVERY - Check if rsync server is running on labsdb1006 is OK: PROCS OK: 1 process with command name rsync, regex args /usr/bin/rsync --no-detach --daemon
[17:43:35] <wikibugs>	 06Operations, 10IDS-extension, 10Wikimedia-Extension-setup, 07I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2936504 (10Arthur2e5) Yes.
[17:47:36] <icinga-wm>	 RECOVERY - puppet last run on restbase1014 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[17:52:50] <doctaxon>	 Hi, the Notifications do not work at dewiki right now
[17:53:01] <doctaxon>	 does anybody have an idea?
[18:08:36] <wikibugs>	 (03CR) 10Chad: [C: 032] Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331846 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[18:10:19] <wikibugs>	 (03Merged) 10jenkins-bot: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331846 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[18:10:37] <wikibugs>	 (03CR) 10jenkins-bot: Add HD logos for several projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331846 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm)
[18:12:03] <logmsgbot>	 !log demon@tin Synchronized static/images/project-logos: HD logos for (nap|os|pl|pt)wiki (duration: 00m 39s)
[18:12:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:05] <logmsgbot>	 !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Use HD logos for (nap|os|pl|pt)wiki (duration: 00m 41s)
[18:13:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:36] <icinga-wm>	 PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:57:36] <icinga-wm>	 RECOVERY - puppet last run on snapshot1007 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[19:09:16] <icinga-wm>	 PROBLEM - Juniper alarms on mr1-eqiad is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 208.80.154.199
[19:10:07] <icinga-wm>	 RECOVERY - Juniper alarms on mr1-eqiad is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[19:14:13] <wikibugs>	 (03PS1) 10Papaul: DNS: Add mgmt and prodcution DNS entres for mw2251-mw2260 Fix: Putting server in alphabetical order Bug:T155180 [dns] - 10https://gerrit.wikimedia.org/r/331903
[19:38:02] <wikibugs>	 (03Draft1) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:06] <wikibugs>	 (03Draft2) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:10] <wikibugs>	 (03Draft3) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:16] <wikibugs>	 (03Draft4) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:20] <wikibugs>	 (03Draft5) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:24] <wikibugs>	 (03Draft6) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:29] <wikibugs>	 (03Draft7) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:34] <wikibugs>	 (03Draft8) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:39] <wikibugs>	 (03Draft9) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:44] <wikibugs>	 (03Draft10) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:52] <wikibugs>	 (03Draft11) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:55] <wikibugs>	 (03Draft12) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:38:59] <wikibugs>	 (03Draft13) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:04] <wikibugs>	 (03Draft14) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:08] <wikibugs>	 (03Draft15) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:12] <wikibugs>	 (03Draft16) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:16] <wikibugs>	 (03Draft17) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:19] <wikibugs>	 (03Draft18) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:23] <wikibugs>	 (03Draft19) 10Paladox: Test: Do not merge [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:39:37] <wikibugs>	 (03PS20) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[19:45:53] <wikibugs>	 (03CR) 10Krinkle: [C: 04-1] "Yeah, because afaik DirectoryIndex is about the current directory (e.g. how your typical index.php/index.html file works). Whereas what we" [puppet] - 10https://gerrit.wikimedia.org/r/331558 (https://phabricator.wikimedia.org/T150727) (owner: 10Krinkle)
[19:46:14] <wikibugs>	 (03CR) 10Krinkle: [C: 04-1] "I wonder what problem the rewrite rule caused in Apache 2.4? Afaik that should work just fine in either Apache version." [puppet] - 10https://gerrit.wikimedia.org/r/331558 (https://phabricator.wikimedia.org/T150727) (owner: 10Krinkle)
[20:06:22] <wikibugs>	 (03PS21) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[20:18:00] <wikibugs>	 (03PS22) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[20:18:39] <wikibugs>	 (03CR) 10Paladox: "recheck" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox)
[20:20:10] <wikibugs>	 (03PS23) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[20:26:19] <wikibugs>	 (03PS24) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[20:36:13] <wikibugs>	 (03PS25) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[20:47:08] <wikibugs>	 (03PS26) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[21:02:06] <wikibugs>	 (03PS27) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[21:02:56] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0; xe-3/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave]
[21:09:17] <wikibugs>	 (03PS1) 10Filippo Giunchedi: cassandra: add jmx_exporter to Cassandra in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/331911 (https://phabricator.wikimedia.org/T155120)
[21:09:48] <wikibugs>	 (03PS28) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[21:10:22] <wikibugs>	 (03PS29) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[21:10:50] <wikibugs>	 (03CR) 10Paladox: "recheck" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox)
[21:15:03] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC  https://puppet-compiler.wmflabs.org/5086/" [puppet] - 10https://gerrit.wikimedia.org/r/331911 (https://phabricator.wikimedia.org/T155120) (owner: 10Filippo Giunchedi)
[21:30:47] <wikibugs>	 (03PS30) 10Paladox: Fix debian's lintian test [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873
[21:36:26] <icinga-wm>	 PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:39:29] <wikibugs>	 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2936925 (10matmarex)
[21:46:13] <wikibugs>	 (03CR) 10Paladox: "@Chad I think this is ready. Doesn't fix all lintian failures, but fixes some of them :)" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox)
[22:04:26] <icinga-wm>	 RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[22:08:31] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] "When invoking "rake spec:all", if one of the modules fails it does not run the other modules' specs. And in parallel mode "rake -m",  that is" [puppet] - 10https://gerrit.wikimedia.org/r/307223 (owner: 10Hashar)
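(Editorial note) The review comment above says "rake spec:all" stops at the first failing module. One common way around that is to run every module, record which ones failed, and report them all at the end. Below is a minimal Ruby sketch of that pattern. It is not the actual Rakefile change: `run_all_specs` and the stand-in lambdas are hypothetical names, and a real runner would presumably shell out to each module's own spec task.

```ruby
# Hypothetical helper (not from the puppet repo): run every module's spec
# step, keep going after a failure, and return the names that failed.
def run_all_specs(modules)
  failed = []
  modules.each do |name, runner|
    begin
      ok = runner.call
    rescue StandardError
      # A runner that raises is counted as a failure instead of aborting the loop.
      ok = false
    end
    failed << name unless ok
  end
  failed
end

# Stand-in runners for illustration only; each returns true on success.
modules = {
  'bacula' => -> { true },
  'backup' => -> { false },
  'lvm'    => -> { raise 'missing stdlib fixture' },
  'osm'    => -> { true },
}

failed = run_all_specs(modules)
puts "failed modules: #{failed.join(', ')}" unless failed.empty?
```

The caller decides what to do with the returned list, for example exiting non-zero once every module has had its turn.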
[22:08:50] <logmsgbot>	 !log maxsem@tin Synchronized php-1.29.0-wmf.7/extensions/Graph/includes/ApiGraph.php: Debug for T155057 (duration: 00m 38s)
[22:08:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:54] <stashbot>	 T155057: Graph: First parameter must either be an object or the name of an existing class - https://phabricator.wikimedia.org/T155057
[22:29:34] <wikibugs>	 (03PS1) 10Chad: Removing old presenation files from wmfwiki docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918
[22:29:51] <wikibugs>	 (03CR) 10Reedy: [C: 031] "RIP" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918 (owner: 10Chad)
[22:31:46] <wikibugs>	 (03CR) 10Chad: [C: 032] Removing old presenation files from wmfwiki docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918 (owner: 10Chad)
[22:33:11] <wikibugs>	 (03PS2) 10Chad: Removing old presentation files from wmfwiki docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918
[22:33:20] <ostriches>	 apergos: FINE ^
[22:35:08] <wikibugs>	 (03CR) 10Chad: [C: 032] Removing old presentation files from wmfwiki docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918 (owner: 10Chad)
[22:36:39] <wikibugs>	 (03Merged) 10jenkins-bot: Removing old presentation files from wmfwiki docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918 (owner: 10Chad)
[22:38:13] <logmsgbot>	 !log demon@tin Synchronized docroot/foundation/presentations: removing some of these powerpoints (duration: 00m 38s)
[22:38:16] <wikibugs>	 (03CR) 10jenkins-bot: Removing old presentation files from wmfwiki docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331918 (owner: 10Chad)
[22:38:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:38:21] <wikibugs>	 (03PS1) 10BearND: admin: update my production ssh key [puppet] - 10https://gerrit.wikimedia.org/r/331920
[22:39:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: update my production ssh key [puppet] - 10https://gerrit.wikimedia.org/r/331920 (owner: 10BearND)
[22:42:41] <wikibugs>	 (03PS2) 10BearND: admin: update my production ssh key [puppet] - 10https://gerrit.wikimedia.org/r/331920
[22:44:21] <wikibugs>	 (03PS1) 10Chad: Remove last of these powerpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331922
[22:44:31] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "this key was created while i was sitting next to Bernd at allhands :)" [puppet] - 10https://gerrit.wikimedia.org/r/331920 (owner: 10BearND)
[22:46:02] <wikibugs>	 (03CR) 10Chad: [C: 032] Remove last of these powerpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331922 (owner: 10Chad)
[22:46:07] <wikibugs>	 (03PS1) 10Papaul: DHCP: Add DHCP entries for mw2251-mw2260 Bug:T155180 [puppet] - 10https://gerrit.wikimedia.org/r/331923
[22:47:27] <wikibugs>	 (03Merged) 10jenkins-bot: Remove last of these powerpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331922 (owner: 10Chad)
[22:47:39] <wikibugs>	 (03CR) 10jenkins-bot: Remove last of these powerpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331922 (owner: 10Chad)
[22:48:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:48:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy
[22:49:15] <logmsgbot>	 !log demon@tin Synchronized docroot/foundation: Yay no more powerpoints (duration: 00m 38s)
[22:49:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:03] <ostriches>	 wmfwiki docroot almost sane!
[22:52:24] <apergos>	 \o/
[22:54:23] <wikibugs>	 (03CR) 10Reedy: "Nearly 500 commits... https://github.com/mysql/mysql-connector-j/compare/5.1.21...5.1.40" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[22:54:26] <icinga-wm>	 PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:55:08] <wikibugs>	 (03CR) 10Paladox: "Not to mention 6.x lol. That would probably break gerrit." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[22:55:25] <wikibugs>	 (03CR) 10Paladox: "break gerrit as in the 6.x would break it." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331863 (owner: 10Paladox)
[22:56:59] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to hive/webrequest data for demon - https://phabricator.wikimedia.org/T155198#2937117 (10demon)
[22:58:20] <wikibugs>	 (03PS1) 10ArielGlenn: rsync for Erik Zachte from stat* hosts to dataset1001 other/media [puppet] - 10https://gerrit.wikimedia.org/r/331924
[22:58:35] <wikibugs>	 (03PS1) 10Chad: Grant access to analytics-privatedata-users to demon [puppet] - 10https://gerrit.wikimedia.org/r/331925 (https://phabricator.wikimedia.org/T155198)
[23:05:32] <logmsgbot>	 !log demon@tin Synchronized README: no-op for force co-master sync (duration: 00m 40s)
[23:05:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:15] <wikibugs>	 (03CR) 10Chad: "So, I'm not entirely sure if the copyright file is all correct or not, the licensing here is kinda unclear. Other files look fine." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox)
[23:16:19] <icinga-wm>	 PROBLEM - MariaDB disk space on db1026 is CRITICAL: DISK CRITICAL - free space: /srv 73649 MB (4% inode=99%)
[23:17:16] <icinga-wm>	 PROBLEM - Disk space on db1026 is CRITICAL: DISK CRITICAL - free space: /srv 57155 MB (3% inode=99%)
[23:18:05] <wikibugs>	 (03CR) 10Paladox: "I think the files in * are apache version 2 as gerrit is licensed under apache 2.x but debian/* is licensed under gpl 2.0+" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox)
[23:18:09] <madhuvishy>	 marostegui: is this worrying ^
[23:18:54] <Reedy>	 s5 slave
[23:18:54] <Reedy>	 'db1026' => 1,   # 1.4TB  64GB, watchlist, recentchanges, contributions, logpager
[23:19:16] <ostriches>	 Yeah, /srv is filling up fast, lost another 1% in like a minute?
[23:19:36] <Reedy>	 What's in /srv on the db hosts?
[23:20:08] <Reedy>	 https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=MySQL+eqiad&h=db1026.eqiad.wmnet&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[23:20:51] <godog>	 I'm taking a look too
[23:20:59] <Reedy>	 ta
[23:20:59] <madhuvishy>	 marostegui is looking too
[23:22:04] <godog>	 madhuvishy: nice, where are you?
[23:22:35] <madhuvishy>	 godog: ventana - middle rows, left side
[23:22:48] <godog>	 but yeah load spiked a few minutes ago
[23:23:26] <icinga-wm>	 RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[23:28:24] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB disk space on db1026 is CRITICAL: DISK CRITICAL - free space: /srv 74907 MB (5% inode=99%): Marostegui long running query sorting on a temp table
[23:29:37] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0]
[23:30:36] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0]
[23:32:16] <icinga-wm>	 RECOVERY - Disk space on db1026 is OK: DISK OK
[23:33:37] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0]
[23:35:57] <godog>	 the fatals seem to be from the jobqueue machines with "Could not wait for replica DBs to catch up to db1049"
[23:36:08] <marostegui>	 db1049?
[23:36:17] <madhuvishy>	 that is the master?
[23:36:20] <madhuvishy>	 for db1026
[23:36:29] <godog>	 yeah it is its master
[23:36:36] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0]
[23:37:19] <icinga-wm>	 RECOVERY - MariaDB disk space on db1026 is OK: DISK OK
[23:38:09] <godog>	 seems to have subsided now though
[23:38:33] <marostegui>	 yes, the queries are gone now
[23:38:38] <ostriches>	 godog: Yeah, all the errors seem to be lag related in MW
[23:38:39] <madhuvishy>	 https://tendril.wikimedia.org/host/view/db1026.eqiad.wmnet/3306 shows a spike in replag - and a drop
[23:38:43] <godog>	 I was looking at https://logstash.wikimedia.org/goto/8b3389188a01d6a60453e1145f08ce15 fwiw
[23:38:43] <marostegui>	 we'll need to investigate where they are coming from
[23:38:53] <ostriches>	 (generally speaking, a lot of our errors recently have been lag related :()
[23:39:00] <marostegui>	 I am going to compress a few tables to get more extra disk space
[23:39:33] <Reedy>	 Have you got a big enough press?
[23:39:43] <marostegui>	 looks like it is not the first time these queries appear: https://phabricator.wikimedia.org/T147747
[23:39:57] <marostegui>	 Reedy: I might need help if I am not strong enough!
[23:39:59] <ostriches>	 ApiQueryContributions is a *terrible* query
[23:40:05] <ostriches>	 Frequent offender.
[23:44:01] <marostegui>	 I have thrown 100G to the lv
[23:44:10] <marostegui>	 And I will start the compression in a bit
[23:45:53] <wikibugs>	 (PS2) Dzahn: DHCP: Add DHCP entries for mw2251-mw2260 Bug:T155180 [puppet] - https://gerrit.wikimedia.org/r/331923 (owner: Papaul)
[23:47:08] <wikibugs>	 (CR) Dzahn: [C: 2] DHCP: Add DHCP entries for mw2251-mw2260 Bug:T155180 [puppet] - https://gerrit.wikimedia.org/r/331923 (owner: Papaul)
[23:50:58] <wikibugs>	 (CR) Dzahn: [C: 2] DNS: Add mgmt and prodcution DNS entres for mw2251-mw2260 Fix: Putting server in alphabetical order Bug:T155180 [dns] - https://gerrit.wikimedia.org/r/331903 (owner: Papaul)
[23:52:35] <wikibugs>	 (CR) Dzahn: "Ok, thanks. i'm hitting "abandon" then." [dns] - https://gerrit.wikimedia.org/r/325856 (owner: Papaul)
[23:52:59] <marostegui>	 btw: https://phabricator.wikimedia.org/T154929 godog madhuvishy (just in case)
[23:53:13] <marostegui>	 it shouldn't be needed after the 100G I gave it
[23:53:18] <marostegui>	 but just in case
[23:53:32] <wikibugs>	 (PS1) Chad: MWMultiversion: Move CLI entry point to class and out of MWVersion [mediawiki-config] - https://gerrit.wikimedia.org/r/331930
[23:55:52] <wikibugs>	 (PS1) Chad: MWMultiVersion: Use proper (new) cli entry point [puppet] - https://gerrit.wikimedia.org/r/331931
[23:56:45] <icinga-wm>	 ACKNOWLEDGEMENT - Keyholder SSH agent on mira is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. daniel_zahn currently not the deployment server
[23:57:46] <ostriches>	 mutante: Shouldn't it be armed though?
[23:58:03] <ostriches>	 mira's not the current default, but it's a legit master, no reason for it not to run hot & ready
[23:59:28] <mutante>	 ostriches: we are talking about that right now
[23:59:42] * ostriches nods
[23:59:42] <mutante>	 yes