[00:04:12] <James_F>	 Reedy: I'm ready to test whenever jenkins feels like merging it. :-)
[00:04:22] <Reedy>	 Think he's nearly done
[00:05:23] <Reedy>	 hhvm is slooooow
[00:05:31] <Reedy>	 It's kinda hilarious we're saying that
[00:05:37] <James_F>	 Yeah, why don't we get rid of that?
[00:05:38] <James_F>	 ;-)
[00:06:58] <wikibugs_>	 (PS2) 20after4: WIP: Add phabricator config for the new swift backend [puppet] - https://gerrit.wikimedia.org/r/432533
[00:07:15] <wikibugs_>	 (PS2) 20after4: Add account for phabricator_files to swift::params::accounts [puppet] - https://gerrit.wikimedia.org/r/432528
[00:07:31] <bd808>	 Reedy will need to find a new exotic locale to run a deploy from if he's going to return in style. Cars, planes, and boats have been done already. What does it take to get on a rocket?
[00:07:44] * bd808 phones Musk
[00:07:51] <Reedy>	 ISS would be fun
[00:08:12] <bd808>	 Cindy knows folks at NASA :)
[00:08:37] <James_F>	 Submarine would be harder than space.
[00:08:46] <wikibugs_>	 (PS3) 20after4: Add account for phabricator_files to swift::params::accounts [puppet] - https://gerrit.wikimedia.org/r/432528
[00:08:51] <Reedy>	 Submarine docked in portsmouth seems easy
[00:09:47] <wikibugs_>	 (PS2) EBernhardson: Tune CirrusSearch slow logging [mediawiki-config] - https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180)
[00:09:57] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Tune CirrusSearch slow logging [mediawiki-config] - https://gerrit.wikimedia.org/r/436848 (https://phabricator.wikimedia.org/T196180) (owner: EBernhardson)
[00:10:06] <Reedy>	 James_F: should be on mwdebug1001
[00:10:16] <James_F>	 Ta.
[00:12:11] <James_F>	 Reedy: LGTM.
[00:15:10] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.32.0-wmf.7/extensions/WikimediaMessages/: respect watchlist preference feature flag (duration: 00m 58s)
[00:15:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:20:09] <wikibugs_>	 (PS1) EBernhardson: logstash: Use gelf long_message when provided [puppet] - https://gerrit.wikimedia.org/r/437657 (https://phabricator.wikimedia.org/T196180)
[00:23:05] <wikibugs_>	 (CR) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it (3 comments) [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:23:52] <wikibugs_>	 (PS4) 20after4: Configuration for phabricator to use swift storage. [puppet] - https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085)
[00:24:12] <wikibugs_>	 (Abandoned) 20after4: WIP: Add phabricator config for the new swift backend [puppet] - https://gerrit.wikimedia.org/r/432533 (owner: 20after4)
[00:26:52] <wikibugs_>	 (CR) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it (2 comments) [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:26:59] <wikibugs_>	 (PS3) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:28:02] <wikibugs_>	 (CR) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it (2 comments) [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:28:24] <wikibugs_>	 Operations, MediaWiki-Platform-Team, Performance-Team, MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), and 2 others: php-memcached 3.0 (PHP 7) incompatible with BagOStuff - https://phabricator.wikimedia.org/T196125#4247677 (Reedy)
[00:38:03] <wikibugs_>	 (PS4) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:38:52] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:40:59] <wikibugs_>	 (PS5) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:41:27] * Krinkle staging on deploy1001 and testing something on mwdebug1002
[00:41:47] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:44:49] <wikibugs_>	 (PS6) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:45:27] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:51:05] <wikibugs_>	 (PS7) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:51:47] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:52:05] <Krenair>	 what nonsense
[00:53:01] <wikibugs_>	 (PS8) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:53:42] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:54:35] <wikibugs_>	 (PS9) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:55:16] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:55:44] <Krenair>	 00:55:14 modules/network/manifests/constants.pp:220 wmf-style: Found hiera call in class 'network::constants' for 'network::constants::extra_labs_cumin_masters'
[00:55:45] <Krenair>	 wat
[00:56:13] <Krenair>	 I'm going back to the idea of just giving those hosts access to everything
[00:57:36] <wikibugs_>	 (PS10) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[00:58:13] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[00:58:57] <wikibugs_>	 (PS11) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[01:04:40] <Krenair>	 actually I know a different hack around this
[01:05:01] <Krenair>	 no I'll just try the (bad) suggested way
[01:06:28] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.7/vendor/: I5a5d7de4702c23f0 / T196496 (duration: 01m 35s)
[01:06:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:34] <stashbot>	 T196496: Inline script for 'wgBackendResponseTime' missing in prod - https://phabricator.wikimedia.org/T196496
[01:07:43] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.7/composer.json: I13dbdba2b9d / T196496 (duration: 00m 57s)
[01:07:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:44] <wikibugs_>	 (PS12) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[01:12:24] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[01:16:28] <wikibugs_>	 (PS13) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[01:17:01] <wikibugs_>	 (CR) jerkins-bot: [V: -1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[01:17:59] <wikibugs_>	 (PS14) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052
[02:27:12] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.6) (duration: 08m 23s)
[02:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:46] <wikibugs_>	 (Abandoned) Krinkle: Move scap::sources from role::deployment_server to common [puppet] - https://gerrit.wikimedia.org/r/436581 (https://phabricator.wikimedia.org/T161675) (owner: Krinkle)
[02:59:25] <wikibugs_>	 Operations, MediaWiki-Platform-Team, Performance-Team, MW-1.31-release, and 3 others: php-memcached 3.0 (PHP 7) incompatible with BagOStuff - https://phabricator.wikimedia.org/T196125#4259791 (Reedy)
[02:59:59] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.7) (duration: 15m 31s)
[03:00:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:10:14] <logmsgbot>	 !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Wed Jun  6 03:10:14 UTC 2018 (duration 10m 15s)
[03:10:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:32:47] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 901.17 seconds
[04:03:37] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 292.88 seconds
[04:35:23] <wikibugs_>	 (PS1) KartikMistry: dotfiles: Added `screen -R` in .bash_profile [puppet] - https://gerrit.wikimedia.org/r/437669
[04:41:24] <logmsgbot>	 !log kartik@deploy1001 Started deploy [cxserver/deploy@8ce20ba]: Update cxserver to 391d7b6 (Fixing T196462)
[04:41:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:41:30] <stashbot>	 T196462: cxserver: Error: ENOENT: no such file or directory, open 'config/MWPageLoader.yaml - https://phabricator.wikimedia.org/T196462
[04:44:29] <logmsgbot>	 !log kartik@deploy1001 Finished deploy [cxserver/deploy@8ce20ba]: Update cxserver to 391d7b6 (Fixing T196462) (duration: 03m 06s)
[04:44:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:13] <wikibugs_>	 Operations, ops-eqiad, DBA: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196490#4259845 (Marostegui) Open>Resolved a:Cmjohnson All good! ``` Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name                : RAID Level          : Primary-1, Secondary-0, RAID...
[05:16:08] <wikibugs_>	 Operations, ops-eqiad, DBA, Patch-For-Review: db1065 storage crash - https://phabricator.wikimedia.org/T195444#4259848 (Marostegui) Open>Resolved After replacing disk #1, this is all good now. ``` root@db1065:~# megacli -LDPDInfo -aAll | grep -i flagged Drive has flagged a S.M.A.R.T alert...
[05:17:47] <marostegui>	 !log Deploy schema change on db1070 s5 primary master  - T191316 T192926 T195193
[05:17:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:17:53] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[05:17:53] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[05:17:53] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[05:20:56] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.115 second response time
[05:21:01] <_joe_>	 AaronSchulz: so, I've noticed that by using 2 distinct PoolRoutes we expose ourselves to an undesirable failure scenario; I'll write a ticket once I'm sure of what might need to be changed
[05:21:57] <wikibugs_>	 (PS1) Marostegui: dbproxy1010: Depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/437670 (https://phabricator.wikimedia.org/T190704)
[05:22:44] <wikibugs_>	 (CR) Marostegui: [C: 2] dbproxy1010: Depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/437670 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[05:24:07] <marostegui>	 !log Reload haproxy on dbproxy1010 to depool labsdb1010 - T190704
[05:24:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:12] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[05:25:01] <marostegui>	 !log Restart MySQL on labsdb1010
[05:25:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:29:07] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 22 probes of 320 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[05:29:24] <wikibugs_>	 (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/437671
[05:30:36] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 64 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[05:30:57] <wikibugs_>	 (CR) Marostegui: [C: 2] Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/437671 (owner: Marostegui)
[05:31:56] <marostegui>	 !log Reload haproxy on dbproxy1010 to repool labsdb1010 - T190704
[05:32:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:32:00] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[05:38:07] <icinga-wm>	 PROBLEM - Check systemd state on kubestage1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:39:26] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 320 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[05:40:47] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 12 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[05:41:27] <icinga-wm>	 RECOVERY - Check systemd state on kubestage1001 is OK: OK - running: The system is fully operational
[05:44:25] <wikibugs_>	 (PS1) Marostegui: dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/437674 (https://phabricator.wikimedia.org/T190704)
[05:45:00] <wikibugs_>	 (CR) Marostegui: [C: 2] dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/437674 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[05:46:29] <marostegui>	 !log Reload haproxy on dbproxy1010 to depool labsdb1011 - T190704
[05:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:34] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[05:53:16] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 43 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[05:53:44] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [restbase/deploy@baa70b7]: Public release of feed availability endpoint T196402
[05:53:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:48] <stashbot>	 T196402: Public rollout of feed content availability endpoint - https://phabricator.wikimedia.org/T196402
[05:55:56] <wikibugs_>	 (PS1) Marostegui: db-eqiad.php: Depool all sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/437676 (https://phabricator.wikimedia.org/T190704)
[05:58:18] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 10 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[05:59:17] <wikibugs_>	 (PS2) Marostegui: db-eqiad.php: Depool all sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/437676 (https://phabricator.wikimedia.org/T190704)
[06:01:37] <icinga-wm>	 PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[06:01:55] <wikibugs_>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool all sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/437676 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[06:02:46] <icinga-wm>	 RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy
[06:03:08] <wikibugs_>	 (Merged) jenkins-bot: db-eqiad.php: Depool all sanitarium masters [mediawiki-config] - https://gerrit.wikimedia.org/r/437676 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[06:04:26] <wikibugs_>	 Operations, DBA, Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107#4259884 (Marostegui)
[06:04:46] <wikibugs_>	 Operations, DBA, Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107#4031557 (Marostegui)
[06:04:46] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool all sanitariums masters - T190704 (duration: 01m 09s)
[06:04:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:51] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[06:05:28] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [restbase/deploy@baa70b7]: Public release of feed availability endpoint T196402 (duration: 11m 45s)
[06:05:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:32] <stashbot>	 T196402: Public rollout of feed content availability endpoint - https://phabricator.wikimedia.org/T196402
[06:06:46] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [restbase/deploy@baa70b7]: Public release of feed availability endpoint T196402, take 2
[06:06:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:56] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w
[06:09:56] <icinga-wm>	 e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense]
[06:13:59] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [restbase/deploy@baa70b7]: Public release of feed availability endpoint T196402, take 2 (duration: 07m 13s)
[06:14:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:04] <stashbot>	 T196402: Public rollout of feed content availability endpoint - https://phabricator.wikimedia.org/T196402
[06:14:19] <marostegui>	 !log Stop slave on db2095:3316 to rebuild archive_insert and archive_update triggers - T192926
[06:14:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:24] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[06:18:56] <wikibugs_>	 Operations, Traffic, Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820#4259902 (tstarling) Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the...
[06:32:05] <marostegui>	 !log Deploy schema change on s6 codfw master (db2039), this will generate lag on s6 codfw  - T191316 T192926 T195193 T89737
[06:32:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:12] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[06:32:12] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[06:32:12] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[06:32:12] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[06:37:55] <wikibugs_>	 (Abandoned) Giuseppe Lavagetto: Depool esams [dns] - https://gerrit.wikimedia.org/r/434952 (owner: Giuseppe Lavagetto)
[06:47:08] <wikibugs_>	 Operations, Traffic, Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820#4259926 (aaron) >>! In T91820#4259902, @tstarling wrote: > Special:Userlogin starts a session on a GET request so that i...
[06:56:08] <wikibugs_>	 (CR) Muehlenhoff: [C: 1] "One more down!" [puppet] - https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) (owner: Elukey)
[06:57:35] <wikibugs_>	 (PS4) Elukey: Move the varnishkafka submodule to operations/puppet [puppet] - https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377)
[06:58:26] <elukey>	 jynus: o/ - qq before merging --^ - did you get any issues with puppet when you moved the mariadb submodule to operations/puppet? (just to know what to expect)
[07:00:26] <jynus>	 yes, it broke all puppetmasters
[07:00:40] <elukey>	 ah lovely
[07:00:58] <jynus>	 I would wait to involve cloud
[07:01:11] <jynus>	 as it just needs an rm to be fixed
[07:01:49] <wikibugs_>	 (CR) Muehlenhoff: profile::mediawiki::jobrunner: manage both videoscaler, jobrunner (2 comments) [puppet] - https://gerrit.wikimedia.org/r/437490 (owner: Giuseppe Lavagetto)
[07:01:53] <jynus>	 basically it creates a conflict because existing files (old submodule) conflict with new tracked files
[07:02:34] <jynus>	 so pull fails
[07:05:09] <elukey>	 ah you mean the puppet masters syncing from the prod ones, like labs etc..
[07:05:24] <elukey>	 but the regular puppet-merge in prod shouldn't cause issues right?
[07:11:15] <wikibugs_>	 Operations, ops-eqiad, DBA: Degraded RAID on labsdb1009 - https://phabricator.wikimedia.org/T195690#4259933 (jcrespo) @RobH Can you check if we have next-business day support for defects for this hw provider and purchase? Because they seem to not be honoring that/adding some on-purpose delay.
[07:17:17] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0: Ayounsi Zayo outage
[07:20:55] <wikibugs_>	 (CR) Elukey: "Adding Cloud folks since it seems from the past experience that changes like these tend to break puppet masters, waiting for their green l" [puppet] - https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) (owner: Elukey)
[07:25:21] <wikibugs_>	 (CR) Elukey: [C: 1] "LVS seems related to adding the git-ssh.eqiad.wikimedia.org's IP to the phab1002's interface, but until it is not added to conftool/pybal " [puppet] - https://gerrit.wikimedia.org/r/437300 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[07:30:54] <marostegui>	 !log Stop MySQL on labsdb1011 to install intel-microcode and reboot
[07:30:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:01] <marostegui>	 moritzm: ^
[07:31:47] <moritzm>	 ack, thanks
[07:33:44] <wikibugs_>	 (CR) Muehlenhoff: [C: 1] profile::mediawiki::videoscaler: remove global Timeout setting [puppet] - https://gerrit.wikimedia.org/r/437491 (owner: Giuseppe Lavagetto)
[07:38:26] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.077 second response time
[07:47:37] <wikibugs_>	 (PS2) Gehel: logstash: Use gelf long_message when provided [puppet] - https://gerrit.wikimedia.org/r/437657 (https://phabricator.wikimedia.org/T196180) (owner: EBernhardson)
[07:48:10] <wikibugs_>	 (CR) Gehel: [C: 2] logstash: Use gelf long_message when provided [puppet] - https://gerrit.wikimedia.org/r/437657 (https://phabricator.wikimedia.org/T196180) (owner: EBernhardson)
[07:48:27] <wikibugs_>	 (CR) Gehel: [C: 2] "Nice! This one was tricky :)" [puppet] - https://gerrit.wikimedia.org/r/437657 (https://phabricator.wikimedia.org/T196180) (owner: EBernhardson)
[07:48:54] <marostegui>	 !log Stop replication on all sanitarium masters to move labsdb1011 - T190704
[07:48:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:58] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[07:53:18] <wikibugs_>	 (CR) Muehlenhoff: [C: 1] jobrunner: add profile::mediawiki::videoscaler [puppet] - https://gerrit.wikimedia.org/r/437492 (owner: Giuseppe Lavagetto)
[07:53:57] <wikibugs_>	 (CR) Muehlenhoff: [C: 1] videoscaler/jobrunner: add the respective VIPs [puppet] - https://gerrit.wikimedia.org/r/437493 (owner: Giuseppe Lavagetto)
[07:55:56] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1973 bytes in 0.083 second response time
[07:56:15] <wikibugs_>	 (CR) Muehlenhoff: conftool-data: merge the jobrunner, videoscaler clusters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/437494 (owner: Giuseppe Lavagetto)
[07:57:11] <wikibugs_>	 (PS3) Elukey: phabricator: add role to node phab1002 [puppet] - https://gerrit.wikimedia.org/r/437300 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[07:58:09] <wikibugs_>	 (PS3) Addshore: Wikidata: Always have 4 change dispatchers running [mediawiki-config] - https://gerrit.wikimedia.org/r/435648 (https://phabricator.wikimedia.org/T194602) (owner: Hoo man)
[08:00:17] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db1116 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 679.41 seconds
[08:00:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1116 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 680.82 seconds
[08:00:28] <marostegui>	 ^ that is me
[08:00:36] <marostegui>	 I think I forgot to silence that host
[08:00:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s8 on db1116 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 700.25 seconds
[08:00:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db1116 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 701.27 seconds
[08:02:56] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on db1116 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[08:04:32] <wikibugs_>	 (PS5) Gehel: elasticsearch: alert when cirrus writes are frozen for too long [puppet] - https://gerrit.wikimedia.org/r/431754 (https://phabricator.wikimedia.org/T193605)
[08:04:44] <wikibugs_>	 (PS6) Gehel: elasticsearch: alert when cirrus writes are frozen for too long [puppet] - https://gerrit.wikimedia.org/r/431754 (https://phabricator.wikimedia.org/T193605)
[08:05:38] <wikibugs_>	 (CR) Gehel: [C: 2] elasticsearch: alert when cirrus writes are frozen for too long [puppet] - https://gerrit.wikimedia.org/r/431754 (https://phabricator.wikimedia.org/T193605) (owner: Gehel)
[08:05:47] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1116 is OK: OK slave_sql_lag Replication lag: 0.29 seconds
[08:06:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s8 on db1116 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[08:06:33] <wikibugs_>	 Operations, Mail, Patch-For-Review: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#4259998 (ayounsi) I'm currently in Europe, so if you're on the east coast, ping me anytime (east coast) this week and I can do it.
[08:06:57] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on db1116 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[08:07:54] <wikibugs_>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool all sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/437678
[08:09:36] <wikibugs_>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool all sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/437678 (owner: Marostegui)
[08:12:22] <wikibugs_>	 (PS1) Gehel: elasticsearch: send frozen writes check over HTTPS [puppet] - https://gerrit.wikimedia.org/r/437679 (https://phabricator.wikimedia.org/T193605)
[08:12:33] <wikibugs_>	 (PS2) Gehel: elasticsearch: send frozen writes check over HTTPS [puppet] - https://gerrit.wikimedia.org/r/437679 (https://phabricator.wikimedia.org/T193605)
[08:13:01] <addshore>	 jouncebot: now
[08:13:01] <jouncebot>	 For the next 0 hour(s) and 46 minute(s): Wikibase Dispatching (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T0805)
[08:13:19] <addshore>	 marostegui: I've got 1 patch for mediawiki-config :) let me know when I'm okay!
[08:13:29] <wikibugs_>	 (CR) Gehel: [C: 2] elasticsearch: send frozen writes check over HTTPS [puppet] - https://gerrit.wikimedia.org/r/437679 (https://phabricator.wikimedia.org/T193605) (owner: Gehel)
[08:14:12] <marostegui>	 addshore: yeah, I am deploying now, should be done in 1 min or so :)
[08:14:57] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool all sanitariums masters - T190704 (duration: 00m 57s)
[08:15:01] <marostegui>	 addshore: all yours
[08:15:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:02] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[08:18:20] <wikibugs_>	 (CR) Alexandros Kosiaris: "I wouldn't recommend doing that, for the reasons very nicely pointed out in  https://superuser.com/questions/224631/is-it-a-good-idea-to-p" [puppet] - https://gerrit.wikimedia.org/r/437669 (owner: KartikMistry)
[08:19:09] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Revert "db-eqiad.php: Depool all sanitarium masters" [mediawiki-config] - https://gerrit.wikimedia.org/r/437678 (owner: Marostegui)
[08:20:07] <addshore>	 marostegui: thanks!
[08:20:14] <wikibugs_>	 (CR) Addshore: [C: 2] Wikidata: Always have 4 change dispatchers running [mediawiki-config] - https://gerrit.wikimedia.org/r/435648 (https://phabricator.wikimedia.org/T194602) (owner: Hoo man)
[08:20:18] <marostegui>	 I will need to deploy later again
[08:20:29] <addshore>	 marostegui: yup, that's fine :)
[08:20:51] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437678 (owner: 10Marostegui)
[08:21:39] <wikibugs_>	 (03Merged) 10jenkins-bot: Wikidata: Always have 4 change dispatchers running [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435648 (https://phabricator.wikimedia.org/T194602) (owner: 10Hoo man)
[08:22:17] <wikibugs_>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool all sanitarium masters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437678 (owner: 10Marostegui)
[08:22:50] <marostegui>	 addshore: let me know when I can do so
[08:23:09] <addshore>	 syncing now
[08:24:03] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: wikidatawiki dispatching: [[gerrit:435648|dispatchMaxTime 720 (4 dispatchers at once)]] (duration: 00m 56s)
[08:24:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:12] <addshore>	 woo :D deploy1001 
[08:24:14] <addshore>	 marostegui: all yours
[08:24:19] <marostegui>	 \o/
[08:24:19] <marostegui>	 thanks
[08:25:17] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool all sanitariums masters - T190704 (duration: 00m 56s)
[08:25:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:21] <stashbot>	 T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704
[08:26:31] <wikibugs_>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4260035 (10Marostegui) labsdb1011 has been moved over the new sanitarium. This was the last host to be moved. Let's wait to mak...
[08:26:32] <wikibugs_>	 (03PS1) 10Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/437682
[08:26:37] <wikibugs_>	 (03PS2) 10Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/437682
[08:26:49] <wikibugs_>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4260036 (10Marostegui)
[08:27:22] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] Revert "dbproxy1010: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/437682 (owner: 10Marostegui)
[08:29:05] <marostegui>	 !log Reload haproxy on dbproxy1010 to repool labsdb1011 - T190704
[08:29:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:39] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Prepare to tighten Puppet DB access control - check client certificates (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[08:30:32] <wikibugs_>	 (03PS1) 10Gehel: elasticsearch: check frozen writes improvements [puppet] - 10https://gerrit.wikimedia.org/r/437683 (https://phabricator.wikimedia.org/T193605)
[08:31:32] <wikibugs_>	 (03CR) 10Gehel: [C: 032] elasticsearch: check frozen writes improvements [puppet] - 10https://gerrit.wikimedia.org/r/437683 (https://phabricator.wikimedia.org/T193605) (owner: 10Gehel)
[08:32:49] <wikibugs_>	 (03PS1) 10Hashar: Try a build against jessie-backports [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/437684 (https://phabricator.wikimedia.org/T196037)
[08:33:17] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Try a build against jessie-backports [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/437684 (https://phabricator.wikimedia.org/T196037) (owner: 10Hashar)
[08:36:48] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: conftool-data: merge the jobrunner, videoscaler clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437494 (owner: 10Giuseppe Lavagetto)
[08:37:30] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: manage both videoscaler, jobrunner (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/437490 (owner: 10Giuseppe Lavagetto)
[08:38:32] <wikibugs_>	 (03Abandoned) 10Hashar: Try a build against jessie-backports [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/437684 (https://phabricator.wikimedia.org/T196037) (owner: 10Hashar)
[08:39:15] <wikibugs_>	 (03CR) 10jenkins-bot: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: 10Mainframe98)
[08:39:54] <wikibugs_>	 (03CR) 10jenkins-bot: Fixing very trivial spelling error in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437441 (owner: 10Sau226)
[08:39:59] <wikibugs_>	 (03CR) 10Hashar: "recheck" [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry)
[08:40:20] <wikibugs_>	 (03CR) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436331 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester)
[08:40:53] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: apertium-apy: New upstream release [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry)
[08:42:11] <wikibugs_>	 (03CR) 10Muehlenhoff: conftool-data: merge the jobrunner, videoscaler clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437494 (owner: 10Giuseppe Lavagetto)
[08:44:47] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: manage both videoscaler, jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/437490
[08:46:58] <wikibugs_>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241885 (10Marostegui)
[08:47:38] <wikibugs_>	 10Operations, 10ops-eqiad, 10Cloud-VPS: labnet1003 and labnet1004 moving and enabling 10G NICs - https://phabricator.wikimedia.org/T193196#4260109 (10ayounsi) asw2-b-eqiad xe-7/0/9 and  xe-4/0/3 moved to group "vlan-cloud-hosts1-b-eqiad" asw2-b-eqiad xe-7/0/19 and xe-4/0/46 moved to group "vlan-cloud-instanc...
[08:48:16] <wikibugs_>	 10Operations, 10Wikidata, 10Wikidata-Campsite, 10Wikimedia-General-or-Unknown, and 6 others: Multiple projects reporting Cannot access the database: No working replica DB server - https://phabricator.wikimedia.org/T195520#4260113 (10Addshore) 05Open>03Resolved So to wrap this ticket up the incident rep...
[08:58:21] <wikibugs_>	 (03PS1) 10Marostegui: mariadb: Convert db1116 to spare [puppet] - 10https://gerrit.wikimedia.org/r/437687 (https://phabricator.wikimedia.org/T196376)
[08:58:47] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 031] profile::mediawiki::jobrunner: manage both videoscaler, jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/437490 (owner: 10Giuseppe Lavagetto)
[08:59:20] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/11388/ seems to DTRT; I will apply this with care." [puppet] - 10https://gerrit.wikimedia.org/r/437490 (owner: 10Giuseppe Lavagetto)
[08:59:50] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::jobrunner: manage both videoscaler, jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/437490 (owner: 10Giuseppe Lavagetto)
[09:00:06] <wikibugs_>	 (03CR) 10ArielGlenn: "This looks legit to me. But maybe we should move away from using the trebuchet user anywhere at all." [puppet] - 10https://gerrit.wikimedia.org/r/361796 (owner: 10Thcipriani)
[09:03:47] <wikibugs_>	 (03CR) 10Elukey: [C: 031] "Just to be on the safe side, I am adding Mukunda to the code review. Is there anything relevant to know when Phabricator gets executed the" [puppet] - 10https://gerrit.wikimedia.org/r/437300 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn)
[09:05:14] <wikibugs_>	 (03CR) 10ArielGlenn: "Looks ok. I am pretty sure we don't need the dumps.yaml change in the end, but that can be sorted out later. Please don't merge this until" [puppet] - 10https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019) (owner: 10Dzahn)
[09:10:43] <wikibugs_>	 (03CR) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436333 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester)
[09:11:00] <wikibugs_>	 (03CR) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436334 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester)
[09:11:46] <wikibugs_>	 (03PS10) 10MarcoAurelio: idwikimedia: initial configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429385 (https://phabricator.wikimedia.org/T192726)
[09:12:40] <wikibugs_>	 (03CR) 10jenkins-bot: Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 (owner: 10Umherirrender)
[09:15:34] <wikibugs_>	 (03CR) 10jenkins-bot: Add reference for itwiki $wgAbuseFilterActions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420237 (owner: 10Nemo bis)
[09:15:38] <icinga-wm>	 PROBLEM - Disk space on elastic1029 is CRITICAL: DISK CRITICAL - free space: /srv 59704 MB (12% inode=99%)
[09:15:59] <wikibugs_>	 (03CR) 10jenkins-bot: Only retain private securepoll data for 60 days after election [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372180 (https://phabricator.wikimedia.org/T173393) (owner: 10Brian Wolff)
[09:16:20] <wikibugs_>	 (03CR) 10jenkins-bot: Remove $wgNamespacesWithSubpages overrides for NS_TEMPLATE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432587 (https://phabricator.wikimedia.org/T191612) (owner: 10Gergő Tisza)
[09:16:33] <wikibugs_>	 (03CR) 10jenkins-bot: Wikidata: Always have 4 change dispatchers running [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435648 (https://phabricator.wikimedia.org/T194602) (owner: 10Hoo man)
[09:20:37] <wikibugs_>	 (03CR) 10jenkins-bot: Testing page creation log on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437379 (https://phabricator.wikimedia.org/T196400) (owner: 10Kaldari)
[09:20:52] <wikibugs_>	 (03CR) 10jenkins-bot: Disable DisableAccount on wikis where there are no disabled users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067) (owner: 10Reedy)
[09:21:21] <wikibugs_>	 (03CR) 10jenkins-bot: Remove lines that are now part of AbuseFilter defaults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424974 (https://phabricator.wikimedia.org/T178349) (owner: 10Huji)
[09:21:35] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool all sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437676 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui)
[09:21:49] <wikibugs_>	 (03CR) 10jenkins-bot: Enable DynamicPageList extension on bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414109 (https://phabricator.wikimedia.org/T188109) (owner: 10Framawiki)
[09:22:20] <wikibugs_>	 (03CR) 10jenkins-bot: Add wmgBabelCategoryNames to officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432403 (owner: 10Amire80)
[09:22:35] <wikibugs_>	 (03CR) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436332 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester)
[09:22:45] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool all sanitarium masters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437678 (owner: 10Marostegui)
[09:26:24] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Tighten Puppet DB access control - check client certificates (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[09:27:39] <wikibugs_>	 (03PS8) 10Ema: prometheus: export intel-microcode information via node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825)
[09:29:48] <icinga-wm>	 RECOVERY - Disk space on elastic1029 is OK: DISK OK
[09:31:29] <wikibugs_>	 (03CR) 10Ema: "Script updated to handle a few issues reported by Moritz, current output:" [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema)
[09:32:27] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1958 bytes in 0.078 second response time
[09:42:50] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner_tls: add server alias for discovery [puppet] - 10https://gerrit.wikimedia.org/r/437697
[09:42:58] <_joe_>	 brown paper bag fix :(
[09:43:17] <_joe_>	 also, http(s) is hard as you add layers of indirection
[09:43:36] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::jobrunner_tls: add server alias for discovery [puppet] - 10https://gerrit.wikimedia.org/r/437697 (owner: 10Giuseppe Lavagetto)
[09:44:55] <wikibugs_>	 (03CR) 10ArielGlenn: "Logic is sensible. /etc/ssl/certs/Puppet_Internal_CA.pem is copied from /var/lib/puppet/ssl/certs/ca.pem exactly so that it will be availa" [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[09:45:56] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: jobrunner_tls: server_aliases, not server_alias [puppet] - 10https://gerrit.wikimedia.org/r/437698
[09:45:58] <wikibugs_>	 (03PS2) 10Marostegui: mariadb: Convert db1116 to spare [puppet] - 10https://gerrit.wikimedia.org/r/437687 (https://phabricator.wikimedia.org/T196376)
[09:46:37] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Tighten Puppet DB access control - check client certificates (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[09:46:52] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] mariadb: Convert db1116 to spare [puppet] - 10https://gerrit.wikimedia.org/r/437687 (https://phabricator.wikimedia.org/T196376) (owner: 10Marostegui)
[09:48:29] <_joe_>	 sigh, ff-only
[09:48:36] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: jobrunner_tls: server_aliases, not server_alias [puppet] - 10https://gerrit.wikimedia.org/r/437698
[09:48:44] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] jobrunner_tls: server_aliases, not server_alias [puppet] - 10https://gerrit.wikimedia.org/r/437698 (owner: 10Giuseppe Lavagetto)
[09:49:17] <icinga-wm>	 PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:54:27] <icinga-wm>	 RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:54:44] <wikibugs_>	 (03CR) 10ArielGlenn: "Yes, you don't want to mix and match classes from different modules within a module. We do that at the profile level." [puppet] - 10https://gerrit.wikimedia.org/r/372764 (owner: 10Alex Monk)
[09:55:11] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320)
[09:57:51] <wikibugs_>	 (03CR) 10Marostegui: [C: 031] mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[09:58:47] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Failover m3-master to db1072 [puppet] - 10https://gerrit.wikimedia.org/r/437707 (https://phabricator.wikimedia.org/T186320)
[09:59:19] <wikibugs_>	 (03CR) 10Jcrespo: [C: 04-1] "wrong port" [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[10:00:01] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320)
[10:00:53] <wikibugs_>	 (03CR) 10Marostegui: "commit says m2 slave, isn't it m3?" [puppet] - 10https://gerrit.wikimedia.org/r/437707 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[10:04:24] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Failover m3-master to db1072 [puppet] - 10https://gerrit.wikimedia.org/r/437707 (https://phabricator.wikimedia.org/T186320)
[10:07:43] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Update misc replica CNAME for m2 and m3 [dns] - 10https://gerrit.wikimedia.org/r/437710 (https://phabricator.wikimedia.org/T186320)
[10:08:26] <wikibugs_>	 (03PS1) 10Marostegui: sX.hosts: Remove db1116 [software] - 10https://gerrit.wikimedia.org/r/437711 (https://phabricator.wikimedia.org/T196376)
[10:09:21] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] sX.hosts: Remove db1116 [software] - 10https://gerrit.wikimedia.org/r/437711 (https://phabricator.wikimedia.org/T196376) (owner: 10Marostegui)
[10:10:08] <wikibugs_>	 (03Merged) 10jenkins-bot: sX.hosts: Remove db1116 [software] - 10https://gerrit.wikimedia.org/r/437711 (https://phabricator.wikimedia.org/T196376) (owner: 10Marostegui)
[10:13:41] <awight>	 I'm doing a canary deployment to ores2002, shouldn't impact anyone else...
[10:15:14] <logmsgbot>	 !log awight@deploy1001 Started deploy [ores/deploy@bf182e2]: ORES canary deployment to ores2002.codfw.wmnet; T176336
[10:15:20] <logmsgbot>	 !log awight@deploy1001 Finished deploy [ores/deploy@bf182e2]: ORES canary deployment to ores2002.codfw.wmnet; T176336 (duration: 00m 06s)
[10:15:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:21] <stashbot>	 T176336: Deploy drafttopic model to production ORES - https://phabricator.wikimedia.org/T176336
[10:15:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:41] <logmsgbot>	 !log awight@deploy1001 Started deploy [ores/deploy@65e979f]: ORES canary deployment to ores2002.codfw.wmnet; T176336
[10:15:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:36] <marostegui>	 !log Deploy schema change on dbstore1002:s6  - T191316 T192926 T195193 T89737
[10:16:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:45] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[10:16:45] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[10:16:45] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[10:16:45] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[10:19:25] <logmsgbot>	 !log awight@deploy1001 Finished deploy [ores/deploy@65e979f]: ORES canary deployment to ores2002.codfw.wmnet; T176336 (duration: 03m 44s)
[10:19:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:24] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 031] "On a Nehalem CPU with the intel-microcode package in stretch (before Spectre happened):" [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema)
[10:20:41] <wikibugs_>	 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489#4260466 (10Peachey88)
[10:24:20] <wikibugs_>	 (03PS3) 10Jcrespo: mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320)
[10:24:22] <wikibugs_>	 (03PS3) 10Jcrespo: mariadb: Failover m3-master to db1072 [puppet] - 10https://gerrit.wikimedia.org/r/437707 (https://phabricator.wikimedia.org/T186320)
[10:24:24] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Switchover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320)
[10:24:26] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Switchover m3-master to db1072 [puppet] - 10https://gerrit.wikimedia.org/r/437715 (https://phabricator.wikimedia.org/T186320)
[10:24:45] <wikibugs_>	 (03PS5) 10Ema: VCL: Normalise the Accept-Language header for the REST API [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) (owner: 10Mobrovac)
[10:28:11] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Switchover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320)
[10:29:26] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Switchover m3-master to db1072 [puppet] - 10https://gerrit.wikimedia.org/r/437715 (https://phabricator.wikimedia.org/T186320)
[10:40:42] <wikibugs_>	 (03PS1) 10Awight: Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627)
[10:41:16] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[10:42:46] <wikibugs_>	 (03PS7) 10Giuseppe Lavagetto: Switch video scalers to a profile [puppet] - 10https://gerrit.wikimedia.org/r/430892 (owner: 10Muehlenhoff)
[10:44:03] <wikibugs_>	 (03PS1) 10Marostegui: mariadb: Set db1095 as spare, remove unused code [puppet] - 10https://gerrit.wikimedia.org/r/437720 (https://phabricator.wikimedia.org/T196376)
[10:44:56] <wikibugs_>	 (03CR) 10Marostegui: [C: 04-2] "Do not merge until db1095 is out of use" [puppet] - 10https://gerrit.wikimedia.org/r/437720 (https://phabricator.wikimedia.org/T196376) (owner: 10Marostegui)
[10:49:11] <wikibugs_>	 (03CR) 10Marostegui: [C: 04-2] "https://puppet-compiler.wmflabs.org/compiler02/11391/" [puppet] - 10https://gerrit.wikimedia.org/r/437720 (https://phabricator.wikimedia.org/T196376) (owner: 10Marostegui)
[10:53:45] <wikibugs_>	 (03PS2) 10Awight: Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627)
[10:54:32] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[10:57:58] <wikibugs_>	 (03PS3) 10Awight: Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627)
[10:58:24] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[10:59:11] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Switch video scalers to a profile [puppet] - 10https://gerrit.wikimedia.org/r/430892 (owner: 10Muehlenhoff)
[10:59:46] <wikibugs_>	 (03PS4) 10Awight: Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627)
[10:59:51] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 26 probes of 300 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[11:00:25] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Initialize LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[11:04:10] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::videoscaler: remove global Timeout setting [puppet] - 10https://gerrit.wikimedia.org/r/437491
[11:04:51] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 7 probes of 300 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[11:05:07] <wikibugs_>	 (03CR) 10Mobrovac: [C: 031] VCL: Normalise the Accept-Language header for the REST API [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) (owner: 10Mobrovac)
[11:05:16] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::videoscaler: remove global Timeout setting [puppet] - 10https://gerrit.wikimedia.org/r/437491 (owner: 10Giuseppe Lavagetto)
[11:06:44] <wikibugs_>	 (03CR) 10KartikMistry: "> I wouldn't recommend doing that, for the reasons very nicely" [puppet] - 10https://gerrit.wikimedia.org/r/437669 (owner: 10KartikMistry)
[11:06:56] <wikibugs_>	 (03Abandoned) 10KartikMistry: dotfiles: Added `screen -R` in .bash_profile [puppet] - 10https://gerrit.wikimedia.org/r/437669 (owner: 10KartikMistry)
[11:12:22] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1953 bytes in 0.083 second response time
[11:15:01] <icinga-wm>	 PROBLEM - Disk space on elastic1029 is CRITICAL: DISK CRITICAL - free space: /srv 60186 MB (12% inode=99%)
[11:19:27] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Depool db1084 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437730
[11:22:35] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1084 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437730 (owner: 10Jcrespo)
[11:23:25] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: jobrunner: add profile::mediawiki::videoscaler [puppet] - 10https://gerrit.wikimedia.org/r/437492
[11:23:49] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Depool db1084 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437730 (owner: 10Jcrespo)
[11:27:02] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 58s)
[11:27:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:12] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2075 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.33 seconds
[11:27:32] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2059 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.25 seconds
[11:27:41] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2052 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.59 seconds
[11:27:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.57 seconds
[11:27:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2066 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.86 seconds
[11:27:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 314.36 seconds
[11:27:52] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on db2038 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.40 seconds
[11:28:30] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/11393/mw1308.eqiad.wmnet/ seems to DTRT." [puppet] - 10https://gerrit.wikimedia.org/r/437492 (owner: 10Giuseppe Lavagetto)
[11:29:01] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Depool db1084 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437730 (owner: 10Jcrespo)
[11:32:41] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1947 bytes in 0.072 second response time
[11:33:21] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Give more s4 weight to db1097 and db1103 (3314) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437733
[11:34:51] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Give more s4 weight to db1097 and db1103 (3314) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437733 (owner: 10Jcrespo)
[11:35:32] <jynus>	 !log stop and reimage db1084
[11:35:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:04] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Give more s4 weight to db1097 and db1103 (3314) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437733 (owner: 10Jcrespo)
[11:38:01] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase s4 weight for db1097 and db1103 (duration: 00m 56s)
[11:38:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:28] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Give more s4 weight to db1097 and db1103 (3314) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437733 (owner: 10Jcrespo)
[11:41:23] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: videoscaler/jobrunner: add the respective VIPs [puppet] - 10https://gerrit.wikimedia.org/r/437493
[11:41:33] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Reimage db1084 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/437735
[11:42:09] <wikibugs_>	 (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: Reimage db1084 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/437735 (owner: 10Jcrespo)
[11:44:13] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/11395/" [puppet] - 10https://gerrit.wikimedia.org/r/437493 (owner: 10Giuseppe Lavagetto)
[11:44:24] <_joe_>	 argh, merge-sniped
[11:44:31] <icinga-wm>	 RECOVERY - Disk space on elastic1029 is OK: DISK OK
[11:44:39] <wikibugs_>	 (03PS3) 10Giuseppe Lavagetto: videoscaler/jobrunner: add the respective VIPs [puppet] - 10https://gerrit.wikimedia.org/r/437493
[11:49:12] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.005 second response time
[11:49:45] <_joe_>	 that's puppet restarting hhvm ^^
[11:50:21] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1306 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[11:51:21] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1334 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[11:52:22] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1334 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[11:55:12] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1974 bytes in 0.079 second response time
[12:02:04] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1966 bytes in 0.106 second response time
[12:04:44] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1337 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[12:05:44] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1337 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[12:08:54] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1310 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[12:09:54] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1310 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[12:11:33] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1309 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[12:12:33] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1309 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.005 second response time
[12:15:55] <wikibugs_>	 (03PS2) 10Mobrovac: Disable redis queue for cirrus search for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437448 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko)
[12:18:23] <wikibugs_>	 (03CR) 10Mobrovac: [C: 032] Disable redis queue for cirrus search for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437448 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko)
[12:18:33] * mobrovac taking over deploy1001
[12:19:36] <wikibugs_>	 (03Merged) 10jenkins-bot: Disable redis queue for cirrus search for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437448 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko)
[12:22:14] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@c8d62da]: Enable cirrus for everything T190327
[12:22:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:19] <stashbot>	 T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus - https://phabricator.wikimedia.org/T190327
[12:23:00] <logmsgbot>	 !log mobrovac@deploy1001 Synchronized wmf-config/jobqueue.php: Switch CirrusSearch jobs to EventBus for all wikis - T189137 (duration: 00m 57s)
[12:23:01] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@c8d62da]: Enable cirrus for everything T190327 (duration: 00m 47s)
[12:23:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:08] <stashbot>	 T189137: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137
[12:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:21] <wikibugs_>	 (03PS9) 10Ema: prometheus: export intel-microcode information via node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825)
[12:24:24] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1951 bytes in 0.087 second response time
[12:24:38] <logmsgbot>	 !log mobrovac@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Switch CirrusSearch jobs to EventBus for all wikis, file 2/2 - T189137 (duration: 00m 56s)
[12:24:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:03] <wikibugs_>	 (03CR) 10Ema: [C: 032] prometheus: export intel-microcode information via node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema)
[12:27:59] * mobrovac done with deploy1001
[12:29:35] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1947 bytes in 0.106 second response time
[12:33:25] <wikibugs_>	 (03PS4) 10Jcrespo: mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320)
[12:34:07] <wikibugs_>	 (03PS16) 10Elukey: [WIP] Create profile::analytics::cluster::packages::* classes [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata)
[12:38:47] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Repool db1084 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437744
[12:44:31] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1084 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437744 (owner: 10Jcrespo)
[12:46:04] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Repool db1084 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437744 (owner: 10Jcrespo)
[12:48:50] <Reedy>	 jouncebot: reload
[12:48:54] <Reedy>	 jouncebot: refresh
[12:48:55] <jouncebot>	 I refreshed my knowledge about deployments.
[12:50:11] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1084 with low load (duration: 00m 56s)
[12:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:48] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Repool db1084 fully after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437745
[12:52:18] <wikibugs_>	 (03CR) 10jenkins-bot: Disable redis queue for cirrus search for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437448 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko)
[12:52:22] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Repool db1084 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437744 (owner: 10Jcrespo)
[12:54:11] <wikibugs_>	 (03CR) 10Elukey: "All right we are at another checkpoint:" [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata)
[12:59:29] <akosiaris>	 !log add +spec_ctrl to ganeti01.svc.codfw.wmnet cluster default cpu_type
[12:59:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate European Mid-day SWAT(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T1300).
[13:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[13:01:17] <wikibugs_>	 (03PS1) 10Muehlenhoff: Add library hint for elfutils [puppet] - 10https://gerrit.wikimedia.org/r/437747
[13:01:43] <akosiaris>	 !log starting slow rolling restart of all VMs on ganeti01.svc.codfw.wmnet
[13:01:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:11] <zeljkof>	 so, nothing for SWAT, no EU SWAT, I'm around if anybody comes late
[13:02:26] <wikibugs_>	 (03PS2) 10Muehlenhoff: Add library hint for elfutils [puppet] - 10https://gerrit.wikimedia.org/r/437747
[13:03:52] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Add library hint for elfutils [puppet] - 10https://gerrit.wikimedia.org/r/437747 (owner: 10Muehlenhoff)
[13:06:05] <addshore>	 =o
[13:06:45] <moritzm>	 !log installing elfutils security updates
[13:06:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:58] <wikibugs_>	 10Operations, 10CirrusSearch, 10Discovery, 10Elasticsearch, and 2 others: Alert when elasticsearch writes are frozen for too long - https://phabricator.wikimedia.org/T193605#4260822 (10Gehel) Deployed and seems to be working
[13:30:31] <wikibugs_>	 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, and 2 others: Deploying FileExporter and FileImporter - https://phabricator.wikimedia.org/T190716#4260883 (10JStrodt_WMDE)
[13:30:41] <wikibugs_>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437754 (https://phabricator.wikimedia.org/T191316)
[13:31:36] <wikibugs_>	 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, and 2 others: Deploying FileExporter and FileImporter - https://phabricator.wikimedia.org/T190716#4081970 (10JStrodt_WMDE)
[13:32:19] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437754 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[13:32:45] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.078 second response time
[13:33:30] <wikibugs_>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437754 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[13:33:46] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437754 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[13:35:10] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 for alter table (duration: 00m 57s)
[13:35:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:55] <marostegui>	 !log Deploy schema change on db1096:3316  - T191316 T192926 T195193 T89737
[13:36:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:01] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[13:36:01] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[13:36:01] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[13:36:02] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[13:40:35] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/page/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected v
[13:40:35] <icinga-wm>	 [2] = Missing keys: [utitles, uthumbnail, ulicense]
[13:40:45] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/page/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected v
[13:40:45] <icinga-wm>	 [2] = Missing keys: [utitles, uthumbnail, ulicense]
[13:42:55] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1960 bytes in 0.072 second response time
[13:46:29] <wikibugs_>	 10Operations, 10Wikimedia-Mailing-lists: Wikidata_Mail_BR - https://phabricator.wikimedia.org/T196552#4260920 (10Kaioduarte-TB)
[13:48:05] <wikibugs_>	 (03PS2) 10Muehlenhoff: Remove at [puppet] - 10https://gerrit.wikimedia.org/r/435171
[13:49:21] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Remove at [puppet] - 10https://gerrit.wikimedia.org/r/435171 (owner: 10Muehlenhoff)
[13:53:28] <wikibugs_>	 (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-mcrouter-exporter [puppet] - 10https://gerrit.wikimedia.org/r/436782 (https://phabricator.wikimedia.org/T135991)
[13:57:11] <wikibugs_>	 (03CR) 10Ottomata: "I think that's fine. We install refinery on analytics1003 and use it to launch jobs, so it makes sense that it gets all the packages." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata)
[13:58:17] <jynus>	 !log disabling puppet on db1051, db1065
[13:58:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:51] <wikibugs_>	 (03Abandoned) 10ArielGlenn: Fix killing dumpers in Wikidata entity dumpers [puppet] - 10https://gerrit.wikimedia.org/r/393923 (owner: 10Hoo man)
[14:00:32] <wikibugs_>	 (03PS1) 10Volans: Add nginx::snippet define [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761
[14:02:39] <wikibugs_>	 (03PS5) 10Jcrespo: mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320)
[14:04:07] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 04-1] Add nginx::snippet define (031 comment) [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[14:05:21] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Failover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437703 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[14:05:35] <wikibugs_>	 (03CR) 10Volans: Add nginx::snippet define (031 comment) [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[14:06:34] <andrewbogott>	 !log rebooting labvirt1003
[14:06:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:07] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 04-1] Add nginx::snippet define (031 comment) [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[14:09:15] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Switchover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[14:09:20] <wikibugs_>	 (03PS3) 10Jcrespo: mariadb: Switchover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320)
[14:09:22] <wikibugs_>	 (03PS2) 10Volans: Add nginx::snippet define [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761
[14:09:24] <wikibugs_>	 (03PS1) 10Muehlenhoff: Also remove at from gridengine [puppet] - 10https://gerrit.wikimedia.org/r/437764
[14:09:45] <icinga-wm>	 PROBLEM - Host labvirt1003 is DOWN: PING CRITICAL - Packet loss = 100%
[14:09:53] <wikibugs_>	 (03CR) 10Volans: "replies inline" (031 comment) [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[14:10:05] <icinga-wm>	 PROBLEM - Host www.toolserver.org is DOWN: CRITICAL - Host Unreachable (www.toolserver.org)
[14:10:21] <wikibugs_>	 (03CR) 10Rush: [C: 031] "I won't even harrass you that this doesn't need to be an array anymore :D" [puppet] - 10https://gerrit.wikimedia.org/r/437764 (owner: 10Muehlenhoff)
[14:10:29] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 031] Also remove at from gridengine [puppet] - 10https://gerrit.wikimedia.org/r/437764 (owner: 10Muehlenhoff)
[14:11:14] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Also remove at from gridengine [puppet] - 10https://gerrit.wikimedia.org/r/437764 (owner: 10Muehlenhoff)
[14:11:26] <jynus>	 marostegui: so I am ready for the switch
[14:11:32] <marostegui>	 so am I!
[14:12:11] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 031] "<3" [puppet/nginx] - 10https://gerrit.wikimedia.org/r/437761 (owner: 10Volans)
[14:12:12] <marostegui>	 you want me to handle the dbproxies for example?
[14:12:42] <_joe_>	 volans, vgutierrez is /etc/nginx/snippets a debian standard way to do things?
[14:12:52] <jynus>	 note I wrote 1001 and 1006
[14:12:57] <jynus>	 but it is 1002 and 1007
[14:13:04] <jynus>	 but yes
[14:13:23] <_joe_>	 I didn't know, tbh 
[14:13:51] <marostegui>	 ok, I will reload them whenever you give me green light
[14:14:00] <jynus>	 ok, starting
[14:14:27] <jynus>	 !log starting m2-master switchover from db1051 to db1065
[14:14:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:35] <icinga-wm>	 PROBLEM - Check systemd state on kafkamon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:14:37] <volans>	 _joe_: need to check the /etc/nginx/snippets is generated by the nginx-common package and has fastcgi-php.conf and snakeoil.conf inside
[14:15:02] <elukey>	 checking kafkamon
[14:15:05] <_joe_>	 yeah, and it doesn't have conf-available/conf-enabled
[14:15:21] <volans>	 has /etc/nginx/conf.d but all inside is included
[14:15:36] <_joe_>	 yeah, it's a different beast for sure
[14:15:53] <jynus>	 marostegui: prepare
[14:15:56] <marostegui>	 ok
[14:16:10] <marostegui>	 whenever you want :)
[14:16:13] <marostegui>	 i am ready
[14:16:29] <jynus>	 heartbeat ran on 1051
[14:16:32] <jynus>	 not on 65
[14:16:37] <akosiaris>	 https://www.youtube.com/watch?v=l2-iq7moFgM
[14:16:51] <akosiaris>	 marostegui: :P
[14:17:06] <jynus>	 missing patch
[14:17:31] <jynus>	 going back to rw
[14:17:38] <marostegui>	 I will prepare the patch
[14:17:45] <wikibugs_>	 (03PS5) 10Jcrespo: mariadb: Switchover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320)
[14:17:54] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: conftool-data: merge the jobrunner, videoscaler clusters [puppet] - 10https://gerrit.wikimedia.org/r/437494
[14:18:08] <wikibugs_>	 (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: Switchover m2-master to db1065 [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[14:18:17] <marostegui>	 that was fast :)
[14:18:43] <jynus>	 ok, going to read only again
[14:18:45] <volans>	 _joe_: to answer your question yes, seems debian-specific: https://salsa.debian.org/nginx-team/nginx/blob/master/debian/changelog#L533
[14:18:53] <jynus>	 !log setting gerrit on read only
[14:18:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:52] <jynus>	 we are ok now, switch
[14:19:57] <marostegui>	 ok
[14:20:02] <marostegui>	 I am ready whenever you want
[14:20:18] <jynus>	 no
[14:20:20] <jynus>	 now
[14:20:22] <marostegui>	 ok
[14:20:38] <marostegui>	 done
[14:20:49] <marostegui>	 db1065 is now on dbproxies and db1051 is gone
[14:20:50] <jynus>	 confirm on stats?
[14:20:52] <jynus>	 ok
[14:21:01] <marostegui>	 yep
[14:21:07] <jynus>	 !log setting m2 on read write
[14:21:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:37] <jynus>	 replication is working
[14:22:00] <jynus>	 killing connection on db1051
[14:22:20] <jynus>	 akosiaris: check otrs
[14:22:21] <marostegui>	 akosiaris: can you check otrs?
[14:22:26] <akosiaris>	 checking
[14:22:46] <wikibugs_>	 (03CR) 10Jcrespo: [V: 032 C: 032] "This didn't go so well :-/" [puppet] - 10https://gerrit.wikimedia.org/r/437714 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo)
[14:22:58] <akosiaris>	 seems fine
[14:22:58] <andrewbogott>	 !log rebooting labvirt1009
[14:22:58] <jynus>	 ^I can comment on gerrit, so writes work
[14:23:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:16] <akosiaris>	 anything I've tested in OTRS works
[14:23:44] <_joe_>	 can I merge my own change then, or should I wait for you in case you need a very quick revert?
[14:23:50] <wikibugs_>	 (03PS3) 10Giuseppe Lavagetto: conftool-data: merge the jobrunner, videoscaler clusters [puppet] - 10https://gerrit.wikimedia.org/r/437494
[14:24:09] <wikibugs_>	 (03CR) 10Elukey: [WIP] Create profile::analytics::cluster::packages::* classes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata)
[14:24:12] <jynus>	 _joe_: go on
[14:24:17] <jynus>	 if we were to revert
[14:24:18] <_joe_>	 ok, thanks
[14:24:26] <volans>	 all good for debmonitor
[14:24:34] <_joe_>	 in case don't worry, just don't do sudo -i puppet-merge in case
[14:24:35] <_joe_>	 :P
[14:24:39] <jynus>	 we would do it with everything already written
[14:24:48] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] conftool-data: merge the jobrunner, videoscaler clusters [puppet] - 10https://gerrit.wikimedia.org/r/437494 (owner: 10Giuseppe Lavagetto)
[14:24:50] <jynus>	 so not really a revert, just another fail
[14:24:51] <wikibugs_>	 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4261077 (10Niedzielski)
[14:24:52] <wikibugs_>	 (03PS1) 10Ppchelko: Switch all jobs to the new queue and clean up the old queue configs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437767 (https://phabricator.wikimedia.org/T190327)
[14:25:15] <icinga-wm>	 PROBLEM - Host checker.tools.wmflabs.org is DOWN: CRITICAL - Host Unreachable (checker.tools.wmflabs.org)
[14:25:35] <icinga-wm>	 PROBLEM - Host labvirt1009 is DOWN: PING CRITICAL - Packet loss = 100%
[14:25:39] <wikibugs_>	 (03PS17) 10Elukey: [WIP] Create profile::analytics::cluster::packages::* classes [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata)
[14:25:59] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Switch all jobs to the new queue and clean up the old queue configs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437767 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko)
[14:26:28] <jynus>	 so there is a gotcha: don't coordinate on etherpad if you are going to do maintenance on etherpad
[14:26:43] <akosiaris>	 ahahahaha
[14:26:44] <Krinkle>	 :)
[14:26:48] <jynus>	 and don't deploy patches if you are going to do maintenance on gerrit
[14:26:53] <jynus>	 I learned the second one today
[14:26:57] <jynus>	 one patch got stuck
[14:27:12] <jynus>	 which prevented it from getting unstuck
[14:27:34] <wikibugs_>	 (03PS2) 10Ppchelko: Switch all jobs to the new queue and clean up the old queue configs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437767 (https://phabricator.wikimedia.org/T190327)
[14:28:19] <wikibugs_>	 10Operations, 10ops-eqiad, 10netops: replace mr1-eqiad - https://phabricator.wikimedia.org/T185171#4261082 (10faidon) It's been a few months now, what's the status of this?
[14:28:29] <jynus>	 marostegui: I am setting up db1051 replication
[14:28:33] <jynus>	 just in case
[14:29:01] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on videoscaler.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:29:08] <akosiaris>	 um
[14:29:11] <marostegui>	 yeah
[14:29:13] <marostegui>	 agreed
[14:29:13] <akosiaris>	 _joe_: ^ ?
[14:29:13] <_joe_>	 that's my fault
[14:29:16] <akosiaris>	 ok
[14:29:22] <_joe_>	 yeah, lemme understand what's up
[14:30:30] <_joe_>	 it's codfw, so it's not critical
[14:31:12] <jynus>	 it is the maintenance you are doing with puppet, right?
[14:31:29] <_joe_>	 yeah I confirm it's just codfw
[14:31:31] <jynus>	 (probably related to that)
[14:32:16] <_joe_>	 uhm, no idea why that's happening
[14:35:54] <logmsgbot>	 !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: cluster=videoscaler,dc=codfw,service=nginx,name=mw21(5[3-9]|6).*
[14:35:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:50] <wikibugs_>	 (03PS1) 10Papaul: DNS: Add prod & mgmt DNS for frmon2001 [dns] - 10https://gerrit.wikimedia.org/r/437768 (https://phabricator.wikimedia.org/T196476)
[14:37:28] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor nitpick, rest LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk)
[14:37:45] <icinga-wm>	 RECOVERY - Check systemd state on kafkamon2001 is OK: OK - running: The system is fully operational
[14:38:04] <wikibugs_>	 10Operations, 10Wikimedia-Mailing-lists: Give admin acces to recommender-feedback@wikimedia.org - https://phabricator.wikimedia.org/T196556#4261110 (10bmansurov)
[14:38:25] <icinga-wm>	 PROBLEM - etcd request latencies on acrab is CRITICAL: instance=10.192.16.26:6443 operation=list https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[14:39:14] <logmsgbot>	 !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: cluster=videoscaler,dc=codfw,service=nginx,name=mw22(4[1-5]|5[3-8]|6[1-9]).*
[14:39:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:35] <icinga-wm>	 RECOVERY - etcd request latencies on acrab is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[14:40:09] <wikibugs_>	 10Operations, 10Wikimedia-Mailing-lists: Give admin acces to recommender-feedback@wikimedia.org - https://phabricator.wikimedia.org/T196556#4261130 (10bmansurov)
[14:40:12] <wikibugs_>	 (03PS2) 10ArielGlenn: allow writeuptopageid to write multiple output files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/436511 (https://phabricator.wikimedia.org/T196063)
[14:42:16] <awight>	 akosiaris: greg-g: I'd like to carve out an ORES deployment window today, maybe 15:00-16:00 UTC, unless there are objections?  I noticed that the Wednesday Services window overlaps with the train, which wouldn't be cool in this case.
[14:42:40] <wikibugs_>	 10Operations, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4261137 (10elukey)
[14:42:51] <_joe_>	 I'm still not sure what's happening in codfw with the videoscalers tbh
[14:43:08] <_joe_>	 oh I see, it happens I'm an idiot
[14:43:11] <_joe_>	 sorry people
[14:45:16] <icinga-wm>	 PROBLEM - Apache HTTP on mwdebug2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:45:23] <_joe_>	 uh?
[14:45:31] <_joe_>	 that's not me at all ^^
[14:45:38] <moritzm>	 Alex is rebooting ganeti instances
[14:45:44] <_joe_>	 oh ok
[14:46:15] <icinga-wm>	 RECOVERY - Apache HTTP on mwdebug2002 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 621 bytes in 0.134 second response time
[14:46:29] <wikibugs_>	 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: switch port configuration for frmon2001 - https://phabricator.wikimedia.org/T196557#4261150 (10Papaul) p:05Triage>03Normal
[14:46:41] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on videoscaler.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 266 bytes in 0.132 second response time
[14:47:16] <jynus>	 did you discover what it was^?
[14:47:51] <jynus>	 Oh, I just read probably just a mistake
[14:49:44] <_joe_>	 jynus: I forgot to reenable puppet on some servers in codfw
[14:49:53] <_joe_>	 simple as that
[14:50:31] <jynus>	 :-)
[14:51:57] <Krinkle>	 milimetric: Do you know if the 'https=1' portion is still useful in the analytics response header?
[14:54:03] <milimetric>	 hm, not off the top of my head Krinkle, I’m at the doctor’s, maybe mforns can take a look?
[14:54:10] <Krinkle>	 thx
[14:55:14] <wikibugs_>	 10Operations, 10ops-eqiad, 10netops: replace mr1-eqiad - https://phabricator.wikimedia.org/T185171#4261177 (10Cmjohnson) @faidon it's still not done, I have been waiting until we're finished upgrading the network switches
[14:56:07] <greg-g>	 awight: should be fine, please add to calendar
[14:56:33] <awight>	 greg-g: done, ty!
[14:57:02] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Remove db1051, to be decommissioned, add db1065 [software] - 10https://gerrit.wikimedia.org/r/437769 (https://phabricator.wikimedia.org/T195484)
[14:57:25] <wikibugs_>	 (03PS1) 10Ayounsi: Facter: add a v4 and v6 default routes fact [puppet] - 10https://gerrit.wikimedia.org/r/437771
[14:58:37] <icinga-wm>	 RECOVERY - Host labvirt1003 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[14:58:48] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: jobrunner: uniform hiera parameters between jobrunner and videoscaler [puppet] - 10https://gerrit.wikimedia.org/r/437772
[14:58:48] <icinga-wm>	 PROBLEM - ensure kvm processes are running on labvirt1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args /usr/bin/kvm
[15:00:03] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] jobrunner: uniform hiera parameters between jobrunner and videoscaler [puppet] - 10https://gerrit.wikimedia.org/r/437772 (owner: 10Giuseppe Lavagetto)
[15:00:04] <jouncebot>	 awight: How many deployers does it take to do ORES special deployment deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T1500).
[15:00:54] <awight>	 ORES is about to do some donuts in the parking lot.
[15:01:08] <logmsgbot>	 !log awight@deploy1001 Started deploy [ores/deploy@65e979f]: ORES: new draft topic model; T176336
[15:01:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:20] <stashbot>	 T176336: Deploy drafttopic model to production ORES - https://phabricator.wikimedia.org/T176336
[15:01:30] <wikibugs_>	 (03PS3) 10ArielGlenn: allow writeuptopageid to write multiple output files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/436511 (https://phabricator.wikimedia.org/T196063)
[15:06:17] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase1007-{b,c} - T178905
[15:06:22] <logmsgbot>	 !log oblivian@puppetmaster1001 conftool action : set/pooled=yes; selector: cluster=videoscaler,dc=codfw,service=nginx
[15:06:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:32] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[15:06:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:37] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:07:49] <icinga-wm>	 RECOVERY - ensure kvm processes are running on labvirt1003 is OK: PROCS OK: 3 processes with regex args /usr/bin/kvm
[15:07:49] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1305 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:08:05] <_joe_>	 the criticals on hhvm is me restarting it again, sorry
[15:08:17] <icinga-wm>	 RECOVERY - Host labvirt1009 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[15:08:33] <wikibugs_>	 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: rack/setup/install labstore1008 & labstore1009 - https://phabricator.wikimedia.org/T193655#4261243 (10Cmjohnson) @chasemp I went to cable these today and noticed they have 10G nics...do you need these in a 10G rack?
[15:08:37] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:08:48] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1305 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:10:17] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1302 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:11:17] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1302 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:12:17] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1311 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:12:21] <wikibugs_>	 10Operations, 10ops-eqiad: Degraded RAID on wtp1043 - https://phabricator.wikimedia.org/T196260#4261260 (10Cmjohnson) a:05Cmjohnson>03RobH assigning to @robh to order a new disk because my techdirect renewal is pending approval
[15:12:44] <robh>	 oh yeah, ill do that now
[15:13:18] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1311 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:13:46] <_joe_>	 !log adding jobrunners, videoscalers to both pools with equal weight in codfw
[15:13:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:17] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1308 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.004 second response time
[15:17:26] <wikibugs_>	 (03PS1) 10Krinkle: varnish: Remove setting of CP cookies [puppet] - 10https://gerrit.wikimedia.org/r/437774 (https://phabricator.wikimedia.org/T110353)
[15:17:30] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Remove db1051, to be decommissioned, add db1065 [software] - 10https://gerrit.wikimedia.org/r/437769 (https://phabricator.wikimedia.org/T195484) (owner: 10Jcrespo)
[15:18:17] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:19:37] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:19:37] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1293 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:20:37] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1306 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[15:20:38] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:21:47] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1304 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:22:48] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1304 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.005 second response time
[15:22:48] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4261344 (Cmjohnson) @dzahn I see phab1002 is installed and in icinga does the bios/drac/serial still need setup/testing
[15:22:57] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[15:23:37] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4261346 (Cmjohnson)
[15:23:58] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:26:31] <wikibugs_>	 (PS1) Giuseppe Lavagetto: site.pp: merge videoscalers into the jobrunners [puppet] - https://gerrit.wikimedia.org/r/437776
[15:26:45] <logmsgbot>	 !log awight@deploy1001 Finished deploy [ores/deploy@65e979f]: ORES: new draft topic model; T176336 (duration: 25m 37s)
[15:26:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:56] <stashbot>	 T176336: Deploy drafttopic model to production ORES - https://phabricator.wikimedia.org/T176336
[15:26:56] <halfak>	 \o/
[15:26:59] <XioNoX>	 !log stop pybal on lvs1001
[15:27:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:23] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2001-{b,c} - T178905
[15:27:25] <halfak>	 Looks good
[15:27:32] <awight>	 yes that happened!
[15:27:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:35] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[15:28:10] <_joe_>	 I see a rise in the number of errors on the OresFetchScore jobs
[15:28:23] <wikibugs_>	 (PS1) Sau226: Implementing Patroller User Rights for azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/437777 (https://phabricator.wikimedia.org/T196488)
[15:28:26] <_joe_>	 halfak/ awight it might be an artifact of deployment, I'll keep you updated
[15:28:33] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Implementing Patroller User Rights for azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/437777 (https://phabricator.wikimedia.org/T196488) (owner: Sau226)
[15:28:44] <_joe_>	 reference graph: https://grafana.wikimedia.org/dashboard/db/jobqueue-eventbus?orgId=1&panelId=9&fullscreen&from=now-15m&to=now
[15:28:47] <icinga-wm>	 PROBLEM - ores on ores2001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.061 second response time
[15:28:47] <icinga-wm>	 PROBLEM - Check systemd state on ores2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:29:10] <_joe_>	 it's already going down, so less critical
[15:29:15] <wikibugs_>	 (PS2) Sau226: Implementing Patroller User Rights for azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/437777 (https://phabricator.wikimedia.org/T196488)
[15:29:19] <_joe_>	 and probably a consequence of the deploy
[15:30:20] <awight>	 akosiaris: mutante: Can I ask a favor...  I need two directories rm'd
[15:30:37] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1001 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[15:30:37] <icinga-wm>	 PROBLEM - pybal on lvs1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[15:30:53] <awight>	 ores200[1-2].codfw.wmnet:/srv/deployment/ores/deploy-cache/revs/65e979fc2ee87198a93473a852278b2adf551dc8
[15:31:05] <wikibugs_>	 (PS3) Sau226: Implementing Patroller User Rights for azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/437777 (https://phabricator.wikimedia.org/T196488)
[15:31:27] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1001 is CRITICAL: CRITICAL: 0 connections established with conf1001.eqiad.wmnet:2379 (min=4)
[15:32:18] <_joe_>	 !log cross enabling videoscalers,jobrunners in their respective pools
[15:32:22] <ema>	 the lvs1001 alerts are known, pybal stopped for maintenance ^
[15:32:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:57] <icinga-wm>	 RECOVERY - Host www.toolserver.org is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms
[15:34:17] <icinga-wm>	 RECOVERY - Check systemd state on ores2001 is OK: OK - running: The system is fully operational
[15:34:40] <wikibugs_>	 (PS1) Jcrespo: mariadb: Decommission db1051 [puppet] - https://gerrit.wikimedia.org/r/437779 (https://phabricator.wikimedia.org/T195484)
[15:34:55] <awight>	 akosiaris: mutante: nvm the request above, I worked around it in a way you don't want to think about.
[15:35:08] <icinga-wm>	 ACKNOWLEDGEMENT - PyBal backends health check on lvs1001 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 Ayounsi Maintenance for T187962
[15:35:08] <icinga-wm>	 ACKNOWLEDGEMENT - PyBal connections to etcd on lvs1001 is CRITICAL: CRITICAL: 0 connections established with conf1001.eqiad.wmnet:2379 (min=4) Ayounsi Maintenance for T187962
[15:35:08] <icinga-wm>	 ACKNOWLEDGEMENT - pybal on lvs1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal Ayounsi Maintenance for T187962
[15:35:13] <awight>	 awight@deploy1001:/srv/deployment/ores/deploy$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service ores2002.codfw.wmnet "cd /srv/deployment/ores/deploy-cache/revs/65e979fc2ee87198a93473a852278b2adf551dc8/submodules/assets; git lfs pull"
[15:35:19] <awight>	 MEOW
[15:36:10] * halfak barfs a little bit
[15:38:25] <wikibugs_>	 Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4261381 (ayounsi) @Cmjohnson:  Please move lvs1001 from asw-c-eqiad:ge-2/0/45 to asw2-c-eqiad:ge-2/0/27  and (after I'm done de-pooling the host) lvs1002 fro...
[15:38:40] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@402d729]: Adjust the cirrus concurrencies
[15:38:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:54] <awight>	 All done messing with ORES.
[15:39:18] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1001 is OK: PYBAL OK - All pools are healthy
[15:39:18] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@402d729]: Adjust the cirrus concurrencies (duration: 00m 40s)
[15:39:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:24] <awight>	 Seems that our deployment is happy.
[15:39:27] <icinga-wm>	 RECOVERY - pybal on lvs1001 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[15:41:38] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1001 is OK: OK: 4 connections established with conf1001.eqiad.wmnet:2379 (min=4)
[15:42:37] <icinga-wm>	 PROBLEM - puppet last run on cp4026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:42:57] <icinga-wm>	 PROBLEM - puppet last run on elastic2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:42:57] <icinga-wm>	 PROBLEM - puppet last run on mw2159 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:43:00] <wikibugs_>	 Operations, ops-codfw, fundraising-tech-ops: frdb2001 RAID disk failure - https://phabricator.wikimedia.org/T196251#4261401 (Jgreen) Open>Resolved excellent, thanks!
[15:43:17] <icinga-wm>	 PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:43:27] <icinga-wm>	 PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:43:42] <_joe_>	 uhm what's up with puppet?
[15:43:46] <_joe_>	 can someone look?
[15:43:57] <icinga-wm>	 PROBLEM - puppet last run on db2093 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:43:57] <icinga-wm>	 PROBLEM - puppet last run on puppetdb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:44:28] <wikibugs_>	 Operations, ops-codfw, fundraising-tech-ops, netops: switch port configuration for frbast2001 - https://phabricator.wikimedia.org/T196503#4261409 (Jgreen)
[15:44:58] <icinga-wm>	 PROBLEM - puppet last run on mw2278 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:45:17] <icinga-wm>	 PROBLEM - puppet last run on mw2190 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:45:18] <icinga-wm>	 PROBLEM - puppet last run on elastic2034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:45:48] <icinga-wm>	 PROBLEM - puppet last run on mc2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:46:00] <wikibugs_>	 Operations, ops-codfw, fundraising-tech-ops, netops: switch port configuration for frbast2001 - https://phabricator.wikimedia.org/T196503#4258998 (Jgreen) Note--corrected hostname on the task title and description. @ayounsi this should be vlan frack-bastion-codfw.
[15:46:17] <icinga-wm>	 PROBLEM - puppet last run on rdb2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:46:38] <icinga-wm>	 PROBLEM - puppet last run on rdb2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:46:47] <icinga-wm>	 PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:46:57] <icinga-wm>	 PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:46:58] <icinga-wm>	 PROBLEM - puppet last run on acrab is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:47:17] <icinga-wm>	 PROBLEM - puppet last run on ms-be2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:47:18] <icinga-wm>	 PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:47:48] <wikibugs_>	 (PS2) RobH: Add Reedy to contint-docker group [puppet] - https://gerrit.wikimedia.org/r/436860 (https://phabricator.wikimedia.org/T196192) (owner: Reedy)
[15:48:24] <wikibugs_>	 (CR) RobH: [C: 2] Add Reedy to contint-docker group [puppet] - https://gerrit.wikimedia.org/r/436860 (https://phabricator.wikimedia.org/T196192) (owner: Reedy)
[15:49:31] <wikibugs_>	 Operations, Continuous-Integration-Infrastructure, SRE-Access-Requests, Patch-For-Review: Add Reedy to contint-docker group - https://phabricator.wikimedia.org/T196192#4261418 (RobH) Open>Resolved a:RobH This has been merged live.  All affected servers will call into puppet and get th...
[15:49:48] <moritzm>	 probably puppetdb restart for codfw
[15:49:54] <moritzm>	 it's also on ganeti
[15:49:55] <wikibugs_>	 (CR) Jgreen: [C: 1] DNS: Add prod & mgmt DNS for frmon2001 [dns] - https://gerrit.wikimedia.org/r/437768 (https://phabricator.wikimedia.org/T196476) (owner: Papaul)
[15:51:07] <XioNoX>	 !log disable pybal on lvs1002 - T187962
[15:51:17] <moritzm>	 manual puppet run on affected host works fine, seems like fallout of puppetdb reboot
[15:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:25] <stashbot>	 T187962: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962
[15:51:58] <icinga-wm>	 RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:53:37] <icinga-wm>	 RECOVERY - Host checker.tools.wmflabs.org is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[15:54:25] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2001-{a,b,c} - T178905
[15:54:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:37] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[15:54:50] <wikibugs_>	 Operations, Traffic, Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820#1096885 (Anomie) >>! On IRC, @tstarling wrote: > <TimStarling> pity the SessionManager refactor did not add replication...
[15:55:25] <wikibugs_>	 Operations, Phabricator: Phabricator is very slow to load - https://phabricator.wikimedia.org/T196565#4261464 (Paladox)
[15:56:03] <wikibugs_>	 Operations, Phabricator: Phabricator is very slow to load - https://phabricator.wikimedia.org/T196565#4261482 (Paladox) p:Triage>Unbreak!
[15:56:36] <andrewbogott>	 !log rebooting labvirt1014
[15:56:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:49] <wikibugs_>	 Operations, Wikimedia-Mailing-lists: Give admin acces to recommender-feedback@wikimedia.org - https://phabricator.wikimedia.org/T196556#4261110 (RobH) I don't see this actual list already created, but it mentions that Ori was the admin?  This seems to be asking for a modification, not a new list.  Is the...
[15:57:16] <wikibugs_>	 Operations, Phabricator: Phabricator is very slow to load - https://phabricator.wikimedia.org/T196565#4261464 (greg) See also: https://news.ycombinator.com/item?id=17245649
[15:57:58] <twentyafterfour>	 !log reloading apache on phab1001 to free up some resources
[15:58:01] <wikibugs_>	 (PS1) Arturo Borrero Gonzalez: openstack: labtest: keystone: delete service (collapsed) [puppet] - https://gerrit.wikimedia.org/r/437783 (https://phabricator.wikimedia.org/T167559)
[15:58:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:07] <icinga-wm>	 RECOVERY - puppet last run on puppetdb2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[15:59:15] <wikibugs_>	 Operations, Phabricator: Phabricator is very slow to load - https://phabricator.wikimedia.org/T196565#4261505 (mmodell) Restarted apache to free up some stuck processes, this seems to have helped quite a bit, I'm not sure for how long though.
[16:00:37] <wikibugs_>	 (CR) 20after4: [C: 1] Initialize LFS on scap targets [puppet] - https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: Awight)
[16:01:10] <wikibugs_>	 (PS1) Ayounsi: [WIP] Add static routes with MTU 1450 for ipsec dests [puppet] - https://gerrit.wikimedia.org/r/437784
[16:01:11] <wikibugs_>	 Operations, SRE-Access-Requests, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (Jdforrester-WMF)
[16:01:35] <wikibugs_>	 (CR) jerkins-bot: [V: -1] [WIP] Add static routes with MTU 1450 for ipsec dests [puppet] - https://gerrit.wikimedia.org/r/437784 (owner: Ayounsi)
[16:01:52] <wikibugs_>	 (CR) Rush: [C: 2] openstack: labtest: keystone: delete service (collapsed) [puppet] - https://gerrit.wikimedia.org/r/437783 (https://phabricator.wikimedia.org/T167559) (owner: Arturo Borrero Gonzalez)
[16:02:22] <wikibugs_>	 (CR) Rush: [C: 2] "We probably need to stop keystone on labtestcontrol2001 and make sure it doesn't start on boot?  Could puppetize that...I'm good either wa" [puppet] - https://gerrit.wikimedia.org/r/437783 (https://phabricator.wikimedia.org/T167559) (owner: Arturo Borrero Gonzalez)
[16:02:41] <wikibugs_>	 Operations, Wikimedia-Mailing-lists: Give admin acces to recommender-feedback@wikimedia.org - https://phabricator.wikimedia.org/T196556#4261524 (bmansurov) @RobH, thanks for the reply. My bad, I mixed up this individual email address with a list. Do you know who manages @wikimedia.org email addresses?
[16:04:04] <wikibugs_>	 (CR) Arturo Borrero Gonzalez: [C: 2] openstack: labtest: keystone: delete service (collapsed) [puppet] - https://gerrit.wikimedia.org/r/437783 (https://phabricator.wikimedia.org/T167559) (owner: Arturo Borrero Gonzalez)
[16:04:25] <wikibugs_>	 Operations, Wikimedia-Mailing-lists: Give admin acces to recommender-feedback@wikimedia.org - https://phabricator.wikimedia.org/T196556#4261527 (RobH) WMF OIT handles the actual @wikimedia.org address allocations for staff and a google alias.  That isn't a @lists.wikimedia.org address, which is handled v...
[16:04:35] <andrewbogott>	 !log rebooting labvirt1002
[16:04:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:40] <XioNoX>	 !log lvs1002 repooled
[16:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:07] <icinga-wm>	 PROBLEM - toolschecker: Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/redis - 259 bytes in 12.013 second response time
[16:09:12] <andrewbogott>	 !log rebooting labvirt1004
[16:09:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:38] <icinga-wm>	 RECOVERY - puppet last run on mw2190 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[16:11:18] <icinga-wm>	 RECOVERY - puppet last run on mc2020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:11:47] <icinga-wm>	 RECOVERY - puppet last run on rdb2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:12:08] <icinga-wm>	 RECOVERY - puppet last run on rdb2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:12:17] <icinga-wm>	 RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:12:28] <icinga-wm>	 RECOVERY - puppet last run on acrab is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:12:47] <icinga-wm>	 RECOVERY - puppet last run on ms-be2039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:12:48] <icinga-wm>	 RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:13:08] <icinga-wm>	 RECOVERY - puppet last run on cp4026 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:13:27] <icinga-wm>	 RECOVERY - puppet last run on elastic2015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:13:27] <icinga-wm>	 RECOVERY - puppet last run on mw2159 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:13:47] <icinga-wm>	 RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:13:57] <icinga-wm>	 RECOVERY - puppet last run on cp2019 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:14:27] <icinga-wm>	 RECOVERY - puppet last run on db2093 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:14:28] <icinga-wm>	 RECOVERY - toolschecker: Redis set/get on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.009 second response time
[16:15:28] <icinga-wm>	 RECOVERY - puppet last run on mw2278 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:15:57] <icinga-wm>	 RECOVERY - puppet last run on elastic2034 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[16:16:26] <andrewbogott>	 !log rebooting labvirt1005
[16:16:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:07] <icinga-wm>	 PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time
[16:21:08] <icinga-wm>	 PROBLEM - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 1.041 second response time
[16:21:18] <icinga-wm>	 RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time
[16:21:38] <icinga-wm>	 PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.209 second response time
[16:23:57] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:25:13] <andrewbogott>	 !log rebooting labvirt1006
[16:25:14] <jynus>	 !log stop mysql @ db1051 in preparation for decom
[16:25:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:45] <wikibugs>	 (PS2) Jcrespo: mariadb: Decommission db1051 [puppet] - https://gerrit.wikimedia.org/r/437779 (https://phabricator.wikimedia.org/T195484)
[16:26:12] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Decommission db1051 [puppet] - https://gerrit.wikimedia.org/r/437779 (https://phabricator.wikimedia.org/T195484) (owner: Jcrespo)
[16:26:14] <wikibugs>	 Operations, MediaWiki-Debian, Wikimedia-Mailing-lists: Create mediawiki-debian mailing list - https://phabricator.wikimedia.org/T192865#4261579 (RobH) Open>Resolved a:RobH This seems to have sat, and the new list is working fine without the old list archives from the third party server....
[16:27:48] <icinga-wm>	 RECOVERY - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1015 bytes in 0.014 second response time
[16:30:02] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Create new editing-team mailing list - https://phabricator.wikimedia.org/T196120#4261595 (RobH) Open>Resolved a:RobH I've gone ahead and created this list, setting it to private and requiring approval to join it.  Since it is a private team list, I turned...
[16:31:50] <icinga-wm>	 PROBLEM - toolschecker: All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 255 bytes in 3.592 second response time
[16:33:42] <andrewbogott>	 !log rebooting labvirt1007
[16:33:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:11] <wikibugs>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4261829 (jcrespo) There is a script `operations/software/dbtools/events_sanitarium.sql` that should be checked, updated and d...
[17:40:20] <wikibugs>	 Operations, ops-eqiad, Traffic, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4261830 (Vgutierrez) @Cmjohnson any updates regarding lvs1015?
[17:44:19] <andrewbogott>	 !log rebooting labvirt1010
[17:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:37] <wikibugs>	 (PS1) Phuedx: admin: Replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/437794
[17:50:08] <wikibugs>	 Operations, SRE-Access-Requests, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261877 (RobH)
[17:51:37] <wikibugs>	 Operations, SRE-Access-Requests, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (RobH)
[17:51:42] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, Performance-Team (Radar): Remove unnecessary response headers - https://phabricator.wikimedia.org/T194814#4261895 (Krinkle)
[17:51:48] <wikibugs>	 Operations, SRE-Access-Requests, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261509 (RobH) p:Triage>Normal
[17:52:56] <wikibugs>	 Operations, Phabricator: Phabricator is very slow to load - https://phabricator.wikimedia.org/T196565#4261908 (mmodell) I can't reproduce currently, load average isn't particularly high and phabricator has been snappy fast for a while now. I think we can close this as resolved.
[17:53:19] <wikibugs>	 Operations, SRE-Access-Requests, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261910 (RobH) Any additions to deployers requires approval by both @greg (for RI) plus review in the SRE weekly meetings.    The othe...
[17:55:04] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, Performance-Team (Radar): Remove unnecessary response headers - https://phabricator.wikimedia.org/T194814#4261920 (Krinkle)
[17:57:48] <wikibugs>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4261935 (Marostegui) I do see it is deployed on db1095 and on db1102 on the `ops` database It needs some checking, but I gues...
[17:58:21] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, Performance-Team (Radar): Reduce amount of headers sent from Varnish responses - https://phabricator.wikimedia.org/T194814#4261938 (Krinkle)
[17:58:44] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, Performance-Team (Radar): Reduce amount of headers sent from web responses - https://phabricator.wikimedia.org/T194814#4209672 (Krinkle)
[17:59:22] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2010-{a,b,c} - T178905
[17:59:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:27] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[17:59:44] <andrewbogott>	 !log rebooting labvirt1011
[17:59:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:52] <wikibugs>	 (PS2) Jgreen: DNS: Add prod & mgmt DNS for frmon2001 [dns] - https://gerrit.wikimedia.org/r/437768 (https://phabricator.wikimedia.org/T196476) (owner: Papaul)
[18:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T1800)
[18:00:07] <wikibugs>	 (CR) Jgreen: [V: 1 C: 2] DNS: Add prod & mgmt DNS for frmon2001 [dns] - https://gerrit.wikimedia.org/r/437768 (https://phabricator.wikimedia.org/T196476) (owner: Papaul)
[18:00:14] <wikibugs>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4261959 (Marostegui) It is from 3 years ago...: https://gerrit.wikimedia.org/r/#/q/events_sanitarium.sql
[18:00:28] <logmsgbot>	 !log gilles@deploy1001 Started deploy [performance/navtiming@816e610]: T196528 Funnel performance survey responses from kafka to graphite
[18:00:33] <logmsgbot>	 !log gilles@deploy1001 Finished deploy [performance/navtiming@816e610]: T196528 Funnel performance survey responses from kafka to graphite (duration: 00m 05s)
[18:01:06] <wikibugs>	 (CR) Jgreen: [V: 2 C: 2] DNS: Add prod & mgmt DNS for frmon2001 [dns] - https://gerrit.wikimedia.org/r/437768 (https://phabricator.wikimedia.org/T196476) (owner: Papaul)
[18:03:15] <wikibugs>	 (PS2) Jgreen: DNS: Add prod DNS entries for frbast2001 [dns] - https://gerrit.wikimedia.org/r/437539 (https://phabricator.wikimedia.org/T196417) (owner: Papaul)
[18:04:37] <wikibugs>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4261989 (jcrespo) > if they are really needed anymore  They are needed, a different thing is how much changes they need, but...
[18:04:56] <wikibugs>	 (CR) Jgreen: [C: 2] DNS: Add prod DNS entries for frbast2001 [dns] - https://gerrit.wikimedia.org/r/437539 (https://phabricator.wikimedia.org/T196417) (owner: Papaul)
[18:04:57] <wikibugs>	 (CR) 20after4: [C: 1] Install LFS on scap targets (1 comment) [puppet] - https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: Awight)
[18:05:54] <wikibugs>	 Operations, SRE-Access-Requests, Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4261994 (greg) +1 (yay!)
[18:06:44] <andrewbogott>	 !log rebooting labvirt1012
[18:06:46] <wikibugs>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4262001 (jcrespo) See: T196570
[18:06:48] <wikibugs>	 Operations, Phabricator, User-greg: Phabricator is very slow to load - https://phabricator.wikimedia.org/T196565#4262003 (greg) Open>Resolved a:greg Please reopen if something looks off in the future.
[18:06:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:24] <wikibugs>	 (03PS4) 10Alex Monk: Prepare to tighten Puppet DB access control - check client certificates [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962)
[18:07:27] <wikibugs>	 (03PS2) 10Alex Monk: Tighten Puppet DB access control - check client certificates [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962)
[18:07:31] <wikibugs>	 (03CR) 1020after4: [C: 031] Install LFS on scap targets (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[18:07:56] <wikibugs>	 10Operations, 10Traffic, 10media-storage, 10Patch-For-Review, 10Performance-Team (Radar): Reduce amount of headers sent from web responses - https://phabricator.wikimedia.org/T194814#4209672 (10Vgutierrez) @ema Could we use std.log (VCL_Log) to report X-Analytics data and stop the header from reaching th...
[18:08:16] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4262010 (10RobH)
[18:09:17] <wikibugs>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4262014 (10Marostegui) I definitely think we do not need the `ops` database one on sanitarium hosts, those are probably entries...
[18:12:41] <wikibugs>	 (03PS2) 10RobH: DNS: Add mgmt DNS entries for labtestnet2003 [dns] - 10https://gerrit.wikimedia.org/r/436579 (https://phabricator.wikimedia.org/T196000) (owner: 10Papaul)
[18:13:24] <wikibugs>	 (03CR) 10RobH: [C: 032] DNS: Add mgmt DNS entries for labtestnet2003 [dns] - 10https://gerrit.wikimedia.org/r/436579 (https://phabricator.wikimedia.org/T196000) (owner: 10Papaul)
[18:13:53] <robh>	 well
[18:13:59] <robh>	 someone had dns changes pending on nameserver 
[18:14:23] <andrewbogott>	 !log rebooting labvirt1013
[18:14:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:27] <robh>	 someone had pending dns changes merged but not live, for frmon2001 and such, now live
[18:15:06] <paladox>	 robh maybe this https://gerrit.wikimedia.org/r/437539  ?
[18:15:07] <robh>	 Jeff_Green: ^ patchset you merged in gerrit are now live
[18:15:16] <robh>	 yeah
[18:15:20] <robh>	 found it via git blame ;]
[18:15:27] <Jeff_Green>	 robh thanks
[18:15:41] <robh>	 i saw it was new mgmt entries and assumed it was cool to merge =]
[18:15:52] <Jeff_Green>	 yeah, we were just working on this
[18:19:00] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2003-{a,b,c} - T178905
[18:19:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:05] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[18:20:16] <wikibugs>	 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4262072 (10Marostegui) Nevermind my comments above. They have nothing to do with the sanitarium events.  The ones on the file a...
[18:20:26] <wikibugs>	 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10TechCom-RFC (TechCom-Approved), 10User-ArielGlenn: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4262075 (10Reedy)
[18:20:34] <wikibugs>	 10Operations, 10MediaWiki-Platform-Team, 10Performance-Team, 10MW-1.27-release-notes, and 3 others: php-memcached 3.0 (PHP 7) incompatible with BagOStuff - https://phabricator.wikimedia.org/T196125#4262073 (10Reedy) 05Open>03Resolved
[18:21:01] <andrewbogott>	 !log rebooting labvirt1015
[18:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:32] <andrewbogott>	 !log rebooting labvirt1016
[18:23:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:05] <wikibugs>	 (03PS1) 10RobH: labtestnet2003 install params [puppet] - 10https://gerrit.wikimedia.org/r/437801 (https://phabricator.wikimedia.org/T196000)
[18:26:25] <wikibugs>	 (03PS1) 10Marostegui: events_sanitarium: Update sanitarium hosts [software] - 10https://gerrit.wikimedia.org/r/437802 (https://phabricator.wikimedia.org/T190704)
[18:29:18] <andrewbogott>	 !log rebooting labvirt1017
[18:29:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:49] <marostegui>	 Why do we have duplicate lines from jouncebot every time something is committed to gerrit? i.e. my change or robh's last change
[18:34:06] <robh>	 yeah i just noticed that as well
[18:34:16] <robh>	 well, from wikibugs you mean?
[18:34:30] <Reedy>	 needs rebooting
[18:35:05] <James_F>	 Yeah, probably got a second instance running due to Cloud reboots?
[18:35:20] <Reedy>	 They're on the same name though
[18:35:24] <Reedy>	 two listeners?
[18:35:53] <James_F>	 Could be.
[18:36:16] <andrewbogott>	 !log rebooting labvirt1018, 1021, 1022
[18:36:17] <Reedy>	 Ugh, it's fab deployed?
[18:36:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:30] <robh>	 which is why i have no idea how to fix it
[18:36:40] <Reedy>	 qdel
[18:37:55] <Reedy>	 hurrah
[18:38:43] <andrewbogott>	 !log rebooting labvirt1019, 1020
[18:38:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:50] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2004-{a,b,c} - T178905
[18:54:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:54] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[18:55:19] <herron>	 !log stopped exim on mx1001 in prep for upgrade to stretch
[18:55:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] <Guest24005>	 thcipriani: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T1900).
[19:00:04] <jouncebot>	 thcipriani: Time to snap out of that daydream and deploy MediaWiki train. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T1900).
[19:00:45] * thcipriani does
[19:01:05] <Reedy>	 I see we have multiple jouncebots too
[19:01:37] <James_F>	 Not any more?
[19:01:47] <Reedy>	 James_F: jouncebot and Guest24005?
[19:02:05] <James_F>	 Oh, right, I was going by jouncebot and jouncebot_ earlier.
[19:02:08] <Reedy>	 heh
[19:02:24] <James_F>	 But of course there's a /third/.
[19:06:13] <robh>	 bleh
[19:08:32] <hashar>	 o/
[19:08:59] <hashar>	 thcipriani: i am around!
[19:11:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was rece
[19:16:11] <urandom>	 wth?
[19:16:18] <urandom>	 how is a warning critical?
[19:17:18] <urandom>	 ...and why doesn't that show up in the web ui...and why no email?
[19:18:18] <logmsgbot>	 !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.7
[19:18:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:15] <logmsgbot>	 !log thcipriani@deploy1001 Synchronized php: group1 wikis to 1.32.0-wmf.7 (duration: 00m 56s)
[19:19:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:08] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2008-{a,b,c} - T178905
[19:38:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:16] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[19:47:40] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team: templatetiger is using 827G of 8T available tools nfs storage - https://phabricator.wikimedia.org/T183954#4262294 (10Bstorm) p:05Normal>03High Hello @Kolossos, the NFS is at quite high utilization again, and the number one user is the templatetiger tool d...
[19:53:45] <icinga-wm>	 PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: Traceback (most recent call last)
[19:53:45] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: Traceback (most recent call last)
[19:54:14] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: Traceback (most recent call last)
[19:56:04] <wikibugs>	 (03CR) 10Andrew Bogott: "I'm a bit confused about variable naming... in at least one place it's implied that keystone_host is set in hiera, but elsewhere you're fo" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/437812 (https://phabricator.wikimedia.org/T167559) (owner: 10Rush)
[19:58:14] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 9 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:58:14] <icinga-wm>	 RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 0 probes of 321 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map
[20:00:04] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: That opportune time is upon us again. Time for a Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2000).
[20:00:04] <Guest24005>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: Dear deployers, time to do the Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2000).
[20:00:30] <halfak>	 We're all done with ORES stuff for today :) 
[20:00:45] <subbu>	 huh .. 2 bots?
[20:00:52] <Krenair>	 looking
[20:01:28] <Amir1>	 Aren't we deploying new version of ORES? I want to see the new look on the home page.
[20:01:40] <halfak>	 Oh!  I thought we did
[20:01:51] <halfak>	 We got the drafttopic version out. 
[20:01:58] <halfak>	 Is there a newer version ready for deployment?
[20:02:00] <Krenair>	 hm
[20:02:05] <Krenair>	 oh there it is
[20:02:39] <halfak>	 Amir1, I think we need to get the new homepage on ores-beta first
[20:02:54] <halfak>	 I'll give you a quick review if you want to make that simple change now :) 
[20:03:04] <Amir1>	 yeah sure
[20:03:34] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 7 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[20:04:02] <Krenair>	 jouncebot, next
[20:04:02] <jouncebot>	 In 2 hour(s) and 55 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2300)
[20:10:01] <logmsgbot>	 !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@a07af40]: Update mobileapps to 3bf9be5 (T196402 T195948)
[20:10:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:12] <stashbot>	 T196402: Public rollout of feed content availability endpoint - https://phabricator.wikimedia.org/T196402
[20:10:13] <stashbot>	 T195948: MCS should respect Accept-Language header - https://phabricator.wikimedia.org/T195948
[20:17:27] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2011-{a,b,c} - T178905
[20:17:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:33] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[20:17:52] <wikibugs>	 10Operations, 10ops-codfw, 10Cloud-VPS: move/setup/install labtestnet2003(WMF6469) - https://phabricator.wikimedia.org/T196000#4262353 (10RobH)
[20:18:38] <logmsgbot>	 !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@a07af40]: Update mobileapps to 3bf9be5 (T196402 T195948) (duration: 08m 37s)
[20:18:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:47] <stashbot>	 T196402: Public rollout of feed content availability endpoint - https://phabricator.wikimedia.org/T196402
[20:18:48] <stashbot>	 T195948: MCS should respect Accept-Language header - https://phabricator.wikimedia.org/T195948
[20:18:55] <wikibugs>	 10Operations, 10Cloud-VPS: move/setup/install labtestnet2003(WMF6469) - https://phabricator.wikimedia.org/T196000#4243908 (10RobH) a:05RobH>03chasemp @chasemp,  This system is now ready for cloud team to take over.  Feel free to use or resolve this task as needed.
[20:19:10] <bearND>	 !log rolled back mobileapps deploy
[20:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:57] <wikibugs>	 (03CR) 10Ottomata: [WIP] Allow admin module to ensure system user membership in managed groups (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/379004 (https://phabricator.wikimedia.org/T174465) (owner: 10Ottomata)
[20:20:30] <wikibugs>	 (03CR) 10Chad: [V: 032 C: 032] "We'll deploy 2.15.2, but let's merge this for consistency and since the artifacts already uploaded" [software/gerrit] (stable-2.15) - 10https://gerrit.wikimedia.org/r/436607 (owner: 10Chad)
[20:20:31] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4262364 (10RobH) I forgot to note that @Jdforrester-WMF put 'deployers' but I assume he meant 'deployment'
[20:20:33] <wikibugs>	 (03PS2) 10Ottomata: [WIP] Allow admin module to ensure system user membership in managed groups [puppet] - 10https://gerrit.wikimedia.org/r/379004 (https://phabricator.wikimedia.org/T174465)
[20:21:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Allow admin module to ensure system user membership in managed groups [puppet] - 10https://gerrit.wikimedia.org/r/379004 (https://phabricator.wikimedia.org/T174465) (owner: 10Ottomata)
[20:21:23] <wikibugs>	 (03CR) 10Chad: [V: 032 C: 032] Merge tag 'v2.15.2' into wmf/stable-2.15 [software/gerrit/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/436619 (owner: 10Chad)
[20:22:31] <wikibugs>	 (03PS1) 10RobH: adds jforrester to deployment, deploy-service, & mobileapps-admin groups [puppet] - 10https://gerrit.wikimedia.org/r/437819 (https://phabricator.wikimedia.org/T196566)
[20:23:40] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): Requesting deployment access for jforrester - https://phabricator.wikimedia.org/T196566#4262372 (10RobH)
[20:29:31] <Reedy>	 jouncebot: now
[20:29:31] <jouncebot>	 For the next 0 hour(s) and 30 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2000)
[20:29:31] <jouncebot>	 For the next 0 hour(s) and 30 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T1900)
[20:29:33] <Reedy>	 jouncebot: next
[20:29:34] <jouncebot>	 In 2 hour(s) and 30 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2300)
[20:42:19] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2005-{a,b,c} - T178905
[20:42:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:23] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[20:43:45] <wikibugs>	 10Operations, 10JADE, 10TechCom, 10Scoring-platform-team (Current): Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381#4262445 (10Joe)
[20:44:53] <yannf>	 https://tools.wmflabs.org/add-information/?image=Map-heart-054.jpg broken?
[20:45:00] <wikibugs>	 10Operations, 10JADE, 10Scoring-platform-team (Current), 10User-Joe: Scalability concerns creating a page per revision - https://phabricator.wikimedia.org/T196547#4262449 (10Joe)
[20:45:04] <yannf>	 502 Bad Gateway
[20:45:22] <Reedy>	 yannf: #wikimedia-cloud likely fallout from maintenance/host reboots
[20:47:05] <ebernhardson>	 !log sighup logstash on logstash100[789] to reload config for gerrit.wikimedia.org/r/437657
[20:47:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:29] <wikibugs>	 10Operations, 10Mail, 10Patch-For-Review: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#4262462 (10herron) Planning to proceed with the firewall update and reinstall to Stretch starting at 10a Eastern tomorrow (coordinated over IRC)  In preparation for that, Exim on mx1001 has...
[21:05:26] <yannf>	 Reedy, should I open a report on Phab?
[21:06:20] <Reedy>	 Visit #wikimedia-cloud and mention the tool isn't working. Someone should restart it
[21:07:11] <yannf>	 done
[21:07:43] <yannf>	 ok, it worked ;)
[21:08:04] <yannf>	 *works
[21:26:17] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476#4262540 (10Papaul)
[21:27:49] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476#4257945 (10Papaul) a:05Papaul>03Jgreen @Jgreen  all yours. let me know if you have any questions.
[21:34:27] <wikibugs>	 (03CR) 10Awight: [C: 031] "Thanks for the fixup!" [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[21:40:30] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is CRITICAL: connect to address 10.192.48.48 and port 9042: Connection refused
[21:41:11] <icinga-wm>	 PROBLEM - cassandra-c SSL 10.192.48.48:7001 on restbase2005 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[21:42:08] <wikibugs>	 (03PS1) 10EBernhardson: logstash: typo gelf long_message -> full_message [puppet] - 10https://gerrit.wikimedia.org/r/437864
[21:42:20] <wikibugs>	 (03PS2) 10EBernhardson: logstash: typo gelf long_message -> full_message [puppet] - 10https://gerrit.wikimedia.org/r/437864
[21:42:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] logstash: typo gelf long_message -> full_message [puppet] - 10https://gerrit.wikimedia.org/r/437864 (owner: 10EBernhardson)
[21:46:12] <urandom>	 ^^^ got that
[21:46:41] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is CRITICAL: connect to address 10.192.48.48 and port 9042: Connection refused eevans Cassandra upgrade
[21:46:41] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-c SSL 10.192.48.48:7001 on restbase2005 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused eevans Cassandra upgrade
[21:48:56] <wikibugs>	 (03PS1) 10Chad: Gerrit 2.15.2 wmf build [software/gerrit] (stable-2.15) - 10https://gerrit.wikimedia.org/r/437865
[21:58:40] <icinga-wm>	 RECOVERY - cassandra-c SSL 10.192.48.48:7001 on restbase2005 is OK: SSL OK - Certificate restbase2005-c valid until 2018-08-17 16:12:01 +0000 (expires in 71 days)
[22:00:10] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is OK: TCP OK - 0.030 second response time on 10.192.48.48 port 9042
[22:03:15] <Dereckson>	 jouncebot: next
[22:03:15] <jouncebot>	 In 0 hour(s) and 56 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2300)
[22:16:31] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@0346959]: Update mobileapps to 5ea008c
[22:16:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:22:04] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@0346959]: Update mobileapps to 5ea008c (duration: 05m 33s)
[22:22:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:33:42] <wikibugs>	 (03PS8) 10Awight: Install LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627)
[22:34:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Install LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[22:46:26] <wikibugs>	 (03PS4) 10Krinkle: Swap mediawiki.org to use standard docroot naming scheme [puppet] - 10https://gerrit.wikimedia.org/r/421949 (owner: 10Chad)
[22:50:09] <wikibugs>	 (03CR) 10Awight: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[22:50:55] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Install LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627) (owner: 10Awight)
[22:52:13] <wikibugs>	 (03PS9) 10Awight: Install LFS on scap targets [puppet] - 10https://gerrit.wikimedia.org/r/437719 (https://phabricator.wikimedia.org/T180627)
[22:58:37] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic, 10netops: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030#4262689 (10RobH) a:05RobH>03ayounsi Ok, I'm back onsite today, and I've taken the following steps:  * verified both optics are working by connecting an lc-sc patch to a light meter, the...
[23:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180606T2300).
[23:00:04] <jouncebot>	 MatmaRex: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:27] <MatmaRex>	 hi.
[23:06:17] * legoktm looks around
[23:06:50] <legoktm>	 I'll just do the swat then?
[23:07:41] <Reedy>	 That's what I do when people on the list aren't around :P
[23:07:41] <MatmaRex>	 thanks
[23:07:45] <wikibugs>	 (03PS1) 10Krinkle: mc: Clean up docs and use same format and order between prod and beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[23:08:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mc: Clean up docs and use same format and order between prod and beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[23:09:10] <wikibugs>	 (03CR) 10Krinkle: "This documents in -labs the differences from prod. This is important given that unlike e.g. CommonSettings, one does not load after the ot" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876 (owner: 10Krinkle)
[23:10:57] <wikibugs>	 (03PS2) 10Krinkle: mc: Clean up docs and use same format and order between prod and beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[23:13:17] <wikibugs>	 (03PS1) 10Krinkle: mc-labs: Update wgMemCachedPersistent override [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437877
[23:16:24] <legoktm>	 MatmaRex: it's on mwdebug1002
[23:16:50] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2062 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.29 seconds
[23:16:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2055 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.52 seconds
[23:17:10] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2088 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.66 seconds
[23:17:11] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2071 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.00 seconds
[23:17:11] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.13 seconds
[23:17:21] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2070 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 314.25 seconds
[23:17:21] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2085 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.80 seconds
[23:17:30] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2072 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.16 seconds
[23:17:31] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2092 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 318.39 seconds
[23:17:31] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db2048 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 318.02 seconds
[23:17:43] <Reedy>	 What's up with codfw?
[23:18:55] <legoktm>	 MatmaRex: ugh, I'm gonna have to scap
[23:18:57] <MatmaRex>	 legoktm: looks good except the l10n message is missing
[23:18:59] <MatmaRex>	 yeah
[23:21:22] <logmsgbot>	 !log legoktm@deploy1001 Started scap: Preference for responsive MonoBook, plus set mobile width cutoff to 550px ([[gerrit:437875]], [[gerrit:437814]])
[23:21:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:46] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2009-{a,b,c} - T178905
[23:23:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:50] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[23:24:12] <wikibugs>	 (03PS3) 10Krinkle: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[23:25:22] <wikibugs>	 (03PS4) 10Krinkle: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[23:25:35] <wikibugs>	 (03Abandoned) 10Krinkle: mc-labs: Update wgMemCachedPersistent override [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437877 (owner: 10Krinkle)
[23:26:12] <wikibugs>	 (03PS5) 10Krinkle: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[23:27:34] <wikibugs>	 (03PS6) 10Krinkle: mc-labs: Sync with prod or document differences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437876
[23:51:48] <urandom>	 !log upgrade Cassandra to 3.11.2, restbase2012-{a,b,c} - T178905
[23:51:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:52] <stashbot>	 T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905
[23:54:32] <wikibugs>	 10Operations, 10Beta-Cluster-Infrastructure: confd broken on deployment-redis hosts - https://phabricator.wikimedia.org/T196596#4262770 (10Reedy)
[23:55:21] <wikibugs>	 10Operations, 10Beta-Cluster-Infrastructure: confd broken on deployment-redis hosts - https://phabricator.wikimedia.org/T196596#4262785 (10Reedy) p:05Triage>03High
[23:56:28] <wikibugs>	 10Operations, 10Beta-Cluster-Infrastructure: confd broken on deployment-redis hosts - https://phabricator.wikimedia.org/T196596#4262770 (10Reedy)