[00:00:08] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:19:41] <wikibugs>	 (03PS1) 10Brion VIBBER: Switch in WebM VP9/Opus video transcodes to replace WebM VP8/Vorbis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447572 (https://phabricator.wikimedia.org/T63805)
[00:24:09] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
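Note on the alert pair above: the check reports what share of recent datapoints sits above a threshold. Below is a minimal sketch of that kind of evaluation. It is not the actual Icinga plugin. The thresholds 90.0 and 140.0 and the 30% figure are taken from the alert text; applying the same 30% cutoff to both levels, and the sample data, are assumptions made for illustration.

```python
from typing import Sequence


def percent_above(datapoints: Sequence[float], threshold: float) -> float:
    """Return the percentage of datapoints strictly above the threshold."""
    if not datapoints:
        raise ValueError("no datapoints to evaluate")
    above = sum(1 for value in datapoints if value > threshold)
    return 100.0 * above / len(datapoints)


def classify(
    datapoints: Sequence[float],
    warning: float = 90.0,
    critical: float = 140.0,
    cutoff_percent: float = 30.0,
) -> str:
    """Map a window of datapoints to OK / WARNING / CRITICAL.

    Assumption: the same cutoff percentage applies to both thresholds.
    """
    if percent_above(datapoints, critical) >= cutoff_percent:
        return "CRITICAL"
    if percent_above(datapoints, warning) >= cutoff_percent:
        return "WARNING"
    return "OK"


# Hypothetical window: 7 of 15 datapoints are above 140.0.
sample = [150.0] * 7 + [50.0] * 8
print(f"{percent_above(sample, 140.0):.2f}% -> {classify(sample)}")
```

With this made-up window, 7 of 15 points exceed 140.0, which matches the 46.67% quoted in the PROBLEM line.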
[02:23:24] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.12) (duration: 09m 44s)
[02:23:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:04] <Krinkle>	 robh: Not sure who is best to ask, but I’m still awaiting review of these webperf patches. I would like feedback and/or to land them soon so I can continue with moving arclamp from mwlog1001
[02:25:10] <Krinkle>	 https://gerrit.wikimedia.org/r/#/q/status:open+hashtag:beta-picked+project:operations/puppet+branch:production+topic:webperf
[02:26:32] <Krinkle>	 At this point they should all be no-ops for prod, mostly refactoring to prepare for the next step.
[02:43:18] <icinga-wm>	 PROBLEM - Disk space on maps1001 is CRITICAL: DISK CRITICAL - free space: /srv 54478 MB (3% inode=99%)
[02:52:46] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.13) (duration: 09m 19s)
[02:52:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:02:38] <icinga-wm>	 RECOVERY - Disk space on maps1001 is OK: DISK OK
[03:03:08] <logmsgbot>	 !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Tue Jul 24 03:03:08 UTC 2018 (duration 10m 22s)
[03:03:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:29] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 783.45 seconds
[03:50:38] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 177.34 seconds
[04:44:49] <marostegui>	 !log Deploy schema change on db1066 (s2 primary master) T144010 T51190 T199368
[04:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:44:56] <stashbot>	 T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[04:44:56] <stashbot>	 T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[04:44:59] <stashbot>	 T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[04:49:53] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1081, db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447574 (https://phabricator.wikimedia.org/T200061)
[04:51:23] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1081, db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447574 (https://phabricator.wikimedia.org/T200061) (owner: 10Marostegui)
[04:52:43] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1081, db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447574 (https://phabricator.wikimedia.org/T200061) (owner: 10Marostegui)
[04:53:00] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1081, db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447574 (https://phabricator.wikimedia.org/T200061) (owner: 10Marostegui)
[04:54:04] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1081, db1121 (duration: 00m 56s)
[04:54:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:54:12] <marostegui>	 !log Stop replication in sync on db1081 and db1121
[04:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:55] <marostegui>	 !log Deploy schema change on db1081 T144010 T51190 T199368
[05:06:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:06:01] <stashbot>	 T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[05:06:01] <stashbot>	 T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[05:06:01] <stashbot>	 T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[05:13:15] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Repool db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447577
[05:14:57] <wikibugs>	 (03PS2) 10Marostegui: db-eqiad.php: Repool db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447577
[05:19:33] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447577 (owner: 10Marostegui)
[05:20:52] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447577 (owner: 10Marostegui)
[05:21:05] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Repool db1121 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447577 (owner: 10Marostegui)
[05:25:25] <kart_>	 marostegui: OK to deploy cxserver change? Let me know.
[05:26:15] <marostegui>	 kart_: give me a minute :)
[05:26:24] <kart_>	 Sure
[05:26:26] <marostegui>	 got distracted and didn't deploy my merged change above
[05:27:31] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 54s)
[05:27:31] <marostegui>	 kart_: all yours!
[05:27:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:27:38] <kart_>	 marostegui: cool.
[05:29:52] <logmsgbot>	 !log kartik@deploy1001 Started deploy [cxserver/deploy@d378d27]: Update cxserver to d3c9d15 (T198941)
[05:29:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:29:56] <stashbot>	 T198941: SyntaxError: Unexpected token u in JSON at position 0 - https://phabricator.wikimedia.org/T198941
[05:33:53] <logmsgbot>	 !log kartik@deploy1001 Finished deploy [cxserver/deploy@d378d27]: Update cxserver to d3c9d15 (T198941) (duration: 04m 01s)
[05:33:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:34:26] <kart_>	 marostegui: done.
[05:34:38] <marostegui>	 thanks
[06:32:07] <marostegui>	 !log Deploy schema change on dbstore1002:s4 T144010 T51190 T199368
[06:32:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:13] <stashbot>	 T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[06:32:13] <stashbot>	 T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[06:32:14] <stashbot>	 T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[06:33:56] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Repool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447582
[06:35:35] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447582 (owner: 10Marostegui)
[06:36:45] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447582 (owner: 10Marostegui)
[06:38:17] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Repool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447582 (owner: 10Marostegui)
[06:38:25] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 55s)
[06:38:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:09] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on cp1049 is CRITICAL: 43 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=cp1049&var-datasource=eqiad%2520prometheus%252Fops
[06:47:06] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Promote es1017 as the master of es3-eqiad (instead of es1014) [puppet] - 10https://gerrit.wikimedia.org/r/447584 (https://phabricator.wikimedia.org/T197073)
[06:49:20] <wikibugs>	 (03PS1) 10Jcrespo: Correct es2 and es3 masters on prometheus metrics [puppet] - 10https://gerrit.wikimedia.org/r/447585
[06:50:17] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Correct es2 and es3 masters on prometheus metrics [puppet] - 10https://gerrit.wikimedia.org/r/447585
[06:51:13] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Correct es2 and es3 masters on prometheus metrics [puppet] - 10https://gerrit.wikimedia.org/r/447585 (owner: 10Jcrespo)
[06:53:35] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Promote es1017 as the master of es3-eqiad (instead of es1014) [puppet] - 10https://gerrit.wikimedia.org/r/447584 (https://phabricator.wikimedia.org/T197073)
[06:54:36] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] "Blocking until deployment time." [puppet] - 10https://gerrit.wikimedia.org/r/447584 (https://phabricator.wikimedia.org/T197073) (owner: 10Jcrespo)
[06:56:51] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Promote es1017 as the master of es3-eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447586 (https://phabricator.wikimedia.org/T197073)
[06:59:35] <wikibugs>	 (03CR) 10Marostegui: [C: 031] "https://puppet-compiler.wmflabs.org/compiler02/11835/" [puppet] - 10https://gerrit.wikimedia.org/r/447584 (https://phabricator.wikimedia.org/T197073) (owner: 10Jcrespo)
[07:06:23] <wikibugs>	 (03PS2) 10Elukey: profile::kafka::broker: raise default max open files to 128k [puppet] - 10https://gerrit.wikimedia.org/r/447389 (https://phabricator.wikimedia.org/T200177)
[07:07:23] <marostegui>	 elukey: dbstore1002 is misbehaving due to a schema change that is probably overloading it
[07:07:26] <marostegui>	 I am on it
[07:07:49] <elukey>	 thanks :(
[07:08:04] <marostegui>	 Big schema changes there are a pain :(
[07:12:18] <wikibugs>	 (03PS1) 10Jcrespo: Setup es1017 as the backend for the es3-eqiad master [dns] - 10https://gerrit.wikimedia.org/r/447587 (https://phabricator.wikimedia.org/T197073)
[07:13:03] <wikibugs>	 (03CR) 10Marostegui: "The change looks good to me, the commit message looks a bit strange to me though" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447586 (https://phabricator.wikimedia.org/T197073) (owner: 10Jcrespo)
[07:13:35] <marostegui>	 elukey: Not much we can do now, just let the ALTER finish
[07:14:01] <wikibugs>	 (03CR) 10Marostegui: [C: 031] Setup es1017 as the backend for the es3-eqiad master [dns] - 10https://gerrit.wikimedia.org/r/447587 (https://phabricator.wikimedia.org/T197073) (owner: 10Jcrespo)
[07:16:28] <wikibugs>	 (03CR) 10Volans: "I left some minor comments in the Python file" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson)
[07:23:09] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/11836/" [puppet] - 10https://gerrit.wikimedia.org/r/447389 (https://phabricator.wikimedia.org/T200177) (owner: 10Elukey)
[07:42:40] <wikibugs>	 (03CR) 10Volans: "Forgot to add one comment, see inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson)
[08:20:54] <wikibugs>	 (03CR) 10Elukey: [C: 032] profile::kafka::broker: raise default max open files to 128k [puppet] - 10https://gerrit.wikimedia.org/r/447389 (https://phabricator.wikimedia.org/T200177) (owner: 10Elukey)
[08:21:37] <elukey>	 !log rolling restart of kafka jumbo/main-(eqiad|codfw) clusters to pick up the new max open files limit (infinity -> 128k)
[08:21:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
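Note on the limit change above: a process's open-file limit can be read back after a restart. Below is a minimal sketch using Python's standard `resource` module. It inspects only the process that runs it, not a Kafka broker, and reading "128k" as 128 * 1024 is an assumption. How the brokers were actually verified is not shown in this log.

```python
import resource

# Assumption: "128k" means 128 * 1024 = 131072 file descriptors.
EXPECTED_NOFILE = 128 * 1024


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit: int) -> str:
    """Render a limit, mapping RLIM_INFINITY to the word 'infinity'."""
    return "infinity" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nofile_limits()
print(f"soft={describe(soft)} hard={describe(hard)} expected={EXPECTED_NOFILE}")
```

On Linux, the limits of another running process are listed in /proc/<pid>/limits.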
[08:38:05] <ema>	 !log restart varnish-fe on cache_text instances with cold, labeled VCL T200207
[08:38:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:09] <stashbot>	 T200207: Discard of cold labeled VCL crashes varnish parent and child  - https://phabricator.wikimedia.org/T200207
[08:47:18] <wikibugs>	 10Operations, 10ops-eqiad, 10monitoring: rack/setup/install graphite1004 - https://phabricator.wikimedia.org/T196484 (10fgiunchedi) Thanks for the update @Cmjohnson, not particularly urgent but it would be nice to have graphite1004 before the end of the quarter
[08:58:37] <wikibugs>	 10Operations, 10Traffic: Discard of cold, labeled VCL crashes varnish parent and child  - https://phabricator.wikimedia.org/T200207 (10ema)
[09:13:32] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447592
[09:15:02] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447592 (owner: 10Marostegui)
[09:16:10] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447592 (owner: 10Marostegui)
[09:17:37] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 (duration: 00m 54s)
[09:17:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:56] <marostegui>	 !log Deploy schema change on db1097:3314 T144010 T51190 T199368
[09:18:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:01] <stashbot>	 T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[09:18:02] <stashbot>	 T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[09:18:02] <stashbot>	 T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[09:18:04] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447592 (owner: 10Marostegui)
[09:19:10] <wikibugs>	 (03PS1) 10Ema: Revert "Revert "cache_text: add support for alternate_domains"" [puppet] - 10https://gerrit.wikimedia.org/r/447593 (https://phabricator.wikimedia.org/T164609)
[09:19:25] <wikibugs>	 (03PS2) 10Ema: Revert "Revert "cache_text: add support for alternate_domains"" [puppet] - 10https://gerrit.wikimedia.org/r/447593 (https://phabricator.wikimedia.org/T164609)
[09:20:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "cache_text: add support for alternate_domains"" [puppet] - 10https://gerrit.wikimedia.org/r/447593 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[09:22:55] <wikibugs>	 (03PS3) 10Ema: Revert "Revert "cache_text: add support for alternate_domains"" [puppet] - 10https://gerrit.wikimedia.org/r/447593 (https://phabricator.wikimedia.org/T164609)
[09:30:19] <wikibugs>	 (03PS2) 10DCausse: Upgrade to 6.3.1-alpha1 (without hebrew) [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/446869 (https://phabricator.wikimedia.org/T199791)
[09:37:25] <wikibugs>	 (03PS1) 10Jcrespo: switchover: Make posible replica migration an optional, separate step [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447595 (https://phabricator.wikimedia.org/T199224)
[09:37:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] switchover: Make posible replica migration an optional, separate step [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447595 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[09:47:20] <wikibugs>	 (03PS2) 10Jcrespo: switchover: Make posible replica migration an optional, separate step [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447595 (https://phabricator.wikimedia.org/T199224)
[09:59:11] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10Services (watching): Set a proper max open files limit for Kafka clusters - https://phabricator.wikimedia.org/T200177 (10mobrovac)
[10:09:55] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] switchover: Make posible replica migration an optional, separate step [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447595 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[10:17:32] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447596
[10:23:01] <wikibugs>	 (03PS1) 10Filippo Giunchedi: mtail: gather metrics on systemd respawns [puppet] - 10https://gerrit.wikimedia.org/r/447597 (https://phabricator.wikimedia.org/T147923)
[10:30:19] <wikibugs>	 (03PS2) 10Filippo Giunchedi: mtail: gather metrics on systemd respawns [puppet] - 10https://gerrit.wikimedia.org/r/447597 (https://phabricator.wikimedia.org/T147923)
[10:30:25] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] mtail: gather metrics on systemd respawns [puppet] - 10https://gerrit.wikimedia.org/r/447597 (https://phabricator.wikimedia.org/T147923) (owner: 10Filippo Giunchedi)
[10:32:04] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10Services (watching): Set a proper max open files limit for Kafka clusters - https://phabricator.wikimedia.org/T200177 (10elukey) 05Open>03Resolved
[10:38:24] <wikibugs>	 (03PS1) 10Filippo Giunchedi: syslog: add systemd.mtail [puppet] - 10https://gerrit.wikimedia.org/r/447598 (https://phabricator.wikimedia.org/T147923)
[10:38:34] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] syslog: add systemd.mtail [puppet] - 10https://gerrit.wikimedia.org/r/447598 (https://phabricator.wikimedia.org/T147923) (owner: 10Filippo Giunchedi)
[10:39:38] <wikibugs>	 (03PS2) 10Filippo Giunchedi: syslog: add systemd.mtail [puppet] - 10https://gerrit.wikimedia.org/r/447598 (https://phabricator.wikimedia.org/T147923)
[10:40:57] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447599
[10:42:20] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447599 (owner: 10Marostegui)
[10:43:35] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447599 (owner: 10Marostegui)
[10:44:54] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 55s)
[10:44:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:59] <marostegui>	 !log Stop replication in sync on db1084 and db1097:3314
[10:45:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:10] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447599 (owner: 10Marostegui)
[10:48:48] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Repool db1084, db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447601
[11:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T1100).
[11:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[11:01:01] <wikibugs>	 (03PS1) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[11:01:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[11:02:34] <wikibugs>	 (03PS2) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[11:02:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[11:04:34] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447604
[11:06:30] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447604 (owner: 10Marostegui)
[11:07:35] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447604 (owner: 10Marostegui)
[11:07:58] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447604 (owner: 10Marostegui)
[11:08:25] <wikibugs>	 (03Abandoned) 10Marostegui: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447596 (owner: 10Marostegui)
[11:08:35] <wikibugs>	 (03PS2) 10Marostegui: db-eqiad.php: Repool db1084, db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447601
[11:08:56] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 01m 06s)
[11:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:22] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1084, db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447601 (owner: 10Marostegui)
[11:11:31] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1084, db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447601 (owner: 10Marostegui)
[11:12:10] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Repool db1084, db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447601 (owner: 10Marostegui)
[11:12:51] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 55s)
[11:12:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:02] <Amir1>	 !log start of ladsgroup@mwmaint1001:~$ foreachwikiindblist s2 populateChangeTagDef.php --sleep 2 (T193873)
[11:19:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:06] <stashbot>	 T193873: Run maintenance script to populate change_tag_def on WMF production (all wikis) - https://phabricator.wikimedia.org/T193873
[11:19:47] <Yann_>	 https://phabricator.wikimedia.org/T200121
[11:20:10] <Yann_>	 this is a serious issue, some files are lost :((
[11:21:40] <Amir1>	 !log start of ladsgroup@mwmaint1001:~$ foreachwikiindblist s1 populateChangeTagDef.php (T193873)
[11:21:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:58] <wikibugs>	 (03PS3) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[11:31:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[11:34:28] <wikibugs>	 (03PS1) 10Zfilipin: Group0 to 1.32.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447607
[11:35:26] <ema>	 !log disable puppet on cp-text hosts to merge alternate domains patch T164609
[11:35:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:30] <stashbot>	 T164609: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609
[11:36:29] <wikibugs>	 (03CR) 10Ema: [C: 032] Revert "Revert "cache_text: add support for alternate_domains"" [puppet] - 10https://gerrit.wikimedia.org/r/447593 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[11:36:37] <wikibugs>	 (03PS4) 10Ema: Revert "Revert "cache_text: add support for alternate_domains"" [puppet] - 10https://gerrit.wikimedia.org/r/447593 (https://phabricator.wikimedia.org/T164609)
[11:37:25] <wikibugs>	 (03PS4) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[11:37:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[11:41:33] <wikibugs>	 (03PS5) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[11:42:00] <logmsgbot>	 !log zfilipin@deploy1001 Started scap: testwiki to php-1.32.0-wmf.14 and rebuild l10n cache
[11:42:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:14] <wikibugs>	 (03CR) 10Jcrespo: "This is still untested, but it will handle automatically (but without puppet) the stop and kill of heartbeat." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[11:43:19] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[11:45:42] <logmsgbot>	 !log zfilipin@deploy1001 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_2212739269" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 03m 42s)
[11:45:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:33] <logmsgbot>	 !log zfilipin@deploy1001 Started scap: testwiki to php-1.32.0-wmf.14 and rebuild l10n cache
[11:55:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:23] <logmsgbot>	 !log zfilipin@deploy1001 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4179557944" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 02m 50s)
[11:58:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
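Note on the two failures above: `CalledProcessError` is the exception Python's `subprocess` module raises when a checked command exits with a non-zero status. Below is a minimal sketch of how a wrapper can surface it. The `run_step` helper and the stand-in command are hypothetical, and nothing here reflects scap's actual internals.

```python
import subprocess
import sys


def run_step(command):
    """Run a command; return (ok, message) instead of raising on failure."""
    try:
        subprocess.check_call(command)
    except subprocess.CalledProcessError as error:
        # error.returncode holds the child's exit status (1 in the log above).
        return False, f"step failed with exit status {error.returncode}"
    return True, "step completed"


# Hypothetical stand-in for the failing maintenance command: exits with 1.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(run_step(failing))
```

The exit status alone does not say why the command failed; that needs the command's own output.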
[12:00:05] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T1200)
[12:00:31] <wikibugs>	 (03PS6) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[12:03:15] <wikibugs>	 (03PS7) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[12:06:40] <zeljkof>	 kart_: re T199941, is it resolved? or just no longer blocking the train?
[12:06:41] <stashbot>	 T199941: Fatal MWException in Babel: "Language::isValidBuiltInCode must be passed a string"  - https://phabricator.wikimedia.org/T199941
[12:07:53] <ema>	 !log depool cp1067 to test alternate domains patch T164609
[12:07:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:56] <stashbot>	 T164609: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609
[12:11:49] <gehel>	 !log vacuum full of postgres on maps1001 to try to reclaim space - T200228
[12:12:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:17] <stashbot>	 T200228: disk space alert on maps1001 - https://phabricator.wikimedia.org/T200228
[12:22:21] <kart_>	 zeljkof: I tried to reproduce it as originally described in the ticket, and it is not happening, e.g. at: https://test.wikipedia.org/wiki/User:KartikMistry
[12:26:12] <zeljkof>	 kart_: can you confirm it's resolved? or should somebody else confirm it? Krinkle?
[12:27:44] <kart_>	 zeljkof: Krinkle is a better person to confirm.
[12:28:49] <zeljkof>	 kart_: thanks, this is the only thing blocking last week's train and I am trying to get it resolved as quickly as possible. We should start with .14 today, and .13 is still blocked :/
[12:52:25] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447611
[12:53:32] <wikibugs>	 (03CR) 10BBlack: Serve WebP variants for the hottest thumbnails (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434055 (https://phabricator.wikimedia.org/T27611) (owner: 10Gilles)
[12:53:49] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447611 (owner: 10Marostegui)
[12:55:02] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447611 (owner: 10Marostegui)
[12:56:14] <marostegui>	 zeljkof: I was hoping to deploy right before the train, but there is an uncommitted change in mediawiki-staging on deploy1001
[12:56:18] <marostegui>	 	modified:   wikiversions.json
[12:58:11] <thcipriani>	 marostegui: fixed, train stuff
[12:58:11] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447611 (owner: 10Marostegui)
[12:58:27] <marostegui>	 ok thanks! deploying!
[12:59:36] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 (duration: 00m 55s)
[12:59:37] <marostegui>	 all done - thanks zeljkof
[12:59:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:04] <jouncebot>	 hashar: That opportune time is upon us again. Time for a MediaWiki train - European version deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T1300).
[13:01:48] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1330467944
[13:02:08] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 16783224
[13:02:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 789072856
[13:02:44] <volans>	 gehel: ^^^
[13:03:19] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 865520
[13:03:38] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 587272
[13:03:59] <gehel>	 yep, that's clearly me!
[13:04:03] <gehel>	 volans: thanks!
[13:05:09] <volans>	 yw :)
[13:05:18] <volans>	 hopefully nothing major
[13:05:54] <gehel>	 volans: nope, vacuuming the tables to see if there is recoverable space (and it looks like there is)
[13:06:13] <volans>	 ack, nice
[13:06:19] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2256
[13:06:34] <gehel>	 I should have downtimed that alert (now done)
[13:07:40] <ema>	 !log repool cp1067 with alternate domains support T164609
[13:07:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:43] <stashbot>	 T164609: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609
[13:15:57] <wikibugs>	 (03PS10) 10Ema: cache_text: load misc VCL as wikimedia_misc in VTC files [puppet] - 10https://gerrit.wikimedia.org/r/443930 (https://phabricator.wikimedia.org/T164609)
[13:16:42] <wikibugs>	 (03CR) 10Ema: [C: 032] cache_text: load misc VCL as wikimedia_misc in VTC files [puppet] - 10https://gerrit.wikimedia.org/r/443930 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[13:17:01] <wikibugs>	 (03PS8) 10Ema: cache_text: add misc-specific VTC tests [puppet] - 10https://gerrit.wikimedia.org/r/443974 (https://phabricator.wikimedia.org/T164609)
[13:17:38] <wikibugs>	 (03CR) 10Ema: [C: 032] cache_text: add misc-specific VTC tests [puppet] - 10https://gerrit.wikimedia.org/r/443974 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[13:18:38] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2036 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:19:23] <zeljkof>	 marostegui: sorry, just saw this, I was on lunch, train is blocked :/
[13:28:50] <marostegui>	 Ah :|
[13:28:55] <marostegui>	 Still?
[13:29:06] <marostegui>	 Then I will deploy something else I think :)
[13:31:22] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447615
[13:32:27] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447615 (owner: 10Marostegui)
[13:33:32] <wikibugs>	 (03PS2) 10Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447615
[13:34:45] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447615 (owner: 10Marostegui)
[13:35:52] <wikibugs>	 (03PS1) 10Zfilipin: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447617
[13:35:54] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447617 (owner: 10Zfilipin)
[13:35:56] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447615 (owner: 10Marostegui)
[13:37:10] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447617 (owner: 10Zfilipin)
[13:38:09] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 01m 59s)
[13:38:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:13] <marostegui>	 !log Stop replication in sync db1081 and db1103:3314
[13:38:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:41] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447615 (owner: 10Marostegui)
[13:40:34] <marostegui>	 !log Deploy schema change on db1081 T144010 T51190 T199368
[13:40:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:40] <stashbot>	 T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[13:40:41] <stashbot>	 T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[13:40:41] <stashbot>	 T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[13:42:28] <marostegui>	 !log Stop replication in sync db1084 and db1103:3314
[13:42:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:34] <marostegui>	 !log Deploy schema change on db1084 T144010 T51190 T199368
[13:43:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:51] <ema>	 !log apply alternate domains patch to text-eqiad T164609
[13:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:55] <stashbot>	 T164609: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609
[13:53:22] <wikibugs>	 (03CR) 10Marostegui: switchover: Add the functionality to start and stop heartbeat (031 comment) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[13:58:52] <wikibugs>	 (03PS1) 10Zfilipin: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447620 (https://phabricator.wikimedia.org/T191059)
[13:59:09] <icinga-wm>	 PROBLEM - Varnish frontend child restarted on cp1068 is CRITICAL: 4 gt 3 https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp1068&var-datasource=eqiad+prometheus/ops
[14:00:20] <ema>	 known ^
[14:03:23] <zeljkof>	 Krinkle: I have almost deployed .13 everywhere, but I see you have added T200269 to blockers of T191059
[14:03:24] <stashbot>	 T191059: 1.32.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T191059
[14:03:24] <stashbot>	 T200269: Unable to undelete revision (Fatal error: given Title does not belong to page ID, RevisionStoreRecord) - https://phabricator.wikimedia.org/T200269
[14:03:42] <zeljkof>	 well, this train is going well :/
[14:03:55] <Krinkle>	 zeljkof: Sorry :/
[14:04:28] <Krinkle>	 Either these are really difficult problems, or we're understaffed, or it seems people aren't working on the UBNs.
[14:04:35] <zeljkof>	 Krinkle: thanks for the report :) I just want this train to finish, somehow
[14:04:39] <Krinkle>	 Yeah
[14:05:09] <zeljkof>	 or somehow we don't test important things before train
[14:09:29] <marostegui>	 !log Deploy schema change on db1103:3314 T144010 T51190 T199368
[14:09:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:35] <stashbot>	 T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[14:09:35] <stashbot>	 T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[14:09:35] <stashbot>	 T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[14:14:59] <icinga-wm>	 ACKNOWLEDGEMENT - Device not healthy -SMART- on db1069 is CRITICAL: cluster=mysql device=megaraid,0 instance=db1069:9100 job=node site=eqiad Jcrespo https://phabricator.wikimedia.org/T199056 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1069&var-datasource=eqiad%2520prometheus%252Fops
[14:19:54] <wikibugs>	 (03Abandoned) 10Zfilipin: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447620 (https://phabricator.wikimedia.org/T191059) (owner: 10Zfilipin)
[14:21:31] <wikibugs>	 (03PS1) 10Zhuyifei1999: Add libmysqlclient-dev to python 3 base docker image [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/447622 (https://phabricator.wikimedia.org/T190274)
[14:22:03] <wikibugs>	 (03PS1) 10Zfilipin: Revert "all wikis to 1.32.0-wmf.13" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447623 (https://phabricator.wikimedia.org/T191059)
[14:22:28] <wikibugs>	 (03PS1) 10Zhuyifei1999: Add .gitreview [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/447625
[14:23:55] <wikibugs>	 (03PS2) 10Zfilipin: Group 2 back to php-1.32.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447623 (https://phabricator.wikimedia.org/T191059)
[14:24:21] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] Group 2 back to php-1.32.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447623 (https://phabricator.wikimedia.org/T191059) (owner: 10Zfilipin)
[14:24:29] <wikibugs>	 (03CR) 10Thcipriani: [C: 031] Group 2 back to php-1.32.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447623 (https://phabricator.wikimedia.org/T191059) (owner: 10Zfilipin)
[14:25:35] <wikibugs>	 (03Merged) 10jenkins-bot: Group 2 back to php-1.32.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447623 (https://phabricator.wikimedia.org/T191059) (owner: 10Zfilipin)
[14:34:38] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[14:42:59] <dcausse>	 looking ^
[14:47:51] <dcausse>	 !log T156137: banning elastic1031 due to high load (same "getEntryAfterMiss" symptoms)
[14:47:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:55] <stashbot>	 T156137: Reduce impact of GC pauses on elasticsearch response time - https://phabricator.wikimedia.org/T156137
[14:48:57] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447626
[14:50:46] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447626 (owner: 10Marostegui)
[14:51:44] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447626 (owner: 10Marostegui)
[14:53:00] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 01m 02s)
[14:53:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:32] <wikibugs>	 (03PS1) 10Gehel: maps: disable OSM updates on eqiad while vacuum is running [puppet] - 10https://gerrit.wikimedia.org/r/447627
[14:57:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] maps: disable OSM updates on eqiad while vacuum is running [puppet] - 10https://gerrit.wikimedia.org/r/447627 (owner: 10Gehel)
[14:58:07] <wikibugs>	 (03PS2) 10Gehel: maps: disable OSM updates on eqiad while vacuum is running [puppet] - 10https://gerrit.wikimedia.org/r/447627 (https://phabricator.wikimedia.org/T200228)
[15:08:09] <wikibugs>	 (03PS6) 10Vgutierrez: WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717)
[15:09:39] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[15:10:00] <wikibugs>	 (03PS1) 10Andrew Bogott: Restrict cloud dns recursors to $LABS_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/447632
[15:10:34] <wikibugs>	 (03PS8) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[15:10:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Restrict cloud dns recursors to $LABS_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/447632 (owner: 10Andrew Bogott)
[15:11:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: 10Vgutierrez)
[15:11:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[15:12:30] <wikibugs>	 (03PS2) 10Andrew Bogott: Restrict cloud dns recursors to $LABS_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/447632
[15:20:16] <wikibugs>	 (03PS9) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[15:27:36] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Restrict cloud dns recursors to $LABS_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/447632 (owner: 10Andrew Bogott)
[15:29:20] <gehel>	 !log restart postgres on maps1001 - T200228
[15:29:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:24] <stashbot>	 T200228: disk space alert on maps1001 - https://phabricator.wikimedia.org/T200228
[15:33:42] <wikibugs>	 (03CR) 10Mholloway: [C: 031] maps: disable OSM updates on eqiad while vacuum is running [puppet] - 10https://gerrit.wikimedia.org/r/447627 (https://phabricator.wikimedia.org/T200228) (owner: 10Gehel)
[15:34:29] <icinga-wm>	 RECOVERY - Recursive DNS on 208.80.153.78 is OK: DNS OK: 0.240 seconds response time. www.wikipedia.org returns 208.80.154.224
[15:37:22] <wikibugs>	 (03PS1) 10Andrew Bogott: labservices: typo fix in heira [puppet] - 10https://gerrit.wikimedia.org/r/447634
[15:37:59] <wikibugs>	 (03PS10) 10Jcrespo: switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224)
[15:38:13] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] labservices: typo fix in heira [puppet] - 10https://gerrit.wikimedia.org/r/447634 (owner: 10Andrew Bogott)
[15:38:55] <jynus>	 !log stopping puppet on es2017, es2018; changing mysql configuration for production testing
[15:38:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:27] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Repool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447635
[15:42:59] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447635 (owner: 10Marostegui)
[15:44:50] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447635 (owner: 10Marostegui)
[15:45:39] <jynus>	 es2018 and es2019 will alert of replica lag
[15:45:43] <jynus>	 this is expected
[15:46:03] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 01m 02s)
[15:46:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:13] <jynus>	 it is part of the test I am doing (disconnected both datacenters to make sure they do not affect the primary dc)
[15:48:28] <wikibugs>	 (03PS7) 10Vgutierrez: WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717)
[15:48:51] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[15:49:05] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] switchover: Add the functionality to start and stop heartbeat [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447603 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[15:49:12] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: 10Vgutierrez)
[15:55:48] <jynus>	 !log test switchover from es2017 to es2018
[15:55:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:38] <icinga-wm>	 PROBLEM - MariaDB Slave IO: es3 on es2017 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 1275, Errmsg: error connecting to master repl@es2018.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Server is running in --secure-auth mode, but repl@10.192.0.142 has a password in the old format: please change the password to the new format
[15:59:17] <dcausse>	 !log T156137: restarting elasticsearch on elastic1031 to disable G1GC
[15:59:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:21] <stashbot>	 T156137: Reduce impact of GC pauses on elasticsearch response time - https://phabricator.wikimedia.org/T156137
[15:59:29] <marostegui>	 We are fixing that alert
[15:59:46] <marostegui>	 It was part of a test
[16:00:05] <jouncebot>	 godog, moritzm, and _joe_: Your horoscope predicts another unfortunate Puppet SWAT(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T1600).
[16:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:01:18] <wikibugs>	 (03PS8) 10Vgutierrez: WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717)
[16:01:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: 10Vgutierrez)
[16:01:58] <icinga-wm>	 RECOVERY - MariaDB Slave IO: es3 on es2017 is OK: OK slave_io_state Slave_IO_Running: Yes
[16:02:46] <wikibugs>	 10Operations, 10netops: OSPF metrics - https://phabricator.wikimedia.org/T200277 (10ayounsi) p:05Triage>03Normal
[16:02:47] <dcausse>	 !log T156137: unbanning elastic1031
[16:02:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:49] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: es3 on es2017 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1468.56 seconds
[16:07:07] <marostegui>	 ^ that is part of a test
[16:07:42] <jynus>	 !log test switchover from es2018 to es2017
[16:07:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:59] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: es3 on es2017 is OK: OK slave_sql_lag not a slave
[16:08:01] <jynus>	 SUCCESS: Master switch completed successfully
[16:08:25] <jynus>	 it took a bit more, but I am doing cross-dc commands (around 4-5 seconds)
[16:20:14] <wikibugs>	 (03PS2) 10Bstorm: gridengine: try to translate all the Ubuntu package calls to Debian [puppet] - 10https://gerrit.wikimedia.org/r/447561 (https://phabricator.wikimedia.org/T199276)
[16:21:32] <wikibugs>	 (03PS1) 10Anomie: Set MCR write-both-read-old on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447638 (https://phabricator.wikimedia.org/T197817)
[16:21:49] <wikibugs>	 (03PS1) 10Anomie: Set MCR read-old-write-both on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447639 (https://phabricator.wikimedia.org/T198311)
[16:21:51] <wikibugs>	 (03PS1) 10Anomie: Set MCR read-new-write-both on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447640 (https://phabricator.wikimedia.org/T198311)
[16:28:39] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: es3 on es2018 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 2776.41 seconds
[16:29:09] <Amir1>	 anomie: it's taking so long to merge :/
[16:29:26] <Amir1>	 ETA 6 minutes
[16:29:42] <wikibugs>	 (03PS1) 10Ema: cache_canary: add phabricator for testing purposes [puppet] - 10https://gerrit.wikimedia.org/r/447643 (https://phabricator.wikimedia.org/T164609)
[16:34:02] <wikibugs>	 (03PS2) 10Ema: cache_canary: add phabricator for testing purposes [puppet] - 10https://gerrit.wikimedia.org/r/447643 (https://phabricator.wikimedia.org/T164609)
[16:37:00] <wikibugs>	 (03CR) 10Ema: [C: 032] cache_canary: add phabricator for testing purposes [puppet] - 10https://gerrit.wikimedia.org/r/447643 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[16:38:05] <zeljkof>	 Amir1: it's merged, right?
[16:38:15] <Amir1>	 merged right now
[16:38:21] <Amir1>	 pulling it in mwdebug1002
[16:39:57] <Amir1>	 anomie: It's live in mwdebug1002, can you test it there?
[16:40:13] <anomie>	 Amir1: Worked.
[16:41:49] <Amir1>	 anomie: Thanks, syncing
[16:42:45] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.32.0-wmf.13/includes/page/PageArchive.php: [[gerrit:447636|PageArchive: Pass correct overrides to newRevisionFromArchiveRow() (T200072)]] (duration: 01m 03s)
[16:42:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:49] <stashbot>	 T200072: MCR causes Cognate integration test fails: The given Title does not belong to page ID 2 but actually belongs to 4 - https://phabricator.wikimedia.org/T200072
[16:43:22] <Amir1>	 zeljkof: It's deployed now
[16:43:24] <wikibugs>	 (03PS2) 10EBernhardson: Split elasticsearch::log::hot_threads into two pieces [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351)
[16:43:26] <wikibugs>	 (03CR) 10EBernhardson: Split elasticsearch::log::hot_threads into two pieces (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson)
[16:43:38] <zeljkof>	 Amir1: thanks!
[16:43:40] <Amir1>	 merging and deploying the .14 atm
[16:44:21] <wikibugs>	 (03PS1) 10Ema: cache_text: disable all alternate_domains but grafana [puppet] - 10https://gerrit.wikimedia.org/r/447646 (https://phabricator.wikimedia.org/T164609)
[16:45:40] <wikibugs>	 10Operations, 10ops-eqiad, 10User-fgiunchedi: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi Host is back in service
[16:49:55] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Fix heartbeat regex [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447648 (https://phabricator.wikimedia.org/T199224)
[16:54:31] <wikibugs>	 (03PS1) 10Dzahn: planet: fix missing language in link element [puppet] - 10https://gerrit.wikimedia.org/r/447649 (https://phabricator.wikimedia.org/T198680)
[16:54:35] <wikibugs>	 (03PS2) 10Ema: cache_text: disable all alternate domains but config-master [puppet] - 10https://gerrit.wikimedia.org/r/447646 (https://phabricator.wikimedia.org/T164609)
[16:55:53] <wikibugs>	 (03PS1) 10Zfilipin: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650
[16:55:55] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650 (owner: 10Zfilipin)
[16:56:33] <wikibugs>	 (03Abandoned) 10Zfilipin: Group0 to 1.32.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447607 (owner: 10Zfilipin)
[16:56:56] <wikibugs>	 (03CR) 10Zfilipin: [C: 04-2] all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650 (owner: 10Zfilipin)
[16:57:10] <wikibugs>	 (03CR) 10Zfilipin: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650 (owner: 10Zfilipin)
[16:58:00] <jynus>	 !log finishing test on es3 hosts T199224
[16:58:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:58:04] <stashbot>	 T199224: Test database master switchover script on codfw - https://phabricator.wikimedia.org/T199224
[16:58:18] <wikibugs>	 (03PS3) 10Ema: cache_text: disable all alternate domains [puppet] - 10https://gerrit.wikimedia.org/r/447646 (https://phabricator.wikimedia.org/T164609)
[16:58:19] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: es3 on es2018 is OK: OK slave_sql_lag Replication lag: 0.16 seconds
[16:58:20] <jynus>	 alerts will end as soon as replicas catch up
[16:59:45] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Fix heartbeat regex [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447648 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / Parsoid / Citoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T1700).
[17:00:49] <icinga-wm>	 PROBLEM - Disk space on maps1001 is CRITICAL: DISK CRITICAL - free space: /srv 54522 MB (3% inode=99%)
[17:00:59] <wikibugs>	 (03PS2) 10Zfilipin: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650
[17:01:15] <jynus>	 gehel: ^
[17:02:56] <gehel>	 jynus: thanks!
[17:03:29] <jynus>	 is that normal?
[17:03:42] <jynus>	 not normal, but something you were aware of?
[17:04:09] <gehel>	 jynus: not normal, but I'm aware and working on it
[17:04:18] <gehel>	 I'm silencing it for now
[17:05:58] <wikibugs>	 (03CR) 10Thcipriani: [C: 031] all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650 (owner: 10Zfilipin)
[17:06:55] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.32.0-wmf.14/includes/page/PageArchive.php: [[gerrit:447636|PageArchive: Pass correct overrides to newRevisionFromArchiveRow() (T200072)]] (duration: 01m 01s)
[17:06:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:58] <stashbot>	 T200072: MCR causes Cognate integration test fails: The given Title does not belong to page ID 2 but actually belongs to 4 - https://phabricator.wikimedia.org/T200072
[17:07:11] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650 (owner: 10Zfilipin)
[17:08:27] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447650 (owner: 10Zfilipin)
[17:13:13] <wikibugs>	 (03CR) 10Ema: [C: 032] cache_text: disable all alternate domains [puppet] - 10https://gerrit.wikimedia.org/r/447646 (https://phabricator.wikimedia.org/T164609) (owner: 10Ema)
[17:14:02] <thcipriani>	 Amir1: the fix you link in your sync is for wmf.13, but you sync'd wmf.14, is that correct?
[17:14:30] <Amir1>	 thcipriani: I just sync'd the .14 fix; it's in SAL
[17:14:47] <thcipriani>	 I meant gerrit 447636
[17:14:58] <thcipriani>	 https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/447636/
[17:15:05] <thcipriani>	 seems to be for wmf.13
[17:15:26] <Amir1>	 ah, yeah, I forgot to change the deploy summary, sorry
[17:15:35] <thcipriani>	 ah, ok
[17:15:43] <thcipriani>	 wmf.14 isn't deployed anywhere yet
[17:16:01] <thcipriani>	 so we're wrangling it on the deployment servers now
[17:16:01] <Amir1>	 just to be sure :D
[17:16:12] <Amir1>	 I rebased it as well
[17:16:22] <thcipriani>	 as long as that change is merged in wmf.14 my current plan should be ok :)
[17:17:40] <elukey>	 !log restart eventstreams on scb2* nodes (hopefully last time before deploying the fix) to avoid mem leaks issues during the EU night
[17:17:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:18] <ema>	 !log restart varnish-fe on cp1068 to clear "child restarted" alert T164609
[17:18:18] <thcipriani>	 !log train window running long, services deploy delayed
[17:18:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:22] <stashbot>	 T164609: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609
[17:18:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:59] <icinga-wm>	 RECOVERY - Varnish frontend child restarted on cp1068 is OK: (C)3 gt (W)1 gt 1 https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp1068&var-datasource=eqiad+prometheus/ops
[17:19:40] <zeljkof>	 cscott, arlolra, subbu, halfak, Amir1: please stand by for services deploy, we are probably going to move wmf.13 to all wikis right now
[17:27:52] <wikibugs>	 (03PS2) 10Dzahn: planet: fix missing language in link element [puppet] - 10https://gerrit.wikimedia.org/r/447649 (https://phabricator.wikimedia.org/T198680)
[17:28:15] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/11844/planet1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/447649 (https://phabricator.wikimedia.org/T198680) (owner: 10Dzahn)
[17:33:35] <logmsgbot>	 !log zfilipin@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.13
[17:33:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:21] <Amir1>	 load time of Special:Tags in frwiki is now down from 3s to 0.4s
[17:38:28] <icinga-wm>	 RECOVERY - Disk space on maps1001 is OK: DISK OK
[17:40:19] <ema>	 !log re-enable puppet on all cache nodes with alternate domains disabled T164609
[17:40:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:23] <stashbot>	 T164609: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609
[17:42:56] <wikibugs>	 (03PS3) 10Gehel: maps: disable OSM updates on eqiad while vacuum is running [puppet] - 10https://gerrit.wikimedia.org/r/447627 (https://phabricator.wikimedia.org/T200228)
[17:43:37] <wikibugs>	 (03CR) 10Gehel: [C: 032] maps: disable OSM updates on eqiad while vacuum is running [puppet] - 10https://gerrit.wikimedia.org/r/447627 (https://phabricator.wikimedia.org/T200228) (owner: 10Gehel)
[17:44:25] <wikibugs>	 (03PS4) 10Gehel: Enable fetching constraints for Updater [puppet] - 10https://gerrit.wikimedia.org/r/445454 (https://phabricator.wikimedia.org/T192567) (owner: 10Smalyshev)
[17:45:00] <wikibugs>	 (03PS1) 10Bstorm: wiki replicas: moving compatibility views to $table_compat [puppet] - 10https://gerrit.wikimedia.org/r/447654 (https://phabricator.wikimedia.org/T174047)
[17:46:17] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 (10Dzahn) a:03Dzahn
[17:47:09] <wikibugs>	 (03CR) 10Gehel: [C: 032] Enable fetching constraints for Updater [puppet] - 10https://gerrit.wikimedia.org/r/445454 (https://phabricator.wikimedia.org/T192567) (owner: 10Smalyshev)
[17:57:00] <wikibugs>	 (03Abandoned) 10Dzahn: add IPv6 for bast3003 [dns] - 10https://gerrit.wikimedia.org/r/405225 (https://phabricator.wikimedia.org/T184936) (owner: 10Dzahn)
[17:57:08] <wikibugs>	 (03Abandoned) 10Dzahn: bast3002->bast3003 in DHCP,network constants,smokeping [puppet] - 10https://gerrit.wikimedia.org/r/405229 (https://phabricator.wikimedia.org/T184936) (owner: 10Dzahn)
[17:57:13] <wikibugs>	 (03Abandoned) 10Dzahn: bast3002->bast3003 as prometheus node, rm from site [puppet] - 10https://gerrit.wikimedia.org/r/405230 (https://phabricator.wikimedia.org/T184936) (owner: 10Dzahn)
[17:57:19] <wikibugs>	 (03Abandoned) 10Dzahn: prometheus.svc.esams.wmnet: bast3002->bast3003 [dns] - 10https://gerrit.wikimedia.org/r/405231 (https://phabricator.wikimedia.org/T184936) (owner: 10Dzahn)
[17:57:25] <wikibugs>	 (03Abandoned) 10Dzahn: decom bast3002, keep mgmt [dns] - 10https://gerrit.wikimedia.org/r/405232 (https://phabricator.wikimedia.org/T184936) (owner: 10Dzahn)
[17:59:12] <wikibugs>	 10Operations, 10ops-esams: bast3002 sdb broken - https://phabricator.wikimedia.org/T169035 (10Dzahn)
[17:59:27] <wikibugs>	 10Operations, 10ops-esams, 10Patch-For-Review: install/designate other machine as esams bastion - https://phabricator.wikimedia.org/T184936 (10Dzahn) 05stalled>03Invalid Thanks Mark :)  i think we can close this as Invalid then.
[18:01:02] <wikibugs>	 (03PS1) 10Jcrespo: switchover: Commit pending transactions when setting read only [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447655
[18:01:28] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] switchover: Commit pending transactions when setting read only [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447655 (owner: 10Jcrespo)
[18:02:42] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install torrelay1001.wikimedia.org - https://phabricator.wikimedia.org/T196701 (10Dzahn) You can assign this to me after the initial setup to implement service.
[18:06:11] <wikibugs>	 (03PS2) 10Dzahn: phabricator weekly project changes email: Ignore disabled new assignees [puppet] - 10https://gerrit.wikimedia.org/r/443401 (https://phabricator.wikimedia.org/T195780) (owner: 10Aklapper)
[18:07:22] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs region-migrate: add a wait and reboot after the copy [puppet] - 10https://gerrit.wikimedia.org/r/447656
[18:08:09] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] wmcs region-migrate: add a wait and reboot after the copy [puppet] - 10https://gerrit.wikimedia.org/r/447656 (owner: 10Andrew Bogott)
[18:10:57] <wikibugs>	 (03CR) 10Dzahn: [C: 032] phabricator weekly project changes email: Ignore disabled new assignees [puppet] - 10https://gerrit.wikimedia.org/r/443401 (https://phabricator.wikimedia.org/T195780) (owner: 10Aklapper)
[18:11:17] <wikibugs>	 (03PS3) 10Dzahn: phabricator weekly project changes email: Ignore disabled new assignees [puppet] - 10https://gerrit.wikimedia.org/r/443401 (https://phabricator.wikimedia.org/T195780) (owner: 10Aklapper)
[18:15:29] <wikibugs>	 10Operations, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Jenkins, 10Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (10Dzahn)
[18:15:33] <wikibugs>	 10Operations, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Jenkins, 10Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (10Dzahn) @Hashar Can we change Jenkins config to use the new host per Krinkle's question above?
[18:19:45] <wikibugs>	 (03PS1) 10Jcrespo: switchover: Fix bug where shards are added with an extra 'b' [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447659 (https://phabricator.wikimedia.org/T199224)
[18:20:09] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] switchover: Fix bug where shards are added with an extra 'b' [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/447659 (https://phabricator.wikimedia.org/T199224) (owner: 10Jcrespo)
[18:27:27] <wikibugs>	 (03PS2) 10Dzahn: Remove priyankaivy.blogspot.com from Planet [puppet] - 10https://gerrit.wikimedia.org/r/444328 (owner: 10Amire80)
[18:32:00] <logmsgbot>	 !log mobrovac@deploy1001 Started deploy [eventstreams/deploy@690fdad]: Wait for the client to consume the message being sent before consuming the next one - T199813
[18:32:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:05] <stashbot>	 T199813: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813
[18:32:26] <wikibugs>	 (03CR) 10Dzahn: [C: 032] Remove priyankaivy.blogspot.com from Planet [puppet] - 10https://gerrit.wikimedia.org/r/444328 (owner: 10Amire80)
[18:32:36] <wikibugs>	 (03CR) 10Volans: [C: 031] "Thanks for the py3 migration and all the fixes! LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson)
[18:34:18] <logmsgbot>	 !log mobrovac@deploy1001 Finished deploy [eventstreams/deploy@690fdad]: Wait for the client to consume the message being sent before consuming the next one - T199813 (duration: 02m 18s)
[18:34:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:34] <wikibugs>	 (03PS2) 10Dzahn: phabricator: Print IDs of projects of tasks assigned to disabled accounts [puppet] - 10https://gerrit.wikimedia.org/r/446367 (owner: 10Aklapper)
[18:45:02] <wikibugs>	 (03CR) 10Dzahn: "+----------------------------------------------+--------------------------------+--------------------------------+" [puppet] - 10https://gerrit.wikimedia.org/r/446367 (owner: 10Aklapper)
[18:45:57] <wikibugs>	 (03CR) 10Dzahn: [C: 032] phabricator: Print IDs of projects of tasks assigned to disabled accounts [puppet] - 10https://gerrit.wikimedia.org/r/446367 (owner: 10Aklapper)
[18:46:36] <wikibugs>	 (03PS2) 10Thiemo Kreuz (WMDE): Do not leak local $wgWBShared… variables to the global scope [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444632
[18:46:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Do not leak local $wgWBShared… variables to the global scope [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444632 (owner: 10Thiemo Kreuz (WMDE))
[18:46:55] <wikibugs>	 (03PS4) 10Dzahn: dumps: add phab1002 as second phab server [puppet] - 10https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019)
[18:49:43] <wikibugs>	 (03Abandoned) 10Dzahn: convert check_prometheus_metric.py to python3 [puppet] - 10https://gerrit.wikimedia.org/r/441208 (owner: 10Dzahn)
[18:59:34] <wikibugs>	 (03CR) 10Gehel: "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/447564 (owner: 10EBernhardson)
[19:00:04] <jouncebot>	 Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T1900)
[19:00:32] <wikibugs>	 (03CR) 10Gehel: "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/445320 (owner: 10EBernhardson)
[19:00:38] <wikibugs>	 (03CR) 10Gehel: [C: 031] Delete unused code in elasticsearch module [puppet] - 10https://gerrit.wikimedia.org/r/445320 (owner: 10EBernhardson)
[19:05:11] <wikibugs>	 (03CR) 10Gehel: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson)
[19:09:16] <gehel>	 !log resetting postgres data on maps1003 after failing replication - T200228
[19:09:18] <icinga-wm>	 PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 52360 MB (10% inode=99%)
[19:09:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:20] <stashbot>	 T200228: disk space alert on maps1001 - https://phabricator.wikimedia.org/T200228
[19:11:28] <icinga-wm>	 PROBLEM - MegaRAID on db1069 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[19:11:29] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1069 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T200287
[19:11:29] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on db1069 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1069&var-datasource=eqiad%2520prometheus%252Fops
[19:13:45] <wikibugs>	 (03PS3) 10EBernhardson: Delete unused code in elasticsearch module [puppet] - 10https://gerrit.wikimedia.org/r/445320
[19:23:07] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on db1069 - https://phabricator.wikimedia.org/T200287 (10Cmjohnson) Swapped disk  current state is rebuild  Firmware state: Rebuild Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware s...
[19:24:37] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: rack/setup/install cp1075-cp1090 - https://phabricator.wikimedia.org/T195923 (10Cmjohnson)
[19:25:51] <wikibugs>	 (03PS3) 10EBernhardson: Split elasticsearch::log::hot_threads into two pieces [puppet] - 10https://gerrit.wikimedia.org/r/447565 (https://phabricator.wikimedia.org/T198351)
[19:25:53] <wikibugs>	 (03PS2) 10EBernhardson: Make cirrus specific elasticsearch profile [puppet] - 10https://gerrit.wikimedia.org/r/447566 (https://phabricator.wikimedia.org/T198351)
[19:32:02] <wikibugs>	 (03CR) 10EBernhardson: "elasticsearch 5 is the default, so the only thing with elasticsearch 2 would have to have hiera config specifying it. Given that, there is" [puppet] - 10https://gerrit.wikimedia.org/r/447564 (owner: 10EBernhardson)
[19:46:48] <icinga-wm>	 RECOVERY - Disk space on elastic1024 is OK: DISK OK
[19:47:58] <icinga-wm>	 RECOVERY - MegaRAID on db1069 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[19:48:54] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1069 - https://phabricator.wikimedia.org/T200287 (10Marostegui) 05Open>03Resolved a:03Cmjohnson This is all good now Thank you! ``` root@db1069:~# megacli -LDPDInfo -aAll  Adapter #0  Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name...
[19:49:44] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: db1069 bad disk - https://phabricator.wikimedia.org/T199056 (10Marostegui) 05Open>03Resolved The disk got replaced and this is all good now: T200287#4448846
[19:51:32] <wikibugs>	 (03PS3) 10EBernhardson: Make cirrus specific elasticsearch profile [puppet] - 10https://gerrit.wikimedia.org/r/447566 (https://phabricator.wikimedia.org/T198351)
[19:58:15] <wikibugs>	 (03PS2) 10Urbanecm: Add wikimania2019wiki Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/445766 (https://phabricator.wikimedia.org/T199509)
[19:59:44] <wikibugs>	 (03PS4) 10EBernhardson: Make cirrus specific elasticsearch profile [puppet] - 10https://gerrit.wikimedia.org/r/447566 (https://phabricator.wikimedia.org/T198351)
[19:59:53] <wikibugs>	 (03Abandoned) 10Urbanecm: Add wikimania2019wiki Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/445766 (https://phabricator.wikimedia.org/T199509) (owner: 10Urbanecm)
[20:04:08] <wikibugs>	 (03PS5) 10EBernhardson: Make cirrus specific elasticsearch profile [puppet] - 10https://gerrit.wikimedia.org/r/447566 (https://phabricator.wikimedia.org/T198351)
[20:05:17] <Urbanecm>	 Reedy, Dereckson: We have 4 wikis pending. Can somebody clear the list?
[20:05:36] <Reedy>	 I'm waiting for the wikimania.wikimedia.org to be handled in apache
[20:05:42] <wikibugs>	 (03PS2) 10Urbanecm: Initial configuration for wikimania2019wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445765 (https://phabricator.wikimedia.org/T199509)
[20:07:00] <Urbanecm>	 ok
[20:07:21] <Urbanecm>	 BTW, why do you want to "Stop redirecting wikimania.wikimedia.org to the yearly wiki"? I don't understand that, Reedy
[20:07:34] <Reedy>	 Because we're going to put a wiki there
[20:07:57] <Urbanecm>	 Oh, totally forgot...
[20:08:00] <Urbanecm>	 Thanks
[20:08:37] <Urbanecm>	 BTW, I've finally uploaded logos to the MW patch for wikimania2019wiki, so it is ready from this side as well.
[20:08:41] <wikibugs>	 (03PS6) 10EBernhardson: Make cirrus specific elasticsearch profile [puppet] - 10https://gerrit.wikimedia.org/r/447566 (https://phabricator.wikimedia.org/T198351)
[20:11:22] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs region-migrate: add ssh tests before and after [puppet] - 10https://gerrit.wikimedia.org/r/447719
[20:12:34] <wikibugs>	 (03PS2) 10Andrew Bogott: wmcs region-migrate: add ssh tests before and after [puppet] - 10https://gerrit.wikimedia.org/r/447719
[20:13:13] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] wmcs region-migrate: add ssh tests before and after [puppet] - 10https://gerrit.wikimedia.org/r/447719 (owner: 10Andrew Bogott)
[20:15:09] <wikibugs>	 10Operations, 10Analytics-Kanban, 10DNS, 10Release-Engineering-Team, and 5 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10JAllemandou)
[20:17:00] <wikibugs>	 (03PS7) 10EBernhardson: Make cirrus specific elasticsearch profile [puppet] - 10https://gerrit.wikimedia.org/r/447566 (https://phabricator.wikimedia.org/T198351)
[20:17:02] <wikibugs>	 (03PS2) 10EBernhardson: Split per-cluster config out of elasticsearch::curator [puppet] - 10https://gerrit.wikimedia.org/r/447567 (https://phabricator.wikimedia.org/T180807)
[20:23:34] <wikibugs>	 (03PS3) 10EBernhardson: Split per-cluster config out of elasticsearch::curator [puppet] - 10https://gerrit.wikimedia.org/r/447567 (https://phabricator.wikimedia.org/T180807)
[20:23:36] <wikibugs>	 (03PS8) 10EBernhardson: Switch elasticsearch to use tlsproxy module [puppet] - 10https://gerrit.wikimedia.org/r/444610 (https://phabricator.wikimedia.org/T198351)
[20:27:24] <wikibugs>	 (03PS4) 10EBernhardson: Split per-cluster config out of elasticsearch::curator [puppet] - 10https://gerrit.wikimedia.org/r/447567 (https://phabricator.wikimedia.org/T180807)
[20:27:26] <wikibugs>	 (03PS9) 10EBernhardson: Switch elasticsearch to use tlsproxy module [puppet] - 10https://gerrit.wikimedia.org/r/444610 (https://phabricator.wikimedia.org/T198351)
[20:30:08] <wikibugs>	 (03PS1) 10Ayounsi: Depool eqsin for cr1-eqsin software upgrade [dns] - 10https://gerrit.wikimedia.org/r/447721
[20:31:34] <wikibugs>	 (03CR) 10Ayounsi: [C: 032] Depool eqsin for cr1-eqsin software upgrade [dns] - 10https://gerrit.wikimedia.org/r/447721 (owner: 10Ayounsi)
[20:32:58] <XioNoX>	 !log depooling eqsin for cr1-eqsin software upgrade
[20:33:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:16] <wikibugs>	 (03PS5) 10EBernhardson: Split per-cluster config out of elasticsearch::curator [puppet] - 10https://gerrit.wikimedia.org/r/447567 (https://phabricator.wikimedia.org/T180807)
[20:42:28] <icinga-wm>	 PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert.
[20:43:03] <XioNoX>	 ^ expected
[20:46:34] <XioNoX>	 and confirmed that this alert is working as expected
[20:56:29] <XioNoX>	 I downtimed everything I could find with an eqsin mention in icinga
[21:00:20] <XioNoX>	 !log restarting cr1-eqsin for software upgrade
[21:00:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:49] <icinga-wm>	 PROBLEM - Recursive DNS on 208.80.154.20 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[21:02:32] <XioNoX>	 ^ that's labs-recursor1.wikimedia.org, I'd guess unrelated to cr1-eqsin maintenance
[21:02:39] <icinga-wm>	 PROBLEM - Host cp5011.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:39] <icinga-wm>	 PROBLEM - Host cp5010.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:39] <icinga-wm>	 PROBLEM - Host cp5004.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:39] <icinga-wm>	 PROBLEM - Host cp5009.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:59] <icinga-wm>	 PROBLEM - Host lvs5003.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:59] <icinga-wm>	 PROBLEM - Host cp5008.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:59] <icinga-wm>	 PROBLEM - Host cp5005.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:03:19] <XioNoX>	 hum, mgmt doesn't have mr1 as parent I guess
[21:04:08] <icinga-wm>	 RECOVERY - Recursive DNS on 208.80.154.20 is OK: DNS OK: 0.046 seconds response time. www.wikipedia.org returns 208.80.154.224
[21:04:09] <icinga-wm>	 PROBLEM - Host cp5012.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:04:18] <icinga-wm>	 PROBLEM - Host cp5007.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:04:18] <icinga-wm>	 PROBLEM - Host bast5001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:04:38] <icinga-wm>	 PROBLEM - Host cp5001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:04:38] <icinga-wm>	 PROBLEM - Host cp5003.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:04:38] <icinga-wm>	 PROBLEM - Host cp5002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:05:08] <icinga-wm>	 PROBLEM - Host lvs5002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:05:08] <icinga-wm>	 PROBLEM - Host lvs5001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:05:39] <icinga-wm>	 PROBLEM - Host dns5001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:05:39] <icinga-wm>	 PROBLEM - Host dns5002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:06:50] <XioNoX>	 !log Install done, cr1-eqsin re-rebooting
[21:06:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:58] <icinga-wm>	 PROBLEM - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:03] <librenms-wmf>	 04Critical Alert for device cr1-eqsin.wikimedia.org - Critical syslog messages
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:08] <icinga-wm>	 PROBLEM - IPsec on cp2014 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:09] <icinga-wm>	 PROBLEM - IPsec on cp1099 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:09] <icinga-wm>	 PROBLEM - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:18] <icinga-wm>	 PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:18] <icinga-wm>	 PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:18] <icinga-wm>	 PROBLEM - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:18] <icinga-wm>	 PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:19] <icinga-wm>	 PROBLEM - IPsec on cp2017 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:19] <icinga-wm>	 PROBLEM - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:19] <icinga-wm>	 PROBLEM - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:20] <icinga-wm>	 PROBLEM - IPsec on cp2024 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:28] <icinga-wm>	 PROBLEM - IPsec on cp2020 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:28] <icinga-wm>	 PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:29] <icinga-wm>	 PROBLEM - IPsec on cp1062 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:29] <icinga-wm>	 PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:39] <icinga-wm>	 PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:39] <icinga-wm>	 PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:39] <XioNoX>	 IPsec alerts expected
[21:08:48] <icinga-wm>	 PROBLEM - IPsec on cp1071 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:48] <icinga-wm>	 PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:48] <icinga-wm>	 PROBLEM - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:48] <icinga-wm>	 PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp5007_v4, cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v4, cp5010_v6, cp5011_v4, cp5011_v6, cp5012_v4, cp5012_v6
[21:08:49] <icinga-wm>	 PROBLEM - IPsec on cp1074 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:49] <icinga-wm>	 PROBLEM - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:58] <icinga-wm>	 PROBLEM - IPsec on cp1072 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp5001_v4, cp5001_v6, cp5002_v4, cp5002_v6, cp5003_v4, cp5003_v6, cp5004_v4, cp5004_v6, cp5005_v4, cp5005_v6
[21:08:58] <icinga-wm>	 PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 46 connecting: (unnamed) not-conn: cp5007_v6, cp5008_v4, cp5008_v6, cp5009_v4, cp5009_v6, cp5010_v6, cp5011_v6, cp5012_v6
[21:08:59] <icinga-wm>	 PROBLEM - IPsec on cp1064 is CRITICAL: Strongswan CRITICAL - ok: 61 connecting: (unnamed) not-conn: cp5001_v6, cp5002_v6, cp5003_v6, cp5004_v6, cp5005_v6
[21:08:59] <icinga-wm>	 PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 48 connecting: (unnamed) not-conn: cp5007_v6, cp5008_v6, cp5009_v6, cp5010_v6, cp5011_v6, cp5012_v6
[21:09:18] <icinga-wm>	 RECOVERY - IPsec on cp2010 is OK: Strongswan OK - 54 ESP OK
[21:09:18] <icinga-wm>	 RECOVERY - IPsec on cp2013 is OK: Strongswan OK - 54 ESP OK
[21:09:18] <icinga-wm>	 RECOVERY - IPsec on cp2016 is OK: Strongswan OK - 54 ESP OK
[21:09:19] <icinga-wm>	 RECOVERY - IPsec on cp2019 is OK: Strongswan OK - 54 ESP OK
[21:09:19] <icinga-wm>	 RECOVERY - IPsec on cp1066 is OK: Strongswan OK - 54 ESP OK
[21:09:19] <icinga-wm>	 RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 54 ESP OK
[21:09:19] <icinga-wm>	 RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 68 ESP OK
[21:09:28] <icinga-wm>	 RECOVERY - IPsec on cp1099 is OK: Strongswan OK - 66 ESP OK
[21:09:28] <icinga-wm>	 RECOVERY - IPsec on cp2011 is OK: Strongswan OK - 68 ESP OK
[21:09:29] <icinga-wm>	 RECOVERY - Host dns5002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 239.19 ms
[21:09:29] <icinga-wm>	 RECOVERY - Host cp5012.mgmt is UP: PING OK - Packet loss = 0%, RTA = 241.44 ms
[21:09:29] <icinga-wm>	 RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 66 ESP OK
[21:09:29] <icinga-wm>	 RECOVERY - IPsec on cp2023 is OK: Strongswan OK - 54 ESP OK
[21:09:29] <icinga-wm>	 RECOVERY - IPsec on cp2005 is OK: Strongswan OK - 68 ESP OK
[21:09:29] <icinga-wm>	 RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 68 ESP OK
[21:09:30] <icinga-wm>	 RECOVERY - IPsec on cp2017 is OK: Strongswan OK - 68 ESP OK
[21:09:38] <icinga-wm>	 RECOVERY - IPsec on cp2022 is OK: Strongswan OK - 68 ESP OK
[21:09:38] <icinga-wm>	 RECOVERY - IPsec on cp1048 is OK: Strongswan OK - 66 ESP OK
[21:09:38] <icinga-wm>	 RECOVERY - IPsec on cp2024 is OK: Strongswan OK - 68 ESP OK
[21:09:39] <icinga-wm>	 RECOVERY - IPsec on cp2020 is OK: Strongswan OK - 68 ESP OK
[21:09:39] <icinga-wm>	 RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 54 ESP OK
[21:09:48] <icinga-wm>	 RECOVERY - Host cp5007.mgmt is UP: PING OK - Packet loss = 0%, RTA = 232.85 ms
[21:09:48] <icinga-wm>	 RECOVERY - Host bast5001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 234.91 ms
[21:09:48] <icinga-wm>	 RECOVERY - IPsec on cp1062 is OK: Strongswan OK - 66 ESP OK
[21:09:48] <icinga-wm>	 RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 54 ESP OK
[21:09:58] <icinga-wm>	 RECOVERY - IPsec on cp1052 is OK: Strongswan OK - 54 ESP OK
[21:09:58] <icinga-wm>	 RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 54 ESP OK
[21:09:58] <icinga-wm>	 RECOVERY - Host lvs5003.mgmt is UP: PING OK - Packet loss = 0%, RTA = 247.92 ms
[21:09:59] <icinga-wm>	 RECOVERY - IPsec on cp1071 is OK: Strongswan OK - 66 ESP OK
[21:09:59] <icinga-wm>	 RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 54 ESP OK
[21:09:59] <icinga-wm>	 RECOVERY - IPsec on cp1049 is OK: Strongswan OK - 66 ESP OK
[21:09:59] <icinga-wm>	 RECOVERY - IPsec on cp1068 is OK: Strongswan OK - 54 ESP OK
[21:09:59] <icinga-wm>	 RECOVERY - Host cp5001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 240.75 ms
[21:09:59] <icinga-wm>	 RECOVERY - Host cp5003.mgmt is UP: PING OK - Packet loss = 0%, RTA = 236.73 ms
[21:10:00] <icinga-wm>	 RECOVERY - Host cp5002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 245.65 ms
[21:10:06] <XioNoX>	 !log starting to see recoveries from cr1-eqsin upgrade
[21:10:08] <icinga-wm>	 RECOVERY - IPsec on cp1074 is OK: Strongswan OK - 66 ESP OK
[21:10:08] <icinga-wm>	 RECOVERY - IPsec on cp1050 is OK: Strongswan OK - 66 ESP OK
[21:10:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:09] <icinga-wm>	 RECOVERY - Host lvs5001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 232.59 ms
[21:10:09] <icinga-wm>	 RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.
[21:10:09] <icinga-wm>	 RECOVERY - IPsec on cp1072 is OK: Strongswan OK - 66 ESP OK
[21:10:18] <icinga-wm>	 RECOVERY - IPsec on cp2001 is OK: Strongswan OK - 54 ESP OK
[21:10:18] <icinga-wm>	 RECOVERY - IPsec on cp1064 is OK: Strongswan OK - 66 ESP OK
[21:10:18] <icinga-wm>	 RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 54 ESP OK
[21:10:19] <icinga-wm>	 RECOVERY - IPsec on cp1063 is OK: Strongswan OK - 66 ESP OK
[21:10:29] <icinga-wm>	 RECOVERY - Host lvs5002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 228.58 ms
[21:10:59] <icinga-wm>	 RECOVERY - Host dns5001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 239.58 ms
[21:12:32] <wikibugs>	 (03PS3) 10Bstorm: gridengine: try to translate all the Ubuntu package calls to Debian [puppet] - 10https://gerrit.wikimedia.org/r/447561 (https://phabricator.wikimedia.org/T199276)
[21:13:05] <librenms-wmf>	 08Warning Alert for device mr1-eqsin.wikimedia.org - Inbound interface errors
[21:13:08] <icinga-wm>	 PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:28] <icinga-wm>	 RECOVERY - Host cp5011.mgmt is UP: PING OK - Packet loss = 0%, RTA = 245.35 ms
[21:13:28] <icinga-wm>	 RECOVERY - Host cp5010.mgmt is UP: PING OK - Packet loss = 0%, RTA = 246.35 ms
[21:13:28] <icinga-wm>	 RECOVERY - Host cp5004.mgmt is UP: PING OK - Packet loss = 0%, RTA = 245.44 ms
[21:13:28] <icinga-wm>	 RECOVERY - Host cp5009.mgmt is UP: PING OK - Packet loss = 0%, RTA = 253.01 ms
[21:13:48] <icinga-wm>	 RECOVERY - Host cp5008.mgmt is UP: PING OK - Packet loss = 0%, RTA = 253.14 ms
[21:13:48] <icinga-wm>	 RECOVERY - Host cp5005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 252.28 ms
[21:14:19] <icinga-wm>	 PROBLEM - puppet last run on cp5012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:25] <XioNoX>	 !log re1 is master routing engine on cr1-eqsin, triggering a re switch
[21:15:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:03] <librenms-wmf>	 04̶C̶r̶i̶t̶i̶c̶a̶l Device cr1-eqsin.wikimedia.org recovered from Critical syslog messages
[21:20:16] <wikibugs>	 (03CR) 10Bstorm: [C: 032] gridengine: try to translate all the Ubuntu package calls to Debian [puppet] - 10https://gerrit.wikimedia.org/r/447561 (https://phabricator.wikimedia.org/T199276) (owner: 10Bstorm)
[21:21:08] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[21:23:39] <icinga-wm>	 PROBLEM - puppet last run on cp5003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:25:39] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[21:25:48] <icinga-wm>	 PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:26:06] <librenms-wmf>	 08̶W̶a̶r̶n̶i̶n̶g Device mr1-eqsin.wikimedia.org recovered from Inbound interface errors
[21:28:47] <wikibugs>	 (03CR) 10Krinkle: [C: 031] JobQueue: Signal JobQueueEventBus is never read-only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/447055 (https://phabricator.wikimedia.org/T199594) (owner: 10Mobrovac)
[21:31:49] <icinga-wm>	 PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - failed 211 probes of 327 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map
[21:33:59] <icinga-wm>	 RECOVERY - puppet last run on cp5003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:34:58] <icinga-wm>	 RECOVERY - puppet last run on cp5012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:35:09] <icinga-wm>	 PROBLEM - puppet last run on lvs5001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/lib/nagios/plugins/check_pybal]
[21:39:37] <XioNoX>	 ripe-atlas probe is getting better
[21:41:18] <icinga-wm>	 RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:42:08] <icinga-wm>	 RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 5 probes of 327 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map
[21:48:35] <wikibugs>	 (03PS1) 10Ayounsi: Revert "Depool eqsin for cr1-eqsin software upgrade" [dns] - 10https://gerrit.wikimedia.org/r/447726
[21:49:03] <wikibugs>	 (03CR) 10Ayounsi: [C: 032] Revert "Depool eqsin for cr1-eqsin software upgrade" [dns] - 10https://gerrit.wikimedia.org/r/447726 (owner: 10Ayounsi)
[21:50:38] <icinga-wm>	 RECOVERY - puppet last run on lvs5001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:59:42] <XioNoX>	 !log re-pooling eqsin
[21:59:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:56] <wikibugs>	 (03PS1) 10Bstorm: gridengine: some more exec node package cleanup for stretch [puppet] - 10https://gerrit.wikimedia.org/r/447727 (https://phabricator.wikimedia.org/T199276)
[22:02:38] <wikibugs>	 (03CR) 10Bstorm: [C: 032] gridengine: some more exec node package cleanup for stretch [puppet] - 10https://gerrit.wikimedia.org/r/447727 (https://phabricator.wikimedia.org/T199276) (owner: 10Bstorm)
[22:15:28] <icinga-wm>	 PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert.
[22:15:30] <wikibugs>	 (03PS1) 10Bstorm: gridengine: just a couple more changes to work with stretch [puppet] - 10https://gerrit.wikimedia.org/r/447729 (https://phabricator.wikimedia.org/T199276)
[22:17:23] <wikibugs>	 (03CR) 10Bstorm: [C: 032] gridengine: just a couple more changes to work with stretch [puppet] - 10https://gerrit.wikimedia.org/r/447729 (https://phabricator.wikimedia.org/T199276) (owner: 10Bstorm)
[22:21:32] <wikibugs>	 10Operations, 10DBA, 10JADE, 10Scoring-platform-team, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight)
[22:22:34] <wikibugs>	 10Operations, 10DBA, 10JADE, 10Scoring-platform-team, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight)
[22:23:13] <wikibugs>	 10Operations, 10DBA, 10JADE, 10Scoring-platform-team (Current), 10User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (10awight) Creating a separate task presenting our questions as an RFC: {T200297}
[22:29:58] <wikibugs>	 10Operations, 10DBA, 10JADE, 10Scoring-platform-team, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight)
[22:39:19] <icinga-wm>	 RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.
[22:41:02] <wikibugs>	 10Operations, 10DBA, 10JADE, 10Scoring-platform-team, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight)
[23:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180724T2300).
[23:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[23:01:58] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (10Sakretsu) For the record, IP and registered users are still reporting this issue from mobi...
[23:02:15] <wikibugs>	 (03PS1) 10RobH: decom prod dns for [dataset|ms]1001 [dns] - 10https://gerrit.wikimedia.org/r/447732 (https://phabricator.wikimedia.org/T194060)
[23:05:06] <wikibugs>	 (03PS1) 10RobH: decom dataset1001 & ms1001 [puppet] - 10https://gerrit.wikimedia.org/r/447733 (https://phabricator.wikimedia.org/T194060)
[23:12:18] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-ArielGlenn: decommission dataset1001, ms1001 - https://phabricator.wikimedia.org/T194060 (10RobH)
[23:13:06] <wikibugs>	 (03PS1) 10Dzahn: planet: fix broken URL in xmldescription, missing dot [puppet] - 10https://gerrit.wikimedia.org/r/447736 (https://phabricator.wikimedia.org/T198680)
[23:14:14] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-ArielGlenn: decommission dataset1001, ms1001 - https://phabricator.wikimedia.org/T194060 (10RobH)
[23:14:30] <wikibugs>	 (03CR) 10RobH: [C: 032] decom dataset1001 & ms1001 [puppet] - 10https://gerrit.wikimedia.org/r/447733 (https://phabricator.wikimedia.org/T194060) (owner: 10RobH)
[23:14:59] <wikibugs>	 (03CR) 10RobH: [C: 032] decom prod dns for [dataset|ms]1001 [dns] - 10https://gerrit.wikimedia.org/r/447732 (https://phabricator.wikimedia.org/T194060) (owner: 10RobH)
[23:17:22] <wikibugs>	 10Operations, 10DBA, 10JADE, 10Scoring-platform-team, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight)
[23:17:33] <wikibugs>	 (03PS1) 10Smalyshev: Enable constraints fetching for test cluster [puppet] - 10https://gerrit.wikimedia.org/r/447740 (https://phabricator.wikimedia.org/T192567)
[23:17:34] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-ArielGlenn: decommission dataset1001, ms1001 - https://phabricator.wikimedia.org/T194060 (10RobH) a:03Cmjohnson
[23:17:36] <wikibugs>	 (03PS1) 10Smalyshev: Enable constraints fetching on internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/447741 (https://phabricator.wikimedia.org/T192567)
[23:17:38] <wikibugs>	 (03PS1) 10Smalyshev: Enable constraints loading everywhere [puppet] - 10https://gerrit.wikimedia.org/r/447742 (https://phabricator.wikimedia.org/T192567)
[23:18:12] <wikibugs>	 (03PS2) 10Dzahn: planet: fix broken URL in xmldescription, missing dot [puppet] - 10https://gerrit.wikimedia.org/r/447736 (https://phabricator.wikimedia.org/T198680)
[23:18:29] <wikibugs>	 (03PS2) 10Smalyshev: Enable constraints fetching for test cluster [puppet] - 10https://gerrit.wikimedia.org/r/447740 (https://phabricator.wikimedia.org/T192567)
[23:19:36] <wikibugs>	 (03CR) 10Paladox: [C: 031] planet: fix broken URL in xmldescription, missing dot [puppet] - 10https://gerrit.wikimedia.org/r/447736 (https://phabricator.wikimedia.org/T198680) (owner: 10Dzahn)
[23:20:06] <wikibugs>	 (03CR) 10Dzahn: [C: 032] planet: fix broken URL in xmldescription, missing dot [puppet] - 10https://gerrit.wikimedia.org/r/447736 (https://phabricator.wikimedia.org/T198680) (owner: 10Dzahn)
[23:26:00] <wikibugs>	 (03CR) 10Legoktm: [C: 032] Add .gitreview [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/447625 (owner: 10Zhuyifei1999)
[23:26:23] <wikibugs>	 (03Merged) 10jenkins-bot: Add .gitreview [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/447625 (owner: 10Zhuyifei1999)
[23:38:24] <wikibugs>	 (03PS10) 10EBernhardson: Switch elasticsearch to use tlsproxy module [puppet] - 10https://gerrit.wikimedia.org/r/444610 (https://phabricator.wikimedia.org/T198351)
[23:38:26] <wikibugs>	 (03PS2) 10EBernhardson: Make elasticsearch http and transport ports explicit [puppet] - 10https://gerrit.wikimedia.org/r/447568 (https://phabricator.wikimedia.org/T198351)
[23:41:53] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Sunset Watchmouse's status.wikimedia.org - https://phabricator.wikimedia.org/T199816 (10Dzahn) Deprecated link on German Wikipedia.  https://de.wikipedia.org/w/index.php?title=Wikipedia%3ATechnik%2FNetzwerk%2FDomains&type=revision&diff=1794...
[23:43:35] <wikibugs>	 (03PS1) 10Dzahn: planet: tune feed name, description, owneremail, maxarticles [puppet] - 10https://gerrit.wikimedia.org/r/447743
[23:44:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] planet: tune feed name, description, owneremail, maxarticles [puppet] - 10https://gerrit.wikimedia.org/r/447743 (owner: 10Dzahn)
[23:47:51] <wikibugs>	 (03PS2) 10Dzahn: planet: tune feed name, description, owneremail, maxarticles [puppet] - 10https://gerrit.wikimedia.org/r/447743
[23:48:57] <wikibugs>	 (03PS3) 10EBernhardson: Make elasticsearch http and transport ports explicit [puppet] - 10https://gerrit.wikimedia.org/r/447568 (https://phabricator.wikimedia.org/T198351)
[23:49:36] <wikibugs>	 (03PS21) 10EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351)
[23:57:52] <wikibugs>	 (03CR) 10EBernhardson: "I've split most of the other parts out of this patch, leaving only conversion to systemd/elasticsearch_5@ and minor elasticsearch.yml conf" [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson)