[02:40:07] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 48.8 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[02:43:19] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 92.67 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:36:27] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9243 on search.svc.eqiad.wmnet is CRITICAL: CRITICAL - eswiki_content_1521891951[6](2019-08-15T03:43:12.536Z), enwiki_content_1546970425[2](2019-08-15T03:43:02.394Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:03:33] <icinga-wm>	 ACKNOWLEDGEMENT - Host elastic1017 is DOWN: PING CRITICAL - Packet loss = 100% Effie Mouzeli Host will be retired - T230518
[08:22:53] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "This LGTM in general." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[08:28:25] <onimisionipe>	 !log running `_cluster/reroute?pretty&explain=true&retry_failed` on eqiad production-search cluster to force allocation of shards
[08:28:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:54] <wikibugs>	 (CR) Alexandros Kosiaris: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/530712 (https://phabricator.wikimedia.org/T113114) (owner: Alexandros Kosiaris)
[09:53:16] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Epic, cloud-services-team (Kanban): relocate/reimage cloudvirt1023 with 10G interfaces - https://phabricator.wikimedia.org/T229871 (Andrew) I did a manual install-console on this host and it's doing its initial puppet run now.
[09:53:35] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:54:13] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 Arturo Borrero Gonzalez rebuilding https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[10:11:29] <icinga-wm>	 RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[10:23:41] <wikibugs>	 (PS3) Andrew Bogott: cloud recursors: alias 'puppet' to the new in-labs puppetmaster [puppet] - https://gerrit.wikimedia.org/r/530341 (https://phabricator.wikimedia.org/T171188)
[10:23:43] <wikibugs>	 (PS2) Andrew Bogott: labpuppetmaster1001/1002:  Clean up after moving puppetmasters to the cloud [puppet] - https://gerrit.wikimedia.org/r/530382 (https://phabricator.wikimedia.org/T171188)
[10:23:45] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt1023: update nic name for Stretch [puppet] - https://gerrit.wikimedia.org/r/530763
[10:24:42] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt1023: update nic name for Stretch [puppet] - https://gerrit.wikimedia.org/r/530763 (owner: Andrew Bogott)
[10:25:22] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9243 on search.svc.eqiad.wmnet is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:50:15] <wikibugs>	 (PS1) Alex Monk: Add missing cloudinfra contact group [puppet] - https://gerrit.wikimedia.org/r/530765 (https://phabricator.wikimedia.org/T230674)
[11:03:46] <wikibugs>	 (PS4) Andrew Bogott: cloud recursors: alias 'puppet' to the new in-labs puppetmaster [puppet] - https://gerrit.wikimedia.org/r/530341 (https://phabricator.wikimedia.org/T171188)
[11:03:48] <wikibugs>	 (PS3) Andrew Bogott: labpuppetmaster1001/1002:  Clean up after moving puppetmasters to the cloud [puppet] - https://gerrit.wikimedia.org/r/530382 (https://phabricator.wikimedia.org/T171188)
[11:03:50] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt1023: change network names again [puppet] - https://gerrit.wikimedia.org/r/530766
[11:04:48] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt1023: change network names again [puppet] - https://gerrit.wikimedia.org/r/530766 (owner: Andrew Bogott)
[11:07:54] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1023 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[11:33:25] <wikibugs>	 (PS1) MarcoAurelio: WIP [mediawiki-config] - https://gerrit.wikimedia.org/r/530769
[11:36:20] <wikibugs>	 (PS2) MarcoAurelio: Change language code for punjabiwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/530769 (https://phabricator.wikimedia.org/T230680)
[11:39:42] <icinga-wm>	 PROBLEM - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:39:43] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1063 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T230682 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:39:47] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T230682 (ops-monitoring-bot)
[11:41:02] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) is CRITICAL: Test retrieve title of the featured article for April 29, 2016 returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:42:38] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:47:24] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) is CRITICAL: Test retrieve the most-read articles for January 1, 2016 (with aggregated=true) returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:48:58] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[12:02:35] <wikibugs>	 Operations, Acme-chief, Traffic: Provide the three cert types (chain-only, cert only and chained) as soon as we get the certificate issued - https://phabricator.wikimedia.org/T229096 (Krenair) Can we close this now?
[12:02:59] <wikibugs>	 Operations, Acme-chief, Traffic: acme-chief staging time not working as expected - https://phabricator.wikimedia.org/T225945 (Krenair) Is it working as expected now?
[12:47:13] <wikibugs>	 Operations, Acme-chief, Traffic: Decide/document criteria needed to serve acme-chief LE issued unified certificate to end users - https://phabricator.wikimedia.org/T230687 (Krenair)
[12:55:23] <wikibugs>	 (PS1) Krinkle: hieradata: Move beta 'cache::app_directors' from Horizon to Puppet [puppet] - https://gerrit.wikimedia.org/r/530771 (https://phabricator.wikimedia.org/T158837)
[13:16:49] <wikibugs>	 (PS1) Krinkle: hieradata: Add 'performance.wikimedia.beta.wmflabs.org' routing [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837)
[13:19:52] <Krinkle>	 Krenair: ^
[13:20:13] <Krinkle>	 to replace performance-beta.wmflabs.org web proxy
[13:20:24] <Krinkle>	 looks like wildcard routing puts it on text vcl already
[13:20:28] <Krinkle>	 so probably will just work?
[13:25:19] <Krinkle>	 bd808: Which one is meant to win? Horizon puppet hiera or puppet.git puppet hiera?
[13:27:31] <wikibugs>	 (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/530769 (https://phabricator.wikimedia.org/T230680) (owner: MarcoAurelio)
[13:29:09] <wikibugs>	 (CR) Krinkle: [C: -1] "Help? Cherry-picking on beta puppet master causes a compilation failure." [puppet] - https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837) (owner: Krinkle)
[13:29:22] <Krinkle>	 akosiaris: could use some help from someone who knows VCL better. any recommendations?
[13:30:13] <wikibugs>	 (PS2) Andrew Bogott: Add missing cloudinfra contact group [puppet] - https://gerrit.wikimedia.org/r/530765 (https://phabricator.wikimedia.org/T230674) (owner: Alex Monk)
[13:31:16] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Add missing cloudinfra contact group [puppet] - https://gerrit.wikimedia.org/r/530765 (https://phabricator.wikimedia.org/T230674) (owner: Alex Monk)
[13:32:22] <wikibugs>	 (CR) Andrew Bogott: [C: +2] "Thanks for the fix!" [puppet] - https://gerrit.wikimedia.org/r/530765 (https://phabricator.wikimedia.org/T230674) (owner: Alex Monk)
[13:48:59] <wikibugs>	 (CR) Alex Monk: "(See T171188)" [puppet] - https://gerrit.wikimedia.org/r/530344 (owner: Alex Monk)
[13:49:08] <wikibugs>	 (CR) Alex Monk: [C: -1] "(See T171188)" [puppet] - https://gerrit.wikimedia.org/r/530371 (owner: Alex Monk)
[13:56:54] <icinga-wm>	 PROBLEM - Host cp2004 is DOWN: PING CRITICAL - Packet loss = 100%
[14:04:58] <icinga-wm>	 PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:06] <icinga-wm>	 PROBLEM - IPsec on cp5010 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:08] <icinga-wm>	 PROBLEM - IPsec on cp1085 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:10] <icinga-wm>	 PROBLEM - IPsec on cp1081 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:16] <icinga-wm>	 PROBLEM - IPsec on cp5007 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:20] <icinga-wm>	 PROBLEM - IPsec on cp5011 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:32] <icinga-wm>	 PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:34] <icinga-wm>	 PROBLEM - IPsec on cp4032 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:36] <icinga-wm>	 PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:36] <icinga-wm>	 PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:44] <icinga-wm>	 PROBLEM - IPsec on cp1077 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:50] <icinga-wm>	 PROBLEM - IPsec on cp4031 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:50] <icinga-wm>	 PROBLEM - IPsec on cp4030 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:50] <icinga-wm>	 PROBLEM - IPsec on cp1089 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:05:54] <icinga-wm>	 PROBLEM - IPsec on cp5008 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:00] <icinga-wm>	 PROBLEM - IPsec on cp4028 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:02] <icinga-wm>	 PROBLEM - IPsec on cp1083 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:06] <icinga-wm>	 PROBLEM - IPsec on cp4027 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:06] <icinga-wm>	 PROBLEM - IPsec on cp4029 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:08] <icinga-wm>	 PROBLEM - IPsec on cp1087 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:14] <icinga-wm>	 PROBLEM - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:14] <icinga-wm>	 PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:16] <icinga-wm>	 PROBLEM - IPsec on cp5009 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:18] <icinga-wm>	 PROBLEM - IPsec on cp5012 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:18] <icinga-wm>	 PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:22] <icinga-wm>	 PROBLEM - IPsec on cp1079 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:06:32] <icinga-wm>	 PROBLEM - IPsec on cp1075 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp2004_v4, cp2004_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan
[14:14:41] <Krenair>	 okay so that's all just because cp2004 went off I guess
[14:46:26] <cdanis>	 yeah that's typical with the IPsec alerts
[14:46:55] <cdanis>	 AIUI, in the future ATS world, there won't be a need for IPsec
[14:51:17] <Krenair>	 cdanis, just like TLS between everything I guess?
[14:51:58] <Krenair>	 is it done with IPsec right now because varnish and TLS don't mix?
[14:53:14] <cdanis>	 that's right Krenair
[14:53:33] <cdanis>	 I believe it's specifically for "Varnish needs to call other Varnish" case, as that can't use TLS
[14:54:30] <Krenair>	 kind of surprised we can't stick nginx in that path too
[14:54:33] <Krenair>	 but ok
[14:55:46] <cdanis>	 I'm not sure if it's "can't", or "not worth it for something temporary"
[14:58:21] <Krenair>	 right
[14:58:26] <Krenair>	 makes sense. would be more overhead too
[14:59:51] <cdanis>	 yeah, plus it'd be a nontrivial nginx configuration, and I know _j.oe_ has encountered some issues elsewhere when using nginx as a TLS-adding reverse proxy
[15:00:37] <Krenair>	 well, we have some prior art :)
[15:00:40] <Krenair>	 but sure
[17:51:38] <wikibugs>	 (PS9) Daimona Eaytoy: Rename globals and rights in AbuseFilter config [mediawiki-config] - https://gerrit.wikimedia.org/r/480074
[21:19:39] <wikibugs>	 (CR) Urbanecm: [C: +1] Add `WS` and `CAT` as aliases for zhwikisource namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/530413 (https://phabricator.wikimedia.org/T230548) (owner: DannyS712)