[00:51:05] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2098 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 990.52 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:14:03] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2098 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:43:37] <icinga-wm>	 PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[02:45:13] <icinga-wm>	 RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[03:29:22] <wikibugs>	 (PS1) CRusnov: ganeti: Add ability to get ganeti cluster for given instance [software/spicerack] - https://gerrit.wikimedia.org/r/533984 (https://phabricator.wikimedia.org/T231068)
[04:12:33] <wikibugs>	 (PS1) Dzahn: peopleweb: include envoy for TLS termination [puppet] - https://gerrit.wikimedia.org/r/533985 (https://phabricator.wikimedia.org/T210411)
[04:19:55] <wikibugs>	 Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[04:21:02] <wikibugs>	 (PS1) Dzahn: add peopleweb.discovery.wmnet [dns] - https://gerrit.wikimedia.org/r/533986 (https://phabricator.wikimedia.org/T210411)
[04:22:06] <wikibugs>	 (CR) Dzahn: [C: +2] add peopleweb.discovery.wmnet [dns] - https://gerrit.wikimedia.org/r/533986 (https://phabricator.wikimedia.org/T210411) (owner: Dzahn)
[04:28:48] <vgutierrez>	 !log Switching cp2002 from nginx to ats-tls - T231433
[04:28:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:28:50] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[04:29:47] <wikibugs>	 (CR) Vgutierrez: [C: +2] hiera: Move nginx from port 443 to 4443 on cp2002 [puppet] - https://gerrit.wikimedia.org/r/532984 (https://phabricator.wikimedia.org/T231433) (owner: Vgutierrez)
[04:29:56] <wikibugs>	 (PS2) Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp2002 [puppet] - https://gerrit.wikimedia.org/r/532984 (https://phabricator.wikimedia.org/T231433)
[04:36:01] <vgutierrez>	 !log upgrading ATS to 8.0.5-1wm4 on cp2002 - T231433
[04:36:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:36:03] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[04:37:29] <wikibugs>	 (CR) Vgutierrez: [C: +2] hiera: Move ats-tls from port 8443 to port 443 on cp2002 [puppet] - https://gerrit.wikimedia.org/r/532985 (https://phabricator.wikimedia.org/T231433) (owner: Vgutierrez)
[04:37:40] <wikibugs>	 (PS2) Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp2002 [puppet] - https://gerrit.wikimedia.org/r/532985 (https://phabricator.wikimedia.org/T231433)
[04:38:13] <icinga-wm>	 PROBLEM - HTTPS Unified ECDSA on cp2002 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[04:39:07] <icinga-wm>	 PROBLEM - HTTPS Unified RSA on cp2002 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[04:44:19] <wikibugs>	 (PS1) CRusnov: netbox: Transparently support read-only operations for virtual machines [software/spicerack] - https://gerrit.wikimedia.org/r/533987 (https://phabricator.wikimedia.org/T231068)
[04:44:37] <icinga-wm>	 RECOVERY - HTTPS Unified ECDSA on cp2002 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345568 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 80 days) https://wikitech.wikimedia.org/wiki/HTTPS
[04:45:31] <icinga-wm>	 RECOVERY - HTTPS Unified RSA on cp2002 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345513 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 80 days) https://wikitech.wikimedia.org/wiki/HTTPS
[04:47:42] <wikibugs>	 Operations, ops-codfw, DC-Ops: mw2231 is down and unable to reboot - https://phabricator.wikimedia.org/T231192 (Joe) I second what @MoritzMuehlenhoff suggested. The system is not scheduled for replacement for another 2 years, so if we can salvage it somehow, that'd be great.
[04:49:38] <vgutierrez>	 !log repooling cp2002 - T231433
[04:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:41] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[04:50:06] <marostegui>	 !Drop filejournal table on s3 - T51195
[04:50:06] <stashbot>	 T51195: Drop filejournal table from WMF - https://phabricator.wikimedia.org/T51195
[04:50:10] <marostegui>	 !log Drop filejournal table on s3 - T51195
[04:50:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:57:56] <wikibugs>	 (PS1) Vgutierrez: hiera: Remove trailing line on cp2002 yaml file [puppet] - https://gerrit.wikimedia.org/r/533988
[04:58:46] <wikibugs>	 (PS1) Marostegui: mariadb: Promote db2118 to s7 codfw master [puppet] - https://gerrit.wikimedia.org/r/533989 (https://phabricator.wikimedia.org/T230106)
[04:59:57] <wikibugs>	 (CR) Vgutierrez: [C: +2] hiera: Remove trailing line on cp2002 yaml file [puppet] - https://gerrit.wikimedia.org/r/533988 (owner: Vgutierrez)
[05:01:43] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Promote db2118 to s7 codfw master [mediawiki-config] - https://gerrit.wikimedia.org/r/533990 (https://phabricator.wikimedia.org/T230106)
[05:02:18] <marostegui>	 !log Promote db2118 to s7 codfw master (db2047 -> db2118) T230106
[05:02:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:02:20] <stashbot>	 T230106: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106
[05:05:29] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Promote db2118 to s7 codfw master [puppet] - https://gerrit.wikimedia.org/r/533989 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[05:05:40] <wikibugs>	 (PS2) Marostegui: mariadb: Promote db2118 to s7 codfw master [puppet] - https://gerrit.wikimedia.org/r/533989 (https://phabricator.wikimedia.org/T230106)
[05:12:26] <wikibugs>	 (CR) Marostegui: [C: +2] db-codfw.php: Promote db2118 to s7 codfw master [mediawiki-config] - https://gerrit.wikimedia.org/r/533990 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[05:12:46] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Promote db2118 to s7 codfw master [mediawiki-config] - https://gerrit.wikimedia.org/r/533990 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[05:14:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db2118 to s7 codfw master (db2047 -> db2118) T230106', diff saved to https://phabricator.wikimedia.org/P9026 and previous config saved to /var/cache/conftool/dbconfig/20190903-051450-marostegui.json
[05:14:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:53] <stashbot>	 T230106: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106
[05:16:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2047 old master from s7 T230106', diff saved to https://phabricator.wikimedia.org/P9027 and previous config saved to /var/cache/conftool/dbconfig/20190903-051619-marostegui.json
[05:16:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:17:09] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Promote db2118 to s7 codfw master (db2047 -> db2118) T230106 (duration: 00m 54s)
[05:17:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:19:49] <wikibugs>	 Operations, DBA: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (Marostegui) a:Marostegui
[05:20:50] <wikibugs>	 (PS1) Dzahn: add fake SSL key for peopleweb [labs/private] - https://gerrit.wikimedia.org/r/533991
[05:22:01] <wikibugs>	 Operations, DBA: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (Marostegui) I have left a backup of this DB at: ` cumin1001:/home/marostegui/T231539 `
[05:22:16] <marostegui>	 !log Rename tables on the puppet database on m1 master - T231539
[05:22:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:18] <stashbot>	 T231539: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539
[05:24:42] <wikibugs>	 Operations, DBA: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (Marostegui) I have renamed the tables on the `puppet` DB, I will leave them for a few hours before dropping the database: ` # mysql.py -hdb1063 puppet -e "show tables" -BN TO_DROP_auth_group TO_DROP_auth_group_perm...
[05:27:34] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] add fake SSL key for peopleweb [labs/private] - https://gerrit.wikimedia.org/r/533991 (owner: Dzahn)
[05:35:48] <wikibugs>	 (PS2) Dzahn: peopleweb: add TLS termination with envoy [puppet] - https://gerrit.wikimedia.org/r/533985 (https://phabricator.wikimedia.org/T210411)
[05:37:34] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 59.56 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[05:40:48] <wikibugs>	 (PS3) Dzahn: peopleweb: add TLS termination with envoy [puppet] - https://gerrit.wikimedia.org/r/533985 (https://phabricator.wikimedia.org/T210411)
[05:44:11] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Reorganize s7 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/533993 (https://phabricator.wikimedia.org/T230106)
[05:45:32] <wikibugs>	 (CR) Marostegui: [C: +2] db-codfw.php: Reorganize s7 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/533993 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[05:46:30] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Reorganize s7 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/533993 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[05:46:46] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Reorganize s7 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/533993 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[05:47:49] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Reorganize s7 codfw T230106 (duration: 00m 54s)
[05:47:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:51] <stashbot>	 T230106: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106
[05:48:26] <wikibugs>	 (CR) Dzahn: [C: +2] peopleweb: add TLS termination with envoy [puppet] - https://gerrit.wikimedia.org/r/533985 (https://phabricator.wikimedia.org/T210411) (owner: Dzahn)
[05:48:36] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 74.14 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[05:52:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Reorganize s7 codfw T230106', diff saved to https://phabricator.wikimedia.org/P9028 and previous config saved to /var/cache/conftool/dbconfig/20190903-055234-marostegui.json
[05:52:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:18] <mutante>	 !log people.wikimedia.org - switching to TLS termination with envoy 
[05:53:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:51] <wikibugs>	 Operations, Traffic: Tune ATS SSL session cache - https://phabricator.wikimedia.org/T231849 (Vgutierrez)
[05:55:45] <wikibugs>	 Operations, Traffic: Tune ATS SSL session cache - https://phabricator.wikimedia.org/T231849 (Vgutierrez) p:Triage→Normal
[05:57:08] <wikibugs>	 (PS4) Giuseppe Lavagetto: wdqs: convert to profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532666
[05:57:47] <wikibugs>	 (PS1) Vgutierrez: hiera: Increase ATS SSL session cache capacity to 4M sessions [puppet] - https://gerrit.wikimedia.org/r/533994 (https://phabricator.wikimedia.org/T231849)
[05:59:03] <wikibugs>	 (PS1) Dzahn: ATS: switch people.wikimedia.org to https backend [puppet] - https://gerrit.wikimedia.org/r/533995 (https://phabricator.wikimedia.org/T210411)
[06:00:12] <wikibugs>	 (CR) Dzahn: [C: +2] ATS: switch people.wikimedia.org to https backend [puppet] - https://gerrit.wikimedia.org/r/533995 (https://phabricator.wikimedia.org/T210411) (owner: Dzahn)
[06:00:22] <wikibugs>	 (PS2) Dzahn: ATS: switch people.wikimedia.org to https backend [puppet] - https://gerrit.wikimedia.org/r/533995 (https://phabricator.wikimedia.org/T210411)
[06:02:15] <_joe_>	 grrrr
[06:02:28] <_joe_>	 mutante: please let me merge
[06:02:41] <wikibugs>	 (PS5) Giuseppe Lavagetto: wdqs: convert to profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532666
[06:02:46] <_joe_>	 this is so fcking slow
[06:02:59] <mutante>	 _joe_: oh, sure. i will take a break but this one is already too late i'm afraid
[06:03:03] <_joe_>	 ff-only is the choice of people that don't merge much.
[06:03:10] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +2 C: +2] wdqs: convert to profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532666 (owner: Giuseppe Lavagetto)
[06:03:49] <mutante>	 _joe_: want me to type 'multiple' ?
[06:04:42] <marostegui>	 !log Change min_replicas to 4 on s7 for eqiad and codfw T231019
[06:04:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:45] <stashbot>	 T231019: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019
[06:05:54] <_joe_>	 mutante: please do, I was getting errors from puppet merge :P
[06:06:14] <mutante>	 _joe_: merged!
[06:06:22] <mutante>	 well.. in the process of merging
[06:06:34] <_joe_>	 eheh it's a long process nowadays
[06:06:44] <_joe_>	 I never looked into what's making it so damn slow
[06:06:58] <mutante>	 i just know that labs/private is now part of it
[06:07:05] <mutante>	 done!
[06:10:43] <mutante>	 !log running puppet on cp-text_eqiad to switch people.wm.org to https backend
[06:10:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:45] <wikibugs>	 (PS1) Dzahn: puppet/site: fail if no role has been assigned to a node [puppet] - https://gerrit.wikimedia.org/r/534005
[06:21:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppet/site: fail if no role has been assigned to a node [puppet] - https://gerrit.wikimedia.org/r/534005 (owner: Dzahn)
[06:22:38] <wikibugs>	 Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[06:23:12] <wikibugs>	 Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[06:23:51] <wikibugs>	 (PS2) Dzahn: puppet/site: fail if no role has been assigned to a node [puppet] - https://gerrit.wikimedia.org/r/534005
[06:24:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppet/site: fail if no role has been assigned to a node [puppet] - https://gerrit.wikimedia.org/r/534005 (owner: Dzahn)
[06:28:05] <wikibugs>	 (PS3) Dzahn: puppet/site: fail if no role has been assigned to a node [puppet] - https://gerrit.wikimedia.org/r/534005
[06:30:04] <wikibugs>	 Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[06:31:19] <wikibugs>	 Operations, DBA: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (Marostegui)
[06:31:34] <wikibugs>	 Operations, DBA: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (Marostegui) p:Triage→Normal
[06:31:54] <wikibugs>	 Operations, DBA: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[06:34:46] <wikibugs>	 (PS1) Marostegui: db2047: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/534010 (https://phabricator.wikimedia.org/T231852)
[06:35:01] <wikibugs>	 (CR) Vgutierrez: [C: +2] Release 0.21 [software/acme-chief] - https://gerrit.wikimedia.org/r/533856 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[06:35:34] <wikibugs>	 (CR) Marostegui: [C: +2] db2047: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/534010 (https://phabricator.wikimedia.org/T231852) (owner: Marostegui)
[06:36:14] <wikibugs>	 Operations, DBA, Patch-For-Review: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (Marostegui)
[06:39:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1133 with weight 0 T229657', diff saved to https://phabricator.wikimedia.org/P9029 and previous config saved to /var/cache/conftool/dbconfig/20190903-063932-marostegui.json
[06:39:35] <stashbot>	 marostegui@cumin1001: Failed to log message to wiki. Somebody should check the error logs.
[06:39:36] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[06:40:26] <wikibugs>	 Operations, DBA, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[06:41:23] <wikibugs>	 (PS5) Marostegui: wmnet: Promote db1133 to m5 master [dns] - https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657)
[06:53:57] <wikibugs>	 (CR) jenkins-bot: Release 0.21 [software/acme-chief] - https://gerrit.wikimedia.org/r/533856 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[06:59:50] <wikibugs>	 (CR) Dzahn: "gentle ping. we said it needs discussion back in June. that's been a couple months." [puppet] - https://gerrit.wikimedia.org/r/510753 (https://phabricator.wikimedia.org/T223463) (owner: Rush)
[07:00:43] <wikibugs>	 (PS3) Dzahn: Remove wikimedia.ee [dns] - https://gerrit.wikimedia.org/r/459835 (https://phabricator.wikimedia.org/T204056) (owner: Reedy)
[07:01:17] <wikibugs>	 (CR) Dzahn: [C: -1] "not stalled anymore now - but also rebased to nothing because it was already done by somebody else now" [dns] - https://gerrit.wikimedia.org/r/459835 (https://phabricator.wikimedia.org/T204056) (owner: Reedy)
[07:02:18] <wikibugs>	 (Abandoned) Dzahn: Remove wikimedia.ee [dns] - https://gerrit.wikimedia.org/r/459835 (https://phabricator.wikimedia.org/T204056) (owner: Reedy)
[07:10:24] <wikibugs>	 Operations, MediaWiki-Maintenance-scripts, serviceops: Stop forcing RUNNER=php for foreachwiki/foreachwikiindblist - https://phabricator.wikimedia.org/T230110 (Dzahn) Kind of, yea. But also we should revert setting it to php7.2 now to avoid hardcoding the version.
[07:13:17] <wikibugs>	 (PS1) Dzahn: switch RUNNER in foreachwikiindblist back to just 'php' [puppet] - https://gerrit.wikimedia.org/r/534012 (https://phabricator.wikimedia.org/T230110)
[07:15:14] <wikibugs>	 (CR) Nuria: [C: +1] analytics::refinery::job::data_purge.pp Add skip-trash to timers [puppet] - https://gerrit.wikimedia.org/r/533955 (https://phabricator.wikimedia.org/T229436) (owner: Mforns)
[07:16:47] <marostegui>	 !log Change min_replicas to 6 on s1 for eqiad and codfw T231019
[07:18:56] <wikibugs>	 (CR) Muehlenhoff: [C: -1] "This will need some sync up with DC as TTBOMK they currently install the servers without the initial role attached. But there's also a few" [puppet] - https://gerrit.wikimedia.org/r/534005 (owner: Dzahn)
[07:20:21] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Re-organize s1 codfw candidate masters [mediawiki-config] - https://gerrit.wikimedia.org/r/534014 (https://phabricator.wikimedia.org/T230106)
[07:20:56] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for mbsantos - https://phabricator.wikimedia.org/T227695 (Nuria) Closing as turnilo is indeed sufficient to gather the info requested
[07:20:58] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for mbsantos - https://phabricator.wikimedia.org/T227695 (Nuria) Open→Resolved
[07:22:08] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:22:22] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqsin on icinga1001 is CRITICAL: job=varnish-text site=eqsin https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[07:23:00] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:23:00] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[07:23:04] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:23:16] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:23:30] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[07:23:32] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:24:06] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:24:08] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[07:24:10] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:24:25] <wikibugs>	 (CR) Marostegui: [C: +2] db-codfw.php: Re-organize s1 codfw candidate masters [mediawiki-config] - https://gerrit.wikimedia.org/r/534014 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:24:40] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:25:19] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Re-organize s1 codfw candidate masters [mediawiki-config] - https://gerrit.wikimedia.org/r/534014 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:26:23] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Re-organize s1 codfw candidate masters [mediawiki-config] - https://gerrit.wikimedia.org/r/534014 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:26:29] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Reorganize s1 codfw future master/candidate T230106 (duration: 00m 49s)
[07:28:44] <wikibugs>	 Operations, Analytics, Analytics-Wikistats, Traffic, Performance-Team (Radar): Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (Nuria) @ema So I understand: caching pass needs to be removed (will do so) and since caching response includes an eTag. Is removing caching: pass s...
[07:29:09] <wikibugs>	 (PS1) Muehlenhoff: Remove roentgenium/tureis [puppet] - https://gerrit.wikimedia.org/r/534017 (https://phabricator.wikimedia.org/T224559)
[07:29:28] <wikibugs>	 (PS1) Marostegui: mariadb: Reorganize future master/candidate on s1 codfw [puppet] - https://gerrit.wikimedia.org/r/534018 (https://phabricator.wikimedia.org/T230106)
[07:31:07] <wikibugs>	 (PS1) Muehlenhoff: Remove DNS entries for roentgenium/tureis [dns] - https://gerrit.wikimedia.org/r/534019 (https://phabricator.wikimedia.org/T224559)
[07:31:37] <wikibugs>	 (CR) Dzahn: "i see, thanks Moritz. So that sounds like the alternative idea to add base::firewall in the default stanza is also not going to work becau" [puppet] - https://gerrit.wikimedia.org/r/534005 (owner: Dzahn)
[07:33:38] <wikibugs>	 (CR) Muehlenhoff: [C: -1] "Adding that fail seems like a fine approach, we only need to make sure that proper roles for setting up a server with and without base::fi" [puppet] - https://gerrit.wikimedia.org/r/534005 (owner: Dzahn)
[07:34:28] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train
[07:35:05] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Reorganize future master/candidate on s1 codfw [puppet] - https://gerrit.wikimedia.org/r/534018 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:38:36] <marostegui>	 !log Upgrade and reboot db2103 and db2112 to pick up binlog format change - T230106
[07:44:30] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 53.81 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:46:54] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler={proxy:fcgi://127.0.0.1:9000,proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&va
[07:46:54] <icinga-wm>	 server&var-method=GET
[07:47:22] <jynus>	 did we just lose lots of app servers on codfw?
[07:47:52] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler={proxy:fcgi://127.0.0.1:9000,proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluste
[07:47:52] <icinga-wm>	 ethod=GET
[07:47:55] <marostegui>	 yeah, that is what icinga shows on the 
[07:47:58] <marostegui>	 dashboard
[07:48:13] <jynus>	 is it network?
[07:48:30] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[07:49:22] <marostegui>	 I am checking mw2274 and it apparently is ok
[07:49:30] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[07:49:48] <marostegui>	 they all recovered indeed
[07:49:55] <mutante>	 i dont see that on icinga ?
[07:50:02] <mutante>	 yea
[07:50:06] <marostegui>	 they just recovered
[07:50:09] <jynus>	 it recovered, but pybal was seeing multiple failures
[07:50:55] <jynus>	 many db errors too, so that would point to an app server or network issue
[07:52:28] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 93.27 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:54:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1133 from wikitech', diff saved to https://phabricator.wikimedia.org/P9030 and previous config saved to /var/cache/conftool/dbconfig/20190903-075451-marostegui.json
[07:55:18] <mutante>	 nothing on the maint-announce calendar or inbox
[08:00:13] <wikibugs>	 (PS2) Ema: ATS: log Cookie in labs too [puppet] - https://gerrit.wikimedia.org/r/533938 (https://phabricator.wikimedia.org/T227432)
[08:01:54] <wikibugs>	 (CR) Ema: [C: +2] ATS: log Cookie in labs too [puppet] - https://gerrit.wikimedia.org/r/533938 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[08:02:15] <logmsgbot>	 !log joal@deploy1001 deploy aborted: Regular weekly analytics deploy train (duration: 27m 47s)
[08:02:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:51] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train - Second try
[08:03:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:19] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train - Second try (duration: 00m 27s)
[08:04:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:08] <mutante>	 I don't see anything obvious for either appservers or networking in codfw except this https://grafana.wikimedia.org/d/000000608/datacenter-overview?panelId=7&fullscreen&orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=All
[08:07:33] <wikibugs>	 (CR) Ema: [C: +1] hiera: Increase ATS SSL session cache capacity to 4M sessions [puppet] - https://gerrit.wikimedia.org/r/533994 (https://phabricator.wikimedia.org/T231849) (owner: Vgutierrez)
[08:09:20] <wikibugs>	 (CR) Vgutierrez: [C: +2] hiera: Increase ATS SSL session cache capacity to 4M sessions [puppet] - https://gerrit.wikimedia.org/r/533994 (https://phabricator.wikimedia.org/T231849) (owner: Vgutierrez)
[08:09:36] <wikibugs>	 (PS2) Vgutierrez: hiera: Increase ATS SSL session cache capacity to 4M sessions [puppet] - https://gerrit.wikimedia.org/r/533994 (https://phabricator.wikimedia.org/T231849)
[08:09:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1133 with weight 0 T229657', diff saved to https://phabricator.wikimedia.org/P9031 and previous config saved to /var/cache/conftool/dbconfig/20190903-080958-marostegui.json
[08:10:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:02] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[08:14:40] <icinga-wm>	 PROBLEM - Check systemd state on labweb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:14:50] <icinga-wm>	 PROBLEM - Check systemd state on labweb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:16:14] <icinga-wm>	 RECOVERY - Check systemd state on labweb1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:16:24] <icinga-wm>	 RECOVERY - Check systemd state on labweb1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:16:46] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:16:47] <Bsadowski1>	 Some issues loading pages
[08:16:52] <Bsadowski1>	 Hmm
[08:18:12] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:18:34] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:18:42] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:19:00] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:19:28] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:19:30] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:19:36] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:19:48] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:19:56] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:20:10] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:20:39] <wikibugs>	 (PS1) Ema: cache: allow caching piwik [puppet] - https://gerrit.wikimedia.org/r/534034 (https://phabricator.wikimedia.org/T230772)
[08:21:04] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:21:06] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:21:12] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:21:41] <gehel>	 !log purging maps / info.json from cache - T231842
[08:21:52] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:22:10] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:22:40] <wikibugs>	 (CR) Nikerabbit: [C: -1] Move ContentTranslation out of Beta in jvwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/533172 (https://phabricator.wikimedia.org/T231207) (owner: KartikMistry)
[08:23:08] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:23:20] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:24:32] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:24:43] <stashbot>	 gehel: Failed to log message to wiki. Somebody should check the error logs.
[08:24:47] <wikibugs>	 (PS2) Giuseppe Lavagetto: restbase: convert to use profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532667
[08:24:54] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:25:02] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:25:20] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:25:50] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:26:08] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:26:18] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:26:38] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:26:50] <marostegui>	 !log Add REPLICATION grant to wikiuser and wikiadmin on db1073 with replication enabled - T229657
[08:26:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:52] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[08:26:54] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:27:24] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:28:41] <wikibugs>	 Operations, Analytics, Analytics-Wikistats, Traffic, and 2 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (ema) >>! In T230772#5460265, @Nuria wrote: > @ema So I understand: caching pass needs to be removed  Yes, and the equivalent change needs to be done for ATS too. I...
[08:30:44] <wikibugs>	 (CR) Nuria: [C: +1] cache: allow caching piwik [puppet] - https://gerrit.wikimedia.org/r/534034 (https://phabricator.wikimedia.org/T230772) (owner: Ema)
[08:31:11] <wikibugs>	 Operations, Analytics, Analytics-Kanban, Analytics-Wikistats, and 3 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (Nuria)
[08:33:42] <wikibugs>	 (PS3) Giuseppe Lavagetto: restbase: convert to use profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532667
[08:35:12] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 50.92 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:35:46] <wikibugs>	 (CR) Ema: [C: +2] ATS: cache responses to cookies [puppet] - https://gerrit.wikimedia.org/r/533530 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[08:35:54] <wikibugs>	 (PS2) Ema: ATS: cache responses to cookies [puppet] - https://gerrit.wikimedia.org/r/533530 (https://phabricator.wikimedia.org/T227432)
[08:36:13] <wikibugs>	 (CR) Muehlenhoff: [C: +1] Add extra key for tstarling [puppet] - https://gerrit.wikimedia.org/r/533125 (owner: Tim Starling)
[08:40:28] <wikibugs>	 (CR) Muehlenhoff: [C: -1] "These were already omitted in the initial version of this patch (" [puppet] - https://gerrit.wikimedia.org/r/532348 (owner: Vgutierrez)
[08:41:06] <wikibugs>	 (PS2) Dzahn: remove wikiba.se microsite puppetization [puppet] - https://gerrit.wikimedia.org/r/532972 (https://phabricator.wikimedia.org/T99531)
[08:42:06] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:42:23] <wikibugs>	 (PS3) KartikMistry: Move ContentTranslation out of Beta in jvwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/533172 (https://phabricator.wikimedia.org/T231207)
[08:42:26] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:43:14] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:43:42] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:44:02] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:44:50] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:47:11] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/18143/restbase1025.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/532667 (owner: Giuseppe Lavagetto)
[08:47:24] <wikibugs>	 (PS4) Giuseppe Lavagetto: restbase: convert to use profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532667
[08:47:43] <wikibugs>	 Operations, serviceops: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024 (MoritzMuehlenhoff) p:Triage→Normal
[08:49:06] <ema>	 !log cp1075: pool ats-be with caching enabled T228629
[08:49:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:09] <stashbot>	 T228629: ATS Backends: Test live cache_text traffic  - https://phabricator.wikimedia.org/T228629
[08:49:15] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
[08:49:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:28] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 54.85 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:53:14] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:53:26] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:54:22] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:54:48] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:54:50] <ema>	 !log cp1089: varnish-backend-restart due to mbox lag and fetch failures
[08:54:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:02] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:55:08] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:55:24] <wikibugs>	 (PS1) Nuria: Adding caching headers for piwik javascript [puppet] - https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772)
[08:55:26] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:55:56] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:56:44] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:57:02] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[08:57:24] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 109.5 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:58:06] <wikibugs>	 (CR) Gilles: Adding caching headers for piwik javascript (2 comments) [puppet] - https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772) (owner: Nuria)
[09:01:02] <wikibugs>	 (PS2) Nuria: Adding caching headers for piwik javascript [puppet] - https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772)
[09:01:42] <wikibugs>	 Operations, MobileFrontend, Traffic, Mobile: https://en.wikipedia.org/wiki/Heteromyidae shows the mobile version on desktop - https://phabricator.wikimedia.org/T231620 (ema) p:Triage→Normal
[09:02:24] <wikibugs>	 Operations, Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (ema) This should now be fixed. Please let me know if that's not the case!
[09:02:33] <wikibugs>	 Operations, MobileFrontend, Traffic, Mobile: https://en.wikipedia.org/wiki/Heteromyidae shows the mobile version on desktop - https://phabricator.wikimedia.org/T231620 (ema) This should now be fixed. Please let me know if that's not the case!
[09:02:38] <wikibugs>	 Operations, ops-eqiad, DC-Ops: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) - https://phabricator.wikimedia.org/T227539 (akosiaris)
[09:03:15] <wikibugs>	 (PS2) Ema: cache: allow caching piwik [puppet] - https://gerrit.wikimedia.org/r/534034 (https://phabricator.wikimedia.org/T230772)
[09:03:27] <gehel>	 !log reset kartotherian password - T231842
[09:03:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:34] <Shanmugamp7>	 anyone getting 503?
[09:04:38] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqsin on icinga1001 is CRITICAL: job=varnish-text site=eqsin https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:04:42] <ema>	 Shanmugamp7: yup
[09:04:56] <ema>	 !log cp1085: varnish-backend-restart, mbox lag and fetch failures
[09:04:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:44] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:05:54] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:06:03] <wikibugs>	 (CR) Ema: [C: +2] cache: allow caching piwik [puppet] - https://gerrit.wikimedia.org/r/534034 (https://phabricator.wikimedia.org/T230772) (owner: Ema)
[09:06:08] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:06:14] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:06:32] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:06:54] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at codfw on icinga1001 is CRITICAL: job=varnish-text site=codfw https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:07:02] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:07:04] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:07:10] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:07:22] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:07:32] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:07:44] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:07:50] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:07:50] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:08:18] <ema>	 Shanmugamp7: things should look better now
[09:08:32] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:08:40] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:08:40] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:08:44] <Shanmugamp7>	 ema: ok, thanks
[09:08:46] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:08:50] <ema>	 Shanmugamp7: thank you!
[09:09:44] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:11:39] <wikibugs>	 Operations, Analytics, Analytics-Kanban, Analytics-Wikistats, and 3 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (Nuria)
[09:11:57] <wikibugs>	 (PS2) Giuseppe Lavagetto: scb: convert to profile::lvs::realserver [puppet] - https://gerrit.wikimedia.org/r/532668
[09:16:03] <wikibugs>	 Operations, Traffic: ATS-tls isn't enforcing the same list of curves as nginx during TLS handshake - https://phabricator.wikimedia.org/T231859 (Vgutierrez)
[09:16:17] <hashar>	 !log Deploy refactor of Zuul pipelines which might mean that some repos/branches would miss jobs or have extra unwanted jobs. In such case please file a task against #continuous-integration-config
[09:16:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:55] <wikibugs>	 10Operations, 10Traffic: ATS-tls isn't enforcing the same list of curves as nginx during TLS handshake - https://phabricator.wikimedia.org/T231859 (10Vgutierrez) p:05Triage→03Normal
[09:21:30] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: scb: convert to profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/532668
[09:26:48] <wikibugs>	 (03PS1) 10Vgutierrez: Release 8.0.5-1wm5 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/534118 (https://phabricator.wikimedia.org/T231859)
[09:27:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [cookbooks] - 10https://gerrit.wikimedia.org/r/530096 (https://phabricator.wikimedia.org/T225297) (owner: 10Elukey)
[09:27:44] <wikibugs>	 (03PS1) 10Dzahn: aptrepo: attempt to fix ListShellHook for envoy [puppet] - 10https://gerrit.wikimedia.org/r/534119
[09:27:53] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18145/scb1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/532668 (owner: 10Giuseppe Lavagetto)
[09:28:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "These are just test hosts anyway." [puppet] - 10https://gerrit.wikimedia.org/r/531235 (https://phabricator.wikimedia.org/T102099) (owner: 10Jbond)
[09:30:02] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch Stas to volunteer account [puppet] - 10https://gerrit.wikimedia.org/r/533859
[09:30:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] aptrepo: attempt to fix ListShellHook for envoy [puppet] - 10https://gerrit.wikimedia.org/r/534119 (owner: 10Dzahn)
[09:30:37] <wikibugs>	 (03CR) 10Gilles: [C: 03+1] Adding caching headers for piwik javascript [puppet] - 10https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772) (owner: 10Nuria)
[09:30:58] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] aptrepo: attempt to fix ListShellHook for envoy [puppet] - 10https://gerrit.wikimedia.org/r/534119 (owner: 10Dzahn)
[09:31:13] <wikibugs>	 (03PS2) 10Dzahn: aptrepo: attempt to fix ListShellHook for envoy [puppet] - 10https://gerrit.wikimedia.org/r/534119
[09:32:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch Stas to volunteer account [puppet] - 10https://gerrit.wikimedia.org/r/533859 (owner: 10Muehlenhoff)
[09:33:56] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Configure a list of curves to be offered during the TLS handshake [puppet] - 10https://gerrit.wikimedia.org/r/534123 (https://phabricator.wikimedia.org/T231859)
[09:34:17] <wikibugs>	 (03PS3) 10Dzahn: aptrepo: attempt to fix ListShellHook for envoy [puppet] - 10https://gerrit.wikimedia.org/r/534119
[09:37:57] <wikibugs>	 (03CR) 10Vgutierrez: "pcc is happy: https://puppet-compiler.wmflabs.org/compiler1001/18146/" [puppet] - 10https://gerrit.wikimedia.org/r/534123 (https://phabricator.wikimedia.org/T231859) (owner: 10Vgutierrez)
[09:46:12] <mutante>	 !log install1002 - import GPG key for getenvoy repo, importing envoy for jessie with reprepro update
[09:46:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:30] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: ores: convert to profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/532669
[09:46:51] <moritzm>	 !log moved uid=smalyshev from cn=wmf to cn=nda
[09:46:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:16] <wikibugs>	 10Operations, 10Cassandra, 10Core Platform Team Workboards (Clinic Duty Team): Revisit default settings for c-foreach-restart - https://phabricator.wikimedia.org/T198787 (10mobrovac) a:05mobrovac→03None
[09:48:29] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18147/ores1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/532669 (owner: 10Giuseppe Lavagetto)
[09:49:00] <wikibugs>	 (03PS1) 10Dzahn: tlsproxy/envoy: fix package name for envoy on jessie [puppet] - 10https://gerrit.wikimedia.org/r/534125
[09:51:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] tlsproxy/envoy: fix package name for envoy on jessie [puppet] - 10https://gerrit.wikimedia.org/r/534125 (owner: 10Dzahn)
[09:53:21] <wikibugs>	 (03CR) 10Ema: [C: 03+1] ATS: Configure a list of curves to be offered during the TLS handshake [puppet] - 10https://gerrit.wikimedia.org/r/534123 (https://phabricator.wikimedia.org/T231859) (owner: 10Vgutierrez)
[09:53:52] <wikibugs>	 (03CR) 10Ema: [C: 03+1] Release 8.0.5-1wm5 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/534118 (https://phabricator.wikimedia.org/T231859) (owner: 10Vgutierrez)
[09:54:50] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: calico: Enabled felix prometheus endpoint [puppet] - 10https://gerrit.wikimedia.org/r/534126
[09:59:08] <_joe_>	 !log removing old lvs-related scripts from ores*
[09:59:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:30] <wikibugs>	 (03PS2) 10Dzahn: tlsproxy/envoy: fix package name for envoy on jessie [puppet] - 10https://gerrit.wikimedia.org/r/534125
[10:00:24] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: proton: convert to profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/532670
[10:02:26] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18149/proton1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/532670 (owner: 10Giuseppe Lavagetto)
[10:03:43] <wikibugs>	 (03PS1) 10Dzahn: requesttracker: include envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/534128
[10:03:54] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] tlsproxy/envoy: fix package name for envoy on jessie [puppet] - 10https://gerrit.wikimedia.org/r/534125 (owner: 10Dzahn)
[10:04:25] <wikibugs>	 (03PS3) 10Dzahn: tlsproxy/envoy: fix package name for envoy on jessie [puppet] - 10https://gerrit.wikimedia.org/r/534125
[10:07:18] <_joe_>	 mutante: the puppetization on jessie is completely untested, don't hate me if nothing works :P
[10:07:39] <wikibugs>	 (03PS1) 10Dzahn: add discovery name for RT [dns] - 10https://gerrit.wikimedia.org/r/534129
[10:07:42] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: openldap,labweb: convert to profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/532671
[10:07:48] <mutante>	 _joe_: ok :) i am using the RT server to test it.. even though i want to replace that with stretch anyways
[10:08:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] add discovery name for RT [dns] - 10https://gerrit.wikimedia.org/r/534129 (owner: 10Dzahn)
[10:08:14] <_joe_>	 mutante: before you add that discovery name, we need to change a few things on the puppet side though
[10:08:43] <_joe_>	 oh just a cname heh
[10:08:57] <mutante>	 yea, i am using the CNAME method like for planet and people etc
[10:09:07] <mutante>	 copied that from the first example from e.ma
[10:09:26] <mutante>	 i would be happy to do that part later though
[10:09:56] <mutante>	 actually i want to delete ununpentium anyways. this was just to confirm the install works
[10:10:57] <mutante>	 real goals: replace ununpentium with rt1001 (stretch) and then replace rt with rt-static and stop running the Perl code
[10:13:45] <mutante>	 duh. that's a public IP as well.. right.
[10:14:11] <mutante>	 glad that dns-lint is smart nowadays
[10:15:19] <wikibugs>	 (03PS2) 10Dzahn: requesttracker: include envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/534128
[10:15:48] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:15:58] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:16:10] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:16:14] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqsin on icinga1001 is CRITICAL: job=varnish-text site=eqsin https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:16:16] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:16:43] <wikibugs>	 (03CR) 10Dzahn: [C: 04-2] "ununpentium is in wikimedia.org" [dns] - 10https://gerrit.wikimedia.org/r/534129 (owner: 10Dzahn)
[10:16:56] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 53.03 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[10:17:02] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:17:04] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:17:10] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:17:17] <ema>	 !log cp1083: varnish-backend-restart -- mbox lag, fetch failures
[10:17:26] <mutante>	 looks like before on https://grafana.wikimedia.org/d/000000479/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:17:34] <mutante>	 ack 1083
[10:17:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:51] <ema>	 mutante: yup, 1083 this time
[10:18:12] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:18:54] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] requesttracker: include envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/534128 (owner: 10Dzahn)
[10:19:26] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:20:34] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:20:44] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:20:58] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:21:04] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:21:22] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:21:48] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:21:50] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[10:21:56] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:24:52] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 56.13 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[10:31:50] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: openldap,labweb: convert to profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/532671
[10:36:18] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/18152/" [puppet] - 10https://gerrit.wikimedia.org/r/532671 (owner: 10Giuseppe Lavagetto)
[10:40:44] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 72.81 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[10:44:38] <wikibugs>	 (03PS1) 10Dzahn: add certificate for rt.discovery [puppet] - 10https://gerrit.wikimedia.org/r/534132
[10:44:53] <wikibugs>	 (03PS1) 10Dzahn: add fake SSL key for rt.discovery [labs/private] - 10https://gerrit.wikimedia.org/r/534133
[10:47:37] <wikibugs>	 (03CR) 10Dzahn: [V: 03+2 C: 03+2] add fake SSL key for rt.discovery [labs/private] - 10https://gerrit.wikimedia.org/r/534133 (owner: 10Dzahn)
[10:48:26] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] add certificate for rt.discovery [puppet] - 10https://gerrit.wikimedia.org/r/534132 (owner: 10Dzahn)
[10:48:37] <wikibugs>	 (03PS2) 10Dzahn: add certificate for rt.discovery [puppet] - 10https://gerrit.wikimedia.org/r/534132
[10:50:31] <wikibugs>	 10Operations, 10Traffic: Cannot download STL files due to network error - https://phabricator.wikimedia.org/T231422 (10ema) The issue happens due to varnish-frontend giving up the fetch from ATS because of lack of free space:  ` --  ObjStatus      200 --  ObjReason      OK --  ObjHeader      Date: Tue, 03 Sep...
[10:53:13] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, and 3 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (10Nuria) I think what is left here is to restart apache for settings to take place
[10:53:48] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for test page) is CRITICAL: Test Get preview mobile HTML for test page returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[10:55:03] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, and 3 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (10Nuria) Restarted with : sudo apache2ctl restart
[10:58:40] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:00:05] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1100).
[11:00:05] <jouncebot>	 Zoranzoki21, Amir1, and raynor: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:13] <Amir1>	 o/
[11:01:11] <raynor>	 o/
[11:02:00] <Amir1>	 I can SWAT I guess
[11:02:42] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533538 (https://phabricator.wikimedia.org/T231654) (owner: 10Zoranzoki21)
[11:03:07] <Amir1>	 bah, the user is not around :/
[11:03:20] <wikibugs>	 (03CR) 10Ladsgroup: "The user is not around" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533538 (https://phabricator.wikimedia.org/T231654) (owner: 10Zoranzoki21)
[11:03:40] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533882 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup)
[11:05:16] <raynor>	 Amir1 - mine is almost a noop. I only need to verify that js config has changed
[11:05:20] <wikibugs>	 (03Merged) 10jenkins-bot: Enable WRITE_BOTH for items term store for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533882 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup)
[11:05:55] <Amir1>	 raynor: I will merge yours quickly
[11:06:24] <raynor>	 kk, thx
[11:06:53] <wikibugs>	 (03PS2) 10Ladsgroup: Bump MobileWebUIActionsTracking sampling rate to 1 percent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533930 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga)
[11:07:04] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Bump MobileWebUIActionsTracking sampling rate to 1 percent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533930 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga)
[11:07:06] <wikibugs>	 (03CR) 10jenkins-bot: Enable WRITE_BOTH for items term store for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533882 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup)
[11:07:14] <Amir1>	 raynor: is it testable on mwdebug1001?
[11:07:17] <Amir1>	 mwdebug1002
[11:07:27] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:533882|Enable WRITE_BOTH for items term store for wikidatawiki (T225055)]] (duration: 00m 55s)
[11:07:35] <raynor>	 Amir1, I just need to check mw.config, that's all
[11:07:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:40] <stashbot>	 T225055: Switch `tmpItemTermsMigrationStages` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225055
[11:07:40] <Amir1>	 marostegui: This went live
[11:08:06] <raynor>	 once it gets deployed I'll see changes in grafana
[11:08:12] <Amir1>	 raynor: cool
[11:08:48] <raynor>	 we went with super safe events sampling rate of 0.01%, and as you might expect, graph shows ~1 event from time to time
[11:09:12] <Amir1>	 LOL
[11:09:28] <wikibugs>	 (03Merged) 10jenkins-bot: Bump MobileWebUIActionsTracking sampling rate to 1 percent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533930 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga)
[11:09:50] <wikibugs>	 (03CR) 10jenkins-bot: Bump MobileWebUIActionsTracking sampling rate to 1 percent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533930 (https://phabricator.wikimedia.org/T220016) (owner: 10Pmiazga)
[11:10:15] <Amir1>	 raynor: going live
[11:10:48] <raynor>	 awesome, thx
[11:10:59] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:533930|Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016)]] (duration: 00m 53s)
[11:11:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:02] <stashbot>	 T220016: Create, and deploy working  MobileWebUIActionsTracking schema - https://phabricator.wikimedia.org/T220016
[11:11:25] <Amir1>	 raynor: ^
[11:11:40] <raynor>	 thx, checking
[11:13:24] <raynor>	 lovely cache ;)
[11:14:36] <icinga-wm>	 PROBLEM - Check systemd state on ununpentium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:15:52] <Zoranzoki21>	 Hi, sorry for being late! I am on the bus because I am going to school
[11:16:03] <Zoranzoki21>	 I saw that my patch for bswiki got a +2
[11:16:10] <Zoranzoki21>	 How is it going?
[11:16:25] <Amir1>	 Zoranzoki21: I had to stop it because you weren't around. I can continue now
[11:16:55] <Zoranzoki21>	 I don't have access to X-Wikimedia-Debug because I am on phone
[11:17:00] <Amir1>	 raynor: mediawiki basically is a caching service with some functionalities around it 
[11:17:09] <raynor>	 :)
[11:17:09] <Zoranzoki21>	 Urbanecm usually does it when I am not around
[11:17:22] <Amir1>	 Zoranzoki21: Is there a way I can test it?
[11:17:54] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/random/title (retrieve a random article title) is CRITICAL: Test retrieve a random article title returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:18:15] <Zoranzoki21>	 I think no..
[11:18:40] <Zoranzoki21>	 I changed the name of wgMetaNamespaceTalk, and after the patch is merged you should run namespaceDupes, as far as I know
[11:19:08] <jynus>	 raynor: Amir1: are you aware of the above failure on mobileapps?
[11:19:32] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:19:45] <Amir1>	 Zoranzoki21: If it's a namespace on bswiki, I should be able to check it.
[11:19:48] <jynus>	 ah, it got fixed
[11:20:09] <Amir1>	 jynus: no, I think it's an intermittent issue (btw. we should move away from scb :D)
[11:20:11] <Zoranzoki21>	 Amir1: I think that is right. It is the Project_talk namespace, as far as I know
[11:20:11] <raynor>	 jynus, Amir1: yup, I saw that, it's not something my patch could cause
[11:20:19] <Amir1>	 kubernetes ftw
[11:20:28] <jynus>	 ok, np now
[11:22:09] <wikibugs>	 (03PS2) 10Ladsgroup: Fix wgMetaNamespaceTalk for bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533538 (https://phabricator.wikimedia.org/T231654) (owner: 10Zoranzoki21)
[11:22:25] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533538 (https://phabricator.wikimedia.org/T231654) (owner: 10Zoranzoki21)
[11:23:32] <wikibugs>	 (03Merged) 10jenkins-bot: Fix wgMetaNamespaceTalk for bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533538 (https://phabricator.wikimedia.org/T231654) (owner: 10Zoranzoki21)
[11:23:47] <wikibugs>	 (03CR) 10jenkins-bot: Fix wgMetaNamespaceTalk for bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533538 (https://phabricator.wikimedia.org/T231654) (owner: 10Zoranzoki21)
[11:24:05] <raynor>	 Amir1, FYI: the wgWMEMobileWebUIActionsTracking config is still set to 0.0001 ;/
[11:24:18] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) is CRITICAL: Test retrieve featured image data for April 29, 2016 returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:24:44] <Amir1>	 raynor: oh oh oh no, I forgot to rebase
[11:24:56] <Amir1>	 I'm sorry
[11:25:01] <Amir1>	 deploying now
[11:25:01] <raynor>	 no worries
[11:25:25] <raynor>	 I thought it was just a cache thing, ResourceLoader output is cached for ~5 mins if I remember it right
[11:25:44] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:533930|Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016)]] (duration: 00m 52s)
[11:25:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:46] <stashbot>	 T220016: Create, and deploy working  MobileWebUIActionsTracking schema - https://phabricator.wikimedia.org/T220016
[11:25:54] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:26:22] <Amir1>	 raynor: sorry. Can you try again
[11:26:44] <raynor>	 ok, now it works, thx Amir1 
[11:26:56] <raynor>	 and don't worry, it's ok
[11:27:27] <Zoranzoki21>	 Yes, everyone can get something wrong; no one is perfect
[11:27:45] <Zoranzoki21>	 Amir1: How is it going with my patch?
[11:28:00] <Amir1>	 Zoranzoki21: being deployed
[11:28:08] <Zoranzoki21>	 Amir1: Ok
[11:28:33] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:533538|Fix wgMetaNamespaceTalk for bswiki (T231654)]] (duration: 00m 54s)
[11:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:36] <stashbot>	 T231654: Update talk namespace for Bosnian in InitialiseSettings.php - https://phabricator.wikimedia.org/T231654
[11:29:13] <Amir1>	 !log ladsgroup@mwmaint1002:~$ mwscript namespaceDupes.php bswiki --fix (T231654)
[11:29:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:24] <Amir1>	 !log EU SWAT is done
[11:29:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:38] <Zoranzoki21>	 Done? :)
[11:30:42] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) is CRITICAL: Test retrieve the most read articles for January 1, 2016 returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:31:31] <Zoranzoki21>	 Amir1: Can I go?
[11:31:42] <Amir1>	 Zoranzoki21: yup
[11:31:45] <Zoranzoki21>	 Tnx
[11:32:18] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:35:48] <Amir1>	 !log ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --to-id 1000 --sleep 2 (T225056)
[11:35:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:51] <stashbot>	 T225056: Run Item Terms Rebuild script - https://phabricator.wikimedia.org/T225056
[11:39:00] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: role::lvs::realserver: remove from puppet [puppet] - 10https://gerrit.wikimedia.org/r/532673
[11:42:09] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] "according to cumin, only one server that is currently down still has the resources removed here." [puppet] - 10https://gerrit.wikimedia.org/r/532673 (owner: 10Giuseppe Lavagetto)
[11:47:07] <hashar>	 jouncebot: next
[11:47:07] <jouncebot>	 In 0 hour(s) and 12 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1200)
[11:47:08] <hashar>	 jouncebot: now
[11:47:09] <jouncebot>	 For the next 0 hour(s) and 12 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1100)
[11:48:02] <marostegui>	 !log Downtime m5 hosts T229657
[11:48:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:48:05] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[11:49:23] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[11:55:31] <marostegui>	 !log Change topology on m5 and make everything replicate from db1133 - T229657
[11:55:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:35] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[11:58:55] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[12:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1200)
[12:00:23] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Promote db1133 as wikitech master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534144 (https://phabricator.wikimedia.org/T229657)
[12:02:04] <wikibugs>	 (03CR) 10Marostegui: mariadb: Promote db1133 to m5 master [puppet] - 10https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[12:02:11] <wikibugs>	 (03PS7) 10Marostegui: mariadb: Promote db1133 to m5 master [puppet] - 10https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657)
[12:02:17] <wikibugs>	 (03PS2) 10Hashar: Remove role::ci::slave::webperformance [puppet] - 10https://gerrit.wikimedia.org/r/531420 (https://phabricator.wikimedia.org/T225416)
[12:02:21] <marostegui>	 !log Disable puppet on db1073 and db1133 - T229657
[12:02:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:24] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[12:03:57] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] mariadb: Promote db1133 to m5 master [puppet] - 10https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[12:06:43] <wikibugs>	 (03CR) 10Marostegui: wmnet: Promote db1133 to m5 master [dns] - 10https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[12:06:50] <wikibugs>	 (03CR) 10Hashar: "Some easy cleanup :]" [puppet] - 10https://gerrit.wikimedia.org/r/531420 (https://phabricator.wikimedia.org/T225416) (owner: 10Hashar)
[12:07:33] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[12:07:58] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) @JHedden all the PRE steps are done.
[12:11:47] <hashar>	 jouncebot: now
[12:11:48] <jouncebot>	 For the next 0 hour(s) and 48 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1200)
[12:12:15] <hashar>	 !sal
[12:12:16] <wm-bot>	 https://wikitech.wikimedia.org/wiki/Server_Admin_Log  https://tools.wmflabs.org/sal/production   See it and you will know all you need.
[12:12:27] <hashar>	 oh swat is done. thank you :]
[12:12:41] <hashar>	 I am going to promote 1.34.0-wmf.20 to rest of wikis since there are no more blockers
[12:15:30] <wikibugs>	 (03PS1) 10Hashar: all wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534147
[12:15:32] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] all wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534147 (owner: 10Hashar)
[12:18:05] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534147 (owner: 10Hashar)
[12:19:48] <logmsgbot>	 !log hashar@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.20
[12:19:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:50] <wikibugs>	 (03CR) 10jenkins-bot: all wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534147 (owner: 10Hashar)
[12:20:18] <icinga-wm>	 PROBLEM - PHP opcache health on mwdebug2001 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:20:34] <icinga-wm>	 PROBLEM - PHP opcache health on mwdebug2002 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:21:54] <hashar>	 :-\
[12:22:04] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:fcgi://127.0.0.1:9000 method=GET https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver
[12:22:31] <hashar>	 and the high latency is I guess just the bytecode cache being primed
[12:23:40] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[12:26:26] <godog>	 hashar: not sure why only codfw though
[12:26:34] <hashar>	 just noise imho
[12:26:50] <hashar>	 I would guess the icinga check has hit a mw server that hadn't been hit previously
[12:26:54] <hashar>	 and ends up timing out
[12:27:08] <hashar>	 == noise imho :]
[12:27:22] <wikibugs>	 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog: Lake Huron missing due to apparent OSM vandalism - https://phabricator.wikimedia.org/T231691 (10Haros) >>! In T231691#5457015, @Pikne wrote: > @Gehel, there's also a lake in Norway waiting for monthly automatic update to reappear in smaller zoom le...
[12:27:28] <hashar>	 cutting branches
[12:27:38] <godog>	 the metric above isn't checked directly by icinga, it is extracted from logs of all appservers, but indeed that could be a minority of appservers influencing  the average
[12:27:57] <godog>	 checked by icinga on a single host that is
[12:29:26] <hashar>	 !log Cutting wmf/1.34.0-wmf.21  # T220746
[12:29:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:29] <stashbot>	 T220746: 1.34.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T220746
[12:33:26] <icinga-wm>	 PROBLEM - SSH access on cobalt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Gerrit
[12:34:48] <hashar>	 cobalt that is gerrit :-\
[12:34:57] <hashar>	 apparently due to the branch cut
[12:35:02] <marostegui>	 hashar: gerrit seems to be down for me indeed
[12:35:03] <hashar>	 that train is going to take a while ;-\\
[12:35:10] <hashar>	 ah just slow
[12:35:12] <hashar>	 it is back
[12:35:35] <marostegui>	 I just refreshed and it is back for me too
[12:36:28] <icinga-wm>	 RECOVERY - SSH access on cobalt is OK: SSH OK - GerritCodeReview_2.15.14-16-g855b179b5f (SSHD-CORE-1.6.0) (protocol 2.0) https://wikitech.wikimedia.org/wiki/Gerrit
[12:37:25] <jynus>	 we need moar T226240
[12:37:26] <stashbot>	 T226240: Create mirror of Gerrit repositories for consumption by various tools - https://phabricator.wikimedia.org/T226240
[12:38:04] <cdanis>	 I think the new gerrit readonly replica is being used by several things already
[12:38:18] <cdanis>	 also Soon there will be a newer, stronger machine as the primary
[12:38:22] <hashar>	 no idea what might have happened
[12:38:27] <jynus>	 yeah, that is why I said we need it moar :-D
[12:38:41] <hashar>	 last time I checked the bulk of the traffic got moved there
[12:38:50] <hashar>	 this time, I don't know what happened :-\
[12:39:00] <moritzm>	 !log upgrading mwdebug2001 to PHP 7.2.22
[12:39:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:03] <jynus>	 and yes, I know it is not simple, and that partition tolerance is an issue
[12:39:36] <akosiaris>	 https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=cobalt&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=now-30m&to=now
[12:40:03] <akosiaris>	 ~1.5 times more CPU usage and load is generally increased
[12:40:08] <akosiaris>	 I see nothing in the logs however
[12:40:36] <hashar>	 I am cutting the wmf branches on repos so that takes a bit of cycles
[12:40:55] <hashar>	 but that alone does not explain the load spike :-\
[12:41:02] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug2001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:41:48] <jynus>	 yeah, deployment operations should only be trivial operations AFAIK from cobalt point of view
[12:43:21] <godog>	 looks like gc times are on their way to the sky, from https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&from=1567510996784&to=1567514596784&panelId=14&fullscreen&var-Application=&var-Window=30m
[12:43:52] <thcipriani>	 https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&panelId=14&fullscreen&from=1567493022605&to=1567514622605&var-Application=&var-Window=30m
[12:44:03] <akosiaris>	 there's an increase in network traffic as well. It does seem periodic, but somehow between 12:25 and 12:32 the machine was receiving 6-7MB/s of traffic (which it normally doesn't)
[12:44:06] <thcipriani>	 yeah, that. This happens during branch cut recently
[12:44:24] <jynus>	 interesting
[12:44:25] <thcipriani>	 gc thrashes and lots of GC pause is indistinguishable from the service being down :(
[12:45:32] <thcipriani>	 evidently a lot of memory gets allocated for adding a ref; although that seems like a recent development. I've noticed it for the past handful of train branch cuts.
[12:46:39] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] wmnet: Promote db1133 to m5 master [dns] - 10https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[12:46:45] <jynus>	 any ideas? aside from hw upgrade? Would physically separate large/critical repos help?
[12:47:26] <jynus>	 or it is more of a configuration issue than load?
[12:47:53] <wikibugs>	 (03PS3) 10Nuria: Adding caching headers for piwik javascript [puppet] - 10https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772)
[12:48:56] <thcipriani>	 I think it may be due more to configuration since this is a recent development. My theory recently was that we had too many old refs lying around and I did a bunch of cleanup, but that didn't solve the issue.
[12:49:35] <marostegui>	 jeh: morning! ready to start in 10 minutes?
[12:50:13] <jeh>	 marostegui: yep, will shut down the OpenStack scheduler just before we start
[12:50:25] <marostegui>	 jeh: excellent, I will confirm with you before starting
[12:50:45] <jynus>	 maybe sending a reminder on cloud?
[12:50:51] <jynus>	 (IRC)
[12:51:05] <jeh>	 jynus: good idea, I'll do that
[12:51:25] <jynus>	 to avoid mass-reports
[12:52:03] <jynus>	 as it may temporarily affect striker and other tools
[12:52:44] <moritzm>	 !log uploaded PHP 7.2.22 to component/php72 T230024
[12:52:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:47] <stashbot>	 T230024: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024
[12:54:52] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] wmnet: Promote db1133 to m5 master [dns] - 10https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[12:55:18] <wikibugs>	 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog: Lake Huron missing due to apparent OSM vandalism - https://phabricator.wikimedia.org/T231691 (10Gehel) >>! In T231691#5460894, @Haros wrote: > We really need a way to trigger a refresh without creating a new issue. I can do so if that is necessary,...
[12:56:19] <wikibugs>	 (03CR) 10Krinkle: "I'm confused - trigger_error cannot trigger the fatal error page, it should emit a syslog warning only. Try an uncaught Exception?" [puppet] - 10https://gerrit.wikimedia.org/r/511078 (https://phabricator.wikimedia.org/T113114) (owner: 10Ladsgroup)
[12:56:20] <Lucas_WMDE>	 is the m5 master failover at the same time as the train?
[12:56:31] <marostegui>	 Lucas_WMDE: should not interfere 
[12:56:36] <Lucas_WMDE>	 ok
[12:56:46] <marostegui>	 Lucas_WMDE: only affects wikitech
[12:57:33] <marostegui>	 cdanis: ping
[12:57:38] <cdanis>	 o/
[12:57:43] <cdanis>	 I've been watching :)
[12:57:43] <marostegui>	 o/
[12:57:49] <marostegui>	 thanks for being around!
[12:58:46] <marostegui>	 The main test I want to check is if wikitech becomes indeed read-only after running dbctl
[12:59:00] <cdanis>	 sure
[13:00:04] <jouncebot>	 hashar: How many deployers does it take to do MediaWiki train - European version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1300).
[13:00:04] <jouncebot>	 marostegui, jeh, and jynus: Dear deployers, time to do the m5 database master failover deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1300).
[13:00:07] <marostegui>	 jeh: let me know when you are ready and I can start
[13:00:44] <jeh>	 marostegui: all set, you can begin 
[13:00:47] <marostegui>	 ok
[13:00:48] <marostegui>	 starting
[13:00:49] <marostegui>	 !log Failover m5 from db1073 to db1133 - T229657
[13:00:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:52] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[13:01:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Set wikitech as read-only for maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9033 and previous config saved to /var/cache/conftool/dbconfig/20190903-130113-marostegui.json
[13:01:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:26] <jynus>	 I can still edit
[13:01:29] <marostegui>	 cdanis: dbctl failed with a warning
[13:01:39] <jynus>	 still can edit
[13:01:50] <marostegui>	 yep, wikitech is not read-only
[13:02:02] <cdanis>	 what did dbctl output?
[13:02:24] <marostegui>	 https://phabricator.wikimedia.org/P9034
[13:02:30] <icinga-wm>	 PROBLEM - DPKG on mwdebug2002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[13:02:50] <cdanis>	 omg json schema
[13:03:00] <marostegui>	 cdanis: this is what I ran: dbctl --scope eqiad section wikitech ro "Maintenance on wikitech T229657 " && dbctl config commit -m "Set wikitech as read-only for maintenance T229657"
[13:03:22] <cdanis>	 yep all looks good
[13:03:30] <Reedy>	 marostegui: labswiki not wikitech
[13:03:36] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug2002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[13:03:41] <jynus>	 the section is called wikitech, Reedy
[13:03:46] <jynus>	 the wiki is called labswiki
[13:03:53] <marostegui>	 Reedy: but the section is called wikitech
[13:04:06] <icinga-wm>	 RECOVERY - DPKG on mwdebug2002 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[13:04:25] <marostegui>	 cdanis: dbctl config diff still shows uncommitted stuff (as it failed)
[13:05:01] <cdanis>	 marostegui: indeed, the json schema as written doesn't allow wikitech to be present in readOnlyBySection 🤦
[13:05:05] <cdanis>	 I am pushing a patch now
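(Editorial aside, not part of the log.) The failure cdanis describes can be pictured with a small sketch. It assumes a hypothetical allow-list of section names; it is not the real dbctl or conftool schema, and `ALLOWED_SECTIONS` and `validate_read_only_by_section` are invented names used only for illustration.

```python
# Hypothetical sketch: NOT the real dbctl/conftool schema or code.
# It mimics a schema that only accepts known section names as keys of
# readOnlyBySection, which is how a missing name can block a commit.

# Assumed allow-list as it might have looked before the patch (no "wikitech").
ALLOWED_SECTIONS = {"s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"}


def validate_read_only_by_section(config, allowed=ALLOWED_SECTIONS):
    """Return a list of validation errors for config['readOnlyBySection']."""
    errors = []
    for section, reason in config.get("readOnlyBySection", {}).items():
        if section not in allowed:
            errors.append(f"section {section!r} is not allowed in readOnlyBySection")
        if not isinstance(reason, str):
            errors.append(f"read-only reason for {section!r} must be a string")
    return errors


config = {"readOnlyBySection": {"wikitech": "Maintenance on wikitech T229657"}}

# With the assumed pre-patch allow-list the config is rejected.
before = validate_read_only_by_section(config)
# Once "wikitech" is added to the allow-list, validation passes.
after = validate_read_only_by_section(config, ALLOWED_SECTIONS | {"wikitech"})
print(before)
print(after)
```

Running a check of this kind before the commit step, rather than during it, would surface the rejected key earlier.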
[13:05:12] <marostegui>	 cdanis: great - thanks! :)
[13:05:22] <marostegui>	 we have time, our maintenance window was 30 minutes just in case!
[13:05:24] <wikibugs>	 (03PS1) 10CDanis: dbctl: schema: allow wikitech readonly [puppet] - 10https://gerrit.wikimedia.org/r/534150
[13:05:59] <jynus>	 we could also proceed without it if we were in a hurry, read only is detected also by mysql ro
[13:06:13] <jynus>	 but it would be less "smooth"
[13:06:23] <marostegui>	 yeah, let's try to merge cdanis patch and run the commit
[13:06:26] <jynus>	 +1
[13:06:27] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] dbctl: schema: allow wikitech readonly [puppet] - 10https://gerrit.wikimedia.org/r/534150 (owner: 10CDanis)
[13:06:34] <wikibugs>	 (03CR) 10CDanis: [V: 03+2 C: 03+2] dbctl: schema: allow wikitech readonly [puppet] - 10https://gerrit.wikimedia.org/r/534150 (owner: 10CDanis)
[13:06:48] <jynus>	 I see
[13:07:02] <jynus>	 cdanis: that is a mistake we have done many times- ignore wikitech
[13:07:07] <cdanis>	 indeed
[13:07:18] <marostegui>	 and that's why we have a task to move it to s5 or something \o/ :)
[13:07:19] <jynus>	 as it is a relatively new cluster (it didn't use to be part of the main installation)
[13:07:26] <marostegui>	 cdanis: let me know when I can try the commit again
[13:07:40] <cdanis>	 meh, gerrit is very slow right now even on fetch operations
[13:07:46] <jynus>	 yeah
[13:07:50] <wikibugs>	 (03PS1) 10Odder: Add high-density logos for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534151 (https://phabricator.wikimedia.org/T230120)
[13:08:18] <cdanis>	 ok marostegui give the commit a try again
[13:08:21] <marostegui>	 ok
[13:08:26] <marostegui>	 trying going read-only again
[13:08:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Set wikitech as read-only for maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9035 and previous config saved to /var/cache/conftool/dbconfig/20190903-130839-marostegui.json
[13:08:42] <marostegui>	 it went thru fine now, let's check
[13:08:42] <stashbot>	 marostegui@cumin1001: Failed to log message to wiki. Somebody should check the error logs.
[13:08:43] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[13:08:53] <marostegui>	 read only works for me
[13:08:57] <jynus>	 Warning: The database has been locked for maintenance, so you will not be able to save your edits right now. Yo
[13:09:01] <jynus>	 confirm^
[13:09:02] <marostegui>	 ok, proceeding
[13:09:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db1133 to wikitech master T229657', diff saved to https://phabricator.wikimedia.org/P9036 and previous config saved to /var/cache/conftool/dbconfig/20190903-130937-marostegui.json
[13:09:41] <stashbot>	 marostegui@cumin1001: Failed to log message to wiki. Somebody should check the error logs.
[13:09:53] <moritzm>	 !log upgrading remaining mwdebug servers to PHP 7.2.22 T230024
[13:09:54] <Lucas_WMDE>	 heh @ stashbot
[13:09:56] <stashbot>	 moritzm: Failed to log message to wiki. Somebody should check the error logs.
[13:09:57] <stashbot>	 T230024: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024
[13:10:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Set wikitech back to RW after maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9037 and previous config saved to /var/cache/conftool/dbconfig/20190903-131000-marostegui.json
[13:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:06] <marostegui>	 wikitech should be writable again
[13:10:12] <marostegui>	 jeh: changing DNS now
[13:10:14] <cdanis>	 stashbot seems to confirm :)
[13:10:15] <stashbot>	 See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[13:10:17] <jynus>	 ah, normal that log fails if wikitech in read only :-)
[13:10:31] <marostegui>	 I can edit fine on wikitech
[13:10:33] <jynus>	 loop dependency
[13:10:42] <jynus>	 yeah, I mean at the time 
[13:11:02] <jynus>	 marostegui: I confirm I can too
[13:11:06] <marostegui>	 jeh: DNS change went thru, TTL is 1M 
[13:11:20] <jeh>	 marostegui: OK, I'll keep my eye on the clients
[13:11:23] <marostegui>	 going to reload haproxy on dbproxy1005 (which is not used)
[13:11:23] <jynus>	 4 mediawiki errors
[13:11:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler={proxy:fcgi://127.0.0.1:9000,proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluste
[13:11:44] <icinga-wm>	 ethod=GET
[13:11:47] <marostegui>	 !log Reload haproxy on dbproxy1005 T229657
[13:11:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:54] <jynus>	 all regarding topology, nothing unexpected
[13:12:08] <jynus>	 no ongoing mw errors
[13:12:25] <marostegui>	 jynus: can you check tendril and zarcillo for me? db1133 should be the new master
[13:12:31] <jynus>	 doing
[13:12:35] <jynus>	 was next on my list
[13:12:36] <marostegui>	 thanks
[13:12:39] <marostegui>	 dbproxy1005 reloaded
[13:12:49] <jynus>	 tendril looks good
[13:13:12] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[13:13:12] <marostegui>	 jeh: how are things looking from your end?
[13:13:28] <jynus>	 and so does zarcillo, marostegui
[13:13:34] <marostegui>	 !log Re-enable puppet on db1073 and db1133 T229657
[13:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:37] <marostegui>	 jynus: great - thanks!
[13:13:53] <jynus>	 also no load issues- although none was expected
[13:14:02] <wikibugs>	 (03PS1) 10Odder: Add high-density logos for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534152 (https://phabricator.wikimedia.org/T230120)
[13:14:14] <Lucas_WMDE>	 oops, wikibugs was kicked?
[13:14:19] <jeh>	 marostegui: DNS is changed, services look good as they're coming back up
[13:14:25] <marostegui>	 great!
[13:14:36] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[13:14:39] <jynus>	 activity on old master?
[13:14:39] <cdanis>	 Lucas_WMDE: it does that occasionally, it will reconnect
[13:14:45] <jynus>	 (that was a question)
[13:14:53] <marostegui>	 jynus:nope
[13:14:57] <jynus>	 cool
[13:14:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1073 from wikitech T229657', diff saved to https://phabricator.wikimedia.org/P9038 and previous config saved to /var/cache/conftool/dbconfig/20190903-131456-marostegui.json
[13:15:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:00] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[13:15:06] <wikibugs>	 10Operations, 10Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (10Mholloway) 05Open→03Resolved Sounds like this can be resolved, then.  I can no longer reproduce the issue, but I'll reopen if I see any further cases.  Thanks!
[13:15:15] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[13:15:32] <jynus>	 Lucas_WMDE: can you think of any relation between wikibugs and wikitech?
[13:15:53] <Lucas_WMDE>	 no idea
[13:15:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add high-density logos for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534152 (https://phabricator.wikimedia.org/T230120) (owner: 10Odder)
[13:16:02] <hashar>	 !log Gerrit has some random timeouts from time to time (no reason)
[13:16:03] <Lucas_WMDE>	 I didn’t know it would reconnect by itself, in that case probably no problem
[13:16:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:12] <cdanis>	 the excess flood on wikibugs is usually just it posts too many updates too quickly
[13:16:12] <jynus>	 maybe some tools could have temporary issues?
[13:17:00] <jynus>	 wikibugs is the one that posts phab updates?
[13:17:40] <cdanis>	 yeah
[13:18:05] <jynus>	 any user reports of issues on irc? I see nothing on phab- although it normally takes some time
[13:18:35] <jynus>	 nothing strange on logs
[13:18:37] <marostegui>	 jeh: everything looking good?
[13:19:04] <jeh>	 there's a lot to check, but so far so good
[13:19:19] <jynus>	 yeah, take your time
[13:19:21] <jynus>	 :-D
[13:19:52] <jynus>	 the only worrying issue is some exceptions related to SDC/wikibase
[13:20:09] <jynus>	 but those happened for a long time before the switch
[13:20:18] <marostegui>	 jeh: good, let us know if we can help
[13:20:40] <marostegui>	 I will merge the MW config patch once the train is finished
[13:20:43] <marostegui>	 No rush for that one
[13:20:44] <jynus>	 https://tools.wmflabs.org/admin/tools seems to work
[13:20:50] <marostegui>	 Apart from that everything looks good from this end
[13:20:54] <jynus>	 but I don't have admin privileges there
[13:21:40] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) This was done successfully.  wikitech read only start: 13:08:40 wikit...
[13:21:53] <wikibugs>	 (03PS1) 10CDanis: dbctl: schema: allow wikitech in readOnlyBySection [software/conftool] - 10https://gerrit.wikimedia.org/r/534153
[13:21:57] <jynus>	 not sure I understand the difference between https://tools.wmflabs.org/admin/tools and https://toolsadmin.wikimedia.org but both seem to work
[13:22:05] <marostegui>	 cdanis: thanks for the quick patch :)
[13:22:13] <marostegui>	 Glad we had you around!
[13:22:26] <jynus>	 thanks cdanis
[13:22:31] <jynus>	 also marostegui, great job as usual
[13:22:38] <cdanis>	 yeah no, sorry for the trouble 
[13:22:56] <cdanis>	 I think _joe_ wrote that schema originally so clearly it’s his fault ;)
[13:22:56] <marostegui>	 no trouble at all :)
[13:23:42] <_joe_>	 I don't think we had wikitech in the readonlybysection back in the day :P
[13:23:47] <marostegui>	 cdanis: we've had issues with wikitech for years, because it is a wiki but it is on a misc section rather than on s1-s8, that's why we have https://phabricator.wikimedia.org/T167973 
[13:23:51] <_joe_>	 it was treated completely separately
[13:24:25] <odder>	 Hi everyone, any ideas why https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/534152/ failed?
[13:24:47] <_joe_>	 odder: I think this is the wrong channel to ask questions about CI
[13:25:01] <_joe_>	 #wikimedia-releng is probably a better place
[13:25:11] <hashar>	 !log 1.34.0-wmf.21 cut
[13:25:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:12] <cdanis>	 FWIW with a recent change I made to dbctl (not released yet) it would have become obvious that it would have failed validation before config commit 
[13:25:14] <marostegui>	 jeh: I am going to go to a meeting, but please once you are fully happy from your side, comment on the task and I will take care of the pending thing on the task (merge MW) after my meeting, as it is not urgent and the train is running
[13:25:15] <odder>	 Never knew this existed until just now :-P
[13:25:16] <Reedy>	 ccccccljuuntjnbundvnkeefhtrcfhcgfchribklhnjj
[13:25:27] <Reedy>	 ffs
[13:25:28] <jeh>	 marostegui: will do, thanks
[13:25:29] <marostegui>	 Reedy: get your cat off the keyboard
[13:25:31] <_joe_>	 Reedy: I want a yubikey too
[13:25:51] <cdanis>	 _joe_: OIT will send you one, and then I’ll help you put your ssh key on it 
[13:26:04] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:fcgi://127.0.0.1:9000 method=GET https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[13:26:08] <_joe_>	 marostegui: is the migration done?
[13:26:09] <hashar>	 !log Gerrit should be fine again, apparently was due to the wmf branch cut taking too much resources (sic) - T231872 filed to investigate
[13:26:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:11] <stashbot>	 T231872: Gerrit GC thrashing during branch cut - https://phabricator.wikimedia.org/T231872
[13:26:13] <Reedy>	 Or just buy one and reimburse it... Because they don't get any sort of discount
[13:26:22] <marostegui>	 _joe_: which migration? the failover?
[13:26:42] <_joe_>	 yes
[13:26:47] <marostegui>	 _joe_: yep
[13:26:50] <cdanis>	 Btw there’s something going on with the app servers, elevated latency and lots more mcrouter traffic than usual 
[13:26:53] <jynus>	 I can take over when you go to meeting
[13:26:58] <jynus>	 yeah, cdanis saw that
[13:27:06] <_joe_>	 cdanis: same pattern we saw yesterday 
[13:27:19] <_joe_>	 it grew twice since this morning
[13:27:20] <jynus>	 but this correlates to the db change
[13:27:40] <_joe_>	 what db change?
[13:27:53] <jynus>	 wikitech db master change
[13:28:03] <_joe_>	 well how can that impact the application servers?
[13:28:14] <_joe_>	 that do not connect to that database at all?
[13:28:23] <jynus>	 ?
[13:28:35] <jynus>	 dbctl operation is done
[13:28:46] <jynus>	 aka confctl
[13:28:50] <_joe_>	 so?
[13:29:06] <jynus>	 I am just correlating timestamps
[13:29:28] <jynus>	 if X happens at the same time as Y, maybe (not sure) could be related
[13:29:41] <_joe_>	 ok, this definitely can't be
[13:30:09] <_joe_>	 so this is only happening on appservers AIUI
[13:30:31] <_joe_>	 we saw this pattern yesterday as well
[13:30:47] <_joe_>	 looks like some objects in APC are suddenly invalid or something
[13:31:23] <cdanis>	 there was also a slight HTTP traffic increase about 10 minutes before the elevated latency
[13:32:05] <_joe_>	 the only real way to track this down further would be to have latency data aggregated by endpoint and wiki
[13:32:10] <_joe_>	 things we don't do
[13:32:42] <_joe_>	 I think we should open a ticket requiring further investigation / instrumentation
[13:32:50] <_joe_>	 this happened yesterday afternoon as well
[13:33:20] <_joe_>	 it would be interesting to look at profiling data before / after, to see what parts of the code are hotter
[13:33:36] <_joe_>	 we have the flamegraphs for that, I'll look later
[13:35:38] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:fcgi://127.0.0.1:9000 method=GET https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[13:36:33] <wikibugs>	 (03PS1) 10Hashar: Group0 to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534154 (https://phabricator.wikimedia.org/T220746)
[13:37:12] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:37:53] <wikibugs>	 (03PS4) 10Ottomata: Adding caching headers for piwik javascript [puppet] - 10https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772) (owner: 10Nuria)
[13:38:22] <logmsgbot>	 !log hashar@deploy1001 Started scap: testwiki to 1.34.0-wmf.21 and rebuild l10n cache - T220746
[13:38:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:25] <stashbot>	 T220746: 1.34.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T220746
[13:38:38] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Overall pretty good chart, thanks for running the benchmark. I 've left some minor comments around." (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos)
[13:38:59] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Adding caching headers for piwik javascript [puppet] - 10https://gerrit.wikimedia.org/r/534114 (https://phabricator.wikimedia.org/T230772) (owner: 10Nuria)
[13:41:43] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10JHedden) Cloud VPS OpenStack has been fully switched over and all services are ba...
[13:42:00] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug1002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[13:42:30] <wikibugs>	 (03PS2) 10CDanis: dbctl: schema: allow wikitech in readOnlyBySection [software/conftool] - 10https://gerrit.wikimedia.org/r/534153
[13:44:03] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@8b17711]:  Fixes for regular analytics deploy
[13:44:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:11] <wikibugs>	 10Operations, 10Commons, 10Traffic: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (10ema) p:05Triage→03Normal
[13:47:47] <wikibugs>	 10Operations, 10Traffic: Cannot download STL files due to network error - https://phabricator.wikimedia.org/T231422 (10ema)
[13:47:51] <wikibugs>	 10Operations, 10Commons, 10Traffic: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (10ema)
[13:51:43] <wikibugs>	 10Operations: Conffile handling for PHP 7.2 packages - https://phabricator.wikimedia.org/T231881 (10MoritzMuehlenhoff)
[13:52:06] <wikibugs>	 10Operations: Conffile handling for PHP 7.2 packages - https://phabricator.wikimedia.org/T231881 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[13:54:53] <wikibugs>	 (03PS1) 10Ema: VCL: only cache responses with explicit CL on upload [puppet] - 10https://gerrit.wikimedia.org/r/534156 (https://phabricator.wikimedia.org/T231422)
[13:55:55] <wikibugs>	 (03CR) 10Gilles: [C: 03+1] VCL: only cache responses with explicit CL on upload [puppet] - 10https://gerrit.wikimedia.org/r/534156 (https://phabricator.wikimedia.org/T231422) (owner: 10Ema)
[13:56:02] <icinga-wm>	 PROBLEM - PHP opcache health on mwdebug1001 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[13:56:57] <wikibugs>	 (03CR) 10Ema: [C: 03+2] VCL: only cache responses with explicit CL on upload [puppet] - 10https://gerrit.wikimedia.org/r/534156 (https://phabricator.wikimedia.org/T231422) (owner: 10Ema)
[13:57:16] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[13:57:25] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Promote db1133 as wikitech master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534144 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[13:58:16] <marostegui>	 hashar: ^ I have merged that (I thought the train finished), I won't deploy anyways (it is a noop change)
[13:58:28] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Promote db1133 as wikitech master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534144 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[13:58:30] <hashar>	 marostegui: don't worry :)
[13:58:39] <hashar>	 marostegui: I am done with the mediawiki-config changes for now
[13:58:45] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Promote db1133 as wikitech master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534144 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui)
[13:59:00] <hashar>	 the whole thing is on hold pending some canary and I don't even know which one hehe
[13:59:20] <marostegui>	 hashar: ah, ok, what should I do with wikiversions.json file on /srv/mediawiki-staging ?
[13:59:56] <wikibugs>	 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog: Lake Huron missing due to apparent OSM vandalism - https://phabricator.wikimedia.org/T231691 (10MusikAnimal) 05Open→03Resolved a:03MusikAnimal >>! In T231691#5456906, @Gehel wrote: > As far as I can tell, the issue is now resolved. >  > @Musi...
[14:00:32] <hashar>	 marostegui: I guess you can stash it, rebase, and reapply
[14:00:40] <hashar>	 git stash;  git remote update; git rebase
[14:00:42] <hashar>	 git stash apply
[14:00:48] <hashar>	 or just ditch it
[14:00:53] <hashar>	 and I will resync wikiversions.json later on
[14:01:10] <wikibugs>	 (03PS2) 10MSantos: maps: cleanup unused template [puppet] - 10https://gerrit.wikimedia.org/r/533974 (owner: 10Gehel)
[14:01:16] <wikibugs>	 (03CR) 10MSantos: [C: 03+1] maps: cleanup unused template [puppet] - 10https://gerrit.wikimedia.org/r/533974 (owner: 10Gehel)
[14:01:39] <marostegui>	 hashar: ok, I will get rid of it then :)
[14:03:36] <marostegui>	 hashar: I merged my change, and I got your wikiversions back :)
[14:07:36] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, and 3 others: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (10Nuria) I can see the cache-control: max-age=604800 , I think @ema needs to change something on his end so varnish /ATS settings apply?
[14:11:08] <wikibugs>	 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog: Lake Huron missing due to apparent OSM vandalism - https://phabricator.wikimedia.org/T231691 (10MSantos) >>! In T231691#5460989, @Gehel wrote: >>>! In T231691#5460894, @Haros wrote: >> We really need a way to trigger a refresh without creating a ne...
[14:12:00] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug1001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[14:12:55] <hashar>	 marostegui: thx :)
[14:14:34] <wikibugs>	 10Operations: Conffile handling for PHP 7.2 packages - https://phabricator.wikimedia.org/T231881 (10MoritzMuehlenhoff) As a workaround the update can be deployed with the following Cumin command (still need to get to the bottom of what causes conffile prompts here):   ` sudo cumin mwdebug1001* 'export DEBIAN_FRO...
[14:21:50] <moritzm>	 !log upgrading app server canaries to PHP 7.2.22 T230024
[14:21:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:52] <stashbot>	 T230024: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024
[14:22:00] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1341 is CRITICAL: CRITICAL - load average: 79.83, 37.30, 25.85 https://wikitech.wikimedia.org/wiki/Application_servers
[14:22:18] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1281 is CRITICAL: CRITICAL - load average: 63.43, 29.23, 20.96 https://wikitech.wikimedia.org/wiki/Application_servers
[14:22:26] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1313 is CRITICAL: CRITICAL - load average: 89.51, 40.94, 27.24 https://wikitech.wikimedia.org/wiki/Application_servers
[14:22:44] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1317 is CRITICAL: CRITICAL - load average: 76.54, 41.03, 27.41 https://wikitech.wikimedia.org/wiki/Application_servers
[14:23:34] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1341 is OK: OK - load average: 39.91, 37.40, 27.15 https://wikitech.wikimedia.org/wiki/Application_servers
[14:23:48] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 70.48, 30.04, 19.93 https://wikitech.wikimedia.org/wiki/Application_servers
[14:23:58] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1222 is CRITICAL: CRITICAL - load average: 50.68, 24.78, 17.68 https://wikitech.wikimedia.org/wiki/Application_servers
[14:24:00] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1313 is OK: OK - load average: 34.66, 36.24, 27.01 https://wikitech.wikimedia.org/wiki/Application_servers
[14:24:08] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: CRITICAL - load average: 49.15, 23.84, 15.92 https://wikitech.wikimedia.org/wiki/Application_servers
[14:24:12] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1285 is CRITICAL: CRITICAL - load average: 102.66, 41.87, 25.80 https://wikitech.wikimedia.org/wiki/Application_servers
[14:24:14] <cdanis>	 moritzm: ^ you?  is that just cache invalidations?
[14:24:20] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1317 is OK: OK - load average: 31.41, 35.30, 26.69 https://wikitech.wikimedia.org/wiki/Application_servers
[14:24:24] <moritzm>	 I haven't done anything yet
[14:24:34] <moritzm>	 (and will only upgrade mw1261 initially anyway)
[14:24:37] <cdanis>	 ah
[14:24:40] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1226 is CRITICAL: CRITICAL - load average: 49.85, 27.31, 17.83 https://wikitech.wikimedia.org/wiki/Application_servers
[14:25:30] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1281 is OK: OK - load average: 23.23, 31.55, 24.05 https://wikitech.wikimedia.org/wiki/Application_servers
[14:25:34] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1222 is OK: OK - load average: 25.10, 23.30, 17.87 https://wikitech.wikimedia.org/wiki/Application_servers
[14:25:42] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1230 is OK: OK - load average: 18.52, 20.54, 15.53 https://wikitech.wikimedia.org/wiki/Application_servers
[14:25:50] <moritzm>	 are these php7 API servers? I saw https://phabricator.wikimedia.org/T231011 listed in the weekly document 
[14:26:14] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1226 is OK: OK - load average: 20.65, 23.87, 17.55 https://wikitech.wikimedia.org/wiki/Application_servers
[14:26:36] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1290 is CRITICAL: CRITICAL - load average: 61.84, 28.12, 20.60 https://wikitech.wikimedia.org/wiki/Application_servers
[14:26:58] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1286 is CRITICAL: CRITICAL - load average: 60.17, 29.37, 22.24 https://wikitech.wikimedia.org/wiki/Application_servers
[14:27:02] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1284 is CRITICAL: CRITICAL - load average: 105.93, 42.44, 25.36 https://wikitech.wikimedia.org/wiki/Application_servers
[14:27:26] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1283 is CRITICAL: CRITICAL - load average: 91.14, 41.83, 26.03 https://wikitech.wikimedia.org/wiki/Application_servers
[14:27:54] <cdanis>	 latency is still elevated as is mcrouter traffic
[14:28:16] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:fcgi://127.0.0.1:9000 method=GET https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[14:28:31] <logmsgbot>	 !log hashar@deploy1001 Finished scap: testwiki to 1.34.0-wmf.21 and rebuild l10n cache - T220746 (duration: 50m 09s)
[14:28:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:34] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1286 is OK: OK - load average: 28.28, 27.71, 22.40 https://wikitech.wikimedia.org/wiki/Application_servers
[14:28:34] <stashbot>	 T220746: 1.34.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T220746
[14:28:58] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1285 is OK: OK - load average: 15.92, 28.07, 24.53 https://wikitech.wikimedia.org/wiki/Application_servers
[14:29:06] <cdanis>	 oh, I didn't realize hashar was rebuilding l10n cache
[14:29:12] <cdanis>	 that often causes appserver high CPU
[14:31:28] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:31:48] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 12.21, 21.93, 21.97 https://wikitech.wikimedia.org/wiki/Application_servers
[14:31:48] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1284 is OK: OK - load average: 16.12, 28.54, 24.43 https://wikitech.wikimedia.org/wiki/Application_servers
[14:32:14] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1283 is OK: OK - load average: 15.66, 27.38, 24.29 https://wikitech.wikimedia.org/wiki/Application_servers
[14:32:58] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1290 is OK: OK - load average: 16.52, 28.06, 25.14 https://wikitech.wikimedia.org/wiki/Application_servers
[14:33:04] <hashar>	 so
[14:33:12] <hashar>	 cdanis: yeah and it is done
[14:33:20] <hashar>	 rest is the usual bytecode cache being rebuilt
[14:33:33] <cdanis>	 well, there's still something else mysterious going on with the appserver fleet
[14:34:12] <cdanis>	 https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=11&fullscreen&orgId=1&from=now-2d&to=now
[14:34:12] <hashar>	 jouncebot: now
[14:34:12] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 25 minute(s)
[14:34:55] <hashar>	 cdanis: maybe that is related to 1.34.0-wmf.20  which I have deployed on rest of wikis around that time
[14:35:08] <godog>	 FYI I'll be bold and merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/531142 no point in the high cpu alerts anymore IMHO
[14:35:20] <hashar>	 hmm no, that was before
[14:36:27] <wikibugs>	 (03PS2) 10Filippo Giunchedi: mediawiki: remove per-host high CPU alerts [puppet] - 10https://gerrit.wikimedia.org/r/531142 (https://phabricator.wikimedia.org/T230396)
[14:37:51] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] Group0 to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534154 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[14:38:17] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] mediawiki: remove per-host high CPU alerts [puppet] - 10https://gerrit.wikimedia.org/r/531142 (https://phabricator.wikimedia.org/T230396) (owner: 10Filippo Giunchedi)
[14:39:12] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Promote db1133 as wikitech master T229657 (duration: 00m 54s)
[14:39:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:15] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[14:40:20] <wikibugs>	 (03Merged) 10jenkins-bot: Group0 to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534154 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[14:40:38] <wikibugs>	 (03CR) 10jenkins-bot: Group0 to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534154 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[14:41:34] <hashar>	 eeek
[14:41:39] <hashar>	 marostegui: ah I am pushing group0
[14:41:42] <hashar>	 but that should be fast
[14:42:08] <hashar>	 wait for canaries
[14:42:40] <marostegui>	 hashar: I'm fully done
[14:44:20] <hashar>	 marostegui: congratulations
[14:45:05] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[14:45:40] <hashar>	 even sync wikiversions takes ages :-\
[14:45:43] <logmsgbot>	 !log hashar@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.21 - T220746
[14:45:44] <wikibugs>	 10Operations, 10DBA: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (10Marostegui)
[14:45:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:46] <stashbot>	 T220746: 1.34.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T220746
[14:45:47] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) 05Open→03Resolved This is all done - db1073 will be decommissioned in a few days (most...
[14:46:42] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb2006 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:47:18] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:47:28] <hashar>	 bah
[14:47:34] <hashar>	 :\
[14:47:56] <icinga-wm>	 PROBLEM - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Graphoid
[14:48:18] <hashar>	 anyone familiar with Graphoid? 
[14:48:38] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb2004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:48:48] <hashar>	 seems that as part of rolling 1.34.0-wmf.21 that broke Graphoid somehow :-\
[14:49:18] <wikibugs>	 10Operations, 10DBA: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (10Marostegui)
[14:49:36] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb2003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:49:48] <hashar>	 I don't even know where that health check is defined :\
[14:50:05] <wikibugs>	 10Operations, 10DBA: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (10Marostegui)
[14:50:10] <wikibugs>	 10Operations, 10DBA, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui)
[14:50:20] <wikibugs>	 10Operations, 10DBA: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (10Marostegui)
[14:50:23] <wikibugs>	 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[14:50:57] <wikibugs>	 10Operations, 10DBA: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (10Marostegui) p:05Triage→03Normal This host was just removed from being a master [T229657] let's give it a few more days before actually start its decommissioning process.
[14:51:03] <bd808>	 hashar: if you can find mobrovac he might know how to debug. Second best bet would be akosiaris I think
[14:51:27] <hashar>	 thanks bd808 :)
[14:53:21] <hashar>	 Load failed with response code 40 
[14:53:22] <hashar>	 403
[14:53:24] <hashar>	 err
[14:53:44] <mobrovac>	 this seems to be a mw-side issue
[14:54:04] <mobrovac>	 graphoid gets back the "invalidhash" error from mw api
[14:54:38] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:54:40] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb2002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:54:42] <mobrovac>	 for action=graph&title=Extension:Graph/Demo&hash=2e25518b199b22ab9043f7ce9a0cd1370b27d77a
[14:54:46] <mobrovac>	 hashar: ^
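For anyone wanting to reproduce the failing call by hand, the query mobrovac quotes can be assembled as below. This is only a sketch: the chat gives the query parameters but not the endpoint, so the mediawiki.org `api.php` URL and the variable names are assumptions, and nothing is actually sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; the chat only quotes the query string.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"

# Parameters exactly as quoted by mobrovac.
graph_params = {
    "action": "graph",
    "title": "Extension:Graph/Demo",
    "hash": "2e25518b199b22ab9043f7ce9a0cd1370b27d77a",
}

# urlencode percent-escapes ":" and "/" in the title.
api_url = f"{API_ENDPOINT}?{urlencode(graph_params)}"

# Round-trip check: decoding the query gives back the original parameters.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(api_url).query).items()}
print(api_url)
print(decoded == graph_params)
```

Fetching the resulting URL against a wiki running the affected version would presumably show the "invalidhash" error directly, without going through Graphoid.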
[14:55:06] <hashar>	 so I guess time for me to rollback group0 ;]
[14:56:50] <wikibugs>	 (03PS3) 10Marostegui: wmnet: Update s8-master record [dns] - 10https://gerrit.wikimedia.org/r/531455 (https://phabricator.wikimedia.org/T230762)
[14:56:56] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:57:01] <wikibugs>	 (03PS4) 10Marostegui: mariadb: Promote db1109 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/531189 (https://phabricator.wikimedia.org/T230762)
[14:57:02] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:57:07] <wikibugs>	 (03PS1) 10Hashar: Revert "Group0 to 1.34.0-wmf.21" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534171 (https://phabricator.wikimedia.org/T220746)
[14:57:19] <logmsgbot>	 !log hashar@deploy1001 rebuilt and synchronized wikiversions files: Rollback group0 to 1.34.0-wmf.21 - T220746
[14:57:19] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] "Already rollbacked" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534171 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[14:57:20] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:57:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:22] <stashbot>	 T220746: 1.34.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T220746
[14:57:26] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:57:30] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:57:40] <akosiaris>	 IIRC there was some graph extension related patch merged 
[14:57:48] <icinga-wm>	 RECOVERY - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Graphoid
[14:57:50] <akosiaris>	 lemme verify that
[14:58:12] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[14:58:14] <hashar>	 filing a bug
[14:58:15] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Group0 to 1.34.0-wmf.21" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534171 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[14:58:26] <akosiaris>	 https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Kartographer/+/531159/
[14:58:33] <wikibugs>	 (03CR) 10jenkins-bot: Revert "Group0 to 1.34.0-wmf.21" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534171 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[14:58:43] <akosiaris>	 that would probably break graphoid I guess ?
[14:58:55] <akosiaris>	 RoanKattouw: ^ ?
[14:59:19] <RoanKattouw>	 It should not affect Graphoid, it was in the Graph extension
[14:59:31] <mobrovac>	 action=graph is the graph extension
[14:59:33] <RoanKattouw>	 Although, huh, invalidhash? Interesting
[14:59:47] <RoanKattouw>	 Maybe it's yet another bit of fallout from my Graph extension change :(
[14:59:53] <mobrovac>	 there are no mw logs for that has id that i could find
[15:00:01] <hashar>	 mobrovac: akosiaris I have filed it as https://phabricator.wikimedia.org/T231894
[15:00:04] <mobrovac>	 s/has/hash/
[15:00:09] <hashar>	 haven't filled in a lot of details though ;-\
[15:00:13] <akosiaris>	 hashar: ok thanks
[15:00:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, although I think this is the kind of cook book which warrants a similar sanity check like Cumin? In Cumin is prints the host m" [cookbooks] - 10https://gerrit.wikimedia.org/r/531897 (https://phabricator.wikimedia.org/T231066) (owner: 10Volans)
[15:01:20] <hashar>	 mobrovac: akosiaris and I have rolled back so at least we are safe "tm"
[15:01:24] <RoanKattouw>	 Hmm perhaps it's caused by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Graph/+/493628 which is a much larger change than mine, and is new in wmf.21
[15:02:09] <wikibugs>	 (03PS10) 10Ori.livneh: Configure forensic logging of Apache requests; enable on beta [puppet] - 10https://gerrit.wikimedia.org/r/511751
[15:02:17] <hashar>	 I am going to promote "testwiki" to 1.34.0-wmf.21 though no idea whether graphoid is enabled there
[15:02:29] <hashar>	 ah yeah it is
[15:03:51] <mobrovac>	 if you do promote it, the checks won't start failing for graphoid because it only uses mw.org for that
[15:03:59] <RoanKattouw>	 Wait, I'm looking for the patch that fixed the object-instead-of-array issue but I can't find it. Maybe that only affected Kartographer, not Graph?
[15:04:04] <mobrovac>	 but since we know .21 causes problems, i'm not sure you should promote it
[15:04:06] <wikibugs>	 (03PS1) 10Hashar: Promote testwiki to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534173 (https://phabricator.wikimedia.org/T231894)
[15:04:31] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] Promote testwiki to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534173 (https://phabricator.wikimedia.org/T231894) (owner: 10Hashar)
[15:04:49] <RoanKattouw>	 Hmm actually
[15:04:58] <hashar>	 I am just promoting testwiki
[15:05:03] <RoanKattouw>	 I wonder if Graphoid is really broken or if it's an assumption in the check that needs to be updated
[15:05:10] <hashar>	 no idea whether that would help to debug the graphoid thing though
[15:05:27] <wikibugs>	 (03Merged) 10jenkins-bot: Promote testwiki to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534173 (https://phabricator.wikimedia.org/T231894) (owner: 10Hashar)
[15:05:41] <RoanKattouw>	 Where can I find the HTTP request that the monitoring code makes? The wikitech link in the icinga-wm message is dead
[15:06:06] <hashar>	 check_wmf_services in puppet
[15:06:25] <hashar>	 singular
[15:06:25] <hashar>	 bah
[15:06:41] <wikibugs>	 (03CR) 10jenkins-bot: Promote testwiki to 1.34.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534173 (https://phabricator.wikimedia.org/T231894) (owner: 10Hashar)
[15:06:43] <hashar>	 check_wmf_service!http://graphoid.svc.codfw.wmnet:19000!15
[15:07:14] <hashar>	  /usr/bin/service-checker-swagger -t $ARG2$ $HOSTNAME$ $ARG1$
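The two lines hashar pasted fit together through the usual Icinga/Nagios convention: the `!`-separated fields after the check name become `$ARG1$`, `$ARG2$`, and so on in the command line. A minimal Python sketch of that substitution follows; `expand_check` is a hypothetical helper, not part of Icinga, and the `$HOSTNAME$` value is assumed from the host named in the LVS alert earlier in the log.

```python
def expand_check(check: str, command_line: str, hostname: str) -> str:
    """Substitute Nagios-style $ARGn$ and $HOSTNAME$ macros (illustrative only)."""
    # "name!arg1!arg2" -> name, [arg1, arg2]
    _name, *args = check.split("!")
    expanded = command_line.replace("$HOSTNAME$", hostname)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return expanded


command = expand_check(
    "check_wmf_service!http://graphoid.svc.codfw.wmnet:19000!15",
    "/usr/bin/service-checker-swagger -t $ARG2$ $HOSTNAME$ $ARG1$",
    hostname="graphoid.svc.codfw.wmnet",  # assumed value of $HOSTNAME$
)
print(command)
```

Under those assumptions the check runs service-checker-swagger with the value 15 for `-t`, against the service URL on port 19000.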
[15:07:14] <logmsgbot>	 !log hashar@deploy1001 rebuilt and synchronized wikiversions files: testwiki 1.34.0-wmf.21 for T231894 - T220746
[15:07:16] <hashar>	 bah :-\
[15:07:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:18] <stashbot>	 T231894: 1.34.0-wmf.21 cause Graphoid service check to fail due to 403 from mediawiki.org - https://phabricator.wikimedia.org/T231894
[15:07:18] <stashbot>	 T220746: 1.34.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T220746
[15:07:42] <akosiaris>	 RoanKattouw: curl http://graphoid.svc.codfw.wmnet:19000/?spec | jq .
[15:07:48] <akosiaris>	 it's the x-amples stanza
[15:07:53] <RoanKattouw>	 Thanks
[15:07:56] <akosiaris>	 stanzas more like it
[15:08:33] <mobrovac>	 RoanKattouw: https://github.com/wikimedia/mediawiki-services-graphoid/blob/master/spec.yaml#L67
[15:08:36] <akosiaris>	 more or less I see             "request": { "params": { "format": "png", "title": "Extension:Graph/Demo",                 "revid": "0",                 "id": "2e25518b199b22ab9043f7ce9a0cd1370b27d77a"}
[15:09:29] <hashar>	 I will let you guys find out the magic. I am getting a break and will be back later for dinner
[15:09:48] <hashar>	 but I guess anyone from #wikimedia-releng should be able to promote to 1.34.0-wmf.21 again if need be
[15:09:50] <RoanKattouw>	 I have a suspicion, first going to try to confirm it locally
[15:10:30] <akosiaris>	 so the monitoring script  tries to query /mediawiki.org/v1/png/Extension:Graph/Demo/0/2e25518b199b22ab9043f7ce9a0cd1370b27d77a
[15:10:40] <akosiaris>	 what that internally means for graphoid's requests, I am not sure
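To make the link between the `x-amples` stanza and the path akosiaris quotes concrete, here is a hedged Python sketch. The `spec` dict is a hand-written fragment holding only the values quoted in the chat. The nesting (paths, route, `get`, `x-amples`) and the `domain` parameter are assumptions, since only part of the stanza was pasted, and the real checker may escape path components differently.

```python
# Hand-written fragment: only the route template and params quoted in the chat.
# The surrounding nesting and the "domain" value are assumptions.
spec = {
    "paths": {
        "/{domain}/v1/{format}/{title}/{revid}/{id}": {
            "get": {
                "x-amples": [
                    {
                        "title": "retrieve PNG from mediawiki.org",
                        "request": {
                            "params": {
                                "domain": "mediawiki.org",  # assumed
                                "format": "png",
                                "title": "Extension:Graph/Demo",
                                "revid": "0",
                                "id": "2e25518b199b22ab9043f7ce9a0cd1370b27d77a",
                            }
                        },
                    }
                ]
            }
        }
    }
}


def example_paths(spec: dict) -> list[str]:
    """Fill each route template with the params of its x-amples entries."""
    paths = []
    for template, methods in spec["paths"].items():
        for operation in methods.values():
            for example in operation.get("x-amples", []):
                params = example["request"]["params"]
                paths.append(template.format(**params))
    return paths


paths = example_paths(spec)
print(paths[0])
```

With these inputs the sketch yields the same path as the one the monitoring script queries.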
[15:11:32] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1287 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.010 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:12:23] <wikibugs>	 10Operations, 10Traffic: Cannot download STL files due to network error - https://phabricator.wikimedia.org/T231422 (10ema) @Gilles: the issue should now be fixed, can you confirm?
[15:12:24] <akosiaris>	 RoanKattouw: note btw that graphoid is under a code stewardship request per https://phabricator.wikimedia.org/T211881. Getting changes into it might very well prove challenging
[15:12:37] <RoanKattouw>	 I don't think we're going to need to change Graphoid itself
[15:12:42] <RoanKattouw>	 I think I screwed something up in the Graph extension
[15:13:01] <akosiaris>	 ok, that makes it way easier then
[15:13:06] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 593 bytes in 1.994 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:15:45] <RoanKattouw>	 Yup I found it, patch incoming
[15:16:20] <RoanKattouw>	 Same mistake as I made in Kartographer, except in Graph it somehow didn't cause PHP fatals, instead it just returned "invalidhash" for everything
[15:18:27] <wikibugs>	 10Operations, 10Traffic: Cannot download STL files due to network error - https://phabricator.wikimedia.org/T231422 (10Gilles) 05Open→03Resolved Fix confirmed
[15:19:18] <RoanKattouw>	 Patch in Gerrit and +2ed: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Graph/+/534174
[15:20:05] <RoanKattouw>	 I'll cherry-pick and deploy once it makes it through Jenkins, unless someone beats me to it
[15:21:07] <Vito>	 anyone to assist for a big rename?
[15:21:10] <Vito>	 150k
[15:21:37] * Vito eyes RoanKattouw
[15:22:08] <RoanKattouw>	 I've never done one before, is there documentation about what I should do?
[15:23:01] <Vito>	 RoanKattouw: if some rename gets stuck there should be a script to restart it 
[15:23:48] <RoanKattouw>	 Aha, fixStuckGlobalRename.php sounds relevant
[15:24:05] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] First version of the wikifeeds chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos)
[15:24:57] <Vito>	 RoanKattouw: I think also nukeTehWikis.php should be of interest 
[15:25:01] <RoanKattouw>	 haha
[15:25:17] <RoanKattouw>	 Vito: Could you link me to the log entry for the stuck rename please?
[15:25:33] <Vito>	 I haven't performed the rename yet
[15:25:59] <Vito>	 but we prefer to have some sysadmin around while doing big renames
[15:26:29] <RoanKattouw>	 Oh I see
[15:27:09] <RoanKattouw>	 I'm about to take a shower then go to the office, but I should be back on line in about an hour
[15:27:49] <RoanKattouw>	 (Sorry, I know it looks like I'm working, but that's only because I got up at 6:45am for an early meeting and then got a train blocker dropped in my lap)
[15:28:21] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10Traffic: (OoW) lvs2006 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T192082 (10Papaul) p:05Normal→03Low
[15:28:46] <wikibugs>	 10Operations, 10ops-codfw: (OoW) lvs2002 repeated usb connect/disconnect message - https://phabricator.wikimedia.org/T148017 (10Papaul) p:05Normal→03Low
[15:29:13] <wikibugs>	 10Operations, 10ops-codfw, 10Traffic, 10Patch-For-Review: (OoW) lvs2006 crashed into (what it seems) an unrecoverable state - https://phabricator.wikimedia.org/T209337 (10Papaul) p:05Lowest→03Low
[15:30:41] <Vito>	 no problem RoanKattouw, I'll keep an eye on the rename status until it completes
[15:30:56] <RoanKattouw>	 OK cool, and if it gets stuck feel free to ping me
[15:31:20] <RoanKattouw>	 I just might not respond immediately
[15:32:43] <ebernhardson>	 !log unban elastic1027 from production-search-eqiad
[15:32:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:39] <wikibugs>	 (03PS1) 10Ladsgroup: Set item terms migration stage for Wikidata on WRITE_BOTH up to Q2m [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534183 (https://phabricator.wikimedia.org/T225055)
[15:38:22] <wikibugs>	 (03PS1) 10Ema: envoyproxy: allow overriding tls_port [puppet] - 10https://gerrit.wikimedia.org/r/534184
[15:39:32] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: allow overriding tls_port [puppet] - 10https://gerrit.wikimedia.org/r/534184 (owner: 10Ema)
[15:39:34] <wikibugs>	 (03PS1) 10Reedy: Re-apply wgFlaggedRevsOverride = false on ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534185 (https://phabricator.wikimedia.org/T227260)
[15:40:01] <wikibugs>	 (03PS2) 10Reedy: Re-apply wgFlaggedRevsOverride = false on ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534185 (https://phabricator.wikimedia.org/T227260)
[15:40:07] <Reedy>	 jouncebot: now
[15:40:07] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 19 minute(s)
[15:40:12] <Reedy>	 jouncebot: next
[15:40:12] <jouncebot>	 In 0 hour(s) and 19 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1600)
[15:41:52] <wikibugs>	 (03PS2) 10Ema: envoyproxy: allow overriding tls_port [puppet] - 10https://gerrit.wikimedia.org/r/534184
[15:43:07] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Re-apply wgFlaggedRevsOverride = false on ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534185 (https://phabricator.wikimedia.org/T227260) (owner: 10Reedy)
[15:43:52] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: allow overriding tls_port [puppet] - 10https://gerrit.wikimedia.org/r/534184 (owner: 10Ema)
[15:49:15] <wikibugs>	 (03PS3) 10Ema: envoyproxy: allow overriding TLS port [puppet] - 10https://gerrit.wikimedia.org/r/534184
[15:50:44] <wikibugs>	 10Operations, 10Elasticsearch, 10Wikimedia-Logstash, 10observability, and 2 others: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline - https://phabricator.wikimedia.org/T225125 (10Mathew.onipe) rsyslog Json requires the `@cee` token which must be provided accordin...
[15:51:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: allow overriding TLS port [puppet] - 10https://gerrit.wikimedia.org/r/534184 (owner: 10Ema)
[15:54:07] <wikibugs>	 (03Merged) 10jenkins-bot: Re-apply wgFlaggedRevsOverride = false on ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534185 (https://phabricator.wikimedia.org/T227260) (owner: 10Reedy)
[15:54:24] <wikibugs>	 (03CR) 10jenkins-bot: Re-apply wgFlaggedRevsOverride = false on ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534185 (https://phabricator.wikimedia.org/T227260) (owner: 10Reedy)
[15:55:40] <logmsgbot>	 !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T227260 (duration: 00m 54s)
[15:55:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:44] <stashbot>	 T227260: Last reviewed revision is shown to logged-out users in ukwiki instead of last revision - https://phabricator.wikimedia.org/T227260
[15:56:47] <Vito>	 RoanKattouw: everything worked fine, ty
[15:57:52] <wikibugs>	 10Operations, 10Release Pipeline, 10Maps (Kartotherian), 10Patch-For-Review: Create blubberfile for deploying kartotherian into docker environment. - https://phabricator.wikimedia.org/T223275 (10Jdforrester-WMF) I believe that this is now Done?
[16:00:04] <jouncebot>	 godog and _joe_: My dear minions, it's time we take the moon! Just kidding. Time for Puppet SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1600).
[16:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:01:01] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@8b17711]:  Fixes for regular analytics deploy (duration: 136m 59s)
[16:01:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:32] <wikibugs>	 (03PS4) 10Ema: envoyproxy: allow overriding TLS port [puppet] - 10https://gerrit.wikimedia.org/r/534184
[16:01:42] <wikibugs>	 10Operations, 10Epic, 10Maps (Kartotherian), 10Patch-For-Review: Move Kartotherian and Tilerator to Kubernetes - https://phabricator.wikimedia.org/T216826 (10MSantos)
[16:01:45] <wikibugs>	 10Operations, 10Release Pipeline, 10Maps (Kartotherian), 10Patch-For-Review: Create blubberfile for deploying kartotherian into docker environment. - https://phabricator.wikimedia.org/T223275 (10MSantos) 05Open→03Resolved
[16:03:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: allow overriding TLS port [puppet] - 10https://gerrit.wikimedia.org/r/534184 (owner: 10Ema)
[16:12:16] <wikibugs>	 (03PS1) 10Fdans: role::common::aqs: update druid mediawiki's datasource [puppet] - 10https://gerrit.wikimedia.org/r/534191
[16:15:03] <RoanKattouw>	 CI is completely broken in the wmf.21 branch, we'll need releng to fix that before my train blocker fix can be deployed
[16:16:31] <James_F>	 Or just force-merge it?
[16:19:31] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] CLI: suppress ncclient noisy logger [software/homer] - 10https://gerrit.wikimedia.org/r/533570 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans)
[16:21:45] <wikibugs>	 (03CR) 10Phamhi: "Puppet successful compile result can be found here: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/18131/cons" [puppet] - 10https://gerrit.wikimedia.org/r/533606 (owner: 10Phamhi)
[16:22:17] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] config: inject role and site to the configuration [software/homer] - 10https://gerrit.wikimedia.org/r/533568 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans)
[16:23:11] <RoanKattouw>	 Sure, I guess
[16:23:19] <wikibugs>	 (03PS5) 10Ema: envoyproxy: allow overriding TLS port [puppet] - 10https://gerrit.wikimedia.org/r/534184
[16:25:04] <James_F>	 RoanKattouw: It's also "only" UBN because it blocks the train, it's just group0 graphs that might be broken.
[16:25:18] <RoanKattouw>	 I know
[16:25:33] <RoanKattouw>	 But it breaking graphs on mw.org caused icinga to declare all of Graphoid to be broken
[16:27:11] <wikibugs>	 (03CR) 10Ema: "pcc lgtm https://puppet-compiler.wmflabs.org/compiler1002/18155/miscweb1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/534184 (owner: 10Ema)
[16:27:41] * James_F nods.
[16:29:41] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] "Verify it doesn't impact what we have installed already with the compiler, but it seems GTG to me." [puppet] - 10https://gerrit.wikimedia.org/r/534184 (owner: 10Ema)
[16:34:59] <Krinkle>	 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/npm-test
[16:35:05] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.34.0-wmf.21/extensions/Graph/includes/ApiGraph.php: T231894 (duration: 00m 55s)
[16:35:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:09] <stashbot>	 T231894: 1.34.0-wmf.21 cause Graphoid service check to fail due to 403 from mediawiki.org - https://phabricator.wikimedia.org/T231894
[16:35:29] <Krinkle>	 Could not find the registration file for the extension …
[16:35:34] <RoanKattouw>	 This --^^ should fix the train blocker and make it safe to reenable wmf.21 on group0 without breaking the Graphoid monitoring check
[16:35:35] <Krinkle>	 RoanKattouw: those are the errors on wmf.21 in Jenkins
[16:35:37] <Krinkle>	 strange indeed
[16:35:47] <RoanKattouw>	 Krinkle: Yes, extension loading is completely broken in wmf.21 CI right now
[16:35:56] <RoanKattouw>	 Which is why I force-merged the patch I just deployed
[16:36:04] <Krinkle>	 Also why is it cloning a thing called "npm-test.git"? That repo doesn't exist
[16:36:07] <Krinkle>	 buggy quibble release?
[16:36:38] <Krinkle>	 (-releng)
[16:40:08] <wikibugs>	 (03CR) 10Ayounsi: transports: add JunOS transport (032 comments) [software/homer] - 10https://gerrit.wikimedia.org/r/533558 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans)
[16:55:07] <wikibugs>	 (03PS1) 10Odder: Add high-density logos for the Incubator [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534197 (https://phabricator.wikimedia.org/T230122)
[16:57:40] <wikibugs>	 (03PS2) 10Odder: Add high-density logos for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534152 (https://phabricator.wikimedia.org/T230120)
[16:59:17] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] config: enforce positional vs. keyword args [software/homer] - 10https://gerrit.wikimedia.org/r/533623 (owner: 10Volans)
[16:59:56] <wikibugs>	 (03PS1) 10Odder: Add high-density logos for the Incubator [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534198 (https://phabricator.wikimedia.org/T230122)
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1700).
[17:02:27] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add high-density logos for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534152 (https://phabricator.wikimedia.org/T230120) (owner: 10Odder)
[17:02:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add high-density logos for the Incubator [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534198 (https://phabricator.wikimedia.org/T230122) (owner: 10Odder)
[17:14:51] <Urbanecm>	 Hi opsen. Do you know what wmf-nda is, i.e. who should be in it?
[17:18:10] <subbu>	 no parsoid deploy today
[17:22:42] <wikibugs>	 10Operations, 10Discovery-Search, 10Elasticsearch, 10Patch-For-Review: Reindex commonswiki as shards have grown beyond critical threshold - https://phabricator.wikimedia.org/T231446 (10Gehel)
[17:24:06] <wikibugs>	 10Operations, 10netops: Check router ACLs for early install SSH access from puppet masters/cumin hosts - https://phabricator.wikimedia.org/T231811 (10ayounsi) Hosts in the `cloud-hosts1-b-eqiad` vlan are behind the `labs-in4` firewall filter (applied on traffic going out of that vlan), which also includes the...
[17:30:39] <Nemo_bis>	 Urbanecm: do you mean the Phabricator group?
[17:30:57] <Urbanecm>	 Nemo_bis: yes
[17:31:56] <Nemo_bis>	 Urbanecm: well, isn't it described on the project itself? https://phabricator.wikimedia.org/project/profile/61/
[17:32:19] <Nemo_bis>	 mostly https://wikitech.wikimedia.org/wiki/Ops_Onboarding + https://wikitech.wikimedia.org/wiki/Volunteer_NDA cover it
[17:32:23] <hashar>	 James_F: I guess I can try to promote 1.34.0-wmf.21 again now that Graph is solved ;)
[17:32:29] <hashar>	 jouncebot: now
[17:32:29] <jouncebot>	 For the next 0 hour(s) and 27 minute(s): Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T1700)
[17:35:28] <wikibugs>	 (03PS1) 10Hashar: Group0 to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534203 (https://phabricator.wikimedia.org/T220746)
[17:35:53] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] "Take 2" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534203 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[17:35:59] <wikibugs>	 (03PS15) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875)
[17:36:01] <wikibugs>	 (03PS2) 10Mathew.onipe: elasticsearch: add syslog logging option [puppet] - 10https://gerrit.wikimedia.org/r/533928 (https://phabricator.wikimedia.org/T225125)
[17:37:30] <wikibugs>	 (03Merged) 10jenkins-bot: Group0 to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534203 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[17:38:07] <wikibugs>	 (03CR) 10jenkins-bot: Group0 to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534203 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[17:38:18] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10netops: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10Cmjohnson) @Ottomata Do you still need the 2nd port now that you're not doing the cloud thing?  If so which...
[17:38:44] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] "That is 1.34.0-wmf.21 !" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534203 (https://phabricator.wikimedia.org/T220746) (owner: 10Hashar)
[17:40:53] <logmsgbot>	 !log hashar@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.21
[17:40:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:56] <James_F>	 !log Pulled I9b64a2bb770 into wmf.21 production on the deploy server; no need to deploy to app-servers, CI-only fix.
[17:45:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:45] <wikibugs>	 (03PS2) 10Herron: prometheus: aggregate systemd failed metrics [puppet] - 10https://gerrit.wikimedia.org/r/533282 (https://phabricator.wikimedia.org/T230570)
[18:03:34] <wikibugs>	 (03CR) 10Herron: prometheus: aggregate systemd failed metrics (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533282 (https://phabricator.wikimedia.org/T230570) (owner: 10Herron)
[18:06:10] <wikibugs>	 (03PS1) 10Herron: prometheus: deploy prometheus-ipsec-exporter to all sites [puppet] - 10https://gerrit.wikimedia.org/r/534210 (https://phabricator.wikimedia.org/T230236)
[18:18:17] <wikibugs>	 (03CR) 10Eevans: [C: 03+1] sessionstore: Bump limits and requests [deployment-charts] - 10https://gerrit.wikimedia.org/r/533922 (https://phabricator.wikimedia.org/T229697) (owner: 10Alexandros Kosiaris)
[18:20:24] <wikibugs>	 (03PS3) 10Herron: eventgate-main: add new kafka-main brokers to broker list [deployment-charts] - 10https://gerrit.wikimedia.org/r/529428 (https://phabricator.wikimedia.org/T225005)
[18:21:42] <wikibugs>	 (03CR) 10Herron: "based on the wikitech docs the chart version will need to be bumped, added that to ps3" [deployment-charts] - 10https://gerrit.wikimedia.org/r/529428 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron)
[18:23:31] <wikibugs>	 (03PS7) 10MSantos: First version of the wikifeeds chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287)
[18:25:02] <wikibugs>	 (03PS1) 10Subramanya Sastry: Enable loading Parsoid/PHP as an extension on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534215 (https://phabricator.wikimedia.org/T231569)
[18:25:43] <wikibugs>	 (03CR) 10MSantos: "> Patch Set 6:" (036 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos)
[18:26:12] <wikibugs>	 (03CR) 10Subramanya Sastry: "Maybe wait till the Parsoid server on the beta cluster is converted to an appserver. But this should be safe nevertheless." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534215 (https://phabricator.wikimedia.org/T231569) (owner: 10Subramanya Sastry)
[18:27:18] <wikibugs>	 (03CR) 10Subramanya Sastry: "We should figure out if we need any other custom config file similar to the rt test settings file. I already flagged https://phabricator.w" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534215 (https://phabricator.wikimedia.org/T231569) (owner: 10Subramanya Sastry)
[18:30:48] <Urbanecm>	 Nemo_bis: but it doesn't say who should be in
[18:30:50] <Urbanecm>	 jouncebot: now
[18:30:50] <jouncebot>	 No deployments scheduled for the next 4 hour(s) and 29 minute(s)
[18:30:56] <Urbanecm>	 jouncebot: next
[18:30:56] <jouncebot>	 In 4 hour(s) and 29 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T2300)
[18:35:51] <Urbanecm>	 !log Livetesting on mwdebug1002
[18:35:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:50] * Urbanecm going to sync the livetest, works, will upload soon
[18:41:40] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/: Emergency fix: GE not loading configuration properly: newbie facing feature (duration: 00m 57s)
[18:41:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:05] <wikibugs>	 (03PS1) 10Urbanecm: [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217
[18:43:31] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "already deployed, emergency" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217 (owner: 10Urbanecm)
[18:43:33] <wikibugs>	 (03PS2) 10Catrope: Enable and configure ORES damaging and goodfaith on zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/528537 (https://phabricator.wikimedia.org/T225562) (owner: 10Sbisson)
[18:44:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217 (owner: 10Urbanecm)
[18:44:59] <Urbanecm>	 RoanKattouw: ^^
[18:45:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217 (owner: 10Urbanecm)
[18:45:08] <Urbanecm>	 that V-1'ed patch is emergency-deployed fix
[18:45:10] <Nemo_bis>	 Urbanecm: it's ops, plus staff who request it [possibly documented on a private page, I don't remember], plus volunteers who request it
[18:45:17] <Urbanecm>	 going to fix jenkins
[18:45:21] <Urbanecm>	 Nemo_bis: thanks
[18:45:59] <Nemo_bis>	 I think there is no unifying policy, in order to leave room for some ad hoc decisions, but I might be misremembering
[18:46:21] <RoanKattouw>	 Urbanecm: What's broken about it? Is it extending the array instead of overwriting it completely?
[18:46:25] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10netops: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10Ottomata) We don't!    Just Analytics VLAN for now please.
[18:46:34] <Urbanecm>	 RoanKattouw: yes
[18:46:42] <Urbanecm>	 it showed the site help link
[18:46:43] <Urbanecm>	 as the first one
[18:46:51] <Urbanecm>	 not sure who added it into extension.json in the first place
[18:46:52] <RoanKattouw>	 Right. The long-term solution then probably involves changing the merge_strategy of the setting in extension.json
[18:46:55] <Urbanecm>	 yes
[18:46:59] <Urbanecm>	 but this is emergency solution
[18:47:01] <Urbanecm>	 user facing
[18:47:13] <Urbanecm>	 I'll fix the patch
[18:47:22] <RoanKattouw>	 What link?
[18:47:54] <RoanKattouw>	 Urbanecm: Also you probably don't need $wgExtensionFunctions here, you should be able to override the $wg vars directly. That's how almost all (maybe actually all?) of the existing $wmg vars work
[18:48:08] <Urbanecm>	 https://gerrit.wikimedia.org/r/534217
[18:48:09] <Urbanecm>	 my fix
[18:48:12] <Urbanecm>	 it has -1 from jenkins
[18:48:13] <Urbanecm>	 lint
[18:48:25] <wikibugs>	 (03PS2) 10Urbanecm: [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217
[18:48:27] <Urbanecm>	 RoanKattouw: probably...
[18:48:32] <Urbanecm>	 not sure, lazy to test now :-)
[18:48:39] <Urbanecm>	 fixed, because it's urgent IMO
[18:48:46] <RoanKattouw>	 Yeah no problem, it should be temporary anyway
[18:48:48] <Urbanecm>	 yeah
[18:49:11] <RoanKattouw>	 Please file a task for a long-term solution too, ideally with an explanation of what broke and where
[18:49:38] <Urbanecm>	 will do
[18:49:49] <Urbanecm>	 i was quite sure it's extreg anyway, so this was easy to test
[18:50:27] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217 (owner: 10Urbanecm)
[18:50:37] <Urbanecm>	 RoanKattouw: thanks for review
[18:50:45] <Urbanecm>	 I'll file a task soon
[18:50:51] <Urbanecm>	 and try the merging strategy...
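Editor's note: the exchange above turns on whether a list-valued setting from extension.json is extended by the site configuration or replaced by it. The following is a minimal Python sketch of that difference only, not MediaWiki's actual registration code. The function name, the strategy labels "merge" and "overwrite", and the link names are all hypothetical, and the ordering (extension defaults first) is an assumption based on the report that the site help link showed up as the first entry.

```python
def combine_setting(extension_default, site_value, strategy):
    """Combine an extension's default list with the site's configured list.

    Hypothetical helper for illustration; the strategy names are not
    MediaWiki's real merge_strategy values.
    """
    if strategy == "merge":
        # Assumed behaviour behind the bug: defaults are kept and the
        # site's entries are appended after them.
        return list(extension_default) + list(site_value)
    if strategy == "overwrite":
        # Desired behaviour: the site's value replaces the default entirely.
        return list(site_value)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical link names standing in for the help panel links.
extension_default = ["site-help"]
site_value = ["wiki-tutorial", "wiki-faq"]

merged = combine_setting(extension_default, site_value, "merge")
overwritten = combine_setting(extension_default, site_value, "overwrite")

print(merged)       # the default link leaks in ahead of the site's links
print(overwritten)  # only the site's links remain
```

Under these assumptions, the emergency config patch forces the "overwrite" outcome from the site side, while the long-term fix RoanKattouw suggests changes how the setting is combined at the extension level.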
[18:51:54] <wikibugs>	 (03Merged) 10jenkins-bot: [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217 (owner: 10Urbanecm)
[18:52:13] <wikibugs>	 (03CR) 10jenkins-bot: [bugfix] Growth experiments not loading conf properly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534217 (owner: 10Urbanecm)
[19:07:50] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] role::common::aqs: update druid mediawiki's datasource [puppet] - 10https://gerrit.wikimedia.org/r/534191 (owner: 10Fdans)
[19:09:17] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Switch high-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529124 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko)
[19:09:25] <wikibugs>	 (03PS2) 10Ottomata: Switch high-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529124 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko)
[19:12:10] <wikibugs>	 (03CR) 10jenkins-bot: Switch high-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529124 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko)
[19:12:42] <ottomata>	 !log switching jobqueue events to eventgate-main - T228705
[19:12:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:45] <stashbot>	 T228705: Migrate JobQueue to eventgate - https://phabricator.wikimedia.org/T228705
[19:14:00] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Switch high-traffic jobs to eventgate. Take 2 - T228705 (duration: 00m 56s)
[19:14:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:26] <logmsgbot>	 !log fdans@deploy1001 Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
[19:18:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:00] <logmsgbot>	 !log fdans@deploy1001 Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
[19:20:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:21:27] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] admin: convert maintain_kubeusers to systemd timer type [puppet] - 10https://gerrit.wikimedia.org/r/533606 (owner: 10Phamhi)
[19:23:30] <wikibugs>	 10Operations, 10Packaging, 10serviceops, 10CPT Initiatives (Session Management Service (CDP2)): Need help to create and deploy Debian-packaged Python 3 app - https://phabricator.wikimedia.org/T229980 (10WDoranWMF)
[19:42:10] <XioNoX>	 !log rollback OSPF metric change on eqiad-codfw Zayo link (1320->320)
[19:42:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:51] <wikibugs>	 (03PS1) 10Ppchelko: Switch all non-low-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534225 (https://phabricator.wikimedia.org/T228705)
[19:55:52] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Switch all non-low-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534225 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko)
[20:12:43] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] prometheus: deploy prometheus-ipsec-exporter to all sites [puppet] - 10https://gerrit.wikimedia.org/r/534210 (https://phabricator.wikimedia.org/T230236) (owner: 10Herron)
[20:22:29] <wikibugs>	 (03PS1) 10CDanis: dbctl: indicate failed commit in announcement [software/conftool] - 10https://gerrit.wikimedia.org/r/534230 (https://phabricator.wikimedia.org/T231871)
[20:44:44] <wikibugs>	 10Operations, 10ops-esams, 10netops: replace msw1-esams - https://phabricator.wikimedia.org/T185151 (10ayounsi)
[20:44:46] <wikibugs>	 10Operations, 10ops-esams: Repurpose csw2-oe14/15 and lab-ex4200 as msw - https://phabricator.wikimedia.org/T215991 (10ayounsi)
[21:05:46] <wikibugs>	 (03PS2) 10Ppchelko: Switch all non-low-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534225 (https://phabricator.wikimedia.org/T228705)
[21:06:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Switch all non-low-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534225 (https://phabricator.wikimedia.org/T228705) (owner: 10Ppchelko)
[21:10:14] <wikibugs>	 (03PS1) 10Ppchelko: Remove EventBusRCFeedEngine eventServiceName. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534236 (https://phabricator.wikimedia.org/T229863)
[21:12:35] <wikibugs>	 (03PS3) 10Ppchelko: Switch all non-low-traffic jobs to eventgate. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534225 (https://phabricator.wikimedia.org/T228705)
[21:17:48] <wikibugs>	 (03PS13) 10Jforrester: Set wgNoticeProjects for wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[21:32:08] <wikibugs>	 (03Abandoned) 10Subramanya Sastry: Make scandium a read-only appserver + enable exception logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529173 (https://phabricator.wikimedia.org/T228069) (owner: 10Subramanya Sastry)
[21:34:13] <wikibugs>	 (03PS1) 10Catrope: Revert "[bugfix] Growth experiments not loading conf properly" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935)
[21:34:53] <wikibugs>	 (03CR) 10Catrope: [C: 04-2] "Wait for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/534233 to be deployed first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935) (owner: 10Catrope)
[21:55:22] <Urbanecm>	 RoanKattouw: you deploying "Set correct merge strategy for help panel links"?
[21:55:26] <Urbanecm>	 https://gerrit.wikimedia.org/r/534242
[21:58:07] <RoanKattouw>	 Yes, I listed it for the SWAT in about an hour
[21:59:08] <Urbanecm>	 yeah, evening one. Thanks, just wasn't sure how permanent your -2 is
[21:59:11] <Urbanecm>	 thanks!
[22:00:30] <RoanKattouw>	 Oh it was just until the extension patch got merged, which I expected to take longer
[22:01:04] <Urbanecm>	 yeah, got it now :-)
[22:01:06] <Urbanecm>	 thanks
[22:01:29] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "Formally: LGTM, change shouldn't break anything now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935) (owner: 10Catrope)
[22:01:42] <Urbanecm>	 voted +1 then, seems good and not breaking, once your -2 is resolved
[22:08:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:fcgi://127.0.0.1:9000 method=GET https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[22:11:54] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:12:11] <wikibugs>	 (03PS50) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[22:14:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[22:16:12] <wikibugs>	 (03PS51) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[22:28:08] <wikibugs>	 (03CR) 10Awight: [C: 03+1] "> Uploaded patch set 13." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[22:33:11] <wikibugs>	 10Operations, 10DBA: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (10TK-999) >>! In T109179#3952629, @jcrespo wrote: > This is important, but not a goal for this quarter- we are still blocked on mediawiki extension maintainers to be compatible with it; however, all da...
[22:40:36] <wikibugs>	 (03CR) 10CRusnov: "Looks like I eliminated the changes to unprepared postgres servers, and the compiler output looks good." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[22:43:38] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] Set wgNoticeProjects for wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[22:46:58] <wikibugs>	 (03PS14) 10Jforrester: Set wgNoticeProjects for wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[22:47:03] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[22:50:41] <wikibugs>	 (03Merged) 10jenkins-bot: Set wgNoticeProjects for wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[22:50:57] <wikibugs>	 (03CR) 10jenkins-bot: Set wgNoticeProjects for wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: 10MacFan4000)
[22:54:02] <logmsgbot>	 !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T208694 Set CentralNotice's wgNoticeProjects for wikimedia (duration: 00m 59s)
[22:54:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:54:13] <stashbot>	 T208694: wgNoticeProjects should not default to wikimedia projects - https://phabricator.wikimedia.org/T208694
[22:54:29] <AndyRussG>	 James_F: thanks!!! ^
[22:54:43] <James_F>	 AndyRussG: Seems fine, as expected.
[22:54:57] <James_F>	 Hopefully after the related patch lands in production it'll be the same. :-)
[22:55:16] <AndyRussG>	 James_F: yeah! Thanks for doing this, now we can deploy that and other accumulated stuff!! :)
[22:55:27] * James_F nods.
[22:55:42] <James_F>	 (I surrender the conch.)
[23:00:05] <jouncebot>	 MaxSem, RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190903T2300). Please do the needful.
[23:00:05] <jouncebot>	 davidwbarratt and RoanKattouw: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:13] <davidwbarratt>	 here!
[23:00:16] <Niharika>	 I can swat. 
[23:00:43] * Niharika flexes fingers
[23:03:21] <Niharika>	 RoanKattouw: Around? 
[23:03:39] <RoanKattouw>	 Yes, coming
[23:04:08] <wikibugs>	 (PS3) Niharika29: Enable and configure ORES damaging and goodfaith on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/528537 (https://phabricator.wikimedia.org/T225562) (owner: Sbisson)
[23:04:29] <wikibugs>	 (CR) Niharika29: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/528537 (https://phabricator.wikimedia.org/T225562) (owner: Sbisson)
[23:05:05] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "Looks right to me!  Is this in use already anywhere, such that we have to update things before/after merging?" [puppet] - https://gerrit.wikimedia.org/r/533758 (https://phabricator.wikimedia.org/T171188) (owner: Alex Monk)
[23:09:04] <RoanKattouw>	 OK, here now, sorry
[23:09:16] <RoanKattouw>	 Niharika: That ORES patch needs a table creation and a maintenance script as well
[23:10:06] <ebernhardson>	 !log production-search-eqiad all indices index.merge.policy.deletes_pct_allowed=20
[23:10:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:14] <Niharika>	 RoanKattouw: No problem. Zuul is kinda sluggish today; will take some time. 
[23:10:41] <RoanKattouw>	 I'm surprised a config patch is taking this long, don't those usually get merged in under a minute?
[23:11:17] <wikibugs>	 (Merged) jenkins-bot: Enable and configure ORES damaging and goodfaith on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/528537 (https://phabricator.wikimedia.org/T225562) (owner: Sbisson)
[23:11:35] <wikibugs>	 (CR) jenkins-bot: Enable and configure ORES damaging and goodfaith on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/528537 (https://phabricator.wikimedia.org/T225562) (owner: Sbisson)
[23:11:37] <Niharika>	 And merged! 
[23:13:20] <Niharika>	 RoanKattouw: Which maintenance script and table creation? 
[23:13:40] <RoanKattouw>	 Niharika: See the two "mwscript" bullet points at https://www.mediawiki.org/wiki/ORES/RCFilters#Deploying_ORES+RCFilters_to_a_new_wiki
[23:16:09] <Niharika>	 RoanKattouw: Table done. For the populatedatabase script, is it normal for the script to throw up a ton of `ScoreFetcher errored for 55947634: No model available for [goodfaith]
[23:16:09] <Niharika>	 ScoreFetcher errored for 55947635: No model available for [goodfaith]`? That one finished too.
[23:17:04] <Niharika>	 And that change is on mwdebug1002 now if you can test it. 
[23:18:34] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[23:20:09] <ebernhardson>	 aware of ^, should work itself out in a sec
[23:23:52] <RoanKattouw>	 Niharika: Yes that is normal sadly
[23:24:00] <RoanKattouw>	 Sorry, I got distracted, I blame Greg
[23:24:04] <RoanKattouw>	 Will test now
[23:25:39] <Niharika>	 davidwbarratt: Tchanders: The change is live on mwdebug1002. 
[23:26:35] <RoanKattouw>	 Niharika: Re-ran the population script and it worked (weird).  It's working now, good to sync
[23:27:03] <Niharika>	 Okay cool. 
[23:28:34] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable and configure ORES damaging and goodfaith on zhwiki T225562 (duration: 00m 58s)
[23:28:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:37] <stashbot>	 T225562: Deploy ORES filters for zhwiki - https://phabricator.wikimedia.org/T225562
[23:29:20] <davidwbarratt>	 thanks!
[23:31:12] <Niharika>	 RoanKattouw: Safe to take your -2 off of https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/534240/ now, I reckon. 
[23:31:28] <RoanKattouw>	 Whoops yes sorry
[23:31:29] <RoanKattouw>	 Done
[23:31:52] <wikibugs>	 (PS2) Niharika29: Revert "[bugfix] Growth experiments not loading conf properly" [mediawiki-config] - https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935) (owner: Catrope)
[23:31:54] <RoanKattouw>	 Although the wmf.20 / wmf.21 patches need to be deployed first
[23:32:16] <RoanKattouw>	 the "Fix merge_strategy for GrowthExperiments help links" patches
[23:32:44] <Niharika>	 RoanKattouw: It won't let me merge this one until those are merged? Or can I merge it but sync it after those are out? 
[23:32:58] <RoanKattouw>	 Merge before but sync after would work
[23:33:19] <Niharika>	 Cool. It's taking long to merge today so I'm trying to save time. :)
[23:33:47] <wikibugs>	 (CR) Niharika29: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935) (owner: Catrope)
[23:35:58] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[23:37:47] <Tchanders>	 Niharika: Looks good, thanks
[23:38:11] <Niharika>	 Alright, let's sync it. 
[23:41:09] <logmsgbot>	 !log niharika29@deploy1001 Synchronized php-1.34.0-wmf.20/includes/block: Allow CompositeBlock::appliesToRight to return null when unsure T229417, T231145 (duration: 00m 55s)
[23:41:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:30] <logmsgbot>	 !log niharika29@deploy1001 Synchronized php-1.34.0-wmf.20/tests/phpunit/: Allow CompositeBlock::appliesToRight to return null when unsure T229417, T231145 (duration: 00m 57s)
[23:42:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:33] <Niharika>	 davidwbarratt: Tchanders: Done! :) 
[23:43:46] <davidwbarratt>	 YAY!
[23:44:17] <Niharika>	 '\o/' 
[23:44:36] <wikibugs>	 (Merged) jenkins-bot: Revert "[bugfix] Growth experiments not loading conf properly" [mediawiki-config] - https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935) (owner: Catrope)
[23:46:52] <wikibugs>	 (CR) jenkins-bot: Revert "[bugfix] Growth experiments not loading conf properly" [mediawiki-config] - https://gerrit.wikimedia.org/r/534240 (https://phabricator.wikimedia.org/T231935) (owner: Catrope)
[23:47:14] <Niharika>	 RoanKattouw: Can you test https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/GrowthExperiments/+/534243/ or https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/GrowthExperiments/+/534242/? 
[23:48:26] <RoanKattouw>	 Niharika: I can, but I'll need to test them together with the config patch. Is that also on mwdebug?
[23:48:49] <Niharika>	 RoanKattouw: One moment. 
[23:49:27] <Niharika>	 RoanKattouw: Now they are. 
[23:49:31] <RoanKattouw>	 OK, testing
[23:50:19] <RoanKattouw>	 Looks like it works, good to deploy
[23:50:28] <RoanKattouw>	 Niharika: Please mind the sync order when you deploy these
[23:50:46] <Niharika>	 RoanKattouw: To confirm, `extension.json` patches first? 
[23:50:50] <RoanKattouw>	 1) wmf.20 + wmf.21 patches; 2a) InitialiseSettings.php change in the config patch; 2b) CommonSettings.php
[23:50:57] <Niharika>	 Got it. 
[23:52:49] <logmsgbot>	 !log niharika29@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/GrowthExperiments/: Set correct merge strategy for help panel links T231935 (duration: 00m 56s)
[23:52:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:52:56] <stashbot>	 T231935: GrowthExperiments not loading extension.json properly - https://phabricator.wikimedia.org/T231935
[23:53:30] <wikibugs>	 (CR) CRusnov: [C: +2] profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: CRusnov)
[23:53:58] <logmsgbot>	 !log niharika29@deploy1001 Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/: Set correct merge strategy for help panel links T231935 (duration: 00m 55s)
[23:54:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:56:04] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert - [bugfix]Growth experiments not loading conf properly T231935  (duration: 00m 55s)
[23:56:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:56:20] <wikibugs>	 (PS2) Tim Starling: Add extra key for tstarling [puppet] - https://gerrit.wikimedia.org/r/533125
[23:57:00] <wikibugs>	 (CR) Tim Starling: [C: +2] Add extra key for tstarling [puppet] - https://gerrit.wikimedia.org/r/533125 (owner: Tim Starling)
[23:57:27] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/CommonSettings.php: Revert - [bugfix]Growth experiments not loading conf properly T231935  (duration: 00m 55s)
[23:57:34] <Niharika>	 RoanKattouw: All done. 
[23:57:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:57:38] <RoanKattouw>	 Thanks!
[23:58:56] <wikibugs>	 Operations, ops-eqiad, Analytics, Analytics-Kanban, netops: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Cmjohnson) @ottomata All the servers are moved and all of them but cloudvirtan1003 are connected to the swi...