[00:00:04] <jouncebot>	 twentyafterfour: Your horoscope predicts another unfortunate Phabricator update deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T0000).
[00:58:51] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:01:59] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:20:47] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:27:05] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:36:29] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:37:39] <wikibugs>	 10Operations, 10MediaWiki-Debug-Logger, 10Wikimedia-Logstash, 10observability, and 9 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (10Krinkle) 05Open→03Resolved The last point of the task description is wmerrors taking care of showing the error page...
[01:37:47] <wikibugs>	 10Operations, 10CPT Initiatives (PHP7 (TEC4)), 10HHVM, 10Patch-For-Review, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Krinkle)
[01:38:01] <wikibugs>	 10Operations, 10MediaWiki-Debug-Logger, 10Wikimedia-Logstash, 10observability, and 9 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (10Krinkle)
[01:39:39] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:40:35] <icinga-wm>	 PROBLEM - PHP opcache health on mwdebug1001 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[01:55:21] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:58:27] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[02:33:49] <wikibugs>	 (03PS1) 10Tim Starling: Add extra key for tstarling [puppet] - 10https://gerrit.wikimedia.org/r/533125
[02:46:39] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 19445624 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:46:59] <icinga-wm>	 RECOVERY - Check the Netbox report-s- librenms for fail status. on netmon1002 is OK: librenms.LibreNMS OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[02:49:09] <wikibugs>	 (03PS1) 10CRusnov: librenms: Exclude problematic InventoryItem type as requested [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/533128 (https://phabricator.wikimedia.org/T231502)
[02:49:47] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 456 and 62 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:52:05] <icinga-wm>	 RECOVERY - ElasticSearch shard size check - 9243 on search.svc.codfw.wmnet is OK: OK - All good! https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed
[02:54:01] <wikibugs>	 (03CR) 10Mathew.onipe: "Eqiad and codfw have recovered. shard sizes for commonswiki_content are back to normal. I'll still leave this patch here for while to see " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) (owner: 10Mathew.onipe)
[02:54:14] <wikibugs>	 (03CR) 10Mathew.onipe: [C: 04-1] "do not merge" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) (owner: 10Mathew.onipe)
[02:54:57] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:01:15] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:03:37] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10Discovery, 10Discovery-Search (Current work): Memory correctable errors -EDAC- elastic1029 - https://phabricator.wikimedia.org/T214283 (10Mathew.onipe) 05Resolved→03Open elastic1029 is back on icinga showing memory errors. see https://icinga.wikimedia.org/cgi-bi...
[03:04:23] <wikibugs>	 (03PS3) 10CRusnov: Add script to import management DNS entries [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/529977 (https://phabricator.wikimedia.org/T228670)
[03:07:29] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:12:11] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:20:27] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2017 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:20:29] <icinga-wm>	 PROBLEM - Check systemd state on restbase2017 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:20:59] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.48.122:9042 on restbase2017 is CRITICAL: connect to address 10.192.48.122 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[03:21:37] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.48.122:7001 on restbase2017 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662
[03:23:13] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:26:21] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:38:55] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:40:55] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2017 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:40:57] <icinga-wm>	 RECOVERY - Check systemd state on restbase2017 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:42:03] <icinga-wm>	 RECOVERY - cassandra-b SSL 10.192.48.122:7001 on restbase2017 is OK: SSL OK - Certificate restbase2017-b valid until 2020-11-29 09:26:18 +0000 (expires in 458 days) https://phabricator.wikimedia.org/T120662
[03:42:55] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.48.122:9042 on restbase2017 is OK: TCP OK - 0.030 second response time on 10.192.48.122 port 9042 https://phabricator.wikimedia.org/T93886
[03:43:37] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:59:01] <wikibugs>	 (03PS1) 10CRusnov: Add script to rotate backup dumps, and dump with timestamp [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/533131 (https://phabricator.wikimedia.org/T231512)
[04:02:51] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 33.46 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:05:59] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 101 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:15:03] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:19:51] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:22:59] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:28:49] <wikibugs>	 10Operations, 10Traffic: Perform HTTPS redirect without crossing domain boundaries for non canonical domains - https://phabricator.wikimedia.org/T231513 (10Vgutierrez)
[04:29:31] <wikibugs>	 10Operations, 10Traffic: Perform HTTPS redirect without crossing domain boundaries for non canonical domains - https://phabricator.wikimedia.org/T231513 (10Vgutierrez) p:05Triage→03Normal
[04:31:09] <wikibugs>	 10Operations, 10Traffic: Enable HSTS for non canonical domains using the ncredir service - https://phabricator.wikimedia.org/T231514 (10Vgutierrez)
[04:31:38] <wikibugs>	 10Operations, 10Traffic: Enable HSTS for non canonical domains using the ncredir service - https://phabricator.wikimedia.org/T231514 (10Vgutierrez) p:05Triage→03Normal
[04:42:26] <wikibugs>	 (03PS1) 10Vgutierrez: ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513)
[04:47:55] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:49:03] <wikibugs>	 10Operations, 10netops: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 (10Marostegui) Works! Thank you!
[04:49:29] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:52:23] <wikibugs>	 (03PS1) 10Marostegui: report_users: Whitelist dbproxy1018 [software] - 10https://gerrit.wikimedia.org/r/533136
[04:53:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] report_users: Whitelist dbproxy1018 [software] - 10https://gerrit.wikimedia.org/r/533136 (owner: 10Marostegui)
[04:54:03] <wikibugs>	 (03PS5) 10Marostegui: mariadb: Promote db1133 to m5 master [puppet] - 10https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657)
[04:54:11] <wikibugs>	 (03PS4) 10Marostegui: wmnet: Promote db1133 to m5 master [dns] - 10https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657)
[04:54:22] <wikibugs>	 (03PS3) 10Marostegui: mariadb: Promote db1109 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/531189 (https://phabricator.wikimedia.org/T230762)
[04:54:31] <wikibugs>	 (03PS2) 10Marostegui: wmnet: Update s8-master record [dns] - 10https://gerrit.wikimedia.org/r/531455 (https://phabricator.wikimedia.org/T230762)
[04:56:34] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Decommission db2053 [puppet] - 10https://gerrit.wikimedia.org/r/533139 (https://phabricator.wikimedia.org/T231407)
[04:56:49] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui)
[04:57:19] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:58:01] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] mariadb: Decommission db2053 [puppet] - 10https://gerrit.wikimedia.org/r/533139 (https://phabricator.wikimedia.org/T231407) (owner: 10Marostegui)
[04:58:47] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui)
[04:59:06] <marostegui>	 !log Remove db2053 from tendril and zarcillo T231407
[04:59:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:59:13] <stashbot>	 T231407: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407
[05:00:14] <wikibugs>	 (03PS1) 10Vgutierrez: ncredir: Enable HSTS with max-age set to 1 week [puppet] - 10https://gerrit.wikimedia.org/r/533140 (https://phabricator.wikimedia.org/T231514)
[05:00:36] <marostegui>	 !log Stop MySQL on db2053 for decommission T231407
[05:00:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:30] <wikibugs>	 10Operations, 10ops-codfw, 10decommission: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui) a:05Marostegui→03RobH
[05:01:44] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui) This host is ready for #dc-ops to decommission
[05:02:07] <wikibugs>	 10Operations, 10DBA: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (10Marostegui)
[05:03:26] <wikibugs>	 (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1001/18100/" [puppet] - 10https://gerrit.wikimedia.org/r/533140 (https://phabricator.wikimedia.org/T231514) (owner: 10Vgutierrez)
[05:08:15] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:10:06] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Traffic, and 2 others: Picture from Commons not found from Singapore - https://phabricator.wikimedia.org/T231086 (10aaron) >>! In T231086#5434295, @CDanis wrote:  > I suggest: > * when ATS gets a 404 response from one cluster, retry on the other activ...
[05:12:57] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:19:55] <wikibugs>	 (03PS1) 10Vgutierrez: redirects.dat: Get rid of non canonical domains rules [puppet] - 10https://gerrit.wikimedia.org/r/533141 (https://phabricator.wikimedia.org/T133548)
[05:19:57] <wikibugs>	 (03PS1) 10Vgutierrez: redirects.dat: Enforce HTTPS for canonnical domains [puppet] - 10https://gerrit.wikimedia.org/r/533142 (https://phabricator.wikimedia.org/T133548)
[05:21:29] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:24:25] <wikibugs>	 (03PS2) 10Vgutierrez: redirects.dat: Enforce HTTPS for canonical domains [puppet] - 10https://gerrit.wikimedia.org/r/533142 (https://phabricator.wikimedia.org/T133548)
[05:32:56] <wikibugs>	 (03PS1) 10Marostegui: dbproxy1018: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/533144 (https://phabricator.wikimedia.org/T202367)
[05:33:34] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] dbproxy1018: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/533144 (https://phabricator.wikimedia.org/T202367) (owner: 10Marostegui)
[05:37:53] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:40:51] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:45:31] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:47:48] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:50:56] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:52:12] <vgutierrez>	 hmm something got deployed yesterday that's giving hiccups to php-fpm?
[05:54:14] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:57:48] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:00:20] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:07:08] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:08:38] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:11:40] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:28:38] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:35:12] <wikibugs>	 (03PS8) 10Jeena Huneidi: Add restbase chart (port from local-charts) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T224935)
[06:36:24] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:41:06] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:44:12] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:01:12] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:04:55] <apergos>	 68c9e46ab0c2171a74619cecc445d55c28b31eab  these alerts are relatively new, perhaps we should up the threshold a bit
[07:06:05] <apergos>	 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/531664/   and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/531691/
[07:08:09] <apergos>	 godog: any thoughts? ^^
[07:08:14] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Active, AS6939/IPv4: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:12:54] <icinga-wm>	 RECOVERY - BGP status on cr4-ulsfo is OK: BGP OK - up: 94, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:13:36] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:20:26] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 56 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[07:22:38] <vgutierrez>	 apergos: hmmm checking the last 7 days I'd say that php-fpm response time got substantially worse since last UTC night
[07:22:52] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:24:24] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:24:33] <apergos>	 I didn't see anything deployed near the time the alerts started though. 
[07:26:00] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 24 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[07:29:38] <wikibugs>	 10Operations, 10DBA, 10Data-Services: Replace labsdb (wikireplicas) dbproxies: dbproxy1010 and dbproxy1011 - https://phabricator.wikimedia.org/T231520 (10Marostegui)
[07:29:49] <wikibugs>	 10Operations, 10DBA, 10Data-Services: Replace labsdb (wikireplicas) dbproxies: dbproxy1010 and dbproxy1011 - https://phabricator.wikimedia.org/T231520 (10Marostegui) p:05Triage→03Normal
[07:30:02] <wikibugs>	 10Operations, 10DBA, 10Data-Services: Replace labsdb (wikireplicas) dbproxies: dbproxy1010 and dbproxy1011 - https://phabricator.wikimedia.org/T231520 (10Marostegui)
[07:30:38] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:31:58] <wikibugs>	 10Operations, 10DBA, 10Data-Services: Replace labsdb (wikireplicas) dbproxies: dbproxy1010 and dbproxy1011 - https://phabricator.wikimedia.org/T231520 (10Marostegui)
[07:32:12] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:35:16] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:43:11] <wikibugs>	 (03PS1) 10Dzahn: ATS/varnish: switch iegreview to miscweb backend and use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533154 (https://phabricator.wikimedia.org/T210411)
[07:44:46] <wikibugs>	 (03PS2) 10Dzahn: ATS/varnish: switch iegreview to miscweb backend and use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533154 (https://phabricator.wikimedia.org/T210411)
[07:48:46] <wikibugs>	 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[07:50:52] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:51:12] <wikibugs>	 (03PS1) 10Dzahn: racktables: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533155 (https://phabricator.wikimedia.org/T224247)
[07:53:42] <wikibugs>	 (03PS1) 10Dzahn: iegreview: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533157 (https://phabricator.wikimedia.org/T224247)
[07:57:04] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:57:09] <wikibugs>	 (03PS1) 10Dzahn: wikimania_scholarships: remove jessie/php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533158 (https://phabricator.wikimedia.org/T224247)
[08:00:07] <wikibugs>	 (03Abandoned) 10Dzahn: ATS/varnish: replace krypton with miscweb1001, rename director [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[08:00:15] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2019-08-29-074757-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/533160 (https://phabricator.wikimedia.org/T230200)
[08:00:17] <wikibugs>	 (03PS1) 10Dzahn: misc_apps::httpd: remove jessie/php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533159 (https://phabricator.wikimedia.org/T224247)
[08:01:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[08:02:57] <wikibugs>	 (03PS1) 10Dzahn: wikistats (cloud): remove php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533161
[08:03:30] <_joe_>	 !log live tweak on mw1270: apc.ttl removed; apc size 4 GB; tideways disabled.
[08:03:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:22] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[08:08:45] <wikibugs>	 (03CR) 10KartikMistry: [V: 03+2 C: 03+2] Update cxserver to 2019-08-29-074757-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/533160 (https://phabricator.wikimedia.org/T230200) (owner: 10KartikMistry)
[08:11:20] <_joe_>	 !log disabling zend GC on mw1347, testing a hypothesis for T231011
[08:11:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:26] <stashbot>	 T231011: Mysterious, coordinated slowdowns every ~ 25 minutes on mw1347,mw1348 (php7 api servers) - https://phabricator.wikimedia.org/T231011
[08:14:09] <kart_>	 I'm planning to update cxserver. Is anything else going on that could cause issues with it, _joe_?
[08:14:41] <_joe_>	 kart_: I'm around for 10 more minutes, and no one else with k8s experience is
[08:14:58] <_joe_>	 so you have no support on our side
[08:15:07] <_joe_>	 other than that, do what you need
[08:15:12] <kart_>	 OK. It is pretty quick. In case of trouble, I'll revert.
[08:15:19] <kart_>	 Thanks!
[08:15:40] <logmsgbot>	 !log @ helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
[08:15:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:10] <logmsgbot>	 !log @ helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
[08:18:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:16] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] ATS/varnish: switch iegreview to miscweb backend and use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533154 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn)
[08:21:06] <logmsgbot>	 !log @ helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
[08:21:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:21] <wikibugs>	 (03PS1) 10Dzahn: Revert "webserver_misc_apps: only include envoy if on stretch" [puppet] - 10https://gerrit.wikimedia.org/r/533163
[08:23:52] <kart_>	 !log Updated cxserver to 2019-08-29-074757-production (T230200)
[08:23:54] <mutante>	 !log switching iegreview app to stretch backend with TLS and discovery record
[08:23:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:57] <stashbot>	 T230200: Whole article as one section - https://phabricator.wikimedia.org/T230200
[08:24:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:15] <mutante>	 !log running puppet on cp-text_eqiad
[08:26:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:28] <marostegui>	 !log Change min_replicas to 2 on s3 for eqiad and codfw T231019
[08:30:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:33] <stashbot>	 T231019: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019
[08:32:50] <wikibugs>	 10Operations, 10Discovery-Search (Current work): Alert when a jvm hits more than 100 old gc ops/hour - https://phabricator.wikimedia.org/T231516 (10Mathew.onipe)
[08:36:00] <marostegui>	 !log Change min_replicas to 4 on s4 for eqiad and codfw T231019
[08:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:07] <stashbot>	 T231019: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019
[08:38:33] <wikibugs>	 10Operations: two user pages on meta can't be rendered - https://phabricator.wikimedia.org/T231522 (10ArielGlenn) These two pages are aliases for the same contributor, and the problematic revisions were added in 2016 on each page, so this is some sort of regression (php? babel? combo?)
[08:38:51] <wikibugs>	 10Operations, 10MediaWiki-extensions-Babel: two user pages on meta can't be rendered - https://phabricator.wikimedia.org/T231522 (10ArielGlenn)
[08:39:27] <mutante>	 !log cp1085 - kill stuck puppet processes and run manually
[08:39:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:44] <mutante>	 !log cp1085 - puppet run stuck after Loading facts, possibly related to ACKed IPMI sensor status issue in Icinga
[08:41:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:45] <marostegui>	 !log Change min_replicas to 4 on s2 for eqiad and codfw T231019
[08:43:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:51] <stashbot>	 T231019: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019
[09:03:23] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic: cp1085 - IPMI not working - https://phabricator.wikimedia.org/T231525 (10Dzahn)
[09:08:07] <marostegui>	 !log Reboot db1133 to upgrade kernel - T229657
[09:08:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:12] <stashbot>	 T229657: Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657
[09:15:29] <wikibugs>	 (03PS1) 10Marostegui: install_server: Do not reimage dbproxy1018,dbproxy1019 [puppet] - 10https://gerrit.wikimedia.org/r/533171 (https://phabricator.wikimedia.org/T202367)
[09:19:02] <wikibugs>	 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn)
[09:19:35] <mutante>	 !log iegreview.wikimedia.org switched to new stretch backend and using TLS (T210411)
[09:19:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:42] <stashbot>	 T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411
[09:21:07] <wikibugs>	 (03PS1) 10KartikMistry: WIP: Move ContentTranslation out of Beta in jvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533172 (https://phabricator.wikimedia.org/T231207)
[09:21:22] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] racktables: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533155 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[09:21:31] <wikibugs>	 (03PS2) 10Dzahn: racktables: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533155 (https://phabricator.wikimedia.org/T224247)
[09:21:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage dbproxy1018,dbproxy1019 [puppet] - 10https://gerrit.wikimedia.org/r/533171 (https://phabricator.wikimedia.org/T202367) (owner: 10Marostegui)
[09:23:19] <wikibugs>	 (03PS3) 10Dzahn: racktables: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533155 (https://phabricator.wikimedia.org/T224247)
[09:24:54] <wikibugs>	 (03PS2) 10Dzahn: iegreview: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533157 (https://phabricator.wikimedia.org/T224247)
[09:27:03] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] iegreview: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/533157 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[09:43:41] <wikibugs>	 10Operations, 10Traffic: Improve ATS prometheus metrics - https://phabricator.wikimedia.org/T231533 (10Vgutierrez)
[09:47:28] <wikibugs>	 (03PS1) 10Dzahn: ATS/varnish: switch scholarschips to miscweb and use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533175 (https://phabricator.wikimedia.org/T210411)
[09:49:41] <wikibugs>	 (03PS2) 10Dzahn: ATS/varnish: switch wikimania scholarships to miscweb, use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533175 (https://phabricator.wikimedia.org/T210411)
[09:50:42] <wikibugs>	 (03PS3) 10Dzahn: ATS/varnish: switch wikimania scholarships to miscweb, use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533175 (https://phabricator.wikimedia.org/T210411)
[10:04:29] <wikibugs>	 (03PS1) 10Vgutierrez: prometheus: Ship a custom metrics file for trafficserver_exporter [puppet] - 10https://gerrit.wikimedia.org/r/533178 (https://phabricator.wikimedia.org/T231533)
[10:06:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] prometheus: Ship a custom metrics file for trafficserver_exporter [puppet] - 10https://gerrit.wikimedia.org/r/533178 (https://phabricator.wikimedia.org/T231533) (owner: 10Vgutierrez)
[10:07:50] <wikibugs>	 (03PS2) 10Vgutierrez: prometheus: Ship a custom metrics file for trafficserver_exporter [puppet] - 10https://gerrit.wikimedia.org/r/533178 (https://phabricator.wikimedia.org/T231533)
[10:08:57] <wikibugs>	 (03CR) 10Mobrovac: [C: 04-1] "Hm so, this will wrap the restbase ports in tls, which means all of the internal customers need to switch to tls as soon as this goes out." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533028 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[10:14:46] <wikibugs>	 (03CR) 10Ema: "> Hm so, this will wrap the restbase ports in tls, which means all of" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533028 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[10:18:50] <wikibugs>	 (03PS1) 10Dzahn: ATS: configure "never_cache" for miscweb1001 backend [puppet] - 10https://gerrit.wikimedia.org/r/533181 (https://phabricator.wikimedia.org/T224247)
[10:26:14] <wikibugs>	 10Operations, 10Traffic: Allow blocking requests from specific networks on the edge - https://phabricator.wikimedia.org/T231063 (10ema) 05Open→03Resolved This is now done.
[10:28:43] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: cergen fails signing CSR - https://phabricator.wikimedia.org/T231423 (10ema) @Ottomata: I understand that this is now fixed, can you confirm and close the task if so?
[10:30:06] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "ATS: temporarily use plain HTTP to access docker-registry" [puppet] - 10https://gerrit.wikimedia.org/r/533041 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema)
[10:30:08] <wikibugs>	 (03PS2) 10Ema: Revert "ATS: temporarily use plain HTTP to access docker-registry" [puppet] - 10https://gerrit.wikimedia.org/r/533041 (https://phabricator.wikimedia.org/T227432)
[10:31:01] <wikibugs>	 (03PS1) 10Dzahn: mediawiki::webserver: send stderr of hhvm-restart script to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/533184
[10:38:40] <wikibugs>	 (03CR) 10Mobrovac: [C: 03+1] "From IRC:" [puppet] - 10https://gerrit.wikimedia.org/r/533028 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[10:38:58] <wikibugs>	 (03PS2) 10Dzahn: mediawiki::webserver: send stderr of hhvm-restart script to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/533184
[10:47:11] <wikibugs>	 10Operations, 10DBA: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10akosiaris) Sep 10 16:00UTC sounds okish to me as far as `bacula` goes. We will probably have a couple of full backups (it's the start of the month, when full backups happen and they might not have...
[10:48:22] <wikibugs>	 (03PS3) 10Dzahn: mediawiki::webserver: send stderr of hhvm-restart script to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/533184
[10:48:48] <wikibugs>	 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (10ema) 05Open→03Resolved We have managed to generate a proper certificate for the docker-registry origin servers, and cp1075 is now back to using TLS to conne...
[10:49:08] <wikibugs>	 (03PS4) 10Dzahn: mediawiki::webserver: send stderr of hhvm-restart script to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/533184
[10:49:56] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "for now to avoid cron spam" [puppet] - 10https://gerrit.wikimedia.org/r/533184 (owner: 10Dzahn)
[10:50:35] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Improve ATS prometheus metrics - https://phabricator.wikimedia.org/T231533 (10ema) p:05Triage→03Normal
[10:50:47] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic: cp1085 - IPMI not working - https://phabricator.wikimedia.org/T231525 (10ema) p:05Triage→03Normal
[10:50:55] <wikibugs>	 10Operations, 10DBA: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Marostegui) Thanks @akosiaris! As spoken on IRC, no need to re-schedule because of bacula.
[10:54:38] <wikibugs>	 10Operations, 10DBA: Switchover m1 primary master: db1063 to db1135: Tuesday 10th September at 16:00 UTC - https://phabricator.wikimedia.org/T231403 (10Marostegui)
[10:54:46] <wikibugs>	 (03CR) 10Dzahn: "we should follow up with a change to the Python script to handle actual errors differently from info there and then revert this?" [puppet] - 10https://gerrit.wikimedia.org/r/533184 (owner: 10Dzahn)
[10:57:59] <wikibugs>	 10Operations, 10observability, 10Discovery-Search (Current work): Alert when a jvm hits more than 100 old gc ops/hour - https://phabricator.wikimedia.org/T231516 (10Peachey88)
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1100).
[11:00:05] <jouncebot>	 matthiasmullie: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:17] <wikibugs>	 (03PS1) 10Mathew.onipe: icinga: add old JVM GC check [puppet] - 10https://gerrit.wikimedia.org/r/533189 (https://phabricator.wikimedia.org/T231516)
[11:00:19] <matthiasmullie>	 o/
[11:04:12] <wikibugs>	 10Operations, 10DBA: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (10Marostegui)
[11:04:32] <Lucas_WMDE>	 o/
[11:04:37] <Lucas_WMDE>	 I can SWAT
[11:04:39] <wikibugs>	 10Operations, 10DBA: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (10Marostegui) p:05Triage→03Normal
[11:05:10] <Lucas_WMDE>	 matthiasmullie: you’re a deployer right?
[11:05:23] <Lucas_WMDE>	 do you want to deploy your changes?
[11:05:31] <Lucas_WMDE>	 s/changes/backports/
[11:08:52] <wikibugs>	 10Operations, 10Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (10ema) p:05Triage→03High
[11:10:07] <wikibugs>	 10Operations, 10observability, 10Discovery-Search (Current work), 10Patch-For-Review: Alert when a jvm hits more than 100 old gc ops/hour - https://phabricator.wikimedia.org/T231516 (10Mathew.onipe) On another note, I think this check makes sense for other clusters as well
[11:10:49] <matthiasmullie>	 Lucas_WMDE: yeah I can do them myself
[11:11:06] <matthiasmullie>	 thanks for the offer, though!
[11:11:12] <Lucas_WMDE>	 ok :)
[11:12:46] <wikibugs>	 (03CR) 10Mathew.onipe: "PCC is Ok: https://puppet-compiler.wmflabs.org/compiler1002/18105/" [puppet] - 10https://gerrit.wikimedia.org/r/533189 (https://phabricator.wikimedia.org/T231516) (owner: 10Mathew.onipe)
[11:15:20] <wikibugs>	 (03CR) 10Volans: "Replies inline" (036 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: 10CRusnov)
[11:16:11] <wikibugs>	 (03PS2) 10Dzahn: ATS: configure "never-cache" for webserver-misc-apps.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/533181 (https://phabricator.wikimedia.org/T224247)
[11:19:04] <wikibugs>	 10Operations, 10Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (10ema) Thanks for filing this bug and for providing all request/response headers, very useful!  I'm currently trying to reproduce the issue with [[ https://en.wikipedia.org/...
[11:22:23] <wikibugs>	 (03CR) 10Ema: [C: 03+1] ATS: configure "never-cache" for webserver-misc-apps.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/533181 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[11:24:08] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[11:24:11] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] ATS: configure "never-cache" for webserver-misc-apps.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/533181 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[11:24:15] <wikibugs>	 (03PS3) 10Dzahn: ATS: configure "never-cache" for webserver-misc-apps.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/533181 (https://phabricator.wikimedia.org/T224247)
[11:24:51] <wikibugs>	 (03CR) 10Ema: [C: 03+1] ATS/varnish: switch wikimania scholarships to miscweb, use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533175 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn)
[11:25:43] <wikibugs>	 (03PS4) 10Dzahn: ATS/varnish: switch wikimania scholarships to miscweb, use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533175 (https://phabricator.wikimedia.org/T210411)
[11:26:09] <wikibugs>	 (03CR) 10Ema: [C: 03+1] ATS: remove wikiba.se backend [puppet] - 10https://gerrit.wikimedia.org/r/532976 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn)
[11:28:20] <Lucas_WMDE>	 matthiasmullie: everything okay? I’m not seeing any log messages
[11:28:41] <Lucas_WMDE>	 oh, gate-and-submit-swat is being slow…
[11:29:02] <matthiasmullie>	 yeah, these take a little while :p
[11:29:18] <matthiasmullie>	 almost there ^^
[11:31:23] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] ATS/varnish: switch wikimania scholarships to miscweb, use TLS [puppet] - 10https://gerrit.wikimedia.org/r/533175 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn)
[11:35:09] <wikibugs>	 (03PS1) 10Vgutierrez: prometheus: Add basic ATS network and ssl metrics [puppet] - 10https://gerrit.wikimedia.org/r/533193 (https://phabricator.wikimedia.org/T231533)
[11:37:16] <mutante>	 !log scholarships.wikimedia.org app moving to new backend and using TLS. backend upgraded from jessie to stretch and PHP7 (T210411)
[11:37:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:24] <stashbot>	 T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411
[11:38:17] <wikibugs>	 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn)
[11:38:39] <wikibugs>	 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn)
[11:38:49] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] wikimania_scholarships: remove jessie/php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533158 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[11:39:01] <wikibugs>	 (03PS2) 10Dzahn: wikimania_scholarships: remove jessie/php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533158 (https://phabricator.wikimedia.org/T224247)
[11:44:31] <logmsgbot>	 !log mlitn@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/WikibaseMediaInfo: [SDC] Add "copy statements" functionality (MediaInfo part) (duration: 00m 54s)
[11:44:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:26] <wikibugs>	 (03PS2) 10Dzahn: Revert "webserver_misc_apps: only include envoy if on stretch" [puppet] - 10https://gerrit.wikimedia.org/r/533163
[11:45:34] <logmsgbot>	 !log mlitn@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/UploadWizard: [SDC] Add "copy statements" functionality (UploadWizard part) (duration: 00m 52s)
[11:45:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:51] <wikibugs>	 (03PS3) 10Dzahn: Revert "webserver_misc_apps: only include envoy if on stretch" [puppet] - 10https://gerrit.wikimedia.org/r/533163
[11:46:21] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "webserver_misc_apps: only include envoy if on stretch" [puppet] - 10https://gerrit.wikimedia.org/r/533163 (owner: 10Dzahn)
[11:49:35] <wikibugs>	 (03PS1) 10Dzahn: planet: include envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/533197 (https://phabricator.wikimedia.org/T210411)
[11:49:53] <wikibugs>	 (03PS2) 10Dzahn: misc_apps::httpd: remove jessie/php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533159 (https://phabricator.wikimedia.org/T224247)
[11:50:37] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] misc_apps::httpd: remove jessie/php5 support [puppet] - 10https://gerrit.wikimedia.org/r/533159 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[11:50:58] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "Needs clarification on why we need this." (031 comment) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/533131 (https://phabricator.wikimedia.org/T231512) (owner: 10CRusnov)
[11:51:14] <matthiasmullie>	 Done
[11:53:51] <Lucas_WMDE>	 jouncebot: now
[11:53:51] <jouncebot>	 For the next 0 hour(s) and 6 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1100)
[11:54:06] <Lucas_WMDE>	 nothing else to SWAT, and we don’t have much time left anyways
[11:54:09] <Lucas_WMDE>	 !log EU SWAT done
[11:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:36] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/533128 (https://phabricator.wikimedia.org/T231502) (owner: 10CRusnov)
[11:57:05] <wikibugs>	 10Operations, 10decommission, 10serviceops: decom krypton.eqiad.wmnet - https://phabricator.wikimedia.org/T231546 (10Dzahn)
[12:00:15] <wikibugs>	 (03PS3) 10Dzahn: site/install_server: decom krypton.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/532701 (https://phabricator.wikimedia.org/T224247)
[12:10:41] <wikibugs>	 (03PS4) 10Dzahn: site/install_server: decom krypton.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/532701 (https://phabricator.wikimedia.org/T224247)
[12:11:57] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] site/install_server: decom krypton.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/532701 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn)
[12:12:06] <wikibugs>	 (03PS5) 10Dzahn: site/install_server: decom krypton.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/532701 (https://phabricator.wikimedia.org/T224247)
[12:19:49] <wikibugs>	 10Operations, 10decommission, 10serviceops: decom krypton.eqiad.wmnet - https://phabricator.wikimedia.org/T231546 (10Dzahn)
[12:20:39] <wikibugs>	 10Operations, 10decommission, 10serviceops: decom krypton.eqiad.wmnet - https://phabricator.wikimedia.org/T231546 (10Dzahn)
[12:22:27] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (10Dzahn)
[12:22:35] <wikibugs>	 10Operations, 10serviceops: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (10Dzahn)
[12:23:32] <wikibugs>	 10Operations, 10serviceops: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (10Dzahn)
[12:24:42] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:25:10] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:26:14] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:26:44] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:27:30] <wikibugs>	 10Operations, 10serviceops: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (10Dzahn) The following services have been moved away from krypton to miscweb1001.  - http://racktables.wikimedia.org - http://iegreview.wikimedia.org - http://scholarships.wikimedia.or...
[12:28:22] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[12:30:06] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[12:31:56] <wikibugs>	 10Operations, 10serviceops: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (10Dzahn) TODO: DB config needs to use codfw proxy if puppet role applied on codfw node
[12:33:02] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[12:33:12] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[13:00:05] <jouncebot>	 zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - European version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1300).
[13:00:29] <zeljkof>	 thank you jouncebot but train looks blocked at the moment :/
[13:10:12] <wikibugs>	 (03CR) 10Ema: [C: 03+1] prometheus: Ship a custom metrics file for trafficserver_exporter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533178 (https://phabricator.wikimedia.org/T231533) (owner: 10Vgutierrez)
[13:10:38] <wikibugs>	 (03CR) 10Ema: [C: 03+1] prometheus: Add basic ATS network and ssl metrics [puppet] - 10https://gerrit.wikimedia.org/r/533193 (https://phabricator.wikimedia.org/T231533) (owner: 10Vgutierrez)
[13:15:32] <wikibugs>	 10Operations, 10WMF-Legal, 10serviceops: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (10Dzahn) One thing that you can already do is create https://transparency.wikimedia.org/historical/ since that is just inside the content repo that is...
[13:17:36] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10Discovery, and 2 others: Make oozie swift upload emit event to Kafka about swift object upload complete - https://phabricator.wikimedia.org/T227896 (10Nuria) 05Open→03Resolved
[13:17:41] <wikibugs>	 10Operations, 10Analytics, 10Discovery, 10Research-Backlog, 10Patch-For-Review: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Nuria)
[13:22:41] <wikibugs>	 10Operations, 10Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (10ema) a:03ema
[13:32:01] <wikibugs>	 (03PS2) 10Mathew.onipe: icinga: add old JVM GC check for elastic [puppet] - 10https://gerrit.wikimedia.org/r/533189 (https://phabricator.wikimedia.org/T231516)
[13:40:05] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug1001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[13:40:33] <wikibugs>	 (03CR) 10BBlack: ncredir: Perform HTTPS upgrade without crossing domain boundaries (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513) (owner: 10Vgutierrez)
[13:42:54] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: cergen fails signing CSR - https://phabricator.wikimedia.org/T231423 (10Ottomata) 05Open→03Resolved
[13:43:49] <wikibugs>	 (03PS1) 10Andrew Bogott: prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828)
[13:44:15] <wikibugs>	 (03PS1) 10Urbanecm: Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601)
[13:45:06] <Urbanecm>	 Daimona: could you review ^^, please?
[13:45:48] <Daimona>	 Yep
[13:47:40] <Urbanecm>	 thx
[13:48:15] <wikibugs>	 (03CR) 10Daimona Eaytoy: [C: 04-1] Fix "Assign all rights assigned to suppress group to oversight group" (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: 10Urbanecm)
[13:49:20] <wikibugs>	 (03PS2) 10Urbanecm: Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601)
[13:49:23] <Urbanecm>	 Daimona: like that? 
[13:52:36] <wikibugs>	 (03PS1) 10Vgutierrez: Point wikimania.org to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/533213 (https://phabricator.wikimedia.org/T133548)
[13:52:43] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] icinga: add old JVM GC check for elastic [puppet] - 10https://gerrit.wikimedia.org/r/533189 (https://phabricator.wikimedia.org/T231516) (owner: 10Mathew.onipe)
[13:52:53] <wikibugs>	 (03CR) 10Daimona Eaytoy: [C: 03+1] Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: 10Urbanecm)
[13:53:01] <Daimona>	 Yes. Haven't tested, but can help testing on mwdebug
[13:53:17] <Urbanecm>	 jouncebot: now
[13:53:17] <jouncebot>	 For the next 1 hour(s) and 6 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1300)
[13:53:37] <Urbanecm>	 hmm, we have the train now, but it seems to be blocked
[13:54:07] <Urbanecm>	 zeljkof: do you think I can try the commit above (https://gerrit.wikimedia.org/r/533211) at mwdebug (and deploy if it works) now?
[13:55:39] <zeljkof>	 Urbanecm: is it urgent? can it wait until next swat window? if it's urgent, go ahead, train is blocked, I don't plan to deploy anything soon
[13:55:57] <Daimona>	 Heh, train is very blocked this week...
[13:56:11] <Daimona>	 Ever heard of trenitalia?
[13:56:40] <zeljkof>	 italian railroad company?
[13:56:41] <Daimona>	 Anyway, I believe we can wait
[13:56:51] <wikibugs>	 (03PS2) 10Vgutierrez: redirects.dat: Get rid of non canonical domains rules [puppet] - 10https://gerrit.wikimedia.org/r/533141 (https://phabricator.wikimedia.org/T133548)
[13:57:00] <Urbanecm>	 Yeah, probably
[13:57:01] <Daimona>	 And subject of tons of memes because of the constant delays, yep
[13:57:14] <Daimona>	 That's like internet explorer but on rails
[13:57:20] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 93 probes of 452 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[13:58:02] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 112 probes of 452 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[13:58:05] <Urbanecm>	 thank you Daimona for the review
[13:58:18] <wikibugs>	 (03PS2) 10Andrew Bogott: prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828)
[13:58:26] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 120 probes of 452 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[13:59:08] <wikibugs>	 (03PS1) 10BBlack: Remove wikimedia.ee zonefile [dns] - 10https://gerrit.wikimedia.org/r/533216
[13:59:13] <Daimona>	 np ;)
[14:00:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828) (owner: 10Andrew Bogott)
[14:01:08] <wikibugs>	 (03PS3) 10Vgutierrez: redirects.dat: Enforce HTTPS for canonical domains [puppet] - 10https://gerrit.wikimedia.org/r/533142 (https://phabricator.wikimedia.org/T133548)
[14:01:16] <wikibugs>	 (03PS3) 10Andrew Bogott: prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828)
[14:01:18] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] redirects.dat: Get rid of non canonical domains rules [puppet] - 10https://gerrit.wikimedia.org/r/533141 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:01:51] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Remove wikimedia.ee zonefile [dns] - 10https://gerrit.wikimedia.org/r/533216 (owner: 10BBlack)
[14:02:09] <Daimona>	 Urbanecm: would you have time to run a couple of scripts?
[14:02:57] <wikibugs>	 (03PS1) 10Vgutierrez: Point wikimedia.community to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/533219 (https://phabricator.wikimedia.org/T133548)
[14:04:01] <Urbanecm>	 Daimona: sure, which ones?
[14:04:12] <Daimona>	 https://phabricator.wikimedia.org/T231542#5451131
[14:04:27] <Daimona>	 ty
[14:04:33] <wikibugs>	 (03PS1) 10Jhedden: labstore: check nfs v4 cluster status with rpcinfo [puppet] - 10https://gerrit.wikimedia.org/r/533220 (https://phabricator.wikimedia.org/T229448)
[14:04:55] <wikibugs>	 (03PS4) 10Andrew Bogott: prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828)
[14:06:35] <Urbanecm>	 Daimona: looking
[14:06:43] <Daimona>	 Thanks
[14:06:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828) (owner: 10Andrew Bogott)
[14:07:41] <wikibugs>	 (03PS2) 10Jhedden: labstore: check nfs v4 cluster status with rpcinfo [puppet] - 10https://gerrit.wikimedia.org/r/533220 (https://phabricator.wikimedia.org/T229448)
[14:12:25] <wikibugs>	 (03PS5) 10Andrew Bogott: prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828)
[14:13:34] <Urbanecm>	 Daimona: I'm not sure I understand what you want
[14:13:41] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] redirects.dat: Enforce HTTPS for canonical domains [puppet] - 10https://gerrit.wikimedia.org/r/533142 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:14:00] <Daimona>	 The last two code bits should be executed via shell.php or eval.php; I'm unsure which is better suited
[14:14:13] <Daimona>	 First $id = SqlBlobStore::makeAddressFromTextId( 850575 ); to get the id
[14:14:20] <Daimona>	 Then put it in MediaWikiServices::getInstance()->getBlobStore()->getBlob( $id );
[14:14:28] <Daimona>	 It should throw an exception which I'd like to see
[14:15:02] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] ncredir: Enable HSTS with max-age set to 1 week [puppet] - 10https://gerrit.wikimedia.org/r/533140 (https://phabricator.wikimedia.org/T231514) (owner: 10Vgutierrez)
[14:16:10] <Urbanecm>	 looking
[14:16:42] <ema>	 !log depool ats-be on cp1075 to investigate T231504 
[14:16:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:49] <stashbot>	 T231504: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504
[14:17:00] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
[14:17:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:33] <wikibugs>	 (03PS1) 10CDanis: trafficserver: fix grafana link [puppet] - 10https://gerrit.wikimedia.org/r/533229
[14:19:00] <Urbanecm>	 Daimona: it doesn't seem to throw any exception
[14:19:03] <Urbanecm>	 see https://phabricator.wikimedia.org/P9000
[14:19:04] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 24 probes of 452 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[14:19:55] <Daimona>	 Indeed, thanks, you and hashar did it at the same time :D That's even worse
[14:20:05] <hashar>	 oops sorry
[14:20:10] <Daimona>	 Maybe there's a bug in fetchText
[14:20:15] <Daimona>	 Ahah np
[14:20:29] <Daimona>	 I love the moment when you investigate a bug and find 3 more bugs
[14:21:37] <Urbanecm>	 Daimona: why should there be a bug in fetchText.php?
[14:21:40] <Urbanecm>	 https://phabricator.wikimedia.org/P9001
[14:22:07] <Daimona>	 Per T231542#5450927
[14:22:08] <stashbot>	 T231542: AFPData.php: Refusing to cast DUNDEFINED to something else - https://phabricator.wikimedia.org/T231542
[14:22:24] <Daimona>	 But actually... I've just realized that maybe the ID should be inserted after running the command, and not together...
[14:22:35] <Urbanecm>	 that's a bug in hashar's command :-)
[14:23:07] <Daimona>	 Huh, well, nevermind... Let me see what's wrong at this point
[14:23:12] <Urbanecm>	 Sure :-)
[14:25:22] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 22 probes of 452 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[14:25:23] <Daimona>	 Would you mind eval'ing unserialize($res), where $res is the result of fetchText, please?
[14:25:50] <Daimona>	 Copy 'n pasting gives an error
[14:25:59] <Urbanecm>	 not at all
[14:27:01] <wikibugs>	 (03CR) 10CDanis: prometheus: add some cloud-dev dns metrics to the codfw prometheus host (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828) (owner: 10Andrew Bogott)
[14:28:10] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Point wikimania.org to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/533213 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:28:14] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Point wikimedia.community to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/533219 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:28:22] <wikibugs>	 (03PS2) 10BBlack: Point wikimania.org to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/533213 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:28:28] <Urbanecm>	 Daimona: although I'm not sure how I'm supposed to assign the output of fetchText.php to a variable in the shell
[14:28:39] <Daimona>	 I was also wondering about that.
[14:28:58] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] labstore: check nfs v4 cluster status with rpcinfo [puppet] - 10https://gerrit.wikimedia.org/r/533220 (https://phabricator.wikimedia.org/T229448) (owner: 10Jhedden)
[14:29:00] <Daimona>	 So, we can try to emulate the code via shell.php, tell me when you're ready and I'll paste the commands
[14:29:05] <Urbanecm>	 Daimona: shouldn't we coordinate the fixing of T231542 in -dev or something?
[14:29:05] <stashbot>	 T231542: AFPData.php: Refusing to cast DUNDEFINED to something else - https://phabricator.wikimedia.org/T231542
[14:29:11] <Urbanecm>	 not sure how this is relevant to the -operations scope
[14:29:19] <Daimona>	 Yes, sure
[14:29:40] <Daimona>	 But a private chat is also fine, I'll post relevant updates on phab
[14:29:46] <wikibugs>	 (03PS6) 10Andrew Bogott: prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828)
[14:30:18] <wikibugs>	 (03PS2) 10BBlack: Point wikimedia.community to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/533219 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:32:21] <wikibugs>	 (03CR) 10Andrew Bogott: "pcc run: https://puppet-compiler.wmflabs.org/compiler1002/18113/" [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828) (owner: 10Andrew Bogott)
[14:34:23] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] "> Patch Set 6:" [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828) (owner: 10Andrew Bogott)
[14:34:58] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] prometheus: add some cloud-dev dns metrics to the codfw prometheus host [puppet] - 10https://gerrit.wikimedia.org/r/533210 (https://phabricator.wikimedia.org/T224828) (owner: 10Andrew Bogott)
[14:36:41] <wikibugs>	 (03PS3) 10BBlack: redirects.dat: Get rid of non canonical domains rules [puppet] - 10https://gerrit.wikimedia.org/r/533141 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:36:50] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 33 probes of 452 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[14:38:41] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] redirects.dat: Get rid of non canonical domains rules [puppet] - 10https://gerrit.wikimedia.org/r/533141 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:39:05] <wikibugs>	 (03PS4) 10BBlack: redirects.dat: Enforce HTTPS for canonical domains [puppet] - 10https://gerrit.wikimedia.org/r/533142 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:41:21] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] redirects.dat: Enforce HTTPS for canonical domains [puppet] - 10https://gerrit.wikimedia.org/r/533142 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez)
[14:42:42] <wikibugs>	 (03PS2) 10Ottomata: Release Spark 2.4.3 [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/532455 (https://phabricator.wikimedia.org/T222253)
[14:47:02] <wikibugs>	 (03CR) 10Thcipriani: "> outdated or still holding?" [puppet] - 10https://gerrit.wikimedia.org/r/528433 (owner: 10Paladox)
[14:48:10] <wikibugs>	 10Operations, 10ops-codfw, 10decommission: Decommission db2034 - https://phabricator.wikimedia.org/T223216 (10Papaul) ` papaul@asw-a-codfw# run show interfaces ge-5/0/32 descriptions     Interface       Admin Link Description ge-5/0/32       down  down DISABLED
[14:48:51] <wikibugs>	 (03PS1) 10BBlack: redirects.dat: secure external redirects [puppet] - 10https://gerrit.wikimedia.org/r/533236
[14:51:52] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] redirects.dat: secure external redirects [puppet] - 10https://gerrit.wikimedia.org/r/533236 (owner: 10BBlack)
[14:57:47] <wikibugs>	 (03PS2) 10Vgutierrez: ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513)
[15:04:09] <wikibugs>	 (03CR) 10Jhedden: [C: 03+2] labstore: check nfs v4 cluster status with rpcinfo [puppet] - 10https://gerrit.wikimedia.org/r/533220 (https://phabricator.wikimedia.org/T229448) (owner: 10Jhedden)
[15:04:17] <wikibugs>	 (03PS3) 10Jhedden: labstore: check nfs v4 cluster status with rpcinfo [puppet] - 10https://gerrit.wikimedia.org/r/533220 (https://phabricator.wikimedia.org/T229448)
[15:11:40] <wikibugs>	 (03CR) 10BBlack: ncredir: Perform HTTPS upgrade without crossing domain boundaries (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513) (owner: 10Vgutierrez)
[15:11:51] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513) (owner: 10Vgutierrez)
[15:14:00] <wikibugs>	 (03PS3) 10Vgutierrez: ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513)
[15:15:16] <wikibugs>	 (03CR) 10Vgutierrez: ncredir: Perform HTTPS upgrade without crossing domain boundaries (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513) (owner: 10Vgutierrez)
[15:15:23] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513) (owner: 10Vgutierrez)
[15:15:37] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513) (owner: 10Vgutierrez)
[15:15:47] <wikibugs>	 (03PS4) 10Vgutierrez: ncredir: Perform HTTPS upgrade without crossing domain boundaries [puppet] - 10https://gerrit.wikimedia.org/r/533133 (https://phabricator.wikimedia.org/T231513)
[15:18:50] <icinga-wm>	 RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[15:19:01] <wikibugs>	 (03PS1) 10Ema: ATS: perform MW and RB mangling after cache lookup [puppet] - 10https://gerrit.wikimedia.org/r/533242 (https://phabricator.wikimedia.org/T231504)
[15:21:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ATS: perform MW and RB mangling after cache lookup [puppet] - 10https://gerrit.wikimedia.org/r/533242 (https://phabricator.wikimedia.org/T231504) (owner: 10Ema)
[15:24:19] <wikibugs>	 (03PS2) 10Vgutierrez: ncredir: Enable HSTS with max-age set to 1 week [puppet] - 10https://gerrit.wikimedia.org/r/533140 (https://phabricator.wikimedia.org/T231514)
[15:26:55] <wikibugs>	 (03PS2) 10Ema: ATS: perform MW and RB mangling after cache lookup [puppet] - 10https://gerrit.wikimedia.org/r/533242 (https://phabricator.wikimedia.org/T231504)
[15:27:25] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ncredir: Enable HSTS with max-age set to 1 week [puppet] - 10https://gerrit.wikimedia.org/r/533140 (https://phabricator.wikimedia.org/T231514) (owner: 10Vgutierrez)
[15:36:19] <wikibugs>	 (03PS1) 10Vgutierrez: ncredir: Get rid of wikimedia.ee [puppet] - 10https://gerrit.wikimedia.org/r/533245
[15:36:23] <wikibugs>	 (03PS2) 10Ayounsi: Add bash script between FNM and notify script [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810)
[15:37:00] <wikibugs>	 (03CR) 10Ayounsi: "Answered, thanks!" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi)
[15:37:52] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgmt DNS for db2034 [dns] - 10https://gerrit.wikimedia.org/r/533247
[15:38:54] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ncredir: Get rid of wikimedia.ee [puppet] - 10https://gerrit.wikimedia.org/r/533245 (owner: 10Vgutierrez)
[15:38:58] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[15:39:38] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18114/netflow1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi)
[15:39:48] <wikibugs>	 (03PS3) 10Ayounsi: Add bash script between FNM and notify script [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810)
[15:40:44] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for db2034 [dns] - 10https://gerrit.wikimedia.org/r/533247 (owner: 10Papaul)
[16:00:04] <jouncebot>	 godog and _joe_: (Dis)respected human, time to deploy Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1600). Please do the needful.
[16:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:05:20] <wikibugs>	 (03PS3) 10Ema: ATS: perform MW and RB mangling after cache lookup [puppet] - 10https://gerrit.wikimedia.org/r/533242 (https://phabricator.wikimedia.org/T231504)
[16:07:04] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgmt DNS for db2045 [dns] - 10https://gerrit.wikimedia.org/r/533255
[16:07:52] <wikibugs>	 (03CR) 10Ema: [C: 03+2] ATS: perform MW and RB mangling after cache lookup [puppet] - 10https://gerrit.wikimedia.org/r/533242 (https://phabricator.wikimedia.org/T231504) (owner: 10Ema)
[16:08:11] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for db2045 [dns] - 10https://gerrit.wikimedia.org/r/533255 (owner: 10Papaul)
[16:13:10] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:23:22] <wikibugs>	 (03PS7) 10CRusnov: Add Netbox instance addresses [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291)
[16:25:41] <wikibugs>	 (03CR) 10CRusnov: [C: 03+2] Add Netbox instance addresses [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[16:30:34] <wikibugs>	 (03PS2) 10Dzahn: planet: include envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/533197 (https://phabricator.wikimedia.org/T210411)
[16:33:07] <wikibugs>	 (03CR) 10Jforrester: "Doing T112147 would be nicer, of course. :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: 10Urbanecm)
[16:33:10] <wikibugs>	 (03PS2) 10Cwhite: add the option of passing a custom metrics context manager to EndpointRequest [software/service-checker] - 10https://gerrit.wikimedia.org/r/532807
[16:34:18] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] planet: include envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/533197 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn)
[16:36:40] <wikibugs>	 (03PS1) 10Dzahn: Revert "planet: include envoy for TLS termination" [puppet] - 10https://gerrit.wikimedia.org/r/533263
[16:38:40] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "planet: include envoy for TLS termination" [puppet] - 10https://gerrit.wikimedia.org/r/533263 (owner: 10Dzahn)
[16:49:09] <logmsgbot>	 !log crusnov@cumin1001 START - Cookbook sre.ganeti.makevm
[16:49:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:20] <logmsgbot>	 !log crusnov@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
[16:49:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:30] <logmsgbot>	 !log crusnov@cumin1001 START - Cookbook sre.ganeti.makevm
[16:49:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:28] <wikibugs>	 (03PS1) 10Bstorm: pdns: set the recursor threads in line with best practices [puppet] - 10https://gerrit.wikimedia.org/r/533268 (https://phabricator.wikimedia.org/T224828)
[16:54:02] <wikibugs>	 (03CR) 10Urbanecm: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: 10Urbanecm)
[16:58:05] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+2] Update go-import and wikimedia plugins [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/525869 (owner: 10Paladox)
[16:59:25] <logmsgbot>	 !log crusnov@cumin1001 START - Cookbook sre.ganeti.makevm
[16:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: It is that lovely time of the day again! You are hereby commanded to deploy Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1700).
[17:00:19] <subbu>	 no parsoid deploy today
[17:03:10] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 59.76 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:03:34] <wikibugs>	 (03CR) 10Bstorm: [C: 04-1] "We don't want to merge this until we've done some tests and tried a bit more to reproduce the problem just in case this is related." [puppet] - 10https://gerrit.wikimedia.org/r/533268 (https://phabricator.wikimedia.org/T224828) (owner: 10Bstorm)
[17:03:48] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] pdns: set the recursor threads in line with best practices [puppet] - 10https://gerrit.wikimedia.org/r/533268 (https://phabricator.wikimedia.org/T224828) (owner: 10Bstorm)
[17:04:44] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 78.38 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:05:12] <wikibugs>	 (03Merged) 10jenkins-bot: Update go-import and wikimedia plugins [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/525869 (owner: 10Paladox)
[17:06:03] <davidwbarratt>	 Will there be a train next week? https://wikitech.wikimedia.org/wiki/Deployments wasn't sure if there would be with the US Holiday or not
[17:07:54] <liw>	 davidwbarratt, I believe there will be, handled by Europeans
[17:08:30] <davidwbarratt>	 ah, ok, great!
[17:09:00] <logmsgbot>	 !log crusnov@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[17:09:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:40] <logmsgbot>	 !log crusnov@cumin1001 START - Cookbook sre.ganeti.makevm
[17:09:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:12] <logmsgbot>	 !log crusnov@cumin1001 START - Cookbook sre.ganeti.makevm
[17:10:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:20] <dcausse>	 !log restarted elasticsearch on cloudelastic1004 (T231517)
[17:15:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:26] <stashbot>	 T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[17:16:07] <wikibugs>	 (03CR) 10Daimona Eaytoy: [C: 03+1] "> > Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: 10Urbanecm)
[17:19:22] <logmsgbot>	 !log crusnov@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[17:19:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:01] <wikibugs>	 (03CR) 10Volans: "Sorry, I couldn't do a full pass now, I did a very quick one. I also had some pre-existing un-pushed comments, ignore them if they don't a" (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[17:22:08] <logmsgbot>	 !log crusnov@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[17:22:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:27] <James_F>	 davidwbarratt: One of the reasons we have the train on Tuesday–Thursday is to avoid having to cancel it on weeks when there's a Monday holiday.
[17:48:44] * Krinkle staging on mwdebug1002 soon with https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/533267/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/533218/
[18:00:05] <jouncebot>	 MaxSem, RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Morning SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T1800).
[18:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[18:00:21] * Krinkle takes the window for now, although still waiting for CI
[18:00:44] <Urbanecm>	 Krinkle: ping me once you're done, please
[18:00:57] <Krinkle>	 Urbanecm: config deploy?
[18:01:25] <Urbanecm>	 Krinkle: yup
[18:01:29] <Urbanecm>	 (but I can wait)
[18:01:34] <Urbanecm>	 as long as it isn't wait three hours
[18:02:36] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:04:13] <Krinkle>	 Urbanecm: go ahead then, will be quicker than my CI MW patches
[18:04:26] <Krinkle>	 just be careful not to git pull in any php.* dir just in case.
[18:04:39] * Urbanecm takes the window then
[18:04:40] <Urbanecm>	 thanks Krinkle 
[18:04:47] <wikibugs>	 (PS1) Herron: prometheus: aggregate systemd failed metrics [puppet] - https://gerrit.wikimedia.org/r/533282 (https://phabricator.wikimedia.org/T230570)
[18:05:59] <wikibugs>	 (PS3) Urbanecm: Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601)
[18:06:15] <wikibugs>	 (CR) Urbanecm: [C: +2] Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:07:16] <ebernhardson>	 !log increase index.refresh_interval to 5m for all indices on cloudelastic-chi
[18:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:03] * Urbanecm waits on CI as well
[18:10:43] <wikibugs>	 (Merged) jenkins-bot: Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:10:54] <wikibugs>	 (CR) jenkins-bot: Fix "Assign all rights assigned to suppress group to oversight group" [mediawiki-config] - https://gerrit.wikimedia.org/r/533211 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:11:10] <wikibugs>	 (PS1) Andrew Bogott: Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286
[18:11:27] <wikibugs>	 (PS1) Andrew Bogott: designate: update pool config for a single server in codfw1-dev [puppet] - https://gerrit.wikimedia.org/r/533287
[18:12:19] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286 (owner: Andrew Bogott)
[18:17:07] <wikibugs>	 (PS2) Andrew Bogott: Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286
[18:17:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286 (owner: Andrew Bogott)
[18:18:33] <wikibugs>	 (PS1) Urbanecm: Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601)
[18:18:56] <wikibugs>	 (CR) Urbanecm: [C: +2] Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:20:39] <wikibugs>	 (CR) Jforrester: [C: +1] Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:20:40] <Krinkle>	 Urbanecm: my patches have landed, fyi - let me know when done :)
[18:21:23] <wikibugs>	 (PS2) Urbanecm: Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601)
[18:21:30] <wikibugs>	 (CR) Urbanecm: [C: +2] Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:21:43] <Urbanecm>	 Krinkle: syncing the patch(es)
[18:21:43] <wikibugs>	 (PS3) Andrew Bogott: Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286
[18:22:11] <Urbanecm>	 but the second one I just +2'ed is on deploy1001, but not in git yet
[18:22:19] <Urbanecm>	 (was making sure it works before uploading)
[18:22:48] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: Fix "Assign all rights assigned to suppress group to oversight group" (T230601) (duration: 00m 54s)
[18:23:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:06] <stashbot>	 T230601: Groups 'oversight'/'suppress' should be reconciled - https://phabricator.wikimedia.org/T230601
[18:23:23] <ebernhardson>	 !log restart elasticsearch on cloudelastic1001 (T231517)
[18:23:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:29] <stashbot>	 T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[18:23:44] <Urbanecm>	 I'm done with actual deployment, but the commit has a different commit message in git and deploy1001, so I'd need to do some git-fu to have it really done
[18:23:54] <Urbanecm>	 Krinkle: if you want, you can deploy, but bear what I wrote in mind
[18:23:58] <Urbanecm>	 (not safe to deploy config rn)
[18:24:01] <wikibugs>	 (CR) DannyS712: [C: -1] "The latter does work; its appears to be working currently" [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:24:52] <wikibugs>	 (CR) Urbanecm: [C: +2] "> Patch Set 2: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:25:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286 (owner: Andrew Bogott)
[18:25:44] <wikibugs>	 (CR) DannyS712: [C: +1] "> > Patch Set 2: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:25:46] <wikibugs>	 (Merged) jenkins-bot: Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:26:01] <wikibugs>	 (CR) jenkins-bot: Follow up for e70da21 [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:26:26] <Urbanecm>	 Krinkle: done for real
[18:27:06] <Krinkle>	 ok :)
[18:27:37] * Krinkle staging on mwdebug1002
[18:28:42] <wikibugs>	 (PS4) Andrew Bogott: Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286
[18:30:19] <wikibugs>	 (CR) Daimona Eaytoy: "Hah, because use() passes in the current value of $wg..., not the one it'll have when running the update. I'm sorry, my fault." [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:30:52] <wikibugs>	 (CR) Urbanecm: "> Patch Set 2:" [mediawiki-config] - https://gerrit.wikimedia.org/r/533288 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:31:48] <Daimona>	 Krinkle: is the AF change on mwdebug?
[18:32:09] <Krinkle>	 Daimona: pulling as we speak
[18:32:18] <Krinkle>	 live on mwdebug1002 now
[18:32:24] <Daimona>	 OK, I'm gonna check as well
[18:32:57] <Daimona>	 Works!
[18:33:20] <Daimona>	 So the suspect about private + unserialize was right
[18:33:39] <Krinkle>	 [XWgaYgpAAC4AAGLnkcQAAAAS] /wiki/Especial:RegistroAbusos/4122   Error from line 639 of /srv/mediawiki/php-1.34.0-wmf.20/includes/Revision.php: Call to a member function getId() on null
[18:33:45] <Daimona>	 That's unrelated
[18:33:47] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/CentralAuth/modules/ext.centralauth.ForeignApi.js: e7cd3cd313a4642 (duration: 00m 55s)
[18:33:47] <James_F>	 Eurgh.
[18:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:54] <Daimona>	 And PHP7-only
[18:33:54] <Krinkle>	 Caught BadMethodCallException - T187153
[18:33:55] <stashbot>	 T187153: Special:Abuselog throws when viewing details or examining (BadMethodCallException: Call get getId() on null) - https://phabricator.wikimedia.org/T187153
[18:34:05] <Daimona>	 T187153
[18:34:17] <Daimona>	 The fun never ends when you stuff serialized classes in the DB ;)
[18:34:42] <Krinkle>	 Alright, Scap, let's scatter the code around production.
[18:34:50] <James_F>	 Yeah, getting rid of that is epic, but will be so worth it.
[18:36:17] <Krinkle>	 Beauty is in the AbuseFilter of the VariableHolder – .php
[18:36:18] <Daimona>	 And it's even funnier when HHVM throws a BadMethodCallException, and PHP7 an Error
[18:36:36] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/AbuseFilter/includes/AbuseFilterVariableHolder.php: T231542 f37f0bd50cf (duration: 00m 53s)
[18:36:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:54] <stashbot>	 T231542: AFPData.php: Refusing to cast DUNDEFINED to something else - https://phabricator.wikimedia.org/T231542
[18:39:50] <aharoni>	 hallo
[18:40:02] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:40:08] <aharoni>	 Krenair - thanks for all the recent translatewiki patches
[18:40:15] <aharoni>	 I'm not sure about https://gerrit.wikimedia.org/r/#/c/translatewiki/+/532120/ , though
[18:40:38] <aharoni>	 I might be wrong, but I suspect that Cargo is not actually used on Wikimedia servers.
[18:40:51] <James_F>	 It isn't.
[18:41:08] <James_F>	 But the API messages in any repo aren't great for most translators.
[18:41:32] <aharoni>	 It somehow made it to the checklist at https://phabricator.wikimedia.org/T189982 . It's good to split the messages, but it shouldn't be configured under "Used by Wikimedia".
[18:41:37] <aharoni>	 I'll submit a path.
[18:41:40] <aharoni>	 patch
[18:41:48] <ebernhardson>	 !log set index.merge.scheduler.max_thread_count to null to accept default values on cloudelastic-chi (T231517)
[18:41:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:41:55] <stashbot>	 T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[18:42:10] <James_F>	 Good spot, aharoni.
[18:48:28] <James_F>	 aharoni: BTW, https://gerrit.wikimedia.org/r/c/translatewiki/+/475614 needs a rebase, then I'll merge.
[18:50:15] <ebernhardson>	 !log restart elasticsearch on cloudelastic1002 (T231517)
[18:50:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:21] <stashbot>	 T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[18:50:53] <aharoni>	 James_F - thanks, done
[18:56:52] <wikibugs>	 (PS1) CRusnov: install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301
[18:57:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301 (owner: CRusnov)
[18:59:00] <ebernhardson>	 !log restart elasticsearch on cloudelastic1003 (T231517)
[18:59:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:09] <stashbot>	 T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[18:59:58] <wikibugs>	 (PS2) CRusnov: install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301
[19:00:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301 (owner: CRusnov)
[19:01:20] <wikibugs>	 (PS1) Jhedden: labstore: Open NFS between secondary servers [puppet] - https://gerrit.wikimedia.org/r/533302 (https://phabricator.wikimedia.org/T229448)
[19:04:06] <wikibugs>	 (CR) Bstorm: [C: +1] labstore: Open NFS between secondary servers [puppet] - https://gerrit.wikimedia.org/r/533302 (https://phabricator.wikimedia.org/T229448) (owner: Jhedden)
[19:04:55] <wikibugs>	 (PS2) Jhedden: labstore: Open NFS between secondary servers [puppet] - https://gerrit.wikimedia.org/r/533302 (https://phabricator.wikimedia.org/T229448)
[19:06:56] <wikibugs>	 (CR) Jhedden: [C: +2] labstore: Open NFS between secondary servers [puppet] - https://gerrit.wikimedia.org/r/533302 (https://phabricator.wikimedia.org/T229448) (owner: Jhedden)
[19:17:08] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Update dns entries for codfw1dev, the cloud test region [dns] - https://gerrit.wikimedia.org/r/533286 (owner: Andrew Bogott)
[19:19:52] <wikibugs>	 (PS2) Andrew Bogott: designate: update pool config for a single server in codfw1-dev [puppet] - https://gerrit.wikimedia.org/r/533287
[19:20:43] <wikibugs>	 (CR) Andrew Bogott: [C: +2] designate: update pool config for a single server in codfw1-dev [puppet] - https://gerrit.wikimedia.org/r/533287 (owner: Andrew Bogott)
[19:31:47] <wikibugs>	 (Abandoned) Jhedden: toolschecker: match nginx and wsgi timeouts [puppet] - https://gerrit.wikimedia.org/r/528892 (https://phabricator.wikimedia.org/T221301) (owner: Jhedden)
[19:32:03] <wikibugs>	 (Abandoned) Jhedden: toolschecker: check status for webservice tasks [puppet] - https://gerrit.wikimedia.org/r/528897 (https://phabricator.wikimedia.org/T221301) (owner: Jhedden)
[19:39:02] <wikibugs>	 (PS1) Bstorm: powerdns: correct some database variables in my.cnf [puppet] - https://gerrit.wikimedia.org/r/533308 (https://phabricator.wikimedia.org/T224828)
[19:44:07] <wikibugs>	 (PS1) Jhedden: toolschecker: remove webservice grid and k8s check [puppet] - https://gerrit.wikimedia.org/r/533310 (https://phabricator.wikimedia.org/T221301)
[19:45:24] <wikibugs>	 (CR) Jhedden: [C: +1] powerdns: correct some database variables in my.cnf [puppet] - https://gerrit.wikimedia.org/r/533308 (https://phabricator.wikimedia.org/T224828) (owner: Bstorm)
[19:48:53] <wikibugs>	 (CR) Jhedden: [C: +2] toolschecker: remove webservice grid and k8s check [puppet] - https://gerrit.wikimedia.org/r/533310 (https://phabricator.wikimedia.org/T221301) (owner: Jhedden)
[19:56:50] <ebernhardson>	 !log cloudelastic-chi run frwiki_content/_forcemerge?only_expunge_deletes=true to try and fix 5gb segments with 96% deleted documents
[19:57:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:27] <bblack>	 /win 21
[20:03:07] <wikibugs>	 (PS1) Andrew Bogott: profile::openstack::codfw1dev::db: allow designate host to access db server [puppet] - https://gerrit.wikimedia.org/r/533314
[20:07:21] <wikibugs>	 (PS3) CRusnov: install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301
[20:08:15] <wikibugs>	 (CR) jerkins-bot: [V: -1] install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301 (owner: CRusnov)
[20:09:31] <wikibugs>	 (CR) Andrew Bogott: [C: +2] profile::openstack::codfw1dev::db: allow designate host to access db server [puppet] - https://gerrit.wikimedia.org/r/533314 (owner: Andrew Bogott)
[20:11:29] <wikibugs>	 (PS1) Jhedden: toolschecker: remove showmount check [puppet] - https://gerrit.wikimedia.org/r/533318 (https://phabricator.wikimedia.org/T229448)
[20:11:40] <wikibugs>	 (PS4) CRusnov: install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301
[20:14:13] <foks>	 !log removing two files for legal compliance
[20:14:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:18] <wikibugs>	 (CR) Jhedden: [C: +2] toolschecker: remove showmount check [puppet] - https://gerrit.wikimedia.org/r/533318 (https://phabricator.wikimedia.org/T229448) (owner: Jhedden)
[20:15:35] <wikibugs>	 (PS2) Jhedden: toolschecker: remove showmount check [puppet] - https://gerrit.wikimedia.org/r/533318 (https://phabricator.wikimedia.org/T229448)
[20:16:17] <wikibugs>	 (PS7) BryanDavis: docker: add support for "testing" tags [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/528178 (https://phabricator.wikimedia.org/T224558) (owner: Bstorm)
[20:16:19] <wikibugs>	 (PS1) BryanDavis: Add black to tox [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533320
[20:16:21] <wikibugs>	 (PS1) BryanDavis: flake8: remove ignored tests [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533321
[20:16:23] <wikibugs>	 (PS1) BryanDavis: Ignore a local .python-version file [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533322
[20:16:25] <wikibugs>	 (PS1) BryanDavis: Add --single CLI argument [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533323
[20:16:27] <wikibugs>	 (PS1) BryanDavis: Downgrade libbz2 for jessie images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533324
[20:30:47] <wikibugs>	 (CR) CRusnov: [C: +2] librenms: Exclude problematic InventoryItem type as requested [software/netbox-reports] - https://gerrit.wikimedia.org/r/533128 (https://phabricator.wikimedia.org/T231502) (owner: CRusnov)
[20:31:32] <wikibugs>	 (CR) BryanDavis: [C: +2] docker: add support for "testing" tags [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/528178 (https://phabricator.wikimedia.org/T224558) (owner: Bstorm)
[20:32:18] <wikibugs>	 (Merged) jenkins-bot: docker: add support for "testing" tags [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/528178 (https://phabricator.wikimedia.org/T224558) (owner: Bstorm)
[20:35:25] <wikibugs>	 (CR) BryanDavis: [C: +2] Add black to tox [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533320 (owner: BryanDavis)
[20:35:46] <wikibugs>	 (CR) BryanDavis: [C: +2] flake8: remove ignored tests [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533321 (owner: BryanDavis)
[20:35:53] <wikibugs>	 (CR) BryanDavis: [C: +2] Ignore a local .python-version file [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533322 (owner: BryanDavis)
[20:38:01] <wikibugs>	 (CR) Ayounsi: [C: +1] "Assuming the MACs match what's given by Ganeti" [puppet] - https://gerrit.wikimedia.org/r/533301 (owner: CRusnov)
[20:39:12] <wikibugs>	 (Merged) jenkins-bot: Add black to tox [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533320 (owner: BryanDavis)
[20:39:14] <wikibugs>	 (CR) CRusnov: [C: +2] install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301 (owner: CRusnov)
[20:39:17] <wikibugs>	 (Merged) jenkins-bot: flake8: remove ignored tests [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533321 (owner: BryanDavis)
[20:39:34] <wikibugs>	 (Merged) jenkins-bot: Ignore a local .python-version file [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533322 (owner: BryanDavis)
[20:39:45] <wikibugs>	 (CR) BryanDavis: [C: +2] Add --single CLI argument [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533323 (owner: BryanDavis)
[20:40:09] <wikibugs>	 (CR) BryanDavis: [C: +2] Downgrade libbz2 for jessie images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533324 (owner: BryanDavis)
[20:40:35] <wikibugs>	 (Merged) jenkins-bot: Add --single CLI argument [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533323 (owner: BryanDavis)
[20:40:58] <wikibugs>	 (Merged) jenkins-bot: Downgrade libbz2 for jessie images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/533324 (owner: BryanDavis)
[20:44:34] <wikibugs>	 (PS5) CRusnov: install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301
[20:45:25] <wikibugs>	 (CR) CRusnov: [V: +2 C: +2] install_server: at netbox server types, and dhcp config [puppet] - https://gerrit.wikimedia.org/r/533301 (owner: CRusnov)
[21:11:50] <icinga-wm>	 PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[21:14:09] <wikibugs>	 (PS1) Paladox: Merge tag 'v2.15.16' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533332
[21:14:44] <wikibugs>	 (PS1) Andrew Bogott: codfw1dev: standardize on a single pdns server, codfw1dev-ns0 [puppet] - https://gerrit.wikimedia.org/r/533333
[21:15:55] <wikibugs>	 (CR) Andrew Bogott: [C: +2] codfw1dev: standardize on a single pdns server, codfw1dev-ns0 [puppet] - https://gerrit.wikimedia.org/r/533333 (owner: Andrew Bogott)
[21:17:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] Merge tag 'v2.15.16' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533332 (owner: Paladox)
[21:32:33] <wikibugs>	 (PS1) Paladox: Support newer bazel versions [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533336
[21:32:44] <wikibugs>	 (PS2) Paladox: Support newer bazel versions [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533336
[21:32:55] <wikibugs>	 (PS3) Paladox: Support newer bazel versions [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533336
[21:36:27] <wikibugs>	 (PS2) Bstorm: toolforge: add CORS header to docker-registry [puppet] - https://gerrit.wikimedia.org/r/528617 (owner: BryanDavis)
[21:36:50] <wikibugs>	 (PS4) Paladox: Support newer bazel versions [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533336
[21:36:53] <wikibugs>	 (PS2) Paladox: Merge tag 'v2.15.16' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533332
[21:45:11] <wikibugs>	 (Abandoned) Paladox: Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/525865 (owner: Paladox)
[21:50:09] <icinga-wm>	 PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[22:00:10] <wikibugs>	 (CR) Thcipriani: [C: +2] "Nice work!" [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533336 (owner: Paladox)
[22:00:43] <wikibugs>	 (PS1) Krinkle: Update interwiki-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/533345
[22:00:54] <wikibugs>	 (PS2) Krinkle: Update interwiki-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/533345 (https://phabricator.wikimedia.org/T187716)
[22:03:59] <wikibugs>	 (CR) Krinkle: [C: +2] Update interwiki-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/533345 (https://phabricator.wikimedia.org/T187716) (owner: Krinkle)
[22:08:27] <wikibugs>	 (Merged) jenkins-bot: Support newer bazel versions [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/533336 (owner: Paladox)
[22:08:58] <wikibugs>	 (Merged) jenkins-bot: Update interwiki-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/533345 (https://phabricator.wikimedia.org/T187716) (owner: Krinkle)
[22:09:19] <wikibugs>	 (CR) jenkins-bot: Update interwiki-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/533345 (https://phabricator.wikimedia.org/T187716) (owner: Krinkle)
[22:11:27] <icinga-wm>	 RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[22:20:16] <wikibugs>	 (PS1) Paladox: Support newer bazel versions [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/533349
[22:20:50] <wikibugs>	 (PS1) Paladox: Merge tag 'v2.16.11' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/533350
[22:26:53] <wikibugs>	 (PS2) Bstorm: pdns: set the recursor threads in line with best practices [puppet] - https://gerrit.wikimedia.org/r/533268 (https://phabricator.wikimedia.org/T224828)
[22:29:05] <wikibugs>	 (CR) Bstorm: [C: +2] pdns: set the recursor threads in line with best practices [puppet] - https://gerrit.wikimedia.org/r/533268 (https://phabricator.wikimedia.org/T224828) (owner: Bstorm)
[22:31:18] <wikibugs>	 (CR) Paladox: [C: +2] "Self merging as it's been merged into 2.15 and it passes the build." [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/533349 (owner: Paladox)
[22:37:13] <wikibugs>	 (PS44) CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[22:39:42] <wikibugs>	 (Merged) jenkins-bot: Support newer bazel versions [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/533349 (owner: Paladox)
[22:49:23] <wikibugs>	 (PS2) Krinkle: CommonSettings: Clean up wmf-config caching code [no-op] [mediawiki-config] - https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830)
[22:49:40] <wikibugs>	 (PS45) CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[22:51:16] <wikibugs>	 (CR) Krinkle: [C: +2] CommonSettings: Clean up wmf-config caching code [no-op] [mediawiki-config] - https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[22:54:22] <wikibugs>	 (CR) Jforrester: [C: +1] CommonSettings: Clean up wmf-config caching code [no-op] [mediawiki-config] - https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[22:57:33] <wikibugs>	 (Merged) jenkins-bot: CommonSettings: Clean up wmf-config caching code [no-op] [mediawiki-config] - https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[22:58:44] <wikibugs>	 (CR) jenkins-bot: CommonSettings: Clean up wmf-config caching code [no-op] [mediawiki-config] - https://gerrit.wikimedia.org/r/528446 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[22:59:13] * Krinkle staging on mwdebug1002
[23:00:04] <jouncebot>	 MaxSem, RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190829T2300). Please do the needful.
[23:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[23:04:07] <wikibugs>	 (PS46) CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[23:06:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: CRusnov)
[23:07:45] <Krinkle>	 filing an unrelated bug report first, then will proceed with the patch deploy
[23:09:03] <wikibugs>	 (PS47) CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[23:14:26] <wikibugs>	 (PS4) Krinkle: CommonSettings: Store mtime inside wmf-config cache file [mediawiki-config] - https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830)
[23:15:04] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/CommonSettings.php: 4cdfebe (duration: 00m 54s)
[23:15:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:45] <wikibugs>	 (CR) Jforrester: "> Patch Set 3: Code-Review+1" [mediawiki-config] - https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[23:21:19] <icinga-wm>	 PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 26499 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1018&var-datasource=eqiad+prometheus/ops
[23:25:24] <wikibugs>	 (PS48) CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[23:31:32] <wikibugs>	 (CR) BryanDavis: "AKosiaris: can you give this a sanity check from your perspective on the prod network's use of ::docker::registry? I tried to separate thi" [puppet] - https://gerrit.wikimedia.org/r/528617 (owner: BryanDavis)
[23:33:53] <icinga-wm>	 RECOVERY - Disk space on elastic1018 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1018&var-datasource=eqiad+prometheus/ops
[23:43:14] <wikibugs>	 (PS4) Krinkle: Remove $wgSiteStatsAsyncFactor setting which had the same effect as the default (disabled) [mediawiki-config] - https://gerrit.wikimedia.org/r/521004 (owner: Aaron Schulz)