[01:19:25] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:21:53] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:16:21] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:21:15] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:24:06] (PS1) Andrew Bogott: cloud-vps: add first draft of policy tests [puppet] - https://gerrit.wikimedia.org/r/678461 (https://phabricator.wikimedia.org/T279845)
[02:25:09] (CR) Andrew Bogott: [C: +2] cloud-vps: add first draft of policy tests [puppet] - https://gerrit.wikimedia.org/r/678461 (https://phabricator.wikimedia.org/T279845) (owner: Andrew Bogott)
[02:28:07] (PS1) Andrew Bogott: Fix path to wmcs-policy-tests.py [puppet] - https://gerrit.wikimedia.org/r/678462 (https://phabricator.wikimedia.org/T279845)
[02:30:12] (CR) Andrew Bogott: [C: +2] Fix path to wmcs-policy-tests.py [puppet] - https://gerrit.wikimedia.org/r/678462 (https://phabricator.wikimedia.org/T279845) (owner: Andrew Bogott)
[02:34:29] (PS1) Andrew Bogott: wmcs-policy-tests.py: apparently we need the .py extension [puppet] - https://gerrit.wikimedia.org/r/678463 (https://phabricator.wikimedia.org/T279845)
[02:37:12] (CR) Andrew Bogott: [C: +2] wmcs-policy-tests.py: apparently we need the .py extension [puppet] - https://gerrit.wikimedia.org/r/678463 (https://phabricator.wikimedia.org/T279845) (owner: Andrew Bogott)
[03:17:27] PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:24:47] RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.061 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[05:17:05] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:19:33] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:53:47] PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[05:58:37] RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[06:12:37] PROBLEM - Check systemd state on cumin2001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:31:03] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=pdu_sentry4 site=eqsin https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:33:31] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:50:07] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[06:56:57] PROBLEM - PHP7 jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[06:59:15] RECOVERY - PHP7 jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[07:04:17] PROBLEM - PHP7 jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:07:09] RECOVERY - PHP7 rendering on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 332 bytes in 9.839 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:08:59] RECOVERY - PHP7 jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 332 bytes in 1.593 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[07:15:25] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:17:41] RECOVERY - PHP7 rendering on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:19:49] PROBLEM - PHP7 jobrunner on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:22:05] RECOVERY - PHP7 jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[07:29:31] PROBLEM - PHP7 jobrunner on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:31:47] RECOVERY - PHP7 jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 332 bytes in 1.475 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[07:35:55] PROBLEM - PHP7 jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:36:49] PROBLEM - PHP7 jobrunner on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:39:33] PROBLEM - PHP7 rendering on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:40:33] RECOVERY - PHP7 jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 331 bytes in 0.105 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[07:41:41] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:43:54] on some jobrunners the CPU usage is very high
[07:43:56] https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-site=eqiad&var-cluster=jobrunner&var-instance=All&var-datasource=thanos
[07:44:32] seems ffmpeg/transcode related again
[07:45:35] PROBLEM - PHP7 jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:46:32] effie, _joe_, akosiaris - around?
[07:46:39] RECOVERY - PHP7 rendering on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 332 bytes in 2.025 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:47:54] https://commons.wikimedia.org/wiki/Special:Contributions/Askeuhd seems to be uploading many fairly large iss images, but those are images and not videos
[07:48:36] elukey: I am around in 2', what do you see?
[07:48:47] RECOVERY - PHP7 jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 332 bytes in 3.253 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[07:49:16] effie: some jobrunners are clearly overloaded (cpu-wise), and I quickly checked on one of them and I see a lot of ffmpeg processes, so I suppose all transcode-related
[07:49:25] I didn't follow what was done the other time
[07:51:31] elukey: lets cont on -sre because here is too noisy
[07:53:49] PROBLEM - PHP7 jobrunner on mw1308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[07:56:07] RECOVERY - PHP7 rendering on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 0.034 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[08:00:07] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir2001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86396 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:00:31] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir4001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86371 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:00:37] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir3002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86363 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:00:47] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir5002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86355 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:00:51] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir1002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86350 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:00:52] vgutierrez: o/ --^ acme-chief in need of a restart?
[08:00:59] RECOVERY - PHP7 jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 329 bytes in 0.839 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[08:01:01] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir3001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86340 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:01:03] PROBLEM - HTTPS on apt1001 is CRITICAL: SSL CRITICAL - OCSP staple validity for apt.wikimedia.org has 86339 seconds left https://wikitech.wikimedia.org/wiki/APT_repository
[08:01:39] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir1001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86302 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:01:41] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir2002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86300 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:01:49] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir5001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86292 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:01:53] PROBLEM - HTTPS non-canonical-redirect-4 on ncredir4002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikispecies.net has 86289 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[08:02:46] (Processor usage over 85%) firing: Alert for device mr1-codfw.wikimedia.org - Processor usage over 85% - https://alerts.wikimedia.org
[08:02:57] Looks like that
[08:03:02] One sec
[08:03:04] <3
[08:04:35] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[08:04:57] RECOVERY - PHP7 jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 0.034 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[08:05:37] !log restart acme-chief
[08:05:43] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir1002 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 604458 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:46] done, back to the shower :)
[08:06:12] vgutierrez: thanksss
[08:06:37] !log jiji@cumin1001 conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1334.eqiad.wmnet
[08:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:51] RECOVERY - PHP7 rendering on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 331 bytes in 0.292 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[08:06:55] !log jiji@cumin1001 conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1318.eqiad.wmnet
[08:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:13] !log jiji@cumin1001 conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1311.eqiad.wmnet
[08:07:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:46] (Processor usage over 85%) resolved: Alert for device mr1-codfw.wikimedia.org - Processor usage over 85% - https://alerts.wikimedia.org
[08:07:48] !log jiji@cumin1001 conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1334.eqiad.wmnet
[08:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:05] !log jiji@cumin1001 conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1318.eqiad.wmnet
[08:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:01] PROBLEM - PHP7 jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[08:10:15] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir4001 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 604186 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:12:19] RECOVERY - PHP7 jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 0.035 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[08:16:31] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir5001 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 603811 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:16:31] PROBLEM - puppet last run on parse2016 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:16:35] PROBLEM - puppet last run on parse2018 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:16:37] PROBLEM - puppet last run on parse2009 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:16:49] PROBLEM - puppet last run on parse2006 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:16:49] PROBLEM - puppet last run on parse2014 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:17:07] PROBLEM - puppet last run on parse2004 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:17:11] PROBLEM - puppet last run on parse2020 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:17:11] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir2001 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 603771 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:17:25] PROBLEM - puppet last run on parse2007 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:18:15] PROBLEM - puppet last run on parse2005 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:18:17] PROBLEM - puppet last run on parse2012 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:19:29] PROBLEM - puppet last run on parse2008 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:19:29] PROBLEM - puppet last run on parse2002 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:20:07] PROBLEM - puppet last run on parse2017 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:20:31] RECOVERY - HTTPS on apt1001 is OK: SSL OK - OCSP staple validity for apt.wikimedia.org has 603571 seconds left:Certificate apt.wikimedia.org valid until 2021-06-14 07:00:24 +0000 (expires in 62 days) https://wikitech.wikimedia.org/wiki/APT_repository
[08:20:33] PROBLEM - puppet last run on parse2015 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:20:59] PROBLEM - puppet last run on parse2013 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:21:21] PROBLEM - puppet last run on parse2003 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:21:21] PROBLEM - puppet last run on parse2010 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:22:07] PROBLEM - puppet last run on parse2011 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:22:27] ^ me
[08:22:31] PROBLEM - puppet last run on parse2019 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:22:47] RECOVERY - Ensure local MW versions match expected deployment on wtp1025 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers
[08:22:57] RECOVERY - puppet last run on parse2016 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:22:59] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir3001 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 603422 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:23:25] RECOVERY - Ensure local MW versions match expected deployment on parse2001 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers
[08:23:35] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir1001 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 603386 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:23:49] RECOVERY - puppet last run on parse2007 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:25:55] RECOVERY - puppet last run on parse2002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:26:13] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir4002 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 603228 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:27:37] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir5002 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 603144 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:27:47] RECOVERY - puppet last run on parse2010 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:27:47] RECOVERY - puppet last run on parse2003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:28:33] RECOVERY - puppet last run on parse2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:28:57] RECOVERY - puppet last run on parse2019 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:29:27] RECOVERY - puppet last run on parse2018 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:29:31] RECOVERY - puppet last run on parse2009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:29:41] RECOVERY - puppet last run on parse2006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:29:41] RECOVERY - puppet last run on parse2014 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:30:01] RECOVERY - puppet last run on parse2004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:30:03] RECOVERY - puppet last run on parse2020 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:31:07] RECOVERY - puppet last run on parse2005 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:31:09] RECOVERY - puppet last run on parse2012 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:32:19] RECOVERY - puppet last run on parse2008 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:32:57] RECOVERY - puppet last run on parse2017 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:33:21] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir2002 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 602801 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:33:23] RECOVERY - puppet last run on parse2015 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:33:25] RECOVERY - mediawiki-installation DSH group on wtp1025 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[08:33:47] RECOVERY - puppet last run on parse2013 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:34:45] RECOVERY - HTTPS non-canonical-redirect-4 on ncredir3002 is OK: SSL OK - OCSP staple validity for www.wikispecies.net has 602716 seconds left:Certificate *.wikispecies.net valid until 2021-06-17 07:01:16 +0000 (expires in 65 days) https://wikitech.wikimedia.org/wiki/Ncredir
[08:35:05] PROBLEM - puppet last run on wtp1044 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:35:09] RECOVERY - mediawiki-installation DSH group on parse2001 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[08:35:11] PROBLEM - puppet last run on wtp1030 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:35:43] PROBLEM - puppet last run on wtp1037 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:35:43] PROBLEM - puppet last run on wtp1032 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:35:43] PROBLEM - puppet last run on wtp1043 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
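The OCSP staple alerts between 08:00 and 08:34 fired with just under 86400 seconds (one day) of staple validity left, and after the acme-chief restart the staples came back with roughly 604000 seconds, just under seven days (604800 seconds). The sketch below reproduces that arithmetic and the "expires in N days" figures from the recovery lines. It assumes the check goes CRITICAL below one day of staple validity (the log prints only the remaining seconds, not the configured threshold), that the log timestamps are UTC, and that the date is 2021-04-12, taken from the Phabricator task title and the deployment calendar link. `staple_state` and `days_until` are hypothetical helpers for illustration, not part of the real monitoring check.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: CRITICAL when less than one day of staple validity remains.
# The alerts only print the remaining seconds, not the configured threshold.
ASSUMED_CRITICAL_BELOW = timedelta(days=1)


def staple_state(seconds_left: int) -> str:
    """Hypothetical helper: classify an OCSP staple by remaining validity."""
    if timedelta(seconds=seconds_left) < ASSUMED_CRITICAL_BELOW:
        return "CRITICAL"
    return "OK"


def days_until(now: datetime, not_after: datetime) -> int:
    """Hypothetical helper: whole days until expiry, truncating partial days."""
    # For a positive timedelta, .days drops the leftover hours/minutes/seconds.
    return (not_after - now).days


if __name__ == "__main__":
    # Values copied from the alerts in the log (timestamps assumed to be UTC).
    print(staple_state(86396))        # ncredir2001 at 08:00:07 -> CRITICAL
    print(staple_state(604458))       # ncredir1002 at 08:05:43 -> OK
    print(round(604458 / 86400, 3))   # 6.996 days of staple validity
    check = datetime(2021, 4, 12, 8, 5, 43, tzinfo=timezone.utc)
    wikispecies_expiry = datetime(2021, 6, 17, 7, 1, 16, tzinfo=timezone.utc)
    print(days_until(check, wikispecies_expiry))  # 65
    apt_check = datetime(2021, 4, 12, 8, 20, 31, tzinfo=timezone.utc)
    apt_expiry = datetime(2021, 6, 14, 7, 0, 24, tzinfo=timezone.utc)
    print(days_until(apt_check, apt_expiry))      # 62
```

Truncating with `timedelta.days` matches the reported 65 and 62 days: both certificates expire about an hour earlier in the day than the time of the check, so the exact intervals are 65 days 22:55:33 and 62 days 22:39:53, which rounding would have reported as 66 and 63.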
[08:35:43] PROBLEM - puppet last run on wtp1046 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:35:43] PROBLEM - puppet last run on wtp1039 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:36:19] PROBLEM - puppet last run on wtp1045 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:36:23] PROBLEM - puppet last run on wtp1042 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:36:43] PROBLEM - puppet last run on wtp1033 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:37:23] PROBLEM - puppet last run on wtp1040 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:37:29] ^ still me
[08:38:41] PROBLEM - puppet last run on wtp1041 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:38:55] PROBLEM - puppet last run on wtp1047 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:39:53] PROBLEM - puppet last run on wtp1048 is CRITICAL: CRITICAL: Puppet last ran 3 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:31] RECOVERY - puppet last run on wtp1030 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:31] RECOVERY - puppet last run on wtp1032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:31] RECOVERY - puppet last run on wtp1037 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:31] RECOVERY - puppet last run on wtp1039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:31] RECOVERY - puppet last run on wtp1042 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:32] RECOVERY - puppet last run on wtp1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:32] RECOVERY - puppet last run on wtp1044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:33] RECOVERY - puppet last run on wtp1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:41:33] RECOVERY - puppet last run on wtp1045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:43:05] RECOVERY - puppet last run on wtp1033 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:43:08] !log Start server-side upload for 4 video files (T279878, T279839, T279818)
[08:43:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:19] T279878: Server side upload for Sturm - https://phabricator.wikimedia.org/T279878
[08:43:20] T279818: Server side upload for Sturm - https://phabricator.wikimedia.org/T279818
[08:43:20] T279839: Server side upload for Sturm - https://phabricator.wikimedia.org/T279839
[08:43:45] RECOVERY - puppet last run on wtp1040 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:43:50] Urbanecm: can you hang on to it for a bit
[08:43:52] ?
[08:44:17] effie: sorry, paused. Ping me when it can start again :)
[08:45:05] RECOVERY - puppet last run on wtp1041 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:45:21] RECOVERY - puppet last run on wtp1047 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:46:17] RECOVERY - puppet last run on wtp1048 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:46:54] Urbanecm: cheers, it will take at least one hour, I am monitoring the load a bit for now on the jobrunners
[08:47:14] sure, no problem
[09:31:53] (PS1) Ladsgroup: Disable legacy javascript in jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678569 (https://phabricator.wikimedia.org/T72470)
[09:32:47] SRE, serviceops: High load on jobrunners - https://phabricator.wikimedia.org/T279893 (jijiki) p:Triage→Medium
[09:43:47] Urbanecm: I think we can proceed
[09:43:56] cool, thanks
[09:44:21] !log Start server-side upload for 4 video files #2 (T279878, T279839, T279818)
[09:44:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:33] T279878: Server side upload for Sturm - https://phabricator.wikimedia.org/T279878
[09:44:34] T279818: Server side upload for Sturm - https://phabricator.wikimedia.org/T279818
[09:44:34] T279839: Server side upload for Sturm - https://phabricator.wikimedia.org/T279839
[09:45:03] (PS3) Ladsgroup: snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[09:46:20] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[09:50:34] (PS4) Ladsgroup: snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[09:50:55] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[09:54:54] jouncebot: now
[09:54:54] No deployments scheduled for the next 0 hour(s) and 35 minute(s)
[09:55:53] !log urbanecm@deploy1002 Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 57s)
[09:55:54] SRE, serviceops: High load on jobrunners (12 Apr 2021) - https://phabricator.wikimedia.org/T279893 (jijiki)
[09:56:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:12] (PS5) Ladsgroup: snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[10:08:19] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[10:19:56] (PS6) Ladsgroup: snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[10:20:23] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[10:21:02] (CR) jerkins-bot: [V: -1] snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[10:25:14] (PS7) Ladsgroup: snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[10:26:12] (CR) Ladsgroup: "PCC https://puppet-compiler.wmflabs.org/compiler1003/716/snapshot1008.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[10:29:15] PROBLEM - Check systemd state on ms-be1063 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:30:05] jan_drewniak: Your horoscope predicts another unfortunate Wikimedia Portals Update deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T1030).
[10:37:48] Is there a documentation how to deploy portals? we have been waiting for this for weeks now...
[10:42:43] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1063 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[10:49:14] I'm deploying portals
[10:52:02] (PS1) Ladsgroup: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/678574 (https://phabricator.wikimedia.org/T279398)
[10:53:11] (CR) Ladsgroup: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/678574 (https://phabricator.wikimedia.org/T279398) (owner: Ladsgroup)
[10:53:55] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/678574 (https://phabricator.wikimedia.org/T279398) (owner: Ladsgroup)
[10:54:41] live on mwdebug1002
[10:55:35] RECOVERY - Check systemd state on ms-be1063 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:55:56] ugh forgot to rebase
[10:57:03] works fine
[10:57:05] syncing
[10:59:48] !log
ladsgroup@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:678574|Bumping portals to master (T279398 T279419)]] (duration: 00m 58s) [10:59:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:00] T279398: Change logo of MediaWiki on www.wikipedia.org - https://phabricator.wikimedia.org/T279398 [11:00:00] T279419: New MediaWiki logo is stretched on portals www.wiktionary.org, www.wikinews.org, www.wikiquote.org, www.wikibooks.org and www.wikiversity.org - https://phabricator.wikimedia.org/T279419 [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: That opportune time is upon us again. Time for a [[Backport windows|European mid-day backport window]]
'''''' deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T1100). [11:00:05] No GERRIT patches in the queue for this window AFAICS. [11:00:14] o/ [11:00:18] Amir1: you done? [11:00:19] haha, jouncebot is broken [11:00:19] uhm [11:00:25] and we should fix jouncebot... [11:00:25] Urbanecm: yup [11:00:36] Amir1: https://deploy-commands.toolforge.org/bacc/678386 looks suspiciously similar to the landing pages to tools I've written :P [11:00:38] is it due to that deploy commands thing? [11:00:47] !log ladsgroup@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:678574|Bumping portals to master (T279398 T279419)]] (duration: 00m 58s) [11:00:54] Majavah: yup lol [11:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:57] didn't you say you're done Amir1 :) [11:00:58] I can start the deploys [11:01:06] (since I’m in a call with Silvan) [11:01:11] Lucas_WMDE: no, due to the recent page refactoring done by Kri.nkle [11:01:17] i partially fixed it [11:01:20] but apparently not fully :D [11:01:21] huh, I thought it worked after that [11:01:22] that was the last big [11:01:33] Lucas_WMDE: it didn't, i had to rewrote part of it [11:01:49] I think we’re talking about different issues? [11:01:49] https://gerrit.wikimedia.org/r/c/wikimedia/bots/jouncebot/+/677017 [11:01:53] I know about the stuff [11:01:56] and that that needed fixes [11:02:04] maybe :) [11:02:08] but now it claimed the window is empty, which I don’t think it did just a few days ago? [11:02:18] I think it still pinged users correctly after that redesign [11:02:25] it sometimes does and sometimes doesn't [11:02:28] hm okw [11:02:30] *ok [11:03:23] Amir1: still deploying? 
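The backport-window announcements in this log pass raw wikitext through to IRC: a `[[target|label]]` link and a stray line of `''''''`. The sketch below shows one way to flatten that markup into a single IRC line before sending. It is written in Python and is hypothetical: it is not jouncebot's code, and the function and regex names are invented for illustration. The fix the channel actually discussed is the linked Gerrit change.

```python
import re

# [[target]] or [[target|label]]; when a label is present, it is what readers see.
_WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
# Runs of two or more apostrophes are wikitext italic/bold markers ('' ''' ''''').
_QUOTE_RUN = re.compile(r"'{2,}")


def wikitext_to_irc(text: str) -> str:
    """Flatten a little wikitext into one IRC-safe line (hypothetical helper)."""
    # Replace each link with its label, or with its target if there is no label.
    text = _WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)
    # Drop emphasis markers, including the empty '''''' residue.
    text = _QUOTE_RUN.sub("", text)
    # An IRC message cannot span lines, so collapse all whitespace runs to one space.
    return " ".join(text.split())


announcement = (
    "Time for a [[Backport windows|European mid-day backport window]]\n"
    "'''''' deploy. Don't be afraid."
)
print(wikitext_to_irc(announcement))
```

The quote-run rule leaves single apostrophes such as "Don't" intact. It would also strip a literal `''` in ordinary prose, which is an acceptable trade-off for short bot notices.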
[11:03:27] nope
[11:03:33] ok then I’ll go ahead
[11:03:46] (PS4) Lucas Werkmeister (WMDE): Remove idGeneratorLogging [mediawiki-config] - https://gerrit.wikimedia.org/r/677560 (https://phabricator.wikimedia.org/T274156) (owner: Noa wmde)
[11:03:51] (CR) Lucas Werkmeister (WMDE): [C: +2] Remove idGeneratorLogging [mediawiki-config] - https://gerrit.wikimedia.org/r/677560 (https://phabricator.wikimedia.org/T274156) (owner: Noa wmde)
[11:04:38] (Merged) jenkins-bot: Remove idGeneratorLogging [mediawiki-config] - https://gerrit.wikimedia.org/r/677560 (https://phabricator.wikimedia.org/T274156) (owner: Noa wmde)
[11:05:09] pulled to mwdebug1001, just checking the wiki doesn’t explode
[11:05:41] syncing
[11:06:28] Urbanecm: in the meantime, if you have time, can you reupload https://meta.wikimedia.org/wiki/File:MediaWiki-logo_sister_1x.png with https://commons.wikimedia.org/wiki/File:MediaWiki-2020-icon.svg please?
[11:06:35] The distorted part is fixed now
[11:06:43] sure sure
[11:06:52] and the rest of mw logos in https://meta.wikimedia.org/wiki/Category:Project_portal_files
[11:06:52] (PS2) Lucas Werkmeister (WMDE): Remove all remains of idGeneratorLogging [mediawiki-config] - https://gerrit.wikimedia.org/r/677920 (https://phabricator.wikimedia.org/T274156) (owner: Silvan Heintze)
[11:07:04] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:677560|Remove idGeneratorLogging (T274156)]] (duration: 00m 58s)
[11:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:13] T274156: Remove WikibaseRepo idGeneratorLogging option - https://phabricator.wikimedia.org/T274156
[11:07:14] (CR) Lucas Werkmeister (WMDE): [C: +2] Remove all remains of idGeneratorLogging [mediawiki-config] - https://gerrit.wikimedia.org/r/677920 (https://phabricator.wikimedia.org/T274156) (owner: Silvan Heintze)
[11:07:23] Amir1: should be done
[11:07:32] i thought you were a metadmin :/
[11:07:50] Amir1: should i do 1.5 and 2x too?
[11:08:10] (Merged) jenkins-bot: Remove all remains of idGeneratorLogging [mediawiki-config] - https://gerrit.wikimedia.org/r/677920 (https://phabricator.wikimedia.org/T274156) (owner: Silvan Heintze)
[11:08:20] yup
[11:08:56] I'm not a meta admin but with my global interface admin I can do some stuff sometimes (that's how I fixed some of portal templates)
[11:09:19] i see
[11:09:56] (PS2) Lucas Werkmeister (WMDE): wikidata: post edit constraint jobs on 60% of edits [mediawiki-config] - https://gerrit.wikimedia.org/r/677928 (https://phabricator.wikimedia.org/T204031) (owner: Tonina Zhelyazkova)
[11:09:57] done all, hopefully
[11:09:59] can you check?
[11:10:21] sure
[11:10:30] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:677920|Remove all remains of idGeneratorLogging (T274156)]] (1/2) (duration: 00m 57s)
[11:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:46] perfect
[11:10:58] (CR) Lucas Werkmeister (WMDE): [C: +2] wikidata: post edit constraint jobs on 60% of edits [mediawiki-config] - https://gerrit.wikimedia.org/r/677928 (https://phabricator.wikimedia.org/T204031) (owner: Tonina Zhelyazkova)
[11:11:10] Urbanecm: yup. Thanks.
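The change "wikidata: post edit constraint jobs on 60% of edits" (T204031) gates job posting on a sampling ratio. The log does not show how WikibaseQualityConstraints implements this, or what the setting is called. The Python sketch below is a minimal percentage gate that assumes one independent uniform random draw per edit; the helper name is hypothetical.

```python
import random


def should_post_constraint_job(ratio_percent: int, rng: random.Random) -> bool:
    """Return True for roughly ratio_percent % of calls (hypothetical helper)."""
    if not 0 <= ratio_percent <= 100:
        raise ValueError("ratio_percent must be between 0 and 100")
    # randint(1, 100) is inclusive at both ends: 100 equally likely values,
    # exactly ratio_percent of which satisfy the comparison.
    return rng.randint(1, 100) <= ratio_percent


rng = random.Random(12345)  # seeded so the simulation is repeatable
posted = sum(should_post_constraint_job(60, rng) for _ in range(10_000))
print(f"{posted} of 10000 simulated edits would post a constraint check job")
```

A per-edit random draw needs no state. A deterministic alternative would hash something stable, such as the revision ID, so that a retried request makes the same decision.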
[11:11:19] cool cool
[11:11:41] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:677920|Remove all remains of idGeneratorLogging (T274156)]] (2/2, Beta-only) (duration: 00m 56s)
[11:11:49] (Merged) jenkins-bot: wikidata: post edit constraint jobs on 60% of edits [mediawiki-config] - https://gerrit.wikimedia.org/r/677928 (https://phabricator.wikimedia.org/T204031) (owner: Tonina Zhelyazkova)
[11:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:42] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:677928|wikidata: post edit constraint jobs on 60% of edits (T204031)]] (duration: 01m 13s)
[11:13:49] Urbanecm: the floor is yours :)
[11:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:51] T204031: Deploy regular running of wikidata constraint checks using the job queue - https://phabricator.wikimedia.org/T204031
[11:13:59] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1063 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[11:15:09] Lucas_WMDE: thanks. Should be no-op patches through :)
[11:15:16] ok ^^
[11:15:20] (PS2) Urbanecm: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD [mediawiki-config] - https://gerrit.wikimedia.org/r/678386 (https://phabricator.wikimedia.org/T279853)
[11:15:23] (CR) Urbanecm: [C: +2] Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD [mediawiki-config] - https://gerrit.wikimedia.org/r/678386 (https://phabricator.wikimedia.org/T279853) (owner: Urbanecm)
[11:15:26] (PS2) Urbanecm: labs: Set GEMentorshipMigrationStage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD [mediawiki-config] - https://gerrit.wikimedia.org/r/678389 (https://phabricator.wikimedia.org/T279853)
[11:15:31] (CR) Urbanecm: [C: +2] labs: Set GEMentorshipMigrationStage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD [mediawiki-config] - https://gerrit.wikimedia.org/r/678389 (https://phabricator.wikimedia.org/T279853) (owner: Urbanecm)
[11:15:57] I +2 the backport
[11:16:16] (CR) Ladsgroup: [C: +2] Don't do strict equal condition check [extensions/FlaggedRevs] (wmf/1.36.0-wmf.38) - https://gerrit.wikimedia.org/r/678347 (https://phabricator.wikimedia.org/T279750) (owner: Ladsgroup)
[11:16:26] (Merged) jenkins-bot: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD [mediawiki-config] - https://gerrit.wikimedia.org/r/678386 (https://phabricator.wikimedia.org/T279853) (owner: Urbanecm)
[11:16:29] (Merged) jenkins-bot: labs: Set GEMentorshipMigrationStage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD [mediawiki-config] - https://gerrit.wikimedia.org/r/678389 (https://phabricator.wikimedia.org/T279853) (owner: Urbanecm)
[11:17:47] good idea Amir1 :)
[11:19:20] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: NO-OP: 6c03d6a59086fa42ec4fc9d289c819a4d3b8e052: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD (T279853) (duration: 00m 58s)
[11:19:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:29] T279853: Migrate mentor/mentee relationship to a separate database table on Wikimedia wikis - https://phabricator.wikimedia.org/T279853
[11:19:59] so that should be it from my side
[11:20:29] ...but it is not
[11:22:59] (Merged) jenkins-bot: Don't do strict equal condition check [extensions/FlaggedRevs] (wmf/1.36.0-wmf.38) - https://gerrit.wikimedia.org/r/678347 (https://phabricator.wikimedia.org/T279750) (owner: Ladsgroup)
[11:24:14] Amir1: go ahead
[11:24:46] cool
[11:24:49] going ahead
[11:26:36] (PS2) Ladsgroup: Disable legacy javascript in jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678569 (https://phabricator.wikimedia.org/T72470)
[11:26:43] (CR) Ladsgroup: [C: +2] Disable legacy javascript in jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678569 (https://phabricator.wikimedia.org/T72470) (owner: Ladsgroup)
[11:26:45] !log ladsgroup@deploy1002 Synchronized php-1.36.0-wmf.38/extensions/FlaggedRevs/frontend/FlaggedRevsXML.php: Backport: [[gerrit:678347|Don't do strict equal condition check (T279750)]] (duration: 00m 57s)
[11:26:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:53] T279750: [accepted revision] is white in FlaggedRevs diffs - https://phabricator.wikimedia.org/T279750
[11:27:26] (Merged) jenkins-bot: Disable legacy javascript in jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678569 (https://phabricator.wikimedia.org/T72470) (owner: Ladsgroup)
[11:29:50] !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:678569|Disable legacy javascript in jawiki (T72470)]] (duration: 00m 56s)
[11:29:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:59] T72470: Remove legacy javascript globals - https://phabricator.wikimedia.org/T72470
[11:38:55] Amir1: just wondering, how do we monitor global usages?
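The mentorship migration (T279853) is staged with MediaWiki's `SCHEMA_COMPAT_*` flags. Production is pinned explicitly to WRITE_OLD/READ_OLD, while beta moves to `SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD`. The flags are bits OR-ed into one integer, which the Python sketch below models with `enum.IntFlag`. Three parts of it are assumptions:

- The hex values are what I believe MediaWiki core's `Defines.php` uses; check core before relying on them.
- `is_sane` is my own illustrative rule, not core's validation.
- The `WRITE_BOTH | READ_NEW` row is only a plausible next step.

```python
from enum import IntFlag


class SchemaCompat(IntFlag):
    """Migration-stage bits; values ASSUMED to match MediaWiki's SCHEMA_COMPAT_* constants."""
    WRITE_OLD = 0x01
    READ_OLD = 0x02
    WRITE_NEW = 0x10
    READ_NEW = 0x20
    WRITE_BOTH = WRITE_OLD | WRITE_NEW
    OLD = WRITE_OLD | READ_OLD  # the "WRITE_OLD/READ_OLD" production setting
    NEW = WRITE_NEW | READ_NEW


def describe(stage: int) -> str:
    """Render a stage as e.g. 'write=old+new read=old'."""
    s = SchemaCompat(stage)
    writes = [n for n, bit in (("old", SchemaCompat.WRITE_OLD), ("new", SchemaCompat.WRITE_NEW)) if s & bit]
    reads = [n for n, bit in (("old", SchemaCompat.READ_OLD), ("new", SchemaCompat.READ_NEW)) if s & bit]
    return f"write={'+'.join(writes) or 'none'} read={'+'.join(reads) or 'none'}"


def is_sane(stage: int) -> bool:
    """Hypothetical check: at least one write target and exactly one read source."""
    s = SchemaCompat(stage)
    writes = bool(s & SchemaCompat.WRITE_BOTH)
    read_sources = int(bool(s & SchemaCompat.READ_OLD)) + int(bool(s & SchemaCompat.READ_NEW))
    return writes and read_sources == 1


for label, stage in [
    ("production", SchemaCompat.OLD),
    ("beta", SchemaCompat.WRITE_BOTH | SchemaCompat.READ_OLD),
    ("possible next step", SchemaCompat.WRITE_BOTH | SchemaCompat.READ_NEW),
]:
    print(f"{label}: {int(stage):#04x} {describe(stage)} sane={is_sane(stage)}")
```

Writing both sides while still reading old lets the new table fill up before any reader depends on it. Switching the read bit then becomes a separate, easily reverted step.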
[11:39:05] to decide whether it's fine to disable it somewhere
[11:39:34] https://grafana.wikimedia.org/d/000000037/mw-js-deprecate?orgId=1&refresh=1m&from=now-90d&to=now&var-Step=24h
[11:39:54] oh, cool, we have a grafana dashboard :)
[11:40:00] it is global so I can't say if we cleaned it up in this wiki or that wiki
[11:40:41] but I have cleaned it globally for months, the new case will increase the errors and I'll fix the prominent ones post-deployment (where they are left from the cleanup)
[11:41:43] got it
[13:56:13] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 251733688 and 5 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:58:39] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 290328 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[14:17:23] (PS1) Ottomata: produce_canary_events - No longer use http proxies and use api-ro [puppet] - https://gerrit.wikimedia.org/r/678600 (https://phabricator.wikimedia.org/T274951)
[14:19:29] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:20:56] (CR) Ottomata: [C: +2] produce_canary_events - No longer use http proxies and use api-ro [puppet] - https://gerrit.wikimedia.org/r/678600 (https://phabricator.wikimedia.org/T274951) (owner: Ottomata)
[14:21:45] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:49:56] (PS1) Awight: [beta] Enable line numbering on all namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/678606 (https://phabricator.wikimedia.org/T267911)
[14:52:31] (PS1) Ottomata: refine_job - remove support for RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789)
[14:53:39] (CR) jerkins-bot: [V: -1] refine_job - remove support for RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[14:58:25] (PS1) Ottomata: refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789)
[14:59:38] (CR) jerkins-bot: [V: -1] refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[15:04:38] (PS2) Ottomata: refine_job - remove support for RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789)
[15:05:56] (CR) jerkins-bot: [V: -1] refine_job - remove support for RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[15:06:35] (PS2) Ottomata: refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789)
[15:07:59] (CR) jerkins-bot: [V: -1] refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[15:14:11] (PS4) Ottomata: Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789)
[15:14:37] (CR) jerkins-bot: [V: -1] Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[15:16:02] (PS3) Ottomata: refine_job - remove support for RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789)
[15:17:15] (PS3) Ottomata: refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789)
[15:17:24] (PS5) Ottomata: Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789)
[15:17:26] (CR) jerkins-bot: [V: -1] refine_job - remove support for RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[15:17:51] (PS6) Ottomata: Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789)
[15:18:35] (CR) jerkins-bot: [V: -1] refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[15:19:15] (CR) jerkins-bot: [V: -1] Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[16:00:27] (PS1) Urbanecm: labs: Set mentor migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW [mediawiki-config] - https://gerrit.wikimedia.org/r/678613 (https://phabricator.wikimedia.org/T279853)
[16:01:08] jouncebot: now
[16:01:08] No deployments scheduled for the next 0 hour(s) and 58 minute(s)
[16:01:11] jouncebot: next
[16:01:11] In 0 hour(s) and 58 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T1700)
[16:01:22] (CR) Urbanecm: [C: +2] "no-op for prod" [mediawiki-config] - https://gerrit.wikimedia.org/r/678613 (https://phabricator.wikimedia.org/T279853) (owner: Urbanecm)
[16:03:48] (Merged) jenkins-bot: labs: Set mentor migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW [mediawiki-config] - https://gerrit.wikimedia.org/r/678613 (https://phabricator.wikimedia.org/T279853) (owner: Urbanecm)
[16:25:23] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:27:49] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:37:10] SRE, OTRS, Security, User-notice: ((OTRS)) Community Edition 6 is end-of-life; no FOSS replacement provided - https://phabricator.wikimedia.org/T275294 (Niklitov) Hello! Pls add my account User:Niklitov from OTRS, Wikimedia RU! Best Regards!
[17:00:05] ryankemper: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T1700).
[17:58:09] (PS2) Zabe: Replace 'ombudsman' with 'ombuds' in wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678341 (https://phabricator.wikimedia.org/T256299)
[18:00:05] RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do [[Backport windows|Morning backport window]]
'''''' deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T1800).
[18:00:05] No GERRIT patches in the queue for this window AFAICS.
[18:00:21] I can deploy today
[18:00:22] Zabe: around?
[18:00:27] o/
[18:00:39] (CR) Urbanecm: [C: +2] Replace 'ombudsman' with 'ombuds' in wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678341 (https://phabricator.wikimedia.org/T256299) (owner: Zabe)
[18:01:31] (Merged) jenkins-bot: Replace 'ombudsman' with 'ombuds' in wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678341 (https://phabricator.wikimedia.org/T256299) (owner: Zabe)
[18:02:30] (CR) Urbanecm: [C: -2] "I don't see how https://phabricator.wikimedia.org/T275076#6876085 is addressed now. Violates https://meta.wikimedia.org/wiki/Limits_to_con" [mediawiki-config] - https://gerrit.wikimedia.org/r/667306 (https://phabricator.wikimedia.org/T275076) (owner: Zabe)
[18:03:11] (PS2) Urbanecm: Enable on bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678237 (https://phabricator.wikimedia.org/T279635) (owner: Zabe)
[18:03:27] !log urbanecm@deploy1002 sync-file aborted: ae05f7cd53925c06d8a23cb8f667a20d79ce2cff: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups (T256299ú (duration: 00m 00s)
[18:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:40] Zabe: syncing first one
[18:04:28] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: ae05f7cd53925c06d8a23cb8f667a20d79ce2cff: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups (T256299) (duration: 00m 57s)
[18:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:36] T256299: Rename "ombudsman" global group to "ombuds" - https://phabricator.wikimedia.org/T256299
[18:04:45] Zabe: for the protection one, I'm not going to grant interface-admins other rights than what they have by default. Their purpose is to have a group which can edit CSS/JS, to minimize the number of people who have this capability. It would be fine to create an extra group "templateeditor" (see enwiki or commons), which would have this new right.
[18:04:50] (CR) Urbanecm: [C: +2] Enable on bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678237 (https://phabricator.wikimedia.org/T279635) (owner: Zabe)
[18:04:59] ok
[18:06:46] (Merged) jenkins-bot: Enable on bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678237 (https://phabricator.wikimedia.org/T279635) (owner: Zabe)
[18:07:03] Zabe: can you test this one on mwdebug1001, please?
[18:08:54] it works the supposed way
[18:09:07] thanks, syncing
[18:09:31] (PS3) Zabe: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835)
[18:09:46] Zabe: thanks for that ombuds privileged groups patch, I had completely missed that when renaming the messages
[18:09:52] Zabe: mind me syncing the abusefilter-maintainer one too?
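Urbanecm's objection above is a least-privilege argument. Any right added to interface-admin means that people who only need that right also end up able to edit site-wide CSS/JS. The Python toy model below compares the two options. Everything in it is hypothetical: the users (alice needs CSS/JS; bob and carol only need the new protection right), the group memberships, and the placeholder name for the new right. Only the CSS/JS right names reflect rights that MediaWiki gives interface-admin by default.

```python
from typing import Dict, Set

# A subset of the site-wide CSS/JS rights MediaWiki grants interface-admin by default.
CSS_JS_RIGHTS = {"editsitecss", "editsitejs", "editsitejson"}
# Placeholder for the new protection right under discussion (name is hypothetical).
NEW_RIGHT = "example-protection-right"


def rights_of(user: str, memberships: Dict[str, Set[str]], group_rights: Dict[str, Set[str]]) -> Set[str]:
    """Union of the rights of every group the user belongs to."""
    rights: Set[str] = set()
    for group in memberships.get(user, set()):
        rights |= group_rights.get(group, set())
    return rights


def css_js_editors(memberships: Dict[str, Set[str]], group_rights: Dict[str, Set[str]]) -> Set[str]:
    """Users who can edit site-wide CSS/JS under a given configuration."""
    return {u for u in memberships if rights_of(u, memberships, group_rights) & CSS_JS_RIGHTS}


# Option A: add the new right to interface-admin; everyone who needs it joins that group.
option_a_rights = {"interface-admin": CSS_JS_RIGHTS | {NEW_RIGHT}}
option_a_members = {"alice": {"interface-admin"}, "bob": {"interface-admin"}, "carol": {"interface-admin"}}

# Option B: a separate group (cf. "templateeditor" on enwiki/Commons) carries only the new right.
option_b_rights = {"interface-admin": set(CSS_JS_RIGHTS), "templateeditor": {NEW_RIGHT}}
option_b_members = {"alice": {"interface-admin"}, "bob": {"templateeditor"}, "carol": {"templateeditor"}}

print(sorted(css_js_editors(option_a_members, option_a_rights)))
print(sorted(css_js_editors(option_b_members, option_b_rights)))
```

In option A, every holder of the new right can also edit CSS/JS. In option B, that capability stays with the one user who needs it.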
[18:10:30] yes, thx
[18:10:36] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 13b10d3b1b0b3ff48077a7d212a0eddd6214ce22: Enable on bswiki (T279635) (duration: 00m 57s)
[18:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:45] T279635: Enable Kartographer on bs.wiki - https://phabricator.wikimedia.org/T279635
[18:10:47] (PS4) Urbanecm: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[18:10:55] (CR) Urbanecm: [C: +2] Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[18:12:03] (Merged) jenkins-bot: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[18:13:58] SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): cloudvirt1038: PCIe error - https://phabricator.wikimedia.org/T276922 (Cmjohnson) another Dell tech arrived today with what was believed to be the replacement part. The part was replaced and the error persisted. Several reboots and TSR rep...
[18:14:04] syncing
[18:15:01] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 5d275ec5378c5faea356aeb0fc985f4d815efed1: Add abusefilter-maintainer to wmgPrivilegedGlobalGroups (T279835) (duration: 00m 58s)
[18:15:07] Zabe: done
[18:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:09] T279835: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups - https://phabricator.wikimedia.org/T279835
[18:15:09] anything else?
[18:15:33] if we are already into it, thx: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/678366
[18:16:04] (PS4) Urbanecm: Add extendedconfirmed on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678366 (https://phabricator.wikimedia.org/T279836) (owner: Zabe)
[18:16:08] (CR) Urbanecm: [C: +2] Add extendedconfirmed on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678366 (https://phabricator.wikimedia.org/T279836) (owner: Zabe)
[18:16:10] sounds good
[18:17:41] (Merged) jenkins-bot: Add extendedconfirmed on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678366 (https://phabricator.wikimedia.org/T279836) (owner: Zabe)
[18:18:39] Zabe: please test on mwdebug1001
[18:20:28] it works the supposed way
[18:20:58] syncing
[18:22:29] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: a1949fd15a5a6fe745f6b807b2716ccb2a287476: Add extendedconfirmed on svwiki (T279836) (duration: 00m 59s)
[18:22:33] and done
[18:22:37] anything else?
[18:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:38] T279836: Add extendedconfirmed on svwiki in wmgAutopromoteOnceonEdit and use it in wgRestrictionLevels - https://phabricator.wikimedia.org/T279836
[18:23:52] I would have this one: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/675498
[18:24:47] (PS2) Urbanecm: Enable assignment of importupload on enwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/675498 (https://phabricator.wikimedia.org/T278683) (owner: Zabe)
[18:25:04] (CR) Urbanecm: [C: +2] "not 100% convinced, but apparently the case elsewhere (T278683#6953394)" [mediawiki-config] - https://gerrit.wikimedia.org/r/675498 (https://phabricator.wikimedia.org/T278683) (owner: Zabe)
[18:25:53] (Merged) jenkins-bot: Enable assignment of importupload on enwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/675498 (https://phabricator.wikimedia.org/T278683) (owner: Zabe)
[18:27:00] Zabe: can you test on mwdebug1001, please?
[18:27:06] SRE, netops: Higher latency on Lumen eqiad/esams link - https://phabricator.wikimedia.org/T277654 (wiki_willy) Latest update from Lumen Sales Director is that he's looking into it internally with 2 resources. Both were out last week due to spring break, so we should hear something back this week. Thank...
[18:28:25] it works the supposed way
[18:29:35] cool, syncing
[18:31:11] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 117743f695b5cd4b9fa99ff8aaa00d3f9a1d8889: Enable assignment of importupload on enwikibooks (T278683) (duration: 00m 57s)
[18:31:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:31:20] T278683: Enable assignment of importupload on en.wikibooks - https://phabricator.wikimedia.org/T278683
[18:31:24] (PS1) Zabe: Unset $wmgUseWikimediaShopLink for ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678628 (https://phabricator.wikimedia.org/T279877)
[18:31:39] done
[18:32:41] thx for your help :)
[18:32:44] np
[18:50:05] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:52:35] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:03:49] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 98473104 and 16 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:04:27] (PS2) Addshore: node10-sssd: bump npm from 6.5 to 6.14.5 [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/674134 (https://phabricator.wikimedia.org/T278180)
[19:04:30] /win 18
[19:06:15] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 792256 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:14:20] (PS2) Zabe: Unset $wmgUseWikimediaShopLink for ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678628 (https://phabricator.wikimedia.org/T279877)
[19:25:57] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 122222456 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:28:27] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:35:27] Database locked?
[19:35:40] "The database is currently locked to new entries and other modifications, probably for routine database maintenance, after which it will be back to normal. "
[19:35:46] got this when doing a global lock
[20:00:05] chrisalbon and accraze: It is that lovely time of the day again! You are hereby commanded to deploy [[mw:Services|Services]] – [[mw:Extension:Graph|Graphoid]] / [[ORES]]. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T2000).
[20:47:40] (CR) Paladox: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/678425 (https://phabricator.wikimedia.org/T277645) (owner: Paladox)
[21:00:05] Reedy and sbassett: It is that lovely time of the day again! You are hereby commanded to deploy Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T2100).
[21:29:22] SRE, ops-eqiad, DC-Ops, Discovery-Search (Current work): (Need By: 2021-04-30) rack/setup/install wcqs100[123] - https://phabricator.wikimedia.org/T276644 (Jclark-ctr) wcqs1001. A3. U37 Port 26 wcqs1002 B1. U34 Port 19 wcqs1003. C4. U17 Port 15
[21:29:46] SRE, ops-eqiad, DC-Ops, Discovery-Search (Current work): (Need By: 2021-04-30) rack/setup/install wcqs100[123] - https://phabricator.wikimedia.org/T276644 (Jclark-ctr)
[21:30:37] SRE, ops-eqiad, DC-Ops, Discovery-Search (Current work): (Need By: 2021-04-30) rack/setup/install wcqs100[123] - https://phabricator.wikimedia.org/T276644 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[21:41:00] (PS1) Andrew Bogott: nfs-mounts: provide dumps access to spi-tools [puppet] - https://gerrit.wikimedia.org/r/678642 (https://phabricator.wikimedia.org/T279555)
[21:45:06] (CR) Paladox: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/678646 (owner: Paladox)
[21:45:23] (CR) jerkins-bot: [V: -1] gerrit: Convert gerrit-theme to Polymer 3 [puppet] - https://gerrit.wikimedia.org/r/678646 (owner: Paladox)
[21:46:15] (PS4) Paladox: gerrit: Convert gerrit-theme to Polymer 3 [puppet] - https://gerrit.wikimedia.org/r/678646
[21:49:04] (CR) Paladox: [C: -1] "Doesn't work yet" [puppet] - https://gerrit.wikimedia.org/r/678646 (owner: Paladox)
[21:53:06] (PS5) Paladox: gerrit: Convert gerrit-theme to Polymer 3 [puppet] - https://gerrit.wikimedia.org/r/678646
[21:59:39] (CR) Paladox: "I've got the theme working but I'm not sure if the results table works (haven't tried that)." [puppet] - https://gerrit.wikimedia.org/r/678646 (owner: Paladox)
[21:59:54] (CR) Paladox: "tested on Gerrit's master." [puppet] - https://gerrit.wikimedia.org/r/678646 (owner: Paladox)
[22:01:09] PROBLEM - Disk space on Hadoop worker on analytics1061 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/g 15 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[22:08:19] (PS4) Jforrester: wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production [mediawiki-config] - https://gerrit.wikimedia.org/r/657696 (https://phabricator.wikimedia.org/T269712)
[22:08:26] (PS3) Jforrester: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default [mediawiki-config] - https://gerrit.wikimedia.org/r/657697 (https://phabricator.wikimedia.org/T269712)
[22:08:41] (CR) jerkins-bot: [V: -1] wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default [mediawiki-config] - https://gerrit.wikimedia.org/r/657697 (https://phabricator.wikimedia.org/T269712) (owner: Jforrester)
[22:15:17] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/657696 (https://phabricator.wikimedia.org/T269712) (owner: Jforrester)
[22:59:13] PROBLEM - Disk space on Hadoop worker on analytics1070 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/m 15 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[23:00:04] RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate [[Backport windows|Evening backport window]]
'''''' deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210412T2300).
[23:00:04] No GERRIT patches in the queue for this window AFAICS.
[23:00:47] (CR) Jforrester: [C: +2] wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production [mediawiki-config] - https://gerrit.wikimedia.org/r/657696 (https://phabricator.wikimedia.org/T269712) (owner: Jforrester)
[23:01:32] (Merged) jenkins-bot: wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production [mediawiki-config] - https://gerrit.wikimedia.org/r/657696 (https://phabricator.wikimedia.org/T269712) (owner: Jforrester)
[23:03:24] Reedy: Things look OK to me. Do you agree?
[23:03:36] (On mwdebug1002. ;-))
[23:06:06] (CR) Aaron Schulz: [C: +1] mc: Set 'broadcastRoutingPrefix' option in $wgWANObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/677418 (owner: Krinkle)
[23:06:54] !log jforrester@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:657696|wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production (T269712)]] (duration: 00m 58s)
[23:07:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:07:05] T269712: Migrate afl_filter to afl_filter_id and afl_global - https://phabricator.wikimedia.org/T269712
[23:07:30] heh
[23:07:40] Yeah, the AF pages don't seem to be having any issue
[23:07:40] No errors yet.
[23:07:59] OK, I'll declare this done. :-)
[23:08:21] James_F: late one https://gerrit.wikimedia.org/r/677418 if thats okay? (expected to be no-op setting will exist next week and do the same as what the other settings set currently)
[23:08:46] Krinkle: Sure… do you want to do it?
[23:09:01] * James_F isn't a WANCache expert, to put it mildly. :-)
[23:09:44] sure, np
[23:10:01] Also, that patch didn't make the REL1_36 branch; I'll backport it.
[23:11:03] right yeah it meant to be backported
[23:11:07] I forgot to make the cherry pick
[23:11:08] thx!
[23:11:13] (CR) Krinkle: [C: +2] mc: Set 'broadcastRoutingPrefix' option in $wgWANObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/677418 (owner: Krinkle)
[23:12:04] (Merged) jenkins-bot: mc: Set 'broadcastRoutingPrefix' option in $wgWANObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/677418 (owner: Krinkle)
[23:12:29] Krinkle: Eurgh. Actual merge conflicts. Might leave to Aaron?
[23:12:38] ok
[23:14:44] * Krinkle staging on mwdebug1002
[23:24:58] and rolling out
[23:25:42] !log krinkle@deploy1002 Synchronized wmf-config/mc.php: I390b4726d01037107 (duration: 00m 58s)
[23:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:22] (PS4) Jforrester: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default [mediawiki-config] - https://gerrit.wikimedia.org/r/657697 (https://phabricator.wikimedia.org/T269712)
[23:29:37] (CR) Jforrester: [C: -1] "Not until wmf.1 is everywhere and won't rollback." [mediawiki-config] - https://gerrit.wikimedia.org/r/657697 (https://phabricator.wikimedia.org/T269712) (owner: Jforrester)
[23:37:01] (PS4) Ottomata: refine_job - remove RefineFailuresChecker and use 0.1.4 in test/refine [puppet] - https://gerrit.wikimedia.org/r/678607 (https://phabricator.wikimedia.org/T273789)
[23:38:52] (PS4) Ottomata: refine - use refinery 0.1.4 [puppet] - https://gerrit.wikimedia.org/r/678608 (https://phabricator.wikimedia.org/T273789)
[23:39:03] (PS7) Ottomata: Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789)
[23:40:20] (CR) jerkins-bot: [V: -1] Set up refine_sanitize jobs in analytics test cluster. [puppet] - https://gerrit.wikimedia.org/r/676380 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)