[00:00:04] RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Evening backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T0000). [00:00:04] No GERRIT patches in the queue for this window AFAICS. [00:01:05] (03PS1) 10Bstorm: toolforge: upgrade sssd patch version [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) [00:01:25] (03PS2) 10Bstorm: toolforge: upgrade sssd patch version [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) [00:03:10] (03CR) 10Jeena Huneidi: [C: 03+2] "It is indeed nice to not see it restarting so many times" [deployment-charts] - 10https://gerrit.wikimedia.org/r/647842 (owner: 10Ahmon Dancy) [00:03:37] (03PS10) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:04:06] (03CR) 10Jeena Huneidi: [C: 03+2] "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/647843 (owner: 10Ahmon Dancy) [00:04:59] (03CR) 10Jeena Huneidi: [C: 03+2] "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/647844 (owner: 10Ahmon Dancy) [00:05:08] (03PS11) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:05:09] (03Merged) 10jenkins-bot: Reorganized setup.sh and added db wait loop [deployment-charts] - 10https://gerrit.wikimedia.org/r/647842 (owner: 10Ahmon Dancy) [00:05:51] (03CR) 10Jeena Huneidi: [C: 03+2] "Wahoo, I tried it with 3 replicas and only one did the rebuild :D" [deployment-charts] - 10https://gerrit.wikimedia.org/r/648304 (owner: 10Ahmon Dancy) [00:06:25] (03Merged) 10jenkins-bot: New utility macros in templates/_mediawiki-common.tpl [deployment-charts] - 10https://gerrit.wikimedia.org/r/647843 (owner: 10Ahmon Dancy) [00:06:32] (03CR) 10Jeena Huneidi: 
[C: 03+2] "nice catch" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649470 (owner: 10Ahmon Dancy) [00:06:45] (03Merged) 10jenkins-bot: 0.2.0: Use a Job to set up the database [deployment-charts] - 10https://gerrit.wikimedia.org/r/647844 (owner: 10Ahmon Dancy) [00:06:57] (03CR) 10jerkins-bot: [V: 04-1] categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) (owner: 10Ryan Kemper) [00:07:31] (03PS3) 10Bstorm: toolforge: upgrade sssd patch version [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) [00:07:50] (03Merged) 10jenkins-bot: 0.3.0: add manually recached l10n CDB support [deployment-charts] - 10https://gerrit.wikimedia.org/r/648304 (owner: 10Ahmon Dancy) [00:08:05] (03PS2) 10Aklapper: Change redirect for wikimedia.org/research [puppet] - 10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) [00:08:19] (03CR) 10jerkins-bot: [V: 04-1] Change redirect for wikimedia.org/research [puppet] - 10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) (owner: 10Aklapper) [00:08:21] (03Merged) 10jenkins-bot: fix typo in README.md [deployment-charts] - 10https://gerrit.wikimedia.org/r/649470 (owner: 10Ahmon Dancy) [00:09:42] (03CR) 10Bstorm: "I don't know if the upgrade will upset the LDAP connections or anything, but at very least, I also don't believe that will matter too much" [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) (owner: 10Bstorm) [00:15:05] (03PS3) 10Aklapper: Change redirect for wikimedia.org/research [puppet] - 10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) [00:16:12] (03PS2) 10Bstorm: partman: build a recipe to re-image nfs servers [puppet] - 10https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) [00:17:27] (03CR) 10Bstorm: "Thanks, Stevie! 
How's the next version? Should I leave the swap partition in there or remove that?" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) (owner: 10Bstorm) [00:19:46] (03CR) 10Bstorm: "This will help future-proof things a bit. It should be deployed pretty carefully. The admission.yaml is mounted into apiserver containers." [puppet] - 10https://gerrit.wikimedia.org/r/639883 (https://phabricator.wikimedia.org/T263284) (owner: 10Bstorm) [00:22:48] (03CR) 10Krinkle: [C: 03+1] "As I understand it, this will inform Scap in prod of the script and use it during emergency deploys via --force. It will not (yet) change " [puppet] - 10https://gerrit.wikimedia.org/r/636074 (https://phabricator.wikimedia.org/T243009) (owner: 10Ahmon Dancy) [00:31:53] (03PS12) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:33:41] (03CR) 10Cwhite: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/648329 (https://phabricator.wikimedia.org/T258821) (owner: 10Ahmon Dancy) [00:34:52] 10Operations, 10Performance-Team, 10Traffic, 10observability: Ensure graphs used by Performance account for Varnish-to-ATS migration - https://phabricator.wikimedia.org/T233474 (10Krinkle) 05Open→03Resolved Not anymore. Thank you! 
[00:36:46] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2772892408 and 561 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:38:00] (03PS13) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:39:28] (03CR) 10BryanDavis: toolforge: upgrade sssd patch version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) (owner: 10Bstorm) [00:39:48] PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 4148880352 and 301 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:39:48] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 3390818704 and 255 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:40:16] (03PS14) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:41:52] PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 3505120792 and 253 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:41:54] (03CR) 10jerkins-bot: [V: 04-1] categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) (owner: 10Ryan Kemper) [00:42:12] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 698380832 and 47 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:42:24] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 
(host:localhost) 518022160 and 34 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:43:02] (03PS15) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:43:06] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 25630000 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:46:20] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 181872 and 52 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:46:39] (03CR) 10Ryan Kemper: "```" [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) (owner: 10Ryan Kemper) [00:46:54] (03CR) 10Ryan Kemper: [C: 03+2] categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) (owner: 10Ryan Kemper) [00:47:04] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2053552 and 97 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:48:24] RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 226808 and 176 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:48:32] (03PS16) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) [00:48:54] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 192088 and 206 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:10] RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 
(host:localhost) 1856 and 343 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:10] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1856 and 343 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:26] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 620376 and 357 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:55:59] 10Operations, 10MW-on-K8s, 10serviceops, 10Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10tstarling) [00:56:24] (03PS1) 10Ryan Kemper: wdqs: fix typo breaking prom blazegraph exporter [puppet] - 10https://gerrit.wikimedia.org/r/649473 (https://phabricator.wikimedia.org/T269872) [00:57:00] (03CR) 10Ryan Kemper: "Worked except for this minor issue: https://gerrit.wikimedia.org/r/c/operations/puppet/+/649473" [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872) (owner: 10Ryan Kemper) [00:57:12] (03CR) 10Ryan Kemper: [C: 03+2] wdqs: fix typo breaking prom blazegraph exporter [puppet] - 10https://gerrit.wikimedia.org/r/649473 (https://phabricator.wikimedia.org/T269872) (owner: 10Ryan Kemper) [01:00:28] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 110206848 and 8 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:01:16] (03PS1) 10Bstorm: wikireplicas: close the connection object for maintain-meta_p [puppet] - 10https://gerrit.wikimedia.org/r/649475 (https://phabricator.wikimedia.org/T269620) [01:01:34] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [01:02:06] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1377552 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:02:40] RECOVERY - Check systemd state on wdqs2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:40] RECOVERY - Check systemd state on wdqs2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:48] RECOVERY - Check systemd state on wdqs1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:50] RECOVERY - Check systemd state on wdqs2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:50] RECOVERY - Check systemd state on wdqs2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:50] RECOVERY - Check systemd state on wdqs2006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:56] RECOVERY - Check systemd state on wdqs2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:03:04] RECOVERY - Check systemd state on wdqs1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:03:12] RECOVERY - Check systemd state on wdqs1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:03:16] RECOVERY - Check systemd state on wdqs1004 is OK: OK - running: The system is 
fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:03:16] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:03:34] RECOVERY - Check systemd state on wdqs2007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:03:36] RECOVERY - Check systemd state on wdqs1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:04:02] RECOVERY - Check systemd state on wdqs2008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:06:28] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 95027448 and 6 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:06:31] ^ This issue is resolved: https://phabricator.wikimedia.org/T269872 [01:08:06] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 142784 and 80 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:36:40] (03Abandoned) 10Jforrester: Don't cache output that is not safe to cache [core] (wmf/1.36.0-wmf.20) - 10https://gerrit.wikimedia.org/r/644665 (https://phabricator.wikimedia.org/T269154) (owner: 10Daniel Kinzler) [02:07:30] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.36.0-wmf.22 [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649480 [02:07:57] (03PS2) 10DannyS712: Branch commit for wmf/1.36.0-wmf.22 [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649480 (https://phabricator.wikimedia.org/T267415) (owner: 10TrainBranchBot) [03:25:08] 10Operations, 10Research, 10Wikimedia-Apache-configuration, 10Patch-For-Review: Redirect wikimedia.org/research to 
research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10leila) @Dzahn Sorry to ping you personally. I'm hoping you can point me to the righ... [04:08:44] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 1.124e+04 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [04:11:58] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 14 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:04:23] (03PS2) 10KartikMistry: Update cxserver to 2020-12-12-101743-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/649329 (https://phabricator.wikimedia.org/T268309) [05:09:36] * kart_ doing cxserver update. [05:10:05] (03CR) 10KartikMistry: [C: 03+2] Update cxserver to 2020-12-12-101743-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/649329 (https://phabricator.wikimedia.org/T268309) (owner: 10KartikMistry) [05:11:26] (03Merged) 10jenkins-bot: Update cxserver to 2020-12-12-101743-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/649329 (https://phabricator.wikimedia.org/T268309) (owner: 10KartikMistry) [05:14:35] !log kartik@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [05:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:17:38] !log kartik@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [05:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:23:32] !log kartik@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . 
[05:23:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:29:10] !log Updated cxserver to 2020-12-12-101743-production (T268309) [05:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:29:13] T268309: Update cxserver to handle Parsoid HTML change from to - https://phabricator.wikimedia.org/T268309 [05:34:36] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10STran) a:05Tchanders→03RLazarus [05:35:23] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10STran) @RLazarus I've finished filling it out and believe it's ready. Thank you! 🙇 [05:42:50] PROBLEM - ElasticSearch health check for shards on 9200 on logstash2005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7fe33f0d34e0: Failed to establish a new connection: [Errno 111] Connection [05:42:50] ://wikitech.wikimedia.org/wiki/Search%23Administration [05:43:04] PROBLEM - Check systemd state on logstash2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:47:06] RECOVERY - dump of es5 in codfw on alert1001 is OK: Last dump for es5 at codfw (es2025.codfw.wmnet) taken on 2020-12-15 00:00:02 (1270 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [05:58:44] RECOVERY - dump of es4 in codfw on alert1001 is OK: Last dump for es4 at codfw (es2022.codfw.wmnet) taken on 2020-12-15 00:00:02 (1293 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [06:08:16] RECOVERY - Check systemd state on logstash2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:09:38] RECOVERY - ElasticSearch health check for shards on 9200 on logstash2005 is OK: OK - elasticsearch status production-logstash-codfw: active_primary_shards: 456, status: green, active_shards_percent_as_number: 100.0, relocating_shards: 0, number_of_pending_tasks: 0, cluster_name: production-logstash-codfw, delayed_unassigned_shards: 0, unassigned_shards: 0, number_of_nodes: 6, active_shards: 862, number_of_in_flight_fetch: 0, task [06:09:38] ueue_millis: 0, timed_out: False, number_of_data_nodes: 3, initializing_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration [06:16:42] !log marostegui@cumin1001 START - Cookbook sre.dns.netbox [06:16:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:23:09] !log marostegui@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [06:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:49:15] (03CR) 10Elukey: [C: 03+2] kerberos: explicitly set KRB5CCNAME [puppet] - 10https://gerrit.wikimedia.org/r/649415 (https://phabricator.wikimedia.org/T255262) (owner: 10Elukey) [06:51:22] (03CR) 10Elukey: "Moritz: I am going to merge this to unblock the testing of the /run/user/$id/krb_cred credential cache on stat100x, but please let me know" [puppet] - 
10https://gerrit.wikimedia.org/r/649415 (https://phabricator.wikimedia.org/T255262) (owner: 10Elukey) [06:54:48] (03PS1) 10Elukey: kerberos: fix KRB5CCNAME location in kerberos-run-command [puppet] - 10https://gerrit.wikimedia.org/r/649492 [06:55:22] (03CR) 10Elukey: [C: 03+2] kerberos: fix KRB5CCNAME location in kerberos-run-command [puppet] - 10https://gerrit.wikimedia.org/r/649492 (owner: 10Elukey) [07:14:01] (03PS1) 10Elukey: profile::presto::client: allow to customize the kerberos ccache path [puppet] - 10https://gerrit.wikimedia.org/r/649495 (https://phabricator.wikimedia.org/T255262) [07:15:03] (03PS2) 10Elukey: profile::presto::client: allow to customize the kerberos ccache path [puppet] - 10https://gerrit.wikimedia.org/r/649495 (https://phabricator.wikimedia.org/T255262) [07:21:49] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27141/console" [puppet] - 10https://gerrit.wikimedia.org/r/649495 (https://phabricator.wikimedia.org/T255262) (owner: 10Elukey) [07:24:49] (03CR) 10Elukey: sqoop: Ensure /tmp/sqoop-jars/ is present (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/644347 (https://phabricator.wikimedia.org/T251788) (owner: 10Razzi) [07:25:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2131', diff saved to https://phabricator.wikimedia.org/P13547 and previous config saved to /var/cache/conftool/dbconfig/20201215-072513-marostegui.json [07:25:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:26:35] (03CR) 10Elukey: [V: 03+1 C: 03+2] profile::presto::client: allow to customize the kerberos ccache path [puppet] - 10https://gerrit.wikimedia.org/r/649495 (https://phabricator.wikimedia.org/T255262) (owner: 10Elukey) [07:42:33] (03CR) 10Elukey: Port the Spicerack interactive module (033 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [07:47:54] RECOVERY - Disk space on dumpsdata1001 is OK: 
DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=dumpsdata1001&var-datasource=eqiad+prometheus/ops [07:49:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1019 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13548 and previous config saved to /var/cache/conftool/dbconfig/20201215-074924-marostegui.json [07:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:50:25] (03PS1) 10Marostegui: instances.yaml: Remove es1019 from dbctl [puppet] - 10https://gerrit.wikimedia.org/r/649582 [07:51:22] (03CR) 10Marostegui: [C: 03+2] instances.yaml: Remove es1019 from dbctl [puppet] - 10https://gerrit.wikimedia.org/r/649582 (owner: 10Marostegui) [07:52:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove es1019 from dbctl', diff saved to https://phabricator.wikimedia.org/P13549 and previous config saved to /var/cache/conftool/dbconfig/20201215-075220-marostegui.json [07:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:53:47] (03PS1) 10Marostegui: es1019: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/649583 (https://phabricator.wikimedia.org/T270159) [07:57:00] RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:58:54] (03CR) 10Marostegui: [C: 03+2] es1019: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/649583 (https://phabricator.wikimedia.org/T270159) (owner: 10Marostegui) [08:08:36] (03CR) 10Filippo Giunchedi: [C: 03+1] icinga: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/648359 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [08:11:37] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: collect zuul error mtail metrics [puppet] - 10https://gerrit.wikimedia.org/r/648329 
(https://phabricator.wikimedia.org/T258821) (owner: 10Ahmon Dancy) [08:26:07] 10Operations, 10netops, 10observability, 10User-fgiunchedi: LibreNMS sends its alerts to Alertmanager, resulting in email notifications to network operations - https://phabricator.wikimedia.org/T267018 (10fgiunchedi) >>! In T267018#6689478, @CDanis wrote: > Does this mean we can deprecate the [[ https://ge... [08:42:13] (03CR) 10Volans: [C: 03+1] "replies inline" (033 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [08:44:17] (03PS1) 10Gergő Tisza: Enable and configure GrowthExperiments on Bangla Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649586 (https://phabricator.wikimedia.org/T266020) [08:52:40] PROBLEM - Prometheus k8s cache not updating on prometheus2003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23k8s_cache_not_updating https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=prometheus2003&var-datasource=codfw+prometheus/ops [08:57:52] RECOVERY - Prometheus k8s cache not updating on prometheus2003 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23k8s_cache_not_updating https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=prometheus2003&var-datasource=codfw+prometheus/ops [09:03:46] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:03:46] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:04:20] checking --^ [09:04:32] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:05:42] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:06:14] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:07:44] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:09:14] !log 
marostegui@cumin1001 START - Cookbook sre.dns.netbox [09:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:14] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:13:54] !log marostegui@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [09:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:14:31] (03CR) 10Jbond: [C: 03+1] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649077 (owner: 10ArielGlenn) [09:16:36] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:18:10] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:18:52] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:19:06] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:19:11] (03CR) 10Jbond: "LGTM however its useful to have ensure_packages at the top of the manifest. 
Its very rare that you want some action to happen before the " (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [09:20:22] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:28:37] !log Stop mysql on es1019 - T270159 [09:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:28:40] T270159: decommission es1019.eqiad.wmnet - https://phabricator.wikimedia.org/T270159 [09:30:15] (03PS1) 10Marostegui: backup1002.cnf: Remove es1019 from the list [puppet] - 10https://gerrit.wikimedia.org/r/649591 (https://phabricator.wikimedia.org/T270159) [09:31:16] (03CR) 10Jcrespo: [C: 03+2] backup1002.cnf: Remove es1019 from the list [puppet] - 10https://gerrit.wikimedia.org/r/649591 (https://phabricator.wikimedia.org/T270159) (owner: 10Marostegui) [09:31:52] (03CR) 10jerkins-bot: [V: 04-1] backup1002.cnf: Remove es1019 from the list [puppet] - 10https://gerrit.wikimedia.org/r/649591 (https://phabricator.wikimedia.org/T270159) (owner: 10Marostegui) [09:32:59] (03CR) 10Jcrespo: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/649591 (https://phabricator.wikimedia.org/T270159) (owner: 10Marostegui) [09:33:11] (03PS1) 10Jbond: lldp: add new per interface neighbours fact [puppet] - 10https://gerrit.wikimedia.org/r/649592 (https://phabricator.wikimedia.org/T268802) [09:34:45] (03CR) 10Marostegui: [C: 03+2] backup1002.cnf: Remove es1019 from the list [puppet] - 10https://gerrit.wikimedia.org/r/649591 (https://phabricator.wikimedia.org/T270159) (owner: 10Marostegui) [09:36:13] (03CR) 10Jcrespo: [C: 03+2] backup1002.cnf: Remove es1019 from the list [puppet] - 10https://gerrit.wikimedia.org/r/649591 (https://phabricator.wikimedia.org/T270159) (owner: 10Marostegui) [09:36:19] (03PS2) 10Jbond: lldp: add new per interface neighbours fact [puppet] - 10https://gerrit.wikimedia.org/r/649592 
(https://phabricator.wikimedia.org/T268802) [09:40:14] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802 (10jbond) @Dwisehaupt I have created a [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/649592 | patch ]] to add a new per interface `neighbours` fact.... [09:41:04] (03PS3) 10Jbond: lldp: add new per interface neighbours fact [puppet] - 10https://gerrit.wikimedia.org/r/649592 (https://phabricator.wikimedia.org/T268802) [09:41:22] (03PS4) 10Jbond: lldp: add new per interface neighbours fact [puppet] - 10https://gerrit.wikimedia.org/r/649592 (https://phabricator.wikimedia.org/T268802) [09:47:50] RECOVERY - dump of es4 in eqiad on alert1001 is OK: Last dump for es4 at eqiad (es1022.eqiad.wmnet) taken on 2020-12-15 00:00:01 (1293 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [09:59:44] (03CR) 10Alexandros Kosiaris: [C: 03+1] eventgate-main - increase replicas from 3 to 5 and mem limit to 600Mi [deployment-charts] - 10https://gerrit.wikimedia.org/r/649360 (https://phabricator.wikimedia.org/T249745) (owner: 10Ottomata) [10:03:20] (03PS1) 10Gergő Tisza: [beta] Set up beta bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649596 (https://phabricator.wikimedia.org/T270165) [10:03:28] RECOVERY - Check systemd state on mc1031 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:04:20] RECOVERY - Check systemd state on mc2031 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:06:43] (03CR) 10Alexandros Kosiaris: [C: 03+1] "LVs created!" 
[puppet] - 10https://gerrit.wikimedia.org/r/648192 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:07:16] (03Abandoned) 10Alexandros Kosiaris: prometheus: Turn on codfw prometheus/k8s-staging [puppet] - 10https://gerrit.wikimedia.org/r/649281 (owner: 10Alexandros Kosiaris) [10:11:06] RECOVERY - dump of es5 in eqiad on alert1001 is OK: Last dump for es5 at eqiad (es1025.eqiad.wmnet) taken on 2020-12-15 00:00:01 (1270 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [10:12:27] (03PS1) 10Alexandros Kosiaris: kubernetes::calico: Switch to useing calico-kubeconfig [puppet] - 10https://gerrit.wikimedia.org/r/649601 [10:12:39] 10Operations, 10serviceops: Memcached is listening to 127.0.0.1 after first puppet runs - https://phabricator.wikimedia.org/T261164 (10jijiki) 05Open→03Declined [10:13:37] (03CR) 10Filippo Giunchedi: [C: 03+1] Enable k8s-staging prometheus instance in codfw [puppet] - 10https://gerrit.wikimedia.org/r/648192 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:14:51] (03CR) 10Alexandros Kosiaris: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27142/console" [puppet] - 10https://gerrit.wikimedia.org/r/649601 (owner: 10Alexandros Kosiaris) [10:16:21] (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 1: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/648192 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:17:52] (03PS1) 10Effie Mouzeli: Swap mc1033 with mc1034 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649602 (https://phabricator.wikimedia.org/T265643) [10:19:07] (03CR) 10jerkins-bot: [V: 04-1] Swap mc1033 with mc1034 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649602 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [10:21:02] (03PS2) 10Effie Mouzeli: Swap mc1033 with mc1034 for Redis lock manager [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/649602 (https://phabricator.wikimedia.org/T265643) [10:21:30] (03PS1) 10Alexandros Kosiaris: Rename contfigmap.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/649603 [10:21:32] (03PS1) 10Alexandros Kosiaris: Remove system:authenticated from cni rights [deployment-charts] - 10https://gerrit.wikimedia.org/r/649604 [10:21:34] (03PS1) 10Alexandros Kosiaris: calico: Add GlobalNetworkPolicy support [deployment-charts] - 10https://gerrit.wikimedia.org/r/649605 [10:21:36] (03PS1) 10Alexandros Kosiaris: staging-codfw: Ship a default-deny GlobalNetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/649606 [10:21:38] (03CR) 10JMeybohm: [C: 04-1] "Stop. Mount error" [puppet] - 10https://gerrit.wikimedia.org/r/648192 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:22:26] (03PS2) 10Alexandros Kosiaris: Remove system:authenticated from cni rights [deployment-charts] - 10https://gerrit.wikimedia.org/r/649604 [10:22:28] (03PS2) 10Alexandros Kosiaris: calico: Add GlobalNetworkPolicy support [deployment-charts] - 10https://gerrit.wikimedia.org/r/649605 [10:22:30] (03PS2) 10Alexandros Kosiaris: staging-codfw: Ship a default-deny GlobalNetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/649606 [10:28:24] (03CR) 10JMeybohm: [C: 03+1] "Fixed, all good." 
[puppet] - 10https://gerrit.wikimedia.org/r/648192 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:31:52] (03PS1) 10Urbanecm: Initial configuration for madwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649607 (https://phabricator.wikimedia.org/T269437) [10:32:01] (03PS1) 10Phuedx: Revert "mediawiki.base: Add mw.errorLogger.logError()" [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649515 [10:33:12] (03PS1) 10Effie Mouzeli: hiera: upgrade mc1033, mc2033 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649608 (https://phabricator.wikimedia.org/T213089) [10:42:04] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:42:30] (03PS1) 10Majavah: Remove deploymentwiki configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649609 (https://phabricator.wikimedia.org/T198673) [10:44:18] (03CR) 10JMeybohm: [C: 03+1] Remove system:authenticated from cni rights [deployment-charts] - 10https://gerrit.wikimedia.org/r/649604 (owner: 10Alexandros Kosiaris) [10:44:55] (03CR) 10JMeybohm: [C: 03+1] kubernetes::calico: Switch to useing calico-kubeconfig [puppet] - 10https://gerrit.wikimedia.org/r/649601 (owner: 10Alexandros Kosiaris) [10:45:18] (03CR) 10JMeybohm: [C: 03+2] Enable k8s-staging prometheus instance in codfw [puppet] - 10https://gerrit.wikimedia.org/r/648192 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:48:09] (03PS1) 10Urbanecm: Add wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) [10:50:14] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 23601 bytes in 7.398 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:52:55] (03PS1) 10Urbanecm: 
Update tiwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649611 (https://phabricator.wikimedia.org/T263504) [10:52:58] (03PS1) 10Urbanecm: Update tiwiktionary logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649612 (https://phabricator.wikimedia.org/T263504) [10:53:42] (03CR) 10JMeybohm: [C: 03+2] Add k8s-staging prometheus instance datasource [puppet] - 10https://gerrit.wikimedia.org/r/648193 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:56:34] (03CR) 10JMeybohm: [C: 04-1] "Ah, wait. Needs a version Bump in Chart.yaml" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649604 (owner: 10Alexandros Kosiaris) [10:56:49] 10Operations, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Puppet failure on deployment-cache-text06 - https://phabricator.wikimedia.org/T256064 (10jbond) [10:57:14] (03CR) 10VolkerE: [C: 03+1] Revert "mediawiki.base: Add mw.errorLogger.logError()" [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649515 (owner: 10Phuedx) [10:57:18] (03PS1) 10Urbanecm: Enable RC patrol for papwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649613 (https://phabricator.wikimedia.org/T268924) [10:59:43] jouncebot: now [10:59:43] No deployments scheduled for the next 1 hour(s) and 0 minute(s) [11:00:39] (03CR) 10Effie Mouzeli: [C: 03+2] Swap mc1033 with mc1034 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649602 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [11:01:28] (03Merged) 10jenkins-bot: Swap mc1033 with mc1034 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649602 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [11:02:15] (03CR) 10Jdrewniak: [C: 03+2] Revert "mediawiki.base: Add mw.errorLogger.logError()" [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649515 (owner: 10Phuedx) [11:06:15] !log jiji@deploy1001 Synchronized wmf-config/ProductionServices.php: Swap 
mc1033 with mc1034 for Redis lock manager - T265643 (duration: 00m 59s) [11:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:19] T265643: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 [11:09:50] !log Create fake db to trigger data checks alerts for clouddb hosts T267090 [11:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:53] T267090: Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 [11:17:58] !log disable puppet on mc1033, mc2033 for memcached upgrade [11:17:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:47] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: upgrade mc1033, mc2033 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649608 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [11:23:58] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1033.eqiad.wmnet ` The log can be... [11:24:05] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2033.codfw.wmnet ` The log can be... 
[11:25:02] (03Merged) 10jenkins-bot: Revert "mediawiki.base: Add mw.errorLogger.logError()" [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649515 (owner: 10Phuedx) [11:26:56] PROBLEM - High average GET latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-me [11:28:02] RECOVERY - High average GET latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET [11:29:06] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) [11:31:30] (03CR) 10ArielGlenn: "> Patch Set 3: Code-Review+1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649077 (owner: 10ArielGlenn) [11:37:49] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1033.eqiad.wmnet with reason: REIMAGE [11:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:27] (03PS1) 10Cparle: Remove license mapping for search for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649617 (https://phabricator.wikimedia.org/T257938) [11:39:31] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE [11:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:51] (03CR) 10Jbond: [C: 03+2] 
cfssl: update ocsp refresh to work on multi master [puppet] - 10https://gerrit.wikimedia.org/r/649410 (owner: 10Jbond) [11:39:52] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1033.eqiad.wmnet with reason: REIMAGE [11:39:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:36] (03CR) 10jerkins-bot: [V: 04-1] Remove license mapping for search for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649617 (https://phabricator.wikimedia.org/T257938) (owner: 10Cparle) [11:41:18] effie: you ok for me to merge your change [11:41:23] for mc1033 [11:41:34] oh crap [11:41:39] sorry [11:41:43] yes john please do so [11:41:48] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE [11:41:49] (03PS2) 10Cparle: Remove license mapping for search for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649617 (https://phabricator.wikimedia.org/T257938) [11:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:06] effie: no probs :) merged [11:42:23] I should have merged it for what I am doing now, but it is fixable [11:42:26] (03CR) 10Matthias Mullie: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649617 (https://phabricator.wikimedia.org/T257938) (owner: 10Cparle) [11:42:33] ack [11:45:16] PROBLEM - Check systemd state on pki1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:45:25] ^^ this is me [11:46:18] PROBLEM - Check systemd state on dbprov1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:48:48] (03PS1) 10Jbond: cfssl: only install ocsp refresh app if ocsp resource defined [puppet] - 10https://gerrit.wikimedia.org/r/649618 [11:49:20] (03CR) 10jerkins-bot: [V: 04-1] cfssl: only install ocsp refresh app if ocsp resource defined [puppet] - 10https://gerrit.wikimedia.org/r/649618 (owner: 10Jbond) [11:51:26] (03PS2) 10Jbond: cfssl: only install ocsp refresh app if ocsp resource defined [puppet] - 10https://gerrit.wikimedia.org/r/649618 [11:54:38] (03CR) 10Alexandros Kosiaris: "> Patch Set 2: Code-Review-1" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649604 (owner: 10Alexandros Kosiaris) [11:57:22] (03CR) 10Jbond: [C: 03+2] cfssl: only install ocsp refresh app if ocsp resource defined [puppet] - 10https://gerrit.wikimedia.org/r/649618 (owner: 10Jbond) [12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T1200). [12:00:04] Lucas_WMDE, Majavah, and Urbanecm: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:15] I can deploy today! [12:00:19] hello, here, my patch is beta only [12:00:21] I have to have lunch first [12:00:22] (03PS1) 10Hnowlan: install_server: Reimage mw1265 to buster. [puppet] - 10https://gerrit.wikimedia.org/r/649620 (https://phabricator.wikimedia.org/T245757) [12:00:29] I might be back to deploy my change at the end of the window [12:00:32] or not [12:00:35] enjoy your meal then :) [12:00:36] so please go ahead with the rest first :) [12:01:35] Majavah: why are we deleting the config? 
[12:01:49] (03PS1) 10JMeybohm: k8s_infrastructure_users: Fix typo in types [labs/private] - 10https://gerrit.wikimedia.org/r/649621 [12:01:50] Urbanecm: we want to get rid of deploymentwiki [12:01:56] so that beta is more like production [12:02:12] Majavah: yes, but the wiki still exists, people can log in, etc [12:02:12] (03CR) 10JMeybohm: [V: 03+2 C: 03+2] k8s_infrastructure_users: Fix typo in types [labs/private] - 10https://gerrit.wikimedia.org/r/649621 (owner: 10JMeybohm) [12:02:28] (03CR) 10Urbanecm: [C: 03+2] Update tiwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649611 (https://phabricator.wikimedia.org/T263504) (owner: 10Urbanecm) [12:02:50] Urbanecm: that patch removes that wiki so it doesn't exist according to mediawiki [12:03:15] (03Merged) 10jenkins-bot: Update tiwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649611 (https://phabricator.wikimedia.org/T263504) (owner: 10Urbanecm) [12:03:23] (03PS1) 10Jbond: cfssl::ocsp: cast port to int [puppet] - 10https://gerrit.wikimedia.org/r/649622 [12:03:27] well, instructions for deleting a wiki are at https://wikitech.wikimedia.org/wiki/Delete_a_wiki [12:04:27] (03CR) 10Jbond: [C: 03+2] cfssl::ocsp: cast port to int [puppet] - 10https://gerrit.wikimedia.org/r/649622 (owner: 10Jbond) [12:05:15] oops, hadn't seen that page [12:05:27] somehow didn't think about the global references [12:05:32] jayme: fyi i merged your private repo changes [12:05:41] so now I need to find someone with beta cluster shell access? [12:05:45] jbond42: tnaks [12:05:53] ha. thanks! 
:) [12:06:02] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: e1bd906c1d5007361dd25bfdd8260ad8954b3b68: Update tiwiki logos (T263504) (duration: 00m 54s) [12:06:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:06] T263504: Update logos for ti.wikis - https://phabricator.wikimedia.org/T263504 [12:06:16] Majavah: well I'm a deployment-prep root, that itself is not a problem. But your patch misses one thing (deleted.dblist) ;) [12:06:23] (03CR) 10Urbanecm: [C: 03+2] Update tiwiktionary logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649612 (https://phabricator.wikimedia.org/T263504) (owner: 10Urbanecm) [12:06:38] * Majavah wonders if that's still the case with beta [12:07:10] Majavah: hmm, we don't have a deleted-labs.dblist... [12:07:13] (03Merged) 10jenkins-bot: Update tiwiktionary logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649612 (https://phabricator.wikimedia.org/T263504) (owner: 10Urbanecm) [12:07:42] Majavah: can we discuss what to do and what the purpose of deleted.dblist for prod is in a task? 
[12:07:52] yeah, sure, this isn't urgent [12:08:02] deleting a wiki is extremely rare, and I don't want to break anything, even if "just" beta :) [12:08:49] (03PS2) 10Urbanecm: Add wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) [12:08:52] (03PS1) 10Alexandros Kosiaris: admin_ng: Add tiller ClusterRole [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 [12:08:55] totally understand that :D [12:09:22] :) [12:09:27] (03CR) 10Urbanecm: [C: 03+2] Add wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) (owner: 10Urbanecm) [12:10:00] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: b7756de87d1202e3437f6e9c60c3e865f739b5af: Update tiwiktionary logos (T263504) (duration: 00m 55s) [12:10:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:21] (03Merged) 10jenkins-bot: Add wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) (owner: 10Urbanecm) [12:11:22] (03PS2) 10Urbanecm: Enable RC patrol for papwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649613 (https://phabricator.wikimedia.org/T268924) [12:11:24] (03PS1) 10Jbond: cfssl::ocsp: only care about information_schema db [puppet] - 10https://gerrit.wikimedia.org/r/649624 [12:11:59] (03CR) 10JMeybohm: [C: 04-1] admin_ng: Add tiller ClusterRole (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 (owner: 10Alexandros Kosiaris) [12:12:56] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/wikipedia-tagline-ka.svg: 0861bbb3be7ebba43da1dfc8a99ed3682a47a990: Add wordmark and tagline for kawiki (T267776; 1/3) (duration: 00m 55s) [12:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:00] T267776: Add wordmarks and taglines for 
Georgian Wikipedia (kawiki) - https://phabricator.wikimedia.org/T267776 [12:13:14] added a comment to that task [12:13:28] thanks [12:13:41] (03CR) 10Jbond: [C: 03+2] cfssl::ocsp: only care about information_schema db [puppet] - 10https://gerrit.wikimedia.org/r/649624 (owner: 10Jbond) [12:14:01] !log Purge tiwiki and tiwiktionary logos (T263504) [12:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:05] T263504: Update logos for ti.wikis - https://phabricator.wikimedia.org/T263504 [12:15:15] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/wikipedia-wordmark-ka.svg: 0861bbb3be7ebba43da1dfc8a99ed3682a47a990: Add wordmark and tagline for kawiki (T267776; 2/3) (duration: 00m 54s) [12:15:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:16:09] (03CR) 10Urbanecm: [C: 03+2] Enable RC patrol for papwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649613 (https://phabricator.wikimedia.org/T268924) (owner: 10Urbanecm) [12:16:58] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 0861bbb3be7ebba43da1dfc8a99ed3682a47a990: Add wordmark and tagline for kawiki (T267776; 3/3) (duration: 00m 54s) [12:17:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:04] (03Merged) 10jenkins-bot: Enable RC patrol for papwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649613 (https://phabricator.wikimedia.org/T268924) (owner: 10Urbanecm) [12:18:36] (03PS2) 10Urbanecm: [beta] Set up beta bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649596 (https://phabricator.wikimedia.org/T270165) (owner: 10Gergő Tisza) [12:18:41] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 2058fa5f19befd1548e205482ec6fd63abcc1728: Enable RC patrol for papwiki (T268924) (duration: 00m 54s) [12:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:45] T268924: Requesting patrolling 
'on' for pap.wikipedia - https://phabricator.wikimedia.org/T268924 [12:18:50] (03PS2) 10Alexandros Kosiaris: admin_ng: Add tiller ClusterRole [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 [12:18:52] (03PS1) 10Jbond: cfssl::ocsp: no longer need to test file contents [puppet] - 10https://gerrit.wikimedia.org/r/649625 [12:18:59] (03CR) 10Urbanecm: [C: 03+2] [beta] Set up beta bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649596 (https://phabricator.wikimedia.org/T270165) (owner: 10Gergő Tisza) [12:19:24] (03CR) 10Alexandros Kosiaris: admin_ng: Add tiller ClusterRole (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 (owner: 10Alexandros Kosiaris) [12:19:54] (03Merged) 10jenkins-bot: [beta] Set up beta bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649596 (https://phabricator.wikimedia.org/T270165) (owner: 10Gergő Tisza) [12:22:22] (03PS2) 10Jbond: cfssl::ocsp: no longer need to test file contents [puppet] - 10https://gerrit.wikimedia.org/r/649625 [12:23:03] (03CR) 10Jbond: [C: 03+2] cfssl::ocsp: no longer need to test file contents [puppet] - 10https://gerrit.wikimedia.org/r/649625 (owner: 10Jbond) [12:24:11] * Lucas_WMDE back [12:24:19] how goes the deployment window? [12:24:51] Lucas_WMDE: floor is yours [12:24:54] ok! 
[12:24:57] thanks [12:25:56] (03CR) 10JMeybohm: [C: 03+1] admin_ng: Add tiller ClusterRole (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 (owner: 10Alexandros Kosiaris) [12:25:58] (03PS3) 10Lucas Werkmeister (WMDE): Remove propagatePageDeletion setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643734 (owner: 10Itamar Givon) [12:26:03] RECOVERY - Check systemd state on pki1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:26:25] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Remove propagatePageDeletion setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643734 (owner: 10Itamar Givon) [12:27:31] (03Merged) 10jenkins-bot: Remove propagatePageDeletion setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643734 (owner: 10Itamar Givon) [12:28:24] testing on mwdebug1001 pro-forma [12:29:11] looks fine, syncing [12:30:37] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:643734|Remove propagatePageDeletion setting]] (1/2) (duration: 00m 54s) [12:30:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:21] (03CR) 10Alexandros Kosiaris: [C: 03+2] admin_ng: Add tiller ClusterRole [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 (owner: 10Alexandros Kosiaris) [12:32:04] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1033.eqiad.wmnet'] ` and were **ALL** successful. 
[12:32:06] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:643734|Remove propagatePageDeletion setting]] (2/2) (duration: 00m 54s) [12:32:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:32] alright, I’m done as well [12:32:36] anything else? [12:32:43] (03Merged) 10jenkins-bot: admin_ng: Add tiller ClusterRole [deployment-charts] - 10https://gerrit.wikimedia.org/r/649623 (owner: 10Alexandros Kosiaris) [12:32:48] (03CR) 10Effie Mouzeli: [C: 03+1] install_server: Reimage mw1265 to buster. [puppet] - 10https://gerrit.wikimedia.org/r/649620 (https://phabricator.wikimedia.org/T245757) (owner: 10Hnowlan) [12:33:49] !log EU backport+config window done [12:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:35:24] (03PS1) 10JMeybohm: admin_ng: Fix indentation in various places [deployment-charts] - 10https://gerrit.wikimedia.org/r/649628 [12:35:26] (03PS1) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) [12:36:32] (03CR) 10JMeybohm: [C: 03+1] "Wow. Thanks 😊" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649603 (owner: 10Alexandros Kosiaris) [12:41:32] (03CR) 10JMeybohm: [C: 04-1] "This onl" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649605 (owner: 10Alexandros Kosiaris) [12:43:23] RECOVERY - Check systemd state on dbprov1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:44:57] !log akosiaris@deploy1001 helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
[12:44:58] (03PS1) 10Jbond: spec_test: claen up rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [12:44:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:08] !log akosiaris@deploy1001 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [12:45:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:10] !log Add tiller ClusterRole to staging-codfw [12:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:55] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' . [12:45:55] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' . [12:45:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:07] !log deploy zotero to staging-codfw for testing purposes [12:46:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:15] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 44654408 and 5 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:50:09] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 21865704 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:50:43] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2033.codfw.wmnet'] ` and were **ALL** successful. 
[12:51:23] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 695744 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:51:29] (03CR) 10Elukey: Port the Spicerack interactive module (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [12:51:45] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 21888 and 54 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:54:16] (03PS1) 10Elukey: Increase bandit version to avoid tox issues [cookbooks] - 10https://gerrit.wikimedia.org/r/649634 [12:54:18] (03PS1) 10Elukey: sre.hadoop.upgrade-bigtop-distro: use systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649635 (https://phabricator.wikimedia.org/T269919) [12:55:20] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' . [12:55:20] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' . [12:55:20] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' . [12:55:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:55:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:55:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:09] RECOVERY - Check systemd state on rpki1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:56:25] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [12:56:25] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' . 
[12:56:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:27] (03PS2) 10Elukey: sre.hadoop.upgrade-bigtop-distro: use systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649635 (https://phabricator.wikimedia.org/T269919) [12:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:04] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' . [12:57:04] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [12:57:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:41] RECOVERY - Check systemd state on rpki2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:59:00] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' . [12:59:00] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [12:59:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] Urbanecm and Amir1: (Dis)respected human, time to deploy Create new wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T1300). Please do the needful. [13:00:20] \o/ [13:00:43] Amir1: ? [13:12:49] !log akosiaris@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' . [13:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:11] Amir1: ^ [13:17:33] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/649603 (owner: 10Alexandros Kosiaris) [13:19:10] (03Merged) 10jenkins-bot: Rename contfigmap.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/649603 (owner: 10Alexandros Kosiaris) [13:31:31] Amir1: going to merge the first one [13:31:43] Thanks [13:31:48] Let me know what happens [13:31:54] (03PS2) 10Urbanecm: Initial configuration for madwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649607 (https://phabricator.wikimedia.org/T269437) [13:32:03] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for madwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649607 (https://phabricator.wikimedia.org/T269437) (owner: 10Urbanecm) [13:32:15] sure Amir1 [13:32:59] (03Merged) 10jenkins-bot: Initial configuration for madwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649607 (https://phabricator.wikimedia.org/T269437) (owner: 10Urbanecm) [13:33:14] madwiki sounds exciting [13:33:35] hopefully it won't be mad at me for creating it:) [13:33:46] pulled to mwmaint, let's pull the trigger [13:34:28] Amir1: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=muswiki mad wikipedia madwiki mad.wikipedia.org worked fine [13:34:33] db is live [13:34:45] \o. [13:34:48] *\o/ [13:34:59] yes, we should do something about the beta cluster [13:35:15] and it's at s5, as it should be [13:35:16] proceeding [13:35:17] is there a ticket for bnwiki in beta? 
[13:35:32] yes, lemme find it [13:35:39] T270165 [13:35:40] T270165: Create beta bnwiki - https://phabricator.wikimedia.org/T270165 [13:35:51] Amir1: ^ [13:36:07] okay, I'll have fun with it [13:36:20] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating madwiki (T269437) (duration: 00m 55s) [13:36:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:26] you do the prod, let me know if things are not working fine [13:36:26] T269437: Create Wikipedia Madurese - https://phabricator.wikimedia.org/T269437 [13:37:02] Amir1: sure [13:38:06] wiki's live at mwdebug, syncing the rest [13:38:27] let's do the interwiki cache later [13:38:32] sure, I'll skip it [13:39:07] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating madwiki (T269437) (duration: 00m 54s) [13:39:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:55] PROBLEM - Check systemd state on mc1033 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:40:14] !log urbanecm@deploy1001 Synchronized dblists: Creating madwiki (T269437) (duration: 00m 54s) [13:40:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:23] wikiversions going now [13:40:36] !log urbanecm@deploy1001 Scap failed!: 5/8 canaries failed their endpoint checks(https://en.wikipedia.org) [13:40:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:46] opps [13:41:02] how that failed canaries? 
[13:41:11] no idea [13:41:13] reverting that part [13:41:47] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: REVERT: Creating madwiki (T269437) [13:41:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:50] T269437: Create Wikipedia Madurese - https://phabricator.wikimedia.org/T269437 [13:42:14] Amir1: scap said this https://usercontent.irccloud-cdn.com/file/egvBPvkD/image.png [13:42:33] PROBLEM - Check systemd state on mc2033 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:42:38] that's helpful (sarcastic) [13:42:49] let me see if logstash is showing something [13:43:33] oooooh [13:43:53] I see, it's a fun issue from my yesterday's deploy [13:43:54] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 3064 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:44:19] Amir1: I see this `2020-12-15 13:40:36 [X9i81ApAAEgAABr6O6MAAAET] mw1277 enwiki 1.36.0-wmf.21 exception ERROR: [X9i81ApAAEgAABr6O6MAAAET] /wiki/Main_Page PHP Fatal Error from line 27 of /srv/mediawiki/php-1.36.0-wmf.21/extensions/Wikibase/client/includes/Hooks/SkinAfterBottomScriptsHandler.php: Uncaught TypeError: Argument 1 passed to Wikibase\Client\Hooks\SkinAfterBottomScriptsHandler::__construct() must be of the type [13:44:19] string, object given, called in /srv/mediawiki/php-1.36.0-wmf.21/extensions/Wikibase/client/includes/ClientHooks.php on line 198 and defined in /srv/mediawiki/php-1.36.0-wmf.21/extensions/Wikibase/client/includes/Hooks/SkinAfterBottomScriptsHandler.php:27` [13:44:27] we should go there and manually do scap pull [13:44:28] that's only url+host match i can see [13:44:35] Amir1: to all canaries? [13:44:35] yup [13:44:38] why? 
[13:44:49] let's do it, I'll tell you later [13:45:02] okay, let's ask differently, how? [13:45:09] ssh to 10 different hosts? [13:45:15] yeah [13:45:27] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 9 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:45:30] doing 1277 atm [13:45:56] mw1278.eqiad.wmnet [13:46:36] mw1263.eqiad.wmnet [13:46:41] and mw1276.eqiad.wmnet [13:46:54] done 1279 [13:47:21] Amir1: before you get to another host [13:47:31] I just ran scap pull at mw1276 [13:47:40] but /srv/mediawiki/wikiversions.php does NOT have madwiki [13:47:51] scap pull doesn't seem to rebuild wikiversions.php [13:48:28] we are not there yet [13:48:30] right? [13:48:35] you were syncing db lists [13:48:37] Amir1: yes, I was syncing wikiversions [13:48:42] and that is what failed canaries [13:49:22] 10Operations, 10Phabricator, 10Security: Phabricator unresponsive - https://phabricator.wikimedia.org/T270184 (10jbond) p:05Triage→03Medium [13:49:28] I see [13:49:36] i followed the checklist at T269437, synced db-eqiad.php, db-codfw.php and dblists (that went fine), and then ran scap sync-wikiversions, which failed canaries [13:49:36] T269437: Create Wikipedia Madurese - https://phabricator.wikimedia.org/T269437 [13:50:22] The strange part is that logstash, says different hosts have failed [13:50:29] https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors [13:51:14] how stupid idea would it be to retry running sync-wikiversions? [13:51:40] I was about to suggest that [13:52:18] Urbanecm: is it in the json file? [13:52:33] Amir1: yes [13:53:27] Amir1: so, let's try to sync it once again? 
[13:53:52] yeah, just make sure the revert is not in the deploy node [13:54:10] yup, I reverted it already [13:54:23] second try of wikiversions [13:54:43] let's try [13:55:00] canaries passed [13:55:07] \o/ [13:55:28] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating madwiki (T269437) [13:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:32] T269437: Create Wikipedia Madurese - https://phabricator.wikimedia.org/T269437 [13:55:38] great, so it worked [13:56:19] so the error was sorta expected during my yesterday's backport, so I did --force [13:56:40] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating madwiki (T269437) (duration: 00m 53s) [13:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:48] I assume, it didn't sync to canaries, when we tried to sync all, it failed [13:56:56] so a manual scap pull should have fixed it [13:57:21] so, one last sync, and we're done with this one [13:57:30] \o/ [13:58:06] !log urbanecm@deploy1001 Synchronized langlist: Creating madwiki (T269437) (duration: 00m 54s) [13:58:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:36] (03PS1) 10Filippo Giunchedi: alertmanager: add peering@ contact/team [puppet] - 10https://gerrit.wikimedia.org/r/649645 (https://phabricator.wikimedia.org/T267018) [13:58:38] (03PS1) 10Filippo Giunchedi: alertmanager: rename irc contact to sre-irc [puppet] - 10https://gerrit.wikimedia.org/r/649646 (https://phabricator.wikimedia.org/T267018) [14:01:31] (03PS2) 10Urbanecm: Initial configuration for wawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646736 (https://phabricator.wikimedia.org/T269431) [14:01:42] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for wawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646736 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [14:02:31] (03CR) 
10jerkins-bot: [V: 04-1] Initial configuration for wawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646736 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [14:02:36] what? [14:02:49] (03PS3) 10Urbanecm: Initial configuration for eowikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646734 (https://phabricator.wikimedia.org/T269426) [14:03:35] (03PS3) 10Urbanecm: Initial configuration for wawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646736 (https://phabricator.wikimedia.org/T269431) [14:04:13] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for wawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646736 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [14:04:32] Urbanecm: Today seems to be really fun [14:04:47] are you expecting more failures to come? :) [14:05:02] indeed, it's 2020 [14:05:03] (03Merged) 10jenkins-bot: Initial configuration for wawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646736 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [14:05:19] :) [14:05:22] pulling to mwmaint [14:06:10] addWiki succeeded [14:06:39] database exists, and is at s5 [14:06:55] syncing db- files [14:07:43] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating wawikisource (T269431) (duration: 00m 55s) [14:07:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:48] T269431: Create Wikisource Walloon - https://phabricator.wikimedia.org/T269431 [14:08:49] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating wawikisource (T269431) (duration: 00m 54s) [14:08:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:57] dblists now [14:09:46] !log urbanecm@deploy1001 Synchronized dblists: Creating wawikisource (T269431) (duration: 00m 54s) [14:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:58] wikiversions 
coming now [14:11:02] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating wawikisource (T269431) [14:11:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:12] static files... [14:11:44] (03PS2) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [14:12:04] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating wawikisource (T269431) (duration: 00m 54s) [14:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:32] IS.php, and that'd be the last sync here [14:13:12] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating wawikisource (T269431) (duration: 00m 54s) [14:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:15] T269431: Create Wikisource Walloon - https://phabricator.wikimedia.org/T269431 [14:13:19] Jhs: wawikisource is ready for you [14:13:35] Wikisources are a pain in the ass, I will do those last :) [14:13:39] okay :D [14:13:46] so, let's go with eowikivoyage [14:13:55] (03PS4) 10Urbanecm: Initial configuration for eowikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646734 (https://phabricator.wikimedia.org/T269426) [14:14:30] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for eowikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646734 (https://phabricator.wikimedia.org/T269426) (owner: 10Urbanecm) [14:14:59] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [14:15:37] (03Merged) 10jenkins-bot: Initial configuration for eowikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646734 (https://phabricator.wikimedia.org/T269426) (owner: 10Urbanecm) [14:15:43] (03PS9) 10Urbanecm: Initial configuration for skrwiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/643524 (https://phabricator.wikimedia.org/T268410) [14:16:48] * Amir1 waves at Jhs [14:17:09] 👋 [14:18:41] (03PS7) 10Urbanecm: Initial configuration for skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643526 (https://phabricator.wikimedia.org/T268448) [14:18:56] pulling to mwmaint [14:20:25] Amir1: I'm afraid I did a mistake [14:20:33] I mistakenly ran addWiki _before_ pulling [14:20:45] it should just fail [14:20:48] right? [14:20:57] it failed, but it did create the db (at s5, fortunately) [14:21:19] 10Operations, 10InternetArchiveBot, 10Traffic, 10Platform Team Workboards (Clinic Duty Team): IAbot sending a huge volume of action=raw requests - https://phabricator.wikimedia.org/T269914 (10holger.knust) [14:21:19] okay, where it failed? [14:21:35] we need to redo the rest [14:21:45] Amir1: see traceback https://www.irccloud.com/pastebin/6QGRzNva/ [14:22:20] it's just the main page [14:22:30] so, it might be terrible but I have done it before [14:22:32] pull [14:22:35] run eval.php [14:22:44] pulled [14:22:44] copy paste the part for creating the main page [14:23:01] Amir1: eval for muswiki or eowikivoyage? [14:23:17] eowikivoyage, if it didn't work, muswiki [14:23:40] no version entry for `eowikivoyage`., presumably because i didn't sync wikiversions yet [14:23:55] okay then, muswiki is [14:24:22] just make sure you don't create the main page in muswiki :D [14:24:46] happily it's a closed wiki - would be worse if i used dewiki :D [14:25:24] haha true [14:25:58] !log akosiaris@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' . 
[14:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:26:26] Amir1: so, running https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/blob/master/addWiki.php#L148 in muswiki's shell.php [14:26:40] MediaWikiServices::getInstance()->getRevisionStoreFactory()->getRevisionStore('eowikivoyage') says InvalidArgumentException with message 'Unable to find type name' [14:28:17] I think some parts from the first lines need to be ran too [14:28:41] let me give it a try, it's messy, that's what I really want to break addWiki.php [14:29:37] Amir1: never mind, quitting and reopening shell.php fixed it [14:29:46] haha [14:31:36] service redefined successfully [14:33:31] (03PS3) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [14:34:37] once main page is done, we have $this->setFundraisingLink( $domain, $lang ); [14:34:55] Populate sites table (this should be idempotent) [14:35:04] Sets up the filebackend zones (this should be idempotent) [14:35:56] MassMessage stuff [14:36:38] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [14:40:08] !log Configure replication on x2 codfw hosts T269324 [14:40:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:12] T269324: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 [14:40:18] (03PS1) 10Elukey: sre.hadoop.change-distro-from-cdh: move to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) [14:44:31] (03PS1) 10Alexandros Kosiaris: apertium: Lower replicas to 4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/649654 [14:47:31] (03PS1) 10Elukey: sre.hadoop.change-distro-from-cdh: use systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649655 [14:47:50] 
okay, wiki's live, syncing [14:48:53] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating eowikivoyage (T269426) (duration: 00m 58s) [14:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:57] T269426: Create Wikivoyage Esperanto - https://phabricator.wikimedia.org/T269426 [14:50:43] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating eowikivoyage (T269426) (duration: 00m 54s) [14:50:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:40] !log urbanecm@deploy1001 Synchronized dblists: Creating eowikivoyage (T269426) (duration: 00m 54s) [14:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:52] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating eowikivoyage (T269426) [14:52:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:57] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating eowikivoyage (T269426) (duration: 00m 54s) [14:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:00] T269426: Create Wikivoyage Esperanto - https://phabricator.wikimedia.org/T269426 [14:55:07] (03PS10) 10Urbanecm: Initial configuration for skrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643524 (https://phabricator.wikimedia.org/T268410) [14:55:10] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for skrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643524 (https://phabricator.wikimedia.org/T268410) (owner: 10Urbanecm) [14:55:15] Jhs: eowikivoyage is done [14:55:47] Urbanecm, gr8! 
[14:56:01] not sure what it means, but thank you :D [14:56:09] (03Merged) 10jenkins-bot: Initial configuration for skrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643524 (https://phabricator.wikimedia.org/T268410) (owner: 10Urbanecm) [14:56:36] Urbanecm: gr + eight :D [14:56:43] aha :D [14:57:22] * Amir1 remembers sk8er boi by Avril [14:57:38] skrwiki db created [14:58:00] syncing the necessary files [14:58:40] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating skrwiki (T268410) (duration: 00m 54s) [14:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:43] T268410: Create Wikipedia Saraiki - https://phabricator.wikimedia.org/T268410 [14:59:17] (03PS4) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [14:59:46] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating skrwiki (T268410) (duration: 00m 54s) [14:59:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:44] !log urbanecm@deploy1001 Synchronized dblists: Creating skrwiki (T268410) (duration: 00m 55s) [15:00:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:09] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating skrwiki (T268410) [15:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:16] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [15:03:07] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating skrwiki (T268410) (duration: 00m 54s) [15:03:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:24] !log reboot ms-be2031 - T269337 [15:03:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:28] T269337: Add 
ms-be20[58-61] to swift - https://phabricator.wikimedia.org/T269337 [15:04:08] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating skrwiki (T268410) (duration: 00m 54s) [15:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:12] T268410: Create Wikipedia Saraiki - https://phabricator.wikimedia.org/T268410 [15:05:05] !log urbanecm@deploy1001 Synchronized langlist: Creating skrwiki (T268410) (duration: 00m 54s) [15:05:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:26] (03PS8) 10Urbanecm: Initial configuration for skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643526 (https://phabricator.wikimedia.org/T268448) [15:05:29] (03PS2) 10Urbanecm: Add skrwiki and skrwiktionary to rtl.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646732 (https://phabricator.wikimedia.org/T268448) [15:05:33] Urbanecm: don't forget the rtl thingy for skr [15:05:43] Amir1: yup, thanks for the reminder [15:05:49] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643526 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:06:25] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643526 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:06:28] (03PS5) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [15:06:33] (03CR) 10jerkins-bot: [V: 04-1] Add skrwiki and skrwiktionary to rtl.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646732 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:06:58] damn it [15:07:02] Original exception: [X9jRBQpAMM8AA2srb6cAAABP] 2020-12-15 15:06:45: Fatal exception of type "Exception" skrwiki but I think you know it? 
[15:07:11] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] Initial configuration for skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/643526 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:07:36] I look into that one [15:08:10] aha [15:08:12] bad timezone [15:08:16] I'll revert the wiki to utc [15:08:21] you kidding me [15:08:42] Unknown or bad timezone (Asia/Islamabad) [15:08:46] * revi *look-of-disapproval* [15:08:47] apparently Asia/Islamabad is not valid [15:09:01] we should write a test for that [15:09:04] `Asia/Karachi` [15:09:26] https://time.is/en/Islamabad [15:09:33] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [15:09:49] thank you [15:09:50] Thanks revi [15:10:08] Also https://en.wikipedia.org/wiki/Time_in_Pakistan#IANA_time_zone_database [15:10:15] (03PS1) 10Urbanecm: Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) [15:10:20] (03CR) 10Urbanecm: [C: 03+2] Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:10:58] (03CR) 10jerkins-bot: [V: 04-1] Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:12:20] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix timezone config for skrwiki to unbreak it (T268410) (duration: 00m 54s) [15:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:24] T268410: Create Wikipedia Saraiki - https://phabricator.wikimedia.org/T268410 [15:12:43] verified fix [15:12:52] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] Add skrwiki and skrwiktionary to rtl.dblist [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/646732 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:13:04] thanks revi [15:13:51] (03PS2) 10Urbanecm: Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) [15:13:57] (03CR) 10Urbanecm: [C: 03+2] Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:14:42] (03CR) 10jerkins-bot: [V: 04-1] Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:15:49] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] Fix timezone for skrwiki and skrwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649658 (https://phabricator.wikimedia.org/T268448) (owner: 10Urbanecm) [15:18:07] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating skrwiktionary (T268448) (duration: 00m 55s) [15:18:10] (03CR) 10RLazarus: "Sorry this stalled! Happy to get it merged today." 
[puppet] - 10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) (owner: 10Aklapper) [15:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:12] T268448: Create Wiktionary Saraiki - https://phabricator.wikimedia.org/T268448 [15:19:38] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating skrwiktionary (T268448) (duration: 00m 54s) [15:19:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:42] !log urbanecm@deploy1001 Synchronized dblists: Creating skrwiktionary (T268448) (duration: 00m 57s) [15:20:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:57] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating skrwiktionary (T268448) [15:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:07] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating skrwiktionary (T268448) (duration: 00m 54s) [15:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:22] how many (more) wikis do we have for today [15:23:27] revi: this one is last [15:23:29] goood [15:23:44] so I can go get something to eat (instead of waiting for my no-global-up obsession) [15:24:04] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating skrwiktionary (T268448) (duration: 00m 54s) [15:24:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:08] T268448: Create Wiktionary Saraiki - https://phabricator.wikimedia.org/T268448 [15:24:12] revi: it should be live already [15:24:15] yeah [15:24:29] https://skr.wiktionary.org/w/index.php?title=%D9%88%D8%B1%D8%AA%DD%A8_%D8%A2%D9%84%D8%A7:-revi&action=history already done [15:24:43] I was like refreshing every 1 minute for few mins [15:25:19] okay, let's do the last part [15:25:24] updating interwiki cache [15:25:28] (03PS1) 10Urbanecm: Update 
interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649659 [15:25:30] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649659 (owner: 10Urbanecm) [15:25:34] * Amir1 grabs popcorn [15:25:45] :popcorn: [15:26:00] where do I get the license to sell a popcorn [15:26:17] (03Merged) 10jenkins-bot: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649659 (owner: 10Urbanecm) [15:26:19] hmm, is it a licensed business to sell food? [15:26:31] at least in Korea to open a restaurant you need a permit [15:26:43] basically a rubber stamp that your kitchen is clean [15:26:43] what about a fast food bar? [15:26:53] or lockdown rules? [15:27:03] lockdown rules - takeout/delivery only after 9PM [15:27:05] revi Urbanecm since no one responded in -stewards, either of you available to gblock an lta? [15:27:11] DannyS712: sure [15:27:12] !log urbanecm@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 57s) [15:27:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:13] oh. [15:27:19] martin won. [15:27:19] finally done [15:27:27] two hours and a half, five wikis created [15:27:33] https://meta.wikimedia.org/wiki/Special:GlobalBlock/185.229.226.214 [15:27:40] now I am going to fear if my wikia.com -> fandom.com update caused a hell.... [15:27:41] Thanks Urbanecm. Great work! [15:27:47] no, dont|panic won revi [15:27:51] (change visibility) 16:14, 15 December 2020 Tks4Fish talk contribs block globally blocked User:185.229.226.214 (expiration 15:14, 15 January 2021) (Long-term abuse) [15:27:53] we all lost. [15:28:01] *sob* [15:28:31] yes, it works :yay: [15:28:44] oh, oops, wrong link https://meta.wikimedia.org/wiki/Special:GlobalBlock/94.237.112.14 [15:29:04] done [15:29:07] thanks [15:29:11] np [15:29:24] I am adding them to the -sw bots [15:29:38] for spying watching for vandalism, etc. 
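Editor's note on the skrwiki fatal at 15:07: the cause found in channel was `Unknown or bad timezone (Asia/Islamabad)`, the working identifier was `Asia/Karachi`, and Amir1 suggested "we should write a test for that". Below is a minimal sketch of such a check, not the test that was actually written (the log does not show one). The function name `find_invalid_timezones`, the dict layout, and the `examplewiki` entry are all hypothetical; production config is PHP, and Python is used here only for illustration.

```python
from zoneinfo import available_timezones


def find_invalid_timezones(wiki_timezones, known_zones=None):
    """Return {wiki: zone} for every zone that is not a known identifier."""
    if known_zones is None:
        # Fall back to the tz database visible to Python on this host.
        known_zones = available_timezones()
    return {
        wiki: zone
        for wiki, zone in wiki_timezones.items()
        if zone not in known_zones
    }


# Hypothetical per-wiki timezone map. The skr* names come from the log;
# "examplewiki" and the dict layout are assumptions for illustration.
example_config = {
    "skrwiki": "Asia/Islamabad",
    "skrwiktionary": "Asia/Islamabad",
    "examplewiki": "Asia/Karachi",
}

# A small hand-written zone set keeps the example independent of host tzdata.
example_known = {"UTC", "Asia/Karachi"}

bad = find_invalid_timezones(example_config, example_known)
print(bad)
```

Passing the known zones in as a parameter keeps the check deterministic; calling it without `known_zones` would instead validate against whatever tz database the host provides.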
[15:29:50] thanks SantaPaws :) [15:29:57] ho ho ho [15:32:38] Urbanecm: are you deploying something? I thought I could sneak a config deploy while there's no window.. [15:32:47] Pchelolo: I just finished creating more wikis [15:32:53] the floor is yours Pchelolo [15:33:02] nice! thank you [15:33:12] (03PS3) 10Ppchelko: Enable old revision parser cache on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649432 (https://phabricator.wikimedia.org/T268075) [15:34:23] (03CR) 10Ppchelko: [C: 03+2] Enable old revision parser cache on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649432 (https://phabricator.wikimedia.org/T268075) (owner: 10Ppchelko) [15:34:35] (03CR) 10Alexandros Kosiaris: [C: 03+2] apertium: Lower replicas to 4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/649654 (owner: 10Alexandros Kosiaris) [15:35:18] (03Merged) 10jenkins-bot: Enable old revision parser cache on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649432 (https://phabricator.wikimedia.org/T268075) (owner: 10Ppchelko) [15:35:20] (03PS6) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [15:35:46] 10Operations, 10Research, 10Wikimedia-Apache-configuration, 10Patch-For-Review: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10RLazarus) Happened to see this go by -- I've dropped a single comment on the review... 
[15:35:48] 10Operations, 10Research, 10Wikimedia-Apache-configuration, 10Patch-For-Review: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10RLazarus) a:03RLazarus [15:35:58] (03Merged) 10jenkins-bot: apertium: Lower replicas to 4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/649654 (owner: 10Alexandros Kosiaris) [15:37:06] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [15:37:46] !log akosiaris@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' . [15:37:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:44] !log akosiaris@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' . [15:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:51] (03PS7) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [15:41:57] !log ppchelko@deploy1001 Synchronized wmf-config/CommonSettings.php: gerrit:649432 T268075 Enable old revision parser cache on all wikis CS.php (duration: 00m 56s) [15:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:02] T268075: Enable caching for old revisions in production - https://phabricator.wikimedia.org/T268075 [15:42:45] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [15:43:08] !log ppchelko@deploy1001 Synchronized wmf-config/InitialiseSettings.php: gerrit:649432 T268075 Enable old revision parser cache on all wikis IS.php (duration: 00m 54s) [15:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:29] 
(03PS5) 10Elukey: Port the Spicerack interactive module [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 [15:46:14] 10Operations, 10serviceops, 10MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), 10Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (10Pchelolo) [15:46:24] (03CR) 10Elukey: "@Riccardo: just submitted a change with the "go"/"abort" pattern to kick off the conversation. The msg of the exception is probably horrib" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [15:47:49] (03PS8) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [15:47:51] 10Operations, 10serviceops, 10MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), 10Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (10Pchelolo) 05Open→03Resolved a:03Pchelolo Now old revision vie... [15:49:12] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10RLazarus) @STran Thanks! @aezell Can you please comment here, approving as @STran's manager? @thcipriani Can you please also comment, approving for the deployment group on behal... 
[15:50:15] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10RLazarus) [15:50:24] (03PS1) 10Awight: Add a job for some visualeditor metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) [15:50:43] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [15:52:34] (03PS1) 10Awight: Add a job for CodeMirror metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649661 (https://phabricator.wikimedia.org/T267902) [15:53:50] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10RLazarus) (P.S. Today is technology's department-wide Fun Day, so further progress from Tyler and me might not come until tomorrow; sorry for the inconvenience. Give a yell if thi... [15:54:08] (03PS1) 10Awight: Add a job for TemplateWizard metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649662 (https://phabricator.wikimedia.org/T262209) [15:54:31] (03CR) 10Mforns: [C: 04-1] Add a job for some visualeditor metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [15:56:44] (03CR) 10Volans: [C: 03+2] "Thanks for the fix!" 
[cookbooks] - 10https://gerrit.wikimedia.org/r/649634 (owner: 10Elukey) [15:58:28] (03PS2) 10Awight: Add a job for TemplateWizard metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649662 (https://phabricator.wikimedia.org/T262209) [15:58:34] (03CR) 10Volans: [C: 03+1] "LGTM, couple of nits inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/649635 (https://phabricator.wikimedia.org/T269919) (owner: 10Elukey) [15:58:41] (03CR) 10Mforns: Add a job for TemplateWizard metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649662 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [15:58:49] (03Merged) 10jenkins-bot: Increase bandit version to avoid tox issues [cookbooks] - 10https://gerrit.wikimedia.org/r/649634 (owner: 10Elukey) [15:59:31] (03PS2) 10Awight: Add a job for some visualeditor metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) [16:00:10] (03PS2) 10Awight: Add a job for CodeMirror metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649661 (https://phabricator.wikimedia.org/T267902) [16:01:02] (03PS9) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [16:01:36] (03CR) 10Awight: Add a job for TemplateWizard metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649662 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [16:02:45] (03CR) 10Awight: Add a job for some visualeditor metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [16:04:05] (03CR) 10jerkins-bot: [V: 04-1] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [16:06:32] (03PS4) 10Aklapper: Change redirect for wikimedia.org/research [puppet] - 
10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) [16:07:38] (03CR) 10Dzahn: [C: 03+1] Change redirect for wikimedia.org/research [puppet] - 10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) (owner: 10Aklapper) [16:09:26] 10Operations, 10Research, 10Wikimedia-Apache-configuration, 10Patch-For-Review: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10Dzahn) @leila It's about to get deployed. We apologize for this taking so long. Yes... [16:10:02] (03CR) 10RLazarus: [C: 03+2] Change redirect for wikimedia.org/research [puppet] - 10https://gerrit.wikimedia.org/r/619130 (https://phabricator.wikimedia.org/T259979) (owner: 10Aklapper) [16:10:45] (03PS10) 10Jbond: spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 [16:10:47] (03PS1) 10Jbond: scap: fixup spec test [puppet] - 10https://gerrit.wikimedia.org/r/649666 [16:12:34] !log start of rebuilding sites table across wikis (T269443 T269435 T269430 T268461 T268415) [16:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:49] T269430: Add Wikidata support for eowikivoyage - https://phabricator.wikimedia.org/T269430 [16:12:50] T269443: Add Wikidata support for madwiki - https://phabricator.wikimedia.org/T269443 [16:12:51] T268415: Add Wikidata support for skrwiki - https://phabricator.wikimedia.org/T268415 [16:12:51] T268461: Add Wikidata support for skrwiktionary - https://phabricator.wikimedia.org/T268461 [16:12:52] T269435: Add Wikidata support for wawikisource - https://phabricator.wikimedia.org/T269435 [16:13:51] (03CR) 10Mforns: "This LGTM now, but we should review and merge https://gerrit.wikimedia.org/r/c/analytics/reportupdater-queries/+/649351 first, so that que" [puppet] - 10https://gerrit.wikimedia.org/r/649662 
(https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [16:15:59] (03CR) 10Jbond: [C: 03+2] scap: fixup spec test [puppet] - 10https://gerrit.wikimedia.org/r/649666 (owner: 10Jbond) [16:16:03] (03PS1) 10RLazarus: httpbb: Fix a typo in an assertion [puppet] - 10https://gerrit.wikimedia.org/r/649667 [16:16:05] (03CR) 10Jbond: [C: 03+2] spec_test: clean up Rakefiles to ensure they work with parallel spec [puppet] - 10https://gerrit.wikimedia.org/r/649631 (owner: 10Jbond) [16:16:40] (03CR) 10Dzahn: [C: 03+1] httpbb: Fix a typo in an assertion [puppet] - 10https://gerrit.wikimedia.org/r/649667 (owner: 10RLazarus) [16:18:12] !log reimaging mw1265 to buster [16:18:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:58] (03CR) 10Hnowlan: [C: 03+2] install_server: Reimage mw1265 to buster. [puppet] - 10https://gerrit.wikimedia.org/r/649620 (https://phabricator.wikimedia.org/T245757) (owner: 10Hnowlan) [16:19:01] 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts: ` ['m... 
[16:19:45] (03PS1) 10Alexandros Kosiaris: Revert "apertium: Lower replicas to 4" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649525 [16:19:51] (03CR) 10RLazarus: [C: 03+2] httpbb: Fix a typo in an assertion [puppet] - 10https://gerrit.wikimedia.org/r/649667 (owner: 10RLazarus) [16:20:58] (03PS1) 10Jbond: debian: migrate to shared spec helper [puppet] - 10https://gerrit.wikimedia.org/r/649668 [16:22:46] (03CR) 10Jbond: [C: 03+2] debian: migrate to shared spec helper [puppet] - 10https://gerrit.wikimedia.org/r/649668 (owner: 10Jbond) [16:23:30] (03CR) 10Alexandros Kosiaris: [C: 03+2] Revert "apertium: Lower replicas to 4" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649525 (owner: 10Alexandros Kosiaris) [16:24:29] 10Operations, 10Research, 10Wikimedia-Apache-configuration: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10RLazarus) 05Open→03Resolved Deployed and tested: ` rzl@cumin1001:~$ httpbb /srv/deployment/httpbb-te... 
[16:24:37] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/media/math/check/{type} (Mathoid - check test formula) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [16:24:48] (03Merged) 10jenkins-bot: Revert "apertium: Lower replicas to 4" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649525 (owner: 10Alexandros Kosiaris) [16:25:52] (03CR) 10Elukey: sre.hadoop.upgrade-bigtop-distro: use systemctl mask where needed (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/649635 (https://phabricator.wikimedia.org/T269919) (owner: 10Elukey) [16:27:47] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [16:31:47] !log end of rebuilding sites table across wikis (T269443 T269435 T269430 T268461 T268415) [16:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:58] T269430: Add Wikidata support for eowikivoyage - https://phabricator.wikimedia.org/T269430 [16:31:58] T269443: Add Wikidata support for madwiki - https://phabricator.wikimedia.org/T269443 [16:31:59] T268415: Add Wikidata support for skrwiki - https://phabricator.wikimedia.org/T268415 [16:31:59] T268461: Add Wikidata support for skrwiktionary - https://phabricator.wikimedia.org/T268461 [16:32:00] T269435: Add Wikidata support for wawikisource - https://phabricator.wikimedia.org/T269435 [16:34:46] (03PS4) 10Ottomata: eventgate-main - increase mem limit to 600Mi [deployment-charts] - 10https://gerrit.wikimedia.org/r/649360 (https://phabricator.wikimedia.org/T249745) [16:35:01] (03PS1) 10Jbond: rakefile: add parallel spec without forking [puppet] - 10https://gerrit.wikimedia.org/r/649673 [16:35:51] (03Abandoned) 10Jbond: (WIP) spec test fixes [puppet] - 10https://gerrit.wikimedia.org/r/645187 (owner: 10Jbond) [16:38:34] (03CR) 10Ottomata: [C: 03+2] eventgate-main - increase mem limit to 600Mi 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/649360 (https://phabricator.wikimedia.org/T249745) (owner: 10Ottomata) [16:41:48] !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [16:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:03] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE [16:43:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:30] y [16:44:32] !log otto@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [16:44:32] !log otto@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [16:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:01] 10Operations, 10SRE-Access-Requests: Access to prod mysql from stat1004 - https://phabricator.wikimedia.org/T270196 (10gmodena) [16:45:05] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE [16:45:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:32] (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/27143/icinga2001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/648359 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [16:46:28] (03PS1) 10Ssingh: dnsdist: respond to qtype=ANY queries with NOTIMP [puppet] - 10https://gerrit.wikimedia.org/r/649674 (https://phabricator.wikimedia.org/T252132) [16:47:21] !log otto@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . 
[16:47:21] !log otto@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [16:47:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:46] !log bumped eventgate-main memory limits from 300M to 600M - T249745 [16:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:50] T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 [16:48:13] (03CR) 10Dzahn: "noop confirmed on alert1001, icinga1001" [puppet] - 10https://gerrit.wikimedia.org/r/648359 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [16:50:00] (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27145/console" [puppet] - 10https://gerrit.wikimedia.org/r/649674 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh) [16:51:40] (03PS1) 10Dzahn: icinga: use ensure_packages for all packages and move to top [puppet] - 10https://gerrit.wikimedia.org/r/649679 [16:53:22] (03PS2) 10Jbond: rakefile: add parallel spec without forking [puppet] - 10https://gerrit.wikimedia.org/r/649673 [16:55:49] (03CR) 10Dzahn: puppetmaster: require_package -> ensure_packages (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [16:55:54] (03PS2) 10Dzahn: puppetmaster: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) [16:57:51] 10Operations, 10Research, 10Wikimedia-Apache-configuration: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10Aklapper) a:05RLazarus→03Aklapper Thanks everyone! 
:) [16:57:57] (03PS1) 10Dzahn: puppetmaster: remove absented logrotate conf [puppet] - 10https://gerrit.wikimedia.org/r/649680 [16:58:59] 10Operations, 10Research, 10Wikimedia-Apache-configuration: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10RLazarus) Quick correction -- this is now live on all appservers, but the old URL is still cached by the... [16:59:05] (03PS1) 10Dzahn: puppetmaster: remove absented ganglia-gen and sshknowngen [puppet] - 10https://gerrit.wikimedia.org/r/649681 [17:00:04] jbond42 and cdanis: Time to snap out of that daydream and deploy Puppet request window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T1700). [17:00:15] (03PS1) 10Dzahn: puppetmaster: remove absented puppet-wildcardsign [puppet] - 10https://gerrit.wikimedia.org/r/649683 [17:00:25] window empty [17:00:39] (because they are already done) [17:00:59] cool thanks mutante [17:02:23] (03PS1) 10Jbond: network: migrate to rspec mock syntax [puppet] - 10https://gerrit.wikimedia.org/r/649684 [17:04:02] PROBLEM - Host ms-be2031 is DOWN: PING CRITICAL - Packet loss = 100% [17:04:12] (03CR) 10jerkins-bot: [V: 04-1] network: migrate to rspec mock syntax [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [17:06:39] (03Abandoned) 10Jbond: DO NOT MERGE: change to check the difference paralle_spec makes [puppet] - 10https://gerrit.wikimedia.org/r/645112 (owner: 10Jbond) [17:07:21] 10Operations, 10serviceops, 10MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), 10Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (10jcrespo) Thanks for working on this. Let's monitor parsercache usag... 
[17:08:02] RECOVERY - Host ms-be2031 is UP: PING OK - Packet loss = 0%, RTA = 31.78 ms [17:08:35] \o/ ms-be2031 was me [17:10:43] 10Operations, 10serviceops, 10MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), 10Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (10Pchelolo) > maybe not necessary if it only hits memcache, haven't l... [17:11:14] 10Operations, 10serviceops, 10MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), 10Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (10jcrespo) Ah, cool then. One less thing to worry about, but was goin... [17:12:30] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10wiki_willy) @Cmjohnson and @RobH - per our conversation on IRC, just a heads up to avoid installing the remaining db hosts with IPV6. (reference T270101 for the r... [17:12:40] (03PS1) 10Elukey: Allow connections to dbproxy101[3,5]:3306 in the analytics-in4 filter [homer/public] - 10https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) [17:16:45] (03CR) 10Jbond: [C: 03+2] rakefile: add parallel spec without forking [puppet] - 10https://gerrit.wikimedia.org/r/649673 (owner: 10Jbond) [17:17:31] 10Operations, 10Research, 10Wikimedia-Apache-configuration: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10leila) Thank you all! I really appreciate your work on this. 
[17:18:16] PROBLEM - very high load average likely xfs on ms-be2031 is CRITICAL: CRITICAL - load average: 112.77, 100.50, 56.49 https://wikitech.wikimedia.org/wiki/Swift [17:18:32] godog: it is fighting back --^ :D [17:19:00] elukey: heheh yeah it's been off for two hours, and had 500+ days uptime before that [17:19:01] (03PS1) 10Dzahn: mediawiki/jobrunner: create beta role, remove hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/649707 [17:19:05] stretching its legs [17:19:14] (03PS2) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [17:19:26] (03CR) 10Dduvall: [C: 03+2] Branch commit for wmf/1.36.0-wmf.22 [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649480 (https://phabricator.wikimedia.org/T267415) (owner: 10TrainBranchBot) [17:20:01] (03CR) 10Dzahn: "This would involve changing the role used here: https://openstack-browser.toolforge.org/puppetclass/role::mediawiki::jobrunner" [puppet] - 10https://gerrit.wikimedia.org/r/649707 (owner: 10Dzahn) [17:27:54] (03PS1) 10Dzahn: etcd: add data types, replace hiera with lookup [puppet] - 10https://gerrit.wikimedia.org/r/649708 (https://phabricator.wikimedia.org/T209953) [17:30:06] 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1265.eqiad.wmnet'] ` and were **ALL** successful. [17:32:07] 10Operations, 10Research, 10Wikimedia-Apache-configuration: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10RLazarus) >>! In T259979#6692703, @RLazarus wrote: > Quick correction -- this is now live on all appserve... 
[17:33:34] (03CR) 10Cwhite: [C: 03+1] alertmanager: add peering@ contact/team [puppet] - 10https://gerrit.wikimedia.org/r/649645 (https://phabricator.wikimedia.org/T267018) (owner: 10Filippo Giunchedi) [17:33:51] (03CR) 10Cwhite: [C: 03+1] alertmanager: rename irc contact to sre-irc [puppet] - 10https://gerrit.wikimedia.org/r/649646 (https://phabricator.wikimedia.org/T267018) (owner: 10Filippo Giunchedi) [17:36:14] (03CR) 10Jbond: [C: 03+1] "lgtm thx" [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [17:42:52] PROBLEM - very high load average likely xfs on ms-be2031 is CRITICAL: CRITICAL - load average: 103.14, 101.28, 95.25 https://wikitech.wikimedia.org/wiki/Swift [17:43:29] (03Merged) 10jenkins-bot: Branch commit for wmf/1.36.0-wmf.22 [core] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649480 (https://phabricator.wikimedia.org/T267415) (owner: 10TrainBranchBot) [17:53:41] (03PS1) 10Dduvall: testwikis wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649713 [17:53:43] (03CR) 10Dduvall: [C: 03+2] testwikis wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649713 (owner: 10Dduvall) [17:54:32] (03Merged) 10jenkins-bot: testwikis wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649713 (owner: 10Dduvall) [17:55:07] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Access to prod mysql from stat1004 - https://phabricator.wikimedia.org/T270196 (10Reedy) [17:55:29] !log dduvall@deploy1001 Started scap: testwikis wikis to 1.36.0-wmf.22 [17:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] chrisalbon and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T1800). 
[18:23:17] 10Operations, 10InternetArchiveBot, 10Traffic, 10Platform Team Workboards (Clinic Duty Team): IAbot sending a huge volume of action=raw requests - https://phabricator.wikimedia.org/T269914 (10daniel) We could sleep for a second before sending the 415... [18:41:15] !log dduvall@deploy1001 Finished scap: testwikis wikis to 1.36.0-wmf.22 (duration: 46m 41s) [18:41:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:42] RECOVERY - very high load average likely xfs on ms-be2031 is OK: OK - load average: 53.82, 65.02, 78.38 https://wikitech.wikimedia.org/wiki/Swift [18:48:44] !log dduvall@deploy1001 Pruned MediaWiki: 1.36.0-wmf.20 (duration: 04m 19s) [18:48:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T1900) [19:06:12] (03PS1) 10Ladsgroup: druid: Migrate hiera() to lookup() and set data type in monitoring [puppet] - 10https://gerrit.wikimedia.org/r/649721 (https://phabricator.wikimedia.org/T209953) [19:11:12] 10Operations, 10Traffic: Image fails to load with CORS violation - https://phabricator.wikimedia.org/T270209 (10Majavah) added hopefully right project tags, please correct if I'm wrong [19:14:00] (03CR) 10Ladsgroup: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/27147/" [puppet] - 10https://gerrit.wikimedia.org/r/649721 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [19:14:25] !log joal@deploy1001 Started deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5] [19:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:19] 10Operations, 10Traffic: Image fails to load with CORS violation - https://phabricator.wikimedia.org/T270209 (10DannyS712) [19:31:02] !log joal@deploy1001 Finished deploy [analytics/refinery@2202db5]: Regular analytics weekly train 
[analytics/refinery@2202db5] (duration: 16m 36s) [19:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:22] !log joal@deploy1001 Started deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5] [19:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:30] !log joal@deploy1001 Finished deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5] (duration: 00m 08s) [19:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:04] marxarelli and longma: I, the Bot under the Fountain, allow thee, The Deployer, to do Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201215T2000). [20:03:55] (03PS1) 10Dduvall: group0 wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649728 [20:04:00] (03CR) 10Dduvall: [C: 03+2] group0 wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649728 (owner: 10Dduvall) [20:04:48] (03Merged) 10jenkins-bot: group0 wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649728 (owner: 10Dduvall) [20:06:27] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.22 [20:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:08:46] longma: buttons pushed. awfully quiet [20:09:02] yeah [20:11:35] that seems desirable :) [20:12:29] Is this the last of the year? [20:12:36] RhinosF1: it is [20:13:09] Well this year needs a nice peaceful end so let's hope it stays quiet [20:13:19] :) [20:14:31] :) [20:14:33] hear hear. though i think that's why i'm suspicious :) [20:14:45] but no, looks great [20:15:28] Thanks for all the work you guys have done over the year. You deserve a good Christmas. 
[20:15:39] * RhinosF1 shivers at it being Christmas already [20:15:45] thank you! [20:17:12] this year really was a strange black hole time wise. somehow both infinity vast and small depending on the moment [20:17:17] 10Operations, 10Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (10Isaac) [20:18:16] marxarelli: I don't even know where it went. [20:18:48] the year that time stood still. and yet also flew by [20:19:04] PROBLEM - Check systemd state on mw1265 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:19:14] eaten up by the churn of getting by [20:19:25] hmmm, mw1265 is the buster test victim [20:19:55] apergos: that should go on bash. [20:20:20] ifup@eno1.service, same thing we saw on a couple of buster mc hosts [20:20:20] * RhinosF1 shuts up to let someone look at icinga-wm peacefully [20:20:24] PROBLEM - PHP opcache health on mw1265 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [20:22:10] rzl: would that have been caused by the fpm restart during deploy? [20:22:30] marxarelli: the systemd alert no, the opcache alert... 
plausibly [20:23:03] marxarelli: for context, mw1265 is the first appserver upgraded to debian buster and actually serving traffic [20:23:09] ah ok [20:23:35] I don't think any performance implications are expected, but if anything weird is happening and it's just that machine, something about the upgrade is likely to be the reason :) [20:24:01] interested in hearing which systemd unit failed [20:24:13] the systemd unit was ifup@eno1.service [20:24:17] (03PS2) 10Jdlrobson: wgMinervaCountErrors config was removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649434 (https://phabricator.wikimedia.org/T266359) [20:24:24] (where the buster change is renaming eth1 to eno1) [20:24:28] oh ugh [20:24:36] well that's meh [20:25:02] yeah I think that's an artifact of one sort or another -- we saw it on some but not all of the memcache hosts we also upgraded to buster [20:25:18] weird that it wasn't uniform [20:25:22] the opcache hit ratio is more potentially concerning, and also might or might not be real, but either way it's separate [20:25:47] yeah, it might also be we saw it everywhere but someone fixed it everywhere but on mc[12]033, and I wasn't paying attention to how :) [20:26:00] if you remember, keep me in the loop for anything interesting? I'll be trying out a snapshot testbed host on it $soon [20:26:09] sure thing [20:26:12] next jan most likely, tbd [20:26:19] if it doesn't turn out to be trivial I'll open a task and add you [20:26:24] awesome [20:26:32] !log reset email for User:Cnk1220 [20:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:10] !log group0 to 1.36.0-wmf.22 complete. 
no new errors or concerning rates (refs T267415) [20:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:16] T267415: 1.36.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T267415 [20:29:58] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=gerrit site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:30:19] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10Tchanders) Thanks @RLazarus! We're not in a particular hurry. [20:31:13] marxarelli: I think the opcache hit ratio alert is fine to ignore, it is indeed much lower on mw1265 but that's just an artifact of runtime since restart, especially since it has a lower weight so the cache will warm slower [20:31:29] the usual spike in cache misses associated with the deploy must have just bumped it below the alert threshold [20:31:32] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:35:16] rzl: got it. 
yeah, we're used to seeing timeouts due to opcache repopulation but i'll keep an eye on that server for the remaining of the wek [20:35:20] *week* [20:35:59] marxarelli: 👍 [20:45:10] (03PS1) 10Wolfgang Kandek: Phabricator: adding aggressive crawler IP address to banlist [puppet] - 10https://gerrit.wikimedia.org/r/649734 (https://phabricator.wikimedia.org/T270184) [20:48:01] (03CR) 10Dzahn: [C: 03+1] Phabricator: adding aggressive crawler IP address to banlist [puppet] - 10https://gerrit.wikimedia.org/r/649734 (https://phabricator.wikimedia.org/T270184) (owner: 10Wolfgang Kandek) [20:49:31] (03CR) 10Wolfgang Kandek: [C: 03+2] Phabricator: adding aggressive crawler IP address to banlist [puppet] - 10https://gerrit.wikimedia.org/r/649734 (https://phabricator.wikimedia.org/T270184) (owner: 10Wolfgang Kandek) [21:03:02] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [21:04:32] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [21:13:12] PROBLEM - PHP opcache health on mw1306 is CRITICAL: CRITICAL: opcache full. 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [21:14:46] RECOVERY - PHP opcache health on mw1306 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [21:17:12] PROBLEM - Ensure local MW versions match expected deployment on mw1265 is CRITICAL: CRITICAL: 131 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [21:23:28] (03PS1) 10Wolfgang Kandek: Phabicator: add 2nd IP to local banlist to be able to remove global ban [puppet] - 10https://gerrit.wikimedia.org/r/649737 (https://phabricator.wikimedia.org/T270184) [21:25:18] (03CR) 10Dzahn: [C: 03+1] "both are Vodafone IPs in Germany" [puppet] - 10https://gerrit.wikimedia.org/r/649737 (https://phabricator.wikimedia.org/T270184) (owner: 10Wolfgang Kandek) [21:25:50] (03CR) 10Wolfgang Kandek: [C: 03+2] Phabicator: add 2nd IP to local banlist to be able to remove global ban [puppet] - 10https://gerrit.wikimedia.org/r/649737 (https://phabricator.wikimedia.org/T270184) (owner: 10Wolfgang Kandek) [21:38:51] (03PS3) 10Dzahn: puppetmaster: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) [21:40:56] (03CR) 10Dzahn: [C: 04-1] puppetmaster: require_package -> ensure_packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [21:43:40] (03PS4) 10Dzahn: puppetmaster: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) [21:43:44] (03CR) 10Dzahn: [C: 04-1] puppetmaster: require_package -> ensure_packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [21:45:13] (03CR) 10Marostegui: "I have some questions about the process of inserting the data on the m2-master, should I comment on the task itself 
or here?" [homer/public] - https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) (owner: Elukey)
[21:45:44] (CR) Dzahn: [C: +2] "now works: https://puppet-compiler.wmflabs.org/compiler1003/27150/" [puppet] - https://gerrit.wikimedia.org/r/648358 (https://phabricator.wikimedia.org/T266479) (owner: Dzahn)
[21:46:27] (CR) Dzahn: [C: +2] httpbb: add tests for parsoid servers [puppet] - https://gerrit.wikimedia.org/r/648383 (https://phabricator.wikimedia.org/T268524) (owner: Dzahn)
[21:53:01] (CR) Dzahn: "Sorry, I have no context from https://phabricator.wikimedia.org/T243009 but many other people were involved in that ticket already so it w" [puppet] - https://gerrit.wikimedia.org/r/636074 (https://phabricator.wikimedia.org/T243009) (owner: Ahmon Dancy)
[22:04:59] (PS1) Mholloway: WikimediaEvents: Promote SessionTick to group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/649740 (https://phabricator.wikimedia.org/T248987)
[22:07:32] (CR) Mholloway: [C: +2] WikimediaEvents: Promote SessionTick to group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/649740 (https://phabricator.wikimedia.org/T248987) (owner: Mholloway)
[22:08:37] (Merged) jenkins-bot: WikimediaEvents: Promote SessionTick to group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/649740 (https://phabricator.wikimedia.org/T248987) (owner: Mholloway)
[22:10:45] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Promote SessionTick to group1 T248987 (duration: 01m 04s)
[22:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:10:50] T248987: Session Length Metric.
Web implementation - https://phabricator.wikimedia.org/T248987
[22:10:55] Operations, MW-on-K8s, serviceops, Release Pipeline (Blubber), Release-Engineering-Team (Pipeline): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (Legoktm) Have we ever run composer in production/part of an automated build process like this? For M...
[22:15:14] Operations, Research, Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (Peachey88)
[22:27:44] Operations, Research, Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (Dzahn) @Isaac In general to contact list owners you can mail the special address like `research-internal-owner@lists.wikimedia.org` But I can t...
[22:28:47] Operations, Research, Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (Dzahn) Once you have new admin email addresses for us let us know and we can add them and then reset the password so new admins will get access.
[22:44:52] Operations, MW-on-K8s, serviceops, Release Pipeline (Blubber), Release-Engineering-Team (Pipeline): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (Jdforrester-WMF) npm services build the whole of their (non-dev) npm dependency graph into the image...
[22:59:35] (CR) Dzahn: "I was torn between these 2: a) The version you see here right now, it is in Hiera as "foo/bar" and we split it at "/" and don't have to p" [puppet] - https://gerrit.wikimedia.org/r/648385 (owner: Dzahn)
[23:10:25] (CR) Dzahn: gerrit: use proper hostname on replica hosts (3 comments) [puppet] - https://gerrit.wikimedia.org/r/643919 (owner: Hashar)
[23:24:23] (PS1) PipelineBot: blubberoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/649750
[23:29:32] (CR) Dduvall: [C: +2] blubberoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/649750 (owner: PipelineBot)
[23:31:02] (Merged) jenkins-bot: blubberoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/649750 (owner: PipelineBot)
[23:34:19] !log dduvall@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
[23:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:38:09] (PS1) Dzahn: gerrit: turn gerrit::replica into its own role (WIP, draft) [puppet] - https://gerrit.wikimedia.org/r/649752
[23:45:50] !log dduvall@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
[23:45:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:17] !log dduvall@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
[23:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:48:03] (PS5) Dzahn: All parsoid profiles use_php=true [puppet] - https://gerrit.wikimedia.org/r/577043 (owner: C. Scott Ananian)
[23:48:04] bd808: ^
[23:50:03] marxarelli: yay!
It seems to be working for my test blubber input
[23:50:38] \o/
[23:53:41] (PS1) Wolfgang Kandek: Phabricator: reformat phabbanlist for new IP banning format, remove lines that use old non-working format [puppet] - https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185)
[23:55:07] (CR) jerkins-bot: [V: -1] Phabricator: reformat phabbanlist for new IP banning format, remove lines that use old non-working format [puppet] - https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) (owner: Wolfgang Kandek)
[23:55:22] (CR) Wolfgang Kandek: "Reformatted the phab ban list file to document the new formatting required for IP bans." [puppet] - https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) (owner: Wolfgang Kandek)
[23:58:46] (PS2) Wolfgang Kandek: Phabricator: reformat phabbanlist for new IP banning format, remove lines that use old non-working format [puppet] - https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185)