[00:00:05] <jouncebot>	 twentyafterfour: My dear minions, it's time we take the moon! Just kidding. Time for Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T0000).
[00:33:12] <wikibugs>	 SRE, Security-Team, Wikimedia-Mailing-lists: Upgrade GNU Mailman from 2.1 to Mailman3 - https://phabricator.wikimedia.org/T52864 (Ladsgroup)
[00:33:35] <wikibugs>	 SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Import several public mailing lists archives from mailman2 to lists-next to measure database size - https://phabricator.wikimedia.org/T278609 (Ladsgroup) Open→Resolved a:Ladsgroup Let's call it resolved. Wikitech-l is one of our oldest and big...
[00:56:48] <wikibugs>	 (CR) Ottomata: [C: +2] Release 2020.02~wmf5 [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/677669 (https://phabricator.wikimedia.org/T279480) (owner: Ottomata)
[00:56:50] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Release 2020.02~wmf5 [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/677669 (https://phabricator.wikimedia.org/T279480) (owner: Ottomata)
[01:30:40] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1005-cloudelastic-chi-eqiad on cloudelastic1005 is CRITICAL: 137.3 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1005&panelId=37
[01:32:36] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1006-cloudelastic-chi-eqiad on cloudelastic1006 is CRITICAL: 153.6 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1006&panelId=37
[01:50:00] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:52:22] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:52:56] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is CRITICAL: 109.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1004&panelId=37
[02:07:06] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 106.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1001&panelId=37
[02:50:28] <AaronSchulz>	 !log Restarted importMissingLocalNames.php (mwmaint 1002, wiki=metawiki,batch-size=1000)
[02:50:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:52:02] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 103.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[03:27:44] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[03:43:44] <wikibugs>	 (PS1) Krinkle: [Beta Cluster] Disable wgEnableWANCacheReaper experiment [mediawiki-config] - https://gerrit.wikimedia.org/r/677731
[03:47:31] <wikibugs>	 (PS1) Krinkle: [Beta Cluster] mc: Use new 'wanRoutingPrefix' option [mediawiki-config] - https://gerrit.wikimedia.org/r/677732
[03:47:33] <wikibugs>	 (PS1) Krinkle: [Beta Cluster] mc: Remove unused mcrouterAware/cluster/coalesceKeys [mediawiki-config] - https://gerrit.wikimedia.org/r/677733
[03:47:35] <wikibugs>	 (PS1) Krinkle: mc: Remove unused mcrouterAware/cluster/coalesceKeys [mediawiki-config] - https://gerrit.wikimedia.org/r/677734
[03:47:44] <wikibugs>	 (PS2) Krinkle: [Beta Cluster] mc: Use new 'wanRoutingPrefix' option [mediawiki-config] - https://gerrit.wikimedia.org/r/677732
[03:47:46] <wikibugs>	 (PS2) Krinkle: mc: Add 'wanRoutingPrefix' (replaces 'mcrouterAware' and 'cluster') [mediawiki-config] - https://gerrit.wikimedia.org/r/677418
[03:47:48] <wikibugs>	 (PS2) Krinkle: [Beta Cluster] mc: Remove unused mcrouterAware/cluster/coalesceKeys [mediawiki-config] - https://gerrit.wikimedia.org/r/677733
[03:47:50] <wikibugs>	 (PS2) Krinkle: mc: Remove unused mcrouterAware/cluster/coalesceKeys [mediawiki-config] - https://gerrit.wikimedia.org/r/677734
[04:15:42] <icinga-wm>	 RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[04:19:22] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:21:50] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:36:24] <wikibugs>	 SRE, Traffic, netops, Performance-Team (Radar): experiment with reenabling compression between applayer's TLS terminators and edge caches - https://phabricator.wikimedia.org/T263288 (Krinkle)
[05:00:02] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 104.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[05:12:19] <wikibugs>	 SRE, DBA, Platform Engineering, Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (Marostegui) I am not fully sure I am reading the disk space graph correctly as I don't see an increase there. There's surely an increase on th...
[05:15:11] <wikibugs>	 (PS2) KartikMistry: Update cxserver to 2021-04-07-062518-production [deployment-charts] - https://gerrit.wikimedia.org/r/677557 (https://phabricator.wikimedia.org/T278141)
[05:31:28] <wikibugs>	 (CR) KartikMistry: [C: +2] Update cxserver to 2021-04-07-062518-production [deployment-charts] - https://gerrit.wikimedia.org/r/677557 (https://phabricator.wikimedia.org/T278141) (owner: KartikMistry)
[05:39:37] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2021-04-07-062518-production [deployment-charts] - https://gerrit.wikimedia.org/r/677557 (https://phabricator.wikimedia.org/T278141) (owner: KartikMistry)
[05:42:52] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:43:17] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
[05:43:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:50] <wikibugs>	 SRE, DBA, Platform Engineering, Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (Marostegui) I have done some testing with pc000 in a testing host. Deleted everything under 20 days so simulating that we only keep 20 days in...
[05:45:04] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:54:52] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[05:54:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:20] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[05:58:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:14] <kart_>	 !log Updated cxserver to 2021-04-07-062518-production (T278141, T263139, T271711, T201491, T240525, T207662)
[06:01:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:28] <stashbot>	 T240525: cxserver: Update to core-js@3 - https://phabricator.wikimedia.org/T240525
[06:01:28] <stashbot>	 T278141: cxserver missing important metrics after service-runner 2.8.1 upgrade - https://phabricator.wikimedia.org/T278141
[06:01:29] <stashbot>	 T201491: Fix common typos in code - https://phabricator.wikimedia.org/T201491
[06:01:29] <stashbot>	 T207662: MT processing error: TypeError: key.trim is not a function - https://phabricator.wikimedia.org/T207662
[06:01:29] <stashbot>	 T271711: Update cxserver to service-runner 2.8.1 - https://phabricator.wikimedia.org/T271711
[06:01:29] <stashbot>	 T263139: Show section placeholder before "References" and similar sections - https://phabricator.wikimedia.org/T263139
[06:03:34] <kart_>	 That's lots of fixes :)
[06:15:39] <wikibugs>	 (CR) Elukey: hadoop: add the liblog4j-extras1.2-java jar to HADOOP_CLASSPATH (1 comment) [puppet] - https://gerrit.wikimedia.org/r/677576 (https://phabricator.wikimedia.org/T276906) (owner: Elukey)
[06:17:09] <wikibugs>	 (PS2) Elukey: hadoop: add the liblog4j-extras1.2-java jar to HADOOP_CLASSPATH [puppet] - https://gerrit.wikimedia.org/r/677576 (https://phabricator.wikimedia.org/T276906)
[06:17:21] <wikibugs>	 (CR) Elukey: hadoop: add the liblog4j-extras1.2-java jar to HADOOP_CLASSPATH (1 comment) [puppet] - https://gerrit.wikimedia.org/r/677576 (https://phabricator.wikimedia.org/T276906) (owner: Elukey)
[06:25:06] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28946/console" [puppet] - https://gerrit.wikimedia.org/r/677576 (https://phabricator.wikimedia.org/T276906) (owner: Elukey)
[06:28:27] <wikibugs>	 (PS11) DharmrajRathod98: Improved: regex-validation in cli/recover-dump and added unit test file in test/unit [software/wmfbackups] - https://gerrit.wikimedia.org/r/673693 (https://phabricator.wikimedia.org/T277754)
[06:32:20] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 101.4 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[06:33:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1111 to clone db1177 T275633', diff saved to https://phabricator.wikimedia.org/P15229 and previous config saved to /var/cache/conftool/dbconfig/20210408-063331-marostegui.json
[06:33:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:40] <stashbot>	 T275633: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633
[06:33:48] <marostegui>	 !log Stop MySQL on db1111 to clone db1177 T275633
[06:33:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:46] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 103.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[06:39:02] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:41:06] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:41:45] <logmsgbot>	 !log elukey@deploy1002 Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): (no justification provided)
[06:41:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:51] <wikibugs>	 (CR) Elukey: [V: +1 C: +2] hadoop: add the liblog4j-extras1.2-java jar to HADOOP_CLASSPATH [puppet] - https://gerrit.wikimedia.org/r/677576 (https://phabricator.wikimedia.org/T276906) (owner: Elukey)
[06:44:05] <logmsgbot>	 !log elukey@deploy1002 Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): (no justification provided) (duration: 02m 20s)
[06:44:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:47] <wikibugs>	 (PS1) Marostegui: mariadb: Add db1177 to s8. [puppet] - https://gerrit.wikimedia.org/r/677799 (https://phabricator.wikimedia.org/T275633)
[06:56:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1023 to upgrade kernel and mysql, remove weight from es1021, to leave it as it was yesterday T279281', diff saved to https://phabricator.wikimedia.org/P15231 and previous config saved to /var/cache/conftool/dbconfig/20210408-065627-marostegui.json
[06:56:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:35] <stashbot>	 T279281: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281
[06:57:10] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 104.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[06:59:28] <wikibugs>	 (PS2) Abijeet Patro: Rename wgTranslateBlacklist to wgTranslateExclusionList [mediawiki-config] - https://gerrit.wikimedia.org/r/676909 (https://phabricator.wikimedia.org/T277965)
[07:06:08] <wikibugs>	 (PS1) Elukey: hadoop: improve the HDFS Namenode audit log4j config [puppet] - https://gerrit.wikimedia.org/r/677803 (https://phabricator.wikimedia.org/T276906)
[07:06:52] <wikibugs>	 (PS2) Elukey: hadoop: improve the HDFS Namenode audit log4j config [puppet] - https://gerrit.wikimedia.org/r/677803 (https://phabricator.wikimedia.org/T276906)
[07:09:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15232 and previous config saved to /var/cache/conftool/dbconfig/20210408-070946-root.json
[07:09:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:34] <wikibugs>	 (PS2) Muehlenhoff: Assign mw_rc_irc role to irc1001 [puppet] - https://gerrit.wikimedia.org/r/677509 (https://phabricator.wikimedia.org/T278255)
[07:16:56] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:17:20] <dcausse>	 looking ^
[07:18:44] <wikibugs>	 (PS1) Alexandros Kosiaris: deployment_server: Unify 2 "admin" if exlucsions via filter() [puppet] - https://gerrit.wikimedia.org/r/677805
[07:19:05] <wikibugs>	 (CR) Alexandros Kosiaris: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/677228 (https://phabricator.wikimedia.org/T268434) (owner: JMeybohm)
[07:19:12] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: number_of_nodes: 6, active_shards: 1877, number_of_data_nodes: 6, unassigned_shards: 0, initializing_shards: 0, cluster_name: cloudelastic-chi-eqiad, relocating_shards: 0, timed_out: False, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 100.0, status: green, delayed_unassigne
[07:19:12] <icinga-wm>	 er_of_pending_tasks: 0, number_of_in_flight_fetch: 0, active_primary_shards: 937 https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:20:04] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Assign mw_rc_irc role to irc1001 [puppet] - https://gerrit.wikimedia.org/r/677509 (https://phabricator.wikimedia.org/T278255) (owner: Muehlenhoff)
[07:20:30] <wikibugs>	 (CR) Alexandros Kosiaris: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28947/console" [puppet] - https://gerrit.wikimedia.org/r/677805 (owner: Alexandros Kosiaris)
[07:21:53] <wikibugs>	 (CR) Alexandros Kosiaris: "Thanks. I 've submitted a small followup is https://gerrit.wikimedia.org/r/c/operations/puppet/+/677805 that should make this a bit more r" [puppet] - https://gerrit.wikimedia.org/r/677667 (owner: Legoktm)
[07:23:33] <wikibugs>	 (CR) Ema: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28948/console" [puppet] - https://gerrit.wikimedia.org/r/677580 (https://phabricator.wikimedia.org/T279533) (owner: Ema)
[07:24:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15233 and previous config saved to /var/cache/conftool/dbconfig/20210408-072450-root.json
[07:24:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:47] <wikibugs>	 (PS1) Muehlenhoff: Broadcase IRC events to irc1001 instead of kraz [mediawiki-config] - https://gerrit.wikimedia.org/r/677806 (https://phabricator.wikimedia.org/T224579)
[07:26:25] <wikibugs>	 (PS2) Muehlenhoff: Broadcast IRC events to irc1001 instead of kraz [mediawiki-config] - https://gerrit.wikimedia.org/r/677806 (https://phabricator.wikimedia.org/T224579)
[07:27:34] <wikibugs>	 (CR) Ema: [V: +1 C: +2] vlc: get exp cache admission policy parameters from hiera [puppet] - https://gerrit.wikimedia.org/r/677580 (https://phabricator.wikimedia.org/T279533) (owner: Ema)
[07:34:57] <wikibugs>	 (PS1) Muehlenhoff: Only install git-fat for distros up to Buster [puppet] - https://gerrit.wikimedia.org/r/677807 (https://phabricator.wikimedia.org/T275873)
[07:35:16] <wikibugs>	 (CR) JMeybohm: [C: +1] "Thanks all for taking care of my mess 🙏" [puppet] - https://gerrit.wikimedia.org/r/677805 (owner: Alexandros Kosiaris)
[07:35:21] <wikibugs>	 (CR) Elukey: [C: +2] hadoop: improve the HDFS Namenode audit log4j config [puppet] - https://gerrit.wikimedia.org/r/677803 (https://phabricator.wikimedia.org/T276906) (owner: Elukey)
[07:39:40] <wikibugs>	 (PS1) Muehlenhoff: New component for PostGIS 3 backport [puppet] - https://gerrit.wikimedia.org/r/677808 (https://phabricator.wikimedia.org/T277064)
[07:39:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15234 and previous config saved to /var/cache/conftool/dbconfig/20210408-073953-root.json
[07:40:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:30] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
[07:42:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15235 and previous config saved to /var/cache/conftool/dbconfig/20210408-074524-marostegui.json
[07:45:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:44] <wikibugs>	 (PS1) Ema: varnish: test setting exp policy parameters in labs [puppet] - https://gerrit.wikimedia.org/r/677810 (https://phabricator.wikimedia.org/T279533)
[07:49:04] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=udpmxircecho site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:49:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove weight from es5 master', diff saved to https://phabricator.wikimedia.org/P15236 and previous config saved to /var/cache/conftool/dbconfig/20210408-074911-marostegui.json
[07:49:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:40] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
[07:50:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:22] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Add db1177 to s8. [puppet] - https://gerrit.wikimedia.org/r/677799 (https://phabricator.wikimedia.org/T275633) (owner: Marostegui)
[07:51:28] <wikibugs>	 (PS2) Marostegui: mariadb: Add db1177 to s8. [puppet] - https://gerrit.wikimedia.org/r/677799 (https://phabricator.wikimedia.org/T275633)
[07:53:51] <wikibugs>	 (PS2) Ema: varnish: test setting exp policy parameters in labs [puppet] - https://gerrit.wikimedia.org/r/677810 (https://phabricator.wikimedia.org/T279533)
[07:54:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15237 and previous config saved to /var/cache/conftool/dbconfig/20210408-075457-root.json
[07:55:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:28] <icinga-wm>	 PROBLEM - ircecho bot process on irc1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[07:58:32] <wikibugs>	 (CR) Ema: [C: +2] varnish: test setting exp policy parameters in labs [puppet] - https://gerrit.wikimedia.org/r/677810 (https://phabricator.wikimedia.org/T279533) (owner: Ema)
[08:03:12] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
[08:03:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:22] <wikibugs>	 (CR) Alexandros Kosiaris: [V: +1 C: +2] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/677805 (owner: Alexandros Kosiaris)
[08:03:53] <wikibugs>	 SRE, Traffic: cache_upload cache policy + large_objects_cutoff concerns - https://phabricator.wikimedia.org/T275809 (ema)
[08:04:08] <wikibugs>	 SRE, Traffic, Patch-For-Review: Add exp cache admission policy parameters to hiera - https://phabricator.wikimedia.org/T279533 (ema) Open→Resolved After changing `exp_policy_rate` and `exp_policy_base` in hiera for traffic-cache-atstext-buster, the rendered VCL now looks like this: ` +// Incl...
[08:04:29] <wikibugs>	 (PS1) Alexandros Kosiaris: kubernetes1017: Add kubelet node labels [puppet] - https://gerrit.wikimedia.org/r/677811
[08:04:57] <wikibugs>	 (CR) Muehlenhoff: [C: +2] New component for PostGIS 3 backport [puppet] - https://gerrit.wikimedia.org/r/677808 (https://phabricator.wikimedia.org/T277064) (owner: Muehlenhoff)
[08:05:21] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] kubernetes1017: Add kubelet node labels [puppet] - https://gerrit.wikimedia.org/r/677811 (owner: Alexandros Kosiaris)
[08:05:45] <moritzm>	 akosiaris: shall I merge your patch along?
[08:05:54] <akosiaris>	 moritzm: merge mine as well please :-)
[08:06:10] <moritzm>	 done
[08:06:16] <akosiaris>	 danke!
[08:06:55] <wikibugs>	 (CR) David Caro: [C: +1] "No problem" [puppet] - https://gerrit.wikimedia.org/r/677663 (https://phabricator.wikimedia.org/T276509) (owner: Andrew Bogott)
[08:08:50] <wikibugs>	 (PS4) Alexandros Kosiaris: Segment values.yaml between teams [deployment-charts] - https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: Elukey)
[08:09:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] Segment values.yaml between teams [deployment-charts] - https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: Elukey)
[08:10:56] <wikibugs>	 (CR) Alexandros Kosiaris: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: Elukey)
[08:11:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15238 and previous config saved to /var/cache/conftool/dbconfig/20210408-081059-root.json
[08:11:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] Segment values.yaml between teams [deployment-charts] - https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: Elukey)
[08:12:21] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] admin: Switch values/values.yaml to common.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/676935 (owner: Alexandros Kosiaris)
[08:13:39] <wikibugs>	 (Merged) jenkins-bot: admin: Switch values/values.yaml to common.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/676935 (owner: Alexandros Kosiaris)
[08:14:14] <wikibugs>	 (PS3) Alexandros Kosiaris: admin: Switch usages to internal kubernetes API, with exceptions [deployment-charts] - https://gerrit.wikimedia.org/r/676936
[08:14:16] <wikibugs>	 (PS5) Alexandros Kosiaris: Segment values.yaml between teams [deployment-charts] - https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: Elukey)
[08:15:00] <wikibugs>	 (PS8) Effie Mouzeli: hieradata: remove parsoidJS from production 3 [puppet] - https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059)
[08:15:12] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
[08:15:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:51] <moritzm>	 !log imported postgis 3.1.1+dfsg-1~wmf1 to component/postgis for buster-wikimedia T277064
[08:17:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:01] <stashbot>	 T277064: Packaging PostGIS 3.1 for the new Maps stack - https://phabricator.wikimedia.org/T277064
[08:18:36] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] hieradata: remove parsoidJS from production 3 [puppet] - https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059) (owner: Effie Mouzeli)
[08:19:58] <wikibugs>	 (CR) Alexandros Kosiaris: "Tested, noop as expected" [deployment-charts] - https://gerrit.wikimedia.org/r/676935 (owner: Alexandros Kosiaris)
[08:21:14] <wikibugs>	 SRE, Maps, Packaging, serviceops: Packaging PostGIS 3.1 for the new Maps stack - https://phabricator.wikimedia.org/T277064 (MoritzMuehlenhoff)
[08:21:20] <icinga-wm>	 RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 66.21 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[08:22:10] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1002 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[08:22:16] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1003 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[08:24:12] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1002 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[08:24:18] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1003 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[08:24:50] <marostegui>	 !log Stop MySQL on all db1117 sections to upgrade kernel
[08:24:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:58] <marostegui>	 ^ this will cause haproxy irc alerts
[08:26:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15239 and previous config saved to /var/cache/conftool/dbconfig/20210408-082603-root.json
[08:26:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:38] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] admin: Switch usages to internal kubernetes API, with exceptions [deployment-charts] - https://gerrit.wikimedia.org/r/676936 (owner: Alexandros Kosiaris)
[08:27:34] <icinga-wm>	 PROBLEM - Disk space on stat1008 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=87%): /tmp 0 MB (0% inode=87%): /var/tmp 0 MB (0% inode=87%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=stat1008&var-datasource=eqiad+prometheus/ops
[08:28:05] <wikibugs>	 (03Merged) 10jenkins-bot: admin: Switch usages to internal kubernetes API, with exceptions [deployment-charts] - 10https://gerrit.wikimedia.org/r/676936 (owner: 10Alexandros Kosiaris)
[08:28:12] <wikibugs>	 10SRE, 10Scap, 10Python3-Porting: Porting scap to Python 3 - https://phabricator.wikimedia.org/T279628 (10MoritzMuehlenhoff)
[08:28:22] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1021 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[08:28:36] <marostegui>	 ^ expected
[08:28:38] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1014 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[08:28:54] <marostegui>	 ^ same
[08:29:18] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1017 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[08:29:20] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1013 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[08:31:45] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, nice work! (despite the curator version) just a few nits inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394) (owner: Cwhite)
[08:33:52] <moritzm>	 !log installing remaining curl security updates for buster
[08:33:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:56] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1017 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:34:58] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1013 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:35:12] <wikibugs>	 (CR) JMeybohm: [C: -1] "Wow, that's quite some chart! 😊" (37 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327) (owner: Giuseppe Lavagetto)
[08:35:16] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1021 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:35:36] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1014 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:36:07] <wikibugs>	 (03PS9) 10Effie Mouzeli: hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059)
[08:37:57] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[08:38:03] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[08:38:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:31] <wikibugs>	 (03PS1) 10Ema: varnish: add script to test exp policy offline [puppet] - 10https://gerrit.wikimedia.org/r/677814 (https://phabricator.wikimedia.org/T275809)
[08:38:43] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'sync'.
[08:38:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:43] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
[08:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:18] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [codfw] START helmfile.d/admin 'sync'.
[08:40:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:06] <wikibugs>	 (03CR) 10Ema: [C: 03+2] varnish: add script to test exp policy offline [puppet] - 10https://gerrit.wikimedia.org/r/677814 (https://phabricator.wikimedia.org/T275809) (owner: 10Ema)
[08:41:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15240 and previous config saved to /var/cache/conftool/dbconfig/20210408-084107-root.json
[08:41:14] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'sync'.
[08:41:14] <icinga-wm>	 RECOVERY - ircecho bot process on irc1001 is OK: PROCS OK: 1 process with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[08:41:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:56] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 103.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[08:43:50] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [eqiad] START helmfile.d/admin 'sync'.
[08:43:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:17] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'sync'.
[08:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:50] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] ceph: run tests on debian 10 buster [puppet] - 10https://gerrit.wikimedia.org/r/677307 (owner: 10David Caro)
[08:46:52] <icinga-wm>	 PROBLEM - ircecho bot process on irc1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[08:47:22] <wikibugs>	 (03PS1) 10Elukey: jupyter: avoid logs to syslog/daemon.log for jupyterhub [puppet] - 10https://gerrit.wikimedia.org/r/677816
[08:47:34] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "> Patch Set 3:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: 10Elukey)
[08:47:49] <elukey>	 akosiaris: \o/ thanks!
[08:48:05] <akosiaris>	 elukey: prego
[08:48:23] <akosiaris>	 thanks for tackling it
[08:48:32] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28949/console" [puppet] - 10https://gerrit.wikimedia.org/r/677816 (owner: 10Elukey)
[08:48:36] <icinga-wm>	 RECOVERY - Disk space on stat1008 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=stat1008&var-datasource=eqiad+prometheus/ops
[08:48:51] <wikibugs>	 (03Merged) 10jenkins-bot: Segment values.yaml between teams [deployment-charts] - 10https://gerrit.wikimedia.org/r/675558 (https://phabricator.wikimedia.org/T278208) (owner: 10Elukey)
[08:49:21] <wikibugs>	 (03CR) 10Elukey: [V: 03+1 C: 03+2] jupyter: avoid logs to syslog/daemon.log for jupyterhub [puppet] - 10https://gerrit.wikimedia.org/r/677816 (owner: 10Elukey)
[08:49:53] <elukey>	 dcaro: o/ ok to merge?
[08:50:09] <dcaro>	 elukey: yep, thanks
[08:50:20] <elukey>	 done :)
[08:56:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15241 and previous config saved to /var/cache/conftool/dbconfig/20210408-085610-root.json
[08:56:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15242 and previous config saved to /var/cache/conftool/dbconfig/20210408-085630-marostegui.json
[08:56:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:37] <wikibugs>	 (03PS1) 10Ema: cache: enable exp caching policy on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/677820 (https://phabricator.wikimedia.org/T275809)
[09:02:06] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] "Oh, wait: Switching the docker cgroup driver to systemd means we need to/should also switch the kubelet cgroup driver to systemd to have t" [puppet] - 10https://gerrit.wikimedia.org/r/524186 (https://phabricator.wikimedia.org/T277876) (owner: 10Alexandros Kosiaris)
[09:02:53] <wikibugs>	 (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/677820 (https://phabricator.wikimedia.org/T275809) (owner: 10Ema)
[09:04:22] <wikibugs>	 (03PS1) 10Elukey: jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822
[09:05:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey)
[09:09:03] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloud email alerts: remove f-strings in case of stretch vms [puppet] - 10https://gerrit.wikimedia.org/r/677599 (owner: 10Bstorm)
[09:09:56] <moritzm>	 !log installing underscore security updates on stretch
[09:10:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:25] <wikibugs>	 (03PS2) 10Elukey: jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822
[09:12:02] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28950/console" [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey)
[09:14:07] <wikibugs>	 (03CR) 10Ema: [C: 03+2] cache: enable exp caching policy on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/677820 (https://phabricator.wikimedia.org/T275809) (owner: 10Ema)
[09:14:33] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[09:14:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:56] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[09:15:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:28] <wikibugs>	 (03PS10) 10Effie Mouzeli: hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059)
[09:17:06] <wikibugs>	 (03PS3) 10Elukey: jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822
[09:17:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[09:18:24] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28952/console" [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey)
[09:20:30] <ema>	 !log cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809
[09:20:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:38] <stashbot>	 T275809: cache_upload cache policy + large_objects_cutoff concerns - https://phabricator.wikimedia.org/T275809
[09:21:04] <wikibugs>	 (03PS11) 10Effie Mouzeli: hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059)
[09:22:09] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[09:24:08] <moritzm>	 !log installing libzstd security updates on buster
[09:24:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:03] <dcaro>	 There are some issues happening (old gc, and unassigned shards check timeout) on the cloudelastic nodes. I created T279636 and tagged it with 'elasticsearch'; if it's not the right one, please let me know
[09:25:03] <stashbot>	 T279636: cloudelastic* timeout while checking shards - https://phabricator.wikimedia.org/T279636
[09:25:20] <wikibugs>	 (03PS12) 10Effie Mouzeli: hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059)
[09:25:48] <logmsgbot>	 !log zpapierski@deploy1002 Started deploy [wikimedia/discovery/analytics@d098717]: T273847 export queries to relforge dag deployment - sensor name fix
[09:25:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:56] <stashbot>	 T273847: Create a elasticsearch/kibana index with queries to allow query completion candidate research - https://phabricator.wikimedia.org/T273847
[09:26:08] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] cloud email alerts: remove f-strings in case of stretch vms [puppet] - 10https://gerrit.wikimedia.org/r/677599 (owner: 10Bstorm)
[09:26:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15243 and previous config saved to /var/cache/conftool/dbconfig/20210408-092608-root.json
[09:26:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:31] <wikibugs>	 (03PS1) 10Ema: Revert "cache: enable exp caching policy on cp5001" [puppet] - 10https://gerrit.wikimedia.org/r/677712
[09:27:00] <wikibugs>	 (CR) Effie Mouzeli: hieradata: remove parsoidJS from production 4 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059) (owner: Effie Mouzeli)
[09:27:29] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "cache: enable exp caching policy on cp5001" [puppet] - 10https://gerrit.wikimedia.org/r/677712 (owner: 10Ema)
[09:27:36] <logmsgbot>	 !log zpapierski@deploy1002 Finished deploy [wikimedia/discovery/analytics@d098717]: T273847 export queries to relforge dag deployment - sensor name fix (duration: 01m 48s)
[09:27:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:45] <wikibugs>	 (03CR) 10Effie Mouzeli: "PCC works https://puppet-compiler.wmflabs.org/compiler1002/28954/" [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[09:29:07] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
[09:29:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:40] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
[09:29:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:49] <wikibugs>	 SRE, ops-codfw, netops: Multiple host down alerts from rack C2 - https://phabricator.wikimedia.org/T279457 (ayounsi) Ok, because of this RTF RMA we're going to replace the switch with a spare. @Papaul Let's chat on IRC to figure out what time would work best for you, then we can notify services owne...
[09:30:23] <moritzm>	 !log installing openssl updates for buster
[09:30:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15244 and previous config saved to /var/cache/conftool/dbconfig/20210408-093151-root.json
[09:31:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:44] <wikibugs>	 (03PS1) 10JMeybohm: Migrate kubernetes infrastructure_users to new syntax [labs/private] - 10https://gerrit.wikimedia.org/r/677825 (https://phabricator.wikimedia.org/T269461)
[09:36:29] <Urbanecm>	 !log Retry server-side upload for T279192
[09:36:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:38] <stashbot>	 T279192: Server side upload for Sturm - https://phabricator.wikimedia.org/T279192
[09:36:42] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 103.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[09:38:58] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+2 C: 03+2] Migrate kubernetes infrastructure_users to new syntax [labs/private] - 10https://gerrit.wikimedia.org/r/677825 (https://phabricator.wikimedia.org/T269461) (owner: 10JMeybohm)
[09:39:51] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add db1177 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/677826 (https://phabricator.wikimedia.org/T275633)
[09:40:41] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] instances.yaml: Add db1177 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/677826 (https://phabricator.wikimedia.org/T275633) (owner: 10Marostegui)
[09:41:05] <wikibugs>	 (03PS1) 10Urbanecm: Enable Growth for newcomers on simplewiki, mswiki, tawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677828 (https://phabricator.wikimedia.org/T278369)
[09:41:12] <godog>	 Urbanecm: neat, thank you for retrying
[09:41:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15246 and previous config saved to /var/cache/conftool/dbconfig/20210408-094112-root.json
[09:41:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:38] <Urbanecm>	 godog: np. I tried it twice the other day, and both attempts failed, so I called you, sorry for bothering :-).
[09:42:04] <effie>	 !log disable puppet in mw* servers for 677114
[09:42:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1177 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15247 and previous config saved to /var/cache/conftool/dbconfig/20210408-094218-marostegui.json
[09:42:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:27] <stashbot>	 T275633: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633
[09:42:47] <godog>	 Urbanecm: no bother, you did the right thing! I think it was temporary indeed due to the swift rebalance in eqiad I started the other day, it does get noisy at the beginning and some PUTs are known to fail
[09:44:22] <Urbanecm>	 !log [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856
[09:44:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:30] <stashbot>	 T278856: Server side upload for Lusccasdeutsch (master task) - https://phabricator.wikimedia.org/T278856
[09:44:47] <Urbanecm>	 godog: good to know :). Anyway, thanks for the help :)
[09:45:28] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[09:46:43] <wikibugs>	 10SRE, 10Wikidata, 10Wikidata Query Builder, 10wdwb-tech, 10User-Addshore: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (10Addshore)
[09:46:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15248 and previous config saved to /var/cache/conftool/dbconfig/20210408-094655-root.json
[09:47:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:11] <wikibugs>	 (03PS13) 10Effie Mouzeli: hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059)
[09:50:04] <icinga-wm>	 RECOVERY - ircecho bot process on irc1001 is OK: PROCS OK: 1 process with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[09:51:18] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/677807 (https://phabricator.wikimedia.org/T275873) (owner: 10Muehlenhoff)
[09:52:07] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] hieradata: remove parsoidJS from production 3 [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[09:53:49] <wikibugs>	 10SRE, 10GitLab (Initialization), 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10User-brennen: SSH Access of Git data in GitLab - https://phabricator.wikimedia.org/T276148 (10jbond) > We're making it a part of the Ansible playbook that manages Gitlab installation. I believe you should h...
[09:56:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15249 and previous config saved to /var/cache/conftool/dbconfig/20210408-095615-root.json
[09:56:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:48] <icinga-wm>	 PROBLEM - ircecho bot process on irc1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[09:56:55] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[09:57:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:08] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[09:58:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:02] <icinga-wm>	 RECOVERY - ircecho bot process on irc1001 is OK: PROCS OK: 1 process with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[09:59:26] <icinga-wm>	 PROBLEM - Confd template for /srv/config-master/pybal/eqiad/parsoid on puppetmaster1001 is CRITICAL: Compilation of file /srv/config-master/pybal/eqiad/parsoid is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:00:05] <jouncebot>	 mvolz: Dear deployers, time to do the [[mw:Services|Services]] – [[mw:Citoid|Citoid]] / Zotero deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1000).
[10:00:24] <icinga-wm>	 PROBLEM - Confd template for /srv/config-master/pybal/codfw/parsoid on puppetmaster2001 is CRITICAL: Compilation of file /srv/config-master/pybal/codfw/parsoid is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:00:34] <icinga-wm>	 PROBLEM - Confd template for /srv/config-master/pybal/eqiad/parsoid on puppetmaster2001 is CRITICAL: Compilation of file /srv/config-master/pybal/eqiad/parsoid is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:00:42] <icinga-wm>	 PROBLEM - Confd template for /srv/config-master/pybal/codfw/parsoid on puppetmaster1001 is CRITICAL: Compilation of file /srv/config-master/pybal/codfw/parsoid is broken https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[10:01:04] <jayme>	 effie: ^ (I guess)
[10:01:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15250 and previous config saved to /var/cache/conftool/dbconfig/20210408-100159-root.json
[10:02:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:29] <effie>	 yes that is me 
[10:02:32] <effie>	 all me 
[10:03:01] <wikibugs>	 (03CR) 10Mvolz: [C: 03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/675848 (owner: 10PipelineBot)
[10:03:14] <wikibugs>	 (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/674936 (owner: 10PipelineBot)
[10:03:19] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add db1180 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/677831 (https://phabricator.wikimedia.org/T275633)
[10:03:35] <wikibugs>	 SRE, Wikidata, Wikidata Query Builder, wdwb-tech, User-Addshore: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (Michael) Open→Stalled This is blocked by {T264822}
[10:03:48] <wikibugs>	 (03CR) 10Jbond: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/677558 (owner: 10Jbond)
[10:03:53] <wikibugs>	 10SRE, 10Wikidata, 10Wikidata Query Builder, 10wdwb-tech, 10User-Addshore: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (10Michael)
[10:03:58] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] instances.yaml: Add db1180 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/677831 (https://phabricator.wikimedia.org/T275633) (owner: 10Marostegui)
[10:04:32] <wikibugs>	 10SRE, 10Wikidata, 10Wikidata Query Builder, 10wdwb-tech, 10User-Addshore: 🛑 Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (10Michael)
[10:04:36] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/675848 (owner: 10PipelineBot)
[10:07:45] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
[10:07:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1180 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15251 and previous config saved to /var/cache/conftool/dbconfig/20210408-100829-marostegui.json
[10:08:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:38] <stashbot>	 T275633: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633
[10:09:44] <logmsgbot>	 !log zpapierski@deploy1002 Started deploy [wikimedia/discovery/analytics@ff0137d]: T273847 export queries to relforge dag deployment - start date update
[10:09:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:55] <stashbot>	 T273847: Create a elasticsearch/kibana index with queries to allow query completion candidate research - https://phabricator.wikimedia.org/T273847
[10:10:22] <icinga-wm>	 PROBLEM - ircecho bot process on irc1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[10:10:50] <logmsgbot>	 !log mvolz@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
[10:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15252 and previous config saved to /var/cache/conftool/dbconfig/20210408-101119-root.json
[10:11:22] <logmsgbot>	 !log zpapierski@deploy1002 Finished deploy [wikimedia/discovery/analytics@ff0137d]: T273847 export queries to relforge dag deployment - start date update (duration: 01m 37s)
[10:11:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:13:00] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 103.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[10:13:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15253 and previous config saved to /var/cache/conftool/dbconfig/20210408-101303-marostegui.json
[10:13:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:53] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
[10:16:56] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
[10:16:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15254 and previous config saved to /var/cache/conftool/dbconfig/20210408-101702-root.json
[10:17:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:19] <wikibugs>	 SRE, ops-codfw: Degraded RAID on ms-be2028 - https://phabricator.wikimedia.org/T279245 (fgiunchedi) Resolved→Open a:fgiunchedi→Papaul @papaul I'm running into troubles with the disk I haven't seen before (xfs crashes after a while, log below). Can we try another spare disk just to exclude...
[10:23:11] <wikibugs>	 (03PS1) 10Muehlenhoff: ircecho: Install python-prometheus-client [puppet] - 10https://gerrit.wikimedia.org/r/677834
[10:25:05] <wikibugs>	 10SRE, 10Proton, 10Product-Infrastructure-Team-Backlog (Kanban): Proton metrics broken - https://phabricator.wikimedia.org/T277857 (10Jgiannelos) It looks like native prometheus metrics are now exposed in the service. That said we may still need to adapt the grafana dashboard because the metrics names might...
[10:25:48] <icinga-wm>	 RECOVERY - ircecho bot process on irc1001 is OK: PROCS OK: 1 process with command name python, regex args /usr/local/bin/udpmxircecho.py https://wikitech.wikimedia.org/wiki/Ircecho
[10:27:15] <wikibugs>	 (03CR) 10Effie Mouzeli: "After merging this, some confd/etcd alerts will pop up. Puppet should be run first on the puppetmasters, then on the icinga servers, and o" [puppet] - 10https://gerrit.wikimedia.org/r/677114 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[10:27:18] <logmsgbot>	 !log mvolz@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
[10:27:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:43] <effie>	 !log enable puppet on all mw* servers
[10:27:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1118 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15255 and previous config saved to /var/cache/conftool/dbconfig/20210408-102855-marostegui.json
[10:29:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:26] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good! Let's sync up before merging, I'll run a few tests after it has been merged." [puppet] - 10https://gerrit.wikimedia.org/r/677292 (owner: 10Jbond)
[10:30:01] <wikibugs>	 (03PS3) 10Effie Mouzeli: hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059)
[10:30:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[10:30:32] <marostegui>	 !log Upgrade kernel on db1118
[10:30:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:49] <wikibugs>	 (03PS4) 10Effie Mouzeli: hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059)
[10:31:09] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Only install git-fat for distros up to Buster [puppet] - 10https://gerrit.wikimedia.org/r/677807 (https://phabricator.wikimedia.org/T275873) (owner: 10Muehlenhoff)
[10:31:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[10:32:17] <wikibugs>	 (03PS5) 10Effie Mouzeli: hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059)
[10:32:37] <XioNoX>	 !log enable sampling on cr1-codfw:fpc0
[10:32:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[10:32:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:33] <wikibugs>	 (03Abandoned) 10Effie Mouzeli: hieradata: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677118 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[10:36:42] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:37:20] <wikibugs>	 (03PS2) 10Ayounsi: Automatically enable sampling on all FPCs [homer/public] - 10https://gerrit.wikimedia.org/r/636392 (https://phabricator.wikimedia.org/T257392)
[10:37:28] <logmsgbot>	 !log mvolz@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
[10:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:16] <wikibugs>	 (03Abandoned) 10Effie Mouzeli: hieradata: remove parsoidJS from production 5 [puppet] - 10https://gerrit.wikimedia.org/r/677119 (https://phabricator.wikimedia.org/T279059) (owner: 10Effie Mouzeli)
[10:38:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15256 and previous config saved to /var/cache/conftool/dbconfig/20210408-103821-root.json
[10:38:24] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Automatically enable sampling on all FPCs [homer/public] - 10https://gerrit.wikimedia.org/r/636392 (https://phabricator.wikimedia.org/T257392) (owner: 10Ayounsi)
[10:38:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:03] <wikibugs>	 (03Merged) 10jenkins-bot: Automatically enable sampling on all FPCs [homer/public] - 10https://gerrit.wikimedia.org/r/636392 (https://phabricator.wikimedia.org/T257392) (owner: 10Ayounsi)
[10:40:21] <wikibugs>	 (03PS1) 10Effie Mouzeli: profile::parsoid: remove parsoidJS module from parsoid profile [puppet] - 10https://gerrit.wikimedia.org/r/677837
[10:40:27] <marostegui>	 !log Upgrade db2085's kernel
[10:40:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::parsoid: remove parsoidJS module from parsoid profile [puppet] - 10https://gerrit.wikimedia.org/r/677837 (owner: 10Effie Mouzeli)
[10:41:50] <XioNoX>	 !log enable sampling on all routers FPCs
[10:41:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:19] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me! Likewise, let's sync when merging for some tests in parallel." [puppet] - 10https://gerrit.wikimedia.org/r/677506 (owner: 10Jbond)
[10:44:26] <wikibugs>	 (03PS1) 10JMeybohm: k8s_infrastructure_users: Remove special case for old schema [puppet] - 10https://gerrit.wikimedia.org/r/677839 (https://phabricator.wikimedia.org/T269461)
[10:46:56] <wikibugs>	 10SRE, 10netops, 10Patch-For-Review: automatically sample from all FPCs on core routers - https://phabricator.wikimedia.org/T257392 (10ayounsi) 05Open→03Resolved a:03ayounsi One more thing automated from Netbox.
[10:47:04] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (NOOP 10): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28957/console" [puppet] - 10https://gerrit.wikimedia.org/r/677839 (https://phabricator.wikimedia.org/T269461) (owner: 10JMeybohm)
[10:47:07] <effie>	 !log disable puppet on parsoid* servers 
[10:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:09] <wikibugs>	 (03PS2) 10Effie Mouzeli: profile::parsoid: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119)
[10:52:29] <Urbanecm>	 jouncebot: next
[10:52:29] <jouncebot>	 In 0 hour(s) and 7 minute(s): [[Backport windows|EU Backport and Config training]] (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1100)
[10:53:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/677510 (https://phabricator.wikimedia.org/T273673) (owner: 10Jbond)
[10:53:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15257 and previous config saved to /var/cache/conftool/dbconfig/20210408-105324-root.json
[10:53:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:15] <wikibugs>	 (03CR) 10Muehlenhoff: "Looks good, but we should also remove" [puppet] - 10https://gerrit.wikimedia.org/r/677514 (owner: 10Jbond)
[10:57:09] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] k8s_infrastructure_users: Remove special case for old schema [puppet] - 10https://gerrit.wikimedia.org/r/677839 (https://phabricator.wikimedia.org/T269461) (owner: 10JMeybohm)
[10:57:49] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1 C: 03+2] k8s_infrastructure_users: Remove special case for old schema [puppet] - 10https://gerrit.wikimedia.org/r/677839 (https://phabricator.wikimedia.org/T269461) (owner: 10JMeybohm)
[10:58:19] <wikibugs>	 (03PS3) 10Effie Mouzeli: profile::parsoid: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119)
[10:59:08] <icinga-wm>	 PROBLEM - HP RAID on ms-be2028 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Failed: 1I:1:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[10:59:10] <wikibugs>	 (03PS1) 10Elukey: hadoop: fix log4j audit log max file size [puppet] - 10https://gerrit.wikimedia.org/r/677845
[10:59:10] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on ms-be2028 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Failed: 1I:1:2 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T279644 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[10:59:14] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on ms-be2028 - https://phabricator.wikimedia.org/T279644 (10ops-monitoring-bot)
[10:59:18] <wikibugs>	 (03PS1) 10PipelineBot: mathoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/677846
[11:00:05] <jouncebot>	 Amir1, apergos, and duesen: Dear deployers, time to do the [[Backport windows|EU Backport and Config training]] deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1100).
[11:01:13] <apergos>	 there don't seem to be any changes listed in the window fwiw (maybe due to tomorrow's wmf holiday?)
[11:01:20] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] hadoop: fix log4j audit log max file size [puppet] - 10https://gerrit.wikimedia.org/r/677845 (owner: 10Elukey)
[11:02:58] <wikibugs>	 10SRE, 10Traffic, 10User-notice: Rate limit requests in violation of User-Agent policy more aggressively - https://phabricator.wikimedia.org/T224891 (10ayounsi) Even with the current rate limiting, some crawling are regularly causing issues, wasting precious SRE time.  I'd like to revisit this task to be mor...
[11:03:37] <wikibugs>	 (03CR) 10Effie Mouzeli: "@Αλέξανδρος Κοσιάρης, deploy-service sudo block should be removed here after all" [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119) (owner: 10Effie Mouzeli)
[11:03:51] <wikibugs>	 (03CR) 10Effie Mouzeli: "pcc https://puppet-compiler.wmflabs.org/compiler1003/28956/parse2001.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119) (owner: 10Effie Mouzeli)
[11:04:07] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on ms-be2028 - https://phabricator.wikimedia.org/T279644 (10fgiunchedi)
[11:04:09] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on ms-be2028 - https://phabricator.wikimedia.org/T279245 (10fgiunchedi)
[11:04:10] <Urbanecm>	 apergos: possibly. But I'd like to get sth deployed anyway, so if the training is running, I can give you all a config patch.
[11:04:28] <Urbanecm>	 https://gerrit.wikimedia.org/r/c/677828
[11:04:35] <apergos>	 there doesn't seem to be a training either. I'm in the  google meet and there's 0 attendees :-D
[11:04:44] <Urbanecm>	 ok
[11:04:47] <Urbanecm>	 so I'll just self-service then :)
[11:04:53] <apergos>	 I mean, it's on the calendar but meh.  
[11:05:05] <apergos>	 yeah, I'd just go ahead and do it yourself
[11:05:07] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Enable Growth for newcomers on simplewiki, mswiki, tawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677828 (https://phabricator.wikimedia.org/T278369) (owner: 10Urbanecm)
[11:05:50] <wikibugs>	 (03Merged) 10jenkins-bot: Enable Growth for newcomers on simplewiki, mswiki, tawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677828 (https://phabricator.wikimedia.org/T278369) (owner: 10Urbanecm)
[11:06:47] <apergos>	 what's the story for "community consensus" on this change? it looks like that's not the process here
[11:06:59] <apergos>	 (just following along since I'm in here)
[11:07:25] <apergos>	 oh also don't forget please to add your patch to the calendar so it's in the record, it's nice to be able to search there and not just the logs
[11:07:30] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] profile::parsoid: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119) (owner: 10Effie Mouzeli)
[11:07:35] <Urbanecm>	 apergos: sure, will do.
[11:07:36] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: de1670cbd2c59a24f1e29a6d3731e3ac7f39d336: Enable Growth for newcomers on simplewiki, mswiki, tawiki (T278369; T277562; T277550) (duration: 01m 07s)
[11:07:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:47] <stashbot>	 T278369: Deploy Growth features on Tamil Wikipedia - https://phabricator.wikimedia.org/T278369
[11:07:48] <stashbot>	 T277562: Deploy Growth features on Malay Wikipedia - https://phabricator.wikimedia.org/T277562
[11:07:48] <stashbot>	 T277550: Deploy Growth features on Simple English Wikipedia - https://phabricator.wikimedia.org/T277550
[11:07:53] <wikibugs>	 (03PS1) 10David Caro: pcc: honor spaces in arguments [puppet] - 10https://gerrit.wikimedia.org/r/677847
[11:08:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15258 and previous config saved to /var/cache/conftool/dbconfig/20210408-110828-root.json
[11:08:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:24] <logmsgbot>	 !log filippo@cumin1001 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2028.codfw.wmnet
[11:09:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:43] <Urbanecm>	 apergos: so, the features I just enabled for the wikis are an initiative of the Growth team, for which I work as a software engineer. The community relationship specialist told me it's ready to go, so I synced it :).
[11:10:12] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM thanks" [puppet] - 10https://gerrit.wikimedia.org/r/677847 (owner: 10David Caro)
[11:10:20] <apergos>	 hm it's probably good for that note to go on the task (assuming it's public info... which it now is, since this channel is logged :-D)
[11:10:35] <apergos>	 just so people looking at recent deploys to learn anything get how it goes
[11:10:43] <Urbanecm>	 good point
[11:11:32] <Urbanecm>	 the wikis are obviously contacted by us in advance, and given the opportunity to try & comment on the features before they are live, but it doesn't need an explicit consensus, as it's a WMF-pursued change rather than community-pursued
[11:12:11] <apergos>	 yup
[11:12:32] <apergos>	 folks learning to do this will need to know which is which and be able to double-check (well anyone doing this, really, heh)
[11:14:07] <Urbanecm>	 yeah. Well, in this case, the tasks are created by a WMF employee who's a Growth team member, they're tagged with a team's sprint board, and another Growth team member requests deployments/deploys it, so...I guess it's pretty obvious it's WMF-pursued change
[11:14:22] <wikibugs>	 (03PS1) 10Jbond: hiera: move key to correct location [labs/private] - 10https://gerrit.wikimedia.org/r/677848
[11:14:42] <Urbanecm>	 but since you asked, it's probably not _really_ obvious. Not sure how to make it more visible tbh
[11:15:42] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] hiera: move key to correct location [labs/private] - 10https://gerrit.wikimedia.org/r/677848 (owner: 10Jbond)
[11:17:10] <apergos>	 well if we all get in the habit of adding a comment on the task "does not need community consensus, wmf-pursued change" or "link to community consensus here" (well some community member will do that)
[11:17:21] <apergos>	 and if no such comment is on there, the person with the patch can be asked
[11:17:35] <apergos>	 prolly ok that way, just gets it into people's heads
[11:17:57] <Urbanecm>	 yeah, i totally understand it
[11:18:16] <Urbanecm>	 on the other hand not everything that has community support can be done
[11:18:29] <Urbanecm>	 (one example: we won't install flow/LQT on more wikis)
[11:18:35] <apergos>	 no, but we can at least make the check part of the routine, or else
[11:18:44] <apergos>	 some things might go out that shouldn't :-D
[11:18:51] <apergos>	 I mean, more than without the check!
[11:19:11] <apergos>	 I really wish we could de-install flow on more wikis but that's another topic :-P
[11:19:28] <Urbanecm>	 iirc uninstalling flow is also on the list of banned changes
[11:19:48] <apergos>	 which is the worst of both worlds >_<
[11:20:02] <Urbanecm>	 yeah
[11:20:08] <Urbanecm>	 the issue is uninstalling flow...isn't exactly easy
[11:20:18] <apergos>	 must maintain it, won't improve  it, won't give it to anyone else, can't get rid of it
[11:20:34] <apergos>	 and now we have these nice shiny new overlays for talk pages... >_<
[11:20:47] <apergos>	 yeah well what do you do with the old flow pages. nothing good
[11:20:52] <Urbanecm>	 i love discussiontools, btw :)
[11:20:57] <apergos>	 :-)
[11:21:21] <Urbanecm>	 T188812 says "Flow allegedly puts the wikis into an irreversible state whereupon it becomes impossible for the Wikimedia Foundation to handle its leftovers"
[11:21:22] <stashbot>	 T188812: Uninstall Flow on all wikis where it has zero topics - https://phabricator.wikimedia.org/T188812
[11:21:32] <Urbanecm>	 i think it summarizes the flow case really well :/
[11:21:50] <apergos>	 I will bookmark it and never read it until some day when I really want to feel 10x more miserable than normal
[11:22:17] <Urbanecm>	 :D
[11:22:19] <apergos>	 I had to dig into the db structure at a point where I was rewriting the dumps for them
[11:22:23] <apergos>	 it was extremely painful
[11:22:38] <apergos>	 that concludes my "Flow, the extension you love to hate" Ted Talk.
[11:23:13] <Urbanecm>	 i dealt with a couple of issues that were mostly about "extension defined a content model, the extension is gone, the pages created via that extension cannot be deleted, undeleted, moved, viewed nor edited"
[11:23:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15259 and previous config saved to /var/cache/conftool/dbconfig/20210408-112332-root.json
[11:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:24:22] <apergos>	 yeah how could they be. no extension, no content model, and a bunch of the data lives in a separate off-wiki db
[11:24:29] <apergos>	 that's not a recipe for trouble, right?
[11:24:47] <Urbanecm>	 how could it be :)
[11:24:57] <apergos>	 welp, 25 minutes in, I think no one is coming to be trained, legit because it is the day before a 4 day weekend for wmf folks
[11:25:08] <apergos>	 so, closing that google meet tab :-)
[11:25:13] <Urbanecm>	 ok ok :
[11:25:31] <apergos>	 thanks for using the window :-D
[11:25:40] <Urbanecm>	 :)
[11:27:42] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: fix help invocation [puppet] - 10https://gerrit.wikimedia.org/r/677850
[11:28:04] <RhinosF1>	 Urbanecm: discussion tools is one of my favourite extensions
[11:28:11] <Urbanecm>	 yeah
[11:28:27] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: grid-configurator: fix help invocation [puppet] - 10https://gerrit.wikimedia.org/r/677850 (owner: 10Arturo Borrero Gonzalez)
[11:28:38] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: fix help invocation [puppet] - 10https://gerrit.wikimedia.org/r/677850
[11:28:47] <wikibugs>	 (03PS4) 10Effie Mouzeli: profile::parsoid: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119)
[11:32:06] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] profile::parsoid: remove parsoidJS from production 4 [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119) (owner: 10Effie Mouzeli)
[11:33:47] <wikibugs>	 (03CR) 10Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1003/28956/parse2001.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/677837 (https://phabricator.wikimedia.org/T677119) (owner: 10Effie Mouzeli)
[11:34:16] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on ms-be2028 is CRITICAL: cluster=swift device=None instance=ms-be2028 job=node site=codfw https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2028&var-datasource=codfw+prometheus/ops
[11:40:30] <wikibugs>	 (03PS2) 10Amire80: Add default import sources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/676930 (https://phabricator.wikimedia.org/T214139)
[11:40:46] <icinga-wm>	 PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:46:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15261 and previous config saved to /var/cache/conftool/dbconfig/20210408-114625-root.json
[11:46:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:47:04] <wikibugs>	 (03PS1) 10Elukey: hadoop: fix the HDFS Namenode audit log config [puppet] - 10https://gerrit.wikimedia.org/r/677853 (https://phabricator.wikimedia.org/T276906)
[11:47:30] <logmsgbot>	 !log zpapierski@deploy1002 Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix
[11:47:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:47:39] <stashbot>	 T273847: Create a elasticsearch/kibana index with queries to allow query completion candidate research - https://phabricator.wikimedia.org/T273847
[11:49:10] <logmsgbot>	 !log zpapierski@deploy1002 Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 01m 39s)
[11:49:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:28] <wikibugs>	 (03PS1) 10Ema: vcl: fix ADM_PARAM definition [puppet] - 10https://gerrit.wikimedia.org/r/677858 (https://phabricator.wikimedia.org/T279533)
[11:50:14] <wikibugs>	 (03CR) 10Ema: [C: 03+2] vcl: fix ADM_PARAM definition [puppet] - 10https://gerrit.wikimedia.org/r/677858 (https://phabricator.wikimedia.org/T279533) (owner: 10Ema)
[11:50:22] <XioNoX>	 !log tighten cr3-ulsfo loopback firewall filter - T207799
[11:50:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:54] <wikibugs>	 (03PS1) 10Ayounsi: Introduce production4/6 and tighten looback filter [homer/public] - 10https://gerrit.wikimedia.org/r/677859 (https://phabricator.wikimedia.org/T207799)
[11:54:10] <wikibugs>	 (03PS1) 10Ema: cache: enable exp caching policy on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/677718 (https://phabricator.wikimedia.org/T275809)
[11:55:10] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Introduce production4/6 and tighten looback filter [homer/public] - 10https://gerrit.wikimedia.org/r/677859 (https://phabricator.wikimedia.org/T207799) (owner: 10Ayounsi)
[11:57:34] <logmsgbot>	 !log zpapierski@deploy1002 Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix
[11:57:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:42] <stashbot>	 T273847: Create a elasticsearch/kibana index with queries to allow query completion candidate research - https://phabricator.wikimedia.org/T273847
[11:57:43] <logmsgbot>	 !log zpapierski@deploy1002 Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 00m 09s)
[11:57:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:04] <wikibugs>	 (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/677718 (https://phabricator.wikimedia.org/T275809) (owner: 10Ema)
[11:58:40] <XioNoX>	 !log tighten all routers loopback firewall filter - T207799
[11:58:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:21] <wikibugs>	 (03Merged) 10jenkins-bot: Introduce production4/6 and tighten looback filter [homer/public] - 10https://gerrit.wikimedia.org/r/677859 (https://phabricator.wikimedia.org/T207799) (owner: 10Ayounsi)
[12:00:23] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] hadoop: fix the HDFS Namenode audit log config [puppet] - 10https://gerrit.wikimedia.org/r/677853 (https://phabricator.wikimedia.org/T276906) (owner: 10Elukey)
[12:00:25] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: fix help invocation [puppet] - 10https://gerrit.wikimedia.org/r/677850
[12:00:27] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: sonofgridnegine: grid-configurator: run black autoformater [puppet] - 10https://gerrit.wikimedia.org/r/677860
[12:00:29] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: include defaults in help message [puppet] - 10https://gerrit.wikimedia.org/r/677861
[12:00:31] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: rework --domains option [puppet] - 10https://gerrit.wikimedia.org/r/677862
[12:01:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15262 and previous config saved to /var/cache/conftool/dbconfig/20210408-120128-root.json
[12:01:32] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: grid-configurator: rework --domains option [puppet] - 10https://gerrit.wikimedia.org/r/677862 (owner: 10Arturo Borrero Gonzalez)
[12:01:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15263 and previous config saved to /var/cache/conftool/dbconfig/20210408-121633-root.json
[12:16:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:19] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: error if running in toolsbeta if no --beta [puppet] - 10https://gerrit.wikimedia.org/r/677865
[12:20:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: grid-configurator: error if running in toolsbeta if no --beta [puppet] - 10https://gerrit.wikimedia.org/r/677865 (owner: 10Arturo Borrero Gonzalez)
[12:22:02] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: rework --domains option [puppet] - 10https://gerrit.wikimedia.org/r/677862
[12:31:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15264 and previous config saved to /var/cache/conftool/dbconfig/20210408-123137-root.json
[12:31:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:16] <wikibugs>	 (03PS1) 10Ema: vcl: declare adm_param as a global variable [puppet] - 10https://gerrit.wikimedia.org/r/677870 (https://phabricator.wikimedia.org/T279533)
[12:32:32] <wikibugs>	 10SRE, 10Epic, 10cloud-services-team (Kanban): CloudVPS: network architecture - https://phabricator.wikimedia.org/T209460 (10ayounsi)
[12:39:14] <moritzm>	 !log installing xcftools security updates
[12:39:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:40:23] <wikibugs>	 (03PS1) 10Ayounsi: Remove 185.15.56.0/24 from network::external [puppet] - 10https://gerrit.wikimedia.org/r/677872 (https://phabricator.wikimedia.org/T265864)
[12:41:15] <wikibugs>	 (03CR) 10Ayounsi: [C: 04-1] "-1 until we're sure it's safe to merge." [puppet] - 10https://gerrit.wikimedia.org/r/677872 (https://phabricator.wikimedia.org/T265864) (owner: 10Ayounsi)
[12:44:03] <moritzm>	 !log installing libbsd security updates for Buster
[12:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:08] <wikibugs>	 (03CR) 10Ema: [C: 03+2] vcl: declare adm_param as a global variable [puppet] - 10https://gerrit.wikimedia.org/r/677870 (https://phabricator.wikimedia.org/T279533) (owner: 10Ema)
[12:46:19] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 98): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28960/console" [puppet] - 10https://gerrit.wikimedia.org/r/677567 (owner: 10David Caro)
[12:46:31] <wikibugs>	 (03CR) 10Ema: [C: 03+2] cache: enable exp caching policy on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/677718 (https://phabricator.wikimedia.org/T275809) (owner: 10Ema)
[12:48:23] <wikibugs>	 (03PS4) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: fix help invocation [puppet] - 10https://gerrit.wikimedia.org/r/677850
[12:48:25] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: sonofgridnegine: grid-configurator: run black autoformater [puppet] - 10https://gerrit.wikimedia.org/r/677860
[12:48:27] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: include defaults in help message [puppet] - 10https://gerrit.wikimedia.org/r/677861
[12:48:29] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: rework --domains option [puppet] - 10https://gerrit.wikimedia.org/r/677862
[12:48:31] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: error if running in toolsbeta if no --beta [puppet] - 10https://gerrit.wikimedia.org/r/677865
[12:48:33] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: introduce support for the new domain [puppet] - 10https://gerrit.wikimedia.org/r/677873 (https://phabricator.wikimedia.org/T277653)
[12:49:38] <ema>	 !log cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809
[12:49:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:46] <stashbot>	 T275809: cache_upload cache policy + large_objects_cutoff concerns - https://phabricator.wikimedia.org/T275809
[12:50:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: grid-configurator: introduce support for the new domain [puppet] - 10https://gerrit.wikimedia.org/r/677873 (https://phabricator.wikimedia.org/T277653) (owner: 10Arturo Borrero Gonzalez)
[12:56:20] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] pcc: honor spaces in arguments [puppet] - 10https://gerrit.wikimedia.org/r/677847 (owner: 10David Caro)
[13:01:00] <wikibugs>	 (03CR) 10Ottomata: hadoop: add the liblog4j-extras1.2-java jar to HADOOP_CLASSPATH (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/677576 (https://phabricator.wikimedia.org/T276906) (owner: 10Elukey)
[13:03:19] <wikibugs>	 (03CR) 10Ottomata: "Huh, TIL :) TY!" [puppet] - 10https://gerrit.wikimedia.org/r/677816 (owner: 10Elukey)
[13:03:57] <wikibugs>	 (03PS1) 10Ema: admin: add mikeraish to ldap_only users [puppet] - 10https://gerrit.wikimedia.org/r/677885 (https://phabricator.wikimedia.org/T279147)
[13:06:08] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey)
[13:13:37] <wikibugs>	 (03CR) 10Ema: [C: 03+2] admin: add mikeraish to ldap_only users [puppet] - 10https://gerrit.wikimedia.org/r/677885 (https://phabricator.wikimedia.org/T279147) (owner: 10Ema)
[13:13:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] admin: add mikeraish to ldap_only users [puppet] - 10https://gerrit.wikimedia.org/r/677885 (https://phabricator.wikimedia.org/T279147) (owner: 10Ema)
[13:14:34] <wikibugs>	 10SRE, 10Scap, 10Python3-Porting, 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO (2021-04-01 to 2021-06-30 (Q4)): Porting scap to Python 3 - https://phabricator.wikimedia.org/T279628 (10thcipriani)
[13:16:34] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant access to Superset for Mikeraish - https://phabricator.wikimedia.org/T279147 (10ema) @Mraishwmf: you should be all set! Let me know if you can now access Superset.
[13:16:53] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:18:59] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: unassigned_shards: 0, relocating_shards: 0, active_shards_percent_as_number: 100.0, active_primary_shards: 937, number_of_in_flight_fetch: 0, cluster_name: cloudelastic-chi-eqiad, active_shards: 1877, number_of_pending_tasks: 0, initializing_shards: 0, number_of_nodes: 6, task_max_waiting_in_queue_
[13:18:59] <icinga-wm>	 _of_data_nodes: 6, delayed_unassigned_shards: 0, status: green, timed_out: False https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:19:01] <wikibugs>	 (03PS2) 10Andrew Bogott: Replace cloudcephmon2001-dev with cloudcephmon2004-dev [puppet] - 10https://gerrit.wikimedia.org/r/677663 (https://phabricator.wikimedia.org/T276509)
[13:20:40] <wikibugs>	 10SRE, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts...
[13:20:46] <wikibugs>	 10SRE: Integrate Buster 10.9 point update - https://phabricator.wikimedia.org/T279054 (10MoritzMuehlenhoff)
[13:22:18] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Replace cloudcephmon2001-dev with cloudcephmon2004-dev [puppet] - 10https://gerrit.wikimedia.org/r/677663 (https://phabricator.wikimedia.org/T276509) (owner: 10Andrew Bogott)
[13:24:23] <moritzm>	 !log installing groff bugfix updates from Buster point release
[13:24:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:47] <wikibugs>	 (03PS1) 10Ottomata: dumps.wikimedia.org - add section in legal specifying CC0 license for analytics [puppet] - 10https://gerrit.wikimedia.org/r/677900 (https://phabricator.wikimedia.org/T278409)
[13:26:09] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Switch cloudcephmon2001-dev to a spare::system [puppet] - 10https://gerrit.wikimedia.org/r/677664 (owner: 10Andrew Bogott)
[13:26:16] <wikibugs>	 (03PS2) 10Andrew Bogott: Switch cloudcephmon2001-dev to a spare::system [puppet] - 10https://gerrit.wikimedia.org/r/677664
[13:29:12] <wikibugs>	 10SRE: DRY up .html files in puppet used for snapshot and dumps modules - https://phabricator.wikimedia.org/T279661 (10Ottomata)
[13:29:42] <wikibugs>	 (03CR) 10ArielGlenn: [C: 03+1] "Sure looks just like the other one. Go ahead on, sorry for the mixup!" [puppet] - 10https://gerrit.wikimedia.org/r/677900 (https://phabricator.wikimedia.org/T278409) (owner: 10Ottomata)
[13:29:48] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] dumps.wikimedia.org - add section in legal specifying CC0 license for analytics [puppet] - 10https://gerrit.wikimedia.org/r/677900 (https://phabricator.wikimedia.org/T278409) (owner: 10Ottomata)
[13:33:54] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephmon2004-dev - https://phabricator.wikimedia.org/T276509 (10Andrew) thank you @papaul!  this box is now in service.
[13:34:05] <hnowlan>	 Majavah: I've deployed that cpjobqueue change to beta 
[13:34:25] <wikibugs>	 (03PS1) 10JMeybohm: calico: Add defauls for container resources [deployment-charts] - 10https://gerrit.wikimedia.org/r/677906 (https://phabricator.wikimedia.org/T277877)
[13:36:17] <wikibugs>	 10SRE, 10Prod-Kubernetes, 10serviceops, 10Kubernetes, 10Patch-For-Review: Set resource requests and limits for calico PODs - https://phabricator.wikimedia.org/T277877 (10JMeybohm) a:03JMeybohm Added some defaults based on the current maximum values (https://grafana-rw.wikimedia.org/d/2AfU0X_Mz/jayme-ca...
[13:39:43] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
[13:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:47] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
[13:41:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:28] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.decommission for hosts cloudcephmon2001-dev.codfw.wmnet
[13:44:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:58] <wikibugs>	 (03PS1) 10Andrew Bogott: Remove references to cloudcephmon2001-dev.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/677910 (https://phabricator.wikimedia.org/T279662)
[13:48:13] <wikibugs>	 (03PS3) 10David Caro: ceph: use ensure_packages instead of package directly [puppet] - 10https://gerrit.wikimedia.org/r/677595 (https://phabricator.wikimedia.org/T274566)
[13:48:15] <wikibugs>	 (03PS1) 10David Caro: ceph.common: add ceph repo parameter [puppet] - 10https://gerrit.wikimedia.org/r/677911
[13:48:55] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Remove references to cloudcephmon2001-dev.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/677910 (https://phabricator.wikimedia.org/T279662) (owner: 10Andrew Bogott)
[13:49:07] <icinga-wm>	 PROBLEM - graphite.wikimedia.org render on graphite1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting
[13:49:32] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph.common: add ceph repo parameter [puppet] - 10https://gerrit.wikimedia.org/r/677911 (owner: 10David Caro)
[13:49:35] <icinga-wm>	 PROBLEM - graphite.wikimedia.org api on graphite1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting
[13:50:17] <wikibugs>	 (03CR) 10David Caro: "PCC: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28962/console" [puppet] - 10https://gerrit.wikimedia.org/r/677595 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[13:50:29] <icinga-wm>	 RECOVERY - graphite.wikimedia.org render on graphite1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1594 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting
[13:55:13] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon2001-dev.codfw.wmnet
[13:55:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:55] <wikibugs>	 10ops-codfw, 10decommission-hardware, 10Patch-For-Review, 10cloud-services-team (Kanban): decommission cloudcephmon2001-dev - https://phabricator.wikimedia.org/T279662 (10Andrew) a:05Andrew→03Papaul
[14:00:39] <wikibugs>	 10SRE, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['parse2001.codfw.wmnet'] `  and were **ALL*...
[14:09:51] <icinga-wm>	 RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:10:34] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10Cmjohnson)
[14:11:31] <wikibugs>	 (03PS3) 10Silvan Heintze: Remove idGeneratorLogging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677560 (https://phabricator.wikimedia.org/T274156) (owner: 10Noa wmde)
[14:17:53] <icinga-wm>	 PROBLEM - Long running screen/tmux on puppetmaster1001 is CRITICAL: CRIT: Long running tmux process. (user: ryankemper PID: 2120, 2539394s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[14:18:55] <wikibugs>	 (03PS1) 10Silvan Heintze: Remove all remains of idGeneratorLogging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677920 (https://phabricator.wikimedia.org/T274156)
[14:19:23] <ryankemper>	 ^ Killed my tmux session `cergen` on `puppetmaster1001`
[14:20:49] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Remove idGeneratorLogging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677560 (https://phabricator.wikimedia.org/T274156) (owner: 10Noa wmde)
[14:22:20] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Remove all remains of idGeneratorLogging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677920 (https://phabricator.wikimedia.org/T274156) (owner: 10Silvan Heintze)
[14:22:46] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+1] Add growthexperiments_mentee_data to private tables [puppet] - 10https://gerrit.wikimedia.org/r/677653 (https://phabricator.wikimedia.org/T279587) (owner: 10Urbanecm)
[14:23:18] <icinga-wm>	 RECOVERY - graphite.wikimedia.org api on graphite1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting
[14:23:50] <wikibugs>	 (03PS2) 10David Caro: ceph: add ceph repo parameter to all client modules [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566)
[14:25:39] <wikibugs>	 (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[14:29:54] <wikibugs>	 (03PS3) 10David Caro: ceph: add ceph repo parameter to all client modules [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566)
[14:30:05] <wikibugs>	 (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[14:31:18] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] logspam: silence rare but annoying UTF-8 warnings [puppet] - 10https://gerrit.wikimedia.org/r/677676 (owner: 10Brennen Bearnes)
[14:34:44] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] calico: Add defauls for container resources [deployment-charts] - 10https://gerrit.wikimedia.org/r/677906 (https://phabricator.wikimedia.org/T277877) (owner: 10JMeybohm)
[14:36:08] <wikibugs>	 10Puppet, 10SRE, 10puppet-compiler, 10Patch-For-Review, and 2 others: Integrate the puppet compiler in the puppet CI pipeline - https://phabricator.wikimedia.org/T166066 (10dcaro) >>! In T166066#5039654, @hashar wrote: > We have a Jenkins job T97513 which has been made to recognizes `Hosts:` in commit mess...
[14:38:51] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on ms-be2028 - https://phabricator.wikimedia.org/T279245 (10Papaul) a:05Papaul→03fgiunchedi Disk replaced
[14:39:21] <wikibugs>	 (03CR) 10Silvan Heintze: "split up into two separate changes, as Lucas suggested" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677920 (https://phabricator.wikimedia.org/T274156) (owner: 10Silvan Heintze)
[14:45:44] <icinga-wm>	 RECOVERY - HP RAID on ms-be2028 is OK: OK: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[14:46:11] <wikibugs>	 (03PS1) 10JMeybohm: kube-apiserver: Use --enable-admission-plugins argument [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[14:46:13] <wikibugs>	 (03PS1) 10JMeybohm: kube-apiserver: Update the list of enabled admission controllers [puppet] - 10https://gerrit.wikimedia.org/r/677923 (https://phabricator.wikimedia.org/T270063)
[14:47:47] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] kube-apiserver: Update the list of enabled admission controllers [puppet] - 10https://gerrit.wikimedia.org/r/677923 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[14:48:02] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[14:50:10] <wikibugs>	 (03PS1) 10JMeybohm: infrastructure_users: Remove comments with old schema [puppet] - 10https://gerrit.wikimedia.org/r/677926 (https://phabricator.wikimedia.org/T269461)
[14:52:30] <wikibugs>	 (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:00:52] <wikibugs>	 (03CR) 10Tonina Zhelyazkova: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677928 (https://phabricator.wikimedia.org/T204031) (owner: 10Tonina Zhelyazkova)
[15:01:32] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] infrastructure_users: Remove comments with old schema [puppet] - 10https://gerrit.wikimedia.org/r/677926 (https://phabricator.wikimedia.org/T269461) (owner: 10JMeybohm)
[15:03:30] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] wikidata: post edit constraint jobs on 60% of edits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677928 (https://phabricator.wikimedia.org/T204031) (owner: 10Tonina Zhelyazkova)
[15:04:06] <wikibugs>	 (03PS4) 10David Caro: ceph: add ceph repo and parameter to all client modules [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566)
[15:08:54] <wikibugs>	 10SRE, 10Community-Tech, 10MediaWiki-CrossWikiWatchlist, 10Crosswiki: Acquire new hardware for hosting cross-wiki watchlist database - https://phabricator.wikimedia.org/T142538 (10MusikAnimal)
[15:10:37] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch to iptables legacy alternative provider on bullseye [puppet] - 10https://gerrit.wikimedia.org/r/677931 (https://phabricator.wikimedia.org/T275873)
[15:20:21] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] cloud email alerts: remove f-strings in case of stretch vms [puppet] - 10https://gerrit.wikimedia.org/r/677599 (owner: 10Bstorm)
[15:22:03] <wikibugs>	 10SRE, 10ops-eqiad, 10Analytics-Clusters: Icinga/MegaRAID alert on an-worker1100 - https://phabricator.wikimedia.org/T279475 (10elukey) ` elukey@an-worker1100:~$ sudo megacli -AdpBbuCmd -BbuLearn -aAll                                       Adapter 0: BBU Learn Failed  Exit Code: 0x01 `  This is also weird..
[15:29:22] <wikibugs>	 (03PS1) 10David Caro: ceph.common: pin any package from ceph repo to prio 1003 [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566)
[15:29:49] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on sretest1002 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[15:30:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph.common: pin any package from ceph repo to prio 1003 [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:31:07] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: ceph.common: pin any package from ceph repo to prio 1003 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:32:08] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: ceph.common: pin any package from ceph repo to prio 1003 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:36:04] <elukey>	 !log reboot an-worker1100 to see if it helps with the strange BBU behavior
[15:36:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:21] <wikibugs>	 (03PS2) 10David Caro: ceph.common: pin any package from ceph repo to prio 1003 [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566)
[15:37:31] <icinga-wm>	 PROBLEM - Host an-worker1100 is DOWN: PING CRITICAL - Packet loss = 100%
[15:40:37] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on ms-be2028 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2028&var-datasource=codfw+prometheus/ops
[15:40:39] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: (Need By: 2021-04-30) rack/setup/install backup200[4-7] - https://phabricator.wikimedia.org/T277323 (10Papaul)
[15:42:01] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "I would record somewhere why the legacy version is required, commit message or a comment in the puppet manifest. It may help in the future" [puppet] - 10https://gerrit.wikimedia.org/r/677931 (https://phabricator.wikimedia.org/T275873) (owner: 10Muehlenhoff)
[15:42:17] <wikibugs>	 (03PS3) 10David Caro: ceph.common: pin any package from ceph repo to prio 1003 [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566)
[15:42:20] <wikibugs>	 (03CR) 10David Caro: ceph.common: pin any package from ceph repo to prio 1003 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:42:21] <wikibugs>	 10SRE, 10Security: Investigate iptables replacments - https://phabricator.wikimedia.org/T279683 (10jbond) p:05Triage→03Medium
[15:43:07] <wikibugs>	 (03CR) 10Jbond: "LGTM, also created https://phabricator.wikimedia.org/T279683 to explore long term options" [puppet] - 10https://gerrit.wikimedia.org/r/677931 (https://phabricator.wikimedia.org/T275873) (owner: 10Muehlenhoff)
[15:44:07] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[15:44:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:21] <wikibugs>	 (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:45:47] <wikibugs>	 10SRE, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['wtp1025.eqiad.wmnet'] ` The log can be found in `/var/log/...
[15:47:15] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[15:48:14] <wikibugs>	 10SRE, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10jijiki)
[15:48:28] <wikibugs>	 10SRE, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10jijiki)
[15:49:07] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[15:51:39] <logmsgbot>	 !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:51:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:13] <wikibugs>	 10SRE, 10Security: Investigate iptables replacements - https://phabricator.wikimedia.org/T279683 (10dcaro)
[15:53:36] <wikibugs>	 10SRE, 10Security: Investigate iptables replacements - https://phabricator.wikimedia.org/T279683 (10aborrero) beware that in the next debian release iptables may not even be part of the base system install.
[15:55:15] <icinga-wm>	 RECOVERY - Host an-worker1100 is UP: PING WARNING - Packet loss = 33%, RTA = 2.34 ms
[15:56:11] <wikibugs>	 (03CR) 10Jbond: "lgtm but see inline comments" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[15:57:59] <icinga-wm>	 PROBLEM - SSH on an-worker1100 is CRITICAL: connect to address 10.64.36.145 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring
[15:58:50] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "I'd noticed that recently and didn't take time to fix it. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/677850 (owner: 10Arturo Borrero Gonzalez)
[15:59:29] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] sonofgridnegine: grid-configurator: run black autoformater [puppet] - 10https://gerrit.wikimedia.org/r/677860 (owner: 10Arturo Borrero Gonzalez)
[15:59:40] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/677595 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[16:00:05] <jouncebot>	 jbond42 and cdanis: It is that lovely time of the day again! You are hereby commanded to deploy [[Puppet request window]]. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1600).
[16:00:16] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/677861 (owner: 10Arturo Borrero Gonzalez)
[16:00:21] <cdanis>	 thanks jouncebot 
[16:00:35] <wikibugs>	 (03CR) 10Jbond: ceph.common: pin any package from ceph repo to prio 1003 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/677938 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[16:00:48] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Revert incorrect changes to ve.ui.MWBackCommand that made it stop working [extensions/VisualEditor] (wmf/1.36.0-wmf.38) - 10https://gerrit.wikimedia.org/r/677725 (https://phabricator.wikimedia.org/T279613)
[16:02:31] <wikibugs>	 (03CR) 10Jbond: ceph: add ceph repo and parameter to all client modules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[16:04:17] <wikibugs>	 (03CR) 10David Caro: ceph: add ceph repo and parameter to all client modules (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[16:04:25] <icinga-wm>	 PROBLEM - Host an-worker1100 is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:01] <effie>	 jouncebot: now
[16:05:01] <jouncebot>	 For the next 0 hour(s) and 54 minute(s): [[Puppet request window]] (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1600)
[16:05:08] <effie>	 jouncebot: next
[16:05:08] <jouncebot>	 In 0 hour(s) and 54 minute(s): [[mw:Services|Services]] – [[mw:Extension:Graph|Graphoid]] / [[ORES]] (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1700)
[16:05:36] <cdanis>	 effie: puppet request window empty today if you want to use it
[16:05:53] <icinga-wm>	 RECOVERY - Host an-worker1100 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms
[16:05:56] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[16:06:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:17] <effie>	 cdanis: I am reimaging a parsoid server, and I just remembered that it could cause a deployment failure
[16:06:44] <effie>	 so I was checking who I need to nag :p
[16:06:53] <icinga-wm>	 RECOVERY - SSH on an-worker1100 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:06:58] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] ceph: use ensure_packages instead of package directly [puppet] - 10https://gerrit.wikimedia.org/r/677595 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[16:10:25] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1100 is OK: OK: optimal, 23 logical, 23 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:10:50] <logmsgbot>	 !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:10:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:54] <wikibugs>	 (03PS4) 10Cwhite: logstash: refactor how curator jobs are defined and deployed [puppet] - 10https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394)
[16:13:07] <wikibugs>	 (03CR) 10Legoktm: "> Patch Set 1:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677002 (https://phabricator.wikimedia.org/T224565) (owner: 10Herron)
[16:13:50] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE
[16:13:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:34] <wikibugs>	 10SRE, 10ops-eqiad, 10Analytics-Clusters: Icinga/MegaRAID alert on an-worker1100 - https://phabricator.wikimedia.org/T279475 (10elukey) The alert recovered, but I discovered a bad disk that needs to be replaced (had to clear preserved cache to allow boot, and one partition didn't mount). Hopefully we'll get...
[16:14:57] <wikibugs>	 (03CR) 10Jbond: ceph: add ceph repo and parameter to all client modules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/677911 (https://phabricator.wikimedia.org/T274566) (owner: 10David Caro)
[16:15:58] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE
[16:16:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:46] <cmjohnson1>	 !log update bios cp1087, already deposed for h/w issues T278729
[16:16:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:54] <stashbot>	 T278729: cp1087 powercycled - https://phabricator.wikimedia.org/T278729
[16:18:16] <wikibugs>	 (03CR) 10Cwhite: logstash: refactor how curator jobs are defined and deployed (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394) (owner: 10Cwhite)
[16:18:21] <icinga-wm>	 PROBLEM - Host cp1087 is DOWN: PING CRITICAL - Packet loss = 100%
[16:19:02] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: cache_upload cache policy + large_objects_cutoff concerns - https://phabricator.wikimedia.org/T275809 (10ema) Today I've added [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/files/exp_policy.py | exp_policy.py...
[16:19:32] <wikibugs>	 (03PS1) 10PipelineBot: mathoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/677942
[16:22:59] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10Cmjohnson) These are all connected, the 2nd interfaces are not setup, it seems that we're all confused on how to do this so I di...
[16:26:10] <wikibugs>	 10SRE, 10ops-eqiad, 10Analytics-Clusters: Icinga/MegaRAID alert on an-worker1100 - https://phabricator.wikimedia.org/T279475 (10elukey) One drive is in a Foreign state, no idea why (also unconfigured - good):  ` Enclosure Device ID: 32 Slot Number: 10 Enclosure position: 1 Device Id: 10 WWN: 5000c500cf8ee990...
[16:26:41] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10Papaul) @Cmjohnson I will take a look at it once done with some onsite work
[16:26:41] <icinga-wm>	 RECOVERY - Host cp1087 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[16:27:06] <wikibugs>	 (03PS2) 10Herron: replace mwlog1001 with new mwlog[12]002 hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677002 (https://phabricator.wikimedia.org/T224565)
[16:28:40] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: 2021-03-31) rack/setup/install cloudgw100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T272403 (10Cmjohnson) @aborrero The 2nd interfaces are  cloudgw1001 cloudsw1-c8  xe-0/0/19 cable id 5321  cloudgw1002 cloudsw1-d5 xe-0/0/35...
[16:28:59] <wikibugs>	 (03CR) 10Herron: "> Patch Set 1:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677002 (https://phabricator.wikimedia.org/T224565) (owner: 10Herron)
[16:33:04] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394) (owner: 10Cwhite)
[16:33:17] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] replace mwlog1001 with new mwlog[12]002 hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677002 (https://phabricator.wikimedia.org/T224565) (owner: 10Herron)
[16:33:45] <elukey>	 !log reboot an-worker1100 again to check if all the disks come up correctly
[16:33:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:22] <wikibugs>	 10SRE, 10ops-eqiad, 10Traffic: cp1087 powercycled - https://phabricator.wikimedia.org/T278729 (10Cmjohnson) updated the BIOS and submitted Dell ticket You have successfully submitted request SR1056516502.
[16:35:00] <wikibugs>	 10SRE, 10ops-eqiad, 10Analytics-Clusters: Icinga/MegaRAID alert on an-worker1100 - https://phabricator.wikimedia.org/T279475 (10elukey) I had to do: ` megacli -CfgForeign -Scan -a0 megacli -CfgForeign -Clear -a0 megacli -CfgLdAdd -r0 [32:10] -a0 `  And the disk came back to life and I was able to re-mount it...
[16:35:05] <wikibugs>	 (03PS1) 10Jbond: O:gitlab: add config for backup sets [puppet] - 10https://gerrit.wikimedia.org/r/677970 (https://phabricator.wikimedia.org/T274463)
[16:36:07] <icinga-wm>	 PROBLEM - Host an-worker1100 is DOWN: PING CRITICAL - Packet loss = 100%
[16:37:39] <icinga-wm>	 RECOVERY - Host an-worker1100 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[16:40:50] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/677974
[16:42:19] <wikibugs>	 10SRE, 10ops-eqiad, 10Analytics-Clusters: Icinga/MegaRAID alert on an-worker1100 - https://phabricator.wikimedia.org/T279475 (10elukey) 05Open→03Resolved a:03elukey All good, I'll re-open in case something weird comes up, but now all disks are good :)
[16:51:06] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Netbox Duplicate Cable Lables - https://phabricator.wikimedia.org/T279160 (10Cmjohnson) 05Open→03Resolved
[16:51:16] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Netbox Duplicate Cable Lables - https://phabricator.wikimedia.org/T279160 (10Cmjohnson) Fixed; the report has zero errors
[16:51:29] <dancy>	 !log testing Scap 3.17.0 release on deployment-deploy01
[16:51:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:40] <wikibugs>	 (03PS5) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: fix help invocation [puppet] - 10https://gerrit.wikimedia.org/r/677850
[16:51:42] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: sonofgridnegine: grid-configurator: run black autoformater [puppet] - 10https://gerrit.wikimedia.org/r/677860
[16:51:44] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: include defaults in help message [puppet] - 10https://gerrit.wikimedia.org/r/677861
[16:51:46] <wikibugs>	 (03PS4) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: rework --domains option [puppet] - 10https://gerrit.wikimedia.org/r/677862
[16:51:48] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: error if running in toolsbeta if no --beta [puppet] - 10https://gerrit.wikimedia.org/r/677865
[16:51:50] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: introduce support for the new domain [puppet] - 10https://gerrit.wikimedia.org/r/677873 (https://phabricator.wikimedia.org/T277653)
[16:54:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: grid-configurator: introduce support for the new domain [puppet] - 10https://gerrit.wikimedia.org/r/677873 (https://phabricator.wikimedia.org/T277653) (owner: 10Arturo Borrero Gonzalez)
[16:58:42] <wikibugs>	 10SRE, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1025.eqiad.wmnet'] `  and were **ALL** successful.
[17:00:05] <jouncebot>	 chrisalbon and accraze: Your horoscope predicts another unfortunate [[mw:Services|Services]] – [[mw:Extension:Graph|Graphoid]] / [[ORES]] deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1700).
[17:05:20] <wikibugs>	 (03PS1) 10Razzi: clouddb: enable alerting for clouddb1021 [puppet] - 10https://gerrit.wikimedia.org/r/677977 (https://phabricator.wikimedia.org/T269211)
[17:11:20] <wikibugs>	 (03PS1) 10Jgiannelos: Bump chromium-render to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/677978
[17:12:51] <cscott>	 Reedy: is now good?
[17:13:40] <wikibugs>	 (03PS2) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[17:14:07] <wikibugs>	 (03Abandoned) 10JMeybohm: kube-apiserver: Update the list of enabled admission controllers [puppet] - 10https://gerrit.wikimedia.org/r/677923 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[17:14:53] <wikibugs>	 (03CR) 10Jgiannelos: [C: 03+2] Bump chromium-render to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/677978 (owner: 10Jgiannelos)
[17:16:26] <dancy>	 !log Scap 3.17.0 deployed to beta cluster
[17:16:29] <wikibugs>	 (03Merged) 10jenkins-bot: Bump chromium-render to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/677978 (owner: 10Jgiannelos)
[17:16:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:48] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
[17:18:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:58] <wikibugs>	 (03PS1) 10Gergő Tisza: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/677980
[17:21:40] <liw>	 Majavah, we've deployed 3.17.0 on beta and are having some trouble testing (the hosts we were expecting don't exist) - does everything look OK at your end?
[17:22:44] <Majavah>	 liw: which hosts?
[17:23:51] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
[17:23:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:04] <liw>	 Majavah, deployment-mediawiki-07 and deployment-mediawiki11
[17:24:37] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 10): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28964/console" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[17:25:42] <Majavah>	 liw: mediawiki-07 is gone, mediawiki11 works just fine for me, but for that you can't use .wmflabs names, new VMs only have <host>.<project>.eqiad1.wikimedia.cloud
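The naming scheme Majavah describes, `<host>.<project>.eqiad1.wikimedia.cloud`, can be sketched as a small helper. This is only an illustration: the function name `cloud_vps_fqdn` is made up, and the project name `deployment-prep` used in the example is an assumption that does not appear in this log.

```python
def cloud_vps_fqdn(host: str, project: str, deployment: str = "eqiad1") -> str:
    """Build a name of the form <host>.<project>.<deployment>.wikimedia.cloud.

    Hypothetical helper: it only mirrors the pattern quoted in the chat above.
    """
    return f"{host}.{project}.{deployment}.wikimedia.cloud"


# "deployment-prep" is an assumed project name, used purely for illustration.
print(cloud_vps_fqdn("deployment-mediawiki11", "deployment-prep"))
```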
[17:26:04] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/677980 (owner: 10Gergő Tisza)
[17:27:32] <wikibugs>	 (03Merged) 10jenkins-bot: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/677980 (owner: 10Gergő Tisza)
[17:28:12] <wikibugs>	 (03PS4) 10Legoktm: mediawiki fonts: Remove ttf-ubuntu-font-family [puppet] - 10https://gerrit.wikimedia.org/r/675357 (owner: 10Majavah)
[17:29:47] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Hausa Wikimedians mailing list - https://phabricator.wikimedia.org/T279654 (10Ladsgroup) Can this wait for a month until we get the new mailman out of the door?
[17:29:52] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
[17:29:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:03] <wikibugs>	 (03CR) 10Legoktm: [V: 03+1 C: 03+1] "PCC SUCCESS (DIFF 9): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28966/console" [puppet] - 10https://gerrit.wikimedia.org/r/675357 (owner: 10Majavah)
[17:35:03] <wikibugs>	 (03CR) 10Legoktm: [V: 03+1 C: 03+2] mediawiki fonts: Remove ttf-ubuntu-font-family [puppet] - 10https://gerrit.wikimedia.org/r/675357 (owner: 10Majavah)
[17:35:11] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[17:35:23] <icinga-wm>	 RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:36:06] <wikibugs>	 (03PS3) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[17:36:39] <liw>	 Majavah, right. we figured out the new name, but had permission problems.
[17:36:39] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: relocating_shards: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 100.0, unassigned_shards: 0, timed_out: False, active_shards: 1877, number_of_data_nodes: 6, status: green, cluster_name: cloudelastic-chi-eqiad, number_of_in_flight_fetch: 0, number_of_nodes: 6, initializin
[17:36:39] <icinga-wm>	 er_of_pending_tasks: 0, active_primary_shards: 937, delayed_unassigned_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[17:38:23] <Majavah>	 liw: could you be more specific? I'm not aware of any issues with it
[17:39:48] <liw>	 Majavah, I didn't take notes, dancy may have a note of the version, but my memory says scap tried to create a directory in /srv/deployments and didn't have permission
[17:39:49] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on parse2001 is CRITICAL: Host parse2001 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[17:40:06] <liw>	 dancy, I meant a note of the directory
[17:41:24] <dancy>	 `Permission denied: '/srv/deployment'`
[17:41:57] <dancy>	 That's during `scap deploy -v 'testing scap3.17.0'` in `/srv/deployment/integration/slave-scripts` on deployment-deploy01
[17:42:15] <Majavah>	 uhh, let me test
[17:43:53] <wikibugs>	 (03CR) 10JMeybohm: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/28967/" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[17:45:25] <Majavah>	 dancy: deployment-mediawiki11 does not have /srv/deployment or any subdirectories, nor do I see anything in Puppet that should create it for integration/slave-scripts
[17:45:28] <wikibugs>	 10SRE, 10DBA, 10Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (10Ladsgroup) With the wikitech-l imported my last offer is now: 34GB.
[17:46:08] <wikibugs>	 10SRE, 10DBA, 10Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (10Ladsgroup)
[17:46:12] <dancy>	 Zooming out, what we need in general is a test of commands that are suitable for validating the new scap release in beta.  Do you have suggestions?
[17:47:26] <Majavah>	 if that was for me, I'm not familiar enough with scap to have anything else than standard mediawiki sync-worlds
[17:48:17] <dancy>	 ok. We'll work something out.  Thanks
[17:51:22] <Majavah>	 let me know if I can be helpful somehow
[17:52:26] <logmsgbot>	 !log ryankemper@cumin2001 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
[17:52:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:01] <icinga-wm>	 RECOVERY - Check systemd state on wdqs2008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:54:53] <mutante>	 Amir1: would like any mailing list creations to be stalled?
[17:54:59] <mutante>	 (you)
[17:55:46] <mutante>	 I would estimate there is on average 2 to 4 per month
[17:58:21] <wikibugs>	 (03PS2) 10Dzahn: site/conftool-data: assign 4 x API, 4 x app, 2 x jobrunner, rack A5 [puppet] - 10https://gerrit.wikimedia.org/r/677674 (https://phabricator.wikimedia.org/T279599)
[17:58:23] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[17:59:22] <logmsgbot>	 !log tgr@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
[17:59:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:39] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] site/conftool-data: assign 4 x API, 4 x app, 2 x jobrunner, rack A5 [puppet] - 10https://gerrit.wikimedia.org/r/677674 (https://phabricator.wikimedia.org/T279599) (owner: 10Dzahn)
[17:59:48] <mutante>	 jouncebot: now
[17:59:48] <jouncebot>	 For the next 0 hour(s) and 0 minute(s): [[mw:Services|Services]] – [[mw:Extension:Graph|Graphoid]] / [[ORES]] (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1700)
[18:00:05] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for [[Backport windows|Morning backport window]]<br/><small>''''''</small> deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1800).
[18:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[18:00:08] <Majavah>	 liw: is my understanding correct that the integration/slave-scripts repo on beta is only used for testing scap?
[18:00:24] <MatmaRex>	 jouncebot is lying, there are patches
[18:00:27] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: unassigned_shards: 0, number_of_data_nodes: 6, active_shards_percent_as_number: 100.0, initializing_shards: 0, active_shards: 1877, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, cluster_name: cloudelastic-chi-eqiad, number_of_nodes: 6, status: green, task_max_waiting_i
[18:00:27] <icinga-wm>	 , delayed_unassigned_shards: 0, active_primary_shards: 937, relocating_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[18:00:39] <Urbanecm>	 interesting MatmaRex 
[18:01:02] <Urbanecm>	 i can deploy then :)
[18:02:07] <mutante>	 !log mw2403 through mw2401 - new hardwere moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
[18:02:07] <MatmaRex>	 it's weird, did we mess up the format? or does it not parse the page right after the recent changes?
[18:02:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:02:15] <stashbot>	 T279599: bring 10 new mediawiki appserver in codfw into production, new rack A5 (mw2402 - mw2411) - https://phabricator.wikimedia.org/T279599
[18:02:35] <MatmaRex>	 and it is printing weird HTML stuff in its messages, which isn't reassuring
[18:02:39] <mutante>	 log mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
[18:02:44] <mutante>	 darn it
[18:02:49] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Revert incorrect changes to ve.ui.MWBackCommand that made it stop working [extensions/VisualEditor] (wmf/1.36.0-wmf.38) - 10https://gerrit.wikimedia.org/r/677725 (https://phabricator.wikimedia.org/T279613) (owner: 10Bartosz Dziewoński)
[18:02:59] <Majavah>	 phuedx also has a patch scheduled, but doesn't look like they're here unless they're on an alt nick that I'm not aware of
[18:03:17] <mutante>	 !log mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
[18:03:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:05:38] <phuedx>	 o/ Sorry I'm late
[18:06:16] <MatmaRex>	 hi. Urbanecm is deploying
[18:06:31] <Urbanecm>	 phuedx: you have private patch in PS, right?
[18:06:32] <Amir1>	 mutante: my personal opinion is that if it is in any way time sensitive, it should go ahead but if it can wait for a bit, I'd like to stay so we get it pushed, there's not much left tbh
[18:07:26] <phuedx>	 Urbanecm: AIUI I have to generate the value on the deployment host, add it to private/PrivateSettings.php, and then it can be deployed?
[18:07:59] <mutante>	 Amir1: alright! but the options are between "wait a bit if you can" and "use the old server if you feel you have to" but not  "hey, wanna be the first to test new server", right?
[18:08:54] <Urbanecm>	 phuedx: affirmative
[18:09:37] <phuedx>	 Urbanecm: Ah! Sorry. No. I haven't done the patch to PS.php yet
[18:10:18] <mutante>	 Amir1: they are asking for wikimania organizing, cant tell how urgent, in the past I would have just done it, dont want to step on your toes during import though
[18:11:19] <Amir1>	 mutante: that one clearly is good to go
[18:11:28] <Amir1>	 on my side
[18:11:43] <mutante>	 Amir1: on mailman2? then i'll do it, ack
[18:11:49] <Amir1>	 yeah
[18:11:52] <mutante>	 ok, thanks
[18:12:02] <phuedx>	 Urbanecm: Want me to?
[18:12:23] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Create mailing list for Wikimania Core Organizing Team - https://phabricator.wikimedia.org/T279668 (10Dzahn) a:03Dzahn
[18:12:40] <Urbanecm>	 phuedx: yup. You can also sync it. 
[18:12:43] <Amir1>	 but on the other hand, it's not a big deal to migrate them from the old one, so I don't want to block any list creation 
[18:13:16] <mutante>	 Amir1: cool, ok. not expecting this to happen every day
[18:14:13] <mutante>	 could have been that you were looking for candidates who first get created on new side and never import.. that was part of my thought
[18:14:37] <mutante>	 but if import is no big deal.. just going ahead as normal
[18:15:11] <phuedx>	 Urbanecm: Mind if I hold off for 10 minutes? thcipriani wanted to sit in on the deployment
[18:15:24] <Urbanecm>	 Not at all, waiting for CI
[18:15:32] <phuedx>	 Thanks :)
[18:17:48] <wikibugs>	 (03PS1) 10Dzahn: add fake mcrouter certs for mw2403 through mw2411 [labs/private] - 10https://gerrit.wikimedia.org/r/677991 (https://phabricator.wikimedia.org/T279599)
[18:18:36] <wikibugs>	 (03CR) 10Dzahn: [V: 03+2 C: 03+2] add fake mcrouter certs for mw2403 through mw2411 [labs/private] - 10https://gerrit.wikimedia.org/r/677991 (https://phabricator.wikimedia.org/T279599) (owner: 10Dzahn)
[18:19:05] <icinga-wm>	 RECOVERY - Long running screen/tmux on puppetmaster1001 is OK: OK: No SCREEN or tmux processes detected. https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[18:19:42] <phuedx>	 Urbanecm: Is there a certain form for the SAL message, e.g. PrivateSettings: Add value for $wg... (T123456)?
[18:19:46] <stashbot>	 T123456: Special:CentralAuth reports account attachment, which - being standalone - is confusing, report accout creation as well - https://phabricator.wikimedia.org/T123456
[18:20:01] <phuedx>	 I ask because I usually defer to the output of the backport-summary script when I'm deploying ;)
[18:22:47] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: new_install
[18:22:50] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: new_install
[18:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:01] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install
[18:23:02] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install
[18:23:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:17] <Urbanecm>	 phuedx: what you suggest should be fine
[18:23:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:37] <Urbanecm>	 there's not really a standard way; as long as it explains what is happening, it should be good
[18:24:07] <wikibugs>	 (03Merged) 10jenkins-bot: Revert incorrect changes to ve.ui.MWBackCommand that made it stop working [extensions/VisualEditor] (wmf/1.36.0-wmf.38) - 10https://gerrit.wikimedia.org/r/677725 (https://phabricator.wikimedia.org/T279613) (owner: 10Bartosz Dziewoński)
[18:24:22] <Urbanecm>	 phuedx: please ping me once you're done, Matma.Rex's change just merged :)
[18:24:50] <phuedx>	 Urbanecm: You go ahead of me
[18:24:55] <Urbanecm>	 okay
[18:24:58] <Urbanecm>	 MatmaRex: still around?
[18:25:03] <logmsgbot>	 !log tgr@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
[18:25:03] <logmsgbot>	 !log tgr@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
[18:25:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:49] <MatmaRex>	 Urbanecm: yeah
[18:26:00] <Urbanecm>	 MatmaRex: pulled to mwdebug1001, can you test?
[18:26:03] <MatmaRex>	 is it just me, or is CI slower recently?
[18:26:05] <MatmaRex>	 looking
[18:27:28] <MatmaRex>	 Urbanecm: looks good
[18:27:33] <Urbanecm>	 thx, syncing
[18:29:42] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.36.0-wmf.38/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWBackTool.js: e0f3735f6a31d2914bae6c9daac1267707a2d108: Revert incorrect changes to ve.ui.MWBackCommand that made it stop working (T279613) (duration: 01m 07s)
[18:29:49] <Urbanecm>	 MatmaRex: should be live
[18:29:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:51] <stashbot>	 T279613: [wmf.38-regression] mobile VE - "oo-ui-icon-close" button does not work - https://phabricator.wikimedia.org/T279613
[18:30:02] <MatmaRex>	 thanks
[18:30:07] <Urbanecm>	 np
[18:30:14] <Urbanecm>	 phuedx: I'm done.
[18:30:26] <wikibugs>	 (03CR) 10Dzahn: "learned something from this change, ty" [puppet] - 10https://gerrit.wikimedia.org/r/677805 (owner: 10Alexandros Kosiaris)
[18:30:31] * thcipriani waves
[18:30:40] <phuedx>	 Urbanecm: Thanks
[18:30:45] <Urbanecm>	 np
[18:30:46] <Urbanecm>	 hi thcipriani :)
[18:30:53] <Urbanecm>	 phuedx: do let me know if i can help in any way.
[18:31:47] <logmsgbot>	 !log tgr@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
[18:31:47] <logmsgbot>	 !log tgr@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
[18:31:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:11] <wikibugs>	 (03PS1) 10JMeybohm: New upstream version 0.13.1 [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/677996
[18:32:19] <wikibugs>	 (03CR) 10Dzahn: "there is a bug, wikibugs uses "(owner: Alexandros Kosiaris)" on IRC but it should also be Αλέξανδρος Κοσιάρης now 😊" [puppet] - 10https://gerrit.wikimedia.org/r/677805 (owner: 10Alexandros Kosiaris)
[18:33:42] <wikibugs>	 (03PS2) 10JMeybohm: New upstream version 0.13.1 [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/677996
[18:37:28] <mutante>	 !log mw2403 through mw2411 - serial rebooting
[18:37:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:12] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10Papaul) @Cmjohnson I did the second interface for cloudcephosd1016 see below for the instructions let me know in you have any qu...
[18:42:22] <phuedx>	 Urbanecm: Made the commit to PrivateSettings.php. Going to sync-file now
[18:43:36] <phuedx>	 Pulling to mwdebug1001
[18:44:24] <phuedx>	 Testing now
[18:47:21] <legoktm>	 mutante: please uh, file a bug against wikibugs :p
[18:47:37] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[18:47:49] <mutante>	 legoktm: :) 
[18:48:14] <mutante>	 yea, unicode works here
[18:49:43] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, unassigned_shards: 0, status: green, active_primary_shards: 937, number_of_nodes: 6, initializing_shards: 0, timed_out: False, number_of_data_nodes: 6, number_of_pending_tasks: 0, relocating_shards: 0, number_of_in_flight_fetch: 0, delayed_unassigned_shards: 0,
[18:49:43] <icinga-wm>	 _in_queue_millis: 0, active_shards: 1877, active_shards_percent_as_number: 100.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[18:50:25] <phuedx>	 Syncing now
[18:51:35] <logmsgbot>	 !log phuedx@deploy1002 Synchronized private/PrivateSettings.php: PrivateSettings: Add value for  (T261842) (duration: 01m 06s)
[18:51:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:44] <stashbot>	 T261842: Create schema to track users opting in/out of desktop improvements - https://phabricator.wikimedia.org/T261842
[18:51:59] <phuedx>	 *facepalm* double quotes
[18:52:57] <phuedx>	 !log phuedx@deploy1002 Synchronized private/PrivateSettings.php: PrivateSettings: Add value for $wgWMEVectorPrefDiffSalt (T261842)
[18:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
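The empty spot in the first SAL entry ("Add value for  (T261842)") fits the "double quotes" remark: inside double quotes a POSIX shell expands `$wgWMEVectorPrefDiffSalt`, and an unset variable expands to nothing. The sketch below reproduces that effect with a plain `echo`. It assumes a POSIX `sh` is on the PATH, the helper `shell_echo` is hypothetical, and the exact command phuedx ran is not shown in the log.

```python
import subprocess


def shell_echo(quoted_message: str) -> str:
    """Run `echo <quoted_message>` in sh with the variable unset; return what was printed."""
    result = subprocess.run(
        ["sh", "-c", f"unset wgWMEVectorPrefDiffSalt; echo {quoted_message}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# Double quotes: the shell expands the (unset) variable to an empty string.
double = shell_echo('"Add value for $wgWMEVectorPrefDiffSalt (T261842)"')
# Single quotes: the text is passed through literally.
single = shell_echo("'Add value for $wgWMEVectorPrefDiffSalt (T261842)'")

print(repr(double))
print(repr(single))
```

Single-quoting the message (or escaping the `$`) keeps the variable name intact in the log entry.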
[19:00:05] <jouncebot>	 marxarelli and twentyafterfour: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1900).
[19:01:04] <wikibugs>	 (03PS1) 10Andrew Bogott: policy.yaml files: update default behavior [puppet] - 10https://gerrit.wikimedia.org/r/678006
[19:02:11] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] policy.yaml files: update default behavior [puppet] - 10https://gerrit.wikimedia.org/r/678006 (owner: 10Andrew Bogott)
[19:04:03] <phuedx>	 Urbanecm: All done :)
[19:04:09] <Urbanecm>	 cool
[19:09:23] <wikibugs>	 (03PS1) 10Andrew Bogott: Cinder: prevent some actions in policy.yaml [puppet] - 10https://gerrit.wikimedia.org/r/678011
[19:10:24] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Cinder: prevent some actions in policy.yaml [puppet] - 10https://gerrit.wikimedia.org/r/678011 (owner: 10Andrew Bogott)
[19:16:48] <mutante>	 legoktm: T279710  scnr
[19:16:48] <stashbot>	 T279710: wikibugs should display the same type of name that the Gerrit UI displays - https://phabricator.wikimedia.org/T279710
[19:17:28] <legoktm>	 :D
[19:26:08] <twentyafterfour>	 jouncebot: ^ that's my favorite bothumor. you're a pretty funny bot, jouncebot. 
[19:27:01] <twentyafterfour>	 marxarelli: Is there some way I can help out with the train?
[19:27:09] * twentyafterfour hasn't been following as closely as I should
[19:27:15] <marxarelli>	 twentyafterfour: just getting a late start, sorry
[19:28:05] <twentyafterfour>	 np
[19:28:34] <marxarelli>	 i think we're good. the only thing of concern that i saw yesterday was https://phabricator.wikimedia.org/T279585 and that doesn't seem of concern to api folks or wikidata folks
[19:28:49] <marxarelli>	 so just the usual today. roll and watch
[19:29:22] <wikibugs>	 (03PS1) 10Andrew Bogott: OpenStack nova: allow anyone to read instance volume info [puppet] - 10https://gerrit.wikimedia.org/r/678022 (https://phabricator.wikimedia.org/T279697)
[19:29:40] <twentyafterfour>	 marxarelli: ok, I'll help watch logs if that's helpful
[19:30:02] <wikibugs>	 10SRE, 10Wikimedia-Logstash, 10observability: Buster elasticsearch-curator version not compatible with ELK7 - https://phabricator.wikimedia.org/T257024 (10colewhite) p:05High→03Medium I found elasticsearch-curator 5.8.1 in the `thirdparty/elastic74` component and added it to the `thirdparty/elastic710` c...
[19:30:22] <marxarelli>	 thanks! that's always helpful. logspam-watch was acting funny for me yesterday (choking on some bad utf-8 i think?) but it seems to be ok today
[19:30:47] <wikibugs>	 (03PS2) 10Andrew Bogott: OpenStack nova: allow anyone to read instance volume info [puppet] - 10https://gerrit.wikimedia.org/r/678022 (https://phabricator.wikimedia.org/T279697)
[19:31:33] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] OpenStack nova: allow anyone to read instance volume info [puppet] - 10https://gerrit.wikimedia.org/r/678022 (https://phabricator.wikimedia.org/T279697) (owner: 10Andrew Bogott)
[19:33:02] <wikibugs>	 (03PS1) 10Dduvall: all wikis to 1.36.0-wmf.38 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/678023
[19:33:04] <wikibugs>	 (03CR) 10Dduvall: [C: 03+2] all wikis to 1.36.0-wmf.38 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/678023 (owner: 10Dduvall)
[19:33:44] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.36.0-wmf.38 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/678023 (owner: 10Dduvall)
[19:34:08] <brennen>	 marxarelli: re: logspam-watch, yeah, have a patch in for that: https://gerrit.wikimedia.org/r/c/operations/puppet/+/677676
[19:34:50] <twentyafterfour>	 submitting "WikiPage constructed on a Title that cannot exist as a page" to prod-errors
[19:35:01] <logmsgbot>	 !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.38
[19:35:05] <marxarelli>	 brennen: right on. even with the sporadic error it was still quite helpful
[19:35:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:20] <marxarelli>	 twentyafterfour: yeah, that doesn't look good
[19:39:24] <marxarelli>	 but happening for wmf.37 too it looks like
[19:41:45] <wikibugs>	 (03PS1) 10Andrew Bogott: Openstack Cinder: allow all users to getallsnapshots [puppet] - 10https://gerrit.wikimedia.org/r/678029
[19:42:59] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Openstack Cinder: allow all users to getallsnapshots [puppet] - 10https://gerrit.wikimedia.org/r/678029 (owner: 10Andrew Bogott)
[19:44:31] <twentyafterfour>	 a lot of lock wait timeouts, at least more than normal, but I don't see any indicator of what the cause may be
[19:46:12] <wikibugs>	 (03PS1) 10Eric Gardner: Don't show "invalid search" message when request is aborted by user [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.38) - 10https://gerrit.wikimedia.org/r/677956 (https://phabricator.wikimedia.org/T277714)
[19:47:09] <twentyafterfour>	 Krinkle: just got to see your new phatality feature in action (the backlinks from phab to kibana) it works nicely
[19:47:17] <twentyafterfour>	 see T279711
[19:47:18] <stashbot>	 T279711: WikiPage constructed on a Title that cannot exist as a page: Special:Watchlist [Called from Article::newPage] - https://phabricator.wikimedia.org/T279711
[19:48:28] <Krinkle>	 twentyafterfour: ooh nice, thanks for deploying that
[19:48:42] <Krinkle>	 seems to all work now as intended
[19:49:19] <twentyafterfour>	 yep it's pretty cool
[19:49:36] <twentyafterfour>	 thanks for building that feature, the backlinks are super helpful
[19:49:55] <marxarelli>	 +1 those are really nice
[19:50:18] <mutante>	 !log mw2403 through mw2411 - scap pull - new hardware
[19:50:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:50:37] <wikibugs>	 (03CR) 10Cwhite: "Tested in Pontoon and it appears to DTRT.  Will triple-check the curator config in codfw before rolling out completely." [puppet] - 10https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394) (owner: 10Cwhite)
[19:50:44] <twentyafterfour>	 Krinkle: should we get rid of the reqid field now or fix it so that it is populated? it's currently unpopulated and just used in the description template
[19:53:03] <wikibugs>	 (03PS5) 10Cwhite: pontoon: set jobs_host and define aggressive curator config [puppet] - 10https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394)
[19:54:10] <Krinkle>	 twentyafterfour: yeah, I was going to follow-up maybe after some time has passed to hide trace/reqId from the form.
[19:54:19] <Krinkle>	 but I think that means it also hides it on existing tasks, I think?
[19:54:34] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw240[3-9].codfw.wmnet
[19:54:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:48] <twentyafterfour>	 Krinkle: I think hiding it on the form does not hide it from the task detail view 
[19:55:11] <twentyafterfour>	 Krinkle: I'll find out
[19:55:12] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw240[3-9].codfw.wmnet
[19:55:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:01] <wikibugs>	 (03PS1) 10Bstorm: gridengine: set grid-configurator source files to use new domain name [puppet] - 10https://gerrit.wikimedia.org/r/678043 (https://phabricator.wikimedia.org/T277653)
[19:56:28] <twentyafterfour>	 Krinkle: yeah the form doesn't affect the detail view
[19:56:43] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw2379.codfw.wmnet
[19:56:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:03] <wikibugs>	 (03PS6) 10Cwhite: logstash: refactor how curator jobs are defined and deployed [puppet] - 10https://gerrit.wikimedia.org/r/677593 (https://phabricator.wikimedia.org/T274394)
[19:57:06] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw238[0-2].codfw.wmnet
[19:57:07] <Krinkle>	 twentyafterfour: hm.. okay, but it hides it from edit form though for those tasks
[19:57:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:19] <Krinkle>	 fields that are visible but not editable
[19:57:26] <Krinkle>	 I guess that's okay for those secondary fields
[19:57:31] <Krinkle>	 unlikely to want to change
[19:57:46] <wikibugs>	 (03CR) 10Bstorm: "Please note, I didn't bother messing with the "dedicated" exec nodes because they aren't used at all and that stuff should be removed." [puppet] - 10https://gerrit.wikimedia.org/r/678043 (https://phabricator.wikimedia.org/T277653) (owner: 10Bstorm)
[19:58:09] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw241[0-1].codfw.wmnet
[19:58:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:20] <twentyafterfour>	 Krinkle: yeah I think it's ok for these essentially deprecated fields
[19:58:39] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw241[0-1].codfw.wmnet
[19:58:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:46] <wikibugs>	 (03PS1) 10Razzi: superset: check http server following redirects with curl [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729)
[19:59:35] <mutante>	 jouncebot: now
[19:59:35] <jouncebot>	 For the next 1 hour(s) and 0 minute(s): Mediawiki train - American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1900)
[20:00:08] <mutante>	 just added new servers to scap groups but stands back now
[20:00:22] <mutante>	 (getting scap but not pooled)
[20:02:21] <legoktm>	 !log imported parsoid_0.11.1all_all.deb to releases.wikimedia.org apt repo
[20:02:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:25] <wikibugs>	 (03PS2) 10Razzi: superset: check http server following redirects with curl [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729)
[20:27:05] * razzi lunchtime!
[20:27:07] <legoktm>	 !log legoktm@deploy1002:~$ cat deb-parsoid-urls.txt | mwscript purgeList.php --wiki=aawiki # to clear releases.wm.o/debian/ cache
[20:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:54] <mutante>	 jouncebot: now
[20:27:55] <jouncebot>	 For the next 0 hour(s) and 32 minute(s): Mediawiki train - American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T1900)
[20:28:02] <mutante>	 is a train ongoing?
[20:28:56] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw241[0-1].codfw.wmnet
[20:29:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:01] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw240[3-9].codfw.wmnet
[20:30:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:57] <mutante>	 !log mw2304 through mw2411 - pooled and set to active state in netbox (T279599)
[20:33:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:33:06] <stashbot>	 T279599: bring 10 new mediawiki appserver in codfw into production, new rack A5 (mw2402 - mw2411) - https://phabricator.wikimedia.org/T279599
[20:33:20] <mutante>	 typo in log line again, duh
[20:33:40] <mutante>	 !log mw2403 through mw2411 pooled and set to active state in netbox (T279599)
[20:33:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:20] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install (35) mw2377 and upwards - https://phabricator.wikimedia.org/T274171 (10Dzahn)
[20:34:56] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install (35) mw2377 and upwards - https://phabricator.wikimedia.org/T274171 (10Dzahn) mw2403 through mw2411 in production, set to Active in Netbox.
[20:40:13] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on wtp1025 is CRITICAL: Host wtp1025 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[20:40:22] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install (35) mw2377 and upwards - https://phabricator.wikimedia.org/T274171 (10Dzahn) Checked netbox one more time. Now all mw servers in codfw are in one of 2 states. ACTIVE or OFFLINE and covered by decom tickets.
[20:42:51] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Add Lena Meintrup to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T279531 (10KFrancis) @Lena_WMDE The NDA was sent to the email listed for your electronic signature.  Please review and sign when you have a minute.  Thanks!
[20:43:44] <wikibugs>	 (03CR) 10Ottomata: "Huh, maybe I don't know anything!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729) (owner: 10Razzi)
[20:44:08] <wikibugs>	 (03CR) 10Ottomata: "Otherwise lgtm, maybe investigate to see how monitoring::service works vs nrpe::monitoring_service." [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729) (owner: 10Razzi)
[20:45:22] <mutante>	 ignoring the wtp1025 alert after I saw it's not pooled. be back later
[20:47:25] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2001 is CRITICAL: CRITICAL: 318 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[20:48:17] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1025 is CRITICAL: CRITICAL: 318 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[20:48:54] <wikibugs>	 10SRE, 10Wikimedia-Logstash, 10observability: Buster elasticsearch-curator version not compatible with ELK7 - https://phabricator.wikimedia.org/T257024 (10herron) Initially the thinking was to store the appropriate elasticsearch-curator for each ES version in the component.  But in practice yeah that's provi...
[20:57:34] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Create mailing list for Wikimania Core Organizing Team - https://phabricator.wikimedia.org/T279668 (10Dzahn) Hey @Effeietsanders,  you should have mail.  See  https://lists.wikimedia.org/mailman/listinfo/wikimania-cot  and you should have a random password to login at:  https:...
[21:00:27] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Create mailing list for Wikimania Core Organizing Team - https://phabricator.wikimedia.org/T279668 (10Dzahn) 05Open→03Resolved
[21:06:04] <wikibugs>	 (03PS1) 10Gergő Tisza: Bump linkrecommendation version [deployment-charts] - 10https://gerrit.wikimedia.org/r/678078
[21:07:28] <wikibugs>	 (03CR) 10Razzi: superset: check http server following redirects with curl (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729) (owner: 10Razzi)
[21:15:40] <wikibugs>	 10SRE, 10WMF-Annual-Report: Update annual.wikimedia.org redirect to point to 2020 Annual Report - https://phabricator.wikimedia.org/T279571 (10Dzahn) a:03Dzahn
[21:16:32] <wikibugs>	 (03PS3) 10Razzi: superset: check http server following redirects with curl [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729)
[21:17:15] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Wikisul maillist - https://phabricator.wikimedia.org/T279482 (10Dzahn) 05Open→03Stalled setting to stalled to reflect this. if you feel like it's getting more urgent feel free to change that
[21:17:40] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Wikisul maillist - https://phabricator.wikimedia.org/T279482 (10Dzahn) p:05Triage→03Medium
[21:17:56] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Bump linkrecommendation version [deployment-charts] - 10https://gerrit.wikimedia.org/r/678078 (owner: 10Gergő Tisza)
[21:18:03] <wikibugs>	 (03CR) 10Razzi: [V: 03+1] "PCC SUCCESS (DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28969/console" [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729) (owner: 10Razzi)
[21:18:25] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Hausa Wikimedians mailing list - https://phabricator.wikimedia.org/T279654 (10Dzahn) p:05Triage→03Medium
[21:21:59] <wikibugs>	 10SRE, 10Dumps-Generation, 10SRE-Access-Requests: Create new group for root access to snapshot*, dumpsdata* and labstore1006,7 with holger in it - https://phabricator.wikimedia.org/T277629 (10Dzahn) Any news on access check for @holger.knust ?
[21:22:33] <wikibugs>	 (03Merged) 10jenkins-bot: Bump linkrecommendation version [deployment-charts] - 10https://gerrit.wikimedia.org/r/678078 (owner: 10Gergő Tisza)
[21:22:50] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to stat boxes for mlitn - https://phabricator.wikimedia.org/T274749 (10Dzahn) Hi @MarkTraceur friendly ping. this is still blocked on your approval at the moment.
[21:23:24] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Add Lena Meintrup to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T279531 (10Dzahn) a:03Lena_WMDE
[21:24:13] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant access to Superset for Mikeraish - https://phabricator.wikimedia.org/T279147 (10Dzahn) a:03MRaishWMF
[21:24:57] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10CAS-SSO: CAS SSO for reedy - https://phabricator.wikimedia.org/T279244 (10Dzahn) a:03Reedy
[21:29:00] <wikibugs>	 10SRE, 10DBA, 10Platform Engineering, 10Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (10Krinkle) >>! In T278655#6982467, @Marostegui wrote: > I am not fully sure I am reading the disk space graph correctly as I don't see an increa...
[21:29:13] <wikibugs>	 10SRE, 10DBA, 10Platform Engineering, 10Performance-Team (Radar), 10Sustainability (Incident Followup): Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (10Krinkle)
[21:32:13] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudvirt104[0-6].eqiad.wmnet - https://phabricator.wikimedia.org/T275081 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` ['cloudvirt1041.eqiad.wmnet',...
[21:33:43] <logmsgbot>	 !log tgr@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
[21:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:25] <logmsgbot>	 !log andrew@deploy1002 Started deploy [horizon/deploy@3abe9d0]: Fix for T279667
[21:34:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:33] <stashbot>	 T279667: Horizon: 'edit security groups' instance menu produces an error - https://phabricator.wikimedia.org/T279667
[21:35:45] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install moss-be200[12] - https://phabricator.wikimedia.org/T276642 (10Papaul)
[21:38:18] <logmsgbot>	 !log andrew@deploy1002 Finished deploy [horizon/deploy@3abe9d0]: Fix for T279667 (duration: 03m 52s)
[21:38:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:27] <wikibugs>	 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops, 10procurement: Dc-Ops Commands for Cumin - https://phabricator.wikimedia.org/T279721 (10wiki_willy)
[21:46:41] <logmsgbot>	 !log tgr@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
[21:46:41] <logmsgbot>	 !log tgr@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
[21:46:42] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
[21:46:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:17] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
[21:48:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:43] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
[21:48:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:37] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
[21:50:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:32] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
[21:51:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:42] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
[21:52:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:36] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
[21:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:13] <wikibugs>	 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops, 10procurement: Dc-Ops Commands for Cumin - https://phabricator.wikimedia.org/T279721 (10RobH)
[21:54:43] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
[21:54:43] <wikibugs>	 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops, 10procurement: Dc-Ops Commands for Cumin - https://phabricator.wikimedia.org/T279721 (10RobH)
[21:54:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:45] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
[21:55:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:43] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
[21:56:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:55] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
[21:58:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:00] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
[22:00:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:02] <wikibugs>	 (03CR) 10Razzi: [V: 03+1 C: 03+2] superset: check http server following redirects with curl [puppet] - 10https://gerrit.wikimedia.org/r/678044 (https://phabricator.wikimedia.org/T277729) (owner: 10Razzi)
[22:07:31] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudvirt104[0-6].eqiad.wmnet - https://phabricator.wikimedia.org/T275081 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1041.eqiad.wmnet', 'cloudvirt1042.eqiad.wmnet', 'cloudvirt1043.eqi...
[22:12:11] <logmsgbot>	 !log tgr@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
[22:12:11] <logmsgbot>	 !log tgr@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
[22:12:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:12:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:12] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Create mailing list for Wikimania Core Organizing Team - https://phabricator.wikimedia.org/T279668 (10Effeietsanders) @Dzahn many thanks for the speedy turnaround. I've set it up.
[22:18:15] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[22:20:09] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: task_max_waiting_in_queue_millis: 0, unassigned_shards: 0, active_shards: 1877, cluster_name: cloudelastic-chi-eqiad, number_of_data_nodes: 6, active_primary_shards: 937, relocating_shards: 0, timed_out: False, number_of_in_flight_fetch: 0, active_shards_percent_as_number: 100.0, number_of_pending_
[22:20:09] <icinga-wm>	  green, number_of_nodes: 6, delayed_unassigned_shards: 0, initializing_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[22:21:08] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudvirt104[0-6].eqiad.wmnet - https://phabricator.wikimedia.org/T275081 (10RobH)
[22:21:46] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudvirt104[0-6].eqiad.wmnet - https://phabricator.wikimedia.org/T275081 (10RobH) I've emailed our Dell rep to determine where the NIC is for the seed server, cloudvirt1040.  Once I have that info, I'll reass...
[22:25:32] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "Interesting. I think I like this more. We'll want to update the docs." [puppet] - 10https://gerrit.wikimedia.org/r/677862 (owner: 10Arturo Borrero Gonzalez)
[22:28:02] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "In a way, we could just make the --beta arg unnecessary (and key off of the project file), but maybe it's good to force you to check where" [puppet] - 10https://gerrit.wikimedia.org/r/677865 (owner: 10Arturo Borrero Gonzalez)
[22:28:55] <wikibugs>	 (03CR) 10Bstorm: "Suggested a patch that could make this approach work." [puppet] - 10https://gerrit.wikimedia.org/r/677873 (https://phabricator.wikimedia.org/T277653) (owner: 10Arturo Borrero Gonzalez)
[22:44:37] <wikibugs>	 (03PS1) 10Dzahn: annualreport: update redirect to annual report for 2020 [puppet] - 10https://gerrit.wikimedia.org/r/678106 (https://phabricator.wikimedia.org/T279571)
[22:46:07] <wikibugs>	 (03PS2) 10Dzahn: annualreport: update redirect to annual report for 2020 [puppet] - 10https://gerrit.wikimedia.org/r/678106 (https://phabricator.wikimedia.org/T279571)
[22:49:04] <wikibugs>	 (03PS1) 10Razzi: superset: put puppet:// resource in files/ [puppet] - 10https://gerrit.wikimedia.org/r/678109 (https://phabricator.wikimedia.org/T277729)
[22:51:34] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] superset: put puppet:// resource in files/ [puppet] - 10https://gerrit.wikimedia.org/r/678109 (https://phabricator.wikimedia.org/T277729) (owner: 10Razzi)
[22:54:10] <brennen>	 jouncebot next
[22:54:11] <jouncebot>	 In 0 hour(s) and 5 minute(s): [[Backport windows|US Backport and Config training]] (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T2300)
[22:59:27] <mutante>	 legoktm: if you dont mind .. a review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/678106
[23:00:00] * legoktm looks
[23:00:05] <jouncebot>	 brennen: Time to snap out of that daydream and deploy [[Backport windows|US Backport and Config training]]. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210408T2300).
[23:00:17] <brennen>	 here, will be doing training with EricGardner.
[23:01:23] <wikibugs>	 (03CR) 10Legoktm: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/678106 (https://phabricator.wikimedia.org/T279571) (owner: 10Dzahn)
[23:01:34] <mutante>	 thank you
[23:02:03] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] annualreport: update redirect to annual report for 2020 [puppet] - 10https://gerrit.wikimedia.org/r/678106 (https://phabricator.wikimedia.org/T279571) (owner: 10Dzahn)
[23:02:10] <wikibugs>	 (03PS3) 10Dzahn: annualreport: update redirect to annual report for 2020 [puppet] - 10https://gerrit.wikimedia.org/r/678106 (https://phabricator.wikimedia.org/T279571)
[23:02:13] <legoktm>	 np :)
[23:04:40] <etonkovidova>	 I am here too - will do quick verify on https://gerrit.wikimedia.org/r/678106
[23:04:55] <wikibugs>	 10SRE, 10WMF-Annual-Report, 10Patch-For-Review: Update annual.wikimedia.org redirect to point to 2020 Annual Report - https://phabricator.wikimedia.org/T279571 (10Dzahn) p:05Triage→03High
[23:05:06] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Don't show "invalid search" message when request is aborted by user [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.38) - 10https://gerrit.wikimedia.org/r/677956 (https://phabricator.wikimedia.org/T277714) (owner: 10Eric Gardner)
[23:06:45] <wikibugs>	 10SRE, 10WMF-Annual-Report, 10Patch-For-Review: Update annual.wikimedia.org redirect to point to 2020 Annual Report - https://phabricator.wikimedia.org/T279571 (10Dzahn) @spatton Thanks for the thoughtful way you handled the ticket.  Code change has been reviewed and deployed just now on the backends (miscwe...
[23:08:14] <wikibugs>	 (03PS1) 10Razzi: superset: comment out check that isn't working as intended [puppet] - 10https://gerrit.wikimedia.org/r/678113 (https://phabricator.wikimedia.org/T277729)
[23:08:37] <wikibugs>	 10SRE, 10WMF-Annual-Report, 10Patch-For-Review: Update annual.wikimedia.org redirect to point to 2020 Annual Report - https://phabricator.wikimedia.org/T279571 (10Dzahn) ` curl -S https://annual.wikimedia.org | grep moved ... <p>The document has moved <a href="https://wikimediafoundation.org/about/annualrepo...
[23:09:56] <wikibugs>	 10SRE, 10WMF-Annual-Report, 10serviceops: Update annual.wikimedia.org redirect to point to 2020 Annual Report - https://phabricator.wikimedia.org/T279571 (10Dzahn)
[23:10:33] <wikibugs>	 (03PS2) 10Razzi: superset: comment out check that isn't working as intended [puppet] - 10https://gerrit.wikimedia.org/r/678113 (https://phabricator.wikimedia.org/T277729)
[23:12:31] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install moss-be200[12] - https://phabricator.wikimedia.org/T276642 (10Papaul)
[23:12:56] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T275676 (10Papaul)
[23:15:45] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[23:17:41] <icinga-wm>	 RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: active_shards_percent_as_number: 100.0, status: green, delayed_unassigned_shards: 0, unassigned_shards: 0, cluster_name: cloudelastic-chi-eqiad, number_of_nodes: 6, initializing_shards: 0, number_of_pending_tasks: 0, task_max_waiting_in_queue_millis: 0, timed_out: False, number_of_data_nodes: 6, ac
[23:17:41] <icinga-wm>	 , active_primary_shards: 937, relocating_shards: 0, number_of_in_flight_fetch: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[23:24:39] <wikibugs>	 (03PS1) 10Papaul: Add moss-be200[12] MAC adderess, partman recipe and role insetup [puppet] - 10https://gerrit.wikimedia.org/r/678117 (https://phabricator.wikimedia.org/T276642)
[23:33:14] <wikibugs>	 (03Merged) 10jenkins-bot: Don't show "invalid search" message when request is aborted by user [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.38) - 10https://gerrit.wikimedia.org/r/677956 (https://phabricator.wikimedia.org/T277714) (owner: 10Eric Gardner)
[23:41:20] <brennen>	 EricGardner has confirmed https://gerrit.wikimedia.org/r/677956 working, going ahead with sync.
[23:47:17] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:47:34] <chaomodus>	 Hm
[23:47:36] <chaomodus>	 i'll look at that.
[23:48:21] <logmsgbot>	 !log brennen@deploy1002 Synchronized php-1.36.0-wmf.38/extensions/WikibaseMediaInfo/resources/mediasearch-vue/store/actions.js: Backport: [[gerrit:677956|Do not show "invalid search" message when request is aborted by user (T277714)]] (duration: 00m 57s)
[23:48:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:37] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:50:18] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] Add moss-be200[12] MAC adderess, partman recipe and role insetup [puppet] - 10https://gerrit.wikimedia.org/r/678117 (https://phabricator.wikimedia.org/T276642) (owner: 10Papaul)
[23:54:03] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install moss-be200[12] - https://phabricator.wikimedia.org/T276642 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` moss-be2001.codfw.wmnet ` The log can be found in `/var/...