[00:05:02] PROBLEM - PHP7 rendering on mw1303 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:06:30] RECOVERY - PHP7 rendering on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 2.252 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:10:44] PROBLEM - PHP7 rendering on mw1303 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:12:18] RECOVERY - PHP7 rendering on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 330 bytes in 7.249 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:53:02] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 101 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[02:48:57] (PS1) Andrew Bogott: nova: make firstboot script tolerant of more diverse vendor data [puppet] - https://gerrit.wikimedia.org/r/654032 (https://phabricator.wikimedia.org/T271056)
[02:48:59] (PS1) Andrew Bogott: Nova: move our injected userdata into vendor_data, where it belongs [puppet] - https://gerrit.wikimedia.org/r/654033 (https://phabricator.wikimedia.org/T271056)
[03:38:16] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[03:38:46] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.40 ms
[03:40:11] (PS2) Andrew Bogott: Nova: move our injected userdata into vendor_data, where it belongs [puppet] - https://gerrit.wikimedia.org/r/654033 (https://phabricator.wikimedia.org/T271056)
[03:42:12] (CR) Andrew Bogott: [C: +2] nova: make firstboot script tolerant of more diverse vendor data [puppet] - https://gerrit.wikimedia.org/r/654032 (https://phabricator.wikimedia.org/T271056) (owner: Andrew Bogott)
[03:42:23] (CR) Andrew Bogott: [C: +2] Nova: move our injected userdata into vendor_data, where it belongs [puppet] - https://gerrit.wikimedia.org/r/654033 (https://phabricator.wikimedia.org/T271056) (owner: Andrew Bogott)
[04:29:28] (PS1) Andrew Bogott: nova firstboot script: fix appearances of .novalocal in /etc/hosts [puppet] - https://gerrit.wikimedia.org/r/654036 (https://phabricator.wikimedia.org/T271056)
[04:30:04] (CR) Andrew Bogott: [C: +2] nova firstboot script: fix appearances of .novalocal in /etc/hosts [puppet] - https://gerrit.wikimedia.org/r/654036 (https://phabricator.wikimedia.org/T271056) (owner: Andrew Bogott)
[04:31:26] PROBLEM - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:31:40] PROBLEM - BFD status on cr3-knams is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:32:00] PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 2/4 UP : OSPFv3: 2/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:33:06] RECOVERY - BFD status on cr2-eqdfw is OK: OK: UP: 10 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:33:20] RECOVERY - BFD status on cr3-knams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:35:18] RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:57:02] PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 2/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:58:28] PROBLEM - BFD status on cr3-knams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:00:08] RECOVERY - BFD status on cr3-knams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:00:24] RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:29:06] Operations, ops-codfw, DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui)
[06:29:32] Operations, ops-codfw, DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui) p:Triage→Medium
[06:30:59] (PS1) Marostegui: db2140: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/654042 (https://phabricator.wikimedia.org/T271084)
[06:32:13] (CR) Marostegui: [C: +2] db2140: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/654042 (https://phabricator.wikimedia.org/T271084) (owner: Marostegui)
[06:38:25] (PS1) Andrew Bogott: nova: puppetize the nova-api-metadata service [puppet] - https://gerrit.wikimedia.org/r/654044 (https://phabricator.wikimedia.org/T261134)
[06:39:55] (CR) jerkins-bot: [V: -1] nova: puppetize the nova-api-metadata service [puppet] - https://gerrit.wikimedia.org/r/654044 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[06:42:08] (PS2) Andrew Bogott: nova: puppetize the nova-api-metadata service [puppet] - https://gerrit.wikimedia.org/r/654044 (https://phabricator.wikimedia.org/T261134)
[06:43:34] (CR) jerkins-bot: [V: -1] nova: puppetize the nova-api-metadata service [puppet] - https://gerrit.wikimedia.org/r/654044 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[06:46:29] (PS3) Andrew Bogott: nova: puppetize the nova-api-metadata service [puppet] - https://gerrit.wikimedia.org/r/654044 (https://phabricator.wikimedia.org/T261134)
[06:48:13] (CR) Andrew Bogott: [C: +2] nova: puppetize the nova-api-metadata service [puppet] - https://gerrit.wikimedia.org/r/654044 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[06:49:46] (PS1) Marostegui: dbctl: Add x2 as a valid section [puppet] - https://gerrit.wikimedia.org/r/654045 (https://phabricator.wikimedia.org/T269324)
[07:33:21] (PS2) Giuseppe Lavagetto: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760
[07:33:29] (CR) jerkins-bot: [V: -1] Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[07:38:13] (PS3) Giuseppe Lavagetto: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760
[07:39:26] (CR) jerkins-bot: [V: -1] Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[07:40:38] (PS4) Giuseppe Lavagetto: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760
[07:41:54] (CR) jerkins-bot: [V: -1] Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[07:43:01] (PS2) Muehlenhoff: Remove obsolete (and expired) repository key for Tor [puppet] - https://gerrit.wikimedia.org/r/651156 (https://phabricator.wikimedia.org/T269861)
[07:43:45] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/651649 (https://phabricator.wikimedia.org/T251005) (owner: Dzahn)
[07:47:09] (PS1) Elukey: profile::analytics::database::meta: remove old alarm [puppet] - https://gerrit.wikimedia.org/r/654171
[07:49:23] (CR) Elukey: [C: +2] profile::analytics::database::meta: remove old alarm [puppet] - https://gerrit.wikimedia.org/r/654171 (owner: Elukey)
[07:51:21] RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:56:11] RECOVERY - exim queue on mx1001 is OK: OK: Less than 2000 mails in exim queue. https://wikitech.wikimedia.org/wiki/Exim
[07:57:47] (PS5) Giuseppe Lavagetto: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760
[07:59:36] (CR) jerkins-bot: [V: -1] Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[08:03:40] (CR) Muehlenhoff: [C: +2] Remove obsolete (and expired) repository key for Tor [puppet] - https://gerrit.wikimedia.org/r/651156 (https://phabricator.wikimedia.org/T269861) (owner: Muehlenhoff)
[08:05:08] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/644591 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[08:07:15] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/628436 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[08:09:23] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/649077 (owner: ArielGlenn)
[08:13:19] (PS1) Muehlenhoff: Remove access for moushirael [puppet] - https://gerrit.wikimedia.org/r/654179
[08:13:20] (PS6) Giuseppe Lavagetto: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760
[08:15:01] (CR) Muehlenhoff: [C: +2] Remove access for moushirael [puppet] - https://gerrit.wikimedia.org/r/654179 (owner: Muehlenhoff)
[08:22:21] Operations: Update tor's apt gpg key - https://phabricator.wikimedia.org/T269861 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff I've removed the expired/obsolete key from Puppet.
[08:31:34] (CR) Volans: [C: +1] "LGTM, thank for using wmflib! Optional nit inline" (1 comment) [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[08:45:57] (PS7) Giuseppe Lavagetto: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760
[08:51:54] (CR) JMeybohm: [C: +1] "Looks good!" [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[08:53:29] <_joe_> *deep sigh*
[08:53:42] _joe_: <3
[09:02:40] !log bounce asw-d-codfw:xe-7/0/8 - T271041
[09:02:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:44] T271041: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041
[09:09:03] (PS1) Muehlenhoff: Remove access for evanp [puppet] - https://gerrit.wikimedia.org/r/654183
[09:19:01] Operations, SRE-swift-storage, netops: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (ayounsi) Symptoms are a bit similar to T269313 but I don't think it's the same issue as the switch port is showing dropped multicast traffic for no reason. ` asw-d-codfw> show interfaces...
[09:19:08] (CR) Muehlenhoff: [C: +2] Remove access for evanp [puppet] - https://gerrit.wikimedia.org/r/654183 (owner: Muehlenhoff)
[09:23:48] Operations, Discovery-Search (Current work): Reshard commonswiki_file elasticsearch index - https://phabricator.wikimedia.org/T260083 (Joe) p:Triage→Medium a:RKemper @RKemper I'm assigning this task to you, since it seems you're acting on it. Please remove yourself as assignee if that's not ok.
[09:24:12] Operations, ops-codfw, SRE-swift-storage, netops: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey)
[09:25:18] Operations, ops-codfw, SRE-swift-storage, netops: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey) @Papaul Hi! happy new year :) When you are in can you ping me or Filippo to swap the DAC between ms-be2050 and asw-d-codfw?
[09:25:29] Operations, ops-codfw, SRE-swift-storage, netops: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey) p:Triage→Medium
[09:26:21] Operations, Performance-Team, Traffic: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (Joe) @jbond you can't remove the operations tag without also removing the #traffic one. Not sure if it should given @Peachey88's comment above.
[09:30:12] Operations, Technical-blog-posts, Traffic: 3rd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T270074 (ema) Open→Resolved a:ema >>! In T270074#6701946, @srodlund wrote: > @ema these should all be fixed now. :-) I'll send...
[09:30:40] Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1019 - https://phabricator.wikimedia.org/T270806 (fgiunchedi) This host is also slated for decom soon. Do we have 4TB spare disks on site @Cmjohnson @Jclark-ctr ? If we don't that's fine too, we can let this be until decom.
[09:32:13] Operations: Provide an option menu when booting via PXE - https://phabricator.wikimedia.org/T191018 (fgiunchedi) >>! In T191018#6716959, @elukey wrote: > Hello @fgiunchedi, I'd need to boot an-coord1002 with d-i in rescue mode to execute `grub-install` on a raid-1 disk (that doesn't have it), is there a proc...
[09:33:12] (PS1) Muehlenhoff: Remove access for kaldari [puppet] - https://gerrit.wikimedia.org/r/654185
[09:34:24] Operations, Performance-Team, SRE-swift-storage: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (Joe) p:Triage→Low @aaron d...
[09:35:02] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2055 - https://phabricator.wikimedia.org/T271055 (fgiunchedi) hi @Papaul ! This host will need a 4TB disk replacement (under warranty), thank you !
[09:39:13] (PS1) Volans: tox: Remove '--skip B322' from Bandit config. [software/homer] - https://gerrit.wikimedia.org/r/654186 (https://phabricator.wikimedia.org/T270969)
[09:42:21] (CR) Volans: [C: +2] "Fixed CI" [software/homer] - https://gerrit.wikimedia.org/r/654186 (https://phabricator.wikimedia.org/T270969) (owner: Volans)
[09:43:33] (PS1) Elukey: Set fs.permissions.umask-mode for the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/654187 (https://phabricator.wikimedia.org/T270629)
[09:44:16] (Merged) jenkins-bot: tox: Remove '--skip B322' from Bandit config. [software/homer] - https://gerrit.wikimedia.org/r/654186 (https://phabricator.wikimedia.org/T270969) (owner: Volans)
[09:44:41] (PS2) Muehlenhoff: Remove access for kaldari [puppet] - https://gerrit.wikimedia.org/r/654185
[09:45:37] Operations, Traffic, Patch-For-Review, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (ema) Varnish 6.0.0 does not seem to be affected by the regression, here is the average `webperf_navtiming_responsestart...
[09:47:10] (CR) Kormat: pontoon: lock hiera output file (1 comment) [puppet] - https://gerrit.wikimedia.org/r/649847 (owner: Filippo Giunchedi)
[09:48:33] (PS3) Elukey: admin: deprecate the analytics-users posix group [puppet] - https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150)
[09:48:50] !log Deploy schema change on s6 codfw master (lag will appear on codfw) - T270187
[09:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:55] T270187: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187
[09:49:51] (CR) Elukey: "Moritz: ok to merge? :)" [puppet] - https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: Elukey)
[09:51:13] (CR) Kormat: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) (owner: Bstorm)
[09:53:39] Operations, ops-eqiad, Traffic: Interface errors on asw2-a-eqiad:xe-4/0/7 (lvs1016) - https://phabricator.wikimedia.org/T271087 (ayounsi) p:Triage→Medium
[09:56:16] (CR) Muehlenhoff: [C: +2] Remove access for kaldari [puppet] - https://gerrit.wikimedia.org/r/654185 (owner: Muehlenhoff)
[09:58:44] (CR) Giuseppe Lavagetto: [C: +2] Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[10:01:08] (Merged) jenkins-bot: Retry when failing to fetch image metadata from the registry [docker-images/docker-report] - https://gerrit.wikimedia.org/r/651760 (owner: Giuseppe Lavagetto)
[10:07:00] (PS1) Muehlenhoff: Remove access for dcipoletti [puppet] - https://gerrit.wikimedia.org/r/654190
[10:07:30] (CR) jerkins-bot: [V: -1] Remove access for dcipoletti [puppet] - https://gerrit.wikimedia.org/r/654190 (owner: Muehlenhoff)
[10:09:43] (PS2) Muehlenhoff: Remove access for dcipoletti [puppet] - https://gerrit.wikimedia.org/r/654190
[10:12:11] Operations, SRE-tools, serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (ayounsi) If some snowflakes (that can't be changed) prevent us from managing the bulk of records with Netbox we could move them to a different "namespace". For example we could...
[10:16:19] (PS3) Muehlenhoff: Remove access for dcipoletti [puppet] - https://gerrit.wikimedia.org/r/654190
[10:19:07] (CR) Muehlenhoff: [C: +2] Remove access for dcipoletti [puppet] - https://gerrit.wikimedia.org/r/654190 (owner: Muehlenhoff)
[10:19:45] <_joe_> !log uploading docker-report 0.0.10 to debian buster
[10:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:10] (PS2) Filippo Giunchedi: pontoon: lock hiera output file [puppet] - https://gerrit.wikimedia.org/r/649847
[10:26:14] (CR) Filippo Giunchedi: pontoon: lock hiera output file (1 comment) [puppet] - https://gerrit.wikimedia.org/r/649847 (owner: Filippo Giunchedi)
[10:26:47] (CR) Kormat: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/649847 (owner: Filippo Giunchedi)
[10:32:07] (CR) Filippo Giunchedi: [C: +2] pontoon: lock hiera output file [puppet] - https://gerrit.wikimedia.org/r/649847 (owner: Filippo Giunchedi)
[10:34:32] (PS1) Volans: type hints: mark the package as type hinted [software/pywmflib] - https://gerrit.wikimedia.org/r/654191
[10:36:16] (CR) Elukey: [C: +1] type hints: mark the package as type hinted [software/pywmflib] - https://gerrit.wikimedia.org/r/654191 (owner: Volans)
[10:41:26] (PS1) Elukey: install_server: add a "rescue" label [puppet] - https://gerrit.wikimedia.org/r/654192
[10:42:17] (PS2) Elukey: install_server: add a "rescue" label [puppet] - https://gerrit.wikimedia.org/r/654192
[10:44:12] (CR) Elukey: "If this is non-sense let me know, I tried to read as much docs as possible but I am still not sure if my understanding is correct or not (" [puppet] - https://gerrit.wikimedia.org/r/654192 (owner: Elukey)
[10:49:20] (CR) Volans: [C: +2] type hints: mark the package as type hinted [software/pywmflib] - https://gerrit.wikimedia.org/r/654191 (owner: Volans)
[10:50:10] !log push pfw policies - T269958
[10:50:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:10] (CR) Muehlenhoff: [C: +1] "LGTM" [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650505 (https://phabricator.wikimedia.org/T270298) (owner: JMeybohm)
[10:52:07] (Merged) jenkins-bot: type hints: mark the package as type hinted [software/pywmflib] - https://gerrit.wikimedia.org/r/654191 (owner: Volans)
[10:56:07] (CR) Muehlenhoff: [C: +1] "Looks good!" [puppet] - https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: Elukey)
[11:00:16] Operations, ops-eqiad, DC-Ops: Audit down ports - https://phabricator.wikimedia.org/T218751 (ayounsi) @Cmjohnson yes, if you can give me the status of the ports from https://librenms.wikimedia.org/ports/state=down/hostname=asw/format=list_basic/ (all but 3 are in eqiad) we should be able to close it...
[11:04:01] Operations: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (elukey) Hello people, I found this task after dealing with /dev/sda failed in a raid1 array. I thought that I had to do grub-install on /dev/sdb via d-i rescue, but then I noticed that the partman recipe was alre...
[11:04:10] Operations, ops-eqiad, Analytics, Patch-For-Review: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T270768 (elukey) The host boots, see T215183#6718961, but we still need to get the new disk :)
[11:06:21] (CR) Elukey: "I was able to resolved my problem, see https://phabricator.wikimedia.org/T215183#6718961, but I am wondering if a "rescue" label could be " [puppet] - https://gerrit.wikimedia.org/r/654192 (owner: Elukey)
[11:09:04] (CR) David Caro: cloud: drop dumps project backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/652182 (https://phabricator.wikimedia.org/T260692) (owner: Arturo Borrero Gonzalez)
[11:13:08] (CR) Muehlenhoff: Restart systemd units on package upgrade (2 comments) [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302) (owner: JMeybohm)
[11:13:25] (PS1) David Caro: wmcs.backup: ignore dumps backups for now [puppet] - https://gerrit.wikimedia.org/r/654196 (https://phabricator.wikimedia.org/T267195)
[11:14:52] (CR) jerkins-bot: [V: -1] wmcs.backup: ignore dumps backups for now [puppet] - https://gerrit.wikimedia.org/r/654196 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[11:17:22] (CR) Jbond: "LGTM but im not expert here so will leave others to provide +1. for the record i think its a great idea have also added riccardo as i hav" [puppet] - https://gerrit.wikimedia.org/r/654192 (owner: Elukey)
[11:21:09] (PS1) Volans: doc: improve installation and intrduction docs [software/pywmflib] - https://gerrit.wikimedia.org/r/654198
[11:23:54] (CR) Ayounsi: [DONT MERGE] cloud: expand dmz_cidr list for public endpoints (3 comments) [puppet] - https://gerrit.wikimedia.org/r/651169 (https://phabricator.wikimedia.org/T209082) (owner: Arturo Borrero Gonzalez)
[11:26:23] (PS1) Giuseppe Lavagetto: Add dependency on wmflib [docker-images/docker-report] - https://gerrit.wikimedia.org/r/654200
[11:28:40] (CR) jerkins-bot: [V: -1] Add dependency on wmflib [docker-images/docker-report] - https://gerrit.wikimedia.org/r/654200 (owner: Giuseppe Lavagetto)
[11:28:57] (PS2) Volans: doc: improve installation and intrduction docs [software/pywmflib] - https://gerrit.wikimedia.org/r/654198
[11:30:04] jan_drewniak: My dear minions, it's time we take the moon! Just kidding. Time for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T1130).
[11:30:56] (CR) Elukey: [C: +1] doc: improve installation and intrduction docs [software/pywmflib] - https://gerrit.wikimedia.org/r/654198 (owner: Volans)
[11:31:42] (CR) Volans: [C: +2] doc: improve installation and intrduction docs [software/pywmflib] - https://gerrit.wikimedia.org/r/654198 (owner: Volans)
[11:31:58] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/654201 (https://phabricator.wikimedia.org/T128546)
[11:32:21] (PS3) Marostegui: tendril: Migrate hiera() to lookup() and setting datatype [puppet] - https://gerrit.wikimedia.org/r/642649 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[11:32:23] (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/654201 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[11:33:19] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/654201 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[11:33:47] (Merged) jenkins-bot: doc: improve installation and intrduction docs [software/pywmflib] - https://gerrit.wikimedia.org/r/654198 (owner: Volans)
[11:33:59] (CR) Marostegui: [C: +2] tendril: Migrate hiera() to lookup() and setting datatype [puppet] - https://gerrit.wikimedia.org/r/642649 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[11:34:34] (PS2) David Caro: wmcs.backup: ignore all dumps backups except dumps-0 [puppet] - https://gerrit.wikimedia.org/r/654196 (https://phabricator.wikimedia.org/T267195)
[11:35:33] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:654201| Bumping portals to master (T128546)]] (duration: 01m 14s)
[11:35:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:37] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[11:36:29] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:654201| Bumping portals to master (T128546)]] (duration: 00m 56s)
[11:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:49] (PS1) Volans: CHANGELOG: add changelogs for release v0.0.6 [software/pywmflib] - https://gerrit.wikimedia.org/r/654202
[11:37:54] PROBLEM - Device not healthy -SMART- on an-coord1002 is CRITICAL: cluster=analytics device=sda instance=an-coord1002 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-coord1002&var-datasource=eqiad+prometheus/ops
[11:38:17] Operations, Traffic, netops, User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (ayounsi) 2 other options: * Define a list of ASNs and get the matching prefixes from BGP (or API like RIPE stats) * Define a list of ASNs and g...
[11:40:42] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:41:09] (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v0.0.6 [software/pywmflib] - https://gerrit.wikimedia.org/r/654202 (owner: Volans)
[11:41:17] (PS1) Muehlenhoff: Remove expiry date for mayakpwiki [puppet] - https://gerrit.wikimedia.org/r/654205
[11:43:09] (Merged) jenkins-bot: CHANGELOG: add changelogs for release v0.0.6 [software/pywmflib] - https://gerrit.wikimedia.org/r/654202 (owner: Volans)
[11:43:32] Operations: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (jbond) > Is there a way to audit this option and see how many hosts have it set disabled? After all this work it seems something that we'd want to keep enabled.. If there is a way to get/set the parameter via ip...
[11:44:38] (PS3) David Caro: wmcs.backup: ignore all dumps backups except dumps-0 [puppet] - https://gerrit.wikimedia.org/r/654196 (https://phabricator.wikimedia.org/T267195)
[11:45:30] (PS1) Volans: Upstream release v0.0.6 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/654209
[11:46:04] (CR) jerkins-bot: [V: -1] wmcs.backup: ignore all dumps backups except dumps-0 [puppet] - https://gerrit.wikimedia.org/r/654196 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[11:47:29] (PS1) Arturo Borrero Gonzalez: openstack: add prometheus script to collect metrics on ceph network usage by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096)
[11:47:44] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2055 - https://phabricator.wikimedia.org/T271055 (fgiunchedi) p:Triage→Medium
[11:48:23] (CR) jerkins-bot: [V: -1] openstack: add prometheus script to collect metrics on ceph network usage by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096) (owner: Arturo Borrero Gonzalez)
[11:48:37] (CR) Volans: [C: +2] Upstream release v0.0.6 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/654209 (owner: Volans)
[11:49:56] (CR) Muehlenhoff: [C: +2] Remove expiry date for mayakpwiki [puppet] - https://gerrit.wikimedia.org/r/654205 (owner: Muehlenhoff)
[11:50:53] (Merged) jenkins-bot: Upstream release v0.0.6 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/654209 (owner: Volans)
[11:52:34] PROBLEM - MD RAID on an-coord1002 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[11:52:35] ACKNOWLEDGEMENT - MD RAID on an-coord1002 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T271098 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[11:52:39] Operations, ops-eqiad: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (ops-monitoring-bot)
[11:53:48] (CR) David Caro: [C: +1] "LGTM" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/651301 (owner: Bstorm)
[11:58:10] (PS2) JMeybohm: Restart systemd units on package upgrade [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302)
[11:58:26] (CR) JMeybohm: Restart systemd units on package upgrade (2 comments) [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302) (owner: JMeybohm)
[11:59:30] !log uploaded python3-wmflib_0.0.6 to apt.wikimedia.org buster-wikimedia
[11:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T1200).
[12:00:04] No GERRIT patches in the queue for this window AFAICS.
[12:00:19] I have one thing to deploy
[12:02:25] (PS4) Ladsgroup: Grant several OATHAuth-related permissions to wmf-supportsafety at Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896) (owner: Urbanecm)
[12:02:36] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/650988
[12:02:40] deploying this
[12:02:56] it's not testable
[12:03:24] (CR) Ladsgroup: [C: +2] "B&C" [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896) (owner: Urbanecm)
[12:04:17] (Merged) jenkins-bot: Grant several OATHAuth-related permissions to wmf-supportsafety at Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/650988 (https://phabricator.wikimedia.org/T180896) (owner: Urbanecm)
[12:04:34] Operations: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (Volans) I agree we should audit it. I think that with redfish API it should be doable, adding @crusnov as they've worked on it last Q.
[12:05:16] (CR) Volans: "recheck" [docker-images/docker-report] - https://gerrit.wikimedia.org/r/654200 (owner: Giuseppe Lavagetto)
[12:06:20] (CR) Muehlenhoff: "Looks good, one final comment" (2 comments) [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302) (owner: JMeybohm)
[12:07:20] (PS2) Arturo Borrero Gonzalez: openstack: add prometheus script to collect metrics on ceph network usage by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096)
[12:07:57] (CR) jerkins-bot: [V: -1] openstack: add prometheus script to collect metrics on ceph network usage by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096) (owner: Arturo Borrero Gonzalez)
[12:08:04] (CR) Volans: [C: +1] "LGTM!" [docker-images/docker-report] - https://gerrit.wikimedia.org/r/654200 (owner: Giuseppe Lavagetto)
[12:08:05] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:650988|Grant several OATHAuth-related permissions to wmf-supportsafety at Meta (T180896)]] (duration: 00m 56s)
[12:08:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:10] T180896: Allow functionaries to reset second factor on low-risk accounts - https://phabricator.wikimedia.org/T180896
[12:08:17] Operations, ops-eqiad, Analytics-Kanban: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (Peachey88)
[12:09:07] Operations, ops-eqiad, Analytics: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (Peachey88)
[12:09:32] (PS1) Jbond: pupetlabs-lvm: update lvm module with latest upstream [puppet] - https://gerrit.wikimedia.org/r/654216
[12:11:53] (PS3) Arturo Borrero Gonzalez: openstack: add prometheus script to collect metrics on ceph network usage by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096)
[12:13:10] (PS4) Arturo Borrero Gonzalez: openstack: add prometheus script to collect ceph usage network metrics by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096)
[12:15:13] (PS5) Arturo Borrero Gonzalez: openstack: add prometheus script to collect ceph usage network metrics by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096)
[12:28:49] (PS2) Jbond: pupetlabs-lvm: update lvm module with latest upstream [puppet] - https://gerrit.wikimedia.org/r/654216 (https://phabricator.wikimedia.org/T271099)
[12:31:23] (CR) Arturo Borrero Gonzalez: [C: -1] [DONT MERGE] cloud: expand dmz_cidr list for public endpoints (3 comments) [puppet] - https://gerrit.wikimedia.org/r/651169 (https://phabricator.wikimedia.org/T209082) (owner: Arturo Borrero Gonzalez)
[12:37:42] (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/654216 (https://phabricator.wikimedia.org/T271099) (owner: Jbond)
[12:40:55] (PS6) Arturo Borrero Gonzalez: openstack: add prometheus script to collect ceph usage network metrics by nova [puppet] - https://gerrit.wikimedia.org/r/654211 (https://phabricator.wikimedia.org/T271096)
[12:43:42] (PS4) David Caro: wmcs.backup: Add command to remove/print dangling snapshots [puppet] - https://gerrit.wikimedia.org/r/650535 (https://phabricator.wikimedia.org/T270478)
[12:45:26] Amir1: you can test by looking at Special:ListGroupRights - it's assumed the permissions do their job as that's been shown before so as long as it's on ListGroupRights fine then it's near guaranteed to be fine.
[12:45:49] yeah
[12:49:13] (CR) jerkins-bot: [V: -1] wmcs.backups: Add host to the rbd snapshot name [puppet] - https://gerrit.wikimedia.org/r/654221 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[12:52:04] !log deployment-cache-text06: try out varnish 6.0.1-1wm1 T264398
[12:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:08] T264398: 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398
[13:21:36] Puppet, Patch-For-Review: puppetlabs-lvm: upgrade the lvm module to match the puppe;tlabs upstream module - https://phabricator.wikimedia.org/T271099 (Peachey88)
[13:24:06] PROBLEM - Number of messages locally queued by purged for processing on cp1077 is CRITICAL: cluster=cache_text instance=cp1077 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1077
[13:24:28] PROBLEM - Number of messages locally queued by purged for processing on cp3050 is CRITICAL: cluster=cache_text instance=cp3050 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[13:24:48] PROBLEM - Number of messages locally queued by purged for processing on cp3058 is CRITICAL: cluster=cache_text instance=cp3058 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3058
[13:25:24] PROBLEM - Number of messages locally queued by purged for processing on cp3064 is CRITICAL: cluster=cache_text instance=cp3064 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3064
[13:25:34] PROBLEM - Number of messages locally queued by purged for processing on cp3056 is CRITICAL: cluster=cache_text instance=cp3056 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3056
[13:26:10] PROBLEM - Number of messages locally queued by purged for processing on cp3054 is CRITICAL: cluster=cache_text instance=cp3054 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[13:26:14] PROBLEM - Number of messages locally queued by purged for processing on cp1081 is CRITICAL: cluster=cache_text instance=cp1081 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1081
[13:26:14] PROBLEM - Number of messages locally queued by purged for processing on cp3060 is CRITICAL: cluster=cache_text instance=cp3060 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[13:26:18] PROBLEM - Number of messages locally queued by purged for processing on cp3062 is CRITICAL: cluster=cache_text instance=cp3062 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3062
[13:26:24] PROBLEM - Number of messages locally queued by purged for processing on cp2035 is CRITICAL: cluster=cache_text instance=cp2035 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2035
[13:27:08] PROBLEM - Number of messages locally queued by purged for processing on cp1087 is CRITICAL: cluster=cache_text instance=cp1087 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1087
[13:27:52] RECOVERY - Number of messages locally queued by purged for processing on cp1081 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1081
[13:28:02] RECOVERY - Number of messages locally queued by purged for processing on cp2035 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2035
[13:28:29] (CR) Filippo Giunchedi: "LGTM, a couple of comments (more context at https://phabricator.wikimedia.org/T191018 as well)" [puppet] - https://gerrit.wikimedia.org/r/654192 (owner: Elukey)
[13:28:52] <_joe_> uh what was that?
[13:29:04] RECOVERY - Number of messages locally queued by purged for processing on cp1077 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1077
[13:30:50] <_joe_> those queues are going down FWIW
[13:30:57] <_joe_> ema: ^^ if you want to take a look
[13:31:06] <_joe_> it seems ATS is lagging in processing purges
[13:32:04] PROBLEM - Number of messages locally queued by purged for processing on cp1087 is CRITICAL: cluster=cache_text instance=cp1087 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1087
[13:32:23] _joe_: yeah, it looks like https://phabricator.wikimedia.org/T265625
[13:32:32] puzzling
[13:35:32] RECOVERY - Number of messages locally queued by purged for processing on cp3056 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3056
[13:36:04] RECOVERY - Number of messages locally queued by purged for processing on cp3054 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[13:36:14] RECOVERY - Number of messages locally queued by purged for processing on cp3062 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3062
[13:37:00] RECOVERY - Number of messages locally queued by purged for processing on cp3064 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3064
[13:38:42] PROBLEM - Number of messages locally queued by purged for processing on cp1087 is CRITICAL: cluster=cache_text instance=cp1087 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1087
[13:38:43] (PS1) Muehlenhoff: mw maintenance: Install php-readline from component/php72 [puppet] - https://gerrit.wikimedia.org/r/654231 (https://phabricator.wikimedia.org/T245757)
[13:39:30] PROBLEM - Number of messages locally queued by purged for processing on cp3060 is CRITICAL: cluster=cache_text instance=cp3060 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[13:42:02] RECOVERY - Number of messages locally queued by purged for processing on cp1087 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1087
[13:42:37] (CR) Jbond: "labs PCC: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27332/console" [puppet] - https://gerrit.wikimedia.org/r/654216 (https://phabricator.wikimedia.org/T271099) (owner: Jbond)
[13:42:48] RECOVERY - Number of messages locally queued by purged for processing on cp3060 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[13:46:20] RECOVERY - Number of messages locally queued by purged for processing on cp3058 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3058
[13:47:42] RECOVERY - Number of messages locally queued by purged for processing on cp3050 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[13:50:50] Operations, Wikimedia-Mailing-lists: reset admin password for wikimedianl-l - https://phabricator.wikimedia.org/T271104 (Effeietsanders)
[13:55:58] PROBLEM - Number of messages locally queued by purged for processing on cp3050 is CRITICAL: cluster=cache_text instance=cp3050 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[13:56:00] PROBLEM - Number of messages locally queued by purged for processing on cp3054 is CRITICAL: cluster=cache_text instance=cp3054 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[13:56:16] PROBLEM - Number of messages locally queued by purged for processing on cp3058 is CRITICAL: cluster=cache_text instance=cp3058 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3058
[13:56:52] PROBLEM - Number of messages locally queued by purged for processing on cp3064 is CRITICAL: cluster=cache_text instance=cp3064 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3064
[13:58:51] (CR) Elukey: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/654192 (owner: Elukey)
[13:58:52] PROBLEM - Number of messages locally queued by purged for processing on cp1077 is CRITICAL: cluster=cache_text instance=cp1077 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1077
[13:59:16] PROBLEM - Number of messages locally queued by purged for processing on cp2039 is CRITICAL: cluster=cache_text instance=cp2039 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2039
[13:59:20] PROBLEM - Number of messages locally queued by purged for processing on cp1081 is CRITICAL: cluster=cache_text instance=cp1081 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1081
[13:59:22] PROBLEM - Number of messages locally queued by purged for processing on cp5009 is CRITICAL: cluster=cache_text instance=cp5009 job=purged layer=backend site=eqsin https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqsin+prometheus/ops&var-instance=cp5009
[13:59:30] PROBLEM - Number of messages locally queued by purged for processing on cp2033 is CRITICAL: cluster=cache_text instance=cp2033 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2033
[13:59:32] PROBLEM - Number of messages locally queued by purged for processing on cp2035 is CRITICAL: cluster=cache_text instance=cp2035 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2035
[13:59:41] (PS3) Elukey: cache: Migrate hiera() to lookup() and setting datatype [puppet] - https://gerrit.wikimedia.org/r/651640 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[13:59:54] PROBLEM - Number of messages locally queued by purged for processing on cp1075 is CRITICAL: cluster=cache_text instance=cp1075 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1075
[14:00:04] ema: lemme know if it is ok to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/651640
[14:00:08] PROBLEM - Number of messages locally queued by purged for processing on cp2029 is CRITICAL: cluster=cache_text instance=cp2029 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2029
[14:00:15] ah sorry not now :)
[14:00:16] PROBLEM - Number of messages locally queued by purged for processing on cp1087 is CRITICAL: cluster=cache_text instance=cp1087 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1087
[14:00:22] PROBLEM - Number of messages locally queued by purged for processing on cp2027 is CRITICAL: cluster=cache_text instance=cp2027 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2027
[14:00:24] PROBLEM - Number of messages locally queued by purged for processing on cp3056 is CRITICAL: cluster=cache_text instance=cp3056 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3056
[14:01:04] PROBLEM - Number of messages locally queued by purged for processing on cp3060 is CRITICAL: cluster=cache_text instance=cp3060 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[14:01:06] PROBLEM - Number of messages locally queued by purged for processing on cp3062 is CRITICAL: cluster=cache_text instance=cp3062 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3062
[14:01:10] RECOVERY - Number of messages locally queued by purged for processing on cp2033 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2033
[14:05:08] PROBLEM - Number of messages locally queued by purged for processing on cp2029 is CRITICAL: cluster=cache_text instance=cp2029 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2029
[14:05:22] PROBLEM - Number of messages locally queued by purged for processing on cp2027 is CRITICAL: cluster=cache_text instance=cp2027 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2027
[14:05:56] PROBLEM - Number of messages locally queued by purged for processing on cp2039 is CRITICAL: cluster=cache_text instance=cp2039 job=purged layer=backend site=codfw https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2039
[14:07:15] (PS3) Elukey: install_server: add a "rescue" label [puppet] - https://gerrit.wikimedia.org/r/654192
[14:07:18] Operations, Wikimedia-Mailing-lists: reset admin password for wikimedianl-l - https://phabricator.wikimedia.org/T271104 (Joe) p:Triage→Low a:Joe Hi
[14:08:35] (PS4) Elukey: install_server: add a "rescue" label [puppet] - https://gerrit.wikimedia.org/r/654192
[14:08:52] Operations, Wikidata, Wikidata Query UI, Patch-For-Review, User-Addshore: Move WDQS UI to microsites - https://phabricator.wikimedia.org/T266702 (Ladsgroup) >>! In T266702#6662363, @Addshore wrote: > Notes from the call: > > * Branches for gui deploy repo - for WDQS and WCQS > * in the meant...
[14:09:14] RECOVERY - Number of messages locally queued by purged for processing on cp2039 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2039
[14:09:50] RECOVERY - Number of messages locally queued by purged for processing on cp1075 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1075
[14:10:03] (PS5) Elukey: install_server: add a "rescue" label [puppet] - https://gerrit.wikimedia.org/r/654192 (https://phabricator.wikimedia.org/T191018)
[14:10:56] (PS5) Ayounsi: Allow specific flows from 172.16/12 to prod [homer/public] - https://gerrit.wikimedia.org/r/643269 (https://phabricator.wikimedia.org/T209082)
[14:11:02] RECOVERY - Number of messages locally queued by purged for processing on cp5009 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqsin+prometheus/ops&var-instance=cp5009
[14:11:44] RECOVERY - Number of messages locally queued by purged for processing on cp2029 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2029
[14:11:58] RECOVERY - Number of messages locally queued by purged for processing on cp2027 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2027
[14:11:59] (PS4) Elukey: admin: deprecate the analytics-users posix group [puppet] - https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150)
[14:12:33] (CR) Ayounsi: [DONT MERGE] cloud: expand dmz_cidr list for public endpoints (3 comments) [puppet] - https://gerrit.wikimedia.org/r/651169 (https://phabricator.wikimedia.org/T209082) (owner: Arturo Borrero Gonzalez)
[14:12:48] RECOVERY - Number of messages locally queued by purged for processing on cp2035 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=codfw+prometheus/ops&var-instance=cp2035
[14:12:49] (CR) Ayounsi: Allow specific flows from 172.16/12 to prod (4 comments) [homer/public] - https://gerrit.wikimedia.org/r/643269 (https://phabricator.wikimedia.org/T209082) (owner: Ayounsi)
[14:13:42] !log Restart mysql on pc2009
[14:13:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:48] RECOVERY - Number of messages locally queued by purged for processing on cp1077 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1077
[14:14:16] RECOVERY - Number of messages locally queued by purged for processing on cp1081 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1081
[14:15:10] RECOVERY - Number of messages locally queued by purged for processing on cp1087 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1087
[14:15:20] RECOVERY - Number of messages locally queued by purged for processing on cp3056 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3056
[14:15:41] !log installing libdatetime-timezone-perl updates
[14:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:02] RECOVERY - Number of messages locally queued by purged for processing on cp3062 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3062
[14:16:48] RECOVERY - Number of messages locally queued by purged for processing on cp3064 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3064
[14:17:34] RECOVERY - Number of messages locally queued by purged for processing on cp3054 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[14:17:42] RECOVERY - Number of messages locally queued by purged for processing on cp3060 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[14:19:14] RECOVERY - Number of messages locally queued by purged for processing on cp3050 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[14:22:50] RECOVERY - Number of messages locally queued by purged for processing on cp3058 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3058
[14:24:27] (PS1) Jbond: pki: add python db config file [puppet] - https://gerrit.wikimedia.org/r/654255
[14:24:29] (CR) Elukey: [C: +2] admin: deprecate the analytics-users posix group [puppet] - https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: Elukey)
[14:24:34] (PS5) Elukey: admin: deprecate the analytics-users posix group [puppet] - https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150)
[14:25:57] (CR) jerkins-bot: [V: -1] pki: add python db config file [puppet] - https://gerrit.wikimedia.org/r/654255 (owner: Jbond)
[14:25:59] (CR) Filippo Giunchedi: "I understand the rationale of the change, although duplicating the priority both in the filename and the logstash::conf call seems like a " [puppet] - https://gerrit.wikimedia.org/r/650629 (https://phabricator.wikimedia.org/T254533) (owner: Cwhite)
[14:26:01] elukey: feel free to go ahead
[14:26:48] (CR) Filippo Giunchedi: "LGTM, modulo comments on Id1a7d2bd59f" [puppet] - https://gerrit.wikimedia.org/r/647028 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[14:27:29] (CR) Filippo Giunchedi: [C: +1] profile: update netdev rsyslog template to ecs 1.7.0 [puppet] - https://gerrit.wikimedia.org/r/647032 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[14:28:04] (CR) Filippo Giunchedi: "LGTM modulo comments in Id1a7d2bd59f" [puppet] - https://gerrit.wikimedia.org/r/647029 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[14:29:54] (CR) Ssingh: "recheck" [software/knead-wikidough] - https://gerrit.wikimedia.org/r/639838 (https://phabricator.wikimedia.org/T267424) (owner: Ssingh)
[14:29:59] (PS2) Jbond: pki: add python db config file [puppet] - https://gerrit.wikimedia.org/r/654255
[14:31:02] !log installing openssl updates on buster-based DB hosts
[14:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:13] (PS1) Jbond: (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257
[14:33:23] (PS8) MSantos: start using imposm as OSM sync tool [puppet] - https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949)
[14:34:36] (CR) Ssingh: [C: +2] "CR completed, CI is passing (finally!). Merging." [software/knead-wikidough] - https://gerrit.wikimedia.org/r/639838 (https://phabricator.wikimedia.org/T267424) (owner: Ssingh)
[14:34:41] !log Upgrade and restart mysql on es2020 and es2024 - T271106
[14:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:46] T271106: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106
[14:35:08] (CR) jerkins-bot: [V: -1] start using imposm as OSM sync tool [puppet] - https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: MSantos)
[14:35:22] (CR) jerkins-bot: [V: -1] (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257 (owner: Jbond)
[14:37:13] Operations, ops-eqsin, DC-Ops: cr2-eqsin: fan failure - https://phabricator.wikimedia.org/T267544 (ayounsi) The Netops steps are: 1/ connect the new router temporarily for OS upgrade/software config 2/ depool site 3/ shutdown old router 4/ swap router 5/ power on new router 6/ repool site 7/ wipe old...
[14:38:09] Operations, Wikimedia-Mailing-lists: reset admin password for wikimedianl-l - https://phabricator.wikimedia.org/T271104 (Joe) Open→Resolved The password has been re-set, and the list admins should receive an email shortly with the password (yes, via email in plain text, welcome to mailman, and th...
[14:38:18] (CR) Ottomata: Use puppetca cert for eventstreams so it can make requests to internal services (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/643085 (https://phabricator.wikimedia.org/T253069) (owner: Ottomata)
[14:39:00] (CR) Ottomata: Add event stream config for android.user_contributions_screen (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/639284 (https://phabricator.wikimedia.org/T228179) (owner: Mholloway)
[14:39:01] Operations, Traffic, Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (ssingh)
[14:39:24] Operations, Traffic, Patch-For-Review: Integration tests for Wikidough - https://phabricator.wikimedia.org/T267424 (ssingh) Open→Resolved With https://gerrit.wikimedia.org/r/639838 merged, I am going to mark this as resolved as the first iteration of the test suite for Wikidough is now comple...
[14:41:10] (PS3) Jbond: pki: add python db config file [puppet] - https://gerrit.wikimedia.org/r/654255
[14:41:53] (PS1) Elukey: admin: move user matmarex from 'researchers' to 'analytics-privatedata-users' [puppet] - https://gerrit.wikimedia.org/r/654258 (https://phabricator.wikimedia.org/T268801)
[14:45:00] (CR) MSantos: [C: +1] postgres: increase number of WAL files retained by master [puppet] - https://gerrit.wikimedia.org/r/643717 (owner: Hnowlan)
[14:46:32] Operations, Maps, JavaScript: Display map markers on Kartographer maps even in case of mapserver failures - https://phabricator.wikimedia.org/T270865 (sdkim) p:Medium→Low
[14:47:30] (PS1) Elukey: admin: remove ssh access to jherndandez [puppet] - https://gerrit.wikimedia.org/r/654259 (https://phabricator.wikimedia.org/T268801)
[14:47:38] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27334/console" [puppet] - https://gerrit.wikimedia.org/r/654255 (owner: Jbond)
[14:47:59] (CR) jerkins-bot: [V: -1] admin: remove ssh access to jherndandez [puppet] - https://gerrit.wikimedia.org/r/654259 (https://phabricator.wikimedia.org/T268801) (owner: Elukey)
[14:48:19] (PS9) MSantos: start using imposm as OSM sync tool [puppet] - https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949)
[14:48:55] (PS2) Elukey: admin: remove ssh access to jherndandez [puppet] - https://gerrit.wikimedia.org/r/654259 (https://phabricator.wikimedia.org/T268801)
[14:49:48] (CR) jerkins-bot: [V: -1] start using imposm as OSM sync tool [puppet] - https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: MSantos)
[14:50:51] !log cp3058: ats-backend-restart T265625
[14:50:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:55] T265625: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625
[14:53:54] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/654259 (https://phabricator.wikimedia.org/T268801) (owner: Elukey)
[14:53:58] (PS4) Jbond: pki: add python db config file [puppet] - https://gerrit.wikimedia.org/r/654255
[14:54:46] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27335/console" [puppet] - https://gerrit.wikimedia.org/r/654255 (owner: Jbond)
[14:54:57] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/654258 (https://phabricator.wikimedia.org/T268801) (owner: Elukey)
[14:56:38] (CR) Jbond: [V: +1 C: +2] pki: add python db config file [puppet] - https://gerrit.wikimedia.org/r/654255 (owner: Jbond)
[15:04:34] (PS1) Muehlenhoff: Enable base::service_auto_restart for Apache/Nginx on debmonitor [puppet] - https://gerrit.wikimedia.org/r/654261 (https://phabricator.wikimedia.org/T135991)
[15:11:00] (PS1) Jbond: cfssl::db: use correct config [puppet] - https://gerrit.wikimedia.org/r/654263
[15:13:13] (CR) Jbond: [C: +2] cfssl::db: use correct config [puppet] - https://gerrit.wikimedia.org/r/654263 (owner: Jbond)
[15:21:57] Operations, ops-codfw, DC-Ops, fundraising-tech-ops: (Need By: 2020-09-30) rack/setup/install frmx2001.frack.codfw.wmnet, frdata2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T260183 (Jgreen)
[15:21:59] Operations, fundraising-tech-ops: (Need By: 2020-09-30) rack/setup/install frmx1001 & frdata1002 - https://phabricator.wikimedia.org/T260181 (Jgreen)
[15:29:32] Operations, ops-codfw, SRE-swift-storage, netops: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (Papaul) Open→Resolved a:Papaul ` Queue: 8, Forwarding classes: mcast Queued: Packets : 0 0 pps B...
[15:30:48] PROBLEM - Check systemd state on pki1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:32:44] Operations, ops-codfw, DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Papaul) @Marostegui yes we can swap the DIMM and see . You can depool the server when you can and let me know.
[15:33:10] !log Depool db2140 T271084
[15:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:16] T271084: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084
[15:33:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2140 ', diff saved to https://phabricator.wikimedia.org/P13644 and previous config saved to /var/cache/conftool/dbconfig/20210104-153339-marostegui.json
[15:33:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:55] jbond42: FYI cfssl failed on pki1001 ^^^ AttributeError: 'NoneType' object has no attribute 'encoding'
[15:34:02] (PS1) David Caro: wmcs.backup: Add backup_image command [puppet] - https://gerrit.wikimedia.org/r/654266 (https://phabricator.wikimedia.org/T270478)
[15:34:21] like the DB connection returned none
[15:34:56] Operations, ops-codfw, DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui) @Papaul server off!
[15:34:58] PROBLEM - Check systemd state on pki2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:35:29] (PS1) David Caro: wmcs.backup: blacked all files [puppet] - https://gerrit.wikimedia.org/r/654267
[15:35:31] (CR) Mforns: "I think Andrew-WMDE's -1 is now solved, as the dependency has been merged and queries are deployed now." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[15:36:42] (CR) jerkins-bot: [V: -1] wmcs.backup: Add backup_image command [puppet] - https://gerrit.wikimedia.org/r/654266 (https://phabricator.wikimedia.org/T270478) (owner: David Caro)
[15:38:08] (CR) David Caro: [C: +2] wmcs.backup: Remove all temp files after usage [puppet] - https://gerrit.wikimedia.org/r/650542 (https://phabricator.wikimedia.org/T270478) (owner: David Caro)
[15:38:24] (CR) David Caro: [C: +2] wmcs.backup: Add command to remove/print dangling snapshots [puppet] - https://gerrit.wikimedia.org/r/650535 (https://phabricator.wikimedia.org/T270478) (owner: David Caro)
[15:39:00] PROBLEM - Number of messages locally queued by purged for processing on cp3060 is CRITICAL: cluster=cache_text instance=cp3060 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[15:39:52] (PS5) David Caro: wmcs.backup: Add a images summary command [puppet] - https://gerrit.wikimedia.org/r/651166 (https://phabricator.wikimedia.org/T267195)
[15:40:40] RECOVERY - Number of messages locally queued by purged for processing on cp3060 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[15:42:21] (PS6) David Caro: wmcs.backup: Add a images summary command [puppet] - https://gerrit.wikimedia.org/r/651166 (https://phabricator.wikimedia.org/T267195)
[15:42:23] (PS4) David Caro: wmcs.backup: Add a method to create a vm backup [puppet] - https://gerrit.wikimedia.org/r/651507 (https://phabricator.wikimedia.org/T267195)
[15:42:25] (PS5) David Caro: wmcs.backup: Remove all dangling snapshots [puppet] - https://gerrit.wikimedia.org/r/651537 (https://phabricator.wikimedia.org/T267195)
[15:42:27] (PS4) David Caro: wmcs.backup: Add a way to remove old backups and snapshots [puppet] - https://gerrit.wikimedia.org/r/651550 (https://phabricator.wikimedia.org/T267195)
[15:42:29] (PS4) David Caro: wmcs.backup: Add command to backup all assigned vms [puppet] - https://gerrit.wikimedia.org/r/651761 (https://phabricator.wikimedia.org/T267195)
[15:42:31] (PS4) David Caro: wmcs.backup: add a command to remove non-handled backups [puppet] - https://gerrit.wikimedia.org/r/651776 (https://phabricator.wikimedia.org/T267195)
[15:42:33] (PS2) David Caro: wmcs.backup: Add a command to create the next backup [puppet] - https://gerrit.wikimedia.org/r/654220 (https://phabricator.wikimedia.org/T267195)
[15:42:35] (PS2) David Caro: wmcs.backup: Add host to the rbd snapshot name [puppet] - https://gerrit.wikimedia.org/r/654221 (https://phabricator.wikimedia.org/T267195)
[15:42:37] (PS2) David Caro: wmcs.backup: Add backup_image command [puppet] - https://gerrit.wikimedia.org/r/654266 (https://phabricator.wikimedia.org/T270478)
[15:42:39] (PS2) David Caro: wmcs.backup: blacked all files [puppet] - https://gerrit.wikimedia.org/r/654267
[15:43:42] Operations, ops-codfw, DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Papaul) @Marostegui swapped A7 with B6 , clear the IDRAC log no more errors for now
[15:44:47] (PS3) JMeybohm: Restart systemd units on package upgrade [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302)
[15:45:20] (CR) JMeybohm: [C: +2] Don't register /var/run/kubernetes (as it's unused) [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650505 (https://phabricator.wikimedia.org/T270298) (owner: JMeybohm)
[15:45:21] volans: thanks that can be ignored will fix after this meeting (10 mins)
[15:45:35] ack, no prob, thx
[15:45:39] (CR) JMeybohm: [C: +2] "Thanks!" (1 comment) [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302) (owner: JMeybohm)
[15:46:03] Operations, ops-codfw, DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui) Thanks Papaul! I am going to check the data and will close the task once I am done. If it happens again we can reopen and ask Dell for a replacement
[15:46:21] (PS2) Awight: Add a job for TemplateData metrics aggregation [puppet] - https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[15:46:37] (CR) Awight: Add a job for TemplateData metrics aggregation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[15:47:36] (CR) jerkins-bot: [V: -1] wmcs.backup: Add host to the rbd snapshot name [puppet] - https://gerrit.wikimedia.org/r/654221 (https://phabricator.wikimedia.org/T267195) (owner: David Caro)
[15:47:45] (CR) jerkins-bot: [V: -1] wmcs.backup: Add backup_image command [puppet] - https://gerrit.wikimedia.org/r/654266 (https://phabricator.wikimedia.org/T270478) (owner: David Caro)
[15:48:05] Operations, ops-codfw, DC-Ops: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul)
[15:48:13] (CR) Mforns: [C: +1] "LGTM! Elukey, I think this can be merged. Andrew-WMDE's -1 has been solved by now." [puppet] - https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[15:48:18] (PS3) Urbanecm: labs: bnwiki: Fix a typo in wgGEHelpPanelLinks config [mediawiki-config] - https://gerrit.wikimedia.org/r/650948 (https://phabricator.wikimedia.org/T270578)
[15:48:28] jouncebot: now
[15:48:28] No deployments scheduled for the next 2 hour(s) and 11 minute(s)
[15:48:37] (CR) Urbanecm: [C: +2] "beta only" [mediawiki-config] - https://gerrit.wikimedia.org/r/650948 (https://phabricator.wikimedia.org/T270578) (owner: Urbanecm)
[15:49:01] (CR) JMeybohm: [V: +2 C: +2] Restart systemd units on package upgrade [debs/kubernetes] (future) - https://gerrit.wikimedia.org/r/650521 (https://phabricator.wikimedia.org/T270302) (owner: JMeybohm)
[15:49:06] (CR) Razzi: [C: +1] Set fs.permissions.umask-mode for the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/654187 (https://phabricator.wikimedia.org/T270629) (owner: Elukey)
[15:49:27] (Merged) jenkins-bot: labs: bnwiki: Fix a typo in wgGEHelpPanelLinks config [mediawiki-config] - https://gerrit.wikimedia.org/r/650948 (https://phabricator.wikimedia.org/T270578) (owner: Urbanecm)
[15:55:38] PROBLEM - exim queue on mx1001 is CRITICAL: CRITICAL: 4379 mails in exim queue. https://wikitech.wikimedia.org/wiki/Exim
[15:56:20] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[15:59:48] (PS1) Jbond: cfssl: use *args vs **kwargs [puppet] - https://gerrit.wikimedia.org/r/654269
[16:02:31] Operations, ops-codfw, DC-Ops: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul)
[16:03:00] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:05:57] (CR) Jbond: [C: +2] cfssl: use *args vs **kwargs [puppet] - https://gerrit.wikimedia.org/r/654269 (owner: Jbond)
[16:10:16] RECOVERY - Check systemd state on pki1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:10:16] RECOVERY - Check systemd state on pki2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:14:20] Operations, ops-codfw, DC-Ops: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul)
[16:21:48] (PS1) Elukey: Remove analytics-users from various analytics and sre configs [puppet] - https://gerrit.wikimedia.org/r/654271 (https://phabricator.wikimedia.org/T269150)
[16:25:36] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[16:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:50] (CR) Elukey: [C: +2] admin: move user matmarex from 'researchers' to 'analytics-privatedata-users' [puppet] - https://gerrit.wikimedia.org/r/654258 (https://phabricator.wikimedia.org/T268801) (owner: Elukey)
[16:27:08] (CR) Elukey: [C: +2] admin: remove ssh access to jherndandez [puppet] - https://gerrit.wikimedia.org/r/654259 (https://phabricator.wikimedia.org/T268801) (owner: Elukey)
[16:27:14] (PS3) Elukey: admin: remove ssh access to jherndandez [puppet] - https://gerrit.wikimedia.org/r/654259 (https://phabricator.wikimedia.org/T268801)
[16:28:21] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/654261 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[16:29:16] Operations, Wikimedia-Mailing-lists, I18n: Mailman password reminder mail (and other texts) has broken encoding in Czech - https://phabricator.wikimedia.org/T271123 (Mormegil)
[16:32:01] !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:02] !log import kubernetes 1.16.15-4 to component/kubernetes-future buster-wikimedia and stretch-wikimedia
[16:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:48] Operations, Discovery-Search (Current work): Reshard commonswiki_file elasticsearch index - https://phabricator.wikimedia.org/T260083 (RKemper) >>! In T260083#6718598, @Joe wrote: > @RKemper I'm assigning this task to you, since it seems you're acting on it. Please remove yourself as assignee if that's n...
[16:38:29] Operations, Wikimedia-Mailing-lists, I18n: Mailman password reminder mail (and other texts) has broken encoding in Czech - https://phabricator.wikimedia.org/T271123 (Joe) p:Triage→High @Ladsgroup @herron Can you take a look? I guess we just need to add czech to the exception languages?
[16:39:21] Operations, Discovery-Search (Current work): Reshard commonswiki_file elasticsearch index - https://phabricator.wikimedia.org/T260083 (CBogen)
[16:40:56] Operations, ops-eqsin, DC-Ops: cr2-eqsin: fan failure - https://phabricator.wikimedia.org/T267544 (RobH)
[16:42:35] (PS2) Jbond: (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257
[16:44:29] (CR) jerkins-bot: [V: -1] (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257 (owner: Jbond)
[16:44:32] Operations, ops-codfw, DC-Ops: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul)
[16:47:01] Operations, Wikimedia-Mailing-lists, I18n: Mailman password reminder mail (and other texts) has broken encoding in Czech - https://phabricator.wikimedia.org/T271123 (Ladsgroup) oh boy. My suggestion is that for sake of uniformity and ease of maintenance, we should convert all the files to use utf-8 i...
[16:52:50] Operations, Wikimedia-Mailing-lists, I18n: Mailman password reminder mail (and other texts) has broken encoding in Czech - https://phabricator.wikimedia.org/T271123 (Mormegil) +1 from me. I think it should not be worse than the current state. :-) (Read: It might break something but that thing is prob...
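Ladsgroup's suggestion in T271123 amounts to re-encoding the Czech Mailman text files as UTF-8. The Python below is a minimal sketch of that step only, assuming the files are currently in iso-8859-2 (Latin-2) and sit under a directory you pass in. The function name, the default charset, and the `*.txt` pattern are assumptions, not details from the task. Files that already decode as UTF-8 are skipped, so the sketch can be re-run safely. It only rewrites bytes on disk; whatever charset Mailman declares for the language would presumably need to change to match, and this sketch does not handle that.

```python
from __future__ import annotations

from pathlib import Path


def convert_to_utf8(directory: str, legacy_charset: str = "iso-8859-2",
                    pattern: str = "*.txt") -> list[str]:
    """Re-encode files matching `pattern` under `directory` from `legacy_charset` to UTF-8.

    Hypothetical helper for illustration. Returns the paths (relative to
    `directory`) of the files that were rewritten.
    """
    root = Path(directory)
    converted = []
    for path in sorted(root.rglob(pattern)):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (pure ASCII included): leave untouched
        except UnicodeDecodeError:
            pass
        # Latin-2 maps every byte value, so a wrong charset guess yields
        # mojibake rather than an exception; spot-check a file before a bulk run.
        text = raw.decode(legacy_charset)
        path.write_bytes(text.encode("utf-8"))
        converted.append(str(path.relative_to(root)))
    return converted
```

For example, `convert_to_utf8("/path/to/templates/cs")` (a placeholder path) would return the files it rewrote. A second run would return an empty list.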
[17:03:30] (PS1) Ssingh: dnsdist: allow custom headers in the HTTP response and enable HSTS [puppet] - https://gerrit.wikimedia.org/r/654275 (https://phabricator.wikimedia.org/T252132)
[17:05:21] (CR) Ssingh: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27336/console" [puppet] - https://gerrit.wikimedia.org/r/654275 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[17:08:41] Operations, Traffic: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625 (ema) All that CPU time is spent in the kernel, and specifically calling `mmap` a lot. I've seen `ksys_mmap_pgoff` featured prominently in `perf report` of affected nodes, and tracing for 10 secon...
[17:16:42] (PS3) Jbond: (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257
[17:18:30] (CR) jerkins-bot: [V: -1] (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257 (owner: Jbond)
[17:21:03] (PS1) Elukey: admin: remove members of 'reseachers' already in other posix groups [puppet] - https://gerrit.wikimedia.org/r/654277 (https://phabricator.wikimedia.org/T268801)
[17:25:48] (PS4) Jbond: (WIP) create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257
[17:27:09] (CR) CRusnov: "Thanks for the reviews!" [puppet] - https://gerrit.wikimedia.org/r/628436 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[17:29:26] Operations, serviceops, cloud-services-team (Kanban): Upgrade labweb servers to buster - https://phabricator.wikimedia.org/T269004 (Andrew) @jijiki, my tests suggest that this upgrade will go smoothly. If you judge MW to be mostly ready for Buster then I'll go ahead and upgrade things this week; if n...
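The log does not show the contents of the "create a script to detect manually installed packages" change (654257). The Python below is only a rough sketch of the general idea, not that script. It compares the output of `apt-mark showmanual`, which prints one manually installed package per line, against an allow list. The allow list (for example, the packages a configuration-management tool is expected to install) and all function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def parse_manual_packages(showmanual_output: str) -> set[str]:
    """Turn `apt-mark showmanual` output (one package per line) into a set."""
    return {line.strip() for line in showmanual_output.splitlines() if line.strip()}


def unexpected_manual_packages(showmanual_output: str, expected: set[str]) -> list[str]:
    """Return the manually installed packages that are not in `expected`, sorted."""
    return sorted(parse_manual_packages(showmanual_output) - expected)


def read_manual_packages() -> str:
    """Run `apt-mark showmanual` on the local host (Debian-family systems only)."""
    result = subprocess.run(
        ["apt-mark", "showmanual"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Debian host, `unexpected_manual_packages(read_manual_packages(), allowed)` would list the packages someone installed by hand that the hypothetical `allowed` set does not account for. The parsing is kept separate from the subprocess call so it can be tested without apt.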
[17:29:35] (PS3) CRusnov: Port elasticsearch/es-tool.py to Python3 [puppet] - https://gerrit.wikimedia.org/r/644591 (https://phabricator.wikimedia.org/T247364)
[17:30:15] (PS4) CRusnov: check_ripe_atlas.py: Port to Python3 [puppet] - https://gerrit.wikimedia.org/r/646879 (https://phabricator.wikimedia.org/T247364)
[17:31:28] (PS1) Jbond: profile::base: add parameter to install apt audit script [puppet] - https://gerrit.wikimedia.org/r/654278
[17:31:30] (PS1) Jbond: sretest: test install_apt_audit_installed script [puppet] - https://gerrit.wikimedia.org/r/654279
[17:31:32] (CR) CRusnov: [C: +2] Port elasticsearch/es-tool.py to Python3 [puppet] - https://gerrit.wikimedia.org/r/644591 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[17:31:48] (CR) CRusnov: [C: +2] check_ripe_atlas.py: Port to Python3 [puppet] - https://gerrit.wikimedia.org/r/646879 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[17:32:00] (CR) CRusnov: [C: +2] modules/icinga/files/raid_handler.py: Port to Python 3 [puppet] - https://gerrit.wikimedia.org/r/647369 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[17:32:59] (PS5) Jbond: apt: Create a script to detect manually installed packages [puppet] - https://gerrit.wikimedia.org/r/654257
[17:33:32] (CR) Jbond: "I think this should be good for at least a first pass" [puppet] - https://gerrit.wikimedia.org/r/654257 (owner: Jbond)
[17:33:40] (PS2) Jbond: profile::base: add parameter to install apt audit script [puppet] - https://gerrit.wikimedia.org/r/654278
[17:33:53] (PS2) Jbond: sretest: test install_apt_audit_installed script [puppet] - https://gerrit.wikimedia.org/r/654279
[17:35:12] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27337/console" [puppet] - https://gerrit.wikimedia.org/r/654279 (owner: Jbond)
[17:38:41] (PS3) Jbond: sretest: test install_apt_audit_installed script [puppet] - https://gerrit.wikimedia.org/r/654279
[17:38:59] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:39:39] RECOVERY - Check systemd state on netbox2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:39:48] (PS3) Elukey: Add a job for TemplateData metrics aggregation [puppet] - https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[17:40:27] (PS4) Jbond: sretest: test install_apt_audit_installed script [puppet] - https://gerrit.wikimedia.org/r/654279
[17:41:17] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27339/console" [puppet] - https://gerrit.wikimedia.org/r/654279 (owner: Jbond)
[17:44:21] PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:44:43] (CR) Elukey: [C: +2] Add a job for TemplateData metrics aggregation [puppet] - https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[17:51:14] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:51:50] (CR) CRusnov: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/652575 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[17:57:50] Operations, Mail, Epic: Move most (all?) exim personal aliases to WMF ITS - https://phabricator.wikimedia.org/T122144 (Dzahn) Aliases of former board members, staff and affiliates that have been removed now after the announcement from ITS and the grace period until end of 2020. ` fdevouard@wikipedi...
[17:58:38] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/654282
[18:00:04] ryankemper: How many deployers does it take to do Wikidata Query Service weekly deploy deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T1800).
[18:04:28] Operations, ops-codfw: RMA failed codfw C7 switch - WMF6114 - https://phabricator.wikimedia.org/T267950 (Papaul) Open→Resolved The new switch is rack in C1 qfx5100-spare2-codfw and wired, ready for any other configuration ` Model: qfx5100-48s-6q Junos: 14.1X53-D43.7 JUNOS Base OS boot [14.1X53-D...
[18:08:48] PROBLEM - Rate of JVM GC Old generation-s runs - logstash1012-production-logstash-eqiad on logstash1012 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1012&panelId=37
[18:14:22] !log restart elasticsearch on logstash1012 - oom
[18:14:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:00] (PS1) Andrew Bogott: Add a rough account-creation script for acme-chief/LE [puppet] - https://gerrit.wikimedia.org/r/654285 (https://phabricator.wikimedia.org/T207372)
[18:23:31] (CR) jerkins-bot: [V: -1] Add a rough account-creation script for acme-chief/LE [puppet] - https://gerrit.wikimedia.org/r/654285 (https://phabricator.wikimedia.org/T207372) (owner: Andrew Bogott)
[18:24:27] jouncebot: next
[18:24:27] In 0 hour(s) and 35 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T1900)
[18:27:11] (PS2) Andrew Bogott: Add a rough account-creation script for acme-chief/LE [puppet] - https://gerrit.wikimedia.org/r/654285 (https://phabricator.wikimedia.org/T207372)
[18:29:44] (CR) Andrew Bogott: [C: +2] Add a rough account-creation script for acme-chief/LE [puppet] - https://gerrit.wikimedia.org/r/654285 (https://phabricator.wikimedia.org/T207372) (owner: Andrew Bogott)
[18:36:24] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:36:54] PROBLEM - Check systemd state on netbox2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:37:05] grumble
[18:41:50] (CR) Elukey: Add cookbook for rebooting druid nodes (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) (owner: Razzi)
[18:44:20] PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[18:50:53] !log ayounsi@cumin1001 START - Cookbook sre.dns.netbox
[18:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:55] (Abandoned) Dduvall: pipeline: Increase fetch depth to 50 [core] (wmf/1.36.0-wmf.22) - https://gerrit.wikimedia.org/r/650573 (owner: Dduvall)
[18:55:10] RECOVERY - Rate of JVM GC Old generation-s runs - logstash1012-production-logstash-eqiad on logstash1012 is OK: (C)100 gt (W)80 gt 70.14 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1012&panelId=37
[18:56:07] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:56:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] RoanKattouw, Niharika, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy Morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T1900).
[19:00:05] No GERRIT patches in the queue for this window AFAICS.
[19:00:13] I'll deploy a few things
[19:00:56] Operations, SRE-tools, Traffic, User-crusnov: Some Traffic clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271144 (crusnov)
[19:00:58] (PS2) Urbanecm: metawiki: Grant oathauth-view-log to stewards [mediawiki-config] - https://gerrit.wikimedia.org/r/651002
[19:01:02] (CR) Urbanecm: [C: +2] metawiki: Grant oathauth-view-log to stewards [mediawiki-config] - https://gerrit.wikimedia.org/r/651002 (owner: Urbanecm)
[19:01:18] RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[19:01:47] (PS2) Urbanecm: ukwikisource: Add Archive namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653168 (https://phabricator.wikimedia.org/T270627)
[19:01:53] (CR) Urbanecm: [C: +2] ukwikisource: Add Archive namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653168 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:01:57] (Merged) jenkins-bot: metawiki: Grant oathauth-view-log to stewards [mediawiki-config] - https://gerrit.wikimedia.org/r/651002 (owner: Urbanecm)
[19:03:29] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 9af0b01b87715e8071df071d0c3b8fbbd89c4e69: metawiki: Grant oathauth-view-log to stewards (duration: 00m 57s)
[19:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:42] (PS3) Urbanecm: ukwikisource: Add Archive namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653168 (https://phabricator.wikimedia.org/T270627)
[19:03:44] (CR) Urbanecm: [C: +2] ukwikisource: Add Archive namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653168 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:04:16] (Merged) jenkins-bot: ukwikisource: Add Archive namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653168 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:05:45] (PS2) Urbanecm: ukwikisource: Delete Translation namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653170 (https://phabricator.wikimedia.org/T270628)
[19:05:56] (CR) Urbanecm: [C: +2] ukwikisource: Delete Translation namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653170 (https://phabricator.wikimedia.org/T270628) (owner: Urbanecm)
[19:06:31] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: c54783bff9862829d220c42544d523e06f86e6ac: ukwikisource: Add Archive namespace (T270627) (duration: 00m 57s)
[19:06:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:37] T270627: Add Archive: namespace to Ukrainian Wikisource - https://phabricator.wikimedia.org/T270627
[19:06:48] (Merged) jenkins-bot: ukwikisource: Delete Translation namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653170 (https://phabricator.wikimedia.org/T270628) (owner: Urbanecm)
[19:07:20] !log mwscript namespaceDupes.php --wiki=ukwikisource --fix # T270627
[19:07:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:08] (CR) Legoktm: "This seems fine, but isn't the maintenance script convention lowerCamelCase?" [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[19:09:02] (PS2) Urbanecm: frwiktionary: Mark several namespaces as content namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/653174 (https://phabricator.wikimedia.org/T270821)
[19:09:08] (CR) Urbanecm: [C: +2] frwiktionary: Mark several namespaces as content namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/653174 (https://phabricator.wikimedia.org/T270821) (owner: Urbanecm)
[19:09:12] (CR) Jforrester: [C: +1] "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[19:09:23] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 192fe58369cba7b35f2fc426b55d410b0fe351d4: ukwikisource: Delete Translation namespace (T270628) (duration: 00m 58s)
[19:09:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:27] T270628: Delete Translation: namespace to Ukrainian Wikisource - https://phabricator.wikimedia.org/T270628
[19:09:56] (Merged) jenkins-bot: frwiktionary: Mark several namespaces as content namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/653174 (https://phabricator.wikimedia.org/T270821) (owner: Urbanecm)
[19:11:32] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: d8670c2e14791448cb087e978c0b02290048dc1d: frwiktionary: Mark several namespaces as content namespaces (T270821) (duration: 00m 57s)
[19:11:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:36] T270821: Make some namespaces content namespaces ($wgContentNamespaces) on French Wiktionary - https://phabricator.wikimedia.org/T270821
[19:12:56] (PS3) Urbanecm: hrwiki: Enable visual editor in the draft (Nacrt) namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653177 (https://phabricator.wikimedia.org/T270688)
[19:13:00] (CR) Urbanecm: [C: +2] hrwiki: Enable visual editor in the draft (Nacrt) namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653177 (https://phabricator.wikimedia.org/T270688) (owner: Urbanecm)
[19:13:49] (Merged) jenkins-bot: hrwiki: Enable visual editor in the draft (Nacrt) namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653177 (https://phabricator.wikimedia.org/T270688) (owner: Urbanecm)
[19:14:10] (CR) Bstorm: "> Patch Set 4:" [puppet] - https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) (owner: Bstorm)
[19:15:17] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 7fe2d562eb286e15626795daabbe8597075903b1: hrwiki: Enable visual editor in the draft (Nacrt) namespace (T270688) (duration: 00m 55s)
[19:15:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:21] T270688: Enable VisualEditor in hrwiki Draft namespace - https://phabricator.wikimedia.org/T270688
[19:15:27] Operations, MW-on-K8s, serviceops, Release Pipeline (Blubber), Release-Engineering-Team (Pipeline): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (Legoktm) Specifically regarding pygmentize, we should be using the version that's currently bundled...
[19:17:21] (PS1) Urbanecm: ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627)
[19:17:31] (PS4) Urbanecm: hrwiki: Restrict changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996)
[19:17:37] (CR) Urbanecm: [C: +2] hrwiki: Restrict changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996) (owner: Urbanecm)
[19:18:09] (PS2) Urbanecm: ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627)
[19:18:14] (CR) Urbanecm: [C: +2] ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:18:19] Urbanecm: is there space for one more config patch in the deploy window today?
[19:18:33] (Merged) jenkins-bot: hrwiki: Restrict changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996) (owner: Urbanecm)
[19:18:34] Jdlrobson: sure
[19:18:47] (PS1) Jdlrobson: Enable PageImages on mediawiki namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/654291
[19:18:48] awesome! i'll update the deployment page now
[19:18:51] ^ that's the patch
[19:18:56] okay, thanks
[19:19:51] done!
[19:20:02] im a bit slow getting started this morning :)
[19:20:30] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 88b9316c3f1a0e176dee0b9435f01c5fd54ab165: hrwiki: Restrict changetags permissions to sysop and bot group (T270996) (duration: 00m 55s)
[19:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:34] T270996: Restrict changetags userright to sysops and bots on hrwiki - https://phabricator.wikimedia.org/T270996
[19:20:40] (PS2) Urbanecm: mediawikiwiki: Enable PageImages on couple more namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/654291 (owner: Jdlrobson)
[19:20:53] I clarified the commit message a bit (originally i thought you're enabling it at NS_MEDIAWIKI everywhere)
[19:21:02] (PS3) Urbanecm: mediawikiwiki: Enable PageImages on couple more namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/654291 (owner: Jdlrobson)
[19:21:08] (CR) Urbanecm: [C: +2] mediawikiwiki: Enable PageImages on couple more namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/654291 (owner: Jdlrobson)
[19:21:43] Urbanecm: thx!
[19:21:58] (Merged) jenkins-bot: mediawikiwiki: Enable PageImages on couple more namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/654291 (owner: Jdlrobson)
[19:22:00] this one needs to be synced before i can test though (as far as a i recall)
[19:22:05] okay
[19:22:11] we just need to check nothing explodes because of a config error
[19:22:26] I've done that before with a missing semicolon i believe haha
[19:22:33] Jdlrobson: pulled to mwdebug1001 now
[19:22:58] i can confirm nothing blew up so that's good
[19:23:09] great
[19:23:10] syncing then
[19:23:44] it works!
[19:23:50] great!
[19:24:20] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 19aa23f11befd1e3d0efda4c680f1c117729bf26: mediawikiwiki: Enable PageImages on couple more namespaces (duration: 00m 55s)
[19:24:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:32] Jdlrobson: synced. Anything else?
[19:25:11] (PS3) Urbanecm: ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627)
[19:25:16] (CR) Urbanecm: ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:25:20] (CR) Urbanecm: [C: +2] ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:26:07] (Merged) jenkins-bot: ukwikisource: Search Archive NS by default [mediawiki-config] - https://gerrit.wikimedia.org/r/654290 (https://phabricator.wikimedia.org/T270627) (owner: Urbanecm)
[19:26:35] nope that's it
[19:26:41] thanks Urbanecm that's going to make this week a lot easier :)
[19:26:54] happy to make your weeks easier! :)
[19:28:37] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 5bb1e32c4922067e78297c3bda30f96f931fdf81: ukwikisource: Search Archive NS by default (T270627) (duration: 00m 55s)
[19:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:41] T270627: Add Archive: namespace to Ukrainian Wikisource - https://phabricator.wikimedia.org/T270627
[19:28:48] 🙏
[19:29:14] (PS5) Urbanecm: Add wgImportSources for zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/637869 (https://phabricator.wikimedia.org/T266388) (owner: Hamish)
[19:29:20] (CR) Urbanecm: [C: +2] Add wgImportSources for zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/637869 (https://phabricator.wikimedia.org/T266388) (owner: Hamish)
[19:30:11] (Merged) jenkins-bot: Add wgImportSources for zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/637869 (https://phabricator.wikimedia.org/T266388) (owner: Hamish)
[19:32:01] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 96115197ded0d9a983dcce74da0d8080d0c58f33: Add wgImportSources for zhwikinews (T266388) (duration: 00m 56s)
[19:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:05] T266388: Open Transwiki import in Chinese Wikinews - https://phabricator.wikimedia.org/T266388
[19:32:29] (PS2) Urbanecm: Enable abusefilter block at hrwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653206 (https://phabricator.wikimedia.org/T270997) (owner: Luke081515)
[19:32:39] (CR) Urbanecm: [C: +2] Enable abusefilter block at hrwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653206 (https://phabricator.wikimedia.org/T270997) (owner: Luke081515)
[19:33:27] (CR) Urbanecm: [C: +2] Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651462 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[19:33:46] (Merged) jenkins-bot: Enable abusefilter block at hrwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653206 (https://phabricator.wikimedia.org/T270997) (owner: Luke081515)
[19:34:21] (Merged) jenkins-bot: Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651462 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[19:35:16] (PS2) Urbanecm: Enable abusefilter block at zh_yuewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653207 (https://phabricator.wikimedia.org/T270567) (owner: Luke081515)
[19:35:16] !log urbanecm@deploy1001 Synchronized wmf-config/abusefilter.php: 57f11b3252dbd164356f8fb223e0d18e907d975d: Enable abusefilter block at hrwiki (T270997) (duration: 00m 54s)
[19:35:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:20] T270997: Enable AbuseFilter "block" action on hrwiki - https://phabricator.wikimedia.org/T270997
[19:35:21] (CR) Urbanecm: [C: +2] Enable abusefilter block at zh_yuewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653207 (https://phabricator.wikimedia.org/T270567) (owner: Luke081515)
[19:36:11] (Merged) jenkins-bot: Enable abusefilter block at zh_yuewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653207 (https://phabricator.wikimedia.org/T270567) (owner: Luke081515)
[19:36:45] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: d5fa55a4688d36d239b9c11d320aa04c7547917e: Add localised logos for the Madurese Wikipedia (T270693) (duration: 00m 55s)
[19:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:50] T270693: Upload local logo for Wikipedia Madurese - https://phabricator.wikimedia.org/T270693
[19:37:25] Operations, SRE-tools, Traffic, IPv6, User-crusnov: Some Traffic clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271144 (Aklapper)
[19:39:35] (PS2) Urbanecm: Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651465 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[19:39:42] !log urbanecm@deploy1001 Synchronized wmf-config/abusefilter.php: 9a5ec62b85aa43f087bf5d8a7aaf6fe892a73187: Enable abusefilter block at zh_yuewiki (T270567) (duration: 00m 54s)
[19:39:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:46] T270567: Enable block action for AbuseFilters on zh_yuewiki - https://phabricator.wikimedia.org/T270567
[19:39:50] (CR) Urbanecm: [C: +2] Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651465 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[19:40:46] (Merged) jenkins-bot: Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651465 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[19:42:24] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 8058502ada99d25eee4140df3504b8a2c71c20fa: Add localised logos for the Madurese Wikipedia (T270693) (duration: 00m 54s)
[19:42:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:29] T270693: Upload local logo for Wikipedia Madurese - https://phabricator.wikimedia.org/T270693
[19:43:42] (CR) Bstorm: [C: +1] "As far as I can tell, most cloud VMs use the labs_lvm module that basically does what this does by re-inventing the wheel with scripts and" [puppet] - https://gerrit.wikimedia.org/r/654216 (https://phabricator.wikimedia.org/T271099) (owner: Jbond)
[19:55:15] (PS4) Legoktm: aptrepo: Pull pyall component over HTTPS [puppet] - https://gerrit.wikimedia.org/r/651300
[19:56:22] Operations, SRE-Access-Requests: Requesting a service account with only access to Gerrit for fundraising deployment tools - https://phabricator.wikimedia.org/T271151 (Jgreen)
[19:57:35] (PS1) Herron: kibana: change backend naming from kibana-next to kibana7 [puppet] - https://gerrit.wikimedia.org/r/654294 (https://phabricator.wikimedia.org/T234854)
[19:57:43] (PS1) Andrew Bogott: profile::mail::smarthost: switch to acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654295 (https://phabricator.wikimedia.org/T260834)
[19:58:06] Operations, SRE-Access-Requests: Requesting a service account with only access to Gerrit for fundraising deployment tools - https://phabricator.wikimedia.org/T271151 (Jgreen)
[19:59:14] Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Requesting a service account with only access to Gerrit for fundraising deployment tools - https://phabricator.wikimedia.org/T271151 (Jgreen)
[19:59:16] (CR) jerkins-bot: [V: -1] profile::mail::smarthost: switch to acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654295 (https://phabricator.wikimedia.org/T260834) (owner: Andrew Bogott)
[20:00:57] (PS2) Andrew Bogott: profile::mail::smarthost: switch to acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654295 (https://phabricator.wikimedia.org/T260834)
[20:02:05] Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Requesting a service account with only access to Gerrit for fundraising deployment tools - https://phabricator.wikimedia.org/T271151 (Jgreen)
[20:04:09] Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Requesting a service account with only access to Gerrit for fundraising deployment tools - https://phabricator.wikimedia.org/T271151 (Jgreen)
[20:05:28] (PS2) Legoktm: mediawiki: Temporarily add alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:05:28] (PS1) Legoktm: mediawiki: Remove temporary alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/654296
[20:05:42] (PS3) Legoktm: mediawiki: Temporarily add alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:05:44] (PS2) Legoktm: mediawiki: Remove temporary alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/654296
[20:05:58] (CR) Bstorm: cloud haproxy: refactor the various haproxy setups (2 comments) [puppet] - https://gerrit.wikimedia.org/r/651301 (owner: Bstorm)
[20:06:25] (CR) Legoktm: [C: +1] "Let me know when the AbuseFilter patch is ready and I can merge this ahead of the train." [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:06:52] (CR) Legoktm: [C: -2] "Once Ifcc2bff9e400fde564179fe6b96496ceae6b8623 is deployed everywhere" [puppet] - https://gerrit.wikimedia.org/r/654296 (owner: Legoktm)
[20:09:46] RECOVERY - Check systemd state on netbox2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:10:44] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:10:49] (CR) Jforrester: [C: +1] mediawiki: Temporarily add alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:11:41] (CR) Jforrester: [C: +1] "> Patch Set 3: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:17:20] Operations, ops-codfw, DC-Ops: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul) ` [edit interfaces interface-range disabled] - member "ge-[0-1]/0/3"; [edit interfaces interface-range vlan-fundraising] member "ge-[0-1]/...
[20:17:45] (CR) Daimona Eaytoy: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:17:58] Operations, ops-codfw, DC-Ops: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul)
[20:19:01] (PS3) Andrew Bogott: profile::mail::smarthost: switch to acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654295 (https://phabricator.wikimedia.org/T260834)
[20:19:03] (PS1) Andrew Bogott: profile::mail::smarthost: add acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654297 (https://phabricator.wikimedia.org/T260834)
[20:19:43] (CR) Legoktm: [C: +2] mediawiki: Temporarily add alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/653539 (owner: Daimona Eaytoy)
[20:21:02] (CR) Andrew Bogott: [C: +2] profile::mail::smarthost: add acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654297 (https://phabricator.wikimedia.org/T260834) (owner: Andrew Bogott)
[20:22:08] Operations, SRE-tools, netops, IPv6, User-jbond: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn)
[20:22:11] Operations, SRE-tools, netops, IPv6, User-jbond: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn) adding netops because ping* offload servers are in their domain, right?
[20:22:50] Operations, Analytics-Clusters, SRE-tools, netops, and 2 others: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn)
[20:30:29] (PS1) Papaul: DNS: Add DNS for frqueue2002 [dns] - https://gerrit.wikimedia.org/r/654298
[20:30:55] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/654294 (https://phabricator.wikimedia.org/T234854) (owner: Herron)
[20:32:01] (CR) Dzahn: [C: +2] mw maintenance: Install php-readline from component/php72 [puppet] - https://gerrit.wikimedia.org/r/654231 (https://phabricator.wikimedia.org/T245757) (owner: Muehlenhoff)
[20:32:05] (CR) Papaul: [C: +2] DNS: Add DNS for frqueue2002 [dns] - https://gerrit.wikimedia.org/r/654298 (owner: Papaul)
[20:34:26] legoktm: Thanks!
[20:34:35] Operations, SRE-tools, netops, IPv6, User-jbond: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (elukey) Removing the Analytics tag since kafka-main servers are managed by SRE (it is the codfw cluster for the jobqueue etc..) :)
[20:35:36] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:36:20] PROBLEM - Check systemd state on netbox2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:37:08] Operations, ops-codfw, DC-Ops, Patch-For-Review: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul)
[20:39:06] Operations, ops-codfw, DC-Ops, Patch-For-Review: (Need By: 2021-01-30) rack/setup/install frqueue2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T269481 (Papaul) a:Papaul→Jgreen @Jgreen Happy new year. This server is ready for install.
[20:39:33] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2055 - https://phabricator.wikimedia.org/T271055 (Papaul) a:Papaul
[20:40:45] (PS3) Razzi: Add cookbook for rebooting druid nodes [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596)
[20:41:36] (PS4) Andrew Bogott: profile::mail::smarthost: switch to acme-chief certs [puppet] - https://gerrit.wikimedia.org/r/654295 (https://phabricator.wikimedia.org/T260834)
[20:41:38] (PS1) Andrew Bogott: Add cloudinfra-dns-manager to password_safelist [puppet] - https://gerrit.wikimedia.org/r/654299 (https://phabricator.wikimedia.org/T260834)
[20:42:23] (CR) Andrew Bogott: [C: +2] Add cloudinfra-dns-manager to password_safelist [puppet] - https://gerrit.wikimedia.org/r/654299 (https://phabricator.wikimedia.org/T260834) (owner: Andrew Bogott)
[20:44:28] (CR) jerkins-bot: [V: -1] Add cookbook for rebooting druid nodes [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) (owner: Razzi)
[20:46:50] (CR) Dzahn: "on mwmaint2001 - adds the component but since prod servers are still on stretch it does not affect them further" [puppet] - https://gerrit.wikimedia.org/r/654231 (https://phabricator.wikimedia.org/T245757) (owner: Muehlenhoff)
[20:48:35] (CR) Dzahn: [C: +2] "just moving things around in the file - but making obvious which services are left that have a SPOF" [dns] - https://gerrit.wikimedia.org/r/650630 (owner: Dzahn)
[20:48:45] (PS2) Dzahn: move misc services with single backend to own section [dns] - https://gerrit.wikimedia.org/r/650630
[20:48:56] !log restart airflow-scheduler on an-airflow1001 to maybe resolve kerberos issue ('GSS initiate failed')
[20:48:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:04] chrisalbon and accraze: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T2100).
[21:03:50] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 164 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:05:03] Operations, Patch-For-Review, User-fgiunchedi: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955 (Dzahn) Since this ticket has been created we now have fairly small subset of standard recipes. But at the same time there are still a bunch of recipes in a "custom" subdirect...
[21:05:30] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 2 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:06:13] Operations, Patch-For-Review, User-fgiunchedi: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955 (Dzahn) ` git grep 'echo partman/.* ;;' | awk '{print $4}' | sort | uniq -c | sort -rn 74 partman/standard.cfg 65 partman/flat.cfg 4 partman/custom/mw-raid1-l...
[21:10:29] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2055 - https://phabricator.wikimedia.org/T271055 (Papaul) We will receiving the disk by tomorrow CASE # 5352620710
[21:14:06] !log razzi@cumin1001 START - Cookbook sre.hosts.decommission
[21:14:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:06] PROBLEM - Disk space on an-airflow1001 is CRITICAL: DISK CRITICAL - free space: / 231 MB (0% inode=83%): /tmp 231 MB (0% inode=83%): /var/tmp 231 MB (0% inode=83%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-airflow1001&var-datasource=eqiad+prometheus/ops
[21:20:33] (PS4) Razzi: Add cookbook for rebooting druid nodes [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596)
[21:20:36] (PS1) Ryan Kemper: airflow: remove chelsyx' admin access [puppet] - https://gerrit.wikimedia.org/r/654307 (https://phabricator.wikimedia.org/T271161)
[21:21:09] (CR) jerkins-bot: [V: -1] airflow: remove chelsyx' admin access [puppet] - https://gerrit.wikimedia.org/r/654307 (https://phabricator.wikimedia.org/T271161) (owner: Ryan Kemper)
[21:22:17] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[21:22:26] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[21:22:45] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[21:23:19] (PS1) Ottomata: Bump refine jar version to refinery-job 0.0.143 [puppet] - https://gerrit.wikimedia.org/r/654308 (https://phabricator.wikimedia.org/T251609)
[21:23:23] (CR) jerkins-bot: [V: -1] Add cookbook for rebooting druid nodes [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) (owner: Razzi)
[21:24:36] (PS2) Ottomata: Bump refine jar version to refinery-job 0.0.143 [puppet] - https://gerrit.wikimedia.org/r/654308 (https://phabricator.wikimedia.org/T251609)
[21:25:37] (Abandoned) Ottomata: refine: blacklist WikibasePingback [puppet] - https://gerrit.wikimedia.org/r/647351 (owner: Milimetric)
[21:27:55] (PS5) Razzi: Add cookbook for rebooting druid nodes [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596)
[21:35:20] !log razzi@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[21:35:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:27] Operations, vm-requests, Patch-For-Review: Eq: 5 VM request for kafka-test-eqiad cluster - https://phabricator.wikimedia.org/T268202 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by razzi@cumin1001 for hosts: `kafka-test1004.eqiad.wmnet` - kafka-test1004.eqiad.wmnet (**WARN**) - **...
[21:39:42] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (wiki_willy) a:Jclark-ctr Servers arrived Dec 23. @Jclark-ctr - can you install the GPU into one of these hosts? Thanks, Willy
[21:41:04] Operations, ops-eqiad, Analytics: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T271098 (wiki_willy) Open→Resolved a:Cmjohnson Duplicate of T270768
[21:41:07] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/654312
[21:42:17] Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1019 - https://phabricator.wikimedia.org/T270806 (wiki_willy) a:Cmjohnson
[21:45:45] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[21:45:56] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn) Etherpad created: https://etherpad.wikimedia.org/p/appserver-buster-upgrade
[21:50:15] (PS2) Ryan Kemper: remove chelsyx' admin access [puppet] - https://gerrit.wikimedia.org/r/654307 (https://phabricator.wikimedia.org/T271161)
[21:53:00] Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Requesting a service account with only access to Gerrit for fundraising deployment tools - https://phabricator.wikimedia.org/T271151 (XenoRyet) As engineering manager for fr-tech, this has my approval. Please let me know if anything else is...
[21:59:01] (CR) Jeena Huneidi: [C: +2] "Nice, I was wondering if there was something like this we could do" [deployment-charts] - https://gerrit.wikimedia.org/r/651571 (owner: Ahmon Dancy)
[22:00:04] Reedy and sbassett: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Weekly Security deployment window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210104T2200).
[22:00:26] (Merged) jenkins-bot: 0.3.1: Use securityContext instead of main_app.rootImage [deployment-charts] - https://gerrit.wikimedia.org/r/651571 (owner: Ahmon Dancy)
[22:11:22] Operations, EasyTimeline, Packaging: WMF deployed EasyTimeline extension depends on Ploticus package which is not available in Debian Buster (but available again in Debian Bullseye) - https://phabricator.wikimedia.org/T253377 (Legoktm) Open→Resolved a:MoritzMuehlenhoff >>! In T245757#6640...
[22:11:29] Operations, Beta-Cluster-Infrastructure, DBA, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Legoktm)
[22:11:33] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Legoktm)
[22:11:38] (PS1) Ppchelko: Labs: Remove now unused wgParserCacheUseJson [mediawiki-config] - https://gerrit.wikimedia.org/r/654315
[22:13:45] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/654316
[22:15:09] (PS1) Ppchelko: Labs: enable Parsoid in beta cluster for all MW installs [mediawiki-config] - https://gerrit.wikimedia.org/r/654317 (https://phabricator.wikimedia.org/T270440)
[22:16:10] (PS10) Bstorm: cloud haproxy: refactor the various haproxy setups [puppet] - https://gerrit.wikimedia.org/r/651301
[22:17:06] (CR) Bstorm: cloud haproxy: refactor the various haproxy setups (2 comments) [puppet] - https://gerrit.wikimedia.org/r/651301 (owner: Bstorm)
[22:17:37] (CR) Subramanya Sastry: [C: +1] "Feel free to +2 when you are ready to deploy." [mediawiki-config] - https://gerrit.wikimedia.org/r/654317 (https://phabricator.wikimedia.org/T270440) (owner: Ppchelko)
[22:18:09] (CR) Ppchelko: [C: +2] "always ready! 😊" [mediawiki-config] - https://gerrit.wikimedia.org/r/654317 (https://phabricator.wikimedia.org/T270440) (owner: Ppchelko)
[22:20:35] (Merged) jenkins-bot: Labs: enable Parsoid in beta cluster for all MW installs [mediawiki-config] - https://gerrit.wikimedia.org/r/654317 (https://phabricator.wikimedia.org/T270440) (owner: Ppchelko)
[22:20:53] (PS1) Dzahn: parsoid: include a generic mariadb server in testreduce role [puppet] - https://gerrit.wikimedia.org/r/654318 (https://phabricator.wikimedia.org/T266509)
[22:22:08] (PS2) Dzahn: parsoid: include a generic mariadb server in testreduce role [puppet] - https://gerrit.wikimedia.org/r/654318 (https://phabricator.wikimedia.org/T266509)
[22:24:12] PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[22:26:37] (PS1) Volans: dependencies: remove upper limit for prospector [software/homer] - https://gerrit.wikimedia.org/r/654320
[22:27:45] (PS2) Volans: dependencies: remove upper limit for prospector [software/homer] - https://gerrit.wikimedia.org/r/654320 (https://phabricator.wikimedia.org/T270969)
[22:31:43] (CR) Volans: [C: +2] dependencies: remove upper limit for prospector [software/homer] - https://gerrit.wikimedia.org/r/654320 (https://phabricator.wikimedia.org/T270969) (owner: Volans)
[22:34:29] (Merged) jenkins-bot: dependencies: remove upper limit for prospector [software/homer] - https://gerrit.wikimedia.org/r/654320 (https://phabricator.wikimedia.org/T270969) (owner: Volans)
[22:39:24] (PS1) Dzahn: parsoid/testreduce: install g++ on testreduce host [puppet] - https://gerrit.wikimedia.org/r/654321
[22:42:36] Operations, SRE-Access-Requests: convert Maya Kampurath to full-time employee - https://phabricator.wikimedia.org/T271169 (Dzahn)
[22:45:10] (CR) Dzahn: [C: +2] parsoid/testreduce: install g++ on testreduce host [puppet] - https://gerrit.wikimedia.org/r/654321 (owner: Dzahn)
[22:48:05] (CR) Dzahn: [C: +2] parsoid: include a generic mariadb server in testreduce role [puppet] - https://gerrit.wikimedia.org/r/654318 (https://phabricator.wikimedia.org/T266509) (owner: Dzahn)
[22:54:12] (PS1) Dzahn: parsoid::testing: fix duplicate declaration of mariadb-client for buster [puppet] - https://gerrit.wikimedia.org/r/654322
[22:57:41] Operations, Beta-Cluster-Infrastructure, DBA, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Legoktm)
[23:03:30] (CR) Bstorm: "PCC looks probably good. It gets confusing with this much of a refactor https://puppet-compiler.wmflabs.org/compiler1001/27345/" [puppet] - https://gerrit.wikimedia.org/r/651301 (owner: Bstorm)
[23:04:38] (CR) Bstorm: "> Patch Set 10:" [puppet] - https://gerrit.wikimedia.org/r/651301 (owner: Bstorm)
[23:05:42] (PS2) Dzahn: parsoid::testing: fix duplicate declaration of mariadb-client for buster [puppet] - https://gerrit.wikimedia.org/r/654322 (https://phabricator.wikimedia.org/T266509)
[23:10:21] (PS3) Dzahn: parsoid::testing: fix duplicate declaration of mariadb-client for buster [puppet] - https://gerrit.wikimedia.org/r/654322 (https://phabricator.wikimedia.org/T266509)
[23:11:03] (PS2) Ppchelko: Labs: Remove now unused wgParserCacheUseJson [mediawiki-config] - https://gerrit.wikimedia.org/r/654315
[23:12:36] (PS1) Ppchelko: Labs: remove labs-specific wmgUseMediaModeration [mediawiki-config] - https://gerrit.wikimedia.org/r/654325
[23:13:58] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/27347/" [puppet] - https://gerrit.wikimedia.org/r/654322 (https://phabricator.wikimedia.org/T266509) (owner: Dzahn)
[23:14:01] (CR) Dzahn: [C: +2] parsoid::testing: fix duplicate declaration of mariadb-client for buster [puppet] - https://gerrit.wikimedia.org/r/654322 (https://phabricator.wikimedia.org/T266509) (owner: Dzahn)
[23:16:01] Operations, Parsoid-Tests, serviceops, Parsoid (Tracking), Patch-For-Review: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (Dzahn)
[23:18:36] Operations, Parsoid-Tests, serviceops, Parsoid (Tracking), Patch-For-Review: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (Dzahn) A generic mariadb server has now been installed by puppet on testreduce1001. (no change on scandium which...
[23:20:54] (CR) Volans: "quick first pass" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/654257 (owner: Jbond)
[23:21:59] Operations, Parsoid-Tests, serviceops, Parsoid (Tracking), Patch-For-Review: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (ssastry)
[23:24:08] (PS1) Dzahn: Revert "remove parsoid-vd/parsoid-rt.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/653998
[23:24:15] (PS2) Dzahn: Revert "remove parsoid-vd/parsoid-rt.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/653998 (https://phabricator.wikimedia.org/T266509)
[23:24:17] (CR) jerkins-bot: [V: -1] Revert "remove parsoid-vd/parsoid-rt.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/653998 (https://phabricator.wikimedia.org/T266509) (owner: Dzahn)
[23:24:22] (CR) jerkins-bot: [V: -1] Revert "remove parsoid-vd/parsoid-rt.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/653998 (https://phabricator.wikimedia.org/T266509) (owner: Dzahn)
[23:25:58] (PS3) Dzahn: Revert "remove parsoid-vd/parsoid-rt.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/653998 (https://phabricator.wikimedia.org/T266509)
[23:26:57] Operations, Parsoid-Tests, serviceops, Parsoid (Tracking), Patch-For-Review: Make testreduce web UI publicly accessible on
the internet - https://phabricator.wikimedia.org/T266509 (10Dzahn) Next we need to re-create the DNS entries (parsoid-rt-tests, parsoid-vd-tests) before we can point them t... [23:27:47] (03PS4) 10Dzahn: Revert "remove parsoid-vd/parsoid-rt.wikimedia.org" [dns] - 10https://gerrit.wikimedia.org/r/653998 (https://phabricator.wikimedia.org/T266509) [23:29:13] (03PS5) 10Dzahn: Revert "remove parsoid-rt-tests.wikimedia.org" [dns] - 10https://gerrit.wikimedia.org/r/653998 (https://phabricator.wikimedia.org/T266509) [23:52:39] (03PS1) 10Aaron Schulz: Use $region for default mcrouter routes [puppet] - 10https://gerrit.wikimedia.org/r/654330 [23:52:59] 10Operations, 10Parsoid-Tests, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (10ssastry) Couple observations: 1. We don't need parsoid-vd-tests on testreduce1001 anymore since there are no imm... [23:58:39] 10Operations, 10Performance-Team, 10SRE-swift-storage, 10Patch-For-Review: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (10aaron) A sub...