[00:47:41] PROBLEM - Varnish traffic drop between 30min ago and now at esams on alert1001 is CRITICAL: 35.52 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:51:47] PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[00:54:15] RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[00:54:57] RECOVERY - Varnish traffic drop between 30min ago and now at esams on alert1001 is OK: (C)60 le (W)70 le 88.57 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[01:19:48] SRE, Wikimedia-Mailing-lists: daily-article-l@, education@ import to Mailman3 failed because of unicode characters in display name - https://phabricator.wikimedia.org/T282271 (Ladsgroup) I fixed education and re-migrated it. `daily-article-l` otoh, has lots of such cases, will clean that later.
[01:25:51] SRE, Wikimedia-Mailing-lists: daily-article-l@, education@ import to Mailman3 failed because of unicode characters in display name - https://phabricator.wikimedia.org/T282271 (Legoktm) I filed https://gitlab.com/mailman/mailman/-/issues/891 upstream
[01:28:04] SRE, Wikimedia-Mailing-lists, Upstream: daily-article-l@, education@ import to Mailman3 failed because of unicode characters in display name - https://phabricator.wikimedia.org/T282271 (Legoktm)
[01:31:28] (PS6) Legoktm: lists: Add Apache configuration for pipermail redirects [puppet] - https://gerrit.wikimedia.org/r/685711
[01:32:12] (CR) Legoktm: [C: +2] lists: Add Apache configuration for pipermail redirects [puppet] - https://gerrit.wikimedia.org/r/685711 (owner: Legoktm)
[01:35:55] (PS4) Legoktm: mailman3: Script to generate pipermail redirects [puppet] - https://gerrit.wikimedia.org/r/685723 (https://phabricator.wikimedia.org/T280731)
[01:36:43] (CR) Legoktm: [C: +2] mailman3: Script to generate pipermail redirects [puppet] - https://gerrit.wikimedia.org/r/685723 (https://phabricator.wikimedia.org/T280731) (owner: Legoktm)
[01:44:23] SRE, Wikimedia-Mailing-lists, Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (Legoktm) I enabled redirects for mediawiki-debian and mediawiki-distributors. So far these only work for individual messages for now....
[01:55:29] PROBLEM - SSH on phab2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:23:43] <[1997kB]> Hey, Can anyone move icinga-wm in #wikidata to #wikidata-feed ? Bot is confusing to normal users and all such kind of monitoring bot related to Wikidata are in -feed channel.
[02:26:11] [1997kB]: best would be to file a task in phabricator, and tag with SRE and observability
[02:26:54] <[1997kB]> Ok, will do.
[02:28:30] Yeah, exactly what p858snake said :)
[02:30:42] SRE, observability: Move icinga-wm from #wikidata to #wikidata-feed - https://phabricator.wikimedia.org/T282301 (1997kB)
[02:31:12] <[1997kB]> there you go.. thanks!
[02:31:57] SRE, observability: Move icinga-wm from #wikidata to #wikidata-feed - https://phabricator.wikimedia.org/T282301 (1997kB)
[02:34:24] [1997kB]: if you are skilled in git, you can propose the patch yourself; the checks are generally in the operations/puppet repo, from memory
[02:49:48] <[1997kB]> p858snake: well I can try that too
[02:53:19] PROBLEM - Disk space on rpki1001 is CRITICAL: DISK CRITICAL - free space: / 2811 MB (32% inode=2%): /tmp 2811 MB (32% inode=2%): /var/tmp 2811 MB (32% inode=2%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=rpki1001&var-datasource=eqiad+prometheus/ops
[03:07:25] SRE, observability: Move icinga-wm from #wikidata to #wikidata-feed - https://phabricator.wikimedia.org/T282301 (Peachey88)
[03:10:19] SRE, Wikimedia-Mailing-lists: Find list owners for lists without them - https://phabricator.wikimedia.org/T281779 (Ladsgroup) Lego has this great idea to add "list-disabled@" (going to blackhole) as owner which also can at least tag which mailing lists are disabled in a machine-readable way. Most of thes...
[03:17:59] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:19:19] PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:20:00] (PS1) Ladsgroup: exim4: Add blackhole for "disabled-lists" [puppet] - https://gerrit.wikimedia.org/r/686879 (https://phabricator.wikimedia.org/T281779)
[03:20:25] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:21:45] RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 691 bytes in 9.755 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:22:43] SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Archive destacado-l - https://phabricator.wikimedia.org/T282291 (Ladsgroup) Open→Resolved a:Ladsgroup Will do that a bit later. The sad thing is that we don't have a standard way to disable a mailing list and some hacks have been made for mail...
[03:23:35] SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Archive destacado-l - https://phabricator.wikimedia.org/T282291 (Ladsgroup) Resolved→Open I wanted to assign it, not to close it
[03:29:01] PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:31:15] RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 690 bytes in 1.091 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:38:31] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 143, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:39:31] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:48:36] (PS1) Legoktm: mailman3: Improve pipermail_redirects [puppet] - https://gerrit.wikimedia.org/r/686930
[05:55:46] (CR) Legoktm: [C: +2] mailman3: Improve pipermail_redirects [puppet] - https://gerrit.wikimedia.org/r/686930 (owner: Legoktm)
[06:01:47] (PS1) Legoktm: mailman3: Fix print in pipermail_redirects [puppet] - https://gerrit.wikimedia.org/r/686936
[06:02:13] (CR) Legoktm: [C: +2] mailman3: Fix print in pipermail_redirects [puppet] - https://gerrit.wikimedia.org/r/686936 (owner: Legoktm)
[06:38:47] SRE, Wikimedia-Mailing-lists, Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (Legoktm) ` legoktm@lists1001:/var/lib/mailman3/redirects$ sudo pipermail_redirects deutschschweiz --no-rebuild Going through 2020-Augu...
[06:50:47] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:53:09] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:55:00] SRE, Wikimedia-Mailing-lists: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322 (Ladsgroup)
[07:09:11] (PS1) Legoktm: Switch to Architecture: all [software/mailman-templates] - https://gerrit.wikimedia.org/r/686959
[07:10:15] SRE, DBA, Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Legoktm) Stalled→Open We can do this next week.
[07:17:13] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:19:37] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:37:20] (CR) Legoktm: [C: +2] Switch to Architecture: all [software/mailman-templates] - https://gerrit.wikimedia.org/r/686959 (owner: Legoktm)
[07:37:29] (PS2) Legoktm: Add qqq [software/mailman-templates] - https://gerrit.wikimedia.org/r/685533
[07:38:23] (Merged) jenkins-bot: Switch to Architecture: all [software/mailman-templates] - https://gerrit.wikimedia.org/r/686959 (owner: Legoktm)
[08:01:15] SRE, observability, wdwb-tech: Move icinga-wm from #wikidata to #wikidata-feed - https://phabricator.wikimedia.org/T282301 (Addshore)
[08:01:43] SRE, Wikidata, observability, wdwb-tech: Move icinga-wm from #wikidata to #wikidata-feed - https://phabricator.wikimedia.org/T282301 (Addshore)
[08:08:31] SRE, Wikimedia-Mailing-lists, Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (Legoktm) I enabled redirects for everything that didn't cause unicode errors, currently 102,189 individual emails. 10% there!
[08:09:44] SRE, Wikidata, observability, wdwb-tech: Move icinga-wm from #wikidata to #wikidata-feed - https://phabricator.wikimedia.org/T282301 (Legoktm) For reference https://gerrit.wikimedia.org/g/operations/puppet/+/7bbfebf8e5c99a90923fc200f270a8654e384ca9/modules/profile/manifests/icinga/ircbot.pp#11 is...
[08:23:43] SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322 (Ladsgroup) a:Ladsgroup I think it's safe to assume I'm doing this.
[08:47:25] (PS1) Legoktm: mailman3: Fix generating redirects for non UTF-8 messages [puppet] - https://gerrit.wikimedia.org/r/686998
[08:48:41] (CR) Legoktm: [C: +2] mailman3: Fix generating redirects for non UTF-8 messages [puppet] - https://gerrit.wikimedia.org/r/686998 (owner: Legoktm)
[08:53:31] SRE, Wikimedia-Mailing-lists, Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (Legoktm) {fff57796d2a969b54b64c0aabb21abc5be630ed6} avoids the unicode problem. Re-running it in a screen now.
[08:55:53] <[1997kB]> legoktm p858snake Hi, when trying to clone repo, I'm getting "error: invalid path '/ modules / mailman3 / files / templates / domain:admin:notice:new-list.txt'
[08:55:53] <[1997kB]> fatal: unable to checkout working tree
[08:56:27] [1997kB]: I don't actually use git so i can't assist
[08:56:32] huh, are you on Windows?
[08:57:00] <[1997kB]> yep
[08:57:20] oh crap
[08:57:40] well, that's my fault, I'm sorry
[08:58:40] let me uh, file a bug
[08:59:05] I'll fix it on Monday, I don't want to deploy a change to fix it that is that large over the weekend
[08:59:22] Gerrit supports creating a patch through the web UI, you could use that?
[09:00:58] <[1997kB]> I can try that.
[09:01:25] SRE: Mailman3 templates with colons in filename made operations/puppet not cloneable on Windows - https://phabricator.wikimedia.org/T282308 (Legoktm) p:Triage→High
[09:01:28] (PS1) Majavah: kubeadm: Add support for API server SANs [puppet] - https://gerrit.wikimedia.org/r/687003
[09:03:32] (PS2) Majavah: kubeadm: Add support for API server SANs [puppet] - https://gerrit.wikimedia.org/r/687003
[09:32:56] (CR) 1997kB: [C: +1] "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/686697 (https://phabricator.wikimedia.org/T282301) (owner: 1997kB)
[10:01:33] PROBLEM - Disk space on rpki1001 is CRITICAL: DISK CRITICAL - free space: / 2746 MB (32% inode=1%): /tmp 2746 MB (32% inode=1%): /var/tmp 2746 MB (32% inode=1%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=rpki1001&var-datasource=eqiad+prometheus/ops
[10:05:19] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 145, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:05:55] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:17:19] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:17:32] (PS2) Majavah: toolforge: Add ingress-nginx Helm files [puppet] - https://gerrit.wikimedia.org/r/685715 (https://phabricator.wikimedia.org/T264221)
[10:19:45] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:34:26] (CR) Elukey: kerberos: require --email_address for create and reset-password (1 comment) [puppet] - https://gerrit.wikimedia.org/r/686766 (https://phabricator.wikimedia.org/T282185) (owner: Razzi)
[11:08:11] RECOVERY - SSH on phab2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:15:42] (PS2) Neechalkaran: enable wikilove to tawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/686700
[11:35:05] (PS1) Amire80: WIP Make the Malaysian talk namespaces names consistent [mediawiki-config] - https://gerrit.wikimedia.org/r/687064
[12:47:35] (PS3) Majavah: toolforge: Add ingress-nginx Helm files [puppet] - https://gerrit.wikimedia.org/r/685715 (https://phabricator.wikimedia.org/T264221)
[13:14:27] PROBLEM - Disk space on rpki1001 is CRITICAL: DISK CRITICAL - free space: / 2759 MB (32% inode=2%): /tmp 2759 MB (32% inode=2%): /var/tmp 2759 MB (32% inode=2%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=rpki1001&var-datasource=eqiad+prometheus/ops
[13:17:11] (CR) Aklapper: "Thanks. Please see https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines" [mediawiki-config] - https://gerrit.wikimedia.org/r/686700 (owner: Neechalkaran)
[13:18:00] (PS3) Aklapper: Enable WikiLove extension on tawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/686700 (https://phabricator.wikimedia.org/T280326) (owner: Neechalkaran)
[13:19:08] (CR) jerkins-bot: [V: -1] Enable WikiLove extension on tawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/686700 (https://phabricator.wikimedia.org/T280326) (owner: Neechalkaran)
[13:50:02] (Abandoned) Hashar: [WMF] Add XDG_CACHE_HOME to tools/download_file.py [software/gerrit] (wmf/stable-3.2) - https://gerrit.wikimedia.org/r/684440 (owner: Hashar)
[13:59:16] (CR) Zabe: Enable WikiLove extension on tawiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/686700 (https://phabricator.wikimedia.org/T280326) (owner: Neechalkaran)
[14:00:22] (PS1) Hashar: Merge 'upstream/stable-3.2' into wmf/stable-3.2 [software/gerrit] (wmf/stable-3.2) - https://gerrit.wikimedia.org/r/687117
[14:01:06] (PS4) Hashar: [WMF] register our plugins as submodules [software/gerrit] (wmf/stable-3.2) - https://gerrit.wikimedia.org/r/684336
[14:01:09] (PS5) Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - https://gerrit.wikimedia.org/r/684411
[14:12:37] PROBLEM - SSH on phab2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:35:14] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (Aklapper) @Sannita: The summary says "Old pending actions were not imported" but the text says that pending actions were not discarded which sounds like a contradiction? For the...
[15:32:55] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (Sannita)
[15:33:24] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (Sannita) I rewrote the description, adding more clarifications. HTH.
[15:34:34] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (Sannita)
[16:00:28] (PS4) Majavah: toolforge: Add ingress-nginx Helm files [puppet] - https://gerrit.wikimedia.org/r/685715 (https://phabricator.wikimedia.org/T264221)
[16:03:43] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (RhinosF1) You don't need to use mailman2. You can simply create an account using the new mailman3 interface. There's no shared account for all list admins anymore.
[16:53:31] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (Ladsgroup) p:Triage→Low I think I understand the problem. I had some similar issues too (in some of my mailing lists). I feel it's basically a bunch of small issues feed...
[17:17:59] !log starting upgrade of batch G of mailing lists (T280322)
[17:18:06] Whee.
[17:18:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:10] T280322: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322
[17:18:38] almost there ^^
[17:19:11] Yeah.
[17:32:35] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:40:41] (CR) Ladsgroup: "2021-05-08 17:39:42 1lfQvV-0005Jb-PE <= ladsgroup@gmail.com H=mail-yb1-f178.google.com [209.85.219.178]:37493 I=[172.16.4.88]:25 P=esmtps " [puppet] - https://gerrit.wikimedia.org/r/686879 (https://phabricator.wikimedia.org/T281779) (owner: Ladsgroup)
[17:42:17] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5016 is OK: HTTP OK: HTTP/1.0 200 OK - 23302 bytes in 0.709 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:44:07] Amir1: I was very happy to see the xmldatadumps-l list migrated. And also happy to create one account and see all my lists and roles attached. Thanks a heap!
[17:44:47] apergos: ^^ Let me know if you encounter any issues
[17:45:11] I guess the subscribers will let us know :-)
[17:45:26] I know some 2011 emails of xmldumps have not been upgraded to the new archive due to some weird encoding issues. We'll find and handle those manually
[17:45:34] like five to ten emails in total
[17:48:40] I'm finding so many mailing lists I'd forgotten I was on.
[17:48:56] Like the WMF-people-attending-Wikimania-2013 one.
[17:51:04] lol
[17:58:37] I unsubscribed from one I never read.
[17:59:13] mediawiki-commits or whatever? :-)
[18:04:01] no, I kept that one :-)
[18:04:14] wikimedia-l? *cough*
[18:04:23] it was the education list, I see the emails, I click through, but I never really pay attention in the end... so
[18:04:56] I'm still resentful that I had to subscribe to troll-l in order to post my view during The Troubled Times (tm) but I just stayed on
[18:05:11] As a drama Shah, I love wikimedia-l :D
[18:05:53] jokes aside, its cost outweighs its benefits IMHO
[18:08:04] there is no other across-the-movement mailing list though. and if this one closed and a new one were started, problems would migrate along with users
[18:09:11] Yes, a clear (stricter maybe?) moderation policy would help
[18:09:44] The challenge is that anyone interested enough to moderate will be seen by all the many "sides" as being biased against them. Exhaustingly unfun.
[18:10:34] * James_F did his time moderating USENET back in the '90s and has bad memories even of light groups, let alone things people consider their life's vocation.
[18:10:37] indeed. Having some objective rules would help. I liked the one that was suggested: any user that's banned in one project can't post there.
[18:11:33] but objective rules have false positives, double unfun
[18:11:59] If I wanted to be unhelpful I'd just take up on enwikinews or another semi-abandoned project, get to be a sysop, and then ban every WMF staffer I could.
[18:12:12] Suddenly, no WMFers on troll-l. Success! Etc.
[18:12:19] a couple of weeks ago I was at a presentation by Twitch and Reddit moderators, it was really interesting
[18:12:21] (Or WMDErs or whatever.)
[18:12:51] banned on one project can be a byproduct of bad culture on a specific wiki so I think that would be a really problematic policy
[18:12:56] The alternative is suggesting that only blocks from some of our communities count, which is very uncomfortable.
[18:12:59] Yeah.
[18:51:38] SRE, Wikimedia-Mailing-lists: 'Held Unsubscriptions' keeps sending email notifications in Mailman3 - https://phabricator.wikimedia.org/T282319 (Ciell)
[19:15:33] PROBLEM - rpki grafana alert on alert1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: eqiad total VRPs alert, valid ROAs alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[19:36:50] SRE, Wikimedia-Mailing-lists: Old pending actions in migrated ML were not imported - https://phabricator.wikimedia.org/T282310 (Sannita) >>! In T282310#7071576, @Ladsgroup wrote: > You basically have a mailing list that has pending work to tend to when the upgrade happens. The upgrade doesn't bring them...
[19:49:21] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:51:49] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:03:35] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 146734704 and 5 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[21:06:01] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 48848 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[21:40:59] SRE, Wikimedia-Mailing-lists: Mailman 3: per-list language preferences don't work - https://phabricator.wikimedia.org/T282279 (Tgr) Apparently this happens if you try to unset the language (set it to the `----` option so it inherits global preferences). Is the PATCH request internal? In the browser, I d...
[21:56:08] SRE, Wikimedia-Mailing-lists: Mailman 3: Invalid Parameter "delivery_status": Accepted Values are: enabled, by_user, by_bounces, by_moderator, unknown. - https://phabricator.wikimedia.org/T282327 (Tgr)
[23:25:37] RECOVERY - SSH on phab2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook