[00:07:08] Operations, media-storage: Server side upload failed with "overwriting failed (at recordUpload stage)" - https://phabricator.wikimedia.org/T231738 (Urbanecm)
[01:11:21] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:19:15] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:38:05] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:40:13] PROBLEM - k8s API server requests latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=PUT https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:46:29] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation=get https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:49:05] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:55:21] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:55:57] PROBLEM - k8s API server requests latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=PUT https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:04:47] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:06:51] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:06:53] RECOVERY - k8s API server requests latencies on argon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:50:59] (PS1) Bstorm: openstack-pdns: don't run the mdadm check where the database runs [puppet] - https://gerrit.wikimedia.org/r/533727 (https://phabricator.wikimedia.org/T224828)
[03:33:23] PROBLEM - snapshot of s6 in codfw on db1115 is CRITICAL: snapshot for s6 at codfw taken more than 4 days ago: Most recent backup 2019-08-28 03:26:02 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[05:16:29] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[05:17:57] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[05:48:41] PROBLEM - snapshot of s3 in codfw on db1115 is CRITICAL: snapshot for s3 at codfw taken more than 4 days ago: Most recent backup 2019-08-28 05:34:11 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[06:32:39] can someone delete the uploads from https://phabricator.wikimedia.org/p/Nettersteal/
[07:27:59] PROBLEM - snapshot of x1 in codfw on db1115 is CRITICAL: snapshot for x1 at codfw taken more than 4 days ago: Most recent backup 2019-08-28 07:02:54 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:58:42] Operations, Maps, Product-Infrastructure-Team-Backlog: Lake Huron missing due to apparent OSM vandalism - https://phabricator.wikimedia.org/T231691 (Gehel) As far as I can tell, the issue is now resolved. @MusikAnimal can you confirm and close this ticket if all looks good to you? Thanks!
[10:40:47] Operations, Maps, Product-Infrastructure-Team-Backlog: Lake Huron missing due to apparent OSM vandalism - https://phabricator.wikimedia.org/T231691 (Pikne) @Gehel, there's also a lake in Norway waiting for monthly automatic update to reappear in smaller zoom levels (T230511). What are the guidelines...
[12:24:33] Operations, MediaWiki-extensions-Babel: Two user pages on meta can't be rendered: "request has exceeded memory limit" - https://phabricator.wikimedia.org/T231522 (Aklapper)
[12:24:45] lol
[12:26:47] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: site=eqsin https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[12:54:57] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[14:32:24] (PS1) Zoranzoki21: Change configuration of AbuseFilter extension for enwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/533747 (https://phabricator.wikimedia.org/T231750)
[15:33:29] (CR) Urbanecm: [C: -2] "CR-2, to note a more general issue that should prevent merging. This is not to mean "rejecting", just to note an issue that isn't about yo" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/533747 (https://phabricator.wikimedia.org/T231750) (owner: Zoranzoki21)
[16:03:43] PROBLEM - MegaRAID on analytics1045 is CRITICAL: CRITICAL: 12 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:06:41] PROBLEM - Check systemd state on analytics1045 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:39:40] (CR) Daimona Eaytoy: [C: -1] ">" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/533747 (https://phabricator.wikimedia.org/T231750) (owner: Zoranzoki21)
[17:29:12] !log Run foreachwikiindblist group0 extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose (T231137)
[17:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:15] T231137: Run fixFirstBlockAutopromoteEntries in production - https://phabricator.wikimedia.org/T231137
[17:29:25] !log Previous should be *group0.dblist (T231137)
[17:29:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:45] !log Run foreachwikiindblist group1.dblist extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose (T231137)
[17:33:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:17] !log Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=metawiki --verbose (T231137)
[17:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:20] T231137: Run fixFirstBlockAutopromoteEntries in production - https://phabricator.wikimedia.org/T231137
[17:53:26] !log Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=enwikiquote --verbose (T231137)
[17:53:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:28] T231137: Run fixFirstBlockAutopromoteEntries in production - https://phabricator.wikimedia.org/T231137
[18:08:53] Operations, Commons, Traffic, netops: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (Urbanecm) Thanks for the advanced information, it's going to be really helpful. Can't much help, but can reproduce: ` urbanecm@notebook...
[18:11:00] hi
[18:11:08] Urbanecm, https://commons.wikimedia.org/wiki/File:%E0%A4%B6%E0%A4%BF%E0%A4%B2%E0%A5%8D%E0%A4%AA%E0%A4%95%E0%A4%BE%E0%A4%B0_%E0%A4%9A%E0%A4%B0%E0%A4%BF%E0%A4%A4%E0%A5%8D%E0%A4%B0%E0%A4%95%E0%A5%8B%E0%A4%B6_%E0%A4%96%E0%A4%82%E0%A4%A1_%E0%A5%A8_%E2%80%93_%E0%A4%B8%E0%A4%BE%E0%A4%B9%E0%A4%BF%E0%A4%A4%E0%A5%8D%E0%A4%AF.pdf
[18:11:22] https://phabricator.wikimedia.org/T231757
[18:12:40] first time I see this in 16 years editing
[18:12:56] Operations, Commons, MediaWiki-File-management: Repeated errors while undeleting a file - https://phabricator.wikimedia.org/T231757 (Zoranzoki21) @Urbanecm Oh, this is really weird. We get similar reports. Did I add the correct tags?
[18:15:01] Operations, Commons, Traffic, netops: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (Urbanecm) Following @Zoranzoki21's tests, wget might handle protocols differently, it made to my check with --http1.0. ` urbanecm@notebo...
[18:16:47] Operations, Commons, MediaWiki-File-management: Repeated errors while undeleting a file - https://phabricator.wikimedia.org/T231757 (Yann) The expected content is https://commons.wikimedia.org/wiki/User:Yann/%E0%A4%B6%E0%A4%BF%E0%A4%B2%E0%A5%8D%E0%A4%AA%E0%A4%95%E0%A4%BE%E0%A4%B0_%E0%A4%9A%E0%A4%B0%E...
[18:17:40] Operations, Commons, MediaWiki-File-management, Wikimedia-production-error: Repeated errors while undeleting a file - https://phabricator.wikimedia.org/T231757 (Zoranzoki21) ` MediaWiki internal error. Original exception: [XWwLHwpAAD8AAC1WrNIAAACN] 2019-09-01 18:17:03: Fatal exception of type "W...
[18:30:39] Operations, Commons, Traffic, netops: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (Urbanecm) Following up, it seems to behave randomly. The following is from Firefox 68.0.2 (64-bit) at Windows 10. {F30189917} Works for...
[19:19:44] (PS1) Alex Monk: Make puppetmaster CA content key be a hash keyed by puppetmaster [puppet] - https://gerrit.wikimedia.org/r/533758 (https://phabricator.wikimedia.org/T171188)
[19:20:32] (PS2) Alex Monk: cloud: Change monitoring things to look at new pupeptmaster [puppet] - https://gerrit.wikimedia.org/r/530344 (https://phabricator.wikimedia.org/T171188)
[19:21:04] (PS2) Alex Monk: Make puppetmaster CA content key be a hash keyed by puppetmaster [puppet] - https://gerrit.wikimedia.org/r/533758 (https://phabricator.wikimedia.org/T171188)
[19:21:06] (PS2) Alex Monk: cloud: Move instances to use new puppetmaster [puppet] - https://gerrit.wikimedia.org/r/530371 (https://phabricator.wikimedia.org/T171188)
[19:21:29] (CR) Alex Monk: [C: -1] "need to actually change it to use parent change" [puppet] - https://gerrit.wikimedia.org/r/530371 (https://phabricator.wikimedia.org/T171188) (owner: Alex Monk)
[19:23:01] (PS3) Alex Monk: cloud: Move instances to use new puppetmaster [puppet] - https://gerrit.wikimedia.org/r/530371 (https://phabricator.wikimedia.org/T171188)
[19:24:47] (PS3) Alex Monk: cloud: Change monitoring things to look at new puppetmaster [puppet] - https://gerrit.wikimedia.org/r/530344 (https://phabricator.wikimedia.org/T171188)
[19:28:12] Operations, Puppet, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair) At Wikimania me and Andrew discussed what else needs doing before we push the button here. We realised...
[19:34:54] Operations, Commons, MediaWiki-File-management, Wikimedia-production-error: Repeated errors while undeleting a file - https://phabricator.wikimedia.org/T231757 (Urbanecm) Posting tracebacks from logstash for posterity, this is T231276 basically. This one is for `XWwIjgpAICgAAI1nf8IAAABD`, but `XW...
[19:59:27] Operations, Commons, Traffic, netops: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (Yorwba) I can no longer reproduce; it's a cache hit now. On the other hand, I think wget managed to work where the other tools failed by i...
[20:38:15] (PS8) DannyS712: Fix typos in code [puppet] - https://gerrit.wikimedia.org/r/530989 (https://phabricator.wikimedia.org/T201491)
[21:46:52] Operations, Commons, MediaWiki-File-management, Multimedia, and 10 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214 (simon04)
[21:47:02] (CR) Zoranzoki21: Change configuration of AbuseFilter extension for enwikisource (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/533747 (https://phabricator.wikimedia.org/T231750) (owner: Zoranzoki21)
[21:47:31] (Abandoned) Zoranzoki21: Add categories for all Croatian projects at wmgBabelMainCategory [mediawiki-config] - https://gerrit.wikimedia.org/r/482548 (owner: Zoranzoki21)
[21:48:03] (PS7) Zoranzoki21: IS.php: Add wgProofreadPagePageJoiner, set it per default on '-' and at zhwikisource on __PAGEJOIN__ [mediawiki-config] - https://gerrit.wikimedia.org/r/482502 (https://phabricator.wikimedia.org/T205826)
[21:48:16] (Abandoned) Zoranzoki21: Add category at wgGettingStartedExcludedCategories for srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/482534 (owner: Zoranzoki21)
[22:17:31] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 38 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:23:41] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 41 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:29:15] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 33 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:33:42] Operations, Commons, Traffic, netops: Downloading the original SVG of a file on Commons serves a truncated stream - https://phabricator.wikimedia.org/T231753 (Urbanecm) >>! In T231753#5457439, @Yorwba wrote: > I can no longer reproduce; it's a cache hit now. I can reproduce, somehow, as in, this...
[22:36:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 36 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:37:53] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 37 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:42:09] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 35 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:54:29] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 35 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[22:56:23] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 34 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:03:29] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 40 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:09:01] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 35 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:16:09] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 39 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:23:13] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 39 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:39:51] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 32 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:43:55] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 35 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:52:35] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 36 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[23:58:07] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 32 probes of 451 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts