[00:00:18] Operations, Mail, Phabricator: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) I'm not sure if the above problem is the entire issue? It seems like it should be working.
[00:04:29] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:27:51] PROBLEM - Memory correctable errors -EDAC- on wtp2020 is CRITICAL: 5.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[00:27:56] Operations, Mail, Phabricator: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) Test
[00:34:04] PROBLEM - EDAC syslog messages on wtp2020 is CRITICAL: 5.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[00:38:08] Operations, Mail, Phabricator: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) As far as I can tell everything seems to be configured correctly but the messages just disappear. I suspect that something changed with the exim version on stretch?
[00:55:56] Operations, Mail, Phabricator: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) Note that I've gone through all the upstream troubleshooting steps, however, we have a custom exim setup which is not at all supported by upstream and I wasn't involved in settin...
[00:58:21] Operations, Mail, Phabricator: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) adding @chasemp as he was the one who set up phab_epipe.py as far as I remember.
[01:08:01] Operations, Mail, Phabricator: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (Paladox) @mmodell could it be because we don't have php-mailparse installed when on stretch? We installed php-mailparse when using php5 though.
[01:09:14] (PS1) Paladox: Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712
[01:10:55] (PS2) Paladox: Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712
[01:11:00] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:11:27] (CR) jerkins-bot: [V: -1] Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:12:00] (PS3) Paladox: Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712
[01:12:05] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:12:54] (CR) jerkins-bot: [V: -1] Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:13:16] (CR) jerkins-bot: [V: -1] Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:13:43] (PS4) Paladox: Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712
[01:14:30] (CR) jerkins-bot: [V: -1] Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:14:59] (PS5) Paladox: Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712
[01:15:24] (PS1) 20after4: phabricator: Install php-mailparse [puppet] - https://gerrit.wikimedia.org/r/513713 (https://phabricator.wikimedia.org/T224752)
[01:15:59] (Abandoned) Paladox: Phabricator: Install php-mailparse when using php-fpm [puppet] - https://gerrit.wikimedia.org/r/513712 (owner: Paladox)
[01:16:33] (CR) Paladox: phabricator: Install php-mailparse (1 comment) [puppet] - https://gerrit.wikimedia.org/r/513713 (https://phabricator.wikimedia.org/T224752) (owner: 20after4)
[01:17:57] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) hmm `php-mailparse is already the newest version (3.0.2+2.1.6-12-gae1ef14-3+0~20180910132529.4+stretch~1.gbpcad4a8+wmf1)`
[01:20:08] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (Paladox) @mmodell Yup, it may be installed but the php class manages the mods (removes ones that are not maintained by the class).
[01:21:56] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (Paladox) For example if we removed: ` 'mysqlnd': package_name => 'php7.2-mysqlnd', sapis => ['cli', 'fpm'], ` p...
[01:22:22] (CR) Paladox: phabricator: Install php-mailparse (1 comment) [puppet] - https://gerrit.wikimedia.org/r/513713 (https://phabricator.wikimedia.org/T224752) (owner: 20after4)
[01:30:34] (PS2) 20after4: phabricator: Install php-mailparse [puppet] - https://gerrit.wikimedia.org/r/513713 (https://phabricator.wikimedia.org/T224752)
[01:30:53] (CR) 20after4: phabricator: Install php-mailparse (1 comment) [puppet] - https://gerrit.wikimedia.org/r/513713 (https://phabricator.wikimedia.org/T224752) (owner: 20after4)
[01:31:10] (CR) Paladox: [C: +1] "Great! Thanks!" [puppet] - https://gerrit.wikimedia.org/r/513713 (https://phabricator.wikimedia.org/T224752) (owner: 20after4)
[02:18:27] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) test test
[02:26:54] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) test again!
[02:29:01] PROBLEM - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:29:12] ACKNOWLEDGEMENT - MegaRAID on helium is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T224794 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:29:15] Operations, ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (ops-monitoring-bot)
[02:34:03] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) one more time
[02:37:11] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) ok it looks like it works.. https://gerrit.wikimedia.org/r/513713 should be the fix (Applied manually on phab1003)
[02:39:09] Operations, Mail, Phabricator, Patch-For-Review: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (mmodell) @paladox: nice catch noticing that the mailparse module wasn't enabled on fpm in stretch!
[02:57:52] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[03:30:16] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:13:51] Operations, ops-eqiad: Degraded RAID on analytics1029 - https://phabricator.wikimedia.org/T224795 (ops-monitoring-bot)
[06:31:29] PROBLEM - puppet last run on ms-be1030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-puppet-agent-stats]
[06:31:43] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:33:25] PROBLEM - puppet last run on conf2003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[06:33:53] Operations, Mail, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban): Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (greg) p:Unbreak!→High Hotfix in place, resetting to High.
[06:58:25] RECOVERY - puppet last run on ms-be1030 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:43] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:00:27] RECOVERY - puppet last run on conf2003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:52:06] (PS1) Elukey: Update password file paths for the monthly Analytics sqoop jobs [puppet] - https://gerrit.wikimedia.org/r/513718
[07:53:01] (CR) Elukey: [C: +2] Update password file paths for the monthly Analytics sqoop jobs [puppet] - https://gerrit.wikimedia.org/r/513718 (owner: Elukey)
[07:57:52] (PS1) DannyS712: Add "Zerrenda" (list) namespace to VisualEditor on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/513720 (https://phabricator.wikimedia.org/T224801)
[08:00:14] (PS2) DannyS712: Add "Zerrenda" (list) namespace to VisualEditor on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/513720 (https://phabricator.wikimedia.org/T224801)
[10:07:51] Operations, Release Pipeline, Release-Engineering-Team, serviceops, and 5 others: Introduce kask session storage service to kubernetes - https://phabricator.wikimedia.org/T220401 (akosiaris) >>! In T220401#5226623, @Eevans wrote: >>>! In T220401#5226531, @akosiaris wrote: >> One minor question. G...
[11:26:41] Operations, ops-eqiad, DC-Ops, Data-Services, and 2 others: Decommission labstore100[123] and their disk shelves - https://phabricator.wikimedia.org/T187456 (faidon) One note for @Cmjohnson for the upcoming decom which is apparently imminent: labstore1003-arrayN are one of the handful cases that...
[11:45:07] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational
[11:53:49] (CR) Faidon Liambotis: [C: -1] Add cable names report (3 comments) [software/netbox-reports] - https://gerrit.wikimedia.org/r/513003 (https://phabricator.wikimedia.org/T216469) (owner: CRusnov)
[12:44:51] PROBLEM - Device not healthy -SMART- on db1062 is CRITICAL: cluster=mysql device=megaraid,0 instance=db1062:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1062&var-datasource=eqiad+prometheus/ops
[13:09:45] Operations, Operations-Software-Development, netbox, netops, and 2 others: Netbox report to validate network equipment data - https://phabricator.wikimedia.org/T221507 (faidon) It seems like part of the challenge is identifying clustered equipment (i.e. asw stacks & pfw). In those cases, the devi...
[14:45:17] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) https://wikitech.wikimedia.org/wiki/Mailman
[14:48:07] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. https://wikitech.wikimedia.org/wiki/Mailman
[15:02:44] Operations, ops-eqiad, DBA: db1062 (s7 db primary master) disk with predictive failure - https://phabricator.wikimedia.org/T224805 (Marostegui)
[15:03:01] Operations, ops-eqiad, DBA: db1062 (s7 db primary master) disk with predictive failure - https://phabricator.wikimedia.org/T224805 (Marostegui) p:Triage→High
[15:03:12] Operations, ops-eqiad, DBA: db1062 (s7 db primary master) disk with predictive failure - https://phabricator.wikimedia.org/T224805 (Marostegui)
[15:03:15] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[15:04:57] ACKNOWLEDGEMENT - Device not healthy -SMART- on db1062 is CRITICAL: cluster=mysql device=megaraid,0 instance=db1062:9100 job=node site=eqiad Marostegui T224805 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1062&var-datasource=eqiad+prometheus/ops
[15:14:59] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) https://wikitech.wikimedia.org/wiki/Mailman
[15:16:23] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. https://wikitech.wikimedia.org/wiki/Mailman
[15:30:31] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) https://wikitech.wikimedia.org/wiki/Mailman
[15:31:55] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. https://wikitech.wikimedia.org/wiki/Mailman
[17:16:13] PROBLEM - Device not healthy -SMART- on helium is CRITICAL: cluster=misc device=megaraid,8 instance=helium:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=helium&var-datasource=eqiad+prometheus/ops
[19:18:15] PROBLEM - Disk space on mw1296 is CRITICAL: DISK CRITICAL - free space: /tmp 53 MB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[19:21:05] RECOVERY - Disk space on mw1296 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[20:08:09] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[20:09:25] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 618 bytes in 1.185 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[20:14:27] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/513720 (https://phabricator.wikimedia.org/T224801) (owner: DannyS712)
[20:15:12] (CR) jerkins-bot: [V: -1] Add "Zerrenda" (list) namespace to VisualEditor on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/513720 (https://phabricator.wikimedia.org/T224801) (owner: DannyS712)
[20:17:15] (CR) Urbanecm: [C: -1] "See inline comment" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/513720 (https://phabricator.wikimedia.org/T224801) (owner: DannyS712)
[20:29:08] (PS1) Urbanecm: Add Wikiprojekti namespace to wgExtraSignatureNamespaces for fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/513740 (https://phabricator.wikimedia.org/T224215)
[21:10:58] (PS3) DannyS712: Add "Zerrenda" (list) namespace to VisualEditor on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/513720 (https://phabricator.wikimedia.org/T224801)
[21:25:27] Operations, Mail, Phabricator, Patch-For-Review, and 2 others: Phabricator email comments not posted - https://phabricator.wikimedia.org/T224752 (Aklapper)
[22:41:23] XioNoX: Just got this on enwp's Special:Export https://usercontent.irccloud-cdn.com/file/VxmrJUFN/image.png
[22:46:53] * Krinkle staging on mwdebug1002
[22:47:56] Krinkle, It's a decent Export so am I messing you up or what
[22:48:37] No, I am informing other engineers that I am deploying a patch
[22:48:40] unrelated to your issue :)
[22:49:17] Krinkle, ah cool, any clue what's causing mine? Unsurprisingly it's a useful error message
[22:49:25] !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.7/extensions/3D/modules/mmv.3d.js: T224812 / bd4fbfddbe1a0 (duration: 01m 07s)
[22:49:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:32] T224812: Unable to display 3D file: TypeError: mw.mmv.isBrowserSupported is not a function - https://phabricator.wikimedia.org/T224812
[22:50:46] RhinosF1: In the logs I see [2019-06-01T22:48:20 UTC] enwiki /wiki/Special:Export POST "PHP Fatal Error: entire web request took longer than 200 seconds and timed out"
[22:50:59] So might be a case of exporting more than is supported through this mechanism.
[22:51:14] For larger exports, you may want to consider reducing from a periodic xml dump
[22:51:19] Krinkle, So it timed out. I was trying to get userlinks and twinkle templates.
[22:52:01] Krinkle, https://phabricator.wikimedia.org/P8579
[22:52:03] These can be downloaded from https://meta.wikimedia.org/wiki/Data_dumps, but to save time/disk space, you can also access them from the command-line in Toolforge, which might be simpler.
[22:52:16] Yeah, that's probably too much.
[22:52:26] But if you don't need recursion, then you might be able to do it in chunks
[22:53:02] Krinkle, What's the best way for that? Do you know how many it can take?
[22:53:13] depends on a lot of factors, but 100 should be fine
[22:53:18] Shrink what you're asking for until it works.
[22:53:18] maybe more, but varies
[22:53:40] 100 2MB pages won't work; 100 now-empty user pages will definitely work. Etc.
[22:53:49] it might be easier to do 9x 100, then to try and find the max and time out each time until then
[22:54:35] Might have to try and work out if I don't need any
[22:55:07] RhinosF1: Are you using it to export the wikitext of these templates? Or also other templates part of these etc.
[22:55:32] Krinkle, Linked templates as well to stop things breaking
[22:55:47] full history for attribution & legal
[22:59:35] a link to the original page action=history with offset timestamp is usually considered enough, which is what we do on Wikipedia content e.g. for translated articles.
[22:59:44] I assume this is for templates on your own wiki?
[23:00:32] Krinkle, yep. But going through 900 pages to add that is as bad. Trying to shorten the list
[23:14:40] Operations, ops-eqiad, Analytics: Degraded RAID on analytics1029 - https://phabricator.wikimedia.org/T224795 (Peachey88)
[23:21:49] Taken quite a few off Krinkle without a thought.
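
Note on the Special:Export thread above: the "9x 100" suggestion amounts to splitting the page list into fixed-size batches and exporting each batch separately. A minimal sketch in Python, assuming the titles are already in a list; `chunk_titles` and `export_textarea` are hypothetical helper names, the titles are placeholders, and the batch size of 100 is only the rough figure given in the chat ("100 should be fine", "maybe more, but varies"), not a documented limit.

```python
from typing import Iterator, List


def chunk_titles(titles: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size titles.

    batch_size=100 is the rough figure from the chat, not a hard limit;
    shrink it if a batch of large pages still times out.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(titles), batch_size):
        yield titles[start:start + batch_size]


def export_textarea(batch: List[str]) -> str:
    """Join one batch into a one-title-per-line block to paste into the export form."""
    return "\n".join(batch)


if __name__ == "__main__":
    # Placeholder titles standing in for the roughly 900 pages mentioned in the chat.
    titles = [f"Template:Example {i}" for i in range(900)]
    for number, batch in enumerate(chunk_titles(titles), start=1):
        print(f"batch {number}: {len(batch)} titles")
```

If a batch still hits the 200-second timeout, lowering `batch_size` for that batch follows the "shrink what you're asking for until it works" advice.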