[00:03:34] (PS2) Krinkle: build: Update PHPUnit from 3.7 to 4.8, add phplint to composer-test [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947)
[00:03:48] (CR) Krinkle: "check experimental" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:08:06] (PS3) Krinkle: build: Update PHPUnit from 3.7 to 4.8, add phplint to composer-test [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947)
[00:08:11] (PS1) Filippo Giunchedi: ganglia: display deprecation banner [puppet] - https://gerrit.wikimedia.org/r/331097 (https://phabricator.wikimedia.org/T145659)
[00:10:26] (PS4) Krinkle: build: Update PHPUnit from 3.7 to 4.8, add phplint to composer-test [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947)
[00:10:32] (CR) Krinkle: "check experimental" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:14:50] (CR) Krinkle: "@Hashar: Do we want php55 or hhvm for these jobs? Given that they run php lint, I guess we want to be conservative and not allow hhvm-spec" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:16:10] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2925115 (bd808)
[00:17:13] (PS9) Paladox: Gerrit: Add support for logstash in gerrit [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324)
[00:22:33] (CR) Krinkle: "Drafted said change at I0ac50349." [mediawiki-config] - https://gerrit.wikimedia.org/r/325055 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:22:50] (CR) Krinkle: "Drafted at I0ac50349." [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:24:21] (CR) Krinkle: "check experimental" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:25:27] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[00:26:15] (CR) Krinkle: "check experimental" [mediawiki-config] - https://gerrit.wikimedia.org/r/325055 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:27:28] (PS5) Krinkle: build: Update PHPUnit from 3.7 to 4.8, add phplint to composer-test [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947)
[00:27:34] (CR) Krinkle: "check experimental" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:35:23] (PS6) Krinkle: build: Update PHPUnit from 3.7 to 4.8, add phplint to composer-test [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947)
[00:36:35] (CR) Krinkle: "Setting --ignore-fails because php-parallel-lint fails due to the dead-end PrivateSettings.php symlink that it tries to read as a PHP fil" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:36:40] (CR) Krinkle: "check experimental" [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:56:58] (CR) Krinkle: [C: 2] "Doesn't affect the main Jenkins job nor beta or prod. Enables running phpunit locally for developers. Reviewed by Hashar. Verified by expe" [mediawiki-config] - https://gerrit.wikimedia.org/r/325055 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:57:31] (Merged) jenkins-bot: build: require-dev phpunit in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/325055 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:57:41] (CR) jenkins-bot: build: require-dev phpunit in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/325055 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[00:58:00] (CR) Krinkle: "Pending review from Hashar." [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[01:34:27] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:54:37] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:02:27] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[02:03:45] (PS3) Andrew Bogott: Move shinkengen from using LDAP to the OpenStack APIs [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[02:03:47] (PS1) Andrew Bogott: Openstack clientlib: Include python3 packages if version is post-liberty [puppet] - https://gerrit.wikimedia.org/r/331105
[02:22:37] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[02:30:36] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 11m 08s)
[02:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:07] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[02:33:17] (PS2) Andrew Bogott: Openstack clientlib: Include python3 packages if version is post-liberty [puppet] - https://gerrit.wikimedia.org/r/331105
[02:33:17] (PS4) Andrew Bogott: Move shinkengen from using LDAP to the OpenStack APIs [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[02:35:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 7 02:35:56 UTC 2017 (duration 5m 20s)
[02:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:36:00] (PS3) Andrew Bogott: Openstack clientlib: Include python3 packages if version is post-liberty [puppet] - https://gerrit.wikimedia.org/r/331105
[02:36:02] (PS5) Andrew Bogott: Move shinkengen from using LDAP to the OpenStack APIs [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[02:41:02] (PS4) Andrew Bogott: Openstack clientlib: Include python3 packages if version is post-liberty [puppet] - https://gerrit.wikimedia.org/r/331105
[02:41:04] (PS6) Andrew Bogott: Move shinkengen from using LDAP to the OpenStack APIs [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[02:42:39] (PS5) Andrew Bogott: Openstack clientlib: Include python3 packages if version is post-liberty [puppet] - https://gerrit.wikimedia.org/r/331105
[02:42:41] (PS7) Andrew Bogott: Move shinkengen from using LDAP to the OpenStack APIs [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[02:49:49] (CR) Andrew Bogott: "This works now, at least on Jessie." [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[02:51:03] (Abandoned) Andrew Bogott: Shinkengen: Get project hosts from openstack and not from ldap. [puppet] - https://gerrit.wikimedia.org/r/331005 (https://phabricator.wikimedia.org/T108625) (owner: Andrew Bogott)
[03:00:07] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[03:06:50] Operations, MediaWiki-Configuration, Performance-Team, Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2758050 (srishakatux) To the owner of this session: Here is the link to the session guideline...
[03:07:12] Operations, Analytics, ChangeProp, EventBus, and 5 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2751310 (srishakatux) To the owner of this session: Here is the link to the session guidelines page: https://www.mediawiki.org/w...
[03:13:44] i missed that when odder was here, but for anyone watching, the answer was to use phabricator and add croslof and myself like https://phabricator.wikimedia.org/T154826
[03:14:02] and the phab tag called "domains"
[03:14:15] if people have questions like that
[03:14:18] * mutante signs out again
[03:14:47] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:22:54] (CR) Dzahn: [C: 1] "@Moritz any concerns about this one? Merge during allhands?" [puppet] - https://gerrit.wikimedia.org/r/314270 (https://phabricator.wikimedia.org/T115348) (owner: Muehlenhoff)
[03:24:29] (CR) Dzahn: [C: 1] Gerrit: Add support for logstash in gerrit [puppet] - https://gerrit.wikimedia.org/r/330832 (https://phabricator.wikimedia.org/T141324) (owner: Paladox)
[03:39:57] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[03:42:47] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[04:07:57] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[05:03:47] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:11:26] !log Update statistics count on so.wikipedia (T154833)
[05:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:30] T154833: Update statistics count on so.wikipedia - https://phabricator.wikimedia.org/T154833
[05:17:37] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:32:47] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[05:34:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[05:34:56] Operations, Dumps-Generation, HHVM: Merge facebook/hhvm@9d2be6c30b into build of next hhvm release - https://phabricator.wikimedia.org/T143648#2925425 (Krinkle)
[05:35:11] (CR) Dereckson: [C: 1] Enable import from cswiki to arbcom_cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330983 (https://phabricator.wikimedia.org/T154799) (owner: Urbanecm)
[05:35:37] PROBLEM - puppet last run on ms-be1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:37:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:45:37] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:03:37] RECOVERY - puppet last run on ms-be1011 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:04:27] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[06:07:27] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:24:37] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:39:57] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:40:37] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[06:52:37] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[07:07:57] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[07:08:37] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[07:37:37] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:05:37] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[08:14:57] RECOVERY - Check systemd state on elastic2033 is OK: OK - running: The system is fully operational
[08:39:57] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:08:57] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[09:32:07] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[09:33:38] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[09:34:47] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[09:36:37] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:00:07] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[10:02:47] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[10:12:07] PROBLEM - puppet last run on mx2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:12:37] PROBLEM - Disk space on mendelevium is CRITICAL: DISK CRITICAL - free space: / 756 MB (3% inode=16%)
[10:14:37] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1011.60 Read Requests/Sec=3696.70 Write Requests/Sec=6.20 KBytes Read/Sec=30821.60 KBytes_Written/Sec=2891.20
[10:27:37] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=162.70 Read Requests/Sec=206.90 Write Requests/Sec=3.90 KBytes Read/Sec=6013.60 KBytes_Written/Sec=519.60
[10:36:37] Morning everyone, I'm being told that OTRS is hosted on mendelevium, which is disk space critical right now. We also get the error "OTRS Daemon is not running. Please contact your administrator!"
[10:36:46] is there anyone who can assist with this issue?
[10:36:47] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:40:07] RECOVERY - puppet last run on mx2001 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[10:41:04] Operations, Traffic, media-storage: Cache and media (images) issues on all Wikimedia wikis - creates problems on upload, display and generation of thumbnails and files - https://phabricator.wikimedia.org/T154780#2925616 (zhuyifei1999)
[10:53:15] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2925648 (Shoichi) >>! In T148693#2923787, @Niharika wrote: > Hi @Shoichi is the translation work currently in progress? Yes, in progress. Sorry for vac...
[10:55:44] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925669 (MarcoAurelio)
[10:58:21] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925678 (MarcoAurelio) [From the interface] The OTRS Daemon is a daemon process that performs asynchronous tasks, e.g. ticket escalation triggering, email sending, etc. A running OTRS Daemon is mandatory for correct sys...
[10:58:25] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925680 (DeltaQuad) 05:12:56 PROBLEM - Disk space on mendelevium is CRITICAL: DISK CRITICAL - free space: / 756 MB (3% inode=16%) OTRS reads "OTRS Daemon is not running. Please contact your administrator!" T150311 may...
[11:02:38] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925686 (MarcoAurelio) Going through https://ticket.wikimedia.org/otrs/index.pl?Action=AdminSysConfig to see if one could restart it via the GUI, gives: {F5246461}
[11:04:47] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[11:14:07] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:21:53] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2730181 (zhuyifei1999) For better readability, is it possible for you to use a consistent syntax regarding translations? I see a few are used: `/** 組宋體...
[11:22:18] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925738 (Josve05a)
[11:22:35] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925740 (MarcoAurelio)
[11:25:32] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925745 (MarcoAurelio) >>! In T154841#2925680, @DeltaQuad wrote: > T150311 may possibly be related? Not sure, but adding @akosiaris and @fgiunchedi just in case. Thanks.
[11:26:11] akosiaris: you around? :)
[11:29:04] <_joe_> TabbyCat: I am, I can take a look at what's up with otrs
[11:29:21] _joe_: appreciate it, thanks :)
[11:32:38] RECOVERY - Disk space on mendelevium is OK: DISK OK
[11:33:39] <_joe_> TabbyCat: better now?
[11:33:48] let me check
[11:34:06] I think the box was out-of-memory and needs a reboot
[11:34:18] it /looks/ better
[11:34:52] but the interface looks good
[11:35:04] <_joe_> !log restarted apache/otrs, removed a 8 gb error.log
[11:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:13] <_joe_> !log from medelevium
[11:35:16] <_joe_> meh
[11:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:33] yep, still a lot of "no such ticket" errors in the internal log _joe_ - do we have to worry about that?
[11:35:57] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:35:58] I think we don't, as this may relate to what I did.
[11:36:08] Let's concentrate on whether anything got lost.
[11:36:26] maybe as it's OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerTaskWorker-10
[11:36:35] scheduler task
[11:37:00] <_joe_> krd: actually, I think that's what caused problems
[11:37:17] maybe it's still running in background?
[11:37:45] I checked that new tickets do arrive.
[11:38:02] An email that I sent during the downtime did not arrive yet.
[11:38:10] Is there any email queue in the way?
[11:38:43] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925651 (Joe) There was a huge error log for apache caused by an error in inserting a ticket in the history; I stopped apache, removed the file that was filling up the root filesystem, and started apache/otrs back again....
[11:39:17] <_joe_> yeah, there are several email queues, but it might be that mail couldn't be delivered to OTRS
[11:39:35] <_joe_> sorry, gotta step away for a few, please let me know if things are ok
[11:39:42] <_joe_> I'll be back in 10 minutes
[11:39:54] Thanks for your assistance
[11:42:07] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[11:44:15] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925774 (DatGuy) p:Unbreak!>Normal a:Joe Seems like that fixed it. Lowering priority for monitoring, assigning task.
[11:46:16] Operations, OTRS: OTRS down, unbreak now - https://phabricator.wikimedia.org/T154841#2925782 (Krd) The system is up again, and the dashboard look good to me. I can confirm that an e-mail sent during the downtime finally created a ticket, so I assume there was no data loss. From my point, the ticket can...
[11:46:42] Operations, OTRS: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2925786 (DatGuy)
[11:47:21] _joe_: the videoscalers seem really weird now, comparing https://grafana.wikimedia.org/dashboard/db/job-queue-health?var-jobType=webVideoTranscode&from=now-24h&to=now-5m & https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Video+scalers+eqiad&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name &
[11:47:21] https://commons.wikimedia.org/wiki/Special:TimedMediaHandler
[11:48:04] and ideas what's wrong?
[11:48:08] *any
[11:49:50] by weird I mean low load despite long queue. special page says queue size 2500+, grafana shows queue wait median = two days
[11:50:17] <_joe_> zhuyifei1999_: I would not trust the special page
[11:50:51] <_joe_> and tbh, I think this can wait for next week during working hours, but let me do one check
[11:50:53] well yeah, but there's still grafana
[11:50:58] k
[11:51:48] <_joe_> zhuyifei1999_: ok the status _in_the_queue_ is
[11:51:50] <_joe_> webVideoTranscode: 0 queued; 5725 claimed (4205 active, 1520 abandoned); 0 delayed
[11:52:00] <_joe_> this means that a lot of jobs failed yesterday
[11:52:08] <_joe_> during our swift outage
[11:52:19] <_joe_> and they'll be recycled later
[11:52:41] <_joe_> TabbyCat: any more issues with OTRS?
[11:52:55] <_joe_> else I'd avoid disturbing akosiaris during the weekend
[11:53:27] _joe_: k
[11:53:43] Operations, OTRS: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2925829 (Joe) a:Joe>None
[11:54:15] Operations, OTRS: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2925651 (Joe) De-assigning from me as I'm going to hop on a plane in a few hours from now and I won't be able to follow through on monday.
[11:55:43] (CR) Zfilipin: [C: 1] build: update rubocop to 0.39 and tweak config [puppet] - https://gerrit.wikimedia.org/r/330470 (owner: Hashar)
[11:56:47] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:56:54] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2925831 (Shoichi) Thanks ,I pushed my [[ https://github.com/Wikimedia-TW/han3_ji7_tsoo1_kian3_WM/blob/code_vf_H2E/src/main/java/idsrend/services/IDSren...
[11:57:00] _joe_: I've noticed none for now, maybe krd can say if he's found anything else?
[11:57:30] well, krd is not here anymore :|
[12:03:57] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[12:11:07] (CR) Muehlenhoff: "No concerns at all, just needs some manual rebasing, since there have been other changes to the standard packages that gerrit chokes on. F" [puppet] - https://gerrit.wikimedia.org/r/314270 (https://phabricator.wikimedia.org/T115348) (owner: Muehlenhoff)
[12:11:25] (PS1) Umherirrender: Expand .gitignore for more editors [mediawiki-config] - https://gerrit.wikimedia.org/r/331138
[12:13:30] (PS2) Umherirrender: Expand .gitignore for more editors [mediawiki-config] - https://gerrit.wikimedia.org/r/331138
[12:24:47] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[13:25:47] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:31:30] !log elastic@codfw removing/readding replicas for viwiki_general and zhwiki_content (affected by something similar to https://github.com/elastic/elasticsearch/issues/12661) - T154765
[13:31:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:34] T154765: Make elasticsearch more resilient to small network hiccups - https://phabricator.wikimedia.org/T154765
[13:33:47] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[13:36:47] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:38:57] PROBLEM - puppet last run on db1075 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:54:47] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[14:07:57] RECOVERY - puppet last run on db1075 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[14:09:37] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=365.80 Read Requests/Sec=2475.90 Write Requests/Sec=0.30 KBytes Read/Sec=21818.40 KBytes_Written/Sec=10.40
[14:21:37] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=214.30 Read Requests/Sec=319.70 Write Requests/Sec=72.10 KBytes Read/Sec=7613.60 KBytes_Written/Sec=494.40
[14:35:34] (PS2) Tim Landscheidt: Tools: Enable PHP module mcrypt on Trusty execution nodes [puppet] - https://gerrit.wikimedia.org/r/324957 (https://phabricator.wikimedia.org/T97857)
[14:40:22] (PS2) Tim Landscheidt: dynamicproxy: Indent @ssl_settings in NGINX configurations [puppet] - https://gerrit.wikimedia.org/r/329750
[14:45:03] (PS2) Tim Landscheidt: dumps: Indent @ssl_settings in NGINX configuration [puppet] - https://gerrit.wikimedia.org/r/329736
[14:59:37] (CR) ArielGlenn: [C: 2] dumps: Indent @ssl_settings in NGINX configuration [puppet] - https://gerrit.wikimedia.org/r/329736 (owner: Tim Landscheidt)
[15:29:07] RECOVERY - Check systemd state on elastic2030 is OK: OK - running: The system is fully operational
[15:33:47] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[15:36:47] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:00:47] PROBLEM - puppet last run on analytics1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:28:47] RECOVERY - puppet last run on analytics1045 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[16:33:47] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational
[16:36:47] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:12:04] Operations, IDS-extension, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2926097 (Arthur2e5) Regarding marking translated comments, consider using something like /*e to replace /**, and //e to replace ///. This would allow e...
[17:23:13] any opsen around ?
[18:07:27] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 xe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave]
[18:08:17] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave]
[18:15:57] PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:44:57] RECOVERY - puppet last run on thumbor1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[19:57:10] Anyone around that can take a look at what is up with the video scalers?
[19:58:19] I think it’s once again loading up with transcodes that were ‘broken’ from the drama the other day, and not being shown while processing.
[20:21:43] hi, someone is asking which version of lilypond is running on WMF servers for Extension:Score, any idea?
[20:23:17] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:33:04] (CR) Aude: Update Wikidata property blacklist (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/329762 (owner: Matěj Suchánek)
[20:52:17] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[20:58:26] (CR) Hashar: [C: 1] "The committed autoloader file are messed up when running composer install but I don't think we have a way to avoid it." [mediawiki-config] - https://gerrit.wikimedia.org/r/331093 (https://phabricator.wikimedia.org/T85947) (owner: Krinkle)
[21:00:25] (CR) Hashar: [C: 1] Labs: remove wmgUseGWToolset [mediawiki-config] - https://gerrit.wikimedia.org/r/328870 (owner: MaxSem)
[21:22:17] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:30:27] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[21:31:27] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[21:33:17] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[21:40:07] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[21:42:07] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:47:37] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:47:37] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:48:27] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[21:48:27] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[21:52:27] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:53:17] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[21:58:27] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:58:37] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:58:37] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:59:17] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[21:59:27] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[21:59:27] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[23:29:07] PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 100%
[23:29:27] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 1.61 ms
[23:29:37] lol