[00:04:09] Operations, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3462394 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labstore1001.eqiad.wmnet'] ``` and were **ALL**...
[00:12:21] Operations, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3462408 (ops-monitoring-bot) Script wmf_auto_reimage was launched by madhuvishy on neodymium.eqiad.wmnet for hosts:...
[00:13:09] PROBLEM - Host labstore1001 is DOWN: PING CRITICAL - Packet loss = 100%
[00:14:29] RECOVERY - Host labstore1001 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[00:19:05] (PS1) Smalyshev: Fix service dependency name for update service [puppet] - https://gerrit.wikimedia.org/r/366989 (https://phabricator.wikimedia.org/T168918)
[00:20:05] (PS2) Smalyshev: Fix service dependency name for update service [puppet] - https://gerrit.wikimedia.org/r/366989 (https://phabricator.wikimedia.org/T168918)
[00:37:08] Operations, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3462453 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labstore1002.eqiad.wmnet'] ``` and were **ALL**...
[01:20:10] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:20:10] PROBLEM - nutcracker process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:20:10] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:21:09] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:21:09] RECOVERY - nutcracker process on thumbor1002 is OK: PROCS OK: 1 process with UID = 115 (nutcracker), command name nutcracker
[01:21:09] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[01:40:22] (PS1) Krinkle: Enable jQuery 3 on test.wikipedia.org and test2.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/366994 (https://phabricator.wikimedia.org/T124742)
[02:13:01] Operations, Performance-Team, Traffic: enwiki Main_Page timeouts - https://phabricator.wikimedia.org/T104225#3462570 (Krinkle) Open>Resolved Closing. Contrary to my previous comment, at least from tailing Varnish for a while, and from looking at slowparse.log, we no longer have any timeouts....
[02:17:32] Operations, Wikidata, Wikimedia-Language-setup, Wikimedia-Site-requests: Add Dinka Wikipedia to Wikidata - https://phabricator.wikimedia.org/T170930#3462574 (Dcljr) It looks like it will be necessary to purge the cache on [[ https://din.wikipedia.org/w/index.php?title=K%C3%ABc%C3%ABweek:Contribut...
[02:28:44] Operations, DBA, MediaWiki-extensions-WikibaseClient, Performance-Team, and 6 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3462580 (Krinkle) >>! In T164173#3435975, @gerritbot wrote: > Change 364094 merged by jen...
[02:38:56] Operations, Wikidata, Wikimedia-Language-setup, Wikimedia-Site-requests: Add Dinka Wikipedia to Wikidata - https://phabricator.wikimedia.org/T170930#3462607 (Dcljr) Never mind. There are so few pages, I'm just doing it myself, now. Will be done in about 10 minutes, probably.
[02:52:21] Operations, Wikidata, Wikimedia-Language-setup, Wikimedia-Site-requests: Add Dinka Wikipedia to Wikidata - https://phabricator.wikimedia.org/T170930#3462618 (Dcljr) Done.
[04:02:16] Operations, MediaWiki-JobRunner, Performance-Team: Investigate Jobrunner error increase - https://phabricator.wikimedia.org/T171371#3462630 (Krinkle)
[04:03:08] Operations, MediaWiki-JobRunner, Performance-Team: Investigate Jobrunner error increase - https://phabricator.wikimedia.org/T171371#3462642 (Krinkle)
[04:10:01] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=543.10 Read Requests/Sec=688.10 Write Requests/Sec=560.10 KBytes Read/Sec=45856.40 KBytes_Written/Sec=3999.20
[04:11:12] Operations, MediaWiki-JobRunner, Performance-Team: Investigate Jobrunner error increase - https://phabricator.wikimedia.org/T171371#3462648 (Krinkle) Checking the local logs on one of the servers (mw1303), the error is quite easily found. The last two log files are 1000x bigger (19**G** instead of 8M...
[04:15:10] Operations, MediaWiki-JobRunner, Performance-Team: Investigate 100x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371#3462656 (Krinkle)
[04:19:11] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=7.00 Read Requests/Sec=0.20 Write Requests/Sec=0.50 KBytes Read/Sec=1.20 KBytes_Written/Sec=15.60
[04:31:13] Operations, MediaWiki-JobRunner, Performance-Team: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371#3462682 (Krinkle)
[04:42:54] (CR) Nuria: "+1" [puppet] - https://gerrit.wikimedia.org/r/366966 (owner: Ottomata)
[06:15:33] (PS1) Krinkle: openstack: Remove stray pmtpa references [puppet] - https://gerrit.wikimedia.org/r/367004
[08:08:05] Operations, Commons, Performance-Team, Thumbor, media-storage: HTTP 429 on thumbnail images for specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3462796 (MoritzMuehlenhoff)
[08:41:35] (PS1) MarcoAurelio: High density logos for es.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367008 (https://phabricator.wikimedia.org/T170604)
[09:05:31] (CR) 20after4: [C: +1] "Any reason not to merge this one?" [puppet] - https://gerrit.wikimedia.org/r/364148 (https://phabricator.wikimedia.org/T103886) (owner: Giuseppe Lavagetto)
[09:07:08] (CR) 20after4: [C: +1] "I guess this won't break production..." [puppet] - https://gerrit.wikimedia.org/r/354247 (https://phabricator.wikimedia.org/T165643) (owner: Paladox)
[10:57:59] Operations, Traffic, media-storage: Cache and media (images) issues on all Wikimedia wikis - creates problems on upload, display and generation of thumbnails and files - https://phabricator.wikimedia.org/T154780#3462876 (DavidSaroyan) Hello! The same issue happened to us when uploading these files to...
[11:53:16] (PS7) Amire80: Make compact language links default for all Wikipedias except en and de [mediawiki-config] - https://gerrit.wikimedia.org/r/364428
[12:14:01] (CR) Hoo man: [C: +1] "This will probably help, but will not fix the immediate queue backlog in question: We still only dispatch up to 275 changes every 25 secon" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[12:37:11] (PS1) Urbanecm: Allow flooders to remove themselves from the flood group on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379)
[12:37:54] (PS7) Urbanecm: Add two lines to NamespacesAliases for zh_classical [mediawiki-config] - https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422)
[12:55:06] (Abandoned) Urbanecm: Add two lines to NamespacesAliases for zh_classical [mediawiki-config] - https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422) (owner: Urbanecm)
[13:15:46] (CR) Marostegui: [C: +1] "Looks good to me, and the server I was playing with db1051, looks good (pooled, and with the correct weight)" [mediawiki-config] - https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: Reedy)
[13:31:21] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1952 bytes in 0.105 second response time
[14:32:28] (Draft1) Paladox: Gerrit: Add labs specific gerrit.config [puppet] - https://gerrit.wikimedia.org/r/367030
[14:32:31] (PS2) Paladox: Gerrit: Add labs specific gerrit.config [puppet] - https://gerrit.wikimedia.org/r/367030
[14:32:58] (CR) Paladox: "done in https://gerrit.wikimedia.org/r/#/c/367030/" [puppet] - https://gerrit.wikimedia.org/r/366768 (owner: Paladox)
[14:33:08] (PS10) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[14:33:11] (Abandoned) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768 (owner: Paladox)
[14:41:22] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1951 bytes in 0.144 second response time
[17:41:37] (CR) Dzahn: [C: -1] Gerrit: Add labs specific gerrit.config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/367030 (owner: Paladox)
[17:42:18] (CR) Paladox: Gerrit: Add labs specific gerrit.config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/367030 (owner: Paladox)
[17:44:08] (CR) Dzahn: [C: -1] Gerrit: Add labs specific gerrit.config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/367030 (owner: Paladox)
[17:45:16] (PS3) Paladox: Gerrit: Add labs specific gerrit.config [puppet] - https://gerrit.wikimedia.org/r/367030
[17:45:18] (CR) Dzahn: [C: -1] "not needed - the idea was that you can change the name of the template file so that you can test any config change -without- needed a chan" [puppet] - https://gerrit.wikimedia.org/r/367030 (owner: Paladox)
[17:46:20] (PS4) Paladox: Gerrit: Add labs specific gerrit.config [puppet] - https://gerrit.wikimedia.org/r/367030
[20:03:40] (CR) Krinkle: [C: +1] Fix up some file indenting broken by my phpcs changes [mediawiki-config] - https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: Reedy)
[20:05:34] (CR) Krinkle: [C: +1] Move some trailing ] onto newlines to make more balanced [mediawiki-config] - https://gerrit.wikimedia.org/r/366839 (owner: Reedy)
[20:05:50] (CR) Krinkle: [C: -1] Function comments, parameters and stuffs [mediawiki-config] - https://gerrit.wikimedia.org/r/366771 (owner: Reedy)
[20:07:06] (CR) Krinkle: Just run updateArticleCount.php over all.dblist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[20:08:31] (CR) Krinkle: [C: +1] "Nope, I suggest this Monday?" [puppet] - https://gerrit.wikimedia.org/r/364148 (https://phabricator.wikimedia.org/T103886) (owner: Giuseppe Lavagetto)
[20:12:31] (CR) Nemo bis: Just run updateArticleCount.php over all.dblist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[22:43:28] (CR) Krinkle: varnish: Avoid std.fileread() and use new errorpage template (1 comment) [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[22:45:41] (CR) Krinkle: Just run updateArticleCount.php over all.dblist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[22:45:44] (CR) Krinkle: [C: -1] Just run updateArticleCount.php over all.dblist [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[22:48:10] (CR) Reedy: "Well, no." [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[22:54:32] (CR) Reedy: "Plus, based on MediaWiki's usual inability to count..." [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[23:00:36] (CR) Reedy: "I don't think disabling something for 98.35% of wikibooks because 1.65% want a feature that doesn't support it ;)" [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[23:21:21] PROBLEM - Apache HTTP on mw1277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[23:22:21] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.070 second response time
[23:47:57] (CR) Krinkle: [C: -1] "No, the count feature 'comma count' works fine. It's only updateArticleCount.php that's broken in its way to try and batch calculate it re" [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[23:49:51] (CR) Krinkle: [C: -1] "Presumably the other wikis will still dynamically recount their articles post-edit (as all wikis do), but the periodic recount to account " [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[23:50:03] (CR) Reedy: "Works fine, as long as the article count doesn't skew. Like we know it does. But we also apparently have no way to make it recalculate? Wh" [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[23:52:02] (CR) Reedy: "We could do a computed dblist... That is all minus those two. Or minus another list.. And make sure we execute it in something that expand" [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)