[00:05:50] (PS1) Madhuvishy: labstore backups: Switch secondary labstore cluster backup source to labstore1005 [puppet] - https://gerrit.wikimedia.org/r/367116
[00:06:35] (CR) jerkins-bot: [V: -1] labstore backups: Switch secondary labstore cluster backup source to labstore1005 [puppet] - https://gerrit.wikimedia.org/r/367116 (owner: Madhuvishy)
[00:08:39] (PS2) Madhuvishy: labstore backups: Switch secondary cluster backup source [puppet] - https://gerrit.wikimedia.org/r/367116
[00:13:58] (CR) Madhuvishy: [C: 2] labstore backups: Switch secondary cluster backup source [puppet] - https://gerrit.wikimedia.org/r/367116 (owner: Madhuvishy)
[00:21:22] Operations, Cloud-Services, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#3463424 (madhuvishy)
[00:21:25] Operations, Cloud-Services, Tracking: Cleanup tools nfs share on labstore1004/5 - https://phabricator.wikimedia.org/T156982#3463421 (madhuvishy) Open>Resolved a:madhuvishy This was done. Let's open a different task if we plan to do a clean up in the future.
[00:21:47] Operations, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3463427 (madhuvishy)
[00:21:50] Operations, Cloud-Services: Move labstore1002 and labstore1002-array1 and labstore1002-array2 to different rack (currently in C3) - https://phabricator.wikimedia.org/T158913#3463425 (madhuvishy) Open>Resolved a:madhuvishy
[00:22:47] Operations, Cloud-Services, cloud-services-team (Kanban): Investigate alternative RAID strategies for labstore1001/2 - https://phabricator.wikimedia.org/T162090#3463428 (madhuvishy) Update: I have reimaged labstore1001 and labstore1002 with RAID 50 for the external shelf storage.
[00:44:27] (PS3) Reedy: Function comments, parameters and stuffs [mediawiki-config] - https://gerrit.wikimedia.org/r/366771
[00:45:24] (PS4) Reedy: Function comments, parameters and stuffs [mediawiki-config] - https://gerrit.wikimedia.org/r/366771
[02:09:44] Operations, DBA, MediaWiki-extensions-WikibaseClient, Performance-Team, and 6 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3463486 (aaron) I commented on the patch, it's a __METHOD__ mismatch problem, so the comm...
[02:27:13] Anyone around?
[02:27:29] oh, possibly I want -tech, but...
[02:27:30] [WXQJSApAIDQAABjqvVUAAABG] 2017-07-23 02:26:16: Fatal exception of type "MWException"
[02:28:56] (It no longer happens, but Safari Version 10.1.1 (12603.2.4), MacOS 10.12.5
[02:44:32] tzatziki: what were you doing?
[02:45:20] legoktm: I typed "Paper Towns" into the search box and clicked the result for the film
[02:45:25] and then it gave me that
[02:45:39] you ran into https://phabricator.wikimedia.org/T166348
[02:47:50] Operations, Cloud-Services, cloud-services-team (Kanban): labstore systemd state Icinga alarms - https://phabricator.wikimedia.org/T151322#3463492 (madhuvishy) Labstore1002 has been reimaged - these failures don't get reported anymore. I'll reimage 2001 in the upcoming week.
[03:32:41] PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:33:02] PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:01:01] RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[04:01:31] RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[04:09:11] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=6473.40 Read Requests/Sec=4822.90 Write Requests/Sec=15.00 KBytes Read/Sec=19374.00 KBytes_Written/Sec=107.60
[04:15:21] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=117.80 Read Requests/Sec=168.80 Write Requests/Sec=10.10 KBytes Read/Sec=2165.60 KBytes_Written/Sec=272.00
[05:33:21] (PS1) Legoktm: visualdiff: Remove manually built `uprightdiff` [puppet] - https://gerrit.wikimedia.org/r/367131
[05:36:31] (CR) Legoktm: "The promised follow-up is Change-Id: I9e0eb9e75af15c0b33970096b01799ea9d5c25bf" [puppet] - https://gerrit.wikimedia.org/r/327028 (owner: Legoktm)
[06:19:19] legoktm: belated thanks for link :)
[07:37:02] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0]
[07:38:19] ^ that's me
[07:49:53] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463572 (zhuyifei1999)
[07:50:20] Operations, Commons, Category, Performance: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463322 (zhuyifei1999)
[07:54:37] Operations, Commons, Category, Performance: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463574 (zhuyifei1999) A few refreshes on https://commons.wikimedia.org/wiki/Category:SVG_logos_of_the_United_Kingdom?uselang=de I got: ``...
[07:56:32] (CR) Luke081515: [C: 1] Allow flooders to remove themselves from the flood group on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/367025 (https://phabricator.wikimedia.org/T171379) (owner: Urbanecm)
[08:00:44] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463575 (zhuyifei1999) ^ was with X-Wikimedia-Debug. without it I get: `Lua error in mw.wikibase.entity.lua at line 34: The entity data must be a table obt...
[08:09:35] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463584 (zhuyifei1999) After `action=purge` the page will no longer load (blank 500 page). Right now it seems to me that turning on X-Wikimedia-Debug the...
[08:14:56] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463585 (zhuyifei1999) 10 days => July 13 => Possibly related to [[https://wikitech.wikimedia.org/w/index.php?title=Deployments&oldid=1765139#deploycal-ite...
[08:43:53] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463322 (Vachovec1) >>! In T171392#3463575, @zhuyifei1999 wrote: > ^ was with X-Wikimedia-Debug. without it I get: `Lua error in mw.wikibase.entity.lua at...
[08:51:00] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463592 (zhuyifei1999) Wikibase [[https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox?uselang=de|does not seem to have problems]] accessing bulk d...
[09:31:44] Operations, Commons, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463605 (zhuyifei1999) Cost of crashing = [[https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/3?uselang=de|three langswitches]]. (but expanding...
[09:34:07] (PS1) Foxy brown: Enable Article Reminder feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/367318 (https://phabricator.wikimedia.org/T169354)
[09:34:41] (PS2) Foxy brown: Enable Article Reminder feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/367318 (https://phabricator.wikimedia.org/T169354)
[09:49:27] Operations, Commons, MediaWiki-extensions-Scribunto, Category: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463617 (zhuyifei1999)
[09:50:00] Operations, Commons, MediaWiki-extensions-Scribunto: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463322 (zhuyifei1999)
[10:17:36] Operations, Commons, MediaWiki-extensions-Scribunto: Category page HTTP ERROR 500/503: (Commons, probably language setting) - https://phabricator.wikimedia.org/T171392#3463639 (zhuyifei1999) I've made an ugly workaround in https://commons.wikimedia.org/wiki/User:Zhuyifei1999/sandbox/5?uselang=de, [[h...
[10:28:20] Operations, Commons, MediaWiki-extensions-Scribunto: Commons pages transcluding Template:Countries_of_Europe with prefix= set HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463640 (zhuyifei1999)
[10:35:33] Operations, Commons, MediaWiki-extensions-Scribunto: Commons pages transcluding Template:Countries_of_Europe with prefix= set HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463641 (Vachovec1) @zhuyifei1999: I removed the {{...
[10:43:33] (CR) Nemo bis: "I agree those two wikis are not supposed to block everyone else. As long as the DB machines are happy, running updateArticleCount.php on a" [puppet] - https://gerrit.wikimedia.org/r/363639 (owner: Reedy)
[10:57:41] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[10:57:47] Operations, Commons, MediaWiki-extensions-Scribunto: Commons pages transcluding Template:Countries_of_Europe with prefix= set HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463648 (zhuyifei1999) @Vachovec1 it's transcluded...
[11:01:44] Operations, Commons, MediaWiki-extensions-Scribunto: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463653 (zhuyifei1999)
[11:04:17] Operations, Commons, MediaWiki-extensions-Scribunto: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463654 (Sitacuisses) >>! In T171392#3463585, @zhuyifei1999 wro...
[11:12:51] Operations, Wikidata, Wikimedia-Language-setup, Wikimedia-Site-requests: Add Dinka Wikipedia to Wikidata - https://phabricator.wikimedia.org/T170930#3463661 (Amire80) >>! In T170930#3461803, @aude wrote: > for some reason, the sites table on dinwiki only had an entry for dinwiki and not any other...
[11:28:11] Operations, Commons, MediaWiki-extensions-Scribunto: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463671 (Vachovec1) >>! In T171392#3463648, @zhuyifei1999 wrote...
[11:35:07] Operations, Commons, MediaWiki-extensions-Scribunto: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463675 (zhuyifei1999) >>! In T171392#3463671, @Vachovec1 wrote...
[11:48:04] Operations, Commons, MediaWiki-extensions-Scribunto: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463691 (zhuyifei1999) >>! In T171392#3463671, @Vachovec1 wrote...
[11:56:51] (PS21) Ema: varnish: Avoid std.fileread() and use new errorpage template [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[12:00:06] Operations, Commons, MediaWiki-extensions-Scribunto: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463692 (Vachovec1) >>! In T171392#3463675, @zhuyifei1999 wrote...
[12:02:08] !log CI is overloaded due to a mass update of mediawiki-codesniffer to 0.10.1
[12:02:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:36] !log CI should self recover when the queue is processed. Will check again in an hour or so
[12:02:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:11] PROBLEM - Nginx local proxy to apache on mwdebug1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1313 bytes in 0.010 second response time
[12:05:11] PROBLEM - HHVM rendering on mwdebug1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1313 bytes in 0.002 second response time
[12:05:28] (CR) Ema: varnish: Avoid std.fileread() and use new errorpage template (1 comment) [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[12:05:31] PROBLEM - Apache HTTP on mwdebug1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1313 bytes in 0.001 second response time
[12:05:40] whoops
[12:06:11] RECOVERY - Nginx local proxy to apache on mwdebug1001 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 619 bytes in 0.225 second response time
[12:06:11] RECOVERY - HHVM rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 200 OK - 73933 bytes in 0.747 second response time
[12:06:31] RECOVERY - Apache HTTP on mwdebug1001 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.100 second response time
[12:06:43] !log Restarted hhvm and apache2 on mwdebug1001
[12:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:12] Operations, Commons, MediaWiki-extensions-Scribunto, Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463694 (hoo) With https://commons.wi...
[12:12:50] Operations, Commons, MediaWiki-extensions-Scribunto, Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463322 (hoo) Just as a note: I didn'...
[12:13:04] so CI is all fine. Just has too many patches to process
[12:13:38] that will self solve over the next hour so. I will check back then :]
[12:17:22] (PS22) Ema: varnish: Avoid std.fileread() and use new errorpage template [puppet] - https://gerrit.wikimedia.org/r/350966 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[12:19:46] ACKNOWLEDGEMENT - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [140.0] amusso Mass changes being made to mediawiki extensions
[12:44:04] Operations, Commons, MediaWiki-extensions-Scribunto, Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463706 (zhuyifei1999) >>! In T171392...
[13:05:58] Operations, Commons, MediaWiki-extensions-Scribunto, Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463707 (Perhelion)
[13:13:31] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0]
[16:06:31] PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused
[16:07:28] (CR) Daniel Kinzler: "we should also consider spinning up more dispatchers" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[16:13:11] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2016638
[16:29:41] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[16:32:41] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[16:33:11] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 0
[16:57:49] (PS2) KartikMistry: WIP: cg3: New upstream version [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/362334 (https://phabricator.wikimedia.org/T171406)
[17:06:56] (PS3) Giuseppe Lavagetto: rake: new rakefile specifically for CI [puppet] - https://gerrit.wikimedia.org/r/366591 (https://phabricator.wikimedia.org/T166888)
[17:59:52] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[18:03:01] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[19:55:18] (PS1) Urbanecm: Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395)
[20:00:21] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[20:03:21] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[20:09:48] (CR) Luke081515: [C: 1] Add some import sources for tawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/367334 (https://phabricator.wikimedia.org/T171395) (owner: Urbanecm)
[20:42:01] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:12:11] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[21:29:32] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[21:32:32] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[21:50:41] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[22:41:37] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:42:26] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 173 bytes in 0.002 second response time
[22:42:54] sigh, I'll check what's up
[22:44:21] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[22:46:21] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[22:51:21] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:52:21] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:26:38] !log legoktm@tin Synchronized php-1.30.0-wmf.10/includes/page/Article.php: [SECURITY] Restore ability to suppress pages while deleting - T171405 (duration: 00m 45s)
[23:26:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:38] * TabbyCat checks
[23:29:03] lgtm
[23:34:57] Operations, Data-Services, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#3464301 (bd808)
[23:37:57] Operations, Data-Services: Convert labstore cluster configuration to hiera and profiles - https://phabricator.wikimedia.org/T161835#3464307 (bd808)
[23:38:02] Operations, Data-Services, cloud-services-team (Kanban): Undo special tools-home and tools-project share definitions for NFS - https://phabricator.wikimedia.org/T161834#3464308 (bd808)
[23:38:10] Operations, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3464310 (bd808)
[23:38:16] Operations, Data-Services: evaluate possibility for nscd use with useldap - https://phabricator.wikimedia.org/T124991#3464312 (bd808)
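Several Graphite-backed alerts in this log (the Zuul Gearman queue on contint1001, and the Ulsfo/Text HTTP 5xx reqs/min checks on graphite1001) report what share of a metric's recent datapoints sits above a threshold, rather than a single value. For example, "CRITICAL: 10.00% of data above the critical threshold [1000.0]" later clears with "OK: Less than 1.00% above the threshold [250.0]". This suggests a configured percentage plus separate warning and critical thresholds. The sketch below shows that evaluation under those assumptions. The function name `evaluate_threshold`, its parameters, the strict `>` comparison and the `>=` percentage rule are all hypothetical. This is not the actual check plugin used in production.

```python
from typing import Optional, Sequence, Tuple


def evaluate_threshold(
    datapoints: Sequence[Optional[float]],
    warning: float,
    critical: float,
    percentage: float,
) -> Tuple[str, str]:
    """Hypothetical 'percentage of datapoints over threshold' check.

    Assumption: an alert state is raised when at least `percentage` percent
    of the non-null datapoints in the window are strictly above the threshold.
    """
    # Graphite uses null (None) for empty buckets; ignore them.
    values = [v for v in datapoints if v is not None]
    if not values:
        return "UNKNOWN", "UNKNOWN: No valid datapoints found"

    def percent_over(threshold: float) -> float:
        # Share of datapoints strictly above the threshold, in percent.
        return 100.0 * sum(1 for v in values if v > threshold) / len(values)

    crit_pct = percent_over(critical)
    if crit_pct >= percentage:
        return "CRITICAL", (
            f"CRITICAL: {crit_pct:.2f}% of data above the critical threshold [{critical}]"
        )
    warn_pct = percent_over(warning)
    if warn_pct >= percentage:
        return "WARNING", (
            f"WARNING: {warn_pct:.2f}% of data above the warning threshold [{warning}]"
        )
    return "OK", f"OK: Less than {percentage:.2f}% above the threshold [{warning}]"


if __name__ == "__main__":
    # One 5xx spike in a ten-point window: 10% > 1%, so the check goes CRITICAL.
    print(evaluate_threshold([120.0] * 9 + [1500.0], 250.0, 1000.0, 1.0)[1])
    # A quiet window recovers with the "Less than ..." OK message.
    print(evaluate_threshold([10.0, 20.0, None], 250.0, 1000.0, 1.0)[1])
```

With this rule, a single spike in a short window is enough to page when the configured percentage is as low as 1%. That would match the brief 5xx alerts at 22:44 and 22:46, which cleared within minutes.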