[00:02:12] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:03:29] Puppet, Reading-Infrastructure-Team, Release-Engineering, Sentry, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1481681 (Tgr)
[00:04:59] Puppet, Reading-Infrastructure-Team, Release-Engineering, Sentry, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1481690 (Tgr) p:Normal>High
[00:06:12] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[00:10:35] operations, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1481708 (Philippe-WMF) no; will do tonight.
[00:12:31] operations, Community-Advocacy, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1481709 (Philippe-WMF)
[00:25:02] RECOVERY - puppet last run on mw2091 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:40:28] operations, Phabricator: Consider using the Badges application for a few special roles to highlight those users' comments - https://phabricator.wikimedia.org/T106924#1481744 (Jdforrester-WMF) NEW
[01:56:55] Seems like Images are not Showing on en-wiki...
[02:03:54] !log LocalisationUpdate failed (1.26wmf15) at 2015-07-25 02:03:54+00:00
[02:04:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:06:28] (PS1) Mattflaschen: Enable Flow on all wikis, except private and a couple special wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/226954
[02:07:01] (PS2) Mattflaschen: Enable Flow on all wikis, except private and a couple special wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/226954 (https://phabricator.wikimedia.org/T106562)
[02:08:04] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 25 02:08:04 UTC 2015 (duration 8m 3s)
[02:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:16:39] Lor_, looks fine to me
[02:17:07] operations, Community-Advocacy, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1481872 (Philippe-WMF) They're notified. :)
[02:17:09] Krenair, Weird, some pages i'm not seeing images today...maybe it's just me.
[02:17:10] operations, Community-Advocacy, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1481874 (NativeForeigner) Confirmed. Arbcom-en has been notified.
[02:18:39] Lor_, examples?
[02:19:12] operations, Community-Advocacy, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1481875 (Reedy) >>! In T102814#1481872, @Philippe-WMF wrote: > They're notified. :) Woo, thanks! Out of interest, what exactly did yo...
[02:19:13] Krenair, https://en.wikipedia.org/wiki/Governing_Past_Dues for example
[02:19:52] https://upload.wikimedia.org/wikipedia/en/thumb/4/4a/Governing_Past_Dues.jpg/220px-Governing_Past_Dues.jpg -> Error generating thumbnail
[02:21:16] Interesting.
[02:21:43] I thought it may have been a server problem, guess that explains why some images do show up
[02:34:25] operations, Community-Advocacy, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1481903 (Jalexander) We told them we'd remove it soon :) specifically no time frame. I would wait until Monday or Tuesday to ensure the...
[02:35:00] !log l10nupdate Synchronized php-1.26wmf15/cache/l10n: (no message) (duration: 09m 52s)
[02:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:41:09] !log LocalisationUpdate completed (1.26wmf15) at 2015-07-25 02:41:09+00:00
[02:41:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:06:05] (CR) Alex Monk: [C: -1] "When removing a dblist you also need to remove it from the list of tags and... in theory remove the noc conf symlink, but it looks like so" [mediawiki-config] - https://gerrit.wikimedia.org/r/226954 (https://phabricator.wikimedia.org/T106562) (owner: Mattflaschen)
[03:35:23] PROBLEM - Persistent high iowait on labstore2001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[03:47:23] RECOVERY - Persistent high iowait on labstore2001 is OK Less than 50.00% above the threshold [25.0]
[06:31:13] PROBLEM - puppet last run on lvs1003 is CRITICAL Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw2023 is CRITICAL Puppet has 1 failures
[06:31:33] PROBLEM - puppet last run on holmium is CRITICAL Puppet has 1 failures
[06:31:53] PROBLEM - puppet last run on mw1060 is CRITICAL Puppet has 1 failures
[06:31:53] PROBLEM - puppet last run on mw2207 is CRITICAL Puppet has 1 failures
[06:32:03] PROBLEM - puppet last run on mw1086 is CRITICAL Puppet has 2 failures
[06:32:04] PROBLEM - puppet last run on mw1135 is CRITICAL Puppet has 1 failures
[06:32:13] PROBLEM - puppet last run on sca1001 is CRITICAL Puppet has 1 failures
[06:32:23] PROBLEM - puppet last run on mw2016 is CRITICAL Puppet has 1 failures
[06:32:24] PROBLEM - puppet last run on mc2005 is CRITICAL Puppet has 1 failures
[06:32:24] PROBLEM - puppet last run on subra is CRITICAL Puppet has 1 failures
[06:32:43] PROBLEM - puppet last run on mw1158 is CRITICAL Puppet has 1 failures
[06:32:52] PROBLEM - puppet last run on db2058 is CRITICAL Puppet has 1 failures
[06:32:54] PROBLEM - puppet last run on chromium is CRITICAL Puppet has 3 failures
[06:33:23] PROBLEM - puppet last run on mw2129 is CRITICAL Puppet has 2 failures
[06:55:24] RECOVERY - puppet last run on holmium is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:55:43] RECOVERY - puppet last run on mw1060 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:55:53] RECOVERY - puppet last run on mw1086 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:56:03] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:56:13] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:56:14] RECOVERY - puppet last run on subra is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:56:14] RECOVERY - puppet last run on mc2005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:33] RECOVERY - puppet last run on mw1158 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures
[06:56:43] RECOVERY - puppet last run on db2058 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:52] RECOVERY - puppet last run on chromium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:03] RECOVERY - puppet last run on lvs1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:23] RECOVERY - puppet last run on mw2129 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:57:23] RECOVERY - puppet last run on mw2023 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:52] RECOVERY - puppet last run on mw2207 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:53] RECOVERY - puppet last run on mw1135 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:05:09] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 25 07:05:08 UTC 2015 (duration 5m 7s)
[07:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:57:03] PROBLEM - puppet last run on mw1031 is CRITICAL Puppet has 1 failures
[08:18:53] PROBLEM - puppet last run on db2059 is CRITICAL puppet fail
[08:23:23] RECOVERY - puppet last run on mw1031 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:45:13] RECOVERY - puppet last run on db2059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:17:38] * AaronSchulz is disturbed by https://phabricator.wikimedia.org/T106895
[09:19:35] operations, MediaWiki-File-management, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482048 (Peachey88) p:Triage>Unbreak!
[09:32:30] operations, MediaWiki-File-management, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482062 (aaron) Maybe related to 3bfb0b5d4e874ba66fb857b36feac173918fc458
[09:34:11] _joe_: maybe https://gerrit.wikimedia.org/r/#/c/226738/ should be reverted
[09:35:27] * AaronSchulz wonders if ori is still up
[09:37:07] or mark
[09:44:37] https://en.wikipedia.org/wiki/Special:ListFiles :(
[09:51:26] operations, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1482066 (Tau) Finally I managed to get the logging enabled. 1) I did `chmod -R a+rw /var/log/mediawiki` - still nothing...
[09:55:29] <_joe__> Hi
[09:55:50] <_joe__> On IRC on my phone
[09:56:06] <_joe__> And bad reception too
[09:56:08] heh
[09:56:31] hey
[09:56:32] someone called?
[09:56:33] <_joe__> Whatsup Aaron?
[09:58:06] https://phabricator.wikimedia.org/T106895
[09:58:21] I suspect that is related to the rewrite rule change made recently
[09:58:34] <_joe__> paravoid: Aaron called me but I'm afk atm
[09:58:50] yeah, that's a good suspicion
[09:59:04] hhvm imagescalers are broken I think
[09:59:06] filippo will join soon too
[09:59:13] hey
[09:59:30] looking
[09:59:43] Could not determine the name of the requested thumbnail.
[09:59:56] <_joe__> paravoid: maybe the rewrite is pointing to the wrong docroot?
[10:00:19] operations, MediaWiki-File-management, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482074 (aaron) https://en.wikipedia.org/wiki/Special:ListFiles is quite broken :/
[10:00:56] <_joe__> I said so to or.i, not sure if he fixed it
[10:00:57] that commit message isn't very informative -- how did it work before the ProxyPass?
[10:01:02] or did it not work at all?
[10:01:17] I assume it hit the << ^/w/(.*\.(php|hh))$ >> case
[10:01:24] it basically worked before
[10:01:54] just some failures due to APC missing on certain hosts (I made a different MW patch that also works around that, but there is some misconfig there)
[10:01:59] <_joe__> It worked via zend
[10:02:02] the breakage now is far worse
[10:02:42] <_joe__> Because we didn't catch thumb handler
[10:03:16] <_joe__> But revert those two apache changes
[10:05:07] <_joe__> paravoid: call me if you need info, the line here is very bad
[10:05:18] don't worry
[10:06:34] <_joe__> The one day I'm afk as in I left the PC at home...
[10:08:13] shall we revert that? also some requests seem to yield 200 "GET http://commons.wikimedia.org/w/thumb_handler.php/3/38/Cajsa_Warg.djvu/page354-7541px-Cajsa_Warg.djvu.jpg
[10:08:17] actually it wouldn't hit that php/hh case due to the $, so zend
[10:08:21] * AaronSchulz is too tired
[10:09:24] godog: https://gerrit.wikimedia.org/r/#/c/226738/1? I think so.
[10:09:45] <_joe__> godog: I dunno, does the timing relate?
[10:10:43] the earlier one went out the 23rd, probably not the problem
[10:11:01] _joe__: not sure it does yet, looking at the graphs
[10:11:34] since it didn't actually have the intended effect
[10:12:46] (the thumb_handler paths do not *end* in .php, only include it)
[10:12:53] <_joe__> The second patch had an error, proxypassing to wikimedia.org instead of the correct docroot
[10:12:59] yeah
[10:15:26] (PS1) Aaron Schulz: Revert "Follow-up for Ie17cb06: add thumb_handler.php ProxyPass rule to all vhosts" [puppet] - https://gerrit.wikimedia.org/r/226964
[10:15:42] so via this it does work, https://commons.wikimedia.org/w/thumb_handler.php/1/16/Vitale_da_bologna%2C_resti_di_affreschi_in_san_martino%2C_con_abramo%2C_apostoli_e_dannati%2C_01.JPG/650px-Vitale_da_bologna%2C_resti_di_affreschi_in_san_martino%2C_con_abramo%2C_apostoli_e_dannati%2C_01.JPG
[10:16:56] paravoid: thoughts? ^ still trying to get some more confirmation
[10:17:59] the URL prefix indeed has to match up or thumb_handler.php cannot figure the relative file path part, so the wrong docroot issue is valid cause
[10:18:35] what relative file path part?
[10:18:36] the option is (a) revert to a known OK state or (b) try a fix and play with HHVM on the weekend
[10:18:59] thumb_handler.php/
[10:19:10] how would it know the docroot, though
[10:19:20] I was reading WebRequest::getPathInfo
[10:19:32] it just seems to parse REQUEST_URI
[10:20:02] and uses SCRIPT_NAME too, but it's unset in this case I think
[10:22:52] anyway
[10:22:58] let's just revert for now and fallback to zend
[10:23:01] and figure it out later
[10:23:15] AaronSchulz: your patch doesn't revert the wikimedia.org part, I'll amend
[10:25:15] (PS2) Faidon Liambotis: Revert "Add ProxyPass rule for thumb_handler.php" [puppet] - https://gerrit.wikimedia.org/r/226964 (owner: Aaron Schulz)
[10:25:48] (CR) Faidon Liambotis: [C: 2 V: 2] Revert "Add ProxyPass rule for thumb_handler.php" [puppet] - https://gerrit.wikimedia.org/r/226964 (owner: Aaron Schulz)
[10:27:09] * paravoid wonders what exactly did we smoketest before switching 100% of our load to HHVM imagescalers
[10:27:29] makes sense
[10:27:43] (the amendment)
[10:28:10] force-running puppet across all imagescalers
[10:28:40] I wonder if some monitoring of https://en.wikipedia.org/wiki/Special:ListFiles would be useful (looking for non-200s)
[10:28:52] https://en.wikipedia.org/wiki/Special:ListFiles looks good now
[10:28:57] yeah, that would probably make sense
[10:29:07] want to file it as a phab task?
[10:29:13] though that involves following a bunch of links...not sure if we do stuff like that already
[10:29:15] sure
[10:29:25] we don't internally, but catchpoint does that
[10:29:40] they load a full page with Chrome
[10:30:10] I can imagine ungeneratable thumbs making this check too noisy, though
[10:30:17] it happens too often I think :/
[10:31:25] operations: Monitor https://en.wikipedia.org/wiki/Special:ListFiles for non 200 HTTP statuses in thumbnails - https://phabricator.wikimedia.org/T106937#1482087 (aaron) NEW
[10:31:42] operations, MediaWiki-File-management, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482096 (Edgars2007) Seems to be fixed. Thanks!
[10:31:53] feels like we could be exposing what thumb_handler does via metrics too
[10:33:02] OK, thanks guys! I should check out soon.
[10:33:09] AaronSchulz: thanks!
[10:33:14] AaronSchulz: np, thanks!
[10:33:22] for the troubleshooting, the call, the correct guess, the patchset etc. :)
[10:33:42] operations, MediaWiki-File-management, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482098 (faidon) Open>Resolved a:faidon Aaron's guess was correct. Both 9d45102200e7196ba4933103a3fa1144509515a9 a...
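Editorial sketch: the monitoring idea raised in the conversation (checking the thumbnails on Special:ListFiles for non-200 responses) might look roughly like the Python below. This is only an illustration under stated assumptions, not how T106937 was or would be implemented. The names `ThumbCollector` and `failing_thumbnails`, the "/thumb/" substring heuristic for spotting thumbnail URLs, and the injected `fetch_status` callable are all hypothetical. Injecting the status lookup keeps the sketch free of real network calls.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ThumbCollector(HTMLParser):
    """Collect <img> sources that look like thumbnail URLs (hypothetical heuristic)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Assumption: thumbnail URLs contain a "/thumb/" path segment.
        if src and "/thumb/" in src:
            # urljoin resolves protocol-relative ("//host/...") and relative URLs.
            self.urls.append(urljoin(self.base_url, src))


def failing_thumbnails(page_html, base_url, fetch_status):
    """Return (url, status) pairs for thumbnails whose status is not 200.

    fetch_status is any callable mapping a URL to an HTTP status code;
    a real check would issue a request here.
    """
    collector = ThumbCollector(base_url)
    collector.feed(page_html)
    failures = []
    for url in collector.urls:
        status = fetch_status(url)
        if status != 200:
            failures.append((url, status))
    return failures
```

As noted in the channel, thumbnails that legitimately cannot be generated could make a check like this noisy, so a real alert would probably need a threshold rather than firing on any single failure.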
[10:38:40] I'm afk, call if I can help
[10:39:06] operations, Multimedia, Wikimedia-Media-storage: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails - https://phabricator.wikimedia.org/T106937#1482104 (Peachey88)
[11:14:58] operations, Multimedia, Wikimedia-Media-storage, Monitoring: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails - https://phabricator.wikimedia.org/T106937#1482116 (Krenair)
[11:18:55] (PS1) Alex Monk: NewUserMessageOnAutoCreate = true for gomwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/226966 (https://phabricator.wikimedia.org/T106169)
[11:25:24] (PS1) Alex Monk: Enable GuidedTour on knwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/226967 (https://phabricator.wikimedia.org/T103659)
[11:39:59] (PS1) Alex Monk: Fix site name and meta namespace of zh_min_nanwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/226968 (https://phabricator.wikimedia.org/T106639)
[11:52:27] (PS1) Alex Monk: Localise suwikiquote logo [mediawiki-config] - https://gerrit.wikimedia.org/r/226970 (https://phabricator.wikimedia.org/T106784)
[12:10:02] (PS1) Alex Monk: Replace deprecated wgConf->localVHosts with wgLocalVirtualHosts [mediawiki-config] - https://gerrit.wikimedia.org/r/226971 (https://phabricator.wikimedia.org/T106206)
[12:21:04] operations, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482217 (Nemo_bis)
[12:24:26] operations, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1482218 (Nemo_bis) > BUT no such phrase as ForeignAPIRepo is included in it. How to get it? Have you tried loading and [[h...
[12:26:15] <_joe__> paravoid: traffic is not 100% on hhvm afaik
[12:31:35] <_joe__> Also, I strongly suspect ProxyPassMatch would have worked instead
[13:42:42] !db1035 restarted, temporarily increasing db error rates on s3
[13:42:46] !log db1035 restarted, temporarily increasing db error rates on s3
[13:42:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:43:02] will depool it
[13:46:35] db1044 is handling the load nicely, only <1 min of problems
[13:54:35] (PS1) Jcrespo: Depool db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/226976
[13:55:22] (CR) Jcrespo: [C: 2] Depool db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/226976 (owner: Jcrespo)
[13:56:51] !log jynus Synchronized wmf-config/db-eqiad.php: Depool db1035 (duration: 00m 12s)
[13:56:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:57:38] !log jynus Synchronized wmf-config/db-eqiad.php: Depool db1035 (duration: 00m 12s)
[14:30:03] (PS1) Jcrespo: Repool db1035 with lower weight [mediawiki-config] - https://gerrit.wikimedia.org/r/226977
[14:30:54] (CR) Jcrespo: [C: 2] Repool db1035 with lower weight [mediawiki-config] - https://gerrit.wikimedia.org/r/226977 (owner: Jcrespo)
[14:33:08] !log jynus Synchronized wmf-config/db-eqiad.php: Repool db1035 with lower weight (duration: 00m 13s)
[14:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:03:10] (PS1) BryanDavis: [WIP] Update configuration for logstash 1.5.3 [puppet] - https://gerrit.wikimedia.org/r/226991
[16:16:02] PROBLEM - Restbase endpoints health on restbase1008 is CRITICAL: /page/graph/png/{title}/{revision}/{graph_id} is CRITICAL: Test test for /page/graph/png/{title}/{revision}/{graph_id} returned the unexpected status 404 (expecting: 200)
[16:18:08] operations, Math, Patch-For-Review: Convert Math to use extension registration - https://phabricator.wikimedia.org/T87941#1482515 (Physikerwelt)
[16:30:02] <_joe_> !log repooling mw1159,mw1160
[16:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:30:11] <_joe_> wtf
[16:30:31] <_joe_> paravoid: I stand corrected, they were depooled, not by me though.
[16:30:40] <_joe_> (and they needed repooling obviously)
[16:45:56] operations, Math, Patch-For-Review: Convert Math to use extension registration - https://phabricator.wikimedia.org/T87941#1482579 (Paladox) @Physikerwelt you can do it the normal way as you would in php. You go to mw-config/ on your website then you follow steps once you get to upgrade database or inst...
[16:47:47] (PS1) Jcrespo: Repool db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/226996
[16:49:00] (CR) Jcrespo: [C: 2] Repool db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/226996 (owner: Jcrespo)
[16:53:32] !log jynus Synchronized wmf-config/db-eqiad.php: Repool db1035 at 100% capacity (duration: 00m 40s)
[16:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:54:02] operations, Patch-For-Review: Requesting access for joal to resources [stat1001, stat1002, stat1003, bast1001.wikimedia.org, Hadoop cluster, eventlogging1001, hafnium] - New key after laptop stolen - https://phabricator.wikimedia.org/T106812#1482583 (JohnLewis) Open>Resolved
[16:54:34] operations, Patch-For-Review: Requesting access for joal to resources [stat1001, stat1002, stat1003, bast1001.wikimedia.org, Hadoop cluster, eventlogging1001, hafnium] - New key after laptop stolen - https://phabricator.wikimedia.org/T106812#1478952 (JohnLewis) a:Ottomata>JohnLewis
[16:56:26] operations, Math, Patch-For-Review: Convert Math to use extension registration - https://phabricator.wikimedia.org/T87941#1482588 (Physikerwelt) ...mh it seems that there are several problems with the web updater... I chose an unsupported language and got Class undefined: DT_LanguageEn
[16:58:44] operations, Math, Patch-For-Review: Convert Math to use extension registration - https://phabricator.wikimedia.org/T87941#1482593 (Paladox) Hum I am not sure if that is a problem with extension registration. Does it work with supported languages?
[17:12:57] operations, Community-Advocacy, Traffic, HTTPS, Patch-For-Review: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1482621 (Reedy) >>! In T102814#1481903, @Jalexander wrote: > We told them we'd remove it soon :) specifically no time frame. I would wa...
[17:21:26] Anyone around crazy enough to look for an error in the logs for me? I think it might be a PDF upload bug that we've seen before, but I'm not sure yet what the error is
[17:22:10] Sounds fun
[17:36:54] (PS1) Giuseppe Lavagetto: mediawiki: catch thumb_handler.php to HHVM as well [puppet] - https://gerrit.wikimedia.org/r/227000
[17:37:54] <_joe_> AaronSchulz, paravoid, godog: ^^ should do the right thing, but it might be improved if we want (and include it only on the scalers)
[17:41:58] <_joe_> (btw, the first patch by ori was correct too, AFAICS, the second one was simply broken by a path that didn't match the docroot)
[18:02:33] operations, Multimedia, Wikimedia-Media-storage: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482670 (Joe) A bit of post-mortem: while 9d45102200e7196ba4933103a3fa1144509515a9 was correct, 3bfb0b5d4e874ba66fb857b36feac173918fc458 wasn't, as the...
[18:21:23] PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:23:03] PROBLEM - Apache HTTP on mw1160 is CRITICAL - Socket timeout after 10 seconds
[18:24:03] PROBLEM - Apache HTTP on mw1159 is CRITICAL - Socket timeout after 10 seconds
[18:24:54] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.065 second response time
[18:25:22] RECOVERY - RAID on terbium is OK optimal, 1 logical, 2 physical
[18:25:52] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.064 second response time
[18:26:06] (PS2) BryanDavis: [WIP] Update configuration for logstash 1.5.3 [puppet] - https://gerrit.wikimedia.org/r/226991
[18:53:51] chasemp: around?
[19:07:01] (CR) Ori.livneh: "Simply removing the '$' anchor from the first pattern seems like a better idea" [puppet] - https://gerrit.wikimedia.org/r/227000 (owner: Giuseppe Lavagetto)
[19:07:43] _joe_: 18:30 ori: Depooled Precise image scalers (mw1159 and mw1160)
[19:07:46] i !logged it
[19:21:23] PROBLEM - configured eth on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
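Editorial sketch: the point about the `$` anchor can be checked with a few lines of Python. The anchored pattern is the one quoted in the channel; the unanchored variant is only a guess at what "removing the '$' anchor" would mean, and the helper name `matches` is hypothetical. Apache evaluates such patterns with PCRE rather than Python's `re`, so this assumes the two engines agree for a pattern this simple.

```python
import re

# Pattern quoted in the log: the trailing '$' requires the path to END in .php or .hh.
ANCHORED = re.compile(r"^/w/(.*\.(php|hh))$")

# Hypothetical variant with the '$' removed: the path only has to CONTAIN .php or .hh.
UNANCHORED = re.compile(r"^/w/(.*\.(php|hh))")

# A plain script request and a thumb_handler.php request taken from the log.
PLAIN = "/w/index.php"
THUMB = "/w/thumb_handler.php/3/38/Cajsa_Warg.djvu/page354-7541px-Cajsa_Warg.djvu.jpg"


def matches(pattern, path):
    """Return True if the compiled pattern matches the request path."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for name, pattern in (("anchored", ANCHORED), ("unanchored", UNANCHORED)):
        for path in (PLAIN, THUMB):
            print(name, path, matches(pattern, path))
```

The thumb_handler.php path ends in `.jpg`, so the anchored pattern skips it, which is consistent with the observation that those requests fell through to zend. The unanchored variant is looser and would also accept paths such as `/w/foo.phpx`, so a production rule might want a stricter boundary after the extension.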
[19:25:13] RECOVERY - configured eth on terbium is OK - interfaces up
[19:27:53] <_joe_> ori: yeah i've seen it, I just wasn't aware
[19:29:40] <_joe_> ori: i was also on my phone with a crappy connection
[19:30:30] ah ok, i thought you thought i was sneaking around :P
[19:31:05] <_joe_> nah
[19:32:16] <_joe_> I got my first phonecall on a saturday the one day I didn't have my laptop with me
[19:32:54] <_joe_> I brilliantly made my SO feel guilty of not letting me take it with me
[19:46:15] (PS3) BryanDavis: [WIP] Update configuration for logstash 1.5.3 [puppet] - https://gerrit.wikimedia.org/r/226991
[19:49:55] operations, Multimedia, Wikimedia-Media-storage, user-notice: Thumbnails no longer rendering for recent local uploads - https://phabricator.wikimedia.org/T106895#1482703 (Matanya)
[19:50:41] (PS4) BryanDavis: [WIP] Update configuration for logstash 1.5.3 [puppet] - https://gerrit.wikimedia.org/r/226991 (https://phabricator.wikimedia.org/T99735)
[19:54:15] it looks like the MW API p99 latency went up a lot around 16:48 UTC: http://grafana.wikimedia.org/#/dashboard/db/restbase?panelId=12&fullscreen
[19:54:54] mean didn't change that much
[19:55:48] SAL lists db1035 being re-pooled around that time
[20:25:42] PROBLEM - RAID on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:27:32] RECOVERY - RAID on terbium is OK optimal, 1 logical, 2 physical
[20:42:18] I think it's something else causing it though
[20:51:54] !log rolling restart of restbase instances
[20:52:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:55:42] Hey, action=sitematrix is broken: https://en.wikipedia.org/w/api.php?action=sitematrix&format=jsonfm
[20:55:54] "url": null for all wikipedias
[20:56:14] Krenair: is that related to https://gerrit.wikimedia.org/r/#/c/225287/ ?
[20:57:12] maybe
[20:57:41] * Krenair looks
[21:10:44] operations, Labs, wikitech.wikimedia.org: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1482769 (Ricordisamoa)
[21:26:19] sitic, yeah, I think it'd be triggered by that
[21:40:23] sitic, are you Sitic on phabricator?
[21:40:31] Krenair: yes
[21:43:51] filed a task
[21:43:54] Krenair: thanks :-)
[22:27:34] PROBLEM - puppet last run on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:29:23] RECOVERY - puppet last run on terbium is OK Puppet is currently enabled, last run 27 minutes ago with 0 failures
[23:41:42] PROBLEM - BGP status on cr2-ulsfo is CRITICAL No response from remote host 198.35.26.193
[23:43:46] Ops-Access-Requests, operations, Discovery, Maps, Discovery-Maps-Sprint: Grant sudo on map-tests200* for maps team - https://phabricator.wikimedia.org/T106637#1482907 (Yurik) @jcrespo, thanks for getting to the bottom of that issue! And yes, logstash should be used more (I will be adding more m...
[23:45:33] PROBLEM - BGP status on cr2-ulsfo is CRITICAL No response from remote host 198.35.26.193