[01:03:02] PROBLEM - Check health of redis instance on 6381 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6381
[01:04:02] RECOVERY - Check health of redis instance on 6381 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 8383451 keys, up 3 minutes 51 seconds - replication_delay is 0
[02:09:11] Noob to coding
[02:09:42] Want to understand it better and learn more
[03:33:02] PROBLEM - puppet last run on mw2222 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:33:03] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[04:01:02] RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[04:01:12] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[04:10:22] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=8215.80 Read Requests/Sec=6853.50 Write Requests/Sec=11.00 KBytes Read/Sec=29114.40 KBytes_Written/Sec=4768.80
[04:16:22] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=3.40 Read Requests/Sec=0.00 Write Requests/Sec=4.60 KBytes Read/Sec=0.00 KBytes_Written/Sec=41.20
[06:41:52] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[07:10:03] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:45:02] PROBLEM - nova-compute process on labvirt1011 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/nova-compute
[07:46:02] RECOVERY - nova-compute process on labvirt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute
[09:00:53] !log Executing 'sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete' on graphite1001 to free some space (/var/lib/carbon filling up) - T1075
[09:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:04] T1075: Something puts many different metrics into graphite, allocating a lot of disk space - https://phabricator.wikimedia.org/T1075
[09:13:22] PROBLEM - HHVM rendering on mw2125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:14:12] RECOVERY - HHVM rendering on mw2125 is OK: HTTP OK: HTTP/1.1 200 OK - 75172 bytes in 0.294 second response time
[09:34:42] this one was due to /usr/local/bin/restart-hhvm, probably also the other ones..
[10:07:22] marostegui: good (morning?) day
[10:07:29] gotta big rename
[11:17:57] (PS1) Urbanecm: Add maiwikimedia to DNS [dns] - https://gerrit.wikimedia.org/r/361295 (https://phabricator.wikimedia.org/T168782)
[11:19:32] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[11:19:42] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[11:20:46] (PS1) Urbanecm: Add maiwikimedia to Apache conf [puppet] - https://gerrit.wikimedia.org/r/361296 (https://phabricator.wikimedia.org/T168782)
[11:25:32] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:26:42] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:34:08] (PS1) Urbanecm: Initial configuration for maiwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782)
[11:35:13] (CR) jerkins-bot: [V: -1] Initial configuration for maiwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: Urbanecm)
[11:36:28] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: Urbanecm)
[11:37:24] (CR) jerkins-bot: [V: -1] Initial configuration for maiwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: Urbanecm)
[11:39:03] (CR) Urbanecm: "What is wrong with this patch?" [mediawiki-config] - https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: Urbanecm)
[11:48:59] Urbanecm: trailing whitespace ^^
[11:49:06] wikiversions.json
[11:49:08] and
[11:49:34] InitialiseSettings.php
[11:49:39] marked in red in Gerrit
[11:49:48] (which I still cannot access ;) )
[12:01:22] wikiversions.json is autogenerated
[12:41:04] So if there are whitespace issues, assert it comes from a manual update, e.g. my wiki add.
[12:41:25] if not, we need to amend the tooling script
[12:46:02] Dereckson: https://gerrit.wikimedia.org/r/#/c/361297/1/wikiversions.json
[12:46:14] is that file really autogenerated?
[12:46:30] didn't know
[12:46:49] I always added an entry there manually
[12:47:28] anyway, lunch time, see ya later
[13:01:02] hi thcipriani. are you around?
[13:04:42] hi dear Ops folks. Is someone around who can help me with a, I think, Ops related question?
[13:06:36] lzia: I can try but will not be around for a lot of time :)
[13:06:46] great, thanks elukey.
[13:06:57] elukey: thcipriani helped us with https://phabricator.wikimedia.org/T131949#3371909 the other day.
[13:08:14] elukey: the change was going to update the interface of quicksurvey (part of the interface, two keys: Visit survey and No thanks) so that the users of certain languages see those keys in their language.
[13:09:01] We received a notice from rowiki that they still see the two buttons in English, and I tested it now for Romanian and Arabic and it's indeed the case. My question is: how can we fix this?
[13:09:22] this is kind of urgent, as it may result in drop-outs from the surveys.
[13:09:30] elukey: ^
[13:11:52] (reading)
[13:14:54] lzia: I am afraid I can't really help much, I'm a bit ignorant about the extension mentioned in https://gerrit.wikimedia.org/r/#/c/360889
[13:15:08] do we have any positive confirmation that it works for some users?
[13:15:18] no, elukey.
[13:15:51] Thing is, not sure how mediawiki/extensions/QuickSurveys works :(
[13:16:15] who do you think can help, elukey? :)
[13:26:23] hey schana. are you around?
[13:28:30] elukey: lzia: repeating my post from last night:
[13:28:30] We (research) are currently running reader surveys that had some translations deployed during Thursday's SWAT: https://gerrit.wikimedia.org/r/#/c/360889/
[13:28:30] Everything looked good after the deploy; however, the translation changes are no longer in place.
[13:28:30] test (survey buttons should not be in English): https://ro.wikipedia.org/wiki/Special:Random?quicksurvey=true
[13:28:30] They are correctly translated when requesting from mw2017.codfw.wmnet
[13:28:51] Hi lzia
[13:30:18] ah now we have more info :)
[13:30:33] so after the deploy the translations were there
[13:30:41] Yes
[13:30:55] how do you test it in codfw?
[13:31:12] X-Wikimedia-Debug header
[13:32:41] I just used the extension to force the X-Wikimedia-Debug header, the translations are there
[13:33:06] But not from eqiad
[13:33:16] yeah, weird
[13:38:31] Special:Version shows the same version of the extension on both
[13:44:06] (trying to check but I am a bit ignorant about mediawiki)
[13:50:03] I can see that /srv/mediawiki/php-1.30.0-wmf.5/extensions/QuickSurveys/i18n/ro.json on mwdebug1001 contains the latest changes
[13:53:10] schana,lzia - is the issue present in all the pages? Only a subset of them? etc..
[13:54:16] * lzia checks (though in her experience it has been consistent, elukey)
[13:54:56] but the survey should be in all pages right?
[13:55:17] all namespace 0 except search result pages and I think main page
[13:55:20] elukey: ^
[13:57:36] elukey: in my tests of around 20 pages across the few languages, I see the issue in all pages (again, they're all namespace 0, no main page, no search result page)
[14:06:23] lzia: going to update the task with some info, but I have too little info to give real help, sorry :(
[14:06:40] np, thanks for trying elukey and for the update.
[14:12:00] lzia: https://phabricator.wikimedia.org/T131949#3376854
[14:44:35] lzia: I actually don't see the link to the quicksurvey using https://ro.wikipedia.org/wiki/Special:Random?quicksurvey=true, but I do see that that wiki still has the translated messages deployed Thursday https://ro.wikipedia.org/wiki/MediaWiki:Ext-quicksurveys-external-survey-yes-button and https://ro.wikipedia.org/wiki/MediaWiki:Ext-quicksurveys-external-survey-no-button
[14:46:06] thcipriani: do you have DNT on? If you do, you won't see it.
[14:46:21] ah, I might
[14:46:24] * thcipriani checks
[14:48:02] RECOVERY - MariaDB Slave Lag: s1 on db1047 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[14:50:01] can you see it thcipriani?
[14:50:15] ah, ok, now I see what you're seeing. Hrm. OK, so the message is in the wiki, but not showing up, lemme futz with something in resourceloader and see if that's it
[14:50:30] thanks, thcipriani. :)
[14:50:36] looking at https://wikitech.wikimedia.org/wiki/How_to_deploy_code#ResourceLoader_and_l10n_messages
[14:50:39] for reference
[14:50:43] uhu
[14:59:45] well, I followed that procedure and I'm still seeing the old messages for rowiki :(
[15:00:09] hmm. and did you see elukey's note on the phab task?
[15:00:36] thcipriani: https://phabricator.wikimedia.org/T131949#3376854
[15:02:39] hrm, it works for me when I set the X-Wikimedia-Debug server to any particular header
[15:02:56] er, any particular server rather
[15:04:03] or even https://ro.wikipedia.org/w/index.php?title=Managua&quicksurvey=true&debug=true
[15:06:00] hrm, actually this page does seem to just be working for me now https://ro.wikipedia.org/w/index.php?title=Ozer%C8%9Bi_(Ozer%C8%9Bi),_Volod%C3%AEm%C3%AEre%C8%9B&quicksurvey=true
[15:06:07] tried it in a few different browsers
[15:06:35] maybe there is just some cache after you tell resourceloader to reload the message?
[15:06:36] * lzia checks
[15:06:50] yeah, let me know if rowiki seems working for you
[15:07:05] hmm. I see it correctly with the link you gave as well. let me check cache
[15:08:47] thcipriani: do you see the same thing fixed with Japanese?
[15:08:59] Romanian is fixed for me now, but I still see the same problem with Japanese.
[15:09:14] I haven't tried to clear any other languages
[15:09:21] so let me try that...
[15:10:49] ok
[15:11:02] ok, jawiki cleared, let's see if that wiki is now fixed
[15:11:08] * lzia checks
[15:12:31] so, works with &debug=true i.e. https://ja.wikipedia.org/wiki/Special:Random?quicksurvey=true&debug=true
[15:13:28] it doesn't work without that for me though
[15:13:46] which is what I noticed with rowiki as well then it cleared up in a few minutes (cached somewhere maybe?) I'm just going to try to reload the rl message for all languages we updated and then we'll wait a few to see if they start working
[15:14:10] fantastic. thanks thcipriani.
[15:20:39] thcipriani: the languages that were needed were (ar, hi, hu, ja, nl, ro) - does something need done for all of them separately?
[15:21:18] schana: I got them all, just now
[15:21:27] cool, thanks
[15:21:34] so jawiki seems to be working for me now without debug=true
[15:22:13] ^ lzia can you confirm?
[15:22:22] or schana
[15:22:37] all of them look good to me
[15:22:41] * lzia checks
[15:23:54] fantastic. it works. :)
[15:24:02] thcipriani: thank you so much! :)
[15:24:05] thcipriani: was there something we should have done differently when updating?
[15:24:30] schana: I'm not sure. I'm going to update the task with what I did and ping some folks who know more about resourceloader than me
[15:24:40] okay, thanks thcipriani
[15:24:53] * thcipriani does that
[15:36:35] ok, task updated, new task filed for scap, translations look ok, so I'm going to escape from my computer :)
[15:47:36] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3376987 (MF-Warburg) Ok. Please answer my question in return.
[16:52:35] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3377074 (Aklapper) MF-Warburg: Errm, I cannot answer questions on topics that I have no knowledge of.
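Note on the X-Wikimedia-Debug test from the QuickSurveys discussion above: the check "request the page from mw2017.codfw.wmnet" can be sketched as a scripted request. This is only an illustration, not what anyone in the channel ran (they used a browser extension). The helper name `build_debug_request` is hypothetical, and the `backend=<host>` header value is an assumption about the header syntax; the URL and host name are the ones quoted in the log.

```python
from urllib.request import Request

def build_debug_request(url: str, backend: str) -> Request:
    """Build a GET request pinned to one backend via X-Wikimedia-Debug.

    Assumption: the header value has the form "backend=<hostname>".
    """
    return Request(url, headers={"X-Wikimedia-Debug": f"backend={backend}"})

req = build_debug_request(
    "https://ro.wikipedia.org/wiki/Special:Random?quicksurvey=true",
    "mw2017.codfw.wmnet",
)

# To actually send it (needs network access):
# from urllib.request import urlopen
# html = urlopen(req).read().decode("utf-8")
```

Comparing the response with and without the header is one way to tell whether a stale message comes from a cache or from the application server itself.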
[21:10:59] (Draft1) Paladox: Icinga: Fix check_ram script to work on stretch [puppet] - https://gerrit.wikimedia.org/r/361361
[21:11:02] (PS2) Paladox: Nrpe: Fix check_ram script to work on stretch [puppet] - https://gerrit.wikimedia.org/r/361361
[21:28:55] (PS3) Paladox: Nrpe: Fix check_ram script to work on stretch [puppet] - https://gerrit.wikimedia.org/r/361361
[21:35:02] (CR) Paladox: "Here's a stretch host:" [puppet] - https://gerrit.wikimedia.org/r/361361 (owner: Paladox)
[21:37:07] (CR) Paladox: "See https://askubuntu.com/questions/770108/what-do-the-changes-in-free-output-from-14-04-to-16-04-mean and https://gitlab.com/procps-ng/pr" [puppet] - https://gerrit.wikimedia.org/r/361361 (owner: Paladox)
[21:49:32] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2055191
[22:14:28] Operations, DC-Ops: Lots of hosts with hyperthreading disabled - https://phabricator.wikimedia.org/T156140#2965447 (jcrespo) Relevant: https://lists.debian.org/debian-devel/2017/06/msg00308.html
[22:29:32] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 164
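Note on the check_ram patch above: the linked askubuntu question is about newer procps-ng releases of `free` dropping the "-/+ buffers/cache:" row and adding an "available" column, which breaks scripts that parse the old layout. The following is a minimal sketch of a parser that tolerates both layouts. It is not the content of the Gerrit change; the function name `available_memory` and both sample outputs are made up for illustration.

```python
OLD_SAMPLE = """
             total       used       free     shared    buffers     cached
Mem:          7983       7500        483          0        200       5000
-/+ buffers/cache:       2300       5683
Swap:         1023          0       1023
"""

NEW_SAMPLE = """
              total        used        free      shared  buff/cache   available
Mem:           7983        2300         483           0        5200        5400
Swap:          1023           0        1023
"""

def available_memory(free_output: str) -> int:
    """Return memory usable by applications, in the units `free` printed."""
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    if "available" in header:
        # Newer layout: read the "available" column of the "Mem:" row.
        # The row has one extra leading field ("Mem:"), hence the +1.
        mem_row = next(ln.split() for ln in lines if ln.startswith("Mem:"))
        return int(mem_row[header.index("available") + 1])
    for ln in lines:
        # Older layout: last field of "-/+ buffers/cache:" is free + buffers + cached.
        if ln.startswith("-/+ buffers/cache:"):
            return int(ln.split()[-1])
    raise ValueError("unrecognised `free` output")

print(available_memory(OLD_SAMPLE), available_memory(NEW_SAMPLE))
```

Looking the column up by header name instead of by fixed position keeps the check working if further columns are added later.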