[00:16:18] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 68.97% of data above the critical threshold [5000000.0]
[00:18:27] (PS2) Dzahn: resolving::domain_search: drop esams.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/280503 (https://phabricator.wikimedia.org/T123712)
[00:20:11] (PS1) Krinkle: coal: No longer log values for 'domLoading' metric from navtiming [puppet] - https://gerrit.wikimedia.org/r/281066 (https://phabricator.wikimedia.org/T131565)
[00:41:59] (PS1) Paladox: Set differential.always-allow-close to true for phabricator [puppet] - https://gerrit.wikimedia.org/r/281069
[00:42:32] (PS1) Luke081515: Add 'editextendedsemiprotected' protection level on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109)
[00:45:42] (CR) Luke081515: [C: -1] "Currently blocked, please don't deploy:" [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109) (owner: Luke081515)
[00:46:20] (PS2) Dereckson: Phabricator: allow liberal close in Differential [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[00:46:38] (CR) Paladox: "Thanks" [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[00:46:46] (PS3) Dereckson: Phabricator: allow liberal close in Differential [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[00:47:00] (CR) Dereckson: [C: 1] Phabricator: allow liberal close in Differential [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[00:49:32] (PS1) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071
[00:51:07] (PS2) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071
[00:52:56] (CR) Dereckson: "Use case: config repository where a lot of changes are self reviewed or self merged after a discussion outside the tracker." [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[00:54:17] (PS3) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071
[01:02:02] (CR) 20after4: [C: -1] "I don't think this is necessary - when you land the revision phabricator will close it automatically, regardless of who commits it." [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[01:03:26] (CR) 20after4: [C: -1] "I don't think we want this. Anyone who can run `arc land` can already bypass differential review so there is no need to self-accept, it's " [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[01:04:33] !log krinkle@tin Synchronized php-1.27.0-wmf.19/includes/specials/SpecialRedirect.php: T131328 (duration: 00m 39s)
[01:04:34] T131328: Special:Redirect should not emit 404 - https://phabricator.wikimedia.org/T131328
[01:04:34] (CR) 20after4: "also these settings are overridden by a phabricator admin in the web interface so there isn't any point putting them in puppet, other than" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[01:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:06:31] (CR) Dereckson: "With the setting to true, yes." [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[01:09:17] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[01:09:23] (CR) Dereckson: "Hmmm... just a note about the last remark: config must ideally be reproducible and to depend of UI config isn't a part of a sensible recov" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[01:23:55] (CR) Ori.livneh: [C: 2] coal: No longer log values for 'domLoading' metric from navtiming [puppet] - https://gerrit.wikimedia.org/r/281066 (https://phabricator.wikimedia.org/T131565) (owner: Krinkle)
[01:25:21] (CR) Dereckson: [C: -1] "Logic looks good for me." (5 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109) (owner: Luke081515)
[02:03:32] !log l10nupdate@tin LocalisationUpdate failed (1.27.0-wmf.18) at 2016-04-02 02:03:32+00:00
[02:03:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:24:07] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 10m 48s)
[02:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:32:42] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Apr 2 02:32:42 UTC 2016 (duration 8m 36s)
[02:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:27:11] to #ops, getting intermittent error like Error deleting file: Could not acquire lock for "mwstore://local-multiwrite/local-deleted/e/6/7/e675vkye2pmx13vjxp1dnv9xp95ic0i.jpg". when trying to delete images at commons
[06:01:40] Operations: uwsgi takes a long time to restart (Debian Jessie in labs) - https://phabricator.wikimedia.org/T118495#2171310 (Halfak)
[06:20:09] Operations: uwsgi takes a long time to restart (Debian Jessie in labs) - https://phabricator.wikimedia.org/T118495#2171313 (Halfak) A uwsgi start takes less than one second. The majority of the waiting seems to happen when stopping the last uwsgi. I ran these commands on our staging server while no traffic...
[06:20:50] Operations: uwsgi takes a long time to restart (Debian Jessie in labs) - https://phabricator.wikimedia.org/T118495#2171315 (Halfak)
[06:22:36] <_joe_> halfak: the reason uwsgi takes time to stop might be it tries to serve all in-flight requests first
[06:22:49] _joe_, no inflight requests to stop
[06:22:51] <_joe_> I kind of remember there is a config switch
[06:23:11] Takes exactly 1:30 on two entirely different staging environments.
[06:23:15] (PS2) Jforrester: Apply rate limit to edits for normal users [mediawiki-config] - https://gerrit.wikimedia.org/r/280002
[06:23:17] Might be related to python 3 uwsgi?
[06:23:38] <_joe_> halfak: no idea, this is a case where strace(1) debugging is probably needed
[06:23:58] * halfak doesn't know what he's doing with strace
[06:23:58] (CR) Jforrester: Apply rate limit to edits for normal users (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/280002 (owner: Jforrester)
[06:24:04] do you have an example I could work from?
[06:25:08] <_joe_> halfak: you need an opsen to take a look :)
[06:25:41] <_joe_> halfak: strace -p is how it works, but you'll just see a bunch of gibberish, basically
[06:26:31] <_joe_> I'll subscribe the ticket, and take a look on monday if I remember :)
[06:27:49] Thanks _joe_ :)
[06:30:37] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:58] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:27] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:36] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:56] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:56] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:33:17] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:55:36] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:55:58] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:56:07] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:56:18] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:56:57] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:57:26] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:57:56] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:22:01] !log krinkle@tin Synchronized php-1.27.0-wmf.19/extensions/NavigationTiming/modules/ext.navigationTiming.js: T131565 (duration: 00m 33s)
[07:22:02] T131565: Remove "domLoading" metric from Navigation Timing - https://phabricator.wikimedia.org/T131565
[07:22:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:31:24] (PS1) Krinkle: webperf: Rename navtiming 'loading' and 'sending' to standard equivalent [puppet] - https://gerrit.wikimedia.org/r/281082
[07:32:11] (CR) Krinkle: "Requires renaming on graphite1001/graphite2001 to ensure historical data is preserved." [puppet] - https://gerrit.wikimedia.org/r/281082 (owner: Krinkle)
[07:39:20] (Abandoned) Faidon Liambotis: Rename hooft's mgmt to bast3001 too [dns] - https://gerrit.wikimedia.org/r/280641 (owner: Faidon Liambotis)
[09:01:18] Operations, Gerrit, Mail, Upstream: Only receiving few emails from Gerrit - https://phabricator.wikimedia.org/T131189#2172024 (Nemo_bis) Can we please close this contingent bug and split the unspecified expected long-term improvements to a separate task?
[09:35:47] (PS1) Faidon Liambotis: Remove scs-oe11-esams DNS [dns] - https://gerrit.wikimedia.org/r/281116
[09:35:49] (PS1) Faidon Liambotis: Remove puppet/recursor0/recursor1.esams CNAMEs [dns] - https://gerrit.wikimedia.org/r/281117
[09:49:37] PROBLEM - puppet last run on ganeti1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:01:56] !log reedy@tin Synchronized php-1.27.0-wmf.19/extensions/OATHAuth: Fix for 2FA testing (duration: 00m 30s)
[10:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:14:48] RECOVERY - puppet last run on ganeti1002 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[10:31:37] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 65.52% of data above the critical threshold [5000000.0]
[10:36:23] (PS1) Reedy: Revert "Revert labswiki to wmf.18 as 2FA seems to be broken" [mediawiki-config] - https://gerrit.wikimedia.org/r/281122
[10:42:12] (CR) Reedy: [C: 2] Revert "Revert labswiki to wmf.18 as 2FA seems to be broken" [mediawiki-config] - https://gerrit.wikimedia.org/r/281122 (owner: Reedy)
[10:42:39] (Merged) jenkins-bot: Revert "Revert labswiki to wmf.18 as 2FA seems to be broken" [mediawiki-config] - https://gerrit.wikimedia.org/r/281122 (owner: Reedy)
[10:45:22] !log reedy@tin rebuilt wikiversions.php and synchronized wikiversions files: wikitech back to .19
[10:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:00:17] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[11:18:17] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[12:22:23] (CR) Paladox: "@20after4 this is the same as gerrit. Users on gerrit can already self submit. Plus doing arc land will by pass any testing so someone cou" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[12:27:18] (CR) Paladox: "@20after4 but it doesn't allow users reviewing and landing other users changes. That's why setting it to true allows other users too." [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[12:28:55] (CR) Paladox: "@20after4 but if you can already arc land then there is no difference in uploading the diff and running the jenkins tests to make sure the" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[12:35:10] (CR) Paladox: "Also per the description for the config" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[12:37:02] (CR) Paladox: "Which will allow users to ask for someone to review the code but allow themselves to merge. Since not everyone will have land rights. I up" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[12:38:04] (CR) Paladox: "Per description for the config" [puppet] - https://gerrit.wikimedia.org/r/281069 (owner: Paladox)
[13:08:00] (PS1) Ladsgroup: Use die-on-term on ores uwsgi [puppet] - https://gerrit.wikimedia.org/r/281161 (https://phabricator.wikimedia.org/T131572)
[13:15:16] Operations, Patch-For-Review: uwsgi takes a long time to restart (Debian Jessie in labs) - https://phabricator.wikimedia.org/T118495#1801807 (Ladsgroup) I checked logs and it seems uwsgi service can't shut down with SIGTERM (uwsgi in restart sends SIGHUP to workers and then SIGTERM to the main process) wo...
[13:24:41] Operations, Patch-For-Review: uwsgi takes a long time to restart (Debian Jessie in labs) - https://phabricator.wikimedia.org/T118495#2172524 (Ladsgroup) Also [[http://uwsgi-docs.readthedocs.org/en/latest/articles/TheArtOfGracefulReloading.html|this article]] is a very good reading. I think we should imple...
[13:33:51] (PS29) Ladsgroup: Scap3 deployment configurations for ores [puppet] - https://gerrit.wikimedia.org/r/280403
[13:45:58] I would really appreciate if ops people merge this patch: https://gerrit.wikimedia.org/r/281161
[13:46:04] very simple
[13:46:07] :)
[13:57:53] Amir1: ores hosts are Debian or Ubuntu?
[13:58:11] Dereckson: debian, jessie 8.3
[13:58:45] http://uwsgi-docs.readthedocs.org/en/latest/Upstart.html#what-is-die-on-term led me to think this concerns Ubuntu, not systemd.
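Editorial sketch, not part of the log: the exchange above is about whether the uwsgi master exits when it receives SIGTERM. uWSGI's documented default is to treat SIGTERM as a reload request, and `die-on-term` makes it exit instead. The Python below does not run uWSGI. It is a hypothetical stand-in (the names `CHILD` and `stop_time` are invented here) that contrasts a process that exits on SIGTERM with one that ignores it and has to be killed after a grace period. The 1:30 wait reported earlier would be consistent with a supervisor's stop timeout (systemd's default is 90 seconds), but the log does not confirm that this is the cause.

```python
import signal
import subprocess
import sys
import time

# Child program: "die" mode exits on SIGTERM, any other mode ignores it.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    sys.exit(0)

if sys.argv[1] == "die":
    signal.signal(signal.SIGTERM, on_term)
else:
    signal.signal(signal.SIGTERM, signal.SIG_IGN)

print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_time(mode, grace=2.0):
    """Send SIGTERM to a child; return (seconds until it stopped, whether SIGKILL was needed)."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD, mode], stdout=subprocess.PIPE
    ) as proc:
        proc.stdout.readline()  # wait until the signal handler is installed
        start = time.monotonic()
        proc.terminate()  # SIGTERM, as a service manager would send on stop
        try:
            proc.wait(timeout=grace)
            killed = False
        except subprocess.TimeoutExpired:
            proc.kill()  # the grace period expired, escalate to SIGKILL
            proc.wait()
            killed = True
        return time.monotonic() - start, killed


if __name__ == "__main__":
    for mode in ("die", "ignore"):
        elapsed, killed = stop_time(mode, grace=1.0)
        print(f"{mode}: stopped after {elapsed:.2f}s, SIGKILL needed: {killed}")
```

In "ignore" mode the stop takes the whole grace period, which is the shape of the delay described in T118495. In "die" mode the child stops almost immediately.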
[14:01:15] (PS2) Luke081515: Add 'editextendedsemiprotected' protection level on frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109)
[14:01:47] Dereckson: each section is about a different topic
[14:02:03] things are all irrelevant :)
[14:03:55] (CR) Luke081515: Add 'editextendedsemiprotected' protection level on frwiki (4 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109) (owner: Luke081515)
[14:04:27] (CR) Luke081515: "In general this patch is now ready for deploy, the last questions are solved now. My plan is to deploy this at the evening SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109) (owner: Luke081515)
[14:05:59] Luke081515: we're Saturday
[14:06:06] argh
[14:06:21] (PS1) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170
[14:06:24] * Luke081515 has vacation so he is mixing up the days sometimes
[14:06:42] (CR) Luke081515: "Sorry, I mixed up the days, I mean monday." [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109) (owner: Luke081515)
[14:07:38] (CR) jenkins-bot: [V: -1] Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[14:10:05] (CR) Dereckson: [C: 1] "+1 in the extent the setting is correct, and will indeed shutdown the server when it receives a TERM signal." [puppet] - https://gerrit.wikimedia.org/r/281161 (https://phabricator.wikimedia.org/T131572) (owner: Ladsgroup)
[14:13:19] Luke|away: yup, 281070 looks good
[14:13:25] Thanks
[14:13:46] (CR) Dereckson: [C: 1] "Technically correct." [mediawiki-config] - https://gerrit.wikimedia.org/r/281070 (https://phabricator.wikimedia.org/T131109) (owner: Luke081515)
[14:18:25] (PS2) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170
[14:20:13] (CR) jenkins-bot: [V: -1] Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[14:26:19] (PS3) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170
[14:27:46] (CR) jenkins-bot: [V: -1] Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[14:29:47] (PS4) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170
[14:30:47] (CR) jenkins-bot: [V: -1] Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[14:32:58] (PS5) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170
[14:40:07] Amir1: if you want puppet-lint on your local workstation, it's a Ruby Gem, available as a standalone product and not depending of a full Puppet installation
[14:40:38] You can use a package or `gem install puppet-lint`
[14:41:00] It has an autofix mode by the way: puppet-lint --fix
[14:41:22] oh
[14:41:23] thanks
[14:41:25] :)
[14:43:48] (CR) 20after4: "@paladox: Please open a task to discuss this further, however, I don't really think this resolves anything. You don't need to accept a dif" [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[14:51:16] (PS6) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170
[14:52:07] PROBLEM - puppet last run on mw2110 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:55:29] (CR) Ladsgroup: [C: 1] "Works in beta like a charm" [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[15:18:57] RECOVERY - puppet last run on mw2110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:32:27] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [5000000.0]
[15:32:31] (CR) Paladox: "Ok." [puppet] - https://gerrit.wikimedia.org/r/281071 (owner: Paladox)
[15:40:09] Operations, Phabricator: Enable differential.allow-self-accept in phabricator - https://phabricator.wikimedia.org/T131622#2172941 (Paladox)
[15:41:22] (PS4) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622)
[15:41:38] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: puppet fail
[15:41:40] (CR) Paladox: "@20after4 here https://phabricator.wikimedia.org/T131622 please." [puppet] - https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622) (owner: Paladox)
[15:43:07] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[15:47:36] (PS4) Paladox: Phabricator: allow liberal close in Differential [puppet] - https://gerrit.wikimedia.org/r/281069 (https://phabricator.wikimedia.org/T131623)
[15:47:53] (CR) Paladox: "Discussion at https://phabricator.wikimedia.org/T131623" [puppet] - https://gerrit.wikimedia.org/r/281069 (https://phabricator.wikimedia.org/T131623) (owner: Paladox)
[15:49:13] (PS5) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622)
[16:08:18] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:20:27] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 20.69% of data above the critical threshold [100000000.0]
[16:25:23] Amir1: hello
[16:25:35] Amir1: i've just seen https://phabricator.wikimedia.org/T131627
[16:25:48] what do you mean by it getting run on every puppet run?
[16:26:42] my guess is that that is happening because the owner of the dir does not match what is specified in the puppet manifest or something similar
[16:27:19] i.e. in the normal case, the provider is triggered only once on package install
[16:39:24] mobrovac: hey, no. I checked the logs, the reason behind that is timeout of my checks
[16:39:48] I checked permission and owner of everything
[16:40:30] hm, the original problem is that it shouldn't be triggered in the first place
[16:41:39] if the permissions match, then why is it triggered?
[16:41:45] * mobrovac scratches head
[16:42:24] do you want to check?
[16:43:05] mobrovac: if you want, go to deployment-ores-web.eqiad.wmflabs run "puppet agent -v --test --debug"
[16:43:32] kk
[16:44:43] thanks
[16:47:22] Amir1: so, Role::Labs::Ores::Web fires before the scap3 provider and chowns /srv/ores/deploy/config/99-main.yaml to www-data
[16:47:35] Amir1: but scap3 expects it to be owned by deploy-service
[16:47:42] so it triggers the provider
[16:47:59] (the ores role is triggered before the provider)
[16:48:30] I added the patch to make 99-main.yaml today but this puppet runs were happening before that
[16:48:35] the provider, in turn, chowns it back to deploy-service
[16:49:06] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[16:49:08] the file has world-readable perms, so you shouldn't even force the chown
[16:52:56] mobrovac: okay
[16:53:01] let me take a look
[16:54:04] Amir1: just setting the mode to 0664 should do the trick
[16:54:09] (660 is too restrictive)
[16:54:21] and 775 is no bueno either
[16:59:17] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-314534, 29ms) {#11375} [10Gbps wave]
[16:59:36] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/2/1: down - Core: cr1-eqord:xe-0/0/0 (Telia, IC-314534, 24ms) {#10694} [10Gbps wave]
[17:00:44] paravoid: ^ ?
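Editorial sketch, not part of the log: the 16:54 messages compare three file modes for 99-main.yaml without spelling out what each one grants. The Python below decodes them with the standard `stat` module. The helper names `describe` and `world_readable` are invented for illustration. The reading that 0660 is "too restrictive" because it removes read access for users outside the owner and group, and that 0775 is unwanted because it adds execute bits to a YAML file, is an assumption. The log states only the verdicts.

```python
import stat


def describe(mode):
    """Return the ls-style permission string for a regular file with this mode."""
    return stat.filemode(stat.S_IFREG | mode)


def world_readable(mode):
    """True if users outside the owner and group may read the file."""
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    for mode in (0o664, 0o660, 0o775):
        label = "world-readable" if world_readable(mode) else "not world-readable"
        print(f"{mode:04o} {describe(mode)} {label}")
```

With a world-readable mode, either owner discussed above (www-data or deploy-service) can read the file, which is why a forced chown would be unnecessary.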
[17:03:01] thanks
[17:10:53] (CR) EBernhardson: [C: 2] CirrusSearch: Add new rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/280245 (https://phabricator.wikimedia.org/T127896) (owner: DCausse)
[17:11:45] (Merged) jenkins-bot: CirrusSearch: Add new rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/280245 (https://phabricator.wikimedia.org/T127896) (owner: DCausse)
[17:17:05] (PS1) EBernhardson: Revert "CirrusSearch: Add new rescore profiles" [mediawiki-config] - https://gerrit.wikimedia.org/r/281208
[17:17:15] (CR) EBernhardson: [C: 2] Revert "CirrusSearch: Add new rescore profiles" [mediawiki-config] - https://gerrit.wikimedia.org/r/281208 (owner: EBernhardson)
[17:17:42] (Merged) jenkins-bot: Revert "CirrusSearch: Add new rescore profiles" [mediawiki-config] - https://gerrit.wikimedia.org/r/281208 (owner: EBernhardson)
[17:18:00] (PS1) EBernhardson: CirrusSearch: Add new rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/281209
[17:18:55] (PS2) EBernhardson: CirrusSearch: Add new rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/281209 (https://phabricator.wikimedia.org/T127896)
[17:28:56] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 4 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[17:41:24] (PS1) Dereckson: 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605)
[17:57:04] (PS2) Dereckson: 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605)
[17:57:50] (CR) Dereckson: "PS2: sharper version from 1x size" [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605) (owner: Dereckson)
[18:07:21] (PS3) Dereckson: 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605)
[18:14:31] (PS4) Dereckson: 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605)
[18:15:32] (CR) Dereckson: "PS4: reverted to PS2 (PS3 used a white background logo)" [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605) (owner: Dereckson)
[18:34:17] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[18:35:47] PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 100%
[18:39:07] RECOVERY - Host google is UP: PING WARNING - Packet loss = 93%, RTA = 16.83 ms
[18:44:46] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[18:46:01] https://phabricator.wikimedia.org/T109331 :/
[18:49:36] (PS5) Dereckson: 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605)
[18:53:15] (CR) Dereckson: "PS5: Disable HD logos + genuine coherent sharp version for 1x logo." [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605) (owner: Dereckson)
[18:58:46] (CR) Dereckson: [C: 2] 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605) (owner: Dereckson)
[18:59:13] (Merged) jenkins-bot: 350K articles celebration logo on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/281215 (https://phabricator.wikimedia.org/T131605) (owner: Dereckson)
[19:02:37] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[19:06:49] Reedy: could you on Tin do a `git log` in /srv/mediawiki-staging/wmf-config and tell me if it looks good for you? I don't see a trace of the deployment for the previous changes in the server admin log.
[19:07:07] Dereckson: You mean the search stuff?
[19:07:17] aye
[19:07:23] I don't think it was deployed
[19:07:29] I think it was an accidental merge, and revert
[19:08:21] If i do a git diff b412822c HEAD, we have 350K logo + labs to php-1.27.0-wmf.19
[19:08:36] so it looks coherent with the log
[19:09:26] May I deploy the T131605 files so?
[19:09:26] T131605: Set celebration logo on Czech Wikipedia - https://phabricator.wikimedia.org/T131605
[19:09:32] I think so
[19:09:41] I thought I deployed the labs change?
[19:10:52] 10:45 logmsgbot: reedy@tin rebuilt wikiversions.php and synchronized wikiversions files: wikitech back to .19
[19:13:42] Okay, let's go.
[19:14:43] !log dereckson@tin Synchronized static/images/project-logos/cswiki.png: 350K celebration logo for cs.wikipedia (duration: 00m 33s)
[19:14:48] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[19:14:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:16:19] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: 350K celebration logo for cs.wikipedia (T131605) (duration: 00m 29s)
[19:16:20] T131605: Set celebration logo on Czech Wikipedia - https://phabricator.wikimedia.org/T131605
[19:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:18:23] purged, https://en.wikipedia.org/static/images/project-logos/cswiki.png looks good
[19:22:36] PROBLEM - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[19:39:27] (CR) Ladsgroup: [C: -1] "Needs another approach" [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[19:48:07] Operations, Traffic, Wiki-Loves-Monuments-General, HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2173427 (Akoopal) A few weeks ago, I asked Sindy Meijer from the dutch office to take this up with the company that is now doing the admin for Wikimed...
[19:50:44] (PS1) Ladsgroup: ores: do git clone in staging [puppet] - https://gerrit.wikimedia.org/r/281228
[19:59:30] (PS30) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[20:02:50] (Abandoned) Ladsgroup: Puppetize ORES redis configs [puppet] - https://gerrit.wikimedia.org/r/281170 (owner: Ladsgroup)
[20:23:28] Operations, Wikimedia-Language-setup, Wikimedia-Site-Requests, Tracking: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) (tracking) - https://phabricator.wikimedia.org/T10217#2173459 (Kaihsu) p:Normal>High This bug is nearly a...
[20:27:37] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[20:28:06] !log Restarted Jenkins on gallium
[20:28:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:40:16] PROBLEM - puppet last run on mw2192 is CRITICAL: CRITICAL: puppet fail
[20:42:07] PROBLEM - Disk space on restbase2004 is CRITICAL: DISK CRITICAL - free space: /srv 176606 MB (3% inode=99%)
[21:10:08] RECOVERY - puppet last run on mw2192 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:29:12] (PS31) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[21:38:57] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Puppet has 1 failures
[21:43:58] (PS32) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[21:50:11] (PS33) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[22:01:08] (PS34) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[22:03:37] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[22:04:30] mobrovac: I fixed owner and mode issue and it doesn't change it now but still tries to run deploy-local
[22:05:13] (if you want to see the log login to deployment-ores-web.eqiad.wmflabs and do puppet agent -tv)
[22:36:00] oh, It seems that's another issue! the puppet can't install the package in the first place and tries to do every time
[22:36:23] interestingly it fails not because of checks
[22:36:42] because the puppet can't make a proper connect to git repo (tin)
[22:36:58] but the user can!
[22:37:05] that's just fantastic
[23:12:46] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 40 failures