[00:00:04] RoanKattouw ostriches Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160226T0000). [00:00:04] bmansurov yurik mutante: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [00:00:15] here [00:00:26] (03CR) 10jenkins-bot: [V: 04-1] Password policies for advanced permission groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/272660 (https://phabricator.wikimedia.org/T119100) (owner: 10CSteipp) [00:00:29] here [00:02:11] I'll do it [00:02:53] (03PS4) 10CSteipp: Password policies for advanced permission groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/272660 (https://phabricator.wikimedia.org/T119100) [00:07:33] (03CR) 10Catrope: [C: 032] Run the survey at normal rate to test DNT [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273261 (https://phabricator.wikimedia.org/T125946) (owner: 10Bmansurov) [00:08:10] (03Merged) 10jenkins-bot: Run the survey at normal rate to test DNT [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273261 (https://phabricator.wikimedia.org/T125946) (owner: 10Bmansurov) [00:12:48] (03CR) 10CSteipp: Password policies for advanced permission groups (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/272660 (https://phabricator.wikimedia.org/T119100) (owner: 10CSteipp) [00:15:09] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Run reader segmentation survey at 1:500 to test DNT (duration: 01m 21s) [00:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:15:34] bmansurov: ---^^ [00:15:39] checking [00:15:57] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065367 (10Krenair) >>! In T104728#2065314, @jrbs wrote: >>>! 
In T104728#2065282, @Dzahn wrote: >> Looks to me like this ticket is either about editing wiki... [00:16:11] (03CR) 10Catrope: [C: 032] Disabled Graph namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273283 (owner: 10Yurik) [00:16:12] RoanKattouw: still seeing the old data [00:18:25] (03Merged) 10jenkins-bot: Disabled Graph namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273283 (owner: 10Yurik) [00:19:06] RoanKattouw: all good now [00:19:32] bmansurov: OK, lemme know when you're ready for part 2 (lowering the rate) [00:20:05] RoanKattouw: ok I will [00:20:42] yurik: Yours is going out now, in 2 parts [00:20:48] RoanKattouw, cool [00:20:55] thx [00:21:00] (03CR) 10Catrope: [C: 032] Raise file upload limit to 2047MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266544 (https://phabricator.wikimedia.org/T116514) (owner: 10TheDJ) [00:21:26] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add wmgUseGraphWithJsonNamespace (duration: 01m 04s) [00:21:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:21:37] (03Merged) 10jenkins-bot: Raise file upload limit to 2047MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266544 (https://phabricator.wikimedia.org/T116514) (owner: 10TheDJ) [00:22:30] !log catrope@tin Synchronized wmf-config/CommonSettings.php: Add plumbing for wmgUseGraphWithJsonNamespace (duration: 01m 03s) [00:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:26:54] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: puppet fail [00:28:31] 6Operations, 10Mail: Move most (all?) 
exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#2065416 (10bbogaert) [00:29:30] yurik: Done, please verify [00:29:48] * yurik looks [00:30:15] RoanKattouw, seems good, thx [00:31:00] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Raise file upload limit to 2047MB (duration: 01m 02s) [00:31:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:33:00] !log catrope@tin Synchronized php-1.27.0-wmf.14/extensions/MobileFrontend/: SWAT (duration: 01m 05s) [00:33:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:33:36] 6Operations, 6Commons, 10MediaWiki-Uploading, 6Multimedia, and 2 others: Raise max upload limit above 1GB - https://phabricator.wikimedia.org/T76614#2065435 (10Krenair) https://commons.wikimedia.org/wiki/Help:Server-side_upload mentions UW does up to 1GiB - is that now also raised by this change? [00:33:44] 6Operations, 10Traffic, 10Wikipedia-Store, 7HTTPS: shop.wikimedia.org should be HTTPS only - https://phabricator.wikimedia.org/T39790#2065436 (10CCogdill_WMF) I'm not sure if anyone currently at the Store is aware of this task. Adding @Ppena. @Dzahn are there any hard deadlines on this? [00:34:22] RoanKattouw: will you be deploying the middle patch after mutante's patch? [00:36:23] I deployed everything already [00:36:28] Except your back-to-normal patch [00:36:48] When would you like that one deployed? [00:37:22] RoanKattouw: in 8 minutes please [00:38:23] 6Operations, 6Commons, 10MediaWiki-Uploading, 6Multimedia, and 2 others: Raise max upload limit above 1GB - https://phabricator.wikimedia.org/T76614#807206 (10matmarex) Apparently yes. [00:39:29] RoanKattouw: thanks for the merge, i'm here fwiw, because i added that one [00:39:58] bblack: re: "walk it through swat" done :) [00:43:57] 6Operations, 10Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2065469 (10Dzahn) >>! 
In T127549#2065414, @bbogaert wrote: > I received a bounce back. I checked my spelling in LDAP and google groups, and everything looks ok. Could it be something your side? Hi Byron, i'm... [00:44:01] ori, _joe_: so, the jobqueue meeting overlays with the staff meeting :/ [00:46:01] RoanKattouw: now [00:53:16] 6Operations, 10Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2065495 (10Dzahn) @bbogaert i added it back, but i'm not sure why it failed [00:53:25] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [00:58:46] 6Operations, 10Wikimedia-Video, 13Patch-For-Review: 1gb file upload limit is too restrictive for conference presentation videos - https://phabricator.wikimedia.org/T116514#2065497 (10Dzahn) I added that change to SWAT today and it was merged by Roan. This should resolve this ticket. Please confirm. [00:59:37] 6Operations, 6Commons, 10MediaWiki-Uploading, 6Multimedia, and 2 others: Raise max upload limit above 1GB - https://phabricator.wikimedia.org/T76614#2065498 (10Dzahn) Yes, the limit should be 2047MB now, please confirm. [01:02:05] bmansurov: OK, sorry for the delay, will do [01:02:15] (03CR) 10Catrope: [C: 032] Run the survey at lowered rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273169 (https://phabricator.wikimedia.org/T125946) (owner: 10Bmansurov) [01:03:34] (03Merged) 10jenkins-bot: Run the survey at lowered rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273169 (https://phabricator.wikimedia.org/T125946) (owner: 10Bmansurov) [01:04:25] 6Operations, 6Research-and-Data, 10Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2065503 (10Dzahn) 5Open>3Resolved Hi @DarTar , done. The list is disabled ("emergency moderation" message when you login as admin) and archives are still at https://lists.wiki... 
[01:05:46] AaronSchulz: I'll suggest re-scheduling (the jq meeting) [01:05:47] 6Operations, 6Commons, 10MediaWiki-Uploading, 6Multimedia, and 2 others: Raise max upload limit above 1GB - https://phabricator.wikimedia.org/T76614#2065506 (10zhuyifei1999) I'll post a notice to commons Village Pump shortly. [01:06:03] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Lower survey rate again (duration: 01m 05s) [01:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:08:07] 6Operations, 6Research-and-Data, 10Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2065507 (10DarTar) @Dzahn thank you. [01:08:29] (03CR) 10BryanDavis: [C: 04-1] "Putting a hold on rolling this out until we can reasonably measure overhead it introduces." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273376 (https://phabricator.wikimedia.org/T114700) (owner: 10BryanDavis) [01:09:51] RoanKattouw: thanks. I see the change. [01:17:12] (03PS6) 10Dzahn: maps: move roles to modules/role/ [puppet] - 10https://gerrit.wikimedia.org/r/249059 [01:22:21] (03CR) 10Dzahn: "@Alex see http://puppet-compiler.wmflabs.org/1868/ now" [puppet] - 10https://gerrit.wikimedia.org/r/249059 (owner: 10Dzahn) [01:53:59] (03CR) 10Rschen7754: "@yuvipanda: we don't need X11, we just need the inkscape package installed. The bot code we are using does not use the GUI. Of course if X" [puppet] - 10https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: 10Merlijn van Deen) [02:01:11] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065688 (10jrbs) They all look like HTTPS links to me now. I think something must have changed on the blog side. [02:01:25] twentyafterfour: Krenair greg-g did you guys do anything with phabricator within about last 24 hours? [02:01:39] why? what's wrong? 
[02:02:09] Krenair: https://phabricator.wikimedia.org/project/profile/1510/ and bunch of others: [02:02:12] Danny_B set the image for Non-design researcher mentoring to Unknown Object (File). [02:02:18] images disappeared [02:02:42] I don't have the server permissions to mess with phabricator at that level [02:02:59] Danny_B: Nothing happened that I know of [02:03:03] Krenair: sorry for bothering then, i thought you do [02:03:13] no, nor does greg-g [02:03:14] twentyafterfour: so i wonder how they could disappear [02:03:44] Danny_B: I don't know :-/ [02:03:46] i'm nearly sure they have been there yesterday [02:04:02] but the 100% date is of course the one of when i set that [02:04:14] so at least since then something must have happened [02:04:42] Danny_B: was it a custom image or just an "icon and color" [02:04:59] icon + color [02:05:10] hmm even more odd [02:05:13] 22nd is the last date when i was working on it [02:05:25] according to my timeline [02:05:37] hence on that day it was there still [02:05:43] thus last about 3 days [02:05:48] or 4 [02:08:34] looking at my timeline i see that bunch of such images are missing [02:08:43] db corruption? [02:08:54] missing (cached) files? [02:16:16] Danny_B: would you mind filing a task with the details? I've just looked and at least one of the files is indeed missing from the db [02:16:34] I mean, it seems mostly harmless since they are phabricator-generated files [02:16:50] but still, it's a little worrying that file objects are gone from the db [02:17:31] maybe tag the task #phabricator and #dba? 
[02:22:34] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 72, down: 0, dormant: 0, excluded: 0, unused: 0 [02:22:50] will do, but just the observation, please improve it then with tech details [02:23:23] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [02:24:37] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.14) (duration: 10m 34s) [02:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:27:14] (03PS3) 10Andrew Bogott: Added config files for Openstack Liberty [puppet] - 10https://gerrit.wikimedia.org/r/270891 [02:28:58] 6Operations, 10DBA, 10Phabricator: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065709 (10Danny_B) [02:29:09] Danny_B: thanks [02:29:15] yw [02:30:06] next time maybe we should do the upgrade first on labs instance and after some testing on the real instance [02:31:22] clearly this one has undergone no testing whatsoever and was just slinged out without care :) [02:32:19] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Feb 26 02:32:19 UTC 2016 (duration 7m 42s) [02:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:35:11] 6Operations, 10DBA, 10Phabricator: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065723 (10mmodell) The odd thing is that when phabricator generates a profile image using the 'icon + background color' feature, it seems to make a unique image each time rather than reusing some st... 
[02:35:51] 6Operations, 10DBA, 10Phabricator, 10Phabricator-Upstream: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065724 (10mmodell) [02:36:40] 6Operations, 10DBA, 10Phabricator, 10Phabricator-Upstream: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065709 (10mmodell) p:5Triage>3Normal [02:40:03] PROBLEM - puppet last run on wtp2018 is CRITICAL: CRITICAL: Puppet has 1 failures [02:51:03] 6Operations, 10DBA, 10Phabricator, 10Phabricator-Upstream: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065745 (10Danny_B) >>! In T128160#2065723, @mmodell wrote: > The odd thing is that when phabricator generates a profile image using the 'icon + background color' feature, i... [02:53:52] 6Operations, 10DBA, 10Phabricator, 10Phabricator-Upstream: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065747 (10Danny_B) Might there be any cronjob or anything like that which would actually do the maintenance of files? I remember seeing in some projects that although the i... [03:01:04] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065750 (10Krenair) They all show as HTTP links now. [03:06:33] RECOVERY - puppet last run on wtp2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:10:44] 6Operations, 10DBA, 10Phabricator, 10Phabricator-Upstream: Project icon files are missing - https://phabricator.wikimedia.org/T128160#2065753 (10mmodell) @Danny_b: Phabricator prefixes the file's path with a session-based hash which will change over time. This is a protection so that files with a security... 
[03:14:24] PROBLEM - puppet last run on mw2099 is CRITICAL: CRITICAL: puppet fail [03:17:29] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065756 (10jrbs) >>! In T104728#2065750, @Krenair wrote: > They all show as HTTP links now. I have no idea what is going on. It seems to have been caused by... [03:17:51] twentyafterfour: iirc those were changes something like f-flag-orange -> f-profile or sth like that, i'll try to find it [03:18:22] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065757 (10Krenair) yes, it's T104726 [03:19:24] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065758 (10jrbs) Oh, I see. So maybe this should be marked as a duplicate. [03:21:56] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728#2065760 (10Krenair) It's marked as a blocker. And needs someone to make upstream actually do it. [03:22:23] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [03:27:00] hm [03:27:02] https://wikitech.wikimedia.org/w/resources/assets/poweredby_mediawiki_88x31.png [03:27:03] 404 [03:27:21] Krinkle, was this related to one of the recent changes? 
[03:35:40] sigh [03:35:48] I guess it was missed in the apache changes [03:42:34] RECOVERY - puppet last run on mw2099 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:45:03] (03PS1) 10Alex Monk: Add public-wiki-rewrites to wikitech [puppet] - 10https://gerrit.wikimedia.org/r/273410 (https://phabricator.wikimedia.org/T99096) [03:45:42] (03PS2) 10Alex Monk: Add public-wiki-rewrites to wikitech [puppet] - 10https://gerrit.wikimedia.org/r/273410 (https://phabricator.wikimedia.org/T99096) [03:48:45] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [03:55:44] (03PS1) 10Alex Monk: Merge labtestwikitech apache config with wikitech apache config [puppet] - 10https://gerrit.wikimedia.org/r/273411 [03:56:12] ^ warning: I am trying to change apache configs :) [03:56:17] ("but it's only wikitech") [04:09:04] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: Puppet has 1 failures [04:16:25] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 13.64% of data above the critical threshold [100000000.0] [04:31:05] We don't have an icinga alert for when Krenair tries to change apache config? ;-) [04:35:24] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:35:43] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [04:36:24] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 36.79 ms [04:37:34] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [05:35:12] 6Operations, 10Traffic, 10Wikimedia-Blog, 7HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2065829 (10Tbayer) (Update: Heard back on Feb 10 that they needed to check with someone specific about this and would get back. On Feb 18 I sent a reminder and got the reply that they wou... 
[06:18:23] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave] [06:18:53] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave] [06:29:23] PROBLEM - Router interfaces on cr1-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 33, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Peering: Equinix Dallas (SR 17915024) {#11397} [10Gbps DF] [06:30:24] PROBLEM - puppet last run on mw2024 is CRITICAL: CRITICAL: puppet fail [06:30:34] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:45] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:54] PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:04] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:04] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:16] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 3 failures [06:31:45] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:34] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:54] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures [06:38:19] <_joe_> uhm we just lost codfw? or what?
[06:38:23] RECOVERY - Router interfaces on cr1-eqdfw is OK: OK: host 208.80.153.198, interfaces up: 35, down: 0, dormant: 0, excluded: 0, unused: 0 [06:39:21] <_joe_> uhm nope [06:56:45] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:04] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:57:04] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:57:13] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:57:24] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:57:24] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:57:24] RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:57:24] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:04] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:34] RECOVERY - puppet last run on mw2024 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [08:01:12] (03CR) 10Muehlenhoff: "I checked existing labs instances with "aufs" loaded (three of them) and cleared the change with the owners of that VMs." 
[puppet] - 10https://gerrit.wikimedia.org/r/270690 (owner: 10Muehlenhoff) [08:01:21] (03PS2) 10Muehlenhoff: Blacklist aufs kernel module [puppet] - 10https://gerrit.wikimedia.org/r/270690 [08:01:57] !log blacklisting aufs kernel module [08:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:02:28] (03CR) 10Muehlenhoff: [C: 032 V: 032] Blacklist aufs kernel module [puppet] - 10https://gerrit.wikimedia.org/r/270690 (owner: 10Muehlenhoff) [08:03:29] !sal [08:03:30] https://wikitech.wikimedia.org/wiki/Server_Admin_Log https://tools.wmflabs.org/sal/production See it and you will know all you need. [08:03:48] (03PS1) 10Elukey: Remove mc1016 from the redis/memcached pools for maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273415 (https://phabricator.wikimedia.org/T123711) [08:05:29] (03PS2) 10Elukey: Remove mc1016 from the redis/memcached pools for maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273415 (https://phabricator.wikimedia.org/T123711) [08:11:02] (03CR) 10Elukey: [C: 032] Remove mc1016 from the redis/memcached pools for maintenance. 
[puppet] - 10https://gerrit.wikimedia.org/r/273415 (https://phabricator.wikimedia.org/T123711) (owner: 10Elukey) [08:11:04] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 [08:11:34] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [08:12:18] !log removed mc1016.eqiad from the redis/memcached pools for maintenance [08:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:19:53] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave] [08:20:25] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave] [08:23:24] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Puppet has 1 failures [08:25:55] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [08:25:55] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [08:33:05] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [08:33:05] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [08:38:11] (03CR) 10Merlijn van Deen: "sudo apt-get install -s inkscape:" [puppet] - 10https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: 10Merlijn van Deen) [08:41:03] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up:
77, down: 0, dormant: 0, excluded: 0, unused: 0 [08:41:33] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [08:49:55] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [08:54:04] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave] [08:55:23] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave] [08:56:36] mhh I don't see any scheduled maint for that link announced? ^ [08:57:09] godog: flapping interfaces? [08:59:06] elukey: yeah looks like it, happened this morning 3x already [09:04:38] <_joe_> godog: yeah no maint announced [09:04:52] <_joe_> but I didn't find any impact in ganglia graphs earlier [09:04:59] <_joe_> I might've missed something though [09:09:50] _joe_: yeah looks like traffic moved to cr1 in ulsfo, possibly routing through eqord to go to eqiad (speculation) [09:10:13] cc mark paravoid ^ [09:10:22] <_joe_> godog: our network map has a big "his sunt leones" painted on it for me :P [09:10:25] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [09:11:45] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 [09:14:23] _joe_: hehe mostly for me too, taking an educated guess by looking at librenms [09:26:59] (03CR) 10Muehlenhoff: [C: 031] "Manager approval is present and three day period has lapsed."
[puppet] - 10https://gerrit.wikimedia.org/r/273038 (https://phabricator.wikimedia.org/T127808) (owner: 10Dzahn) [09:27:54] (03PS5) 10Ema: admin: add nikerabbit to researchers [puppet] - 10https://gerrit.wikimedia.org/r/273038 (https://phabricator.wikimedia.org/T127808) (owner: 10Dzahn) [09:28:32] (03CR) 10Ema: [C: 032 V: 032] admin: add nikerabbit to researchers [puppet] - 10https://gerrit.wikimedia.org/r/273038 (https://phabricator.wikimedia.org/T127808) (owner: 10Dzahn) [09:30:23] Nikerabbit: hey [09:30:38] good day [09:30:41] your access request is merged now [09:35:01] ok [09:42:12] (03PS1) 10Elukey: Add mc1016 back to the redis/memcached pools after maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273424 (https://phabricator.wikimedia.org/T123711) [09:45:30] (03CR) 10Elukey: [C: 032] Add mc1016 back to the redis/memcached pools after maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273424 (https://phabricator.wikimedia.org/T123711) (owner: 10Elukey) [09:46:41] !log mc1016.eqiad re-added to the memcached/redis pools after maintenance [09:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:52:50] 6Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 13Patch-For-Review: access for nikerabbit to researchers - https://phabricator.wikimedia.org/T127808#2065960 (10ema) 5Open>3Resolved @Nikerabbit can now access the db. Closing. [10:04:16] 6Operations: reinstall redis servers with jessie - https://phabricator.wikimedia.org/T123675#1935461 (10elukey) Good conversation to save in this task as FYI: ``` is there a known procedure to pool/de-pool rdbXXXX hosts from the jobrunner queue pools? (I am reading https://phabricator.wikimedia.org/T12... 
[10:06:01] ACKNOWLEDGEMENT - cassandra-a CQL 10.64.32.187:9042 on restbase1008 is CRITICAL: Connection refused Filippo Giunchedi bootstrapping [10:18:00] PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection timed out [10:20:00] RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate toolserver.org valid until 2016-06-30 17:56:02 +0000 (expires in 125 days) [10:29:55] (03CR) 10Ori.livneh: [C: 031] Also start/stop/restart keyholder-proxy [puppet] - 10https://gerrit.wikimedia.org/r/259596 (owner: 10Ottomata) [10:32:31] PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection timed out [10:34:11] RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate toolserver.org valid until 2016-06-30 17:56:02 +0000 (expires in 125 days) [10:40:38] (03PS1) 10Giuseppe Lavagetto: Add regex-based matching for selection of nodes. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/273427 (https://phabricator.wikimedia.org/T114305) [10:41:11] <_joe_> bblack: it was you asking for this too in the past, right? ^^ [10:41:51] _joe_: hey good morning! I took the liberty to add a #puppet-compiler project in Phabricator. Since they are often filled against some CI project, that highlight them nicely on the workboard [10:42:17] just been bold. Maybe I should have reached out to you first [10:42:25] <_joe_> hashar: no you did right [10:42:32] <_joe_> sorry for not acting upon it [10:43:01] and Luke081515|away mentionned Phabricator has a sub project system. So potentially we could have #puppet-compiler under #ci-infrastructure umbrella or something like that [10:43:07] but I have no idea how sub project works [10:47:11] (03PS1) 10Elukey: Remove mc1017/mc1018 from the redis/memcached pools for maintenance. 
[puppet] - 10https://gerrit.wikimedia.org/r/273430 (https://phabricator.wikimedia.org/T123711) [10:47:59] (03PS2) 10Elukey: Remove mc1017/mc1018 from the redis/memcached pools for maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273430 (https://phabricator.wikimedia.org/T123711) [10:51:17] (03CR) 10Elukey: [C: 032] Remove mc1017/mc1018 from the redis/memcached pools for maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273430 (https://phabricator.wikimedia.org/T123711) (owner: 10Elukey) [10:52:52] (03PS1) 10Filippo Giunchedi: swift: return 400 on UnicodeDecodeErrors [puppet] - 10https://gerrit.wikimedia.org/r/273431 (https://phabricator.wikimedia.org/T128081) [10:59:24] 6Operations: upgrade 15+4 swift servers from precise to trusty - https://phabricator.wikimedia.org/T125024#2066085 (10fgiunchedi) upgraded `ms-fe1004` yesterday but ran into {T128081} so I've kept it depooled. `ms-be101[345]` all upgraded now [11:01:36] !log removed mc1018/1017 from the redis memcached pools for maintenance [11:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:02:49] (03PS2) 10Giuseppe Lavagetto: Add regex-based matching for selection of nodes. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/273427 (https://phabricator.wikimedia.org/T114305) [11:05:18] 6Operations, 6Commons, 10MediaWiki-Uploading, 6Multimedia, and 2 others: Raise max upload limit above 1GB - https://phabricator.wikimedia.org/T76614#2066127 (10Steinsplitter) Cool. 
Thanks :) [11:10:10] PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection timed out [11:11:52] RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate toolserver.org valid until 2016-06-30 17:56:02 +0000 (expires in 125 days) [11:12:35] (03PS1) 10BBlack: vcl_config: pass_random setting, disable for upload-fe [puppet] - 10https://gerrit.wikimedia.org/r/273432 [11:16:04] 6Operations, 10Ops-Access-Requests: Access Request for mobrovac as ci-admin to mess with CI infrastructure - https://phabricator.wikimedia.org/T128175#2066176 (10hashar) [11:17:10] (03CR) 10Giuseppe Lavagetto: [C: 032] Add regex-based matching for selection of nodes. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/273427 (https://phabricator.wikimedia.org/T114305) (owner: 10Giuseppe Lavagetto) [11:17:45] (03PS1) 10Hashar: admin: mobrovac as ci-admins [puppet] - 10https://gerrit.wikimedia.org/r/273434 (https://phabricator.wikimedia.org/T128175) [11:18:19] (03CR) 10Hashar: [C: 04-1] "Pending feedback and access request process outcome on T128175" [puppet] - 10https://gerrit.wikimedia.org/r/273434 (https://phabricator.wikimedia.org/T128175) (owner: 10Hashar) [11:20:47] If anyone is into file mgmt issues ("inconsistent state within the internal storage backends"), taking a look at https://phabricator.wikimedia.org/T128096 is welcome [11:22:01] (03CR) 10Ema: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/273432 (owner: 10BBlack) [11:22:03] andre__: thanks, I'll take a look [11:24:04] (03CR) 10BBlack: [C: 032] vcl_config: pass_random setting, disable for upload-fe [puppet] - 10https://gerrit.wikimedia.org/r/273432 (owner: 10BBlack) [11:24:38] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066238 (10Joe) 
[11:24:51] godog, ah thanks [11:31:48] <_joe_> andre__: adding the operations tag helps to make an issue noticed by ops [11:32:22] <_joe_> meaning it will show up here every time someone comments [11:32:42] <_joe_> so you don't need to poke us directly :) [11:33:57] (03PS1) 10Giuseppe Lavagetto: Version bump to 0.1.0, start using semver properly from now on. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/273437 [11:35:02] PROBLEM - Memcached on mc1017 is CRITICAL: Connection refused [11:35:47] (03CR) 10Giuseppe Lavagetto: [C: 032] Version bump to 0.1.0, start using semver properly from now on. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/273437 (owner: 10Giuseppe Lavagetto) [11:35:49] --^ this is me sorry [11:35:52] PROBLEM - Memcached on mc1018 is CRITICAL: Connection timed out [11:36:00] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066246 (10fgiunchedi) @Serkanland @Lokal_Profil does it happen for any file operation on any file at the... [11:38:44] andre__: do you know if those two users are online on irc? [11:38:50] PROBLEM - Last backup of the tools filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-tools was exit-code [11:42:03] !log run swiftrepl eqiad -> codfw for unsharded containers [11:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:44:08] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066259 (10Serkanland) This happens only when I want delete any file. But the problem doesn't happen when... 
[11:54:04] 6Operations: Reinstall redis servers (Job queues) with Jessie - https://phabricator.wikimedia.org/T123675#2066293 (10elukey) [12:01:00] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066300 (10fgiunchedi) @Serkanland thanks, could you try deleting `Etapprapporteringexempel_v3.0.odt` aga... [12:05:06] 6Operations, 10Traffic, 7HTTPS: Outbound HTTPS for varnish backend instances - https://phabricator.wikimedia.org/T109325#2066302 (10BBlack) Updates from the passage of time: Varnish4 is happening and is a realistic blocker (for lots of things) these days, so we're almost certainly looking at something like... [12:07:52] RECOVERY - Memcached on mc1017 is OK: TCP OK - 0.010 second response time on port 11211 [12:10:30] RECOVERY - Memcached on mc1018 is OK: TCP OK - 0.001 second response time on port 11211 [12:11:50] _joe_, I wasn't even sure that problem is already on ops level, but thanks :) [12:12:17] <_joe_> andre__: me neither, but we should check nonetheless [12:24:07] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066359 (10Serkanland) @fgiunchedi, I deleted one file, but when I wanted deleted another file, this happ... [12:28:16] (03PS1) 10Elukey: Add mc1017/mc1018 back to the memcached/redis pools after maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273443 (https://phabricator.wikimedia.org/T123711) [12:29:12] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066367 (10Serkanland) The file: https://az.wikipedia.org/wiki/Şəkil:Manuel-Neuer.jpg. 
[12:30:23] (03CR) 10Elukey: [C: 032] Add mc1017/mc1018 back to the memcached/redis pools after maintenance. [puppet] - 10https://gerrit.wikimedia.org/r/273443 (https://phabricator.wikimedia.org/T123711) (owner: 10Elukey) [12:31:37] !log added mc1017/mc1018 back to the redis/memcached pools after maintenance [12:31:39] (03PS4) 10Giuseppe Lavagetto: role::diamond: move to standard::diamond [puppet] - 10https://gerrit.wikimedia.org/r/273248 [12:31:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:31:41] (03PS4) 10Giuseppe Lavagetto: ntp: further reorg, split of client and server code [puppet] - 10https://gerrit.wikimedia.org/r/273247 [12:31:43] (03PS3) 10Giuseppe Lavagetto: standard: move to own module [puppet] - 10https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042) [12:31:45] (03PS3) 10Giuseppe Lavagetto: role::ntp: rename standard::ntp, move to the standard module [puppet] - 10https://gerrit.wikimedia.org/r/273246 [12:31:47] (03PS1) 10Giuseppe Lavagetto: role::mail::sender: move to standard [puppet] - 10https://gerrit.wikimedia.org/r/273444 [12:31:49] (03PS1) 10Giuseppe Lavagetto: role::testsystem: move to module [puppet] - 10https://gerrit.wikimedia.org/r/273445 [12:32:34] 6Operations, 10Traffic, 7HTTPS, 7Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2066387 (10BBlack) Do have any ideas about the versions of IE they're using, agencies they're working for, and/or approximate start date of the pr... [12:38:06] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066406 (10Serkanland) Finally I deleted this file :-) file Thank you very much for help! 
[12:40:12] 6Operations, 6Project-Admins, 3DevRel-February-2016: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#2066408 (10Aklapper) So we add #Traffic to the [[ https://phabricator.wikimedia.org/maniphest/query/uxzC3MibBe5g/#R | 58 Varnish tasks ]] and then archive #Varni... [12:48:42] 6Operations, 6Project-Admins, 3DevRel-February-2016: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#2066410 (10BBlack) >>! In T119944#2066408, @Aklapper wrote: > So we add #Traffic to the [[ https://phabricator.wikimedia.org/maniphest/query/uxzC3MibBe5g/#R | 58... [12:59:16] 6Operations, 10puppet-compiler, 13Patch-For-Review: Puppet Compiler: Support wildcards, regexps, or 'all hosts' - https://phabricator.wikimedia.org/T114305#2066426 (10Joe) 5stalled>3Resolved [13:09:34] 6Operations, 13Patch-For-Review, 7Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2066433 (10elukey) [13:10:10] (03PS5) 10Giuseppe Lavagetto: role::diamond: move to standard::diamond [puppet] - 10https://gerrit.wikimedia.org/r/273248 [13:10:12] (03PS2) 10Giuseppe Lavagetto: role::mail::sender: move to standard [puppet] - 10https://gerrit.wikimedia.org/r/273444 [13:10:14] (03PS2) 10Giuseppe Lavagetto: role::testsystem: move to module [puppet] - 10https://gerrit.wikimedia.org/r/273445 [13:10:16] (03PS5) 10Giuseppe Lavagetto: ntp: further reorg, split of client and server code [puppet] - 10https://gerrit.wikimedia.org/r/273247 [13:26:58] !log launch swiftrepl continuous replication for unsharded containers on ms-fe1003 T128096 [13:26:59] T128096: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096 [13:27:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:30:24] 6Operations, 13Patch-For-Review, 
7Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2066456 (10Krenair) [13:38:19] 6Operations, 6Commons, 10MediaWiki-Uploading, 6Multimedia, and 3 others: Raise max upload limit above 1GB - https://phabricator.wikimedia.org/T76614#2066462 (10Josve05a) [13:40:58] 6Operations, 10MediaWiki-Interface, 10Traffic, 5MW-1.27-release, and 4 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#1963606 (10Elitre) At en.wiki as well: https... [13:51:33] (03PS4) 10Krinkle: Set $wgResourceBasePath to "/w" for remaining wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271711 (https://phabricator.wikimedia.org/T99096) [13:52:32] (03PS5) 10Krinkle: Set $wgResourceBasePath to "/w" for remaining wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271711 (https://phabricator.wikimedia.org/T99096) [13:52:57] (03CR) 10Krinkle: [C: 032] Set $wgResourceBasePath to "/w" for remaining wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271711 (https://phabricator.wikimedia.org/T99096) (owner: 10Krinkle) [13:53:32] (03Merged) 10jenkins-bot: Set $wgResourceBasePath to "/w" for remaining wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271711 (https://phabricator.wikimedia.org/T99096) (owner: 10Krinkle) [13:54:50] !log rebooting lithium for kernel update [13:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:58:54] 6Operations, 10Deployment-Systems, 6Performance-Team, 10Traffic, 5MW-1.27-release-notes: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#2066474 (10Krinkle) [13:59:11] 6Operations, 10Deployment-Systems, 6Performance-Team, 10Traffic, 5MW-1.27-release-notes: Make Varnish cache for /static/$wmfbranch/ expire when 
resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#1457784 (10Krinkle) 5Open>3Resolved [13:59:17] !log krinkle@tin Synchronized wmf-config/InitialiseSettings.php: T99096: Enable wmgUseWmfstatic on remaining wikis (duration: 00m 50s) [13:59:18] T99096: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096 [13:59:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:02:04] (03PS1) 10Tim Landscheidt: diamond: Fix comments [puppet] - 10https://gerrit.wikimedia.org/r/273451 [14:14:58] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066498 (10fgiunchedi) p:5Unbreak!>3Normal I've launched a continuous replication via swiftrepl for u... [14:27:21] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066522 (10Serkanland) Yes, I can. 
For Azerbaijani Wikipedia I confirm [14:32:52] 6Operations, 10Continuous-Integration-Infrastructure, 10Traffic: Make CI run Varnish 4 VCL tests - https://phabricator.wikimedia.org/T128188#2066542 (10hashar) [15:03:09] !log Switched MediaWiki core npm test to Nodepool instance T119143 [15:03:10] T119143: Migrate javascript npm CI jobs to Nodepool - https://phabricator.wikimedia.org/T119143 [15:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:05:45] (03PS1) 10Hashar: nodepool: raise min pool from 10 to 14 [puppet] - 10https://gerrit.wikimedia.org/r/273459 [15:14:20] !log disabling puppet on restbase1009.eqiad to preserve local changes during a quick experiment [15:14:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:17:35] !log blocking CQL native port on restbase1009.eqiad.wmnet : https://phabricator.wikimedia.org/P2677 [15:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:20:18] !log performing backup of m5-master mysql data [15:20:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:22:11] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review: Access Request for mobrovac as ci-admin to mess with CI infrastructure - https://phabricator.wikimedia.org/T128175#2066654 (10ema) p:5Triage>3Normal [15:22:16] it should only impact by locking writes for 0.1 seconds at the end of the process, to obtain a binlog position [15:24:09] !log re-enabling puppet on restbase1009.eqiad.wmnet [15:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:24:36] !log forcing puppet run on restbase1009.eqiad.wmnet [15:24:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:51:30] !log shutting down mariadb on db2030 to clone from db1009 [15:51:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:57:30] 
6Operations, 10hardware-requests, 13Patch-For-Review: Upgrade restbase100[7-9] to match restbase100[1-6] hardware - https://phabricator.wikimedia.org/T119935#2066705 (10fgiunchedi) 5Open>3Resolved restbase1009 raid grow has finished and the FS has been expanded too, this is complete. all hosts have 128G... [16:12:23] 6Operations, 10Traffic, 7HTTPS, 7Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2066739 (10Florian) Hi @BBlack, sorry that I haven't wrote as much information as I had in the tickets (like the one i add here now :)) :) One us... [16:14:39] (03PS4) 10Andrew Bogott: Added config files for Openstack Liberty [puppet] - 10https://gerrit.wikimedia.org/r/270891 [16:19:23] 6Operations, 10Traffic, 7HTTPS, 7Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2066342 (10valhallasw) If they are still running Windows XP, it's probably an SNI issue. The server for ticket.wikimedia.org serves a *.wikipedia.... [16:21:28] valhallasw`cloud: Regarding your comment here: https://phabricator.wikimedia.org/T128182 If I don't misinterpret it, I think you misinterpreted the task (maybe I haven't wrote it clear enough): The users don't try to open ticket.wikimedia.org, they try to open wikipedia.org (www.wikipedia.org or en.wikipedia.org) :) [16:23:05] 6Operations, 10Traffic, 7HTTPS, 7Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2066752 (10Florian) [16:24:05] 6Operations, 10Traffic, 7HTTPS, 7Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2066342 (10Florian) @valhallasw Please excuse if this wasn't clear, but the users aren't OTRS agents, they're users of the Wikipedia and are tryin... [16:26:21] FlorianSW: aaaah, right. 
I was already confused why government employees needed to otrs at work :D [16:26:56] :D [16:27:24] 6Operations, 10Traffic, 7HTTPS, 7Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2066758 (10BBlack) Honestly, I'm not sure there's much we can do anyways, except ask them to contact their local network administrators about it.... [16:27:25] It's clearly in your initial message, I just missed it [16:30:28] (03PS1) 10Muehlenhoff: Drop the annotations, dpkg from jessie chokes on them and they are only needed for bootstrapping new archs [debs/linux44] - 10https://gerrit.wikimedia.org/r/273471 [16:32:29] valhallasw`cloud: no matter, I clarified the message much more because of your feedback -> win-win :) [16:34:47] (03PS1) 10Muehlenhoff: Use gcc 4.9 on x86 [debs/linux44] - 10https://gerrit.wikimedia.org/r/273472 [16:43:20] (03PS1) 10Ema: Simplify VCL errorpage [puppet] - 10https://gerrit.wikimedia.org/r/273480 [16:44:02] (03CR) 10Andrew Bogott: [C: 032] Added config files for Openstack Liberty [puppet] - 10https://gerrit.wikimedia.org/r/270891 (owner: 10Andrew Bogott) [16:44:14] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2066834 (10Lokal_Profil) I can confirm that files can now be moved and new versions be uploaded on se.wik... [16:46:26] !log labstore1001 'mdadm --manage /dev/md126 --add /dev/sdaf' [16:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:46:57] (03CR) 10BBlack: [C: 031] Simplify VCL errorpage [puppet] - 10https://gerrit.wikimedia.org/r/273480 (owner: 10Ema) [16:47:28] (03CR) 10Ottomata: [C: 031] Add kafka1012 back to the pool of kafka brokers in wmf-config. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/271276 (owner: 10Elukey) [16:48:38] (03PS1) 10Andrew Bogott: Move labtest to openstack liberty [puppet] - 10https://gerrit.wikimedia.org/r/273481 [16:49:53] (03PS3) 10Ottomata: admin: Clarify researchers vs. statistics-users rights [puppet] - 10https://gerrit.wikimedia.org/r/272736 (owner: 10Alex Monk) [16:50:07] (03CR) 10Ottomata: [C: 032 V: 032] "Thanks Alex!" [puppet] - 10https://gerrit.wikimedia.org/r/272736 (owner: 10Alex Monk) [16:51:41] (03CR) 10Ottomata: "Guiseppe, thoughts? I don't feel comfortable reviewing this without more understanding." [puppet] - 10https://gerrit.wikimedia.org/r/272613 (owner: 10BryanDavis) [16:53:15] (03PS6) 10Ottomata: Set up Analytics MySQL Meta backup with backup::mysqlset [puppet] - 10https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991) [16:55:49] (03PS1) 10Tim Landscheidt: diamond: Remove unnecessary/incorrect include of stdlib [puppet] - 10https://gerrit.wikimedia.org/r/273483 [16:58:03] (03CR) 10Jcrespo: "I have not had a proper look at it, but we should consider space restrictions on backups. I wouls like to have alex aproval for this." [puppet] - 10https://gerrit.wikimedia.org/r/273312 (https://phabricator.wikimedia.org/T127991) (owner: 10Ottomata) [17:07:14] (03PS1) 10Mforns: Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) [17:07:29] (03PS2) 10Mforns: Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) [17:08:11] (03CR) 10Mforns: [C: 04-1] "The referenced repositories are not merged yet, so please don't merge yet." 
[puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) (owner: 10Mforns) [17:08:58] (03CR) 10jenkins-bot: [V: 04-1] Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) (owner: 10Mforns) [17:12:09] (03Abandoned) 10Elukey: Add kafka1012 back to the pool of kafka brokers in wmf-config. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/271276 (owner: 10Elukey) [17:15:44] (03CR) 10Ottomata: Define service entries for InitialiseSettings (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/266510 (https://phabricator.wikimedia.org/T114273) (owner: 10Giuseppe Lavagetto) [17:17:45] (03PS1) 10Elukey: Add kafka1012.eqiad.wmnet back to the media-wiki config. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273488 (https://phabricator.wikimedia.org/T125084) [17:19:23] (03CR) 10Ottomata: [C: 031] Add kafka1012.eqiad.wmnet back to the media-wiki config. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/273488 (https://phabricator.wikimedia.org/T125084) (owner: 10Elukey) [17:24:05] 6Operations, 10Analytics, 6Analytics-Kanban, 13Patch-For-Review: Increase HADOOP_HEAPSIZE (-Xmx) for hive-server2 - https://phabricator.wikimedia.org/T76343#2066893 (10Nuria) 5Open>3Resolved [17:27:13] (03PS3) 10Mforns: Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) [17:27:27] (03PS4) 10Mforns: Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) [17:33:48] (03CR) 10Tim Landscheidt: "Tested on Toolsbeta." [puppet] - 10https://gerrit.wikimedia.org/r/273483 (owner: 10Tim Landscheidt) [17:34:23] (03CR) 10Mforns: [C: 04-1] "The repositories referenced by this change are not merged yet, please don't merge yet." 
[puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) (owner: 10Mforns) [17:37:20] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [17:39:10] (03PS1) 10Ema: Maps VCL forward-port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/273490 (https://phabricator.wikimedia.org/T124279) [17:40:29] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [17:41:26] (03Abandoned) 10Ema: Maps VCL forward-port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/273490 (https://phabricator.wikimedia.org/T124279) (owner: 10Ema) [17:41:51] is it just a spike or is it continuous [17:43:25] (03PS15) 10Ema: Maps VCL forward-port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) [17:47:39] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:48:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:51:28] (03PS16) 10Ema: Maps VCL forward-port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) [17:51:56] A wild s5 replication slave appears! 
[17:52:02] I mean m5 [17:58:51] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review: Access Request for mobrovac as ci-admin to mess with CI infrastructure - https://phabricator.wikimedia.org/T128175#2066937 (10Legoktm) +1 [17:59:14] (03PS1) 10Dzahn: ipmi: remove from iron, add on neodymium [puppet] - 10https://gerrit.wikimedia.org/r/273491 [18:00:01] (03PS2) 10Dzahn: ipmi: remove from iron, add on neodymium [puppet] - 10https://gerrit.wikimedia.org/r/273491 [18:02:06] (03CR) 10Dzahn: [C: 032] "added reviewers fyi, if you used to use this on iron, please use it either on palladium or on neodymium" [puppet] - 10https://gerrit.wikimedia.org/r/273491 (owner: 10Dzahn) [18:03:10] 6Operations, 10Traffic: confctl: improve/upgrade --tags/--find - https://phabricator.wikimedia.org/T128199#2066945 (10BBlack) [18:05:30] (03PS1) 10Cmjohnson: Adding dhcpd entries for new restbase1010-1015 [puppet] - 10https://gerrit.wikimedia.org/r/273493 [18:08:16] (03PS2) 10Cmjohnson: Adding dhcpd entries for new restbase1010-1015 [puppet] - 10https://gerrit.wikimedia.org/r/273493 [18:09:41] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review: Access Request for mobrovac as ci-admin to mess with CI infrastructure - https://phabricator.wikimedia.org/T128175#2066176 (10GWicke) +1 from me. [18:11:26] (03CR) 10Cmjohnson: [C: 032] Adding dhcpd entries for new restbase1010-1015 [puppet] - 10https://gerrit.wikimedia.org/r/273493 (owner: 10Cmjohnson) [18:11:29] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review: Access Request for mobrovac as ci-admin to mess with CI infrastructure - https://phabricator.wikimedia.org/T128175#2066977 (10JanZerebecki) +1 I imagine you can make good use of this. (Offtopic: We could copy the keys from ldap into place at instance... 
[18:13:05] 6Operations, 10Traffic: confctl: improve/upgrade --tags/--find - https://phabricator.wikimedia.org/T128199#2066978 (10BBlack) Arguably if the selector selects more than one entry, it should echo the changes first and ask for confirmation, unless given some force/yes flag. [18:30:34] 6Operations, 10RESTBase: install restbase1010-restbase1016 - https://phabricator.wikimedia.org/T128107#2067021 (10fgiunchedi) chatting with @cmjohnson on IRC, we'll need to install the new machines with 5x ssd from the beginning, there should be 12x + 3x SSD in eqiad. IOW we have SSD for 3x machines to start w... [18:35:14] (03CR) 10Krinkle: [C: 031] Simplify VCL errorpage [puppet] - 10https://gerrit.wikimedia.org/r/273480 (owner: 10Ema) [18:35:29] (03CR) 10Krinkle: "pet try link - http://misc-web-lb.eqiad.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/273480 (owner: 10Ema) [18:41:30] !log ori@tin Synchronized php-1.27.0-wmf.14/includes/user/User.php: I43cde3a48: Prevent duplicate memcached lookups for user record (duration: 01m 02s) [18:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:53:36] (03PS1) 10Southparkfan: Clean up display.php table [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 [18:56:12] (03PS2) 10Southparkfan: Clean up display.php table [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 [18:56:52] (03CR) 10Dzahn: "tested in compiler on some random hosts http://puppet-compiler.wmflabs.org/1876/" [puppet] - 10https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042) (owner: 10Giuseppe Lavagetto) [18:57:57] (03CR) 10Dzahn: [C: 031] standard: move to own module [puppet] - 10https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042) (owner: 10Giuseppe Lavagetto) [18:58:08] (03PS4) 10Dzahn: standard: move to own module [puppet] - 10https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042) (owner: 10Giuseppe Lavagetto) 
[19:01:39] 6Operations, 6Discovery, 7Elasticsearch, 7Epic: EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) - https://phabricator.wikimedia.org/T109089#2067145 (10Deskana) [19:07:58] (03PS17) 10Ema: Maps VCL forward-port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/269466 (https://phabricator.wikimedia.org/T124279) [19:08:57] 6Operations, 10RESTBase: install restbase1010-restbase1016 - https://phabricator.wikimedia.org/T128107#2067173 (10Eevans) >>! In T128107#2067021, @fgiunchedi wrote: > IOW we have SSD for 3x machines to start with, the remaining 18x SSD will come from restbase1001-1006 as they are progressively decommissioned.... [19:10:07] (03PS1) 10BBlack: traffic-pool: remove BindsTo= [puppet] - 10https://gerrit.wikimedia.org/r/273502 [19:14:34] (03CR) 10Dzahn: [C: 04-1] "it's introducting literal tabs and mixing it with whitespace. let's not use the tabs please" [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 (owner: 10Southparkfan) [19:18:20] (03PS3) 10Southparkfan: Clean up display.php table [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 [19:21:39] (03PS4) 10Southparkfan: Clean up display.php table [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 [19:22:02] (03CR) 10Dzahn: [C: 032] Clean up display.php table [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 (owner: 10Southparkfan) [19:22:11] (03CR) 10Dzahn: [V: 032] Clean up display.php table [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273500 (owner: 10Southparkfan) [19:30:42] (03PS1) 10Southparkfan: Fix wrong thead tag [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273505 [19:33:44] 6Operations, 10RESTBase: install restbase1010-restbase1016 - https://phabricator.wikimedia.org/T128107#2067229 (10Cmjohnson) Without taking from existing servers (restbae1001-6). I am able to stand up 2 machines (restbase1010-1011). To get the next 2(1012-1013) servers going we'll need the ssds from restbase1... 
[19:35:19] 6Operations, 10RESTBase: install restbase1010-restbase1016 - https://phabricator.wikimedia.org/T128107#2067232 (10Cmjohnson) Current status is 1010 and 1011 are ready for installs, everything but identifying the partman recipe is complete completed: racked, cabled, switch cfg, dhcp, dns, confirmed virtual con... [19:39:54] (03CR) 10Dzahn: [C: 032 V: 032] Fix wrong thead tag [debs/wikistats] - 10https://gerrit.wikimedia.org/r/273505 (owner: 10Southparkfan) [19:59:23] (03Abandoned) 10GWicke: htmldumper 0.1.0 with dependencies [dumps/html/deploy] - 10https://gerrit.wikimedia.org/r/204964 (https://phabricator.wikimedia.org/T94457) (owner: 10GWicke) [20:04:23] 7Blocked-on-Operations, 6Operations, 10RESTBase, 10RESTBase-Cassandra, 13Patch-For-Review: Finish conversion to multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#2067276 (10GWicke) Copying @Eevans' note from https://phabricator.wikimedia.org/T125842#2056268: I've... [20:04:37] !log issuing test repair on cerium (restbase staging), keyspace : T108611 [20:04:38] T108611: perform initial (manual) repair of Cassandra cluster - https://phabricator.wikimedia.org/T108611 [20:06:30] 7Blocked-on-Operations, 6Operations, 10RESTBase, 10RESTBase-Cassandra, 13Patch-For-Review: Finish conversion to multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#2067290 (10GWicke) @Eevans, thanks for starting work on this. Could you work with @fgiunchedi to transl... 
[20:06:53] 6Operations, 10procurement: temporarily provide a second VM running at rackspace for wikitech-static - https://phabricator.wikimedia.org/T128206#2067291 (10Dzahn) [20:07:17] 6Operations, 10procurement: temporarily provide a second VM running at rackspace for wikitech-static - https://phabricator.wikimedia.org/T128206#2067308 (10Dzahn) [20:07:19] 6Operations, 6Labs, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2012874 (10Dzahn) [20:09:43] 6Operations, 6Labs, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2067313 (10Dzahn) @Krenair i made a procurement ticket to request a second VM with jessie. The idea would be to give you access to that so you could migrate data from the old one, w... [20:10:00] 6Operations, 10MediaWiki-Authentication-and-authorization, 10Traffic: Logging out of a wiki leaves an XXwikiSession= Cookie behind - https://phabricator.wikimedia.org/T127436#2067318 (10Anomie) a:3Anomie [20:26:31] (03PS2) 10Andrew Bogott: Move labtest to openstack liberty [puppet] - 10https://gerrit.wikimedia.org/r/273481 [20:26:33] (03PS1) 10Andrew Bogott: Pin the cloud archive at the same priority as wikimedia repo [puppet] - 10https://gerrit.wikimedia.org/r/273512 [20:27:10] I am going to restart Jenkins soonish [20:27:40] bblack, SMalyshev and I would really like to chat with you and anyone else interested in WDQS caching. [20:27:49] any thoughts on the best time for that? [20:27:56] should be a fairly brief one issue meeting :) [20:29:10] !log Restarting Jenkins [20:30:43] yurik: 2017? 
:) [20:30:58] * yurik throws ice cream at bblack [20:31:06] but seriously, maybe sometime next week, but everything's hyper-busy through into April [20:31:19] SMalyshev, ^ [20:32:29] bblack: we don't need you to *do* anything yet, we just want some time to talk about what should be done [20:32:37] ok [20:32:59] ok, so let's schedule something next week [20:33:03] any preferences? [20:33:08] well it would be a good chance to catch up on a few related other things too, aside from the technicalities of caching itself [20:33:09] 6Operations, 6Labs, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2067409 (10Krenair) Sounds good to me [20:33:15] (03PS2) 10Andrew Bogott: Pin the cloud archive at the same priority as wikimedia repo [puppet] - 10https://gerrit.wikimedia.org/r/273512 [20:33:17] (03PS3) 10Andrew Bogott: Move labtest to openstack liberty [puppet] - 10https://gerrit.wikimedia.org/r/273481 [20:33:18] bblack: yes, sure [20:33:27] bblack, i hear you, let's do a quick chat of 30 min or less, and figure out which way to head and if our discovery ops guru could help with that :) cc:gehel [20:33:35] the few other things being: (1) Why did we put it on misc-web (for all I know that was me) and is that still appropriate long-term and (2) codfw switchover plans [20:33:50] i'm slow :) [20:34:37] bblack, want to do it now? :D [20:34:50] we might be chatting on irc for much longer otherwise [20:34:50] nope! :) [20:34:57] we have done that before :-P [20:35:02] sounds good :) [20:35:11] SMalyshev, you drive :-P [20:35:14] no prefs [20:35:25] ok, I just pick on the calendar [20:35:36] SMalyshev, ideally - after 1pm SF [20:35:44] Tuesday seems easiest, there's something ~9AM SF for me all the other days next week [20:35:46] 1pm, 2pm - best time of day for me :) [20:35:59] yurik: that excludes gehel, I thought you wanted him to be there too? 
[20:36:09] SMalyshev, oh, what time is he around/ [20:36:35] I am a nightowl so afternoon is great for me but for folks in Europe that may be not the best time [20:36:38] if you're going later like 1 or 2pm SF, I probably can't make Mon/Weds [20:36:55] SMalyshev, ok, any time for me then [20:37:01] bblack: Tuesday? [20:37:10] tuesday anytime is fine [20:37:33] yep [20:38:00] (03CR) 10Dzahn: [C: 031] standard: move to own module [puppet] - 10https://gerrit.wikimedia.org/r/273209 (https://phabricator.wikimedia.org/T119042) (owner: 10Giuseppe Lavagetto) [20:38:54] ok, sent one for Tue 11:30 [20:38:58] (03CR) 10Chad: [C: 032] "The mail itself wouldn't be received on 3, even if you address it there. 6 is the "direct" address, although it hardly matters in practice" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263551 (owner: 10MaxSem) [20:39:12] (03PS3) 10Andrew Bogott: Pin the cloud archive at the same priority as wikimedia repo [puppet] - 10https://gerrit.wikimedia.org/r/273512 [20:39:14] (03PS4) 10Andrew Bogott: Move labtest to openstack liberty [puppet] - 10https://gerrit.wikimedia.org/r/273481 [20:39:40] (03Merged) 10jenkins-bot: Update WMF address [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263551 (owner: 10MaxSem) [20:40:26] yurik: 1pm SF (10pm CET) is fine for me, as long as you let me know a bit in advance [20:40:41] SMalyshev, ^ [20:40:45] bblack, ^ [20:41:52] ok [20:42:26] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: update wmf address in echo footer thingie (duration: 00m 59s) [20:44:09] gehel: so far we scheduled it for Tue 11:30 Pacific, if that doesn't work out we'll move it [20:45:09] actually, 1pm is easier for me (need to find time for dinner at some point), but I'm fine either way [20:54:26] SMalyshev, if its ok with bblack, could you move it to 1pm SF? 
monday is fine too [20:54:48] yurik: Monday I'm busy (see emails) [20:54:57] SMalyshev, tue it is ) [20:55:45] yurik: on Tue 1pm there's exciting new improvements to phabricator talk [20:57:13] gehel, is 2pm too late for you? [20:57:46] good for me as long as you are aware that some of my neurons are already gonna be asleep [20:58:12] don't try too hard to accommodate me, 11am is fine too [20:59:12] (03PS4) 10Andrew Bogott: Pin the cloud archive at the same priority as wikimedia repo [puppet] - 10https://gerrit.wikimedia.org/r/273512 [20:59:14] (03PS5) 10Andrew Bogott: Move labtest to openstack liberty [puppet] - 10https://gerrit.wikimedia.org/r/273481 [20:59:16] (03PS1) 10Andrew Bogott: Designate.conf: Specify some more pool_target settings [puppet] - 10https://gerrit.wikimedia.org/r/273517 [21:00:49] (03CR) 10Andrew Bogott: [C: 032] Designate.conf: Specify some more pool_target settings [puppet] - 10https://gerrit.wikimedia.org/r/273517 (owner: 10Andrew Bogott) [21:05:08] (03PS2) 10Dzahn: ci: split and move role classes to modules/role/ [puppet] - 10https://gerrit.wikimedia.org/r/260939 [21:06:56] (03CR) 10Dzahn: "modules/role/manifests/ci/labs/common.pp removed, you are right. also updated the files that had changes meanwhile, hopefully correct" [puppet] - 10https://gerrit.wikimedia.org/r/260939 (owner: 10Dzahn) [21:10:26] hashar: ok with you if i split those CI role classes, as long as i make sure it is no-op? 
[21:11:12] mutante: Guten Abend [21:11:23] mutante: sorry to have completely ignored that puppet change :( [21:11:46] mutante: part of the reason being that we have a bunch of cherry picked patches on the integration puppetmaster and I have been lazy to get them reviewed/refined/merged etc [21:11:47] ;( [21:11:54] hashar: guten abend, i'm not expecting you to review Friday night [21:11:58] ah [21:12:27] ok, well i compiled this http://puppet-compiler.wmflabs.org/1877/ [21:12:55] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/1877/" [puppet] - 10https://gerrit.wikimedia.org/r/260939 (owner: 10Dzahn) [21:13:35] 6Operations, 10ops-eqiad: Failed drive in labstore1001 array - https://phabricator.wikimedia.org/T127076#2067539 (10chasemp) 5Open>3Resolved I was able to to figure this out and now the array is rebuilding: ```md126 : active raid10 sdaf[12] sdap[11] sdam[5] sdan[7] sdak[1] sdai[8] sdal[3] sdaj[10] sdae[0]... [21:15:49] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: puppet fail [21:16:41] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2063784 (10Osiris) @fgiunchedi Filippo, I've got one now on simplewiki: https://simple.wikipedia.org/wik... 
[21:17:30] (03PS7) 10Dzahn: maps: move roles to modules/role/ [puppet] - 10https://gerrit.wikimedia.org/r/249059 [21:17:50] (03PS8) 10Dzahn: osm/maps/postgres: move tuning.conf out of /files/ [puppet] - 10https://gerrit.wikimedia.org/r/249056 [21:18:28] (03CR) 10jenkins-bot: [V: 04-1] osm/maps/postgres: move tuning.conf out of /files/ [puppet] - 10https://gerrit.wikimedia.org/r/249056 (owner: 10Dzahn) [21:32:41] (03PS12) 10Dzahn: mediawiki: update font packages for jessie [puppet] - 10https://gerrit.wikimedia.org/r/218640 (https://phabricator.wikimedia.org/T102623) [21:40:33] 6Operations, 10Traffic, 10Wikipedia-Store, 7HTTPS: shop.wikimedia.org should be HTTPS only - https://phabricator.wikimedia.org/T39790#2067606 (10Ppena) @CCogdill_WMF I'm not sure what you mean, I see https throughout the entire flow of the store? Could you confirm? {F3433869}{F3433871} [21:43:49] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:55:28] 6Operations, 10Traffic, 10Wikipedia-Store, 7HTTPS: shop.wikimedia.org should be HTTPS only - https://phabricator.wikimedia.org/T39790#2067663 (10Dzahn) @MZMcBride @Chmarkine can you chime in and show the remaining issues to @Ppena ? @CCogdill_WMF i don't of any hard deadlines, it's just that this ticket h... 
[22:01:39] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail [22:07:40] (03PS13) 10Dzahn: mediawiki: update font packages for jessie [puppet] - 10https://gerrit.wikimedia.org/r/218640 (https://phabricator.wikimedia.org/T102623) [22:12:29] (03CR) 10Dzahn: "@Muehlenhoff i believe i had already addressed your comments back then" [puppet] - 10https://gerrit.wikimedia.org/r/218640 (https://phabricator.wikimedia.org/T102623) (owner: 10Dzahn) [22:13:50] PROBLEM - Apache HTTP on mw1130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:14:59] PROBLEM - HHVM rendering on mw1130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:15:10] PROBLEM - salt-minion processes on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:15:18] PROBLEM - RAID on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:15:38] PROBLEM - configured eth on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:15:39] PROBLEM - puppet last run on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:15:57] farewell mw1130 [22:15:59] PROBLEM - DPKG on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:15:59] PROBLEM - HHVM processes on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:16:01] we hardly knew ye [22:16:08] PROBLEM - nutcracker process on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:16:19] PROBLEM - Check size of conntrack table on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:16:39] PROBLEM - nutcracker port on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:16:49] PROBLEM - SSH on mw1130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:19:48] PROBLEM - Disk space on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:19:59] PROBLEM - dhclient process on mw1130 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:20:22] !log powercycle mw1130 [22:21:23] tx mutante I was halfway there :) [22:22:03] get an IRC bot to power cycle them! [22:22:49] RECOVERY - HHVM processes on mw1130 is OK: PROCS OK: 12 processes with command name hhvm [22:22:49] RECOVERY - DPKG on mw1130 is OK: All packages OK [22:22:59] RECOVERY - nutcracker process on mw1130 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [22:23:08] RECOVERY - Disk space on mw1130 is OK: DISK OK [22:23:10] RECOVERY - Check size of conntrack table on mw1130 is OK: OK: nf_conntrack is 0 % full [22:23:18] RECOVERY - dhclient process on mw1130 is OK: PROCS OK: 0 processes with command name dhclient [22:23:30] RECOVERY - nutcracker port on mw1130 is OK: TCP OK - 0.000 second response time on port 11212 [22:23:39] RECOVERY - SSH on mw1130 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0) [22:23:49] RECOVERY - salt-minion processes on mw1130 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [22:23:58] RECOVERY - RAID on mw1130 is OK: OK: no RAID installed [22:24:10] RECOVERY - configured eth on mw1130 is OK: OK - interfaces up [22:24:19] RECOVERY - Apache HTTP on mw1130 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 498 bytes in 0.063 second response time [22:24:19] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 41 minutes ago with 0 failures [22:25:25] chasemp: yw:) [22:25:28] RECOVERY - HHVM rendering on mw1130 is OK: HTTP OK: HTTP/1.1 200 OK - 72761 bytes in 1.231 second response time [22:28:00] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:37:06] 6Operations, 10Traffic, 10Wikipedia-Store, 7HTTPS: shop.wikimedia.org should be HTTPS only - https://phabricator.wikimedia.org/T39790#2067888 (10CCogdill_WMF) Hmm good point @PPena, I haven't actually checked on this since the store was redesigned. 
I know there used to be an issue with the customer login d... [23:06:44] (03CR) 10Dzahn: "re: "might work after the switch to openldap" this could be the case meanwhile" [puppet] - 10https://gerrit.wikimedia.org/r/229299 (https://phabricator.wikimedia.org/T107702) (owner: 10Dzahn) [23:24:11] 6Operations, 10Traffic, 10Wikipedia-Store, 7HTTPS: shop.wikimedia.org should be HTTPS only - https://phabricator.wikimedia.org/T39790#2068061 (10MZMcBride) 5Open>3Resolved This looks fixed now. Thanks, all! ``` $ curl -Is "http://shop.wikimedia.org/" | grep Location Location: https://shop.wikimedia.or... [23:25:23] ^ cool [23:25:27] that's been open since 2012 [23:25:34] shop https-only [23:42:38] 6Operations, 10hardware-requests, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Log host for codfw (fluorine's equivalent) - https://phabricator.wikimedia.org/T126988#2068297 (10RobH) [23:46:18] 6Operations, 10hardware-requests, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Log host for codfw (fluorine's equivalent) - https://phabricator.wikimedia.org/T126988#2068308 (10RobH) Please note that the server order for this request has been placed on T127092 (which includes all order tracking info.) [23:46:57] 6Operations, 10media-storage: Unable to delete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2068311 (10aaron) Probably caused by T128124. swiftrepl takes time to clean these up. [23:58:14] (03PS2) 10Catrope: Configure default Echo subscriptions user options on he.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/246171 (https://phabricator.wikimedia.org/T114982) (owner: 10Dereckson) [23:58:29] (03CR) 10Catrope: "@Krenair, legoktm: Like this?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/246171 (https://phabricator.wikimedia.org/T114982) (owner: 10Dereckson) [23:58:31] (03CR) 10Tim Landscheidt: [C: 031] "Yes, AFAICS you incorporated the changes correctly ("git show" = each line deleted is inserted somewhere else verbatim (+ 1 new empty line" [puppet] - 10https://gerrit.wikimedia.org/r/260939 (owner: 10Dzahn)