[00:04:38] (PS1) Dzahn: ruthenium: slight syntax change for require_package [puppet] - https://gerrit.wikimedia.org/r/267396
[00:06:18] (CR) Dzahn: [C: 2] ruthenium: slight syntax change for require_package [puppet] - https://gerrit.wikimedia.org/r/267396 (owner: Dzahn)
[00:08:30] !log bd808@mira Synchronized php-1.27.0-wmf.11/includes/session/SessionBackend.php: Remove proposed fix for T125267 (duration: 01m 33s)
[00:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:09:04] (PS1) Dzahn: ruthenium: move comments up above require_package [puppet] - https://gerrit.wikimedia.org/r/267397
[00:09:30] (PS2) Dzahn: visualdiff: move comments up above require_package [puppet] - https://gerrit.wikimedia.org/r/267397
[00:09:52] (CR) Dzahn: [C: 2] visualdiff: move comments up above require_package [puppet] - https://gerrit.wikimedia.org/r/267397 (owner: Dzahn)
[00:10:17] (CR) Dzahn: [V: 2] visualdiff: move comments up above require_package [puppet] - https://gerrit.wikimedia.org/r/267397 (owner: Dzahn)
[00:17:12] (PS1) Dzahn: visualdiff: do not install g++ package, debugging [puppet] - https://gerrit.wikimedia.org/r/267398
[00:17:39] (PS2) Dzahn: visualdiff: do not install g++ package, debugging [puppet] - https://gerrit.wikimedia.org/r/267398
[00:17:50] (CR) Dzahn: [C: 2 V: 2] visualdiff: do not install g++ package, debugging [puppet] - https://gerrit.wikimedia.org/r/267398 (owner: Dzahn)
[00:18:54] ori: hah, i found an issue with require_package that is kind of odd
[00:19:16] ori: when trying to install a package with "++" in it, like "g++"
[00:19:22] that causes this:
[00:19:39] undefined method `function_create_resources'
[00:19:54] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:20:07] it must be the "++", interpreted as incrementing a var
[00:20:08] or something
[00:20:23] as soon as i
dont try to install that package it works
[00:22:26] (CR) Thcipriani: "Just merged the last set of commits to enable a single local command run via scap that should take the place of most of the `install` func" [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[00:27:44] operations: require_package fails to install packages if "++" appears in package name - https://phabricator.wikimedia.org/T125276#1983252 (Dzahn) NEW
[00:29:15] operations: require_package fails to install packages if "++" appears in package name - https://phabricator.wikimedia.org/T125276#1983266 (Dzahn) a:ori
[00:29:31] operations: require_package fails to install packages if "++" appears in package name - https://phabricator.wikimedia.org/T125276#1983252 (Dzahn)
[00:30:07] (CR) Dzahn: "this actually fixed the puppet error on ruthenium." [puppet] - https://gerrit.wikimedia.org/r/267398 (owner: Dzahn)
[00:32:01] operations: require_package fails to install packages if "++" appears in package name - https://phabricator.wikimedia.org/T125276#1983282 (Dzahn) if this gets fixed we should revert https://gerrit.wikimedia.org/r/#/c/267398/
[00:42:38] (PS2) Dzahn: testreduce: Remove ensure => latest from the repo declaration [puppet] - https://gerrit.wikimedia.org/r/267393 (owner: Subramanya Sastry)
[00:42:46] (CR) Dzahn: [C: 2] testreduce: Remove ensure => latest from the repo declaration [puppet] - https://gerrit.wikimedia.org/r/267393 (owner: Subramanya Sastry)
[00:45:11] operations, Icinga: icinga contacts and permissions for ema and elukey - https://phabricator.wikimedia.org/T124941#1983373 (Dzahn) a:Dzahn>ema @ema could you confirm it works for you as well and then just close the ticket or give it back to me?
thanks
[00:46:06] (PS1) Yuvipanda: toollabs: Do not overwrite Host header [puppet] - https://gerrit.wikimedia.org/r/267402
[00:47:00] Ops-Access-Requests, operations: add subbu to parsoid-roots - https://phabricator.wikimedia.org/T125166#1983399 (Dzahn) Just to clarify, is this for all parsoid servers, incl. production or just for the test server ruthenium?
[00:51:26] * bd808 waits on mediawiki-extenstions-php53 ...
[01:00:55] who can give me push rights for the 1.27.0-wmf.11.nosessionmanager MW core branch?
[01:01:00] bd808: ^
[01:03:06] tgr, is it sorted?
[01:03:12] also the same branch in OAuth
[01:03:14] not yet
[01:04:42] tgr, you should already be able to push to that branch?
[01:05:27] I get
[01:05:29] remote: Branch refs/heads/wmf/1.27.0-wmf.11.nosessionmanager:
[01:05:29] remote: You are not allowed to perform this operation.
[01:05:29] remote: To push into this reference you need 'Push' rights.
[01:05:31] on MW repos, refs/tags/wmf/* allows wmf-deployment to force push
[01:05:56] it's not even a force push, on force push I get a different error
[01:06:10] ooh, sorry, refs/heads
[01:06:48] tgr, try now?
[01:07:19] that worked, thanks!
[01:12:49] !log bd808@mira Synchronized php-1.27.0-wmf.11/includes/session: SessionManager: Don't save non-persisted sessions to backend storage (54c796c) (duration: 01m 29s)
[01:13:11] anomie, tgr: ^
[01:15:31] anomie: dropping! -- https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Memcached+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[01:16:10] bytes_in fell down too -- https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=&vl=&x=&n=&hreg[]=mc1001&mreg[]=bytes_in&gtype=line&glegend=show&aggregate=1&embed=1&_=1454110869000
[01:16:16] bd808: \o/
[01:17:34] tgr, can you let me know when you no longer need that acl?
[01:17:43] operations, MediaWiki-Authentication-and-authorization, MW-1.27-release-notes, Patch-For-Review, and 2 others: ~3000% increase in session redis memory usage, causing evictions and session loss - https://phabricator.wikimedia.org/T125267#1983485 (bd808)
[01:17:56] Krenair: I will
[01:18:58] (PS1) Llyrian: Enable data type mathematical expression on wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/267405 (https://phabricator.wikimedia.org/T124931)
[01:29:02] operations, MediaWiki-Authentication-and-authorization, MW-1.27-release-notes, Patch-For-Review, and 2 others: ~3000% increase in session redis memory usage, causing evictions and session loss - https://phabricator.wikimedia.org/T125267#1983509 (bd808) On investigation @anomie determined that Sessio...
[01:29:21] operations, MediaWiki-Authentication-and-authorization, MW-1.27-release-notes, Patch-For-Review, and 2 others: ~3000% increase in session redis memory usage, causing evictions and session loss - https://phabricator.wikimedia.org/T125267#1983512 (bd808) {T125194} could be related if evictions are act...
[01:42:02] (CR) Tim Landscheidt: "I believe this would at least cause http -> https redirects to fail (T66627). If that's true, then in "proxy_redirect http://<%= @web_dom" [puppet] - https://gerrit.wikimedia.org/r/267402 (owner: Yuvipanda)
[01:42:20] bd808: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Memcached+eqiad&h=mc1001.eqiad.wmnet&jr=&js=&v=523764096.000000&m=used_memory&vl=bytes&ti=used_memory seems to be slowly dropping too (as stuff expires from the cache, presumably). It should be back to "normal", whatever that is, at around 2:12 since the problem sessions should all have an hour expiry.
[01:42:55] (CR) Tim Landscheidt: "Eh, remove the last sentence. The http -> http part is done by the $scheme bit."
[puppet] - https://gerrit.wikimedia.org/r/267402 (owner: Yuvipanda)
[01:44:21] anomie: *nod* I think that we have increased baseline storage size and that should be looked at more but things are looking better with the local cache
[01:45:29] YuviPanda: what do you think? Are the graphs related to https://phabricator.wikimedia.org/T125267 headed in a direction that makes you comfortable?
[01:45:45] * bd808 wishes ori was still about to give his opinion
[01:46:35] week view is starting to look good: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Memcached+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[01:47:10] s/good/better/
[01:48:22] bd808: We certainly increased baseline storage size, if only because we're now storing a bunch of metadata about the session. A rough estimate on that is around 350 bytes per session (based on looking at one session).
[01:49:32] what's the N for the 350*N increase?
[01:49:54] How many sessions are active at any particular time?
[01:50:03] yeah I guess that's the question
[01:50:07] (keeping in mind they expire after an hour of inactivity)
[01:51:11] greg-g: that weekly graph is the one that is looking the most improved to me
[01:51:52] greg-g: can we keep it on and get some dinner/rest?
[01:52:34] going to the no-sessonmanager branch right now would be nuts, so its either soldier on for the night or roll back to .10
[01:52:36] bd808: sorry, looking now
[01:52:52] bd808: I don't really have much of an opinion because I don't know much but looking anyway
[01:53:36] YuviPanda: do you know how to get ganglia to spit out that aggregate used_memory graph?
[01:53:37] bd808: yeah, status quo
[01:53:43] nope...
[01:54:03] is that graphite?
[01:54:05] no
[01:54:27] no, it's some ganglia magic trick
[01:54:43] hmm wouldn't looking at evicted keyes be enough?
[01:54:45] I was pretty good with ganglia a year ago but I forgot all the tricks
[01:55:18] ori said (not on the ticket!)
that the evicted keys thing was a bad graph
[01:55:29] hmm
[01:55:31] pshhh, kids these days
[01:55:35] heh
[01:55:37] if it's not a grafana dashboard they dont know what to do
[01:55:39] I thought ori wasn't here >_>
[01:55:44] i wasn't
[01:55:47] reading backlog
[01:55:57] ok :)
[01:56:04] I chanted his name in a dark room while staring at a mirror
[01:56:15] mostly I opened ganglia for the first time in 2 months and have no idea what I'm looking at
[01:56:16] so ignore me :)
[01:56:22] bd808: I've still never done that successfully
[01:56:24] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=mc*&mreg%5B%5D=used_memory&gtype=line&glegend=show&aggregate=1
[01:56:27] (bloody mary)
[01:57:00] as anomie noted, it'll take a full hour to settle, but we're already below 500mb, which means we're no longer storing things faster than they are expiring
[01:58:07] session eviction is why this was an UBN!; network utilization was a symptom but not in and of itself a user-facing problem (perf impact notwithstanding)
[01:59:20] and bytes_out on memcached eqiad is down to pre-wmf11 level anyway
[01:59:27] so, pretty sure you fixed it
[01:59:58] w00t
[01:59:59] wouldn't be the worst idea in the world to keep an eye on it, but it LGTM
[02:00:10] * bd808 throws anomie a party
[02:00:23] yeah I think there is some follow up for sure
[02:02:38] also, it's not just your memory, ganglia got worse. to make these graphs you have to click on the "aggregate graph" tab, but only from the main page, and you usually need a hard refresh of the main page for that tab to be functional
[02:02:40] operations, MediaWiki-Authentication-and-authorization, MW-1.27-release-notes, Patch-For-Review, and 2 others: ~3000% increase in session redis memory usage, causing evictions and session loss - https://phabricator.wikimedia.org/T125267#1983529 (Anomie) What seems to have been happening here is two...
[02:03:05] bd808, ori: I wrote a summary of what went on here at https://phabricator.wikimedia.org/T125267#1983529
[02:03:28] anomie: yeah, looks accurate
[02:05:08] coalescing saves would be nice but can certainly be done later
[02:05:47] i think the most pressing issue is that dinner is getting cold
[02:05:51] :)
[02:05:57] i'm off again. great work! ttyl
[02:06:03] o/
[02:06:12] * bd808 heads to dinner as well
[02:08:50] tgr, so, what's the status?
[02:10:47] Krenair: looks like there's no need to roll back, so I won't need the push right anymore
[02:10:59] (famous last sentences etc)
[02:13:17] ok
[02:13:31] a gerrit admin can put it back if necessary
[02:24:54] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.11) (duration: 10m 24s)
[02:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 30 02:31:56 UTC 2016 (duration 7m 2s)
[02:32:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:35:53] operations, RESTBase, Services, VisualEditor, and 2 others: RESTBase not accessible via Varnish in Beta - https://phabricator.wikimedia.org/T125282#1983538 (mobrovac) NEW
[02:41:21] (PS1) Mobrovac: CXServer: BetaCluster: Fix RESTBase URL [puppet] - https://gerrit.wikimedia.org/r/267416
[02:44:54] (CR) Mobrovac: "Kartik, Santhosh: it'll probably be a good idea to schedule this patch for PuppetSWAT - https://wikitech.wikimedia.org/wiki/PuppetSWAT ."
[puppet] - https://gerrit.wikimedia.org/r/267416 (owner: Mobrovac)
[02:48:44] http://danluu.com/perf-tracing/
[02:50:28] "which aren’t shown because there’s too much going on to effectively visualize"
[02:50:30] operations, Beta-Cluster-Infrastructure, RESTBase, Services, and 2 others: RESTBase not accessible via Varnish in Beta - https://phabricator.wikimedia.org/T125282#1983548 (Krenair)
[02:50:30] lol
[02:54:16] operations, Beta-Cluster-Infrastructure, RESTBase, Services, and 2 others: RESTBase not accessible via Varnish in Beta - https://phabricator.wikimedia.org/T125282#1983551 (Krenair) Open>Resolved a:Krenair ``` <Krenair> 27 FetchError c no backend connection Krenair: a...
[02:54:50] operations, Beta-Cluster-Infrastructure, RESTBase, Services, and 2 others: RESTBase not accessible via Varnish in Beta - https://phabricator.wikimedia.org/T125282#1983554 (Jdforrester-WMF)
[02:59:34] (PS2) Tim Landscheidt: shinken: Add role::labs::instance as hostgroup to all instances [puppet] - https://gerrit.wikimedia.org/r/267039 (https://phabricator.wikimedia.org/T123271)
[03:04:03] PROBLEM - Last backup of the tools filesystem on labstore1001 is CRITICAL: CRITICAL - Last run for unit replicate-tools was over 1 day, 1:00:00 ago
[03:05:24] PROBLEM - Last backup of the others filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-others was exit-code
[03:40:03] !log Deleted old /srv/mediawiki/php-1.27.0-wmf.[1-5] directories across the cluster to match the deployment tree, T124567
[03:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:44:27] (PS1) Tim Landscheidt: shinken: Only regenerate configuration when there are changes [puppet] - https://gerrit.wikimedia.org/r/267423
[04:04:53] PROBLEM - Last backup of the maps filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-maps was exit-code
[04:15:58] (CR) KartikMistry: [C:
1] "Indeed. Thanks!" [puppet] - https://gerrit.wikimedia.org/r/267416 (owner: Mobrovac)
[04:58:46] (PS1) Tim Landscheidt: shinken: Indent and align generated configuration [puppet] - https://gerrit.wikimedia.org/r/267426
[05:02:20] (CR) Tim Landscheidt: "I tested that Shinken treats the configuration in the same way." [puppet] - https://gerrit.wikimedia.org/r/267426 (owner: Tim Landscheidt)
[05:02:38] (CR) BBlack: "+1-ish conceptually. We should probably either do == for the hostname comparison though, or stick with the regex we're using elsewhere fo" [puppet] - https://gerrit.wikimedia.org/r/267381 (https://phabricator.wikimedia.org/T125176) (owner: GWicke)
[05:20:43] Ops-Access-Requests, operations: add subbu to parsoid-roots - https://phabricator.wikimedia.org/T125166#1983578 (ssastry) >>! In T125166#1983399, @Dzahn wrote: > Just to clarify, is this for all parsoid servers, incl. production or just for the test server ruthenium? Good question. Long answer, but I fig...
[05:40:19] (PS2) BBlack: WIP / untested: Don't decode percent encoding for rest.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/267381 (https://phabricator.wikimedia.org/T125176) (owner: GWicke)
[05:41:06] (CR) BBlack: "Should I be worried about the same thing for cxserver and/or citoid?"
[puppet] - https://gerrit.wikimedia.org/r/267381 (https://phabricator.wikimedia.org/T125176) (owner: GWicke)
[06:27:03] PROBLEM - Kafka Broker Under Replicated Partitions on kafka1018 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [10.0]
[06:27:04] PROBLEM - Kafka Broker Under Replicated Partitions on kafka1014 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [10.0]
[06:29:46] (CR) Physikerwelt: [C: -1] Enable data type mathematical expression on wikidata.org (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/267405 (https://phabricator.wikimedia.org/T124931) (owner: Llyrian)
[06:29:54] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:03] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:23] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:35] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:43] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:43] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:15] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 4 failures
[06:31:34] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:44] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:54] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:31:54] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:03] PROBLEM - puppet last run on db2062 is CRITICAL: CRITICAL: puppet fail
[06:32:14] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:33] PROBLEM - puppet last run on mw2129 is CRITICAL:
CRITICAL: Puppet has 2 failures
[06:51:44] operations, MediaWiki-Authentication-and-authorization, MW-1.27-release-notes, Patch-For-Review, and 2 others: ~3000% increase in session redis memory usage, causing evictions and session loss - https://phabricator.wikimedia.org/T125267#1983643 (bd808) Via RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:56:05] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:56:14] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:15] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:56:24] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:56:45] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:56:46] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:56:53] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:57:24] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:57:44] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:44] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:57:54] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:23] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:23] RECOVERY - puppet last run on
mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:55] RECOVERY - puppet last run on db2062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:38] operations, Mail, Mobile, Patch-For-Review: consolidate mailman redirects in exim aliases file - https://phabricator.wikimedia.org/T123581#1983651 (Peachey88) Yes, it can be done on gApps via the groups system.
[07:03:43] PROBLEM - puppet last run on mw2012 is CRITICAL: CRITICAL: puppet fail
[07:04:34] (PS2) Jforrester: Enable VisualEditor by default for some other wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/264765 (https://phabricator.wikimedia.org/T116523)
[07:04:45] (CR) Jforrester: [C: 1] "Re-scheduled for 1 February." [mediawiki-config] - https://gerrit.wikimedia.org/r/264765 (https://phabricator.wikimedia.org/T116523) (owner: Jforrester)
[07:05:01] (PS7) Jforrester: Centralise all VisualEditor feedback pages except for a few wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/258206 (https://phabricator.wikimedia.org/T92661)
[07:05:07] (CR) Jforrester: "Re-scheduled for 1 February."
[mediawiki-config] - https://gerrit.wikimedia.org/r/258206 (https://phabricator.wikimedia.org/T92661) (owner: Jforrester)
[07:31:34] RECOVERY - puppet last run on mw2012 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[07:47:15] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [100000000.0]
[08:08:24] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[08:46:25] (PS1) Legoktm: releases: Fix capitalization of MediaWiki [puppet] - https://gerrit.wikimedia.org/r/267428
[08:46:40] (CR) Legoktm: "Yay, thank you :) I submitted a small followup: https://gerrit.wikimedia.org/r/267428" [puppet] - https://gerrit.wikimedia.org/r/267299 (https://phabricator.wikimedia.org/T125164) (owner: Dzahn)
[08:51:14] operations, Commons, MassMessage: Not all mass messages sent out. - https://phabricator.wikimedia.org/T125214#1983672 (Saqib) Any development guys ?
[08:51:52] legoktm: ^
[08:52:15] * legoktm looks
[08:56:14] operations, Commons, MassMessage: Not all mass messages sent out. - https://phabricator.wikimedia.org/T125214#1983675 (Legoktm) This is probably a dupe of T124414, I can re-trigger the messages from the server like last time (T124441), but a) there's a good chance they'll just fail again, and b) if mul...
[08:56:58] ori: thanks
[08:57:59] (CR) Kelson: [C: 1] Add 2 sites to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995) (owner: Mdann52)
[09:00:28] legoktm: thank you! :)
[09:18:01] operations, Commons, MassMessage: Not all mass messages sent out. - https://phabricator.wikimedia.org/T125214#1983691 (Saqib) Please note , we have removed the users from the list who have already received the message so if the process restart, no one will get duplicate message.
[09:38:20] operations, Commons, MassMessage: Not all mass messages sent out. - https://phabricator.wikimedia.org/T125214#1983705 (Tgr) Per https://logstash.wikimedia.org/#dashboard/temp/AVKR5QQ1ptxhN1XadWGe, 10800 errors, 10300 have SessionManager in the stack trace.
[09:38:29] operations, Commons, MassMessage: Not all mass messages sent out. - https://phabricator.wikimedia.org/T125214#1983706 (Tgr)
[10:38:07] <_joe_> subbu: sorry, I guess someone already did restart those services
[12:33:34] PROBLEM - puppet last run on mw2181 is CRITICAL: CRITICAL: puppet fail
[12:50:03] operations, Commons, MassMessage: Not all mass messages sent out. - https://phabricator.wikimedia.org/T125214#1983892 (Steinsplitter) Please delete all failed msg from queue to prevent duplicates. @Tgr @Legoktm
[13:00:57] !log discard preserved cache on ms-be2003, powercycle
[13:01:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:01:34] RECOVERY - puppet last run on mw2181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:18:11] (CR) Luke081515: [C: 1] Add 2 sites to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995) (owner: Mdann52)
[13:26:47] operations, Patch-For-Review: install/setup/deploy server rhodium as puppetmaster (scaling out) - https://phabricator.wikimedia.org/T98173#1983927 (akosiaris) Toss the puppet cert. I should have cleaned it but I forgot about it (though I did remember about the salt key). Otherwise, status is the same. Stal...
[13:27:03] operations: rhodium.eqiad.wmnet status? - https://phabricator.wikimedia.org/T125056#1983928 (akosiaris) Answered on that ticket as well.
I think we can close this one
[14:23:33] (CR) Addshore: [C: 1] releases: Fix capitalization of MediaWiki [puppet] - https://gerrit.wikimedia.org/r/267428 (owner: Legoktm)
[14:57:08] operations: rhodium.eqiad.wmnet status? - https://phabricator.wikimedia.org/T125056#1983985 (ArielGlenn) Open>Resolved worksforme
[14:58:27] operations, Patch-For-Review: install/setup/deploy server rhodium as puppetmaster (scaling out) - https://phabricator.wikimedia.org/T98173#1983996 (ArielGlenn) ok, thanks for the update, tossed!
[15:18:47] operations, Collaboration-Team-Backlog, Flow: Flow messages are not editable and new topics can't be posted (API outage) - https://phabricator.wikimedia.org/T125080#1984048 (matmarex) https://wikitech.wikimedia.org/wiki/Incident_documentation/20160128-MediaWiki-API
[15:19:39] operations, Salt: Move salt master to separate host from puppet master - https://phabricator.wikimedia.org/T115287#1984052 (ArielGlenn) rhodium is waiting to be installed, puppet cert is dead so I tossed it. Not responding to salt as of today are: analytics1017.eqiad.wmnet, still no ssh in etc, and thre...
[15:21:08] operations, MediaWiki-Parser, Traffic, Wikidata, Wikidata-Page-Banner: Banners fail to show up occassionally on Russian Wikivoyage - https://phabricator.wikimedia.org/T121135#1984054 (faidon)
[15:56:40] "Session error. Please try again" when trying to CU. Logged-off and in a couple of times now. Same error.
[16:14:12] mafk: There are issues with the new SessionManager stuff.. If you are able to reproduce that issue repeatedly, file a task and cc anomie and tgr
[16:15:40] Glaisher: will do
[16:15:44] Glaisher: is at bgwiki
[16:17:37] which project fwiw?
[16:22:38] authentication and authorization + CU
[16:22:50] If it's wrong, someone would fix it ;)
[16:23:40] cc'd you Glaisher btw
[16:23:48] ok, fixing that
[16:26:59] mafk: I only have a minute or two right now, but if you can I'd like to check something quick.
Can you PM me the value of your bgwikiSession cookie?
[16:27:23] anomie: if you tell me how to get it, I'll do :)
[16:28:24] We'll have to do it some other time then. Sorry.
[16:29:06] ok
[16:33:55] operations, Icinga: icinga contacts and permissions for ema and elukey - https://phabricator.wikimedia.org/T124941#1984105 (ema) Open>Resolved I can confirm that logging in as ema I can access the dashboard but I can't execute commands. Everything works fine as Ema though. Thanks @Dzahn!
[17:41:16] (PS2) Llyrian: Enable data type mathematical expression on wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/267405 (https://phabricator.wikimedia.org/T124931)
[17:46:05] (CR) Physikerwelt: "@Hoo man do you think we should keep the line? Removing it completly would have the same effect." [mediawiki-config] - https://gerrit.wikimedia.org/r/267405 (https://phabricator.wikimedia.org/T124931) (owner: Llyrian)
[18:02:10] not at all urgent, but if any root is around, could you run these on ruthenium (a) cd /srv/visualdiff; git pull (b) restart parsoid-vd-client service
[18:07:45] (CR) GWicke: "@bblack: Yes, those should also use REST APIs, so we should not decode slashes for those either." [puppet] - https://gerrit.wikimedia.org/r/267381 (https://phabricator.wikimedia.org/T125176) (owner: GWicke)
[18:40:37] <_joe_> subbu: I'm here, I can do that
[18:41:23] <_joe_> subbu: I assume needing a root is temporary, right?
[18:43:53] <_joe_> !log updated visualdiff, restarted parsoid-vd
[18:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:44:10] <_joe_> subbu: {{done}}
[18:44:38] <_joe_> btw b) is systemctl restart parsoid-vd.service
[18:45:21] <_joe_> sorry, parsoid-vd-client.service
[19:02:29] (Abandoned) Halfak: Sets ORES redis cache_maxmemory => '2G' [puppet] - https://gerrit.wikimedia.org/r/261642 (owner: Halfak)
[19:27:11] _joe_: i was logged in now a another user!
which is not me
[19:27:46] anyone else had this issue ?
[19:29:05] on the wikis?
[19:29:21] he.wikipedia
[19:29:57] afaik, the sessions issue hasn't been fully fixed yet
[19:30:23] there are still some sporadic reports of user craziness
[19:30:37] thanks mobrovac
[19:31:04] i hope no one got my access, it will be problematic if someone will become a steward :)
[19:31:29] euh probably, yes
[19:31:46] matanya: try the usual - log out, clear cookies and try logging back in
[19:31:55] did that
[19:32:00] it solved it
[19:32:07] but i wanted to falg the issue
[19:32:11] cool, that's a good sign
[21:00:13] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 228.17, 142.38, 68.50
[21:27:09] (PS1) BryanDavis: Revert "monolog: Ensure that context data added by WebProcessor is utf-8 safe" [mediawiki-config] - https://gerrit.wikimedia.org/r/267469 (https://phabricator.wikimedia.org/T119594)
[21:27:15] matanya: there is a bug report about that already
[21:28:17] (PS2) BryanDavis: Revert "monolog: Ensure that context data added by WebProcessor is utf-8 safe" [mediawiki-config] - https://gerrit.wikimedia.org/r/267469 (https://phabricator.wikimedia.org/T119594)
[21:50:39] (CR) Steinsplitter: [C: 1] Add 2 sites to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995) (owner: Mdann52)
[21:58:23] PROBLEM - puppet last run on db2067 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:02:28] <_joe_> matanya: sorry I was AFK, but other people way more qualified than me to solve this are looking
[22:02:55] thanks _joe_ i have been contacted. :)
[22:06:47] _joe_, thanks. yes this is temporary .. till T124701 and T125166 is resolved on monday in ops meeting.
[22:06:56] <_joe_> ok
[22:07:06] <_joe_> I will look into making that happen :)
[22:08:09] _joe_, ok :) for now, can you also give me a dump of the logs for parsoid-vd-client ..
journalctl -n 1000 -u parsoid-vd-client > /tmp/ ?
[22:08:24] <_joe_> subbu: ok
[22:09:33] <_joe_> subbu: it's all "the server has no work for us"
[22:09:50] <_joe_> I can drop it to a file, but I don't really see a point :)
[22:10:27] ah, ok. at least that is good. so, the client is not broken anymore.
[22:10:35] yes, not needed.
[22:11:36] <_joe_> ok :)
[22:14:53] i'll have to purge test entries (from earlier failures) from the db .. but i don't have access to the db yet. i'll file another access request for that now. and deal with this on monday.
[22:16:35] <_joe_> yeah if it's not urgent as in "the site is burning down" I'd rather not clean the db now
[22:18:18] Ops-Access-Requests, operations, Parsing-Team, Patch-For-Review: Getting parsing-team members sudo access to manage (start, stop, restart) services on ruthenium - https://phabricator.wikimedia.org/T124701#1984408 (ssastry) Can parsoid-rt-admin members get mysql client access to the testreduce_0715 a...
[22:21:13] RECOVERY - puppet last run on db2067 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[22:32:39] (PS1) BryanDavis: Enable debug level session logging to fluorine [mediawiki-config] - https://gerrit.wikimedia.org/r/267512
[22:33:17] (CR) BryanDavis: [C: 2] Enable debug level session logging to fluorine [mediawiki-config] - https://gerrit.wikimedia.org/r/267512 (owner: BryanDavis)
[22:33:42] (Merged) jenkins-bot: Enable debug level session logging to fluorine [mediawiki-config] - https://gerrit.wikimedia.org/r/267512 (owner: BryanDavis)
[22:36:15] !log bd808@mira Synchronized wmf-config/InitialiseSettings.php: Enable debug level session logging to fluorine (5ac9412) (duration: 01m 26s)
[22:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:40:26] (PS1) Ebrahim: toolslabs: install hunspell and libhunspell-dev to exec nodes [puppet] - https://gerrit.wikimedia.org/r/267513
(https://phabricator.wikimedia.org/T125193)
[22:58:25] (PS1) BryanDavis: Revert "Enable debug level session logging to fluorine" [mediawiki-config] - https://gerrit.wikimedia.org/r/267516
[22:58:33] (CR) BryanDavis: [C: 2] Revert "Enable debug level session logging to fluorine" [mediawiki-config] - https://gerrit.wikimedia.org/r/267516 (owner: BryanDavis)
[22:59:06] (Merged) jenkins-bot: Revert "Enable debug level session logging to fluorine" [mediawiki-config] - https://gerrit.wikimedia.org/r/267516 (owner: BryanDavis)
[23:01:32] !log bd808@mira Synchronized wmf-config/InitialiseSettings.php: Revert Enable debug level session logging to fluorine (17bfb06) (duration: 01m 28s)
[23:01:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:12:23] (PS1) BryanDavis: Revert all wikis to 1.27.0-wmf.10 (again) [mediawiki-config] - https://gerrit.wikimedia.org/r/267517
[23:14:15] (CR) BryanDavis: [C: 2] Revert all wikis to 1.27.0-wmf.10 (again) [mediawiki-config] - https://gerrit.wikimedia.org/r/267517 (owner: BryanDavis)
[23:14:43] (Merged) jenkins-bot: Revert all wikis to 1.27.0-wmf.10 (again) [mediawiki-config] - https://gerrit.wikimedia.org/r/267517 (owner: BryanDavis)
[23:20:05] !log bd808@mira rebuilt wikiversions.php and synchronized wikiversions files: Revert all wikis to 1.27.0-wmf.10 (again)
[23:20:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:20:14] sadness
[23:25:03] * MaxSem hugs bd808
[23:32:04] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).