[00:19:29] operations, Database: new external storage cluster(s) - https://phabricator.wikimedia.org/T105843#1452813 (Springle) NEW a:Springle
[00:25:23] RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1426 bytes in 0.219 second response time
[00:28:03] operations, Traffic, HTTPS, Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1452828 (Anomie) AnomieBOT should be fixed now. Thought I had done that before, but apparently not.
[00:36:32] operations, Wikimedia-Logstash: reinstall logstash1001-1003 - https://phabricator.wikimedia.org/T97545#1452840 (bd808) >>! In T97545#1452296, @RobH wrote: > It appears that we are now in the steps of relocating two of the three systems into different racks, correct? > > We'll have @cmjohnson then remove...
[00:39:44] operations, Wikimedia-Logstash: reinstall logstash1001-1003 - https://phabricator.wikimedia.org/T97545#1452845 (bd808) I will be traveling on 2015-07-20 and probably not online until the SF afternoon on 2015-07-21 so it would be great if they were not reimaged before 2015-07-22 unless someone else can be...
[00:44:51] (CR) Springle: [C: 2] s6 pager slave partitioning [software] - https://gerrit.wikimedia.org/r/223732 (owner: Springle)
[00:44:59] (CR) Springle: [V: 2] s6 pager slave partitioning [software] - https://gerrit.wikimedia.org/r/223732 (owner: Springle)
[00:50:07] (CR) Springle: [C: 1] "db1001 is still the active m1-master. It needs more work before decom (remember how db9 stayed around forever? same problem)." [puppet] - https://gerrit.wikimedia.org/r/224558 (https://phabricator.wikimedia.org/T105768) (owner: John F.
Lewis)
[01:15:34] PROBLEM - Apache HTTP on mw1156 is CRITICAL - Socket timeout after 10 seconds
[01:17:14] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.091 second response time
[02:02:56] !log LocalisationUpdate failed (1.26wmf13) at 2015-07-15 02:02:55+00:00
[02:03:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:06:22] lol
[02:07:49] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 15 02:07:48 UTC 2015 (duration 7m 47s)
[02:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:08:07] * Reedy has a look
[02:09:04] LU fails, but LU cache refresh worked, heh.
[02:09:18] lol
[02:09:22] bit useless
[02:10:16] !log Running LU manually to see what's wrong with it
[02:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:18:59] Taking a while
[02:24:13] Rebuilding localization cache at 2015-07-15 02:11:03+00:00
[02:24:13] Completed at 2015-07-15 02:22:19+00:00. Copying LC files to /srv/mediawiki-staging
[02:24:18] I hate computers
[02:25:01] Then sync-dir failed
[02:25:50] Hang on
[02:25:56] Is there 2 l10nupdates running?
[02:26:36] Before-deploying checklist complete down to final items --- l10n, cdb and flaps to go
[02:26:39] am i doing it right?
[02:27:02] Reedy: terbium?
[02:27:04] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[02:27:15] tin
[02:27:25] what are you doing?
[02:27:35] 997 3719 0.0 0.0 45760 2644 ? Ss 02:26 0:00 /usr/bin/ssh -oBatchMode=yes -oSetupTimeout=10 -F/dev/null -ll10nupdate mw2053.codfw.wmnet /srv/deployment/scap/scap/bin/sync-common --no-update-l10n --include php-1.26wmf13 --include php-1.26wmf13/cache --include php-1.26wmf13/cache/l10n --include php-1.26wmf13/cache/l10n/*** mw1010.eqiad.wmnet mw1033.eqiad.wmnet mw1070.eqiad.wmnet mw1097.eqiad.wmnet mw1216.eqiad.
[02:27:35] wmnet mw1161.eqiad.wmnet mw1201.eqiad.wmnet mw2001.codfw.wmnet mw2041.codfw.wmnet mw2080.codfw.wmnet mw2119.codfw.wmnet mw2187.codfw.wmne
[02:27:41] why is that showing as 997 not a user?
[02:27:43] nothing, a lame joke about comparing it to flight training checklist before landing
[02:28:03] PROBLEM - puppet last run on mw1090 is CRITICAL Puppet last ran 6 hours ago
[02:28:33] Oh
[02:28:38] I see when reading it again :P
[02:29:01] 997 - so that user doesnt really exist, yea
[02:29:04] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 497 bytes in 0.007 second response time
[02:29:10] looks
[02:29:18] Yesterday too
[02:29:19] 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-14 02:35:21+00:00
[02:29:19] 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 07m 27s)
[02:29:19] 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 02:07:32 UTC 2015 (duration 7m 30s)
[02:29:19] 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-14 02:02:33+00:00
[02:30:29] on terbium 997 is "gmetric" .. eh
[02:30:33] no consistency
[02:30:51] i'm on tin atm
[02:31:02] i wonder if there's 2 crons for this
[02:32:39] (PS1) John F. Lewis: labs: set project mount to true on wikistats [puppet] - https://gerrit.wikimedia.org/r/224739
[02:32:48] might explain the fail and succeed
[02:33:02] l10nupdate:x:997:10002::/home/l10nupdate:/bin/bash
[02:33:06] (PS2) John F. Lewis: labs: set project mount to true on wikistats [puppet] - https://gerrit.wikimedia.org/r/224739
[02:33:07] there it is, 997
[02:33:44] i see a single cronjob
[02:33:49] 0 2 * * * /usr/local/bin/l10nupdate-1 --verbose >> /var/log/l10nupdatelog/l10nupdate.log 2>&1
[02:34:16] I'm just wondering why it runs once, says it fails, then is seemingly running itself again after
[02:34:16] (PS3) John F.
Lewis: labs: set project mount to true on wikistats [puppet] - https://gerrit.wikimedia.org/r/224739
[02:34:18] (PS3) John F. Lewis: remove db100[2-7] from install_server and coredb [puppet] - https://gerrit.wikimedia.org/r/224558
[02:35:05] Ah
[02:35:11] I wonder if ori made some changes to it
[02:35:13] once with /bin/bash , once with /bin/sh
[02:35:32] unless one is you
[02:35:45] also wondering if that user should really have the shell
[02:36:05] I ran one with sudo, but it failed on the sync-dir file
[02:36:34] a) bash /usr/local/bin/sudo-withagent l10nupdate ...
[02:36:42] b) /bin/sh -c /usr/local/bin/l10nupdate-1 ..
[02:36:43] one might be me
[02:36:51] c) /bin/bash /usr/local/bin/l10nupdate-1 ..
[02:36:52] though, I didn't run it with --verose
[02:37:02] Oh
[02:37:05] both are running with --verbose
[02:37:06] it probably calls it on itself
[02:37:09] and then also this:
[02:37:35] Failed to sync-dir 'php-1.26wmf13/cache/l10n'
[02:37:36] sigh
[02:37:40] /scap/bin/sync-common --no-update-l10n
[02:38:27] (CR) Dzahn: [C: 2] "thanks John" [puppet] - https://gerrit.wikimedia.org/r/224739 (owner: John F. Lewis)
[02:40:43] Reedy: i was about to say kill them all and start a single fresh one.. but looks like you are
[02:41:09] Nope, they died
[02:41:10] :P
[02:42:29] (PS4) John F. Lewis: labs: set project mount to true on wikistats [puppet] - https://gerrit.wikimedia.org/r/224739
[02:43:02] Reedy: just saw this from ostriches. any relation ? https://phabricator.wikimedia.org/T105850
[02:43:20] Nah
[02:43:23] since it's l10n logs
[02:43:24] ok
[02:43:24] It's just log noise
[02:43:30] Thought it was worth reporting
[02:43:56] i'll run it manually again
[03:00:43] PROBLEM - puppet last run on cp3008 is CRITICAL puppet fail
[03:03:21] !log es1.6 upgrade: raised limits on shard migration rate - should speed up the restart.
we should lower it before we do restarts during europe's morning
[03:03:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:07:13] Oh
[03:07:19] Is it that apache that's ro?
[03:10:35] !log reedy Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 13m 32s)
[03:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:10:45] go me
[03:14:21] !log LocalisationUpdate completed (1.26wmf13) at 2015-07-15 03:14:21+00:00
[03:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:28:44] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:29:04] !log es1.6 upgrade: upgrade elastic1012
[03:29:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:49:36] !log upgrade db1030 trusty
[03:49:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:55:50] (PS1) Springle: upgrade db1030 trusty [puppet] - https://gerrit.wikimedia.org/r/224742
[03:56:48] (CR) Springle: [C: 2] upgrade db1030 trusty [puppet] - https://gerrit.wikimedia.org/r/224742 (owner: Springle)
[04:12:12] !log es1.6 upgrade: upgrade elastic1013
[04:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:21:24] (PS1) Springle: move db1030 to correct node definition [puppet] - https://gerrit.wikimedia.org/r/224744
[04:22:31] (CR) Springle: [C: 2] move db1030 to correct node definition [puppet] - https://gerrit.wikimedia.org/r/224744 (owner: Springle)
[04:28:05] !log es1.6 upgrade: lowered the shard transfer settings back to our normal rate. going to bed.
[04:28:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:52:27] DB connection error: Can't connect to MySQL server on '208.80.154.136' (4) (208.80.154.136)
[04:52:31] Was transient
[04:52:36] during manual l10nupdate
[04:59:52] operations, user-notice: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#1453063 (Dzahn)
[05:01:25] Reedy: did you get l10nupdate hammered into working order?
[05:04:18] operations, user-notice: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#1453069 (MZMcBride) It would be nice to know how long the maintenance window is intended to be and consequently what the estimated/expected downtime will be. It may make sense to include this informatio...
[05:04:37] Reedy: another data point for T98682
[05:10:26] !log db1030 busy removing table partitioning
[05:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:12:16] operations, Database: new external storage cluster(s) - https://phabricator.wikimedia.org/T105843#1453075 (Legoktm)
[05:19:51] <_joe|AFK> springle: es* servers are low on disk space
[05:24:58] _joe_: noticed, thanks. T105843
[05:25:13] _joe_: https://phabricator.wikimedia.org/T105843 is the ticket for it I think
[05:26:14] <_joe_> let's ditch ES and move to cassandra, it has proven to be so stable!
[05:26:46] :)
[05:26:49] fine with me
[05:27:05] * springle washes hands of ES; wanders off
[05:29:36] <_joe_> nah you will be responsible for that, but nice try mate
[05:31:06] aww
[05:40:37] !log es1.6 upgrade: upgrade elastic1014
[05:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:46:44] operations, Traffic, HTTPS, Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1453085 (Legoktm) > 234 php wikibot classes That is probably me, [[https://github.com/legoktm/harej-bots/commit/d93d3d6990f7b3a8cc3154c0a589cb6c34bcbc87|this]] should fix it. (Yes th...
[05:50:01] (PS8) Alex Monk: Enable Echo on Wikimedia wikis by default [mediawiki-config] - https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T59375) (owner: Withoutaname)
[06:17:10] Blocked-on-Operations, operations, Parsoid, Services: Offer io.js on Jessie - https://phabricator.wikimedia.org/T91855#1453090 (MoritzMuehlenhoff) There doesn't seem to be any activity wrt packaging io.js in Debian (and the already existing nodejs is already poorly maintained). As much as I dislike...
[06:27:50] (PS2) Muehlenhoff: Enable packet filter for heze [puppet] - https://gerrit.wikimedia.org/r/224576
[06:30:14] PROBLEM - puppet last run on cp2002 is CRITICAL Puppet has 2 failures
[06:30:18] (CR) Muehlenhoff: [C: 2 V: 2] Enable packet filter for heze [puppet] - https://gerrit.wikimedia.org/r/224576 (owner: Muehlenhoff)
[06:31:14] PROBLEM - puppet last run on db2064 is CRITICAL Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw1170 is CRITICAL Puppet has 2 failures
[06:31:24] PROBLEM - puppet last run on mw2145 is CRITICAL Puppet has 1 failures
[06:31:26] !log es1.6 upgrade: upgrade elastic1015
[06:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:31:45] PROBLEM - puppet last run on lvs1003 is CRITICAL Puppet has 1 failures
[06:31:45] PROBLEM - puppet last run on mw2043 is CRITICAL Puppet has 1 failures
[06:31:54] PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 2 failures
[06:32:04] PROBLEM - puppet last run on cp3007 is CRITICAL Puppet has 1 failures
[06:32:15] PROBLEM - puppet last run on mc2015 is CRITICAL Puppet has 1 failures
[06:32:15] PROBLEM - puppet last run on mw1061 is CRITICAL Puppet has 1 failures
[06:32:44] PROBLEM - puppet last run on subra is CRITICAL Puppet has 1 failures
[06:32:44] PROBLEM - puppet last run on mw2158 is CRITICAL Puppet has 1 failures
[06:32:53] PROBLEM - puppet last run on mw1158 is CRITICAL Puppet has 1 failures
[06:32:53] PROBLEM - puppet last run on mw1220 is CRITICAL Puppet has 1 failures
[06:33:43] PROBLEM - puppet last run on mw2129 is CRITICAL Puppet has 1 failures
[06:35:29] (CR) Muehlenhoff: "We already have existing reviews for memcached and redis (https://gerrit.wikimedia.org/r/#/c/222554/ and https://gerrit.wikimedia.org/r/#/" [puppet] - https://gerrit.wikimedia.org/r/188715 (https://phabricator.wikimedia.org/T86898) (owner: Dzahn)
[06:53:39] operations, user-notice: schedule maintenance for IRC server -
https://phabricator.wikimedia.org/T105804#1453119 (Joe) p:Triage>Normal
[06:55:14] RECOVERY - puppet last run on db2064 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:55:18] (PS1) Chmarkine: Update links on dumps.wm.org to HTTPS [puppet] - https://gerrit.wikimedia.org/r/224750
[06:55:43] RECOVERY - puppet last run on lvs1003 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:55:44] RECOVERY - puppet last run on mw2043 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures
[06:56:04] RECOVERY - puppet last run on cp2002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:09] (CR) Muehlenhoff: "I noticed one more thing: Now that the thresholds are configurable, you also neeed to pass the thresholds to the nrpe_command option of nr" [puppet] - https://gerrit.wikimedia.org/r/223560 (owner: Matanya)
[06:56:13] RECOVERY - puppet last run on mc2015 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:56:35] RECOVERY - puppet last run on subra is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:43] RECOVERY - puppet last run on mw2158 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:56:44] RECOVERY - puppet last run on mw1220 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:13] RECOVERY - puppet last run on mw1170 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:14] RECOVERY - puppet last run on mw2145 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:35] RECOVERY - puppet last run on mw2129 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:57:45] RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:54] RECOVERY - puppet last run on cp3007 is OK Puppet is currently enabled, last run 1
minute ago with 0 failures
[06:58:04] RECOVERY - puppet last run on mw1061 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:34] RECOVERY - puppet last run on mw1158 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:12:51] (CR) Gilles: "Is this scheduled for deployment?" [mediawiki-config] - https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) (owner: Aaron Schulz)
[07:13:37] (CR) Gilles: [C: 1] grafana: Set a default dashboard [puppet] - https://gerrit.wikimedia.org/r/224129 (owner: Krinkle)
[07:22:13] (CR) Aaron Schulz: "Any SWAT really. It was supposed to be yesterday." [mediawiki-config] - https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) (owner: Aaron Schulz)
[08:27:14] !log es1.6 upgrade: upgrade elastic1016
[08:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:33:00] _joe_: ping?
[08:33:34] <_joe_> SMalyshev: hi!
[08:33:47] <_joe_> Sorry I was at dinner when you pinged me
[08:33:53] _joe_: hi! no problem
[08:34:03] <_joe_> I'm finishing writing some tests and then I'll work on your patch
[08:34:24] _joe_: cool. I'm going to bed soon but will read any comments you leave in the morning.
[08:34:35] <_joe_> also, I'll formalize the hardware procurement request today
[08:34:52] _joe_: cool! How long it may take, do you have any idea?
[08:35:03] <_joe_> SMalyshev: if you prefer, I'll just comment the patch and you can amend it, or I can amend it myself
[08:35:14] _joe_: if you can amend it it'd be great
[08:35:44] <_joe_> SMalyshev: no idea right now, sorry, let's say less than a month typically, around a week if we have spares
[08:35:57] <_joe_> SMalyshev: amending it might actually be easier than comment extensively :)
[08:36:17] <_joe_> but we're well set to have everything in prod by the end of the quarter
[08:36:40] <_joe_> (remember we will still need to set up varnish etc once we're done with the basic puppet setup)
[08:37:29] _joe_: so we have a conference mid-Aug (18-19) where I will be presenting with Blazegraph folks about it. If we had something running by then (doesn't have to be perfectly finished) it's be great. Not a must (it wasn't in quarterly goals) but may be nice
[08:38:00] _joe_: yeah I remember the varnish things... But that should not be too hard (unless of course I miss something :)
[08:38:41] <_joe_> I think we should be able to do that
[08:39:07] <_joe_> but it strongly depends on hw procurement
[08:39:42] from what I looked at https://wikitech.wikimedia.org/wiki/Server_Spares we have couple of spares that should be good for our purposes (wmf354*)
[08:39:49] <_joe_> yes
[08:40:06] <_joe_> that's what I'm inclined to think, let's hope no one stole them :P
[08:40:11] yeah :)
[08:40:20] <_joe_> sometimes people are slow in updating that page
[08:42:17] wmf3151 might do as a backup option, there's also a lot of WMF31* there but they have less memory... might work for some time though
[08:42:44] at least for starters
[08:42:58] <_joe_> SMalyshev: we also still have einsteinium I think/but I need to check
[08:43:25] _joe_: yes but that one has spinning disks.
Once we get the SSD ones we can probably return einsteinium
[08:43:54] <_joe_> ok
[08:47:55] operations, Continuous-Integration-Infrastructure: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1453240 (hashar)
[09:01:55] (PS1) Filippo Giunchedi: install-server: fix logstash partman recipe [puppet] - https://gerrit.wikimedia.org/r/224757 (https://phabricator.wikimedia.org/T104035)
[09:06:07] (PS2) Filippo Giunchedi: install-server: fix logstash partman recipe [puppet] - https://gerrit.wikimedia.org/r/224757 (https://phabricator.wikimedia.org/T104035)
[09:21:48] operations, Graphite, Monitoring: deprecate gdash - https://phabricator.wikimedia.org/T104365#1453308 (fgiunchedi) according to https://github.com/grafana/grafana/issues/22 there's support to render png from graphite, I've tried it and it is per-graph not per-dashboard so changing it manually would be t...
[09:23:51] godog: hello! is gdash deprecated in favor of grafana ?
[09:25:57] hey hashar, I think people are still playing with it, see thread in ^
[09:43:40] (PS3) Filippo Giunchedi: diamond: service stats puppet integration [puppet] - https://gerrit.wikimedia.org/r/224094
[09:43:42] (PS3) Filippo Giunchedi: diamond: add upstart/systemd service stats [puppet] - https://gerrit.wikimedia.org/r/224093
[09:44:07] operations, ops-codfw, Swift: ms-be2013 - swift-storage/sdc1 is not accessible: Input/output error - https://phabricator.wikimedia.org/T105213#1453330 (fgiunchedi) Open>Resolved complete
[10:18:36] (CR) Mobrovac: service::node: auto-monitoring of local endpoints (5 comments) [puppet] - https://gerrit.wikimedia.org/r/223328 (https://phabricator.wikimedia.org/T94821) (owner: Giuseppe Lavagetto)
[10:50:40] operations: long-running root console sessions - https://phabricator.wikimedia.org/T105869#1453417 (fgiunchedi) NEW
[10:54:39] (PS1) Addshore: Fix link to WD Json dumps on other dumps html page [puppet] -
https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307)
[11:18:51] (CR) BBlack: [C: 1] Update links on dumps.wm.org to HTTPS [puppet] - https://gerrit.wikimedia.org/r/224750 (owner: Chmarkine)
[11:28:44] PROBLEM - puppet last run on mw1141 is CRITICAL Puppet has 1 failures
[11:32:20] (PS1) Dereckson: Closes wikimania2014. [mediawiki-config] - https://gerrit.wikimedia.org/r/224770
[11:33:34] (CR) Dereckson: [C: -1] "Not yet, we wait for wikimania2014 team agreement." [mediawiki-config] - https://gerrit.wikimedia.org/r/224770 (owner: Dereckson)
[11:33:54] (PS2) Dereckson: Closes wikimania2014. [mediawiki-config] - https://gerrit.wikimedia.org/r/224770 (https://phabricator.wikimedia.org/T105675)
[11:36:32] (PS1) Dereckson: Cleans up wikimania2013. extraneous rights [mediawiki-config] - https://gerrit.wikimedia.org/r/224771
[11:43:21] !log es1.6 upgrade: upgrade elastic1017
[11:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:44:02] (PS1) Dereckson: Set wmgULSPosition to personal on gom.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/224772 (https://phabricator.wikimedia.org/T105124)
[11:53:23] RECOVERY - puppet last run on mw1141 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[13:06:13] operations, Database: db1022 duplicate key errors - https://phabricator.wikimedia.org/T105879#1453705 (Springle) NEW a:Springle
[13:06:37] operations, Database: db1022 duplicate key errors - https://phabricator.wikimedia.org/T105879#1453713 (Springle) For now, I will depool db1022.
[13:10:08] (PS1) Springle: depool db1022 T105879 [mediawiki-config] - https://gerrit.wikimedia.org/r/224783
[13:10:50] (CR) Springle: [C: 2] depool db1022 T105879 [mediawiki-config] - https://gerrit.wikimedia.org/r/224783 (owner: Springle)
[13:10:56] (Merged) jenkins-bot: depool db1022 T105879 [mediawiki-config] - https://gerrit.wikimedia.org/r/224783 (owner: Springle)
[13:12:09] operations, RESTBase, RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1453721 (BBlack) I had to backtrack a bit into the ref'd closed ticket, but it seems that the fundamental drivers for multiple instances per HW node is that: - If th...
[13:12:16] !log springle Synchronized wmf-config/db-eqiad.php: depool db1022 T105879 (duration: 00m 12s)
[13:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:15:31] !log sync-common on mw1216 after sync-file from tin failed non-zero exit status 12
[13:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:19:40] 12 ENOMEM, but am not sure if it was mw1216 or tin. they both now seem fine
[13:20:10] _joe_: ^ seen that recently at all?
[13:20:36] <_joe_> springle: lemme see
[13:20:48] <_joe_> springle: sure mw1216? it's a scap proxy AFAIR
[13:21:54] <_joe_> http://ganglia.wikimedia.org/latest/?c=Application%20servers%20eqiad&h=mw1216.eqiad.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 looks like more than healthy
[13:21:58] oh actually, i'm wrong. it's mw1090
[13:22:43] http://aerosuidae.net/paste/2f0b0/55a65c70
[13:24:03] !log entry below not mw1216 fault, but r/o filesystem error on mw1090
[13:24:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:24:36] _joe_: i see you already know about mw1090.
sorry for the noise
[13:24:54] but it would be nice to depool it from scap :)
[13:25:56] <_joe_> springle: I assumed someone else did, sorry
[13:34:19] (CR) Jforrester: Closes wikimania2014. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/224770 (https://phabricator.wikimedia.org/T105675) (owner: Dereckson)
[13:38:15] (PS3) Dereckson: Close wikimania2014 [mediawiki-config] - https://gerrit.wikimedia.org/r/224770 (https://phabricator.wikimedia.org/T105675)
[13:55:20] !log es1.6 upgrade: upgrade elastic1018
[13:55:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:03:59] (CR) Alex Monk: "Do we usually remove the custom rights settings from locked wikis?" [mediawiki-config] - https://gerrit.wikimedia.org/r/224771 (owner: Dereckson)
[14:11:24] (PS1) Legoktm: Revert "Revert "Set $wgCentralAuthStrict = true;"" [mediawiki-config] - https://gerrit.wikimedia.org/r/224792
[14:11:35] (CR) Jforrester: [C: -1] "Per my comment on Phabricator." [mediawiki-config] - https://gerrit.wikimedia.org/r/224770 (https://phabricator.wikimedia.org/T105675) (owner: Dereckson)
[14:11:50] jouncebot: next
[14:11:51] In 0 hour(s) and 48 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T1500)
[14:20:34] !log legoktm Synchronized php-1.26wmf13/extensions/CentralAuth/includes/CentralAuthPlugin.php: Add log entry for $wgCentralAuthStrict failures if SULMigration is enabled (duration: 00m 13s)
[14:20:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:22:14] !log sync failed on mw1090.eqiad.wmnet, read only filesystem
[14:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:22:27] legoktm: known issue :)
[14:22:30] out of rotation
[14:22:35] ticket philed
[14:22:38] ok
[14:22:45] (I did the same thing last night :))
[14:22:46] so it's not serving traffic?
[14:22:47] :P
[14:22:54] yup
[14:23:23] (CR) Legoktm: [C: 2] Revert "Revert "Set $wgCentralAuthStrict = true;"" [mediawiki-config] - https://gerrit.wikimedia.org/r/224792 (owner: Legoktm)
[14:23:40] (CR) Legoktm: Revert "Revert "Set $wgCentralAuthStrict = true;"" [mediawiki-config] - https://gerrit.wikimedia.org/r/224792 (owner: Legoktm)
[14:24:59] (PS2) Legoktm: Revert "Revert "Set $wgCentralAuthStrict = true;"" [mediawiki-config] - https://gerrit.wikimedia.org/r/224792
[14:26:06] (CR) Legoktm: [C: 2] Revert "Revert "Set $wgCentralAuthStrict = true;"" [mediawiki-config] - https://gerrit.wikimedia.org/r/224792 (owner: Legoktm)
[14:26:33] (Merged) jenkins-bot: Revert "Revert "Set $wgCentralAuthStrict = true;"" [mediawiki-config] - https://gerrit.wikimedia.org/r/224792 (owner: Legoktm)
[14:29:28] woo
[14:29:30] 2015-07-15 14:29:00 mw1017 testwiki CentralAuth INFO: plugin: unattached account for 'Lego-test' {"private":false}
[14:31:07] and everything still works :D
[14:33:22] !log legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
[14:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:44:17] (PS1) Reedy: Remove old comments about GeSHi rl startup modules [mediawiki-config] - https://gerrit.wikimedia.org/r/224799
[14:48:41] (PS2) Reedy: Re-enable all languages in GeSHi [mediawiki-config] - https://gerrit.wikimedia.org/r/224799
[14:52:17] (PS1) Alexandros Kosiaris: Introduce krypton.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/224801 (https://phabricator.wikimedia.org/T105507)
[14:58:43] jouncebot, next
[14:58:43] In 0 hour(s) and 1 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T1500)
[15:00:04] manybubbles anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches)
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T1500). Please do the needful.
[15:00:22] no swat
[15:00:29] none scheduled at least
[15:00:35] no patches, rather
[15:03:33] (CR) Hoo man: [C: -1] Fix link to WD Json dumps on other dumps html page (1 comment) [puppet] - https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307) (owner: Addshore)
[15:03:48] operations, RESTBase, RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1454051 (GWicke) @bblack: GC scaling is an important and well-known issue for JVM apps (even with G1GC), but evidently it's not the only one for Cassandra. There mig...
[15:06:01] (CR) Alexandros Kosiaris: [C: 2] Introduce krypton.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/224801 (https://phabricator.wikimedia.org/T105507) (owner: Alexandros Kosiaris)
[15:07:57] I have a patch
[15:08:00] I'll do it
[15:09:37] (CR) Alex Monk: [C: 2] Enable Echo on Wikimedia wikis by default [mediawiki-config] - https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T59375) (owner: Withoutaname)
[15:10:04] (Merged) jenkins-bot: Enable Echo on Wikimedia wikis by default [mediawiki-config] - https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T59375) (owner: Withoutaname)
[15:17:55] manybubbles, although it seems stuck syncing
[15:18:03] ?
[15:18:12] sync-proxies: 91% (ok: 11; fail: 0; left: 1)
[15:18:12] ah
[15:18:15] hmmmmm
[15:18:18] well then!
[15:18:36] I'm not sure what to do in that case.
my instrinct is that ctrl-c and retry is safe
[15:19:14] The sync itself is supposed to be safe (effectively a no-op), but it might leave some servers with the new version and some with the old
[15:19:58] ah, it worked this time
[15:20:05] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 11s)
[15:20:08] just a single failure
[15:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:20:49] fun times
[15:21:45] (PS1) Dzahn: add new VM krypton to DHCP/netboot [puppet] - https://gerrit.wikimedia.org/r/224805
[15:22:05] (PS2) Dzahn: add new VM krypton to DHCP/netboot [puppet] - https://gerrit.wikimedia.org/r/224805 (https://phabricator.wikimedia.org/T105507)
[15:22:18] already a known one, legoktm !logged it earlier
[15:22:21] readonly fs
[15:23:31] hey ori, is it possible for deployers to depool a mw server?
[15:24:07] <_joe_> Krenair: it is depooled
[15:24:11] oh, okay
[15:24:19] <_joe_> it just needs to be removed from the scap list
[15:24:21] <_joe_> lemme do that
[15:24:33] mw10190? do you need that depooled too?
[15:24:44] <_joe_> mutante: it is depooled
[15:24:51] <_joe_> I depooled that yesterday night
[15:24:54] still on mediawiki-installation dsh list, I guess
[15:24:58] <_joe_> yes
[15:25:19] nevermind, it is. i just looked at True/False. the entire line is commented
[15:25:27] <_joe_> Krenair: gimme 10 mins, I'm doing a pretty big commit I've been working on
[15:25:34] sure, it's not urgent
[15:25:46] <_joe_> oh ok then, a bit more than 10 mins :)
[15:25:55] i can do it, hold on
[15:27:05] (PS1) Dzahn: remove mw1090 from dsh group [puppet] - https://gerrit.wikimedia.org/r/224806
[15:27:20] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
[15:27:20] (CR) John F.
Lewis: [C: 04-1] add new VM krypton to DHCP/netboot (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/224805 (https://phabricator.wikimedia.org/T105507) (owner: 10Dzahn) [15:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:27:37] (03PS2) 10Dzahn: remove mw1090 from dsh group [puppet] - 10https://gerrit.wikimedia.org/r/224806 (https://phabricator.wikimedia.org/T105835) [15:28:03] (03PS3) 10Dzahn: remove mw1090 from dsh group [puppet] - 10https://gerrit.wikimedia.org/r/224806 (https://phabricator.wikimedia.org/T105835) [15:29:27] !log krenair Synchronized docroot/noc/createTxtFileSymlinks.sh: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s) [15:29:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:29:34] all looks fine, by the way [15:29:49] Just not sure how to properly sync the deletion of echowikis.dblist [15:29:57] I wonder if sync-dblist does it [15:30:21] oh! I think I remember deletes needing a scat [15:30:23] scap [15:31:36] (03PS3) 10Dzahn: add new VM krypton to DHCP/netboot [puppet] - 10https://gerrit.wikimedia.org/r/224805 (https://phabricator.wikimedia.org/T105507) [15:31:38] yeah [15:31:45] but I don't want to do a full scap [15:31:51] (03CR) 10Krinkle: "FIXME: Leaves two broken references:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T59375) (owner: 10Withoutaname) [15:31:57] i'm the scap man .. [15:31:58] "sync-dir ." should do it, but that's also... sub-optimal [15:32:18] mutante++ [15:32:35] yeah... sync-dir and scap are the only things that will delete across the cluster [15:32:35] (03PS4) 10Dzahn: add new VM krypton to DHCP/netboot [puppet] - 10https://gerrit.wikimedia.org/r/224805 (https://phabricator.wikimedia.org/T105507) [15:32:57] (03CR) 10John F. 
Lewis: add new VM krypton to DHCP/netboot (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/224805 (https://phabricator.wikimedia.org/T105507) (owner: 10Dzahn) [15:32:59] In theory sync-file could but it actually won't [15:33:00] getting server error from gerrit [15:33:29] (03CR) 10Dzahn: [C: 032] add new VM krypton to DHCP/netboot [puppet] - 10https://gerrit.wikimedia.org/r/224805 (https://phabricator.wikimedia.org/T105507) (owner: 10Dzahn) [15:33:30] sync-dblist probably should? [15:33:32] not anymore [15:33:39] (03PS1) 10Legoktm: Disable 'CentralAuth-Bug39996' log [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224808 (https://phabricator.wikimedia.org/T105895) [15:33:44] (03PS4) 10Dzahn: remove mw1090 from dsh group [puppet] - 10https://gerrit.wikimedia.org/r/224806 (https://phabricator.wikimedia.org/T105835) [15:33:58] Krenair: are you deploying stuff? [15:33:59] (03CR) 10Dzahn: [C: 032] "it had a disk failure" [puppet] - 10https://gerrit.wikimedia.org/r/224806 (https://phabricator.wikimedia.org/T105835) (owner: 10Dzahn) [15:34:01] sort of [15:34:04] ok [15:34:07] have something urgent? [15:34:08] * legoktm will wait [15:34:09] nope [15:34:37] There's just a dblist I'm trying to get rid of lying around on the servers because there's no simple way to sync it's removal [15:34:50] lol [15:34:59] dsh rm it? [15:35:12] yeah [15:35:15] ok, so mw1090 should be gone on next run [15:35:34] <_joe_> mutante: thanks [15:35:50] !log krenair Synchronized database lists: (no message) (duration: 00m 11s) [15:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:35:56] <_joe_> mutante: if the next run is after ~ 20 minutes [15:36:14] I think that did it, actually, bd808 [15:36:51] (03CR) 10coren: [C: 031] "This makes 1.27 senses." [puppet] - 10https://gerrit.wikimedia.org/r/33066 (owner: 10Faidon Liambotis) [15:36:58] yeah [15:37:02] Krenair: oh? cool. 
Not sure how it actually would but if it did, awesome [15:37:16] https://github.com/wikimedia/mediawiki-tools-scap/blob/HEAD/scap/main.py#L322 [15:37:45] ah ha. yeah that would apply on each host [15:37:52] It would try to sync *.dblist from the master to the current server, right? so it seems one is deleted and gets rid of it locally too [15:38:05] 6operations, 10Wikimedia-IEG-grant-review: move iegreview to a VM - https://phabricator.wikimedia.org/T105007#1454151 (10akosiaris) [15:38:07] sees* [15:38:07] 6operations, 10Wikimedia-Wikimania-Scholarships: move wikimania_scholarships to a VM - https://phabricator.wikimedia.org/T105003#1454152 (10akosiaris) [15:38:10] beta-mediawiki-config-update-eqiad Failed deployment on the EQIAD beta cluster :-/ Please contact a member of the beta project to fixup the working directory on the destination server. [15:38:11] 6operations, 7Tracking: tracking: move all misc services from zirconium to a VM - https://phabricator.wikimedia.org/T104946#1454153 (10akosiaris) [15:39:16] (03CR) 10coren: [C: 04-1] "I'm not fond of how the latest changeset includes NFS changes in addition to the straightforward switch to a fact. Was this an accidental" [puppet] - 10https://gerrit.wikimedia.org/r/221562 (owner: 10Andrew Bogott) [15:39:49] Krenair: it looks like it's gone from noc , looks right [15:39:56] yeah [15:40:02] Krenair: yeah. It would run on each host in the cluster and tell it to check *.dblist locally vs remote via rsync. That should clean up local things that are no longer on the rsync server as well as adding local files [15:40:14] TIL about code I ported to scap :) [15:40:30] :D [15:43:40] (03CR) 10Andrew Bogott: "yes, probably accidental... can you mark the relevant bits? I don't immediately see what you mean." 
[puppet] - 10https://gerrit.wikimedia.org/r/221562 (owner: 10Andrew Bogott) [15:47:12] (03PS1) 10Dzahn: krypton: fix conflict with old server with same name [puppet] - 10https://gerrit.wikimedia.org/r/224813 [15:48:37] (03PS2) 10Dzahn: krypton: fix conflict with old server with same name [puppet] - 10https://gerrit.wikimedia.org/r/224813 (https://phabricator.wikimedia.org/T105507) [15:50:42] (03PS3) 10Dzahn: krypton: fix conflict with old server with same name [puppet] - 10https://gerrit.wikimedia.org/r/224813 (https://phabricator.wikimedia.org/T105507) [15:51:02] (03CR) 10Dzahn: [C: 032] krypton: fix conflict with old server with same name [puppet] - 10https://gerrit.wikimedia.org/r/224813 (https://phabricator.wikimedia.org/T105507) (owner: 10Dzahn) [15:51:24] (03CR) 10coren: "I'm not sure I understand most of these points?" [puppet] - 10https://gerrit.wikimedia.org/r/224064 (https://phabricator.wikimedia.org/T105027) (owner: 10coren) [15:53:05] ACKNOWLEDGEMENT - puppet last run on mw1090 is CRITICAL Puppet last ran 19 hours ago daniel_zahn depooled and removed from dsh [15:54:21] legoktm, are you going to deploy https://gerrit.wikimedia.org/r/#/c/224808/1/wmf-config/InitialiseSettings.php ? [15:56:29] Krenair: are you done now? [15:56:41] well there's a ton of config changes I'd still like to do [15:56:56] (03CR) 10Alex Monk: [C: 032] Enable WikiLove extension at Spanish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222581 (https://phabricator.wikimedia.org/T103424) (owner: 10Glaisher) [15:56:58] do you want to do that one too then ? 
:) [15:57:02] okay [15:57:22] (03Merged) 10jenkins-bot: Enable WikiLove extension at Spanish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222581 (https://phabricator.wikimedia.org/T103424) (owner: 10Glaisher) [15:58:18] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222581/ (duration: 00m 11s) [15:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:00:34] (03CR) 10Alex Monk: [C: 032] Disable 'CentralAuth-Bug39996' log [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224808 (https://phabricator.wikimedia.org/T105895) (owner: 10Legoktm) [16:00:41] (03Merged) 10jenkins-bot: Disable 'CentralAuth-Bug39996' log [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224808 (https://phabricator.wikimedia.org/T105895) (owner: 10Legoktm) [16:01:22] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224808/ (duration: 00m 12s) [16:01:24] legoktm, ^ [16:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:01:29] thanks [16:01:45] (03CR) 10Alex Monk: [C: 04-1] add HTTPS variants for wmfblog in feed whitelists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222691 (https://phabricator.wikimedia.org/T104727) (owner: 10Jeremyb) [16:02:11] (03CR) 10Alex Monk: [C: 032] Remove SVN admin and coder groups from mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224420 (https://phabricator.wikimedia.org/T105676) (owner: 10Alex Monk) [16:02:37] (03Merged) 10jenkins-bot: Remove SVN admin and coder groups from mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224420 (https://phabricator.wikimedia.org/T105676) (owner: 10Alex Monk) [16:03:27] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s) [16:03:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master 
[16:04:02] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s) [16:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:05:06] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1454255 (10JanZerebecki) >>! In T102827#1452705, @CCogdill_WMF wrote: > The only one(s) I'd specifically ask we keep around if possibl... [16:05:53] huh... that didn't seem to remove it entirely. interesting [16:09:37] hey legoktm, why doesn't https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php actually unset $wgGroupPermissions['svnadmins'] ? [16:10:30] (03CR) 10RobH: [C: 032] reclaim lanthanum: remove lanthanum.eqaid.wmnet [dns] - 10https://gerrit.wikimedia.org/r/223167 (https://phabricator.wikimedia.org/T86658) (owner: 10John F. Lewis) [16:14:04] (03CR) 10JanZerebecki: [C: 04-1] "Needs more discussion on the ticket." [dns] - 10https://gerrit.wikimedia.org/r/223245 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [16:18:01] 6operations, 10Continuous-Integration-Infrastructure: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1454281 (10RobH) a:5RobH>3Cmjohnson The following have been completed: * merge @dzahn's dns change and push live * decom from palladium: puppet keys, salt keys, puppetstoreddb *... [16:19:17] 6operations, 10ops-eqiad: wipe disks for lanthanum - https://phabricator.wikimedia.org/T105901#1454285 (10RobH) 3NEW a:3Cmjohnson [16:19:18] robh: I feel left out because that was my change not mutante's :( [16:19:29] ? 
[16:19:33] oh [16:19:36] sorry [16:20:16] (03PS1) 10Alexandros Kosiaris: Introduce roles for the maps-team in labs [puppet] - 10https://gerrit.wikimedia.org/r/224822 [16:20:31] fixed ;D [16:20:33] (03PS1) 10Faidon Liambotis: etherpad: switch to HTTPS-only (redirect, HSTS) [puppet] - 10https://gerrit.wikimedia.org/r/224823 [16:21:09] 6operations, 10ops-eqiad: wipe disks for lanthanum - https://phabricator.wikimedia.org/T105901#1454300 (10hashar) The reclaim is confirmed in parent task T86658. So yeah disks can be wiped :-) [16:23:06] !log trying to kill labvirt1005 via repeated instance suspend/resume [16:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:26:28] !log woo, first try! [16:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:26:48] 6operations, 5Patch-For-Review, 7discovery-system, 5services-tooling: Create a debian package of python-etcd - https://phabricator.wikimedia.org/T99771#1454334 (10akosiaris) 5Open>3Resolved Resolving, should have been resolved a long time ago [16:27:00] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, and 3 others: Wikidata Query Service hardware - https://phabricator.wikimedia.org/T86561#1454337 (10RobH) [16:27:11] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [16:27:24] (03PS2) 10Alexandros Kosiaris: Introduce roles for the maps-team in labs [puppet] - 10https://gerrit.wikimedia.org/r/224822 (https://phabricator.wikimedia.org/T105070) [16:27:31] (03PS3) 10Alexandros Kosiaris: Introduce roles for the maps-team in labs [puppet] - 10https://gerrit.wikimedia.org/r/224822 (https://phabricator.wikimedia.org/T105070) [16:27:37] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Introduce roles for the maps-team in labs [puppet] - 10https://gerrit.wikimedia.org/r/224822 (https://phabricator.wikimedia.org/T105070) (owner: 10Alexandros Kosiaris) [16:28:11] robh: thoughts on https://gerrit.wikimedia.org/r/#/c/224757/ 
? not sure when actual reimaging will happen [16:29:07] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Wikidata Query Service hardware - https://phabricator.wikimedia.org/T86561#1454346 (10Joe) [16:30:50] PROBLEM - mediawiki-installation DSH group on mw1090 is CRITICAL: Host mw1090 is not in mediawiki-installation dsh group [16:31:54] 6operations, 6Labs, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1454350 (10Andrew) I just ran a simple test on labvirt1005 (with 3.13), and was able to make it lock up on the first try. So now I'm ready to try a different kernel. [16:33:42] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Wikidata Query Service hardware - https://phabricator.wikimedia.org/T86561#1454358 (10RobH) Update from IRC discussion: We'll be allocating two systems for this: Dell PowerEdge R420, Dual Intel Xeon E5-2440, 64GB Memory, Dual 30... [16:34:02] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1090 is CRITICAL: Host mw1090 is not in mediawiki-installation dsh group daniel_zahn T105835 [16:34:21] RECOVERY - Host labvirt1005 is UP: PING OK - Packet loss = 0%, RTA = 2.57 ms [16:36:28] 6operations, 6Labs, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1454377 (10yuvipanda) Install linux-generic-lts-vivid package and reboot? [16:36:36] 6operations, 10ops-eqiad: relocate wmf3544 from row d into any other row - https://phabricator.wikimedia.org/T105904#1454378 (10RobH) 3NEW a:3Cmjohnson [16:36:39] 6operations, 10RESTBase, 10RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1454385 (10Eevans) >>! In T95253#1453721, @BBlack wrote: > I had to backtrack a bit into the ref'd closed ticket, but it seems that the fundamental drivers for multipl...
[16:38:56] (03PS11) 10Giuseppe Lavagetto: service::node: auto-monitoring of local endpoints [puppet] - 10https://gerrit.wikimedia.org/r/223328 (https://phabricator.wikimedia.org/T94821) [16:39:29] !log krypton - signing puppet cert, initial run [16:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:39:49] (03CR) 10jenkins-bot: [V: 04-1] service::node: auto-monitoring of local endpoints [puppet] - 10https://gerrit.wikimedia.org/r/223328 (https://phabricator.wikimedia.org/T94821) (owner: 10Giuseppe Lavagetto) [16:43:18] 6operations, 10ops-eqiad: relocate wmf3544 from row d into any other row - https://phabricator.wikimedia.org/T105904#1454396 (10RobH) Also please just place the relocated system into the internal vlan for its new row and enable the port. Thanks! [16:43:26] !log accepting unaccepted salt keys for ganeti VMs ,planet, bromine, krypton [16:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:44:14] <_joe_> wtf gerrut [16:44:18] <_joe_> *gerrit [16:44:29] <_joe_> I can't get flake8 to spit an error locally [16:45:36] (03CR) 10Giuseppe Lavagetto: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/223328 (https://phabricator.wikimedia.org/T94821) (owner: 10Giuseppe Lavagetto) [16:45:54] !log rebooting labvirt1005 [16:45:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:48:08] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1454417 (10GWicke) We have consolidated a lot of information from this task and others in https://wikitech.wikimedia.org/wiki/Cassandra/Hardware. [16:48:44] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1454419 (10CCogdill_WMF) These are the recommend settings from our email service provider (IBM). 
When it comes to deliverability, we h... [16:49:00] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [16:49:09] (03PS12) 10Giuseppe Lavagetto: service::node: auto-monitoring of local endpoints [puppet] - 10https://gerrit.wikimedia.org/r/223328 (https://phabricator.wikimedia.org/T94821) [16:49:43] (03PS1) 10Yuvipanda: celery: Make systemd do the chdir [puppet] - 10https://gerrit.wikimedia.org/r/224827 [16:50:12] (03PS2) 10Yuvipanda: beta: Switch puppet cherry-pick check to new graphite metric name [puppet] - 10https://gerrit.wikimedia.org/r/224505 [16:50:17] _joe_: odd? " 16:45:39 files/checker.py:1:1: E902 IOError: [Errno 2] No such file or directory: 'files/checker.py'" [16:50:19] (03CR) 10Yuvipanda: [C: 032 V: 032] beta: Switch puppet cherry-pick check to new graphite metric name [puppet] - 10https://gerrit.wikimedia.org/r/224505 (owner: 10Yuvipanda) [16:50:21] 6operations, 10RESTBase, 10RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1454432 (10GWicke) [16:50:27] (03PS2) 10Yuvipanda: celery: Make systemd do the chdir [puppet] - 10https://gerrit.wikimedia.org/r/224827 [16:50:34] (03CR) 10Yuvipanda: [C: 032 V: 032] celery: Make systemd do the chdir [puppet] - 10https://gerrit.wikimedia.org/r/224827 (owner: 10Yuvipanda) [16:50:38] <_joe_> mutante: yes, that was my error [16:50:45] <_joe_> YuviPanda: celery? [16:50:48] <_joe_> on redis? [16:51:09] <_joe_> sigh :) [16:51:19] <_joe_> anyways, GTG, [16:51:29] _joe_: hmm? [16:51:31] RECOVERY - Host labvirt1005 is UP: PING OK - Packet loss = 0%, RTA = 2.18 ms [16:51:38] _joe|afk: are you going to tell me to use amqp?
:P [16:52:07] (03PS1) 10Faidon Liambotis: sslcert: fix ::ca's ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/224828 [16:59:21] (03PS1) 10BryanDavis: scap: Add co-master configuration [puppet] - 10https://gerrit.wikimedia.org/r/224829 (https://phabricator.wikimedia.org/T104826) [17:00:15] 6operations, 6Labs, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1454502 (10Andrew) Starting with a 3.13 system... # apt-get install linux-generic-lts-vivid # apt-get install linux-image-3.19 linux-headers-3.19 # apt-get dist-upgrade # puppet ag... [17:01:18] (03PS1) 10coren: Remove echowikis.dblist support [software] - 10https://gerrit.wikimedia.org/r/224830 [17:02:17] (03CR) 10Dzahn: [C: 031] "we should still totally do this." [puppet] - 10https://gerrit.wikimedia.org/r/198116 (https://phabricator.wikimedia.org/T87132) (owner: 10Tim Landscheidt) [17:03:00] (03PS1) 10Yuvipanda: ores: Use virtualenv package rather than python-virtualenv [puppet] - 10https://gerrit.wikimedia.org/r/224831 [17:03:10] (03CR) 10coren: [C: 032] Remove echowikis.dblist support [software] - 10https://gerrit.wikimedia.org/r/224830 (owner: 10coren) [17:03:18] 6operations, 10RESTBase, 10RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1454533 (10GWicke) [17:03:21] (03CR) 10coren: [V: 032] Remove echowikis.dblist support [software] - 10https://gerrit.wikimedia.org/r/224830 (owner: 10coren) [17:03:24] (03PS2) 10Yuvipanda: ores: Use virtualenv package rather than python-virtualenv [puppet] - 10https://gerrit.wikimedia.org/r/224831 [17:03:35] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: Use virtualenv package rather than python-virtualenv [puppet] - 10https://gerrit.wikimedia.org/r/224831 (owner: 10Yuvipanda) [17:04:30] (03CR) 10Dzahn: "re: polluting. 
i think polluting every single jenkins check log and making it impossible to ever pass is worse than a couple exceptions i" [puppet] - 10https://gerrit.wikimedia.org/r/198116 (https://phabricator.wikimedia.org/T87132) (owner: 10Tim Landscheidt) [17:05:18] (03PS2) 10Addshore: Fix link to WD Json dumps on other dumps html page [puppet] - 10https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307) [17:06:23] (03CR) 10Dzahn: "yep, i merged this yesterday: https://gerrit.wikimedia.org/r/#/c/139581/" [software] - 10https://gerrit.wikimedia.org/r/224830 (owner: 10coren) [17:07:50] the manual V: 2 is just automatic by now, isnt it yuvi [17:09:04] (03CR) 10Hoo man: [C: 031] Fix link to WD Json dumps on other dumps html page [puppet] - 10https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307) (owner: 10Addshore) [17:13:38] (03PS1) 10Jgreen: remove aluminium.wikimedia.org A/PTR records, no longer needed [dns] - 10https://gerrit.wikimedia.org/r/224832 [17:15:51] (03CR) 10Jgreen: [C: 032 V: 031] remove aluminium.wikimedia.org A/PTR records, no longer needed [dns] - 10https://gerrit.wikimedia.org/r/224832 (owner: 10Jgreen) [17:17:02] !log authdns-update to remove aluminium, also lanthanum by preexisting commit [17:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:22:06] !log krenair Synchronized wmf-config/CommonSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php - doesnt quite work (duration: 00m 13s) [17:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:23:35] (03PS1) 10Alex Monk: Revert removal of SVN admin group which doesn't work with config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224833 [17:24:04] (03CR) 10Alex Monk: [C: 032] Revert removal of SVN admin group which doesn't work with config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224833 (owner: 10Alex Monk) 
[17:24:10] (03Merged) 10jenkins-bot: Revert removal of SVN admin group which doesn't work with config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224833 (owner: 10Alex Monk) [17:24:55] (03PS3) 10Alex Monk: Re-enable all languages in GeSHi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224799 (https://phabricator.wikimedia.org/T105889) (owner: 10Reedy) [17:27:55] (03PS1) 10Dzahn: add krypton to site.pp with base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/224835 (https://phabricator.wikimedia.org/T105507) [17:28:36] (03PS1) 10Glaisher: Remove /docroot/noc/conf/echowikis.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224836 [17:28:40] (03PS1) 10Jgreen: remove boron.wm.o A/PTR records, shouldn't need them anymore [dns] - 10https://gerrit.wikimedia.org/r/224837 [17:30:53] (03CR) 10Dzahn: "so what about http://dumps.wikimedia.org/other/wikidata/ then? they are still being created. are they useless? should they be removed? sho" [puppet] - 10https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307) (owner: 10Addshore) [17:31:31] (03CR) 10Glaisher: "https://gerrit.wikimedia.org/r/#/c/224836/ removes echowikis.dblist from noc/conf/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T59375) (owner: 10Withoutaname) [17:33:21] (03CR) 10Dzahn: [C: 032] "it's already gone from noc though since the last sync" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224836 (owner: 10Glaisher) [17:33:31] (03CR) 10Jgreen: [C: 032 V: 031] remove boron.wm.o A/PTR records, shouldn't need them anymore [dns] - 10https://gerrit.wikimedia.org/r/224837 (owner: 10Jgreen) [17:33:59] !log authdns-update to remove boron.wm.o [17:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:35:20] (03CR) 10Dzahn: [C: 032] add krypton to site.pp with base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/224835 (https://phabricator.wikimedia.org/T105507) 
(owner: 10Dzahn) [17:36:12] 6operations, 6Labs, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1454617 (10Andrew) labvirt1005 with 3.19.0-22-lowlatency has survived quite a few cycles of suspend/resume. So I'm convinced that it does not exhibit that particular bug, at least. [17:36:23] 6operations, 10Wikimedia-Site-requests, 5Patch-For-Review: Rename "chapcomwiki" to "affcomwiki" - https://phabricator.wikimedia.org/T41482#1454618 (10Glaisher) >>! In T41482#1452225, @Dzahn wrote: > What about the pending Apache config change to redirect it? Is it moot until we figured out the issues with ES... [17:37:10] Krenair: Are you doing that site requests sprint? If so, could you deploy that gomwiki interwiki sources patch? [17:37:29] Yeah, it's in my open tabs queue of stuff to deploy [17:37:44] ok, thanks [17:38:12] Then I'm going to tidy up the workboard and other related stuff [17:41:50] 6operations, 10hardware-requests: CODFW Search Servers - https://phabricator.wikimedia.org/T97049#1454638 (10RobH) [17:55:03] (03PS1) 1020after4: disable LCStoreStaticArray completely for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224840 [17:55:33] (03CR) 1020after4: [C: 032] disable LCStoreStaticArray completely for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224840 (owner: 1020after4) [17:55:40] (03Merged) 10jenkins-bot: disable LCStoreStaticArray completely for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224840 (owner: 1020after4) [17:58:48] (03PS1) 1020after4: group0 to 1.26wmf14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224842 [18:00:05] twentyafterfour greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T1800). 
[18:00:45] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1454670 (10JanZerebecki) Do you have a link to those recommendations? [18:02:55] (03CR) 1020after4: [C: 032] group0 to 1.26wmf14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224842 (owner: 1020after4) [18:03:00] (03Merged) 10jenkins-bot: group0 to 1.26wmf14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224842 (owner: 1020after4) [18:03:54] (03PS6) 10coren: Use the labsproject fact rather than $::instanceproject from ldap [puppet] - 10https://gerrit.wikimedia.org/r/221562 (owner: 10Andrew Bogott) [18:04:53] ok I'm going to try pushing 1.26wmf14 again [18:05:03] hopefully nothing blows up this time [18:05:15] (03CR) 10coren: [C: 031] "... nevermind. It's an artifact caused by changes done since the original and went away with a rebase." [puppet] - 10https://gerrit.wikimedia.org/r/221562 (owner: 10Andrew Bogott) [18:06:05] twentyafterfour: godspeed [18:06:48] !log twentyafterfour Started scap: group0 to 1.26wmf14 [18:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:09:01] if everything seems ok, maybe we should go ahead and do group1 after a while? 
[18:09:18] (03PS1) 10Jgreen: several fundraising DNS changes moving toward service-oriented NAT and hostnames [dns] - 10https://gerrit.wikimedia.org/r/224846 [18:09:29] (03CR) 10jenkins-bot: [V: 04-1] several fundraising DNS changes moving toward service-oriented NAT and hostnames [dns] - 10https://gerrit.wikimedia.org/r/224846 (owner: 10Jgreen) [18:09:51] twentyafterfour: yeppers [18:10:06] I'd give it 30 minutes of clean-ish logs [18:10:32] it's hard to detect anything in the logs from group0 [18:10:39] yeah :/ [18:10:43] (03PS2) 10Jgreen: several fundraising DNS changes moving toward service-oriented NAT and hostnames [dns] - 10https://gerrit.wikimedia.org/r/224846 [18:10:43] they are below the noise floor [18:10:49] wait... [18:11:02] isn't there a group-specific view in logstash? [18:11:24] nope. [18:11:31] nope. how would we do that? [18:11:42] greg-g: I made a group1 logstash view [18:11:42] dunno [18:12:05] but even group1 has a low enough frequency of fatals that it's basically empty [18:12:11] bd808: but, when we go to long lived branches called group0, 1, 2 instead of new wmfXXs every week, that'd be easier [18:12:26] I'm just calling that inevitable now (cc twentyafterfour ;) ) [18:13:09] https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor-group1 [18:13:22] that ^ [18:13:26] based on wiki name [18:15:40] (03PS3) 10Jgreen: several fundraising DNS changes moving toward service-oriented NAT and hostnames [dns] - 10https://gerrit.wikimedia.org/r/224846 [18:16:22] (03PS1) 10Dzahn: wikimania scholarships app: apply role on krypton [puppet] - 10https://gerrit.wikimedia.org/r/224849 (https://phabricator.wikimedia.org/T105003) [18:16:47] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=mem_report&s=by+name&c=API%2520application%2520servers%2520eqiad&tab=m&vn=&hide-hf=false shows what blew up yesterday, not fatalmonitor [18:17:59] (03PS4) 10Jgreen: several fundraising DNS changes moving toward service-oriented NAT and hostnames [dns] - 
10https://gerrit.wikimedia.org/r/224846 [18:19:25] (03CR) 10Dzahn: [C: 032] wikimania scholarships app: apply role on krypton [puppet] - 10https://gerrit.wikimedia.org/r/224849 (https://phabricator.wikimedia.org/T105003) (owner: 10Dzahn) [18:19:37] (03CR) 10Jgreen: [C: 032 V: 031] several fundraising DNS changes moving toward service-oriented NAT and hostnames [dns] - 10https://gerrit.wikimedia.org/r/224846 (owner: 10Jgreen) [18:20:45] !log authdns-update shifting to service-oriented hostnames for fundraising cluster [18:20:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:21:15] !log es1.6 upgrade: upgrading elastic1019 [18:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:21:40] (03PS2) 10Amire80: Set a different wmgContentTranslationDefaultSourceLanguage for English [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224031 (https://phabricator.wikimedia.org/T105327) [18:22:39] (03CR) 10Zfilipin: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: 10Zfilipin) [18:23:09] Reedy, Krenair: Wikitech is blocking account creations from the hackathon. Can we whitelist it? [18:23:50] what's the error message? [18:24:17] Krenair: there have been 6 accounts created ... [18:24:37] LOL [18:24:41] whatsmyip says 201.149.6.36 [18:24:44] okay [18:24:49] we can do a throttle exemption [18:25:07] (03CR) 10Dzahn: [C: 031] Re-enable all languages in GeSHi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224799 (https://phabricator.wikimedia.org/T105889) (owner: 10Reedy) [18:25:09] bd808: Krenair is doing it [18:25:26] awesome [18:25:44] classic hackathon/editathon problem [18:25:52] every time, so reliable [18:25:59] bd808: it's just IPv4 right? 
[18:26:00] stupid spammers
[18:26:10] legoktm: seems to be, yes
[18:26:13] PROBLEM - puppet last run on krypton is CRITICAL Puppet has 2 failures
[18:26:26] you could use the tor-relay.. oh wait
[18:26:33] heh
[18:26:55] someone is still deploying
[18:26:59] twentyafterfour: you?
[18:27:10] legoktm: yes scap running
[18:27:24] 50% complete
[18:27:42] I could live hack it on silver
[18:27:46] legoktm: we're redoing the scap from yesterday since we were blocked by the new LCStore stuffz
[18:28:28] ACKNOWLEDGEMENT - puppet last run on krypton is CRITICAL Puppet has 2 failures daniel_zahn Dependency File[/srv/deployment/scholarships/scholarships] has failures: true
[18:29:01] yep, none of the roles just work by themselves :/
[18:29:11] although it may or may not be overwritten by scap
[18:29:16] they only got lucky when combined on the same node
[18:29:29] (PS1) Jgreen: more fundraising hostnames shift to service-oriented scheme [dns] - https://gerrit.wikimedia.org/r/224851
[18:30:34] bd808: you could also just add some unprivileged accounts to bot group so that they can be used as base to create accounts :) https://wikitech.wikimedia.org/w/index.php?title=Special:ListUsers&group=bureaucrat
[18:30:49] in the spirit of https://www.mediawiki.org/wiki/Help:Mass_account_creation
[18:31:04] I think we need to raise the throttle for wikimania anyway
[18:31:19] (CR) Jgreen: [C: 2 V: 1] more fundraising hostnames shift to service-oriented scheme [dns] - https://gerrit.wikimedia.org/r/224851 (owner: Jgreen)
[18:33:16] scap 85% complete
[18:34:05] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: Zfilipin)
[18:34:52] (PS7) Andrew Bogott: Use the labsproject fact rather than $::instanceproject from ldap [puppet] - https://gerrit.wikimedia.org/r/221562
[18:36:06] operations, Continuous-Integration-Infrastructure: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#1454759 (zeljkofilipin)
[18:37:04] operations: fix the puppet role for the wikimania scholarship app - https://phabricator.wikimedia.org/T105920#1454766 (Dzahn) NEW
[18:37:17] (CR) Andrew Bogott: [C: 2] Use the labsproject fact rather than $::instanceproject from ldap [puppet] - https://gerrit.wikimedia.org/r/221562 (owner: Andrew Bogott)
[18:37:34] operations: fix the puppet role for the wikimania scholarship app - https://phabricator.wikimedia.org/T105920#1454776 (Dzahn)
[18:37:36] operations, Wikimedia-Wikimania-Scholarships, Patch-For-Review: move wikimania_scholarships to a VM - https://phabricator.wikimedia.org/T105003#1454775 (Dzahn)
[18:39:23] !log twentyafterfour Finished scap: group0 to 1.26wmf14 (duration: 32m 34s)
[18:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:39:50] PROBLEM - check_puppetrun on backup4001 is CRITICAL puppet fail
[18:39:51] !log krenair Synchronized wmf-config/throttle.php: throttle labswiki account creations from hackathon at 500 (duration: 00m 12s)
[18:39:54] bd808, ^
[18:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:40:45] (PS1) Alex Monk: Raise account creation throttle for hackathon on labswiki to 500 [mediawiki-config] - https://gerrit.wikimedia.org/r/224856
[18:41:27] (CR) Alex Monk: [C: 2] "sync'd" [mediawiki-config] - https://gerrit.wikimedia.org/r/224856 (owner: Alex Monk)
[18:41:33] (Merged) jenkins-bot: Raise account creation throttle for hackathon on labswiki to 500 [mediawiki-config] - https://gerrit.wikimedia.org/r/224856 (owner: Alex Monk)
[18:42:01] PROBLEM - puppet last run on cp3043 is CRITICAL puppet fail
[18:44:50] PROBLEM - check_puppetrun on backup4001 is CRITICAL puppet fail
[18:46:04] (CR) Hoo man: "@Dzahn: That's the legacy dump directory... we still put symlinks to new json dumps there (for b/c reasons), but it's not getting the new " [puppet] - https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307) (owner: Addshore)
[18:48:42] bd808, did it work?
[18:48:59] Krenair: I'll test
[18:49:50] RECOVERY - check_puppetrun on backup4001 is OK Puppet is currently enabled, last run 149 seconds ago with 0 failures
[18:50:21] Krenair: \o/ works
[18:50:26] great
[18:51:44] (PS1) Dzahn: wm scholarships: fix puppet fail, missing deploy dir [puppet] - https://gerrit.wikimedia.org/r/224859 (https://phabricator.wikimedia.org/T105920)
[18:54:54] twentyafterfour, are you done, by the way?
[18:55:14] Krenair: yes
[18:55:36] well, I still need to sync group1 but that can wait a few minutes
[18:55:55] (PS4) Alex Monk: wikitech: Clean up contentadmin rights [mediawiki-config] - https://gerrit.wikimedia.org/r/222776
[18:56:03] (CR) Alex Monk: [C: 2] wikitech: Clean up contentadmin rights [mediawiki-config] - https://gerrit.wikimedia.org/r/222776 (owner: Alex Monk)
[18:56:09] (Merged) jenkins-bot: wikitech: Clean up contentadmin rights [mediawiki-config] - https://gerrit.wikimedia.org/r/222776 (owner: Alex Monk)
[18:57:07] (PS5) Zfilipin: The basic RuboCop configuration [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020)
[18:57:15] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
[18:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:57:42] (PS1) Faidon Liambotis: check_ssl: add support for picking the auth algorithm [puppet] - https://gerrit.wikimedia.org/r/224860
[18:57:54] (CR) Zfilipin: "Patch set 5 is just a rebase. Had to do it from the command line, gerrit could not do it." [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: Zfilipin)
[18:57:56] (PS1) Alexandros Kosiaris: Update hieradata/labs/maps-team/common.yaml [puppet] - https://gerrit.wikimedia.org/r/224861
[18:58:00] (CR) jenkins-bot: [V: -1] Update hieradata/labs/maps-team/common.yaml [puppet] - https://gerrit.wikimedia.org/r/224861 (owner: Alexandros Kosiaris)
[18:58:04] (CR) Zfilipin: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: Zfilipin)
[18:58:27] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
[18:58:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:58:58] (PS2) Alex Monk: wikitech: Get rid of unused ops namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/222792
[18:59:34] (CR) Alex Monk: [C: 2] wikitech: Get rid of unused ops namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/222792 (owner: Alex Monk)
[18:59:36] (PS2) Dzahn: wm scholarships: fix puppet fail, missing deploy dir [puppet] - https://gerrit.wikimedia.org/r/224859 (https://phabricator.wikimedia.org/T105920)
[18:59:39] (Merged) jenkins-bot: wikitech: Get rid of unused ops namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/222792 (owner: Alex Monk)
[19:00:43] !log krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 12s)
[19:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:01:35] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 13s)
[19:01:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:03:31] (CR) Dzahn: [C: 2] wm scholarships: fix puppet fail, missing deploy dir [puppet] - https://gerrit.wikimedia.org/r/224859 (https://phabricator.wikimedia.org/T105920) (owner: Dzahn)
[19:05:30] (CR) Alex Monk: [C: 2] Add import sources at gomwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/224408 (https://phabricator.wikimedia.org/T104563) (owner: Glaisher)
[19:06:02] (Merged) jenkins-bot: Add import sources at gomwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/224408 (https://phabricator.wikimedia.org/T104563) (owner: Glaisher)
[19:06:37] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224408/ (duration: 00m 12s)
[19:06:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:08:01] operations, Patch-For-Review: fix the puppet role for the wikimania scholarship app - https://phabricator.wikimedia.org/T105920#1454869 (Dzahn) one issue fixed, more to go the next is that there is /etc/apache2/conf.d/ anymore here on jessie with Apache 2.4
[19:08:09] (PS1) Alexandros Kosiaris: Allow speficying postgis version in postgresql::spatialdb [puppet] - https://gerrit.wikimedia.org/r/224864
[19:08:14] (CR) jenkins-bot: [V: -1] Allow speficying postgis version in postgresql::spatialdb [puppet] - https://gerrit.wikimedia.org/r/224864 (owner: Alexandros Kosiaris)
[19:09:13] (PS2) Alexandros Kosiaris: Update hieradata/labs/maps-team/common.yaml [puppet] - https://gerrit.wikimedia.org/r/224861
[19:09:52] RECOVERY - puppet last run on cp3043 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:11:30] operations, Patch-For-Review: fix the puppet role for the wikimania scholarship app - https://phabricator.wikimedia.org/T105920#1454877 (Dzahn)
[19:11:31] I'm gonna go ahead with group1 to 1.26wmf14 unless there are objections?
[19:11:40] none from me
[19:12:44] none from me either, /me logs out of tin
[19:14:13] (CR) Dzahn: [C: 1] "alright then" [puppet] - https://gerrit.wikimedia.org/r/224768 (https://phabricator.wikimedia.org/T104307) (owner: Addshore)
[19:15:04] (CR) Alexandros Kosiaris: [C: 2] Update hieradata/labs/maps-team/common.yaml [puppet] - https://gerrit.wikimedia.org/r/224861 (owner: Alexandros Kosiaris)
[19:15:29] (PS2) Alexandros Kosiaris: Allow speficying postgis version in postgresql::spatialdb [puppet] - https://gerrit.wikimedia.org/r/224864
[19:27:56] (CR) Alexandros Kosiaris: [C: 2] Allow speficying postgis version in postgresql::spatialdb [puppet] - https://gerrit.wikimedia.org/r/224864 (owner: Alexandros Kosiaris)
[19:30:27] _joe|afk: Are you thinking of switching the remaining imagescalers over to HHVM soon? :-)
[19:31:18] (PS1) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[19:31:23] (CR) jenkins-bot: [V: -1] maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871 (owner: Alexandros Kosiaris)
[19:39:02] operations, Labs, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1454934 (Andrew) Oh, except on 3.19.0, resuming an instance doesn't work. It says it's resuming but actually never works again.
[19:47:30] (PS1) Legoktm: Disable AccountAudit [mediawiki-config] - https://gerrit.wikimedia.org/r/224936 (https://phabricator.wikimedia.org/T105894)
[19:47:47] (PS2) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[19:47:49] (PS1) Alexandros Kosiaris: role::maps: Assign postgresql::postgis::pgversion a value as well [puppet] - https://gerrit.wikimedia.org/r/224937
[19:47:52] (CR) jenkins-bot: [V: -1] maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871 (owner: Alexandros Kosiaris)
[19:47:55] (CR) jenkins-bot: [V: -1] role::maps: Assign postgresql::postgis::pgversion a value as well [puppet] - https://gerrit.wikimedia.org/r/224937 (owner: Alexandros Kosiaris)
[19:49:59] <_joe|afk> James_F: I think to ramp up to 75% this week
[19:50:09] _joe|afk: Cool.
[19:50:18] (PS2) Alexandros Kosiaris: role::maps: Assign postgresql::postgis::pgversion a value as well [puppet] - https://gerrit.wikimedia.org/r/224937
[19:50:29] <_joe|afk> But I'm dining atm
[19:50:41] <_joe|afk> :-)
[19:51:31] _joe|afk: get off your phone!
[19:57:21] (CR) Alexandros Kosiaris: [C: 2] role::maps: Assign postgresql::postgis::pgversion a value as well [puppet] - https://gerrit.wikimedia.org/r/224937 (owner: Alexandros Kosiaris)
[19:58:17] (PS6) Zfilipin: The basic RuboCop configuration [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020)
[19:59:11] operations, Traffic, HTTPS, Mobile, Patch-For-Review: TLS and *.wap/*.mobile multi-level subdomains of wikipedia.org - https://phabricator.wikimedia.org/T104942#1455025 (dr0ptp4kt) I added Reading web engineers to the patch and will add them to this ticket as well in case they have any concerns.
[20:00:04] gwicke cscott arlolra subbu: Respected human, time to deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T2000). Please do the needful.
[20:00:19] operations, Traffic, HTTPS, Mobile, Patch-For-Review: TLS and *.wap/*.mobile multi-level subdomains of wikipedia.org - https://phabricator.wikimedia.org/T104942#1455026 (dr0ptp4kt)
[20:00:23] twentyafterfour: how's group1?
[20:00:44] (CR) Zfilipin: "Patch set 6 updated rubocop configuration file for new (rebased) code." [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: Zfilipin)
[20:01:06] (CR) Zfilipin: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: Zfilipin)
[20:02:30] (CR) Zfilipin: "Rubocop is green!" [puppet] - https://gerrit.wikimedia.org/r/218389 (https://phabricator.wikimedia.org/T102020) (owner: Zfilipin)
[20:03:14] (PS3) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[20:03:16] (PS1) Alexandros Kosiaris: Add missing user attribute in maps-team labs hiera [puppet] - https://gerrit.wikimedia.org/r/224941
[20:03:21] (CR) jenkins-bot: [V: -1] maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871 (owner: Alexandros Kosiaris)
[20:03:22] (PS3) Rush: Phabricator: Turn on diffusion.allow-http-auth [puppet] - https://gerrit.wikimedia.org/r/223067 (owner: Chad)
[20:03:25] (CR) jenkins-bot: [V: -1] Add missing user attribute in maps-team labs hiera [puppet] - https://gerrit.wikimedia.org/r/224941 (owner: Alexandros Kosiaris)
[20:03:58] (CR) Rush: [C: 2 V: 2] Phabricator: Turn on diffusion.allow-http-auth [puppet] - https://gerrit.wikimedia.org/r/223067 (owner: Chad)
[20:05:14] grrrit-wm: I didn't do group1 yet, was just about to do that
[20:05:24] (PS1) 20after4: group1 wikis to 1.26wmf14 [mediawiki-config] - https://gerrit.wikimedia.org/r/224942
[20:05:35] (CR) 20after4: [C: 2] group1 wikis to 1.26wmf14 [mediawiki-config] - https://gerrit.wikimedia.org/r/224942 (owner: 20after4)
[20:05:41] (Merged) jenkins-bot: group1 wikis to 1.26wmf14 [mediawiki-config] - https://gerrit.wikimedia.org/r/224942 (owner: 20after4)
[20:05:48] :)
[20:06:09] er nice I replied to gerrit-wm
[20:06:28] stupid gerrit, always messing everything up
[20:06:47] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
[20:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:07:08] greg-g: done
[20:07:15] word
[20:11:21] Puppet, Continuous-Integration-Config, Patch-For-Review: Setup rubycop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1455054 (zeljkofilipin) Zuul commit is merged and deployed: https://gerrit.wikimedia.org/r/#/c/224850/ Rubocop job is green and in experimental pipe...
[20:12:17] operations, Traffic, fundraising-tech-ops, Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1455055 (BBlack) @PCoombe @CCogdill_WMF - We've already dealt with the generic/per-project issues in https://gerrit.wikimedia.org/r/...
[20:13:58] (PS2) BryanDavis: scap: Add co-master configuration [puppet] - https://gerrit.wikimedia.org/r/224829 (https://phabricator.wikimedia.org/T104826)
[20:15:55] did we deploy the new mediawiki stats stuff again?
[20:16:12] https://tessera.wikimedia.org/dashboards/6/ciphers?from=-1h <- dead data again
[20:16:37] twentyafterfour: ^
[20:16:39] ori: ^
[20:17:07] bblack: ?
[20:17:28] operations, Traffic, HTTPS, Mobile, Patch-For-Review: TLS and *.wap/*.mobile multi-level subdomains of wikipedia.org - https://phabricator.wikimedia.org/T104942#1455085 (BBlack) @dr0ptp4kt thanks :)
[20:17:52] bblack: porobably
[20:17:55] *probably
[20:18:03] my revert patch is still not merged
[20:18:09] I just did sync-wikiversions
[20:18:09] twentyafterfour: the past two times someone deployed some new stats thing built into wikimedia, it killed other statsd stats like this
[20:18:20] it probably was contained in the new wikiversion
[20:18:20] (CR) Rush: [C: -1] "I don't think doing this in this way is a good idea at all. Most of this logic doesn't belong under 'role' and duplicating the general ph" [puppet] - https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: Negative24)
[20:18:33] !log Running FlowCreateMentionTemplate.php on all Flow wikis
[20:18:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Mr. Obvious
[20:18:43] https://gerrit.wikimedia.org/r/#/c/223673/ is unmerged, so I think we did push it with the new branch
[20:18:53] wmf14 went to group1, it went out to group0 earlier today
[20:19:46] twentyafterfour: I think it's probably only an issue when it's used significantly in traffic terms
[20:20:11] ori: bblack is seeing https://tessera.wikimedia.org/dashboards/6/ciphers?from=-1h die again with the new branch
[20:20:16] it spams statsd with traffic. it used to be broken traffic. I don't know if that's fixed and it's just a load issue, or not and it's still broken
[20:20:35] (the ciphers link is just my easiest example URL. we lose lots of others stats too)
[20:20:40] !log es1.6 upgrade: upgrade elastic1020
[20:20:41] we tried to fix the broken stats but I'm thinking it is a traffic issue
[20:20:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:21:07] so, do we need to patch wmf14 or roll it back again?
[20:21:27] we can patch
[20:21:34] bd808: can you review https://gerrit.wikimedia.org/r/#/c/180027/
[20:21:49] twentyafterfour: cherry pick https://gerrit.wikimedia.org/r/#/c/223673/
[20:21:51] (we tested, and I think it should be fine)
[20:22:00] sorry for wrong channel though.
[20:22:44] Puppet, Continuous-Integration-Config, Patch-For-Review: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1455123 (zeljkofilipin)
[20:23:20] http://graphite.wikimedia.org/render/?width=931&height=511&_salt=1436991786.808&target=varnish.eqiad.backends.ipv4_10_2_2_1.2xx.count&from=-3hours
[20:23:39] ^ more busted stats :)
[20:23:44] bblack: I'm guessing it lines up with 20:06 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
[20:24:11] yeah
[20:24:17] cherry picked
[20:24:40] https://gerrit.wikimedia.org/r/#/c/224946/
[20:24:42] I think we are going to need to merge that in master :/
[20:26:01] (PS2) BBlack: sslcert: fix ::ca's ensure => absent [puppet] - https://gerrit.wikimedia.org/r/224828 (owner: Faidon Liambotis)
[20:27:23] (CR) BBlack: [C: 2] sslcert: fix ::ca's ensure => absent [puppet] - https://gerrit.wikimedia.org/r/224828 (owner: Faidon Liambotis)
[20:27:55] twentyafterfour: lgtm. legoktm just merged it to master too
[20:30:46] man sync-dir and sync-file have a long pause that isn't there with other scap invocations
[20:30:58] !log twentyafterfour Synchronized php-1.26wmf14: Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef (revert Count API module instantiations and Hook runs) (duration: 01m 48s)
[20:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:31:16] twentyafterfour: weird
[20:31:45] I mean it's not that I mind staring blankly at scappy for a minute but it is weird
[20:32:04] she is a radiant pig
[20:32:16] maybe tin just has slow disks? :-/
[20:33:12] !log globally cleaning up dangling symlinks left in /etc/certs from before Id7d2447 via salted 'find /etc/ssl/certs -type l -xtype l|xargs rm'
[20:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:33:48] twentyafterfour: I think the last time I tried to track that down the only thing I could think of is ipv6 dns timeouts on the initial ssh connections
[20:35:15] yeah, I guess it might just be that sync-file and sync-dir don't output any status information before they try to go to the network, whereas other invocations do some local stuff ( with log output ) up-front so the perception of a delay is different
[20:36:44] confusion from delay with progress bar < confusion from delay without progress bar
[20:37:09] twentyafterfour: perception is key :)
[20:37:29] I keep wondering if we should add `-4` to the ssh command
[20:38:14] we're having issues with basic "salt" functionality
[20:38:16] bd808: graphite unhappiness caused by mw stats again?
[20:38:36] I don't recall if/where that's tied into deployment, but could explain slow stuff there too
[20:38:37] gwicke: yeah :( twentyafterfour just pushed the revert
[20:38:46] (or, perhaps the deploy is causing the slowness elsewhere, I don't know)
[20:41:25] !log restarted salt-master service on palladium
[20:41:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:42:53] bblack: no salt used by scap. It's all homebrew dsh
[20:43:20] ok
[20:43:30] (CR) Krinkle: "Ah, shit. Someone overwrote /db/grafana with a copy of /file/default.json. It's gone. Can't find it in logstash-elasticsearch either, ther" [puppet] - https://gerrit.wikimedia.org/r/224129 (owner: Krinkle)
[20:48:31] bblack: salt is just unreliable it seems
[20:48:56] (CR) Rush: "what's up with this change?" [puppet] - https://gerrit.wikimedia.org/r/205797 (https://phabricator.wikimedia.org/T548) (owner: 20after4)
[20:49:06] well, that does seem to be the general trend over all the time we've had it
[20:49:16] but it had been "ok" recently, up until today/now
[20:49:38] (CR) 20after4: "rush: I haven't been able to get it reviewed :(" [puppet] - https://gerrit.wikimedia.org/r/205797 (https://phabricator.wikimedia.org/T548) (owner: 20after4)
[20:50:31] twentyafterfour: did akosiaris remove his -1 here? https://gerrit.wikimedia.org/r/#/c/205797/
[20:50:40] or was it abandoned and reopened or something
[20:50:48] (not sure what the -1 was for tho)
[20:51:12] (CR) 20after4: "Because this was holding up serious issues I just went around the problem and edited settings in the phab database. Perhaps keeping these " [puppet] - https://gerrit.wikimedia.org/r/205797 (https://phabricator.wikimedia.org/T548) (owner: 20after4)
[20:51:28] (PS2) Alexandros Kosiaris: Add missing user attribute in maps-team labs hiera [puppet] - https://gerrit.wikimedia.org/r/224941
[20:51:38] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add missing user attribute in maps-team labs hiera [puppet] - https://gerrit.wikimedia.org/r/224941 (owner: Alexandros Kosiaris)
[20:52:26] chasemp: it was abandoned and reopened. The -1 was for a couple of things but I never got a response after addressing some of them and protesting the others
[20:53:09] ok twentyafterfour, I will try to ask akosiaris when I get a chance
[20:53:17] statsd stats still look unhealthy, was the previous scap supposed to be complete and in effect?
[20:53:37] bblack: it should be
[20:53:38] I'll check the graphite host in case it needs some kind of kick there
[20:53:47] the only pickles I see are, there is hard coded /srv/phab path which isn't fixed and do you intend to move all the default yaml settings to this format?
[20:53:49] or was it stat100x, I think
[20:54:04] but I get it in general it's more deterministic clearly than ruby
[20:54:16] by fixed I meant isn't static :)
[20:54:28] jouncebot, next
[20:54:28] In 2 hour(s) and 5 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T2300)
[20:54:41] bblack: doh! I didn't merge the change so no it didn't get synced
[20:55:27] chasemp: yeah I would eventually move all the defaults out of yaml
[20:56:01] bblack: I apologize, I totally thought that merged
[20:56:04] twentyafterfour: do those still override them via teh class params because of phab env magic?
[20:57:09] do which still override?
[20:58:04] like right now if teh default is no and the class param via role is yes it overrides to yes etc
[20:58:09] (PS4) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[20:58:11] (PS1) Alexandros Kosiaris: Specify postgresql::spatialdb for maps-team hiera [puppet] - https://gerrit.wikimedia.org/r/224953
[20:58:15] (CR) jenkins-bot: [V: -1] maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871 (owner: Alexandros Kosiaris)
[20:58:17] (CR) jenkins-bot: [V: -1] Specify postgresql::spatialdb for maps-team hiera [puppet] - https://gerrit.wikimedia.org/r/224953 (owner: Alexandros Kosiaris)
[20:58:26] (PS2) Alexandros Kosiaris: Specify postgresql::spatialdb for maps-team hiera [puppet] - https://gerrit.wikimedia.org/r/224953
[20:58:31] I guess yes because the local.json overrides
[20:58:34] (CR) Alexandros Kosiaris: [C: 2 V: 2] Specify postgresql::spatialdb for maps-team hiera [puppet] - https://gerrit.wikimedia.org/r/224953 (owner: Alexandros Kosiaris)
[20:58:35] and that is sourced via puppet params
[20:58:44] just working it out here for myself
[20:59:56] yeah local.json would still be used for non-static defaults, but global static defaults would move to a static php file
[21:00:43] bblack: waiting for jenkins to merge
[21:01:43] (PS5) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[21:01:45] (PS1) Alexandros Kosiaris: Fix typo with postgresql::spatialdb::postgis_version [puppet] - https://gerrit.wikimedia.org/r/224954
[21:01:49] (CR) jenkins-bot: [V: -1] maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871 (owner: Alexandros Kosiaris)
[21:01:51] (CR) jenkins-bot: [V: -1] Fix typo with postgresql::spatialdb::postgis_version [puppet] - https://gerrit.wikimedia.org/r/224954 (owner: Alexandros Kosiaris)
[21:02:23] (PS2) Alexandros Kosiaris: Fix typo with postgresql::spatialdb::postgis_version [puppet] - https://gerrit.wikimedia.org/r/224954
[21:02:35] (CR) Alexandros Kosiaris: [C: 2 V: 2] Fix typo with postgresql::spatialdb::postgis_version [puppet] - https://gerrit.wikimedia.org/r/224954 (owner: Alexandros Kosiaris)
[21:03:53] (PS1) KartikMistry: WIP: Do not use registry for Beta [puppet] - https://gerrit.wikimedia.org/r/224955
[21:05:27] twentyafterfour: It seems the "COunt API module/hooks" patch brought down statsd/graphite again
[21:05:44] We just lost 1.0 million of 1.5 million requests counts per minute
[21:05:50] down to 400k
[21:05:52] doens't make sense
[21:06:00] this happened before as well when that same patch was deployed
[21:06:02] Krinkle: I'm aware
[21:06:03] twice!
[21:06:08] waiting for jenkins to merge the revert
[21:06:10] this is the third time now
[21:06:12] OK
[21:06:25] What was different this time that we thought it wouldn't break?
[21:06:30] Krinkle: bd808 reverted it on master too
[21:06:57] Krinkle: I don't know. I think it just never got reverted in master so it keeps getting in each new branch
[21:07:01] basically overlooked
[21:07:31] Krinkle: I think people just didn't realize it was going out. It was never reverted in master, so it went with a branch deploy
[21:07:48] why wasn't it reverted in master by whoever reverted it from the branch?
[21:07:50] oh my scroll was messed up, that's all above heh
[21:07:59] OK
[21:08:14] Krinkle: bblack syncing for real this time
[21:09:05] greg-g: they weren't sure of the root cause, is what I gathered
[21:09:06] greg-g: the reversion patch got a -1 with "here's this other patch that fixes it properly so we don't have to revert it", but apparently that's not true, or something
[21:09:21] gotcha
[21:09:32] then let's just move forward and be happy :)
[21:09:34] !log twentyafterfour Synchronized php-1.26wmf14: Really Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef this time (duration: 01m 32s)
[21:09:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:09:57] bblack: Krinkle: is it looking healthy now?
[21:10:22] right, the reversion is now merged to master, so we should be good going forward, until someone tries this again
[21:10:39] twentyafterfour: don't see recovery yet, but it may take a minute
[21:11:25] RB metrics are starting to pull back up after flying very close to the ground for a bit
[21:11:38] there is a large uptick on https://tessera.wikimedia.org/dashboards/6/ciphers?from=-1h
[21:11:50] yeah, looks recovering-ish
[21:11:56] hopefully the rest of the lines tick up in a few more seconds :)
[21:13:21] so that change we just reverted was just overloading statsd to the point where it overwhelmed other stats and they were getting dropped?
[21:13:34] probably
[21:13:51] statsdlb probably
[21:14:01] that must take a hell of a lot of events
[21:14:09] it's a single-threaded process
[21:14:18] the very first time, the overload also included lots of invalid stats, and the statsite listeners logged about every single hit to syslog and then the ~46GB syslog files filled the root fs.
[21:14:37] oh wow
[21:14:37] I think that was supposedly reduced or addressed, so the other two incidents may have just been pure overload
[21:15:37] so at least one unencountered issue got fixed in the process, not entirely a useless-failure
[21:15:48] just call it chaos monkey :)
[21:16:19] chaos train? :)
[21:16:44] quick someone deploy the chaos train, things are too calm around here ;)
[21:17:14] * gwicke groans faintly
[21:18:42] the fact that statsite and/or statsdlb overloads just cause a numerical dropoff in some stats begs the question: are we missing some smaller chunk of stats all the time just due to random drops/spikes?
[21:19:19] we definitely did over long periods
[21:20:01] I'm definitely warming up to the idea of running local statsd daemons that export a pull interface to a central aggregator
[21:20:29] what's the name of the pull-based graphite alternative again?
[21:24:24] no idea
[21:24:54] my google-fu seems to be too weak
[21:25:20] prometheus
[21:25:23] ?
[21:25:26] yes!
[21:25:55] when google fails, you can always count on chasemp. ;)
[21:26:11] ^ chasemp you can quote me on that one. for the lols
[21:26:14] heh
[21:26:23] chasemp: thanks!
[21:26:25] I've been lurking in their channel for a month or so to get an idea of stability
[21:27:04] what's your impression?
[21:27:47] "too new"
[21:27:53] yes essentially
[21:28:01] their docs read way more mature than their experience I think
[21:28:22] overall pleasant channel though
[21:29:31] an interesting narrative has been people asking for push metrics ala graphite replacment
[21:29:32] the aggregation model with counters and timestamps seems to be more robust and easier to reason about
[21:29:40] also the time we've invested into graphite is probably in the man-years scale, we should weight that against perceived benefits of a new system
[21:29:41] and the maintainers trying to decide how much effort to dedicate
[21:29:49] as it's not really their primary use case
[21:30:01] but they also aren't fulfilling across the board as $othertool replacements atm
[21:30:12] so it's at a bit of a crossroads I think in the next 6mo-1yr
[21:30:36] *nod*
[21:31:12] at least a pull-based model would be less likely to drop metrics across the board because of one app misbehaving
[21:31:53] the subproject spun up to handle https://github.com/prometheus/pushgateway which I think is quasi-step childed still
[21:32:01] but otoh, the monitoring system then needs to know about all clients in order to be able to poll
[21:32:03] I think...both methods for different cases make sense
[21:32:36] (PS6) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[21:32:38] (PS1) Alexandros Kosiaris: postgresql::spatialdb is a define, not hiera autolookups [puppet] - https://gerrit.wikimedia.org/r/224957
[21:34:11] (CR) Alexandros Kosiaris: [C: 2] postgresql::spatialdb is a define, not hiera autolookups [puppet] - https://gerrit.wikimedia.org/r/224957 (owner: Alexandros Kosiaris)
[21:37:50] !log es1.6 upgrade: upgrade elastic1021
[21:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:39:02] chasemp: I actually wonder if there are any pull-based aggregators that could work with graphite
[21:40:01] well diamond is basically pulling and then pushing into graphite as a relay...through statsd as an aggregator
[21:40:16] so the model wouldn't be too disimilar
[21:40:32] but it all gets weird as we layer it :)
[21:40:58] yeah, it would need to be all pull aggregation per metric to deliver reliable results
[21:41:14] afaik, stacked statsds don't work so well
[21:41:25] it would be either/or for a particular metric for sure
[21:41:50] *nod*
[21:53:30] (CR) Alex Monk: [C: 2] Enable ShortUrl extension at orwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/222584 (https://phabricator.wikimedia.org/T103644) (owner: Glaisher)
[21:53:59] !log es1.6 upgrade: upgrade elastic1022
[21:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:54:07] (Merged) jenkins-bot: Enable ShortUrl extension at orwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/222584 (https://phabricator.wikimedia.org/T103644) (owner: Glaisher)
[21:57:57] (CR) John Vandenberg: [C: 1] Re-enable all languages in GeSHi [mediawiki-config] - https://gerrit.wikimedia.org/r/224799 (https://phabricator.wikimedia.org/T105889) (owner: Reedy)
[21:59:26] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222584/ (duration: 00m 13s)
[21:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:04:58] (PS7) Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - https://gerrit.wikimedia.org/r/224871
[22:05:00] (PS1) Alexandros Kosiaris: Order postgresql::spatialdb after postgis [puppet] - https://gerrit.wikimedia.org/r/224962
[22:05:26] (CR) Alex Monk: [C: 2] Add www.workwithsounds.eu to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/223843 (https://phabricator.wikimedia.org/T105143) (owner: Odder)
[22:05:42] (CR) Alexandros Kosiaris: [C: 2 V: 2] Order postgresql::spatialdb after postgis [puppet] - https://gerrit.wikimedia.org/r/224962 (owner: Alexandros Kosiaris)
[22:05:51] (Merged) jenkins-bot: Add www.workwithsounds.eu to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/223843 (https://phabricator.wikimedia.org/T105143) (owner: Odder)
[22:06:37] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223843/ (duration: 00m 12s)
[22:06:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:16:32] (PS1) Alexandros Kosiaris: maps-team hiera: Update IPs to the new ones [puppet] - https://gerrit.wikimedia.org/r/224963
[22:17:15] (CR) Alexandros Kosiaris: [C: 2 V: 2] maps-team hiera: Update IPs to the new ones [puppet] - https://gerrit.wikimedia.org/r/224963 (owner: Alexandros Kosiaris)
[22:18:11] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.69% of data above the critical threshold [500.0]
[22:20:21] (CR) coren: [C: 1] "This may, in fact, be causing issues atm." [puppet] - https://gerrit.wikimedia.org/r/224660 (owner: Andrew Bogott)
[22:23:53] !log deploy patch for T105305 to wmf13/14
[22:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:24:01] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[22:24:02] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[22:27:13] operations, Traffic, HTTPS, Mobile, Patch-For-Review: TLS and *.wap/*.mobile multi-level subdomains of wikipedia.org - https://phabricator.wikimedia.org/T104942#1455510 (Tnegrin) @Jhernandez, @dr0ptp4kt -- what's the level of effort for Reading here?
[22:29:23] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [22:30:41] 6operations, 10Traffic, 7HTTPS, 7Mobile, 5Patch-For-Review: TLS and *.wap/*.mobile multi-level subdomains of wikipedia.org - https://phabricator.wikimedia.org/T104942#1455523 (10BBlack) It shouldn't be a matter of any real effort. As far as I know, both of these subdomains were already deprecated years... [22:33:34] (03CR) 10Negative24: "@chasemp I agree. This should be in a module and that would solve many problems. I don't know if you want this to be a new class or just e" [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [22:33:59] (03CR) 10Negative24: "*in the module (not a)" [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [22:34:27] (03PS8) 10Alexandros Kosiaris: maps:: Add cassandra as a component of role::maps classes [puppet] - 10https://gerrit.wikimedia.org/r/224871 [22:35:39] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] maps:: Add cassandra as a component of role::maps classes [puppet] - 10https://gerrit.wikimedia.org/r/224871 (owner: 10Alexandros Kosiaris) [22:40:50] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [22:40:51] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [22:45:04] (03CR) 10Hoo man: [C: 04-1] "Many style comments, nothing substantial" (0351 comments) [puppet] - 10https://gerrit.wikimedia.org/r/219800 (https://phabricator.wikimedia.org/T103087) (owner: 10Lokal Profil) [22:46:32] (03PS1) 10Andrew Bogott: Don't use multi-master for labs salt. It has never worked. 
[puppet] - 10https://gerrit.wikimedia.org/r/224967 [22:52:21] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests: Rename zh-yue -> yue - https://phabricator.wikimedia.org/T30441#1455609 (10C933103) [22:52:42] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests: Rename zh-yue -> yue - https://phabricator.wikimedia.org/T30441#335030 (10C933103) >>! In T30441#335089, @deryckchan wrote: > See bug 19986 for the discussion on why it's so difficult. (Thanks Andre Klapper) (T21986) [22:53:45] (03CR) 10Alex Monk: [C: 04-1] "This should be split" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214981 (https://phabricator.wikimedia.org/T100953) (owner: 10Odder) [22:54:41] (03PS1) 10KartikMistry: CX: Add missing eo-en pair [puppet] - 10https://gerrit.wikimedia.org/r/224968 [22:55:12] (03CR) 10Alex Monk: "It doesn't really depend on that though, does it?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220061 (owner: 10Cscott) [22:57:53] (03CR) 10Alex Monk: "Ping." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200038 (owner: 10Cscott) [22:58:44] (03CR) 10Alex Monk: "Andrew, Coren, Yuvi: Do we want to do this?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: 10Ladsgroup) [22:59:05] (03CR) 10Andrew Bogott: [C: 032] Don't use multi-master for labs salt. It has never worked. [puppet] - 10https://gerrit.wikimedia.org/r/224967 (owner: 10Andrew Bogott) [22:59:43] 6operations, 10Traffic, 7HTTPS, 7Mobile, 5Patch-For-Review: TLS and *.wap/*.mobile multi-level subdomains of wikipedia.org - https://phabricator.wikimedia.org/T104942#1455649 (10Tnegrin) cool -- I'll defer to the expertise of Adam and Joaquin on this. [23:00:04] RoanKattouw ostriches rmoen Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150715T2300). Please do the needful. 
[23:03:08] (03PS6) 10Alex Monk: Add interwiki-labs.cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175755 (https://phabricator.wikimedia.org/T69931) (owner: 10Reedy) [23:03:15] (03CR) 10Alex Monk: [C: 032] Add interwiki-labs.cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175755 (https://phabricator.wikimedia.org/T69931) (owner: 10Reedy) [23:03:21] (03Merged) 10jenkins-bot: Add interwiki-labs.cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175755 (https://phabricator.wikimedia.org/T69931) (owner: 10Reedy) [23:04:42] (03CR) 10Alex Monk: Disable webp for now, so we can enable outside of WMF (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221731 (https://phabricator.wikimedia.org/T27397) (owner: 10TheDJ) [23:06:39] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 12s) [23:06:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:08:59] !log krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 13s) [23:09:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:09:18] (03CR) 10Alex Monk: [C: 032] Re-enable all languages in GeSHi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224799 (https://phabricator.wikimedia.org/T105889) (owner: 10Reedy) [23:09:23] (03PS3) 10Odder: Provide static PNG logos for emlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214981 (https://phabricator.wikimedia.org/T100953) [23:09:50] (03Merged) 10jenkins-bot: Re-enable all languages in GeSHi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224799 (https://phabricator.wikimedia.org/T105889) (owner: 10Reedy) [23:10:37] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224799/ (duration: 00m 13s) [23:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:11:51] (03CR) 10Alex 
Monk: [C: 032] Convert some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [23:11:54] (03CR) 10jenkins-bot: [V: 04-1] Convert some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [23:13:11] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [23:14:19] (03PS4) 10Odder: Provide static PNG logo for emlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214981 (https://phabricator.wikimedia.org/T100953) [23:14:42] RECOVERY - Host mw2027 is UPING OK - Packet loss = 0%, RTA = 43.61 ms [23:14:54] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1455703 (10Slakr) 21 SineBot/1.5.19(User:SineBot) Fixed. [23:15:15] (03PS3) 10Alex Monk: Convert some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [23:16:01] (03CR) 10Alex Monk: [C: 032] Convert some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [23:16:07] (03Merged) 10jenkins-bot: Convert some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [23:16:44] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/194075/ (duration: 00m 12s) [23:16:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:19:01] (03CR) 10Alex Monk: [C: 032] "Yeah, seems unused" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209840 (https://phabricator.wikimedia.org/T62023) (owner: 10Reedy) [23:19:03] (03CR) 10jenkins-bot: [V: 04-1] Remove wmgDualLicense, orphaned 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/209840 (https://phabricator.wikimedia.org/T62023) (owner: 10Reedy) [23:21:05] (03PS5) 10Alex Monk: Remove wmgDualLicense, orphaned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209840 (https://phabricator.wikimedia.org/T62023) (owner: 10Reedy) [23:21:14] (03CR) 10Alex Monk: [C: 032] Remove wmgDualLicense, orphaned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209840 (https://phabricator.wikimedia.org/T62023) (owner: 10Reedy) [23:21:20] (03Merged) 10jenkins-bot: Remove wmgDualLicense, orphaned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209840 (https://phabricator.wikimedia.org/T62023) (owner: 10Reedy) [23:22:07] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209840/ (duration: 00m 12s) [23:22:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:23:21] (03PS5) 10Odder: Provide static PNG logo for emlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214981 (https://phabricator.wikimedia.org/T100953) [23:25:58] (03CR) 10Odder: "Should be okay to merge." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214981 (https://phabricator.wikimedia.org/T100953) (owner: 10Odder) [23:27:35] Krenair: maybe you can do https://gerrit.wikimedia.org/r/#/c/221885/ this window? [23:29:35] AaronSchulz, okay [23:30:00] (03PS1) 10Alex Monk: Beta: Only enable ContentTranslation on wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/224975 (https://phabricator.wikimedia.org/T91340) [23:30:18] AaronSchulz, does not merge [23:33:50] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 697.509448533 [23:34:12] even though it merges for me locally. wtf gerrit? 
[23:34:29] (03PS5) 10Aaron Schulz: Set $wgMainStash to redis instead of the DB default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) [23:34:43] yeah, merges fine...I've seen that lots of times before [23:35:22] (03CR) 10Alex Monk: [C: 032] Set $wgMainStash to redis instead of the DB default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) (owner: 10Aaron Schulz) [23:36:06] helpful timing, grrrit-wm... [23:36:33] AaronSchulz, syncing [23:36:42] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221885/ (duration: 00m 13s) [23:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:40:28] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests: Rename zh-classical -> lzh - https://phabricator.wikimedia.org/T30443#1455779 (10Liuxinyu970226) [23:41:31] PROBLEM - puppet last run on pollux is CRITICAL puppet fail [23:42:11] (03PS1) 10Alexandros Kosiaris: Update cassandra seeds for maps-team project [puppet] - 10https://gerrit.wikimedia.org/r/224977 [23:42:51] AaronSchulz, all good? [23:43:37] seems fine to me [23:43:37] seems so [23:43:41] * Krenair goes to get dinner [23:44:56] 6operations, 10ops-eqiad, 10Analytics-Cluster: rack new hadoop worker nodes - https://phabricator.wikimedia.org/T104463#1455784 (10Ottomata) Hm, oh! Chris, we can move analytics1003, analytics1004 and analytics1010 to Row D. If I remember correctly those are tall servers, so maybe we can fit a few in their... 
[23:45:00] PROBLEM - puppet last run on labsdb1004 is CRITICAL Puppet has 5 failures [23:45:44] (03CR) 10Alexandros Kosiaris: [C: 032] Update cassandra seeds for maps-team project [puppet] - 10https://gerrit.wikimedia.org/r/224977 (owner: 10Alexandros Kosiaris) [23:46:51] (03PS7) 10Gergő Tisza: Basic role for Sentry [puppet] - 10https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [23:47:19] (03CR) 10Gergő Tisza: "rebased" [puppet] - 10https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [23:49:24] (03Abandoned) 10Aaron Schulz: Set $wgActivityUpdatesUseJobQueue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206862 (https://phabricator.wikimedia.org/T91284) (owner: 10Aaron Schulz) [23:50:57] (03PS1) 10Alexandros Kosiaris: postgres: allow defaults for unix_socket_directory [puppet] - 10https://gerrit.wikimedia.org/r/224979 [23:59:17] (03CR) 10Alexandros Kosiaris: [C: 032] postgres: allow defaults for unix_socket_directory [puppet] - 10https://gerrit.wikimedia.org/r/224979 (owner: 10Alexandros Kosiaris)