[00:04:37] yay, it works
[00:25:27] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1076862 (Prtksxna) Thanks @akosiaris!
[00:34:29] (PS3) Tim Landscheidt: Tools: Use labsdeprepo [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925)
[00:35:16] (CR) Tim Landscheidt: [C: -1] "Needs to be tested first, and for that I need wikitech." [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[00:47:10] PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1455 bytes in 0.221 second response time
[01:04:04] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1076898 (Prtksxna) I am currently unable to clone from the private respository— ```COUNTEREXAMPLE ~/wmf ▶ git clone ssh://prtksxna@gerr...
[01:10:20] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0066889632107
[01:19:45] (PS1) Jforrester: Beta Features: Remove VisualEditor language tool (deployed everywhere) [mediawiki-config] - https://gerrit.wikimedia.org/r/193762
[01:20:02] (CR) Jforrester: [C: -1] "Not until after wmf20 is everywhere."
[mediawiki-config] - https://gerrit.wikimedia.org/r/193762 (owner: Jforrester)
[01:20:30] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[01:30:40] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[01:47:20] RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1445 bytes in 0.186 second response time
[02:03:10] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 01s)
[02:04:18] !log LocalisationUpdate completed (1.25wmf18) at 2015-03-02 02:03:14+00:00
[02:04:41] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:05:50] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-02 02:04:47+00:00
[02:17:06] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Mar 2 02:16:03 UTC 2015 (duration 16m 2s)
[02:26:38] (PS2) Tim Starling: Remove alias codes from langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697) (owner: TTO)
[02:26:45] (CR) Tim Starling: [C: 2] Remove alias codes from langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697) (owner: TTO)
[02:26:50] (Merged) jenkins-bot: Remove alias codes from langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697) (owner: TTO)
[02:37:20] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[02:38:00] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[02:41:29] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[02:51:13] (CR) TTO: "Thanks Tim.
Could you please merge the dependency as well (see commit message)? Otherwise lots of interwiki links will break, IIRC." [mediawiki-config] - https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697) (owner: TTO)
[03:09:14] (PS2) Alex Monk: Setting import sources for uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: Base)
[03:13:58] (PS4) Alex Monk: Enable Collection by default on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/165490 (https://phabricator.wikimedia.org/T73416) (owner: Reedy)
[03:14:47] (CR) Tim Landscheidt: ""It's complicated." http://permalink.gmane.org/gmane.org.wikimedia.labs/3011 suggests this was done to send a signal different from KILL " [puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331) (owner: Tim Landscheidt)
[03:28:45] (PS3) Alex Monk: Enabling subpages for ns0 in uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193661 (https://phabricator.wikimedia.org/T91185) (owner: Base)
[03:33:03] (CR) Alex Monk: "IIRC we'll have to run something like this to do the schema change:" [mediawiki-config] - https://gerrit.wikimedia.org/r/193138 (https://phabricator.wikimedia.org/T89898) (owner: Glaisher)
[04:07:50] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[04:07:50] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[04:09:40] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:20:26] operations, Labs: Replicate or back up glance image data on virt1000 - https://phabricator.wikimedia.org/T90628#1077040 (Krenair) virt1000, yep.
[04:25:30] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[04:25:30] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[04:27:20] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:18:20] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[05:21:40] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[05:29:00] !log on tin: updating deployment branches for Ieb27df7ef470cbda06b5b0f5bfb372bd7279c183
[05:29:06] Logged the message, Master
[05:29:48] !log tstarling Started scap: Ieb27df7ef470cbda06b5b0f5bfb372bd7279c183
[05:29:51] Logged the message, Master
[05:32:06] !log tstarling Finished scap: Ieb27df7ef470cbda06b5b0f5bfb372bd7279c183 (duration: 02m 17s)
[05:32:09] Logged the message, Master
[05:33:12] !log on terbium: fixed permissions on /srv/mediawiki/multiversion
[05:33:15] Logged the message, Master
[05:35:05] (PS1) KartikMistry: CX: Enable Content Translation in pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193774 (https://phabricator.wikimedia.org/T89635)
[05:36:34] (CR) Santhosh: [C: -1] "Must be configured for Main namespace publishing" [mediawiki-config] - https://gerrit.wikimedia.org/r/193774 (https://phabricator.wikimedia.org/T89635) (owner: KartikMistry)
[05:42:58] (PS2) KartikMistry: CX: Enable Content Translation in pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193774 (https://phabricator.wikimedia.org/T89635)
[05:51:08] (PS1) KartikMistry: CX: Enable Content Translation in kywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193775 (https://phabricator.wikimedia.org/T89337)
[05:54:33] !log tstarling Synchronized langlist: (no message) (duration: 00m 06s)
[05:54:39] Logged the message, Master
[05:54:39] RECOVERY - Unmerged
changes on repository mediawiki_config on tin is OK: No changes to merge.
[05:57:37] (PS1) Tim Starling: Don't make script files writable by unprivileged users [puppet] - https://gerrit.wikimedia.org/r/193776
[06:29:11] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:12] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:00] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:50] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:51] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:51] (CR) Faidon Liambotis: [C: -1] "Removing the owner/group will do nothing for the files right now, it will just leave them be owned by whatever they were owned before." [puppet] - https://gerrit.wikimedia.org/r/193776 (owner: Tim Starling)
[06:37:40] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[06:37:40] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[06:43:30] PROBLEM - HTTP error ratio anomaly detection on graphite2001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 4 below the confidence bounds
[06:43:30] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 4 below the confidence bounds
[06:45:40] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[06:46:20] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:46:41] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:47:20] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently
enabled, last run 40 seconds ago with 0 failures
[06:47:30] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:51:28] so daily puppet failure storm just ended, I guess _joe_ will be around soon
[06:52:17] awww
[07:13:00] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:13:01] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:16:12] (CR) Santhosh: [C: 1] CX: Enable Content Translation in pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193774 (https://phabricator.wikimedia.org/T89635) (owner: KartikMistry)
[07:24:50] (PS3) Yuvipanda: tools: Move uwsgi services also to generic nodes [puppet] - https://gerrit.wikimedia.org/r/193558 (https://phabricator.wikimedia.org/T91065)
[07:26:37] (CR) Yuvipanda: [C: 2] tools: Move uwsgi services also to generic nodes [puppet] - https://gerrit.wikimedia.org/r/193558 (https://phabricator.wikimedia.org/T91065) (owner: Yuvipanda)
[07:35:14] (PS1) Yuvipanda: tools: Add uwsgi support to generic webgrid nodes [puppet] - https://gerrit.wikimedia.org/r/193781 (https://phabricator.wikimedia.org/T91065)
[07:35:23] (CR) jenkins-bot: [V: -1] tools: Add uwsgi support to generic webgrid nodes [puppet] - https://gerrit.wikimedia.org/r/193781 (https://phabricator.wikimedia.org/T91065) (owner: Yuvipanda)
[07:36:54] paravoid: btw, only SPOF for tools now is redis and the ‘cron host’. I checked which hosts run where and know for sure now that if one virt host goes down, outside of cron and redis and ‘bigbrother’, nothing else will be disrupted
[07:36:58] things would just get a little slower.
[07:37:23] am adding more nodes so we run about 50% or so at capacity.
right now we’re way close to 100% and I don’t like that at all
[07:37:25] (CR) Tim Landscheidt: Tools: Puppetize toolwatcher (3 comments) [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[07:37:28] * YuviPanda gives more love to toollabs
[07:39:16] (PS2) Yuvipanda: tools: Add uwsgi support to generic webgrid nodes [puppet] - https://gerrit.wikimedia.org/r/193781 (https://phabricator.wikimedia.org/T91065)
[07:39:44] (CR) Yuvipanda: [C: 2 V: 2] tools: Add uwsgi support to generic webgrid nodes [puppet] - https://gerrit.wikimedia.org/r/193781 (https://phabricator.wikimedia.org/T91065) (owner: Yuvipanda)
[07:41:40] (PS7) Tim Landscheidt: Tools: Puppetize toolwatcher [puppet] - https://gerrit.wikimedia.org/r/120186
[07:43:36] YuviPanda: hi
[07:43:41] hey kart_
[07:43:51] YuviPanda: how are things?
[07:44:00] not bad. hand slightly worse than before
[07:44:06] mostly getting toollabs into better shape now
[07:45:00] YuviPanda: need some help here: https://gerrit.wikimedia.org/r/#/c/191263
[07:45:47] YuviPanda: what I'm trying to do is, we want to use 'registry' from config.default.js
[07:46:39] kart_: sorry, can’t help atm. doing toollabs things :(
[07:46:42] wait for akosiaris maybe?
[07:47:43] YuviPanda: no worries. Just need your thoughts on this anytime later today.
[07:47:52] cool, I’ll see if I have time
[07:47:53] thanks
[07:48:00] YuviPanda: Thanks!
[07:50:22] (CR) Tim Landscheidt: [C: -1] ""Could not find parent resource type 'role::labs::tools::config' of type hostclass in production"" [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[07:55:51] (PS8) Tim Landscheidt: Tools: Puppetize toolwatcher [puppet] - https://gerrit.wikimedia.org/r/120186
[07:57:15] (CR) Tim Landscheidt: "Tested in Toolsbeta."
[puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[07:58:11] (CR) Yuvipanda: [C: 1] Tools: Puppetize toolwatcher [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[08:01:55] <_joe_> what's toolwatcher?
[08:04:54] _joe_: https://github.com/wikimedia/labs-toollabs/blob/master/misctools/toolwatcher
[08:05:04] wait
[08:05:10] why isn’t that just done by a skeleton homedir?
[08:05:15] <_joe_> ....
[08:05:25] * _joe_ facepalms
[08:06:10] _joe_: so this is another one of those dilemmas. it’s terrible, but is it better to be terrible and unpuppetized or terrible and puppetized?
[08:06:19] or I could fix it, but other things need fixing first...
[08:06:26] <_joe_> YuviPanda: use skeletons FFS
[08:07:25] (CR) Yuvipanda: "Hmm, we should actually just use skeleton homedirs for this instead, no?" [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[08:17:26] (PS2) Yuvipanda: Tools: Puppetize jobkill [puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331) (owner: Tim Landscheidt)
[08:24:00] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[08:24:00] RECOVERY - HTTP error ratio anomaly detection on graphite2001 is OK: OK: No anomaly detected
[08:24:02] (PS1) KartikMistry: CX: Enable ky and pa in target, kk and tr in source [puppet] - https://gerrit.wikimedia.org/r/193789
[08:26:23] (CR) KartikMistry: [C: -1] "To be merge on 05 March, 2015 after https://gerrit.wikimedia.org/r/193774 and https://gerrit.wikimedia.org/r/193775 only :)" [puppet] - https://gerrit.wikimedia.org/r/193789 (owner: KartikMistry)
[08:31:30] (CR) Yuvipanda: [C: 2] "T91233 talks about getting rid of / documenting / rewriting this script."
[puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331) (owner: Tim Landscheidt)
[08:34:10] (CR) Yuvipanda: "I've filed T91235 to get rid of this, but am otherwise going to merge this on the principle of 'better terrible and puppetized than terrib" [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[08:34:17] (PS9) Yuvipanda: Tools: Puppetize toolwatcher [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[08:34:46] (CR) Yuvipanda: [C: 2 V: 2] Tools: Puppetize toolwatcher [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[08:39:52] (PS1) Yuvipanda: toollabs: Remove uwsgi node definitions [puppet] - https://gerrit.wikimedia.org/r/193791 (https://phabricator.wikimedia.org/T91065)
[08:41:32] (PS2) Yuvipanda: toollabs: Remove uwsgi node definitions [puppet] - https://gerrit.wikimedia.org/r/193791 (https://phabricator.wikimedia.org/T91065)
[08:41:41] (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove uwsgi node definitions [puppet] - https://gerrit.wikimedia.org/r/193791 (https://phabricator.wikimedia.org/T91065) (owner: Yuvipanda)
[08:57:14] good morning
[09:05:26] mogge
[09:11:13] operations, ops-codfw, wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1077427 (Joe) mc2018 gets to the pxe boot but never sends any dhcp packets to carbon, apparently. Are you sure it's properly connected?
[09:13:49] operations, ops-codfw, wikis-in-codfw: Configure mw2001-2134 correctly - https://phabricator.wikimedia.org/T91238#1077429 (Joe) NEW
[09:25:49] operations, ops-codfw: rack mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1077458 (Joe) I still need the following things: - Verify console redirection after boot is enabled - Verify hyperthreading is active - Insert mac addresses in the dhcp files so that we can install them expeditely.
[09:43:15] (PS1) Giuseppe Lavagetto: mediawiki: add currently installed codfw memcached [puppet] - https://gerrit.wikimedia.org/r/193798
[09:43:17] (PS1) Giuseppe Lavagetto: mediawiki: add codfw appservers to site.pp [puppet] - https://gerrit.wikimedia.org/r/193799
[09:55:29] (PS1) Nemo bis: Use language code "bho" directly instead of its alias "bh" [mediawiki-config] - https://gerrit.wikimedia.org/r/193801 (https://phabricator.wikimedia.org/T91240)
[09:57:49] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[10:00:13] (CR) Kelson: [C: 1] "Yes." [mediawiki-config] - https://gerrit.wikimedia.org/r/193801 (https://phabricator.wikimedia.org/T91240) (owner: Nemo bis)
[10:01:10] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[10:07:33] (CR) Tim Landscheidt: "Okay, now tested with toolsbeta-webproxy, from every angle I can imagine." [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[10:11:48] (CR) Yuvipanda: [C: 1] "Let's do this on say, friday?" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[10:20:30] (CR) Yuvipanda: "Needs someone to babysit this. I'll do it if nobody else does it by end of the week." [puppet] - https://gerrit.wikimedia.org/r/123903 (owner: Tim Landscheidt)
[10:22:56] (CR) TTO: "@Reedy: It'd be nice to get this going :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[10:32:08] YuviPanda: kart_: for https://gerrit.wikimedia.org/r/#/c/191263, using defaults for registry just so that languages can be enabled via deployment and not configuration is not the best approach.
[10:32:36] have some other way of enabling them, like a flag in a database or something
[10:37:01] (CR) Alexandros Kosiaris: [C: 2] CX: Enable ky and pa in target, kk and tr in source [puppet] - https://gerrit.wikimedia.org/r/193789 (owner: KartikMistry)
[10:50:20] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[11:00:30] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[11:03:47] (PS1) Tim Landscheidt: Tools: Fix creating initial access.conf and shosts.equiv [puppet] - https://gerrit.wikimedia.org/r/193804
[11:05:33] (CR) Tim Landscheidt: "Tested on Toolsbeta (and again starred at the screen for far too long before realizing that the instance I was initially testing did not i" [puppet] - https://gerrit.wikimedia.org/r/193804 (owner: Tim Landscheidt)
[11:07:18] (CR) Tim Landscheidt: "(The error is responsible for the Puppet failures on tools-webgrid-07 and tools-webgrid-generic-02.)" [puppet] - https://gerrit.wikimedia.org/r/193804 (owner: Tim Landscheidt)
[11:07:49] (PS2) Yuvipanda: Tools: Fix creating initial access.conf and shosts.equiv [puppet] - https://gerrit.wikimedia.org/r/193804 (owner: Tim Landscheidt)
[11:08:49] (CR) Yuvipanda: [C: 2 V: 2] "<3" [puppet] - https://gerrit.wikimedia.org/r/193804 (owner: Tim Landscheidt)
[11:37:05] (PS1) KartikMistry: Revert "CX: Enable ky and pa in target, kk and tr in source" [puppet] - https://gerrit.wikimedia.org/r/193809
[11:37:40] (CR) Alexandros Kosiaris: [C: 2 V: 2] Revert "CX: Enable ky and pa in target, kk and tr in source" [puppet] - https://gerrit.wikimedia.org/r/193809 (owner: KartikMistry)
[12:01:41] (PS1) Aude: Add badge items for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/193814
[12:03:23] (PS2) Aude: Add badge items for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/193814
[12:15:53] (CR) WMDE-Fisch: [C: 1] "looks good, need this for browsertests" [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[12:39:42] operations, hardware-requests: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1077798 (mark)
[14:45:04] (PS1) Hashar: contint: keep 180 min of puppet reports [puppet] - https://gerrit.wikimedia.org/r/193825 (https://phabricator.wikimedia.org/T87484)
[14:55:59] (CR) Hashar: "I have blanked https://wikitech.wikimedia.org/wiki/Hiera:Integration and applied this change to the puppetmaster:" [puppet] - https://gerrit.wikimedia.org/r/193825 (https://phabricator.wikimedia.org/T87484) (owner: Hashar)
[14:59:32] (PS1) Gerrit Patch Uploader: Add Draft namespace on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223)
[14:59:34] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[14:59:49] (PS2) Bugreporter: Add Draft namespace on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[15:04:06] Ops-Access-Requests, operations: RESTBase deploy access and shell on Cassandra cluster for eevans - https://phabricator.wikimedia.org/T91134#1078131 (coren) p:Triage>Normal
[15:07:58] operations: Enable TRIM for SSDs for Cassandra software raid - https://phabricator.wikimedia.org/T89584#1078148 (coren) p:Triage>Normal
[15:08:13] operations: Enable TRIM for SSDs for Cassandra software raid - https://phabricator.wikimedia.org/T89584#1039857 (coren) Changed title to reflect discussion
[15:10:40] (PS5) 01tonythomas: Added BounceHandler extension to group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/191937
(https://phabricator.wikimedia.org/T48640)
[15:15:54] !log temporarily depooling cp1064 (eqiad upload) for reinstall
[15:15:59] Logged the message, Master
[15:18:36] Puppet, Labs: dynamicproxy: Move list of blocked user agents to hiera - https://phabricator.wikimedia.org/T90844#1078178 (coren)
[15:22:19] operations, Staging: Package geoipupdate for jessie - https://phabricator.wikimedia.org/T90229#1078195 (coren) a:faidon
[15:22:43] why me? :)
[15:23:16] operations: Enable TRIM for SSDs for Cassandra software raid - https://phabricator.wikimedia.org/T89584#1078200 (BBlack) Also, assuming TRIM does reach the disk, you might want to do some perf testing and some research on how the drive is provisioned from the factory. I took an in-depth look at these issues...
[15:23:26] paravoid: Because you made the original package - I expect it should take you a few minutes? I can spirit it away from you if you are overly busy. :-)
[15:23:53] operations, Staging: Package geoipupdate for jessie - https://phabricator.wikimedia.org/T90229#1054117 (coren) p:Triage>Normal
[15:24:33] I quote the expert: "It should be trivial". :-)
[15:26:25] paravoid: By the way, backporting a recent LVM to precise went fairly well, I only have a minor bad dependency issue to fix and I'm all set (a package erroneously claims to need perl-base 5.20 which is just silly)
[15:27:50] I did it with a ppa to make it easy for me to test locally, bringing it back internally will be simple.
[15:29:08] (PS1) BBlack: depool cp1064 backend [puppet] - https://gerrit.wikimedia.org/r/193830
[15:30:24] (CR) BBlack: [C: 2] depool cp1064 backend [puppet] - https://gerrit.wikimedia.org/r/193830 (owner: BBlack)
[15:33:00] (CR) Krinkle: [C: 1] contint: keep 180 min of puppet reports [puppet] - https://gerrit.wikimedia.org/r/193825 (https://phabricator.wikimedia.org/T87484) (owner: Hashar)
[15:33:27] operations, ops-esams, HTTPS, HTTPS-by-default: esams power capacity issues - https://phabricator.wikimedia.org/T90000#1078215 (coren) p:Triage>High Blocker on high priority task.
[15:36:13] operations, HTTPS, HTTPS-by-default: Expand HTTP frontend clusters with new hardware - https://phabricator.wikimedia.org/T86663#1078222 (BBlack) Update: Orders have been placed for all of the new servers in esams + eqiad. eqiad hardware has already arrived, and esams is due by Mar 17th (for the disks,...
[15:36:29] operations: pybal issue? - https://phabricator.wikimedia.org/T90839#1078224 (coren) p:Triage>High This is set as blocker for two tasks, one of which is High. @Joe, you requested this ticket be created; are you up to taking it?
[15:37:30] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:37:40] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:38:33] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1078238 (akosiaris) @Prtksxna. OK, fixed. I had tested cloning the repo as my user, but not actually cloning the repo as a member of th...
[15:39:08] (CR) Hoo man: "Looks good at a glance, please be careful when syncing the files (Wikibase.php mustn't be synced first)." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[15:50:18] tonythomas, twkozlowski, aude: Ping for SWAT in 10 minutes.
[15:50:30] Jeff_Green: around ?
[15:50:36] manybubbles, marktraceur, ^d: Bunch of config changes to SWAT today. Anyone want it?
[15:50:41] operations: NIC misassigned (double entries) by jessie installer - https://phabricator.wikimedia.org/T90236#1078286 (coren) p:Triage>Low Since there is a workaround, and the bug is filed upstream, I'm setting this to Low priority despite the blocked task being High (nothing we can do directly at this ti...
[15:50:59] anomie: I imagine I can do it!
[15:51:06] manybubbles: ok!
[15:51:07] let me finish this email and review them
[15:51:14] pong
[15:51:54] <^d> Ugh I hobbled all the way over to respond to the ping and someone already picked up swat
[15:52:28] (CR) Aude: Add badge items for beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[15:52:40] (PS3) Aude: Add badge items for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/193814
[15:54:10] (CR) Hoo man: [C: 1] Add badge items for beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[15:55:11] you hurt, demon?
[15:55:20] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[15:55:56] anyone around to represent twkozlowski's change sets?
[15:56:16] manybubbles: He may show up yet
[15:56:22] operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1078302 (coren) a:BBlack Brandon is already working on this alongside Ariel
[15:56:40] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[15:56:41] operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1078306 (coren) p:Triage>Normal
[15:56:51] anomie: yeah.
[15:57:12] does gerrit share a password with anything? like shell accounts or something?
I'm logged out and can't keep strait how to log in
[15:57:45] manybubbles: wikitechwiki I think?
[15:57:50] (CR) Aude: Add badge items for beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[15:58:45] mforns: what are you trying to do? :)
[15:58:52] <^d> manybubbles: went skiing for the weekend, fine until the last 20 minutes when I decided to be stupid. fell and bruised my knee.
[15:59:20] hi paravoid, I recently got root access to vanadium, but my usual password does not work
[15:59:41] what usual password?
[15:59:45] trying to read some logs to troubleshoot eventlogging
[16:00:04] manybubbles, anomie, ^d: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150302T1600).
[16:00:37] neither the one I use to ssh into stat100x, nor the other one I use for everything else in wmf
[16:00:37] (CR) BryanDavis: [C: -1] "There may well be a bug but I'm pretty sure that "wikipedia" doesn't work because of this -- https://github.com/wikimedia/operations-media" [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 01tonythomas)
[16:00:53] uhm, what?
[16:01:06] we don't have password logins enabled
[16:01:11] anomie: looks like not the best time for https://gerrit.wikimedia.org/r/#/c/191937/
[16:01:14] (PS1) ArielGlenn: make monitor disk space take params, use for datasets [puppet] - https://gerrit.wikimedia.org/r/193834
[16:01:43] manybubbles: when deploying our change for beta wikidata, take extra care to sync things in the correct order
[16:01:45] paravoid, ok, it will probably be the pass that I used to sigh my sshkey, then
[16:01:57] Wikibase-labs.php and Wikibase-production.php first
[16:01:59] aude: k. here. i'll do you first then.
[16:02:02] ok
[16:02:09] mforns: you should never type this into a remote system then...
[16:02:11] tonythomas: are you ready to be next?
[16:02:12] (PS1) KartikMistry: CX: Remove wmgContentTranslationTargetNamespace [mediawiki-config] - https://gerrit.wikimedia.org/r/193835 (https://phabricator.wikimedia.org/T91256)
[16:02:22] (CR) Manybubbles: [C: 2] Add badge items for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[16:02:27] (Merged) jenkins-bot: Add badge items for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/193814 (owner: Aude)
[16:02:27] paravoid, right
[16:03:11] manybubbles: nope. I will have to withdrew that one
[16:03:16] you should probably change your passphrase :)
[16:03:28] tonythomas: k.
[16:03:29] also, accoridng to puppet you have no root access to vanadium
[16:04:13] paravoid, wow, I thought I got the access like couple of days ago, the task I created has been marked as resolved
[16:04:22] https://phabricator.wikimedia.org/T89471
[16:04:28] !log manybubbles Synchronized wmf-config/Wikibase-labs.php: SWAT wikidata - add badge items for beta (duration: 00m 06s)
[16:04:32] Logged the message, Master
[16:04:41] !log manybubbles Synchronized wmf-config/Wikibase-production.php: SWAT wikidata - add badge items for beta (duration: 00m 07s)
[16:04:44] Logged the message, Master
[16:05:25] !log manybubbles Synchronized wmf-config/Wikibase.php: SWAT wikidata - add badge items for beta (duration: 00m 06s)
[16:05:26] aude: ^^^^^^^ that was the last one
[16:05:27] Logged the message, Master
[16:05:29] thanks
[16:05:39] * aude verify nothing is horribly broken
[16:05:57] quiddity: Flow\Exception\FlowException from line 90 of /srv/mediawiki/php-1.25wmf18/extensions/Flow/includes/Templating.php: Insufficient permissions to see userlinks for rev_id = scppg71o9i6nivwm
[16:05:59] quiddity: normal?
[16:06:04] just noticed it while deploying
[16:06:36] http://wikidata.beta.wmflabs.org/w/api.php?action=wbavailablebadges :)
[16:06:43] and wikidata looks good
[16:06:53] another call for someone to support twkozlowski's patched for swat this morning.
[16:06:59] aude: sweet. I'll consider you done.
[16:07:22] k
[16:07:23] tonythomas: I can mark the patch as not done if you'd like and you can reschedule.
[16:07:46] manybubbles: yes. please. I think we are having still some issues on the same.
[16:08:57] tonythomas: done
[16:09:00] (PS2) KartikMistry: CX: Publish translations to the Main namespace by default [mediawiki-config] - https://gerrit.wikimedia.org/r/193835
[16:09:29] k. getting a drink. I'll give twkozlowski another fifteen minutes before I skip them and call swat closed.
[16:09:32] manybubbles: k. Deploying to 'group1' still remains unsolved
[16:10:58] tonythomas: ? I don't understand. need a consult? can do hangout if you want.
[16:11:52] they don't want to deploy to wikipedia
[16:11:56] but they do want every other wiki
[16:12:17] but I found that the 'wiki' tag appears to include sites like mediawikiwiki, which should get it
[16:12:41] Krenair: yeah - its kind of funky which tag to use. I remember cursing it with cirrus
[16:13:07] operations, ops-eqiad, RESTBase, Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1078340 (coren) p:Triage>High Setting to High as this is blocker to a High priority task.
[16:13:39] Krenair: add group0 => true as well?
[16:13:49] I doubt that'd work
[16:14:00] you'd still catch other *wiki sites
[16:14:18] hi twkozlowski
[16:14:27] hmm. I always wanted a separate list 'group1' , which as per bd808 would be difficult to maintain
[16:14:31] Hi! Sorry for being late.
[16:14:51] SiteConfiguration::getSetting is the place all this comes together [16:15:13] (03PS1) 10Krinkle: zuul: Use umask 022 for installing zuul [puppet] - 10https://gerrit.wikimedia.org/r/193836 (https://phabricator.wikimedia.org/T90984) [16:15:17] combined with the wikiTags[] stuff in CommonSettings.php [16:15:40] twkozlowski: ah cool - ready to support you patches? any particular order you want them in? [16:15:48] (03PS3) 10KartikMistry: CX: Publish translations to the Main namespace by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193835 [16:16:05] bd808, like metawiki, for example [16:16:05] bd808: you help them then - I never really traced it all the way through :) [16:16:11] and probably everything else in special.dblist [16:16:13] I honestly haven't audited it deeply but I do know that Reedy told me that wiki was the right key to use for 'all wikipedias'. [16:16:28] really? [16:16:32] https://phabricator.wikimedia.org/T91174 [16:16:49] Yeah I saw you file the bug [16:17:01] and I'm sure you are right that there is a bug in there somehow [16:17:03] (03PS1) 10ArielGlenn: set up bonded interface for ms1001 plus ipv6 for it [puppet] - 10https://gerrit.wikimedia.org/r/193837 [16:17:17] SiteConfiguration is full of voodoo [16:19:10] I'd have to dig in my irc logs to find it but Sam and I had a long discussion about how to set an "everything but wikipedia" config setting when I was rolling out the monolog config. His ultimate answer was the "wiki" key but like I said we didn't trace all the possible code in SiteConfiguration [16:19:39] 6operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1078377 (10ArielGlenn) actually I'm working on it by myself :-) takin it back... [16:19:54] 6operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1078378 (10ArielGlenn) a:5BBlack>3ArielGlenn [16:20:05] bd808: default => true, wikipedia => false ? 
[16:20:19] "wiki" works where it does because the site "wikipedia" is translated to "wiki" when it is turned into a dbsuffix [16:20:39] but there may certainly be other wikis where the normal dbsuffix is also "wiki" [16:20:44] There are [16:20:46] Like commonswiki [16:20:50] wikidatawiki :) [16:20:51] mediawikiwiki [16:20:52] and mediawikiwiki, and metawiki... [16:20:53] you don't want that [16:20:56] maybe [16:21:16] wmf-config/InitialiseSettings.php: 'wiki' => '//bits.wikimedia.org/favicon/wikipedia.ico', // bug 48479 [16:21:16] wmf-config/InitialiseSettings.php: 'wiki' => '//bits.wikimedia.org/apple-touch/wikipedia.png', [16:21:16] wmf-config/InitialiseSettings.php: 'wiki' => '/srv/mediawiki/images/sul/wikipedia.png', [16:21:23] wmf-config/InitialiseSettings.php: 'wiki' => '/images/mobile/W.png', [16:21:23] wmf-config/InitialiseSettings.php: 'wiki' => 'org.wikipedia', [16:21:24] sigh :( [16:21:33] metawiki should be metawikiwiki [16:21:40] and that's how we get bugs of wikipedia touch icon on wikidata [16:21:47] If line 181 didn't exist in CommonSettings.php you could use "wikipedia" [16:21:56] but I have no idea what removing that would break [16:22:51] 6operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1078391 (10ArielGlenn) https://gerrit.wikimedia.org/r/#/c/193837/ for this. [16:23:07] 6operations, 5Patch-For-Review: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1078395 (10ArielGlenn) [16:23:24] You could also add: if ( $site === 'wikipedia' ) $wikiTags[]="wikipedia" [16:23:40] and then "wikipedia" would work too [16:23:40] (03CR) 10coren: [C: 031] "That looks good."
[puppet] - 10https://gerrit.wikimedia.org/r/193837 (owner: 10ArielGlenn) [16:23:44] 6operations, 10ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1078400 (10faidon) a:5faidon>3mark [16:24:08] but again I'm not confident in how to audit the possible ramifications of that [16:24:11] there is +wikipedia [16:24:17] think that is the db list [16:24:54] I doubt it works actually [16:25:00] maybe... [16:25:09] twkozlowski: ready for swat? [16:25:51] '+wikipedia' => array( 'Wikipedia' => NS_PROJECT ), [16:25:55] in wgNamespaceAliases [16:26:09] the + is magical [16:26:18] indeed [16:26:24] + just means add to default config [16:26:24] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - leila - https://phabricator.wikimedia.org/T90954#1078402 (10chasemp) @leila ping [16:26:38] 6operations, 6Security: define in Puppet or remove user account - amire80 - https://phabricator.wikimedia.org/T90950#1078413 (10chasemp) @Amire80 ping [16:26:51] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - dartar - https://phabricator.wikimedia.org/T90949#1078415 (10chasemp) @DarTar ping [16:27:47] SiteConfiguration::getSetting looks for the "+" prefix and does an array_merge instead of overwriting the value found so far in the lookup [16:28:54] bd808: so vaguely magical [16:29:05] 10Ops-Access-Requests, 6operations: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1078422 (10mforns) 5Resolved>3Open Hi, As I saw this task was marked as resolved I tried to sudo in vanadium and could not. I spoke with Faidon and he told me I was still not a sudo... [16:29:12] OUr whole config system is a bit wacky and organic [16:29:36] I only buy organic config systems [16:30:04] "organic". What a nice way of putting it. 
:-) [16:30:08] I prefer mine to be GMO honestly :) [16:30:13] * tonythomas was expecting this on https://gerrit.wikimedia.org/r/#/c/191937/ :) [16:30:31] bd808: you like them to glow in the dark? :) [16:30:35] (03CR) 10Krinkle: [C: 031] "Applied on integration-puppetmaster, solves the bug." [puppet] - 10https://gerrit.wikimedia.org/r/193836 (https://phabricator.wikimedia.org/T90984) (owner: 10Krinkle) [16:30:56] yup. and contain fungus resistance genes [16:30:59] 6operations, 6Security: define in Puppet or remove user account - mglaser - https://phabricator.wikimedia.org/T90947#1078427 (10chasemp) 5Open>3Resolved @mglaser, thanks for the response! This is done [16:31:00] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1078429 (10chasemp) [16:31:13] bd808: Delicate subject. :-) [16:31:24] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10chasemp) [16:31:46] 6operations, 6Security: define in Puppet or remove user account - hoo - https://phabricator.wikimedia.org/T90940#1078433 (10chasemp) a:3hoo @hoo ping [16:32:04] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - santhosh - https://phabricator.wikimedia.org/T90937#1078436 (10chasemp) @santhosh ping [16:32:20] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tnegrin - https://phabricator.wikimedia.org/T90932#1078438 (10chasemp) @Tnegrin ping [16:32:20] twkozlowski: last ping on patches for swat. are you ready for them? [16:32:31] 6operations, 6Security: define in Puppet or remove user account - amire80 - https://phabricator.wikimedia.org/T90950#1078439 (10Amire80) I never used them, and if I did, it was so long ago that I don't remember. I do use terbium. 
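[editor's note] The concern raised at 16:20 about the "wiki" key can be illustrated with a small sketch. This is not the MediaWiki configuration code; it is a Python approximation that assumes a setting keyed on "wiki" applies to every database name ending in that suffix. The database list is a hypothetical subset, and the function names are invented for illustration.

```python
# Hypothetical subset of database names; the real dblists are far longer.
DBNAMES = [
    "enwiki", "dewiki",                # Wikipedias
    "commonswiki", "wikidatawiki",     # not Wikipedias, but same suffix
    "mediawikiwiki", "metawiki",
    "enwiktionary", "enwikibooks",     # other projects, different suffix
]


def matches_wiki_key(dbname):
    """Assumed rule: the 'wiki' key applies when the dbname ends in 'wiki'."""
    return dbname.endswith("wiki")


def enabled_wikis(dbnames):
    """Effect of `default => true, wiki => false` under the assumed rule."""
    return [db for db in dbnames if not matches_wiki_key(db)]


# Everything the 'wiki' key would catch, Wikipedias or not.
caught = [db for db in DBNAMES if matches_wiki_key(db)]
print(caught)
print(enabled_wikis(DBNAMES))
```

Under this assumption, "wiki => false" switches the feature off on commonswiki, wikidatawiki, mediawikiwiki and metawiki as well as on the Wikipedias, which is the over-matching described in the channel.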
[16:32:33] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tfinc - https://phabricator.wikimedia.org/T90927#1078441 (10chasemp) @tfinc ping [16:32:35] (03PS1) 10Faidon Liambotis: autoinstall: set mirror/udeb/suite on jessie [puppet] - 10https://gerrit.wikimedia.org/r/193840 [16:32:39] manybubbles: Sorry! [16:32:41] bblack: ^ [16:32:41] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - diederik - https://phabricator.wikimedia.org/T90951#1078443 (10chasemp) @drdee, just a note it looks like your last activity was in 2014 :) I am planning on removing these production accounts during general cleanup if t... [16:32:44] manybubbles: Yeah, ready ready [16:32:53] tonythomas, bd808: Can we do today's bouncehandler deployment in CommonSettings with a $site !== 'wikipedia' check or something? [16:32:58] bd808: Personally, I make it a point to demand inorganic, mechanical and supernatural food. Seeing "organic", "biological" and "natural" misused all the time annoys me to no end. [16:33:00] bblack: it worked, right? [16:33:04] twkozlowski: k. I'm just going to do them in whatever order they are open in my browser. shout if you have objections. [16:33:11] (03CR) 10Manybubbles: [C: 032] Set $wgBabelCategoryNames true at outreachwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190686 (https://phabricator.wikimedia.org/T89484) (owner: 10Gerardduenas) [16:33:33] would be a shame to miss the window because of silly configuration system hacks [16:33:35] Krenair: with just 'deafult' => true ? [16:33:44] (03Merged) 10jenkins-bot: Set $wgBabelCategoryNames true at outreachwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190686 (https://phabricator.wikimedia.org/T89484) (owner: 10Gerardduenas) [16:34:05] Krenair: So the wiki=>false would just exclude a bit more than the wikipedias correct? 
[16:34:11] which is not the end of the world [16:34:13] paravoid: yes [16:34:28] well it'd disable it in places we should probably keep it on [16:34:30] (03CR) 10BBlack: [C: 031] autoinstall: set mirror/udeb/suite on jessie [puppet] - 10https://gerrit.wikimedia.org/r/193840 (owner: 10Faidon Liambotis) [16:34:48] like the group0 sites [16:34:54] akosiaris: Hey, how are things going with getting the citoid service working in production? I'm being asked for another urgent update, and gwicke said you'd know best. [16:35:42] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - Set $wgBabelCategoryNames true at outreachwiki (duration: 00m 06s) [16:35:43] twkozlowski: ^^^^ first one [16:35:45] Logged the message, Master [16:36:28] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - leila - https://phabricator.wikimedia.org/T90954#1078465 (10leila) @DarTar I don't see a need for maintaining access to oxygen and gadolinium. Could you confirm? [16:36:52] omg that code path in SiteConfiguration::getSetting includes `break 2;` [16:37:03] a goto in break clothing [16:37:11] * bd808 shudders [16:37:14] break 2 ? [16:37:29] it breaks the inner and outer loops [16:37:37] hmm. [16:37:38] bd808: break 7; [16:37:45] manybubbles: Looks good. [16:37:53] manybubbles: :) why not make 10 louder? 
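[editor's note] The "+" prefix behaviour described at 16:27 (merge into the value instead of overwriting it) can be sketched roughly as follows. This is a simplified Python approximation, not SiteConfiguration::getSetting itself: the lookup order (wiki name, then tags, then "default"), the function name resolve_setting, and the "default" entry in the example are all assumptions made for illustration. Only the '+wikipedia' => 'Wikipedia' => NS_PROJECT entry comes from the log.

```python
NS_PROJECT = 4  # MediaWiki's project namespace index


def resolve_setting(overrides, wiki, tags):
    """Resolve one setting for `wiki` from a dict of per-key overrides.

    A plain key ends the lookup. A key prefixed with "+" contributes its
    entries and the lookup continues to less specific keys (assumed order).
    """
    extra = {}  # entries collected from "+key" overrides so far
    for key in [wiki, *tags, "default"]:
        if key in overrides:
            value = overrides[key]
            if isinstance(value, dict):
                # More specific "+" entries win over this level's entries.
                return {**value, **extra}
            # Scalar value: returning here exits the loop early, the role
            # an early `break` plays in the PHP being discussed.
            return value
        plus = overrides.get("+" + key)
        if isinstance(plus, dict):
            extra = {**plus, **extra}
    return extra or None


# "default" is a hypothetical entry; "+wikipedia" mirrors the log.
aliases = {
    "default": {"Project": NS_PROJECT},
    "+wikipedia": {"Wikipedia": NS_PROJECT},
}
print(resolve_setting(aliases, "enwiki", ["wikipedia"]))
print(resolve_setting(aliases, "commonswiki", ["wikimedia"]))
```

In this sketch a wiki tagged "wikipedia" gets the default aliases plus the extra one, while any other wiki gets only the defaults. Python has no equivalent of PHP's `break 2`, so the sketch returns from the function where the original leaves two nested loops at once.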
[16:38:16] (03CR) 10Manybubbles: [C: 032] Change templateeditor user group rights on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: 10Calak) [16:38:22] (03Merged) 10jenkins-bot: Change templateeditor user group rights on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: 10Calak) [16:39:16] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - Change templateeditor user group rights on fawiki (duration: 00m 07s) [16:39:19] twkozlowski: ^^^ second [16:39:19] Logged the message, Master [16:40:17] 6operations, 6Security: define in Puppet or remove user account - hoo - https://phabricator.wikimedia.org/T90940#1078471 (10hoo) A machine that deployers probably had access to previously... I don't need access to them, feel free to remove it. [16:40:55] manybubbles: Works [16:41:03] (03CR) 10Manybubbles: [C: 032] AbuseFilter config change for ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: 10Base) [16:41:10] (03Merged) 10jenkins-bot: AbuseFilter config change for ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: 10Base) [16:41:49] PROBLEM - HHVM rendering on mw1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:41:55] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - AbuseFilter config change for ukwiki (duration: 00m 07s) [16:41:56] twkozlowski: ^^^ [16:42:00] Logged the message, Master [16:42:30] PROBLEM - Apache HTTP on mw1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:43:05] Now how do I test that. [16:43:24] ...... 
I think we'll just move on [16:43:44] (03CR) 10Manybubbles: [C: 032] Enable NewUserMessage extension for fawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193016 (https://phabricator.wikimedia.org/T90831) (owner: 10Mjbmr) [16:43:50] (03Merged) 10jenkins-bot: Enable NewUserMessage extension for fawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193016 (https://phabricator.wikimedia.org/T90831) (owner: 10Mjbmr) [16:44:33] manybubbles: I'll ask Base to report on Phab whether that AF config change works [16:44:44] (03CR) 10Manybubbles: "Oh, hmmmm - I don't know that I actually have the proper permissions to create those tables....." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) (owner: 10Glaisher) [16:44:46] (03PS3) 10ArielGlenn: fix up ordering for salt-minion package, config, service [puppet] - 10https://gerrit.wikimedia.org/r/162860 [16:45:32] !log manybubbles Synchronized wmf-config/abusefilter.php: SWAT - AbuseFilter config change for ukwiki (duration: 00m 07s) [16:45:35] Logged the message, Master [16:45:41] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - diederik - https://phabricator.wikimedia.org/T90951#1078480 (10Tnegrin) I'm fine either way. Diederik has signed the NDA and as such has access to these systems. -Toby [16:45:49] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - Enable WikiLove extension at newiki (duration: 00m 07s) [16:45:51] Logged the message, Master [16:46:18] !log correction to last sync -message - was totally wrong - patch instead did this: "Enable NewUserMessage extension for fawiktionary " [16:46:22] Logged the message, Master [16:46:23] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tnegrin - https://phabricator.wikimedia.org/T90932#1078482 (10Tnegrin) I need access to this system to run various queries against hadoop and other databases. 
I do not need admin access. thanks, -Toby [16:47:15] manybubbles, you don't know if you can do schema updates? [16:47:17] manybubbles: Yep, NewUserMessage now works [16:47:28] (03PS4) 10ArielGlenn: fix up ordering for salt-minion package, config, service [puppet] - 10https://gerrit.wikimedia.org/r/162860 [16:47:51] Krenair: let me clarify - I've never done one at WMF [16:48:11] ok [16:48:23] let me look and see if there is documentation on it. anyone around who has done schema updates? [16:48:27] I'm pretty sure the wikiadmin user gives us complete write access to all the wiki dbs [16:48:35] https://wikitech.wikimedia.org/wiki/How_to_do_a_schema_change [16:48:49] (03PS1) 10Giuseppe Lavagetto: dhcp: add entry for mc2017 and mc2018 [puppet] - 10https://gerrit.wikimedia.org/r/193841 [16:49:10] <_joe_> papaul: ^^ [16:49:15] <_joe_> just FYI [16:49:59] (03CR) 10Giuseppe Lavagetto: [C: 032] dhcp: add entry for mc2017 and mc2018 [puppet] - 10https://gerrit.wikimedia.org/r/193841 (owner: 10Giuseppe Lavagetto) [16:50:39] Krenair: yeah. see it. I suppose I could just run that. anyone around who can clean up after me if I break it? this isn't something I'm as confident doing. 
[16:50:49] manybubbles: I have always just asked s.pringle before the change was merged :/ [16:52:18] (03CR) 10Manybubbles: "OK - https://wikitech.wikimedia.org/wiki/How_to_do_a_schema_change makes it clear I _can_ do schema changes but its not clear how I'd clea" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) (owner: 10Glaisher) [16:52:30] PROBLEM - HHVM busy threads on mw1087 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [86.4] [16:52:41] so, doing what I say in the comment - delaying until springle replies [16:53:28] I'm 80% sure i could just run the script but as its something I haven't done before and because its a part I don't understand too well _and_ because our database setup is known to require lots of care - I'm more comfortable delaying this [16:55:13] twkozlowski: ok - all done but the last one. pinged springle on the last one. [16:55:26] aude: have you don't schema changes? [16:55:42] manybubbles: Thanks! [16:55:42] (03CR) 10Nemo bis: "Ouch, sorry. We have wikipedia.dblist though, so it could be added" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 1001tonythomas) [16:55:44] (03PS2) 10BBlack: autoinstall: set mirror/udeb/suite on jessie [puppet] - 10https://gerrit.wikimedia.org/r/193840 (owner: 10Faidon Liambotis) [16:56:10] (03CR) 10BBlack: [C: 032 V: 032] autoinstall: set mirror/udeb/suite on jessie [puppet] - 10https://gerrit.wikimedia.org/r/193840 (owner: 10Faidon Liambotis) [16:57:19] (03PS6) 10Nemo bis: Added BounceHandler extension to group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 1001tonythomas) [16:57:34] James_F: OKish. Things are moving on, we had a meeting on Tuesday about it, tickets have been filled for all work items and being worked on. 
Right, I am fighting to resolve a problem with xpcshell and zotero and the rest which is the major blocker. That's about it [16:58:08] akosiaris: *nods* Do you need anything to help? [16:58:30] James_F: I don't think so [16:58:43] akosiaris: OK. If you think of anything, just shout. :_0 [16:58:45] manybubbles: ? [16:58:55] James_F: sure. Thanks! [16:59:00] what needs doing? [16:59:06] aude: just wondering if you'd done schema migrations. I've never done them but cirrus never needed one. [16:59:14] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - leila - https://phabricator.wikimedia.org/T90954#1078521 (10DarTar) @leila you can safely remove access to these servers. [16:59:15] yeah [16:59:19] if it's a new table [16:59:20] deployingn wikilove extensions to a wiki - its out of the window though [16:59:40] schema change on gigantic table... that's for sean [16:59:42] aude: yeah - its a new table. I just didn't want to do it and have to fail horribly if I made a mistake [16:59:45] ok [16:59:54] ticket? [17:00:42] aude: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=146616&oldid=146615 [17:00:54] foreachwikiindblist wikibooks.dblist maintenance/patchSql.php php-1.25wmf17/maintenance/archives/patch-sites.sql [17:00:57] PROBLEM - HHVM queue size on mw1087 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [80.0] [17:00:58] for example [17:01:09] Krenair: bd808 : can you take a look at the new PS, Nemo_bis uploaded over https://gerrit.wikimedia.org/r/#/c/191937/ ? [17:02:06] tonythomas, it looks sane... I can't verify that it won't break other stuff though [17:02:23] or for a single wiki, just mwscript ... [17:05:31] paravoid, hi again, do you know who is in RT duty today for ops? [17:05:47] mforns: 'tis me [17:05:53] mforns: see /topic :) [17:06:00] thanks paravoid [17:06:02] what Coren said :) [17:06:24] mforns: What can I do to you? 
[17:06:44] hi Coren, I wanted to ask you about a task, that I thought it was resolved, but it seems there was a problem [17:07:03] mforns: Can you point me at it? [17:07:11] sure, just a sec [17:07:36] here: https://phabricator.wikimedia.org/T89471 [17:07:58] I reopened it half an hour ago [17:09:32] mforns: Ah, looks like simple distractedness. The similarity of the tickets apparently got them mixed. [17:09:42] mforns: I'll gladly fix that for you right now. [17:09:52] Coren, yea, that's what I thought [17:09:59] oh, thank you! [17:10:13] let me know if you need something from me [17:12:10] (03PS1) 10coren: Add mforns to eventlogging-roots [puppet] - 10https://gerrit.wikimedia.org/r/193844 (https://phabricator.wikimedia.org/T89471) [17:12:26] bd808, we don't use composer for anything except wikidata, do we? [17:12:49] Krenair: we use it for MediaWiki core [17:12:59] ok... what about extensions? [17:12:59] and for multiversion too [17:13:21] There are libraries in mediawiki/vendor for extensions [17:13:41] the cirrus extension has a lib in there for sure [17:14:08] (03CR) 10coren: [C: 032] "That had already been approved and thought completed in the linked task." [puppet] - 10https://gerrit.wikimedia.org/r/193844 (https://phabricator.wikimedia.org/T89471) (owner: 10coren) [17:14:25] Krenair: https://github.com/wikimedia/mediawiki-vendor/blob/master/composer.json#L12 [17:15:03] I was wondering because of the suggested instructions in https://phabricator.wikimedia.org/T88748 [17:15:12] Krenair: what do you mean by "use"? [17:15:17] Krenair: ugh [17:15:19] (03PS2) 10Ottomata: Render statistics-private mysql.conf credentials on stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/193503 [17:15:19] the most obvious thing wrong is update.php [17:15:33] currently, we use it locally and update results to gerrit [17:15:38] (03PS1) 10BBlack: HT support for nginx SSL CPU pinning [puppet] - 10https://gerrit.wikimedia.org/r/193845 [17:15:39] mforns: Try it now? [17:15:39] vs. 
run it in production [17:15:42] we only have SMW on wikitech and it uses a really old version that didn't need composer [17:15:46] right [17:15:47] Coren, ok [17:15:52] Also, this is asking for a "SMW Bundle" [17:15:57] which we don't have on wikitech, AFAIK? [17:16:09] certainly appears to include a lot more: https://www.mediawiki.org/wiki/Semantic_Bundle#Contents [17:16:28] Coren, yay! it works [17:16:33] Thanks! [17:16:50] Krenair: Yeah. https://phabricator.wikimedia.org/T88748#1078553 is key [17:16:55] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1078589 (10coren) 5Open>3Resolved Sorry about the mixup: wires got crossed and you were indeed left out of the other patch. [17:17:09] (03CR) 10BBlack: [C: 032] HT support for nginx SSL CPU pinning [puppet] - 10https://gerrit.wikimedia.org/r/193845 (owner: 10BBlack) [17:17:35] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1078594 (10mforns) No problem at all, thanks for the quick fix! [17:18:25] bblack: lol wtf [17:18:29] you're crazy :) [17:18:33] (good crazy :) [17:18:48] :P [17:21:32] I haven't decided whether to stick with that, or map 1 process-per-virtual-core, in pairs pinned to both siblings. 
That would maximize CPU for nginx, but I figure HT isn't +100% anyways, and varnish/io will consume the remainder [17:22:33] oh plus ipsec, so yeah why bother :) [17:23:18] (03PS2) 10BBlack: starting point for varnish storage sanitization T90583 [puppet] - 10https://gerrit.wikimedia.org/r/193548 [17:25:13] (03PS3) 10Ottomata: Render statistics-private mysql.conf credentials on stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/193503 [17:25:36] (03PS1) 10RobH: reclaim holmium to spares [dns] - 10https://gerrit.wikimedia.org/r/193847 [17:28:37] (03CR) 10RobH: [C: 032] reclaim holmium to spares [dns] - 10https://gerrit.wikimedia.org/r/193847 (owner: 10RobH) [17:29:58] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1078642 (10chasemp) [17:29:59] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - leila - https://phabricator.wikimedia.org/T90954#1078640 (10chasemp) 5Open>3Resolved Done and thanks! [17:31:11] 6operations, 10ops-codfw: prepare equipment list for eqord - https://phabricator.wikimedia.org/T91079#1078644 (10Reedy) [17:31:28] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10chasemp) [17:31:30] 6operations, 6Security: define in Puppet or remove user account - amire80 - https://phabricator.wikimedia.org/T90950#1078645 (10chasemp) 5Open>3Resolved oxygen.wikimedia.org gadolinium.wikimedia.org These have been removed. Thanks! [17:32:06] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1078651 (10chasemp) [17:32:07] 6operations, 6Security: define in Puppet or remove user account - hoo - https://phabricator.wikimedia.org/T90940#1078649 (10chasemp) 5Open>3Resolved >>! 
In T90940#1078471, @hoo wrote: > A machine that deployers probably had access to previously... I don't need access to them, feel free to remove it. done... [17:33:04] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10chasemp) [17:34:09] 6operations, 10ops-codfw, 3wikis-in-codfw: Configure mw2001-2134 correctly - https://phabricator.wikimedia.org/T91238#1078664 (10Papaul) Joe needs to confirm with Mark if we need to enable console redirection as we discuss in IRC. [17:34:18] (03CR) 10Ottomata: [C: 032] starting point for varnish storage sanitization T90583 [puppet] - 10https://gerrit.wikimedia.org/r/193548 (owner: 10BBlack) [17:35:08] ack! [17:35:10] bblack: i was about to puppet merge [17:35:11] but [17:35:13] i can wait [17:35:15] i'll let you do your thing [17:35:19] mine is no biggy, you can merge anytime [17:35:37] ? [17:35:43] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1078674 (10emailbot) [17:35:44] OH [17:35:46] CRAP [17:35:49] :) [17:35:57] wrong tab. [17:35:58] fixing. [17:36:01] it's ok, I was about to do it anyways, but I need to disable puppet across the prod clusters first [17:36:05] just in case [17:36:05] oh you were? [17:36:09] (03PS1) 10Rush: Add tnegrin to statistics-web-users [puppet] - 10https://gerrit.wikimedia.org/r/193848 [17:36:17] should I revert?
[17:36:19] i haven't puppet merged [17:36:21] give me like, 1-2 mins [17:36:27] ok [17:37:14] ottomata: go ahead [17:37:25] ok [17:37:27] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: puppet fail [17:37:38] done [17:37:42] (03PS4) 10Ottomata: Render statistics-private mysql.conf credentials on stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/193503 [17:38:45] (03CR) 10Ottomata: [C: 032] Render statistics-private mysql.conf credentials on stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/193503 (owner: 10Ottomata) [17:40:06] (03PS1) 10Rush: admin add parsoid-rt cleanup reference [puppet] - 10https://gerrit.wikimedia.org/r/193849 [17:40:21] (03PS1) 10BBlack: 2layer caches: def 360G space for S3700 partitioning [puppet] - 10https://gerrit.wikimedia.org/r/193850 [17:40:45] (03CR) 10BBlack: [C: 032 V: 032] 2layer caches: def 360G space for S3700 partitioning [puppet] - 10https://gerrit.wikimedia.org/r/193850 (owner: 10BBlack) [17:40:57] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:42:47] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1078710 (10Papaul) Joe did add the NIC MAC for both severs in the dhcp files. Both servers made dhcp request. But I am still receiving PXE-E18: server response time out on both servers. [17:46:56] manybubbles, I'm not sure about that Flow error message. I'll paste what you wrote in #wikimedia-collaboration so that the devs definitely see it. thanks :) [17:49:56] (03PS2) 10Rush: admin add parsoid-rt cleanup reference [puppet] - 10https://gerrit.wikimedia.org/r/193849 [17:50:04] (03CR) 10Rush: [C: 032] admin add parsoid-rt cleanup reference [puppet] - 10https://gerrit.wikimedia.org/r/193849 (owner: 10Rush) [17:50:11] (03CR) 10Rush: [V: 032] admin add parsoid-rt cleanup reference [puppet] - 10https://gerrit.wikimedia.org/r/193849 (owner: 10Rush) [17:51:16] hello together! 
I have a (big) problem on my local setup. If i try to clone a repository from wikimedia git, i get transfer rates like 20 kBit/s, and if i clone from github, it uses the full connection speed (1,7 MB/s here). Both ssh and https. I made a traceroute to find out a bottleneck (https://gist.github.com/Florianschmidtwelzow/22defb24b0b3832d63e9) and saw, that the routes goes ober telia.net. Now i remember oth [17:51:25] ...connections when routes goes over telia.net :/ [17:51:37] any ideas how to find out more and probably fix the problem? :) [17:51:46] paravoid: ^ if you're about [17:52:24] You might need to provide your external ip, but you can do that in PM when someone responds if you don't want to post it publicly [17:52:30] FlorianSW: Your message was cut off after "Now I remember oth" [17:52:55] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1078732 (10coren) p:5Triage>3Normal Followed up by email. [17:52:57] yes, your external IP would be helpful [17:52:59] RoanKattouw: Now i remember other users of my provider reporting slow... [17:53:03] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1078734 (10coren) a:3coren [17:54:23] FlorianSW: external IP? :P PM it to paravoid if you don't want to post it publicly [17:54:40] Reedy: already did :D [17:56:47] (03CR) 10Dzahn: "it _seems_ we own wikibooks.com now" [puppet] - 10https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) (owner: 10Glaisher) [18:09:31] (03CR) 10Dzahn: "i think these files are too large to put them here. i don't want to grow operations/puppet by several Megabytes for this. it's also a one-" [puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [18:10:47] (03PS1) 10Aklapper: Remove "shell" from monthly Phabricator statistics email. [puppet] - 10https://gerrit.wikimedia.org/r/193852 [18:10:58] (03CR) 10Rush: "put these in their own repo?" 
[puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [18:11:05] mutante: if the files are just one time for the warc format, just add them straight to zirconium really [18:12:45] (03CR) 10John F. Lewis: "Personally I say just put them straight onto zirconium really. The 146k html files are on their so just these pages which will be redundan" [puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [18:14:38] 6operations, 10ops-codfw, 3wikis-in-codfw: mc2004 console is unreadable remotely - https://phabricator.wikimedia.org/T90883#1078835 (10coren) a:3Papaul [18:15:42] (03CR) 10Dduvall: [WIP] Add role::mediawiki_vagrant_lxc (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/193665 (owner: 10BryanDavis) [18:18:53] (03CR) 10BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/193665 (owner: 10BryanDavis) [18:19:14] 7Blocked-on-Operations, 6operations, 10RESTBase, 10hardware-requests, 7RESTBase-architecture: RESTBase production hardware - 5 of 6 ready - https://phabricator.wikimedia.org/T76986#1078848 (10coren) [18:19:25] (03CR) 10Dzahn: "yes, a separate repo would normally make sense, but John is right, it only makes sense if we also put all the bug and activity HTML files " [puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [18:19:38] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - leila - https://phabricator.wikimedia.org/T90954#1078849 (10leila) thanks, @DarTar. Thanks, @chasemp, and sorry that it took me few days to get back to you. 
[18:20:40] 6operations: install/deploy dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86958#1078854 (10RobH) [18:20:40] 6operations, 10ops-eqiad: relocate/wire/setup dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86957#1078852 (10RobH) 5Open>3Resolved a:5RobH>3None [18:23:11] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - dartar - https://phabricator.wikimedia.org/T90949#1078859 (10DarTar) I don't need access to these servers any more, thanks for checking. [18:23:14] what would you say is the maximum size of a gerrit/git repo [18:23:28] 3.6G HTML files is too much, isnt it [18:23:40] (03CR) 10coren: [C: 031] "That seems entirely sane." [puppet] - 10https://gerrit.wikimedia.org/r/193165 (https://phabricator.wikimedia.org/T90466) (owner: 10Thcipriani) [18:23:48] (03PS1) 10RobH: setting dbproxy1008 mac address [puppet] - 10https://gerrit.wikimedia.org/r/193854 [18:24:47] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: Puppet has 1 failures [18:24:56] (03CR) 10RobH: [C: 032] setting dbproxy1008 mac address [puppet] - 10https://gerrit.wikimedia.org/r/193854 (owner: 10RobH) [18:25:28] oh, i think we set a maximum anyways after that one time we uploaded Gigabytes by accident [18:25:56] mutante: lol [18:26:29] there isn't really one. Nor should one be dictated, AFAIC [18:26:45] but yeah, 3.6Gs is not nice to other people [18:27:33] i have all those static HTML files of static Bugzilla and deploying them (in either way) seems too much [18:28:14] do they ever change ? [18:28:17] 10Ops-Access-Requests, 6operations: Access to ops-access-request for me - https://phabricator.wikimedia.org/T91280#1078886 (10RobLa-WMF) 3NEW [18:28:30] or will they ever change, if you prefer [18:28:42] no, they are supposed to stay like this forever [18:29:02] and i just uploaded them manually [18:29:12] (03CR) 10ArielGlenn: [C: 031] "what happened to that table? Is it available everywhere yet?" 
[dumps] (ariel) - https://gerrit.wikimedia.org/r/156450 (https://bugzilla.wikimedia.org/51225) (owner: MaxSem) [18:29:20] well, i have 2 more files to add to them with links [18:29:26] seems fine to me [18:30:47] (PS1) BBlack: set bnx2x num_queues on cache nodes in late_command [puppet] - https://gerrit.wikimedia.org/r/193857 [18:32:35] (CR) ArielGlenn: "Can we just skip the labhosts for the moment? Pass a parameter in for this, define it in hiera only for the labstore hosts as yes, default" [puppet] - https://gerrit.wikimedia.org/r/160628 (owner: Matanya) [18:34:08] !log starting script to reindex search changes made yesterday night on enwiki (script is https://wikitech.wikimedia.org/wiki/Search#Recovering_from_an_Elasticsearch_outage.2Finterruption_in_updates) [18:34:12] Logged the message, Master [18:40:47] (PS1) Dzahn: static bugzilla: add links to all bugs/activities [puppet] - https://gerrit.wikimedia.org/r/193858 (https://phabricator.wikimedia.org/T85140) [18:42:37] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:45:46] robh: I can connect to californium’s mgmt but I can’t ping it. any thoughts on who I should bug about that? [18:46:44] 64 bytes from californium.mgmt.eqiad.wmnet [18:46:51] can ping it [18:47:39] Yes, sorry — I mean, I can’t ping anything /but/ mgmt [18:47:57] which makes it hard to do anything other than install the OS over and over [18:48:28] the os installs? [18:49:09] sure. serial console shows a login prompt. [18:49:19] are you trying with the new_install ssh key? [18:49:21] ok, but you watched the OS install [18:49:28] or is that the old os and possibly old iP? [18:49:34] Part of it [18:49:40] Dunno, I’m happy to install again. 
[18:49:49] mutante: ping doesn’t use a key :) [18:49:52] lemme check out the network before you do [18:49:56] thanks [18:50:05] and the issue is you cannot ping its production interface [18:50:07] mgmt is fine [18:50:11] correct? [18:50:19] yep [18:50:33] and its supposed to be private vlan from looks of dns [18:50:56] ah, then most likely switch port config [18:51:06] ge-4/0/38 up up californium [18:51:10] It doesn’t need a public IP — it’ll be behind the misc-web varnishes. [18:51:13] switch config is fine, checking mac address [18:52:09] (CR) Dzahn: [C: 2] static bugzilla: add links to all bugs/activities [puppet] - https://gerrit.wikimedia.org/r/193858 (https://phabricator.wikimedia.org/T85140) (owner: Dzahn) [18:52:24] so mac matches in both [18:52:37] andrewbogott: so the next thing i'd do is pxe boot and watch carbon as well as the serial console output [18:52:49] and confirm that the dhcp request and tftp image load hit carbon [18:52:54] and that it actually installs on the sysetm [18:52:58] ok. I will try after the meeting. [18:53:00] thank you for looking [18:53:20] but yea, the switch is all configured and carbon config has the proper mac address [18:53:32] (confirmed off the port labeling and ethernet switching table on switch) [18:53:58] the box gets installed or not ? [18:54:40] (CR) Calak: "Thank you Manybubbles." [mediawiki-config] - https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: Calak) [18:54:40] akosiaris: it’s showing a login prompt; Rob thinks maybe that OS install predates a move. [18:56:26] easy to check. try to login with the root account [18:56:49] if you succeed it's an old install, if not... undetermined [18:57:08] e.g. cause you might not have the correct password [18:57:09] akosiaris: I think in this case ‘old’ means, a few weeks old. Since it only just moved. [18:57:31] as in installed and never accepted into puppet ? 
[18:57:40] right [18:57:43] neither old nor new root works [18:57:57] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning. [18:58:04] expected if it never made it into puppet [18:58:51] (CR) Rush: "@alexandros" [puppet] - https://gerrit.wikimedia.org/r/160628 (owner: Matanya) [19:01:09] Ops-Access-Requests, operations: Access to ops-access-request for me - https://phabricator.wikimedia.org/T91280#1079068 (coren) Open>Invalid a:coren Ah, that is a mistake on my part: I accidentally left you and the ops-access-requests group on a related ticket. :-) [19:01:14] (PS1) Papaul: added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 [19:01:27] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. [19:02:23] joe: what email or name do i have to use in gerrit for you? [19:03:32] (CR) Giuseppe Lavagetto: [C: -1] added mc2135 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul) [19:04:47] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Puppet has 1 failures [19:10:07] (PS1) Papaul: fixed clossing brace [puppet] - https://gerrit.wikimedia.org/r/193867 [19:12:26] (Abandoned) Papaul: added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul) [19:12:37] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures [19:15:12] (PS2) BBlack: set bnx2x num_queues on cache nodes in late_command [puppet] - https://gerrit.wikimedia.org/r/193857 [19:15:35] !log finished Cirrus outage recovery job script for enwiki [19:15:40] !log starting on all other wikis [19:15:42] Logged the message, Master [19:15:44] Logged the message, Master [19:16:46] PROBLEM - txstatsd backend instances on graphite1001 is CRITICAL: CRITICAL: Not all configured txstatsd instances are running. 
[19:16:57] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: Puppet has 1 failures [19:16:58] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 1 failures [19:17:10] (CR) Chad: [C: 2] Enable Collection by default on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/165490 (https://phabricator.wikimedia.org/T73416) (owner: Reedy) [19:17:23] (Merged) jenkins-bot: Enable Collection by default on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/165490 (https://phabricator.wikimedia.org/T73416) (owner: Reedy) [19:17:58] RECOVERY - uWSGI web apps on graphite2001 is OK: OK: All defined uWSGI apps are runnning. [19:18:39] yay [19:18:42] !log demon Synchronized wmf-config/InitialiseSettings.php: collection on all wikis (duration: 00m 07s) [19:18:43] <^d> :) [19:18:45] Logged the message, Master [19:19:39] !log looks like that didn't cover the whole range - expanding the range of reindexed data - starting against for enwiki [19:19:42] Logged the message, Master [19:20:27] RECOVERY - HHVM busy threads on mw1087 is OK: OK: Less than 30.00% above the threshold [57.6] [19:20:46] RECOVERY - HHVM queue size on mw1087 is OK: OK: Less than 30.00% above the threshold [10.0] [19:21:27] PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CRITICAL: Not all configured uWSGI apps are running. 
[19:22:33] (CR) Chad: [C: 2] Enable WikiLove extension at newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) (owner: Glaisher) [19:22:40] (Merged) jenkins-bot: Enable WikiLove extension at newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) (owner: Glaisher) [19:24:47] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:25:10] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s) [19:25:14] Logged the message, Master [19:27:06] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [19:29:12] Ops-Access-Requests, operations: Access to ops-access-request for me - https://phabricator.wikimedia.org/T91280#1079146 (RobLa-WMF) Invalid>Open Hi Coren, thanks for noticing that on the specific T91257 (which, incidentally, I still don't have access to), but the general comment still stands. I don... [19:30:37] Ops-Access-Requests, operations: Access to ops-access-request for me - https://phabricator.wikimedia.org/T91280#1079156 (chasemp) #Ops-Access-Requests isn't used in any ACL scenarios so it shouldn't be useful in this way? [19:32:16] RECOVERY - txstatsd backend instances on graphite1001 is OK: OK: All defined txstatsd jobs are runnning. [19:32:31] YuviPanda|zzz: Although we're running out of resources for adding nodes. [19:33:33] Coren: right, so we should be adding more hardware :) [19:34:04] YuviPanda: We need to pressure Mark into spending the $ for lots of nodes. 
:-) [19:34:08] s/nodes/hosts/ [19:34:20] (CR) Chad: [C: 2] Enable EducationProgram extension on lvwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193138 (https://phabricator.wikimedia.org/T89898) (owner: Glaisher) [19:34:27] (Merged) jenkins-bot: Enable EducationProgram extension on lvwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193138 (https://phabricator.wikimedia.org/T89898) (owner: Glaisher) [19:34:36] Coren: https://phabricator.wikimedia.org/maniphest/query/GNcYnN5ZPqlx/#R is High / UBN tasks assigned to you for a while now. Can you look through them and update? [19:35:24] Ops-Access-Requests, operations: Access to ops-access-request for me - https://phabricator.wikimedia.org/T91280#1079162 (RobLa-WMF) >>! In T91280#1079156, @chasemp wrote: > #Ops-Access-Requests isn't used in any ACL scenarios so it shouldn't be useful in this way? Then please add me to whatever group you... [19:35:48] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [19:35:48] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [19:36:05] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [19:36:07] Logged the message, Master [19:38:24] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - tfinc - https://phabricator.wikimedia.org/T90927#1079188 (Tfinc) Keep my access on bastion & stat* but not the rest. 
thanks [19:41:48] (PS1) Ori.livneh: Improve performance by hashing/slicing in a single pass [software/statsdlb] - https://gerrit.wikimedia.org/r/193878 [19:44:24] (CR) Ori.livneh: [C: 2 V: 2] Improve performance by hashing/slicing in a single pass [software/statsdlb] - https://gerrit.wikimedia.org/r/193878 (owner: Ori.livneh) [19:45:29] (CR) Chad: [C: 2] Enabling subpages for ns0 in uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193661 (https://phabricator.wikimedia.org/T91185) (owner: Base) [19:45:49] (Merged) jenkins-bot: Enabling subpages for ns0 in uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193661 (https://phabricator.wikimedia.org/T91185) (owner: Base) [19:46:12] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [19:46:40] (PS1) Ori.livneh: Update package version [software/statsdlb] - https://gerrit.wikimedia.org/r/193879 [19:48:34] (CR) Chad: [C: 2] Remove Anexo namespace on pt.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/172012 (https://phabricator.wikimedia.org/T75164) (owner: Dereckson) [19:48:42] (Merged) jenkins-bot: Remove Anexo namespace on pt.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/172012 (https://phabricator.wikimedia.org/T75164) (owner: Dereckson) [19:48:57] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [19:51:25] (Abandoned) JanZerebecki: Change ru.wikinews.org to HTTPS only. 
[puppet] - https://gerrit.wikimedia.org/r/178676 (https://phabricator.wikimedia.org/T55259) (owner: JanZerebecki) [19:54:04] (CR) Ori.livneh: [C: 2 V: 2] Update package version [software/statsdlb] - https://gerrit.wikimedia.org/r/193879 (owner: Ori.livneh) [19:54:55] (PS1) Ottomata: Change default vcores to max($::processorcount - 1, 1) so that we don't end up setting it to 0 [puppet/cdh] - https://gerrit.wikimedia.org/r/193882 [19:55:33] (CR) Ottomata: [C: 2] Change default vcores to max($::processorcount - 1, 1) so that we don't end up setting it to 0 [puppet/cdh] - https://gerrit.wikimedia.org/r/193882 (owner: Ottomata) [19:55:42] (CR) Chad: [C: 2] Set $wgAllowMicrodataAttributes to true at hewikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/191192 (https://phabricator.wikimedia.org/T89655) (owner: Gerardduenas) [19:55:50] (Merged) jenkins-bot: Set $wgAllowMicrodataAttributes to true at hewikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/191192 (https://phabricator.wikimedia.org/T89655) (owner: Gerardduenas) [19:56:08] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s) [19:56:46] akosiaris: ping if you have a minute [19:56:57] (Abandoned) JanZerebecki: DO NOT MERGE: disable icinga use of naggen for labs test [puppet] - https://gerrit.wikimedia.org/r/158339 (owner: JanZerebecki) [19:57:12] (PS1) Ottomata: Update cdh module with vcores fix [puppet] - https://gerrit.wikimedia.org/r/193883 [19:57:36] ori: in a meeting, weel be free in like 5 [19:57:43] (CR) Ottomata: [C: 2 V: 2] Update cdh module with vcores fix [puppet] - https://gerrit.wikimedia.org/r/193883 (owner: Ottomata) [19:57:48] akosiaris: cool thanks [19:58:22] (PS1) Ori.livneh: Typo fix for makefile [software/statsdlb] - https://gerrit.wikimedia.org/r/193884 [19:58:33] (CR) Ori.livneh: [C: 2 V: 2] Typo fix for makefile [software/statsdlb] - 
https://gerrit.wikimedia.org/r/193884 (owner: Ori.livneh) [19:58:37] (CR) Chad: [C: 2] Set $wgUploadNavigationUrl for it.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/193672 (owner: Nemo bis) [20:00:15] ori: ok, done. what can I help you with ? [20:00:17] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [20:00:54] Ops-Access-Requests, operations: Access to ops-access-request for RobLa (and all managers that need to approve) - https://phabricator.wikimedia.org/T91280#1079324 (Dzahn) [20:01:27] Ops-Access-Requests, operations: Access to ops-access-request for RobLa-WMF (and all managers that need to approve) - https://phabricator.wikimedia.org/T91280#1079338 (RobLa-WMF) [20:01:42] (PS2) Chad: Specify HTTPS for $wgCanonicalServer for all private wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/99299 (owner: MZMcBride) [20:03:06] PROBLEM - puppet last run on analytics1011 is CRITICAL: CRITICAL: puppet fail [20:04:27] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: puppet fail [20:04:27] PROBLEM - puppet last run on analytics1017 is CRITICAL: CRITICAL: puppet fail [20:04:39] (Merged) jenkins-bot: Set $wgUploadNavigationUrl for it.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/193672 (owner: Nemo bis) [20:04:47] PROBLEM - puppet last run on analytics1020 is CRITICAL: CRITICAL: puppet fail [20:04:57] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: puppet fail [20:05:02] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 08s) [20:05:08] Logged the message, Master [20:05:20] <^d> Nemo_bis: That's done ^ [20:05:26] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [20:05:47] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: puppet fail [20:06:01] thank [20:06:59] (CR) Chad: [C: 2] 
Use Wikiquote logo from Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/192978 (owner: Tim Starling) [20:07:06] (Merged) jenkins-bot: Use Wikiquote logo from Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/192978 (owner: Tim Starling) [20:07:26] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: puppet fail [20:07:27] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: puppet fail [20:07:32] hmm ottomata [20:07:33] my frault [20:07:34] i know [20:07:36] i know waht's up [20:07:40] it is puppet being dumb [20:07:42] apparently [20:07:46] $::processorcount - 1 [20:07:50] is a string?? [20:07:59] comparison of String with 23 failed [20:08:07] or i dunno, it thinks 1 is? [20:08:08] i'm testing now [20:08:23] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 08s) [20:09:12] ori: did you do anything to fix "PROBLEM - txstatsd backend instances on graphite1001 is CRITICAL: CRITICAL: Not all configured txstatsd instances are running."? [20:09:27] or did it just come back on its own [20:10:17] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: puppet fail [20:10:25] (CR) Chad: [C: 2] Specify HTTPS for $wgCanonicalServer for all private wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/99299 (owner: MZMcBride) [20:10:27] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: puppet fail [20:10:32] (Merged) jenkins-bot: Specify HTTPS for $wgCanonicalServer for all private wikis. 
[mediawiki-config] - https://gerrit.wikimedia.org/r/99299 (owner: MZMcBride) [20:10:49] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 07s) [20:10:51] (CR) Nikerabbit: [C: 1] Use language code "bho" directly instead of its alias "bh" [mediawiki-config] - https://gerrit.wikimedia.org/r/193801 (https://phabricator.wikimedia.org/T91240) (owner: Nemo bis) [20:10:55] Logged the message, Master [20:10:57] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: puppet fail [20:11:07] PROBLEM - puppet last run on analytics1002 is CRITICAL: CRITICAL: puppet fail [20:11:32] (CR) Chad: [C: 2] Use language code "bho" directly instead of its alias "bh" [mediawiki-config] - https://gerrit.wikimedia.org/r/193801 (https://phabricator.wikimedia.org/T91240) (owner: Nemo bis) [20:11:37] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: puppet fail [20:11:39] (Merged) jenkins-bot: Use language code "bho" directly instead of its alias "bh" [mediawiki-config] - https://gerrit.wikimedia.org/r/193801 (https://phabricator.wikimedia.org/T91240) (owner: Nemo bis) [20:12:08] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [20:12:28] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: puppet fail [20:12:57] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: puppet fail [20:13:26] PROBLEM - puppet last run on analytics1014 is CRITICAL: CRITICAL: puppet fail [20:13:37] PROBLEM - puppet last run on analytics1001 is CRITICAL: CRITICAL: puppet fail [20:14:25] ugh [20:14:31] i think the stdlib max() function is just totally broken [20:14:32] in this: [20:14:35] Ops-Access-Requests, operations: Access to ops-access-request for RobLa-WMF (and all managers that need to approve) - https://phabricator.wikimedia.org/T91280#1079407 (RobH) ops-access-requests is public... and no one but ops should access ops-access-reviews. 
Ticekt T91257 is INCORRECT in that its ops-ac... [20:14:36] max($::processorcount - 1, 10) [20:14:40] 10 is a String. [20:14:42] a STRING [20:14:43] in ruby. [20:14:44] GAH [20:15:08] PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: puppet fail [20:15:15] lame [20:15:17] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: puppet fail [20:15:44] (Restored) Papaul: added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul) [20:18:47] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: puppet fail [20:20:16] (PS1) Ottomata: Fix for type comparison mismatch [puppet/cdh] - https://gerrit.wikimedia.org/r/193887 [20:20:32] (CR) Ottomata: [C: 2 V: 2] Fix for type comparison mismatch [puppet/cdh] - https://gerrit.wikimedia.org/r/193887 (owner: Ottomata) [20:21:04] (PS1) Ottomata: Update cdh with type comparison fix [puppet] - https://gerrit.wikimedia.org/r/193888 [20:21:07] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: puppet fail [20:21:21] (CR) Ottomata: [C: 2 V: 2] Update cdh with type comparison fix [puppet] - https://gerrit.wikimedia.org/r/193888 (owner: Ottomata) [20:21:36] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: puppet fail [20:21:37] PROBLEM - puppet last run on analytics1019 is CRITICAL: CRITICAL: puppet fail [20:21:57] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: puppet fail [20:22:06] obscene, ottomata [20:22:16] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: puppet fail [20:22:38] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: puppet fail [20:22:59] RECOVERY - puppet last run on analytics1011 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [20:23:28] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [20:23:47] RECOVERY - puppet last run on 
analytics1020 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [20:23:56] RECOVERY - puppet last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:24:37] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:25:16] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail [20:25:17] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [20:25:39] is it possible to silence the analytics stuffs you guys know about? Unsure what's real and what's not [20:26:18] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [20:27:07] looks like they just fixed it [20:28:07] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:28:17] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [20:28:47] PROBLEM - txstatsd backend instances on graphite1001 is CRITICAL: CRITICAL: Not all configured txstatsd instances are running. [20:28:57] RECOVERY - puppet last run on analytics1002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [20:29:00] robh: which log(s) should I watch on carbon to verify that the install is working properly? [20:29:09] syslog [20:29:13] dhcp and tftp requests [20:29:22] ori: the txstatd and statsdlb seem to be alerting off and on for graphite1001 [20:29:38] any thoughts? I think you deployed that stuff friday unsure what's a real alert here as it seems transient [20:29:52] chasemp: sorry, i was just updating it and hadn't !logged [20:29:57] RECOVERY - txstatsd backend instances on graphite1001 is OK: OK: All defined txstatsd jobs are runnning. 
[20:29:57] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [20:30:01] should be fine now [20:30:09] ok thanks [20:30:16] PROBLEM - check_puppetrun on barium is CRITICAL: CRITICAL: puppet fail [20:30:16] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: puppet fail [20:30:16] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail [20:30:19] operations, Patch-For-Review: bond eth interfaces on ms1001 - https://phabricator.wikimedia.org/T89829#1079529 (Aklapper) [20:30:37] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [20:30:38] chasemp: did you see my update on T89857, and are you ok with that as the status quo? [20:30:42] Ops-Access-Requests, operations: Access to ops-access-request for RobLa-WMF (and all managers that need to approve) - https://phabricator.wikimedia.org/T91280#1079533 (RobH) Open>declined I understand that @robla is requesting access to read #ops-access-reviews. I'd like to better explain WHY we di... 
[20:30:47] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:31:16] RECOVERY - puppet last run on analytics1014 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [20:31:27] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [20:31:40] !log On graphite1001, updated statsdlb to 0.2-1 [20:31:42] Logged the message, Master [20:31:58] “error: diskfilter writes are not supported.” [20:32:12] ori: I saw the update friday and seems like a survivable bandaid :) hopefully, only meaningful I would add is that when I went previously from a crappy statsd to one that could handle multi metric packets [20:32:16] our cpu dropped 30% [20:32:22] so that may be a big boon in the future [20:32:29] robh: you’re right, it isn’t installing. [20:32:37] RECOVERY - puppet last run on analytics1001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [20:32:43] i can fix servers just by hearing about them! [20:32:44] \o/ [20:33:02] but otherwise, what else ya gonna do? capacity is just an issue for us I guess. I do wonder if we are using sampling effectively or at all. that would help a lot. [20:33:02] * robh is the server whisperer [20:33:32] chasemp: yeah, I think filippo settled on statsite, which apart from handling multi metric packets is generally several orders of magnitude faster than txstatsd. 
keeping txstatsd in place until filippo gets back spares us the complexity of having to migrate metrics [20:33:58] robh: neither the mgmt nor non-mgmt v4 IP appears in the syslog on Carbon [20:34:02] since there are slight differences in the way metric names are constructed and in the aggregate stats that are computed [20:34:07] RECOVERY - puppet last run on analytics1028 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:34:17] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [20:35:17] PROBLEM - check_puppetrun on payments1001 is CRITICAL: CRITICAL: puppet fail [20:35:17] PROBLEM - check_puppetrun on barium is CRITICAL: CRITICAL: puppet fail [20:35:17] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: puppet fail [20:35:18] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail [20:35:24] ori: yes understood, and I personally don't care which one we go with if it's sane :) I mostly poked filippo on whether or not the resource usage now was justified. i.e. mutli metric packets, not using sampling rates, using counters where we want sets doubling our load, and having no meaingful idea of who our top consumers are and //why// [20:35:40] but I have tried to stay on the edge of it as I don't have a ton of time to be as helpful as a I want [20:37:10] ori: around ? 
[20:37:46] RECOVERY - puppet last run on analytics1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:38:57] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [20:39:17] operations, ops-codfw, wikis-in-codfw: mc2004 console is unreadable remotely - https://phabricator.wikimedia.org/T90883#1079582 (Papaul) @Joe please see email [20:39:27] RECOVERY - puppet last run on analytics1019 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [20:40:16] PROBLEM - check_puppetrun on payments1001 is CRITICAL: CRITICAL: puppet fail [20:40:16] PROBLEM - check_puppetrun on barium is CRITICAL: CRITICAL: puppet fail [20:40:17] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: puppet fail [20:40:17] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail [20:40:28] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [20:40:28] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [20:40:56] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [20:41:07] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:41:12] matanya: hey [20:41:48] hi ori you looked for me last week, can i still help ? 
[20:43:27] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:44:08] matanya: no, i figured it out :) thanks [20:44:42] * matanya wonders what "it" was :D [20:45:17] RECOVERY - check_puppetrun on payments1001 is OK: OK: Puppet is currently enabled, last run 154 seconds ago with 0 failures [20:45:17] PROBLEM - check_puppetrun on barium is CRITICAL: CRITICAL: puppet fail [20:45:18] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: puppet fail [20:45:18] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail [20:50:16] RECOVERY - check_puppetrun on barium is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [20:50:17] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: puppet fail [20:50:17] RECOVERY - check_puppetrun on lutetium is OK: OK: Puppet is currently enabled, last run 113 seconds ago with 0 failures [20:53:22] robh: when californium hits carbon dhcp, carbon says: DHCPDISCOVER from 84:2b:2b:fd:b9:f6 via 10.64.16.2: network 10.64.16.0/22: no free leases [20:53:28] any idea what that’s about? [20:53:40] It shouldn’t need a lease since that ip is set in dns… [20:54:16] andrewbogott: does it have ipv6 ? [20:54:51] (just wondering [20:54:57] Looks like not [20:55:11] Either we aren’t setting ipv6 for our hosts or I’m looking in the wrong place [20:55:17] RECOVERY - check_puppetrun on bismuth is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:55:35] so without knowing: carbon lease range is too narrow [20:56:58] andrewbogott: does it have dns? [20:57:04] matanya: yeah, that’s what it looks like, but that would suggest that we can’t install ANY new server. If that were true I would surely not be the first to complain. [20:57:21] robh: I see californium 1H IN A 10.65.2.21 in templates/wmnet [20:57:24] looks right to me... 
[20:57:43] andrewbogott: not neccerily [20:58:14] it also depends on lease time and relevant subnetting [20:58:24] ah, yeah, it could be just that one subnet is full [20:58:35] except, we have a specific IP assigned to that host... [20:59:07] robh: grep 10.65.2.21 in dns — it’s assigned to two different hosts. [20:59:10] if it is not in the correct range, it will fail [20:59:14] So, that is surely it, I will find an available ip [20:59:15] andrewbogott: that'll do it [20:59:19] maybe, no [20:59:27] are you sure you arent confusing mgmt and production? [20:59:39] I’m not sure, but, what do you mean? [20:59:44] checking [20:59:55] Oh! you mean that it has a mgmt IP but not prod… [21:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150302T2100). [21:00:11] californium 1H IN A 10.64.20.18 is production [21:00:15] you’re right 10.64.20.18 is [21:00:16] californium 1H IN A 10.65.2.21 is pmgmt [21:00:17] yes, as you say [21:00:19] mgmt [21:00:20] and it doesn’t have a conflict [21:06:43] robh: so… ‘no free leases’? [21:07:11] ok, lets see.... [21:08:10] how nice, my next edit to en.wikipedia will break the site: https://tools.wmflabs.org/fengtools/contribsize/result.php?user=Matanya [21:08:57] (PS3) BBlack: set bnx2x num_queues on cache nodes in late_command [puppet] - https://gerrit.wikimedia.org/r/193857 [21:09:18] matanya: you can squeeze one more in there [21:09:26] :) [21:14:09] (CR) BBlack: [C: 2] set bnx2x num_queues on cache nodes in late_command [puppet] - https://gerrit.wikimedia.org/r/193857 (owner: BBlack) [21:15:54] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - dartar - https://phabricator.wikimedia.org/T90949#1079809 (chasemp) Open>Resolved >>! In T90949#1078859, @DarTar wrote: > I don't need access to these servers any more, thanks for checking. done! 
[21:15:55] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1079811 (10chasemp) [21:16:17] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10chasemp) [21:17:12] andrewbogott: sorry, checking now [21:17:17] !log deployed parsoid version 08643f53 [21:17:18] i kept getting distracted [21:17:19] Logged the message, Master [21:17:29] thx [21:17:57] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - diederik - https://phabricator.wikimedia.org/T90951#1079826 (10chasemp) >>! In T90951#1078480, @Tnegrin wrote: > I'm fine either way. Diederik has signed the NDA and as such has access to > these systems. > > -Toby Und... [21:18:37] andrewbogott: I'll end up rebooting it a couple of times if that's ok with you? [21:18:49] yeah, there’s nothing on there I care about [21:18:53] since I haven’t been able to ssh :) [21:19:05] i promise to try to find, but not fix, issue [21:19:15] nothing worse than someone fixing yer shit and not telling you why ;D [21:19:23] (i suppose not fixing it at all maybe, dunno) [21:20:07] so network checks out [21:20:26] andrewbogott: hahaha [21:20:28] i know what it is [21:20:31] this is in labs vlan on dns [21:20:37] but in private non labs vlan on switch [21:20:39] =P [21:20:54] you can tell by seeing the network segment the request comes in on [21:20:59] ‘private non labs vlan’ is fine w/me :) [21:21:06] well, the dns is set for labs private [21:21:09] and the network for non labs private [21:21:13] which should it be? [21:21:31] hm...
[21:21:31] if this doesnt need labs private, then we need to update dns [21:21:46] I guess it needs to talk on rabbitmq with labs hosts [21:21:49] i dunno how locked down labs private is [21:21:52] So that’s probably labs vlan [21:21:54] then labs private [21:21:56] yeah [21:21:59] ok then i need to change the switch is all, cool [21:22:04] great [21:22:07] !log deployed patch for T88361 [21:22:09] easy fix! (sorry it took so long ;) [21:22:12] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tnegrin - https://phabricator.wikimedia.org/T90932#1079842 (10chasemp) https://gerrit.wikimedia.org/r/#/c/193848/ [21:22:12] Logged the message, Master [21:22:16] PROBLEM - Host cp1064 is DOWN: PING CRITICAL - Packet loss = 100% [21:24:05] robh: should I try to reinstall now? [21:24:27] RECOVERY - Host cp1064 is UP: PING OK - Packet loss = 0%, RTA = 1.36 ms [21:24:57] not yet, still editing switch config [21:26:13] .... still waiting... [21:26:19] takes awhile to sync sometimes [21:26:22] andrewbogott: Ok, now you can reboot into pxe again =] [21:26:33] cool [21:26:48] PROBLEM - Varnishkafka log producer on cp1064 is CRITICAL: Connection refused by host [21:26:48] PROBLEM - dhclient process on cp1064 is CRITICAL: Connection refused by host [21:26:48] PROBLEM - DPKG on cp1064 is CRITICAL: Connection refused by host [21:26:55] sorry about that, confusion between labs and non labs private, and i didnt quite catch they differed slightly [21:26:56] PROBLEM - Varnish HTCP daemon on cp1064 is CRITICAL: Connection refused by host [21:26:56] PROBLEM - Varnish HTTP upload-backend on cp1064 is CRITICAL: Connection refused [21:27:07] PROBLEM - configured eth on cp1064 is CRITICAL: Connection refused by host [21:27:07] PROBLEM - Disk space on cp1064 is CRITICAL: Connection refused by host [21:27:07] PROBLEM - Varnish HTTP upload-frontend on cp1064 is CRITICAL: Connection refused [21:27:17] PROBLEM - salt-minion processes on cp1064 is CRITICAL: 
Connection refused by host [21:27:18] PROBLEM - puppet last run on cp1064 is CRITICAL: Connection refused by host [21:27:37] PROBLEM - Varnish traffic logger on cp1064 is CRITICAL: Connection refused by host [21:27:46] PROBLEM - HTTPS on cp1064 is CRITICAL: Return code of 255 is out of bounds [21:27:47] PROBLEM - RAID on cp1064 is CRITICAL: Connection refused by host [21:28:05] 6operations, 10ops-codfw, 3wikis-in-codfw: Console on mc2001 is unresponsive - https://phabricator.wikimedia.org/T90559#1079854 (10Papaul) @Joe mc2001 is not in the dhcp config file [21:30:12] andrewbogott: now let me know if it works =D [21:30:32] robh: looks like the Trusty installer is running properly now. ‘Loading additional components' [21:30:34] thanks! [21:30:39] wee! [21:30:44] welcome, glad its fixed [21:32:46] PROBLEM - Host cp1064 is DOWN: PING CRITICAL - Packet loss = 100% [21:33:17] RECOVERY - Host cp1064 is UP: PING OK - Packet loss = 0%, RTA = 1.42 ms [21:39:12] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1079893 (10coren) Delegating any large (greater than a few hundred) number of sites is essentially unscalable (hours of error-prone clicking); and sharing the password to the root authority... [21:39:23] (03PS1) 10BBlack: wmf-reimage: use ipv4 for initial ssh [puppet] - 10https://gerrit.wikimedia.org/r/193960 [21:39:29] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1079897 (10coren) p:5Triage>3Normal [21:39:43] (03CR) 10BBlack: [C: 032 V: 032] wmf-reimage: use ipv4 for initial ssh [puppet] - 10https://gerrit.wikimedia.org/r/193960 (owner: 10BBlack) [21:41:57] 6operations, 10ops-codfw: rack mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1079900 (10RobH) a:5RobH>3None This isn't technically waiting on me. 
Items we need to move forward: @Joe: You asked for console redirection to be enabled AFTER post, which differs from what @Mark states (to ke... [21:42:27] 6operations: setup and deploy mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1079903 (10RobH) p:5High>3Triage [21:42:54] 6operations, 10ops-codfw: setup and deploy mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#976931 (10RobH) [21:44:17] (03PS1) 10Ori.livneh: Add blank.gif for performance.wikimedia.org for T88361 [puppet] - 10https://gerrit.wikimedia.org/r/193963 [21:44:20] woo californium haz ping [21:44:29] (03CR) 10Ori.livneh: [C: 032 V: 032] Add blank.gif for performance.wikimedia.org for T88361 [puppet] - 10https://gerrit.wikimedia.org/r/193963 (owner: 10Ori.livneh) [21:53:38] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Add joal to deployment group - https://phabricator.wikimedia.org/T90731#1079953 (10RobH) Checklist items: [x] - User has already signed https://phabricator.wikimedia.org/L3 [ ] - User's manager has approved request. Once manager approves the request, i... 
[21:53:42] (03CR) 10Andrew Bogott: [C: 032] wikitech: remove nutcracker [puppet] - 10https://gerrit.wikimedia.org/r/193341 (owner: 10Giuseppe Lavagetto) [21:57:41] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: puppet fail [21:58:59] (03PS1) 10Andrew Bogott: Nutcracker should be 'stopped' rather than 'absent' [puppet] - 10https://gerrit.wikimedia.org/r/193966 [22:00:44] (03CR) 10Andrew Bogott: [C: 032] Nutcracker should be 'stopped' rather than 'absent' [puppet] - 10https://gerrit.wikimedia.org/r/193966 (owner: 10Andrew Bogott) [22:03:04] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [22:03:55] (03PS1) 10Andrew Bogott: Revert "Nutcracker should be 'stopped' rather than 'absent'" [puppet] - 10https://gerrit.wikimedia.org/r/193967 [22:04:17] (03CR) 10Dzahn: [C: 032] initial commit - moved from operations/software [software/dbtree] - 10https://gerrit.wikimedia.org/r/193488 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [22:04:24] PROBLEM - Memcached on silver is CRITICAL: Connection refused [22:04:25] (03PS1) 10Andrew Bogott: Revert "wikitech: remove nutcracker" [puppet] - 10https://gerrit.wikimedia.org/r/193968 [22:04:28] (03CR) 10Dzahn: [V: 032] initial commit - moved from operations/software [software/dbtree] - 10https://gerrit.wikimedia.org/r/193488 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [22:04:33] (03CR) 10jenkins-bot: [V: 04-1] Revert "wikitech: remove nutcracker" [puppet] - 10https://gerrit.wikimedia.org/r/193968 (owner: 10Andrew Bogott) [22:06:17] (03CR) 10Andrew Bogott: [C: 032] Revert "Nutcracker should be 'stopped' rather than 'absent'" [puppet] - 10https://gerrit.wikimedia.org/r/193967 (owner: 10Andrew Bogott) [22:06:35] (03PS2) 10Andrew Bogott: Revert "wikitech: remove nutcracker" [puppet] - 10https://gerrit.wikimedia.org/r/193968 [22:07:40] (03CR) 10Andrew Bogott: [C: 032] Revert "wikitech: remove nutcracker" [puppet] - 
10https://gerrit.wikimedia.org/r/193968 (owner: 10Andrew Bogott) [22:08:43] RECOVERY - Memcached on silver is OK: TCP OK - 0.000 second response time on port 11000 [22:11:31] (03PS1) 10BBlack: mount sysfs for PHYS_CORES in late_command [puppet] - 10https://gerrit.wikimedia.org/r/193972 [22:11:52] (03CR) 10BBlack: [C: 032 V: 032] mount sysfs for PHYS_CORES in late_command [puppet] - 10https://gerrit.wikimedia.org/r/193972 (owner: 10BBlack) [22:21:48] (03CR) 10Dzahn: [C: 032] 10.in-addr.arpa - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/193160 (owner: 10Dzahn) [22:26:39] (03PS2) 10Dzahn: static bugzilla: add links to all bugs/activities [puppet] - 10https://gerrit.wikimedia.org/r/193858 (https://phabricator.wikimedia.org/T85140) [22:31:19] 6operations, 10ops-codfw: setup and deploy mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1080144 (10Papaul) @Rob: yes i confirm that hyperthreading is enable on all 80 servers. Plus the other 20 servers. [22:34:28] 6operations: Remove clevel@lists.wikimedia.org - https://phabricator.wikimedia.org/T91323#1080170 (10Eloquence) 3NEW a:3Dzahn [22:38:18] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning. [22:41:38] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. 
[22:42:15] (03PS1) 10BBlack: re-enable cp1064 uplaod backend in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/193976 [22:42:29] (03CR) 10BBlack: [C: 032 V: 032] re-enable cp1064 uplaod backend in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/193976 (owner: 10BBlack) [22:44:49] (03CR) 10Dzahn: "i just updated the index.html for now: https://gerrit.wikimedia.org/r/#/c/193858/" [puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [22:44:53] (03Abandoned) 10Dzahn: static bugzilla: add links to all bugs/activities [puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [22:49:18] !log repooled cp1064 frontend (upload eqiad) [22:49:25] Logged the message, Master [22:52:08] (03PS7) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) [22:54:02] jouncebot: next [22:54:02] In 1 hour(s) and 5 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T0000) [22:54:10] hmph, silly time. 
[22:56:18] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures [22:58:18] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail [22:59:00] (03CR) 10Dzahn: puppetize dbtree config file to connect to tendril (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [22:59:55] (03CR) 10Dzahn: [C: 032] move dbtree into its own subrepository, rm here [software] - 10https://gerrit.wikimedia.org/r/193491 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [23:01:09] (03CR) 10Dzahn: [C: 031] "can be deployed anytime, the tool is in a different repo now and even if puppet doesn't clone it yet we don't want it affected by mw deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [23:01:43] !add swat 193143 [23:12:12] mutante: If you could figure out how to make that !add swat command work I think I could find people to pitch in and buy you a lot of beer [23:12:19] a *lot* of beer [23:12:31] bd808: :) https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=146671&oldid=146666 [23:12:35] what did i do wrong there [23:12:54] in source it looks like i add it to today, when rendered it's tomorrow [23:13:17] UTC time in table breaks [23:13:22] it's in the right place [23:13:37] under the Tuesday heading [23:13:46] 00:00–01:00 UTC (Mon) 16:00–17:00 PST [23:13:49] oh, right [23:13:57] it is Tuesday in UTC [23:14:00] yea, thanks [23:14:12] stupid timezones making tables hard [23:14:13] it confuses everybody (in the US) [23:14:57] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:15:33] What makes it even more confusing is that the labels (morning and evening swat) are not UTC relative but instead PST relative [23:15:42] yes, that :) [23:15:58] it's the midnight deploy now [23:16:11] the world revolves around silicon
valley, don'tchaknow [23:16:46] silicon valley revolves around silicon valley at least [23:17:03] which means it'll soon implode (we can only hope) [23:17:42] it's about due for another big crash actually, at least based on the prior periodic cycles [23:18:09] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [23:18:12] jouncebot: star date -308166.21 [23:18:18] .... [23:19:16] bd808: i sure hope so, been saving up to finally buy a place :) [23:19:27] another good crash would be just the thing [23:19:52] bd808, timezones end up confusing the rest of the world too, when you pin regular scheduled times to some random local zone [23:20:13] Krenair: I agree completely [23:20:53] where "random" == headquarters, but yeah [23:20:58] jouncebot: next [23:20:59] In 0 hour(s) and 39 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T0000) [23:20:59] "Dear anthropoid, the time has come to deploy leap second bug fixes" (jouncebot crashes and quits) [23:21:00] * legoktm has never found it confusing :P [23:21:05] especially when that zone has semi-annual fluctuation +/-1 hour [23:21:19] greg-g, sounds arbitrary enough for me [23:21:22] facebook, for instance, pins deploys to SF time, and now that they have a London office, they have one deploy window for them as well [23:21:39] Krenair: except you know, Ops made me do it due to coverage [23:21:50] I don't blame you in particular. [23:21:55] :) [23:22:04] blame Sue for moving WMF to SF :) [23:22:15] greg-g: create a deploy window for every WMF office ;) [23:22:33] all 1 of them [23:22:34] there is only one WMF office [23:22:36] heh, "The sun never sets on the WMF remotie kingdom" [23:22:38] we ran our servers at $DAYJOB-1 on PST/PDT because that was where our primary DC was. 
Made things like reporting monthly billing to international clients interesting [23:22:41] therein lies the problem [23:22:59] legoktm: indeed :p [23:23:17] empire, dangit, got the quote wrong [23:23:27] we all carry a wmf office in our hearts [23:24:30] except greg-g obviously [23:24:33] (03PS2) 10Gergő Tisza: Enable CORS support logging on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/182767 (https://phabricator.wikimedia.org/T507) [23:24:44] I have no heart [23:25:38] awkward of you to bring that up greg-g [23:25:44] I was going to say it's because you're the heart of wmf [23:25:52] I should get my dog some business cards that say she's Chief Physical Security Officer for the Wikimedia Houston Office :) [23:26:21] and Principal Biter! [23:28:09] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1080417 (10JKatzWMF) If it makes it easier, I would limit it to the following wikis: en.wikipedia, zh.wikipedia, de.wikipedia, es.wikipedia, fr.wikipedia, jp.wikipedia, commons.wikimedia, i...
[23:31:48] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: puppet fail [23:32:28] (03PS1) 10MaxSem: Fix WP app being advertized on non-WP sites like Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193988 (https://phabricator.wikimedia.org/T91174) [23:40:27] (03PS2) 10Tim Starling: Fix WP app being advertized on non-WP sites like Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193988 (https://phabricator.wikimedia.org/T91174) (owner: 10MaxSem) [23:40:32] (03CR) 10Tim Starling: [C: 032] Fix WP app being advertized on non-WP sites like Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193988 (https://phabricator.wikimedia.org/T91174) (owner: 10MaxSem) [23:40:37] (03Merged) 10jenkins-bot: Fix WP app being advertized on non-WP sites like Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193988 (https://phabricator.wikimedia.org/T91174) (owner: 10MaxSem) [23:42:44] !log maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/193988 (duration: 00m 07s) [23:42:47] Logged the message, Master [23:43:57] (03Abandoned) 10Dzahn: add language domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191104 (https://phabricator.wikimedia.org/T88873) (owner: 10Dzahn) [23:44:06] (03Abandoned) 10Dzahn: add project domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191109 (https://phabricator.wikimedia.org/T88873) (owner: 10Dzahn) [23:48:06] (03CR) 10Dzahn: [C: 032] let puppet clone dbtree into noc/docroot/dbtree [puppet] - 10https://gerrit.wikimedia.org/r/193492 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [23:51:38] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [23:52:21] 6operations, 10Wikimedia-Git-or-Gerrit, 5Patch-For-Review: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1080471 (10Prtksxna) >>! In T89640#1078238, @akosiaris wrote: > @Prtksxna. 
OK, fixed. {icon thumbs-o-up} [23:56:50] (03PS8) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) [23:58:15] (03CR) 10Dzahn: [C: 032] puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [23:59:06] jouncebot: next [23:59:06] In 0 hour(s) and 0 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T0000) [23:59:20] who's deploying? and, can i go first? i'm in a hurry a bit [23:59:35] <^d> I suppose I can do it [23:59:59] <^d> ping ebernhardson, Krenair, mutante, tgr for swat