[00:14:39] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Last successful Puppet run was Thu 10 Jul 2014 22:13:53 UTC
[00:25:14] mwalker: What caused scap to take so long compared to normal?
[00:25:37] it didn't take longer compared to normal; I'm just impatient
[00:26:22] mutante: woot
[00:26:27] mutante: and ty :)
[00:49:49] (CR) 20after4: "> Inline comments; plus, the commit should only contain debian/. The" [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751 (owner: 20after4)
[00:51:11] (PS1) Ori.livneh: beta: apply mediawiki::jobrunner on jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/145478
[00:58:01] (CR) Aaron Schulz: [C: -1] beta: apply mediawiki::jobrunner on jobrunners (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/145478 (owner: Ori.livneh)
[02:13:49] RECOVERY - Puppet freshness on analytics1018 is OK: puppet ran at Fri Jul 11 02:13:46 UTC 2014
[02:14:13] (PS2) Dzahn: turn icinga into module pt1. separate classes [operations/puppet] - https://gerrit.wikimedia.org/r/145472
[02:18:59] (CR) jenkins-bot: [V: -1] turn icinga into module pt1. separate classes [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn)
[02:28:47] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds
[02:31:27] !log LocalisationUpdate completed (1.24wmf12) at 2014-07-11 02:30:24+00:00
[02:31:34] Logged the message, Master
[02:33:51] (PS1) Dzahn: Bugzilla: use 'header append' vs. 'header add' [operations/puppet] - https://gerrit.wikimedia.org/r/145489
[02:34:26] (PS2) Dzahn: Bugzilla: use 'header append' vs. 'header add' [operations/puppet] - https://gerrit.wikimedia.org/r/145489
[02:39:02] (PS2) Springle: Use MariaDB event scheduler on coredb slaves. [operations/software] - https://gerrit.wikimedia.org/r/145276
[02:39:30] (CR) Springle: [C: 2] Use MariaDB event scheduler on coredb slaves. [operations/software] - https://gerrit.wikimedia.org/r/145276 (owner: Springle)
[02:43:14] (PS1) Dzahn: StrictTransportSecurity for wikitech [operations/puppet] - https://gerrit.wikimedia.org/r/145491
[02:44:09] (PS2) Dzahn: StrictTransportSecurity for wikitech [operations/puppet] - https://gerrit.wikimedia.org/r/145491
[02:45:22] (PS1) Dzahn: retab wikitech.wikimedia.org.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145492
[02:46:33] (PS2) Dzahn: retab wikitech.wikimedia.org.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145492
[02:47:49] (PS1) Dzahn: wikitech apache erb - qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/145493
[02:51:07] (PS1) Dzahn: retab OTRS Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/145494
[02:51:57] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:53:15] (PS1) Dzahn: StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495
[02:59:33] (PS1) Dzahn: delete unused bugzilla apache config [operations/puppet] - https://gerrit.wikimedia.org/r/145496
[02:59:53] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: Fetching readonly
[03:01:24] !log LocalisationUpdate completed (1.24wmf13) at 2014-07-11 03:00:20+00:00
[03:01:28] Logged the message, Master
[03:01:46] (PS1) Dzahn: delete unused download.wm.org Apache site [operations/puppet] - https://gerrit.wikimedia.org/r/145497
[03:04:51] (PS1) Dzahn: delete mobile.wp.org Apache config [operations/puppet] - https://gerrit.wikimedia.org/r/145498
[03:06:11] (CR) Chmarkine: [C: 1] StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (owner: Dzahn)
[03:07:45] (PS14) Withoutaname: Delete ve.wikimedia.org and leave redirect [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737)
[03:09:22] (PS1) Dzahn: delete stat1001.wm.org Apache site [operations/puppet] - https://gerrit.wikimedia.org/r/145499
[03:11:00] (CR) Chmarkine: [C: 1] update SSL cipher list for OTRS to support PFS [operations/puppet] - https://gerrit.wikimedia.org/r/144734 (owner: Dzahn)
[03:11:11] (PS2) Dzahn: delete stat1001.wm.org Apache site [operations/puppet] - https://gerrit.wikimedia.org/r/145499
[03:13:16] (CR) Chmarkine: [C: 1] update SSL cipher list on wikitech to support PFS [operations/puppet] - https://gerrit.wikimedia.org/r/144736 (owner: Dzahn)
[03:20:04] (CR) Chmarkine: [C: 1] StrictTransportSecurity for wikitech [operations/puppet] - https://gerrit.wikimedia.org/r/145491 (owner: Dzahn)
[03:20:59] (PS1) Dzahn: StrictTransportSecurity for lists.wm.org [operations/puppet] - https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516)
[03:31:18] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 11 03:30:11 UTC 2014 (duration 30m 10s)
[03:31:22] Logged the message, Master
[03:33:41] (PS3) Dzahn: turn icinga into module pt1. separate classes [operations/puppet] - https://gerrit.wikimedia.org/r/145472
[03:33:57] (PS4) Dzahn: turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472
[03:35:28] (CR) jenkins-bot: [V: -1] turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn)
[03:38:38] (PS1) Dzahn: rm unused Icinga checkcommands.cfg.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145501
[03:39:24] (CR) Dzahn: "/puppet$ find . | grep checkcommands" [operations/puppet] - https://gerrit.wikimedia.org/r/145501 (owner: Dzahn)
[03:41:27] (PS5) Dzahn: turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472
[03:42:53] (CR) jenkins-bot: [V: -1] turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn)
[03:46:10] (CR) Dzahn: "why does jenkins still say things about admins.pp in https://integration.wikimedia.org/ci/job/operations-puppet-puppetlint-strict/4456/con" [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn)
[03:51:37] (PS6) Dzahn: turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472
[04:01:37] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 11 Jul 2014 02:00:54 UTC
[04:07:07] (CR) Dzahn: "puppet automatically sets the +x bit on directories, so technically you never need a 7, but one can argue about style to make it obvious" [operations/puppet] - https://gerrit.wikimedia.org/r/145471 (owner: BBlack)
[04:22:04] (PS2) Ori.livneh: beta: apply mediawiki::jobrunner on jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/145478
[04:30:14] (PS3) Ori.livneh: beta: apply mediawiki::jobrunner on jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/145478
[04:32:53] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[04:33:20] (PS4) Ori.livneh: beta: apply mediawiki::jobrunner on jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/145478
[05:16:34] (PS1) Springle: Use MariaDB event scheduler to automatically delay dbstore1001. This saves running separate pt-slave-delay processes, plus is easy to disable single channels, or all of them, by: [operations/software] - https://gerrit.wikimedia.org/r/145507
[05:17:36] (CR) Chmarkine: "How about using "Header set"?" [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[05:18:44] (CR) Springle: [C: 2] Use MariaDB event scheduler to automatically delay dbstore1001. This saves running separate pt-slave-delay processes, plus is easy to disabl [operations/software] - https://gerrit.wikimedia.org/r/145507 (owner: Springle)
[05:20:41] (PS4) 20after4: Packaging for debian using pkg-php-tools/dh_php5. [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751
[05:20:43] (PS1) Springle: Ensure MariaDB event logging statements are deterministic. [operations/software] - https://gerrit.wikimedia.org/r/145509
[05:21:14] (CR) Springle: [C: 2] Ensure MariaDB event logging statements are deterministic. [operations/software] - https://gerrit.wikimedia.org/r/145509 (owner: Springle)
[05:22:56] (CR) 20after4: "Ok this version should be distribution neutral, as far as I can tell. I followed Filippo's advice to use pkg-php-tools and finally figured" [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751 (owner: 20after4)
[05:23:05] (CR) Ori.livneh: "I cherry-picked this in Labs and watched it roll out to deployment-jobrunner01. It appears to be working well." [operations/puppet] - https://gerrit.wikimedia.org/r/145478 (owner: Ori.livneh)
[05:34:16] (PS1) Ori.livneh: mediawiki: move SSHD nice override from web.pp to init.pp [operations/puppet] - https://gerrit.wikimedia.org/r/145510
[05:40:00] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Fri Jul 11 05:39:41 UTC 2014
[06:27:28] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:27:29] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:18] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 4 failures
[06:28:58] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:19] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:29] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:38] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:38:28] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:42:18] (PS1) QChris: Document use of stat1001.wikimedia.org's default site [operations/puppet] - https://gerrit.wikimedia.org/r/145518
[06:45:23] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:45:38] (CR) QChris: [C: -1] "stat1001's default site is still in use." [operations/puppet] - https://gerrit.wikimedia.org/r/145499 (owner: Dzahn)
[06:46:02] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:46:12] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:22] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:46:32] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:46:32] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:46:42] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:47:07] (PS2) QChris: Document use of stat1001.wikimedia.org's default site [operations/puppet] - https://gerrit.wikimedia.org/r/145518
[06:51:22] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:56:23] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[07:09:24] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[08:44:47] (CR) Hashar: [C: 1] beta: apply mediawiki::jobrunner on jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/145478 (owner: Ori.livneh)
[08:54:34] !log Started rebuildItemsPerSite for wikidatawiki on terbium
[08:54:38] Logged the message, Master
[08:54:41] aude: ^ fyi
[08:55:01] ok
[09:55:37] (PS2) Alex Monk: delete unused download.mediawiki.org Apache site [operations/puppet] - https://gerrit.wikimedia.org/r/145497 (owner: Dzahn)
[10:43:42] springle: was just curious, was there a reason the labsdb boxes were split up like that, rather than have all databases in one machine like analytics-store?
[10:44:13] YuviPanda: because analytics-store uses MariaDB 10, which has multi-source replication
[10:44:21] labsdb is still on 5.5
[10:44:24] aaaha!
[10:44:40] so I'm going to guess that moving that would still be possible when/if we move to 10
[10:44:47] upgrading to 10 is on my list, both sanitarium and labsdb
[10:44:58] coool
[10:45:02] that would make all labsdbs the same, without federations and such
[10:45:06] right
[10:45:12] so they'll all have all the databases?
[10:45:15] right
[10:45:42] springle: cooool! :D
[10:45:49] springle: any rough timeline?
[10:46:07] heh
[10:46:20] more than 3 months, less than 6
[10:46:29] that's good enough for moi :)
[10:46:31] springle: thank you!
[10:48:49] yw
[11:13:18] lunch time
[11:18:01] (CR) JanZerebecki: "Yes add isn't what we want. Yes set would be better." [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[11:37:50] (PS3) Tim Landscheidt: WIP: labsdeprepo: Allow more than one local repository [operations/puppet] - https://gerrit.wikimedia.org/r/118796 (https://bugzilla.wikimedia.org/60925)
[11:46:59] greg-g: ping
[11:46:59] hoo: You sent me a contentless ping. This is a contentless pong. Please provide a bit of information about what you want and I will respond when I am around.
[11:47:08] it's gret!
[11:47:10] greg*
[11:47:16] it's automated response greg!
[11:47:18] :D
[11:47:21] :D
[11:47:30] this is like voicemail!
[11:47:39] * hoo dislikes voicemail
[11:47:40] we should all leave crazy responses
[11:47:44] we need to deploy a tiny, urgent follow-up fix
[11:47:49] for what we did yesterday
[11:48:06] 2 lines
[11:48:26] one line, actually
[11:48:31] yes
[11:48:42] we are too strongly limiting the entity search (term) results
[11:51:10] Reedy: --^
[11:53:41] (CR) KartikMistry: "I don't have much experience with PHP packaging, but still - few nitpicking. Packaging looks fine otherwise." (4 comments) [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751 (owner: 20after4)
[11:54:05] (CR) Odder: Add a complete list of local interwikis (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/144264 (https://bugzilla.wikimedia.org/954) (owner: TTO)
[12:05:35] !log Jenkins: manually removing history of mwext-Wikibase-client-tests and mwext-Wikibase-repo-tests. They have not been used since January
[12:05:40] Logged the message, Master
[12:06:09] sad but true
[12:06:21] aude: we will get them back in eventually :D
[12:06:25] yes!
[12:06:54] (CR) TTO: Add a complete list of local interwikis (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/144264 (https://bugzilla.wikimedia.org/954) (owner: TTO)
[12:07:33] !log Jenkins: dropping history of mwext-Wikibase-testextensions-master as well
[12:07:38] Logged the message, Master
[12:34:35] (CR) Jgreen: [C: 1] retab OTRS Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/145494 (owner: Dzahn)
[12:35:58] (CR) Jgreen: [C: 1] StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (owner: Dzahn)
[12:51:53] (PS2) Krinkle: StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[12:52:43] (PS3) Krinkle: Enable StrictTransportSecurity for wikitech [operations/puppet] - https://gerrit.wikimedia.org/r/145491 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[12:52:56] (PS3) Krinkle: retab wikitech.wikimedia.org.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145492 (owner: Dzahn)
[12:53:01] (PS2) Krinkle: wikitech apache erb - qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/145493 (owner: Dzahn)
[12:53:11] (CR) Krinkle: [C: 1] Enable StrictTransportSecurity for wikitech [operations/puppet] - https://gerrit.wikimedia.org/r/145491 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[12:53:19] (CR) Krinkle: [C: 1] StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[13:10:24] (CR) Ottomata: [C: 2 V: 2] Document use of stat1001.wikimedia.org's default site [operations/puppet] - https://gerrit.wikimedia.org/r/145518 (owner: QChris)
[13:17:09] (PS1) Dereckson: Namespaces configuration for ru.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145556 (https://bugzilla.wikimedia.org/67844)
[13:17:42] (CR) Dereckson: "Follow-up: change I93205ef4616ab401c79c96f98b38af4ded604c68." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/35663 (owner: Dereckson)
[13:20:34] (CR) Matanya: wikitech apache erb - qualify vars (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/145493 (owner: Dzahn)
[13:20:50] hi
[13:20:58] (CR) Matanya: [C: 1] retab wikitech.wikimedia.org.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145492 (owner: Dzahn)
[13:21:19] hello titish_maryam
[13:21:39] how are you?
[13:22:15] ok, can I help you ?
[13:22:36] (CR) Matanya: [C: 1] delete unused bugzilla apache config [operations/puppet] - https://gerrit.wikimedia.org/r/145496 (owner: Dzahn)
[13:23:15] What assistance!
[13:24:59] Some people are too polite like you:)
[13:26:04] greg-g: Are you around?
[13:29:56] (CR) Ottomata: "Yeah, we can't just get rid of it. I believe that we did this because we didn't want to put the datasets under the broader stats.wikimedi" [operations/puppet] - https://gerrit.wikimedia.org/r/145499 (owner: Dzahn)
[13:39:59] hoo: Greg-g is in SF so definitely sleeping
[13:42:46] ah, ok
[13:45:24] ottomata: heads up, i'm about to do a huge delete, if you see stuff going wrong, blame me
[13:46:05] matanya: What's big? 100k?
[13:46:25] hha ok
[13:46:49] hey dogeydogey, just had a good idea for something maybe easy for you to do
[13:46:55] yes!
[13:46:57] bring it
[13:47:08] ok, it will require a bit of coordination with folks to make sure it's good to do
[13:47:10] https://gerrit.wikimedia.org/r/#/c/145499/
[13:47:17] check that out, read the comments
[13:47:19] hoo: around 15K
[13:47:38] this
[13:47:38] http://stat1001.wikimedia.org/public-datasets/
[13:47:47] probably should just have its own subdomain
[13:47:49] matanya: That's not even worth mentioning here :D
[13:47:53] and we should get rid of the default site
[13:47:57] now, i'm not 100% sure this should happen
[13:48:15] so, you should ask dario, I'll PM you his email
[13:48:16] ottomata thanks! will work on it
[13:48:29] hoo: there is a red warning, better to tell you before than be sorry later
[13:48:36] :D
[13:49:10] ottomata: he is now online on -analytics
[13:50:11] (PS2) Dereckson: Namespaces configuration for ru.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145556 (https://bugzilla.wikimedia.org/67844)
[13:50:50] (CR) Dereckson: "PS2: adding U: and UT: per bug 67844 comment 4." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145556 (https://bugzilla.wikimedia.org/67844) (owner: Dereckson)
[13:51:17] thanks matanya, I just PMed dogeydogey all the details
[13:51:59] !log Jenkins: mediawiki/core change being queued while Jenkins is busy processing some history. That is normal, will resume soon ™
[13:52:05] Logged the message, Master
[13:58:00] good morning cmjohnson1!
[13:58:05] any luck yesterday?
[13:58:10] where should I start trying again?
[14:06:22] (CR) Andrew Bogott: [C: 2] retab wikitech.wikimedia.org.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145492 (owner: Dzahn)
[14:09:40] Hm, I have never before seen a patch in "Submitted, Merge Pending" state. Does that mean Jenkins is ill?
[14:10:27] andrewbogott: Nope, it means it's waiting for a dependency to be merged
[14:10:36] and you have to merge that manually
[14:10:37] oh, of course.
[14:10:39] * andrewbogott looks
[14:11:11] (CR) Andrew Bogott: [C: 2] Enable StrictTransportSecurity for wikitech [operations/puppet] - https://gerrit.wikimedia.org/r/145491 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[14:18:32] :-D
[14:22:52] ottomata: so the partman recipe didn't work
[14:24:14] aye :/
[14:25:48] yeah..it's almost been 2 years since we did an install with them.
[14:42:00] (PS3) JanZerebecki: Bugzilla: use 'header set' vs. 'header add' [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[14:48:51] (CR) JanZerebecki: [C: 1] StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[14:55:54] hoo: g'morning, it is now 7:55am and I'm in the office, what can I do for you? :)
[14:56:55] wow, early at the office
[14:57:47] yeah, commuting by bus across the Golden Gate means I either get up/in early but have the trip be shorter, or get up/in later but have the trip take a lot longer
[14:57:54] ah
[14:57:55] greg-g: During yesterday's deploy we fixed an API module... but turns out that we missed a thing and the results are messed up now
[14:58:06] * greg-g nods
[14:58:12] what's the error/bug?
[14:58:15] affects entity suggestions
[14:58:31] greg-g: The results are virtually useless now
[14:58:32] retrieving too few / incorrect ones
[14:58:40] what aude says
[14:58:55] it's a one-line fix
[14:58:57] gotcha
[14:59:20] so, one-liner on a Friday morning... lesse....
[14:59:30] have I had my coffee yet? - in-progress
[14:59:32] we try to avoid this but
[14:59:45] :)
[14:59:46] lousy for the users during the weekend
[14:59:55] yeah
[14:59:56] we try to do better :)
[14:59:58] one-liner in js?
https://gerrit.wikimedia.org/r/#/c/145551/1/extensions/Wikibase/lib/includes/store/sql/TermSqlIndex.php
[15:00:03] nope php
[15:00:04] see link
[15:00:08] * greg-g nods
[15:00:36] tested on beta cluster?
[15:00:52] We "fixed" the module yesterday, but turns out I had too little data on my machine for the error to show
[15:01:07] (CR) Chmarkine: [C: 1] Bugzilla: use 'header set' vs. 'header add' [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[15:01:12] no, only tested it on my set up
[15:01:24] not sure there's useful enough data on beta to test this
[15:01:35] hoo: file a bug for that :)
[15:01:42] bug for what?
[15:01:46] Useful data on beta
[15:01:47] -?
[15:01:48] yeah
[15:02:11] ideally there's a place on the beta cluster you can test your changes
[15:02:23] we can do imports of subsets of wikidata, as needed
[15:02:27] ottomata: we need to go into bios on each and change boot settings to bios and not UEFI...Dell wants you to use their shit
[15:02:30] but... for now...
[15:02:42] so next time you reboot go bios and make the change
[15:02:43] hoo: don't tell anyone I said yes
[15:02:49] subsets is a little difficult with Wikibase as we have stable ids
[15:02:49] :P
[15:02:53] i imported properties for our 'demo' system
[15:02:53] :P
[15:03:03] could do sometime for beta, import a big batch but not now
[15:03:06] oook
[15:03:36] aude: yeah, let's just put that on the backlog so we can get to it
[15:03:49] yeah
[15:04:23] ty
[15:05:09] greg-g: Thanks a lot :)
[15:05:31] hoo: we have commons on beta to use a foreign api file backend
[15:05:43] hoo: i.e. if commons on beta is missing a file, it fetches it from the production wiki
[15:05:50] maybe something similar could be built for wikidata
[15:05:58] i.e. fetch from wikidata production via the API
[15:05:59] wouldn't work same because of the id numbering
[15:06:03] instead of direct database connection
[15:06:13] oh come on, that is just a number! :-D
[15:06:24] but yeah I see what you mean
[15:06:26] the items reference property ids
[15:06:29] etc.
[15:06:33] Let's refactor our storage backend :P
[15:06:40] * hoo runs
[15:06:45] my script for importing properties did some renumbering but was just for properties
[15:06:52] can modify for items
[15:06:57] hoo: shush you
[15:14:14] !log hoo Synchronized php-1.24wmf12/extensions/Wikidata/: Fix the wbsearchentities API (duration: 00m 16s)
[15:14:18] Logged the message, Master
[15:14:20] \o/
[15:14:33] verified
[15:15:58] !log hoo Synchronized php-1.24wmf13/extensions/Wikidata/: Fix the wbsearchentities API (duration: 00m 13s)
[15:16:03] Logged the message, Master
[15:16:06] ok, we're done :)
[15:17:12] thanks greg-g
[15:23:39] (PS1) Tim Landscheidt: labsdebrepo: Fix initial run [operations/puppet] - https://gerrit.wikimedia.org/r/145573
[15:34:24] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Fri 11 Jul 2014 13:33:20 UTC
[15:39:36] hoo: aude shhhhhhhh, I said don't tell anyone!
[15:39:39] :P
[15:40:15] /j #wikimedia-operations-cabal :D
[15:43:19] (CR) JanZerebecki: [C: -1] "Thank you for all the HSTS patches." [operations/puppet] - https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[15:52:10] (CR) Tim Landscheidt: "Tested on Toolsbeta." [operations/puppet] - https://gerrit.wikimedia.org/r/145573 (owner: Tim Landscheidt)
[15:53:08] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Fri Jul 11 15:53:02 UTC 2014
[15:53:55] (PS1) Cmjohnson: fixing typo on analytics1041 entry [operations/dns] - https://gerrit.wikimedia.org/r/145576
[15:56:43] (CR) Cmjohnson: [C: 2] fixing typo on analytics1041 entry [operations/dns] - https://gerrit.wikimedia.org/r/145576 (owner: Cmjohnson)
[16:06:09] and I am off for vacations see you in a week
[16:59:04] (PS1) Chad: Increase weighting of title field for Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145588
[16:59:24] (CR) jenkins-bot: [V: -1] Increase weighting of title field for Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145588 (owner: Chad)
[17:01:36] <^d> gdash graphs be empty. is this known?
[17:03:45] (PS2) Chad: Increase weighting of title field for Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145588
[17:03:55] (CR) jenkins-bot: [V: -1] Increase weighting of title field for Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145588 (owner: Chad)
[17:04:02] superm401: I have a question about https://bugzilla.wikimedia.org/show_bug.cgi?id=65379
[17:04:12] That bug implies that you can see the list of instances in a project when you aren't project admin...
[17:04:16] is that right? Can you verify?
[17:04:52] andrewbogott, yes, I think that was the issue.
[17:04:59] Got to go to a meeting
[17:05:12] Can you please check and see if that is still true? Because right now I can't see the instance list if I'm not project admin (I just tried)
[17:05:17] but, ok, I guess I'll catch up with you later
[17:08:42] PROBLEM - Host analytics1004 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:52] RECOVERY - Host analytics1004 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[17:16:02] PROBLEM - DPKG on analytics1004 is CRITICAL: Connection refused by host
[17:16:02] PROBLEM - RAID on analytics1004 is CRITICAL: Connection refused by host
[17:16:12] PROBLEM - check configured eth on analytics1004 is CRITICAL: Connection refused by host
[17:16:12] PROBLEM - SSH on analytics1004 is CRITICAL: Connection refused
[17:16:12] PROBLEM - Disk space on analytics1004 is CRITICAL: Connection refused by host
[17:16:32] PROBLEM - puppet last run on analytics1004 is CRITICAL: Connection refused by host
[17:16:32] PROBLEM - check if dhclient is running on analytics1004 is CRITICAL: Connection refused by host
[17:16:42] PROBLEM - puppet disabled on analytics1004 is CRITICAL: Connection refused by host
[17:21:33] (CR) Dzahn: "ok,thanks. background: i did this as part of a chain of patches that were all about deleting (unused) files from files/apache/sites" [operations/puppet] - https://gerrit.wikimedia.org/r/145499 (owner: Dzahn)
[17:21:46] (Abandoned) Dzahn: delete stat1001.wm.org Apache site [operations/puppet] - https://gerrit.wikimedia.org/r/145499 (owner: Dzahn)
[17:25:48] (CR) Dzahn: [C: 2] delete unused bugzilla apache config [operations/puppet] - https://gerrit.wikimedia.org/r/145496 (owner: Dzahn)
[17:25:55] (CR) 01tonythomas: "The default return path was set to the $wgPasswordSender ( $from->address ) which is wiki@wikimedia.org, after change Ic5c1231611a5c4984dd" [operations/puppet] - https://gerrit.wikimedia.org/r/141287 (owner: 01tonythomas)
[17:43:00] (PS4) Tim Landscheidt: labsdeprepo: Allow more than one local repository [operations/puppet] - https://gerrit.wikimedia.org/r/118796 (https://bugzilla.wikimedia.org/60925)
[17:57:37] (CR) Ori.livneh: [C: 2] beta: apply mediawiki::jobrunner on jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/145478 (owner: Ori.livneh)
[18:01:36] (Abandoned) Dzahn: rm unused Icinga checkcommands.cfg.erb [operations/puppet] - https://gerrit.wikimedia.org/r/145501 (owner: Dzahn)
[18:03:49] (CR) Dzahn: [C: 2] update SSL cipher list for OTRS to support PFS [operations/puppet] - https://gerrit.wikimedia.org/r/144734 (owner: Dzahn)
[18:05:26] manybubbles: hiii
[18:09:22] (PS2) Dzahn: retab OTRS Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/145494
[18:10:42] (PS3) Dzahn: retab OTRS Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/145494
[18:13:08] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: Fetching origin
[18:14:26] (CR) Dzahn: [C: 2] retab OTRS Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/145494 (owner: Dzahn)
[18:15:08] RECOVERY - Unmerged changes on repository puppet on strontium is OK: Fetching origin
[18:15:52] (PS3) Dzahn: StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (https://bugzilla.wikimedia.org/38516)
[18:16:58] (CR) Dzahn: [C: 2] StrictTransportSecurity for OTRS [operations/puppet] - https://gerrit.wikimedia.org/r/145495 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[18:17:56] csteipp: ^
[18:18:08] godog: care to re-review https://gerrit.wikimedia.org/r/#/c/142751/
[18:19:37] !log OTRS - enabled STS, updated SSL cipher list, restarted Apache on iodine
[18:19:42] Logged the message, Master
[18:19:43] Jeff_Green:
[18:19:52] (CR) MaxSem: [C: 1] delete mobile.wp.org Apache config [operations/puppet] - https://gerrit.wikimedia.org/r/145498 (owner: Dzahn)
[18:24:04] (PS5) 20after4: Packaging for debian using pkg-php-tools/dh_php5. [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751
[18:24:08] (PS3) Chad: Increase weighting of title field for Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145588
[18:26:17] (CR) Dzahn: "hmm. why not "append"" [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[18:32:19] mutante: got it
[18:33:21] mutante: I think that's actually a really good one for hsts. Thanks!
[18:33:58] cool, yw, also just 7 days for now
[18:35:03] (CR) Dzahn: [C: 2] Bugzilla: use 'header set' vs. 'header add' [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[18:36:08] (CR) Dzahn: [V: 2] Bugzilla: use 'header set' vs. 'header add' [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[18:38:56] (CR) JanZerebecki: "All 3 variants work because no other header with that name exists before that line. But our intent is to have exactly one header with that" [operations/puppet] - https://gerrit.wikimedia.org/r/145489 (owner: Dzahn)
[18:39:54] (PS1) Dzahn: OTRS - 'set' header instead of 'append'ing it [operations/puppet] - https://gerrit.wikimedia.org/r/145605
[18:41:29] (CR) JanZerebecki: [C: 1] OTRS - 'set' header instead of 'append'ing it [operations/puppet] - https://gerrit.wikimedia.org/r/145605 (owner: Dzahn)
[18:42:01] (CR) Dzahn: [C: 2] OTRS - 'set' header instead of 'append'ing it [operations/puppet] - https://gerrit.wikimedia.org/r/145605 (owner: Dzahn)
[18:48:07] csteipp: just saw it also got merged already for wikitech :)
[18:48:11] mutante: could you check if there is anything else than /pipermail and /pipermail accessible from https://lists.wikimedia.org ? so I can fix the redirects to only https.
[18:48:45] err /pipermail and /mailman
[18:49:41] jzerebecki: /rss/ /images/ /mbox/
[18:49:59] ah yea the aliases
[18:52:04] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: Fetching origin
[18:52:18] icinga-wm: fix strontium :p
[18:53:37] (PS3) Dzahn: wikitech apache erb - qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/145493
[18:53:54] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Last successful Puppet run was Fri 11 Jul 2014 16:53:30 UTC
[18:59:05] (PS1) Dzahn: wikitech-make ServerAlias configurable as well [operations/puppet] - https://gerrit.wikimedia.org/r/145610
[19:00:11] (PS2) Dzahn: wikitech-make ServerAlias configurable as well [operations/puppet] - https://gerrit.wikimedia.org/r/145610
[19:01:51] (PS1) MarkTraceur: Remove MMV on private wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145612
[19:03:06] ottomata: ssh: connect to host analytics1004 port 22: Connection refused
[19:03:10] known?
[19:05:47] (03PS1) 10JanZerebecki: Make lists.wikimedia.org HTTPS only [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 [19:06:52] (03CR) 10Gergő Tisza: "Not sure if there is a use case for having MMV on private wikis, but the bug could be fixed by turning off thumbnail guessing instead." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145612 (owner: 10MarkTraceur) [19:07:19] oh i rebooted it a while ago, trying to repurpose it, meant to come back to it [19:07:24] mutante: not known, thanks, on it [19:07:29] should have logged that [19:08:19] cscott: hey! did you try labs-vagrant? [19:08:31] nope, ran out of time yesterday [19:08:46] hey Rob [19:09:12] (03CR) 10Gergő Tisza: "Well, probably could be fixed. Do we have a bugzilla ticket for that?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145612 (owner: 10MarkTraceur) [19:09:16] (03CR) 10JanZerebecki: "Needs: Change-Id: I3511f4b0d0185d1e4d35166c13f2104c7805f737" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [19:09:52] cscott: alright :) [19:10:09] RobH: when i comment on a ticket you assign me do you receive any type of notification? 
[19:10:49] everyone in ops gets every single rt ticket, so yes, but its in a lot of others [19:11:00] if you need me to do something its usually best to assign the ticket to me [19:11:07] ok [19:11:22] that way it shows up on my main page in my list of shit assgined [19:11:28] i may still miss it, but its less likely =] [19:11:36] ok [19:12:00] i just receive the MTP [19:12:10] and the 1 U rack mount [19:12:19] and update the ticket [19:12:27] also i have 10 HP server on site [19:12:49] papaul: you didnt assign the ticket back to me though [19:12:54] so in the directions in https://rt.wikimedia.org/Ticket/Display.html?id=7454 [19:12:58] it was just a update [19:13:29] ok, just make sure the mtp ticket gets assigned back to me in rt so i track the other part of the order [19:13:33] so you one me to assign back the ticket to you every time i do an update? [19:13:40] PROBLEM - Host analytics1004 is DOWN: PING CRITICAL - Packet loss = 100% [19:13:43] the ticket https://rt.wikimedia.org/Ticket/Display.html?id=7454 has instructions on that [19:13:58] not every ticket, i've been trying to put in the ticket if it resolves or goes back to me each time [19:14:14] ok [19:14:18] when the mtp/mpo ticket is on the final shipment [19:14:28] then it'll be 'recive this in, update ticket if its all here, and resolve' [19:14:34] ok [19:14:37] but since it needs more work on my part, it goes back to me [19:14:51] i will assign it back to you than [19:14:53] cuz yea, every opsen gets every single RT ticket [19:15:12] I read every ticket that happens during my rt duty week, and i read all procurements [19:15:34] but on normal week its more a skim than review of all the RT stuff, so I miss things [19:15:39] so when i received the second part what do i do since you will have the ticket [19:15:52] (03PS1) 10JanZerebecki: dumps: Remove css background image that is 404 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) [19:16:18] So I'll go 
over this in a larger view than your question, but just so you get the process [19:16:33] quick answer: you'll get the ticket assgined back to you before it arrives on iste [19:16:35] long answer: [19:16:51] when you (or chris, or anyone) asks for something in rt it goes like this [19:17:37] request (whoever) > review and cost/quote generation (rob) > purchase approvals (mark) > ordering (rob) > shipment monitoring (rob) > open inbound shipment ticket when ships (rob) [19:17:47] then i assign it to the onsite (you, chris, gage) [19:17:49] for receipt [19:18:05] (03CR) 10Chad: "Why not fix it to the correct location?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [19:18:16] ok [19:18:17] Since 99.99% of vendors email whoever places the order when it ships [19:18:32] its just easier on everyone if i put in the shipment tickets, seems silly to put in a ticket for you to open a ticket, ya know? [19:18:43] RECOVERY - Disk space on analytics1004 is OK: DISK OK [19:18:43] RECOVERY - check if dhclient is running on analytics1004 is OK: PROCS OK: 0 processes with command name dhclient [19:18:43] RECOVERY - SSH on analytics1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [19:18:43] RECOVERY - puppet disabled on analytics1004 is OK: OK [19:18:50] RECOVERY - check configured eth on analytics1004 is OK: NRPE: Unable to read output [19:18:50] RECOVERY - Host analytics1004 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [19:18:50] RECOVERY - Puppet freshness on analytics1004 is OK: puppet ran at Fri Jul 11 19:18:48 UTC 2014 [19:18:50] ok got it [19:19:01] RECOVERY - DPKG on analytics1004 is OK: All packages OK [19:19:01] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [19:19:05] now on the HP servers how are we going to work it out [19:19:15] i told them to hold on it for now [19:19:32] RECOVERY - puppet last run on analytics1004 is OK: OK: 
Puppet is currently enabled, last run 28 seconds ago with 0 failures [19:19:32] wanted to discuss that with you first [19:19:55] sorry, phone call, brb [19:20:11] no problem [19:20:12] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 11 Jul 2014 17:19:56 UTC [19:22:42] (03CR) 10JanZerebecki: "https://bugzilla.wikimedia.org/show_bug.cgi?id=58292#c5 suggested that. I also didn't see a need for it. Do you have a background URL you " [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [19:23:19] (03PS1) 10Ori.livneh: mediawiki: use apache module [operations/puppet] - 10https://gerrit.wikimedia.org/r/145620 [19:26:06] matanya: you have RT access, right? [19:26:16] (03PS2) 10Ori.livneh: mediawiki: move SSHD nice override from web.pp to init.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/145510 [19:26:18] (03PS2) 10Ori.livneh: mediawiki: use apache module [operations/puppet] - 10https://gerrit.wikimedia.org/r/145620 [19:32:59] papaul: Ok, sorry about that. What HP order, its already there? [19:33:05] I just got notice its shipping... [19:33:20] Or do you mean on where to rack it? [19:33:24] (preplanning) [19:33:41] I planned to put in a RT ticket with details on where to rack the 10 database servers once they arrive. [19:34:07] but yea, thats monday [19:34:54] yes [19:34:57] 10 servvers [19:35:47] (03CR) 10Dzahn: Make lists.wikimedia.org HTTPS only (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (owner: 10JanZerebecki) [19:36:22] (03CR) 10Hashar: [C: 031] "I agree there is no point in keeping the Monobook era image." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [19:36:23] So I want to run this server placement by Chris (we tend to check one anothers work on this kind of thing) [19:36:32] (03CR) 10Hashar: "Oh else we could use http://bits.wikimedia.org/static-current/skins/MonoBook/headbg.jpg" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [19:36:34] RobH: [19:36:37] but I expect we'll finish populating b8, and then place some in c6 [19:36:42] 5 in each [19:36:50] ok no problem [19:36:53] but not sure, lemme see port counts.. or b8 [19:36:57] will wait on you and chris [19:37:01] son that [19:37:15] yea, 14 of the outlet pairs used, should be able to fit half of the order there [19:37:32] we seem to have the 6th and 8th racks in the later rows for storage and databases [19:37:46] we may instead put all 10 in c6 but meh, i dont like that much [19:37:53] i rather diversify placement when possible. [19:38:01] (but we plan to also order more of these so meh) [19:38:14] why that ? 
[19:38:24] any reason of diversification [19:38:32] just best practices [19:38:43] these are new db servers, so likely best for like, enwiki db use [19:38:46] (03CR) 10Dzahn: Make lists.wikimedia.org HTTPS only (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (owner: 10JanZerebecki) [19:38:56] we dont want any service to have all its servers in a single rack [19:39:11] So i prefer to spread new database orders out across multiple database racks [19:39:26] ok just keep me updated on way those servers will go [19:39:28] but in this case, this 10 is the first of three orders [19:39:43] are those like testing servers [19:39:52] before you received the other order [19:39:54] nope, was meant to hit last fiscal year budget [19:40:00] hence the lesser number and quick order [19:40:04] ok [19:40:09] but approvals process took it longer than expected so it shows now, heh [19:40:25] so we will be having HP's and Dell on the DC [19:40:26] cool [19:40:33] so if we don't have an answer on where to rack it on monday when they arrive [19:40:33] in the D [19:40:43] you can just have them store it in shipping until we do [19:40:49] that way you only have to unpack and move them once =] [19:40:53] they are already here sir [19:40:58] oh, wha? [19:41:01] bahhhhh [19:41:06] that was faster than expected [19:41:10] yes that what i was telling you [19:41:15] ok, cool, wasnt sure [19:41:26] i told Cyrusone to hold on it [19:41:27] So yea, have them keep it there, I'll email about this (and RT) [19:41:30] onto monday [19:41:32] excellent [19:41:55] oh, we need to have an accurate database count before we name these, heh [19:42:01] Coren: what would be next steps for the labs graphite project? 
[19:42:10] papaul: i'll be dropping a ticket soon to relabel the databases in the racks there [19:42:15] just fyi [19:42:18] ok [19:44:04] RobH:since i have already the cable management spool and the 1u Rack cassette i will like also for you to tell me where you want those to go [19:44:24] yep, im just waiting on the fiber optics with the order to show up [19:44:33] but i'll get that to you soon [19:44:34] ok cool [19:44:37] YuviPanda: Well, if we settled on the hardware, just poking RobH should do it. [19:44:55] Besides, poking RobH is fun! *poke* [19:45:13] (03CR) 10Dzahn: [C: 031] Make lists.wikimedia.org HTTPS only (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (owner: 10JanZerebecki) [19:45:37] Coren: ah, hmm. thoughts? I'm going to go with whatever you pick from https://wikitech.wikimedia.org/wiki/Server_Spares [19:45:45] *poke*mon [19:46:07] Coren: Dell PowerEdge R420, dual Intel Xeon E5-2450 v2 2.50GHz, 64GB Memory, (4) 3TB Disks perhaps? we can stripe heavily. [19:46:21] and won't have to worry about write wear as with SSDs [19:46:35] YuviPanda: Honestly I'm a bit dubious the SSDs are a good idea; there /is/ a maximum number of write cycle for those and a graphite instance is going to make, literally, millions of small writes. [19:47:05] Coren: true, I think I agree there. [19:47:29] (03CR) 10Dzahn: [C: 032] delete mobile.wp.org Apache config [operations/puppet] - 10https://gerrit.wikimedia.org/r/145498 (owner: 10Dzahn) [19:48:16] RobH: How complicated would it be to grab one of those 2x500G servers and stuff and extra 2x500G in it? Does it even have the bays? I don't like wasting resources and the 4x3T seem like overkill. 
[19:48:17] (03CR) 10Dzahn: "nowadays: templates/varnish/mobile-frontend.inc.vcl.erb: /* Support the old mobile.wikipedia.org wap gateway */" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145498 (owner: 10Dzahn) [19:48:23] RECOVERY - Unmerged changes on repository puppet on strontium is OK: Fetching origin [19:48:57] Coren: yeah, 4x500G seems ideal, assuming we get 1T usable space for OS + data. [19:49:21] Oh, hm, nevermind. The R420s obviously have the bays since there is one with 4x3T. :-) [19:49:21] Coren: its easier to use the 4 3tb. [19:49:30] harder to add disks, and we dont have 500 [19:49:34] we add 1tb disks now [19:49:34] Ah. [19:49:35] so yea [19:49:43] just use the one formatted for 4 disks is much easier. [19:49:57] as adding means checking if there are disk cables, drive caddies, crap like that [19:51:20] (03CR) 10Dzahn: [C: 032] wikitech apache erb - qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/145493 (owner: 10Dzahn) [19:51:21] the 3TB sata picked for those misc systems is cuz thats the sweet spot on those particular disk models for storage space [19:51:31] that made sense to order for spare, so dont feel bad ;] [19:51:43] you have redunancy is all, heh [19:51:46] redundancy even [19:51:46] * YuviPanda feels less bad about taking away disks from things that may need them :) [19:52:13] Coren: so, 4x3T then? [19:52:13] or we can add two 1TB disks but then they arent all matching for like raid5/6 or whatevs [19:52:26] but easier on folks to use the existing spec if it works and its approved [19:52:34] (not really a wrong answer, heh) [19:53:13] YuviPanda: Yeah, I just commented on the procurement ticket that the 4x3T is futureproof and works with an existing config. Besides, it's not necessarily a bad thing that we get to keep more historical data if we want. 
[19:53:21] Coren: \o/ [19:58:05] the beta cluster seems to have broke [19:58:24] it won't let me log in because "i have cookies disabled" (i don't) [19:59:43] papaul: so https://rt.wikimedia.org/Ticket/Display.html?id=7859 is a new ticket and the first of many 'rename server' tickets that you'll be getting from me [20:00:03] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Fri Jul 11 19:59:59 UTC 2014 [20:00:04] but they will take a bit of time on each, so yay, more work ;] [20:00:48] we're goign to be renaming and standardizing these hostnames from their old and incorrect hostnames to new ones [20:01:32] RobH:ok [20:01:57] I had to figure out what those would be renamed before i could name and figure out where the new db boxes are going to go =] [20:10:00] heh, so the config for wikitech itself has "if realm == 'labs'" [20:10:02] (03CR) 10JanZerebecki: [C: 031] wikitech-make ServerAlias configurable as well [operations/puppet] - 10https://gerrit.wikimedia.org/r/145610 (owner: 10Dzahn) [20:10:25] ow [20:10:53] ah, it's "labs labs" :) [20:10:59] # Add additional wikis for development$ [20:11:39] yargghh [20:11:56] RobH: I don't suppose you ever made any progress with analytics1009, did you? 
[20:12:45] so it is sending and receiving dhcp request [20:12:45] s [20:12:59] but when it sends the pxe tftp request i never see it hit carbon [20:13:14] So there is something wrong there, since I see carbon serve requests to other subnets/vlans [20:13:34] not sure if its network or install server config, hadn't gotten that far [20:13:44] the 10 minute reboot cycle to test things gets tiresome ;_; [20:14:06] (03PS2) 10Dzahn: update SSL cipher list on wikitech to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/144736 [20:15:49] (03CR) 10Dzahn: [C: 032] "PS2 was rebase only" [operations/puppet] - 10https://gerrit.wikimedia.org/r/144736 (owner: 10Dzahn) [20:20:59] (03CR) 10Dzahn: [C: 032] "merging, better no image than a broken attempt to load something that doesnt exist" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [20:21:58] (03CR) 10Chad: [C: 04-1] "Disagree. It's just as freaking easy to point to the right place." [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [20:22:50] (03CR) 10Dzahn: "but not the http:// but a https:// link then.." [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [20:23:34] ^d: https://bugzilla.wikimedia.org/show_bug.cgi?id=58292#c5 [20:23:36] (03PS1) 10Ottomata: Add new 14 DataNode to Hadoop puppetization [operations/puppet] - 10https://gerrit.wikimedia.org/r/145675 [20:24:00] RobH: heads-up - we are nearing the time where we need the final dump (tuesday is likely). should i already put in a note at https://rt.wikimedia.org/Ticket/Display.html?id=5288 ? [20:24:17] ^d: adding it now is a change because it never worked, heh [20:24:44] <^d> It worked. 
[20:24:44] (03PS2) 10Ottomata: Add new 14 DataNode to Hadoop puppetization [operations/puppet] - 10https://gerrit.wikimedia.org/r/145675 [20:24:44] <^d> Once upon a time. [20:24:50] yeah, RobH, for me too [20:24:55] reboot cycle is pretty annoying [20:25:07] so, i am repurposing analytics1004 as standby namenode then [20:25:21] the only problem is, analytics1009 was moved to a separate row for redundancy purposes [20:25:30] analytics1004 is in the same row as analytics1010 (primary namenode) [20:25:39] sooo, i guess we should get chris to swap them :/ [20:25:51] i'm going to go ahead and install things as is, we should be able to move it ok later [20:25:53] (03CR) 10jenkins-bot: [V: 04-1] Add new 14 DataNode to Hadoop puppetization [operations/puppet] - 10https://gerrit.wikimedia.org/r/145675 (owner: 10Ottomata) [20:25:56] <^d> mutante: https://bits.wikimedia.org/skins/MonoBook/headbg.jpg is the correct url. [20:25:59] <^d> Amending now. [20:26:11] ^d: ok [20:26:58] (03PS3) 10Ottomata: Add new 14 DataNode to Hadoop puppetization [operations/puppet] - 10https://gerrit.wikimedia.org/r/145675 [20:29:28] (03PS4) 10Ottomata: Add new 14 DataNode to Hadoop puppetization [operations/puppet] - 10https://gerrit.wikimedia.org/r/145675 [20:29:37] (03CR) 10Ottomata: [C: 032 V: 032] Add new 14 DataNode to Hadoop puppetization [operations/puppet] - 10https://gerrit.wikimedia.org/r/145675 (owner: 10Ottomata) [20:29:57] (03PS2) 10Chad: dumps: Point CSS background image to correct location [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [20:30:14] mutante: [20:30:16] this ok to merge? [20:30:16] pdate SSL cipher list on wikitech to support PFS (a8534a0) [20:30:16] ottomata: i merged [20:30:19] haha, ok [20:30:22] heh, :) [20:30:28] both are in now [20:30:44] grrrit-wm: did you miss one? 
[20:30:55] cool [20:33:32] !log wikitech - graceful apache for ssl cipher list change [20:33:38] Logged the message, Master [20:35:59] (03PS1) 10Ottomata: Remove duplicate parameter in hadoop.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/145676 [20:36:08] (03PS2) 10Ottomata: Remove duplicate parameter in hadoop.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/145676 [20:36:48] (03CR) 10Ottomata: [C: 032 V: 032] Remove duplicate parameter in hadoop.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/145676 (owner: 10Ottomata) [20:37:18] Coren, andrewbogott: could one of you increase the instance (or CPU) and public IP quota of the deployment-prep project by a generous amount? (say, 4). We're going to start migrating app servers in beta from precise to trusty and there will be an interim period when we'll have an instance of each type. [20:39:13] (03PS1) 10Ottomata: Remove references to cdh4 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145677 [20:39:21] ori: woot for trust migrations! [20:39:39] <^d> trust migrations? [20:39:43] <^d> Are those like trust falls? [20:39:43] gah [20:39:44] trusty [20:40:14] (03CR) 10Ottomata: [C: 032 V: 032] Remove references to cdh4 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145677 (owner: 10Ottomata) [20:40:40] ori: Yep, one moment... 
[20:41:38] reminds me; I should probably debug why I can't connect to delopment-prep [20:41:59] PROBLEM - RAID on analytics1011 is CRITICAL: Connection refused by host [20:42:09] PROBLEM - check configured eth on analytics1011 is CRITICAL: Connection refused by host [20:42:19] PROBLEM - check if dhclient is running on analytics1011 is CRITICAL: Connection refused by host [20:42:19] PROBLEM - puppet disabled on analytics1011 is CRITICAL: Connection refused by host [20:42:29] PROBLEM - puppet last run on analytics1011 is CRITICAL: Connection refused by host [20:42:29] PROBLEM - DPKG on analytics1011 is CRITICAL: Connection refused by host [20:42:39] PROBLEM - Disk space on analytics1011 is CRITICAL: Connection refused by host [20:42:39] PROBLEM - Hadoop DataNode on analytics1011 is CRITICAL: Connection refused by host [20:42:49] PROBLEM - Hadoop JournalNode on analytics1011 is CRITICAL: Connection refused by host [20:42:49] PROBLEM - Hadoop NodeManager on analytics1011 is CRITICAL: Connection refused by host [20:49:49] RECOVERY - Hadoop JournalNode on analytics1011 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.qjournal.server.JournalNode [20:49:49] RECOVERY - Hadoop NodeManager on analytics1011 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [20:49:59] RECOVERY - RAID on analytics1011 is OK: OK: no disks configured for RAID [20:50:09] RECOVERY - check configured eth on analytics1011 is OK: NRPE: Unable to read output [20:50:19] RECOVERY - check if dhclient is running on analytics1011 is OK: PROCS OK: 0 processes with command name dhclient [20:50:19] RECOVERY - puppet disabled on analytics1011 is OK: OK [20:50:29] RECOVERY - DPKG on analytics1011 is OK: All packages OK [20:50:36] (03PS1) 10Jgreen: train_spamassassin su '-s /bin/sh' b/c user no longer has default shell [operations/puppet] - 10https://gerrit.wikimedia.org/r/145681 [20:50:39] RECOVERY - Disk space on 
analytics1011 is OK: DISK OK [20:50:39] RECOVERY - Hadoop DataNode on analytics1011 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [20:51:06] !log bringing up some hadoop journalnodes (and datanodes) [20:51:12] Logged the message, Master [20:51:29] RECOVERY - puppet last run on analytics1011 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [20:54:11] (03PS1) 10Ottomata: Only include analytics-users on NameNodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/145685 [20:54:59] PROBLEM - NTP on analytics1011 is CRITICAL: NTP CRITICAL: Offset unknown [20:55:08] (03CR) 10Jgreen: [C: 032 V: 031] train_spamassassin su '-s /bin/sh' b/c user no longer has default shell [operations/puppet] - 10https://gerrit.wikimedia.org/r/145681 (owner: 10Jgreen) [20:55:59] (03PS1) 10Ori.livneh: SSL cipherlist: drop DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145686 [20:56:29] PROBLEM - puppet last run on analytics1004 is CRITICAL: CRITICAL: Puppet has 1 failures [20:56:33] (03PS2) 10Ottomata: Only include analytics-users on NameNodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/145685 [20:56:57] (03PS3) 10Ottomata: Only include analytics-users on NameNodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/145685 [20:57:02] (03CR) 10Ottomata: [C: 032 V: 032] Only include analytics-users on NameNodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/145685 (owner: 10Ottomata) [20:57:22] !log disabling puppet on analytics1004 (AGH!) [20:57:28] Logged the message, Master [20:59:15] andrewbogott: ack? 
[20:59:22] ori: done [20:59:31] andrewbogott: awesome, many thanks [21:01:27] (03PS1) 10Ottomata: analytics1004 is the new standby NameNode [operations/puppet] - 10https://gerrit.wikimedia.org/r/145687 [21:01:36] (03CR) 10Ottomata: [C: 032 V: 032] analytics1004 is the new standby NameNode [operations/puppet] - 10https://gerrit.wikimedia.org/r/145687 (owner: 10Ottomata) [21:02:04] PROBLEM - Hadoop Namenode - Stand By on analytics1004 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode [21:05:12] grr, i didn't want that to run yet...hm its ok [21:06:54] RECOVERY - NTP on analytics1011 is OK: NTP OK: Offset -0.02678930759 secs [21:06:59] (03PS1) 10Dzahn: remove SSL cipher DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145688 [21:07:39] (03PS2) 10Dzahn: remove SSL cipher DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145688 [21:07:41] (03PS2) 10Ori.livneh: SSL cipherlist: drop DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145686 [21:10:57] jgage: you there? [21:14:04] (03PS3) 10Dzahn: remove SSL cipher DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145688 [21:15:16] (03Abandoned) 10Ori.livneh: SSL cipherlist: drop DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - 10https://gerrit.wikimedia.org/r/145686 (owner: 10Ori.livneh) [21:31:53] !log Reloading Zuul to deploy config change I993eba5ab7b70f924a2b925fea7c196db27c4cc3 [21:31:55] ori: ^ [21:31:57] Logged the message, Master [21:32:08] Krinkle: woot, thanks very much [21:32:19] AaronSchulz: ^^^ that means jenkins will V+2 / merge jobrunner changes [21:32:33] \o/ [21:33:39] Krinkle: can we get it to CR+2 too? ;) [21:33:49] hehe [21:34:18] ori: if tehre's any issue, the first thing to suspect is the ACL, ensure JenkinsBot has the right to V+2 and to merge. 
[21:34:43] (group= JenkinsBot, user=jenkins-bot) [21:37:41] (03CR) 10Dzahn: [C: 032] dumps: Point CSS background image to correct location [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [21:39:29] PROBLEM - puppet last run on analytics1004 is CRITICAL: CRITICAL: Puppet has 1 failures [21:40:59] RECOVERY - Hadoop Namenode - Stand By on analytics1004 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.namenode.NameNode [21:41:29] PROBLEM - Hadoop HistoryServer on analytics1010 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer [21:41:39] PROBLEM - Hadoop NameNode Primary Is Active on analytics1010 is CRITICAL: Hadoop.NameNode.FSNamesystem.tag_HAState CRITICAL: standby [21:41:49] PROBLEM - Hadoop ResourceManager on analytics1010 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.resourcemanager.ResourceManager [21:42:19] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: Puppet has 1 failures [21:44:29] RECOVERY - puppet last run on analytics1004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:45:39] RECOVERY - Hadoop NameNode Primary Is Active on analytics1010 is OK: Hadoop.NameNode.FSNamesystem.tag_HAState OKAY: active [21:45:43] (03CR) 10Dzahn: "got a feeling this is another change to dumps HTML that does not actually get applied? 
because all i see is Snapshot::Dumps::Templates/Fil" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [21:47:29] RECOVERY - Hadoop HistoryServer on analytics1010 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer [21:47:49] RECOVERY - Hadoop ResourceManager on analytics1010 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.resourcemanager.ResourceManager [21:48:19] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:50:02] (03CR) 10Dzahn: "yep, this changes templates on snapshot hosts, and for example the change is applied here:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145619 (https://bugzilla.wikimedia.org/58292) (owner: 10JanZerebecki) [21:52:18] ^d: meh, that change doesnt actually change stuff..:p [21:52:23] see comment [21:52:36] we gotta ping apergos later [21:52:47] <^d> aw, ok. [21:54:45] (03CR) 10CSteipp: [C: 031] "I'm fairly sure that was an oversight." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/145688 (owner: 10Dzahn) [22:00:38] (03PS2) 10Awight: Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141607 [22:00:40] (03PS1) 10Awight: Enable FundraisingTranslateWorkflow on testwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145703 [22:01:22] (03CR) 10Awight: Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141607 (owner: 10Awight) [22:01:39] (03CR) 10Awight: [C: 04-2] "Needs testing before deployment" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141607 (owner: 10Awight) [22:03:11] i was trying to save an edit to mediawiki.org and got [13c5580b] 2014-07-11 22:02:52: Fatal exception of type MWException [22:03:26] (03CR) 10jenkins-bot: [V: 04-1] Enable FundraisingTranslateWorkflow on testwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145703 (owner: 10Awight) [22:03:28] (03CR) 10jenkins-bot: [V: 04-1] Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141607 (owner: 10Awight) [22:08:18] (03CR) 10Dzahn: "thanks, i saw this triggered https://rt.wikimedia.org/Ticket/Display.html?id=7858" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145499 (owner: 10Dzahn) [22:20:07] (03PS2) 10Awight: Enable FundraisingTranslateWorkflow on the beta cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145703 [22:23:45] (03CR) 10Dzahn: [C: 031] Need oxygen access to get at lsearchd logs [operations/puppet] - 10https://gerrit.wikimedia.org/r/145054 (owner: 10Chad) [22:26:34] (03PS1) 10Awight: Enable FundraisingTranslateWorkflow on the beta cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145710 [22:27:29] (03Abandoned) 10Awight: Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/141607 (owner: 10Awight) [22:28:49] (03PS3) 10Awight: Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145703 [22:29:07] (03CR) 10Awight: [C: 04-2] "Do not merge until tested on the beta cluster." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145703 (owner: 10Awight) [22:30:36] (03CR) 10Awight: [C: 032] Enable FundraisingTranslateWorkflow on the beta cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145710 (owner: 10Awight) [22:31:54] (03PS1) 10Dzahn: kafka process monitoring: make it send pages [operations/puppet] - 10https://gerrit.wikimedia.org/r/145711 [22:31:58] !log jenkins/gallium's weekly w(h)ine hour is here. [22:32:02] Logged the message, Master [22:32:53] (03CR) 10Dzahn: [C: 031] "https://wikitech.wikimedia.org/wiki/Incident_documentation/20140608-Kafka#Actionables" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145711 (owner: 10Dzahn) [22:33:11] !log Pooled/depooled Jenkins slave on gallium [22:33:14] !log Restarting Jenkins [22:33:16] Logged the message, Master [22:33:24] Logged the message, Master [22:35:03] jenkins is locked again [22:44:41] !log upgraded libssl on wtp* [22:44:46] Logged the message, Master [22:45:38] any chance someone could look up some gerrit.wikimedia.org ssh auth logs for me? my public key stopped working overnight [22:45:55] ssh -v says the server accepts the key, but the refuses to authenticate [22:53:34] ^d: do we have those? that's not openssh but gerrit itself on the high port [22:54:22] tgr: you mean ssh to 29418, right [22:54:55] mutante: ssh -vp 29418 tgr@gerrit.wikimedia.org [22:55:06] which gives [22:55:17] debug1: Server accepts key: pkalg ssh-rsa blen 149 [22:55:17] Received disconnect from 208.80.154.81: 2: Too may authentication failures [22:57:11] works for me with my key, not sure where the logs are if any [22:59:09] tgr: ah, i found something.. try again? 
[22:59:14] watching a log [22:59:35] mutante: tried, ip is 198.73.209.4 [22:59:36] tgr: tgr a/389 LOGIN FROM ... [22:59:45] [2014-07-11 22:59:21,064 +0000] ac2c0a4d tgr a/389 LOGIN FROM 198.73.209.1 [22:59:48] [2014-07-11 22:59:21,141 +0000] ac2c0a4d tgr a/389 LOGOUT [23:01:09] eh.. yea.. that's all i see ..LOGIN and LOGOUT [23:01:17] not very informative :) [23:01:33] I guess the low level SSH info is not logged then? [23:01:44] i see anoher login from [23:01:48] jforrester [23:01:54] from the IP you said [23:01:59] heh, is that office? [23:02:15] 198.73.209.4 is the office, yeah. [23:02:16] yes, some sort of proxy I assume [23:02:23] Or whatever. [23:02:38] yea, no, this is called "sshd_log" [23:03:26] guess I am out of luck then [23:03:36] I'll try to rotate the key [23:04:06] tgr: is it still showing up in gerrit ui preferences? [23:04:14] multiple keys? [23:04:43] didn't check, but the same setup worked a day ago and I haven't changed anything [23:04:50] one thing is already odd in any case.. you get "refused" and i see "LOGIN"..duh? [23:05:04] also, according to the local ssh log the server does accept the key [23:05:11] just refuses the connection afterwards [23:05:54] hmmmm [23:06:55] well, it's normal that it closes on you [23:06:55] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 8 below the confidence bounds [23:07:01] but you should see this: [23:07:06] Hi Dzahn, you have successfully connected over SSH. [23:07:13] .. [23:07:14] Connection to gerrit.wikimedia.org closed. [23:08:55] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [23:15:18] tgr: i went to the backend db of gerrit.. i see you have 2 keys in there [23:15:55] mutante: sorry, I got distracted [23:16:03] | ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAIEAmOYO1PzKx4B1dPWBLrM492... [23:16:12] | ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAtfXhY5kcZneQPdq5jUZA... 
[23:16:26] gerrit handles it as an authentication failure, I get the same error about authentication failures when I use git-review
[23:16:47] did you mean to have 2 keys?
[23:16:52] I have two keys, yes
[23:17:11] haven't changed that since last year
[23:17:48] I can get rid of one of them, but I used the exact same setup just a day ago and it worked
[23:18:13] uhm.. sure there are no changes on the local computer since yesterday? i dont think gerrit changed either
[23:18:25] running out of ideas soon
[23:18:55] the keyfile has a modified date of 2013
[23:19:19] can you try connecting with
[23:19:26] ssh -i /full/path/to/private/key ?
[23:19:27] not sure what else could be relevant, ssh finds the key, offers it, the server recognizes it, from that point local settings should not play any role
[23:19:31] and not use any agent?
[23:20:07] I tried ssh-agent -k; ssh -i .... but that still sent all my keys
[23:20:08] hrrmm, i see no reason for that on the server side
[23:20:28] not sure how to kill the agent for good
[23:20:37] except it triggered some kind of bruteforce protection
[23:21:25] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[23:22:16] you are probably right
[23:22:17] there is a column "inactive" in the accounts table, but you are "N" inactive
[23:22:32] I deleted all other local keys, and that did the trick
[23:22:37] oooh :)
[23:22:52] so works? cool
[23:23:04] ssh offered six keys by default, and the gerrit key happened to be the last
[23:23:15] so maybe that's causing problems
[23:23:16] oh, hah, i bet 5 is the limit
[23:23:21] <^d> mutante: Yes, ssh auth logs exist.
[23:23:30] after that "too many auth failures"
[23:23:33] from the first 5
[23:23:36] <^d> in /var/lib/gerrit2/review_site/logs/
[23:23:37] ^d: yep, found it meanwhile
[23:23:52] still not something that I changed recently, though
[23:23:52] ^d: thanks, looks solved
[23:24:07] <^d> We *did* just upgrade gerrit on Monday.
[23:24:12] <^d> Was trying to fix a mina sshd bug.
[23:24:18] <^d> Take that for what it's worth.
[23:24:34] I could still connect yesterday
[23:24:45] maybe the order it sends the keys changes?
[23:24:49] and you were lucky before
[23:25:15] hm, ubuntu did pull a bunch of updates yesterday, might have included something ssh-related
[23:25:36] I can probably make sure to add my gerrit key to the agent first
[23:25:41] thanks for the help!
[23:27:44] awight: https://gerrit.wikimedia.org/r/#/c/145710/ Please re-submit if you still want it to be merged. Jenkins was restarted so the queue was cleared.
[23:27:48] I generally do this for people after a reboot, but prefer not to touch that repo when I'm not deploying
[23:28:01] Just add +2 again :)
[23:28:27] Krinkle: rad, thanks for the ping. I'll also deploy the noop so it's not stale.
[23:28:41] (CR) Awight: [V: 2] Enable FundraisingTranslateWorkflow on the beta cluster [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145710 (owner: Awight)
[23:28:53] (CR) Awight: [C: 2] Enable FundraisingTranslateWorkflow on the beta cluster [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145710 (owner: Awight)
[23:29:36] (Merged) jenkins-bot: Enable FundraisingTranslateWorkflow on the beta cluster [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145710 (owner: Awight)
[23:34:51] !log awight updated /a/common to {{Gerrit|I862a4afed}}: Fixup highlightTest.php
[23:34:56] Logged the message, Master
[23:36:37] !log awight Synchronized wmf-config: Deploying FundraisingTranslateWorkflow to labs (duration: 00m 05s)
[23:36:42] Logged the message, Master
[23:40:57] (PS1) Awight: typo in FundraisingTranslateWorkflow config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145722
[23:41:11] (CR) Awight: [C: 2] typo in FundraisingTranslateWorkflow config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145722 (owner: Awight)
[23:41:17] (Merged) jenkins-bot: typo in FundraisingTranslateWorkflow config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145722 (owner: Awight)
[23:44:26] !log awight Synchronized wmf-config: Deploying FundraisingTranslateWorkflow to labs (take 2) (duration: 00m 04s)
[23:44:31] Logged the message, Master
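
Note on the gerrit SSH failure diagnosed around 23:22-23:23 in this log: the working theory in the channel was that ssh offered six keys, the gerrit key happened to be last, and the server disconnected after roughly five failed attempts. A minimal Python sketch of that theory follows. The function name, the key labels, and the limit of 5 are illustrative assumptions (the limit was only a guess in the discussion), and the model deliberately ignores SSH protocol details.

```python
def first_accepted_attempt(offered_keys, authorized_keys, max_attempts):
    """Return the 1-based attempt on which a key is accepted, or None if
    the (hypothetical) server-side attempt limit is hit first."""
    for attempt, key in enumerate(offered_keys, start=1):
        if attempt > max_attempts:
            # Modelled as the server disconnecting with an
            # "authentication failures" error before this key is tried.
            return None
        if key in authorized_keys:
            return attempt
    return None


# Hypothetical key labels; only "gerrit" is registered on the server.
authorized = {"gerrit"}
gerrit_last = ["k1", "k2", "k3", "k4", "k5", "gerrit"]
gerrit_first = ["gerrit", "k1", "k2", "k3", "k4", "k5"]

print(first_accepted_attempt(gerrit_last, authorized, max_attempts=5))   # None
print(first_accepted_attempt(gerrit_first, authorized, max_attempts=5))  # 1
```

If the theory holds, offering the gerrit key first, or restricting ssh to a single key with OpenSSH's `IdentitiesOnly yes` option alongside `-i`, would avoid the disconnect.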