[00:03:46] (03PS1) 10Reedy: Update wikivoyage logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85145 [00:05:14] (03CR) 10Reedy: [C: 032] Update wikivoyage logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85145 (owner: 10Reedy) [00:05:37] (03Merged) 10jenkins-bot: Update wikivoyage logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85145 (owner: 10Reedy) [00:06:27] !log reedy synchronized images/sul/wikivoyage.png [00:06:30] Logged the message, Master [00:09:46] (03CR) 10Akosiaris: "The change is more of an argument to try and figure out what kind of service we want to provide to users. Allowing both is the bad choice " [operations/puppet] - 10https://gerrit.wikimedia.org/r/84873 (owner: 10Akosiaris) [00:17:33] Logged the message, Master [00:18:02] heya akosiaris, i want to install the consumer.properties and producer.properties files in the kafka deb under different names [00:18:03] like [00:18:08] consumer.properties.example [00:18:12] or something [00:18:41] currently i'm installing them with debian/kafka.install [00:18:48] but I can't rename files with that [00:18:54] I could use your makefile [00:19:01] and use cp instead of $(INSTALL) [00:24:57] (03PS1) 10Ryan Lane: Add missing config for test/testrepo [operations/puppet] - 10https://gerrit.wikimedia.org/r/85147 [00:26:04] (03CR) 10Ryan Lane: [C: 032] Add missing config for test/testrepo [operations/puppet] - 10https://gerrit.wikimedia.org/r/85147 (owner: 10Ryan Lane) [00:27:24] (03PS1) 10Ottomata: Refactoring kafka::server and adding kafka::mirror support. [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/85148 [00:29:50] (03PS2) 10Ottomata: Refactoring kafka::server and adding kafka::mirror support. 
[operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/85148 [00:38:44] PROBLEM - Puppet freshness on sq45 is CRITICAL: No successful Puppet run in the last 10 hours [00:44:26] (03CR) 10CSteipp: "https, with no user authentication, provides:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84873 (owner: 10Akosiaris) [00:46:33] !log rebooting sq45 [00:46:37] Logged the message, Master [00:50:14] PROBLEM - Host sq45 is DOWN: PING CRITICAL - Packet loss = 100% [01:39:09] (03PS1) 10Reedy: Expand MWMultiversion test cases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85162 [01:43:57] (03PS2) 10Reedy: Expand MWMultiversion test cases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85162 [01:45:31] (03CR) 10Reedy: [C: 032] Expand MWMultiversion test cases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85162 (owner: 10Reedy) [01:45:48] (03Merged) 10jenkins-bot: Expand MWMultiversion test cases [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85162 (owner: 10Reedy) [01:57:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [01:58:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [02:12:46] !log LocalisationUpdate completed (1.22wmf17) at Fri Sep 20 02:12:46 UTC 2013 [02:12:51] Logged the message, Master [02:24:44] !log LocalisationUpdate completed (1.22wmf18) at Fri Sep 20 02:24:44 UTC 2013 [02:24:48] Logged the message, Master [02:44:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 20 02:44:51 UTC 2013 [02:44:55] Logged the message, Master [02:54:47] (03PS1) 10Reedy: WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 [02:55:39] (03PS1) 10Reedy: Add tearDown to make sure singleton is destroyed [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85166 
[02:56:44] TimStarling: was MW_SECURE_HOST for singer rather than the current setup? [02:56:52] (used in multiversion) [02:57:01] MWVersion.php: # MW_SECURE_HOST set from secure gateway? [02:57:01] MWVersion.php: $secure = getenv( 'MW_SECURE_HOST' ); [02:57:01] MWVersion.php: $host = $secure ? $secure : $_SERVER['HTTP_HOST']; [03:01:16] (03PS2) 10Reedy: Add tearDown to make sure singleton is destroyed [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85166 [03:01:34] (03CR) 10Reedy: [C: 032] Add tearDown to make sure singleton is destroyed [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85166 (owner: 10Reedy) [03:01:56] (03Merged) 10jenkins-bot: Add tearDown to make sure singleton is destroyed [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85166 (owner: 10Reedy) [03:03:03] Hell, the $host at the end isn't used anywhere else [03:03:38] (03PS1) 10Reedy: Remove unused MW_SECURE_HOST check [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85167 [03:16:41] (03PS1) 10Reedy: Add testcase for pa_uswikimedia cause it breaks stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85169 [03:16:59] (03CR) 10Reedy: [C: 032] Add testcase for pa_uswikimedia cause it breaks stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85169 (owner: 10Reedy) [03:17:12] (03Merged) 10jenkins-bot: Add testcase for pa_uswikimedia cause it breaks stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85169 (owner: 10Reedy) [03:19:04] (03PS2) 10Reedy: WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 [03:19:18] (03CR) 10jenkins-bot: [V: 04-1] WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 (owner: 10Reedy) [03:47:34] (03PS3) 10Reedy: WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 
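For reference, the MW_SECURE_HOST fallback quoted from MWVersion.php above amounts to "use the environment variable if it is set, otherwise the request's Host header". A minimal Python sketch of that selection logic follows; `pick_host` and its parameters are hypothetical names used only for illustration and are not part of multiversion.

```python
def pick_host(env, http_host):
    """Return the secure-gateway host if set and non-empty, else the Host header.

    Roughly mirrors `$secure ? $secure : $_SERVER['HTTP_HOST']`.
    `env` stands in for the process environment (an assumption for the sketch).
    """
    secure = env.get("MW_SECURE_HOST")
    # An unset or empty variable falls through to the request's Host header.
    return secure if secure else http_host


if __name__ == "__main__":
    print(pick_host({}, "en.wikipedia.org"))
    print(pick_host({"MW_SECURE_HOST": "secure.wikimedia.org"}, "en.wikipedia.org"))
```

If nothing reads the resulting host afterwards, as noted at 03:03:03, the whole lookup is dead code, which is what change 85167 removes.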
[03:54:21] (03PS4) 10Reedy: WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 [03:56:52] Do we have any other weird exceptions? [03:58:31] 8 "projects", plus wikimedia, mediawiki, wikisource.org [04:06:17] oh, wikidata [04:09:50] global job queue length is 2.9M [04:10:01] (03PS1) 10Reedy: Fix wikidata.org domain [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85171 [04:10:18] refreshlinks? [04:10:43] (03CR) 10Reedy: [C: 032] Fix wikidata.org domain [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85171 (owner: 10Reedy) [04:10:52] (03Merged) 10jenkins-bot: Fix wikidata.org domain [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85171 (owner: 10Reedy) [04:11:06] mostly parsoid [04:11:14] which wikis? [04:11:16] ParsoidCacheUpdateJob: 2413680 queued; 611 claimed (12 active, 599 abandoned) [04:11:21] on enwiki alone [04:16:59] good to see wikitech is up to date as usual [04:21:59] Reedy: Fancy a pint later? YuviPanda, MaxSem and I are painting the town red. 
[04:23:54] (03PS5) 10Reedy: WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 [04:24:04] (03CR) 10jenkins-bot: [V: 04-1] WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 (owner: 10Reedy) [04:26:48] (03PS6) 10Reedy: WIP don't deduce sites based on docroot stuff [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 [04:26:49] (03PS1) 10Reedy: Update tests to remove docroot setting [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85172 [04:39:57] the job runners do look to be slacking a bit [06:20:32] Reedy: we still have less high prio jobs than we did in March [06:21:09] it would be useful to, say, double those so that at least emails arrive on time :) [06:21:27] dunno what other stuff is in there but at least emails must not be too DB intensive I hope? [06:26:41] (03PS1) 10Reedy: Remove 404.html symlinks, 404s are handled by w/404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85175 [06:29:41] The config is open... [06:30:10] "Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request." 
[06:30:21] 404 whilst trying to handle the 404 [06:31:19] !log reedy synchronized docroot/bits/ [06:31:23] Logged the message, Master [06:31:41] Hmm, that's slightly better [06:34:19] I'm guessing php isn't enabled for bits usage [06:39:10] (03PS1) 10Reedy: Enable PHP on w/404.php on bits [operations/apache-config] - 10https://gerrit.wikimedia.org/r/85177 [06:39:14] (03CR) 10jenkins-bot: [V: 04-1] Enable PHP on w/404.php on bits [operations/apache-config] - 10https://gerrit.wikimedia.org/r/85177 (owner: 10Reedy) [06:40:21] (03PS2) 10Reedy: Enable PHP on w/404.php on bits [operations/apache-config] - 10https://gerrit.wikimedia.org/r/85177 [06:40:23] (03PS1) 10Reedy: Add w symlink for 404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85178 [06:40:45] (03CR) 10Reedy: [C: 032] Add w symlink for 404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85178 (owner: 10Reedy) [06:40:54] (03Merged) 10jenkins-bot: Add w symlink for 404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85178 (owner: 10Reedy) [09:50:03] PROBLEM - Puppet freshness on hume is CRITICAL: No successful Puppet run in the last 10 hours [09:55:03] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 10 hours [11:32:18] (03PS1) 10Petr Onderka: Better error reporting for GCC [operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/85189 [12:03:35] (03PS1) 10Nemo bis: Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 [12:04:49] (03CR) 10jenkins-bot: [V: 04-1] Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 (owner: 10Nemo bis) [12:05:09] (03PS2) 10Petr Onderka: Better error reporting for GCC [operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/85189 [12:05:42] (03CR) 10Petr Onderka: [C: 032 V: 032] Better error reporting for GCC 
[operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/85189 (owner: 10Petr Onderka) [12:07:03] hmm [12:07:11] (03PS2) 10Nemo bis: Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 [12:07:17] (03CR) 10jenkins-bot: [V: 04-1] Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 (owner: 10Nemo bis) [12:07:17] bbl [12:07:24] ah that was quick ^^ [12:16:35] (03PS1) 10Petr Onderka: Fixed exception when reading near end of stream [operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/85196 [12:21:38] (03CR) 10Petr Onderka: [C: 032 V: 032] Fixed exception when reading near end of stream [operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/85196 (owner: 10Petr Onderka) [13:05:06] (03PS3) 10Nemo bis: Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 [13:11:08] (03CR) 10ArielGlenn: [C: 031] Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 (owner: 10Nemo bis) [13:39:55] !log Jenkins: PHPUnit jobs for MediaWiki extensions had an issue for the last couple days. They were being fetched in an incorrect place :/ Fix is https://gerrit.wikimedia.org/r/85202 and got deployed [13:39:58] Logged the message, Master [13:45:14] hey mark; any news regarding the varnish and incorrect tagging of zero traffic? 
[14:12:15] (03PS7) 10Hashar: Jenkins validation (please ignore) [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/84932 [14:17:35] (03PS8) 10Hashar: Jenkins validation (please ignore) [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/84932 [14:24:37] (03PS1) 10Jgreen: begin puppetizing clamav for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85209 [14:25:56] (03CR) 10Jgreen: [C: 032 V: 031] begin puppetizing clamav for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85209 (owner: 10Jgreen) [14:34:18] (03PS1) 10Jgreen: puppetize clamd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/85211 [14:34:35] (03CR) 10jenkins-bot: [V: 04-1] puppetize clamd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/85211 (owner: 10Jgreen) [14:42:02] (03PS2) 10Jgreen: puppetize clamd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/85211 [14:43:26] (03CR) 10Jgreen: [C: 032 V: 031] puppetize clamd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/85211 (owner: 10Jgreen) [14:49:36] (03PS1) 10Jgreen: oops, checked in clamd.conf in the wrong dir [operations/puppet] - 10https://gerrit.wikimedia.org/r/85214 [14:51:55] (03CR) 10Jgreen: [C: 032 V: 031] oops, checked in clamd.conf in the wrong dir [operations/puppet] - 10https://gerrit.wikimedia.org/r/85214 (owner: 10Jgreen) [15:12:04] (03PS1) 10Jgreen: rename clamd service to clamav-daemon [operations/puppet] - 10https://gerrit.wikimedia.org/r/85215 [15:14:31] (03CR) 10Jgreen: [C: 032 V: 031] rename clamd service to clamav-daemon [operations/puppet] - 10https://gerrit.wikimedia.org/r/85215 (owner: 10Jgreen) [15:17:46] (03CR) 10Ottomata: "Sounds like git-deploy is the way to go here. Nik, sorry for having advised you wrong." [operations/puppet] - 10https://gerrit.wikimedia.org/r/82673 (owner: 10Manybubbles) [15:23:33] heya dr0ptp4kt, did you still want to check up on real varnish logs with more headers? or did you guys solve your thing all the way? 
[15:33:10] (03PS1) 10Ottomata: Not including consumer.properties and producer.properties in /etc/kafka. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85219 [15:34:40] manybubbles: if you are ever pointed to some git-deploy doc, I am willing to learn as well :-) [15:34:58] manybubbles: the workaround I found is to add my files in a git repository and do the git pull manually on the server to deploy the changes :( [15:36:15] (03CR) 10Ottomata: "(1 comment)" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85219 (owner: 10Ottomata) [15:58:30] !log gerrit: deleting reference refs/for/master from pywikibot/core.git , confuses some git clients [15:58:34] Logged the message, Master [16:36:11] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [16:37:51] lies, damned lies [16:38:20] ... [16:39:19] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:43:00] Nemo_bis: this graph says 0 jobs? https://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=Miscellaneous+pmtpa&h=hume.wikimedia.org&jr=&js=&v=2899505&m=Global_JobQueue_length [16:44:58] I see nothing obvious here https://wikitech.wikimedia.org/wiki/Server_admin_log [16:48:59] greg-g: hm? to me does say 0, but preceded by a 3 and 5 more zeros https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large [16:51:13] sigh, +1M last week [16:52:08] I wonder what those 0.1M bursts followed by oscillations are [16:53:01] greg-g: can we know the job queue length per job type? [16:53:50] and that is just tampa? 
[16:53:55] aude: Tim said it was 2.0 out of 2.5M parsoid jobs on en.wiki [16:54:04] there is no such thing as a tampa job queue [16:55:01] hmmm, makes sense [16:55:06] it's just run on that server [16:55:47] yeah, it was just the server with a MW install able to run maintenance reports like that one [16:55:52] ok [17:12:03] (03PS1) 10Aude: add commons site link group section for test wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 [17:12:53] (03CR) 10Aude: [C: 04-1] "depends on repopulating sites table on testwikidata, updating WikimediaMessages and Wikibase branches on wmf18 and running localisation up" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 (owner: 10Aude) [17:16:55] (03CR) 10Aude: "see https://gerrit.wikimedia.org/r/#/c/85230/ for submodule updates" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 (owner: 10Aude) [17:19:17] Nemo_bis, apergos, what was the puppet error that https://gerrit.wikimedia.org/r/85192 fixes? 
[17:20:03] springle: puppet is just refusing to run on hume [17:20:25] because a cron with same name is declared multiple times [17:20:37] this was discussed on #wikimedia-tech [17:21:30] Ariel is afk [17:21:58] springle: [17:22:25] the cron jobs added by the previous change don't vary by name [17:22:41] so puppet tries to declare the same resource twice and bails [17:23:06] Error 400 on SERVER: Duplicate definition: Cron[cron-updatequerypages-lonelypages-s1] is already defined in file /etc/puppet/manifests/misc/maintenance.pp at line 487; cannot redefine at /etc/puppet/manifests/misc/maintenance.pp:487 on node hume.wikimedia.org [17:23:36] this is because the previous patch didn't actually do what was intended (run each cron job on its specific date only on s1) [17:23:46] ah ok [17:24:20] the new code looks like it should in fact run the new cron jobs only on s1 on the given dates (but please look it over, I won't be here to babysit when the jobs actually run so I didn't want to +2 it and stick you with it) [17:25:07] apergos: thank you [17:25:49] yw [17:26:46] I've been on a 'keep puppet running on all hosts' kick, hoping to make it a habit [17:32:17] (03Abandoned) 10Akosiaris: Reduce retry timeout for etherpad frontend apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/84882 (owner: 10Akosiaris) [17:36:08] <^d> Ryan_Lane: Do you remember what the purpose of installing git-svn on the gerrit box was? 
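The duplicate-definition failure described above comes down to resource titles that do not vary per job: a catalog may declare each `Cron[title]` only once. The following Python sketch models that uniqueness rule under stated assumptions; the `Catalog` class, `declare_jobs`, and the day-suffixed titles are hypothetical illustrations, not Puppet's API and not the contents of the actual fix in change 85192.

```python
class DuplicateDefinition(Exception):
    """Raised when the same (type, title) pair is declared twice."""


class Catalog:
    """Toy stand-in for a compiled catalog: one entry per (type, title)."""

    def __init__(self):
        self.resources = {}

    def declare(self, rtype, title, **params):
        key = (rtype, title)
        if key in self.resources:
            raise DuplicateDefinition(f"{rtype}[{title}] is already defined")
        self.resources[key] = params


def declare_jobs(catalog, page, cluster, monthdays, unique_titles):
    """Declare one cron resource per run date.

    With unique_titles=False every date reuses the same title, which is the
    failure mode from the log; with True the date is appended (hypothetical
    naming scheme) so each declaration is distinct.
    """
    for day in monthdays:
        title = f"cron-updatequerypages-{page}-{cluster}"
        if unique_titles:
            title += f"-{day}"
        catalog.declare("Cron", title, monthday=day)


if __name__ == "__main__":
    try:
        declare_jobs(Catalog(), "lonelypages", "s1", [1, 15], unique_titles=False)
    except DuplicateDefinition as err:
        print("compile failed:", err)
```

The second declaration with an unchanged title aborts the whole compile, which matches the symptom of puppet refusing to run on the host at all rather than skipping one job.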
[17:36:20] (03PS1) 10Akosiaris: Retry timeout to 15sec for etherpad frontend [operations/puppet] - 10https://gerrit.wikimedia.org/r/85236 [17:37:28] !log Created Echo tables on officewiki [17:37:32] Logged the message, Master [17:38:00] (03PS4) 10Springle: Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 (owner: 10Nemo bis) [17:38:45] (03CR) 10Springle: [C: 032] Fix misc::maintenance::updatequerypages duplicate cronjobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/85192 (owner: 10Nemo bis) [17:39:08] (03PS1) 10Reedy: Add echo to officewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85237 [17:39:26] (03CR) 10Akosiaris: [C: 032] Retry timeout to 15sec for etherpad frontend [operations/puppet] - 10https://gerrit.wikimedia.org/r/85236 (owner: 10Akosiaris) [17:39:37] !log reedy synchronized database lists files: [17:39:41] Logged the message, Master [17:40:08] (03CR) 10Reedy: [C: 032] Add echo to officewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85237 (owner: 10Reedy) [17:40:21] (03Merged) 10jenkins-bot: Add echo to officewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85237 (owner: 10Reedy) [17:41:40] !log reedy synchronized wmf-config/ 'touch' [17:41:44] Logged the message, Master [17:41:58] Can someone merge and deploy https://gerrit.wikimedia.org/r/#/c/85177/ to fix the 404 handler on bits so it doesn't 404? 
[17:44:15] PROBLEM - MySQL Processlist on db1051 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 64 statistics [17:44:33] Well, I lie slightly [17:44:45] I stopped it 404-ing by adding the w symlink so it can find 404.php [17:44:50] It just won't execute PHP [17:47:15] RECOVERY - MySQL Processlist on db1051 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 5 statistics [17:47:44] (03PS1) 10Chad: Cirrus on all closed wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85239 [17:47:45] RECOVERY - Puppet freshness on hume is OK: puppet ran at Fri Sep 20 17:47:38 UTC 2013 [17:49:44] ^d: umm, yeah [17:49:49] ^d: you asked for it ;) [17:49:56] to do the svn -> git migration [17:50:06] <^d> Why on earth did I need that. [17:50:08] <^d> I was stupid. [17:50:12] hahaha [17:51:33] (03PS1) 10Chad: Remove stupid package I don't need [operations/puppet] - 10https://gerrit.wikimedia.org/r/85240 [17:53:25] PROBLEM - MySQL Processlist on db1043 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 49 statistics [17:53:25] RECOVERY - Puppet freshness on terbium is OK: puppet ran at Fri Sep 20 17:53:23 UTC 2013 [17:56:25] PROBLEM - MySQL Processlist on db1043 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 33 statistics [17:59:01] (03PS1) 10Springle: adjust processlist warn/crit thresholds [operations/puppet] - 10https://gerrit.wikimedia.org/r/85243 [17:59:25] RECOVERY - MySQL Processlist on db1043 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 16 statistics [18:00:13] (03CR) 10Springle: [C: 032] adjust processlist warn/crit thresholds [operations/puppet] - 10https://gerrit.wikimedia.org/r/85243 (owner: 10Springle) [18:04:00] (03PS1) 10Akosiaris: Update debian/changelog [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85246 [18:04:01] (03PS1) 10Akosiaris: Remove scala 2.8 annotations [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85247 [18:04:30] (03CR) 10Akosiaris: 
[C: 032 V: 032] Update debian/changelog [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85246 (owner: 10Akosiaris) [18:04:42] (03CR) 10Akosiaris: [C: 032 V: 032] Remove scala 2.8 annotations [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85247 (owner: 10Akosiaris) [18:05:35] ottomata: drdee: ^ This fixes kafka not building with the latest patches. I also updated version number so if you want new kafka packages on apt tell me [18:06:25] PROBLEM - MySQL Processlist on db1043 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 43 statistics [18:06:29] awesoome, danke [18:06:40] before you build a new one [18:06:44] let's check / merge this [18:06:45] https://gerrit.wikimedia.org/r/#/c/85219/ [18:06:55] check my comment, i'm not really sure of the best way to do what I want [18:11:06] (03PS4) 10BBlack: netmapper data sync stuff [operations/puppet] - 10https://gerrit.wikimedia.org/r/85120 [18:12:25] PROBLEM - MySQL Processlist on db1043 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 37 statistics [18:12:36] ottomata: i see what you mean... you absolutely want the .example suffix in those 2 files? [18:14:21] (03CR) 10BBlack: [C: 032] netmapper data sync stuff [operations/puppet] - 10https://gerrit.wikimedia.org/r/85120 (owner: 10BBlack) [18:14:25] RECOVERY - MySQL Processlist on db1043 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 14 statistics [18:16:09] well, no akosiaris, not necessarily [18:16:33] the main problem is kinda that there are multiple consumer .property files, and their names will probably be arbitrary [18:16:35] for example [18:16:44] consumer.eqiad.properties, consumer.esams.properties [18:16:45] etc. 
[18:16:47] but [18:16:56] if the .deb installs one as consumer.properties [18:17:09] the init script will read it in [18:17:27] alternatively, i guess i could ensure => absent on that file in puppet [18:17:58] i just don't want to one day provision a kafka mirror instance using this .deb and puppet [18:18:11] and have the extra manual step of removing a file so it can actually work [18:19:24] and, since the shipped consumer and producer files almost certainly will not work for mirror. they both point at the kafka cluster defaults, which means if mirror starts up and someone is actually running the defaults, like in a test env, mirror will create an infinite loop of logs [18:19:34] consume and produce back to the same brokers [18:21:15] (03PS1) 10Lcarr: adding subnet sandbox1-b-eqiad and dickson.freenode.net [operations/dns] - 10https://gerrit.wikimedia.org/r/85250 [18:23:27] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [18:24:37] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [18:24:38] (03PS1) 10BBlack: apparently class-require does not do what I wish it did [operations/puppet] - 10https://gerrit.wikimedia.org/r/85251 [18:24:40] (03CR) 10jenkins-bot: [V: 04-1] apparently class-require does not do what I wish it did [operations/puppet] - 10https://gerrit.wikimedia.org/r/85251 (owner: 10BBlack) [18:27:27] PROBLEM - Apache HTTP on mw1085 is CRITICAL: Connection refused [18:27:31] (03Abandoned) 10BBlack: apparently class-require does not do what I wish it did [operations/puppet] - 10https://gerrit.wikimedia.org/r/85251 (owner: 10BBlack) [18:27:47] (03PS1) 10BBlack: apparently class-require does not do what I wish it did [operations/puppet] - 10https://gerrit.wikimedia.org/r/85253 [18:28:15] jenkins-- [18:29:00] (03PS6) 10Ryan Lane: Simplify git-deploy configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/83046 [18:29:59] (03CR) 10BBlack: [C: 032] apparently class-require does not do what I wish it did 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/85253 (owner: 10BBlack) [18:30:02] greg-g: Do we do Friday LDs? What's the policy there? [18:30:38] RoanKattouw: no Friday deploys :) [18:30:44] That's what I thought :) [18:30:53] * MaxSem deploys greg-g [18:30:55] In that case, can I sign up for an LD on Monday? [18:30:59] MatmaRex: :P [18:31:01] anomie: I'm tempted to add rate-limiting for null edits [18:31:03] I have some VE page corruption fixes [18:31:08] RoanKattouw: yessir, what's the issue? [18:31:13] ah [18:31:23] (03CR) 10Akosiaris: "(1 comment)" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85219 (owner: 10Ottomata) [18:31:23] cool, yeah, plz do add to wikitech deploy page [18:31:27] RECOVERY - Apache HTTP on mw1085 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.092 second response time [18:31:34] https://en.wikipedia.org/w/index.php?title=Matt_Chandler_%28pastor%29&curid=37214846&diff=573675359&oldid=571858677 (search for page corruption? I hold greg-g, you deploy! [18:32:54] grrrit-wm: aren't tabbing mistakes fun? :D [18:33:04] AaronSchulz: People would still do it, just slower. [18:33:26] RoanKattouw: eek [18:33:34] RoanKattouw: how often is it happening? [18:34:00] <^d> manybubbles: Sooo, I thought about it for a long time. Hacked a couple of alternatives. All ended up being worse than my dislike of LSB. [18:34:02] Let me look at data for the past 24h [18:34:07] <^d> So I merged your LSB approach :) [18:34:10] My guess is not too often because the page has to be in a very specific state [18:34:19] ^d: I saw that. [18:34:21] * greg-g nods [18:34:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [18:34:38] ^d: I think I broke something with that phrase rescore stuff. I'm kind of mired in it. [18:34:42] but cool! [18:35:03] <^d> Mmk. Oh, and we've scheduled all "closed" wikis to migrate on wednesday. 
[18:35:25] <^d> Nobody uses them much so user impact will be almost non-existent, but will allow us to index 126 more wikis :) [18:35:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 14.184 second response time [18:35:54] lol [18:37:06] greg-g: Today (UTC, so the past 18 hours) there were 16 edits flagged for possible corruption issues on enwiki, 13-14 of them exhibit the reference corruption bug [18:37:33] So it's corrupting ~20 pages per day, which is annoying but not massive [18:37:48] urgh [18:37:52] Like, it would take me a couple hours to fix all of them over the weekend [18:37:52] I wish you would have said 5 [18:38:11] RoanKattouw: what's the gerrit change that fixes this? [18:38:44] ^d: yay! [18:38:51] greg-g: https://gerrit.wikimedia.org/r/#/c/85144/ [18:39:00] anomie: well slow can be handled [18:39:14] (03PS2) 10Reedy: wmgHTTPSBlacklistCountries changes for CN [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84781 [18:39:27] whoa, bigger than I though [18:39:28] t [18:39:31] Yeah [18:39:36] (03CR) 10Reedy: [C: 032] wmgHTTPSBlacklistCountries changes for CN [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84781 (owner: 10Reedy) [18:39:43] More than half is in the test directory but still [18:40:08] at least you added/edited tests! [18:40:15] Yeah [18:40:33] is this live on beta cluster? 
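The "~20 pages per day" estimate above is a straight extrapolation from an 18-hour sample to 24 hours. A small Python sketch of that arithmetic, using only the counts quoted in the log (16 flagged edits, 13 to 14 of them showing the bug); the function name `per_day` is made up for illustration.

```python
def per_day(count, hours_observed):
    """Scale a count seen over `hours_observed` hours to a 24-hour rate."""
    return count * 24 / hours_observed


if __name__ == "__main__":
    # 13-14 confirmed cases in 18 hours: roughly 17.3 to 18.7 per day.
    print(round(per_day(13, 18), 1), round(per_day(14, 18), 1))
    # All 16 flagged edits in 18 hours: roughly 21.3 per day.
    print(round(per_day(16, 18), 1))
```

Either way the figure lands in the high teens to low twenties, so "~20" is a fair rounding of the sample.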
[18:40:36] I even added a failing test case in PS1 but for some reason Jenkins V+2ed even though the tests were failing for me locally :S [18:40:38] Probably [18:40:47] It was merged into master at 5am this morning so it should be live there [18:40:49] I haven't checked [18:40:51] * greg-g nods [18:40:52] (03CR) 10Reedy: [C: 031] Cirrus on all closed wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85239 (owner: 10Chad) [18:40:55] (03Merged) 10jenkins-bot: wmgHTTPSBlacklistCountries changes for CN [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84781 (owner: 10Reedy) [18:41:14] (03PS3) 10Reedy: Language Template fixup definition for UploadWizard. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/39026 (owner: 10Lupo) [18:41:25] I need to do a bit of testing with that change as well [18:41:30] * greg-g nods [18:41:40] k, well, I'll assume monday for now [18:41:43] The unit tests pretty much prove that we won't corrupt things but I'm paranoid [18:41:46] Yeah sounds reasonable [18:41:55] yeah, thanks RoanKattouw [18:41:56] If it was a one-liner and it was corrupting more pages .... 
but it's not and it's not [18:42:01] right [18:42:02] :) [18:43:09] (03Abandoned) 10Reedy: Delete all the superfluous wikimania docroots [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/62565 (owner: 10Reedy) [18:43:14] (03PS3) 10Reedy: Change name of cywikisource to "Wicidestun" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84221 (owner: 10TTO) [18:43:20] (03CR) 10Reedy: [C: 032] Change name of cywikisource to "Wicidestun" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84221 (owner: 10TTO) [18:43:21] anomie: huh, ApiPurge already calls pingLimiter() though not PurgeAction [18:43:47] (03Merged) 10jenkins-bot: Change name of cywikisource to "Wicidestun" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84221 (owner: 10TTO) [18:44:20] (03PS2) 10Reedy: (bug 54287) Change logo for crwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85110 (owner: 10Yuvipanda) [18:44:26] (03CR) 10Reedy: [C: 032] (bug 54287) Change logo for crwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85110 (owner: 10Yuvipanda) [18:44:55] (03Merged) 10jenkins-bot: (bug 54287) Change logo for crwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85110 (owner: 10Yuvipanda) [18:45:20] well purge alone doesn't do links updates (other than randomly for cascade protection)...I guess that makes sense [18:45:25] (03PS2) 10Reedy: Remove unused MW_SECURE_HOST check [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85167 [18:45:30] (03CR) 10Reedy: [C: 032] Remove unused MW_SECURE_HOST check [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85167 (owner: 10Reedy) [18:45:50] (03Merged) 10jenkins-bot: Remove unused MW_SECURE_HOST check [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85167 (owner: 10Reedy) [18:46:03] merge spamming :) [18:46:51] (03PS2) 10Reedy: Remove 404.html symlinks, 404s are handled by w/404.php 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85175 [18:46:57] (03CR) 10Reedy: [C: 032] Remove 404.html symlinks, 404s are handled by w/404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85175 (owner: 10Reedy) [18:47:08] seems $wgUser->pingLimiter() gets hit for null edits too, maybe it's the 'noratelimit' right [18:47:32] (03PS1) 10coren: Rename manganese -> dickson and prep for debian [operations/puppet] - 10https://gerrit.wikimedia.org/r/85256 [18:47:37] andrewbogott: is domain necessary for lookup of public ips? [18:47:44] it's an extra query [18:47:52] (03Merged) 10jenkins-bot: Remove 404.html symlinks, 404s are handled by w/404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85175 (owner: 10Reedy) [18:48:05] It's not needed for the lookup but it's used elsewhere in the code… lemme see [18:48:59] LeslieCarr: https://gerrit.wikimedia.org/r/#/c/85256/1 [18:49:30] !log reedy synchronized wmf-config/ [18:49:34] Logged the message, Master [18:49:40] (03PS2) 10Lcarr: adding subnet sandbox1-b-eqiad and dickson.freenode.net fixed to include ipv6 [operations/dns] - 10https://gerrit.wikimedia.org/r/85250 [18:50:35] (03PS2) 10coren: Rename manganese -> dickson and prep for debian [operations/puppet] - 10https://gerrit.wikimedia.org/r/85256 [18:51:00] !log reedy synchronized docroot and w [18:51:04] Logged the message, Master [18:51:30] nope, no 'noratelimit' rights [18:52:16] !log reedy synchronized multiversion/ [18:52:20] Logged the message, Master [18:52:28] Ryan_Lane, deleteHost does $this->getDomain()->updateSOA(); (an example, I feel like there was another place that motivated me to add that...) [18:52:46] ah. right. [18:53:38] Yeah, and this->domain happens a few other places for public hosts. [18:53:40] AaronSchulz: Is the problem that no rate limits are set on 'edit', except for ips and "newbies"? [18:53:49] We could maybe do the lookup when we need it rather than ahead of time... 
[18:54:23] * Ryan_Lane nods [18:54:40] (03PS3) 10coren: Rename manganese -> dickson and prep for debian [operations/puppet] - 10https://gerrit.wikimedia.org/r/85256 [18:55:18] (03CR) 10Reedy: "Is this still needed?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/39026 (owner: 10Lupo) [18:55:24] anomie: probably [18:56:09] (03CR) 10Lcarr: [C: 04-1] "(2 comments)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85256 (owner: 10coren) [18:56:33] * AaronSchulz checked the config [18:57:28] anomie: that should definitely be limited somehow [18:57:53] which probably means limiting edits and not just using 'purge' and configuring that...lest people do mass edits adding and removing a newline or something to get around it [18:57:56] akosiaris: fyi I just pushed a couple more recent 0.8 branch changes into our repo [18:58:08] arghhhh [18:58:09] why ? [18:58:12] (03PS4) 10coren: Rename manganese -> dickson and prep for debian [operations/puppet] - 10https://gerrit.wikimedia.org/r/85256 [18:58:28] haha, because i'm trying to stay up to date with 0.8 so it is easier for us when they finally mark it as stable [18:58:35] there were only 2 changes [18:58:37] one was irrelevant [18:58:41] the other was for consumers [18:58:47] Well if it does not compile again you are the one cleaning up :P [18:58:48] looks ok i think :) [18:58:50] haha ok [18:59:16] (03PS1) 10Kaldari: Slightly more concise language for wgRightsText [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85259 [18:59:18] AaronSchulz: OTOH, mass edits adding/removing a newline are likely to be noticed by the community and blocked. Null edits are invisible to the community. 
[18:59:25] so, i just tried running install like you said, it seems to work [18:59:32] for some reason I though install wouldn't let you rename files [18:59:38] but I guess that is just dh_install [18:59:40] not install install [18:59:45] anomie: hopefully, I guess it's worth a try [19:00:04] i think your suggested change is good, what else did you want to test [19:00:06] akosiaris: ^? [19:00:17] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [19:00:40] Avoiding naming these files directly and relying on the MIRROR_CONFFILES variable you created [19:00:49] It will make it a bit more robust [19:00:49] oh aye [19:00:59] i could just make a var for each file [19:01:02] there are only 2 files [19:01:26] I am doing a target that expands that one so probably better [19:01:32] i am testing now [19:01:38] ok [19:03:27] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:04:24] anomie: are you doing that or should I? [19:04:36] AaronSchulz: Doing what? 
[19:04:52] (so, no, I'm not doing whatever it is) [19:05:03] anomie: rate limiting link purges [19:05:14] * AaronSchulz would make some patches then [19:05:22] *will...can't type today [19:06:23] (03PS1) 10coren: Add dickson to DNS (new freenode server) [operations/dns] - 10https://gerrit.wikimedia.org/r/85261 [19:07:56] (03CR) 10Ryan Lane: [C: 032] Add dickson to DNS (new freenode server) [operations/dns] - 10https://gerrit.wikimedia.org/r/85261 (owner: 10coren) [19:09:43] (03PS1) 10Chad: Officewiki gets Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85262 [19:13:05] (03CR) 10Chad: [C: 032] Officewiki gets Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85262 (owner: 10Chad) [19:13:16] (03Merged) 10jenkins-bot: Officewiki gets Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85262 (owner: 10Chad) [19:14:00] !log demon synchronized wmf-config/InitialiseSettings.php 'Cirrus on officewiki' [19:14:04] Logged the message, Master [19:15:53] (03PS1) 10Hashar: contint: puppet class to setup browsertests slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/85264 [19:18:37] (03CR) 10Lcarr: [C: 032] Rename manganese -> dickson and prep for debian [operations/puppet] - 10https://gerrit.wikimedia.org/r/85256 (owner: 10coren) [19:19:59] (03PS1) 10Chad: Turn officewiki Cirrus back off, stupid TMH [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85266 [19:20:24] (03CR) 10Chad: [C: 032] Turn officewiki Cirrus back off, stupid TMH [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85266 (owner: 10Chad) [19:20:28] !log demon synchronized wmf-config/InitialiseSettings.php [19:20:31] Logged the message, Master [19:23:48] <^demon> manybubbles: fyi, I filed bug 54394. It's not our fault, but keeps us from using Cirrus on private wikis (eg officewiki) just yet. 
[19:24:04] ^d: cool [19:30:38] (03PS2) 10Hashar: contint: puppet class to setup browsertests slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/85264 [19:32:17] (03CR) 10Hashar: "Did it manually on integration-selenium-driver.pmtpa.wmflabs (it is not a puppet self instance, so I have did this change manually :])." [operations/puppet] - 10https://gerrit.wikimedia.org/r/85264 (owner: 10Hashar) [20:08:18] heyaaa LeslieCarr, I'm messing with FoxyProxy + ssh SOCKS proxy for analytics hadoop nodes [20:08:20] and it mostly works [20:09:38] i think i need someone to verify what I'm seeing. [20:09:51] foxyproxy works for the datanode edge case [20:10:04] but only if I enable it for all urls. [20:10:13] the datanode pattern should be no different than the namenode (or others) [20:10:25] but it doesn't seem to work, and i'm not sure why [20:10:47] it does use the proxy, at least, I have output from ssh -v when I request a datanode [20:11:14] but it seems the proxy can't actually find the datanode to proxy to, unless foxyproxy is set to proxy ALL urls through it [20:11:20] which is totally whack! [20:13:43] manybubbles: do we have a meeting?! [20:13:43] ah! [20:14:48] <^demon> ottomata: Supposedly, but I saw no one else around but me :\ [20:14:58] ah! ping us! haha, nik and I are there now [20:15:00] time snuck up on me [20:15:04] ^demon: I came. [20:15:09] sorry, I'm not having fun right now [20:15:54] cmjohnson-away: are you really away? want to join us? [20:18:16] <^demon> And now the hangout won't launch. 
[20:18:19] <^demon> I hate google+ [20:20:56] (03PS1) 10Reedy: Update php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85322 [20:22:04] (03CR) 10Reedy: [C: 032] Update php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85322 (owner: 10Reedy) [20:22:13] (03Merged) 10jenkins-bot: Update php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85322 (owner: 10Reedy) [20:23:17] !log reedy synchronized php-1.22wmf18/extensions/Wikibase [20:23:20] Logged the message, Master [20:23:51] !log reedy synchronized php-1.22wmf18/extensions/WikimediaMessages [20:23:54] Logged the message, Master [20:31:28] (03PS1) 10RobH: RT#4416 wikimedia.us redirection [operations/apache-config] - 10https://gerrit.wikimedia.org/r/85325 [20:34:49] RobH: what's up with 4942? (i saw the revert) [20:35:53] (03PS2) 10RobH: RT#4416 wikimedia.us redirection [operations/apache-config] - 10https://gerrit.wikimedia.org/r/85325 [20:36:13] bad apache config [20:36:20] wasnt redirecting and i didnt feel like debugging it anymore that day. [20:36:28] ok [20:36:49] it didnt seem to break anything, but i did not want to leave it in place. [20:37:22] (03CR) 10RobH: [C: 032] RT#4416 wikimedia.us redirection [operations/apache-config] - 10https://gerrit.wikimedia.org/r/85325 (owner: 10RobH) [20:39:33] i could have made it work without regex, but thats cheatin [20:40:03] Is any fellow ops (or anyone who can apache gracefull all)about? [20:40:08] the script has issues for me. [20:41:36] yea, i get pubkey denials.. [20:43:01] Jeff_Green: You about? [20:43:06] yes [20:43:16] can you login to fenari and 'apache-gracefull-all' for me [20:43:17] ? [20:43:22] zomg the horror [20:43:23] yeah [20:43:34] i tested config on mw1220 and all is good [20:43:37] so should be fine [20:43:46] famous last words [20:43:47] here goes [20:43:50] weeee [20:43:52] \o/ [20:44:01] no deploys on Friday? :P [20:44:03] I forget, do I do that as root or as myself? 
[20:44:05] * aude wonders what wikimedia.us will be [20:44:14] aude: just redirect to wikimedia.org [20:44:20] ok :) [20:44:22] greg-g: is it saturday somewhere already? [20:44:23] Jeff_Green: try as you first, i think. [20:44:44] i may be wrong, the ddsh issues been around for me so long i usualy cheat and use salt [20:44:46] here goes [20:44:47] but its cheatin. [20:44:58] Jeff_Green: yeah, Australia [20:44:58] mw1155: System failed sanity check: VIP not configured on lo [20:45:02] lots of that [20:45:04] that happens [20:45:07] its ok. [20:45:12] i get ssh auth issues, which arent. [20:45:13] heh [20:45:16] and all the mw servers complain about ntpdate lacking sync servers, whatever [20:45:22] ...thats not as ok [20:45:25] but meh [20:45:35] time shouldnt affect this redirection [20:45:43] is the apache script trying to run ntpdate manually? [20:45:46] that's amusing if os [20:45:47] !log reedy synchronized php-1.22wmf18/extensions/WikimediaMessages [20:45:49] so [20:46:07] Jeff_Green: seems ot work [20:46:17] cuz now http://wikimedia.us goes to wikimedia.org [20:46:17] what's with two-letter words that end in o? [20:46:18] !log reedy synchronized php-1.22wmf18/extensions/Wikibase [20:47:43] greg-g: i cannot think of a dismissive two letter word to answer your question [20:47:57] meh is 3 [20:47:58] fail. [20:48:13] hem [20:48:18] thats 3 [20:48:25] I onk [20:48:44] its been that kinda week. [20:50:35] exim config syntax is making me disloyal [20:54:09] (03CR) 10Aude: "should be okay now" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 (owner: 10Aude) [20:55:41] <^d> I wish I could stay connected to wifi today. [20:55:42] * ^d sighs [20:56:29] i gave up on it [20:56:31] ^d: plug an ethernet port into your laptop... oh wait, apple hates cables [20:56:39] i have my usb hub rigged with permanent usb to ethernet adapter. 
[20:56:47] (air has no network port) [20:58:14] RobH: i have 4942 pulled locally, will play on the train [20:59:21] manybubbles: i don't see much info on mininum_master_nodes [20:59:26] aside from advice on what it shoudl be set at [20:59:36] and i don't fully understand what it does [21:00:04] afaict, it is refering to the number of nodes around that *could* be elected master [21:00:19] so, i guess the relevance for the split brain scenario is [21:00:21] (03CR) 10Chad: [C: 032] CirrusSearch depends on Elastica extension. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84562 (owner: 10Manybubbles) [21:00:26] ottomata: I think this issue had something nice: https://github.com/elasticsearch/elasticsearch/issues/2488 [21:00:35] (03Merged) 10jenkins-bot: CirrusSearch depends on Elastica extension. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84562 (owner: 10Manybubbles) [21:00:39] oo, reading... [21:00:50] (let's see if what I was about to say is right :p) [21:01:36] (03PS1) 10Chad: itwiktionary to 1.22wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85333 [21:01:37] ottomata: http://asquera.de/opensource/2012/11/25/elasticsearch-pre-flight-checklist/ basically says what you said [21:01:55] (03CR) 10Chad: [C: 032] itwiktionary to 1.22wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85333 (owner: 10Chad) [21:02:05] (03Merged) 10jenkins-bot: itwiktionary to 1.22wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85333 (owner: 10Chad) [21:02:29] jeremyb: cool, feel free to submit patchset and flag me for review and i'll merge it =] [21:02:49] (03PS2) 10Reedy: add commons site link group section for test wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 (owner: 10Aude) [21:02:56] (03CR) 10Reedy: [C: 032] add commons site link group section for test wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 (owner: 10Aude) 
[21:03:11] (03Merged) 10jenkins-bot: add commons site link group section for test wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85230 (owner: 10Aude) [21:03:45] manybubbles: 'I ended up working around the problem (in testing) by using https://github.com/sonian/elasticsearch-zookeeper in place of Zen discovery. We already had reliable Zookeeper infrastructure up for other applications, so this approach made a whole lot of sense to me. I was unable to reproduce the problem with the Zookeeper discovery module.' [21:03:45] :p [21:04:15] ottomata: ha [21:04:28] certainly an option [21:04:56] !log demon synchronized php-1.22wmf18/extensions/Elastica [21:05:00] Logged the message, Master [21:05:27] !log demon synchronized php-1.22wmf18/extensions/CirrusSearch [21:05:30] Logged the message, Master [21:05:56] !log demon synchronized wmf-config [21:05:59] Logged the message, Master [21:06:44] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: [21:06:47] Logged the message, Master [21:08:08] !log demon synchronized php-1.22wmf18/extensions/Elastica [21:08:47] <^d> manybubbles: Ok, everything's live and updated now, including new Elastica shim. [21:09:08] Hi, I'm getting PHP fatal error on edit Save on mediawiki.org, Failed opening required '/usr/local/apache/common-local/php-1.22wmf18/extensions/Elastica/Elastica/lib/Elastica/Document.php'. Known issue? [21:09:18] Should be fixed [21:10:33] "Fixed in 60 seconds", starring Nicolas Cage as operations [21:11:03] <^d> I think I fixed it in < 60s from when Reedy pinged me. [21:12:08] OK, starring Johnny ^depp [21:12:12] hah [21:14:53] manybubbles: are you using multicast or unicast zen discoery? [21:17:43] also, manybubbles, how many ES nodes are there now? [21:17:44] 3? 
[21:17:53] 3 [21:18:03] testsearch100[123] I believe [21:18:27] ^ that is correct [21:19:41] manybubbles: it seems like the split brain problem is a potential problems for small cluster like 3, it doesn't seem like there is a good setting [21:19:58] you could set min master nodes at 2, but if a node is cut off from a master, you get split brain [21:20:05] you could set it at 3, but then you lose HA [21:20:10] all nodes have to be up in order to run the cluster [21:20:27] buy moar boxen [21:20:27] 1 is the same problem as 2 [21:24:22] manybubbles: multicast or unicast discovery? [21:24:28] ottomata: multicast [21:24:45] ottomata: why is 2 a bad setting for 3 machines other than really close to not HA? [21:25:04] i think it is probably the correct setting, but can lead to the split brain scenario described in that github issue you sent me [21:25:09] https://github.com/elasticsearch/elasticsearch/issues/2488 [21:25:48] in a 3 node cluster with min master nodes = 2, if a client node gets disconnected from the master [21:25:53] it can still see itself and the other client node [21:25:59] so it can see 2 potential master nodes [21:26:28] !log LocalisationUpdate completed (1.22wmf17) at Fri Sep 20 21:26:27 UTC 2013 [21:26:29] it then try to start an election [21:26:32] Logged the message, Master [21:27:25] i think you can potentially avoid that if you only designate a 2 out of 3 of the nodes possible masters [21:27:28] and leave one only as a client [21:27:31] i think... [21:27:40] but that's annoying because that's custom node configuration [21:27:45] its nice having everything homogenious [21:35:34] (03PS1) 10Ori.livneh: Add EventLogging Kafka writer plug-in [operations/puppet] - 10https://gerrit.wikimedia.org/r/85337 [21:36:47] (03CR) 10Ori.livneh: [C: 04-1] "This has a couple of unmet dependencies; current patch is a draft." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/85337 (owner: 10Ori.livneh) [21:39:47] manybubbles: even with that potential problem, ithink it is unlikely to happen, especially since all the nodes are in the same rack [21:39:58] re, potential split brain [21:40:04] so, setting it to N/2 + 1 is fine [21:40:10] e.g. 2 [21:41:44] ottomata: ok. so once we have more nodes we'll have a more interesting choice to make [21:42:20] aye [21:47:59] !log LocalisationUpdate completed (1.22wmf18) at Fri Sep 20 21:47:58 UTC 2013 [21:48:02] Logged the message, Master [21:54:15] (03PS1) 10coren: Fix typo in sanbox1-b-eqiad.cfg [operations/puppet] - 10https://gerrit.wikimedia.org/r/85342 [21:54:34] (03CR) 10coren: [C: 032] "Typo fxi" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85342 (owner: 10coren) [21:58:35] (03PS1) 10Lcarr: updating rdns for dickson temporarily [operations/dns] - 10https://gerrit.wikimedia.org/r/85345 [21:58:55] (03CR) 10Lcarr: [C: 032] adding subnet sandbox1-b-eqiad and dickson.freenode.net fixed to include ipv6 [operations/dns] - 10https://gerrit.wikimedia.org/r/85250 (owner: 10Lcarr) [21:59:03] (03CR) 10Lcarr: [C: 032] updating rdns for dickson temporarily [operations/dns] - 10https://gerrit.wikimedia.org/r/85345 (owner: 10Lcarr) [22:02:36] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 20 22:02:35 UTC 2013 [22:02:39] Logged the message, Master [22:05:22] !log LocalisationUpdate completed (1.22wmf17) at Fri Sep 20 22:05:21 UTC 2013 [22:05:27] Logged the message, Master [22:06:08] !log LocalisationUpdate completed (1.22wmf18) at Fri Sep 20 22:06:08 UTC 2013 [22:06:11] Logged the message, Master [22:07:22] (03PS2) 10coren: Add dickson to DNS (new freenode server) [operations/dns] - 10https://gerrit.wikimedia.org/r/85261 [22:07:57] (03CR) 10coren: [C: 032] "+2 after rebase/merge" [operations/dns] - 10https://gerrit.wikimedia.org/r/85261 (owner: 10coren) [22:10:50] !log LocalisationUpdate ResourceLoader cache refresh 
completed at Fri Sep 20 22:10:50 UTC 2013 [22:10:54] Logged the message, Master [22:20:19] !log parallel dump db1033 to db1036 with innodb_file_per_table=1 [22:20:25] Logged the message, Master [22:28:08] anomie: I'm tempted to add rate-limiting for null edits [22:28:15] Perhaps I can tempt you to fix the software instead. :-) [22:29:44] <^d> There's too many bugs :( [22:31:00] There's also too many people with noratelimits anyway :P [22:31:01] I love how the approach is to try to enforce limits rather than, y'know, fixing the problems that make people want to null edit pages. [22:31:33] Never mind stale pages and links entries, we'll just make life more difficult. [22:31:33] http://en.wikipedia.beta.wmflabs.org is back up [22:31:51] Elsie: https://git.wikimedia.org/git/mediawiki/core.git [22:31:53] glhf [22:32:36] Elsie: you can do both, but limiting should be done in any case for sanity [22:32:49] Pre-mature optimization is rarely wise. [22:32:52] it's also the easiest to do [22:33:00] Well, then it's clear the best. Never mind me. [22:33:06] Clearly, even. [22:33:08] this has caused problems many times before [22:33:13] This? [22:33:22] I guess "it happened several times before" counts as "premature" in some universe [22:33:36] * AaronSchulz is the cultivate enough in the full multiverse [22:33:46] *is not cultivated [22:33:52] You mean people null editing pages has happened before? [22:34:18] <^d> You know, we wouldn't have all these problems with refreshing links if we didn't bother tracking them. [22:34:20] sure, I'm sure you were on the channel sometimes before when it happened [22:34:46] * Elsie shrugs. [22:35:02] There are maintenance scripts that could help. [22:35:04] anyway, various bugs have been fixed before (like OOMs for many backlinks) and more will eventually [22:35:05] But nobody will run them. [22:35:22] I can't fault anyone for doing what works. [22:35:38] <^d> Writing a patch to drop those tables would also work. 
[22:35:44] * ^d does that instead [22:35:52] people that don't have knowledge of the problems it causes, yes [22:35:53] Elsie: Nobody? [22:36:03] I run all sorts of weird stuff from time to time per your requests [22:36:06] and that's a good case for limiting [22:36:12] Reedy: refreshLinks.php? :-) [22:36:18] Doesn't that die? [22:36:22] Reedy: Yes, and I appreciate that. [22:36:24] Yes, it dies. [22:36:44] I looked up the bugs today. [22:36:50] Someone also may want to look at Parsoid. [22:36:55] Or VisualEditor. [22:37:01] As one of them is flooding the global job queue. [22:37:03] But whatever. [22:37:05] That would be Parsoid [22:37:05] Parsoid [22:37:08] <^d> I'm not sure "want" is the right word. [22:37:12] You probably might need/want to file a bug for that [22:37:20] Rather than the /dev/null todo list [22:38:19] https://bugzilla.wikimedia.org/show_bug.cgi?id=54406 [22:38:35] heh [22:41:37] Thanks, Nemo. [22:42:13] You and your useless bugs. :P [22:43:42] <^d> I filed a useless bug. [22:43:43] <^d> https://bugzilla.wikimedia.org/show_bug.cgi?id=54407 [22:45:40] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:46:41] ohnoes Elsie has competitors [22:47:04] My bugs are rarely useless. [22:47:20] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:48:12] Does the queue tell the truth? Couldn't a refreshLink2 counts for millions refreshLinks? [22:48:20] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23. [22:48:36] <^d> Maybe we could combine refreshLinks and refreshLinks2 into refreshLinks3. [22:48:36] Which queue? [22:48:39] Elsie: I'm just kidding, I only mean that they are sometimes a bit reticent. :) [22:48:45] :-) [22:48:51] Much like myself. 
[22:49:03] Sorry, https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large (from the bug report) [22:50:30] RECOVERY - HTTP on formey is OK: HTTP OK: HTTP/1.1 200 OK - 3596 bytes in 0.054 second response time [22:50:33] Christian75: I'm not sure what you mean by "truth" but it's quite sure there is a backlog, Tim confirms it [22:51:24] We also have reports of concrete effects like https://bugzilla.wikimedia.org/show_bug.cgi?id=43936#c7 [22:51:32] reedy@tin:~$ mwscript showJobs.php enwiki [22:51:32] 2492013 [22:51:37] (03PS2) 10Akosiaris: Not including consumer.properties and producer.properties in /etc/kafka. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/85219 (owner: 10Ottomata) [22:51:44] ottomata: ^ [22:52:38] I try to say. It is diffucult to know how many jobs there really is in the queue, because one refreshLinks2 could expand to millions of refreshLinks - but sure there is at least 3.0 M [22:52:44] apache 23802 0.2 0.3 315424 42544 ? SN 22:40 0:00 php MWScript.php runJobs.php --wiki=frwiki --procs=1 --type=ParsoidCacheUpdateJob --maxtime=300 --memory-limit=300M [23:03:43] !log upgrading db1044 to precise [23:03:46] Logged the message, Master [23:10:00] PROBLEM - DPKG on db1044 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [23:14:06] PROBLEM - RAID on db1044 is CRITICAL: Connection refused by host [23:14:21] springle: We've machines (other than the dns boxes) still on 10.04? [23:14:40] yeah, i've found a couple [23:14:55] slightly scary [23:15:06] RECOVERY - RAID on db1044 is OK: OK: State is Optimal, checked 2 logical device(s) [23:16:21] actually anything still on mysql 5.1 is 10.04. 
that number is dropping :) [23:18:46] PROBLEM - Host db1044 is DOWN: PING CRITICAL - Packet loss = 100% [23:20:06] RECOVERY - DPKG on db1044 is OK: All packages OK [23:20:16] RECOVERY - Host db1044 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms [23:21:56] Hopefully you don't find something still running our mysql 4oh4 [23:22:04] heh [23:31:39] (03PS1) 10Springle: db1044 precise + mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/85362 [23:36:14] cmjohnson: any idea/recollection why db1044 doesn't use lvm? it has the same raid h/w [23:36:41] (03CR) 10Springle: [C: 032] db1044 precise + mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/85362 (owner: 10Springle) [23:45:17] (03PS1) 10Reedy: Mostly fixup whitespace in 404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85369 [23:45:42] (03CR) 10Reedy: [C: 032] Mostly fixup whitespace in 404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85369 (owner: 10Reedy) [23:45:53] (03Merged) 10jenkins-bot: Mostly fixup whitespace in 404.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85369 (owner: 10Reedy) [23:46:34] springle: no idea...may not have been set up that way [23:46:38] originally [23:52:14] (03PS1) 10Reedy: Make HTML pass validator [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85373 [23:52:24] (03CR) 10Reedy: [C: 032] Make HTML pass validator [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85373 (owner: 10Reedy) [23:53:06] (03Merged) 10jenkins-bot: Make HTML pass validator [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85373 (owner: 10Reedy) [23:53:52] !log reedy synchronized w/404.php [23:53:56] Logged the message, Master [23:58:52] paravoid: mwscript showJobs.php --wiki=enwiki --group