[00:02:05] AzaToth: seems to be fine now?
[00:02:17] reset it
[00:02:34] (PS2) MarkTraceur: WIP Add MMV feature flags for beta and pilot sites [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117376
[00:07:37] testing again YuviPanda
[00:07:46] AzaToth: ok!
[00:07:55] added a !_.isEmpty() before _.has()
[00:15:30] YuviPanda: seems to work
[00:15:37] AzaToth: woo!
[00:15:53] AzaToth: want me to +2?
[00:16:14] w8
[00:16:19] gonna do a test
[00:16:25] AzaToth: ok
[00:16:33] please join #wikimedia-qa
[00:16:36] AzaToth: I'm going to go off now. feel free to just leave a commit in the local repo?
[00:16:48] ok
[00:16:50] AzaToth: and feel free to self-merge too
[00:16:54] can't
[00:16:54] AzaToth: after testing
[00:17:09] I don't have permission to +2 or merge grrrit
[00:17:10] AzaToth: oh? you don't have +2?
[00:17:16] AzaToth: let me see if i can fix that now
[00:17:17] * YuviPanda fiddles
[00:17:31] (PS1) Tim Landscheidt: puppet::self::master: Specify IP range for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118030
[00:19:21] AzaToth: grr, I can't figure that out :(
[00:19:30] AzaToth: leave them on tools, I'll +2 when I wake up.
[00:19:37] AzaToth: and figure out how to give you +2 tomorrow!
[00:19:42] AzaToth: thanks for fixing the issues :)
[00:20:00] night!
[00:21:12] nait
[00:24:13] (PS1) AzaToth: testing grrrit; please ignore [operations/puppet] - https://gerrit.wikimedia.org/r/118031
[00:24:41] argh, failed push
[00:25:05] (Abandoned) AzaToth: testing grrrit; please ignore [operations/puppet] - https://gerrit.wikimedia.org/r/118031 (owner: AzaToth)
[00:25:44] forgot I can't make new branches
[00:27:05] ah, good
[00:27:31] <^d> Why can't people make branches?
[00:27:38] at #wikimedia-betacluster: (PS1) AzaToth: testing grrrit; please ignore [operations/apache-config] (betacluster) - https://gerrit.wikimedia.org/r/118035
[00:27:47] ^d: I assume it's a permission
[00:28:00] can make topics, but not new branches
[00:28:02] <^d> Meh, should be granted on most things :\
[00:28:08] :/
[00:28:17] erm #wikimedia-qa
[00:28:20] There's a wikimedia-betacluster channel
[00:28:25] wth...
[00:28:46] (PS1) Tim Landscheidt: Tools: Rename references to local-admin to tools.admin [operations/puppet] - https://gerrit.wikimedia.org/r/118036
[00:28:46] hoo: I mixed it up in my head
[00:29:20] hoo: I meant #wikimedia-qa
[00:30:00] ok, I'm not in that one either
[00:30:35] (CR) Tim Landscheidt: "This is apparently the reason why Puppet is failing for tools-master.eqiad.wmflabs at the moment." [operations/puppet] - https://gerrit.wikimedia.org/r/118036 (owner: Tim Landscheidt)
[00:31:04] (PS1) AzaToth: Testing grrrit; please ignore [operations/apache-config] - https://gerrit.wikimedia.org/r/118038
[00:31:08] good
[00:31:19] works now fine
[00:31:44] commits to branch "betacluster" go to #wikimedia-qa, all else goes wherever it should, for example here
[00:32:32] hoo: so from now on, you will be totally oblivious to what happens on betacluster
[00:34:04] AzaToth: mh... I'm not going to idle in yet another channel
[00:34:16] hoo: blame hashar
[00:34:22] sadly I still have other things to do than idle in a billion IRC channels
[00:35:01] hoo: long or short scale?
[00:35:25] sorry?
[00:35:38] hoo: billion in long or short scale?
[00:36:20] AzaToth: Don't really get that question... but when I started MediaWiki Tech. stuff, I was in one, maybe two channels... now there's more ...
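(Editor's sketch, not part of the log.) The routing rule AzaToth describes at 00:31:44 could be expressed roughly as below. This is a minimal illustration, not grrrit-wm's actual code: only the "betacluster" to #wikimedia-qa mapping comes from the conversation, while `channel_for`, `BRANCH_CHANNELS`, and the default channel name are hypothetical placeholders.

```python
# Hypothetical sketch of branch-based routing for Gerrit notifications.
# Only the "betacluster" -> "#wikimedia-qa" rule is taken from the log;
# the default channel name below is an assumed placeholder for "here".
BRANCH_CHANNELS = {"betacluster": "#wikimedia-qa"}
DEFAULT_CHANNEL = "#default-channel"  # assumption: stands in for this channel


def channel_for(branch: str) -> str:
    """Return the IRC channel where a change on `branch` is announced."""
    return BRANCH_CHANNELS.get(branch, DEFAULT_CHANNEL)


if __name__ == "__main__":
    for branch in ("betacluster", "master", "production"):
        print(branch, "->", channel_for(branch))
```

A plain dictionary lookup with a fallback keeps the special case explicit and leaves every other branch on the default path, which matches "all else goes wherever it should".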
[00:36:55] hoo: https://en.wikipedia.org/wiki/Long_and_short_scales
[00:37:29] oh, that one :P
[00:37:41] ツ
[00:37:58] I'm in 20 odd channels
[00:37:58] I don't think that matters much :P
[00:38:05] (PS1) Tim Landscheidt: Tools: Allow access for administrators from bastions [operations/puppet] - https://gerrit.wikimedia.org/r/118039
[00:38:27] I know you were hyperboling
[00:38:34] :D
[00:42:08] scfc_de: Want to review https://gerrit.wikimedia.org/r/113755 ? :)
[00:44:55] (Abandoned) AzaToth: Testing grrrit; please ignore [operations/apache-config] - https://gerrit.wikimedia.org/r/118038 (owner: AzaToth)
[00:47:49] hoo: It's on my list, but I have some backlog re sql from last summer (!), so I need to process that first :-).
[00:48:42] ok... I can relate to that, my review backlog also is quite insane :)
[00:49:03] On labs, I'm seeing errors from ntpd every 30 s in syslog: "bind(22) AF_INET6 fe80::f816:3eff:fece:f4c%2#123 flags 0x11 failed: Cannot assign requested address", "unable to create socket on eth0 (14945) for fe80::f816:3eff:fece:f4c#123", "failed to init interface for address fe80::f816:3eff:fece:f4c".
[00:49:22] Is that something important, or does someone have a clue how to silence that?
[01:01:13] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[01:08:24] Found https://bugzilla.wikimedia.org/show_bug.cgi?id=60166 for ntpd issue.
[01:09:43] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 333 seconds
[01:09:43] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 333 seconds
[02:09:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[02:12:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200374)
[02:12:35] !log LocalisationUpdate completed (1.23wmf16) at 2014-03-11 02:12:35+00:00
[02:12:51] Logged the message, Master
[02:15:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[02:22:40] !log LocalisationUpdate completed (1.23wmf17) at 2014-03-11 02:22:39+00:00
[02:22:48] Logged the message, Master
[02:42:52] (CR) Andrew Bogott: [C: 2] puppet::self::master: Specify IP range for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118030 (owner: Tim Landscheidt)
[02:48:26] * sDrewth looks up and down for csteipp
[02:57:14] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Mar 11 02:57:11 UTC 2014 (duration 57m 10s)
[02:57:24] Logged the message, Master
[03:10:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (203624)
[03:27:33] PROBLEM - MySQL Idle Transactions on db1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:28:24] RECOVERY - MySQL Idle Transactions on db1018 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[03:35:43] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 79 seconds
[03:35:43] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 78 seconds
[03:49:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[03:54:33] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200127)
[03:58:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[04:02:13] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[05:03:33] (CR) ArielGlenn: [C: 2] convert oooold snapshot manifest to module [operations/puppet] - https://gerrit.wikimedia.org/r/117901 (owner: ArielGlenn)
[05:10:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[06:25:38] (PS1) Physikerwelt: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053
[06:26:52] (CR) jenkins-bot: [V: -1] Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt)
[06:28:33] (PS2) Physikerwelt: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053
[06:30:52] (PS3) Physikerwelt: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053
[06:31:22] (PS4) Physikerwelt: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053
[06:33:30] (PS5) Physikerwelt: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053
[06:40:49] (CR) Physikerwelt: "That might work.
But I have no idea how to test that." [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt)
[07:03:13] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[07:15:04] (CR) Physikerwelt: "one might also have to replace" [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt)
[07:18:51] (CR) Alexandros Kosiaris: [C: 2] base: remove lookupvar and replace with top scope @ var [operations/puppet] - https://gerrit.wikimedia.org/r/117922 (owner: Matanya)
[07:23:43] (CR) Alexandros Kosiaris: [C: -1] ganglia_new: remove lookupvar and replace with top scope @ var and fix scoping (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/112889 (owner: Matanya)
[07:25:12] (PS2) Alexandros Kosiaris: openstack: remove var and replace with top scope ::var [operations/puppet] - https://gerrit.wikimedia.org/r/116976 (owner: Matanya)
[07:26:38] (CR) Alexandros Kosiaris: [C: 2] openstack: remove var and replace with top scope ::var [operations/puppet] - https://gerrit.wikimedia.org/r/116976 (owner: Matanya)
[07:28:20] (PS3) Matanya: ganglia_new: remove lookupvar and replace with top scope @ var and fix scoping [operations/puppet] - https://gerrit.wikimedia.org/r/112889
[08:11:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[08:48:07] matanya: what's the RT number for "Global_JobQueue_length ganglia graph empty since January"?
[08:48:54] (CR) Nemo bis: [C: -1] "Needs rebase" [operations/puppet] - https://gerrit.wikimedia.org/r/113639 (owner: Matanya)
[08:51:05] Nemo_bis, 6771
[08:51:14] thanks
[08:52:29] (PS4) Matanya: remove darrell shell account [operations/puppet] - https://gerrit.wikimedia.org/r/113639
[09:01:42] andre__: is there a ticket for moving noc.wikimedia.org from fenari too?
[09:02:49] Nemo_bis, 6862
[09:03:53] ok
[09:06:19] (PS3) Nemo bis: Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115879
[09:09:45] (PS1) Nemo bis: Move Global_JobQueue_length ganglia graph to terbium in eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118058
[09:10:40] (CR) Nemo bis: "Moving noc.wm.o to an eqiad host is RT 6862, should I make a patch for that too? I sent Idf779971a7b83a1efbde647edffbdaaf0429b8a8 to test " [operations/puppet] - https://gerrit.wikimedia.org/r/117250 (owner: Nemo bis)
[09:17:45] (CR) Matanya: Add cron job to run characterEditStats.php on multilingual wikis weekly (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/117250 (owner: Nemo bis)
[09:20:37] (PS8) Physikerwelt: added basic hbase support [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381
[09:22:33] (Abandoned) Physikerwelt: Add Mathoid module (TeX -> MathML / SVG conversion web service) [operations/puppet] - https://gerrit.wikimedia.org/r/90733 (owner: Physikerwelt)
[09:38:12] (PS1) Andrew Bogott: Specify region-appropriate dns resolver. [operations/puppet] - https://gerrit.wikimedia.org/r/118060
[09:39:21] YuviPanda: ^
[09:42:24] (CR) Yuvipanda: [C: -1] "Perhaps not have a default value for the resolver in the class, and always specify it in the role?" [operations/puppet] - https://gerrit.wikimedia.org/r/118060 (owner: Andrew Bogott)
[09:44:50] (PS2) Andrew Bogott: Specify region-appropriate dns resolver. [operations/puppet] - https://gerrit.wikimedia.org/r/118060
[09:44:53] YuviPanda, better?
[09:45:33] andrewbogott: yeah, looks good, provided it tests fine :)
[09:47:07] (CR) Andrew Bogott: [C: 2] Specify region-appropriate dns resolver.
[operations/puppet] - https://gerrit.wikimedia.org/r/118060 (owner: Andrew Bogott)
[09:51:37] (CR) Alexandros Kosiaris: [C: 2] Move Global_JobQueue_length ganglia graph to terbium in eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118058 (owner: Nemo bis)
[10:04:13] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[10:05:58] akosiaris_away, will ^^^ also mean that existing cronjobs on hume will get disabled?
[10:32:52] MaxSem: no. but hume is to be disabled due to the tampa => eqiad migration anyway
[10:33:19] yep, already discovered myself that it's still spilling fatals
[10:33:39] want me to disable that ?
[10:33:55] would it be a crime to just sudo -u mwdeploy crontab -e it? I have privs to do it myself
[10:34:36] like a felony ? or a misdemeanor ?
[10:35:04] feel free to do it, just log it please
[10:35:12] at least like a "you'll be slapped for it":P
[10:35:22] ok
[10:37:02] !log Manually disabled old broken job queue cronjobs on hume
[10:37:21] Logged the message, Master
[10:45:36] (PS3) Nemo bis: Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250
[10:46:28] (CR) Nemo bis: "Thanks for the review, I learnt to follow https://wikitech.wikimedia.org/wiki/Puppet_coding instead of copying code around mine. :)" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/117250 (owner: Nemo bis)
[10:48:27] (CR) Nemo bis: "It works! Someone close the RT ticket please.
done Nemo_bis
[10:52:38] and i'm glad my work is useful for you - re puppet coding :)
[10:53:43] :)
[10:53:57] Weird, now the old data disappeared and the recent data appeared, on hume https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&r=year&z=default&jr=&js=&st=1394535162&v=1&m=Global%20JobQueue%20length&z=large
[11:12:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[11:13:07] (CR) Addshore: "Well if they already exist then removing them imo would be bad as they may be used by people." [operations/apache-config] - https://gerrit.wikimedia.org/r/113972 (owner: Thiemo Mättig (WMDE))
[11:43:16] paravoid:
[11:43:29] paravoid: I have approved you on phabricator instance
[11:44:08] hashar: i would love to see that too :) <-- hint :P
[11:44:28] matanya: not sure who can register at http://fab.wmflabs.org
[11:44:35] can'y
[11:44:36] *t
[11:44:37] matanya: I think you need a @wikimedia.org email
[11:45:04] :-(
[11:45:04] gerrit is upset now
[11:45:17] matanya: mail Chad Horohoe, he can probably add you
[11:45:37] getent passwd l10nupdate
[11:45:40] wrong window
[11:46:56] how hashar it seems it did work
[11:47:11] *oh
[11:47:22] waiting for admin approval
[11:55:55] matanya: approved
[11:56:03] thanks
[11:57:21] nap + commuting, will be back later in the afternoon
[11:57:22] looks good
[11:57:32] see ya
[13:05:13] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[13:14:43] re
[13:17:56] * Reedy rebots hashar
[13:24:20] Reedy: hi ! ! do you have any clue how /usr/local/bin/sync-common is installed on the box ? Where does it come from ?
[13:37:32] hashar: do you need an answer more specific than "puppet, probably"?
[13:37:46] yeah will have to grep I guess :
[13:37:47] D
[13:37:53] with age I am getting lazier and lazier
[13:43:31] I was waiting for my shell to be useable again ;)
[13:44:19] hashar: I think it should just be a symlink now created by puppet
[13:44:25] to /srv/scap/bin/sync-common'
[13:45:37] And that's mediawiki/tools/scap.git or something
[13:46:06] ahhh yeah /srv/scap
[13:46:15] it is probably not fetched on the beta instance I am recreating
[13:47:45] modules/mediawiki/manifests/sync.pp
[13:47:52] mediawiki::sync
[13:48:26] (PS4) Nemo bis: Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250
[13:48:39] (CR) jenkins-bot: [V: -1] Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250 (owner: Nemo bis)
[13:49:11] (PS5) Nemo bis: Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250
[13:49:23] (CR) jenkins-bot: [V: -1] Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250 (owner: Nemo bis)
[13:53:59] (PS2) Reedy: Remove remnants of old CA RC2UDP config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117667
[13:54:03] (CR) Reedy: [C: 2] Remove remnants of old CA RC2UDP config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117667 (owner: Reedy)
[13:54:12] (Merged) jenkins-bot: Remove remnants of old CA RC2UDP config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117667 (owner: Reedy)
[13:54:31] (PS6) Nemo bis: Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250
[13:54:59] !log reedy synchronized wmf-config/CommonSettings.php 'I208d51b5db031d35518453e2b9de096f7f53f7a0'
[13:55:07] Logged the message, Master
[13:55:22] (PS2) Reedy: Add logo for legalteamwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117377 (owner: Jalexander)
[13:55:27] (CR) Reedy: [C: 2] Add logo for legalteamwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117377 (owner: Jalexander)
[13:55:39] (PS7) Nemo bis: Add cron job to run characterEditStats.php on multilingual wikis weekly [operations/puppet] - https://gerrit.wikimedia.org/r/117250
[13:55:55] (Merged) jenkins-bot: Add logo for legalteamwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117377 (owner: Jalexander)
[13:56:06] (CR) Nemo bis: "PS4 move to terbium, then rebase." [operations/puppet] - https://gerrit.wikimedia.org/r/117250 (owner: Nemo bis)
[13:56:12] (PS2) Reedy: Set wmgBabelCategoryNames for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114970 (owner: Odder)
[13:56:16] (CR) Reedy: [C: 2] Set wmgBabelCategoryNames for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114970 (owner: Odder)
[13:56:24] (Merged) jenkins-bot: Set wmgBabelCategoryNames for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114970 (owner: Odder)
[13:56:59] (PS3) Reedy: Enable VisualEditor by default on French Wikiversity [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117238 (owner: Jforrester)
[13:57:04] (CR) Reedy: [C: 2] Enable VisualEditor by default on French Wikiversity [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117238 (owner: Jforrester)
[13:57:11] (Merged) jenkins-bot: Enable VisualEditor by default on French Wikiversity [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117238 (owner: Jforrester)
[13:57:29] (PS2) Reedy: 'Markbotedits' user right for rollbackers on shwiki [operations/mediawiki-config] -
https://gerrit.wikimedia.org/r/117807 (owner: Odder)
[13:57:31] (CR) jenkins-bot: [V: -1] 'Markbotedits' user right for rollbackers on shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117807 (owner: Odder)
[13:57:33] (CR) Reedy: [C: 2] 'Markbotedits' user right for rollbackers on shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117807 (owner: Odder)
[13:57:42] (Merged) jenkins-bot: 'Markbotedits' user right for rollbackers on shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117807 (owner: Odder)
[13:58:25] (PS2) Reedy: Add Wikimedia CH domain to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116263 (owner: Odder)
[13:58:36] (CR) Reedy: [C: 2] Add Wikimedia CH domain to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116263 (owner: Odder)
[13:58:43] (Merged) jenkins-bot: Add Wikimedia CH domain to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116263 (owner: Odder)
[13:59:25] (PS2) Reedy: Allow more upload file types for sewikimedia sysops [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116057 (owner: TTO)
[13:59:29] (CR) Reedy: [C: 2] Allow more upload file types for sewikimedia sysops [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116057 (owner: TTO)
[13:59:37] (Merged) jenkins-bot: Allow more upload file types for sewikimedia sysops [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116057 (owner: TTO)
[13:59:54] (PS2) Reedy: localize wmgBabelCategoryNames and wmgBabelMainCategory for oswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115181 (owner: Ricordisamoa)
[13:59:55] Reedy: if you're interested and you give me time I can rebase the commits depending on https://gerrit.wikimedia.org/r/115879 too
[13:59:58] (CR) Reedy: [C: 2]
localize wmgBabelCategoryNames and wmgBabelMainCategory for oswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115181 (owner: Ricordisamoa)
[14:00:06] (Merged) jenkins-bot: localize wmgBabelCategoryNames and wmgBabelMainCategory for oswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115181 (owner: Ricordisamoa)
[14:00:34] Are they complex rebases?
[14:00:57] (PS2) Reedy: Crats should not add users to import on frwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115152 (owner: Odder)
[14:01:03] (CR) Reedy: [C: 2] Crats should not add users to import on frwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115152 (owner: Odder)
[14:01:10] (Merged) jenkins-bot: Crats should not add users to import on frwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115152 (owner: Odder)
[14:01:48] (CR) Reedy: [C: -1] "Conflicts, needs rebase :(" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116059 (owner: TTO)
[14:02:51] !log reedy synchronized database lists files:
[14:02:59] Logged the message, Master
[14:03:16] !log reedy synchronized wmf-config/
[14:03:26] Logged the message, Master
[14:04:40] (PS1) Cmjohnson: Adding dhcpd entries for db1061-63 [operations/puppet] - https://gerrit.wikimedia.org/r/118070
[14:06:30] (CR) Cmjohnson: [C: 2] Adding dhcpd entries for db1061-63 [operations/puppet] - https://gerrit.wikimedia.org/r/118070 (owner: Cmjohnson)
[14:13:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[14:15:24] (PS1) Hashar: Tweak l10nupdate user/group creations for beta cluster [operations/puppet] - https://gerrit.wikimedia.org/r/118071
[14:19:22] (CR) Ottomata: [C: 1] Tweak l10nupdate user/group creations for beta cluster [operations/puppet] - https://gerrit.wikimedia.org/r/118071 (owner: Hashar)
[14:21:15] (PS1) Hashar: mediawiki::sync one file{} statement per file [operations/puppet] - https://gerrit.wikimedia.org/r/118076
[14:26:47] (CR) Matanya: Tweak l10nupdate user/group creations for beta cluster (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118071 (owner: Hashar)
[14:27:53] (CR) Matanya: mediawiki::sync one file{} statement per file (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118076 (owner: Hashar)
[14:28:56] matanya: not sure what you mean by "break resources into multiple lines" at https://gerrit.wikimedia.org/r/#/c/118071/1/modules/mediawiki/manifests/users/l10nupdate.pp,unified
[14:29:43] name => blah
[14:29:50] home => blah
[14:30:01] etc, and not a one liner
[14:30:08] ah
[14:30:16] yeah should do that
[14:34:04] (CR) Hashar: "There is apparently no need to quote special keywords." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118076 (owner: Hashar)
[14:35:53] (CR) Matanya: mediawiki::sync one file{} statement per file (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118076 (owner: Hashar)
[14:36:14] (CR) Hashar: Tweak l10nupdate user/group creations for beta cluster (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118071 (owner: Hashar)
[14:36:19] (PS2) Hashar: Tweak l10nupdate user/group creations for beta cluster [operations/puppet] - https://gerrit.wikimedia.org/r/118071
[14:36:50] hashar: grep -r "ensure"
[14:37:03] matanya: yeah you told me about it a few days ago on a different statement
[14:37:09] i have?
[14:37:17] can't remember which change
[14:37:18] * matanya has a memory leak
[14:37:23] but I did comply and quoted the value
[14:37:46] this time I decided to verify whether we should really quote them and it's not needed
[14:38:00] in vim that shows the strings and keyword in different colors which is handy
[14:38:35] i agree
[14:38:55] but then we need to not quote all quoted ensure's
[14:45:12] ideally :]
[14:46:02] and i enforced the other way :/
[14:47:27] (CR) Addshore: [C: 1] bugzilla: remove the new from motd, old is dead [operations/puppet] - https://gerrit.wikimedia.org/r/117669 (owner: Matanya)
[14:57:37] (CR) Ottomata: [C: 1] "This is looking good, thanks! If you confirm that you've tested this and it works as it should, I think we can merge." [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt)
[14:59:17] (CR) Ottomata: [C: 1] "Wow cool, so simple!" [operations/puppet] - https://gerrit.wikimedia.org/r/117670 (owner: Matanya)
[15:00:00] ottomata: it won't work though :)
[15:00:25] not until we add the ferm class to the hosts
[15:00:32] k
[16:06:03] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[16:06:13] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[16:08:45] !log attempting to fix ps1-b5 and ps1-b6
[16:08:58] Logged the message, Master
[16:17:59] (CR) Dzahn: [C: -1] "this is for Chase to amend to and try out gerrit while creating his own account" [operations/puppet] - https://gerrit.wikimedia.org/r/118016 (owner: Dzahn)
[16:19:57] (CR) Dzahn: [C: 2] bugzilla: remove the new from motd, old is dead [operations/puppet] - https://gerrit.wikimedia.org/r/117669 (owner: Matanya)
[16:23:26] (CR) Dzahn: [C: 2] remove darrell shell account [operations/puppet] - https://gerrit.wikimedia.org/r/113639 (owner: Matanya)
[16:27:32] (CR) Dzahn: "can you just add Vibha Bamba
to this maybe? is there still a controversy?" [operations/puppet] - https://gerrit.wikimedia.org/r/117659 (owner: Nemo bis)
[16:29:12] (CR) Nemo bis: "There isn't any controversy, just MZ being overzealous. :) I follow Talleyrand's advice. I'd add Vibha if she was on gerrit; she isn't, he" [operations/puppet] - https://gerrit.wikimedia.org/r/117659 (owner: Nemo bis)
[16:34:55] (CR) Dzahn: "i pasted this URL in the -design channel" [operations/puppet] - https://gerrit.wikimedia.org/r/117659 (owner: Nemo bis)
[16:40:33] RECOVERY - Host ps1-b5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.62 ms
[16:45:46] (PS9) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323
[16:46:17] (PS6) BBlack: Overwrite stale ZeroOpts= (blank) cookies if HTTPS zero-rated. [operations/puppet] - https://gerrit.wikimedia.org/r/117661 (owner: Dr0ptp4kt)
[16:46:44] (CR) Faidon Liambotis: "OK, let's talk some data. On our (sampled randomly 1:1000) logs from March 1st-11th, we have 9988 lines for wikidata.org URLs that do not " [operations/apache-config] - https://gerrit.wikimedia.org/r/113972 (owner: Thiemo Mättig (WMDE))
[16:50:40] (PS10) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323
[16:52:27] (CR) Ottomata: "Ok, I made the init script a little bit smarter. Note that this is the template that dh_make provides, and I've only changed a little bi" [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 (owner: Ottomata)
[16:52:54] (CR) Ottomata: Initial 2.0.0-1 debian release (1 comment) [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 (owner: Ottomata)
[16:57:35] robh: i need to take down mw1201-1203 from A6 and 1208-10 from B6
[16:57:40] going to put them in row D
[16:58:21] (we moved this conversation in here, heh ;)
[16:58:36] So yea....
i dunno if just changing the network config and moving is ideal
[16:58:37] or reinstalling
[16:58:51] could try the ip thing first and if it works, awesome
[16:59:10] oh, i forgot to ask
[16:59:13] why are they moving?
[16:59:58] our power utilization on those 2 racks is too high
[17:00:12] keeps alarming
[17:00:31] i see temp alarms
[17:00:33] what other alarms?
[17:00:44] or is this non email alert?
[17:00:55] https://icinga.wikimedia.org/icinga/
[17:01:14] non-email
[17:01:33] its not over
[17:01:35] its unbalanced
[17:01:39] no way to balance and keep in rack?
[17:01:45] rack is full
[17:01:49] i have to move servers
[17:01:58] what i mean is
[17:02:04] can we balance the phases without removing?
[17:02:10] cuz we are below the 8.6kW ceiling
[17:02:17] without removing servers that is
[17:02:36] ideally we load each rack to the full 8.6 and balance, but i realize it may not always be possible since we have limited power outlets per pdu.
[17:03:28] i don't really see what I can move..all 3 phases on b6 jump over the threshold
[17:03:49] same with A6. I think the most logical thing to do is move a few
[17:03:51] i dont see any over threshold on icinga
[17:03:55] i see only they are out of balance
[17:03:56] not over.
[17:04:09] logging directly into the pdu
[17:04:21] yea, tower b is over, but not tower a
[17:04:29] so the towers are not balanced
[17:04:43] (maybe all the network gear is on one only, or some server pdu isnt in both?)
[17:04:56] its just odd that tower a y and z phase are 10.3 and 10.2
[17:05:07] but the tower b y and z are 12.6 and 12.8
[17:05:26] so yea, its over loaded on those two phases, but not on the identical redundant tower, so its unbalanced imo
[17:05:31] but dunno if it can be fixed.
[17:05:35] it is odd...but most of them are that way.
I don't think the psu are drawing equal power
[17:05:49] ^ the servers
[17:05:49] (PS4) Rush: add account for Chase and add to admins::root add key RT #7004 [operations/puppet] - https://gerrit.wikimedia.org/r/118016 (owner: Dzahn)
[17:06:18] that bugs me, something is off
[17:06:45] moving the servers is a bandaid, which may be the final thing we have to do
[17:07:25] we can wait...I can email my contact at servertech and see if they have any suggestions
[17:07:30] another reason i want all per outlet control
[17:07:37] so can see draw per outlet.
[17:07:49] that would be nice
[17:08:08] let's pause this and I will do some more digging
[17:08:25] but yea, if we end up having to move in row then sounds fine what you wanted to do
[17:08:43] !ls
[17:08:46] oops :)
[17:08:59] sorry, out of row
[17:09:02] heh
[17:09:59] robh: one reason could be we are using y cables in A6
[17:10:22] ohh, maybe yea
[17:10:49] meh, if you have to move em, feel free
[17:10:58] just do them two at a time or so and should be fine.
[17:11:06] (and admin log it)
[17:11:16] and dont do it during a deployment window
[17:11:36] its annoying for them to have apaches going on and offline during those
[17:11:58] but i forgot the y cables and they arent being used normally
[17:12:04] and even normally they suck and arent a great idea.
[17:12:10] yeah...probably going to do it first thing tomorrow around 8:30..should be least impact
[17:12:48] https://wikitech.wikimedia.org/wiki/Deployments
[17:13:17] you should be fine
[17:13:38] yep..no conflicts
[17:13:51] the dns stuff may be wonky, hence my suggestion of two at a time, heh
[17:14:13] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[17:14:33] (CR) BBlack: [C: 2 V: 2] Overwrite stale ZeroOpts= (blank) cookies if HTTPS zero-rated.
[operations/puppet] - https://gerrit.wikimedia.org/r/117661 (owner: Dr0ptp4kt) [17:14:54] meh, forget my suggestion [17:15:00] i keep thinking you are moving more than 3 [17:15:01] 3 i' [17:15:05] i'd do all at once. [17:15:11] all three are in the api pool [17:15:38] so if you look at the api cluster, its weights are a bit off [17:15:48] and the newer machines are pulling a bit more than their fair share imo [17:15:58] BBlack, thx. ^ yurik, see merged patch above several messages. that paves the way for the ZRMA JS/CSS patch [17:16:00] but there are idle api machines, so taking all three down seems totally legit to me. [17:16:16] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=API%2520application%2520servers%2520eqiad&tab=m&vn=&hide-hf=false [17:16:34] cmjohnson1: when you moved and reconfigured the analytics node system ips [17:16:43] they were already calling into puppet and continued without issue? [17:16:54] (i think it's all hostname, not ip based, so sounds right to me, just checkin) [17:17:07] (PS1) Odder: Add Meta to $wgImportSources for Serbian Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118099 [17:17:19] that you would have to ask ottomata [17:19:53] (PS5) Dzahn: add account for Chase and add to admins::root add key RT #7004 [operations/puppet] - https://gerrit.wikimedia.org/r/118016 [17:22:24] (PS1) Hashar: beta: disable memcached accross datacenters [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118100 [17:23:00] (CR) Hashar: [C: 2] beta: disable memcached accross datacenters [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118100 (owner: Hashar) [17:23:07] (Merged) jenkins-bot: beta: disable memcached accross datacenters [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118100 (owner: Hashar) [17:24:43] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [17:25:23]
https://graphite.wikimedia.org/render/?title=HTTP%205xx%20Responses%20-8hours&from=-8hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=connected&target=color%28cactiStyle%28alias%28reqstats.5xx,%225xx%20resp/min%22%29%29,%22blue%22%29 [17:25:28] ^ ? [17:27:30] (03CR) 10Ragesoss: [C: 031] Add Meta to $wgImportSources for Serbian Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/118099 (owner: 10Odder) [17:31:21] !log shut down srv284-srv301 (sdtpa row B, B5) [17:31:29] Logged the message, Master [17:32:14] test [17:32:22] chasemp: welcome [17:32:28] hi chasemp [17:32:30] bblack: I can't find any spikes in fatals or exceptions that match that 5xx graph. [17:32:43] thanks [17:32:47] mutante: thanks for the merges [17:32:52] hola chasemp [17:34:07] for posterity: rush <- chasemp, rush -> chasemp [17:34:20] chasemp: ello there [17:34:34] * greg-g is Greg from the standing desk ;) [17:35:09] \o/ for new opsen! [17:35:52] so now there are chasemp and jgage, yay for ops team revival :) [17:39:07] were are the new opsen?????????? [17:39:16] * hashar celebrates [17:41:21] 13:36 < chasemp!~chasemp@2620:62:c000:149:7479:bf7d:2ba3:ffb4 [Remote host closed the connection] [17:41:24] :) [17:42:55] jgage: did you have a look at rt #6999 ? [17:44:04] .... [17:47:23] * hashar waves [17:47:57] matanya, looking at it now [17:50:06] i have not written any ferm rules yet [17:50:10] another thing to learn :) [17:50:48] jgage: i told ottomata i'll handle this [17:51:04] just wanted to make sure we don't have duplicate efforts [17:51:23] matanya: great, thank you! [17:52:44] manybubbles: An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later. [17:52:55] on wikitech [17:53:05] matanya: gar wikitech [17:53:35] manybubbles: is this useful error message for you? [17:53:37] RobH, were you asking about changing IPs? [17:53:43] matanya: can you send me a link? 
[17:53:51] https://wikitech.wikimedia.org/w/index.php?search=submodule&title=Special%3ASearch&go=Go [17:54:00] matanya: not really. Its a catch all that says "go read the logs" but I don't have access to wikitech's logs [17:54:09] andrewbogott_afk: ^^ [17:54:21] and you guys just manually changed ips since the fqdn didnt change [17:54:41] was wondering if/how that went and if you had puppet or salt issues? [17:54:58] ie: we need to move some apaches from row a to row d and dont wanna reinstall if we dont have to. [17:57:20] can anyone with wikitech access send me whatever Cirrus errors it is logging? [17:57:33] ottomata: ^ ? [17:58:07] maybe I can just have ssh access to it so I can debug this? [17:58:21] I'm sure it is some out of date submodule or something [17:58:43] RobH [17:58:54] yeah, changing IPs went ok, I didn't do anythign with salt, but I think that has changed recently [17:59:01] we had to revoke the puppet cert and make a new one [17:59:07] probably would have to do the same with salt [17:59:16] ok, sounds right [17:59:22] but yeah, it was just a matter of changing the network configs and restarting networking [17:59:25] ottomata: can you please point me to the zoo stuff on wikitech please? [17:59:58] matanya: i don't think I have any docs on zookeeper [18:00:03] everything is in puppet :/ [18:00:11] aside from some usage docs for kafka here and there [18:00:35] didn't you point me to something yesterday? [18:00:38] * matanya scrolls [18:01:16] manybubbles: would those be in /var/log/apache2 or elsewhere? [18:02:03] i pointed you to github puppet module repo [18:02:04] i thnk [18:02:21] maybe? 
[18:02:21] https://github.com/wikimedia/puppet-zookeeper [18:02:30] oh the wikitech doc I gave you was on about using puppet git submodule [18:02:31] s [18:02:39] https://wikitech.wikimedia.org/wiki/Puppet_coding#git_submodules [18:02:41] matanya: ^ [18:02:58] !log bd808 updated /a/common to {{Gerrit|I06d07cc3e}}: beta: disable memcached accross datacenters [18:03:07] Logged the message, Master [18:03:07] yes, thanks ottomata [18:03:11] logmsgbot lies [18:03:15] huh [18:03:20] (PS1) BryanDavis: Group1 wikis to 1.23wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118107 [18:04:05] greg-g: It does that when I follow Sam's instructions and commit from tin [18:04:43] I should probably be exporting the magic "don't announce" env var [18:04:47] * bd808|deploy makes a note [18:04:49] ahh, right [18:05:10] that was part of the hack for security patches, right? [18:05:16] yeah [18:06:05] greg-g: I'm ready for group1 to wmf17 whenever you are [18:06:22] do it [18:07:09] (CR) BryanDavis: [C: 2] Group1 wikis to 1.23wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118107 (owner: BryanDavis) [18:07:16] (Merged) jenkins-bot: Group1 wikis to 1.23wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/118107 (owner: BryanDavis) [18:07:40] ottomata: should be in /var/log/apache2/error.log but might be moved based on your configuration [18:08:10] greg-g: it might interest you that tomorrow early UTC morning I am planning to upgrade PHP5 due to a security update. Plus two long awaited upgrades for php5-wmerrors and libmemcached (a patch from Tim was included).
I don't think it needs to be mentioned in any deployment schedule, just letting you know [18:08:36] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.23wmf17 [18:08:40] manybubbles: not seeing anything there [18:08:41] akosiaris: thanks muchly [18:08:46] Logged the message, Master [18:08:55] it'll be wherever the php error logs are [18:09:04] but I couldn't tell you where they are configured to go [18:09:21] whatever site in /etc/apache/ runs wikitech should have that config [18:13:20] greg-g: LGTM [18:15:33] PROBLEM - Host barium is DOWN: PING CRITICAL - Packet loss = 100% [18:16:50] Anybody have ideas about how to diagnose the 5xx spike? https://graphite.wikimedia.org/render/?title=HTTP%205xx%20Responses%20-8hours&from=-8hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=connected&target=color%28cactiStyle%28alias%28reqstats.5xx,%225xx%20resp/min%22%29%29,%22blue%22%29 [18:16:55] (03PS11) 10Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - 10https://gerrit.wikimedia.org/r/115323 [18:17:30] ottomata: want me to inspect? [18:17:48] bd808|deploy: what the heck, why'd it go up at 15 UTC? [18:18:00] I'm not seeing anything out of the ordinary in fatalmonitor or exception logs [18:18:19] greg-g: Dunno. bblack pointed it out earlier [18:18:33] * bd808|deploy won't take the blame [18:20:23] AzaToth: inspect? [18:20:23] RECOVERY - Host barium is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms [18:20:58] yeah, manybubbles it says error.log [18:21:02] not seeing anything there though [18:21:14] ottomata: review* [18:21:34] oh of archiva .deb? [18:21:37] akosiaris: is looking at it now [18:21:39] well I can't do anythign about the errors without some kind of log. might be in the cirrus debug log. I haven't a clue where that is configured to go. [18:21:48] greg-g: Ok for me to try cleaning up some of the l10n caches now? 
[18:22:01] yea [18:22:08] it'd be set with $wgDebugLogFile in some php file [18:22:41] manybubbles: Logs you're wanting should be in /a/mw-log on flourine [18:22:46] bd808|deploy: yeah, go for it [18:23:03] bd808|deploy: wikitech? [18:23:27] manybubbles: Oh. Not paying close attention :/ [18:23:36] yeah, I like fluorine:) [18:25:22] csteipp: https://bugs.launchpad.net/swift/+bug/1265665 hehe [18:27:09] (03PS12) 10Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - 10https://gerrit.wikimedia.org/r/115323 [18:27:58] AaronSchulz: Yeah, we're not the only one. Although tbh, I've been trying to code up a simple attack module to run timing attacks like that, I'm pretty sure PHP really isn't vulnerable. [18:28:25] RECOVERY - Disk space on snapshot1004 is OK: DISK OK [18:28:33] There's a slight bias between 0 correct and 2 correct, but then everything else is flat [18:29:35] (03CR) 10Alexandros Kosiaris: [C: 032] Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - 10https://gerrit.wikimedia.org/r/115323 (owner: 10Ottomata) [18:30:21] \!log bd808 purged l10n cache for 1.23wmf13 [18:31:51] (03CR) 10Addshore: "This could probably be abandoned now? 
:)" [operations/puppet] - https://gerrit.wikimedia.org/r/52043 (owner: Silke Meyer) [18:32:55] (PS1) coren: Tool Labs: shadow master for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118120 [18:32:59] (CR) Dzahn: [C: 1] "key confirmed per https://office.wikimedia.org/w/index.php?title=User:CPettet_%28WMF%29&oldid=108184 UID confirmed on formey, sitting ne" [operations/puppet] - https://gerrit.wikimedia.org/r/118016 (owner: Dzahn) [18:33:01] (CR) AzaToth: [C: -1] "can drop the empty docs as well" (5 comments) [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 (owner: Ottomata) [18:33:09] !log bd808 purged l10n cache for 1.23wmf14 [18:33:10] lol logmsgbot , why do you double escape things you silly bot [18:33:15] ottomata: sorry, couldn't resist [18:33:17] Logged the message, Master [18:33:38] (CR) Gage: [C: 2] add account for Chase and add to admins::root add key RT #7004 [operations/puppet] - https://gerrit.wikimedia.org/r/118016 (owner: Dzahn) [18:35:05] !log bd808 purged l10n cache for 1.23wmf15 [18:35:14] Logged the message, Master [18:35:31] (CR) coren: [C: 2] Tool Labs: shadow master for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118120 (owner: coren) [18:36:11] Ack. [18:36:37] Is it okay if I push that chase changeset? [18:36:38] (CR) Alexandros Kosiaris: [C: -1] "Minor stuff. Other than that, it is almost ready for an LGTM" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/109088 (owner: Matanya) [18:38:57] greg-g: {{done}} https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Miscellaneous+eqiad&h=tin.eqiad.wmnet&jr=&js=&event=show&ts=0&v=186.588&m=disk_free&vl=GB&ti=Disk+Space+Available [18:39:06] (PS4) Matanya: gerrit: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109088 [18:39:06] oops coren looks like we had a race?
i just ran puppet-merge for #7004 and it only showed me chasemp's change, but after i typed 'yes' it also showed me toollabs changes [18:39:26] jgage: I just merged both. We just had a collision. :-) [18:39:46] bd808|deploy: 9gigs, not bad [18:40:18] coren: interesting! chase was just asking us "can a race happen during merging?" and then conveniently, one did ;) [18:41:05] So the answer is "Yes, but it's basically harmless if confusing". :-) [18:41:54] yeah [18:42:00] greg-g: It looks more dramatic here: https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=snapshot1002.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1394563282&v=7.708&m=disk_free&vl=GB&ti=Disk%20Space%20Available&z=large [18:42:36] bd808|deploy: very much so [18:42:45] * greg-g isn't quite sure what the snapshot hosts do [18:43:08] Hopefully that will make a.pergos happier [18:43:09] "hese hosts generate the XML dumps." [18:43:14] https://wikitech.wikimedia.org/wiki/Snapshot1 [18:43:18] (03CR) 10Alexandros Kosiaris: [C: 04-1] nfs: lint (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/109081 (owner: 10Matanya) [18:46:38] (03CR) 10Dzahn: [C: 032] [Planet Wikimedia] Add Vibha Bamba to English planet [operations/puppet] - 10https://gerrit.wikimedia.org/r/117659 (owner: 10Nemo bis) [18:48:36] greg-g: really done now [18:52:51] !log shut down mw28-mw57 [18:53:00] Logged the message, Master [18:53:53] mutante: You're on a decommissioning roll! [18:55:35] bd808, since you are on rt duty.. https://rt.wikimedia.org/Ticket/Display.html?id=6961 [18:55:50] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [18:56:03] (03CR) 10Alexandros Kosiaris: [C: 032] ganglia_new: remove lookupvar and replace with top scope @ var and fix scoping [operations/puppet] - 10https://gerrit.wikimedia.org/r/112889 (owner: 10Matanya) [18:56:08] gwicke: bd808 !== bblack :) [18:56:21] * gwicke slaps forehead [18:56:26] sorry 'bout that [18:56:35] no worries. 
[18:56:38] bblack, ^^ [18:57:11] * bd808 is older, balder and less root IRL [18:57:34] hehe [18:58:02] but more available [18:58:29] solution: just give bd808 root [18:58:39] !shut down mw59-74 (pmtpa row D,D3) [18:58:43] * gwicke has this theory that rt duty means 'hide from IRC for a week' [18:58:52] :) [18:59:06] greg-g: I thought you wanted me to write code for you ;) [18:59:40] If I had root I'd be too busy doing actually useful work to write much code [18:59:45] :( [18:59:55] scap in beta cluster is useful [19:00:02] grumble [19:00:43] look, just think about it like this: doing that beta cluster work is kinda like having root, it's just root not on production! [19:00:49] so, all the joy but none of the paging! [19:01:20] :P [19:02:16] Hey, who is it that "owns" dumps nowadays? [19:02:20] Ariel, right? [19:02:29] greg-g: First food. Then hand to hand combat with labs [19:02:38] Coren: yes [19:02:56] apergos: Got a minute? [19:03:06] yes [19:03:14] what's up? [19:03:20] !log shut down mw86-mw125 (sdtpa row A, A5) [19:03:28] Logged the message, Master [19:03:41] apergos: We probably want to stop the rsync to pmtpa now, and switch to eqiad. Your puppet class has already been applied to the new fileserver, what do you need on your end? [19:03:55] the gluster rsync? [19:04:11] apergos: dumps. [19:04:33] the copy of dumps to the labs gluster share? [19:04:44] Well, it's not gluster anymore but yeah. [19:04:53] can you point me to your puppet change? [19:05:01] * Coren points at the open grave marked 'gluster'. [19:05:48] PROBLEM - Host mw43 is DOWN: PING CRITICAL - Packet loss = 100% [19:05:58] there it is:) duh, random ones [19:06:01] um, sorry but can you give aa little more direction; you said the new fileserver has the puppet change... can you point me to that? 
[19:06:24] node /labstore100[12]\.eqiad\.wmnet/ in site.pp; has the rsync::server::module that /should/ be okay [19:06:28] when I see what is exported from where etc, I can then add a stanza to the dataset manifest [19:06:34] ACKNOWLEDGEMENT - Host mw43 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn go away [19:06:44] gwicke: can you save me some digging and show me where the deploy command is at that doesn't work? (the one that tries to use upstart) [19:07:08] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [19:07:13] bblack, service-restart and its new-style git deploy service restart both time out [19:07:14] (03CR) 10Alexandros Kosiaris: [C: 04-1] ganglia_new: lint clean (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/107128 (owner: 10Matanya) [19:07:20] bblack, and they don't work per box [19:07:27] Ah, that reminds me I should turn that back on. [19:08:05] bblack, the bug against salt is at https://bugzilla.wikimedia.org/show_bug.cgi?id=61882 [19:08:28] RECOVERY - Puppet freshness on labstore1001 is OK: puppet ran at Tue Mar 11 19:08:25 UTC 2014 [19:09:00] apergos: rsync destination should be 'labstore.svc.eqiad.wmnet' though, not labstoreX, to hit the currently active server. [19:09:21] gwicke: to be clear, though, the commands are correct, they just "time out" because sudo fails? [19:10:10] I'll need to change modules/dataset/manifests/cron/rsync/labs.pp and update modules/dataset/files/gluster/gluster-rsync-cron.sh [19:10:24] bblack: no [19:10:33] apergos: Want me to do it? I don't mind, so long as you point me at the right things. 
:-) [19:10:38] sudo is broken since our move from /etc/init.d/parsoid to upstart [19:10:41] might rename some things too since most of it is not gluster-specific [19:10:54] bblack, the salt problem is an unrelated salt bug [19:11:04] then let's leave it out of this discussion :) [19:11:13] once sudo is fixed we can work around it by using dsh [19:11:16] what command do you want to execute, which fails? :) [19:11:28] we want a way to reliably restart the service [19:11:30] well that class is the right class and that script is the right script, might want the script to become a template that takes the mount points, [19:11:46] ideally one that works on individual boxes as well as on all boxes [19:12:16] the class should say 'if pmtpa shovel in all the gluster related packages and mount, otherwise mount via nfs from X ost' [19:12:26] I can do it tomorrow morning or you can poke at it sooner if you like [19:13:14] gwicke: I'm not even sure how to parse your statement about "as well as on all boxes", nor am I confident what "the service" means explicitly [19:13:25] I don't do deploys, tell me in unix terms what you want :) [19:13:36] bblack, dsh -g parsoid restart parsoid [19:13:50] or sudo restart parsoid on a single box [19:14:14] the dsh would be run from sudo as well? [19:14:15] Coren: if you do make the change, stick me on a a reviewer (not that I nee to review it, just so it shows up in my dashboard and I'll note what you did for later) [19:14:33] bblack, it would be run as a normal user, so we'd add sudo to the host cmd [19:14:56] apergos: Should be straighforward enough. 
What Could Go Wrong?™ [19:15:01] so you want "dsh -g parsion sudo restart parsoid" to work [19:15:03] grrrr [19:15:09] you break it you own it :-P [19:15:09] s/parsion/parsoid/ [19:15:22] bblack, yes; and "sudo restart parsoid" on an individual box [19:15:34] well, I suspect one implies the other, unless I misunderstand [19:15:42] normally yes [19:22:05] (03PS3) 10Matanya: ganglia_new: lint clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/107128 [19:22:17] (03CR) 10jenkins-bot: [V: 04-1] ganglia_new: lint clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/107128 (owner: 10Matanya) [19:23:09] (03PS4) 10Matanya: ganglia_new: lint clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/107128 [19:23:09] death to jenkins [19:23:21] (03CR) 10jenkins-bot: [V: 04-1] ganglia_new: lint clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/107128 (owner: 10Matanya) [19:24:05] i rebased, what does he want? [19:24:17] more runners? [19:24:46] i guess so [19:24:54] where is hashar? [19:25:13] asleep [19:25:40] this one looks real [19:25:53] it's pretty quick to say -1 [19:26:00] follow the jenkins link [19:26:18] (i know "Working.." :p) [19:26:24] what mutante ? [19:27:17] i guess i missed something, what link mutante ? [19:27:20] when jenkins is slow (re: more runners) it will not vote -1 [19:27:26] it will just not vote [19:27:31] right [19:27:39] there are 3 jenkins tests, right [19:27:51] lint, syntax and the experimental one that doesnt vote [19:28:02] correct [19:28:09] check which of them is -1 and then there is link next to that [19:28:18] follow that and scroll down all the way [19:28:25] what is the last error there [19:28:37] please click the link above ... :) [19:28:48] (i can't really load the page right now, i just get "Working...") [19:28:57] yup [19:29:00] this is the point [19:29:06] Build failed. [19:29:06] This change was unable to be automatically merged with the current state of the repository. 
Please rebase your change and upload a new patchset. [19:29:15] this is what jenkins posted ^ [19:29:28] no test seemed to fail, and i did rebase [19:29:35] and when you click the rebase button it says "path conflict"? [19:29:49] yes [19:30:04] but when rebasing in my shell all is cool [19:30:09] then you have to manually do the rebase [19:30:15] eh, odd [19:30:17] joy [19:30:23] but i think i've been there before :p [19:30:29] and same thing.. [19:31:26] ok, manual it is [19:31:28] (CR) Alexandros Kosiaris: [C: -1] swift: lint (5 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/109625 (owner: Matanya) [19:31:41] no joy today ^ [19:32:02] gwicke: the existing rule for pre-upstart stuff was for the user jenkins-deploy and only in beta. Is that still true for the new command? [19:33:13] (because the ticket seems to indicate all members of group "parsoid", and doesn't mention beta. Also, nobody's in group "parsoid") [19:35:38] matanya: yeah... it was 1/5 this round [19:35:50] at least you got 3-4 this morning :-) [19:36:02] yes, there are days like this [19:36:31] !log re-deleting salt keys for pmtpa appservers [19:36:40] Logged the message, Master [19:40:21] (PS5) Matanya: ganglia_new: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107128 [19:44:49] (PS3) Matanya: swift: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109625 [19:45:37] those lints are so damn hard :) [19:48:04] :-) [19:54:31] (PS1) coren: Change dumps to labs for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118132 [19:56:49] (PS2) coren: Change dumps to labs for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/118132 [19:56:59] apergos: ^^ if you're interested.
[19:57:17] thanks, I"ll open the tabs for tomorrow [19:57:20] (03PS1) 10Jgreen: unpuppetize some fundraising-related symlinks, remove deprecated ssh keys, all for aluminium [operations/puppet] - 10https://gerrit.wikimedia.org/r/118134 [19:57:42] as long as the rsync jobs don't stack up with multiple ones running, it won't break anything if they don't run [19:57:58] so if there are any issues tomorrow it's not a big deal [19:58:30] kk, so I just self +2 this? [19:59:29] well please puppetd --test on the servers (dataset2 and dataset1001) [19:59:32] (03CR) 10Jgreen: [C: 032 V: 031] unpuppetize some fundraising-related symlinks, remove deprecated ssh keys, all for aluminium [operations/puppet] - 10https://gerrit.wikimedia.org/r/118134 (owner: 10Jgreen) [19:59:35] just ot make sure something isn't really really borken [19:59:51] apergos: That's SOP for me anyways. :-) [19:59:57] cool [20:00:25] (03CR) 10coren: [C: 032] "What's the worse that could happen?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/118132 (owner: 10coren) [20:03:16] Holey sheets; dumps2 is lucid? [20:04:47] killkillkill [20:05:06] It's dying anyways, seeing as it's in tampa. But still. :-) [20:05:42] at least it's not hardy! [20:06:01] I think we killed the last hardy last year. 
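On the point made at 19:57:42 (rsync jobs must not "stack up with multiple ones running"): one common way to keep cron-driven runs from overlapping is an exclusive, non-blocking lock file. The following is only a minimal Python sketch under stated assumptions. The lock path and the `run_exclusive` helper are hypothetical names made up for illustration, and this is not the actual cron script discussed in the channel.

```python
import fcntl
import os
import subprocess
import tempfile

# Hypothetical lock path, chosen for illustration only.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "dumps-rsync.lock")


def run_exclusive(command, lock_path=LOCK_PATH):
    """Run `command` unless another run already holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because a previous run is still in progress.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail at once instead of queueing.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.call(command)
```

A cron entry would call `run_exclusive` with the real rsync command line as a list of arguments. A skipped run simply returns None, which matches the remark in the log that nothing breaks if a run does not happen.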
[20:07:10] bblack, sorry was afk [20:10:46] there used to be a rule for /etc/init.d/parsoid in prod [20:12:11] it looks like we removed that from prod after the move to upstart as it was no longer functional anyway [20:12:44] it used to be for me IIRC, but it should now be for all users in the group parsoid [20:14:17] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [20:17:38] (03CR) 10Alexandros Kosiaris: [C: 032] mediawiki::sync one file{} statement per file [operations/puppet] - 10https://gerrit.wikimedia.org/r/118076 (owner: 10Hashar) [20:18:01] bblack, just checked and it seems that the parsoid deployers are indeed not in the parsoid group [20:23:24] (03PS1) 10coren: More tweaks to the labs rsync of public dumps [operations/puppet] - 10https://gerrit.wikimedia.org/r/118138 [20:24:30] gwicke: ok I'll do some more digging in puppet git history and see if I can get it back the way it was (+ upstart, +parsoidgroup). Who should be in the parsoid group? [20:25:29] bblack, just added a response on the ticket [20:25:46] right now there is a special rule in manifests/admins.php for roan and me [20:25:54] that's only to run commands as user parsoid though [20:26:02] right, I saw that [20:26:14] I assume that's for other things [20:26:23] ssastry is not included, but should as he is supposed to take over deployments [20:26:36] that's so that we can attach a debugger for example [20:26:45] ok [20:26:54] also note that ssastry already has deploy rights [20:26:58] akosiaris: thank you :-] [20:27:35] hashar: :-) [20:27:41] bblack, if we wanted to narrow down the right to restart services then making it depend on a specific per-service group would make sense [20:27:54] that does not seem to be common practice though [20:28:41] so maybe we can just add a sudo rule for group wikidev for now, to mirror the git-deploy rights? [20:28:48] well, either way. 
I just don't want to be repeating the list of users in multiple places [20:29:04] but that could be done with a variable in puppet as easily, and not mess with local *nix groups [20:29:48] is it generally true that all wikidev users are valid deployers of everything, parsoid included? [20:30:30] bblack, afaik yes [20:30:56] (03CR) 10coren: [C: 032] "Trivial fixes." [operations/puppet] - 10https://gerrit.wikimedia.org/r/118138 (owner: 10coren) [20:32:35] bblack, I don't know much about how the salt stack auth stuff works though [20:35:24] (03PS1) 10coren: Last tweaks to rsync of dumps to labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/118139 [20:38:04] (03CR) 10coren: [C: 032] "Even more trivial." [operations/puppet] - 10https://gerrit.wikimedia.org/r/118139 (owner: 10coren) [20:38:50] has anyone seen this before http://p.defau.lt/?fyG99SaYPY5oT6F4_st68Q [20:39:13] any suggestions? [20:47:10] cmjohnson1: eww. [20:47:10] thats when you git review? [20:47:10] i think i fixed ...i rebased. we'll see if it works [20:47:10] So you, myself, and mark need to do a call with cyrus one [20:47:10] and get our layout set [20:47:30] ok [21:00:28] (03CR) 10Nikerabbit: mediawiki::sync one file{} statement per file (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/118076 (owner: 10Hashar) [21:28:28] (03PS1) 10Odder: Let AbuseFilter block users on Spanish Wikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/118206 [21:28:59] hmpf [21:57:21] (03CR) 10Hashar: "Forgot to add Marc as a reviewer :] We can catch on wednesday when our TZ overlap." [operations/puppet] - 10https://gerrit.wikimedia.org/r/118071 (owner: 10Hashar) [22:08:13] (03CR) 10BryanDavis: [C: 04-1] "I think you hit a transient failure by NFS rather than a systemic problem when the directory wasn't created. I have been testing in eqiad." 
[operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt) [22:10:48] (PS1) Cmjohnson: Db1064/65 [operations/dns] - https://gerrit.wikimedia.org/r/118210 [22:16:12] (PS2) Cmjohnson: Adding dns for db1064 and db1065 Change-Id: I33cc77d09bc5176135b9f50a81d28890bf37e75e [operations/dns] - https://gerrit.wikimedia.org/r/118210 [22:17:04] robh: fixed it ..but had to clone the repos again [22:17:44] when in doubt; reclone. [22:17:47] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [22:17:48] (CR) Cmjohnson: [C: 2] Adding dns for db1064 and db1065 Change-Id: I33cc77d09bc5176135b9f50a81d28890bf37e75e [operations/dns] - https://gerrit.wikimedia.org/r/118210 (owner: Cmjohnson) [22:24:34] I'm having a lot of issues loading wiki pages today.. it appears mostly on bits urls [22:24:49] taking forever (3,6,10,30 seconds) to load [22:25:07] sometimes the rest of the page will load first but not always. [22:56:55] (PS11) Dzahn: turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674 [22:57:08] (CR) jenkins-bot: [V: -1] turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674 (owner: Dzahn) [23:15:17] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [23:29:47] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [23:32:17] (PS12) Dzahn: turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674 [23:42:22] (CR) Dzahn: [C: -1] "crap, rebased totally wrong" [operations/puppet] - https://gerrit.wikimedia.org/r/108674 (owner: Dzahn) [23:48:17] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [23:54:08] (PS13) Dzahn: turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674
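The tungsten alerts above quote two thresholds for reqstats.5xx: warn=250 in the RECOVERY message and crit=500 in the PROBLEM message. The following is a minimal Python sketch of how a check could map a reading to a state. It assumes that a reading at or above a threshold trips it; the real check's comparison operator and averaging window are not shown in the log, and `classify` is a hypothetical name.

```python
WARN_PER_MIN = 250.0  # from "warn=250.000" in the RECOVERY message
CRIT_PER_MIN = 500.0  # from "crit=500.000000" in the PROBLEM message


def classify(rate_per_min, warn=WARN_PER_MIN, crit=CRIT_PER_MIN):
    """Map a 5xx-per-minute reading to a monitoring state.

    Assumption: thresholds are inclusive (>=); the actual check may differ.
    """
    if rate_per_min >= crit:
        return "CRITICAL"
    if rate_per_min >= warn:
        return "WARNING"
    return "OK"
```

Under these assumptions a reading has to fall below the warn level, not just below crit, before the state returns to OK, which is consistent with the recovery messages quoting the warn value.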