[00:00:04] RoanKattouw, ^d, Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150225T0000).
[00:00:40] operations: request: use spamassassin to filter as well - https://phabricator.wikimedia.org/T83030#1064425 (Dzahn)
[00:00:44] Nothing to swat?
[00:00:47] Krenair: a large percentage of old RT stuff is a dupe of some old BZ issue actually
[00:00:55] yeah, not too surprising
[00:00:56] I wouldn't be surprised by 20-30%
[00:01:26] the magic of duplicate tracking systems I guess
[00:02:20] operations, Wikimedia-General-or-Unknown: RT password reset function broken (sends mail w/ blank passwd) - https://phabricator.wikimedia.org/T32412#1064431 (Dzahn) declining this because we stopped using RT
[00:02:39] operations, Wikimedia-General-or-Unknown: RT password reset function broken (sends mail w/ blank passwd) - https://phabricator.wikimedia.org/T32412#1064435 (Dzahn) Open>declined a:Dzahn
[00:03:27] operations, Wikimedia-General-or-Unknown, Ipv6: Enable IPv6 on donate.wikimedia.org - https://phabricator.wikimedia.org/T73267#1064440 (Dzahn) a:Dzahn
[00:08:00] if only people would not have phone numbers in their mail footers
[00:09:18] operations: Changing address of Võro Vikipeediä - https://phabricator.wikimedia.org/T84537#1064443 (Dzahn)
[00:10:54] yeah :(
[00:13:04] operations: have the ip ranges from modules/ntp/templates/ntp.conf.erb pull from network.pp - https://phabricator.wikimedia.org/T82962#1064451 (Dzahn)
[00:21:05] operations: configure pt-kill for wikiuser on coredbs - https://phabricator.wikimedia.org/T82802#1064462 (Dzahn)
[00:22:03] (PS1) GWicke: Add a daily incremental repair job [puppet/cassandra] - https://gerrit.wikimedia.org/r/192732
[00:22:22] operations, Pybal: pybal health checks are ipv4 even for ipv6 vips - https://phabricator.wikimedia.org/T82747#1064470 (Dzahn)
[00:23:13] operations, Monitoring: give icinga a "login" link - https://phabricator.wikimedia.org/T82499#1064473 (Dzahn)
[00:23:29] operations, Monitoring: give icinga a "login" link - https://phabricator.wikimedia.org/T82499#901765 (Dzahn) obsolete since recently icinga-admin has been removed (!?)
[00:25:54] operations: package and puppetize ishmael - https://phabricator.wikimedia.org/T82225#1064479 (Dzahn)
[00:26:21] operations: Have sane syslog logging - https://phabricator.wikimedia.org/T82287#1064482 (Dzahn)
[00:26:46] operations: for the love of all that is good, puppetize udpmcast - https://phabricator.wikimedia.org/T82092#1064485 (Dzahn)
[00:27:14] operations: update the multicast purging documentation - https://phabricator.wikimedia.org/T82096#1064488 (Dzahn)
[00:28:21] operations: create a test for multicast relay - https://phabricator.wikimedia.org/T82038#1064492 (Dzahn)
[00:29:23] operations: add ES cluster to noc.wikimedia.org/dbtree reporting - https://phabricator.wikimedia.org/T81251#1064495 (Dzahn)
[00:34:29] operations, network: hook up and dns oob access - https://phabricator.wikimedia.org/T80847#1064512 (Dzahn)
[00:36:22] operations, Analytics, Mobile-Apps, Wikipedia-App-Android-App, Wikipedia-App-iOS-App: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1064522 (dr0ptp4kt) p:Triage>Normal
[00:41:02] operations, Wikimedia-General-or-Unknown, Ipv6: Enable IPv6 on donate.wikimedia.org - https://phabricator.wikimedia.org/T73267#1064526 (Dzahn) a:Dzahn>None
[00:42:04] operations, Wikimedia-General-or-Unknown, Ipv6: Enable IPv6 on donate.wikimedia.org - https://phabricator.wikimedia.org/T73267#766158 (Dzahn) a:Jgreen
[00:45:49] (PS1) Ori.livneh: webperf: update VE metric module [puppet] - https://gerrit.wikimedia.org/r/192733
[00:47:11] (PS2) Ori.livneh: webperf: update VE metric module [puppet] - https://gerrit.wikimedia.org/r/192733
[00:48:51] (CR) Ori.livneh: [C: 2 V: 2] webperf: update VE metric module [puppet] - https://gerrit.wikimedia.org/r/192733 (owner: Ori.livneh)
[01:12:58] (PS5) Dzahn: etherpad: remove SSL stanza [puppet] - https://gerrit.wikimedia.org/r/181413 (https://phabricator.wikimedia.org/T85788) (owner: John F. Lewis)
[01:14:14] (CR) Dzahn: [C: 2] etherpad: remove SSL stanza [puppet] - https://gerrit.wikimedia.org/r/181413 (https://phabricator.wikimedia.org/T85788) (owner: John F. Lewis)
[01:16:57] PROBLEM - Cassandra database on restbase1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (cassandra), command name java, args CassandraDaemon
[01:16:57] PROBLEM - Cassandra database on restbase1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (cassandra), command name java, args CassandraDaemon
[01:16:57] RECOVERY - Cassandra database on restbase1002 is OK: PROCS OK: 1 process with UID = 111 (cassandra), command name java, args CassandraDaemon
[01:18:11] PROBLEM - etherpad.wikimedia.org HTTPS on zirconium is CRITICAL: CRITICAL - Cannot make SSL connection
[01:18:26] ^ that's me.. on it
[01:18:36] i waited until now on purpose so less users
[01:19:05] (PS3) Dzahn: etherpad->misc-web-lb.eqiad [dns] - https://gerrit.wikimedia.org/r/181269 (https://phabricator.wikimedia.org/T85788) (owner: John F. Lewis)
[01:19:48] (CR) Dzahn: [C: 2] etherpad->misc-web-lb.eqiad [dns] - https://gerrit.wikimedia.org/r/181269 (https://phabricator.wikimedia.org/T85788) (owner: John F. Lewis)
[01:24:56] RECOVERY - Cassandra database on restbase1001 is OK: PROCS OK: 1 process with UID = 111 (cassandra), command name java, args CassandraDaemon
[01:49:01] (PS1) Dzahn: remove etherpad HTTPS monitoring [puppet] - https://gerrit.wikimedia.org/r/192741
[01:50:02] (PS2) Dzahn: remove etherpad HTTPS monitoring [puppet] - https://gerrit.wikimedia.org/r/192741 (https://phabricator.wikimedia.org/T85788)
[01:59:40] (CR) Dzahn: [C: 2] remove etherpad HTTPS monitoring [puppet] - https://gerrit.wikimedia.org/r/192741 (https://phabricator.wikimedia.org/T85788) (owner: Dzahn)
[02:10:58] me wonders which host the LVS howto in https://wikitech.wikimedia.org/wiki/LVS#Pool_or_depool_hosts is talking about
[02:14:30] hm i'd like to know the answer to that too
[02:15:00] gwicke: any host that is in pybal i would say
[02:15:40] you could wget http://config-master.eqiad.wmnet/pybal/eqiad/ for example if you want the config
[02:16:08] the "/home/wikipedia/conf" part is outdated
[02:16:32] it gets it from config-master instead of noc
[02:17:38] mutante, thanks! this works: curl http://config-master.eqiad.wmnet/pybal/eqiad/restbase
[02:17:47] :)
[02:20:25] is there a correct host/path pair I should replace this with?
[02:20:41] the /home/wikipedia thing?
[02:20:44] yes
[02:21:22] is this still in some local host filesystem, or is it now maintained elsewhere (puppet?)
[02:21:33] private git repo in /srv/pybal-config on config-master.eqiad which is currently an alias for palladium, the puppet master
[02:21:43] aha
[02:21:49] but not going through gerrit
[02:21:50] !log l10nupdate Synchronized php-1.25wmf17/cache/l10n: (no message) (duration: 00m 02s)
[02:21:57] Logged the message, Master
[02:22:57] !log LocalisationUpdate completed (1.25wmf17) at 2015-02-25 02:21:53+00:00
[02:23:02] Logged the message, Master
[02:23:21] RECOVERY - HHVM queue size on mw1229 is OK: OK: Less than 30.00% above the threshold [10.0]
[02:23:34] mutante, would this be correct? "Edit the files in /srv/pybal-config/$colo on config-master.$colo and wait a minute - PyBal will fetch the file over HTTP."
[02:24:34] gwicke: .. and don't forget to git commit locally after editing
[02:24:43] kk, will add
[02:24:48] thank you
[02:26:10] RECOVERY - HHVM busy threads on mw1229 is OK: OK: Less than 30.00% above the threshold [76.8]
[02:28:10] mutante, done: https://wikitech.wikimedia.org/wiki/LVS#Pool_or_depool_hosts
[02:28:54] I brought up both cassandra & restbase on restbase1006 and was wondering why it still doesn't see any traffic
[02:29:23] gwicke: that looks good, thanks. i should add though that i have an issue that might be related
[02:29:43] i recently reinstalled an appserver and put it back in pybal
[02:29:49] but it still doesnt get any traffic
[02:30:13] i still have to make a ticket for that
[02:30:34] IIRC ori was working on dynamic pooling / depooling recently
[02:31:03] https://phabricator.wikimedia.org/T86542#1051915
[02:31:04] might be worth checking if it could be related
[02:31:25] yes
[02:34:19] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 03s)
[02:34:24] Logged the message, Master
[02:35:27] !log LocalisationUpdate completed (1.25wmf18) at 2015-02-25 02:34:23+00:00
[02:35:31] Logged the message, Master
[02:43:06] PROBLEM - HHVM queue size on mw1229 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [80.0]
[02:43:28] PROBLEM - HHVM busy threads on mw1229 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [115.2]
[03:02:17] operations, Wikimedia-Etherpad: move etherpad behind misc-web - https://phabricator.wikimedia.org/T85788#1064719 (Dzahn) merged all the patches above. it has been switched. etherpad.wikimedia.org is an alias for misc-web-lb.eqiad.wikimedia.org. we don't do caching here. (return (pass);)
[03:03:11] operations, HTTPS, Patch-For-Review: Put all zirconium vhosts behind misc varnish cluster - https://phabricator.wikimedia.org/T60048#1064723 (Dzahn)
[03:03:12] operations, Wikimedia-Etherpad: move etherpad behind misc-web - https://phabricator.wikimedia.org/T85788#1064721 (Dzahn) Open>Resolved https://gerrit.wikimedia.org/r/#/c/192741/
[03:04:48] operations: disable contacts.wikimedia.org? - https://phabricator.wikimedia.org/T84158#1064724 (Dzahn)
[03:07:07] RECOVERY - HHVM queue size on mw1229 is OK: OK: Less than 30.00% above the threshold [10.0]
[03:07:38] operations, HTTPS, Patch-For-Review: Put all zirconium vhosts behind misc varnish cluster - https://phabricator.wikimedia.org/T60048#1064730 (Dzahn) all the blocking tasks have been resolved. no service names are pointing to zirconium anymore. the reference to contacts included, that is also behind mis...
[03:10:56] operations, HTTPS, Patch-For-Review: Put all zirconium vhosts behind misc varnish cluster - https://phabricator.wikimedia.org/T60048#610144 (Dzahn)
[03:11:11] operations: remove public IP from zirconium - https://phabricator.wikimedia.org/T90676#1064741 (Dzahn) p:Triage>Normal
[03:11:27] PROBLEM - HHVM queue size on mw1229 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0]
[03:11:36] operations: remove public IP from zirconium - https://phabricator.wikimedia.org/T90676#1064731 (Dzahn)
[03:11:37] operations, HTTPS, Patch-For-Review: Put all zirconium vhosts behind misc varnish cluster - https://phabricator.wikimedia.org/T60048#1064744 (Dzahn) Open>Resolved
[04:31:17] (CR) Tim Landscheidt: "htmlpurifier/ is 4.5.0, php-htmlpurifier is 4.3.0; according to http://htmlpurifier.org/news/, there have been (minor) security fixes in t" [puppet] - https://gerrit.wikimedia.org/r/148172 (owner: Tim Landscheidt)
[04:38:24] !log s/db1001/dbproxy1001/g on zirconium drupal contacts. seems unpuppetized
[04:38:30] Logged the message, Master
[04:53:47] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail
[04:58:07] operations: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1064781 (Springle) NEW
[05:08:23] (PS2) GWicke: Add a daily incremental repair job [puppet/cassandra] - https://gerrit.wikimedia.org/r/192732
[05:12:16] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[05:13:38] (PS1) GWicke: Increase the JVM tenuring threshold [puppet/cassandra] - https://gerrit.wikimedia.org/r/192760
[05:25:36] (PS1) GWicke: Increase the new generation to 1/2 heap [puppet] - https://gerrit.wikimedia.org/r/192762
[05:28:35] (PS2) GWicke: Increase the new generation to 1/4 heap [puppet] - https://gerrit.wikimedia.org/r/192762
[05:44:07] (PS1) KartikMistry: Enable Content Translation in minwiki and uzwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192764
[05:46:07] (PS2) KartikMistry: Enable Content Translation in minwiki and uzwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192764
[06:00:25] <_joe_> springle: hey, nah I was already asleep
[06:17:21] !log twentyafterfour Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 02s)
[06:17:30] Logged the message, Master
[06:20:35] that log was bogus, just me testing but not actually syncing
[06:22:22] !log 06:20 < twentyaft> that log was bogus, just me testing but not actually syncing
[06:22:26] Logged the message, Master
[06:22:29] :)
[06:22:47] greg-g: thanks, I wasn't quite sure what is appropriate in logs
[06:23:33] context like "I'm about to do X" or "that was just me trying something" is good
[06:24:11] any luck?
[06:24:15] * greg-g reads bug update
[06:28:06] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:06] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:23] <_joe_> passenger o'clock
[06:28:27] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:46] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:07] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:18] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:37] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:48] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:46] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:46:27] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:47:28] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:49:07] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:49:37] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[07:04:45] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Feb 25 07:03:41 UTC 2015 (duration 3m 40s)
[07:04:52] Logged the message, Master
[07:09:47] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[07:13:20] (PS5) Nemo bis: Point rel=canonical to HTTPS for all ru projects [mediawiki-config] - https://gerrit.wikimedia.org/r/192502 (https://phabricator.wikimedia.org/T90527) (owner: Chmarkine)
[07:16:35] (PS3) KartikMistry: Enable Content Translation in minwiki and uzwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192764
[07:26:28] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[07:26:37] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:28:02] (PS1) KartikMistry: CX: Enable ru in source, min and uz in target wikis [puppet] - https://gerrit.wikimedia.org/r/192769
[07:35:24] akosiaris: if you don't mind, please review my two patches. thanks!
[07:38:20] (CR) KartikMistry: [C: -1] "To be merged along with, https://gerrit.wikimedia.org/r/192764" [puppet] - https://gerrit.wikimedia.org/r/192769 (owner: KartikMistry)
[07:43:32] apergos: users reporting certificate issues on wikipedia
[07:44:12] mostly chrome on windows
[07:45:43] <_joe_> matanya: which domain?
[07:45:54] en.wikipedia, commons
[07:46:02] i think it is client side, but not sure yet
[07:46:07] <_joe_> newest chrome on mac works correctly
[07:46:14] on linux too
[07:46:22] don't have windows off hand to check
[07:46:27] <_joe_> where are these reports?
[07:46:30] <_joe_> me neither
[07:46:39] <_joe_> I don't have a single windows pc
[07:47:01] this thread: https://he.wikipedia.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%A4%D7%93%D7%99%D7%94:%D7%9E%D7%96%D7%A0%D7%95%D7%9F#.D7.99.D7.A9_.D7.91.D7.A2.D7.99.D7.94_.D7.A2.D7.9D_.D7.AA.D7.9E.D7.95.D7.A0.D7.95.D7.AA.3F
[07:47:13] https://saucelabs.com/opensauce/
[07:47:43] <_joe_> ori: do you by chance have a windows pc?
[07:47:58] no, but i was suggesting using sauce labs for that
[07:48:02] <_joe_> matanya: that doesn't help me a lot
[07:48:13] oh, sorry _joe_
[07:48:17] the report in the thread matanya linked to is specific to images
[07:48:34] <_joe_> ori: there was a chrome stable update today
[07:48:35] the thread says images are not loading in chrome on windows 7
[07:48:47] and several people say "me too"
[07:48:51] <_joe_> mh, and any reason why?
[07:49:03] <_joe_> ori: thanks
[07:49:10] matanya: one person
[07:49:18] one on my tp
[07:49:22] and one mailed me
[07:49:28] and two on the thread
[07:49:37] <_joe_> just on hewiki?
[07:49:42] so far
[07:49:57] the error they get is: NET::ERR_CERT_AUTHORITY_INVALID
[07:50:09] <_joe_> I hope google isn't doing something specific for israel
[07:50:36] <_joe_> which means the CA for commons is invalid for them?
[07:50:36] superfish! :)
[07:50:41] yes
[07:50:50] <_joe_> matanya: no like revoking a ton of certs
[07:51:02] <_joe_> matanya: can you point me to a specific url where they see the problem?
[07:51:07] yes
[07:51:28] https://en.wikipedia.org/wiki/Main_Page
[07:51:30] <_joe_> or maybe it's a windows update
[07:51:35] <_joe_> matanya: oh lol :P
[07:52:00] basically, no images on any wiki is the latest update
[07:52:13] <_joe_> I figured that
[07:52:27] <_joe_> so for example the image https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/David_Duchovny_2011_Shankbone.JPG/100px-David_Duchovny_2011_Shankbone.JPG
[07:52:36] <_joe_> which is on the enwiki home
[07:52:47] <_joe_> if they open it in a new tab, what does happen?
[07:52:53] works for me: http://i.imgur.com/s9PEUxD.jpg
[07:52:53] sec
[07:53:01] chrome 40 on windows 7 (via sauce labs)
[07:53:08] ori: https ?
[07:53:13] * ori tries
[07:53:32] <_joe_> ori: I guess this is windows7 in Israel related
[07:54:04] <_joe_> matanya: the upload url and enwp home page offer the exact same cert to users
[07:54:20] https too: http://i.imgur.com/Mrz9rjQ.jpg
[07:54:35] <_joe_> matanya: also, did they try in an incognito window?
[07:54:51] weird
[07:55:25] wikipedia should have a "last seen" feature
[07:57:05] <_joe_> matanya: so I'd ask your users: 1) to try to load https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/David_Duchovny_2011_Shankbone.JPG/100px-David_Duchovny_2011_Shankbone.JPG directly 2) to try the enwp main page and this url in an incognito window
[07:57:20] asking
[07:59:22] <_joe_> btw https://productforums.google.com/forum/#!topic/chrome/hYYrTwXXitc
[07:59:36] http://i.imgur.com/95E77jI.jpg
[07:59:52] <_joe_> lol
[08:01:03] he said he will be back next to the pc in an hour, will update when i get a response
[08:01:24] <_joe_> matanya: thanks, if it's very widespread we will hear others complaining
[08:01:33] probably :)
[08:01:40] <_joe_> with pitchforks
[08:01:54] thanks for your prompt response and help
[08:02:43] <_joe_> well, ori did much more than me, and I'm actually in working hours :P
[08:04:14] * matanya thanks ori too
[08:05:41] * ori boggles at http://www.amervets.com/library.htm
[08:06:20] <_joe_> ori: that's for your safety
[08:06:21] paraphrase: "to make this data more broadly accessible, we now only allow access through a windows executable file"
[08:06:42] best quote: "For combat theater purposes the software is completely stand-alone, utilizing only one Windows system interface (.dll) which already exists in the Windows system inventory (winsock.dll)."
[08:07:15] <_joe_> ori: have you by chance read http://justsecurity.org/20304/transcript-nsa-director-mike-rogers-vs-yahoo-encryption-doors/ ?
[08:07:38] <_joe_> the quote about access via a windows executable kind of brought that back to my mind
[08:08:02] (PS1) Springle: Unbreak dbtree [software] - https://gerrit.wikimedia.org/r/192771
[08:08:34] I read it now; what an infuriating interview
[08:08:58] (CR) Springle: [C: -1] Unbreak dbtree (1 comment) [software] - https://gerrit.wikimedia.org/r/192771 (owner: Springle)
[08:09:26] <_joe_> "I won't call them backdoors"
[08:09:36] last night i read snowden's AMA on reddit, where he referenced kaspersky's reports about "the equation group" (nsa). fascinating stuff: http://www.kaspersky.com/about/news/virus/2015/equation-group-the-crown-creator-of-cyber-espionage
[08:09:45] <_joe_> oh yes
[08:09:57] GReAT has been able to recover two modules which allow reprogramming of the hard drive firmware of more than a dozen of the popular HDD brands. This is perhaps the most powerful tool in the Equation group’s arsenal and the first known malware capable of infecting the hard drives.
[08:09:59] <_joe_> well, for some frightening value of fascinating, yes :)
[08:10:40] "For most hard drives there are functions to write into the hardware firmware area, but there are no functions to read it back. It means that we are practically blind, and cannot detect hard drives that have been infected by this malware” – warns Costin Raiu, Director of the Global Research and Analysis Team at Kaspersky Lab.
[08:10:46] <_joe_> ori: that's why TAILS or freepto are a good bet if you want to have secure communications
[08:13:41] <_joe_> btw my admiration for snowden grows day by day
[08:13:50] <_joe_> (re: reddit's AMA)
[08:14:20] <_joe_> and I think he's right in pointing out the gemalto hack is much much more serious than that super-malware
[08:14:52] <_joe_> basically nsa/gchq can intercept any phone communication on the air and decrypt it warrantless
[08:14:56] <_joe_> in a lot of countries
[08:16:51] right. super-malware is used to go after a specific target, rather than weaken security and privacy for everybody
[08:18:57] <_joe_> it's still worrisome, esp for activists, but still
[08:31:49] (PS2) Giuseppe Lavagetto: memcached: use distro version on modern distros [puppet] - https://gerrit.wikimedia.org/r/192575
[08:32:05] (CR) Giuseppe Lavagetto: [C: 2] memcached: use distro version on modern distros [puppet] - https://gerrit.wikimedia.org/r/192575 (owner: Giuseppe Lavagetto)
[08:36:06] RECOVERY - Memcached on mc2014 is OK: TCP OK - 0.043 second response time on port 11211
[09:09:58] was there any further update re: https issues for the (one or two?) he users?
[09:13:18] could be related to Superfish / Lenovo? might want to ask them if they're on Lenovo machines
[09:13:30] greetings
[09:14:24] (background: Lenovo has been including HTTPS-sniffing crap on their machines called Superfish, it's a local proxy and they install an alternate Superfish Cert in the browser to make it transparent. It blew up in the news recently and now people are scrambling to remove it...
[09:14:56] ... but I see some google hits of users causing themselves problems by deleting the Superfish Cert, but not actually uninstalling the software, thus leading to cert errors trying to reach legit HTTPS sites)
[09:15:07] hi godog :)
[09:16:03] hey bblack, still up? :)
[09:16:11] matanya: ^ the above :)
[09:16:20] godog: sorta, maybe not for long
[09:17:04] bblack: thanks, that was one first question, not a lenovo
[09:17:11] doh
[09:17:50] bblack: hehe it makes me chuckle when I go to bed and wake up and us folks are still up (cc ori)
[09:18:55] matanya: do we have any confirmation it's a broader problem? I checked google translate of that report, I see the original guy and one other one-liner backing him up. Or some real debug info from the browser on why it thinks the cert is invalid, or what cert it's seeing (fingerprint, screenshot of SSL info from clicking padlock, etc?)
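One way to collect the fingerprint asked for above is to connect without validation and hash whatever leaf certificate the server (or an intercepting proxy) presents. The sketch below is only an illustration and was not used in the channel; the helper names `format_fingerprint` and `fetch_leaf_certificate` are hypothetical, and it relies solely on Python's standard `ssl`, `socket`, and `hashlib` modules.

```python
import hashlib
import socket
import ssl


def format_fingerprint(der_bytes: bytes) -> str:
    """Return the SHA-256 digest of DER certificate bytes as colon-separated hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_leaf_certificate(host: str, port: int = 443, timeout: float = 10.0) -> bytes:
    """Fetch the DER-encoded leaf certificate presented for `host`.

    Validation is switched off on purpose: the goal is to see what is being
    presented, even when the browser would reject it.
    """
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

Running `format_fingerprint(fetch_leaf_certificate("upload.wikimedia.org"))` on an affected machine and on a known-good one, then comparing the two strings, would show whether the affected client is being handed a different certificate.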
[09:19:25] not in the last hour
[09:19:32] they are both away from the pc
[09:19:47] the third that mailed me said he will look at it in the evening
[09:20:36] <_joe_> bblack: they do see enwp but not commons... which is strange
[09:20:44] <_joe_> the cert chain is the same
[09:20:45] it's entirely possible that the error message is because someone is actually trying to hijack SSL and present a false cert and failing. I wonder if they have a common ISP or something like that.
[09:20:55] operations, HTTPS, HTTPS-by-default, Patch-For-Review: Point rel=canonical to HTTPS for all Russian Wikimedia projects - https://phabricator.wikimedia.org/T90527#1065014 (Aklapper)
[09:38:05] operations, ops-eqiad, RESTBase, Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1065050 (fgiunchedi) still seeing the errors :( ``` Feb 25 09:37:08 restbase1006 kernel: [50303.994353] mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0...
[09:51:17] RECOVERY - puppet last run on mc2014 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[09:54:16] PROBLEM - Host mc2014 is DOWN: PING CRITICAL - Packet loss = 100%
[09:55:30] <_joe_> that's me
[09:55:34] <_joe_> sorry
[09:57:17] RECOVERY - Host mc2014 is UP: PING OK - Packet loss = 0%, RTA = 43.08 ms
[10:00:33] <_joe_> !log restarted hhvm on mw1229, stuck in __lll_lock_wait from HPHP::hphp_session_init
[10:00:38] Logged the message, Master
[10:00:56] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.091 second response time
[10:00:56] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 66880 bytes in 0.294 second response time
[10:08:37] RECOVERY - HHVM queue size on mw1229 is OK: OK: Less than 30.00% above the threshold [10.0]
[10:08:59] RECOVERY - HHVM busy threads on mw1229 is OK: OK: Less than 30.00% above the threshold [76.8]
[10:21:07] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:25:36] RECOVERY - NTP on labstore1001 is OK: NTP OK: Offset -0.004239320755 secs
[10:38:36] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[10:48:57] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[10:48:57] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[10:53:11] <_joe_> apergos: ^^
[10:53:43] grrr
[10:53:47] sec, sorry
[10:54:27] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[10:54:27] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[10:54:31] rerunning the salt command now :-/
[11:42:24] PROBLEM - Host mc2002 is DOWN: PING CRITICAL - Packet loss = 100%
[11:45:33] RECOVERY - Host mc2002 is UP: PING OK - Packet loss = 0%, RTA = 43.77 ms
[12:58:38] <_joe_> the gerrit bot is not here?
[13:07:25] _joe_: It seems that the Tool Labs redis is non-functional at the moment, which would probably affect that bot.
[13:10:09] <_joe_> anomie: meh, I don't have root on tools I guess
[13:11:29] <_joe_> confirmed, no way to fix/debug this
[13:16:48] apergos: the maint-announce queue is a mess
[13:17:04] please fix, it's impossible for me to see what's actually relevant here right now...
[13:19:37] _joe_: Co.ren is online in #wikimedia-labs now, if you feel like joining the discussion
[13:20:03] gotcha
[13:20:23] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667
[13:21:08] <_joe_> anomie: I'm off for lunch
[13:21:17] <_joe_> :)
[13:25:23] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[13:49:12] apergos: thanks :)
[13:49:20] no one has actually cared for it for quite a while now :(
[13:49:24] not done and yw!
[13:49:30] e.g. the equinix moratorium that expired jan 4th...
[13:49:33] yep
[13:49:44] still more to go, should be only 3-4 things left at the end
[13:49:56] nod
[13:50:23] there was an NTT announcement that arrived a few minutes ago that prompted me to look at the queue
[13:57:37] (PS5) KartikMistry: WIP: Do not use registry and fallback to config.default.js [puppet] - https://gerrit.wikimedia.org/r/191263 (https://phabricator.wikimedia.org/T89803)
[14:31:30] (PS1) Odder: Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689)
[14:31:39] (CR) jenkins-bot: [V: -1] Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689) (owner: Odder)
[14:33:03] (PS2) Odder: Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689)
[14:35:03] (CR) Odder: [C: -1] "Still requires a hsbwiki community member to provide us with a link to some sort of community consensus for this; I was unable to find an" [mediawiki-config] - https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689) (owner: Odder)
[15:00:04] chasemp: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150225T1500). Please do the needful.
[15:00:28] nothing to see here today
[15:23:42] PROBLEM - Disk space on cp1064 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=86%):
[15:31:53] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: puppet fail
[15:36:42] Ops-Access-Requests, operations: Requesting access to contint-admins for legoktm - https://phabricator.wikimedia.org/T90275#1065906 (Dzahn) a:Dzahn
[15:37:06] (PS2) Dzahn: Add legoktm to contint-admins [puppet] - https://gerrit.wikimedia.org/r/191954 (https://phabricator.wikimedia.org/T90275) (owner: Hashar)
[15:38:10] (CR) Dzahn: [C: 2] Add legoktm to contint-admins [puppet] - https://gerrit.wikimedia.org/r/191954 (https://phabricator.wikimedia.org/T90275) (owner: Hashar)
[15:40:02] ottomata: T85724 how much warning do they need? monday?
[15:40:44] !log welcome legoktm as a contint admin
[15:41:18] Ops-Access-Requests, operations: Requesting access to contint-admins for legoktm - https://phabricator.wikimedia.org/T90275#1065951 (Dzahn) Open>Resolved Notice: /Stage[main]/Admin/Admin::Hashuser[legoktm]/Admin::User[legoktm]/User[legoktm]/ensure: created Notice: /Stage[main]/Admin/Admin::Hashuser[...
[15:43:34] springle: monday should be fine, i can send email today
[15:44:24] RECOVERY - Disk space on cp1064 is OK: DISK OK
[15:45:33] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail
[15:47:53] PROBLEM - DPKG on cp1064 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:48:33] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[15:49:39] !log replacing PEM1 on cr1-eqiad
[15:50:30] manybubbles, marktraceur, ^d: Who wants to SWAT this morning?
[15:50:43] Oof. I don't think I do.
[15:50:47] twkozlowski, greg-g, jzerebecki: Ping for SWAT in about 9 minutes.
[15:51:06] anomie: ppong
[15:51:43] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[15:52:10] !log replacing PEM2 cr1-eqiad
[15:53:12] Ops-Access-Requests, operations: Add joal to deployment group - https://phabricator.wikimedia.org/T90731#1065999 (Ottomata) NEW
[15:53:46] (PS1) Ottomata: Add joal to deployment group [puppet] - https://gerrit.wikimedia.org/r/192810 (https://phabricator.wikimedia.org/T90731)
[15:54:10] (PS2) Ottomata: Add joal to deployment group [puppet] - https://gerrit.wikimedia.org/r/192810 (https://phabricator.wikimedia.org/T90731)
[15:54:22] Ops-Access-Requests, operations, Patch-For-Review: Add joal to deployment group - https://phabricator.wikimedia.org/T90731#1066014 (Ottomata) Change here: https://gerrit.wikimedia.org/r/#/c/192810/
[15:54:23] I guess I'll SWAT, since no one else wants to
[15:54:25] <^d> anomie: I can if you're too busy
[15:54:30] Ops-Access-Requests, operations, Patch-For-Review: Add joal to deployment group - https://phabricator.wikimedia.org/T90731#1066017 (Ottomata) a:ArielGlenn
[15:54:39] <^d> We're always too busy :)
[15:54:44] ^d: Up to you, if you want it you can have it
[15:56:03] <^d> I can. These are all things I was looking at yesterday for easy config stuffs
[15:56:08] ok
[15:56:13] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[15:56:14] <^d> Well, 2 of 3
[15:57:22] <^d> Ah all 3, these are easy
[15:58:38] (CR) GWicke: "@Nik, lowering CMSInitiatingOccupancyFraction is indeed the other option I was considering as well. It seemed to work well in my initial c" [puppet/cassandra] - https://gerrit.wikimedia.org/r/192760 (owner: GWicke)
[16:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150225T1600).
[16:00:19] anomie: Ready when you are!
[16:00:33] (CR) Chad: [C: 2] Temporarily re-enable uploads on Marathi Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/192784 (https://phabricator.wikimedia.org/T87771) (owner: Odder)
[16:00:37] twkozlowski: ^d will be SWATting today, FYI
[16:00:42] (CR) Chad: [C: 2] Set $wgArticleCountMethod to 'any' on zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/192367 (https://phabricator.wikimedia.org/T53604) (owner: Odder)
[16:00:44] (CR) Chad: [C: 2] Point rel=canonical to HTTPS for all ru projects [mediawiki-config] - https://gerrit.wikimedia.org/r/192502 (https://phabricator.wikimedia.org/T90527) (owner: Chmarkine)
[16:01:23] <^d> Now, we wait
[16:03:16] (PS2) GWicke: Reduce the pressure on CMS GC to avoid stop-the-world [puppet/cassandra] - https://gerrit.wikimedia.org/r/192760
[16:04:02] operations, ops-eqiad: cr1-eqiad power supply fan failure - https://phabricator.wikimedia.org/T89224#1066021 (Cmjohnson) PEM's arrived and were swapped. Returning failed PEM's... UPS Tracking #'s are 1Z7AF3889021293382 and 1Z7AF3889021293935
[16:04:16] operations, ops-eqiad: cr1-eqiad power supply fan failure - https://phabricator.wikimedia.org/T89224#1066022 (Cmjohnson) Open>Resolved
[16:04:50] Hi swatters. Is deleting a wiki swattable? I've got such a patch open for several months now and don't know the process for getting it merged.
[16:04:54] (03Merged) 10jenkins-bot: Temporarily re-enable uploads on Marathi Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192784 (https://phabricator.wikimedia.org/T87771) (owner: 10Odder) [16:04:56] (03Merged) 10jenkins-bot: Set $wgArticleCountMethod to 'any' on zhwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192367 (https://phabricator.wikimedia.org/T53604) (owner: 10Odder) [16:04:59] (03Merged) 10jenkins-bot: Point rel=canonical to HTTPS for all ru projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192502 (https://phabricator.wikimedia.org/T90527) (owner: 10Chmarkine) [16:05:24] <^d> Glaisher: deleting? Or closing? [16:05:32] deleting [16:05:44] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [16:05:50] looks easy enough... just adding it to deleted.dblist is what is needed says docs at wikitech [16:05:58] !log demon Synchronized commonsuploads.dblist: (no message) (duration: 00m 07s) [16:06:29] It's a wiki with no content. [16:06:29] <^d> twkozlowski, jzerebecki: Your stuff is all live [16:06:56] <^d> twkozlowski: Running updateArticleCount.php for zhwikinews now [16:07:15] <^d> {{done}} [16:07:34] <^d> Glaisher: link to gerrit change? [16:07:56] https://gerrit.wikimedia.org/r/#/c/171219/ [16:07:58] ^d: Yay, mrwiki seems to work. [16:08:14] can confirm a few, 5 still cached old value, but i don't think purging makes sense for that [16:08:39] PROBLEM - HHVM rendering on mw1145 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:09:00] PROBLEM - Apache HTTP on mw1145 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:09:13] <^d> Glaisher: Don't we want to change apache first? [16:09:59] right. I think we need to remove it from the chapters vhost [16:10:29] but mediawiki-config should be done first, right? [16:10:33] I don't know. 
[16:11:23] <^d> No, I think the redirect should be in place first [16:11:44] <^d> Get that done by ops and I think the wmf-config stuff is swattable then yeah [16:12:07] Alright. Thanks [16:12:12] anomie: heya, sorry, here, oh, and so is twkozlowski :) [16:12:47] greg-g: I confirm we didn't break anything, so yay \o/ [16:12:51] :) [16:12:59] twkozlowski: hi, btw, long time no see :) [16:17:48] PROBLEM - HHVM busy threads on mw1145 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [86.4] [16:17:59] PROBLEM - HHVM queue size on mw1145 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [80.0] [16:20:09] PROBLEM - Host mc2005 is DOWN: PING CRITICAL - Packet loss = 100% [16:23:39] RECOVERY - Host mc2005 is UP: PING OK - Packet loss = 0%, RTA = 43.02 ms [16:24:38] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: puppet fail [16:36:03] 6operations, 7HTTPS, 3HTTPS-by-default, 5Patch-For-Review: Point rel=canonical to HTTPS for all Russian Wikimedia projects - https://phabricator.wikimedia.org/T90527#1066091 (10Chmarkine) >>! In T90527#1065142, @Nemo_bis wrote: >> Since Russian Wikimedia projects are HTTPS only and have HSTS enabled > > C... [16:37:49] 6operations, 7HTTPS, 3HTTPS-by-default, 5Patch-For-Review: Point rel=canonical to HTTPS for all Russian Wikimedia projects - https://phabricator.wikimedia.org/T90527#1066095 (10Chmarkine) 5Open>3Resolved [16:38:04] greg-g, would like to add something extra to the swat, just a simple VE JS change. Will that be okay? 
[16:38:11] <^d> godog: I have a small change for es-tool if you've got a minute [16:39:55] Krenair: sure [16:40:05] Krenair: just put it on the wiki and do it :) [16:40:05] need to make the core submodule update :/ [16:40:27] and git thinks this is a helpful time to re-pack my copy of core [16:40:39] :) [16:44:19] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:44:24] (03PS2) 10Giuseppe Lavagetto: wmf-reimage: perform the first puppet run [puppet] - 10https://gerrit.wikimedia.org/r/192793 [16:44:35] (03PS4) 10Chad: es-tool: support IPv6 addresses in (un)ban-node [puppet] - 10https://gerrit.wikimedia.org/r/191357 [16:44:37] <_joe_> godog: ^^ [16:47:15] > git pull [16:47:22] "Hey, let me pack everything again to speed things up for you!" [16:47:37] [ sits there for a few minutes ] [16:47:39] "Oh yeah, already up-to-date." [16:47:49] * Krenair facedesk [16:47:53] <^d> silly git [16:59:51] !log krenair Synchronized php-1.25wmf18/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTemplateDialog.js: https://gerrit.wikimedia.org/r/#/c/192750/ (duration: 00m 06s) [16:59:54] James_F, ^ [16:59:58] Thanks. :-) [17:10:21] (03CR) 10GWicke: "Intuitively this sounds wrong. Normally counters that aren't decremented are expected to monotonically increase. They aren't the same as r" [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [17:10:43] 6operations, 10ops-eqiad, 10Incident-20150205-SiteOutage: Restore asw2-a5-eqiad redundant power - https://phabricator.wikimedia.org/T88792#1066156 (10faidon) 5Open>3Resolved Since the server reshuffling was done last week, Chris reseated the PSU today and now it's properly detected. The alarm is cleared. [17:13:47] ^d _joe_ sorry in a call [17:13:53] <^d> no worries [17:18:32] <_joe_> godog: np [17:29:48] (03CR) 10Ori.livneh: [C: 031] "Rate metrics are already averaged." 
[puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [17:32:04] (03CR) 10Filippo Giunchedi: [C: 04-1] wmf-reimage: perform the first puppet run (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/192793 (owner: 10Giuseppe Lavagetto) [17:32:58] (03CR) 10Filippo Giunchedi: [C: 031] es-tool: support IPv6 addresses in (un)ban-node [puppet] - 10https://gerrit.wikimedia.org/r/191357 (owner: 10Chad) [17:33:24] <_joe_> godog: meh, sorry [17:33:52] _joe_: np, long day for everyone :) [17:33:58] <_joe_> having to look at boot screens while writing code is not a good idea :/ [17:34:08] (03PS3) 10Giuseppe Lavagetto: wmf-reimage: perform the first puppet run [puppet] - 10https://gerrit.wikimedia.org/r/192793 [17:35:23] (03PS1) 10BBlack: nginx service: ensure => running [puppet/nginx] - 10https://gerrit.wikimedia.org/r/192825 [17:35:25] (03PS1) 10BBlack: nginx: tmpfs for /var/lib/nginx [puppet/nginx] - 10https://gerrit.wikimedia.org/r/192826 [17:35:45] bblack: question, how can i know what version of varnish do we run in production? 
[17:35:48] 6operations, 10Incident-20150205-SiteOutage, 6MediaWiki-Core-Team, 10Wikimedia-Logstash, 5Patch-For-Review: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1066220 (10bd808) [17:35:49] 6operations, 10Incident-20150205-SiteOutage, 6MediaWiki-Core-Team, 10Wikimedia-Logstash, 5Patch-For-Review: Prototype Monolog and rsyslog configuration to ship log events from MediaWiki to Logstash - https://phabricator.wikimedia.org/T88870#1066219 (10bd808) 5Open>3Resolved [17:36:30] nuria: if you mean for documentation purposes: we run 3.0.6 (with many local patches, but those don't affect the basics here) [17:36:59] bblack: ok, got it, i saw there is a 4.0 but will install that one locally to try stuff [17:38:00] nuria: if you haven't looked at our production VCL code before, it's rather complicated and everything pretty much affects everything. [17:38:00] (03PS1) 10John F. Lewis: convert zirconium to private network [puppet] - 10https://gerrit.wikimedia.org/r/192827 (https://phabricator.wikimedia.org/T90676) [17:38:09] (03PS1) 10John F. Lewis: zirconium->wmnet dns [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) [17:38:27] bblack: I looked at that yesterday, boy... a source of joy that code [17:38:37] :) [17:39:15] 6operations, 5Patch-For-Review: remove public IP from zirconium - https://phabricator.wikimedia.org/T90676#1066238 (10JohnLewis) a:3JohnLewis Above patches should hopefully do it correctly. I did this by reading other patch changes to puppet and dns for public->private host changes so, :) [17:39:21] bblack: I also looked at the very many extensions the guys from fastly seem to have developed, seems like their business is completely dependent on vcl [17:39:29] typically new complex feature work going into varnish ends up requiring a lengthy review process to vet all the interactions.
[17:39:47] bblack: right, makes total sense [17:39:55] but we can provide some input/help [17:41:36] (03CR) 10BBlack: [C: 032] nginx service: ensure => running [puppet/nginx] - 10https://gerrit.wikimedia.org/r/192825 (owner: 10BBlack) [17:42:11] 6operations, 10MediaWiki-Database: Add a "datasets" database to analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T85277#1066287 (10Ironholds) 5Open>3Resolved a:3Ironholds [17:42:33] (03PS6) 10Glaisher: Redirect ve.wikimedia.org to wikimedia.org.ve [puppet] - 10https://gerrit.wikimedia.org/r/170925 (https://phabricator.wikimedia.org/T57737) [17:44:07] (03CR) 10Glaisher: "Rebased against the current state of repo and removed from wikimedia-chapter vhost in the new patch." [puppet] - 10https://gerrit.wikimedia.org/r/170925 (https://phabricator.wikimedia.org/T57737) (owner: 10Glaisher) [17:45:05] (03CR) 10GWicke: "The point is that count metrics currently aren't. Example:" [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [17:46:33] 6operations, 10Wikimedia-General-or-Unknown, 5Patch-For-Review: Delete vewikimedia and redirect it to wikimedia.org.ve - https://phabricator.wikimedia.org/T57737#1066351 (10Glaisher) [17:48:53] (03CR) 10GWicke: [C: 04-1] "Further evidence that averaging count metrics would break existing expectations / behavior:" [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [17:49:11] Hi all, what does the log message mean in "wfFixSessionID: PHP's built in entropy is disabled or not sufficient, overriding session id generation using our cryptrand source." should I have a package that i’m missing? [17:51:20] godog: do you think the cron job in https://gerrit.wikimedia.org/r/#/c/192732/ is good to go? [17:51:25] it talks about session.entropy_file [17:51:43] I found in wfCheckEntropy a few checks. 
Wasn’t aware I needed to configure that [17:52:05] gwicke: haven't fully read the last comments, will do [17:53:03] 6operations, 10Wikimedia-General-or-Unknown, 5Patch-For-Review: Delete vewikimedia and redirect it to wikimedia.org.ve - https://phabricator.wikimedia.org/T57737#1066388 (10Glaisher) This task is now about deleting vewikimedia and redirecting it to http://wikimedia.org.ve as requested at T72579#745839. * R... [17:53:11] godog: thx! [17:54:46] (03CR) 10Ori.livneh: "So: will the replacement for txstatsd meters be statsite gauges?" [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [17:55:11] (03PS1) 10BBlack: bump nginx module for ensure => running [puppet] - 10https://gerrit.wikimedia.org/r/192830 [17:55:13] (03PS1) 10BBlack: final jessie-cache VM tuning, for now [puppet] - 10https://gerrit.wikimedia.org/r/192831 [17:55:15] (03PS1) 10BBlack: kernel 3.19 for jessie caches [puppet] - 10https://gerrit.wikimedia.org/r/192832 [17:55:17] (03PS1) 10BBlack: mkfs for ext4 varnish filesystems on jessie [puppet] - 10https://gerrit.wikimedia.org/r/192833 [17:56:37] (03CR) 10Brion VIBBER: [C: 031] "Do we have enough consensus for someone with +2 to poke this? :)" [puppet] - 10https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: 10Aklapper) [17:56:42] (03CR) 10jenkins-bot: [V: 04-1] mkfs for ext4 varnish filesystems on jessie [puppet] - 10https://gerrit.wikimedia.org/r/192833 (owner: 10BBlack) [17:58:03] (03PS2) 10John F. 
Lewis: zirconium->wmnet dns [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) [17:58:26] (03PS2) 10BBlack: mkfs for ext4 varnish filesystems on jessie [puppet] - 10https://gerrit.wikimedia.org/r/192833 [17:59:16] (03CR) 10jenkins-bot: [V: 04-1] mkfs for ext4 varnish filesystems on jessie [puppet] - 10https://gerrit.wikimedia.org/r/192833 (owner: 10BBlack) [18:00:18] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [18:01:35] <_joe_> I did puppet-merge manually on strontium [18:01:37] (03CR) 10Ori.livneh: "On reflection, I think GWicke is right." [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:04:04] 6operations, 7Graphite, 5Patch-For-Review: replace txstatsd - https://phabricator.wikimedia.org/T90111#1066406 (10GWicke) > change aggregation policy for .count metrics to average Averaging `.count` metrics should be incorrect no matter what model statsite is using. If it follows what we are doing so far... [18:04:37] (03CR) 10Dzahn: [C: 032] "yes" [puppet] - 10https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: 10Aklapper) [18:05:41] (03CR) 10Ricordisamoa: "Nooooooo... I was just going to +1 too ;(" [puppet] - 10https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: 10Aklapper) [18:06:35] (03CR) 10Dzahn: "That counts as another +1 :)" [puppet] - 10https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: 10Aklapper) [18:09:00] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1066435 (10chasemp) @fgiunchedi and I had a long involved discussion about options today :) I think we both came away with a bit to think on. 
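[editor's note] The disagreement about `.count` metrics in the review comments above can be illustrated with a toy roll-up. This is only a sketch under stated assumptions: it assumes each stored point is the number of events seen in one interval, the sample numbers are made up, and `rollup` is a hypothetical helper, not Graphite's or statsite's actual aggregation code. It shows why summing keeps totals intact when points are downsampled, while averaging suits values that are already rates.

```python
from statistics import mean


def rollup(points, factor, aggregate):
    """Downsample a series by combining each run of `factor` points with `aggregate`."""
    return [aggregate(points[i:i + factor]) for i in range(0, len(points), factor)]


# Hypothetical per-minute event counts (ten minutes of data, invented for illustration).
per_minute_counts = [12, 8, 10, 15, 5, 9, 11, 10, 14, 6]

# Roll the one-minute points up into five-minute points, once per policy.
by_sum = rollup(per_minute_counts, 5, sum)
by_avg = rollup(per_minute_counts, 5, mean)

# Summing preserves the total number of events; averaging does not.
print("sum policy:", by_sum, "total =", sum(by_sum))
print("avg policy:", by_avg, "total =", sum(by_avg))
```

With the averaging policy the rolled-up points read as "events per minute" rather than "events per five minutes", which is fine for a rate but silently changes the meaning of a count.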
[18:18:10] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1066460 (10GWicke) @chasemp: Do you mind summarizing the main questions / options that you identified? [18:18:24] mutante: thanks! [18:18:35] 10Ops-Access-Requests, 6operations: Requesting access to contint-admins for legoktm - https://phabricator.wikimedia.org/T90275#1066461 (10Legoktm) Thank you! [18:19:51] legoktm: you're welcome [18:20:06] jzerebecki: do you know why wikidata Q numbers are different in prod and labs? [18:20:19] 6operations, 10hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1066479 (10RobH) 3NEW a:3RobH [18:20:26] http://www.wikidata.org/wiki/Q5 vs. http://wikidata.beta.wmflabs.org/wiki/Q5 vs. http://wikidata.beta.wmflabs.org/wiki/Q44076 [18:20:47] already causes issues now [18:20:53] (03CR) 10Ori.livneh: "Should be rebased so that it doesn't depend on https://gerrit.wikimedia.org/r/#/c/192791/" [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:20:58] mutante: no connection of the data. totally different databases [18:21:09] same with test.wikidata.org [18:21:15] jzerebecki: hrmm.. that's unfortunate. it means things like this need code changes after being tested [18:21:20] https://gerrit.wikimedia.org/r/#/c/192709/1 [18:21:34] A Q number is just a unique sequence id [18:22:01] (03Abandoned) 10Filippo Giunchedi: graphite: do not aggregate counters by sum [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:22:08] i just think that you shouldn't need to change code after it's already on beta [18:22:19] just to get it on prod [18:22:41] seems a bit against the point of beta [18:22:55] beta is not staging [18:23:10] meaning beta is not a copy of prod [18:23:22] so there is wmf-config/InitialiseSettings-staging.php ??
[18:23:32] beta is just a shared integration testing environment [18:23:32] (03PS2) 10Filippo Giunchedi: graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) [18:23:41] nope we don't have a staging environment [18:23:45] then where is staging? [18:23:45] (03PS3) 10Filippo Giunchedi: graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) [18:23:48] heh [18:24:34] if we don't even expect beta to be like prod he could as well change InitialiseSettings.php right away [18:24:36] (03CR) 10jenkins-bot: [V: 04-1] graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:24:43] (03CR) 10Filippo Giunchedi: [C: 04-1] wmf-reimage: perform the first puppet run (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/192793 (owner: 10Giuseppe Lavagetto) [18:26:10] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, not thrilled to change multiple things at once but not a blocker in this case" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192760 (owner: 10GWicke) [18:26:16] (03CR) 10Dzahn: [C: 031] "the change is correct for beta, i just think it's unfortunate that beta and prod are different since it will need another change to fix al" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192709 (owner: 10Jdlrobson) [18:26:40] mutante: I think we all agree that having a real staging environment is important; 'just' need to figure out a way to get there [18:26:45] it should not be called "-labs" then [18:27:03] let's call it "-beta" [18:27:05] +500 for a staging environment [18:27:18] 6operations, 10hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1066534 (10Tnegrin) Hi Rob/Ori -- Should we keep vanadium as a 
spare? We had mentioned having a failover box handy - perhaps we should get two new boxes? -Toby [18:27:27] (03CR) 10Filippo Giunchedi: Add a daily incremental repair job (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192732 (owner: 10GWicke) [18:28:07] 6operations, 10ops-codfw, 3wikis-in-codfw: Console on mc2001 is unresponsive - https://phabricator.wikimedia.org/T90559#1066538 (10Papaul) i checked all the settings for the console, everything looks good. can you please try to connect again. thanks [18:29:29] (03CR) 10GWicke: Add a daily incremental repair job (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192732 (owner: 10GWicke) [18:30:29] 6operations, 10ops-codfw, 3wikis-in-codfw: Move network cable to the other port on codfw memcached hosts - https://phabricator.wikimedia.org/T90456#1066554 (10Papaul) the cable on those servers are switched , can you please confirm that it is working so I can resolve this task. thanks [18:30:48] (03CR) 10Filippo Giunchedi: [C: 031] Add a daily incremental repair job [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192732 (owner: 10GWicke) [18:34:58] (03CR) 10Filippo Giunchedi: "btw I think the conclusion is right, but the ever increasing restbase counter is not proof of that, there's less than 7d days of 1m data t" [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:35:28] mutante: the data on test.wikidata and wikidata.beta is mostly trash, just entered for tests. that commit is configuration that refers to identifier of data people entered. it is like putting in wikipedia article names. it only works better with wikidata because of convention to have them remain stable ids (no reuse, etc.). 
[18:36:13] mutante: which means yes such configuration changes usually don't make sense to go through beta nor test [18:36:33] 6operations, 10hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1066568 (10RobH) Vanadium has the following: - dell poweredge r310 - single cpu Intel(R) Xeon(R) CPU X3450 @ 2.67GHz - 8 GB Memory - Dual 500 GB Hard disks As such, any of the misc class spare systems... [18:37:27] (03CR) 10Chad: [C: 032 V: 032] Upgrade phabricator plugins to add add-project action [gerrit/plugins] - 10https://gerrit.wikimedia.org/r/192613 (https://phabricator.wikimedia.org/T89967) (owner: 10QChris) [18:39:22] 7Blocked-on-Operations, 6operations, 10Analytics, 6Mobile-Apps, and 2 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1066600 (10dr0ptp4kt) [18:39:22] (03CR) 10Ori.livneh: "Yeah, the counters end up being not very useful when you follow the spec." [puppet] - 10https://gerrit.wikimedia.org/r/192791 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:40:04] (03PS4) 10Filippo Giunchedi: graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) [18:40:45] (03CR) 10Ori.livneh: [C: 031] graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:40:52] (03CR) 10jenkins-bot: [V: 04-1] graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:41:26] clearly I'm being dense, thanks jenkins [18:42:10] (03PS5) 10Filippo Giunchedi: graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) [18:42:55] (03PS2) 
10BBlack: bump nginx module for ensure => running [puppet] - 10https://gerrit.wikimedia.org/r/192830 [18:42:57] (03PS2) 10BBlack: final jessie-cache VM tuning, for now [puppet] - 10https://gerrit.wikimedia.org/r/192831 [18:42:59] (03PS3) 10BBlack: mkfs for ext4 varnish filesystems on jessie [puppet] - 10https://gerrit.wikimedia.org/r/192833 [18:43:01] (03PS2) 10BBlack: kernel 3.19 for jessie caches [puppet] - 10https://gerrit.wikimedia.org/r/192832 [18:46:21] (03CR) 10BBlack: [C: 032] bump nginx module for ensure => running [puppet] - 10https://gerrit.wikimedia.org/r/192830 (owner: 10BBlack) [18:46:30] 7Blocked-on-Operations, 6operations, 10Analytics, 6Mobile-Apps, and 3 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1066658 (10dr0ptp4kt) [18:46:37] (03CR) 10BBlack: [C: 032] final jessie-cache VM tuning, for now [puppet] - 10https://gerrit.wikimedia.org/r/192831 (owner: 10BBlack) [18:47:15] (03CR) 10BBlack: [C: 032] kernel 3.19 for jessie caches [puppet] - 10https://gerrit.wikimedia.org/r/192832 (owner: 10BBlack) [18:48:13] (03PS1) 10MaxSem: Revert "Pull WG on WD for now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 [18:49:12] twentyafterfour, can you deploy ^^^ after switching enwiki? [18:50:57] (03CR) 10Hoo man: "Could you please add a task to this with a brief explanation what this does?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [18:52:04] twentyafterfour: Don't forget to merge https://gerrit.wikimedia.org/r/192604 before cutting the branches [18:52:53] MaxSem: ok [18:53:05] thanks:) [18:53:06] hoo: I won't forget [18:53:48] (03CR) 10GWicke: "@Filippo, mind going all the way to +2 so that we can evaluate this while still in load testing?" 
[puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192760 (owner: 10GWicke) [18:54:10] twentyafterfour: :) [18:54:19] (03CR) 10GWicke: "@Filippo, would be great to merge this so that we can run it at least once over night before declaring the cluster stable." [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192732 (owner: 10GWicke) [18:55:21] (03CR) 10MaxSem: "https://trello.com/c/nXel0ldz/29-change-wikigrok-frontend-to-send-responses-to-wikidata-org-instead-of-local-wiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [18:56:24] 7Blocked-on-Operations, 6operations, 10RESTBase, 10hardware-requests, and 2 others: RESTBase production hardware - 4 of 6 ready - https://phabricator.wikimedia.org/T76986#1066712 (10GWicke) [18:57:06] (03CR) 10Hoo man: "That link doesn't work for me :/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [18:57:25] 7Blocked-on-Operations, 6operations, 10RESTBase, 10hardware-requests, and 2 others: RESTBase production hardware - 5 of 6 ready - https://phabricator.wikimedia.org/T76986#1066734 (10GWicke) [18:57:32] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] "yep" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192732 (owner: 10GWicke) [18:57:49] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Reduce the pressure on CMS GC to avoid stop-the-world [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/192760 (owner: 10GWicke) [18:58:32] (03CR) 10MaxSem: "Interestingly, it doesn't work for me either in Chrome. Must be a Trello bug, try FF." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [18:58:56] (03PS1) 10Filippo Giunchedi: cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/192849 [19:00:04] twentyafterfour, greg-g, legoktm: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150225T1900). Please do the needful. 
[19:00:20] (03CR) 10Lydia Pintscher: "It doesn't work in Firefox for me either." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [19:00:26] (03PS1) 10GWicke: Update the cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/192850 [19:00:41] gwicke: clash there [19:01:01] I'm what?? [19:01:02] godog: ah, yeah [19:01:07] mid-air collision [19:01:09] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Update the cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/192850 (owner: 10GWicke) [19:01:10] let me abandon [19:01:20] too late ;) [19:01:39] haha that's fine [19:02:08] (03Abandoned) 10Filippo Giunchedi: cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/192849 (owner: 10Filippo Giunchedi) [19:02:15] legoktm: haaaahhhhaaaa [19:02:16] gwicke: done [19:02:52] godog: thanks! [19:03:13] would you prefer to wait with https://gerrit.wikimedia.org/r/#/c/192762/ until we have results from the gc parameter changes? [19:04:38] ok getting ready to do the deploy. hoo: I'll merge your patch now [19:10:45] branching 1.25wmf19 [19:11:58] using any of your scripts this time, twentyafterfour ? [19:13:44] (03PS1) 10Rush: phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 [19:14:15] gwicke: yeah it'd be great before/after comparisons [19:14:32] godog: kk [19:14:35] (03CR) 10jenkins-bot: [V: 04-1] phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 (owner: 10Rush) [19:14:53] mutante: so when I try and ssh into gallium, I'm getting "channel 0: open failed: administratively prohibited: open failed ssh_exchange_identification: Connection closed by remote host" any ideas? [19:15:17] godog: unfortunately there will be other changes as well (more wikis), might want to repeat this change as a lab experiment on the test cluster [19:15:29] mutante: I'm using proxycommand which works fine for tin/terbium/etc... 
[19:15:59] legoktm: have you been granted access to gallium? [19:16:10] YuviPanda|zzz: yes, https://gerrit.wikimedia.org/r/#/c/191954/ [19:16:19] greg-g: I haven't tested them so no [19:16:24] twentyafterfour: :) [19:16:35] ah, right [19:17:02] legoktm: ah, right, it's not gallium.eqiad.wmnet , it's gallium.wikimedia.org [19:17:14] would that be it? [19:17:18] !log restarted cassandra on restbase1003 with new GC settings from puppet [19:17:37] oh, so I don't go through a bastion? [19:18:03] legoktm: you can, but you don't have to in this case [19:18:23] legoktm: if you do you need to change the config to use .wikimedia.org [19:18:55] you can use "Host *.wikimedia.org *.wmnet" in ssh config [19:19:17] I think you might need ^ anyway [19:19:27] I can't proxycommand to virt1000 without it, for example [19:20:21] hmm, it's hanging at debug1: Connecting to gallium.wikimedia.org [208.80.154.135] port 22. now :/ [19:21:13] hmm. i just found this "In the current system they all can access zuul.eqiad.wmnet which is the public IP of gallium." [19:21:21] legoktm: "ssh zuul.eqiad.wmnet" works for me too [19:21:35] so you could just connect to that with existing config [19:21:38] ah, that works for me as well :) [19:21:44] I'll just use that then [19:21:45] thanks! [19:21:48] alright, cool [19:22:31] this was from https://phabricator.wikimedia.org/T86171 which you might find interesting anyways (diagram about contint architecture and stuff) [19:25:38] (03PS2) 10Rush: phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 [19:26:30] (03CR) 10jenkins-bot: [V: 04-1] phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 (owner: 10Rush) [19:26:36] (03CR) 10Dzahn: [C: 031] "of course it also needs the DNS change but lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/192827 (https://phabricator.wikimedia.org/T90676) (owner: 10John F. 
Lewis) [19:26:54] make-wmf-branch fail: https://etherpad.wikimedia.org/p/mmodell [19:31:21] (03CR) 10Dzahn: "robh: does the private IP look right?" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) (owner: 10John F. Lewis) [19:31:22] ^d: ^ [19:31:55] (03CR) 10Dzahn: "the "public1-a-eqiad to private1-1-eqiad" part" [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) (owner: 10John F. Lewis) [19:32:15] <^d> looking [19:32:21] twentyafterfour: Use --set-upstream-to=... instead of -u [19:32:58] <^d> That'll work [19:33:00] (03PS3) 10Rush: phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 [19:34:30] (03CR) 10RobH: "I made an in line comment that while the actual change is fine, I agree that using up the random open IPs within the already used space is" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) (owner: 10John F. Lewis) [19:34:56] (03PS4) 10Rush: phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 [19:36:38] (03CR) 10Rush: [C: 032] phab role cleanup and puppetize phabtools.conf [puppet] - 10https://gerrit.wikimedia.org/r/192856 (owner: 10Rush) [19:36:51] hoo: set-upstream-to doesn't exist in the version of git on tin [19:37:09] git version 1.7.9.5 [19:37:45] uh, pre-2.0 git [19:37:46] MaxSem: can you please expand on the patch for wikidata? we still don't know what it actually does and it'd be nice to know a bit more before it gets deployed [19:38:23] it deploys WikiGrok to wd.o to record (and, in future, apply) responses in one place [19:38:40] (03PS2) 10RobH: rewriting techblog.wikimedia.org to blog.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192704 [19:38:49] ok but no statements are added to wikidata yet? 
[19:38:55] yep:) [19:39:00] ok thanks [19:39:11] so I don't know what to do [19:39:11] that makes it a bit clearer ;-) [19:39:19] (03PS3) 10John F. Lewis: zirconium->wmnet dns [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) [19:39:27] Revert https://gerrit.wikimedia.org/r/#/c/170025/ [19:39:31] don't worry, _that_ will be communicated eeell in advance:P [19:39:46] (03CR) 10John F. Lewis: "moved to .199 which was the first available one I could see with a skim" [dns] - 10https://gerrit.wikimedia.org/r/192828 (https://phabricator.wikimedia.org/T90676) (owner: 10John F. Lewis) [19:42:12] twentyafterfour: see reedy's comment ^ [19:42:15] ohai Reedy :) [19:42:52] problem is make-wmf-branch doesn't know how to resume mid-stream [19:42:57] yeah [19:43:04] you have to hack it [19:43:32] i usually just add a temp flag to find the wanted starting point [19:44:36] godog: created a dashboard at http://grafana.wikimedia.org/#/dashboard/db/cassandra-heap [19:44:47] Orr... [19:46:00] (03CR) 10BBlack: [C: 031] rewriting techblog.wikimedia.org to blog.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192704 (owner: 10RobH) [19:47:47] !log restarted restbase1005 with new GC settings [19:48:43] * aude waves [19:48:55] wb aude [19:50:19] bblack: thanks for review, I'll now be able to merge it @ 2 my time =] [19:51:49] 6operations, 10Wikimedia-Blog, 5Patch-For-Review: add techblog.wikimedia.org redirection to blog.wikimedia.org to redirects - https://phabricator.wikimedia.org/T90638#1067047 (10RobH) I've gotten reviews and feedback from Faidon (with corrections), implemented corrections and gotten a +1 from Brandon. As su... [19:55:05] (03PS1) 10Dzahn: static Bugzilla: explain cgi links are redirects [puppet] - 10https://gerrit.wikimedia.org/r/192865 (https://phabricator.wikimedia.org/T85140) [19:56:28] (03CR) 10John F. 
Lewis: [C: 031] "Yeah" [puppet] - 10https://gerrit.wikimedia.org/r/192865 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [20:01:51] Hi all, what does the log message mean in "wfFixSessionID: PHP's built in entropy is disabled or not sufficient, overriding session id generation using our cryptrand source."... Is there a package that I would need to install? I dont see anything about hash/crack/crypt in github.com/wikimedia/operations-puppet mediawiki module. The GlobalFunctions.php around wfCheckEntropy talks about session.entropy_* what should I put there? [20:02:06] reverted that change. greg-g, Reedy, aude, anyone care to +2? https://gerrit.wikimedia.org/r/#/c/192866/ [20:02:55] twentyafterfour: Even non git-2.0 has git --set-upstread [20:02:59] just use that? [20:03:27] ? [20:04:35] (03PS2) 10Dzahn: static Bugzilla: explain cgi links are redirects [puppet] - 10https://gerrit.wikimedia.org/r/192865 (https://phabricator.wikimedia.org/T85140) [20:04:36] $ git --version [20:04:36] git version 1.7.1 [20:04:44] $ git branch --set-upstream origin/master [20:04:44] Branch origin/master set up to track local branch fooooo. [20:04:48] twentyafterfour: ^ [20:04:52] that git is even older [20:05:29] twentyafterfour: :( [20:05:52] but ok and shall test on tin + resubmit [20:06:48] something furthermore is needed in checkoutMediawiki afaik [20:06:51] als8 [20:06:53] o* [20:08:12] aude: hoo: if you want a related bug for that, it should be T87036 [20:08:27] that would convert tin to trusty which would upgrade git [20:09:29] mutante: ♥ for killing "Needs Volunteer"! 
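Editorial sketch of the git version issue discussed above (19:32 to 20:05). `--set-upstream-to` was added in git 1.8.0, so it is missing from the 1.7.9.5 on tin. The older `--set-upstream` flag takes the local branch first and the upstream second. With a single argument it creates a new local branch of that name, which is what the 20:04:44 output shows. The helper names below are hypothetical and not part of make-wmf-branch; the branch names in the usage note are the ones from the paste.

```python
import re


def parse_git_version(text):
    """Return (major, minor, patch) parsed from `git --version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def set_upstream_argv(version, branch, upstream):
    """Build the git command that makes `branch` track `upstream`.

    Hypothetical helper: it only builds the argument list, it does not run git.
    """
    if version >= (1, 8, 0):
        # Modern spelling, available from git 1.8.0 onwards.
        return ["git", "branch", f"--set-upstream-to={upstream}", branch]
    # Pre-1.8.0 spelling: local branch first, upstream second.
    return ["git", "branch", "--set-upstream", branch, upstream]
```

For the version reported on tin, `set_upstream_argv(parse_git_version("git version 1.7.9.5"), "fooooo", "origin/master")` yields the old-style command with both arguments spelled out.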
[20:09:43] mutante: :) [20:10:03] Yeah, upgrading the two would be awesome [20:10:08] andre__: i liked "needs volunteer to unbreak now" :) /me hides [20:10:25] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: puppet fail [20:10:26] ok I made a basic resume feature in make-wmf-branch, though I didn't add cli arg parsing for it [20:10:37] https://gerrit.wikimedia.org/r/#/c/192868/ [20:10:50] aude: hoo: _or_ it could be that we want Debian right away, so jessie [20:11:24] That would be even better, IMO [20:11:39] reinstall or dist-upgade [20:11:41] xD [20:12:04] meh, reinstall in both cases :p [20:13:42] 6operations, 10Beta-Cluster, 6MediaWiki-Core-Team, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1067273 (10Dzahn) Or we might switch them over to Debian jessie right away? What do other ops think in this case? still trusty or jessie already? [20:15:47] 6operations, 10Beta-Cluster, 6MediaWiki-Core-Team, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1067281 (10Dzahn) Resolving this should also prevent reverts like https://gerrit.wikimedia.org/r/#/c/192866/ [20:16:36] (03CR) 10Dzahn: [C: 032] static Bugzilla: explain cgi links are redirects [puppet] - 10https://gerrit.wikimedia.org/r/192865 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [20:19:48] (03PS1) 10Ottomata: Fix to allow spark to load native Hadoop libs [puppet/cdh] - 10https://gerrit.wikimedia.org/r/192870 [20:20:49] (03CR) 10Ottomata: [C: 032] Fix to allow spark to load native Hadoop libs [puppet/cdh] - 10https://gerrit.wikimedia.org/r/192870 (owner: 10Ottomata) [20:21:32] (03PS1) 10Ottomata: Update cdh module with spark native hadoop lib fix [puppet] - 10https://gerrit.wikimedia.org/r/192871 [20:21:52] (03CR) 10Ottomata: [C: 032 V: 032] Update cdh module with spark native hadoop lib fix [puppet] - 10https://gerrit.wikimedia.org/r/192871 (owner: 10Ottomata) 
[20:22:16] PROBLEM - puppet last run on db2041 is CRITICAL: CRITICAL: puppet fail [20:29:24] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 0 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [20:30:15] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [20:33:11] critical.... there are no unmerged changes? [20:33:33] usually means there is a local commit [20:39:45] RECOVERY - puppet last run on db2041 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:42:23] !log restarted nova-compute on virt1002 [20:46:27] bblack: there? [20:47:23] (03PS1) 10GWicke: Lower the heap limit per worker to 250m [puppet] - 10https://gerrit.wikimedia.org/r/192876 [20:50:04] (03CR) 10Rush: [C: 032] Lower the heap limit per worker to 250m [puppet] - 10https://gerrit.wikimedia.org/r/192876 (owner: 10GWicke) [20:50:51] dr0ptp4kt: he went afk for a bit [20:51:19] chasemp: thanks! [21:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150225T2100). [21:03:10] 6operations, 6Labs, 10Wikimedia-Labs-wikitech-interface: 404 on http://wmflabs.org (wikitech migration) - https://phabricator.wikimedia.org/T90787#1067457 (10Dzahn) [21:04:55] whom should i bug to have my cluster access revoked in about 10 hours? [21:05:24] yurik: keep it [21:05:26] it is awesome to keep it [21:05:33] domas, welcome! [21:05:35] yurik: or are you traveling into high risk country?!!? 
[21:05:38] yep [21:05:54] canada ;) [21:05:56] yurik: make a patchset [21:05:56] kidding [21:06:05] poke whoever is on ops duty [21:06:12] Or rand() ops if it's more urgent [21:06:30] yurik, www.youtube.com/watch?v=Lq9X4eQiX-w [21:06:39] Reedy, i don't want it revoked until i leave - have a few things to finish up [21:06:48] MaxSem, thx ))))))) [21:07:10] yurik: just create the patchset so ops can merge it when you want is the idea :) [21:07:19] thx ) [21:07:22] IN SOVIET RUSSIA CLUSTER REVOKES YOU [21:07:41] lol [21:10:33] We don’t generally upgrade servers in place from Precise to Trusty, right? The pattern is usually to image new hardware and migrate services? [21:10:45] Or are in-place upgrades totally safe and trivial and I should stop worrying? [21:11:20] (03PS1) 1020after4: Remove stale versions: 1.25wmf14 and 1.25wmf15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192880 [21:11:22] (03PS1) 1020after4: Add 1.25wmf19 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192881 [21:11:24] (03PS1) 1020after4: Wikipedias to 1.25wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192882 [21:11:25] It usually depends on the service :) [21:11:26] (03PS1) 1020after4: group0 to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192883 [21:11:59] Things like apaches can be depooled and reinstalled at will etc [21:12:07] ‘totally safe and trivial’ <- the motto on my family crest [21:12:13] !log updated Parsoid to version 5a3aaf712c334190a97a1d224a9efc0fb340f6af [21:12:29] "What's the worst that can happen?" [21:13:04] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[21:13:16] (03CR) 1020after4: [C: 032] Remove stale versions: 1.25wmf14 and 1.25wmf15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192880 (owner: 1020after4) [21:16:36] reinstall let's you find the unpuppetized bits [21:17:04] but only if you're doing it to new hardware so you don't lose the original [21:17:11] (03Merged) 10jenkins-bot: Remove stale versions: 1.25wmf14 and 1.25wmf15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192880 (owner: 1020after4) [21:19:13] Do we no longer need to keep around old versions for a long time? [21:20:03] rm: cannot remove `php-1.25wmf14/cache/l10n/l10n_cache-oc.cdb [21:22:50] Is it normal that I get more than one "BitmapHandler::doTransform: creating ..." in a request that has [[File:...]] that’s already been uploaded and handled? [21:23:49] bd808, you know about that? [21:23:57] !log twentyafterfour Started scap: testwiki to php-1.25wmf19 and rebuild l10n cache [21:24:18] (03PS1) 10Yurik: MERGE no earlier than 6pm PST: Revoked my own key [puppet] - 10https://gerrit.wikimedia.org/r/192888 [21:25:17] mutante: sure, that’s a good reason to reinstall. But… is there an established history of in-place upgrades? Is it generally recognized as safe? [21:26:27] andrewbogott: no, it has only been done on a case-by-case basis if there were exceptional circumstances [21:26:35] dang :( [21:26:36] ok, thanks [21:27:14] andrewbogott: in an ideal world it would be just like what you said about instances.. if it's effort ... [21:27:16] Well, I seem to have fixed wmflabs.org but not www.wmflabs.org. Puzzling. [21:27:20] the whole cow/pets thing [21:27:20] I guess I’ll wait a while [21:27:38] mutante: yeah, true. But I would at the very least need a bunch more hardware. [21:28:12] ah, you mean to install new boxes in parallel while the old one is still up? [21:28:16] *nod* [21:28:27] right [21:28:41] Since labs has a bunch of different types of boxes… I’d need a bunch of duplicates. 
[21:29:04] mutante: what IP do you get for wmflabs.org and www.wmflabs.org? [21:29:23] wmflabs.org has address 208.80.154.18 [21:29:28] www.wmflabs.org has address 208.80.154.136 [21:29:36] well that’s the opposite of what I see [21:29:39] i had not noticed they are 2 different ones [21:29:49] I just changed — probably it’s just a question of dns caching [21:29:55] is it possible you're not going to get consistent results because of load balancing? (disclaimer: i have no idea how this works) [21:30:19] nah, no load balancing for this [21:30:34] I think it’s just caches that are out of sync. I’ll wait a half hour and look again [21:30:57] mutante: so, to reconfirm… ‘www.wmflabs.org’ resolves properly for you in a browser? And wmflabs.org gives you a 404? [21:33:37] andrewbogott: www. does, wmflabs.org doesn't (same results as mutante) [21:33:49] andrewbogott: yes, confirmed [21:33:55] Huh. [21:33:57] * andrewbogott waits [21:34:19] when I ‘dig’ www.wmflabs.org I get the .136 address, but not when I ping. Would’ve thought they’d be the same [21:39:42] andrewbogott: uhm.. it's like it's gone from our servers [21:39:52] ? [21:39:55] which? [21:39:56] oh wait [21:40:04] this is on the labs DNS server? [21:40:11] yes [21:40:26] ok, which server name is that [21:40:47] i had tried ns0.wikimedia.org etc [21:40:58] I think it’s ns1.wmflabs.org [21:40:59] * andrewbogott looks [21:41:16] labs-ns0.wikimedia.org [21:41:22] (03CR) 1020after4: "on firefox:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [21:41:26] dig A wmflabs.org @virt1000.wikimedia.org [21:41:37] dig A www.wmflabs.org @virt1000.wikimedia.org [21:41:41] they are the same if i ask there [21:41:44] so yea, caching [21:42:05] dig gives the same for me as well. 
Just not ping [21:42:12] so must me a local cache [21:42:19] (03CR) 10MaxSem: "Anyway, per IRC discussion, the WD team has no objections;)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [21:42:21] i can't ask @labs-ns0.wikimedia.org thouh [21:42:21] *be [21:42:23] times out [21:42:25] hm [21:42:26] twentyafterfour, ^^ [21:42:48] 6operations, 6Labs, 10Wikimedia-Labs-wikitech-interface: 404 on http://wmflabs.org (wikitech migration) - https://phabricator.wikimedia.org/T90787#1067708 (10Andrew) thanks for diagnosing this! I've updated the dns rec (it was in LDAP) so this should sort itself out soon. [21:42:51] note that not every team uses phab for everything yet [21:43:18] MaxSem: yeah I'm not hung up about it ;) [21:43:46] Yeah, it's ok to push that thing from our side [21:43:57] an advance notice would have been nice, though [21:44:36] hoo, I wared you guys about this like 2 months ago;) [21:44:43] so it needs to go out after switching enwiki to .18? [21:44:51] yep [21:44:59] otherwise, responses will be lost [21:46:27] so switch enwiki, sync, merge that patch, sync? [21:46:41] mysql:wikiadmin@db1058 [wikidatawiki]> SHOW TABLES LIKE "wikigrok_questions"; [21:46:41] Empty set (0.00 sec) [21:46:42] yup [21:46:44] MaxSem: ^ [21:47:02] hoo, it's not supposed to have that table on repo wiki [21:47:39] But last time you deployed that we got DB errors because of that table missing [21:47:46] Guess you fixed that? [21:48:10] yup, that was because of an old version [21:48:11] what does wikigrok do on wikidata? [21:48:35] eceive people's responses [21:48:44] will store soon [21:48:52] (hopefully, ffs) [21:49:07] then we will talk about submission ;) [21:49:13] andrewbogott: and now it's also the same when i ping [21:49:49] mutante: so, both addresses fixed for you? .136? 
[21:50:10] andrewbogott: yes [21:50:18] OK, then I will declare victory [21:50:26] 6operations, 6Labs, 10Wikimedia-Labs-wikitech-interface: 404 on http://wmflabs.org (wikitech migration) - https://phabricator.wikimedia.org/T90787#1067732 (10Andrew) 5Open>3Resolved a:3Andrew [21:50:35] redirects as it should, thanks [21:50:38] yes, do it [21:51:25] (03PS1) 10Southparkfan: Use elseif instead of else if [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192894 [21:54:18] !log twentyafterfour Finished scap: testwiki to php-1.25wmf19 and rebuild l10n cache (duration: 30m 20s) [21:54:45] MaxSem: so, it needs the api module but not the database table yet? [21:55:02] <^d> twentyafterfour: I logged T90796 to track the "invalid host name (wikipedia)" error multiversion won't stfu about [21:55:15] <^d> (after spending a few minutes turning that mangled json into something legible) [21:55:27] aha [21:57:28] (03CR) 10Rush: "Is this relevant to today? Please provide more notice in the future if so. I don't know if anyone will be around to do this at the time " [puppet] - 10https://gerrit.wikimedia.org/r/192888 (owner: 10Yurik) [22:00:05] robh: Dear anthropoid, the time has come. Please deploy Apache Cluster Redirects update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150225T2200). [22:00:14] yurik, maybe revoke right now? [22:00:37] weeee, hope i dont break shit.. [22:00:37] robh: jouncebot likes you [22:01:12] oh, it's your redirects, i was first thinking it picked a random person [22:01:18] nah, its mine [22:01:25] the techblog stuff [22:01:29] yea, makes sense [22:02:59] how do I "verify l10n cache on testwiki" [22:03:16] I can't find anything referring to it on special pages [22:03:23] Look at testwiki [22:03:24] <^d> twentyafterfour: "Does it not look broken at a glance?" 
[22:03:29] !log disabling puppet on all mw systems for redirects update [22:03:31] <^d> Hehe Reedy+1 [22:03:32] If all the messages show as it's probably broken [22:04:05] But you can view Special:AllMessages specifically [22:04:33] (03CR) 10RobH: [C: 032] rewriting techblog.wikimedia.org to blog.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192704 (owner: 10RobH) [22:06:57] Reedy: so there isn't a version I can check to make sure it updated? [22:07:08] I guess I'm assuming it did [22:07:16] Not really [22:07:35] You'd have to look for new messages added in that version. Or changed in english etc [22:07:45] where is morebots [22:07:55] If messages are actually showing, that should be enough to begin with [22:08:21] looks like nothing has gone to SAL since 10:00 (12 hours ago) [22:08:48] !log morebots is dead [22:11:12] there was a netsplit I think [22:11:14] morebots, better? [22:11:14] I am a logbot running on tools-exec-10. [22:11:14] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [22:11:14] To log a message, type !log . 
[22:11:23] (03CR) 1020after4: [C: 032] Add 1.25wmf19 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192881 (owner: 1020after4) [22:11:28] (03Merged) 10jenkins-bot: Add 1.25wmf19 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192881 (owner: 1020after4) [22:11:30] andrewbogott: thanks :) [22:11:52] (03CR) 1020after4: [C: 032] Wikipedias to 1.25wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192882 (owner: 1020after4) [22:11:57] (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192882 (owner: 1020after4) [22:13:27] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf18 [22:13:34] Logged the message, Master [22:14:16] andre__: how to know the highest bug number that existed in Bugzilla [22:15:34] https://old-bugzilla.wikimedia.org/show_bug.cgi?id=73681 [22:15:40] mutante, ^ [22:15:44] Krenair: thank you [22:16:45] !log localtesting of change on mw1001 shows no issues, so pushing out to rest of apaches [22:16:50] Logged the message, Master [22:16:52] for bug in $(seq 1 73681); do .. i am making a HTML that links to all of them [22:17:31] (03CR) 1020after4: [C: 032] Revert "Pull WG on WD for now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [22:17:37] 2.6M .. hmm [22:17:38] (03Merged) 10jenkins-bot: Revert "Pull WG on WD for now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192845 (owner: 10MaxSem) [22:19:48] !log twentyafterfour Synchronized ./wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [22:19:54] Logged the message, Master [22:20:01] twentyafterfour: revert, please [22:20:02] arghhhh puppet takes os long to sync apcahes its making me paranoid. [22:20:07] eh? 
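Editorial sketch of the `for bug in $(seq 1 73681)` idea mentioned at 22:16:52: generating one HTML list that links to every bug number. The URL pattern is the one quoted at 22:15:34. Whether the static Bugzilla page uses that pattern or this markup is an assumption, and `bug_index_html` is a hypothetical name.

```python
from html import escape

# Assumption: link target follows the show_bug.cgi URL quoted in the channel.
BASE_URL = "https://old-bugzilla.wikimedia.org/show_bug.cgi?id={bug_id}"


def bug_index_html(highest_bug, lowest_bug=1):
    """Return an HTML <ul> with one link per bug id in [lowest_bug, highest_bug]."""
    if highest_bug < lowest_bug:
        raise ValueError("highest_bug must be >= lowest_bug")
    items = []
    for bug_id in range(lowest_bug, highest_bug + 1):
        url = BASE_URL.format(bug_id=bug_id)
        # Escape the URL for use inside a double-quoted attribute.
        items.append(f'<li><a href="{escape(url, quote=True)}">Bug {bug_id}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"
```

Calling `bug_index_html(73681)` would cover the full range named in the log; the result is a single string that can be written to a file.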
[22:20:11] flooding the logs again [22:20:17] Error: 1146 Table 'wikidatawiki.wikigrok_questions' doesn't exist (10.64.32.28) [22:20:58] aaaaah [22:21:01] (03PS1) 10Hoo man: Revert "Revert "Pull WG on WD for now"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192962 [22:21:06] twentyafterfour: ^ [22:21:21] MaxSem: YOU BROKE IT AGAIN [22:21:21] (03CR) 1020after4: [C: 032] Revert "Revert "Pull WG on WD for now"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192962 (owner: 10Hoo man) [22:21:27] (03Merged) 10jenkins-bot: Revert "Revert "Pull WG on WD for now"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192962 (owner: 10Hoo man) [22:21:42] fuck [22:21:43] * aude rage [22:21:50] looking [22:22:06] !log twentyafterfour Synchronized ./wmf-config/InitialiseSettings.php: flooding logs (duration: 00m 07s) [22:22:10] Logged the message, Master [22:24:09] assume it either needs the table or a feature flag [22:24:39] was the code actually updated as expected? [22:24:44] Revert "Revert "Revert "Revert "Revert . 
aborted, err: Too much reversion [22:24:45] * aude doesn't quite understand what the questions do on wikidata [22:24:55] twentyafterfour: Can you sanc again [22:25:00] I still see the error [22:25:06] might be good to try on test.wikidata first or beta [22:25:06] hoo: ok [22:25:07] mw1204 [22:25:45] !log twentyafterfour Synchronized ./wmf-config/InitialiseSettings.php: mw1204 still logging errors (duration: 00m 05s) [22:25:51] Logged the message, Master [22:25:55] hoo: how about now [22:25:58] thanks twentyafterfour [22:26:09] Looks good to me so far :) [22:26:19] * aude too [22:26:29] !log finished with my deployment window for redirects, tested and is now live with no issues (so far) [22:26:33] Logged the message, Master [22:27:08] dammit, got lost in branches [22:28:09] (03PS1) 10Dzahn: static bugzilla: add links to all bugs/activities [puppet] - 10https://gerrit.wikimedia.org/r/192964 (https://phabricator.wikimedia.org/T85140) [22:28:33] (03CR) 1020after4: [C: 032] group0 to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192883 (owner: 1020after4) [22:28:41] (03Merged) 10jenkins-bot: group0 to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192883 (owner: 1020after4) [22:32:09] (updated SAL to add stuff missed while morebots was off) [22:32:16] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf19 [22:32:20] Logged the message, Master [22:33:54] (03PS1) 10RobH: setting techblog.w.o to our cluster for rewrite [dns] - 10https://gerrit.wikimedia.org/r/192965 [22:36:46] !log twentyafterfour Purged l10n cache for 1.25wmf17 [22:36:53] Logged the message, Master [22:39:42] " HttpError from line 942 of /srv/mediawiki/php-1.25wmf18/includes/WebRequest.php: Invalid file extension found in the path info or query string." [22:39:51] can't we suppress these at some point? 
[22:40:15] * hoo points to https://gerrit.wikimedia.org/r/183020 [22:41:16] (03CR) 10RobH: [C: 031] "I am not 100% certain on the use of geoip!text-addrs. As such, I'd like some other reviewers on this." [dns] - 10https://gerrit.wikimedia.org/r/192965 (owner: 10RobH) [22:41:43] hoo, people's user js/css pages? [22:41:49] 6operations, 10Wikimedia-Blog, 5Patch-For-Review: add techblog.wikimedia.org redirection to blog.wikimedia.org to redirects - https://phabricator.wikimedia.org/T90638#1067931 (10RobH) apache rewrite is live https://gerrit.wikimedia.org/r/#/c/192965/ is the dns change [22:41:53] Krenair: yeah [22:41:58] yeah I already reported it [22:42:22] and I have a patch [22:42:25] (03CR) 10Dzahn: [C: 031] "lgtm, given that the redirect exists in the cluster Apache config now, it's like other random redirects, for example "coffee.wikimedia.org" [dns] - 10https://gerrit.wikimedia.org/r/192965 (owner: 10RobH) [22:42:58] https://phabricator.wikimedia.org/T89377 [22:44:24] !log Done deploying - uploaded release notes for 1.25wmf19 [22:44:28] Logged the message, Master [22:51:34] !log rebooting virt1005 in anticipation of an exprimental upgrade to Trusty. (There are no VMs on virt1005 other than a testing host) [22:51:39] Logged the message, Master [22:53:10] !log Ran rebuildEntityPerPage.php on wikidatawiki to clean up after wikigrok database mess [22:53:14] Logged the message, Master [22:53:34] PROBLEM - Host virt1005 is DOWN: PING CRITICAL - Packet loss = 100% [22:57:45] RECOVERY - Host virt1005 is UP: PING OK - Packet loss = 0%, RTA = 2.84 ms [23:03:00] (03CR) 10Yurik: "I need another hour just in case, can be merged after 4:30pm pst." [puppet] - 10https://gerrit.wikimedia.org/r/192888 (owner: 10Yurik) [23:07:39] (03PS2) 10Andrew Bogott: Use cloud archive for non-icehouse Openstack on Trusty. 
[puppet] - 10https://gerrit.wikimedia.org/r/174971 [23:07:41] (03PS3) 10Andrew Bogott: Add labs config files for Openstack version Juno [puppet] - 10https://gerrit.wikimedia.org/r/192483 [23:07:43] (03CR) 10RobH: [C: 032] setting techblog.w.o to our cluster for rewrite [dns] - 10https://gerrit.wikimedia.org/r/192965 (owner: 10RobH) [23:08:45] (03CR) 10jenkins-bot: [V: 04-1] Use cloud archive for non-icehouse Openstack on Trusty. [puppet] - 10https://gerrit.wikimedia.org/r/174971 (owner: 10Andrew Bogott) [23:08:47] (03CR) 10jenkins-bot: [V: 04-1] Add labs config files for Openstack version Juno [puppet] - 10https://gerrit.wikimedia.org/r/192483 (owner: 10Andrew Bogott) [23:10:48] (03PS3) 10Andrew Bogott: Use cloud archive for non-icehouse Openstack on Trusty. [puppet] - 10https://gerrit.wikimedia.org/r/174971 [23:10:50] (03PS4) 10Andrew Bogott: Add labs config files for Openstack version Juno [puppet] - 10https://gerrit.wikimedia.org/r/192483 [23:13:23] (03PS4) 10Ori.livneh: Move jq package to module, all elasticsearch machines should have it [puppet] - 10https://gerrit.wikimedia.org/r/188881 (owner: 10Chad) [23:13:31] (03CR) 10Ori.livneh: [C: 032 V: 032] Move jq package to module, all elasticsearch machines should have it [puppet] - 10https://gerrit.wikimedia.org/r/188881 (owner: 10Chad) [23:14:42] (03PS1) 10Tim Starling: Use Wikiquote logo from Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192978 [23:15:31] (03CR) 10RobH: "If they key isn't compromised at this time, why can't you simply shred your local copy and we revoke it at some time when folks are normal" [puppet] - 10https://gerrit.wikimedia.org/r/192888 (owner: 10Yurik) [23:20:14] (03CR) 10Dzahn: "it's 150x150 while other logos are 135x135. is that an issue?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/192978 (owner: 10Tim Starling) [23:22:01] (03CR) 10Dzahn: [C: 031] "nevermind, just like the old one being used" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192978 (owner: 10Tim Starling) [23:22:13] aude & hoo, I'm gonna deploy a fix this SWAT. will also start with testwikidata [23:22:19] !log upgrading virt1005 to trusty [23:22:24] Logged the message, Master [23:22:44] MaxSem: Ok, if you're sure it's good now [23:23:24] PROBLEM - DPKG on virt1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [23:24:28] <^d> jouncebot: next [23:24:29] In 0 hour(s) and 35 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150226T0000) [23:26:40] (03PS1) 10MaxSem: Enable WikiGrok in repo mode on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192987 [23:27:11] Hm [23:27:48] Something is wrong [23:27:54] https://bits.wikimedia.org/static-1.25wmf18/extensions/LiquidThreads/images/arrow_right_25.png vs. https://bits.wikimedia.org/static-1.25wmf19/extensions/LiquidThreads/images/arrow_right_25.png [23:28:04] MaxSem: sounds good, starting with testwikidata [23:28:09] (03PS1) 10QChris: Synchronize mediacounts files to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192988 [23:29:14] twentyafterfour, see ^^^ by Krenair [23:29:34] (03PS6) 10Ori.livneh: graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [23:29:39] ? [23:30:07] something wrong with static assets [23:30:13] <^d> The whole wmf19 is 404? [23:30:40] yep [23:30:51] <^d> I see symlinks on tin [23:30:52] <^d> Hmm [23:31:01] did the symlink get synched? [23:31:14] <^d> I assume with scap? 
[23:32:17] /srv/mediawiki-staging/docroot/bits/static-1.25wmf19 exists on tin [23:32:41] /srv/mediawiki/docroot/bits/static-1.25wmf19 does not exist on mw1001 [23:33:17] !log maxsem Synchronized docroot and w: (no message) (duration: 00m 06s) [23:33:19] I sync'd everything according to the docs I thought [23:33:23] Logged the message, Master [23:33:31] I checked and double-checked each step [23:33:43] works now: https://bits.wikimedia.org/static-1.25wmf19/extensions/LiquidThreads/images/arrow_right_25.png?vsbasd [23:33:47] it's there now [23:33:48] hmm what's up with that :-/ [23:33:51] (note the cache-busting parameter [23:34:10] worksforme without cache busting [23:34:40] <^d> mmmm, black magic [23:34:59] (03CR) 10Ori.livneh: [C: 032 V: 032] graphite: extend aggregation for statsite extended counters [puppet] - 10https://gerrit.wikimedia.org/r/192792 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [23:35:47] so wth went wrong [23:38:25] RECOVERY - DPKG on virt1005 is OK: All packages OK [23:43:18] (03CR) 10Andrew Bogott: [C: 032] Use cloud archive for non-icehouse Openstack on Trusty. [puppet] - 10https://gerrit.wikimedia.org/r/174971 (owner: 10Andrew Bogott) [23:46:05] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: Puppet has 1 failures [23:46:45] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [23:48:02] twentyafterfour: you didn't merge the symlinks patch until after the scap was run [23:48:20] * bd808 sees the evidence in scrollback here [23:49:14] "[14:23] !log twentyafterfour Started scap:" [23:49:20] "[15:11] (Merged) jenkins-bot: Add 1.25wmf19 symlinks" [23:50:10] so the symlinks were on tin but only after the scap pushed the state of tin to the cluster. [23:53:03] hmm [23:55:12] 6operations, 10hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1068271 (10ori) @RobH: Toby is right; we should provision two new machines and decommission vanadium. 
EventLogging is used by every team in engineering, including ops. It has been running on a machine tha... [23:56:25] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures