[00:03:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [00:09:50] Change merged: Tim Starling; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/36149 [00:11:03] PROBLEM - Puppet freshness on lvs3 is CRITICAL: Puppet has not run in the last 10 hours [00:11:03] PROBLEM - Puppet freshness on lvs4 is CRITICAL: Puppet has not run in the last 10 hours [00:23:57] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [00:35:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:50:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.478 seconds [01:24:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.877 seconds [02:13:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:47] !log LocalisationUpdate completed (1.21wmf5) at Mon Dec 3 02:24:47 UTC 2012 [02:25:01] Logged the message, Master [02:28:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.924 seconds [02:46:35] !log LocalisationUpdate completed (1.21wmf4) at Mon Dec 3 02:46:35 UTC 2012 [02:46:45] Logged the message, Master [03:20:55] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [03:21:24] New review: Tim Starling; "Fine apart from inline comment." [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/35818 [03:33:59] New review: Tim Starling; "I wouldn't like to see this change merged and I2c6ab07d ignored, since I2c6ab07d is ready to deploy ..." [operations/apache-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/34113 [03:35:55] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [03:42:58] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [05:27:56] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [07:38:58] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [08:07:30] jeremyb: https://bugzilla.wikimedia.org/show_bug.cgi?id=40582#c5 [08:32:16] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [08:51:28] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.013 second response time on port 11000 [09:07:04] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [09:07:04] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [09:22:53] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [09:22:53] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [10:11:56] PROBLEM - Puppet freshness on lvs3 is CRITICAL: Puppet has not run in the last 10 hours [10:11:56] PROBLEM - Puppet freshness on lvs4 is CRITICAL: Puppet has not run in the last 10 hours [10:25:30] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [12:01:36] New review: Aude; "note that we want to deploy the mw1.21-wmf5 branch, which is what was tagged on Friday + security fi..." 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/36205 [12:57:58] !log Shutdown slauerhoff's switchports [12:58:10] Logged the message, Master [13:08:33] New patchset: Dereckson; "(bug 42644) Throttle for Library of Israel editathon" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36548 [13:10:23] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36548 [13:12:07] !log demon synchronized wmf-config/throttle.php 'Deploy I826b9114' [13:12:16] Logged the message, Master [13:20:44] New review: Dereckson; "This doesn't match community desire, as expressed on " [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/33390 [13:22:18] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [13:36:37] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [13:44:34] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [13:58:01] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/31575 [14:00:05] New patchset: Hashar; "validating new Jenkns job (do not submit)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [14:04:53] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36552 [14:07:55] Change abandoned: Hashar; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [14:12:11] Change restored: Hashar; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [14:12:23] New patchset: Hashar; "validating new Jenkns job (do not submit)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [14:12:59] Change abandoned: Hashar; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36552 [14:36:22] cmjohnson1: are we still making the switch today? [14:36:52] not sure...apergos: how are we looking for replacing be3 and be5 [14:37:37] cmjohnson1: I will set weights to zero on ms-be3 toay [14:37:43] after that you can pull it [14:37:58] I'm not going to do the other host at the same time, I'll wait two days and we;ll see how it looks [14:38:08] *grumble* [14:38:08] k..sounds good [14:42:02] why not do it at the same time? [14:45:28] because the new machines are still far from being caught up [14:48:30] strange [14:51:10] yeah, I am not excited about it either [14:53:32] doesnt' the process take about 4 weeks [14:53:33] ? [14:57:19] so we're doing ms-be3 today or tomorrow and ms-be5 at the end of the week? [14:57:48] sbernardin: be3..but not yet [14:58:57] ok [15:00:29] ms-be3 today [15:00:50] ms-be5 we'll see, if I think we can get away with it in two days we'll do it in two days (your schedules permitting) [15:01:39] ok...sounds good [15:14:56] !log upgrade mwlib to 0.14.2 [15:15:05] Logged the message, Master [15:28:28] New patchset: Mark Bergsma; "Add ms-be3004" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36568 [15:29:04] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [15:29:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36568 [15:29:56] sbernardin: let's work on mw16 now. rt3951...bad dimm on mw16 [15:34:36] cmjohnson1: ok [15:35:23] sbernardin: i am going to take the server down. 
I want you to swap the DIMM in A1 to B1 and than turn the server back on and let's see if the error moves w/the dimmm or sticks around [15:35:49] !log mw16 shutting down to troubleshoot DIMM error rt3951 [15:35:58] Logged the message, Master [15:36:51] sbernardin: you should be good to go...ping me once it's back on [15:39:40] PROBLEM - Host mw16 is DOWN: PING CRITICAL - Packet loss = 100% [15:43:43] cmjohnson1: swapped DIMM A1 and B1 [15:43:43] RECOVERY - Puppet freshness on lvs3 is OK: puppet ran at Mon Dec 3 15:43:32 UTC 2012 [15:43:57] cmjohnson1: mw16 powered back on [15:45:06] okay..great....now we will wait a few days and see if the error returns and where..that will determine whether or not it is simple DIMM error or a problem with the main board or cpu1 [15:45:22] RECOVERY - Host mw16 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms [15:54:40] RECOVERY - Puppet freshness on lvs4 is OK: puppet ran at Mon Dec 3 15:54:13 UTC 2012 [15:57:47] did you ran memtest86+? [16:10:13] platonides: the r410 has a history of bad DIMM modules on the mainboard...the easiest troubleshoot is to swap the DIMM and see if the error follows [16:13:52] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [16:30:04] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time [16:39:45] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36485 [16:49:37] !log Installed ms-be3004 [16:49:48] Logged the message, Master [16:57:12] !log Installed ms-be3003 [16:57:21] Logged the message, Master [17:28:26] New patchset: ArielGlenn; "object replication with up to 3 rsyncs at once, up from 2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36581 [17:30:15] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36581 [17:30:55] !log reedy synchronized wmf-config/InitialiseSettings.php [17:31:04] Logged the message, Master [17:31:05] snapshot1002: rsync: change_dir#3 "/apache/common-local" failed: No such file or directory (2) [17:31:05] snapshot1002: rsync error: errors selecting input/output files, dirs (code 3) at main.c(643) [Receiver=3.0.9] [17:31:31] that sounds like a message for apergos [17:31:53] no, it's a message for "yeah I gotta get back to working on that bax" [17:32:03] it was pulled for upgrade, ignore for now, sorry for the noise [17:33:22] alright [17:33:28] *box ! [17:33:35] weird typos are weird [17:34:16] * jeremyb sometimes wonders if paravoid knows what you mean when you type english in greek letters [17:35:19] well it's not the normal way you would type english words in greek [17:35:26] but still one gets used to it (layout typos) [17:35:49] I get screwed once in awhile by someone with a french keyboard who makes the opposite mistake: gr but with latin letters [17:35:55] his mapping is relaly bizarre [17:36:08] my DSL flapped again [17:36:09] sigh [17:36:24] ugh [17:36:30] still from the storms? [17:36:34] ;/ [17:36:44] it's either that humidity goes up or that people are getting back from work and crosstalk is increased [17:37:06] telco said they're going to fix it on Wednesday, but it worked more or less today [17:37:25] that will make it harder for them to chase it down, if it works when they show up [17:37:36] no, it's easy [17:37:42] go to the flooded junction box [17:37:45] remove the water [17:38:10] oh, there's not a general problem with the cables? 
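For context on the ms-be3 drain apergos describes above: taking a Swift storage node out of service is normally done by dropping its devices to weight 0 in the ring and rebalancing, after which replication gradually moves partitions elsewhere, which is why it takes weeks rather than hours. A minimal shell sketch, assuming the standard object ring; the device IDs are placeholders, not the actual ms-be3 devices:

    swift-ring-builder object.builder                    # list devices and note the ids belonging to ms-be3
    swift-ring-builder object.builder set_weight d12 0   # repeat for each of that node's devices
    swift-ring-builder object.builder set_weight d13 0
    swift-ring-builder object.builder rebalance          # reassign partitions, then distribute the new ring files

The same steps would be repeated for the account and container builders before physically pulling the box.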
[17:38:38] (I lived in an apartment where things got worse with storms, but it was not that simple a fix) [17:38:57] i think you forgot "replace the copper" [17:39:56] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [17:40:07] http://www.theverge.com/2012/11/17/3655442/restoring-verizon-service-manhattan-hurricane-sandy [17:41:08] LeslieCarr: I think I finally found out what was wrong with that Parsoid LVS thing: https://gerrit.wikimedia.org/r/36376 [17:41:28] hahaha [17:41:29] :) [17:41:39] oh humans and our inability to sometimes see obvious [17:42:11] hah [17:42:43] Yeah [17:42:48] I pulled my hair out over this for hours [17:43:05] I see Patrick merged it on Saturday so it should already be live [17:43:41] New patchset: Jarry1250; "Add librsvg to contint puppet manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36583 [17:45:45] But somehow it still doesn't work. wtp1 is listening on the service IP correctly, but pinging the service IP doesn't work (I get destination unreachable from the router) and HTTP requests don't work either [17:53:09] Hmm, I found out that I apparently had to create https://noc.wikimedia.org/pybal/pmtpa/parsoid [17:53:26] I'm suddenly very concerned about pybal apparently being dependent on noc [17:53:47] ACKNOWLEDGEMENT - Host analytics1007 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn ack test via android app on tablet [17:59:54] yay, nagios acknowledgement by hitting touchscreen on tablet [18:00:09] Jeff_Green: [18:00:57] http://exchange.nagios.org/directory/Addons/Frontends-%28GUIs-and-CLIs%29/Mobile-Device-Interfaces/aNag/details [18:03:03] RoanKattouw: sounds like a good thing to put in a ticket to puppetize that part [18:03:11] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [18:04:14] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [18:04:20] Well, also [18:04:38] I'm very curious why that stuff is on /home and on a web server that. last I knew, ran on fenari [18:05:40] i usually find with those questions that the answer is that 5 years ago there either was a very valid reason or it was way easier at the time [18:08:17] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [18:09:18] mutante: neat [18:14:52] New patchset: Aaron Schulz; "Enabled retries for jobs that fail to be acknowledged as done." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35818 [18:16:08] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35818 [18:16:50] !log aaron synchronized wmf-config/CommonSettings.php 'deployed b999dbca1f8ca0a76fe327dd1ae30beb6eed462f' [18:16:58] Logged the message, Master [18:19:29] Ryan_Lane, any sign of Mike? Do you know if he's been invited to IRC and/or to the meeting? [18:19:48] andrewbogott: not sure. ct may have more info [18:20:06] woosters, got an email address for Mike W? [18:21:11] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [18:23:25] anyone knows which hardware had been purchased for OSM? [18:23:59] ping wangatlargo [18:25:25] LeslieCarr: Aha, I found out there are a few hoops to jump through when changing LVS https://wikitech.wikimedia.org/view/LVS#Deploy_a_change_to_an_existing_service [18:25:42] woosters: Ok, at least he knows where to find us; that was my main concern. 
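RoanKattouw's discovery above, that he had to create https://noc.wikimedia.org/pybal/pmtpa/parsoid before the service would pool, plus the unreachable service IP, suggest a quick checklist when bringing up a new LVS service. A rough sketch; the service IP and port are placeholders rather than the real Parsoid values:

    curl -s https://noc.wikimedia.org/pybal/pmtpa/parsoid    # pybal's host list for the pool: does it exist and list backends?
    sudo ipvsadm -L -n | grep -A5 '<service-ip>'             # on the LVS host: is the virtual service with its realservers present?
    curl -sI http://<service-ip>:<port>/ | head -n1          # does the VIP answer HTTP at all?

If ipvsadm shows nothing, pybal never picked the service up (config or restart missing); if the service is there but the VIP still fails, the problem is more likely route advertisement or the backends themselves.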
[18:25:47] !log installing package upgrades on spence [18:25:55] Logged the message, Master [18:26:45] hi, [18:27:24] andrewbogott, ryan_lane - wangatlargo is in the channel now [18:27:32] wangatlargo: hi there [18:27:49] Hi, Ryan [18:27:55] folks, wangatlargo is our part time consultant, helping out labs [18:28:01] welcome aboard! [18:28:01] LeslieCarr: You wanna sit down and go through those steps after the ops meeting? I can theoretically do that by myself but that doesn't seem like a very responsible thing to do [18:28:02] Hi Andrew, [18:28:12] sure Roan [18:28:23] Excellent [18:30:06] wangatlargo: after the meeting let's start setting your accounts up [18:30:25] good [18:33:47] anyone knows which hardware had been purchased for OSM? [18:33:55] package upgrade broke apache on spence.. missing apache config.. looking [18:34:00] MaxSem: i can field that [18:34:08] MaxSem: Computers [18:34:12] MaxSem: so we have the servers for db, caching, etc [18:34:14] r310s [18:34:27] MaxSem: however, the hard disks had issues, they didnt ship with the carriers or cables [18:34:40] we have gotten dell to fix this, and the carriers have arrived onsite, but the cables have not yet [18:34:51] otherwise they are all racked up and ready to have the OS installed [18:35:34] 4 osm caching, 4 osm database, and 4 osm apache servers [18:35:42] in each site (eqiad and sdtpa) [18:36:00] RobH, thank you [18:36:04] nice :) [18:36:07] (osm-db#, osm-cp#, and osm-web#) [18:36:17] 1-4 in tampa, 1001-1004 in eqiad [18:36:19] RobH: that's per datacenter? [18:36:20] ah [18:36:21] ok [18:36:22] * Ryan_Lane nods [18:36:54] yea both sdtpa and eqiad have a total of 12 osm servers each site, total 24 osm servers in wmf cluster [18:37:00] :D [18:37:08] now if we can get the damned cables in [18:40:01] RobH, why in both datacenters? for migration or redundancy? [18:40:09] .... [18:40:18] !log deleting broken link to nagios3.conf on spence and restarting apache [18:40:28] Logged the message, Master [18:40:28] because everything we have needs to be in multiple datacenters? [18:40:46] we need to build all services in both sites for cross site redundancy. [18:40:54] or else whats the point? [18:41:42] ideally whatever service it is can balance between both being live [18:41:49] but i have no idea how thats going to be structured, so no clue. [18:41:52] New patchset: Reedy; "enwiki to 1.21wmf5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36588 [18:42:05] but at miniumum if one site goes dark we have the other to run it [18:42:47] yeah, load balancing... http://wiki.openstreetmap.org/wiki/Developer_FAQ#Why_don.27t_we_spread_the_load_on_the_OpenStreetMap_database_across_a_number_of_servers.3F [18:44:32] well, that's for people making changes on the maps, right? [18:44:48] and we're just going to be providing tile services? [18:45:10] if that's the case I can't see why this can't be load balanced geographically fairly easily [18:45:13] Ryan_Lane, actually OSM runs on just one tileserver:P [18:45:35] heh. we just told you why we'd have more than one [18:47:36] MaxSem: there will be multiple tile servers [18:48:04] mutante: would you be interested in being primary for open street maps? 
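On the spence breakage logged above (a package upgrade leaving Apache with a dangling include), the usual shape of the fix is sketched here with Debian-era paths; the exact location of the broken nagios3.conf link is an assumption:

    find /etc/apache2 -xtype l                 # -xtype l matches symlinks whose target no longer exists
    sudo rm /etc/apache2/conf.d/nagios3.conf   # hypothetical path for the dangling link named in the !log entry
    sudo apache2ctl configtest && sudo service apache2 restart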
[18:48:53] New patchset: Jgreen; "offhost_backups should only copy gzipped db dumps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36591 [18:49:51] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36591 [18:58:25] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36588 [19:00:14] * MaxSem reads about Postgres replication and pukes [19:01:14] MaxSem: don't replicate [19:01:22] wangatlargo, you there? [19:01:33] they don't need replication [19:01:48] mutante: please take over osm [19:04:59] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: enwiki to 1.21wmf5 [19:05:08] Logged the message, Master [19:07:52] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [19:07:52] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [19:09:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.049 seconds [19:22:36] New patchset: Reedy; "Defrag commented out readonly statements" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36594 [19:23:46] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [19:23:46] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [19:24:25] New patchset: Reedy; "Remove $wgLegacySchemaConversion" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36595 [19:28:21] binasher: http://ganglia.wikimedia.org/latest/?c=MySQL%20pmtpa&h=db63.pmtpa.wmnet&m=cpu_report&r=week&s=by%20name&hc=4&mc=2 [19:28:27] I wonder what's with the 28th [19:28:50] hmm, mysql uptime reset...must have been a restart [19:29:48] !log reedy synchronized wmf-config/CommonSettings.php 'wikidatawiki marked readonly' [19:29:57] Logged the message, Master [19:30:47] New review: Aaron Schulz; "Summary typo" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36595 [19:40:51] binasher: Don't suppose I can get away with adding a column (and index on that column) to a wikidata table with around 10M rows, can I? [19:41:10] (I know you're in a meeting atm too) [19:41:43] Reedy: no… and the wikibase extension schema is kind of fucked up [19:42:26] Just kind of? ;) [19:42:26] hrm .... [19:42:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:42:56] do we need a term_id column there? or what? 
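Reedy's question above, adding a column and an index to a roughly 10M-row table on a live master, is the classic case for a trigger-based online schema change rather than a plain ALTER. A hedged sketch with pt-online-schema-change (not necessarily the tool used here); the host, table and column names are placeholders, and the tool needs the table to already have a primary key or unique index so it can chunk the copy:

    # dry run first: creates and alters the new (empty) table, but copies no rows and swaps nothing
    pt-online-schema-change --dry-run \
        --alter "ADD COLUMN new_col INT UNSIGNED NOT NULL DEFAULT 0, ADD INDEX (new_col)" \
        h=localhost,D=wikidatawiki,t=some_table
    # rerun with --execute once the dry run is clean; watch replication lag throughout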
[19:43:02] * aude not a db expert [19:43:26] having a PK column makes using online schema change possible (IIRC) [19:43:32] ok [19:44:24] New review: Hashar; "indeed :)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/36583 [19:45:18] binasher: https://gdash.wikimedia.org/dashboards/filebackend/ those lockmanger graphs should be split out from the streamfile ones [19:47:00] * AaronSchulz lols at the stats taking 1500ms [19:47:55] * AaronSchulz sighs at multiwrite slowing things down too [19:51:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.048 seconds [19:53:50] New patchset: Raimond Spekking; "Enable AFTv5 for dewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/34964 [19:54:14] New patchset: Jdlrobson; "ensure all photo uploads go to commons" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36103 [19:57:24] sbernardin: you are good to set up the new ms-be3 [19:57:40] the mgmt ip is in the ticket [19:57:55] thanks for working on it! [19:58:39] sbernardin: rt3855 [19:58:59] AaronSchulz: heh, as expected... [19:59:21] binasher: Does truncating a large table replicate "well" (quickly)? [20:00:28] paravoid: can you take that bug? [20:03:42] Reedy: yeah, truncate should generally be ok [20:09:00] binasher: in https://ishmael.wikimedia.org what is the number in parenthesis for % time ? [20:11:33] AaronSchulz: time in seconds [20:11:41] ah, seconds [20:11:58] clicking "more" gives confusing numbers then [20:13:33] that time in parenthesis on the main page isn't very useful [20:14:02] it's per sample not per instance right? [20:14:13] as in, a sum [20:14:44] yeah, a sum over the time period, or a sum over the time period for queries that took > the slow query threshold [20:15:19] it would be more interesting if ishmael ingested 100% of queries [20:19:25] what's the prognosis for deploying Wikibase to test2 today? are the schema problems blockers? 
[20:19:53] robla: seems so [20:20:12] first step is to update the repo (wikidata.org) and it requires the schema changes / [20:20:24] or plan is probably truncate one of the tables and rebuild it with a script [20:23:26] !log reedy synchronized wmf-config/CommonSettings.php 'wikidatawiki back to writeable' [20:23:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:33] Logged the message, Master [20:26:56] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [20:30:16] AGHHHHH [20:30:17] MEETING [20:30:19] AGGGGH [20:30:23] WHY DOES MY CALENDAR FAIL ME SO [20:32:07] !log kaldari synchronized php-1.21wmf5/resources/startup.js 'syncing startup.js per Roan' [20:32:15] Logged the message, Master [20:32:31] woosters, so sorry about missing the meeting, trying to figure out why calendar doesn't notify me now [20:32:41] I'm updating the etherpad with my updates right now too [20:32:44] np [20:32:46] we gave you all the work [20:32:50] no worries [20:32:57] haha [20:33:13] !log added max_user_connections to wikiadmin grants on s1 [20:33:22] Logged the message, Master [20:39:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.019 seconds [20:42:03] mutante, so will you be the OSM guy?:) [20:42:53] !log kaldari synchronized php-1.21wmf5/extensions/Vector/modules/ext.vector.collapsibleNav.css 'syncing vector css for menus' [20:43:01] Logged the message, Master [20:43:14] AaronSchulz or Reedy, can you think of a reason we would not want to provide a copy of the interwiki cdb file for download to the public? [20:43:50] I am thinking that this with some instructions about a few lines to add, is the quickest way to get a modern mw release up to date on interwikis, since the sql table for a wiki they import might well be empty [20:45:48] wangatlargo: ok, let's start with getting you labs and bugzilla access [20:46:17] wangatlargo: do you have an iphone or android, or something else? [20:47:38] I am using computer now. Is that work? [20:47:57] !log jenkins / gallium : updating Zuul repository 2747501..f73a04d , should be updated by puppet automatically [20:48:04] !log deploying squid config for the updated mobile ACL [20:48:05] well, for one part of this you're going to need a mobile device for two-factor-authentication [20:48:08] Logged the message, Master [20:48:15] Logged the message, Master [20:48:36] wangatlargo: make an account here: https://bugzilla.wikimedia.org/ [20:48:51] also, add a request here: http://www.mediawiki.org/wiki/Developer_access [20:49:16] Ryan_Lane: hi :) I have marked you as on duty in this channel topic. [20:49:31] Ryan_Lane: someone from ops told me so :-D [20:49:39] yeah. I'm on duty [20:50:11] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [20:51:41] Ryan_Lane: and here is a very simple one, adding librsvg on the contint server https://gerrit.wikimedia.org/r/36583 :-] [20:51:53] Ryan_Lane: basically adding librsvg2-bin [20:52:27] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36583 [20:52:37] done [20:53:20] thanks ryan! [20:53:24] yw [20:53:29] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.050 second response time [20:53:46] Ryan_Lane: and I got zuul in production for real, it is triggering a few extensions already. 
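The s1 grant change !logged above ("added max_user_connections to wikiadmin grants") corresponds to MySQL's per-account resource limit. A minimal sketch; the host pattern and the limit value are assumptions, not what was actually applied:

    mysql -e "GRANT USAGE ON *.* TO 'wikiadmin'@'10.%' WITH MAX_USER_CONNECTIONS 80;"
    mysql -e "SHOW GRANTS FOR 'wikiadmin'@'10.%';"   # confirm the limit is attached to the account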
[20:53:51] sweet [20:54:12] Ryan_Lane: fixing up ops/puppet right now (Zuul did not like `production` as a HEAD) [20:54:22] Ryan_Lane: will most probably enable mediawiki core this week :-] [20:54:26] Ryan: account created at https://bugzilla.wikimedia.org/ with email mwang@wikimedia.org. I already created an account at http://www.mediawiki.org/wiki/Developer_access using mwang@wikimedia.org. I can login http://www.mediawiki.org/wiki/Developer_access now [20:54:52] wangatlargo: well, you want to add a request to that page [20:57:41] wangatlargo: you should also join #wikimedia-labs channel [20:57:54] and subscribe to labs-l: https://lists.wikimedia.org/mailman/listinfo/Labs-l [20:57:59] I added a request to that page on Sat. It already done. [20:58:17] ah [20:58:24] what's your labsconsole username? [20:58:35] http://www.mediawiki.org/wiki/Developer_access#User:Mwang [20:58:36] ? [20:58:44] Mwang [20:58:48] ok [20:58:53] did you log into labsconsole yet? [20:59:03] you should also log into gerrit: gerrit.wikimedia.org [20:59:18] gerrit/labsconsole use the same username/password [20:59:30] you should upload an ssh key to each [21:00:25] I've added you to a bunch of groups on labsconsole [21:00:38] you'll need to enable two-factor authentication before you can use most of the interfaces there [21:01:01] https://labsconsole.wikimedia.org/wiki/Special:OATH [21:01:03] I can log into labsconsole. [21:03:29] I can log in gerrit.wikimedia.org now. [21:04:55] !log adding mwang to the ops ldap group [21:05:03] Logged the message, Master [21:05:33] I am creating keys now. Then I will upload my public key. [21:05:39] ok [21:07:27] apergos: I guess it can't hurt [21:08:54] ok, I'll see if Reedy knows any reasons why not, otherwise out it goes [21:08:55] thanks [21:13:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:58] wangatlargo: welcome! [21:28:56] !log DNS update - adding doc and integration CNAMEs [21:29:00] Ryan: pub_key uploaded to labconsole and gerrit.wikimedia.org. two-factor authentication enabled. I wrote down the TOKEN and kept it in a safe place. [21:29:04] Logged the message, Master [21:29:17] wangatlargo: did you write down all scratch tokens? [21:29:48] mutante: thanks :-] [21:30:05] hashar: doc.wikimedia, doc.mediawiki.. integration.. as you requested [21:30:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds [21:31:38] mutante: works for me, you can close the RT :) [21:31:54] done:) [21:32:00] no I have to send the apache conf snippet to serve the new entries! [21:32:01] \O/ [21:32:56] yea, so you have that? nice [21:33:51] cmjohnson1: ms-be3 is up [21:34:21] cmjohnson1: plugged into mrjp-b2-sdtpa port 12 [21:34:47] sbenardin: go ahead and finish the setup...once you are done. create a network ticket w/the above informaiton [21:35:20] cmjohnson1: ok [21:35:25] LeslieCarr: Let me know when you're ready to do the LVS thing (looks like you're out to lunch now) [21:36:35] !log aaron synchronized php-1.21wmf5/extensions/CentralAuth/CentralAuthUser.php 'added exception' [21:36:43] Logged the message, Master [21:36:56] Ryan: Yes, I wrote down all 5 scratch TOKENS. 
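For the key-upload step being walked through above, the client side is just generating a keypair and pasting the public half into the labsconsole and Gerrit SSH settings pages; the filename and comment below are arbitrary:

    ssh-keygen -t rsa -b 4096 -C "mwang@wikimedia.org" -f ~/.ssh/id_rsa_wmf
    cat ~/.ssh/id_rsa_wmf.pub    # paste this line into each web interface; the private key never leaves the laptop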
[21:37:01] great [21:37:23] wangatlargo: let's discuss things further at #wikimedia-labs [21:39:11] * Damianz scratches wangatlargo's token [21:43:18] 2012-12-03 21:43:03 refreshLinks2 Module:Taxobox-fr table=templatelinks start= end= STARTING [21:43:20] The luasandbox extension is not present, this engine cannot be used. [21:43:34] notpeter: lua is not on srv193 [21:55:53] RoanKattouw: hey i'm here now [21:55:58] Awesome [21:56:00] want to sit physically together and do this ? [21:56:06] Yes, I'll come over [22:00:35] RoanKattouw: why are exceptions suppressed even in cli mode? [22:00:58] New patchset: Ryan Lane; "Add sync_all deploy function" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36650 [22:01:02] <^demon|brb> The exception output stuff probably doesn't differ per sapi. [22:01:46] when I run jobs, all I get is the "enable $wgShowExceptionDetails" crap [22:01:49] <^demon|brb> Well, the check for outputting a stacktrace, prolly. [22:01:52] enotif jobs [22:02:10] are problem to Special:BlockList after wmf5 update already known? [22:02:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:13] anyone know why lvs1005 and lvs1006 are not actually bgp peering ? are they supposed to be down ? [22:03:23] or would i kill stuff if i restarted pybal ? [22:03:25] http://en.wikipedia.org/wiki/Special:BlockList / http://www.wikidata.org/wiki/Special:BlockList [22:04:24] New patchset: Catrope; "Add the eqiad parsoid VIP to lvs1003/1004 as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36651 [22:05:50] LeslieCarr: ---^^ [22:07:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36651 [22:07:06] !log aaron synchronized wmf-config/CommonSettings.php [22:07:14] Logged the message, Master [22:12:30] sbernardin: how are things coming along w/ms-be3 [22:12:44] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35135 [22:15:05] PROBLEM - Host ms-be3 is DOWN: PING CRITICAL - Packet loss = 100% [22:15:49] yeah I ran into that (cli exception not being useful) also, it was pretty annoying [22:15:55] I would like to see that changed [22:17:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.346 seconds [22:21:54] RobH: So just to check, what are the specs on the remaining misc boxes you have? Are they the same as wtp1/wtp1001 (6-core 1.90GHz)? [22:22:09] same as wtp1001 [22:22:20] I ask because Parsoid (the service I wanna run on a couple of them) is highly CPU-bound and every core helps [22:22:29] !log restarting pybal on lvs3 [22:22:37] Logged the message, Mistress of the network gear. [22:22:39] yep, same servers as wtp1001 [22:22:45] dual 6core [22:22:59] dual? [22:23:08] /proc/cpuinfo only shows me 1 proc w/ 6 cores [22:23:10] i thought it was [22:23:19] maybe its single, its whatever wtp1001 is [22:23:23] Right, OK [22:23:42] if its cpu bound it may be better to order for it [22:23:51] as we can get that platform (r320) dual cpu [22:25:29] Yeah [22:25:36] Didn't cadmium have like 16 or 32 cores? [22:25:39] Some ridiculous number [22:25:59] We will order for it, but we'll need misc boxes in the interim. About 3-4 of them probably [22:26:21] !log restarting lvs4 for parsoid vip [22:26:29] Logged the message, Mistress of the network gear. 
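On the single-versus-dual-CPU question about the wtp boxes above, /proc/cpuinfo can settle it directly; hyper-threading is what usually makes the bare "processor" count misleading:

    grep -c '^processor' /proc/cpuinfo                   # logical CPUs = sockets x cores x threads
    grep '^physical id' /proc/cpuinfo | sort -u | wc -l  # number of distinct sockets
    grep -m1 'cpu cores' /proc/cpuinfo                   # cores per socket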
[22:26:33] !log modify above - restarting PYBAL on lvs for parsoid vip [22:26:41] Logged the message, Mistress of the network gear. [22:27:00] I'll put a ticket in today [22:30:07] !log restarting pybal on lvs1006 for parsoid vip [22:30:15] Logged the message, Mistress of the network gear. [22:33:22] PROBLEM - LVS HTTP IPv4 on parsoid.svc.pmpta.wmnet is CRITICAL: (null) [22:33:46] (that paged) [22:33:59] (just fwiw, I think it's good to know that it did :-) [22:34:11] Yeah, that one needs to not page [22:34:35] how come? [22:34:41] what's the plan with that btw? [22:35:19] I know nothing about it other than what parsoid is supposed to be [22:37:43] oh hey [22:37:47] we're putting parsoid up right now [22:37:52] so yeah, ignore pages [22:37:53] sorry [22:37:55] root@lvs1006:/etc/pybal# Unhandled error in Deferred: [22:37:55] Unhandled Error [22:37:56] Traceback (most recent call last): [22:37:57] Failure: twisted.names.error.DNSNameError: [22:38:08] anyone know what that is ^^ ? getting that error when starting pybal [22:38:56] i am killing dvd.wikimedia.org .. it was once supposed to be a "rotation script" for different mirrors of a WP DVD .. but it was gone [22:39:05] and just pointed to a German company [22:40:24] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35135 [22:40:38] !log restarted pybal on lvs1006 and lvs1005 [22:40:46] Logged the message, Mistress of the network gear. [22:40:53] !log DNS update - removing broken dvd.wikimeedia.org [22:41:01] Logged the message, Master [22:44:17] New patchset: Catrope; "Fix another pmpta typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36657 [22:45:53] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36657 [22:48:11] !log restarting pybal on lvs1003 for parsoid vip [22:48:19] Logged the message, Mistress of the network gear. [22:50:22] LeslieCarr: I've not seen that error before, no [22:50:53] they fixed it [22:50:55] New patchset: Reedy; "Remove comment about temp being 200, it's actually 100 * 1024" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36659 [22:51:03] it was basically nxdomain [22:52:26] ah [22:52:27] ok [22:52:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:55:32] huzzah, parsoid vip success! [22:55:42] New patchset: Pyoungmeister; "renaming coredb module to coredb_mysql to avoid module/role class namespace fuckery" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36661 [22:57:39] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36661 [23:05:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.042 seconds [23:08:19] New patchset: Pyoungmeister; "need to change name in role class as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36664 [23:11:23] heya, binasher, you there? [23:11:31] i'm not sure, but just looking at some of the event.gif logs [23:11:39] it looks like the tab as delimeter change never made it to some of the hosts [23:11:54] e.g. 
strontium looks good, I see the tabs [23:12:05] palladium and niobium are still using spaces [23:12:57] New patchset: Andrew Bogott; "Add a class to generate a puppet docs website" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36666 [23:14:29] New patchset: Aaron Schulz; "Set wgShowExceptionDetails on for cli mode" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36668 [23:15:55] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36668 [23:16:30] New review: Andrew Bogott; "This is one of those patches which I can't properly test with puppetmaster::self. So, merging witho..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/36666 [23:16:51] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36666 [23:16:54] andrewbogott: you are awesome:) [23:17:17] hashar: You mean, awesomely apologetic? [23:17:40] andrewbogott: I got DNS entries for doc.wikimedia.org so we could get the puppet doc available at http://doc.wikimedia.org/ops/puppet/ or something :) [23:17:53] andrewbogott: can even make Jenkins to build them for us :-) [23:18:05] shutdown -h hashar -m out of work time [23:18:20] Oh, great! I didn't duplicate your work I hope? [23:18:24] killall `pidof shutdown` [23:18:36] andrewbogott: na I did nothing, just been thinking about it :-) [23:18:38] New review: Aaron Schulz; "I remember tim doing some napkin math calculations based on the scalar apache's memory and maxclient..." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36659 [23:18:56] hashar, btw, the site that my puppet class generates is incredibly ugly. I was hoping to only use it with deep links in order to avoid anyone seeing the… frames [23:19:05] andrewbogott: I guess it will be all a matter of replacing the wmflabs.org domain by wikimedia.org and apply your class on the contint server :-] [23:19:34] andrewbogott: I am sure #puppet can help you find out a script that generate some nice documentation [23:19:55] That's the script I'm using. It works great except for the 1995-era look and feel. [23:20:05] indeed [23:20:20] If you have tunneling set up, you can view here: http://puppetdoc2.pmtpa.wmflabs/ [23:20:23] 1995 was a good year. [23:20:27] (probably you've see that exact site before) [23:20:31] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36664 [23:21:05] New patchset: Catrope; "Parsoid puppetization changes:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36670 [23:21:20] hashar: andrewbogott: want me to just put integration.wikimedia.org into the integration.mediawiki.org file [23:21:37] as ServerAlias [23:21:46] andrewbogott: Also, http://puppetdoc2.instance-proxy.wmflabs.org/ [23:22:10] RoanKattouw: Whoah, since when does /that/ work? [23:22:20] A while ago apparently [23:22:30] It's been around forever but no one knows about it [23:22:35] holy crap! [23:22:40] Do you know who wrote it? [23:22:44] Liangent [23:22:46] Seems like we should document it or something... 
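Back on the event.gif delimiter question at the start of this stretch (tabs on strontium, still spaces on palladium and niobium), a quick spot-check on each host makes the difference visible; the log path here is hypothetical:

    head -n1 /a/log/event.gif.log | cat -A | cut -c1-200   # cat -A renders tabs as ^I, spaces stay as spaces
    awk -F'\t' '{print NF; exit}' /a/log/event.gif.log     # >1 means the first line really is tab-delimited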
[23:22:51] * andrewbogott makes a note [23:23:01] Ya [23:23:15] mutante: not sure how it will work :-) [23:23:19] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [23:23:20] mutante: but you can give it a try [23:23:28] mutante: I don't think I know what you mean/what that would do [23:23:35] mutante: I think I will migrate everything to use the wikimedia.org domain [23:24:24] Now I feel /really/ bad for torturing all those volunteers with foxyproxy instructions [23:26:45] anyway, that rdoc site looks promising [23:27:03] I am sure rdoc output can be enhanced in one way or another [23:27:44] Yeah. There's a text mode output from the tool which we could format however we want. That exceeds my ambitions for the moment though. [23:27:56] Mostly I want to add doc links to the 'configure instance' page in labs. [23:28:03] I mean, per-class links. [23:28:16] that is a great idea [23:28:44] And we'll need to update the style guide so that people comment classes in a way that gets parsed correctly. I haven't investigated that yet. [23:29:10] or you can get the markdown syntax dumped and use a tool to convert the markdown to some nice html [23:29:46] I thought briefly about feeding it into the mediawiki api. But that would get us editable pages which would be very wrong. [23:30:48] hashar: andrewbogott: integration.wikimedia.org is now "it works" because we added DNS but not to Apache config.. i am gonna make that work.. independent from the "doc." virtual hosts [23:31:06] I wonder, is there a way to mark a section of a wiki so that it's editable by only a single editor? That would work, and then a bot could upload and update & such [23:31:39] andrewbogott: you can protect the pages and give administrator to that user and not the others [23:31:55] mutante: I'm still a beat behind… what is integration an alias for? [23:32:10] mutante: The bot would need to create new pages, though… is that problem? [23:32:41] mutante: sounds good, thanks :) [23:32:44] the bot user would also need the right flags.. i am not sure if a bot flag means you can edit protected pages [23:33:40] New patchset: Dzahn; "add integration.wikimedia.org ServerAlias to integration.mw" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36671 [23:34:00] andrewbogott: that's all https://gerrit.wikimedia.org/r/#/c/36671/1/files/apache/sites/integration.mediawiki.org [23:34:19] oh, I see. [23:34:51] You said that before, but my eyes transposed the words and I was wondering why you were aliasing something to itself [23:34:52] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36671 [23:37:16] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [23:40:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:43:20] New patchset: Pyoungmeister; "attempt to hack around more coredb issues" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36673 [23:43:39] hashar: Any idea where we should host this? [23:43:58] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36673 [23:44:09] andrewbogott: the puppet docs ? 
[23:44:46] I would go for something like http://doc.wikimedia.org/ops/puppet/ [23:44:54] or simply http://doc.wikimedia.org/puppet/ [23:45:11] feel free to have them hosted on gallium.wikimedia.org server which is the host running Jenkins [23:45:13] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [23:45:29] (doc.wm.o points to gallium) [23:45:41] OK. Hm... [23:45:57] Thinking about having the docs update via jenkins vs. just update via puppet [23:47:41] And, are you expecting to have a MW installed at the top level? [23:47:48] nop [23:47:56] at least not yet :-] [23:48:10] just want to migrate the MediaWiki automatically generated doc from formey to gallium [23:48:28] http://svn.wikimedia.org/doc/ [23:48:45] which is legacy and need to be phased out [23:49:10] Right now my class generates a whole site, with a sites-enabled entry. It would work just as well as a simple subdir of an existing site... [23:49:34] docs sounds better [23:49:35] I think [23:49:42] I guess I can just install the site at that subdir too, I just have to learn how to do that :) [23:50:59] Reedy: lets bikeshed about doc vs docs :-] [23:51:21] Reedy: I am not sure which one looks more natural. I guess I choose 'doc' simply because it is shorter [23:51:30] hashar: actually. We should use documentation.wikimedia.org and make apache change it [23:52:28] andrewbogott: the integration website is somewhere under /srv/ [23:52:52] andrewbogott: maybe /srv/org/wikimedia/doc/puppet [23:53:02] andrewbogott: but you will have to provide the apache conf [23:53:38] doh 1am, bed time for real [23:53:44] * hashar waves [23:53:50] hashar: I think that's the part I don't understand. But I'll catch up with you tomorrow about this. [23:54:12] sure :) [23:54:18] have a good evening! [23:54:25] g'night [23:56:01] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36650 [23:56:42] bleh. Looks like I snuck another change in with that one accidentally [23:57:16] !log aaron synchronized php-1.21wmf5/extensions/CentralAuth/CentralAuthUser.php 'revert live hacks, exception already dumped in bugzilla' [23:57:24] Logged the message, Master [23:57:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds
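The docs site discussed above, 1995-era frames and all, is what Puppet's built-in rdoc generator produces. A sketch of the sort of command the class presumably wraps; the output and manifest paths follow hashar's /srv suggestion and are not necessarily what the merged change does:

    puppet doc --mode rdoc \
        --outputdir /srv/org/wikimedia/doc/puppet \
        --modulepath /etc/puppet/modules \
        --manifestdir /etc/puppet/manifests
    # then serve the outputdir as a plain static subdirectory under doc.wikimedia.org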