[05:24:31] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 05:21:10 PM UTC
[06:11:55] (PS1) Dzahn: install graphviz on bugzilla role [operations/puppet] - https://gerrit.wikimedia.org/r/103525
[06:13:01] (CR) Faidon Liambotis: "Didn't you *just* add this with Ifafb9a2b8f70a8b0c79facaf102745cfd5416b0c ? Can you please revert that instead (and try to be sure next ti" [operations/puppet] - https://gerrit.wikimedia.org/r/103496 (owner: Yurik)
[06:13:06] (CR) Faidon Liambotis: [C: -1] Zero partner config: Removed Opera support [operations/puppet] - https://gerrit.wikimedia.org/r/103496 (owner: Yurik)
[06:23:11] (PS1) Yurik: Revert "Added carrier 436-04 to zero" [operations/puppet] - https://gerrit.wikimedia.org/r/103526
[06:31:38] yurik: where did you call it "testing"?
[06:33:51] (PS2) Ori.livneh: Revert "Hack: cron job to clean up tifs from /tmp on app servers" [operations/puppet] - https://gerrit.wikimedia.org/r/103390
[06:35:05] how I hate that appservers parse images...
[06:35:12] how much*
[06:37:13] yeah. parsing wikitext is hard enough :)
[06:38:32] (CR) Faidon Liambotis: Revert "Added carrier 436-04 to zero" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/103526 (owner: Yurik)
[06:42:59] paravoid, last night i had a few hours with the partner, testing it
[06:43:13] they were inspecting their tcp dumps and seeing how it was routing
[06:43:20] it turned out they couldn't whitelist opera
[06:43:24] hence - revert
[06:43:36] just set up something in labs for that purpose
[06:43:53] production isn't for carrier testing
[06:44:07] paravoid, they were testing it against our production ips. how do you propose they test their system if they are whitelisting it...
[06:44:23] or are you saying they should set it up one way, test, and than switch???
[06:44:44] and they are also testing against our production URLs
[06:44:46] ....
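The whitelisting risk that comes up in this exchange, where a carrier pattern like `.*\.wikipedia\..*` also lets third-party proxies through, can be illustrated with a small sketch. It assumes the pattern is matched against the full URL, which the log does not confirm; `STRICT_HOST`, `loose_allows` and `strict_allows` are hypothetical names for illustration, not anything a partner is known to run.

```python
import re
from urllib.parse import urlsplit

# The over-broad pattern quoted in the conversation.
LOOSE = re.compile(r".*\.wikipedia\..*")

# Hypothetical tighter rule: the host itself must end in wikipedia.org.
STRICT_HOST = re.compile(r"(?:[a-z0-9-]+\.)*wikipedia\.org")


def loose_allows(url):
    """True if the whole URL matches the over-broad pattern."""
    return LOOSE.fullmatch(url) is not None


def strict_allows(url):
    """True only if the URL's host is wikipedia.org or a subdomain of it."""
    host = urlsplit(url).hostname or ""
    return STRICT_HOST.fullmatch(host) is not None


if __name__ == "__main__":
    for url in (
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://proxy.wikipedia.example.com/",
        "http://evil.example/?x=.wikipedia.org",
    ):
        print(url, loose_allows(url), strict_allows(url))
```

Anchoring the check to the parsed host name, rather than to any substring of the URL, is what closes the hole in this sketch.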
[06:46:59] paravoid, i wish we were not tied in with the ops as we are, and i wish ESI would work, making the whole point moot. But the reality is - this is the only way we have at this point, so please help us out until we can cleanly separate varnish from backend
[06:47:17] this is completely besides the point
[06:47:32] it's a statement to which I completely agree, but it's also besides the point
[06:47:50] don't use production for testing, I'm not sure how can I say this in simpler words :)
[06:47:55] ok, how do you propose they test against labs?
[06:48:08] they need to whitelist all of our ips and often - URLs
[06:48:59] and at the same time - not allow any holes like some of them accidently did - like .*\.wikipedia\..* -- which obviously allowed for ppl to set up their own proxies and get free internet for everything
[06:57:49] so, paravoid, what are you proposing?
[07:23:26] (CR) Yurik: "There are no other ways to test other than in production. If you have any realistic alternatives, i will be happy to hear them, but noone " [operations/puppet] - https://gerrit.wikimedia.org/r/103526 (owner: Yurik)
[07:23:50] paravoid, ^
[07:31:52] (Abandoned) Yurik: Zero partner config: Removed Opera support [operations/puppet] - https://gerrit.wikimedia.org/r/103496 (owner: Yurik)
[07:40:31] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:35:24 PM UTC
[07:53:31] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:48:26 PM UTC
[07:54:31] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:50:14 PM UTC
[08:03:31] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:58:40 PM UTC
[08:25:31] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 05:21:10 PM UTC
[08:25:51] PROBLEM - MySQL Slave Running on db1034 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Table _page_new already exists on query. Default database:
[08:26:21] meh
[08:27:51] RECOVERY - MySQL Slave Running on db1034 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:
[09:08:25] (CR) Faidon Liambotis: [C: 2 V: 2] Revert "Added carrier 436-04 to zero" [operations/puppet] - https://gerrit.wikimedia.org/r/103526 (owner: Yurik)
[10:41:31] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:35:24 PM UTC
[10:42:33] (CR) Ori.livneh: "In the interest of making the diff easier to read, I held back from applying cosmetic change. But since this is not getting reviews, might" [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101793 (owner: Ori.livneh)
[10:42:48] (PS9) Ori.livneh: Rewrite for multithreading [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101793
[10:54:12] (PS10) Ori.livneh: Rewrite for multithreading [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101793
[10:54:31] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:48:26 PM UTC
[10:55:31] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:50:14 PM UTC
[11:04:31] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:58:40 PM UTC
[11:26:31] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 05:21:10 PM UTC
[12:56:21] apergos: PHP fatal error in /usr/local/apache/common-local/php-1.23wmf7/extensions/GlobalBlocking/SpecialGlobalBlock.php line 278:
[12:56:21] Call to a member function getPrefixedText() on a non-object
[12:56:56] is there a bug report for it yet?
[12:57:04] i don' know
[12:57:09] is it a know issue?
[12:57:30] I don't know about it but wikimedia-dev would be a more likely place
[12:57:40] since this is an mw issue
[12:57:47] thanks apergos
[12:57:53] sure
[12:58:14] do you know how would know about this in the dev team?
[12:58:47] no, you'll just have to ask
[12:59:01] ok
[13:24:12] I have been referred to this channel .... there are issues with labs ... the file system I am told
[13:24:25] is there anyone who can have a look and is willing to do so ?
[13:28:29] GerardM- Coren is the man for labs
[13:29:02] GerardM-: it is labstore4 is broken
[13:30:03] one man cannot provide 24*7 support
[13:30:12] does he not have a colleague ?
[13:30:30] (particularly not in the season when you are supposed to be jolly)
[13:30:33] GerardM-: it is xmess eve, and the wmf hired one
[13:30:54] still not ontop of all iirc
[13:31:22] anyhow, labstore3 came in action, and should be fixed
[13:31:42] !log reedy synchronized php-1.23wmf7/extensions/GlobalBlocking 'bug 58934 Icc47a2d6367c0b906e40e068635c9fda07108e0f'
[13:31:43] cool
[13:32:00] Logged the message, Master
[13:39:42] as far as I know an xfs repair fixed things up yesterday and labs should be fine
[13:39:46] is there a new problem?
[13:40:13] there is apergos
[13:42:31] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:35:24 PM UTC
[13:43:26] can anyone have a look at: https://nl.wikipedia.org/w/index.php?title=Speciaal:Logboeken&type=block&page=User%3AHoogeveen123&uselang=en
[13:43:32] please note the "with an expiry time of 20:02, 1 January 1970"
[13:43:40] I didn't know wo could do time travel
[13:43:52] you dont need to repeat yourself across channels
[13:44:13] sorry
[13:44:26] the other channel seemed on vacation
[13:44:56] it's christmas eve in many locations
[13:44:56] well this channel is mainly for operations issues (problems with the servers)
[13:45:30] if no one is around right now, that should be reported as a bug
[13:46:52] matanya: what do you know about the labs issue?
[13:47:35] I don't see anything in the admin log
[13:47:40] apergos: i don't even seem to be able to log in, no ping
[13:47:42] no nothing
[13:48:05] well you said something about labstore3 being used, I don't know about that even
[13:48:18] actully there is ping now
[13:48:43] apergos: thats what i saw in my logs, let me scroll a sec
[13:50:56] apergos: i was wrong
[13:51:07] ok
[13:51:17] that is what you said. the current issue i see is : rm: cannot remove `catlib.py': Read-only file system
[13:51:26] for example
[13:52:04] read only? great
[13:52:37] yes, no one will break the system further :)
[13:53:49] (PS1) Cmjohnson: Adding some renamed analytics boxes to decommissioning.pp [operations/puppet] - https://gerrit.wikimedia.org/r/103547
[13:54:19] labstore4 is still the active storage server
[13:54:23] I just checked
[13:55:31] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:48:26 PM UTC
[13:56:31] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 04:50:14 PM UTC
[13:56:59] (CR) Cmjohnson: [C: 2] Adding some renamed analytics boxes to decommissioning.pp [operations/puppet] - https://gerrit.wikimedia.org/r/103547 (owner: Cmjohnson)
[13:57:20] Dec 24 06:17:01 labstore4 kernel: [55944.738322] XFS (dm-0): metadata I/O error: block 0x1924ac670 ("xfs_trans_read_buf_map") error 117 numblks 8
[13:57:21] nice
[13:57:29] ok well coren did an xfs repair yesterday
[13:57:38] apparently that didn't really get to the bottom of the issue
[13:57:41] so it failed again
[13:57:50] apergos: Seriously?
[13:57:51] well I don't know what else happened
[13:57:56] ah you are there
[13:57:58] yes, Coren
[13:58:06] Seriously
[13:58:14] I just got here. Coffee is fresh.
[13:58:17] I was about to say that I had no idea if you looked at it further
[13:58:18] ok
[13:58:29] well have your coffee, do what you need to do
[13:58:36] I'm just hearing about this now, it will wait a bit longr
[13:58:43] Wait, the NFS is up.
[13:59:32] I see nothing in today's syslog, only yesterdays' (well today at 6 am utc)
[13:59:50] I didn't see anything actually ro mounted either
[13:59:54] hi people
[14:00:21] AFAICT, everything is running fine. Where did you hear of this?
[14:00:51] Oh, wait, radonly filesystem? That's probably a project running on gluster.
[14:01:01] Which instance was that?
[14:01:22] how would i got about aquiring wikimedia logs of requests for Special:RecentChanges? i want to know which of the options are used most often (or at all)
[14:01:22] e.g. like https://en.wikipedia.org/w/index.php?title=Special:RecentChanges&hideliu=1&hidebots=0&hideanons=1 to show only bots, does anybody ever look at that?
[14:02:09] we have sampled logs (one of evry thousand hits)
[14:02:19] * Coren would have, often, if he knew about hideliu!
[14:02:53] as a regular wiki user I have certainly used that
[14:03:10] I"m sure others too, but as far as how often, dunno
[14:03:18] apergos: Where are you hitting the readonly filesystem?
[14:03:22] I"m not
[14:03:38] I'm just investigating matanya's report
[14:04:06] the only thing I have found so far is these xfs errors from 6 am UTC today (the latest ones, they stop after that)
[14:04:10] Coren: on tools mostly
[14:04:20] and since I don't know if you were working on it at that point, I know nothing :-D
[14:04:50] and bastion too Coren
[14:05:19] Coren: touch test
[14:05:20] touch: cannot touch `test': Read-only file system
[14:05:30] matanya: AFAICT, tools is working fine (that one's NFS)
[14:05:39] Ah! Bastion is gluster. Lemme go kick in its head.
[14:05:49] yes, now tools is ok
[14:05:59] but bastion sucks
[14:06:24] * matanya is in debugging mode today :)
[14:06:40] Yup. Sick gluster.
[14:07:32] when are replacing gluster?
[14:07:36] * Coren beats it up.
[14:07:45] matanya: We're not bringing gluster to eqiad.
[14:07:52] good start
[14:08:04] what instead?
[14:08:21] NFS. If you take out XFS, it's been rock solid for months now.
[14:09:38] with which FS?
[14:09:42] EXT?
[14:10:44] Yeah, ext4
[14:11:25] good, i like ext4 :)
[14:11:51] I was considering JFS for a while, but then I woke up.
[14:11:53] :-)
[14:12:18] To be fair, I really wish they'd sort the licensing crap out and get a good native zfs for linux.
[14:13:09] basstion gluster has woken up and shold be properly writable now.
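The `touch test` check used above to show the read-only mount can be scripted. The following is a minimal sketch, assuming a probe that creates and removes a temporary file is acceptable on the share being tested; `probe_write` is a hypothetical helper name, not a tool mentioned in the log.

```python
import errno
import os
import tempfile


def probe_write(directory):
    """Try to create and delete a temp file in `directory`.

    Returns None if the directory is writable, or the errno name
    ("EROFS" for a read-only file system, "EACCES" for a permission
    problem). Any other error is re-raised unchanged.
    """
    try:
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
    except OSError as exc:
        if exc.errno in (errno.EROFS, errno.EACCES):
            return errno.errorcode[exc.errno]
        raise
    os.close(fd)
    os.remove(path)
    return None


if __name__ == "__main__":
    print(probe_write(tempfile.gettempdir()))
```

Distinguishing `EROFS` from `EACCES` separates a file system that has gone read-only from an ordinary permissions problem, which is the distinction that mattered in the report above.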
[14:13:47] btrfs
[14:17:41] (PS1) Cmjohnson: Removing puppet entries for db31|3|4|6|7 db47|9 db50|4|7 [operations/puppet] - https://gerrit.wikimedia.org/r/103550
[14:19:19] so those xfs errors form earlier today, are they from before or after you finished what you were doing, Coren?
[14:19:32] just so we know if something's still up
[14:20:13] btrfs is still not fully cooked
[14:20:27] I know, but it's getting there
[14:20:32] After, but they are a known issue with aborted readahead and recent kernels with XFS.
[14:20:51] Unrelated to the original issue, and mostly log noise.
[14:21:02] good to know
[14:21:04] thanks
[14:21:34] and there is zfs for linux, just user space
[14:21:37] (Post 3.4 kernels readahead agressively, and can cancel readaheads when they aren't deemed useful anymore but XFS still does its validation on the unread blocks (which are all 0))
[14:23:16] apergos: https://www.redhat.com/archives/dm-devel/2013-February/msg00104.html
[14:24:38] hmm
[14:31:02] (PS2) Cmjohnson: Removing puppet entries for db31|3|4|6|7 db47|9 db50|4|7 [operations/puppet] - https://gerrit.wikimedia.org/r/103550
[14:31:28] thanks coren, it's ok now
[14:41:23] (CR) Cmjohnson: [C: 2] Removing puppet entries for db31|3|4|6|7 db47|9 db50|4|7 [operations/puppet] - https://gerrit.wikimedia.org/r/103550 (owner: Cmjohnson)
[14:42:08] Coren: should labs tool work now ?
[14:42:20] I get
[14:42:24] Proxy Error
[14:42:25] The proxy server received an invalid response from an upstream server.
[14:42:27] The proxy server could not handle the request GET /widar/index.php.
[14:42:28] Reason: Error reading from remote server
[14:42:37] GerardM-: As far as I know, it has been working without problem.
[14:42:45] ok
[14:42:47] Ah, that's a problem with a specific tool. Lemme look at it.
[14:42:55] always blame the user :)
[14:42:56] in that case there are still problems ...
[14:43:08]
[14:44:01] hi andrewbogott
[14:44:40] 'morning
[14:46:01] I will be away for an hour ...
[14:46:04] GerardM-: I see it working intermitently, mostly capacity problems. It's still using the old-style apache though, and a switch to the lighttpd setup would solve that. Do you know who maintains it?
[14:46:19] yes ... Magnus
[14:46:26] that is the Widar tool
[14:46:50] is that what you are talking about Coren ?
[14:46:53] I can switch it to the new scheme easily enough, but I'd rather not do so without consulting with him first.
[14:47:06] he is in Germany for the holidays
[14:47:07] GerardM-: Yes; the error message you showed was related to that.
[14:47:13] * Coren ponders.
[14:47:26] he is happy when knowledgeable people work on his tool
[14:47:27] His setup is straightforward enough; lemme try it and see if it works.
[14:47:31] he explicitly told me that
[14:47:43] I think you qualify
[14:48:10] be back in an hour
[14:48:25] GerardM-: I just switch it. It should be fast and reliable this way.
[14:51:19] magic
[15:02:06] back in about 30 mins
[15:02:59] coren .. thank you
[15:03:05] I will inform Magnus
[15:03:17] do you mind if I blog about it ?
[15:05:54] did run it but it did not change anything
[15:06:03] will look into it when I am back
[15:27:22] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100%
[15:32:31] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[15:39:11] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:23] back
[15:49:41] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[15:58:42] (CR) Qgil: "TTO you are of course right. I'm very sorry for the confusion." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103107 (owner: Yatinmaan)
[16:01:31] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:06:43] (PS3) Dan-nl: annotating-domain-whitelist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102739
[16:12:01] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[16:28:41] apergos: Can you offer help with pxe-booting a cisco? All the bios settings look right to me but it just boots straight to the hdd anyway.
[16:29:01] geez I know next to nothing about those
[16:29:08] Oh, ok.
[16:29:23] I've done this a bunch of times before. Must be forgetting something :(
[16:29:30] I can look at the settings to see if another pair of eyes sees something you didn't
[16:29:40] but that's about it
[16:30:12] I'd appreciate it if you don't mind.
[16:30:31] I suppose another possibility is that it's trying pxe, failing to connect upstream and falling back on hdd.
[16:31:12] I don't see anything in the terminal output about that, but there's a lot ot read
[16:32:57] heya paravoid
[16:36:13] apergos: I think I found the issue.
[16:36:14] Dec 24 15:41:24 brewster dhcpd: DHCPDISCOVER from 88:43:e1:c2:99:8e via 208.80.154.131: network 208.80.154.128/26: no free leases
[16:36:32] So the networking is wrong :(
[16:37:11] ahh there we go
[16:37:16] i'm having network troubles too! :) but unrelated
[16:37:21] so not your settings and you weren't missing anything
[16:37:26] yeah, paravoid, I'm not so sure the analytics multicast stuff is fixe
[16:37:27] d
[16:37:50] apergos: So if I recall, that means the server is assigned to the wrong row?
[16:37:50] I still can't get multicast traffic across rows
[16:37:55] it depends
[16:38:03] whatever you did made some things better though
[16:38:06] in ganglia for sure
[16:38:09] hm.
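The cross-row multicast check being discussed here can be reproduced without iperf. This is a rough sketch, assuming plain UDP over IPv4 and using the group address 239.192.1.51 and port 8649 that appear in this conversation; the function names are hypothetical. One detail worth keeping in mind: the default multicast TTL on a sending socket is 1, which keeps packets on the local subnet, so a sender that never raises it will not be seen in another row regardless of firewall rules.

```python
import ipaddress
import socket

GROUP = "239.192.1.51"  # group address mentioned in the conversation
PORT = 8649             # port mentioned in the conversation


def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte ip_mreq value used with IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_listener(group=GROUP, port=PORT):
    """Return a UDP socket joined to `group`; call recvfrom() on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_probe(payload=b"probe", group=GROUP, port=PORT, ttl=8):
    """Send one datagram to the group with an explicit multicast TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # ttl=8 is an arbitrary illustrative value; it only needs to exceed
    # the number of router hops between sender and listeners.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

Running `open_listener()` on one host per row and `send_probe()` from a third would show whether the group is actually routed between rows on that port.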
[16:38:21] i'm not sure if the manual tests i'm doing (with iperf) are good tests
[16:38:37] not sure if they are supposed to work given whatever existing acl rules or network settings there are
[16:38:40] but they aren't working right now
[16:38:42] it probably means that the host ip that's assigned doesn't match the vlan the switch port is in
[16:38:44] and some ganglia data is stale
[16:38:45] *probably*
[16:39:58] ottomata: I would try tcpdump of the actual packets being sent/received by gmond
[16:40:08] well, and virt1002 doesn't even have a non-mgmt entry in dns
[16:40:08] that's guaranteed to tell you what's going on
[16:40:10] um
[16:40:22] if you have root on those boxes
[16:40:26] I'm so confused by all this
[16:40:38] yeah, i'm watching that, starting to wonder if this is multicast ttl being wrong again
[16:40:56] is virt1002 one of the ... what is it. one of the reassigned boxes of some sort?
[16:41:16] if it was converted from some old name then we can see if there is mgmt for the old name
[16:42:15] hello
[16:42:30] apergos: according to linux-host-entries.ttyS0-115200 I have virt1001 through 1009. And I'm told that all but 1009 are in row b.
[16:42:36] But I'm not sure what the reality is...
[16:42:51] hi paravoid, i'm looking into this, i *might* know what's wrong
[16:42:51] but
[16:42:54] first
[16:42:58] I could really use someone who understands this (including vlans) to audit these 9 boxes and figure out what's happening.
[16:43:30] weren't a bunch of these just renamed from analytics hosts or something?
[16:43:31] if I start a multicast listener in Row B and in Row C
[16:43:36] apergos: yes.
[16:43:40] in an arbitrary multicast group
[16:43:41] ugh
[16:43:45] (i'm doing 239.192.1.51)
[16:43:49] well that is probably part of it
[16:43:54] should I be able to send multicast packets to that grou
[16:44:01] and see them on the listeners in each row?
[16:44:44] " Removing old analytics dns files (an1001,1002,1005,1006,1008). Replacing mgmt entries with virt100[1-3][7-8]."
[16:44:54] conspicuously missing is virt1002
[16:44:55] hm
[16:44:58] :-D
[16:45:04] ottomata: no
[16:45:09] ottomata: depending on port
[16:45:15] oh?
[16:45:24] is that a network rule?
[16:45:38] yes, the routers have firewalls for the analytics VLANs
[16:45:51] i thought we had it so that analytics vlans could send any traffic to each other
[16:46:01] it was just that they couldn't do that out of any of the analytcis vlan
[16:46:03] s
[16:46:11] in october virt1002 became pc1002
[16:46:52] I see mgmt for virt1002 actually but no other ip address
[16:47:02] Yeah, but I think that more recently a different analytics box was reassigned to virt1002
[16:47:07] ohhhhhh hmm, ok paravoid, it works on port 8649
[16:47:17] ottomata: show configuration firewall family inet filter analytics-in4
[16:47:19] so now we are back to 'what ports are these on anyways'
[16:47:22] ok yeah, then, i think this is a multicast ttl issue with hadoop's ganglia lib
[16:47:26] it's fairly self-explanatory
[16:47:27] I totally don't understand why when pc and analytics swiped these boxes they took them from the /middle/ of the range.
[16:47:53] errr, paravoid, where do I run that?
[16:47:57] cr1-eqiad
[16:48:15] analytics1002.mgmt.eqiad.wmnet may have become virt1002.mgmt.eqiad.wmnet
[16:48:20] lemme see if there is an rt ticket
[16:49:25] apergos: there is, I made it. But now I can't find it
[16:49:29] yep but it doesn' t tell me which went to what
[16:49:35] https://rt.wikimedia.org/Ticket/Display.html?id=6546
[16:49:37] there's that
[16:49:43] Apparently I'm terrible and searching rt, I can never find what i'm looking for
[16:50:04] I'm not fond of the search facility
[16:50:41] ok, here was my original request: https://rt.wikimedia.org/Ticket/Display.html?id=6390
[16:51:29] oh bonded ports, more cables...
[16:51:36] did that happen?
[16:52:03] I think so. Rob created a bunch of other tickets for the subtasks… are they attached to that first ticket somehow?
[16:52:20] ah here's the secondary ntworking one
[16:52:28] yeah, 6481 and 6482
[16:52:44] 'did not make any changes to vlans'
[16:52:50] Yeah, I just read that
[16:53:03] So that sort of explains why virt1001 doesn't work, doesn't explain why there's no entry at all for virt1002
[16:53:32] I guess I'll just add virt1002 myself.
[16:53:37] And then… who understands about vlans?
[16:54:24] yes, I would say add an entry for any that are missing
[16:54:50] I can slug through it somewhat slowly
[16:54:55] (vlans)
[16:55:02] hm, no virt1003 either
[16:55:11] in the sf tz, leslie but dunno if she will be around
[16:55:24] mark might still be up?
[16:55:31] (CR) BryanDavis: "Trivial documentation typos" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/103080 (owner: Aaron Schulz)
[16:55:55] I haven't seen signs of him being active today, he might be off
[16:56:42] what's up
[16:56:56] uhhh
[16:57:11] you booted virt1001?
[16:57:16] or tried to?
[16:57:35] it still must have its old name, and be in stored configs
[16:57:46] I'll double check all that later and clean it up I guess
[16:58:09] I'm trying to pxe-boot virt1001. That fails, and it comes up thinking it is analytics1001 (which it used to be.)
[16:58:31] The pxe boot is failing because of 'no free leases'
[16:58:39] paravoid, now you're mostly caught up :)
[16:58:45] andrewbogott is trying to sort out virt 1001,2,3,8,9 which were old analytics1001,2,5,8
[16:59:03] thought I read "no free lazers", luckily I was wrong
[16:59:17] https://rt.wikimedia.org/Ticket/Display.html?id=6482 this was done but no changes to vlan
[16:59:45] (PS1) Andrew Bogott: Add the mysteriously-missing virt1002 and virt1003 entries. [operations/dns] - https://gerrit.wikimedia.org/r/103563
[17:00:32] Man, when I see such an obvious whole in a list like ^ I can't help but wonder if someone knows something that I don't know...
[17:00:50] s/whole/hole/
[17:00:53] and you were told that 1001,2,3,8 are in row b, what about 1009?
[17:01:07] I believe that 1009 was just moved to b as well.
[17:01:29] ticket 6529
[17:01:37] the free leases is for public1-b-eqiad
[17:01:44] virt1001 is in private address space
[17:01:53] got it
[17:02:19] or, not moved, it was always there and just mislabled previously...
[17:02:34] "no free leases" is just a weird error message that means "I couldn't allocate this address on this subnet"
[17:02:48] paravoid: So you think that's a red herring?
[17:03:06] no, it's a misconfiguration for sure
[17:03:15] ok
[17:03:20] I'm assuming virt1001 is supposed to be on private address space, correct?
[17:03:21] I figured it didn't really have to do with # of leases
[17:03:26] yep
[17:03:35] 1000 is a public host, 1001-1009 should be private
[17:03:59] when things were renamed and new ips given out, the vlan configs weren't updated
[17:04:03] says on the ticket
[17:04:55] the switch port is configured on the wrong VLAN then
[17:08:38] So I guess I wait for LeslieCarr to come to work (if she's working today)
[17:08:49] to change the VLAN?
[17:08:51] no, i can do it
[17:09:03] chris johnson usually does these
[17:09:16] is it more than just virt1001?
[17:10:00] paravoid: It's probably for all of these: https://rt.wikimedia.org/Ticket/Display.html?id=6482
[17:10:17] If chris usually does it, why did he close that ticket without doing it I wonder?
[17:10:32] shit happens :)
[17:11:05] I need to have some wikidata related config for wikipedia beta updated. Anyone here that can help me with this?
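The "no free leases" diagnosis above comes down to a subnet membership check: the DHCPDISCOVER arrived via a relay on the public 208.80.154.128/26 network, while the host is meant to get a private address. A minimal sketch of that reasoning follows. The relay address and subnet are taken from the dhcpd log line quoted earlier; the private host address 10.64.20.7 is a made-up placeholder, since the log never states virt1001's intended IP, and `explain_discover` is a hypothetical helper.

```python
import ipaddress


def explain_discover(relay_ip, subnet, host_ip):
    """Describe whether `host_ip` could be leased on the relay's subnet."""
    relay = ipaddress.ip_address(relay_ip)
    network = ipaddress.ip_network(subnet)
    host = ipaddress.ip_address(host_ip)
    if relay not in network:
        return "relay is not on this subnet"
    if host in network:
        return "address fits the subnet"
    return "address is outside the subnet the request arrived on"


if __name__ == "__main__":
    # 10.64.20.7 is a hypothetical private address used for illustration.
    print(explain_discover("208.80.154.131", "208.80.154.128/26", "10.64.20.7"))
```

When the last case is reported, the switch port is on a different VLAN from the one the host's address belongs to, which matches the conclusion reached in the channel.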
[17:11:11] analytics2,5,6 and 8 already had private ips
[17:11:25] ge-3/0/7 up down virt1008:eth0
[17:11:25] ge-3/0/15 up up virt1008 eth0
[17:11:26] jeroendedauw, it's a no deploys week
[17:11:27] yay...
[17:11:41] ugh
[17:11:52] here should be two ports but not like that
[17:11:54] MaxSem: the change I want to happen is that it stops pulling Wikibase and WikibaseDataModel from master
[17:11:57] *there
[17:12:03] MaxSem: It should stick to the current HEAD
[17:12:09] Else things will break
[17:12:15] jeroendedauw, poke hashar
[17:12:33] * jeroendedauw goes on a quest to find the hashar
[17:12:54] MaxSem: no one else that can do this?
[17:13:00] no idea
[17:13:09] maybe Krinkle|detached
[17:13:56] what's so wrong with master WD?
[17:14:17] Nothing
[17:14:21] Yet
[17:14:25] they're all over the place
[17:14:26] all wrong
[17:14:31] hehe
[17:14:57] MaxSem: we will be introducing some new dependencies, beta does not have them, so will break
[17:15:14] paravoid: Are those software things or do they recall a dc visit?
[17:15:17] you're scaring me
[17:15:23] andrewbogott: just config change
[17:15:24] Obviously it should be updated to have the dependeencies, but we decided to not block on this
[17:15:28] *changes
[17:15:37] oh, great. thank you!
[17:18:29] MaxSem: the fragility of how dependencies are managed in WMF-verse scares me
[17:18:35] meanwhile… apergos, does https://gerrit.wikimedia.org/r/#/c/103563/ look safe to you?
[17:20:13]