[00:00:01] andrewbogott: thanks [00:01:32] mutante: feel free to plaster links everywhere :) [00:01:46] And, thank you for keeping records :) That page is kind of a mess right now :( [00:03:14] andrewbogott: https://wikitech.wikimedia.org/w/index.php?title=Puppet_Todo&diff=89289&oldid=87328 [00:03:41] great! [00:03:46] how is download.pp ? just saw you abandoned one earlier? [00:03:48] i may have missed [00:05:02] ah , i also see your comment on gerrit and you even put the link there i asked for:) [00:06:11] merged, nice [00:06:27] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 15880 MB (3% inode=99%): [00:06:44] mutante: yeah, there were several little post-patches needed but it gets a clean puppet run now. [00:07:02] we should tell gqil/sumana how volunteers got merges in ops/puppet [00:07:04] heh [00:07:17] sweet [00:08:14] TimStarling: does https://gerrit.wikimedia.org/r/#/c/95481/ look OK? [00:09:16] andrewbogott: i put download.pp being completed on the page as well.. 
ttyl [00:09:39] made a second patch for irc.pp after alex's comments [00:11:53] (03PS1) 10Ori.livneh: Don't auto-link SHA1s and Gerrit change IDs [operations/debs/adminbot] - 10https://gerrit.wikimedia.org/r/95571 [00:16:50] yes [00:19:16] TimStarling: good, and https://gerrit.wikimedia.org/r/#/c/95519/ too [00:19:57] (03PS2) 10Ori.livneh: Don't auto-link SHA1s and Gerrit change IDs [operations/debs/adminbot] - 10https://gerrit.wikimedia.org/r/95571 [00:20:42] (03CR) 10Ori.livneh: [C: 032] Don't auto-link SHA1s and Gerrit change IDs [operations/debs/adminbot] - 10https://gerrit.wikimedia.org/r/95571 (owner: 10Ori.livneh) [00:20:52] I figured that field should be protected even more know, since blindly using it without using the wrapper would be more likely to have undesired data given the clear() change [00:21:02] *now, ugh [00:22:25] paravoid: I feel like we are overwhelmed with perf issues lately :/ [00:29:38] !log schedule 1yr downtime for unused host 'gurvin' and all its services (RT #6135) and shutting it down [00:29:53] Logged the message, Master [00:31:15] (03PS2) 10Dzahn: remove gurvin from dhcp, dsh and site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/94449 [00:32:50] (03CR) 10Dzahn: [C: 032] remove gurvin from dhcp, dsh and site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/94449 (owner: 10Dzahn) [00:34:14] merged a change on manifests/role/solr.pp that was unmerged on palladium [00:34:26] does anyone know if we use redis-sentinal or something similar to manage redis failover? [00:34:29] (remember we have to merge on 2 servers currently) [00:34:45] ^d: ^ [00:35:04] <^d> I know very little about the existing Solr setup. [00:35:24] eh, sry, it was a blind guess because of search [00:35:28] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 15691 MB (3% inode=99%): [00:35:34] <^d> mutante: It's not actually search :p [00:35:56] we use it for session handling! [00:36:25] ah.. 
Extension:Translate [00:36:26] i see [00:36:33] <^d> Yeah, Translate + GeoData. [00:36:38] (03PS1) 10Cmjohnson: Adding dns entries for elast1001-1012 [operations/dns] - 10https://gerrit.wikimedia.org/r/95573 [00:36:50] <^d> mutante: Ideally we'd like to move them to Elastic, but we've not put a concerted effort in yet. [00:36:51] <^d> :) [00:38:09] i should have said Coren/Max .. nevermind [00:38:44] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for elast1001-1012 [operations/dns] - 10https://gerrit.wikimedia.org/r/95573 (owner: 10Cmjohnson) [00:38:47] ^d: k:) just wanted to say something might be in effect now that just looked merge in gerrit before. which upon closer look is fixing Solr OOM in Labs [00:38:52] so all good [00:39:05] !log dns update [00:39:14] max_heap => "256M", for solr::geodata [00:39:18] Logged the message, Master [00:40:01] cmjohnson1: since you're on DNS, care to remove gurvin? patch is there and i just shut it down and removed from puppet [00:40:15] sure [00:40:25] https://gerrit.wikimedia.org/r/#/c/94448/ just unsure if it should have been replaced with [00:40:29] asset tag [00:40:50] should be wiped [00:41:22] (03CR) 10Cmjohnson: [C: 032] remove gurvin and gurvin.mgmt, decom [operations/dns] - 10https://gerrit.wikimedia.org/r/94448 (owner: 10Dzahn) [00:42:57] mtuante: there is nothing in dns regarding asset tag [00:44:28] cmjohnson1: should it also be like here? https://gerrit.wikimedia.org/r/#/c/94426/2/templates/10.in-addr.arpa [00:44:40] WMF3709.mgmt.pmtpa.wmnet. 
etc [00:45:03] it depends if you want a DNS name for getting to mgmt for wiping [00:45:35] RobH said we want these in there in case we are reusing them [00:45:43] and/or for wiping [00:46:00] but if it's going to be removed anyways after one person uses it one time , shrug [00:46:03] we would want them if we were going to reuse but everything in Tampa gets wiped [00:46:36] in the former 2 tickets in pmtpa i gave steve those WMF.mgmt names [00:46:54] once we decom it, we need remove all entries, power down and create a ticket for steve [00:46:56] but dunno if that makes it easier [00:47:11] no, wiping is done with a disc or usb [00:47:50] i prefer to leave the server names in the ticket...so new ticket with wipe 'servername' and in ticket request he update racktables [00:48:32] well then, i won't worry about gurvin not having one right now and just refer to it by name.. the info is in racktables too [00:48:38] yep [00:48:45] doing so [00:50:28] cmjohnson1: last step is "recycle hardware to scrap metal"? [00:50:44] i always say recycle/donate .. shrug [00:52:09] we're not sure what we're doing yet. I like to think they're is a really awesome non-profit doing awesome things that could use these servers [00:54:17] cmjohnson1: we recently had an idea.. the Internet Archive.. it recently almost burned down. 600k damage or something and they asked for donations [00:54:27] and it's like almost walking distance from our place [00:54:48] though.. they might need money for new scanners more than actual servers.. we could offer though [00:55:15] we could as long as they meet the req's but if rob was involved in the discussion they probably do [01:00:43] cmjohnson1: was just an idea when reading the article and then on that same day we happened to drive by the building and you could still see where it had burned.. i'd have to find out by walking over there [01:01:23] it's definitely worth looking into [01:01:29] mutante: "walking distance"? 
;) [01:01:39] According to gmaps it's 4.2 miles from the office [01:02:57] definitely +1 IA, but their server infra is *really* specialized. [01:03:13] (I suppose that isn't necessarily too far to walk, but it takes a while :) ) [01:03:22] mutante: so it sounds like they lost a lot of cameras and scanners [01:03:31] RoanKattouw: i meant my private place in NoPa [01:03:39] moved [01:03:44] https://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/#comment-298488 [01:04:07] Oooh nice [01:04:13] shutdown by dzahn per RT-6135 , please wipe and de-rack per RT-6311 [01:04:15] I still mentally had you in Nob Hill [01:04:18] oops, wrong paste [01:04:23] http://news.ninemsn.com.au/technology/2013/11/08/09/51/internet-archive-asks-for-donations [01:04:24] Which is closer but still fairly far [01:04:53] Good to hear you've found something in NoPa :) [01:05:05] yea, i like it. close to GG park [01:05:12] Yeah [01:05:25] actually i'm on Ocean Beach right now.heh [01:05:26] It's a kind of neighborhood I would like to live in if it were actually affordable to live there without roommates :S [01:06:39] yea, i'm sharing with rob [01:06:52] Right, I sort of figured since he also recently moved to the same neighborhood :) [01:08:04] i got tired of SRO after a year :) [01:08:12] "* No servers were affected. If some had been damaged, we have backups in different locations. An electrical conduit was damaged, but all digital services were functional within 6 hours, fully operational in 10 hours." [01:10:01] is that fast or slow for root cause: building on fire ?:) [01:11:07] greg-g: I formatted Roan's summary and sent it to you; holding off on posting it myself to give Faidon a chance to respond. 
[01:11:12] oh well, i guess they won't really need our servers, unless ..they could make a charity auction to turn them into cash, who knows, people might pay knowing it's a "wikipedia server" [01:14:24] ori-l: /me nods [01:14:56] mutante: this server was blessed with the blood, sweat, and tears of the wikipedia ops team [01:16:30] greg-g: blood is actually true for some ,, those racks can have sharp edges [01:16:41] indeed :) [01:16:58] humans and computers don't mix [01:29:23] RECOVERY - Disk space on xenon is OK: DISK OK [01:29:43] PROBLEM - Puppet freshness on sq44 is CRITICAL: No successful Puppet run in the last 10 hours [01:59:03] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [02:02:03] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [02:15:19] !log LocalisationUpdate completed (1.23wmf3) at Fri Nov 15 02:15:18 UTC 2013 [02:15:35] Logged the message, Master [02:34:42] !log LocalisationUpdate completed (1.23wmf4) at Fri Nov 15 02:34:41 UTC 2013 [02:34:57] Logged the message, Master [03:01:10] (03PS2) 10Tim Starling: Disable client idle disconnection [operations/puppet] - 10https://gerrit.wikimedia.org/r/94848 [03:01:17] (03CR) 10Tim Starling: [C: 032 V: 032] Disable client idle disconnection [operations/puppet] - 10https://gerrit.wikimedia.org/r/94848 (owner: 10Tim Starling) [03:14:10] !log on rdb1001-1004, set timeout=0 in the soft state, to match I8c9b13c1 [03:14:25] Logged the message, Master [03:15:36] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 15 03:15:35 UTC 2013 [03:15:50] Logged the message, Master [03:19:59] !log on mc1001-1016: set timeout=0 in the redis soft state, as for rdb* [03:20:13] Logged the message, Master [04:30:37] PROBLEM - Puppet freshness on sq44 is CRITICAL: No successful Puppet run in the last 10 hours [05:11:10] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [05:12:53] (03PS1) 10Ori.livneh: Beta Labs: set deployment-fluoride as $wgUDPProfilerHost 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95582 [05:13:10] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [05:13:29] (03CR) 10Ori.livneh: [C: 032] Beta Labs: set deployment-fluoride as $wgUDPProfilerHost [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95582 (owner: 10Ori.livneh) [05:15:25] !log ori updated /a/common to {{Gerrit|I67a6d49d4}}: Beta Labs: set deployment-fluoride as $wgUDPProfilerHost [05:15:39] Logged the message, Master [05:16:19] PROBLEM - LVS HTTPS IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:20] PROBLEM - LVS HTTP IPv6 on wikimedia-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:10] RECOVERY - LVS HTTPS IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 64083 bytes in 0.675 second response time [05:17:11] RECOVERY - LVS HTTP IPv6 on wikimedia-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 91614 bytes in 0.432 second response time [06:42:32] !log fixing broken cgroups on mw1038, mw1044, mw1091 [06:42:46] Logged the message, Master [06:47:25] !log also fixed mw1141, mw1193 [06:48:50] Logged the message, Master [07:30:39] PROBLEM - Puppet freshness on sq44 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:59] PROBLEM - Host sq44 is DOWN: PING CRITICAL - Packet loss = 100% [07:47:09] that's me [07:47:12] will log shortly [07:52:42] !log sq44 won't come back up, hung after "Remote Access Controller detected", rt #6312 [07:52:56] Logged the message, Master [09:04:26] (03PS1) 10Akosiaris: Shift pmtpa internal servers to new puppet infra [operations/dns] - 10https://gerrit.wikimedia.org/r/95585 [09:07:18] !log disabled puppet an all *.pmtpa.wmnet servers to facilitate auditing the change to new puppetmaster infra [09:07:27] ??? 2?? wtf ? 
[09:07:34] Logged the message, Master [09:10:23] (03CR) 10Akosiaris: [C: 032] Shift pmtpa internal servers to new puppet infra [operations/dns] - 10https://gerrit.wikimedia.org/r/95585 (owner: 10Akosiaris) [09:15:50] !log disabled commiting in sockpuppet private for evah (hopefully) through a git hook [09:16:05] Logged the message, Master [09:24:45] (03PS1) 10ArielGlenn: checkhost.py: report hosts in various manifests and lists [operations/software] - 10https://gerrit.wikimedia.org/r/95586 [09:24:50] (03CR) 10jenkins-bot: [V: 04-1] checkhost.py: report hosts in various manifests and lists [operations/software] - 10https://gerrit.wikimedia.org/r/95586 (owner: 10ArielGlenn) [09:25:56] akosiaris: I guess you saw the errrors on sockpuppet/palladium/virt1000/virt0 ealrlier about gitpuppet-private.key ? [09:26:33] earlier being ? Cause i thought i fixed that [09:26:36] lemme check [09:27:15] earlier this morning, when I checked puppet runs [09:28:45] other people's stuff fails pep8 and my stuff gets the -2, grrr ... *and* it's E128 for them anyways which is semi-bogus [09:29:08] which is that one ? [09:29:24] i always hate the one about under-indentation [09:29:26] continuation line under-indented for visual indent [09:29:28] yep that's the one [09:29:31] exactly that one [09:29:44] I will nag hashar about it a little later [09:30:21] which btw my pep8 is a little bit older than the one on our CI infra [09:30:25] and I always miss it and get a -2 later... gkrrrr [09:31:02] even better :-/ [09:31:33] and yeah you could have a labs instance with the current one and etc but [09:31:42] mh, prefer doing the work on my laptop, it's why I have one [09:31:56] same here [09:32:09] i do have a chroot however with our repo enabled [09:32:25] smart [09:32:38] I have a vm :-D [09:32:43] so after getting the first -2 i figure out the rest without getting continous -2s [09:32:46] (laptop runs Other Distro(tm)) [09:32:52] lol [09:33:00] other Distro being ? 
[09:33:09] for me it's Debian [09:33:19] fedora :-D [09:33:25] so ... close enough [09:33:34] ah... you got systemd ? [09:33:37] lucky you!!! [09:33:37] :P [09:33:39] yep [09:33:40] hahaha [09:34:45] so the gitpuppet-private-key is fixed only on the new infra and not the old one [09:34:59] I managed to be out-of-sync myself... imagine others [09:35:22] well it's still happening on eg palladium [09:35:31] so ... I WILL finish this migration today and this will become unimportant [09:35:43] ok, so I can ignore, that was my basic question anyways [09:35:52] yes because palladium still has stafford as its puppetmaster [09:36:12] after it starts looking at himself (itself?) everything should be ok [09:36:18] ok. just let me know at the end of the day: which hosts are puppetmasters, which are cas, and I shall be happy :-) [09:36:41] yeah... I have to update a shitload of wiki pages :-( [09:36:50] will send an email after I am done [09:36:54] perfect [09:45:31] !log restarting zuul, leaked file descriptors pointing to non existent git pack files :/ [09:45:47] Logged the message, Master [09:47:53] ori-l: respond to what? [09:53:42] ah a hashar... when you have time for a question about the pep8 verification, lemme know [09:54:30] apergos: got time rightnow [09:54:36] sweet! [09:55:16] looking at https://gerrit.wikimedia.org/r/#/c/95586/ where it is complaining but the failure appears to be about stuff already in there [09:55:17] https://integration.wikimedia.org/ci/job/operations-software-pep8/45/console [09:55:38] and also I kinda hate the E128 continuation line check [09:56:00] :D [09:56:12] so my 2 qs are, is pep8 now flagging stuff from things checked in earlier? and, can we turn off E128 (globally I would argue)? 
[09:56:15] at the root of that repository, there is a .pep8 file [09:56:22] yeah I know, I could add there [09:56:23] it has a statement like: ignore = W191,E225,E231,E501,E301,E302 [09:56:28] you can add in E128 [09:56:41] and add above a comment along the line of: ; E128: continuation line under-indented for visual indent [09:57:16] that will make pep8 ignore that error, you can then rebase your change on top of it and you will no more have any complaint [09:57:28] I was asking if we could turn that off globally rather than just that repo [09:57:33] nop [09:57:37] boooo [09:57:39] :D [09:57:45] ok well that was a clear answer, thanks [09:57:57] pep8 is a standard so .. doesn't make sense to hack it Iprefer everyone to attempt to stick to that standard [09:58:15] though via .pep8 one can tweak its standard when it is annoying [09:58:16] ok but... new pep8 checks flag more/differently and so fail on code already checked in [09:58:21] that's a bit of a problem [09:58:35] ohhh [09:59:00] maybe something changed between pep8 1.4.5 and 1.4.6 which we are running now [09:59:04] grrrr [09:59:13] which would be annoying indeed [10:00:29] and that (E128 behavior) seems to be such a thing [10:03:52] (03PS1) 10Hashar: pass pep8 E128 (continuation lines under-indented) [operations/software] - 10https://gerrit.wikimedia.org/r/95587 [10:03:54] :D [10:04:55] (03PS1) 10Hashar: pep8: ignore E128 [operations/software] - 10https://gerrit.wikimedia.org/r/95588 [10:05:19] (03PS2) 10Hashar: checkhost.py: report hosts in various manifests and lists [operations/software] - 10https://gerrit.wikimedia.org/r/95586 (owner: 10ArielGlenn) [10:05:34] (03CR) 10Hashar: "rebased to get rid of pep8 E128 errors." [operations/software] - 10https://gerrit.wikimedia.org/r/95586 (owner: 10ArielGlenn) [10:05:37] apergos: solved :-] [10:05:53] the first commit fix the existing pep8 E128 errors [10:06:00] the second ignore them entirely, [10:06:09] the third rebase your change which should be passing now. 
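Note on the `.pep8` change discussed above: a minimal Python sketch of adding a code such as E128 to the `ignore` list. It assumes the file uses a `[pep8]` section header (the log only quotes the `ignore = ...` line), and the helper name `add_ignored_code` is hypothetical. `configparser` drops comments when it writes, so the explanatory `; E128: ...` comment suggested in the chat would have to be added by hand.

```python
import configparser
import io


def add_ignored_code(config_text: str, code: str) -> str:
    """Return config_text with `code` appended to the [pep8] ignore list.

    Assumption: the ignore list lives under a [pep8] section.
    Comments in the input are not preserved by configparser.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("pep8"):
        parser.add_section("pep8")
    current = parser.get("pep8", "ignore", fallback="")
    codes = [c.strip() for c in current.split(",") if c.strip()]
    if code not in codes:  # keep the operation idempotent
        codes.append(code)
    parser.set("pep8", "ignore", ",".join(codes))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# The ignore line quoted in the log, wrapped in the assumed section header.
sample = "[pep8]\nignore = W191,E225,E231,E501,E301,E302\n"
updated = add_ignored_code(sample, "E128")
print(updated)
```

Running the helper a second time on its own output leaves the list unchanged, so it is safe to apply repeatedly.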
[10:06:31] do you have pep8 errors reported in your editor? And which text editor are you using? [10:06:51] (I use vim with syntastic and vim-jedi plugins) [10:08:05] vim-jedi... huh ... use vim-sith :P [10:08:26] hmmm... i don't suppose that exists ... [10:08:53] that is a very nice completion plugins for python, I highly recommend it [10:09:29] it comes with autocompletion that popup automatically and display the function docstring in a split buffer [10:09:34] saves me a ton of time [10:09:40] I wish I had something similar for php :/ [10:10:01] I use a standalone pep8 checker [10:10:07] hmmm that sounds interesting [10:10:07] akosiaris: would you have time to finish up the contint ferm patch today ? [10:10:24] I can show you a demo over hangout screensharing if you want [10:10:33] hashar, yes but not now [10:10:33] i am finishing up the puppet migration (finally) [10:10:46] if everything comes down to ruins you know who to blame [10:11:08] and I use emacs but do not pep8 from within it [11:05:01] PROBLEM - Disk space on cp1058 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 12596 MB (4% inode=99%): /srv/sdb3 12437 MB (3% inode=99%): [11:09:34] bbl [11:13:29] PROBLEM - Puppet freshness on virt11 is CRITICAL: No successful Puppet run in the last 10 hours [11:24:29] PROBLEM - Puppet freshness on virt6 is CRITICAL: No successful Puppet run in the last 10 hours [11:29:29] PROBLEM - Puppet freshness on virt7 is CRITICAL: No successful Puppet run in the last 10 hours [11:30:29] PROBLEM - Puppet freshness on virt5 is CRITICAL: No successful Puppet run in the last 10 hours [11:32:30] PROBLEM - Puppet freshness on virt9 is CRITICAL: No successful Puppet run in the last 10 hours [11:41:39] (03CR) 10Edenhill: [C: 031] "(9 comments)" [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [11:53:30] PROBLEM - Puppet freshness on virt2 is CRITICAL: No successful Puppet run in the last 10 hours [11:56:19] telnet -4 
palladium.eqiad.wmnet 8140 [11:56:19] Trying 10.64.16.160... [11:56:23] κλαψ [11:56:28] snif [11:56:29] PROBLEM - Puppet freshness on virt8 is CRITICAL: No successful Puppet run in the last 10 hours [11:56:31] haha [11:56:35] died? [11:56:42] niah [11:56:47] firewalled probably [11:56:53] from? [11:57:02] that is the $1M question [11:57:05] where are you trying to connect from? [11:57:09] virt6 [11:57:43] labs & analytics have all kinds of firewalls and such [11:57:46] huh [11:57:54] although i didn't remember if labs' production nodes did too [11:57:57] but it's quite possible [11:58:00] yeah... the firewall on virt6 is.... damn... [11:58:20] iptables -nxvL|wc -l [11:58:20] 1653 [11:58:22] sigh [11:58:31] PROBLEM - Puppet freshness on virt12 is CRITICAL: No successful Puppet run in the last 10 hours [11:59:30] PROBLEM - Puppet freshness on virt10 is CRITICAL: No successful Puppet run in the last 10 hours [12:00:47] i am starting to believe it's on the router [12:00:54] I guess I could change that message about 10 hours [12:03:23] (03PS1) 10ArielGlenn: update puppet freshness whiner to say 3 hours too [operations/puppet] - 10https://gerrit.wikimedia.org/r/95592 [12:05:12] (03CR) 10ArielGlenn: [C: 032] update puppet freshness whiner to say 3 hours too [operations/puppet] - 10https://gerrit.wikimedia.org/r/95592 (owner: 10ArielGlenn) [12:10:38] (03PS1) 10Akosiaris: Shift eqiad internal servers to new puppet infra [operations/dns] - 10https://gerrit.wikimedia.org/r/95593 [12:10:43] \o/ [12:19:39] damn ... i already changed one obvious term on cr2-pmtpa but no luck [12:19:50] who kills my packets ????? [12:19:55] cr1-sdtpa too? [12:20:04] ACLs usually are "packet filtered", not drop [12:22:37] it's probably just me but i can't load mediawiki.org [12:22:57] ok, works now [12:27:41] uh https://countess.archive.org/report/matrix.php [12:28:48] finally.... 
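Note on the `telnet -4 palladium.eqiad.wmnet 8140` probe above: a rough Python equivalent, sketched under assumptions. The function name `tcp_port_state` and its return labels are hypothetical. It only separates "connected", "actively refused", "no answer before the timeout" and "other error", and it cannot tell which device on the path is filtering.

```python
import socket


def tcp_port_state(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify the result of a single TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: reachable, nothing listening.
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout; packets may be silently dropped.
        return "no-answer"
    except OSError:
        # Anything else, e.g. name resolution failure or an unreachable
        # error, which an ICMP rejection from a filter may surface as.
        return "error"


# Example call using the host and port from the log (result not shown,
# since it depends on where it is run from):
# print(tcp_port_state("palladium.eqiad.wmnet", 8140))
```

Running the same check from more than one source host can help narrow down whether a host firewall or a router ACL is responsible, which is the question the chat was chasing.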
[12:28:48] paravoid thanks for the cr1-sdtpa pointer [12:29:12] RECOVERY - Puppet freshness on virt7 is OK: puppet ran at Fri Nov 15 12:29:03 UTC 2013 [12:29:43] RECOVERY - Puppet freshness on virt6 is OK: puppet ran at Fri Nov 15 12:29:33 UTC 2013 [12:30:03] RECOVERY - Puppet freshness on virt5 is OK: puppet ran at Fri Nov 15 12:29:53 UTC 2013 [12:30:20] akosiaris: each site has two routers, both being edge (ingress/egress points) and both serving as core routers with VRRP [12:30:58] akosiaris: the ones named counterintuitiviely are sdtpa/pmtpa & esams/knams [12:31:08] so i might have changed the primary only and everything would work ok [12:31:13] yup [12:31:17] until it wouldn't :) [12:31:17] nice :-) [12:31:38] esams & knams being 80km apart [12:31:47] sdtpa & pmtpa being 2 floors apart [12:31:53] PROBLEM - Disk space on cerium is CRITICAL: DISK CRITICAL - free space: /mnt/data 15003 MB (3% inode=99%): [12:31:53] RECOVERY - Puppet freshness on virt9 is OK: puppet ran at Fri Nov 15 12:31:44 UTC 2013 [12:32:39] !log updated sockpuppet term on cr1-sdtpa , cr2-tmtpa to include palladium [12:32:54] Logged the message, Master [12:33:05] !log disabled puppet on *eqiad.wmnet for change [12:33:19] Logged the message, Master [12:33:34] (03CR) 10Akosiaris: [C: 032] Shift eqiad internal servers to new puppet infra [operations/dns] - 10https://gerrit.wikimedia.org/r/95593 (owner: 10Akosiaris) [12:35:15] !log github: added zeljkofilipin to the new 'qa' team. 
Granted rights on wikimedia/mediawiki-selenium and wikimedia/qa-browsertests [12:35:22] zeljkof: ^^^ [12:35:30] Logged the message, Master [12:36:09] hashar: thanks [12:36:37] I do not see wikimedia under my organisations https://github.com/zeljkofilipin [12:36:53] RECOVERY - Disk space on cerium is OK: DISK OK [12:37:03] zeljkof: probably cached [12:37:13] aude: thanks, that is probably it [12:37:22] I can see the admin tools in the repos [12:37:23] sometimes i have to hard refresh on github [12:37:52] aude: I can see wikimedia at my github account now [12:37:57] cool :) [12:37:58] it just needed a minute [12:52:43] RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Fri Nov 15 12:52:41 UTC 2013 [12:55:59] Reedy: hey, do you have any idea what's the deal with wikiminiatlas? [12:56:03] RECOVERY - Puppet freshness on virt8 is OK: puppet ran at Fri Nov 15 12:55:56 UTC 2013 [12:58:03] RECOVERY - Puppet freshness on virt12 is OK: puppet ran at Fri Nov 15 12:57:52 UTC 2013 [12:58:53] RECOVERY - Puppet freshness on virt10 is OK: puppet ran at Fri Nov 15 12:58:47 UTC 2013 [13:12:55] RECOVERY - Puppet freshness on virt11 is OK: puppet ran at Fri Nov 15 13:12:47 UTC 2013 [13:43:33] paravoid: the only thing I know about wikiminiatlas is that its OSM tiles generated on the tool server and shown in an iframe [13:43:38] but that, you must already know about :-( [13:43:55] yeah [13:44:51] (03PS1) 10Akosiaris: Fix applicationserver ganglia plugins perms [operations/puppet] - 10https://gerrit.wikimedia.org/r/95600 [13:44:52] (03PS1) 10Akosiaris: Fix owner/perms for misc/logging.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/95601 [13:44:53] (03PS1) 10Akosiaris: Fix role::swift::ganglia_reporter perms/owner [operations/puppet] - 10https://gerrit.wikimedia.org/r/95602 [13:44:54] (03PS1) 10Akosiaris: fix perms/owner for memcached ganglia plugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/95603 [13:51:49] (03CR) 10Akosiaris: [C: 032] Fix applicationserver 
ganglia plugins perms [operations/puppet] - 10https://gerrit.wikimedia.org/r/95600 (owner: 10Akosiaris) [13:52:34] (03CR) 10Akosiaris: [C: 032] Fix owner/perms for misc/logging.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/95601 (owner: 10Akosiaris) [13:52:53] (03CR) 10Akosiaris: [C: 032] Fix role::swift::ganglia_reporter perms/owner [operations/puppet] - 10https://gerrit.wikimedia.org/r/95602 (owner: 10Akosiaris) [13:53:17] (03CR) 10Akosiaris: [C: 032] fix perms/owner for memcached ganglia plugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/95603 (owner: 10Akosiaris) [13:55:34] !log switched *eqiad.wmnet to using new puppet infra. triggering puppet run [13:55:48] Logged the message, Master [14:00:07] (03PS1) 10Akosiaris: Change wikimedia.org to new puppet infra [operations/dns] - 10https://gerrit.wikimedia.org/r/95604 [14:05:53] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 3 hours [14:14:53] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Nov 15 14:14:51 UTC 2013 [14:15:54] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 3 hours [14:20:43] huh [14:20:59] i am having problems accessing enwiki but dewiki, wikidata are fine [14:21:10] just me? [14:21:23] it's been weird all day [14:21:29] sometime sworks [14:26:05] (03CR) 10Akosiaris: [C: 032] "LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/94407 (owner: 10Dzahn) [14:28:34] ok, i did a ping to de.wikipedia.org [14:28:42] now en.wikipedia.works after ping [14:29:07] paravoid: were folks doing dns stuff? 
[14:35:53] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: No successful Puppet run in the last 3 hours [14:40:52] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: No successful Puppet run in the last 3 hours [14:41:52] PROBLEM - Puppet freshness on vanadium is CRITICAL: No successful Puppet run in the last 3 hours [14:45:37] (03PS1) 10Akosiaris: Overzealous mode change in 31c9a130 [operations/puppet] - 10https://gerrit.wikimedia.org/r/95607 [14:45:44] (03CR) 10jenkins-bot: [V: 04-1] Overzealous mode change in 31c9a130 [operations/puppet] - 10https://gerrit.wikimedia.org/r/95607 (owner: 10Akosiaris) [14:47:45] huh ? [14:48:52] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: No successful Puppet run in the last 3 hours [14:48:53] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: No successful Puppet run in the last 3 hours [14:50:52] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: No successful Puppet run in the last 3 hours [14:54:53] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 3 hours [14:57:05] i freaking hate ACLs [15:02:07] !log update cr{1,2}-eqiad ACLs with new puppetmaster frontend address [15:02:19] Logged the message, Master [15:04:48] btw apergos those 3 hours instead of 10. excellent idea :-) [15:05:06] sweet [15:05:37] 3 hours? [15:05:47] puppet whine [15:06:11] btw today: no new kmem_alloc reports [15:06:27] either on the xfs or the ext4 ones [15:31:15] (03CR) 10Ottomata: "(9 comments)" [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [15:32:17] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. 
[15:34:57] RECOVERY - Puppet freshness on analytics1005 is OK: puppet ran at Fri Nov 15 15:34:53 UTC 2013 [15:37:41] (03CR) 10Akosiaris: [C: 032] Change wikimedia.org to new puppet infra [operations/dns] - 10https://gerrit.wikimedia.org/r/95604 (owner: 10Akosiaris) [15:40:17] RECOVERY - Puppet freshness on analytics1015 is OK: puppet ran at Fri Nov 15 15:40:15 UTC 2013 [15:41:06] RECOVERY - Puppet freshness on vanadium is OK: puppet ran at Fri Nov 15 15:40:56 UTC 2013 [15:44:57] (03PS4) 10Ottomata: Writing JSON statistics to log file rather than syslog or stderr [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 [15:47:57] RECOVERY - Puppet freshness on analytics1002 is OK: puppet ran at Fri Nov 15 15:47:53 UTC 2013 [15:48:16] RECOVERY - Puppet freshness on analytics1022 is OK: puppet ran at Fri Nov 15 15:48:08 UTC 2013 [15:49:57] RECOVERY - Puppet freshness on analytics1008 is OK: puppet ran at Fri Nov 15 15:49:53 UTC 2013 [15:54:17] RECOVERY - Puppet freshness on analytics1018 is OK: puppet ran at Fri Nov 15 15:54:12 UTC 2013 [15:54:45] ok, gotta run to meet yurik at a cowork space, bbl [16:06:45] (03PS1) 10Akosiaris: netmapper perms/owner fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95625 [16:06:46] (03PS1) 10Akosiaris: role::logging::mediawiki owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95626 [16:06:47] (03PS1) 10Akosiaris: mysql_wmf::pc::conf perms/owner fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95627 [16:06:48] (03PS1) 10Akosiaris: wikidata logrotate configuration owner fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/95628 [16:06:49] (03PS1) 10Akosiaris: l10nupdate logrotate configuration owner fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/95629 [16:06:50] (03PS1) 10Akosiaris: misc::monitoring::net::udp owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95630 [16:06:51] (03PS1) 10Akosiaris: misc::udp2log 
owner/perms fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95631 [16:06:52] (PS1) Akosiaris: ganglia::plugin::python owner/perms fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95632 [16:07:49] (CR) jenkins-bot: [V: -1] netmapper perms/owner fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95625 (owner: Akosiaris) [16:08:35] I am gonna kill jenkins [16:08:37] (CR) jenkins-bot: [V: -1] role::logging::mediawiki owner/perms fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95626 (owner: Akosiaris) [16:09:12] (CR) jenkins-bot: [V: -1] mysql_wmf::pc::conf perms/owner fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95627 (owner: Akosiaris) [16:09:25] lol [16:09:44] 16:07:45 err: Could not parse for environment production: Syntax error at '=>'; expected '}' at /srv/ssd/jenkins-slave/workspace/operations-puppet-validate/modules/varnish/manifests/netmapper_update.pp:25 [16:09:53] (CR) jenkins-bot: [V: -1] wikidata logrotate configuration owner fix [operations/puppet] - https://gerrit.wikimedia.org/r/95628 (owner: Akosiaris) [16:10:35] (CR) jenkins-bot: [V: -1] l10nupdate logrotate configuration owner fix [operations/puppet] - https://gerrit.wikimedia.org/r/95629 (owner: Akosiaris) [16:10:57] grrrr [16:11:16] (CR) jenkins-bot: [V: -1] misc::monitoring::net::udp owner/perms fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95630 (owner: Akosiaris) [16:11:57] (CR) jenkins-bot: [V: -1] misc::udp2log owner/perms fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95631 (owner: Akosiaris) [16:12:39] (CR) jenkins-bot: [V: -1] ganglia::plugin::python owner/perms fixups [operations/puppet] - https://gerrit.wikimedia.org/r/95632 (owner: Akosiaris) [16:15:45] mutante: upload tarball please? 
https://rt.wikimedia.org/Ticket/Display.html?id=6316 [16:18:03] (03PS2) 10Akosiaris: ganglia::plugin::python owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95632 [16:18:04] (03PS2) 10Akosiaris: l10nupdate logrotate configuration owner fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/95629 [16:18:05] (03PS2) 10Akosiaris: wikidata logrotate configuration owner fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/95628 [16:18:07] (03PS2) 10Akosiaris: misc::udp2log owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95631 [16:18:08] (03PS2) 10Akosiaris: misc::monitoring::net::udp owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95630 [16:18:08] (03PS2) 10Akosiaris: netmapper perms/owner fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95625 [16:18:09] (03PS2) 10Akosiaris: mysql_wmf::pc::conf perms/owner fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95627 [16:18:10] (03PS2) 10Akosiaris: role::logging::mediawiki owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95626 [16:18:46] (03CR) 10Akosiaris: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/95607 (owner: 10Akosiaris) [16:24:06] (03CR) 10Akosiaris: [C: 032] netmapper perms/owner fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95625 (owner: 10Akosiaris) [16:24:19] (03CR) 10Akosiaris: [C: 032] role::logging::mediawiki owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95626 (owner: 10Akosiaris) [16:24:32] (03CR) 10Akosiaris: [C: 032] mysql_wmf::pc::conf perms/owner fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95627 (owner: 10Akosiaris) [16:24:45] (03CR) 10Akosiaris: [C: 032] wikidata logrotate configuration owner fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/95628 (owner: 10Akosiaris) [16:24:54] (03CR) 10Akosiaris: [C: 032] l10nupdate logrotate configuration owner fix [operations/puppet] - 
10https://gerrit.wikimedia.org/r/95629 (owner: 10Akosiaris) [16:25:07] (03CR) 10Akosiaris: [C: 032] misc::monitoring::net::udp owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95630 (owner: 10Akosiaris) [16:25:18] (03CR) 10Akosiaris: [C: 032] misc::udp2log owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95631 (owner: 10Akosiaris) [16:25:28] (03CR) 10Akosiaris: [C: 032] ganglia::plugin::python owner/perms fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/95632 (owner: 10Akosiaris) [16:31:18] RECOVERY - udp2log log age for lucene on oxygen is OK: OK: all log files active [16:33:15] (03PS2) 10Akosiaris: Overzealous mode change in 31c9a130 [operations/puppet] - 10https://gerrit.wikimedia.org/r/95607 [16:34:54] (03CR) 10Akosiaris: [C: 032] Overzealous mode change in 31c9a130 [operations/puppet] - 10https://gerrit.wikimedia.org/r/95607 (owner: 10Akosiaris) [17:07:36] (03PS1) 10Akosiaris: Fix duplicate ensure in swift.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/95638 [17:09:56] cmjohnson1: what's the skinny? :) [17:11:34] ottomata: haven't looked at analytics yet...had to get elastic in for manybubbles....we had some network issues that needs lesliecarr to look at [17:12:47] did you get around to giving them all proper IPs? [17:12:52] if they have non-conflicting IPs [17:13:01] I *think* I can move on from there and get Leslie's help [17:15:10] oh, what's the issue ? 
[17:17:52] (03PS1) 10Faidon Liambotis: role/swift: fix a typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/95639 [17:18:41] LeslieCarr: they gave .3 to a server instead of reserving .2/.3 for the two routers [17:19:04] (03CR) 10Faidon Liambotis: [C: 032] role/swift: fix a typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/95639 (owner: 10Faidon Liambotis) [17:21:45] (03PS1) 10Akosiaris: ganglia_new perms/owner fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/95641 [17:22:36] paravoid: I think chris fixed that [17:22:37] not sure [17:22:49] other than that leslie, once the new subnets are assigned [17:22:53] i think we just need to fix the analytics acl [17:23:15] 1) Can someone give me delete privs on wikitech wiki? [17:23:16] 2) or can someone delete a page for me? :) [17:23:32] m-ark said: 'ask leslie to carefully check the full config of the new subnets, compare with the existing. and multicast. and ipv6 router advertisements. and yadda yadda yadda' [17:23:39] (03CR) 10Akosiaris: [C: 032] ganglia_new perms/owner fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/95641 (owner: 10Akosiaris) [17:25:07] PROBLEM - MySQL Processlist on db1052 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 1 copy to table, 76 statistics [17:25:20] oh, i don't think i did ipv6 router announcements [17:27:28] apergos: have a sec to give me bureaucrat on wikitech? [17:27:43] meeting starting in 2 mins [17:27:44] LeslieCarr: i think you forgot to update the ACLs too [17:27:50] quite possibly [17:27:58] and I have to do the reconnection drill which google has made more annoying [17:28:24] technically i've been on vacation the last 2 days... except no vacation ;) [17:28:49] huh [17:28:59] re reconnection drill [17:29:43] huh [17:30:07] RECOVERY - MySQL Processlist on db1052 is OK: OK 1 unauthenticated, 0 locked, 0 copy to table, 8 statistics [17:30:14] greg-g: who can I get to upload a tarball? 
https://rt.wikimedia.org/Ticket/Display.html?id=6316 [17:30:44] hexmode: someone with root, but they're all going into a meeting right now apparently :) [17:31:03] greg-g: yeah, :P [17:31:27] greg-g: not everyone w/ root has access to download, apparently? [17:31:34] * greg-g shrugs [17:32:01] Reed y for example. [17:33:10] anyone in ops mad if ori makes me a crat while ya'll are in a meeting? [17:33:15] (on wikitech wiki) [17:35:01] Eloquence: do you know if there's a formal process for that? [17:35:17] I know Kaldari nominated himself, but IIRC it was somewhat tongue-in-cheek [17:36:22] it seems to be limited to opsen (current and old) plus good ole luis [17:36:34] I doubt anyone will care :) go for it. [17:36:38] OK, thanks [17:37:02] go ahead [17:37:04] thanks ori [17:37:08] and Eloquence and paravoid :) [17:37:17] (many of us are in a meeting) [17:37:18] * greg-g gets to delete pages now, weee [17:37:44] what's your username? [17:37:50] Greg Grossmeier [17:38:54] done [17:39:03] ty [17:43:51] (03PS1) 10Akosiaris: Have sockpuppet complain on running puppet-merge [operations/puppet] - 10https://gerrit.wikimedia.org/r/95642 [17:48:24] (03CR) 10Chad: [C: 032] Configure labs to have 2 search replicas [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95533 (owner: 10Manybubbles) [17:48:36] (03Merged) 10jenkins-bot: Configure labs to have 2 search replicas [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95533 (owner: 10Manybubbles) [17:49:14] (03CR) 10Akosiaris: [C: 032] Have sockpuppet complain on running puppet-merge [operations/puppet] - 10https://gerrit.wikimedia.org/r/95642 (owner: 10Akosiaris) [17:50:07] hexmode: can have .tar.gz and .sig (like the others)? [17:50:23] (in your /tmp afair) [17:50:31] mutante: 1s [17:51:06] hexmode: oh, it's the whole dir.. 
got it [17:51:22] mutante: yep [17:52:27] mutante: now extracted in /tmp if you want, also [17:58:48] !log uploaded mw1.22.0rc2 from hexmode http://dumps.wikimedia.org/mediawiki/1.22/ [17:58:51] hexmode: done [17:59:02] Logged the message, Master [17:59:39] (03PS5) 10Ottomata: Writing JSON statistics to log file rather than syslog or stderr [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 [18:00:17] mutante: have you done a dry run of your irc patch on labs yet? If not then I can do that now [18:00:59] andrewbogott: no, i haven't yet, listening to RFP meeting [18:01:06] 'k [18:01:31] andrewbogott: thanks, if you want to run it.. [18:01:51] do you think it's right to put wikibugs in it? [18:01:56] i wasn't that sure [18:02:07] because actually it's not that related to the ircd we run [18:02:10] just both are IRC [18:02:22] Yeah, it probably should probably be a separate module. [18:02:24] and therefore were in misc/irc.pp [18:02:35] and my approach was simply to convert an existing misc/ [18:02:38] so far [18:03:03] (03CR) 10Ottomata: "(1 comment)" [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [18:03:07] that would also make it easier to merge [18:03:15] would just influence the ircd but not also the bot [18:03:24] * andrewbogott nods [18:03:25] and the bot is touchy [18:04:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours [18:06:37] andrewbogott: i'll make another ps later to remove it then [18:07:47] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours [18:10:12] mutante: ok, I'll wait until then to test. 
[18:14:29] neon (and i guess others like tin), puppet freshnesss above: they are getting 502 Proxy Error from puppetmaster [18:28:38] (03PS1) 10Ori.livneh: Beta Labs: enable UDP profiler [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95646 [18:30:09] !log ori updated /a/common to {{Gerrit|I132d97598}}: Beta Labs: enable UDP profiler [18:30:20] Logged the message, Master [18:31:00] (03CR) 10Ori.livneh: [C: 032] Beta Labs: enable UDP profiler [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95646 (owner: 10Ori.livneh) [18:31:29] events arriving order out of [18:40:59] greg-g: so our collection rendering project is going somewhat swimmingly; currently my deployment plan is to spin up the rendering infrastructure in labs and point the collection extension at the labs servers (people will be given the choice what renderer to use; and it's something we can turn off if required) [18:41:13] greg-g: given that we hope to have that out by late next week [18:41:30] what are your immediate thoughts/concerns/questions? [18:41:53] w0t [18:41:56] *w00t [18:48:10] mwalker: So what you're saying is you guys essentially re-did the entire pdf book rendering thingy in about a week? 
[18:48:14] that's pretty impressive [18:48:26] uh; that remains to be seen [18:48:31] it's still pretty rocky [18:49:37] (03CR) 10Umherirrender: "Would be nice, if this gets deployed at weekend, because echo will be deployed on prod-dewiki next week and than it is possible to play a " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95450 (owner: 10Umherirrender) [18:49:44] nonetheless, even if its in alpha form, that's pretty impressive [18:50:33] * mwalker waffles -- we're still only rendering single articles -- a lot of the teething pains will come next week when we start compositing whole books [18:50:34] but yes [18:50:44] it turns out the parsoid RDF makes it not so hard :) [18:51:48] mwalker: just talked with erik about it a little, sounds reasonable to go out next week, though next week is pretty busy (pre-thanksgiving stuff, I guess, but also things delayed from yesterday) [18:51:53] mwalker: do you have an idea of date? [18:52:15] metadata: it's cool [18:52:22] (03PS4) 10Dzahn: move IRC server to module [operations/puppet] - 10https://gerrit.wikimedia.org/r/94407 [18:53:26] greg-g: if we're comfortable releasing it to the world, it's going to have to be late thursday [18:53:39] good to know [18:53:42] I'll pencil you in for now [18:53:58] mwalker: let me know how you feel on, say, wednesday early afternoon [18:54:01] *nods* [18:54:22] and we cant deploy at all 25th/26th? [18:54:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours [18:57:42] mwalker: only emergencies [18:57:52] so, make the current pdf cluster fall over, deploy new stuff [18:57:54] ;) [18:58:08] I don't think I really have to work to make it fall over... [18:58:36] pretty sure just breathing on the things makes them have issues [18:58:37] (03PS5) 10Dzahn: move IRC server to module [operations/puppet] - 10https://gerrit.wikimedia.org/r/94407 [18:58:54] :) [19:01:31] andrewbogott: there, PS5.. 
much simpler now [19:01:51] mutante: ok, I will test in a minute [19:02:05] the mediawiki-irc-relay also shouldn't hardcode server names.. but one by one.. trying to not put all in one change again [19:06:06] (03PS6) 10Dzahn: move IRC server to module [operations/puppet] - 10https://gerrit.wikimedia.org/r/94407 [19:13:26] PROBLEM - RAID on vanadium is CRITICAL: CRITICAL: Active: 5, Working: 5, Failed: 1, Spare: 0 [19:16:40] (03CR) 10Andrew Bogott: [C: 031] "I've verified on labs that this change is a no-op." [operations/puppet] - 10https://gerrit.wikimedia.org/r/94407 (owner: 10Dzahn) [19:35:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours [19:36:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours [19:51:17] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 15015 MB (3% inode=99%): [20:00:17] RECOVERY - Disk space on xenon is OK: DISK OK [20:04:55] Hey. Is Wikipedia really slow or is it my internet connection? [20:05:03] ugh, not again [20:05:04] * greg-g looks [20:05:19] Sven_Manguard: not here, are you in Europe? [20:05:34] I'm in the US, east coast [20:05:58] loading is slow for both Wikipedia (where it's very, very slow) and Wikidata (where it's only marginally slower than normal) [20:06:01] huh, then we should be connecting to same datacenter, and I'm not seeing any issues in my quick random page loads [20:06:12] but load times are fine when I go to non WMF sites [20:06:25] sounds like a routing issue [20:06:30] is LeslieCarr online? 
[20:06:53] I should just know what the default diagnostics are for this other than traceroute [20:06:55] http://www.gunnerkrigg.com/ is my go to large site (webcomic, one giant image, if it's my connection it takes a while to load) [20:07:00] Sven_Manguard: do I traceroute, I suppose :) [20:07:01] and that loaded instantly [20:07:07] s/I/a/ [20:07:11] meanwhile https://en.wikipedia.org/w/index.php?title=Wikipedia:Reward_board&action=edit&section=15 is taking several minutes [20:07:31] just about as fast as can be expected for me [20:10:59] traceroute? [20:11:26] and what does "do I traceroute, a suppose" mean :P [20:11:48] Sven_Manguard: in a terminal type: [20:11:49] traceroute en.wikipedia.org [20:13:45] Oy. I found the issue. Steam started updating on me without me knowing [20:17:31] greg-g: in my console? as in cmd? It didn't work [20:17:53] and Wikipedia is still loading painfully slowly, even with Steam paused [20:18:52] (03PS1) 10Dr0ptp4kt: WIP: DO NOT MERGE YET. Apply FlaggedRevs to metawiki for W0. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 [20:19:43] (03CR) 10Dr0ptp4kt: [C: 04-2] "DO NOT MERGE YET." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 (owner: 10Dr0ptp4kt) [20:21:59] (03CR) 10Yuvipanda: "Patch Set 1: Code-Review+2" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 (owner: 10Dr0ptp4kt) [20:22:13] (03CR) 10Yuvipanda: "Change has been successfully merged into the git repository." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 (owner: 10Dr0ptp4kt) [20:24:34] Sven_Manguard: I think the command is called tracert on windows [20:25:29] <^demon|away> It is. [20:36:23] (03CR) 10Aaron Schulz: WIP: DO NOT MERGE YET. Apply FlaggedRevs to metawiki for W0. (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 (owner: 10Dr0ptp4kt) [20:39:01] ok hi LeslieCarr! [20:39:06] can you help me with the analytics networking stuff? 
[21:03:58] (03PS1) 10Andrew Bogott: Removed misc::deployment::scripts class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/95699 [21:05:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours [21:05:53] !log install unzip on tin (needs puppetization) [21:06:08] Logged the message, Master [21:08:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours [21:18:20] (03CR) 10Dr0ptp4kt: WIP: DO NOT MERGE YET. Apply FlaggedRevs to metawiki for W0. (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 (owner: 10Dr0ptp4kt) [21:18:31] (03PS2) 10Dr0ptp4kt: WIP: DO NOT MERGE YET. Apply FlaggedRevs to metawiki for W0. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95662 [21:24:26] LeslieCarr: whenever you get a moment: https://rt.wikimedia.org/Ticket/Display.html?id=6279 [21:24:30] would be much obliged [21:24:49] cmjohnson1: also, note I updated https://rt.wikimedia.org/Ticket/Display.html?id=6238 [21:24:56] an12 still seems to have problems [21:25:06] i know you are busy with es servers [21:27:00] ottomata: thx...yeah not sure what's going on with that. I did exactly as it wanted but no idea [21:31:14] (03PS1) 10Dzahn: install unzip on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/95702 [21:36:38] mutante: :-]]]] [21:36:49] mutante: I was kidding !!! [21:38:11] (03CR) 10Hashar: [C: 031] "Sounds good. The RT would give us some history for the next decade cleanup." [operations/puppet] - 10https://gerrit.wikimedia.org/r/95702 (owner: 10Dzahn) [21:44:41] (03PS1) 10Hashar: deployment: integration/jenkins for Jenkins CI slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/95705 [21:52:47] (03CR) 10Dzahn: [C: 032] install unzip on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/95702 (owner: 10Dzahn) [21:53:14] root@sockpuppet:~# puppet-merge [21:53:15] You shouldn't run puppet-merge here. Sockpuppet is unused. 
[21:53:17] wooohoo :) [21:53:22] akosiaris1: :) [21:54:34] I wouldn't party yet [21:55:10] terbium can't run puppet and it fails for something that is already fixed in gerrit, so my suspicion is that puppet-merge is broken [21:55:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours [21:55:48] oh.. oops. and i saw failed run on neon earlier [21:55:55] (03PS1) 10Hashar: misc::contint merged in role::ci::master [operations/puppet] - 10https://gerrit.wikimedia.org/r/95706 [21:55:56] giving proxy errpr [21:56:05] that's a different error, probably apache timeout [22:23:08] spagewmf: do you happen to know if instance 'mediawiki-temp-test' in editor-engagement is still needed? [22:24:58] akosiaris: same question about pmasterca [22:31:01] (03PS1) 10Cmjohnson: Adding entries for elastic1001-12 [operations/puppet] - 10https://gerrit.wikimedia.org/r/95716 [22:36:47] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours [22:37:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours [22:51:38] (03CR) 10Cmjohnson: [C: 032] Adding entries for elastic1001-12 [operations/puppet] - 10https://gerrit.wikimedia.org/r/95716 (owner: 10Cmjohnson) [23:08:01] (03PS1) 10Ori.livneh: Configure tungsten as wgUDPProfilerHost [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95719 [23:08:21] * greg-g looks [23:10:36] (03CR) 10Greg Grossmeier: [C: 031] "I approve, and I'm ok with this going out now as if it breaks we should see it pretty quickly (and it's an easy revert)." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95719 (owner: 10Ori.livneh) [23:11:09] (03CR) 10Ori.livneh: [C: 032] Configure tungsten as wgUDPProfilerHost [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95719 (owner: 10Ori.livneh) [23:11:44] !log ori updated /a/common to {{Gerrit|Ia9e64253d}}: Configure tungsten as wgUDPProfilerHost [23:11:58] Logged the message, Master [23:12:06] btw, for those looking in here, I had more reasoning than the one line repeated by grrrit-wm [23:13:12] !log ori synchronized wmf-config/CommonSettings.php 'Ia9e64253d: Configure tungsten as $wgUDPProfilerHost' [23:13:24] Logged the message, Master [23:14:35] (03PS1) 10Manybubbles: Puppet configuration for new elasticsearch servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/95720 [23:15:00] greg-g: verified; works. [23:15:10] yay [23:15:19] ^d: did you catch it? [23:15:21] (03CR) 10Manybubbles: [C: 04-1] "Work in progress at this point: haven't finished linting and have an annoying TODO left. Consider this just review fodder for now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/95720 (owner: 10Manybubbles) [23:15:22] i quoted $ [23:15:32] <^d> Heh [23:15:48] isn't eligible rather than eligable? [23:15:58] the word I mean [23:17:30] where? [23:17:42] manybubbles' patchset [23:17:47] 95720 [23:18:47] Yes. [23:19:09] g'evening Elsie [23:19:22] * Elsie waves. [23:19:25] https://en.wiktionary.org/wiki/eligible#English [23:24:48] look at that, a wiki dictionary [23:25:10] what will they think of next [23:26:01] (03CR) 10MZMcBride: "Faidon pointed out (and I confirmed) that you probably want master_eligible, not master_eligable. Both the code and commit message need a " [operations/puppet] - 10https://gerrit.wikimedia.org/r/95720 (owner: 10Manybubbles) [23:26:14] oh, heh, thanks Elsie [23:26:22] No problem. [23:26:31] you are a native speaker, right? [23:26:38] I am. [23:26:55] I'm grateful for that. English is brutal. 
[23:39:26] (03CR) 10Andrew Bogott: Puppet configuration for new elasticsearch servers (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/95720 (owner: 10Manybubbles)