[00:00:42] (03PS1) 10Ori.livneh: profiler-to-carbon: read entire socket buffer in one go [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/101146 [00:01:29] (03CR) 10Ori.livneh: [C: 032 V: 032] profiler-to-carbon: read entire socket buffer in one go [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/101146 (owner: 10Ori.livneh) [00:01:38] ori-l: Hold on [00:01:43] I'm very slightly going over my deploy window [00:02:03] RoanKattouw: it's not something that would affect / conflict with your deployment [00:02:33] that is, i'm not syncing / scapping anything [00:02:38] Oh OK [00:02:40] Carry on :) [00:03:05] mwalker: You actually signed up for the LD, please hold while I wrap up the preceding window [00:03:25] !log catrope updated /a/common/php-1.23wmf7 to {{Gerrit|If7c3c52ed}}: Update EducationProgram to wmf7 branch for cherry-picks [00:03:26] RoanKattouw: I'm waiting for jenkins to merge my commits anyways [00:03:34] could take a couple more hours [00:03:42] Logged the message, Master [00:03:52] heh [00:04:05] !log catrope synchronized php-1.23wmf7/extensions/EducationProgram 'Cherry-pick to fix ContextSource compatibility' [00:04:16] mwalker: OK I'm done, it's all yours [00:04:18] Logged the message, Master [00:04:28] AndyRussG: There's your deployment ---^^ [00:04:49] \o/ [00:05:15] Thanks a ton [00:06:49] RoanKattouw: thanks for taking care of the EduProgram issue [00:11:29] !log mwalker synchronized php-1.23wmf6/extensions/Collection/ 'Reverting to known good condition for collection extension' [00:11:46] Logged the message, Master [00:12:00] !log mwalker synchronized php-1.23wmf7/extensions/Collection/ 'Reverting to known good condition for collection extension' [00:12:15] Logged the message, Master [00:13:25] mwalker: How we doin' here? [00:13:37] marktraceur: looks like I'm stable [00:13:39] so; go for it! [00:13:40] 'kay [00:18:40] !log mholmquist synchronized php-1.23wmf6/extensions/MultimediaViewer/ 'Fix for another stray MultimediaViewer event' [00:18:51] gotta catch 'em all [00:18:55] Logged the message, Master [00:19:40] Oh, sorry [00:19:44] LIGHTENING DEPLOOOOYYYYYY [00:20:03] marktraceur reads the instructions [00:20:05] !log mholmquist synchronized php-1.23wmf7/extensions/MultimediaViewer/ 'Fix for another stray MultimediaViewer event' [00:20:11] Only halfway through [00:20:19] Logged the message, Master [00:20:25] When I'm confused and have lost half the parts [00:20:39] If there's anyone else they can go, I need to await the RL cache update [00:20:57] nah, you're it [00:21:28] 'kay [00:24:18] Oh, wait. No. Agh. [00:24:43] 8 minutes left, give me one sec. [00:25:09] !log mholmquist synchronized php-1.23wmf6/extensions/MultimediaViewer/ 'Fix for another stray MultimediaViewer event, take 2.' [00:25:24] Logged the message, Master [00:25:58] !log mholmquist synchronized php-1.23wmf7/extensions/MultimediaViewer/ 'Fix for another stray MultimediaViewer event, take 2.' [00:26:14] Logged the message, Master [00:26:16] Ignore me. [00:26:43] OK, that fixed it on mw.org (maybe? probably?), so I'm going to declare actually being done. 
[00:26:55] Yeah, enwiki is fixed too [00:27:14] * marktraceur praises Pikachu, Japanese god of thunder [00:55:23] (03PS1) 10Ori.livneh: Fix typo in profiler-to-carbon and tweak client behavior [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/101158 [00:55:33] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix typo in profiler-to-carbon and tweak client behavior [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/101158 (owner: 10Ori.livneh) [01:03:35] (03CR) 10MarkTraceur: "You could reasonably ignore my blind adding of reviewers." [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 (owner: 10MarkTraceur) [01:10:06] (03CR) 10OliverKeyes: "Is there a rationale for this, or is it just 'this would be fun'? While Stat1 has a public IP we tend not to actively host projects there." [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 (owner: 10MarkTraceur) [01:10:43] (03PS2) 10MarkTraceur: Add nodejs to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 [01:11:00] ori-l: ^^ [01:12:06] (03CR) 10MarkTraceur: "Oliver:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 (owner: 10MarkTraceur) [01:15:08] (03CR) 10OliverKeyes: [C: 031] "Makes sense; +1ing out of principle." [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 (owner: 10MarkTraceur) [01:16:37] (03CR) 10Ori.livneh: [C: 04-1] "whitespace" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 (owner: 10MarkTraceur) [01:16:44] Bah [01:17:08] Fucking whitespace inconsistency [01:17:10] i'd usually fix it myself but i'm a bit busy [01:17:32] (03PS3) 10MarkTraceur: Add nodejs to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 [01:17:37] 's okay, I'm fast also [01:18:47] (03CR) 10Ori.livneh: [C: 032 V: 032] "approved by ops, patch looks fine" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101111 (owner: 10MarkTraceur) [01:19:04] Shoop da whoop [01:19:32] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 301 seconds [01:19:33] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 305 seconds [01:19:38] * marktraceur doesn't really know what the deploy process is like for changes to puppet [01:20:48] marktraceur: i deployed it [01:21:14] the process is a bit elaborate, but it's documented well on wikitech [01:21:16] if you're curious [01:28:22] PROBLEM - MySQL Slave Delay on db1003 is CRITICAL: CRIT replication delay 301 seconds [01:33:23] PROBLEM - MySQL Slave Delay on db1003 is CRITICAL: CRIT replication delay 311 seconds [01:33:33] PROBLEM - MySQL Replication Heartbeat on db1003 is CRITICAL: CRIT replication delay 316 seconds [01:35:09] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 306 seconds [01:35:19] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 314 seconds [01:37:12] (03PS1) 10Gage: JG: add gage to icinga contactgroups & cgi authorization [operations/puppet] - 10https://gerrit.wikimedia.org/r/101171 [01:42:09] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay -0 seconds [01:42:19] RECOVERY - MySQL Slave Delay on db1003 is OK: OK replication delay 0 seconds [01:42:19] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [01:42:39] RECOVERY - MySQL Replication Heartbeat on db1003 is OK: OK replication delay -1 seconds [01:51:29] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 1 seconds [01:51:39] RECOVERY - MySQL Slave Delay on db66 is OK: OK 
replication delay 0 seconds [01:53:31] (03CR) 10Dzahn: [C: 031] JG: add gage to icinga contactgroups & cgi authorization [operations/puppet] - 10https://gerrit.wikimedia.org/r/101171 (owner: 10Gage) [01:54:21] jgage: want me to merge or try yourself [01:55:29] looks good, i see you got the contact in private file [01:56:51] (03CR) 10Dzahn: "RT #6495" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101171 (owner: 10Gage) [02:03:20] (03PS1) 10Springle: set db1017 as s5 analytics slave [operations/dns] - 10https://gerrit.wikimedia.org/r/101173 [02:05:31] (03CR) 10Springle: [C: 032] set db1017 as s5 analytics slave [operations/dns] - 10https://gerrit.wikimedia.org/r/101173 (owner: 10Springle) [02:05:46] (03CR) 10Dzahn: [C: 032] JG: add gage to icinga contactgroups & cgi authorization [operations/puppet] - 10https://gerrit.wikimedia.org/r/101171 (owner: 10Gage) [02:21:05] !log LocalisationUpdate completed (1.23wmf6) at Fri Dec 13 02:21:05 UTC 2013 [02:21:23] Logged the message, Master [02:41:23] !log LocalisationUpdate completed (1.23wmf7) at Fri Dec 13 02:41:23 UTC 2013 [02:41:40] Logged the message, Master [02:43:05] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [02:48:41] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 13 02:48:41 UTC 2013 [02:48:55] Logged the message, Master [02:50:52] !log ongoing schema changes on slaves, indexing only, logging gerrit 85508, wb_terms gerrit 99660 [02:51:08] Logged the message, Master [03:01:45] (03PS1) 10Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 [03:09:13] (03PS2) 10Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 [03:13:10] (03PS3) 10Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 [03:20:06] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 12:18:59 AM UTC [03:31:56] PROBLEM - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100% [03:34:16] RECOVERY - Host elastic1007 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [03:35:13] PROBLEM - NTP on elastic1007 is CRITICAL: NTP CRITICAL: Offset unknown [03:38:13] RECOVERY - NTP on elastic1007 is OK: NTP OK: Offset -0.0009071826935 secs [03:39:04] (03PS4) 10Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 [03:42:20] (03CR) 10Dzahn: [C: 031] "PS4: use libemail-sender-perl instead of libemail-send-perl" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [04:14:45] ./msg memoserv send apergos friendly memo for tomorrow because you are on duty: !change 101174 would be nice if you find the time :) tx :) [04:21:14] !memo is if you want to leave a memo for somebody on IRC to read later when they come online, try /query memoserv and type help in that new window, or /msg memoserv help to see the commands [04:21:15] Key was added [04:39:12] (03PS2) 10Tholam: Update favicon wiktionary/si.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100949 [05:09:57] (03PS3) 10Tholam: Update favicon wiktionary/si.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100949 [05:10:46] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100% [05:11:26] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.42 ms [05:15:15] (03PS4) 10Tholam: Update 
favicon wiktionary/si.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100949 [06:20:42] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 12:18:59 AM UTC [06:55:52] RECOVERY - Puppet freshness on tungsten is OK: puppet ran at Fri Dec 13 06:55:46 UTC 2013 [07:32:01] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [08:08:06] Good morning. Repeating a question from #wm-tech here as more appropriate channel, and pinging apergos. Question for you all. bits.wikimedia.org seems to be rejecting ICMPv6 packets, specifically Packet Too Large for PMTUD. I'm on an IPv6 tunnel with MTU of 1280, and connections to bits.wikimedia.org often hang for 20+ seconds after handshake. Who should I poke re. investigating whether bits rejects such packets? [08:08:45] morning [08:09:24] we will want mark or para void later in the day [08:10:08] I can try to look at it earlier but even if I am able to verify this I wouldn't have the knowledge to fix it [08:10:11] kenneaal: [08:11:02] * apergos adds it to today's list [08:11:44] Thanks for ping. That would be neat. I'll be around most of the day, and available for assisting in testing that. I also have a public RIPE Atlas probe on the network in question, connected through the same IPv6 tunnel, so it would be possible for WM personnel to test the case directly. Can also get a linux VM going with the same connectivity. [08:12:23] that's great, it would be very helpful. [08:13:17] Hehe. Well, vested self-interest in not having to hit reload every hour or so when bits.wm.org forgets my MTU. :P [08:14:19] I wouldn't be surprised if the server suppresses ICMPv6, but other servers (en.wikipedia.org, etc) seem to handle PMTUD correctly. So perhaps there is inconsistent configuration at work. [08:18:10] what ip address does bits resolve to for you? [08:28:21] (03CR) 10Alexandros Kosiaris: "I you do decide to go through all that, it would be best to also create a role class that has all the monitoring, firewall , backup, syste" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [08:29:24] Sorry, please ping if speaking to me, I have a metric ton of windows open at the moment. Resolves to bits-lb.esams.wikimedia.org (2620:0:862:ed1a::a) [08:30:45] apergos: [08:30:48] yep [08:31:01] I'll ping if I need instant answer, no worries :-) [08:31:20] Paste of traceroute6: http://pste.me/35ulx/ [08:32:01] great, thank you [08:33:29] Also, given that it responds to ICMPv6 ping, PMTUD not happening properly makes even less sense. 
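The symptom kenneaal describes above (TCP handshake succeeds, then the transfer stalls for ~20 seconds) is the classic PMTUD black hole: the far end keeps sending segments larger than the 1280-byte tunnel MTU and never reacts to the ICMPv6 Packet Too Big coming back. A rough client-side way to demonstrate it from the affected host, sketched with placeholder URLs (any response from bits large enough to span several full-size segments will do; these are not specific known-good URLs):

    # Path MTU as seen from the client; should report pmtu 1280 via the tunnel
    tracepath6 bits.wikimedia.org

    # Repeated fetches; the black hole shows up as occasional ~20 s outliers on
    # bits while a comparison host that honours PMTUD (per the report above,
    # en.wikipedia.org) stays fast. URL_BITS and URL_TEXT are placeholders.
    for i in 1 2 3 4 5; do
      curl -6 -s -o /dev/null -w "bits %{time_total}s\n" "$URL_BITS"
      curl -6 -s -o /dev/null -w "text %{time_total}s\n" "$URL_TEXT"
    done

Small responses that fit in a single sub-1280-byte segment never trigger the problem, which is why plain ping keeps working even while page loads hang.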
[08:34:07] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [08:52:29] (03PS4) 10Alexandros Kosiaris: Modularize misc::install-server classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/89687 [08:53:26] (03PS1) 10ArielGlenn: access to pdf1,2,3 for mwalker, rt #6468 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101181 [08:53:57] (03CR) 10Alexandros Kosiaris: [C: 032] Modularize misc::install-server classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/89687 (owner: 10Alexandros Kosiaris) [08:54:17] (03PS2) 10ArielGlenn: access to pdf1,2,3 for mwalker, rt #6468 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101181 [08:55:25] (03CR) 10ArielGlenn: [C: 032] access to pdf1,2,3 for mwalker, rt #6468 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101181 (owner: 10ArielGlenn) [08:55:54] apergos: please don't merge yet on palladium [08:56:00] uh huh [08:56:09] I was just seeing this huge pile of changes [09:04:13] (03PS1) 10Alexandros Kosiaris: Migrate to new install-server module [operations/puppet] - 10https://gerrit.wikimedia.org/r/101182 [09:06:16] (03CR) 10Alexandros Kosiaris: [C: 032] Migrate to new install-server module [operations/puppet] - 10https://gerrit.wikimedia.org/r/101182 (owner: 10Alexandros Kosiaris) [09:07:43] apergos: ok I merged [09:07:48] thank you [09:25:08] PROBLEM - NTP on bast4001 is CRITICAL: NTP CRITICAL: No response from NTP server [09:27:48] PROBLEM - NTP on hooft is CRITICAL: NTP CRITICAL: No response from NTP server [09:29:14] * kenneaal checks. It's actually a two component polymer resin that hardens when the joint is crimped. So over 20kV/mm resistance. Should be fine. [09:29:21] Er... Totally not the right window. [09:31:03] :-D [09:31:08] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [09:54:13] (03PS1) 10Alexandros Kosiaris: Adding install-server::ubuntu-mirror class [operations/puppet] - 10https://gerrit.wikimedia.org/r/101187 [09:57:31] (03CR) 10Alexandros Kosiaris: [C: 032] Adding install-server::ubuntu-mirror class [operations/puppet] - 10https://gerrit.wikimedia.org/r/101187 (owner: 10Alexandros Kosiaris) [09:58:30] !log disabling puppet on carbon for a short time (debugging the new install-server module) [09:58:46] Logged the message, Master [10:01:05] PROBLEM - Squid on brewster is CRITICAL: Connection timed out [10:01:15] PROBLEM - HTTP on brewster is CRITICAL: Connection timed out [10:01:39] I am those [10:04:15] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [10:08:02] (03CR) 10ArielGlenn: [C: 04-1] "libemail-mime-modifier-perl should not be installed, as it is a virtual package, and puppet never realizes that virtual packages have been" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [10:09:57] * apergos looks arund for hashar [10:10:02] apergos: i am there [10:10:05] I am looking at https://rt.wikimedia.org/Ticket/Display.html?id=4824 [10:10:10] talk to me [10:10:25] ah yeah [10:10:36] replied to it in a hurry yesterday evening :/ [10:10:53] I should write down the ferm rules right now and get that fixed for good [10:11:12] do it today and I'll look at it today [10:11:36] the reason that ticket took so long was basically because of Augeas which nobody could help with :( [10:11:52] * hashar dig in ferm [10:13:00] do we have a way to apply a class by default on all instances? Aka the equivalent of production base ? 
:D [10:13:10] PROBLEM - NTP on brewster is CRITICAL: NTP CRITICAL: No response from NTP server [10:13:10] allll? [10:13:18] all instances of a project [10:13:19] I have no idea [10:13:32] well we just write a misc class for now =D [10:15:00] RECOVERY - NTP on brewster is OK: NTP OK: Offset -0.0014783144 secs [10:15:10] RECOVERY - HTTP on brewster is OK: HTTP OK: HTTP/1.1 200 OK - 3089 bytes in 0.187 second response time [10:15:57] akosiaris: oh oops [10:16:02] i just restarted lighttpd and squid on brewster [10:16:31] mark: no worries [10:17:00] RECOVERY - Squid on brewster is OK: TCP OK - 0.035 second response time on port 8080 [10:17:01] heh, I did the same thing, well 1/2 because I saw squid already running... [10:17:14] didn't connect the dots [10:17:30] * apergos goes to make an omlette and reclaim some brain cells from bz perl module dependency hell [10:17:34] back in about 10 mins [10:17:37] hmmm like I never said: (12:01:39 μμ) akosiaris: I am those [10:17:55] I didn't see the backread here, I got a page [10:17:58] I will blame me and will be clearer next time [10:18:25] you were clear [10:18:27] hmmm i never did [10:18:34] i never did get a page i mean [10:18:49] nimsoft alert ubuntu mirror... [10:19:10] and there's the recovery [10:19:33] brb [10:19:36] ok [10:19:42] !g Ie9be31bec57e70fc84bd59a5524e8f848bb61630 [10:19:42] https://gerrit.wikimedia.org/r/#q,Ie9be31bec57e70fc84bd59a5524e8f848bb61630,n,z [10:35:47] * hashar digs in ferm manual [10:36:06] gotta convert: iptables -t nat -I OUTPUT --dest $public_ip -j DNAT --to-dest $private_ip [10:37:50] * apergos sis down to hot omelette and the earlier gerrit changesets [10:38:29] *sitsss [10:38:31] grrr [10:41:38] (03PS1) 10Alexandros Kosiaris: Fix ferm rules for install-server, haproxy, backup [operations/puppet] - 10https://gerrit.wikimedia.org/r/101191 [10:43:10] (03CR) 10Alexandros Kosiaris: [C: 032] Fix ferm rules for install-server, haproxy, backup [operations/puppet] - 10https://gerrit.wikimedia.org/r/101191 (owner: 10Alexandros Kosiaris) [10:45:57] !g I0b02a46f350a99e2e0d29a2da72d6ef6932c8c22 [10:45:57] https://gerrit.wikimedia.org/r/#q,I0b02a46f350a99e2e0d29a2da72d6ef6932c8c22,n,z [10:46:36] (03PS1) 10Hashar: beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 [10:48:30] hashar: its daddr, not dest [10:49:36] akosiaris: thx :-) [10:50:08] /etc/init.d/ferm start [10:50:08] * Starting Firewall ferm iptables-restore v1.4.4: invalid mask `46' specified [10:50:12] damn... [10:50:21] using my patch ? [10:50:21] it thinks IPv6 is IPv4.... :-( [10:50:23] nope [10:50:25] mine [10:50:25] ah [10:50:35] was wondering how you managed to get my patch tested so fast :-D [10:52:17] (03PS2) 10Hashar: beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 [10:54:51] * hashar watches puppet running while listening to loud techno music [10:55:13] aaah it's ferm 2.1.2 ? [10:55:15] hmmmm [10:55:45] hey Could not stop Service[ferm]: :D [10:56:20] Ubuntu 10.04.4 LTS :-(... snif... [10:56:33] 10.04.4?? ouch [10:56:44] that is brewster isn't it ? [10:56:50] and carbon [10:57:02] ah I thought ori phased out that machine [10:57:28] he did a bunch of work to migrate graphite/gdash.. 
to eqiad [10:57:58] RECOVERY - NTP on bast4001 is OK: NTP OK: Offset -0.001607775688 secs [10:58:51] carbon is in eqiad [10:59:11] elements [10:59:36] I say next time we go for exoplanets :P [10:59:36] (03PS3) 10Hashar: beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 [11:00:21] https://wikitech.wikimedia.org/wiki/Talk:Server_naming_conventions [11:00:22] (03PS4) 10Hashar: beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 [11:00:32] put your edits where your mouth is [11:00:38] RECOVERY - NTP on hooft is OK: NTP OK: Offset 7.164478302e-05 secs [11:02:18] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [11:02:45] apergos: Ι 'll start by upload this http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/100000/20000/4000/400/124456/124456.strip.gif [11:03:06] perfect [11:05:42] :D [11:06:55] yeah iptables rules!!! http://paste.debian.net/70669/ [11:07:29] your commit message has 208.80.153.243 twice... might wanna fix that [11:07:49] yeah will [11:07:56] got to add some patch from labs [11:10:58] (03PS5) 10Hashar: beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 [11:13:47] (03PS1) 10Alexandros Kosiaris: Add linux-host-entries.ttyS1-9600 empty file [operations/puppet] - 10https://gerrit.wikimedia.org/r/101194 [11:15:35] (03CR) 10Hashar: [V: 031] "Patchset 5 tested on deployment-staging-cache-mobile02:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 (owner: 10Hashar) [11:16:02] apergos: patch works for me on deployment-staging-cache-mobile02 (a puppet self instance) [11:16:18] I have commented on the patch listing a ferm rule config file and the result of iptables --list -t nat [11:16:31] great [11:16:42] I can't believe it only took me an hour to configure the rules, thanks to ferm! [11:17:18] ah ubuntu next LTS is in April 2013. [11:17:20] 2014 [11:17:24] going to be fun times [11:17:38] ugh [11:17:58] contint boxes are already volunteering for migration [11:18:02] though gallium will probably have to be reinstalled from scratch :D [11:18:59] and we still apparently have some 8.04 boxes:P [11:20:15] is that hardy ? [11:20:56] (03CR) 10Alexandros Kosiaris: [C: 032] Add linux-host-entries.ttyS1-9600 empty file [operations/puppet] - 10https://gerrit.wikimedia.org/r/101194 (owner: 10Alexandros Kosiaris) [11:20:59] yes [11:21:33] yep: https://en.wikipedia.org/wiki/Ubuntu_releases [11:22:12] "trusty" [11:23:02] apergos: can we proceed with https://gerrit.wikimedia.org/r/101192 ? [11:23:10] I am a little confused about how you can possibly get the right rules out... looking at the ferm::rule class, it has default table and chain, how do you get away with setting them in the rule text? [11:23:23] passing parameters to ferm::rule [11:23:35] maybe I have sent the wrong patch in gerrit ghmhm [11:23:55] https://gerrit.wikimedia.org/r/#/c/101192/5/manifests/misc/beta.pp,unified line 88-89 [11:24:00] table => 'nat', [11:24:05] chain => 'OUTPUT' [11:24:13] ohh that's better [11:24:19] maybe I was looking at an earlier version [11:24:23] yeah made that with patchset 5 [11:24:24] sorry :( [11:24:30] yep ps4 [11:25:22] ok lemme stare at it for a couple more minutes given it's the new patchset [11:25:25] sorry... 
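For reference, the rewrite hashar is porting above lands in the nat table's OUTPUT chain, which is why the ferm::rule in the patch has to pass table and chain explicitly and match on daddr rather than iptables' --dest. A hand-run equivalent of what the rule generates, with 208.80.153.243 taken from the commit-message discussion above and 10.4.0.50 as a purely illustrative private address:

    # Rewrite outbound traffic aimed at the public IP to the instance's private IP
    iptables -t nat -I OUTPUT --destination 208.80.153.243 \
        -j DNAT --to-destination 10.4.0.50

    # Inspect what ferm actually generated, same as the test noted on the patch
    iptables --list -t nat -n

(In ferm syntax the destination match is the daddr keyword, as akosiaris points out above.)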
[11:25:45] tis ok [11:25:58] the patch is applied on deployment-staging-cache-mobile02.pmtpa.wmflabs [11:26:11] if you want to play with it (like deleting all ferm rules, running puppet or looking at the generated conf [11:26:30] a potential issue we have with ferm::rule is that it does not purge the /etc/ferm/conf.d files :( [11:26:38] nothing we can really do about I am afraid [11:28:31] ughhh [11:29:10] you can pass ensure = absent to the rule [11:29:34] but if you are worried about someone else remembering that, you could add a comment to the beta stanza [11:29:35] your call [11:30:06] well puppet file {} as a purge parameter which would delete any file not managed by puppet [11:30:18] it does purge [11:30:18] no clue how it is going to work with multiple file{} statements in the same dir [11:30:28] file { '/etc/ferm/conf.d' : [11:30:28] ensure => directory, [11:30:28] owner => root, [11:30:28] group => adm, [11:30:28] mode => '0500', [11:30:29] recurse => true, [11:30:29] purge => true, [11:30:30] require => Package['ferm'], [11:30:30] notify => Service['ferm'], [11:30:30] potentially it might only keep the very last file {} :/ [11:30:31] } [11:30:37] ahhh [11:30:37] recurse = > true, purge => true [11:30:41] at the dir level nice [11:30:43] conf.d will be purge on every run [11:30:48] purged* [11:30:50] \o/ [11:30:53] noisy but nonetheless [11:31:03] noisy ? [11:31:08] so I guess it is work for me :-D [11:31:15] well we'll see the recreations on every run I suppose [11:31:19] nope [11:31:27] ? [11:31:30] only additions and deletions [11:32:01] files created in a purged directory still exist in the catalog [11:32:14] so puppet does not touch them if the exist in the system [11:32:27] oh ho [11:32:32] it will only purge files not existing in the catalog [11:32:52] amazing.. puppet doing something well :-D [11:46:15] hey [11:46:26] is there anyone i can ask my stupid db questions? ;) [11:47:32] ah ok...not stupid...dbtree also states db73 has replication lag [11:50:48] yes it does show that [11:51:35] (03CR) 10Aude: [C: 031] "looks good to me :) thanks hashar!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 (owner: 10Hashar) [11:51:36] we have an open ticket, [11:51:40] the info on the ticket says that [11:51:51] "WikiExporter dumps from searchidx* (pmtpa lucene I guess) combined with wikidata write activity and long-running research queries... make for an unhappy slave." [11:51:55] (quoting sprin gle) [11:54:48] (03PS1) 10ArielGlenn: fix typo in one of the bastion host ips for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/101199 [11:55:08] nosy1: are you affected by that? [11:56:01] (03CR) 10ArielGlenn: [C: 032] fix typo in one of the bastion host ips for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/101199 (owner: 10ArielGlenn) [11:56:16] (03PS1) 10Alexandros Kosiaris: tftp is udp not tcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/101200 [11:58:57] hello [11:59:07] hey [11:59:15] (03CR) 10Alexandros Kosiaris: [C: 032] tftp is udp not tcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/101200 (owner: 10Alexandros Kosiaris) [11:59:21] apergos: yes currently [11:59:35] springle suggested we should use this db host [11:59:43] but i can also switch [11:59:57] apergos: do you know when the job should complete? 
[12:01:36] not a clue [12:01:50] ok...np...ill change the master here [12:01:51] we would have to ask the authors of those queries :-( [12:01:54] ok [12:02:33] paravoid: how difficult was it to backport ferm 2.2 to precise ? any chance we could do it for lucid ? (god i hate myself for asking this) [12:14:03] which lucid hosts are we keeping? :) [12:17:32] akosiaris: reprepro copy [12:17:46] akosiaris: its only dependency is perl [12:17:55] (and iptables) [12:55:21] hey hashar, when you have a moment. i need some help regarding the puppet config for production https://gerrit.wikimedia.org/r/#/c/101058/ [12:55:30] we want to make sure a runner picks up those 3 gwtoolset jobs … aaron helped me out with that config and i just want to verify that it's okay plus ... [12:55:38] how do i do the same for the beta cluster? [12:57:14] and lastly, csteipp would like some ops input on one of our configs https://bugzilla.wikimedia.org/show_bug.cgi?id=58417 [13:10:21] (03PS5) 10Dan-nl: Production configuration for GWToolset [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101061 [13:14:45] (03PS1) 10Alexandros Kosiaris: dhcp is called bootps in /etc/services [operations/puppet] - 10https://gerrit.wikimedia.org/r/101204 [13:18:10] (03CR) 10Alexandros Kosiaris: [C: 032] dhcp is called bootps in /etc/services [operations/puppet] - 10https://gerrit.wikimedia.org/r/101204 (owner: 10Alexandros Kosiaris) [13:27:16] PROBLEM - MySQL Processlist on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:27:16] PROBLEM - MySQL Processlist on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:16] PROBLEM - Disk space on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:16] PROBLEM - MySQL Processlist on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:16] PROBLEM - MySQL InnoDB on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:16] PROBLEM - MySQL InnoDB on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:16] PROBLEM - mysqld processes on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:17] PROBLEM - MySQL InnoDB on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:26] PROBLEM - MySQL Processlist on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:26] PROBLEM - RAID on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:36] PROBLEM - MySQL Recent Restart on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:46] PROBLEM - Disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:47] PROBLEM - RAID on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:50] hmm [13:28:56] PROBLEM - mysqld processes on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:06] RECOVERY - Disk space on es1002 is OK: DISK OK [13:29:06] RECOVERY - MySQL Processlist on es1002 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 14 statistics [13:29:06] PROBLEM - mysqld processes on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:06] PROBLEM - MySQL Recent Restart on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:16] PROBLEM - Disk space on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:16] PROBLEM - RAID on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:29:26] PROBLEM - DPKG on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:56] PROBLEM - MySQL InnoDB on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:56] PROBLEM - puppet disabled on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:57] PROBLEM - DPKG on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:57] PROBLEM - RAID on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:06] PROBLEM - MySQL Recent Restart on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:07] traffic spikeon es1 [13:30:16] PROBLEM - MySQL disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:16] PROBLEM - mysqld processes on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:26] PROBLEM - puppet disabled on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:36] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [13:30:36] RECOVERY - RAID on es1002 is OK: OK: optimal, 1 logical, 2 physical [13:30:46] RECOVERY - mysqld processes on es1002 is OK: PROCS OK: 1 process with command name mysqld [13:30:56] PROBLEM - puppet disabled on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:56] PROBLEM - Disk space on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:56] RECOVERY - MySQL Recent Restart on es1003 is OK: OK seconds since restart [13:30:56] RECOVERY - mysqld processes on es1003 is OK: PROCS OK: 1 process with command name mysqld [13:31:06] RECOVERY - Disk space on es1003 is OK: DISK OK [13:31:06] RECOVERY - RAID on es1003 is OK: OK: optimal, 1 logical, 2 physical [13:31:16] PROBLEM - MySQL disk space on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:31:23] !log reduce max_connections on es100[1-4]. try to survive sudden spike [13:31:36] RECOVERY - Disk space on es1001 is OK: DISK OK [13:31:40] Logged the message, Master [13:31:46] RECOVERY - puppet disabled on es1001 is OK: OK [13:31:46] RECOVERY - DPKG on es1001 is OK: All packages OK [13:31:47] RECOVERY - RAID on es1001 is OK: OK: optimal, 1 logical, 2 physical [13:31:56] RECOVERY - MySQL Recent Restart on es1001 is OK: OK seconds since restart [13:32:06] RECOVERY - MySQL disk space on es1001 is OK: DISK OK [13:32:06] RECOVERY - mysqld processes on es1001 is OK: PROCS OK: 1 process with command name mysqld [13:32:16] PROBLEM - Disk space on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:32:16] PROBLEM - MySQL Processlist on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:32:16] PROBLEM - MySQL InnoDB on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:32:46] RECOVERY - puppet disabled on es1004 is OK: OK [13:32:46] RECOVERY - Disk space on es1004 is OK: DISK OK [13:32:56] PROBLEM - MySQL InnoDB on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:06] RECOVERY - MySQL disk space on es1004 is OK: DISK OK [13:33:06] RECOVERY - mysqld processes on es1004 is OK: PROCS OK: 1 process with command name mysqld [13:33:16] RECOVERY - RAID on es1004 is OK: OK: optimal, 1 logical, 2 physical [13:33:16] RECOVERY - DPKG on es1004 is OK: All packages OK [13:33:16] RECOVERY - puppet disabled on es1003 is OK: OK [13:33:16] PROBLEM - MySQL Processlist on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:33:26] RECOVERY - MySQL Recent Restart on es1004 is OK: OK seconds since restart [13:33:47] PROBLEM - RAID on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:56] PROBLEM - mysqld processes on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:56] PROBLEM - MySQL disk space on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:56] PROBLEM - MySQL Recent Restart on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:56] PROBLEM - puppet disabled on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:33:56] PROBLEM - DPKG on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:34:04] (03PS6) 10Hashar: beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 [13:34:16] PROBLEM - MySQL InnoDB on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:34:26] PROBLEM - MySQL Processlist on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:34:46] PROBLEM - Disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:04] PROBLEM - puppet disabled on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:04] PROBLEM - DPKG on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:04] PROBLEM - RAID on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:04] PROBLEM - MySQL Recent Restart on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:14] PROBLEM - MySQL disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:24] PROBLEM - MySQL InnoDB on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:24] PROBLEM - MySQL Processlist on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:24] PROBLEM - mysqld processes on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:36:04] RECOVERY - MySQL disk space on es1001 is OK: DISK OK [13:36:06] !log killing queries on es100[1-4] [13:36:14] RECOVERY - mysqld processes on es1001 is OK: PROCS OK: 1 process with command name mysqld [13:36:14] RECOVERY - Disk space on es1002 is OK: DISK OK [13:36:14] RECOVERY - MySQL disk space on es1002 is OK: DISK OK [13:36:22] Logged the message, Master [13:36:24] PROBLEM - MySQL InnoDB on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:36:24] RECOVERY - DPKG on es1001 is OK: All packages OK [13:36:24] PROBLEM - MySQL Processlist on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:36:34] RECOVERY - Disk space on es1001 is OK: DISK OK [13:36:34] RECOVERY - MySQL Recent Restart on es1001 is OK: OK seconds since restart [13:36:40] dan-nl: hello :-) [13:36:48] dan-nl: so jobs-loops.sh.erb is hmm [13:36:50] HORRIBLE [13:37:24] PROBLEM - MySQL Processlist on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:37:24] PROBLEM - MySQL InnoDB on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:37:24] PROBLEM - MySQL Processlist on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:37:24] PROBLEM - MySQL Processlist on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:37:24] PROBLEM - MySQL InnoDB on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:37:44] PROBLEM - MySQL InnoDB on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:37:46] hashar: ja, don't understand it [13:37:53] huh [13:37:57] the "Internal error" [13:38:03] the "Internal error" pages doesn't expand {{SITENAME}}. [13:38:06] page* [13:38:14] PROBLEM - mysqld processes on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:38:14] PROBLEM - MySQL Recent Restart on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:38:14] PROBLEM - Disk space on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:38:14] PROBLEM - RAID on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:14] PROBLEM - Disk space on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:14] PROBLEM - MySQL disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:15] (03CR) 10Hashar: [C: 031] "sounds good to me." (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/101058 (owner: 10Dan-nl) [13:39:24] PROBLEM - MySQL Processlist on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:24] PROBLEM - MySQL InnoDB on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:24] PROBLEM - mysqld processes on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:24] PROBLEM - MySQL Processlist on es1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:24] PROBLEM - MySQL disk space on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:31] dan-nl: your change looks fine anyway [13:39:34] PROBLEM - DPKG on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:35] aaron said that we should place each GWT job with $wgJobTypesExcludedFromDefaultQueue in CommonSettings.php and because of that we need to add a runner in that puppet file [13:39:44] PROBLEM - Disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:44] PROBLEM - MySQL Recent Restart on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:40:13] (03CR) 10ArielGlenn: [C: 032] beta: public IP rewriting using DNAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/101192 (owner: 10Hashar) [13:40:24] RECOVERY - DPKG on es1001 is OK: All packages OK [13:40:24] RECOVERY - mysqld processes on es1002 is OK: PROCS OK: 1 process with command name mysqld [13:40:34] RECOVERY - puppet disabled on es1002 is OK: OK [13:40:34] RECOVERY - Disk space on es1001 is OK: DISK OK [13:40:34] RECOVERY - MySQL Recent Restart on es1001 is OK: OK seconds since restart [13:40:44] RECOVERY - RAID on es1002 is OK: OK: optimal, 1 logical, 2 physical [13:40:44] RECOVERY - RAID on es1001 is OK: OK: optimal, 1 logical, 2 physical [13:40:44] RECOVERY - puppet disabled on es1001 is OK: OK [13:40:45] RECOVERY - MySQL Recent Restart on es1002 is OK: OK seconds since restart [13:40:54] RECOVERY - DPKG on es1002 is OK: All packages OK [13:41:04] RECOVERY - MySQL Recent Restart on es1003 is OK: OK seconds since restart [13:41:04] RECOVERY - mysqld processes on es1003 is OK: PROCS OK: 1 process with command name mysqld [13:41:04] RECOVERY - Disk space on es1003 is OK: DISK OK [13:41:04] RECOVERY - MySQL disk space on es1001 is OK: DISK OK [13:41:04] RECOVERY - RAID on es1003 is OK: OK: optimal, 1 logical, 2 physical [13:41:14] RECOVERY - mysqld processes on es1001 is OK: PROCS OK: 1 process with command name mysqld [13:41:14] PROBLEM - MySQL disk space on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:41:24] PROBLEM - mysqld processes on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:42:04] RECOVERY - Disk space on es1002 is OK: DISK OK [13:42:14] RECOVERY - MySQL disk space on es1004 is OK: DISK OK [13:42:14] RECOVERY - mysqld processes on es1004 is OK: PROCS OK: 1 process with command name mysqld [13:42:14] RECOVERY - MySQL disk space on es1002 is OK: DISK OK [13:42:51] hashar: so i wanted to understand that setting a bit more and make sure it's correct … sounds like it's correct … [13:42:54] PROBLEM - puppet disabled on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:43:24] PROBLEM - MySQL Processlist on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:43:24] PROBLEM - MySQL InnoDB on es1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:43:24] PROBLEM - MySQL Processlist on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:43:24] PROBLEM - MySQL InnoDB on es1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:43:25] hashar: do you need to deal with these icinga-wm messages? [13:43:33] dan-nl: yeah so you basically pass to that shell function a bunch of space separated jobs [13:43:35] if so i can explain later [13:43:44] RECOVERY - puppet disabled on es1004 is OK: OK [13:43:51] they ends up being passed to runJobs.php --types="jobtype1 jobtype2 ..." [13:44:04] I am not part of ops [13:44:04] hashar: so understanding it better … it creates one job runner or many? [13:44:17] I guess es10** boxes are elastic search so that would be manybubbles|away :-D [13:44:18] oh, i thought you were … sorry [13:44:33] I'm not away right now [13:44:37] was looking at stuff. [13:44:39] what is up? [13:44:43] there is a bunch of es1004 es1002 spam above [13:44:54] es1003 and es1001 as well [13:44:59] es is external storage, I believe [13:45:01] assuming they are elastic search box aren't they ? [13:45:04] elastic10XX is elasticsearch [13:45:06] oh my [13:45:12] yeah [13:45:15] unfortunate [13:45:30] these are es, sprin gle is looking at them [13:45:34] RECOVERY - MySQL InnoDB on es1001 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:45:36] lets rename elastic search to Recherche Elastic and get the box named RE :D [13:45:42] er ext storage I mean [13:46:14] RECOVERY - MySQL Processlist on es1001 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics [13:46:14] RECOVERY - MySQL InnoDB on es1002 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:46:15] RECOVERY - MySQL Processlist on es1004 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics [13:46:15] RECOVERY - MySQL InnoDB on es1003 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:46:15] RECOVERY - MySQL Processlist on es1002 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics [13:46:15] RECOVERY - MySQL InnoDB on es1004 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:46:15] RECOVERY - MySQL Processlist on es1003 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics [13:46:45] aaand not working... [13:46:45] hashar: in regards to that puppet config … does that create one job runner or is it a config for any job runner in production? [13:46:46] giant spike for text caches in the last little while [13:46:51] Uh oh, what just broke? 
Getting error pages [13:47:23] dan-nl: the shell script is used on all job runners [13:47:37] anomie: springl e is on it, i think [13:47:49] dan-nl: in labs that is jobrunner008 instance which run most of the jobs and the videoscaler005 instance (which only run video scaling, aka TimedMediahandler extension jobs) [13:48:10] hashar is there a similar puppet config for the beta cluster? [13:48:13] Now working again here [13:48:27] dan-nl: yeah beta use the same configuration as production. [13:48:57] oh, okay, so as soon as that gerrit commit is made it will apply to both environments? [13:49:10] "If your wiki is testing the new search tool ("[[mw:Search|CirrusSearch]]"), you can now test it by adding "New search" in your [[$prefbeta|Beta features preferences]]." [13:49:19] will it matter if a job is in $wgJobTypesExcludedFromDefaultQueue or not? [13:49:21] can someone tell me one wiki where that is happening? [13:52:36] hashar: is it okay to merge that puppet config change now or does GWToolset need to be on production first? [13:52:39] Nikerabbit: [[mw:Search]] [13:53:02] Nikerabbit: it is on by default in mediawiki.org. enwikisource has it as an option [13:53:53] dan-nl: nextJobs.php would look for a gwtoolset job by querying the job queue for the job type [13:54:09] dan-nl: since there are no jobs, I guess it will yield 0 jobs and keep proceeding [13:54:21] but not sure, gotta look at the runJobs.php code [13:54:59] manybubbles: I was trying to check the translation of "New search", but could not find it present on any wiki I tried randomly [13:55:13] hashar cool, good to know… so the only reason we need to add that config is because we're placing the GWToolset jobs in the $wgJobTypesExcludedFromDefaultQueue array? otherwise we wouldn't need that config change? [13:55:18] itwiki should have it [13:55:25] wait, sorry, no [13:55:32] it has it as a beta feature [13:55:35] if that is what you need [13:56:04] Nikerabbit: There it is: https://it.wikipedia.org/wiki/Speciale:Preferenze#mw-prefsection-betafeatures [13:56:55] manybubbles: yep, thanks [13:57:14] manybubbles: added notes for future translators [13:57:24] Nikerabbit: Thanks! [13:59:25] anyone on the ops team, yesterday csteipp mentioned a concern in regards to a setting within GWToolset. it's a throttle for how many media file jobs are placed in the job queue each minute. he wanted someone on the ops team to review the concept and decide what the throttle should be set at in WMF for the Commons server … is there anyone online who could help me with this? [14:02:15] hashar: do you know if there's a way to limit the nr of gwtoolset jobs run at once? for example, could a job runner check to see how many gwtoolsetMediafileJobs are already running and not pick up another if a threshold was set? [14:06:37] !log just ran aptitude remove apache2-mpm-prefork apache2-utils apache2.2-bin apache2.2-common libapache2-mod-php5 on bast1001. Autoremoved texlive-* packages and timidity. All seemed to be installed because of wikimedia-task-appserver (also removed) [14:06:54] Logged the message, Master [14:09:38] (03PS2) 10Alexandros Kosiaris: let bastion hosts have base::firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [14:10:11] (03CR) 10jenkins-bot: [V: 04-1] let bastion hosts have base::firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [14:11:17] !log applied the new installserver role to brewster, carbon, bast4001 and hooft.
This implies installation of ferm and usage of base::firewall which means default firewall policy will now be DROP. Punched necessary holes for services to work. [14:11:33] Logged the message, Master [14:11:37] Anybody into certificates around? https://links.email.donate.wikimedia.org/ triggers "Invalid cert" warning [14:11:39] https://bugzilla.wikimedia.org/show_bug.cgi?id=58373 [14:12:23] (03PS1) 10Mark Bergsma: Derive the corresponding original URL and store it with the thumb [operations/puppet] - 10https://gerrit.wikimedia.org/r/101207 [14:13:36] anyone know if there's a way to limit the nr of GWToolset jobs run at once? for example, could a job runner check to see how many gwtoolsetMediafileJobs are already running and not pick up another if a threshold was set? [14:17:04] (03PS3) 10Alexandros Kosiaris: let bastion hosts have base::firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [14:20:05] (03CR) 10Mark Bergsma: [C: 032] Derive the corresponding original URL and store it with the thumb [operations/puppet] - 10https://gerrit.wikimedia.org/r/101207 (owner: 10Mark Bergsma) [14:20:51] anyone know which ganglia grid shows all of the job runners on production? [14:27:39] I'm not ignoring you dan-nl, just still looking into the es spike [14:31:18] mw1001-1016 for eqiad, these are the ones in operation [14:31:19] http://ganglia.wikimedia.org/latest/?c=Jobrunners%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [14:31:52] if you look at the two jobrunner overviews on the main ganglia page you can see pmtpa is not doing much [14:31:57] dan-nl: ^^ [14:32:41] apergos: np i understand [14:34:00] apergos cool, thanks is there a way for me to monitor the runJobs.log file for all of those runner instances? [14:35:50] so on fluorine [14:36:01] /a/mw-log [14:36:15] we have aggregated logs for a lot of this stuff including runJobs [14:36:53] https://wikitech.wikimedia.org/wiki/Log_files this has a good overview of what's where [14:37:09] thanks! [14:37:12] sure [14:37:26] as far as the job queue question you were asking earlier [14:38:10] yes [14:38:13] the runners generally pick up one type at a time [14:38:22] so 'do priority types til there are none' [14:38:27] 'now work through the rest' [14:38:34] doing a set number at once [14:38:48] hmm I haven't looked at the architecture in awhile but that is how it was [14:39:04] so it would not be a lot of work to add that in for your job type, I believe [14:39:16] aaron used to be the go to person for that [14:39:18] ahh [14:39:27] default policy of ferm is DROP !! :-D [14:39:33] yes [14:39:57] you even pointed that out on an earlier changeset, how the defaultwas accept and it wasn't doing what you needed [14:40:05] yup true [14:40:06] I mean, lnked in the channel earlier today [14:40:07] forgot about it [14:40:09] heh [14:40:15] E_BRAINFULL [14:40:15] so I have applied my DNAT on the beta apches [14:40:16] k, i think chris' concern is that gwtoolset may add 20 + media file jobs at once. if each media file is 1gb and each runner picks up one of those jobs all 16 runners could, in theory be running a job that downloads a 1gb file [14:40:27] and now they are happily rejecting connections to port 80 :-D [14:40:32] ahahahahah [14:40:35] BUT, I can ssh from the bastion [14:40:41] lucky you :-D [14:40:53] I should monkey patch a ferm rule to allow port 80 for apaches :D [14:40:53] ah firewalls [14:40:58] the gift that keeps on giving... 
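To make hashar's earlier description concrete: the runners are essentially a shell loop that keeps handing a space-separated list of job types to MediaWiki's runJobs.php, and their output is aggregated on fluorine under /a/mw-log. A much-simplified sketch of that idea, not the real jobs-loop.sh.erb; the GWToolset job-type names, the wiki name and the limits are illustrative stand-ins:

    #!/bin/bash
    # Hypothetical stand-in for the generated jobs-loop script.
    # Placeholder names for the job types excluded from the default queue above.
    types="gwtoolsetUploadMetadataJob gwtoolsetUploadMediafileJob"

    while true; do
        # Per hashar: the list ends up as runJobs.php --types="jobtype1 jobtype2 ..."
        mwscript runJobs.php --wiki=commonswiki --types="$types" --maxjobs=50
        sleep 5
    done

To watch what the runners are actually doing, the aggregated log apergos points at can be followed on fluorine with, for example, tail -f /a/mw-log/runJobs.log | grep -i gwtoolset.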
[14:41:06] andd notice: /Stage[main]/Misc::Udp2log::Iptables_drops/Iptables_add_service[udp2log_drop_udp]/Iptables_add_rule[udp2log_drop_udp_udp]/Augeas[iptables udp2log_drop_udp_udp source]/returns: executed successfully [14:41:16] deployment-bastio now has both Augeas and Ferm maintained rules [14:42:48] yuck [14:46:40] apergos: csteipp was hoping that someone from ops might comment on the max setting of 20 gwtoolset media file jobs per minute … is that okay for production job runners or not … [14:46:53] I have that bug open [14:47:08] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [14:47:13] the deal is whether they complete [14:47:23] but if we mean 'at most 20 at any time' [14:47:49] apergos: ah, cool [14:48:03] the idea is that each person could create a batch upload job ... [14:48:20] that batch upload job could potentially add 20 media file jobs per minute [14:48:49] if 10 people set-up batch jobs concurrently that could potentially be 200 media file jobs per minute [14:48:58] the thing will be the scalers [14:49:02] uh [14:49:15] there are 16 job runners as i understand it and each would potentially pick up one at a time [14:49:18] again it's a matter of concurrency [14:49:45] right the possibility is there, but there's no way to say for certain that it would happen [14:50:04] that's why i was wondering if we could limit the number of concurrent gwtoolset media file jobs [14:50:19] we have from 70 at lowest to about 110 peak req / sec to the scalers [14:50:33] that's across all 8 boxes, i.e. total [14:50:59] these upload jobs [14:51:11] do they do more than shovel the file into swift? [14:51:19] i.e. do they scale at some resolution(s)? [14:52:43] if the sole thing they do is store the original in the backend, then we need to lok at the swift numbers only [14:53:36] (03PS1) 10Hashar: beta: ferm on appservers must allow port 80 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101209 [14:53:42] (03PS1) 10Hashar: role::parsoid::beta must allow port 8080 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101210 [14:55:15] apergos: they only download the media files from the external server and store them [14:55:22] ok [14:55:30] so swift will be the bottleneck [14:57:23] https://wikitech.wikimedia.org/wiki/PoolCounter might be able to limit the number of simultaneous uploads [14:57:35] but honestly, I am not sure it could nor whether it is wanted [14:57:46] + I have no idea how PoolCounter works [14:57:58] I think the place to do this is in the code with the rest of the job runner logic [14:58:14] as far as how many are run at once [14:58:22] apergos: I monkey patched some rules for beta app servers and parsoid server to respectively allow port 80 and port 8000 (see the two patches above) [14:59:06] k, i'll need to hash it out with aaron then … [14:59:12] thanks for thinking with me ... [14:59:42] I see them hashar and have them open already [15:04:16] and I don't think MediaWiki job system has a way to enforce a limit of # of jobs being run [15:04:31] http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=swift_PUT_hits&s=by+name&c=Swift+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=swift+backend+eqiad&hide-hf=false&sh=1&z=small&hc=4 this is the swift backend PUT requests [15:05:03] iirc the job runners will check what's already running on that host and not start up more than X new ones (of a type) [15:05:12] that's only per server [15:05:48] right i was hoping we could check across all runners ... 
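The per-host check apergos recalls above amounts to counting how many runners of a given type are already alive on the box before starting another; something in the spirit of the following, which is illustrative only (not the actual jobs-loop code; the process-name pattern, the limit of 3 and the job type are made up) and still gives no cluster-wide cap:

    # Skip this round if 3 gwtoolset runners are already busy on this host.
    if [ "$(pgrep -cf 'runJobs.php.*gwtoolset')" -ge 3 ]; then
        sleep 30
    else
        mwscript runJobs.php --wiki=commonswiki --types="gwtoolsetUploadMediafileJob" --maxjobs=10
    fi

A limit across all sixteen runners would need support at the job-queue level itself, which is exactly the gap being discussed here.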
[15:05:58] afaik there's no way to do that [15:06:04] as hashar says [15:06:25] though pool counter might do it [15:08:17] to do what? [15:08:43] making sure we don't overload swift by having thousands of jobs attempting to write to it [15:09:05] dan-nl is working on an extension that lets volunteers mass import files from museums, libraries etc.. [15:09:46] so there is a possibility someone would attempt to import a million images with all jobs being released at the same time and thus overloading swift [15:09:54] (that is what I understood about the problem) [15:10:06] jobs? what jobs? [15:10:29] the extension lets you submit a batch of files to upload, it then creates MediaWiki async jobs that are inserted in the jobqueue [15:10:35] the jobrunner would then process them [15:11:10] (03CR) 10ArielGlenn: [C: 032] beta: ferm on appservers must allow port 80 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101209 (owner: 10Hashar) [15:11:11] files to upload how? by URL? [15:11:47] yes, gwtoolset will create several media file jobs … i've placed a throttle that lets a user decide between 1 - 20 at a time [15:11:55] then yes it uses uploadbyurl [15:12:28] hashar: https://gerrit.wikimedia.org/r/#/c/101210/1/manifests/role/parsoid.pp has 8000 in the rule and 8080 and 8000 in the commit message, which do you need? [15:12:29] are the uploads paced? [15:13:29] apergos: 8000 (eight zero zero zero) [15:13:53] please fix your commit message then [15:13:55] apergos: uploading new patchset [15:13:59] paced? there's a metadatajob that creates the media file jobs … i've scheduled it to run once per minute … each time it runs it places N media file jobs into the queue, default is 10, but user can set between 1 - 20 [15:14:05] (03PS2) 10Hashar: role::parsoid::beta must allow port 8000 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101210 [15:14:10] apergos: apaches fixed! [15:14:16] good [15:14:24] step by step [15:14:32] lovely ferm [15:14:42] so 20 uploads/min per "batch upload"? [15:14:47] (up to) [15:14:53] yes, that's the current max potential [15:14:59] yeah that's fine [15:15:09] (03CR) 10ArielGlenn: [C: 032] role::parsoid::beta must allow port 8000 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101210 (owner: 10Hashar) [15:15:32] don't worry about it [15:15:49] would that scale with multiple users? for example this example thus far considers one user … what about 10 users at a time kicking off their batch uploads [15:15:56] that's 200/min [15:16:06] yes [15:16:06] it's nothing :) [15:16:13] cool, that's good to know :) [15:17:07] paravoid, do you mind commenting on https://gerrit.wikimedia.org/r/#/c/101008/ to that effect or in the bug https://bugzilla.wikimedia.org/show_bug.cgi?id=58417 ? [15:17:21] sure [15:17:45] apergos: fixed! thank you very much :-] [15:17:49] those 20 in a batch are not serial though, if I understand correctly? that is, they could be launched at the same time? [15:18:01] hashar: it's been a long haul :-) [15:18:14] any other services we should look out for over there? [15:18:24] apergos: it's still 20 uploads [15:19:17] I think mediawiki will break before swift [15:19:18] we'll see :P [15:19:20] apergos: that's correct … there's no order [15:19:23] great :-D [15:19:37] well if you are signing off I will sleep easy :-D [15:22:26] (03CR) 10Alexandros Kosiaris: [C: 04-1] "This will cause a ton of problems with fenari, don't merge yet."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [15:22:27] I mean, the maximum bandwidth we can currently write is 130MB/s [15:22:50] about a gigabit [15:23:04] apergos: and parsoid is working. Thank you ! [15:23:10] sweet! [15:23:20] apergos: if anything is missing folks will complain, file a bug and we can open rules. [15:23:30] do we expect to have many museums that will want to write to us at 1Gbps? :) [15:23:32] ok, at least the big things are covered [15:24:03] not sure … i think once GLAMs start to upload videos that's when bandwidth might become an issue ... [15:24:19] i'll chat with aaron … i'd like to see the ability to limit how many gwtoolset media file jobs are picked up as a whole [15:24:22] the museums won't, it's those pesky wiki{m,p}edia interns that will find some fast pipe to shovel stuff over :-D [15:24:30] but not out of the gate I imagine [15:24:36] :) [15:24:52] there's a limit on who can use the tool … a user must be in the gwtoolset group [15:24:53] we can always throttle the download-by-url proxy [15:25:15] so the commons admins will have some control over who can use the extension [15:25:24] so gradual ramp up anyways [15:25:30] yes [15:36:59] paravoid: would you have time to look at an ipv6 issue? this is someone with an mtu of 1280 with a problem with bits connections, he connects to bits in esams [15:37:24] (they are in the channel if you are available) [15:37:41] sure [15:40:50] kenneaal: may I introduce you to paravoid [15:40:57] who has actual network chops [15:41:54] what's the issue? [15:42:56] the reported symptoms were that connections to bits often hang for up to 20 seconds after handshake... a traceroute is here: http://pste.me/35ulx/ and he says he has a public RIPE atlas probe on the network going through the same ipv6 tunnel he is using, if we wanted to do direct testing [15:46:49] kenneaal: are you here? [15:50:54] !log reenabling ospfv3 on eqiad-esams [15:50:58] (this is unrelated) [15:51:04] (03PS1) 10Alexandros Kosiaris: Temporarily punch holes for hooft and bast4001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101216 [15:51:11] Logged the message, Master [15:51:41] (03CR) 10jenkins-bot: [V: 04-1] Temporarily punch holes for hooft and bast4001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101216 (owner: 10Alexandros Kosiaris) [15:52:28] akosiaris: no need for @ in @ferm::rule [15:52:34] akosiaris: ferm::rule has a @file [15:52:41] akosiaris: also, broken tabs :) [15:53:04] meh [15:53:07] ok [15:53:09] me fix [15:53:28] (03CR) 10BryanDavis: Production configuration for GWToolset (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101061 (owner: 10Dan-nl) [15:59:07] (03PS6) 10Dan-nl: Production configuration for GWToolset [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101061 [15:59:13] (03PS2) 10Alexandros Kosiaris: Temporarily punch holes for hooft and bast4001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101216 [15:59:48] (03CR) 10Dan-nl: "correcting the commit message. sorry about that bryan."
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101061 (owner: 10Dan-nl) [16:03:20] (03CR) 10Alexandros Kosiaris: [C: 032] Temporarily punch holes for hooft and bast4001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/101216 (owner: 10Alexandros Kosiaris) [16:09:11] (03PS1) 10Alexandros Kosiaris: Also open ganglia TCP ports [operations/puppet] - 10https://gerrit.wikimedia.org/r/101217 [16:10:11] (03CR) 10Aklapper: "Re libmime-(tools-)-perl: http://www.bugzilla.org/docs/4.4/en/html/installation.html#install-perlmodules states "MIME::Parser (5.406) for " [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [16:19:01] (03PS1) 10ArielGlenn: comment out dysprosium in decomm list so configs cleanup won't [operations/puppet] - 10https://gerrit.wikimedia.org/r/101218 [16:20:17] (03CR) 10ArielGlenn: [C: 032] comment out dysprosium in decomm list so configs cleanup won't [operations/puppet] - 10https://gerrit.wikimedia.org/r/101218 (owner: 10ArielGlenn) [16:26:42] (03CR) 10Odder: [C: 031] Update favicon wiktionary/si.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100949 (owner: 10Tholam) [16:27:39] out of here, see you monday [16:28:50] (03PS1) 10Manybubbles: Cirrus config updates [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 [16:29:36] (03CR) 10Manybubbles: [C: 04-1] "No merging until the deployment window!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 (owner: 10Manybubbles) [16:29:59] (03CR) 10Alexandros Kosiaris: [C: 032] Also open ganglia TCP ports [operations/puppet] - 10https://gerrit.wikimedia.org/r/101217 (owner: 10Alexandros Kosiaris) [16:32:29] (03PS2) 10Manybubbles: Cirrus config updates [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 [16:33:07] (03CR) 10Manybubbles: [C: 04-1] "Updated deployment notes but still no merging until the deployment window!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 (owner: 10Manybubbles) [16:38:25] (03PS1) 10John F. Lewis: Redirect kr.wikimedia [operations/apache-config] - 10https://gerrit.wikimedia.org/r/101220 [16:38:31] (03CR) 10jenkins-bot: [V: 04-1] Redirect kr.wikimedia [operations/apache-config] - 10https://gerrit.wikimedia.org/r/101220 (owner: 10John F. Lewis) [16:48:13] (03CR) 10Aklapper: "Jenkins output says "Invalid command 'RewriteEngine', perhaps misspelled or defined by a module not included in the server configuration" " [operations/apache-config] - 10https://gerrit.wikimedia.org/r/101220 (owner: 10John F. Lewis) [16:53:00] paravoid: Many apologies, I ended up on an extended store run. If you're still around, I'm here now. [16:53:13] I am, although multitasking [16:54:13] No worries, please ping if you take a while to answer. As I mentioned to aper.gos earlier, I believe bits.wikimedia.org may not be responding to ICMPv6 Packet Too Big messages properly. [16:55:21] that's fairly unlikely to happen on our side [16:55:25] but can we get some more data first? [16:55:34] The reason I believe so is that bits.wm.org connections will occasionally hang for around 20 seconds after the TCP handshake, once my host initiates the GET request; the connection stalls for that whole time. I believe it may be transmitting a 1280+ byte packet and not responding to the resulting PMTUD message. [16:55:45] (03CR) 10Chad: [C: 04-1] Cirrus config updates (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 (owner: 10Manybubbles) [16:56:11] what's your MTU?
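A diagnostic along the lines of what kenneaal describes could time the TCP handshake and the wait for the first response byte separately, so a ~20-second stall after the GET shows up clearly. This is a hypothetical sketch, not something run in the channel; the host, plain-HTTP port 80, and the request path are assumptions:

    import socket, time

    HOST, PORT = "bits.wikimedia.org", 80   # assumed target and plain-HTTP port

    def probe_once(timeout=30):
        # Force IPv6, since the reported problem is specific to the v6 path.
        addr = socket.getaddrinfo(HOST, PORT, socket.AF_INET6,
                                  socket.SOCK_STREAM)[0][-1]
        s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        s.settimeout(timeout)
        t0 = time.time()
        s.connect(addr)
        t_connect = time.time() - t0
        # Any URL works for timing; a response large enough to need full-size
        # packets makes a PMTU black hole easier to see.
        s.sendall(b"GET / HTTP/1.1\r\nHost: bits.wikimedia.org\r\n"
                  b"Connection: close\r\n\r\n")
        t1 = time.time()
        s.recv(4096)              # the reported ~20 s stall would show up here
        t_first_byte = time.time() - t1
        s.close()
        return t_connect, t_first_byte

    for _ in range(5):
        print("connect %.3fs, first byte after %.3fs" % probe_once())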
[16:56:14] We can. I can offer access to a VM on the network, or the probe ID of a RIPE Atlas on my network, if that helps. [16:56:19] MTU for the link is 1280. [16:56:36] is 2001:16d8:ee00:8146:20c:29ff:fe99:4669 your ip? [16:56:43] (03CR) 10Manybubbles: Cirrus config updates (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 (owner: 10Manybubbles) [16:56:51] a VM on your network would be best [16:57:15] From 2001:16d8:ee00:146::1 icmp_seq=1 Packet too big: mtu=1428 [16:57:15] That's one of my hosts, yes. [16:57:30] that's the port80 sixxs node [16:57:33] Sec, let me generate an account for you. [16:58:14] I can transmit > 1280 sized packets to that IP [16:58:31] (03CR) 10Chad: Cirrus config updates (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 (owner: 10Manybubbles) [16:58:51] 1428 passes fine [16:59:03] Sorry, that was my bad. I increased the MTU from 1280 (the default) to 1428 yesterday as part of debugging the connection. [16:59:17] and everything works now? [16:59:38] No, same problem persists. [17:00:38] PMed VM login details. [17:10:45] (03PS3) 10Manybubbles: Cirrus config updates [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 [17:12:02] (03CR) 10Chad: [C: 031] Cirrus config updates [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101219 (owner: 10Manybubbles) [18:32:34] PROBLEM - DPKG on mw1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:38] !log install PHP5 5.3.10-1ubuntu3.9+wmf1 on mw1017 (test.wikipedia.org) [18:33:52] Logged the message, Master [18:41:29] (03PS1) 10Alexandros Kosiaris: Rename akosiaris => alexandros kosiaris [operations/puppet] - 10https://gerrit.wikimedia.org/r/101244 [18:42:43] (03CR) 10Alexandros Kosiaris: [C: 032] Rename akosiaris => alexandros kosiaris [operations/puppet] - 10https://gerrit.wikimedia.org/r/101244 (owner: 10Alexandros Kosiaris) [18:47:55] hola [18:48:13] hi [18:48:21] I was wondering how I get an RT account. I was looking at the analytics onboarding wiki [18:48:26] and it looks like i need one [18:48:35] https://www.mediawiki.org/wiki/Analytics/Onboarding [18:52:43] (03CR) 10Ori.livneh: [C: 04-1] "Might as well echo a time delta and report it to Graphite as well. This is how I did it in scap:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100913 (owner: 10Anomie) [18:56:52] nuria: could you send a quick mail with that request to the address in the channel topic? [18:57:19] yes, will do. [18:59:07] nuria: thx [19:02:42] (03CR) 10Dzahn: "Ariel, for this you should ignore what is installed on kaulen altogether. the kaulen installation never used the system packages, it used " [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [19:14:29] paravoid: gdash & graphite running on apache now; graphite w/LDAP [19:14:34] should be done with the migration today or tomorrow, finally [19:18:38] (03CR) 10Ori.livneh: "..you could add this to mw-deployment-vars:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100913 (owner: 10Anomie) [19:24:35] ACKNOWLEDGEMENT - DPKG on mw1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris caused by manual PHP5 upgrade. Will be fixed after deployment of new packages to brewster. Pending deployment to production.
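For the path-MTU side of the debugging above, one rough approach is to binary-search the largest ICMPv6 echo payload that still gets a reply with PMTU discovery forced on. This sketch is illustrative only: it assumes Linux iputils (where the command is ping6, or "ping -6" on newer systems), the bits.wikimedia.org target from the conversation, and the 1280/1500-byte MTUs mentioned above as the search bounds:

    import subprocess

    TARGET = "bits.wikimedia.org"    # assumption: same destination as above

    def ping_ok(payload_bytes):
        # iputils flags: -M do forbids local fragmentation, -s sets the ICMPv6
        # payload size, -w 2 gives up after two seconds.
        cmd = ["ping6", "-c", "1", "-w", "2", "-M", "do",
               "-s", str(payload_bytes), TARGET]
        return subprocess.call(cmd, stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL) == 0

    def probe_path_mtu(lo=1232, hi=1452):
        # Payload bounds correspond to the 1280- and 1500-byte MTUs discussed
        # above (path MTU = payload + 8 bytes ICMPv6 + 40 bytes IPv6 header).
        best = None
        while lo <= hi:
            mid = (lo + hi) // 2
            if ping_ok(mid):
                best, lo = mid, mid + 1
            else:
                hi = mid - 1
        return None if best is None else best + 48

    print("estimated path MTU:", probe_path_mtu())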
[19:33:07] paravoid, we'll merge https://gerrit.wikimedia.org/r/#/c/101052/ if you have no objections [19:33:27] greg-g: fix for the broken https://www.mediawiki.org/wiki/Talk:Sandbox is https://gerrit.wikimedia.org/r/#/c/101253 , just need to sync_dir extensions/Flow. When can someone Lightning Deploy? (We have pastries!) [19:34:00] spagewmf: at any point, is kaldari available for it? [19:34:04] (I'll take a pastry ;) ) [19:37:38] is Kaldari required for this? I have deploy capability from my stint in E3... [19:38:28] spagewmf: are you comfortable with it? then yeah, you can [19:38:44] comfortable is a relative term. OK, doing it, thanks. [19:38:48] :) [19:45:43] hmmm [19:45:45] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Filtered_RecentChanges_errors [19:45:50] this might be a db performance issue. [19:46:17] (long queries in commonly-used UI) [19:46:31] springle-away: ^ [19:53:32] gwicke: no objections, but let's not use that for private wikis [19:53:36] gwicke: make https to http instead [19:54:00] paravoid, yup [19:54:02] oh hm, will you get redirects? [19:54:09] greg-g: It's fixed, logmsgbot missed the log. [19:54:17] maybe it won't work actually :) [19:54:39] paravoid, we should check if the backend is happy with http for those wikis [19:54:43] spagewmf: that's odd [19:54:48] !log Flow team deployed bug 58455 fix to 1.23wmf7 [19:54:48] spagewmf: but glad it's fixed [19:55:03] oh, logmsgbot was gone [19:55:06] Logged the message, Master [19:55:40] greg-g logmsgbot rejoined 11:52. Something about going out for a snack [20:00:56] (03CR) 10Dzahn: "libmime-tools-perl has formerly been libmime-perl and was listed as such in the mozilla wiki requirements page" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [20:03:14] (03CR) 10Dzahn: "and libmime-perl is in the Required section on https://wiki.mozilla.org/Bugzilla:Prerequisites#Ubuntu" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [20:06:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:08:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:10:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:12:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:14:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:14:55] (03PS5) 10Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 [20:16:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:18:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:20:27] (03CR) 10Dzahn: "not installing virtual libemail-mime-modifier-perl anymore, installing all the real packages that would come from it instead. 
also sorted " [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [20:20:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:22:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:24:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:26:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:27:17] (03CR) 10Ori.livneh: [C: 04-1] "Unscored reviews are easy to miss in Gerrit, so I'm updating my review to a -1, even though it's more of a -0.25." [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [20:28:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:29:23] (03PS1) 10Ottomata: Using custom ganglia module instead of Logster. [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/101431 [20:30:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:01:27 PM UTC [20:30:43] RECOVERY - Puppet freshness on sq80 is OK: puppet ran at Fri Dec 13 20:30:40 UTC 2013 [20:32:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:30:40 PM UTC [20:34:33] PROBLEM - Puppet freshness on sq80 is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 08:30:40 PM UTC [20:41:39] (03CR) 10Ori.livneh: [C: 04-1] "There's a nice pattern for tail -f in Python at , which is part of dabeaz's awesome "Generator" (035 comments) [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/101431 (owner: 10Ottomata) [20:43:31] !IE6 [20:43:31] April 8 2014 - celebrate end of extended support [20:54:55] does that also make all IE6 auto-destroy? [20:56:09] (03PS1) 10Jgreen: add *-dev hostname cnames for testing on lutetium [operations/dns] - 10https://gerrit.wikimedia.org/r/101432 [20:57:23] (03CR) 10Jgreen: [C: 032 V: 031] add *-dev hostname cnames for testing on lutetium [operations/dns] - 10https://gerrit.wikimedia.org/r/101432 (owner: 10Jgreen) [21:00:47] RECOVERY - Puppet freshness on sq80 is OK: puppet ran at Fri Dec 13 21:00:45 UTC 2013 [21:09:17] lol, a ticket that uses "please do the needful." [21:21:45] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 06:20:43 PM UTC [21:28:37] (03CR) 10Faidon Liambotis: "I don't understand why you're beating yourself with all this work, both with logster and with this, instead of just hacking a few lines in" [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/101431 (owner: 10Ottomata) [21:53:42] looks like analytics1012 has been down for 29 days. i'm going to ack the alarm in icinga to test that i'm able to ack. [21:54:03] ACKNOWLEDGEMENT - Host analytics1012 is DOWN: PING CRITICAL - Packet loss = 100% Jeff Gage testing ACK ability [21:54:08] cool [21:58:23] hah, thanks [21:58:29] yeah that thing is very unhappy [21:58:37] waiting for some firmware fix or something :( [21:58:42] Hi gage! [22:00:11] paravoid, ahhh! 
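The generator-based tail -f pattern that Ori's review comment above refers to (from David Beazley's generator material) looks roughly like the following; the stats-file path used in the usage example is hypothetical, not the actual varnishkafka default:

    import os
    import time

    def follow(path, poll_interval=1.0):
        """Yield lines appended to `path`, starting at the current end of file."""
        with open(path) as f:
            f.seek(0, os.SEEK_END)
            while True:
                line = f.readline()
                if not line:
                    time.sleep(poll_interval)   # nothing new yet; poll again
                    continue
                yield line

    if __name__ == "__main__":
        # Hypothetical stats-file location, for illustration only.
        for line in follow("/var/cache/varnishkafka/varnishkafka.stats.json"):
            print(len(line), "bytes")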
[22:00:18] i don't know why i'm beating myself up with all that either [22:00:35] originally logster because it was less coupled [22:00:55] kept monitoring code out of varnishkafka [22:01:06] and logster made parsing the json files generic to any monitoring solution, which was nice [22:01:09] but mHHhhhhh [22:01:35] i could look into statsd i guess... [22:01:57] Snaps and I talked about it, he didn't seem excited about that, but maybe I'm wrong there [22:02:34] ? [22:02:46] statsd in vk? [22:05:03] (03CR) 10Mattflaschen: "We're targeting Wed., the 18th 00:00 UTC (Tue., the 17th, 16:00 PST) for deployment." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 (owner: 10MZMcBride) [22:06:12] yup [22:08:53] that would mean transforming from json to statsd format inside vk. can't we do that in logster? [22:09:03] or some other external tool [22:09:15] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [22:10:12] logster was giving me a giant headache because of an apparent ganglia positive slope bug [22:10:29] so, i went with ori-l's suggestion and decided to just compute the rate of change manually in a custom ganglia module [22:10:44] still working on this: [22:10:44] https://gerrit.wikimedia.org/r/#/c/101431/ [22:11:00] paravoid says "why don't you just add statsd support!" [22:11:07] I say "iunno, maybe we should" [22:11:09] should we? [22:12:09] preferably not in varnishkafka, but in an external tool that tails the json stats file. we could even spawn that program from vk so it can just read from stdin [22:12:27] sure, that works [22:12:28] (03PS3) 10Nemo bis: Per bug #48012. Patch for worker.py. It checks for external programs existence in the initialization part. [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/63390 (owner: 10Sanja pavlovic) [22:12:46] oof, no thanks :) [22:12:55] i mean, what I just wrote is an external tool that tails the json file [22:13:26] (03CR) 10Ottomata: "Because Snaps doesn't like it(?)" [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/101431 (owner: 10Ottomata) [22:13:34] (03CR) 10Ottomata: ":)" [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/101431 (owner: 10Ottomata) [22:14:03] good! what's wrong with it? [22:14:37] Snaps: I don't have a strong opinion about this, but [22:14:48] Generally I understand the reservations about encumbering a nice UNIXy tool with gratuitous features; there's something to be said for doing one thing and doing it well [22:15:15] but StatsD is an emergent standard with a lot of support, and it's almost laughably trivial [22:15:26] see https://github.com/b/statsd_spec [22:16:01] yeah, and I don't think there is much to gain from embedding that functionality in vk. It is more work than making a standalone prog. [22:16:02] just sock.send('varnishkafka.myfoo:15|ms') [22:17:07] a standalone statsd tailer would be a lot slimmer than a full-blown ganglia plugin, too [22:17:23] because statsd would be keeping state for you [22:17:32] so you're absolved from having to track deltas [22:17:52] a bit like RRD COUNTERs, except not broken :) [22:18:38] I hear you [22:20:29] the stats data is in json format, which means vk will need to parse that json and transform it into the statsd format. It is absolutely doable, but it's much less work doing it in a high-level language with fancy json parsing. That's all I'm saying :) [22:20:47] and that would also allow for other stats outputs than statsd, such as ganglia or whatever.
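A sketch of the standalone statsd tailer being discussed: it reads one JSON stats object per line from stdin (so varnishkafka could spawn it, or it could be fed by a tail of the stats file) and forwards a few numeric fields to statsd over UDP. The field names, metric prefix, statsd address, and the one-object-per-line assumption are illustrative, not the real varnishkafka stats schema:

    import json
    import socket
    import sys

    STATSD_ADDR = ("127.0.0.1", 8125)    # assumption: local statsd daemon
    PREFIX = "varnishkafka"              # illustrative metric prefix
    FIELDS = ("txmsgs", "txerrs")        # hypothetical counter fields

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            stats = json.loads(line)
        except ValueError:
            continue                     # skip partially written lines
        for field in FIELDS:
            if field in stats:
                # Running totals go out as gauges ("|g"); deltas could instead
                # be computed here and sent as counters ("|c").
                packet = "%s.%s:%d|g" % (PREFIX, field, stats[field])
                sock.sendto(packet.encode("ascii"), STATSD_ADDR)

Sending the running totals as gauges keeps the tailer itself stateless; rates can then be derived on the statsd/graphite side rather than in a ganglia module, which is the point made above about not tracking deltas by hand.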
[22:24:03] But it's up to you, so don't let my opinion ruin the day :) [22:27:20] yeah, i'm persuaded [22:27:34] but it's up to ottomata & paravoid i guess [22:29:25] jgage: nice, icinga ACK test successful [22:30:45] if you happen to see an open ticket with a matching host name, you could mention/link it in the ACK message even [22:30:57] cool ok [22:31:28] #6238 in this case [22:31:54] yes [22:32:35] jgage: btw, on a related note, when you do stuff on servers, there is the !log feature here, you should use it for any non-gerrit changes on prod [22:32:51] and it ends up on https://wikitech.wikimedia.org/wiki/SAL [22:32:55] yeah, noticed that [22:33:02] and, if it wasn't broken right now, even on Twitter [22:33:04] i had fun peeking at the SAL before i was hired [22:33:05] and/or identi.ca [22:33:13] k, cool! [22:33:44] oh, SAL -> Twitter is even broken? [22:33:50] yea [22:33:56] hah [22:34:00] last time i checked [22:34:29] i'm not really a twitter guy, but reading the SAL on wikitech is a good habit [22:34:34] to see what others did last night [22:53:25] greg-g: isn't it identica -> twitter that's actually broken for that [22:53:58] p858snake|l: don't think it ever went straight to identi.ca. there was a point in time when the identi.ca bit broke but the twitter bit didn't [22:54:14] I mean "through identi.ca" not "straight to" [22:54:31] hmm maybe not, my client is saying wikimedia something as the client [22:55:12] yeah [23:34:41] (03CR) 10Arav93: "Is this fixed or should I merely change those mentioned in the comment?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94598 (owner: 10Arav93) [23:36:19] we are just overhauling our repo layout and are creating a deploy repo along the lines of https://www.mediawiki.org/wiki/Parsoid/Packaging [23:37:04] the debian/ subdir will hold the upstart and systemd configs, besides eventually an actual debianization [23:37:12] so it needs to be controlled by ops [23:37:40] should we add all of puppet as a submodule and symlink debian to some subdir in there? [23:38:04] or should we create a new ops-controlled repository for this? [23:46:14] (03CR) 10Aklapper: "I tested on a fresh Labs instance (boogswolibs) with a copy of the kaulen setup (means: custom /lib still existing in /bugzilla)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn) [23:48:56] (03CR) 10Dzahn: "thanks for testing! it was worth a try because of the comment above starting "PS4: use libemail-sender-perl instead of libemail-send-perl"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/101174 (owner: 10Dzahn)