[00:00:53] RECOVERY - mailman on sodium is OK: PROCS OK: 10 processes with args mailman [00:00:57] there we go [00:01:00] welcome back [00:01:02] switched now matanya [00:01:14] that's the new thing,thx [00:04:05] !log ori rebuilt wikiversions.cdb and synchronized wikiversions files: [00:04:07] !log ori finished scap: (no message) (duration: 29m 26s) [00:04:11] Logged the message, Master [00:04:19] Logged the message, Master [00:04:51] done for now [00:07:35] i'm going to sleep. night folks [00:08:55] matanya: night. cya [00:20:07] (03PS2) 10Dzahn: emery: remove last log before decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/110394 (owner: 10Matanya) [00:24:22] why do our mysql parser cache nodes not use innodb_file_per_table? [00:24:47] (03CR) 10Dzahn: [C: 031] "ottomata: "as well as being left on emery"? matanya said it the box is ready for shutdown, true or do you need to copy anything? also re: " [operations/puppet] - 10https://gerrit.wikimedia.org/r/110394 (owner: 10Matanya) [00:30:39] (03CR) 10Dzahn: "well, nice that it's merged and all but now there are still just some FIXMEs in there, and we need to know stuff like if "check_stomp" is " [operations/puppet] - 10https://gerrit.wikimedia.org/r/109655 (owner: 10Dzahn) [00:33:02] Jeff_Green: "check_stomp.pl" or is it dead? it was a check on erzurumi, what is the fix: replace hostname or remove check entirely [00:33:16] was added back in RT 703 :p [00:33:39] (03PS2) 10TTO: Add variant rewrites for zhwikivoyage [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 [00:33:42] #703: Mysterious Nagios error for erzurumi ,hehe [00:34:36] http://search.cpan.org/~lbrocard/Net-Stomp-0.32/lib/Net/Stomp.pm [00:35:04] (03CR) 10TTO: "zh-mo and zh-my now are added." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [00:35:14] mutante: stomp is some archaic monitoring setup for polling activemq when it ran on erzurumi [00:35:21] mutante: you can remove it entirely. 
we monitor that from inside frack now--there's a job that runs on silicon which reports by nsca instead of nrpe [00:35:24] i had that conversation with Jeff a few days ago [00:35:32] and there he is [00:35:34] cmjohnson1: yes, ActiveMQ, nod [00:35:38] * Jeff_Green runs away again [00:35:40] Jeff_Green: ok, thanks [00:36:07] i'll do that and then kill erzurumi [00:37:30] !log ebernhardson synchronized php-1.23wmf12/extensions/Echo/ 'Update echo for Special:Notifications fix' [00:37:37] Logged the message, Master [00:38:08] !log ebernhardson synchronized php-1.23wmf12/extensions/Flow/ 'Update flow for Special:Notifications fix' [00:38:16] Logged the message, Master [00:43:41] !log ebernhardson synchronized php-1.23wmf11/extensions/Echo/ 'Update echo for Special:Notifications fix' [00:43:48] Logged the message, Master [00:44:21] !log ebernhardson synchronized php-1.23wmf11/extensions/Flow/ 'Update flow for Special:Notifications fix' [00:44:29] Logged the message, Master [00:44:36] !log finished deploying Special:Notifications fix to Echo and Flow [00:44:44] Logged the message, Master [00:51:29] (03PS1) 10Dzahn: remove old erzurumi ActiveMQ monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/111677 [00:52:58] (03PS2) 10Dzahn: remove old erzurumi ActiveMQ monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/111677 [00:54:03] going to rerun scap [00:58:19] !log ori started scap: no-diff scap to test script changes [00:58:26] Logged the message, Master [00:59:14] (03CR) 10Dzahn: [C: 032] "< Jeff_Green> mutante: you can remove it entirely. 
we monitor that from inside frack now--there's a job that runs on silicon which reports" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111677 (owner: 10Dzahn) [01:00:53] !log ori finished scap: no-diff scap to test script changes (duration: 02m 34s) [01:01:01] Logged the message, Master [01:01:11] woot :) [01:02:11] * bd808 wonders how a scap finished in 2.5m [01:02:38] --versions=1.23wmf10 [01:02:41] and no changes to push out [01:02:56] and more rsync servers than just tin [01:06:35] !log apt-get remove libnet-stomp-perl on neon, i just removed that from puppet but didn't think it should stay in as an "absent" package forever [01:06:44] Logged the message, Master [01:08:16] (03CR) 10Dzahn: "package removed from neon manually, we'll likely never use this again and just a single host" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111677 (owner: 10Dzahn) [01:27:46] (03CR) 10Dzahn: [C: 031] "so, donate-lb doesn't need 2 MX's anymore? could fundraising people actually make/merge fundraising changes? 
(one of the 2 is enough), thx" [operations/dns] - 10https://gerrit.wikimedia.org/r/111621 (owner: 10Dzahn) [01:45:43] (03Abandoned) 10Dzahn: remove erzurumi from DNS [operations/dns] - 10https://gerrit.wikimedia.org/r/109656 (owner: 10Dzahn) [01:47:21] (03CR) 10Dzahn: "sigh, duplicates that are also linked on the tickets btw" [operations/dns] - 10https://gerrit.wikimedia.org/r/109656 (owner: 10Dzahn) [01:48:29] (03CR) 10Dzahn: "i don't think anymore cares if people want to replace dsh completely" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96413 (owner: 10Dzahn) [01:49:24] (03Abandoned) 10Dzahn: move dsh to module [operations/puppet] - 10https://gerrit.wikimedia.org/r/96413 (owner: 10Dzahn) [01:59:33] (03PS1) 10Springle: s2 repool db1034, warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111699 [02:00:06] (03CR) 10Springle: [C: 032] s2 repool db1034, warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111699 (owner: 10Springle) [02:00:12] (03Merged) 10jenkins-bot: s2 repool db1034, warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111699 (owner: 10Springle) [02:01:34] !log springle synchronized wmf-config/db-eqiad.php 's2 repool db1034 warm up' [02:01:42] Logged the message, Master [02:14:32] (03PS4) 10Dzahn: linting openstack, quoting, arrows [operations/puppet] - 10https://gerrit.wikimedia.org/r/109295 [02:15:53] (03Abandoned) 10Dzahn: linting openstack, quoting, arrows [operations/puppet] - 10https://gerrit.wikimedia.org/r/109295 (owner: 10Dzahn) [02:21:38] (03PS1) 10Springle: s2 depool db1009 schema changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111705 [02:21:47] !log LocalisationUpdate completed (1.23wmf12) at 2014-02-06 02:21:47+00:00 [02:21:57] Logged the message, Master [02:22:08] (03CR) 10Springle: [C: 032] s2 depool db1009 schema changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111705 (owner: 10Springle) [02:22:14] (03Merged) 
10jenkins-bot: s2 depool db1009 schema changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111705 (owner: 10Springle) [02:22:32] (03CR) 10Dzahn: [C: 031] sudoers: remove two files, seems not to be used anywhere [operations/puppet] - 10https://gerrit.wikimedia.org/r/111444 (owner: 10Matanya) [02:23:08] !log springle synchronized wmf-config/db-eqiad.php 's2 depool db1009 schema changes' [02:23:16] Logged the message, Master [02:23:36] (03CR) 10Dzahn: "nagios and rainman, cough:) pretty sure this can go, just can't submit in this state and nrpe_fundraising must be from Jeff" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111444 (owner: 10Matanya) [02:24:08] (03PS2) 10Matanya: sudoers: remove two files, seems not to be used anywhere [operations/puppet] - 10https://gerrit.wikimedia.org/r/111444 [02:24:27] (03CR) 10Dzahn: "rebase on a change that just does "D" on files? :p" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111444 (owner: 10Matanya) [02:42:19] (03PS1) 10Springle: clean up s2 and s4 [operations/puppet] - 10https://gerrit.wikimedia.org/r/111709 [02:44:10] (03CR) 10Springle: [C: 032] clean up s2 and s4 [operations/puppet] - 10https://gerrit.wikimedia.org/r/111709 (owner: 10Springle) [02:44:10] !log LocalisationUpdate completed (1.23wmf11) at 2014-02-06 02:44:10+00:00 [02:44:19] Logged the message, Master [02:45:34] !log xtrabackup clone db1034 to db1009 [02:45:41] Logged the message, Master [02:57:45] (03CR) 10Byfserag: "hmm, where is jenkins-bot?" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [03:05:51] (03PS1) 10Springle: Sideline db1034 for hardware checks RT 6783. Assign db1024 to s2 as replacement. [operations/puppet] - 10https://gerrit.wikimedia.org/r/111713 [03:07:27] (03CR) 10Springle: [C: 032] Sideline db1034 for hardware checks RT 6783. Assign db1024 to s2 as replacement. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/111713 (owner: 10Springle) [03:10:11] (03PS1) 10BBlack: Move KH,MY,PH,SG,TW to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111714 [03:10:28] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [03:10:44] !log xtrabackup clone db1018 to db1024 [03:10:47] (03CR) 10BBlack: [C: 032 V: 032] Move KH,MY,PH,SG,TW to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111714 (owner: 10BBlack) [03:10:52] Logged the message, Master [03:22:26] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-02-06 03:22:26+00:00 [03:22:34] Logged the message, Master [03:27:33] Can I see the kafka broker config somewhere? [03:36:13] PROBLEM - mysqld processes on db1024 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [03:40:35] Snaps: sure, I can retrieve it. Just a second. [03:41:17] ori: I really just want the value of the "message.max.bytes" property :) [03:41:53] PROBLEM - Puppet freshness on db1024 is CRITICAL: Last successful Puppet run was Thu 06 Feb 2014 03:18:56 AM UTC [03:43:28] Snaps: on the bits caches at least it is not explicitly set [03:44:27] ori: okay, thanks :) [03:44:30] Snaps: the file is generated by puppet from a template; the output on bits is this config file: https://dpaste.de/3Dbb/raw [03:44:51] ah, thats for varnishkafka, I was talking about the kafka broker [03:45:02] ohhh, I misread. Sorry. Hang on [03:45:30] it runs on analytics1022.eqiad I think [03:47:37] it's not set [03:48:18] okay, using defaults. All I need to know, thank you ori :) [03:48:25] no problem [03:59:11] springle: Have you seen https://bugzilla.wikimedia.org/60907 (apparently replica s7, "ERROR 1548 (HY000) at line 1: Cannot load from mysql.proc. The table is probably corrupted")? [03:59:37] scfc_de: yes, i've seen it [03:59:51] (i'm assigned to it) [04:01:49] I did that, but that doesn't guarantee that it is noticed :-). 
[04:02:01] :)
[04:08:23] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[04:17:00] going to run another no-op scap
[04:17:44] noöp
[04:17:50] :)
[04:24:06] !log ori started scap: no new code. testing scap changes.
[04:24:13] Logged the message, Master
[04:24:31] Production is the best test environment.
[04:24:35] Gloria: Hush.
[04:28:42] !log ori finished scap: no new code. testing scap changes. (duration: 04m 35s)
[04:28:50] Logged the message, Master
[04:29:07] that was a successful run of all branches
[04:29:49] bd808, pack your bags, we're going to vegas
[04:30:23] the tampa app servers don't have a usable rsync server, they're all defaulting to tin
[04:30:35] o_O
[04:30:56] it was like that for all servers until a few hours ago
[04:31:05] That was the perl bug?
[04:31:27] yeah
[04:31:33] Nice find
[04:32:11] Why didn't anyone notice the network load on tin due to that?
[04:32:19] they were gathering requirements
[04:33:25] * bd808 dodges the jab
[04:33:35] it was a joke :D
[04:33:59] probably because no one was looking; i wasn't
[04:34:26] what would you expect the outbound bandwidth to look like if things were working properly? it's not hard to estimate, but it takes some deliberate care.
[04:34:51] Yeah. Or graphs with scap start/end lines on them too
[04:35:22] well, yeah, you'd see it go up during scap, because that's tin's primary function
[04:35:54] But it should go up only during fanout and then drop back down long before the end
[04:35:59] I think
[04:36:37] it's realllllly hard to read a real world graph and detect this sort of thing when you're not looking for it
[04:37:27] we're also swallowing the output somewhere
[04:37:36] console output, i mean
[04:38:41] well, i suspect we are; i'm not certain.
[04:38:46] The output from dsh?
[04:40:36] Can scappy be tested in beta? At least the fundamental bits? That seems like an faster palce to iterate on the little changes.
[04:40:37] oh, hah. i don't think we were affected by that bug.
[04:40:57] god damn it. that's hubris for you.
[04:41:44] the python script reads the host files using re.findall(r'^\w+', hosts_file.read(), re.MULTILINE)
[04:41:51] which doesn't match '.'
[04:41:58] so it was truncating the domain name
[04:42:54] but scap-proxies actually specifies nodes via the fully-qualified name, so they are pingable across DCs
[04:44:14] it probably still bit us here and there because whenever pinging a host failed for whatever reason find-nearest-rsync would pick it as the proxy
[04:44:36] so, back to your original question:
[04:44:41] Why didn't anyone notice the network load on tin due to that?
[04:44:47] probably there wasn't any
[04:44:58] i'll go eat a hat now
[04:45:23] Don't forget to have it toasted first
[04:45:33] salted :)
[04:48:11] * bd808 had to explain why "salted" was funny to his wife
[05:08:49] !log redeploying parsoid/deploy on wtp*
[05:08:57] Logged the message, Master
[05:09:50] !log scratch that, will redploy parsoid/deploy in about an hour
[05:09:57] Logged the message, Master
[05:12:24] (PS1) Springle: Reduce labsdb1003 mariadb global memory usage (buffer pool size) to allow for high per-thread usage. Kernel OOM killer hit instance on port 3306. [operations/puppet] - https://gerrit.wikimedia.org/r/111725
[05:14:10] (CR) Springle: [C: 2] Reduce labsdb1003 mariadb global memory usage (buffer pool size) to allow for high per-thread usage. Kernel OOM killer hit instance on port [operations/puppet] - https://gerrit.wikimedia.org/r/111725 (owner: Springle)
[05:15:52] !log restart labsdb1003 mariadb instances
[05:16:00] Logged the message, Master
[05:18:33] PROBLEM - mysqld processes on labsdb1003 is CRITICAL: PROCS CRITICAL: 2 processes with command name mysqld
[05:23:33] RECOVERY - mysqld processes on labsdb1003 is OK: PROCS OK: 3 processes with command name mysqld
[05:29:33] PROBLEM - Varnish HTCP daemon on cp1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:53] PROBLEM - Varnish traffic logger on cp1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:53] PROBLEM - Varnish HTTP text-backend on cp1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:35:39] null-scapping again
[05:39:56] !log ori started scap: no new code. testing scap changes. (again.)
[05:40:05] Logged the message, Master
[05:40:38] what happened to cp1054?
[05:44:09] (PS1) Wpmirrordev: Extend maximum allowed mediawiki version to 1.23 [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/111728
[05:44:52] (CR) Liangent: [C: 1] Add variant rewrites for zhwikivoyage [operations/apache-config] - https://gerrit.wikimedia.org/r/110155 (owner: TTO)
[05:44:57] !log ori finished scap: no new code. testing scap changes. (again.) (duration: 05m 00s)
[05:45:05] Logged the message, Master
[05:45:33] wait time spiked on that varnish
[05:48:35] !log varnish on cp1054: CPU wait spiked at 05:27. dmesg|tail: XFS: possible memory allocation deadlock in kmem_alloc. not investigating further.
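The host-file parsing problem described at 04:41 (the `^\w+` pattern stopping at the first dot) can be reproduced in isolation. This is a minimal sketch, assuming a hosts file with one name per line; the host names are made up for illustration, and the second pattern is only one possible alternative, not necessarily the fix that went into scap.

```python
import re

# Hypothetical hosts file content, one fully-qualified name per line.
hosts_text = "mw1001.example.org\nmw1002.example.org\n"

# Pattern quoted in the discussion: \w does not match '.', so each
# match ends at the first dot and the domain part is lost.
truncated = re.findall(r'^\w+', hosts_text, re.MULTILINE)

# One possible alternative: take the whole run of non-whitespace
# characters at the start of each line.
full = re.findall(r'^\S+', hosts_text, re.MULTILINE)

print(truncated)  # ['mw1001', 'mw1002']
print(full)       # ['mw1001.example.org', 'mw1002.example.org']
```

Note that `\w` also excludes `-`, so hyphenated host names would be cut short by the original pattern as well.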
[05:48:42] Logged the message, Master [05:49:38] it needs to be rebooted, but there are plenty of other text varnishes that are doing fine, so i'm leaving it for someone in ops [05:56:27] (03PS1) 10Ori.livneh: replace scap bash script with equivalent python code [operations/puppet] - 10https://gerrit.wikimedia.org/r/111730 [06:05:30] (03CR) 10Ori.livneh: [C: 032] replace scap bash script with equivalent python code [operations/puppet] - 10https://gerrit.wikimedia.org/r/111730 (owner: 10Ori.livneh) [06:24:53] (03CR) 10MZMcBride: "Related: bug 27294" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110904 (owner: 10Ori.livneh) [06:46:30] (03PS1) 10Springle: s2 pool db1024 warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111733 [06:46:54] (03CR) 10Springle: [C: 032] s2 pool db1024 warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111733 (owner: 10Springle) [06:47:00] (03Merged) 10jenkins-bot: s2 pool db1024 warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111733 (owner: 10Springle) [06:47:56] !log springle synchronized wmf-config/db-eqiad.php 's2 pool db1024 warm up' [06:48:03] Logged the message, Master [07:01:24] !log xtrabackup clone db1018 to db1009 (take #2) [07:01:32] Logged the message, Master [07:28:03] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 206.433334 [07:29:12] !log redeploying parsoid/deploy on wtp* [07:29:13] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 290.866669 [07:29:20] Logged the message, Master [07:29:23] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 248.933334 [07:33:03] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [07:34:23] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [07:35:13] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [07:44:23] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2590.699951 [07:46:13] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 954.766663 [07:48:03] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 295.833344 [07:57:03] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:00:03] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 77.066666 [08:06:10] (03PS24) 10Matanya: site: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 [08:09:16] (03CR) 10Byfserag: [C: 031] Add variant rewrites for zhwikivoyage [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [08:16:03] PROBLEM - Puppet freshness on labsdb1003 is CRITICAL: Last successful Puppet run was Thu 06 Feb 2014 05:15:15 AM UTC [08:16:13] PROBLEM - Parsoid on wtp1011 is CRITICAL: Connection refused [08:19:13] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.005 second response time [08:19:13] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:23:03] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:24:23] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:31:08] moin [08:33:05] hello paravoid [08:33:48] paravoid: howdy [08:39:51] (03CR) 10Mattflaschen: [C: 04-1] "Looks good, except for GuidedTour and living people. I'd like to flesh out the living people categories." 
(033 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111460 (owner: 10Phuedx) [09:00:33] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [09:02:03] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.44 ms [09:06:53] (03PS1) 10ArielGlenn: add tantalum misc server [operations/dns] - 10https://gerrit.wikimedia.org/r/111740 [09:09:36] (03CR) 10ArielGlenn: [C: 032] add tantalum misc server [operations/dns] - 10https://gerrit.wikimedia.org/r/111740 (owner: 10ArielGlenn) [09:11:49] cp1054 is still in a kmem_alloc funk [09:12:30] fixing [09:12:43] RECOVERY - Varnish traffic logger on cp1054 is OK: PROCS OK: 2 processes with command name varnishncsa [09:12:43] RECOVERY - Varnish HTTP text-backend on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.002 second response time [09:12:44] I'll add a cronjob for compact memory [09:12:47] thanks, also: morning [09:12:49] thanks [09:13:04] sorry if i was too aggressive yesterday or on the lists [09:13:12] ori: is this syntax corrct related to puppet3: <% if @lsbdistrelease >= "12.04" %> [09:13:23] RECOVERY - Varnish HTCP daemon on cp1054 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [09:13:27] (03CR) 10Zhuyifei1999: [C: 031] "Adding Reedy to reviewer" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [09:14:06] it's erb so it's just ruby, so i guess it's doing a vanilla lexical comparison [09:14:22] instead of regrets, I think you should start contributing to that pad, I think you can be really really helpful if you want :) [09:14:46] * matanya is cirous about that pad paravoid mentioned [09:14:48] good morning! 
[09:15:17] ori: the current syntax is <% if scope.function_versioncmp([lsbdistrelease, "12.04"]) >= 0 %> which is not puppet3 friendly
[09:15:36] i wonder if my suggestion will in fact fix it
[09:15:55] hi hashar
[09:15:57] no, probably not, because versioncmp is probably aware of version string semantics
[09:16:16] so what would you suggest to do?
[09:16:43] PROBLEM - NTP on mw31 is CRITICAL: NTP CRITICAL: Offset unknown
[09:16:54] I see this on the docs: http://docs.puppetlabs.com/references/stable/function.html#versioncmp
[09:17:22] well, http://docs.puppetlabs.com/references/latest/function.html#versioncmp
[09:17:22] ori: I would like to add a symlink bin/scap.py to bin/scap so we can get pyflakes/pep8 jobs. What do you think ?
[09:17:23] yeah
[09:17:32] hashar: there is no scap.py
[09:17:36] only scap
[09:17:46] ori: yeah and pyflakes/pep8 can't find bin/scap :/
[09:17:54] ori: or I should use tox :-]
[09:18:47] `file` detects scap as 'scap: a python script text executable'
[09:18:57] presumably based on the shebang
[09:19:19] could you configure jenkins to match linters to files on that basis?
[09:19:45] oh, BTW, today I celebrate passing 100 merged patches by me to WMF code base YAY! :)
[09:20:02] matanya: congrats!
[09:20:37] nice!
[09:20:43] RECOVERY - NTP on mw31 is OK: NTP OK: Offset 0.001310110092 secs
[09:21:12] matanya: well done! thank you for all the patches!
[09:21:21] http://docs.puppetlabs.com/guides/templating.html#using-functions-within-templates
[09:21:27] doesn't seem like anything changed between puppet 2 and 3
[09:21:28] ori: yeah that could be done using file.
[09:21:54] ori: I will just get tox instead, this way we can let people tweak their pep8/pyflakes / unit tests however they want
[09:22:04] ori: Dynamic lookup of $lsbdistrelease at /etc/puppet/modules/ganglia_new/templates/gmond.conf.erb:60 is deprecated. Support will be removed in Puppet 2.8. Use a fully-qualified variable name (e.g., $classname::variable) or parameterized classes.
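The difference raised above between a plain string comparison and a version-aware one can be shown outside Puppet. The following is a rough Python sketch, not Puppet's `versioncmp` implementation; `version_key` is a hypothetical helper that only handles dotted numeric versions such as "12.04".

```python
def version_key(version):
    """Split a dotted numeric version such as '12.04' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

# Lexical comparison works character by character, and '9' sorts after '1',
# so the older release wrongly compares as newer.
lexical = "9.10" >= "12.04"

# Component-wise comparison treats 9 and 12 as numbers.
numeric = version_key("9.10") >= version_key("12.04")

print(lexical)  # True
print(numeric)  # False
```

A lexical check happens to give the right answer while every release being compared has the same number of digits, which is why this kind of mistake can go unnoticed for a long time.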
[09:22:15] ori: I can get it installed via pip https://gerrit.wikimedia.org/r/#/c/111536/1/modules/contint/manifests/packages/labs.pp,unified :D
[09:22:46] I first need to get Faidon to scream at making puppet to use pip as a package provider
[09:22:53] aaaaaaaaaa
[09:22:56] :P
[09:23:03] and thank you all for encourge and help
[09:23:21] matanya: right, so ::lsbdistrelease
[09:23:33] yeah, ori, it is in erb
[09:24:28] scope.lookupvar('::lsbdistrelease')
[09:24:45] or just compute it in the manifest and assign it to a variable
[09:24:56] better way i guess
[09:24:59] mark, I'm running a command that needs the 'EXTERNAL_INTERFACE_GATEWAY' and 'EXTERNAL_INTERFACE_CIDR'. The external interface is set to 10.64.22.11, so I'm guessing that the CIDR is 10.64.22.11/24? But I don't know what to use as the gateway.
[09:25:09] it should be available as @lsbdistrelease because it's a fact iirc
[09:25:21] the gateway is 10.64.22.1
[09:25:26] yeah, that is what i thought a first too
[09:25:28] so this should work: <% if scope.function_versioncmp([@lsbdistrelease, "12.04"]) >= 0 %>
[09:25:39] and that CIDR sounds correct yes
[09:26:02] yeah, so my code above was close to correct :)
[09:26:20] well, you were solving the wrong problem
[09:26:30] means?
[09:26:43] eliminating the scope.function_versioncmp when it wasn't the issue
[09:26:51] oh, yeah
[09:27:26] thanks for the directions
[09:28:02] thanks for the patches
[09:28:40] (PS1) Matanya: ganglia: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - https://gerrit.wikimedia.org/r/111743
[09:29:21] mark, it would also be handy if you could allocate a smallish pool of public IPs for me to use in eqiad. (Eventually I'll recapture those from tampa but right now there isn't a good continuous range for me to swipe.)
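The interface address, prefix length and gateway quoted above (10.64.22.11/24 and 10.64.22.1) can be sanity-checked with Python's standard `ipaddress` module. This is only an illustrative sketch using the addresses from the conversation; it is not part of the command being run.

```python
import ipaddress

# Interface address and prefix length as guessed in the conversation.
iface = ipaddress.ip_interface("10.64.22.11/24")
network = iface.network

# Gateway as given in the reply.
gateway = ipaddress.ip_address("10.64.22.1")

# hosts() yields the usable addresses, leaving out the network
# address (.0) and the broadcast address (.255).
hosts = list(network.hosts())

print(network)                    # 10.64.22.0/24
print(gateway in network)         # True
print(hosts[0], hosts[-1])        # 10.64.22.1 10.64.22.254
print(network.broadcast_address)  # 10.64.22.255
```

In a /24 the last address is the broadcast address, so the usable host range ends at .254.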
[09:29:31] (and if the two ranges overlap then hilarity will ensue)
[09:29:38] there is one
[09:29:50] you have used one ip for it already for NAT
[09:29:52] let me look it up
[09:30:11] ; 208.80.155.128/25 Eqiad Labs virtualization subnet
[09:30:12] all yours
[09:30:32] (PS1) ArielGlenn: add tantalum to dhcp, netboot [operations/puppet] - https://gerrit.wikimedia.org/r/111744
[09:30:47] sweet
[09:32:05] (CR) ArielGlenn: [C: 2] add tantalum to dhcp, netboot [operations/puppet] - https://gerrit.wikimedia.org/r/111744 (owner: ArielGlenn)
[09:33:36] grr, this must mean something new when it says 'floating ip' because it's complainign that my ip range isn't on 10.64.22.0/24 which is obviously not useful for floating ips...
[09:34:04] way to constantly redefine your terminology, openstack!
[09:34:21] (PS1) Matanya: ldap: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - https://gerrit.wikimedia.org/r/111745
[09:36:46] nice
[09:38:37] Um… am I miscounting my bits? {u'message': u"The allocation pool {u'start': u'10.64.22.14', u'end': u'10.64.22.255'} spans beyond the subnet cidr 10.64.22.0/24."
[09:38:57] (PS1) Ryan Lane: Temporarily disable multi-master salt [operations/puppet] - https://gerrit.wikimedia.org/r/111746
[09:39:01] mark, paravoid: ^^
[09:39:02] no that's correct
[09:39:04] mind a review?
[09:39:17] specifically the template
[09:39:18] hopefully paravoid can review? i'm finally gonna get breakfast now ;)
[09:39:21] or I can review later
[09:39:26] mark, the error is correct? Or my range is correct?
[09:39:33] andrewbogott: the range looks correct
[09:39:41] 10.64.22.0/24 is 10.64.22.0 to 10.64.22.255
[09:39:52] So neutron can't count then.
[09:39:56] * andrewbogott curses
[09:39:59] heh
[09:40:10] ok, bbl :)
[09:40:29] oh, it things 254 is fine.
[09:40:33] Off-by-one ftw
[09:40:36] I guess that change can wait till tomorrow. I need to sleep :)
[09:40:52] revieing
[09:40:56] reviewing
[09:42:10] mostly I wanted to make sure adding the -'s did what I intended
[09:43:22] (CR) Faidon Liambotis: [C: 2] Temporarily disable multi-master salt [operations/puppet] - https://gerrit.wikimedia.org/r/111746 (owner: Ryan Lane)
[09:43:53] thanks
[09:44:02] are you deploying?
[09:45:12] looks like you did :)
[09:45:27] yeah :)
[09:45:32] I'm testing on tin
[09:45:56] have I mentioned how much I really, really love hiera?
[09:46:24] it's not too often you'll hear me praise a puppet feature :)
[09:48:56] cool, that change worked
[09:55:18] (PS1) Ryan Lane: Add an eventual consistency call for deploy.deployment_server_init [operations/puppet] - https://gerrit.wikimedia.org/r/111749
[09:56:43] eventual consistency?
[09:57:15] yes. when the puppet master updates the pillars, it also calls deploy.deployment_server_init on tin
[09:57:29] (CR) Ori.livneh: "no unless / onlyif?" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/111749 (owner: Ryan Lane)
[09:57:32] if tin doesn't receive that call for some reason, puppet should ensure it occurs
[09:57:41] no unless or onlyif
[09:57:43] there's no need
[09:57:48] it should run every single time
[09:57:58] the function itself ensures a state
[09:58:10] as mentioned in the commit message, it's safe to run every time
[09:58:41] it also returns in roughly 1-2 seconds if no changes need to be made (like a clone)
[09:59:27] unless / onlyif has very little to do with safety
[09:59:32] (CR) Ryan Lane: "As mentioned in the commit message, this should be run on every puppet run. It's safe to do so and the command returns in 1-2 seconds if n" [operations/puppet] - https://gerrit.wikimedia.org/r/111749 (owner: Ryan Lane)
[09:59:45] what would be the purpose here?
[10:00:10] the point of this is to ensure consistency and the function call is what does it
[10:00:37] if it isn't called every time, then it can't ensure consistency
[10:02:37] puppet is declarative, not procedural. it checks the state of the system and modifies it as needed to match a declared state.
[10:02:51] when you call it every time, you are essentially conceding that there is a bit of state that is a black box to puppet
[10:03:06] that it must attempt to modify every time
[10:03:26] this is state that's managed outside of puppet
[10:03:43] right, and that's bad
[10:03:55] why?
[10:04:23] first of all, because you're managing it from puppet
[10:04:30] absolutely not
[10:04:35] puppet is the eventual consistency system
[10:04:36] what is that patch then?
[10:04:46] salt is the immediate consistency system
[10:05:01] you back up an immediate consistency system with an eventual consistency syste
[10:05:37] that's a fancy way of saying that there's a bug somewhere you don't care to chase down and will fix by just having a script run until it sticks
[10:05:37] salt is ensuring an immediate state for deployment, and puppet calls it locally in case it fails to be called from the master
[10:05:51] why is it failing to be called from the master?
[10:05:53] no. you *must* assume that an immediate consistency system will fail
[10:06:11] and you *must* back that up with something that causes the state to eventually become consistent
[10:06:19] ori: I already chased down the bug and fixed it
[10:06:32] ori: https://gerrit.wikimedia.org/r/#/c/111746/
[10:06:35] i see, so when i rm ~/oldfiles, i should also add a cron job that rms it just in case?
[10:07:06] if you run salt '*' cmd.run 'rm ~/oldfiles' you should definitely have a state that also ensures it's gone
[10:07:35] a remote execution system is there to speed up the process
[10:07:36] hashar_: I suppose https://gerrit.wikimedia.org/r/#/c/111536/ is not the first pip installed package on CI right ?
[10:07:40] it's not meant to be failsafe
[10:07:56] i......ok.
[10:07:59] if you ever rely on it being so, you've made a mistake
[10:08:25] akosiaris: it is
[10:09:02] if it's in scope for puppet to manage it
[10:09:06] it's in scope for puppet to know its state
[10:09:15] and thus in scope to run it as needed, based on local state
[10:09:17] that kind of sucks Ryan
[10:09:17] akosiaris: I could use tox 1.6+ but I could not manage to backport the debian package from another ubuntu version. There is a bunch of dependencies that have changed such as python/ some new virtual packages and a bunch of other packages that are not in Precise
[10:09:22] paravoid: why?
[10:09:24] and not generate log churn that makes you think puppet had to modify the system over and over
[10:09:36] does this mean that we have to have puppet code that clones mediawiki in case salt failed to do it? :)
[10:09:43] akosiaris: so I though the pragmatic approach would be to use pip since tox would only be used on labs (I have added a fail() call whenever $::realm is 'production'
[10:09:48] no. you'd have puppet call the salt module
[10:09:57] though/thought
[10:09:58] just like we have things that call scap
[10:10:00] in 0-30 minutes? :)
[10:10:06] paravoid: yes
[10:10:07] akosiaris: in production I have packaged all the python modules I needed (for Zuul)
[10:10:19] that sucks
[10:10:28] you can never assume that a remote execution system is going to be 100% reliable
[10:10:33] hashar_: ok.. that makes me feel a lot better
[10:10:45] I can assume it will be 100% reliable if it doesn't throw any errors
[10:10:48] you should take the output of the system, check the failures and base an action on it
[10:10:56] but you're not checking the failure
[10:10:56] I suppose the idea is that at some point we move to trusty and this is no longer needed right ?
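Both sides of the argument above lean on the same property: an operation that checks the current state, acts only when needed, and reports whether it changed anything. The toy Python sketch below illustrates that idea with the "oldfiles" example from the discussion. `ensure_absent` is a hypothetical helper written for this illustration; it is not a Salt or Puppet API and says nothing about how deploy.deployment_server_init works.

```python
import os
import tempfile

def ensure_absent(path):
    """Remove `path` if it exists; return True only when something changed."""
    if not os.path.exists(path):
        return False  # already in the desired state, nothing to do
    os.remove(path)
    return True

# Set up a throwaway file standing in for ~/oldfiles.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "oldfiles")
open(target, "w").close()

first = ensure_absent(target)   # the immediate action
second = ensure_absent(target)  # a later periodic re-check
print(first, second)  # True False
```

Because the second call reports no change, a periodic run of such a function can back up an immediate action without producing the misleading "modified again" log entries that were objected to.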
[10:11:00] you're running this unconditionally
[10:11:12] then you should have a mechanism to ensure it's eventually consistent
[10:11:16] cause trusty *might* have the version you *now* want
[10:11:18] :P
[10:11:20]