[00:02:07] Aha [00:02:16] I need to make sure I write this down somewhere ;) [00:02:27] 1095 /build/buildd/php5-5.4.9/ext/exif/exif.c: No such file or directory. [00:02:32] ignore that [00:02:51] So it's exif_read_data() [00:03:54] It's there in 5.4.9 locally and the clusters PHP 5.3.10-1ubuntu3.6+wmf1 [00:05:43] * bd808 runs in fear from (void *)[1] and it's implications [00:07:15] Interesting, no 5.3.10 branch on github for php source [00:08:04] http://archive.ubuntu.com/ubuntu/pool/main/p/php5/php5_5.3.10.orig.tar.gz [00:08:24] http://archive.ubuntu.com/ubuntu/pool/main/p/php5/php5_5.3.10-1ubuntu3.8.diff.gz may have patches that change those line too [00:09:23] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [00:11:23] PROBLEM - Puppet freshness on bast4001 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:23] PROBLEM - Puppet freshness on cp4002 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:23] PROBLEM - Puppet freshness on cp4003 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:23] PROBLEM - Puppet freshness on cp4004 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:23] PROBLEM - Puppet freshness on cp4006 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:23] PROBLEM - Puppet freshness on cp4007 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:24] PROBLEM - Puppet freshness on cp4008 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:24] PROBLEM - Puppet freshness on cp4009 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:25] PROBLEM - Puppet freshness on cp4010 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:25] PROBLEM - Puppet freshness on cp4011 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:26] PROBLEM - Puppet freshness on cp4012 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:26] PROBLEM - Puppet freshness on cp4013 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:27] PROBLEM - Puppet freshness on cp4016 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:27] PROBLEM - Puppet freshness on cp4018 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:28] PROBLEM - Puppet freshness on cp4020 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:28] PROBLEM - Puppet freshness on lvs4002 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:28] PROBLEM - Puppet freshness on lvs4003 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:29] PROBLEM - Puppet freshness on lvs4004 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:58] Reedy: do you have an image I can also test with? [00:12:30] yeah [00:12:41] https://noc.wikimedia.org/~reedy/segfault.tar.gz [00:13:58] large :) [00:14:03] hotel wifi, heh [00:14:21] Reedy: If you haven't figured this out already, you can get the full source tree via `apt-get source php5` [00:14:38] The php exif ext module code looks to be very similar between the 2 versions [00:14:42] line 1095 is identical [00:15:56] Line 1095 has moved to 1085 in master, but still the same [00:16:02] You need to back up the stack to see what's feeding garbage to that cast helper. [00:16:15] https://bugzilla.wikimedia.org/show_bug.cgi?id=55541 [00:18:43] then, what do you run? importImages? 
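A minimal standalone sketch of the kind of test case being discussed above for the exif_read_data() crash, assuming one of the TIFFs from segfault.tar.gz has been extracted locally (the path below is a placeholder, and the call shape follows the `exif_read_data( $this->file, 0, true );` line bd808 quotes later from includes/media/Exif.php):

```php
<?php
// Minimal sketch of a standalone repro for the exif_read_data() segfault
// discussed above. The path is a placeholder for one of the TIFFs extracted
// from segfault.tar.gz; nothing MediaWiki-specific is needed.
$file = '/tmp/uploads/example.tif';

if ( !extension_loaded( 'exif' ) ) {
	die( "exif extension not loaded\n" );
}
if ( !is_readable( $file ) ) {
	die( "cannot read $file\n" );
}

echo 'PHP ', PHP_VERSION, "\n";

// Roughly the same call as MediaWiki's includes/media/Exif.php:
// all sections, returned as arrays.
$data = exif_read_data( $file, 0, true );

// Reaching this line means the interpreter survived the call.
var_dump( $data === false ? false : array_keys( $data ) );
```

If the crash reproduces outside MediaWiki, running the script under gdb (`gdb --args php test-exif.php`) should show the same php_ifd_get16u frame without going through importImages.php at all.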
[00:18:44] php_ifd_get16u (value=0xfffffffffa3fb318, motorola_intel=0) at [00:19:04] php maintenance/importImages.php --comment-ext=txt --user=Reedy /tmp/uploads --overwrite [00:19:17] extract the tar.gz [00:20:15] in prod? [00:20:46] in prod? [00:20:55] in prod? [00:20:57] are you running this in production? [00:20:58] :) [00:21:07] on terbium [00:21:08] sudo -u apache mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=LSHuploadBot /media/external/keepers/tmp [00:23:07] easier to do it locally with root and can grab files from the interwebs at large [00:24:13] not for me :P [00:24:29] I don't have a working mediawiki locally [00:24:31] apt-get install mediawiki [00:28:33] I wonder if you can narrow it down and call php_ifd_get16u [00:30:08] it's not php_ifd_get16u that is buggy [00:30:14] it's what it's being passed to it [00:30:28] sure [00:30:33] I was meaning a test case [00:30:42] rather than going via mediawiki, php, hell and back [00:30:56] (gdb) zbacktrace [00:30:57] [0xf7edbfb0] exif_read_data() /usr/local/apache/common-local/php-1.22wmf20/includes/media/Exif.php:302 [00:30:59] [0xf7edba88] __construct() /usr/local/apache/common-local/php-1.22wmf20/includes/media/BitmapMetadataHandler.php:268 [00:31:02] [0xf7eda120] Tiff() /usr/local/apache/common-local/php-1.22wmf20/extensions/PagedTiffHandler/PagedTiffHandler.image.php:174 [00:31:09] so, yeah, you can just isolate stuff from exif.php [00:31:53] Interesting, I don't see PagedTiffHandler on mine [00:32:03] that's terbium [00:32:15] yeah [00:32:23] I thought I had it installed, apparently not [00:33:50] the rest of the stack seems very wrong too [00:34:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [00:34:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 12.255 second response time [00:49:22] Reedy: I can't recreate the crash in my vagrant vm. importImages brings it in just fine. :( [00:50:04] $ php -v [00:50:05] PHP 5.3.10-1ubuntu3.7 with Suhosin-Patch (cli) (built: Jul 15 2013 18:05:44) [00:51:19] yay, computers [00:52:09] Somewhat confused why it works fine under valgrind locally too [00:54:03] Pointer cleanliness in php extensions is a black art. There are ton of heisenbugs that disappear under valgrind/gdb. [00:54:29] I've chased a lot of them over the years with very few sucesses. [00:55:24] I even accidentally became the maintainer of a pecl extension because of one. :) [00:55:45] * bd808 should find someone to pawn that off on. [00:58:39] At least it's very much not a MediaWiki bug [00:58:48] Though, I guess needs upstreaming [00:59:05] Can we upload > 100 MB attachments to bugs.php.net? :D [01:00:00] If you can't you should file a bug about it [01:01:08] Or just stuff it in a github public repo with a test script to reproduce. [01:02:42] I think `$data = exif_read_data( $this->file, 0, true );` in your minimum test case if I understand the trace Faidon gave. [01:02:59] * bd808 just got called to dinner [01:06:31] As it's currently 2am and I have no desire to start logging PHP bugs... 
I'll make a note of it and deal with it post sleep [01:49:23] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [01:59:53] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error [02:06:54] RECOVERY - Puppetmaster HTTPS on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time [02:18:06] !log LocalisationUpdate completed (1.22wmf19) at Thu Oct 10 02:18:06 UTC 2013 [02:18:23] Logged the message, Master [02:28:22] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 10 02:28:22 UTC 2013 [02:28:35] Logged the message, Master [03:26:14] (03CR) 10Springle: [C: 031] Move mysql_wmf into a module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 (owner: 10Andrew Bogott) [04:08:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [04:08:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 18.957 second response time [04:16:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [04:19:26] !log upgrading db1007 to precise + mariadb [04:19:40] Logged the message, Master [04:19:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 11.823 second response time [04:40:13] (03PS1) 10Springle: db1007 to mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/88922 [04:41:16] (03CR) 10Springle: [C: 032] db1007 to mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/88922 (owner: 10Springle) [04:45:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [04:49:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 23.067 second response time [04:55:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [04:55:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 20.974 second response time [05:02:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [05:03:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.606 second response time [05:08:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [05:09:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 25.778 second response time [05:16:36] !log start xtrabackup clone db1039 to db1007 [05:16:48] Logged the message, Master [05:55:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [05:55:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 20.624 second response time [06:01:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [06:02:25] (03PS1) 10Springle: db1044 install mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/88928 [06:04:10] (03CR) 10Springle: [C: 032] db1044 install mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/88928 (owner: 10Springle) [06:08:43] RECOVERY - Puppet freshness on ms-be8 is OK: puppet ran at Thu Oct 10 06:08:39 UTC 2013 
[06:10:53] mark: (neon) Oct 10 06:06:27 neon puppet-agent[32114]: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass ip_address to Monitor_service_lvs_http[wikibooks-lb.eqiad.wikimedia.org] at /etc/puppet/manifests/lvs.pp:984 on node neon.wikimedia.org [06:12:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 23.400 second response time [06:14:53] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Thu Oct 10 06:14:50 UTC 2013 [06:25:05] (03PS1) 10Springle: db1045 to s5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/88929 [06:29:08] (03CR) 10Springle: [C: 032] db1045 to s5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/88929 (owner: 10Springle) [06:33:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [06:35:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 26.275 second response time [06:48:06] (03PS1) 10Springle: warm up db1039 in s7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88932 [06:48:43] (03CR) 10Springle: [C: 032] warm up db1039 in s7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88932 (owner: 10Springle) [06:50:26] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1039 in s7' [06:50:39] Logged the message, Master [06:59:22] (03PS1) 10Ryan Lane: Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 [06:59:26] ori-l: ^^ [06:59:49] (03CR) 10jenkins-bot: [V: 04-1] Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 (owner: 10Ryan Lane) [06:59:50] that's one step towards being able to split the config of repos into individual parts [07:00:26] ooo i'll review, i'm in a puppet mindset [07:00:53] it's a relatively large change, sorry about that [07:01:10] hard to change the entire config hash and pillar structure without it being large [07:01:39] (03PS1) 10Ori.livneh: Log /proc/diskstats metrics to Ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/88935 [07:03:02] (03PS2) 10Ryan Lane: Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 [07:03:27] (03CR) 10jenkins-bot: [V: 04-1] Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 (owner: 10Ryan Lane) [07:04:18] it looks cleaner, definitely [07:04:50] yep. it's nice to actually be able to work on this when I'm not totally rushed :) [07:06:07] (03PS3) 10Ryan Lane: Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 [07:06:32] (03CR) 10jenkins-bot: [V: 04-1] Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 (owner: 10Ryan Lane) [07:06:35] -_- [07:11:53] rawr. I have no fucking clue what's broken in this file [07:13:37] ah [07:14:13] (03PS4) 10Ryan Lane: Change deploy repo config to repo => config [operations/puppet] - 10https://gerrit.wikimedia.org/r/88934 [07:16:41] jenkins is happy [07:17:15] yep. I'm going to do some testing in the sartoris project [07:26:09] ok, scratch that, i can't review it this late, it's too big for my head :P [07:26:18] i need to read up more about sartoris i think [07:47:15] ori-l: no worries. 
I'm going to do testing in the labs project tomorrow anyway, I may have patchsets to follow [07:58:59] PROBLEM - Puppet freshness on cp4001 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:38] (03PS1) 10ArielGlenn: db1039 (s7) back to normal weight [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88937 [08:14:18] (03PS2) 10ArielGlenn: db1039 (s7) to normal weight [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88937 [08:14:51] (03CR) 10ArielGlenn: [C: 032] db1039 (s7) to normal weight [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88937 (owner: 10ArielGlenn) [08:16:29] !log ariel synchronized wmf-config/db-eqiad.php 'db1039 (s7) to normal weight in pool' [08:16:45] Logged the message, Master [08:18:59] PROBLEM - Puppet freshness on cp4014 is CRITICAL: No successful Puppet run in the last 10 hours [08:19:59] PROBLEM - Puppet freshness on cp4019 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:59] PROBLEM - Puppet freshness on cp4005 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:59] PROBLEM - Puppet freshness on cp4015 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:59] PROBLEM - Puppet freshness on cp4017 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:59] PROBLEM - Puppet freshness on lvs4001 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:30] (03CR) 10Hashar: "pep8 errors in files/ganglia/plugin are ignored. There is a .pep8 there and Jenkins run pep8 on a per directory basis :-]" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88935 (owner: 10Ori.livneh) [08:24:18] (03CR) 10Hashar: "Ori proposed another diskstat plugin in https://gerrit.wikimedia.org/r/#/c/88935/ . So I guess we have to pick one :] Follow up on the ot" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85669 (owner: 10Hashar) [08:25:49] (03PS1) 10ArielGlenn: depool db1024 (s7) for upgrade/conversion to mariadb [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88939 [08:26:37] (03CR) 10ArielGlenn: [C: 032] depool db1024 (s7) for upgrade/conversion to mariadb [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/88939 (owner: 10ArielGlenn) [08:27:51] !log ariel synchronized wmf-config/db-eqiad.php 'depool db1024 (s7) for conversion to mariadb' [08:28:02] Logged the message, Master [08:28:31] (03Abandoned) 10Ori.livneh: Log /proc/diskstats metrics to Ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/88935 (owner: 10Ori.livneh) [08:29:42] (03CR) 10Ori.livneh: "Ah, no, we should go with this patch; I forgot that it existed somehow even though I just looked at it the other day. I abandoned mine, le" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85669 (owner: 10Hashar) [08:36:00] (03CR) 10Hashar: [C: 031] "The reporting system is nice. 
We could probably do something similar for the SNMP trap that is used to monitor whether puppet is running " [operations/puppet] - 10https://gerrit.wikimedia.org/r/88888 (owner: 10Ori.livneh) [09:06:11] (03PS1) 10ArielGlenn: db1024 -> file_per_table, mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/88941 [09:07:23] (03CR) 10ArielGlenn: [C: 032] db1024 -> file_per_table, mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/88941 (owner: 10ArielGlenn) [10:03:09] PROBLEM - Disk space on cp1061 is CRITICAL: DISK CRITICAL - free space: /srv/sdb3 12357 MB (3% inode=99%): [10:05:37] (03PS1) 10ArielGlenn: get rid of last references to search1-2, searchidx1 in dsh groups (decommed) [operations/puppet] - 10https://gerrit.wikimedia.org/r/88950 [10:06:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [10:06:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.646 second response time [10:06:59] (03CR) 10ArielGlenn: [C: 032] get rid of last references to search1-2, searchidx1 in dsh groups (decommed) [operations/puppet] - 10https://gerrit.wikimedia.org/r/88950 (owner: 10ArielGlenn) [10:09:59] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [10:11:59] PROBLEM - Puppet freshness on bast4001 is CRITICAL: No successful Puppet run in the last 10 hours [10:11:59] PROBLEM - Puppet freshness on cp4002 is CRITICAL: No successful Puppet run in the last 10 hours [10:11:59] PROBLEM - Puppet freshness on cp4003 is CRITICAL: No successful Puppet run in the last 10 hours [10:11:59] PROBLEM - Puppet freshness on cp4004 is CRITICAL: No successful Puppet run in the last 10 hours [10:11:59] PROBLEM - Puppet freshness on cp4006 is CRITICAL: No successful Puppet run in the last 10 hours [10:11:59] PROBLEM - Puppet freshness on cp4007 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:00] PROBLEM - Puppet freshness on cp4008 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:00] PROBLEM - Puppet freshness on cp4009 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:00] PROBLEM - Puppet freshness on cp4010 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:01] PROBLEM - Puppet freshness on cp4011 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:01] PROBLEM - Puppet freshness on cp4013 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:02] PROBLEM - Puppet freshness on cp4016 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:02] PROBLEM - Puppet freshness on cp4012 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:03] PROBLEM - Puppet freshness on cp4018 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:03] PROBLEM - Puppet freshness on cp4020 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:04] PROBLEM - Puppet freshness on lvs4002 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:05] PROBLEM - Puppet freshness on lvs4003 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:05] PROBLEM - Puppet freshness on lvs4004 is CRITICAL: No successful Puppet run in the last 10 hours [10:26:44] (03PS1) 10ArielGlenn: remove search1-12 and searchidx1 from dns, decommed (see rt 2897) [operations/dns] - 10https://gerrit.wikimedia.org/r/88954 [10:44:17] (03PS1) 10ArielGlenn: current dhcpd.conf requires linux-host-entries.ttyS1-9600, add one [operations/puppet] - 
10https://gerrit.wikimedia.org/r/88956 [11:24:19] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:25:19] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 1 logical drive(s), 4 physical drive(s) [11:33:06] (03PS1) 10Mark Bergsma: Add ulsfo BGP peer addresses [operations/puppet] - 10https://gerrit.wikimedia.org/r/88959 [11:34:26] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo BGP peer addresses [operations/puppet] - 10https://gerrit.wikimedia.org/r/88959 (owner: 10Mark Bergsma) [11:43:07] mark: what else do we need to do to have ulsfo ready to receive traffic ? [11:43:30] i'm working on pybal BGP peering with the routers [11:43:36] having the other router configured would also be good [11:43:45] then we should test, the ssl part especially [11:43:49] and we should be ready [11:43:58] current plan is to put traffic on it early next week [11:44:45] how can I help ? [11:48:36] you can help testing later :) [11:48:59] ok ... ping me then :-) [11:52:04] !log upgraded php5 packages from php5_5.3.10-1ubuntu3.6+wmf1 to php5_5.3.10-1ubuntu3.8+wmf1 on apt.wikimedia.org [11:52:19] Logged the message, Master [11:53:38] (03PS1) 10Mark Bergsma: Fix order [operations/dns] - 10https://gerrit.wikimedia.org/r/88962 [11:54:05] (03CR) 10Mark Bergsma: [C: 032] Fix order [operations/dns] - 10https://gerrit.wikimedia.org/r/88962 (owner: 10Mark Bergsma) [12:01:56] interesting [12:02:04] cr1-ulsfo dropped off the net as soon as I restarted PyBal ;) [12:03:41] er [12:03:47] and the serial console server is segfaulting :) [12:04:26] !log Rebooting scs-ulsfo, pmshell is segfaulting [12:04:40] Logged the message, Master [12:06:56] didn't fix it [12:15:32] heh ok [13:28:16] (03PS1) 10Dzahn: fix broken links to wikis in stats tables [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/88980 [13:29:35] (03PS2) 10Dzahn: fix broken links to wikis in stats tables [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/88980 [13:30:39] (03CR) 10Dzahn: [C: 032] fix broken links to wikis in stats tables [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/88980 (owner: 10Dzahn) [13:36:54] (03CR) 10Dzahn: "(2 comments)" [operations/dns] - 10https://gerrit.wikimedia.org/r/88954 (owner: 10ArielGlenn) [13:37:41] ugh, no I just can't read [13:40:29] (03PS2) 10ArielGlenn: remove search1-12 and searchidx1 from dns, decommed (see rt 2897) [operations/dns] - 10https://gerrit.wikimedia.org/r/88954 [13:55:46] (03CR) 10Dzahn: [C: 031] remove search1-12 and searchidx1 from dns, decommed (see rt 2897) [operations/dns] - 10https://gerrit.wikimedia.org/r/88954 (owner: 10ArielGlenn) [14:01:11] !g I13082597cd921966a7fae0d5c67ff4359d032dda [14:01:11] https://gerrit.wikimedia.org/r/#q,I13082597cd921966a7fae0d5c67ff4359d032dda,n,z [14:14:40] (03CR) 10Akosiaris: [C: 032] "Good work. I just ran catalog compile tests for all db* (+ various others) hosts and with no errors. We should probably however revisit th" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 (owner: 10Andrew Bogott) [14:24:45] !log Setup cr1-ulsfo:ae0 <--> cr2-ulsfo:ae0 [14:24:51] (03CR) 10Akosiaris: [C: 032] contint: fetch slave scripts on slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/87058 (owner: 10Hashar) [14:25:00] !log Setup OSPF and OSPF3 on cr1-ulsfo:ae0.2 <--> cr2-ulsfo:ae0.2 [14:25:00] Logged the message, Master [14:25:12] Logged the message, Master [14:32:36] akosiaris: hey :-] regarding the jenkins slave scripts being published on slaves. 
I got to use git::clone latest [14:32:59] akosiaris: but will eventually migrate to git-deploy whenever I found out how to use it :-] [14:33:16] akosiaris: i noticed Ryan Lane send a patch to tweak the git-deploy manifest and make it easier to add a new project. [14:33:19] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:34:05] yes he did... but I am not sure git-deploy is supposed to be run automated [14:34:10] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 1 logical drive(s), 4 physical drive(s) [14:34:50] which is what you want here .. right ? [14:35:39] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [14:36:19] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms [14:37:19] (03Abandoned) 10Akosiaris: move check-raid.py from base/files/monitoring/ to nrpe/plugins/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/87538 (owner: 10Dzahn) [14:38:14] !log Configured AS65003 ulsfo BGP confederation on cr1-ulsfo and cr2-ulsfo [14:38:22] !log Setup iBGP between cr1-ulsfo and cr2-ulsfo [14:38:26] Logged the message, Master [14:38:39] Logged the message, Master [14:38:59] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused [14:40:48] akosiaris: looking at spec job [14:40:52] akosiaris: err rspec [14:41:19] no jenkins expert so I assume it is fine [14:41:32] (03CR) 10Yurik: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88261 (owner: 10Dr0ptp4kt) [14:41:59] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.221 second response time [14:42:03] akosiaris: I am a bit concerned by the time it takes for them to run [14:42:28] !jenkins operations-puppet-spec [14:42:45] !jenkins is https://integration.wikimedia.org/ci/job/$1 [14:42:46] Key was added [14:42:49] !jenkins operations-puppet-spec [14:42:50] https://integration.wikimedia.org/ci/job/operations-puppet-spec [14:43:35] i see 2 secs... [14:43:54] yeah need to tweak it [14:43:55] https://integration.wikimedia.org/ci/job/operations-puppet-spec/3/console [14:43:57] grmblbl [14:44:53] huh... what a nice java exception... [14:45:04] yeah the git plugin attempts to rewrite the submodule urls [14:45:40] !bug 42953 [14:45:40] https://bugzilla.wikimedia.org/42953 [14:46:12] ah... yeah i remember that.... [14:46:47] !wikitech is https://wikitech.wikimedia.org/w/index.php?search=$1 [14:46:47] This key already exist - remove it, if you want to change it [14:47:07] akosiaris: yeah we got it by that already [14:47:08] solved with I2dc0ad5fcb51d7720475eae70f91466a78a0fe2e [14:47:18] !wikitech [14:47:18] http://wikitech.wikimedia.org/view/$1 [14:47:22] working now https://integration.wikimedia.org/ci/job/operations-puppet-spec/4/console [14:48:13] !icinga is https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=$1 [14:48:13] Key was added [14:48:18] !icinga carbon [14:48:18] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=carbon [14:50:32] akosiaris: the rspec only takes 25 seconds ! [14:51:17] hashar: yes... because it seems to stop at the first module [14:51:20] stdblib [14:51:25] stdlib* [14:51:40] i never saw the other 3 modules tests being run [14:52:00] and again.. what 25 secs ? i only see 2 secs at the output .... [14:52:18] https://integration.wikimedia.org/ci/job/operations-puppet-spec/4/ [14:52:29] on the top right, it shows the duration of the build [14:53:28] a ok ... 
i had not click on full log [14:54:34] 14:47:05 Invoking tests on module bacula [14:54:34] 14:47:05 rake aborted! [14:54:34] 14:47:05 Don't know how to build task 'spec_standalone' [14:54:47] so... i should add a rake target 'spec_standalone?' [14:54:56] spec_standalone that is [14:55:37] if you look at the rake file at the root of the repo [14:55:41] it finds modules [14:55:45] and system('rake spec_standalone') [14:55:51] I am not sure where that commands come from [14:56:01] require 'puppetlabs_spec_helper/rake_tasks' [14:56:04] maybe here ... [14:56:16] from modules/apache/Rakefile [14:56:32] ah from rspec-puppet when doing rspec-puppet-init I think [14:56:51] the modules Rakefile have: [14:56:55] require 'rubygems' [14:56:56] require 'puppetlabs_spec_helper/rake_tasks' [14:57:08] I guess the last require provide the spec_standalone [14:57:11] yes it is in the rake_tasks.rb [14:57:21] hmmm [14:57:23] (03PS1) 10coren: Tool Labs: puppetize the new webnode type instance [operations/puppet] - 10https://gerrit.wikimedia.org/r/88996 [14:57:32] akosiaris: http://paste.openstack.org/show/48217/ [14:57:34] for stdlib [14:58:01] I can't remember the exact details, got hacked up with andrew during the summer [14:58:14] andrewbogott: puppet rspec talking [14:58:36] that is a different target from what rspec-puppet-init creates... [14:58:39] I think that's all on hold pending a proper jail to run the tests in [14:58:58] But, I will read the backscroll in a minute [14:59:34] yeah jailing [14:59:40] but we could run them for trusted users [15:00:59] !log Corrected wrong interface addresses on cr1-ulsfo:ae0 sub-units [15:01:12] Logged the message, Master [15:02:21] (03PS2) 10coren: Tool Labs: puppetize the new webnode type instance [operations/puppet] - 10https://gerrit.wikimedia.org/r/88996 [15:02:51] ok, I've read the backscroll but still don't know what we're discussing :) [15:03:24] (03CR) 10coren: [C: 032] Tool Labs: puppetize the new webnode type instance [operations/puppet] - 10https://gerrit.wikimedia.org/r/88996 (owner: 10coren) [15:04:11] we were thinking about adding rspec tests... so i asked what remains to be done [15:04:23] one is the jailing... hashar solved some other issues [15:04:42] with git submodules and now i am looking at the rake targets [15:04:47] !log removed mysql-server-5.5 package from stat1 to ensure that it doesn't get used now that it's no longer maintained by puppet. [15:04:58] Logged the message, Master [15:05:06] rspec-puppet-init creates a Rakefile that does not have rspec_standalone [15:05:42] and, who calls rspec-puppet-init? [15:05:51] IIRC we were just doing 'rake spec' in the top-level puppet dir? [15:05:59] yup [15:06:10] a user that populates a module for the very first time [15:06:12] that recurse in submodules and calls 'rake spec_standalone' whenever a rake file is found there [15:06:18] that being me in this case :-) [15:06:28] the modules coming from puppet labs have that task defined [15:06:41] Oh… I see. [15:06:53] they apparently use something else that rspec-puppet [15:07:06] or have puppet labs has its own wrapper on top of it [15:07:14] So maybe that should just be handled with a 'howto' guide to create the rakefile by hand? [15:07:26] I guess so [15:07:35] I am not sure whether we have any documentation written yet though :( [15:07:44] I am pretty sure I haven't written any [15:07:48] I'm sure we don't! 
[15:08:00] good [15:08:01] (03PS1) 10Odder: (bug 54828) Enable FlaggedRevs for Portuguese Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89001 [15:08:06] that is part of our culture (see bug 1) [15:08:15] oh [15:08:16] no [15:08:17] wait [15:08:18] https://wikitech.wikimedia.org/wiki/Puppet_coding#Rake_tests [15:08:20] \O/ [15:08:46] Hey, nonzero documentation! [15:08:48] the dream is alive [15:09:13] * hashar blames andrewbogott https://wikitech.wikimedia.org/w/index.php?title=Puppet_coding&diff=77308&oldid=76155 [15:09:40] * andrewbogott wouldn't call himself a hero [15:09:42] we will have to split that [[Puppet coding]] in smaller part one day [15:10:39] andrewbogott: I disagree. Every single line of code earn you a "I wrote doc" badge, and after enough badges you will be consider a hero even against your will [15:11:16] deploying rspec triggering [15:12:07] !log jenkins : triggering operations-puppet-rspec (non voting) on operations/puppet.git {{gerrit|89000}} [15:12:16] Logged the message, Master [15:12:51] (03PS1) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 [15:13:56] Coren, you're using lighttpd in tool-labs? Because I have an item on my todo list about purging all lighttpd use from puppet :( [15:14:22] hashar: So, does that mean tests will be run on commit now? [15:14:33] Or only when requested by hand? Or...? [15:15:14] (03PS2) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 [15:15:17] (03PS1) 10coren: Tool Labs: fix race condition on webnodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/89005 [15:15:23] andrewbogott: it will be run for anyone whitelisted. [15:15:24] (03PS1) 10Andrew Bogott: Fix mysql module so it can work w/out mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/89006 [15:15:33] oh, ok. That seems good. [15:15:34] andrewbogott: which is the usual volunteers, wmde and wikimedia folks. [15:15:34] andrewbogott: I'm not using any classes for it though. [15:15:47] andrewbogott: that will vote +2 regardless of spec result [15:16:03] andrewbogott: BUT untrusted volunteers will not have spec run for them. So Jenkins will only vote +1. [15:16:07] andrewbogott: will mail ops list about it [15:16:24] https://integration.wikimedia.org/ci/job/operations-puppet-spec/5/console : FAILURE in 26s (non-voting) [15:16:26] Coren, can you tell me more? Or should I just read my damn email? [15:16:28] that is faster than the validate job [15:16:47] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [15:17:02] andrewbogott: It's discussed on labs-l. Basically, I'm using lighttpd so that every tool has its own webserver; apache and ngnix are way too heavy for that. [15:17:05] mutante question about rss extension [15:17:08] (03CR) 10Andrew Bogott: "Yes, OK, maybe this module is slated for removal, but in the meantime I need it to work AT ALL so I can figure out what it does." [operations/puppet] - 10https://gerrit.wikimedia.org/r/89006 (owner: 10Andrew Bogott) [15:17:20] (03CR) 10coren: [C: 032] Tool Labs: fix race condition on webnodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/89005 (owner: 10coren) [15:17:24] is it possible to whitelist the domain but not the exact path? [15:17:56] andrewbogott: Why the seek-and-destroy? [15:18:23] drdee: i don't know yet, didn't implement the whitelist part.. 
which wiki are you on [15:18:43] Coren: OK, so all you're doing is installing the debian package? [15:19:03] I don't have a dog in the fight, but mostly whenever I mentioned lighttpd I get a strong "Don't use that, use nginx" from the room. [15:19:08] andrewbogott: Yep. [15:19:47] Coren: Might be worth running an email past the ops to see if anyone has a legit argument against it. [15:20:02] (03CR) 10Andrew Bogott: [C: 032] Fix mysql module so it can work w/out mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/89006 (owner: 10Andrew Bogott) [15:20:33] Coren, if you're not relying on lighttpd base classes though, then I really don't care at all :) [15:21:05] I do not. It's really just a matter of installing the deb, and a script to generate per-tool config file. :-) [15:21:36] e.g. https://wikitech.wikimedia.org/wiki/Puppet_Todo#lighttpd [15:23:04] (03Restored) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [15:23:10] (03PS3) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 [15:24:16] yeah message [15:24:17] https://integration.wikimedia.org/ci/job/operations-puppet-spec/8/console : Experimental unit tests, please ignore. in 25s (non-voting) [15:32:29] (03PS1) 10coren: Tool Labs: more fixes to the lighttpd startup [operations/puppet] - 10https://gerrit.wikimedia.org/r/89009 [15:32:58] akosiaris: andrewbogott mailed ops about the rspec job. I need to rush out to get my daughter back home [15:33:08] ok, thanks [15:33:10] :-) [15:33:11] akosiaris: andrewbogott: if there is any issue, I should be back online in a bit more than 3 hours [15:33:16] but that should be fine [15:33:18] ok, thanks [15:33:20] the job is not voting [15:33:29] and sorry to have forgotten to deploy that :( [15:33:38] (03CR) 10coren: [C: 032] Tool Labs: more fixes to the lighttpd startup [operations/puppet] - 10https://gerrit.wikimedia.org/r/89009 (owner: 10coren) [15:36:39] I am off [15:36:43] dad time [15:37:49] j #indonesia [15:46:19] !log Configured cr2-ulsfo:ae1 with all sub interfaces and VRRP [15:46:31] Logged the message, Master [15:46:33] !log Imported firewall ACLs on cr2-ulsfo, activated on lo0.0 [15:46:48] Logged the message, Master [15:49:53] eh, it seems there is a global issue with broken categories in MW [15:50:39] ru.wp " putnik> Everybody see arabic letters instead of russian." .. sv.wp "< Stryn> it's been broken now over 3 hours" [16:02:06] (03PS3) 10Andrew Bogott: Move mysql_wmf into a module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 [16:02:50] (03CR) 10jenkins-bot: [V: 04-1] Move mysql_wmf into a module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 (owner: 10Andrew Bogott) [16:06:12] Does anyone want to hazard a guess about what this is supposed to do? [16:06:19] if $db_cluster =~ /^fundraisingdb$/ { [16:06:19] $mysql_myisam = true [16:06:20] } [16:06:30] I'm guessing that that =~ is meant to be a comparison and not an assignment :) [16:08:29] Hm, actually that's documented as a comparison. So then why does 'parser validate' think it's an assignment? [16:14:21] andrewbogott: it's the $::db_clusters = { .. that it's barfing on, I think [16:14:44] even though 'puppet parser validate' is citing the block below it [16:14:48] You're right -- it was misreporting the line number [16:15:46] And I failed to notice that I had both $db_clusters and $db_cluster in there... 
[16:16:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [16:16:53] that is confusing, but not the issue the parser is reporting [16:16:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.733 second response time [16:17:01] it's that you're reaching out of scope with that assignment [16:17:22] (03PS4) 10Andrew Bogott: Move mysql_wmf into a module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 [16:17:29] Right, when I was linting I added a :: qualifier to db_clusters which is a local variable because I misread it as db_cluster which is a global [16:17:56] ah, makes sense [16:19:03] (03CR) 10Andrew Bogott: "Now linted! You may henceforth address me as "Cap'n Quotemark"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 (owner: 10Andrew Bogott) [16:44:55] (03PS1) 10coren: Tool Labs: fix race condition (for real) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89014 [16:45:37] (03CR) 10coren: [C: 032] Tool Labs: fix race condition (for real) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89014 (owner: 10coren) [17:00:07] Interesting stats: webgrid-01 has 31 web servers running yet: [17:00:07] Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.5%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st [17:00:07] Mem: 16435580k total, 1618028k used, 14817552k free, 90136k buffers [17:00:18] * Coren grins. [17:00:26] *MUCH* more efficient use of resources. [17:02:21] (03PS5) 10Andrew Bogott: Move mysql_wmf into a module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 [17:03:31] !log reedy synchronized php-1.22wmf21 'staging' [17:03:49] Logged the message, Master [17:06:27] !log reedy synchronized docroot and w [17:06:38] Logged the message, Master [17:09:55] (03PS1) 10Reedy: Add docroot stuff for 1.22wmf21 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89018 [17:09:56] (03PS1) 10Reedy: All wikipedias to 1.22wmf20 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89019 [17:09:57] (03PS1) 10Reedy: testwiki, test2wiki, mediawikiwiki, testwikidatawiki and loginwiki to 1.22wmf21 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89020 [17:09:58] (03PS1) 10Reedy: Add phase1 dblist for laziness [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89021 [17:10:13] (03CR) 10Reedy: [C: 032] Add docroot stuff for 1.22wmf21 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89018 (owner: 10Reedy) [17:10:52] (03Merged) 10jenkins-bot: Add docroot stuff for 1.22wmf21 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89018 (owner: 10Reedy) [17:16:07] !log reedy Started syncing Wikimedia installation... : testwiki to 1.22wmf21 and build and sync l10ncache [17:16:17] Logged the message, Master [17:19:40] Reedy: thoughts re: 'static-current' symlink? [17:23:14] andrewbogott: ping ? [17:23:24] Ryan_Lane: is there a way to restart the parsoid instances through salt without pushing out new code? [17:24:14] mutante, what's up? 
[17:24:29] PROBLEM - Apache HTTP on mw1070 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:24:53] andrewbogott: a) made this change https://wikitech.wikimedia.org/w/index.php?title=Puppet&diff=85616&oldid=83886 b) have an issue syncing change in private repo [17:25:18] it says on stafford it's Already up-to-date when i pull [17:25:19] RECOVERY - Apache HTTP on mw1070 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.146 second response time [17:25:20] but it's not [17:25:22] Oh, that's wrong... [17:25:27] dang [17:26:00] the diagram is right but the text is wrong, fixing... [17:26:05] but .. i.. ok [17:26:08] :) [17:26:09] meanwhile, probably you can push your change and then it'll sync on stafford [17:26:19] PROBLEM - DPKG on labstore4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:26:54] ori-l: Needs manually updating [17:27:02] andrewbogott: but the diagram has /root/private and that's gone [17:27:03] in the meantime I'm sure I committed in /root/private when I was trrying to fix up my contacts/icinga issue [17:27:04] like the php symlink [17:27:19] RECOVERY - DPKG on labstore4 is OK: All packages OK [17:27:31] Reedy: right. so I'm basically asking if you would be willing to take that on as part of the new branch checklist. [17:27:40] mutante, it's on sockpuppet, not stafford [17:27:46] ori-l: It's nothing to do with the new branch checklist [17:28:08] It's something to do AFTER all wikis have moved off the old branch (what I do with php when I remember) [17:28:22] Reedy: btw, did you remember about Mobile today? [17:28:44] mutante, ok, I think the text of that page is correct now...? [17:28:48] * andrewbogott wheel-wars with self [17:28:54] remember what? [17:29:27] andrewbogott: ah, yes. the confusion was with /root/puppet which moved [17:29:48] mutante: right… the thing that was once /root/puppet now lives in gerrit. For private we have no such luxury. [17:30:01] andrewbogott: thanks, looked at your edit [17:30:08] yep [17:30:08] um mutante I think that's wrong, the /root/private is actually the real repo and [17:30:22] Sorry that I changed that page to say the exact opposite of what, even then, I knew to be the truth. Can't much explain that. [17:30:25] the post commit hook syncs it up to /var/lib/git/operations/private [17:30:33] apergos: Indeed, I just updated the page to say that. [17:30:36] I think. [17:30:40] ok thanks [17:30:41] Reedy: about how they're goign to ride the train with the new branch this time instead of doing their own deploys [17:30:57] apergos: but I've lost all credibility on this issue so I'd advise that you check my work :) [17:31:03] :-D [17:31:14] I don't look at the pictures, just the text [17:31:26] but I did just read the post commit hook so I'm sure about that [17:31:43] Hopefully the pictures and text agree with each other now [17:31:53] yep looks good (the text) [17:32:11] pics look good too [17:32:29] mutante, sorted now? [17:33:26] if i can just make a new commit and sync it, yea [17:33:57] tries [17:34:54] !log reedy Finished syncing Wikimedia installation... 
: testwiki to 1.22wmf21 and build and sync l10ncache [17:35:08] Logged the message, Master [17:36:34] (03PS1) 10Anomie: Ensure certain perl modules (and git) are installed on exec nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/89023 [17:37:07] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki back to 1.22wmf20 [17:37:19] Logged the message, Master [17:38:05] (03PS1) 10Ori.livneh: Add 'static-current' w/symlinks to 1.22wmf20 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89024 [17:38:09] (03CR) 10coren: [C: 032] "LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/89023 (owner: 10Anomie) [17:38:21] andrewbogott: looks all good, alright [17:42:59] (03CR) 10Reedy: "I guess we should write a script taking a parameter of "version" to delete and recreate all of these more dynamic symlinks" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89024 (owner: 10Ori.livneh) [17:45:22] (03CR) 10Ori.livneh: "Or alternately: http://httpd.apache.org/docs/current/rewrite/rewritemap.html" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89024 (owner: 10Ori.livneh) [17:46:16] (03CR) 10Eranroz: [C: 031] "(Please merge)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/87645 (owner: 10Jforrester) [17:48:52] (03PS3) 10Dzahn: redirect pk.wikimedia.org to meta community page [operations/apache-config] - 10https://gerrit.wikimedia.org/r/86652 [17:50:46] (03CR) 10Chad: "I added git to the base packages a long time ago. Do those not get applied here?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/89023 (owner: 10Anomie) [17:55:26] Hm… on my todo list I have Peter as in charge of modularizing puppetmaster.pp. Did someone else officially inherit that task? [17:56:58] andrewbogott: could you review https://gerrit.wikimedia.org/r/#/c/88888/ possibly? [17:57:18] * andrewbogott reads [17:59:42] woo, pluginsyn used for something [17:59:59] PROBLEM - Puppet freshness on cp4001 is CRITICAL: No successful Puppet run in the last 10 hours [18:02:47] ori-l: Are you confident that ::puppet_config_dir is defined on our hosts? Seems reasonable but it doesn't look like we've used it before [18:03:05] Oh, nm, you have a fact that defines it! [18:03:58] Ryan_Lane: NFS switch in progress. [18:04:11] Man, there are a LOT more projects now using NFS than I thought! [18:04:21] heh [18:06:08] * Coren watches in despair as even a 'catch up' rsync doesn't seem like it's about to finish any time soon. [18:07:11] ori-l, this looks good, but do you mind talking me through it a bit? I take it that turning on 'reports=true' causes anything in lib/puppets/reports to be executed at the end of each run? [18:08:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [18:08:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.980 second response time [18:10:27] andrewbogott: 'report=true' turns on reporting generally, and 'reports' specifies the report handler to use [18:10:50] andrewbogott: in statsdb.rb, line 7: Puppet::Reports.register_report(:statsd) do [18:10:59] Ryan_Lane: i have added he.wiki to the HTTPS stuff for annons. Thanks for the efforts [18:11:04] this registers the code below it as the 'statsd' reporter [18:11:12] matanya: saw :) thanks [18:11:49] ori-l: So when does that code get executed to perform the registration? Does that happen during the facts phase? 
[18:12:01] ori-l: Also, how/when is self.metrics populated? [18:13:08] andrewbogott: re: self.metrics, that's part of what puppet provides reporters [18:13:31] the code to execute the registration happens during initial bootstrapping, yes. i haven't seen a race condition if that's what you mean (where the referenced reporter doesn't exist yet) [18:13:34] Is it weird that it's part of self. rather than passed in as an arg? [18:13:48] It's done by inheritance somehow? [18:14:07] that's how james turnbull does it in all the example reporters cited in the official docs [18:14:28] Ah, I'm sure that it works, since you've tested :) I'm just curious. [18:14:35] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.22wmf20 [18:15:16] ori-l, do you want this merged right away, or do you want me to do it when you have a block of time to watch it roll out? [18:15:29] block o' time! [18:15:43] is "i want you to have a block of time right away" an option? :D [18:15:53] just kidding, yes, it's not urgent [18:16:19] I can do it today, ping me when you get back from lunch. [18:16:33] * andrewbogott just reached a stopping point anyway [18:16:40] ok, will do [18:19:59] PROBLEM - Puppet freshness on cp4014 is CRITICAL: No successful Puppet run in the last 10 hours [18:20:59] PROBLEM - Puppet freshness on cp4019 is CRITICAL: No successful Puppet run in the last 10 hours [18:21:59] PROBLEM - Puppet freshness on cp4005 is CRITICAL: No successful Puppet run in the last 10 hours [18:21:59] PROBLEM - Puppet freshness on cp4015 is CRITICAL: No successful Puppet run in the last 10 hours [18:23:59] PROBLEM - Puppet freshness on cp4017 is CRITICAL: No successful Puppet run in the last 10 hours [18:23:59] PROBLEM - Puppet freshness on lvs4001 is CRITICAL: No successful Puppet run in the last 10 hours [18:30:03] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: phase1 wikis to 1.22wmf21 [18:31:19] ick [18:32:24] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: phase1 wikis back to 1.22wmf20 [18:32:41] lols [18:32:47] on https://www.mediawiki.org/wiki/Annoying_little_bugs , i got: "This version of MobileFrontend requires MediaWiki 1.22, you have 1.22wmf21. You can download a more appropriate version from https://www.mediawiki.org/wiki/Special:ExtensionDistributor/MobileFrontend" [18:32:49] known? [18:33:22] Already reverted [18:34:00] ah thanks [18:35:52] <_david_> ^d, Thanks! [18:37:57] sorry Reedy :( [18:41:21] !log reedy synchronized php-1.22wmf21/extensions/ [18:42:52] ugh, sad [18:43:08] of course the mobile riding the train wouldn't go smoothly the first time, oh well, kinks to be worked out [18:47:19] !log reedy synchronized php-1.22wmf20/extensions/MobileFrontend [18:47:56] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: phase1 wikis back to 1.22wmf21 [18:50:36] !log reedy synchronized wmf-config/ [18:51:41] enwiki is always the last to be updated, right? [18:51:48] sort of [18:52:23] is there a canonical Last Wiki? [18:52:37] !log reedy synchronized php-1.22wmf21/extensions/MobileFrontend/MobileFrontend.php [18:53:04] Nope [18:53:21] Reedy: looksl ike that worked [18:53:22] All wikipedias are done at the same time (well, bar closed oneS) [18:54:36] ori-l: Don't bother [18:54:42] Just use /usr/local/apache/common-local/php/resources [18:54:43] etc [18:55:01] 1 place to update [18:55:25] duh, yes, that's better. 
[18:58:01] !log reedy synchronized wmf-config/ [18:59:50] * aude sighs [19:00:01] Reedy: if you can update localisation, that would be nice [19:00:11] * aude sees <wikibase-sitelinks-sitename-columnheading-special> on test wikidata [19:00:20] not urgent though [19:02:53] Reedy: https://gerrit.wikimedia.org/r/#/c/89024/ ? [19:25:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [19:25:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 20.147 second response time [19:35:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [19:35:14] !log LocalisationUpdate completed (1.22wmf20) at Thu Oct 10 19:35:14 UTC 2013 [19:36:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 29.533 second response time [19:46:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [19:48:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 14.479 second response time [19:51:56] ah manybubbles, i have a 1:1 with ken now [19:52:00] looks like i've been double booked [19:52:04] or, in 10 minutes [19:52:34] ottomata: k. I don't have much other than trying to get that rt ticket moving in some way [19:52:54] k [19:58:33] ottomata: we can move if need be [20:04:05] ack [20:04:09] hey it hink its ok [20:07:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [20:07:11] !log LocalisationUpdate completed (1.22wmf21) at Thu Oct 10 20:07:11 UTC 2013 [20:07:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 17.826 second response time [20:10:59] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] PROBLEM - Puppet freshness on bast4001 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] PROBLEM - Puppet freshness on cp4002 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] PROBLEM - Puppet freshness on cp4003 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] PROBLEM - Puppet freshness on cp4004 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] PROBLEM - Puppet freshness on cp4006 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] PROBLEM - Puppet freshness on cp4007 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:00] PROBLEM - Puppet freshness on cp4008 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:01] PROBLEM - Puppet freshness on cp4009 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:01] PROBLEM - Puppet freshness on cp4010 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:01] PROBLEM - Puppet freshness on cp4011 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:01] PROBLEM - Puppet freshness on cp4012 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:02] PROBLEM - Puppet freshness on cp4013 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:03] PROBLEM - Puppet freshness on cp4016 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:03] PROBLEM - Puppet freshness on cp4018 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:04] PROBLEM - Puppet freshness on cp4020 is CRITICAL: No successful Puppet run in the last 10 hours 
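The MobileFrontend error above ("requires MediaWiki 1.22, you have 1.22wmf21") looks like a version-string comparison tripping over the wmf branch suffix. A quick, non-authoritative way to see how PHP orders those two strings — the '1.22' minimum is taken from the error text, not from MobileFrontend's actual check, which is not shown in the log — is a one-off script like this:

```php
<?php
// See how version_compare() orders a wmf branch version against a plain
// release number. '1.22' comes from the error message above; the exact
// constraint MobileFrontend tests against is an assumption here.
$have = '1.22wmf21';
$need = '1.22';

var_dump( version_compare( $have, $need ) );        // -1, 0 or 1
var_dump( version_compare( $have, $need, '>=' ) );  // the condition the error implies is failing
```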
[20:13:04] PROBLEM - Puppet freshness on lvs4002 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:04] PROBLEM - Puppet freshness on lvs4003 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:05] PROBLEM - Puppet freshness on lvs4004 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:59] PROBLEM - MySQL Replication Heartbeat on db45 is CRITICAL: CRIT replication delay 327 seconds [20:15:19] PROBLEM - MySQL Slave Delay on db45 is CRITICAL: CRIT replication delay 346 seconds [20:16:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [20:16:37] andrewbogott: I'm around, whenever you want to go for it [20:16:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 25.269 second response time [20:17:51] ori-l, OK, I'll merge right now. [20:17:59] tungsten is already up and listening? [20:18:14] yep [20:18:49] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 10 20:18:49 UTC 2013 [20:25:33] andrewbogott: did you merge it? [20:25:55] I did. There's an order-of-operations thing which may or may not be an issue, watching that now... [20:27:25] manybubbles: when searching for 'key metrics' on mediawiki i get "[f3c100ca] 2013-10-10 20:26:56: Fatal exception of type MWException" [20:27:39] drdee_: well that isn't good [20:27:49] but searching 'foo' is fine [20:27:59] manybubbles / drdee_: 2013-10-10 20:26:56 mw1076 mediawikiwiki: [f3c100ca] /w/index.php?search=key+metrics&button=&title=Special%3ASearch Exception from line 187 of /usr/local/apache/common-local/php-1.22wmf21/extensions/Translate/TranslateHooks.php: A reached the parser. This should not happen [20:28:21] thanks ori-l [20:28:27] you are blazing fast [20:28:53] ori-l: does that come with a stacktrace of some sort? [20:30:49] yes, it's hilariously long because it logs parameters passed to the failing function, and in this case it includes the page source [20:31:10] i'll pm you the link, sec [20:31:21] nice [20:33:49] hah, page source [20:38:18] ori-l could you pm me the link as well (just to satisfy my curiosity) [20:39:35] drdee_: sure. I went over it and it doesn't contain private data, so I can just paste it here: https://dpaste.de/uD2J/raw/ [20:39:41] k [20:40:08] Networking question: is it possible to send UPD packets directly to hooft.esams.wikimedia.org from terbium.eqiad.wmnet? [20:40:33] Ping doesn't work but that didn't surprise me too much [20:40:43] s/UPD/UDP/ [20:41:27] probably, but I'm not sure. I'd ask LeslieCarr or mark [20:41:57] LeslieCarr: Do you have a couple minutes to chat about UDP from eqiad to esams? [20:43:05] LeslieCarr: This is follow up to questions I posed a week or so ago re: sending HTCP purges to esams without purging eqiad as well [20:46:54] ori-l: did you get a report from magnesium? [20:48:00] andrewbogott: not yet, but it takes a bit for statsd to flush to graphite [20:48:35] ok. Most hosts will take two puppet runs before they send stats, but I forced multiple runs on magnesium. [20:48:37] bd808: yes, it's possible [20:48:41] i tested [20:48:53] ori-l: sweet [20:48:54] (and on stafford, but stafford has other issues.) 
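Regarding bd808's question above about sending UDP from terbium.eqiad.wmnet to hooft.esams.wikimedia.org: a throwaway reachability check could look like the sketch below. Port 4827 (the standard HTCP port) and the payload are illustrative assumptions; this is not a well-formed HTCP packet, so arrival is best confirmed with tcpdump on the receiving side rather than by expecting a reply.

```php
<?php
// Fire a single raw UDP datagram at the far end to confirm reachability.
// Not a real HTCP purge packet; the port and payload are illustrative.
$target = 'udp://hooft.esams.wikimedia.org:4827';

$sock = stream_socket_client( $target, $errno, $errstr );
if ( $sock === false ) {
	die( "could not create socket: $errstr ($errno)\n" );
}

$payload = 'udp-test-' . time();
$sent = fwrite( $sock, $payload );

// UDP is fire-and-forget: a successful write only means the datagram left
// this host, not that it arrived in esams.
echo 'wrote ', var_export( $sent, true ), " bytes to $target\n";
fclose( $sock );
```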
[20:49:27] andrewbogott: ok, i'm keeping a lookout :) [20:49:32] i'll let you know as soon as i see something [20:49:41] ori-l: I guess I could have done that as well :) [20:49:43] or as soon as you don't :) [20:50:21] I haven't logged into a host in esams yet; I'll put that on my todo soon list [20:50:46] i hear amsterdam is lovely this time of year [20:51:29] ori-l: I'd love to go sometime. Trappist beer tour is on my bucket list [20:53:00] You probably can't actually log onto any of the hosts there.. [20:53:48] Reedy: is there a way I can see varnishncsa output from there? [20:55:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [20:55:34] No idea, sorry [20:55:51] Reedy: no worries. [20:56:13] this page doesn't render: https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2013/July [20:56:35] so, technically (but only in a very technical sense) this isn't a CirrusSearch bug [20:56:43] still, I found something CirrusSearch was doing wrong [20:58:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 27.652 second response time [20:59:47] anyone else want to have a look at the stack trace for f109662a [21:00:03] CirrusSearch is trying to render a page and it blows up. [21:00:16] when I manually go to the page, it blows up too: 3005745c [21:00:21] sadness [21:03:01] Cheer up manybubbles, you found a bug! There are many other bugs, but this one is yours. [21:03:30] bd808: thanks! I'm wondering if I should pawn it off on Nikerabbit, though [21:04:04] Giving found bugs to others is one of the greatest joys both parties can experience. [21:04:30] * drdee_ smiles [21:05:12] is this the cirrussearch-translate bug again? [21:05:32] or just obligatory mention of the relevant quip I've found a bug I've found hundreds... I usually report them to bugzilla.wikimedia.org [21:07:07] * bd808 stops channeling a drill sergeant on acid [21:08:52] Nemo_bis: this time I think CirrusSearch "discovered" a translate bug [21:09:19] Nemo_bis: cirrussearch has a bug that drdee_ found (it renders all pages in the search results) [21:09:48] Nemo_bis: when it tries to render a particular page it blows up. but that page blows up when you go right to it. [21:11:16] manybubbles: I think it's just [[Flow]] messed up, as Special:ExpandTemplates show [21:11:21] will sort it out after coffee [21:11:31] andrewbogott: https://gerrit.wikimedia.org/r/#/c/89112/ [21:12:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [21:12:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 18.382 second response time [21:13:35] ori-l, abstract ruby-question… a function just returns whatever the last line evaluates to? 
[21:13:47] yes [21:14:00] It might take a while for me to not hate that [21:14:28] andrewbogott: i usually dislike implicit, tricky stuff like that too, but most style guides recommend it: https://github.com/bbatsov/ruby-style-guide [21:14:33] so i do it in the interest of being idiomatic [21:14:45] manybubbles, bd808, fixed https://www.mediawiki.org/w/index.php?title=Echo_(Notifications)&diff=798854&oldid=796644 [21:16:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [21:17:43] hey, I just saw that [21:18:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 28.689 second response time [21:18:18] I'm surprised you can do stuff in the wiki that just makes it crash like that [21:19:01] manybubbles: you need to read more of the php code :) [21:19:24] parser functions are like that [21:19:24] bd808: doesn't most of it have a try { } catch (EVERYTHING) around it? [21:19:33] scary stuff [21:20:00] we don't even know how comes the parser doesn't explode normally, let alone broken usages [21:20:22] I have seen very few catch blocks in the corners I've visited so far [21:21:44] when a page explodes in your face always remember to look at the source with Special:expandTemplates and the like ;) [21:22:20] then you only need to know the right syntax for all the few dozens locally installed extensions' tags :D [21:25:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [21:28:42] !log stopping StatsD service on tungsten for a couple of minutes to debug [21:28:59] RECOVERY - MySQL Replication Heartbeat on db45 is OK: OK replication delay 0 seconds [21:29:19] RECOVERY - MySQL Slave Delay on db45 is OK: OK replication delay 0 seconds [21:29:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 15.644 second response time [21:33:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [21:41:57] ori-l: morebots AWOL [21:42:04] labs is down, apparently [21:42:09] AGAIN [21:42:20] why would morebots be on labs btw? [21:42:30] andrewbogott: sigh. one more. https://gerrit.wikimedia.org/r/#/c/89119/ [21:43:17] Nemo_bis: it was on rackspace before (same as wikitech-static) but some idle connection killer was causing it to hang [21:43:48] ori-l: what's the bug number to bring it to production again? [21:44:39] what do you mean by 'bring it to production'? [21:49:57] if you mean migrate it to the production cluster, there is no such plan afaik [21:50:10] if you mean the 'morebots is out again' bug, i don't remember the number [21:52:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 20.989 second response time [21:56:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [21:56:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 22.156 second response time [22:03:20] TimStarling: The upgrade to 5.3.10-1ubuntu3.6+wmf1 today has broken category collation. [22:04:01] different ICU? [22:04:34] ii libicu42 4.2.1-3ubuntu0.10.04.1 International Components for Unicode [22:04:34] ii libicu48 4.8.1.1-3 International Components for Unicode [22:04:37] let me find it in the backscroll [22:04:59] Had paravoid backported changes to php5-intl to use libicu48? 
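Not the actual CirrusSearch or Translate code, but a sketch of the guard being talked about here: wrap the render done for indexing in a try/catch so that one unparseable page (broken parser-function or extension-tag usage) gets logged and skipped instead of taking the whole job down. The function name and log group are made up for illustration:

    // Hypothetical helper, not CirrusSearch's real code path.
    function renderForIndexing( WikiPage $page, ParserOptions $opts ) {
        try {
            $out = $page->getParserOutput( $opts );
            return $out ? $out->getText() : null;
        } catch ( MWException $e ) {
            // Skip and log; the page itself still needs fixing by hand, and
            // Special:ExpandTemplates is the tool for finding the bad tag.
            wfDebugLog( 'search-index', 'Unrenderable page ' .
                $page->getTitle()->getPrefixedText() . ': ' . $e->getMessage() );
            return null;
        }
    }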
[22:05:22] bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=55565 [22:05:56] http://paste.debian.net/plain/55306 [22:06:14] Newer version of libicu42 was installed [22:06:36] Ignore the newer part [22:08:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [22:09:17] according to bug 46036, it was all switched to 4.8 [22:09:25] who recompiled the package? [22:12:02] [2208][tstarling@tin:~]$ mwscript eval.php --wiki=enwiki [22:12:02] > echo INTL_ICU_VERSION; [22:12:02] 4.2.1 [22:12:22] c.f. https://bugzilla.wikimedia.org/show_bug.cgi?id=46036#c4 [22:12:31] https://rt.wikimedia.org/Ticket/Display.html?id=5912 [22:12:35] Alex [22:13:08] So it is php5-intl using the wrong libicu version [22:14:52] I have half a mind to text one of them about this [22:15:32] where is the version control for it? "ssh gerrit gerrit ls-projects | grep debs" shows nothing relevant [22:16:08] would i be getting in the way if i sync JS code in php-1.22wmf20/resources and an extension? [22:16:32] no [22:16:37] thanks [22:17:06] No idea, I was also looking for it earlier [22:21:02] I'm getting the source packages [22:23:24] !log olivneh synchronized php-1.22wmf21/extensions/WikimediaEvents 'Updating WikimediaEvents to log DOM retrieval timing for VE' [22:23:37] well, there's no version specified in the control file [22:23:40] !log olivneh synchronized php-1.22wmf20/resources/mediawiki/mediawiki.inspect.js 'Updating 1.22wmf20 for mw.loader.inspect (1/3)' [22:23:53] it was probably just built on an unclean labs instance, instead of in pbuilder [22:23:57] !log olivneh synchronized php-1.22wmf20/resources/mediawiki/mediawiki.js 'Updating 1.22wmf20 for mw.loader.inspect (2/3)' [22:24:13] !log olivneh synchronized php-1.22wmf20/resources/Resources.php 'Updating 1.22wmf20 for mw.loader.inspect (3/3)' [22:24:23] done [22:25:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 28.123 second response time [22:26:57] hey [22:26:59] what's up? [22:27:13] the new PHP package is screwed up [22:27:23] built against the old ICU [22:27:36] I was about to rebuild it myself [22:27:40] ok [22:27:42] alex built it [22:27:45] I wasn't involved [22:27:53] they tested it with hashar on betalabs afaik [22:27:57] but obviously not collation [22:28:14] tell them to use pbuilder next time [22:28:28] oh really? :) [22:28:46] are you building it or should I? [22:29:00] you can do it, my pbuilder labs instance is not responding [22:29:13] yeah I don't use labs for such things [22:29:32] smart man [22:29:45] I was told to use labs and stupidly followed that advice :) [22:30:30] I used to have a dedicated build server but it got thrown in the bin [22:31:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [22:31:14] or donated to charity or something [22:31:16] labs is mostly down now, so not best timing :) [22:31:26] is the package okay otherwise? do you know? [22:31:37] Nothing else has been reported [22:32:55] andrewbogott: it works [22:33:19] andrewbogott: e.g. https://graphite.wikimedia.org/render?from=-2hours&until=now&width=500&height=380&target=stats.timers.puppet.total.median [22:33:47] what is the Y axis? 
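A quick way to double-check the rebuilt package, along the same lines as the eval.php session above: confirm INTL_ICU_VERSION and that a Collator actually sorts as expected, since wrong category collation is what surfaced the libicu mismatch in the first place. The locale and sample words are arbitrary, not taken from bug 55565:

    echo 'intl built against ICU ' . INTL_ICU_VERSION . "\n";
    $coll = new Collator( 'sv_SE' );
    $words = array( 'örn', 'zebra', 'apa' );
    $coll->sort( $words );
    // Swedish collation puts å/ä/ö after z, so 'örn' should sort last; a
    // surprising order here is a hint the extension is linked against the
    // wrong libicu, regardless of what dpkg says about installed packages.
    print_r( $words );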
[22:33:54] extent to which i hate puppet [22:33:56] 'thousands of minutes' I could believe :( [22:34:19] I fail to see the point of having puppet metrics [22:34:21] I don't mind them [22:34:47] they're fancy but I don't think we can use them somehow [22:34:49] primarily for testing proposed optimizations for the puppetmaster setup [22:34:56] paravoid, we all know that puppet is unacceptably slow, but we don't currently know /how/ unacceptable! [22:35:14] rob asked yesterday if to enable hyperthreading on puppetmaster, for example [22:35:14] it's unacceptably slow because we have an overloaded server [22:35:27] we just need to scale the box up [22:35:32] to multiple workers [22:35:52] and upgrade to puppet 3 at some point which is reported to be much much better performance-wise [22:36:10] reported by people who measure such things :) [22:36:35] we measure cpu usage, I think we'll know too [22:36:46] YuviPanda: is the API down? [22:36:57] andrewbogott: NFS took it, probably [22:37:07] ah, right… taht's still happening? [22:37:17] nginx is up https://metrics.wmflabs.org/ [22:37:25] andrewbogott: don't think it was resolved yet [22:37:33] morebots is still out [22:37:43] so is gerrit bot [22:37:46] ok [22:38:24] andrewbogott: I can't ssh into the box (hangs when trying to access home, I bet) [22:38:26] so yeah [22:38:48] ok, this patch will just have to wait until tomorrow for testing. [22:39:12] awww [22:39:18] andrewbogott: re: the Y-axis, that might be some arithmetic fail on my part, hang on [22:39:59] PROBLEM - SSH on lvs4001 is CRITICAL: Server answer: [22:39:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 25.615 second response time [22:40:59] RECOVERY - SSH on lvs4001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [22:41:25] ori-l: You can fix the math or add a label, either way :) [22:41:30] 'thousands of thousandths of...' [22:46:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [22:51:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 29.314 second response time [22:54:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [23:01:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.190 second response time [23:04:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: Connection timed out [23:05:44] php surely takes a while to build [23:18:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 26.599 second response time [23:23:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [23:29:06] !log reprepro include php5 5.3.10-1ubuntu3.8+wmf2 (wmf1 rebuilt in a clean environment, i.e. with precise's libicu) [23:30:49] no morebots [23:30:56] NFS dead [23:30:59] morebots dead [23:31:00] RIP NFS [23:31:06] are Coren/Ryan aware of this? [23:31:19] paravoid: coren is doing maintenance [23:31:21] paravoid: planned / scheduled maintenance [23:31:25] oh, sorry [23:31:30] and it's taken longer than anticipated [23:31:44] Ryan_Lane: Here is one (1) understatement token. [23:31:51] haha [23:31:51] :) [23:31:57] well, I wanted to be nice about it :) [23:32:05] what's wrong? anything I could help with? 
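For context on the stats.timers.puppet.total.median graph and the flush delay ori-l mentions: statsd timers are plain-text UDP datagrams whose value is conventionally in milliseconds (which may be part of the Y-axis confusion), and statsd only forwards aggregated values to graphite on its flush interval. A sketch of a sender follows; the host name, port and sample value are placeholders rather than the production config:

    // Placeholders throughout: host, port and the sample timing value.
    function sendTiming( $metric, $ms, $host = 'tungsten.eqiad.wmnet', $port = 8125 ) {
        $payload = sprintf( '%s:%d|ms', $metric, $ms );
        $sock = socket_create( AF_INET, SOCK_DGRAM, SOL_UDP );
        if ( $sock === false ) {
            return;
        }
        // Fire and forget; statsd turns a 'puppet.total' timer into the
        // stats.timers.puppet.total.* series and flushes to graphite on its
        // own interval, which is the short delay mentioned above.
        socket_sendto( $sock, $payload, strlen( $payload ), 0, gethostbyname( $host ), $port );
        socket_close( $sock );
    }
    sendTiming( 'puppet.total', 183000 ); // a ~3 minute puppet run, in milliseconds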
[23:32:11] it's just rsync taking ages [23:32:15] or brainstorm with [23:43:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.559 second response time [23:45:40] Ryan_Lane: parsoid could use a restart [23:45:57] oh? [23:46:20] we are pushing them quite hard currently, and there are some hanging workers [23:46:32] a fix is in rt testing, but not yet deployed [23:46:34] ok, I'll restart them, batched at 5 [23:46:38] salt -b 5 -G 'deployment_target:parsoid' parsoid.restart_parsoid parsoid [23:46:57] Ryan_Lane: can I run that command too? [23:46:57] done [23:47:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [23:47:13] Ryan_Lane: thanks! [23:47:18] no, but I put it in the channel, just in case others wanted to see how I did it [23:47:29] parsoid is non-standard in how it needs to be restarted with salt [23:47:37] ah, k [23:47:39] because of the init script [23:48:06] the abomination of an init script [23:48:20] heh [23:48:21] yep [23:49:22] gwicke: this is the upstart script i use for mwvagrant: https://dpaste.de/dC84/raw/ [23:49:40] production may be a lot more complex, i'm not too familiar with the setup. but sharing the link in case it's useful. [23:51:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 29.742 second response time [23:54:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [23:58:07] ori-l: thanks, saw that before [23:58:59] ori-l: added to https://bugzilla.wikimedia.org/show_bug.cgi?id=53723