[00:00:07] New patchset: Dzahn; "remove misc::download-mediawiki from kaulen because it's either supposed to be on swift or on a new download host in eqiad, but not on the bugzilla server and not in Tampa and it wasn't used anyways. just added download.mediawiki.org to Apache conf so tha" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72664 [00:01:13] New review: Dzahn; "per RT-1839 not on kaulen and was empty dir" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72664 [00:01:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72664 [00:04:28] New patchset: Dzahn; "remove non-root users from kaulen since it's not used as download server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72666 [00:05:36] New review: Dzahn; "note that bugzilla is also going to move away from kaulen anyways" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72666 [00:31:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.453 second response time [00:45:25] !log updated Parsoid to d0e3603 [00:45:34] Logged the message, Master [00:45:36] RECOVERY - Parsoid on wtp1021 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.003 second response time [00:46:06] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.010 second response time [00:46:06] RECOVERY - Parsoid on wtp1017 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.003 second response time [01:55:59] !log tstarling synchronized php-1.22wmf9/includes/WikiPage.php [01:56:09] Logged the message, Master [01:56:30] !log tstarling synchronized php-1.22wmf9/includes/api/ApiPurge.php [01:56:39] Logged the message, Master [02:13:46] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [02:14:13] !log LocalisationUpdate completed (1.22wmf9) at Tue Jul 9 02:14:12 UTC 2013 [02:14:20] New patchset: GWicke; "Increase Parsoid backend timeout to 5 minutes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72681 [02:14:22] Logged the message, Master [02:22:07] AzaToth: Who's calling me Mr. McBride? [02:26:55] !log LocalisationUpdate completed (1.22wmf8) at Tue Jul 9 02:26:54 UTC 2013 [02:27:04] Logged the message, Master [02:38:31] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 9 02:38:31 UTC 2013 [02:38:41] Logged the message, Master [02:50:47] PROBLEM - DPKG on mc15 is CRITICAL: Timeout while attempting connection [02:51:38] RECOVERY - DPKG on mc15 is OK: All packages OK [02:52:33] Elsie: on WO [02:52:42] Oh, heh. [04:15:40] !log tstarling synchronized php-1.22wmf8/includes/WikiPage.php [04:15:49] Logged the message, Master [04:16:14] !log tstarling synchronized php-1.22wmf8/includes/api/ApiPurge.php [04:16:23] Logged the message, Master [05:30:29] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:31:28] PROBLEM - DPKG on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:33:18] RECOVERY - DPKG on searchidx1001 is OK: All packages OK [05:37:30] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [05:40:29] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:43:29] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [07:24:59] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [07:24:59] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [07:24:59] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [07:24:59] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [07:24:59] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [07:25:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [07:25:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [07:25:01] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:58] New review: Hashar; "fine to me." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/72666 [07:59:39] New review: Hashar; "Sorry I cant really review that, too many things to do already. My only recommendation would be to ..." [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/62566 [08:06:17] New review: Mark Bergsma; "This should be done using a purge script, not varnishadm directly. Then that purge script can also c..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/72653 [08:26:04] New patchset: ArielGlenn; ".gitignore, .gitreview, debian packaging files for mwbzutils" [operations/debs/mwbzutils] (master) - https://gerrit.wikimedia.org/r/72691 [08:36:44] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:37:35] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [08:42:52] Change merged: ArielGlenn; [operations/debs/mwbzutils] (master) - https://gerrit.wikimedia.org/r/72691 [08:43:13] !log nikerabbit synchronized php-1.22wmf9/extensions/UniversalLanguageSelector/ 'ULS to master' [08:43:24] Logged the message, Master [08:52:02] !log nikerabbit synchronized php-1.22wmf8/extensions/UniversalLanguageSelector/ 'ULS to master' [08:52:10] Logged the message, Master [08:52:46] New patchset: Nikerabbit; "ULS deployment phase 5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71971 [08:53:08] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71971 [08:55:38] !log nikerabbit synchronized wmf-config/InitialiseSettings.php 'ULS phase 5' [08:55:46] Logged the message, Master [09:21:22] New patchset: Nikerabbit; "Disable ULS on ml.*" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72702 [09:28:34] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72702 [09:29:01] !log nikerabbit synchronized php-1.22wmf8/extensions/UniversalLanguageSelector/ 'ULS to master' [09:31:52] !log nikerabbit synchronized php-1.22wmf9/extensions/UniversalLanguageSelector/ 'ULS to master' [09:33:30] !log nikerabbit synchronized wmf-config/InitialiseSettings.php 'Bug 51019' [09:33:40] Logged the message, Master [09:42:10] I am having trouble with a user who seems to have a labs account, but is not visible to gerrit. [09:42:32] Could someone with access to LDAP please check whether ther problem is on the gerrit side, or the LDAP side? [09:43:35] qchris: what do you mean "visible to gerrit"? 
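[Note: the check qchris asks for above (whether the account exists on the LDAP side at all) is a one-liner with the standard OpenLDAP client. A minimal sketch, assuming anonymous bind is permitted; the server, base DN, and username here are placeholders, not the production values:]

    # an empty result set means the problem is on the LDAP side,
    # a hit means it is on the Gerrit side
    ldapsearch -x -H ldap://ldap.example.org \
        -b 'ou=people,dc=wikimedia,dc=org' \
        '(cn=someuser)' cn uid mail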
[09:44:01] p858snake|l: Gerrit complains that the user does not exist ... which should mean that it's not in LDAP. [09:44:14] as in logging in? commiting? [09:44:17] p858snake|l: However, I cannot check whether or not the user is in ldap. [09:44:29] p858snake|l: As in 'Trying to add the user to a group' [09:45:08] p858snake|l: Neither does gerrit autosuggest the user, nor does gerrit allow to add directly. [09:50:14] PROBLEM - Host mw72 is DOWN: PING CRITICAL - Packet loss = 100% [09:50:44] RECOVERY - Host mw72 is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [09:51:11] qchris: wikitech/labs creates the accounts in ldap automatically, so it should be there [09:51:24] make sure they are in the shell group on wiki maybe [09:51:58] (if they aren't, they shouldn't be able to currently push to gerrit, so thats a easy check) [09:52:41] p858snake|l: How can I check if they are in the shell group on wiki? [09:53:45] whats the username, I will play around since I havn't looked for awhile [09:54:34] p858snake|l: It shows "(shell)" for the user on https://wikitech.wikimedia.org/w/index.php?title=Special%3AListUsers [09:54:52] p858snake|l: So I guess, that means he's in the shell group :-/ [09:55:12] yes [09:55:26] Ok. Thanks. [09:55:38] qchris: bug ^demon or ryan most likely [09:56:08] p858snake|l: Yes, I'll do that. Thanks. [10:11:48] New patchset: ArielGlenn; "deployment script: better error handling, make correct remote dirs" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/72708 [10:20:50] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/72708 [10:37:26] PROBLEM - SSH on cp3004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:37:36] PROBLEM - Varnish HTTP upload-backend on cp3004 is CRITICAL: Connection timed out [10:37:45] PROBLEM - Varnish traffic logger on cp3004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:37:45] PROBLEM - Varnish HTCP daemon on cp3004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:43:25] RECOVERY - SSH on cp3004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [10:43:35] RECOVERY - Varnish traffic logger on cp3004 is OK: PROCS OK: 2 processes with command name varnishncsa [10:43:36] RECOVERY - Varnish HTCP daemon on cp3004 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [11:00:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:01:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [11:06:30] New review: Faidon; "Considering the ongoing discussions about database backups/xtrabackup, perhaps the MySQL bpipe plugi..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70840 [11:14:02] New review: Parent5446; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69982 [11:18:37] New review: Faidon; "That's really nice work, Ori. Very well documented and styled, plus the upstart bits are nice. I onl..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/71927 [11:21:27] New review: QChris; "I have no idea how the github projects get generated.." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72130 [11:43:37] New patchset: DixonD; "Settings for categories created by Extension:Babel for Ukrainian Wikisource." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72713 [11:47:33] !log apt: updating to ceph 0.66 [11:47:43] Logged the message, Master [11:48:14] New patchset: DixonD; "Settings for categories created by Extension:Babel for Ukrainian Wikisource." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72713 [12:14:11] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [12:32:50] manybubbles: hey [12:33:00] paravoid: hey! [12:33:04] manybubbles: https://gerrit.wikimedia.org/r/#/c/72522/ [12:33:34] plus we should probably merge that javadoc cleanup too [12:34:51] paravoid: I'll read the commit. you may want to have a look at all the pending changes for operations/debs/jmxtrans - I believe I have one for .gitreview as well [12:34:51] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:35:04] oh I didn't see that [12:36:01] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [12:37:44] New review: Manybubbles; "Most things of the commit is nice cleanup. The meat is merging preinst and postinst. All changes s..." [operations/debs/jmxtrans] (master) C: 1; - https://gerrit.wikimedia.org/r/72522 [12:38:09] manybubbles: I had those as PS5 on your original commit [12:38:29] but you accidentally pushed PS6 based on PS4 and these were lost [12:38:32] and I didn't notice [12:38:45] paravoid: Sorry! I'm still getting used to gerrit. [12:39:00] oh no worries [12:49:09] New patchset: Hashar; "rake wrapper to run puppet module tests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72721 [12:52:34] if you wanna try out spec unit test in puppet, see 72721 ^^^^ [12:55:37] hashar: so... [12:58:48] hashar: any progress? [12:59:14] AzaToth: hi [13:01:07] New review: AzaToth; "(1 comment)" [operations/debs/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72522 [13:01:30] hashar: you saw they've merged the TAP thingi now? [13:01:52] so now you can use jenkins-build [13:02:05] just need to update it [13:03:02] !log touched and synced extensions/Wikibase/lib/resources/wikibase.ui.SiteLinksEditTool.js on php-1.22wmf8 and php-1.22wmf9 [13:03:02] !log hashar synchronized php-1.22wmf8/extensions/Wikibase/lib/resources/wikibase.ui.SiteLinksEditTool.js [13:03:10] Logged the message, Master [13:03:19] Logged the message, Master [13:03:30] !log hashar synchronized php-1.22wmf9/extensions/Wikibase/lib/resources/wikibase.ui.SiteLinksEditTool.js [13:03:39] Logged the message, Master [13:03:43] orly [13:19:45] !log upgraded ceph to 0.66 [13:19:55] Logged the message, Master [13:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.028 second response time [13:27:26] paravoid: hi [13:27:35] hi. [13:27:51] paravoid: I'm trying to fix a lintian problem related to RPATH [13:28:19] is this about dclass? [13:28:45] you still haven't reacted/replied to about 20% of my review :) [13:29:09] paravoid: the SONAME right ? [13:29:14] not just that [13:29:19] also installing from .libs [13:29:23] and overriding make install completely [13:29:39] paravoid: about installing from .libs, wrote a comment, that the path will be arch-dependent [13:29:51] paravoid: can you give me some tips please on how to handle that in a more generic way ? [13:29:58] like perhaps an $ARCH variable or something ? 
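[Note: the answer that follows is to wildcard the multiarch directory in the .install file rather than interpolate an architecture variable. A minimal sketch of that approach; the package and library names are taken from the thread:]

    # debian/libdclass-jni.install: the * matches the multiarch triplet
    # directory (e.g. x86_64-linux-gnu) on whatever architecture builds
    usr/lib/*/libdclassjni*so*

    # if the triplet is ever needed explicitly, dpkg-dev can print it:
    dpkg-architecture -qDEB_HOST_MULTIARCH   # e.g. x86_64-linux-gnu on amd64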
[13:30:43] what do you mean? [13:33:04] paravoid: so you said "don't use .libs, use debian/tmp" right ? [13:33:29] paravoid: but the path to debian/tmp is build-area/dclass-2.0.14/debian/tmp/usr/lib/x86_64-linux-gnu/ [13:33:45] that is correct [13:33:45] paravoid: the "x86_64-linux-gnu" part of that path is arch-dependent [13:33:58] yes, that's how it's supposed to be [13:34:33] putting the lib in the root of /usr/lib is wrong [13:34:40] paravoid: ok, my question is, in my *.install I'll have to use a path like debian/tmp/usr/lib/$ARCH/ [13:34:56] paravoid: is there such a variable like $ARCH and can I use variables like that inside *.install files ? [13:35:24] paravoid: I intend to build for both 32-bit and 64-bit [13:35:51] average: why 32 bit as well? that does not seem necessary [13:35:55] echo 'usr/lib/*/libdclassjni*so*' > debian/libdclass-jni.install [13:35:59] should be enough [13:37:26] you're currently installing the library under /usr/lib/, e.g. /usr/lib/libfoo.so. that's wrong, it should be /usr/lib/<triplet>/libfoo.so [13:37:35] you're also installing *from* .libs which is also very wrong [13:38:26] ok, will switch to getting stuff from debian/tmp [13:38:55] nod [13:41:22] one more thing [13:41:33] I got this lintian error about RPATH [13:42:57] paravoid: what do you mean above by "<triplet>" ? [13:43:25] the lintian error is ===> E: libdclass0-jni: binary-or-shlib-defines-rpath usr/lib/libdclassjni.so.0.0.0 /home/user/wikistats/build-area/dclass-2.0.14/.libs [13:44:07] I don't know what to do about it, read some pages, tried to get rid of -Wl and -Wall flags in the Makefile.am , the lintian error above persists [13:44:26] try using debian/tmp [13:44:32] pretty sure it's going to be gone [13:44:52] ok, what about "<triplet>" ? [13:45:51] x86-64 - linux - gnu [13:45:53] that's the triplet [13:47:29] ok [13:47:36] thank you [13:48:26] three parts [13:48:26] PROBLEM - Disk space on virt6 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 38845 MB (3% inode=99%): [13:49:09] see https://wiki.debian.org/Multiarch/Tuples [13:49:34] technically the path is not the GNU triplet [13:49:44] but something that looks like it, that page explains it further [13:49:49] I'm not sure if you care that much though :) [13:51:13] paravoid: I care, but want to get the package out also [13:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [13:56:12] paravoid: quick q: what's the status of debianizing librdkafka? [13:56:15] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:57:15] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [13:57:33] drdee: I haven't worked on it for a while [13:58:26] k, is there much left / [14:01:17] not really [14:01:38] we were going back and forward with Snaps on the 0.8 branch and all that [14:01:49] but that really just needs a decision, nothing more :) [14:01:55] what decision? [14:02:11] New patchset: Ottomata; "Adding Christian Aistleitner account on analytics nodes. RT 5403" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72726 [14:02:49] New patchset: Ottomata; "Adding Christian Aistleitner account on analytics nodes.
RT 5403" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72726 [14:03:41] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72726 [14:04:16] drdee, i'm not sure what their discussion was, but just with the kafka debianization, we can't really build the official deb until 0.8 is released [14:04:21] we can get it all ready [14:04:32] so that once it is we just build the .deb and tada done [14:04:43] k [14:06:06] paravoid: I'll move 0.8 to master [14:08:10] New patchset: Ottomata; "Fixing copy/paste error in admins.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72727 [14:08:53] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72727 [14:09:38] Snaps: don't mean to pressure you, I can package a branch too [14:10:30] paravoid: thanks, but its okay, it is meant for master eventually anyway. [14:22:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time [14:26:54] hashar, sorry, just sent you an invite by mistake [14:27:06] apparently I don't know how to set up a new hangoug. [14:27:09] hangout [14:27:15] is there booze at that event ? [14:27:26] if so I will be happy to take a plane to attend [14:28:35] wow, it's almost 17h ? [14:28:47] where does the time go [14:35:00] New patchset: Ottomata; "Puppetizing analytics udp2log instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72618 [14:35:13] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72618 [14:39:19] Change merged: Faidon; [operations/debs/jmxtrans] (wikimedia) - https://gerrit.wikimedia.org/r/71384 [14:39:21] !log stopping puppet on analytics1006, 1008 and 1009 to manually apply udp2log puppetization [14:39:31] Logged the message, Master [14:40:32] New review: Faidon; "(1 comment)" [operations/debs/jmxtrans] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/72522 [14:40:33] Change merged: Faidon; [operations/debs/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72522 [14:41:57] New review: Faidon; "I never saw the point of .gitreview but I'm with merging it. I think though that you'd need a prebui..." [operations/debs/jmxtrans] (debian) C: -1; - https://gerrit.wikimedia.org/r/71373 [14:43:04] New patchset: Ottomata; "Fixing application of udp2log roles on proper hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72729 [14:43:08] New review: Faidon; "Looks unused here too :) Thanks!" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/71127 [14:43:36] ottomata: heh, typos happen within puppet too :) [14:43:40] New patchset: Ottomata; "Fixing application of udp2log roles on proper hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72729 [14:43:48] haha [14:43:50] yeah rigiht! 
[14:43:59] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72729 [14:50:36] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71127 [14:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:53:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.930 second response time [14:55:53] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72613 [14:56:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:57] New patchset: Ottomata; "Removing quotes from libanon salt key in udp2log filter config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72730 [14:57:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [14:57:34] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72730 [14:59:22] paravoid: you don't use git review? [14:59:50] !log maxsem synchronized wmf-config/InitialiseSettings.php [14:59:59] Logged the message, Master [15:00:20] I don't [15:00:51] lol [15:01:03] i never tried git reivew [15:01:16] so you manually push? [15:01:32] git push origin HEAD:refs/publish/foo [15:01:34] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:04:35] jeremyb: heh [15:04:47] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [15:04:47] hashar, will you join our call? Alex sent you an invite. [15:05:03] ohh [15:05:25] RECOVERY - Disk space on analytics1006 is OK: DISK OK [15:07:02] pfr-master = push origin HEAD:refs/for/master [15:07:02] pfr-prod = push origin HEAD:refs/for/production [15:09:13] !log re-starting puppet on analytics1006,1008,1009 [15:09:22] Logged the message, Master [15:09:46] New patchset: AzaToth; "adding gitreview" [operations/debs/jmxtrans] (debian) - https://gerrit.wikimedia.org/r/71373 [15:10:08] paravoid: fixed [15:10:42] paravoid: never added that one before as I made it before the debianization was merged [15:11:18] paravoid: I assume you are too stubborn to allow yourself to try git review [15:11:44] ツ [15:14:04] I won't deny that [15:14:05] :) [15:14:11] * paravoid is upgrading desktop to wheezy [15:14:24] 1525 upgraded, 701 newly installed, 70 to remove and 4 not upgraded. [15:14:27] Need to get 1972 MB of archives. [15:14:30] "fun" [15:14:38] heh [15:15:47] paravoid: bne happy you are not using Arch Linux [15:16:03] paravoid: if you forget to upgrade during a month, you are fucked [15:17:03] (if you even suceeds in installing it, as they have no installer) [15:17:03] New review: Manybubbles; "This successfully gets .gitreview out of the source package. Is there a reason .gitreview needs to ..." 
[operations/debs/jmxtrans] (debian) C: 1; - https://gerrit.wikimedia.org/r/71373 [15:17:34] I didn't say out of the source package [15:17:34] manybubbles: depends on which branch it is on [15:17:40] I said gbp [15:18:09] paravoid: added debian/source/options to solve that [15:18:11] I don't remember the exact circumstances though [15:18:23] last time around I think we needed it in debian/gbp.conf [15:18:26] it's because it's not part of the upstream package [15:18:33] prebuild = rm -f .gitreview [15:18:51] paravoid: extend-diff-ignore = '^\.gitreview$' works fine [15:18:52] not sure though, I think I've it under source/options too [15:19:49] manybubbles: it's all about the concept of pristine upstream [15:20:06] which we violate, though [15:20:11] but not that much ツ [15:20:46] anyway, could have placed the .gitreview in the wikimedia branch, then the debian/source/options wouldn't be needed [15:21:58] it boils down to that anything in the debian dir/branch shouldn't directly modify anything outside the debian dir [15:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:50] the .gitreview thingi is a bit of a problem then as it can't be in the debian dir, and it needs to exist without running some other command [15:23:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:23:16] as most people use git-review [15:23:30] only the few hardcore geeks do it manually [15:24:29] New patchset: Ottomata; "Adding icinga alerts for per topic kafka producers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72733 [15:24:44] manybubbles: did you make the v242 tag, or is it upstream? [15:25:00] AzaToth: upstream iirc [15:25:01] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72733 [15:25:28] then we could use upstream-tree=tag as normal [15:25:54] upstream-tree=branch is only needed when upstream has never made any tags, or we are following tip unconditionally [15:26:21] it is on https://github.com/jmxtrans/jmxtrans/tags so they made it. [15:26:31] i.e. if you can do "git describe", then tag is sufficient [15:27:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:27:50] AzaToth: ok - that makes sense now that we're using patches to change their build in the debian branch. It didn't make sense when I started because I didn't think that was how you were supposed to use git-buildpackage.
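[Note: pulling together the two mechanisms AzaToth and paravoid describe above. A sketch of the two files, assuming the syntax quoted in the thread rather than the exact committed change:]

    # debian/source/options: keep the repo-only .gitreview out of the
    # generated source package without deleting it from the tree
    extend-diff-ignore = "^\.gitreview$"

    # debian/gbp.conf: build from the upstream release tag (v242 exists
    # upstream) instead of following the branch tip
    [DEFAULT]
    upstream-tree = tag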
[15:28:13] heh [15:28:19] it's understandable [15:28:19] AzaToth: though I believe we still remove their less than ideal debian directory in the wmf branch [15:28:43] that's a recommended way actually [15:28:49] if upstream ships a debian dir [15:30:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [15:32:13] I use git-review but don't use .gitreview files [15:32:18] i just manually add the gerrit remote [15:32:21] and run git-review -s [15:32:24] and from then on it just works [15:33:41] New patchset: MaxSem; "Don't redirect m.wikimediafoundation.org to the main site" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/72734 [15:35:14] New patchset: MaxSem; "Don't redirect m.wikimediafoundation.org to the main site" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/72734 [15:35:31] andrewbogott: the patch I was referring to is https://gerrit.wikimedia.org/r/#/c/72721/ [15:35:49] andrewbogott: that is basically your python wrapper from integration/jenkins.git but ported to Rake and in operations/puppet.git [15:37:20] hasharCall: That looks much more efficient! But now I'm troubled about the stage that checks things out from github... [15:37:33] (I mean, my test did that too of course.) [15:38:52] yeah I am not sure what the fixtures are :/ [15:39:00] one way would be to have them in our own repo [15:39:22] and skip the github dependency entirely, that is what the spec_standalone task is for [15:39:29] Yeah, that's probably what we should do. I'm trying to figure out what's in there. [15:39:51] New patchset: Matthias Mullie; "Enable auto-archive on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72735 [15:40:16] so hmm that is ton of .rb files :-] [15:40:32] and some facters [15:41:16] Presumably it's always the same… could just use a tarball? [15:41:21] andrewbogott: are you the one to ask a security question if csteipp isn't around? [15:41:32] matanya, probably not :) [15:41:37] who is? [15:41:46] paravoid maybe. [15:41:47] matanya: to me in private please [15:41:57] ah, there you go. [15:42:01] could point you to another person depending on the exact issue [15:42:02] :) [15:42:15] infrastucture or mediawiki? [15:43:48] matanya: several of us read u-s-a, there's no need to ping us everytime a USN gets released [15:44:19] thanks paravoid I asked csteipp about it, he sisn't know [15:44:24] *didn't [15:44:29] I have u-s-a on my inbox, don't worry [15:44:37] I'll stop. thanks for letting me know [15:44:50] oh don't get me wrong, it's appreciated [15:45:02] sure, i said that in a good way [15:45:07] hasharCall, one of the things that it checks out is stdlib and we already have a version of that in our puppet repo. [15:45:55] although maybe trying to reuse just makes it harder [15:46:05] andrewbogott: they might have a different version requirement for stdlib [15:46:14] * andrewbogott nods [15:46:45] paravoid: there should be Intonation in IRC :P [15:47:59] andrewbogott: so potentially we could use a snapshot of the github repositories and publish them in ops/puppet [15:48:23] andrewbogott: or use submodules and a specific sha1 that would be changed after review of the code [15:48:56] then adapt the rakefile patch to not use spec_prep (which fetch from github to prepare the fixtures) and simply call spec_standalone [15:50:54] hasharCall, how would the snapshot work? Would the whole tree be checked into the puppet repo? 
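[Note: the spec_prep / spec_standalone split hashar mentions above comes from puppetlabs_spec_helper, whose rake tasks roughly decompose as shown below; the .fixtures.yml sketch is a hypothetical way to symlink modules already in our tree, such as our own stdlib, instead of fetching them from github:]

    rake spec_prep        # fetches/links fixture modules (the github step)
    rake spec_standalone  # runs the rspec examples against fixtures in place
    rake spec_clean       # removes the fetched fixtures
    rake spec             # roughly spec_prep + spec_standalone + spec_clean

    # hypothetical .fixtures.yml for one module, avoiding github entirely:
    fixtures:
      symlinks:
        mymodule: "#{source_dir}"   # the module under test
        stdlib: ../../stdlib        # our copy, not github master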
[15:51:27] yeah that is the problem [15:51:52] using submodule might work, but then you probably don't want the puppet master to fetch from them :D [15:52:06] awjr: hey, are you going to need this, or can this ticket be closed? https://rt.wikimedia.org/Ticket/Display.html?id=5267 [15:52:09] New patchset: AzaToth; "update build.xml" [operations/debs/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72736 [15:52:09] New patchset: AzaToth; "Initialize git-dpm" [operations/debs/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72737 [15:53:19] @notify binasher [15:53:19] I'll let you know when I see binasher around here [15:53:44] hasharCall, at the moment the rake tests check out those files once for every module. But nothing in that tree is changed, right? [15:54:03] So we could package them, install them someplace and just do everything with symlinks? [15:54:18] ohhh [15:54:29] do you mean debian packaging some puppet modules ? : -D [15:55:19] Yeah, just regard that whole 'fixtures' directory as a single package. [15:55:34] And then just have one copy of it on our system, for good. [15:56:08] It's kind of weird that it doesn't work like that already, which makes me wonder if I'm missing something [15:59:00] andrewbogott: the list of fetched out module sis http://paste.openstack.org/show/39823/ [16:00:02] andrewbogott: I am not sure what to do honestly. There are too many options :-] [16:00:06] paravoid, you might have an opinion about this. [16:00:32] The puppet spec tests currently check out the ^above^ list of puppet modules from github as part of the testing process. [16:01:01] Wondering if we should debianize them, or host them on gerrit, or just use a tarball, or... [16:01:10] to do what? [16:01:13] I lost you [16:02:07] andrewbogott: I must head back home, we can follow up by mail / gerrit review whatever :) [16:02:12] hashar: ok! [16:02:15] i might connect later tonight (europe time) [16:02:27] if my daughter agrees to get to bed early on hehe [16:02:39] paravoid, this is for puppet unit tests. There's an existing system that works pretty well... [16:02:43] *wave* [16:02:59] It sets up a virtual environment during the tests, which pulls things down from github. [16:03:03] what is? [16:03:10] rspec [16:03:35] can you start from the beginning? [16:03:44] sorry, I'm having trouble to follow [16:04:35] ok... [16:04:51] So… I want jenkins to run puppet unit tests. [16:04:54] right [16:05:01] and use rspec for this [16:05:04] yep. [16:05:07] okay [16:05:27] Right now if you run a unit test locally (via $rake spec) it starts by setting up a venv for the tests to run in [16:05:55] And it pulls down some standard puppet libs to use as part of the tests: firewall, stdlib, a couple of others. [16:05:55] when you say venv you mean rubygems environment? [16:06:19] I actually don't know, exactly. [16:06:26] Probably 'venv' isn't the right term. [16:06:27] the firewall puppetlabs module is crap, I don't think we'll ever use it [16:06:36] so there's no point in fetching that [16:06:44] stdlib we use already, we should probably use the copy in our repo though [16:07:08] unit testing with a e.g. newer stdlib than the one we have is wrong [16:07:20] a module may use a function that doesn't exist in our import, for example [16:07:42] * andrewbogott nods [16:07:44] that seems right. [16:08:04] so these tests, do they have access to the private puppet repo? 
:) [16:08:38] i don't think rspec should fetch anything externally, our puppet repo should be self-sufficient [16:08:45] mark: They're run locally to a specific module subdir. [16:09:01] That is, presumably, why it pulls down additional modules -- it stashes them in a subdir of the module dir. [16:09:08] I'm guessing puppetlabs is trying to test individual modules and they're considering these modules as "basic" [16:09:18] paravoid, yes, exactly. [16:09:23] but we're going to test individual modules as part of our repo, right? [16:09:41] I mean, we are going to have interdependencies between our modules [16:10:17] Maybe. When I think of 'unit tests' usually I presume that external deps are getting mocked out. [16:10:48] paravoid: so....? http://blog.steve.org.uk/lumail_is_complete.html have you? ;) [16:11:36] I'd been thinking of the checkout of those external modules as part of that mocking. Basically using trivial known-good stubs to fulfill dependencies. [16:11:48] But it's probably equally/more useful to use our actual modules... [16:12:08] Although I would think of that as 'integration' rather than 'unit' [16:12:13] that is true [16:12:30] yeah, dunno [16:12:52] we won't include firewall or xinetd, so modules should not assume these exist [16:13:14] stdlib is fine, as long as its our copy, not what is master on github [16:13:36] (mocking stdlib is impossible...) [16:14:27] It might be that the testing framework itself uses those packages. Lemme try yanking them out to see what happens. [16:24:28] hey greg-g [16:25:30] brb [16:25:40] hoo: so, between now and 10:30am Pacific (ie: 1 hour) is open [16:25:45] oh, ok :) [16:27:23] mark: good evening! Could you have a look at https://gerrit.wikimedia.org/r/#/c/72681/? [16:27:49] or (in case the ? became part of the URL) https://gerrit.wikimedia.org/r/#/c/72681/ [16:30:03] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:31:16] greg-g: Ok, that should be fine, thanks [16:31:19] anomie: ping [16:32:06] hoo: pong [16:32:08] hoo: mind adding it to the Deployments calendar wiki page? https://wikitech.wikimedia.org/wiki/Deployments [16:32:39] anomie: Could you maybe just also merge this one https://gerrit.wikimedia.org/r/72740 ? Very minor thing [16:32:46] greg-g: doing [16:33:33] hoo: thanks :) [16:33:53] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:48] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [16:35:58] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [16:38:34] !log going to tinkering with dysprosium...will see some icinga alerts [16:38:43] Logged the message, Master [16:41:04] anomie|away: Did the entry, we only need to update wmf9, all global AF wikis are on it yet [16:44:08] hoo: Ok, back. [16:45:28] about to upload the change [16:45:35] ok [16:46:22] anomie: https://gerrit.wikimedia.org/r/72744 [16:48:56] !log anomie synchronized php-1.22wmf9/extensions/AbuseFilter 'Update AbuseFilter to master for global filter bug fixes' [16:49:05] Logged the message, Master [16:49:28] hoo: There you go [16:49:57] Works like a charm... thanks :)) [16:50:38] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 196 seconds [16:50:48] qchris: you there? 
i broke gerrit again :P (but just slightly) [16:51:15] qchris: https://gerrit.wikimedia.org/r/#/c/72738/ refuses to show the diff in any form [16:51:32] it's an octopus merge with some changes on top, but diffs for those worked in the past [16:51:40] sbernardin: are you in data center? [16:52:10] need you to work on this https://rt.wikimedia.org/Ticket/Display.html?id=5432 [16:54:52] thanks anomie [16:55:23] greg-g: no problem [16:55:52] MatmaRex: :-) [16:56:23] MatmaRex: That's a merge of three parents? [16:56:39] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [16:57:04] MatmaRex: Not sure if that really worked in the past. I'll try to verify that locally [16:57:35] qchris: yeah, sort of. [16:58:05] qchris: i produced it by checkoit out matser and running `git merge review/matmarex/bug/50385 review/matmarex/bug/47793 --no-commit`, then making some changes and committing. [16:58:12] checking out master* [16:58:50] it'd be enough to be a merge of just these two, really, i forgot this will result in a three-parent commit [16:59:13] cmjohnson1: saw the email...will be getting on it [17:00:19] MatmaRex: I'll play around with merge commits a bit. I am totally not sure how well gerrit handles them :-/ [17:00:52] paravoid: missing-pre-dependency-on-multiarch-support [17:00:58] paravoid: did you encounter this before ? [17:01:17] try lintian -i [17:01:52] ok [17:02:09] sorry, in the middle of an upgrade [17:02:50] ok [17:03:28] cmjohnson1: hey, is this a ticket that still needs to be open, or...??? https://rt.wikimedia.org/Ticket/Display.html?id=3506 [17:03:57] no..close it [17:04:02] thx [17:04:37] thank you! [17:04:55] thx for asking [17:14:15] Robh: hey, is spence actually decom at this point? [17:15:07] ottomata: hey, quick question [17:15:10] this ticket: https://rt.wikimedia.org/Ticket/Display.html?id=2165 [17:15:14] can it be closed? [17:15:29] all dependencies look met [17:15:41] whoa yes close [17:15:44] that is an oldy [17:15:46] and cool! thanks [17:15:46] yep [17:18:02] New patchset: GWicke; "Increase Parsoid backend timeout to 5 minutes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72681 [17:18:22] New patchset: GWicke; "Increase Parsoid backend timeout to 5 minutes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72681 [17:25:03] is any Varnish expert awake? 
[17:25:59] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [17:25:59] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [17:25:59] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [17:25:59] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [17:25:59] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [17:26:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [17:26:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [17:26:01] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [17:26:02] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72496 [17:30:30] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71851 [17:32:30] akosiaris: hi [17:32:37] akosiaris: I'm Stefan from the Analytics team [17:33:06] akosiaris: you're Alexandros, you're from the Ops team right ? :) [17:33:49] average: hello Stefan. You are correct :-) [17:33:56] how can i help you ? [17:34:24] akosiaris: Have you encountered "missing-pre-dependency-on-multiarch-support" lintian error ? [17:34:40] http://lintian.ubuntuwire.org/quantal/tags/missing-pre-dependency-on-multiarch-support.html [17:34:43] this one [17:34:52] trying to make a debian package and getting this one [17:35:07] average: just add a predeps [17:35:39] average: nope but just do what AzaToth says in the control file and you should be ok [17:35:54] https://gerrit.wikimedia.org/r/#/c/68711/10/debian/control [17:35:56] average: I assume you know about multiarch in general? [17:36:03] lines 20 and 32 [17:36:06] is that how ? [17:36:21] Build-Depends: debhelper (>= 9), javahelper (>= 0.40), dh-autoreconf [17:36:31] hi AzaToth [17:36:33] asorry [17:37:03] before Depends: ${shlibs:Depends}, ${misc:Depends} [17:37:05] add [17:37:22] Pre-Depend: multiarch-support [17:37:28] Pre-Depends: multiarch-support [17:37:30] hmm [17:37:39] dunno if it should be singular or plural [17:38:23] singular [17:38:30] trying it [17:38:35] java bindings for libdclass ? [17:38:51] akosiaris: yes [17:40:25] AzaToth , akosiaris thank you [17:40:27] it worked ! [17:40:42] you are welcome [17:41:24] New patchset: GWicke; "Bug 51053: Increase Parsoid backend timeout to 5 minutes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72681 [17:43:41] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72735 [17:45:43] average: anyway, that tag has been removed from lintian in debian [17:45:56] i.e. it will be removed from ubuntu in october [17:49:13] New review: Parent5446; "@Hashar, does beta have E:CentralAuth installed? Because that's the only thing that has issues." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68937 [17:52:58] Hi LeslieCarr. I've just added a few new icinga alerts for some udp2log Kafka producers [17:53:03] as far as I can tell, everything is normal [17:53:10] but I get UNKNOWN - check failed, metric not found  in icinga [17:53:17] any tips on where to look to see what might be wrong? 
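[Note: the replies that follow suggest checking syslog and running the ganglia check by hand from the monitoring host. A sketch of that manual run, with host, metric, and thresholds taken from the thread; the plugin path is a guess:]

    # run from neon; if this also reports the metric as missing, the
    # problem is on the ganglia side rather than in the icinga config
    /usr/lib/nagios/plugins/check_ganglios_generic_value \
        -H analytics1009 \
        -m udp2log_kafka_producer_webrequest-mobile.AsyncProducerEvents \
        -w 2000000 -c 1000000 -o lt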
[17:53:58] first thing is look for that in just syslog [17:54:03] it may tell you what it can't find [17:54:10] then also look at what the snmp command returns [17:54:17] does it actually return a number or not [17:54:26] by running it from neon [17:54:32] springle: https://www.mediawiki.org/wiki/Requests_for_comment/Support_for_user-specific_page_lists_in_core [17:55:04] i see the same metric not found error in syslog [17:55:20] how do I run the snmp command? [17:56:21] !log mlitn synchronized php-1.22wmf8/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [17:56:31] Logged the message, Master [17:56:49] !log mlitn synchronized php-1.22wmf9/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [17:57:00] Logged the message, Master [17:57:39] oh the check ganglias thing [17:57:40] ah [17:57:40] yes [17:57:42] paravoid , akosiaris what's your oppinion on lintian error-free packages ? [17:57:42] that returns a value [17:57:49] ottomata: what do you think [17:57:58] is a lintian error-free package production ready ? [17:58:28] depends on what it is going to do [17:59:09] it might also need QA depending on how gravely it is going to affect the infrastructure [18:00:08] akosiaris: it's basically just loading a tree in memory, and then you're able to query it, that's it [18:00:32] that's new functionality right ? [18:00:45] average, your talking about dclass, right? i think with that if you got rid of the lintian errors, we're really close to being able to merge it [18:00:47] there isn't anything important relying on it ... [18:00:48] probably just need a final review [18:01:05] ottomata: ys [18:01:06] *yes [18:01:19] LeslieCarr: check_ganglios_generic_value for this new icinga alert works, the check command is a little different though [18:01:36] the ganglia metric name is parameterized in check_commands [18:01:36] !log mlitn synchronized wmf-config/ 'Update ArticleFeedbackv5 & Echo config' [18:01:39] maybe I can't do that? [18:01:42] ottomata: we got rid of all errors, we still got these warnings though https://gist.github.com/wsdookadr/eb98aceceb049f09e70d [18:01:43] Logged the message, Master [18:01:50] command_line $USER1$/check_ganglios_generic_value -H $HOSTADDRESS$ -m udp2log_kafka_producer_$ARG1.AsyncProducerEvents -w $ARG2$ -c $ARG3$ -o lt [18:02:31] well is it properly showing arg1-3 ? [18:02:34] when you put it in ? [18:03:02] like if you forgot a number onthe end is common [18:03:37] i think it looks right [18:03:38] service_description kafka_producer_webrequest-mobile.AsyncProducerEvents [18:03:39] check_command check_kafka_producer_produce_events!webrequest-mobile!2000000!1000000 [18:03:39] ## --PUPPET_NAME-- (called '_naginator_name' in the manifest) analytics1009 kafka-producer-webrequest-mobile.AsyncProducerEvents [18:04:24] oh i see [18:04:34] you forgot the termination "$" on $ARG1 [18:04:34] average, this seems like one we should fix: [18:04:35] non-dev-pkg-with-shlib-symlink [18:04:39] AHHH [18:04:47] line 632 [18:04:51] AHHHH [18:04:51] it's always the little things! [18:04:52] thank you@ [18:05:06] binasher: Don't start anything yet; while the discussion is headed that way it's not a done deal. Do you want me to loop you in the email thread if you want to add something of a technical nature? 
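[Note: the root cause found above is the unterminated macro: Nagios-style macros must be closed with a trailing "$", so the bare $ARG1 never expands and the metric name comes out wrong. Presumably the merged fix (change 72754) amounts to the one added character:]

    command_line $USER1$/check_ganglios_generic_value -H $HOSTADDRESS$ -m udp2log_kafka_producer_$ARG1$.AsyncProducerEvents -w $ARG2$ -c $ARG3$ -o lt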
[18:05:10] at least it wasn't a semi colon… those things are put into code just to taunt humanity [18:05:35] New patchset: Ottomata; "Fixing check_kafka_producer_produce_events check command" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72754 [18:05:59] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72754 [18:06:08] ottomata: looking into it [18:06:33] average, i think that is just saying that the main packages shoudln't include the 0.0.0 symlink, right? [18:06:38] those should be installed by the -dev packages [18:09:14] ottomata: you are right sir [18:09:34] fixing [18:10:25] also this one is important i think: [18:10:26] package-name-doesnt-match-sonames [18:10:30] not sure which way to fix [18:10:37] either make package name match so name, or vice versa [18:13:48] New patchset: Matthias Mullie; "Fix auto-archive switch variable name" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72756 [18:17:07] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72756 [18:19:41] ottomata: ok, non-dev-pkg-with-shlib-symlink is out [18:19:45] so the sonames is left [18:19:56] !log mlitn synchronized wmf-config/ 'Enable ArticleFeedbackv5 auto-archive' [18:20:05] Logged the message, Master [18:20:58] yeah [18:21:06] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 202 seconds [18:21:35] probably the package name hsould be libdclass-jni0 or libdclassjni0 [18:21:42] what do you think? [18:22:06] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [18:22:09] hmm [18:22:17] looks like there are a lot of -jni packages [18:22:18] hm [18:23:27] hmm, average, i'm not sure [18:23:33] what to do, maybe we should jsut ignore that error [18:23:44] paravoid? [18:23:46] opine? [18:25:29] !log authdns-update on ns0 [18:25:41] Logged the message, Mistress of the network gear. [19:10:32] Ryan_Lane: what is this and can it be closed? https://rt.wikimedia.org/Ticket/Display.html?id=2101 [19:13:58] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:14:38] notpeter: i think he rolled an image (ami?) with a bunch of things preinstalled. maybe that means it never happens any more? [19:15:10] (there was mail about it at the time) [19:15:23] jeremyb: yeah, i don't think it's still an issue [19:15:28] but I wanted to check with ryan [19:15:48] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:56] notpeter: can be closed [19:15:58] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [19:16:06] Ryan_Lane: thanks! [19:16:49] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [19:19:48] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:48] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [19:46:05] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
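[Note: if the team decides to ignore package-name-doesnt-match-sonames rather than rename the package, lintian supports per-package override files. A minimal sketch, with the package and SONAME names taken from the thread:]

    # debian/libdclass0-jni.lintian-overrides, installed by dh_lintian
    libdclass0-jni: package-name-doesnt-match-sonames libdclassjni0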
[19:47:56] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [19:59:33] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72646 [20:10:03] New patchset: Hashar; "get rid of GlusterFS on deployment-prep labs project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72504 [20:10:48] New review: Hashar; "Made the patch slightly less uglier following Ryan email conversation and inline diff. Thank you Ryan!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72504 [20:12:31] !log kaldari synchronized wmf-config/InitialiseSettings.php 'activating Disambiguator extension on all WMF wikis' [20:12:49] Logged the message, Master [20:15:39] Hm.. It seems https://bits.wikimedia.org/geoiplookup was down for a few minutes (it was outputting an empty object on HTTPS, HTTP was fine) [20:15:44] can't reproduce it anymore though [20:15:55] e.g. "Geo = {}" [20:16:00] HTTP 200 OK [20:16:12] Krinkle: ipv6 maybe ? [20:16:42] Yes, my ISP does support that. I get IPv4 and IPv6. [20:17:40] hashar: Why? Are you saying we don't handle Ipv6? [20:17:57] And if not, why would it vary from one request to another. [20:18:14] I can't remember the geoip url :/ [20:18:22] We do, but 1) we terminate it at a proxy, so there might be XFF recognition issues and 2) the IPv6 geoip DB is probably less extensive than the IPv4 one [20:18:28] ah http://geoiplookup.wikimedia.org [20:19:01] specially if your ISP opened up v6 recently [20:19:07] I'm able to reproduce it consistently when I refresh like 40 times [20:19:08] But if it varies from one request to the other, maybe there's a bad Varnish server in the pool [20:19:18] they need to gather some data from spywar^WFacebook users [20:19:20] Tracking headers now [20:19:31] try with curl -4 [20:19:39] and curl -6 to force the protocol [20:20:16] RoanKattouw: Hm.. no useful headers that I can see [20:20:32] We don't expose the host that served it on geoiplookup? [20:20:48] Don't we? Is there no cpNNNN in an X-Cache header? [20:21:04] bah geoiplookup.wikimedia.org does not even have an AAAA record for me [20:21:18] RoanKattouw: Yes, there is actually [20:21:22] hashar: https://bits.wikimedia.org/geoiplookup [20:21:59] RoanKattouw: Though it varies from one request to another, I have two tabs open both with "X-Cache:cp1057 miss (0)" of which one has Geo = { .. }; and one Geo = {} [20:22:01] so yeah same issue for me [20:22:11] curl -6 https://bits.wikimedia.org/geoiplookup [20:22:12] Geo = {} [20:22:37] Ugh [20:22:46] I am pretty sure we could collect the location ourselves and start a free localization IP database :-] [20:22:47] http://cl.ly/image/2H2W2r2G361t [20:22:55] Does curl -6 consistently produce {} ? [20:23:10] RoanKattouw: yes, because the IP is not found in the GeoIP db [20:23:20] at least that was the case when I have setup GeoIP on beta. [20:23:20] Indeed, IPv6 reproduces it consistently [20:23:31] I did a lookup in the proprietary db, I was not there [20:23:46] I wonder why I'm apparently hopping arbitrarily between 4 and 6 then? 
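[Note: a quick way to reproduce what is being observed above: force each IP protocol in turn and compare both the Geo payload and the X-Cache header that names the serving varnish. Stock curl and grep only:]

    for proto in -4 -6; do
        echo "== curl $proto =="
        # -s silences the progress bar, -D - dumps response headers to stdout
        curl $proto -s -D - https://bits.wikimedia.org/geoiplookup \
            | grep -E 'X-Cache|^Geo'
    done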
[20:23:59] I am not sure how the browser handle v4 / v6 [20:24:31] maybe it rounds robin between both dns entries [20:24:38] That's possible [20:24:48] It may also be connecting to both in parallel and using the first one that succeeds [20:25:00] yeah that is possible too [20:25:05] need to take some network traces [20:25:19] Due to various past problems with IPv6 connectivity, browsers have had to implement v6/v4 switching logic of their own [20:25:20] like it sends two syn request, and abort the slower connections [20:25:40] your ISP probably has an equivalent connectivity with knams on both protocol [20:25:51] sometime I get a faster connection over v6 than v4 :-] [20:26:02] (shorter path, probably less people) [20:26:09] One would hope the browser caches this information and not have each request be 2 requests [20:26:33] Wireshark will tell you what happens [20:26:39] Anyhow, I can consistently reproduce it working with curl -4 and not working with curl -6. And in Chrome I can get it pretty reliably once every 40-50 refreshes. [20:27:05] Krinkle: specially look at DNS queries on udp port 53 [20:27:21] kaldari: you there? how often are https://pl.wikipedia.org/wiki/Specjalna:DisambiguationPages and https://pl.wikipedia.org/wiki/Specjalna:DisambiguationPageLinks updated? i've already added the magic word to disambig templates on pl.wp [20:31:41] New review: coren; "Barring the capability for project-wide global puppet variables (which we'd really want, BTW), this ..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/72504 [20:32:35] heya manybubbles [20:32:42] i'm looking at jmxtrans right now [20:32:50] what happens for example, if 'slope' is not set in the attrs? [20:33:05] , "slope": "<%= object['attrs'][attr_name]['slope'].upcase %>" [20:33:21] ottomata: Huh - looks like a bug to me [20:33:27] aye ok, just checking [20:33:29] i'll add checking [20:33:30] I was supposed to make that no do bad things if it wasn't defined [20:33:34] thanks! [20:33:38] aye, yeah [20:33:40] i need more parameteres anywa [20:33:43] (dmax, etc.) [20:33:47] actually [20:33:50] how about we just loop it [20:33:57] and not check for parameter names at all [20:34:31] object['attrs'][attr_name].each_pair { |name, value| puts name: value ... [20:34:35] whatever [20:34:35] I _think_ it'll complain if you put a weird parameter in the wrong writer [20:34:38] yeah [20:34:42] but that's user error :) [20:34:51] easier than hardcoding all the possibilities in [20:35:17] I think that's ok. So long as we document it. 
[20:35:17] k [20:35:32] (if this was some real api i'd never do that, but whaaateeever, i assume anyone using jmxtrans module knows what theyr'e doing [20:35:44] the queries are so custom to jmx and whatever java app anyway ) [20:36:52] we can move groupName into the attrs then too, rather than as a separate define parameter [20:37:24] OOF, i don't feel like testing all that right now though :p [20:37:24] i'm adding a TODO [20:38:00] PROBLEM - mysqld processes on db78 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:41:30] ohhh, sorry manybubbles, i see what you are saying, the attrs['name'] hash might contain settings for multple writers [20:41:41] we'd either have to specify which settings are for which writers [20:41:50] or hmmm [20:41:51] yeah mh [20:41:52] hm [20:42:55] i'm just adding the hardcoded checks for now :) [20:45:20] RECOVERY - Solr on vanadium is OK: All OK [20:45:50] PROBLEM - DPKG on db78 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:46:22] MatmaRex: I now had a closer look on your merge ... [20:46:49] MatmaRex: diffs in gerrit are know to have their limits and problems ... [20:47:34] MatmaRex: The good news however is that upstream is just about to revamp diffing. [20:47:51] RECOVERY - DPKG on db78 is OK: All packages OK [20:47:53] heh, cool [20:48:04] qchris: will they also fix syntax highlighting throwing exceptions on Opera? [20:48:10] (that's reported as a bug) [20:48:21] MatmaRex: Opera is not supported :-/ [20:48:27] or, i should rather ask, could it [20:48:27] MatmaRex: So I doubt that. [20:48:30] MatmaRex, Opera dies sooner than they fix it [20:48:53] MaxSem: IT WILL NEVER DIE FOR MEEE [20:49:16] https://addons.mozilla.org/ru/firefox/collections/burntofferings/opera-style/ [20:49:34] my escape plan ^^^ X( [20:50:00] RECOVERY - mysqld processes on db78 is OK: PROCS OK: 1 process with command name mysqld [20:52:17] New review: Ryan Lane; "Ok for now, but we need to make sure it goes away whenever we kill off gluster." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72504 [20:53:01] MaxSem: i'll stay at 12.xx at least until i get my rightclick+scroll to change tabs function back [20:53:09] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72504 [20:53:16] i tried 15 for a while, but this was really annoying [20:53:27] also, i can't have vertical tabs anymore, apparently [20:54:00] also, howcan people live without mouse gestures in 2013 is just beyond me. [20:54:12] end rant. [20:55:11] MatmaRex: mouse gestures still exist? people use those? [20:55:31] Ryan_Lane: people don't? [20:55:47] so before anyone ask, Jenkins jobs are slow because Gerrit is slow right now :-] [20:55:52] MatmaRex, he's a maccer:P [20:56:03] mouse gestures suck :) [20:56:08] you expect me to hunt for that little "x" button when i can just, like, swipe my hand around? [20:56:18] or use Ctrl+W like a neanderthal? [20:56:20] so there are touch pad gestures instead;) [20:56:21] keyboard shortcut? [20:56:37] MaxSem: oh, i have those too :P [20:56:38] I don't like using a mouse [20:56:46] it feels like I'm in the stone ages [20:57:03] man, gerrit is slow as hell right now [20:57:14] MaxSem: (actually implemented them myself.) 
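[Note: the "hardcoded checks" ottomata settles on above would look something like this in the ERB template quoted earlier, guarding each optional writer setting before calling a method on it. A sketch, not the actual patch:]

    <%# only emit slope when the caller set it; otherwise
        object['attrs'][attr_name]['slope'] is nil and .upcase raises %>
    <% if object['attrs'][attr_name].has_key?('slope') -%>
    , "slope": "<%= object['attrs'][attr_name]['slope'].upcase %>"
    <% end -%>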
[20:58:00] PROBLEM - mysqld processes on db78 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:58:21] MaxSem: also, that extensions pack seems nice [20:58:37] (apart from the fact that these, being extensions, would grind my computer to a halt) [21:00:00] RECOVERY - mysqld processes on db78 is OK: PROCS OK: 1 process with command name mysqld [21:03:37] Gerrit seems to be down entirely now, no longer 503 or timeout, just unavailable [21:03:53] yeah, same here [21:03:57] "Could not connect to remote server" [21:04:19] it's doing something; I managed to push a commit; but it took a looong time [21:04:38] jenkins seems lagged as well [21:05:01] i +2'd a patch and it hasn't even starter the "gate and submit" process [21:05:16] https://gerrit.wikimedia.org/r/#/c/72831/ (incore) [21:05:40] that is, judging by the mails i've gotten, since the webinterface is dead [21:09:46] Not just me then [21:10:39] Let's ese [21:10:46] Do you guys know about the zuul dashboard? [21:10:54] https://integration.wikimedia.org/zuul/ [21:11:05] It seems to be quite backlogged, 119 events [21:11:35] It's known to build up a backlog when l10n-bot runs, but that's usually earlier in the day (like 90ish minutes ago) [21:11:46] scapping... [21:12:03] RoanKattouw, it was 20 minutes ago [21:12:16] !log authdns-update for bastion5001.mgmt [21:12:26] Logged the message, Master [21:12:31] Aha [21:20:02] paravoid: wow, I just noticed you reviewed the giant EL puppet patch. i really appreciate it! [21:20:15] it wasn't so giant [21:20:24] !log maxsem Started syncing Wikimedia installation... : Weekly mobile deployment [21:20:33] Logged the message, Master [21:20:37] paravoid: ok, i don't appreciate it then :P [21:20:55] seriously, tho, thanks. [21:21:47] RoanKattouw: whydoes it even trigger for l10n-bot stuff if that bot self-mergesregardless of whether tests pass or not? [21:21:56] (or at least it did last time i checked when it broke the master) [21:22:08] Yeah, it does that [21:22:10] and where is hashar when i want to shout at him. :P [21:22:27] springle: https://bugzilla.wikimedia.org/showdependencytree.cgi?id=49188&hide_resolved=1 [21:22:43] :) [21:23:03] https://bugzilla.wikimedia.org/buglist.cgi?f1=blocked&o1=substring&emailtype1=exact&query_format=advanced&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=greg%40wikimedia.org&v1=49188&list_id=215523 [21:23:26] (sorry -operations about that long link :) ) [21:23:41] thanks! [21:26:24] !log db78 switched to mariadb [21:26:34] Logged the message, Master [21:29:17] paravoid: '/sbin/nologin' is indeed a redhatism, but debian's 'login' provides '/usr/sbin/nologin'. The l10nupdate user had its shell incorrectly set to /bin/false and IIRC that was initially confusing to whomever was debugging it, which is why I thought of nologin. Which do you prefer? [21:30:25] ori-l: /bin/false is the standard in Debian [21:30:34] ori-l: have a look at your /etc/passwd [21:30:54] sshd:x:104:65534::/var/run/sshd:/usr/sbin/nologin [21:31:17] greg-g: bugzilla [[ style links default to en.wikipedia.org, use [[wikitech:Schema change]] to make it work :) [21:36:43] Nikerabbit: hi, are you about by any chance? [21:38:20] notpeter: if it is urgent or very little effort [21:39:05] milimetric: I am doing the rt triage thing this week and saw https://rt.wikimedia.org/Ticket/Display.html?id=5085 [21:39:11] er Nikerabbit [21:39:14] sorry milimetric [21:39:24] is help still needed on that? [21:39:48] Krinkle: where did I do it wrong? 
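[Note: on the nologin exchange above, inspecting and setting a system account's login shell is two commands. A minimal sketch; the username is taken from the thread:]

    getent passwd l10nupdate               # the last field is the shell
    sudo usermod -s /bin/false l10nupdate  # the Debian convention, per paravoid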
[21:39:49] (clearly doesn't need to be at this very moment can work on it tomorrow when it's not so late your time, but wanted to check in) [21:39:52] notpeter: if it's the vanadium thing, no [21:40:05] Nemo_bis: ok, cool. thank you [21:40:22] trying to get to all the things that may have slipped through the cracks! [21:40:25] :) no prob [21:40:27] I'll close the ticket now, then [21:40:33] good luck triaging [21:40:53] notpeter: iirc the decision was to move all that stuff off of vanadium right? [21:41:06] !log maxsem Finished syncing Wikimedia installation... : Weekly mobile deployment [21:41:10] Logged the message, Master [21:41:11] no idea what it is about sorry, I've never logged in to RT [21:42:07] Nikerabbit: ok, no worries [21:42:24] ori-l: no idea! is that the case? [21:42:34] Nikerabbit, "Search/translation memory on vanadium Solr broken" [21:43:45] mark -- curious -- did you ever enable esi on bits? I'm considering writing an experiment for centralnotice that would use it [21:44:01] notpeter: I think binasher suggested that during one of the monday ops sessions, but I no longer remember who agreed to take it on. [21:44:01] greg-g: https://bugzilla.wikimedia.org/show_bug.cgi?id=49190#c1 [21:44:13] yepp, that one is fixed for now. I expect there to be further changes in the future to get off of vanadium [21:44:16] heh, I can make a ticket to requisition a box for it [21:44:28] Nikerabbit: okie dokie [21:44:37] Krinkle: ah, right, yeah [21:45:05] notpeter: I have *no* idea what the requirements are for that service, and whether the plan was to move it to another host that is doing solr work or to provision a new server.. sorry, completely clueless about this. [21:45:20] ori-l: no worries [21:45:27] I'll try to email the correct people [21:47:21] PROBLEM - Host db78 is DOWN: PING CRITICAL - Packet loss = 100% [21:48:51] RECOVERY - Host db78 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [21:51:02] PROBLEM - mysqld processes on db78 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:55:21] PROBLEM - RAID on db78 is CRITICAL: CRITICAL: Degraded [22:01:54] !log maxsem synchronized php-1.22wmf8/extensions/MobileFrontend [22:02:04] Logged the message, Master [22:03:27] !log maxsem synchronized php-1.22wmf9/extensions/MobileFrontend [22:03:37] Logged the message, Master [22:15:04] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [22:38:29] RECOVERY - Disk space on virt6 is OK: DISK OK [23:16:07] !log mwalker synchronized php-1.22wmf9/extensions/ContributionReporting 'Updating ContributionReporting for 2013 fundraiser' [23:16:17] Logged the message, Master