[00:00:39] (03CR) 10Ori.livneh: [C: 032] wfdebug-ganglia: set reporting interval to 60 seconds [operations/puppet] - 10https://gerrit.wikimedia.org/r/91807 (owner: 10Ori.livneh) [00:03:04] !log kaldari synchronized php-1.23wmf1/extensions/MobileFrontend/ 'Updating MobileFrontend for cherrypick' [00:03:15] Logged the message, Master [00:04:46] (03PS1) 10Ryan Lane: Reduce salt call timeout to 1s for fetch/checkout [operations/puppet] - 10https://gerrit.wikimedia.org/r/91809 [00:04:53] ^^ \o/ [00:05:16] 58 second reduction in deployment overhead from salt [00:06:40] !log kaldari synchronized php-1.22wmf22/extensions/MobileFrontend/ 'Updating MobileFrontend for cherrypick' [00:06:53] Logged the message, Master [00:07:08] (03CR) 10Ryan Lane: [C: 032] Reduce salt call timeout to 1s for fetch/checkout [operations/puppet] - 10https://gerrit.wikimedia.org/r/91809 (owner: 10Ryan Lane) [00:13:27] # INFO : Step 'sync' finished. Started at 2013-10-25 00:12:55; took 19 seconds to complete [00:13:28] \o/ [00:13:58] !log kaldari synchronized php-1.22wmf22/extensions/MobileFrontend/ 'Updating MobileFrontend for cherrypick' [00:49:48] !log kaldari synchronized php-1.22wmf22/extensions/MobileFrontend/ 'Updating MobileFrontend for cherrypick' [01:01:16] !log kaldari synchronized php-1.22wmf22/extensions/MobileFrontend/ 'Updating MobileFrontend for cherrypick' [01:04:11] !log kaldari synchronized php-1.22wmf22/extensions/MobileFrontend/ 'Updating MobileFrontend for cherrypick, one more time with feeling' [01:04:25] Logged the message, Master [01:06:15] (03PS1) 10Yurik: Fixed incorrect check against empty string in Varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/91813 [01:56:37] i'm out of battery and heading home. 
would be gone if someone could check noc@ again in the next hour or two [01:56:53] will get back on from home [02:09:39] jeremyb: no news to noc@ , no reply from smart, Leslie contact Philippines chapter for help, on wikimedia-ph list [02:10:24] I don't see why we're spending all this time tbh [02:10:33] there's a crappy ISP somewhere in the world [02:11:09] that has a broken network and doesn't respond to notices from others or listen to their customers [02:11:14] oh well? [02:11:20] heh, yea, it's a good idea to let wikimedia-ph handle more of it [02:11:25] they are local to the ISP [02:11:26] or noone? [02:11:51] jeremyb: archives not public, oh well , but in there https://lists.wikimedia.org/mailman/private/wikimedia-ph/ [02:12:22] paravoid: yea, the users just think it's us and dont know they should complain to their provider [02:12:33] so we get all the OTRS tickets jeremyb handles [02:12:55] besides that, yea, agree [02:13:39] decentralize to chapter ++ [02:14:56] I didn't say exactly that :() [02:14:57] :) [02:15:25] I'm saying that we've done more than enough, we should just stop caring now [02:15:35] I'm sure there are multiple broken ISPs in the world, we can't fix everything [02:16:04] just tell the users that it's their ISP's problem, we did everything we could to contact them and failed [02:16:25] time for them to switch ISPs :) [02:16:32] yea, basically what we're doing, we tell them to contact ISP [02:16:50] !log LocalisationUpdate completed (1.22wmf22) at Fri Oct 25 02:16:50 UTC 2013 [02:17:05] Logged the message, Master [02:19:53] paravoid: fwiw, i get the impression that this ISP didn't break until ~2 weeks ago [02:20:01] so? [02:20:23] so, maybe they have a log of what changed 2 weeks ago? [02:20:32] I really don't care? 
:) [02:20:37] ok [02:20:54] we're pretty sure they are the ones that are broken [02:21:02] we've notified them [02:21:26] I mean, feel free to pursue this further, I won't stop you :) [02:22:01] well kul has apparently had recent contact with them [02:22:08] so that's one route [02:22:32] and i was thinking about trying one other way [02:23:08] !log starting recentchanges OSC bug 55844 [02:23:23] Logged the message, Master [02:23:55] i'm kinda baffled why they're active on twitter but ignoring us [02:24:11] but yeah, it's their problem [02:24:39] oh yea, the WP Zero route [02:24:57] right [02:25:06] jeremyb: did you tweet to them then ? [02:25:11] yes [02:25:19] well, then that's really all you could do [02:26:23] unless you wanna subscribe to wikimedia-ph to follow up with them , heh [02:26:41] well smart just started following me personally on twitter [02:26:47] and replied [02:26:54] there you go ,, and? [02:27:10] they want my number [02:27:12] :P [02:27:24] duh :p but something [02:27:29] with their service which i don't subscribe to [02:27:31] yes, something [02:27:45] they think you are customer in .ph i bet [02:27:57] yeah [02:28:18] no, it's Wikipedia [02:28:21] :) [02:29:01] goes back to setting up bed [02:29:23] tell me what happened later, now i wanna know the ending [02:29:44] uhhh, ikea? is it puppetized? [02:30:26] haha, kind of, it uses all kinds of thumbnails for instructions that are not under cc, and needs base::tools::hammer [02:33:54] oh, they have a standard toolkit you can buy :P [02:33:59] how's the i18n? [02:34:29] solved by just using pictograms [02:35:02] role::shelf::billy isn't included yet [02:36:54] jeremyb: in the future Ikea will be like thingiverse.com and you 3D-print it and just pay the license, but we'll have free furniture on commons :) ttyl [02:38:29] haha [02:38:48] you could rent a drive up printer at uhaul? 
[02:38:49] :P [02:48:11] !log LocalisationUpdate completed (1.23wmf1) at Fri Oct 25 02:48:10 UTC 2013 [02:48:29] Logged the message, Master [03:02:06] springle: Hey. are the archive and externallinks pk additions all done? [03:02:18] Reedy: no [03:06:16] Reedy: couple more days. have some recentchanges OSC going on simultaneously. archive/externallinks need master rotations which will happen afterwards [03:06:52] I [03:07:11] plus S6 is held up with the UpdateCollation job [03:07:55] cool. wasn't sure where things were, and then saw the recentchanges stuff starting [03:08:50] lots of improvements happening :) [03:09:03] slowly :) [03:10:40] well, if you have a time machine... [03:21:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 25 03:21:39 UTC 2013 [03:21:58] Logged the message, Master [05:45:18] hi [05:45:43] I have a question related to categories [05:45:59] I have a limited understanding of the mediawiki schema [05:46:35] so, I was reading mediawiki documentation in an attempt to understand how I could derive category history (and category events) based on what is available in the mediawiki database [05:49:28] another question which might be a bit strange is, I'm trying to count the number of total revisions inside the enwiki database [05:50:03] So far I've tried these two SELECT COUNT(*) FROM revision; SELECT COUNT(rev_id) FROM revision; [05:50:07] they both take a lot of time [05:50:30] so then I thought "ok, maybe I can just get the max rev_id" [05:51:11] SELECT MAX(rev_id) FROM revision; [05:51:14] I got back 578653633 [05:51:33] but it doesn't help since there are missing revisions [05:51:46] because they were deleted and moved into the archive table? [05:53:09] what's the name of the archive table please ? [05:56:04] legoktm: you know what I was thinking about deriving category history ? 
I was thinking, since the dump is public data, I can throw it up on AWS, spin up a bunch of instances, let them make easy work of the ~600M revisions to parse them with reverse regexes(since the categories will be at the end of each revision text) and get the categories for articles and their evolution across time [05:56:20] heh [05:56:21] and after that's done, having a cronjob run once in a while could keep that up-to-date [05:56:23] that would be interesting. [06:06:43] but the problem is that I'm not sure if the dumps contain rev_ids [06:06:55] so I couldn't link them back to the enwiki database [06:07:15] and the dump probably doesn't contain some revisions as they're archived, not sure about that one [06:07:21] they have rev_ids [06:07:32] oh that's interesting [06:07:36] but they wouldnt have deleted revisions if the revision was deleted prior to dump creation [06:08:12] true [06:09:07] legoktm: do they contain category ids as well ? [06:09:17] i dont think so [06:09:35] iirc, you can download dumps of the categorylinks sql table [06:15:34] legoktm: so this weird idea, what kind of things would it catch that aren't already in the category , categorylinks table ? [06:15:42] legoktm: the Template: case that you told me about [06:15:47] legoktm: are there others like it ? 
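The reverse-regex idea above can be sketched roughly as follows (a minimal Python sketch, with made-up revision text; it assumes the plain English `[[Category:...]]` link syntax, so localized prefixes, category aliases, and template-added categories would all be missed):

```python
import re

# Category links normally sit near the end of a revision's wikitext:
#   [[Category:Physics]]  or  [[Category:Physics|sort key]]
# Assumes the English "Category:" prefix only; localized prefixes,
# aliases, and categories added through templates are not caught.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")

def categories(wikitext):
    """Return category names linked directly from one revision's text."""
    return [name.strip() for name in CATEGORY_RE.findall(wikitext)]

# Diffing consecutive revisions of a page yields add/remove "category events".
rev_1000 = "Article text.\n[[Category:Physics]]"
rev_1001 = "Article text.\n[[Category:Physics]]\n[[Category:Optics|Lens]]"

added = set(categories(rev_1001)) - set(categories(rev_1000))
removed = set(categories(rev_1000)) - set(categories(rev_1001))
print(added, removed)   # {'Optics'} set()
```

Run per page over the dump's revision stream, this gives category add/remove events per rev_id; a periodic incremental pass over new revisions could then keep it current.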
[06:16:17] the template case would be in the categorylinks table [06:16:31] just not easily parseable from revision text [08:02:14] (03CR) 10Ori.livneh: "It eats up all the CPU on my VM doing absolutely nothing:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 (owner: 10Physikerwelt) [08:28:52] (03PS1) 10ArielGlenn: put the glusterfs mount back (dumps rsyncs) [operations/puppet] - 10https://gerrit.wikimedia.org/r/91826 [08:29:57] (03CR) 10ArielGlenn: [C: 032] put the glusterfs mount back (dumps rsyncs) [operations/puppet] - 10https://gerrit.wikimedia.org/r/91826 (owner: 10ArielGlenn) [09:39:59] (03PS1) 10Hashar: ctags configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/91836 [10:00:03] (03CR) 10Akosiaris: [C: 032] ctags configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/91836 (owner: 10Hashar) [10:19:23] out for lunch [10:40:18] (03CR) 10Mark Bergsma: [C: 04-1] "Yay!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91492 (owner: 10Ori.livneh) [11:09:47] (03PS2) 10TTO: redirect vikipedi[a].com.tr to tr.wikipedia.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/88705 (owner: 10Dzahn) [11:30:36] mark, around? [11:30:48] yes [11:30:56] mark, i'm trying to make sense of the varnish log, you might be able to help :) [11:31:02] i simulated the crash in beta [11:32:21] so i discovered that one of the (minor) bugs was in checking -- https://gerrit.wikimedia.org/r/#/c/91813/ [11:32:28] mark ^ [11:33:59] so how does that cause a crash? [11:34:42] PROBLEM - Disk space on copper is CRITICAL: DISK CRITICAL - free space: / 355 MB (3% inode=90%): [11:35:44] mark - that bug - doesn't, just causes an error in the log - because netmapper gets an empty string instead of an IP [11:36:00] mark, could you connect to beta labs - ssh 10.4.1.82 [11:36:14] and look at /etc/varnish/tmp.txt [11:36:40] what's the hostname for that ip? 
[11:37:08] deployment-cache-mobile01, but DNS was broken a few days ago [11:37:50] the log shows tons of "0 WorkThread - 0x7ff6267f2aa0 start" entries [11:38:42] mark, once you open the file, search for "FORCE" [11:39:03] the first line -- 11 SessionOpen c 109.172.15.11 53958 :80 [11:39:12] right above the FORCE [11:39:48] ok [11:49:54] ...and then? [11:49:55] :) [11:52:33] hmm, mark, i just tried something else and getting weird responses from the backend, postpone your inquiry for a bit please [11:52:47] * mark goes to lunch then :) [11:52:51] i think it might have been a few bugs [11:52:53] :) [11:53:03] sure, but please +2 that little varnish change [11:53:24] mark^! [11:53:29] :) [11:57:06] and btw, one other reason for the failure - it seems varnish doesn't handle onerror="continue" esi param [12:01:52] PROBLEM - Disk space on cp1046 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7475 MB (2% inode=99%): /srv/sdb3 7353 MB (2% inode=99%): [12:01:52] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: NRPE: Unable to read output [12:04:02] PROBLEM - DPKG on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:04:53] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:05:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:05:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:06:49] PROBLEM - SSH on cp1046 is CRITICAL: Server answer: [12:08:49] RECOVERY - SSH on cp1046 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:09:39] PROBLEM - Varnish traffic logger on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[12:14:39] RECOVERY - Varnish traffic logger on cp1046 is OK: PROCS OK: 2 processes with command name varnishncsa [12:15:49] PROBLEM - Disk space on cp1046 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7474 MB (2% inode=99%): /srv/sdb3 7352 MB (2% inode=99%): [12:15:59] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: NRPE: Call to popen() failed [12:16:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:16:59] PROBLEM - DPKG on cp1046 is CRITICAL: NRPE: Unable to read output [12:17:09] PROBLEM - RAID on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:18:09] RECOVERY - RAID on cp1046 is OK: OK: no RAID installed [12:19:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:20:49] PROBLEM - Disk space on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:21:59] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:22:39] PROBLEM - Varnish traffic logger on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:22:49] PROBLEM - SSH on cp1046 is CRITICAL: Server answer: [12:22:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:23:39] RECOVERY - Varnish traffic logger on cp1046 is OK: PROCS OK: 2 processes with command name varnishncsa [12:23:49] RECOVERY - SSH on cp1046 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:26:39] PROBLEM - Varnish traffic logger on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:27:39] RECOVERY - Varnish traffic logger on cp1046 is OK: PROCS OK: 2 processes with command name varnishncsa [12:27:49] PROBLEM - SSH on cp1046 is CRITICAL: Server answer: [12:27:49] PROBLEM - Disk space on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[12:27:59] PROBLEM - DPKG on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:28:09] PROBLEM - RAID on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:28:59] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:30:39] PROBLEM - Varnish traffic logger on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:30:49] RECOVERY - SSH on cp1046 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:30:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:30:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:33:09] RECOVERY - RAID on cp1046 is OK: OK: no RAID installed [12:33:39] RECOVERY - Varnish traffic logger on cp1046 is OK: PROCS OK: 2 processes with command name varnishncsa [12:34:59] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:34:59] PROBLEM - DPKG on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:35:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:36:39] PROBLEM - Varnish traffic logger on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:36:49] PROBLEM - SSH on cp1046 is CRITICAL: Server answer: [12:37:39] RECOVERY - Varnish traffic logger on cp1046 is OK: PROCS OK: 2 processes with command name varnishncsa [12:38:49] RECOVERY - SSH on cp1046 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:38:59] PROBLEM - DPKG on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:39:09] PROBLEM - RAID on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:40:59] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[12:41:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:42:09] PROBLEM - RAID on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:43:49] PROBLEM - Disk space on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:43:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:44:09] RECOVERY - RAID on cp1046 is OK: OK: no RAID installed [12:44:59] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:45:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:46:49] PROBLEM - Disk space on cp1046 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7473 MB (2% inode=99%): /srv/sdb3 7351 MB (2% inode=99%): [12:48:59] PROBLEM - DPKG on cp1046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:49:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:52:19] PROBLEM - RAID on cp1046 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:52:49] PROBLEM - Varnish traffic logger on cp1046 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:52:59] PROBLEM - SSH on cp1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:39] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:54:09] PROBLEM - Varnish HTCP daemon on cp1046 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:54:09] PROBLEM - Varnish HTTP mobile-backend on cp1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:54:09] PROBLEM - DPKG on cp1046 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:58:09] RECOVERY - Varnish HTTP mobile-backend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 6.871 second response time [12:58:09] RECOVERY - RAID on cp1046 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [12:58:39] RECOVERY - Varnish traffic logger on cp1046 is OK: PROCS OK: 2 processes with command name varnishncsa [12:58:49] RECOVERY - SSH on cp1046 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:58:59] RECOVERY - Varnish HTCP daemon on cp1046 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [12:58:59] RECOVERY - DPKG on cp1046 is OK: All packages OK [12:59:29] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.001 second response time [13:02:49] !log Rebooting cp1046, xfs mem alloc issues [13:03:05] Logged the message, Master [13:04:29] PROBLEM - Host cp1046 is DOWN: PING CRITICAL - Packet loss = 100% [13:04:49] Hm. Anyone knows where in git the wikitech config is? I see it on the local box, but it doesn't seem to be a checkout. [13:05:34] is it in git? :) [13:06:15] RECOVERY - Host cp1046 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [13:06:25] ... I sure as hell /hope/ so. [13:06:31] i suspect it may not be [13:06:36] * Coren shudders. [13:06:40] How... uncouth. [13:07:12] it was an unmaintained completely isolated setup on some random vhost in some network until ryan migrated it like 6 months ago or so? [13:08:24] Ah. Fun. 
[13:26:57] (03PS1) 10Mark Bergsma: Use Varnish by default for role::cache::text [operations/puppet] - 10https://gerrit.wikimedia.org/r/91863 [13:26:58] (03PS1) 10Mark Bergsma: Add ulsfo text caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/91864 [13:28:45] (03CR) 10Mark Bergsma: [C: 032] Use Varnish by default for role::cache::text [operations/puppet] - 10https://gerrit.wikimedia.org/r/91863 (owner: 10Mark Bergsma) [13:30:39] (03CR) 10Physikerwelt: "I can not run the current version of the puppet script" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 (owner: 10Physikerwelt) [13:32:31] (03CR) 10Physikerwelt: "I was doing too many things in parallel... I wanted to write" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 (owner: 10Physikerwelt) [13:35:05] PROBLEM - SSH on lvs6 is CRITICAL: Server answer: [13:37:05] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:40:05] PROBLEM - SSH on lvs6 is CRITICAL: Server answer: [13:41:05] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:47:32] So, someone reached out regarding receiving 504 errors when trying to access the sites - sadly with little more information than that. [13:47:44] I realized, though, that I don't know where the right place is to forward said issues. 
[13:47:55] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [13:48:42] RT [13:48:46] leslie has been on it for days already [13:48:55] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:48:57] there seems to be an ISP in the philippines with a broken proxy [13:49:20] and she can't get ahold of them in any way, she's even tried reaching out to Wikimedia chapter in the Philippines now [13:49:22] Yeah, I saw that - I'm not sure this is the same issue (they mentioned french), but I'll forward it that way [13:49:37] ok [13:49:41] Thanks :) [13:50:02] if it's oceania that could be ulsfo [13:50:06] which is a slightly different SSL setup [13:50:14] but the existing issues so far that I've seen have not been ulsfo [13:51:27] I've reached out for more info, with a reply-to of ops-requests@ [13:58:35] RECOVERY - Disk space on copper is OK: DISK OK [13:59:19] !log shot swiftrepl on copper, almost out of space, tossed the log... restart as you like [13:59:34] what is almost out of space? [13:59:34] Logged the message, Master [13:59:46] 100mb or so left [13:59:52] would be gone in 30 mins [13:59:53] of what? [13:59:54] or lss [13:59:56] on / [14:00:04] of the log of copper? [14:00:08] could you just ask first before killing? [14:00:19] I have already talked to faidon about this for several days [14:00:24] he said it was fine to shoot it [14:00:53] he didn't want to truncate the log? [14:00:56] no [14:01:10] he doesn't use the log for anything except 'is the job done yet?' [14:01:40] yeah I know, so shouldn't it be better to truncate the log and keeping the process running? 
:) [14:01:52] we did this same drill a few days ago: shoot job, toss log, restart it [14:02:14] ah, because the fd doesn't get released while the process runs [14:02:27] exactly [14:48:29] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo text caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/91864 (owner: 10Mark Bergsma) [15:07:37] PROBLEM - Varnish HTTP text-frontend on cp4009 is CRITICAL: Connection refused [15:07:37] PROBLEM - Varnish traffic logger on cp4016 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:07:37] PROBLEM - Varnish traffic logger on cp4010 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:07:47] PROBLEM - Varnish traffic logger on cp4009 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:07:57] PROBLEM - HTTPS on cp4010 is CRITICAL: Connection refused [15:08:07] PROBLEM - HTTPS on cp4016 is CRITICAL: Connection refused [15:08:17] PROBLEM - HTTPS on cp4009 is CRITICAL: Connection refused [15:08:37] PROBLEM - Varnish HTTP text-frontend on cp4016 is CRITICAL: Connection refused [15:08:37] PROBLEM - Varnish HTTP text-frontend on cp4010 is CRITICAL: Connection refused [15:12:09] with bad settings. Some stuff continued to work with the bad settings. Some [15:12:12] stuff didn't. Expect to see some more bugs filed around how we make sure this [15:12:15] eek [15:12:18] bad paste [15:15:42] (03PS3) 10Ottomata: Updating with recent upstream changes to varnishkafka.conf [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/91664 [15:18:28] greg-g: if all goes well, next week I think I'll put text/wiki traffic on varnish in ulsfo. 
That would also include wikipedia, but as it's "only" for OC, it wouldn't be a lot of traffic [15:18:48] I don't know when exactly yet, but I'll do it during European work hours, well outside of any SF deployment windows [15:19:55] mark: sweet [15:20:41] mark: that might be worth a note in the deploy highlights email I write, since its wikipedias. Have a favorite day or two? [15:21:15] monday or tuesday I think, but I can't guarantee it [15:21:31] :) [15:22:00] I'll just say "the week of..." ;) [15:22:05] yeah [15:29:39] (03PS1) 10Mark Bergsma: Add Text caches ulsfo cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91877 [15:31:02] Hm… right now all the logrotate scripts are in files/logrotate. Seems to me there's a major design choice here -- do things like that get grouped with the tool they configure, or with the tool that uses them? [15:31:10] (03PS1) 10Odder: (bug 40941) Increase font size in Gerrit diff messages [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 [15:31:13] For instance, I'd think that the gluster logrotate config would go in the gluster module [15:33:37] (03CR) 10Mark Bergsma: [C: 032] Add Text caches ulsfo cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91877 (owner: 10Mark Bergsma) [15:34:09] andrewbogott: yeah that makes sense [15:34:18] I consider things like logrotate generic infrastructure used by other more specific modules [15:34:33] * andrewbogott nods [15:34:55] yup, we had that discussion before with nagios plugins [15:35:07] and ganglia plugins [15:35:12] and a bunch more yeah [15:35:12] right [15:44:31] (03PS1) 10Cmjohnson: Removing site.pp entries for wm126-134 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91882 [15:44:38] andrewbogott: btw I have already modularized install-server. So dont put it in your plans for modularization. I am holding it back for some changes in network.pp so I can use ferm and drop the iptables rules in there (yuck!) 
[15:45:17] akosiaris: thanks, I will add this to https://wikitech.wikimedia.org/wiki/Puppet_Todo [15:46:26] :) [15:48:08] apergos: https://gerrit.wikimedia.org/r/91882 [15:51:13] cmjohnson1: dhcp entries? [15:52:57] apergos: we should probably take out mgmt as well. it's not like they're going to be used in the same place again [15:53:13] (03PS1) 10Andrew Bogott: Move generic::gluster* into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/91884 [15:53:24] after they're powered down [15:54:13] sure [15:55:37] RECOVERY - Varnish HTTP text-frontend on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 0.146 second response time [15:55:47] RECOVERY - Varnish traffic logger on cp4009 is OK: PROCS OK: 2 processes with command name varnishncsa [15:57:37] RECOVERY - Varnish HTTP text-frontend on cp4010 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 0.150 second response time [15:57:37] RECOVERY - Varnish HTTP text-frontend on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 0.150 second response time [15:57:37] RECOVERY - Varnish traffic logger on cp4016 is OK: PROCS OK: 2 processes with command name varnishncsa [15:57:37] RECOVERY - Varnish traffic logger on cp4010 is OK: PROCS OK: 2 processes with command name varnishncsa [16:00:19] (03PS2) 10Andrew Bogott: Move generic::gluster* into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/91884 [16:07:35] PROBLEM - Varnish HTTP text-frontend on cp4018 is CRITICAL: Connection refused [16:07:45] PROBLEM - Varnish traffic logger on cp4018 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [16:07:55] PROBLEM - Varnish HTTP text-frontend on cp4017 is CRITICAL: Connection refused [16:08:05] PROBLEM - Varnish traffic logger on cp4017 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [16:08:15] PROBLEM - HTTPS on cp4018 is CRITICAL: Connection refused [16:08:25] PROBLEM - HTTPS on cp4008 is CRITICAL: Connection refused [16:08:35] PROBLEM - HTTPS on cp4017 
is CRITICAL: Connection refused [16:14:08] (03PS1) 10Cmjohnson: Removing mgmt ip's for mw126-135 [operations/dns] - 10https://gerrit.wikimedia.org/r/91885 [16:16:50] (03PS2) 10Cmjohnson: Removing site.pp entries for wm126-134 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91882 [16:19:05] RECOVERY - Varnish traffic logger on cp4017 is OK: PROCS OK: 2 processes with command name varnishncsa [16:19:15] (03PS3) 10Cmjohnson: Removing site.pp entries for wm126-134 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91882 [16:19:29] (03CR) 10Cmjohnson: [C: 032] Removing site.pp entries for wm126-134 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91882 (owner: 10Cmjohnson) [16:19:55] RECOVERY - Varnish HTTP text-frontend on cp4017 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 0.148 second response time [16:21:56] (03CR) 10Cmjohnson: [C: 032] Removing mgmt ip's for mw126-135 [operations/dns] - 10https://gerrit.wikimedia.org/r/91885 (owner: 10Cmjohnson) [16:22:29] !log dns update [16:22:42] Logged the message, Master [16:24:01] (03PS1) 10Vogone: Added filemover user group to bnwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91886 [16:29:35] RECOVERY - Varnish HTTP text-frontend on cp4018 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 0.150 second response time [16:29:45] RECOVERY - Varnish traffic logger on cp4018 is OK: PROCS OK: 2 processes with command name varnishncsa [16:50:45] PROBLEM - Disk space on cp1051 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 13277 MB (4% inode=99%): /srv/sdb3 12348 MB (3% inode=99%): [16:54:58] (03PS1) 10Dzahn: fix wrong reverse DNS for mw110 and mw110.mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/91891 [16:58:52] (03CR) 10Dzahn: [C: 032] "pmtpa/apaches:{ 'host': 'mw110.pmtpa.wmnet', 'weight': 200, 'enabled': False } #won't image properly, needs work" [operations/dns] - 10https://gerrit.wikimedia.org/r/91891 (owner: 10Dzahn) [16:59:29] !log DNS update, fix mw110 reverse entries 
[16:59:44] Logged the message, Master [17:03:34] mutante: that host was never installed (completely) as it turns out... now there is not much point either [17:04:08] apergos: yea, it wasn't installed because it didn't work because of the above. just resolving RT #6086 by Chris [17:04:40] mutante: cool..i am going to create RT to decom 110 [17:04:40] Is it relatively new hardware? [17:04:47] reedy yes [17:05:02] just didnt want the broken entry either way [17:05:15] yeah..same here, apergos and I were debating on fixing or decomming [17:05:32] cmjohnson1: eh, ok.. in that case .. yea well [17:05:33] neither of us were passionate either way so decom won out [17:05:35] it didn't take long [17:14:34] (03Draft5) 10Akosiaris: Modularizing puppetmaster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91353 [17:29:21] (03CR) 10Eloquence: [C: 04-1] "I'm sorry, but I don't agree re: the username issue. Let's not hardcode past idiosyncrasies in a way that makes things more confusing in t" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [17:29:23] (03CR) 10Chad: [C: 031] "Upstream won't take it. They want everything as small and compact as possible." [operations/puppet] - 10https://gerrit.wikimedia.org/r/91879 (owner: 10Odder) [17:30:27] (03CR) 10Eloquence: "I like "MediaWiki message delivery", the alternative suggested by Nemo. Descriptive and clear." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [17:35:54] (03PS3) 10Legoktm: Enable MassMessage on all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 [17:37:24] (03CR) 10Legoktm: "Switched the username to "MediaWiki message delivery" which I've created a global account for." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [17:47:54] (03PS1) 10Cmjohnson: Decommission mw110 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91897 [17:56:43] (03CR) 10Reedy: [C: 04-1] "[18:52:17] Reedy: https://gerrit.wikimedia.org/r/#/c/91344/3/wmf-config/InitialiseSettings.php shouldn't that be using $wmg?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [17:57:01] (03CR) 10Cmjohnson: [C: 032] Decommission mw110 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91897 (owner: 10Cmjohnson) [17:57:10] AaronSchulz: err, why should it use wmg? [17:57:28] some variables use wg, some are wmg so I'm not really sure what the distinction is [17:57:44] Because loading the extension will override those variables [17:57:49] They're loaded at the start [17:57:54] THEN your extension is loaded [17:57:57] overriding those with the defaults [17:58:03] oh. [17:58:05] $wgMyGlobal = $wmgMyGlobal; [17:58:08] gotcha [17:58:15] how do I do the array merge thingy for +metawiki then? [17:59:24] Defining the default in InitialiseSettings.php is usually the simplest way [17:59:42] ok [18:00:45] legoktm: btw, just chatted with Erik, let's get the updated version of MM (pending some changes he suggested, I think) out to testwikis on Nov 7th (not next week, but the week after) and then, pending OK, we'll roll it out. [18:01:28] ok. what changes need to be made still? [18:02:21] rename? :-P [18:02:56] I already switch the config for that [18:03:28] legoktm: that plus something else Erik mentioned, he was going to comment on the change, I think. 
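The wg/wmg point Reedy makes above comes down to load order: site config runs before the extension's setup file, and the setup file resets its own globals to defaults. A toy Python analogy (variable names are illustrative, not the real PHP globals):

```python
# Order 1: set the live wg* value directly in site config -- the
# extension's setup file runs afterwards and clobbers it.
settings = {}
settings["wgServiceUser"] = "MediaWiki message delivery"  # InitialiseSettings analogue
settings["wgServiceUser"] = "default user"                # extension setup runs later
assert settings["wgServiceUser"] == "default user"        # site value lost

# Order 2: stash the site value under a wmg* name the extension never
# touches, then copy it into the wg* setting after the extension loads.
wmg = {"wmgServiceUser": "MediaWiki message delivery"}    # InitialiseSettings analogue
settings["wgServiceUser"] = "default user"                # extension setup runs
settings["wgServiceUser"] = wmg["wmgServiceUser"]         # CommonSettings, afterwards
assert settings["wgServiceUser"] == "MediaWiki message delivery"
```

That is the `$wgMyGlobal = $wmgMyGlobal;` pattern from the conversation: the wmg namespace exists purely so site values survive extension loading.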
[18:03:37] ok [18:04:39] (03PS4) 10Legoktm: Enable MassMessage on all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 [18:04:48] Reedy, AaronSchulz ^ [18:04:52] i have to run to class now, bbl [18:05:01] * AaronSchulz remembers saying that [18:05:04] * AaronSchulz feels old now [18:05:26] (03CR) 10jenkins-bot: [V: 04-1] Enable MassMessage on all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [18:06:26] (03PS1) 10Manybubbles: Change Elasticsearch defaults that cause pain/fear [operations/puppet] - 10https://gerrit.wikimedia.org/r/91903 [18:07:13] AaronSchulz: those were the days..... [18:07:36] * AaronSchulz thinks of that terrible song [18:08:24] the one every graduation played for like 5 years? [18:08:47] something about roads and time of your life and all that [18:08:57] * greg-g shudders [18:09:01] <^d> Hah. [18:09:03] yeah, just remembered the melody [18:09:05] <^d> I love the commit summary. [18:09:10] <^d> "Change Elasticsearch defaults that cause pain/fear" [18:09:36] ^d: it fits [18:10:09] (03CR) 10Chad: [C: 031] "Yes, please." [operations/puppet] - 10https://gerrit.wikimedia.org/r/91903 (owner: 10Manybubbles) [18:10:28] this is one of the few times that I've complained on the mailing list, received a single well thought out response with code, and been able to just drop the code right in. [18:10:40] I tested it, and it is great [18:10:53] wonderful comments, everything [18:11:40] I suppose that is some kind of plagiarism, but the comments were pretty much what we wanted too. [18:13:28] manybubbles: just credit it in a comment [18:13:41] voila! no plagiarism [18:13:47] now, copyvio, that's another issue... [18:14:24] (03CR) 10Manybubbles: [C: 04-1] "One moment while I add credit for this great suggestion." [operations/puppet] - 10https://gerrit.wikimedia.org/r/91903 (owner: 10Manybubbles) [18:25:35] :) [18:25:48] heya ori-l [18:25:56] you around? 
[19:11:36] PROBLEM - Varnish traffic logger on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [19:13:36] RECOVERY - Varnish traffic logger on cp1060 is OK: PROCS OK: 2 processes with command name varnishncsa [19:21:36] PROBLEM - Disk space on cp1060 is CRITICAL: NRPE: Call to popen() failed [19:29:17] (03CR) 10Andrew Bogott: "I've verified on labs that switching from openstack::gluster-service to gluster::service is a noop. That should trickle down to the subcl" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91884 (owner: 10Andrew Bogott) [19:29:36] https://bugzilla.wikimedia.org/enter_bug.cgi?product=Tools&format=guided [19:32:57] (03PS2) 10Manybubbles: Change Elasticsearch defaults that cause pain/fear [operations/puppet] - 10https://gerrit.wikimedia.org/r/91903 [19:33:16] PROBLEM - Varnish HTCP daemon on cp1060 is CRITICAL: NRPE: Unable to read output [19:33:36] PROBLEM - Disk space on cp1060 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7069 MB (2% inode=99%): /srv/sdb3 7529 MB (2% inode=99%): [19:35:06] PROBLEM - RAID on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [19:35:16] PROBLEM - Varnish HTCP daemon on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [19:35:36] PROBLEM - Disk space on cp1060 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7069 MB (2% inode=99%): /srv/sdb3 7529 MB (2% inode=99%): [19:36:06] RECOVERY - RAID on cp1060 is OK: OK: no RAID installed [19:37:16] RECOVERY - Varnish HTCP daemon on cp1060 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [19:38:35] ottomata: hey [19:38:36] PROBLEM - Disk space on cp1060 is CRITICAL: NRPE: Unable to read output [19:38:36] PROBLEM - Varnish traffic logger on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[19:39:36] RECOVERY - Varnish traffic logger on cp1060 is OK: PROCS OK: 2 processes with command name varnishncsa [19:40:13] heya [19:40:16] manybubbles: want me to merge that? [19:40:20] (03CR) 10Ottomata: [C: 032] Change Elasticsearch defaults that cause pain/fear [operations/puppet] - 10https://gerrit.wikimedia.org/r/91903 (owner: 10Manybubbles) [19:40:23] ori-l: [19:40:23] so [19:40:31] ottomata: sure [19:40:34] thanks [19:40:34] varnishkafka now writes periodic stats to a log file [19:40:36] ! [19:40:37] in json format [19:40:51] kinda like this [19:40:51] https://gist.github.com/ottomata/7155267 [19:40:58] what is the best way to get them into ganglia? [19:41:12] shoudl I look into statsd so we could also send to graphite if we wanted? [19:41:28] or should I just write a gmetric/ganglia python module thing to send them on to ganglia [19:41:34] manybubbles: merged [19:42:01] ottomata: whatever you prefer, really [19:42:16] PROBLEM - Varnish HTCP daemon on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [19:43:00] well, if I did it to statsd, it'd be easier to put in graphite or something else later, right? [19:43:28] (03CR) 10Ottomata: [C: 032 V: 032] Updating with recent upstream changes to varnishkafka.conf [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/91664 (owner: 10Ottomata) [19:43:36] PROBLEM - Disk space on cp1060 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7068 MB (2% inode=99%): /srv/sdb3 7529 MB (2% inode=99%): [19:44:16] RECOVERY - Varnish HTCP daemon on cp1060 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [19:45:13] Anyone who can provide a little puppet help for a newbie? 
[19:45:43] https://gerrit.wikimedia.org/r/#/c/91953/4/puppet/modules/exim-conf/manifests/init.pp is currently giving me in "make noop": [19:45:49] ottomata: yes [19:45:55] err: /Stage[main]/Exim-conf/File[/etc/mailname]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/exim-config/mailname at /root/translatewiki/puppet/modules/exim-conf/manifests/init.pp:4 [19:46:11] I'm probably doing something fairly elementary wrong. [19:46:33] yeah siebrand [19:46:41] i don't know what class exim is [19:46:49] but I highly doubt it has a source parameter [19:47:36] PROBLEM - Disk space on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [19:49:36] PROBLEM - Disk space on cp1060 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7068 MB (2% inode=99%): /srv/sdb3 7529 MB (2% inode=99%): [19:50:58] ori-l: do we have an example of sending things to statsd and then to ganglia right now? [19:51:10] our statsd can write to ganglia too [19:51:13] so you just have to send to statsd [19:51:26] tied up with something atm but can help in a bit [19:51:28] k [19:51:40] ottomata: See https://github.com/example42/puppet-exim [19:52:38] ok it does! [19:53:17] hmmm, ok, siebrand, is this on your VM or something? [19:53:21] is puppetmaster running there? [19:53:28] this looks like maybe your fileserver.conf isn't properly configured [19:53:30] ottomata: translatewiki.net [19:53:30] not sure though [20:00:26] PROBLEM - DPKG on cp1060 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [20:01:26] RECOVERY - DPKG on cp1060 is OK: All packages OK [20:02:36] PROBLEM - Disk space on cp1060 is CRITICAL: NRPE: Unable to read output [20:04:36] PROBLEM - Disk space on cp1060 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7068 MB (2% inode=99%): /srv/sdb3 7529 MB (2% inode=99%): [20:04:58] ottomata: Anyway... I asked primarily what the file action failed, which was earlier in the file I referenced... 
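[editor's note] The approach ori-l suggests above — send the varnishkafka JSON stats to statsd and let statsd forward to ganglia (or graphite) — can be sketched as below. The JSON key names and metric prefix are hypothetical placeholders (the real format is in ottomata's linked gist); statsd's UDP line protocol (`name:value|g` for a gauge) is as documented upstream.

```python
import json
import socket

def json_stats_to_statsd(line, prefix="varnishkafka"):
    """Flatten one JSON stats line into statsd gauge packets.

    Only numeric fields become metrics; the key names here are
    illustrative, not the actual varnishkafka output schema.
    """
    stats = json.loads(line)
    packets = []
    for key, value in sorted(stats.items()):
        if isinstance(value, (int, float)):
            packets.append("%s.%s:%s|g" % (prefix, key, value))
    return packets

def send_to_statsd(packets, host="localhost", port=8125):
    # statsd speaks a simple line protocol over UDP; fire and forget.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for p in packets:
        sock.sendto(p.encode("utf-8"), (host, port))

if __name__ == "__main__":
    sample = '{"txmsgs": 1200, "txerrs": 0, "client": "cp1060"}'
    for packet in json_stats_to_statsd(sample):
        print(packet)
```

Because the emitter only knows about statsd, the ganglia-vs-graphite decision stays on the statsd side, which is the flexibility ottomata was asking about.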
[20:06:10] right [20:06:18] so i think possibly fileserver.conf is not confiured properly [20:06:28] it sounds like it doesn't know how to resolve your source to that file [20:06:36] (i'm just guessing here) [20:06:58] (03PS1) 10Aaron Schulz: Fix annoyance with ctrl-C in mwscriptwikiset scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/91990 [20:07:11] either that, or hm [20:07:17] the puppet url to that file isn't right [20:07:18] uhh [20:07:36] maybe try it without the modules/ bit [20:07:45] source => 'puppet:///exim-config/mailname' [20:08:07] hmm, no [20:08:10] i think you ahve it right [20:08:11] http://docs.puppetlabs.com/puppet/2.7/reference/modules_fundamentals.html#files [20:09:56] (03CR) 10Legoktm: "I don't know why jenkins didn't like the latest patchset." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [20:15:29] when was lists.wikimedia.org switched to https only? [20:22:01] huh [20:22:54] hm, February 2012 maybe https://bugzilla.wikimedia.org/show_bug.cgi?id=33897#c3 [20:31:29] ottomata: I found the problem... [20:31:39] it's not exim-config bot exim-conf.... :| [20:33:42] (03CR) 10Dr0ptp4kt: "(2 comments)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88261 (owner: 10Dr0ptp4kt) [20:33:55] (03PS9) 10Dr0ptp4kt: Add an extra header for cache variance of W0 banners for proxies. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88261 [20:41:28] 3~/q gwicke_away [20:41:30] er :) [20:42:52] (03CR) 10Dr0ptp4kt: "@BBlack and @mark, regarding a more broad solution, we're fine with whatever you guys determine as appropriate. 
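[editor's note] The error siebrand hit — "Could not retrieve information from environment production source(s) puppet:///modules/exim-config/mailname" — came down to the source URL naming a module (`exim-config`) that didn't match the on-disk module directory (`exim-conf`). A simplified sketch of how stock puppet maps a `puppet:///modules/...` source URL onto the master's filesystem (ignoring fileserver.conf mounts and environments, and assuming a conventional modulepath):

```python
import os

def resolve_puppet_source(url, modulepath="/etc/puppet/modules"):
    """Map puppet:///modules/<module>/<rest> to the on-disk path
    the fileserver looks for: <modulepath>/<module>/files/<rest>.

    Simplified model of stock puppet module fileserving; real
    puppet also consults fileserver.conf mounts and environments.
    """
    prefix = "puppet:///modules/"
    if not url.startswith(prefix):
        raise ValueError("only module file sources handled in this sketch")
    module, _, rest = url[len(prefix):].partition("/")
    return os.path.join(modulepath, module, "files", rest)

# The failing manifest asked for module "exim-config", but the module
# directory was actually named "exim-conf", so this path did not exist:
print(resolve_puppet_source("puppet:///modules/exim-config/mailname"))
```

Since the URL's module segment must equal the module directory name exactly, a one-character mismatch produces the opaque "could not retrieve information from environment" error rather than a plain file-not-found.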
For the near term, though," [operations/puppet] - 10https://gerrit.wikimedia.org/r/88261 (owner: 10Dr0ptp4kt) [20:53:54] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [20:54:54] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [21:09:59] (03PS3) 10Dzahn: add dsh group "misc-servers" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88126 [21:11:45] (03PS1) 10Cmjohnson: Removing dns entry for mw110 [operations/dns] - 10https://gerrit.wikimedia.org/r/92003 [21:11:57] (03CR) 10Dzahn: [C: 032] "i'm gonna maintain it to keep track of the non-cluster servers, it's not used in anything automatic" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88126 (owner: 10Dzahn) [21:13:29] !log removing mw110 from pybal [21:13:47] Logged the message, Master [21:13:58] (03CR) 10Dzahn: [C: 032] Add pmtpa apaches for completeness [operations/puppet] - 10https://gerrit.wikimedia.org/r/91383 (owner: 10Reedy) [21:15:17] (03CR) 10Cmjohnson: [C: 032] Removing dns entry for mw110 [operations/dns] - 10https://gerrit.wikimedia.org/r/92003 (owner: 10Cmjohnson) [21:15:48] !log dns update [21:25:11] gwicke: what's the nodejs version used in production? [21:25:17] and is that version available somewhere as a deb? [21:26:02] YuviPanda: we currently use the old Ubuntu 0.8.x [21:26:12] Version: 0.8.2-1chl1~precise1 [21:26:14] i suppose [21:26:19] *nod* [21:26:39] I was scheming with Faidon to rebuild the debian package for ubuntu [21:26:46] I see [21:26:51] so that we can get up-to-date 0.10 [21:26:54] right [21:27:11] gwicke: I was going to spend a weekend working on exposing our RC feed over websockets [21:27:18] and wondering if I should write it in nodejs or twisted [21:27:32] since node is already on the cluster, seemed to make sense to just use it... 
[21:27:39] (03PS1) 10Cmjohnson: adding dhcpd file for mw110 [operations/puppet] - 10https://gerrit.wikimedia.org/r/92008 [21:28:01] gwicke: the only question being one of deploying the additional libraries needed. Hopefully I can get started by just adding node_modules in the repository... [21:28:13] (03CR) 10Cmjohnson: [C: 032] adding dhcpd file for mw110 [operations/puppet] - 10https://gerrit.wikimedia.org/r/92008 (owner: 10Cmjohnson) [21:34:12] mutante: regarding https://gerrit.wikimedia.org/r/#/c/91638/ what do you mean by your comment [21:34:19] suppose we need 2 of those blocks for that or [21:34:19] can it be unified? [21:35:22] cmjohnson1: see how we are removing a host range out of the middle of another range [21:35:48] and now it needs "for i in range" twice [21:36:08] oh..okay...we need 2 blocks for that [21:36:26] yea, the question was if we do, but we do [21:36:32] ariel already answered it [21:37:16] do we even have any search above 37? [21:37:57] ori-l: can I bother you in person for 2 minutes? [21:38:04] yeah [21:40:45] mutante: we do not have any search boxes above search36 so the 2nd block is not necessary. I am not sure why it goes to search51. [21:40:54] cmjohnson1: not activated, in DNS we do [21:41:07] cmjohnson1: ok, then even better [21:41:15] unless we are planning on expanding which were not ...let's remove it [21:42:29] cmjohnson1: amending [21:44:29] !log powering off to decom search21-36 [21:44:46] Logged the message, Master [21:45:02] (03PS4) 10Dzahn: remove search21-36, keep search13-20 [operations/dns] - 10https://gerrit.wikimedia.org/r/91638 [21:46:05] (03PS1) 10Andrew Bogott: Stamp out the last reference to generic::nginx, remove. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92011 [21:47:25] (03PS5) 10Dzahn: remove search21-50, keep search13-20 [operations/dns] - 10https://gerrit.wikimedia.org/r/91638 [21:48:03] (03PS2) 10Andrew Bogott: Stamp out the last reference to nginx_site, remove. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92011 [21:48:45] PROBLEM - Host search22 is DOWN: PING CRITICAL - Packet loss = 100% [21:49:25] PROBLEM - Host search24 is DOWN: PING CRITICAL - Packet loss = 100% [21:49:35] PROBLEM - Host search23 is DOWN: PING CRITICAL - Packet loss = 100% [21:49:55] PROBLEM - Host search25 is DOWN: PING CRITICAL - Packet loss = 100% [21:50:54] <^d> mutante: that you? ^ [21:51:21] ^d: nope 14:47 < cmjohnson1> !log powering off to decom search21-36 [21:51:25] (03CR) 10Andrew Bogott: [C: 032] Stamp out the last reference to nginx_site, remove. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92011 (owner: 10Andrew Bogott) [21:51:35] PROBLEM - Host search26 is DOWN: PING CRITICAL - Packet loss = 100% [21:51:44] ^d: he's moving them so they become cirrus hosts [21:51:47] cmjohnson1: right [21:51:50] <^d> Yeah, I knew that :) [21:51:57] correct [21:51:58] <^d> I saw your dns change so got confused. [21:52:35] PROBLEM - Host search27 is DOWN: PING CRITICAL - Packet loss = 100% [21:54:27] ACKNOWLEDGEMENT - Host search22 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:27] ACKNOWLEDGEMENT - Host search23 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:27] ACKNOWLEDGEMENT - Host search24 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:27] ACKNOWLEDGEMENT - Host search25 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:27] ACKNOWLEDGEMENT - Host search26 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. 
see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:27] ACKNOWLEDGEMENT - Host search27 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:27] ACKNOWLEDGEMENT - Host search28 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved by Chris. see #5883: Servers for CirrusSearchs Elasticsearch Instances [21:54:45] PROBLEM - LVS Lucene on search-pool2.svc.pmtpa.wmnet is CRITICAL: No route to host [21:55:30] mutante: assuming that's you as well ? [21:55:36] no, i'm not [21:56:23] that change isnt merged [21:56:55] PROBLEM - Host search29 is DOWN: PING CRITICAL - Packet loss = 100% [21:57:00] leslicarr: taking down tampa search to move...got the ok from mark but it appears that we need to wait [21:57:14] lesliecarr [21:57:15] <^d> They're not being used. [21:57:22] <^d> They just need to have their monitoring shut off too ;-) [21:57:41] <^d> And removed from pybal [21:58:02] ok [21:59:30] <^d> Yeah, just verified, none of the wikis point to search in tampa. 
[21:59:40] <^d> :) [22:00:20] !log removing search21-36 from pybal [22:00:33] Logged the message, Master [22:02:04] (03PS1) 10Chad: Remove old tampa search config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92016 [22:02:22] (03CR) 10jenkins-bot: [V: 04-1] Remove old tampa search config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92016 (owner: 10Chad) [22:02:45] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection refused [22:03:37] https://wikitech.wikimedia.org/wiki/Server_Lifecycle [22:03:50] https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_or_Decommission [22:06:09] PROBLEM - LVS Lucene on search-pool3.svc.pmtpa.wmnet is CRITICAL: Connection refused [22:07:34] ACKNOWLEDGEMENT - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection refused daniel_zahn shut down by Chris - RT #5883 [22:07:38] ACKNOWLEDGEMENT - LVS Lucene on search-pool2.svc.pmtpa.wmnet is CRITICAL: Connection refused daniel_zahn shut down by Chris - RT #5883 [22:07:41] ACKNOWLEDGEMENT - LVS Lucene on search-pool3.svc.pmtpa.wmnet is CRITICAL: Connection refused daniel_zahn shut down by Chris - RT #5883 [22:07:44] (03PS1) 10Chad: Shut down search_pool[1-3] in pmtpa [operations/puppet] - 10https://gerrit.wikimedia.org/r/92017 [22:08:19] cmjohnson1: i disabled notifications for that [22:08:21] (03CR) 10Chad: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92017 (owner: 10Chad) [22:08:23] the paging , you know [22:08:26] well, that just hella paged. [22:08:46] so yea, we shouldnt decom things like that in that order [22:08:56] ie: we need to disable notifications before we go turning off servers. 
[22:11:59] RECOVERY - Host search22 is UP: PING OK - Packet loss = 0%, RTA = 26.96 ms [22:12:09] RECOVERY - Host search23 is UP: PING OK - Packet loss = 0%, RTA = 27.17 ms [22:12:24] fyi: i powered them up [22:12:33] so here comes the recovery msgs [22:12:59] RECOVERY - Host search25 is UP: PING OK - Packet loss = 0%, RTA = 26.64 ms [22:13:09] RECOVERY - Host search24 is UP: PING OK - Packet loss = 0%, RTA = 28.58 ms [22:13:29] RECOVERY - Host search26 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms [22:14:59] RECOVERY - Host search27 is UP: PING OK - Packet loss = 0%, RTA = 27.77 ms [22:15:09] PROBLEM - search indices - check lucene status page on search23 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:15:39] PROBLEM - search indices - check lucene status page on search26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:15:59] RECOVERY - search indices - check lucene status page on search23 is OK: HTTP OK: HTTP/1.1 200 OK - 269 bytes in 0.056 second response time [22:16:29] RECOVERY - search indices - check lucene status page on search26 is OK: HTTP OK: HTTP/1.1 200 OK - 207 bytes in 0.055 second response time [22:18:00] <^d> Yeah so we can totally decom them but RobH is right (and what I was trying to say :)) [22:20:31] (03PS1) 10Cmjohnson: adding search21-36 to decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92020 [22:21:17] ^d I didn't think they were still being monitored...and didn't check ...bad cmjohnson1 [22:23:58] (03CR) 10Cmjohnson: [C: 032] adding search21-36 to decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92020 (owner: 10Cmjohnson) [22:25:34] (03PS1) 10Cmjohnson: Revert "adding search21-36 to decom.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92021 [22:27:09] PROBLEM - NTP on search25 is CRITICAL: NTP CRITICAL: Offset unknown [22:29:07] (03CR) 10Cmjohnson: [C: 032] Revert "adding search21-36 to decom.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92021 (owner: 
10Cmjohnson) [22:32:09] RECOVERY - NTP on search25 is OK: NTP OK: Offset -0.0003695487976 secs [22:32:32] (03PS3) 10Chad: Switch to single Json object for gerrit's reviewer count query [operations/puppet] - 10https://gerrit.wikimedia.org/r/84743 (owner: 10QChris) [22:39:39] PROBLEM - RAID on cp1059 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [22:40:39] RECOVERY - RAID on cp1059 is OK: OK: no RAID installed [22:41:49] PROBLEM - Disk space on cp1059 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [22:42:53] (03PS1) 10Cmjohnson: adding search21-36 to decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92025 [22:43:11] (03CR) 10Cmjohnson: [C: 032] adding search21-36 to decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/92025 (owner: 10Cmjohnson) [22:43:59] PROBLEM - DPKG on cp1059 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [22:44:59] PROBLEM - SSH on cp1059 is CRITICAL: Server answer: [22:45:59] RECOVERY - DPKG on cp1059 is OK: All packages OK [22:45:59] RECOVERY - SSH on cp1059 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [22:50:49] PROBLEM - Disk space on cp1059 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7440 MB (2% inode=99%): /srv/sdb3 7941 MB (2% inode=99%): [22:52:49] PROBLEM - Disk space on cp1059 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 7440 MB (2% inode=99%): /srv/sdb3 7941 MB (2% inode=99%): [22:54:59] PROBLEM - SSH on cp1059 is CRITICAL: Server answer: [22:55:59] RECOVERY - SSH on cp1059 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [23:05:34] PROBLEM - Puppet freshness on search30 is CRITICAL: No successful Puppet run in the last 10 hours [23:05:44] PROBLEM - Puppet freshness on search32 is CRITICAL: No successful Puppet run in the last 10 hours [23:05:54] PROBLEM - Puppet freshness on search23 is CRITICAL: No successful Puppet run in the last 10 hours [23:06:04] PROBLEM - Puppet freshness on search24 is CRITICAL: 
No successful Puppet run in the last 10 hours [23:06:14] PROBLEM - Puppet freshness on search26 is CRITICAL: No successful Puppet run in the last 10 hours [23:06:24] PROBLEM - Puppet freshness on search27 is CRITICAL: No successful Puppet run in the last 10 hours [23:34:17] (03PS1) 10Aaron Schulz: Switched to JobQueueFederated [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92032 [23:34:43] (03CR) 10Aaron Schulz: [C: 04-1] Switched to JobQueueFederated [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92032 (owner: 10Aaron Schulz) [23:42:11] (03PS1) 10Ori.livneh: Apply misc::monitoring::view::mobile on nickel [operations/puppet] - 10https://gerrit.wikimedia.org/r/92035 [23:44:01] !log aaron synchronized php-1.22wmf22/extensions/SwiftCloudFiles '2661b2d67a0bffb6f3a6bd680114a5e7acec5994' [23:44:14] Logged the message, Master [23:45:55] RECOVERY - Puppet freshness on search30 is OK: puppet ran at Fri Oct 25 23:45:53 UTC 2013 [23:46:34] PROBLEM - Puppet freshness on search30 is CRITICAL: No successful Puppet run in the last 10 hours [23:53:04] RECOVERY - Puppet freshness on search32 is OK: puppet ran at Fri Oct 25 23:53:00 UTC 2013 [23:53:14] RECOVERY - Puppet freshness on search27 is OK: puppet ran at Fri Oct 25 23:53:05 UTC 2013 [23:53:24] PROBLEM - Puppet freshness on search27 is CRITICAL: No successful Puppet run in the last 10 hours [23:53:44] PROBLEM - Puppet freshness on search32 is CRITICAL: No successful Puppet run in the last 10 hours [23:55:54] RECOVERY - Puppet freshness on search23 is OK: puppet ran at Fri Oct 25 23:55:46 UTC 2013 [23:55:54] PROBLEM - Puppet freshness on search23 is CRITICAL: No successful Puppet run in the last 10 hours [23:58:54] RECOVERY - Puppet freshness on search26 is OK: puppet ran at Fri Oct 25 23:58:47 UTC 2013 [23:59:12] (03CR) 10TTO: "recheck" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91344 (owner: 10Legoktm) [23:59:14] PROBLEM - Puppet freshness on search26 is CRITICAL: No 
successful Puppet run in the last 10 hours [23:59:54] RECOVERY - Puppet freshness on search24 is OK: puppet ran at Fri Oct 25 23:59:48 UTC 2013