[00:11:02] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [00:11:22] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [00:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:53:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [01:07:44] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [01:07:44] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [01:20:24] PROBLEM - search indices - check lucene status page on search20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 60051 bytes in 0.115 second response time [01:20:42] ori-l: I had to revert that scap change yesterday [01:20:48] see https://gerrit.wikimedia.org/r/#/c/71967/ [01:21:01] /bin/sh was used as the interpreter of one of the scripts, not /bin/bash [01:21:37] Oh. I checked the SAL but didn't see a note about the failure. The time it took scap to finish was suspicious but I figured you had done something to make the payload smaller for the purpose of testing. [01:21:46] * ori-l looks [01:22:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [01:23:38] TimStarling: OK, I'd like to fix that and resubmit. How should I do that, given that the original change is closed? I'm going to do it as a revert of your revert, I think [01:23:57] submit it as a new change [01:24:01] a revert of a revert is fine [01:24:12] it's still less than three reverts :) [06:07:33] TimStarling: ping re ^ [06:08:03] there's a problem with mwscript on fenari [06:08:36] I am trying a few things, but it takes about a year for each puppet run so I usually end up doing something else before it finishes [06:09:02] Could not open input file: /home/wikipedia/common/multiversion/MWScript.php ? [06:09:08] have we some games installed on fenari? [06:09:32] !ask | Reedy [06:09:33] Reedy: Hi, how can we help you? Just ask your question. [06:09:43] 1 second! 
[06:14:29] TimStarling: 'mwscript' on tin is entirely different [06:15:06] if fenari had tin's mwscript, it would work [06:16:12] ori-l: only because I removed the one that was similar [06:16:30] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 06:16:27 UTC 2013 [06:16:44] it had mode 555, suggesting it was installed by puppet, but I removed it and ran puppet and it wasn't recreated [06:16:57] so it must have just been copied in from somewhere, preserving modes [06:17:00] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [06:18:23] I copied it from tin [06:20:55] i am massively confused [06:21:09] well, there was a copy in /home/wikipedia/bin, that was outdated [06:21:20] there was also a copy in /usr/local/bin, that was slightly less outdated [06:21:40] but neither was maintained by puppet so neither had your sh -> bash fix [06:22:44] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72066 [06:31:10] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 06:31:04 UTC 2013 [06:31:30] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [06:31:58] 24 hours later: centrally-declared rsync command-line arguments [06:32:44] thanks, i feel like a massive dork [06:32:48] moreso than usual, that is [07:06:55] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [07:07:35] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [08:07:42] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [08:09:32] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [08:10:12] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [08:47:13] New review: DixonD; "I used Notepad++. I will fix this later today." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [08:55:27] New review: Nikerabbit; "Perhaps you need to open the diff for the file where I placed my comment?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [09:05:09] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [09:06:26] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [09:06:56] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [09:07:22] New patchset: DixonD; "A partial fix for https://bugzilla.wikimedia.org/show_bug.cgi?id=50561" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [09:10:52] New review: DixonD; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [09:13:16] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:45] New patchset: Odder; "(bug 50561) Add 'Translation' namespace for ukwikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [09:44:02] New review: Odder; "Comment inline." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [09:44:52] New review: Odder; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [09:52:03] New review: DixonD; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [10:00:25] OK, do we -1 such changes? [10:01:42] eh? [10:03:14] "Just wondering why you're adding namespaces 102 and 252 to $wgNamespacesToBeSearchedDefault (...) — it wasn't requested in bug 48308 nor bug 50561 (...)" [10:03:29] "That was something that our small community was not aware of at the time when the request was made. That's why it was not requested explicitly, but seems to be appropriate for the same reasons as in English Wikisource." [10:04:12] I'm not sure what the general opinion on doing such stuff is. [10:06:10] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [10:06:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [10:06:56] looking [10:08:53] New review: Hashar; "This should be done under the 'contint' module in modules/contint/manifests/packages.pp :)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/71921 [10:09:00] erm [10:09:15] * hashar coughs as well [10:09:44] in theory there should be a link to community approval [10:10:01] even if it gets tacked on to one of the other bugs [10:10:20] I know it's minor and etc. but... [10:10:31] I think I missed a bunch of the conversation [10:10:52] apergos: That's how I was taught & raised by you guys [10:10:54] see this: https://gerrit.wikimedia.org/r/#/c/72054/3/wmf-config/InitialiseSettings.php and look at the comments [10:11:03] I mean it's probably fine anyways right [10:11:04] but [10:11:07] On the other hand, this is so minor and has so little influence... [10:11:30] I mean, that's a positive change, in the end... 
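Stepping back to the mwscript thread from earlier this morning: the copy TimStarling found on fenari had mode 555, which made it look puppet-installed, yet removing it and re-running puppet did not bring it back, so it never picked up the sh -> bash interpreter fix. A minimal sketch of what actually keeping that script under puppet control could look like; the source URI and the choice of /usr/local/bin are illustrative assumptions, not the real manifest:

    # Hypothetical sketch only -- not the actual deployment manifest.
    # With the script puppet-managed, a stale copy (like the outdated ones
    # found in /home/wikipedia/bin and /usr/local/bin) would be overwritten
    # on the next agent run instead of silently drifting.
    file { '/usr/local/bin/mwscript':                      # assumed install path
      ensure => file,
      owner  => 'root',
      group  => 'root',
      mode   => '0555',                                    # the mode Tim observed
      source => 'puppet:///modules/mediawiki/mwscript',    # illustrative source URI
    }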
[10:12:36] so I wouldn't -1 it but I would tell em 'scare up a link to community sign-off first, and then we can deploy this' [10:12:42] my 2 cents [10:20:06] I see, thanks apergos [10:21:17] yw [10:23:14] New patchset: Hashar; "New ubuntu icon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52778 [10:23:53] New review: Hashar; "And it doesn't have the ugly white border :-]" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/52778 [10:24:29] New review: Odder; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [10:25:10] New patchset: Hashar; "icinga: update Ubuntu icon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52778 [10:37:03] New patchset: DixonD; "(bug 50561) Add 'Translation' namespace for ukwikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [10:42:07] New review: DixonD; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [10:51:10] New patchset: Ori.livneh; "Rewrite of EventLogging module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71927 [11:05:05] New patchset: Ori.livneh; "Rewrite of EventLogging module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71927 [11:09:18] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [11:09:43] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:07:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:08:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:08:01] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:07:53 UTC 2013 [12:08:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:08:51] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:08:45 UTC 2013 [12:09:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:09:51] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:09:43 UTC 2013 [12:10:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:31] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:10:22 UTC 2013 [12:11:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:11:21] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:11:20 UTC 2013 [12:11:44] New patchset: Hashar; "toollabs: insert several missing packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67055 [12:12:01] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:11:55 UTC 2013 [12:12:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:12:51] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:12:41 UTC 2013 [12:13:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:13:10] New patchset: Hashar; "toollabs: insert sql tool to execnodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66266 [12:13:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:21] RECOVERY - Puppet freshness on dysprosium is 
OK: puppet ran at Fri Jul 5 12:13:13 UTC 2013 [12:14:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:14:01] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:13:54 UTC 2013 [12:14:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:31] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:14:23 UTC 2013 [12:15:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:15:01] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:15:00 UTC 2013 [12:15:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:31] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:15:29 UTC 2013 [12:16:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:16:11] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:16:03 UTC 2013 [12:16:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:31] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:16:27 UTC 2013 [12:17:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:17:01] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:16:57 UTC 2013 [12:17:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:17:21] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:17:18 UTC 2013 [12:17:51] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:17:45 UTC 2013 [12:17:59] I'm feeling flooded. [12:18:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:18:11] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:18:02 UTC 2013 [12:18:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:18:31] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 12:18:23 UTC 2013 [12:19:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:19:21] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:30:11] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 12:30:03 UTC 2013 [12:31:01] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:36:08] is it just me or is NickServ and SASL auth borked? [12:46:33] bblack: nickserv /chanserv services were down a few minutes ago [12:46:56] uh? 
[12:52:47] ah chanserv isn't in the channels so nickserv is going to have a bit of an issue right now [12:52:52] there they go [12:52:52] and there I go, going to get foodstuffs to make lunch [12:52:52] before 4 pm even :-/ [13:20:04] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:04] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [13:24:39] New patchset: Hashar; "contint: file perms for qunit jobs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72086 [13:26:39] Coren: hey are you still around by any chance ? :-D [13:26:56] hashar: Sure [13:27:18] Coren: would you mind merging on sock puppet a file ownership change for me ? https://gerrit.wikimedia.org/r/72086 :-) [13:27:31] some dir needs to change from jenkins to jenkins-slave user :-] [13:28:04] New review: coren; "Hashar likes it!" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72086 [13:28:07] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72086 [13:28:10] :-] [13:28:25] Merginated. [13:28:34] merci beaucoup! [13:29:13] Deux riens. :-) [13:47:11] !log Jenkins: reduced # of executors on [master] from 4 to 2. Will bring it to 0 next monday (see {{bug|50228}}) [13:47:17] Logged the message, Master [13:48:03] hashar: 0 executors? [13:48:26] yup [13:48:36] to have everything running on slaves and not on the master :-] [13:49:12] afaik, debian-glue-source builds could be made on the master [13:49:24] that way the zuul issue will go away [13:49:30] it needs sudo isn't it ? [13:49:48] not source builds [13:49:57] * hashar is confused [13:49:59] hmm [13:50:08] perhaps you are confuse [13:50:14] I'm confused about https://wikitech.wikimedia.org/wiki/Server_admin_log [13:50:14] if you get a package with a build dependency set to "whatever" package, I guess it needs to install "whatever" [13:51:02] yea, you are right, build-dep-indep need be be installable [13:51:24] though when building the source package, none of the deps are really used ツ [13:51:40] hashar: your log went fubar on the wiki [13:51:49] hashar: 13:48 hashar: Jenkins: reduced # of executors on [master] from 4 to 2. Will bring it to 0 next monday (see [[bugzilla:Gerrit change 50228|bug Gerrit change 50228]]) [13:52:08] pfff [13:52:33] !b 50228 [13:52:33] https://bugzilla.wikimedia.org/50228 [13:52:36] !bug 50228 [13:52:36] https://bugzilla.wikimedia.org/50228 [13:52:53] !log {{bug|50228}} [13:53:03] Logged the message, Master [13:53:30] hashar: I assume the slave "gallium" are to be booted as well? [13:54:19] AzaToth: to be booted ? [13:54:29] hashar: your bug says "I want to get rid of jobs on the master instance and have them all on slaves."; is "I" synonym to "We"? [13:54:35] to be removed [13:54:44] as it's on gallium [13:55:06] and _why_ do you want this? 
[13:56:31] that is merely to make sure all jobs will be run in a simlar envirnonnement [13:56:58] since most of them are now running on the 'gallium' slave, whenever we add a second we know the jobs will run properly on that second box [13:57:01] envirnonnement:) [13:57:19] ok [13:57:29] sounds sensible [13:58:42] hashar: I'm still waiting for https://review.openstack.org/#/c/34974/ to get it's +2 btw [13:58:45] New patchset: Manybubbles; "Slick support for ganglia slope and units." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954 [13:59:08] ^^^^ ignore that. I'm gonna make more changes in a bit but wanted to rebase first. [13:59:22] AzaToth: yeah I noticed your change [13:59:31] AzaToth: at least you received a quick review on submission :-] [13:59:42] AzaToth: note that yesterday was an holiday in US [13:59:47] yea [14:00:01] AzaToth: I guess lot of them are not working today either, so the +2er might only have a possibility to look at your patch on monday [14:00:15] damn commies [14:01:15] taking a day off after "patriot day number one" with their "parades" [14:08:06] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [14:09:30] New patchset: Manybubbles; "Slick support for ganglia slope and units." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954 [14:12:01] manybubbles: have you got jmxtrans installed somewhere yet ? :D [14:12:09] hashar, I do! [14:12:15] but not puppetized :( [14:12:20] bad bad andrew [14:12:25] not my fault! it used to be puppetized [14:12:28] i want to puppetize it [14:12:31] its a contentious issue [14:12:32] i mean [14:12:32] well [14:12:34] it is my fault [14:12:38] but not recently my fault :p [14:12:42] * hashar manipulates ottomata in making him craft puppets  [14:12:50] hashar and ottomata: installed an puppet managed on solr-jmxtrans [14:13:18] though most of the configs were made by hand - I'm working on puppet to get them generated there [14:13:24] ottomata: I am not blaming you :-] [14:14:07] manybubbles: guess I will wait a bit before asking to get jmxtrans for Jenkins :-] [14:14:15] oh manybubbles, i like recent change [14:14:17] I can merge? [14:14:45] hashar, i think you can do it once I merge this [14:14:47] it should work great [14:15:22] ottomata: sure! I think we should still talk not running jmxtrans on each host though. [14:15:47] yeah, the exported stuff isn't in ther eyet [14:15:51] if you merge I'll make the jvm file as a new commit/review [14:15:56] that's why I wanted to review that jvm change separately [14:15:57] yeah [14:16:09] Change merged: Ottomata; [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954 [14:16:53] New patchset: Ottomata; "Updating jmxtrans module for Nik's recent slope + units ganglia support" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72091 [14:17:45] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72091 [14:19:46] ottomata: I just found a mistake I made in the jvm one. Let me fix it and I'll submit. 
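An aside on the change hashar asked Coren to merge earlier (https://gerrit.wikimedia.org/r/72086), which hands a contint directory from the jenkins user to the jenkins-slave user: a minimal sketch of that kind of ownership flip. The path, group and mode are placeholders, not the actual contents of the patch:

    # Hypothetical sketch -- not the real 72086 diff.
    # Re-owns an existing qunit working directory so jobs executed by the
    # jenkins-slave user (instead of jenkins on the master) can write to it.
    file { '/srv/qunit-workdir':          # placeholder path
      ensure  => directory,
      owner   => 'jenkins-slave',
      group   => 'jenkins-slave',         # placeholder group
      mode    => '0755',                  # placeholder mode
      recurse => true,                    # also re-own files already present
    }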
[14:19:52] New patchset: Mark Bergsma; "Add wikidata/wikivoyage LVS service IPs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72092 [14:21:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72092 [14:24:01] k [14:26:02] New patchset: Mark Bergsma; "Enable protoproxies for wikidata/wikivoyage in esams" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72094 [14:28:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72094 [14:32:46] New patchset: Ottomata; "Reenabling kafka too many produce requests icinga check." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72096 [14:32:58] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72096 [14:35:51] LeslieCarr: you probably aren't up yet, but I'm reenabling this and running puppet on all relevant nodes and watching icinga [14:36:18] New patchset: Manybubbles; "Add metrics that can be applied to most JVMs." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72097 [14:38:35] * ChanServ har avslutat (shutting down) [14:38:37] wtf [14:43:28] AzaToth: didn't you see the global notice? [14:44:22] so manybubbles [14:44:26] about exported resource? [14:44:44] how would the jvm one be collected? [14:44:51] manually on whichever jmxtrans instance you want? [14:44:52] so you'd do [14:47:10] I collect them like this right now: https://gist.github.com/anonymous/a1aa637b4c88a4a31c42 [14:47:12] include jmxtrans [14:47:12] Jmxtrans::Metrics::Jvm <<| title == 'host:port' |>> [14:47:13] ? [14:48:34] hm [14:48:44] hmmmmm [14:49:20] It'd be nicer if you could collect them using a prefix match like Jmxtrans::Metrics <<| title ~ 'solr#' |> [14:50:01] and I'd prefer to only collect Jmxtrans::Metrics rather than each helper incarnation - that'd be kind of annoying [14:51:05] we could also use some kind of a tag or group on Jmxtrans::Metrics resources so different jmxtrans instances could call out which group/tag they are monitoring. [14:52:21] I've kind of tried to think of this along the same lines as the nagios plugin. They seem like similar problems to me. [14:54:32] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [14:55:35] New patchset: Mark Bergsma; "Add secondary (fallback) IPs for wikidata/wikivoyage" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72099 [14:56:12] PROBLEM - HTTPS on ssl3003 is CRITICAL: Connection refused [14:56:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72099 [14:57:56] New review: Hashar; "Solved https://bugzilla.wikimedia.org/show_bug.cgi?id=46093" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [14:59:12] RECOVERY - HTTPS on ssl3003 is OK: OK - Certificate will expire on 01/20/2016 12:00. [15:08:02] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [15:19:40] New patchset: Mark Bergsma; "beta: adapt role::cache::varnish::upload" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70818 [15:22:16] manybubbles: sorry, was on a quick call [15:22:34] hm, yeah. maybe we can leave the exported stuff up to users of the module? 
[15:22:52] what happens if we don't export in jmxtrans;:metrics::jvm [15:23:07] but let hte user of the define do it [15:23:07] so [15:23:15] @@jmxtrans::metrics::jvm [15:23:31] and then if you want to collect on your jmxtrans instance, you can [15:23:33] however you want [15:23:36] you can use a tag if you want to [15:23:38] or title [15:23:40] or whatever [15:23:43] but it wouldn't be in the module [15:23:50] does that work though? [15:23:58] hm, i guess you'd have to collect both defines to do that [15:24:32] Jmxtrans::Metrics <<| tag == 'solr' |>> [15:24:32] Jmxtrans::Metrics::Jvm <<| tag == 'solr' |>> [15:24:33] right? [15:24:35] hm [15:24:59] I think the problem is that you'd have to collect all the different sorts of metrics and that is kind of crummy [15:25:04] avoid exported resources [15:25:15] they're a neat construct but they work poorly [15:25:16] New patchset: coren; "Tool Labs: Skeleton puppet class for UWSGI tyrant" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72101 [15:25:36] we don't use them much in our tree and this is no accident [15:25:45] New review: coren; "Trivial skeleton is trivial." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72101 [15:25:54] paravoid: hmm - [15:26:01] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72101 [15:26:16] we do use them for certain things, nagios ganglia etc. [15:26:22] paravoid: we use them for monitoring and this feels like a similar problem - that is where I got the idea. [15:26:25] so maybe that's appropriate here [15:26:27] right [15:26:35] but they are annoying to test [15:26:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:26:50] and I see that they have a problem with when hosts get removed from the setup they don't get cleared [15:27:06] New patchset: coren; "Tool Labs: Add the tyrant class, properly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72102 [15:27:11] what do you mean? [15:27:13] they do [15:27:24] maybe I'm getting confused. [15:27:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [15:27:35] New review: coren; "No changeset is too trivial to screw up, obviously." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72102 [15:28:09] New patchset: Mark Bergsma; "beta: adapt role::cache::varnish::upload" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70818 [15:28:09] I think I was getting confused by nagios#decommission_monitor_host [15:29:52] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70818 [15:31:29] we don't use external resources for ganglia I think [15:32:01] New patchset: Andrew Bogott; "Remove abandoned rsync configs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71126 [15:32:01] New patchset: Andrew Bogott; "Remove the now unused generic::rsyncd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71127 [15:32:01] New patchset: Andrew Bogott; "Convert the nfs rsyncd to the new rsync module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71124 [15:32:01] New patchset: Andrew Bogott; "Convert the udp2log rsyncd to use the rsyncd module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71125 [15:32:02] New patchset: Andrew Bogott; "Remove misc::deployment::scap_proxy as it appears unused." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/71122 [15:32:02] New patchset: Andrew Bogott; "Convert scap's rsyncd to the new rsync module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71123 [15:32:02] New patchset: Andrew Bogott; "Convert the search rsyncd to the new rsync module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71106 [15:32:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:33:26] paravoid: I'm about to add you as reviewer to all the above patches, in hopes of merging a bunch of them on Monday [15:33:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [15:34:55] paravoid, mark, manybubbles, I agree avoiding exported resources in general is good, which is why I kinda want to keep them out of this module [15:35:03] this case is more like nagios than like ganglia [15:35:10] yeah... avoid, but use if you need them [15:35:26] and try to export as little as you can [15:35:33] they kill performance ;) [15:35:37] the jmxtrans instance can pull metrics from jvms on different hosts [15:35:44] with ganglia you can just push them [15:36:16] manybubbles, why don't you want to just run jmxtrans on each of your jvm hosts? [15:36:53] ottomata: it isn't really built for it. it's default config recommendations assume it'll eat the whole machine. I'm sure it'd work on its own but my feeling is that it sees fewer installs that way. [15:37:15] hmmm [15:37:27] rather - I'm sure it'd work having one beside every jvm but it was written to scrape a whole bunch of them [15:37:38] yeah [15:37:52] we can keep the exports out of the module. [15:38:09] if they kill performance then lets just not use them at all and enumerate monitoring directly on the box that does the monitoring. [15:38:18] i know this isn't ideal, but since you are using a define, you could use resource array on your jmxtrans host to get all of your jvm instances [15:38:19] if it turns into a huge pain we'll turn it off. [15:38:42] *looking stuff up* [15:38:48] jmxtrans::metrics::jvm { ["host1:port1", "host2:port2"]: } [15:39:16] that's less elegant, because you have to have all of your JVM instances defined in puppet somewhere [15:39:21] so that's a list to maintain [15:40:19] hmm - we'd have a lot of lists in the end. and it sucks that it is backwards from ganglia. [15:40:38] I could try to configure jmxtrans (with puppet) to run in tiny mode. [15:41:02] wait, tell me the problem again with not exporting in the module? [15:41:10] just exporting in your usage of the module? [15:41:46] if you export in the usage of the module then when you import you have to know all the different types of things your importing [15:42:04] well, right now the only abstracted use of jmxtrans::metrics is your new jvm one [15:42:09] there probably won't be more than that, right? [15:42:23] jvm is the only 100% generic thing I could think we'd add [15:42:33] other JMX metrics are custom to your app [15:42:54] so yeah, you'd have to manually collect, but you'd have to do that for all of your app metrics anyway [15:42:57] I'm working on a solr one which we'll strap to solr instances [15:43:08] and to the solr puppetization, right? [15:43:19] or do you thikn things like that would belong in jmxtrans module? 
[15:43:19] hmm [15:43:28] naw, i think they belong in the solr puppetization [15:43:29] at this point the solr one is more a proof of concept that we can do it then a full blown implementation. [15:43:30] New review: Mark Bergsma; "(1 comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/71125 [15:43:35] given that we'll be playing with elasticsearch too [15:43:46] they do belong with solr puppetization [15:44:23] but they'll be similar in shade to the jvm one - a define you can use (in this case once per core) [15:45:13] New review: Ottomata; "Nice! See this one all the way through when it gets merged. Make sure that the modules that get in..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71125 [15:45:39] right, I thikn i'm fine with exports in your usage of the jmxtrans module [15:45:44] i'm reluctant to bake it into the module itself [15:45:52] lets not then [15:46:05] hmm, ok col [15:46:06] we'll start without it and if it sucks we'll convince ourselves it is worth adding then [15:46:06] cool [15:46:11] sounds good [15:46:14] i'll add that decision as ac omment on the review [15:47:02] New review: Ottomata; "(1 comment)" [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72097 [15:47:16] k, cool, I had a comment on the resultAlias names [15:47:16] too [15:47:16] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72102 [15:47:16] but that's inline [15:47:21] ok, i'm going to take my laundry home, be back online in a bit [15:47:33] New patchset: Manybubbles; "Add metrics that can be applied to most JVMs." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72097 [15:48:22] New patchset: coren; "Tool Labs: Tyrant role, and minimal class config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72105 [15:49:01] andrewbogott: the rsyncd running as nobody will be a problem for some of these [15:49:14] e.g. for nfs home I bet nobody won't work [15:49:21] mark_: They don't run as nobody… the code backing the individual modules sets the defaults to 0/0 [15:49:22] New review: coren; "Minimal base so that it will apply." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72105 [15:49:22] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72105 [15:49:25] so, running as root. [15:49:33] ok. [15:49:47] It's weird that the module has two competing defaults :( [15:52:45] as long as you babysit/test each of these I see no problem with merging these any time [15:53:58] thanks. Planning to merge on Monday so that there are lots of folks around in case something unexpected happens. [15:54:28] We did a full test of the search patch on beta, the others are very similar. [15:55:06] New review: Daniel Kinzler; "yea, that's what we want" [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/71790 [15:58:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:01:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.411 second response time [16:04:53] New review: coren; "This needs to go in exec_environ; if it's useful for jobs, it's equally useful for web tools." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66266 [16:05:44] New review: Denny Vrandecic; "Looks right. The ontology should be available as per connected bug. That would be great, thanks!" 
[operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/71790 [16:08:10] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 16:08:05 UTC 2013 [16:08:48] New review: coren; "Comments inline." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67055 [16:09:00] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:40] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 16:09:33 UTC 2013 [16:10:00] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:50] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 16:10:40 UTC 2013 [16:11:00] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:42] I forget, where are the log files on terbium? [16:11:50] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 16:11:41 UTC 2013 [16:12:00] New review: coren; "It's ugly as sin, but the highlight is a reasonable safeguard." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/70188 [16:12:01] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70188 [16:12:12] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [16:13:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [16:15:36] manybubbles: did you see my comment about resultAlias? [16:16:05] no - let me look [16:16:29] ottomata: where is your comment? [16:16:44] I didn't see it and can't find it [16:20:52] it was inline on teh other patchset, but it shoudl be in the comment on that patchset too [16:21:03] oh [16:21:03] not [16:21:04] no [16:21:05] its just inline [16:21:15] https://gerrit.wikimedia.org/r/#/c/72097/1/manifests/metrics/jvm.pp [16:29:32] ottomata: got it. I actually like spaces because it matches other stuff in ganglia. OTOH I'm not consistent with the style otherwise. [16:30:20] ottomata:looks like group names don't use spaces but their graphs do [16:32:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:34:20] i'm less concerned about group names, i think thos are mostly for ganglia gui [16:34:24] but we use the other stuff programatically [16:34:33] for example, in order to get icinga alerts [16:34:37] we ahve to specify the ganglia metric name on the cli [16:34:39] spaces make that annoying [16:37:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [16:39:59] ottomata: ok. I was trying to follow the example of the other stuff in ganglia but it is simple enough to change. [16:51:29] New patchset: Ottomata; "Installing perl packages for wikistats dependencies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72110 [16:56:24] New patchset: Ottomata; "Installing perl packages for wikistats dependencies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72110 [16:57:40] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72110 [17:08:07] New patchset: Manybubbles; "Add metrics that can be applied to most JVMs." 
[operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72097 [17:09:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [17:10:10] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [17:35:06] New patchset: Andrew Bogott; "Add packages to the jenkins slave for puppet rspec." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71921 [17:41:21] Ryan_Lane could you have a look at https://rt.wikimedia.org/Ticket/Display.html?id=5423 ? [17:43:50] New patchset: Ottomata; "including misc::statistics::wikistats in role::statistics::private." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72114 [17:44:13] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72114 [18:06:29] New patchset: Hashar; "Add packages to the jenkins slave for puppet rspec." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71921 [18:06:39] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [18:06:59] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [18:07:22] New review: Hashar; "'rake' was required twice, I got rid of it with PS3. Should be fine now :)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/71921 [18:08:09] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [18:32:09] PROBLEM - Host colby is DOWN: PING CRITICAL - Packet loss = 100% [18:33:28] !log bringing down colby to remove extra drives [18:33:38] Logged the message, Master [18:38:30] RECOVERY - Host colby is UP: PING OK - Packet loss = 0%, RTA = 26.89 ms [18:39:04] PROBLEM - NTP on colby is CRITICAL: NTP CRITICAL: Offset unknown [18:42:04] RECOVERY - NTP on colby is OK: NTP OK: Offset -0.0009111166 secs [18:49:24] New patchset: Andrew Bogott; "Catch the spec tests up to our customizations:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72124 [18:55:41] New patchset: Stefan.petrea; "Added cronjob for new mobile pageviews" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72125 [19:04:14] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72125 [19:05:15] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [19:05:15] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [19:06:05] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [19:11:13] New patchset: Stefan.petrea; "Fixing param day => weekday" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72128 [19:11:36] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72128 [19:14:05] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [19:16:46] andrewbogott: could i get you to review and hopefully +2 https://gerrit.wikimedia.org/r/#/c/70780/ ? [19:19:20] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70780 [19:19:41] ty :) [19:19:53] np! [19:19:57] does it need to be deployed anywhere, or does that happen automatically? 
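Tying off the jmxtrans exported-resources thread from this afternoon: the leaning was to keep exports out of the module itself; if exported resources get used at all, the pattern would be to export in the module's usage and collect on the jmxtrans host. A minimal sketch of that pattern, reusing the define and collector spelling from the IRC snippets; the tag, the JMX port, and the assumption that the define needs nothing beyond its host:port title are illustrative:

    # On each JVM host (e.g. a solr box): export the metric declaration.
    # '@@' marks the resource as exported so another node can collect it;
    # 'tag' is a built-in metaparameter, used here purely for grouping.
    @@jmxtrans::metrics::jvm { "${::fqdn}:9010":    # illustrative JMX port
      tag => 'solr',
    }

    # On the central jmxtrans host: realize everything exported with that tag.
    include jmxtrans
    Jmxtrans::Metrics::Jvm <<| tag == 'solr' |>>

As paravoid cautions above, exported resources cost puppetmaster performance and each define type has to be collected separately, which is part of why the export would live in the usage rather than be baked into the module.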
[19:23:29] New patchset: Ottomata; "Adding github replication to for jmxtrans to wikimedia/puppet-jmxtrans" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72130 [19:24:20] qchris: ^, does that look ok? [20:06:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:06:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:08:17] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:08:10 UTC 2013 [20:08:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:08:57] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:08:51 UTC 2013 [20:09:37] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:09:33 UTC 2013 [20:09:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:10:28] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:10:47] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:10:36 UTC 2013 [20:10:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:17] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:11:10 UTC 2013 [20:11:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:12:07] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:12:03 UTC 2013 [20:12:47] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:12:44 UTC 2013 [20:12:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:13:37] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:13:33 UTC 2013 [20:13:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:27] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:14:19 UTC 2013 [20:14:57] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:14:51 UTC 2013 [20:15:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:15:27] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:15:25 UTC 2013 [20:15:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:16:15] ACKNOWLEDGEMENT - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours daniel_zahn flapping [20:16:17] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:16:07 UTC 2013 [20:16:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [20:16:47] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 20:16:38 UTC 2013 [20:16:57] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [20:17:17] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 20:17:14 UTC 2013 [20:21:04] Change merged: Ottomata; [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/72097 [20:21:55] New patchset: Ottomata; "Updating jmxtrans module to include jvm define wrapper" [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/72134 [20:22:04] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72134 [20:32:24] New patchset: Jgreen; "test mysql module my.cnf creation for fundraisingdb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72140 [20:35:35] New patchset: Jgreen; "test mysql module my.cnf creation for fundraisingdb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72140 [20:35:36] andrewbogott: does something need to be deployed for that patch to take effect? i just tested it and the commit showed up in -dev rather than #pywikipediabot [20:35:53] legoktm, it should've been applied by a cron. [20:35:56] I'l look in a moment. [20:36:01] ok, thanks [20:39:22] New patchset: Jgreen; "test mysql module my.cnf creation for fundraisingdb take 3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72140 [20:40:48] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72140 [20:42:02] andrewbogott: works great, thanks! [20:42:20] I didn't do anything, the cron must've caught up :) [20:42:35] heh :D [20:44:11] New patchset: MaxSem; "Emergency fix to adapt MobileFrontend for core changes" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72144 [20:46:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:47:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [20:51:28] binasher, quick graphite q for you if you are around [20:51:51] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72144 [20:55:31] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/72144' [20:55:43] Logged the message, Master [20:56:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:58:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [20:58:30] New patchset: Petr Onderka; "started writing indexes to file" [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/72148 [20:58:53] Change merged: Petr Onderka; [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/72148 [21:04:22] New patchset: Dereckson; "Throttle now handles IP ranges." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [21:06:04] New review: Dereckson; "PS6: rebased" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [21:06:32] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [21:07:02] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [21:07:07] New patchset: Dereckson; "Throttle now handles IP ranges." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [21:09:02] !log catrope synchronized php-1.22wmf9/extensions/VisualEditor/ 'Update VE to master in wmf9' [21:09:10] Logged the message, Master [21:09:18] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52778 [21:10:33] New patchset: Jgreen; "try setting some variables in my.cnf for fundraising db" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72153 [21:10:38] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/71790 [21:11:28] syncs apache [21:16:00] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72153 [21:16:42] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 5 21:16:35 UTC 2013 [21:17:33] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [21:18:18] New patchset: Jgreen; "Revert "try setting some variables in my.cnf for fundraising db" . . . well that was full of cryptic fail." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72157 [21:19:39] New review: Dzahn; "testing 3 urls on 1 servers, totalling 3 requests" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/71790 [21:21:02] New patchset: Jgreen; "revert didn't do what I expected..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72159 [21:22:09] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72159 [21:22:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [21:24:03] !log graceful Apaches, activate "ontology" alias for wikidata [21:24:13] Logged the message, Master [21:24:29] New review: Dzahn; "merged/synced/gracefull'ed" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/71790 [21:24:38] dasein.wikipedia.org [21:25:26] ori-l: ?:) [21:25:32] https://www.wikidata.org/ontology [21:25:50] Wikibase system ontology [21:25:55] heidegger/ontology joke. that is pretty cool, tho. [21:26:11] not that i understand what i'm looking at, but still. [21:27:17] hehe, same here [21:27:30] telling the wikidata people right now [21:27:59] "A reified statement." ? [21:29:03] https://jena.apache.org/documentation/notes/reification.html [21:29:15] ah:) thx [21:29:20] "We shall call these four such statements a reification quad and the components quadlets. Users of reification in Jena may, by default, simply manipulate reified statements as these quads. However, just as for Bag, Seq, Alt and RDF lists in ordinary models, or ontology classes and individuals in OntModels, Jena has additional support for manipulating reified statements." 
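One more aside, on andrewbogott's rsync conversions from mid-afternoon and mark's point that the code backing the individual modules pins the defaults to 0/0: a hedged sketch of what one converted stanza might look like. The class and define names follow the common puppetlabs-style rsync module and are assumptions here, not the actual API of the new WMF module:

    # Hypothetical sketch, assuming a puppetlabs-style rsync module.
    # Pinning uid/gid to root (0/0) is what keeps something like an NFS
    # /home export readable, where the traditional 'nobody' daemon user
    # would not work.
    include rsync::server                  # assumed server class
    rsync::server::module { 'home':        # assumed define name
      path      => '/home',
      uid       => 'root',
      gid       => 'root',
      read_only => 'yes',
    }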
[21:29:34] i wonder if it's considered a bug that /ontology works but /ontology/ doesn't [21:29:46] mutante: depends on the reification [21:29:50] :) [21:29:52] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Fri Jul 5 21:29:50 UTC 2013 [21:30:02] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [21:30:03] i still have no idea what any of that means, but it's still cool [21:31:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:32:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [21:33:03] 14:33 < Denny_WMDE> mutante: we are not using the rdf reification vocabulary, though [21:36:52] ori-l: do you want more details ?:) [21:39:31] New patchset: Asher; "slight myisam tuning for labsdbs, enable xtradb's innodb_kill_idle_transaction feature" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72193 [21:40:24] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72193 [21:45:08] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64462 [21:46:29] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66326 [21:47:25] those were harmless, bbl , bye [21:54:41] !log catrope synchronized php-1.22wmf8/extensions/VisualEditor/ 'Update VE to master in wmf8 (did wmf9 earlier)' [21:54:51] Logged the message, Master [21:55:37] mutante: :-D [22:07:42] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [22:08:00] New review: QChris; "Looks good to me, although we're currently not having a puppet-jmxtrans repo at github (for puppet-c..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72130 [22:08:32] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:07] cmjohnson1: freshness was disabled or not? [22:26:15] or was it undisabled? [22:42:09] Wasn't there a conclusion that puppet freshness issues == smptt creating a bazillion files and exhausting inodes (apparently due to submissions getting behind)? [22:42:33] orenwolf: didn't read/get/find the memo [22:45:35] AzaToth: LeslieCarr did some hunting and found submissions from the plugin getting hung up in RT 5311, but I don't think there's a resolution yet. [22:46:01] AzaToth: freshness was undisabled [22:46:10] orenwolf: it's good we normal mortals can read those RT:s [22:46:12] AzaToth: lemme cut/paste the relevant info [22:46:20] AzaToth: i hope you're not being sarcastic, because you can [22:46:26] I can? [22:46:31] :D [22:46:34] yes, you can [22:46:52] lemme find the instructions [22:46:58] mutante knows them by heart if this ping finds him [22:47:03] but I need to login to RT [22:47:03] * orenwolf wishes someone would create a multithreaded way for nagios/icinga to ingest plugin responses someday [22:47:16] unless you mirror them somewhere [22:47:21] Here's the trick: If you ever sent an email to RT, like by mailing ops-requests@rt.wikimedia.org in the past, [22:47:22] RT already auto-created a user for you, you just might not know it. So just try the "forgot password" link: [22:47:23] https://rt.wikimedia.org/NoAuth/ResetPassword/Request.html [22:47:50] LeslieCarr: nope [22:47:56] ? 
[22:48:10] "Sorry, no account in the ticket system has the email address" [22:48:30] I'm a low level mortal being [22:48:30] well, send a quickemail with "this is to createa ticket" i'll close it [22:49:02] I've no ops permissions [22:50:09] AzaToth: have you sent an email to that address ? [22:50:21] AzaToth: because you should then have an account auto created [22:51:56] sent [22:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:52:34] ok, saw the ticket [22:52:40] now try resetting the password ? [22:53:02] trying [22:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [22:53:29] haven't got any email yet [22:53:37] got your replies though [22:54:39] cool [22:55:25] but no password reset mails [22:55:55] have tried three times now [22:56:07] hrm, after you tried to have it reset passwords? let's give it another 15 minutes ..... [22:56:15] k [22:56:46] I know RT can be a bitch to configure sometimes [22:59:19] also it's not the quickest with responses…. [23:07:15] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [23:07:25] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [23:08:47] anyways - "/usr/lib/nagios/plugins/eventhandlers/submit_check_result" gets stuck/hung [23:08:53] sadly, strace -p then unsticks it [23:09:00] so i can't figure out what specifically sticks [23:09:43] and in the git repo it's under /files/icinga/submit_check_result [23:20:05] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [23:20:05] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:05] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:05] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:05] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:06] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:06] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [23:20:07] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [23:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [23:42:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:43:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [23:52:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time