[00:26:45] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 00:26:40 UTC 2013 [00:27:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [00:29:04] New patchset: Ram; "Bug: 45266 Use sequence numbers instead of timestamps" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299 [00:38:33] !log kaldari synchronized php-1.21wmf11/extensions/UploadWizard/ 'deploying some bugfixes for UploadWizard' [00:38:39] Logged the message, Master [00:53:20] New review: Ram; "Not ready for merge; still being tested." [operations/debs/lucene-search-2] (master) C: -1; - https://gerrit.wikimedia.org/r/53299 [00:57:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 00:57:34 UTC 2013 [00:58:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [01:07:22] New review: Ram; "Since the changes corresponding to what was reverted in the OAI extension are still present here, I ..." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299 [01:28:26] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 01:28:22 UTC 2013 [01:29:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [01:37:55] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:50:55] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa [01:55:35] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 188 seconds [01:55:55] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 192 seconds [01:59:16] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 01:59:10 UTC 2013 [01:59:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [02:10:06] !log deployed change 53304 to OATHAuth on wikitech [02:10:18] Logged the message, Master [02:10:32] morebots: what's your deal? why are you so slow right now? [02:10:32] I am a logbot running on wikitech-static. [02:10:32] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [02:10:32] To log a message, type !log . 
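For readers unfamiliar with the convention, the morebots help text above is the whole interface: anything said in the channel that starts with !log is copied by the bot to the Server Admin Log at wikitech.wikimedia.org/wiki/Server_Admin_Log, and the bot answers with "Logged the message, Master" once the entry is saved. A typical entry (the message text below is only an illustration, not taken from this log) looks like:

    !log restarted apache on srv123 to pick up the new configuration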
[02:10:34] oh [02:10:35] rigt [02:10:36] right [02:10:41] that box is doing an import [02:16:05] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [02:16:47] !log deploying change 53248 and 52951 to OpenStackManager on wikitech [02:16:53] Logged the message, Master [02:23:05] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [02:23:05] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [02:29:35] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [02:29:45] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 02:29:40 UTC 2013 [02:29:55] !log LocalisationUpdate completed (1.21wmf11) at Tue Mar 12 02:29:54 UTC 2013 [02:29:55] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [02:30:03] Logged the message, Master [02:30:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [02:33:55] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [02:43:55] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa [02:52:43] !log LocalisationUpdate completed (1.21wmf10) at Tue Mar 12 02:52:43 UTC 2013 [02:52:50] Logged the message, Master [03:00:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 03:00:10 UTC 2013 [03:00:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [03:21:45] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 190 seconds [03:21:55] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 191 seconds [03:30:55] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 03:30:52 UTC 2013 [03:31:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [03:40:45] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [03:40:55] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [04:01:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 04:01:23 UTC 2013 [04:02:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [04:32:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 04:31:56 UTC 2013 [04:32:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [04:46:25] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [04:48:06] New review: MZMcBride; "\o/" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53276 [04:50:38] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 229 seconds [04:50:47] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 237 seconds [04:52:25] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [04:56:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 04:56:19 UTC 2013 [04:56:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [04:57:37] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [04:57:45] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 15 seconds [04:59:56] jfyi - the replag on db1025 and db78 were me [05:02:18] !log tstarling synchronized php-1.21wmf11/extensions/Math [05:02:30] Logged the 
message, Master [05:02:50] !log tstarling synchronized php-1.21wmf10/extensions/Math [05:02:57] Logged the message, Master [05:26:45] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 05:26:44 UTC 2013 [05:27:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [05:36:58] New review: Tim Starling; "It's not just MediaWiki, it's MediaWiki minus some things and plus some other things. It's a subdire..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53125 [05:55:29] New review: Krinkle; "Ah, okay. That makes sense. We'd have /h/w/c/w but I suppose that's what we deserve." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53125 [05:57:26] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 05:57:21 UTC 2013 [05:58:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [06:27:57] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 06:27:52 UTC 2013 [06:28:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [06:32:45] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 185 seconds [06:32:56] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 190 seconds [06:39:57] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 185 seconds [06:39:58] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 185 seconds [06:41:05] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [06:45:57] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [06:45:58] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [06:58:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 06:58:19 UTC 2013 [06:58:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [07:28:56] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 07:28:47 UTC 2013 [07:29:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [07:37:55] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 193 seconds [07:37:55] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 192 seconds [07:40:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [07:40:56] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [07:42:47] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 211 seconds [07:42:55] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 217 seconds [07:59:55] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 07:59:47 UTC 2013 [08:00:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [08:31:37] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 08:31:25 UTC 2013 [08:32:11] !log jenkins: live hacked the ant build script for MediaWiki jobs. Now using the local replicated git directory instead of a handmade replication. 
[08:32:22] Logged the message, Master [08:32:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [08:37:57] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 185 seconds [08:37:57] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 185 seconds [08:38:57] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [08:38:57] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [08:48:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 08:48:28 UTC 2013 [08:49:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [08:51:42] https://bugzilla.wikimedia.org/show_bug.cgi?id=46018 [08:52:33] New review: Nemo bis; "Weird side-effect? https://bugzilla.wikimedia.org/show_bug.cgi?id=46018" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53264 [09:00:09] New review: Hashar; "(1 comment)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53264 [09:11:05] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [09:11:05] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [09:11:05] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [09:11:05] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [09:11:05] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [09:19:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 09:18:56 UTC 2013 [09:19:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [09:26:55] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 192 seconds [09:26:55] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 192 seconds [09:45:55] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [09:45:55] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [09:49:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 09:49:23 UTC 2013 [09:50:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [09:53:06] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [10:08:12] If anybody is awake: Looks like https://gerrit.wikimedia.org/r/#/c/53264/ created https://bugzilla.wikimedia.org/show_bug.cgi?id=46018 - could somebody revert / fix? [10:12:48] andre__: noticed that one already :-] [10:13:03] andre__: and proposed some fix to the rewrite rule. [10:13:05] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds [10:13:13] guess we want to ping someone this afternoon [10:13:19] yeah. 
but having somebody to deploy it would be nice ;) [10:13:22] yeah [10:13:29] not a big issue anyway [10:13:47] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 205 seconds [10:13:55] not unless google crawls mediawiki.org [10:19:55] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 10:19:50 UTC 2013 [10:20:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [10:25:28] hmm [10:37:45] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [10:38:05] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [10:50:26] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 10:50:19 UTC 2013 [10:50:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [11:20:46] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 11:20:42 UTC 2013 [11:21:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [11:32:37] New patchset: Demon; "(bug 45911) Set $wgCategoryCollation to 'uca-pt' for the Portuguese Wikipedia and Wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52903 [11:33:48] New patchset: Mark Bergsma; "Handle average (sum/count) values properly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53346 [11:51:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 11:51:28 UTC 2013 [11:52:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [11:58:24] New patchset: Krinkle; "Rename legacy 'live-1.5/' to 'w/'." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53125 [12:15:31] Change abandoned: Mark Bergsma; "This was sort of implemented a while ago. GeoIP_last_netmask() is inherently thread unsafe, but this..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28295 [12:17:05] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [12:23:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:24:05] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [12:24:06] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [12:25:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 12:25:07 UTC 2013 [12:25:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [12:25:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.577 second response time [12:48:53] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [12:51:32] New patchset: Hashar; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [12:52:24] New review: Hashar; "rebased" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [12:55:03] !log gallium : upgrading Jenkins (unscheduled) [12:55:10] Logged the message, Master [13:06:06] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 13:06:00 UTC 2013 [13:06:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [13:16:24] "(unscheduled)" can be appended to all my log messages [13:21:52] haha [13:29:53] !log Jenkins restarted! 
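The redirect breakage being discussed above (bug 46018, introduced by change 53264 to operations/apache-config) is identified later in this log as a stray [OR] flag and fixed by change 53369, "Fix wikiquote.net spurrious OR". The snippet below is only an illustrative sketch of that failure mode, not the actual contents of the repository: RewriteCond lines are ANDed by default and [OR] chains one condition with the next, so a leftover [OR] on the last condition of a block means a non-matching condition no longer stops the rule, and the wikiquote redirect ended up firing for unrelated hosts such as mediawiki.org.

    # Hypothetical sketch of the bug: the [OR] on the final condition
    # lets the rule fire even when neither Host header matches.
    RewriteCond %{HTTP_HOST} ^wikiquote\.net$ [OR]
    RewriteCond %{HTTP_HOST} ^www\.wikiquote\.net$ [OR]
    RewriteRule ^/(.*)$ http://www.wikiquote.org/$1 [R=301,L]

    # The fix is simply to drop the trailing flag so the block closes:
    RewriteCond %{HTTP_HOST} ^wikiquote\.net$ [OR]
    RewriteCond %{HTTP_HOST} ^www\.wikiquote\.net$
    RewriteRule ^/(.*)$ http://www.wikiquote.org/$1 [R=301,L]

As noted at 15:42 below, the resulting 301s were also cached by the Squids, which is why a purge was still needed after the Apache fix went out.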
[13:30:00] Logged the message, Master [13:36:37] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 13:36:32 UTC 2013 [13:37:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [13:40:05] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds [13:40:45] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 199 seconds [13:43:03] paravoid: wanna merge some of my contint puppet modules ? :-D [13:44:35] got them tested in labs so they should be fine :-] [13:44:46] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 187 seconds [13:44:49] https://gerrit.wikimedia.org/r/#/c/47664/ annnd https://gerrit.wikimedia.org/r/#/c/47742/ =) [13:45:06] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 186 seconds [13:57:01] New patchset: Hashar; "contint::website regroups apache + basic files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [13:57:01] New patchset: Hashar; "Jenkins module created out of contint manifests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47664 [13:57:01] New patchset: Mark Bergsma; "Create pbuilder images with ubuntu APT components main and universe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53358 [13:58:30] hashar: looking now [13:58:44] paravoid: rebased them to make sure they works fine [13:58:49] and running on labs right now [13:59:22] you did review the first one (jenkins module), the other one is mostly to cleanup apache [13:59:45] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 197 seconds [14:00:05] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 205 seconds [14:07:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 14:07:18 UTC 2013 [14:07:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [14:07:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53346 [14:08:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53358 [14:09:02] hashar: that's a lot of patchsets :) [14:09:46] paravoid: lot of rebases too :-] [14:10:06] hashar: can we move groups::jenkins to the module? [14:10:19] I remember saying this before, I don't remember your reply, sorrY ;-) [14:10:24] yeah [14:10:33] I need the group for some other parts [14:10:59] such as adding users to the group, and making sure web files belong to group jenkins [14:11:11] so? [14:11:12] but we can create a "groups" module ;] [14:11:15] noo [14:11:38] what would be the problem with including the class from the module instead? [14:12:40] something like jenkins::user_group ? 
[14:13:04] that means that manifests/admins.pp will depend on the jenkins module
[14:13:31] hence any puppet run will have to load the jenkins manifest
[14:13:53] ah found my comment: """ We need it to be global since it is used to add contint admins in the jenkins group (somewhere in manifests/admins.pp """
[14:13:58] https://gerrit.wikimedia.org/r/#/c/47664/4/modules/jenkins/manifests/user.pp,unified
[14:14:26] jenkins::user would be fine
[14:14:42] puppet's autoloader won't load the whole module, just that specific class
[14:14:56] but preferably we wouldn't have that class included in all machines
[14:15:07] that also means loading one extra file for any puppet run on any server
[14:15:09] just gallium/other contint boxes
[14:15:22] unless I'm missing something?
[14:16:31] hmm now I am confused
[14:16:57] so yeah that is the whole point, sticking the group declaration in admins.pp since that is where it is used. Let us skip loading the jenkins::user class on all servers
[14:18:09] I don't understand
[14:18:12] and we also need the jenkins group to be defined next to the other groups to make sure nobody is going to steal its gid :D
[14:19:39] let me start over
[14:19:44] yes please :)
[14:20:04] the jenkins group is used to manage access between the jenkins daemon (which runs under jenkins:jenkins) and the jenkins administrators
[14:20:13] we add the jenkins administrators on gallium to the jenkins group
[14:20:34] files that are produced by jenkins but need to be somehow altered/moved by the admins belong to the group jenkins
[14:20:56] that group is defined in the global manifests/admins.pp just like wikidev
[14:21:12] and receives a unique (cluster-wide) GID
[14:21:48] since the admins are defined in the global file with (include groups::jenkins) it makes more sense to me to have the group there
[14:21:54] rather than having to require the module class
[14:21:59] why?
[14:22:08] that saves one extra file lookup for each run of puppet on the full cluster
[14:22:15] depending from manifests -> a module isn't bad
[14:22:19] AND, make sure that nobody is going to steal the GID of the jenkins group
[14:22:26] since all groups are defined at the same place
[14:23:34] so that is kind of a micro optimization and ensuring consistency of our GIDs
[14:23:36] if you just remove the gid, puppet will just create a system group
[14:23:44] if the group doesn't exist already
[14:23:50] which is also fine
[14:24:09] it isn't an optimization at all btw
[14:24:20] doh
[14:24:20] this just matters for gallium which will open that file anyway
[14:24:46] but even if it was, let's not structure our puppet repo based on how many files the puppetmaster needs to open :)
[14:25:03] hehe :-]
[14:25:09] we can put EVERYTHING in site.pp
[14:25:13] hehe
[14:25:43] which is more or less what we are doing right now hehe
[14:26:01] not by a long shot
[14:26:02] depending from manifests -> module is fine
[14:26:04] paravoid: so want me to move the groups::jenkins definition to jenkins::group ?
[14:26:16] having the module depend on admins.pp is not
[14:26:23] imho
[14:27:00] either that or move it within jenkins::user
[14:27:32] did I convince you or are you just doing whatever it takes for me to merge it? :P
[14:28:04] 2
[14:28:05] ;-]
[14:28:18] haha
[14:28:45] so I should just remove the gid => 561 ?
:-] [14:28:46] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 12 seconds [14:29:00] I think it should be fine, yes [14:29:05] my idea was merely to keep the way we define groups [14:29:07] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 4 seconds [14:29:13] probably does not matter since only gallium has it [14:33:20] rebasing and trying out on the labs instance [14:33:39] New patchset: Hashar; "contint::website regroups apache + basic files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [14:33:39] New patchset: Hashar; "Jenkins module created out of contint manifests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47664 [14:35:13] New review: Hashar; "Per discussion with Faidon, moved the 'jenkins' group definition out of admins.pp to the Jenkins mod..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47664 [14:35:35] notice: Finished catalog run in 37.58 seconds \O/ [14:35:52] paravoid: https://gerrit.wikimedia.org/r/#/c/47664/13..14/manifests/admins.pp,unified :-] [14:36:05] the group is there https://gerrit.wikimedia.org/r/#/c/47664/13..14/modules/jenkins/manifests/group.pp,unified [14:36:23] just 40 seconds, that's \o/ indeed [14:36:24] * paravoid grins [14:36:39] yeah I got puppet to run in a ram disk [14:36:43] makes things faster [14:36:52] jk [14:37:55] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 14:37:45 UTC 2013 [14:38:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [14:38:44] I have a minor nitpick [14:38:46] https://gerrit.wikimedia.org/r/#/c/47664/14/modules/jenkins/manifests/user.pp [14:39:02] you don't need to require the class if you require => Group [14:39:17] you can either just include the class or drop the require => Group [14:39:24] personally I'd prefer the former but either works [14:40:06] ahh [14:40:23] very very minor as I said :) [14:40:36] I have to maintain by fame of badass reviewer though [14:41:20] done :-] [14:41:36] New patchset: Hashar; "contint::website regroups apache + basic files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [14:41:37] New patchset: Hashar; "Jenkins module created out of contint manifests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47664 [14:41:47] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47664 [14:43:19] New review: Faidon; "Thanks!" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/47664 [14:43:28] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47664 [14:44:48] hashar: why is there a "require" here: https://gerrit.wikimedia.org/r/#/c/47742/11/manifests/misc/contint.pp ? [14:45:01] cause I am a noob ? 
:-] [14:45:20] I still don't know when to use include or require [14:45:28] so to play it safe I usually use require [14:45:30] require is basically include + a require => Class [14:45:49] but it really should be avoided imho [14:49:20] New patchset: Hashar; "contint::website regroups apache + basic files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [14:51:30] New patchset: Hashar; "contint::website regroups apache + basic files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [14:51:43] paravoid: rebased and fixed the require [14:52:08] trying in labs [14:53:01] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [14:53:33] my other "complaint" [14:53:44] (not a big one obviously, I just merged it :) [14:53:58] is the /srv tree organization [14:54:32] e.g. you define /srv/localhost which is very non-contint namespace and could potentially clash with another module in the future [14:54:36] but let's cross that bridge then :) [14:54:42] if ever [14:56:15] yeah that might be eventually a probel [14:56:26] should probably have been something like /srv/qunit/localhost [14:57:11] New patchset: Hashar; "pep8 configuration file" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53360 [14:57:11] New patchset: Hashar; "pass pep8 linting checks" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53361 [14:58:43] paravoid: thanks for your review sprint :-] [14:59:39] New review: Krinkle; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47742 [14:59:51] hashar: ignore that comment [15:00:10] Krinkle: the one you have put in gerrit or your last comment asking me to ignore it ? :-] [15:00:10] Ignore tabs used as indentation (W191) since mark loves tabs. [15:00:11] hahahaha [15:00:19] I'm not +2 that [15:00:26] I'll let mark do it [15:01:01] New patchset: Hashar; "drop 'sys' import: unused" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53363 [15:02:02] What's the status of RT 4695? [15:02:49] reopened today [15:08:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 15:08:28 UTC 2013 [15:09:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [15:09:54] I suspect it's that extra [OR] which is causing issues [15:12:41] paravoid: turns out jenkins::user has to require jenkins::group despite the require => group['jenkins'] [15:12:51] New review: Faidon; "Andrew, are you taking over this? If not, maybe we should abandon?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43886 [15:12:56] paravoid: I got err: Failed to apply catalog: Could not find dependency Group[jenkins] for User[jenkins] at /var/lib/git/operations/puppet/modules/jenkins/manifests/user.pp:13 [15:13:35] New review: Andrew Bogott; "Yep, I'll take it on." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43886 [15:13:39] that's not the problem [15:13:43] class jenkins::group { [15:13:43] class jenkins { [15:13:59] jenkins::group::jenkins ? :) [15:14:03] I don't know how I missed that [15:15:19] my class in the module is jenkins::group so ::jenkins:group ? [15:15:32] ? [15:15:37] remove the subclass [15:15:46] ohhh [15:15:47] sorry [15:15:51] class jenkins::group { group { 'jenkins': ... 
} } [15:16:13] system => true [15:16:40] I'll do it [15:16:46] got it [15:17:24] I'm ready to push :) [15:18:00] New patchset: Hashar; "fix jenkins::group class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53366 [15:18:06] paravoid: ^^^ [15:18:28] New review: Hashar; "recheck" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53363 [15:19:18] New review: Hashar; "recheck" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53363 [15:19:43] New patchset: Faidon; "fix jenkins::group class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53366 [15:20:43] jenkins is annoyingly slow lately [15:21:00] yeah I have been reloading it [15:21:22] so it waits for all jobs to complete before proceeding new ones :/ [15:23:50] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53366 [15:23:55] too slow [15:25:51] ahharhar [15:25:58] uh oh [15:25:59] what? [15:28:49] Again, if anybody is awake: Looks like https://gerrit.wikimedia.org/r/#/c/53264/ created https://bugzilla.wikimedia.org/show_bug.cgi?id=46018 - could somebody revert / fix? [15:28:57] (Redirects such as http://mediawiki.org are going to http://wikiquote.org (301 Moved Permanently)) [15:30:18] ugh [15:31:57] !log Restarted Zuul process. Was stuck somehow waiting for a job that has been canceled. [15:32:04] Logged the message, Master [15:33:22] New patchset: Faidon; "Fix wikiquote.net spurrious OR" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53369 [15:33:27] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with args zuul-server [15:33:48] New patchset: Mark Bergsma; "Imported Upstream version 3.1.2" [operations/debs/ganglia] (master) - https://gerrit.wikimedia.org/r/53370 [15:34:13] Change merged: Mark Bergsma; [operations/debs/ganglia] (master) - https://gerrit.wikimedia.org/r/53370 [15:34:54] Change merged: Faidon; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53369 [15:34:56] 3.1.2? 
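To make the outcome of the jenkins group discussion above easier to follow: the change merged here (53366) puts the group inside the module instead of manifests/admins.pp, drops the hard-coded gid => 561 so Puppet just creates a system group, and removes the accidental nested class. A minimal sketch of that shape, with illustrative attribute values rather than a copy of the merged manifests:

    # modules/jenkins/manifests/group.pp
    class jenkins::group {
        # No fixed gid: Puppet allocates one when creating the system group.
        group { 'jenkins':
            ensure => present,
            system => true,
        }
    }

    # modules/jenkins/manifests/user.pp
    class jenkins::user {
        # Including the class is enough; an additional
        # require => Group['jenkins'] would be redundant.
        include jenkins::group

        user { 'jenkins':
            ensure => present,
            gid    => 'jenkins',
            home   => '/var/lib/jenkins',
        }
    }

The "Could not find dependency Group[jenkins] for User[jenkins]" error quoted above came from the earlier version of the class, where the group resource was wrapped in an extra subclass and so was never actually declared as a plain Group['jenkins'].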
[15:35:16] just as initial commit [15:35:32] it's the first commit from the debian repo [15:35:43] i couldn't get gerrit to accept it otherwise [15:35:52] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53371 [15:36:25] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 2 processes with args zuul-server [15:38:24] New review: Hashar; "recheck" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53371 [15:38:43] root is doing a graceful restart of all apaches [15:38:53] that would be moi [15:39:06] !log root gracefulled all apaches [15:39:15] Logged the message, Master [15:39:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 15:39:06 UTC 2013 [15:39:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [15:39:39] Change abandoned: Hashar; "(no reason)" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53371 [15:39:53] New review: Hashar; "recheck" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53363 [15:40:36] New patchset: Mark Bergsma; "Imported Upstream version 3.5.0" [operations/debs/ganglia] (upstream) - https://gerrit.wikimedia.org/r/53372 [15:41:08] New patchset: Mark Bergsma; "Imported Upstream version 3.5.0" [operations/debs/ganglia] (master) - https://gerrit.wikimedia.org/r/53373 [15:41:08] New patchset: Mark Bergsma; "ganglia (3.5.0-wm1) precise; urgency=low" [operations/debs/ganglia] (master) - https://gerrit.wikimedia.org/r/53374 [15:41:49] !log apache graceful to fix bz 46018 / RT 4695 [15:41:55] Logged the message, Master [15:42:01] and of course the 301 is cached [15:43:01] mark: what's the easiest way to send a purge http://mediawiki.org/ (among others) across every squid? [15:43:12] Change merged: Mark Bergsma; [operations/debs/ganglia] (upstream) - https://gerrit.wikimedia.org/r/53372 [15:43:29] there's a purge maintenance script in mediawiki maintenance/ [15:43:33] it takes URLs as stdin [15:44:08] in /home/wikipedia/common/php/maintenance do: echo 'http://blahblahblah' | php ./purgeList.php --wiki aawiki [15:44:11] aawiki? [15:44:22] it doesn't matter [15:45:07] okay [15:45:10] thanks [15:45:11] Change merged: Mark Bergsma; [operations/debs/ganglia] (master) - https://gerrit.wikimedia.org/r/53373 [15:45:56] No MWMultiVersion instance initialized! MWScript.php wrapper not used? [15:46:09] heh, instructions are outdated, who would have thought [15:49:43] wtf [15:49:51] apache-graceful-all is still broken with eqiad? [15:49:55] dammit [15:52:12] and how are we pushing apache configs all these days if that's still broken? [15:58:56] New review: Demon; "recheck" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52890 [16:02:16] New review: Faidon; "I tested it and it works. I was bitten by this today, really annoying. Could you merge, build the pa..." [operations/debs/wikimedia-task-appserver] (master) C: 2; - https://gerrit.wikimedia.org/r/49231 [16:02:30] mutante: can you please take care of https://gerrit.wikimedia.org/r/49231 ? [16:03:00] andre__: fixed, resolved the BZ & RT in case you didn't see it [16:03:16] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53378 [16:05:44] paravoid, big thanks! 
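On the multiversion cluster the quoted purge invocation fails exactly as shown ("No MWMultiVersion instance initialized!") because maintenance scripts have to be run through the MWScript.php wrapper; the usual way to do that is the mwscript helper on the deployment host. A sketch of the working equivalent, assuming that helper is in the path:

    # purgeList.php reads URLs from stdin, one per line, and sends the
    # purges to every Squid; as noted above, the --wiki value only picks
    # which configuration is loaded, so any wiki id works.
    echo 'http://mediawiki.org/' | mwscript purgeList.php --wiki=aawiki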
[16:06:47] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53378 [16:07:47] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53378 [16:08:26] Change abandoned: Demon; "Looks good, can abandon now :)" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53378 [16:08:35] New review: Demon; "recheck" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52890 [16:08:52] I think recheck is bugged [16:09:04] and it will not work anyway because that is only for the 'check' pipeline [16:09:05] :( [16:09:34] ^demon: the tweak I have made (and deployed) https://gerrit.wikimedia.org/r/#/c/53376/2..3/operations-debs.yaml,unified [16:09:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 16:09:33 UTC 2013 [16:09:42] ^demon: will be there later tonight [16:09:45] daughter time for now [16:09:52] <^demon> Awesome, thanks! [16:09:56] New review: Silke Meyer; "Still do not merge! WikibaseSolr is being re-checked, something has been changed in the database tha..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52043 [16:10:03] ^demon: to retrrigerr the tests you will need a new patchset :( [16:10:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [16:42:07] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [16:42:07] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 16:42:00 UTC 2013 [16:42:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [16:49:38] is it no longer possible to check changes into mediawiki svn? [16:50:30] hrm, i think nm. operator error. [16:53:07] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 187 seconds [16:53:45] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 201 seconds [16:54:47] New patchset: Demon; "Revert "Fix for bug 45266. Needs parallel changes to OAI."" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53385 [16:57:10] I think svn is supposed to be read only about now... [16:57:40] <^demon> All of mediawiki svn is. [16:57:43] i remember a message at some point about either wikimedia or mediawiki [16:57:47] ah mediawiki [16:58:06] <^demon> What're you trying to work on? If it's not already in git, we can move it there :) [16:58:07] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds [16:58:07] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 182 seconds [16:58:17] otrs (/me shudders) [16:58:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:58:38] mediawiki/trunk/otrs [16:59:13] <^demon> Hmm, now...where to put this in git [16:59:26] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.056 second response time [16:59:28] i'm not sure it's worth moving it to git at this point [16:59:58] in theory we'll upgrade before the apocalypse, and all the patches in svn will either have to be redone or will already be in the new version [17:00:15] i'm tempted to live-hack the one-line change [17:00:30] * ^demon will pretend he didn't hear that ;-) [17:00:47] lol. 
now I *really* want to live-hack it [17:01:05] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [17:01:05] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [17:01:26] New review: Ram; "Looks good" [operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/53385 [17:01:51] New patchset: Faidon; "Cleanup the base class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/33066 [17:02:36] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:41] Change merged: Demon; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53385 [17:06:50] LeslieCarr: what ever happened with yesterday's mailman issues? [17:06:57] see 4716 [17:12:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 17:12:34 UTC 2013 [17:13:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [17:21:03] no idea jeremyb_ it went away on its own [17:21:17] LeslieCarr: want to look at the ticket? [17:21:42] hey binasher (morning), could you maybe quickly chime in on https://rt.wikimedia.org/Ticket/Display.html?id=4689 ? [17:21:47] a similar thing was happening to someone else [17:25:25] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.771 second response time [17:25:38] drdee: europium is fully replacing locke, yeah? [17:26:38] yes [17:27:13] yay for europe [17:28:08] hmm i'd rather we use a high perf misc server instead of a misc server [17:28:16] !log reedy synchronized php-1.21wmf11/includes/Collation.php [17:28:23] Logged the message, Master [17:28:31] RobH: would it be possible to swap europium with a high perf misc? [17:28:59] LeslieCarr: i replied asking her to try again [17:29:10] thanks [17:29:14] !log reedy synchronized php-1.21wmf10/includes/Collation.php [17:29:19] Logged the message, Master [17:30:52] heh, /me confused Collation with Collection in that last !log :) [17:31:36] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:48] binasher: Yep, I just matched it to what it was already running on. [17:31:59] locke vs europium [17:32:17] can you drop a ticket for the request, and can I also take europium offilne and reclaim? [17:32:27] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.054 second response time [17:34:14] New patchset: Aklapper; "bugzilla_report.php: Add query and formatting for list of urgent issues" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [17:36:36] Change abandoned: Aklapper; "Gerrit very annoying. Was easier to create a new patchset than to amend and run into cryptic errors...." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53128 [17:37:00] RobH: https://rt.wikimedia.org/Ticket/Display.html?id=4718 [17:38:55] binasher: Cool, I will get this spun up for you today [17:39:03] New patchset: Mark Bergsma; "Prevent packages from getting automatically upgraded during Ganglia rework" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53388 [17:40:02] New review: Aklapper; "By the way, I have no idea if I have to sanitize somehow the bugsummary strings that I query from Bu..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [17:40:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53388 [17:40:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:37] drdee: so that ticket is going to be closed now, but re: andrew's question, it would be nice to use the mcast relay from oxygen, but i'd rather not make oxygen more of a spof. [17:40:55] ok, got it [17:41:10] if we can make the mcast relay HA across more than one host, then that would be great [17:41:48] !log anomie synchronized php-1.21wmf11/extensions/Scribunto 'Update Scribunto to master' [17:41:50] maybe we should do that [17:41:56] Logged the message, Master [17:43:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 17:43:06 UTC 2013 [17:43:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [17:43:47] bbiab [17:50:35] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 9.983 second response time [17:51:18] New patchset: RobH; "en-wp.com/org redirect support per rt4669" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53391 [17:53:12] !log live-hack trivial OTRS patch to support filtering on X-Spam-Score for RT #4713, will check in once project is copied to git [17:53:19] Logged the message, Master [17:53:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:54:15] PROBLEM - MySQL Replication Heartbeat on db57 is CRITICAL: CRIT replication delay 213340 seconds [17:54:33] Change merged: RobH; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53391 [17:58:16] robh is doing a graceful restart of all apaches [17:58:23] not really logmsgbot [17:58:30] it still doesnt work for eqiad apaches ;_; [17:58:34] !log robh gracefulled all apaches [17:58:40] Logged the message, Master [18:00:27] !log ran apache restart as myself, not root, had a bunch of fails. restarted them as root, hopefully nothing noticable to users [18:00:37] Logged the message, RobH [18:01:55] !log en-wp.com/org now redirect to en.wikipedia per rt 4669 [18:02:02] Logged the message, RobH [18:04:55] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 322411 seconds [18:04:55] PROBLEM - MySQL Replication Heartbeat on db65 is CRITICAL: CRIT replication delay 322382 seconds [18:05:36] paravoid: redirecting seems to be still an issue: https://bugzilla.wikimedia.org/show_bug.cgi?id=46018 [18:05:46] PROBLEM - MySQL Replication Heartbeat on db55 is CRITICAL: CRIT replication delay 322394 seconds [18:06:01] andre__: we can't purge all possible URLs I'm afraid [18:06:24] ah [18:06:45] PROBLEM - MySQL Replication Heartbeat on db50 is CRITICAL: CRIT replication delay 322407 seconds [18:07:05] PROBLEM - MySQL Slave Delay on db57 is CRITICAL: CRIT replication delay 213807 seconds [18:08:08] I should read the latest bug comments first before making noise here, darn. [18:09:05] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 322617 seconds [18:09:19] don't worry about it [18:11:47] PROBLEM - MySQL Slave Delay on db65 is CRITICAL: CRIT replication delay 321723 seconds [18:13:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 18:13:30 UTC 2013 [18:14:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [18:14:43] mutante: saw my ping above? 
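A note on the graceful restarts being logged here: apache-graceful-all apparently still does not cover the eqiad Apaches (see the complaints at 15:49 and 17:58), and the run "as myself, not root" failed until it was repeated as root. The one-liner below is only a hypothetical manual equivalent, assuming a dsh node group that lists all the Apaches and sudo rights on them; the group name is a guess, not necessarily the one the script actually uses:

    dsh -g mediawiki-installation -M -- 'sudo apache2ctl graceful'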
[18:14:57] PROBLEM - MySQL Slave Delay on db55 is CRITICAL: CRIT replication delay 322640 seconds [18:15:03] notpeter: so search23 had its mainboard swapped, but now its serial redirection is not properly set. I have a ticket in for Steve to fix it [18:15:16] just fyi, you are down a server in tampa search pool1. [18:15:28] mutante: also, I have to go in a while, could you watch https://bugzilla.wikimedia.org/show_bug.cgi?id=46018 and see if anything serious crops up? [18:15:35] in a bit even [18:15:36] for a while :) [18:15:41] RobH: wooo [18:15:42] andre__: see above [18:15:42] cool! [18:15:43] thank you [18:16:02] once its fixed, i'll reinstall and run puppet, etc [18:16:10] but i may not put it back into service without checking with ya ;] [18:16:45] PROBLEM - MySQL Slave Delay on db50 is CRITICAL: CRIT replication delay 321737 seconds [18:16:49] ACKNOWLEDGEMENT - Host search23 is DOWN: PING CRITICAL - Packet loss = 100% rhalsell rt3423 [18:16:52] RobH: cool! sounds good [18:17:32] New review: pugmajere; "Honestly, we could also just rename ircecho to ircecho.py and adjust the Makefile (aka debian/rules ..." [operations/debs/ircecho] (master) C: 1; - https://gerrit.wikimedia.org/r/53360 [18:17:55] PROBLEM - MySQL Replication Heartbeat on db68 is CRITICAL: CRIT replication delay 323021 seconds [18:18:55] paravoid: re.. on it now [18:19:13] i see, i put an [OR] too much in there [18:19:30] !log europium reclaimed per rt4689 [18:19:35] Logged the message, RobH [18:19:36] my ping above was for apache-graceful-all [18:19:48] PROBLEM - MySQL Slave Delay on db68 is CRITICAL: CRIT replication delay 322871 seconds [18:19:54] That reads really amusingly [18:20:44] paravoid: yep, ok [18:20:49] it bit me too :) [18:22:10] New patchset: RobH; "reclaiming europium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53397 [18:27:07] Hi guys, does anyone know how we could restore access on EE-Prototype on WMFLabs? (http://ee-prototype.wmflabs.org/) We need the site for a critical deployment this week, led by matthiasmullie. Here's the Bugzilla ticket: https://bugzilla.wikimedia.org/show_bug.cgi?id=46035 [18:29:23] paravoid: Did Tim say anything to you about icu upgrades? [18:29:38] New review: pugmajere; "Thanks." [operations/debs/ircecho] (master) C: 1; - https://gerrit.wikimedia.org/r/53361 [18:29:39] he didn't but I saw the backlog [18:29:42] and built packages [18:29:46] and sent him a personal email [18:30:21] Awesome :D [18:30:32] New review: pugmajere; "Thanks." [operations/debs/ircecho] (master) C: 1; - https://gerrit.wikimedia.org/r/53363 [18:31:06] he shouldn't need me to do it but I'll try to be around when he's here too [18:31:31] Cool. He should be around in 4 or so hours I suspect [18:31:36] it should be just a "reprepro include" to put it into apt and apt-get update; apt-get upgrade [18:35:26] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.078 second response time [18:37:19] paravoid: so the sync script is catching up through the last wiki (commons) [18:37:30] AaronSchulz: originals? [18:37:33] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53397 [18:37:46] at any rate there is still only like one rgw frontend running [18:37:54] journal replay? 
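The icu upgrade mentioned above is expected to be exactly the "reprepro include ... apt-get update; apt-get upgrade" dance paravoid describes. A rough sketch, assuming the same precise-wikimedia distribution used for the ganglia packages later in this log; the .changes filename is a placeholder:

    # On the APT repository host (run from the repository base directory,
    # or point reprepro at it with -b):
    reprepro include precise-wikimedia icu_<version>_amd64.changes

    # On each application server, once the packages are published:
    apt-get update && apt-get upgrade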
[18:37:56] when might that be straighted out [18:38:03] yes [18:38:14] great [18:38:23] yeah, I need to put a bit more work [18:38:32] I'd like that fixed before using multiwrite [18:38:33] we have two rgw frontends, unless one of them crashed or something [18:38:35] Krenair: i see you reopened that ticket for wikivoyage.com. that works for me meanwhile. redirect to wikivoyage.org [18:38:47] ok, that sounds reasonable [18:38:54] it's still redirecting me [18:39:03] only fe1002 has any real load [18:39:13] 1001 has a bit [18:39:14] Krenair: how about now? [18:39:26] still [18:39:38] I just purged it, so unlikely [18:39:41] try shift+f5 [18:39:46] AaronSchulz: ok, I'll have a look [18:39:50] works now [18:39:52] AaronSchulz: not now though, I really have to go [18:39:57] ok [18:41:36] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:25] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.090 second response time [18:42:54] Krenair: paravoid , cool, thanks [18:42:57] !log Imported ganglia 3.5.0 packages into the precise-wikimedia APT repository [18:43:03] Logged the message, Master [18:43:27] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0331889362 (gt 8.0) [18:44:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 18:44:09 UTC 2013 [18:44:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [18:45:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:47:16] !log europium shuttin down [18:47:23] Logged the message, RobH [18:48:35] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 7.107 second response time [18:49:01] New review: Ram; "Looks good" [operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52890 [18:49:05] PROBLEM - Host europium is DOWN: CRITICAL - Plugin timed out after 15 seconds [18:51:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:52:21] New review: Ram; "Looks good." [operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52895 [18:53:39] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 7.360 second response time [18:54:47] !log authdns-update for europium to gadolinium ip [18:54:53] Logged the message, RobH [18:56:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:57:25] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.45343707143 [18:59:26] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.248 second response time [19:02:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:37] New patchset: RobH; "gadolinium replacing europium/locke" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53401 [19:06:55] New review: Krinkle; "Fixed in I6b4143de2299578." [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53264 [19:09:53] New patchset: JanZerebecki; "Fold wikiquote rewrites into one." 
[operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53403 [19:12:05] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [19:12:05] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [19:12:05] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [19:12:05] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [19:12:05] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [19:12:25] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.066 second response time [19:14:46] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 19:14:39 UTC 2013 [19:15:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [19:18:24] Change merged: Dzahn; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/49231 [19:18:28] New patchset: Reedy; "Bug 45936 - multilingual babel boxes in Wikidata" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53404 [19:19:13] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53404 [19:20:05] !log reedy synchronized wmf-config/InitialiseSettings.php [19:20:11] Logged the message, Master [19:20:58] New patchset: Demon; "Fix a bunch of type mistakes that relied on casting" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52894 [19:21:21] New review: Demon; "PS2 drops unrelated update to lib/mwdumper.jar" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52894 [19:21:31] New patchset: Demon; "Useless warning supression" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52895 [19:25:31] New review: Ram; "The change in PositionalScorer.java is risky since the HashMap discards duplicates (old code) but th..." [operations/debs/lucene-search-2] (master) C: -1; - https://gerrit.wikimedia.org/r/52894 [19:27:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:29:20] New review: Ram; "Looks good." [operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52893 [19:35:29] New review: Ram; "Looks good." 
[operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52895 [19:37:34] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52953 [19:38:35] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 6.939 second response time [19:38:57] New review: Jeremyb; "(it's actually RT 4695)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53369 [19:39:08] Change abandoned: RobH; "forgot to merge, someone else handled in another patchset" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51031 [19:40:12] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53401 [19:41:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:08] New patchset: RobH; "forgot to add gadolinium partman entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53405 [19:45:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 19:45:06 UTC 2013 [19:45:27] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [19:46:32] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53405 [19:46:46] New review: Ram; "Generally speaking, this is a good thing to do. However, one thing that worries me about this change..." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52892 [19:47:34] New review: Ram; "Looks good." [operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52891 [19:54:05] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [19:54:35] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 9.751 second response time [20:08:35] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:08:55] RECOVERY - Host europium is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [20:09:51] ......thats amazing since i took it offline, heh [20:11:29] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 4.223 second response time [20:11:58] PROBLEM - SSH on europium is CRITICAL: Connection refused [20:12:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:12:29] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds [20:12:29] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 183 seconds [20:13:30] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.29297223022 (gt 8.0) [20:13:41] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 194 seconds [20:13:41] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 194 seconds [20:16:41] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 20:16:36 UTC 2013 [20:17:32] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.0 [20:17:36] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [20:20:47] trying sync-common on a single server for testing.. takes forever.. 
hmm [20:20:53] RECOVERY - SSH on europium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [20:23:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.443 second response time [20:24:23] root@mw1044:~# sync-common [20:24:23] Copying to mw1044 from 10.0.5.8... [20:24:31] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 2 seconds [20:24:31] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 1 seconds [20:24:31] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 20:24:30 UTC 2013 [20:25:21] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [20:34:33] PROBLEM - NTP on europium is CRITICAL: NTP CRITICAL: No response from NTP server [20:35:13] Reedy: poke! [20:35:20] New patchset: RobH; "testing the new unified cert" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53415 [20:35:24] Reedy: https://bugzilla.wikimedia.org/show_bug.cgi?id=45446#c13 – "Maybe CACHE_ANYTHING goes to a different cache then was being purge(?)" ? [20:35:38] I tried it [20:35:38] No [20:35:41] Reedy: (this is the sv.wiki thing) [20:35:48] Oh [20:35:52] Maybe [20:35:53] No idea [20:36:36] > var_dump( wfGetCache( CACHE_ANYTHING ) ); [20:36:37] object(MemcachedPeclBagOStuff)#21 (2) { [20:39:01] Reedy: can you try purging that key again? [20:39:06] (just in case) [20:39:10] Just in case? [20:39:13] (won't hurt anything, will it.) [20:39:13] It doesn't [20:39:16] I tried earlie r:p [20:39:36] heh [20:41:46] New patchset: RobH; "testing the new unified cert" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53415 [20:45:51] New patchset: RobH; "testing the new unified cert" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53415 [20:47:46] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53415 [20:56:42] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 20:56:31 UTC 2013 [20:57:22] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [21:03:41] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:04:32] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4915 bytes in 0.055 second response time [21:06:40] New patchset: Hashar; "contint: move analytics packages to the contint module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53422 [21:06:41] New patchset: Hashar; "contint: get rid of Sun JDK" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53423 [21:06:42] New patchset: Hashar; "contint: move tmpfs disk to the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53424 [21:06:43] New patchset: Hashar; "contint: move apache proxy configuration to module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53425 [21:07:31] ori-l: i'm on one of the comfy chairs near the mushroom kingdom [21:09:30] hmm [21:09:35] surprisingly my 4 patches works fine [21:12:21] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 30 seconds [21:12:21] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 30 seconds [21:17:48] Is there already a bug for the intermittent bits (css+js) failures I've been getting over the last few days? 
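The "> var_dump( wfGetCache( CACHE_ANYTHING ) )" lines above read like an eval.php session, which is also the simplest way to purge the stubborn sv.wiki key by hand. A sketch of what that would look like; the key below is a placeholder, since the log never shows the actual key being purged:

    mwscript eval.php --wiki=svwiki
    > $cache = wfGetCache( CACHE_ANYTHING );
    > var_dump( $cache->delete( 'placeholder-key-name' ) );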
[21:17:51] New patchset: Hashar; "contint: move apache proxy configuration to module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53425 [21:21:12] RobH: mutante: mediawiki.org no longer has a valid ssl cert [21:21:33] really? [21:21:59] hahahahaha [21:22:04] well [21:22:05] I know why [21:22:17] the new unified certificate is missing mediawiki.org [21:22:23] goddamn it [21:22:25] i blame the parents. [21:22:27] I should probably log what we're doing [21:22:44] !log installing the new unified ssl certificate on all ssl servers [21:22:51] Logged the message, Master [21:22:53] still gets the old one (for now) [21:23:25] !log seems that the unified certificate is missing mediawiki.org, depooling the ssl servers that are serving it [21:23:32] Logged the message, Master [21:24:50] New patchset: RobH; "cert is missing mediawiki.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53462 [21:24:59] ok. it should be ok now [21:25:45] or not. checking the pooling [21:27:11] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 21:27:09 UTC 2013 [21:27:31] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [21:28:46] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53462 [21:39:52] ah. mediawiki.org always goes to tamps [21:39:53] *tampa [21:40:59] ok. fixed [21:45:25] New patchset: Pyoungmeister; "adding to changelog for https://gerrit.wikimedia.org/r/#/c/52543/" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53463 [21:47:01] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [21:47:21] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [21:47:32] Shall I take that as a "no" regarding intermittent issues with bits.wikimedia.org ? [21:48:00] Jarry1250: I haven't seen it, honestly. But, you mean just bits.w.o just not responding or what? [21:48:30] greg-g: Well, or any error, or something. Just a lack of CSS/JS on some pageloads [21:49:54] Jarry1250: haven't heard of anything, nothing in Bugzilla that I could find [21:50:26] I can try and track the next occurance, get some more details [21:53:46] Jarry1250: are you using https? [21:53:52] RobH: cert switch in progress still? [21:54:00] we are doing that, yes [21:54:01] but.... [21:54:10] we're ensuring all connections are gone from a host before restarting nginx [21:54:17] k [21:54:23] it shouldn't cause any user noticable issues [21:54:37] yes, https, but this has been going on like, at least least 24 hours [21:55:07] android chrome (nexus 4) is not trusting whatever I'm getting now. so presumably that's the new cert [21:55:22] yeah, definitely unrelated, then [21:55:53] Firefox on the same device is fine [21:57:41] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 21:57:33 UTC 2013 [21:58:21] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [21:58:56] something weird. maybe it's a missing part of the chain. I'm getting the old cert untrusted in chrome [21:59:26] (only one of them is 2016-01-20, right???) [21:59:35] are you in europe? [21:59:43] Brooklyn [22:00:30] jeremyb-phone: which site? [22:00:56] was just using https://en.wikipedia.org/w/api.php as a test [22:01:46] it's working properly for me [22:02:23] and it shows the full chain for me [22:02:47] back at a computer now [22:03:01] hrm, can you go to europe to prove our hypothesis correct? 
;) [22:03:26] * MatmaRex is in Europe, what are you breaking? [22:03:53] just got a watchmouse certificate alert [22:04:34] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53463 [22:04:49] ALERT! https services: SSL certificate problem: unable to get local issuer certificate (Peer certificate cannot be authenticated with given CA certificates). [22:05:58] Ryan_Lane: it checks https://en.wikipedia.org/wiki/Main_Page [22:08:05] are we really sure all the servers are serving the new one ? with it being sporadic it sounds sort of like there's 1 server misbehaving [22:08:37] i'm getting somewhere with s_client i think [22:09:31] ssl1001-1004 are all serving the same cert [22:10:01] New patchset: Pyoungmeister; "CHANGELOG!!!!!!!!!!!!!!!!!!!!!!!!!!" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53465 [22:10:13] let me use certificate verification with s_client [22:11:20] yeah, this is broken [22:11:28] be clearer [22:11:44] it's serving a chain of a good cert on top of a WMF self signed cert [22:11:50] that's it. just 2 in the chain [22:11:55] self-signed? ewwwwwww [22:11:59] god damnit [22:12:14] Issuer: C=US, ST=California, L=San Francisco, O=Wikimedia Foundation, CN=Wikimedia CA [22:12:23] that'll do it [22:12:44] ah [22:12:52] found it [22:13:07] New patchset: Pyoungmeister; "CHANGELOG!!!!!!!!!!!!!!!!!!!!!!!!!!" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53465 [22:13:14] !!!!!!!!!!!!!!!!!!!!!!!! [22:13:27] most browsers actually won't complain about this [22:13:37] s_client does! :) [22:13:41] yeah, of course ;) [22:13:47] any strict client will and should [22:13:59] browsers will say "this isn't the right cert in the chain, let's actually just look it up" [22:14:28] some will [22:14:35] some are not deterministic [22:14:38] chrome, firefox, safari and opera don't [22:14:40] neither does IE [22:14:43] some depend if they've seen the other intermediate before [22:14:53] the intermediate is actually served [22:14:57] err [22:15:00] sorry [22:15:10] if the intermediate is missing, it looks it up [22:15:15] basically all browsers do [22:15:21] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 189 seconds [22:15:21] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 189 seconds [22:15:23] anyway, let me know when to test :) [22:15:24] the root CAs are installed in the browser [22:16:16] most libraries that bots and such use would complain, if they actually check. many libraries don't even check by default [22:16:43] maybe my info is outdated. but there was a point not too long ago when it was true. some depended on whether they had seen the intermediate before. 
some just refuse to work unless the intermediate is there [22:16:56] New patchset: Ryan Lane; "Name the chain for unified" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53466 [22:17:01] it's been years since I've seen an issue with that [22:17:01] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53465 [22:17:20] we should really make puppet fail if the chain isn't listed [22:18:02] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [22:18:51] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53466 [22:20:13] Change merged: Demon; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52890 [22:20:53] Change merged: jenkins-bot; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52891 [22:21:00] !log depooling ssl3001 ssl3002 ssl1 ssl2 ssl1001 and ssl1002 [22:21:08] Logged the message, Master [22:23:05] New patchset: Demon; "Close a couple of leaking resources" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52893 [22:23:06] !log pushing new version 2.7 of wikimedia-task-appserver to wikipedia-precise repo (to fix apache-sanity-check) [22:23:12] Logged the message, Master [22:23:16] wikimedia-precise of course [22:23:57] Change merged: jenkins-bot; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52893 [22:25:01] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [22:25:01] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [22:28:12] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 22:28:02 UTC 2013 [22:28:23] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [22:28:35] !log pooling ssl3001 ssl3002 ssl1 ssl2 ssl1001 and ssl1002 [22:28:41] Logged the message, Master [22:28:47] !log depooling ssl3003 ssl3 ssl4 ssl1003 ssl1004 [22:28:54] Logged the message, Master [22:31:07] New patchset: Demon; "Fix a bunch of type mistakes that relied on casting" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52894 [22:32:36] New review: Demon; "PS3 restores the HashMap in PositionalScorer." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52894 [22:32:45] New patchset: Demon; "Useless warning supression" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52895 [22:32:48] jeremyb-phone: ok. should be good now [22:32:52] PROBLEM - Host europium is DOWN: CRITICAL - Plugin timed out after 15 seconds [22:33:03] !log pooling ssl3003 ssl3 ssl4 ssl1003 ssl1004 [22:33:09] Logged the message, Master [22:33:34] New review: Ram; "Looks ok." [operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52894 [22:33:50] New patchset: Dzahn; "up version of wikimedia-task-appserver to 2.7 in changelog" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/53469 [22:34:53] Change merged: jenkins-bot; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52894 [22:35:29] Change merged: Dzahn; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/53469 [22:35:43] New review: Ram; "Looks good." 
[operations/debs/lucene-search-2] (master) C: 1; - https://gerrit.wikimedia.org/r/52895 [22:38:05] RECOVERY - Host europium is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms [22:38:17] Ryan_Lane: looks good [22:38:22] great [22:39:00] * jeremyb_ is not crazy about having the same expiry for new and old though [22:39:10] they just reissued [22:39:14] e.g. on my phone it doesn't show subjectaltnames [22:39:21] but it does show expiry! [22:39:28] so that would have been a nice way to know [22:39:35] that's cause your phone's display of it sucks :) [22:39:44] New patchset: Lcarr; "removing celsus from decom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53471 [22:39:49] put a bug in with your phone's browser's upstream [22:40:03] hey, chrome's display is what i was talking about and that's better than firefox's! [22:40:14] (both mobile) [22:40:19] put a bug in with both ;) [22:40:31] PROBLEM - SSH on europium is CRITICAL: Connection timed out [22:41:11] Change merged: jenkins-bot; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52895 [22:42:06] hey notpeter looks like the mysql module has an issue that i cant figure out in 60 seconds, if you want to look at it -- err: Failed to apply catalog: Parameter alias failed: mysql-client-5.5 can not create alias mysql-client: object already exists at /var/lib/git/operations/puppet/manifests/mysql.pp:513 [22:44:50] PROBLEM - Host europium is DOWN: PING CRITICAL - Packet loss = 100% [22:45:25] Ryan_Lane, does this new cert cover pa.us.wikimedia.org? [22:45:38] wtf is pa.us.wikimedia.org? [22:45:39] and nop [22:45:44] *no. it doesn't [22:45:47] wikimedia chapter [22:45:48] a valid domain with https enabled :) [22:45:56] i suggested to use wikimedia.us for those chapters .. but ... [22:46:04] Pennsylvania, USA [22:46:05] too much of a fight [22:46:21] we don't support sub-sub domains [22:46:30] because that would get rid of sub.sub domains, the cert errors AND make use of wikimedia.us .. and they are US chapters .. [22:46:34] they should all be renamed [22:46:39] ... Well in that case you have a sub-sub domain to rename :) [22:46:43] At the moment it gives a cert for *.wikipedia.org [22:46:46] Which is invalid [22:46:49] yeah [22:46:54] Which is no different [22:47:02] it should be renamed to pa-us.wikimedia.org [22:47:16] or we should put all chapters under chapters.wikimedia.org [22:47:23] then we can add *.chapters.wikimedia.org [22:47:47] it's absurd to have randomly named sub-sub domains [22:47:53] get wikimedia.us cert and make it pa.wikimedia.us, wikimedia.us is unconfigured in Apache [22:48:04] mutante: that doesn't solve any prolems [22:48:06] *problems [22:48:10] that actually makes it worse [22:48:37] We've more of these stupid ones [22:48:37] arbcom.nl.wikipedia.org [22:48:43] we should stick all chapters under some sane url scheme [22:48:48] 4 arbcoms [22:49:06] most have their own TLDs [22:49:08] noboard_chapterswikimedia [22:49:11] those can redirect [22:49:18] *_labswikimedia need to go and die [22:49:24] Krenair: why do we need to support SSL for defunct chapters? [22:49:24] Reedy: +1000 [22:49:29] 'wg_enwiki' => '//wg.en.wikipedia.org', [22:49:42] jeremyb_, we don't. [22:49:51] But since we're supporting SSL, we should give a valid certificate. [22:49:52] Krenair: Pennsylvania's defunct [22:50:07] readerfeedback.labs [22:50:10] Pennsylvania has been claimed by NYC! 
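The certificate back-and-forth above (the watchmouse "unable to get local issuer certificate" alert, the chain that turned out to be the site certificate stacked on a WMF self-signed CA, and the unified certificate initially missing mediawiki.org) can all be checked from one place with a strict TLS client, much like the openssl s_client runs Ryan_Lane describes. The sketch below is illustrative only: the hostnames are the ones mentioned in the conversation, and which certificate you actually receive depends on which cluster answers, so it will not reproduce the 2013 incident. It does three things a lenient browser may paper over: it fails on an incomplete chain, it prints the subjectAltName entries so a missing name like mediawiki.org is visible, and it applies the one-label wildcard rule that leaves sub-sub domains such as pa.us.wikimedia.org uncovered by a *.wikimedia.org entry.

```python
#!/usr/bin/env python3
"""Strict certificate check: chain verification, SAN listing, wildcard coverage."""
import socket
import ssl

HOST = "en.wikipedia.org"   # same test target used in the conversation
CHECK_NAMES = ["en.wikipedia.org", "mediawiki.org", "pa.us.wikimedia.org"]


def covered(hostname, san_names):
    """A '*' in a SAN entry matches exactly one DNS label (leftmost position only),
    which is why a *.wikimedia.org entry does not cover pa.us.wikimedia.org."""
    host_labels = hostname.lower().split(".")
    for name in san_names:
        labels = name.lower().split(".")
        if len(labels) != len(host_labels):
            continue
        if labels[0] in ("*", host_labels[0]) and labels[1:] == host_labels[1:]:
            return True
    return False


ctx = ssl.create_default_context()  # verifies the chain; does not go fetch missing intermediates
try:
    with socket.create_connection((HOST, 443), timeout=10) as tcp:
        with ctx.wrap_socket(tcp, server_hostname=HOST) as tls:
            cert = tls.getpeercert()
except ssl.SSLCertVerificationError as err:
    # An incomplete chain (e.g. a missing intermediate) fails here with the same
    # complaint the monitoring raised: unable to get local issuer certificate.
    raise SystemExit("verification failed: %s" % err)

sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
print("subjectAltName entries:", ", ".join(sans))
for name in CHECK_NAMES:
    print("%-25s %s" % (name, "covered" if covered(name, sans) else "NOT covered"))
```

Unlike most browsers, which will often go and look up a missing intermediate themselves, a client built this way behaves like s_client and the watchmouse probe: if the served chain is not complete and properly ordered, the handshake is rejected outright.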
[22:50:16] www.meta [22:50:20] www.nl [22:51:09] liquidthreads.labs, flaggedrevs.labs, www.commons, de.labs, en.labs, those should be all in wikimedia.org zone [22:51:27] all of the labs ones should just redirect to wikitech [22:51:35] DIEDIEDIE [22:51:42] why are they still workin? [22:51:51] * jeremyb_ gets ready to run away [22:52:11] :D [22:52:47] http://pastebin.com/3LpHwtRX - lines from wgServer which match (\..*){3} [22:53:07] most of those are easy to kill off [22:53:18] we have more than just those, though [22:53:36] wg.en.wikipedia.org? [22:53:43] "workinggroup" [22:53:48] hahaha [22:54:43] What's stopping us from kill the labs wikis? [22:54:55] nothing really [22:55:03] we just need to serve a redirect for it [22:55:33] ipv6and4.labs :) [22:55:49] * Ryan_Lane sighs [22:55:51] funny how we have so many things called lab [22:55:53] labs [22:55:59] that is an A record directly to ssl3001 [22:56:04] I want that to die very badly [22:56:07] maybe wikimedians aren't very inventive when it comes to names [22:56:23] Couldn't we just delete *.labs.wikimedia.org? [22:56:23] it's to collect statistics on ipv6 usage [22:56:26] do we have beakers and test tubes? [22:56:31] Krenair: better to redirect [22:56:46] I thought they were all closed testing wikis [22:56:54] TimStarling: pretty sure all of those were named by the foundation [22:57:07] There's likely inbound links all over the place [22:57:30] hell, I wasn't even the one to use Labs as my project name [22:57:33] it was assigned to me [22:57:47] New review: Dzahn; "upped version to 2.7-1 in:" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/49231 [22:58:49] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 22:58:43 UTC 2013 [22:59:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [23:00:08] there they are: http://pastebin.com/Xcn87J9z [23:00:32] RECOVERY - MySQL Slave Delay on db65 is OK: OK replication delay 0 seconds [23:00:32] RECOVERY - MySQL Replication Heartbeat on db65 is OK: OK replication delay 0 seconds [23:00:49] duplicates! [23:00:59] I have some different ones: http://pastebin.com/3LpHwtRX [23:01:25] Oh mutante's are all *.labs.wm.o [23:02:42] So what needs to be done to move these domains? Just changes to apache-config, mediawiki-config, and DNS stuff? [23:03:08] not MediaWiki config [23:03:11] mediawiki config is just to drop the old stuff [23:03:13] if we want to keep and redirect, just Apache config, keep in DNS [23:03:27] add to deleted.dblist [23:03:28] already started to open redirects.conf [23:03:36] let's kill them [23:03:39] WITH PREJUDICE [23:03:54] But wouldn't we need to update wgServer in mediawiki-config? [23:04:04] Why? [23:04:44] It's pointing to the old domains... [23:06:00] woooo. kill kill kill [23:06:05] This is if we keep the wikis of course [23:06:25] soo.. redirect or kill .. kill is just dropping from DNS :) [23:06:43] results.labs - can't connect to server, btw [23:07:29] I can see that we might want to delete the *.labs.wm.o domains [23:08:25] But I don't think arbcom.en.wikipedia.org should be deleted :) [23:08:39] why not? 
:) [23:11:14] New patchset: Demon; "Update the ldap scripts to pep8 compliant" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53476 [23:19:30] New patchset: Dzahn; "redirect old labs sub.sub domains to wikitech" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53478 [23:20:28] New patchset: Dzahn; "redirect old labs sub.sub domains to wikitech" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53478 [23:22:40] New patchset: Dzahn; "redirect old labs sub.sub domains to wikitech, and also move the old labs.wm redirect to wikitech now" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53478 [23:29:19] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 23:29:12 UTC 2013 [23:29:29] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [23:29:41] If we get Peter/Asher to run database backups for them, we can drop those out too [23:31:28] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52936 [23:31:34] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52938 [23:31:50] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53471 [23:33:48] !log tstarling synchronized php-1.21wmf10/maintenance/updateCollation.php [23:33:54] Logged the message, Master [23:34:27] !log tstarling synchronized php-1.21wmf11/maintenance/updateCollation.php [23:34:34] Logged the message, Master [23:38:10] New patchset: RobH; "changing ssl cert monitoring to 90 days" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53485 [23:39:29] !log upgrading pmtpa search nodes to lucene-search-2 2.1.7wm1 [23:39:37] Logged the message, notpeter [23:43:16] New patchset: Reedy; "Kill wikimedialabs wiki configs!" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53487 [23:44:07] I'm doing package upgrades with dsh -F30 [23:44:22] since it all needs to be done as near to atomically as possible [23:44:44] but I guess the updateCollation.php will still take a day [23:46:16] maybe we should find a better way to do this when it comes to Ubuntu 14.04 [23:47:05] New review: RobH; "jenkins testing seems broken, so merging this simple change, full speed ahead and damn the consequen..." [operations/puppet] (production); V: 2 - https://gerrit.wikimedia.org/r/53485 [23:47:09] You mean so it's not a year after the release date before we deploy the updates? [23:47:12] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53485 [23:47:18] well, 11 months [23:48:32] well, that would be nice too [23:48:55] but I mean so that categories aren't broken for a day or two while the update is in progress [23:49:53] the ICU manual suggests compiling the application against both versions of ICU [23:50:47] if we did that, we could configure the ICU version on a per-wiki basis, which would be an improvement [23:51:08] maybe we could even encode the ICU version in the categorylinks row and then choose an appropriate version to use for sorting each category [23:51:08] storing 2 cl_sortykey columns? [23:51:23] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53478 [23:51:28] sure, could do that too [23:51:34] with the chinese-collation branch [23:52:23] though, storing the version as you suggested would be cheaper (storage space wise). 
2 columns allows migration to occur while not changing the currently used version... [23:52:39] I haven't actually looked at that branch in quite a while [23:53:54] the chinese-collation branch just uses more than one categorylinks row per cl_from/cl_to pair [23:54:11] so cl_from/cl_to/cl_collation is unique [23:54:50] Aha [23:57:01] woosters: why is en.m throwing SSL warnings in chrome ? [23:57:05] did you guys change somethign ? [23:57:16] RobH: Ryan_Lane ^^ [23:57:25] we pushed the new unified certificate [23:57:26] tfinc: is it currently? [23:57:27] it's supposed to be fixed... [23:57:32] tfinc: for which site? [23:57:35] as in the past couple hours [23:57:37] nope [23:57:38] en.m [23:57:40] in mobile, or desktop? [23:57:41] like i said [23:57:45] both [23:57:47] en.m. what? [23:57:58] en.m.wikipedia.org on both desktop and mobile [23:58:26] it would have been useful to hear this was going to happen [23:59:49] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Tue Mar 12 23:59:45 UTC 2013
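Returning to the collation thread just above (updateCollation.php keeping categories mis-sorted for a day or more, and the idea of recording which collation produced each sortkey so that cl_from/cl_to/cl_collation is unique): the shape of that migration can be pictured with a small toy model. Everything here is illustrative — the collation identifiers and sortkeys are invented, and this is not the chinese-collation branch code — it only shows why tagging rows with a collation lets new sortkeys be backfilled while readers keep getting the old ordering:

```python
"""Toy model of per-row collation tagging for categorylinks-style data."""

# (page_id, category, collation_id) -> sortkey; the triple is unique,
# mirroring the cl_from/cl_to/cl_collation uniqueness discussed above.
rows = {}


def write_sortkeys(page_id, category, sortkeys_by_collation):
    """During migration, store one sortkey per collation still in play."""
    for collation, sortkey in sortkeys_by_collation.items():
        rows[(page_id, category, collation)] = sortkey


def category_members(category, active_collation):
    """Read path: sort using only the sortkeys of the wiki's active collation."""
    members = [(sortkey, page_id)
               for (page_id, cat, collation), sortkey in rows.items()
               if cat == category and collation == active_collation]
    return [page_id for _, page_id in sorted(members)]


# Hypothetical sortkeys; real ones would come from the two ICU builds.
write_sortkeys(1, "Writers", {"icu-old": "AAS", "icu-new": "ÅAS"})
write_sortkeys(2, "Writers", {"icu-old": "BERG", "icu-new": "BERG"})

print(category_members("Writers", "icu-old"))   # [1, 2] -- ordering readers see today
print(category_members("Writers", "icu-new"))   # [2, 1] -- ordering after the switch
```

That is the property the conversation converges on: rows (or a second sortkey column) for the new ICU build can be written gradually, the wiki keeps sorting by the old collation until everything is in place, and the cutover becomes a configuration change rather than a day or two of half-migrated categories.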