[00:00:04] !log reedy synchronized langlist [00:00:06] Logged the message, Master [00:00:13] hrmm [00:00:17] Reedy: it doesnt seem to have it. [00:01:22] Reedy: ok, its on langlist file [00:01:28] is it added in svn and such per the wikitech page? [00:01:33] http://wikitech.wikimedia.org/view/Add_a_project [00:01:39] svn? :p [00:01:45] well, i guess its git now [00:01:47] hrmm [00:01:53] i think the auth-dns just pulls from langlist [00:01:56] i wonder why it didnt pull [00:02:12] It seems to have [00:02:17] OpenDNS is showing correct entries for it [00:02:20] and I can access it [00:02:32] hrmm, i killed negative cache, but my dig didnt work [00:02:36] lemme see what i messed up [00:02:52] Snowolf: Seriously? :p [00:03:07] Reedy: meh, i put wikimedia in my dig [00:03:10] old habits [00:03:12] London, England, UK [00:03:12] wikipedia-lb.esams.wikimedia.org [00:03:12] wikipedia-lb.wikimedia.org [00:03:12] 91.198.174.225 [00:03:13] heh [00:03:14] Reedy: you should be all set yes? [00:03:17] Reedy: hmm? I get emails :D [00:03:56] Snowolf: it only became accessible a few minutes ago.. And DNS takes time to propagate unless you force [00:04:11] 00:01, 7 February 2013 User account Hoo man (Talk | contribs) was created automatically [00:04:11] 00:00, 7 February 2013 User account Snowolf (Talk | contribs) was created automatically [00:04:11] 00:00, 7 February 2013 User account Leinad (Talk | contribs) was created automatically [00:04:11] yea, i had to kill a negative cached entry for it [00:05:08] Reedy: need anything else? im about to run out and wont be back online for about an hour. [00:05:19] Nope, that's great thanks :) [00:05:21] I dont wanna leave you hanging since there arent other open about =] [00:05:23] cool [00:05:36] Reedy: not forced, I just clicked the link [00:05:47] lols
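The check being fumbled above can be sketched with dig; a minimal example, assuming the standard dig tool (min.wikipedia.org — the wiki created later in this log — stands in for the new project, and ns0.wikimedia.org is one of Wikimedia's authoritative nameservers):

```bash
# Ask an authoritative server directly, sidestepping any negative-cache
# entry held by the local resolver (hostname is illustrative):
dig +short min.wikipedia.org @ns0.wikimedia.org

# The same query against the default resolver can keep returning NXDOMAIN
# until the negative-cache TTL expires, even after the langlist update:
dig +short min.wikipedia.org
```

[00:06:28] Wikimania is in hong kong?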
ruddy hell, I thought the us was far enough :P [00:06:52] Damianz: Welcome to nearly a year ago ;) [00:07:48] I've not bought a new pc recently so no reason to look up mythical creatures on wikipedia :P yay for being outdated [00:12:21] Must be really slow [00:20:05] !log reedy synchronized php-1.21wmf9/extensions/WikimediaMaintenance [00:20:06] Logged the message, Master [00:30:22] New patchset: Reedy; "Add new wikis to small.dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47819 [00:30:56] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47819 [00:32:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47799 [00:35:49] !log reedy synchronized wmf-config/InitialiseSettings.php [00:35:50] Logged the message, Master [00:40:54] New patchset: Reedy; "Add wgMetaNamespace for plwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47820 [00:41:07] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47820 [00:41:39] !log reedy synchronized wmf-config/InitialiseSettings.php [00:41:40] Logged the message, Master [00:51:49] PROBLEM - Puppet freshness on constable is CRITICAL: Puppet has not run in the last 10 hours [00:51:49] PROBLEM - Puppet freshness on ocg1 is CRITICAL: Puppet has not run in the last 10 hours [00:51:49] PROBLEM - Puppet freshness on lardner is CRITICAL: Puppet has not run in the last 10 hours [01:00:49] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [01:06:49] PROBLEM - Puppet freshness on kuo is CRITICAL: Puppet has not run in the last 10 hours [01:06:49] PROBLEM - Puppet freshness on xenon is CRITICAL: Puppet has not run in the last 10 hours [01:09:49] PROBLEM - Puppet freshness on titanium is CRITICAL: Puppet has not run in the last 10 hours [01:10:52] PROBLEM - Puppet freshness on ocg2 is CRITICAL: Puppet has not run in the last 10 hours [01:11:46] PROBLEM - Puppet freshness on tola is CRITICAL: Puppet has not run in the last 10 hours [01:12:49] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [01:13:52] PROBLEM - Puppet freshness on wtp1001 is CRITICAL: Puppet has not run in the last 10 hours [01:14:46] PROBLEM - Puppet freshness on mexia is CRITICAL: Puppet has not run in the last 10 hours [01:15:49] PROBLEM - Puppet freshness on caesium is CRITICAL: Puppet has not run in the last 10 hours [01:15:50] PROBLEM - Puppet freshness on cerium is CRITICAL: Puppet has not run in the last 10 hours [01:17:46] PROBLEM - Puppet freshness on mchenry is CRITICAL: Puppet has not run in the last 10 hours [01:18:49] PROBLEM - Puppet freshness on wtp1 is CRITICAL: Puppet has not run in the last 10 hours [01:50:27] New patchset: Krinkle; "Fix sawikiquote wgLogo." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47825 [01:51:55] Change merged: Krinkle; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47825 [01:53:56] !log krinkle synchronized wmf-config/InitialiseSettings.php 'I9fd98309' [01:53:58] Logged the message, Master [01:57:58] !log Force-running puppet on wtp1 to figure out why it's broken [01:57:59] Logged the message, Mr. Obvious [02:02:14] New review: Catrope; "This is broken. You forgot to remove git-core from existing manifests, so some servers now doubly de..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/37247 [02:03:23] ottomata: Are you around? [02:04:06] mw27: ssh: connect to host mw27 port 22: Connection timed out [02:04:07] srv266: ssh: connect to host srv266 port 22: Connection timed out [02:04:07] srv278: ssh: connect to host srv278 port 22: Connection timed out [02:04:10] mw1041: ssh: connect to host mw1041 port 22: Connection timed out [02:04:11] RoanKattouw: I assume this is snafu? [02:04:16] Yeah, ignore those [02:04:24] srv278 is almost always broken [02:06:39] New patchset: Demon; "Removing git-core manifest from other servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47826 [02:09:14] New review: Catrope; "This is not enough. You also need to fix invocations of Package[git-core] like in parsoid.pp" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/47826 [02:10:27] When wmf9 got put onto fenari, it looks like the submodule within GuidedTour was dropped. [02:10:41] Is it alright if I do a sync-dir to fix it. [02:11:18] Go ahead [02:14:06] thx RoanKattouw. ori-l changed all the steps in "How_to_deploy_code" to use git submodule update --init --recursive , but it's easy to forget [02:14:17] Right [02:14:19] Y [02:14:25] Yeah you guys are the first to use nested submodules [02:15:23] We're just using one submodule. It's everyone else who's nesting. ;) [02:15:26] But, yeah. [02:18:13] It's prompting me for passwords when I run sync-dir (similar to last time). [02:18:26] I'm going to email RT, since I think I sshed correctly and everything should be right. [02:18:35] spagewmf, can you do this one? I already did the submodule update. [02:18:45] huh [02:18:47] Ahm [02:18:55] sure, happy to [02:19:02] Mind if I ask you some questions about that, superm401 ? [02:19:10] (the password issue) [02:19:20] RoanKattouw, sure, go ahead. [02:19:32] Want me to pastebin it? [02:19:39] Please do [02:20:08] Please also run ssh-add -l in the same shell and tell me what the output is (describe, don't paste) [02:21:30] http://pastebin.ca/2311334 [02:22:06] It's an RSA private key with fingerprint. [02:22:42] superm401, sync-dir php-1.21wmf9/extensions/GuidedTour/modules/externals/mediawiki.libs.guiders/mediawiki.libs.guiders.submodule ; OK? [02:22:57] spagewmf, yep, that should do it. [02:23:17] Were those six lines all you got? [02:23:32] !log spage synchronized php-1.21wmf9/extensions/GuidedTour/modules/externals/mediawiki.libs.guiders/mediawiki.libs.guiders.submodule 'sync GuidedTour submodule omitted from 1.21wmf9' [02:23:33] Logged the message, Master [02:24:17] I might have control-Ced. I think they all should have the same underlying cause. [02:24:56] Thanks, spagewmf. [02:25:12] np. https://test2.wikipedia.org/wiki/Main_Page?tour=test WFM
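The deploy step being discussed can be sketched as follows; the staging directory is assumed from paths that appear elsewhere in this log, and the agent check is the one RoanKattouw asks for:

```bash
# Nested submodules need --recursive; a plain `git submodule update`
# only initialises the first level and silently drops the inner one.
cd /home/wikipedia/common/php-1.21wmf9   # staging path assumed from this log
git submodule update --init --recursive

# If sync-dir starts prompting for passwords, first confirm the forwarded
# ssh agent actually holds a key:
ssh-add -l
```

[02:27:53] !log LocalisationUpdate completed (1.21wmf9) at Thu Feb 7 02:27:52 UTC 2013 [02:27:54] Logged the message, Master [02:28:28] New patchset: Demon; "Removing git-core manifest from other servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47826 [02:31:20] New review: Catrope; "Looks good to me. There is one remaining use of git-core, but that's in misc/beta.pp" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/47826 [02:34:05] New review: Demon; "That's part of a require, should be fine." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/47826 [02:35:13] New review: Catrope; "D'oh, right. Roan can't read grep output."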
[operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/47826 [02:52:26] !log LocalisationUpdate completed (1.21wmf8) at Thu Feb 7 02:52:26 UTC 2013 [02:52:28] Logged the message, Master [03:09:12] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [03:19:51] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa [04:02:47] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [04:02:47] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [04:02:47] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [04:02:47] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [04:04:52] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [07:42:31] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [08:08:36] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [08:30:53] New review: Hashar; "Then I think it complains about the change being closed isnt it ?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47538 [08:38:56] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 181 seconds [08:38:56] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 181 seconds [08:50:06] New patchset: Spage; "Add a logbot class for #wikimedia-e3 channel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/46672 [08:54:30] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [08:55:51] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [09:45:36] PROBLEM - MySQL Replication Heartbeat on db1022 is CRITICAL: CRIT replication delay 186 seconds [09:45:53] PROBLEM - MySQL Slave Delay on db1022 is CRITICAL: CRIT replication delay 190 seconds [09:46:11] PROBLEM - MySQL Replication Heartbeat on db46 is CRITICAL: CRIT replication delay 224 seconds [09:46:20] PROBLEM - MySQL Slave Delay on db46 is CRITICAL: CRIT replication delay 227 seconds [09:51:08] RECOVERY - MySQL Slave Delay on db1022 is OK: OK replication delay 15 seconds [09:52:20] RECOVERY - MySQL Replication Heartbeat on db1022 is OK: OK replication delay 0 seconds [09:54:35] RECOVERY - MySQL Replication Heartbeat on db46 is OK: OK replication delay 0 seconds [09:54:53] RECOVERY - MySQL Slave Delay on db46 is OK: OK replication delay 0 seconds [10:08:25] !log jenkins : regenerating jobs to make phpcs to show the sniff codes being used. 
{{gerrit|47847}} [10:08:27] Logged the message, Master [10:53:12] PROBLEM - Puppet freshness on constable is CRITICAL: Puppet has not run in the last 10 hours [10:53:12] PROBLEM - Puppet freshness on lardner is CRITICAL: Puppet has not run in the last 10 hours [10:53:12] PROBLEM - Puppet freshness on ocg1 is CRITICAL: Puppet has not run in the last 10 hours [11:02:12] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [11:08:04] PROBLEM - Puppet freshness on xenon is CRITICAL: Puppet has not run in the last 10 hours [11:08:04] PROBLEM - Puppet freshness on kuo is CRITICAL: Puppet has not run in the last 10 hours [11:11:03] PROBLEM - Puppet freshness on titanium is CRITICAL: Puppet has not run in the last 10 hours [11:12:06] PROBLEM - Puppet freshness on ocg2 is CRITICAL: Puppet has not run in the last 10 hours [11:13:09] PROBLEM - Puppet freshness on tola is CRITICAL: Puppet has not run in the last 10 hours [11:14:03] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [11:15:06] PROBLEM - Puppet freshness on wtp1001 is CRITICAL: Puppet has not run in the last 10 hours [11:16:09] PROBLEM - Puppet freshness on mexia is CRITICAL: Puppet has not run in the last 10 hours [11:17:12] PROBLEM - Puppet freshness on cerium is CRITICAL: Puppet has not run in the last 10 hours [11:17:13] PROBLEM - Puppet freshness on caesium is CRITICAL: Puppet has not run in the last 10 hours [11:19:09] PROBLEM - Puppet freshness on mchenry is CRITICAL: Puppet has not run in the last 10 hours [11:20:13] PROBLEM - Puppet freshness on wtp1 is CRITICAL: Puppet has not run in the last 10 hours [11:44:44] paravoid: I had a google hangout with ottomata yesterday about his puppet manifests for kraken haproxy [11:45:10] paravoid: basically, I have instructed him to use a role class that holds the wikimedia settings and a module with a parameterized class :-D [11:45:30] I am not sure whether he sent a patch [11:46:04] will see :) [11:46:04] hey [11:46:08] yeah [11:46:13] it's not in an ideal state imho [11:46:26] he explained to me that it's merely temporary [11:46:34] apparently they will remove haproxy entirely [11:46:53] so I guess if that looks like something not too bad, you might want to be liberal :-] [11:47:17] he is definitely not going to write a full haproxy module for sure. His aim is to make the current haproxy hack work again then work on a replacement [11:48:09] yeah, I don't agree with that [11:48:12] we'll see [11:48:16] yeah :-] [11:48:18] that is what I told him
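The pattern hashar prescribes above, as a throwaway sketch; all class and parameter names are invented for illustration, and only the shape — a generic parameterized module class wrapped by a role class that carries the Wikimedia-specific settings — comes from the discussion:

```bash
# Invented names throughout; runnable with a local puppet install.
cat > /tmp/role_pattern.pp <<'EOF'
# Generic, reusable module class: takes its tunables as parameters.
class haproxy( $listen_port = 8080 ) {
  package { 'haproxy': ensure => installed }
}
# Role class: the only place Wikimedia-specific values live.
class role::kraken::haproxy {
  class { 'haproxy': listen_port => 9090 }
}
include role::kraken::haproxy
EOF
puppet apply --noop /tmp/role_pattern.pp
```

[11:48:40] anyway, could you possibly review a hack for beta ?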
/usr/local/apache is defined twice :( [11:48:41] paravoid: anyway, if you could please look at yet another hack for beta [11:48:43] grr [11:48:45] https://gerrit.wikimedia.org/r/#/c/45115/2/modules/applicationserver/manifests/config/apache.pp,unified [11:49:17] ugh [11:49:38] I hate /data [11:49:46] ;-] [11:50:20] that won't work [11:50:27] besides being ugly as hell, it just won't work [11:50:33] there's no ordering guarantee [11:50:34] :( [11:50:46] that one may be defined first and nfs::apache::labs later [11:50:47] ahh [11:51:02] and no, don't put the if !defined there too :) [11:51:12] so I am not sure how to fix that :( [11:51:31] I thought about mounting /data/project/apache on /usr/local/apache [11:51:53] but the mount command would not let us mount a subdir (/data/project/apache), only the main volume /data/project [11:53:00] ah maybe I can switch based on $::realm [11:53:06] uugh [11:53:13] prod would ensure directory, and labs would require nfs::apache::labs [11:53:20] such poor abstractions [11:53:56] nfs::apache::labs needs to go for starters [11:54:05] it's a misnomer [11:55:15] maybe I could move the /usr/local/apache definition to the role::applicationserver class ? [11:55:38] applicationserver::config::apache you mean? [11:56:30] or I can move the symlink for beta in applicationserver::config::apache yes [11:56:45] I am not sure if that realm varying stuff should be in the role class or the module [11:57:45] applicationserver::config::apache really needs to be split up [11:58:18] oh, maybe not [11:58:33] the sync apache config part isn't suitable for labs either, is it? [12:00:41] yeah that is broken too [12:00:45] I got a patch for it IIRC [12:02:41] New patchset: Hashar; "beta: /usr/local/apache dupe definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/45115 [12:03:05] paravoid: updated to have /usr/local/apache definition at the same place (aka under applicationserver::config::apache ) [12:03:12] and the patch to disable mwsync in beta is https://gerrit.wikimedia.org/r/#/c/47398/ [12:04:05] getting a snack, brb [12:04:20] ugh I hate how labs has /data [12:04:27] it really diverges from production [12:05:05] hashar: the if trickery is bad as it is, let's not do it in two places [12:05:18] put everything into applicationserver::config::apache I'd say, in the same if
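The failure mode paravoid describes is easy to reproduce in isolation; a throwaway sketch, with the resource bodies invented — only the collision and the unreliable !defined guard come from the discussion:

```bash
cat > /tmp/dupe.pp <<'EOF'
class prod_apache { file { '/usr/local/apache': ensure => directory } }
class labs_apache { file { '/usr/local/apache': ensure => directory } }
include prod_apache
include labs_apache
EOF
# Compilation aborts with a duplicate-definition error for
# File['/usr/local/apache']. Wrapping one side in
# if !defined(File['/usr/local/apache']) only helps if that side happens
# to be evaluated second -- puppet gives no ordering guarantee there,
# which is exactly paravoid's objection.
puppet apply --noop /tmp/dupe.pp
```

[12:07:57] so does someone know why amaranth and more are unreachable from Europe since 3.20 UTC as reported by DaB.? [12:08:14] nagios only reported some varnish problem around that time [12:08:28] amaranth isn't managed by us [12:08:45] it's toolserver/WMDE people [12:09:12] paravoid: like https://gerrit.wikimedia.org/r/#/c/45115/3/modules/applicationserver/manifests/config/apache.pp,unified ? [12:09:27] yeah [12:09:27] got rid of the nfs::apache::labs at the same time [12:09:36] not too excited of that either, but at least let's do it in one place [12:09:39] so it can be factored out later [12:09:41] paravoid: in SAL I see RobH restarted it several times [12:10:00] whenever we get git-deploy in beta, the files will be in /srv/ that would be nicer [12:10:07] (no more /data/project ! ) [12:10:20] Nemo_bis: when?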
[12:11:13] anyway this looks more like a network problem [12:11:17] it's physically located in our infrastructure [12:11:22] not really, no, it looks like the box died [12:11:39] ah better then [12:12:02] I know nothing, it's what he said :) DaB|Uni> It is not possible to reach any host in tampa from the toolserver-cluster [12:12:29] lunch brb [12:17:39] New patchset: Hashar; "sync mediawiki only in production" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47398 [12:19:34] New review: Hashar; "Rebased on https://gerrit.wikimedia.org/r/#/c/45115 which also introduced a if realm." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47398 [12:28:18] out for the rest of the afternoon [12:28:22] though I will connect tonight [13:19:46] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 181 seconds [13:19:55] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds [13:26:02] New patchset: Dzahn; "generate separate Apache sites for each planet language from template instead of a single for all" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47862 [13:30:39] New patchset: Dzahn; "generate separate Apache sites for each planet language from template instead of a single for all" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47862 [13:32:54] New patchset: MaxSem; "Advanced Solr monitoring script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47111 [13:33:12] New patchset: Dzahn; "generate separate Apache sites for each planet language from template instead of a single for all" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47862 [13:34:37] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47862 [13:36:45] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 182 seconds [13:37:12] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 191 seconds [13:48:04] New patchset: Dzahn; "remove old planet (pre-venus) puppet class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47864 [13:49:49] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47864 [13:56:48] New patchset: Dzahn; "use a simple Apache ports.conf with SSL (NameVirtualHost *:443) enabled without this being in default the first VirtualHost will take precedence" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47865 [13:57:41] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47865 [13:59:50] Reedy: you created minwiki, but it is not listed at interwikimap yet. can you fix it? [14:03:36] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [14:03:37] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [14:03:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [14:03:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [14:05:42] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [14:34:12] <^demon> paravoid: Hi. Roan spotted some problems with the "put git on all servers" change. I submitted a followup: https://gerrit.wikimedia.org/r/#/c/47826/ [14:34:16] mutante: could you also merge https://gerrit.wikimedia.org/r/#/c/47223/ by any chance? 
:) [14:34:50] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47826 [14:35:06] ^demon: I got an Internal Server Error from gerrit when I clicked submit; second time worked [14:35:24] <^demon> Hmm, I'll check the log. [14:36:03] <^demon> Wow, never seen that before. [14:36:22] <^demon> http://p.defau.lt/?0N8WjmdGzTy1KfNY9Y4Crw [14:36:26] Nemo_bis: ok, yes. i meant to run it more often than just daily, maybe i would not have made it hourly, but...actually.. why not [14:36:37] ;) [14:36:57] <^demon> paravoid: Weridddd, it didn't leave any comments from you. [14:37:03] I was thinking of 2-3 h, but I see it takes only 2 min to complete... [14:37:07] <^demon> *weirdddd, even [14:37:09] New review: Dzahn; "yes, just once daily was a little slow to update" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47223 [14:37:10] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47223 [14:37:41] New review: Demon; "Some weird bug prevented Faidon's comments from being posted. Will file upstream." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47826 [14:38:02] I didn't put anything in the comments field fwiw [14:38:20] just hit +2 and publish & submit [14:39:23] <^demon> Yeah, but it should've still left the default comments. [14:39:27] <^demon> It didn't leave anything. [14:39:53] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Thu Feb 7 14:39:34 UTC 2013 [14:40:31] nod [14:40:56] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Thu Feb 7 14:40:48 UTC 2013 [14:43:56] RECOVERY - Puppet freshness on caesium is OK: puppet ran at Thu Feb 7 14:43:40 UTC 2013 [14:46:11] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Feb 7 14:45:57 UTC 2013 [14:47:23] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Thu Feb 7 14:47:08 UTC 2013 [14:48:00] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Thu Feb 7 14:47:43 UTC 2013 [14:48:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Thu Feb 7 14:47:52 UTC 2013 [14:48:57] New patchset: Ryan Lane; "Remove keystone patches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47870 [14:49:02] RECOVERY - Puppet freshness on wtp1001 is OK: puppet ran at Thu Feb 7 14:48:38 UTC 2013 [14:49:56] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47870 [14:51:26] RECOVERY - Puppet freshness on ocg1 is OK: puppet ran at Thu Feb 7 14:51:07 UTC 2013 [14:52:02] RECOVERY - Puppet freshness on marmontel is OK: puppet ran at Thu Feb 7 14:51:29 UTC 2013 [14:55:02] RECOVERY - Puppet freshness on ocg2 is OK: puppet ran at Thu Feb 7 14:54:55 UTC 2013 [14:55:29] RECOVERY - Puppet freshness on tola is OK: puppet ran at Thu Feb 7 14:55:18 UTC 2013 [14:56:32] RECOVERY - Puppet freshness on cerium is OK: puppet ran at Thu Feb 7 14:56:16 UTC 2013 [15:00:01] is interwikimap (missing min prefix) fixed automatically or need i to create a bug? [15:00:26] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Thu Feb 7 15:00:06 UTC 2013 [15:01:02] RECOVERY - Puppet freshness on constable is OK: puppet ran at Thu Feb 7 15:00:41 UTC 2013 [15:03:26] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Feb 7 15:03:18 UTC 2013 [15:17:55] New review: Faidon; "I just realized that this is already packaged in Debian, although a version behind (0.8) as the late..." 
[operations/debs/python-jsonschema] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47662 [15:39:49] !log reedy synchronized php-1.21wmf9/cache/interwiki.cdb 'Updating 1.21wmf9 interwiki cache' [15:39:50] Logged the message, Master [15:40:11] !log reedy synchronized php-1.21wmf8/cache/interwiki.cdb 'Updating 1.21wmf8 interwiki cache' [15:40:12] Logged the message, Master [15:44:27] !log Rebooting cr1-esams [15:44:28] Logged the message, Master [15:46:05] !log reedy synchronized wmf-config/InitialiseSettings.php [15:46:06] Logged the message, Master [15:47:00] New patchset: Reedy; "Fix incubator dbname in geocrumbs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47871 [15:47:51] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47871 [15:52:26] New patchset: Reedy; "Add wikidata.dblist to ease running of maintenance scripts under all wikidata-wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47872 [15:52:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47872 [16:07:02] RECOVERY - Host ms-be11 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms [16:08:52] New patchset: Reedy; "Add wikidata dblist to noc conf" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47875 [16:09:15] paravoid, or mutante, here's another puppet question for you [16:09:35] if I'm making a role class that uses a module of the same name [16:09:46] e.g. role::kraken and modules/kraken [16:10:00] I have to be really careful about how I define and include classes, due to puppet's autoloader [16:10:05] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47875 [16:10:14] role::kraken::proxy cannot include a class from the kraken module called kraken::proxy [16:10:20] should I: [16:10:42] 1. name the role something different [16:10:42] or [16:10:42] 2. fully qualify the module class include: ::kraken::proxy
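Option 2 from the list above can be sketched like this; the class bodies are placeholders, and the leading :: is the whole point:

```bash
cat > /tmp/qualify.pp <<'EOF'
class kraken::proxy { notify { 'module class kraken::proxy': } }
class role::kraken::proxy {
  # A bare "include kraken::proxy" here can be resolved relative to the
  # role::kraken namespace; the leading :: forces the top-level module class.
  include ::kraken::proxy
}
include role::kraken::proxy
EOF
puppet apply /tmp/qualify.pp
```

[16:10:46] PROBLEM - swift-account-server on ms-be11 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:10:46] PROBLEM - swift-container-updater on ms-be11 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:10:46] PROBLEM - SSH on ms-be11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:10:47] PROBLEM - swift-object-updater on ms-be11 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:10:55] PROBLEM - swift-container-auditor on ms-be11 is CRITICAL: Connection refused by host [16:10:56] PROBLEM - swift-object-replicator on ms-be11 is CRITICAL: Connection refused by host [16:10:56] PROBLEM - swift-container-replicator on ms-be11 is CRITICAL: Connection refused by host [16:10:56] Anyone know why root seems to own nearly everything in /home/wikipedia/common/docroot/noc/conf now?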
:/ [16:11:13] PROBLEM - swift-account-reaper on ms-be11 is CRITICAL: Connection refused by host [16:11:14] PROBLEM - swift-object-auditor on ms-be11 is CRITICAL: Connection refused by host [16:11:23] PROBLEM - swift-account-auditor on ms-be11 is CRITICAL: Connection refused by host [16:12:07] PROBLEM - swift-account-replicator on ms-be11 is CRITICAL: Connection refused by host [16:12:07] PROBLEM - swift-container-server on ms-be11 is CRITICAL: Connection refused by host [16:12:07] PROBLEM - swift-object-server on ms-be11 is CRITICAL: Connection refused by host [16:15:30] New patchset: ArielGlenn; "ms-be ssd layout changed to reflect h710 controllers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47876 [16:15:40] ignore those, I'm installing [16:16:08] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47876 [16:20:49] PROBLEM - Host ms-be11 is DOWN: PING CRITICAL - Packet loss = 100% [16:26:40] RECOVERY - Host ms-be11 is UP: PING OK - Packet loss = 0%, RTA = 2.82 ms [16:34:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.954 seconds [16:37:36] New patchset: Reedy; "Bug 44741 - kowikiversity, minwiki, and tswiki using SVG instead of PNG for $wgLogo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47879 [16:38:11] !log reedy synchronized wmf-config/InitialiseSettings.php [16:38:12] Logged the message, Master [16:38:50] Oo, RobH, whatever happened with your battle 1007? [16:38:53] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 206 seconds [16:39:29] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 221 seconds [16:45:11] ottomata: spent all day, never finished, its still being bitchy [16:45:21] i'll resume later today [16:45:21] aye ok, cool, just wondering! [16:45:23] no worries, thank you! [16:46:39] New patchset: Ottomata; "Adding very simple haproxy module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47881 [16:47:44] PROBLEM - SSH on ms-be11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:48:08] paravoid, step 1! super simple haproxy module: [16:48:08] https://gerrit.wikimedia.org/r/#/c/47881/ [16:48:20] let me know if I should make more files (install.pp, service.pp, whatever) [16:51:20] RECOVERY - SSH on ms-be11 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:55:29] New patchset: Ottomata; "passwords.pp - Adding empty passwords::analytics class, s/svn/git/ in comment." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/47883 [16:56:06] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47883 [16:57:39] RECOVERY - swift-object-server on ms-be11 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [16:57:39] RECOVERY - swift-container-server on ms-be11 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [16:58:05] RECOVERY - swift-account-server on ms-be11 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [16:58:05] RECOVERY - swift-container-updater on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [16:58:05] RECOVERY - swift-object-updater on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [16:58:14] RECOVERY - swift-account-auditor on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [16:58:14] RECOVERY - swift-container-auditor on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:58:32] RECOVERY - swift-object-auditor on ms-be11 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:58:33] RECOVERY - swift-container-replicator on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [16:58:41] RECOVERY - swift-account-reaper on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [16:58:41] RECOVERY - swift-object-replicator on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [16:59:17] RECOVERY - swift-account-replicator on ms-be11 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [17:06:51] hurry up git review! [17:06:51] New patchset: RobH; "mw75-mw80 new image scalers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47884 [17:06:57] that takes forever... [17:07:20] cmjohnson1: So you wanna review my change and +1 it before I merge? [17:07:38] It is easy change, but thought you may wanna review so you are involved in all steps =] [17:08:04] yep [17:09:44] robh: what does line 1524 do? [17:10:13] if its mw75 or 76 it makes it a ganglia aggregator [17:10:28] as each service group (api, apache, scaler) has to have two aggregators per datacenter. [17:10:43] example below it on 1533 [17:11:41] ok cool..i was just wondering what was different for mw75 and 76 and why not for the 77-80 [17:11:58] yea we only want two aggregators per cluster type [17:12:04] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47884 [17:12:05] s/want/need [17:12:21] ah..cool [17:12:23] cmjohnson1: oh cool, when did we give you =2? 
[17:12:25] +2 even [17:12:31] awhile ago [17:12:31] (i agree with it, just didnt know it happened) [17:12:35] \o/ [17:12:47] i'll merge on sockpuppet [17:12:53] k..cool [17:13:19] cmjohnson1: Ok, so those are in site.pp [17:13:32] so once you have them racked, you guys can update dhcpd and either do the install [17:13:34] or hand off to me for it [17:13:36] either or [17:14:01] (keep me in loop either way and I'll follow along) [17:14:09] i will update dhcp and hand off...lots of h/w and manual labor stuff going on here today [17:14:17] actually, you dont have to update dhcpd [17:14:22] that regex is missing ^ at the start [17:14:29] and $hostname should be $::hostname [17:14:59] but all those regexes near there are missing that [17:15:03] marktraceur: » if $hostname =~ /^mw7[56]$/ { to » if $::hostname =~ /^mw7[56]$/ { [17:15:04] ? [17:15:09] yep, i just copied from nearby [17:15:14] but i can correct them all if thats what you mean [17:15:20] Um [17:15:30] ack [17:15:33] yes that's what I mean [17:15:33] sorry marktraceur [17:15:44] mark: ok, i'll fix them all right now =] [17:15:47] Ah. [17:15:50] because technically, something like testmw75 matches [17:17:00] mark: should all instances of $hostname be $::hostname or just the ganglia aggregator matches? [17:17:21] all references (not assignments) [17:17:25] but don't change them all in one huge commit [17:17:50] ok... so just change say... all the mw servers for pmtpa ? [17:18:07] also note the $ at the end [17:18:14] yeah [17:18:22] $ at the end? [17:18:29] "^" means: "there shouldn't be anything before this" [17:18:36] "$" means: nothing after this [17:18:43] so ^mw70$ matches JUST mw70 [17:18:49] whereas mw70 matches testmw7000 too [17:19:49] hrmm, my mw75 stanza has the ^$ [17:19:54] but not the $::hostname [17:19:59] i'll fix them all and ping you to look [17:20:03] well, all mw tampa. [17:20:40] ok [17:21:48] heh, everyone has put $hostname except for the two other entries i assume mark did properly [17:21:52] i am amused [17:22:16] every nonproper instance gives a log warning [17:22:31] well, now that i know, as i do other small commits I will fix them a bit at a time [17:22:37] and ensure once i fix they dont bork.
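mark's anchoring rules are easy to check from a shell, with grep -E standing in for puppet's regex matching:

```bash
# ^ pins the start and $ pins the end, so only the exact hostnames match;
# without the anchors, a hostname like testmw7500 would match mw7[56] too.
for h in mw75 mw76 mw77 testmw7500; do
  if echo "$h" | grep -qE '^mw7[56]$'; then
    echo "$h: ganglia aggregator"
  else
    echo "$h: not an aggregator"
  fi
done
```

[17:23:43] New patchset: RobH; "fixing hostname entries for ganglia aggregator logic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47886 [17:24:19] mark: ^ I think thats what you are looking for, please let me know if not. [17:25:24] cmjohnson1: So yea, once you guys have them all wired, I can take over via mgmt for the install. I'll also need to know the port # ranges they are plugged into so I can set the vlan and label the ports. [17:25:38] i am making the ticket now [17:25:47] awesome, just assign to me when done and ping it to me =] [17:30:11] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 182 seconds [17:30:44] robh: rt4488 [17:30:56] Change abandoned: CSteipp; "Enabling this on a smaller set to start" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47540 [17:31:09] cmjohnson1: you mean image scalers in the email to CT [17:31:11] not api i think. [17:31:28] yes...i can correct that [17:32:28] i already sent another email [17:32:35] cuz even though we are bringing up new scalers first [17:32:47] we are unable to fall back to tampa during this time anyhow without overloading apaches [17:32:52] its just how it is.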
[17:33:29] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [17:33:38] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 1 seconds [17:35:20] bleeeeeh, asw-d1-sdtpa is a foundry [17:35:23] >_< [17:36:20] New patchset: ArielGlenn; "fix up swift partitions on ssds for ms-be* hosts with H710s" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47889 [17:36:56] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 25 seconds [17:37:04] RobH: commented [17:37:32] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 19 seconds [17:37:34] ahhh, i see [17:37:40] mark: thx! [17:41:30] Change abandoned: ArielGlenn; "should do only for ms-be11" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47889 [17:42:59] bugger. [17:43:05] i just pushed two commits, blehhhh [17:43:17] sweet, remote rejection, whew [17:43:23] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [17:43:57] heh [17:45:33] New patchset: ArielGlenn; "ms-be11 setup swift partitions sda3/sdb3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47892 [17:46:09] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47892 [17:46:12] New patchset: RobH; "fixing hostname entries for ganglia aggregator logic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47886 [17:46:47] <^demon> paravoid: Mind reviewing a packaging update I did? [17:47:02] mark: ^ Updated the patchset with your comment in mind [17:52:14] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 184 seconds [17:52:59] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 186 seconds [17:54:15] cmjohnson1: So are mw75-85 ready to go then (since i finished network ticket) [17:54:16] ? [18:02:02] robh: yes...dracs are cfg'd but you need to add mac's etc [18:02:10] awesome [18:02:12] thanks dude [18:02:20] even added asset tag [18:05:03] !log Rebooting re1.cr1-eqiad [18:05:04] Logged the message, Master [18:06:31] cmjohnson1: it was mw75 to what? [18:06:38] cuz 84 and 85 seem to not have drac response [18:08:04] 85 [18:08:06] let me check [18:08:09] k [18:09:20] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [18:10:35] robh: k...those 2 were not connected [18:13:52] cmjohnson1: So in the future, you can for loop retrieval of macs [18:14:01] example: for mw in mw{75..85}; do MAC=$(ssh root@$mw.mgmt.pmtpa.wmnet racadm getsysinfo | awk '/^NIC.Embedded.1-1-1/ { print $4 }'); echo -e "host $mw {\n\thardware ethernet $MAC;\n\tfixed-address $mw.pmtpa.wmnet;\n}\n"; done > macs.txt [18:14:16] oh..awesome [18:14:19] that will pull all the stanzas in a formatted output to cut and paste into lease file. [18:14:33] easy = better [18:14:34] heh [18:16:34] New patchset: RobH; "added mw75-mw85 to lease file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47898 [18:16:58] cmjohnson1: so yea, copy that someplace, then the biggest favor you can do yourself is memorize how the command is structured [18:17:04] for when you have to write it from scratch ;] [18:17:25] anything you have to enter more than 8 times should be in a loop! ;]
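RobH's one-liner, reflowed with comments for readability; same commands and host range, nothing changed:

```bash
# For each new server, pull the embedded-NIC MAC from its DRAC and emit
# a ready-to-paste dhcpd host stanza; collect them all in macs.txt.
for mw in mw{75..85}; do
  MAC=$(ssh root@$mw.mgmt.pmtpa.wmnet racadm getsysinfo \
        | awk '/^NIC.Embedded.1-1-1/ { print $4 }')
  echo -e "host $mw {\n\thardware ethernet $MAC;\n\tfixed-address $mw.pmtpa.wmnet;\n}\n"
done > macs.txt
```

[18:17:35] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::a [18:17:53] mark: You are aware or working on that? ^ or should we be concerned?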
[18:17:56] that's me [18:18:21] cool [18:18:47] PROBLEM - Host wikiversity-lb.esams.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::7 [18:18:49] PROBLEM - Host wikisource-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::5 [18:18:49] PROBLEM - Host wikisource-lb.esams.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::5 [18:18:50] PROBLEM - Host wikiversity-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::7 [18:18:52] New review: RobH; "yay apaches!" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47898 [18:18:53] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47898 [18:18:56] PROBLEM - Host foundation-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::9 [18:18:58] PROBLEM - Host wiktionary-lb.esams.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::2 [18:18:59] PROBLEM - Host wiktionary-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::2 [18:19:05] gah [18:19:09] what is happening [18:19:12] mark: you may wanna send a page to tell folks ;] [18:19:12] a master switch [18:19:22] cuz they are all on PST paging hours, yet traveling in the EU [18:19:23] heh. [18:19:23] PROBLEM - Host wikibooks-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::4 [18:19:24] PROBLEM - Host wikiquote-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::3 [18:19:24] PROBLEM - Host wikinews-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::6 [18:19:25] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::a [18:19:26] PROBLEM - Host mediawiki-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::8 [18:19:41] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [18:19:41] PROBLEM - Host wikimedia-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::0 [18:19:50] ah [18:19:58] why is it so fucking slow [18:19:59] PROBLEM - Host wikipedia-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::1 [18:20:02] it went in seconds last time I did that [18:20:17] PROBLEM - Host upload-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::b [18:20:19] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [18:20:35] this is useless [18:20:44] PROBLEM - Host foundation-lb.esams.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::9 [18:21:20] PROBLEM - Host mediawiki-lb.esams.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::8 [18:21:29] PROBLEM - Host wikidata-lb.eqiad.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:ed1a::12 [18:21:48] RECOVERY - Host upload-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 117.67 ms [18:21:49] RECOVERY - Host wikinews-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 116.74 ms [18:21:50] RECOVERY - Host wiktionary-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 116.37 ms [18:21:56] RECOVERY - Host wikidata-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [18:22:05] RECOVERY - Host wikibooks-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 
0%, RTA = 115.42 ms [18:22:15] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 114.98 ms [18:22:22] RECOVERY - Host foundation-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 115.75 ms [18:22:37] New patchset: Lcarr; "fixing nagios-nrpe-server to be same as package installed version" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47899 [18:22:50] RECOVERY - Host mediawiki-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 116.85 ms [18:22:53] RECOVERY - Host wikiquote-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 115.65 ms [18:22:53] RECOVERY - Host wikisource-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 115.76 ms [18:22:54] RECOVERY - Host wikiversity-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 115.41 ms [18:22:59] RECOVERY - Host wikimedia-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 114.69 ms [18:23:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47899 [18:23:21] RECOVERY - Host wikipedia-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 115.18 ms [18:24:52] RECOVERY - Host wikiversity-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 113.21 ms [18:25:22] RECOVERY - Host wikisource-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 114.19 ms [18:25:40] RECOVERY - Host wiktionary-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 114.82 ms [18:25:57] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 113.25 ms [18:27:09] RECOVERY - Host foundation-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 113.14 ms [18:27:34] RECOVERY - Host mediawiki-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 114.24 ms [18:32:27] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [18:34:49] New patchset: Lcarr; "init does not like retry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47901 [18:35:24] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47901 [18:36:33] mutante, is this something you could review? [18:36:35] https://gerrit.wikimedia.org/r/#/c/47881/ [18:42:45] !log Rebooted re0.cr1-eqiad [18:42:46] Logged the message, Master [18:42:57] RECOVERY - MySQL disk space on neon is OK: DISK OK [18:44:24] New patchset: Hashar; "Adding very simple haproxy module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47881 [18:44:57] New review: Hashar; "I have passed the manifests through puppet-lint :-D" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/47881 [18:53:08] robh: on a5-sdtpa....there are 36 servers now...i can use up to 40 u but will need to use y cables for 2. [18:56:17] robh ^ and power will be fine for 40 servers [18:57:39] do you have the y cables to use? [18:57:42] cuz yea, lets do that. [18:57:43] yes [18:57:55] sounds good [18:57:57] cool [18:58:03] the mw75-85 are installing now [18:58:22] great news...adding rails for the others now [18:58:39] also on that note we have 9 servers left over...i added it to the ticket [18:58:46] not sure where to put them [18:59:01] we can put them into d2 when we decom some apaches there shortly.
[18:59:12] d2-sdtpa [18:59:19] kk...can you add that to the ticket [18:59:26] for history [18:59:26] thats the batch i initially wanted to decom, but found older ones [18:59:31] whats the # again? [18:59:38] 9 [18:59:48] i mean rt # [18:59:54] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 195 seconds [19:00:01] 4436 [19:00:04] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 198 seconds [19:00:06] :-P [19:01:00] cmjohnson1: Ok, updated, so pick out the 9 oldest in d2-sdtpa [19:01:04] and create a ticket to decom them [19:01:17] then we'll outline in ticket steps to decom and go from there. [19:04:38] kk [19:12:27] ok, mw75-80 OS installed, doing initial puppet runs now. [19:38:04] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [19:38:31] PROBLEM - Apache HTTP on mw76 is CRITICAL: Connection refused [19:38:58] PROBLEM - Apache HTTP on mw79 is CRITICAL: Connection refused [19:38:59] PROBLEM - Apache HTTP on mw75 is CRITICAL: Connection refused [19:39:16] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [19:39:25] PROBLEM - Apache HTTP on mw78 is CRITICAL: Connection refused [19:39:25] PROBLEM - Apache HTTP on mw80 is CRITICAL: Connection refused [19:39:52] PROBLEM - Apache HTTP on mw77 is CRITICAL: Connection refused [19:45:11] puppet is so slow..... [19:45:22] takes multiple runs to fully update a new apache. [19:45:59] cmjohnson1: they are 90% ready to go [19:46:11] soon we'll be able to pull the old ones [19:46:37] RECOVERY - Apache HTTP on mw77 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.486 second response time [19:46:57] once these runs complete, we'll add these into the node group lists and also into pybal/lvs [19:47:04] RECOVERY - Apache HTTP on mw76 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.489 second response time [19:47:22] RECOVERY - Apache HTTP on mw75 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.186 second response time [19:47:23] RECOVERY - Apache HTTP on mw79 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.213 second response time [19:47:49] RECOVERY - Apache HTTP on mw78 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.386 second response time [19:47:49] RECOVERY - Apache HTTP on mw80 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.510 second response time [19:49:36] cmjohnson1: you about?
[19:54:23] I was going to have you go ahead and repool the new ones to test, but I am doing it now, will put in here what i do so you can see in backread [19:54:39] editing /home/wikipedia/conf/pybal/pmtpa/rendering [19:54:49] rather than yank the old ones without pooling new first to test, i put in mw75 [19:55:01] it appears to be passing all tests, so seems ok [19:56:24] !log adding mw75-mw80 into tampa rendering pool [19:56:25] Logged the message, RobH [20:00:56] bleh, forgot to update ganglia [20:01:12] New patchset: RobH; "new image scaler ganglia aggregators" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47910 [20:02:13] robh: back [20:02:38] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47910 [20:02:50] cmjohnson1: no worries, just pushing new servers into service [20:03:09] they are now pooled on LVS successfully, but I neglected to update ganglia's configuration on which servers are aggregators for it [20:03:14] which I just committed and merging now [20:04:15] nagios has them under monitoring as well [20:04:24] cool [20:04:27] once i can see them in ganglia, im going to remove the old servers from pybal [20:04:52] this all used to be so much more painful. [20:05:45] hahahaha, if i have foxyproxy enabled it borks my ganglia graphs [20:05:47] as they load from pmtpa.wmnet =/ [20:05:58] ok, so ganglia is updated [20:06:04] !log Rebooting cr1-sdtpa [20:06:05] Logged the message, Master [20:06:50] mark: is the proper way to remove servers from a pybal to disable them first and let pybal recognize that before removing entirely? [20:06:57] or can i just delete the lines entirely [20:07:02] the latter [20:07:15] cool, i was going to err to the former if you werent about =] [20:07:41] !log removing srv219-224 from rendering config in pmtpa [20:07:42] Logged the message, RobH
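A hedged sketch of the pybal pool file being edited here; the one-entry-per-line format is recalled from contemporary wikitech documentation rather than shown in this log, and weights are illustrative:

```bash
# /home/wikipedia/conf/pybal/pmtpa/rendering -- one Python-style dict per
# backend (format assumed, weights illustrative):
#   {'host': 'mw75.pmtpa.wmnet', 'weight': 10, 'enabled': True }
#   {'host': 'srv219.pmtpa.wmnet', 'weight': 10, 'enabled': True }
#
# Temporary depool: flip 'enabled' to False and let pybal pick it up.
# Permanent removal (what gets logged above): delete the line outright.
sed -i "/'host': 'srv219.pmtpa.wmnet'/d" /home/wikipedia/conf/pybal/pmtpa/rendering
```

[20:07:55] PROBLEM - Host ps1-d3-pmtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.19) [20:07:55] PROBLEM - Host ps1-d2-pmtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.18) [20:07:55] PROBLEM - Host ps1-d1-pmtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.17) [20:07:55] PROBLEM - Host ps1-a5-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.5) [20:07:55] PROBLEM - Host ps1-b5-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.10) [20:07:56] PROBLEM - Host ps1-a3-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.3) [20:07:56] PROBLEM - Host ps1-b1-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.6) [20:07:57] PROBLEM - Host ps1-c1-pmtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.20) [20:07:57] PROBLEM - Host ps1-c2-pmtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.21)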
[20:08:07] PROBLEM - Host ps1-d1-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.14) [20:08:08] PROBLEM - Host ps1-d3-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.16) [20:08:08] PROBLEM - Host ps1-d2-sdtpa is DOWN: CRITICAL - Network Unreachable (10.1.5.15) [20:08:19] mark: i only worried for 3 seconds then realized it couldnt have been me ;] [20:08:28] but my heartrate did spike for a moment, heh [20:08:51] that made me nervous for a second [20:09:12] cmjohnson1: Ok, So srv219-srv224 are no longer in any kind of reference. They have been put into decomissioning.pp and can be wiped [20:09:29] you have no idea how much easier it is to add shit to cluster now than a few years ago [20:09:53] so much better. [20:10:05] nope but lets not look bad ...heh [20:10:09] back [20:10:17] can you imagine how much RobH was complaining THEN [20:10:18] ;-) [20:10:25] he's happy now! [20:10:27] mark: complain, i just let you do it ;] [20:10:41] it had to be 10x worse [20:10:52] cmjohnson1: So we killed off a bunch of apaches, so mw81-85 will go back into the apache pool [20:10:59] i do hope that router is coming back [20:11:03] I'll handle getting it done, because you are still cabling [20:11:14] cmjohnson1: but you'll get to do these steps soon enough in eqiad. [20:11:15] you don't have mgmt in tampa now [20:11:20] heh [20:11:26] well, then i may take a lunch break instead. [20:11:54] mark: its times like this you really love eqiad isnt it? [20:11:55] robh: yep good time to take a break [20:12:08] yes [20:13:48] robh: i am taking 219-224 offline now [20:13:54] never to return again [20:14:06] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: kuwiktionary back to 1.21wmf8 [20:14:07] Logged the message, Master [20:14:43] cmjohnson1: good enough, they arent in anything now, so shouldnt even see alerts [20:14:48] if nagios updated, checking... [20:15:12] mark: nagios is affected by what yer doin right? [20:15:21] something is off now [20:15:25] New patchset: Demon; "Roll back ku/sr wikis to 1.21wmf8 - langconverter breakage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47914 [20:15:40] yea, i lost nagios [20:15:47] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47914 [20:16:35] <^demon> I can't hit fenari either. [20:16:42] Ok, nice, getting nagios alerts via phone, heh [20:17:09] I imagine this is entirely network related, and thus mark is looking into it as we speak. [20:17:15] yes [20:17:28] not that I can get in [20:18:55] Hi everyone: this is a very random question, but: if you wanted to have a MediaWiki site where every page loaded in less than one second, accessed from anywhere in the world, around how much would it cost to set up? [20:19:06] Let's say traffic was fairly minimal. [20:19:31] Just a ballpark figure would be fine - like, thousands of dollars? Tens of thousands? [20:20:16] hi yaron: we just had a site outage, so I suspect folks will be a bit distracted here [20:20:30] Ooh... that's pretty bad timing on my part. [20:20:39] no worries [20:21:05] just letting you know. there was a break in the action, so it looked a fine time to ask [20:21:48] i'm starting to miss foundry [20:22:15] i dunno how to reply to that [20:22:18] its so unexpected. [20:23:06] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: All ku/sr wikis back to 1.21wmf8 - lang converter breakage [20:23:07] Logged the message, Master [20:23:15] what's going on? 
[20:23:19] page-o-rama [20:23:30] apergos: network issue that mark was working on [20:23:33] i think. [20:23:42] it was just fixed, so no postmortem yet [20:23:49] (ie: i may be wrong) [20:24:05] <^demon> RobH: srv219-224 are still being worked on, right? [20:24:15] ^demon: uhh, they are gone [20:24:23] why? still in nagios? [20:24:24] <^demon> Oh, they're still in dsh groups I think. [20:24:33] yea, im still updating them [20:24:39] i lost connectivity mid edit [20:24:42] <^demon> Ah ok, no worries then. [20:27:13] ^demon: sorry about that [20:27:20] I should have notified you guys so you knew node lists were in flux [20:27:36] or i should have done this when you werent in deploy window ;] [20:28:32] <^demon> Well, we weren't in a normal window. [20:28:34] ^demon: So, yea i just pushed the new mw75-mw80 into the node files [20:28:43] well, i also didnt bother to check if you were, so heh [20:28:44] <^demon> We were dealing with breakage on all the ku & sr language wikis. [20:28:47] <^demon> :) [20:28:58] so if you pushed something that would affect rendering servers [20:29:03] you may wanna just do it again to be safe. [20:29:19] (otherwise they will update with puppet run anyhow) [20:29:52] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: Being paranoid, node lists were in flux during last sync [20:29:53] Logged the message, Master [20:30:09] woo paranoia! \o/ [20:34:35] New patchset: RobH; "mw81-85 added as application servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47923 [20:35:37] New review: RobH; "" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47923 [20:35:38] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47923 [20:50:55] robla: is this a better time to ask a random question? [20:52:21] I think things are better [20:52:25] (I'm not the person your question is for, though) :) [20:53:13] Okay, well, here goes... [20:54:13] Hi everyone (again): this is a very random question, but: if you wanted to have a MediaWiki site where every page loaded in less than one second, accessed from anywhere in the world, and it had fairly minimal traffic, around how much would it cost, very roughly, to set up? [20:54:54] I am not sure who would answer that. [20:55:01] <^demon> Wouldn't a VPS work if the traffic's not bad? [20:55:26] Since you can't control the 'last leg' of the network, under 1 second load times as a metric is a little insane [20:55:27] ^demon: sure - whatever worked is fine. [20:55:37] <^demon> A decent mid-sized VPS with some appropriate caching should be able to serve a low-traffic wiki pretty fast. [20:55:53] hey root / ops . Could we get a review of https://gerrit.wikimedia.org/r/#/c/47795/ that tweak a sudo rule for the beta cluster [20:55:57] less than 1 second worldwide seems unreasonable. [20:56:04] i dont see that happening for any site ever... [20:56:13] Damianz, RobH: okay, that's good to know. [20:56:16] https://gerrit.wikimedia.org/r/#/c/44548/ changed a sudo invocation and that breaks beta heavily :-] [20:56:17] (i just hear our india based devs complain too much ;) [20:56:42] Well, instead of "under 1 second", how about just "fast". [20:57:07] then what ^demon suggests seems the most cost effective for a low traffic site. [20:57:33] or some CDN if it goes to mid level traffic i guess [20:57:42] i have no idea the costs on that kinda thing. [20:58:03] RobH: alright; I can look that up. 
[20:58:05] It also depends on your read vs write load I guess, since read wise you can just cdn it - write you gotta handle [20:58:23] well, when he said page load, not page update, i assumed a heavy read [20:58:28] but its an assumption, indeed. [20:58:36] <^demon> yaron: Caching caching caching. Use memcached and apc and you should get pretty far :) [20:58:46] Yes - I assume it'll be 100:1 read vs. write, or something like that. [20:59:21] ^demon: okay, that makes sense. [20:59:46] New patchset: SPQRobin; "Enable Narayam on the new Sanskrit Wikiquote" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48015 [21:00:03] Well, if no one knows any specific costs, that's fine - I can look up the CDN stuff. Thanks everyone for your help! [21:09:10] !log olivneh synchronized php-1.21wmf9/extensions/EventLogging [21:09:11] Logged the message, Master [21:10:13] !log olivneh synchronized php-1.21wmf8/extensions/EventLogging [21:10:14] Logged the message, Master [21:11:56] !log olivneh synchronized php-1.21wmf9/extensions/PostEdit [21:11:57] Logged the message, Master [21:12:35] !log olivneh synchronized php-1.21wmf8/extensions/PostEdit [21:12:37] Logged the message, Master [21:18:26] hi dudes, multichill over in #mediawiki is asking about amaranth, a toolserver machine [21:18:45] it seems to be unreachable. I can get into the mgmt console, but am not able to start a serial connection [21:19:29] RobH, i'm going to ask you since you've been around a bunch this week :p [21:19:33] should I try to reboot it? [21:19:34] 2013-02-07 15.45 < DaBPunkt> ok, got RT-id 4487 [21:19:35] see if it comes up? [21:19:51] hrmm, lemme take a look at what it is [21:19:54] http://nagios.toolserver.org/cgi-bin/status.cgi?host=amaranth [21:20:09] ottomata: It is based out of Tampa, so a reboot can be done [21:20:13] RobH already restarted it a few times lately [21:20:22] and if it doesnt come back, we have cmjohnson1 and sbernardin on site [21:20:26] Nemo_bis: uhh, i did? [21:20:32] yes :p [21:20:37] i dont recall touching this machine in a long time.... [21:21:15] ok, i'm going to reboot and see what happens [21:21:31] i dont see an admin log of anyone touching it since before last july [21:21:40] ottomata: admin log it =] [21:21:55] if it doesnt come back, well, i dunno what to do next cuz i dont have root to toolserver boxes. [21:22:03] hmm, RobH [21:22:04] -> reset /SYS [21:22:04] Are you sure you want to reset /SYS (y/n)? y [21:22:04] Performing hard reset on /SYS failed [21:22:04] reset: Internal error [21:22:17] ahh, sounds like the SP is borked [21:22:38] ottomata: we need to ask cmjohnson1 or sbernardin to remove all power from the chassis (pull power cables) and then plug back in [21:23:00] when the [e|i]lom gets in that state, its the only fix [21:23:08] that may fix it [21:23:28] (or it may not, and the system may be having hardware issues, which is unfortunate since its long out of warranty) [21:23:50] heh, was about to tell him to make a ticket for them [21:23:53] i'll do it [21:25:13] ottomata: So who reported that its not working? [21:25:23] multichill in #mediawiki [21:25:46] ok, well, it looks like it has to have the power pulled and plugged back in, and then a reboot test to see if it works [21:25:58] you'll want to drop a ticket into the pmtpa queue for onsite [21:26:08] then you can touch base with sbernardin or cmjohnson1 about it [21:26:16] (chris is only in tampa this week) [21:26:34] AH, adium crashing [21:26:36] bleh, you just lost all my answer.
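(To make ^demon's "caching caching caching" advice concrete: on a single low-traffic VPS, an object cache (memcached) plus the PHP opcode cache (APC, on the PHP 5.x of this era) do most of the work; on the MediaWiki side that is roughly $wgMainCacheType = CACHE_MEMCACHED in LocalSettings.php. A minimal, hypothetical Puppet sketch, assuming Debian/Ubuntu package names; the class name is invented for illustration:)

    # Hypothetical single-VPS wiki caching setup; class and package
    # names are assumptions, not taken from the log.
    class wiki_vps_cache {
        # memcached backs MediaWiki's object and parser caches
        package { 'memcached':
            ensure => present,
        }
        service { 'memcached':
            ensure  => running,
            enable  => true,
            require => Package['memcached'],
        }
        # APC opcode cache avoids recompiling PHP on every request
        package { 'php-apc':
            ensure => present,
        }
    }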
[21:26:44] ok, well, it looks like it has to have the power pulled and plugged back in, and then a reboot test to see if it works [21:26:47] you'll want to drop a ticket into the pmtpa queue for onsite [21:26:51] then you can touch base with sbernardin or cmjohnson1 about it [21:26:53] (chris is only in tampa this week) [21:26:57] hokeydokey [21:26:58] can do [21:27:01] heh =] [21:27:18] can I CC multichill on the ticket? [21:27:21] So if they do the power thing, and it still is borked, then the server is prolly fubar [21:27:33] seems reasonable to me (to cc him) [21:27:40] ok cool [21:27:55] also, luckily, and you may wanna mention it [21:28:06] if the server is fubar, its got a disk array attached [21:28:19] and while i have no idea how they partitioned the TS server, one would hope they put data on the disk shelf. [21:28:37] so if its important data, we may be able to recover (If the system is toast) [21:33:15] !log olivneh synchronized php-1.21wmf9/extensions/E3Experiments [21:33:16] Logged the message, Master [21:34:33] !log olivneh synchronized php-1.21wmf8/extensions/E3Experiments [21:34:34] Logged the message, Master [21:37:04] cmjohnson1, when you get to it: [21:37:07] https://rt.wikimedia.org/Ticket/Display.html?id=4489 [21:37:19] i am looking at it now [21:37:24] multichill mentioned that this is kinda urgent because its a replication slave, and the longer it is down the harder it is to bring up [21:37:25] cool! [21:37:26] thank you [21:38:43] yep..working on it now [21:39:51] !log performing a hard reboot on amaranth [21:39:52] Logged the message, Master [21:40:01] ottomata, I don't think the CC worked for the ticket though [21:40:25] I didn't CC, i just emailed toolserver-l manually [21:40:43] oh I was referring to "can I CC multichill on the ticket?" [21:42:50] yeah, I asked him for his email [21:42:58] he told me to just notify toolserver-l [21:43:20] hrmm [21:43:28] how the hell does ganglia know when a server is decommissioned. [21:43:38] im not seeing the hook so far, and my old crap is still in ganglia. [21:45:20] soooo, 120 international pages, eh? [21:45:25] that's not so fun [21:45:44] we need some way to very quickly disable the paging [21:47:43] Ryan_Lane: uhh, isnt incoming when roaming still normal cost? [21:47:50] i thought its only outgoing international the US carriers charge for [21:48:08] (unless you just forwarded all your stuff to a local sim i suppose) [21:48:50] ah. right [21:49:08] though its still annoying as shit ;] [21:49:12] I'm used to the US where they screw you [21:49:30] in particular since i bet you guys left yourselfs on PST paging times [21:49:32] heh [21:49:36] yourselves even [21:49:39] ottomata: you should be able to get into console now on amaranth but it does not appear the OS is coming up. [21:49:41] i can haz spellingz [21:50:15] cmjohnson1, same deal [21:50:16] -> start /SP/AgentInfo/Console [21:50:16] start: Invalid target /SP/AgentInfo/Console [21:50:23] !log mw81-mw85 in pmtpa apache service [21:50:25] Logged the message, RobH [21:50:33] ottomata: eh? [21:50:40] i thought you tried to reboot it and it failed [21:50:42] not tried to console. [21:50:45] i did both [21:50:50] first console, no good [21:50:50] ok, does the reboot work now? [21:50:51] then reboot [21:50:55] should I try cmjohnson1?
[21:51:44] ottomata: you should try [21:51:48] it sounds like its not going to work anyhow [21:52:22] ja [21:52:22] you can also do a last ditch reset of the SP again just to be sure, reset /SP [21:52:27] it takes about 3 minutes for it to come back [21:52:32] reboot looks happier [21:52:32] -> reset /SYS [21:52:32] Are you sure you want to reset /SYS (y/n)? y [21:52:32] Performing hard reset on /SYS [21:52:37] /SP [21:52:37] ? [21:52:40] wikitech says /SYS [21:52:46] so SYS is the system [21:52:48] and if that didnt work [21:52:54] i would try rebooting the service processor [21:52:56] (elom) [21:52:57] oh, hm [21:52:59] i see [21:53:01] but it seems ok [21:53:04] ya ok [21:53:06] so i wouldnt do that, try serial now? [21:53:08] ottomata you are using the wrong console command [21:53:13] start /SP/console [21:53:13] oh? [21:53:25] cmjohnson1: well, it has the command he lists for that server [21:53:27] as its an elom [21:53:34] but i would try the ilom command as well ottomata [21:53:44] -> start /SP/console [21:53:44] Are you sure you want to start /SP/console (y/n)? y [21:53:44] Serial console is in use. [21:53:51] cmjohnson1: you on it? [21:53:54] hmm, no [21:53:56] yep..i am ... [21:53:56] just another prompt [21:54:02] oh [21:54:04] ok [21:54:07] ottomata: it said it was in use, thats different ;] [21:54:16] hehe, by chris! [21:54:16] i see! [21:54:21] cmjohnson1: so you can hand it back to ottomata [21:54:27] who can babysit the post via serial now [21:54:32] and see whats up OS wise [21:54:39] i can't get off it now [21:54:41] haha [21:54:44] ottomata: cmjohnson1 has a ton of shit onsite, so we wanna free him back up [21:54:49] yeah np [21:55:00] cmjohnson1: esc + shift + 9 [21:55:05] ? [21:55:22] cool [21:55:22] thx [21:55:26] ottomata all yours [21:55:35] ottomata: normally chris is happy to follow up and bring stuff back and all [21:55:41] but since he is in tampa, and his time is limited, etc... [21:56:34] its cool! [21:56:41] glad it came back [21:56:43] ok, in console [21:56:44] cuz if it didnt, it would suck. [21:56:52] oo booting! [21:57:10] ottomata: so historically, a single member of toolserver admins (river) has had both root on toolserver and root on wmf cluster [21:57:21] so if the OS came back as fubar, river handled it [21:57:51] if the OS comes up as fubar now, i honestly have no idea whose problem it is. I dont think any wmf ops have root on those systems, and we cannot give our mgmt login info out to any of the current toolserver admins [21:58:04] river had root because river was a long time sysadmin volunteer for us when we had no paid staff [21:58:06] !log olivneh synchronized php-1.21wmf8/extensions/RelatedArticles/ 'Fixes 44761' [21:58:07] Logged the message, Master [21:58:31] in other words, im glad this happened on your week of triage and not mine ;] [21:58:49] (that was mean, but amused me anyhow) [21:58:51] hah [21:58:54] !log olivneh synchronized php-1.21wmf9/extensions/RelatedArticles/ 'Fixes 44761' [21:58:55] Logged the message, Master [21:59:06] csteipp: ^ [21:59:15] weee, new apaches in service, in the wrong datacenter, but meh [21:59:17] aye, welp, i emailed toolserver-l with the status [21:59:20] ori-l: Thanks! [21:59:21] not much more I can do, eh? [21:59:26] np. [21:59:29] i havent pushed new servers in the apache pool online in a while, i wanted to do the first ones in the non-primary center. [21:59:50] ottomata: if it doesnt post, you could do sysadmin work on it, but im not sure if thats legit use of your time.
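(RobH's earlier question about how ganglia knows a server is decommissioned never gets an answer in the log. One plausible pattern, sketched here purely as an illustration, is for the monitoring manifests to consult the same decommissioning list, so retired hosts drop out on the next puppet run:)

    # Illustrative only: skip monitoring for decommissioned hosts.
    # Assumes $decommissioned_servers is in scope (see the earlier sketch)
    # and that a ganglia::monitor class exists; both are assumptions.
    if ! ($::hostname in $decommissioned_servers) {
        include ganglia::monitor
    }

(Whatever the real hook, RobH's "old crap is still in ganglia" complaint is the symptom you get when one consumer of the list lags the others.)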
[22:01:22] oh it booted to login prompt [22:02:14] and pings [22:02:22] New patchset: Mattflaschen; "Enable GuidedTour on more Wikipedias, and outreach." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48031 [22:02:30] ottomata: dunno what else we can do cept admin log its back and responsive to ping [22:02:58] aye right thanks for reminder [22:03:56] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48031 [22:04:21] !log pulled power and rebooted amaranth toolserver for multichill. Its back online. multichill will notify whomever needs to do more work. [22:04:23] Logged the message, Master [22:38:25] Deploying GuidedTour now [22:39:58] !log mw82 pooled into tampa apaches [22:39:59] Logged the message, RobH [22:47:28] !log mflaschen Started syncing Wikimedia installation... : Deploy GuidedTour [22:47:30] Logged the message, Master [22:49:40] how many db servers do we have on maria now? still just that one? [22:49:46] *mariadb [22:50:27] Uh-oh: "srv205.pmtpa.wmnet: ssh: connect to host srv205.pmtpa.wmnet port 22: Connection timed out" [22:50:31] Will it recover from that? [22:50:48] superm401: Yes. [22:51:00] There are usually a couple of hosts that are out of rotation for which scap / sync time out. [22:51:25] Thanks, ori-l [22:51:29] ...and everybody hopes that they will resync when coming back online :P [22:52:48] New review: Faidon; "Package ensure => present, not 'installed'. Service needs hasstatus => true, hasrestart => true." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/47881 [22:54:49] New patchset: Jalexander; "Add JP to Wikimedia Shop link config for sidebar" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48038 [22:59:15] New review: Ottomata; "The module is small because that's all I need. It would be complicated to try and make it smart eno..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47881 [23:00:19] ottomata: reminder re: https://gerrit.wikimedia.org/r/#/c/47621/ if you're doing puppet stuff [23:00:57] New patchset: Ottomata; "Adding very simple haproxy module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47881 [23:01:23] cool, can do ori-l [23:01:28] Thanks [23:01:35] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47621 [23:02:30] !log mflaschen Finished syncing Wikimedia installation... : Deploy GuidedTour [23:02:31] Logged the message, Master [23:09:45] New patchset: Ottomata; "Adding very simple haproxy module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47881 [23:11:32] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47881 [23:16:05] New patchset: Ottomata; "Adding kraken role and module. Including role::kraken::proxy on analytics1001." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48041 [23:17:51] Change abandoned: Ottomata; "Redoing this here:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47741 [23:20:26] New patchset: Ori.livneh; "Imported Upstream version 0.8.0" [operations/debs/python-jsonschema] (master) - https://gerrit.wikimedia.org/r/48042 [23:25:46] !log manually cleared arp and turned the management interface of asw-c-eqiad off and on to solve a reachability issue [23:25:47] Logged the message, Mistress of the network gear.
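(Faidon's review of the haproxy module above spells out the requested fixes almost verbatim. Applied to the resources in question, they would look roughly like this; a sketch of what the review asks for, not the code as merged, and the module layout is assumed:)

    # Sketch of the fixes requested in Faidon's review; layout assumed.
    class haproxy {
        package { 'haproxy':
            ensure => present,     # preferred over the 'installed' alias
        }
        service { 'haproxy':
            ensure     => running,
            hasstatus  => true,    # init script implements 'status'
            hasrestart => true,    # init script implements 'restart'
            require    => Package['haproxy'],
        }
    }

(Without hasstatus/hasrestart, Puppet of this era would grep the process table and stop/start the service instead of using the init script, which is why reviewers asked for them routinely.)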
[23:27:26] New patchset: Ori.livneh; "Imported Upstream version 0.8.0" [operations/debs/python-jsonschema] (master) - https://gerrit.wikimedia.org/r/48043 [23:27:26] New patchset: Ori.livneh; "Imported Debian patch 0.8.0-1" [operations/debs/python-jsonschema] (master) - https://gerrit.wikimedia.org/r/48044 [23:27:26] New patchset: Ori.livneh; "Backported to precise-wikimedia." [operations/debs/python-jsonschema] (master) - https://gerrit.wikimedia.org/r/48045 [23:28:10] New review: Ori.livneh; "Faidon, thanks! I did as you suggested. It was too confusing to try and rebase that on top of this c..." [operations/debs/python-jsonschema] (master) C: 0; - https://gerrit.wikimedia.org/r/47662 [23:28:19] Change abandoned: Ori.livneh; "(no reason)" [operations/debs/python-jsonschema] (master) - https://gerrit.wikimedia.org/r/47662 [23:28:31] Very sorry for gerrit spam. [23:31:14] !log olivneh synchronized wmf-config/InitialiseSettings.php 'Enabling GuidedTour on additional wikis' [23:31:15] Logged the message, Master [23:36:55] New patchset: Ottomata; "Adding kraken role and module. Including role::kraken::proxy on analytics1001." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48041 [23:38:40] New patchset: Ottomata; "Adding kraken role and module. Including role::kraken::proxy on analytics1001." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48041 [23:46:49] New patchset: Lcarr; "commenting out row a5 smokeping because it is currently empty" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48048 [23:49:01] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48048 [23:59:01] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47879