[00:14:51] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 252 seconds
[00:20:51] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 1 seconds
[00:28:44] Reedy: https://gerrit.wikimedia.org/r/#/c/10724/
[01:19:37] New patchset: Dereckson; "(bug 37401) Babel configuration for fo.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11133
[01:19:47] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11133
[01:24:00] New patchset: Dereckson; "(bug 37401) Babel configuration for fo.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11133
[01:24:09] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11133
[01:39:40] New patchset: Dereckson; "(bug 37384) - Collection default format is ODT for gu. projects" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11134
[01:39:46] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11134
[01:40:45] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 212 seconds
[01:43:09] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 261 seconds
[01:46:36] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[01:56:12] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 1 seconds
[03:50:21] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours
[03:50:21] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[03:50:21] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[04:31:22] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 303 seconds
[04:35:52] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 0 seconds
[04:45:18] New patchset: Logicwiki; "(bug 37384) - Collection default format is ODT for gu. projects" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11134
[04:45:25] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11134
[04:45:47] New review: Hashar; "Please stop spamming random people with review requests. Thanks!" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11082
[04:54:19] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[06:58:12] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:51:44] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[07:57:14] ACKNOWLEDGEMENT - Host srv206 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn srv206 has a loong history of trouble in RT #241. get rid of it?!
[08:06:24] New review: Dereckson; "@Hashar Please make clear to Sumanah what kind of notifications you wish to have or what kind of act..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/11082
[08:13:23] New patchset: Hashar; "import CommonSettings from wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9131
[08:13:24] New patchset: Hashar; "vary wgUploadStashScalerBaseUrl based on cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11035
[08:13:24] New patchset: Hashar; "wmfHostnames array to easily change hostnames on a cluster basis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11034
[08:13:30] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9131
[08:13:32] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11035
[08:13:34] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11034
[08:13:56] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9131
[08:14:08] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11035
[08:14:23] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11034
[08:14:26] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9131
[08:14:27] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11035
[08:14:29] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11034
[08:18:23] New patchset: Hashar; "cleanup whitespace in mobile-pmtpa.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11037
[08:18:24] New patchset: Hashar; "move mobile related conf to their own files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11036
[08:18:30] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11037
[08:18:32] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11036
[08:18:48] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11036
[08:18:51] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11036
[08:19:01] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11037
[08:19:04] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11037
[08:21:37] New patchset: Hashar; "specific shell configuration for transcoding boxes" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9132
[08:21:39] New patchset: Hashar; "import overriding system for InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9237
[08:21:40] New patchset: Hashar; "move throttling related conf to throttle.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9136
[08:21:46] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9132
[08:21:48] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9237
[08:21:50] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9136
[08:22:34] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9237
[08:22:55] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9136
[08:23:04] New review: Hashar; "rebased" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9132
[08:23:06] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9237
[08:23:07] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9136
[08:23:09] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9132
[08:24:12] and now I am going to deploy that to the cluster
[08:24:51] !log deploying several changes made to mediawiki-config gerrit changes 11034 11035 9131 11036 11037 9132 9136 and 9237
[08:24:57] Logged the message, Master
[08:26:17] GRRR
[08:35:27] gr?
[08:36:44] someone merged changes in mediawiki-config but did not deploy them :(
[08:36:48] so I have to do it hehe
[08:36:53] and of course now I have an issue
[08:37:20] !log installing samba-common-bin, smbclient package upgrades on tridge
[08:37:25] Logged the message, Master
[08:39:59] so that was the change being incorrect :]
[08:41:00] New patchset: Hashar; "Revert "(bug 37482) Adding Proofread Page ext. namespaces on nl.wikisource"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11142
[08:41:06] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11142
[08:41:58] Standard mw deploy steps; 1) push 2) run :D
[08:41:58] ugh
[08:42:27] yeah
[08:42:47] well that was merged -> run
[08:42:53] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[08:42:54] leaving the actual deploy to someone else :-]]
[08:43:01] New review: Hashar; "I have reverted the change. It uses the wrong global variable ($wgNamespaceAliases instead of $wgExt..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11067
[08:43:18] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11142
[08:43:21] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11142
[08:43:34] New review: Hashar; "Reverted with https://gerrit.wikimedia.org/r/#/c/11142/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11067
[08:43:34] OH
[08:43:45] Gerrit has a [Revert Change] button
[08:43:46] bah
[08:44:02] New review: Hashar; "reverts https://gerrit.wikimedia.org/r/#/c/11067/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11142
[08:45:16] !log reverted '(bug 37482) Adding Proofread Page ext. namespaces on nl.wikisource' --> used the wrong configuration setting.
[08:45:21] Logged the message, Master
[08:45:30] * hashar proceeds to the next commit
[08:49:52] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 298 seconds
[08:50:18] New review: Dzahn; "blocked by RT 3106 (precise upgrade) for nodejs package." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11042
[08:52:43] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 0 seconds
[09:08:44] New patchset: Dereckson; "(bug 37482) Adding Proofread Page ext. namespaces on nl.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11143
[09:08:49] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11143
[09:22:51] New review: Hashar; "Thanks, you rock!" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11143
[09:22:53] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11143
[09:23:01] New patchset: Dereckson; "(bug 37363) Uninstall CongressLookup extension" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11144
[09:23:07] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11144
[09:51:04] New patchset: Hashar; "boostrap placeholder for PHPUnit" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11145
[09:51:05] New patchset: Hashar; "move utilities functions in common files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11146
[09:51:11] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11145
[09:51:13] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11146
[10:07:32] New patchset: Dereckson; "(bug 37336) Install Narayam ext. in te.wiktionary & te.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11147
[10:07:38] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11147
[10:15:25] New patchset: Hashar; "ant target to trigger PHPUnit tests under Jenkins" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11149
[10:15:31] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11149
[10:16:00] New review: Hashar; "Yeah we have tests in Jenkins now :-]" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11149
[10:16:02] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11149
[10:18:17] New patchset: Hashar; "boostrap placeholder for PHPUnit" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11145
[10:18:18] New patchset: Hashar; "move utilities functions in common files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11146
[10:18:24] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11145
[10:18:26] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11146
[10:19:41] New review: Hashar; "Patchset 2 is a rebase to take advantage of Ie7d96750 which enables PHPUnit tests in Jenkins." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11145
[10:19:49] New review: Hashar; "Patchset 2 is a rebase to take advantage of Ie7d96750 which enables PHPUnit tests in Jenkins." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11146
[10:21:11] New review: Hashar; "Since this is fixing two bugs, I would prefer we have two separate Gerrit changes :-]" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/10756
[10:31:12] Thehelpfulone: do you want to make / have a wiki table with all listinfo owners and moderators or something similar?
[10:31:42] i can fetch raw info but i mean the "turn it into nice wiki" part
[10:32:54] just cause you mentioned you already started a project on wiki on similar things
[10:52:51] New patchset: Dereckson; "(bug 36972) activate the patroller group on nn.wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11150
[10:52:57] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11150
[10:56:49] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 203 seconds
[10:58:19] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 0 seconds
[11:34:46] so, I'd like to set up a tool that provides an overview of the infrastructure by exposing data coming from the puppet db
[11:34:55] we've discussed this in the past too
[11:35:15] how would I pick a server to do that?
[11:37:56] paravoid: well, you could use a new misc server for that
[11:38:06] you'd ask RobH
[11:38:24] or you could stick it on another server that's doing something similar
[11:38:27] it's really a simple django app, do we need a whole separate server for that?
[11:39:48] as long as it's not spence i guess
[11:40:20] maybe neon?
[11:40:21] * Ryan_Lane goes to get food
[11:43:17] maybe ask Leslie if she thinks it's fine on neon (upcoming monitoring server to replace spence), as it might fit monitoring; historically spence also had multiple tools and a webserver, and i expect neon to have more resources. my 2 cents
[11:45:39] that's more than 2 cents, thanks :-)
[11:59:13] there are things like observium and (other end of the spectrum as far as resources) http://noc.wikimedia.org/dbtree/
[11:59:17] hmm noc, eh? grrr
[11:59:28] anyways it would be nice I guess to have those generally on one host, I think
[12:03:56] true, but noc = fenari and potentially wants cleanup rather than new stuff
[12:05:01] yeah I don't want things on fenari
[12:05:07] what I want is for that not to be on noc
[12:06:41] and not spence
[12:07:20] indeed
[12:08:34] paravoid: pretty likely you know it, but win32-loader.exe to install Debian? :) quickest office OS migration for a guy from XP to Debian: mail to ALL: "Please go to http://goodbye-microsoft.com/, click install, use your local windows admin rights (sic!) and tell me if any issues. ttyl, your admin". :)
[12:09:04] yep, it's fun :)
[12:31:13] !log backing up wikitech dir locally on linode instance
[12:31:29] Logged the message, Master
[12:31:52] upgrade time?
[12:32:12] soon
[12:33:11] /me sees "wikitech-static" as well
[12:33:33] and "wikitech-broken" :p
[12:33:46] they sound useful
[12:33:48] oh yeah
[12:33:54] the static one was a dumphtml copy I think
[12:34:00] so we could grab it and stuff it wherever
[12:34:01] which would you back up besides the main?
[12:34:07] ok
[12:34:24] it should get regenerated once a week by cron (in an ideal world, which we don't live in)
[12:34:50] gah
[12:34:59] really, can it take me all day to get this one set of lists generated?
[12:35:03] answer: yes, yes it can.
[12:35:04] >_<
[12:40:56] will just take all of /srv/org/wikimedia, first get it, reduce size later
[12:41:07] ok
[13:10:08] New review: Dereckson; "Don't merge, shellpolicy issue." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/11150
[13:25:28] mutante, just responded to RT about stat1
[13:25:43] Pretty sure Mark convinced Diederik to do a full reinstall, rather than an upgrade
[13:26:19] ottomata: well and Ryan convinced to do in place upgrade, i dont know which was first:)
[13:26:32] either or
[13:26:33] hahah, we were first, then diederik talked to mark yesterday I think
[13:26:39] well, we can do upgrade now, right?
[13:26:42] a full install is always better
[13:26:47] but we'd have to wait for who knows who to do the install?
[13:27:01] and doing the upgrade doesn't mean we can't do the reinstall if the upgrade doesn't work...
[13:27:05] meh? what should we do?
[13:27:06] it's the best way to ensure puppetization is proper
[13:27:20] ok
[13:27:24] i guess we'll do reinstall then
[13:27:30] we are ready to do that asap
[13:27:41] do we need mark to do that? who can do that?
[13:28:01] I dunno. add an rt for it
[13:29:38] ottomata: turn my upgrade ticket into reinstall ticket. you can just rename the subject in RT
[13:31:36] i think there is an RT for it, but it might have been closed
[13:31:38] I can reopen it
[13:32:05] ottomata: check out 2165, i turned it into "stats master ticket" and all others are linked there
[13:32:31] awesome, thanks
[13:32:32] also added you as CC / requestor in several places
[13:33:04] https://rt.wikimedia.org/Ticket/Display.html?id=2946
[13:33:55] ah, ok, that was missing. linked and closing the other
[13:34:21] ok
[13:34:32] i will reopen mine then
[13:34:58] rejected upgrade
[13:35:18] reinstall is open
[13:35:43] but it could have a better comment than "Not doing this, just going with Lucid" :)
[13:38:28] yeah, i'm adding
[13:39:13] is mark really the only one who can reinstall? is there anyone else we can poke?
[13:39:23] Erik Z has suspended some work he was doing on stat1 so we could do this
[13:39:27] he asked us to have it done in 24 hours
[13:41:36] are you satisfied with backups yet?
[13:41:59] !rt 3098
[13:42:00] http://rt.wikimedia.org/Ticket/Display.html?id=3098
[13:43:44] cool!
[13:44:03] hmm, i think we should see the /a partition on there too
[13:44:05] not just /home
[13:44:07] i'd rather not start it in the late afternoon with an appointment later; i can do it in the morning, but i think anyone could do it if somebody in a US timezone is avail.
[13:44:39] ottomata: and yes, make sure of that as well
[13:45:07] how do I make sure?
[13:45:13] do I need access to tridge?
[13:45:39] should also receive mail from amanda on ops list
[13:49:42] ottomata: you see these "amanda mail report" mails? i see them but no "stat1" in there
[13:49:56] while we do see those daily files on tridge itself
[13:50:01] hmm, looking
[13:51:26] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[13:51:26] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours
[13:51:26] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[13:51:38] FAILURE DUMP SUMMARY:
[13:51:40] stat1.wikimedia.org /a lev 0 FAILED "[dump larger than available tape space, 233263375 KB, but cannot incremental dump new disk]"
[13:52:08] stat1.wikime /home 0 4917710 1895211 38.5 1:42 18520.5 1:42 18527.6
[13:52:12] so yeah, not enough space for /a
[13:55:42] hm, or do you want to let it span multiple tapes?
[13:55:57] um, doesn't matter to me
[13:56:19] apergos: how much did you remove from tridge approx?
[13:57:04] I dunno, but we had a few T free instead of 1/2 T
[13:57:06] ottomata: anything that can be dropped from /a that is just outdated or is enough to be stored _once_ elsewhere -> archive vs. daily backup
[13:57:09] how much space do they need?
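The FAILED line in the amanda report gives the level-0 size of stat1's /a as 233263375 KB, too large for a single tape. As a rough sizing aid only, the minimum number of virtual tapes such a dump would have to span is a ceiling division of dump size by tape size. This is a sketch under stated assumptions: the 100 GiB tape size is a hypothetical placeholder (the real tapetype length on tridge is not in this log), and `tapes_needed` is an illustrative name, not part of amanda.

```python
# Rough sizing sketch: how many virtual tapes a single dump must span.
# TAPE_KB is a hypothetical placeholder; the real tapetype length on the
# backup server is not given in this log.

DUMP_KB = 233_263_375          # level-0 size of stat1:/a from the FAILED line
TAPE_KB = 100 * 1024 * 1024    # assumed 100 GiB per virtual tape, in KB


def tapes_needed(dump_kb: int, tape_kb: int) -> int:
    """Return the minimum number of tapes needed to hold dump_kb (ceiling division)."""
    if tape_kb <= 0:
        raise ValueError("tape_kb must be positive")
    if dump_kb < 0:
        raise ValueError("dump_kb must be non-negative")
    # Integer ceiling division avoids float rounding on large sizes.
    return -(-dump_kb // tape_kb)


if __name__ == "__main__":
    print(tapes_needed(DUMP_KB, TAPE_KB))  # 3 with the assumed 100 GiB tapes
```

Spanning also has to be enabled on the amanda side; the mailing-list advice quoted in this log says runtapes must be set to a value greater than 1.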
[13:57:45] "There are TWO parameters that must be set so that the archive spans
[13:57:48] multiple virtual tapes. "
[13:58:07] probably, I don't really know what is in there
[13:58:12] i've just been told to back it up
[13:58:14] "Next, the parameter tape_splitsize must be set in the dumptypeamanda.conf, runtapes must be set to an appropriate value > 1. |
[13:58:17] configuration in amanda.conf
[13:58:46] http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/amanda-20/dump-larger-than-available-tape-space-68176/
[14:01:32] ok...
[14:02:26] would virtual tape size mean the size of the backup? or the size of all of the available tapes?
[14:04:27] http://wiki.zmanda.com/index.php/How_To:Set_Up_Virtual_Tapes
[14:04:37] (amanda = software, zmanda = company)
[14:10:27] still not sure what I should do about this, I don't really feel comfortable changing backup server configs, and as far as I can tell this has to do with reconfiguring the backup server tape configs?
[14:17:18] !log storage3 dist-upgrade and reboot
[14:17:26] Logged the message, Master
[14:19:40] ottomata: can you find out the answer to the "how much do they need" question?
[14:20:03] ok, will ask them
[14:20:04] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100%
[14:26:07] New patchset: Krinkle; "(bug 37304) Set $wgTranslateDisablePreSaveTransform = true;" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11155
[14:26:15] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11155
[14:27:38] it.wiki seems to have died
[14:31:19] RECOVERY - Host storage3 is UP: PING OK - Packet loss = 0%, RTA = 0.62 ms
[14:34:37] PROBLEM - SSH on storage3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:34:55] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
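The MySQL Slave Delay alerts in this log follow the Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN); any other code is reported as out of bounds, as in storage3's Misc_Db_Lag alert at 14:35:04 ("Return code of 255 is out of bounds"). A minimal sketch of such a threshold check, assuming the delay has already been measured: the warning and critical thresholds are hypothetical (the lowest CRIT delay seen in this log is 203 seconds, so the real critical threshold is at most that), and `check_slave_delay` is an illustrative name, not the check actually deployed here.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds in seconds; the real check's values are not in the log.
WARN_S = 120
CRIT_S = 180


def check_slave_delay(delay_s: Optional[int],
                      warn_s: int = WARN_S,
                      crit_s: int = CRIT_S) -> Tuple[int, str]:
    """Map a replication delay in seconds to a Nagios status code and message."""
    if delay_s is None:
        # No measurement: report UNKNOWN rather than an OK with no number.
        return UNKNOWN, "UNKNOWN replication delay not available"
    if delay_s >= crit_s:
        return CRITICAL, f"CRIT replication delay {delay_s} seconds"
    if delay_s >= warn_s:
        return WARNING, f"WARN replication delay {delay_s} seconds"
    return OK, f"OK replication delay {delay_s} seconds"


# A real plugin would print the message and pass the code to sys.exit().
for delay in (252, 1, None):
    code, message = check_slave_delay(delay)
    print(code, message)
```

Returning UNKNOWN for a missing value keeps the plugin inside the 0 to 3 range instead of emitting an OK line with an empty number.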
[14:35:04] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: (Return code of 255 is out of bounds)
[14:35:22] PROBLEM - MySQL disk space on storage3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:35:24] Vito_away: works for me
[14:35:31] PROBLEM - mysqld processes on storage3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:38:22] Change abandoned: Hashar; "Thanks MaxSem :)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7773
[14:38:40] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100%
[14:43:28] RECOVERY - SSH on storage3 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[14:43:37] RECOVERY - Host storage3 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[14:43:46] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay seconds
[14:44:13] RECOVERY - MySQL disk space on storage3 is OK: DISK OK
[14:46:02] !log lowering ttl for virt0
[14:46:08] Logged the message, Master
[14:55:01] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[15:08:26] New patchset: Hashar; "(bug 37545) farsi: change default Collection namespace" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11161
[15:08:32] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11161
[15:27:40] New patchset: Dzahn; "disable shell access for raindrift per RT-3088" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11165
[15:28:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11165
[15:28:59] !log Unstuck torrus
[15:29:06] Logged the message, Master
[15:30:25] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11165
[15:30:28] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11165
[15:53:24] New review: Dzahn; "does "Newer changeset added in." mean RT-2512 is resolved?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3238
[16:12:53] !log adding gerrit@wikimedia.org to accepted nonmembers of mediawiki-cvs list
[16:12:58] Logged the message, Master
[16:13:50] mutante: what's mediawiki-cvs?
[16:14:14] huh
[16:14:26] jeremyb: "This list sends out notifications of commits to MediaWiki's CVS repository" (yes, it's gonna be renamed)
[16:14:27] it exists on lists.wm.o
[16:14:45] a few decades out of date? ;)
[16:14:48] yes:)
[16:14:57] was it just getting no traffic from gerrit until now?
[16:15:19] yea, demon is enabling it
[16:16:40] i wonder what wikimedia-commits is
[16:16:46] cmjohnson1: i cant really see anyone having a problem with it
[16:17:04] cmjohnson1: we cant do more than you anyways as mgmt itself is like broken
[16:17:44] New patchset: Hashar; "tweak logging for wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11173
[16:17:50] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11173
[16:17:52] how to get a quiet afternoon: don't launch your IRC client :-]
[16:18:17] heh! ,,,:p
[16:18:20] Logged the message, Master
[16:19:02] !log shut down sq33
[16:19:07] Logged the message, Master
[16:19:17] yea, that order may look weird now, but i just did :)
[16:21:19] PROBLEM - Host sq33 is DOWN: PING CRITICAL - Packet loss = 100%
[16:22:54] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11173
[16:22:56] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11173
[16:24:34] Logged the message, Master
[16:28:31] PROBLEM - Backend Squid HTTP on sq48 is CRITICAL: Connection refused
[16:28:49] PROBLEM - Frontend Squid HTTP on sq48 is CRITICAL: Connection refused
[16:32:36] New patchset: Demon; "Allow some logs to supress comment-added notifications" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11176
[16:33:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11176
[16:34:49] RECOVERY - Host sq33 is UP: PING OK - Packet loss = 0%, RTA = 2.78 ms
[16:38:10] Logged the message, Master
[16:38:16] PROBLEM - Frontend Squid HTTP on sq33 is CRITICAL: Connection refused
[16:38:52] PROBLEM - Backend Squid HTTP on sq33 is CRITICAL: Connection refused
[16:43:31] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[16:44:52] RECOVERY - Frontend Squid HTTP on sq48 is OK: HTTP OK HTTP/1.0 200 OK - 604 bytes in 0.003 seconds
[16:45:55] RECOVERY - Backend Squid HTTP on sq48 is OK: HTTP OK HTTP/1.0 200 OK - 459 bytes in 0.005 seconds
[16:46:10] !log changing virt0's ip address and vlan
[16:46:15] Logged the message, Mistress of the network gear.
[16:46:40] PROBLEM - Host virt0 is DOWN: PING CRITICAL - Packet loss = 100%
[16:47:30] LeslieCarr: you killed our sessions!! ;-]
[16:48:34] New patchset: Ryan Lane; "Changing virt0's address" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11178
[16:49:04] PROBLEM - Host labsconsole.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[16:49:15] !log restarting mysql on virt0 with correct bind address
[16:49:20] Logged the message, Master
[16:49:35] hashar: sorry
[16:49:59] but now our ip address utilization is better
[16:50:03] due to switching ip ranges
[16:50:08] * Ryan_Lane sighs
[16:50:14] labsconsole is now broken
[16:50:14] feel the ipv4 optimization!
[16:50:53] might be because we used the IP for virt0 instead of the DNS entry?
[16:51:12] no. mysql was bound to the wrong ip
[16:51:19] and the /etc/hosts file had the wrong ip too
[16:51:22] working now
[16:51:55] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11178
[16:51:59] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11178
[16:52:02] hey mark, can we start installing precise on stat1?
[16:56:13] drdee: can we do that tomorrow?
[16:56:28] also I thought that erik z requested to start on stat1001 first
[16:57:06] mark: i am giving up on this project
[16:57:18] i have tried everything to do this in a reasonable time
[16:57:25] i don't care about it
[16:58:40] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[16:59:29] !log restarting opendj on virt0
[16:59:34] Logged the message, Master
[16:59:36] andrew and I have spent about 6 weeks, on and off, trying to get precise installed on stat1; we don't need it, we just want to help out, but we cannot keep waiting for this forever
[17:00:19] we can just set a time for it
[17:00:25] "can we do it now" doesn't always work so well
[17:00:40] especially not when I'm about to go off for dinner and a soccer match :P
[17:01:43] I don't care about that box either, but it's gonna be even bumpier if we don't do it now
[17:02:16] RECOVERY - Backend Squid HTTP on sq33 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.010 seconds
[17:02:57] mark: anyways
[17:03:01] RECOVERY - Frontend Squid HTTP on sq33 is OK: HTTP OK HTTP/1.0 200 OK - 27546 bytes in 0.006 seconds
[17:03:04] let me know when it is done
[17:04:50] mark, how are we supposed to set a time for it?
[17:04:54] do we just set one and someone will do it?
[17:05:08] we need to talk to you guys to set a time, which I guess is what we are trying to do
[17:05:17] I tried to get someone to respond to this ticket to set a time 6 weeks ago as drdee says
[17:05:24] but no one seemed interested, so we gave up
[17:05:33] now it is back (and we do kinda need it this time)
[17:05:48] so yesterday we got someone to plug in a USB drive so we could back up our data
[17:05:51] this is done
[17:05:56] so now the machine is sitting there
[17:06:01] inactive until someone can do the reinstall
[17:06:04] cmjohnson1: what's up?
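The recurring "Puppet freshness" alerts (for example es1004 at 16:58:40) fire when Puppet has not run on a host in the last 10 hours. A minimal sketch of that staleness test, assuming the time of the last successful run is already known; how the real check obtains that timestamp is not shown in this log, and `puppet_is_stale` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window taken from the alert text: "Puppet has not run in the last 10 hours".
FRESHNESS_WINDOW = timedelta(hours=10)


def puppet_is_stale(last_run: datetime,
                    now: Optional[datetime] = None,
                    window: timedelta = FRESHNESS_WINDOW) -> bool:
    """Return True if the last Puppet run is older than the freshness window.

    last_run must be timezone-aware when now is omitted, because the default
    for now is the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_run > window


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(puppet_is_stale(now - timedelta(hours=11), now))  # True: 11 hours old
    print(puppet_is_stale(now - timedelta(hours=2), now))   # False: 2 hours old
```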
[17:07:45] cmjohnson1: "host search32.mgmt.pmtpa.wmnet" on any server
[17:08:13] ottomata: i can do it tomorrow, that's no problem
[17:08:38] cmjohnson1: what mark said, but yeah: 10.1.4.43
[17:08:50] ok, thanks mark, drdee, is that ok? tell erik?
[17:09:04] my impression was that erik z was nervous about it
[17:09:13] I don't know what the deal is with that
[17:09:20] we should be totally backed up with what he is nervous about
[17:09:24] alright
[17:09:26] but, erik z has suspended his work
[17:09:27] then i'll do it tomorrow
[17:09:31] until this is done
[17:09:44] mark: want me to grab it?
[17:09:49] if you want
[17:09:57] but whoever drdee talked to yesterday said that it'd be done in 24 hours I guess, but whatevs
[17:10:00] just needs a reinstall, with precise, with LVM partitioning
[17:10:04] oh cool, thanks Leslie!
[17:10:06] cool
[17:10:08] (can be done manually, as long as data ends up in LVM LVs)
[17:10:18] and there's data on an attached external drive
[17:10:23] that needs to be copied back after the install
[17:10:25] and puppet runs, etc
[17:10:28] i can do that part
[17:10:29] ottomata has all the details
[17:10:32] right
[17:10:33] so format the usb drive and get cmjohnson1 to destroy it
[17:10:35] got it ;)
[17:10:38] haha
[17:11:12] actually cmjohnson1 can you remove the usb drive from stat1 for now? i just want to be uber paranoid about not destroying anything
[17:11:16] LeslieCarr: you know how precise installs work?
[17:11:18] by default it'll do lucid
[17:11:25] you need to add two lines to the node entry in dhcpd.conf
[17:11:33] ok
[17:11:47] check e.g. the lvs servers for examples
[17:11:52] those were reinstalled with precise last week
[17:13:31] LeslieCarr: also, make sure you don't fill the entire logical volume with LVs
[17:13:38] always good to keep at least 10-20% of free space
[17:13:50] allocate what's needed, not what's available :)
[17:14:06] that allows for LVM snapshots and "oops, disk ran out of space" quick fixes...
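Mark's rule of thumb is to allocate logical volumes for what is needed and keep at least 10-20% of the volume group free for LVM snapshots and quick "disk ran out of space" fixes. A minimal sketch of that check, using integer GiB arithmetic so the boundary case is exact; the 1000 GiB volume group and LV sizes are hypothetical, not stat1's actual disks, and `fits_with_reserve` is an illustrative name.

```python
from typing import Sequence


def fits_with_reserve(vg_gib: int, lv_gib: Sequence[int], reserve_pct: int = 10) -> bool:
    """True if the planned LVs leave at least reserve_pct percent of the VG unallocated."""
    if vg_gib <= 0:
        raise ValueError("vg_gib must be positive")
    if not 0 <= reserve_pct < 100:
        raise ValueError("reserve_pct must be in [0, 100)")
    allocated = sum(lv_gib)
    # allocated / vg <= (100 - reserve) / 100, rearranged to stay in integers.
    return allocated * 100 <= vg_gib * (100 - reserve_pct)


# Hypothetical 1000 GiB volume group.
print(fits_with_reserve(1000, [600, 250]))      # True: 850 GiB allocated leaves 15% free
print(fits_with_reserve(1000, [600, 350]))      # False: only 5% would stay free
print(fits_with_reserve(1000, [600, 250], 20))  # False: 15% free is below a 20% reserve
```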
[17:14:25] * Damianz seds entire logical volume to entire volume group and returns to a happy place
[17:15:00] yes, entire volume group
[17:17:17] thanks leslie!
[17:18:40] thanks cmjohnson1
[17:20:12] LeslieCarr: can you approve this for me: https://gerrit.wikimedia.org/r/#/c/9640/
[17:20:22] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/9640
[17:20:45] in about 5 minutes or so
[17:21:10] Ryan_Lane: you around?
[17:21:12] New patchset: Lcarr; "Switching stat1 to precise install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11186
[17:21:17] yes. not for long, though
[17:21:24] and am in the middle of changing the ip for virt0
[17:21:26] Ryan_Lane: can you approve https://gerrit.wikimedia.org/r/#/c/9640/
[17:21:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11186
[17:21:46]