[00:22:27] domas: final question, should those changes on locke be kept and put in svn?
[01:26:15] PROBLEM - Puppet freshness on gilman is CRITICAL: Puppet has not run in the last 10 hours
[01:43:13] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 265 seconds
[01:46:03] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 14 seconds
[02:29:26] hah, misinterpreted lily's barfing at first. (once upon a time there was a box named lily...)
[02:30:21] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 212 seconds
[02:30:48] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 233 seconds
[02:30:57] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 197 seconds
[02:31:15] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 204 seconds
[02:31:42] PROBLEM - MySQL Slave Delay on db1017 is CRITICAL: CRIT replication delay 182 seconds
[02:32:09] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 209 seconds
[02:32:18] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 248 seconds
[02:33:12] RECOVERY - MySQL Slave Delay on db1017 is OK: OK replication delay 0 seconds
[02:33:21] PROBLEM - MySQL Replication Heartbeat on db1033 is CRITICAL: CRIT replication delay 230 seconds
[02:33:22] Joan who do i ping in here?
[02:35:00] RECOVERY - MySQL Replication Heartbeat on db1033 is OK: OK replication delay 0 seconds
[02:35:00] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds
[02:35:18] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds
[02:35:52] MBisanz: assuming those recoveries are the right cluster than no one?
[02:36:17] huh?
[02:36:39] That's a typical response to jeremyb.
[02:36:48] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 0 seconds
[02:36:48] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 0 seconds
[02:36:49] Most people just think it, though.
[02:36:55] still waiting on db12, db1017
[02:37:11] err, then*
[02:37:19] also, hi board member!
[02:38:34] haha hi jeremyb
[02:38:37] Joan who do I ping?
[02:39:36] MBisanz: Is anything broken?
[02:39:48] Joan Due to high database server lag, changes newer than 605 seconds may not appear in this list.
[02:40:01] But you can edit?
[02:40:04] And read the site?
[02:40:10] There's no undo button for your stupidity.
[02:40:20] As long as nothing's actually broken, you just have to wait.
[02:40:52] ok...
[02:41:03] seriously don't ping anyone
[02:41:14] MBisanz: You can follow https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= for the lags.
[02:41:18] They've all gone down.
[02:41:37] jeremyb ok
[02:41:44] Joan good. why did it go through?
[02:41:50] did someone change the limit
[02:42:13] why the rename? is he being renamed back to original?
[02:42:39] and why did it have that effect? shouldn't it just hit the job queue and that's it?
[02:42:43] jeremyb: i'm moving that name out of the way so he can have his old name back
[02:42:57] jeremyb: no, renames are funky that way. there is supposed to be a 50k limit on them
[02:43:04] why the limit didn't kick in, i dont know
[02:43:12] ohhhh, renamed when he left and now he's back?
[02:43:28] Easy enough to ban renames.
[02:43:38] i guess maybe that means i'll see him on may 5. will you both be there?
[02:43:42] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 0 seconds
[02:43:42] jeremyb no, he ragequit and scrambled his password
[02:43:45] jeremyb no
[02:43:57] MBisanz: yes, i recall half of that
[02:44:00] You all should switch channels.
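The dbrepllag API endpoint mentioned above can be polled from a script; a minimal sketch, where the JSON below is a fabricated sample response for illustration (real hosts and lag values vary) and `&format=json` is appended to the URL for machine-readable output:

```shell
# Sketch of parsing the replication-lag API. The sample response is made up;
# a real query would be:
#   curl -s 'https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb=&format=json'
sample='{"query":{"dbrepllag":[{"host":"db12","lag":233},{"host":"db36","lag":0}]}}'
# pull out each host/lag pair without needing jq
echo "$sample" | grep -o '"host":"[^"]*","lag":[0-9]*'
```

With the sample above this prints one `"host":…,"lag":…` pair per line, which is enough to spot lagged slaves at a glance.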
[02:44:15] all back to 0
[02:44:27] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 0 seconds
[02:53:47] Is blondel live yet or is monitoring premature?
[02:54:40] LeslieCarr are you here?
[02:57:37] so... blondel.
[03:01:21] ok, blondel is not in service.
[03:01:31] I don't know why nagios decided to page me now though,
[03:01:41] since it's been giving a crit alert for days.
[03:03:54] ok, gnight.
[05:26:02] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[05:26:02] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[05:26:02] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[05:35:02] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[05:35:02] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[05:40:08] RECOVERY - udp2log log age on emery is OK: OK: all log files active
[05:44:20] PROBLEM - udp2log log age on emery is CRITICAL: CRITICAL: log files /var/log/squid/e3_necromancy_idle1year.log, /var/log/squid/e3_necromancy_idle3month.log, /var/log/squid/telenor-montenegro.log, /var/log/squid/orange-tunesia.log, /var/log/squid/orange-uganda.log, have not been written to in 6 hours
[06:53:11] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[07:02:39] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[07:43:04] !log owa[1-3] They dont have real puppet freshness issues, it's rather firewalling and the snmp traps
[07:43:08] Logged the message, Master
[07:47:03] !log gilman - what's up with it? closes SSH, does not like mgmt pass, was running jenkins but broken
[07:47:05] Logged the message, Master
[07:49:19] !log stat1 - this also needs udp2log stuff fixed. currently Could not find class misc::udp2log::udp-filter
[07:49:21] Logged the message, Master
[07:59:07] New patchset: Dzahn; "decommission srv189 - server removed from rack by cmj - RT-2413" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5303
[07:59:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5303
[07:59:55] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5303
[07:59:57] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5303
[08:22:48] ACKNOWLEDGEMENT - Puppet freshness on gilman is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn gilman is in zombie state,mgmt disconnects - RT 2841
[08:29:48] ACKNOWLEDGEMENT - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn puppet is actually fresh - fix monitoring RT 2842
[08:29:48] ACKNOWLEDGEMENT - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn puppet is actually fresh - fix monitoring RT 2842
[08:29:54] ACKNOWLEDGEMENT - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn puppet is actually fresh - fix monitoring RT 2842
[08:35:11] !log emery - "udp2log_age" says some squid logfiles have not been written to in 6 hours, but from the filenames looks like this isnt a reason to worry, right
[08:35:14] Logged the message, Master
[08:37:06] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Timeout reading from 10.0.8.39:11000
[08:38:36] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online
[08:39:21] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[08:44:15] ACKNOWLEDGEMENT - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn known issue with replacement of class misc::udp2log::udp-filter
[09:09:12] PROBLEM - Backend Squid HTTP on amssq52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:10:24] RECOVERY - Backend Squid HTTP on amssq52 is OK: HTTP OK HTTP/1.0 200 OK - 635 bytes in 0.231 seconds
[09:15:21] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[09:29:01] New review: Reedy; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5104
[10:01:02] New patchset: Dzahn; "class for mw cronjobs to run refreshLinks.php per cluster - use mwdeploy user" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5104
[10:01:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5104
[10:02:04] New review: Dzahn; "(no comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5104
[10:08:40] diederik: probably!
[11:05:46] New patchset: Mark Bergsma; "Double the backend weights to improve chash distribution" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5309
[11:06:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5309
[11:06:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5309
[11:06:55] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5309
[11:11:55] New patchset: Mark Bergsma; "Actually double the backend weights this time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5310
[11:12:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5310
[11:13:06] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5310
[11:13:08] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5310
[11:28:49] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588*
[11:31:40] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2338
[11:41:25] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:42:46] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:55:59] !regsubst | mutante
[11:55:59] mutante: testing your regsubst replacings - https://blog.kumina.nl/2010/03/puppet-tipstricks-testing-your-regsubst-replacings-2/
[12:22:44] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2613*
[12:24:14] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2213
[13:05:11] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Timeout reading from 10.0.8.23:11000
[13:06:32] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online
[13:32:16] New patchset: Dzahn; "class for mw cronjobs to run refreshLinks.php per cluster - flexible hours" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5104
[13:32:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5104
[13:33:46] New patchset: Dzahn; "class for mw cronjobs to run refreshLinks.php per cluster - flexible hours" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5104
[13:34:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5104
[13:35:31] New review: Dzahn; "ok, let's not use the cluster name as hour, instead make hours flexible. this works just like in htt..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/5104
[13:37:11] New patchset: Dzahn; "class for mw cronjobs to run refreshLinks.php per cluster - flexible hours" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5104
[13:37:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5104
[13:39:10] New review: Dzahn; "ok, and now going to test this on hume but watch it" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5104
[13:39:13] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5104
[13:46:06] New patchset: Dzahn; "add refreshlinks cronjobs to hume and verify" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5322
[13:46:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5322
[13:47:48] New review: Dzahn; "tested on labs" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5322
[13:47:51] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5322
[13:51:53] Reedy: ^ see above and i just upgraded RT-2355, how does that look to you? the resulting crons
[13:52:01] they are on hume now
[13:52:19] 2 hours time for s1 to finish, 1 hour for the others
[13:53:03] changing the run times for the clusters should be easy, i made it: usage: @ in puppet
[13:56:33] !log adding refreshLinks cron jobs to hume per RT-2355 (via puppet). if there should be any performance issues, schedule can be changed like @ in mediawiki.pp (and/or remove mediawiki::refreshlinks from hume and clear out the jobs of user mwdeploy)
[13:56:36] Logged the message, Master
[14:38:31] New patchset: Pyoungmeister; "another little fix for udp2log" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5326
[14:38:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5326
[14:39:17] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5326
[14:39:19] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5326
[14:42:34] RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Thu Apr 19 14:42:16 UTC 2012
[14:43:28] ^ thanks notpeter :)
[14:54:26] New patchset: Jgreen; "added install of few more stock packages to mwlib.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5327
[14:54:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5327
[14:55:07] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5327
[14:55:10] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5327
[15:26:53] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[15:26:53] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[15:30:22] New review: Hashar; "I have resolved related bug https://bugzilla.wikimedia.org/35469" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3885
[15:47:43] New patchset: Pyoungmeister; "need that gone" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5333
[15:47:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5333
[15:48:15] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5333
[15:48:18] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5333
[17:13:30] New patchset: Pyoungmeister; "part 1 of getting multicast relay going" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5335
[17:13:45] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/5335
[17:19:12] New patchset: Pyoungmeister; "part 1 of getting multicast relay going" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5335
[17:19:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5335
[17:42:20] !log discovered sanger had ~7K redundant iptables rules, removed extras and reloaded
[17:42:23] Logged the message, Master
[17:42:41] haha
[17:43:27] 7K?? O_O
[17:43:31] doesn't sound like the kind of thing that would be done manually... so the automated process that did it may put them back again?
[17:51:58] !log discovered nfs1 had ~1K redundant iptables rules, removed extras and reloaded
[17:52:01] Logged the message, Master
[17:53:03] gl;hf.
[18:08:07] hey maplebed: can we configure email warnings for packetloss on emery, locke and oxygen?
[18:15:21] !log rebooting db1005. it's dead, jim.
[18:15:24] Logged the message, notpeter
[18:15:39] diederik: what do you want done? it should already have that turned on
[18:15:46] or... do you want it turned off for now?
[18:16:15] notpeter: since when is it turned on?
[18:17:40] diederik: should be automatic. let me take a look
[18:17:48] but that will be part of bringing that host up
[18:18:10] but i didn't receive any emails warning me about packetloss on emery last week
[18:18:11] status update: I think I have it sorted, but I'm waiting on a code review from asher or mark before I deploy it.
[18:18:23] hhhhmmmm, ok
[18:18:35] I'll look at it generally
[18:18:47] * binasher is reviewing
[18:19:35] RECOVERY - Host db1005 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms
[18:22:53] PROBLEM - mysqld processes on db1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[18:25:18] !log nothing obvious in logs on db1005, starting mysql
[18:25:20] Logged the message, notpeter
[18:25:56] is that the second time db1005 died in the last week?
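The "~7K redundant iptables rules" cleanup logged at 17:42 could plausibly be diagnosed offline before touching a live firewall; a hedged sketch (not the command actually used on sanger; `rules.txt` stands in for saved `iptables-save` output, and the sample rules are illustrative):

```shell
# Illustrative only: dedupe a dumped ruleset without touching a live firewall.
# rules.txt stands in for `iptables-save` output.
printf '%s\n' '-A INPUT -s 10.0.0.1 -j ACCEPT' \
              '-A INPUT -s 10.0.0.1 -j ACCEPT' \
              '-A INPUT -s 10.0.0.2 -j DROP' > rules.txt
sort rules.txt | uniq -c | sort -rn             # show how often each rule repeats
awk '!seen[$0]++' rules.txt > rules.dedup.txt   # keep first occurrence, preserve order
```

Keeping the first occurrence with `awk '!seen[$0]++'` matters because, unlike `sort -u`, it preserves rule order, and iptables rules are order-sensitive.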
[18:27:05] RECOVERY - mysqld processes on db1005 is OK: PROCS OK: 1 process with command name mysqld
[18:29:38] PROBLEM - MySQL Replication Heartbeat on db1005 is CRITICAL: CRIT replication delay 184185 seconds
[18:30:32] PROBLEM - MySQL Slave Delay on db1005 is CRITICAL: CRIT replication delay 183750 seconds
[18:31:07] binasher: nope. it was db1004 that died earlier this week. so if current trends continue... shit! it's headed right for db1006!
[18:31:45] sweet
[18:32:15] i wonder what eqiad will look like when we send actual compute load there
[18:32:21] nagios christmas!
[18:32:54] it's weird
[18:33:14] i saw esams varnish restart again btw
[18:33:23] but I haven't investigated it yet
[18:53:46] notpeter: i just got paged about blondel again, grrr
[18:54:38] is it from nagios or ichinga?
[18:54:53] *icinga
[18:55:50] chinga
[18:56:47] binasher: sounds like a personal problem
[18:58:14] binasher: but srsly, I'm not sure. I turned off notifications on spence, not sure where to do so on icinga
[18:59:05] is anyone else getting sms pages from icinga?
[19:03:40] hi guys
[19:03:58] i'm looking into getting email alerts for packet loss from udp2log monitor
[19:04:01] looking in puppet
[19:04:06] i'm almost there
[19:04:29] line 132 of logging.pp sets ups a monitor_service called 'packetloss'
[19:04:40] i need to change the contact_group for this monitor service
[19:04:51] I'm looking for where the different contact_groups are configured
[19:04:58] but having trouble finding them
[19:16:03] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[19:16:40] talking to notpeter now…, thanks guys
[19:49:04] New patchset: Ottomata; "manifests/admins.pp - adding myself to the admins::restricted group so that I can have access to udp2log machines locke, emery, and oxygen." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5350
[19:49:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5350
[19:54:00] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:58:59] New patchset: Pyoungmeister; "part 1 of getting multicast relay going" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5335
[19:59:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5335
[20:01:16] hm, I have a gerrit question
[20:01:32] I might have asked this before months ago, but forgotten. apologies in advance if so
[20:01:41] go ahead :D
[20:01:49] I have one commit waiting for review in puppet production branch
[20:01:54] I have another commit I'd like to push to gerrit
[20:01:59] when I run
[20:02:00] since most SF people are having lunch anyway
[20:02:02] git-review
[20:02:13] It asks me if I want to submit both of my commits
[20:02:21] even though the first one has already been submitted
[20:02:33] should I say yes?
[20:02:41] when you do git-review , it compares you current branch with remotes/origin/master
[20:02:48] and find out that your branch is ahead by two commits
[20:02:53] so the script will send boths
[20:02:59] that is often unwanted
[20:03:05] does gerrit handle that?
[20:03:13] will it know that the first commit has already been submitted?
[20:03:34] if I say yes?
[20:03:41] gerrit will IF the Change-Id is already known to gerrit
[20:03:56] and that the commit sha1 hasn't changed
[20:04:01] i think it should be, i've submitted it via git-review
[20:04:02] else it will happily create a new patchset
[20:04:09] so that should work
[20:04:27] is the second commit really depending on the first, already sent, commit?
[20:04:41] no, not depending on it at all
[20:04:45] completely separate
[20:04:53] if not, you might want to create a new branch based off origin/master and rebase your second commit on that
[20:05:06] cause the second commit can not be merged until the first one is
[20:05:13] if they are dependent?
[20:05:15] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:05:21] and if the first one is modified, its sha1 will change and the second commit will have to be rebased
[20:05:31] so usually here is what I do :
[20:05:38] even if it is a different file witha completely different purpose?
[20:05:42] git checkout -b my_feature -t origin/master
[20:05:48] oh actually, it is in the same file
[20:05:49] , git add, git commit
[20:05:50] not related
[20:05:51] but same file
[20:05:52] git-review -f
[20:05:59] rinse and repeat with next feature :-D
[20:06:10] these aren't really 'features' though
[20:06:17] i just added myself to an account:: group
[20:06:19] in the first commit
[20:06:35] well features / bug fix / addition whatever :-D Gerrit names that a "topic"
[20:06:37] and then did ensure => false on an old unixaccount 'aotto' that is not being used
[20:06:39] that i want to get rid of
[20:06:55] ok
[20:07:06] and because they are from different local branches
[20:07:13] gerrit won't see the commits when I run review
[20:07:14] hm
[20:07:34] or, more accurately: git-review wont' see them, because they exist in different local branches
[20:07:46] so it will only see me as ahead of origin/master by 1 commit in each local branch, right?
[20:08:11] gerrit identity a change by the pair of: branch name, Change-Id in commit message
[20:08:52] if you send a commit having the same branch and change id has a previous commit, gerrit will generate a new patchset for that change
[20:09:28] similary, a commit cherry picked in another branch, will have the same Change-Id as an existing commit but since branch is different it will trigger a new change
[20:09:30] right but it won't try to submit the 2 local commits, because it won't see the first commit if I do a 2nd commit in a different branch?
[20:09:40] hm
[20:09:40] (which will have the same change id but a different change number hehe)
[20:09:45] erg
[20:09:45] ok
[20:10:06] so the primary key is (branch_name, Change-Id) ;-D
[20:10:24] then about dependencies, Gerrit has a look at the parent sha1 of the commit object you send
[20:10:33] if the parent is already merged, that is an independent change
[20:11:41] if the parent commit is an unmerged change in Gerrit, the new Change/patchset will be made a "child" of the existing Change
[20:11:56] so you could theorically change Gerrit dependencies by rebasing a patch set :-D
[20:12:18] a dependency could be removed by rebasing the patchset1 onto something that is already merged in (for example: origin/master ;) )
[20:12:26] ahhhh
[20:12:28] ok
[20:12:46] by doing it in another branch, is my previous commit no longer the parent SHA of my new commmit?
[20:13:08] if only I was a smart guy, I would have made nice drawing for that and would have uploaded the imaged description on mw.org :-)
[20:13:15] haha
[20:13:23] you still can!
[20:13:33] when you create another branch, aka: git checkout -b my_branch origin/master
[20:13:42] the commit you will make will have for parent origin/master
[20:13:42] New patchset: Ottomata; "admins.pp Disabling aotto account. 'otto' has replaced it, and I want to delete the aotto puppet configs. I will do this in another commit after puppet has removed aotto from stat1.wikimedia.org (the only place where aotto currently exists)." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5353
[20:13:59] which is already merged in, hence that makes an independent commit
[20:13:59] New patchset: Pyoungmeister; "notifications for all udp2log related alerts going to analytics peeps, plus adding otto and robla" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5354
[20:14:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5353
[20:14:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5354
[20:14:22] ahhhhh right, because it is the most recent remote parent that has been approved
[20:14:24] yes yes
[20:14:26] makes sense
[20:14:35] hmm no
[20:14:39] because i am not branching from my working master, but from origin/master
[20:14:40] ?
[20:14:40] because origin/master is merged in
[20:14:52] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5354
[20:14:54] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5354
[20:14:57] right, so when I branch, I am branching from the origin GIT
[20:15:14] not from my working, which possibly contains unmerged commits
[20:15:18] if your working master is pointing to a commit which is merged in the remote repo, it will be fine too :-]
[20:15:24] right
[20:15:32] but if I have unmerged commits (like I do right now)
[20:15:40] then if I just did git checkout -b new_branch
[20:15:41] you could even do something: like git checkout -b myoldfix HEAD^^^^^^
[20:15:43] I would still have the same problem
[20:15:45] riiiiighhht
[20:15:47] ottomata: is your email aotto, or otto ?
[20:15:51] that will base your change on top of an old commit
[20:15:57] both work, but otto@wikimedia.org is prefered
[20:15:59] right
[20:16:02] k
[20:16:05] so as long as the old commit I choose is one that has been merged
[20:16:10] gerrit will be cool
[20:16:24] nice nice
[20:16:33] hashar, that makes more sense to me now than ever before, thank you!
[20:16:47] ottomata: I have added some useful git aliases on http://www.mediawiki.org/wiki/git/aliases
[20:16:50] if you are a CLI guy
[20:16:53] i am
[20:17:03] if you are a GUI guy, that is the last time we talk to each other cause I will /ignore you :-D
[20:18:21] ottomata: the alias I use the most is the fancy log : http://www.mediawiki.org/wiki/Git/aliases#Fancy_log_and_graph
[20:18:26] hehe, i would probably not have understood that if I was, hehe
[20:18:34] yeah, i have that one set up already
[20:19:40] also something I have not added on mw.org is that you definitely want to use git bash_completions :-D
[20:19:42] save a ton of time
[20:20:06] it even detects that 'git lg' is an alias of 'git log' and complete 'git lg' with the options of git log !!
[20:20:23] an option I use a lot is --no-merge
[20:20:47] since Gerrit generates ton of merges
[20:21:03] hehe, i have that already too!
[20:21:09] i like the current branch in the prompt
[20:21:12] that is really useful
[20:21:19] yeah that one is the top most useful one
[20:21:41] --no-merge?
[20:21:45] git branch -vv , will show you all branches and the remote one being tracked
[20:21:45] on in the log you mean?
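hashar's advice above (base each change on origin/master so git-review sends exactly one commit per topic, and Gerrit sees them as independent) can be sketched in a throwaway repo. Branch and commit names below are illustrative, and a local `$base` branch stands in for `origin/master`:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.org    # throwaway identity for the demo
git config user.name dev
base=$(git symbolic-ref --short HEAD)    # default branch stands in for origin/master
git commit -q --allow-empty -m 'already merged upstream'
# topic 1: based directly on the "merged" base, so it is an independent change
git checkout -q -b add_account "$base"
git commit -q --allow-empty -m 'add myself to an admins group'
# topic 2: also based on the base, NOT on topic 1, so no Gerrit dependency
git checkout -q -b disable_account "$base"
git commit -q --allow-empty -m 'ensure => false on old account'
# each topic branch is ahead of the base by exactly one commit,
# so git-review on either branch would offer just one commit
git rev-list --count "$base"..add_account
git rev-list --count "$base"..disable_account
```

Had the second commit been made on top of the first branch instead, its parent sha1 would be the unmerged change, and Gerrit would chain the two as parent and child.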
[20:21:59] oh nice and log messages
[20:22:01] git branch --merged <-- all local branch merged upstream :-]
[20:22:04] so should be safe to delete
[20:22:22] yup in log you can try: git lg --no-merge
[20:22:29] ottomata: diederik notifications should be going out properly now
[20:22:34] well, once spence runs puppet again
[20:22:55] ottomata: cool,can we test this/
[20:22:55] aye, and if we have packet loss, which hopefully we won't :)
[20:23:15] ottomata: to compare between branches / sha1 you can use: commit1..commit2 (aka two dots)
[20:23:29] ottomata: with three dots, it makes a diff against the common ancestor IIRC
[20:23:35] never used it though
[20:24:03] hm
[20:24:05] cool
[20:24:17] diederik, i think the packetloss checker just looks for stuff in a log file
[20:24:27] so we could append the log file with some stuff it doesn't like
[20:24:36] and see if we get the message
[20:25:03] not exactly sure how it works yet though…trying to understand more
[20:25:35] # this is what will match the packet loss lines
[20:25:35] # packet loss format :
[20:25:35] # %[%Y-%m-%dT%H:%M:%S]t %server lost: (%percentloss +/- %margin)
[20:25:35] # [2011-10-26T21:20:25] sq86.wikimedia.org lost: (3.61446 +/- 19.67462)%
[20:25:43] maybe just
[20:27:34] echo '….. (50.00000% +- 5.00000)%' >> /var/log/squid/packet-loss.log
[20:27:36] or something like that?
[20:39:21] ottomata: have a good day :-D
[20:39:24] * hashar waves
[20:40:00] laters!
[20:40:05] thanks so much for your help!
[21:03:10] New patchset: Jgreen; "switching members of the fundraising nagios contact group . . . maybe that will be useful somehow." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5358
[21:03:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5358
[21:04:23] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5358
[21:04:26] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5358
[21:07:47] RECOVERY - MySQL Slave Delay on db1005 is OK: OK replication delay 0 seconds
[21:08:07] !log changed nagios contactgroup fundraising from tfinc/awrichards --> jgreen
[21:08:09] Logged the message, Master
[21:08:17] :)
[21:08:23] thanks Jeff_Green
[21:08:32] RECOVERY - MySQL Replication Heartbeat on db1005 is OK: OK replication delay 0 seconds
[21:08:50] tfinc: i didn't know it existed until p gehres pointed it out!
[21:08:57] did you ever receive any form of notification from that?
[21:10:12] if i did then i don't remember it
[21:11:06] that's what I figured, otherwise one of you guys would have mentioned it too
[21:26:39] hi domas: should the cache code for webstatscollector be checked into svn?
[22:02:00] does anyone know what happened to es1004?
[22:20:57] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:23:48] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:35:39] diederik: yeah
[22:35:46] I don't remember why I didn't
[22:35:53] probably because it wasn't SVN checkout
[22:36:14] okay, i'll take care of it
[22:36:27] I guess it is one line diff
[22:36:47] it is amusing how many people didn't get my sarcasm
[22:46:29] actually it was a 5 line diff :)
[23:03:28] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/5335
[23:18:13] help :) the fundraising test server died
[23:18:27] is there someone who can kick it?
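The packet-loss log format quoted earlier (around 20:25, `[2011-10-26T21:20:25] sq86.wikimedia.org lost: (3.61446 +/- 19.67462)%`) can be sanity-checked with a regex before appending a fake line as proposed; the pattern below is an assumption inferred from that one sample, not the checker's actual code:

```shell
# Assumed pattern inferred from the sample line in the log; not the real
# packetloss checker. A fake test line appended to the checker's log would
# presumably need to match something like this.
line='[2011-10-26T21:20:25] sq86.wikimedia.org lost: (3.61446 +/- 19.67462)%'
echo "$line" | grep -Eq \
  '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]{8}\] [^ ]+ lost: \([0-9.]+ \+/- [0-9.]+\)%$' \
  && echo match
```

Worth noting: the `echo '….. (50.00000% +- 5.00000)%'` suggested at 20:27 uses `+-` where the documented format has `+/-`, so a pattern this strict would not match it.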
[23:18:28] test-payments.tesla.usability.wikimedia.org
[23:18:55] it is a virtual machine on ESXi
[23:19:12] PROBLEM - Host es1004 is DOWN: PING CRITICAL - Packet loss = 100%
[23:19:18] i started a service and the machine froze
[23:19:31] the virtual machine froze
[23:20:24] RECOVERY - MySQL Slave Running on es1004 is OK: OK replication
[23:20:33] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[23:20:34] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay seconds
[23:20:34] RECOVERY - Host es1004 is UP: PING OK - Packet loss = 0%, RTA = 26.45 ms
[23:20:51] RECOVERY - SSH on es1004 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[23:20:51] RECOVERY - MySQL Recent Restart on es1004 is OK: OK seconds since restart
[23:21:36] jpostlethwaite: hey
[23:21:41] why'd you kill it :p
[23:21:41] Hi LeslieCarr is there anyone in ops who can help me restart a vm?
[23:21:46] ha
[23:21:53] i was just notifying you?
[23:22:01] i started a service
[23:22:04] and it froze
[23:22:23] so is it a labs VM or is it a real machine ?
[23:22:31] it is a vm
[23:22:36] on tesla
[23:22:44] i believe
[23:22:56] test-payments.tesla.usability.wikimedia.org
[23:23:12] it is an ESXi server
[23:23:43] the machine does not have enough resources :(
[23:23:48] hrm
[23:23:57] just came up
[23:24:00] don't know much about the whole payments setup ...
[23:24:04] trying to figure out how to get into it
[23:24:19] i have access to the machine now
[23:24:38] i can give you details if you want to log in :)
[23:24:46] what details ?
[23:24:55] ipa
[23:24:57] i mean yes please
[23:24:58] hehe
[23:25:10] 208.80.152.235
[23:25:18] 192.168.1.95
[23:25:52] it only has a GB of RAM
[23:25:53] yeah, can't ssh into 208.80.152.235
[23:26:08] not sure how you got the 192.168 ip
[23:26:12] not anything in prod ...
[23:26:16] Jeff_Green takes care of this machine
[23:26:17] could be something special set up
[23:26:19] ah yeah
[23:26:38] don't think i have access...
[23:26:55] well thanks for responding
[23:27:20] the host server is to be taken down in the future
[23:27:28] it is to be decommissioned
[23:29:34] RECOVERY - Puppet freshness on es1004 is OK: puppet ran at Thu Apr 19 23:29:04 UTC 2012
[23:31:52] New patchset: Asher; "install our current prod mysql pkgs on es hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5393
[23:32:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5393
[23:32:57] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5393
[23:33:00] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5393
[23:33:08] !log powercycled es1004
[23:33:11] Logged the message, Master
[23:54:32] RECOVERY - mysqld processes on es1004 is OK: PROCS OK: 1 process with command name mysqld