[00:13:21] anyone? the instances i just blamed all seem to predate the git migration [00:21:10] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [00:21:59] jeremyb: I think efRaiseThrottle() predates $wgRateLimitsExcludedIPs [00:22:28] !log on cp1005: set tcp_tw_recycle=0 [00:22:31] Logged the message, Master [00:22:38] jeremyb: the latter removes all rate limits the former uses wgAccountCreationThrottle and therefore only rate limits account creation [00:22:39] TimStarling: see Reedy's already started answering me in #mediawiki ;) but thanks [00:50:54] !log migrating fliejournal to innodb on all wikis [00:50:57] Logged the message, Master [01:05:46] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is -1.41884284553 [01:09:47] binasher: THANK YOU MR INNODB [01:10:05] :) [01:10:08] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error [01:10:14] now look what you've done [01:28:32] http://www.mail-archive.com/netdev@vger.kernel.org/msg60547.html [01:28:42] on ephemeral port exhaustion [01:29:14] this is an answer to a question by a person who has roughly the same problem as us on our squids [01:39:57] !log filejournal migration complete [01:40:01] Logged the message, Master [01:41:28] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 255 seconds [01:42:06] !log on cp1004: set net.ipv4.tcp_tw_recycle=0 and net.ipv4.tcp_tw_reuse=1 [01:42:10] Logged the message, Master [01:43:10] we could use keepalive [01:44:19] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [01:44:42] New patchset: Pgehres; "Enabling UploadWizard on wmfwiki and donatewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7922 [02:04:56] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is -0.549010083333 [02:09:17] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error [02:11:21] I see mark already attempted to 
enable it [02:11:23] it's not working though [02:58:18] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7922 [02:58:20] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7922 [03:19:39] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7906 [03:19:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7906 [04:16:36] Hm.. static-1.20wmf3 symlinks aren't in ops/wm-conf git ? [04:16:41] uncommitted? [04:17:50] yeah [04:17:50] moment [04:20:21] done [04:20:31] I wonder if: [04:20:33] # deleted: docroot/bits/resources-1.19 [04:20:33] # deleted: docroot/bits/resources-1.20wmf1 [04:20:33] # modified: docroot/bits/skins-1.19 [04:20:33] # modified: docroot/bits/skins-1.20wmf1 [04:20:33] # deleted: docroot/bits/w/extensions-1.19 [04:20:36] # deleted: docroot/bits/w/extensions-1.20wmf1 [04:20:40] should be comitted [04:32:19] there's even an -1.18 one [04:32:23] what do they point to ? [05:23:10] Love the section title "other ways to detach your HEAD" at http://sitaramc.github.com/gcs/index.html [07:55:35] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [07:57:49] does bart still exist? And if it does is anyone around with access who can go svn up the planet directory? (apparently /usr/local/planet/wikimedia ) Erik and I made some changes a couple days ago and I think both of us assumed it was semi automatic (because that's all meta says needs to get done) and I doubt I'm able to get passed fenari. [08:33:15] New patchset: Hashar; "set wmfUdp2logDest depending on cluster used" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7931 [08:41:17] ** server can't find bart: NXDOMAIN [08:45:34] hmm, I wonder if it's just stored on singer now? ... 
[08:45:48] the directions still say on Bart but they haven't been updated on wikitech for ages [08:46:33] singer should at least know where it is.. because I think the apache is there [08:51:14] New patchset: Hashar; "Change cluster name 'beta' => 'wmflabs'" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7933 [08:51:47] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7933 [08:51:49] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7933 [09:16:37] New patchset: Hashar; "disable $wgUseLuceneSearch on labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7934 [09:18:35] Awww, no setup labs Lucene? [09:18:44] * Damianz removes hashar's cookies [09:19:12] labs does not even work :-D [09:19:23] so I guess we can wait a bit before having lucene [09:22:34] Jamesofur: hi, let me check on planet [09:22:50] mutante: thanks :) [09:23:03] it is on singer, ack [09:23:18] * Jamesofur shakes his head [09:24:34] there are conflicts on "svn up" :p [09:24:52] GIT [09:25:29] is it in the same spot there? We should probably get it updated on http://wikitech.wikimedia.org/view/Planet.wikimedia.org . I don't have an account but I can get someone to update it later if you don't have time. [09:25:37] GIT throws conflicts on svn up? [09:26:09] there is no git on singer [09:26:20] svn up shows conflicts in the config for zh planet [09:26:38] Jamesofur: updated en, it and fr config, looking at zh ... [09:26:45] thanks [09:26:49] yes, it is in /usr/local/planet/wikimedia [09:27:12] ok, at least they did that [09:28:04] [http://taipei-wikipedian.blogspot.com/feeds/posts/default] [09:28:13] <--did you want this in or out? 
[09:28:40] diff config.ini.r114740 config.ini.r115397 [09:29:58] it sure looks like it should be in, let me quickly check to make sure there wasn't something weird before that caused someone to take it out [09:30:06] New patchset: ArielGlenn; "continue multi-tarball wiki job; use one job queue for everything" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7935 [09:31:06] Jamesofur: ok, looks good to me now. fixed config.ini and up-to-date [09:31:17] thanks and yeah that should just stay in [09:31:18] thanks much [09:31:20] at 115397 [09:31:21] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7935 [09:31:22] np [09:31:23] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7935 [09:33:28] Jamesofur: reference to "bart" removed from that wikitech page [09:33:55] mutante: thanks! [09:34:24] If no one gets to it before I'll take a look at the pl issues you found tomorrow to update that [09:35:07] Jamesofur: ah you saw that? ok, cool, i was about to point to that page on meta [09:35:36] yeah just noticed while I was checking zh, I'll go through and check them out [09:36:02] will probably do it either in the morning at the office or evening after makerfaire setup [09:36:13] nice, pl. was stuck and when manually running it i saw those... but they should not have prevented updates [09:36:36] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:37:00] yeah it looks like the sticking issue happens sometimes, was it a cache issue or something else? [09:37:34] in the past we had some due to missing locales [09:37:49] but this last one wasnt... just ran it and it was updated.. 
[09:38:08] yea, also saw the corrupted cache ones in the past, but also not in this case..shrug [09:38:39] weird [09:39:36] well, thanks :) it looks like we've been pretty bad at updating these so I'm going to try and keep it on my watchlist to help shepherd them along [09:40:14] added puppet code to install all these locales, in case on should be missing , just need to add it [09:40:28] https://gerrit.wikimedia.org/r/#/c/1302/6/files/locales/local_int [09:41:10] Jamesofur: interested in planet project on labs? (which is trying planet-venus, planet software rewrite)....? [09:41:23] ahhh nice, sure I'd be happy to help out [09:41:33] let me add you:) [09:41:53] thanks! [09:43:13] what's your lab user name? [09:43:33] wiki -> jalexander git/ssh --> jamesur [09:43:54] (and my shell account is jamesofur… eventually I should probably fix the fact that they're all different) [09:44:35] alright, wanna switch to labs channel ? [09:44:40] sure [09:50:42] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:12:00] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:20:24] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:22:03] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [10:27:35] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:39:45] New review: Hashar; "synced live." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7933 [10:45:17] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 185 seconds [10:45:44] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 195 seconds [10:51:44] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:55:20] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [10:55:47] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [12:00:36] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7931 [12:00:38] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7931 [12:02:11] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7934 [12:02:14] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7934 [12:02:20] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:02:33] going to deploy them [12:13:24] New patchset: Dzahn; "adding analytics server nodes and role class skeleton" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7941 [12:13:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7941 [12:14:28] New patchset: Dzahn; "adding analytics server nodes and role class skeleton" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7941 [12:14:48] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7941 [12:15:13] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7941 [12:15:15] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7941 [12:16:09] New patchset: Hashar; "setup squid servers differently on labs." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7942 [12:16:10] New patchset: Hashar; "disable UDP profiling on labs, not used there" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7943 [12:17:34] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7943 [12:18:47] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7942 [12:18:49] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7943 [12:18:50] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7942 [12:23:28] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:42:58] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [12:43:25] New patchset: Hashar; "tweak CentralAuth configuration for labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7944 [12:44:10] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [12:44:58] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7944 [12:45:00] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7944 [12:47:46] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [12:50:37] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 
second response time [12:51:18] New patchset: Hashar; "fix typo in previous commit 89ad4bd" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7945 [12:51:32] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7945 [12:51:35] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7945 [12:52:07] New patchset: Hashar; "set wgImageMagickTempDir differently on labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7946 [12:52:21] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7946 [12:52:23] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7946 [13:13:26] New patchset: Hashar; "wgNotice* on production only" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7947 [13:13:39] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7947 [13:13:42] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7947 [13:37:54] New patchset: Jgreen; "adding logmover account to aluminium/grosley" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7949 [13:38:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7949 [13:38:27] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7949 [13:38:29] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7949 [13:41:57] New patchset: Jgreen; "well well well, there *is* no accounts::logmover, removing from storage3 node definition as well . . ." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7950 [13:42:17] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7950 [13:42:36] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7950 [13:42:38] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7950 [13:52:48] New patchset: Jgreen; "account setup for fundraising backup archiving" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7951 [13:53:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7951 [13:54:10] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7951 [13:54:12] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7951 [13:54:47] New patchset: Hashar; "ant script to lint PHP files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7952 [13:55:04] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7952 [13:55:06] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7952 [14:00:44] !log authdns-update - pushing fix for reverse lookup in eqiad subnets [14:00:47] Logged the message, Master [14:05:28] New patchset: Hashar; "test commit for jenkins" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7953 [14:07:38] hiiiiii there everyone [14:07:44] morning poke time! [14:07:49] i'll just do one today [14:07:57] can someone check up on this for me? [14:07:57] https://gerrit.wikimedia.org/r/#/c/7896/ [14:09:30] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:11:57] mutante: can you review ottomata change 7896 ? 
It has the keyword "nagios" ;-D [14:12:30] thanks hashar :) [14:16:25] New review: jenkins-bot; "Build Failed " [operations/mediawiki-config] (master); V: -1 C: 0; - https://gerrit.wikimedia.org/r/7953 [14:16:36] \O/ [14:17:04] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/7953 [14:17:15] <--- \O/ [14:17:18] twice [14:17:46] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/7953 [14:21:39] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 192 seconds [14:21:48] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 195 seconds [14:24:01] Change abandoned: Hashar; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7953 [14:24:33] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:24:59] !log setup a Jenkins job to lint PHP files in operations/mediawiki-config.git:/wmf-config/ [14:25:05] Logged the message, Master [14:25:10] mutante: got a sec? [14:26:24] New review: Dzahn; "inline comments. i don't think you want to echo stuff within the loop in the nagios check" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/7896 [14:27:08] hexmode: ? [14:27:58] mutante: https://bugzilla.wikimedia.org/show_bug.cgi?id=36950 -- is this ops? [14:29:12] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds [14:29:30] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds [14:30:58] hexmode: hmmm. it might be. i dont know yet [14:31:42] mutante: k, didn't want to open an rt ticket w/o knowing. 
fyi: already notified robla+sumanah+ctwoo [14:32:16] i guess its ok to open one [14:35:13] kk, will do [14:35:32] tyvm for letting me pick your brain :) [14:36:20] np [14:42:55] does anyone know how the jobrunner stuff is setup ? [14:43:22] I am wondering how to add a new JobRunner (the one from TimedMediaHandler extension) [14:54:28] drdee: root@analytics1001:# :) [14:55:00] WOOT WOOT :D [14:55:18] well, at least the first one, now on puppet, added a skeleton role class for analytics [14:55:28] will add your accounts in a little [15:00:16] New patchset: Dzahn; "adding dsc,diederik and otto accounts to analytics role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7959 [15:00:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7959 [15:01:17] New review: Dzahn; "dsc = dschoon" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7959 [15:01:20] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7959 [15:01:46] cooooool [15:02:57] ottomata: wanna try logging in? [15:03:55] analytics1001 [15:10:14] .wikimedia.org? [15:10:46] .eqiad.wmnet from fenari [15:10:56] or just drop the suffix [15:14:55] its asking me for pw [15:15:12] this is my key [15:15:13] http://ottomata.org/stuff/ssh_public_key.txt [15:15:23] oh but from fenari [15:15:25] um [15:15:49] i see /home/otto and your key [15:16:06] yeah i can log into fenari [15:16:14] but not from fenari -> analytics1001 [15:16:22] cause my private key is not on fenari [15:16:24] hmm, need forwarding.. [15:16:24] one sec [15:16:31] ottomata: login to fenari again, adding -A to ssh command [15:16:38] to forward the agent [15:16:54] ahhh i have the proxy setup for eqiad in my ssh/config [15:16:56] one sec [15:18:06] i am in [15:18:18] yeehaw [15:18:18] i am in [15:18:30] :) [15:18:34] :D [15:18:46] you got the software RAID 1 [15:18:51] ottomata: how do you message me? 
[15:18:52] on the first 2 disks, as requested [15:19:01] write [15:19:02] echo "hi" | wall [15:19:07] or that [15:19:11] cool, perfect [15:19:41] very exciting news [15:20:29] ottomata: i got a first draft of gerrit statistics script [15:20:35] i'll share it with yo [15:20:36] oh! [15:20:36] haha [15:20:37] ok [15:20:39] you using ssh? [15:20:42] yes [15:20:45] aye cool [15:20:48] yeah that's what I did [15:20:49] ok cool [15:20:50] i worked on it yesterday [15:20:51] be back on in a bit [15:20:53] nice [15:20:55] and i know what rob wants [15:20:57] if you want to add puppet stuff, pull and look in role/analytics.pp [15:21:11] awesommmmme [15:21:12] thanks mutante [15:21:20] mutante, one more thing :D [15:21:21] when you get a sec [15:21:23] could you review this? [15:21:23] https://gerrit.wikimedia.org/r/#/c/7869/ [15:21:26] nagios stuff [15:21:31] OH [15:21:32] oops [15:21:36] not that one [15:21:36] i think i did:) [15:21:42] this one [15:21:42] https://gerrit.wikimedia.org/r/#/c/7896/ [15:21:53] oh, dzahn did [15:22:00] ok cool [15:22:00] thats me [15:22:03] that's you! [15:22:04] oh! [15:22:05] thank you [15:22:09] ok cool [15:22:17] i think you dont want to echo in the loop [15:22:42] the echo is piping to grep to check for the existince of the string [15:22:44] it isn't outputting [15:22:46] grep -q [15:22:52] oh.ok [15:22:58] if the log file is in the list of slow log files, it changes the time string [15:23:03] kind of a hacky way to do it [15:23:10] but probably simpler than bash array exists [15:23:40] i'll comment in review and fix whitespace and typo [15:23:59] thanks [15:25:13] New patchset: Ottomata; "files/nagios/check_udp2log_log_age - adding slow_log_files list." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7896 [15:25:33] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7896 [15:26:07] just 2 tiny ones left:) [15:26:16] New review: Ottomata; "(no comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7896 [15:26:41] ACK! [15:27:13] New patchset: Ottomata; "files/nagios/check_udp2log_log_age - adding slow_log_files list." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7896 [15:27:17] agh, i actually like the block to be tabbed all the way out [15:27:31] the blank line is technically part of the block, why not tab it too? [15:27:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7896 [15:27:34] but meh [15:27:34] anyway [15:27:35] patched [15:28:14] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7896 [15:28:16] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7896 [15:28:26] woo, thank you [15:28:37] be back in a bit [15:30:03] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 182 seconds [15:31:06] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 194 seconds [15:42:48] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds [15:43:06] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds [15:47:45] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/saudi-telecom.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [16:04:06] New patchset: Ottomata; "Adding orange-niger and saudi-telecom to list of slow logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7965 [16:04:25] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7965 [16:05:27] mutante, if you are still around, would love your blessing on that too [16:05:36] since getting that deployed a few more slow logs started generating notices [16:05:38] which is good! [16:05:58] it means the faster ones are working [16:06:02] but we don't need notices so fast for these ones [16:26:46] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7965 [16:26:48] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7965 [16:27:11] ottomata: there you go. out now for today [16:29:34] thanks for your help [16:29:36] much appreciated [16:45:26] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active [17:09:09] news from this morning: 90% of all 404s hitting swift are actually for images that don't exist rather than thumbnails that need to be generated [17:09:27] Awesome [17:10:22] most commonly when someone replaces a space with a hyphen. [17:10:44] i.e. http://upload.wikimedia.org/wikipedia/commons/thumb/7/72/IPhone_Internals.jpg/400px-IPhone_Internals.jpg is the correct URL but someone's asking for https://upload.wikimedia.org/wikipedia/commons/thumb/7/72/IPhone-Internals.jpg/220px-IPhone-Internals.jpg [17:11:14] or incorrectly URLencodes something (like turning , into %252C instead of %2C) [17:12:37] huh [17:13:10] wonder how the underscores are converted into hyphens [17:14:18] AaronSchulz: did you look at that graph I sent you? [17:14:37] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&m=swift+object+count&h=Swift+pmtpa+prod&c=Swift+pmtpa for anyone else following along. [17:14:45] it shows the results of our deploy yesterday. [17:15:33] yeah [17:16:38] AaronSchulz: what will happen if an image scaler is asked to create a thumbnail that already exists? [17:16:48] will it regenerate it or just return the existing copy? 
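Editorial sketch for the double-encoding remark at 17:11:14 (a comma arriving as %252C instead of %2C): one rough way to spot such requests and undo a single level of encoding. The function names and the example path are made up for illustration and nothing here comes from the swift or upload configuration. It is also only a heuristic, since a file name that really contains the text "%2C" would legitimately be encoded as %252C.

```shell
#!/bin/sh
# Sketch only: detect and undo ONE level of double percent-encoding.
# Function names and the example path are hypothetical.

# Exit 0 if the input contains "%25" followed by two hex digits,
# i.e. an escape such as %2C that was percent-encoded a second time.
is_double_encoded() {
    printf '%s\n' "$1" | grep -Eq '%25[0-9A-Fa-f]{2}'
}

# Rewrite every %25XX back to %XX (one level only).
undo_double_encoding() {
    printf '%s\n' "$1" | sed 's/%25\([0-9A-Fa-f][0-9A-Fa-f]\)/%\1/g'
}

# Made-up thumbnail path with a double-encoded comma.
url='/wikipedia/commons/thumb/a/ab/Foo%252C_bar.jpg/220px-Foo%252C_bar.jpg'
if is_double_encoded "$url"; then
    undo_double_encoding "$url"
fi
```

The hyphen-for-underscore case mentioned at 17:10:22 cannot be repaired this way, because a hyphen is also a valid character in a file name.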
[17:17:48] it should return it [17:30:08] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [17:32:23] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [17:37:11] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [17:42:53] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:56:32] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [17:57:44] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 185 seconds [17:58:38] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 192 seconds [18:01:29] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds [18:01:56] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds [18:04:21] Reedy: you about? [18:04:27] Ya [18:04:33] I haz a question [18:04:44] I was going to move lucene-search-2 into git [18:04:49] uh... how do I do this? [18:05:06] 1. make a project operations/debs/lucene-search-2 [18:05:09] 2. ??? [18:05:12] 3. profit! [18:05:15] something like that? [18:05:36] (I ask you because I beleive you have done such things. recently even!) [18:05:48] I haven't actually done it. Chad has done 99% of that stuff [18:05:56] Also depends if you want to import history... [18:06:10] but when I hit ^d-tab, nothing happens ;) [18:06:21] hhhmmm, I mean, I odn't care about history [18:06:27] but... I feel like someone would? [18:06:36] I'd suspect so [18:06:41] damn. [18:06:48] ok, I guess I shall just shoot him an email [18:10:38] I would've expected him to be online... :/ [18:26:27] New patchset: Asher; "allowing bast/mon hosts to pull gmetad xml" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7977 [18:26:47] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7977 [18:28:21] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7977 [18:28:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7977 [18:38:32] tfinc: Any idea if we get any more debugging information from collection anywhere? [18:39:42] Reedy: no clue. PediaPress woudl know more. I just mailed them but with it beeing friday pm i'm not expecting an immediate response [18:39:55] Yeaah... [18:40:00] Seems it's not collection itself at fault [18:40:21] head collection on 1.20wmf2 works fine, head or the version 1.20wmf2 was using doesn't work [18:40:23] * AaronSchulz is addicted to yogurt raisins [18:42:25] So it's their download link via Special:Book [18:44:59] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/7823 [19:11:33] hi guys [19:11:45] anyone here know about the GeoIPcity.dat setup? [19:12:25] LeslieCarr maybe? 
[19:12:27] oh she's not around [19:12:31] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:13:00] She's on european timez [19:13:22] ottomata: drdee2 has written geoip-related code before [19:14:17] aye, but i'm referring to puppetmaster.pp [19:14:43] we are downloading the free lite versions of the GeoIP .dat files [19:14:51] and renaming them to the full .dat names [19:15:01] we should ahve a license for the full .dats [19:15:04] so dunno why we are doing that [19:15:23] hah [19:16:09] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/puppetmaster.pp;h=9e8ed7060c33036a5606a72b52f2da774854c96b;hb=refs/heads/production#l253 [19:29:28] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:44:05] Ryan_Lane: any idea why gerrit-wm was silent about 7917? (no email notif either AFAICT) [19:44:28] no clue [19:46:09] have you tried drafts yet? Thehelpfulone said he couldn't view that change many hrs ago. (before patchset 3. so when it was just drafts.) is that how drafts are supposed to work? [19:51:45] drafts are secret changes [19:52:03] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 195 seconds [19:52:12] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 198 seconds [19:52:43] ;( [19:53:33] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [19:53:33] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [19:53:34] Ryan_Lane: do the drafts later become public? can anyone else see patchset 1/2 ? [19:53:51] drafts become public when the submitter publishes it [19:54:40] k [20:03:41] hi guys, i'm having a really weird mysql startup problem [20:03:52] anyone avail to listen to my story and help me sleuth for a minute? 
[20:04:15] ottomata: sure. this is labs?
[20:04:28] no, my local vm, but it is happening on stat1 too
[20:04:31] shoudl be very simple
[20:04:37] i just want to change the data dir
[20:04:38] so
[20:04:40] service mysql stop
[20:04:47] cp -a /var/lib/mysql /a/
[20:05:01] vim /etc/mysql/my.cnf # change datadir = /a/mysql
[20:05:05] service mysql start
[20:05:28] datadir = /a i think?
[20:05:37] it was /var/lib/mysql before
[20:05:40] let me test something
[20:05:42] k
[20:05:50] yeah, service mysql start just hangs after that
[20:05:56] nothing in error logs either
[20:05:57] but there does exist a /var/lib/mysql/mysql
[20:07:21] yeah
[20:07:27] but that is the 'mysql; database
[20:07:32] yeah, ok /a/mysql
[20:07:33] 'mysql'
[20:07:42] but it just hangs, grrr
[20:07:43] and
[20:07:49] here's another hint
[20:07:50] i trie dthis on my local
[20:07:54] service mysql stop
[20:07:57] edit my.cnf
[20:07:58] then
[20:08:00] mysql_install_db
[20:08:04] so, there's an /a/mysql/mysql ?
[20:08:05] to get a brand new db
[20:08:09] yes
[20:08:16] did you enable log_error ?
[20:08:19] yes
[20:08:43] oh, not on stat1
[20:08:48] but on my local (same prob)
[20:08:48] i get this
[20:09:07] https://gist.github.com/2727357
[20:10:10] oh, it's probably just permissions
[20:10:22] yeah, cept everything looks ok
[20:11:13] pastebin: stat /a{,/*} /var/lib/mysql{,/*}
[20:12:03] https://gist.github.com/2727369
[20:12:40] /a/mysql Access: (0700/drwx------) Uid: ( 109/ mysql) Gid: ( 113/ mysql)
[20:12:48] /var/lib/mysql Access: (0700/drwx------) Uid: ( 109/ mysql) Gid: ( 113/ mysql)
[20:14:35] ibdata1 is same perms too
[20:15:09] and the my.cnf?
[20:15:53] -rw-r--r-- 1 root root [20:16:11] if you have access to stat1.wikimedia.org [20:16:14] you can log in there and look too [20:16:19] no, i mean the content ;) [20:16:22] oh [20:16:23] ha [20:16:26] i don't have any access there [20:16:51] https://gist.github.com/2727391 [20:16:56] this is the stock my.cnf [20:17:03] i'm using a different one on my local vm [20:17:07] but same problem [20:17:13] only change i've made is to change datadir [20:17:49] hrmmmmm [20:17:56] i know! [20:17:58] hrrrmrm indeed [20:20:01] you could try this: apt-get --reinstall install $(dpkg -l mysql-server-5.\* | awk '/^ii /{print $2}') [20:20:51] hm, ok trying on my local vm [20:20:53] but i just installed this today [20:20:59] i don't even need that data dir [20:21:04] i'm happy to mysql_install_db a new one [20:21:29] i tried it on a populated db a couple days ago and it fixed stuff for me. but that was squeeze not precise/lucid [20:21:35] hmk [20:21:43] (and didn't lose data) [20:22:39] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [20:22:53] same problem :( [20:24:02] i just 777-ed the entire dir [20:24:06] and still [20:24:07] heh [20:24:11] InnoDB: The error means mysqld does not have the access rights to [20:24:11] InnoDB: the directory. [20:24:11] InnoDB: File name ./ibdata1 [20:24:11] InnoDB: File operation call: 'open'. [20:24:11] InnoDB: Cannot continue operation. [20:24:20] strace? ;) [20:24:33] not on mysqld_safe though. on mysqld itself [20:24:44] hard to catch it [20:24:55] hmm [20:24:56] erm? [20:25:31] ? [20:25:31] root@wmvm:/a/mysql# mysqld --defaults-file=/etc/mysql/my.cnf [20:25:32] 120518 20:25:24 [Warning] Can't create test file /a/mysql/wmvm.lower-test [20:25:50] root@wmvm:/a/mysql# ls -lad /a/mysql [20:25:50] drwxrwxrwx 3 mysql mysq [20:26:50] hmmmmmmm [20:26:50] datadir . 
[20:28:14] root@wmvm:~# mysqld --help --verbose --defaults-file=/etc/mysql/my.cnf | egrep '^datadir' [20:28:14] 120518 20:27:59 [Warning] Can't create test file /a/mysql/wmvm.lower-test [20:28:14] 120518 20:27:59 [Warning] Can't create test file /a/mysql/wmvm.lower-test [20:28:14] mysqld: Can't find file: './mysql/plugin.frm' (errno: 13) [20:28:14] 120518 20:27:59 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. [20:28:14] datadir [20:28:19] what about the warning? https://gist.github.com/2727391#L37 [20:29:01] OHhhhh [20:29:02] hmm [20:29:13] do we use apparmor? [20:29:14] i guess so huh [20:29:26] idk... does your VM? [20:29:26] file exists [20:29:31] it does on stat1 too [20:31:17] started after turning apparmor off :) [20:31:28] interesting, hmmmm [20:31:33] maplebed / Ryan_Lane [20:31:42] know anything about apparmor + mysql in our clusters? [20:31:44] we use apparmor [20:31:45] do we use it? [20:31:45] yes [20:31:47] do we want to use it? [20:31:47] ok [20:32:01] if you use a directory other than /var/lib/mysql you need to change apparmor [20:32:25] /etc/apparmor.d/usr.sbin.mysqld [20:32:44] /var/lib/mysql/ r, [20:32:44] /var/lib/mysql/** rwk, [20:32:53] add in your directory, looking like those [20:33:11] reload the apparmor profiles [20:33:13] aye ok, i will puppetize the apparmor file too [20:33:24] I thought it was for /a [20:33:27] oh, can I notify Service[apparmor] [20:33:30] probably isn't [20:33:33] ? [20:33:51] our mysql puppetization is kind of rough [20:34:09] so ignore everything i say about that :) [20:34:09] yeah i'm trying to improve it [20:34:11] not for prod stuff [20:34:16] but for one off more generic stuff [20:34:25] looks like apparmor is not puppetized at all, eh? [20:34:32] growl [20:35:07] crapppy crackers, ungh [20:35:19] ok, i'm going to create a generic::apparmor::service class that just defines the service [20:35:21] so I can notify it [20:35:38] heh [20:36:15] puppetise all the software! 
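Note on the apparmor fix discussed above: the advice was to add rules for the new data directory to /etc/apparmor.d/usr.sbin.mysqld, "looking like" the existing /var/lib/mysql ones, and then reload the profiles. A minimal sketch of that edit follows, assuming the profile contains the two /var/lib/mysql lines quoted in the channel. The function name add_datadir_rules and the sample profile text are hypothetical, made up for illustration; this is not the puppetized change from the log.

```python
def add_datadir_rules(profile_text, new_datadir, old_datadir="/var/lib/mysql"):
    """Return profile_text with a copy of every old_datadir rule added for new_datadir.

    Hypothetical helper: it only mirrors lines that start with the old data
    directory path, and it skips rules that are already present.
    """
    old_prefix = old_datadir.rstrip("/") + "/"
    new_prefix = new_datadir.rstrip("/") + "/"
    lines = profile_text.splitlines()
    existing = {line.strip() for line in lines}

    out = []
    for line in lines:
        out.append(line)
        stripped = line.strip()
        if stripped.startswith(old_prefix):
            # Keep the indentation and the permissions, swap only the path.
            indent = line[: len(line) - len(line.lstrip())]
            new_rule = new_prefix + stripped[len(old_prefix):]
            if new_rule not in existing:
                out.append(indent + new_rule)
    return "\n".join(out) + "\n"


# Assumed sample profile, reduced to the two rules quoted in the channel.
sample_profile = (
    "/usr/sbin/mysqld {\n"
    "  /var/lib/mysql/ r,\n"
    "  /var/lib/mysql/** rwk,\n"
    "}\n"
)

if __name__ == "__main__":
    print(add_datadir_rules(sample_profile, "/a/mysql"), end="")
```

The edited text would still have to be written back to the profile file and the profile reloaded before mysqld is started, for example with apparmor_parser -r on that file; the exact reload step used on these hosts is not stated in the log.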
[20:36:44] yes! that is the right attitude! [20:38:23] http://cdn.memegenerator.net/instances/400x/20607920.jpg [20:40:09] hehe [21:09:49] AaronSchulz: would you mind looking at https://gerrit.wikimedia.org/r/#/c/7985/ for me? [21:18:34] New patchset: Ottomata; "Creating generic::mysql::server class that installs packages and sets up my.cnf and starts mysqld." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7988 [21:18:51] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7988 [21:18:59] boowhaaa [21:19:33] New patchset: Ottomata; "Creating generic::mysql::server class that installs packages and sets up my.cnf and starts mysqld." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7988 [21:19:51] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7988 [21:20:18] 3rd time lucky? [21:20:37] New patchset: Ottomata; "Creating generic::mysql::server class that installs packages and sets up my.cnf and starts mysqld." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7988 [21:20:56] my lint checker was ok with it the 2nd time [21:20:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7988 [21:20:59] yay! [21:21:19] Let's see [21:21:33] binasher and/or Ryan_Lane: would love a close review of that one [21:21:36] if you got a sec [21:21:40] i can walk you through it if you like [21:23:03] New patchset: Ottomata; "Creating generic::mysql::server class that installs packages and sets up my.cnf and starts mysqld." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7988 [21:23:24] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7988 [21:49:21] New review: Aaron Schulz; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/5778 [22:49:29] New review: Siebrand; "I've sent a mail to CT asking for help pulling this forward." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5783 [23:07:53] New patchset: Asher; "migrating udpprofile from subversion from: svn.wikimedia.org/svnroot/mediawiki/trunk/udpprofile Last Changed Author: asher Last Changed Rev: 113114 Last Changed Date: 2012-03-05 16:47:05 -0800 (Mon, 05 Mar 2012)" [operations/software] (master) - https://gerrit.wikimedia.org/r/7996 [23:10:33] damnit, I seem to have broken ganglia. [23:11:42] New review: Aaron Schulz; "(no comment)" [operations/software] (master) C: 1; - https://gerrit.wikimedia.org/r/7996 [23:16:27] New review: Asher; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7996 [23:16:29] Change merged: Asher; [operations/software] (master) - https://gerrit.wikimedia.org/r/7996 [23:19:21] New patchset: Aaron Schulz; "Set $wgSiteStatsAsyncFactor=1 on commonswiki." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7997 [23:19:27] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/7997 [23:24:22] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7997 [23:24:24] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7997 [23:28:24] New patchset: Asher; "fix sort of floating point time samples for correct percentile calcs" [operations/software] (master) - https://gerrit.wikimedia.org/r/7998 [23:30:48] New review: Asher; "this was previously applied to prod as a livehack" [operations/software] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/7998 [23:32:21] New review: Aaron Schulz; "(no comment)" [operations/software] (master) C: 1; - https://gerrit.wikimedia.org/r/7998 [23:36:05] New review: Asher; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7998 [23:36:07] Change merged: Asher; [operations/software] (master) - https://gerrit.wikimedia.org/r/7998 [23:44:59] maplebed: this was my gmetad change - https://gerrit.wikimedia.org/r/#/c/7977/1/files/ganglia/gmetad.conf