[00:12:31] !log updated Parsoid to 45944a0 [00:12:39] Logged the message, Master [00:19:08] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [00:23:08] PROBLEM - Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours [00:31:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [00:46:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:47:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [00:58:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [01:01:16] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.0009568929672 secs [01:03:05] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.002873778343 secs [01:08:58] PROBLEM - Host mediawiki-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [01:09:18] RECOVERY - Host mediawiki-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 90.61 ms [01:20:59] AaronSchulz: indeed [01:21:25] AaronSchulz: I checked the (incomplete) list you gave me the other time and it was all deleted files [02:07:13] !log LocalisationUpdate completed (1.22wmf7) at Thu Jun 20 02:07:13 UTC 2013 [02:07:23] Logged the message, Master [02:12:59] !log LocalisationUpdate completed (1.22wmf6) at Thu Jun 20 02:12:59 UTC 2013 [02:13:07] Logged the message, Master [02:22:16] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 20 02:22:16 UTC 2013 [02:22:25] Logged the message, Master [02:51:58] New patchset: 
Ori.livneh; "Allow vanadium to log via logmsgbot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69616 [02:52:58] PROBLEM - DPKG on mc15 is CRITICAL: Timeout while attempting connection [02:53:58] RECOVERY - DPKG on mc15 is OK: All packages OK [03:04:43] New patchset: Ori.livneh; "Set common rsync and dsh parameters in mw-deployment-vars" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890 [04:30:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:31:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.140 second response time [04:49:00] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [04:58:50] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [04:59:30] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [05:21:35] apergos: morning [05:33:18] morning [05:44:05] there's a swift USN but it doesn't affect us [05:45:28] glad to hear it [05:45:42] did you do another all nighter? [05:47:16] uhm [05:47:17] kind of :) [05:47:23] not really, I woke up at 4am [05:47:46] ouch! [05:48:03] nah it's fine [05:48:22] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69616 [05:58:06] paravoid: thanks [05:58:14] hahaha [05:58:22] :) [05:58:37] it's not like I know exactly what that does [05:58:43] but it seemed harmless enough [06:00:26] i wrote tcpircbot to tim's spec. 
tin doesn't have a public interface like fenari did, so we had to re-do logmsgbot [06:00:46] it's a simple python script that reads from socket and writes to irc, with CIDR based filtering [06:03:29] you should translate mapped ipv4-mapped ipv6 to v4 though :) [06:08:38] paravoid: yeah, that part was just a bad design decision [06:08:42] but the impact is small [06:08:45] seems easy to fix [06:08:47] I'm on it [06:08:58] if only netaddr 0.7.7 that debian unstable has wasn't broken [06:09:01] I'd have a fix already [06:09:34] oh, wheezy too [06:09:35] nice [06:10:32] omg, 0.7.4 is also broken but in a different way [06:14:03] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [06:14:33] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [06:17:23] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [06:17:37] paravoid: what's broken? [06:19:01] sec [06:22:36] New patchset: Faidon; "tcpircbot: work with IPv6 & no ACL, clarify option" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69624 [06:22:36] New patchset: Faidon; "tcpircbot: IPv4 cidr instead of IPv4-mapped IPv6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69625 [06:22:41] ori-l: ^ [06:25:52] paravoid: nice change! testing [06:32:22] New review: Ori.livneh; "Nice changed; verified." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/69625 [06:37:40] *change [06:39:29] did you see both? [06:39:42] ori-l: they're two [06:39:54] I +1'd the other one, too, but gerrit-wm didn't announce it [06:39:59] perhaps because I didn't leave a comment, just the score [06:40:11] ah [06:40:43] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69624 [06:40:52] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69625 [06:41:12] better! [06:41:39] yes! 
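The filtering fixed in the two changes just merged can be sketched in a few lines of Python. This is an illustrative sketch, not tcpircbot's actual code (which at the time used the netaddr library); it uses the stdlib `ipaddress` module instead, and the function name and example inputs are hypothetical:

```python
import ipaddress

def peer_allowed(peer_ip, allowed_cidrs):
    """True if peer_ip falls inside any allowed CIDR.

    On a dual-stack listening socket an IPv4 client is reported as an
    IPv4-mapped IPv6 address such as ::ffff:10.0.0.5; unmap it back to
    plain IPv4 first so it can match an IPv4 CIDR in the ACL.
    """
    addr = ipaddress.ip_address(peer_ip)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped:
        addr = addr.ipv4_mapped
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

The unmapping step is the point paravoid raised above: without it, an IPv4 peer seen as `::ffff:a.b.c.d` never matches a plain IPv4 CIDR in the ACL.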
that was a bit ugly before, thanks for that [07:03:10] New patchset: Aklapper; "Bugzilla Weekly Report: Don't list random products but top 5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69629 [07:04:15] New review: Aklapper; "...to make this consistent with the rest of the existing queries, like the totally similar "Componen..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69629 [07:51:35] gooood morning [07:51:49] hey, hashar [07:52:39] howdy? [07:52:41] <-- sick [07:53:43] oh, sorry to hear that. I'm fine, a bit bored [07:55:12] i'm checking out https://github.com/mozilla/wiki-tests which seems quite nice [07:56:32] one of the developers was on #mediawiki earlier but i missed him [07:56:49] zeljkof is your guy when it comes to selenium tests :) [07:57:31] though we are using a ruby implementation to drive selenium [07:57:58] hashar, ori-l: I have talked with mozilla guys at selenium conference last week [07:58:44] they write tests in python, as far as I know [07:58:53] howdy all looking for the ruby tests that I saw at sel conf I'm from Mozilla [07:59:03] marktraceur pointed him to qa/browsertests [07:59:16] he was looking for test case ideas, I think [08:00:30] ori-l: I do not remember talking to stephend at seconf [08:00:38] ori-l: regarding your Ganglia graph of mw exceptions & fatal, I replied on wikitech-l . 
There is a nagios plugin to check a ganglia metric :) [08:00:44] qa/browsertests is the right place [08:01:12] ori-l: you could follow up with Leslie / Daniel [08:02:17] Yeah, I saw your reply, haven't had the chance to check out the plug-in yet [08:02:41] I did ask for someone from ops to pair with me on this and as you can imagine my inbox was flooded with replies [08:02:58] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.003545403481 secs [08:03:14] ori-l: you want Daniel :) [08:03:29] if you manage to setup a time with him ahead of time, I am sure I will be happy to help [08:03:54] yeah, i know :) i'm just being a little trollish. people are nice, just busy [08:04:21] do you think monitoring ganglia is the right approach? keep in mind that i wrote the python daemon that generates the ganglia stats, so i have direct high-level access to the underlying data [08:04:47] so that script (or a variant of it) could also emit alerts without having to rely on parsing ganglia [08:05:09] i was looking at various algorithms for anomaly detection but they're mostly too advanced for me [08:05:23] though etsy just open-sourced a library: https://github.com/etsy/skyline [08:05:31] is this for icinga checks? 
[08:05:37] yeah [08:05:51] I'd say go for the underlying data [08:05:59] sometimes ganglia has issues, why involve it [08:06:03] right [08:06:14] anomaly detection is probably too fancy, it's probably adequate to have a rule calibrated to an absolute threshold [08:06:25] for a first tak, absolutely [08:06:27] *take [08:07:02] i haven't written an icinga plugin before, i should take a look [08:07:19] i think i did once before and got a little lost [08:07:42] I've never worked with them either (a reason I didn't volunteer to your email ;-)) [08:10:14] ori-l: I saw skyline too [08:10:24] looks interesting [08:12:15] after reading about it and some other anomaly detection stuff i remember ryan e-mailed the list to say that canonical was interested in getting a dump of ganglia data, and i bet you they meant to use it as training data for a machine learning anomaly detection algorithm [08:13:24] what sort of anomalies? [08:14:26] Nemo_bis: the algorithms don't know/care about the meaning. any significant deviation from established patterns. in the context of failure analysis, you look for, say, spikes in CPU load [08:15:12] hm [08:15:32] so definitely not something like Mozilla's stats on error rates http://laxstrom.name/blag/2013/02/11/fosdem-talk-reflections-23-docs-code-and-community-health-stability/ [08:16:50] ori-l: I am not sure how the mwerrors are counted. Seems it is doing counter += 1 , so that is most probably saved as a counter and hence you should have a rate of the errors [08:17:16] ori-l: the nagios plugin could raise an error whenever the rate of errors is higher than something (like more than 5 errors per minutes) [08:17:26] Nemo_bis: not that exactly, but the overarching goal is the same [08:18:28] paravoid: if you google for 'anomaly detection ddos' you'll find a bunch of interesting papers [08:18:34] hashar: yeah, that's what i'm going to do, i think [08:18:35] hashar: shouldn't it divide by pages served? 
[08:19:06] Nemo_bis: I don't think that is relevant since we have a good steam of pages being served :D [08:19:11] right [08:19:20] regardless of the time of the day [08:24:59] yeah, when someone deploys a bug it's usually quite unambiguous [08:33:01] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.005766034126 secs [08:33:02] paravoid: hi! Got a few minutes? apergos and I have a puppet layout question for you :) [08:34:13] go ahead [08:34:15] I got a template in the applicationserver module which need a variable to be set differently based on the realm. So I have added a class parameter, then I had to update all the callers in the role class to pass the variable https://gerrit.wikimedia.org/r/#/c/68831/2/manifests/role/applicationserver.pp,unified [08:34:45] So we end up requiring to call 5 times: class { 'applicationserver::config::php': [08:34:45] fatal_log_file => $role::applicationserver::configuration::fatal_log_file[$::realm] [08:35:46] there's only the two values, they only vary by realm... where should we better put that variation to avoid 5 calls like that? [08:35:55] ahh I could call that directly into role::applicationserver::configuration [08:36:40] and convert the fatal_log_file hash into a ? $::realm { 'production' => foo, 'labs' => bar } , then call that application::config::php with the resulting value [08:37:41] does it make any sense ? :-] [08:38:31] bleh the whole role class could use some refactoring [08:42:52] I am rewriting my patch to call the parameterized class in the configuration role class [08:43:57] move ::php into ::common? 
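The selector approach hashar settles on can be sketched in Puppet roughly as below. This is an illustrative fragment, not necessarily the exact code merged in change 68831; the two udp:// destinations are the ones shown in the verification diff later in this log:

```puppet
class role::applicationserver::configuration {
    # Resolve the realm-dependent value once, here, instead of
    # passing a hash lookup at all five call sites.
    $fatal_log_file = $::realm ? {
        'production' => 'udp://10.64.0.21:8420',
        'labs'       => 'udp://10.4.0.58:8420',
    }

    class { 'applicationserver::config::php':
        fatal_log_file => $fatal_log_file,
    }
}
```

Since this role class is included everywhere the app servers are configured, the parameterized class is called in exactly one place and the five duplicated call sites go away.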
[08:44:03] it's not like we have appservers without php [08:45:44] well, you can also cheat and not do this in puppet at all, since it's done in CommonSettings.php [08:46:06] that is true [08:46:07] see the switch( $wmfRealm ) block that sets $wmfUdp2logDest to different values based on the realm [08:47:07] that is for the wmerrors PHP Extensions [08:47:19] I am not sure the wmerrors.log_file is set in CommonSettings.php [08:47:59] well, what if you set $fatal_log_file to 'udp://$wmfUdp2logDest/wmerrors' [08:48:48] which wmerrors are you referring to, btw? [08:51:05] the PHP Extension [08:51:12] that catch the fatals and send them to a log file [08:51:19] (or over udp) [08:54:11] New patchset: Hashar; "vary wmerrors.ini 'fatal_log_file' per realm" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [08:54:16] apergos: ^^^ :-) [08:56:01] trying on labs [08:58:03] btw, you guys saw closedmouth's report on #wikimedia-tech? i don't know how to diagnose that [08:58:28] New review: Hashar; "seems to work fine on integration-puppet.pmtpa.wmflabs labs instance :-)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [08:59:08] I see it now [09:00:37] S1 slaves are lagged out http://noc.wikimedia.org/dbtree/ [09:00:49] db1043 db1049 db1050 db1051 and db1052 [09:00:58] though db63 seems fine [09:02:23] yes just the eqiad slaves [09:04:04] some lag started around 8:49 UTC [09:04:18] seems to be resolving [09:04:36] hourly lag graph http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=mysql_slave_lag&mreg[]=%5Emysql_slave_lag%24&hreg[]=db1051&aggregate=1&hl=db1051.eqiad.wmnet%7CMySQL+eqiad [09:05:37] yeah there's only one lagged now [09:06:35] I was looking at the processlist but things seem to be moving through on master [09:12:01] apergos: so the patch is a bit nicer now https://gerrit.wikimedia.org/r/#/c/68831/3/manifests/role/applicationserver.pp,unified [09:12:18] the call to the parameterized class is now in the 
role::applicationserver::configuration [09:12:24] Wikimedia Platform operations, serious stuff | Log: http://bit.ly/wikisal | Channel logs: http://ur1.ca/edq22 | MediaWiki error counts: https://tinyurl.com/n3twd8k | on RT duty: RobH [09:12:27] which is loaded from everywhere [09:12:39] yes, I have been looking at it [09:12:45] this definitely is nicer [09:14:05] \O/ [09:22:57] New patchset: Hashar; "vary wmerrors.ini 'fatal_log_file' per realm" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [09:23:43] New review: Hashar; "Prefixed the configuration class with 'php':" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [09:28:33] New patchset: Hashar; "vary wmerrors.ini 'fatal_log_file' per realm" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [09:28:33] New patchset: Hashar; "PHP fatal destination is now a class parameter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68830 [09:28:45] New review: Hashar; "rebased" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [09:28:53] apergos: they are good to go :-) [09:29:02] once merged in puppet, I can try them out on the beta apaches [09:29:07] okay [09:29:07] sec [09:29:12] then merge in sock puppet and try out in prod :-] [09:30:07] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68830 [09:30:57] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68831 [09:31:04] ok they're both in [09:31:13] trying out on labs [09:31:16] err on beta [09:31:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:32:11] -wmerrors.log_file=udp://10.64.0.21:8420 [09:32:11] +wmerrors.log_file=udp://10.4.0.58:8420 [09:32:12] :-] [09:32:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [09:32:27] now I have no idea 
how to generate a fatal [09:33:13] uh oh [09:40:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:41:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [09:43:04] arghg [09:43:12] unrelated screaming [09:51:14] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [09:51:14] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:14] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:14] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:14] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:15] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:15] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours [09:51:16] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:16] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [09:51:17] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [09:56:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:57:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.157 second response time [10:00:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:03:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [10:19:21] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run 
in the last 10 hours [10:23:21] PROBLEM - Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours [10:26:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:27:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [10:34:00] poor stafford [11:02:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:03:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [11:11:09] New review: Nikerabbit; "Did you forgot to sync this?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68947 [11:35:07] mark: https://gerrit.wikimedia.org/r/#/q/project:operations/debs/ircd-ratbox+owner:%22AzaToth+%253Cazatoth%2540gmail.com%253E%22,n,z [11:35:20] mark: ryan wanted you to perhaps look into it [11:45:47] snack time [11:59:17] New patchset: Nikerabbit; "ULS deployment phase 3" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69643 [12:01:59] New review: Nikerabbit; "Planned for 2013-06-25" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69643 [12:12:00] is there anybody here who can increase the max_user_connections to the replicas for a tool labs project? [12:22:39] JohannesK_WMDE: #wikimedia-labs :-D [12:22:48] JohannesK_WMDE: and/or fill in a bug :-] [12:23:55] hashar: folks in #wikimedia-labs directed me here [12:24:54] JohannesK_WMDE: so a bug will do it :-] [12:25:05] apparently asher can do it but he's not here yet, so i wanted to know if anybody else can do it. we need it urgently. [12:25:08] east coast staff will connect soon [12:25:55] follow up on -labs [12:54:59] PROBLEM - swift-object-server on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:54:59] PROBLEM - swift-object-replicator on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:54:59] PROBLEM - Disk space on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:54:59] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:18] PROBLEM - swift-account-reaper on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:18] PROBLEM - RAID on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:28] PROBLEM - swift-account-server on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:28] PROBLEM - swift-account-replicator on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:28] PROBLEM - swift-object-updater on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:38] PROBLEM - swift-container-replicator on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:48] PROBLEM - swift-container-server on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:48] PROBLEM - swift-container-updater on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:49] PROBLEM - swift-account-auditor on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:58] PROBLEM - swift-object-auditor on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:55:58] PROBLEM - DPKG on ms-be2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:27:37] !log dns update [13:27:46] Logged the message, Master [13:53:10] New patchset: coren; "Tool Labs: Bump max_user_connections to 512" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69648 [13:58:06] New review: Demon; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/69648 [13:59:08] New patchset: coren; "Tool Labs: Bump max_user_connections to 512" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69648 [14:01:16] New patchset: coren; "Tool Labs: Bump max_user_connections to 512" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69648 [14:08:51] mark, are you available to stand by and test after I merge https://gerrit.wikimedia.org/r/#/c/68584/? [14:16:57] !log jenkins updating all mediawiki extensions unit testing jobs ( mwext-.*-testextensions-master' [14:17:06] Logged the message, Master [14:39:10] New patchset: Cmjohnson; "adding cp1056-cp1070 dhcpd/fixing spaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69663 [14:41:10] New patchset: Cmjohnson; "adding cp1056-cp1070 dhcpd/fixing spaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69663 [14:41:54] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69663 [14:59:10] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [15:10:36] PROBLEM - SSH on ms-be2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:24:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [15:26:06] New review: Andrew Bogott; "My previous comment is incorrect; in labs we need to replace ldap::client::wmf-test-cluster with lda..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/69337 [15:39:49] hashar, faidon, any idea what the story is with puppet class nfs::server? It doesn't seem to be used anywhere. [15:52:04] New patchset: Andrew Bogott; "Moved nfs manifest into a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [15:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:52:50] New patchset: Andrew Bogott; "Moved nfs manifest into a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [15:53:11] New review: Andrew Bogott; "Work in Progress -- do not merge" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [15:54:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [15:54:31] !log updated Parsoid to bf8d3df [15:54:41] Logged the message, Master [15:57:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:58:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [16:08:20] !log nuked neon puppet.log b/c /var/log was 99% full [16:08:29] Logged the message, Master [16:18:21] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [16:20:26] apergos: did you see about ms-be2? [16:33:28] New review: Andrew Bogott; "This is now tested and ready for merge, pending explanation of the weird nfs::server class in the ol..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [16:37:52] Hey Reedy, have you cut wmf8 yet? I want to make sure a last centralauth change goes in... [16:37:59] Yeah [16:38:38] 50 minutes ago apparently [16:38:38] https://git.wikimedia.org/log/mediawiki%2Fcore.git/refs%2Fheads%2Fwmf%2F1.22wmf8 [16:42:24] New review: Faidon; "See inline. 
You're using tabs & spaces inconsistently but in any case, the agreement is to use 4-spa..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/69682 [16:45:45] paravoid, oops, I forgot to type 'git review' after fixing all the whitespace [16:45:50] thanks for reading, new patch coming soon [16:47:11] paravoid, you think I should just excise the monitoring stuff? That's all cut-n-paste, not sure what it's about. [16:47:52] not the monitoring [16:48:03] just move the subclass's content outside [16:48:13] I didn't see the monitoring class being referenced anywhere else [16:48:18] (unless I was wrong) [16:51:12] the class is defined and then immediately included. So that means it does something, doesn't it? [16:51:29] Um… unlike the 'backup' class whcih is not included. Hm... [16:52:00] yes, that's my point [16:52:18] instead of class monitoring { foo } include monitoring [16:52:20] just do foo [16:52:35] it's a single definition inside, it's not like the class serves as a grouping [16:52:36] oh, I see what you're saying, ok. [16:53:00] but, the 'backup' class is just dead code, isn't it? Or am I misunderstanding how that works? [16:53:37] !log reedy synchronized php-1.22wmf8/ 'initial sync' [16:53:45] Logged the message, Master [16:53:57] no idea [16:55:44] !log reedy synchronized docroot and w [16:55:53] Logged the message, Master [17:00:54] Dang. Reedy, let me know when your wmf8 deploy is done, and I'll push the latest centralauth. Sorry about that. [17:04:49] New review: Andrew Bogott; "retabbed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [17:04:51] New patchset: Andrew Bogott; "Moved nfs manifest into a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [17:08:01] New patchset: Andrew Bogott; "Moved nfs manifest into a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [17:08:59] grrr [17:09:09] !log reedy Started syncing Wikimedia installation... 
: rebuild localisation cache and testwiki to 1.22wmf8 [17:09:17] Logged the message, Master [17:21:01] apergos: Could you kill php-1.22wmf2 from snapshot3 please? [17:21:44] Reedy, Campaigns is a new extension (already deployed in 1.22wmf6 and 7) but it wasn't in make-wmf-branch/default.conf. https://gerrit.wikimedia.org/r/#/c/69691/ adds it. Sorry I missed it [17:22:08] Feel free to make a patchset just adding it to wmf/1.22wmf8 and I'll make sure it's synced out [17:33:36] !log reedy Finished syncing Wikimedia installation... : rebuild localisation cache and testwiki to 1.22wmf8 [17:33:44] Logged the message, Master [17:37:47] !log reedy synchronized php-1.22wmf8/extensions/ 'Sync Campaigns and CentralAuth' [17:37:55] Logged the message, Master [17:38:16] New patchset: Reedy; "testwiki to 1.22wmf8" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69696 [17:38:26] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69696 [17:39:18] New patchset: Reedy; "(bug 49358) Remove MoodBar from it.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68352 [17:39:39] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68352 [17:40:00] !log LocalisationUpdate completed (1.22wmf7) at Thu Jun 20 17:39:59 UTC 2013 [17:40:07] Logged the message, Master [17:40:47] !log LocalisationUpdate completed (1.22wmf6) at Thu Jun 20 17:40:46 UTC 2013 [17:40:49] New patchset: Reedy; "(bug 49575) Set up $wgImportSources for vec.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68655 [17:40:55] Logged the message, Master [17:41:10] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68655 [17:41:35] New patchset: Reedy; "(bug 49612) Localise $wgSitename for fr.wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68858 [17:41:57] Change merged: 
jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68858 [17:42:14] New patchset: Reedy; "(bug 49335) Modify wgNamespacesToBeSearchedDefault for ukwikinews" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69160 [17:42:32] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69160 [17:48:21] New patchset: Reedy; "Remove narayam and webfonts from extension-list" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69699 [17:49:34] New patchset: Reedy; "Remove narayam and webfonts from extension-list" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69699 [17:49:42] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69699 [17:50:39] New patchset: Reedy; "Bug 48354: exclude MediaWiki: namespace" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68183 [17:51:00] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68183 [17:51:29] Change abandoned: Reedy; "I created a dupe of this and already merged" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68648 [17:51:51] !log LocalisationUpdate completed (1.22wmf8) at Thu Jun 20 17:51:50 UTC 2013 [17:52:03] Logged the message, Master [17:52:14] New patchset: Reedy; "(bug 46244) Enable wmgUseVectorFooterCleanup on ilowiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68298 [17:52:33] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68298 [17:54:09] !log reedy synchronized wmf-config/ [17:54:17] Logged the message, Master [17:57:02] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 20 17:57:01 UTC 2013 [17:57:09] Logged the message, Master [17:57:54] Ryan_Lane: had time to test? 
[17:57:59] 19:47 < average> does the Depends: field accept any form of the "OR" operator ? [17:58:02] 19:47 < average> for example, there are multiple packages providing JDK (the Java JDK) [17:58:05] 19:47 < average> and I want to do stuff like [17:58:07] 19:47 < algernon> yes. "|" [17:58:10] 19:47 < average> Depends: sun-java6 OR sun-java7 OR gcj OR openjdk [17:58:12] 19:47 < algernon> look at any of the java packages for an example :) [17:58:15] 19:48 < average> algernon: could you please point me to an example ? [17:58:18] 19:49 < wRAR> average: you should read the policy [17:58:21] 19:49 < wRAR> 7.1 Syntax of relationship fields in this case [17:58:23] 19:50 < algernon> average: ant is one such example, but see the policy as wRAR mentioned [17:58:24] average_drifter: stop schpamming [17:58:26] 19:50 < babilen> average: You really shouldn't depend on sun-* java *at all* but on default-jdk (that is openjdk) [17:58:29] 19:51 < babilen> In particular not Sun's/Oracle's JDK6 as it hasn't been maintained in quite a while and is a security nightmare. [17:58:35] paravoid: what is your oppinion on the above ? [17:58:38] quoted from #debian-mentors on irc.debian.org [17:58:42] AzaToth: should have used a pastie or gist, sorry [18:00:05] average_drifter: you should never depend on sun(oracle) java [18:00:41] AzaToth: ok, would you agree that whenever JDK is a dependency, I should use default-jdk to provide it ? 
[18:00:59] average_drifter: difficult to answer [18:01:19] for buck I had to depend on openjdk-7-jdk [18:01:32] as default-jdk points to 6 [18:01:35] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki, test2wiki and testwikidatawiki to 1.22wmf8 [18:01:45] Logged the message, Master [18:02:01] average_drifter: but normally, default-jdk is the correct one [18:02:42] New patchset: Reedy; "test2wiki, testwikidatawiki and mediawikiwiki to 1.22wmf8" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69701 [18:03:14] AzaToth: average_drifter is working on dclass that needs to run in the hadoop cluster, the hadoop cluster is still using sun java6 so whatever jdk is chosen it needs to be compatible with sun java 6 [18:03:40] drdee: I would assume it's forward compatible [18:03:46] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: rest of wikipedias to 1.22wmf7 [18:03:49] i.e. it can be run on jdk7 [18:03:54] Logged the message, Master [18:04:19] well if you compile it with jdk7 then i don't think it will run with sun java6 [18:04:29] drdee: is the hadoop cluster using oracle java6 or openjdk 6? [18:05:45] drdee: afaik you are able to specify target version [18:05:48] oracle java6 [18:05:52] !log updated Parsoid to b206b54 [18:05:59] Logged the message, Master [18:06:28] drdee: ain't that against policy?
[18:06:36] AzaToth: not just yet [18:06:37] oracle java isn't opensource [18:06:59] right please don't beat a dead horse [18:07:00] Ryan_Lane: okidoki [18:07:09] that discussion has been had multiple times [18:07:20] drdee: sorry, didn't get the memo [18:07:21] we will migrate as soon as hadoop is openjdk compatible [18:07:35] New patchset: Reedy; "Wikipedias to 1.22wmf7" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69702 [18:10:32] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69701 [18:10:37] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69702 [18:11:43] PROBLEM - Apache HTTP on mw1041 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 50027 bytes in 0.009 second response time [18:13:24] drdee: damn, I was searching for info about hadoop on google and ended up on quora.com, and to read more than one "answer" you need to login using google or facebook, and it demands to be able to "Manage your contacts" [18:14:45] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.068 second response time [18:15:21] average_drifter: I would assume that even though hadoop required oracle java 6, you can still complile additions/plugins using openjdk [18:16:27] AzaToth: you would assume.. but I should try it to confirm it [18:16:38] AzaToth: openjdk and oracle java6 are compatible ? 
[18:17:08] average_drifter: they should make complatible byte code [18:17:24] compatible* [18:18:04] that sounds encouraging [18:18:13] New patchset: Andrew Bogott; "Moved nfs manifest into a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69682 [18:18:13] New patchset: Andrew Bogott; "Move generic::rsyncd into its own module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69703 [18:18:20] I would say though to try to find an alternative to hadoop [18:19:40] pls 2 +2 https://gerrit.wikimedia.org/r/#/c/69648/ ? [18:20:26] AzaToth: you are courageous [18:20:34] AzaToth: there are alternatives like http://www.iterativemapreduce.org/ [18:21:05] AzaToth: but I think kraken is a mature codebase using hadoop. the team already has a lot of knowledge on hadoop [18:21:38] k [18:21:44] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69648 [18:23:19] Ryan_Lane, can you have a look at https://gerrit.wikimedia.org/r/#/c/69337/ ? [18:23:25] New patchset: coren; "Tool Labs: +qt4-make in dev (user request)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69704 [18:23:32] It's not as bad as it looks :) [18:23:47] average_drifter: I would assume it's a bit out of my league, so I'll keep my mouth shut [18:24:05] andrewbogott: heh. yeah. gimme a bit [18:24:46] Ryan_Lane, mostly I'd like to talk through the process of merging as I merge, since there are multiple steps. But, yeah, whenever you have a change. [18:24:52] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69704 [18:25:28] AzaToth: no no, it's not like that, I very much appreciate your opinion. don't worry, it's an open discussion [18:26:46] AzaToth: Kraken will be a recurring topic anyway. the package I'm trying to make with your help, Andrew's and Faidon's, is just one step towards that goal.
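On the forward-compatibility question discussed above: bytecode built by a newer JDK only loads on Java 6 if the class-file target is pinned. A sketch, assuming a Maven build (the actual dclass build system isn't stated in-channel); the equivalent plain invocation is `javac -source 1.6 -target 1.6`:

```xml
<!-- Hypothetical pom.xml fragment: emit Java 6 class files even when
     compiling under OpenJDK 7, so the jars still load on the Hadoop
     cluster's Oracle Java 6 runtime. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>
```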
There will be other packages to come also [18:27:13] *chance [18:31:16] New patchset: Yurik; "Script-updated zero configs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69705 [18:31:36] ^demon: I'm going to work on some of the different search options that seem to be in use in production. I just pushed an update to the TODO file with a list. [18:32:13] <^demon> Okie dokie [18:32:28] andrewbogott: it's mostly just checking a few things [18:32:45] andrewbogott: I'm assuming you tested this on a puppetmaster::self instance? [18:32:58] I played for a bit with java packaging and whitelisting dependencies - that'll be exhausting if we go that way but less bad than .deb-ing all the dependencies from source. [18:33:57] Ryan_Lane, yep, tested in several configs [18:34:09] cool. that's my biggest concern [18:34:12] that and gerrit [18:34:18] and the other web services that use ldap [18:36:49] <^demon> Ryan_Lane: Solr won't use ldap :) [18:37:00] solr? [18:37:10] I don't think I mentioned ti :) [18:37:11] *it [18:37:30] <^demon> I missed the jump from java packaging to ldap. [18:37:42] manybubbles: I don't see how it'd be so exhausting but I don't mind that much either [18:38:07] average_drifter: haven't done any mapreducing in my life, so it is ooml [18:38:27] <^demon> Ryan_Lane: If we want saner gerrit packaging, someone other than me has to review AzaToth's work ;-) [18:38:52] don't know how to take that ツ [18:39:07] ^demon: yeah. I'll be taking a look at it soon [18:39:30] s/take that/analyse the meaning of that statement/ [18:39:35] <^demon> AzaToth: That I'm not qualified to review it :) [18:39:56] ^demon: I would assume you have some knowledge about gerrit right? [18:40:05] lol.
[18:40:14] <^demon> No tell me more :p [18:40:18] :-P [18:40:29] we're seeing 503s and 504s on some special pages on the mobile version of the site [18:40:52] <^demon> awjr: Get 505s and you take home a prize :) [18:41:13] hehehe [18:41:13] Ryan_Lane, the other steps are… 1) change defaultclasses to use ldap::role::client::labs 2) test new instance creation 3) change puppet class for all existing instances [18:41:25] I spiffed up puppetValues.php to make step 3 easy [18:42:34] ^demon: I assume you want to take a look at https://gerrit.wikimedia.org/r/#/q/project:operations/debs/ircd-ratbox+owner:%22AzaToth+%253Cazatoth%2540gmail.com%253E%22,n,z instead [18:42:38] New patchset: Reedy; "Update gitweb/gitblit RSS" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68415 [18:43:08] hmm now I'm seeing issues on special:watchlist on desktop enwiki [18:43:10] Due to high database server lag, changes newer than 72 seconds may not appear in this list. [18:43:11] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68415 [18:43:16] related? [18:43:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:44:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [18:45:04] ^demon: actually, could you explain to me why buck can't use system jython and must use bundled standalone atm? [18:45:23] <^demon> Not a clue. [18:46:22] <^demon> Reedy: You seen/filed this ProofreadPage getCode() exception? [18:46:23] I had the same question [18:46:28] Yus [18:46:39] testwiki is rather broken currently [18:46:48] <^demon> I hate proofread page. [18:46:59] paravoid: I got funky errors when using /usr/share/java/jython.jar, even though they had the same version [18:47:00] phase it out ?
[18:47:11] I'm guessing it's related to the recent refactoring [18:47:30] AzaToth: that's weird [18:47:36] <^demon> Reedy: It's kind of spammy. fatal.log isn't useful atm. [18:47:58] Not so bad in the apache logs [18:48:03] I did notice more from job runners [18:48:32] https://www.mediawiki.org/wiki/MediaWiki_1.22/wmf8/Changelog#ProofreadPage [18:48:40] andrewbogott: cool [18:48:54] I'll move it back a version in wmf8 [18:49:05] ^demon/Ryan_Lane: could you ±2 https://gerrit.wikimedia.org/r/69607 https://gerrit.wikimedia.org/r/69608 and https://gerrit.wikimedia.org/r/69609 [18:49:26] <^demon> I can't, no permissions. [18:49:27] AzaToth: http://bugs.debian.org/589436 [18:49:35] you might want to coordinate with Thomas [18:49:41] Not sure if it's actually PP itself, or core [18:49:47] and I can sponsor all of them of course [18:50:05] <^demon> That dependency list is a mess. [18:50:06] ewwwy, 300s db lag [18:50:15] AzaToth: path conflict on 69607 [18:50:28] <^demon> Reedy: Any insight on this db lag? [18:50:30] Where? [18:50:36] Ryan_Lane, are you about to go to lunch? I'd prefer to have you on hand when I merge just in case. [18:50:37] Ryan_Lane: on what? [18:50:45] <^demon> !replag [18:50:47] enwiki :< [18:50:52] <^demon> @replag [18:50:53] ^demon: [s1] db1049: 339s [18:50:53] Ryan_Lane: remember you replaced the git yesterdaty [18:51:10] Just db1049 [18:51:21] db1049 [18:51:25] silly db [18:51:33] Ryan_Lane: i.e. fetch, reset, checkout wmf [18:51:34] Watchlists/recentchanges etc [18:52:03] <^demon> AzaToth, paravoid: Fwiw, one of those dependencies isn't very telling. h2 has to be decoupled from core and made optional. [18:52:16] <^demon> Which upstream hasn't done. [18:52:22] hm? [18:52:46] db1049 is full of mobile watchlist queries [18:53:03] h2? [18:53:10] <^demon> paravoid: There was some debate over requiring the h2 database (it's used for on-disk caching). 
[18:53:16] ah [18:53:41] awjr: ping [18:53:47] <^demon> So the ITP links to the "package h2" but iirc on-list there was a request to just drop the h2 requirement. [18:53:55] pong paravoid [18:53:57] SpecialMobileWatchlist::doFeedQuery [18:54:06] paravoid: yeah - can you do an explain on that? [18:54:31] paravoid: http://paste.debian.net/11630/ [18:55:35] paravoid: as long buck is used and buck is fuck, gerrit is nogo on debian proper [18:56:27] ^demon: I'm not depending on h2 packagewize [18:56:44] andrewbogott: well, I'm going home after lunch [18:56:52] <^demon> AzaToth: On-disk caching requires it... [18:57:01] <^demon> Oh, or you just letting buck deal with it? [18:57:01] (today is my birthday, so I may not be around a lot) [18:57:02] ^demon: don't know of what list of dependices you are referring to [18:57:16] <^demon> The one on the ITP: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589436 [18:57:30] <^demon> Specifically http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607891 [18:58:08] AzaToth: I'd discuss the whole situation with Thomas, he's been very active on the java packaging team [18:58:32] Ryan_Lane, ok, I'll start now and hope things break before you go :) [18:58:37] ^demon: currently buck maven gerrit deps [18:58:40] Also, happy birthday! Mine was Saturday. [18:59:06] <^demon> AzaToth: Yeah I know, but buck has an offline mode supposedly, to let you rely on system-provided dependencies. [18:59:16] it has? [18:59:23] <^demon> Supposedly. [18:59:42] well, buck is so fuck I've no idea what it can do [19:00:00] New patchset: Andrew Bogott; "Move ldap into a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69337 [19:00:11] <^demon> I just type things into it and pray. 
[19:00:56] only thing I can see is "--build-dependencies (-b) [FIRST_ORDER : How to handle including dependencies [19:00:56] _ONLY | WARN_ON_TRANSITIVE | TRANSITIV : [19:00:56] E]" [19:01:34] Reedy: hm, the config is not merged is it [19:02:52] <^demon> Ugh, soy templates. [19:02:53] <^demon> I hate soy [19:02:54] andrewbogott: oh. cool. happy belated birthday to you! [19:03:00] New review: Andrew Bogott; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69337 [19:03:12] oh [19:03:17] who has his birthday? [19:03:39] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69337 [19:04:44] Ryan_Lane, want to force a puppet run on gerrit and see how it copes? I'm doing the labs test... [19:07:47] paravoid: approx 10 million males have birthday today [19:08:04] paravoid, including Ryan. [19:08:40] Nemo_bis: It is, not synced [19:08:52] oki [19:09:45] oh! [19:09:48] Ryan_Lane: happy birthday! [19:09:52] andrewbogott: and you too :) [19:10:03] thanks :) [19:10:45] ^demon: maven install in gerrit is defined in tools/build.defs [19:11:00] thanks [19:11:54] Reedy, I'm sure you noticed but I'm seeing a lot of Fatal error: Call to a member function getCode() on a non-object at /usr/local/apache/common-local/php-1.22wmf8/includes/GlobalFunctions.php on line 1288 [19:12:44] spagewmf: There's already a bug logged and the ProofreadPage guy is looking into it [19:13:14] spagewmf: https://bugzilla.wikimedia.org/49897 [19:14:14] Reedy, paravoid any thoughts about what's causing the problem? 
as far as i can tell that query looks sane [19:14:20] MaxSem: ^ [19:14:41] 431 out of 494 queries is SpecialMobileWatchlist::doFeedQuery [19:14:45] I see a shitload of queries for the same people [19:14:50] all of them running for half an hour or so [19:14:51] The number of total queries is dropping [19:14:54] and it's all for WMF people [19:14:59] * AzaToth didn't even notice enwiki was down [19:15:17] There's maybe 20 queries copying to temporary tables [19:15:19] it's 4 people [19:15:24] 400 queries [19:15:33] can we just kill the queries for the wmf folks? [19:15:42] we're basically all in the same room right now anyway [19:15:47] Due to high database server lag, changes newer than 999 seconds may not appear in this list. [19:16:09] srsly. wikiadmin cannot kill wikiuser queries [19:16:13] <^demon> manybubbles: Can you rebuild the index for enwikiquote? I see the others were rebuilt but enwikiquote was still on the old one. [19:16:20] paravoid, hi, could you +2 minor IP fix for zero? https://gerrit.wikimedia.org/r/#/c/69705/ [19:16:24] <^demon> With the new schema, that is. [19:17:03] New patchset: Tpt; "(bug 49897) configure properly page and index namespaces for test2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69755 [19:17:11] 403 active queries [19:17:17] 385 [19:17:18] Served by mw1214 in 56.965 secs [19:17:19] Ryan_Lane: I can create a new instance and log into it… that seems like a good sign. Do you want to test (or want me to test) anything else before I change all the labs ldap entries? [19:17:40] only 6 queries copying to tmp currently [19:18:07] andrewbogott: did the initial puppet run go as well? [19:18:18] yep, clean puppet runs on the new instance. 
[19:18:22] cool [19:18:26] I'd say go for it [19:18:50] On virto the puppet run had a big diff but all things like --A INPUT -m comment --comment deny_all_glance_api_glance_api -p tcp -j DROP --dport 9292 [19:19:02] which I presume is routine and unrelated… seems to be stuff like that frequently [19:19:08] *virt0 [19:19:56] the forced index is slowing it down, somehow [19:20:33] 1032 seconds lag now [19:20:52] AzaToth, we know [19:21:24] MaxSem: but I don't understand why pageload takes such a long time (60 seconds approx) [19:21:40] DB server overload [19:22:04] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69755 [19:22:52] !log reedy synchronized wmf-config/ [19:23:01] Logged the message, Master [19:26:30] paravoid: note that the explain doesn't show use index for rc_timestamp [19:26:39] that might be a symptom of the 'order by' rather than the force index [19:27:46] does "Sending data" mean that it's sending out a billion rows or it's just a hung connection? [19:27:47] and yet the force index one takes half an hour or more to run and removing that returns instanteously [19:30:59] commit 87e6622714c02d264d7032e9d2062fd192ca3156 [19:30:59] Author: Max Semenik [19:31:00] Date: Fri Jun 7 23:02:55 2013 +0400 [19:31:00] Force index to avoid filesort in feed query [19:31:04] when was this deployed? [19:31:36] also its parent [19:31:45] awjr, MaxSem: ^ [19:32:14] prolly this week [19:32:16] likely on the 11th, paravoid [19:32:18] or the 18th [19:32:30] okay, revert them. 
[19:32:37] awjr, it wasn't merged before this week's deployment [19:33:05] fairly sure it's either of the two [19:33:16] I'm double checking our logs [19:33:16] removing the force index makes the queries run instantaneously [19:33:30] the rc_type one otoh, we don't have an index that includes that [19:33:51] it was deployed the 18th, this week [19:33:51] http://www.mediawiki.org/wiki/Extension:MobileFrontend/Deployments/2013-06-18 [19:34:43] can you revert? [19:34:48] aye [19:35:04] judging from the queries no one uses the watchlist [19:35:19] so it didn't manifest until you had your meeting and all of you played with the feature :) [19:35:36] cascading failure [19:35:55] hehe [19:36:10] heheeh [19:38:27] MaxSem: think we can get that out after the MW deploy window? [19:38:54] not now? [19:38:58] now please [19:39:06] now is fine with me, but check with folks currently deploying [19:39:16] Reedy, i think? [19:39:43] I'm not doing anything atm [19:39:47] paravoid, if you're still working I'd appreciate a second read of https://gerrit.wikimedia.org/r/#/c/69682/ and its dependency, https://gerrit.wikimedia.org/r/#/c/69703/ [19:39:49] engage!
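The revert agreed on above removes a FORCE INDEX hint. As a rough sketch of the failure mode (hypothetical query shape and names, not the actual SpecialMobileWatchlist::doFeedQuery SQL):

```sql
-- Hypothetical reconstruction: the hint pins the optimizer to a
-- timestamp index to avoid a filesort, but for watchlist-style joins
-- that plan can scan far more rows than the one MySQL would pick itself.
SELECT rc_title, rc_timestamp
FROM recentchanges FORCE INDEX (rc_timestamp)  -- hint removed by the revert
JOIN watchlist ON wl_namespace = rc_namespace AND wl_title = rc_title
WHERE wl_user = 12345
ORDER BY rc_timestamp DESC
LIMIT 50;
```

Per the discussion in-channel, dropping the hint let the same queries return immediately.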
[19:39:50] Deleting a load of bad translation pages [19:40:34] MaxSem: ^^ [19:40:46] in process [19:42:11] PROBLEM - Host ms-fe3001 is DOWN: PING CRITICAL - Packet loss = 100% [19:42:59] (that's me, putting an H710 controller in) [19:46:47] !log maxsem synchronized php-1.22wmf8/extensions/MobileFrontend [19:46:56] Logged the message, Master [19:47:21] RECOVERY - Host ms-fe3001 is UP: PING OK - Packet loss = 0%, RTA = 87.87 ms [19:48:22] !log maxsem synchronized php-1.22wmf7/extensions/MobileFrontend [19:48:31] Logged the message, Master [19:52:11] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [19:52:11] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:11] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:12] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:12] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:12] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:12] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours [19:52:13] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:13] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:14] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [19:52:21] PROBLEM - Host knsq17 is DOWN: PING CRITICAL - Packet loss = 100% [19:52:31] PROBLEM - Host knsq16 is DOWN: PING CRITICAL - Packet loss = 100% [19:52:41] PROBLEM - Host knsq20 is DOWN: PING CRITICAL - Packet loss = 100% [19:52:42] PROBLEM - Host knsq19 is DOWN: PING CRITICAL - Packet loss = 100% [19:52:42] PROBLEM - Host knsq18 is DOWN: PING CRITICAL - Packet loss = 
100% [19:52:42] PROBLEM - Host knsq23 is DOWN: PING CRITICAL - Packet loss = 100% [19:52:42] PROBLEM - Host knsq22 is DOWN: PING CRITICAL - Packet loss = 100% [19:53:22] PROBLEM - Host knsq21 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:01] PROBLEM - Host knsq26 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:01] PROBLEM - Host knsq24 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:05] !lag [19:54:13] (that's me, turning them off) [19:54:51] PROBLEM - Host knsq28 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:52] PROBLEM - Host knsq27 is DOWN: PING CRITICAL - Packet loss = 100% [19:54:52] PROBLEM - Host knsq29 is DOWN: PING CRITICAL - Packet loss = 100% [19:55:22] ok, time to pack and go home [19:55:58] New patchset: Demon; "How did I make this typo twice?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69771 [20:03:48] New patchset: Ori.livneh; "Enable NavigationTiming on beta cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69774 [20:04:44] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69774 [20:05:22] ori-l: you are too fast [20:05:52] ori-l: I have edited that bug like a minute ago [20:06:00] hashar: :) [20:06:00] !log olivneh synchronized wmf-config/InitialiseSettings-labs.php 'I83fcaa5ec: enable NavigationTiming on beta cluster' [20:06:07] Logged the message, Master [20:06:43] * hashar reads http://www.mediawiki.org/wiki/Extension:NavigationTiming [20:07:04] authors Asher + Ori + Preilly => too complicated for me :) [20:07:44] that is a nice ext [20:07:47] oh come on, its 2k of php and 3k of js :P [20:08:03] they took it simple for once :) [20:08:08] I can't wait to have all the event logging events available via LIMN or something [20:10:16] !log olivneh synchronized php-1.22wmf7/extensions/CoreEvents 'CoreEvents to d291b64248' [20:10:24] Logged the message, Master [20:10:46] !log olivneh synchronized php-1.22wmf8/extensions/CoreEvents 
'CoreEvents to d291b64248' [20:10:51] ebernhardson, hashar: :D [20:10:54] Logged the message, Master [20:11:11] we need to graph that data, definitely [20:12:39] * Reedy graphs ori-l [20:14:37] Reedy: plot shows suspicious spikes [20:14:55] /\_____/\/\/\/\/\____ [20:19:26] !log Gracefully reloading Zuul to deploy I8c0ac58d9498979b [20:19:34] Logged the message, Master [20:20:11] !log olivneh synchronized php-1.22wmf8/extensions/EventLogging 'EventLogging to 2351b4ccbb' [20:20:18] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [20:20:20] Logged the message, Master [20:23:27] !log olivneh synchronized php-1.22wmf7/extensions/EventLogging 'EventLogging to 2351b4ccbb' [20:23:34] Logged the message, Master [20:23:42] !log reedy synchronized php-1.22wmf8/extensions/ProofreadPage/ [20:23:50] Logged the message, Master [20:24:18] PROBLEM - Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours [20:27:00] !log Gracefully reloading Zuul to deploy Ie7513f356cd8 [20:27:07] Logged the message, Master [20:49:19] New patchset: Jgreen; "remove tridge from fundraising backup schemes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69788 [20:50:00] !log olivneh synchronized php-1.22wmf7/extensions/GuidedTour 'GuidedTour to a6fdf3c910' [20:50:08] Logged the message, Master [20:50:20] !log olivneh synchronized php-1.22wmf7/extensions/GettingStarted 'GettingStarted to bf05656766' [20:50:27] Logged the message, Master [20:51:08] New patchset: Demon; "Updating with some of the new options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69789 [20:51:35] !log olivneh synchronized php-1.22wmf8/extensions/GuidedTour 'GuidedTour to a6fdf3c910' [20:51:43] Logged the message, Master [20:51:54] !log olivneh synchronized php-1.22wmf8/extensions/GettingStarted 'GettingStarted to bf05656766' [20:52:02] Logged the message, Master [20:57:24] New patchset: Ori.livneh; "Update 
EventLogging config to utilize API" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69790 [20:57:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69788 [20:58:17] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69790 [21:00:01] !log olivneh synchronized wmf-config/CommonSettings.php 'I24bbe0d4e: Update EventLogging config to utilize API' [21:00:09] Logged the message, Master [21:55:56] Reedy: done [21:56:54] apergos: thanks for the clean-up on fluorine [21:57:03] yw [21:57:09] good you caught it [22:01:13] !log olivneh synchronized php-1.22wmf7/extensions/GuidedTour 'Updating GuidedTour to 950ee2c70417be517f4cdce3a1b590f6fc28d388' [22:01:21] Logged the message, Master [22:05:23] !log olivneh synchronized php-1.22wmf8/extensions/GuidedTour 'Updating GuidedTour to 950ee2c70417be517f4cdce3a1b590f6fc28d388' [22:05:31] Logged the message, Master [22:08:55] New patchset: Ottomata; "Puppetizing hive client, server and metastore." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/69353 [22:09:23] New patchset: Ottomata; "Puppetizing oozie client and server" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/69804 [22:09:47] New patchset: Ottomata; "Puppetizing hue." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/69805 [22:11:03] ottomata: you're a machine! [22:11:11] haha [22:11:20] that's what happens when you code on a plane [22:11:50] have internet, time to push [22:12:47] yeah, plus doing puppet stuff is oddly gratifying, like popping bubblewrap [22:13:29] hha [22:13:31] yeah right! [22:13:35] its so cool when it works [22:13:51] i reinstalled and repuppetized two of the hadoop datanodes yesterday [22:13:57] the puppetization worked like a charm [22:14:07] i just formatted the partitions, ran puppet, and bam! a new datanode [22:15:28] that's pretty cool [22:15:40] btw, ori-l, I am on 3rd floor, woo! 
[22:15:56] wat! i'll come by and harass [22:22:15] <^demon> manybubbles: Minor schema change, namespace is now stored. [22:35:18] ^demon: fine by me. Why store it? just curious. [22:36:06] ^demon: while I've got you distracted you added this to the TODOs a while back: "Handle offsets in Solr rather than MediaWiki. Only search what you need." but it looks like you did it. [22:36:08] am I missing something? [22:36:45] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:37:34] <^demon> manybubbles: Yeah I fixed that awhile ago. [22:37:49] <^demon> Was storing since we'll be searching by it, not just returning it as a result. [22:38:15] I think something came out backwards. [22:38:37] storing is what you do when you want to return it but you don't have to if all you are doing is searching by it [22:38:48] also, I've been playing with namespace searches today [22:39:17] the one where you click on "advanced" and pick a namespace. or where you start the query with a namespace name. [22:40:47] actually it looks like I'm having trouble with prefixing the search with the namespace name [22:41:42] manybubbles: do you work in the SF office? [22:41:56] ottomata: just today and tomorrow [22:41:59] oh! [22:42:02] i am here to [22:42:03] same time! [22:42:11] are you on 3rd floor right now? [22:43:16] right now! [22:43:23] in the middle [22:44:44] <^demon> manybubbles: Yeah, I was going to start playing with namespaces too. For some reason I thought we'd need to return and query by it. [22:45:32] ^demon: I don't _think_ we need to return it. But we certainly do need to query by it. I think I've got that mostly covered between our code and SearchEngine::replacePrefixes [22:50:24] I'm going to see if I can wrap the Solr exceptions so they don't cause us to give up. [22:50:39] we can show the user an error instead of a white page :) [22:51:51] <^demon> manybubbles: Yeah. Also, I need to put the PoolCounter stuff in there. 
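The stored-versus-indexed distinction manybubbles draws above maps onto Solr schema configuration like this (field name and type are assumed for illustration, not taken from the actual CirrusSearch schema):

```xml
<!-- Hypothetical schema.xml fragment: indexed="true" is enough to
     filter queries (e.g. fq=namespace:0); stored="true" would only be
     needed to return the raw value in search results. -->
<field name="namespace" type="int" indexed="true" stored="false"/>
```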
[22:52:04] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [22:52:34] ^demon: you want me to have a look at it tonight? you've had a bunch of long days recently [22:52:53] <^demon> Na, I got a good night sleep last night :) [22:57:52] !log updated Parsoid to 6240c19 [22:58:00] Logged the message, Master [22:59:20] could anybody purge the Parsoid varnish caches? [22:59:37] cerium and titanium [23:11:05] <^demon> manybubbles: https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FCirrusSearch.git/672e28c445e2623567ac243bc8673e795ce386b6 - update/delete now wrapped in PoolCounter. [23:22:03] ^demon: nice and simple. thanks1 [23:29:04] !log updated Parsoid to 6dadde3 [23:29:12] Logged the message, Master