[00:12:45] !log uploaded new lucene-search-2 with fixed init script, will upgrade it on the search servers [00:12:52] Logged the message, Master [00:17:37] New patchset: Pyoungmeister; "moving preilly to roots class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35835 [00:19:03] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35835 [00:25:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:35:39] PROBLEM - Host yttrium is DOWN: PING CRITICAL - Packet loss = 100% [00:37:05] preilly: is there some blog that describes the locking/fix somewhere? [00:37:27] AaronSchulz: not really [00:37:39] AaronSchulz: you should talk to binasher for a good explanation [00:39:36] maybe http://bugs.mysql.com/bug.php?id=42930 [00:39:52] apparently the semantics of SHOW STATUS also changed in a breaking way in 5 [00:40:04] "Before MySQL 5.0.2, SHOW STATUS returned global status values" [00:40:09] I guess that didn't affect us apparently [00:40:35] maybe that just makes a few vars have session specific values then [00:42:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds [00:44:08] New patchset: Pyoungmeister; "coredb: more reasonable inhritence model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [00:44:58] notpeter: now don't you try to be *reasonable* on me! 
[00:45:33] !log pgehres synchronized php-1.21wmf5/extensions/CentralNotice/ 'Updating CentralNotice to master to hide banners on mobile devices' [00:45:34] well, if you need someone to tell you that I'm not being reasonable, just ask faidon :) [00:45:40] Logged the message, Master [00:48:26] !log pgehres synchronized php-1.21wmf4/extensions/CentralNotice/ 'Updating CentralNotice to master to hide banners on mobile devices' [00:48:32] Logged the message, Master [00:58:39] TimStarling: https://gerrit.wikimedia.org/r/#/c/34338/ [00:59:01] what is the motivation for that timestamp value change other than it seeming less funny? [00:59:48] don't know [01:00:05] I just see stuff breaking somewhere [01:01:40] preilly: it's deployed now [01:06:25] TimStarling: okay awesome [01:07:02] ah... [01:07:04] 2012-11-28 04:29:25,537 [main] WARN org.wikimedia.lsearch.oai.IncrementalUpdater - Error sending index update records of enwiki to indexer at searchidx2 [01:07:04] java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: [01:07:04] java.net.SocketTimeoutException: Read timed out [01:07:20] that would be why the incremental indexer stopped working immediately after I deployed the shorter timeouts [01:14:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:17:39] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [01:19:50] New patchset: Ryan Lane; "Add module, runner and add module directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35839 [01:28:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.842 seconds [01:28:46] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35839 [01:31:25] New patchset: Ryan Lane; "Fix template name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35842 [01:31:49] Change merged: Ryan Lane; 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/35842 [01:32:10] ARRRGH [01:32:54] New patchset: Ryan Lane; "Ugh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35843 [01:33:41] New patchset: Ryan Lane; "Ugh. Adding template back in." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35843 [01:33:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35843 [01:38:40] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: Puppet has not run in the last 10 hours [01:39:54] New patchset: Ryan Lane; "Fix module and pillar refresh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35844 [01:39:56] now I remember why I always do this in labs [01:40:15] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35844 [01:51:22] New patchset: Ryan Lane; "runner_dirs fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35846 [01:57:50] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35846 [02:00:59] New patchset: Ryan Lane; "Salt role fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35847 [02:01:34] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35847 [02:03:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:00] PROBLEM - Host es1010 is DOWN: PING CRITICAL - Packet loss = 100% [02:08:12] PROBLEM - Host analytics1022 is DOWN: PING CRITICAL - Packet loss = 100% [02:08:12] PROBLEM - Host analytics1016 is DOWN: PING CRITICAL - Packet loss = 100% [02:08:21] PROBLEM - Host analytics1018 is DOWN: PING CRITICAL - Packet loss = 100% [02:08:30] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [02:08:30] PROBLEM - Host analytics1024 is DOWN: PING CRITICAL - Packet loss = 100% [02:08:30] PROBLEM - Host analytics1026 is DOWN: PING 
CRITICAL - Packet loss = 100% [02:08:39] RECOVERY - Host es1010 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [02:08:48] RECOVERY - Host analytics1018 is UP: PING OK - Packet loss = 0%, RTA = 26.75 ms [02:09:07] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 27.07 ms [02:09:07] RECOVERY - Host analytics1016 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [02:09:15] RECOVERY - Host analytics1026 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms [02:09:33] RECOVERY - Host analytics1024 is UP: PING OK - Packet loss = 0%, RTA = 26.79 ms [02:09:33] RECOVERY - Host analytics1022 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [02:19:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.884 seconds [02:24:32] !log LocalisationUpdate completed (1.21wmf5) at Thu Nov 29 02:24:32 UTC 2012 [02:24:40] Logged the message, Master [02:45:28] !log LocalisationUpdate completed (1.21wmf4) at Thu Nov 29 02:45:27 UTC 2012 [02:45:36] Logged the message, Master [02:46:32] New patchset: Ori.livneh; "Add remote host to Vanadium event log stream" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35848 [03:14:12] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [03:15:42] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [03:19:36] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [03:21:25] New patchset: Ryan Lane; "Fix requirement location for deploy module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35850 [03:27:42] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [03:30:46] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35850 [03:33:42] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.045 second response time [04:56:16] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 
hours [04:56:16] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [04:56:16] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [05:12:19] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [05:12:19] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [06:04:00] New patchset: Ryan Lane; "Up ulimit for gluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35854 [06:04:22] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35854 [06:05:18] New patchset: Tim Starling; "Increase RMI read timeout" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35855 [06:06:04] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35855 [06:08:59] !log fixing searchidx2 by increasing RMI timeout [06:09:08] Logged the message, Master [06:13:02] New patchset: Ryan Lane; "pillar -> pillars" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35856 [06:13:22] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [06:13:26] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35856 [06:18:21] New patchset: Ryan Lane; "Fix dependency loop" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35857 [06:18:51] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35857 [06:20:22] New patchset: Ryan Lane; "Fix syntax error. How did the lint check miss this?" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/35858 [06:20:54] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35858 [06:21:56] RECOVERY - Puppet freshness on sockpuppet is OK: puppet ran at Thu Nov 29 06:21:47 UTC 2012 [06:23:13] New patchset: Tim Starling; "More logging from the incremental indexer" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35859 [06:24:04] New patchset: Ryan Lane; "regex in salt is now -E, not -P" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35860 [06:24:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35860 [06:24:36] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35859 [06:24:52] TimStarling: I merged your change on sockpuppet [06:25:13] ok, thanks [06:25:36] yw [06:27:50] ok. wtf puppet. how is every fucking run different? [06:28:50] omfg give me a break. the order of a hash isn't necessarily the same?! [06:29:07] hm? [06:29:24] every time I run puppet on sockpuppet it changes a file [06:29:32] I'm using a hash [06:29:40] it's just changing the order of things [06:29:59] hate. so full of hate. [06:34:52] ah. 
it's ruby [06:35:02] hooray ruby, another reason to hate you [06:35:28] presumably puppet isn't forced to use plain unsorted hashtables [06:35:43] nah, it's actually ruby < 1.9 [06:35:53] it randomly iterates through a hash [06:36:12] in > 1.9 hashes are ordered [06:41:41] New patchset: Ryan Lane; "Sort hashes before using them" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35862 [06:42:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35862 [08:05:40] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 243 seconds [08:06:52] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 316 seconds [08:10:01] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [08:10:28] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [08:30:34] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 279 seconds [08:30:52] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 297 seconds [08:35:02] New patchset: MaxSem; "Add comment about regression suite for redirector" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35864 [09:11:31] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [09:33:25] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [09:42:42] morning [09:44:38] j^: this week's LWN edition has an article about "Wikipedia's new HTML5 video player" [09:49:14] New review: Faidon; "I see nothing relevant to Swift there; all the databases there are practically empty. You should als..." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/35439 [09:50:58] New review: Faidon; "Why not expand this to be a parameter to the role class too and get rid of the hostname if altogether?" 
[operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/35344 [10:17:45] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [10:22:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [11:18:42] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [13:29:02] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [14:22:30] New patchset: Cmjohnson; "Adding labsdb1001 and 1002. fixing labsdb3 entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35900 [14:29:24] robh: can you +2 this plz ^ [14:30:26] what was wrong with labsdb3 entry? [14:30:29] i see no change. [14:30:53] when it showed it up in my text editor it had extra spaces [14:31:10] ahh, hrmm [14:31:16] so really no change [14:31:20] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35900 [14:31:28] cool [14:31:29] thx [14:31:30] ok, its good to go, ya just gotta merge on sockpuppet [14:31:31] welcome =] [14:37:12] !log provisioning labsdb1001 [14:37:21] Logged the message, Master [14:57:29] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [14:57:47] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [14:57:47] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [14:57:47] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [14:58:41] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [15:02:17] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [15:03:56] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.045 second response time [15:13:41] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [15:13:41] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run 
in the last 10 hours [15:37:41] !log provisioning labsdb1002 [15:37:48] Logged the message, Master [16:06:57] ++++ [16:14:51] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [16:16:15] New patchset: Anomie; "Allow per-realm and per-datacenter configuration" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/32167 [16:16:36] New review: Anomie; "Rebased and merged I58901bfd" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/32167 [16:17:45] Change abandoned: Anomie; "Merged into I7ef35304 following Ia319794c." [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/32168 [16:20:53] New patchset: Anomie; "Allow per-realm and per-datacenter configuration" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/32167 [16:21:14] New review: Anomie; "Manually sync dblists" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/32167 [17:04:41] Change merged: Anomie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35655 [17:24:34] sbernardin: can you pull out the bad disk on hume and give me the model # please [17:28:09] New patchset: preilly; "add IP range for Congo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35922 [17:28:53] Change merged: preilly; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35922 [17:30:36] preilly: yay :-) [17:30:39] !log es1004 has raid error...power cycling to enter raid bios [17:30:48] Logged the message, Master [17:33:25] ok [17:41:20] RECOVERY - MySQL Slave Running on es1004 is OK: OK replication [17:41:20] RECOVERY - Host es1004 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [17:42:05] RECOVERY - MySQL Recent Restart on es1004 is OK: OK seconds since restart [17:42:14] RECOVERY - MySQL disk space on es1004 is OK: DISK OK [17:42:23] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay seconds [17:46:58] 
cmjohnson1: I think this is it...Seagate ST3300655SS [17:47:38] yes that is thx [17:47:42] cmjohnson1: Here's the Dell part #...Dell HT953 [17:49:04] cool, thats what i needed [17:51:14] New patchset: RobH; "giving steve dctech access in admins" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35924 [17:52:20] New review: RobH; "As far as I know, this is all I need to do to give Steve the access that Chris used to have. As Mar..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35924 [17:55:50] hey guys, what's the status with 720xds @ eqiad? [17:58:38] cmjohnson1 & sbernardin the replacement hume disks (rt3916) are ordered [17:58:46] they'll arrive sometime next week [17:59:16] ok [18:03:25] paravoid: is it ok to make MW just use http://10.2.1.27/auth directly instead of http://ms-fe.pmtpa.wmnet/auth ? [18:04:16] AaronSchulz: it is, but won't you get URLs with fqdns as X-Storage-URLs anyway? [18:04:43] cmjohnson1, RobH: 720xd status? [18:04:46] yeah, but hopefully it will stop the spam of "Couldn't resolve host 'ms-fe.pmtpa.wmnet'" on auth requests [18:05:16] erm, that sounds like a problem we should solve [18:05:28] paravoid: all the 720's are onsite in Tampa...2 of 12 are in eqiad [18:05:40] well like 3 or so ops people gave up looking at it [18:05:49] and how would it stop the log spam? wouldn't you just get "Couldn't resolve host" for the storage urls? [18:06:07] AaronSchulz: who? [18:06:16] (to coordinate with them) [18:06:29] I don't see any errors for non-auth requests [18:06:33] cmjohnson1: thanks. does onsite means racked up and ready to go? [18:06:33] we are working with apergos to get the next 2 be3 and 5 for Monday [18:06:39] maybe some would start up but there are none now [18:06:52] AaronSchulz: hrm [18:06:56] paravoid: ariel, peter, and I think I've pinged you about this [18:06:58] AaronSchulz: okay, could you try and let me know? 
[18:07:18] I don't remember that (not saying it didn't happen, just that I don't remember it :-) [18:07:20] * AaronSchulz has complained about this for weeks [18:07:29] sorry about that [18:07:31] I just want to get rid of those errors [18:07:32] use RT next time? :-) [18:07:33] in tampa, 4 should be in production now....swapping 2 more on Monday [18:07:42] the 2 in eqiad are racked and ready to go [18:08:01] which ones? [18:08:11] 6,7,8 and 10 [18:08:28] but you may want to check with ariel [18:08:33] no, the eqiad ones [18:08:40] oh...be1001 and 1003 [18:08:52] I want to finally start the ceph test cluster install [18:09:26] New patchset: preilly; "add IP ranges for CD" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35927 [18:09:48] Change merged: preilly; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35927 [18:13:01] !log aaron synchronized wmf-config/PrivateSettings.php 'Set Swift auth URL to the IP address host.' [18:13:08] Logged the message, Master [18:16:07] AaronSchulz: https://bugzilla.wikimedia.org/show_bug.cgi?id=42047 [18:16:13] should I reopen it? [18:16:30] or is the 108 bytes content-length fixed? [18:18:31] paravoid: I'd wait, someone will if anything happens [18:20:12] I'm not sure I understand the strategy [18:20:24] I've verified the bug personally, so it's not a worksforme [18:36:07] preilly: salt 'cp104[1-4]*' cmd.run 'puppetd -tv' [18:36:49] Ryan_Lane: okay cool thanks! [18:37:02] yw [18:40:41] Ryan_Lane: from fenari? [18:40:50] sockpuppet? 
[18:42:09] sockpuppet, ok [18:42:15] New patchset: MaxSem; "Set up yttrium as a Solr server for GeoData" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35931 [18:42:30] New patchset: Demon; "Turn on TemplateSandbox for all 1.21wmf5 wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35932 [18:44:42] paravoid: sockpuppet, yeah [18:44:47] it's the master [18:45:25] we can allow another host to run all commands on all hosts as a peer if we want, but I think fenari would be a really bad choice :) [18:45:58] not elitist enough?:P [18:46:15] agreed [18:46:40] MaxSem: it runs too much stuff [18:46:52] and has a public IP address [18:47:14] salt's reach is slightly scary, so it's best to protect it :) [18:47:34] yes [18:48:09] +∞ [18:50:52] it has the same level of reach as puppet [18:50:58] which is why I stuck it on the puppet server :) [18:51:18] salt's reach is just…. quicker [18:54:33] New patchset: Demon; "Turn on TemplateSandbox for all 1.21wmf5 wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35932 [18:57:26] Change merged: Anomie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35932 [19:12:59] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [19:19:31] New review: Dzahn; "more RT-720" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/35680 [19:19:33] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35680 [19:21:45] !log demon synchronized wmf-config/CommonSettings.php 'TemplateSandbox on all wikis' [19:21:52] Logged the message, Master [19:22:12] !log demon synchronized wmf-config/InitialiseSettings.php 'TemplateSandbox on all wikis' [19:22:19] Logged the message, Master [19:23:51] Ryan_Lane, do you have a ready made list of tasks to hand off to Mike W next week? If not, should we formulate one? 
[19:25:16] !log demon synchronized php-1.21wmf5/extensions/LabeledSectionTransclusion/ 'Updating LST to master' [19:25:25] Logged the message, Master [19:25:27] andrewbogott: no ready-made list [19:25:36] we should start him off with granting requests [19:27:50] RECOVERY - Puppet freshness on analytics1001 is OK: puppet ran at Thu Nov 29 19:27:28 UTC 2012 [19:29:15] !log authdns-update to add dns for tin [19:29:22] Logged the message, RobH [19:29:26] Ryan_Lane: you should be set for dns, still need to add the dhcp lease info [19:29:42] cool [19:34:10] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [19:41:11] !log removing mysql server from iron [19:41:18] Logged the message, Master [19:43:33] New review: Dzahn; "server stopped, packages removed, iptables rules flushed" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/35439 [19:43:34] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35439 [19:44:01] aaron cleared profiling data [19:48:14] !log installing package upgrades on iron [19:48:19] I've seen manifests indented by tabs and two spaces - which of them is preferred? [19:48:20] Logged the message, Master [19:49:03] New review: Asher; "Re: logging XFF, we do not normally use it, unless it's set by one of our own proxies as it is easil..." 
[operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/35848 [19:49:52] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: itwikisource back to 1.21wmf4 [19:49:58] Logged the message, Master [20:01:51] New patchset: Cmjohnson; "Fixing MAC addresses for ms-be1001/1003 to reflect change in HW" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35941 [20:16:21] New patchset: Ori.livneh; "Add remote host to Vanadium event log stream" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35848 [20:17:54] New review: Ori.livneh; "@Asher: sounds good to me; amended the patch." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/35848 [20:25:59] robh: could you please +2 for me https://gerrit.wikimedia.org/r/35941 [20:30:29] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35941 [20:30:37] cmjohnson1: done, just need to merge on sockpuppet [20:30:44] cool..thx [20:44:44] New patchset: Ottomata; "INcluding Erik M on analytics nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35955 [20:45:04] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35955 [21:19:40] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [21:27:02] !log old swift logs on iron, some rm'ed, some gzipped, archiving ben's home contents on tridge, clean up iron [21:27:09] Logged the message, Master [21:30:59] New patchset: Asher; "event log stream from bits to kraken, in analytic's preferred log format" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36053 [21:33:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36053 [21:34:10] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35848 [21:37:05] New patchset: Ottomata; "No longer using manifests/analytics.pp" [operations/puppet] 
(production) - https://gerrit.wikimedia.org/r/36057 [21:37:37] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36057 [21:43:34] New patchset: Ottomata; "Using udp2log iptables rules on analytics1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36060 [21:43:46] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36060 [21:46:58] New patchset: Asher; "new s3 slave - db66 replaces db11" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36066 [21:47:22] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36066 [21:48:13] !log asher synchronized wmf-config/db.php 'pooling db66 as a new s3 slave' [21:48:19] Logged the message, Master [21:48:32] New patchset: Pyoungmeister; "coredb: more reasonable inhritence model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [21:51:41] New patchset: Ori.livneh; "Configure Extension:EventLogging" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36072 [21:52:00] New patchset: Ottomata; "No longer using analytics.pp, removing import." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/36074 [21:52:11] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36074 [21:53:57] New patchset: Pyoungmeister; "setting db68 to be s7 box" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36076 [21:55:36] New patchset: Ori.livneh; "Configure Extension:EventLogging" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36072 [21:56:53] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36072 [21:57:27] cmjohnson1 and sbernardin, friday is not going to happen (but I guess you figured that out already), still too soon to pull two hosts out [21:57:30] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36076 [21:57:33] we'll shoot for monday [21:59:07] apergos: ok...keep me posted [21:59:24] yep, will do [21:59:25] apergos: yep [21:59:37] thx for update...are you thinking monday? 
[21:59:43] RECOVERY - NTP on analytics1001 is OK: NTP OK: Offset -0.0004314184189 secs [21:59:45] yep [21:59:56] k cool [22:00:12] ok [22:02:09] New patchset: Ryan Lane; "Add tin to dhcp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36078 [22:02:57] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36078 [22:07:24] New patchset: Dzahn; "move misc::noc-wikimedia to ./misc/noc.pp (RT-720)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36086 [22:09:16] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36086 [22:14:58] !log starting innobackup from db1041 to db68 (s7) [22:15:06] Logged the message, notpeter [22:23:49] New patchset: Pyoungmeister; "coredb: more reasonable inhritence model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [22:24:52] New patchset: Dzahn; "move misc::survey to ./misc/limesurvey.pp and actuall call it limesurvey (RT-720)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36091 [22:25:43] binasher: is the db change for https://gerrit.wikimedia.org/r/#/c/34648/ in a todo list somewhere? [22:25:54] New patchset: Ori.livneh; "Drop trailing '?' from $wgEventLoggingBaseUri" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36092 [22:26:42] AaronSchulz: no [22:26:51] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36092 [22:26:59] AaronSchulz: http://wikitech.wikimedia.org/view/Schema_changes [22:27:20] yeah I didn't see it there [22:27:22] that's the only schema change todo list [22:30:20] New patchset: RobLa; "Enabling LabledSectionTransclution on test2wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36095 [22:31:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36091 [22:32:41] hrm....is it something I did, or is Jenkins being dumb? 
[22:38:15] !log olivneh synchronized php-1.21wmf4/extensions/EventLogging/modules/ext.EventLogging.js [22:38:21] Logged the message, Master [22:40:30] who is responsible for the jenkins tests on wmf-config [22:41:03] Antoine [22:41:07] I think [22:41:14] hm, not around [22:41:22] https://integration.mediawiki.org/ci/view/All-enabled/job/operations-mediawiki-config/1162/testReport/junit/%28root%29/dbconfigTests/testDoNotRemoveLinesInHostsbyname/ [22:41:33] dbconfigTests::testDoNotRemoveLinesInHostsbyname is totally broken [22:41:43] "You shall never remove hosts from hostsByName :-D [22:41:44] Failed asserting that 79 matches expected 78." [22:42:09] adding a line != removing a line. [22:42:22] okee doke [22:43:35] interesting....that test has been around a while [22:43:52] New patchset: Pyoungmeister; "coredb: more reasonable inhritence model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [22:43:54] oh what a horrible test [22:44:02] !log olivneh synchronized wmf-config/CommonSettings.php 'Updating configuration of EventLogging' [22:44:05] $this->assertEquals( 78 [22:44:06] >------->------->-------, count( $this->lb['hostsByName'] ) [22:44:09] Logged the message, Master [22:44:18] and hashar periodically updates the hard coded number [22:44:31] gah [22:45:31] !log olivneh synchronized wmf-config/InitialiseSettings.php 'Enabling EventLogging on metawiki' [22:45:38] Logged the message, Master [22:48:03] New patchset: Pyoungmeister; "coredb: more reasonable inhritence model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [22:48:26] New patchset: Asher; "commenting out testDoNotRemoveLinesInHostsbyname" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36099 [22:48:40] i wonder what jenkins will do with ^^ [22:48:57] New patchset: Ryan Lane; "Make tin the new git-deploy server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36100 [22:49:12] Change merged: Aaron Schulz; 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36099 [22:49:35] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [22:51:11] New patchset: Ryan Lane; "Make tin the new git-deploy server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36100 [22:52:29] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36100 [22:52:43] jenkins worked perfectly for me [22:53:12] oh. it's a specific issue [22:53:18] New patchset: Pyoungmeister; "adding wikivoyage to databse suffix list for indexing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36101 [22:54:43] New review: Dzahn; "cool, thanks! that looks like it would fix index refreshes?!" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/36101 [22:55:21] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36101 [22:56:25] New patchset: Ryan Lane; "puppet resources can't be accessed via salt:///" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36102 [22:56:34] New patchset: Jdlrobson; "ensure all photo uploads go to commons" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36103 [22:56:37] !log kaldari synchronized php-1.21wmf5/extensions/UploadWizard/resources/mw.UploadWizardDetails.js 'resyncing UploadWizard patch from yesterday' [22:56:44] Logged the message, Master [22:56:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36102 [22:57:38] New patchset: RobLa; "Enabling LabledSectionTransclution on test2wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36095 [22:58:14] !log restarting incremental indexer on searchidx2 [22:58:21] Logged the message, notpeter [22:59:26] bleh. 
need to make a stupid debian package [22:59:52] AaronSchulz: could you push https://gerrit.wikimedia.org/r/36095 when the deploy traffic calms down? [22:59:53] New patchset: Ryan Lane; "Ensure git-core is installed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36104 [23:00:34] New patchset: Ori.livneh; "(Bug 39361) Clean up Wikimedia site favicon" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36105 [23:00:41] I guess, I wasn't aware of a lots of traffic...the last window is just closing [23:02:13] I'm guessing ori-l is still at it, based on ^ [23:02:31] we still have to run scap [23:02:45] there were uncommitted local changes aplenty on fenari [23:03:09] and that slowed us down [23:03:33] ^ robla / AaronSchulz [23:03:59] ori-l: completely uncommitted, or only locally committed? [23:04:30] * robla has to scurry off now [23:05:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36104 [23:05:22] uncommitted afaik but spagewmf was the one dealing with it [23:14:57] !log DNS update - add wikivoyage.com [23:15:04] Logged the message, Master [23:15:27] preilly: https://github.com/git-deploy/git-deploy [23:16:31] !log spage synchronized php-1.21wmf4/extensions/E3Experiments 'E3 deploy ACUX at 100%' [23:16:38] Logged the message, Master [23:17:41] mutante: don't forget wikivoyage.xxx [23:17:51] that's what universities are buying! [23:17:54] sync-dir complained "snapshot1002: rsync: mkdir "/apache/common-local/php-1.21wmf4/extensions/E3Experiments" failed: No such file or directory (2)" and Connection timed out to srv238 and srv266 [23:18:03] yeah ignore that [23:18:38] New patchset: Dzahn; "add wikivoyage.com redirects" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/36109 [23:18:58] AaronSchulz: hehe.. 
we should have .wiki [23:19:11] once checked out how much it was..but i forgot..a couple thousand [23:19:53] $3k or so [23:20:10] !log olivneh synchronized php-1.21wmf5/extensions/EventLogging [23:20:17] Logged the message, Master [23:20:29] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/36109 [23:21:03] AaronSchulz: and "glam" wants wikipedia.museum i guess [23:23:15] * AaronSchulz doesn't get the point of .xxx at all [23:24:28] AaronSchulz: easier regexes for "kid-safe" filters.. ? shrug [23:24:47] what about meta tags or something? [23:25:34] might be a little tricker with all js sites [23:25:52] though they could always include some tags if the whole site is "mature" [23:26:00] "The sponsoring organization is the International Foundation for Online Responsibility (IFFOR)" [23:26:14] lol [23:26:28] http://www.icann.org/en/groups/board/documents/resolutions-18mar11-en.htm#5 [23:26:31] New patchset: Ryan Lane; "Change git-deploy.conf location" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36111 [23:26:35] ^demon|dinner: I'd ask if you were around, but your nick says no :) [23:26:39] I need a repo created. heh [23:26:56] I should really learn how he does that one of these days [23:27:31] look at his bash history on formey [23:27:45] he does it on formey? [23:28:09] well, bash history has no timestamps, but last time I checked, there were some interesting commands in there [23:28:12] Ryan_Lane: http://wikitech.wikimedia.org/view/Gerrit#Creating_new_repositories [23:28:25] TimStarling: so you've been spying on him? :) [23:28:26] ssh -p 29418 gerrit.wikimedia.org gerrit create-project --owner=MyGroup --parent=test --description='"My super awesome repository"' test/my-new-repo [23:28:42] http://www.mediawiki.org/wiki/Git/Creating_new_repositories [23:28:47] ah. 
cool [23:28:48] gah [23:28:49] AaronSchulz: thanks [23:28:51] * AaronSchulz is jinxed [23:28:54] yeah, you can just do it with the web interface, that part is easy enough [23:29:04] I'm pretty sure he doesn't do it on formey [23:29:12] I thought Ryan would know that already [23:29:21] where it gets tricky is if you want to import history from subversion [23:29:27] ah [23:29:32] yeah, I don't need to do that [23:29:36] which is what you do on formey [23:29:43] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [23:29:45] likely I just need to pick the proper parent repo and be done with it [23:30:22] and then usually Hashar added the .gitreview files [23:32:07] dzahn is doing a graceful restart of all apaches [23:32:29] !log dzahn gracefulled all apaches [23:32:35] Logged the message, Master [23:33:54] Ryan_Lane: Ævar Arnfjörð Bjarmason [23:34:19] New patchset: Pyoungmeister; "de-dupe file def for mysql ganglia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36114 [23:35:46] New review: Reedy; "This will likely need numerous calls to purgeList to purge them from Squid" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36105 [23:35:47] !log olivneh Started syncing Wikimedia installation... : [23:35:54] Logged the message, Master [23:36:14] Ryan_Lane: Add a LICENSE file [23:36:14] We normally make our stuff available under "Artistic || >=GPL 1". Just [23:36:15] license it like that to begin with, I have no objection myself to [23:36:16] making it PD, but I should ask the other guys first. 
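[editor's note] The `gerrit create-project` call pasted at 23:28:26 wraps the description in two layers of quotes: the outer single quotes are consumed by the local shell, and the inner double quotes travel over ssh to Gerrit. Below is a minimal Python sketch of building the same argument list without a local shell. The helper name `gerrit_create_project_argv` is hypothetical; the host, port, flags and sample values are copied from the log line, and the sketch assumes the description itself contains no double quotes.

```python
def gerrit_create_project_argv(name, owner, parent, description,
                               host="gerrit.wikimedia.org", port=29418):
    """Build the argv for the ssh call quoted in the log (hypothetical helper).

    No local shell is involved when an argv list is used, so only the
    inner double quotes from the log's example are kept; they are the
    part that reaches the remote side.
    """
    if '"' in description:
        # Assumption of this sketch: embedded double quotes are not handled.
        raise ValueError("description must not contain double quotes")
    remote_command = [
        "gerrit", "create-project",
        "--owner=" + owner,
        "--parent=" + parent,
        '--description="' + description + '"',
        name,
    ]
    return ["ssh", "-p", str(port), host] + remote_command


# Same values as the example in the log; nothing is executed here.
argv = gerrit_create_project_argv(
    name="test/my-new-repo",
    owner="MyGroup",
    parent="test",
    description="My super awesome repository",
)
```

The list could be handed to `subprocess.run(argv)` by someone with create-project rights; as noted in the channel, the web interface does the same job when no Subversion history needs importing.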
[23:36:18] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36114 [23:36:21] Ryan_Lane: https://github.com/sjn/git-deploy/commit/bf1377d198757df9a0604de0426f33a8125913da [23:38:10] $tag =~ s/[^a-zA-Z0-9_]+/_/g; # strip any bogosity from the tag [23:38:15] * AaronSchulz chuckles [23:39:22] New patchset: Ryan Lane; "Initial commit of git-deploy package" [operations/debs/git-deploy] (master) - https://gerrit.wikimedia.org/r/36116 [23:39:52] Change merged: Ryan Lane; [operations/debs/git-deploy] (master) - https://gerrit.wikimedia.org/r/36116 [23:41:28] PROBLEM - mysqld processes on db68 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:42:06] <^demon|dinner> More people should learn how to make repos. [23:42:09] <^demon|dinner> :) [23:42:48] ^demon ^demon please click the button for me!! [23:43:14] ^demon|dinner: I just created it through the web interface [23:43:21] then added the ops group to owners [23:44:44] <^demon|dinner> The only place where you can really screw up when making a repo is checking the permissions-only box. [23:44:57] <^demon|dinner> That box should carry a warning, like "ARE YOU FREAKING SURE?" [23:45:40] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/36116/1/debian/copyright [23:46:54] speaking of gerrit, I've been using chrome for the last couple of days, and I finally discovered what gerrit is *meant* to do when you click on the fetch command on a page [23:47:11] apparently it's not meant to just disappear when you click on it, like it does in firefox [23:47:17] it's meant to replace it with a textbox [23:47:20] lol, I had that problem too [23:47:39] it doesn't disappear for me in firefox [23:47:46] it does for people using linux, though [23:47:47] <^demon> Yeah, that feature breaks on various browsers on various platforms from time to time. 
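[editor's note] The Perl substitution quoted at 23:38:10, `s/[^a-zA-Z0-9_]+/_/g`, replaces every run of characters outside `[a-zA-Z0-9_]` with a single underscore; it does not remove them, despite the "strip" in its comment. A rough Python equivalent, offered only as a sketch (the name `sanitize_tag` and the sample tags are made up for illustration and are not taken from git-deploy):

```python
import re

# One or more consecutive characters that are not ASCII letters, digits or "_".
_UNSAFE_RUN = re.compile(r"[^a-zA-Z0-9_]+")


def sanitize_tag(tag):
    """Replace each run of disallowed characters with one underscore."""
    return _UNSAFE_RUN.sub("_", tag)


# Illustrative inputs only.
print(sanitize_tag("release/1.21wmf5"))
print(sanitize_tag("a -- b"))
```

Because whole runs collapse to one underscore, different inputs can map to the same tag (for example "a.b" and "a/b" both become "a_b"), which is worth keeping in mind if tags need to stay unique.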
[23:47:55] New patchset: Dzahn; "add redirects for wikipedia.bg to bg.wikipedia.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/36117 [23:47:56] works fine fine with ff/precise [23:48:11] but I know it was broken at least on windows with an earlier FF version [23:48:36] ^demon: it's not exactly the most complex JS task [23:48:37] <^demon> Gerrit is pretty much unusable on Opera. [23:48:39] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/36117 [23:48:43] odd that it would break so much [23:48:47] ^demon: what is Opera? [23:48:57] surely you jest [23:49:07] [/trolling] [23:49:41] <^demon> There's something that GWT does in Javascript that Opera really doesn't like. Something with anonymous functions. [23:51:06] dzahn is doing a graceful restart of all apaches [23:51:27] !log dzahn gracefulled all apaches [23:51:33] Logged the message, Master [23:52:21] New review: Ori.livneh; "@Reedy" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36105 [23:54:16] AaronSchulz: firefox on precise happens to be what I am using [23:54:26] and it has never worked for me, and I just checked in safe mode and it still doesn't work [23:54:37] 16.0.2? [23:54:47] what isn't? [23:54:58] * AaronSchulz pushes paravoid in bed [23:55:27] 17 [23:55:37] :) [23:55:41] what's the fetch command on a page? [23:55:51] and I checked with an empty profile, also broken [23:56:05] ah [23:56:09] broken here too [23:56:13] firefox^Wiceweasel 17 [23:56:14] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36095 [23:56:45] sometimes a click will do nothing, sometimes it deletes the text, but I never saw a textbox pop up until I opened it in chromium [23:58:09] ori-l: scap is still running? [23:58:12] actually the first click throws an exception, the second deletes the text