[00:03:41] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[00:21:25] PROBLEM - Host mw1055 is DOWN: PING CRITICAL - Packet loss = 100%
[01:14:49] hmm. not sure who would be on right now, but does the WMF use Zeus Technology?
[01:15:12] aka Riverbed, Stingray traffic manager,
[01:15:35] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[01:15:38] I got a very strange message about meta.wikimedia.org not being found, something about virtual host not running
[01:16:01] didn't copy it unfortunately
[01:16:49] Thehelpfulone: no
[01:17:53] the link on the bottom took me to http://www.riverbed.com/us/products/stingray/stingray_tm.php
[01:18:26] maybe my ISP is using it?
[01:26:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[01:41:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 297 seconds
[01:41:59] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 224 seconds
[01:48:26] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 611s
[01:50:59] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds
[01:51:35] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[01:54:26] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 17s
[01:55:38] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 0 seconds
[02:29:47] !log adjusted the swift rings a bit to move some more traffic to the new hosts. set object partitions to weight 10, account and container 60.
[02:29:56] Logged the message, Master
[03:02:01] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[03:44:52] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[04:38:39] !log started hot backup of db48 to db1048
[04:38:47] Logged the message, Master
[05:02:59] binasher: are your db48 actions possibly causing an authentication error (to the mysqld) exception in the OTRS web UI?
[05:03:11] * jeremyb thinks that's what aude just got. sorry i didn't save the stack
[05:03:19] (was in the last 5 mins)
[05:03:21] jeremyb: did you see my note in -tech or you seeing it too?
[05:03:28] grrr
[05:03:31] quite possibly
[05:04:54] do you recall the gist of the error message / is it happening every time?
[05:05:30] otrs is buggy
[05:05:37] binasher: i repro'd now
[05:06:02] otrs has an unfortunate mix of 64 innodb tables and 16 myisam tables, which makes totally lockless backups impossible
[05:06:14] i can stop what i'm doing though
[05:06:30] wtf, myisam? can't purge that crap?
[05:07:03] FYI, http://dpaste.com/769684/plain/
[05:07:35] i haven't looked at the table schemas so don't know if there are full text indexes in use or if there's any good reason behind it at all. most likely not
[05:07:39] binasher: would be nice if we could at least put up a relevant note on the exception page. but i don't suppose that's feasible
[05:08:01] oh, actual mysql client level auth errors
[05:08:06] yes
[05:08:36] coincidentally it was also app auth i think (i was trying to log in)
[05:10:02] i wonder if i'm saturating its nic
[05:10:19] i just ran this from fenari - for x in {1..100} ; do mysql -h db48 -e "select 1" >/dev/null ; done
[05:10:29] and got 2 connect failures out of 100
[05:11:51] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=large&h=db48.pmtpa.wmnet&m=cpu_report&s=descending&mc=2&g=network_report&c=MySQL+pmtpa
[05:12:29] that's impressive, my xtrabackup is pushing 960mbps over its gigabit interface. but :(
[05:16:34] binasher: what is this? FTWRL?
[05:17:20] * jeremyb briefly wonders why lvm can't help even for myisam? you can just shut down mysqld entirely when creating the snap
[05:20:35] i see inserts into otrs.article_plain, etc in the binlogs from the last couple of minutes.. is the app broken though?
[05:21:07] binasher: must be intermittent? I think refresh fixes?
[05:21:23] but surely it's annoying if someone's using it a lot. i was just using in passing
[05:21:54] New patchset: Jalexander; "Add Canada to Shop Link targets" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15519
[05:23:14] there was no ftwrl in the last while, i think what you were seeing was just an effect of network saturation
[05:25:12] oh, right
[05:25:15] * jeremyb is half asleep
[05:25:32] there was a brief one at the very start of the process while the myisam tables were copied, but it only lasted a few seconds a half hour ago. the myisam tables are just a few MB total and the other 270GB is lockless
[05:25:32] binasher: not coming? ;)
[05:25:55] all done now though
[05:26:00] k
[05:26:13] wikimania?
[05:26:18] yah
[05:26:51] nope, i didn't realize that going without giving a talk was an option
[05:26:59] oh. ;-(
[05:27:27] and never having been, i had no ideas what i'd give a talk about
[05:27:29] * jeremyb points to "ask the mysqld operators"
[05:27:56] i would totally do something like that
[05:28:35] * jeremyb was imagining you could just take a small chunk of lcarr's
[05:28:43] but i wouldn't mind a full mysqld one either
[05:29:43] maybe i'll apply to go next year
[05:30:27] hong kong
[05:31:17] ooh. although $$$, so i wonder if the foundation will want to send fewer people
[05:32:16] it's a long time from now, i think you can't necessarily extrapolate from current behavior ;)
[05:33:10] my ticket was $26 and the foundation paid for it. (the wikibus; but i have to find my own way home)
[05:33:31] i'll probably find the future behavior slightly bewildering no matter what!
[05:33:33] where do you live?
[05:33:54] brooklyn, 1-2 miles from otto
[05:33:55] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[05:35:37] nice, i think i'd like to live in nyc for a year or two one day
[05:37:49] RECOVERY - mysqld processes on db1048 is OK: PROCS OK: 1 process with command name mysqld
[05:37:58] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[05:41:53] nyc rocks, (or it did when I lived there)
[05:43:24] good morning EEST
[05:45:38] oh I've been on here since around 7 am
[05:45:55] it's one of those days, very hot so you want to do all your work as early or as late as possible
[05:46:31] yeah ;)
[05:46:36] not here. 31 tomorrow
[05:46:56] google says you will have 37
[05:47:13] google lies, prolly around 40 or 41
[05:47:15] for like the next 4 days at least
[05:47:23] a nice return for faidon ;)
[05:47:28] they say that next tuesday the temp will drop
[05:47:42] a lot, and be around 31-32
[05:48:53] how long does wm run?
[05:49:52] apergos: til sunday w/ unconf or sat without
[05:50:08] apergos: but i mean from dc12 of course
[05:50:50] dc12?
[05:50:55] ah
[05:51:22] yeah he already had that scheduled
[05:52:06] * jeremyb remembers. but they conflicted any way you cut it ;( unless he pulled an asheesh
[05:52:24] last year was perfect
[05:53:37] ~3 full day buffer without missing a thing and a trip that takes <12 hrs (from dc11 -> wm2011)
[05:57:14] less than 12 hours is still a long trp
[05:57:47] well it was a ~5ish hr bus and then maybe 2 or 3 hr plane
[05:58:19] the plane ride sounds fine, the bus ride not so much
[05:58:29] with some bit of break. bus was chartered, small, kinda comfy and overnight. slept some
[05:58:42] I can't sleep on 'em
[05:58:48] so got in to TLV and it was still day
[06:05:10] it's good if youre trying to get oriented someplace new
[06:35:37] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[06:58:49] ACKNOWLEDGEMENT - Host srv206 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #241
[08:40:17] PROBLEM - Puppet freshness on mw24 is CRITICAL: Puppet has not run in the last 10 hours
[08:50:20] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[09:04:17] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[10:04:12] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[10:14:49] New review: Mark Bergsma; "The enable_geoiplookup part is fine now, thanks. But can you separate out the test_backend part? I'm..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/15445
[10:17:12] New review: Mark Bergsma; "Checking $realm and especially $instanceproject is not a good idea, unless really necessary. Why not..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/13304
[10:40:46] !log Depooled mobile varnish server cp1041 for vlan/hostname change and reinstallation with precise
[10:40:55] Logged the message, Master
[10:44:12] New patchset: Mark Bergsma; "Support both public and internal varnish servers for mobile" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15522
[10:44:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15522
[10:45:15] New patchset: Mark Bergsma; "Temporarily decommission cp1041" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15523
[10:45:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15523
[10:45:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15522
[10:46:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15523
[10:55:15] New patchset: Mark Bergsma; "Make Ubuntu Precise Pangolin the default install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15524
[10:55:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15524
[10:55:52] New patchset: Mark Bergsma; "Move cp1041 to internal" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15525
[10:56:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15525
[10:57:12] New patchset: Mark Bergsma; "Update pathprefix for Precise as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15526
[10:57:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15524
[10:57:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15526
[10:57:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15525
[10:58:06] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15526
[11:07:27] PROBLEM - Host cp1041 is DOWN: PING CRITICAL - Packet loss = 100%
[11:09:19] RECOVERY - Host cp1041 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms
[11:09:55] PROBLEM - NTP on cp1041 is CRITICAL: NTP CRITICAL: No response from NTP server
[11:12:37] PROBLEM - Host cp1041 is DOWN: PING CRITICAL - Packet loss = 100%
[11:16:22] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[11:17:05] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15519
[11:27:19] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[11:31:36] New patchset: Mark Bergsma; "Revert "Temporarily decommission cp1041"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15530
[11:32:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15530
[11:32:16] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15530
[11:51:55] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14874
[11:56:05] New patchset: Mark Bergsma; "Reinstall cp1041 with a proper Varnish partitioning scheme" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15532
[11:56:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15532
[11:57:11] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15532
[12:12:58] RECOVERY - Host cp1041 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms
[12:13:07] PROBLEM - Varnish HTTP mobile-frontend on cp1041 is CRITICAL: Connection refused
[12:13:34] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: Connection refused by host
[12:16:52] PROBLEM - Varnish HTCP daemon on cp1041 is CRITICAL: Connection refused by host
[12:17:28] PROBLEM - Varnish HTTP mobile-backend on cp1041 is CRITICAL: Connection refused
[12:20:41] New patchset: Mark Bergsma; "Fix invalid port range" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15533
[12:21:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15533
[12:21:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15533
[12:29:10] New patchset: Mark Bergsma; "Setup cache file systems" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15534
[12:29:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15534
[12:29:52] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15534
[12:34:54] New patchset: Mark Bergsma; "Fix persistent storage paths for Precise reinstalls" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15535
[12:35:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15535
[12:35:46] RECOVERY - NTP on cp1041 is OK: NTP OK: Offset -0.05004477501 secs
[12:35:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15535
[12:41:21] New patchset: Mark Bergsma; "Update mobile backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15536
[12:41:55] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15536
[12:46:25] PROBLEM - Host cp1041 is DOWN: PING CRITICAL - Packet loss = 100%
[12:47:01] RECOVERY - Host cp1041 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms
[13:02:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[13:03:23] RECOVERY - Varnish HTTP mobile-backend on cp1041 is OK: HTTP OK HTTP/1.1 200 OK - 696 bytes in 1.857 seconds
[13:07:53] PROBLEM - Varnish HTTP mobile-backend on cp1041 is CRITICAL: Connection refused
[13:45:26] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[13:51:08] RECOVERY - Varnish HTTP mobile-frontend on cp1041 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.063 seconds
[13:54:35] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[13:59:56] yep, its in the down section
[14:00:09] !log shutting down srv206 for chris per rt241
[14:00:15] wow, indeed old ticket
[14:00:21] Logged the message, Master
[14:00:30] cmjohnson1: you sure its not still down from when i shut it down for you the other day?
[14:00:37] its unresponisive, i imagine its still offline from then.
[14:01:11] nagios said it was down since 1d something
[14:01:27] well, nagios thinks its offline, so its not doin well
[14:01:29] its all yours
[14:01:42] you will possibly have to force a reboot with the front button
[14:06:08] PROBLEM - Varnish HTTP mobile-frontend on cp1041 is CRITICAL: Connection refused
[14:07:29] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[14:10:07] Change abandoned: Hashar; "will mount projectstorage.pmtpa.wmnet:/{$::instanceproject} instead." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14152
[14:11:13] squid on sq60 - Error: ERR_ZERO_SIZE_OBJECT when trying to upload, i see we had these in the past sometimes, but what to look for?
[14:12:26] "The remote server closed the connection before sending any data." so that would be srv193 being broken (test.wp)
[14:14:08] ganglia looks ok
[14:14:14] (for srv193)
[14:15:21] 14:09:30 srv193 apache2[10874]: [notice] child pid 22767 exit signal Segmentation fault (11)
[14:15:42] lovely
[14:15:44] apache children keep segfaulting
[14:16:08] is 193 the one faidon put the new php5 on and then someone reverted?
[14:16:19] sounds likely, its made for tests:)
[14:16:59] php5 5.3.2-2wm1
[14:18:09] jeremyb: it was installed on 2012-07-04
[14:19:25] yeah, he did
[14:19:31] I think Tim fixed it
[14:19:40] but segfaults since Jul 11 21:16:31 it looks
[14:19:47] lol
[14:21:14] so, slightly over 17 hrs
[14:21:16] Tim: on srv193: ran dpkg --set-selections to revert holds on php5 packages and ran apt-get upgrade
[14:21:26] but the problem started later ... hmmm
[14:21:43] Reedy: you here?
[14:21:47] in person
[14:21:50] No
[14:21:54] Unfortunately :(
[14:22:08] i thought i saw you registered pretty early
[14:22:17] !log apache children on srv193 keep segfaulting since yesterday (test.wp)
[14:22:20] Yeah, I did
[14:22:25] Logged the message, Master
[14:22:25] When I wasn't sure if I was going or not
[14:22:35] oh, the opposite
[14:22:35] Didn't see much point getting it refunded
[14:22:37] of me
[14:23:03] I registered a little over a week ago and I knew I was definitely going many months ago
[14:23:06] ;)
[14:31:01] New patchset: Hashar; "(bug 38084) uses /data/project instead of NFS instance" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15545
[14:31:02] ugh, jimmy wales says 12000 mbps == 12 terabit
[14:31:09] ;-(
[14:31:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15545
[14:33:46] Change abandoned: Hashar; "That patch was also in test as Id5632a34 and got merged when test has been merged in production as c..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11299
[14:39:23] New patchset: Hashar; "varnish config for bits.beta.wmflabs.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13304
[14:39:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13304
[14:40:09] New review: Hashar; "Patchset 7 uses scope.lookupvar:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13304
[14:45:35] RECOVERY - Varnish HTTP mobile-frontend on cp1041 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.063 seconds
[14:48:24] New patchset: Hashar; "varnish config for bits.beta.wmflabs.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13304
[14:49:00] New review: Hashar; "Patchset 8 fix a missing dot in the regex for beta.wmflabs.org" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13304
[14:49:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13304
[14:49:13] New review: Hashar; "Mark wrote:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13304
[14:49:28] mark: got a question for you on https://gerrit.wikimedia.org/r/#/c/13304/
[14:49:29] ;)
[14:51:40] New review: Hashar; "I am not sure what the backend part is about, but we definitely don't use it on labs hence why $test..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/15445
[14:55:28] New patchset: Mark Bergsma; "Replace ulimit -s (causing segfault) with thread_pool_stack" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15546
[14:56:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15546
[15:06:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15546
[15:08:59] RECOVERY - Varnish HTTP mobile-backend on cp1041 is OK: HTTP OK HTTP/1.1 200 OK - 698 bytes in 0.063 seconds
[15:19:20] New patchset: Mark Bergsma; "Use the new persistent storage code for cp1041" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15548
[15:19:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15548
[15:20:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15548
[15:34:56] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[15:38:59] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[15:55:24] mutante: any news? should we file a bug?
[16:02:16] New patchset: Hashar; "(bug 38299) Computer Modern fonts for math rendering" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15550
[16:02:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15550
[16:06:16] 12 16:05:15 < wikibugs> (NEW) apache segfaults on [[special:upload]] submit to testwiki - https://bugzilla.wikimedia.org/38364 normal; Wikimedia: General/Unknown; (bugzilla+org.wikimedia)
[16:09:58] New patchset: Mark Bergsma; "varnishhtcpd depends on liburi-perl installed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15551
[16:10:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15551
[16:12:32] New patchset: Catrope; "Move php5 Apache module out of site.pp into misc::noc-wikimedia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15552
[16:13:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15552
[16:24:59] RAWR
[16:25:25] Why are ALL fundraising Apache config files private?!
[16:36:53] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[16:44:03] cmjohnson1: just dead
[16:44:33] New patchset: Catrope; "Remove unused and obsolete techblog class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15553
[16:45:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15553
[16:54:01] New patchset: Cmjohnson; "adding es5-8 to the dhcpd file." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15555
[16:54:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15555
[17:11:51] RECOVERY - Host srv206 is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[17:12:24] notpeter: ping
[17:13:04] preilly: sup
[17:13:08] you want mergy merge?
[17:13:15] notpeter: can you merge merge https://gerrit.wikimedia.org/r/15556
[17:13:48] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15556
[17:15:00] PROBLEM - Apache HTTP on srv206 is CRITICAL: Connection refused
[17:18:43] preilly: puppet all run and such
[17:19:25] notpeter: okay is it live
[17:19:33] preilly: should be
[17:19:45] notpeter: okay cool
[17:22:30] RECOVERY - Apache HTTP on srv206 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.286 second response time
[17:31:27] New patchset: Dzahn; "get the lost wikistats class back in, add a role class and case for prod/labs, do not wrap classes, define instead of include, apache ports.conf file,..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15557
[17:32:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15557
[17:34:28] hey, does anyone know what our memcached evictions look like? I'm trying to figure out, if something gets saved in memcache and then I look for it again some time later, what are the chances it will be there...
[17:35:10] New patchset: Dzahn; "get the lost wikistats class back in, add a role class and case for prod/labs, do not wrap classes, define instead of include, apache ports.conf file,..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15557
[17:35:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15557
[17:36:33] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15557
[18:37:09] New patchset: Hashar; "(bug 38299) Computer Modern fonts for math rendering" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15550
[18:37:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15550
[18:40:54] PROBLEM - Puppet freshness on mw24 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:47] is there something wrong with apache/php on testwiki by chance? getting errors on edit:
[18:41:48] Request: GET http://test.wikipedia.org/w/index.php?title=File:Barrio_Santa_Rosa_1342117308397.jpeg&action=edit, from 10.64.0.124 via cp1014.eqiad.wmnet (squid/2.7.STABLE9) to 10.0.2.193 (10.0.2.193)
[18:41:48] Error: ERR_ZERO_SIZE_OBJECT, errno [No Error] at Thu, 12 Jul 2012 18:40:31 GMT
[18:50:48] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[19:05:07] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[19:14:44] New patchset: Catrope; "Clean up the mess that is SSL certificate installation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15561
[19:15:14] New patchset: Catrope; "Actually use misc::secure to run secure.wikimedia.org on singer" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15458
[19:15:47] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/15561
[19:15:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15458
[19:20:18] notpeter: https://gerrit.wikimedia.org/r/#/c/15562/
[19:21:22] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15562
[19:38:24] New review: Hashar; "You can install a local puppet syntax checker using Ruby gem. Something like:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/15561
[20:05:07] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[20:29:03] Ms. LeslieCarr,
[20:29:09] i have a quick favor to ask of you, if you got a sec
[20:29:14] should be easy :)
[20:29:17] hehehe
[20:29:22] another code review and push? ;)
[20:29:28] sorta, but to private repo
[20:29:40] I want to add a product id to GeoIP.conf
[20:29:54] to make puppet automatically download and install the GeoIPRegion.dat file
[20:30:03] we currently aren't doing this, but should be
[20:30:06] we already have the license for it
[20:30:21] if I email you the GeoIP.conf file
[20:30:32] yeah, sounds good
[20:30:32] all you should ahve to do is commit it to the private puppet repo
[20:30:40] it already exists there, i'll just email you the whole as a replacement
[20:30:48] yeah, please don't paste it in email ;)
[20:30:50] ah new key ?
[20:30:54] i mean
[20:30:57] don't paste in irc
[20:30:57] naw, just new product ID
[20:30:59] yeah
[20:31:04] i will attach in email, want me to encrypt it?
[20:31:08] you got a gpg public?
[20:32:41] or I can just email you the bottom portion with the product ids
[20:32:45] and you can paste them in
[20:32:47] those are not sensitive
[20:34:12] i think email is ok
[20:34:30] it's secure enough for a not the most insane secret thing
[20:34:31] or actually
[20:35:19] that last presenter said our mail servers were in the EU
[20:35:28] LeslieCarr: did you hear him say that?
[20:35:47] i thought i heard it, but wasnt sure so didnt chime in.
[20:36:38] really ?
[20:36:49] i didn't think so but could have been not listening well enough
[20:37:59] when he suggested we house media not on us soil, he said we had non us soil servers already , and i think he mentioned mail
[20:38:20] that was true in 2007, not now and not for a long time
[20:38:39] ohwell
[20:42:27] ah yeah
[20:43:01] i really would have loved to have some international experts on copyright
[20:43:09] cuz he even said he didn't know much abotu the us rights
[20:43:31] PROBLEM - Host db48 is DOWN: PING CRITICAL - Packet loss = 100%
[20:44:05] uhoh, db48 ...
[20:45:01] RECOVERY - Host db48 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms
[20:45:57] weird, it looks fine....
[20:54:21] i had gotten it
[21:05:14] binasher: ping
[21:05:15] http://pastebin.com/raw.php?i=yv27xgwE
[21:05:35] jeremyb: ongoing?
[21:05:36] the backup's long done already, right?
[21:05:45] binasher: apparently. /j #wikimedia-otrs
[21:05:57] binasher: i haven't seen it myself, and am kinda busy
[21:06:32] jeremyb: the db was down for a couple minutes, should not be an ongoing problem
[21:07:15] binasher: could you go idle there for some minutes?
[21:07:39] sure
[21:08:32] binasher: danke
[21:16:58] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[21:24:27] I'm listening to a talk about Xen right now
[21:24:42] ignoring the fact that it's Xen, this is an interesting concept: http://wiki.xen.org/wiki/Xen_Document_Days
[21:24:57] looking
[21:25:12] they devote last Monday of each month for documentation
[21:25:24] everyone works on that and they say it works well for them
[21:26:10] for people who hate working on docs, it might work out
[21:26:38] * paravoid coughs
[21:27:08] :-)
[21:27:55] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[21:34:41] New patchset: Hashar; "enhance rakefile for easy validation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15568
[21:35:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15568
[21:37:37] Change abandoned: Asher; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15503
[21:39:19] New patchset: Asher; "otrs is stuffing large email attachemnts into blobs, which was breaking replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15569
[21:39:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15569
[21:39:55] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15569
[21:53:06] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[22:12:27] New patchset: Alex Monk; "(bug 38361) Enable Quiz extension on viwikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15570
[22:48:15] New patchset: Asher; "renaming otrsdb cluster name to m2, adding db1046 as as slave and db1048 as a master" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15571
[22:48:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15571
[22:55:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15571
[23:03:28] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[23:46:31] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours