[00:20:21] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 193 seconds
[00:21:06] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 186 seconds
[00:23:30] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[00:34:54] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 3 seconds
[00:35:39] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 20 seconds
[00:43:09] New patchset: Thehelpfulone; "meta sysop +/- transadmin self,crat -transadmin" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9333
[00:43:15] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9333
[00:43:40] it worked?! :D
[00:43:56] Reedy: ^ is that okay in the description or does it need to be on the first line?
[00:50:39] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 186 seconds
[00:51:33] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 238 seconds
[00:53:24] New review: Jeremyb; "Looks good. not touching $wgAddGroups because that's already set for crats in CommonSettings" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/9333
[01:00:42] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 195 seconds
[01:01:27] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 205 seconds
[01:05:39] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 211 seconds
[01:17:03] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[01:17:39] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 17 seconds
[01:40:45] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[01:44:57] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[02:44:30] PROBLEM - Puppet freshness on srv192 is CRITICAL: Puppet has not run in the last 10 hours
[02:49:26] New patchset: Aaron Schulz; "Purge from squid all thumbs in Swift on purge." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9355
[02:49:31] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9355
[02:50:30] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[02:56:30] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours
[03:00:33] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours
[03:00:33] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours
[03:01:54] RECOVERY - Puppet freshness on srv192 is OK: puppet ran at Wed May 30 03:01:50 UTC 2012
[03:07:27] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours
[03:07:27] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[03:07:27] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours
[03:18:19] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[04:02:08] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[04:56:17] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 184 seconds
[04:57:11] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 195 seconds
[05:15:47] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 1 seconds
[05:16:32] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 1 seconds
[05:44:55] New review: Hashar; "Why not 12 ? ;-D" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/9130
[06:03:59] running sync-dir docroot/mediawiki/xml '{{bug|37111}} deploying export-0.7.xsd'
[06:04:04] not sure IRC notification has been done there since I was not connected
[06:06:30] hashar: but you're in the channel?
[06:06:41] 30 06:05:03 <+logmsgbot> !log hashar synchronized docroot/mediawiki/xml '{{bug|37111}} deploying export-0.7.xsd'
[06:06:44] 30 06:05:08 <+morebots> Logged the message, Master
[06:07:02] also, bonjour!
[06:07:13] yeah I have connected
[06:07:22] grr
[06:07:27] i mean before those msgs
[06:07:46] nop I was not
[06:07:49] that was logmsgbot
[06:07:56] 30 06:03:35 -!- hashar [~sempitern@mediawiki/hashar] has joined #wikimedia-tech
[06:08:01] look at the timestamps
[06:08:19] right
[06:15:22] well it worked
[06:15:27] now I will finish my breakfast :)))
[07:32:20] hello
[07:46:53] apergos: we have a new XML export schema :-D
[07:46:55] 0.7
[07:46:55] http://www.mediawiki.org/xml/export-0.7/
[07:47:16] riiight
[07:47:37] good morning ;)
[07:47:43] I hope I did not wake you up
[07:47:47] no
[07:47:47] by pinging ya on IRC
[07:47:51] \O/
[07:48:03] I've been online a couple hours already
[07:48:08] more than that actually
[07:48:36] what's the diff to the previous schema?
[07:48:45] for reference : https://bugzilla.wikimedia.org/4220 Unique identity constraints for XML dump format schema
[07:48:47] finding you the diff
[07:49:14] https://gerrit.wikimedia.org/r/#/c/8889/ || https://gerrit.wikimedia.org/r/gitweb?p=mediawiki%2Fcore.git;a=commit;h=d2e8dd6251552aa5e9c0a35eb94f2eaa91a5d42a
[07:49:18] sha1 : d2e8dd6
[07:49:30] ohh
[07:49:33] there is no diff sorry
[07:49:54] I see that no one replied to my concern
[07:50:00] so you have to do something like: diff -u docs/export-0.{6,7}.xsd
[07:50:06] it was just overlooked, patched and merged
[07:50:44] ohhh
[07:50:54] I think I have just read the first sentence of your comment 7
[07:51:08] I bring it up because such a proposal has been floated around in the past
[07:52:07] so I would like folks to think aobut it and discuss
[07:53:36] so should I just revert my change ?
[07:53:49] or write a mail to wikitech-l somewhere announcing 0.7 schema maybe
[07:53:51] it's in trunk right now, not the various wmf branches right?
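[Editor's aside, not part of the log: the check apergos runs a few lines later (`git branch --contains d2e8dd6`) answers the question above — which branches already contain the schema commit. A self-contained sketch of that check against a hypothetical throwaway repo; the repo, commit message, and branch name are invented for illustration, and `git init -b` assumes git >= 2.28.]

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repo standing in for a mediawiki/core clone.
repo=$(mktemp -d)
cd "$repo"
git init -q -b master

# One empty commit, standing in for the d2e8dd6 schema change.
git -c user.name=example -c user.email=example@example.invalid \
    commit -q --allow-empty -m "bump XML export schema to 0.7"
sha=$(git rev-parse --short HEAD)

# List every local branch whose history includes $sha; if only
# master shows up, no release/wmf branch has the change yet.
git branch --contains "$sha"
```

On the real clone the same `git branch --contains <sha>` (optionally with `-r` for remote-tracking branches) shows whether a merge has reached any deployment branch.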
[07:54:03] yup only master for now
[07:54:08] yes, master
[07:54:10] hopefully
[07:54:15] I'm going to be calling it trunk for years :-p
[07:55:03] $ git branch --contains d2e8dd6
[07:55:03] * master
[07:55:04] so I'd leave it for now but if you're not getting any feedback on bugzilla (which is appears you aren't) please drop an email to xmldatadumps-l and to wikitech-l
[07:55:04] $
[07:55:05] \O/
[07:55:49] nothing huge, just asking for commentary
[07:56:10] I have sent myself a remember
[07:56:14] ok cool
[07:57:07] so I have learned from this (yet again) that people don't read what I write, even if it's very short. :-(
[07:57:39] that is why I have just merged the change
[07:57:47] if 0.7 cause any trouble, we can still create a new 0.8
[07:57:55] right
[07:57:56] and add a note in 0.7 how it is obsolete / should not be used or something
[07:58:16] though to be honest, I should probably have asked first before merging ;-]]
[07:59:13] ok well next time
[08:11:04] New patchset: ArielGlenn; "fix indentation with verbose (this itme, remotedict creation)" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/9362
[08:12:00] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9362
[08:12:02] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/9362
[09:16:09] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[09:16:09] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours
[09:16:09] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[09:16:09] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[09:38:24] New patchset: Pyoungmeister; "fixing typo in mobile.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9366
[09:38:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9366
[09:40:47] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9366
[09:40:49] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9366
[09:44:40] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 182 seconds
[09:45:25] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 186 seconds
[09:58:46] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[09:59:55] New patchset: Pyoungmeister; "turning down log level on lucene log file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9367
[10:00:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9367
[10:00:29] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9367
[10:00:32] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9367
[10:05:22] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds
[10:06:07] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 2 seconds
[10:08:04] New patchset: Bhartshorne; "making the owa servers join the pmtpa-test swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9368
[10:08:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9368
[10:08:28] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9368
[10:08:31] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9368
[10:10:19] RECOVERY - Puppet freshness on owa3 is OK: puppet ran at Wed May 30 10:09:47 UTC 2012
[10:10:19] RECOVERY - Puppet freshness on owa1 is OK: puppet ran at Wed May 30 10:09:51 UTC 2012
[10:10:19] RECOVERY - Puppet freshness on owa2 is OK: puppet ran at Wed May 30 10:09:52 UTC 2012
[10:15:03] New patchset: Bhartshorne; "continuing creation of the owa / ms pmtpa-test cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9369
[10:15:23] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9369
[10:15:23] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9369
[10:24:24] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[10:33:52] New patchset: Asher; "make maxClauseCount configurable" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9370
[10:34:13] notpeter: ^^
[10:42:56] New patchset: Bhartshorne; "creating accounts for SwiftStack contractors, giving them access + sudo to pmtpa-test swift cluster. Note these accounts are missing ssh keys so won't work yet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9372
[10:43:18] New patchset: Bhartshorne; "opening ssh to the public internet to the swift pmtpa and eqiad test clusters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9373
[10:43:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9372
[10:43:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9373
[10:45:35] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9372
[10:45:38] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9372
[10:45:46] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9373
[10:45:48] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9373
[10:54:25] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 183 seconds
[10:54:25] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 183 seconds
[11:26:31] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9370
[11:29:14] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9370
[11:29:33] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9370
[11:29:35] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9370
[11:56:13] New patchset: Mark Bergsma; "Replace deprecated module md5 by hashlib" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9375
[11:56:13] New patchset: Mark Bergsma; "Remove umask(0); unnecessary and creates world writeable log files" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9376
[11:56:14] New patchset: Mark Bergsma; "Replace all bare try except: statements by try except Exception:" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9377
[11:56:15] New patchset: Mark Bergsma; "Use a more specific exception on module import" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9378
[11:56:15] New patchset: Mark Bergsma; "Remove CVS $Id$ headers, update copyright notices" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9379
[11:58:33] New patchset: Mark Bergsma; "Fix hashlib usage, .update() doesn't return self" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9380
[11:58:55] New review: Mark Bergsma; "Broken, but fixed in a subsequent commit" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9375
[11:58:57] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9375
[11:59:21] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9376
[11:59:22] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9376
[11:59:57] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9377
[11:59:59] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9377
[12:00:33] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9378
[12:00:34] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9378
[12:00:56] !log restarting pdns on ns2
[12:00:59] Logged the message, Master
[12:01:20] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9379
[12:01:22] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9379
[12:01:47] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9380
[12:01:49] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/9380
[12:05:31] New patchset: Dzahn; "analytics partman recipe - don't use logical partition if you say primary before" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9381
[12:05:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9381
[12:06:15] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9381
[12:06:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9381
[12:10:23] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds
[12:10:41] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[12:11:26] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours
[12:11:32] New patchset: Hashar; "wgLoadScript is only used on production cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9383
[12:11:38] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9383
[12:19:29] New patchset: Hashar; "wgHTCPMulticast* is only used on pmtpa cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9384
[12:19:35] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9384
[12:20:06] New patchset: Pyoungmeister; "adding debian dir" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9385
[12:27:42] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9385
[12:27:44] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9385
[12:31:34] New patchset: Pyoungmeister; "wrong format" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9386
[12:32:12] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9386
[12:32:14] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9386
[12:32:29] New patchset: Dzahn; "need to separate analytics partman - cisco vs. dell" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9387
[12:32:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9387
[12:33:28] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9387
[12:33:30] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9387
[12:43:24] fcking php packages
[12:43:30] take like an hour and a half to build
[12:43:53] and in the new versions the build-dep on mysql-server and actually SPAWN A MYSQL SERVER during build time
[12:43:56] to do tests
[12:51:55] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[12:52:19] paravoid: hehe
[12:52:31] the varnish build actually does varnish testing and spawns up tons of varnish instances
[12:52:35] and sometimes they fail, especially in labs
[12:52:40] because of timeouts or whatever
[12:52:43] and then your build fails
[12:52:45] oh I know, I've fixed a bug in the test suite at some point
[12:52:46] pretty annoying as well
[12:53:00] it even has its own nice DSL just for testing
[12:53:03] yep
[12:53:13] it's a cool concept, but sometimes in your way ;)
[12:57:46] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours
[13:01:49] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours
[13:01:49] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours
[13:08:52] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours
[13:08:52] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours
[13:08:52] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[13:18:55] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[13:29:52] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[13:30:28] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
[13:39:01] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 194 seconds
[13:39:19] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 190 seconds
[13:47:54] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 204 seconds
[13:58:26] paravoid / mark, https://test.wikipedia.org/w/index.php?title=Special:CentralNotice&method=listNoticeDetail&notice=POTY+Test+Campaign+01 who should I tell about that error?
[13:59:04] no idea :-)
[14:03:30] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[14:05:18] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 4 seconds
[14:05:27] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds
[15:41:16] New patchset: Pyoungmeister; "configure file is incompatible with building debian package." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9400
[15:42:21] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 201 seconds
[15:42:42] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 200 seconds
[15:43:26] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9400
[15:43:29] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9400
[15:49:00] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 195 seconds
[15:50:03] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 229 seconds
[15:53:08] New patchset: Pyoungmeister; "updating changelog for precise" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9402
[15:54:14] New review: Pyoungmeister; "(no comment)" [operations/debs/lucene-search-2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9402
[15:54:16] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/9402
[16:13:15] New patchset: Bhartshorne; "inserting darrell's key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9403
[16:13:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9403
[16:14:06] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9403
[16:14:08] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9403
[16:18:05] New patchset: Bhartshorne; "damned duped identifiers." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9404
[16:18:26] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9404
[16:18:26] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9404
[16:35:26] binasher: ping?
[16:35:30] RECOVERY - Puppet freshness on analytics1001 is OK: puppet ran at Wed May 30 16:35:20 UTC 2012
[16:37:14] so, disabling the suhosin patch in lucid seems to make it fail to build
[16:37:17] something about a missing header, no idea why it would present by disabling suhosin
[16:37:19] I can debug it and fix it
[16:37:21] but otoh the precise built fine without suhosin (there's even support for that in debian/rules)
[16:37:22] what's the process of disabling it?
[16:37:22] i wonder if a later patch references parts of it
[16:37:22] probably
[16:37:30] I was wondering if you could your benchmarks with precise
[16:37:34] maybe check for that header in other patches/ files
[16:37:39] or if it's an requirement to use lucid
[16:37:43] and hence I should spent time on it
[16:37:51] spend even
[16:38:50] i was going to run evil stealing a server from prod benchmarks
[16:39:00] but i could do something else
[16:39:33] New patchset: Bhartshorne; "one more try to get SwiftStack user accounts on the hw test cluster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9405
[16:39:38] I made wikimedia-app-server install on precise the other day
[16:39:42] forward-ported some packages
[16:39:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9405
[16:39:59] but I see your point
[16:40:01] so I'll give it a try
[16:40:49] it shouldn't be too hard to get mediawiki working on precise with prod config files if its just for quick and messy testing purposes
[16:41:03] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9405
[16:41:06] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9405
[16:42:27] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7852
[16:43:57] New patchset: Lcarr; "removing old classes from searchindexer" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9407
[16:44:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9407
[16:44:21] RECOVERY - Lucene on search21 is OK: TCP OK - 0.001 second response time on port 8123
[16:44:40] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9407
[16:44:43] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9407
[16:51:26] New patchset: Lcarr; "adding analytics1001 into decom to prevent duplicates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9408
[16:51:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9408
[16:54:00] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9408
[16:54:03] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9408
[17:10:36] RECOVERY - MySQL Idle Transactions on bellin is OK: OK longest blocking idle transaction sleeps for seconds
[17:10:45] RECOVERY - MySQL Replication Heartbeat on bellin is OK: OK replication delay seconds
[17:10:45] RECOVERY - MySQL Slave Delay on bellin is OK: OK replication delay seconds
[17:10:45] RECOVERY - Full LVS Snapshot on bellin is OK: OK no full LVM snapshot volumes
[17:10:45] RECOVERY - MySQL Slave Running on bellin is OK: OK replication
[17:10:45] RECOVERY - MySQL disk space on bellin is OK: DISK OK
[17:10:46] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[17:11:39] RECOVERY - MySQL Recent Restart on bellin is OK: OK seconds since restart
[17:14:39] New patchset: Lcarr; "removing analytics1001 from decom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9411
[17:15:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9411
[17:15:25] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9411
[17:15:28] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9411
[17:30:51] PROBLEM - NTP on bellin is CRITICAL: NTP CRITICAL: Offset unknown
[17:40:20] New patchset: Lcarr; "fixing submit_check_result for passive checks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9416
[17:40:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9416
[17:42:59] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9416
[17:43:01] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9416
[17:44:21] PROBLEM - Host mw57 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:21] PROBLEM - Host mw52 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:21] PROBLEM - Host mw55 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:21] PROBLEM - Host mw53 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:21] PROBLEM - Host mw51 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:22] PROBLEM - Host mw56 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:22] PROBLEM - Host mw54 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:29] ...
[17:44:30] PROBLEM - Host mw47 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:30] PROBLEM - Host mw46 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:39] PROBLEM - Host mw37 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:39] PROBLEM - Host mw29 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:39] PROBLEM - Host mw41 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:39] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:39] PROBLEM - Host mw34 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:40] PROBLEM - Host mw35 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:40] PROBLEM - Host mw48 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:41] PROBLEM - Host mw45 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:48] PROBLEM - Host mw40 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:48] PROBLEM - Host mw33 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:48] PROBLEM - Host mw39 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:48] PROBLEM - Host mw42 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:48] PROBLEM - Host mw44 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:49] PROBLEM - Host mw38 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:49] I take it you guys noticed that some servers seem unresponsive?
[17:44:54] Stupid nagios
[17:44:57] PROBLEM - Host mw36 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:57] PROBLEM - Host mw30 is DOWN: PING CRITICAL - Packet loss = 100%
[17:44:57] Ping is fine
[17:45:01] LeslieCarr: ^^
[17:45:06] PROBLEM - Host mw32 is DOWN: PING CRITICAL - Packet loss = 100%
[17:45:13] mobile varnish is throwing 503s
[17:45:13] hrm
[17:45:16] sounds like a spence issue
[17:45:21] !log ganglia uploaded backported ganglia 3.3.5 deb package to precise-wikimedia repo
[17:45:24] oh mobile varnish down ?
[17:45:25] Logged the message, Master
[17:45:26] !log ganglia uploaded backported ganglia 3.3.5 deb package to precise-wikimedia repo
[17:45:30] Logged the message, Master
[17:45:32] LeslieCarr: something's weird
[17:45:33] RECOVERY - Host mw30 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[17:45:33] RECOVERY - Host mw38 is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms
[17:45:33] RECOVERY - Host mw29 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[17:45:33] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:45:41] * Reedy kicks nagios-wm
[17:45:42] PROBLEM - Apache HTTP on mw17 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:42] PROBLEM - Apache HTTP on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:42] PROBLEM - Apache HTTP on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:42] PROBLEM - Apache HTTP on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:42] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:43] RECOVERY - Host mw48 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:45:43] RECOVERY - Host mw46 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[17:45:43] need me to clear mobile cache ?
[17:45:44] RECOVERY - Host mw47 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[17:45:44] RECOVERY - Host mw41 is UP: PING OK - Packet loss = 0%, RTA = 0.52 ms
[17:45:45] RECOVERY - Host mw35 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms
[17:45:45] RECOVERY - Host mw37 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[17:45:46] RECOVERY - Host mw52 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms
[17:45:46] RECOVERY - Host mw56 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:45:47] RECOVERY - Host mw51 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms
[17:45:52] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:52] PROBLEM - Apache HTTP on srv205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:52] PROBLEM - Apache HTTP on srv213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:52] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:52] PROBLEM - Apache HTTP on srv194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:53] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:53] RECOVERY - Host mw53 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[17:45:53] im getting project home pages but Special:Random results in 503
[17:45:54] RECOVERY - Host mw39 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:45:54] RECOVERY - Host mw33 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[17:45:55] RECOVERY - Host mw45 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:45:55] RECOVERY - Host mw34 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[17:45:57] !log ganglia uploaded backported ganglia 3.3.5 deb package to precise-wikimedia repository
[17:46:00] PROBLEM - Apache HTTP on srv198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:00] PROBLEM - Apache HTTP on srv210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:00] PROBLEM - Apache HTTP on mw11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:00] PROBLEM - Apache HTTP on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:00] RECOVERY - Host mw55 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[17:46:01] Logged the message, Master
[17:46:01] RECOVERY - Host mw44 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:46:01] RECOVERY - Host mw42 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms
[17:46:02] RECOVERY - Host mw57 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[17:46:02] RECOVERY - Host mw36 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[17:46:09] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:09] PROBLEM - Apache HTTP on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:09] RECOVERY - Host mw54 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[17:46:09] RECOVERY - Host mw32 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[17:46:18] PROBLEM - Apache HTTP on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:18] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:18] PROBLEM - Apache HTTP on srv195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:18] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:18] PROBLEM - Apache HTTP on mw14 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:19] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:19] PROBLEM - Apache HTTP on srv268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:20] PROBLEM - Apache HTTP on mw7 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:27] PROBLEM - Apache HTTP on srv247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:27] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:27] PROBLEM - Apache HTTP on srv237 is CRITICAL: CRITICAL - Socket
timeout after 10 seconds [17:46:27] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:27] PROBLEM - Apache HTTP on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:28] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:28] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:29] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:29] LeslieCarr maybe that would help? [17:46:34] i dunno [17:46:36] PROBLEM - Apache HTTP on srv190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:36] PROBLEM - Apache HTTP on srv207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:36] PROBLEM - Apache HTTP on srv228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:36] PROBLEM - Apache HTTP on srv243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:36] PROBLEM - Apache HTTP on srv233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:37] PROBLEM - Apache HTTP on srv225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:37] PROBLEM - Apache HTTP on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:38] RECOVERY - Host mw40 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [17:46:39] spence is doing a lot of freaking out due to max service checks ... 
[17:46:45] PROBLEM - Apache HTTP on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:45] PROBLEM - Apache HTTP on mw15 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:45] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:45] PROBLEM - Apache HTTP on srv231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:45] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:46] PROBLEM - Apache HTTP on srv244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:46] PROBLEM - Apache HTTP on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:47] PROBLEM - Apache HTTP on srv282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:47] PROBLEM - Apache HTTP on srv203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:48] PROBLEM - Apache HTTP on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:48] oh actually, lemme check something ... [17:46:53] People are reporting issues though... 
[17:46:54] PROBLEM - Apache HTTP on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:54] PROBLEM - Apache HTTP on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:54] PROBLEM - Apache HTTP on srv240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:54] PROBLEM - Apache HTTP on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:54] PROBLEM - Apache HTTP on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:55] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:55] PROBLEM - Apache HTTP on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:56] PROBLEM - LVS HTTP on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:56] PROBLEM - Apache HTTP on mw16 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:57] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:57] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:58] PROBLEM - Apache HTTP on srv238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:03] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [17:47:03] PROBLEM - Apache HTTP on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:04] PROBLEM - Apache HTTP on srv197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:04] PROBLEM - Apache HTTP on srv241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:04] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:04] PROBLEM - Apache HTTP on srv236 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:04] PROBLEM - Apache HTTP on srv199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:05] PROBLEM - Apache HTTP on srv246 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [17:47:05] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:12] RECOVERY - Apache HTTP on srv194 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.422 second response time [17:47:12] RECOVERY - Apache HTTP on srv196 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.480 second response time [17:47:12] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:12] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:12] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:13] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.110 second response time [17:47:13] RECOVERY - Apache HTTP on srv205 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.302 second response time [17:47:21] RECOVERY - Apache HTTP on srv210 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [17:47:21] RECOVERY - Apache HTTP on srv198 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [17:47:27] wha..? 
[17:47:30] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [17:47:30] RECOVERY - Apache HTTP on srv212 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.065 second response time [17:47:39] RECOVERY - Apache HTTP on srv195 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:47:39] RECOVERY - Apache HTTP on srv277 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:47:39] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [17:47:39] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:47:39] RECOVERY - Apache HTTP on srv268 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [17:47:48] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:47:48] RECOVERY - Apache HTTP on srv237 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:47:48] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.053 second response time [17:47:48] RECOVERY - Apache HTTP on srv275 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [17:47:48] RECOVERY - Apache HTTP on srv229 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:47:49] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [17:47:49] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [17:47:50] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [17:47:57] RECOVERY - Apache HTTP on srv207 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:47:57] RECOVERY - Apache HTTP on srv233 is OK: HTTP OK - 
HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:47:57] RECOVERY - Apache HTTP on srv243 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:47:57] RECOVERY - Apache HTTP on srv228 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:47:57] RECOVERY - Apache HTTP on srv225 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:47:58] RECOVERY - Apache HTTP on srv190 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:47:58] RECOVERY - Apache HTTP on srv289 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [17:47:59] !log cleared mobile varnish cache [17:48:02] Logged the message, Mistress of the network gear. [17:48:06] RECOVERY - Apache HTTP on srv231 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:48:06] RECOVERY - Apache HTTP on srv226 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [17:48:06] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.021 second response time [17:48:06] RECOVERY - Apache HTTP on srv282 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:48:06] RECOVERY - Apache HTTP on srv235 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:48:06] awjr: looking better ? 
[17:48:07] RECOVERY - Apache HTTP on srv259 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [17:48:07] RECOVERY - Apache HTTP on srv244 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:48:08] RECOVERY - Apache HTTP on srv203 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:48:12] LeslieCarr: yeah, 503s are gone [17:48:15] RECOVERY - Apache HTTP on srv288 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [17:48:15] RECOVERY - Apache HTTP on srv285 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.021 second response time [17:48:15] RECOVERY - Apache HTTP on srv240 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [17:48:15] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [17:48:15] RECOVERY - Apache HTTP on srv280 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:48:15] cool [17:48:16] RECOVERY - Apache HTTP on srv238 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [17:48:16] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:48:17] RECOVERY - LVS HTTP on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 58117 bytes in 0.208 seconds [17:48:17] RECOVERY - Apache HTTP on srv274 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.020 second response time [17:48:18] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [17:48:18] RECOVERY - Apache HTTP on mw16 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.924 second response time [17:48:19] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [17:48:19] RECOVERY - Apache HTTP on srv197 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time 
[17:48:20] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:48:20] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [17:48:21] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [17:48:23] hah [17:48:24] RECOVERY - Apache HTTP on srv246 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:48:24] RECOVERY - Apache HTTP on srv199 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [17:48:24] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:48:24] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [17:48:24] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:48:25] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.905 second response time [17:48:27] LeslieCarr: did you just clear the cache or was it something else? 
[17:48:33] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:48:33] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:48:39] awjr i just cleared cache [17:48:42] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.309 second response time [17:48:51] RECOVERY - Apache HTTP on mw1 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.942 second response time [17:48:57] LeslieCarr: interesting ok - i wonder what it was, it was affecting multiple projects i tried [17:49:01] thanks LeslieCarr [17:49:36] RECOVERY - Apache HTTP on mw2 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:49:54] RECOVERY - Apache HTTP on mw12 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:49:54] RECOVERY - Apache HTTP on mw13 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.257 second response time [17:50:30] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time [17:50:30] RECOVERY - Apache HTTP on mw14 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.851 second response time [17:50:30] RECOVERY - Apache HTTP on mw7 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.716 second response time [17:50:57] RECOVERY - Apache HTTP on mw3 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [17:50:57] RECOVERY - Apache HTTP on mw15 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [17:51:11] argh [17:51:24] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [17:51:42] RECOVERY - Apache HTTP on mw11 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:52:20] hrm [17:53:52] hey guys, something is kinda funny with a couple of the disks on the analytics machines [17:53:54] i'm not sure what 
[17:54:15] sda and sdb are (afaik) unformatted and unpartitioned [17:54:17] bit [17:54:18] but [17:54:26] # fdisk /dev/sda [17:54:26] Unable to open /dev/sda [17:54:40] sdc and sdd are formatted and partitioned [17:54:43] they are fine [17:54:46] and the rest of the disks [17:54:50] sde, sdf, etc. [17:54:51] are fine too [17:55:00] they are unformatted and unpartitioned [17:55:29] I probably need mutante's help on this one… but I thought I'd ask in case someone saw something dumb I was doing [18:01:27] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 18 seconds [18:01:49] LeslieCarr, maybe since you've been in there recently, you wouldn't mind taking a quick look for me? [18:04:54] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 30 seconds [18:05:34] weird [18:05:46] ottomata: the ciscos and partman are like a horrible horrible nightmare [18:05:54] hm [18:06:08] well, hm [18:06:21] afaik, the partman stuff mutante was doing was just for the OS disks [18:06:29] which it looks like he set up on sdc and sdd [18:06:36] using md software raid [18:06:47] the other disks were supposed to be left completely unformatted and unpartitioned [18:06:48] and we do that ourselves [18:06:54] hrm [18:07:00] why on sdc and sdd ? [18:07:04] which is why sde, sdf, etc. are good [18:07:06] no idea actually [18:07:09] i would have done sda,sdb [18:07:13] and left the rest for us [18:08:04] yeah [18:08:09] i am assuming that was unintentional [18:09:26] ok, so something is definitely weird then, right? [18:11:23] if you don't know what's happening, then I will send an email to daniel seeing if he can check it out [18:17:43] definitely weird [18:18:23] New patchset: Lcarr; "fixing nagios.cmd file location" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9417 [18:18:44] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9417 [18:19:22] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9417 [18:19:24] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9417 [18:23:17] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 26.44 ms [18:25:58] bla [18:26:16] gadgets extension seems to have died on en.wiki [18:36:29] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [18:42:57] New patchset: Lcarr; "fixing up file locations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9421 [18:43:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9421 [18:44:15] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9421 [18:44:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9421 [18:55:02] New patchset: Lcarr; "fixing snmptrapd.conf in icinga (i hope)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9423 [18:55:22] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9423 [18:55:43] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9423 [18:55:45] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9423 [19:04:23] New patchset: Hashar; "Always exclude webVideoTranscode jobs from queue processing" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9424 [19:04:29] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9424 [19:09:31] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9424 [19:10:32] New review: Hashar; "Thanks" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9424 [19:10:34] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9424 [19:11:14] !log rebooting neon [19:11:18] Logged the message, Mistress of the network gear. [19:17:35] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [19:17:35] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours [19:17:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [19:17:35] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [19:25:44] New patchset: Lcarr; "fixing interpreter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9427 [19:26:05] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9427 [19:27:20] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9427 [19:27:23] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9427 [19:32:19] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 224 seconds [19:32:55] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 216 seconds [19:41:09] Lol, gerrit doesn't seem very happy [19:42:04] PROBLEM - Host locke is DOWN: CRITICAL - Host Unreachable (208.80.152.138) [19:42:40] PROBLEM - Host ssl3 is DOWN: PING CRITICAL - Packet loss = 100% [19:42:49] PROBLEM - Host ssl4 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:07] PROBLEM - Host es4 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:07] PROBLEM - Host es3 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:07] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:07] PROBLEM - Host ms-be1 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:16] PROBLEM - Host db60 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:16] PROBLEM - Host ms-fe1 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:16] PROBLEM - Host db9 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:25] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:30] LeslieCarr: paravoid apergos ^^ [19:43:43] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds [19:43:50] (Cannot contact the database server: Unknown error (10.0.0.227)) [19:44:19] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [19:44:27] es3 [19:44:28] RECOVERY - Host ms-fe1 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms [19:44:28] RECOVERY - Host es4 is UP: PING OK - Packet loss = 0%, RTA = 1.53 ms [19:44:28] RECOVERY - Host ssl3 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [19:44:28] RECOVERY - Host 
labstore3 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [19:44:28] RECOVERY - Host db9 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [19:44:29] RECOVERY - Host es3 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [19:44:29] RECOVERY - Host db60 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [19:44:30] RECOVERY - Host ms-be1 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [19:44:30] RECOVERY - Host locke is UP: PING OK - Packet loss = 0%, RTA = 2.36 ms [19:44:46] RECOVERY - Host ssl4 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [19:44:46] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms [19:59:55] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [20:01:07] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 67.3425379091 (gt 8.0) [20:03:31] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 1.17996939655 [20:08:10] New patchset: Raimond Spekking; "Prevent search engines from indexing the user namespace in German Wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9469 [20:08:16] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9469 [20:13:07] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:15:58] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:16:47] New review: Platonides; "I'd have mentioned the bug number in the commit message, but it's ok anyway." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/9469 [20:24:49] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [20:34:04] New review: Hashar; "Good for me." 
[operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/9469 [20:46:04] just got back [20:47:10] root@es3:~# uptime 20:46:07 up 207 days, 22:23, 1 user, load average: 3.36, 2.73, 2.47 [20:47:13] grrr [20:47:36] Reedy: do you know what's needed to be done to reboot that without downtime? [20:47:58] errm [20:48:21] I think it's a slave, so should be OK just to be done... [20:48:29] I'd have to check first [20:49:04] mailing ops [20:50:00] es3 is a Wikimedia External Storage server (master) (db::es). [20:50:11] ah [20:50:23] master rotation first then [20:50:38] is a ES cluster actively written? [20:50:44] ben/asher should be able to help [20:51:08] how many other boxes have we got that are coming up to this bug? :/ [20:52:29] es is mysterious [20:53:36] they look equal in DB.PHP [20:54:04] it seems we only have three ES servers, with es2 being master on most of them except es3 being the master on cluster22 and cluster23 [20:55:17] what's the bug in that box? [20:57:48] RECOVERY - Host cp1036 is UP: PING OK - Packet loss = 0%, RTA = 26.42 ms [20:57:49] !log restarted networking on cp1036 [20:57:53] Logged the message, Mistress of the network gear. [20:59:18] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 193 seconds [21:01:15] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 193 seconds [21:03:04] !log restarted db1012 (unresponsive server) [21:03:08] Logged the message, Mistress of the network gear. [21:05:00] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 188 seconds [21:05:27] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 206 seconds [21:05:27] RECOVERY - Host db1012 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms [21:07:09] !log restarted db1026 (unresponsive server) [21:07:13] Logged the message, Mistress of the network gear. 
[21:08:46] !log rebooted db1029 (unresponsive server) [21:08:50] Logged the message, Mistress of the network gear. [21:10:39] !log rebooted db1031 (unresponsive server) [21:10:42] Logged the message, Mistress of the network gear. [21:11:00] RECOVERY - Host db1026 is UP: PING OK - Packet loss = 0%, RTA = 26.44 ms [21:12:43] Platonides: 207 day uptime bug [21:14:18] RECOVERY - Host db1031 is UP: PING OK - Packet loss = 0%, RTA = 26.42 ms [21:14:27] RECOVERY - Host db1029 is UP: PING OK - Packet loss = 0%, RTA = 26.42 ms [21:16:49] !log rebooted db1044 (unresponsive server) [21:16:53] Logged the message, Mistress of the network gear. [21:17:16] * Damianz thinks morebots should change LeslieCarr's title to 'Killer of uptime' [21:17:18] Reedy, what's that bug? [21:17:21] haha [21:17:30] :) [21:17:47] i want to do the big switchover of our monitoring server -- but also killing all the unresponsive servers while i do it :) [21:18:13] Platonides: some kernel bug that causes epic fail with 207 days uptime. Ask paravoid, he was one of the original people to discover the bug (I don't have access to my email with the info) [21:18:27] LeslieCarr: is this all coincidence? Or just stuff people haven't bothered poking before? [21:18:41] dunno, it may be all 207 uptime bug [21:18:49] i think coincidence [21:18:52] lots of older servers [21:19:06] we don't really have a rotation/person responsible for looking at the nagios alerts on a regular basis [21:19:18] Totally just need some nagios event handlers to go reboot nodes at random after 200 days because fixing bugs is too much fun :D [21:20:00] RECOVERY - Host db1044 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms [21:22:15] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 23 seconds [21:22:54] hehe [21:23:03] well we need to do some sort of apt-get upgrade with regularity [21:23:26] Indeed [21:24:02] Misc servers get dealt with... 
Non misc, well, don't [21:25:24] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 8 seconds [21:25:56] wow, db1044 has been out for quite a while ... [21:26:14] didn't have an update from like 6 months ago [21:33:30] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 185 seconds [21:33:57] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 186 seconds [21:34:40] New patchset: Lcarr; "db41 has changed to manutius" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9481 [21:35:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9481 [21:37:02] Ouch [21:39:03] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 15 seconds [21:52:37] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 22 seconds [22:10:19] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9481 [22:10:21] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9481 [22:12:07] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [22:15:00] New patchset: Lcarr; "force fixing external command file permissions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9486 [22:15:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9486 [22:16:40] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9486 [22:16:43] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9486 [22:26:10] maplebed and/or lesliecarr, can I get some partman help? Regarding virt-raid10.cfg [22:26:52] I'm gonna pass the buck on that one. [22:26:56] That script worked properly on one server but is not working on the second. 
I suspect this is because of existing partitions... [22:27:01] sorry andrewbogott. I claim midnight-thirty. [22:27:22] maplebed: Oh, are you in Europe? then you're excused. [22:27:31] andrewbogott: shit, i don't have that excuse [22:28:10] \o/ [22:28:18] pin the partman on the leslie! [22:28:25] The file in question is a product of repeated copy-pastes (not just mine)... [22:28:28] New patchset: Lcarr; "adding in snmptt init file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9487 [22:28:29] I'm puzzled by [22:28:30] d-i partman-md/device_remove_md boolean true [22:28:30] d-i partman-md/confirm_nooverwrite boolean true [22:28:38] oh you mean like every other thing? ;) [22:28:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9487 [22:29:00] Seems like those sort of contradict themselves...? And our guide doesn't mention confirm_nooverwrite [22:29:14] But it's in the file twice, which makes it seem important! [22:29:40] PROBLEM - MySQL Slave Running on db1042 is CRITICAL: Connection refused by host [22:30:51] LeslieCarr: You could claim that your heart and/or circadian clock is still in Europe. [22:31:07] hehe [22:31:14] actually it sort of is ;) [22:31:19] but anyways... [22:32:24] yeah i have no clue [22:32:26] ;) [22:32:31] my partman was all trial and error [22:32:40] virt-raid10 is roughly a copy of cp-varnish.cfg, which was written by... mark! Who is also in europe :( [22:32:50] you could see if nooverride false [22:32:55] you call it trial and error, I call it SCIENCE! [22:33:02] if that works [22:33:06] that'd be my guess [22:33:09] there is http://wikitech.wikimedia.org/view/PartMan [22:33:12] Yep, i will start flipping flags at random shortly. Just wanted to find out if there was any underlying theory first. [22:33:17] hehe, and my favorite quote "Theoretically now it confirms stuff automatically. It doesn't. 
Partman lies" [22:33:19] :) [22:33:49] Yep, it doesn't document that flag though. [22:39:04] New patchset: Andrew Bogott; "Random attempt to make partman behave" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9488 [22:39:25] New review: Andrew Bogott; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9488 [22:39:25] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9488 [22:52:51] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [22:58:51] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [22:59:00] andrewbogott: love the commit message [23:00:55] looks like gerrit is down: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git [23:02:54] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [23:02:54] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [23:09:57] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [23:09:57] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [23:09:57] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [23:19:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [23:31:49] has anyone ever seen issues with old resources (like css files) get stuck in the cache from bits? [23:32:50] maplebed ^ ? [23:33:19] I haven't, sorry.
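On the partman puzzle in the log above: in debian-installer preseeding, `partman-md/device_remove_md` and `partman-md/confirm_nooverwrite` answer two different prompts, so they do not actually contradict each other — the first agrees to wipe any pre-existing software-RAID (md) metadata found on the disks, the second suppresses the final "write changes to disk" confirmation. A minimal sketch of such a fragment, assuming standard d-i preseed semantics (exact prompt names vary across installer versions):

```
# Agree to remove existing software-RAID (md) devices so partman
# can repartition the disks without stopping to ask.
d-i partman-md/device_remove_md boolean true
# Skip the final "write the changes to disks?" confirmation
# for the md step.
d-i partman-md/confirm_nooverwrite boolean true
# The plain-partitioning step has an analogous key:
d-i partman/confirm_nooverwrite boolean true
```

The flag appearing twice in a recipe file is usually copy-paste residue rather than a semantic requirement; later occurrences of the same key simply restate the earlier answer.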
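The "207 day uptime bug" discussed above is consistent with the widely reported Linux sched_clock wraparound, where the TSC-based clock effectively overflows a 54-bit nanosecond counter. A quick back-of-the-envelope check of that figure, assuming the commonly cited 2^54 ns wrap point:

```python
# Convert the commonly cited sched_clock wrap point (2^54 ns)
# into days, to sanity-check the "~207 day" uptime figure.
NS_WRAP = 2 ** 54           # effective wrap point, in nanoseconds
SECONDS = NS_WRAP / 1e9     # nanoseconds -> seconds
DAYS = SECONDS / 86400      # seconds -> days

print(round(DAYS, 1))       # ~208.5 days, matching the bug's nickname
```

This also matches the `uptime` output quoted earlier in the log (es3 at 207 days was right at the edge of the danger zone).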