[00:00:02] hrm, and trying to figure out if Roan is in the wmf ldap group [00:00:04] "not working as specified since 23:49:16" [00:00:09] because he's getting not authorized errors [00:00:10] So it took 5 minutes to get to my phone?wtf [00:00:30] sounds like sms template expansion [00:01:02] Oooooh here we go [00:01:08] It's paging me for Parsoid Varnish on cerium [00:01:17] That's actually a Watchmouse page [00:01:26] It's not paging for the 3 LVS services we killed [00:02:03] LeslieCarr: on formey: ldaplist -l group wmf [00:04:27] New patchset: Lcarr; "giving roan command access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59976 [00:05:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59976 [00:06:25] LeslieCarr: could you possibly do https://gerrit.wikimedia.org/r/#/c/59972/ as well? I stopped puppet and moved the directory in preparation. [00:07:16] si [00:07:21] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59972 [00:07:31] done [00:07:35] I've used up like a month's worth of my puppet allowance in like 24 hours but I'm pretty much done, I promise [00:07:37] thanks :)) [00:11:13] Hah, looks like there's something wrong with the pybal health checks or something [00:11:21] It thinks everything is down [00:11:27] Nagios thinks the same [00:11:35] GET / seems to have broken somehow but everything else is still working [00:17:58] Hah, looks like Parsoid broke it [00:18:04] They didn't close the connection when serving / [00:18:07] So the monitors timed out [00:18:29] Which is why pybal depooled half the cluster, and why Nagios believed things were down. 
But they were actually up, every URL except / worked ^^ [00:20:37] I'm deploying a fix [00:20:59] RECOVERY - Parsoid on wtp1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1368 bytes in 3.471 second response time [00:21:00] RECOVERY - LVS HTTP IPv4 on parsoid.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1368 bytes in 3.093 second response time [00:21:03] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.006 second response time [00:21:09] there we go [00:21:10] RECOVERY - Parsoid on titanium is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [00:21:10] RECOVERY - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.012 second response time [00:21:12] RECOVERY - LVS HTTP IPv4 on parsoidcache.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1357 bytes in 0.057 second response time [00:21:19] RECOVERY - Parsoid Varnish on celsus is OK: HTTP OK: HTTP/1.1 200 OK - 1357 bytes in 0.058 second response time [00:21:39] RECOVERY - Parsoid Varnish on titanium is OK: HTTP OK: HTTP/1.1 200 OK - 1357 bytes in 0.008 second response time [00:21:40] RECOVERY - Parsoid on wtp1 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.056 second response time [00:21:49] RECOVERY - Parsoid on wtp1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.003 second response time [00:21:49] RECOVERY - Parsoid on mexia is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.054 second response time [00:21:50] RECOVERY - Parsoid on tola is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.054 second response time [00:21:50] RECOVERY - Parsoid Varnish on constable is OK: HTTP OK: HTTP/1.1 200 OK - 1358 bytes in 0.055 second response time [00:21:50] RECOVERY - Parsoid on lardner is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.055 second response time [00:21:50] RECOVERY - Parsoid Varnish on cerium is OK: HTTP OK: HTTP/1.1 200 OK - 1357 bytes in 0.003 second response time [00:21:59] Whee [00:21:59] RECOVERY - Parsoid on 
cerium is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.008 second response time [00:22:28] \o/ [00:22:42] All so exciting. [00:22:50] RECOVERY - Parsoid on kuo is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.088 second response time [00:25:28] New patchset: Ori.livneh; "Set MPLCONFIGDIR env var for Matplotlib" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59981 [00:27:12] OK, if I get that merged I get graphs and then I can stop jumping in my chair. [00:27:55] (I'm not asking, just being declarative, like puppet.) [00:28:05] ensure => 'merged', etc. [00:28:35] !log catrope synchronized php-1.22wmf1/extensions/VisualEditor 'Update VisualEditor to master' [00:28:36] LeslieCarr: James_F points out I am the only one with a capital letter in that file, I hope that wasn't a mistake. I know I used Catrope with a capital C to log into icinga, is all [00:28:42] Logged the message, Master [00:28:57] !log catrope synchronized php-1.22wmf2/extensions/VisualEditor 'Update VisualEditor to master' [00:29:04] Logged the message, Master [00:29:47] LeslieCarr: Neeeever mind, silencing works now. Thanks :) [00:30:36] ACKNOWLEDGEMENT - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Catrope Deliberately sabotaged to serve as a testing/benchmarking ground for Gabriel [00:30:48] heh [00:32:02] ACKNOWLEDGEMENT - Parsoid on constable is CRITICAL: Connection refused Catrope Known breakage, not pooled we should really remove Parsoid from this box [00:32:27] ACKNOWLEDGEMENT - Parsoid on celsus is CRITICAL: Connection refused Catrope Known breakage, not pooled we should really remove Parsoid from this box [00:32:56] ori-l: I think you missed a sudo joke in there somewhere ;) [00:33:41] Writing a postmortem for the ops list [00:34:27] i sudidn't! (er. best i could come up with.)
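For context on the MPLCONFIGDIR patch above: Matplotlib wants a writable directory for its config and font cache, and under a daemon user with an unwritable $HOME the import can fail or fall back noisily. Setting the variable before the first import avoids that; a minimal sketch (the env-var name is real Matplotlib behaviour, the path is illustrative):

```python
import os
import tempfile

# Point Matplotlib's config/cache directory somewhere writable *before*
# matplotlib is first imported -- which is what the puppet change above
# arranges at the service level. The directory chosen here is illustrative.
cache_dir = os.path.join(tempfile.gettempdir(), "matplotlib-cache")
os.environ.setdefault("MPLCONFIGDIR", cache_dir)
os.makedirs(os.environ["MPLCONFIGDIR"], exist_ok=True)

# import matplotlib  # would now pick up the writable cache directory
```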
[00:57:40] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [01:08:55] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [01:13:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 198 seconds [01:15:55] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 15 seconds [01:28:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [01:30:55] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [01:42:08] New patchset: Ori.livneh; "Set MPLCONFIGDIR env var for Matplotlib" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59981 [01:43:57] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [01:50:57] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 27 seconds [01:51:36] New patchset: Aaron Schulz; "Revert "Enabled 1:1 profiling for cli scripts and put "cli" in the profile ID."" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59982 [01:51:46] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59982 [01:53:43] !log aaron synchronized wmf-config/StartProfiler.php 'Removed cli profiling for now.' 
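The db1025 alerts above oscillate between CRITICAL near 200 seconds of replication delay and OK moments later: a textbook flapping check. A sketch of the two pieces involved, with a Nagios-style threshold classifier (the threshold values are guesses that resemble the output above, not the real config) and a small damper that suppresses state changes until they persist:

```python
def classify_delay(seconds, warn=120, crit=180):
    """Classify replication delay against Nagios-style thresholds
    (hypothetical values resembling the db1025 alerts above)."""
    if seconds >= crit:
        return "CRITICAL"
    if seconds >= warn:
        return "WARNING"
    return "OK"

class FlapDamper:
    """Report a state change only after it has been seen `n` times in
    a row -- one common way to quiet a check that oscillates around
    its threshold, as db1025 does above."""
    def __init__(self, n=3):
        self.n = n
        self.reported = "OK"
        self._candidate = None
        self._count = 0

    def update(self, state):
        if state == self.reported:
            self._candidate, self._count = None, 0
        elif state == self._candidate:
            self._count += 1
            if self._count >= self.n:
                self.reported = state
                self._candidate, self._count = None, 0
        else:
            self._candidate, self._count = state, 1
        return self.reported
```

With `n=3`, a single 198-second spike followed by recovery never reaches the pager.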
[01:53:51] Logged the message, Master [01:54:12] !log aaron cleared profiling data [01:54:20] Logged the message, Master [02:16:42] !log LocalisationUpdate completed (1.22wmf2) at Fri Apr 19 02:16:42 UTC 2013 [02:16:49] Logged the message, Master [02:23:57] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds [02:26:27] !log aaron cleared profiling data [02:26:34] Logged the message, Master [02:27:18] !log LocalisationUpdate completed (1.22wmf1) at Fri Apr 19 02:27:17 UTC 2013 [02:27:25] Logged the message, Master [02:28:58] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds [02:31:37] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [02:33:57] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [02:40:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:41:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [02:44:04] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 218 seconds [02:53:04] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 24 seconds [03:00:04] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 218 seconds [03:03:04] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 14 seconds [03:32:40] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 19 03:32:40 UTC 2013 [03:32:48] Logged the message, Master [03:46:08] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [03:48:28] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 198 seconds [03:49:28] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 14 seconds [03:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [03:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [03:58:28] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 198 seconds [04:00:28] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [04:48:21] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 198 seconds [04:50:21] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 17 seconds [04:56:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:57:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [05:18:29] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 199 seconds [05:20:30] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds [05:23:09] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [05:23:45] wow, apergos is the fastest RT gun in the east [05:23:54] maybe. i have no citations [05:23:55] no, you were just lucky [05:24:18] but let's see if you were lucky and the rename worked out [05:24:20] hehe :) [05:24:23] right [05:28:30] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 199 seconds [05:30:29] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [05:44:22] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 191 seconds [05:45:22] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [05:52:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:04] RD: scroll up [05:53:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [05:53:32] apergos? 
[05:53:36] yes [05:53:43] Ty for handling that OTRS RT :-) [05:53:44] yes? [05:53:49] ah it works? [05:53:52] Yup [05:53:58] great I'll close that then [05:54:07] * jeremyb_ was going to close [05:54:14] It turns out, after all these years of not being able to access it, it was just a dummy filter - it is deleted now! [05:54:19] ok you can [05:54:25] yes a test filter :-D [05:54:31] I admit I snickered when I saw that [05:55:09] closed [05:55:22] sweet [05:55:50] I was sort of mad! [05:55:58] hah [05:55:59] It has been bothering me for years ;-) [05:56:14] well, not any more. must be a good year :-D [05:57:07] ho ho [05:57:13] how is milan ? [05:58:22] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 199 seconds [06:00:22] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [06:01:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [06:06:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:07:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [06:09:22] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds [06:10:22] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds [06:13:01] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours [06:14:22] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 226 seconds [06:15:22] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [06:18:22] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 199 seconds [06:19:22] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 12 
seconds [06:22:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:23:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [06:25:53] New patchset: Tim Starling; "(bug 45005) Redirect wikidata.org to www.wikidata.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/49069 [06:26:38] New review: Tim Starling; "PS12: rebase." [operations/apache-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49069 [06:26:43] Change merged: Tim Starling; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/49069 [06:27:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:27:58] TimStarling: If I'm correct, that should mean http://wikidata.org/w/index.php?title=Special:Watchlist should redirect to www? [06:28:10] yes [06:28:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [06:28:18] it isn't yet for me [06:28:24] but that might be just my caching [06:28:40] hold your horses [06:28:52] sorry, just wanted to use that expression [06:29:05] I only merged it, I haven't finished deploying it yet [06:29:18] * Jasper_Deng_busy forgot there wasn't a !log yet [06:31:26] !log deploying apache conf change for www.wikidata.org redirect (I7bb872fd) [06:31:34] Logged the message, Master [06:32:45] and it now works [06:33:20] are you sure you're busy?
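The behaviour being tested above is just a host-based redirect that must preserve the request path and query string. A toy model of the rule (the status code and scheme are assumptions; the real rule lives in operations/apache-config):

```python
def redirect_for(host, uri):
    """Return (status, Location) for bare-domain requests, or None when
    no redirect is needed. Toy model of the apache-config rule; the 301
    and the http:// scheme are assumptions, not read from the config."""
    if host == "wikidata.org":
        return (301, "http://www.wikidata.org" + uri)
    return None
```

So a request for http://wikidata.org/w/index.php?title=Special:Watchlist comes back as a redirect to the same path on www, which is exactly the check done in the channel.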
[06:41:57] New patchset: Tim Starling; "Basic puppetization of dsh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56107 [06:45:52] New patchset: Tim Starling; "Basic puppetization of dsh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56107 [06:47:08] New review: Tim Starling; "PS5: wrong parent, please ignore" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56107 [06:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:53:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [06:58:37] New patchset: Tim Starling; "In sync-dir, actually perform the syntax check" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56105 [06:58:37] New patchset: Tim Starling; "Move scap source location from fenari to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56104 [06:58:38] New patchset: Tim Starling; "Basic puppetization of dsh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56107 [07:00:11] New review: Tim Starling; "PS5: rebase including conflict resolution with I037a1f5e" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56104 [07:01:02] New patchset: Tim Starling; "Remove some node lists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [07:01:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:02:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [07:05:19] New review: Tim Starling; "-1 per Ryan's comment, ircecho from tin to Freenode won't work." 
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56104 [07:21:47] New review: Tim Starling; "I think the simplest solution would be to use socat as a TCP relay, from ircecho on tin to Freenode...." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56104 [07:40:35] binasher, thanks! [07:52:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [08:05:57] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [08:05:57] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:05:57] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:11:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [08:18:55] New review: Krinkle; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59810 [08:32:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:33:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [09:08:02] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [09:10:32] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [09:11:32] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [10:50:16] New patchset: Hashar; "lucene-jobs: convert java opts to shell variables" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59995 [10:50:16] New patchset: Hashar; "conf file for 
lucene.jobs.sh (not used yet)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [10:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [11:00:17] New patchset: Hashar; "conf file for lucene.jobs.sh (not used yet)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [11:00:50] New review: Hashar; "fixed invalid template call (source => content)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [11:01:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [11:15:06] do I understand correctly that if we have a '+wiktionary' => array( 'Wiktionary' => NS_PROJECT, ) alias in wgNamespaceAliases then it's unnecessary to repeat this in wiki-specific aliases? 
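On the wgNamespaceAliases question above: in wmf-config, a '+'-prefixed key such as '+wiktionary' is merged into the default value rather than replacing it, so every wiktionary inherits the alias and repeating it in wiki-specific aliases is indeed unnecessary. A toy Python model of that merge (simplified; the real logic is MediaWiki's SiteConfiguration class, and the alias values here are illustrative):

```python
NS_PROJECT = 4  # MediaWiki's Project-namespace index

def resolve(setting, suffix, dbname):
    """Merge default -> '+suffix' -> '+dbname', loosely mimicking how a
    '+'-prefixed key adds to the default instead of replacing it (a
    bare 'suffix' or 'dbname' key would replace the default)."""
    value = dict(setting.get("default", {}))
    for key in ("+" + suffix, "+" + dbname):
        value.update(setting.get(key, {}))
    return value

wgNamespaceAliases = {
    "default": {"Image": 6},                    # illustrative default alias
    "+wiktionary": {"Wiktionary": NS_PROJECT},  # applies to all wiktionaries
}

# dv.wiktionary gets both aliases with no per-wiki entry at all:
dv_aliases = resolve(wgNamespaceAliases, "wiktionary", "dvwiktionary")
```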
[11:17:02] New patchset: Hashar; "conf file for lucene.jobs.sh (not used yet)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [11:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:23:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [11:41:19] New patchset: Odder; "(bug 46846) Localise project namespaces for dv.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59998 [11:41:36] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [11:44:44] so hashar [11:44:47] New patchset: Hashar; "lucene-jobs: enable conf file loading" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60000 [11:44:57] how do I test this in beta labs? ;) [11:44:58] https://gerrit.wikimedia.org/r/#/c/59999/1 [11:44:59] mark: hello [11:46:09] New review: Hashar; "The conf sourcing is enabled in another change to prevent disruption https://gerrit.wikimedia.org/r/..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [11:47:21] mark: sorry was doing some paperwork [11:47:27] no worries [11:47:31] New patchset: Odder; "(bug 44899) Namespace setup for Korean Wikiversity" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59786 [11:47:35] you're not my secretary after all ;-) [11:47:57] so beta has cache instances for upload, mobile, bits [11:48:05] some of them run from the production branch [11:48:17] mobile uses puppetmaster:self [11:49:11] I would love to have puppetmaster self self update [11:49:33] how do I log in? ;-) [11:49:45] ah [11:50:04] do you have a labs account ? 
[11:50:24] haha [11:50:25] yes [11:51:07] the labsconsole UI is so horrible [11:51:12] i'm trying to find the beta project there ;p [11:51:22] ahh https://wikitech.wikimedia.org/wiki/Special:Contributions/Mark_Bergsma [11:51:30] it is named 'deployment-prep' [11:51:38] oh [11:51:45] New patchset: Odder; "(bug 44899) Namespace setup for Korean Wikiversity" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59786 [11:52:02] Successfully added Mark Bergsma to deployment-prep. [11:52:02] !! [11:52:28] deployment-cache-mobile01 [11:53:06] that one runs out of the tip of production [11:53:14] i need one with puppetmaster self [11:53:21] oh god. :-( [11:53:23] currently mobile is pointing to another one : deployment-cache-varnish-t3 [11:53:51] I would like the instance to self update which is not yet possible with puppetmaster::self [11:54:04] so I wanted to migrate the mobile site to the deployment-cache-mobile01 instance [11:54:10] but then that means not being able to test out changes [11:54:17] so I should update puppetmaster:self :) [11:54:21] i don't see that instance [11:54:26] ah [11:54:31] deployment-varnish-t3 [11:55:02] how do I import that change now? [11:55:09] just fetch from gerrit? [11:56:16] why the fuck people write wrong doc :( [11:57:20] New patchset: Odder; "(bug 44899) Namespace setup for Korean Wikiversity" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59786 [11:57:39] so as root: [11:57:39] export GIT_SSH=/var/lib/git/ssh [11:57:40] cd /var/lib/git/operations/puppet [11:57:42] then fetch the change [11:57:55] I usually copy paste the 'checkout' line from the gerrit change [11:57:59] right [11:58:01] thanks :) [11:58:04] and add a -b 12345/12 [11:58:14] to craft a local branch named after the change + patchset [11:58:22] then puppetd -tv and the rest as usual [11:59:02] brb [12:00:07] awesome [12:00:09] New review: Odder; "Doing things the right way, i.e. 
removing NS_PROJECT definition from $wgExtraNamespaces and moving i..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59786 [12:00:09] it didn't do anything [12:00:19] that's a good start ;) [12:02:20] it is good to see you having interest in labs instances :D [12:02:46] heh [12:02:49] zeljkof runs selenium tests against beta. So that let us catch mediawiki issues before they got deployed [12:02:51] i have an interest in not breaking the site [12:03:02] and I almost rigged up something by doing my changes only in a large comment section [12:03:04] but that would be unfair [12:03:06] this is what beta is for ;) [12:03:37] exactly :-] [12:14:46] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [12:15:11] bbl [12:15:26] oh you made it after all [12:15:37] New patchset: Odder; "(bug 46534) Add namespace aliases for uz.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60002 [12:18:03] New patchset: Odder; "(bug 46846) Localise project namespaces for dv.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59998 [12:18:44] Geez, you can't do anything here without the whole channel knowing about it immediately. [12:18:55] :D [12:20:56] morning paravoid :-] [12:21:25] paravoid: so how would I get you to sponsor my packages ? 
:-]  I am not sure what the process is or what you expect from me [12:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [12:24:55] the process is you give a slight hint about wanting to package something new and faidon comes enthusiastically running at you [12:25:10] back in an hour or so ;) [12:32:19] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [12:32:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:33:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [12:47:59] hashar: looking at statsd now [12:48:13] you know you can do --download-current-version ? [12:48:14] paravoid: the up to date changes are in svn [12:48:26] and get rid of the DEB_UPSTREAM_VERSION hacks? :) [12:48:27] paravoid: with uscan yeah [12:48:35] svn build package has an option to do it too [12:48:48] but it does not rename the tarball in build-area :( [12:48:51] yeah I'm talking about uscan [12:48:56] rename? [12:48:57] it symlinks [12:48:58] wfm [12:49:02] in tarballs [12:49:05] but not in build-area :( [12:49:09] at least on a precise instance [12:49:21] so when building out the package it can't find the tar ball :( [12:49:35] huh? 
[12:50:18] ohh [12:50:47] I see what you mean now [12:51:02] my rules get-orig-source: target needs a tweak [12:52:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [12:54:29] uscan --download-current-version --rename --destdir ../tarballs [13:00:18] paravoid: I have no idea how svn-buildpackage fetches the sources [13:00:23] apparently we have to uscan first [13:00:58] that's what I mean [13:01:16] so apparently I have to do: [13:01:21] ./debian/rules get-orig-source && svn-buildpackage [13:01:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:02:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [13:03:05] that works [13:03:12] I usually do uscan manually [13:03:16] anyway, are you committing that? [13:03:28] ditching DEB_UPSTREAM_VERSION and using --download-current-version [13:03:44] also, on d/copyright, the Copyright (c) 2012-2013, James Socol [13:03:51] isn't needed, as you have it two lines above [13:04:17] and while at it, I tend to license debian/* with the same license as upstream to be a good citizen, but that's entirely your decision of course [13:06:09] yeah [13:06:52] otherwise looks good, do those and I'll upload [13:07:22] paravoid: isn't the copyright part of the license ?
[13:07:30] no [13:08:33] paravoid: diff review : http://paste.openstack.org/show/36385/ ;) [13:09:12] with colors http://paste.openstack.org/show/36386/ [13:26:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [13:32:59] New patchset: Faidon; "icinga: authorize faidon for info/commands" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60007 [13:33:04] hashar: ack [13:33:12] paravoid: sending to svn [13:33:49] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60007 [13:34:11] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:34:28] paravoid: sent http://anonscm.debian.org/viewvc/python-modules?view=revision&revision=23970 [13:35:09] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:35:25] and there is zero lintian issue :-] [13:36:29] I know [13:36:33] this isn't my first upload you know :) [13:37:47] quick mailbox count shows ~320 [13:38:49] maybe I should subscribe to the list of debian-python commits [13:39:27] voluptuous has a different get-orig-source [13:39:35] yeah haven't fixed that one [13:41:18] can it make it into wheezy? [13:41:48] no [13:42:05] wheezy is frozen since July [13:42:16] so no new packages since then [13:42:16] july? 
holy fuck [13:42:33] wheezy gets released on May 4th/5th [13:43:15] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:43:30] paravoid: I have fixed the voluptuous target for orig-get-source [13:43:40] get-orig-source [13:43:41] rhgr [13:44:03] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:44:54] hashar: statsd ftbfs, it needs setuptools to build but it's not in B-D [13:45:12] warn: can't parse acronym ftbfs [13:45:20] warn: can't parse acronym B-D :-] [13:45:24] haha [13:45:30] ah hmm [13:45:31] fails to build from source [13:45:34] Build-Depends [13:45:46] let's make up our own acronyms too hashar [13:45:52] and then annoy paravoid with it [13:45:54] ftbfs is quite common [13:45:55] annoying debianisms ;p [13:46:03] common in the grand world of debian [13:46:06] there's even a wikipedia article about it! [13:46:09] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [13:46:11] oh then [13:47:21] paravoid: I did not catch that issue while building in my debian/unstable vbox [13:47:55] ah B-D for setuptools is set in voluptuous [13:47:59] must have built that one first [13:48:04] * hashar should use a clean chroot [13:48:11] yes you should :) [13:48:14] pbuilder [13:50:14] B-D added with r23972 [13:51:53] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59854 [13:52:15] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:55:03] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:56:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:56:57] New patchset: Mark Bergsma; "Support per-backend options"
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [13:57:06] hashar: both uploaded [13:57:09] so, [13:57:14] what happens next? [13:57:19] these are new packages, i.e. both new source & binaries [13:57:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [13:57:24] this means that they get into a queue called NEW [13:57:38] where they're going to be checked for sanity and legal compliance by the ftp-master team [13:57:59] we should form a puppet-master team [13:58:00] oh wait [13:58:16] this can be a few days, although recently they suffered a lot of backlog [13:58:26] so for gdnsd it took two and a half months :) [13:58:28] but it's better now [13:58:36] the queue is http://ftp-master.debian.org/new.html [13:58:43] that's the stats, you can see the bump http://ftp-master.debian.org/stat.html [13:59:24] mark--can you root-squash nfs for the fundraising share again? I'm all done. [13:59:30] ok [13:59:33] thx [13:59:44] paravoid: hopefully my simple packages will get approved quickly [14:00:12] paravoid: I love the rrd graphs. We used to have something similar for mediawiki review backlog (back when we used svn) [14:00:24] large or small it doesn't matter much [14:00:31] Jeff_Green: done [14:00:36] great, thanks! [14:00:41] if anything, large packages might get priority :) [14:01:31] paravoid: I am really happy to finally have contributed back to debian [14:01:48] paravoid: once approved would the package land in unstable or experimental? 
[14:01:53] unstable [14:01:57] that's what you put in debian/changelog [14:02:02] ah yeah [14:02:16] and from there I will have to find out some ubuntu people to sync the packages [14:02:20] no [14:02:23] it happens automatically [14:02:27] even better [14:02:40] although I think they're going to sync from testing this time [14:02:45] or maybe not, I'm not sure [14:02:57] so your packages won't get to testing because of the freeze [14:03:00] and to get them in apt.wm.o ? [14:03:37] I can do that [14:03:38] oh btw [14:03:46] do svn-buildpackage --svn-tag-only for both of them [14:03:48] to tag the uploads on svn [14:04:23] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [14:04:27] mark: wanna give an opinion on https://gerrit.wikimedia.org/r/#/c/59162/ ? [14:04:49] so, ottomata wants to create analytics/* branches on operations/puppet [14:05:08] so that he can point labs instances there and merge them there without review while working on them [14:06:01] I think Ryan had a similar idea for labs [14:06:05] with labs/projectname branches [14:06:20] I'm not a huge fan but I don't see harm [14:07:56] this doesn't scale anyway [14:11:25] paravoid: tags uploaded [14:13:54] commented [14:15:11] thanks [14:15:16] I thought you might care :) [14:30:40] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [14:35:15] paravoid: ^ want to review that? [14:35:34] i dislike the duplication of code there but don't currently see an easy way to change that [14:36:34] hmm [14:36:40] maybe we don't need it anymore now with this new functionality [14:38:09] hmm [14:38:16] I think we can do away with it indeed... 
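[editor's note] The tagging step paravoid asks for at 14:03:46 can be sketched as below. This is a hedged illustration: `svn-buildpackage --svn-tag-only` is the real command (it records a tags/<version> copy of trunk in the packaging repository after an upload), but the working-copy paths and directory names for the two Python modules from this log are assumptions, and the commands are only echoed here since no svn checkout exists in this sketch.

```shell
# Hypothetical working copies for the two packages hashar uploaded
# (voluptuous and statsd); only print the command that would be run.
for pkg in voluptuous statsd; do
    tag_cmd="cd ~/debian/$pkg && svn-buildpackage --svn-tag-only"
    echo "$tag_cmd"
done
```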
good [14:41:08] sec [14:45:59] yeah [14:46:14] haven't reviewed in depth, but it looks okay [14:46:30] ugly, but I have no better alternatives to suggest [14:46:40] yes [14:47:17] maybe someone should look at hiera [14:50:32] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59999 [14:50:54] hiera with yaml? and in general not how it would be used for that particular change? [14:53:04] New patchset: Mark Bergsma; "Revert "Support per-backend options"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60009 [14:53:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60009 [14:53:22] heh, that was fast :P [14:55:23] New patchset: Mark Bergsma; "Support per-backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60010 [14:56:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60010 [15:05:03] paravoid: did you get the two python modules on apt.wm.o ? [15:05:17] not yet [15:09:40] paravoid: also I don't see the packages at http://ftp-master.debian.org/new.html :/ [15:09:47] be patient :) [15:09:54] is that a daily cron or something ? [15:09:55] :-D [15:10:28] i think not daily [15:10:33] but i also think freeze? [15:10:39] ah there is something named dinstall that runs 4 time per day [15:11:01] have you met britney? :) [15:11:08] lots of jargon [15:11:35] New patchset: Mark Bergsma; "Remove the upload-specific backend section in upload-backend.inc.vcl" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60013 [15:12:32] Wheezy is testing, sid is unstable <-- that is what is always confusing me :D [15:12:51] and experimental is rc-buggy [15:13:01] oldstable doesn't currently exist but will soon [15:13:14] stable is squeeze [15:13:43] squeeze / wheezy being the names like ubuntu has precise/lucid/hardy ? 
[15:13:48] yes [15:13:52] then they also refer to some version numbers :D [15:14:01] so testing == wheezy == 7.0 :D [15:14:33] yes, but that last equality will change when release happens [15:14:42] sec [15:14:53] * mark takes a deep breath [15:14:58] * hashar hides [15:14:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60013 [15:15:26] hashar: jessie is the next testing after the coming release [15:16:24] 1.3 bo i think that was my first stable distribution [15:16:44] I mean something that was more or less installable on my comp :D [15:17:31] i don't remember back that far [15:17:50] i started between woody and sarge [15:22:40] ahh [15:22:40] http://en.wikipedia.org/wiki/File:Debian-package-cycl.svg [15:23:24] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [15:33:22] New patchset: Mark Bergsma; "Define the test_wikipedia backend on the mobile backend caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60014 [15:35:01] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60014 [15:38:31] oops [15:46:20] New patchset: ArielGlenn; "Luke Welling account info (stat1 access will be in next changeset)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60016 [15:48:13] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60016 [15:50:06] New patchset: Mark Bergsma; "Fix backend option search logic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60017 [15:50:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60017 [15:52:45] New patchset: ArielGlenn; "stat1 access for bsitu, kaldari, lwelling, mlitn (RT 4959)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60019 [15:54:12] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60019 
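[editor's note] The suite naming that confuses hashar above boils down to a small mapping: suites are moving aliases, codenames are fixed. The sketch below just records that mapping as it stood in April 2013 when this log was written; it is plain illustrative data, not the output of any Debian tool.

```shell
# Debian suites are aliases that advance at release time; the codename
# keeps its package set. State of the world as of this log (April 2013):
stable=squeeze        # Debian 6.0
testing=wheezy        # becomes 7.0 and the new stable on release day
unstable=sid          # permanent alias, never released directly
next_testing=jessie   # the testing that opens after wheezy is released
echo "testing -> $testing ; unstable -> $unstable"
```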
[15:55:38] New patchset: Mark Bergsma; "Sort the backend list to prevent constant reordering" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60020 [15:56:16] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60020 [15:56:37] hashar: haha, someone just ITPed statsd [15:58:07] paravoid: the etsy / nodejs one ? [15:58:12] yeah [15:58:23] I wasn't sure what the Source: field was for [15:58:33] since other modules add a short name, I did the same [15:58:55] #705758 [15:59:22] !bug 705758 [15:59:22] https://bugzilla.wikimedia.org/705758 [15:59:27] not that one [15:59:32] bugs.debian.org :) [15:59:34] !debian is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=$1 [15:59:34] Key was added [15:59:38] !debian [15:59:38] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=$1 [15:59:40] !debian 705758 [15:59:40] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705758 [15:59:49] wm-bot: you are helpful [15:59:54] New patchset: Mark Bergsma; "Remove test_wikipedia backend on the frontend caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60022 [16:01:31] hey folks, from wikimedia-tech - getting 404's and 503's from esams upload varnishes on some images [16:01:33] like wget -S -U Malyacko --header 'host: upload.wikimedia.org' 'http://upload.esams.wikimedia.org/wikipedia/en/b/bd/Metplate.jpg' [16:01:34] New patchset: Mark Bergsma; "Remove test_wikipedia backend on the frontend caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60022 [16:01:43] or wget -S -U Malyacko --header 'host: upload.wikimedia.org' 'http://upload.esams.wikimedia.org/wikipedia/en/thumb/b/bd/Metplate.jpg/120px-Metplate.jpg' [16:03:26] guru meditation, that would probably be you mark [16:03:58] Greetings [16:04:16] I've been noticing some issues with thumbnails (I'm in the UK) [16:05:10] Qcoder00: yup, see http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20130419.txt for before you joined 
[16:06:36] could somebody with access to /etc/varnish/secret on celsus (Parsoid varnish) purge the varnish cache with something like varnishadm -T 127.0.0.1:6082 url.purge . ? [16:07:25] gwicke: is that preferred to varnishadm ban.url . ? [16:08:21] LeslieCarr: no idea- I'd just like to drop the cache after the deployment yesterday [16:09:00] according to the docs ban.url should work too [16:09:09] gwicke: please make sure that's no longer needed once the entire site is served using parsoid :) [16:09:36] i'll use ban.url [16:10:00] !log flushed celsus varnish cache [16:10:01] mark: hehe, yeah ;) [16:10:07] Logged the message, Mistress of the network gear. [16:10:14] gwicke: are you available for our ops meeting on monday? [16:10:19] mark: any idea what could be causing the esams issue ? [16:10:23] yes [16:10:25] i just fixed it [16:10:26] mark: yes, let me ack that [16:10:30] thanks [16:10:50] LeslieCarr: I had made a mistake that caused varnish backends for esams to go to port 80 instead of 3128 [16:11:03] ah :) [16:11:04] LeslieCarr: thank you! [16:11:09] which sorta half works ;) [16:11:10] that would do it! [16:11:12] hehehe [16:13:34] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60022 [16:13:46] so Qcoder00 should be fixed very soon [16:13:47] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours [16:13:53] and good catch, thanks for letting us know [16:13:59] Thanks [16:14:02] should be fixed already [16:14:10] please let me know if it's not the case [16:14:10] I do a lot of image stuff so... 
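[editor's note] The cache flush discussed above looks roughly like the command assembled below. The admin address and secret path are taken from gwicke's request and are assumptions about the actual setup on celsus; the command is only built and printed, since running it needs a live Varnish admin socket. `ban.url` takes a regex, so `.` bans every cached object; it replaced the older purge-style commands (gwicke's `url.purge`) in Varnish 3.

```shell
# -T is the admin interface address, -S the shared-secret file it
# authenticates with; both values here are assumptions from the log.
cmd='varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret ban.url .'
echo "$cmd"
```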
[16:18:28] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 201 seconds [16:19:28] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [16:23:39] New patchset: Mark Bergsma; "Add dysprosium to the eqiad upload pool" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60023 [16:24:26] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60023 [16:26:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:27:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.177 second response time [16:33:24] New patchset: Hashar; "jenkins: get rid of group def (done by systemuser)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60024 [16:34:57] puppet broken on gallium cause of a duplicate Group['jenkins'] https://gerrit.wikimedia.org/r/#/c/60024/ should take care of it :-] [16:35:05] and I am off for the weekend * wave * [16:35:36] hashar: erm, why? [16:35:41] I like the jenkins user being on the jenkins module [16:35:46] er, group even [16:35:47] cause it is friday evening here ? [16:35:53] no, not that :) [16:35:57] paravoid: systemuser define a group [16:36:15] New review: Faidon; "The jenkins group should be in the jenkins module, let's find a way to do this properly." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60024 [16:36:18] go home [16:36:28] it can wait until monday [16:36:50] one way would be to stop using systemuser [16:36:57] and fall back to user {}  like we did before :D [16:37:26] or we can hack system user to not attempt to define a group if it is already defined [16:37:27] :D [16:37:46] paravoid: thanks again for the package uploads :-] have a good evening! 
[16:43:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:44:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [16:48:24] is this metric broken or we're just not doing health checks? https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Upload%20caches%20esams&h=cp3004.esams.wikimedia.org&v=0&m=varnish.backend_unhealthy&r=2hr&z=default&jr=&js=&st=1366389850&vl=N%2Fs&ti=Backend%20conn.%20not%20attempted&z=large [16:49:07] i see bunch of metrics flatlined and then picked back up. i'm having trouble finding metrics that were zero and went up at the same time everything else flatlined [16:50:45] hrmmmm, i wants a gdash equivalent for ganglia [16:50:59] guessing that check doesn't work [16:51:08] i guess that's what views are for. (what ottomata was working on?) [16:51:28] yup! [16:51:31] you can puppetize views now [16:51:32] real easy: [16:52:59] see the ganglia::view documentation in ganglia.pp [16:53:45] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/ganglia.pp;h=ae60a2991f3fe11df6714a0ea68d3dbeea8341f2;hb=HEAD#l550 [16:57:20] speaking of gdash... how do we open up graphite again? (to some group... not sure what group that should be) what's the reasoning for the closure? (don't think i ever heard?) /me also would use ishmael once in a while (see e.g. bug 47045) but i understand that's possibly more sensitive [16:57:27] binasher: please comment ^^^ [16:58:58] * jeremyb_ sees asher is quite idle so bbl [17:01:26] you should be a bit more specific to what you need [17:01:35] we have gdash, we can add things there [17:01:39] what is it that you're missing? [17:09:06] Does http://etherpad.wikimedia.org/eduleadersworkshop work for anyone? paravoid? [17:09:44] Thehelpfulone: "waiting for etherpad.wikimedia.org ..." 
[17:10:11] the server responds to a ping, though [17:10:40] yeah it doesn't seem to load for me either, but I can't see anything problematic in ganglia: http://ganglia.wikimedia.org/latest/?r=day&c=Miscellaneous+pmtpa&h=hooper.wikimedia.org (although I'm not sure what I'm looking for!) [17:15:00] !log running enwiki.abuse_filter_log migration - drop afl_user index, added user_timestamp [17:15:09] Logged the message, Master [17:21:46] New review: Ram; "Just a style suggestion (feel free to ignore!): The java_opts() funtion can be written more compactl..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/59996 [17:29:00] New review: Ram; "Might be useful to also check if the file is readable:" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/60000 [17:29:05] New review: Jdlrobson; "Anyone you'd like to nominate Leslie?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57419 [17:30:53] jdlrobson: i don't nominate, i dictate!!! [17:31:46] LeslieCarr: hehe [17:31:56] any people you'd like to dictate? :) [17:32:18] one of the oldschool site experts would be good - maybe AaronSchulz ? [17:33:57] * AaronSchulz is gratuitously pinged [17:34:59] uh [17:35:07] why are you adding action=purge to a nagois monitor? [17:35:39] New review: Lcarr; "LGTM now/ Thanks for puppetizing" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/56107 [17:36:18] New review: Asher; "don't use action=purge for a popular url in a monitor. there are other ways to bypass cache if that..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/57419 [17:37:00] or asher :) [17:37:10] thanks LeslieCarr :) [17:39:15] paravoid: idk, it's been some time since I used graphite. but there were certainly times when I found it useful. it was open to *all* labs users for quite a while. 
[17:44:16] jeremyb_: if you remember things that were useful that aren't in gdash, let's add them to gdash [17:44:33] binasher: well what's the rationale for keeping it closed? [17:44:54] i originally thought it was just swept in with ishmael as an accident [17:45:08] LeslieCarr, could you take a look at etherpad.wikimedia.org? It seems to be down [17:45:14] https://wikitech.wikimedia.org/wiki/Etherpad [17:45:23] back up again [17:45:24] it is not a secure tool fit for facing the public internet [17:45:38] Thehelpfulone: see binasher's comment [17:45:45] it allows write actions with no privilege model [17:45:53] i mean it' was made for ishmael but applies to etherpad [17:45:54] hmm nope jeremyb_? [17:45:59] huh? [17:46:02] no [17:46:03] Thehelpfulone: ww? [17:46:16] that's my opinion on etherpad! "it is not a secure tool fit for facing the public internet" [17:46:16] LeslieCarr: i was answering jeremyb_ re: graphite, not etherpad [17:46:22] I have no idea what you people are saying at this point, but no [17:46:23] that's for sure [17:46:43] i know you meant graphite but the timing was perfect [17:46:48] ;) [17:47:39] jeremyb_: I'm all for being open as you know, but please be reasonable and ask for things that are useful [17:47:46] there are always tradeoffs involved [17:47:51] paravoid: typing... [17:47:58] LeslieCarr, hmm, I can't access http://etherpad.wikimedia.org/eduleadersworkshop [17:48:04] saying "omg graphite is closed" without giving a purpose isn't very helpful [17:48:15] ssh is also closed and I'm sure you can get useful information with ssh too :) [17:48:27] paravoid: we should make a ticket for opening that up [17:48:37] lets open up the sec channel too [17:48:48] maybe it can log to an etherpad [17:48:48] hrm, it seems to hate that url … [17:49:13] Thehelpfulone: honestly etherpad is a barely supported thing, i'd suggest to just start using another one [17:49:15] I'm gonna go open up all those private wikis that exist. brb. 
[17:49:33] ops has been saying for as long as i have been working here that etherpad is a non-backed up best effort, insecure piece of crap [17:49:34] errr, what's wrong with having no user accounts and having deletes done by solving a captcha? [17:49:38] LeslieCarr: its probably been pnd [17:49:40] notpeter, start with officewiki? ;) [17:49:42] that should barely be used by anyone [17:49:49] LeslieCarr, yeah it's not for me - there's a thread on education@ about it [17:50:07] you can copy/paste that line but i would appreciate it if you made it sound not as angry [17:50:08] :) [17:50:14] though if you want to make it still sound angry, that's ok [17:50:18] Thehelpfulone: we should start with something more juicy than office :) [17:50:19] can you get the notes from the server side? [17:50:24] notpeter, internal? :D [17:50:32] arbcom all the way [17:50:34] lol [17:50:51] heck, take the arbcom double mailman security off too whilst you're at it! [17:51:05] it's true, that's very anti-transparency [17:51:41] LeslieCarr, is the etherpad lite on wmf labs more reliable? I think they just want the notes then we can warn them not to use it in the future [17:51:57] it's on labs, so not more reliable [17:52:03] i mean maybe [17:52:05] heh I thought you'd say that [17:52:07] but really, etherpad [17:52:13] it seems to work better though [17:52:21] can you get the notes from that pad from the server somewhere? [17:52:23] cool [17:52:29] open a ticket for it? [17:52:33] and where we should send the notes [17:52:34] LeslieCarr, it's a bit hypocritical that ops are using it then? :p [17:52:51] sure, yeah if you email them to me I'll forward them to someone to post to meta [17:53:02] oh yeah, and any meeting wher eit explodes, it's totally our fault [17:54:39] woah does etherpad seriously use openoffice ? [17:54:48] I would like to push a fix to the Parsoid install without setting off alarms- can I silence warnings during the deploy? 
[17:54:51] oh my god it's a bigger piece of shit than i thought [17:55:12] !log restarting etherpad on hooper [17:55:20] Logged the message, Mistress of the network gear. [17:55:24] wow even it's init.d file is annoying [17:55:31] "Restarting Collaborative real-time editor etherpad " [17:55:52] https://rt.wikimedia.org/Ticket/Display.html?id=4979 is the RT ticket Leslie [17:56:06] binasher: paravoid: i think i vaguely recall that write without perms model thing but somehow didn't think about it/recall it until you brought it up. that is probably a good reason. (although I don't remember quite how it works). maybe we could find another group to allow in addition to wmf (not sure a good one exists now) or make a read only version somehow (like we have 2 icingas already). [17:56:13] binasher: paravoid: anyway, thanks for the pointer to that major blocker... seems like it could use someone (me?) to play with it in labs and figure out how it would be useful to other people (or not) and how to deal with the write model [17:56:29] thanks [17:56:38] jeremyb_: I still don't understand what is it that you're looking for [17:56:45] LeslieCarr: yes, it does use openoffice... :-(((( [17:57:18] Thehelpfulone: yay a restart fixed it [17:57:31] paravoid: the point is i don't know either... [17:57:45] :) [17:58:09] jeremyb_: then why look? [17:58:44] I'm now deploying a Parsoid fix, so please ignore any warnings that this might produce [17:59:26] binasher: i did once (quite a while ago) play with the prod graphite for some time. (maybe over an hour? and that wasn't the only time i used it). i think i found it useful at the time... [17:59:48] gwicke: thanks for the heads up :) [18:00:09] binasher: this may be a year or 6 months ago. i don't remember it so well [18:00:11] notpeter: np [18:01:43] the Parsoid deployment is done, seems that it was fast enough to not trigger warnings [18:01:56] binasher: also, have you seen https://bugzilla.wikimedia.org/47045 yet? 
i thought you'd be interested to know about the master write on every page load. (I would think it could just be a slave read) [18:06:39] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:06:39] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:06:39] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:07:04] jeremyb_: the word "write" does not appear anywhere in 47045.. wrong ticket? [18:07:34] binasher: https://bugzilla.wikimedia.org/show_bug.cgi?id=47045#c10 [18:08:23] i got that same lock wait timeout on at least 3 separate occasions [18:08:37] (just by testing... i don't actually use the tool myself) [18:08:42] Special:AbuseLog?wpSearchUser updates user_touched? that's nuts.. that should be its own ticket, unrelated to 47045 [18:09:04] yeah, at first wasn't sure if it was related [18:09:19] ok, i'll make a new one and cc you? [18:09:44] cc reedy and aaron [18:09:50] and not you? [18:11:06] you can cc me too [18:11:14] k [18:16:42] binasher: https://bugzilla.wikimedia.org/47422 [18:20:37] binasher: you made somemone in #-tech really happy... [18:26:29] PROBLEM - Host db33 is DOWN: PING CRITICAL - Packet loss = 100% [18:29:53] !request Thehelpfulone query [18:31:05] hah [18:31:20] binasher: you're not in the office today, right? [18:31:39] RECOVERY - Host db33 is UP: PING OK - Packet loss = 0%, RTA = 26.48 ms [18:31:44] greg-g: not currently [18:32:22] binasher: k, just was going to ping you at an opportune time about the mariadb blog post, but since I can't tell if you're stressed right now, hey, how's that mariadb blog post coming? 
;) [18:32:50] hah [18:33:56] RECOVERY - Host rdb2 is UP: PING OK - Packet loss = 0%, RTA = 27.08 ms [18:34:06] PROBLEM - DPKG on db33 is CRITICAL: Connection refused by host [18:34:07] PROBLEM - Disk space on db33 is CRITICAL: Connection refused by host [18:34:11] greg-g: erik put it on the calendar for monday, what's up? [18:34:16] PROBLEM - MySQL Recent Restart on db33 is CRITICAL: Connection refused by host [18:34:16] PROBLEM - MySQL disk space on db33 is CRITICAL: Connection refused by host [18:34:26] PROBLEM - RAID on db33 is CRITICAL: Connection refused by host [18:34:27] PROBLEM - NTP on rdb2 is CRITICAL: NTP CRITICAL: No response from NTP server [18:34:33] binasher: oh, didn't know that, just making sure it was happening (I'm not totally in the loop on these things). Thanks. [18:34:36] PROBLEM - SSH on db33 is CRITICAL: Connection refused [18:34:45] and there goes db issues, man, I have bad timing [18:34:46] PROBLEM - SSH on rdb2 is CRITICAL: Connection refused [18:34:56] PROBLEM - mysqld processes on db33 is CRITICAL: Timeout while attempting connection [18:35:43] i think that's just notpeter rebuilding [18:36:09] !log increased php memory limit for fundraising civicrm instance [18:36:17] Logged the message, Master [18:38:36] RECOVERY - SSH on db33 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [18:40:45] New patchset: Diederik; "Disable final two Fundraising filters as they have moved to Gadolinium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60030 [18:42:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60030 [18:46:06] PROBLEM - Host rdb2 is DOWN: PING CRITICAL - Packet loss = 100% [18:47:32] New patchset: Jgreen; "adding packages on db29 for pgehres" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60031 [18:47:47] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60031 [18:48:05] did someone ping me in here? 
my client thought so but I can't find it [18:48:24] <^demon> I didn't, but I was about to ;-) [18:48:26] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 199 seconds [18:48:31] <^demon> Got a second to look at an RT I just filed? [18:48:34] hahaha well I am off for the night [18:48:40] I mean it's almmost 10 pm here [18:48:48] <^demon> Oh ok, don't worry about it then. [18:49:00] <^demon> I'll find someone else or ask later, totally non-urgent. Have a good night :) [18:49:05] all right :-) [18:50:26] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [18:50:27] apergos: the /topic was changed and you're in it. maybe that's it? [18:51:00] ah that would be, yep [18:51:16] RECOVERY - Host rdb2 is UP: PING OK - Packet loss = 0%, RTA = 26.99 ms [18:52:36] PROBLEM - NTP on db33 is CRITICAL: NTP CRITICAL: Offset unknown [19:08:10] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [19:08:31] PROBLEM - Host rdb2 is DOWN: PING CRITICAL - Packet loss = 100% [19:11:30] RECOVERY - NTP on db33 is OK: NTP OK: Offset -0.00191116333 secs [19:20:08] New patchset: Demon; "Begin replicating all repositories to antimony" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59927 [19:21:33] New patchset: Demon; "Begin replicating all repositories to antimony" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59927 [19:22:56] New patchset: Hashar; "conf file for lucene.jobs.sh (not used yet)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [19:23:47] New review: Hashar; "I have basically copy pasted your code, the only exception is the return + ternary which I find a b..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [19:27:07] New review: Hashar; "Applied on deployment-searchidx01 instance. 
End result of /a/search/conf/lucene.jobs.conf :" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59996 [19:31:20] New patchset: Bsitu; "Add echowikis.dblist file" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60036 [19:36:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:37:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [19:37:46] New patchset: Hashar; "lucene-jobs: enable conf file loading" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60000 [20:08:30] binasher: btw, the collector stat stuff includes CLI already and it's not separated in any way [20:08:48] thats why the job queue graphs actually have stuff other that push() stats [20:10:06] oh, right [20:11:03] I thought you'd be in the office today for some reason [20:15:46] AaronSchulz: i was going to be.. but oh well :) [20:16:41] well, Ryan's not here, so there wouldn't be booze anyway [20:17:22] AaronSchulz: maybe we should send cli to a different collector, have that feed into graphite as well, with a cli. 
prefix added to all stats, and add collector federation support to reporty.py [20:18:11] although, i wonder if having every stat duplicated under cli n graphite would just be annoying [20:19:04] New patchset: Bsitu; "Make Echo daily cron run against the wikis defined in echowikis.dblist" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60043 [20:20:13] New review: Bsitu; "This change depends on https://gerrit.wikimedia.org/r/#/c/60036/" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60043 [20:20:50] binasher: though doing things like "most deviant master" queries would then have cli- stuff there when you might only care about queries on web requests sometimes [20:21:20] yeah [20:26:27] jdlrobson: what was that clipboard copier bookmarklet url again [20:26:36] jdlrobson: the one that your friend created [20:26:56] preilly: 1s [20:27:03] jdlrobson: thanks [20:27:33] binasher: so what if it went to another collector and just report.py included it? [20:27:52] preilly: http://bookmarksplugin.tiddlyspace.com/bookmarklet.js [20:27:57] AaronSchulz: sounds good [20:28:16] afk lunch [20:28:26] * AaronSchulz volunteers binasher to code that ;) [20:29:36] I am pushing another Parsoid update out, so please ignore any Parsoid warnings in the next minutes [20:31:20] jdlrobson: thanks [20:31:49] and done [20:37:46] New patchset: Aaron Schulz; "Setup redis job queue debug log." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60074 [20:38:31] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60074 [20:39:26] !log aaron synchronized wmf-config/InitialiseSettings.php 'Setup redis job queue debug log' [20:39:44] Logged the message, Master [20:50:14] RECOVERY - MySQL Recent Restart on db33 is OK: OK seconds since restart [20:50:15] RECOVERY - mysqld processes on db35 is OK: PROCS OK: 1 process with command name mysqld [20:50:15] RECOVERY - DPKG on db35 is OK: All packages OK [20:50:15] RECOVERY - MySQL disk space on db33 is OK: DISK OK [20:50:24] RECOVERY - Disk space on db35 is OK: DISK OK [20:50:25] RECOVERY - MySQL Recent Restart on db35 is OK: OK seconds since restart [20:50:25] RECOVERY - MySQL disk space on db35 is OK: DISK OK [20:50:25] RECOVERY - DPKG on db33 is OK: All packages OK [20:50:25] RECOVERY - RAID on db33 is OK: OK: 1 logical device(s) checked [20:50:34] RECOVERY - RAID on db35 is OK: OK: 1 logical device(s) checked [20:50:35] RECOVERY - mysqld processes on db33 is OK: PROCS OK: 1 process with command name mysqld [20:50:44] RECOVERY - Disk space on db33 is OK: DISK OK [20:55:15] [19-Apr-2013 17:38:32] Fatal error: Call to a member function format() on a non-object at /usr/local/apache/common-local/php-1.22wmf2/languages/Language.php on line 1325 [20:55:17] #0 /usr/local/apache/common-local/php-1.22wmf2/languages/Language.php(1325): Language::sprintfDate() [20:55:26] anomie: does that look familiar? [20:55:53] I think someone touched that area lately [20:57:16] AaronSchulz: Ugh. What's passing in a non-timestamp to sprintfDate? 
[20:57:29] see exception.log on fluorine [20:58:14] it's on the job runners (grep for mw10(0[1-9]|1[0-6])) [20:58:41] #1 /usr/local/apache/common-local/php-1.22wmf2/extensions/ParserFunctions/ParserFunctions_body.php(481): Language->sprintfDate('m-d H:i:sZ', '-00011130000000', Object(DateTimeZone)) [20:58:56] * AaronSchulz was wondering why the job recycle rate seemed high [20:59:05] binasher: yay graphs :) [20:59:31] * anomie is not seeing it in /a/mw-log/exception.log on fluorine [21:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:02:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.312 second response time [21:05:22] anomie: gah, I mean fatal.log [21:13:25] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 197 seconds [21:14:22] AaronSchulz: It looks like someone is passing BC dates into {{#time:}}. I didn't know that was even possible. Anyway, sprintfDate never has properly handled that. But now it throws an error instead of just using the epoch. Hrm. [21:15:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 20 seconds [21:16:10] I guess the question is do we patch ParserFunctions or make sprintfDate use the epoch (or some other random date) again? [21:21:40] can someone merge https://gerrit.wikimedia.org/r/#/c/59981/ ? sets an environment variable for matplotlib, pretty lightweight stuff [21:25:59] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59981 [21:26:21] ori-l: done [21:27:48] AaronSchulz: https://gerrit.wikimedia.org/r/#/c/60078/ and https://gerrit.wikimedia.org/r/#/c/60079/ [21:29:29] thanks ottomata [21:29:51] binasher: is there a reason that we have inconsitent keys on the user_groups table between wikis? 
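[editor's note] The failing argument in the trace above, '-00011130000000', is a BC date leaking out of {{#time:}}. A TS_MW timestamp is exactly 14 digits (yyyymmddhhmmss), and the BC value carries a sign plus an extra digit, so it cannot be parsed as one. The regex below is my own illustration of that shape check, not MediaWiki's actual validation code.

```shell
# Compare a normal TS_MW timestamp against the BC value from the fatal.
for ts in 20130419205515 -00011130000000; do
    if printf '%s' "$ts" | grep -Eq '^[0-9]{14}$'; then
        result="$ts: valid TS_MW"
    else
        result="$ts: not a TS_MW timestamp"
    fi
    echo "$result"
done
```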
older ones have PRIMARY KEY (ug_user, ug_group) and all newer wikis have UNIQUE KEY (ug_user, ug_group) [21:51:05] HI [21:51:07] Just a heads up [21:51:17] Getting some erratic behaviour at wikisource [21:51:27] where it occasionally spits up blank pages [21:51:28] erratic like? [21:51:39] huh, any all types of pages? [21:51:42] I sometimes get blank pages when saving [21:51:45] s/any/on/ [21:51:48] Page: Namespace [21:52:15] AaronSchulz: could that be a caching problem? ^^ [21:52:35] I also recently got a Connection reset error [21:52:50] Qcoder00: you know what i'm going to tell you now? [21:52:54] ctrl-shift-k! [21:53:00] white blank pages say what? [21:53:30] Bad header... [21:53:40] huh? [21:54:09] Well bad header or no content... [21:55:25] logged in, right? [21:55:36] no particular URLs? [21:55:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 206 seconds [21:56:03] Yep logged in [21:56:07] anomie|away: https://gerrit.wikimedia.org/r/#/c/60078/1/engines/LuaCommon/LanguageLibrary.php [21:56:12] shouldn't that say "from 0"? [21:56:14] and it was specifically to do with page namespace [21:56:51] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 192 seconds [21:56:55] huh, i'm getting a 404: https://bits.wikimedia.org/static-1.22wmf2/skins/chick/main.css?303-4 [21:57:25] Ooops-ey. [21:57:29] AaronSchulz: I suppose. [21:57:43] which namespace? [21:57:46] Qcoder00: ? [21:57:51] Page: [21:58:06] Typically when I save new content [21:58:46] i didn't realize it was on save [21:58:57] give me something to edit? [21:59:59] http://en.wikisource.org/w/index.php?title=Page:Goody_Two-Shoes_(1881).djvu/127&action=edit&redlink=1 [22:01:27] And it seems to have eased...
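On binasher's user_groups question above: a composite PRIMARY KEY and a composite UNIQUE KEY both enforce one row per (ug_user, ug_group) pair, so duplicate inserts fail identically under either schema generation; the practical difference in MySQL/InnoDB is which index becomes the clustered index (a UNIQUE index over NOT NULL columns is only promoted to that role when no primary key exists). A quick uniqueness demo using sqlite3, illustrative only since production runs MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Older wikis: composite primary key on user_groups.
conn.execute("""CREATE TABLE ug_old (
    ug_user INTEGER NOT NULL, ug_group TEXT NOT NULL,
    PRIMARY KEY (ug_user, ug_group))""")
# Newer wikis: plain unique index over the same column pair.
conn.execute("""CREATE TABLE ug_new (
    ug_user INTEGER NOT NULL, ug_group TEXT NOT NULL,
    UNIQUE (ug_user, ug_group))""")

for table in ("ug_old", "ug_new"):
    conn.execute(f"INSERT INTO {table} VALUES (1, 'sysop')")
    try:
        conn.execute(f"INSERT INTO {table} VALUES (1, 'sysop')")
    except sqlite3.IntegrityError:
        # Both schemas reject the duplicate (user, group) row.
        print(table, "rejects duplicate (ug_user, ug_group)")
```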
[22:01:29] * Qcoder00 out [22:01:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:02:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [22:05:30] jeremyb_: heh, well, thanks for helping there [22:06:02] greg-g: takes too long to transcribe... can't test for white screen of death fast enough! [22:06:12] also, learning how wikisource works on the fly :) [22:06:15] heh [22:11:12] binasher: https://ishmael.wikimedia.org/?hours=4&host=db1056&sort=time [22:11:20] I wonder why the commit step is so high [22:12:22] Anyone got a pastebin? [22:12:58] dpaste.com [22:13:17] greg-g: he's still having issues [22:13:24] I see [22:13:39] i can't stick around too long [22:13:51] so, I suck at the debugging part... [22:14:12] AaronSchulz: if you could help, or pull in/pick someone else... [22:15:35] http://dpaste.com/1065450/ headers [22:15:49] That's from a page that came back blank when i saved something [22:16:45] errr [22:16:49] no that's not [22:16:53] you copied the wrong entry [22:17:02] look for one that says POST [22:17:06] shouldn't wikisource be on wmf2? that says wmf1 [22:17:33] greg-g: you mean php version? [22:17:36] that's irrelevant [22:17:46] > X-Powered-By:PHP/5.3.10-1ubuntu3.5+wmf1 [22:18:12] i.e. that changes less than 5x / year probably (wild guess) [22:18:18] oh, ok [22:18:26] it's the version of php itself not of mediawiki [22:18:35] oh, whoa, yeah, total misread [22:18:46] I'm not seeing anything that says POST [22:19:01] That's from the headers I got when I tried to do a refresh on the page [22:19:10] Qcoder00: hit the clear button on the console, submit again, see what happens [22:19:25] And by blank I mean not even the mediawiki skin came up [22:19:30] Qcoder00: again, you copied the wrong entry. i didn't realize it was a refresh [22:20:00] Qcoder00: anyway... you need something on the wikisource domain.
bits is not what you're looking for (for this issue at least) [22:20:17] looks like an If-Modified-Since:Fri request with a blank 304 header only response...which seems unremarkable [22:20:29] AaronSchulz: he pastebinned the wrong thing [22:20:37] that would explain [22:20:45] http://dpaste.com/1065450/ is the correct thing as far as I can see.. [22:20:50] being the GET for part of the page [22:21:01] If it's wrong then I lack the competence to explain this :( [22:21:26] Qcoder00: hit the clear button, refresh again, copy the *first* entry at the very top of the console [22:21:39] that paste still looks normal [22:21:47] AaronSchulz: it's the same URL... :) [22:22:25] http://dpaste.com/1065457/ [22:22:29] yeah, I misread that as someone going "here the correct thing" [22:22:43] Apologies for the formatting [22:22:45] :( [22:22:47] *here is [22:23:39] OK it loaded on the third attempt at refresh [22:24:17] Qcoder00: again, wrong thing. you do *not* want to paste something that says bits... [22:24:30] Well then I lack the competence to explain this [22:24:44] the thing you click on in the console must say wikisource in it [22:24:46] because I am certainly NOT posting anything that says bits [22:24:51] at the *beginning* [22:24:55] in the domain name [22:25:04] the 1065457 link should be [22:25:06] > http://bits.wikimedia.org/en.wikisource.org/load.php?debug=false&lang=en&modules=ext.wikiEditor%7Cext.wikiEditor.toolbar%7Cjquery.wikiEditor%7Cjquery.wikiEditor.toolbar%7Cjquery.wikiEditor.toolbar.config%2Ci18n&skin=vector&version=20130419T025025Z&* [22:25:12] that's from your last paste [22:25:14] that says bits [22:25:15] something with wikisource in it [22:25:43] put another way: it should not say load.php [22:25:44] I'm lacking in competence to explain this [22:26:00] Thanks for your time so far, but I'm too stupid [22:26:04] :( [22:26:06] :( [22:26:50] I don't at the moment see a link that isn't bits [22:27:14] right.
so you may need to clear your browser cache [22:27:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:27:37] OK [22:28:30] And it seems to load OK if I type the URL directly in the address bar [22:29:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.176 second response time [22:29:54] I like how tail -f poolcounter.log shows all kinds of languages [22:30:46] I'll leave this a while and see if it clears by itself... [22:30:59] Qcoder00: which browser are you using (sorry I'm a bit late to the conversation) [22:31:04] Firefox [22:31:09] Firefox 20 [22:31:11] I think [22:31:17] OK, thanks [22:32:40] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:00] And if I preview my saves first I don't get the glitch [22:33:14] I am saying it's intermittent, and that usually means it's my side :) [22:33:50] Qcoder00: I just tried Internet Explorer and got an error on wikisource as the editing toolbar was loading, trying in Firefox now. [22:38:34] New patchset: Andrew Bogott; "Add manage-nfs-volumes-daemon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60083 [22:39:17] New review: Andrew Bogott; "Work in progress -- do not merge." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60083 [22:45:43] New patchset: Andrew Bogott; "Add manage-nfs-volumes-daemon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60083 [22:45:59] New review: Andrew Bogott; "Work in progress -- do not merge."
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60083 [22:48:33] Very strange [22:48:55] This blank page returning thing seems to be intermittent, as I am seeing it again [22:49:52] Both the links generated in the console are to bits [22:49:55] :( [22:49:59] And it gets no further [22:50:06] Change abandoned: Cmjohnson; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42035 [22:55:29] Qcoder00: to be clear... you're not closing and reopening the console? you're leaving it open the whole time? [22:55:37] Yep [22:55:45] The whole time I try to load [22:55:57] And the blank page on refresh seems to be intermittent [22:56:08] Sometimes it works, sometimes it doesn't. [23:07:15] !log brief reboot of labstore3 -> test to see that everything starts up as expected. [23:07:22] Logged the message, Master [23:08:41] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100% [23:09:31] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms [23:12:12] PROBLEM - MySQL Slave Delay on db1051 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:13:22] PROBLEM - MySQL Slave Running on db1051 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:14:22] RECOVERY - MySQL Slave Running on db1051 is OK: OK replication [23:15:02] RECOVERY - MySQL Slave Delay on db1051 is OK: OK replication delay seconds [23:15:51] PROBLEM - MySQL Idle Transactions on db1051 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:16:41] RECOVERY - MySQL Idle Transactions on db1051 is OK: OK longest blocking idle transaction sleeps for seconds [23:16:52] mutante: why add people to !g 49069 ?
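A note on the blank-page thread that winds down above: the header-only 304 responses AaronSchulz called unremarkable are standard conditional-GET behaviour. When the If-Modified-Since date a browser sends is not older than the resource's Last-Modified, the server replies 304 Not Modified with no body and the browser reuses its cache. A simplified Python sketch of that server-side decision (hypothetical helper, not Varnish's or Apache's actual logic):

```python
from email.utils import parsedate_to_datetime

def conditional_get(if_modified_since: str, last_modified: str, body: bytes):
    """Return (status, body) for a simplified conditional GET.

    A 304 carries headers only; the empty body is expected behaviour,
    not a sign of a broken page.
    """
    if if_modified_since:
        ims = parsedate_to_datetime(if_modified_since)
        lm = parsedate_to_datetime(last_modified)
        if lm <= ims:
            return 304, b""      # client's cached copy is still fresh
    return 200, body             # unconditional (or stale): full response

page = b"<html>wikisource page</html>"
# Cached copy newer than the resource: bodiless 304.
print(conditional_get("Fri, 19 Apr 2013 22:00:00 GMT",
                      "Thu, 18 Apr 2013 10:00:00 GMT", page))
# No validator sent: full 200 with the page body.
print(conditional_get("", "Thu, 18 Apr 2013 10:00:00 GMT", page))
```

The genuinely blank saves Qcoder00 saw would be a 200 with missing content, which is a different thing from these 304s.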
[23:18:06] jeremyb_: because i talked to them about it for a while the night before it was merged and i wanted them to know it's been merged [23:18:25] ok :) [23:18:32] i thought you actually wanted them to review it :P [23:18:51] they had volunteered to be added before it had been merged [23:19:12] then it was merged, then they reminded me i didn't actually add them, then i did :p [23:19:28] they is funny [23:20:22] PROBLEM - MySQL Recent Restart on db1051 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:20:40] binasher: you? ^^ [23:21:12] RECOVERY - MySQL Recent Restart on db1051 is OK: OK seconds since restart [23:21:25] meh, is just nrpe [23:24:39] wtf [23:25:19] huge spike in enwiki queries [23:26:15] holy shit [23:26:15] wow [23:26:29] big spike in apache traffic and load too [23:26:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:57] yeah [23:27:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [23:28:11] standoff in boston? [23:28:31] appears to have escalated in the last ten minutes or so, dunno if that would correlate [23:29:37] maybe [23:29:43] Twitter says 'suspect down' [23:30:04] stat of increase is way too steep, I think, [23:30:06] New patchset: Dzahn; "delete old planet apache site config file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60084 [23:30:14] *start of [23:31:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:31:34] New review: Dzahn; "outdated.
replaced by erb templates in new planet" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/60084 [23:31:35] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60084 [23:32:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [23:34:08] oh yeah.. [23:40:06] notpeter: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report looks nice :) [23:45:59] notpeter: wtf is with tail -f api.log | grep frwiki [23:46:19] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
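A closing footnote on the fatal-log hunt near the top of this log: the grep pattern suggested there, mw10(0[1-9]|1[0-6]), selects the job-runner hosts mw1001 through mw1016 (with grep itself, that alternation syntax would typically need -E). A quick sanity check of the host range it matches:

```python
import re

# The job-runner host pattern quoted earlier in the log.
runner = re.compile(r"mw10(0[1-9]|1[0-6])")

hosts = ["mw1001", "mw1009", "mw1010", "mw1016", "mw1017", "mw1000", "mw1100"]
matched = [h for h in hosts if runner.fullmatch(h)]
print(matched)
# mw1000 (suffix 00) and mw1017 fall outside the 01-16 range; mw1100
# fails the literal "mw10" prefix.
```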