[00:00:08] can I request a varnish cache flush for mobile. We have some cached html which is stopping some important javascript loading causing quite a serious bug - 42749 [00:01:10] awjr asked half an hour or so [00:01:20] I think his ops bribe offer wasn't high enough [00:01:26] haha [00:01:29] I've just asked Ryan Lane [00:01:30] two whiskeys then? [00:01:37] Ryan_Lane: i'll get you a whiskey [00:01:56] Ryan_Lane: I offer 2 whiskeys [00:02:04] offer accepted from all of you [00:02:05] it's done [00:02:06] lol [00:02:09] hahaha [00:02:24] that's 5 whiskeys [00:02:26] confirmed fixed [00:02:31] thanks [00:02:47] Have you purged the bits varnish caches tooo? [00:02:53] Ryan_Lane: you flushed non-mobile cache as well? [00:02:57] oh, that's needed too? [00:03:03] I'm not sure that's a great idea [00:03:11] possibly, per bug 42452 [00:03:30] From what m ark has said before, it only takes a few minutes for them to be repopulated (due to the amount of data)... [00:03:42] So as long as all of them weren't done at once [00:03:51] ok [00:03:53] gimme a sec [00:03:54] New patchset: Dzahn; "add redirect for wikivoyage.net and save a few code lines by using a regex for .com, .de and .net" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/37345 [00:04:50] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/37345 [00:06:31] I'm flushing it now [00:06:33] eqiad first [00:08:27] thanks, let me know when they are all flushed and I'll ping the people who are complaining :) [00:11:06] kaldari: done [00:11:15] yay [00:11:58] Haha, nice [00:12:11] There's no sign of any changes on the ganglia graphs [00:12:41] Reedy: those caches refill so quickly [00:12:45] only slightly [00:12:49] on the app servers [00:12:50] Yup [00:12:52] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Bits%2520application%2520servers%2520pmtpa&tab=m&vn= [00:13:05] ah, more noticeable there [00:13:14] eqiad isn't [00:15:18] RECOVERY - 
Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.071 seconds [00:20:31] New patchset: Pyoungmeister; "coredb monitoring: remove uncalled for conditional" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37347 [00:20:53] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37347 [00:21:07] New patchset: Dzahn; "wikivoyage.[com|de|net] redirects - break out into seprate rules again, can just summarize the first part, second part would not work like this" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/37348 [00:21:45] New patchset: Dzahn; "wikivoyage.[com|de|net] redirects - break out into seprate rules again, can just summarize the first part, second part would not work like this" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/37348 [00:23:31] New review: Dzahn; "testing 12 urls on 1 servers, totalling 12 requests" [operations/apache-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/37348 [00:23:33] New patchset: Pyoungmeister; "add name for mysql user" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37349 [00:23:38] New review: Dzahn; "testing 12 urls on 1 servers, totalling 12 requests" [operations/apache-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/37348 [00:23:38] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/37348 [00:23:55] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37349 [00:27:03] dzahn is doing a graceful restart of all apaches [00:27:24] !log dzahn gracefulled all apaches [00:27:33] Logged the message, Master [00:31:38] New patchset: Pyoungmeister; "swapping roles on db61 for testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37351 [00:31:46] kaldari: did that fix the problem, btw? 
[00:32:06] Ryan_Lane: no idea, but we'll see if anyone else complains of the problem [00:32:09] !log DNS update - adding wikivoyage.net (link to .org) [00:32:12] heh [00:32:12] ok [00:32:17] Logged the message, Master [00:32:21] Ryan_Lane: I can't reproduce it myself [00:33:09] hooray for whiskey! [00:33:12] It seemed to have been affecting a large number of people on multiple wikis though [00:33:17] * Ryan_Lane nods [00:33:28] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37351 [00:33:45] I like to support the Tennessee economy when I can [00:34:11] :D [00:38:15] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:24] PROBLEM - swift-container-auditor on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:42] PROBLEM - swift-account-replicator on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:42] PROBLEM - swift-object-updater on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:42] PROBLEM - swift-account-reaper on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:51] PROBLEM - swift-container-updater on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:27] PROBLEM - swift-object-replicator on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:27] PROBLEM - swift-account-server on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:27] PROBLEM - swift-container-server on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:40:30] PROBLEM - swift-object-server on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:40:31] PROBLEM - SSH on ms-be7 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:40:48] PROBLEM - swift-account-auditor on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:40:57] PROBLEM - swift-container-replicator on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:27] RECOVERY - swift-container-replicator on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [00:42:36] RECOVERY - swift-container-server on ms-be7 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [00:42:36] RECOVERY - swift-object-replicator on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [00:42:36] RECOVERY - swift-account-server on ms-be7 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [00:43:03] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [00:43:12] RECOVERY - swift-container-auditor on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [00:43:31] RECOVERY - swift-account-reaper on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [00:43:31] RECOVERY - swift-account-replicator on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [00:43:31] RECOVERY - swift-object-updater on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [00:43:39] RECOVERY - SSH on ms-be7 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [00:43:39] RECOVERY - swift-container-updater on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [00:43:39] RECOVERY - swift-object-server on ms-be7 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [00:43:57] RECOVERY - swift-account-auditor on ms-be7 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [00:45:22] oh 
yay [00:45:30] our first broken disk with the new systems [00:46:00] New patchset: Kaldari; "Re-enabling wgAllowCopyUploads for Commons for experimental Flickr uploading, see bug 20512." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37353 [00:46:14] iirc the devs/other operators say that swift is particularly good about detecting failing disks early [00:46:30] or error counts in some log are or something [00:47:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:48:56] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37353 [00:54:27] !log reprepro changes: add precise-wikimedia deb universe amd64 mariadb-server-5.5 5.5.28-mariadb-wmf201212041~precise [00:54:35] Logged the message, Master [00:56:13] !log kaldari synchronized wmf-config/InitialiseSettings.php 'turning on experimental Flickr uploading for admins on Commons' [00:56:22] Logged the message, Master [01:03:08] !log reedy synchronized php-1.21wmf5/extensions/ParserFunctions/Expr.php [01:03:16] Logged the message, Master [01:04:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.103 seconds [01:15:47] binasher: where are you planning to use maria? 
[01:36:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:10] New patchset: Asher; "mariadb testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37365 [01:51:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.586 seconds [02:00:27] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [02:00:27] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [02:00:28] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [02:24:07] !log LocalisationUpdate completed (1.21wmf5) at Fri Dec 7 02:24:07 UTC 2012 [02:24:17] Logged the message, Master [02:25:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:33:27] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [02:38:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.067 seconds [03:16:30] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [03:16:30] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [03:33:27] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [03:33:27] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [03:33:38] New patchset: Catrope; "Another pmpta typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37371 [03:40:35] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37371 [03:48:34] New patchset: Catrope; "Try using check_http_on_port instead of check_lvs_http_on_port for Parsoid monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37375 [04:13:24] Change merged: Asher; [operations/puppet] 
(production) - https://gerrit.wikimedia.org/r/37365 [04:16:54] New patchset: Ori.livneh; "Yet another pmpta -> pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37378 [04:21:22] New patchset: Asher; "our jenkins puppet test is broken :(" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37379 [04:25:30] PROBLEM - LVS HTTP IPv4 on parsoid.svc.pmtpa.wmnet is CRITICAL: (null) [04:35:11] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37379 [04:52:12] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [04:54:25] New patchset: Asher; "var check fix" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37382 [04:54:42] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37382 [04:59:00] !log kaldari synchronized php-1.21wmf5/extensions/UploadWizard/resources/mw.UploadWizardDetails.js 'fixing live geo bug in UploadWizard' [04:59:10] Logged the message, Master [05:04:26] New patchset: Asher; "reversing the logic for whether my.cnf should contain facebook-patch only options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37385 [05:05:14] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37385 [05:15:09] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [05:25:08] !log aaron synchronized php-1.21wmf5/includes/job/JobQueueDB.php 'deployed 78c63d4cafb4937e289856f66ac0e524fda79acb' [05:25:17] Logged the message, Master [05:35:15] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [06:09:18] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [06:10:21] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [06:14:15] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [06:39:26] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - 
HTTP/1.1 301 Moved Permanently - 0.082 second response time [07:45:49] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [07:53:46] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [08:25:41] PROBLEM - LVS HTTP IPv4 on parsoid.svc.pmtpa.wmnet is CRITICAL: (null) [08:28:14] maybe that shouldn't page right now [08:30:04] given that the nagios checks are supposedly broken [08:30:48] Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini. [08:31:06] someone attempted to haXX0rize us?:P [08:31:27] :-D [08:31:36] no one would do that, we're the good guys [08:31:54] (that's sarcasm, I just couldn't find the sarcasm emoticon on my keyboard) [08:31:57] but that annoys bad guys! [08:36:36] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37375 [08:38:59] let's see if roan's change will make those go away [08:43:36] of course puppet takes forever to run on spence >_< [09:10:09] oh joy [09:10:24] "cannot override local resource" again, after all the waiting [09:12:40] hello [09:12:47] New patchset: Hashar; "rake validate now fail properly on .pp validation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37394 [09:14:32] apergos: could you merge https://gerrit.wikimedia.org/r/37394 ? That fixes the Jenkins job in charge of validating operations/puppet manifests :) [09:14:40] yeah just a sec [09:14:45] sure :-) [09:17:55] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37394 [09:20:13] waiting for the run to complete [09:20:34] thanks a lot! 
[09:21:04] err: /Stage[main]/Misc::Docsite/File[/srv/org/wikimediaq/doc/index.html]/ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /srv/org/wikimediaq/doc/index.html.puppettmp_4914 at /var/lib/git/operations/puppet/manifests/misc/docs.pp:17 [09:21:05] hm [09:21:39] ah that is Andrew Bogott's change [09:23:03] was tha rakefile something that would have been pulled in by git clone? [09:23:14] *that [09:23:37] apergos: I am not sure I understand your question [09:23:38] RECOVERY - LVS HTTP IPv4 on parsoid.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.016 seconds [09:23:53] the misc::docsite has an error in it, it requires a file which is never provided apparently [09:23:53] ha, that's roan's change working [09:24:50] well I'm asking about your change to rakefile, which I expected to show up as changed somehow in the puppet run [09:24:56] on gallium right? [09:25:05] ahh [09:25:23] Jenkins gets the latest version of the production branch then merges in the submitted change [09:25:24] but the only thing I saw that could have been it was [09:25:26] notice: /Stage[main]/Misc::Docs::Puppet/Git::Clone[puppetsource]/Exec[git_pull_puppetsource]/returns: executed successfully [09:25:35] so that's why I was asking if the git clone covered that [09:25:39] so as soon as you merged the rake file change, Jenkins knows about it [09:25:59] Git::Clone should get the latest version, I think that is the default [09:26:03] ok great [09:26:09] then you should be set to test that now [09:26:10] but Jenkins does not use the file from /etc/puppet/something [09:26:51] ok [09:27:03] New patchset: Hashar; "puppet manifest failure (do not submit)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37396 [09:27:08] I should do an office hour to explain jenkins to everyone :-) [09:27:22] +2! 
[09:27:52] looks like your failure was flagged as failure, w00t [09:27:57] \O/ [09:28:08] I still have to make the console log nicer [09:28:15] it is a bit hard to find out what is actually failing [09:29:17] Change abandoned: Hashar; "yeah that fails as expected!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37396 [09:31:46] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35298 [09:32:26] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/37118 [09:32:37] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/37165 [09:32:43] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/37153 [09:32:50] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/37118 [09:33:27] that retriggered the lint checks [09:33:32] and made Jenkins V+1 [09:40:41] New patchset: MaxSem; "Enable GeoData on all wikipedias and wikivoyages" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37398 [09:41:06] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37398 [09:48:32] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/37398' [09:48:39] Warning: the RSA host key for 'hume' differs from the key for the IP address '2620:0:860:2:21d:9ff:fe33:f235' [09:48:39] Offending key for IP in /etc/ssh/ssh_known_hosts:835 [09:48:39] Matching host key in /etc/ssh/ssh_known_hosts:603 [09:48:43] Logged the message, Master [09:52:28] ehm, why doesn't 'wikipedia' => somevalue work in InitialiseSettings? 
[09:52:35] New patchset: MaxSem; "Revert "Enable GeoData on all wikipedias and wikivoyages"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37399 [09:53:09] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37399 [09:54:02] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/#/c/37399/' [09:54:08] o_0 [09:54:11] Logged the message, Master [10:27:54] RECOVERY - Parsoid on kuo is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.005 seconds [10:28:12] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.054 seconds [10:28:21] RECOVERY - Parsoid on lardner is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.009 seconds [10:28:30] RECOVERY - Parsoid on tola is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.003 seconds [10:28:30] RECOVERY - Parsoid on wtp1 is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.009 seconds [10:28:48] RECOVERY - Parsoid on mexia is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.003 seconds [10:29:25] heh [10:53:49] amusing names [10:54:02] why don't they use parsoid001 ? 
;) [11:05:00] New patchset: Mark Bergsma; "Handle many more error conditions" [operations/software] (master) - https://gerrit.wikimedia.org/r/37231 [11:07:06] New patchset: Mark Bergsma; "Randomize the order of containers" [operations/software] (master) - https://gerrit.wikimedia.org/r/37405 [11:07:06] New patchset: Mark Bergsma; "Use connection pooling for every Swift operation" [operations/software] (master) - https://gerrit.wikimedia.org/r/37406 [11:07:07] New patchset: Mark Bergsma; "Don't unnecessarily rerequest dst containers" [operations/software] (master) - https://gerrit.wikimedia.org/r/37407 [11:07:07] New patchset: Mark Bergsma; "Get rid of the useless HEAD request on every object creation" [operations/software] (master) - https://gerrit.wikimedia.org/r/37408 [11:51:23] PROBLEM - Puppet freshness on db59 is CRITICAL: Puppet has not run in the last 10 hours [12:01:26] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [12:01:26] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [12:01:27] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [12:33:23] PROBLEM - Puppet freshness on db9 is CRITICAL: Puppet has not run in the last 10 hours [12:34:26] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [12:35:12] New patchset: Hashar; "wikibugs perl dependencies are needed for Jenkins" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37419 [12:52:10] New patchset: Jgreen; "add backupmover account to fundraising db dump boxes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37422 [12:52:48] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37422 [12:54:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:57:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad 
Request - 336 bytes in 0.335 seconds [13:10:00] do mwscript and company work on spence too? [13:12:38] <^demon> apergos: Ping. [13:13:10] New review: Nemo bis; "By the thanks also to some CR via IRC by Ariel I sent an e-mail on this about a week ago but got no ..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/33713 [13:14:08] !log demon synchronized php-1.21wmf5/extensions/Wikibase 'Syncing wikibase to 24d8471' [13:14:18] Logged the message, Master [13:17:41] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [13:17:41] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [13:18:43] New patchset: Demon; "Adding wikibase to debug log groups" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37426 [13:20:20] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37426 [13:28:06] !log demon synchronized wmf-config/CommonSettings.php 'Deploying wikibase debug log groups' [13:28:14] Logged the message, Master [13:32:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:34:38] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [13:34:38] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [13:44:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.069 seconds [13:50:02] Change abandoned: Andrew Bogott; "Most links turn out to work OK without this, and I don't like this particular approach." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/37118 [14:04:26] !log demon synchronized php-1.21wmf5/extensions/Wikibase/client/includes/store/sql/WikiPageEntityLookup.php [14:04:35] Logged the message, Master [14:11:25] !log demon synchronized php-1.21wmf5/extensions/Wikibase/client/includes/store/sql/WikiPageEntityLookup.php 'More debugging' [14:11:33] Logged the message, Master [14:14:28] !log demon synchronized php-1.21wmf5/extensions/Wikibase/client/includes/store/sql/WikiPageEntityLookup.php 'More debugging, Iaf3cfff9' [14:14:36] Logged the message, Master [14:16:40] hashar, feel like reviewing a few lines of php? https://gerrit.wikimedia.org/r/#/c/37250/ [14:17:45] luuuve php [14:18:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:45] andrewbogott: reviewing [14:23:50] thanks [14:29:36] andrewbogott: done https://gerrit.wikimedia.org/r/#/c/37250/ [14:29:49] andrewbogott: implode( explode() ) is a nice trick :) [14:30:33] Easier than figuring out regexp in yet another language [14:31:26] <^demon> I tried to fix doc.wikimedia.org ssl yesterday. [14:32:28] <^demon> Hmm, the apache config got added, but still seems to be serving old cert & docroot :\ [14:32:32] <^demon> Wonder if apache got kicked. [14:33:45] <^demon> andrewbogott: Can you try restarting apache on gallium? [14:34:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds [14:34:36] ^demon: Yep, done. [14:35:07] <^demon> Yay, fixed. [14:35:09] <^demon> Thanks! 
[14:35:12] <^demon> https://doc.wikimedia.org/ [14:38:44] New patchset: Demon; "Adding stopgap measure for testing wikidata client on test2wiki" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37429 [14:41:21] Jarry1250: /wii mutante [14:41:30] ehm [14:47:09] andrewbogott: regarding your Openstack manager change, I will let Ryan +2 it [14:47:20] * andrewbogott nods [14:47:25] andrewbogott: I don't want to mess with "his" extension :) [14:47:45] Don't you? [14:47:47] I do ;) [14:48:25] hehe [14:49:00] hashar, re-review? [14:50:32] hm [14:50:42] I need to enable lint checks on that extension :) [14:50:51] That's why I break the LDAP extension now and again ;D [14:51:34] andrewbogott: nice =) [14:52:49] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [14:59:54] andrewbogott: I got it [15:00:09] andrewbogott: it is a bit hacky to append the URL to the class name for display purposes [15:00:52] You mean, in the multiselect label? [15:02:24] $instanceInfo["${puppetgroupname}-puppetclasses"] had 'options' set to a list of plain text puppet classes [15:03:04] now it will be something like: ganglia::client … ntp::client .. [15:04:45] Is that dict used anywhere apart from setting up the web form, though? [15:04:55] probably not [15:06:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:07:38] ^demon|brb: pong, sorry for the long delay [15:07:51] you caught me just after I had gone to do some errands [15:08:02] <^demon|brb> No worries, ended up figuring it out. [15:08:09] ok great [15:08:22] I am less here than I would like [15:08:30] and getting pretty tired of my digestion recently [15:12:07] hashar: I find it weird that the widget uses the dict[key] as the field label. But, given that, the rest of the craziness follows :) [15:13:09] Best option would be to add a help url field to the widget. Dunno if anyone but me would use that though. 
[15:15:56] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [15:16:00] paravoid: do mwscript and friends work on spence? [15:16:15] I don't know [15:16:21] what are you looking for? [15:17:38] paravoid: I'd like to submit a check like the one for the enwiki jobqueue [15:17:57] but that uses a direct DB query which requires passwords and whatnot [15:18:32] I don't think there is a local mw installation there [15:18:58] is the mysql password different for all clusters? [15:19:19] huh, I lie [15:19:19] there is one [15:19:34] oh [15:19:40] in which case multiversion and all the rest will work fine [15:19:44] wonderful [15:19:48] Nemo_bis: There's a script for doing that already [15:19:53] And I added a total count to it too... [15:19:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.383 seconds [15:20:03] Reedy: that only checks those above 10k I think? [15:20:07] hmm where? [15:20:09] No [15:20:27] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/WikimediaMaintenance.git;a=blob;f=getJobQueueLengths.php;h=25179a01093705ae4b101cb38367dcd4d4afc609;hb=HEAD [15:20:30] It does every one [15:21:06] ah, yes, maintenance script :) [15:21:07] where does it run? [15:21:17] or rather, where is output stashed? 
[15:21:51] Doing all of those sql connections separately takes an age (I know, I tried it) [15:21:59] aww [15:22:50] !log reedy synchronized php-1.21wmf5/extensions/WikimediaMaintenance/ [15:22:58] Logged the message, Master [15:23:34] Total 5812836 [15:23:34] real 0m27.778s [15:25:02] fast enough [15:25:25] The other is multiple times longer [15:25:33] and outputting to console is probably the slowest part [15:27:46] PROBLEM - Puppet freshness on db1043 is CRITICAL: Puppet has not run in the last 10 hours [15:29:05] Total 5814623 [15:29:05] real 0m10.913s [15:29:15] a third of the time if you don't output all the zeros [15:29:44] seems easy enough to skip those [15:30:24] I just did it ;) [15:30:30] what command is that? [15:30:30] Reedy, why didn't https://gerrit.wikimedia.org/r/#/c/37398/ work? [15:30:30] https://gerrit.wikimedia.org/r/37437 [15:30:33] By default [15:30:58] ah [15:31:05] and what about getting only the total? [15:31:14] http://p.defau.lt/?79QoWQsguFdgb4r0rMheVw [15:31:27] MaxSem: didn't work for what? Wikipedia? [15:31:33] yes [15:31:51] worked for wikivoyages though [15:32:02] lol [15:32:20] Try 'wiki' [15:32:21] I think.. [15:32:31] I know there is 'wikipedia' => in other configs... [15:34:25] if it starts to resist, I'll have to punish it by deploying everywhere:P [15:36:20] Nemo_bis: https://gerrit.wikimedia.org/r/37438 [15:36:46] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [15:37:20] mwscript extensions/WikimediaMaintenance/getJobQueueLengths.php --totalonly [15:37:56] Reedy: only the number or also "Total"? [15:38:02] Also Total [15:38:28] (\d+) would suffice to get the number from the line [15:38:46] apergos: how's swift? :) [15:38:53] sucky [15:39:32] the ETAs on these object replication runs are around 170 hours now [15:39:34] !log reedy synchronized php-1.21wmf5/extensions/WikimediaMaintenance [15:39:42] Logged the message, Master [15:39:53] for what exactly? 
[15:39:57] down from 210 hours yesterday morning but still [15:40:18] these are what shuffle the data around [15:42:12] Nemo_bis: Also, yes, spence should have mwscript etc set up [15:42:22] at worst, you might need to give it /path/to/mwscript [15:42:31] or php /path/to/MWScript.php [15:42:45] ok thanks [15:42:49] Reedy: spence appears to have a copy of the standard install [15:42:52] with wmf-config and all the rest [15:43:04] so stuff should run there just like anywhere else [15:43:05] yeah, it's in mediawiki-installation [15:43:38] reedy@spence:~$ mwscript [15:43:38] mwscript: command not found [15:44:07] It doesn't get the appserver package.. So no /usr/local/bin/mwscript [15:44:20] yeah but [15:44:49] Lol [15:44:49] reedy@spence:~$ php /home/wikipedia/common/multiversion/MWScript.php extensions/WikimediaMaintenance/getJobQueueLengths.php --totalonly [15:44:49] PHP Fatal error: Class 'Memcached' not found in /home/wikipedia/common/php-1.21wmf5/includes/objectcache/MemcachedPeclBagOStuff.php on line 57 [15:44:49] Fatal error: Class 'Memcached' not found in /home/wikipedia/common/php-1.21wmf5/includes/objectcache/MemcachedPeclBagOStuff.php on line 57 [15:44:56] /apache/common-local/multiversion [15:45:00] I guess someone will need to fix that first [15:45:26] so spence needs php5-memcached [15:45:39] try running the /apache/common/ version [15:45:42] any better? 
[15:46:30] ah I see [15:46:31] meh [15:46:32] Worse [15:46:32] PHP Warning: require(/usr/local/apache/common-local/php-1.21wmf5/../wmf-config/CommonSettings.php): failed to open stream: Permission denied in /usr/local/apache/common-local/php-1.21wmf5/maintenance/doMaintenance.php on line 88 [15:46:32] PHP Fatal error: require(): Failed opening required '/usr/local/apache/common-local/php-1.21wmf5/../wmf-config/CommonSettings.php' (include_path='.:/usr/local/apache/common-local/php-1.21wmf5:/usr/local/apache/common-local/php-1.21wmf5/includes:/usr/local/apache/common-local/php-1.21wmf5/languages:/usr/local/apache/common-local/php-1.21wmf5/maintenance') in /usr/local/apache/common-local/php-1.21wmf5/maintenance/doMaintenance.php [15:46:32] on line 88 [15:46:57]