[00:08:47] !log asher synchronized wmf-config/db.php 'putting db50 into rotation for s6'
[00:08:55] Logged the message, Master
[00:23:54] New patchset: Hashar; "gallium: enable ssh X11 Forwarding" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1641
[00:31:38] Change abandoned: Hashar; "X11 Forwarding not needed, just found how to install android with no GUI:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1641
[01:57:22] RECOVERY - DPKG on db13 is OK: All packages OK
[02:07:54] !log LocalisationUpdate completed (1.18) at Tue Dec 20 02:07:54 UTC 2011
[02:08:04] Logged the message, Master
[03:21:40] PROBLEM - MySQL disk space on db9 is CRITICAL: DISK CRITICAL - free space: /a 10547 MB (3% inode=99%):
[03:44:42] PROBLEM - Disk space on db9 is CRITICAL: DISK CRITICAL - free space: /a 10457 MB (3% inode=99%):
[06:00:09] !log removed srv159, srv183 and srv186 from /etc/dsh/group/job-runners on fenari and stopped mw-job-runner on them, see https://bugzilla.wikimedia.org/show_bug.cgi?id=31576#c23
[06:00:20] Logged the message, Master
[06:03:04] !log removed srv162, srv174 from /etc/dsh/group/job-runners: not in puppet jobrunners class
[06:03:14] Logged the message, Master
[06:10:48] !log removed mobile1, srv124, srv159, srv183, srv186 from /etc/dsh/group/apaches: not in mediawiki-installation
[06:10:57] Logged the message, Master
[07:23:04] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[07:45:14] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[07:56:14] !log after some playing around on ms5 (which is responsible for the little io utilization spikes, but I'm done now), thumb cleaner is back at work for what should be its last day
[07:56:24] Logged the message, Master
[08:14:35] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:07] Change abandoned: Hashar; "per mark request, no white space cleanup." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1494
[09:33:04] !log first attempt at scp from ds2 to ds1 failed after 64gb, nothing useful in log, process on ds1 was hung at "restarting system call"... shot it and running again, from screen as root on ds2.
[09:33:13] Logged the message, Master
[09:34:26] !log should have logged this earlier, prolly about 2 hours ago removed 3 more bin logs from db9, we were getting crowded again.
[09:34:34] Logged the message, Master
[09:56:27] RECOVERY - MySQL slave status on es1004 is OK: OK:
[10:49:39] !log ds2 scp to ds1 stalled in the same place, looking into it
[10:49:47] Logged the message, Master
[10:53:53] New patchset: Dzahn; "check_all_memcached - do not rely on NFS (fix RT1269)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1642
[10:54:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1642
[11:01:37] New review: Dzahn; "we don't want to rely on NFS, and can now require /usr/local/apache/common-local/wmf-config/mc.php i..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1642
[11:01:38] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1642
[11:11:33] New patchset: Dzahn; "check_all_memcached - revert change, and use NFS path again, needs discussion" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1643
[11:11:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1643
[11:12:32] PROBLEM - Host srv199 is DOWN: PING CRITICAL - Packet loss = 100%
[11:16:35] New review: Dzahn; "what's the best way to ensure mc.php is present and up-to-date on spence?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1643
[11:16:49] New patchset: Hashar; "WikipediaMobile: add css/html for nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1644
[11:16:59] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/1644
[11:19:04] New patchset: Hashar; "WikipediaMobile: add css/html for nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1644
[11:19:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1644
[11:23:47] !log nikerabbit synchronized php-1.18/extensions/Narayam/resources/ext.narayam.core.css 'bugfix r106781'
[11:23:56] Logged the message, Master
[11:24:26] !log nikerabbit synchronized php-1.18/extensions/WebFonts/resources/ext.webfonts.css 'bugfix r106781'
[11:24:34] Logged the message, Master
[11:26:16] New review: Dzahn; "also see: http://rt.wikimedia.org/Ticket/Display.html?id=1269" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1643
[11:54:38] New patchset: Dzahn; "process monitoring for mobile traffic loggers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1645
[11:54:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1645
[11:55:40] New patchset: Dzahn; "process monitoring for mobile traffic loggers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1645
[11:55:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1645
[11:57:55] New review: Dzahn; "hashar, re: "recursive dirs". fyi: http://christian.hofstaedtler.name/blog/2008/11/puppet-managing-d..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640
[13:00:14] !log catrope synchronized php-1.18/extensions/ArticleFeedbackv5/ 'r106794'
[13:00:23] Logged the message, Master
[13:02:46] !log catrope synchronized wmf-config/InitialiseSettings.php 'Whitelist Category:Article_Feedback_5_Additional_Articles for AFTv5 and blacklist it for AFTv4 on enwiki and en_labswikimedia'
[13:02:54] Logged the message, Master
[13:17:04] !log catrope synchronized wmf-config/InitialiseSettings.php 'Configure $wgImportSources on en_labswikimedia'
[13:17:10] breaking news: PHP sucks :-P
[13:17:13] Logged the message, Master
[13:18:21] can't clear an option in stream_context_set_option. nor in stream_context_set_default. nor any other way.
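(As a rough illustration of the limitation described above, not code from the cluster: PHP lets you set or overwrite options on a stream context, but offers no call to remove one again. A minimal sketch, with a made-up proxy value:)

    <?php
    // Set a default option for every http:// stream opened later.
    stream_context_set_default( array(
        'http' => array( 'proxy' => 'tcp://proxy.example.org:8080' ), // hypothetical value
    ) );

    // The value can be overwritten on an existing context...
    $ctx = stream_context_get_default();
    stream_context_set_option( $ctx, 'http', 'proxy', null );

    // ...but the key stays in the option array: there is no
    // stream_context_unset_option() or equivalent, hence the complaint above.
    var_dump( stream_context_get_options( $ctx ) );

(One workaround is to build a fresh context with stream_context_create() instead of trying to clear options on an existing one.)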
[13:18:55] !log catrope synchronized wmf-config/InitialiseSettings.php 'Use the correct interwiki prefix'
[13:19:04] Logged the message, Master
[13:44:32] !log added testswarm package to repo and installed it on gallium
[13:44:40] Logged the message, Master
[13:58:24] New patchset: Hashar; "enable testswarmm on gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1646
[14:10:23] PROBLEM - MySQL disk space on db9 is CRITICAL: DISK CRITICAL - free space: /a 10730 MB (3% inode=99%):
[14:13:36] !log another couple binlogs gone on ds9
[14:13:45] Logged the message, Master
[14:25:27] New review: Dzahn; "just like the other process checks just with different arguments" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1645
[14:25:28] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1645
[15:28:07] !log thumbs cleaner on ms5 complete. (don't worry, a new job will start up tomorrow)
[15:28:15] Logged the message, Master
[15:38:35] Is the API-Problem-Message in the topic still correct? And if so: is somebody working on it?
[15:40:16] New patchset: Dzahn; "planet - use star.wmf ssl cert, move to own file, remove hard-coded IP, add locales" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1606
[15:40:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1606
[15:40:49] New patchset: Dzahn; "planet - use star.wmf ssl cert, move to own file, remove hard-coded IP, add locales" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1606
[15:41:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1606
[16:02:31] New patchset: Catrope; "script to fetch mediawiki + puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1647
[16:17:05] !log restarting scp on ds2, seems that it renegotiates after 64GB and that was failing, fixed
[16:17:14] Logged the message, Master
[16:25:08] New patchset: Hashar; "script to fetch mediawiki + puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1647
[16:25:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1647
[16:27:39] New patchset: Hashar; "script to fetch mediawiki + puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1647
[16:27:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1647
[16:35:21] !log catrope synchronized wmf-config/InitialiseSettings.php 'Underscores -> spaces in wmgArticleFeedbackBlacklistCategories'
[16:35:30] Logged the message, Master
[16:44:51] New patchset: Hashar; "script to fetch mediawiki + puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1647
[16:45:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1647
[16:46:06] !log catrope synchronized php-1.18/resources/startup.js 'touch'
[16:46:15] Logged the message, Master
[17:12:19] New patchset: Dzahn; "remove special.cfg from nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1648
[17:12:34] New patchset: Dzahn; "change max_concurrent_checks from 8 to 1000" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1649
[17:12:49] New patchset: Hashar; "jenkins: add git configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1650
[17:14:07] New review: Hashar; "I am pretty sure that is how you can kill a box hard by having nagios fork until the box is out of m..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/1649
[17:14:53] New review: Hashar; "Looks fine now. Thanks Roan for the merge!" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1647
[17:16:52] New review: Dzahn; "the log file is so huge because it is full of "Max concurrent service checks (8) has been reached", ..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1649
[17:23:10] New patchset: Dzahn; "change max_concurrent_checks from 8 to 64" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1649
[17:23:44] http://meta.wikimedia.org/wiki/Planet_Wikimedia#Requests_for_Update_or_Removal could a dev please do this for me? :-) (n. 1, under the big thing)
[17:43:42] New patchset: Dzahn; "remove special.cfg from nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1648
[17:59:10] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1649
[18:12:18] PROBLEM - Lighttpd HTTP on dataset1 is CRITICAL: Connection refused
[18:13:20] New patchset: Jgreen; "new class for misc::maintenance stuff, cronjobs for hume" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1651
[18:13:30] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/1651
[18:14:31] Jaqen: https://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/planet/it/config.ini?revision=87505&view=markup&pathrev=105427#l122
[18:15:10] Jaqen: nothing to do with mutante's request, right?
[18:17:10] no jeremyb
[18:23:09] LeslieCarr: can has translation? i don't think that's what you meant. "when we switch to a cert with a different certificate"
[18:23:43] oops
[18:24:19] thanks
[18:27:18] New patchset: Jgreen; "new class for misc::maintenance stuff, cronjobs for hume typofix: semicolons to commas" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1651
[18:28:29] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1651
[18:28:30] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1651
[18:32:43] If Jimbo and the English Wikipedia community decide to do the blackout, who in the ops team will be responsible for that?
[18:32:48] I have some questions
[18:34:51] I don't know that the ops team would need to do anything
[18:35:00] vvv: maybe it will be done by a volunteer shell user? or by a wiki sysop like itwikip ? why would ops be involved?
[18:35:04] e.g. it wp did it themselves
[18:35:16] vvv: is there a specific procedure proposed?
[18:35:24] No, I just wonder
[18:35:33] I would ask the questions at the proposal page
[18:35:36] We were contacted by one major Russian search engine
[18:36:06] And they were worried about wiki going offline and garbaging their search results
[18:36:23] if there is a blackout and if the search engine doesn't load js and if we go the it wp route
[18:36:27] they won't be affected
[18:39:20] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[18:50:09] apergos: i'm trying to think of the right status code to send... http://tools.ietf.org/html/rfc2616#page-4
[18:50:31] for what?
[18:50:34] 50x would probably work well with the search engines but would be semantically wrong
[18:50:40] vvv's blackout
[18:51:18] if it's js the search engines won't see it most likely
[18:51:25] no need to send anything
[18:51:52] right. but if there's some other variant... just vvv got me thinking
[19:06:09] anybody want to take a look at the world's hackiest-looking template for me?
[19:06:46] i'm either doing something of unparalleled brilliance... or something incredibly stupid
[19:07:38] roankattouw, i'm looking at you because it's your code that i'm butchering
[19:08:10] heh
[19:08:29] http://en.wikipedia.org/wiki/Template:Welcomelaws-rand
[19:09:19] what i'm *trying* to do is get it to substitute based on the first letter of the recipient's username
[19:09:31] or BASEPAGENAME or whatever
[19:14:57] thank you brion, it's time to remove that checkbox :-(
[19:15:17] \o/
[19:17:08] !log reedy synchronized php-1.18/extensions/Contest/ 'r106838'
[19:17:09] :D
[19:17:17] Logged the message, Master
[19:25:28] brion, you're not fair, it's not 100%, it's just 99,999
[19:25:38] :)
[19:25:41] rounding error ;)
[19:26:54] !log reedy synchronized php-1.18/extensions/CentralAuth/ 'r106840'
[19:27:00] :D
[19:27:03] Logged the message, Master
[19:36:29] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[20:13:57] RECOVERY - Lighttpd HTTP on dataset1 is OK: HTTP OK HTTP/1.0 200 OK - 1512 bytes in 0.009 seconds
[20:15:04] Change abandoned: Hashar; "Not required. This can be configured from the Jenkins web interface." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1650
[20:51:36] PROBLEM - mobile traffic loggers on cp1044 is CRITICAL: PROCS CRITICAL: 1 process with args varnishncsa
[21:12:01] !log synchronizing CiviCRM instance on grosley and aluminium to r1037
[21:12:10] Logged the message, Master
[21:26:06] New patchset: Asher; "reduce cache ttl to 60s for "mobileaction=view_normal_site" urls since they don't get purged. also fix frontend / hack timing." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1652
[21:29:20] New review: preilly; "This looks okay to me." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1652
[21:33:04] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1644
[21:33:06] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1644
[21:33:29] New patchset: Asher; "reduce cache ttl to 60s for "mobileaction=view_normal_site" urls since they don't get purged. also fix frontend / hack timing." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1652
[21:34:40] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1652
[21:35:23] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1652
[21:35:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1652
[22:21:20] PROBLEM - MySQL disk space on db9 is CRITICAL: DISK CRITICAL - free space: /a 10596 MB (3% inode=99%):
[22:45:23] how do I find out the state of a particular extension's deployment across all languages of WP?
[22:57:33] anyone?
[23:01:42] abartov: Your best bet is to look at the configuration itself
[23:01:53] noc.wikimedia.org/conf
[23:02:23] Also, wikitech is giving me an expired certificate. Is that a known issue?
[23:25:56] johnduhart: yes. wikitech's bad cert is known
[23:26:21] Just checking :)
[23:28:57] johnduhart: we are planning out a migration of how we handle and serve wikitech, so we don't want to throw money at a new cert
[23:29:05] well, a new and legit cert
[23:29:40] the idea is we may merge it with our labs wiki infrastructure, since labs documents dev work to make things work on cluster (wikitech)
[23:30:07] then set up the current out-of-cluster wikitech as a static automatically replicated host for outage documentation.
[23:30:21] Ryan_Lane: I am going to keep saying we are doing this like it's true, until everyone agrees with us.
[23:30:27] I thought wikitech was supposed to be offsite if there was a failure rendering it unavailable
[23:30:44] yep, which is why there would have to be an off cluster static copy
[23:30:51] during an outage, there is no real reason to edit docs.
[23:30:54] Ah
[23:30:59] the only thing that edits is admin log
[23:31:17] so we really need to find a better, lightweight way of serving that independently of the wikitech installation
[23:31:49] we push admin log to twitter and identi.ca, but relying on a third party service for a basic admin log is wonky.
[23:32:16] wikitech is also closed registration due to spamming
[23:32:38] if we folded it into labs console, all those devops folks can document and help out on it
[23:33:02] i am pretty sure everyone on ops already agreed, it's just a matter of finding the time to do it.
[23:55:54] thanks, johnduhart, but I don't see how that answers my question. How do I find out if, e.g. ten languages L1 through L10 have the e.g. Book extension enabled?
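(For context on the answer given above: per-wiki extension switches are kept in wmf-config/InitialiseSettings.php, browsable at noc.wikimedia.org/conf, typically as a wmgUse* setting with a 'default' value plus per-wiki overrides. The following is a hypothetical sketch of that lookup pattern, not the actual configuration: the setting name wmgUseBookExtension, the wiki list, and the wikiHasExtension() helper are all made up for illustration.)

    <?php
    // Hypothetical illustration of the wmf-config pattern: a per-wiki switch
    // with a default, and a helper that answers "is extension X on for wiki Y?".
    $settings = array(
        'wmgUseBookExtension' => array( // setting name is invented for this sketch
            'default' => false,
            'enwiki'  => true,
            'dewiki'  => true,
            // ...one entry per wiki where the extension is switched on
        ),
    );

    function wikiHasExtension( array $settings, $switch, $wiki ) {
        $perWiki = $settings[$switch];
        // Fall back to the 'default' value when a wiki has no explicit override.
        return isset( $perWiki[$wiki] ) ? $perWiki[$wiki] : $perWiki['default'];
    }

    foreach ( array( 'enwiki', 'frwiki', 'dewiki' ) as $wiki ) {
        var_dump( $wiki, wikiHasExtension( $settings, 'wmgUseBookExtension', $wiki ) );
    }

(Run as-is this would report true for enwiki and dewiki and false for frwiki, which is essentially the per-wiki check being asked about; on the live site the same information is what the generated config pages at noc.wikimedia.org/conf expose.)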