[01:03:14] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[01:16:17] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[01:42:05] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 283 seconds
[01:46:06] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[02:25:56] New patchset: MarkAHershberger; "file systesm standard layout" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/7573
[02:33:30] New patchset: Jeremyb; "Bug 36813 - update wgUploadNavigationUrl on all cs wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7574
[02:44:58] New review: Jeremyb; "I've no idea what the appropriate standard for local consensus is for a change like this. Just a not..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7574
[03:13:07] New review: Krinkle; "Does UploadWizard accept the same query arguments? (such as wpDestFile). And can it show a the link ..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7574
[03:18:55] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[03:20:37] New review: Jeremyb; "I don't have answers for Krinkle but I assumed it was fine because it was already in use. These are ..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7574
[03:21:07] damnit gerrit, why do you merge lines? i wanted a newline there
[03:21:13] * jeremyb waves Krinkle
[03:21:19] commons.wikimedia.org/wiki/Special:UploadWizard/es doesn't work
[03:21:26] /es is ignored.
[03:21:39] The old Commons:Upload (project page, not special page) used to have a translation subpage
[03:21:43] don't blame me! ;-P
[03:22:08] I'm not
[03:22:09] Just pointing out :)
[03:22:16] * jeremyb figured ;)
[03:22:19] oh hey, I can submit a fix for that now
[03:22:22] guess what ::
[03:22:25] :D
[03:22:49] * Damianz Krinkle's Krinkle.
[03:23:58] New review: Krinkle; "@jeremyb: [[Special:UploadWizard/es]] doesn't even work. The old Commons:Upload wiki project page (n..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7574
[03:25:23] I've managed to make my .profile and stuff on toolserver so much like my local workstation and I often try to access *my* home directory and wonder why tab completion doesn't' when when I'm just about to re-read "krinkle@willow:". Oh,... right.
[03:28:50] Krinkle: is commons:upload deprecated? see huwiktionary
[03:29:02] useless 302 redirect
[03:29:12] interwikis are protocl-relative though, so it won't break ssl
[03:29:30] Oh wait you mean the namespace
[03:29:34] errmmmm?
[03:29:43] i mean once you're already on commons
[03:29:45] yeah
[03:29:58] (see wiki named above :P)
[03:30:02] ahm. so not really deprecated. There is still useful stuff on there.
[03:30:11] k
[03:30:23] Nothing should link to *Special*:Upload on commons though
[03:30:44] unless it is as upload link for red linked files
[03:31:09] (wgUploadMissingFileUrl)
[03:31:25] stuff does
[03:31:28] nope
[03:31:31] those are all Special:
[03:32:06] yes, there eis a few to Special outside wgUploadMissingFileUrl
[03:32:25] i don't follow
[03:32:29] never mind
[03:32:33] what's the diff between Special and Special?
[03:32:46] are you committing that es one? or should i grab it in my commit coming down the pike now?
[03:32:47] I meant Special = Special:Upload (not UploadWizard)
[03:32:59] I'm committing it now
[03:33:03] k
[03:33:21] I'm saying Special:Upload is somewhat deprecated, Commons:Upload not so much. But neither should be fixed uncontroversially.
[03:35:30] k
[03:36:04] New patchset: Krinkle; "Fix commons Upload urls" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7575
[03:36:22] uh ?
[03:36:24] wtf is docroot/foundation/levée_de_fonds.html
[03:37:06] haha
[03:37:07] new file
[03:37:13] also what is news.dblist?
[03:38:37] Change abandoned: Krinkle; "Wrong base." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7575
[03:39:10] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[03:40:27] New patchset: Krinkle; "Fix commons Upload urls" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7576
[03:40:31] New patchset: Jeremyb; "wgUploadNavigationUrl: make local paths relative (rm hostname)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7577
[03:40:31] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[03:40:59] Krinkle: ewwww... should have just pushed a new changeset with the right commit
[03:41:05] on the same "change"
[03:41:15] errr
[03:41:20] Yeah, but topic was wrong as well
[03:41:33] anyway, there is a encoding issue
[03:41:55] how important are topics?
[03:42:06] that french file is seen as a new file when it really is not
[03:42:07] when I do "git add ." it will be added again
[03:42:23] ugh
[03:42:38] .gitignore? ;-P
[03:42:49] for something that's probably never going to be touched again and has a good first line summary I've just been using no topic
[03:43:18] You automatically get a good topic if you use a local branch other than master
[03:43:40] which you should always do so you can pull master without getting a merge commit or having to rewrite history to submit a different change
[03:44:03] always a clean branch from the remote, no weird dependencies on uncommitted changes
[03:44:05] unmerged*
[03:44:37] that's an argument for my local sanity
[03:44:42] I never specify topic either, git-review uses local branch name
[03:44:49] but for how it looks in gerrit, does anyone care?
[03:44:59] probably not in mediawiki-config
[03:45:55] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[03:46:15] but using a branch like "shell" or "cleanup" should be ok. It has to be named something anyway, and unless you take extra steps or work directly on master, it'll have to be _something_
[03:47:02] were you trying to standardize whitespace? (in 7576/1)
[03:47:25] a little bit yeah, 2 lines in the vicinity where using a tab instead of spaces
[03:47:46] i've just been working on master and then git-reset to something that's already merged in gerrit after i've pushed for review
[03:47:59] yeah, that works too
[03:48:02] there's still quite a mix
[03:48:09] git reset ---hard gerrit/master
[03:48:13] mornin'
[03:48:18] moin
[03:48:24] I like to keep master clean though, so I can create a new clean branch at any time
[03:48:24] oh my sleep schedule is *so* fscked up
[03:48:28] how's greece treating you?
[03:48:35] although one can do that without having a local master branch too
[03:48:40] paravoid: git fsck?
[03:49:16] paravoid: i meant to ask you, did you absentee vote? is there such a thing?
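The branch-per-change workflow discussed above can be sketched as a short shell script. Each change gets a fresh local branch that tracks gerrit/master, and git-review takes the Gerrit topic from the branch name. This is a minimal sketch under stated assumptions. The remote is named "gerrit" as in the log. The branch name fix-upload-urls is hypothetical. A throwaway local bare repository stands in for the real Gerrit server so the script runs offline. The `git co` and `git br` used in the log are presumably user aliases for `git checkout` and `git branch`, so the sketch spells them out.

```shell
#!/usr/bin/env bash
# Sketch of the one-branch-per-change workflow discussed in the log.
# Assumptions: remote name "gerrit" (from the log); the branch name
# "fix-upload-urls" is hypothetical; a local bare repo stands in for Gerrit.
set -euo pipefail

# Run git with a throwaway identity so commits work on any machine.
g() { git -c user.name=demo -c user.email=demo@example.org -c commit.gpgsign=false "$@"; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for the Gerrit remote, seeded with one commit on master.
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git remote rename origin gerrit
git symbolic-ref HEAD refs/heads/master   # pin the name regardless of init.defaultBranch
g commit -q --allow-empty -m "initial"
g push -q gerrit master
g fetch -q gerrit

# Each change starts on its own branch, created fresh from gerrit/master, so
# local master never carries unmerged work and changes don't depend on each other.
g checkout -q -b fix-upload-urls -t gerrit/master
g commit -q --allow-empty -m "Fix commons Upload urls"

# With git-review installed, "git review" would now push this branch for
# review, using the local branch name as the Gerrit topic (not run here).
echo "on $(git rev-parse --abbrev-ref HEAD), tracking $(git rev-parse --abbrev-ref '@{u}')"
```

The other approach mentioned in the log also works: commit on master, then run `git reset --hard gerrit/master` (two dashes) after pushing for review. The branch-per-change form keeps master clean, so a new branch can be cut from it at any time.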
[03:49:26] no such thing
[03:49:30] git co -b tmp; git br -D master /* who needs master anyway */; git co -b fix3000 -t gerrit/master; git br -D tmp;
[03:49:41] I don't have a master on many fix-only repo clones
[03:49:57] saw jimmy and brian for the go talk a few days ago
[03:50:12] i guess maybe you haven't met brian
[03:50:48] gtg
[04:31:51] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:37:42] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:51:45] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:57:27] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:01:27] New patchset: Jeremyb; "Bug 36533 - tewiktionary: make wgMetaNamespace match the new wgSitename" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7578
[05:07:24] New review: Jeremyb; "I requested proofreading (on the bug) for the foreign chars I don't have fonts for and can't read an..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7578
[05:14:06] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out
[05:14:24] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:34:21] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:37:57] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[05:39:09] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[05:39:18] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123
[05:40:03] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:42:47] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:44:26] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[05:51:11] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[05:51:29] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out
[05:52:23] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[06:02:44] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:03:38] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:04:59] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:07:05] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[06:07:06] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:07:53] search-pool4.svc.eqiad.wmnet is not behaving
[06:09:16] uh huh
[06:09:49] (per nagios. happened more than once)
[06:12:56] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[06:14:17] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:21:11] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.026 second response time on port 8123
[06:21:50] jeremyb: I was looking at this before you came on bot got red-herringed by the spike on the eqiad indexer
[06:22:42] indexer or just a search server?
[06:22:47] !log restarted lucene search on search1016 it had stopped doing anything useful (see ganglia graphs, also nothinig wtitten to logs)
[06:22:56] Logged the message, Master
[06:22:56] on the indexer
[06:22:59] huh
[06:23:01] butit spikes every week
[06:23:12] i thought only idx2 and idx1001 were indexers
[06:23:34] and the rest just did other things. like execute queries
[06:23:45] the index regeneration includes some sort of dumping and indexing of things, which apparently is scheduled for the weekend
[06:23:53] *sumping and importing
[06:23:57] huh
[06:24:06] are weekends lighter than weekdays?
[06:24:16] that stuff seriously needs to get rewritten but ugh who wants to dig through the java
[06:24:22] i wonder if we can get stats on how much traffic comes from people at work vs. at home
[06:24:24] well sundays are I think
[06:24:31] (comscore?)
[06:24:42] we can just lok at the graphs
[06:25:03] well for the original question. not the second ;P
[06:25:26] anyways it turned out to be much simplere: lsearchd on one of the to searchpool4 hosts was simply no onger doing anything
[06:25:44] why? beats me. but there you have it
[06:26:17] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[06:28:06] oh, 1016 is pool4?
[06:28:12] uh huh
[06:29:27] yup, /me spies https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/role/lucene.pp;hb=HEAD ;)
[07:40:04] PROBLEM - Host srv252 is DOWN: PING CRITICAL - Packet loss = 100%
[07:41:43] RECOVERY - Host srv252 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms
[07:43:26] !log still upgrading/rebooting a couple srv (API) application servers with long uptime
[07:43:31] Logged the message, Master
[07:44:43] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[07:45:28] PROBLEM - Apache HTTP on srv252 is CRITICAL: Connection refused
[07:46:13] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[07:56:07] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[07:56:43] RECOVERY - Apache HTTP on srv252 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.180 second response time
[07:57:37] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:00:10] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:09:19] PROBLEM - Host srv251 is DOWN: PING CRITICAL - Packet loss = 100%
[08:10:13] RECOVERY - Host srv251 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[08:14:25] PROBLEM - Apache HTTP on srv251 is CRITICAL: Connection refused
[08:16:13] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:24:10] RECOVERY - Apache HTTP on srv251 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time
[08:38:06] New review: Jeremyb; "this one will need a namespaceDupes run" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7182
[08:48:37] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:48:37] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:52:21] is there some convention about what NS # to use for Wikijunior? (on a wikibooks)
[08:52:26] seems inconsistent so far
[08:52:54] i think i see at least 5 different #s in use
[08:53:15] !c
[08:53:35] !change 7582
[08:53:35] https://gerrit.wikimedia.org/r/7582
[08:54:17] i'm asking because that ^^ jumps straight from 100/101 -> 110/111. was wondering if there was a rhyme or reason to it and if it should be done that way or not
[08:58:31] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:00:10] PROBLEM - Apache HTTP on srv250 is CRITICAL: Connection refused
[09:00:38] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:01:31] RECOVERY - Apache HTTP on srv250 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time
[09:01:31] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:07:17] New patchset: Jeremyb; "Bug 35977 - frwikibooks gets Wikijunior NS" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7182
[09:09:19] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:12:10] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:14:25] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:17:52] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:20:57] PROBLEM - Apache HTTP on srv216 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:32:30] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:33:33] RECOVERY - Apache HTTP on srv216 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time
[09:37:54] PROBLEM - Apache HTTP on srv218 is CRITICAL: Connection refused
[09:41:21] PROBLEM - Host srv218 is DOWN: PING CRITICAL - Packet loss = 100%
[09:43:36] RECOVERY - Host srv218 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[09:50:27] mutante: I presume all of these are you, right?
[09:50:55] paravoid: ack, things starting srv* are me
[09:50:57] PROBLEM - Apache HTTP on srv215 is CRITICAL: Connection refused
[09:51:38] paravoid: they are/were in the range of 215 days
[09:54:27] yikes
[09:55:30] upgrade-helper sorts them for me, so order by uptime desc
[09:55:44] nice :)
[09:57:08] we are all fine on squids, sq*, amssq*, knsq* that is..and _sombunall_ done in the misc. category
[09:57:36] but app servers was next
[10:04:27] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:04:45] RECOVERY - Apache HTTP on srv218 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time
[10:05:39] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:06:42] RECOVERY - Apache HTTP on srv215 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time
[10:08:30] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:12:15] PROBLEM - Apache HTTP on srv214 is CRITICAL: Connection refused
[10:13:45] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:18:17] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:29:23] PROBLEM - Apache HTTP on srv188 is CRITICAL: Connection refused
[10:32:59] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:39:44] RECOVERY - Apache HTTP on srv214 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time
[10:44:14] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:49:56] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:04:56] PROBLEM - Apache HTTP on srv249 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:05:32] PROBLEM - Host srv249 is DOWN: PING CRITICAL - Packet loss = 100%
[11:07:47] RECOVERY - Host srv249 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[11:08:15] RECOVERY - Apache HTTP on srv188 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time
[11:08:56] New review: Jeremyb; "native speaker proofing done. I think this is ready for merge." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7578
[11:13:20] RECOVERY - Apache HTTP on srv249 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time
[11:17:59] PROBLEM - Host srv248 is DOWN: PING CRITICAL - Packet loss = 100%
[11:19:02] RECOVERY - Host srv248 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[11:22:56] PROBLEM - Apache HTTP on srv248 is CRITICAL: Connection refused
[11:29:23] PROBLEM - Apache HTTP on srv192 is CRITICAL: Connection refused
[11:32:41] RECOVERY - Apache HTTP on srv248 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.085 second response time
[11:35:14] PROBLEM - Host srv192 is DOWN: PING CRITICAL - Packet loss = 100%
[11:36:26] RECOVERY - Host srv192 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[11:53:32] PROBLEM - MySQL Replication Heartbeat on db1022 is CRITICAL: CRIT replication delay 189 seconds
[11:54:08] PROBLEM - MySQL Slave Delay on db1022 is CRITICAL: CRIT replication delay 206 seconds
[11:56:41] PROBLEM - MySQL Replication Heartbeat on db46 is CRITICAL: CRIT replication delay 193 seconds
[11:57:08] PROBLEM - MySQL Slave Delay on db46 is CRITICAL: CRIT replication delay 202 seconds
[11:59:59] RECOVERY - MySQL Slave Delay on db46 is OK: OK replication delay 0 seconds
[12:01:02] RECOVERY - MySQL Replication Heartbeat on db46 is OK: OK replication delay 0 seconds
[12:04:56] RECOVERY - MySQL Replication Heartbeat on db1022 is OK: OK replication delay 0 seconds
[12:05:05] PROBLEM - Host srv285 is DOWN: PING CRITICAL - Packet loss = 100%
[12:05:50] RECOVERY - MySQL Slave Delay on db1022 is OK: OK replication delay 0 seconds
[12:06:44] RECOVERY - Host srv285 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[12:10:38] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[12:12:13] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[12:17:10] RECOVERY - Memcached on srv284 is OK: TCP OK - 0.007 second response time on port 11000
[12:20:10] PROBLEM - Apache HTTP on srv215 is CRITICAL: Connection refused
[12:31:43] RECOVERY - Apache HTTP on srv215 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.081 second response time
[13:07:45] !log opening a bz bug for check_job_queue issue related to CommonSettings.php [[BZ:36835]]
[13:07:49] Logged the message, Master
[13:19:16] mutante: looking into your !log just now...
[13:19:36] cool:)
[13:19:38] what's the simplest way to get a value out of CommonSettings.php ?
[13:19:43] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[13:19:52] e.g. to debug log it
[13:19:59] php -r 'include "CommonSettings.php"; print $val;'
[13:20:01] ?
[13:20:02] http://noc.wikimedia.org/conf/CommonSettings.php.txt
[13:20:06] i tried wfDebug (wild guess) and that failed
[13:20:31] (i.e. it broke the site. where site==beta deployment. don't panic ;P)
[13:23:01] PROBLEM - Host srv301 is DOWN: PING CRITICAL - Packet loss = 100%
[13:24:49] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[13:25:25] RECOVERY - Host srv301 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[13:25:47] mutante: checksum spence's copy vs. fenari?
[13:25:56] of CommonSettings.php
[13:26:29] jeremyb: spence mounts /home from NFS
[13:26:56] from fenari?
[13:27:12] spence: 10.0.5.8:/home on /home
[13:27:23] fenari: 10.0.5.8:/home on /home
[13:27:47] nfs-home.pmtpa
[13:28:44] huh, that's also the RC2UDP host. busy box ;P
[13:29:19] PROBLEM - Apache HTTP on srv301 is CRITICAL: Connection refused
[13:30:52] so, mwmultiversion isn't in git yet? i can't tell
[13:31:44] ehm, i dunno, but it feels like it would fit in /files/misc/scripts with mwscript mwscriptwikiset but isnt there, ack
[13:32:03] ah, still in svn i guess
[13:32:10] RECOVERY - Apache HTTP on srv301 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time
[13:32:50] http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/mwmultiversion/
[13:32:54] last change 5 days ago
[13:33:29] :p, /puppet/files/misc/scripts has all the sync- scripts as well ,etc
[13:34:58] mutante: so, try on spence: /home/wikipedia/common/multiversion/getMWVersion commonswiki
[13:35:23] php-1.18
[13:35:28] there you go
[13:35:30] is a lie ;)
[13:35:46] heh
[13:35:56] it's not getting updates to the wikiversions db?
[13:36:13] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[13:37:25] hmm, i don't know.
[13:37:56] do you have anything in /home/wikipedia/common/wikiversion* ?
[13:37:58] where should it get the info from?
[13:38:36] yea, wikiversion.data
[13:38:38] .dat
[13:38:55] 2012-05-09
[13:39:37] the string "18" does not appear in the file
[13:40:20] and wikiversions.cdb , modified 05-10
[13:41:12] so, strace and find out which wikiversions file it's using? or if it's using one at all?
[13:41:59] 1.18 was once hardcoded into CommonSettings.php as a fallback. but not in the current cluster version so I'm looking elsewhere
[13:43:28] open("/usr/local/apache/common-local/wikiversions.cdb", O_RDONLY) = 3
[13:43:37] there you go
[13:44:17] does that (or it's .dat) have 1.18?
[13:45:04] yes
[13:45:58] so "getMWVersion" should be changed to use /home ?
[13:46:17] or add mechanism to copy to /usr/local
[13:46:46] or -local should be made to be reliably up to date
[13:46:52] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[13:47:08] idk how that's been done in the past. surely it was less a frequent issue
[13:48:31] !log copying outdated wikiversions.dat/.cdb files from /home to /usr/local on spence, which fixes check_job_queue (thanks jeremyb)
[13:48:34] Logged the message, Master
[13:48:44] jeremyb: JOBQUEUE OK - all job queues below 10,000
[13:49:34] thanks for looking into it
[13:50:29] jeremyb: yeah, i wonder that too. i don't recall that we had this before, but may have just missed it and somebody copied it then as well
[13:53:34] yeah, but i'm thinking maybe the same person did this a few times?
[13:55:35] or else there's more than one way to switch wikis between versions
[13:55:43] and some people are using the wrong method
[13:58:10] maybe manually hunt for cron jobs/logs ?
[13:58:20] i see cron was updated on spence a couple months ago fwiw
[13:59:08] or... maybe spence should be in a dsh group and it's missing?
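The problem tracked down above comes down to two copies of the same file drifting apart. getMWVersion on spence read /usr/local/apache/common-local/wikiversions.cdb, which still had 1.18. The updated wikiversions files lived under /home/wikipedia/common. The check-and-copy that was done by hand can be sketched in shell as follows. This is a minimal sketch only. The helper name sync_if_stale is hypothetical, and the demo uses temporary files rather than the real paths. The log left open whether such a copy should be driven by cron, by a dsh group, or by changing getMWVersion to read from /home.

```shell
#!/usr/bin/env bash
# Sketch: detect a stale local copy of a file and refresh it from the
# authoritative copy. On spence the pair would have been, e.g.,
#   /home/wikipedia/common/wikiversions.cdb -> /usr/local/apache/common-local/wikiversions.cdb
# (paths from the log). sync_if_stale is a hypothetical helper name.
set -euo pipefail

# sync_if_stale SRC DST
# Prints "in sync" if the files are byte-identical; otherwise prints "stale"
# and copies SRC over DST, preserving mode and timestamps. cmp -s also
# returns non-zero when DST is missing, so a missing copy counts as stale.
sync_if_stale() {
  local src=$1 dst=$2
  if cmp -s "$src" "$dst"; then
    echo "in sync"
  else
    echo "stale"
    cp -p "$src" "$dst"
  fi
}

# Demo on temporary files instead of the real paths.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'newer copy\n' > "$tmp/home.dat"
printf 'older copy\n' > "$tmp/local.dat"

first=$(sync_if_stale "$tmp/home.dat" "$tmp/local.dat")
second=$(sync_if_stale "$tmp/home.dat" "$tmp/local.dat")
echo "$first, then $second"   # prints "stale, then in sync"
```

For a pair of files on the same host, `cmp -s` compares bytes directly and is simpler than comparing checksums. Checksums (e.g. `sha1sum`) are handier when the copies live on different hosts, as with the spence vs. fenari comparison suggested in the log.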
[13:59:10] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[13:59:55] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[14:00:17] uh,oh, software error when commenting on bugzilla
[14:02:02] which?
[14:02:04] i got the mail
[14:04:46] a deadlock
[14:04:52] huh
[14:04:58] followed by mid-air-collision, followed by "it works"
[14:05:07] btw, I think 7578, 7182, and 7516 are all ready to go. (shell reqs). but 7182 will need a namespaceDupes run
[14:05:18] if someone's interested ;)
[14:05:33] mutante: shouldn't have midaired on a cc change...
[14:07:25] PROBLEM - Host srv300 is DOWN: PING CRITICAL - Packet loss = 100%
[14:10:24] * jeremyb waves hexmode
[14:10:37] you got a reply on the debian maintainers list
[14:10:37] * hexmode waves back
[14:10:43] RECOVERY - Host srv300 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms
[14:10:51] :)
[14:10:54] awesome
[14:11:33] ever heard of fusionforge before?
[14:11:40] hint: that's what runs alioth
[14:13:33] so, i'm thinking there should be a mediawiki package just for other stuff to embed like that, a standalone package that's relatively stable but not kept ancient just for the sake of fusionforge and a more recent package. (maybe in experimental or backports. or both. or unstable+testing+backports)
[14:13:44] jeremyb: I didn't know that is what it was called. I probably saw it before, though.
[14:14:07] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:14:23] * jeremyb thinks irc mtg may be in order
[14:14:38] jmw shouldn't be too hard to find. idk about that guy that actually replied
[14:14:42] jeremyb: I do not disagree :)
[14:15:00] jeremyb: I also contacted a bunch of RPM maintainers
[14:15:01] PROBLEM - Apache HTTP on srv300 is CRITICAL: Connection refused
[14:15:28] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:15:29] hexmode: and you got the crawler script,too
[14:15:49] mutante: yeah, made a commit
[14:15:54] yea, saw the mail from Debian guy
[14:16:00] oh, cool
[14:16:35] mutante: right now just reorg for filesystem standards and my own ocd
[14:16:58] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:17:09] really? /usr/shared/php/ document root?
[14:17:13] share
[14:17:27] * hexmode sighs
[14:17:35] ok:)
[14:17:47] did I really put shared?
[14:17:50] * hexmode checks
[14:17:54] no, you didn't
[14:18:11] phew
[14:18:39] hmm.. /var/www/ always been debian default
[14:18:46] ok, let me catch up on bug mail... I will try to get back on the survey and the debian packages
[14:19:17] mutante: for web scripts, yes, but not non-outputing php files
[14:19:54] ok, i see
[14:19:59] mutante: But I'll look at some docs so I can have some authority to my ocd instead of just ocd :P
[14:24:10] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:24:55] RECOVERY - Apache HTTP on srv300 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time
[14:28:33] jeremyb: would you be willing to help me maintain a bugzilla4 package for debian?
[14:28:46] or mutante or anyone else here?
[14:28:54] they want at least two people
[14:29:27] they dropped bz b/c it wasn't very well supported
[14:39:40] hi all!
[14:40:06] I submitted an RT ticket to get precise installed on stat1 here:http://rt.wikimedia.org/Ticket/Display.html?id=2946
[14:40:17] who should I poke about that?
[14:48:54] ottomata: I think mark could tell you about the precise upgrades
[14:51:57] thanks hexmode. mark?
[15:36:39] PROBLEM - Apache HTTP on srv299 is CRITICAL: Connection refused
[15:39:59] New patchset: Lcarr; "Cleaning up icinga configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7587
[15:40:15] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7587
[15:42:56] New patchset: Lcarr; "Cleaning up icinga configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7587
[15:43:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7587
[15:47:09] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:48:30] RECOVERY - Apache HTTP on srv299 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time
[15:48:30] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:50:26] New review: Lcarr; "however, this is only step 1. step 2 is to split off a role class (which will include things like t..." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7587
[15:50:28] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7587
[15:53:45] New patchset: Dzahn; "add analytics subnet file to autoinstall/subnets and netboot.cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7588
[15:54:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7588
[15:56:59] New patchset: Lcarr; "fixing ncsa class import" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7589
[15:57:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7589
[15:57:47] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7589
[15:57:49] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7589
[15:58:04] !change 7588 | Lcarr
[15:58:04] Lcarr: https://gerrit.wikimedia.org/r/7588
[15:58:26] cool
[15:59:09] New review: Mark Bergsma; "you're missing .cfg in the echo filename" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/7588
[16:00:52] mutante: after the fix it will look good - i'll +1 it then :)
[16:01:10] New patchset: Dzahn; "add analytics subnet file to autoinstall/subnets and netboot.cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7588
[16:01:12] ok, thanks Mark and Leslie
[16:01:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7588
[16:03:55] New patchset: Dzahn; "add analytics.cfg partman recipe, which is, for now, a copy of raid1.cfg (they want just software raid1 but likely to change)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7590
[16:04:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7590
[16:13:16] New patchset: Lcarr; "minor spelling, etc fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7591
[16:13:33] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7591
[16:14:09] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[16:15:05] New patchset: Lcarr; "minor spelling, etc fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7591
[16:15:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7591
[16:16:38] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7591
[16:16:40] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7591
[16:22:25] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7588
[16:22:29] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7588
[16:22:47] New patchset: Lcarr; "fixing variable subscription on line 631" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7592
[16:23:06] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7590
[16:23:06] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7590
[16:23:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7592
[16:23:10] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7592
[16:23:12] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7592
[16:24:12] LeslieCarr: merged your icinga change with mine..done
[16:24:16] cool thanks
[16:31:28] New patchset: Lcarr; "minor variable name fix" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7594
[16:31:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7594
[16:33:06] PROBLEM - Host srv298 is DOWN: PING CRITICAL - Packet loss = 100%
[16:35:03] RECOVERY - Host srv298 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[16:36:01] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7594
[16:36:03] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7594
[16:36:51] PROBLEM - Host srv188 is DOWN: PING CRITICAL - Packet loss = 100%
[16:38:57] PROBLEM - Apache HTTP on srv298 is CRITICAL: Connection refused
[16:41:48] New patchset: Lcarr; "really fixing the variable name this time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7596
[16:42:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7596
[16:42:24] RECOVERY - Host srv188 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[16:42:55] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7596
[16:42:58] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7596
[16:45:33] PROBLEM - Apache HTTP on srv188 is CRITICAL: Connection refused
[16:52:45] RECOVERY - Apache HTTP on srv188 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time
[16:57:33] RECOVERY - Apache HTTP on srv298 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time
[17:00:16] New patchset: Lcarr; "changing variable names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7597
[17:00:33] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7597
[17:00:35] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7597
[17:01:39] New patchset: Lcarr; "changing variable names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7597
[17:01:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7597
[17:02:07] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7597
[17:02:09] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7597
[17:17:01] New patchset: Lcarr; "Removing some monitoring classes as they duplicate themselves" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7599
[17:17:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7599
[17:17:24] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7599
[17:17:29] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7599
[17:21:10] New patchset: Ottomata; "admins.pp - enabling Fabian's account." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7600
[17:21:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7600
[17:21:57] maplebed or LeslieCarr, if you got a sec, would one of you review that please?
[17:22:16] we already had Fabian's account approved, and I made the puppet change, maplebed approved, but I didn't notice that his account in puppet was 'disabled'
[17:22:42] oh now you don't just want an account, you want it enabled ?
[17:22:43] needy
[17:23:59] actually , just delete the "enabled" line
[17:25:01] ooook, i had to ensure the key was present too
[17:25:11] yeah, keep the key line
[17:25:27] New patchset: Ottomata; "admins.pp - enabling Fabian's account." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7600
[17:25:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7600
[17:25:52] bettah?
[17:29:05] LeslieCarr?
[17:29:44] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7600
[17:29:46] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7600
[17:29:51] dankkeeee
[17:30:04] if you don't mind updating file son puppetmaster
[17:30:10] i can run puppet on stat1 and have fabian try
[17:33:46] done
[17:41:35] ottomata: sorry I didn't catch that in the first review.
[17:43:53] no probs
[17:43:56] i missed it myself as well
[17:44:05] it looks good now, thanks to the both of you!
[17:57:33] as if to prove my point about why GCal is flawed...
[17:57:44] for deployment cals
[17:57:56] What happened?
[17:58:10] someone moved the general deployment window to 1pm without telling me
[17:58:49] Oh yeah I saw that
[17:58:54] I was wondering what was up
[17:59:05] Like "is RobLa just monopolizing the entire day now"
[17:59:11] I'm looking for the email notification and I can't find it
[18:00:17] moving back, and hoping someone doesn't do it again, and if they do, actually use the notifcation email so I can tell them not to do that
[18:01:31] it was probably a well-intentioned slip up, but that's kinda why gcal sucks for this is because well-intentioned slip ups are way too easy
[18:04:04] I've made these points about notifs on the mailing list
[18:04:24] 1) set of notified people always too narrow, 2) notifs are easily suppressible because it asks you every time
[18:12:47] !log Killing old php-1.20wmf1 directories from apaches to save full disks
[18:12:51] Logged the message, Master
[18:30:45] !log upgrading labsconsole to 1.20wmf2
[18:30:48] Logged the message, Master
[18:31:10] guess I missed my window by a little. heh
[18:35:07] * Damianz gives Ryan_Lane a door
[18:35:23] I'm still doing the upgrade ;)
[18:35:48] Rebel :D
[18:35:50] this doesn't affect the other deployment
[18:36:39] You're 2 weeks behind!
[18:36:57] and I always will be
[18:37:01] you guys are my beta testers
[18:37:09] ;)
[18:37:30] I'm two weeks behind on 1.20wmf2?
[18:37:43] you guys are doing 1.20wmf3 right now?
[18:37:44] Reedy is deploying 1.20wmf3 now
[18:37:46] ah
[18:38:00] technically I'm way more than 2 weeks behind
[18:38:02] 1.20wmf2 is so two weeks ago
[18:38:04] I'm coming from 1.18
[18:38:51] * Reedy wonders if LDAP will still work
[18:38:56] it does
[18:38:59] I already tested this in labs
[18:39:03] heh
[18:40:01] !log completed upgrade to 1.20wmf2 on labsconsole
[18:40:05] Logged the message, Master
[18:40:06] sent mail out to several chapters orderd by mediawiki version asc, in kind of an upgrading campaign with hexmode
[18:40:14] New patchset: Asher; "change freq of pqd slow log injest from every 10 to 20m" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7611
[18:40:21] one of them needs better Solaris support:)
[18:40:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7611
[18:40:38] bbl
[18:40:41] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7611
[18:40:44] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7611
[18:42:39] !log adding OATHAuth to labsconsole
[18:42:43] Logged the message, Master
[18:43:13] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7350
[18:43:15] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7350
[18:43:46] !log switching sessions back to memcached for labsconsole
[18:43:50] Logged the message, Master
[18:46:44] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[18:46:58] hm
[18:47:26] I should probably put a message below the token input on the login page saying that the token is only necessary if two-factor auth is enabled for an account
[18:49:18] !log added OATHAuth to components list for MediaWiki Extensions product in bugzilla
[18:49:21] Logged the message, Master
[18:59:20] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[18:59:38] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:01:52] anyone wanna review this one for me?
[19:01:52] https://gerrit.wikimedia.org/r/#/c/7285/
[19:02:45] notpeter, jeremyb, this one is still waiting too
[19:02:45] https://gerrit.wikimedia.org/r/#/c/6798/
[19:09:22] Ryan_Lane: OATHAuth as a name is both awesome and horrible. probably more horrible though
[19:09:23] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:09:28] :D
[19:09:35] well, that's the standard name
[19:09:43] blame the OATH people ;)
[19:10:13] there's two implementations, HOTP and TOTP, but generally both are implemented
[19:10:25] fits in with our whole MediaWiki/Wikimedia naming scheme, though
[19:10:30] heh
[19:10:31] true
[19:10:36] just wait till we have OATH and OAUTH
[19:11:19] thankfully, basically everywhere in the interface I use "Two-factor authentication", rather than OATH
[19:11:38] so, end-users shouldn't confuse them
[19:11:47] !log resyncing cluster22 from es1002 to es1004
[19:11:50] Logged the message, Master
[19:12:08] I couldn't call the extension TwoFactorAuth, either, because there's like a billion ways to do two-factor
[19:13:24] really, though, that bit of confusion wouldn't have been nearly as bad as what we're pretty much guaranteed with OATHAuth
[19:13:47] heh
[19:14:08] well, since the end-user doesn't see anything about OATH, it shouldn't be too bad
[19:14:29] also, when you say them out-loud they sound much different, so it shouldn't be confusing when discussing it
[19:15:36] we can rename it if it's confusing in practice
[19:16:50] someone should probably implement TOTP at some point
[19:22:06] New patchset: Ryan Lane; "Change alias directory to 1.20wmf2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7614
[19:22:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7614
[19:22:29] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7614
[19:22:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7614
[19:26:30] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[19:41:07] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:06:27] * jeremyb is still waiting on some shell reqs... any takers?
[20:06:37] No
[20:06:43] It's deployment time
[20:06:45] oh, a Reedy
[20:06:47] And also, it's been the weekend ;)
[20:06:56] 14 14:05:06 < jeremyb> btw, I think 7578, 7182, and 7516 are all ready to go. (shell reqs). but 7182 will need a namespaceDupes run
[20:07:07] but Reeeedy
[20:07:20] and thanks for finishing those all off jeremyb
[20:08:28] also, 7180 is ready for a look. (haven't reviewed it myself so idk for sure it's ready for merge)
[20:08:49] Reedy: link to deploy schedule?
[20:08:57] jeremyb, question for you http://meta.wikimedia.org/wiki/User:MZMcBride/Sandbox you see all of those namespaces without any non-redirects? We want to delete them, what's the best way of doing it but making sure nothing else is broken? Any namespaceCleanup script to run?
[20:09:05] http://wikitech.wikimedia.org/view/Software_deployments
[20:10:06] so, 3 more hours
[20:10:19] I think we're just about done
[20:10:55] with mobile too?
[20:11:13] No idea what they were intending to deploy..
[20:11:24] and test/test2/mediawikiwiki?
[20:11:25] preilly: awjr_lunch what was on the deployment calendar for Mobile stuffs?
[20:11:35] They're all on 1.20wmf3
[20:11:41] well apparently they have their own schedule
[20:11:46] indeed
[20:11:49] Meh, still need pushing to 1.20wmf2 wikis
[20:12:02] pushing what?
[20:12:09] which has an emmpty upcoming and no mention of today
[20:12:27] Monday, May 14 22:00-23:00 UTC (3pm-4pm PDT): Updated MobileFrontend code (see mw:Extension:MobileFrontend/Deployments) [Patrick, Arthur]
[20:12:45] Thehelpfulone: why?
[20:13:21] Thehelpfulone: you want to make them aliases for [[help:]]?
[20:13:29] why do we want to delete them?http://meta.wikimedia.org/wiki/Meta:Babel#Proposal_to_clean_Meta-Wiki_namespaces
[20:13:45] Reedy: what are you asking?
[20:14:07] preilly: you're on the deploy schedule linked above. are you really deploying?
[20:14:08] Nothing now. I realised what was going ot happen
[20:14:16] They will be
[20:14:28] jeremyb: yes
[20:14:32] jeremyb: ok I think?
[20:14:49] Reedy: anyway, if someone could at least review in gerrit. deploy doesn't have to be instant
[20:15:00] well, it does
[20:15:07] when it's reviewed, it should be deployed
[20:15:12] huh?
[20:15:13] why?
[20:15:17] so no one gets tripped up by checking it out on fenari
[20:15:20] i mean not merged
[20:15:25] No point
[20:15:28] not submitted
[20:15:36] Might aswell just do it in one go
[20:15:39] just a comment and status change "looks good"
[20:15:54] and then later you don't have to review the diff so much
[20:16:48] anyway, whatever you like ;)
[20:17:04] WORLD DOMINATION
[20:17:32] errr, no
[20:17:53] well, lunch at least
[20:18:22] oh, you're SF?
[20:19:01] indeed
[20:19:10] well, I'm in SF
[20:19:13] I am not SF ;)
[20:19:27] {{fact}}!
[20:30:28] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:44:53] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:45:47] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:46:54] Reedy you guys still deploying? im seeing a php fatal on mediawiki.org
[20:47:09] PHP fatal error in /usr/local/apache/common-local/php-1.20wmf3/extensions/Translate/tag/TranslatablePage.php line 108:
[20:47:09] Argument 1 passed to TranslatablePage::newFromTitle() must be an instance of Title, null given, called in /usr/local/apache/common-local/php-1.20wmf3/extensions/Translate/tag/PageTranslationHooks.php on line 414 and defined
[20:50:15] Ahhh
[20:51:50] awjr: often? Or doing a specific action?
[20:52:02] Reedy sorry, got distracted before i finished explaining - on editing an article
[20:52:08] on submit
[20:56:33] awjr: which page?
[20:56:46] roblaAFK: http://www.mediawiki.org/w/index.php?title=Extension:MobileFrontend/Deployments/2012-05-14&action=submit
[20:57:54] ah, it's page creation, not editing
[20:58:32] yup, sorry for the confusion
[21:00:23] Just out of curiosity, is "Our servers are currently experiencing a technical problem." the expected behavior for an upload to mediawiki.org?
[21:00:33] PHP fatal error in /usr/local/apache/common-local/php-1.20wmf3/extensions/Translate/tag/TranslatablePage.php line 108:
[21:00:34] Argument 1 passed to TranslatablePage::newFromTitle() must be an instance of Title, null given, called in /usr/local/apache/common-local/php-1.20wmf3/extensions/Translate/tag/PageTranslationHooks.php on line 414 and defined
[21:01:00] Oh well. Time to upload to wikia.
[21:01:10] dschoon: yes. just look up a few lines in here
[21:01:11] dschoon: we know about it
[21:01:17] sweet.
[21:01:20] ah yes.
[21:01:23] now i see.
[21:01:28] But no, that's not supposed to happen ;)
[21:01:32] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[21:37:32] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3238
[21:40:29] I wish gerrit would put some more info than "lolno, path conflict"
[21:42:29] New patchset: Ryan Lane; "replace "*.wikimedia.org" with "star.wikimedia.org" per RT-2512 | get rid of star_wikimedia_org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7676
[21:42:48] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7676
[21:44:06] New patchset: Ryan Lane; "replace "*.wikimedia.org" with "star.wikimedia.org" per RT-2512 | get rid of star_wikimedia_org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7676
[21:44:32] Change abandoned: Ryan Lane; "Newer changeset added in." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3238
[21:44:32] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7676
[22:02:50] New patchset: Ryan Lane; "replace "*.wikimedia.org" with "star.wikimedia.org" per RT-2512 | get rid of star_wikimedia_org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7676
[22:03:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7676
[22:03:19] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7676
[22:03:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7676
[22:06:42] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:13:11] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5727
[22:13:13] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5727
[22:15:08] New patchset: Ryan Lane; "Test commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7677
[22:16:57] New patchset: Ryan Lane; "Test commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7677
[22:17:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7677
[22:17:41] New patchset: Ryan Lane; "Test commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7677
[22:18:00] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7677
[22:18:30] Change abandoned: Ryan Lane; "was only for a test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7677
[22:19:17] New patchset: Ryan Lane; "Fixing reference to helper, which is gone" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7678
[22:19:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7678
[22:20:19] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7678
[22:20:22] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7678
[22:25:36] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:35:39] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:44:56] New patchset: Asher; "optionally analayze all queries to a server via tcpdump for a limited time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7683
[22:45:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7683
[22:45:42] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:46:54] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:55:14] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:56:00] New patchset: Asher; "optionally analayze all queries to a server via tcpdump for a limited time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7683
[22:56:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7683
[22:57:14] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7683
[22:57:16] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7683
[23:01:11] New review: Krinkle; "Changing from Commons:Upload to Special:UploadWizard I think should require consensus on individual ..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/7574
[23:08:25] New patchset: Asher; "run tcpdump based query analysis on prod cluster dbs for 90 seconds per hour (30 sec at a time, every 20 minutes)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7685
[23:08:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7685
[23:09:12] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7685
[23:09:14] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7685
[23:12:22] New patchset: Asher; "fix time specification" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7686
[23:12:39] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7686
[23:13:49] New patchset: Asher; "fix time specification" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7686
[23:14:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7686
[23:15:31] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7686
[23:15:34] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7686
[23:20:53] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[23:27:47] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:33:29] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:47:54] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:20] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor