[00:03:08] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49950
[00:03:24] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49950
[00:21:18] TimStarling: http://redis.io/commands/debug-segfault
[00:21:34] best command ever
[00:24:30] heh
[00:37:13] RECOVERY - mysqld processes on db43 is OK: PROCS OK: 1 process with command name mysqld
[00:39:37] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours
[00:41:16] PROBLEM - MySQL Slave Delay on db43 is CRITICAL: CRIT replication delay 2707 seconds
[00:41:43] PROBLEM - Puppet freshness on amssq43 is CRITICAL: Puppet has not run in the last 10 hours
[00:43:51] New patchset: Asher; "gitreview file" [operations/software/redactatron] (master) - https://gerrit.wikimedia.org/r/49956
[00:44:18] New review: Asher; "Patch Set 1: Verified+2 Code-Review+2" [operations/software/redactatron] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49956
[00:45:01] Change merged: Asher; [operations/software/redactatron] (master) - https://gerrit.wikimedia.org/r/49956
[00:46:12] New patchset: Asher; "Initial crud functionality + schema review + redaction defaults for trigger generation" [operations/software/redactatron] (master) - https://gerrit.wikimedia.org/r/49957
[00:52:22] RECOVERY - MySQL Slave Delay on db43 is OK: OK replication delay 0 seconds
[00:54:10] New review: Asher; "Patch Set 1: Verified+2 Code-Review+2" [operations/software/redactatron] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49957
[00:54:10] Change merged: Asher; [operations/software/redactatron] (master) - https://gerrit.wikimedia.org/r/49957
[00:54:59] New patchset: Pyoungmeister; "db-pmtpa.php: db43 back in, db58 out" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49958
[00:55:58] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/49958
[00:56:13] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49958
[00:57:27] !log py synchronized wmf-config/db-pmtpa.php 'removing db58 from db-secondary for upgrades and maria, replacing db43'
[00:57:28] Logged the message, Master
[00:58:34] notpeter: do you know anything about ishmael being broken by any chance?
[00:59:14] binasher: nein
[01:00:49] !log do-release-upgrade-ing on db58
[01:00:50] Logged the message, notpeter
[01:06:30] PROBLEM - mysqld processes on db58 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[01:10:44] ok bedtime for me and mostly not here tomorrow (trading monday hoilday when I worked for off day on wednesday)
[01:10:49] see folks later
[01:16:40] New patchset: Pyoungmeister; "db58 mariafication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49961
[01:17:18] PROBLEM - Host db58 is DOWN: PING CRITICAL - Packet loss = 100%
[01:17:34] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49961
[01:17:45] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49961
[01:18:39] RECOVERY - Host db58 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[01:28:06] RECOVERY - mysqld processes on db58 is OK: PROCS OK: 1 process with command name mysqld
[01:28:37] !log mw1161-1208 have wrong power setting, changing all to system.power.hotspare.enable 0
[01:28:41] Logged the message, RobH
[01:30:39] heh, power is normalizing in c6, awesome
[01:32:54] PROBLEM - MySQL Slave Running on db58 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Table ./eswiki/page_props is marked as crashed and should be
[01:44:37] New review: Ryan Lane; "Patch Set 2: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49916
[02:03:16] New patchset: Faidon; "Revoke preilly's access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49964
[02:03:37] New review: preilly; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/49964
[02:07:16] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49964
[02:07:25] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49964
[02:22:32] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours
[02:24:56] RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Wed Feb 20 02:24:39 UTC 2013
[02:29:12] !log LocalisationUpdate completed (1.21wmf9) at Wed Feb 20 02:29:11 UTC 2013
[02:29:15] Logged the message, Master
[02:39:22] New patchset: Tim Starling; "New cluster-local key for me" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49966
[02:40:05] apergos: around?
[02:40:13] New review: Tim Starling; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49966
[02:40:22] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49966
[02:40:33] 04:40, unlikely :)
[02:40:39] i was gonna say...
[02:40:53] you should know better than most
[02:43:09] heh
[02:45:31] TimStarling: your new and old keys are both absent?
[02:46:06] jeremyb_: good point
[02:47:05] New patchset: Tim Starling; "Maybe let the new key work" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49967
[02:47:54] New review: Tim Starling; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49967
[02:48:03] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49967
[02:50:35] New patchset: Catrope; "New cluster key for me" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49968
[02:53:05] New patchset: Faidon; "Revoke Ariel's second key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49969
[02:53:22] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49969
[02:53:33] !log LocalisationUpdate completed (1.21wmf10) at Wed Feb 20 02:53:32 UTC 2013
[02:53:35] Logged the message, Master
[02:54:49] New review: Jeremyb; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/49968
[02:54:49] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49969
[03:03:05] I have also regenerated my Gerrit/github key, and shredded the old one
[03:03:10] And shredded the old fenari key
[03:03:24] Which means I'm locked out of the cluster until my key change commit is merged
[03:04:40] Patrick R. is leaving WMF?
[03:04:46] yes.
[03:04:52] Hi Eloquence.
[03:05:07] Susan's eager to update wmfwiki?
[03:05:07] hi max
[03:05:17] jeremyb_: It's already been updated!
[03:05:24] oh, wow
[03:05:32] i should pay more attention to your bot
[03:05:39] jeremyb_: I didn't see it on the lists. That's why I was curious whether it was a mistake.
[03:07:01] Susan: You might have noticed the commits above to operations/puppet
[03:07:12] I haven't been paying attention in here.
[03:08:02] Hi Roan.
[03:08:35] damnit, i hate that it's a template. ([[wmfwiki:staff and contractors]])
[03:08:44] RoanKattouw: Did you see wikitech-l re: using wiki pages as databases?
[03:08:52] No, I'm behind on wikitech-l
[03:08:54] * jeremyb_ went to the history of the wrong page
[03:08:55] jeremyb_: It's supposed to be localized.
[03:08:56] Haven't read it since Christmas
[03:08:58] (bad Roan)
[03:09:08] Susan: i know. i saw some recent debate about htat
[03:09:09] that
[03:09:16] RoanKattouw: http://lists.wikimedia.org/pipermail/wikitech-l/2013-February/066656.html
[03:10:03] I have no strong opinions about that and defer to Tim's opinion
[03:14:38] New patchset: Ryan Lane; "Changing my production key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49971
[03:16:25] New review: Ryan Lane; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49971
[03:16:34] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49971
[03:21:49] RoanKattouw: Well, I was just thinking at dinner that maybe we need Starling's Law or something.
[03:22:16] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49968
[03:22:25] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49968
[03:23:02] RoanKattouw: "No matter how intricately designed or implemented, Wikimedians will find a way to mis-use or abuse a technology in infuriating and horrifying ways and with an astonishing pace."
[03:23:09] I'm still working on it.
[03:25:29] Susan: Mediawiki is just teh suxx0rz at structured data. We need to code a database extension. AKA: Wikidata? :-)
[03:26:02] Heh, I know, right?
[03:26:22] That thread was a little painful.
[03:34:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:36:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.691 seconds
[04:14:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:20:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.062 seconds
[04:25:30] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds
[04:32:06] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[04:32:33] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[05:06:27] RECOVERY - MySQL disk space on neon is OK: DISK OK
[05:07:21] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[05:25:14] New review: Ori.livneh; "Patch Set 1: -Code-Review" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49678
[05:31:25] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[05:32:55] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[05:33:01] New patchset: Tim Starling; "Re-enable Scribunto profiler" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49976
[05:33:26] New review: Tim Starling; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49976
[05:33:27] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49976
[05:34:43] !log tstarling synchronized wmf-config/CommonSettings.php 're-enable Scribunto profiler'
[05:34:43] Logged the message, Master
[05:39:13] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours
[05:52:48] TimStarling: how is lua going?
[05:53:12] the wikipedians are finally starting to think about using it, which is nice
[05:53:22] yesterday they seemed to be afraid to touch it
[05:53:40] I guess Citation usage and ramp up is the next milestone perhaps
[05:53:50] yeah, AllPages for Module: has built up a tad
[05:53:55] it was quite that first day
[05:54:01] and the wiktionarians?
[05:54:19] let's see what it looks like there
[05:54:21] http://en.wikipedia.org/w/index.php?namespace=828&tagfilter=&title=Special%3ARecentChanges
[05:54:38] to judge by that, it won't be long before we have lua versions of citations and convert on enwiki
[05:55:09] the wiktionarians were less shy, but one of them uncovered an unfortunate performance edge case, per wikitech-l
[05:55:10] https://en.wiktionary.org/w/index.php?namespace=828&tagfilter=&title=Special%3ARecentChanges
[05:55:32] oh, i only saw the IRC discussion. let me search wikitech-l
[05:55:48] I'm trying to compose a reply to code cat as to why it's faster to write your own parser than to use lua tables
[05:56:29] * Aaron|home recalls the day that RC links like that first one would have timed out
[06:01:13] I changed the code to use a two queries and UNION the result to make use of an existing index and felt giddy joy when querying the Portal namespace became reasonable
[06:01:18] * Aaron|home was easily impressed
[06:02:00] TimStarling: anyway, when does CodeEditor start to choke on decent client hardware?
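At 06:01:13 Aaron describes splitting a recent-changes query into separate queries joined with UNION so that each part can use an existing index. The log does not show the actual query, so the sketch below is only an illustration of the general technique. It runs on SQLite against a hypothetical, heavily simplified `recentchanges` table with an assumed `(rc_namespace, rc_timestamp)` index. With that index, `WHERE rc_namespace IN (a, b) ORDER BY rc_timestamp DESC LIMIT n` cannot simply read one index range in order, because rows from the two namespaces interleave by timestamp. One branch per namespace can read its own slice in order and stop after `n` rows, leaving only a small merge at the end. How much this helps on a particular engine and version depends on its planner.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for a recent-changes table;
# the real MediaWiki schema and index names are not shown in the log.
SCHEMA = """
CREATE TABLE recentchanges (
    rc_id        INTEGER PRIMARY KEY,
    rc_namespace INTEGER NOT NULL,
    rc_timestamp TEXT    NOT NULL
);
CREATE INDEX rc_ns_time ON recentchanges (rc_namespace, rc_timestamp);
"""

# One branch per namespace: equality on the leading index column plus
# ORDER BY on the second column lets each branch read the index in order
# and stop after `limit` rows.
BRANCH = (
    "SELECT * FROM (SELECT rc_id, rc_namespace, rc_timestamp "
    "FROM recentchanges WHERE rc_namespace = ? "
    "ORDER BY rc_timestamp DESC LIMIT ?)"
)


def newest_in_namespaces(conn, namespaces, limit):
    """Return the newest `limit` rows across `namespaces`, newest first."""
    namespaces = list(dict.fromkeys(namespaces))  # drop duplicate namespaces
    if not namespaces:
        return []
    # UNION ALL is enough: the branches filter on different namespaces,
    # so no row can come back twice.
    sql = (
        "SELECT rc_id, rc_namespace, rc_timestamp FROM ("
        + " UNION ALL ".join([BRANCH] * len(namespaces))
        + ") ORDER BY rc_timestamp DESC LIMIT ?"
    )
    params = []
    for ns in namespaces:
        params.extend([ns, limit])
    params.append(limit)
    return conn.execute(sql, params).fetchall()


# Small demo: 50 edits spread over namespaces 0-4 with unique timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO recentchanges VALUES (?, ?, ?)",
    [(i, i % 5, f"20130220{i:06d}") for i in range(50)],
)
demo = newest_in_namespaces(conn, [1, 3], 4)
naive = conn.execute(
    "SELECT rc_id, rc_namespace, rc_timestamp FROM recentchanges "
    "WHERE rc_namespace IN (1, 3) ORDER BY rc_timestamp DESC LIMIT 4"
).fetchall()
```

The UNION form returns the same rows as the naive `IN` query (`demo == naive`). Only the access pattern changes: each branch reads at most `limit` index entries.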
[06:02:29] I definitely noticed slowdown on that language module thing, not unusable though
[06:03:11] I'd hate to disable the highlighting at some threshold, it seems better to force people to break up the code a bit
[06:03:31] * Aaron|home imagines MediaWiki as one big PHP file
[06:03:52] CodeEditor seems to do alright
[06:04:02] the thing that fails terribly is firefox with geshi
[06:04:20] it spends about a minute frozen, trying to apply the CSS rules
[06:04:45] I assume it is CSS, chromium has CSS profiling and it was slow enough there to be suspicious
[06:05:46] any particular rules stand out?
[06:09:11] just running profiling in chromium again, it is saying different things this time
[06:09:40] it may actually be JS
[06:13:25] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:14:04] I don't know, it's hard to make sense of this tool
[06:14:55] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[06:14:57] need perf or something
[06:15:01] JS causing re-rendering with a crapload of tiny DOM changes in a loop?
[06:15:30] where is a page that's slow?
[06:16:02] https://en.wiktionary.org/wiki/Module:languages
[06:16:12] I think I tried it with JS off and found that it was still slow
[06:16:16] trying it with CSS off now
[06:16:16] yeah, "that language module thing"
[06:16:35] maybe firefox is just slow
[06:16:36] * Aaron|home hears is fan speed up
[06:16:49] *his
[06:17:49] huh, and no CodeEditor for view source :/
[06:18:04] my laptop went into swap
[06:19:03] it takes a while in Opera (a bit less though) and doesn't "stop the world" so much, though it's still laggy
[06:19:38] at the top of the CSS profile in chromium, there is .lua.source-lua .sy0, .lua.source-lua .br0, .lua.source-lua .st0
[06:20:13] 2.77, 2.75 and 1.73 seconds respectively
[06:20:25] ugh, Chrome is worse than FF
[06:20:29] for scrolling, not for the whole page
[06:21:04] * Aaron|home likes how Opera doesn't freeze for this
[06:21:13] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[06:21:15] well, the DOM has 437,213 nodes
[06:22:41] $('span').length
[06:22:41] RangeError: Maximum call stack size exceeded
[06:22:52] hm.
[06:22:58] have you heard the news? opera is going to become another webkit clone
[06:23:07] so anything that's good about it will stop existing
[06:23:30] * Aaron|home misses all the rock opera themed picture opera installation had
[06:23:52] just what the word needs, another webkit clone :)
[06:24:14] Chrome is by far the worst for this page (and I didn't scroll)
[06:26:09] * Aaron|home tries IE
[06:27:16] meh, in between chrome and FF there
[06:29:47] it looks like SyntaxHighlight_GeSHi's lexer has odd ideas about how to handle that code
[06:36:50] heh
[06:36:55] ]] closes comment blocks
[06:36:58] that could be a problem
[06:39:03] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 194 seconds
[06:39:21] try --[=[
[06:39:39] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 218 seconds
[06:39:44] --[=[ hello ]=]--
[06:39:58] which on the plus side, looks very artistic
[06:40:45] yes, very FILE_ID.DIZ
[06:41:08] exactly
[06:44:37] "The Lua comparison operators on strings (< and <=) use the C function strcoll which is locale dependent. This means that two strings can compare in different ways according to what the current locale is. For example, strings will compare differently when using Spanish Traditional sorting to that when using Welsh sorting."
[06:44:39] ugh
[06:45:03] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 13 seconds
[06:45:05] yeah, you must have missed the gerrit discussion about that
[06:45:40] brad pushed a change for review that involved calling Lua's setlocale() to force a C locale
[06:46:15] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[06:46:18] I pointed out that it is just a wrapper for the C library function of the same name
[06:46:35] and so it will affect everything in the same process, even other threads
[06:46:48] bbl
[06:48:10] * Aaron|home raises an eyebrow at http://stackoverflow.com/questions/8715980/javascript-strings-utf-16-vs-ucs-2
[06:48:40] the "ugh" was about global state
[06:48:52] but yeah I noticed that local thing
[06:53:02] * Aaron|home reads http://mathiasbynens.be/notes/javascript-encoding
[06:55:42] yes :(
[06:56:10] http://stackoverflow.com/questions/3744721/javascript-strings-outside-of-the-bmp
[06:56:23] lol the "UTF 16 curse"
[06:56:57] maybe it's good that lua waited
[07:00:35] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:04:50] not at 4:40 am I'm not around, paravoid (but not much today, swapped monday holiday for today)
[07:10:28] I sure forgot to change that
[07:16:02] RECOVERY - MySQL Slave Delay on db59 is OK: OK replication delay 0 seconds
[07:33:53] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[07:36:44] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 26.73 ms
[07:37:59] PROBLEM - NTP on mw1085 is CRITICAL: NTP CRITICAL: Offset unknown
[07:39:52] RECOVERY - NTP on mw1085 is OK: NTP OK: Offset -0.001426935196 secs
[07:46:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:48:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds
[07:57:28] New review: Danny B.; "Patch Set 2: Code-Review+1" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/49681
[08:03:20] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours
[08:05:22] New review: Danny B.; "Patch Set 1: Code-Review-1" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/49802
[08:13:14] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[08:15:20] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours
[08:17:17] PROBLEM - Puppet freshness on labstore3 is CRITICAL: Puppet has not run in the last 10 hours
[08:21:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:30:47] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds
[08:31:50] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 210 seconds
[08:35:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.055 seconds
[08:40:50] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 11 seconds
[08:41:44] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[08:58:34] when wikidata goes read-only later today will https://www.wikidata.org/wiki/MediaWiki:Readonlywarning be displayed for users?
[09:20:26] RECOVERY - Puppet freshness on amssq43 is OK: puppet ran at Wed Feb 20 09:20:08 UTC 2013
[09:23:26] RECOVERY - Puppet freshness on knsq26 is OK: puppet ran at Wed Feb 20 09:23:00 UTC 2013
[09:26:15] Depends how it's set read-only, I suppose.
[09:26:43] You can set the variable to a string to display a custom message or you can set it to true to show that MediaWiki page, I think.
[09:30:47] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 196 seconds
[09:31:04] Susan: Right but normally that only shows up in the edit window right?
[09:31:10] Wikidata doesnt have an edit window...
[09:31:32] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 214 seconds
[09:41:55] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[09:43:07] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[09:43:34] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[09:44:10] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[09:48:09] hello
[10:13:07] RECOVERY - MySQL disk space on neon is OK: DISK OK
[10:13:16] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[10:49:00] New patchset: QChris; "(bug 35802) Stop applying unwarranted ellipses to email subjects" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49987
[10:56:38] qchris: :-]
[10:56:53] Hi hashar :-)
[10:57:22] qchris: it is good to see Chad having a new Gerrit friend :-]
[10:57:35] can't wait to see the bugzilla/gerrit integration
[10:57:53] The plugin is there ...
[10:58:11] so it cannot take too long until it's done
[10:58:20] are you coming to SF next week?
[10:58:27] But there are few other things on the agenda before
[10:58:34] no.
[10:58:40] I suppose you are going?
[10:58:44] yeah I am
[10:59:00] One less in my time-zone :-(
[10:59:02] I have ton of people to interview for continuous integration
[10:59:20] I am afraid you will be a bit lonely during european morning : /
[10:59:32] though there are the wikidata folks from Wikimedia Deutschland !
[10:59:38] :-)
[10:59:39] they hide in #wikimedia-wikidata
[10:59:46] So we're not alone after all.
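The 06:36 to 06:39 exchange about `]]` closing comment blocks and the `--[=[ hello ]=]--` workaround follows from Lua's long-bracket rule. The sketch below is a minimal Python illustration of that rule as the Lua 5.1 manual states it: if `--` is immediately followed by an opening long bracket (`[`, zero or more `=`, `[`), the comment runs to the closing bracket with the same number of `=` signs. Otherwise it is a short comment that ends at the end of the line. This is not GeSHi's lexer, and the log does not show how GeSHi handles the rule. `comment_end` is a hypothetical helper written for this example.

```python
import re

# An opening long bracket: "[", zero or more "=", "[". Its level is the
# number of "=" signs, and only a closing bracket of the same level ends it.
_LONG_OPEN = re.compile(r"\[(=*)\[")


def comment_end(src: str, start: int) -> int:
    """Return the index just past the Lua comment that begins at src[start].

    Follows the Lua 5.1 lexical rule: if "--" is immediately followed by an
    opening long bracket, the comment runs to the matching closing long
    bracket; otherwise it is a short comment that runs to the end of the line.
    """
    if not src.startswith("--", start):
        raise ValueError(f"no comment starts at offset {start}")
    opening = _LONG_OPEN.match(src, start + 2)
    if opening:
        closing = "]" + opening.group(1) + "]"
        end = src.find(closing, opening.end())
        if end == -1:
            raise ValueError("unfinished long comment")
        return end + len(closing)
    newline = src.find("\n", start)
    return len(src) if newline == -1 else newline


# Level 0: the first "]]" (here the end of a[b[1]]) closes the comment,
# leaving " ]] rest" to be lexed as code.
level0 = "--[[ a[b[1]] ]] rest"
# Level 1: "]]" inside the comment is plain text and only "]=]" closes it.
# The trailing "--" then starts an ordinary short comment.
level1 = "--[=[ a[b[1]] ]=]-- trailing\nx = 1"

after0 = level0[comment_end(level0, 0):]  # " ]] rest"
after1 = level1[comment_end(level1, 0):]  # "-- trailing\nx = 1"
```

Raising the bracket level (`[=[`, `[==[`, ...) is how Lua lets commented-out code that itself contains `]]` stay inside a single comment.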
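The manual passage quoted at 06:44:37 says Lua's `<` and `<=` on strings go through `strcoll`, and the 06:45 to 06:46 messages explain why forcing a C locale with `setlocale()` is unattractive: it changes process-wide state. The Python sketch below shows the fixed order that a C locale would typically produce, without touching the locale at all. In the C locale, `strcoll()` usually reduces to a plain byte comparison like `strcmp()`, and for UTF-8 text byte order matches Unicode code point order. This is not what Scribunto does. The log only records Brad's `setlocale()` proposal and Tim's objection, and `c_locale_key` is a hypothetical name used here.

```python
# Sort on raw UTF-8 bytes: the order strcmp(), or strcoll() in the C locale,
# would typically give. It never consults the process locale, so it cannot
# be changed by, or change, any other thread.
def c_locale_key(s: str) -> bytes:
    return s.encode("utf-8")


words = ["llama", "luz", "chico", "cosa", "Zorro", "árbol"]
byte_order = sorted(words, key=c_locale_key)
# byte_order == ["Zorro", "chico", "cosa", "llama", "luz", "árbol"]
```

Under Spanish traditional collation, "ch" and "ll" sort as separate letters after "c" and "l", so "cosa" would precede "chico" and "luz" would precede "llama". That is exactly the kind of locale-dependent difference the quoted passage warns about.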
[11:15:39] PROBLEM - Apache HTTP on mw1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:16:06] PROBLEM - SSH on mw1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:16:23] New patchset: Ori.livneh; "EventLogging: fix test2wiki configs; +CodeEditor" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49991
[11:19:50] I hate java really
[11:29:18] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[11:35:00] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 198 seconds
[11:36:48] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[11:38:46] New patchset: Hashar; "(bug 38114) gerrit: alternate change list row colors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49993
[11:39:28] New review: Hashar; "Patch Set 1:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49993
[11:39:49] lunchhh
[12:02:12] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused
[12:36:51] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 198 seconds
[12:37:09] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 209 seconds
[12:38:40] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[12:38:57] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[12:49:51] New review: QChris; "Patch Set 1: Code-Review-1" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/49993
[12:55:36] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[12:56:35] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[13:03:11] New review: Hashar; "Patch Set 1:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49993
[13:05:41] New patchset: Hashar; "(bug 38114) gerrit: alternate change list row colors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49993
[13:05:41] qchris: thanks for test of alternate row coloring
[13:05:50] qchris: I have amended the change ^^^^
[13:05:58] * qchris looks again :-)
[13:07:23] New review: QChris; "Patch Set 2: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/49993
[13:07:35] Tables in gerrit are annoying
[13:07:43] bah
[13:07:44] http://integration.wmflabs.org/gerrit/#/q/status:open,n,z
[13:07:50] Somtimes they even come with invisible extra rows/columns :-(
[13:07:50] that is not working on my install :(
[13:08:18] Nice theme :-)
[13:08:35] s/Tables in //
[13:08:45] :-D
[13:09:37] oh my god
[13:09:40] gerrit I hate you
[13:10:06] it start up with the wrong jar
[13:10:07] :(
[13:10:08] /var/lib/gerrit2/review_site/bin/gerrit.war
[13:17:48] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 181 seconds
[13:18:17] <^demon> It always starts with the one in ./bin
[13:18:42] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 187 seconds
[13:20:01] ^demon: I start it using the init.d script which loads /etc/default/gerritcodereview that got: GERRIT_WAR="/var/lib/gerrit2/review_site/bin/gerrit.war"
[13:20:01] ;-D
[13:20:10] not sure why I had the .war in /var/lib/gerrit2
[13:20:26] <^demon> The package installs the war there for installation purposes.
[13:21:04] <^demon> (So as to not overwrite the running .war until you do init)
[13:21:29] qchris: the table alternance does not work properly :(
[13:21:39] ^demon: yeah that make sense :-]
[13:22:02] qchris: on the general view http://integration.wmflabs.org/gerrit/#/q/status:open,n,z I got the first change with a white background
[13:22:15] * qchris takes a look
[13:22:22] but in "My Changes", I got the first one with gray ;-]
[13:22:27] not sure how to link you to it hehe
[13:22:48] <^demon> You're not supposed to link to those anymore.
[13:23:14] ahh
[13:23:19] I'll apply it locally again
[13:23:43] that is a bit crazy, I am not sure how they count their rows hehe
[13:24:20] <^demon> Make sure we're not overriding anything in GerritSite.css
[13:24:37] AHHH
[13:24:39] <^demon> Stuff in there is !important so would override theme settings.
[13:24:46] so turns out that is just a huge table
[13:25:23] .changeTable tr:nth-child(odd) { background: #EEE; }
[13:25:31] but then you have headers inserted in the .changeTable
[13:25:43] and can have either and odd or even number of rows before
[13:25:50] so after a header, you could get a odd or even row :(
[13:26:39] Can we pick different colors for header and grey rows?
[13:27:15] RECOVERY - MySQL disk space on neon is OK: DISK OK
[13:28:09] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[13:30:04] New review: Hashar; "Patch Set 2: Code-Review-1" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/49993
[13:30:10] gave a long comment https://gerrit.wikimedia.org/r/#/c/49993/
[13:30:17] qchris: yeah different colors I guess :(
[13:30:23] what about pink for headers? :-]]]]]]]]]]]]]]]]]]]
[13:30:39] \o/
[13:31:55] http://www.colorhexa.com/eed2ee
[13:32:16] everyone is going to hate me
[13:35:08] qchris: I will let Krinkle respond on the bug report
[13:35:21] ok
[13:35:23] * Krinkle hears bleep
[13:35:28] though #eed2ee is nice :-]
[13:36:11] Krinkle: we were talking about colors alternance in the gerrit changes lists
[13:36:38] Right
[13:37:05] Krinkle: headers are #EEE, both headers and changes are all part of the same table. By picking up #EEE as an alternate color, you can end up with a header and the change next to it having the same color (#EEE)
[13:37:19] When I tested this locally I used :nth-child(odd/even) in the console as well, but I'm assuming that is not what it will use in production since those tables are spanning multiple "user" tables, (they are technically one big table)
[13:37:21] I know that
[13:39:54] and so that looks odd http://bug-attachment.wikimedia.org/attachment.cgi?id=11812
[13:39:55] ;D
[13:41:01] yeah
[13:41:11] Niklas was talking about patch lists though, not change lists.
[13:41:16] (he even corrected himself)
[13:41:24] that's why I suggested #eee as it would be fine in there
[13:41:34] They're headings anyway
[13:41:38] they're all white now
[13:44:54] * qchris will take a look in the aftenoon to see whether we cannot just fix the counting of rows upstream
[13:46:08] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds
[13:47:11] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 0 seconds
[13:49:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 187 seconds
[13:49:44] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 190 seconds
[14:18:23] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[14:18:41] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[14:19:54] New review: Mark Bergsma; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49831
[14:20:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49831
[14:22:00] mark: I will boot a new instance and use puppetmaster::self to have a clean change.
[14:22:17] mark: I sense that patch is not enough yet to have an upload cache in beta :-/
[14:23:28] which patch?
[14:26:37] mark: the one you just merged ( https://gerrit.wikimedia.org/r/#/c/49831/ ) which tweaks the lvs config for labs
[14:26:51] hehe
[14:26:59] just noticed the upload role has sda / sdb devices
[14:27:10] we need to abstract that
[14:30:49] something like $role::cache::configuration::disk[$::realm] ? :-D
[14:31:04] hmm not necessarily
[14:34:43] warning: peer certificate won't be verified in this SSL session
[14:34:43] Feb 20 14:23:40 deployment-cache-upload-test puppet-agent[4355]: Did not receive certificate
[14:34:44] notice: Did not receive certificate
[14:34:45] Feb 20 14:23:51 deployment-cache-upload-test dhclient: DHCPREQUEST of 10.4.1.80 on eth0 to 10.4.0.1 port 67
[14:34:46] bahhh
[14:35:08] I should get my own virtual boxes
[14:40:38] haha
[14:40:43] that's what I dooooooo
[14:41:02] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[14:41:47] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[15:12:02] RECOVERY - MySQL disk space on neon is OK: DISK OK
[15:12:29] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[15:16:03] puppet1.pmtpa.wmflabs died in labs, not sure whether mark / Coren can do anything about :-D
[15:16:25] Not yet. Sorry.
[15:18:08] hm
[15:18:12] i just created that
[15:18:26] wait, did you have one called puppet1, hashar?
[15:18:32] ohh
[15:18:40] that just sounded like the puppet master in labs died :]
[15:18:59] oh is puppet1.pmtpa.wmflabs the puppet master? haha
[15:19:05] na should be virt0 or something
[15:19:09] hm
[15:19:15] right
[15:19:28] do we both have instances named puppet1 now?
[15:20:14] virt0 is just named virt0
[15:20:17] it's not an instance
[15:20:21] it's a physical host
[15:20:39] New patchset: Platonides; "Configuration for webtools-apache VMs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50011
[15:20:55] ja, right
[15:21:36] puppetmaster is not running on virt0
[15:22:16] hmm, maybe it is, service just says it isn't
[15:22:49] right, you should check for other webservers
[15:23:08] ?
[15:24:12] puppetmasters can be their own service or can be run under apache or whatever webserver someone chose
[15:24:27] anyway, the entry point is always HTTPS
[15:40:23] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours
[15:50:39] New patchset: Platonides; "Configuration for webtools-apache VMs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50011
[15:56:56] New patchset: Platonides; "Handling of packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50015
[15:57:07] New patchset: Demon; "Update gerrit /etc/default/* for init script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50016
[15:59:57] New review: Platonides; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50016
[16:22:40] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[16:36:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:38:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[16:48:57] New patchset: Demon; "Make the # optional when listing RT and Bugzilla entries" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50017
[16:49:54] <^demon> 50016 and 50017 are super trivial if someone's got a minute.
[16:51:55] New patchset: Jgreen; "de-puppetizing my.cnf for db29 since it's a weird conf & short term use" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50019 [16:51:56] ^demon: and 49196! :) [16:52:33] <^demon> Ya. [16:52:41] <^demon> Since we're gonna have to restart gerrit for config changes. [16:52:46] <^demon> Might as well get all 3 of those. [16:53:13] we need tagging! [16:53:21] these could all be tagged as gerritconf [16:53:29] and then we can search for that when doing a restart [16:53:45] <^demon> We could use the topics. [16:53:51] unless that's a new feature i didn't notice yet [16:53:53] hrmmmm [16:56:12] <^demon> https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:gerritconf,n,z [16:59:12] what is gerrit doing that makes config changes such a big deal? :) [16:59:15] er, restarts [17:01:40] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [17:01:46] mark: it requires shell access. i think that's a good thing. :) (shell people are sometimes easy to find and sometimes not) [17:02:01] mark: btw, do you have a comment on RT 822? [17:03:38] New review: Jgreen; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50019 [17:03:42] Coren: http://bugs.debian.org/701026 [17:03:47] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50019 [17:03:55] Coren: btw, you should /join #wikimedia-tech [17:08:56] <^demon> mark: Well, it doesn't pick up config changes until it restarts :\ [17:09:15] yeah i get that, but why is restarting such a big deal [17:09:23] why isn't it quick and painless [17:09:27] what's it doing during restarts? :) [17:09:30] <^demon> It's pretty painless. [17:09:37] New patchset: Ottomata; "Adding puppet Limn module." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/49710 [17:09:38] <^demon> I'm just saying, people freak out when they can't reach gerrit. [17:09:42] hehe [17:09:43] <^demon> So I try to keep reboots to a minimum, [17:38:54] New patchset: Reedy; "Rename *.wikimedia.org.crt to star.wikimedia.org.crt like it is used in files/owa/owa-apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/32924 [17:42:56] * jeremyb_ wonders what owa is [17:43:05] Open Web Analytics IIRC [17:43:16] from some grepping it seems to not be Outlook Web Access [17:43:22] ahhh, much better [17:43:27] Reedy: THAT WOULD BE SWEET [17:45:36] New patchset: Matmarex; "(bug 45163) make gerrit's link parser not match incomplete words" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50029 [17:54:50] MathRenderer::writeDBEntry 10.64.16.8 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.8) REPLACE INTO `math` ... [17:55:10] so, who changed math to cause locking problems? [18:02:14] James_F: did you ever find that watch? :) [18:02:16] AaronSchulz, it had an epic rewrite recently [18:02:32] AaronSchulz: Yes. On my desk, as expected. :-) [18:04:36] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [18:06:33] PROBLEM - Puppet freshness on lardner is CRITICAL: Puppet has not run in the last 10 hours [18:08:27] WTF? [18:08:44] So I went to lardner to check why Puppet hadn't run there, assuming there was some sort of error in a manifest [18:08:47] And I got this: [18:09:11] root@lardner:~# puppetd -tv [18:09:13] notice: Skipping run of Puppet configuration client; administratively disabled; use 'puppet Puppet configuration client --enable' to re-enable. 
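The MathRenderer::writeDBEntry failure above is MySQL error 1205 (lock wait timeout), and the error text itself suggests restarting the transaction. The sketch below shows a generic retry wrapper in Python for illustration only. It is not MediaWiki's (PHP) handling. `LockWaitTimeout` is a hypothetical stand-in for whatever exception the database layer raises with errno 1205.

```python
import time


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL errno 1205."""

    errno = 1205


def run_with_retry(txn, attempts=3, backoff=0.1):
    """Call txn() and restart it if it fails with a lock wait timeout.

    txn must redo the whole transaction from the beginning: by default
    InnoDB rolls back only the statement that timed out, not the
    transaction, so txn should roll back and start over.
    """
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except LockWaitTimeout:
            if attempt == attempts:
                raise  # give up after the last attempt
            # Wait a little longer before each restart.
            time.sleep(backoff * attempt)
```

Retrying only hides contention, so a sudden burst of 1205s after a rewrite like the recent Math one still deserves a look at which rows the REPLACE INTO is locking.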
[18:10:52] wasn't lardner supposed to be reprovisioned [18:11:07] Eventually yes [18:11:13] But I was expecting to have been given notice [18:11:18] ah no, consfusing it with another box [18:11:20] ignore me [18:11:29] I should get to the office and get a coffee on my way :) [18:11:38] RobH: Do you know what's up with lardner by any chance? [18:14:30] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [18:16:36] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [18:18:33] PROBLEM - Puppet freshness on labstore3 is CRITICAL: Puppet has not run in the last 10 hours [18:23:00] RoanKattouw: I have not been working on it [18:23:13] RoanKattouw: but i did get that the other day [18:23:18] on another box. [18:23:34] i wasnt on as root [18:23:36] OK [18:23:38] (iirc [18:23:39] ) [18:23:44] you ran as root and got that? [18:23:47] Ya [18:23:58] now im trying to recall how i fixed it [18:24:05] Well it tells me how to reenable [18:24:07] But I'm concerned [18:24:19] And I'm not just gonna reenable it without first talking to someone who knows this stuff [18:24:24] Preferably mark or Ryan_Lane [18:24:35] i dealt with this yesterday [18:24:38] i just wish i recalled how. [18:24:51] (it also happened on some random mw server i just installed, so it was some fluke it seemed) [18:25:01] It says "use 'puppet Puppet configuration client --enable' to re-enable" [18:25:37] i dont think that worked for me. [18:27:02] heh, the servermon for puppet showing whats run [18:27:06] reads 'a django site' [18:27:17] now all i have is django unchained soundtrack in my head. 
[18:32:55] RoanKattouw: So when i got that, if i recall, and i barely do, i think i had to try what it siad [18:32:57] it failed [18:33:06] so i wiped off all its puppet keys on local and cleaned cert of sockpuppet [18:33:10] and reran it all and it worked [18:33:23] but honestly, i fixed a couple dozen dead servers in past two days, i no longer recall what i did to each. [18:34:32] RoanKattouw: where is this disabled? [18:34:45] lardner [18:34:48] usually puppet gets disabled because someone is live hacking something on a server [18:34:51] what does lardner do? [18:34:55] its a parsoid server [18:35:00] no one should have been touching it cept roan and i [18:35:15] and neither of us did anythign that should trigger that [18:35:35] puppetd --enable [18:35:45] <^demon> RoanKattouw: Did you maybe disable puppet awhile back on the box when I broke your parsoid boxes? [18:36:33] <^demon> I know it was like 2 weeks ago, but just a thought... [18:36:37] nah [18:36:42] it had ran as of yesterday [18:36:46] cuz i been watching the puppet checks. [18:43:02] New review: Demon; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50029 [18:44:21] Change abandoned: Demon; "Abandoning in favor of I93d1a978, which accomplishes the same and is way more up to date." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/11589 [18:45:01] New review: Demon; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/49987 [18:46:57] New patchset: Demon; "Make the # optional when listing RT and Bugzilla entries" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50017 [19:04:00] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 189 seconds [19:04:27] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 202 seconds [19:05:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:24] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [19:09:51] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [19:10:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.210 seconds [19:24:14] ^demon: no gerritconf deployment yet, right? [19:24:20] maybe have another tweak to make [19:24:58] <^demon> Not yet. I pinged Ryan, but he seems busy. [19:25:04] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: closed, special, wikimedia, private and fishbowl to 1.21wmf10 [19:25:09] Logged the message, Master [19:25:21] :) [19:29:50] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikivoyage, wikinews and wiktionary to 1.21wmf10 [19:29:52] Logged the message, Master [19:30:00] huh, why did i escape : ? [19:33:08] ^demon: what did you ping me about? [19:34:07] <^demon> In -labs, a couple of gerrit changes. [19:34:37] <^demon> I think we're ready now, but jeremyb_ says maybe wait? 
[19:34:48] New review: Ryan Lane; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49987 [19:34:57] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49987 [19:35:17] New review: Ryan Lane; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50029 [19:35:27] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50029 [19:35:45] New review: Ryan Lane; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50016 [19:35:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50016 [19:36:19] ^demon: you can go if you want... i saw mutante using "RT-\d+" so i was thinking why not support that [19:36:30] ^demon: merged all the way through [19:36:33] (i think it's not the first i saw it) [19:37:04] New patchset: Demon; "Make the # optional when listing RT and Bugzilla entries" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50017 [19:37:18] <^demon> That last one is also ready ^ [19:37:19] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiversity, wikisource and wikiquote to 1.21wmf10 [19:37:20] Logged the message, Master [19:37:44] hah, i read it as Reedy instead of Ready [19:38:06] <^demon> jeremyb_: https://gerrit.wikimedia.org/r/#/c/49196/ will need rebasing. 
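Changes 50017 ("Make the # optional when listing RT and Bugzilla entries"), 49196 ("Linkify RT a little more liberally") and 50029 (bug 45163, "not match incomplete words") all adjust the same kind of pattern. Here is a Python sketch of one pattern that accepts "RT #822", "RT 822", "RT-822", "bug 45163" and "bug #45163" while using word boundaries to avoid matches inside longer words. The exact regex and both URL templates are assumptions, not the contents of the real gerrit commentlink config. Gerrit uses Java regexes, but `\b`, `[ -]?` and `#?` behave the same there.

```python
import re

# Hypothetical pattern: "RT" or "bug", an optional space or hyphen,
# an optional "#", then digits. The \b anchors keep it from matching
# inside longer words such as "debug 12" or "ART 5".
LINK_RE = re.compile(r"\b(RT|bug)[ -]?#?(\d+)\b", re.IGNORECASE)

# Hypothetical URL templates; the real tracker URLs may differ.
URLS = {
    "rt": "https://rt.wikimedia.org/Ticket/Display.html?id={}",
    "bug": "https://bugzilla.wikimedia.org/{}",
}


def find_links(text):
    """Return (kind, number, url) for every RT/bug reference in text."""
    links = []
    for m in LINK_RE.finditer(text):
        kind = m.group(1).lower()
        links.append((kind, int(m.group(2)), URLS[kind].format(m.group(2))))
    return links
```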
[19:38:17] i can do that [19:40:45] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikibooks to 1.21wmf10 [19:40:47] Logged the message, Master [19:42:16] New patchset: Reedy; "Everything non 'pedia to 1.21wmf10" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50038 [19:42:30] New review: Reedy; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50038 [19:42:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50038 [19:43:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:47:56] <^demon> Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/50017/ is it, then I'll just run puppet once. [19:48:30] New review: Ryan Lane; "Patch Set 3: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50017 [19:48:37] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50017 [19:48:40] (i am rebasing) [19:48:48] ^demon: done [19:48:55] <^demon> thanks. [19:55:27] New patchset: AzaToth; "(Bug #44961) get rid of "Patch Set" in comments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50041 [19:56:32] paravoid or mutante, around? [19:56:39] ^demon: ↑ [19:56:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.221 seconds [19:56:39] I am [19:56:48] what's up [19:57:30] <^demon> AzaToth: Just on comment-added? What about the others? [19:57:53] ^demon: only comment-added are "reviews" [19:57:54] afaik [19:58:40] paravoid, can you do a quick check for me please? basically, whether check_solr fails because it requires a FQDN: `check_solr -a 500 solr1` vs. `check_solr -a 500 solr1.pmtpa.wmnet` [19:59:14] wow, it must be a while since i did a rebase... 
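AzaToth's change 50041 drops the "Patch Set N:" prefix that gerrit puts in front of review comments. As noted above, only comment-added events are reviews, so only those need it. The sketch below applies that idea to a single event shaped like the JSON that `gerrit stream-events` emits. The field names (`type`, `comment`, `author.name`, `change.project`, `change.branch`, `change.url`) and the output format, which mimics the bot lines in this log, are assumptions rather than the hook's actual code.

```python
import json
import re

# Gerrit starts review comments with "Patch Set N:", followed by the
# votes and any free-text message.
PATCH_SET_PREFIX = re.compile(r"^Patch Set \d+:\s*")


def format_review(event_json):
    """Format a comment-added event as an IRC line, minus the prefix.

    Returns None for any other event type, since only comment-added
    events are reviews.
    """
    event = json.loads(event_json)
    if event.get("type") != "comment-added":
        return None
    comment = PATCH_SET_PREFIX.sub("", event.get("comment", ""), count=1)
    change = event["change"]
    return 'New review: {}; "{}" [{}] ({}) - {}'.format(
        event["author"]["name"], comment.strip(),
        change["project"], change["branch"], change["url"])
```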
[19:59:15] had to relearn a little [19:59:19] MaxSem: ofcourse it wants an fqdn [19:59:24] MaxSem: solr1 won't work [19:59:25] okay [19:59:38] but it doesn't seem to work anyway [19:59:42] AttributeError: _ElementInterface instance has no attribute 'iter' [20:00:34] iter is 2.7 [20:00:51] New patchset: Jeremyb; "Linkify RT a little more liberally." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49196 [20:00:53] I think [20:00:55] let me check [20:00:57] ^demon: I don't think abandoned and restored have a Patch Set line? [20:00:57] yup, I've a fix ready [20:01:00] ^demon: Ryan_Lane: ^ [20:01:07] ^demon: dunno, as I've never seen any [20:01:36] Change abandoned: Demon; "Abandoning in favor of I93d1a978, which accomplishes the same and is way more up to date." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11589 [20:01:45] <^demon> Ah, true. [20:01:46] abandoned works already [20:01:53] MaxSem: if you have a fix, I won't debug more :) [20:01:56] restored I've no clue [20:02:38] ^demon: perhaps we chould add a draft-published [20:02:53] I don't know if it will trigger patchset-created [20:03:14] when "published" [20:03:21] <^demon> Really, I'd rather get rid of all of these hooks, and write a single bot that consumes the stream-events. [20:03:38] heh [20:03:56] would you like a cherry on top? [20:04:26] I think he wants a pony [20:05:08] but don't we all... 
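The check_solr traceback (`AttributeError: _ElementInterface instance has no attribute 'iter'`) matches the guess above. `Element.iter()` first appeared in Python 2.7 (ElementTree 1.3), while Python 2.6 only has the older `getiterator()`, which was later removed in Python 3.9. MaxSem's actual fix (50043) isn't shown here. One common approach is a small fallback helper. The helper name below is hypothetical, and the XML is a toy Solr-like response.

```python
import xml.etree.ElementTree as ET


def iter_elements(elem, tag=None):
    """Iterate over elem and its descendants, optionally filtered by tag.

    Prefer Element.iter() (Python 2.7+); fall back to getiterator(),
    which is all Python 2.6 provides and which returns a list there.
    """
    if hasattr(elem, "iter"):
        return elem.iter(tag)
    return iter(elem.getiterator(tag))


root = ET.fromstring(
    "<response><lst name='responseHeader'><int name='status'>0</int></lst></response>"
)
statuses = [e.text for e in iter_elements(root, "int")]
```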
[20:07:39] New patchset: MaxSem; "Make check_solr work with Python 2.6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50043 [20:08:32] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50043 [20:08:42] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50043 [20:10:36] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 194 seconds [20:11:03] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 206 seconds [20:11:07] MaxSem: any reason to not use $HOSTADDRESS in nagios? [20:11:17] instead of passing the hostname via puppet? [20:11:41] umm, because I'm a Nagios noob?:P [20:11:55] okay, google that :) [20:11:58] yea [20:12:07] https://gerrit.wikimedia.org/r/#/c/50044/ [20:12:30] as I made a draft, and then made publish, it didn't post here [20:12:38] ^demon: ↑ [20:14:06] New review: Demon; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50044 [20:14:41] whoops [20:15:05] got "Mail Error: Server smtp.pmtpa.wmnet rejected recipient preilly : 550 Address preilly@wikimedia.org does not exist " when trying to add "Administrators" to review [20:15:06] New review: Demon; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50041 [20:15:40] <^demon> AzaToth: Fixed. [20:18:06] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 194 seconds [20:18:24] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 196 seconds [20:21:42] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [20:22:00] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [20:31:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:32:20] ! 
[20:32:41] !log mw86-mw111 being added to nodelists, ignore errors on them for next few minutes [20:32:42] Logged the message, RobH [20:38:07] !log mw86-mw111 tested ok, pushing into apache service in pmtpa [20:38:08] Logged the message, RobH [20:38:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds [20:56:09] New patchset: Reedy; "Fix location of $wgLocalStylePath per bug 42858" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50048 [20:56:31] New review: Reedy; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50048 [20:56:32] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50048 [20:58:06] !log reedy synchronized wmf-config/CommonSettings.php [20:58:07] Logged the message, Master [21:02:14] New patchset: Hashar; "beta: basic role to get mysql packages installed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49703 [21:03:47] ^^^^ easy merge :-] [21:07:01] MaxSem: so? [21:10:03] paravoid, $HOSTADDRESS? 
will do [21:10:10] yes [21:12:23] oh you also miss trailing $ [21:12:35] damn, this is my bad for pushing this through [21:13:05] !log mw112-125 now in service as api apaches in tampa [21:13:07] Logged the message, RobH [21:13:30] RECOVERY - Solr on solr3 is OK: All OK [21:13:30] RECOVERY - Solr on solr1001 is OK: All OK [21:13:58] RECOVERY - Solr on solr1 is OK: All OK [21:13:58] RECOVERY - Solr on solr2 is OK: All OK [21:14:24] RECOVERY - Solr on solr1003 is OK: All OK [21:14:51] RECOVERY - Solr on solr1002 is OK: All OK [21:17:57] New patchset: Faidon; "Fix Solr nagios plugin invocation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50049 [21:18:24] New patchset: Asher; "setting wgReadOnly for wikidatawiki during maintenance window" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50050 [21:18:49] !log forced puppet run on lardner, was tired of it showing error for no runs [21:18:50] Logged the message, RobH [21:18:54] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Wed Feb 20 21:18:51 UTC 2013 [21:19:00] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50049 [21:19:06] New review: Asher; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50050 [21:19:07] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50050 [21:19:09] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50049 [21:20:52] !log asher synchronized wmf-config/InitialiseSettings.php 'setting wgReadOnly for wikidatawiki durring maintenance' [21:20:53] Logged the message, Master [21:24:07] !log maerlant still offline (200 days?). dropped esams ticket for onsite troubleshooting, as mgmt is offline. 
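The Solr check fix swaps a Puppet-supplied hostname for Nagios's `$HOSTADDRESS$` macro, and the bug paravoid spotted was a missing trailing `$`. Nagios macros are delimited by `$` on both sides, and `$$` is an escaped literal dollar sign. That makes this class of typo easy to lint for. Below is a Python sketch using a simplified macro-name syntax (letters, digits, underscores, colons). The `command_line` strings are illustrative, modelled on the `check_solr -a 500 <host>` invocation quoted earlier, not the real Puppet template.

```python
import re

# Simplified Nagios macro syntax: "$$" is an escaped literal dollar sign,
# "$NAME$" is a macro, and any other "$" is left unmatched.
TOKEN_RE = re.compile(r"\$\$|\$([A-Za-z0-9_:]+)\$|\$")


def scan_macros(command_line):
    """Return (macro_names, stray_offsets) for a Nagios command_line.

    stray_offsets lists positions of "$" characters that never close
    into a macro, e.g. "$HOSTADDRESS" with its trailing "$" missing.
    """
    names, stray = [], []
    for m in TOKEN_RE.finditer(command_line):
        if m.group(0) == "$$":
            continue  # literal dollar sign, not a macro
        if m.group(1) is not None:
            names.append(m.group(1))
        else:
            stray.append(m.start())
    return names, stray
```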
[21:24:08] Logged the message, RobH [21:24:39] !log kicked knsq23 to clear puppet freshness alarm [21:24:41] Logged the message, RobH [21:24:59] I did that 5' ago or so :) [21:25:06] but glad that you're on top of this [21:25:10] I'll leave you to it :) [21:25:16] !log running wikidata wb_items / wb_items_per_site schema migrations in parallel on the db66 slave (creating primary keys, cannot be done via online-schema-change) [21:25:17] Logged the message, Master [21:26:15] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.082 second response time [21:26:50] paravoid, I was in a meeting [21:26:53] MaxSem: fixed it already [21:27:01] thanks:) [21:30:02] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [21:30:29] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [21:31:31] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [21:32:32] New review: Hashar; "Patch Set 25:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [21:33:38] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 308 seconds [21:33:57] New patchset: Hashar; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [21:34:43] New review: Hashar; "Patch Set 3:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [21:35:05] Ryan_Lane: http://stackoverflow.com/a/167560 [21:35:47] AaronSchulz: ? [21:36:04] it was amusing, that's all :) [21:36:07] heh [21:36:09] yeah. 
it is [21:36:11] and I agree [21:36:22] New patchset: MaxSem; "Take pmtpa Solr servers out of rotation, increase load on master" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50056 [21:36:23] AaronSchulz: so, how about some swift docs, and some docs on the new way to use memcache ;) [21:36:23] I keep thinking about the lack of inline documentation [21:36:59] Ryan_Lane: what about it? [21:37:16] we have no docs on how to configure object storage [21:37:23] and no docs on how to configure memcache the new way [21:37:51] you mean the MW config or the server side config? [21:38:11] I'd be of limited use for that later [21:40:15] fyi, replication is temporarily stopped on db66 while it has been hijacked for wikidata, please ignore any nagios messages about it in this channel [21:41:17] !log ignore db66 errors, asher is working on it [21:41:19] Logged the message, RobH [21:42:06] New review: MaxSem; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50056 [21:42:08] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50056 [21:42:23] thanks robh [21:42:45] New patchset: Hashar; "beta: fill missing bits in lvs/cache configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [21:45:04] !log maxsem synchronized wmf-config/CommonSettings.php 'Take pmtpa Solr servers out of rotation, increase load on master: https://gerrit.wikimedia.org/r/#/c/50056/' [21:45:06] Logged the message, Master [21:46:50] exit [21:47:05] ok, disabling mouse changes active window [21:47:08] far too dangerous [21:47:27] rotfl [21:47:30] * RobH is glad that wasnt a host of other terminal commands [21:47:41] i just use ctrl-d [21:47:54] vs. 
`exit` [21:48:08] !log authdns-update for wtp1002-1004 [21:48:10] Logged the message, RobH [21:48:17] New patchset: Hashar; "beta: fill missing bits in lvs/cache configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [21:48:41] ns3 is dying i think... [21:49:14] wait, bleh [21:49:24] 0-2, not 1-3, all good. [21:49:48] New patchset: Hashar; "beta: fill missing bits in lvs/cache configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [21:50:19] New review: Matmarex; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50041 [21:51:38] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 203 seconds [21:52:05] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 213 seconds [21:55:47] New patchset: Hashar; "adapt role::cache::upload for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [22:00:38] New patchset: RobH; "wtp1002-1004 additions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50106 [22:02:36] New review: RobH; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50106 [22:02:47] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50106 [22:03:20] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [22:04:14] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [22:18:55] New patchset: JGonera; "Add photo upload schema for event logging" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50109 [22:24:01] !log importing wikidatawiki with schema changes into the inactive pmtpa side of the s5 shard, with replication of the import just in pmtpa. 
ignore repl warnings for db{35,44,45,55} [22:24:02] Logged the message, Master [22:35:58] !log importing wikidatawiki with schema changes into the s5 master, with sql_log_bin disabled [22:36:00] Logged the message, Master [22:40:19] grrrrr [22:40:24] wtp1002, why you gotta be a pain [22:40:45] err: Could not request certificate: Error 403 on SERVER: [22:41:01] thats a new one. [22:43:25] RobH: wild guess is that's mismatched puppet versions? [22:43:57] its a brand new install. [22:43:59] http://pastebin.com/fz5rFqMK [22:44:20] so now it doesnt even hit sockpuppet and ask for signing [22:44:25] paravoid: got any ideas? ^ [22:44:50] other stuff runs puppet fine, so its not a new puppet issue or something [22:45:17] i spun up new hosts today and they worked...wtf [22:45:22] hrmmmm, full pastebin makes me think not a version thing maybe [22:45:29] its very odd [22:45:34] and its new (to me) [22:45:53] wrong reverse dns [22:45:53] RobH: and puppetmaster has a log with more info maybe? [22:45:56] missing a final dot [22:45:57] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 182 seconds [22:46:10] reverse dns wrong for the host? [22:46:12] so it doesn't match the by domain apache acl [22:46:13] yes [22:46:19] blehhhh, thx [22:46:20] all three of them [22:46:27] wtp1002.eqiad.wmnet.32.64.10.in-addr.arpa. [22:46:29] etc. [22:46:30] yea, i did some vim sedding to throw them in lazy stype [22:46:34] my bad. [22:47:00] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 195 seconds [22:48:02] !log authdns-update [22:48:03] Logged the message, RobH [22:48:30] New patchset: Andrew Bogott; "Change-Id: If3b639726a667b4d70f59a1b3ec49a80961d1478" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50112 [22:49:17] New review: Andrew Bogott; "Patch Set 1: Code-Review-2" [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/50112 [22:49:47] Change abandoned: Andrew Bogott; "git mishap!" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/50112 [22:50:19] New patchset: Andrew Bogott; "Turn manage-volumes into a daemon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49916 [22:50:53] New patchset: Platonides; "Handling of packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50015 [22:50:59] New review: Andrew Bogott; "Patch Set 3: Code-Review-2" [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/49916 [23:12:04] New review: awjrichards; "Patch Set 1: Verified+1" [operations/mediawiki-config] (master); V: 1 - https://gerrit.wikimedia.org/r/50109 [23:17:36] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [23:17:45] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [23:27:16] Reedy: Are you working on OWA? [23:27:22] is anyone? [23:27:41] No [23:27:52] I saw some comments in scrollback. [23:28:02] Windows doesn't like * in filenames [23:28:18] Among other things. [23:28:20] So then cloning a git repo with *.wikimedia in it makes it unhappy [23:28:30] I think the project's deprecated [23:28:35] in favor of the current analytics plan [23:28:39] Indeed [23:28:42] Heh, plan. [23:28:43] It's been a couple of years at least [23:28:44] Such that it is. [23:28:44] but that was before me so I might be mistaken [23:28:53] OWA never really got off the ground. [23:29:00] It was a short fall from grace. [23:29:02] paravoid: Check with robla then strip the crap out of the repo :D [23:29:11] which repo? [23:29:23] reedy@fenari:/home/wikipedia/common$ ping owa1 [23:29:23] PING owa1.wikimedia.org (208.80.152.112) 56(84) bytes of data. [23:29:23] 64 bytes from owa1.wikimedia.org (208.80.152.112): icmp_req=1 ttl=63 time=0.214 ms [23:29:34] these were repurposed as swift test boxes [23:29:34] Heh. 
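RobH's earlier "Error 403" on wtp1002 came from PTR records written without a final dot. In a zone file, a name that does not end in "." is relative, and the zone origin is appended to it. "wtp1002.eqiad.wmnet" in the 32.64.10.in-addr.arpa zone therefore became "wtp1002.eqiad.wmnet.32.64.10.in-addr.arpa.", which no longer matched the puppetmaster's by-domain Apache ACL. A Python sketch of both steps follows. `qualify` follows the zone-file rule for relative names. `in_domain` is a hypothetical stand-in for the ACL match, and its "eqiad.wmnet" domain is assumed.

```python
def qualify(name, origin):
    """Expand a zone-file name relative to $ORIGIN.

    A name ending in "." is absolute and used as-is; "@" is the origin
    itself; anything else is relative and gets the origin appended.
    """
    if name == "@":
        return origin
    if name.endswith("."):
        return name
    return name + "." + origin


def in_domain(fqdn, domain):
    """True if fqdn is domain itself or lies under it (case-insensitive)."""
    fqdn = fqdn.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return fqdn == domain or fqdn.endswith("." + domain)
```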
[23:29:40] and I've given those back to RobH for repurposing [23:29:46] ahh [23:30:00] they'll probably be renamed as soon as they're repurposed [23:30:18] paravoid: https://gerrit.wikimedia.org/r/#/c/32924/ files/owa/owa-apache [23:30:21] commit 3bc4ae656f3caf33a6d714de7b117ea2b78b18d2 [23:30:21] Author: Faidon Liambotis [23:30:22] Date: Mon Jul 23 15:54:52 2012 +0300 [23:30:22] Remove OWA manifests [23:30:28] seems I missed those files [23:31:18] heh [23:31:24] Hmm [23:31:30] That *.wikimedia.org.crt doesn't seem to be referenced anywhere [23:31:41] * AaronSchulz hates that file [23:31:47] at least when he works from home [23:31:52] heh [23:32:50] New patchset: Faidon; "Ditch orphan OWA files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50119 [23:33:03] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 0 seconds [23:33:57] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds [23:40:12] New patchset: Reedy; "Rename *.wikimedia.org.crt to star.wikimedia.org.crt" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/32924 [23:41:05] paravoid: ^ Would you mind merging that one please? [23:41:12] seems owa was the only thing to use it.. [23:41:22] and then it was star.wikimedia in the file [23:41:36] us dual os users would appreciate [23:42:29] yes! [23:43:49] after all, committing from Windows is easier, you don't need to run puppet parser validate (cuz it won't work anyway:P) [23:44:59] New review: Reedy; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50119 [23:45:44] New review: Hashar; "Patch Set 1:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50041 [23:48:16] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [23:48:22] RECOVERY - MySQL disk space on neon is OK: DISK OK
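Change 32924 renames "*.wikimedia.org.crt" to "star.wikimedia.org.crt" because Windows does not allow "*" in filenames, which breaks cloning operations/puppet there. The Python sketch below flags repository paths that a Windows checkout would reject. It checks the characters Windows forbids in names, the reserved device names (CON, PRN, AUX, NUL, COM1-9, LPT1-9, with or without an extension), and names ending in a space or period. The function name is hypothetical, and paths are assumed to use "/" as the separator, as git does.

```python
# Characters Windows does not allow in file or directory names.
INVALID_CHARS = set('<>:"\\|?*') | {chr(c) for c in range(32)}

# Device names that are reserved with or without an extension.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_problems(path):
    """Return reasons why a '/'-separated repo path can't be checked out on Windows."""
    problems = []
    for part in path.split("/"):
        bad = sorted(set(part) & INVALID_CHARS)
        if bad:
            problems.append(f"{part!r}: invalid characters {''.join(bad)!r}")
        if part.split(".")[0].upper() in RESERVED:
            problems.append(f"{part!r}: reserved device name")
        if part.endswith((" ", ".")):
            problems.append(f"{part!r}: trailing space or period")
    return problems
```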