[10:14:08] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[12:56:11] <_joe_> jijiki: do you plan to reimage more servers today?
[12:56:44] we have reimaged all of them
[12:57:14] why, what's up?
[12:58:00] <_joe_> I want to upload the scap package
[12:58:05] <_joe_> and you know the drill
[12:58:12] <_joe_> I won't upgrade it now
[12:58:17] <_joe_> in puppet
[12:58:50] <_joe_> actually, lemme fix this damn thing.
[12:58:54] lol
[13:20:35] <_joe_> jijiki: see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/493404 and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/493317 (the latest version)
[13:21:07] <_joe_> moritzm as well, do you concur we'd be better off managing scap upgrades with our tools?
[13:21:26] <_joe_> e.g. debmonitor/debdeploy/cumin?
[13:21:31] _joe_: I was thinking 'latest'
[13:21:35] <_joe_> noooo
[13:21:37] <_joe_> never do that
[13:21:40] I know
[13:21:43] but just for this
[13:21:45] <_joe_> that will install the package fleet-wide
[13:21:48] <_joe_> why?
[13:21:57] I thought that is what we wanted
[13:21:59] <_joe_> the point is not having puppet manage the package version
[13:22:09] <_joe_> define "we"
[13:22:18] <_joe_> it surely isn't what *I* want
[13:22:22] lol lol
[13:22:38] ok, I was under the impression that the purpose of having the version there
[13:22:49] <_joe_> and as far as installing it in production, I think we (as in SREs) have the final say
[13:22:49] let me check something, hang on
[13:23:36] <_joe_> jijiki: the purpose is to use puppet as a poor man's deployment method, which is almost never a good idea in a production environment
[13:24:23] _joe_: definitely, it's really straightforward
[13:24:40] on the one hand, since this is our own package, we control when a new version is released
[13:25:04] I don't find it a horrible idea to always have the latest version; this is what we do now anyway
[13:25:09] <_joe_> sorry, lunch time
[13:25:35] on the other hand, it is a pain to roll back, given we will need to run puppet fleet-wide or use cumin to downgrade
[13:29:24] with debdeploy you can also roll back easily
[13:31:11] that is true
[13:31:38] I simply find 'latest' (for this specific package only) not as horrible as you do
[13:32:20] in 99% of cases it is obviously a terrible idea :)
[13:37:55] <_joe_> I think it reproduces the current issue
[13:38:08] <_joe_> you will let puppet decide when it is upgraded
[13:38:24] <_joe_> so you can't upload it, test it on N hosts, upload fleet-wide
[13:38:38] <_joe_> err install it fleet-wide
[13:39:05] <_joe_> if you rely on puppet, it's either going to be multiple commits or there is no way to manage it properly
[13:40:25] yeah, that's also what is done for all other hosts: first upgrade the Cumin alias that serves as the canary, then upgrade the rest
[13:41:09] latest is a landmine, we only use it for tzdata at this point
[13:41:40] I see why what you are proposing is better
[13:42:00] I am just arguing that managing scap's version in puppet doesn't seem this horrible to me
[14:01:40] serviceops, Operations, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jijiki)
[15:50:44] akosiaris: godog suggested I talk to you about logging requirements for stuff running in k8s containers
[15:50:56] will it be convention to just log to stdout?
[15:51:17] i.e. let it go to syslog via systemd -> journald?
[15:51:53] yup, just log to stdout
[15:52:23] OK, so we should add the severity to the JSON messages
[15:52:33] have we standardized on that attribute name?
[15:52:34] this will then be handled by kubernetes and logs will be visible in logstash and kubectl logs
[15:52:55] and yes please add severity.
[15:53:00] isn't it "level"?
[15:53:06] godog: ^ ?
[15:53:20] I dunno, that seems familiar and entirely reasonable, just making sure :)
[15:54:18] yeah just checked on logstash, it's "level"
[15:54:27] k, thanks!
[15:55:09] values are the usual DEBUG, INFO, WARN, ERROR etc
[15:55:21] we let systemd slap the timestamp on, right?
[15:55:55] there is no systemd in the equation actually
[15:56:05] oic
[15:56:18] but there is a syslog, but I think it's probably better it doesn't do it
[15:56:34] makes the rsyslog rules just a tad simpler
[15:56:48] wait... doesn't do what?
[15:56:50] I'll have to make sure however that mmkubernetes doesn't add it
[15:56:59] rsyslog adding the timestamp
[15:57:17] so we *should* output a timestamp?
[15:58:02] that's OK on my end, but I'd have thought it'd potentially complicate things on the k8s end
[15:58:20] like opening the possibility of format inconsistencies
[15:58:23] yeah add it. I see service-runner apps do add a "time": "2018-12-15T13:20:47.070Z",
[15:58:33] so best to keep consistent
[15:58:38] oooh... as an attribute of the message
[15:59:11] indeed for our services the attribute name is 'level'
[16:00:02] re: timestamp also yes, looks like it'll need to be added by the service
[16:00:44] so JSON-only (no @cee cookie), timestamp attributes (named `time`), and severity/level (named `level`)
[16:02:27] re timestamps vs datetime vs iso 8601: https://phabricator.wikimedia.org/T212529
[16:02:28] :)
[16:02:52] this convention is mostly for the analytics world, but it'd be nice if we could adopt it elsewhere:
[16:03:01] dt are iso-8601 date time strings
[16:03:07] ts are unix epoch timestamps
[16:04:19] i guess 'time' is already there for service-runner so oh well
[16:06:31] godog: if we end up running this stand-alone (running kask in k8s isn't a certainty yet, that I know of), what would we do logging-wise?
[16:06:42] ottomata: thanks! subscribed, also +1
[16:06:42] journald via systemd?
[16:07:06] urandom: that'd be my recommendation yes
[16:07:33] godog: and would we need the "@cee" cookie then?
[16:07:53] I'm fishing for an LCD if there is one
[16:10:02] urandom: yeah
[16:10:23] LCD I believe would be to add @cee: when logging to syslog, but otherwise emit the same json
[16:10:38] that seems the simplest to me, even if not ideal
[16:10:41] akosiaris: can I run an ab load test on the prod eventgate-analytics discovery endpoint?
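The convention settled on at 16:00:44 can be illustrated with a minimal Python sketch. It assumes JSON only, a `time` attribute in ISO 8601 UTC with milliseconds (as in the service-runner example above), and a `level` attribute with the usual DEBUG/INFO/WARN/ERROR values. The function names, the `msg` message key, and the `cee` flag are hypothetical, not an agreed API. The `cee` flag only models godog's suggested lowest common denominator of adding `@cee:` when logging straight to syslog.

```python
import json
import sys
from datetime import datetime, timezone


def format_record(level, message, now=None, cee=False, **fields):
    """Build one JSON log line carrying `time` and `level` attributes.

    Hypothetical helper: the `msg` key and the `cee` flag are assumptions.
    `now`, if given, should be a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    # ISO 8601 in UTC with millisecond precision, e.g. 2018-12-15T13:20:47.070Z
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S") + f".{now.microsecond // 1000:03d}Z"
    # Extra structured attributes first, so the reserved keys always win.
    record = {**fields, "time": stamp, "level": level, "msg": message}
    line = json.dumps(record)
    # Prefix the CEE cookie only for the direct-to-syslog case.
    return "@cee: " + line if cee else line


def log(level, message, **fields):
    """Write one JSON line to stdout (no cookie) for the container runtime to collect."""
    sys.stdout.write(format_record(level, message, **fields) + "\n")
    sys.stdout.flush()
```

A call such as `log("INFO", "service started", port=8080)` (the `port` attribute is only an example) writes a single line with `time`, `level`, and `msg` to stdout. The service generates the timestamp itself rather than relying on rsyslog or mmkubernetes to add one, which is what the 16:00:02 message concludes.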
[16:11:02] godog: well I'm looking to simply eliminate the direct-to-syslog vector and put everything on stdout
[16:11:19] unless there is a compelling reason to make it possible to do both
[16:11:39] so was looking for a possible LCD for the output of stdout
[16:12:07] or input of stdout, output sent to stdout, w/e
[16:12:32] ottomata: sure
[16:13:27] ottomata: I'm happy to provisionally make it `dt` if you're planning to raise this more broadly (read: not ad hoc)
[16:13:54] ottomata: otherwise: https://xkcd.com/927/
[16:13:58] urandom: it def isn't ad hoc, but it hasn't been something we're standardizing outside of strongly schemaed worlds, e.g. logging
[16:14:09] but if we can, the more consistent we are everywhere the better
[16:14:09] serviceops, Citoid: JSTOR is blocking citoid IPs - https://phabricator.wikimedia.org/T216456 (Samwalton9) @Mvolz Do you already have a contact at JSTOR to help with this? If not we'd be happy to help!
[16:14:20] heheh exactly urandom
[16:14:21] but also
[16:14:29] ottomata: I mean, you want `dt` to be The One, but other services are already using `time`
[16:14:32] https://xkcd.com/1179/
[16:14:48] urandom: that task is a subtask of https://phabricator.wikimedia.org/T214093
[16:14:51] so you want me to break w/ convention elsewhere and use yours
[16:15:02] we weren't really going to take on general purpose logging stuff with that task
[16:15:14] i'd like it if we were consistent everywhere
[16:15:28] maybe ask marko and petr and see what they think about using dt in logs
[16:15:31] in service runner
[16:15:33] heh
[16:16:06] ottomata: see... that was what I was aiming to avoid
[16:16:39] ottomata: getting distracted by an attribute naming crusade when I just want to add a timestamp to some JSON
[16:16:48] ottomata: I figured maybe you wanted to
[16:18:34] ottomata: I admire and respect your attempt at enlisting me though :)
[16:20:20] hehh, ya but these things matter you know! i say if you don't want to bother, don't bother here, but if it is easy and uncontroversial (petr and marko are cool with it) then do it
[16:24:24] urandom: oh ok I see what you are saying now, essentially having rsyslog DTRT with json payloads even if they don't have @cee
[16:25:43] godog: yeah, is there a way of logging to stdout that would satisfy both use-cases?
[16:30:16] urandom: got it, I'm a bit in the middle of two other things ATM but there should be, yeah; namely I believe either always emit the cookie or never emit it, if that makes sense, although tomorrow I'll probably have more time to think about this
[16:30:42] meeeeeting
[16:30:45] mark: akosiaris
[16:31:25] mark isn't going to be around
[16:31:30] other meetings
[16:34:13] godog: no worries; thanks
[18:48:33] akosiaris: unless it's not OK with you, i'm going to merge https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/493444/ as is. i'd rather they always be set, even for dev.
[18:54:22] serviceops, Citoid: JSTOR is blocking citoid IPs - https://phabricator.wikimedia.org/T216456 (Mvolz) >>! In T216456#4991319, @Samwalton9 wrote: > @Mvolz Do you already have a contact at JSTOR to help with this? If not we'd be happy to help! I don't, that'd be great!
[19:25:38] serviceops, Operations, ops-eqiad, HHVM: mw1272 crashed: Bad page map in process hhvm - https://phabricator.wikimedia.org/T211668 (Cmjohnson) Received the parts, replaced CPU2 and DIMM B1 and cleared the log. Return shipping info: USPS 9202 3946 2441 1124 14 FEDEX 9611918 2393026 77862432
[22:01:43] serviceops, MediaWiki-Cache, Operations, Core Platform Team (Security, stability, performance and scalability (TEC1)), and 5 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (aaron) ChronologyProtector positions should be applie...
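The open question at 16:24 to 16:30 is whether one stdout format can serve both the k8s path (no cookie) and a stand-alone journald/syslog path. Godog suggests either always or never emitting the cookie, or having rsyslog "DTRT" with JSON payloads even without `@cee`. The consumer-side idea can be sketched in Python as a parser that accepts a JSON line with or without a leading `@cee:` cookie. The helper name is hypothetical, and this is an illustration of the idea only, not how rsyslog's mmjsonparse module is implemented or configured.

```python
import json

CEE_COOKIE = "@cee:"


def parse_log_line(line):
    """Decode a JSON log line whether or not it carries a leading @cee: cookie.

    Hypothetical helper. Returns the decoded object, or None for lines that
    are not JSON, leaving plain-text lines to other handling rules.
    """
    payload = line.strip()
    if payload.startswith(CEE_COOKIE):
        # Drop the cookie and any whitespace that follows it.
        payload = payload[len(CEE_COOKIE):].lstrip()
    if not payload.startswith("{"):
        return None
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None
```

If the collector tolerates both forms like this, a service can write the same cookie-less JSON to stdout everywhere. That corresponds to the "never emit it" option from the 16:30:16 message.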