[01:03:45] @seen Neha16
[01:03:45] zhuyifei1999_: Last time I saw Neha16 they were quitting the network with reason: Quit: Connection closed for inactivity N/A at 2/27/2018 10:34:07 PM (67d2h29m38s ago)
[02:05:32] legoktm: I think that you have to set the `"index": "not_analyzed"` flag on each property you are explicitly setting as text. The dynamic_template settings only affect fields that are implicitly defined
[02:07:05] bd808: so I tried that for the console field (https://phabricator.wikimedia.org/R2338:afcac0dd1b3613ffe0d08d151fa7af464bb1dbf5), deleted and recreated the index, but when I look at the mapping (http://tools-elastic-01.tools.eqiad.wmflabs/flaky-ci/_mapping/jenkins-job) it doesn't show the not_analyzed
[02:12:05] legoktm: I'm not sure why, but it did not work. You can see that with -- curl -XGET http://tools-elastic-01.tools.eqiad.wmflabs/flaky-ci/_analyze?pretty -H 'Content-Type: application/json' -d'{"field":"console", "text": "foo bar"}'
[02:12:23] it is parsing that using the normal word-break analyzer
[02:12:41] hmm
[02:13:52] https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_mapping_changes.html
[02:13:56] > The string field datatype has been replaced by the text field for full text analyzed content, and the keyword field for not-analyzed exact string values
[02:14:05] maybe if I set it to keyword it won't analyze?
[02:14:59] heh
[02:15:00] elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception', 'Document contains at least one immense term in field="console" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: \'[83, 116, 97, 114, 116, 101, 100, 32, 98, 121, 32, 117, 115, 101, 114, 32, 97, 110, 111, 110, 121, 109, 111,
[02:15:00] 117, 115, 10, 66, 117, 105, 108]...\', original message: bytes can be at most 32766 in length; got 169967')
[02:16:53] I have never really played with regex searches in elasticsearch. Mostly what I remember about them is that manybubbles did a bunch of work to keep them from crashing prod
[02:18:00] regex was my ultimate goal, but right now I think I can satisfy my use case with just exact substring matches
[02:22:50] I'm gonna spend another hour or two on this tonight, but then I think I'll implement it in mysql
[02:24:09] legoktm: dump the files in a directory and exec grep on them ;)
[02:24:28] this is not going to need to scale to giant workloads
[02:26:47] yeaaah, that would be pretty easy to do
[02:27:15] at least I got to learn a bit about elasticsearch ^.^
[04:06:12] !log toolsbeta locally patched `/usr/lib/python2.7/dist-packages/toollabs/common/tool.py` on bastion and webgrid-lighttpd
[04:06:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
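
A minimal sketch of the keyword mapping discussed around 02:14, assuming the flaky-ci index and jenkins-job type from the log. The 02:15 error is Lucene's 32766-byte single-term limit; an ignore_above clause sidesteps it by skipping oversize values at index time instead of rejecting the document:

    # ES 5.x replaced string/not_analyzed with the keyword datatype, so
    # console is mapped as keyword here. Values longer than ignore_above
    # are left unindexed (they stay in _source) rather than tripping the
    # 32766-byte term limit; 8191 is the safe character count since a
    # UTF-8 character can occupy up to 4 bytes.
    curl -XDELETE http://tools-elastic-01.tools.eqiad.wmflabs/flaky-ci
    curl -XPUT http://tools-elastic-01.tools.eqiad.wmflabs/flaky-ci \
      -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "jenkins-job": {
          "properties": {
            "console": {"type": "keyword", "ignore_above": 8191}
          }
        }
      }
    }'

Note the trade-off: with this mapping the 169967-byte console output from the 02:15 error would simply never be indexed, so keyword only really suits short exact-match fields, which is consistent with the keyword route being abandoned here.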
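
On the query side, one way to approximate the "exact substring matches" mentioned at 02:18 against an analyzed text field is a phrase query; this is a sketch, not what was actually deployed, and it matches a sequence of analyzed tokens rather than an arbitrary byte substring:

    # Phrase query against an analyzed text field: finds documents whose
    # console output contains the tokens "foo" and "bar" adjacent and in
    # order. Token-based, so word fragments will not match.
    curl -XGET http://tools-elastic-01.tools.eqiad.wmflabs/flaky-ci/jenkins-job/_search?pretty \
      -H 'Content-Type: application/json' \
      -d'{"query": {"match_phrase": {"console": "foo bar"}}}'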
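
The grep suggestion at 02:24:09 is about as simple as it sounds; a sketch assuming one console log per file, with a hypothetical directory path:

    # -F treats the pattern as a fixed string (no regex), -l prints only
    # the names of matching files, -r recurses through the directory
    # (path is hypothetical).
    grep -rlF 'exact substring to find' /data/project/flaky-ci/consoles/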
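
And the MySQL fallback from 02:22:50 reduces to a LIKE query; table, column, and database names here are hypothetical:

    # A leading-wildcard LIKE cannot use an index, but per 02:24:28 this
    # does not need to scale to giant workloads.
    mysql -e "SELECT job_name FROM console_log WHERE console LIKE '%exact substring%'" flaky_ci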