[00:55:35] FIRING: [2x] DiskSpace: Disk space mwlog1003:9100:/srv 3.655% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[01:15:35] RESOLVED: DiskSpace: Disk space mwlog1003:9100:/srv 3.894% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=mwlog1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[17:40:45] hello! I'm seeing errors in logstash (OpenSearch dashboards) saying "1 of 616 shards failed"
[17:41:10] the error seems to be: "type": "illegal_argument_exception", "reason": "No field found for [confirmation_delay_seconds.keyword] in mapping"
[17:54:02] dbrant: can you link the dashboard with the problem to me?
[18:05:11] cwhite: it seems to happen for many of them, but for example: https://logstash.wikimedia.org/app/dashboards#/view/1b5adc90-016e-11e8-bc95-517a9b9d585c?_g=h@8130aac&_a=h@be59f90
[18:09:18] I see what happened. There was a scripted field added to the index pattern. I'll clean that up.
[18:10:01] dbrant: dashboard looks to be behaving better now. confirm?
[18:15:35] cwhite: nice, looks better!
[18:16:03] \o/ Thanks for the report!
[18:33:19] thanks for the quick response!
[18:34:00] got bandwidth to do a scap of an arc-lamp change? (https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/1241204) alternately, I can do it if you don't mind.
[18:47:50] cwhite: ^
[19:16:06] ori: done :)
[19:55:10] oh thanks!
[21:21:25] FIRING: SystemdUnitFailed: arclamp_generate_svgs.service on arclamp1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:27:54] ^^ Somehow the owner of the svgs directory was changed from xenon to root on arclamp1001. I chown'd them back so that alert should resolve.
[21:36:25] RESOLVED: SystemdUnitFailed: arclamp_generate_svgs.service on arclamp1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed