[05:05:02] upstream connect error or disconnect/reset before headers. reset reason: overflow
[05:05:02] ?
[05:06:48] yeah, it broke
[05:09:02] Grafana's down too, ATS 502
[05:09:36] TimStarling: Houston, we have a problem
[05:11:23] looks like SRE just got paged
[05:13:02] all wikis down for me
[05:13:35] yep, it's been noted above ^
[05:18:21] 504 errors while connecting to otrs-wiki
[05:18:32] a known issue?
[05:18:35] yes
[05:18:39] ankry: see topic
[05:22:08] Ah, you are already aware of it, I see.
[05:22:32] Yes, known issue, operations is working on it
[05:23:10] * Frogging101 thumbs up
[05:30:56] Gerrit isn't served through eqiad?
[05:36:41] Everything should be back right now
[05:36:43] ye working now
[05:36:45] yep
[05:36:54] if you are still encountering any issue let us know please
[05:37:10] big thanks to volans
[05:38:21] thanks, but it's a team effort, quite a few people from SRE working on it ;)
[05:38:27] Out of curiosity: what caused this outage?
[05:38:49] It seems disk full? not sure
[05:38:56] no
[05:40:00] rxy: I really doubt disk space is the problem
[05:40:19] they tend to provision disk space ahead of time, and it's unlikely so many would've run out of space at the exact same moment
[05:40:24] that's something there's all kind of alerts on, and can usually be seen coming a mile off
[05:40:34] and also doesn't usually take down an entire cluster
[10:59:34] hi, happy easter to you technical people.. hopefully some of you have a different sunday than fixing printers and proprietary ebook readers ;)
[11:00:56] Gryllida: if you've been fixing printers and ebook readers today, i feel your pain :/
[11:01:42] it was ok, they both started working and i avoided some greater evils :-)
[13:21:19] Gryllida, ouch that sounds all sorts of non-fun
[18:38:21] * Zppix shows Gryllida how to operate a sledgehammer
[19:16:44] Would be cool if Wikimedia had a simple 'visitor' dashboard (grafana or static) for those 'touring' with all the fancy impressive numbers, like average read/writes, total RAM/storage between all clusters.. etc
[19:18:27] something that could impress the 'every-person' with basic computer knowledge :)
[19:56:02] alystair: grafana.wikimedia.org is public
[20:03:36] Grafana takes at least a bit of knowledge to interpret though
[20:04:00] and is quite easy to Dunning-Kruger yourself with
[20:08:41] that's good feedback alystair
[20:08:55] we've had things like https://phabricator.wikimedia.org/T178690#3890148 for a while now
[20:09:11] this isn't exactly what you're asking for, but it relates to it a bit
[20:13:34] https://grafana.wikimedia.org/d/000000605/datacenter-global-overview?orgId=1&var-datasource=eqiad%20prometheus%2Fglobal&var-site=All has a lot of those metrics
[20:59:05] Yeah I took a while to explore the grafana site before saying anything. Definitely not for the non-technical. I'd suggest the following: Total # (worldwide, not single cluster): Machines, CPUs, RAM (TB), storage (TB). Articles edited/created (graphs, already done in 'activity' panel). Avg read/write response times (ms, could cheese it varnish for reads, db for writes?), Traffic in/out
[20:59:05] (Gbps)
[20:59:54] Would be handy for the WM execs for presentations - could get their input
[23:49:35] is there any script or plugin that saves drafts in the middle of editing in case the user's computer crashes?
[23:50:20] or a phab ticket requesting to add this functionality to be on by default, by saving incomplete edits to one's personal sandbox every few minutes?
[23:55:03] Gryllida: I don't think so, at least not that I'm aware of...
[23:56:08] Gryllida: I would recommend typing it in a program off-wiki that autosaves if you're worried about that
[23:56:23] i'm concerned about end users, have a few people who asked in the last year
[23:56:29] some of them lost their work :(
[23:57:43] Gryllida: have you tried looking up userscripts on english wikipedia?
[23:58:55] yea don't see it at https://en.wikipedia.org/wiki/Wikipedia:User_scripts/List