[00:03:24] musikanimal: is it "normal" for xtools to have 119 lighttpd-xtools jobs running?
[00:03:28] How is that even possible...
[00:04:42] this has to be the webservice watcher going nuts
[00:06:48] (Draft2) Quiddity: Add link to source for our cdnjs. [labs/tools/cdnjs-index] - https://gerrit.wikimedia.org/r/377930
[00:08:05] RECOVERY - Puppet errors on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:08:43] (CR) Quiddity: "Note: I tried to put the link in the top-right corner, by the Upstream link, but CSS horrors ensued (linebreaks everywhere). Anyway, this" [labs/tools/cdnjs-index] - https://gerrit.wikimedia.org/r/377930 (owner: Quiddity)
[00:10:16] !log tools.xtools Killed 119 running webservice jobs with `qstat -q webgrid-lighttpd | awk '{ print $1;}' | xargs -L1 qdel`
[00:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.xtools/SAL
[00:30:46] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:44:52] Hi. Is quarry being so broken a known issue?
[00:45:14] http://tools.wmflabs.org/quarry/login?next=/quarry/ gives a 502 Bad Gateway error.
[00:45:20] http://tools.wmflabs.org/quarry/query/runs/all is empty, etc.
[00:53:41] bd808: hmm I dunno? The legacy xtools .lightbox.conf is pretty crazy, set up by Hedonil
[00:55:02] I recently made the legacy articleinfo redirect to the new one, don't think that's it though?
[00:55:34] Err bad autocorrection, I said lighttpd.conf
[01:07:20] Esther, you want https://quarry.wmflabs.org/ not http://tools.wmflabs.org/quarry/
[01:07:54] (where was the latter linked from? so that I/we can link-fix..)
[01:13:03] I can't see anything with a search for insource:"tools.wmflabs.org/quarry/" at mw/meta/en/wikitech. Did you reach it from browser-history-autocomplete or google?
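Note on the cleanup command logged at 00:10:16: the `awk '{ print $1;}'` step forwards the first field of every line, so any header or separator rows in the `qstat` output would also be handed to `qdel`. The following is a minimal sketch of a stricter filter, assuming the usual Grid Engine layout (a header row, a dashed separator, then one row per job with a numeric ID first). The function name and the sample text are hypothetical and not taken from the log.

```python
def job_ids(qstat_output: str) -> list[str]:
    """Return the job IDs found in qstat-style text, skipping non-job rows."""
    ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; header and separator rows do not.
        if fields and fields[0].isdigit():
            ids.append(fields[0])
    return ids


# Hypothetical sample, shaped like `qstat -q webgrid-lighttpd` output.
SAMPLE = """\
job-ID  prior   name       user         state
----------------------------------------------
1234567 0.30000 lighttpd-x tools.xtools r
1234568 0.30000 lighttpd-x tools.xtools r
"""

if __name__ == "__main__":
    # Each printed ID could then be passed to `qdel`.
    for job_id in job_ids(SAMPLE):
        print(job_id)
```

Filtering on a numeric first field keeps the sketch independent of how many header lines a given scheduler version prints.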
[01:17:05] PROBLEM - exim queue length on tools-mail is CRITICAL: CRITICAL: 4.08% of data above the critical threshold [2.0]
[01:17:58] Quarry: Redirect Toolforge Quarry page to Cloud VPS Quarry - https://phabricator.wikimedia.org/T175881#3606587 (Quiddity)
[01:51:06] https://wikitech.wikimedia.org/wiki/Help:LAMP_instances recommends installing MariaDB using apt-get, is that still recommended or is there a puppet class that should be used instead?
[01:54:42] Quarry: Redirect Toolforge Quarry page to Cloud VPS Quarry - https://phabricator.wikimedia.org/T175881#3606587 (zhuyifei1999) From [[https://gerrit.wikimedia.org/r/#/c/304764/|Gerrit change 304764]], I *think* this is Yuvi's attempt to move Quarry to Toolforge. There are quite some benefits, such as better r...
[01:59:15] Quarry, Toolforge: Redirect Toolforge Quarry page to Cloud VPS Quarry - https://phabricator.wikimedia.org/T175881#3606633 (Quiddity)
[02:18:51] quiddity: Oh! Thank you.
[02:26:54] Quarry, Toolforge: Redirect Toolforge Quarry page to Cloud VPS Quarry - https://phabricator.wikimedia.org/T175881#3606639 (MZMcBride) At minimum, putting a big note at the top of seems reasonable. Better would be an HTTP 301 or 302 to .
[02:46:16] !log suggestbot installed MariaDB on suggestbot-prod, created database and user, restricted to connecting from localhost
[02:46:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[02:58:11] Nettrom: that LAMP page is "unofficial" at best. There is puppet code for MySQL/MariaDB, but I don't know if they are easily used on a random Cloud VPS instance
[02:59:20] bd808: I saw there’s a bunch of MariaDB puppet classes, but I have no idea if any of them is a good general one to use. So far my needs are simple and I can live with some downtime, so I used apt-get and document what I’ve done.
[03:03:15] *nod* It would probably be a good hackathon project to sort out some common components and make them easier to setup on our VMs
[03:04:34] I use MediaWiki-Vagrant on most of the VMs that I run. Mostly because I wrote a fair part of the Puppet code there and know how it works :)
[03:05:57] I have a crazy wishlist task to come up with a new and better way of letting people write their own Puppet code. Someday I'll find time (or someone else) to work on it
[03:10:39] !log suggestbot apt-get installed python-pip, python-dev, and build-essentials, then installed virtualenv globally via pip
[03:10:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[03:59:46] PROBLEM - Puppet errors on tools-exec-1412 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[04:19:45] RECOVERY - Puppet errors on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:25:51] quiddity: I typed in the URL by hand.
[04:26:16] I don't think it was always at quarry.wmflabs.org? But maybe I'm misremembering.
[04:49:40] PROBLEM - Puppet errors on tools-exec-1415 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:04:32] Cloud-VPS, Toolforge, Operations: Toolforge's static websever broken by Puppet changes and stale nginx packages - https://phabricator.wikimedia.org/T175885#3606725 (bd808)
[05:05:00] Cloud-VPS, Toolforge, Operations: Toolforge's static websever broken by Puppet changes and stale nginx packages - https://phabricator.wikimedia.org/T175885#3606739 (bd808) p:Triage>Normal
[05:05:22] Cloud-VPS, Toolforge, cloud-services-team (Kanban), Operations: Toolforge's static websever broken by Puppet changes and stale nginx packages - https://phabricator.wikimedia.org/T175885#3606725 (bd808)
[05:09:03] Cloud-Services, Patch-For-Review: Update nginx on tools and labs proxies and static file server - https://phabricator.wikimedia.org/T134383#3606747 (bd808)
[05:09:06] Cloud-Services, Patch-For-Review: Update static tools host to be jessie - https://phabricator.wikimedia.org/T139743#3606744 (bd808) Open>Resolved a:madhuvishy Done by @madhuvishy quite a while ago. ``` tools-static-10.tools:/etc/nginx root$ lsb_release -a No LSB modules are available. Distri...
[05:10:43] Cloud-Services, Patch-For-Review, User-bd808: Update nginx on tools and labs proxies and static file server - https://phabricator.wikimedia.org/T134383#3606752 (bd808) Open>Resolved a:bd808 ``` mbp01:~/projects/wmf/operations/puppet (git production) bd808$ git grep -l spdy modules/tlspro...
[05:13:26] Toolforge: Delete unused tools wlm-jury-at, wlm-jury-yarl - https://phabricator.wikimedia.org/T172590#3606755 (bd808)
[05:14:18] Toolforge: Delete unused tools wlm-jury-at, wlm-jury-yarl - https://phabricator.wikimedia.org/T172590#3503164 (bd808)
[05:14:20] Toolforge, Tracking: Tools that should get deleted (tracking) - https://phabricator.wikimedia.org/T133777#3606756 (bd808)
[05:17:05] cloud-services-team (FY2017-18), Goal: Program 10 Outcome 3: Outreach - https://phabricator.wikimedia.org/T166406#3606767 (bd808)
[05:17:07] Cloud-Services, Wikimania-Hackathon-2017: History & Purpose of Wikimedia Cloud Services @ Wikimania Hackathon - https://phabricator.wikimedia.org/T170833#3606765 (bd808) Open>Resolved a:madhuvishy
[05:18:09] Toolforge: Run non-interactive commands on labs kubernetes webservices - https://phabricator.wikimedia.org/T169695#3606768 (bd808)
[05:19:17] Toolforge, Outreachy (Round-15): Improvements for the Toolforge 'webservice' command - https://phabricator.wikimedia.org/T175768#3606769 (bd808)
[05:20:13] Toolforge, Kubernetes: newer npm for nodejs Kubernetes instances - https://phabricator.wikimedia.org/T169451#3398652 (bd808)
[05:23:01] Cloud-VPS, Toolforge, cloud-services-team (Kanban), Operations: Toolforge's static websever broken by Puppet changes and stale nginx packages - https://phabricator.wikimedia.org/T175885#3606778 (bd808) Related: {T169247}
[05:24:42] RECOVERY - Puppet errors on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:47:07] Cloud-VPS, Toolforge, cloud-services-team (Kanban), Operations: Toolforge's static webserver broken by Puppet changes and stale nginx packages - https://phabricator.wikimedia.org/T175885#3606823 (Quiddity)
[06:11:34] bd808: around?
[06:35:55] PROBLEM - Puppet errors on tools-exec-1414 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:09:49] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3606926 (Fae) @dschwen is away, and has been for a long time. ZoomViewer should be migrated to being WMF supported. Without it, Commons is **not** a suitable platform for high resolution images which are now the norm for digit...
[07:15:57] RECOVERY - Puppet errors on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:40:56] Tools: Need easier tool for working on redundancy than "Inhalte übernommen" Template Tool (german WP) - https://phabricator.wikimedia.org/T175698#3600535 (PerfektesChaos) Template description has been improved as far as possible [[https://de.wikipedia.org/wiki/Spezial:Diff/169061522/169065774 | (agreed)]]....
[08:30:24] (CR) Jean-Frédéric: [C: 2] Include tools directory in php linting [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377921 (owner: Lokal Profil)
[08:30:48] (CR) Jean-Frédéric: [C: 2] "Thanks! I had completely forgotten about phpcs.xml :)" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377921 (owner: Lokal Profil)
[08:32:05] (Merged) jenkins-bot: Include tools directory in php linting [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377921 (owner: Lokal Profil)
[08:32:51] (CR) jenkins-bot: Include tools directory in php linting [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377921 (owner: Lokal Profil)
[08:34:00] (CR) Jean-Frédéric: [C: 2] "Thanks for hunting that down!"
[labs/tools/heritage] - https://gerrit.wikimedia.org/r/377927 (https://phabricator.wikimedia.org/T175839) (owner: Lokal Profil)
[08:34:51] (Merged) jenkins-bot: Ensure all added categories are pywikibot.Category objects [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377927 (https://phabricator.wikimedia.org/T175839) (owner: Lokal Profil)
[08:35:41] (CR) jenkins-bot: Ensure all added categories are pywikibot.Category objects [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377927 (https://phabricator.wikimedia.org/T175839) (owner: Lokal Profil)
[08:37:52] !log tools.heritage Deploy latest from Git master: c5b8ffb, 837707f, 5799d26 (T174261), d330733 (T174340)
[08:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[08:37:56] T174261: Add Egypt in Arabic to monuments database - https://phabricator.wikimedia.org/T174261
[08:37:56] T174340: Add Iraq in Arabic to monuments database - https://phabricator.wikimedia.org/T174340
[09:35:59] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:36:53] !log tools.heritage Deploy latest from Git master: d2aa019, 0766491, d73eb9e (T174871), 5b00f0b, f8ff2a6 (T175839)
[09:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[09:36:58] T175839: ErfgoedBot categorisation fails with 'unicode' object has no attribute 'isHiddenCategory' - https://phabricator.wikimedia.org/T175839
[09:36:58] T174871: Solve ErfgoedBot categorisation problem - https://phabricator.wikimedia.org/T174871
[09:50:51] Toolforge, Huggle: https://huggle.wmflabs.org gives ERR_NAME_NOT_RESOLVED - https://phabricator.wikimedia.org/T175901#3607269 (MarcoAurelio)
[10:15:59] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:34:52] (CR) Lokal Profil: "> Thanks!
I had completely forgotten about phpcs.xml :)" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377921 (owner: Lokal Profil)
[10:54:05] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[11:10:07] (PS1) Jean-Frédéric: Add Bash linting using bashate [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378006
[11:19:56] (PS2) Jean-Frédéric: Add Bash linting using bashate [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378006 (https://phabricator.wikimedia.org/T175906)
[11:21:41] (PS3) Jean-Frédéric: Add Bash linting using bashate [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378006 (https://phabricator.wikimedia.org/T175906)
[11:24:02] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:27:52] (CR) Jean-Frédéric: "> We should probably bump this to use ESlint at some point (and maybe" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/377921 (owner: Lokal Profil)
[11:29:26] (PS1) Jean-Frédéric: [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907)
[11:29:32] (CR) jerkins-bot: [V: -1] [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907) (owner: Jean-Frédéric)
[11:35:37] (PS2) Jean-Frédéric: [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907)
[11:36:35] (CR) jerkins-bot: [V: -1] [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907) (owner: Jean-Frédéric)
[11:48:13] Toolforge: Rounding and missing units in VMEM values on http://tools.wmflabs.org/?status create misleading values - https://phabricator.wikimedia.org/T119680#3607623 (Liuxinyu970226)
[11:50:02] PROBLEM - Puppet errors on
tools-bastion-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[11:50:48] Toolforge, Huggle: https://huggle.wmflabs.org gives ERR_NAME_NOT_RESOLVED - https://phabricator.wikimedia.org/T175901#3607269 (Liuxinyu970226) I don't think #toolforge is good for the providing url, use #cloud-services instead?
[12:21:01] (PS3) Jean-Frédéric: [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907)
[12:21:55] (CR) jerkins-bot: [V: -1] [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907) (owner: Jean-Frédéric)
[12:22:23] (PS4) Jean-Frédéric: [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907)
[12:23:23] (CR) jerkins-bot: [V: -1] [WIP] Add Eslint [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907) (owner: Jean-Frédéric)
[12:40:39] (CR) Lokal Profil: "374401 has the pattern for the ESlint change for the Wikispeech extension" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378011 (https://phabricator.wikimedia.org/T175907) (owner: Jean-Frédéric)
[12:44:42] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3607735 (dschwen) I'm not away.
[12:48:37] (PS1) Jean-Frédéric: Skip categorisation for some countries [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378021 (https://phabricator.wikimedia.org/T174871)
[12:50:45] (CR) jerkins-bot: [V: -1] Skip categorisation for some countries [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378021 (https://phabricator.wikimedia.org/T174871) (owner: Jean-Frédéric)
[12:55:03] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:05:41] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3607778 (Fae) Good! However this task should not be marked as 'resolved' and the more general point that the WMF should be thinking of providing ZoomViewer facilities as part of the media viewer... or at least something that g...
[13:18:27] (PS2) Jean-Frédéric: Skip categorisation for some countries [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378021 (https://phabricator.wikimedia.org/T174871)
[13:30:50] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:40:33] Cloud-VPS, Operations-Software-Development, Patch-For-Review: Install cumin in the WMCS infrastructure - https://phabricator.wikimedia.org/T175712#3601071 (chasemp) >>! In T175712#3601612, @hashar wrote: > As a side effect, #beta-cluster-infrastructure and #continuous-integration-infrastructure woul...
[14:01:07] andrewbogott: hey, is there anything needed for https://phabricator.wikimedia.org/T175567 ?
[14:10:50] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:15:02] Amir1: I need to create it :) but it's been a very busy week
[14:15:32] chasemp: I understand, take your time and thanks for the great work :)
[14:17:38] Cloud-VPS (Project-requests), cloud-services-team: Create a project for Wikimedia Armenia - https://phabricator.wikimedia.org/T175567#3608010 (chasemp) a:chasemp
[14:19:02] Cloud-VPS (Project-requests), cloud-services-team: Create a project for Wikimedia Armenia - https://phabricator.wikimedia.org/T175567#3596615 (chasemp) Open>Resolved Created with `Ladsgroup` as a project admin. Best of luck!
[14:26:10] Cloud-VPS (Quota-requests), Discovery, Wikidata, Wikidata-Query-Service: Request increased quota for wikidata-query labs project - https://phabricator.wikimedia.org/T175196#3608027 (chasemp) Open>Resolved a:chasemp Great done, I upped RAM to `72500` for good measure. Keep the quota,...
[14:32:40] Cloud-VPS (Project-requests), cloud-services-team: Create a project for Wikimedia Armenia - https://phabricator.wikimedia.org/T175567#3608037 (Ladsgroup) Thank you Chase :)
[14:39:45] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608040 (dschwen) Please create a new task for this. A constantly open "Zoomviewer is down" task is misleading and counterproductive. You are welcome to reopen this when the zoomviewer is down again and needs attention from me...
[15:12:02] Toolforge, Huggle: https://huggle.wmflabs.org gives ERR_NAME_NOT_RESOLVED - https://phabricator.wikimedia.org/T175901#3608142 (MarcoAurelio) #toolforge covers "Platform as a Service hosting for bots, webservices, and analytics research". If this is a webservice issue, I think the project I used is approp...
[15:45:01] Wikibugs: Prepare wikibugs gerrit bot for gerrit 2.14 / 2.15 - https://phabricator.wikimedia.org/T175929#3608237 (Paladox)
[15:57:28] Tool-Pageviews: Add ability to query for individual pages across multiple projects - https://phabricator.wikimedia.org/T175930#3608284 (MusikAnimal)
[16:15:16] (PS1) Volans: volans wmcs wide root [labs/private] - https://gerrit.wikimedia.org/r/378053
[16:16:51] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608319 (zhuyifei1999) Resolved>Open https://commons.wikimedia.org/wiki/Commons:Village_pump#ZoomViewer_down says it's down.
[16:31:49] Cloud-VPS, Huggle: https://huggle.wmflabs.org gives ERR_NAME_NOT_RESOLVED - https://phabricator.wikimedia.org/T175901#3608359 (bd808) There is no proxy with the name "huggle.wmflabs.org" in the [[https://tools.wmflabs.org/openstack-browser/project/huggle|huggle project]] or across [[https://tools.wmflabs...
[16:55:37] (CR) Rush: [V: 2 C: 2] "I believe ed2551 is fine so we'll see :)" [labs/private] - https://gerrit.wikimedia.org/r/378053 (owner: Volans)
[17:08:56] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608518 (dschwen) This is stupid
[17:10:59] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608524 (dschwen) Alright, I'll take a look. On mobile right now.
[18:54:00] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608857 (dschwen) Ok, running again. The webservice was hung: ``` tools.zoomviewer@tools-bastion-03:~$ webservice status Your webservice is running tools.zoomviewer@tools-bastion-03:~$ webservice stop Stopping webservice........
[18:54:23] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608858 (dschwen) Open>Resolved
[19:02:24] Tools, Commons: Zoomviewer is down - https://phabricator.wikimedia.org/T169864#3608896 (dschwen) Sigh, zoomviewer used to be a _lot_ faster. I wonder what changed. I'll follow up on this. I see that somebody is trying to pull up the ordnance map which still needs to be preprocessed (the script should do...
[19:34:07] PROBLEM - Puppet errors on tools-exec-1439 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[19:37:08] !log deployment-prep updated PrivateSettings.php for T175868
[19:37:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[19:37:15] T175868: Deploy and test new book rendering (Remex + Electron) - https://phabricator.wikimedia.org/T175868
[20:04:03] RECOVERY - Puppet errors on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:52:20] PROBLEM - exim queue length 1 on tools-mail is CRITICAL: CRITICAL: 57.02% of data above the critical threshold [1.0]
[20:52:31] chasemp: ^
[20:59:50] VPS-Projects: Successful pilot of Discourse on https://discourse.wmflabs.org/ as an alternative to wikimedia-l mailinglist - https://phabricator.wikimedia.org/T124690#3609260 (Aklapper)
[20:59:52] VPS-Projects: Problem creating an account at https://discourse.wmflabs.org/ - https://phabricator.wikimedia.org/T125107#3609258 (Aklapper) stalled>declined >>! In T125107#3540769, @Aklapper wrote: > @AdHuikeshoven: Can you provide more information? Unfortunately closing this report as no further inf...
[21:00:58] PROBLEM - Puppet errors on tools-worker-1010 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:04:58] I wonder what unit is it using
[21:06:13] Platonides: are you asking about the tools-mail check?
[21:06:40] if so the alert is bogus, I was trying to convince shinken to send the notification to this irc channel
[21:07:03] oh
[21:07:18] yes, I was talking about that
[21:08:28] yeah it's all good, I set the threshold to be critical for anything over length 1 to force the alert, which happened, so the check works ;)
[21:08:44] a critical for queue length 1 made no sense
[21:08:56] and then, it seemed to be using a float for that
[21:09:11] and 57.02% above 1?!
[21:09:32] I was imagining some weird calculation :D
[21:11:04] ha ha
[21:37:56] madhuvishy: how did you get that to work? :)
[21:38:15] my thresholds of choice were going to be 800 for warn and 1000 for crit based on the history I saw
[21:39:35] chasemp: It's really stupid, and I don't really know why, but every host is associated with a contact-group, and it seems like hosts handle the contact group right. All the service checks merely state what host/hostgroup, and the host config sends it out to the right contact group - at least that's all I can find in our setuo
[21:39:38] setup
[21:39:48] so I literally just pulled the contact_groups line
[21:39:54] madhuvishy: what contact group got it work?
[21:39:57] in the service {} block
[21:39:57] to
[21:39:58] tools
[21:40:04] huh, I tried that I thought
[21:40:09] it's associated with the tools-mail host
[21:40:18] and that's in generated/tools.cfg
[21:40:57] RECOVERY - Puppet errors on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:41:09] madhuvishy: can you put up a changeset?
[21:41:15] and thank you
[21:41:16] chasemp: I did :)
[21:41:25] https://gerrit.wikimedia.org/r/#/c/378105/
[21:42:12] I have it at 100/200 now, feel free to amend, or I can
[21:42:36] kk
[21:45:01] (CR) Lokal Profil: [C: 2] Skip categorisation for some countries [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378021 (https://phabricator.wikimedia.org/T174871) (owner: Jean-Frédéric)
[21:45:59] (Merged) jenkins-bot: Skip categorisation for some countries [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378021 (https://phabricator.wikimedia.org/T174871) (owner: Jean-Frédéric)
[21:47:07] (CR) jenkins-bot: Skip categorisation for some countries [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378021 (https://phabricator.wikimedia.org/T174871) (owner: Jean-Frédéric)
[21:47:37] madhuvishy: I looked back at the last 6mo or so and we seem to exceed that somewhat regularly so just beginning baseline I was going to go higher
[21:47:41] I'll amend
[22:01:50] cloud-services-team (FY2017-18), Goal, Patch-For-Review: Define a metric to track OpenStack system availability - https://phabricator.wikimedia.org/T167556#3609383 (Andrew) I have some api uptime stats at https://grafana.wikimedia.org/dashboard/db/wmcs-api-uptimes?orgId=1 Obviously that dashboard ne...
[22:18:16] madhuvishy: merged, is puppet running on shinken?
[22:18:26] chasemp: yep
[23:00:07] PROBLEM - Puppet errors on tools-exec-1439 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[23:35:04] RECOVERY - Puppet errors on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:54:25] VPS-project-Phabricator, Patch-For-Review: Cannot log in or create account on phab-01.wmflabs.org: "Unhandled Exception ("AphrontMalformedRequestException")" - https://phabricator.wikimedia.org/T165643#3609562 (Dzahn) @paladox @Aklapper change above now merged in prod - should fix this too
[23:56:51] VPS-project-Phabricator, Patch-For-Review: Cannot log in or create account on phab-01.wmflabs.org: "Unhandled Exception ("AphrontMalformedRequestException")" - https://phabricator.wikimedia.org/T165643#3609565 (Dzahn) Open>Resolved a:Dzahn http://phab-01.wmflabs.org/ now redirects to https:...
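
Note on the "57.02% above 1?!" question raised at 21:09: read literally, the alert text ("57.02% of data above the critical threshold [1.0]", "Less than 1.00% above the threshold") reports the share of sampled datapoints that exceed a threshold, not how far the queue length is above it. The sketch below illustrates that reading only. The function names, the `min_percent` cutoff of 1%, and the sample values are assumptions for illustration and are not taken from the actual check or its configuration.

```python
def percent_above(samples: list[float], threshold: float) -> float:
    """Percentage of datapoints strictly above the threshold."""
    if not samples:
        return 0.0
    above = sum(1 for value in samples if value > threshold)
    return 100.0 * above / len(samples)


def classify(samples: list[float], warn: float, crit: float,
             min_percent: float = 1.0) -> str:
    """Return CRITICAL, WARNING or OK from the share of datapoints over each threshold.

    min_percent is an assumed cutoff, suggested by the wording
    "Less than 1.00% above the threshold" in the recovery messages.
    """
    if percent_above(samples, crit) >= min_percent:
        return "CRITICAL"
    if percent_above(samples, warn) >= min_percent:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Hypothetical queue-length samples.
    queue_lengths = [0, 2, 3, 1]
    # With the forced test threshold of 1, half the datapoints are above it.
    print(percent_above(queue_lengths, 1))
    print(classify(queue_lengths, warn=1, crit=1))
    # With the 800/1000 thresholds mentioned at 21:38, the same data is fine.
    print(classify(queue_lengths, warn=800, crit=1000))
```

Under this reading, a critical threshold of 1 trips as soon as a small fraction of samples show two or more queued messages, which matches the forced test alert described at 21:08.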