[00:32:21] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:22] Tool-Labs-tools-XTools, Community-Tech-Sprint: EditCounter's "pages created" does not match Pages tool - https://phabricator.wikimedia.org/T169955#3450908 (Samwilson) No, this is about counting how many pages a user has created that have since been deleted. So we were just looking at the number of unique... [01:21:00] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:47:22] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:00:58] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0] [02:08:19] PROBLEM - Puppet errors on tools-exec-1414 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:27:21] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [02:48:22] RECOVERY - Puppet errors on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [03:47:03] Toolforge, InternetArchiveBot (v1.4), User-Zppix: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3451252 (Cyberpower678) [04:05:24] Data-Services, DBA, WMF-Legal, Patch-For-Review: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#3451264 (ZhouZ) Speaking for legal, we clear this change as well.
[06:16:32] Tool-Labs-tools-XTools, Community-Tech-Sprint: EditCounter throws warnings when user has zero edits - https://phabricator.wikimedia.org/T170608#3451376 (Samwilson) a:Samwilson [06:47:31] Tool-Labs-tools-XTools, Community-Tech-Sprint: EditCounter throws warnings when user has zero edits - https://phabricator.wikimedia.org/T170608#3451403 (Samwilson) PR: https://github.com/x-tools/xtools/pull/54 [06:51:59] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:32:01] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0] [08:30:22] Cloud-VPS (Project-requests): Request creation of openipmap labs project - https://phabricator.wikimedia.org/T166671#3451506 (akosiaris) >>! In T166671#3450013, @Pintoch wrote: > @akosiaris Fantastic! Thanks a lot. Yes, I have a GPG key: http://antonin.delpeuch.eu/delpeuch.asc You got mail at the email addr... [09:22:46] Tools, Wikidata: Small-displayed images false positive at wp_no_image - https://phabricator.wikimedia.org/T171033#3451605 (Paucabot) [10:15:56] is there an easy way to copy/move a file from a labs host to a place where it can be loaded into toolsdb?
The file is 5.9GB compressed [10:19:04] ^ is in relation to T169766 [10:19:04] T169766: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766 [10:31:31] ^nevermind - I'll just connect to the db from the labs host and load from there [10:56:39] Cloud-Services, Operations: Ensure we can survive a loss of labservices1001 - https://phabricator.wikimedia.org/T163402#3451840 (fgiunchedi) [11:02:03] Tools, Community-Tech-Tool-Labs, Epic: Convert all Labs tools to use cdnjs for static libraries and fonts - https://phabricator.wikimedia.org/T103934#3451860 (zhuyifei1999) [11:29:27] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:04:26] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [13:48:12] PROBLEM - Puppet errors on tools-worker-1027 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [14:23:11] RECOVERY - Puppet errors on tools-worker-1027 is OK: OK: Less than 1.00% above the threshold [0.0] [14:28:48] Cloud-Services, Cloud-VPS, Operations, Wikimedia-Incident: labservices1001 down, suspected overheating - https://phabricator.wikimedia.org/T152340#3452581 (fgiunchedi) Open>Resolved a:fgiunchedi I don't think we've seen this reoccuring [15:00:18] Cyberpower678 on the iabot mangement interface on the queue mutiple pages to run bot on what do the buttons, load pages from url search & load pages from domain search mean?
[15:13:01] Cloud-VPS: puppet::self may be unused - https://phabricator.wikimedia.org/T171061#3452707 (Andrew) [15:19:48] Cloud-VPS: puppet::self may be unused - https://phabricator.wikimedia.org/T171061#3452707 (bd808) https://tools.wmflabs.org/openstack-browser/puppetclass/role::puppet::self Looks to be still in use in: * analytics * dashiki * graphite * ifttt * monitoring * redirects * reportcard * sentry * wikidata-topicmaps [15:21:58] Cloud-VPS: puppet::self may be unused - https://phabricator.wikimedia.org/T171061#3452707 (Milimetric) The analytics ones (analytics, dashiki, reportcard) no longer need it, we should clean that up, but I'm a bit hesitant to start anything new now before parental leave. [15:36:05] Tool-Labs-tools-XTools, Community-Tech-Sprint: EditCounter throws warnings when user has zero edits - https://phabricator.wikimedia.org/T170608#3452794 (MusikAnimal) Open>Resolved Merged! [16:01:55] andrewbogott: good morning :-} Eventually I found out rabbitMQ has some plugin that offers a web interface for monitoring/management :D [16:01:56] https://www.cloudamqp.com/blog/2015-05-27-part3-rabbitmq-for-beginners_the-management-interface.html [16:06:53] Cloud-Services, wikitech.wikimedia.org: Requesting 'bot' access for MABot - https://phabricator.wikimedia.org/T171066#3452916 (MarcoAurelio) [16:26:42] Cloud-Services, wikitech.wikimedia.org: Requesting 'bot' access for MABot - https://phabricator.wikimedia.org/T171066#3453002 (MarcoAurelio) I am having a problem with the account as I cannot login to it. Please hold on.
[16:31:00] Cloud-Services, wikitech.wikimedia.org: Cannot login/change password to MABot@wikitech - https://phabricator.wikimedia.org/T171069#3453021 (MarcoAurelio) [16:34:32] Cloud-Services, wikitech.wikimedia.org: Requesting 'bot' access for MABot - https://phabricator.wikimedia.org/T171066#3453038 (MarcoAurelio) [16:37:33] Tool-Labs-tools-XTools, Security, Vuln-Inject: 500: Internal Server Error with ArticleInfo when using an apostriphe in article title - https://phabricator.wikimedia.org/T170808#3443346 (Bawolff) > It's very bad, no doubt, but this db is read-only and public so there's no real harm. Note, this i... [17:28:20] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:03:20] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [18:12:47] Tool-Labs-tools-XTools, Security, Vuln-Inject: 500: Internal Server Error with ArticleInfo when using an apostriphe in article title - https://phabricator.wikimedia.org/T170808#3453471 (Cyberpower678) >>! In T170808#3453048, @Bawolff wrote: >> It's very bad, no doubt, but this db is read-only an... [18:18:16] Tool-Labs-tools-XTools, Security, Vuln-Inject: 500: Internal Server Error with ArticleInfo when using an apostrophe in article title - https://phabricator.wikimedia.org/T170808#3453514 (Aklapper) [18:19:01] Cloud-VPS, cloud-services-team (Kanban): Build VPS base images - https://phabricator.wikimedia.org/T170828#3453518 (Andrew) Jessie and Stretch are updated. There are unexpected issues with the Trusty build which I'm working on.
[18:54:20] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:54:44] cloud-services-team, wikitech.wikimedia.org: Remove Wikitech rights for WikiSysop - https://phabricator.wikimedia.org/T171090#3453738 (Quiddity) [19:07:24] Tool-Labs-tools-XTools, Security, Vuln-Inject: 500: Internal Server Error with ArticleInfo when using an apostrophe in article title - https://phabricator.wikimedia.org/T170808#3453809 (MusikAnimal) >>! In T170808#3453471, @Cyberpower678 wrote: >>>! In T170808#3453048, @Bawolff wrote: >>>It's very ba... [19:36:17] Tool-Labs-tools-XTools, Security, Vuln-Inject: 500: Internal Server Error with ArticleInfo when using an apostrophe in article title - https://phabricator.wikimedia.org/T170808#3453903 (Bawolff) No need for ears to bleed, I just want to ensure that the potential impact of sql injections are not under... [19:59:21] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [20:08:14] Tool-Labs-tools-XTools, Community-Tech: Rewrite XTools: Edit Summaries - https://phabricator.wikimedia.org/T170905#3454098 (Matthewrbowker) [20:15:27] PROBLEM - Puppet errors on tools-exec-1419 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:17:26] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:18:20] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:20:04] PROBLEM - Puppet errors on tools-exec-1430 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:20:17] PROBLEM - Puppet errors on tools-exec-1402 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:20:37] PROBLEM - Puppet errors on tools-exec-1413 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:20:49] PROBLEM
- Puppet errors on tools-exec-1425 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:21:08] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1417 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:21:57] PROBLEM - Puppet errors on tools-exec-1429 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:21:57] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:22:25] PROBLEM - Puppet errors on tools-exec-1433 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:22:38] will any of this also affect VPS instances or only toolforge? [20:22:39] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1424 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:23:01] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:24:19] PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:24:25] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:24:26] PROBLEM - Puppet errors on tools-flannel-etcd-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:24:46] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:25:26] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:28:11] Hi. Is there a problem with ssh access to wmflabs? My keys won't work. 
I tried to ul a new public key and get "Failed to import keypaid" [20:28:14] keypair [20:28:28] Kotz: yes see topic, our ldap instances are having issues [20:28:51] Kotz ldap is having some issues, you may be able to get in if you try every now and then [20:29:10] ah thanks. [20:29:17] good luck with fixing it :-) [20:29:28] np [20:31:17] so is there a chance that I get logged in, or is it completly unavailable for some time? [20:31:55] Sagan i mean you could try but more then likely you wont be able to but who knows [20:32:25] Sagan: unsure - we are investigating, will update when we know. [20:33:41] madhuvishy: ok, thank you :) [20:40:02] bd808: I can't SSH into toolforge [20:40:14] Cyberpower678 ldap issues [20:40:20] :/ [20:40:29] Cyberpower678 they are investigating and working on it atm [20:40:34] Ok [20:49:41] (PS1) Legoktm: De-reference symlinks in tarball [labs/tools/extdist] - https://gerrit.wikimedia.org/r/366414 (https://phabricator.wikimedia.org/T135194) [20:53:58] Cyberpower678: we had an ssl certificate expire unexpectedly. :/ folks are actively working on it. [20:55:19] bd808: ok thanks. I recently had milk unexpectedly on me recently. Though I imagine certificates are more difficult. ;-) [21:03:17] Cyberpower678: :) the hard part seems to be putting a new cert in the right place after nothing trusts anything else. [21:03:31] Ouch [21:16:03] !log bastion Forced puppet run and restarted nslcd on bastion-01.bastion.eqiad.wmflabs [21:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bastion/SAL [21:17:20] yay, I can login to bastion again :o [21:17:24] but only to it.... [21:17:56] i can only ssh into bastion too [21:18:52] !log tools Forced puppet run and restarted nscd, nslcd on tools-bastion-02 [21:18:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:19:42] !log tools Restarted nslcd on tools-bastion-03 (=tools-login); logins seem functional again.
[21:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:21:19] paladox: what instance are you trying to get to? [21:21:36] bd808 puppet-paladox3 and puppet-phabricator [21:21:52] i need to run git pull so that all the instances in those projects pick up the new cert for ldap [21:22:48] paladox: trying to fix puppet-paladox3 now... [21:22:52] thanks :) [21:23:36] bd808: What machines run the kibana frontend? All of the logstash boxes? [21:23:44] * RainbowSprinkles needs to kick puppet + apache to get it back up [21:23:57] RainbowSprinkles: logstash100[1-3] [21:24:04] bd808: can you run that on bastion-02 too? [21:24:11] that bastion allows still no login [21:24:53] paladox: your puppet-paladox3 needs to get the upstream. how do you typically do that? [21:25:12] bd808 i normaly cd /var/lib/git/operations [21:25:13] git pull [21:25:20] press x for it to merge [21:25:30] (there should be no merge conflicts i doint think) [21:25:43] /var/lib/git/operations -> /var/lib/git/operations/puppet [21:26:01] bd808: Thx, didn't realize it was all 3 :) [21:27:03] i then do puppet agent -tv :) [21:29:05] !log git Synced puppet with upstream, forced puppet run, restarted nscd and nslcd [21:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [21:29:08] bd808: I can't ssh into my instances. tunnel via bastion01 works (not via bastion02), but authentification fails at my server: publickey [21:29:22] neon.rcm.eqiad.wmflabs [21:29:25] thankyou bd808 [21:29:26] :) [21:29:28] same for other hosts of my project [21:29:34] Sagan: ok. let me see if I can fix that one [21:30:39] bd808: thx :). But I guess that is a more general problem? All of my instances have the same problem, and they are serving different services on different OS [21:30:51] bd808 when your done doing Sagan could you do puppet-phabricator please? :) [21:31:14] Sagan: we are working on using salt to fix all the things... 
[21:31:17] I guess a general solution will help more? since if every instance needs a fix by an admin, this can take a long time [21:31:29] but I'm spot fixing too [21:32:01] bd808: ah, ok. neon ist the most important one currently, so if salt itself takes more time, I'm fine with that, the services are running currently, so I don't need SSH today [21:32:07] only for neon, since it needs an update [21:32:40] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1424 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:44] PROBLEM - Puppet errors on tools-exec-1441 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:46] PROBLEM - Puppet errors on tools-proxy-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:48] PROBLEM - Puppet errors on tools-elastic-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:48] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:50] bd808: (I'm afk for 15 minutes, so I'm fine with testing when I'm back, but it can take some minutes. 
just as a note ;)) [21:32:53] PROBLEM - Puppet errors on tools-exec-1421 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:53] PROBLEM - Puppet errors on tools-worker-1012 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:53] PROBLEM - Puppet errors on tools-worker-1013 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:56] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:56] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1421 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:32:58] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1422 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:01] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:05] PROBLEM - Puppet errors on tools-elastic-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:07] !log bastion Restarted nscd, nslcd on bastion-02.bastion.eqiad.wmflabs [21:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bastion/SAL [21:33:11] PROBLEM - Puppet errors on tools-worker-1005 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:13] PROBLEM - Puppet errors on tools-worker-1011 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:16] PROBLEM - Puppet errors on tools-worker-1018 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:21] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:24] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:24] PROBLEM - Puppet errors on tools-paws-worker-1001 is 
CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:33:29] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:30] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:33] PROBLEM - Puppet errors on tools-exec-1417 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:33] PROBLEM - Puppet errors on tools-docker-registry-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:33] PROBLEM - Puppet errors on tools-worker-1015 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:34] PROBLEM - Puppet errors on tools-worker-1016 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:34] PROBLEM - Puppet errors on tools-k8s-master-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:36] PROBLEM - Puppet errors on tools-prometheus-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:36] PROBLEM - Puppet errors on tools-exec-1406 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:39] PROBLEM - Puppet errors on tools-docker-builder-05 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [21:33:43] PROBLEM - Puppet errors on tools-exec-1405 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:44] PROBLEM - Puppet errors on tools-worker-1008 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:44] PROBLEM - Puppet errors on tools-exec-1412 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:48] PROBLEM - Puppet errors on tools-exec-1416 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:49] PROBLEM - Puppet errors on tools-worker-1006 is CRITICAL: CRITICAL: 100.00% of data 
above the critical threshold [0.0] [21:33:54] PROBLEM - Puppet errors on tools-exec-1403 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:55] PROBLEM - Puppet errors on tools-k8s-etcd-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:56] PROBLEM - Puppet errors on tools-exec-1436 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:33:59] PROBLEM - Puppet errors on tools-prometheus-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:04] thanks shinken [21:34:09] PROBLEM - Puppet errors on tools-static-10 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:11] PROBLEM - Puppet errors on tools-worker-1027 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:14] PROBLEM - Puppet errors on tools-exec-1427 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:16] PROBLEM - Puppet errors on tools-k8s-etcd-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:19] PROBLEM - Puppet errors on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:20] PROBLEM - Puppet errors on tools-exec-1414 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:20] PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:22] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:23] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:25] PROBLEM - Puppet errors on tools-flannel-etcd-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:29] PROBLEM - Puppet errors on tools-proxy-01 is CRITICAL: CRITICAL: 100.00% of data above the 
critical threshold [0.0] [21:34:30] PROBLEM - Puppet errors on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:33] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:33] PROBLEM - Puppet errors on tools-elastic-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:35] PROBLEM - Puppet errors on tools-worker-1010 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:36] PROBLEM - Puppet errors on tools-logs-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:38] PROBLEM - Puppet errors on tools-exec-1439 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:43] PROBLEM - Puppet errors on tools-redis-1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:46] PROBLEM - Puppet errors on tools-exec-1440 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:49] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:51] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1428 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:53] PROBLEM - Puppet errors on tools-worker-1023 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:54] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:54] PROBLEM - Puppet errors on tools-exec-1424 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:56] PROBLEM - Puppet errors on tools-worker-1009 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:34:56] PROBLEM - Puppet errors on tools-bastion-05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold 
[0.0] [21:35:00] PROBLEM - Puppet errors on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:01] PROBLEM - Puppet errors on tools-exec-1430 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:02] PROBLEM - Puppet errors on tools-exec-1423 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:04] PROBLEM - Puppet errors on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:04] PROBLEM - Puppet errors on tools-worker-1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:10] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:12] PROBLEM - Puppet errors on tools-exec-1435 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:14] PROBLEM - Puppet errors on tools-exec-1410 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:15] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:15] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:20] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:20] PROBLEM - Puppet errors on tools-exec-1402 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:23] PROBLEM - Puppet errors on tools-worker-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:23] Sagan: I think neon.rcm.eqiad.wmflabs was fine by the time I got there [21:35:25] PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:26] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1419 is CRITICAL: 
CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:26] PROBLEM - Puppet errors on tools-exec-1419 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:29] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:29] PROBLEM - Puppet errors on tools-docker-registry-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:29] PROBLEM - Puppet errors on tools-exec-1442 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:34] PROBLEM - Puppet errors on tools-checker-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:34] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:37] PROBLEM - Puppet errors on tools-services-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:37] PROBLEM - Puppet errors on tools-exec-1413 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:44] PROBLEM - Puppet errors on tools-flannel-etcd-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:46] PROBLEM - Puppet errors on tools-worker-1003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:48] PROBLEM - Puppet errors on tools-exec-1422 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:51] PROBLEM - Puppet errors on tools-exec-1425 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:52] PROBLEM - Puppet errors on tools-worker-1025 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:54] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:54] PROBLEM - Puppet errors on tools-mail is CRITICAL: CRITICAL: 100.00% of data above the critical 
threshold [0.0] [21:35:55] PROBLEM - Puppet errors on tools-worker-1019 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:55] PROBLEM - Puppet errors on tools-exec-1420 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:35:55] PROBLEM - Puppet errors on tools-worker-1026 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:00] PROBLEM - Puppet errors on tools-exec-1418 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:00] PROBLEM - Puppet errors on tools-services-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:05] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1427 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:05] PROBLEM - Puppet errors on tools-redis-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:08] PROBLEM - Puppet errors on tools-grid-master is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:08] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:10] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1420 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:10] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1417 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:10] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:11] PROBLEM - Puppet errors on tools-worker-1004 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:13] madhuvishy: can you poke shinken-wm for us? 
[21:36:13] PROBLEM - Puppet errors on tools-static-11 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:13] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1426 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [21:36:18] PROBLEM - Puppet errors on tools-grid-shadow is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:29] oh yikes [21:36:31] bd808: yes [21:36:52] PROBLEM - Puppet errors on tools-cron-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:36:55] PROBLEM - Puppet errors on tools-exec-1429 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:37:17] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:37:22] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:37:25] PROBLEM - Puppet errors on tools-worker-1022 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:37:26] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:37:26] PROBLEM - Puppet errors on tools-exec-1433 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:37:41] thanks madhuvishy :) [21:39:05] fwiw [21:39:08] https://www.irccloud.com/pastebin/WEPSYIpF/ [21:39:25] madhuvishy: yeah. that is known [21:39:58] !log puppet Synced puppet with upstream, forced puppet run, restarted nscd and nslcd [21:40:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet/SAL [21:40:01] paladox: ^ [21:40:09] bd808 thank you so much :) [21:41:29] paladox: that way that you are managing the puppet tree is a bit scary. 
keeping all your patches rebased on the production branch and letting the /usr/local/bin/git-sync-upstream run via cron is more stable [21:41:39] yeh [21:41:53] i try rebasing but it seems every time i rebase it gets a ton of conflicts [21:42:01] ie < *nod* rebasing won't work well if you are hacking on things that are also changing upstream [21:43:21] yeh [21:43:51] bd808: it works fine, thank you very much :) [21:43:57] I am testing a couple of gerrit features on puppet-paladox3 (gerrit 2.14 and scap) and needed to fix redirecting on logon on puppet-phabricator. [21:44:13] i think /usr/local/bin/git-sync-upstream works now as i did git add -A --all && git commit [21:51:20] bd808 do all instances need nscd and nslcd restarted? [21:51:26] I get this Authentication failed. on some instances now. [21:51:58] paladox: yes, we think so. There is a salt command running across all of labs to try and take care of that [21:52:07] thanks :) [21:55:23] will that command run puppet then restart it? :) [22:09:45] good work cloud team! [22:10:20] cloud-services-team: Build updated labvirt-star cert - https://phabricator.wikimedia.org/T171116#3454592 (Andrew) [22:11:16] cloud-services-team: Build updated labvirt-star cert - https://phabricator.wikimedia.org/T171116#3454608 (Andrew) [22:13:56] !log git restart apache on all instances [22:13:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [22:14:04] !log phabricator restart apache on all instances [22:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL [22:14:24] !log git restart gerrit on gerrit-test and gerrit-test3 [22:14:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [22:15:27] !log git restart icinga2 to pick up ldap new cert. 
[22:15:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL [22:22:42] Cloud-VPS (Quota-requests), Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3454659 (schana) @bd808 I created a database `s53132__trex_p`, but it seems that tables are limited to 64 indexes. Is there a way around this limitation... [22:24:48] Cloud-VPS (Quota-requests), Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3454689 (bd808) @schana That would be a question for the DBAs I guess. My first reaction is to wonder if you really need more than 64 distinct indexes on... [22:29:22] Cloud-VPS (Quota-requests), Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3454702 (schana) @bd808 Yes, it's a lot of indexing, but the table has records for every wikidata item and columns for every wiki, with the values being... [22:35:09] Cloud-Services, wikitech.wikimedia.org: Cannot login/change password to MABot@wikitech - https://phabricator.wikimedia.org/T171069#3453021 (Andrew) It's possible that you were unlucky and hit us in the middle of an ldap outage... does the same happen if you try now? [22:48:49] Cloud-Services, wikitech.wikimedia.org: Cannot login/change password to MABot@wikitech - https://phabricator.wikimedia.org/T171069#3454744 (MarcoAurelio) Thanks for your reply. I'll retry but it'll have to be in some hours. Sorry for the inconvenience. [22:58:32] Cloud-Services, wikitech.wikimedia.org: Cannot login/change password to MABot@wikitech - https://phabricator.wikimedia.org/T171069#3454782 (MarcoAurelio) @Andrew @bd808 I retried as promised, earlier than expected, and found that I am still being denied the password change with the same error message I pr... 
[23:09:35] musikanimal: I'm working on a new data store for mediaplaycounts that will make it much faster and much more efficient [23:09:54] it will also much more efficiently get you "all" plays for a given file [23:10:07] and wasn't one of your requests a leaderboard for the most played files? [23:26:00] Cloud-Services, wikitech.wikimedia.org: Cannot login/change password to MABot@wikitech - https://phabricator.wikimedia.org/T171069#3454857 (Andrew) I see the MABot account in the wikitech user table but don't see an ldap record. It might be that the creation process you followed just doesn't work right,... [23:29:55] Tool-Labs-tools-XTools, Community-Tech: http://xtools.wmflabs.org/adminscore/en.wikipedia.org is a redirect loop - https://phabricator.wikimedia.org/T171126#3454863 (Matthewrbowker) [23:52:10] !log tools Restarted cron on tools-cron-01; toolschecker job showing user not found errors [23:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [23:54:12] Cloud-VPS (Quota-requests), Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3454975 (bd808) 64 secondary indexes per table is a MySQL InnoDB limit -- https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html [23:59:10] cloud-services-team, Research, Epic, User-bd808: [FY17-18] Program 4: Technical community building - https://phabricator.wikimedia.org/T171120#3454995 (bd808) [23:59:27] cloud-services-team (Kanban), Research, Epic, User-bd808: [FY17-18] Program 4: Technical community building - https://phabricator.wikimedia.org/T171120#3454674 (bd808) [23:59:33] cloud-services-team (FY2017-18), Research, Epic, User-bd808: [FY17-18] Program 4: Technical community building - https://phabricator.wikimedia.org/T171120#3454674 (bd808)