[00:39:43] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[01:39:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:35:55] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[02:53:15] Labs, Tool-Labs, community-labs-monitoring: Implement a system to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#2902974 (scfc) (Labs is slowly moving authorative information about //instances// from LDAP to OpenStack, so if that affects your Icinga setup, https://gerrit.wikime...
[03:09:08] !log wikilabels installing nodejs in wikilabels nodes (T154122)
[03:09:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[03:09:11] T154122: Minification and gzip compression for wikilabels assets - https://phabricator.wikimedia.org/T154122
[03:35:32] Labs, Tool-Labs, community-labs-monitoring: Implement a system to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#2903006 (zhuyifei1999) What about the bots? Many of the current (pseudo-)monitoring service are for webservices (check 200, etc), but for bots, do we have anything a...
[03:40:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:05:11] Labs, Tool-Labs, community-labs-monitoring: Implement a system to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#2903104 (scfc) @zhuyifei1999: Depends on the definition of monitoring. If a bot is started by `bigbrother` and `jstart`, if it fails, it will be restarted a couple...
[05:36:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:37:25] Labs, Tool-Labs, community-labs-monitoring: Implement a system to monitor tools on tool-labs - https://phabricator.wikimedia.org/T53434#2903122 (zhuyifei1999) >>! In T53434#2903104, @scfc wrote: > But this will monitor the process not failing, i. e. if the process is "stuck", it won't notice that typ...
[05:50:09] Tool-Labs-tools-Pageviews: Use UniversalLanguageSelector for selecting the UI language in the Labs Pageviews tool - https://phabricator.wikimedia.org/T151521#2903129 (MusikAnimal) Open>declined For now I'm going to decline this. I tried it out and got it to work, but the additional ~60KB of JavaScrip...
[05:59:54] Tool-Labs-tools-Pageviews: Use UniversalLanguageSelector for selecting the UI language in the Labs Pageviews tool - https://phabricator.wikimedia.org/T151521#2903141 (Amire80) declined>Open Oh please don't reinvent the wheel :) I am aware of the footprint problem (T153845) and I was actually about t...
[06:15:13] Tool-Labs-tools-Pageviews: Use UniversalLanguageSelector for selecting the UI language in the Labs Pageviews tool - https://phabricator.wikimedia.org/T151521#2903145 (MusikAnimal) >>! In T151521#2903141, @Amire80 wrote: > Oh please don't reinvent the wheel :) > > I am aware of the footprint problem (T153845...
[06:40:51] Labs, Labs-Infrastructure, DBA, Chinese-Sites: Lost database changes on s2 for 3 hours on labs replicas - https://phabricator.wikimedia.org/T129432#2903181 (Shizhao)
[06:41:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:09:08] hi! I've got a problem, on tools.giftbot I started webservice on k8s, but https://tools.wmflabs.org/giftbot/weblinksuche.fcgi returns actually '500 - Internal Server Error' - what can I do?
[11:10:11] doctaxon: not an expert, but maybe you should use trusty?
[11:11:03] LadyViolet: webservice on grid engine?
[11:12:00] but it had run on k8s before, too ... :(
[11:12:31] I say that because one day the webservice of our tools didn't started but webservice --release=trusty start did worked
[11:13:17] if it is about task submission, jsub now defaults to trusty but my knowledge ends here
[11:13:36] I always add -l release=trusty nonetheless
[11:13:46] doctaxon: can you check the error logs?
[11:14:33] where can I find the error logs, it is a webservice
[11:14:54] wait a sec
[11:15:23] LadyViolet: webservice does not work have "-l release=trusty" but "--backend {gridengine,kubernetes}"
[11:16:17] zhuyifei1999_: our webservice does work fine atm, it's doctaxon the one with issues now :)
[11:16:26] ik
[11:17:43] webservice on grid seems to be running
[11:17:46] doctaxon: per your service.manifest you seem to be running a k8b container for php5.6 webservices
[11:17:54] *k8s
[11:18:24] i.e. type is php5.6
[11:20:31] try tcl https://github.com/wikimedia/operations-software-tools-webservice/blob/master/toollabs/webservice/backends/kubernetesbackend.py#L34
[11:20:36] my service.manifest is backend: kubernetes ; version 2 ; web php5.6
[11:21:09] $ webservice stop; webservice --backend kubernetes tcl start
[11:21:23] oh
[11:21:32] i'll try
[11:22:08] yeah you're using the default php5.6 image, which unfortunately supports only php
[11:23:01] you don't use a =
[11:23:18] backend=kubernetes ?
[11:23:25] backend kubernetes ?
[11:24:21] it's what $ webservice help says
[11:24:33] * $ webservice --help
[11:25:16] ya, it's running
[11:25:19] cool
[11:25:23] thanks
[11:25:24] :)
[11:25:25] np
[11:25:53] but webservice restart is simply $ webservice restart
[11:25:59] isn't it?
[11:26:13] yeah
[11:26:24] unless you stop it manually
[11:26:28] thank you very much indeed
[11:26:56] and when stopped manually it clears your service.manifest
[11:27:00] np
[11:30:56] zhuyifei1999_: did service.manifest actualize automatically? Now there's stated: web tcl instead of web php5.6 like before
[11:31:20] the webservice command manage the file
[11:31:36] afaik
[12:31:17] Labs, Tool-Labs, Patch-For-Review, Puppet: role::puppetmaster::puppetdb depends on Ganglia and cannot be used in Labs - https://phabricator.wikimedia.org/T154104#2903464 (scfc) a:scfc
[12:37:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[12:54:41] Labs, Tool-Labs, Patch-For-Review, Puppet: Make standalone puppetmasters optionally use PuppetDB - https://phabricator.wikimedia.org/T153577#2903500 (scfc) a:scfc
[15:31:59] PROBLEM - Puppet run on tools-worker-1003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:42:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:56:59] !log video starting the two other workers because of flood, trying to restore the system to a "ready for new tasks" state
[15:57:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL
[17:16:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:19:19] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[17:20:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[17:55:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:56:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:59:18] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:38:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:20:03] !log video depooling encoding02 since the flood is under control
[19:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL
[19:23:43] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:09:40] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:09:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:05:42] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]