[02:13:32] PROBLEM - High iowait on tools-exec-1434 is CRITICAL: CRITICAL: tools.tools-exec-1434.cpu.total.iowait (>16.67%)
[02:18:31] RECOVERY - High iowait on tools-exec-1434 is OK: OK: All targets OK
[06:33:02] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[06:47:10] PROBLEM - Puppet errors on tools-cron-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:07:59] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:22:06] RECOVERY - Puppet errors on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:26:42] Wikibugs: Tag detection is broken - https://phabricator.wikimedia.org/T166951#3312668 (valhallasw) Open>Resolved a:valhallasw
[14:26:45] \o/
[14:30:20] (PS1) Merlijn van Deen: Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013
[14:30:23] (PS1) Merlijn van Deen: Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013
[14:30:31] (CR) Merlijn van Deen: [C: 2] Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013 (owner: Merlijn van Deen)
[14:30:33] (CR) Merlijn van Deen: [C: 2] Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013 (owner: Merlijn van Deen)
[14:30:44] huh.
[14:30:46] Hmm it is showing it twice ^^
[14:30:57] (Merged) jenkins-bot: Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013 (owner: Merlijn van Deen)
[14:31:00] (Merged) jenkins-bot: Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013 (owner: Merlijn van Deen)
[14:31:07] (CR) jenkins-bot: Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013 (owner: Merlijn van Deen)
[14:31:10] (CR) jenkins-bot: Join #wikimedia-cloud by default [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357013 (owner: Merlijn van Deen)
[14:38:49] Is there 2 instances of wikibugs running on the tool?
[14:39:44] No, there was a stray grrrrit-to-redis instance running.
[14:40:16] ah
[15:46:37] thanks valhallasw`cloud :)
[15:59:41] !log tools.multidesc Restarted webservice to see if PHP user detection error is transient (T166949)
[15:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.multidesc/SAL
[15:59:45] T166949: multidesc: cannot connect to database due to reading wrong replica.my.cnf file - https://phabricator.wikimedia.org/T166949
[16:14:14] bd808: is k8s webservices down? tools.zppixbot's website is down for me
[16:15:23] not all kubernetes services are down if that is what you are asking. Probably best for you to login and look at your error logs
[16:17:10] Tool-Labs-tools-Other: multidesc: cannot connect to database due to reading wrong replica.my.cnf file - https://phabricator.wikimedia.org/T166949#3312597 (bd808) Restarting the kubernetes webservice process seems to have fixed the app. It looks like the PHP [[https://secure.php.net/get_current_user|`get_curr...
[16:20:12] bd808: no errors logged for webservice and my ircbot pods and stuff are okay, i've tried doing webservice stop; sleep 5; webservice start --backend=kubermetes
[16:21:26] (its been this way for a while now but i just got around to saying something (i forgot to say something lol))
[16:23:13] there is no webservice pod running that I can see, but there is a service.manifest
[16:24:11] bd808: ugh im unable to access ssh on my current device if its not much to ask could you run webservice start again :/ i probably derped and forgot to start it or something
[16:24:50] `webservice status` thinks it is running so something funky is going on. I'll poke at it a bit
[16:25:13] bd808: let me know if you need me to do anything :) thanks
[16:28:29] Zppix: https://tools.wmflabs.org/zppixbot/ seems to work now
[16:28:43] bd808: thanks what was the issue?
[16:29:30] Zppix: unsure honestly. I just did `webservice stop; webservice --backend=kubernetes php5.6 start`
[16:30:16] I wonder if its somehow related to the same issue as the problem with webservice restart?
[16:30:43] ill keep my eye out and see if i notice it do that again
[16:47:45] Tool-Labs-tools-Other: multidesc: cannot connect to database due to reading wrong replica.my.cnf file - https://phabricator.wikimedia.org/T166949#3312745 (Codrinb) It works now. Thanks a lot!
[16:58:42] !conduct
[16:58:42] This irc channel and all of the Cloud Services projects are subject to the code of conduct for Wikimedia technical spaces -- https://www.mediawiki.org/wiki/Code_of_Conduct
[18:57:28] (PS1) DatGuy: Add bot to #wikimedia-ops [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357021
[18:57:29] (PS1) DatGuy: Add bot to #wikimedia-ops [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/357021
[18:59:04] bd808: theres a rogue gerrit-to-irc proc on wikibugs i think
[19:13:54] I killed it before, but it is respawning
[19:14:00] * valhallasw`cloud curses OGE
[19:14:43] Ok, killed the task process now
[20:08:08] valhallasw`cloud: if it continues maybe killing the deployment then re-deploy?
[20:08:36] Zppix: if OGE was aware of the running task this wouldn't be a problem
[20:08:55] refresh my memory what is oge
[20:09:23] the software that manages job scheduling (open grid engine)
[20:10:50] valhallasw`cloud: wikibugs isnt on kubernetes?
[20:11:01] No.
[20:11:04] why?
[20:11:56] Because even if kubernetes could be used for regular jobs (it can only be used for webservices at the moment, as far as I know), all the deployment scripts assume OGE
[20:12:10] so changing to kubernetes is basically wasted effort
[20:12:39] valhallasw`cloud: unless ircbots fall under webservices then k8s runs more than that
[20:13:12] valhallasw`cloud: I think stashbot is on k8s
[20:14:27] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Kubernetes#Kubernetes_continuous_jobs
[20:14:57] zhuyifei1999_: valhallasw`cloud not mention my ircbot zppixbot is on k8s
[20:16:18] > all the deployment scripts assume OGE, so changing to kubernetes is basically wasted effort
[20:17:16] valhallasw`cloud: I realise that but we were just correcting your statement about k8s
[20:17:38] sounds like time to do some refactoring :)
[20:18:44] * zhuyifei1999_ gtg
[21:41:48] valhallasw`cloud: are you still about?
[21:42:25] ?
[21:42:39] valhallasw`cloud: mind looking at https://gerrit.wikimedia.org/r/357102 for me?
[21:43:54] Tomorrow.
[21:44:15] ok