[00:10:35] !log wikistats instance -kraken, upgrade php7 and mariadb packages
[00:10:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[01:37:24] !log wikistream rebooting ws-web, enabling puppet, truncating syslog, etc. Drive is full so I can't make it any worse
[01:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistream/SAL
[02:17:33] !log math rebooting drmf in an attempt to get puppet working
[02:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Math/SAL
[07:02:56] wikidata may get outdated for some minutes while we set up the new s8 group of servers on labs
[13:43:20] thanks jynus!
[15:05:23] milimetric: is the 'dashiki' project still doing anything? I'm chasing down some puppet issues on dashiki-01 and I'm not clear on whether it ever really worked.
[15:09:28] andrewbogott: yeah, it hosts a bunch of dashboards
[15:09:51] ok, but apache is broken on dashiki-01. Maybe just that instance is stale?
[15:10:07] We have some hiera that configures apache sites on there to point to static js/html that we just copy up
[15:10:21] hm... lemme check the dashboards
[15:10:33] maybe I broke the apache by un-breaking puppet
[15:10:40] yep, dashboards are broken
[15:11:05] They were fine last week, but we don’t check them regularly
[15:11:17] ok, well, puppet was failing to load on both of those VMs. Can you (or someone) fix that so I don't have to?
[15:11:44] sure, I will try, thanks for the heads up
[15:13:16] milimetric: the primary issue seems to be passing in [] to role::simplestatic, so it doesn't actually configure any hosts
[15:14:05] yeah, I have no idea how that was set up, madhu did it, I’ll look around though
[17:12:15] arturo: ^ topic changed so it's 'official' :)
[17:12:28] great :-)
[19:21:05] (PS203) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[19:25:03] (CR) Ricordisamoa: [C: -2] "PS203 updates rollup from 0.53.0 to 0.53.3" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[19:25:59] (PS204) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[19:30:37] (CR) Ricordisamoa: [C: -2] "PS204 renames options in rollup.config.js" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[19:42:05] !log tools reboot tools-worker-1010
[19:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:50:47] !log tools clush -w @all 'sudo puppet agent --test'
[19:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:15:50] !log tools disable puppet on proxies and k8s workers
[20:15:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:32:05] !log wikidata-dev fixed puppet runs on wikibase-stretch, wikibase-vue, wikibase with https://gerrit.wikimedia.org/r/#/c/403232/
[20:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[20:34:43] !log wikidata-dev wikibase-vue can't start Apache because docker-proxy is already using port 80
[20:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[20:34:51] andrewbogott: ^ that issue is unrelated
[20:35:24] it doesn't fully stop puppet though, it just fails to start the webserver
[20:51:49] !log tools kubectl cordon tools-worker-1001.tools.eqiad.wmflabs
[20:51:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
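(For context on the cordon entry above and the loops that follow: cordoning marks a Kubernetes node unschedulable so no new pods are placed on it during maintenance, and uncordoning reverses it. A minimal sketch of the cycle for a single worker, using the node name from the log; the drain step and its flags are an assumption for illustration, not something run in the channel.)

    # Mark the node unschedulable; running pods stay, new pods go elsewhere.
    kubectl cordon tools-worker-1001.tools.eqiad.wmflabs

    # Optionally evict running pods before rebooting or upgrading the node
    # (typical flags for nodes with DaemonSets and pods using local storage).
    kubectl drain tools-worker-1001.tools.eqiad.wmflabs --ignore-daemonsets --delete-local-data

    # ...reboot, puppet run, package upgrades, etc....

    # Let the scheduler place pods on the node again once it is healthy.
    kubectl uncordon tools-worker-1001.tools.eqiad.wmflabs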
[20:55:58] !log tools for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016`; do kubectl cordon $n; done
[20:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:10:15] !log tools tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029 `; do kubectl uncordon $n; done
[21:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:17:25] !log tools tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test"
[21:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:17:54] !log tools ...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test"
[21:17:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:19:40] some reports of PAWS down, yuvipanda told me to let you all know :)
[22:21:14] ^ madhuvishy can you take a look and see if anything makes sense in relation to ferm or such?
[22:22:07] Hauskatze: hmmm, it's up for me, who is reporting it as down?
[22:23:05] chasemp: did you check the iptables stuff in paws after the ferm patch rollout?
[22:23:17] madhuvishy: I didn't think of it
[22:23:20] probably broke paws
[22:23:24] i'll look no worries
[22:23:34] madhuvishy: how though? ferm isn't even on the paws nodes afaict
[22:24:22] we're not sure, but it definitely seemed related on Friday
[22:24:36] I'm not pushing back so much as totally confused
[22:24:38] chasemp: unsure, but I noticed last time that when we do these ferm changes, the FORWARD chain in the paws iptables rules also changes to DROP on all the workers
[22:24:45] hm
[22:25:49] madhuvishy: T184566 & T184500
[22:25:50] T184500: PAWS is down again - https://phabricator.wikimedia.org/T184500
[22:25:50] T184566: As usual.... PAWS is down - https://phabricator.wikimedia.org/T184566
[22:26:50] I'm not very motivated to fix bugs with titles like "As usual.... PAWS is down".
[22:27:03] they made another ticket too
[22:27:20] I think this may just be tools-paws-worker-1007
[22:27:25] that I rebooted earlier today
[22:27:36] i assume it came up with the iptables rule changed
[22:27:47] so maybe not related to the ferm stuff
[22:29:38] chasemp: ^
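(Background on the FORWARD-chain theory above: container traffic on a Docker/Kubernetes worker passes through the FORWARD chain, so a default policy of DROP there breaks pod networking even when the INPUT rules look fine. A minimal sketch of how one might check for and undo that symptom on a worker; these exact commands are an assumption, not something run in the channel.)

    # Show the default policy of the FORWARD chain (printed on the first line).
    sudo iptables -L FORWARD -n | head -n 1

    # If it reads "policy DROP", restore the permissive default so forwarded
    # container traffic flows again.
    sudo iptables -P FORWARD ACCEPT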
[22:32:29] Tbh PAWS is broken a lot but yes task title is very encouraging :P
[22:32:39] Isn't*
[22:35:58] bd808: I agree wrt title
[22:37:18] 3 months from now we may be ready to spend more time with PAWS. Today we are barely able to keep the rest of the world running. :/
[22:37:45] it turns out that we have a lot of moving parts to keep moving around here ;)
[22:39:17] wonder if somebody could have a minute to +2 https://gerrit.wikimedia.org/r/#/c/402580/
[22:40:02] bd808: a lot of moving parts or not a lot of people moving them?
[22:41:37] A lot of software components that move in eccentric and sometimes unpredictable ways.
[22:42:21] and we were down 1 staff member for ~9 months
[22:46:13] !log run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster
[22:46:14] yuvipanda: Unknown project "run"
[22:46:35] !log tools run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster
[22:46:37] !log tools run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster
[22:46:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:46:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:46:47] :)
[22:46:56] !log tools reboot all paws-worker nodes
[22:47:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:47:12] awww, ty madhuvishy :) didn't wanna burden y'all, so I'll come here just to log for a bit and then disappear
[22:47:16] complicated commands :|
[22:48:30] !log tools run 'clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash'' to set up kubeadm on all paws worker nodes
[22:48:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:49:05] !log tools run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8
[22:49:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:53:20] !log tools redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason
[22:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:54:05] !log tools kill all PAWS pods
[22:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:54:37] !log tools kill all kube-system pods in paws cluster
[22:54:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:57:23] !log tool.replag Deployed be6109b (add s8 slice)
[22:57:23] bd808: Unknown project "tool.replag"
[22:57:30] !log tools.replag Deployed be6109b (add s8 slice)
[22:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.replag/SAL
[23:01:26] !log tools kill paws master and reboot it
[23:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:08:23] !log tools turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroy the entire cluster again and install 1.9.1
[23:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:21:27] !log tools paws new cluster master is up, re-adding nodes by executing the same sequence of commands used for the upgrade
[23:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
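(The PAWS rebuild logged above boils down to the fan-out sequence below, consolidated for reference. The script paths and worker range are taken verbatim from the log; running them from the clushmaster in this order, and the final check on the master, are an assumption about how the bootstrap scripts are meant to be used, not a logged procedure.)

    # Tear down kubeadm/kubelet state on every PAWS worker.
    clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash'

    # Reinstall kubeadm (1.9.1 in this case) and rejoin the workers to the new master.
    clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash'
    clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash'

    # On the paws master, confirm all workers re-registered and report Ready.
    kubectl get nodes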