[09:18:48] while testing sth else I noticed that some names in https://gerrit.wikimedia.org/r/c/operations/puppet/+/572213/4/hieradata/eqiad.yaml and https://gerrit.wikimedia.org/r/c/operations/puppet/+/572213/4/hieradata/codfw.yaml are NXDOMAIN, expected? happy to send patches to fix if needed
[09:43:45] godog: oh? https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dns/+/d2e0102943023b696f93f10909b569d6813964de%5E%21/
[09:49:16] arturo: I can't resolve the records without .openstack. like 'ns-recursor0.eqiad1.wikimediacloud.org' though
[09:50:42] oh
[09:50:55] that's a typo indeed, let me fix it!
[09:51:18] ack, thanks!
[10:00:46] it seems I cannot submit to gerrit?
[10:01:01] oh it worked on second try
[10:01:01] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/574395
[10:01:05] godog: ^^^
[10:04:41] arturo: ack, reviewed
[10:07:58] cool, thanks
[10:56:57] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.185 --remote-as 65002 bgppeer` (T245606)
[10:57:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:57:00] T245606: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606
[10:59:32] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker bgppeer` (T245606)
[10:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:06:57] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-delete 17b8c2a3-f0ce-4d50-a265-18ccac703c61` (T245606)
[12:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:07:01] T245606: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606
[12:09:12] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.186 --remote-as 65002 cr1-codfw` (T245606)
[12:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:09:22] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-peer-create --peer-ip 208.80.153.187 --remote-as 65002 cr2-codfw` (T245606)
[12:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:16:17] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr1-codfw` (T245606)
[12:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:16:20] T245606: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606
[12:16:22] !log admin [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr2-codfw` (T245606)
[12:16:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:19:49] !log tools.zppixbot change logging_level from DEBUG --> INFO
[16:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[17:22:06] Something is making DNS go kaboom in codesearch: https://phabricator.wikimedia.org/T246017
[17:22:19] is it expected?
[17:25:42] arturo: can it be because of your network maintenance?
[17:25:48] Amir1: that problem does not seem at the moment to be systemic across all of Cloud VPS.
[17:26:14] Amir1: my network operations were in codfw, a different environment
[17:26:30] I didn't know we have cloud in codfw
[17:26:30] Amir1: I merged a DNS patch early in my morning though
[17:26:32] Amir1: I can try to help investigate. This is on the codesearch6.codesearch.eqiad.wmflabs instance?
[17:26:39] bd808: yup
[17:28:02] `sudo journalctl -u hound-search`
[17:28:03] Amir1: gerrit-replica.wikimedia.org and github.com seem to resolve as expected from a root shell on that instance
[17:28:50] Those logs look like the problem is happening inside a Docker container. Does that sound right?
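The resolution checks done by hand in this part of the log (spotting NXDOMAIN for the mistyped `ns-recursor0.eqiad1.wikimediacloud.org` record, and confirming that gerrit-replica.wikimedia.org and github.com resolve from a root shell on codesearch6) can be sketched in Python. This is a minimal, hypothetical helper (`check_names` is not an existing Wikimedia tool); it assumes the system resolver via `socket.getaddrinfo`, so its answer depends on where it runs, which is why running it on the host and again inside the container may help tell the two apart.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def _system_resolve(name: str) -> object:
    # Uses the resolver configuration of whatever process runs this code,
    # so results can differ between the instance and a Docker container.
    return socket.getaddrinfo(name, None)


def check_names(
    names: Iterable[str],
    resolve: Callable[[str], object] = _system_resolve,
) -> Dict[str, Optional[str]]:
    """Map each hostname to None if it resolves, else to the lookup error text."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = None
        except socket.gaierror as exc:
            # NXDOMAIN and other getaddrinfo failures surface as socket.gaierror.
            results[name] = str(exc)
    return results
```

For example, `check_names(["gerrit-replica.wikimedia.org", "github.com"])` returns a dict whose values are `None` for names that resolved; the `resolve` parameter exists only so the lookup can be swapped out for testing.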
[17:29:17] yup
[17:30:16] * bd808 finds /lib/systemd/system/hound-search.service
[17:37:32] !log codesearch `systemctl restart hound-search.service` on codesearch6 (T246017)
[17:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[17:37:37] T246017: CodeSearch down; "Connection refused" - https://phabricator.wikimedia.org/T246017
[17:39:17] Amir1: on restart it is spitting out a lot of iptables errors again
[17:39:48] I wonder if the Puppet roles the instance is using are pulling in ferm things that are causing problems?
[17:44:36] it had a similar downtime over the weekend, they might be related
[17:44:47] * Amir1 https://phabricator.wikimedia.org/T245920
[17:44:50] bd808: ^
[17:45:21] Amir1: yeah, it seems to be doing the same iptables crash loop now
[17:45:57] I'm digging through the Puppet manifests. I have a hunch this is ferm and Docker fighting over things
[17:50:11] `iptables -t filter -L -v -n` doesn't have a DOCKER chain. My current random guess is that ferm applied things after the docker daemon was running and that wiped out the iptables rules managed by Docker.
[17:51:09] !log codesearch `systemctl restart docker` on codesearch6 (T246017)
[17:51:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[17:51:13] T246017: CodeSearch down; "Connection refused" (Docker containers not starting) - https://phabricator.wikimedia.org/T246017
[17:51:53] Amir1: https://codesearch.wmflabs.org/_health is starting to look better.
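The symptom bd808 checked for, a filter table with no DOCKER chain after ferm has reloaded its rules, can be detected by parsing the output of the same `iptables -t filter -L -v -n` command. A minimal sketch, assuming the usual `Chain NAME (...)` header lines of `iptables -L` output; the helper names are hypothetical, and this only detects the symptom, it does not fix the ferm/Docker ordering.

```python
import re
import subprocess
from typing import Set

# Header lines of `iptables -L` output look like "Chain INPUT (policy ACCEPT)"
# or "Chain DOCKER (1 references)"; capture the chain name after "Chain ".
_CHAIN_HEADER = re.compile(r"^Chain (\S+)", re.MULTILINE)


def chain_names(listing: str) -> Set[str]:
    """Return the set of chain names found in an `iptables -L` listing."""
    return set(_CHAIN_HEADER.findall(listing))


def docker_chain_present(listing: str) -> bool:
    """True if the listing has a chain named exactly DOCKER (not DOCKER-USER)."""
    return "DOCKER" in chain_names(listing)


def filter_table_listing() -> str:
    """Run the command quoted in the log; needs root on the instance."""
    return subprocess.run(
        ["iptables", "-t", "filter", "-L", "-v", "-n"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

On an instance in the state described above, `docker_chain_present(filter_table_listing())` would be expected to return `False`; the remedy used here was `systemctl restart docker`, which let the daemon recreate its chains.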
[17:52:17] Restarting the docker daemon fixed the iptables chains
[17:52:29] oh okay but that's going to happen again :(
[17:52:41] yes, I think the Puppet code needs help
[17:53:20] we have seen this sort of thing before in Toolforge with Kubernetes/Docker and FERM fighting over iptables control
[17:54:00] I *think* that was fixed by adding some Puppet ordering dependencies
[18:01:45] Amir1: https://phabricator.wikimedia.org/T246017#5912967 -- my current guess at the cause of the problem
[18:02:55] oh thanks. that makes sense
[18:09:30] bd808 Amir1 not sure if it fits here, but here's an example of how I worked around a similar issue. https://phabricator.wikimedia.org/P10501
[18:11:24] jeh: nice! That looks very related
[18:13:47] This looks related as well -- https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/386166/
[18:14:34] "Instead of installing the docker-ce package in a profile, which is a wrong thing to do, ..." -- the codesearch module installs docker-ce ;)
[18:15:46] yep, that looks good too
[18:18:19] !log tools.wikibugs Updated channels.yaml to: f109caf1cedea816385a1278dd23631d38733d60 Move RelEng's interest in other's work out
[18:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[21:04:23] !log tools.heritage Deploy latest from Git master: d2f2eab (T176560), 36b4940, a493161, 2cea31c, f63aa99, 09efc1c, 7e78abb, 2feab97, dd5062c, a3616d8, 465e82e, 010917d (T244445), 2bcef18
[21:04:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[21:04:31] T244445: Include failed datasets in categorization statistics - https://phabricator.wikimedia.org/T244445
[21:04:32] T176560: Better unused images output for sparql harvests - https://phabricator.wikimedia.org/T176560
[21:26:28] !log tools.account-creator Migrated to 2020 Kubernetes cluster
[21:26:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.account-creator/SAL
[21:28:56] !log tools.ace2018 Migrated to 2020 Kubernetes cluster
[21:28:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ace2018/SAL
[21:43:54] !log tools.zppixbot prepping upgrades for zppixbotwiki to 1.34
[21:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[21:44:30] Did you see my comment a day or 2 ago asking about the tool info edit links/buttons? bd808
[21:46:13] something to the effect of asking if they should be listed as editable if not a maintainer on the tool
[21:47:23] DSquirrelGM: did you click them and find a privilege escalation? Or are you hoping that this is a good way to report your UI preferences?
[21:48:15] it proceeded to the form to edit it, but I didn't attempt to save
[21:48:45] meant more in terms of whether it was a bug or not
[21:49:07] it did not however give me an option to add one if it doesn't exist
[21:50:03] there are 2 edit forms, one for the tool maintainers and one for other authenticated users
[21:50:58] The "public" form does not allow editing the tool name, license, authors list, or webservice flag
[21:51:18] this is a wiki-like experience with some data "locked" to maintainer only
[21:51:43] so no, it is not a bug unless you have found a way to change the "protected" things without being a maintainer
[21:52:28] ok, I'll try a couple test edits and let you know the results
[21:52:50] or better yet, file a security bug if you find an exploit
[21:53:34] well, later, just got called away
[21:56:49] !log tools.zppixbot Wiki to read only & starting update for mediawiki & extensions to 1.34
[21:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:29:51] !log tools.zppixbot ZppixBotWiki update done & unlocked db/removed site notice - reset my own password as it keeps saying incorrect
[22:29:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:34:54] bd808: is this the only channel that !log tool.name works in
[22:35:05] RhinosF1: yes
[22:36:32] bd808: thanks. Hopefully I won't have to do an upgrade for a bit. (May? for 1.35).
[23:10:52] !log tools.admin testing wm-bot integration
[23:10:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[23:29:27] !log tools.bd808-k8smigrate Migrated to 2020 Kubernetes cluster
[23:29:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-k8smigrate/SAL
[23:32:42] !log tools.bd808-k8smigrate Migrated to 2020 Kubernetes cluster
[23:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-k8smigrate/SAL
[23:34:35] !log tools.bd808-k8smigrate Migrated to 2020 Kubernetes cluster
[23:34:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-k8smigrate/SAL
[23:34:52] * bd808 cannot catch it failing again
[23:35:55] !log tools.bd808-k8smigrate Rolled back to legacy k8s cluster
[23:35:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-k8smigrate/SAL
[23:36:15] !log tools.bd808-k8smigrate Migrated to 2020 Kubernetes cluster
[23:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-k8smigrate/SAL
[23:36:52] !log tools.bd808-k8smigrate Rolled back to legacy k8s cluster
[23:36:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-k8smigrate/SAL
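bd808's description of the two tool info edit forms (maintainers can change everything, while other authenticated users cannot change the tool name, license, authors list, or webservice flag) amounts to a field-level permission check. The following is a minimal sketch of that rule under stated assumptions: it is not the code behind those forms, and the field names in `PROTECTED_FIELDS` and the helper `disallowed_changes` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical field names for the data described as maintainer-only.
PROTECTED_FIELDS: Set[str] = {"name", "license", "authors", "is_webservice"}


def disallowed_changes(
    current: Dict[str, object],
    submitted: Dict[str, object],
    is_maintainer: bool,
) -> Set[str]:
    """Return the protected fields a non-maintainer's submission would change."""
    if is_maintainer:
        # Maintainers use the full form; nothing is locked for them.
        return set()
    # A field counts as changed if the submitted value differs from the stored one.
    changed = {k for k, v in submitted.items() if current.get(k) != v}
    return changed & PROTECTED_FIELDS
```

A non-empty result would mean the edit touches "locked" data and should be rejected; an exploit in the sense discussed above would be any path that saves such a change anyway.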