[08:27:51] jayme: your dns changes are ok to merge?
[08:29:25] marostegui: yeah
[08:29:33] jayme: ok, I will merge then
[08:29:45] marostegui: thanks!
[12:30:06] XioNoX: do you have time today to help me with the switch changes for https://phabricator.wikimedia.org/T258826? I need to do them one at a time so it'll be tedious :)
[12:30:43] andrewbogott: sure, yeah
[12:31:02] great! I'll set some downtimes
[12:37:29] XioNoX: assuming that https://gerrit.wikimedia.org/r/c/operations/dns/+/616150 looks good to you… I'm ready to start with reimaging cloudcephmon1003
[12:38:02] then planning to check that it's able to rejoin the pool post-rebuild; if it is we can move on to 1002
[12:38:35] andrewbogott: cool, need the matching dhcp file update too
[12:38:57] that was https://gerrit.wikimedia.org/r/c/operations/puppet/+/616172 which I'm currently applying to install1003
[12:40:59] andrewbogott: ok, and the matching removal of the old IPs, or do you want to do it later on?
[12:41:10] planning to do that at the end
[12:41:20] https://gerrit.wikimedia.org/r/c/operations/dns/+/616152
[12:41:29] oh wait, wrong patch
[12:41:37] https://gerrit.wikimedia.org/r/c/operations/dns/+/616151
[12:42:50] cool, lgtm!
[12:44:03] So… I think I need you to make a switch change, right? Since we're not moving to the regular private vlan but to cloud-hosts1-b-eqiad
[12:44:17] at least, I'm assuming that's why this reimage didn't work at all on Friday :)
[12:44:28] (sorry, maybe you're already on top of that)
[12:45:34] andrewbogott: yep, let me know when it's good to push the change
[12:45:41] I'm ready
[12:47:14] andrewbogott: done for cloudcephmon1003
[12:47:24] thanks! Will let you know how it goes
[12:49:04] I hope so :)
[13:25:37] XioNoX: what is the cidr for cloud-hosts1-b-eqiad? Is it just 10.64.20.0/24 ?
[13:26:20] andrewbogott: https://netbox.wikimedia.org/ipam/vlans/88/ yep
[13:26:26] thx
[13:27:41] ok. this should help… https://gerrit.wikimedia.org/r/c/operations/puppet/+/616519
[13:29:12] andrewbogott: can you add the cluster network as well?
[13:29:18] s/can/should/
[13:29:48] XioNoX: I'm pretty sure that's not needed for the mons… I'll need to add that when I start adding the new osds though.
[13:29:54] and moving the old ones
[13:29:54] ok!
[13:31:40] hm, still missing something
[13:33:44] ah!
[13:38:54] godog: might it be worth moving the pontoon docs out of your user space? it doesn't show up on wikitech search (https://wikitech.wikimedia.org/w/index.php?search=pontoon&title=Special%3ASearch&go=Go&fulltext=1&ns0=1&ns12=1&ns116=1&ns498=1)
[14:02:38] XioNoX: I'm ready to try cloudcephmon1002. lmk when you've pushed the change
[14:02:48] ok, 2min
[14:05:01] andrewbogott: done
[14:05:07] thanks!
[14:07:03] kormat: +1, moved to https://wikitech.wikimedia.org/wiki/Puppet/Pontoon
[14:07:37] \o/ <3
[14:39:15] XioNoX: I'm ready for 1001 now
[14:41:07] andrewbogott: done
[14:41:18] thx
[15:36:49] XioNoX: I'm still waiting for icinga to catch up with the dns changes but other things look good. Thanks for your help!
[15:37:14] great!! good job!
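(A minimal sketch, not from the log, of the kind of post-reimage check implied above: confirming that a host's new A record landed inside cloud-hosts1-b-eqiad, 10.64.20.0/24 per Netbox. The .eqiad.wmnet FQDN below is an assumption used only for illustration.)

    host="cloudcephmon1003.eqiad.wmnet"    # assumed post-move FQDN, not stated in the log
    ip=$(dig +short "$host" A | head -n1)  # first A record only
    case "$ip" in
      10.64.20.*) echo "OK: $host -> $ip is inside 10.64.20.0/24" ;;
      *)          echo "WARN: $host -> '$ip' is outside 10.64.20.0/24" ;;
    esac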
[16:58:47] ottomata: I wonder if you are thinking about improving the mw config mess, I think that will have to do at some point, not necessarily with etcd
[16:58:56] *be done
[16:59:27] same, IMHO, with moving the db load balancer outside of mw, but both are not easy projects
[17:01:29] so there is the "wikimedia_clusters" struct in hieradata/common.yaml. it used to be used for all kinds of things in the past but not anymore
[17:01:47] there is just one thing left that it is still used for: "which datacenters have appservers"
[17:02:31] if that could live elsewhere we could get rid of like 400 lines of yaml
[17:03:55] where is that used from?
[17:10:27] godog or mutante: any idea why icinga insists that cloudcephmon1003 is down when it is clearly up? I renamed it so it's probably fixated on the old name/IP, but I renamed two other hosts at the same time and it picked them up correctly.
[17:11:30] hm, still the wrong ip in /etc/icinga/objects/puppet_hosts.cfg
[17:12:53] I can't ping it from cumin1001 either
[17:13:03] jynus: modules/profile/manifests/pybal.pp: if $::site in keys($wikimedia_clusters['appserver']['sites']) {
[17:13:45] andrewbogott: i think that means it's still in puppetdb and the decom script needs to be run on it
[17:14:10] doesn't wmf-auto-reimage-host do that?
[17:14:49] was it run with the --rename option?
[17:15:42] hm, maybe not
[17:15:50] i think that's it then
[17:16:00] and decom won't run because there's no dns entry for the old name anymore :/
[17:16:35] direct renaming usually has a bunch of issues like this
[17:16:55] jynus: no thoughts on that, was just curious about why etcd vs mediawiki-config
[17:17:18] i'm using mediawiki-config now for event stream configuration, and it works fine but just feels a little weird
[17:17:25] yeah, I remember it required its own dedicated work just for the first value
[17:17:30] we will soon centralize the config there, even for things like production eventgate main stuff
[17:17:35] ideally it wouldn't exist
[17:17:41] andrewbogott: was it a reimage, a rename, a decom+reimage...
[17:17:44] ?
[17:17:46] aye makes sense
[17:17:55] volans: a reimage from .wikimedia.org to .eqiad.wmnet
[17:18:13] that's effectively a rename
[17:18:14] ottomata: ideally mw would use more sophisticated interfaces
[17:18:16] was https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging followed?
[17:18:32] ottomata: but it is easier to path than to build from the ground up
[17:18:35] *patch
[17:18:42] I would have decom first and then reimage with the new host, it's quicker
[17:18:43] and simpler
[17:18:46] the solution is robust in any case
[17:18:53] *hostname
[17:18:57] there was a log of work by service ops to make it work well
[17:18:59] volans: I didn't run the decom first, thought that was handled by reimage
[17:18:59] *lot
[17:19:00] andrewbogott: you can probably find the "remove from puppetdb" part in the decom script and directly run that to clean this one up
[17:19:10] anyway, I'm trying to clean things up in puppet with 'puppet node deactivate cloudcephmon1003.wikimedia.org', will see if that helps
[17:19:14] and node clean
[17:19:39] that seems to have done it!
[17:19:40] thanks all
[17:19:53] ah, i almost forgot that already, used to do that in the past, ack
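(The cleanup at [17:19:10]–[17:19:14] boils down to two Puppet commands run against the old certname; a minimal sketch, assuming it is run on a puppetmaster and that sudo is how it would be invoked:)

    old="cloudcephmon1003.wikimedia.org"   # pre-rename certname left behind in PuppetDB
    sudo puppet node deactivate "$old"     # drop the stale node from PuppetDB
    sudo puppet node clean "$old"          # revoke and remove its certificate and cached data
    # the generated /etc/icinga/objects/puppet_hosts.cfg should then catch up on
    # the monitoring host's next puppet run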
[17:49:10] access requests: starts with "just need 'restricted'" and then casually adds "write access to the SQL databases" hah, a _bit_ different
[17:49:33] also not handled in the admin module, so dunno, they should probably tag DBA on that
[17:52:12] I mean, it is also existing access
[17:52:17] for many of their members
[17:52:55] I'm a bit worried that granting access for a new employee isn't the right place to sort out retroactively formalizing a group around that access, especially when the latter sounds like it will be hard to do
[17:54:28] ah okay, you just sent the patch to add them to restricted, nvm :)
[17:54:34] cdanis: already uploaded a change for that.. yes
[17:54:49] first i thought we want a new group that is more tailored to T&S needs
[17:54:53] but changed my mind quickly
[17:55:14] he already has access to private data anyways, so the additional mwlog part seems just fine
[17:56:58] that being said i think we also should question it sometimes when it's "just like existing access for X". it's often been way more than needed (f.e. full deployment access just to get on mwmaint). not in this case.
[17:57:32] the "write access to DB" part seems like an unrelated issue unless that comes with simply being on mwmaint
[17:58:25] I think it does come with that -- isn't shell.php also a mwmaint script?
[17:58:43] (well, I guess I don't know if that's the specific mechanism they're using when they're doing that)
[17:59:08] but yeah, I do agree we should understand how they're using the access they do have, and try to make it more specific/bounded
[17:59:45] in the wdqs one we kind of went the opposite way. first spent time on making it specific and then went full root anyways.
[17:59:50] there used to be sql.php
[17:59:51] * mutante checks
[18:00:13] [mwmaint1002:~] $ sql
[18:00:14] Execute the MySQL client binary.
[18:00:33] in the distant past this meant a straight mysql console, but not today
[18:51:17] herron: so far: https://lists.wmcloud.org/postorius/lists/?all-lists I need to hook exim4 to it and also make sure hyperkitty works, and then this setup is good to go
[18:52:50] Amir1: neat
[18:53:17] ^_^
[18:56:29] hyperkitty looks quite nice :)
[18:56:46] demo https://lists.fedoraproject.org/archives/
[19:16:36] https://lists-beta.wmflabs.org/hyperkitty/
[19:16:50] cdanis: yup, you can search, etc. It's so good
[19:17:03] https://lists-beta.wmflabs.org/hyperkitty/list/test-high-volume@lists.beta.wmflabs.org/thread/TOFSYCOMTGUWZPXNZGGIK3TBRCYAKAQJ/
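(A minimal smoke-test sketch, not from the log, for the lists-beta setup above; the postorius path on lists-beta is assumed from the lists.wmcloud.org URL, and port 8024 is Mailman 3 core's default LMTP listener, which is what exim4 would deliver to:)

    base="https://lists-beta.wmflabs.org"
    curl -sfo /dev/null "$base/hyperkitty/" && echo "hyperkitty OK" || echo "hyperkitty FAIL"
    curl -sfo /dev/null "$base/postorius/lists/" && echo "postorius OK" || echo "postorius FAIL"  # path assumed
    # on the list server itself: Mailman core's LMTP listener, exim4's eventual delivery target
    nc -z localhost 8024 && echo "lmtp listener reachable" || echo "lmtp listener not reachable"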