[06:07:48] !log tools.qrcode-generator Add pre-generated category while uploading (T242190)
[06:07:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.qrcode-generator/SAL
[06:07:50] T242190: QRCode-Generator: Add pre-generated title, description and category while uploading - https://phabricator.wikimedia.org/T242190
[10:30:22] !log toolsbeta remove puppet prefixes `toolsbeta-test-proxy`, `toolsbeta-k8s-master`, `toolsbeta-flannel-etcd`, no longer in use
[10:30:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:35:34] !log toolsbeta remove local changes in the puppet tree in toolsbeta-puppetmaster-03 (docker mount point)
[10:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:27:15] !log tools.qrcode-generator Deploying T242190
[15:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.qrcode-generator/SAL
[15:27:18] T242190: QRCode-Generator: Add pre-generated title, description and category while uploading - https://phabricator.wikimedia.org/T242190
[16:42:34] !log admin restarting l3 agents on cloudnets in codfw1dev after applying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/584188/
[16:42:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:44:48] !log admin [codfw1dev] installing package neutron-openvswitch-agent in cloudvirt2002-dev (T248881)
[16:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:44:50] T248881: CloudVPS: research VXLAN implentation for neutron - https://phabricator.wikimedia.org/T248881
[16:55:12] !log cloudinfra dropping `_psl.wmcloud.org` record (T168677)
[16:55:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[16:55:14] T168677: Add new Cloud Services domains to public suffix list - https://phabricator.wikimedia.org/T168677
[16:56:03] !log tools dropping `_psl.toolforge.org` TXT record (T168677)
[16:56:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:22:32] !log tools disabled puppet across tools-k8s-worker-[1-55].tools.eqiad.wmflabs T248702
[18:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:22:36] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
[18:28:03] !log tools Beginning rolling depool, remount, repool of k8s workers for T248702
[18:28:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:28:05] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
[19:04:41] !log phabricator setting profile::tlsproxy::envoy::timeout: 60 in project hiera to fix puppet
[19:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[19:11:02] I just noticed about 200 emails flooded my spam folder about 4 hours ago
[19:11:58] paladox, phabricator.phabricator.eqiad.wmflabs is yours, right? I've just added half a dozen hiera values to try to get puppet working there but it's still broken; could you have a look?
[19:12:23] sure
[19:16:03] thanks
[19:18:28] !log deployment-prep restarting puppetdb on deployment-puppetdb03 and restarted apache2 on deployment-puppetmaster04 but puppet runs still fail everywhere
[19:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[19:18:51] Krenair: puppet is broken throughout deployment-prep. I'm guessing it's the puppetdb OOM but a restart seems not to help.
[19:19:44] hm
[19:21:30] Betacommand: possibly related to T248731 getting fixed?
[19:21:30] T248731: tools.wmflabs.org email isn't received (puppet failure at tools-mail-02) - https://phabricator.wikimedia.org/T248731
[19:21:37] andrewbogott, are you sure it didn't just take some time to come back up?
[19:21:55] Krenair: possible! rechecking
[19:21:58] andrewbogott, it seems to work now but was clearly broken for most of the last 24 hours
[19:22:07] I just ran it on -deploy01 fine
[19:22:17] I recall puppetdb being slow to start
[19:22:36] I agree, it seems better
[19:22:39] like hm-why-did-that-not-fix-everything kind of slow :)
[19:22:46] now I see
[19:22:53] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::tlsproxy::envoy::timeout' (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 50) on node deployment-restbase01.deployment-prep.eqiad.wmflabs
[19:22:59] yeah
[19:23:05] which is not specific to deployment-prep, it's the same thing I just pinged paladox about
[19:23:11] some instances have problems like that
[19:23:14] I guess I should dig up whatever upstream thing broke that
[19:23:19] need to run around fixing them when they appear :/
[19:23:23] andrewbogott it appears its failing on:
[19:23:37] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/tlsproxy/envoy.pp#L72
[19:23:55] It would be so nice if prod and cloud vps did not have completely forked hiera lookup paths :/
[19:24:35] Though why dosen't setting profile::envoy::ensure: absent, stop it from installing the class?
[19:24:46] paladox: that's just after I fixed a bunch of things. I think the main issue is that that class is totally unconfigured for VMs
[19:25:36] jbond42: if you're around, I think this might be related to your recent work
[19:26:06] probably but also it's probably only used in classes that aren't often included in cloud VPS
[19:26:16] that also take a load of config
[19:26:24] it's just this envoy stuff is being put everywhere now
[19:26:26] maybe but enough places that it broke the very first two VMs I spot-checked
[19:26:26] * Krenair shrug
[19:26:53] which two were those?
[19:27:08] phabricator.phabricator and deployment-restbase01.deployment-prep
[19:27:15] figures
[19:28:08] sorry I'm in a 'meh, everything is terrible and this is fine' mood at the moment :)
[19:31:15] me too but working on puppet breakages at least won't accidentally break other things, mostly :)
[19:31:55] :D
[19:45:12] I recently started getting errors when using the shared Pywikibot through cron: ImportError: No module named pathlib2, which is included in pywikibot.bot. did something change there?
[19:48:09] dungodung: we did not uninstall system packages for the job grid if that's what you are asking. No idea if the requirement is new in pywikibot or not.
[19:48:31] the shared pywikibot "install" is a blessing and a curse ;)
[19:48:51] I guess andrewbogott we could add a if statement here https://github.com/wikimedia/puppet/blob/819633abad3eecfd7a041aa8b284c481e3a110c9/modules/profile/manifests/envoy.pp#L10 so if its absent, don't install it?
[19:48:59] well, it *is* the currently recommended way, as per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Pywikibot
[19:49:00] thats my quick solution :P
[19:50:49] bd808, I guess installing that python package wouldn't hurt, if it is indeed a new one?
[19:53:58] andrewbogott: you need to have the following hiera in cloud.yaml
[19:54:04] if you give me a sec i can do a patch
[19:54:33] dungodung: it does seem to exist for python2 in the Debian repos -- . I wonder if it is needed if you use python3 instead?
[19:54:56] * bd808 is not excited about the number of tools that need to move from py2 to py3
[19:55:10] bd808, oh, I still use python2 actually
[19:56:27] and python locally (within the tool) is just python2, maybe on the gridengine it got turned into python3?
[19:57:29] dungodung: both runtimes are present on the job grid. `python` is a synonym for `python2`, `python3` is Python 3.5.3
[19:58:03] bd808, hmm, then how come python spews that error with cron?
[19:59:28] dungodung: my phrasing was probably confusing. It would be technically possible to install python-pathlib2 system wide, but it is not currently installed
[19:59:54] paladox: andrewbogott: i have just mereged https://gerrit.wikimedia.org/r/c/operations/puppet/+/584690 can you confirm if that fixes the issue
[20:00:21] bd808, oh, so it does appear to have been added recently by pwb? if so, then shouldn't other maintainers be experiencing the same problem?
[20:00:23] dungodung: but you might try running your scripts under python3 to see if that is an alternate fix
[20:00:37] dungodung: probably, yes
[20:00:48] bd808, oh, that requires switching to python3, which is just a no go for me right now :)
[20:01:05] still fails jbond42
[20:01:17] paladox: what is the failing server
[20:01:19] bd808, so wouldn't installing it system wide just solve the problem? :)
[20:01:26] jbond42:
[20:01:26] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Operator '[]' is not applicable to an Undef Value. (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 72, column: 28) on node phabricator.phabricator.eqiad.wmflabs
[20:01:50] dungodung: that's pushing your problem up to me and only a temporary fix, but yes :)
[20:02:45] bd808, well, if it makes sure that the recommended setup works for everyone, I'd argue it's a benefit for all :)
[20:02:51] T213287
[20:02:51] T213287: Drop support of Python 2.7 - https://phabricator.wikimedia.org/T213287
[20:03:10] Just a note that pywikibot is working on removing py2
[20:03:14] dungodung: a phabricator task tagged to https://phabricator.wikimedia.org/project/view/3978/ is the way to officially ask for new packages
[20:03:31] and calling that page on wikitech official... make me want to unlink it
[20:03:41] heh
[20:04:06] paladox: any rason this cant be upgraded to puppet 5.5 and facter 3
[20:04:15] Also https://phabricator.wikimedia.org/T213287#5195721
[20:04:27] the issues is that facter 2.5 doesn;t have $facts['networking']['interfaces']
[20:04:55] yeh
[20:09:13] paladox: can you elaborate as the phab puppet masters are marked as migrated on https://phabricator.wikimedia.org/T241719
[20:09:42] jbond42 but they are not migrated on phabricator
[20:09:51] *phabricator.phabricator.eqiad.wmflabs
[20:09:59] so still runs puppet 4 (and facter 2)
[20:10:48] is there something you are specificly worried about the production host are running facter3 and puppet 5 without issue
[20:11:11] nope, i just didn't know they could run puppet5 (only the puppet master i thought could)
[20:11:40] no everything should be compatible with puppt5 and facter now and should be upgraded
[20:11:50] i just set:
[20:11:50] profile::base::puppet::facter_major_version: 3
[20:11:51] profile::base::puppet::puppet_major_version: 5
[20:11:53] right?
[20:12:07] let me double check but that looks right
[20:12:09] paladox: T236571 is your fix :)
[20:12:10] T236571: "phabricator" Cloud VPS project jessie deprecation - https://phabricator.wikimedia.org/T236571
[20:12:41] paladox: and also T236569
[20:12:41] T236569: "git" Cloud VPS project jessie deprecation - https://phabricator.wikimedia.org/T236569
[20:12:42] bd808 its running stretch, but need to reimagine i guess so it uses buster
[20:13:06] but y'all said you were going to shutdown both those projects right?
[20:13:12] paladox: yes that should be it
[20:13:20] great!
[20:13:23] bd808 yup
[20:13:27] migrating to devtools
[20:13:32] we should probably update them in cloud.yaml
[20:13:56] ill touch base with the cloud team tomorrow
[20:14:03] paladox: and yet they still exist 5 months after I filed the tasks
[20:14:26] * bd808 may be cranky today
[20:15:08] I mean we have some of the instances up, we just have to get the classes working.
[20:15:16] As i had hacked this so scap would work.
[20:15:26] and also hacked it for ldap
[20:15:50] jbond42 hmm, puppet failing will mean it won't add it.
[20:16:39] we've got rid of a few instances though bd808
[20:17:32] paladox: sure, but 5 months ago it was a "we are doing this now!" ask for new resources
[20:17:37] paladox: ho many servers need fixing?
[20:18:16] jbond42 i'm only aware of phabricator.phab and there's also the deployment-restbase01.deployment-prep.eqiad.wmflabs
[20:18:37] ack one sec let me fix it manualy
[20:19:32] bd808 ok
[20:21:30] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, secret(): invalid secret ssl/~.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 76, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 116) on node phabricator.phabricator.eqiad.wmflabs
[20:22:05] paladox: yep looking
[20:22:11] thanks :)
[20:32:43] !log tools.wiper-languagetool Shutdown webservice. Stuck in CrashLoopBackoff because of bad config from /data/project/wiper-languagetool/run.sh. (java.io.FileNotFoundException: /home/server.properties)
[20:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wiper-languagetool/SAL
[20:34:07] paladox: i applied https://gerrit.wikimedia.org/r/c/operations/puppet/+/584699 but im still seeing there error do yo have anything on the project puppet hiera settings?
[20:34:20] yeh, i think i set that
[20:34:22] * paladox removes
[20:35:02] jbond42
[20:35:03] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, If you want non-sni TLS to be supported, you need to define profile::tlsproxy::envoy::global_cert_name or profile::tlsproxy::envoy::acme_cert_name (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 131, column: 13)
[20:35:03] on node phabricator.phabricator.eqiad.wmflabs
[20:38:26] paladox: hieradata/role/common/phabricator.yaml has phabricator.discovery.wmnet you should be picking that up
[20:38:57] I don't think in wmcs it loads individual role hiera files
[20:39:13] oh ok try adding
[20:39:14] profile::tlsproxy::envoy::global_cert_name: "phabricator.discovery.wmnet"
[20:39:37] doing :)
[20:39:41] looks like you should also have profile::tlsproxy::envoy::websockets: true
[20:39:46] jbond42: confusing to all core SREs, but paladox is correct that nothing under hieradata/role allies in Cloud PVS projects
[20:39:53] to match production settings at least
[20:39:53] *applies
[20:39:56] hmm
[20:39:57] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find resource 'Package[php7.2-fpm]' in parameter 'require' on node phabricator.phabricator.eqiad.wmflabs
[20:40:29] php7.2-fpm should be available
[20:40:44] bd808: thanks and yes the different hiera rule and backends is a never ending source f confusion :)
[20:41:07] jbond42: blame _joe_ ;)
[20:42:30] :)
[20:43:13] paladox: thats not the most helpfull error message and its getting late here, can you file a phab task and send it to me and ill take a look tomorrow
[20:43:23] sure
[20:43:26] cheers
[20:44:21] jbond42 https://phabricator.wikimedia.org/T248921
[20:44:39] i think it may be comming from https://github.com/wikimedia/puppet/blob/770e21262382015275eab3de08faac4fca11fa3d/modules/profile/manifests/phabricator/httpd.pp#L20
[20:44:52] oh
[20:44:56] i see the issue jbond42
[20:45:06] the phabricator class seemed to remove stretch support
[20:48:08] paladox: possible ill tak a look tomorro, if you get a fix please add me as a reviewer ad ill look at/merge that. thanks
[20:48:16] ok
[21:11:42] jbond42: if we have code in the puppet repo that requires a new version of puppet then we need to upgrade every VM
[21:11:53] That issue is appearing in various places, not just on paladox's host
[21:12:04] (sorry if you already covered that, I'm still reading the backscroll)
[21:24:02] !log tools.cgstat Shutown webservice. Associated 'catgraph' project was deleted 2020-03-18.
[21:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cgstat/SAL
[23:42:21] !log admin deleted "Kubernetes Cluster" and "Kubernetes Performance" dashboards T246689
[23:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[23:42:24] T246689: Toolforge: cleanup legacy kubernetes cluster - https://phabricator.wikimedia.org/T246689