[00:35:01] PROBLEM - Puppet errors on tools-exec-1409 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:59:22] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:10:01] RECOVERY - Puppet errors on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [01:54:13] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:09:24] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:24] !log tools Cleaning up stuck merges for cdnjs clones on tools-static-10 and tools-static-11 [02:19:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [02:25:22] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:51:36] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:52:40] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [02:59:13] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [03:05:25] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [03:26:38] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [03:38:55] !log tools cdnjs on tools-static-11 is up to date [03:38:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [03:43:43] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:47:13] PROBLEM - Puppet errors on tools-exec-1432 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [04:22:12] RECOVERY - Puppet errors on tools-exec-1432 is OK: OK: Less than 1.00% 
above the threshold [0.0] [04:23:40] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [04:23:49] !help Who do I need to talk to to get access to the cluebotng and cluebot3 projects on wmflabs? [04:23:49] Cobi: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [04:24:18] https://tools.wmflabs.org/?tool=cluebotng [04:24:28] https://tools.wmflabs.org/?tool=cluebot3 [04:24:51] Apparently Rich Smith and DamianZaremba [04:26:05] !log tools cdnjs on tools-static-10 is up to date [04:26:06] Ahh, so that is something they can do themselves? Last time I had asked them, they told me I'd need to contact the sysadmins [04:26:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:27:13] Cobi: yeah. they just need to use the "manage maintainers" link on https://tools.wmflabs.org/?list next to the tool name [04:27:38] you do need to be a member of the tools project first [04:28:08] How can I tell if I am a member of the tools project? I am able to login to login.tools.wmflabs.org [04:28:20] you're a member then :) [04:29:05] another way to tell is by visiting https://toolsadmin.wikimedia.org/tools/ and logging in [04:29:44] yet another way is looking at the members list on https://tools.wmflabs.org/openstack-browser/project/tools [04:30:03] err... "users" there I guess [04:33:25] Awesome. I'll contact them and let them know that they can manage it via that link. I am the owner of ClueBot NG/ClueBot III on enwiki, but had gotten busy in real life, so delegated day-to-day operations to Rich and Damian. They decided to move it from our own servers to wmflabs a while back, and because they were generally responsive to e-mail, I never really needed to be able to become cluebotng or cluebot3. 
But they've been away [04:33:56] If, for whatever reason, I'm still unable to get in contact with them, what would the process be at that point? [04:35:04] filing a phabricator task with the Tool Labs standards committee and asking to be forcefully added to those tools -- https://phabricator.wikimedia.org/project/profile/2457/ [04:35:38] it would fall under the https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Abandoned_tool_policy [04:36:23] You may be able to get things expedited a bit if they know you from your work on the wikis [04:38:18] bd808: that would break the bot if we follow the policy and remove all the passwords right? [04:38:40] zhuyifei1999_: yeah I think this would be a bit different [04:39:38] we could verify that those tool accounts are authing as [[User:ClueBot_III]] and then validate somehow that Cobi is who they say they are [04:40:04] I'd like you and the committee to handle the verification for sure [04:40:12] k [04:40:54] I know the passwords of the bot accounts, though. And have access to the e-mail addresses registered to the bots, so could reset them anyway. But I'll definitely try again to contact them. Thanks :) [04:42:15] Cobi: yeah. I'm not trying to be a jerk but we do need to do something for verification [04:42:23] Oh, obviously. [04:42:29] I'd do the same thing. [04:42:33] :) [04:43:31] as zhuyifei1999_ pointed out this isn't exactly what we wrote the Abandoned tools policy for but it's kind of related [05:30:36] Labs, DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3399517 (Marostegui) a:Marostegui [05:55:00] Labs, DBA, User-bd808, cloud-services-team (Kanban): Prepare and check storage layer for atjwiki - https://phabricator.wikimedia.org/T167715#3399535 (Marostegui) I have sanitized the tables on both sanitarium hosts and that replicated to labs. Also recreated the views on labsdb1009, labsdb1010, l...
[05:55:55] Labs, DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3399548 (Marostegui) Open>Resolved I have sanitized the tables on both sanitarium hosts and that replicated to labs. Also recreated the views on labsdb1009, labsdb1010, labsdb1011 and the... [06:58:45] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:33:44] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1425 is OK: OK: Less than 1.00% above the threshold [0.0] [07:38:23] PROBLEM - Free space - all mounts on tools-worker-1020 is CRITICAL: CRITICAL: tools.tools-worker-1020.diskspace.root.byte_percentfree (<40.00%) [08:13:44] Labs: ukwikimedia still present on labs hosts - https://phabricator.wikimedia.org/T169488#3399766 (Marostegui) [08:32:18] Labs, Labs-Infrastructure, Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#3399799 (MoritzMuehlenhoff) Yeah, that's correct, the underlying memory leak isn't fixed, only hidden by the restarts. This is likely still unfixed in stretch, there's nothing in the 2.4.4... [08:46:42] cloud-services-team, Operations, Upstream: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS - https://phabricator.wikimedia.org/T169290#3399875 (MoritzMuehlenhoff) Which NFS services/processes caused this? [08:59:05] What are the memory restrictions on wmflabs? 
I was thinking of making a webservice that uses perhaps 1GB [09:31:14] Labs: ukwikimedia still present on replicas dbs on labs hosts - https://phabricator.wikimedia.org/T169488#3400095 (Framawiki) [09:44:06] Labs, Labs-Infrastructure, DBA, Tracking: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3400123 (jcrespo) [09:44:10] Labs, DBA, Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3400124 (jcrespo) [09:44:14] Labs, DBA: enwiki_p logging vs logging_userindex returning dramatically different results - https://phabricator.wikimedia.org/T168349#3400122 (jcrespo) [10:11:00] (PS1) Addshore: Add User-Addshore to ##add [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/362962 [11:43:05] Labs: Blacklist apache from unattended-upgrades on tools puppetmaster - https://phabricator.wikimedia.org/T159254#3400577 (MoritzMuehlenhoff) > Would pinning apache to our repo for jessie in the Puppet class handle this? Unfortunately not. unattended-upgrades would still upgrade the packages. The pinning mi...
[12:33:04] PROBLEM - Puppet errors on tools-exec-1440 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:08:05] RECOVERY - Puppet errors on tools-exec-1440 is OK: OK: Less than 1.00% above the threshold [0.0] [13:14:43] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:49:38] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [13:52:44] PROBLEM - Puppet errors on tools-worker-1006 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [13:53:09] PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:53:37] PROBLEM - Puppet errors on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:54:03] PROBLEM - Puppet errors on tools-worker-1018 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:54:43] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:55:11] PROBLEM - Puppet errors on tools-flannel-etcd-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:55:19] PROBLEM - Puppet errors on tools-package-builder-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:55:27] PROBLEM - Puppet errors on tools-exec-1439 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:55:39] PROBLEM - Puppet errors on tools-exec-1422 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:55:49] PROBLEM - Puppet errors on tools-worker-1023 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [13:57:20] PROBLEM - Puppet errors on tools-services-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:57:24] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 
55.56% of data above the critical threshold [0.0] [13:57:24] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:57:32] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:57:36] PROBLEM - Puppet errors on tools-mail is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:57:44] PROBLEM - Puppet errors on tools-elastic-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:58:02] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:58:06] PROBLEM - Puppet errors on tools-exec-1436 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:58:06] PROBLEM - Puppet errors on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:58:15] PROBLEM - Puppet errors on tools-exec-1431 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [13:58:25] PROBLEM - Puppet errors on tools-proxy-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:58:25] PROBLEM - Puppet errors on tools-exec-1406 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:58:43] PROBLEM - Puppet errors on tools-exec-1420 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [13:58:51] PROBLEM - Puppet errors on tools-worker-1013 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:59:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:59:07] PROBLEM - Puppet errors on tools-exec-1440 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:59:09] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:59:22] 
PROBLEM - Puppet errors on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:59:58] PROBLEM - Puppet errors on tools-worker-1016 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:00:02] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:00:20] PROBLEM - Puppet errors on tools-worker-1014 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:00:32] PROBLEM - Puppet errors on tools-bastion-05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [14:00:42] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:00:48] PROBLEM - Puppet errors on tools-static-11 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [14:00:52] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:00:55] PROBLEM - Puppet errors on tools-exec-1417 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [14:01:03] PROBLEM - Puppet errors on tools-exec-1409 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [14:01:51] PROBLEM - Puppet errors on tools-exec-1403 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [14:02:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [14:02:21] PROBLEM - Puppet errors on tools-worker-1019 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [14:05:07] ^ known issue being worked out [14:22:51] Labs, cloud-services-team (Kanban): nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3401009 (chasemp) @andrew seems to be an interesting crop of leaks in the hopper atm 
```+--------------------------------------+-----------------------+--------+----------------... [14:28:53] Labs: Labstore nfsd processes report "sent only x when sending y bytes - shutting down socket" - https://phabricator.wikimedia.org/T169281#3401013 (chasemp) >>! In T169281#3393761, @bd808 wrote: > Quite likely related to {T169290}. @chasemp could not find other log events like this prior to the kernel upgrad... [14:32:09] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [14:32:15] RECOVERY - Puppet errors on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [14:32:21] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [14:32:25] RECOVERY - Puppet errors on tools-flannel-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:32:31] PROBLEM - Puppet errors on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [14:32:33] PROBLEM - Puppet errors on tools-worker-1010 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [14:32:35] PROBLEM - Puppet errors on tools-exec-1404 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0] [14:32:41] PROBLEM - Puppet errors on tools-redis-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [14:32:45] PROBLEM - Puppet errors on tools-exec-1441 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0] [14:32:51] RECOVERY - Puppet errors on tools-worker-1023 is OK: OK: Less than 1.00% above the threshold [0.0] [14:32:54] PROBLEM - Puppet errors on tools-k8s-etcd-03 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0] [14:33:20] RECOVERY - Puppet errors on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [14:33:26] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 
1.00% above the threshold [0.0] [14:33:28] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [14:33:46] RECOVERY - Puppet errors on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:34:14] RECOVERY - Puppet errors on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:08] RECOVERY - Puppet errors on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:10] RECOVERY - Puppet errors on tools-package-builder-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:22] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1425 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:30] PROBLEM - Puppet errors on tools-worker-1015 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:35:36] RECOVERY - Puppet errors on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:53] RECOVERY - Puppet errors on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [0.0] [14:35:57] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1422 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:36:13] RECOVERY - Puppet errors on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:33] RECOVERY - Puppet errors on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:38] RECOVERY - Puppet errors on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:49] RECOVERY - Puppet errors on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:53] RECOVERY - Puppet errors on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:55] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:55] PROBLEM - Puppet errors on tools-worker-1009 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:36:55] 
RECOVERY - Puppet errors on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:57] RECOVERY - Puppet errors on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0] [14:36:59] PROBLEM - Puppet errors on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:37:03] RECOVERY - Puppet errors on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [14:37:14] RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [14:37:22] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0] [14:37:34] RECOVERY - Puppet errors on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0] [14:37:44] RECOVERY - Puppet errors on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [14:37:50] RECOVERY - Puppet errors on tools-exec-1422 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:08] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:12] RECOVERY - Puppet errors on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:22] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:29] RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:32] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:47] RECOVERY - Puppet errors on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:53] RECOVERY - Puppet errors on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [14:38:57] RECOVERY - Puppet errors on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [14:39:03] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the 
threshold [0.0] [14:39:37] RECOVERY - Puppet errors on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [14:39:53] RECOVERY - Puppet errors on tools-exec-1436 is OK: OK: Less than 1.00% above the threshold [0.0] [14:40:13] RECOVERY - Puppet errors on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [14:40:31] RECOVERY - Puppet errors on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0] [14:40:45] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [14:40:57] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1422 is OK: OK: Less than 1.00% above the threshold [0.0] [14:40:57] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [14:41:56] RECOVERY - Puppet errors on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0] [14:41:58] RECOVERY - Puppet errors on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:10] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:30] RECOVERY - Puppet errors on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:34] RECOVERY - Puppet errors on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:45] RECOVERY - Puppet errors on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:47] RECOVERY - Puppet errors on tools-exec-1440 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:53] RECOVERY - Puppet errors on tools-k8s-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0] [14:43:33] RECOVERY - Puppet errors on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [14:45:22] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:46:04] Labs, Tool-Labs: Update php 5 
to php 7 - https://phabricator.wikimedia.org/T121022#3401068 (MGChecker) After more than a year passed: Is there any chance to reconsider that decision? It really is a pity that we can't use the cool features PHP 7.0 and 7.1 introduced at Tool-Labs. [14:55:47] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3401109 (chasemp) @Papaul, can you get this knocked out this week? (some of our goals this quarter will depend on this it seems) [14:55:50] Labs, Labs-Infrastructure, Operations, ops-codfw: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3401111 (chasemp) @Papaul, can you get this knocked out this week? (some of our goals this quarter will depend on this it seems) [14:55:52] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3401110 (chasemp) @Papaul, can you get this knocked out this week? (some of our goals this quarter will depend on this it seems) [14:57:52] Labs, Labs-Infrastructure, Operations, ops-eqiad: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3401118 (chasemp) https://gerrit.wikimedia.org/r/#/c/362993/ [14:58:01] Labs, Labs-Infrastructure, Operations, ops-codfw: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3401131 (Papaul) @chasemp working on it. [14:58:17] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3401132 (Papaul) @chasemp working on it.
[14:58:31] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3401133 (Papaul) @chasemp working on it. [15:07:15] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:08:10] fnielsen: the default for grid engine jobs is 512M I think, that can be adjusted using the `-mem` command line argument -- https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Allocating_additional_memory [15:08:47] fnielsen: for grid engine and kubernetes web services the default limit is 4G -- https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Memory_limit [15:09:24] fnielsen: that can also be adjusted but it requires a phabricator task asking for a quota adjustment [15:18:54] Labs, cloud-services-team (Kanban): nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3401226 (chasemp) https://phabricator.wikimedia.org/P5669 [15:23:54] Labs, cloud-services-team (Kanban): nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3401237 (chasemp) I cleaned things up to get an idea of current state since @andrew is away for the moment. >>! In T165555#3401226, @chasemp wrote: > https://phabricator.wikime... [15:24:05] Labs, cloud-services-team (Kanban): nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3401238 (chasemp) first run post clear and restart of nova-fullstack seems ok [15:24:59] Labs, Tool-Labs: Update php 5 to php 7 - https://phabricator.wikimedia.org/T121022#1867173 (zhuyifei1999) >>! In T121022#3401068, @MGChecker wrote: > After more than a year passed: Is there any chance to reconsider that decision? Do jessie or trusty official apt repositories or apt.wikimedia.org provide...
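For the memory question answered at 15:08 to 15:09 above, the figures quoted in the channel (512M default for grid engine jobs, 4G default limit for web services) can be checked against a planned footprint with a small sketch. This is an illustration only: `parse_size` and `fits` are hypothetical helpers, binary units are assumed, and the limits are the ones stated in the chat (one of them hedged with "I think"), not values read from the grid configuration.

```python
# Hypothetical helpers; the limits below are the figures quoted in the chat.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

GRID_JOB_DEFAULT = "512M"    # default for grid engine jobs, per the chat
WEBSERVICE_DEFAULT = "4G"    # default limit for web services, per the chat


def parse_size(text):
    """Convert a size such as '512M' or '4G' to bytes (binary units assumed)."""
    text = text.strip().upper()
    number, unit = text[:-1], text[-1]
    return int(float(number) * UNITS[unit])


def fits(requested, limit):
    """Return True when the requested size does not exceed the limit."""
    return parse_size(requested) <= parse_size(limit)


if __name__ == "__main__":
    planned = "1G"  # the footprint mentioned in the original question
    print("fits web service default:", fits(planned, WEBSERVICE_DEFAULT))
    print("fits grid job default:", fits(planned, GRID_JOB_DEFAULT))
```

Under these assumptions a 1G web service stays inside the 4G default, while the same footprint as a plain grid job would exceed 512M and need the `-mem` argument mentioned in the chat.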
[15:30:20] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [15:42:53] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:47:14] RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [15:52:14] bd808: Thanks. For some reason I never found the "Help:Tool Labs/Web" page [15:58:55] our docs... are not obvious :/ [16:22:51] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [16:26:22] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [16:28:48] Labs: ukwikimedia still present on replicas dbs on labs hosts - https://phabricator.wikimedia.org/T169488#3401574 (bd808) @Marostegui I think it would be fine to drop the replica db copies of that database. As mentioned in the commit message when added to deleted.dblist the ukwikimedia has been an external r... [16:30:48] Labs, DBA: ukwikimedia still present on replicas dbs on labs hosts - https://phabricator.wikimedia.org/T169488#3401577 (jcrespo) [16:31:03] Labs, DBA: ukwikimedia still present on replicas dbs on labs hosts - https://phabricator.wikimedia.org/T169488#3401579 (Marostegui) Thanks @bd808! Can you clean up the views and I will take care of removing the db?
[16:31:27] Labs, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3401580 (jcrespo) [16:31:45] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [16:32:05] Labs, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3399766 (jcrespo) Yes, we may need your help to update the meta database, maybe? I can take care of the actual data deletion. [16:33:09] Labs, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3401597 (jcrespo) Self-reminder, reload the replication filters, too, just in case. [16:37:57] Labs, Tracking: Create Labs project netops - https://phabricator.wikimedia.org/T169556#3401602 (ayounsi) [16:57:06] Labs, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3401733 (bd808) bah. !log fail: `[16:55] < bd808> !log Running maintain-views --all-databases --clean --replace-all --debug on labsdb1001` [17:01:19] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [17:06:44] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [17:15:54] bd808: can I have your deploy-service docs again? I lost the link [17:16:34] Zppix: is this what you are looking for? -- https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Kubernetes#Kubernetes_continuous_jobs [17:16:49] one sec [17:20:28] never mind, I figured it out, thanks anyway [17:27:32] Labs, Tool-Labs: Update php 5 to php 7 - https://phabricator.wikimedia.org/T121022#3401829 (MGChecker) >>! In T121022#3401239, @zhuyifei1999 wrote: >>>! 
In T121022#3401068, @MGChecker wrote: >> After more than a year passed: Is there any chance to reconsider that decision? > > Do jessie or trusty offici... [17:42:59] Labs, Tool-Labs: Linkwatcher spawns many processes without parent - https://phabricator.wikimedia.org/T123121#3401880 (Beetstra) @bd808 It is not that trivial, the new project would need to run coibot and linkwatcher, as they both do their share of analysis on the created db. On the other hand, an own p... [17:44:07] Hello, I had a nice colorful prompt at toollabs before. Now I've logged into my urbanecmbot tool and the prompt is gone. What's fascinating is that in my personal account (urbanecm) or another tool (wikinity) there is no change, apart from the header "this file is managed by puppet" in .bashrc and .profile. How can I fix it please? [17:46:33] Simply copying the working .bashrc from my personal account does not work (I must run . ~/.bashrc) [17:47:18] Labs, DC-Ops, Operations: labstore1005 A PCIe link training failure error on boot - https://phabricator.wikimedia.org/T169286#3401905 (chasemp) p:Triage>High >>! In T169286#3393637, @Andrew wrote: > I tagged dc-ops because... have y'all ever seen something like this? @Christopher ^ We had t... [17:47:22] Never mind, I've figured it out. I don't know why it didn't try to use .profile :) [17:47:49] Urbanecm: good to know! glad it worked out [17:48:14] But making some files managed by puppet without letting users know isn't a good idea :) [17:48:17] chasemp: ^ [17:48:53] Urbanecm: afaik it's not, something changed where bash is no longer enforced at a specific system level and instead defaults flow from ldap [17:48:57] possibly the cause? [17:49:01] I'm not sure why it changed for you [17:52:03] Another problem. One of my python webscripts is using the HTML module. But I can import it only from my personal account, not from the bot. See https://pastebin.com/vcVpk259 .
[17:53:15] Why can't I import the same module when using another system user? [17:53:56] Labs-project-icinga2, User-Zppix: Make Icinga2-wm bot use IRC auth - https://phabricator.wikimedia.org/T167807#3401980 (Zppix) Open>Resolved [17:54:17] chasemp: [17:55:35] Urbanecm: is sys.path different there? [17:55:38] between them [17:55:57] python -c 'import sys; print(sys.path)' [17:56:54] Hmm, there is /home/urbanecm/.local/lib/python2.7/site-packages [17:56:56] HTML.__file__ on the user would be interesting [17:57:09] ah yeah, so it must be installed locally for the user [17:57:17] But why is it there? [17:57:36] Yeah, it is. But it was working previously... [17:58:12] no clue on that front [17:59:18] Great, I've copied the whole .local folder and it works now. [18:04:17] Labs, Operations: nfs-manage failover script needs to be tested with real load and fixed - https://phabricator.wikimedia.org/T169570#3402054 (chasemp) [18:04:57] Labs, Operations: nfs-manage failover script needs to be tested with real load and fixed - https://phabricator.wikimedia.org/T169570#3402071 (chasemp) `fuser -k` introduction or some such is possibly an addition? With nfs-kernel-server stopped file integrity issues from clients shouldn't be an issue but... [18:05:01] Labs, Operations: nfs-manage failover script needs to be tested with real load and fixed - https://phabricator.wikimedia.org/T169570#3402072 (chasemp) p:Triage>High [18:41:25] Labs, Tool-Labs, cloud-services-team (Kanban): Build new tools puppetmaster - https://phabricator.wikimedia.org/T169350#3402149 (bd808) p:Triage>High Marking as high because we need to get clush running again somewhere sooner rather than later. [18:46:21] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:49:26] I remember some plan page to rename labs to cloud but can't find it [18:50:10] bd808?
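The import difference worked out between 17:52 and 17:59 above comes down to a per-user site-packages directory that is on `sys.path` for one account and not the other. A minimal sketch for checking this from either account, assuming Python 3 (the chat was on python2.7, where `importlib.util.find_spec` is not available); `describe_module` is a hypothetical helper name:

```python
import importlib.util
import site


def describe_module(name):
    """Return (origin, from_user_site) for a top-level module name.

    origin is the file the module would be loaded from, or None when the
    module cannot be found on the current sys.path.
    from_user_site is True only when that file lives under the per-user
    site-packages directory of the account running the check.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None, False
    origin = spec.origin
    user_site = site.getusersitepackages()
    from_user_site = bool(origin) and origin.startswith(user_site)
    return origin, from_user_site


if __name__ == "__main__":
    # Run once per account and compare the results.
    print("user site-packages:", site.getusersitepackages())
    print("HTML module:", describe_module("HTML"))
```

If the personal account reports an origin under its own `.local` directory and the tool account reports `None`, the module was installed for that one user only, which matches what the channel concluded.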
[18:50:18] I wasn't pinging him [18:50:23] i was meaning under his user. [18:50:30] https://wikitech.wikimedia.org/wiki/User:BryanDavis/Rebranding_Cloud_Services_products [18:50:33] revi: https://wikitech.wikimedia.org/wiki/User:BryanDavis/Rebranding_Cloud_Services_products linked from https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction [18:50:36] :) [18:50:39] thanks [18:50:52] oh thanks [18:51:04] T166404 is the annual plan tracking task for it [18:51:07] T166404: Program 10 Outcome 2: Rebranding - https://phabricator.wikimedia.org/T166404 [18:52:18] small Q: why cloud-l@list.wm.o? [18:52:25] can we drop -l suffixv [18:52:30] s/suffixv/suffix? [18:53:00] (while not a policy it seems no -l is a standard https://meta.wikimedia.org/wiki/Mailing_lists/Standardization ) [18:54:18] * TabbyCat likes the -l [18:54:27] reminds me I am mailing a mailing list [18:54:38] so old fashioned, I know [18:54:45] **lists**.wikimedia.org also does that [18:58:38] revi: no good reason. I think someone else also mentioned that the -l suffix was being used less and less. 
I really don't care at all either way [18:58:47] hmm k [18:59:02] I don't mind about this like 'ohmygod this is an evil' so [19:00:18] even if it was named cloud@ I would say 'cloud-l' out loud to mean the mailing list :) [19:01:50] xD [19:01:52] nini anyway [19:01:57] I really need to sleep now [19:26:21] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [19:51:57] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3379828 (Dzahn) labtestcontrol2003.wikimedia.org has address 208.80.153.75 [19:52:25] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3379811 (Dzahn) labtestservices2003.wikimedia.org has address 208.80.153.109 [19:52:41] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3379794 (Dzahn) labtestservices2002.wikimedia.org has address 208.80.153.76 [19:53:16] Labs, Labs-Infrastructure, Operations, ops-codfw, Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3379777 (Dzahn) labtestmetal2001.codfw.wmnet has address 10.192.20.11 [20:13:52] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:29:07] Labs: Request creation of wembedder labs project - https://phabricator.wikimedia.org/T169580#3402434 (Fnielsen) [20:30:50] Labs, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3402456 (bd808) >>! In T169488#3401733, @bd808 wrote: > bah. 
!log fail: `[16:55] < bd808> !log Running maintain-views --all-databases --clean --repl... [20:36:15] Striker, Epic, Patch-For-Review, User-bd808, cloud-services-team (FY2017-18): Manage shared tool accounts via Striker - https://phabricator.wikimedia.org/T149458#3402476 (bd808) a:bd808 [20:53:53] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [21:10:41] anomie: ping [21:10:52] could you check which API error am I triggering? [21:11:14] a script is failing and won't let me send globalblock requests [21:12:14] Error occured in API request while attempting to block 178.137.136.76. Please check whether your input is valid. Script has been terminated. [21:21:34] Labs: Create Labs project netops - https://phabricator.wikimedia.org/T169556#3402615 (Peachey88) [21:48:13] PROBLEM - Puppet errors on tools-worker-1027 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [22:23:11] RECOVERY - Puppet errors on tools-worker-1027 is OK: OK: Less than 1.00% above the threshold [0.0] [23:17:21] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [23:25:03] Labs, DBA: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3402799 (bd808) > seemed to add quite a large number of missing views Not necessarily true. I used `--replace-all`. 
[23:37:14] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Set up load balancing for new XTools - https://phabricator.wikimedia.org/T169590#3402820 (kaldari) [23:39:09] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Set up load balancing for new XTools - https://phabricator.wikimedia.org/T169590#3402820 (kaldari) [23:47:29] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3402868 (DannyH) p:Normal>High [23:57:22] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]