[00:03:17] RECOVERY - Puppet errors on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:42:54] PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[01:17:57] RECOVERY - Puppet errors on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:43:19] PROBLEM - SSH on tools-exec-1440 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:08] RECOVERY - SSH on tools-exec-1440 is OK: SSH OK - OpenSSH_6.9p1 Ubuntu-2~trusty1 (protocol 2.0)
[04:55:13] PROBLEM - High iowait on tools-exec-1440 is CRITICAL: CRITICAL: tools.tools-exec-1440.cpu.total.iowait (>11.11%)
[05:00:12] RECOVERY - High iowait on tools-exec-1440 is OK: OK: All targets OK
[06:53:47] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:28:43] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:49:52] Data-Services, DBA: LabsDB infrastructure pending work - https://phabricator.wikimedia.org/T153058#3540367 (Marostegui)
[08:56:49] Data-Services, DBA: LabsDB infrastructure pending work - https://phabricator.wikimedia.org/T153058#3540393 (jcrespo) p:Triage>Low
[10:14:14] Cloud-VPS, Wikidata: ListeriaBot logs in 28000 times a day - https://phabricator.wikimedia.org/T173777#3540601 (Magnus) It was written to do single edits, so it does log in before each one. I can try and change it if it's a problem?
[10:14:47] Cloud-VPS, Wikidata: ListeriaBot logs in 28000 times a day - https://phabricator.wikimedia.org/T173777#3540602 (Magnus) Daily list to work through: https://tools.wmflabs.org/listeria/botstatus.php
[10:15:29] Cloud-VPS, Wikidata: ListeriaBot logs in 28000 times a day - https://phabricator.wikimedia.org/T173777#3540603 (Cyberpower678) You really ought to. Simply using a session cookie to retain the login data will save some major resources IMO.
[10:16:55] Cloud-VPS, Wikidata: ListeriaBot logs in 28000 times a day - https://phabricator.wikimedia.org/T173777#3540605 (Cyberpower678) Migrating to OAuth would solve this problem as well.
[10:38:33] Cloud-Services, Cloud-VPS, Striker, DBA: Investigate moving labsdb (replicas) user credential management to 'Striker' (codename) - https://phabricator.wikimedia.org/T140832#3540654 (jcrespo) @chasemp I do not want to push for this, but I suspect this may already be done and should be resolved? Ca...
[10:56:30] cloud-services-team (Kanban), Analytics, DBA, Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3540700 (jcrespo) One thought- after setting up the new labsdbs, we said we were going to consider the #DBA part of the...
[11:26:48] (PS1) Lokal Profil: Allow by_be-tarask lists to be in the main namespace [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053
[11:27:16] VPS-Projects: Successful pilot of Discourse on https://discourse.wmflabs.org/ as an alternative to wikimedia-l mailinglist - https://phabricator.wikimedia.org/T124690#3540771 (Aklapper)
[11:27:19] VPS-Projects: Problem creating an account at https://discourse.wmflabs.org/ - https://phabricator.wikimedia.org/T125107#3540769 (Aklapper) Open>stalled @AdHuikeshoven: Can you provide more information?: >>! In T125107#1992855, @EBernhardson wrote: > I can look into what caused the failure to create...
[11:27:46] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[11:40:24] (CR) Jean-Frédéric: Allow by_be-tarask lists to be in the main namespace (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053 (owner: Lokal Profil)
[11:46:18] (PS2) Lokal Profil: Allow by_be-tarask lists to be in the main namespace [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053 (https://phabricator.wikimedia.org/T173717)
[11:46:38] (CR) Lokal Profil: ">" (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053 (https://phabricator.wikimedia.org/T173717) (owner: Lokal Profil)
[12:04:45] (CR) Jean-Frédéric: [C: 2] Allow by_be-tarask lists to be in the main namespace [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053 (https://phabricator.wikimedia.org/T173717) (owner: Lokal Profil)
[12:05:44] (Merged) jenkins-bot: Allow by_be-tarask lists to be in the main namespace [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053 (https://phabricator.wikimedia.org/T173717) (owner: Lokal Profil)
[12:06:32] (CR) jenkins-bot: Allow by_be-tarask lists to be in the main namespace [labs/tools/heritage] - https://gerrit.wikimedia.org/r/373053 (https://phabricator.wikimedia.org/T173717) (owner: Lokal Profil)
[12:07:46] RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:16:21] !log tools.heritage Deploy latest from Git master: 166f01d, 1d33262, 7177386 (T173717)
[12:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[12:16:25] T173717: Weird harvest of by_(be-tarask) - https://phabricator.wikimedia.org/T173717
[12:33:51] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:08:51] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:23:24] Cloud-VPS, cloud-services-team (Kanban), Operations, Patch-For-Review: Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3541284 (Andrew) I'm going to leave labcontrol1001 as the salt master. No sense in rebuilding this when we're going to stop using salt soon, and the...
[15:20:25] Cloud-Services, Cloud-VPS, Striker, DBA: Investigate moving labsdb (replicas) user credential management to 'Striker' (codename) - https://phabricator.wikimedia.org/T140832#3541630 (bd808) This is still wishlist status on the #striker implementation side. The idea is to replace the current servic...
[15:21:21] Cloud-Services, Cloud-VPS, Striker, DBA: Investigate moving labsdb (replicas) user credential management to 'Striker' (codename) - https://phabricator.wikimedia.org/T140832#3541651 (jcrespo) Oh, sorry. So it is done, but not by striker. Sorry for the confusion.
[15:38:34] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3541723 (Andrew) For example: ``` CPU 0 BANK 18 TSC b270de9d648a RIP !INEXACT! 10:ffffffff8146c0e8 MISC c0fe2010821cc086 ADDR 3f62282b00 TIME 1502812434 Tue Aug 15 15:5...
[15:56:30] Data-Services, DC-Ops, Operations, ops-codfw: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3541756 (Papaul) a:Papaul>madhuvishy @Cmjohnson I have no issues @madhuvishy This is complete please check and...
[16:13:45] E_TOOLONG apparently
[16:14:21] I would ditch the missing data piece
[16:14:25] it's past deadline
[16:14:29] :)
[16:35:48] Striker, cloud-services-team (Kanban): Make potential for others to see IP Address for ssh sessions explicit in Toolforge membership request process - https://phabricator.wikimedia.org/T173845#3541866 (bd808)
[16:38:42] Cloud-VPS, cloud-services-team (Kanban): Add warning about ssh ip address visibility to Cloud VPS TOU - https://phabricator.wikimedia.org/T173846#3541880 (bd808)
[16:59:15] Cloud-Services, Operations, ops-eqiad: rack/setup/install labmon1002 - https://phabricator.wikimedia.org/T165784#3541951 (RobH)
[17:58:02] Cloud-VPS, cloud-services-team (Kanban), Continuous-Integration-Infrastructure, Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3542225 (Andrew) I've just removed apache and puppetmaster from the labcontrols. Seems l...
[17:58:52] PAWS: Debugging notebook cell action/state - https://phabricator.wikimedia.org/T173416#3542227 (Jprorama) Thanks for the suggestion. I'll run a test with more limited RAM. BTW, is there any status/monitoring to observe jupyter kernel status from a notebook user end? I haven't found anything yet. >>! In...
[18:00:52] Cloud-VPS, cloud-services-team (Kanban), Operations, Patch-For-Review: Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3542228 (Andrew) I merged the patch removing puppetmaster from labcontrols. Then, on labcontrol1001, 1002, and labtestcontrol1001 I did the followin...
[18:02:47] Cloud-VPS, cloud-services-team (Kanban), Operations, Patch-For-Review: Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3542236 (Andrew)
[18:51:15] Toolforge, Tools, Developer-Relations, Toolforge-standards-committee: Make sure abandoned useful tools are properly advertised so potentially interested new maintainers could find them - https://phabricator.wikimedia.org/T159595#3542495 (Quiddity)
[18:57:14] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labmon1002 - https://phabricator.wikimedia.org/T165784#3542524 (RobH)
[18:58:50] Toolforge, Operations, Toolforge-standards-committee, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#3542533 (Quiddity)
[19:13:20] Cloud-VPS, cloud-services-team (Kanban), Operations, Patch-For-Review: Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3542622 (Andrew)
[19:18:41] Cloud-Services, Toolforge, cloud-services-team (Kanban): Build new tools puppetmaster - https://phabricator.wikimedia.org/T169350#3542649 (Andrew) deleted!
[19:20:02] PROBLEM - Host tools-puppetmaster-02 is DOWN: CRITICAL - Host Unreachable (10.68.18.245)
[19:20:39] !log tools deleted tools-puppetmaster-02, it was replaced a month ago by -01
[19:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:51] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labmon1002 - https://phabricator.wikimedia.org/T165784#3542755 (RobH) a:RobH>chasemp
[19:41:29] Cloud-Services, Operations: rack/setup/install labmon1002 - https://phabricator.wikimedia.org/T165784#3276770 (RobH) labmon1002 is now ready for cloud team implementation. I've assigned to @chasemp since he made the initial hardware request.
[23:21:26] Cloud-Services, Operations, ops-eqiad: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3543357 (RobH)
[23:48:46] Cloud-Services, Operations: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3543422 (RobH) a:RobH>chasemp
[23:49:04] Cloud-Services, Operations: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3276633 (RobH) These are both all setup and ready for cloud team to take over. Assigned to @chasemp for followup.
[23:50:02] Cloud-Services, Operations: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3543426 (chasemp) Thanks @robh