[02:31:06] 10Tool-Labs-tools-stewardbots: IRC SUL Unification Bot Down - https://phabricator.wikimedia.org/T144887#2613576 (10Krinkle)
[02:32:36] !log tools ran `SULWatcher/restart_SULWatcher.sh` as `tools.stewardbots` on bastion-03 to fix T144887
[02:32:37] T144887: IRC SUL Unification Bot Down - https://phabricator.wikimedia.org/T144887
[02:32:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[02:33:02] 10Tool-Labs-tools-stewardbots: IRC SUL Unification Bot Down - https://phabricator.wikimedia.org/T144887#2613580 (10Krenair) 05Open>03Resolved a:03Krenair ran `SULWatcher/restart_SULWatcher.sh` as `tools.stewardbots` on tools-bastion-03
[02:33:55] 10Tool-Labs-tools-stewardbots: IRC SUL Unification Bot Down - https://phabricator.wikimedia.org/T144887#2613583 (10Cameron11598) Thanks!
[02:41:07] 06Labs, 06Operations, 10wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2613590 (10Shizhao) >>! In T144805#2612937, @Peachey88 wrote: > @Shizhao When was the last time you logged into wikitech successfully? > > There was a mistake at one stage that resulted in a numb...
[02:43:54] 06Labs, 06Operations, 10wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2613591 (10Shizhao) ops, logged into wikitech successfully... .About 4 - 5 years ago? I don‘t remember
[03:30:21] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Demo Unicorn was created, changed by Demo Unicorn link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Demo_Unicorn edit summary: Created page with "{{Tools Access Request |Justification=Alt account for {{u|BryanDavis}} that will be used in various tutorials for Wikitech, Labs, and Tool Labs. Tool Labs membership needed to..."
[03:34:13] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Demo Unicorn was modified, changed by Demo Unicorn link https://wikitech.wikimedia.org/w/index.php?diff=820044 edit summary: template nesting here is apparently not a good idea
[04:08:25] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Demo Unicorn was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=820052 edit summary:
[05:17:02] 06Labs, 10Tool-Labs: Linkwatcher spawns many processes without parent - https://phabricator.wikimedia.org/T123121#2613655 (10Beetstra) @valhallasw - it crashed, and is now on 1213. Do you mind moving the other tasks (it is back making backlogs again)?
[07:20:27] 10Tool-Labs-tools-stewardbots: IRC SUL Unification Bot Down - https://phabricator.wikimedia.org/T144887#2613718 (10MarcoAurelio) Hmm, strange. I set few days ago a bigbrother daemon so this don't happen again (cf. {T144461}). What was wrong? Btw, thanks for restarting the bots. Although I'm not sure this was an...
[07:22:26] 10Tool-Labs-tools-stewardbots: Automatically start the irc bots - https://phabricator.wikimedia.org/T144461#2600503 (10MarcoAurelio) 05Resolved>03Open Apparently this didn't work, cf. {T144887}
[08:09:18] 10Labs-Kubernetes, 13Patch-For-Review: Locale on Kubernetes - https://phabricator.wikimedia.org/T144794#2613754 (10Magnus) Ah, as I already said in the patch review comment, C.UTF-8 doesn't work for me. It should sort É after E, but it gets sorted after Z. It works fine with en_US.UTF-8, so unless there is an...
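The sorting behaviour Magnus describes above can be reproduced from a shell; this is a minimal sketch, assuming GNU sort and that both locales are installed on the host, not the actual commands used on the Kubernetes image:

```
# With glibc's C.UTF-8, collation is essentially codepoint order, so É (U+00C9)
# ends up after Z; with en_US.UTF-8 it collates next to E.
printf 'Z\nE\nÉ\n' | LC_ALL=C.UTF-8 sort      # -> E, Z, É
printf 'Z\nE\nÉ\n' | LC_ALL=en_US.UTF-8 sort  # -> E, É, Z
```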
[08:21:11] 10Labs-Kubernetes, 13Patch-For-Review: Locale on Kubernetes - https://phabricator.wikimedia.org/T144794#2613791 (10Magnus) 05Open>03Resolved a:03Magnus Fixed it! Apparently, the PHP Collator class (https://secure.php.net/manual/en/class.collator.php ) does not rely on the system locale, so I'm using that...
[10:07:24] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[10:13:21] (03Draft1) 10Paladox: Do not show merges by the i18n-bot [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949
[10:21:03] (03PS2) 10Paladox: Do not show merges by the L10n-bot [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[10:23:18] (03PS3) 10Paladox: Do not show merges by the L10n-bot [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[10:23:19] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[10:41:11] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[11:08:19] PROBLEM - Host secgroup-lag-102 is DOWN: PING CRITICAL - Packet loss = 100%
[11:23:14] (03CR) 10Lokal Profil: "> Sorry for the delay in reviewing this." [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/302887 (owner: 10Lokal Profil)
[12:16:09] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Wikishizhao was created, changed by Wikishizhao link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Wikishizhao edit summary: Created page with "{{Tools Access Request |Justification=run [[commons:User:Panoramio_upload_bot]] and [[:zh:User:Sz-iwbot]], and other self made tools for wikimedia project. |Completed=false |U..."
[12:21:19] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Wikishizhao was modified, changed by Wikishizhao link https://wikitech.wikimedia.org/w/index.php?diff=820283 edit summary:
[12:23:35] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[14:33:20] 06Labs, 10Horizon, 13Patch-For-Review: Horizon dashboard for managing instance puppet config - https://phabricator.wikimedia.org/T91990#2614729 (10Andrew)
[14:33:22] 06Labs, 13Patch-For-Review: Convert all ldap globals into hiera variables instead - https://phabricator.wikimedia.org/T101447#2614727 (10Andrew) 05Open>03Resolved a:03Andrew
[15:27:45] !log tools.lolrrrit-wm testing some new updates to npm packages
[15:27:45] tools.lolrrrit-wm is not a valid project.
[15:27:54] !log tools.lolrrit-wm testing some new updates to npm packages
[15:27:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL, Master
[15:31:47] yuvipanda: (or anyone else who can) any chance of getting some long running queries killed on quarry (they're two of mine)
[15:34:19] 10Striker: Groups and tools only refreshed at login - https://phabricator.wikimedia.org/T144943#2615061 (10bd808)
[15:36:29] 12390 and 12396 are the queries needing killing as and when, I'm sure they'll timeout soon
[15:37:32] 06Labs, 10Labs-Infrastructure, 10Continuous-Integration-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615093 (10hashar)
[15:38:13] 06Labs, 10Labs-Infrastructure, 10Continuous-Integration-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615108 (10hashar)
[15:38:25] somehow labnet1002.eqiad.wmnet is no more reachable :(
[15:40:08] 06Labs, 10Labs-Infrastructure, 10Continuous-Integration-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615093 (10hashar) Step to reproduce: ``` ssh labnodepool1001.eqiad.wmnet user@labnodepool1001:~$ become-nodepool nodepool@labnodepool1001:~$ openst...
[15:40:29] andrewbogott: yuvipanda chasemp labnet1002.eqiad.wmnet is unreachable from labnodepool1001 apparently :(
[15:40:46] it's probably being upgraded, the keystone entry being wrong is an issue
[15:40:52] I pingd andrewbogott in releng
[15:41:05] ah great thank you. Lets follow up there so
[15:41:55] myrcx, they seem to be gone?
[15:42:13] hm
[15:42:58] Krenair: one completed, https://quarry.wmflabs.org/query/12390 is still executing (probably due to my fail of SQLfu)
[15:43:02] mysql's show processlist shows nothing, but https://quarry.wmflabs.org/query/12390 says running
[15:43:39] dunno, yuvi?
[15:48:11] 06Labs, 10Labs-Infrastructure, 10Continuous-Integration-Infrastructure: labnet1002.eqiad.wmnet: no route to host - https://phabricator.wikimedia.org/T144945#2615165 (10hashar) 05Open>03Resolved a:03Andrew labnet1002 is in maintenance but the failover did not update Keystone. The openstack CLI tool on...
[15:55:03] 10Labs-Kubernetes, 13Patch-For-Review: Locale on Kubernetes - https://phabricator.wikimedia.org/T144794#2615217 (10yuvipanda) Thanks! I will also add en_US.UTF-8 to the image. I *might* also have to add all locales, will see.
[15:57:55] 10Striker, 10Phabricator, 13Patch-For-Review: Unable to mirror repository from git.legoktm.com into diffusion - https://phabricator.wikimedia.org/T143969#2615229 (10bd808) >>! In T143969#2596494, @bd808 wrote: > @dpatrick and @faidon: do you have strong feelings about allowing Phabricator to directly replica...
[16:35:19] 06Labs, 10Tool-Labs: Use of uninitialized value in print at /usr/local/sbin/bigbrother line 210 - https://phabricator.wikimedia.org/T144955#2615362 (10valhallasw)
[16:36:08] 06Labs, 10Tool-Labs: Use of uninitialized value in print at /usr/local/sbin/bigbrother line 210 - https://phabricator.wikimedia.org/T144955#2615376 (10valhallasw) May be related to {T144461}.
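On the long-running Quarry queries discussed above (15:36-15:43): the ids 12390 and 12396 are Quarry query ids, not MySQL thread ids, which is one reason `show processlist` can look empty while the web UI still says "running". A minimal sketch of how a stuck statement is usually found and killed directly on a MariaDB/MySQL server, assuming an account with the needed PROCESS/SUPER privileges; the thread id is a placeholder:

```
# List running statements (Time is in seconds); pick the long-running thread id.
mysql -e "SHOW FULL PROCESSLIST"
# Kill just the statement, leaving the connection open; 12345 is a placeholder id.
mysql -e "KILL QUERY 12345"
```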
[16:36:08] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:36:14] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:37:24] PROBLEM - Puppet run on tools-worker-1007 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:37:47] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:38:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:38:25] PROBLEM - Puppet run on tools-exec-1210 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:38:32] ^^ puppet failures
[16:39:00] (03CR) 10Paladox: "recheck" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: 10Paladox)
[16:40:33] PROBLEM - Puppet run on tools-worker-1023 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:41:13] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:42:03] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:42:15] PROBLEM - Puppet run on tools-static-11 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:42:17] PROBLEM - Puppet run on tools-k8s-master-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:42:27] PROBLEM - Puppet run on tools-exec-1214 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:42:29] PROBLEM - Puppet run on tools-worker-1002 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:42:35] PROBLEM - Puppet run on tools-worker-1013 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:42:37] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:42:45] PROBLEM - Puppet run on tools-worker-1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:42:49] chasemp ^^ andrewbogott
[16:43:35] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:43:39] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:44:01] PROBLEM - Puppet run on tools-exec-1201 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:44:50] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:45:11] PROBLEM - Puppet run on tools-worker-1016 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:46:52] (03CR) 10Paladox: "recheck" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: 10Paladox)
[16:47:01] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:47:15] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:47:17] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:47:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:47:33] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:47:37] PROBLEM - Puppet run on tools-worker-1009 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:47:37] PROBLEM - Puppet run on tools-exec-1217 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:47:38] uhhh
[16:47:41] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:47:41] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:47:43] PROBLEM - Puppet run on tools-worker-1018 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:47:47] PROBLEM - Puppet run on tools-worker-1012 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:47:51] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:47:52] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:47:52] PROBLEM - Puppet run on tools-exec-1208 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:47:52] this is me, labstore1003 maintenance
[16:48:00] should repair itself in a few minutes
[16:48:01] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:48:23] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:48:27] PROBLEM - Puppet run on tools-worker-1017 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:48:58] yuvipanda I got npm 2 on grrrit-wm, i hope i didnt break any of your rules for doing that. I just installed npm locally and used the export to change the path of npm, npm2 is also a security upgrade. I am going to see if this will improve the bots reliability and not keep disconnecting because of a freeze.
[16:49:02] and hi
[16:49:04] :)
[16:49:21] chasemp: ouch, the file handles went stale
[16:49:32] 06Labs, 10Tool-Labs: Use of uninitialized value in print at /usr/local/sbin/bigbrother line 210 - https://phabricator.wikimedia.org/T144955#2615362 (10bd808) [[https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/toollabs/files/bigbrother;6718f6ea9fe18ae3ecee3d5d40679c50bf562f80$210|That...
[16:50:10] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:50:16] madhuvishy, mind if I quiet shinken-wm
[16:50:45] paladox sure, just make sure it is all documented on wikitech.
[16:51:08] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:51:16] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:51:16] tom29739: sure
[16:51:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:51:27] yuvipanda Ok, what do i document?
[16:51:52] PROBLEM - Puppet run on tools-worker-1006 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:52:00] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:52:00] tools.lolrrit-wm@tools-bastion-03:~$ npm -v
[16:52:01] 2.15.10
[16:52:02] :)
[16:52:02] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:52:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:52:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:52:18] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:52:24] PROBLEM - Puppet run on tools-grid-shadow is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:52:28] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:52:28] tom29739 i guess if one of the labs admins are ok with that then we can quit it, but we can enable it probaly again in a few mins
[16:52:38] PROBLEM - Puppet run on tools-exec-cyberbot is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:52:39] PROBLEM - Puppet run on tools-worker-1021 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:52:41] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:52:46] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:53:00] PROBLEM - Puppet run on tools-worker-1020 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:53:12] madhuvishy: need help, what's the plan / status? it is indeed screwey
[16:53:24] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:53:31] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:53:56] paladox everything :) on the wikitech page. all the changes you've made.
[16:53:59] chasemp: scratch is not mounted
[16:54:03] on labstore1003
[16:54:18] ok that's another thing :) looking
[16:54:43] Ah, yuvipanda like the .npmrc and the .profile changes, ok, i will do that now, under the npm 2 upgrade section :)
[16:54:51] paladox thanks!
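A sketch of the kind of per-tool npm 2 setup paladox describes above (a local install plus a PATH export picked up via `.profile`); the exact paths and `.npmrc` settings used on the tool account are not in this log, so everything below is illustrative only:

```
# Run as the tool user on the bastion; installs npm 2.x under the tool's home
# directory without touching the system npm.
cd "$HOME"
npm install npm@2                 # lands in $HOME/node_modules/.bin/npm
echo 'export PATH="$HOME/node_modules/.bin:$PATH"' >> "$HOME/.profile"
. "$HOME/.profile"
npm -v                            # should now report 2.x, e.g. 2.15.10 as shown above
```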
[16:55:19] Your welcome :)
[16:55:28] chasemp: I'm going to mount /dev/srv/scratch on to /srv/scratch
[16:55:37] madhuvishy: just did it
[16:55:42] ah ok
[16:55:49] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:55:55] (was about to type that as you said so :)
[16:55:56] @quiet shinken-wm
[16:56:02] not sure if that worked
[16:56:15] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:56:17] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:56:17] PROBLEM - Puppet run on tools-exec-1207 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:56:17] madhuvishy: I ran nfs-mount-manager umount /mnt/nfs/labstore1003-scratch on tools-webgrid-lighttpd-1403 and am rerunning puppet to see if that fixes things
[16:56:19] PROBLEM - Puppet run on tools-worker-1008 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:56:37] ok, let me kill shinken-wm
[16:56:53] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:56:55] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:56:56] madhuvishy: yeah if you run
[16:56:56] nfs-mount-manager umount /mnt/nfs/labstore1003-scratch && mount -a
[16:56:57] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:56:59] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:57:02] across tools things will be ok
[16:57:05] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:57:05] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:57:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:57:13] can you kick that off madhuvishy?
[16:57:44] chasemp: yeah
[16:57:48] and I'll look at why scratch didn't come up mounted
[16:57:51] on labstore1003
[16:58:02] sure
[16:58:10] that was going to be my next question
[16:58:53] puppet reporting broken runs is better than the old way of it getting stuck and then hours later we find out :)
[17:00:46] yuvipanda done https://wikitech.wikimedia.org/wiki/Grrrit-wm#npm_2_upgrade_changes :)
[17:00:56] thanks paladox
[17:01:24] chasemp madhuvishy am gonna go afk to eat and shower, that ok or anything I can do to help now?
[17:01:30] Your welcome :)
[17:01:41] yuvipanda: I think things are in hand madhuvishy is running the command to fix across tools
[17:01:49] ok
[17:01:50] and it only affects things that use scratch in the next few minutes
[17:01:55] (pre-fix)
[17:02:09] madhuvishy: you doing ok?
[17:02:17] chasemp: yeah
[17:02:44] I need to grab a bite as well but am here, will look into the mount thing in aminute, I get why but not the why of the why
[17:02:49] if that makes any sense at all :)
[17:02:56] :)
[17:06:51] chasemp: ok ran the command across tools, next puppet run should set things right
[17:07:33] some weird errors -
[17:07:43] https://www.irccloud.com/pastebin/7C0nnfnu/
[17:08:35] I don't know what's up there at all
[17:09:17] those paths seems either bad or nonexistent...?
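The per-host fix quoted above (16:56) can be wrapped in a loop to apply it across the Tools instances. How the fleet-wide run at 17:06 was actually orchestrated is not shown in this log, so the host list and plain ssh loop below are a sketch only:

```
# Drop the stale labstore1003 scratch mount and remount everything from fstab;
# the next puppet run then reports a clean state. Needs root on each instance.
for host in tools-webgrid-lighttpd-1403 tools-exec-1203; do
    ssh "$host" 'sudo nfs-mount-manager umount /mnt/nfs/labstore1003-scratch && sudo mount -a'
done
```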
[17:10:46] chasemp blassstt ffrrrom the passstttt
[17:10:56] no idea how or why those survived, but we used to have all those mounts
[17:11:17] probably puppet leftovers due to puppet-isms
[17:11:45] the keys were how we distributed keys! there was a perl/bash/python script that ran on labstore, fetched keys, and put them in a keys export. this was why you could never ssh in when nfs was overloaded :)
[17:11:50] the backup one predates me tho
[17:11:56] yeah - will probably go away after puppet run?
[17:11:59] hopefully
[17:12:23] I bet not as puppet only remove things it's explicitly told to
[17:12:29] I bet that's why it's hanging out now
[17:12:33] right
[17:12:38] may require manual cleanup eh
[17:13:02] every labs maintenance is a exercise in finding historical treasure :D
[17:13:47] the ssh daemon we run on precise is a backport from trusty to allow us to get rid of this NFS mount
[17:13:59] and use ssh-keys-command (or whatever it is we use now) instead
[17:16:46] yuvipanda: we can probably bring shinken back
[17:17:09] ok
[17:17:39] it should be back now
[17:17:40] RECOVERY - Puppet run on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:17:48] RECOVERY - Puppet run on tools-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:18:24] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:18:26] RECOVERY - Puppet run on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:18:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:18:40] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:19:02] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:16] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:26] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:38] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:42] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:23:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:23:29] RECOVERY - Puppet run on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:24:51] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:25:11] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0]
[17:25:11] RECOVERY - Puppet run on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:26:07] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:26:13] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[17:26:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:26:53] RECOVERY - Puppet run on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:26:59] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:01] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:13] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:21] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:24] RECOVERY - Puppet run on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:27] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:35] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:36] RECOVERY - Puppet run on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:36] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:40] RECOVERY - Puppet run on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:42] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:42] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:46] RECOVERY - Puppet run on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:48] RECOVERY - Puppet run on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:50] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:50] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:52] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:58] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:28:02] RECOVERY - Puppet run on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:28:23] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:28:31] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:30:49] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:31:15] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:31:16] RECOVERY - Puppet run on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:31:19] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:31:57] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:31:58] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:02] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:20] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:38] RECOVERY - Puppet run on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:44] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:48] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:50] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:52] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:52] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:33:26] 06Labs, 06Operations: labnet100[12].eqiad.wmnet need to be reimaged with RAID - https://phabricator.wikimedia.org/T136718#2615765 (10Andrew) 05Open>03Resolved Labnet1001 is now the live network/api host, and I just reimaged labnet1002 with a raid.
[17:36:08] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:19] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:55] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:04] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:46] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:55:41] (03CR) 10Paladox: "recheck" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: 10Paladox)
[18:08:52] yuvipanda: about?
[18:15:07] (03PS4) 10Paladox: Do not show merges by the L10n-bot also update packages [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[18:20:31] !log tools.lolrrit-wm Cherry picking https://gerrit.wikimedia.org/r/#/c/308949/
[18:20:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL, Master
[18:21:08] (03CR) 10Paladox: "I have deployed this onto grrrit-wm and it seems to works." [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: 10Paladox)
[18:21:42] !log tools.lolrrit-wm I have finished my testing now. :)
[18:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL, Master
[19:37:19] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:03:37] yuvipanda could you merge https://gerrit.wikimedia.org/r/#/c/308949/ please?
[20:03:55] It currently seems the bot hasent broke and is running so it is a sucess
[20:20:22] (03PS1) 10BBlack: update bblack ssh keys [labs/private] - 10https://gerrit.wikimedia.org/r/309100
[20:26:58] (03CR) 10BBlack: [C: 032 V: 032] update bblack ssh keys [labs/private] - 10https://gerrit.wikimedia.org/r/309100 (owner: 10BBlack)
[20:43:48] 06Labs, 13Patch-For-Review: Clarify public/private role for holmium (aka labs-ns2) - https://phabricator.wikimedia.org/T93639#2616747 (10Andrew) 05Open>03Invalid This was a note to self which is long since moot.
[21:11:24] !log tools brought labs/private.git up to date on tools-puppetmaster-01
[21:11:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[21:49:58] yuvipanda is Grafana down again?
[23:54:05] yuvipanda could you merge https://gerrit.wikimedia.org/r/#/c/308949/ please?
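For reference, the cherry-pick of change 308949 logged at 18:20 above is the kind of operation Gerrit supports through its refs/changes namespace; a minimal sketch, assuming patchset 4 (the latest shown in this log) and the usual anonymous-HTTP clone URL for the labs/tools/grrrit project:

```
# Fetch patchset 4 of change 308949 (refs/changes/<last two digits>/<change>/<patchset>)
# and apply it on top of the currently checked-out branch.
git fetch https://gerrit.wikimedia.org/r/labs/tools/grrrit refs/changes/49/308949/4
git cherry-pick FETCH_HEAD
```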