[00:02:06] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[00:03:44] valhallasw`cloud: is tools-exec-1213 dead?
[00:12:05] Labs, Tool-Labs: tools-exec-1213 looks dead - https://phabricator.wikimedia.org/T126141#2005789 (zhuyifei1999) NEW
[00:17:04] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[03:44:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[03:59:06] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[05:40:37] (Restored) Tim Landscheidt: Add usage() to take(1) [labs/toollabs] - https://gerrit.wikimedia.org/r/70058 (owner: Platonides)
[05:40:48] (PS2) Tim Landscheidt: Add options --help and --version to take [labs/toollabs] - https://gerrit.wikimedia.org/r/70058 (owner: Platonides)
[05:42:47] (CR) Tim Landscheidt: [C: 1] "I've trimmed down the original patch to --help and --version as those should work in all programs.
If there ever would be additional opti" [labs/toollabs] - https://gerrit.wikimedia.org/r/70058 (owner: Platonides)
[05:42:49] (CR) jenkins-bot: [V: -1] Add options --help and --version to take [labs/toollabs] - https://gerrit.wikimedia.org/r/70058 (owner: Platonides)
[06:03:32] (PS1) Tim Landscheidt: Let take fail if recursion failed [labs/toollabs] - https://gerrit.wikimedia.org/r/268931
[06:03:50] (CR) jenkins-bot: [V: -1] Let take fail if recursion failed [labs/toollabs] - https://gerrit.wikimedia.org/r/268931 (owner: Tim Landscheidt)
[06:42:23] (PS1) Tim Landscheidt: Make take's FD constructor explicit [labs/toollabs] - https://gerrit.wikimedia.org/r/268934
[06:42:41] (CR) jenkins-bot: [V: -1] Make take's FD constructor explicit [labs/toollabs] - https://gerrit.wikimedia.org/r/268934 (owner: Tim Landscheidt)
[06:47:00] PROBLEM - Puppet failure on tools-webgrid-generic-1405 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:21:57] RECOVERY - Puppet failure on tools-webgrid-generic-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:29:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[08:44:07] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[10:00:54] Tool-Labs-tools-Other: musikanimal/pageviews loads resources from cloudfare.com - https://phabricator.wikimedia.org/T126151#2006086 (Nemo_bis) NEW a:MusikAnimal
[10:33:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[10:43:05] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[11:10:51] RECOVERY - Puppet failure on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:42:22] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Abbe98 was created, changed by Abbe98 link
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Abbe98 edit summary: Created page with "{{Tools Access Request |Justification=For now the Ajapaik2Commons tool(https://github.com/Abbe98/ajapaik2commons), it's based on the Mapillay2Commons(https://tools.wmflabs.org..."
[11:48:57] Any admin that could take a look at my Access request https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Abbe98 ?
[12:53:15] Abbe98: note that you're not allowed to load resources from bootstrapcdn / googleapis
[12:53:29] Abbe98: please use tools-static.wmflabs.org/cdnjs for that
[12:54:25] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Abbe98 was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=290389 edit summary:
[12:54:30] okey, updating it!
[12:59:59] Updated my source code on github
[13:01:30] Thanks!
[13:13:46] valhallasw`cloud: can you check https://phabricator.wikimedia.org/T126141 ?
[13:16:18] Labs, Tool-Labs: tools-exec-1213 looks dead - https://phabricator.wikimedia.org/T126141#2006138 (valhallasw) Cleaned up continuous jobs, but there are unfortunately also a few 'task' jobs running: ``` 2558791 0.33570 radehc tools.rezabo r 01/21/2016 22:17:04 task@tools MASTER 2598902 0.33337...
[13:18:38] Labs, Tool-Labs: tools-exec-1213 looks dead - https://phabricator.wikimedia.org/T126141#2006140 (valhallasw) The rescheduled jobs are ``` 5580 0.80000 vandalstat tools.cluest Rr 01/21/2016 18:43:27 continuous 7971 0.41375 AnomieBOT- tools.anomie Rr 01/21/2016 18:35:08 continuous 703...
[13:19:30] thx
[13:27:46] valhallasw`cloud: can you kill the prb task as well? last activity of the prb.out log was on 26th
[13:27:55] zhuyifei1999_: sure
[13:28:33] zhuyifei1999_: done
[13:28:38] ok thx
[13:50:29] Any good resources on accessing the tool labs server from Putty?
keep getting the following error: "Disconnected: No supported authentication methods aviable(server sent:publickey,hostbased)", note that I have added a public key
[13:52:01] Abbe98: https://wikitech.wikimedia.org/wiki/Help:Access_to_ToolLabs_instances_with_PuTTY_and_WinSCP
[13:53:40] The https://wikitech.wikimedia.org/wiki/Help:Access_to_ToolLabs_instances_with_PuTTY_and_WinSCP is not relay helpful
[13:53:55] o.O?
[13:54:14] Abbe98: if you get that message, your public key is not set up correctly
[13:54:18] on your computer
[13:54:45] Abbe98: maek suer https://wikitech.wikimedia.org/w/images/7/7f/20130526_2133_Putty_Login_Connection_SSH_Auth.png is set correctly
[13:55:20] The private key is added there actullay
[13:55:27] something else is wrong
[13:55:51] Abbe98: How did you make your private + public key?
[13:56:20] pyuttygenerator
[13:56:42] puttygen?
[13:56:49] yes
[13:57:29] Have you pasted the public key into your Wikitech preferences?
[13:58:21] yes
[13:59:35] Try generating another public key, your existing one might be corrupted or something
[14:00:25] Abbe98: in the screen, does it report 'Server refused our key'?
[14:01:01] (the regular putty screen, not the popup)
[14:01:59] it did one time!
[14:02:47] Is this the message you are seeing: http://prntscr.com/a0560w
[14:03:26] yes
[14:03:50] Load your key into puttygen again
[14:03:55] Abbe98: your username is abbe98, not Abbe98
[14:04:29] okey I guess the username was the issue!
[14:05:59] Thanks it works now!
[14:44:23] Hello, Labs admin hereß
[14:44:25] *?
[14:46:02] problem solved
[15:39:12] Is a mwvagrant expert here?
[15:41:09] Luke081515: I'm not one but there's this page on the MediaWiki wiki: https://www.mediawiki.org/wiki/MediaWiki-Vagrant
[15:41:50] tom29739: Thanks for your help, but I got a specific question concerning step five here: https://wikitech.wikimedia.org/wiki/Help:MediaWiki-Vagrant_in_Labs
[15:42:28] Log out and log back in to pick up profile.d alias that will make the vagrant command run Vagrant as the mwvagrant shared user account. Note: skipping this step will cause problems!
[15:42:31] That step
[15:43:57] Yeah, what is meaned with "log back in to pick up profile.d alias that will make the vagrant command run Vagrant as the mwvagrant shared user account.
[15:44:01] "
[15:44:36] Simply log out of ssh
[15:44:42] then log back in again
[15:44:59] type 'logout' in the ssh terminal to logout
[15:45:04] yeah, but what is meaned with: pick up profile.d alias
[15:45:14] log in/out is not the problem ;)
[15:45:26] It will do it automatically when you log back in
[15:45:37] Luke081515: the 'vagrant' shell alias is configured in profile.d, and profile.d is only read when you login
[15:46:05] so unless you log out and in again, 'vagrant' will not work correctly
[15:46:13] ah, ok.
Thanks :)
[16:03:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[16:08:59] Labs, Labs-Infrastructure, MediaWiki-Vagrant: After MediaWiki-Vagrant installation: Could not execute mw-config - https://phabricator.wikimedia.org/T126159#2006302 (Luke081515) NEW
[17:00:12] Labs, Tool-Labs: tools-exec-1213 looks dead - https://phabricator.wikimedia.org/T126141#2006388 (valhallasw)
[17:00:14] Labs, Tool-Labs: tools-webgrid-lighttpd-1208.eqiad.wmflabs hangs - https://phabricator.wikimedia.org/T125770#2006389 (valhallasw)
[17:00:20] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#2006387 (valhallasw)
[17:02:23] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#2006403 (valhallasw) >>! In T124133#1952488, @chasemp wrote: > have we had non-webgrid examples? Yes, tools-exec-1213 is currently locked up (webgrid-lighttpd-1208 as well).
[17:13:06] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[17:19:04] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[17:23:30] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#2006439 (valhallasw) @chasemp: if you have time to look at this, could you try to ptrace sshd to see why sshd is hanging? Having a working (root) ssh login would make it much easier to figur...
[17:29:41] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006444 (valhallasw) NEW
[17:31:08] Labs, Tool-Labs: Toollabs::Cronrunner backup fails for invalid utf-8 content - https://phabricator.wikimedia.org/T126166#2006453 (valhallasw) NEW
[17:32:13] Labs, Tool-Labs: tools-docker-registry-01 has incorrect puppetmaster key - https://phabricator.wikimedia.org/T126167#2006459 (valhallasw) NEW
[17:33:26] Labs, Tool-Labs: libbytes-random-secure-perl unavailable on precise - https://phabricator.wikimedia.org/T126168#2006465 (valhallasw) NEW
[17:49:05] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[17:55:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[18:10:04] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[19:21:56] !log ores Deployed ores-wikimedia-config:0be5afc
[19:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[19:22:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[19:30:46] Labs, Tool-Labs: tools-web-static-*: Could not find dependent Package[gridengine-common] - https://phabricator.wikimedia.org/T126171#2006583 (valhallasw) NEW
[19:36:15] Labs, Tool-Labs: tools-submit: cron service stopped and puppet disabled - https://phabricator.wikimedia.org/T126172#2006591 (valhallasw) NEW
[19:36:25] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006599 (valhallasw)
[19:36:57] Labs, Tool-Labs: tools-submit: cron service stopped and puppet disabled - https://phabricator.wikimedia.org/T126172#2006591 (valhallasw)
[19:36:59] Labs, Tool-Labs: tools-web-static-*: Could not find dependent Package[gridengine-common] - https://phabricator.wikimedia.org/T126171#2006583
(valhallasw)
[19:37:03] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006444 (valhallasw)
[19:49:12] Labs, Tool-Labs: tools-submit: cron service stopped and puppet disabled - https://phabricator.wikimedia.org/T126172#2006614 (yuvipanda) Yes, since the instance is dead and cron service is on tools-cron-01. Instance should be deleted now
[19:53:59] Labs, Tool-Labs: Toollabs::Cronrunner backup fails for invalid utf-8 content - https://phabricator.wikimedia.org/T126166#2006622 (scfc) a:scfc I believe this is essentially https://tickets.puppetlabs.com/browse/PUP-3377 → https://tickets.puppetlabs.com/browse/PUP-1441. I'll test that and, if so, fix i...
[19:57:02] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[20:32:34] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Vladis13 was created, changed by Vladis13 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Vladis13 edit summary: Created page with "{{Tools Access Request |Justification=Work with scripts mostly for ru-wikisource and ru-wiki. |Completed=false |User Name=Vladis13 }}"
[20:58:34] Labs, Labs-Infrastructure, MediaWiki-Vagrant: After MediaWiki-Vagrant installation: Could not execute mw-config - https://phabricator.wikimedia.org/T126159#2006672 (bd808) Running `vagrant up` (or possibly `vagrant provision` to retry after a prior error in Puppet provisioning) will start the virtual...
[21:07:06] Labs, Tool-Labs: Toollabs::Cronrunner backup fails for invalid utf-8 content - https://phabricator.wikimedia.org/T126166#2006680 (scfc) Yep, `show_diff => false` works.
[21:19:02] Tool-Labs-tools-Other: musikanimal/pageviews loads resources from cloudfare.com - https://phabricator.wikimedia.org/T126151#2006683 (MusikAnimal) Open>Resolved Done!
Thank you
[21:23:49] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006701 (yuvipanda)
[21:23:51] Labs, Tool-Labs: tools-docker-registry-01 has incorrect puppetmaster key - https://phabricator.wikimedia.org/T126167#2006698 (yuvipanda) Open>Resolved a:yuvipanda Fixed it by switching it to the appropriate puppetmaster (tools-puppetmaster-01)
[21:24:09] Labs, Tool-Labs, Patch-For-Review: libbytes-random-secure-perl unavailable on precise - https://phabricator.wikimedia.org/T126168#2006705 (yuvipanda) Open>Resolved a:yuvipanda My bad for not checking properly :( Thanks for fixing!
[21:24:11] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006444 (yuvipanda)
[21:27:06] Tool-Labs-tools-Other: musikanimal/pageviews loads resources from cloudfare.com - https://phabricator.wikimedia.org/T126151#2006710 (Nemo_bis) Great, much better now.
[21:34:30] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:36:38] RECOVERY - Puppet failure on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:38:41] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Blocked-on-RelEng: debian-glue job is out of disk space - https://phabricator.wikimedia.org/T126176#2006720 (scfc) NEW
[21:48:15] RECOVERY - Puppet failure on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:55:21] RECOVERY - Puppet failure on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:58:58] RECOVERY - Puppet failure on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:32:24] Tool-Labs-tools-Other, Discovery, Maps: GeoHack's list of mapping service links does not include maps.wikimedia.org - https://phabricator.wikimedia.org/T113438#2006856 (Yurik) Open>Resolved a:Yurik some geohacks have added
maps.wikimedia.org to their lists. I don't think our goal is to repla...
[22:44:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Vladis13 was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=290695 edit summary:
[22:49:05] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer
[22:53:57] Labs, Tool-Labs, Continuous-Integration-Infrastructure: integration-slave-jessie-1001 is out of disk space (causes debian-glue job to fail) - https://phabricator.wikimedia.org/T126176#2006879 (hashar)
[22:58:36] Labs, Tool-Labs, Continuous-Integration-Infrastructure: integration-slave-jessie-1001 is out of disk space (causes debian-glue job to fail) - https://phabricator.wikimedia.org/T126176#2006884 (hashar) /mnt is only 20GBytes and all of that is taken by pbuilder specially /mnt/pbuilder/build/cow.* dirs....
[22:59:27] Labs, Tool-Labs, Continuous-Integration-Infrastructure: integration-slave-jessie-1001 is out of disk space (causes debian-glue job to fail) - https://phabricator.wikimedia.org/T126176#2006885 (hashar) Open>Resolved a:hashar ``` hashar@integration-slave-jessie-1001:~$ df -h /mnt Filesystem...
[22:59:46] (CR) Hashar: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/268934 (owner: Tim Landscheidt)
[23:05:22] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006907 (scfc)
[23:05:24] Labs, Tool-Labs: Toollabs::Cronrunner backup fails for invalid utf-8 content - https://phabricator.wikimedia.org/T126166#2006904 (scfc) Open>Resolved Fixed by 6a2e3d9acbc75089b04384387f4d162107933ca8 where I missed adding the "Bug:" footer.
[23:49:06] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[23:55:04] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: Server answer