[00:01:14] (PS126) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:06:31] (CR) Ricordisamoa: [C: -2] "PS126 splits LanguageSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:07:49] (PS127) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:14:19] (CR) Ricordisamoa: [C: -2] "PS127 splits SingleTermSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:15:09] (PS128) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:18:08] (CR) Ricordisamoa: [C: -2] "PS128 splits AliasesSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:25:44] (PS129) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:28:58] (CR) Ricordisamoa: [C: -2] "PS129 splits SitelinksSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:33:43] (PS130) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:38:48] (CR) Ricordisamoa: [C: -2] "PS130 splits StatementsSection into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:40:10] PROBLEM - Puppet errors on tools-exec-1404 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[00:40:36] (PS131) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:47:13] (CR) Ricordisamoa: [C: -2] "PS131 splits DragHelper into a module" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[01:10:10] RECOVERY - Puppet errors on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:53:52] PROBLEM - Puppet errors on tools-exec-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[02:33:53] RECOVERY - Puppet errors on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:42:48] Toolforge: node.js webservice not seeing PORT in env - https://phabricator.wikimedia.org/T176812#3638368 (bd808) The bot process seems to actually be up and running: http://tools.wmflabs.org/fb-translate-bot/ Looking at the source code for `webservice`, the environment injected PORT setting is done in the p...
[03:30:00] PROBLEM - Puppet errors on tools-exec-1425 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[04:30:01] RECOVERY - Puppet errors on tools-exec-1425 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:50:33] Tools: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3638432 (Zoranzoki21)
[05:36:30] !log admin-monitoring Clear all fullstackd instances because the project is full and the fullstack tests are failing
[05:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[06:35:02] Tool-Global-user-contributions: guc giving an error - https://phabricator.wikimedia.org/T176823#3638510 (Zoranzoki21)
[06:35:04] Tools: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3638513 (Zoranzoki21)
[06:35:27] Tool-Global-user-contributions: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3638432 (Zoranzoki21)
[06:42:03] PROBLEM - Puppet errors on tools-exec-1432 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:54:10] Tool-Global-user-contributions: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3638432 (CorrectHorseBatteryStaple) This was probably caused by [[https://github.com/wikimedia/labs-tools-gu...
[07:00:17] PROBLEM - Puppet errors on tools-bastion-05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:02:17] (Draft2) Zoranzoki21: Fix problem Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 [labs/tools/guc] - https://gerrit.wikimedia.org/r/380919
[07:03:22] (PS3) Zoranzoki21: Fix problem with listing of contributions [labs/tools/guc] - https://gerrit.wikimedia.org/r/380919 (https://phabricator.wikimedia.org/T176831)
[07:04:27] Tool-Global-user-contributions, Patch-For-Review: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3638548 (Zoranzoki21) >>! In T176831#3638540, @CorrectHorseBatteryStaple wrote: > This...
[07:22:04] RECOVERY - Puppet errors on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:35:15] RECOVERY - Puppet errors on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:31:29] (PS1) Lokal Profil: Harvest coordinate template [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380939 (https://phabricator.wikimedia.org/T176845)
[09:35:37] (PS2) Lokal Profil: Harvest coordinate template and drop lat, lon [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380939 (https://phabricator.wikimedia.org/T176845)
[10:07:41] Cloud-Services, Wikidata, Patch-For-Review, User-Ladsgroup, Wikidata-Sprint: Open view for term_full_entity_id in wb_terms table in labs - https://phabricator.wikimedia.org/T167114#3638918 (Ladsgroup) 92% of the populating the table has been done now and it will finish by Thursday, I think we...
[11:12:33] Tool-Global-user-contributions: guc giving an error - https://phabricator.wikimedia.org/T176823#3639044 (Aklapper) > When I try to use guc on a user For future reference (this task is already handled in T176831), please provide a link to "guc" - see https://mediawiki.org/wiki/How_to_report_a_bug Thanks.
[11:41:43] Technical Advice IRC meeting starting at 3 pm UTC/5 pm CEST in channel #wikimedia-tech, hosts: @addshore & @C_Fisch (WMDE) - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[13:07:00] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:18:13] Tool-Global-user-contributions, Patch-For-Review: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3639502 (Zoranzoki21) p:Triage>High
[13:29:16] Toolforge, Outreachy (Round-15): Improvements for the Toolforge 'webservice' command - https://phabricator.wikimedia.org/T175768#3602769 (Sowjanyavemuri) Hi, @Andrew @bd808 @madhuvishy I am Sowjanya. I am interested in participating in this project and have finished the microtask specified. I have alread...
[13:40:38] Tools: Make table at tool-extreg-wos sortable if possible - https://phabricator.wikimedia.org/T176873#3639619 (MarcoAurelio)
[13:54:36] Toolforge, Outreachy (Round-15): Outreachy - webservice microtask for Sowjanyavemuri - https://phabricator.wikimedia.org/T176624#3639706 (Sowjanyavemuri) Hi @srishakatux , I am Sowjanya. I am interested in contributing to this project(Improvements for the Toolforge 'webservice' command) as a part of Outr...
[14:11:59] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:45] Technical Advice IRC meeting starting in 30 minutes in channel #wikimedia-tech, hosts: @addshore & @C_Fisch (WMDE) - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:34:17] (PS11) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[14:34:47] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[14:45:24] (PS12) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[14:45:30] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[14:47:03] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[14:51:19] (PS13) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[14:51:41] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[14:53:02] (PS14) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[14:53:36] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[14:54:53] (PS15) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[14:55:18] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[14:58:43] (PS16) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[14:59:09] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[15:00:55] Technical Advice IRC meeting starting now in channel #wikimedia-tech, hosts: @addshore & @C_Fisch (WMDE) - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:08:16] (PS17) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[15:08:37] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[15:12:12] (PS18) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[15:12:43] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[15:14:42] (PS19) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[15:15:10] (CR) jerkins-bot: [V: -1] [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635) (owner: MarcoAurelio)
[15:15:57] (PS20) MarcoAurelio: [WIP] Update composer.json to use MW_CodeSniffer and fix detected issues [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/380355 (https://phabricator.wikimedia.org/T176635)
[15:17:03] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:17:56] Tool-stewardbots, Patch-For-Review: Update composer.json to use MediaWiki CodeSniffer and fix detected issues - https://phabricator.wikimedia.org/T176635#3639964 (MarcoAurelio) @bd808 @Legoktm Patch now passes jenkins. I fixed some with PHPCBF and excluded some tests which I think ain't relevant for our...
[16:09:09] (PS1) Krinkle: Fix undefined variable clusterNr [labs/tools/guc] - https://gerrit.wikimedia.org/r/381020 (https://phabricator.wikimedia.org/T176831)
[16:09:38] (CR) Krinkle: [C: 2] Fix undefined variable clusterNr [labs/tools/guc] - https://gerrit.wikimedia.org/r/381020 (https://phabricator.wikimedia.org/T176831) (owner: Krinkle)
[16:10:50] (Merged) jenkins-bot: Fix undefined variable clusterNr [labs/tools/guc] - https://gerrit.wikimedia.org/r/381020 (https://phabricator.wikimedia.org/T176831) (owner: Krinkle)
[16:12:40] PROBLEM - Puppet errors on tools-cron-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:12:46] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:13:04] PROBLEM - Puppet errors on tools-exec-1432 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:14:00] Tool-Global-user-contributions, Patch-For-Review: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3640150 (Krinkle) Open>Resolved a:Krinkle Thanks!
[16:16:08] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1428 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:16:20] PROBLEM - Puppet errors on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:17:01] PROBLEM - Puppet errors on tools-exec-1426 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:18:37] bd808: regarding meta_p.wiki, I can see the problem there because that is replicated, right? And we kind of want to avoid a default it seems. Perhaps a solution would be to make it not replicated but reflect the currently used DB? That way, I can bootstrap my tool from one hostname, and from there I can query the wiki in question, meta_p, and other wikis via its slice field.
[16:19:02] Alternatively, going with a suffix-less slice field would also work, but that'd be a breaking change (or requires a new column, which would be fine, too?)
[16:19:49] Krinkle: it's actually maintained separately on each backend host, but ... the hosts don't know how they are being talked to
[16:20:21] bd808: Ah, right, because they're actually the same host right now.
[16:20:31] I was thinking about both adding a new column that is only the sN value and changing the current sN.labsdb to sN.analytics...
[16:20:37] I'm already giving in to the illusion that they're different :P
[16:20:51] bd808: Yeah, that makes sense I think. Defensive default.
[16:21:11] there are 3 backend servers, and 2 fronting load balancers.
[16:21:21] we need diagrams!
[16:22:02] making a task about this is on my mental todo list. I should do that now :)
[16:22:54] bd808: give me the flow of how stuff works, I'll make one
[16:24:26] bd808: One Q, regarding https://tools.wmflabs.org/replag/ - is the labsdb/wiki breakdown duplicate of one of the 4 smaller tables on top (c1/c3?), or do we essentially have those 4 "new" ones, and the labsdb old one?
[16:24:29] Zppix: thanks for the offer. I need to find out the 'real' setup myself first :)
[16:24:49] bd808: ok just let me know (im always reachable via email, and IRC if im connected)
[16:25:07] Krinkle: right now the table below is related to the c1 & c3 tables above
[16:25:16] OK. Thx :)
[16:26:01] c1 == labsdb1001, c3 == labsdb1003. The breakout table is based on which of c[13] the $wikidb.labsdb service name points at
[16:27:30] I think the next generation of that display will be tabbed and go back to the "all shards" and "per-wiki" display.
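The suffix-less slice idea from the meta_p discussion above (a value that is only sN rather than sN.labsdb) can be sketched in Python. This is a minimal illustration and not the actual meta_p change: the function names `slice_shard` and `service_name`, the sample slice values, and the suffix passed in the usage note are all hypothetical.

```python
def slice_shard(slice_value: str) -> str:
    """Return the bare shard name (e.g. 's1') from a slice value like 's1.labsdb'.

    A value that is already suffix-less is returned unchanged.
    """
    shard, _, _suffix = slice_value.partition(".")
    return shard


def service_name(slice_value: str, suffix: str) -> str:
    """Combine the bare shard with a caller-chosen suffix (suffix is an assumption)."""
    return f"{slice_shard(slice_value)}.{suffix}"
```

With this, a tool would keep only the shard from the slice field, for example `slice_shard("s1.labsdb")`, and pick the suffix itself, for example `service_name("s3.labsdb", "analytics")`, so that a later rename of the suffix would not break it.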
[16:28:12] off topic about replag site, it needs some serious css overhaul
[16:28:14] probably something like /replag/web, /replag/analytics, /replag/labsdb
[16:28:44] heh. I made it look really plain like that on purpose Zppix :)
[16:29:06] bd808: i mean you could have made it at least all line up...
[16:29:21] ?
[16:29:49] the tables are somehow ragged for you?
[16:30:12] bd808: yeah a refresh fixed it i guess the site didn't load correctly
[16:30:31] probably just my end though
[16:42:08] Data-Services: Update meta_p database for new service names - https://phabricator.wikimedia.org/T176886#3640253 (bd808)
[16:42:31] Data-Services: Update meta_p database for new service names - https://phabricator.wikimedia.org/T176886#3640265 (bd808)
[16:42:33] Data-Services, cloud-services-team (FY2017-18), DBA, Goal: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3640266 (bd808)
[16:45:02] Cloud-VPS, Upstream: Designate API expects FQDN in "name" value for a recordset - https://phabricator.wikimedia.org/T176057#3640272 (bd808)
[16:46:31] Cloud-VPS, Upstream: Designate API expects FQDN in "name" value for a recordset and raises bizarre error when that expecation fails - https://phabricator.wikimedia.org/T176057#3612352 (bd808)
[16:47:22] Toolforge, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Update `sql` command to use new wiki replica servers - https://phabricator.wikimedia.org/T176688#3640277 (bd808) a:bd808
[16:49:12] https://phabricator.wikimedia.org/T176624 wow a competition
[16:50:10] (Abandoned) Zoranzoki21: Fix problem with listing of contributions [labs/tools/guc] - https://gerrit.wikimedia.org/r/380919 (https://phabricator.wikimedia.org/T176831) (owner: Zoranzoki21)
[16:50:50] Tool-Global-user-contributions, Patch-For-Review: Notice: Undefined variable: clusterNr in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/App.php on line 70 - https://phabricator.wikimedia.org/T176831#3640288 (Zoranzoki21) All is ok now.. I abandoned my patch, because is resolved.
[16:51:08] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:51:21] RECOVERY - Puppet errors on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:51:59] RECOVERY - Puppet errors on tools-exec-1426 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:52:43] RECOVERY - Puppet errors on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:52:47] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:53:03] RECOVERY - Puppet errors on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:54:49] Toolforge, Outreachy (Round-15): Improvements for the Toolforge 'webservice' command - https://phabricator.wikimedia.org/T175768#3640295 (bd808)
[16:54:51] Toolforge, Outreachy (Round-15): Outreachy - webservice microtask for Sowjanyavemuri - https://phabricator.wikimedia.org/T176624#3640294 (bd808)
[17:28:11] bd808: i set up a fresh debian jessie instance; the role seems to set up /srv/mediawiki_vagrant but it's saying 'no provider available' when i run `vagrant up` :(
[17:29:48] brion: what does `type -a vagrant` tell you? "vagrant is aliased to `/usr/local/bin/mwvagrant'" or something else?
[17:30:13] hmmm "vagrant is /usr/bin/vagrant"
[17:30:27] try `source /etc/profile.d/alias-vagrant.sh` to get the alias
[17:30:40] something must have gone awry in setup
[17:30:59] it only picks that up if you ssh in after the puppet run
[17:31:25] bd808: ok that looks happier :D
[17:31:30] :)
[17:31:38] i did log out and back in.... not sure why it didn't see it
[17:31:49] hmm
[17:31:50] thanks bd808 !
[17:32:12] "NFS requires a host-only network to be created. Please add a host-only network to the machine (with either DHCP or a static IP) for NFS to work." wot
[17:32:31] ugh
[17:32:47] i'm wondering if it got confused because i added the role via a prefix before i installed the vm instead of waiting until after setup to add the role on the instance
[17:33:07] the LXC automation I wrote is a bit fragile :/
[17:33:18] :)
[17:33:49] I have seen that error before and never really figured out WTF the problem is other than vagrant and LXC not being nice to each other
[17:34:33] I would suggest first forcing another puppet run just to be sure it's all provisioned and then rebooting the vm
[17:34:48] ok
[17:35:07] if it's still broken when it comes back up then ... yell again and I'll try to remember more things to poke
[17:35:31] seems to have been happier on a second 'vagrant up' after destroying the half-created instance
[17:35:38] oh nice
[17:35:38] i can live with that :D
[17:37:53] Cloud-VPS: DNS resolution chosing IPv6 addrs on hosts with only link-local IPv6 addresses - https://phabricator.wikimedia.org/T176891#3640404 (bd808)
[17:51:25] Tool-stewardbots, Need-volunteer, WorkType-Maintenance: Outdated MySQL handling for hat-web-tool@stewardbots - https://phabricator.wikimedia.org/T156545#3640452 (bd808) @MarcoAurelio I do not understand the bug report, but using the new db servers will help with problems like replica drift if that ha...
[17:56:50] Cloud-Services, cloud-services-team (Kanban), Wikimedia-Mailing-lists, Patch-For-Review: Create cloud-admin and archive labs-admin mailing list - https://phabricator.wikimedia.org/T167155#3640474 (RobH) Ok, I've setup an alias for the redirection, and disabled the old mailing list. emails to the...
[18:18:45] Cloud-Services, cloud-services-team (Kanban), Wikimedia-Mailing-lists, User-bd808: Create cloud-admin and archive labs-admin mailing list - https://phabricator.wikimedia.org/T167155#3640572 (RobH) a:bd808 We've gotten this working now. All emails sent to labs-admin@lists.wikimedia.org automa...
[18:30:03] Toolforge: Update Toolforge to PHP 5.6 - https://phabricator.wikimedia.org/T176897#3640641 (kaldari)
[18:44:28] Cloud-VPS (Project-requests): Request creation of webperf VPS project - https://phabricator.wikimedia.org/T176597#3640675 (Peter) @Gilles I will add all in the team when I figure out how.
[18:45:33] I've been trying to get the Kubernetes webservice running on Toolforge, but haven't had any luck....
[18:45:40] https://www.irccloud.com/pastebin/UgMDdwOQ/
[18:46:03] is there a limit to the hhvm subprocesses, or the apache processes, under vagrant? when doing batch uploads of videos to my vagrant boxes, job queue eats up several processes and further requests time out or have gateway errors until the ffmpeg subprocesses finish
[18:46:05] ugh
[18:46:44] seems to max out at 8
[18:46:45] kaldari: let me take a peek. I have a hunch that the webservice command is just confused.
[18:47:16] kaldari: bd808 perhaps it just didnt instantly kill it and it was still stopping
[18:47:31] I tried waiting a while, but no dice
[18:48:00] kaldari: qstat shows it still running on the grid. There is a bug open about this somewhere.
[18:48:03] How long should it take to stop the webservice?
[18:48:28] Yeah, I ended up restarting the gridengine webservice
[18:48:50] lemme try stopping it and checking to see if it's actually stopped...
[18:48:57] A fix that has worked for me before is to `webservice stop` then `rm service.manifest` then use qstat to find and kill things manually
[18:49:05] then start up with webservice again
[18:49:29] what can happen for some goofy reason is that the webservice watchdog keeps restarting the service
[18:49:31] oh, now everything works like magic :P
[18:50:01] I just did the same two commands and now it works fine :P
[18:50:17] brion: I don't remember if we pinned the thread count in mw-vagrant's HHVM config, but that's possible.
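The manual recovery described at 18:48:57 and 18:49:05 (stop the webservice, remove service.manifest, find and kill leftovers with qstat, start again) can be written down as an ordered command list. This is only a sketch that prints the commands and does not run anything. The helper names are hypothetical, `qdel` is assumed here as the Grid Engine command for the "kill things manually" step (the log names only qstat), `webservice start` is assumed for "start up with webservice again", and the job id in the usage note is made up.

```python
import shlex
from typing import List


def recovery_commands(job_id: str) -> List[List[str]]:
    """Ordered commands for the stuck-webservice recovery; nothing is executed."""
    return [
        ["webservice", "stop"],      # ask the webservice to stop
        ["rm", "service.manifest"],  # drop the manifest the watchdog reads
        ["qstat"],                   # look for a job still running on the grid
        ["qdel", job_id],            # assumption: kill the leftover job by id
        ["webservice", "start"],     # assumption: plain start afterwards
    ]


def render(commands: List[List[str]]) -> str:
    """Render the commands as shell lines for review before running them by hand."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Calling `print(render(recovery_commands("12345")))` lists the five steps in order, with 12345 standing in for whatever job id qstat actually reports.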
[18:50:21] googling indicates i should adjust hhvm.server.thread_count which defaults to 2x core count (which would be 2x4 -> 8) \o/
[18:50:31] there you go :)
[18:50:31] i don't see it in the config file so it's using default yeah
[18:51:39] re the 'vagrant' alias not picking up -- it picks up on regular login, but not on 'sudo bash' to root or 'screen' as myself o_O
[18:51:48] kaldari: so the old "have you tried turning it off and on again" trick. :) I think this is a race in the watchdog
[18:51:58] i can work around by calling mwvagrant explicitly or loading the alias explicitly
[18:52:01] brion: ah. screen doesn't read .profile by default
[18:52:10] bingo that'll be it :D
[18:52:15] there's a trick for that ... /me looks in screen config
[18:53:31] Cloud-VPS (Project-requests): Request creation of webperf VPS project - https://phabricator.wikimedia.org/T176597#3640729 (Peter) @Gilles fixed it now. Thank you @madhuvishy for fixing this so fast!
[18:53:34] brion: I think that "
[18:53:48] "defshell -bash" is the magic
[18:53:57] bd808 brion: that alias should go to .bashrc rather than .bash_profile right
[18:54:15] zhuyifei1999_: its in /etc/profile.d/...
[18:54:22] uh
[18:54:44] which login shells should see
[18:55:10] screen shouldn't start a login shell
[18:55:24] opinions vary :)
[18:55:36] ...
[18:55:48] i don't understand why there's a difference between login shells and non-login shells, i'll have to do some reading :D
[18:56:07] but inconsistent behavior kinda sucks
[18:56:17] brion: afaik login shells set the environment variables
[18:56:35] and non-login ones don't
[18:57:00] non-login shells are for forked sub-processes
[18:57:06] they load less config
[18:57:10] but alias, unlike environment variables, don't inherit
[18:57:13] that's pretty much the difference
[18:57:48] but screen sub-shells are really interactive environments
[18:57:59] and thus the difference of opinion on if they should be login shells or not :)
[18:58:06] sigh
[18:58:36] cloud-services-team (FY2017-18), Developer-Relations, Goal: Program 4 Outcome 1: improve documentation - https://phabricator.wikimedia.org/T166401#3640743 (Quiddity)
[18:59:43] funnnn
[19:00:04] interactive login, non-interactive login, interactive non-login, and non-interactive non-login are all different in un*x land and those differences are compounded by distro default, local system, and per-user config
[19:00:24] bd808: wait a sec, forking (I interpret as fork()) shouldn't load any new config right
[19:00:59] by default, no
[19:01:25] but being a purist vs being pragmatic are sometimes different things
[19:01:31] non-login shells is when you call a shell (not as fork) within a shell
[19:01:48] sigh whatever
[19:03:07] for bash, Bourne, and POSIX the difference between login config behavior and non-login is if $0 starts with a '-'. That changes the internal shell behavior
[19:03:34] it's not really anything more than that.
[19:04:38] The reason you might not want login shell behavior is that you may have non-reentrant settings for your login shell config. Like setting up X sockets or ssh-agents, etc
[19:05:05] things that you really want to share across many interactive sub-shells
[19:06:29] zhuyifei1999_: you are correct though that it would be nicer for mw-vagrant on labs to figure out how to hook .bashrc instead of .profile
[19:06:42] * bd808 was lazy
[19:15:08] * zhuyifei1999_ was away and will be away again
[19:33:39] PROBLEM - Puppet errors on tools-exec-1440 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[19:49:13] cloud-services-team (FY2017-18), Goal, Patch-For-Review, User-bd808: Perform core Cloud Services rebranding - https://phabricator.wikimedia.org/T168480#3640900 (bd808)
[19:49:14] cloud-services-team (FY2017-18), Goal, Patch-For-Review: Program 10 Outcome 2: Rebranding - https://phabricator.wikimedia.org/T166404#3640901 (bd808)
[19:49:31] Cloud-Services, cloud-services-team (Kanban), Wikimedia-Mailing-lists, User-bd808: Create cloud-admin and archive labs-admin mailing list - https://phabricator.wikimedia.org/T167155#3640897 (bd808) Open>Resolved a:bd808>RobH Everything looks good to me. I announced the list name...
[19:53:21] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1424 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:56:43] Cloud-Services, Wikimedia-Mailing-lists: Create cloud mailman list and archive labs-l list - https://phabricator.wikimedia.org/T175190#3640952 (bd808) a:RobH Planning on doing this during the PDT morning on 2017-09-27.
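The login-shell rule stated at 19:03:07 ($0 starting with '-') and the four interactive/login combinations listed at 19:00:04 can be sketched in Python for the bash case. This is a simplified model, not bash itself: the function names are hypothetical, the file lists follow the defaults documented in the bash manual, distros and local config may add more files, and bash's `-l`/`--login` options (another way to get a login shell) are not modeled.

```python
from typing import List


def is_login_argv0(argv0: str) -> bool:
    """A shell treats itself as a login shell when its $0 starts with '-'."""
    return argv0.startswith("-")


def bash_startup_files(login: bool, interactive: bool) -> List[str]:
    """Startup files bash reads by default for each login/interactive combination."""
    if login:
        # Login shells read the profile files whether or not they are interactive.
        return [
            "/etc/profile",
            "first found of ~/.bash_profile, ~/.bash_login, ~/.profile",
        ]
    if interactive:
        # Interactive non-login shells (a plain `bash` inside a session).
        return ["~/.bashrc"]
    # Non-interactive non-login shells, e.g. scripts.
    return ["file named by $BASH_ENV, if set"]
```

Under this model a shell started by screen as `bash` is interactive but not a login shell, so it reads only ~/.bashrc and never sees an alias defined under /etc/profile.d/, whereas one started as `-bash` (the effect of `defshell -bash`) goes through /etc/profile.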
[20:08:39] RECOVERY - Puppet errors on tools-exec-1440 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:21:40] Cloud-Services, Puppet: Make changing puppetmasters for Labs instances more easy - https://phabricator.wikimedia.org/T152941#3640998 (hashar)
[20:26:00] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:29:05] Toolforge, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Update `sql` command to use new wiki replica servers - https://phabricator.wikimedia.org/T176688#3641000 (bd808) * Followed general pattern of https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Building_packages to bui...
[20:33:25] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1424 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:06:01] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1425 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:56:37] Toolforge, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Update `sql` command to use new wiki replica servers - https://phabricator.wikimedia.org/T176688#3641298 (bd808) A second clush run showed clean for the prior failures on tools-exec-[1411,1420].tools.eqiad.wmflabs. The tools-d...
[22:28:04] PROBLEM - Puppet errors on tools-exec-1439 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:07:45] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:08:07] RECOVERY - Puppet errors on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:47:44] RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0]