[00:00:00] <wikibugs>	 Cloud-Services, Toolforge, Developer-Relations: Provide an easy way for Tool Labs tools to expose their source code - https://phabricator.wikimedia.org/T102081#1355202 (bd808) >>! In T102081#2998382, @Legoktm wrote: > Is not requiring the usage of git still a requirement for this task?  If it is, the...
[00:01:20] <wikibugs>	 Cloud-Services, wikitech.wikimedia.org, User-bd808: Update wikitech Titleblacklist - https://phabricator.wikimedia.org/T170178#3428551 (bd808) >>! In T170178#3421916, @MarcoAurelio wrote: > I am not certainly the most experienced with regex, but IMHO usernames that we want to forbid from creation sho...
[00:02:59] <wikibugs>	 Cloud-Services, Toolforge, Developer-Relations: Provide an easy way for Tool Labs tools to expose their source code - https://phabricator.wikimedia.org/T102081#3428562 (Legoktm) Well the original premise of this task that @yuvipanda created was "A way that does not require using git". @yuvipanda, cou...
[00:05:58] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech: Remove all the average edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3428577 (kaldari)
[00:06:08] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3419463 (kaldari)
[00:06:53] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3419463 (kaldari) p:Normal>High
[00:08:15] <wikibugs>	 Cloud-Services, Toolforge, Developer-Relations: Provide an easy way for Tool Labs tools to expose their source code - https://phabricator.wikimedia.org/T102081#1355202 (yuvipanda) All of the things I was thinking of and can think of now seem terrible and seem to enable our current set of terrible pra...
[00:08:38] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3419463 (Luke081515) >>! In T170103#3421859, @kaldari wrote: > @Samwilson: That's the wrong query to use. `rev_len` is the size of the entire revision,...
[00:09:59] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3419288 (kaldari) @MusikAnimal: During meeting today, we decided not to generalize this, but to just limit to the one table for now.
[00:10:25] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech: Add a slow query killer to new XTools - https://phabricator.wikimedia.org/T170013#3416681 (MaxSem) I recommend [[ https://www.percona.com/doc/percona-toolkit/LATEST/pt-kill.html | pt-kill ]].
[00:11:03] <wikibugs>	 Cloud-Services, Tool-Labs-standards-committee, Toolforge, Developer-Relations, WMF-Legal: Make sure tools can be taken over after they are abandoned - https://phabricator.wikimedia.org/T102066#3428615 (bd808)
[00:11:06] <wikibugs>	 Cloud-Services, Developer-Relations, User-bd808: Provide an easy way for Tool Labs tools to expose their source code - https://phabricator.wikimedia.org/T102081#3428611 (bd808) Open>Resolved a:bd808 >>! In T102081#3428592, @yuvipanda wrote: > All of the things I was thinking of and can th...
[00:12:05] <wikibugs>	 Toolforge, Developer-Relations: Provide an easy way for Tool Labs tools to expose their source code - https://phabricator.wikimedia.org/T102081#3428629 (bd808)
[00:12:27] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3428630 (kaldari) p:Triage>High
[00:15:20] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Add a slow query killer to new XTools - https://phabricator.wikimedia.org/T170013#3428637 (kaldari) p:Triage>High
[00:17:40] <wikibugs>	 Tool-Labs-standards-committee, Toolforge, Tools, Developer-Relations: Make sure abandoned useful tools are properly advertised so potentially interested new maintainers could find them - https://phabricator.wikimedia.org/T159595#3428644 (bd808) My suggestion to the #tool-labs-standards-committee...
[00:21:58] <wikibugs>	 Tool-Labs-standards-committee, Toolforge: Rename the Tool Labs standards committee - https://phabricator.wikimedia.org/T170363#3428649 (bd808)
[00:22:35] <wikibugs>	 Tool-Labs-standards-committee, Toolforge: Rename the Tool Labs standards committee - https://phabricator.wikimedia.org/T170363#3428662 (bd808)
[00:22:37] <wikibugs>	 cloud-services-team (FY2017-18), Goal, Patch-For-Review, User-bd808: Perform initial Cloud Services rebranding - https://phabricator.wikimedia.org/T168480#3428661 (bd808)
[00:27:22] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[00:40:25] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3428687 (kaldari)
[00:44:12] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3428688 (kaldari) @Luke081515: Correcting the query would mean doubling the number of revisions we have to look at for the user. The Edit Counter inter...
[00:55:09] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3428718 (kaldari)
[00:55:25] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3422164 (kaldari)
[00:57:21] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:00:00] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3422164 (kaldari) @MusikAnimal: Does just pulling the total number itself slow things down? If so, let's remove that as well from the General statistics section.
[01:12:22] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint, JavaScript: Phabricator link on Internal Server Error page is reported as an XSS attack by noscript - https://phabricator.wikimedia.org/T170292#3428776 (Samwilson) It seems like even with replacing the filename with just the class name this could still b...
[01:12:54] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1424 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:32:32] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3321093 (Samwilson)
[01:32:35] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Set up load balancing for new XTools - https://phabricator.wikimedia.org/T169590#3428801 (Samwilson) Open>declined As we're not going to set up load-balancing for now, this ticket is done I think. Further prod configuration is being discussed in T167...
[01:35:02] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3428810 (Samwilson) Well, there are no backups of ToolsDB, so it seems like we should. I'm not sure where they should live, but if someone tells me I'll set up a daily dump. They don't have...
[01:52:54] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1424 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:05:38] <shinken-wm>	 PROBLEM - Puppet staleness on tools-worker-1020 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[02:07:14] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[02:43:53] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1424 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[02:47:14] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:51:50] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3428857 (Samwilson) Stats are at http://xtools.wmflabs.org/awstats/awstats.pl
[03:18:53] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1424 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:38:30] <wikibugs>	 Tool-Labs-standards-committee, Toolforge: Rename the Tool Labs standards committee - https://phabricator.wikimedia.org/T170363#3428891 (zhuyifei1999) I thought @harej would have created this ticket ;)
[03:48:21] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[03:53:38] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint, JavaScript: Phabricator link on Internal Server Error page is reported as an XSS attack by noscript - https://phabricator.wikimedia.org/T170292#3428901 (Matthewrbowker) >>! In T170292#3428776, @Samwilson wrote: > It seems like even with replacing the fil...
[03:55:20] <shinken-wm>	 PROBLEM - Free space - all mounts on tools-logs-02 is CRITICAL: CRITICAL: tools.tools-logs-02.diskspace._srv.byte_percentfree (<20.00%)
[04:14:49] <wikibugs>	 Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428913 (Matthewrbowker)
[04:15:29] <wikibugs>	 Tool-Labs-tools-XTools: Convert all xtools issues to Phabricator - https://phabricator.wikimedia.org/T134632#2271790 (Matthewrbowker)
[04:18:21] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:22:50] <wikibugs>	 Cloud-Services, Quarry: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3428933 (Tbayer) Don't know whether it should be considered dealbreaker, but FWIW: Apart from permalinks ensuring reproducibility of individual analyses, the history is also an accide...
[04:36:50] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3428973 (MusikAnimal) >>! In T170185#3428730, @kaldari wrote: > @MusikAnimal: Does just pulling the total number itself slow things down? If so, let's remove that as...
[04:42:46] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3419463 (MusikAnimal) When there's a will there's a way, but you should also pay mind to reverts vs non-reverts. People care about content, not that yo...
[04:46:24] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3422164 (Samwilson) Currently to calculate the total number of semi-automated edits it gets all revision comments by the user (with `SELECT rev_comment FROM $revisio...
[04:54:04] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3428985 (MusikAnimal) >>! In T167217#3428857, @Samwilson wrote: > Stats are at http://xtools.wmflabs.org/awstats/awstats.pl  This is AMAZING!!!! Thank you!  >>! In T167217#3428810, @Samwils...
[04:56:12] <wikibugs>	 Tool-Labs-tools-XTools: Convert all xtools issues to Phabricator - https://phabricator.wikimedia.org/T134632#3428986 (Samwilson) Once we turn off GH issues, the old issue queue is hidden and all previous issue links become 404s I think. Is that going to be a problem?
[04:57:42] <wikibugs>	 Tool-Labs-tools-XTools: Convert all xtools issues to Phabricator - https://phabricator.wikimedia.org/T134632#3428988 (Matthewrbowker) >>! In T134632#3428986, @Samwilson wrote: > Once we turn off GH issues, the old issue queue is hidden and all previous issue links become 404s I think. Is that going to be a p...
[04:58:44] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3321093 (bd808) Once you settle on the general configuration for your servers, I'd be glad to help figure out if creating Puppet classes would be useful to make it easier to create new serv...
[04:59:13] <wikibugs>	 Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428913 (Samwilson) Another option is to move to the Diffusion repository {rXT}, and retire both the xtools-rebirth and xtools repos. That'd mean we also use Wikimedia-style code review too (which I don't...
[05:01:21] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3428994 (Samwilson) Yes please @bd808 I'd love to learn how to do that! If you teach me, I promise I'll write whatever I learn in docs on Wikitech. :-)  It's that "cattle, not pets" thing i...
[05:01:41] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3428995 (MusikAnimal) I guess you already said there were no backups of toolsdb heh. We're only talking about usage data here, right? If that's the case we shouldn't worry too much
[05:04:22] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3428996 (Samwilson) @MusikAnimal yeah it's not a huge problem if we lose it all is it? We're not going to start backing up access logs in general, so let's not worry about that DB.  I don't...
[05:05:14] <wikibugs>	 Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428997 (Matthewrbowker) >>! In T170367#3428992, @Samwilson wrote: > Another option is to move to the Diffusion repository {rXT}, and retire both the xtools-rebirth and xtools repos. That'd mean we also use...
[05:07:50] <wikibugs>	 Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428998 (Samwilson) Yep, fair enough! I go for option A then, too. All existing links will even keep working, which is nice.
[05:09:08] <wikibugs>	 Tool-Labs-tools-XTools: Convert all xtools issues to Phabricator - https://phabricator.wikimedia.org/T134632#3428999 (Samwilson) > disallow any new ones.  Is that possible? Or do you mean just tell people not to create new ones there?
[05:13:11] <wikibugs>	 Tool-Labs-tools-XTools: Convert all xtools issues to Phabricator - https://phabricator.wikimedia.org/T134632#3429000 (Matthewrbowker) >>! In T134632#3428999, @Samwilson wrote: >> disallow any new ones. >  > Is that possible? Or do you mean just tell people not to create new ones there?  I thought we would ju...
[05:27:55] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3429005 (bd808) >>! In T167217#3428996, @Samwilson wrote: >  > I don't know if there's any problem with running the staging and the prod things with the same database user. I'd normally not...
[05:30:49] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3429007 (Samwilson) >>! In T167217#3429005, @bd808 wrote: > The current 'official' policy for getting ToolsDB/web replica credentials is to create a tool account and copy the replica.my.cnf...
[05:32:42] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3429014 (bd808) >>! In T167217#3428994, @Samwilson wrote: > Yes please @bd808 I'd love to learn how to do that! If you teach me, I promise I'll write whatever I learn in docs on Wikitech. :...
[05:33:27] <wikibugs>	 Cloud-VPS (Project-requests), artificial-intelligence: Provide large disk space to WikiBrain for memory-mapped file - https://phabricator.wikimedia.org/T161554#3429015 (Shilad) Thanks @Halfak and @Andrew This is exciting!    wikibrain-host-01 sounds great for the server itself, and the vms could be wikib...
[05:34:20] <wikibugs>	 Cloud-Services, Tools-Global-user-contributions, Labs-Sprint-107, Regression: meta_p.wiki table corrupt (contains many NULL entries for 'url' field) - https://phabricator.wikimedia.org/T106897#3429020 (Krinkle)
[05:39:49] <wikibugs>	 Tool-Labs-tools-XTools: Convert all xtools issues to Phabricator - https://phabricator.wikimedia.org/T134632#3429023 (Samwilson) The other things we could turn off are the GitHub wiki and projects features. Just to clear up the navigation there.
[05:40:17] <wikibugs>	 Cloud-VPS (Project-requests), artificial-intelligence: Provide large disk space to WikiBrain for memory-mapped file - https://phabricator.wikimedia.org/T161554#3429024 (Shilad)
[05:40:34] <wikibugs>	 Cloud-VPS (Project-requests), artificial-intelligence: Provide large disk space to WikiBrain for memory-mapped file - https://phabricator.wikimedia.org/T161554#3135155 (Shilad)
[05:42:04] <wikibugs>	 Cloud-VPS (Project-requests), artificial-intelligence: Provide large disk space to WikiBrain for memory-mapped file - https://phabricator.wikimedia.org/T161554#3135155 (Shilad) Also, I'll probably be using Docker images (we have a WikiBrain docker image). I presume that it's better to run the Docker imag...
[05:49:39] <wikibugs>	 Cloud-VPS, Puppet: role::puppetmaster::standalone has no firewall rule for port 8140 - https://phabricator.wikimedia.org/T154150#3429032 (bd808)
[05:59:07] <wikibugs>	 Cloud-Services, Continuous-Integration-Infrastructure, Beta-Cluster-reproducible, Puppet: New instances attached to a role::puppetmaster::standalone Puppetmaster need manual changes after switching from the default Puppetmaster - https://phabricator.wikimedia.org/T148929#3429039 (bd808)
[06:00:30] <wikibugs>	 Tool-Labs-tools-Pageviews, Possible-Tech-Projects: Add Mediaviews to Pageviews suite - https://phabricator.wikimedia.org/T149642#2758721 (Paputx) Very interesting: "Views of this media file",
[06:01:54] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3429045 (Samwilson) a:Samwilson
[06:04:05] <wikibugs>	 Cloud-Services, Operations, Puppet: Self hosted puppetmaster is broken - https://phabricator.wikimedia.org/T119541#3429048 (bd808) Open>Resolved WP:BOLD'ly closing this stale task. The LDAP enc is long gone now.
[06:12:17] <wikibugs>	 Cloud-Services, Puppet: Make changing puppetmasters for Labs instances more easy - https://phabricator.wikimedia.org/T152941#2864105 (bd808) p:Lowest>Low == Workaround == See https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster:  Agent: ``` $ sudo -i puppet agent -tv $ sudo rm -fR /va...
[06:27:40] <wikibugs>	 Cloud-VPS, Puppet: Invesitgate use of Puppet "modules" for per-project Puppet manifests - https://phabricator.wikimedia.org/T170370#3429083 (bd808)
[06:50:04] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3429107 (kaldari) @MusikAnimal: Whenever I use the old XTools, it always just shows me "extended" for these 3 fields, no matter who I'm looking up. Wha...
[06:59:33] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3429109 (kaldari) @MusikAnimal: Since you missed the meeting today, I should let you know that we're basically in slash and burn mode at this point. We're basically...
[07:14:25] <shinken-wm>	 PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:15:19] <shinken-wm>	 RECOVERY - Free space - all mounts on tools-logs-02 is OK: OK: All targets OK
[07:31:00] <wikibugs>	 Cloud-Services, Operations, hardware-requests: Codfw: (2) hardware access request for labtest [region 2] - https://phabricator.wikimedia.org/T161766#3429144 (faidon)
[07:31:07] <wikibugs>	 Cloud-Services, Operations, hardware-requests: eqiad: (1) hardware access request for labnodepool1002 - https://phabricator.wikimedia.org/T161753#3429148 (faidon)
[07:58:42] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3429170 (Samwilson) PR: https://github.com/x-tools/xtools-rebirth/pull/41  I've added a `?format=wikitext` option to the Page History results, which returns the req...
[07:59:25] <shinken-wm>	 RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:06:27] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Write unit tests for Xtools - https://phabricator.wikimedia.org/T165400#3429177 (Samwilson) I suspect we have written all the tests we're going to for the time being. New features will be added with tests, but I think for now this can be closed, or at least...
[08:06:40] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Write unit tests for Xtools - https://phabricator.wikimedia.org/T165400#3429179 (Samwilson) a:Samwilson
[09:04:11] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:07:08] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:12:00] <shinken-wm>	 RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3570 bytes in 0.010 second response time
[09:18:19] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:39:07] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:12:11] <shinken-wm>	 PROBLEM - Puppet errors on tools-worker-1011 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[10:33:22] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:47:14] <shinken-wm>	 RECOVERY - Puppet errors on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:13:08] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:15:56] <andrewbogott>	 !log tools restarting 'admin' webservice
[12:15:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:18:00] <shinken-wm>	 RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3570 bytes in 0.042 second response time
[13:53:22] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430531 (Marostegui) ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1075 dinwiki -e "show tables;" | wc -l 78 ```  Is this all done in production and we should go ahe...
[13:53:56] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430534 (Marostegui) ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1075 maiwiki -e "show tables;" | wc -l 83 ```  Is this all done in production and we should g...
[13:57:33] <wikibugs>	 Tool-Labs-tools-XTools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3321093 (MZMcBride) If you're going to spend time rewriting these tools, why not just make them properly supported MediaWiki extensions or add them to MediaWiki core?
[13:58:11] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430551 (Urbanecm) Ping @Dereckson. As I watched -operations, it seems like we are waiting for Apache config being merged by ops (see the main task for details) and everyth...
[13:58:35] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430558 (Urbanecm) >>! In T169193#3430531, @Marostegui wrote: > ``` > root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1075 dinwiki -e "show tables;" | wc -l > 78 > ``` >...
[13:59:00] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430560 (Marostegui) Would that be a blocker for the table sanitization?
[13:59:14] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430561 (Marostegui) ok - I will go ahead then
[13:59:20] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430562 (Marostegui) a:Marostegui
[13:59:26] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430563 (Urbanecm) I don't think so - database is created.
[14:24:08] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1417 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:24:57] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1429 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[14:31:27] <Sagan>	 max topiclen is 390 :)
[14:53:36] <shinken-wm>	 PROBLEM - Puppet errors on tools-services-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[14:53:57] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430762 (Marostegui) a:Marostegui
[14:54:42] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1424 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:55:32] <wikibugs>	 Cloud-Services, Quarry: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3430771 (Halfak) > history is also an accidental but very valuable source of knowledge  To clarify, the decision to include a query history was not accidental at all.   That was very...
[14:59:05] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:59:43] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1441 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:00:29] <chasemp>	 andrewbogott: madhuvishy I'm headed into an interview^ 
[15:00:32] <chasemp>	 something is wrong
[15:00:47] <chasemp>	 or maybe fallout from moritz's change?
[15:00:51] <chasemp>	 not sure
[15:01:06] <madhuvishy>	 chasemp: I'll look
[15:01:46] <andrewbogott>	 madhuvishy: is this the puppet failures you're talking about, or something else?
[15:02:20] <madhuvishy>	 I think he meant the puppet failures
[15:02:40] <andrewbogott>	 probably moritz then, he was going to reboot labcontrols
[15:02:47] <andrewbogott>	 or, maybe not reboot, just restart apache?
[15:02:55] <andrewbogott>	 But either way that would interrupt a bunch of puppet runs
[15:05:10] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:06:57] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1421 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[15:08:17] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:09:53] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1429 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:09:55] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430862 (Marostegui) I have sanitized sanitarium and sanitarium2 - but before creating the views I am running a check_private_data to make sure everything has been sanitized.
[15:10:04] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:10:08] <wikibugs>	 Cloud-Services, DBA, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430867 (Marostegui) I have sanitized sanitarium and sanitarium2 - but before creating the views I am running a check_private_data to make sure everything has been sanitized.
[15:13:34] <shinken-wm>	 RECOVERY - Puppet errors on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:14:05] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:14:43] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:15:25] <shinken-wm>	 PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:24:37] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1413 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:24:47] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1422 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:25:44] <shinken-wm>	 PROBLEM - Puppet errors on tools-proxy-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[15:27:30] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:30:45] <shinken-wm>	 PROBLEM - Puppet errors on tools-elastic-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:31:06] <bd808>	 I've been spot checking puppet failures and they all seem to be transient. maybe whatever problem is also flooding -ops right now?
[15:33:54] <shinken-wm>	 PROBLEM - Puppet errors on tools-worker-1009 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:36:56] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1421 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:37:29] <chasemp>	 bd808: it seems like the actual contents of /usr/local/sbin/puppet-run are being changed
[15:37:49] <chasemp>	 idk if that's going to cause some recursive weirdness or what
[15:37:56] <chasemp>	 I wouldn't think...
[15:39:22] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:39:25] <bd808>	 Notice: /Stage[main]/Base::Puppet/File[/usr/local/sbin/puppet-run]/content: content changed '{md5}3e50e4bb829be93e2c5bb4821ae21478' to '{md5}c23351dac75db77d2088e15d4c770b44'
[15:39:32] <bd808>	 Notice: Caught TERM; calling stop
[15:39:44] <bd808>	 so yeah. puppet is killing puppet. good times
[15:39:49] <chasemp>	 'tis true
[15:40:19] <bd808>	 maybe for a timeout though? not sure because there are no timestamps of course
[15:40:26] <shinken-wm>	 RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:40:44] <shinken-wm>	 RECOVERY - Puppet errors on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:42:21] <chasemp>	 seems like a splay change in puppet-run causes puppet to sigterm itself
[15:42:27] <chasemp>	 then sleep for a random interval post
[15:42:44] <chasemp>	 (my interview guy is writing some code atm but I gotta get back)
[15:43:17] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:44:07] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1417 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:44:21] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:44:39] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1424 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:45:28] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1426 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:45:46] <shinken-wm>	 RECOVERY - Puppet errors on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:46:43] <chasemp>	 !log tools push out puppet run across tools
[15:46:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:47:28] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:48:54] <shinken-wm>	 RECOVERY - Puppet errors on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:49:47] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1422 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:53:22] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-lighttpd-1425 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:54:38] <shinken-wm>	 PROBLEM - Puppet errors on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[15:56:34] <wikibugs>	 10cloud-services-team, 10DBA, 10Operations, 10Scoring-platform-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3431085 (10madhuvishy)
[16:04:36] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:08:42] <Sagan>	 what's the difference between nova and bigdisks?
[16:10:08] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:12:06] <bd808>	 Sagan: I think I need more context to understand that question
[16:12:35] <Sagan>	 bd808: when I start an instance at horizon, at the menu "details", there is a field under the name of it
[16:12:44] <Sagan>	 there I can choose between nova and bigdisks
[16:13:24] <Sagan>	 sadly horizon does not accept things like uselang=en, so I can't tell you the english name of the field
[16:13:30] <bd808>	 ah. "bigdisks" is a new zone that andrewbogott added yesterday for a particular project
[16:13:42] <bd808>	 use nova
[16:13:57] <Sagan>	 bd808: when I pick stretch as image, it automatically uses bigdisks
[16:14:11] <bd808>	 andrewbogott: ^ can we hide the bigdisks avail zone from most projects?
[16:14:20] <Sagan>	 I now have an instance with that, but I did not set it up yet, so do you think it would make sense to recreate it?
[16:14:23] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:14:53] <bd808>	 Sagan: it will be ok if you leave it, but if you have the energy you can rebuild
[16:15:04] <Sagan>	 I have it actually :)
[16:15:18] <Sagan>	 just for my interest: what's the difference between these options?
[16:16:04] <bd808>	 we have 2 brand new labvirt hosts with a lot of local storage. the new zone is for pinning certain vms to those labvirts
[16:16:35] <bd808>	 there are a couple of projects that have been waiting for these big disks
[16:17:24] <bd808>	 one is T161554
[16:17:24] <stashbot>	 T161554: Provide large disk space to WikiBrain for memory-mapped file - https://phabricator.wikimedia.org/T161554
[16:18:06] <bd808>	 another isn't in phab yet, but a plan to test a ceph storage service
[16:18:38] <Sagan>	 ah, ok
[16:19:14] <Sagan>	 bd808: can you help me with my instance? I tried to delete it, but horizon says it's in the error state now, and the operation to delete timed out
[16:19:20] <Sagan>	 it's neon.rcm.eqiad.wmflabs
[16:20:02] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:21:25] <bd808>	 Sagan: I can try ...
[16:21:36] * bd808 reads openstack cli docs
[16:21:44] <Sagan>	 bd808: thx :)
[16:24:06] <bd808>	 !log rcm Deleted instance "neon" via nova cli
[16:24:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[16:24:22] <bd808>	 poof! it's gone
[16:24:23] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:25:52] <bd808>	 !log wdq-mm  Added BryanDavis (self) as admin to work on T169653
[16:25:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wdq-mm/SAL
[16:25:54] <stashbot>	 T169653: Redirect wdq.wmflabs.org to query.wikidata.org - https://phabricator.wikimedia.org/T169653
[16:27:23] <bd808>	 !log wdq-mm  Deleted wdq.wmflabs.org proxy for T169653
[16:27:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wdq-mm/SAL
[16:28:16] <bd808>	 !log redirects added wdq.wmflabs.org proxy for T169653
[16:28:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/SAL
[16:28:22] <shinken-wm>	 RECOVERY - Puppet errors on tools-webgrid-lighttpd-1425 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:30:02] <wikibugs>	 10Cloud-Services: Redirect wdq.wmflabs.org to query.wikidata.org - https://phabricator.wikimedia.org/T169653#3431231 (10bd808) ``` $ curl -v https://wdq.wmflabs.org/ *   Trying 208.80.155.156... * TCP_NODELAY set * Connected to wdq.wmflabs.org (208.80.155.156) port 443 (#0) * TLS 1.2 connection using TLS_ECDHE_R...
[16:31:14] <bd808>	 !log wdq-mm  Removed BryanDavis (self) from project. T169653 complete
[16:31:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wdq-mm/SAL
[16:31:17] <stashbot>	 T169653: Redirect wdq.wmflabs.org to query.wikidata.org - https://phabricator.wikimedia.org/T169653
[16:32:09] <wikibugs>	 10Cloud-Services, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#3431240 (10bd808)
[16:32:11] <wikibugs>	 10Cloud-VPS, 10cloud-services-team (Kanban), 10User-bd808: Redirect wdq.wmflabs.org to query.wikidata.org - https://phabricator.wikimedia.org/T169653#3431237 (10bd808) 05Open>03Resolved a:03bd808
[16:33:19] <wikibugs>	 10Cloud-VPS, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Sunset of WDQ - https://phabricator.wikimedia.org/T153439#2880526 (10bd808)
[16:34:35] <shinken-wm>	 RECOVERY - Puppet errors on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:37:42] <halfak>	 o/ andrewbogott.  Did you see https://phabricator.wikimedia.org/T170348?  I wonder how common it is to get a request to decrease a labs quota. 
[16:42:27] <bd808>	 halfak: not common enough :)
[16:45:15] <robh>	 So does cloud handle wdqs at all?  I ask cuz we have them reporting partially down, and while i see things passing on the logs on the systems, not 100% sure whats going on
[16:45:19] <robh>	 i see yuvi used to touch it
[16:45:37] <robh>	 (but yuvi worked on a lot of stuff!)
[16:46:02] <robh>	 (just asking but i doubt it)
[16:46:30] <Sagan>	 bd808: thx for deleting :)
[16:46:50] <robh>	 bah, i see gehel mostly handles now, and he is afk, so yuvi was just involved cuz he was ;]
[16:49:26] <bd808>	 robh: yeah. the real prod stuff is gehel and SMalyshev these days I think
[16:49:32] <robh>	 yep
[16:49:41] <robh>	 was just pming the latter =]
[16:49:47] <robh>	 thank you for sharing 
[16:52:10] <bd808>	 halfak: woah. that's a big quota drop
[16:52:50] <halfak>	 Yeah, I think that Yuvi at one point was like "ahh.  let's just give you 100 of everything and trust that you won't do something ridiculous."
[16:52:53] <halfak>	 :)) 
[16:53:00] <halfak>	 Which, FWIW, I didn't do anything ridiculous. 
[16:53:23] <bd808>	 heh. when we make a "preferred customer" gold card you will certainly get an application ;)
[16:53:29] <halfak>	 :D
[16:55:23] <Sagan>	 bd808: sorry to poke you again, but the recreation of the instance failed :/. horizon says error state again, and when I'm trying to login, I get "server unexpectedly shutdown the connection"
[16:55:58] <bd808>	 did you rebuild with the same hostname?
[16:57:28] <Sagan>	 bd808: hm, yeah. is that the problem?
[16:57:36] <wikibugs>	 10Cloud-VPS (Quota-requests), 10cloud-services-team (Kanban), 10ORES, 10Scoring-platform-team, 10User-bd808: Decrease quota for ores project to 80GB ram & 40 CPUs - https://phabricator.wikimedia.org/T170348#3431393 (10bd808) a:03bd808
[16:57:45] <bd808>	 It might be. let me poke at it
[17:00:56] <bd808>	 Sagan: hmmm.. it shows as "ERROR" state and I can't ssh or ping
[17:01:23] <Sagan>	 bd808: hm. so trying to delete again, and wait some hours before recreation?
[17:01:44] <bd808>	 does it need to have that name specifically?
[17:02:07] <bd808>	 I'm not sure it has anything to do with the name though
[17:02:31] <bd808>	 my first thought was that it was stuck with stale puppet certs from the past host not getting cleaned up
[17:02:38] <bd808>	 but this looks like something else
[17:04:31] <bd808>	 Sagan: I can try force deleting again or we can ask a.ndrewbogott to try to debug the error
[17:09:09] <Sagan>	 bd808: not sure which is the best option, but I'm open for both :)
[17:10:51] <chasemp>	 what is the instance bd808?
[17:11:16] <bd808>	 chasemp: neon.rcm.eqiad.wmflabs
[17:11:48] <bd808>	 | f6e928f6-3fd7-4939-8ab1-b140f1d4da62 | neon   | ERROR  | -          | NOSTATE     | public=10.68.17.116                 |
[17:12:27] <wikibugs>	 10Cloud-VPS (Quota-requests), 10ORES, 10Scoring-platform-team, 10User-bd808: Request increase quota for ores-staging to 52GB RAM - https://phabricator.wikimedia.org/T169811#3431482 (10bd808)
[17:12:29] <wikibugs>	 10Cloud-VPS (Quota-requests), 10cloud-services-team (Kanban), 10ORES, 10Scoring-platform-team, 10User-bd808: Decrease quota for ores project to 80GB ram & 40 CPUs - https://phabricator.wikimedia.org/T170348#3431480 (10bd808) 05Open>03Resolved ``` $ nova quota-show --tenant ores +---------------------...
[17:12:34] <chasemp>	 hm
[17:12:35] <chasemp>	 {u'message': u'Build of instance f6e928f6-3fd7-4939-8ab1-b140f1d4da62 aborted: Could not clean up failed build, not rescheduling', u'code': 500, u'created': u'2017-07-12T16:56:25Z'}
[17:12:40] <chasemp>	 not very descriptive
[17:13:32] <bd808>	 there was an instance with the same name in the same project that failed to delete from horizon so I deleted from cli
[17:13:53] <chasemp>	 failure to delete can be a symptom of nova-compute being unhappy somewhere
[17:13:57] <Sagan>	 I wonder if we maybe have more creation errors? https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1&refresh=10s&from=now-3h&to=now shows some as well
[17:14:20] <Sagan>	 since about one hour
[17:14:46] <bd808>	 it did have a different uuid though -- e355aa6a-9999-4861-873c-daad727ebd81
[17:14:54] <chasemp>	 andrewbogott: looks like something is indeed up
[17:15:02] <chasemp>	 nova-fullstack shows timeouts for creation and deletion
[17:15:20] <Sagan>	 that's what horizon told me too: timeout
[17:16:07] <bd808>	 the other new thing Sagan saw today was the "bigdisks" AZ
[17:16:16] <bd808>	 may or may not be related
[17:16:25] <andrewbogott>	 bd808: that availability zone turns out to be a nonstarter anyway, I'll just delete it.
[17:17:02] <chasemp>	 andrewbogott: is that an attempt to do targeted scheduling?
[17:17:26] <chasemp>	 host-aggregates (non-mutually-exclusive) seems like the thing if so?
[17:17:59] <andrewbogott>	 chasemp: I spent a few hours yesterday trying to get that to work but couldn't.  It may be that I've misunderstood how it's supposed to work.
[17:18:11] <andrewbogott>	 I want a flavor to be automatically assigned to an aggregate but it doesn't seem to actually do that.
[17:18:44] <chasemp>	 andrewbogott: can you help me look at why creations are timing out?
[17:18:52] <chasemp>	 rabbit finally restarted in theory
[17:19:52] <chasemp>	 restarting nova-fullstack to kick off new build
[17:20:02] <wikibugs>	 10Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428913 (10Krinkle) Note that you can rename repositories in the GitHub interface. Doing so will preserve all settings, issues, and pull-requests, and also leaves a redirect for any access to urls under the o...
[17:20:15] <andrewbogott>	 chasemp: yes, looking
[17:20:20] <andrewbogott>	 sorry, too many conversations at once
[17:20:23] <chasemp>	 andrewbogott: looking in releng I think CI may be affected
[17:20:38] <chasemp>	 yep, I think we have an issue and I'm not sure what
[17:20:55] <Sagan>	 chasemp: yeah, the CI queue grew over the last 15 minutes etc
[17:21:40] <chasemp>	 seems most likely related to something wonky w/ newly in service hosts maybe andrewbogott?
[17:21:41] <Sagan>	 grafana shows launch errors, and a lot of instances in the building/deletion state, only one of 25 is used
[17:23:04] <chasemp>	 our own canary has issues
[17:23:19] <andrewbogott>	 I haven't seen anything revealing yet.  All the logs look fine
[17:23:49] <chasemp>	 andrewbogott: nova-fullstack is attempting to create fullstackd-1499879974 now
[17:23:56] <chasemp>	 past few creations timed out
[17:24:03] <andrewbogott>	 you restarted rabbit a few minutes ago?
[17:24:12] <andrewbogott>	 It might take a bit for things to rebound from that
[17:24:50] <chasemp>	 andrewbogott: I did, but things were hosed up before that and it was 10m ago
[17:24:57] <chasemp>	 seems like zuul is stuck for 45m or so afaict
[17:25:30] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1426 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:25:57] <greg-g>	 yeah, nodepool isn't making new instances, afaict, tyler (not here) may know more
[17:27:11] <chasemp>	 andrewbogott: for ci I see the not very useful '| fault                                | {u'message': u'Timed out waiting for a reply to message ID ef359d64b0664d9b8a1d91e5816752c1', u'code': 500, u'created': u'2017-07-12T17:20:32Z'} |'
[17:27:49] <chasemp>	 andrewbogott: 
[17:27:54] <chasemp>	 CI is attempting to create in | OS-EXT-AZ:availability_zone          | bigdisks
[17:27:57] <chasemp>	 so that cannot be right?
[17:27:58] <andrewbogott>	 yeah, some VMs are failing to start up on labvirt1015.  I'm trying to see what's happening but we can depool it in a minute if this continues.
[17:28:10] <andrewbogott>	 Well… that's CI's fault then :)  But I'll go back to deleting that.
[17:28:30] <chasemp>	 maybe horizon is too and now that there are two AZ's it's no longer working for default cases?
[17:28:35] <andrewbogott>	 chasemp: can you stop nodepool so that I can actually delete the zone?  I can't delete it as long as it contains VMs
[17:28:57] <chasemp>	 AZ iirc are mutually-exclusive 
[17:29:07] <chasemp>	 k
[17:29:24] <andrewbogott>	 (that zone should work, though, so we probably have multiple things going on)
[17:29:28] <bd808>	 andrewbogott: S.agan said "when I pick stretch as image, it automatically uses bigdisks"
[17:30:04] <chasemp>	 what resources are in that AZ?
[17:30:14] <chasemp>	 nodepool is thinking about stopping
[17:30:21] <chasemp>	 it does a lot of cleanup if you let it I think
[17:31:54] <thcipriani>	 ugh, it's trying to stop all the servers it thinks it's tried to start
[17:32:04] <thcipriani>	 which is, of course, failing because the manager is stopped
[17:32:34] <thcipriani>	 anyway, it's got 9 more that it's going to try and fail to delete
[17:32:43] <chasemp>	 andrewbogott: because AZ's are exclusive it looks like labvirt14 and labvirt15 dropped out of the nova AZ
[17:32:47] <chasemp>	 and only show up in bigdisks
[17:32:53] <bd808>	 ohi thcipriani. I'll stop writing things to your home channel now :)
[17:32:56] <chasemp>	 but I'm unsure atm why it would refuse to schedule in that AZ
[17:33:05] <chasemp>	 other than maybe other things are not working right for that AZ in general
[17:33:09] * thcipriani waves
[17:33:31] <Sagan>	 bd808: it did first. as I created the instace again, horizon used nova again
[17:33:35] <andrewbogott>	 chasemp: if you can find the command to list servers per availability-zone that would help me out.
[17:33:49] <chasemp>	 andrewbogott: does it nova availability-zone-list
[17:33:56] <andrewbogott>	 that lists the zones
[17:33:59] <andrewbogott>	 not what's in the zones...
[17:34:06] <chasemp>	 it shows me what's in them?
[17:34:12] <chasemp>	 | bigdisks              | available                              |
[17:34:12] <chasemp>	 | |- labvirt1014        |                                        |
[17:34:12] <chasemp>	 | | |- nova-compute     | enabled :-) 2017-07-12T17:31:46.000000 |
[17:34:14] <chasemp>	 | |- labvirt1015        |                                        |
[17:34:16] <chasemp>	 | | |- nova-compute     | enabled :-) 2017-07-12T17:31:46.000000 |
[17:34:27] <andrewbogott>	 sorry, VMs I mean
[17:34:33] <chasemp>	 oh
[17:34:38] <andrewbogott>	 as in 'openstack server list <somethingsomething>'
[17:34:50] <chasemp>	 I'm not sure there is a reverse lookup for that hm
[17:35:40] <andrewbogott>	 I guess I can look in the db
[17:36:03] <bd808>	 maybe nova-manage vm list --host ...
[17:36:49] <andrewbogott>	 I don't think 'nova-manage' has done anything since essex
[17:37:07] <andrewbogott>	 it's ok, I'll just use mysql for now
[17:37:20] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:41:19] <chasemp>	 andrewbogott: fwiw admin-monitoring also is trying to use '| OS-EXT-AZ:availability_zone          | bigdisks' it seems
[17:41:31] <chasemp>	 something I can do to help? clear out those dead instance attempts?
[17:42:50] <andrewbogott>	 chasemp: try restarting nodepool again?
[17:42:56] <andrewbogott>	 And, sure, clean out admin-monitoring and restart
[17:43:06] <andrewbogott>	 I deleted the aggregate and zone, we'll see what happens now
[17:43:30] <andrewbogott>	 It could be a combination where everything was going to that zone, and that zone was limited to only two hosts, and those two hosts <mumble>
[17:45:02] <chasemp>	 andrewbogott: started nodepool and restarted nova-fullstack after cleaning up admin-monitoring
[17:45:16] <andrewbogott>	 thanks
[17:45:50] <chasemp>	 I can concur the AZ appears gone from list of options
[17:45:56] <andrewbogott>	 So — defaulting to that zone was a mistake (apparently if you don't specify a zone it just… picks one?)
[17:46:11] <chasemp>	 andrewbogott: https://stackoverflow.com/questions/41820355/how-is-availability-zone-list-order-determined-by-the-nova-api-in-openstack
[17:46:21] <andrewbogott>	 But that shouldn't have broken things on its own — for example I created the zone yesterday and things didn't break yesterday.
[17:46:23] <chasemp>	 which is sorted based on the id
[17:46:45] <andrewbogott>	 makes sense, so — arbitrary
[17:46:54] <chasemp>	 I'm not sure things are working yet, you may be right :D
[17:47:02] <andrewbogott>	 bigdisks got lucky and got an alphabetically gifted id
[17:47:04] <chasemp>	 but anecdotally nova-fullstack is working
[17:47:11] <chasemp>	 so far
[17:47:26] <chasemp>	 forcing another run
[17:47:59] <chasemp>	 andrewbogott: did you cleanup contintcloud project before I started nodepool?
[17:48:14] <andrewbogott>	 chasemp: just the ones on 1015 because that's what I was looking at
[17:48:23] <andrewbogott>	 I'll do the other error state nodes in a moment
[17:48:27] <chasemp>	 there are more in error state 
[17:48:27] <chasemp>	 ok
[17:48:29] <chasemp>	 thanks
[17:48:38] <chasemp>	 hard to separate old noise from new atm
[17:48:51] <chasemp>	 I think it's working slowly tho
[17:49:00] <chasemp>	 it has to cycle through to issue deletes and cleanup
[17:49:03] <chasemp>	 and that seems to be settling
[17:49:11] <chasemp>	 fyi thcipriani and RainbowSprinkles^
[17:49:24] <thcipriani>	 yeah, watching the debug log for nodepool
[17:49:26] <andrewbogott>	 bd808: my thoughts about https://etherpad.wikimedia.org/p/WMCS-Renaming-Announce is that the announcement should probably start with the tl;dr (or something about what's actually happening) — I like it otherwise.
[17:49:55] <bd808>	 andrewbogott: awesome. that matches other feedback so I'll make some changes
[17:49:58] <andrewbogott>	 oh, looks like ERROR vms were already deleted by nodepool
[17:50:06] <thcipriani>	 things seem to be entering ready, so that seems like a good thing
[17:50:08] <chasemp>	 yes it does that even if slowly
[17:50:25] <chasemp>	 so, creation of a new AZ must be coupled with hard setting selection in existing places 
[17:50:35] <chasemp>	 because it's deterministic in a way that's not useful
[17:50:59] <chasemp>	 did that work in labtest?
[17:51:09] <chasemp>	 I guess we need to get nova-fullstack going there all the time to create a good baseline
[17:52:57] <Sagan>	 when you are done with the main problem: my instance is still at error after creation :o
[17:53:00] <Sagan>	 but that's non-urgent
[17:53:28] <wikibugs>	 10Cloud-VPS, 10cloud-services-team (Kanban): Set good availability-zone defaults for nova users - https://phabricator.wikimedia.org/T170447#3431744 (10Andrew)
[17:53:38] <andrewbogott>	 Sagan: delete it and recreate and it should (maybe) work better.
[17:53:41] <andrewbogott>	 chasemp: ^^
[17:53:45] <Sagan>	 andrewbogott: ty, I will try it
[17:53:52] <chasemp>	 I removed it Sagan
[17:54:00] <Sagan>	 chasemp: ah, thx
[17:54:13] <Sagan>	 how long do I have to wait before recreating an instance with the same name? 20 minutes?
[17:54:30] <chasemp>	 uh, andrewbogott would know better :)
[17:54:35] * chasemp off a meeting
[17:55:02] <andrewbogott>	 Sagan: you can do it right away — your local system might cache dns and confuse you for a bit but it will /probably/ just work.
[17:55:14] <Sagan>	 andrewbogott: ok, thx :)
[17:56:39] <chasemp>	 (conceptually tho andrewbogott I don't think AZ is the right mechanism for that targeted scheduling ...chat later)
[17:57:15] <andrewbogott>	 chasemp: you're thinking 'host aggregate'?  Because I couldn't find any docs that distinguished between the two for this use case
[17:57:48] <andrewbogott>	 But in any case, my way didn't work at all :)
[17:59:21] <chasemp>	 ( an AZ is really just a pre-determined-purpose host-aggregate that is mutually-exclusive right?)
[17:59:27] <chasemp>	 ok really off to meeting!
[18:01:28] <andrewbogott>	 chasemp: ok — I'm sort of inbetween locations now so will disappear in a few minutes if things keep working
[18:02:04] <chasemp>	 (meeting andrewbogott -- but afaict things are working)
[18:02:20] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:08:12] <andrewbogott>	 Sagan: working now?
[18:08:32] <Sagan>	 andrewbogott: I did not start yet, but I will check it now :)
[18:08:57] <Sagan>	 it's currently building
[18:09:17] <andrewbogott>	 hm, slow
[18:09:48] <Sagan>	 (I started it less than a minute ago)
[18:10:01] <andrewbogott>	 ok
[18:10:15] <Sagan>	 andrewbogott: how long does creation/deletion usually take?
[18:10:48] <Sagan>	 I'm wondering when I take a look at the CI statistics, that there are about 8 instances in the deletion state
[18:10:53] <andrewbogott>	 Actually building is usually a minute or two… but then puppet has to run before you can actually log in and that can be a bit slow.
[18:11:10] <Sagan>	 ok, and deletion?
[18:11:17] <andrewbogott>	 depends but should be pretty fast
[18:11:21] <andrewbogott>	 it looks like something is messed up again
[18:12:23] <chasemp>	 andrewbogott: ah yeah I see "Exception: deletion timed out" from fullstack
[18:12:37] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3431876 (10Cmjohnson)
[18:12:47] <RainbowSprinkles>	 Hmm, yeah things stopped again
[18:13:02] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3276633 (10Cmjohnson) @robh can you take this from here please.
[18:13:34] <chasemp>	 andrewbogott: indeed I think things are not in a good state
[18:13:47] <andrewbogott>	 I'm going to depool 1015 and see if that helps
[18:14:05] <Sagan>	 my instance has been in creation for 5 minutes now
[18:14:49] <RainbowSprinkles>	 Well, moving, but slowly.
[18:15:14] <bd808>	 Sagan: you are probably getting stuck just like the CI instances
[18:15:38] <Sagan>	 bd808: yeah, looks like
[18:15:40] <andrewbogott>	 Sagan: what instance ID are you looking at?
[18:15:42] <Sagan>	 now I've got a timeout
[18:15:47] <Sagan>	 and it's at error again
[18:15:59] <Sagan>	 andrewbogott: I don't know the ID. it's neon.rcm.eqiad.wmflabs
[18:16:09] <chasemp>	 andrewbogott: 1668e040-076d-447f-bc42-d18a8e3c7978
[18:16:09] <bd808>	 andrewbogott: 1668e040-076d-447f-bc42-d18a8e3c7978
[18:16:15] <bd808>	 jinx!
[18:16:28] <andrewbogott>	 so it was on 1014 and not 1015...
[18:16:55] <chasemp>	 how did you find that? I didn't see it scheduled
[18:17:39] <wikibugs>	 10Cloud-Services, 10DBA, 10User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3431922 (10Dereckson) Thanks (indeed it wasn't a blocker, as the db is in ready state).
[18:18:16] <wikibugs>	 10Cloud-Services, 10DBA, 10User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3390099 (10Dereckson) This one is public.
[18:18:35] <chasemp>	 andrewbogott: definitely real, nodepool is choking, what can I do to help?
[18:18:51] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1425 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[18:19:58] <chasemp>	 andrewbogott: yeah I see it failed to build on 1014
[18:20:11] <andrewbogott>	 chasemp: I don't know.  I'm going to depool 1014 as well, which gets us back to the state of a few days ago
[18:26:29] <Sagan>	 andrewbogott: should I try it again (delete the instance and recreate it)?
[18:26:31] <chasemp>	 seems like some of the failed stuff was 14 and some was 15
[18:26:38] <andrewbogott>	 Sagan: yes, but not yet
[18:26:43] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Remove automated edits interface in Edit Counter - https://phabricator.wikimedia.org/T170185#3431943 (10MusikAnimal) >>! In T170185#3428981, @Samwilson wrote: > Is there a more efficient way to just get the total? I'm not sure.  You can get the total with [[ h...
[18:26:54] <Sagan>	 andrewbogott: ok, then just ping me, when I should :)
[18:29:13] <shinken-wm>	 PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[18:29:28] <andrewbogott>	 chasemp: A lot of what's happening now is just all the services restarting and catching up after config changes.  so I'm just waiting and watching for the moment
[18:29:53] <chasemp>	 andrewbogott: k, I'm trying not to trip things up so I'm in this meeting and standing by
[18:29:56] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3431947 (10RobH) a:05Cmjohnson>03RobH Yep!
[18:39:57] <andrewbogott>	 Sagan: ok, try once more :/
[18:40:20] <Sagan>	 andrewbogott: ok, I'm recreating now
[18:40:40] <Sagan>	 it's building now
[18:41:49] <Sagan>	 it's created now, but I can't login yet
[18:41:59] <Sagan>	 guess puppet needs a bit
[18:42:43] <wikibugs>	 10Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428913 (10MusikAnimal) The only issue I have with option A, as I understand it, is that we'll lose our bragging rights of 18 watchers, 39 stars and 23 forks. Also I bet we're still going to have issues poppi...
[18:42:59] <Sagan>	 log says it's trying to start LDAP now
[18:43:33] <Sagan>	 andrewbogott: looks fixed, the login works now :)
[18:43:51] <andrewbogott>	 great
[18:44:03] <andrewbogott>	 sorry for the hold-up.  I still don't know what happened :(
[18:44:08] <Sagan>	 grafana for contint still shows 9 instances deleting
[18:44:24] <Sagan>	 not a problem for me :)
[18:45:10] <shinken-wm>	 PROBLEM - Puppet errors on tools-static-11 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[18:46:52] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3432010 (10Cmjohnson)
[18:47:20] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3363615 (10Cmjohnson) a:05Cmjohnson>03RobH Moving this @robh to handle the off-site work.
[18:54:00] <Sagan>	 andrewbogott: can you take a look if there are still contint instances stuck at deletion? the zuul backlog is huge at the moment
[18:54:20] <andrewbogott>	 Sagan: it'll take a while to catch up but it has lots of VMs to work with
[18:54:40] <Sagan>	 andrewbogott: ok :)
[18:54:55] <Sagan>	 I only worried since grafana says there are still 10 instances at deletion
[18:58:52] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1425 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:01:05] <chasemp>	 andrewbogott: things are looking ok?
[19:01:25] <andrewbogott>	 chasemp: yep, so far so good.  I'm going to give it a few more minutes.
[19:01:34] <chasemp>	 ok nice
[19:01:36] <andrewbogott>	 Then tomorrow I'll repool 1014 at the beginning of the day so I can watch it go for a few hours
[19:01:44] * chasemp nods
[19:03:04] <wikibugs>	 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432141 (10awight)
[19:04:15] <shinken-wm>	 RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:10:14] <shinken-wm>	 RECOVERY - Puppet errors on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:10:50] <andrewbogott>	 chasemp: ok, I'm convinced that things are peaceful now, I'll be back later on.
[19:14:07] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3432180 (10MusikAnimal) I think this might do it (user 59944 is Kaldari): ``` MariaDB [enwiki_p]> SELECT AVG(sizes.size) AS average_size,     ->     COUN...
[19:16:25] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3432187 (10MusikAnimal) >>! In T170103#3429107, @kaldari wrote: > @MusikAnimal: Whenever I use the old XTools, it always just shows me "extended" for the...
[19:18:16] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labweb100[12].wikimedia.org - https://phabricator.wikimedia.org/T167820#3432194 (10Cmjohnson)
[19:19:21] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labweb100[12].wikimedia.org - https://phabricator.wikimedia.org/T167820#3345535 (10Cmjohnson) a:05Cmjohnson>03RobH @robh assigning to you...can you do the production DNS please in addition to the other things.  Thanks
[19:19:28] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint, 10JavaScript: Phabricator link on Internal Server Error page is reported as an XSS attack by noscript - https://phabricator.wikimedia.org/T170292#3432228 (10MusikAnimal) 05Open>03Resolved Merged https://phabricator.wikimedia.org/rXTReaca305b0be7d401adae7...
[19:20:20] <wikibugs>	 10Cloud-Services, 10Cloud-VPS, 10Operations, 10ops-eqiad: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3432245 (10Cmjohnson) a:05Cmjohnson>03RobH Assigning to robh to do off-site work
[19:21:21] <wikibugs>	 10Cloud-Services, 10Operations: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3432252 (10Cmjohnson)
[19:23:19] <paladox>	 !log phabricator phab-01 daemons are segfaulting.
[19:23:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[19:26:00] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Write unit tests for Xtools - https://phabricator.wikimedia.org/T165400#3432291 (10kaldari) 05Open>03Resolved
[19:26:02] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#3432292 (10kaldari)
[19:28:16] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3321093 (10Niharika) >>! In T167217#3430537, @MZMcBride wrote: > If you're going to spend time rewriting these tools, why not just make them properly supported MediaWiki extensions or add the...
[19:31:52] <wikibugs>	 10Toolforge: The Labs unicorn when you ssh into Labs is gone - https://phabricator.wikimedia.org/T170467#3432317 (10Sigma)
[19:37:39] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3432349 (10MusikAnimal) And for me: ``` MariaDB [enwiki_p]> SELECT AVG(sizes.size) AS average_size,     ->     COUNT(CASE WHEN sizes.size < 20 THEN 1 END...
[20:13:41] <wikibugs>	 (PS1) Lokal Profil: Proof of concept to harvest Wikidata into monuments database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/364840 (https://phabricator.wikimedia.org/T165988)
[20:15:18] <wikibugs>	 (Abandoned) Lokal Profil: Proof of concept to harvest Wikidata into monuments database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/364840 (https://phabricator.wikimedia.org/T165988) (owner: Lokal Profil)
[20:19:30] <paladox>	 Hi, trying to delete an instance from Horizon is taking a long time
[20:19:33] <paladox>	 is that normal?
[20:19:58] <chasemp>	 paladox: I'm not sure, depends on what you mean by a long time, if it persists can you file a task and cc me and andrew?
[20:20:06] <paladox>	 Ok
[20:20:15] <paladox>	 i will wait 10 more mins and file a task :)
[20:20:28] <paladox>	 ah deleted now
[20:20:56] <chasemp>	 I'm not seeing delays for deletion on our canary
[20:21:41] <paladox>	 oh
[20:22:04] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3432656 (10RobH)
[20:23:17] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Remove edit size information in the Edit Counter in new XTools - https://phabricator.wikimedia.org/T170103#3432658 (10kaldari) >However I was thinking, not necessarily right now, that instead of throwing out these fun stats we could instead limit them to users...
[20:30:16] <wikibugs>	 10Tool-Labs-tools-XTools: Figure out XTools Git Repositories - https://phabricator.wikimedia.org/T170367#3428913 (10kaldari) Option A sounds reasonable to me.
[20:30:50] <wikibugs>	 (Restored) Lokal Profil: Proof of concept to harvest Wikidata into monuments database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/364840 (https://phabricator.wikimedia.org/T165988) (owner: Lokal Profil)
[20:30:56] <wikibugs>	 10cloud-services-team (FY2017-18), 10Wikimedia-Blog-Content: Publish a blog post covering the Cloud Services rebranding process - https://phabricator.wikimedia.org/T170288#3425689 (10MelodyKramer) This is a great idea and can largely be based on the wikitech-l note (https://lists.wikimedia.org/pipermail/wikite...
[20:31:33] <James_F>	 bd808: Can the logo instead be supported by the red and blue b*e*ars of the Wikimedia Community logo?
[20:31:57] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3432694 (10RobH)
[20:33:32] <wikibugs>	 (CR) Lokal Profil: [C: -1] "SoI messed up in gerrit somehow and now my patch got pushed to https://gerrit.wikimedia.org/r/#/c/364840/ which in turn prevents me from p" (9 comments) [labs/tools/heritage] (wikidata) - https://gerrit.wikimedia.org/r/354961 (https://phabricator.wikimedia.org/T165988) (owner: Jean-Frédéric)
[20:34:26] <bd808>	 doh
[20:35:11] <wikibugs>	 (PS7) Lokal Profil: Proof of concept to harvest Wikidata into monuments database [labs/tools/heritage] (wikidata) - https://gerrit.wikimedia.org/r/354961 (https://phabricator.wikimedia.org/T165988) (owner: Jean-Frédéric)
[20:35:43] <wikibugs>	 (Abandoned) Lokal Profil: Proof of concept to harvest Wikidata into monuments database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/364840 (https://phabricator.wikimedia.org/T165988) (owner: Lokal Profil)
[20:36:00] <wikibugs>	 (CR) Lokal Profil: "Ignore the last comment =)" [labs/tools/heritage] (wikidata) - https://gerrit.wikimedia.org/r/354961 (https://phabricator.wikimedia.org/T165988) (owner: Jean-Frédéric)
[20:36:06] <harej>	 bd808: nice heraldry lingo
[20:36:52] <harej>	 James_F: I'm not sure bears are proper heraldic imagery for our community. We need something more energetic.
[20:37:11] * James_F shrugs.
[20:37:42] <James_F>	 We are based in California, after all.
[20:37:50] <wikibugs>	 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432726 (10awight) p:05Normal>03Low
[20:37:57] <James_F>	 Oh! And also Florida still. Maybe a bear and an alligator?
[20:37:59] <wikibugs>	 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432141 (10awight)
[20:38:27] <bd808>	 "rampant on a field" :)
[20:40:08] <harej>	 It's like herding cats, isn't it? Maybe the heraldic symbol of the community should be a domestic house cat.
[20:41:27] <harej>	 The community shield would be the cutest shield.
[20:41:39] <harej>	 Or perhaps the wily river otter, just because.
[20:41:44] <James_F>	 If we're going with cats, a wild cat seems more aposite.
[20:41:58] <James_F>	 Gah, apposite. Darn new keyboard.
[20:42:31] <harej>	 If Wikimedia DC had a coat of arms it would definitely be supported by river otters.
[20:42:39] <harej>	 Our mascot is an otter after all.
[20:45:28] <wikibugs>	 (CR) Lokal Profil: "recheck" [labs/tools/heritage] (wikidata) - https://gerrit.wikimedia.org/r/354961 (https://phabricator.wikimedia.org/T165988) (owner: Jean-Frédéric)
[20:49:00] <robh>	 andrewbogott: you'll have your labpuppetmaster100[12] systems shortly =]
[20:49:09] <robh>	 i assume im assignign to you since you commented on their setup task to use jessie?
[20:49:14] <robh>	 assigning even
[20:49:34] <robh>	 chasemp or should you get this one?
[20:49:59] <chasemp>	 I think andrew is working on that now robh
[20:50:11] <robh>	 cool, then ill assign to him once the os is installed and puppet/salt accepted
[20:51:25] <robh>	 I'm also going to shortly be installing labnodepool1002.eqiad.wmnet, labweb100[12].wikimedia.org, & labnet100[34]
[20:51:44] <robh>	 chasemp: can you advise on the os for those?
[20:51:58] <robh>	 I assume trusty, but the labpuppetmaster100[12] were jessie
[20:52:10] <chasemp>	 robh: I'm pretty sure jessie
[20:52:16] <robh>	 even better =]
[20:52:36] <robh>	 want one of the labwebs on stretch or nah?
[20:52:50] <robh>	 (i figured i may as well ask so when im later asked if ive been pushing stretch i can honestly say yes ;)
[20:53:03] <chasemp>	 those are going to mirror a system that is currently jessie I believe now in labtest robh so stick w/ jessie for the moment
[20:53:12] <robh>	 wilco
[20:53:13] <chasemp>	 by mirror I mean combined wikitech and horizon etc
[20:53:43] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3432797 (10RobH) update from irc chat with @chasemp: please install these hosts with jessie.
[20:53:48] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labweb100[12].wikimedia.org - https://phabricator.wikimedia.org/T167820#3432798 (10RobH) update from irc chat with @chasemp: please install these hosts with jessie.
[20:53:51] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3432799 (10RobH) update from irc chat with @chasemp: please install these hosts with jessie.
[20:53:53] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3432800 (10RobH) update from irc chat with @chasemp: please install these hosts with jessie.
[20:55:00] <robh>	 chasemp: should all of these other 3 setups go to you when complete?
[20:55:11] <chasemp>	 robh: sure, please
[20:55:13] <robh>	 labnodepool, labnet, labweb
[20:55:16] <robh>	 cool
[20:55:25] <chasemp>	 robh: oh labweb is andrew as well sorry
[20:55:30] <robh>	 heh, np
[20:56:03] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labweb100[12].wikimedia.org - https://phabricator.wikimedia.org/T167820#3432804 (10RobH)
[20:56:17] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3432805 (10RobH)
[20:56:34] <wikibugs>	 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnet100[34] - https://phabricator.wikimedia.org/T165779#3432806 (10RobH)
[20:56:47] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3432807 (10RobH)
[20:57:11] <robh>	 its hardware christmas for cloud team this week =]
[20:57:31] <chasemp>	 no joke 
[21:05:31] <paladox>	 !log phabricator phab-01 -> phabricator 
[21:05:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[21:05:51] <paladox>	 !log phabricator deleting phabricator instance and recreating it to try and rid it of the php segfault
[21:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[21:41:39] <wikibugs>	 10Cloud-VPS, 10Operations: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3433102 (10RobH) a:05RobH>03Andrew
[21:41:49] <wikibugs>	 10Cloud-VPS, 10Operations: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3349071 (10RobH) All setup and ready for Andrew to take over.
[21:45:29] <wikibugs>	 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10Patch-For-Review: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3433155 (10Andrew) For a dramatic change, I'm going to merge a patch that doubles the spawn time from 5 seconds to 10 seconds.  If...
[21:46:43] <wikibugs>	 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10Nodepool, 10Patch-For-Review: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3433162 (10greg)
[22:03:11] <wikibugs>	 10cloud-services-team (FY2017-18), 10Goal: Begin migrating customer-facing Dumps endpoints to Cloud Services - https://phabricator.wikimedia.org/T168486#3433318 (10madhuvishy)
[22:04:03] <wikibugs>	 10cloud-services-team (FY2017-18), 10Goal: Begin migrating customer-facing Dumps endpoints to Cloud Services - https://phabricator.wikimedia.org/T168486#3366040 (10madhuvishy)
[22:04:48] <RainbowSprinkles>	 Did we get stuck again?
[22:17:33] <bd808>	 RainbowSprinkles: andrewbogott thought he saw a blip too but then thought it worked itself out
[22:17:52] <bd808>	 are you still seeing problems with spawn rates?
[22:18:10] <RainbowSprinkles>	 Nope
[22:19:07] <wikibugs>	 10Tool-Labs-tools-XTools, 10Community-Tech-Sprint: Form state not reset after using browser's back button in Safari/Firefox - https://phabricator.wikimedia.org/T170499#3433388 (10MusikAnimal)
[22:19:14] <bd808>	 cool. when you really need to shout at us, '!' + 'help' pings the whole WMCS team. :)
[22:19:31] <bd808>	 or at least folks who are "on-call"
[22:19:50] <RainbowSprinkles>	 I'm so glad nobody can ping me like that :p
[22:20:04] <RainbowSprinkles>	 Good to know :D
[22:20:42] <bd808>	 I've got a todo to make a bot that watches for it too and lets us be a bit more selective about when we care
[22:23:54] <wikibugs>	 10cloud-services-team (FY2017-18), 10Wikimedia-Blog-Content: Publish a blog post covering the Cloud Services rebranding process - https://phabricator.wikimedia.org/T170288#3433426 (10bd808) >>! In T170288#3432685, @MelodyKramer wrote: > I'm going to work up an outline based on that note and your slidedecks @bd...
[22:42:29] <paladox>	 hi, im wondering could i have some help associating a floating ip in the phabricator project please?
[22:42:52] <paladox>	 it has a floating ip which i removed from an instance but can't seem to add it to any of the instances now
[22:44:07] <paladox>	 ah
[22:44:08] <paladox>	 fixed it
[22:44:15] <paladox>	 somehow there's a bug
[22:44:27] <paladox>	 since i can do it through access and security
[22:44:35] <paladox>	 but can't through the instance page
[22:46:27] <bd808>	 paladox: hmmm.. I think I've seen that before too. Probably fixed upstream already but maybe worth checking for a bug report
[22:46:42] <paladox>	 thanks.
[22:46:44] <bd808>	 our Horizon is several versions behind
[22:46:48] <paladox>	 oh
[22:46:52] <bd808>	 for 'reasons'
[22:47:04] <paladox>	 yep
[22:58:34] <wikibugs>	 10Cloud-VPS, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3433685 (10RobH)
[23:05:44] <RainbowSprinkles>	 bd808, chasemp: So the queue has gone back down to zero. Hopefully now things will be back to normal
[23:05:48] <RainbowSprinkles>	 No more backlogs
[23:06:07] * bd808 knocks on wood
[23:09:54] <wikibugs>	 10Cloud-VPS, 10Operations: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3433719 (10RobH) a:05RobH>03chasemp
[23:10:07] <wikibugs>	 10Cloud-VPS, 10Operations: rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3363615 (10RobH)
[23:15:20] <shinken-wm>	 PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[23:24:27] <paladox>	 !log phabricator phabricator back up now, no segfaulting.
[23:24:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL