[05:20:06] Data-Services, DBA, Security-Team: gblrename log_type missing on replicas - https://phabricator.wikimedia.org/T178752#3702504 (bd808) The `gblrename` type was not in [[https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/role/templates/labs/db/views/maintain-views.yaml...
[06:08:01] Cloud-Services, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003/labsdb1009 - https://phabricator.wikimedia.org/T133705#3702537 (Marostegui) These are no longer on the processlist - I guess they finished if they were not killed by anyone (not me)
[07:59:33] Cloud-Services, DBA: Multiple concurrent long running queries from s51434 overloading labsdb1003/labsdb1009 - https://phabricator.wikimedia.org/T133705#3702735 (Magnus) Open>Resolved
[08:15:12] Data-Services, cloud-services-team (Kanban), Wikidata, Wikidata-Sprint: Drop wb_entity_per_page views in Wiki Replicas - https://phabricator.wikimedia.org/T178661#3702747 (Marostegui) Open>stalled I will update this task once T177601 is finished so it can be done. Stalling it for now unti...
[08:37:54] Data-Services, Security-Team: gblrename log_type missing on replicas - https://phabricator.wikimedia.org/T178752#3702769 (Marostegui) Please note that moodbar tables were removed: T153033 On the same note, interwiki ones were truncated: T169376
[09:39:35] hi!
[11:12:20] PAWS, Operations, Pywikibot-Commons, Traffic, and 2 others: Server error (500) while trying to download files from Commons from PAWS - https://phabricator.wikimedia.org/T178567#3702993 (Chicocvenancio) While @BBlack's response does seem to make sense to me, I am wondering why pywikibot sends thes...
[11:57:28] !log ores-staging applying role::labs::mediawiki_vagrant on ores-misc-01
[11:57:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores-staging/SAL
[12:07:46] PAWS, Operations, Pywikibot-Commons, Traffic, and 2 others: Server error (500) while trying to download files from Commons from PAWS - https://phabricator.wikimedia.org/T178567#3703068 (BBlack) Yeah there's a few different layers of issue wrapped up in this `Authorization` mess: 1. Pywikibot pro...
[13:14:03] Data-Services, cloud-services-team (Kanban), Wikidata, Wikidata-Sprint: Drop wb_entity_per_page views in Wiki Replicas - https://phabricator.wikimedia.org/T178661#3698944 (chasemp) Thanks gentlemen :)
[13:52:25] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703227 (Tpt)
[13:52:44] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703240 (Tpt) p:Triage>High
[13:54:25] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703227 (chasemp) What is throwing that error message: Jsub or xvfb-run? Can you capture more logs from xvfb-run? When did this start? Is this 100% of jobs or less?
[14:04:23] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703282 (Tpt) > What is throwing that error message: Jsub or xvfb-run? It's jsub. It seems to be a quite well known error according to a quick googling. > Can you capture more logs from xvfb-run? I...
[14:18:34] Why might my cloud VPS instances have stopped using my standalone puppet master? I still have my puppetmaster set in Hiera on wikitech but when I run `puppet agent --debug -tv` it seems to be using labs-puppetmaster. I think that this happened around September 25th.
[14:28:10] tarrow: the puppetmaster is set in /etc/puppet/puppetmaster.conf - what does it say in there?
[14:30:43] andrewbogott: on neither the puppet master nor the client do I have that file. I do have /etc/puppet/puppet.conf which has : `server = labs-puppetmaster.wikimedia.org` on both
[14:30:57] um, yes, sorry, that's right
[14:31:06] ok, what host are we talking about?
[14:31:29] elasticsearch-01.wikifactmine.eqiad.wmflabs
[14:32:05] ok, let me look
[14:32:39] I have set the puppetmaster at: https://wikitech.wikimedia.org/wiki/Hiera:Wikifactmine
[14:34:26] but the *real* problem I'm trying to solve is that the secrets that are in my puppetmaster (and are still there) seem to have disappeared from the agent. I suspect the problem is that the agent is now looking at the wrong puppetmaster but this may not be right
[14:34:43] thanks for taking a look!
[14:35:07] I think you're right about why your secrets aren't getting set
[14:36:52] however they were set fine before and the mtime of the secret file I care about is 'Sep 25 19:22'
[14:38:42] tarrow: I'm not sure why this isn't working, but I'd recommend you transfer that hiera setup to the Horizon interface. I'd also encourage you to make the settings specific to a prefix rather than project-wide (so your puppetmaster doesn't manage itself)
[14:38:50] I can explain that in further detail if need be
[14:42:23] yeah, I actually have an exception for just the puppetmaster host to avoid that :). I guess I paste into the Hiera box on each instance? It comes with a `{}` as default. Should I still just put yaml in there?
[14:43:04] I would set up a prefix for elasticsearch-
[14:43:20] which you can do via the 'puppet -> Prefix puppet' tab on the left
[14:43:27] and yeah, just yaml will work
[14:45:29] Thanks, seems to be working although it looks like I need to do the SSL cert dance again
[14:45:37] either something is wrong w/ the grid or the bastion-03
[14:45:45] the bastion is hosed up enough it's hard to tell
[14:48:41] chasemp: grid seems ok from bastion-02. Probably somebody running amok on -03 again :/
[14:49:40] yeah I took to looking at the master but it seems ok so I'm rebooting 03
[14:49:50] !log tools wall message and scheduled reboot in 5m for bastion-03
[14:49:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:50:06] bd808: was trying to take a look at https://phabricator.wikimedia.org/T178803
[14:50:16] wsexport was 15% of all running jobs
[14:50:27] and contrary to the report some of them I believe are finishing or were
[14:50:34] 15%+
[14:50:43] when I'm excluding bogus mail q jobs that are stuck
[14:50:45] anyway, huh
[14:50:46] poor wsexport
[14:51:12] the grid is not a good dispatch and queueing system
[14:51:14] it's cool that it works but it's scary that it works too
[14:51:26] that needs some shim to control the flow of executed things
[14:52:06] PAWS, Operations, Pywikibot-Commons, Traffic, and 2 others: Server error (500) while trying to download files from Commons from PAWS - https://phabricator.wikimedia.org/T178567#3703369 (fgiunchedi) >>! In T178567#3700598, @BBlack wrote: > The original request did have an `Authorization` header fu...
[14:52:12] looks like their lighttpd container job has gone nuts
[14:52:32] I've seen this happen a small number of times before with other tools
[14:52:57] https://tools.wmflabs.org/grid-jobs/tool/wsexport -- 43 copies of lighttpd-wsexport is very unexpected
[14:54:10] yes I tried to stop it
[14:54:15] to see how many disappear :D
[14:54:25] but bastion-03 was so dogged it was not working out
[14:56:07] qstat is not showing me any jobs at all for wsexport now
[14:56:31] I'll start the service back up
[14:56:49] it must have taken then and just not returned
[14:57:14] !log tools.wsexport Started webservice
[14:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wsexport/SAL
[14:58:43] bd808: that looks better to me
[14:59:13] * bd808 makes grid-jobs rebuild its cache
[15:00:39] cloud-services-team (FY2017-18), Operations, Patch-For-Review, Puppet, User-Joe: Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3703418 (herron)
[15:01:23] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703420 (chasemp) Something there had gone horribly wrong w/ 43 copies of the webservice seemingly running. I stopped it, did some poking at the grid itself to see if things were sane, and then @bd8...
[15:02:43] tarrow: is everything making sense now?
[15:11:11] Cloud-VPS (Quota-requests): Increase Tools available quota - https://phabricator.wikimedia.org/T178805#3703440 (chasemp)
[15:12:32] Cloud-VPS (Quota-requests): Increase Tools available quota - https://phabricator.wikimedia.org/T178805#3703469 (chasemp)
[15:12:38] Cloud-VPS (Quota-requests): Increase Tools available quota - https://phabricator.wikimedia.org/T178805#3703440 (chasemp) p:Triage>Normal
[15:19:48] cloud-services-team: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703515 (chasemp) p:Triage>Normal
[15:21:41] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703523 (chasemp)
[15:22:40] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703524 (bd808)
[15:22:56] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703504 (bd808)
[15:23:09] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703526 (chasemp)
[15:29:13] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703530 (chasemp)
[15:41:33] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703632 (chasemp)
[15:41:37] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703633 (Tpt) Thank you for having a look at it. Sadly it is still not working. The epub exportation is not using the grid, it is the exportation to pdf/mobi/txt that is done by first generating the...
[15:58:49] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703702 (chasemp)
[15:59:00] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703504 (chasemp)
[15:59:11] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3703705 (bd808) https://stackoverflow.com/questions/4883056/sge-qsub-fails-to-submit-jobs-in-sync-mode seems to indicate that this error is related to `qsub sync -y` jobs and the qmaster running out...
[16:00:24] chasemp: slow bastion <= -05 is almost always free :)
[16:06:13] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3703728 (chasemp) @Cmjohnson, can we RMA this back to oblivion yet? :D
[16:08:55] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703732 (chasemp)
[16:15:09] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3703744 (Cmjohnson) @chasemp, no unfortunately it does not work that way. A new CPU and motherboard have been requested through Dell. I believe that will fix the issue. T...
[16:27:20] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703790 (chasemp)
[16:33:28] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703815 (chasemp)
[17:03:07] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3703915 (chasemp) OK, thanks @Cmjohnson. We'll hang tight for the new board.
[17:35:57] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3704016 (chasemp)
[17:36:53] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3704019 (chasemp) Is this a general exec host pool resource issue?
[17:48:18] VPS-project-Wikistats: Miraheze wikistats falsely listing private wikis - https://phabricator.wikimedia.org/T178820#3704066 (Reception123)
[18:55:29] Toolforge, Wikisource: [Wsexport] Grid job submission failing - https://phabricator.wikimedia.org/T178803#3704256 (bd808) >>! In T178803#3704019, @chasemp wrote: > Is this a general exec host pool resource issue? It could be, yes. I'm sure we haven't changed any config around this so I'm wondering if th...
[19:24:22] (PS4) Paladox: Gerrit: Replace certificates with tokens for its-phabricator [labs/private] - https://gerrit.wikimedia.org/r/384902 (https://phabricator.wikimedia.org/T178385)
[19:27:00] cloud-services-team: Provide any rough metrics for tool and project usage - https://phabricator.wikimedia.org/T178834#3704404 (Quiddity)
[19:27:24] Hello, do we have any evidence of outages of the labsdb replicas available from Toolforge?
[19:28:26] Urbanecm: not that I've heard. Do you have specifics?
[19:28:27] I just connected to enwiki using a Tool I own Urbanecm
[19:31:51] OK, I'll be specific. My script ended with "MySQL server has gone away" this night. It was making some queries like `select page_title from page where page_namespace=14 and page_is_redirect=0 and page_title like "Alba_roku%";` (the part after "Alba_roku" differs in each query) and renaming those categories with Pywikibot.
[19:32:29] Complete stderr is at ~urbanecm/tayari.err
[19:32:53] My code is at ~urbanecm/Documents/cswiki/tayari/script.py
[19:33:05] Any other info needed from me?
[19:36:32] Urbanecm: could you file a phabricator task? Chase and I are in a meeting right now and may forget otherwise
[19:38:21] Sure. Should I assign somebody (you/Chase) to the task?
[19:38:52] you can assign it to me. I'll either dig into it or pass it off to someone else :)
[19:40:50] (CR) Chad: [V: 2 C: 2] Gerrit: Replace certificates with tokens for its-phabricator [labs/private] - https://gerrit.wikimedia.org/r/384902 (https://phabricator.wikimedia.org/T178385) (owner: Paladox)
[19:46:25] Toolforge: List of best practices for Toolforge tools - https://phabricator.wikimedia.org/T178836#3704484 (Tgr)
[19:53:40] Toolforge: List of best practices for Toolforge tools - https://phabricator.wikimedia.org/T178836#3704484 (Quiddity) We do have https://commons.wikimedia.org/wiki/File:FLOSS_Best_Practices_for_Bots_and_Tools_poster.pdf (via https://github.com/bd808/floss-best-practices-for-bots-and-tools-poster/ via T169919)...
[19:54:34] Toolforge: List of best practices for Toolforge tools - https://phabricator.wikimedia.org/T178836#3704525 (Tgr) The three high-level areas to cover: * links to language-specific best practices (such as [[http://www.phptherightway.com/|PHP The Right Way]] or the [[https://github.com/sk89q/php-security-checkli...
[20:00:05] Toolforge, User-Urbanecm, User-bd808: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837#3704540 (Urbanecm)
[20:00:14] bd808, done as T178837.
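A side note on the query quoted at [19:31:51]: in MySQL, `_` inside a `LIKE` pattern matches any single character, so a prefix such as `Alba_roku` also matches titles where that position holds a different character. Nothing in the log says this is related to the reported error. The sketch below shows one way to escape the prefix so it matches literally; `escape_like` and `prefix_pattern` are hypothetical helper names, and the `%s` placeholder assumes a driver with the "format" parameter style (pymysql and MySQLdb use it).

```python
# Hypothetical helpers, not taken from the script discussed in the log.

def escape_like(text: str, escape: str = "\\") -> str:
    """Prefix LIKE wildcards (% and _) and the escape character itself
    with the escape character, so the text is matched literally."""
    out = []
    for ch in text:
        if ch in ("%", "_", escape):
            out.append(escape)
        out.append(ch)
    return "".join(out)


def prefix_pattern(prefix: str) -> str:
    """Build a LIKE pattern matching titles that start with `prefix`."""
    return escape_like(prefix) + "%"


# Parameterized form of the query from the log; the pattern is passed
# separately as a parameter instead of being pasted into the SQL string.
QUERY = (
    "SELECT page_title FROM page "
    "WHERE page_namespace = 14 AND page_is_redirect = 0 "
    "AND page_title LIKE %s"
)

pattern = prefix_pattern("Alba_roku")
```

With a DB-API cursor this would be run as `cursor.execute(QUERY, (pattern,))`; backslash is MySQL's default `LIKE` escape character.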
[20:00:15] T178837: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837
[20:00:23] thanks
[20:10:49] Data-Services, Toolforge, User-Urbanecm, User-bd808: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837#3704588 (zhuyifei1999)
[20:17:45] Data-Services, Toolforge, User-Urbanecm, User-bd808: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837#3704606 (Urbanecm) Just a note: Outage is almost out of chance, as it failed with the same backtrace for the second time...
[20:17:57] Data-Services, Toolforge, User-Urbanecm, User-bd808: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837#3704607 (bd808) `db.connect('cswiki')` would connect to labsdb1001. 'MySQL server has gone away' can mean [[https://dev.m...
[20:19:39] Data-Services, Toolforge, User-Urbanecm, User-bd808: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837#3704611 (bd808) @Urbanecm can you try to run the same script using [[https://phabricator.wikimedia.org/phame/post/view/70...
[21:21:24] Data-Services, Toolforge, User-Urbanecm, User-bd808: Strange Mysql server gone away error - possible outage during 2017-10-22 night - https://phabricator.wikimedia.org/T178837#3704540 (chasemp) >>! In T178837#3704611, @bd808 wrote: > @Urbanecm can you try to run the same script using [[https://ph...
[21:25:12] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3704858 (chasemp) @robh, do you manage 'Add to ops private mailing list' ?
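On the 'MySQL server has gone away' error discussed in T178837: a long-running script that holds one connection across many slow steps can find the connection closed by the time it issues the next query. A common mitigation, sketched here under assumptions, is to open a fresh connection per unit of work and retry when the connection turns out to be dead. `ServerGoneAway` and `run_with_reconnect` are hypothetical names; real code would catch the error class of whichever driver it uses. The log does not establish that this was the cause of the failure reported in the task.

```python
import time


class ServerGoneAway(Exception):
    """Stand-in for the driver's 'server has gone away' error (hypothetical)."""


def run_with_reconnect(connect, run_query, attempts=3, delay=0.0):
    """Call run_query(conn) on a fresh connection, retrying on ServerGoneAway.

    connect   -- zero-argument callable returning a new connection
    run_query -- callable taking the connection and returning a result
    attempts  -- maximum number of tries (must be at least 1)
    delay     -- base sleep in seconds, multiplied by the attempt number
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        conn = connect()  # a new connection for every try
        try:
            return run_query(conn)
        except ServerGoneAway as err:
            last_error = err
            time.sleep(delay * attempt)  # linear backoff before the next try
    raise last_error
```

Usage would look like `rows = run_with_reconnect(open_connection, fetch_titles)`, where both callables are supplied by the script; any error other than `ServerGoneAway` propagates immediately rather than being retried.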
[21:28:33] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3704875 (chasemp)
[21:29:38] VPS-project-Wikistats: Miraheze wikistats falsely listing private wikis - https://phabricator.wikimedia.org/T178820#3704876 (Dzahn) Are the private wikis included in https://meta.miraheze.org/w/api.php?action=sitematrix or not?
[21:31:26] VPS-project-Wikistats: Miraheze wikistats falsely listing private wikis - https://phabricator.wikimedia.org/T178820#3704877 (Dzahn) This code used to skip them in the past, I have not changed it. ``` # skip private wikis if ( isset( $wiki['private'] ) ) { continue; } ``` The code doing...
[21:32:59] VPS-project-Wikistats: Miraheze wikistats falsely listing private wikis - https://phabricator.wikimedia.org/T178820#3704878 (Dzahn) In the output of https://meta.miraheze.org/w/api.php?action=sitematrix did the "private" setting change from "true"/"yes" to "" maybe ?
[22:11:37] Cloud-Services, Quarry: Error: BIGINT UNSIGNED value is out of range - https://phabricator.wikimedia.org/T178848#3704967 (Huji)
[22:14:29] Data-Services: Error: BIGINT UNSIGNED value is out of range - https://phabricator.wikimedia.org/T178848#3704988 (zhuyifei1999)
[22:52:23] Toolforge: Add Support for PHP 7.1 to Toolforge - https://phabricator.wikimedia.org/T178850#3705054 (dbarratt)
[22:54:01] apologies if that ^ already exists somewhere
[22:54:18] VPS-project-Wikistats: automatic import of new miraheze wikis - https://phabricator.wikimedia.org/T153930#3705067 (Dzahn)
[22:54:20] VPS-project-Wikistats: Miraheze wikistats falsely listing private wikis - https://phabricator.wikimedia.org/T178820#3705066 (Dzahn)
[23:16:02] Toolforge: Add Support for PHP 7.1 to Toolforge - https://phabricator.wikimedia.org/T178850#3705054 (bd808) This would require one of: * Adding Docker containers which install PHP from a source other than the Wikimedia Foundation apt servers * Adding Docker containers based on [[https://packages.debian.org/b...
[23:16:54] Toolforge: Add Support for PHP 7.1 to Toolforge - https://phabricator.wikimedia.org/T178850#3705086 (bd808) I'm not even sure I want to contemplate what it would take to provide PHP 7.1 on the bastions and Grid Engine nodes.
[23:19:09] Toolforge: Add Support for PHP 7.1 to Toolforge - https://phabricator.wikimedia.org/T178850#3705104 (dbarratt) >>! In T178850#3705084, @bd808 wrote: > * Making arbitrary Docker containers work with `webservice --backend=kubernetes` That would be my preference, so we are able to use whatever version/extensio...
[23:48:32] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3705175 (RobH) >>! In T178807#3704858, @chasemp wrote: > @robh, do you manage 'Add to ops private mailing list' ? added to both the internal team list and the normal ops list.
[23:48:59] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3705176 (RobH)
[23:49:35] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3703504 (RobH) RT access is no longer relevant. RT is only needed for historical lookup of procurement records, so not something people (outside of me) really need.
[23:49:58] cloud-services-team, Patch-For-Review: Onboard aborrero to WMF - https://phabricator.wikimedia.org/T178807#3705178 (RobH)