[00:27:39] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[00:39:42] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[01:02:39] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:09:42] RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:41:11] PROBLEM - Puppet errors on tools-exec-1418 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[02:21:13] RECOVERY - Puppet errors on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:12:32] PROBLEM - Puppet errors on tools-exec-1435 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[03:52:30] RECOVERY - Puppet errors on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:58:10] Cloud-Services, DBA: Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3536665 (Marostegui) Hi, Let the DBAs know once the database is actually created as it is not yet there. Not for DBAs, we need to ALTER the tables to get the PKs in place as the patch to...
[08:52:21] PROBLEM - Puppet errors on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:22:19] RECOVERY - Puppet errors on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:46:43] Hi
[09:46:58] I am trying to host a web service on wikimedia tools page
[09:47:21] Is it possible to host a Flask web service
[09:47:41] my objective is to host a Keras service in Python
[09:48:41] Anybody knows anything about how to do this? Or is it possible to host the service only in PHP??
[10:04:24] brij_: yes! It's possible to host a python webservice. Let me find you the link
[10:05:06] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Python_.28uWSGI.29
[10:15:20] tarrow: Thanks, let me try that out.
[11:32:27] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[11:45:16] Tools: Tool "locator" loads Google Maps - https://phabricator.wikimedia.org/T172659#3537195 (TheDJ) This would require a bit of a rewrite to provide the same functionality using Leaflet.js I do note, that this page has a considerable warning on the page regarding privacy expectations when using 3rd party se...
[11:48:25] in toolforge, i can access the wiki replica databases with *my* account
[11:48:29] but my tool account can't
[11:48:41] Philroc: error?
[11:48:56] what is the name of your tool account?
[11:48:57] ERROR 1045 (28000): Access denied for user 's53476'@'10.68.23.58' (using password: YES)
[11:49:02] "roccerbot-new"
[11:50:23] hmm this is weird
[11:50:34] jynus: ^
[12:11:54] Philroc, zhuyifei1999_: I am not sure what you are trying to connect to, but that ip is not known to me
[12:12:15] i'm trying to connect to the wiki replica database
[12:12:22] it is not one of the known replicas or proxies
[12:12:22] i typed "sql enwiki"
[12:12:26] check your host
[12:12:43] did you edit your .my.cnf?
[12:12:48] no
[12:12:53] it's not possible to edit it
[12:13:36] jynus
[12:16:09] I think you can edit your .my.cnf, different from the replica.my.cnf
[12:16:58] try: mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p
[12:17:23] sql enwiki should be a shortcut to that, but try it in full
[12:19:25] jynus: 58.23.68.10.in-addr.arpa domain name pointer tools-bastion-03.tools.eqiad.wmflabs.
[12:20:18] yes, but try the above command, see if you get the same result
[12:20:52] also, is the tool recent?
[12:21:21] i tried that already jynus
[12:21:23] didn't work
[12:21:29] is the tool recent in creation?
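jynus gives the full command that `sql enwiki` is described as shorthand for. The Python sketch below only assembles that argument list so the mapping is explicit: the name `<wiki>` becomes the host `<wiki>.labsdb` and the database `<wiki>_p`, and credentials are read from `$HOME/replica.my.cnf`. The helper `replica_mysql_argv` is hypothetical and is not the actual `sql` script. It also assumes that the naming pattern in this one command holds for other wikis.

```python
import os
import shlex


def replica_mysql_argv(wiki, home=None):
    """Build the mysql argv that `sql <wiki>` is described as a shortcut for.

    Hypothetical helper: assumes the '<wiki>.labsdb' host and '<wiki>_p'
    database naming taken from the single command quoted in the log.
    """
    home = home if home is not None else os.path.expanduser("~")
    defaults = os.path.join(home, "replica.my.cnf")
    return [
        "mysql",
        f"--defaults-file={defaults}",
        "-h", f"{wiki}.labsdb",
        f"{wiki}_p",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be compared by hand.
    print(shlex.join(replica_mysql_argv("enwiki")))
```

To actually launch the client, pass the list to `subprocess.run`. Printing it first is a quick way to confirm that the wrapper and the hand-typed command target the same host.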
[12:21:33] ye
[12:21:40] it may take some time to be enabled on the replicas
[12:21:49] how long?
[12:21:57] 30 minutes or so, not sure
[12:22:10] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537242 (Magnus)
[12:22:44] ^it seems you are not the only one
[12:23:02] probably the jobs to create users broke
[12:23:14] please comment there that it is not working either
[12:23:26] for you, giving your account name
[12:27:02] i made my tool yesterday
[12:27:17] not 30 minutes ago
[12:27:49] oh, then there may be something wrong
[12:27:55] I am commenting on that ticket
[12:28:59] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537284 (jcrespo) @Magnus I cannot help directly, but "magnustools" is probably an incorrect database username- the replica.my.cnf may be incorrect, as such account does not exist on the databases. Pro...
[12:29:09] ^I have reported it, Philroc_
[12:29:29] thx
[12:29:34] The right people to debug issues will attend it asap
[12:29:46] sorry for the inconveniences
[12:30:29] will make sure it is handled quickly, but it may take some hours
[12:37:27] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:38:58] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537310 (Magnus) Also for "flickr2commons" tool.
[12:40:42] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537327 (Magnus) @jcrespo But shouldn't the sql script automatically use the replica file? There's a "proper" user name in there, and 'sql dewiki' works fine.
[12:45:10] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537341 (jcrespo) @Magnus I think there may be issues with the script or the authentication system. I am waiting for cloud team to evaluate. The databases look ok, but may be missing accounts or someth...
[12:52:54] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537365 (Magnus) Addendum: With some tools, I can connect to "local", but not create a new tool database: ``` tools.wikidata-todo@tools-bastion-03:~$ sql local Welcome to the MariaDB monitor. Command...
[13:06:21] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537384 (jcrespo) @Magnus That is a different topic, and not an issue- wikidata-todo and flickr2commons, despite both being owned by you, are separate accounts database-wise, and you cannot create or...
[13:12:46] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:44:36] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537242 (chasemp) > This was one of the first tools on Tool Labs. Maybe some legacy issue? I think there is an issue that may affect some few older tools. Could be related. I'm pinging @madhuvishy h...
[13:47:36] PAWS: Point paws.wmflabs.org to new PAWS setup - https://phabricator.wikimedia.org/T172257#3537777 (Abbe98) @yuvipanda when creating a "Bash" notebook it seems like it sometimes causes other files in the same folder to be deleted. I have not been able to reproduce it but it has happened "randomly" twice.
[13:52:06] Hi everyone
[13:52:30] Has anybody ever did Keras/Neural network prediction on Wikimedia tools
[13:52:32] ??
[13:52:58] I have hosted a tool on my project page : https://tools.wmflabs.org/proneval-gsoc17/
[13:53:49] It tries to predict a value based on some features, but the prediction code does not give any results, I waited for more than 10 min for it to finish
[13:54:04] any ideas?
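This is for brij's question about serving predictions. The Toolforge Python (uWSGI) setup linked earlier serves a WSGI application, and a Flask `app` object is one. The stdlib-only sketch below shows the same interface without Flask or Keras. `load_model` and its toy linear "model" are hypothetical stand-ins; in a real tool, `load_model` might wrap `keras.models.load_model`. The sketch loads the model once at import time rather than inside the request handler, which is a common way to keep each request cheap. The log does not establish whether that explains the 10-minute stall. For the exact file layout uWSGI expects, check the linked Help page.

```python
import json
from urllib.parse import parse_qs


def load_model():
    """Hypothetical stand-in for an expensive model load (e.g. a Keras model)."""
    weights = {"scale": 2.0, "bias": 0.5}
    return lambda x: weights["scale"] * x + weights["bias"]


# Loaded once when the module is imported, not on every request.
MODEL = load_model()


def app(environ, start_response):
    """Minimal WSGI callable: GET ?x=<number> returns a JSON prediction."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        x = float(query.get("x", ["0"])[0])
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"x must be a number\n"]
    body = json.dumps({"prediction": MODEL(x)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a local check before deploying, the stdlib `wsgiref.simple_server.make_server` can serve `app` directly.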
[14:03:18] anybody tried running neural network prediction inside a hosted tool?
[14:03:46] Not sure! Try asking in #wikimedia-ai
[14:03:46] brij: you might try the labs-l list to cast a wider net brij
[14:03:52] it's early in the morning for most US folks
[14:03:56] ah that too :)
[14:05:10] hey @chasemp, thanks for the suggestion
[14:05:46] Though I'm not sure how to use labs-l list
[14:05:51] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537808 (Magnus) @jcrespo But I am working as the respective "tool-user"!
[14:06:03] can you give me a link?
[14:08:31] brij: https://lists.wikimedia.org/mailman/listinfo/labs-l
[14:17:46] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:07:20] PROBLEM - Puppet errors on tools-exec-1409 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:07:46] Data-Services, DC-Ops, Operations, ops-codfw: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3538002 (Papaul) @madhuvishy here is my proposal: labstore2001 2 shelves labstore-array [0-1] labstore2002 3 shelves la...
[15:22:12] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3538046 (bd808) As my own user, `sql local` seems to work fine. If I `sudo become magnustools` it does fail: ``` $ sql local -v Connecting to tools-db ERROR 1045 (28000): Access denied for user 'magnus...
[15:34:57] cloud-services-team (Kanban), Analytics, Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (Ottomata) FYI, we put this on the Analytics 'Radar' column, as I don't think there is active work for the Analytics tea...
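Magnus's addendum and jcrespo's reply turn on which credentials own which databases on `tools-db`. As I understand Toolforge's documentation, a tool's own databases there must be named with the credential user from its `replica.my.cnf`, followed by two underscores. Treat that convention as an assumption to verify against the Toolforge database docs. The sketch reads the user from a `replica.my.cnf`-style file with `configparser` and builds a name of that form. `tool_db_name` is a hypothetical helper, and the sample credentials are made up.

```python
import configparser


def read_replica_user(cnf_text):
    """Return the MySQL user from a replica.my.cnf-style [client] section."""
    # interpolation=None so a '%' in the password cannot break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    # MySQL option files allow quoted values; strip the quotes if present.
    return parser["client"]["user"].strip().strip("'\"")


def tool_db_name(cnf_text, name):
    """Build '<credential user>__<name>' (assumed tools-db naming convention)."""
    return f"{read_replica_user(cnf_text)}__{name}"


# Made-up sample credentials for illustration only.
SAMPLE_CNF = """
[client]
user = 's12345'
password = 'not-a-real-password'
"""

if __name__ == "__main__":
    print(tool_db_name(SAMPLE_CNF, "todo"))  # s12345__todo
```

Because each tool has its own credential user, two tools owned by the same person end up with different prefixes. This matches jcrespo's point that they are separate accounts database-wise.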
[15:35:32] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3538114 (jcrespo) @bd808 did you see T173708#3537284 that account seems to be missing (maybe?).
[15:35:45] Cloud-Services, Patch-For-Review: Simple logrotate service for users of Tools as stopgap before central logging - https://phabricator.wikimedia.org/T152235#3538115 (chasemp) Mostly silly troubleshooting on my part here :) When truncating the file for lighttpd that is not using O_APPEND the seek position...
[15:35:50] cloud-services-team, Analytics, Project-Admins, Research: Create a phabricator project called "wikireplica-datasets" - https://phabricator.wikimedia.org/T173512#3530877 (bd808) @Halfak can we start with this just as a column on the #Data-Services workboard? I've been debating making those columns...
[15:37:18] RECOVERY - Puppet errors on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:44:32] cloud-services-team (Kanban), Analytics, Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (chasemp) @halfak and I spoke a bit about this this morning. We talked about a SIG for wikireplica things and how this...
[15:50:32] cloud-services-team, Analytics, Research: Create a database on the wikireplica servers called "datasets_p" - https://phabricator.wikimedia.org/T173513#3538175 (bd808) The ask here as I understand is to provide a database that is co-located with the wiki replicas available via the `wikireplica-web.eqi...
[15:57:47] Data-Services, cloud-services-team (Kanban): Tool 'roccerbot-new' has mysql creds that don't work - https://phabricator.wikimedia.org/T173735#3538226 (bd808)
[15:59:11] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3537242 (bd808) >>! In T173708#3538114, @jcrespo wrote: > @bd808 did you see T173708#3537284 that account seems to be missing (maybe?). See {T173735}. This is probably unrelated to the initial problem...
[16:37:29] Horizon, Marvin: Add @Niedzielski to the reading-web-staging group in horizon - https://phabricator.wikimedia.org/T172911#3538516 (Jhernandez) I added you as an admin of the project @Niedzielski, let me know if that works!
[16:47:13] Striker: Create a "recent changes" feed for Striker - https://phabricator.wikimedia.org/T173748#3538591 (bd808)
[16:53:35] Data-Services, DC-Ops, Operations, ops-codfw: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3538657 (madhuvishy) @Papaul Yeah that seems fine to me. Thanks!
[17:35:40] long running queries are breaking because lost conn. to server
[17:35:47] are there any new limits :/?
[17:44:26] meow?
[17:49:49] it's the tabby
[17:52:15] TabbyCat :-D
[17:53:03] Data-Services, DC-Ops, Operations, ops-codfw: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3538961 (Papaul) @madhuvishy Let me know when i have green light to disconnect everything and start working on the new s...
[17:53:22] Steinsplitter: nothing that has changed recently that I know of. Sometimes this means that the query was always close to being killed and now is taking just a little too long
[17:53:51] Data-Services, DC-Ops, Operations, ops-codfw: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3538963 (madhuvishy) @Papaul The servers are not in use and have no useful data in them, you have green light to disconn...
[17:54:15] Steinsplitter: you could try being one of the early users of the new replicas. there are some details at T172704
[17:54:15] T172704: Promote beta test of new Wiki Replica servers - https://phabricator.wikimedia.org/T172704
[17:54:28] bd808: i had no problems for years with such queries. Can you change the limit or do i have to re-write/find another solution?
[17:54:37] oh, thanks. will look :)
[17:54:53] I don't directly have any control over the query killers
[17:55:29] Data-Services, DC-Ops, Operations, ops-codfw: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3471092 (Cmjohnson) @papaul please let me know and @madhuvishy know if you have any issues getting the disk shelves to b...
[17:56:44] it hasn't changed in months at least I expect
[17:59:16] currently testing wikireplica-analytics.eqiad.wmnet :)
[18:21:04] Cloud-Services: Toolforge intermittent Puppet failures for puppet-enc - https://phabricator.wikimedia.org/T173526#3539078 (Andrew) If it's the enc failing then we should be able to see occasional failures with watch -n .5 /usr/local/bin/puppet-enc tools-static-11.tools.eqiad.wmflabs nothing yet, though!
[18:25:58] Tool-fatameh: More descriptions for Fatameh - https://phabricator.wikimedia.org/T171995#3539119 (XXN)
[18:26:06] Data-Services, cloud-services-team (Kanban): Tool 'roccerbot-new' has mysql creds that don't work - https://phabricator.wikimedia.org/T173735#3539120 (madhuvishy) Open>Resolved Should be fixed now.
[18:27:52] Tool-fatameh: More descriptions for Fatameh - https://phabricator.wikimedia.org/T171995#3482470 (XXN)
[18:33:25] bd808: works (fast!) :) thanks.
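Steinsplitter's long-running queries were being cut off, and bd808 notes that such a query may have been close to the killer's limit all along. One generic mitigation, not one suggested in the log, is keyset pagination: split the work into many short statements of the form `WHERE id > last_id ORDER BY id LIMIT n`. The sketch uses an in-memory SQLite table so that it runs anywhere. The `items(id, title)` table is hypothetical and is not the MediaWiki schema. Against the replicas, the same query shape would go through a MySQL client library instead.

```python
import sqlite3


def iter_batches(conn, batch_size=1000):
    """Yield rows from `items` in id order, one short query per batch."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, title FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the last key seen rather than using OFFSET, so later
        # batches stay as cheap as the first one.
        last_id = rows[-1][0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO items (title) VALUES (?)",
                     [(f"Page_{i}",) for i in range(2500)])
    print([len(batch) for batch in iter_batches(conn)])  # [1000, 1000, 500]
```

Each statement touches at most `batch_size` rows via the primary key. A per-query time limit therefore applies to each small batch rather than to the whole scan.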
[18:33:37] nice :)
[18:34:47] those new servers have more cpu/ram/disk than the current cluster and not many users yet
[18:35:02] Toolforge: Cannot log into Tool Labs local DB via 'sql' command - https://phabricator.wikimedia.org/T173708#3539166 (madhuvishy) Everything looks good now on the account credentials end ``` madhuvishy@labstore1004:~CHECK_UID=s51211 madhuvishy@labstore1004:~$ mysql -h m5-master.eqiad.wmnet -u labsdbaccounts...
[18:35:09] at the time we ordered them they were the biggest servers that WMF had ever bought :)
[18:36:36] wow :)
[18:37:17] Yes but do you have numbers? :]
[18:37:40] nope
[18:38:05] (i meant for bd808)
[18:40:15] generally we don't disclose what we pay for materials, both because most of it is considered confidential, and it would make it difficult for any vendor to negotiate with us in the future
[18:40:54] yeah, quoting prices is bad mojo. we want the nice discounts for being a non-profit :)
[18:43:28] but I can say that `free -h` says 503G total ram and `df -h` says 12T of disk for the dbs
[18:44:03] \o/
[18:44:05] like the military stuff, they don't disclose how much things cost usually
[18:44:12] and 16 physical cores (I think)
[18:44:25] huge electricity bills...
[18:44:45] are you trying to surpass NSA on Internet and Electricity expenditure?
[18:44:47] ;)
[18:46:33] heh. no. Our data centers are tiny for the work we do with them
[18:49:50] bd808: i meant the tech specs, not the prices
[18:49:51] and, wow
[18:50:10] 503 GB of RAM.
[18:50:18] databases like ram
[18:50:29] Yes
[18:50:34] Redis likes RAM even more
[18:55:53] Horizon, Marvin: Add @Niedzielski to the reading-web-staging group in horizon - https://phabricator.wikimedia.org/T172911#3539260 (Niedzielski) @Jhernandez, it works! \o/ {icon thumbs-up}
[18:56:24] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3539262 (Andrew) Syslog jumps from Aug 15 15:53:01 to Aug 21 18:40:37 with no indication of trouble: Aug 15 15:53:01 labvirt1015 CRON[136739]: (prometheus) CMD (/usr/local...
[18:56:45] Horizon, Marvin: Add @Niedzielski to the reading-web-staging group in horizon - https://phabricator.wikimedia.org/T172911#3539263 (Niedzielski) Open>Resolved a:Jhernandez I think this is done so I'm zapping it :]
[19:59:11] Cloud-Services, cloud-services-team (Kanban), Privacy, Security: "last" command on WMF Labs/Tools allows users to view IPs of other toolforge users - https://phabricator.wikimedia.org/T172650#3539467 (bd808) >>! In T172650#3532217, @Krenair wrote: > Maybe we could consider some extra hop box that...
[20:00:36] bd808, is a bastion that only allows SSH preferable to the current bastions anyway?
[20:00:39] isn't*
[20:01:06] for the bastion project, yeah it probably is
[20:01:12] as for the toolforge 'bastions' I suggest we simply stop calling them bastions and remove their external access
[20:01:50] making everyone figure out how to use ssh jumphosts is not trivial
[20:01:56] and kind of a step backwards
[20:02:03] I shan't expect that to happen any time soon, but in an ideal world
[20:08:17] cloud-services-team (Kanban), Analytics, Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (bd808) @Halfak can we close the 3 sub-tasks you forked from this and have the discussion before we jump to a specific s...
[20:10:36] I did some testing eons ago w/ limited shells and one thing I figured out is you didn't actually need to allow any commands for ssh jump host to function
[20:10:44] I'm not sure that ever materializes a shell
[20:10:57] I've wondered if the VPS bastions could set /bin/false as the default shell
[20:11:11] no chance to test it yet, but w/ lshell and no allowed commands it worked
[20:11:23] this was a long time ago so vague recollections other than that
[20:28:17] Cloud-VPS, Wikidata: ListeriaBot logs in 28000 times a day - https://phabricator.wikimedia.org/T173777#3539526 (MaxSem)
[20:42:35] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3539555 (madhuvishy) @Cmjohnson I tried getting into the management interface for 1007, and powercycled it, booted from network and was looking at...
[20:45:38] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3539556 (madhuvishy) @Cmjohnson I also can't even seem to get into the management interface for labstore1006 ``` ☁ ~ ssh root@labstore1006.mgmt....
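Krenair and chasemp discuss bastions that act only as SSH jump hosts. With OpenSSH 7.3 or later, `ssh -J jump target` (the ProxyJump feature; the `ProxyJump` directive is the `~/.ssh/config` equivalent) reaches the target through a forwarded TCP channel on the jump host rather than an interactive session there. That is consistent with chasemp's observation that no shell commands need to be allowed on the bastion. The small Python sketch below only assembles such a command. `jump_ssh_argv` and the host names are hypothetical placeholders.

```python
import shlex


def jump_ssh_argv(target, jumps, user=None):
    """Build an `ssh -J` (ProxyJump) command reaching `target` via `jumps`.

    Several jump hosts are chained with commas, which ssh's -J option accepts.
    """
    if not jumps:
        raise ValueError("at least one jump host is required")
    prefix = f"{user}@" if user else ""
    return ["ssh", "-J", ",".join(prefix + host for host in jumps), prefix + target]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    argv = jump_ssh_argv("instance.example.eqiad.wmflabs",
                         ["bastion.example.org"], user="alice")
    print(shlex.join(argv))
```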
[20:51:46] !log rcm CAC: Running vagrant git-update
[20:51:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[20:52:09] !log rcm Xenon: Updating to master
[20:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[20:55:20] !log rcm Xenon: Running apt upgrade
[20:55:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[20:58:32] !log rcm CAC: Fixing problems with apt, running apt upgrade
[20:58:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[20:58:52] !log rcm Oxygen: Running apt upgrade
[20:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[20:59:22] !log rcm Tin: Updating Jenkins; Running apt upgrade
[20:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[20:59:40] !log rcm Neon: Running apt upgrade
[20:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[21:22:43] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[21:57:44] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:21:34] Cloud-Services, cloud-services-team (Kanban), Patch-For-Review: nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3539851 (Andrew) The majority of failures here are VMs that work fine, but come up too slowly. I'm digging through the logs of a couple of these...
[22:25:41] Cloud-Services, cloud-services-team (Kanban), Patch-For-Review: nova-fullstack is losing instances on creation - https://phabricator.wikimedia.org/T165555#3539863 (Andrew) Things are much better here, but not yet perfect. In the last two weeks we leaked 5 VMs, 2 were working but timed out...
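The Puppet alerts throughout this log report what share of recent datapoints exceeded a threshold of 0.0. The Python sketch below shows that arithmetic. The sample window, the skipping of missing samples, and the 1% OK cut-off (read off the RECOVERY text) are all assumptions, not the real check's code. As a worked example, 2 failing points out of 9 gives 2/9 ≈ 22.22%, the same figure as the tools-exec-1418 alert. The log does not say how many points each check window holds.

```python
def percent_above(datapoints, threshold=0.0):
    """Percentage of datapoints strictly greater than `threshold`."""
    # Assumption: missing samples (None) are ignored rather than counted.
    points = [p for p in datapoints if p is not None]
    if not points:
        return 0.0
    above = sum(1 for p in points if p > threshold)
    return 100.0 * above / len(points)


def status(pct, ok_below=1.0):
    """Hypothetical classification mirroring 'Less than 1.00% above the threshold'."""
    return "OK" if pct < ok_below else "CRITICAL"


if __name__ == "__main__":
    # Hypothetical window: Puppet failure counts sampled nine times.
    window = [0, 0, 3, 0, 0, 0, 1, 0, 0]
    pct = percent_above(window)
    print(f"{status(pct)}: {pct:.2f}% of data above the critical threshold [0.0]")
```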
[22:58:15] PROBLEM - Puppet errors on tools-exec-1420 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[23:15:35] trying to get a shell on deployment-mediawiki05 gives me "Class 'Memcached' not found in /srv/mediawiki/php-master/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 61"
[23:16:27] but not on deployment-tin, interestingly
[23:36:24] provided by an extension?