[00:35:51] PROBLEM - Puppet errors on tools-exec-1431 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:59:42] 10cloud-services-team, 10Community-Tech, 10DBA, 10Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3578646 (10kaldari) @MusikAnimal: Do we want this table (minus the revdeleted content) available on Labs? (It will require creating a specialized... [01:10:53] RECOVERY - Puppet errors on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [01:47:04] PROBLEM - Puppet errors on tools-exec-1421 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:05:57] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:22:06] RECOVERY - Puppet errors on tools-exec-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [02:40:58] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [02:43:07] PROBLEM - Puppet errors on tools-exec-1421 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:18:04] RECOVERY - Puppet errors on tools-exec-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [03:38:45] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:38:46] !log video upgrading youtube_dl from 2017.8.6 to 2017.9.2 on frontend and restarting webservic [03:38:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL [04:38:42] RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [04:53:07] 10Tools, 10Toolforge-standards-committee, 10Privacy: Hunt for Toolforge tools that load resources from third party sites - https://phabricator.wikimedia.org/T172065#3578865 (10Urbanecm) [04:53:09] 10Tools, 10Patch-For-Review: Tool "wikinity" loads assets from bootstrapcdn and 
ajax.googleapis.com - https://phabricator.wikimedia.org/T173065#3578862 (10Urbanecm) 05Open>03Resolved a:03Ricordisamoa Thank you, merged&deployed. [05:06:18] 10Tools, 10Patch-For-Review: Tool "wikinity" loads assets from bootstrapcdn and ajax.googleapis.com - https://phabricator.wikimedia.org/T173065#3578895 (10Ricordisamoa) Thanks for the quick response. [08:30:50] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [09:06:59] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [09:10:52] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the threshold [0.0] [09:41:57] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [10:06:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [10:07:45] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [10:08:57] PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [10:45:18] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3579430 (10jcrespo) [10:46:04] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [10:47:46] RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [10:48:57] RECOVERY - Puppet errors on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [11:05:16] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3579482 (10jcrespo) A first evaluation would point to **nova** as the cause, but I only have indirect 
metrics saying that, so I am not 100% sure. [11:08:37] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3579491 (10jcrespo) The other only possible candidate would be testreduce_0715 cc @ssastry [11:20:13] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3579530 (10jcrespo) We do not have detailed monitoring on db1009, but I can see an increase number of UPDATEs and INSERTs creating contention among themselves and bl... [11:52:13] 10Cloud-VPS, 10cloud-services-team (Kanban), 10Continuous-Integration-Infrastructure, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3579592 (10hashar) It happened again this morning after I entered a faulty command that att... [12:11:31] 10Cloud-VPS, 10cloud-services-team (Kanban), 10Continuous-Integration-Infrastructure, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3579682 (10hashar) Another thing I have noticed on the [[https://grafana.wikimedia.org/dash... [12:47:54] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3579784 (10chasemp) I'm wondering if this is related: https://phabricator.wikimedia.org/T170492#3579682 I think @hashar issues a command that tried to purge all no... 
[13:01:03] (03PS1) 10Lokal Profil: Fix wrong use of default in Colombia, Mexico, Argentina harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376003 (https://phabricator.wikimedia.org/T173929) [13:05:37] (03PS1) 10Lokal Profil: Construct registrar_url in country table rather than monuments_all [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376005 [13:07:37] 10Cloud-VPS (Quota-requests), 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133#3579823 (10chasemp) I'm making a note to discuss it today in our meeting ;) [13:45:19] (03PS2) 10Lokal Profil: Fix wrong use of default in Colombia, Mexico, Argentina harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376003 (https://phabricator.wikimedia.org/T173929) [13:48:40] (03PS3) 10Lokal Profil: Fix wrong use of default in Colombia, Mexico, Argentina harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376003 (https://phabricator.wikimedia.org/T173929) [14:03:25] (03PS1) 10Lokal Profil: Add harvested field for pa_(es) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376016 [14:04:18] (03CR) 10jerkins-bot: [V: 04-1] Add harvested field for pa_(es) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376016 (owner: 10Lokal Profil) [14:11:15] (03PS2) 10Lokal Profil: Add harvested field for pa_(es) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376016 [14:24:28] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580140 (10ssastry) >>! In T175002#3579491, @jcrespo wrote: > The other only possible candidate would be testreduce_0715 cc @ssastry There is no round trip test run... 
[14:27:31] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580151 (10jcrespo) @ssastry - I agree, I think Chase's comment are the best fit right now, but I had to ask around to all users of such database. [14:31:41] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580165 (10jcrespo) @andrew Is there something that could be done to reduce the amount of connections per service? every one precreates 40 or so, and among them they... [14:35:19] 10Cloud-VPS, 10cloud-services-team (Kanban), 10Continuous-Integration-Infrastructure, 10Nodepool, 10Release-Engineering-Team (Watching / External): rabbitmq: Consume and log messages sent to notifications.error - https://phabricator.wikimedia.org/T175029#3580173 (10Andrew) [14:57:29] (03PS1) 10Lokal Profil: Remove erroneous field from Aruba harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376033 (https://phabricator.wikimedia.org/T174901) [15:04:46] 10Data-Services, 10DBA, 10Patch-For-Review: `pr_index`to be replicated to Labs public databases - https://phabricator.wikimedia.org/T113842#3580271 (10jcrespo) With T174782 solved, the next step is to, for each of the 7 shards: 1) stop replication on sanitarium 2) copy the tables, on existing wikis (wikiso... [15:06:36] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580282 (10Andrew) >>! In T175002#3580165, @jcrespo wrote: > @andrew Is there something that could be done to reduce the amount of connections per service? In theor... [15:08:32] 10cloud-services-team, 10Community-Tech, 10DBA, 10Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3580291 (10MusikAnimal) >>! 
In T173891#3578646, @kaldari wrote: > @MusikAnimal: Do we want this table (minus the revdeleted content) available on... [15:15:19] 10cloud-services-team, 10DBA: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580318 (10jcrespo) > if you can feed me instructions for how to check the number of connections for a given service. ```lang=bash root@neodymium:~$ mysql --defaults... [15:46:56] 10Cloud-VPS (Project-requests): Request creation of project-smtp VPS project - https://phabricator.wikimedia.org/T174618#3568042 (10chasemp) +1 -- no name preference [15:52:27] 10Cloud-VPS (Quota-requests), 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133#3580459 (10chasemp) +1'd to grant access to the 300G flavor via meeting [15:53:30] 10Cloud-VPS (Quota-requests), 10cloud-services-team (Kanban), 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133#3580465 (10bd808) a:03Andrew Approved in team meeting for 300GB image flavor. [15:53:58] 10Cloud-VPS (Project-requests), 10cloud-services-team (Kanban), 10User-bd808: Request creation of project-smtp VPS project - https://phabricator.wikimedia.org/T174618#3580468 (10bd808) a:03bd808 [15:57:45] James_F: *waves* [15:58:42] James_F: so just so I'm clear on this: either something is committed on a release branch (in which case we tag it with REL1_XX), or on a WMF branch (in which case we tag it with that WMF tag?) or on master (in which case we tag it with the next wmf tag)? 
[16:02:12] (03CR) 10Jean-Frédéric: [C: 032] Remove erroneous field from Aruba harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376033 (https://phabricator.wikimedia.org/T174901) (owner: 10Lokal Profil) [16:02:42] (03CR) 10Jean-Frédéric: [C: 032] Add harvested field for pa_(es) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376016 (owner: 10Lokal Profil) [16:03:48] (03CR) 10Jean-Frédéric: [C: 032] Fix wrong use of default in Colombia, Mexico, Argentina harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376003 (https://phabricator.wikimedia.org/T173929) (owner: 10Lokal Profil) [16:05:41] (03Merged) 10jenkins-bot: Remove erroneous field from Aruba harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376033 (https://phabricator.wikimedia.org/T174901) (owner: 10Lokal Profil) [16:06:03] (03Merged) 10jenkins-bot: Add harvested field for pa_(es) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376016 (owner: 10Lokal Profil) [16:06:05] (03Merged) 10jenkins-bot: Fix wrong use of default in Colombia, Mexico, Argentina harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376003 (https://phabricator.wikimedia.org/T173929) (owner: 10Lokal Profil) [16:16:45] Hi Cloud folks, who can I bother for a few minutes to get some info about how to connect to the OpenStack API? I'm planning to start working on Cumin's backend soon [16:17:06] volans: you can bother me :) [16:18:31] andrewbogott: thanks, will do! In the meantime, can you point me to the scripts that you use to interact with the OpenStack API right now? I know you have some...
[16:20:16] volans: in modules/openstack2/files/liberty/admin_scripts/novastats there are a bunch of small scripts that run queries [16:20:35] volans: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/labstore/files/nfs-exportd;e194d1258c3f1ab67575ee1a9d820fa9227f91bb$89-121 [16:20:37] 'alltrusty' might be a good one since it's an example of the kind of thing cumin would want to know [16:21:10] volans: that uses a bunch of upstream client python libraries. If you want to write your own rest wrapper that will involve more code but fewer dependencies [16:21:11] https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/openstack/files/nova_fullstack_test.py [16:22:28] great, thanks for the links! [16:28:52] valhallasw`cloud: Yes. [16:30:22] (03CR) 10jenkins-bot: Remove erroneous field from Aruba harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376033 (https://phabricator.wikimedia.org/T174901) (owner: 10Lokal Profil) [16:31:06] (03CR) 10jenkins-bot: Add harvested field for pa_(es) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376016 (owner: 10Lokal Profil) [16:31:53] (03CR) 10jenkins-bot: Fix wrong use of default in Colombia, Mexico, Argentina harvest [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376003 (https://phabricator.wikimedia.org/T173929) (owner: 10Lokal Profil) [16:32:32] 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3580657 (10madhuvishy) > All of these solutions so far require onsite. @Cmjohnson If you are back and onsite today, could you please take a look? 
[16:35:12] (03PS1) 10Merlijn van Deen: Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) [16:35:31] (03CR) 10Merlijn van Deen: [C: 032] Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) (owner: 10Merlijn van Deen) [16:36:20] (03CR) 10jerkins-bot: [V: 04-1] Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) (owner: 10Merlijn van Deen) [16:36:25] baaaaah [16:37:44] (03PS2) 10Merlijn van Deen: Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) [16:37:46] (03PS1) 10Merlijn van Deen: pep8 fixes [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376052 [16:37:54] (03CR) 10Merlijn van Deen: [C: 032] pep8 fixes [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376052 (owner: 10Merlijn van Deen) [16:39:05] (03Merged) 10jenkins-bot: pep8 fixes [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376052 (owner: 10Merlijn van Deen) [16:39:15] (03CR) 10Merlijn van Deen: [V: 032 C: 032] Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) (owner: 10Merlijn van Deen) [16:39:19] (03CR) 10Merlijn van Deen: [C: 032] Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) (owner: 10Merlijn van Deen) [16:39:31] (03Merged) 10jenkins-bot: Only add current-wmf branch for master commits [labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) (owner: 10Merlijn van Deen) [16:40:01] James_F: and 
now... we wait [16:49:57] 10Data-Services, 10cloud-services-team (Kanban), 10Analytics, 10DBA, 10Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3580724 (10Halfak) [16:50:00] 10cloud-services-team, 10Analytics, 10Project-Admins, 10Research: Create a phabricator project called "wikireplica-datasets" - https://phabricator.wikimedia.org/T173512#3580722 (10Halfak) 05Open>03declined @bd808, I think that makese sense. [16:52:52] 10cloud-services-team, 10Analytics, 10Project-Admins, 10Research: Create a phabricator project called "wikireplica-datasets" - https://phabricator.wikimedia.org/T173512#3580731 (10bd808) There is now a `Datasets` column on the #data-services workboard that we can use to at least group these tickets. [16:55:23] 10Data-Services, 10cloud-services-team, 10Analytics, 10Research: Create a database on the wikireplica servers called "datasets_p" - https://phabricator.wikimedia.org/T173513#3580758 (10bd808) [16:55:56] 10Data-Services, 10cloud-services-team, 10Analytics, 10Research: Document the process for importing a new "datasets_p" table - https://phabricator.wikimedia.org/T173514#3580762 (10bd808) [16:56:57] 10Data-Services, 10cloud-services-team (Kanban), 10Analytics, 10DBA, 10Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (10bd808) [16:56:59] 10Data-Services, 10cloud-services-team, 10Analytics, 10Research: Create a database on the wikireplica servers called "datasets_p" - https://phabricator.wikimedia.org/T173513#3530910 (10bd808) 05Open>03stalled p:05Triage>03Normal See {T173511} for higher level discussion. 
[16:57:36] 10Data-Services, 10cloud-services-team (Kanban), 10Analytics, 10DBA, 10Research: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (10bd808) [16:57:38] 10Data-Services, 10cloud-services-team, 10Analytics, 10Research: Document the process for importing a new "datasets_p" table - https://phabricator.wikimedia.org/T173514#3530940 (10bd808) 05Open>03stalled p:05Triage>03Normal See {T173511} for higher level discussion. [16:59:13] 10cloud-services-team, 10DBA, 10Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580822 (10jcrespo) a:03Andrew This can be closed now for me, until we have negative feedback. [16:59:21] 10cloud-services-team, 10DBA, 10Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580824 (10jcrespo) p:05Triage>03Normal [17:01:51] 10Toolforge, 10cloud-services-team (Kanban): No module named 'pymysql' -- python3-mysql installed on toolforge bastions but not on exec nodes - https://phabricator.wikimedia.org/T174439#3580829 (10bd808) I removed the package from tools-bastion-03 to prevent future confusion. ``` $ sudo apt-get purge python3-... [17:09:36] 10cloud-services-team, 10DBA, 10Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3580873 (10Andrew) If nova proves happy with this change then we can further reduce the workers if needed. I'd like to give it a few weeks tho... [17:13:03] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:14:27] (03CR) 10Jforrester: "Neat. Thanks!" 
[labs/tools/forrestbot] - 10https://gerrit.wikimedia.org/r/376051 (https://phabricator.wikimedia.org/T172238) (owner: 10Merlijn van Deen) [17:17:49] 10Cloud-VPS (Quota-requests), 10cloud-services-team (Kanban), 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133#3580893 (10Andrew) I don't see a project named 'WDQS' -- can you clarify what actual VPS project we're talking ab... [17:18:58] 10Cloud-VPS (Quota-requests), 10cloud-services-team (Kanban), 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133#3580894 (10bd808) The project is [[https://tools.wmflabs.org/openstack-browser/project/wikidata-query|wikidata-qu... [17:21:48] 10Cloud-VPS (Quota-requests), 10cloud-services-team (Kanban), 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133#3580917 (10Andrew) ok, flavor added. I'm a bit nervous about how the scheduler will handle this, so please ping... [17:22:41] 10Data-Services, 10Quarry, 10DBA: CHAR_LENGTH does not return the character count - https://phabricator.wikimedia.org/T174543#3580920 (10jcrespo) 05Open>03Resolved a:03jcrespo See: **https://quarry.wmflabs.org/query/21367 which counts by characters, not by bytes.** Mediawiki on WMF-hosted wikis uses B... [17:22:45] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:23:46] This is the best I can suggest, unless someone has other ideas: https://phabricator.wikimedia.org/T174543#3580920 [17:31:35] jynus: thanks for looking at that one. Our legacy binary encoding stuff is certainly confusing for folks who haven't run into it before.
:/ [17:32:34] 10Data-Services, 10cloud-services-team (Kanban), 10User-bd808: Define naming scheme for connecting to new wiki replica cluster - https://phabricator.wikimedia.org/T174860#3581008 (10bd808) Some semi-random thoughts that will probably eventually spawn other tasks: * `.labsdb` will be with us for a long,... [17:47:47] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [17:48:03] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [17:56:08] 10Data-Services, 10Quarry, 10DBA: CHAR_LENGTH does not return the character count - https://phabricator.wikimedia.org/T174543#3565363 (10Base) @jcrespo , is this why I also fail to get normal results while attempting to match title against a regex? https://quarry.wmflabs.org/query/21026 [18:03:17] wow jynus that's really informative, thank you [18:12:56] 10Data-Services, 10Quarry, 10DBA: CHAR_LENGTH does not return the character count - https://phabricator.wikimedia.org/T174543#3581128 (10jcrespo) > is this why I also fail to get normal results while attempting to match title against a regex? I cannot say, I would tell you to try if it helps :-) Some of the... [18:16:10] 10cloud-services-team, 10Analytics: Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3581160 (10Krenair) Krenair: you should ask some of the cloud team people about it :) [18:24:39] 10cloud-services-team (Kanban), 10Analytics: Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3581208 (10bd808) This schema was for finding out about how people used our `jsub` and `webservice` commands. We can shut this do... [18:29:51] bd808: it seems like labs-vagrant does not have contrib enabled as an apt source but normal vagrant does [18:29:55] is that possible? 
[18:30:50] Hi all. I have a question: can I install MySQL Workbench on my desktop PC and connect to the Wikimedia SQL server and run queries? I know I can access the DB from the wmtools command line. [18:31:10] tgr: there is no labs-vagrant anymore unless you are looking at a very, very ancient VM. MediaWiki-Vagrant is the same if you run it locally or on a VM [18:31:33] Kotz: there are instructions for that somewhere. Let me see if I can find the link [18:31:44] yeah, that's why I am confused [18:32:29] Kotz: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#MySQL_Workbench [18:33:03] tgr: what instance is giving you problems? [18:33:07] and/or what role? [18:33:54] @bd808 - thanks!! [18:33:59] tgr: hmmm... there could be a base instance difference between LXC and VirtualBox actually [18:34:07] What do they mean "Username: " [18:34:12] ? where is ~/.my.cnf ? [18:34:38] ahh it is ~/.replica.my.cnf ? [18:34:48] Kotz: old and bad instructions! Use the username from $HOME/replica.my.cnf [18:35:38] bd808: the instance was created a few hours ago [18:35:45] https://horizon.wikimedia.org/project/instances/97b18a1d-ab24-4e46-85b9-7509cc0cad67/ [18:36:13] I'll file a bug [18:36:29] tgr: I bet that it turns out to be about the base image [18:36:48] bd808: Thanks a lot, that was very helpful [18:37:46] I made some quick changes to the wiki documentation, but if you find more things that are confusing there, ask questions and update the docs when you figure out the right answer [18:41:46] 10Cloud-VPS, 10MediaWiki-Vagrant: Normal and cloud Vagrant have different apt sources settings - https://phabricator.wikimedia.org/T175055#3581235 (10Tgr) [18:41:57] 10Cloud-VPS, 10MediaWiki-Vagrant: Normal and cloud Vagrant have different apt sources settings - https://phabricator.wikimedia.org/T175055#3581247 (10Tgr) [18:43:46] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [18:44:08] 10Cloud-VPS,
10MediaWiki-Vagrant: Normal and cloud Vagrant have different apt sources settings - https://phabricator.wikimedia.org/T175055#3581248 (10bd808) They definitely do use different base images. We should probably setup our own `/etc/apt/sources.list` on all base images as part of the early Puppet proc... [19:05:06] 10Cloud-Services, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3581403 (10Cmjohnson) I ended up moving the cards to different pci slots and that fixed the issue. @robh passing this to you (again) [19:18:44] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [19:32:45] 10cloud-services-team (Kanban), 10DC-Ops, 10Operations, 10ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3581609 (10Cmjohnson) It appears to be the cpu. Creating a task with Dell to replace. Record: 16 Date/Time: 08/30/2017 16:13:51 Source: system Severity: Cri... [19:34:09] !log deployed PrivateSettings.php change to add Thumbor username to Swift configuration [19:34:09] gilles: Unknown project "deployed" [19:34:33] !log deployment-prep deployed PrivateSettings.php change to add Thumbor username to Swift configuration [19:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [19:37:07] 10Cloud-VPS, 10cloud-services-team (Kanban), 10Continuous-Integration-Infrastructure, 10Nodepool, 10Release-Engineering-Team (Watching / External): rabbitmq: Consume and log messages sent to notifications.error - https://phabricator.wikimedia.org/T175029#3581634 (10hashar) [19:48:04] 10Data-Services, 10cloud-services-team (Kanban), 10User-bd808: Define naming scheme for connecting to new wiki replica cluster - https://phabricator.wikimedia.org/T174860#3575183 (10chasemp) I really dislike foo.labsdb. It is trading all sanity for conciseness I think. I think wikireplica-web.eqiad.wmnet a... 
[19:49:20] 10Cloud-VPS, 10cloud-services-team (Kanban), 10Continuous-Integration-Infrastructure, 10Nodepool, 10Release-Engineering-Team (Watching / External): rabbitmq: Consume and log messages sent to notifications.error - https://phabricator.wikimedia.org/T175029#3581733 (10chasemp) should we try to get this into... [19:52:32] (03PS2) 10Lokal Profil: Construct registrar_url in country table rather than monuments_all [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/376005 [20:00:48] 10Data-Services, 10cloud-services-team (Kanban), 10User-bd808: Define naming scheme for connecting to new wiki replica cluster - https://phabricator.wikimedia.org/T174860#3581780 (10bd808) The `.(web|analytics).db.svc.eqiad.wmflabs` convention makes sense to me. Per my ramblings in T174860#3581008, we... [20:03:53] 10Cloud-Services, 10DBA: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3581790 (10bd808) [20:03:56] 10Cloud-Services, 10Cloud-VPS, 10DBA, 10Epic, 10Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#3581789 (10bd808) [20:04:16] 10Cloud-Services, 10DBA: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#2546917 (10bd808) [20:04:18] 10Cloud-Services, 10Cloud-VPS, 10DBA, 10Epic, 10Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2475959 (10bd808) [20:05:08] 10Cloud-Services, 10DBA: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#2546917 (10bd808) [20:05:11] 10Cloud-Services, 10Cloud-VPS, 10DBA, 10Epic, 10Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2475959 (10bd808) [20:05:13] 10Data-Services, 10cloud-services-team (Kanban), 10User-bd808: Promote beta test of new Wiki Replica servers - https://phabricator.wikimedia.org/T172704#3581796 (10bd808) [20:08:02] 10Cloud-VPS, 10cloud-services-team (Kanban), 
10Continuous-Integration-Infrastructure, 10Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3581822 (10hashar) Also from the logs there are `ValueError: Circular reference detected` i... [20:10:18] 10Data-Services, 10cloud-services-team, 10DBA: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3581840 (10bd808) [20:12:09] 10cloud-services-team, 10Goatification, 10Wikimania-Hackathon-2017, 10User-bd808: Engage the unicorn community - https://phabricator.wikimedia.org/T173112#3581847 (10bd808) p:05High>03Low [20:13:22] 10Data-Services, 10Analytics, 10Research: Document the process for importing a new "datasets_p" table - https://phabricator.wikimedia.org/T173514#3581850 (10bd808) [20:13:48] 10Data-Services, 10Analytics, 10Research: Create a database on the wikireplica servers called "datasets_p" - https://phabricator.wikimedia.org/T173513#3581855 (10bd808) [20:18:57] 10Data-Services, 10cloud-services-team (Kanban), 10Patch-For-Review: Add socket parameter to maintain-views script - https://phabricator.wikimedia.org/T172496#3581867 (10bd808) [20:20:54] 10cloud-services-team, 10DBA, 10Operations, 10Scoring-platform-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3581887 (10bd808) [20:20:57] 10Data-Services, 10cloud-services-team, 10DBA: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3581886 (10bd808) [20:21:35] 10Cloud-VPS (Project-requests), 10cloud-services-team (Kanban), 10User-bd808: Request creation of deep-learning-services VPS project - https://phabricator.wikimedia.org/T172421#3581888 (10bd808) a:03bd808 [20:28:07] 10Cloud-VPS (Quota-requests), 10Recommendation-API: Request custom instance for recommendation-api labs project - https://phabricator.wikimedia.org/T169766#3581939 (10bd808) @schana Are you still blocked by the InnoDB index limit? 
Do you want to re-examine the need for a custom instance to host your own DB ser... [20:43:49] 10Cloud-Services, 10Tracking, 10User-bd808: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3582046 (10bd808) [20:43:51] 10Cloud-VPS (Quota-requests): Request increased quota for git labs project - https://phabricator.wikimedia.org/T163213#3582044 (10bd808) 05stalled>03declined Adding quota to shoehorn an unrelated use-case into the existing project for gerrit testing is not a good idea. A new project request for icinga2 testi... [20:54:14] 10Cloud-VPS: nova compute hosts disk space alert does not page - https://phabricator.wikimedia.org/T175077#3582084 (10chasemp) [21:06:46] madhuvishy: so i have not signed the puppet keys yet [21:06:55] but i think we have labstore1006 successfully installed [21:07:09] has all the right mounts, but nothing for the shelf, which is expected [21:07:35] im installing labstore1007 now. [21:08:32] chris was able to reroute the cables inside the chassis and swap the card slots for the internal and external controller, which fixes the boot order issue in bios. [21:09:27] and the shelves have a raid10 array of their disks, so the only thing to do manually is setup lvm/filesystem/mount for the shelf [21:09:32] or tie it into the existing lvm, etc... 
[21:11:06] Data-Services, cloud-services-team (Kanban), Patch-For-Review: Add socket parameter to maintain-views script - https://phabricator.wikimedia.org/T172496#3582153 (bd808) Open>Resolved a:chasemp
[21:14:41] Cloud-Services, Operations, ops-eqiad: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3582163 (RobH)
[21:16:53] Cloud-Services, Operations, ops-eqiad: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3582171 (RobH)
[21:26:59] Cloud-Services, Operations, ops-eqiad: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3582198 (RobH) Ok, I thought labstore1006 was installed, since it was booted, but it and labstore1007 do not show the same disks in the same order. Example: the raid ar...
[21:27:11] Cloud-Services, Operations, ops-eqiad: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3582199 (RobH)
[21:28:42] Toolforge, cloud-services-team (Kanban), Documentation, Patch-For-Review, User-bd808: Update code and/or docs for "How can I detect if I'm running in Labs?" - https://phabricator.wikimedia.org/T174082#3582207 (bd808) a:bd808>valhallasw @valhallasw did a nice job of updating the docs o...
[22:03:01] (PS1) Krinkle: wikimedia-editing: Update renamed Editing-Department tag [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/376144
[22:03:16] James_F: ^ desired?
[22:05:55] (CR) Jforrester: [C: 1] wikimedia-editing: Update renamed Editing-Department tag [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/376144 (owner: Krinkle)
[22:06:01] Krinkle: Good spot, thanks.
[22:06:37] (CR) Krinkle: [C: 2] wikimedia-editing: Update renamed Editing-Department tag [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/376144 (owner: Krinkle)
[22:07:01] (Merged) jenkins-bot: wikimedia-editing: Update renamed Editing-Department tag [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/376144 (owner: Krinkle)
[22:07:09] (CR) jenkins-bot: wikimedia-editing: Update renamed Editing-Department tag [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/376144 (owner: Krinkle)
[22:07:43] !log tools.wikibugs Updated channels.yaml to: 5b53afbb0cb9b8828efdd60be8f766328bded1a1 wikimedia-editing: Update renamed Editing-Department tag
[22:12:01] robh: oh awesome! Thank you
[22:12:16] yeah i need to circle back, 1006 isnt right with 1007
[22:12:24] they arent identical and they need to be, but im not done yet ;D
[22:12:30] no worries
[22:12:40] but the shelves and boot detection order is now right
[22:12:44] yay!
[22:12:51] now its back to raid settings and partitioning
[22:12:53] so there is progress.
[22:13:01] just with the slot change, the disks detect in a different order
[22:13:18] Ah I see
[22:13:19] so i need to redo the raid arrays so they get the new labeling. it may also result in our being able to use dumpsdata recipe
[22:13:24] since the only difference was the disk labeling
[22:13:38] which would be nice from a simplification standpoing.
[22:13:39] standpoint even
[22:14:02] yup that would be great!
[22:25:42] Data-Services, cloud-services-team (Kanban), DBA: Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3582412 (bd808)
[23:08:17] Data-Services, cloud-services-team, DBA: Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3582626 (bd808)
[23:27:28] bd808: was it always the case that MediaWiki would listen on 8080?
[23:27:49] or settings[:http_port], more generally
[23:28:24] it does not make sense to me and a bunch of roles that set up their own service and call the MW API are broken because of it
[23:28:40] also, we have this in the vagrantfile:
[23:28:41] if settings[:http_port] != 80 && ENV['MWV_ENVIRONMENT'] != 'labs'
[23:28:41] puppet.facter['port_fragment'] = ":#{settings[:http_port]}"
[23:28:41] end
[23:30:00] why the labs exception? does LXC do the routing differently?
[23:54:18] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3582771 (RobH)
[23:55:27] Cloud-Services, Operations: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3352141 (RobH) a:RobH>madhuvishy Ok, after the cards were swapped, the disks now detect in the same order as other hosts. IE: the raid1 flex bays setup as raid1 are showing as...
[23:55:34] madhuvishy: labstores online \o/
[23:55:36] all yours
[23:55:41] Cloud-VPS, MediaWiki-Vagrant: MediaWiki/Apache port is chosen weirdly on Vagrant - https://phabricator.wikimedia.org/T175100#3582782 (Tgr)
[23:55:43] robh: <3 thank you sooo much
[23:55:47] filed as T175100
[23:55:49] T175100: MediaWiki/Apache port is chosen weirdly on Vagrant - https://phabricator.wikimedia.org/T175100
[23:56:03] welcome, sorry it took so long, but was first of these hp systems with disk arrays we've had =P
[23:56:12] but now we have them behaving like the rest it seems
[23:57:17] robh: I understand :) happy to have them up and running now!
[23:58:08] Cloud-Services, cloud-services-team (Kanban): disable service groups for non-tools projects - https://phabricator.wikimedia.org/T167204#3320787 (bd808) Wikitech [[https://wikitech.wikimedia.org/w/index.php?title=MediaWiki:Sidebar/Group:user&diff=prev&oldid=1769434|sidebar link removed]]