[00:05:31] cloud-services-team, wikitech.wikimedia.org: contentadmin has suddenly less permissions - https://phabricator.wikimedia.org/T171208#3649641 (EddieGP) >>! In T171208#3649629, @Krenair wrote: > I know why at least one of those is considered a security-sensitive right. Would you dare to enlighten us, or is...
[00:14:49] cloud-services-team, wikitech.wikimedia.org: contentadmin has suddenly less permissions - https://phabricator.wikimedia.org/T171208#3649644 (Krenair) >>! In T171208#3649641, @EddieGP wrote: >>>! In T171208#3649629, @Krenair wrote: >> I know why at least one of those is considered a security-sensitive rig...
[01:12:35] Cloud-Services, Toolforge: Instrument jsub/jstart/webservices usage - https://phabricator.wikimedia.org/T123444#3649662 (bd808) Open>declined Undone via {T166712}
[01:38:08] (CR) Nuria: [C: 1] Remove log-command-invocation calls [labs/toollabs] - https://gerrit.wikimedia.org/r/381619 (https://phabricator.wikimedia.org/T166712) (owner: BryanDavis)
[01:38:27] cloud-services-team (Kanban), Analytics, Patch-For-Review, User-bd808: Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3649673 (Nuria) Let us know if we can also delete the tables this schema was logging to
[01:40:14] cloud-services-team (Kanban), Analytics, Patch-For-Review, User-bd808: Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3649674 (bd808) >>! In T166712#3649673, @Nuria wrote: > Let us know if we can also delete...
[01:49:50] PROBLEM - Puppet errors on tools-exec-1402 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[02:29:50] RECOVERY - Puppet errors on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:26:19] (PS3) BryanDavis: Add link to source for our cdnjs. [labs/tools/cdnjs-index] - https://gerrit.wikimedia.org/r/377930 (owner: Quiddity)
[03:28:13] (CR) BryanDavis: [V: 2 C: 2] Add link to source for our cdnjs. [labs/tools/cdnjs-index] - https://gerrit.wikimedia.org/r/377930 (owner: Quiddity)
[05:15:14] !log quarry Deployed 644b293 to quarry-main-01 and restarted uwsgi
[05:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[05:17:57] PROBLEM - Puppet errors on tools-exec-1426 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[05:52:59] RECOVERY - Puppet errors on tools-exec-1426 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:59:40] Data-Services, DBA: Some queries to new replica hosts are dramatically slower than labsdb; missing indexes? - https://phabricator.wikimedia.org/T177096#3649817 (Marostegui) >>! In T177096#3647333, @bd808 wrote: > I'd be ok with stalling the index fix for a few days if we can get something properly design...
[06:38:04] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:38:12] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [labs/tools/heritage] - https://gerrit.wikimedia.org/r/381725 (owner: L10n-bot)
[06:43:23] PROBLEM - Puppet errors on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:48:03] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1417 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:54:00] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:18:03] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:19:05] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:23:20] RECOVERY - Puppet errors on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:28:02] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1417 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:22:12] Cloud-Services, Cloud-VPS, WikiApiary: Wikiapiary project instance needs more local disk - https://phabricator.wikimedia.org/T162534#3166400 (Ciencia_Al_Poder) Any update on this? Just to be sure it doesn't really run out of space, making the move more difficult
[12:59:48] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3650507 (chasemp) Final status of etherpad we were using to coordinate migrations off labvirt1015 for posterity ```https://phabricator.wikimedia.org/T177164 https://phabri...
[13:08:57] Cloud-Services, Operations, netops, ops-eqiad: labsdb1001's switch port negociating at 100M - https://phabricator.wikimedia.org/T177130#3650565 (faidon)
[15:30:15] Data-Services, DBA: Some queries to new replica hosts are dramatically slower than labsdb; missing indexes? - https://phabricator.wikimedia.org/T177096#3651073 (bd808) >>! In T177096#3649817, @Marostegui wrote: > From my point of view, ideally we should have some sort of cronjob or similar that could com...
[15:33:36] Data-Services, DBA: Determine schema differences between labsdb1001 and labsdb1009 - https://phabricator.wikimedia.org/T177223#3651078 (bd808)
[15:33:53] Data-Services, cloud-services-team (Kanban), DBA, User-bd808: Determine schema differences between labsdb1001 and labsdb1009 - https://phabricator.wikimedia.org/T177223#3651093 (bd808) a:bd808
[15:34:19] Data-Services, cloud-services-team (FY2017-18), DBA, Goal: Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#3651095 (bd808)
[15:34:21] Data-Services, DBA: Some queries to new replica hosts are dramatically slower than labsdb; missing indexes? - https://phabricator.wikimedia.org/T177096#3646922 (bd808)
[15:35:33] Cloud-VPS, cloud-services-team (Kanban): DNS resolution chosing IPv6 addrs on hosts with only link-local IPv6 addresses - https://phabricator.wikimedia.org/T176891#3651107 (bd808)
[15:37:40] Data-Services, cloud-services-team (Kanban), DBA, User-bd808: Determine schema differences between labsdb1001 and labsdb1009 - https://phabricator.wikimedia.org/T177223#3651125 (jcrespo) There is already 4 related things that, even nothing to do with this, we could integrate this on: * replicati...
[16:32:44] Wikibugs, Differential, Phabricator, Gerrit-Migration, and 2 others: Create conduit method to query the feed and return records with relevant details populated instead of just a bunch of phids - https://phabricator.wikimedia.org/T123417#3651445 (mmodell) I plan to work on this soon, hopefully som...
[16:46:39] Are best practices for backing up data from a VM documented somewhere? Looking to get started on that.
[16:52:39] Cloud-Services, cloud-services-team (FY2017-18), Documentation, Goal: Form a WMCS Documentation Special Interest Group - https://phabricator.wikimedia.org/T177123#3651538 (bd808) More specifically, the initial focus of this group would be systematically cataloging the topical content and organiza...
[16:59:43] cloud-services-team (Kanban), DC-Ops, Operations, ops-eqiad: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3651563 (Cmjohnson) The CPU failed again over the weekend, Record: 2 Date/Time: 10/01/2017 01:21:53 Source: system Severity: Critical Description: CPU 2 ma...
[17:03:05] Nettrom: not really, no. The existing documentation is basically "you should figure that out and don't count on our NFS servers as your 'safe' storage"
[17:05:02] bd808: I think I remember reading that. Any knowledge of what people do in this case? Get some cloud storage somewhere else and rsync it?
[17:05:03] we have grand dreams about one day providing a backup solution for tenants, but it is at best months away. (and at worst just a dream)
[17:05:54] Nettrom: that might be a good question to put to the cloud@lists.wikimedia.org mailing list
[17:06:13] bd808: good idea, I’ll make a note to get that done later today, thanks!
[17:06:58] the things I maintain are generally "backed up" via git repos, Puppet manifests, etc.
[17:07:36] state should generally be recreated in that way agreed
[17:07:55] data otoh is totally context dependent Nettrom
[17:09:30] Yeah, I need to start working on having a rebuild strategy & scripts available (e.g. puppet manifests). It’s the data that I’m mostly interested in figuring out a good way of handling.
[17:13:40] Nettrom: if you dont have a computer that is running 24/7 (datacenter or not), you could let your laptop/desktop run rsync to copy the data from the VPS once a day when you login .. so you dont have to remember it
[17:14:01] because if we have to remember it.. we dont
[17:29:59] Data-Services, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Define naming scheme for connecting to new wiki replica cluster - https://phabricator.wikimedia.org/T174860#3651717 (bd808)
[17:30:01] Cloud-VPS, Upstream: Designate API expects FQDN in "name" value for a recordset and raises bizarre error when that expecation fails - https://phabricator.wikimedia.org/T176057#3651716 (bd808)
[21:04:20] Cloud-VPS: puppet-phabricator and gerrit-test3 have gone down - https://phabricator.wikimedia.org/T177164#3652269 (Zppix) @andrew and @madhuvishy for the timely response and fix.
[22:06:31] Toolforge, Patch-For-Review: Catchpoint tests failing under Toolforge availability product - https://phabricator.wikimedia.org/T177103#3652459 (madhuvishy) For the labsdb1001 & 1003 tests, the error was: ``` root@tools-checker-01:~# curl localhost/labsdb/labsdb1001rw ; echo Caught exception: (1030, u'Go...
[22:10:55] Toolforge, Patch-For-Review: Catchpoint tests failing under Toolforge availability product - https://phabricator.wikimedia.org/T177103#3652465 (madhuvishy) Also fixed the labsdb1005 check with https://gerrit.wikimedia.org/r/381885
[22:48:56] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[22:49:48] Toolforge, Patch-For-Review: Catchpoint tests failing under Toolforge availability product - https://phabricator.wikimedia.org/T177103#3652523 (madhuvishy) The webservice tests should be fixed too! I'll let @chasemp verify and resolve this.
[23:09:05] It looks like ReleaseTaggerBot broke on Friday?
[23:09:07] (cc legoktm)
[23:09:18] https://phabricator.wikimedia.org/p/ReleaseTaggerBot/ shows no activity in the past 48+ hours
[23:09:19] hello
[23:11:07] File "/data/project/forrestbot/forrestbot/forrestbot.py", line 46, in get_master_branches
[23:11:07] key=wmf_number)[-1]
[23:11:07] IndexError: list index out of range
[23:12:32] RoanKattouw: I'll debug further after I finish my homework. Probably someone created a malformed branch on a repository somewhere :/
[23:14:17] Toolforge, cloud-services-team (FY2017-18), Research, Goal, Research-2017-18-Q2: 2017 Toolforge user survey - https://phabricator.wikimedia.org/T177126#3652562 (leila)
[23:15:04] Thanks
[23:28:56] RECOVERY - Puppet errors on tools-exec-1428 placeholder
[23:32:51] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[23:52:27] mediawiki/extensions/DataTypes
[23:52:32] doesn't have any wmf branches
[23:53:24] ok, I created a bogus wmf/1.31.0-wmf.1 branch
[23:58:35] RoanKattouw: it just ran and should be back to normal now
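
Note on the traceback above: the `IndexError` is what indexing `[-1]` on an empty sorted list produces, which fits the later finding that mediawiki/extensions/DataTypes had no wmf branches. The forrestbot source is not shown in this log, so the following is only a minimal sketch of how such a lookup could tolerate a repository without wmf branches. The branch-name pattern, this version of `wmf_number`, and the `latest_wmf_branch` helper are all assumptions made for illustration, not the bot's actual code.

```python
import re

# Assumed branch naming, based on the "wmf/1.31.0-wmf.1" example in the log.
WMF_BRANCH = re.compile(r"^wmf/(\d+)\.(\d+)\.(\d+)-wmf\.(\d+)$")


def wmf_number(branch):
    """Hypothetical sort key: the numeric version parts as a tuple of ints."""
    match = WMF_BRANCH.match(branch)
    return tuple(int(part) for part in match.groups())


def latest_wmf_branch(branches):
    """Return the newest wmf branch name, or None if the repo has none."""
    candidates = [b for b in branches if WMF_BRANCH.match(b)]
    if not candidates:
        # sorted([])[-1] would raise IndexError, as in the traceback.
        return None
    return sorted(candidates, key=wmf_number)[-1]


print(latest_wmf_branch(["master"]))
print(latest_wmf_branch(["master", "wmf/1.30.0-wmf.19", "wmf/1.31.0-wmf.1"]))
```

With a guard like this the caller can skip or report a repository that lacks wmf branches, rather than needing a placeholder branch created in it.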