[00:31:08] PROBLEM - Puppet errors on tools-exec-1436 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[00:49:43] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[00:51:55] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[00:55:37] Data-Services, cloud-services-team (Kanban), User-bd808: Define naming scheme for connecting to new wiki replica cluster - https://phabricator.wikimedia.org/T174860#3590305 (bd808) Using `*.db.svc.eqiad.wmflabs` is a tiny bit more complicated than the current `*.labsdb` service names at our DNS layer...
[01:03:21] PROBLEM - Puppet errors on tools-exec-1430 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[01:06:08] RECOVERY - Puppet errors on tools-exec-1436 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:14:46] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:26:56] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:37:43] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[02:12:43] RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:33:24] RECOVERY - Puppet errors on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:43:51] PROBLEM - Puppet errors on tools-exec-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[03:58:58] Tool-Pageviews: Userviews halts if PageViews API fails - https://phabricator.wikimedia.org/T175254#3590437 (Shyamal) It worked for 365 days quite reasonably and without any warnings/error messages today.
[04:38:52] RECOVERY - Puppet errors on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:16:46] Tools: Tool "icalendar" loads bootstrap doc assets from getbootstrap - https://phabricator.wikimedia.org/T172608#3503579 (Liuxinyu970226) @zhuyifei1999 try emailing [[ mailto:wiegels@tools.wmflabs.org | here ]]?
[06:44:22] PROBLEM - Puppet errors on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:24:23] RECOVERY - Puppet errors on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:55:09] Cloud-Services, Operations: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3352141 (fgiunchedi) Those messages are due to acpi power meter, which we blacklist as of https://gerrit.wikimedia.org/r/#/c/356422/. A reboot should make the message go away.
[09:40:03] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/376238 (owner: Hashar)
[11:08:57] Tool-wikiloves: list of uploaders in WikiLoves tool does not work if country has multiple words - https://phabricator.wikimedia.org/T175354#3591091 (Effeietsanders)
[11:36:20] Tool-wikiloves: list of uploaders in WikiLoves tool does not work if country has multiple words - https://phabricator.wikimedia.org/T175354#3591158 (JeanFred) Thanks for the report! > A url is linked to like https://tools.wmflabs.org/wikiloves/monuments/2017/United_States That URL works :D https://tools.w...
[13:06:00] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:40:59] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:07:23] Tool-wikiloves: list of uploaders in WikiLoves tool does not work if country has multiple words - https://phabricator.wikimedia.org/T175354#3591557 (Suhadakashter)
[14:08:15] Tool-wikiloves: list of uploaders in WikiLoves tool does not work if country has multiple words - https://phabricator.wikimedia.org/T175354#3591091 (Suhadakashter)
[14:09:45] Tool-wikiloves: list of uploaders in WikiLoves tool does not work if country has multiple words - https://phabricator.wikimedia.org/T175354#3591616 (JeanFred) duplicate>Open
[15:39:57] PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:18:29] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labvirt10(19|20).eqiad.wmnet - https://phabricator.wikimedia.org/T172538#3591956 (Cmjohnson)
[16:19:56] Cloud-Services, Operations, ops-eqiad, Patch-For-Review: rack/setup/install labvirt10(19|20).eqiad.wmnet - https://phabricator.wikimedia.org/T172538#3501521 (Cmjohnson) bios is setup, raid is configured to raid 10. switch ports need setup still 1019 -> b4 ge-4/0/33 1020 -> b7 ge-7/0/13
[16:19:57] RECOVERY - Puppet errors on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:51:08] Cloud-Services, Operations: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3592267 (madhuvishy) Open>Resolved @fgiunchedi Thank you! That seems to have fixed it. Resolving this task.
Thanks everyone :)
[18:06:42] !log quarry Deployed d9e8a4a to quarry-main-01 T175285
[18:06:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[18:06:46] T175285: Quarry XLSX cells for long urls are wrongly empty - https://phabricator.wikimedia.org/T175285
[18:09:54] Quarry, Patch-For-Review: Quarry XLSX cells for long urls are wrongly empty - https://phabricator.wikimedia.org/T175285#3592323 (zhuyifei1999) Open>Resolved Should be resolved. Such text are no longer written as a link but as basic string. Please reopen if there are issues.
[18:11:13] Hello everyone! I’m trying to understand the connection between continuous jobs and the Bigbrother daemon. Should I set up Bigbrother if I want to ensure that my continuous job keeps running? From the description of continuous jobs, it seems they’ll just keep on running, so I wasn’t sure.
[18:12:08] Quarry, Patch-For-Review: Quarry XLSX cells for long urls are wrongly empty - https://phabricator.wikimedia.org/T175285#3592341 (IKhitron) Checked. Works great. Thank you, @zhuyifei1999.
[18:24:35] Nettrom: there are layers to this question, :)
[18:25:42] `jsub -continuous` wraps your job in a tiny shell script that fundamentally says "start this code again if it exits with a non-zero status code"
[18:26:42] bigbrother is subtly different. it looks at all the jobs on the grid every minute and if your job isn't there it tries to start it
[18:26:59] using both as defense in depth is ok
[18:28:16] `jsub -continuous` can make some weird things happen too. It changes how Grid Engine sees the job.
This can sometimes cause problems in the way that the job's environment is set up
[18:29:16] the best candidates for `jsub -continuous` are bots that might die from a fatal error and need to start right back up
[18:30:01] also, the `webservice` command now has a bigbrother like feature built into it directly, so using bigbrother with it is unnecessary
[18:30:26] Nettrom: ^ hope all of that helps some. Feel free to ask follow up questions
[18:38:02] bd808: Fantastic answer, thanks! I’m running a Python script that processes the EventStream and grabs ORES predictions for new articles and stores it in a database table (on tools.labsdb) for a research project. Keeping it running is preferable, so having bigbrother as an additional failsafe is good, I think.
[18:39:39] :D
[18:40:23] Just wondering, is it able to track an offset and resume without missing data?
[18:40:48] Backed by Kafka so I suppose yes, in theory.
[18:41:51] It would be nice if there were even a meta-big-brother in which the Wikimedia Foundation could adopt maintenance of publicly useful data sets
[18:43:01] bigbrother is a pretty blunt and generic thing. There would be better ways to manage a collection of curated ETL jobs
[18:43:23] thats a tv show. (sorry for offtopic) :)
[18:44:02] the tv show and the process are both nods to George Orwell's 1984. :)
[18:44:23] which makes me wonder why awight endorses the idea. :P
[18:45:32] Work brings Order
[18:45:34] War is Peace
[18:47:00] yeah, these curated jobs would need occasional tweaking that goes beyond just restarting the process
[18:47:13] Any deal like I’m imagining is an eminently human endeavor
[18:47:55] awight: I thought about that, the EventStream appears to allow it. Problem is that articles get deleted, so if it’s down for a while and starts back up, it’ll have to handle deleted articles instead. Having nothing for a while in the dataset might be easier than filling those gaps retroactively.
Arguably I could run it all on WMF servers, but I’d like to have as much data publicly available as possible on this project, which is why I set
[18:47:56] up on Toolforge.
[18:48:27] oh cool. So it integrates with Quarry, then.
[18:48:45] yeah, the database should be public… I need to test that, brb
[18:48:54] * awight is still slowly learning new labs appellations
[18:49:15] I’m old enough to sometimes misspeak of Toolforge as the Toolserver :D
[18:49:35] You must first realize that there is no Labs. Only the Cloud
[18:49:53] http://bigbrother.channel5.com
[18:49:55] * awight blanks mind and tries to learn
[18:49:57] woops
[18:50:02] lol
[18:53:46] Nettrom: It could run on WMF servers, then replicate to Toolforge.
[18:54:57] awight: I do have a "sometime we will be able to..." idea about a central service that lets people register event handlers for EventBus events and takes care of making sure they all run. Kind of Amazon Lambda + IFTTT
[18:55:33] * awight marks on next year’s calendar
[18:55:36] awight: hmm, I’ll keep that in mind, once I have some other things set up that might be a good way to do it
[18:55:40] something like that could replace a whole lot of one-off bots that watch irc to trigger things
[18:56:01] +1
[19:00:45] that's a neat idea bd808
[19:00:49] bd808: ifttt’s UI FTW!
[19:01:01] I've had less sophisticated thoughts about eventbus and that use case too
[19:01:37] I told Y.uvi at one point and he was all "yeah I've thought about that" :)
[19:03:12] I built most of the backend for $DAYJOB-1 around as a MOM dispatcher. It was a pretty flexible pattern that I miss.
[19:03:42] I don't miss the JVM and XML, but the functionality was pretty nice
[19:11:28] Cloud-Services, cloud-services-team, User-bd808: Request for additional edit permissions on wikitech.wikimedia.org for dr0ptp4kt - https://phabricator.wikimedia.org/T174488#3592493 (bd808) Open>Resolved a:bd808 @dr0ptp4kt please fix all of the docs everywhere.
:)
[19:15:28] Cloud-Services, cloud-services-team, User-bd808: Request for additional edit permissions on wikitech.wikimedia.org for dr0ptp4kt - https://phabricator.wikimedia.org/T174488#3592581 (dr0ptp4kt) Thanks!
[21:06:58] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:23:44] Cloud-Services, Design: Update Cloud Services logo+wordmark - https://phabricator.wikimedia.org/T174094#3593017 (BFlores) I wanted to show you what the updated logo would look like in one color. The logo mark can have the original colors, but it should be kept at this size to keep it consistent with the...
[21:47:55] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:06:20] Striker, wikitech.wikimedia.org: LDAP account that is not attached on wikitech has no means for password reset - https://phabricator.wikimedia.org/T174469#3593114 (Tgr) ```lang=php \MediaWiki\Auth\AuthManager::getInstance()->autoCreateUser( User::newFromName( $username ), LdapPrimaryAuthenticationProvide...
[22:11:27] Striker, wikitech.wikimedia.org: LDAP account that is not attached on wikitech has no means for password reset - https://phabricator.wikimedia.org/T174469#3593144 (Tgr) As discussed on IRC, one approach is to allow Special:PasswordReset to work for users who do not have a local account, then make sure cl...
[22:11:59] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:27:57] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:32:09] Striker, wikitech.wikimedia.org: LDAP account that is not attached on wikitech has no means for password reset - https://phabricator.wikimedia.org/T174469#3593153 (bd808) >>!
In T174469#3593114, @Tgr wrote: > ```lang=php > \MediaWiki\Auth\AuthManager::getInstance()->autoCreateUser( User::newFromName( $us...
[23:00:58] cloud-services-team (FY2017-18), Goal: Program 4 Outcome 1: improve documentation - https://phabricator.wikimedia.org/T166401#3593229 (bd808)
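
Note on the 18:25-18:27 exchange: the two restart strategies described there (`jsub -continuous` restarting a job that exits with a non-zero status, and bigbrother periodically starting any expected job that is missing from the grid) can be contrasted with a rough sketch. This is only an illustration of the two ideas as stated in the chat, not the actual Toolforge implementation. The function names, the `max_restarts` cap, and the callables passed in are all hypothetical and exist only to keep the sketch small and terminating.

```python
def restart_on_failure(run_once, max_restarts=5):
    """Wrapper-style restart: call run_once() again while it returns non-zero.

    run_once is a hypothetical callable returning an exit status.
    max_restarts is an assumption added so the sketch always terminates.
    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        status = run_once()
        if status == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1


def watchdog_pass(expected, running, start):
    """Watchdog-style check: start every expected job that is not running.

    expected and running are lists of job names; start is a hypothetical
    callable that launches a job by name. Returns the names that were started.
    A real watchdog would repeat this pass on a timer (every minute, per the chat).
    """
    running_set = set(running)
    started = []
    for name in expected:
        if name not in running_set:
            start(name)
            started.append(name)
    return started
```

The wrapper only reacts when the job itself exits, while the watchdog notices a job that has vanished for any reason, which is why the chat describes using both as defense in depth. As a usage sketch, `restart_on_failure(lambda: subprocess.call(["python3", "bot.py"]))` would rerun a script named `bot.py` (a made-up file name) until it exits cleanly or the cap is reached.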