[00:47:11] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:51:37] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:22:41] Analytics, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: Import AMD rocm packages in wikimedia-buster - https://phabricator.wikimedia.org/T224723 (MoritzMuehlenhoff) ROCM is now also packaged in Debian: https://packages.qa.debian.org/r/rocr-runtime.html https://packages.qa.deb...
[08:19:53] good morning team
[09:12:08] Analytics-Clusters: PySpark Error in JupyterHub: Python in worker has different version - https://phabricator.wikimedia.org/T256997 (JAllemandou) I think the issue is related to using Spark in a SWAP notebook on stat1008. I reproduced the error using the preset pyspark kernels, but managed to have a working...
[12:35:59] !log Start backfilling of wdqs_internal (external had been done, not internal :S)
[12:36:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:39:19] Analytics, Analytics-Kanban, Event-Platform: Backfill wdqs_external_sparql_query without filtering on meta.domain - https://phabricator.wikimedia.org/T256797 (JAllemandou) I didn't notice this task was mentioning `wdqs_external` only, meaning that `wdqs_internal` were not included. Currently running a...
[13:00:13] Analytics-Clusters: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (Ottomata) All for it! Ultimately everything should be as isolated as possible, and the DB on an-coord1001 is a big SPOF. I'd prioritize 2., trying to get the db replicated and in some kind of hot...
[13:14:14] Analytics, Event-Platform, Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (Ottomata) Sure would love to! I just sent you an invite for this afternoon if that works.
[13:38:28] Analytics, Analytics-Kanban, Event-Platform: Backfill wdqs_external_sparql_query without filtering on meta.domain - https://phabricator.wikimedia.org/T256797 (Ottomata) OH! Thank you.
[14:02:05] hey teammmm, hello
[14:40:28] Analytics, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (Charlotte)
[14:40:39] Analytics, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (Charlotte)
[14:47:07] hi mforns!
[14:47:28] hey milimetric :]
[14:48:03] helloOO
[14:48:53] hi ottomata :]
[14:55:29] mforns: holaaa
[14:58:00] hello nuria :]
[15:01:20] ping joal
[15:06:30] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (Nuria)
[15:07:01] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Vertical: Migrate SearchSatisfaction EventLogging event stream to Event Platform - https://phabricator.wikimedia.org/T249261 (Ottomata) Last week we fully migrated SearchSatisfaction events to eventgate-analytics-e...
[15:07:16] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (Nuria) a:mforns→None
[15:58:01] (PS4) Ottomata: Overloaded methods to make working with default Refine related classes easier [analytics/refinery/source] - https://gerrit.wikimedia.org/r/607788
[16:34:24] Analytics, Analytics-Kanban, Patch-For-Review: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (Nuria) a:Milimetric→mforns
[16:34:32] * dsaez loves the a-team. Thanks for this
[16:34:38] * dsaez this https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_actor_hourly
[16:34:47] Analytics, Analytics-Kanban, Patch-For-Review: Create intermediate table that holds public data for geoeditors dataset so it can be used to load cassandra - https://phabricator.wikimedia.org/T244597 (Nuria) a:Milimetric→mforns
[16:35:03] dsaez: <3 - please know that the name will change this week to pagevierw_actor
[16:36:14] joal, got it. thx
[16:41:09] Analytics-Radar, Product-Analytics: Check Product Analytics team's standard datasets and remove COUNT(*) - https://phabricator.wikimedia.org/T256025 (fdans)
[16:41:29] Analytics-Radar, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (fdans)
[16:43:37] Analytics-Radar, Product-Analytics: Clarify the data retention extension process - https://phabricator.wikimedia.org/T256776 (fdans)
[16:44:45] Analytics: RU reportupdater-ee-beta-features keeps logging a lot of daily errors to its logs - https://phabricator.wikimedia.org/T256195 (fdans) a:mforns
[16:45:30] Analytics, Analytics-Kanban, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (fdans)
[16:47:50] Analytics-Radar, Product-Analytics (Kanban): Identify next steps for dealing with missing mobile app pageview counts - https://phabricator.wikimedia.org/T256804 (fdans)
[16:47:52] Analytics, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.32x-Q-Qurabiya): Send Analytics pageview header only for live views, not saved pages. - https://phabricator.wikimedia.org/T257859 (Dbrant)
[16:48:07] Analytics-Radar, Product-Analytics: Calculate impact of missing mobile app pageviews to high-level metrics - https://phabricator.wikimedia.org/T257373 (fdans)
[16:48:09] Analytics, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.32x-Q-Qurabiya): Send Analytics pageview header only for live views, not saved pages. - https://phabricator.wikimedia.org/T257859 (Dbrant) p:Triage→High a:Dbrant
[16:49:20] Analytics: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (JoeWalsh)
[16:49:21] Analytics-Radar, Better Use Of Data, Product-Analytics: Bug: 'Include Time' option in table visualization produces "0NaN-NaN-NaN NaN:NaN:NaN" - https://phabricator.wikimedia.org/T256136 (fdans)
[16:49:31] Analytics: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (JoeWalsh)
[16:49:50] Analytics, Analytics-Kanban, Patch-For-Review: Rename pageview_actor_hourly to pageview_actor - https://phabricator.wikimedia.org/T256415 (fdans) p:Triage→High
[16:50:02] Analytics, Analytics-Kanban, Patch-For-Review: Rename pageview_actor_hourly to pageview_actor - https://phabricator.wikimedia.org/T256415 (fdans) p:High→Triage
[16:50:48] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (fdans) p:Triage→High
[16:54:17] Analytics: Update refinery-core Webrequest.isWikimediaHost - https://phabricator.wikimedia.org/T256674 (fdans) p:Triage→Medium
[16:55:44] Analytics, Event-Platform: Refine should add field to indicate if event is from wikimedia domain instead of filtering - https://phabricator.wikimedia.org/T256677 (fdans) p:Triage→Medium
[16:56:27] Analytics: EventGate throttling and DOS prevention - https://phabricator.wikimedia.org/T256891 (fdans)
[16:56:45] Analytics: EventGate throttling and DOS prevention - https://phabricator.wikimedia.org/T256891 (fdans) ping @Ottomata
[16:57:39] Analytics: EventGate throttling and DOS prevention - https://phabricator.wikimedia.org/T256891 (fdans) p:Triage→Medium
[17:01:06] milimetric: is now lunch tome for you?
[17:01:15] s/o/i
[17:01:32] joal: I was just gonna IM you, yes, I just need a bit to catch my breath and then we can deploy?
[17:01:42] milimetric: if som I go for diner and we deploy later?
[17:01:56] In 1h, 1h30?
[17:04:15] milimetric: ok I'm gone for diner and put kids to bed, I'll ping when back
[17:05:09] Analytics: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (JoeWalsh)
[17:05:11] Analytics, Analytics-Kanban, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (JoeWalsh)
[17:32:36] sorry, np, I'm back, anytime now
[17:51:03] Analytics: EventGate throttling and DOS prevention - https://phabricator.wikimedia.org/T256891 (Ottomata) No, no more than we added throttling to EventLogging /beacon/event :p
[18:06:37] Analytics-Radar, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (MNeisler)
[18:08:36] ottomata: would it make sense to loop through the first N events per hour looking for the first one with an actual $schema?
[18:21:22] milimetric: back as well :)
[18:21:32] in da cave
[18:21:36] joining!
[18:26:05] !log Kill pageview_actor_hourly and unique_devices_per_project_family jobs to copy backfilled data
[18:26:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:36:29] joal: i have forgotten....what did we decided to do about $schema and schema merrging and did we write it down somewhere?
[18:37:20] ottomata: currently doing some ops with Dan - need to concentrate - You might have written the decision in the task (can't recall)
[18:37:23] ok
[18:37:52] i i made a task very good past me
[18:37:53] thank you
[18:40:48] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (Nuria)
[18:44:30] Analytics: Refine drops $schema field values - https://phabricator.wikimedia.org/T255818 (Ottomata) @JAllemandou I thought I could get dropping struct columns to work like: `lang=scala val newMetaCol = struct("meta", df0.select("meta.*").drop("topic").columns:_*) df0.withColumn("meta", newMetaCol) ` But it...
[18:45:17] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (Nuria)
[18:56:40] Analytics-Radar, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (nshahquinn-wmf)
[19:03:55] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (mforns) Yes, it's surprising that it didn't alarm... After seeing the whole timeseries though: {F31933841} I think it didn't alarm because the RSVD algorithm gives the same importance to pa...
[19:14:57] (CR) Milimetric: [V: +2 C: +2] Update geoeditors daily to not use map-join [analytics/refinery] - https://gerrit.wikimedia.org/r/609510 (https://phabricator.wikimedia.org/T257397) (owner: Joal)
[19:16:49] (CR) Milimetric: [C: +2] Update clickstream rename pageview_actor_hourly [analytics/refinery/source] - https://gerrit.wikimedia.org/r/610153 (https://phabricator.wikimedia.org/T256415) (owner: Joal)
[19:17:57] (CR) Milimetric: [V: +2 C: +2] Rename pageview_actor_hourly to pageview_actor [analytics/refinery] - https://gerrit.wikimedia.org/r/610159 (https://phabricator.wikimedia.org/T256415) (owner: Joal)
[19:18:14] ottomata: we're deploying refinery-source - I assume since I have not yet review your patches you have nothing to deploy now?
[19:19:05] das right! :)
[19:19:31] ottomata: sorry for missing train again :(
[19:19:56] Starting build #53 for job analytics-refinery-maven-release-docker
[19:19:57] s'ok still not urgent and i can use my own complied refinery when i am doing debugging like right now :)
[19:20:03] ok
[19:24:44] !log Drop pageview_actor_hourly and replace it by pageview_actor
[19:24:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:32:05] Project analytics-refinery-maven-release-docker build #53: SUCCESS in 12 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/53/
[19:36:22] Starting build #20 for job analytics-refinery-update-jars-docker
[19:36:39] Project analytics-refinery-update-jars-docker build #20: SUCCESS in 17 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/20/
[19:38:09] (PS1) Maven-release-user: Add refinery-source jars for v0.0.130 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/612405
[19:39:10] (CR) Milimetric: [V: +2 C: +2] Add refinery-source jars for v0.0.130 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/612405 (owner: Maven-release-user)
[19:41:19] !log Deploy refinery with scap
[19:43:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:43:39] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog (Kanban): EventLogging Server Side client should POST to EventGate - https://phabricator.wikimedia.org/T253121 (Mholloway) a:Mholloway
[19:48:38] milimetric: I'm updating the wikitech docs for pageview_actor rename
[19:56:29] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Vertical: Migrate SearchSatisfaction EventLogging event stream to Event Platform - https://phabricator.wikimedia.org/T249261 (Ottomata) The backfill of the offending SearchSatisfaction hour (/wmf/data/event/SearchS...
[20:06:11] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (Nuria) Given that this is a daily measure it seems that 1000 datapoints is a bit much, right? probably 1 /2 months back seems as much as we should look given the phenomena we are trying to m...
[20:11:59] Analytics, Analytics-Kanban: User entropy alarms. Evaluate thresholds - https://phabricator.wikimedia.org/T257691 (Nuria) >Plus, if we continue to read the whole time-series, the RSVD will potentially become worse Agreed, I was not aware this is what was happening.
[20:17:38] !log deployed weekly train with two oozie job bugfixes and rename to pageview_actor table
[20:17:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:17:40] (restarting now)
[20:34:24] Analytics-Radar, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (mpopov)
[20:41:43] k, restarting done. I have to wait for the daily druid jobs to finish, run compaction and I'm done.
[20:48:14] \o/
[20:49:24] milimetric: I think you can restart the compaction - all daily jobs for June have finished (druid coordinator shows me only daily segmented)
[20:51:08] k
[21:10:47] Analytics-Clusters: PySpark Error in JupyterHub: Python in worker has different version - https://phabricator.wikimedia.org/T256997 (diego) Thanks @JAllemandou I'll try that. In the mean time, this simple example produce the same error with the current configuration (not applying @JAllemandou yet): `...
[21:20:10] Analytics-Clusters: PySpark Error in JupyterHub: Python in worker has different version - https://phabricator.wikimedia.org/T256997 (diego) And I confirm that this error is not happening on the stat1007.