[07:23:55] <wikibugs>	 serviceops, Analytics, ChangeProp, Community-Tech, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Joe)
[07:31:42] <wikibugs>	 serviceops, Analytics, ChangeProp, Community-Tech, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Joe) I'm a bit conflicted about this, and let me clarify why:  in all the use-cases referenced above the use...
[10:54:36] <_joe_>	 my mac crashed 
[10:59:53] <wikibugs>	 serviceops, ChangeProp, Release Pipeline, Patch-For-Review, and 2 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (Pchelolo)
[10:59:57] <wikibugs>	 serviceops, ChangeProp, Release Pipeline, Core Platform Team Kanban (Done with CPT), Services (done): Make change-prop tests independent of Kafka and Redis - https://phabricator.wikimedia.org/T218396 (Pchelolo) Open→Resolved Now it's ready - CP tests are independent of both Kafka and...
[12:03:21] <wikibugs>	 serviceops, Operations, ops-codfw, User-jijiki: mw2206.codfw.wmnet memory issues - https://phabricator.wikimedia.org/T215415 (jijiki) @Papaul then the issue is somewhere else:  ` [Thu Mar 21 11:02:31 2019] mce: [Hardware Error]: Machine check events logged [Thu Mar 21 11:02:31 2019] EDAC sbridge...
[12:05:21] <wikibugs>	 serviceops, Operations, ops-codfw, User-jijiki: mw2206.codfw.wmnet memory issues - https://phabricator.wikimedia.org/T215415 (MoritzMuehlenhoff) It could be simply a broken CPU? If we have such the CPU type in a decom host, we could loot it from there.
[12:58:44] <wikibugs>	 serviceops, Analytics, ChangeProp, Community-Tech, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Pchelolo) There is already an ability to execute jobs after a delay or at more-or-less specific time, but it'...
[13:56:13] <ottomata>	 godog:  o/ 
[13:56:14] <ottomata>	 qq
[13:56:28] <ottomata>	 the librdkafka stats are emitted internally in eventgate once every 30 seconds
[13:56:38] <ottomata>	 and i suppose prometheus is configured to scrape every 60 seconds
[13:56:52] <ottomata>	 so i think e.g. the windows are calculated every 30 seconds (not sure about that)
[13:57:08] <ottomata>	 for some other things. like message rates
[13:57:22] <ottomata>	 i'm doing a sum(rate ... ))
[13:57:38] <ottomata>	 with rate [5m]
[13:57:53] <ottomata>	 seems like it works, but i think the metrics are pretty slow to update.  maybe it's ok?
[13:58:00] <ottomata>	 i guess i'm just looking for best practice here
[13:58:09] <ottomata>	 if you have any thoughts.
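A rough illustration of why a `rate(...[5m])` panel can look slow to update when samples arrive only once per scrape. This is a simplified sketch, not Prometheus's actual `rate()` (it skips extrapolation and counter-reset handling). The 60 second scrape interval is ottomata's guess from above, and the 10 msg/s traffic pattern is invented for the example.

```python
def simple_rate(samples, now, window):
    """Per-second increase between the first and last sample in [now - window, now].

    samples: list of (timestamp_seconds, counter_value), sorted by timestamp.
    Simplified: no extrapolation and no counter-reset handling.
    """
    in_window = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_window) < 2:
        return None  # a rate needs at least two samples inside the window
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Assumed scenario: counter scraped every 60s, traffic starts at t=600 at 10 msg/s.
samples = [(t, max(0, t - 600) * 10) for t in range(0, 901, 60)]

print(simple_rate(samples, 660, 300))  # one minute after traffic starts, 5m window
print(simple_rate(samples, 900, 300))  # five minutes after traffic starts, 5m window
print(simple_rate(samples, 720, 120))  # two minutes after traffic starts, 2m window
```

In this toy setup the 5m window only reports the full rate once the whole window is covered by post-change samples, while a 2m window gets there sooner at the cost of a noisier graph.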
[14:03:41] <wikibugs>	 serviceops, Operations, ops-codfw, User-jijiki: mw2206.codfw.wmnet memory issues - https://phabricator.wikimedia.org/T215415 (CDanis) The memory address is the same in all of these error reports.  That suggests to me that one of the DIMMs has a 'stuck' bit and that it is unlikely to be a CPU issu...
[14:21:59] <godog>	 ottomata: hi! not sure what you mean by metrics are slow to update, like the query is slow?
[14:22:07] <mutante>	 does a parsoid::testing server need to match the PHP version on the prod parsoid servers (7.0) (assumed that) or the one on appservers (7.2)  per https://phabricator.wikimedia.org/T216102#4954452 ?
[14:22:17] <mutante>	 my assumption was that i match wtp* 
[14:22:28] <ottomata>	 no sorry
[14:22:55] <ottomata>	 actually yes.  
[14:23:02] <ottomata>	 i have two questions, but let me ask the second first
[14:23:08] <ottomata>	 because i can't provide an example
[14:23:08] <ottomata>	 !
[14:23:17] <ottomata>	 godog:  when you go here
[14:23:17] <ottomata>	 https://grafana.wikimedia.org/d/ePFPOkqiz/eventgate?refresh=1m&orgId=1&from=1553174590868&to=1553178190868&var-dc=eqiad%20prometheus%2Fk8s-staging&var-service=eventgate-analytics&var-kafka_topic=All&var-kafka_broker=All&var-kafka_producer_type=All
[14:23:30] <ottomata>	 (and switch to k8s-staging if not already selected)
[14:23:39] <ottomata>	 do you see anything under kafka messages rate by topic?
[14:23:42] <ottomata>	 or just No Data points?
[14:24:49] <ottomata>	 i see no data
[14:24:50] <ottomata>	 and yet
[14:24:54] <ottomata>	 http://localhost:8000/k8s-staging/graph?g0.range_input=1h&g0.expr=sum(rate(eventgate_rdkafka_producer_topic_partition_txmsgs%5B5m%5D))+by+(producer_type%2C+topic)&g0.tab=0
[14:25:01] <ottomata>	 has what I expect
[14:25:18] <ottomata>	 i can usually get grafana to display the kafka data IF I click around a lot and change things,
[14:25:23] <ottomata>	 switch time frame to last 24 hours
[14:25:29] <ottomata>	 flip between datasources
[14:25:34] <ottomata>	 then eventually it will show something
[14:27:20] <godog>	 ah ok, sorry I don't have time to dig into that right now ottomata 
[14:27:23] <ottomata>	 the query inspector shows that it is executing correctly
[14:27:27] <ottomata>	 but just has no results
[14:28:17] <ottomata>	 ok godog 
[14:28:50] <ottomata>	 i can verify that 
[14:28:53] <ottomata>	 curl 'localhost:8000/k8s-staging/api/v1/query_range?query=sum(rate(eventgate_rdkafka_producer_topic_partition_txmsgs%7Bservice%3D%22eventgate-analytics%22%2Cproducer_type%3D~%22%22%2Ctopic%3D~%22staging_test_event%22%7D%5B5m%5D))%20by%20(producer_type%2Ctopic)&start=1553174820&end=1553178435&step=15'
[14:28:56] <ottomata>	 doesn't return any data
[14:28:58] <ottomata>	 v strange
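For readers following along, a small sketch that decodes the percent-encoded query from the curl line above so the label matchers are readable. Only the Python standard library is used. The `http://` scheme is added here for completeness and is not in the original command.

```python
from urllib.parse import parse_qs

# URL copied from the curl command above, with an http:// scheme prepended.
URL = (
    "http://localhost:8000/k8s-staging/api/v1/query_range?"
    "query=sum(rate(eventgate_rdkafka_producer_topic_partition_txmsgs"
    "%7Bservice%3D%22eventgate-analytics%22%2Cproducer_type%3D~%22%22"
    "%2Ctopic%3D~%22staging_test_event%22%7D%5B5m%5D))"
    "%20by%20(producer_type%2Ctopic)"
    "&start=1553174820&end=1553178435&step=15"
)


def decode_params(url):
    """Return the query-string parameters of url as a dict of decoded strings."""
    query_string = url.split("?", 1)[1]
    return {key: values[0] for key, values in parse_qs(query_string).items()}


params = decode_params(URL)
print(params["query"])
print("range in seconds:", int(params["end"]) - int(params["start"]))
```

One thing that may be worth checking in the decoded query is the `producer_type=~""` matcher. PromQL regex matchers are fully anchored, so an empty pattern only selects series whose `producer_type` label is empty or absent. Whether that explains the empty result here is a guess, not something the log confirms.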
[14:38:39] <ottomata>	 akosiaris:  yt?  
[14:38:43] <akosiaris>	 ?
[14:38:51] <ottomata>	 i think i can reproduce this eventgate/k8s/kafka problem in staging.
[14:39:02] <akosiaris>	 which problem?
[14:39:03] <ottomata>	 but i'm not really sure how to troubleshoot it
[14:39:34] * akosiaris reading backlog
[14:39:36] <ottomata>	 well right now, what happens when a new pod is spawned
[14:39:47] <ottomata>	 akosiaris:  backlog won't help unless in -services :)
[14:39:56] <ottomata>	 so
[14:39:59] <ottomata>	 new pod gets spawned
[14:40:07] <ottomata>	 and the service connects to kafka and says it is ready
[14:40:19] <ottomata>	 but, the first time I try to POST to it
[14:40:24] <ottomata>	 it blocks, and does not produce to Kafka for a LONG time.
[14:40:29] <ottomata>	 > 60 seconds
[14:40:33] <ottomata>	 or more
[14:40:37] <akosiaris>	 before anything else
[14:40:42] <ottomata>	 the http client eventually times out
[14:40:44] <akosiaris>	 do we continue this here or in #-services?
[14:40:48] <ottomata>	 and EVENTUALLY the messages are produced
[14:40:49] <ottomata>	 oh
[14:41:00] <ottomata>	 the same people are in both!
[14:41:04] <ottomata>	 i guess services?
[14:42:47] <akosiaris>	 ok
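The symptom described above (the first POST to a fresh pod blocks for more than 60 seconds, later ones presumably do not) could be measured with something like the sketch below. It is a minimal standard-library probe, not the tooling used in the channel. The `probe` helper, its default JSON body, and any URL passed to it are hypothetical.

```python
import time
import urllib.request


def timed_post(url, body, timeout, opener=urllib.request.urlopen):
    """POST body to url and return (status or None, seconds elapsed, error or None)."""
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, time.monotonic() - start, None
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return None, time.monotonic() - start, exc


def probe(url, attempts=3, timeout=90.0, body=b'{"hypothetical": "event"}'):
    """Send several POSTs in a row and print how long each one took."""
    for attempt in range(1, attempts + 1):
        status, elapsed, error = timed_post(url, body, timeout)
        print(f"attempt {attempt}: status={status} elapsed={elapsed:.1f}s error={error}")
```

Running `probe(...)` against a freshly spawned pod with a timeout above 60 seconds would show whether only the first request is slow. The endpoint path and a valid event body depend on the service and are not given in the log.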
[15:44:16] <wikibugs>	 serviceops, Operations, Wikidata, Wikidata-Termbox-Hike, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (Addshore) >>! In T212189#5020187, @akosiaris wrote: >  > Thanks for the understanding. We are drafting next quarter goals this week, I 'll...
[15:47:54] <wikibugs>	 serviceops, Operations, Wikidata, Wikidata-Termbox-Hike, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (akosiaris) >>! In T212189#5044451, @Addshore wrote: >>>! In T212189#5020187, @akosiaris wrote: >>  >> Thanks for the understanding. We are...
[16:07:47] <wikibugs>	 serviceops, Analytics, ChangeProp, Community-Tech, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Mooeypoo) >>! In T218812#5042813, @Joe wrote: > @aezell is there something I'm missing? Wouldn't a scheduled...
[16:20:03] <wikibugs>	 serviceops, Analytics, ChangeProp, Community-Tech, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (aezell) >>! In T218812#5042813, @Joe wrote: > - TTL-based expiry of records  I agree. This is the "work aroun...
[22:59:39] <wikibugs>	 serviceops, Analytics, Analytics-Kanban, EventBus, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (Ottomata) I don't know much more, but I have a lot more data!  Here is a staging pod with trace logging enabled reproducin...