[00:51:33] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service: Make the user agent configurable for Wikidata Query Service Updater - https://phabricator.wikimedia.org/T217896 (Smalyshev) Blazegraph accepts `http.userAgent` but looks like Updater does not. Probably makes sense to make it do the same.
[00:52:09] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Make the user agent configurable for Wikidata Query Service Updater - https://phabricator.wikimedia.org/T217896 (Smalyshev)
[00:52:19] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Make the user agent configurable for Wikidata Query Service Updater - https://phabricator.wikimedia.org/T217896 (Smalyshev) p:Triage→Normal
[01:09:51] <wikibugs>	 (PS5) Paladox: Add "multi-site" plugin so gerrit can have multi masters [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/494865
[01:25:03] <wikibugs>	 (PS1) Jdlrobson: Restore descriptions to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/495419 (https://phabricator.wikimedia.org/T217931)
[01:41:49] <wikibugs>	 (CR) Krinkle: [C: +2] "Beta-only. Will revert if it blows up." [mediawiki-config] - https://gerrit.wikimedia.org/r/495419 (https://phabricator.wikimedia.org/T217931) (owner: Jdlrobson)
[01:42:35] <wikibugs>	 (CR) Krinkle: [C: +1] Remove $wgMediaInTargetLanguage, matches the MW default now [mediawiki-config] - https://gerrit.wikimedia.org/r/495408 (owner: MaxSem)
[01:42:50] <wikibugs>	 (Merged) jenkins-bot: Restore descriptions to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/495419 (https://phabricator.wikimedia.org/T217931) (owner: Jdlrobson)
[01:49:45] <wikibugs>	 (CR) jenkins-bot: Restore descriptions to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/495419 (https://phabricator.wikimedia.org/T217931) (owner: Jdlrobson)
[02:27:55] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 26187896 and 1 seconds
[02:30:17] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 52288 and 56 seconds
[02:52:33] <wikibugs>	 (PS1) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/495424
[02:54:11] <wikibugs>	 (Abandoned) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/495265 (owner: Paladox)
[02:54:16] <wikibugs>	 (CR) Paladox: [V: +2 C: +2] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/495424 (owner: Paladox)
[02:55:11] <wikibugs>	 (PS2) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/495424
[06:28:09] <icinga-wm>	 PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.163 second response time https://wikitech.wikimedia.org/wiki/Netbox
[06:28:37] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:37:03] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:37:47] <icinga-wm>	 RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.669 second response time https://wikitech.wikimedia.org/wiki/Netbox
[10:07:21] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on graphite1004 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:08:23] <elukey>	 mcrouter is complaining https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-instance=All
[10:08:26] <elukey>	 for the apis
[10:09:14] <elukey>	 and it is for mc1022 (I checked /var/log/mcrouter.log on one API appserver)
[10:10:22] <elukey>	 and from https://grafana.wikimedia.org/d/000000317/memcache-slabs?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1022&var-slab=All the culprit seems to be slab 168, which is the recurring issue with the translate-groups key
[10:10:32] <elukey>	 in theory it should recover pretty soon
[10:10:45] <elukey>	 on the hosts I can see the TKOs are already gone
[10:14:02] * elukey goes afk now but available if needed
[10:24:05] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on graphite1004 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:40:43] <wikibugs>	 (PS1) Jayprakash12345: Enable $wgAllowCopyUploads for pawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/495446 (https://phabricator.wikimedia.org/T217486)
[11:33:27] <wikibugs>	 (PS1) Volans: check_icinga: fix retry logic [software/external-monitoring] - https://gerrit.wikimedia.org/r/495447 (https://phabricator.wikimedia.org/T217599)
[12:33:42] <wikibugs>	 (CR) Hashar: [C: +1] "+1 per Bryan :-]  Then we will see whether it actually changes anything on the LDAP Grafana dashboard." [puppet] - https://gerrit.wikimedia.org/r/494922 (https://phabricator.wikimedia.org/T217280) (owner: GTirloni)
[13:29:08] <wikibugs>	 (PS1) Elukey: superset: fix database name for analytics-tool1004 [puppet] - https://gerrit.wikimedia.org/r/495452
[13:31:54] <wikibugs>	 (CR) Elukey: [C: +2] superset: fix database name for analytics-tool1004 [puppet] - https://gerrit.wikimedia.org/r/495452 (owner: Elukey)
[13:51:01] <icinga-wm>	 RECOVERY - superset on analytics-tool1004 is OK: TCP OK - 0.036 second response time on 10.64.36.116 port 9080
[13:52:37] <icinga-wm>	 RECOVERY - puppet last run on analytics-tool1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:30:27] <icinga-wm>	 PROBLEM - superset on analytics-tool1004 is CRITICAL: connect to address 10.64.36.116 and port 9080: Connection refused
[16:30:57] <icinga-wm>	 PROBLEM - Check systemd state on analytics-tool1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:32:49] <icinga-wm>	 RECOVERY - superset on analytics-tool1004 is OK: TCP OK - 0.036 second response time on 10.64.36.116 port 9080
[16:33:00] <elukey>	 this is me testing on a testing host :)
[16:33:19] <icinga-wm>	 RECOVERY - Check systemd state on analytics-tool1004 is OK: OK - running: The system is fully operational
[21:21:00] <wikibugs>	 Operations, MobileFrontend, TechCom, Traffic, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Krinkle) >>! In T214998#5007578, @Tbayer wrote: >>>! In T214998#5005391, @Krinkle wrote: >> I...
[22:10:14] <wikibugs>	 (CR) MarkAHershberger: "glad you suggested it.  coming up!" [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/495158 (owner: MarkAHershberger)
[22:20:56] <wikibugs>	 (PS2) MarkAHershberger: Add the current time to the tag for the nightly build [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/495158
[22:36:16] <icinga-wm>	 PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:02:08] <icinga-wm>	 RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures