[00:37:26] Operations, Packaging, Toolforge: Upload python-pykube deb to apt.wikimedia.org - https://phabricator.wikimedia.org/T200660 (Legoktm) Open>Resolved Thank you, seems to work as expected! [00:43:22] kaldari: I don't see any rsvg package in https://tools.wmflabs.org/apt-browser/stretch-wikimedia/main/ so it's going to be the Debian version, which is 2.40.16 (https://packages.debian.org/source/stable/librsvg) [00:44:21] Thanks [00:48:09] (CR) Legoktm: [C: -1] "Getting closer...:" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/449033 (https://phabricator.wikimedia.org/T188318) (owner: Legoktm) [00:50:22] Operations, Wikimedia-SVG-rendering, Upstream: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947 (kaldari) >When we migrate the image scalers to Debian stretch we'll have a refreshed graphics library stack. @MoritzMuehlenh... [00:50:40] Operations, Wikimedia-SVG-rendering: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947 (kaldari) [01:35:12] _joe_: sorry was sleeping [02:55:11] PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 51560 MB (10% inode=99%) [03:26:41] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 787.15 seconds [03:28:31] RECOVERY - Disk space on elastic1018 is OK: DISK OK [03:39:50] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 196.89 seconds [04:37:30] PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 52245 MB (10% inode=99%) [04:41:31] RECOVERY - Disk space on elastic1018 is OK: DISK OK [04:47:49] Operations, DNS, Traffic, WMF-Communications, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - 
https://phabricator.wikimedia.org/T188776 (Liuxinyu970226) @varnent Found two toolforge tools [[https://tools.wmflabs.org/cdnjs/|cdnjs]] and [[https://tools... [06:24:14] Operations, Continuous-Integration-Infrastructure: docker-registry returned HTTP 403 Forbidden in CI run - https://phabricator.wikimedia.org/T201737 (Legoktm) [06:39:11] Operations, Continuous-Integration-Infrastructure: docker-registry returned HTTP 403 Forbidden in CI run - https://phabricator.wikimedia.org/T201737 (Legoktm) Again in https://integration.wikimedia.org/ci/job/composer-package-php72-docker/914/console [07:01:31] PROBLEM - Host mw1319 is DOWN: PING CRITICAL - Packet loss = 100% [07:02:30] RECOVERY - Host mw1319 is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [08:16:21] PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /srv 51696 MB (10% inode=99%) [08:36:41] RECOVERY - Disk space on elastic1020 is OK: DISK OK [09:40:10] PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert. [09:42:11] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting. 
[09:47:36] <_joe_> revi: heh np it was obvious in hindsight :) [09:47:55] :D [09:48:30] and yes Commons people are somewhat obsessed with their 'mass edit/upload' [11:58:50] PROBLEM - MariaDB Slave Lag: s5 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 442.12 seconds [11:59:50] RECOVERY - MariaDB Slave Lag: s5 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [12:03:57] (CR) Framawiki: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/451698 (https://phabricator.wikimedia.org/T192698) (owner: Zhuyifei1999) [12:28:41] PROBLEM - HP RAID on db2033 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK [12:28:44] ACKNOWLEDGEMENT - HP RAID on db2033 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T201757 [12:28:50] Operations, ops-codfw: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T201757 (ops-monitoring-bot) [12:47:41] PROBLEM - Device not healthy -SMART- on db2033 is CRITICAL: cluster=mysql device=cciss,11 instance=db2033:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2033&var-datasource=codfw%2520prometheus%252Fops [13:34:56] Am I the only one experiencing problems with sessions on WMF sites? [13:35:18] I log on to one device, but it kicks me out from the other device at some point. [13:35:42] Almost every day I find myself having to log back into Wikipedia, which gets annoying with 2FA [13:36:37] And this morning Phabricator freaked out, and froze my browser with a runaway JS that printed several million "You need to be logged in to perform this action" error messages. 
[13:37:09] paladox: ^ [13:37:44] Also, I have to log in to sister projects. SUL is not carrying over to them. [13:38:22] I’m not experiencing any problems, try clearing cookies and then log in? [13:38:44] Example: I'm logged in to enwiki, but when I go to enwiktionary I have to log in again, despite already having an SUL on that wiki [13:41:59] paladox: didn't fix it. I just logged in on mediawiki.org but I'm still logged out on enwiki [13:42:05] Critical Alert for device cr2-ulsfo.wikimedia.org - Primary outbound port utilisation over 80% [13:45:27] paladox: amazing. 2FA verification failed. :/ [13:46:03] OMG, I CAN'T LOGIN [13:46:31] HELPP [13:46:43] None of my 2FA codes are working [13:47:05] C̶r̶i̶t̶i̶c̶a̶l Device cr2-ulsfo.wikimedia.org recovered from Primary outbound port utilisation over 80% [13:48:20] Okay, there it goes [13:49:54] paladox: But I'm still being forced to log in to different wikis [13:50:29] And I wiped every cookie. [13:52:41] Oh [13:54:37] But maybe my persistence issue is resolved. We'll see. 
[16:31:00] PROBLEM - HP RAID on db2039 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:2 - Controller: OK - Battery/Capacitor: OK [16:31:03] ACKNOWLEDGEMENT - HP RAID on db2039 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:2 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T201761 [16:31:07] Operations, ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T201761 (ops-monitoring-bot) [16:40:00] PROBLEM - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds [16:53:31] PROBLEM - Device not healthy -SMART- on db2039 is CRITICAL: cluster=mysql device=cciss,11 instance=db2039:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2039&var-datasource=codfw%2520prometheus%252Fops [16:57:21] RECOVERY - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1015 bytes in 0.075 second response time [17:11:00] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [17:15:01] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [17:36:31] (PS1) Zoranzoki21: Fix 'the the' typos in code [mediawiki-config] - https://gerrit.wikimedia.org/r/452050 (https://phabricator.wikimedia.org/T201491) [17:37:44] (PS2) Zoranzoki21: Fix 'the the' typos in code [mediawiki-config] - https://gerrit.wikimedia.org/r/452050 
(https://phabricator.wikimedia.org/T201491) [17:49:04] (PS3) Zoranzoki21: Fix 'the the' typos in code [mediawiki-config] - https://gerrit.wikimedia.org/r/452050 (https://phabricator.wikimedia.org/T201491) [17:49:51] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [17:50:16] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) @Nemo_bis Google has read in the sitemap for it.wikipedia.org, whi... [17:50:50] (PS1) Zoranzoki21: Fix 'the the' typos in code (part #2) [mediawiki-config] - https://gerrit.wikimedia.org/r/452051 (https://phabricator.wikimedia.org/T201491) [17:51:51] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [17:52:07] (PS2) Zoranzoki21: Fix 'the the' typos in code (part #2) [mediawiki-config] - https://gerrit.wikimedia.org/r/452051 (https://phabricator.wikimedia.org/T201491) [17:53:19] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) https://tools.wmflabs.org/pageviews/?project=it.wikipedia.org&plat... 
[17:53:31] (PS4) Urbanecm: Fix 'the the' typo in wmf-config/CirrusSearch-common.php [mediawiki-config] - https://gerrit.wikimedia.org/r/452050 (https://phabricator.wikimedia.org/T201491) (owner: Zoranzoki21) [17:54:00] (PS3) Urbanecm: Fix 'the the' typo in vendor/perftools/xhgui-collector/external/header.php [mediawiki-config] - https://gerrit.wikimedia.org/r/452051 (https://phabricator.wikimedia.org/T201491) (owner: Zoranzoki21) [18:08:10] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [18:16:20] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [18:20:56] Operations, Core-Platform-Team, TechCom-RFC, Traffic, and 4 others: Harmonise the identification of requests across our stack - https://phabricator.wikimedia.org/T201409 (Imarlier) [18:25:36] Operations, Performance-Team, Traffic: Significant increase in Time To First Byte on 8/8, between 16:00 and 20:00 UTC - https://phabricator.wikimedia.org/T201769 (Imarlier) [18:31:30] Operations, Performance-Team, Traffic: Significant increase in Time To First Byte on 8/8, between 16:00 and 20:00 UTC - https://phabricator.wikimedia.org/T201769 (Imarlier) [18:40:30] PROBLEM - HHVM jobrunner on mw1309 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time [18:41:30] RECOVERY - HHVM jobrunner on mw1309 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [18:44:51] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] 
https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [18:46:51] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [18:59:11] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [19:01:11] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [19:02:59] Operations, Domains, Traffic, WikimediaUI Style Guide: Redirect design.wikimedia.org/style-guide/wiki/* to design.wikimedia.org/style-guide/ - https://phabricator.wikimedia.org/T200304 (Volker_E) [19:36:55] Operations, Performance-Team, Traffic: Significant increase in Time To First Byte on 2018-08-08, between 16:00 and 20:00 UTC - https://phabricator.wikimedia.org/T201769 (Aklapper) [21:30:54] Operations, Maps: maps.wikimedia.org is showing old vandalized version of OSM - https://phabricator.wikimedia.org/T201772 (MusikAnimal) [21:52:55] (PS6) Gehel: [WIP] extract reporting from BaseEventHandler [software/cumin] - https://gerrit.wikimedia.org/r/451080 [22:33:36] (CR) Nuria: [C: 1] Add cron job to create and rotate EventLogging salts [puppet] - https://gerrit.wikimedia.org/r/451780 (https://phabricator.wikimedia.org/T199899) (owner: Mforns) [22:58:01] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:02:10] RECOVERY - kubelet operational latencies 
on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:27:37] Operations, Commons, Multimedia, media-storage: Damaged uploads interrupted with reaching of 5 MB - https://phabricator.wikimedia.org/T201379 (Urbanecm) [23:31:12] (CR) Krinkle: [C: 1] Remove obsolete $wgPopupsBetaFeature from InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/444574 (owner: Prtksxna) [23:31:15] (CR) Krinkle: [C: 1] Remove obsolete $wgPopupsBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/450906 (owner: Prtksxna)