[00:06:04] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 998 [00:13:32] (CR) Dereckson: "Testing procedure:" [mediawiki-config] - https://gerrit.wikimedia.org/r/371968 (https://phabricator.wikimedia.org/T173342) (owner: Dereckson) [00:31:40] Puppet, Beta-Cluster-Infrastructure, Wikidata: mediawiki::maintenance::wikidata should not run crons for testwikidatawiki when used on labs / a testwikidatawiki doesnt exist - https://phabricator.wikimedia.org/T173357#3525307 (Addshore) [00:32:02] Puppet, Beta-Cluster-Infrastructure, Wikidata, User-Addshore: mediawiki::maintenance::wikidata should not run crons for testwikidatawiki when used on labs / a testwikidatawiki doesnt exist - https://phabricator.wikimedia.org/T173357#3525322 (Addshore) [02:28:15] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 58s) [02:28:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:49:33] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.13) (duration: 07m 53s) [02:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:56:14] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 15 02:56:14 UTC 2017 (duration 6m 42s) [02:56:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:27:14] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 725.90 seconds [04:22:34] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 248.71 seconds [05:44:34] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [05:45:34] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:49:34] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold 
[1000.0] [06:02:44] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:34:54] PROBLEM - pdfrender on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 5252: Connection refused [06:47:09] (PS1) Muehlenhoff: Record new account expiration dates of ISI researchers [puppet] - https://gerrit.wikimedia.org/r/372027 [06:56:22] (CR) Muehlenhoff: [C: 2] Record new account expiration dates of ISI researchers [puppet] - https://gerrit.wikimedia.org/r/372027 (owner: Muehlenhoff) [07:18:09] !log restart pdfrender on scb1003 [07:18:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:19:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time [07:24:14] Operations, Analytics, Traffic, Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#3525460 (elukey) Number 1 was done in T138747, but I was a bit reluctant to create any dependency between Varnish and Varnishkafka that could for s... [07:26:16] Operations, Analytics, Traffic, User-Elukey, Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#3525461 (elukey) [07:36:29] (PS2) Gehel: Revert "Switch elastic1017-1031 to niofs" [puppet] - https://gerrit.wikimedia.org/r/371962 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson) [07:37:05] (CR) Gehel: [C: 2] Revert "Switch elastic1017-1031 to niofs" [puppet] - https://gerrit.wikimedia.org/r/371962 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson) [07:47:24] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
[07:50:24] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [07:56:34] !log installing cvs security updates [07:56:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:29:38] PROBLEM - MariaDB Slave SQL: s3 on db1078 is CRITICAL: CRITICAL slave_sql_state could not connect [09:29:48] PROBLEM - MariaDB Slave IO: s3 on db1078 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:36] <_joe_> hey jynus [09:33:24] (PS1) Jcrespo: mariadb: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/372040 [09:35:43] (CR) Jcrespo: [C: 2] mariadb: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/372040 (owner: Jcrespo) [09:36:18] PROBLEM - MariaDB Slave Lag: s3 on db1078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 561.73 seconds [09:37:12] (Merged) jenkins-bot: mariadb: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/372040 (owner: Jcrespo) [09:37:21] (CR) jenkins-bot: mariadb: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/372040 (owner: Jcrespo) [09:38:24] PROBLEM - HP RAID on db1078 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2 - Failed: 1I:1:4 - Controller: OK - Battery/Capacitor: OK [09:38:25] ACKNOWLEDGEMENT - HP RAID on db1078 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2 - Failed: 1I:1:4 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T173365 [09:38:29] Operations, ops-eqiad: Degraded RAID on db1078 - https://phabricator.wikimedia.org/T173365#3525572 (ops-monitoring-bot) [09:39:26] !log jynus@tin Synchronized wmf-config/db-eqiad.php: depool db1078 (duration: 00m 49s) [09:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:20] Operations, ops-eqiad, DBA: 
RAID crashed on db1078 - https://phabricator.wikimedia.org/T173365#3525589 (jcrespo) [10:01:24] Operations, ops-eqiad, DBA: RAID crashed on db1078 - https://phabricator.wikimedia.org/T173365#3525572 (jcrespo) ``` 170815 9:26:51 [ERROR] InnoDB: Tried to read 16384 bytes at offset 20021248. Was only able to read 0. 2017-08-15 09:26:51 7f0b8bd0a700 InnoDB: Operating system error number 5 in a fi... [10:09:58] !log rebooting db1078 [10:10:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:23] !log installing PHP security updates on trusty [10:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:14] Operations, ops-eqiad, DBA: RAID crashed on db1078 - https://phabricator.wikimedia.org/T173365#3525689 (jcrespo) p:Triage>High @Cmjohnson aside from the disk replacement, please open a ticket with HP, if disk failure shouldn't be transparent with the current configuration, or if we need to di... [11:18:21] (PS1) MarcoAurelio: Administrators to add/remove 'transwiki' at nowiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/372045 (https://phabricator.wikimedia.org/T172365) [11:32:21] !log installing Linux updates on stretch systems (no reboots yet) [11:32:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:04] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[libgsl0-dev] [11:50:05] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0 [11:50:44] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [11:59:38] Hello developers - could I get help with the deletion of a djvu file on Commons which gives a "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." message. [11:59:52] the file in question is https://commons.wikimedia.org/wiki/File:Literature_II,_Harutyun_Surkhatian.djvu [12:00:09] and it's a corrupt/non functioning djvu file. [12:04:34] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [12:26:15] PROBLEM - DPKG on ms-fe1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:26:24] PROBLEM - DPKG on ms-fe1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:27:24] RECOVERY - DPKG on ms-fe1007 is OK: All packages OK [12:27:24] RECOVERY - DPKG on ms-fe1005 is OK: All packages OK [12:35:04] PROBLEM - DPKG on ms-be2039 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:05] PROBLEM - DPKG on ms-be2031 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:05] PROBLEM - DPKG on ms-be2035 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:14] PROBLEM - DPKG on ms-be2030 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:24] PROBLEM - DPKG on ms-be2037 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:46] the dpkg alerts are all fine, upgrades in progress [12:35:55] PROBLEM - DPKG on ms-be2036 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:55] PROBLEM - DPKG on ms-be2033 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:35:56] PROBLEM - DPKG on ms-be2034 is CRITICAL: DPKG CRITICAL dpkg 
reports broken packages [12:36:05] PROBLEM - DPKG on ms-be2032 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:36:05] PROBLEM - DPKG on ms-be2038 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:37:14] RECOVERY - DPKG on ms-be2039 is OK: All packages OK [12:37:55] RECOVERY - DPKG on ms-be2036 is OK: All packages OK [12:37:55] RECOVERY - DPKG on ms-be2033 is OK: All packages OK [12:37:55] RECOVERY - DPKG on ms-be2034 is OK: All packages OK [12:38:04] RECOVERY - DPKG on ms-be2032 is OK: All packages OK [12:38:05] RECOVERY - DPKG on ms-be2038 is OK: All packages OK [12:38:14] RECOVERY - DPKG on ms-be2031 is OK: All packages OK [12:38:14] RECOVERY - DPKG on ms-be2035 is OK: All packages OK [12:38:15] RECOVERY - DPKG on ms-be2030 is OK: All packages OK [12:38:25] RECOVERY - DPKG on ms-be2037 is OK: All packages OK [12:41:54] PROBLEM - DPKG on ms-be2024 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:41:55] PROBLEM - DPKG on ms-be2023 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:41:55] PROBLEM - DPKG on ms-be2025 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:41:55] PROBLEM - DPKG on ms-be2022 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:42:14] PROBLEM - DPKG on ms-be2029 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:42:15] PROBLEM - DPKG on ms-be2027 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:42:15] PROBLEM - DPKG on ms-be2028 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:42:25] PROBLEM - DPKG on ms-be2026 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:42:54] RECOVERY - DPKG on ms-be2024 is OK: All packages OK [12:42:55] RECOVERY - DPKG on ms-be2023 is OK: All packages OK [12:42:55] RECOVERY - DPKG on ms-be2025 is OK: All packages OK [12:43:04] RECOVERY - DPKG on ms-be2022 is OK: All packages OK [12:43:14] RECOVERY - DPKG on ms-be2027 is OK: All packages OK [12:43:24] RECOVERY - DPKG on ms-be2026 is OK: All packages OK [12:44:14] 
RECOVERY - DPKG on ms-be2029 is OK: All packages OK [12:44:15] RECOVERY - DPKG on ms-be2028 is OK: All packages OK [12:46:52] moritzm: whenever anyone says `x alerts are all fine` https://twitter.com/sadserver/status/657969482783158272 springs to mind :P [12:47:57] :-) [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T1300). Please do the needful. [13:16:44] PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused [13:20:44] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time [13:21:51] !log bounced pdfrender on scb1001 (T159922) [13:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:05] T159922: pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922 [13:22:45] moritzm let me know if i can be of any assistance [13:28:48] Zppix: you mean for the recurring pdfrender problems? if you find additional information on the race, please followup on the Phab task [13:29:23] IIRC the exact future of the PDF rendering solution isn [13:29:51] IIRC the exact future of the PDF rendering solution isn't settled yet, using chromium in headless mode was also one of the options [13:30:09] i mean in general :P [13:36:33] can't think of anything specific ATM. We should really have some "easy hacks" tag in Phabricator as e.g. 
done by LibreOffice: https://wiki.documentfoundation.org/Development/EasyHacks [13:42:03] !log cleanup of old leftover logs on deployment-logstash2:/var/log/logstash [13:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:10] moritzm i am aware just letting you know im around to help if needed [13:59:32] (PS14) MarcoAurelio: [WIP DNM] Create computed list of wikis that can use SecurePoll [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 [13:59:53] (CR) MarcoAurelio: "-loginwiki, since it is not a wiki we're suposed to be operating from" [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 (owner: MarcoAurelio) [14:02:20] (CR) jerkins-bot: [V: -1] [WIP DNM] Create computed list of wikis that can use SecurePoll [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 (owner: MarcoAurelio) [14:02:22] Operations, Thumbor, User-fgiunchedi: Long running thumbnail requests locking up Thumbor instances - https://phabricator.wikimedia.org/T172930#3525797 (Gilles) I think that the current per-IP PoolCounter limits are just too generous. A single user can hog up to 32 workers right now. IMHO, what matter... [14:02:40] (CR) MarcoAurelio: "List is still incorrect. 
I feel we should remove as well those from "wikimedia.dblist"" [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 (owner: MarcoAurelio) [14:04:57] Operations, Performance-Team, Thumbor, User-fgiunchedi: Long running thumbnail requests locking up Thumbor instances - https://phabricator.wikimedia.org/T172930#3525800 (Gilles) p:Triage>Normal a:Gilles [14:05:52] (CR) MarcoAurelio: "Maybe another approach:" [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 (owner: MarcoAurelio) [14:06:49] (PS1) Gilles: Reduce the per-IP concurrency limit in Thumbor [puppet] - https://gerrit.wikimedia.org/r/372054 (https://phabricator.wikimedia.org/T172930) [14:08:02] (PS15) MarcoAurelio: [WIP DNM] Create computed list of wikis that can use SecurePoll [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 [14:10:47] (CR) jerkins-bot: [V: -1] [WIP DNM] Create computed list of wikis that can use SecurePoll [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 (owner: MarcoAurelio) [14:18:43] (CR) MarcoAurelio: "> Main test build failed." [mediawiki-config] - https://gerrit.wikimedia.org/r/371926 (owner: MarcoAurelio) [14:19:50] Hello developers - could I get help with the deletion of a djvu file on Commons which gives a "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." message. [14:20:02] the file in question is https://commons.wikimedia.org/wiki/File:Literature_II,_Harutyun_Surkhatian.djvu [14:20:28] I've had a look through Phab, there's nothing obvious which watches the error message, just https://phabricator.wikimedia.org/T75094 but that's old old old. 
[14:21:53] problem has been happening for a month, basically (I've only found out about it today - the file went through DR on Commons for some unfathomable reason and has been awaiting closure for three weeks) [14:30:04] NotASpy, I suggest filing a ticket [14:30:11] a lot of people are in the air right now [14:30:36] cool, will do, just didn't want to be told 'we know what it is' or something. [14:32:07] NotASpy, https://phabricator.wikimedia.org/T75094 is indeed very close and has a "any future problems then reopen" type comment, however it has a different storage backend in the error [14:32:24] I'm not sure whether it makes a difference [14:32:36] personally I'd open a new one and link the new ticket from the old one [14:33:14] yeah, plus it's a different file type and perhaps a different issue with the file itself (in this case, the djvu looks to be corrupt or perhaps uploading has failed in a strange way) [14:33:23] I was going to link anyway, just in case it's relevant. [14:47:35] Operations, Thumbor, Performance-Team (Radar), User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3525964 (Gilles) It's probably a minor difference in rsvg rendering. 98.8% is very good similarity. Let's double check if the rendering difference is sign... [14:52:35] Operations, Thumbor, Performance-Team (Radar), User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3525971 (MoritzMuehlenhoff) That sounds pretty much like what happened on the image scalers after the upgrade to jessie, see https://phabricator.wikimedia... [15:14:23] Operations, Thumbor, Performance-Team (Radar), User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3526033 (Gilles) The font config files from fonts.pp that you wrote for Jessie are definitely present on deployment-imagescaler02. This is the font sampl... 
[15:17:15] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2188516 [15:17:51] Operations, Thumbor, Performance-Team (Radar), User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3526036 (Gilles) Right off the bat, the first one with major differences, Century Schoolbook L, comes from the "gsfonts" package, which is found on thumbo... [15:21:30] !log restarting pdfrender on scb1001, added some debug messages to help us diagnose T159922 [15:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:45] T159922: pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922 [15:24:34] PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused [15:25:03] that's me ^ [15:26:35] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [15:30:54] PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:16] Operations, Electron-PDFs, Patch-For-Review, Readers-Web-Backlog (Tracking), Services (blocked): pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922#3526079 (mobrovac) >>! In T159922#3522796, @fgiunchedi wrote: > This just happened... 
[15:31:23] (PS2) Gehel: redis - instance names should be strings in puppet 4 [puppet] - https://gerrit.wikimedia.org/r/369695 [15:39:14] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2044397 [15:43:28] Operations, Thumbor, Performance-Team (Radar), User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3526091 (Gilles) I've updated the Thumbor package here: https://github.com/gi11es/thumbor-debian/tree/master/thumbor with the latest master from upstream,... [15:52:39] (PS1) Gehel: service::node: move to puppet4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) [15:53:09] (CR) jerkins-bot: [V: -1] service::node: move to puppet4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) (owner: Gehel) [15:54:19] (PS2) Gehel: service::node: move to puppet4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) [15:54:53] (CR) jerkins-bot: [V: -1] service::node: move to puppet4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) (owner: Gehel) [15:55:34] PROBLEM - Host labvirt1015 is DOWN: PING CRITICAL - Packet loss = 100% [15:56:24] PROBLEM - pdfrender on scb1004 is CRITICAL: connect to address 10.64.48.29 and port 5252: Connection refused [15:58:14] RECOVERY - puppet last run on puppetmaster2002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T1600). 
[16:17:24] (PS3) Gehel: service::node: move to puppet 4 compatible parameter validation [puppet] - https://gerrit.wikimedia.org/r/372063 (https://phabricator.wikimedia.org/T171704) [16:29:14] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 0 [16:34:01] A note to anyone here, I imagine SWAT is cancelled as most of us are traveling [16:35:25] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2036982 [16:58:26] there's nothing in puppet swat anyway [16:59:50] well that makes it easy :) [17:00:05] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T1700). [17:00:08] (sometimes ppl show up tho last minute so I figured I would note) [17:01:42] No ores today [17:04:29] no parsoid today [17:55:00] !log bsitzmann@tin Started deploy [mobileapps/deploy@34a1304]: Update mobileapps to 33b80dd (T172829 T152441 T172021 T103362) [17:55:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:16] T152441: Geo coordinates are missing from MCS requess on Wikivoyage en Paris - https://phabricator.wikimedia.org/T152441 [17:55:16] T172021: It shouldn't be possible for coordinates to be the lead paragraph - https://phabricator.wikimedia.org/T172021 [17:55:16] T172829: Wikidata doesn't return wikidata description for main page - https://phabricator.wikimedia.org/T172829 [17:55:16] T103362: Try to get rid of mobileview call for regular pages - https://phabricator.wikimedia.org/T103362 [17:59:00] !log bsitzmann@tin Finished deploy [mobileapps/deploy@34a1304]: Update mobileapps to 33b80dd (T172829 T152441 T172021 T103362) (duration: 04m 00s) [17:59:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:34] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 
208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [18:16:44] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [18:18:42] !log demon@tin Pruned MediaWiki: 1.30.0-wmf.12 [keeping static files] (duration: 02m 47s) [18:18:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:40] Operations, Mobile-Content-Service, RESTBase, Reading-Infrastructure-Team-Backlog, and 2 others: Split slash decoding from general percent normalization in Varnish VCL - https://phabricator.wikimedia.org/T127387#3526312 (Jdlrobson) I don't think there are any open patches... [18:32:12] Operations, Mobile-Content-Service, Parsing-Team, Reading-Infrastructure-Team-Backlog, and 3 others: Create functional cluster checks for all services (and have them page!) - https://phabricator.wikimedia.org/T134551#3526315 (Jdlrobson) [18:35:04] (PS1) Chad: Group0 to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/372071 [18:35:33] (CR) Chad: [C: -2] "l8r gator" [mediawiki-config] - https://gerrit.wikimedia.org/r/372071 (owner: Chad) [18:35:54] !log demon@tin Started scap: bootstrap wmf.14 [18:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:02] !log demon@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_3982832257" --threads=10 --lang en --quiet' returned non-zero exit status 255 (duration: 03m 05s) [18:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:40:29] * RainbowSprinkles sighs [18:42:35] (CR) Chad: "Stupid stupid. I'm an idiot for +2ing this. It does not output valid PHP in prep.py anymore. Will fix." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/369818 (owner: Krinkle) [18:43:26] !log demon@tin Started scap: bootstrap wmf.14 v2.0 [18:43:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:34] (PS1) Chad: Scap prep: Fix stupid syntax errors [mediawiki-config] - https://gerrit.wikimedia.org/r/372073 [18:45:35] (CR) Chad: [C: 2] Scap prep: Fix stupid syntax errors [mediawiki-config] - https://gerrit.wikimedia.org/r/372073 (owner: Chad) [18:47:06] (Merged) jenkins-bot: Scap prep: Fix stupid syntax errors [mediawiki-config] - https://gerrit.wikimedia.org/r/372073 (owner: Chad) [18:47:17] (CR) jenkins-bot: Scap prep: Fix stupid syntax errors [mediawiki-config] - https://gerrit.wikimedia.org/r/372073 (owner: Chad) [18:58:30] !log mobrovac@tin Started deploy [restbase/deploy@1139d00] (staging): Use only new parsoid tables (before the removal of old ones) - T169939 [18:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:43] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [19:00:04] thcipriani: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T1900). Please do the needful. [19:00:54] PROBLEM - Restbase root url on praseodymium is CRITICAL: connect to address 10.64.16.149 and port 7231: Connection refused [19:01:20] (CR) Chad: "Didn't see this change, I already fixed it in I555e258988beb05ac300c3ad57ba99b45d6f5077." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/370745 (owner: 20after4) [19:02:55] RB in staging known ^ [19:03:09] !log mobrovac@tin Finished deploy [restbase/deploy@1139d00] (staging): Use only new parsoid tables (before the removal of old ones) - T169939 (duration: 04m 39s) [19:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:06] !log mobrovac@tin Started deploy [restbase/deploy@1139d00]: Use only new parsoid tables (before the removal of old ones) - T169939 [19:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:38] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [19:04:54] RECOVERY - Restbase root url on praseodymium is OK: HTTP OK: HTTP/1.1 200 - 15659 bytes in 0.011 second response time [19:14:23] !log demon@tin Finished scap: bootstrap wmf.14 v2.0 (duration: 30m 56s) [19:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:32] !log mobrovac@tin Started deploy [restbase/deploy@1139d00]: Use only new parsoid tables (before the removal of old ones), part #2 - T169939 [19:16:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:43] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [19:19:22] !log mobrovac@tin Finished deploy [restbase/deploy@1139d00]: Use only new parsoid tables (before the removal of old ones), part #2 - T169939 (duration: 02m 50s) [19:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:47] (CR) Chad: [C: 2] Group0 to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/372071 (owner: Chad) [19:21:17] (Merged) jenkins-bot: Group0 to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/372071 (owner: Chad) [19:22:10] (CR) jenkins-bot: Group0 to wmf.14 [mediawiki-config] - 
https://gerrit.wikimedia.org/r/372071 (owner: Chad) [19:24:09] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.14 [19:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:54] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0 [19:42:05] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [20:21:05] Operations, Traffic, netops: Poor conectivity (Vodafone/THUS in UK) - https://phabricator.wikimedia.org/T172262#3526444 (ayounsi) Open>Resolved a:ayounsi See previous comment. Doesn't seem to be Wikimedia related and no update on the task since about two weeks. To troubleshot more those... [21:15:05] PROBLEM - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:34] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 4351 [21:17:19] ^^^ looking [21:18:05] RECOVERY - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is OK: TCP OK - 3.077 second response time on 10.192.48.47 port 9042 [21:19:01] (CR) Luke081515: [C: 1] Administrators to add/remove 'transwiki' at nowiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/372045 (https://phabricator.wikimedia.org/T172365) (owner: MarcoAurelio) [21:22:44] PROBLEM - cassandra-b SSL 10.192.48.47:7001 on restbase2005 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer [21:24:44] PROBLEM - Check systemd state on restbase2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[21:25:04] PROBLEM - cassandra-b service on restbase2005 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed [21:25:04] PROBLEM - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is CRITICAL: connect to address 10.192.48.47 and port 9042: Connection refused [21:26:45] RECOVERY - Check systemd state on restbase2005 is OK: OK - running: The system is fully operational [21:26:54] !log Starting Cassandra restbase2005-b [21:27:04] RECOVERY - cassandra-b service on restbase2005 is OK: OK - cassandra-b is active [21:27:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:28:04] RECOVERY - cassandra-b SSL 10.192.48.47:7001 on restbase2005 is OK: SSL OK - Certificate restbase2005-b valid until 2018-07-19 10:52:26 +0000 (expires in 337 days) [21:28:05] RECOVERY - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is OK: TCP OK - 0.036 second response time on 10.192.48.47 port 9042 [21:40:54] !log scb - codfw: disabling puppet and stopping changeprop to release load on Cassandra while dropping old tables - T169939 [21:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:06] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [21:44:54] PROBLEM - Check systemd state on scb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:45:04] PROBLEM - Check systemd state on scb2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:45:14] PROBLEM - Check systemd state on scb2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:45:16] PROBLEM - Check systemd state on scb2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:45:25] PROBLEM - Check systemd state on scb2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[21:45:29] (PS1) Dereckson: Enable WikidataPageBanner on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372081 (https://phabricator.wikimedia.org/T173388) [21:45:44] PROBLEM - Check systemd state on scb2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:58:38] (CR) Krinkle: "Sorry :( , yeah, forgot a "#" on the second line." [mediawiki-config] - https://gerrit.wikimedia.org/r/369818 (owner: Krinkle) [21:59:32] (CR) Krinkle: scap prep: Fix comment in generated LocalSettings.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/370745 (owner: 20after4) [21:59:45] (CR) Krinkle: "WFM :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/372073 (owner: Chad) [22:00:04] (Abandoned) Krinkle: scap prep: Fix comment in generated LocalSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/370745 (owner: 20after4) [22:03:22] Operations, Phabricator: phabricator.wikimedia.org serves 403 for LANDLINE DSL Maroc Telecom connections - https://phabricator.wikimedia.org/T173389#3526622 (Dereckson) [22:31:23] Operations, Phabricator: phabricator.wikimedia.org serves 403 for LANDLINE DSL Maroc Telecom connections - https://phabricator.wikimedia.org/T173389#3526674 (Aklapper) Open>declined a:Aklapper That is unfortunately intentional due to spending way too much time recently on removing copyrighted... [22:33:34] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 331.01 seconds [22:35:53] Operations, Phabricator: phabricator.wikimedia.org serves 403 for LANDLINE DSL Maroc Telecom connections - https://phabricator.wikimedia.org/T173389#3526684 (Dereckson) But Zero is only for mobile isn't it? This is not a mobile range, this is an IP range not used on Zero.
[22:36:34] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.82 seconds [22:38:03] Operations, Phabricator: phabricator.wikimedia.org serves 403 for LANDLINE DSL Maroc Telecom connections - https://phabricator.wikimedia.org/T173389#3526685 (Aklapper) Indeed, Zero is for mobile. And after blocking Zero, massive uploading of copyrighted material continued for weeks via non-mobile Morocca... [22:42:34] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [22:45:12] !log scb - codfw: enabling puppet and starting changeprop back T169939 [22:45:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:45:26] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [22:49:44] RECOVERY - Check systemd state on scb2002 is OK: OK - running: The system is fully operational [22:50:05] !log ppchelko@tin Started restart [changeprop/deploy@444223d]: (no justification provided) [22:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:50:25] RECOVERY - Check systemd state on scb2001 is OK: OK - running: The system is fully operational [22:50:34] RECOVERY - Check systemd state on scb2005 is OK: OK - running: The system is fully operational [22:50:45] RECOVERY - Check systemd state on scb2003 is OK: OK - running: The system is fully operational [22:50:55] RECOVERY - Check systemd state on scb2006 is OK: OK - running: The system is fully operational [22:51:14] RECOVERY - Check systemd state on scb2004 is OK: OK - running: The system is fully operational [22:58:57] (PS2) Aaron Schulz: Don't retry InitImageDataJob's [puppet] - https://gerrit.wikimedia.org/r/326151 (owner: EBernhardson) [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come.
Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T2300). [23:00:04] Urbanecm: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:20] Here [23:00:26] jouncebot: refresh [23:00:28] I refreshed my knowledge about deployments. [23:00:35] ah too late for the refresh [23:00:37] hi Urbanecm [23:00:40] Helo [23:00:44] jouncebot, now [23:00:44] For the next 0 hour(s) and 59 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T2300) [23:02:43] Isarra: can you add the backports to https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170815T2300 please? [23:02:52] (CR) Dereckson: [C: 2] Create a few of namespace aliases for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/370990 (https://phabricator.wikimedia.org/T172977) (owner: Urbanecm) [23:04:25] I can try! [23:06:12] syntax is [where to deploy] {{Gerrit|number}} Title (bug) [23:06:41] [where to deploy] = [config] or a branch e.g. [.13] [.14] [23:07:15] (by the way Zuul merged the backports) [23:07:51] Urbanecm: https://gerrit.wikimedia.org/r/#/c/370990/ has a merge conflcit [23:07:55] (PS2) Dereckson: Create a few of namespace aliases for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/370990 (https://phabricator.wikimedia.org/T172977) (owner: Urbanecm) [23:08:00] Should I resolve? [23:08:04] (CR) Dereckson: [C: 2] "SWAT, take two" [mediawiki-config] - https://gerrit.wikimedia.org/r/370990 (https://phabricator.wikimedia.org/T172977) (owner: Urbanecm) [23:08:11] could be autoresolvable [23:09:01] OT: Gerrit has Rebase If Necessary policy, shouldn't it be rebased if necessary automatically?
[23:09:25] (Merged) jenkins-bot: Create a few of namespace aliases for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/370990 (https://phabricator.wikimedia.org/T172977) (owner: Urbanecm) [23:09:35] (CR) jenkins-bot: Create a few of namespace aliases for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/370990 (https://phabricator.wikimedia.org/T172977) (owner: Urbanecm) [23:09:58] Urbanecm: live on mwdebug1002 [23:10:03] Dereckson, checking [23:11:03] Dereckson, working, please deploy to the whole universe [23:11:49] deploying [23:12:24] I edited the page. [23:12:26] It's edited. [23:12:27] thanks [23:12:39] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Create a few of namespace aliases for hiwikiversity (T172977) (duration: 00m 53s) [23:12:39] Thank you! [23:12:50] (PS1) Dereckson: Enable Timeless on four French wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372087 [23:12:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:12:50] T172977: Create English namespace aliases for hiwikiversity - https://phabricator.wikimedia.org/T172977 [23:13:18] hi [23:13:47] dereckson@terbium:~$ mwscript updateArticleCount.php --wiki=srwikiquote --update [23:13:50] Counting articles...found 224. [23:13:52] Hi aude [23:14:02] !log Ran updateArticleCount.php on sr.wikiquote (T172974) [23:14:11] Urbanecm: do you need something else?
[23:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:16] T172974: Set $wgArticleCountMethod to 'any' for srwikiquote and srwikisource - https://phabricator.wikimedia.org/T172974 [23:14:21] (PS2) Dereckson: Enable WikidataPageBanner on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372081 (https://phabricator.wikimedia.org/T173388) [23:14:23] Dereckson, I've requested a script to be run at the Deployment page [23:14:27] (CR) Dereckson: [C: 2] Enable WikidataPageBanner on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372081 (https://phabricator.wikimedia.org/T173388) (owner: Dereckson) [23:14:30] Dereckson: when you are done, i'd like to deploy https://gerrit.wikimedia.org/r/#/c/372051/ [23:14:30] Urbanecm: done [23:14:34] aude: ok [23:14:44] or maybe you could if you like [23:15:54] (Merged) jenkins-bot: Enable WikidataPageBanner on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372081 (https://phabricator.wikimedia.org/T173388) (owner: Dereckson) [23:16:04] ok [23:16:08] (CR) jenkins-bot: Enable WikidataPageBanner on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372081 (https://phabricator.wikimedia.org/T173388) (owner: Dereckson) [23:16:50] From https://gerrit.wikimedia.org/r/mediawiki/skins/Timeless [23:16:50] 45e2c28..1d2636d master -> origin/master [23:16:50] * [new branch] wmf/1.30.0-wmf.13 -> origin/wmf/1.30.0-wmf.13 [23:16:50] 45e2c28..f5e759d wmf/1.30.0-wmf.14 -> origin/wmf/1.30.0-wmf.14 [23:17:09] Isarra: yes, we know have a working origin/wmf/1.30.0-wmf.13 branch :) [23:17:39] Shiny. [23:20:49] Operations, media-storage: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"."
- https://phabricator.wikimedia.org/T173374#3526766 (Peachey88) [23:21:33] Isarra: fix live on mwdebug1002 [23:26:54] Urbanecm: can you explain to Isarra how works the debug extension on firefox/chrome and how to test a change please? [23:27:04] Yes [23:27:05] I've a submodule commit to fix for wmf/1.30.0-wmf.13 [23:27:06] thanks [23:27:53] That's the thing you use to get the site from the new thing instead of the actual thing? [23:27:56] Or stuff? [23:28:02] Isarra, kind of. [23:28:24] It allows you to use the exact same environment as the production have, but without having to even touch production :) [23:28:49] You know, beta cluster can work a little bit differently than the production one. [23:29:37] Operations, OTRS, User-Matthewrbowker: Proposal: Centralize OTRS login methodology - https://phabricator.wikimedia.org/T133476#3526784 (Az1568) [23:29:38] You can (and must, if you wish to continue) install https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb (same extension is for Firefox too)- You click on its icon and select the debug host you wish to use (in that case, it is mwdebug1002 as Dereckson said above) [23:29:44] Yeah... [23:29:52] Ensure that the extension's icon is green. [23:30:13] Then, every WMF's site you have access to is served from dedicated debug server. [23:30:32] Test your change, ensure that it do whatever it should do and tell Dereckson if he should continue, or rollback your changes. [23:30:36] (PS5) Ppchelko: JobQueueEventBus: Enable group1. [mediawiki-config] - https://gerrit.wikimedia.org/r/370975 (https://phabricator.wikimedia.org/T163380) [23:30:47] Feel free to ask me if you have any questions about the extension or something else I may know [23:31:06] How do I install it on firefox?
[23:31:20] i have user script that adds the server name (and request time) when logged in [23:31:23] https://meta.wikimedia.org/wiki/User:Aude/global.js [23:31:31] Chrome is basically just a crash factory for me. >.> [23:31:41] No, that isn't the same. [23:31:49] although i can look at the headers, that helps me to know my request is to the debug server [23:32:09] (The extension exists on Firefox too) [23:32:15] How do I install it, though? [23:32:15] Isarra, follow instructions at https://addons.mozilla.org/en-US/firefox/addon/wikimedia-debug-header/ [23:32:35] Ah. [23:32:40] I couldn't even find it. >.> [23:32:48] Isarra, then it'll be somewhere in the Firefox's menu [23:33:01] (after you click at Add to Firefox button at the page [23:33:35] aude: I need to manually create a core submodule update for a branch, I prepared it following the format the automatic script uses, but Gerrit rejects it as there isn't any change id, any hint? [23:33:44] ...firefox just crashed toooooo. [23:34:00] (like a "just push it directly") [23:34:05] Dereckson: looking [23:34:20] the core submodule update should be automatic [23:34:26] Isarra, I can install it without any problems, just by clicking at Add to Firefox and confirming by clicking at Install again [23:34:35] but if not, suppose we can create it [23:34:36] yes but Timeless skin didn't have a wmf13 branch, only wmf14 [23:34:40] Yeah, I installed it and firefox just crashed. [23:34:44] Now I can't get mw.org to load. [23:34:48] ah [23:34:54] What mean can't get mw.org to load? [23:34:59] What's written there? [23:35:03] I'll sort this out in a moment and let you know what you actually wanted me to do. >.> [23:35:06] so I've only the automatic submodule for wmf14 [23:35:07] do you use git review? [23:35:08] oh well [23:35:09] Er, about what you actually wanted. [23:35:11] yes [23:35:17] This internet just sucks. 
[23:35:26] we can live with ONE submodule update WITH a change-id [23:35:27] let's add one [23:35:48] i just do git submodule update --init --recursive within core [23:36:06] and then git add extensions/Timeless or such [23:36:26] and submit the change to wmf13 (git review) [23:36:36] done, but mediawiki/core SHOULD match prod too [23:36:39] (the .gitmodules file) [23:36:39] ok [23:36:48] https://gerrit.wikimedia.org/r/372089 [23:37:10] That looks good to you? [23:37:18] is timeless new to wmf13 ? and does it have any i18n? [23:37:40] wmf13 has a ommit bc30e627e555311274bc4ec8770872989e7cb654 Add Timeless skin submodule from Friday [23:38:35] if there are some i18n to push, a full scap has been done as a part of the train yesterday [23:38:35] we might also need to update .gitmodules in wmf13 but not sure [23:39:03] e.g. https://gerrit.wikimedia.org/r/#/c/294338/ [23:39:19] not sure .gitmodules change is necessary though [23:39:25] * Dereckson nods [23:40:10] mine that just update the submodule and not change the branch don't touch .gitmodules [23:40:17] i think it's ok [23:40:29] So "no hotfix the first week if you wish deployers don't have extra manual work" is a good policy [23:41:11] oh bugger, zuul is treating submodule update like a full MediaWiki core change [23:41:16] https://integration.wikimedia.org/zuul/ [23:42:11] Dereckson: 1002, you say? [23:42:16] Isarra: yup [23:42:19] that's the one used in SWAT [23:42:35] I'm not seeing a difference. [23:42:47] try with ?debug=true ? [23:43:04] could be a caching issue [23:43:33] by the way WikidataPageBanner appears well at https://test.wikipedia.org/wiki/Special:Version [23:43:52] I just got an empty page from test2.wikipedia.org after trying to log in. [23:44:02] Empty HTML response, no or [23:44:03] odd. [23:44:08] Krinkle: could be WikidataPageBanner [23:44:19] Krinkle: on mwdebug1002 or in prod? 
[23:44:23] prod [23:44:26] oh so nope [23:44:47] (I pushed 8e073d0a77902af422221b7a3f9cf56616c9af35 Enable WikidataPageBanner on test wikis on mwdebug1002) [23:45:10] https://test2.wikipedia.org/wiki/Main_Page works for me when logging in [23:45:54] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), User-Johan: Get translations for "IE8 on XP won't work" - https://phabricator.wikimedia.org/T172418#3526798 (Johan) https://meta.wikimedia.org/wiki/User:Johan_(WMF)/IE8XP [23:46:11] i am logged in [23:46:16] on test and test2 [23:46:20] Krinkle: I've an exception in logstash, see your private [23:46:55] aude: it occured when creating a new local account apparently [23:47:04] ah [23:47:05] Dereckson: Maybe it is caching, but that didn't help. [23:47:25] * Dereckson looks if there is already a Phab task about this one [23:47:39] i see the changes on tin [23:48:19] 23:44:42 1) Wikibase\Repo\Tests\Api\CreateRedirectTest::testSetRedirect_failure with data set "bad source id" ('xyz', 'Q12', 'invalid-entity-id') [23:48:22] 23:44:42 Use of ApiUsageException::getCodeString was deprecated in MediaWiki 1.29.
[Called from Wikibase\Repo\Tests\Api\CreateRedirectTest::testSetRedirect_failure in /home/jenkins/workspace/mediawiki-extensions-hhvm-jessie/src/extensions/Wikidata/extensions/Wikibase/repo/tests/phpunit/includes/Api/CreateRedirectTest.php at line 249] [23:48:35] * Dereckson sighs [23:48:39] o_O [23:48:54] aude: the tests for wmf13 branch [23:49:05] https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie/20458/console [23:49:27] * aude looks if we fixed that on master [23:51:02] submodule change live on wmf13 on tin, repo is clean :) [23:51:31] Isarra: https://www.mediawiki.org/wiki/Special:Watchlist [23:51:47] when I pick mwdebug1002 + switch the button to on/enable [23:51:50] it works [23:52:12] you've two operation to do on Chrome: (1) choose the right server on the menu (2) enable the debug extension ("ON") [23:52:44] we can't test on a wmf13 wiki currently [23:53:17] syncing [23:55:08] !log dereckson@tin Synchronized php-1.30.0-wmf.13/skins/Timeless/resources/screen-common.less: Fix messed up recent changes/watchlist legends (T173151) (duration: 00m 54s) [23:55:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:55:22] T173151: Watchlist legend alignment is weird - https://phabricator.wikimedia.org/T173151 [23:55:59] Isarra: what we do with Amire80 request to deploy too on hewiki? [23:56:38] (PS2) Dereckson: Enable Timeless on four French wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/372087 (https://phabricator.wikimedia.org/T154371) [23:56:44] Do you hit the buttons before or after going to the page? [23:56:51] before [23:56:59] the goal is to add a custom header to the HTTP request [23:57:33] the extension adds this to the HTTP request: X-Wikimedia-Debug: backend=mwdebug1002.eqiad.wmnet [23:57:34] Right, okay. [23:57:43] ...I seriously can't get it to work, though.
>.< [23:58:45] !log dereckson@tin Synchronized php-1.30.0-wmf.14/skins/Timeless/resources/screen-common.less: Fix messed up recent changes/watchlist legends (T173151) (duration: 00m 50s) [23:58:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:44] https://www.mediawiki.org/wiki/Special:Watchlist <- works in prod too now :) [23:59:52] Thanks for the quick fix!
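For anyone reading back: the header quoted at 23:57:33 can also be set by hand, without the browser extension. What follows is a minimal sketch of that idea, not the extension's actual implementation. It only builds the request object and sends nothing; the helper name `debug_request` is hypothetical, and whether a given account or network is allowed to reach the debug backend is not something the log establishes.

```python
from urllib.request import Request

# Header name and value format as quoted in the log at 23:57:33.
DEBUG_HEADER = "X-Wikimedia-Debug"


def debug_request(url: str, backend: str = "mwdebug1002.eqiad.wmnet") -> Request:
    """Build (but do not send) a GET request carrying the debug header.

    `debug_request` is a hypothetical helper for illustration only.
    """
    req = Request(url)
    # urllib stores header names via str.capitalize(), so the key becomes
    # "X-wikimedia-debug"; HTTP header names are case-insensitive anyway.
    req.add_header(DEBUG_HEADER, f"backend={backend}")
    return req


req = debug_request("https://www.mediawiki.org/wiki/Special:Watchlist")
print(req.header_items())
```

Passing the returned object to `urllib.request.urlopen` would actually send it; that step is left out here on purpose.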