[00:03:45] 10Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration: Redirect nan.wikipedia.org to zh-min-nan.wikipedia.org - https://phabricator.wikimedia.org/T173966#3551517 (10Verdy_p) [00:05:12] 10Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration: Redirect nan.wikipedia.org to zh-min-nan.wikipedia.org - https://phabricator.wikimedia.org/T173966#3546343 (10Verdy_p) [00:14:03] matt_flaschen: you good? [00:15:03] ah, just merged 15 minutes ago [00:15:17] I'm going to head out, you're still clear to go afaiac [00:16:15] greg-g, yeah, it's on mwdebug1002, but then I found MediaWiki.org is broken since an on-by-default gadget is using jquery.mwExtension (just removed). Kind of annoying to test a JS issue with an unrelated JS error, so fixing that first. [00:16:19] greg-g, have a good night. [00:16:44] heh :) [00:31:50] ^ Krinkle, I removed jquery.mwExtension from https://www.mediawiki.org/wiki/MediaWiki:Gadgets-definition , but when I debug through, it still claims it's a dependency of ext.gadget.site. [00:32:15] Krenair, could you remind me which URL has the ResourceLoader dependency map used client-side? [00:32:20] Sorry, Krinkle [00:42:09] AaronSchulz: turns out pretty easy to get the raw data from elasticsearch, i've got a query split into 100 partitions running on our test cluster now, can turn it into a histogram tomorrow morning [00:43:42] (doesn't even need to be sampled this way, although would probably still be accurate enough at 1% or 10%. 
Also this doesn't count non-content though because our test cluster doesnt have that index atm) [00:45:26] has links into content, but not links into say talk pages or whatever [00:50:15] Found it (https://www.mediawiki.org/w/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector) [01:03:12] Krinkle, greg-g, https://phabricator.wikimedia.org/T174124 [01:04:14] PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1503623047 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 5212433 keys, up 4 minutes 4 seconds - replication_delay is 1503623047 [01:04:14] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1503623047 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 5217610 keys, up 4 minutes 4 seconds - replication_delay is 1503623047 [01:04:14] PROBLEM - Check health of redis instance on 6379 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6379 [01:04:14] PROBLEM - Check health of redis instance on 6381 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6381 [01:04:34] PROBLEM - Check health of redis instance on 6379 on rdb2001 is CRITICAL: CRITICAL: replication_delay is 1503623069 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9922437 keys, up 4 minutes 27 seconds - replication_delay is 1503623069 [01:05:04] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479 [01:05:14] RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 5209995 keys, up 5 minutes 4 seconds - replication_delay is 0 [01:05:14] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 5212870 keys, up 5 minutes 4 seconds - 
replication_delay is 0 [01:05:14] RECOVERY - Check health of redis instance on 6379 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9915829 keys, up 5 minutes 5 seconds - replication_delay is 0 [01:05:24] RECOVERY - Check health of redis instance on 6381 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 9809934 keys, up 5 minutes 15 seconds - replication_delay is 0 [01:05:25] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:06:04] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 5210966 keys, up 5 minutes 54 seconds - replication_delay is 0 [01:06:34] RECOVERY - Check health of redis instance on 6379 on rdb2001 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9913827 keys, up 6 minutes 26 seconds - replication_delay is 0 [01:07:55] 10Operations, 10ops-codfw, 10DC-Ops, 10Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3551607 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labstore2002.codfw.wmnet'] ``` Of which those **FAI... 
[01:10:35] RECOVERY - MariaDB Slave Lag: s3 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89943.93 seconds [01:12:18] !log mattflaschen@tin Synchronized php-1.30.0-wmf.15/extensions/Flow/modules/engine/components/board/features/flow-board-loadmore.js: T173807: Fix Flow infinite scroll (duration: 00m 45s) [01:12:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:12:31] T173807: [Regression] Pagination doesn't work (there's no automatic load of previous threads on scroll) - https://phabricator.wikimedia.org/T173807 [01:34:45] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [02:30:32] ebernhardson, do you know how to use hhvm.bypass_access_check ? [02:44:24] PROBLEM - confd service on cp1063 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:44:34] PROBLEM - puppet last run on cp1063 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:44:34] PROBLEM - traffic-pool service on cp1063 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:45:24] RECOVERY - confd service on cp1063 is OK: OK - confd is active [02:45:34] RECOVERY - traffic-pool service on cp1063 is OK: OK - traffic-pool is active [02:45:34] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 19 minutes ago with 0 failures [03:31:35] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 656.32 seconds [03:42:53] 10Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration: Like nan.wikipedia.org, redirect other nan.*.org to the proper zh-min-nan.*.org domains - https://phabricator.wikimedia.org/T173966#3551685 (10Liuxinyu970226) [04:17:44] RECOVERY - MariaDB Slave Lag: s2 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89855.75 seconds [04:17:55] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 212.50 seconds [04:24:34] RECOVERY - MariaDB Slave Lag: s5 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89854.20 seconds [04:55:11] (03PS1) 10BBlack: 3DES Deprecation: bump to 8% [puppet] - 10https://gerrit.wikimedia.org/r/373726 (https://phabricator.wikimedia.org/T163251) [04:55:33] (03CR) 10jerkins-bot: [V: 04-1] 3DES Deprecation: bump to 8% [puppet] - 10https://gerrit.wikimedia.org/r/373726 (https://phabricator.wikimedia.org/T163251) (owner: 10BBlack) [04:56:53] (03PS2) 10BBlack: Deprecation of 3DES: bump to 8% [puppet] - 10https://gerrit.wikimedia.org/r/373726 (https://phabricator.wikimedia.org/T163251) [04:57:37] jenkins' tox commitmsg check fails on any commitmsg where the first character of the first line is a number, apparently :P [04:57:43] (03PS1) 10Marostegui: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373727 (https://phabricator.wikimedia.org/T168661) [04:57:50] (03CR) 10BBlack: [C: 032] Deprecation of 3DES: bump to 8% [puppet] - 10https://gerrit.wikimedia.org/r/373726 (https://phabricator.wikimedia.org/T163251) (owner: 10BBlack) [04:59:58] (03CR) 
10Marostegui: [C: 032] db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373727 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:01:25] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373727 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:01:39] (03CR) 10jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373727 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:03:10] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2065 for a MariaDB upgrade - T168661 (duration: 00m 44s) [05:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:03:25] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:04:36] (03CR) 10Marostegui: [C: 031] mariadb: Retire db1033, pool db1069 with its copy and low weight (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [05:08:49] !log Upgrade MariaDB on db2065 - T168661 [05:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:09:01] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:12:15] (03PS1) 10Marostegui: mariadb: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/373729 (https://phabricator.wikimedia.org/T148507) [05:13:19] (03PS2) 10Marostegui: mariadb: Update socket location for db2065 [puppet] - 10https://gerrit.wikimedia.org/r/373729 (https://phabricator.wikimedia.org/T148507) [05:15:34] (03CR) 10Marostegui: [C: 032] mariadb: Update socket location for db2065 [puppet] - 10https://gerrit.wikimedia.org/r/373729 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [05:16:03] !log Reboot db2065 to pick up new kernel [05:16:14] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:18:00] !log re-enable mw1260 (video scaler) for active job processing [05:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:21:49] PROBLEM - Check systemd state on labstore2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [05:25:41] (03PS1) 10Marostegui: db-codfw.php: Repool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373730 (https://phabricator.wikimedia.org/T168661) [05:25:55] 10Operations, 10ops-codfw, 10DC-Ops, 10Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3551757 (10madhuvishy) @Papaul, thanks for splitting up the shelves! I've reimaged the servers, and that part looks right... [05:26:59] RECOVERY - Check systemd state on mw1260 is OK: OK - running: The system is fully operational [05:27:34] (03CR) 10Marostegui: [C: 032] db-codfw.php: Repool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373730 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:29:00] (03Merged) 10jenkins-bot: db-codfw.php: Repool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373730 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:29:12] (03CR) 10jenkins-bot: db-codfw.php: Repool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373730 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:30:05] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2065 - T168661 (duration: 00m 44s) [05:30:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:30:18] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:30:59] RECOVERY - Check systemd state on labstore2002 is OK: OK - running: The system is fully operational [05:35:52] 10Operations, 
10Cloud-Services, 10cloud-services-team (Kanban): labstore systemd state Icinga alarms - https://phabricator.wikimedia.org/T151322#3551776 (10madhuvishy) 05Open>03Resolved 2001 is done too. ``` root@labstore2001:~# systemctl --failed --no-legend root@labstore2001:~# ``` Closing this now. [05:42:59] (03PS1) 10Muehlenhoff: Remove access for psinger [puppet] - 10https://gerrit.wikimedia.org/r/373731 [05:44:39] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [05:46:26] (03CR) 10Muehlenhoff: [C: 032] Remove access for psinger [puppet] - 10https://gerrit.wikimedia.org/r/373731 (owner: 10Muehlenhoff) [05:53:33] 10Operations, 10Multimedia, 10TimedMediaHandler, 10HHVM, 10Patch-For-Review: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742#3551785 (10MoritzMuehlenhoff) Status update: Now that Theora generation is disabled, I've re-enabled mw1260, the jessie-based video scaler in eqiad. I'l... [05:55:52] (03PS1) 10Marostegui: db-codfw.php: Depool db2058 for a MariaDB upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373732 (https://phabricator.wikimedia.org/T168661) [05:56:52] (03PS1) 10Muehlenhoff: Remove ffmpeg2theora from package list [puppet] - 10https://gerrit.wikimedia.org/r/373733 (https://phabricator.wikimedia.org/T172445) [05:57:04] (03PS1) 10Marostegui: mariadb: Update db2058 socket location [puppet] - 10https://gerrit.wikimedia.org/r/373734 (https://phabricator.wikimedia.org/T148507) [05:57:15] (03CR) 10jerkins-bot: [V: 04-1] Remove ffmpeg2theora from package list [puppet] - 10https://gerrit.wikimedia.org/r/373733 (https://phabricator.wikimedia.org/T172445) (owner: 10Muehlenhoff) [05:58:11] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2058 for a MariaDB upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373732 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:59:36] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2058 for 
a MariaDB upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373732 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:59:45] (03CR) 10jenkins-bot: db-codfw.php: Depool db2058 for a MariaDB upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373732 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:59:50] (03CR) 10Jcrespo: "> why db1028 would serve more traffic if it has poorer HW?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [06:00:39] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2058 - T168661 (duration: 00m 43s) [06:00:43] (03PS2) 10Muehlenhoff: Remove ffmpeg2theora from package list [puppet] - 10https://gerrit.wikimedia.org/r/373733 (https://phabricator.wikimedia.org/T172445) [06:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:00:51] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [06:01:19] (03CR) 10Marostegui: [C: 031] "> > why db1028 would serve more traffic if it has poorer HW?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [06:01:30] CI fails because of "Line 9: Unexpected blank line" in the commit message. seriously? 
[06:01:45] !log Upgrade MariaDB to 10.0.32 on db2058 - T168661 [06:01:46] (03PS9) 10Jcrespo: mariadb: Adding rack allocations, some formatting fixes, read-only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371444 (https://phabricator.wikimedia.org/T172459) [06:01:48] (03PS3) 10Jcrespo: mariadb: Repool es2013 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373524 (https://phabricator.wikimedia.org/T172265) [06:01:50] (03PS3) 10Jcrespo: mariadb: Retire db1033, pool db1069 with its copy and low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) [06:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:02:43] moritzm: don't you love the new linting? :) [06:03:47] it's admittedly very quick, but also sometimes stupid. feels like watching a "Fast and the Furious" movie :-) [06:04:17] marostegui: I will deploy https://gerrit.wikimedia.org/r/371444 [06:04:29] unless you see any error or other problem [06:04:31] jynus: let's go for it, I will monitor logstash [06:04:36] me too [06:04:53] also let's monitor wikitech and read only errors [06:05:02] yep [06:05:13] the rebase looks good (my last change is there correctly) [06:05:15] it should be a noop [06:05:21] but you never know [06:05:29] thanks also for checking that [06:05:35] I was fearing that, too [06:06:00] Yeah, both my yesterday's eqiad changes and today's codfw one are there [06:06:04] so that should be good [06:06:15] (03CR) 10Jcrespo: [C: 032] mariadb: Adding rack allocations, some formatting fixes, read-only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371444 (https://phabricator.wikimedia.org/T172459) (owner: 10Jcrespo) [06:06:42] I will deploy 373524 at the same time [06:06:47] no reason not to [06:06:53] will do the other afterwards [06:06:56] sure [06:07:32] I will do codfw first and curl some pages [06:07:40] (03Merged) 10jenkins-bot: mariadb: Adding rack allocations, some 
formatting fixes, read-only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371444 (https://phabricator.wikimedia.org/T172459) (owner: 10Jcrespo) [06:07:49] time has come! [06:07:49] (03CR) 10jenkins-bot: mariadb: Adding rack allocations, some formatting fixes, read-only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371444 (https://phabricator.wikimedia.org/T172459) (owner: 10Jcrespo) [06:07:57] (03CR) 10Jcrespo: [C: 032] mariadb: Repool es2013 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373524 (https://phabricator.wikimedia.org/T172265) (owner: 10Jcrespo) [06:09:20] (03Merged) 10jenkins-bot: mariadb: Repool es2013 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373524 (https://phabricator.wikimedia.org/T172265) (owner: 10Jcrespo) [06:09:52] (03CR) 10Marostegui: [C: 032] mariadb: Update db2058 socket location [puppet] - 10https://gerrit.wikimedia.org/r/373734 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [06:10:19] (03CR) 10jenkins-bot: mariadb: Repool es2013 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373524 (https://phabricator.wikimedia.org/T172265) (owner: 10Jcrespo) [06:10:25] merging codfw [06:10:28] ok [06:10:31] checking logstash [06:11:01] !log jynus@tin Synchronized wmf-config/db-codfw.php: repool es2013, formatting fixes (duration: 00m 44s) [06:11:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:11:51] so far so good [06:18:08] some uncaught exceptions, but I am not sure those are new [06:19:38] seems to be happening for hours, maybe we should ping releng [06:19:49] yeah I was checking the history [06:19:55] Don't think it is related to our change [06:20:07] no [06:20:15] it could be hhvm's fault, too [06:21:11] it has been happening for over 7 days [06:21:22] and actually, it got better recently [06:22:41] let's go for db-eqiad? 
[06:22:45] yes [06:22:55] I wonder if this is related to the job-queue issues [06:23:12] most of them seem to be job queue executions [06:23:36] Maybe we can ask Amir1 to check, as he was hotfixing it yesterday, just to rule out further issues [06:24:29] The jobqueue size started to go down after the patch (as per the graphs) [06:24:40] so in that sense it looks good [06:24:56] I will add a comment [06:24:57] assuming it is not just discarding jobs :) [06:28:15] 10Operations, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3551824 (10jcrespo) This is probably a symptom and not a cause, but I wanted to comment it anyway in case it was interesting: There seems to be higher t... [06:28:59] ok, doing eqiad [06:30:02] !log jynus@tin Synchronized wmf-config/db-eqiad.php: formatting fixes (duration: 00m 44s) [06:30:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:31:04] I can edit fine on wikitech [06:39:21] job queue processors seem to have a higher connection error rate than the other application servers [06:39:30] which probably isn't surprising [06:41:11] 10Operations, 10MW-1.30-release-notes, 10Performance-Team, 10monitoring: Ensure getLagTimes.php is working properly - https://phabricator.wikimedia.org/T172559#3551831 (10jcrespo) This certainly is working, I would close it as resolved: https://logstash.wikimedia.org/goto/3d73a71beb3d6fa2bd5f8740b0f47fab T... [06:41:16] Yeah, I am not seeing anything wrong with your change [06:43:52] https://noc.wikimedia.org/conf/db-eqiad.php.txt seems ok [06:44:11] but https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php seems weird, can you see why? 
[06:44:37] yeah, I see different indents [06:45:01] https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/db-eqiad.php seems ok too [06:45:25] yeah [06:45:26] weird [06:46:19] I think there are some minor fixes pending on serverTemplate [06:51:26] (03CR) 10Jcrespo: [C: 032] mariadb: Retire db1033, pool db1069 with its copy and low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [06:52:57] (03Merged) 10jenkins-bot: mariadb: Retire db1033, pool db1069 with its copy and low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [06:53:10] (03CR) 10jenkins-bot: mariadb: Retire db1033, pool db1069 with its copy and low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373673 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [06:55:43] !log jynus@tin Synchronized wmf-config/db-codfw.php: decom db1033 (duration: 00m 43s) [06:55:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:02] (03CR) 10Muehlenhoff: [C: 032] Fine-tune display of check_restart and deploy commands [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373283 (owner: 10Muehlenhoff) [07:00:10] !log jynus@tin Synchronized wmf-config/db-eqiad.php: decom db1033, pool db1069 with low weight (duration: 00m 44s) [07:00:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:16:48] (03CR) 10Muehlenhoff: New debdeploy module to query installed reverse dependencies (037 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373290 (owner: 10Muehlenhoff) [07:17:33] (03CR) 10Muehlenhoff: [C: 032] New debdeploy module to query installed reverse dependencies [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373290 (owner: 10Muehlenhoff) [07:21:41] (03CR) 10Volans: "I don't see the PS2 that fixes the comments" [debs/debdeploy] - 
10https://gerrit.wikimedia.org/r/373290 (owner: 10Muehlenhoff) [07:24:17] (03PS4) 10Jcrespo: wikidata-maintenance: Emergency stop of rebuildTermSqlIndex [puppet] - 10https://gerrit.wikimedia.org/r/373507 (https://phabricator.wikimedia.org/T171460) [07:26:11] (03CR) 10Jcrespo: [C: 032] "Amir: Log was rotated and cron failed to execute- it needs a slightly different method of checking the last log." [puppet] - 10https://gerrit.wikimedia.org/r/373507 (https://phabricator.wikimedia.org/T171460) (owner: 10Jcrespo) [07:26:29] PROBLEM - HHVM jobrunner on mw1161 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time [07:27:29] RECOVERY - HHVM jobrunner on mw1161 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [07:31:44] (03PS1) 10Muehlenhoff: Drop the suffix [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373782 [07:32:49] (03CR) 10Muehlenhoff: [C: 032] Drop the suffix [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373782 (owner: 10Muehlenhoff) [07:34:17] (03PS1) 10Volans: Adapt cumin calls to latest API version [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373786 [07:35:49] (03PS1) 10Muehlenhoff: Review feedback by Riccardo [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373789 [07:38:53] (03CR) 10Muehlenhoff: [C: 031] "Thanks, feel free to merge already. I'll deploy debdeploy to some test hosts in production later the day." 
[debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373786 (owner: 10Volans) [07:39:05] (03CR) 10Muehlenhoff: [C: 032] Review feedback by Riccardo [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373789 (owner: 10Muehlenhoff) [07:40:00] jynus: I'm around and work on it right now [07:40:25] (03PS1) 10Jcrespo: mariadb: Pool db1069 with more weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373795 (https://phabricator.wikimedia.org/T174076) [07:41:12] (03PS2) 10Volans: Adapt cumin calls to latest API version [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373786 [07:41:50] (03CR) 10Volans: [C: 032] Adapt cumin calls to latest API version [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373786 (owner: 10Volans) [07:45:45] (03CR) 10Jcrespo: [C: 032] mariadb: Pool db1069 with more weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373795 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [07:47:24] (03CR) 10jenkins-bot: mariadb: Pool db1069 with more weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373795 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [07:49:00] !log jynus@tin Synchronized wmf-config/db-eqiad.php: pool db1069 with more weight (duration: 00m 44s) [07:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:52:21] (03PS1) 10Jcrespo: mariadb: Depool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373812 (https://phabricator.wikimedia.org/T174076) [07:55:09] !log reimaging logstash1006 after change of failed disk - T173679 [07:55:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:55:21] T173679: Degraded RAID on logstash1006 - https://phabricator.wikimedia.org/T173679 [08:00:32] !log upgrading cumin to v1.0.0 on sarin/neodymium [08:00:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:00:54] (03PS5) 10Volans: Cumin: update configuration file for v1.0.0 [puppet] - 
10https://gerrit.wikimedia.org/r/373550 [08:01:21] (03CR) 10Volans: [C: 032] Cumin: update configuration file for v1.0.0 [puppet] - 10https://gerrit.wikimedia.org/r/373550 (owner: 10Volans) [08:07:33] (03CR) 10Jcrespo: [C: 04-1] "Not yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373812 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [08:08:54] (03PS1) 10Volans: Cumin: fix typo in aliases generation [puppet] - 10https://gerrit.wikimedia.org/r/373838 [08:09:27] (03CR) 10Volans: [C: 032] Cumin: fix typo in aliases generation [puppet] - 10https://gerrit.wikimedia.org/r/373838 (owner: 10Volans) [08:15:46] 10Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3551920 (10fgiunchedi) >>! In T152562#3551109, @cwdent wrote: > Hi @fgiunchedi - prometheus is now running on pay-lvs*:9090 > > They are only watching themselves and one eqiad host, which also... [08:20:55] (03PS1) 10Alexandros Kosiaris: role::mx: Use the Message-ID header instead of Subject [puppet] - 10https://gerrit.wikimedia.org/r/373848 (https://phabricator.wikimedia.org/T173733) [08:21:09] (03CR) 10jerkins-bot: [V: 04-1] role::mx: Use the Message-ID header instead of Subject [puppet] - 10https://gerrit.wikimedia.org/r/373848 (https://phabricator.wikimedia.org/T173733) (owner: 10Alexandros Kosiaris) [08:21:32] (03PS2) 10Alexandros Kosiaris: role::mx: Use the Message-ID header instead of Subject [puppet] - 10https://gerrit.wikimedia.org/r/373848 (https://phabricator.wikimedia.org/T173733) [08:24:53] (03CR) 10Alexandros Kosiaris: [C: 032] role::mx: Use the Message-ID header instead of Subject [puppet] - 10https://gerrit.wikimedia.org/r/373848 (https://phabricator.wikimedia.org/T173733) (owner: 10Alexandros Kosiaris) [08:25:02] 10Operations, 10ops-eqiad: Broken disk on mw1228 - https://phabricator.wikimedia.org/T168613#3369930 (10Volans) What is the status of this server? 
I can see it all red in Icinga, trying to SSH gives the key changed warning but puppet is not aware of the new one. [08:36:21] 10Operations, 10Mail, 10OTRS, 10Patch-For-Review: Automatically merge bounces/DSNs in ticket - https://phabricator.wikimedia.org/T173733#3551953 (10akosiaris) A more thorough reading of the code helped me understand that the previous approach using the `Subject` header was not correct. I 've amended to us... [08:41:18] !log restarting apache/hhvm on canary app servers to pick up libxml security update [08:41:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:52:46] (03CR) 10Filippo Giunchedi: "> Main test build failed." [puppet] - 10https://gerrit.wikimedia.org/r/373558 (owner: 10Filippo Giunchedi) [09:06:08] 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on logstash1006 - https://phabricator.wikimedia.org/T173679#3552026 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['logstash1006.eqiad.wmnet'] ``` The log can be found in... [09:07:02] 10Operations, 10Ops-Access-Requests, 10Analytics, 10Research, and 2 others: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3552027 (10Shilad) One follow-up: The Navigation Vectors project [[ https://github.com/ewulczyn/wiki-vectors/blob/master/src/get_reque... [09:09:05] 10Operations, 10Mail, 10OTRS, 10Patch-For-Review: Automatically merge bounces/DSNs in ticket - https://phabricator.wikimedia.org/T173733#3552029 (10akosiaris) Seems like https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=10208290#Article12110567 worked fine. 
[09:12:33] !log reimage restbase2001 to test new partman recipe - T169939 [09:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:12:46] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [09:13:01] I know partman will give me zero joy, hope to be wrong [09:13:06] 10Operations, 10Traffic: Fix broken referer categorization for visits from Safari browsers - https://phabricator.wikimedia.org/T154702#3552066 (10TheDJ) Safari Technology Preview v38, released 2 days ago, seems to be solving this. I at least am no longer getting the console warning that the policy is unsupport... [09:19:41] !log restart thumbor to pick up libxml2 security updates [09:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:56] 10Operations, 10monitoring: add pdu redundancy checking to server/router/switch checks in icinga - https://phabricator.wikimedia.org/T109903#3552072 (10akosiaris) This looks like a nice way forward. For what is worth merging the check with the temperature one is probably fine. I don't see any real benefit in s... [09:28:04] wotcha all, little stuck on something and would appreciate a poke in the right direction - how are images uploaded to `/static/images/mobile/copyright/`? I believe I've seen some logs where they've been synced from somewhere? [09:38:01] 10Operations, 10monitoring: add pdu redundancy checking to server/router/switch checks in icinga - https://phabricator.wikimedia.org/T109903#1562129 (10Volans) I agree, the only drawback I see to have them bundled together is that we couldn't use stalking to tell them apart given that the temperature will chan... 
[09:45:47] PROBLEM - Host logstash1006 is DOWN: PING CRITICAL - Packet loss = 100% [09:46:06] RECOVERY - Host logstash1006 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [09:46:52] (03PS1) 10Ladsgroup: mediawiki: fix logrotating in wikidata cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373854 (https://phabricator.wikimedia.org/T171460) [09:47:15] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: fix logrotating in wikidata cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373854 (https://phabricator.wikimedia.org/T171460) (owner: 10Ladsgroup) [09:48:06] 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on logstash1006 - https://phabricator.wikimedia.org/T173679#3552114 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['logstash1006.eqiad.wmnet'] ``` and were **ALL** successful. [09:50:16] (03PS2) 10Ladsgroup: mediawiki: fix logrotating in wikidata cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373854 (https://phabricator.wikimedia.org/T171460) [09:50:39] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: fix logrotating in wikidata cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373854 (https://phabricator.wikimedia.org/T171460) (owner: 10Ladsgroup) [09:52:21] (03PS3) 10Ladsgroup: mediawiki: fix logrotating in wikidata cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373854 (https://phabricator.wikimedia.org/T171460) [09:59:56] (03PS3) 10Phuedx: pagePreviews: remove invalidated popup sampling rate variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373171 (https://phabricator.wikimedia.org/T172291) (owner: 10Niedzielski) [10:00:06] (03CR) 10jerkins-bot: [V: 04-1] pagePreviews: remove invalidated popup sampling rate variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373171 (https://phabricator.wikimedia.org/T172291) (owner: 10Niedzielski) [10:29:12] (03PS1) 10Volans: Pass a logger instance to cumin [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373862 [10:30:48] PROBLEM - recommendation_api 
endpoints health on scb1002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) is CRITICAL: Test normal source and target returned the unexpected status 429 (expecting: 200) [10:31:48] RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy [10:35:38] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [10:36:38] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy [10:42:55] (03CR) 10Muehlenhoff: [C: 031] "Thanks, looks good and fixes the traceback." [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373862 (owner: 10Volans) [10:42:57] (03CR) 10Muehlenhoff: [C: 032] Pass a logger instance to cumin [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373862 (owner: 10Volans) [10:44:35] PROBLEM - MD RAID on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:44:35] PROBLEM - configured eth on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:45:35] PROBLEM - dhclient process on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:46:25] PROBLEM - cassandra-a CQL 10.192.16.162:9042 on restbase2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:46:26] PROBLEM - puppet last run on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:47:16] PROBLEM - Check size of conntrack table on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:47:16] PROBLEM - salt-minion processes on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:48:15] PROBLEM - Check systemd state on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:48:15] PROBLEM - cassandra-a service on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:49:05] PROBLEM - Check the NTP synchronisation status of timesyncd on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:49:05] PROBLEM - cassandra-b CQL 10.192.16.163:9042 on restbase2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:50:05] PROBLEM - Check whether ferm is active by checking the default input chain on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:50:55] PROBLEM - DPKG on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:50:55] PROBLEM - cassandra-b service on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:51:43] that's me ^ downtime expired [10:51:44] (03PS1) 10Filippo Giunchedi: install_server: add partman for cassandra JBOD [puppet] - 10https://gerrit.wikimedia.org/r/373863 (https://phabricator.wikimedia.org/T169939) [10:51:45] PROBLEM - cassandra-c CQL 10.192.16.164:9042 on restbase2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:51:45] PROBLEM - Disk space on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:57:23] (03CR) 10Filippo Giunchedi: "The directory/mountpoint naming is up for discussion, I've put in /srv/cassandra/instance-data for the raid1 to get started. 
The individua" [puppet] - 10https://gerrit.wikimedia.org/r/373863 (https://phabricator.wikimedia.org/T169939) (owner: 10Filippo Giunchedi) [10:57:25] RECOVERY - Check size of conntrack table on restbase2001 is OK: OK: nf_conntrack is 0 % full [10:57:45] RECOVERY - Disk space on restbase2001 is OK: DISK OK [10:57:45] RECOVERY - dhclient process on restbase2001 is OK: PROCS OK: 0 processes with command name dhclient [10:57:45] RECOVERY - configured eth on restbase2001 is OK: OK - interfaces up [10:57:45] RECOVERY - MD RAID on restbase2001 is OK: OK: Active: 15, Working: 15, Failed: 0, Spare: 0 [10:57:55] RECOVERY - DPKG on restbase2001 is OK: All packages OK [10:58:05] RECOVERY - Check whether ferm is active by checking the default input chain on restbase2001 is OK: OK ferm input default policy is set [11:11:43] (03PS1) 10Muehlenhoff: Add support for querying reverse dependencies [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373865 [11:19:49] (03PS5) 10Jcrespo: mariadb: Add cluster manager hosts to allowed admin port users [puppet] - 10https://gerrit.wikimedia.org/r/362217 [11:21:13] (03CR) 10Jcrespo: "Andrew: let's reopen 3306 for https://tendril.wikimedia.org/ monitoring, if you are ok with that (not the full range, like before)." [puppet] - 10https://gerrit.wikimedia.org/r/362217 (owner: 10Jcrespo) [11:27:56] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [1800.0] [11:28:28] ^looking at wdqs... 
[11:29:36] updater stopped for a while on wdqs1002, but it is already recovered, I'll try to find an explanation [11:32:57] RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0] [11:33:54] (03PS1) 10Muehlenhoff: Handle incorrect package names in reverse dependency query [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373866 [11:34:16] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2035249 [11:44:55] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373812 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [11:46:56] (03Merged) 10jenkins-bot: mariadb: Depool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373812 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [11:47:39] (03CR) 10jenkins-bot: mariadb: Depool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373812 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [11:48:33] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1028 (duration: 00m 47s) [11:48:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:42] (03PS1) 10Jcrespo: mariadb: Decommission db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373869 (https://phabricator.wikimedia.org/T174076) [11:55:09] (03CR) 10Jcrespo: [C: 04-1] mariadb: Decommission db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373869 (https://phabricator.wikimedia.org/T174076) (owner: 10Jcrespo) [12:03:55] (03PS1) 10Muehlenhoff: Make the server group / Cumin alias configurable [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373870 [12:04:38] (03PS1) 10Marostegui: db-codfw.php: Repool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373871 (https://phabricator.wikimedia.org/T168661) [12:16:46] (03CR) 10Marostegui: [C: 032] db-codfw.php: Repool db2058 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/373871 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:18:56] (03Merged) 10jenkins-bot: db-codfw.php: Repool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373871 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:19:05] (03CR) 10jenkins-bot: db-codfw.php: Repool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373871 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:19:59] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2058 - T168661 (duration: 00m 48s) [12:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:20:11] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [12:21:25] jouncebot, next [12:21:26] In 72 hour(s) and 38 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170828T1300) [12:23:26] (03PS1) 10Marostegui: db-codfw.php: Depool db2044 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373872 (https://phabricator.wikimedia.org/T168661) [12:26:30] (03PS1) 10Urbanecm: Add 4 URLs to $wgCopyUploadsDomain whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373873 (https://phabricator.wikimedia.org/T174152) [12:26:32] (03PS1) 10Marostegui: mariadb: Update socket location for db2044 [puppet] - 10https://gerrit.wikimedia.org/r/373874 (https://phabricator.wikimedia.org/T148507) [12:26:44] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2044 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373872 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:27:23] (03CR) 10Lucas Werkmeister (WMDE): "Update: there’s been a new Wikidata build with an update to the WikibaseQualityConstraints extension, so I think there’s now deployed code" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367914 (https://phabricator.wikimedia.org/T171281) 
(owner: 10Lucas Werkmeister (WMDE)) [12:27:58] (03PS3) 10Lucas Werkmeister (WMDE): Log 'WikibaseQualityConstraints' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367914 (https://phabricator.wikimedia.org/T171281) [12:28:12] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2044 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373872 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:28:22] (03CR) 10jenkins-bot: db-codfw.php: Depool db2044 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373872 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:29:20] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2044 - T168661 (duration: 00m 43s) [12:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:34] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [12:29:59] !log Upgrade MariaDB on db2044 - T168661 [12:29:59] (03PS4) 10Niedzielski: pagePreviews: remove invalidated popup sampling rate variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373171 (https://phabricator.wikimedia.org/T172291) [12:30:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:18] (03CR) 10Zoranzoki21: [C: 031] "Looks good to me, but someone else must approve." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/373171 (https://phabricator.wikimedia.org/T172291) (owner: 10Niedzielski) [12:37:55] (03PS2) 10Marostegui: mariadb: Update socket location for db2044 [puppet] - 10https://gerrit.wikimedia.org/r/373874 (https://phabricator.wikimedia.org/T148507) [12:38:43] (03CR) 10Marostegui: [C: 032] mariadb: Update socket location for db2044 [puppet] - 10https://gerrit.wikimedia.org/r/373874 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [12:42:54] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2044" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373876 [12:43:58] (03PS4) 10Urbanecm: Automatically include commons and wikidata in $wmgThrottlingExceptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373698 (https://phabricator.wikimedia.org/T163872) [12:45:18] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2044" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373876 (owner: 10Marostegui) [12:45:49] (03CR) 10jerkins-bot: [V: 04-1] Automatically include commons and wikidata in $wmgThrottlingExceptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373698 (https://phabricator.wikimedia.org/T163872) (owner: 10Urbanecm) [12:51:47] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2044 - T168661 (duration: 00m 44s) [12:51:54] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2044" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373876 (owner: 10Marostegui) [12:51:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:59] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [12:52:08] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2044" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373876 (owner: 10Marostegui) [12:56:32] (03PS1) 10Marostegui: db-codfw.php: Depool db2037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373879 
(https://phabricator.wikimedia.org/T168661) [12:57:56] 10Operations, 10Discovery, 10Discovery-Analysis, 10Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3552805 (10Gehel) @BBlack any news on a deploy time for this? [12:59:50] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373879 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [13:01:30] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373879 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [13:01:46] (03CR) 10jenkins-bot: db-codfw.php: Depool db2037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373879 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [13:01:59] (03PS1) 10Filippo Giunchedi: prometheus: log puppet-agent-stats exceptions at DEBUG [puppet] - 10https://gerrit.wikimedia.org/r/373882 (https://phabricator.wikimedia.org/T170932) [13:02:32] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2037 - T168661 (duration: 00m 44s) [13:02:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:45] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [13:04:44] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: log puppet-agent-stats exceptions at DEBUG [puppet] - 10https://gerrit.wikimedia.org/r/373882 (https://phabricator.wikimedia.org/T170932) (owner: 10Filippo Giunchedi) [13:08:53] (03PS1) 10Marostegui: mariadb: Update socket location for db2037 [puppet] - 10https://gerrit.wikimedia.org/r/373885 (https://phabricator.wikimedia.org/T148507) [13:18:04] (03CR) 10Volans: [C: 04-1] "I don't like too much the approach to match errors, see the comments inline." 
(033 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373866 (owner: 10Muehlenhoff) [13:22:56] (03CR) 10Marostegui: [C: 032] mariadb: Update socket location for db2037 [puppet] - 10https://gerrit.wikimedia.org/r/373885 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [13:24:27] !log Upgrade MariaDB on db2037 to 10.0.32 - T168661 [13:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:41] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [13:26:50] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2037" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373889 [13:31:55] (03CR) 10Volans: "See comments inline." (034 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373865 (owner: 10Muehlenhoff) [13:34:40] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2037" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373889 (owner: 10Marostegui) [13:36:03] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2037" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373889 (owner: 10Marostegui) [13:36:22] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2037" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373889 (owner: 10Marostegui) [13:37:09] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2037 - T168661 (duration: 00m 45s) [13:37:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:21] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [13:39:03] (03CR) 10Volans: [C: 04-1] "See comments inline." 
(034 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373870 (owner: 10Muehlenhoff) [13:40:38] !log Upgrade MariaDB to 10.0.32 on db2019 - T168661 [13:40:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:48] (03PS1) 10Marostegui: db-eqiad.php: Depool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373891 (https://phabricator.wikimedia.org/T168661) [13:47:16] 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on logstash1006 - https://phabricator.wikimedia.org/T173679#3552996 (10Gehel) Reimage completed, logstash1006 back in rotation. @Cmjohnson: do you need to keep this task open? [13:50:23] (03PS1) 10Marostegui: mariadb: Update socket location for db1097 [puppet] - 10https://gerrit.wikimedia.org/r/373892 (https://phabricator.wikimedia.org/T148507) [13:50:54] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373891 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [13:52:17] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373891 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [13:52:29] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373891 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [13:53:17] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db1097 - T168661 (duration: 00m 44s) [13:53:23] !log Upgrade MariaDB on db1097 to 10.0.32 - T168661 [13:53:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:29] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [13:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:52] 10Operations, 10monitoring: Investigate check_nrpe -u option to reduce critical alerts - 
https://phabricator.wikimedia.org/T172131#3553025 (10faidon) [13:57:01] 10Operations, 10monitoring: Investigate check_nrpe -u option to reduce critical alerts - https://phabricator.wikimedia.org/T172131#3486701 (10faidon) Sounds good to me, feel free to go ahead :) [13:59:11] (03CR) 10Marostegui: [C: 032] mariadb: Update socket location for db1097 [puppet] - 10https://gerrit.wikimedia.org/r/373892 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [14:04:02] 10Operations, 10monitoring: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3553044 (10faidon) [14:04:09] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3553048 (10faidon) [14:10:04] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373894 [14:13:21] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3523998 (10faidon) 1 ping is going to be too error-prone though :/ A single packet may be dropped for whatever reason on either side or in transport. Especially when talking about cross-DC checks, this isn't too... 
[14:13:33] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373894 (owner: 10Marostegui) [14:14:57] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373894 (owner: 10Marostegui) [14:16:16] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373894 (owner: 10Marostegui) [14:16:23] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db1097 - T168661 (duration: 00m 44s) [14:16:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:36] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [14:18:28] 10Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3553072 (10cwdent) Thanks @fgiunchedi, I had assumed we'd be using the latter "federated" approach so I did make those firewall holes already (but would still need to tweak iptables). Puppet co... [14:23:56] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2130345 [14:29:12] !log bounce varnish on cp1049 / cp1072 - mailbox lag [14:29:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:34] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3523998 (10Volans) The `check_ping` on our icinga hosts doesn't seem to have an option to set the equivalent of the `-i` of the ping command. Reducing from 5 to 3 packets half the time to 2s per check, but I'm no... 
[14:33:57] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 0 [14:34:16] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 0 [14:40:49] (03PS1) 10Herron: Add shell account cooltey [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) [14:46:28] (03PS4) 10Pmiazga: Remove unused wgPopupsAPIUseRESTBase config variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/358415 (https://phabricator.wikimedia.org/T165018) [14:49:59] (03Abandoned) 10Pmiazga: Remove unused wgPopupsAPIUseRESTBase config variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/358415 (https://phabricator.wikimedia.org/T165018) (owner: 10Pmiazga) [14:51:23] 10Operations, 10Discovery, 10Discovery-Analysis, 10Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3553151 (10BBlack) Sorry, it's fallen off the radar lately. Let's shoot for mid-next-week, perhaps Weds? [14:52:18] (03CR) 10Zoranzoki21: [C: 031] "Looks good to me, but someone else must approve." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373873 (https://phabricator.wikimedia.org/T174152) (owner: 10Urbanecm) [14:59:57] 10Operations, 10monitoring: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3553178 (10fgiunchedi) From swift clients perspective a failed disk isn't usually very impactful e.g. on latency unless it is a grey failure (e.g. disk very slow to respond). So from that perspective it... [15:08:38] 10Operations, 10OCG-General, 10Reading-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3553223 (10ovasileva) @gwicke - currently, the estimate for OCG replacement is by the end of September, however... 
[15:25:39] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3553262 (10herron) Should have clarified in the description that the current host check has a max check of 2 so was wondering about the pros/cons of the current 2 consecutive 4s 5 packet check_ping versus say 3 o... [15:26:54] (03CR) 10Volans: [C: 031] "Good work! The new structure looks much cleaner and organized than before, making it simpler to follow." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/369682 (https://phabricator.wikimedia.org/T171704) (owner: 10Gehel) [15:36:11] (03PS1) 10Alexandros Kosiaris: Generate kubernetes manpages for kubectl [debs/kubernetes] - 10https://gerrit.wikimedia.org/r/373917 (https://phabricator.wikimedia.org/T170346) [15:37:23] (03CR) 10Muehlenhoff: Handle incorrect package names in reverse dependency query (031 comment) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/373866 (owner: 10Muehlenhoff) [15:42:34] 10Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3553297 (10fgiunchedi) I see as two different use cases mostly, accessing directly from grafana will let you access all metrics and datapoints that Prometheus server is collecting and retaining.... 
[15:42:56] (03CR) 10Muehlenhoff: [C: 04-1] Add shell account cooltey (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) (owner: 10Herron) [15:44:07] (03CR) 10BBlack: "Discussion at T170598 if someone has comments" [puppet] - 10https://gerrit.wikimedia.org/r/361864 (owner: 10BBlack) [15:44:25] (03CR) 10Jdlrobson: [C: 031] Fix incorrect Special:Userlogin name in Popups blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373696 (https://phabricator.wikimedia.org/T170169) (owner: 10Pmiazga) [15:44:47] matt_flaschen: not familiar with hhvm.bypass_access_check, i generally use `set bac on` in the debugger manually to enable bypass of checks [15:54:43] (03PS1) 10Muehlenhoff: Treat cn=ciadmin as a privileged group [puppet] - 10https://gerrit.wikimedia.org/r/373919 (https://phabricator.wikimedia.org/T169557) [15:55:54] (03PS2) 10BBlack: VCL: apply VSV00001 (DSA 3924-1) DoS workaround to text only [puppet] - 10https://gerrit.wikimedia.org/r/369859 (owner: 10Ema) [15:56:06] (03PS2) 10Herron: Add shell accounts cooltey and sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) [15:56:28] (03PS1) 10Jdlrobson: Enable an A/B test for page previews on EN and DE wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373920 (https://phabricator.wikimedia.org/T172291) [15:56:40] (03CR) 10Muehlenhoff: [C: 032] Treat cn=ciadmin as a privileged group [puppet] - 10https://gerrit.wikimedia.org/r/373919 (https://phabricator.wikimedia.org/T169557) (owner: 10Muehlenhoff) [15:57:08] 10Operations, 10Traffic: Extending our HSTS value beyond ~1y - https://phabricator.wikimedia.org/T170598#3553325 (10Volans) +1 for me, I see this almost as a noop. 
Over 2y it is more likely that the user changes the physical device (in particular if mobile) than that HSTS expires :wink: [15:57:13] (03CR) 10BBlack: [C: 032] VCL: apply VSV00001 (DSA 3924-1) DoS workaround to text only [puppet] - 10https://gerrit.wikimedia.org/r/369859 (owner: 10Ema) [15:57:32] (03PS3) 10BBlack: VCL: apply VSV00001 (DSA 3924-1) DoS workaround to text only [puppet] - 10https://gerrit.wikimedia.org/r/369859 (owner: 10Ema) [15:57:37] (03CR) 10BBlack: [V: 032 C: 032] VCL: apply VSV00001 (DSA 3924-1) DoS workaround to text only [puppet] - 10https://gerrit.wikimedia.org/r/369859 (owner: 10Ema) [15:58:47] (03CR) 10jerkins-bot: [V: 04-1] Enable an A/B test for page previews on EN and DE wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373920 (https://phabricator.wikimedia.org/T172291) (owner: 10Jdlrobson) [15:59:03] matt_flaschen: fwiw, the only mention of bypass_access_check is in the debugger code from hhvm, so the ini setting would only apply in the debugger [16:02:21] 10Operations, 10monitoring: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3553335 (10Marostegui) I wouldn't like to have the check pushed to 4h-8h, that is a lot, especially for masters as @faidon said. Normally a degraded RAID should be well handled by the controller and shou...
[16:02:23] 10Operations, 10Graphite: unused grafana-dashboard indices on elasticsearch / logstash - https://phabricator.wikimedia.org/T174172#3553336 (10Gehel) [16:03:44] !log created cn=ciadmin group in LDAP to be used for privileged Jenkins access, initial members in ticket (T169557) [16:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:11] (03CR) 10RobH: [C: 031] Add shell accounts cooltey and sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) (owner: 10Herron) [16:08:44] 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: setup CAA record for policy.wikimedia.org for namecheap (used by WP VIP GO) due by 2017-09-27 - https://phabricator.wikimedia.org/T173787#3553367 (10RobH) 05Open>03Resolved [16:10:01] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) (owner: 10Herron) [16:10:14] 10Operations, 10Traffic: Extending our HSTS value beyond ~1y - https://phabricator.wikimedia.org/T170598#3553373 (10BBlack) Right. If I put on my self-adversarial hat, I imagine a counter-argument would be something along the lines of: 1) There exists some legacy UA that is deployed widely enough to matter (... 
[16:16:39] (03PS1) 10Ayounsi: Add new eqiad frack IPs [dns] - 10https://gerrit.wikimedia.org/r/373923 (https://phabricator.wikimedia.org/T169644) [16:17:49] (03CR) 10Ayounsi: [C: 032] Add new eqiad frack IPs [dns] - 10https://gerrit.wikimedia.org/r/373923 (https://phabricator.wikimedia.org/T169644) (owner: 10Ayounsi) [16:19:31] (03PS2) 10Jdlrobson: Enable an A/B test for page previews on EN and DE wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373920 (https://phabricator.wikimedia.org/T172291) [16:23:47] (03CR) 10Herron: [C: 032] Add shell accounts cooltey and sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) (owner: 10Herron) [16:23:53] (03PS3) 10Herron: Add shell accounts cooltey and sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/373899 (https://phabricator.wikimedia.org/T173886) [16:26:19] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10Patch-For-Review: Add new users Sharvaniharan and Cooltey to releasers-mobile - https://phabricator.wikimedia.org/T173886#3553442 (10herron) [16:31:11] 10Operations, 10Ops-Access-Requests, 10Gerrit, 10Patch-For-Review: Add new users Sharvaniharan and Cooltey to releasers-mobile - https://phabricator.wikimedia.org/T173886#3553453 (10herron) 05Open>03Resolved a:03herron https://gerrit.wikimedia.org/r/373899 has been merged and puppet manually run on b... [16:32:41] ebernhardson, awesome, thanks. Yeah, I wanted it for the debugger. [16:35:09] hello [16:37:06] /j #wikimedia-tech [16:37:13] oh sorry :D [16:37:30] cortex_: :) happens to all of us [16:37:40] hehe yea. 
[16:42:41] (03PS1) 10Ayounsi: Fix typo fasw-c-codfw -> fasw-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/373926 [16:43:07] (03CR) 10Ayounsi: [C: 032] Fix typo fasw-c-codfw -> fasw-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/373926 (owner: 10Ayounsi) [16:46:21] (03PS11) 10Gehel: wdqs - moving to role / profiles [puppet] - 10https://gerrit.wikimedia.org/r/369682 (https://phabricator.wikimedia.org/T171704) [16:47:04] (03CR) 10Gehel: wdqs - moving to role / profiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/369682 (https://phabricator.wikimedia.org/T171704) (owner: 10Gehel) [17:03:56] (03CR) 10Andrew Bogott: [C: 032] Expose wbqc_constraints view on Wiki Replicas [puppet] - 10https://gerrit.wikimedia.org/r/365969 (https://phabricator.wikimedia.org/T170927) (owner: 10Lucas Werkmeister (WMDE)) [17:04:00] (03PS4) 10Andrew Bogott: Expose wbqc_constraints view on Wiki Replicas [puppet] - 10https://gerrit.wikimedia.org/r/365969 (https://phabricator.wikimedia.org/T170927) (owner: 10Lucas Werkmeister (WMDE)) [17:14:39] (03CR) 10Lucas Werkmeister (WMDE): "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/365969 (https://phabricator.wikimedia.org/T170927) (owner: 10Lucas Werkmeister (WMDE)) [17:15:56] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: eqiad: rack frack refresh equipment - https://phabricator.wikimedia.org/T169644#3553579 (10Cmjohnson) Updated names to fasw-c1a-eqiad and fasw-c1b-eqiad physically, racktables and scs. [17:40:37] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/mediawiki-firejail-ffmpeg2theora] [17:43:53] ^ andrewbogott I feel that's known but no clue what it's about [17:47:54] 10Operations, 10DC-Ops: determine/process/document bios firmware tracking/updating policies - https://phabricator.wikimedia.org/T141128#2487936 (10RobH) [17:52:37] (03PS1) 10Bearloga: statistics::private: Disable Discovery [puppet] - 10https://gerrit.wikimedia.org/r/373938 (https://phabricator.wikimedia.org/T170494) [17:56:44] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3553711 (10RobH) p:05Normal>03Low [18:04:11] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Patch-For-Review, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3553733 (10Vachove... [18:08:06] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:19:53] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Patch-For-Review, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3463322 (10Ankry)...
[18:27:10] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: eqiad: rack frack refresh equipment - https://phabricator.wikimedia.org/T169644#3553848 (10ayounsi) Another change, please use xe-3/1/7 on cr1/2-eqiad instead of xe-5/1/1 (T149196) [18:33:44] (03PS1) 10Ayounsi: Move pfw3-eqiad from xe-5/1/1 to xe-3/1/7 [dns] - 10https://gerrit.wikimedia.org/r/373942 [18:34:28] (03CR) 10Ayounsi: [C: 032] Move pfw3-eqiad from xe-5/1/1 to xe-3/1/7 [dns] - 10https://gerrit.wikimedia.org/r/373942 (owner: 10Ayounsi) [18:47:54] 10Operations, 10Analytics, 10Analytics-Wikistats, 10Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3553954 (10Marostegui) [18:51:14] hi. [18:51:31] someone is stating they can't reach commons. [18:51:36] error presented is: "Request from [IP] via cp3048 frontend, Varnish XID 411806751 Upstream caches: cp3048 int Error: 404, Requested domainname does not exist on this server at Fri, 25 Aug 2017 13:32:06 GMT" Cloudbound (talk) 13:33, 25 August 2017 (UTC) [18:51:53] (03PS1) 10Samtar: Upload wikipedia-wordmark-zh-c.svg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) [18:53:07] thedj: requested domainname? [18:53:10] Via some sort of proxy? [18:53:25] https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue kinda related, but not fully [18:57:13] "it is sporadic, and when it does happen it's within any article with a file hosted on Commons" [18:57:25] no the requested domain name is not listed. 
[19:02:15] (03PS1) 10Gehel: discovery analytics - disable report updater cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373952 (https://phabricator.wikimedia.org/T174110) [19:03:56] (03CR) 10Bearloga: [C: 031] discovery analytics - disable report updater cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373952 (https://phabricator.wikimedia.org/T174110) (owner: 10Gehel) [19:04:38] (03Abandoned) 10Bearloga: statistics::private: Disable Discovery [puppet] - 10https://gerrit.wikimedia.org/r/373938 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [19:06:11] (03CR) 10Gehel: [C: 032] discovery analytics - disable report updater cronjob [puppet] - 10https://gerrit.wikimedia.org/r/373952 (https://phabricator.wikimedia.org/T174110) (owner: 10Gehel) [19:07:07] !log kill stuck discovery report-updater process on stat1005 - T174110 [19:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:22] T174110: Private data access for non-person user that calculates metrics - https://phabricator.wikimedia.org/T174110 [19:09:20] !log actually not killing the "stuck discovery report-updater process on stat1005", it is already gone - T174110 [19:09:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:53] (03PS1) 10Samtar: Set up a new logo on zh-classical.m.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373956 (https://phabricator.wikimedia.org/T173408) [19:26:41] 10Operations, 10Cloud-Services, 10Cloud-VPS, 10cloud-services-team (Kanban): Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 - https://phabricator.wikimedia.org/T168110#3554101 (10Andrew) 05stalled>03Resolved We're on new puppetmasters with new certs now. 
[19:26:46] 10Operations, 10Cloud-Services, 10Cloud-VPS, 10cloud-services-team (Kanban): Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 - https://phabricator.wikimedia.org/T168110#3554104 (10Andrew) [19:26:49] 10Operations, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3554103 (10Andrew) 05Open>03Resolved [19:35:06] (03CR) 10Urbanecm: [C: 04-1] "373956 and 373948 should be done in one patch and one task. They are dependent on each other." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) (owner: 10Samtar) [19:35:08] (03CR) 10Urbanecm: [C: 04-1] "373956 and 373948 should be done in one patch and one task. They are dependent on each other." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373956 (https://phabricator.wikimedia.org/T173408) (owner: 10Samtar) [19:35:24] (03CR) 10Urbanecm: [C: 04-1] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373956 (https://phabricator.wikimedia.org/T173408) (owner: 10Samtar) [19:35:29] (03CR) 10Urbanecm: [C: 04-1] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) (owner: 10Samtar) [19:36:48] (03CR) 10jerkins-bot: [V: 04-1] Set up a new logo on zh-classical.m.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373956 (https://phabricator.wikimedia.org/T173408) (owner: 10Samtar) [19:48:20] (03CR) 10Mobrovac: "LGTM, but we cannot merge this until all of the nodes are decommissioned/re-imaged. 
Should we perhaps change the recipes one by one instea" [puppet] - 10https://gerrit.wikimedia.org/r/373863 (https://phabricator.wikimedia.org/T169939) (owner: 10Filippo Giunchedi) [19:57:03] (03PS2) 10Samtar: Upload wikipedia-wordmark-zh-c.svg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) [20:01:08] (03PS3) 10Samtar: Upload wikipedia-wordmark-zh-c.svg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) [20:01:42] (03Abandoned) 10Samtar: Set up a new logo on zh-classical.m.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373956 (https://phabricator.wikimedia.org/T173408) (owner: 10Samtar) [20:01:44] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) (owner: 10Samtar) [20:03:47] 10Operations, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3554154 (10aaron) Note that for de-duplication, as long as the job has rootJobTimestamp set, it will ignore rows already touched (page_touched) to a high... [20:03:51] 10Operations, 10MW-1.30-release-notes, 10Performance-Team, 10monitoring: Ensure getLagTimes.php is working properly - https://phabricator.wikimedia.org/T172559#3554155 (10aaron) 05Open>03Resolved [20:04:02] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373948 (https://phabricator.wikimedia.org/T174192) (owner: 10Samtar) [20:07:37] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Patch-For-Review, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3554170 (10zhuyife... 
[20:22:14] 10Operations, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3554229 (10EBernhardson) >>! In T173710#3554154, @aaron wrote: > Note that for de-duplication, as long as the job has rootJobTimestamp set, it will ignor... [20:44:32] 10Operations, 10Electron-PDFs, 10TCB-Team, 10Patch-For-Review, and 2 others: Deploy ElectronPdfService Extension to production - https://phabricator.wikimedia.org/T150185#3554300 (10mmodell) [20:44:34] 10Operations, 10IDS-extension, 10Wikimedia-Extension-setup, 10I18n, 10wikimedia-extension-review-queue: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#3554302 (10mmodell) [21:09:36] 10Operations, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3554380 (10aaron) Ignored purges still count as work items, yes. Rebound purges could explain some of the number. Also, given the backlog, lots of them... [22:37:08] 10Operations, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3554721 (10aaron) Though this bit is problematic: ``` "page_touched < " . $dbw->addQuotes( $dbw->timestamp( $touchTimestamp ) ) ``` ...seems like that... 
[22:44:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10netops: Move eqiad frack to new infra - https://phabricator.wikimedia.org/T174218#3554735 (10ayounsi) [22:48:34] 10Operations, 10Analytics, 10Analytics-Wikistats, 10Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3554755 (10gh87) [22:48:44] 10Operations, 10Graphite: unused grafana-dashboard indices on elasticsearch / logstash - https://phabricator.wikimedia.org/T174172#3554758 (10Krinkle) [23:38:21] (03PS1) 10Krinkle: Enable jQuery 3 on nlwiki, svwiki, plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/373987 (https://phabricator.wikimedia.org/T124742) [23:38:42] Krinkle: enwiki or go home [23:42:15] Reedy: hehe, soon, soon. [23:42:54] Reedy: Or maybe I'll set enwiki => !!mt_rand(0, 1) [23:42:55] muhahaha [23:43:41] And then wonder why load.php serves a different jquery version to what mw core thinks it should get? :D [23:44:08] Mm, not really, it only affects 1 request: load.php?modules=jquery|mediawiki [23:44:11] nothing else varies [23:44:14] not even the url [23:44:21] boo [23:44:28] but it'll be cached [23:44:38] so this means it changes every 5 minutes for each lang/skin combination [23:44:42] (might) [23:44:52] people will go mad :P