[00:01:05] why would it only affect wikimedia?
[00:03:29] it could be that our endpoints are problematic for his host being able to detect the path mtu problem
[00:04:09] the only thing I know of that's particularly crazy about our SSL setup is the number of SANs, but I'd think that'd show up after Server Hello?
[00:05:45] yeah I don't mean e.g. our SSL being bad, but lower-level tuning of our edge servers could be wrong in related ways that only affect these minority cases
[00:05:55] https://blog.cloudflare.com/path-mtu-discovery-in-practice/ covers a lot of the ground I'm thinking of
[00:06:29] we don't use ECMP though, so our solution probably doesn't have to be as complex as theirs
[00:07:06] I'm vaguely aware ICMP is involved in fragmentation problems
[00:07:29] ECMP is something different, not a typo for ICMP
[00:07:34] I know
[00:08:04] but anyways, for instance we don't currently turn on /proc/sys/net/ipv4/tcp_mtu_probing like they're suggesting in that blog post
[00:08:38] but that's for the v4 case. for v6 the only solutions they offer there are dumping our server-side MTU down to 1280, or using something like their pmtu daemon
[00:09:27] err wait, PMTUD is their long-term v4 solution. I don't think they really call out a better answer than 1280 for the v6 version of the problem
[00:09:51] maybe lvs icmp routing isn't working right, too
[00:10:05] (we do have some config turned on for that, but that doesn't necessarily mean it's working right)
[00:11:09] traceroute to wikimedia is broken from my machine and from ZxelA's
[00:11:34] that'd indicate ICMP getting blocked, wouldn't it?
[00:11:37] broken in what sense?
[00:11:54] (and no, traceroute usually uses UDP to unreachable ports, but then there are many different "traceroute" tools these days)
[00:12:41] Presume he means broken as in it's not responding for all hosts down the chain
[00:13:03] yeah that's often the case
[00:13:08] like https://phabricator.wikimedia.org/P6121
[00:13:11] "mtr" might give a better idea
[00:13:47] mtr works
[00:13:51] Krenair: TalkTalk? Seriously?
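For context on the sysctl mentioned at 00:08:04: below is a minimal sketch, assuming a Linux host, that just reports the PMTUD-related knobs the Cloudflare post discusses. It is not Wikimedia tooling. The 0/1/2 meanings of tcp_mtu_probing (disabled / probe only after a blackhole is suspected / always probe) come from the kernel documentation, and tcp_base_mss is the fallback MSS used once probing kicks in.

```python
#!/usr/bin/env python3
"""Minimal sketch (not Wikimedia tooling): print the PMTUD-related sysctls
discussed above on a Linux host. No privileges needed to read them."""
from pathlib import Path

# tcp_mtu_probing: 0 = disabled, 1 = enabled after an ICMP blackhole is
# suspected, 2 = always probe (RFC 4821 packetization-layer PMTUD).
KNOBS = {
    "net/ipv4/tcp_mtu_probing": "TCP MTU probing (the knob from the Cloudflare post)",
    "net/ipv4/tcp_base_mss": "starting MSS used once probing kicks in",
}

def read_sysctl(name: str) -> str:
    try:
        return (Path("/proc/sys") / name).read_text().strip()
    except OSError:
        return "unreadable"

if __name__ == "__main__":
    for name, desc in KNOBS.items():
        print(f"{name} = {read_sysctl(name)}   # {desc}")
```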
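And to illustrate the point at 00:11:54 that classic traceroute sends UDP probes to high, normally-closed ports and relies on ICMP replies coming back: a bare-bones sketch, not a substitute for mtr or the real traceroute. It needs root for the raw ICMP receive socket, and a hop that filters or rate-limits ICMP shows up as "*", which is the failure mode being discussed. The hostname in the example is just for illustration.

```python
#!/usr/bin/env python3
"""Bare-bones UDP traceroute sketch (illustrative only; needs root for the
raw ICMP receive socket). A real tool would match replies to its own probes."""
import socket

def traceroute(host: str, max_hops: int = 30, port: int = 33434, timeout: float = 2.0) -> None:
    dest = socket.gethostbyname(host)
    for ttl in range(1, max_hops + 1):
        # Raw socket to catch ICMP Time Exceeded / Port Unreachable replies.
        recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        recv.settimeout(timeout)
        # Plain UDP probe to a high, normally-closed port, with a limited TTL.
        send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        send.sendto(b"", (dest, port))
        try:
            _, (hop, _) = recv.recvfrom(512)
        except socket.timeout:
            hop = "*"  # no ICMP came back: filtered, rate-limited, or lost
        finally:
            send.close()
            recv.close()
        print(f"{ttl:2d}  {hop}")
        if hop == dest:
            break

if __name__ == "__main__":
    traceroute("en.wikipedia.org")
```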
[00:14:13] Reedy, I didn't get to choose the ISP :)
[00:14:24] but that was essentially my reaction
[00:14:35] I've always chosen the ISP even when at my parents
[00:16:55] I get the choice of shitty or shittier when it comes to ISPs around here
[00:17:14] ^
[00:17:30] If you can get TT, you can get most of the retail ISPs in the UK
[01:04:16] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508029451 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4163902 keys, up 4 minutes 8 seconds - replication_delay is 1508029451
[01:04:16] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508029451 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4163534 keys, up 4 minutes 8 seconds - replication_delay is 1508029451
[01:04:26] PROBLEM - Check health of redis instance on 6379 on rdb2001 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6379
[01:04:46] PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508029479 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4160453 keys, up 4 minutes 36 seconds - replication_delay is 1508029479
[01:05:16] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4160318 keys, up 5 minutes 7 seconds - replication_delay is 0
[01:05:26] RECOVERY - Check health of redis instance on 6379 on rdb2001 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8866052 keys, up 5 minutes 20 seconds - replication_delay is 0
[01:05:47] RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4157727 keys, up 5 minutes 41 seconds - replication_delay is 0
[01:06:16] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4159319 keys, up 6 minutes 8 seconds - replication_delay is 0
[01:07:26] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:19:22] 10Operations, 10Deployments, 10Beta-Cluster-reproducible, 10HHVM, and 3 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#3685666 (10Smalyshev) Speaking of which, if we're moving to php7 on mwscript and off hhvm, should we also...
[01:20:08] 10Operations, 10Deployments, 10Beta-Cluster-reproducible, 10HHVM, and 2 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#3685667 (10Reedy)
[01:32:26] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[02:16:53] (03Draft1) 10Paladox: Enable auto submodule updates [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/384306
[02:16:55] (03PS2) 10Paladox: Enable auto submodule updates [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/384306
[02:20:11] (03Draft1) 10Paladox: Add branch field to .gitmodules [puppet] - 10https://gerrit.wikimedia.org/r/384307
[02:20:16] (03PS2) 10Paladox: Add branch field to .gitmodules [puppet] - 10https://gerrit.wikimedia.org/r/384307
[02:24:22] (03PS3) 10Paladox: Enable auto submodule updates [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/384306
[02:24:43] (03CR) 10Paladox: "See https://gerrit-review.googlesource.com/Documentation/user-submodules.html" [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/384306 (owner: 10Paladox)
[03:27:46] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 605.35 seconds
[03:46:47] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: Traceback (most recent call last)
[03:48:26] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: Traceback (most recent call last)
[03:49:17] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: Traceback (most recent call last)
[03:50:17] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: Traceback (most recent call last)
[03:50:27] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: Traceback (most recent call last)
[03:50:57] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: Traceback (most recent call last)
[03:56:46] PROBLEM - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 288 bytes in 0.021 second response time
[03:56:46] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 0 probes of 290 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[03:58:26] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 0 probes of 291 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map
[03:59:17] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 8 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[04:00:16] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 8 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[04:00:36] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 0 probes of 288 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[04:00:57] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 9 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[04:33:06] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 247.12 seconds
[10:46:46] RECOVERY - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.021 second response time
[11:14:28] (03CR) 10Dereckson: [C: 04-1] "Config looks good, but wikidata client should be double checked." (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372798 (https://phabricator.wikimedia.org/T173643) (owner: 10Urbanecm)
[11:15:56] !log mobrovac@tin Started restart [electron-render/deploy@8dd5f13]: Electron hanging - T174916
[11:16:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:03] T174916: electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916
[11:16:08] 10Operations, 10DBA, 10Support-and-Safety, 10Patch-For-Review, 10Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3685824 (10Dereckson) p:05Low>03Normal There is hi.wiktionary to create soon, I'll see next week to plan a window for it, and if so...
[12:43:16] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 29 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[12:48:16] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 9 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:41:26] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:49:16] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:08:46] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:11:26] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[14:19:16] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:38:46] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:50:28] (03PS2) 10Giuseppe Lavagetto: Port docker builder [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/384081 (https://phabricator.wikimedia.org/T177276)
[15:08:37] PROBLEM - puppet last run on lvs1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:38:37] RECOVERY - puppet last run on lvs1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[15:45:41] (03CR) 10MarcoAurelio: [C: 04-1] Enable blocking feature of abuse filter in fawikiquote (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384252 (https://phabricator.wikimedia.org/T178227) (owner: 10Ladsgroup)
[15:49:49] (03CR) 10MarcoAurelio: "After some time without complains I guess we can safely assume this is correct IMHO." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370791 (https://phabricator.wikimedia.org/T101983) (owner: 10TerraCodes)
[15:50:05] (03CR) 10MarcoAurelio: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370791 (https://phabricator.wikimedia.org/T101983) (owner: 10TerraCodes)
[16:40:46] PROBLEM - Check systemd state on relforge1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:41:37] PROBLEM - Check systemd state on relforge1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:12:37] PROBLEM - puppet last run on rdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:29:41] (03CR) 10Huji: Enable blocking feature of abuse filter in fawikiquote (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384252 (https://phabricator.wikimedia.org/T178227) (owner: 10Ladsgroup)
[18:42:36] RECOVERY - puppet last run on rdb1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[18:47:16] (03CR) 10Luke081515: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370791 (https://phabricator.wikimedia.org/T101983) (owner: 10TerraCodes)
[19:04:43] (03PS7) 10TerraCodes: Remove overlapping userrights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370791 (https://phabricator.wikimedia.org/T101983)
[20:04:21] 10Operations, 10Cloud-Services, 10Toolforge, 10Traffic, 10HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#3686086 (10bd808)
[20:20:22] 10Operations, 10Cloud-Services, 10Toolforge, 10Traffic, 10HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#3686094 (10bd808) >>! In T128409#2233337, @BBlack wrote: > What you can do to help modern browsers, though, without taking the redirect and/o...
[20:22:17] (03PS1) 10Mforns: Fix cron job for refinery data drop of MediaWiki snapshots [puppet] - 10https://gerrit.wikimedia.org/r/384346 (https://phabricator.wikimedia.org/T178256)
[20:56:04] 10Operations, 10Cloud-Services, 10Toolforge, 10Traffic, 10HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#3686149 (10bd808)
[21:11:34] (03PS2) 10Dereckson: Add additional namespaces to search results for bnwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383794 (https://phabricator.wikimedia.org/T178041) (owner: 10DCausse)
[21:12:35] (03CR) 10Dereckson: "PS2: namespace numbers can be cryptic, and especially confusing for wikisource, as they change from one wiki to another, so we can documen" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383794 (https://phabricator.wikimedia.org/T178041) (owner: 10DCausse)
[22:07:18] (03PS2) 10ArielGlenn: Increase the shard count for Wikidata entity dumps from 5 to 6 [puppet] - 10https://gerrit.wikimedia.org/r/383414 (https://phabricator.wikimedia.org/T177486) (owner: 10Hoo man)
[22:08:35] (03CR) 10ArielGlenn: [C: 032] Increase the shard count for Wikidata entity dumps from 5 to 6 [puppet] - 10https://gerrit.wikimedia.org/r/383414 (https://phabricator.wikimedia.org/T177486) (owner: 10Hoo man)
[22:11:28] (03CR) 10ArielGlenn: [C: 032] Test different batch sizes in dumpwikidatajson.sh [puppet] - 10https://gerrit.wikimedia.org/r/384204 (https://phabricator.wikimedia.org/T177486) (owner: 10Hoo man)
[22:12:09] (03PS3) 10ArielGlenn: Test different batch sizes in dumpwikidatajson.sh [puppet] - 10https://gerrit.wikimedia.org/r/384204 (https://phabricator.wikimedia.org/T177486) (owner: 10Hoo man)
[22:18:22] (03PS2) 10ArielGlenn: Do not make dumps of wb_entity_per_page [puppet] - 10https://gerrit.wikimedia.org/r/352797 (https://phabricator.wikimedia.org/T140890) (owner: 10Ladsgroup)
[22:19:02] (03CR) 10ArielGlenn: [C: 032] Do not make dumps of wb_entity_per_page [puppet] - 10https://gerrit.wikimedia.org/r/352797 (https://phabricator.wikimedia.org/T140890) (owner: 10Ladsgroup)
[22:24:25] consider it a very early monday deploy :-P
[22:24:35] (01:24 am here!)
[23:31:31] 10Operations, 10Cloud-Services, 10Toolforge, 10Traffic, 10HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#3686227 (10BBlack) Be careful with `preload`. Its only purpose is to signal to the Chromium list maintainers that it's ok to you preload yo...
[23:34:39] 10Operations, 10Cloud-Services, 10Toolforge, 10Traffic, 10HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#3686228 (10bd808) >>! In T102367#3686227, @BBlack wrote: > Be careful with `preload`. Its only purpose is to signal to the Chromium list ma...
[23:39:05] (03PS2) 10Ori.livneh: Drop support for the legacy configuration format [debs/pybal] - 10https://gerrit.wikimedia.org/r/317823
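To make BBlack's `preload` caveat at 23:31:31 concrete, here is a toy sketch of the header being discussed; the host, port, and max-age are made up and this is not the tools.wmflabs.org configuration. Browsers only honor Strict-Transport-Security when it arrives over HTTPS, and appending `preload` additionally signals that the domain may be baked into the Chromium preload list, which is hard to undo quickly, hence the warning.

```python
#!/usr/bin/env python3
"""Toy illustration of the HSTS header discussed in T102367 (not the real
tools.wmflabs.org config). Browsers ignore HSTS over plain HTTP; this
unencrypted example only shows the header's shape."""
from http.server import BaseHTTPRequestHandler, HTTPServer

# Without "preload": only browsers that have already visited enforce HTTPS-only.
# Adding "; preload" also asserts the domain may be hard-coded into browser
# preload lists -- only do that once HTTPS-only is meant to be permanent.
HSTS = "max-age=31536000; includeSubDomains"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Strict-Transport-Security", HSTS)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```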