[00:24:30] (PS1) Aaron Schulz: Add yubikey nano key ssh key for aaron [puppet] - https://gerrit.wikimedia.org/r/393432
[01:12:24] PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:14] RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 75275 bytes in 0.147 second response time
[03:23:15] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 845.98 seconds
[03:33:24] PROBLEM - eventstreams on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:33:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/summary/{title}{/revision}{/tid} (Get summary for Barack Obama) timed out before a response was received: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Altrincham page via mobile-se
[03:33:44] out before a response was received: /{domain}/v1/feed/onthisday/{type}/{mm}/{dd} (retrieve all events on January 15) timed out before a response was received: /{domain}/v1/page/definition/{title} (retrieve en-wiktionary definitions for cat) timed out before a response was received
[03:34:15] RECOVERY - eventstreams on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.023 second response time
[03:34:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[03:37:34] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:02:34] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:02:34] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 257.36 seconds
[06:28:35] PROBLEM - puppet last run on mw2229 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ssh/userkeys/pybal-check]
[06:29:04] PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/RapidSSL_SHA256_CA_-_G3.crt]
[06:58:35] RECOVERY - puppet last run on mw2229 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:04] RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:12:24] PROBLEM - Nginx local proxy to apache on mw2111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:13:14] RECOVERY - Nginx local proxy to apache on mw2111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.198 second response time
[08:06:24] PROBLEM - HHVM rendering on mw2188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:07:14] RECOVERY - HHVM rendering on mw2188 is OK: HTTP OK: HTTP/1.1 200 OK - 75199 bytes in 0.287 second response time
[11:40:44] Operations, Phabricator, Traffic, Zero: Missing IP addresses for Maroc Telecom - https://phabricator.wikimedia.org/T174342#3787246 (Aklapper) ...and we just saw 154.150.77.xx in Phab
[15:22:04] PROBLEM - High lag on wdqs2002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[15:45:25] PROBLEM - cxserver endpoints health on scb1002 is CRITICAL: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received
[15:46:24] RECOVERY - cxserver endpoints health on scb1002 is OK: All endpoints are healthy
[17:12:52] (Draft2) Jayprakash12345: IP cap lift for Semaine contributive 2017-2018 [mediawiki-config] - https://gerrit.wikimedia.org/r/393442
[17:13:24] (PS3) Jayprakash12345: IP cap lift for Semaine contributive 2017-2018 [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360)
[17:14:44] (CR) jerkins-bot: [V: -1] IP cap lift for Semaine contributive 2017-2018 [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[17:17:41] (PS4) Jayprakash12345: IP cap lift for Semaine contributive 2017-2018 [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360)
[17:18:18] (CR) Jayprakash12345: "@MarcoAurelio Please review." [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[17:36:00] (PS1) ArielGlenn: keep misc dumps dirs owner on generating host same as on web server for now [puppet] - https://gerrit.wikimedia.org/r/393443
[17:36:27] (CR) jerkins-bot: [V: -1] keep misc dumps dirs owner on generating host same as on web server for now [puppet] - https://gerrit.wikimedia.org/r/393443 (owner: ArielGlenn)
[17:42:33] (PS2) ArielGlenn: keep misc dumps dirs owner on generating host same as on web server for now [puppet] - https://gerrit.wikimedia.org/r/393443
[17:50:40] (PS3) ArielGlenn: keep misc dumps dirs owner on generating host same as on web server for now [puppet] - https://gerrit.wikimedia.org/r/393443
[17:55:51] (CR) ArielGlenn: [C: 2] keep misc dumps dirs owner on generating host same as on web server for now [puppet] - https://gerrit.wikimedia.org/r/393443 (owner: ArielGlenn)
[19:01:39] hi, I have some technical question regarding anon vandals - for users connecting with NAT, it is possible that many users from the ISP will share the IP. is the source port of connection is logged somewhere exposed to checkusers/admins?
[19:03:00] (some admin in hewiki report to ISP about a troll and the ISP says we should provide the source port, not just the IP and the time
[19:04:43] no, it's not
[19:05:42] is this something that would be valid to ask in phab for future development? are there many ISPs using carrier grade NAT?
[19:06:05] There's enough... And as many aren't implementing IPv6, it's only going to get more common
[19:06:12] There's some countries running CGNAT
[19:07:53] $_SERVER['REMOTE_PORT'] would in theory have the information... As long as it wasn't mangled going through the various WMF layers
[19:08:36] I guess it's possible, might need some work at various levels..
[19:08:41] But filing a task is probably step 1
[19:09:11] OK. this is not urgent of course, just wondering if this is something that even make sense to ask for :)
[19:09:27] It sounds like you've got a use case at least
[19:09:44] If it's something ISPs want to help track people down for abuse reports
[19:11:01] I'll open a task on phab for it and will explain the motivation for it. thanks!
[19:25:45] I would expect $_SERVER['REMOTE_PORT'] to be useless inside WMF infrastructure
[19:26:28] (CR) Framawiki: [C: 1] Remove single editor tab for plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/393121 (https://phabricator.wikimedia.org/T181045) (owner: TerraCodes)
[19:27:23] I would not expect nginx and varnish to use the same source port as the client
[19:27:47] (CR) Framawiki: "@Chad, I suppose that it was merged during a SWAP session ? Please add "SWAP" comment when you +2 in this repo :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/393075 (https://phabricator.wikimedia.org/T181241) (owner: Jon Harald Søby)
[19:29:50] Operations, CheckUser, Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368#3787516 (Krenair) I would expect $_SERVER['REMOTE_PORT'] to be useless inside WMF infrastructure I would not expect nginx and varnish...
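Editor's note on the REMOTE_PORT discussion above: behind a reverse proxy, the port the application sees belongs to the proxy's own connection, so the client's source port survives only if the edge layer forwards it explicitly. The following is a minimal Python sketch of that idea, not a description of how WMF infrastructure works. The header names `X-Client-IP` and `X-Client-Port`, the trusted network range, and the function `client_endpoint` are all hypothetical, chosen only for illustration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only; not any real deployment's config.
TRUSTED_PROXIES = [ip_network("10.0.0.0/8")]
CLIENT_IP_HEADER = "X-Client-IP"      # assumed header name
CLIENT_PORT_HEADER = "X-Client-Port"  # assumed header name


def client_endpoint(remote_addr, remote_port, headers):
    """Return the (ip, port) pair to log for a request.

    remote_addr and remote_port describe the TCP peer the application
    sees, which is the last proxy hop when the request was proxied.
    """
    peer = ip_address(remote_addr)
    if not any(peer in net for net in TRUSTED_PROXIES):
        # Direct connection: the socket values are the client's own.
        return remote_addr, remote_port

    # Proxied request: the socket port is the proxy's, so rely only on
    # what the edge forwarded, and report None when it forwarded nothing.
    ip = headers.get(CLIENT_IP_HEADER)
    try:
        port = int(headers.get(CLIENT_PORT_HEADER, ""))
    except ValueError:
        port = None
    if port is not None and not 0 < port <= 65535:
        port = None
    return ip, port
```

Returning `None` for the port rather than falling back to the socket value is deliberate in this sketch: logging the proxy's port as if it were the client's would be worse than logging nothing.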
[20:21:52] (CR) Zoranzoki21: [C: 1] IP cap lift for Semaine contributive 2017-2018 [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[20:23:04] PROBLEM - Varnish HTTP text-backend - port 3128 on cp4029 is CRITICAL: connect to address 10.128.0.129 and port 3128: Connection refused
[20:24:04] RECOVERY - Varnish HTTP text-backend - port 3128 on cp4029 is OK: HTTP OK: HTTP/1.1 200 OK - 178 bytes in 0.157 second response time
[21:07:24] (CR) TerraCodes: [C: -1] "you don't need to include commons" [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[21:09:51] (CR) Zoranzoki21: [C: 1] "> you don't need to include commons" [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[21:13:00] (CR) TerraCodes: [C: -1] "> > you don't need to include commons" [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[21:14:47] (CR) Zoranzoki21: [C: 1] "> > > you don't need to include commons" [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[21:15:13] (PS5) Zoranzoki21: IP cap lift for Semaine contributive 2017-2018 [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[21:19:28] (CR) TerraCodes: [C: 1] "idk why you need a limit of 30 if you're expecting 15, but other than that, LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/393442 (https://phabricator.wikimedia.org/T181360) (owner: Jayprakash12345)
[21:22:54] PROBLEM - cxserver endpoints health on scb1001 is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium, adapt the links to target language wiki.) timed out before a response was received: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received
[21:23:44] RECOVERY - cxserver endpoints health on scb1001 is OK: All endpoints are healthy
[21:34:45] PROBLEM - Check Varnish expiry mailbox lag on cp4026 is CRITICAL: CRITICAL: expiry mailbox lag is 2001742
[21:36:34] Operations, Performance-Team, Traffic: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3787619 (Krinkle)
[22:07:07] Operations, Phabricator, Traffic, Zero: Missing IP addresses for Maroc Telecom - https://phabricator.wikimedia.org/T174342#3787625 (Tgr) So what does it take for this task to be resolved? Is someone actually looking into it or is it just being pushed around? We have a set of IP blocks in the Pha...
[22:11:55] lol, zero
[22:24:39] Operations, Phabricator, Traffic, Zero: Missing IP addresses for Maroc Telecom - https://phabricator.wikimedia.org/T174342#3787629 (Tgr) Presumably if the pirates go through the effort of uploading files regularly then it works (ie. some users can actually download them while being zero-rated), r...
[22:34:45] RECOVERY - Check Varnish expiry mailbox lag on cp4026 is OK: OK: expiry mailbox lag is 4
[23:25:24] RECOVERY - High lag on wdqs2002 is OK: OK: Less than 30.00% above the threshold [600.0] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[23:28:35] PROBLEM - puppet last run on labtestneutron2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:31:59] What's the deal with the slow Wikidata servers?
[23:51:32] what slow servers?
[23:58:44] RECOVERY - puppet last run on labtestneutron2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures