[00:00:27] PROBLEM - HHVM rendering on mw2165 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:01:17] RECOVERY - HHVM rendering on mw2165 is OK: HTTP OK: HTTP/1.1 200 OK - 75190 bytes in 0.293 second response time
[00:01:32] (PS4) Dzahn: DHCP: switch from jessie to stretch as default installer [puppet] - https://gerrit.wikimedia.org/r/399826 (https://phabricator.wikimedia.org/T182215)
[00:58:53] (PS1) Dzahn: parsoid::testing: convert role to profile [puppet] - https://gerrit.wikimedia.org/r/404063
[01:02:42] (CR) Dzahn: "the role class has to be converted to a profile and the Hiera lookup moved to a parameter of the profile class. then jenkins-bot would be " [puppet] - https://gerrit.wikimedia.org/r/403464 (owner: Arlolra)
[01:12:27] PROBLEM - HHVM rendering on mw2244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:17] RECOVERY - HHVM rendering on mw2244 is OK: HTTP OK: HTTP/1.1 200 OK - 75200 bytes in 0.293 second response time
[03:25:18] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 817.90 seconds
[03:58:18] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 233.76 seconds
[04:31:01] (PS42) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[04:31:58] (CR) jerkins-bot: [V: -1] $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: TerraCodes)
[04:33:50] (PS43) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[04:34:47] (CR) jerkins-bot: [V: -1] $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: TerraCodes)
[04:36:49] (PS44) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[04:46:48] PROBLEM - puppet last run on cp4032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:16:47] RECOVERY - puppet last run on cp4032 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:04:16] (Draft2) Biplab Anand: Add Draft namespace to the Nepali Wikipedia. [mediawiki-config] - https://gerrit.wikimedia.org/r/404067
[06:05:51] (PS3) Biplab Anand: Add Draft namespace to the Nepali Wikipedia. [mediawiki-config] - https://gerrit.wikimedia.org/r/404067 (https://phabricator.wikimedia.org/T184157)
[06:16:29] Operations, ops-codfw, DBA: Degraded RAID on db2036 - https://phabricator.wikimedia.org/T184836#3898392 (Marostegui) a:Papaul This host is out of warranty, but maybe @Papaul has some spare disks somewhere?
[06:17:15] Operations, ops-eqiad, hardware-requests, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832#3898395 (Marostegui)
[06:41:02] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[06:43:32] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 26 probes of 289 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[06:48:42] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 11 probes of 289 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[06:49:12] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[07:37:45] (CR) Jayprakash12345: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/404067 (https://phabricator.wikimedia.org/T184157) (owner: Biplab Anand)
[07:41:00] (CR) Jayprakash12345: [C: 1] Add Draft namespace to the Nepali Wikipedia. [mediawiki-config] - https://gerrit.wikimedia.org/r/404067 (https://phabricator.wikimedia.org/T184157) (owner: Biplab Anand)
[08:53:22] Operations, Discourse, Developer-Relations (Jan-Mar-2018): Setup reply via email in discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T184592#3898495 (Samwilson) Is this a Google POP account? I think you have to create it a new "app password" (and "enable legacy access" or somesuch)...
[08:56:26] Operations, Discourse, Developer-Relations (Jan-Mar-2018): Setup reply via email in discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T184592#3898496 (revi) >>! In T184592#3898495, @Samwilson wrote: > Is this a Google POP account? I think you have to create it a new "app password" (...
[08:58:45] Operations, Discourse, Developer-Relations (Jan-Mar-2018): Setup reply via email in discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T184592#3898497 (Samwilson) Thanks @revi! I was just being lazy cause I seem to have to read the docs every time I want to do that (and it's saturday...
[10:01:08] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:01:17] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:01:28] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:01:28] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:01:47] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:01:47] PROBLEM - puppet last run on ms-be1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:01:48] PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:02:07] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:02:08] PROBLEM - puppet last run on ms-be1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:02:47] PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:02:57] PROBLEM - puppet last run on cp4025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:03:17] PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:03:18] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:03:27] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:03:37] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:04:17] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:28:17] RECOVERY - puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[10:28:37] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[10:29:17] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[10:31:08] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:31:08] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:31:28] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:31:28] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:31:47] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:31:47] RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:31:48] RECOVERY - puppet last run on puppetmaster2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:32:07] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[10:32:08] RECOVERY - puppet last run on ms-be1028 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:32:47] RECOVERY - puppet last run on rdb1006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[10:32:57] RECOVERY - puppet last run on cp4025 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[10:33:17] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:33:27] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:49:13] Operations, MediaWiki-Platform-Team, HHVM, NewPHP, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3898593 (daniel) As per the TechCom meeting on January 10, this RFC has been approved after an uneventful Last Call period with no issues raised.
[13:15:27] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0
[13:15:47] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[14:54:57] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0] https://grafana.wikimedia.org/dashboard/file/swift.json?panelId=9&fullscreen&orgId=1&var-DC=eqiad-prod
[15:02:57] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0] https://grafana.wikimedia.org/dashboard/file/swift.json?panelId=9&fullscreen&orgId=1&var-DC=eqiad-prod
[15:53:57] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0] https://grafana.wikimedia.org/dashboard/file/swift.json?panelId=9&fullscreen&orgId=1&var-DC=eqiad-prod
[16:01:57] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0] https://grafana.wikimedia.org/dashboard/file/swift.json?panelId=9&fullscreen&orgId=1&var-DC=eqiad-prod
[16:20:12] (PS1) Addshore: Add basic Dockerfile to run docker-pkg [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/404084
[17:32:57] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[17:33:37] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[17:34:13] (PS1) Urbanecm: Whitelist audiovis.nac.gov.pl [mediawiki-config] - https://gerrit.wikimedia.org/r/404088 (https://phabricator.wikimedia.org/T184853)
[17:50:46] (CR) Mobrovac: [C: 1] Scap canary: cache last good deploy time [puppet] - https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: Thcipriani)
[18:05:06] (CR) Mobrovac: [C: -1] "Here's what you do. Create modules/profile/manifests/parsoid/testing.pp and place the contents of the parsoid::testing role inside, with t" [puppet] - https://gerrit.wikimedia.org/r/403464 (owner: Arlolra)
[18:10:28] (CR) Steinsplitter: [C: 1] Add Draft namespace to the Nepali Wikipedia. [mediawiki-config] - https://gerrit.wikimedia.org/r/404067 (https://phabricator.wikimedia.org/T184157) (owner: Biplab Anand)
[18:20:19] Operations, Page Content Service, RESTBase, Reading-Infrastructure-Team-Backlog, and 3 others: Inconsistent behavior when fetching redirected pages with Cache-Control header - https://phabricator.wikimedia.org/T184833#3898780 (mobrovac) This is strange indeed. This part of the logic is handled by...
[18:46:24] Operations, DBA, Performance-Team, Availability (Multiple-active-datacenters): Perform testing for TLS effect on connection rate - https://phabricator.wikimedia.org/T171071#3898792 (jcrespo) These are my results with your script, just changing the query to run on a real table (heartbeat) and with...
[19:05:55] Operations, Packaging: rebuild php-wikidiff2 and php-luasandbox for php7 and stretch - https://phabricator.wikimedia.org/T184270#3898807 (ArielGlenn)
[19:19:41] Operations, Cloud-VPS, cloud-services-team (Kanban): 2018-01-02: labstore Tools and Misc share very full - https://phabricator.wikimedia.org/T183920#3898825 (Cyberpower678)
[19:19:44] Operations, Cloud-VPS, cloud-services-team (Kanban): tools.iabot is using 1.3T of 8T available tools nfs storage - https://phabricator.wikimedia.org/T183953#3898824 (Cyberpower678) Open>Resolved
[19:45:40] Operations, Page Content Service, RESTBase, Reading-Infrastructure-Team-Backlog, and 3 others: Inconsistent behavior when fetching redirected pages with Cache-Control header - https://phabricator.wikimedia.org/T184833#3898842 (Dbrant) In principle, this kind of variation wouldn't be problematic f...
[21:10:22] (CR) ArielGlenn: [C: 1] "Looks legit to me." [mediawiki-config] - https://gerrit.wikimedia.org/r/403984 (https://phabricator.wikimedia.org/T184664) (owner: Kaldari)
[21:23:42] (CR) ArielGlenn: [WIP] php7 manifests for mediawiki on stretch (2 comments) [puppet] - https://gerrit.wikimedia.org/r/394977 (owner: ArielGlenn)
[21:24:39] (PS17) ArielGlenn: [WIP] php7 manifests for mediawiki on stretch [puppet] - https://gerrit.wikimedia.org/r/394977
[22:04:17] Operations, Page Content Service, RESTBase, Reading-Infrastructure-Team-Backlog, and 3 others: Inconsistent behavior when fetching redirected pages with Cache-Control header - https://phabricator.wikimedia.org/T184833#3898911 (BBlack) See also T134464 . I think part of the problem is the VCL wor...
[22:09:37] Operations, DBA, Performance-Team, Availability (Multiple-active-datacenters): Perform testing for TLS effect on connection rate - https://phabricator.wikimedia.org/T171071#3898913 (jcrespo) If I use proxysql, pointing to db1031, we get better results than querying to a same dc, but remote host:...
[23:41:25] Operations, Patch-For-Review: Update people.wikimedia.org with the 2017 Wikimania hackathon group photo - https://phabricator.wikimedia.org/T184338#3898954 (Framawiki) >>! In T184338#3882341, @faidon wrote: > That's not the Wikimania 2017 Hackathon (which was in Montreal), but the 2017 Hackathon in Vienn...
[23:42:48] (PS3) Framawiki: Update group photo on people.wm.org [puppet] - https://gerrit.wikimedia.org/r/402583 (https://phabricator.wikimedia.org/T184338)
[23:43:03] Operations, Patch-For-Review: Update people.wikimedia.org with the 2017 Wikimedia hackathon group photo - https://phabricator.wikimedia.org/T184338#3898964 (Framawiki)