[00:00:04] twentyafterfour: Your horoscope predicts another unfortunate Phabricator update deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T0000). [01:09:54] (03PS4) 10Alex Monk: nrpe::monitor_service: Make notes_url optional for ensure=absent [puppet] - 10https://gerrit.wikimedia.org/r/529590 [01:33:58] (03CR) 10Bstorm: "Ok, so I made it into a table here: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Kuber" [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [01:36:08] 10Operations, 10Analytics, 10Fundraising-Backlog, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10Nuria) @jrobell Are the two analysts full timer or contractors? If contractors they would need an NDA o... [01:45:55] (03CR) 10Alex Monk: toolforge-k8s: proposed role for all tools (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [01:48:48] (03CR) 10Bstorm: toolforge-k8s: proposed role for all tools (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [01:49:53] (03CR) 10Bstorm: toolforge-k8s: proposed role for all tools (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [02:26:44] (03PS3) 10Nuria: Rsync analytics mediawiki history dumps to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/538312 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [02:37:03] RECOVERY - ElasticSearch shard size check - 9243 on search.svc.codfw.wmnet is OK: OK - All good! 
https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed [02:48:33] PROBLEM - MariaDB Slave Lag: s1 on db2048 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 752.02 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [02:51:45] RECOVERY - MariaDB Slave Lag: s1 on db2048 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [02:56:43] 10Operations, 10Traffic, 10Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (10matmarex) Well, since that apparently fixed the problem, we should probably consider a more general solution that would limit the maximum... [03:54:45] (03PS1) 10Jeena Huneidi: Update scaffold template names to use chart name [deployment-charts] - 10https://gerrit.wikimedia.org/r/539220 [04:07:54] (03PS2) 10Marostegui: wmnet: Update s4-master to point to db1138 [dns] - 10https://gerrit.wikimedia.org/r/538748 (https://phabricator.wikimedia.org/T230784) [04:07:59] (03CR) 10Marostegui: mariadb: Promote db1138 to master [puppet] - 10https://gerrit.wikimedia.org/r/538747 (https://phabricator.wikimedia.org/T230784) (owner: 10Marostegui) [04:08:06] (03PS2) 10Marostegui: mariadb: Promote db1138 to master [puppet] - 10https://gerrit.wikimedia.org/r/538747 (https://phabricator.wikimedia.org/T230784) [04:08:34] (03CR) 10Marostegui: wmnet: Update s4-master to point to db1138 [dns] - 10https://gerrit.wikimedia.org/r/538748 (https://phabricator.wikimedia.org/T230784) (owner: 10Marostegui) [04:10:42] !log Start pre-switchover s4 steps T230784 [04:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:10:46] T230784: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 [04:15:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Set db1138 
with weight 0 T230784', diff saved to https://phabricator.wikimedia.org/P9188 and previous config saved to /var/cache/conftool/dbconfig/20190926-041508-marostegui.json [04:15:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:16:45] 10Operations, 10DBA, 10Data-Services, 10Patch-For-Review: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543 (10Marostegui) a:03Marostegui [04:17:15] (03CR) 10Jcrespo: [C: 03+1] wmnet: Update s4-master to point to db1138 [dns] - 10https://gerrit.wikimedia.org/r/538748 (https://phabricator.wikimedia.org/T230784) (owner: 10Marostegui) [04:20:51] (03CR) 10Jcrespo: [C: 03+1] mariadb: Promote db1138 to master [puppet] - 10https://gerrit.wikimedia.org/r/538747 (https://phabricator.wikimedia.org/T230784) (owner: 10Marostegui) [04:22:06] (03PS4) 10Jcrespo: mysql: remove grants for sarin and neodymium [puppet] - 10https://gerrit.wikimedia.org/r/527043 (https://phabricator.wikimedia.org/T220503) (owner: 10Jbond) [04:23:07] (03Abandoned) 10Jcrespo: mysql: remove grants for sarin and neodymium [puppet] - 10https://gerrit.wikimedia.org/r/527043 (https://phabricator.wikimedia.org/T220503) (owner: 10Jbond) [04:24:04] (03PS2) 10Jeena Huneidi: Update scaffold template names to use chart name [deployment-charts] - 10https://gerrit.wikimedia.org/r/539220 [04:25:14] (03CR) 10Jcrespo: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/537132 (owner: 10Muehlenhoff) [04:31:01] (03CR) 10Jcrespo: "I think the extra device is probably the battery. 
I don't know how to check for a missing disk, because it is possible to purchase a host " [puppet] - 10https://gerrit.wikimedia.org/r/510139 (owner: 10Jbond) [04:35:40] (03CR) 10Marostegui: [C: 03+2] wmnet: Update s4-master to point to db1138 [dns] - 10https://gerrit.wikimedia.org/r/538748 (https://phabricator.wikimedia.org/T230784) (owner: 10Marostegui) [04:36:08] gah, wrong merge [04:36:13] anyways, I won't deploy dns yet [04:36:24] (03CR) 10Marostegui: [C: 03+2] mariadb: Promote db1138 to master [puppet] - 10https://gerrit.wikimedia.org/r/538747 (https://phabricator.wikimedia.org/T230784) (owner: 10Marostegui) [05:00:04] marostegui and jynus: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for s4 database master failover . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T0500). [05:00:06] jynus: ready? [05:00:28] yes [05:00:32] !log Starting s4 failover from db1081 to db1138 - T230784 [05:00:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:00:36] T230784: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 [05:00:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'Set s4 as read-only for maintenance T230784', diff saved to https://phabricator.wikimedia.org/P9189 and previous config saved to /var/cache/conftool/dbconfig/20190926-050050-marostegui.json [05:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:01:07] ro confirmed [05:01:10] "WARNING: The database has been locked for maintenance" [05:01:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T230784', diff saved to https://phabricator.wikimedia.org/P9190 and previous config saved to /var/cache/conftool/dbconfig/20190926-050140-marostegui.json [05:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:01:51] 
all done [05:02:01] I can edit fine [05:02:04] I can edit [05:02:25] I can see recentchanges going thru [05:02:27] topology looks fine [05:02:58] not a lot of errors [05:03:45] Wikibase SQL Usage tracker gave a few exceptions [05:03:54] yeah, not many [05:04:37] I wonder if that is reportable, as in- handle the exception even if it is to give an error [05:05:02] yeah, probably [05:05:06] no more errors ongoing [05:05:19] although commons tends to have a long tail for encodings of video [05:05:42] DBConnection looks clean [05:06:00] yeah [05:06:03] same for fatals [05:07:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Give some weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9191 and previous config saved to /var/cache/conftool/dbconfig/20190926-050722-marostegui.json [05:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:07:27] T230784: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 [05:09:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Give some API weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9192 and previous config saved to /var/cache/conftool/dbconfig/20190926-050937-marostegui.json [05:09:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:10:43] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/news (get In the News content) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [05:10:54] 10Operations, 10DBA: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 (10Marostegui) The switchover was done successfully Read-only start: 05:00:51 Read-only stop: 05:01:42 Total read-only time: 51 seconds [05:11:17] is that a new record? 
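Note on the read-only window reported above for T230784: the 51 seconds follow directly from the two timestamps in the Phabricator comment (read-only start 05:00:51, read-only stop 05:01:42). A minimal sketch of that arithmetic, assuming both times fall on the same UTC day; the variable names are illustrative and not part of any Wikimedia tooling.

```python
from datetime import datetime

# Timestamps copied from the switchover report in the log (UTC, same day).
read_only_start = datetime.strptime("05:00:51", "%H:%M:%S")
read_only_stop = datetime.strptime("05:01:42", "%H:%M:%S")

# Difference between the two commits, in whole seconds.
read_only_seconds = int((read_only_stop - read_only_start).total_seconds())
print(f"Total read-only time: {read_only_seconds} seconds")
```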
[05:12:13] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [05:13:17] apergos: no, we did s2 in 50 seconds :) [05:13:24] sooo close! [05:14:18] 10Operations, 10ops-eqiad, 10DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (10Marostegui) [05:19:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'Increase weight for db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9193 and previous config saved to /var/cache/conftool/dbconfig/20190926-051916-marostegui.json [05:19:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:19:24] T230784: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 [05:30:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully pool db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9194 and previous config saved to /var/cache/conftool/dbconfig/20190926-053029-marostegui.json [05:30:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:30:33] T230784: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 [05:30:57] 10Operations, 10DBA: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 (10Marostegui) 05Open→03Resolved [05:30:59] 10Operations, 10ops-eqiad, 10DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (10Marostegui) [05:46:27] (03PS5) 10Elukey: Remove Python 2 packages from Analytics Client nodes [puppet] - 10https://gerrit.wikimedia.org/r/538750 (https://phabricator.wikimedia.org/T204734) [05:49:11] (03PS2) 10Elukey: prometheus::node_puppet_agent: use Python3 and its deps [puppet] - 
10https://gerrit.wikimedia.org/r/539156 [05:53:19] (03CR) 10Elukey: [C: 03+2] prometheus::node_puppet_agent: use Python3 and its deps [puppet] - 10https://gerrit.wikimedia.org/r/539156 (owner: 10Elukey) [05:53:36] if you see anything weird for --^ let me know :) [05:53:42] I'll keep an eye on puppet errors etc.. [06:10:39] (03PS1) 10Elukey: prometheus::node_amd_rocm: use Python3 deps [puppet] - 10https://gerrit.wikimedia.org/r/539226 [06:11:21] (03CR) 10Elukey: [C: 03+2] prometheus::node_amd_rocm: use Python3 deps [puppet] - 10https://gerrit.wikimedia.org/r/539226 (owner: 10Elukey) [06:29:23] !log marostegui@cumin1001 dbctl commit (dc=all): ' Repool db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9195 and previous config saved to /var/cache/conftool/dbconfig/20190926-062922-marostegui.json [06:29:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:29:27] T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 [06:35:56] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9196 and previous config saved to /var/cache/conftool/dbconfig/20190926-063555-marostegui.json [06:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:36:00] T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 [06:39:52] !log Deploy schema change on db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625 [06:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:41:34] !log Sanitize nqowiki on db1124:3313 and db2094:3313 - T230543 [06:41:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:41:39] T230543: Prepare and check storage layer for nqowiki - 
https://phabricator.wikimedia.org/T230543 [06:43:53] 10Operations, 10DBA, 10Data-Services, 10Patch-For-Review: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543 (10Marostegui) I have sanitized both db1124:3313 and db2094:3313 (sanitariums). All the users have been sanitized correctly and my username was created correctl... [06:54:10] (03CR) 10Marostegui: [C: 03+1] Remove ferm rules for labpuppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/537132 (owner: 10Muehlenhoff) [06:55:24] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [06:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:55:35] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=False) [06:55:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:55:39] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1066.eqiad.wmnet` - db1066.eqiad.wmnet (**PASS**) - Downtimed host on Ic... 
[06:56:51] mmmh exit_code=False is weird, I'm having a look [06:57:01] (03PS1) 10Marostegui: site.pp: Remove all references to db1066 [puppet] - 10https://gerrit.wikimedia.org/r/539229 (https://phabricator.wikimedia.org/T233071) [06:57:28] (03PS2) 10Muehlenhoff: Remove ferm rules for labpuppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/537132 [06:57:32] volans: but there are no errors from the output actually [06:57:45] indeed, that's correct [06:58:07] False evaluates to 0 so it passes, and that's ok, but the cookbooks should return an int not a boolean [06:58:12] I've had exit_code=False for all my successful tests as well [06:58:22] (03PS1) 10Marostegui: wmnet: Remove db1066 production entries [dns] - 10https://gerrit.wikimedia.org/r/539230 (https://phabricator.wikimedia.org/T233071) [06:58:29] yes it's the have_failure, it's ok, fixing it anyway [06:58:37] (03CR) 10Marostegui: [C: 03+2] site.pp: Remove all references to db1066 [puppet] - 10https://gerrit.wikimedia.org/r/539229 (https://phabricator.wikimedia.org/T233071) (owner: 10Marostegui) [06:59:07] (03CR) 10Marostegui: [C: 03+2] wmnet: Remove db1066 production entries [dns] - 10https://gerrit.wikimedia.org/r/539230 (https://phabricator.wikimedia.org/T233071) (owner: 10Marostegui) [06:59:58] 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10Marostegui) [07:00:55] (03PS1) 10Volans: sre.hosts.decommission: return an int [cookbooks] - 10https://gerrit.wikimedia.org/r/539231 (https://phabricator.wikimedia.org/T231066) [07:00:57] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (10Marostegui) a:05RobH→03None [07:00:59] (03PS3) 10Muehlenhoff: Remove ferm rules for labpuppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/537132 [07:01:17] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1066.eqiad.wmnet - 
https://phabricator.wikimedia.org/T233071 (10Marostegui) Host ready for on-site steps (and switch disablement) [07:03:09] (03CR) 10Volans: [C: 03+2] sre.hosts.decommission: return an int [cookbooks] - 10https://gerrit.wikimedia.org/r/539231 (https://phabricator.wikimedia.org/T231066) (owner: 10Volans) [07:04:00] (03CR) 10Muehlenhoff: [C: 03+2] Remove ferm rules for labpuppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/537132 (owner: 10Muehlenhoff) [07:04:52] (03Merged) 10jenkins-bot: sre.hosts.decommission: return an int [cookbooks] - 10https://gerrit.wikimedia.org/r/539231 (https://phabricator.wikimedia.org/T231066) (owner: 10Volans) [07:06:25] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (10Marostegui) a:03Cmjohnson [07:07:25] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission db1072.eqiad.wmnet - https://phabricator.wikimedia.org/T228956 (10Marostegui) a:03Cmjohnson [07:07:35] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission db1069 - https://phabricator.wikimedia.org/T227166 (10Marostegui) a:03Cmjohnson [07:07:43] (03CR) 10Elukey: "> You can use debdeploy to check reverse dependencies, e.g." 
[puppet] - 10https://gerrit.wikimedia.org/r/538750 (https://phabricator.wikimedia.org/T204734) (owner: 10Elukey) [07:07:47] 10Operations, 10ops-eqiad, 10decommission: Decommission db1064 - https://phabricator.wikimedia.org/T223217 (10Marostegui) a:03Cmjohnson [07:08:11] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission db1068 - https://phabricator.wikimedia.org/T226689 (10Marostegui) a:03Cmjohnson [07:08:23] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission db1065 - https://phabricator.wikimedia.org/T227560 (10Marostegui) a:03Cmjohnson [07:09:55] !log Stop mysql on db1114 for mainboard replacement - T229452 [07:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:09:59] T229452: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 [07:10:18] 10Operations, 10ops-eqiad, 10decommission: Decommission neodymium - https://phabricator.wikimedia.org/T220503 (10MoritzMuehlenhoff) a:05RobH→03Cmjohnson [07:10:40] !log Power off db1114 for mainboard replacement T229452 [07:10:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:11:20] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10Marostegui) @Cmjohnson db1114 is now off, so the mainboard can be replaced anytime. [07:13:19] (03PS1) 10Jcrespo: home: Add zarcillo alias to jynus user [puppet] - 10https://gerrit.wikimedia.org/r/539238 [07:19:21] 10Operations, 10MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), 10Patch-For-Review, 10User-Ladsgroup, and 2 others: Create Wikisource Hindi - https://phabricator.wikimedia.org/T218155 (10Dcljr) >>! In T218155#5507289, @MF-Warburg wrote: > I have already planned to do this. Is there any technical issue preventi... 
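Note on the exit_code=False exchange (06:56 to 07:04 above): the point made there is that False compares equal to 0 in Python, so the run still counts as a pass, but the boolean leaks into the logged message. A minimal sketch of that behaviour, assuming a hypothetical `format_end_message` helper; this is not the actual cookbook code, only an illustration of why returning an int changes the logged text.

```python
# bool is a subclass of int in Python, so False compares equal to 0.
assert isinstance(False, int)
assert False == 0


def format_end_message(exit_code) -> str:
    """Hypothetical helper: build an END line similar to the ones in the log."""
    status = "PASS" if exit_code == 0 else "FAIL"
    return f"END ({status}) - exit_code={exit_code}"


# Hypothetical flag name, borrowed from the "have_failure" mention in the chat.
have_failure = False

# Returning the boolean directly still passes, but prints "exit_code=False".
print(format_end_message(have_failure))

# Converting to int keeps the same pass/fail result and prints "exit_code=0".
print(format_end_message(int(have_failure)))
```

The later run at 07:47:24 shows exit_code=0, which matches the second form.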
[07:29:20] 10Operations, 10serviceops, 10Core Platform Team (Needs Cleaning - Services Operations): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10elukey) [07:31:01] 10Operations, 10Phabricator, 10Traffic, 10Release-Engineering-Team (Development services), and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10mmodell) 👍 [07:35:56] (03CR) 10Marostegui: [C: 03+1] home: Add zarcillo alias to jynus user [puppet] - 10https://gerrit.wikimedia.org/r/539238 (owner: 10Jcrespo) [07:40:31] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp5003 [puppet] - 10https://gerrit.wikimedia.org/r/539264 (https://phabricator.wikimedia.org/T231433) [07:40:33] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp5003 [puppet] - 10https://gerrit.wikimedia.org/r/539265 (https://phabricator.wikimedia.org/T231433) [07:41:09] !log switching from nginx to ats-tls on cp5003 - T231433 [07:41:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:13] T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 [07:41:31] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to 4443 on cp5003 [puppet] - 10https://gerrit.wikimedia.org/r/539264 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [07:42:50] !log draining ganeti2001 for upcoming reboot (combined kernel/qemu security updates) [07:42:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:43:07] (03PS4) 10Volans: Deploy homer [puppet] - 10https://gerrit.wikimedia.org/r/534538 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [07:43:41] (03CR) 10Volans: [C: 03+1] "LGTM, I'm not sure if /srv/homer/private should be declared here or not. Merging and seeing what tweaks we need." 
[puppet] - 10https://gerrit.wikimedia.org/r/534538 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [07:44:07] 10Operations, 10Release Pipeline, 10serviceops, 10CPT Initiatives (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (10mobrovac) 05Resolved→03Open Reopening as there are two more things we have to do befor... [07:44:10] 10Operations, 10Release Pipeline, 10serviceops, 10Goal, 10Release-Engineering-Team (Pipeline): Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (10mobrovac) [07:44:18] 10Operations, 10Release Pipeline, 10Release-Engineering-Team-TODO, 10Core Platform Team Legacy (Watching / External), and 3 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10mobrovac) [07:45:14] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to 443 on cp5003 [puppet] - 10https://gerrit.wikimedia.org/r/539265 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [07:45:44] vgutierrez: I'm taking the number for next merge [07:46:01] go ahead volans <3 [07:46:18] (03PS5) 10Volans: Deploy homer [puppet] - 10https://gerrit.wikimedia.org/r/534538 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [07:46:25] hmm my puppet-merge is still running though [07:46:29] is pretty slow nowadays :( [07:47:01] (done) [07:47:03] no prob I'll wait [07:47:10] I'm waiting for jenkins anyway [07:47:14] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [07:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:47:24] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [07:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:47:27] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1063.eqiad.wmnet - 
https://phabricator.wikimedia.org/T232564 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1063.eqiad.wmnet` - db1063.eqiad.wmnet (**PASS**) - Downtimed host on Ic... [07:47:34] marostegui: better now :) (exit_code=0) [07:47:40] volans: yeah! :) [07:48:22] PROBLEM - HTTPS Unified ECDSA on cp5003 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [07:48:26] ^^ expected :) [07:48:27] expected :D [07:48:30] lol [07:48:31] ahahaha [07:48:46] (03PS1) 10Marostegui: site.pp: Remove references to db1063 [puppet] - 10https://gerrit.wikimedia.org/r/539266 (https://phabricator.wikimedia.org/T232564) [07:49:23] (03PS1) 10Marostegui: wmnet: Remove production entries for db1063 [dns] - 10https://gerrit.wikimedia.org/r/539267 (https://phabricator.wikimedia.org/T232564) [07:49:48] (03CR) 10Marostegui: [C: 03+2] site.pp: Remove references to db1063 [puppet] - 10https://gerrit.wikimedia.org/r/539266 (https://phabricator.wikimedia.org/T232564) (owner: 10Marostegui) [07:49:53] (03CR) 10Volans: [C: 03+2] Deploy homer [puppet] - 10https://gerrit.wikimedia.org/r/534538 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [07:49:54] !log swift eqiad-prod: continue ms-be1027 decom - T233289 [07:49:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:49:57] T233289: Unable to power on ms-be1027 - https://phabricator.wikimedia.org/T233289 [07:49:58] RECOVERY - HTTPS Unified ECDSA on cp5003 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345526 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 57 days) https://wikitech.wikimedia.org/wiki/HTTPS [07:49:58] (03CR) 10Marostegui: [C: 03+2] wmnet: Remove production entries for db1063 [dns] - 10https://gerrit.wikimedia.org/r/539267 
(https://phabricator.wikimedia.org/T232564) (owner: 10Marostegui) [07:50:00] marostegui: I was in the queue [07:50:01] :( [07:50:14] volans: oh, sorry, missed it :( [07:50:18] no prob [07:50:20] go ahead [07:50:23] spaniards always skipping the queu [07:50:24] :(+ [07:50:25] *queue [07:50:27] just the 4 rebase in a row [07:50:46] * vgutierrez grants volans the title "Master of rebasing" [07:50:54] not even 10am and we already have a queue :( [07:51:03] almost 4pm dude [07:51:07] peak hours [07:51:10] ;P [07:51:22] marostegui: lmk once puppet-merged [07:51:23] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1063.eqiad.wmnet - https://phabricator.wikimedia.org/T232564 (10Marostegui) a:05RobH→03Cmjohnson [07:51:42] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission db1063.eqiad.wmnet - https://phabricator.wikimedia.org/T232564 (10Marostegui) Ready for on-site steps + switch disablement [07:51:44] volans: done! [07:51:57] (03PS6) 10Volans: Deploy homer [puppet] - 10https://gerrit.wikimedia.org/r/534538 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [07:51:59] volans: let me know when you've finished to not collide with you [07:52:06] vgutierrez: ack [07:55:01] 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) [07:55:55] (03PS1) 10Marostegui: db1078: Change binlog format to ROW [puppet] - 10https://gerrit.wikimedia.org/r/539268 (https://phabricator.wikimedia.org/T233569) [07:56:04] * vgutierrez waiting.... 
[07:56:08] vgutierrez: all yours [07:56:13] <3 [07:56:13] all too slow [07:56:23] jenkins and puppet-merge [07:56:58] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp4023 [puppet] - 10https://gerrit.wikimedia.org/r/539269 (https://phabricator.wikimedia.org/T231433) [07:57:00] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp4023 [puppet] - 10https://gerrit.wikimedia.org/r/539270 (https://phabricator.wikimedia.org/T231433) [07:57:06] !log switching from nginx to ats-tls on cp4023 - T231433 [07:57:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:57:10] T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 [07:57:24] (03PS2) 10Jcrespo: home: Add zarcillo alias to jynus user [puppet] - 10https://gerrit.wikimedia.org/r/539238 [07:57:26] (03PS2) 10Marostegui: db1078: Change binlog format to ROW [puppet] - 10https://gerrit.wikimedia.org/r/539268 (https://phabricator.wikimedia.org/T233569) [07:58:04] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to 4443 on cp4023 [puppet] - 10https://gerrit.wikimedia.org/r/539269 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [07:59:15] (03PS3) 10Marostegui: db1078: Change binlog format to ROW [puppet] - 10https://gerrit.wikimedia.org/r/539268 (https://phabricator.wikimedia.org/T233569) [08:00:03] (03CR) 10Marostegui: [C: 03+2] db1078: Change binlog format to ROW [puppet] - 10https://gerrit.wikimedia.org/r/539268 (https://phabricator.wikimedia.org/T233569) (owner: 10Marostegui) [08:00:25] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to 443 on cp4023 [puppet] - 10https://gerrit.wikimedia.org/r/539270 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [08:00:46] (03PS2) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp4023 [puppet] - 10https://gerrit.wikimedia.org/r/539270 (https://phabricator.wikimedia.org/T231433) [08:02:00] !log Depool 
db1078 to restart mysql to change its binlog format to ROW [08:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:02:41] (03PS1) 10Volans: scap: add dsh group for homer [puppet] - 10https://gerrit.wikimedia.org/r/539271 (https://phabricator.wikimedia.org/T228388) [08:02:45] vgutierrez: I've one more :) [08:03:01] go ahead [08:03:16] PROBLEM - HTTPS Unified RSA on cp4023 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [08:03:21] expected :) [08:03:31] ^expected [08:03:34] :p [08:03:48] PROBLEM - HTTPS Unified ECDSA on cp4023 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [08:03:56] nice, I can get all of you switching cp servers to ats-tls now [08:03:57] O:) [08:03:58] (03PS2) 10Volans: scap: add dsh group for homer [puppet] - 10https://gerrit.wikimedia.org/r/539271 (https://phabricator.wikimedia.org/T228388) [08:03:59] vgutierrez: ack, thx [08:04:27] (03PS3) 10Volans: scap: add dsh group for homer [puppet] - 10https://gerrit.wikimedia.org/r/539271 (https://phabricator.wikimedia.org/T228388) [08:04:32] RECOVERY - HTTPS Unified RSA on cp4023 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345588 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:04:37] (03CR) 10Volans: [C: 03+2] scap: add dsh group for homer [puppet] - 10https://gerrit.wikimedia.org/r/539271 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [08:04:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1078 to change binlog format', diff saved to https://phabricator.wikimedia.org/P9197 and previous config saved to /var/cache/conftool/dbconfig/20190926-080442-marostegui.json [08:04:45] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:05:06] RECOVERY - HTTPS Unified ECDSA on cp4023 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345554 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:05:09] (03CR) 10Hashar: "profile::elastalert was applied on deployment-logstash2.deployment-prep.eqiad.wmflabs but the class was not existing causing puppet to br" [puppet] - 10https://gerrit.wikimedia.org/r/505762 (https://phabricator.wikimedia.org/T213933) (owner: 10Filippo Giunchedi) [08:07:13] !log executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - T217057 [08:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:17] T217057: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 [08:08:45] vgutierrez: all back to you (for now) [08:09:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9198 and previous config saved to /var/cache/conftool/dbconfig/20190926-080949-marostegui.json [08:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:10:15] (03CR) 10Jcrespo: [C: 03+2] home: Add zarcillo alias to jynus user [puppet] - 10https://gerrit.wikimedia.org/r/539238 (owner: 10Jcrespo) [08:10:46] (03PS3) 10Jcrespo: home: Add zarcillo alias to jynus user [puppet] - 10https://gerrit.wikimedia.org/r/539238 [08:11:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9199 and previous config saved to /var/cache/conftool/dbconfig/20190926-081144-marostegui.json [08:11:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:12:36] (03PS1) 
10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp3036 [puppet] - 10https://gerrit.wikimedia.org/r/539272 (https://phabricator.wikimedia.org/T231433) [08:12:38] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp3036 [puppet] - 10https://gerrit.wikimedia.org/r/539273 (https://phabricator.wikimedia.org/T231433) [08:12:59] 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) [08:13:30] !log switching from nginx to ats-tls on cp3036 - T231433 [08:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:13:34] T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 [08:13:48] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to 4443 on cp3036 [puppet] - 10https://gerrit.wikimedia.org/r/539272 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [08:14:06] (03PS2) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp3036 [puppet] - 10https://gerrit.wikimedia.org/r/539272 (https://phabricator.wikimedia.org/T231433) [08:15:13] PROBLEM - Keyholder SSH agent on cumin2001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder [08:15:30] hmmm cumin2001 got a reboot? 
[08:16:33] that's me [08:16:37] new identity [08:16:38] last mention on SAL it's from yesterday [08:16:39] ack [08:16:43] but I have small issue [08:16:53] will alert for 1001 too, we'll need to wait for arzhel [08:16:55] I'll ack those [08:16:58] uh, ok [08:18:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9200 and previous config saved to /var/cache/conftool/dbconfig/20190926-081759-marostegui.json [08:18:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:18:36] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to 443 on cp3036 [puppet] - 10https://gerrit.wikimedia.org/r/539273 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [08:18:45] (03PS2) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp3036 [puppet] - 10https://gerrit.wikimedia.org/r/539273 (https://phabricator.wikimedia.org/T231433) [08:22:29] PROBLEM - HTTPS Unified ECDSA on cp3036 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [08:22:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Change special weights from 1 to 100 - T231018', diff saved to https://phabricator.wikimedia.org/P9201 and previous config saved to /var/cache/conftool/dbconfig/20190926-082233-marostegui.json [08:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:38] T231018: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 [08:25:23] RECOVERY - HTTPS Unified ECDSA on cp3036 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345516 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:25:29] !log jmm@cumin2001 START - Cookbook 
sre.hosts.downtime [08:25:30] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [08:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:25:45] hallo [08:26:05] Urbanecm - thanks for the help with N'Ko. Is the Incubator going to be imported soon? [08:26:21] ACKNOWLEDGEMENT - Keyholder SSH agent on cumin2001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. Volans Missing new key for homer, I dont have access to the passphrase, need to wait for Arzhel https://wikitech.wikimedia.org/wiki/Keyholder [08:27:31] PROBLEM - ats-tls HTTPS en.wikipedia.org RSA on cp3036 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [08:28:01] PROBLEM - ats-tls HTTPS en.wikipedia.org ECDSA on cp3036 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [08:28:01] PROBLEM - Ensure traffic_manager binds on 8443 and responds to HTTP requests on cp3036 is CRITICAL: connect to address 10.20.0.171 and port 8443: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [08:31:58] RECOVERY - ats-tls HTTPS en.wikipedia.org RSA on cp3036 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345122 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:32:42] RECOVERY - ats-tls HTTPS en.wikipedia.org ECDSA on cp3036 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345078 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:39:39] (03CR) 
10Masumrezarock100: "@Urbanecm There is a higher resolution PNG in the phabtask description. You might want to use that one instead." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [08:41:25] (03CR) 10Urbanecm: "> Patch Set 4: -Code-Review" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [08:42:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9202 and previous config saved to /var/cache/conftool/dbconfig/20190926-084159-marostegui.json [08:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:22] aharoni: usually, we don't handle wiki imports ourselves, see https://incubator.wikimedia.org/wiki/Incubator:Site_creation_log#2019 and people listed there for recent imports [08:42:28] 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) [08:43:05] !log mobrovac@deploy1001 Started deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833 [08:43:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:43:09] T233833: Add nqowiki to restbase - https://phabricator.wikimedia.org/T233833 [08:43:38] (03PS5) 10Urbanecm: Add wgMinervaCustomLogos for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) [08:45:14] Urbanecm - that's more or less what I thought, but how are they notified? Does the wiki creator notify them? Or do they somehow find out? :) [08:46:16] aharoni: we have a mailinglist newprojects, see https://lists.wikimedia.org/mailman/listinfo/newprojects. 
The wiki creating script automatically sends an email to that list [08:47:06] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp2008 [puppet] - 10https://gerrit.wikimedia.org/r/539277 (https://phabricator.wikimedia.org/T231433) [08:47:08] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp2008 [puppet] - 10https://gerrit.wikimedia.org/r/539278 (https://phabricator.wikimedia.org/T231433) [08:47:31] !log switching from nginx to ats-tls on cp2008 - T231433 [08:47:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:47:34] T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 [08:48:27] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to 4443 on cp2008 [puppet] - 10https://gerrit.wikimedia.org/r/539277 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [08:49:26] (03PS1) 10Volans: Update submodule reference and rebuild artifacts [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539279 [08:50:07] (03CR) 10Volans: [V: 03+2 C: 03+2] "Merging to unblock deployment" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539279 (owner: 10Volans) [08:51:56] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to 443 on cp2008 [puppet] - 10https://gerrit.wikimedia.org/r/539278 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [08:54:51] hashar: you around by any chance? 
[08:57:42] volans: yes in a minute ;) [08:57:48] ack [08:58:03] 10Operations, 10ops-codfw: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10MoritzMuehlenhoff) [08:58:47] volans: so yeah I am here :] [08:58:57] I'll query you [08:59:35] (03PS1) 10Mobrovac: RESTRouter: Skip resources on start-up and add nqo.wp.org [deployment-charts] - 10https://gerrit.wikimedia.org/r/539280 (https://phabricator.wikimedia.org/T223953) [09:02:06] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [09:02:07] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [09:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:29] (03CR) 10Filippo Giunchedi: "Thanks for reviving this! I don't have the bandwidth to followup but in short the issue to tackle is: in production swift block devices ar" [puppet] - 10https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: 10Filippo Giunchedi) [09:03:14] 10Operations, 10MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), 10Patch-For-Review, 10User-Ladsgroup, and 2 others: Create Wikisource Hindi - https://phabricator.wikimedia.org/T218155 (10MF-Warburg) We are waiting for T233365 to be resolved. 
[09:03:31] (03CR) 10Filippo Giunchedi: [C: 03+2] Decom ms-be1027 [puppet] - 10https://gerrit.wikimedia.org/r/539136 (https://phabricator.wikimedia.org/T233289) (owner: 10Filippo Giunchedi) [09:03:41] (03PS2) 10Filippo Giunchedi: Decom ms-be1027 [puppet] - 10https://gerrit.wikimedia.org/r/539136 (https://phabricator.wikimedia.org/T233289) [09:04:37] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833 (duration: 21m 32s) [09:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:04:41] T233833: Add nqowiki to restbase - https://phabricator.wikimedia.org/T233833 [09:11:16] 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) [09:13:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P9203 and previous config saved to /var/cache/conftool/dbconfig/20190926-091348-marostegui.json [09:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:17:46] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp1080 [puppet] - 10https://gerrit.wikimedia.org/r/539282 (https://phabricator.wikimedia.org/T231433) [09:17:48] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp1080 [puppet] - 10https://gerrit.wikimedia.org/r/539283 (https://phabricator.wikimedia.org/T231433) [09:18:10] !log switching from nginx to ats-tls on cp1080 - T231433 [09:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:14] T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 [09:18:23] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [09:18:32] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to 4443 on cp1080 [puppet] - 10https://gerrit.wikimedia.org/r/539282 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [09:19:25] who's got the unmerged puppet change? [09:19:42] godog: I can merge your change? [09:19:48] *can I... [09:20:06] "Decom ms-be1027 (b5f06df828)" [09:20:25] whoops, yes vgutierrez please merge away [09:20:28] totally forgot [09:20:29] ack [09:20:35] (merging) [09:20:39] thanks [09:20:58] I hit 'puppet-merge' but didn't punch in 'yes' )o) [09:21:15] * apergos gets back off the host (I had looked and answered 'no' :-P) [09:21:24] 10Operations, 10observability, 10HHVM: Monitor HHVM bytecode cache depletion on mediawiki app servers - https://phabricator.wikimedia.org/T161598 (10MoritzMuehlenhoff) 05Open→03Declined No longer needed [09:21:29] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [09:22:55] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to 443 on cp1080 [puppet] - 10https://gerrit.wikimedia.org/r/539283 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez) [09:23:25] 10Operations, 10Citoid, 10RESTBase, 10RESTBase-API, and 5 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646 (10Mvolz) [09:26:12] (03PS1) 10Volans: homer: add missing dependency virtualenv [puppet] - 10https://gerrit.wikimedia.org/r/539284 (https://phabricator.wikimedia.org/T228388) [09:27:42] vgutierrez: lmk when done ;) [09:27:49] already done [09:27:57] (03PS2) 10Volans: homer: add missing dependency virtualenv [puppet] - 10https://gerrit.wikimedia.org/r/539284 (https://phabricator.wikimedia.org/T228388) [09:28:04] nice, thx [09:28:07] * volans taking the spot [09:28:11] *slot [09:29:09] (03PS6) 10Mathew.onipe: query_service: rename profile/wdqs to profile/query_service [puppet] - 10https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297) [09:29:11] (03PS1) 10Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - 10https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297) [09:30:26] (03CR) 10Volans: [C: 03+2] homer: add missing dependency virtualenv [puppet] - 10https://gerrit.wikimedia.org/r/539284 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [09:37:47] 10Operations, 10MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), 10Patch-For-Review, 10User-Ladsgroup, and 2 others: Create Wikisource Hindi - https://phabricator.wikimedia.org/T218155 (10Urbanecm) >>! In T218155#5525154, @MF-Warburg wrote: > We are waiting for T233365 to be resolved. Already was done, just no... [09:39:33] (03CR) 10Masumrezarock100: [C: 03+1] "Looks good to me." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [09:40:31] 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) [09:41:36] PROBLEM - Keyholder SSH agent on cumin1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder [09:42:18] expected, acking [09:43:50] ACKNOWLEDGEMENT - Keyholder SSH agent on cumin1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. Volans Missing new key for homer, I dont have access to the passphrase, need to wait for Arzhel https://wikitech.wikimedia.org/wiki/Keyholder [09:50:44] 10Puppet, 10Patch-For-Review: Analyses octocatalog-diff output - https://phabricator.wikimedia.org/T233203 (10jbond) 05Open→03Resolved [09:50:47] 10Puppet, 10Patch-For-Review: Upgrade Puppet Masters and Puppet DB servers - https://phabricator.wikimedia.org/T228657 (10jbond) [09:51:46] 10Operations, 10Puppet: Rebuild puppet master backends - https://phabricator.wikimedia.org/T233915 (10jbond) [09:51:54] 10Operations, 10Puppet: Rebuild puppet master backends - https://phabricator.wikimedia.org/T233915 (10jbond) p:05Triage→03Normal [09:52:53] (03PS1) 10Jbond: puppetmaster1002: offline puppet master ready for rebuild [puppet] - 10https://gerrit.wikimedia.org/r/539287 (https://phabricator.wikimedia.org/T233915) [09:54:42] 10Operations, 10observability, 10serviceops: Errors managed by wmf-errors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (10Joe) [09:55:02] (03CR) 10Filippo Giunchedi: "I don't feel strongly either way although the custom script feels better. 
Adding Moritz and John to see what they think" [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [09:55:21] !log bouncing postgres on puppetdb1002/2002 [09:55:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:44] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539287 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [09:56:21] (03CR) 10Jbond: [C: 03+2] puppetmaster1002: offline puppet master ready for rebuild [puppet] - 10https://gerrit.wikimedia.org/r/539287 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [10:01:16] (03PS6) 10Elukey: Remove Python 2 packages from Analytics Client nodes [puppet] - 10https://gerrit.wikimedia.org/r/538750 (https://phabricator.wikimedia.org/T204734) [10:02:13] (03PS1) 10Jbond: puppetmaster1002: switch pxe boot to buster installer [puppet] - 10https://gerrit.wikimedia.org/r/539289 (https://phabricator.wikimedia.org/T233915) [10:06:19] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539289 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [10:06:28] (03CR) 10Filippo Giunchedi: "AFAIK this patch will be superseded by deploying ferm 2.4.2+ which DTRT for A+AAAA" [puppet] - 10https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T153468) (owner: 10Hashar) [10:07:24] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/538750 (https://phabricator.wikimedia.org/T204734) (owner: 10Elukey) [10:10:10] RECOVERY - Check systemd state on debmonitor1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:12:28] (03PS2) 1020after4: Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 [10:13:05] (03PS1) 10Volans: Use same approch of more recent repos [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539292 [10:13:09] (03PS1) 10Volans: 
Rename submodule to src [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539293 [10:13:11] (03PS1) 10Volans: Rebuilt artifacts with the new code [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539294 [10:13:13] (03CR) 10Filippo Giunchedi: [C: 03+2] nrpe::monitor_service: Make notes_url optional for ensure=absent [puppet] - 10https://gerrit.wikimedia.org/r/529590 (owner: 10Alex Monk) [10:13:19] (03PS5) 10Filippo Giunchedi: nrpe::monitor_service: Make notes_url optional for ensure=absent [puppet] - 10https://gerrit.wikimedia.org/r/529590 (owner: 10Alex Monk) [10:13:21] (03CR) 10jerkins-bot: [V: 04-1] Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [10:14:07] 10Operations, 10Release Pipeline, 10serviceops, 10CPT Initiatives (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (10akosiaris) > * set up the rate-limiting DHT inside k8s for RESTRouter (this is currently d... 
[10:14:33] (03PS3) 1020after4: Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 [10:14:37] (03CR) 10Jbond: [C: 03+2] puppetmaster1002: switch pxe boot to buster installer [puppet] - 10https://gerrit.wikimedia.org/r/539289 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [10:17:37] (03PS2) 10Volans: Use same approch of more recent repos [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539292 [10:17:39] (03PS2) 10Volans: Rename submodule to src [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539293 [10:23:00] (03PS1) 10Volans: Add git_upstream_submodules=True [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539296 [10:23:28] (03CR) 10Volans: [V: 03+2 C: 03+2] "As suggested by scap" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539296 (owner: 10Volans) [10:30:57] (03Abandoned) 10MarcoAurelio: profile::mediawiki::maintenance: purge_checkuser to use periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/529077 (owner: 10MarcoAurelio) [10:41:01] 10Operations: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (10MoritzMuehlenhoff) [10:41:18] (03CR) 10Filippo Giunchedi: "Thanks for the script! 
See inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [10:41:30] 10Operations, 10Traffic, 10Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (10Vgutierrez) [10:46:33] (03Abandoned) 10Arturo Borrero Gonzalez: toolforge: k8s: ingress: nginx-ingress listen on 8082/tcp [puppet] - 10https://gerrit.wikimedia.org/r/527541 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [10:46:43] (03Abandoned) 10Arturo Borrero Gonzalez: toolforge: k8s: ingress: add frontend service [puppet] - 10https://gerrit.wikimedia.org/r/527542 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [10:47:34] (03PS2) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp5007 [puppet] - 10https://gerrit.wikimedia.org/r/537994 (https://phabricator.wikimedia.org/T231627) [10:47:36] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp5007 [puppet] - 10https://gerrit.wikimedia.org/r/539300 (https://phabricator.wikimedia.org/T231627) [10:48:11] !log switching from nginx to ats-tls on cp5007 - T231627 [10:48:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:15] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 [10:50:18] (03PS1) 10Hashar: contint: add puppetmaster CA cert [puppet] - 10https://gerrit.wikimedia.org/r/539301 (https://phabricator.wikimedia.org/T152941) [10:50:34] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to 4443 on cp5007 [puppet] - 10https://gerrit.wikimedia.org/r/537994 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [10:53:22] !log reimaging puppetmaster1002 to buster [10:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:42] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to 443 on cp5007 [puppet] - 10https://gerrit.wikimedia.org/r/539300 
(https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [10:55:28] 10Operations, 10Puppet, 10Patch-For-Review: Rebuild puppet master backends - https://phabricator.wikimedia.org/T233915 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts: ` ['puppetmaster1002.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reima... [10:57:22] PROBLEM - HTTPS Unified ECDSA on cp5007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS [10:57:30] expected :) [10:58:34] jouncebot: next [10:58:35] In 0 hour(s) and 1 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T1100) [10:58:58] RECOVERY - HTTPS Unified ECDSA on cp5007 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345543 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/HTTPS [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T1100). [11:00:04] isaacj: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:13] here [11:00:13] I can SWAT today! [11:00:21] thanks Urbanecm ! [11:00:23] Hi isaacj [11:00:32] (03CR) 10Urbanecm: [C: 03+2] Enable reader demographic surveys in English, Polish, and Russian. With proper links now. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/539183 (https://phabricator.wikimedia.org/T232525) (owner: 10Isaac Johnson) [11:02:19] (03Merged) 10jenkins-bot: Enable reader demographic surveys in English, Polish, and Russian. With proper links now. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539183 (https://phabricator.wikimedia.org/T232525) (owner: 10Isaac Johnson) [11:02:37] (03CR) 10jenkins-bot: Enable reader demographic surveys in English, Polish, and Russian. With proper links now. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539183 (https://phabricator.wikimedia.org/T232525) (owner: 10Isaac Johnson) [11:04:03] isaacj: your patch is at mwdebug1002, can you check, please? [11:04:14] yep, thanks -- three surveys so will likely just take me a few minutes to make sure that everything on them looks right [11:04:17] (03CR) 10Urbanecm: [C: 03+2] Add wgMinervaCustomLogos for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [11:04:23] (03PS6) 10Urbanecm: Add wgMinervaCustomLogos for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) [11:04:31] (03CR) 10Urbanecm: [C: 03+2] Add wgMinervaCustomLogos for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [11:04:49] sure, that's fine isaacj :) [11:05:28] (03Merged) 10jenkins-bot: Add wgMinervaCustomLogos for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [11:05:44] (03CR) 10jenkins-bot: Add wgMinervaCustomLogos for szlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539052 (https://phabricator.wikimedia.org/T233104) (owner: 10Urbanecm) [11:07:43] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [11:07:47] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:48] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.png: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 1/3) (duration: 01m 08s) [11:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:52] T233104: Add localized Wikipedia wordmark to the Silesian (szl) mobile frontend - https://phabricator.wikimedia.org/T233104 [11:08:06] ok Urbanecm I'm happy with how they look. let's proceed! [11:08:15] cool, thanks isaacj ! [11:08:56] 10Operations, 10Traffic, 10Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (10Vgutierrez) [11:09:56] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [11:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:46] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 7645e55: Enable reader demographic surveys in English, Polish, and Russian (T232525) (duration: 01m 06s) [11:10:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:52] T232525: Repeat demographics surveys for longer time period - https://phabricator.wikimedia.org/T232525 [11:11:04] isaacj: should be synced! [11:12:14] great! 
i'll check to make sure but thanks [11:12:27] isaacj: yw [11:13:06] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 2/3) (duration: 01m 05s) [11:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:11] T233104: Add localized Wikipedia wordmark to the Silesian (szl) mobile frontend - https://phabricator.wikimedia.org/T233104 [11:14:21] !log Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104) [11:14:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:26] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 3/3) (duration: 01m 05s) [11:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:59] and yep, all things look good! [11:20:49] great isaacj ! [11:23:09] !log EU SWAT done [11:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:29] 10Puppet, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (10hashar) 05Resolved→03Open For the `integration` project I have set `profile::base::certificates:... 
[11:25:22] (03CR) 10Mobrovac: [C: 03+2] RESTRouter: Skip resources on start-up and add nqo.wp.org [deployment-charts] - 10https://gerrit.wikimedia.org/r/539280 (https://phabricator.wikimedia.org/T223953) (owner: 10Mobrovac) [11:25:36] (03Merged) 10jenkins-bot: RESTRouter: Skip resources on start-up and add nqo.wp.org [deployment-charts] - 10https://gerrit.wikimedia.org/r/539280 (https://phabricator.wikimedia.org/T223953) (owner: 10Mobrovac) [11:29:30] 10Puppet, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (10hashar) And I fix it on the puppet agent via `rm -fR /var/lib/puppet/ssl/` which then gives me the... [11:31:02] !log mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='Nederlandse Leeuw' /home/urbanecm/T233922 (T233922) [11:31:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:07] T233922: Server side upload for Nederlandse Leeuw - https://phabricator.wikimedia.org/T233922 [11:38:34] (03CR) 10Mobrovac: [C: 04-1] "I left a first round of comments, more will likely be needed :)" (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T224935) (owner: 10Jeena Huneidi) [11:39:52] (03CR) 10Mobrovac: [C: 04-1] Add restbase chart (port from local-charts) (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T224935) (owner: 10Jeena Huneidi) [11:42:55] (03PS1) 10Vgutierrez: fifo-log-tailer: Retry on errors [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/539312 [11:44:15] (03PS2) 10Vgutierrez: fifo-log-tailer: Retry on errors [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/539312 [11:53:18] (03PS1) 10Jbond: puppetmaster1002: first add master back as a canary [puppet] - 10https://gerrit.wikimedia.org/r/539316 
(https://phabricator.wikimedia.org/T233915) [11:53:20] (03PS1) 10Jbond: puppetmaster1002: promote puppetmaster1002 [puppet] - 10https://gerrit.wikimedia.org/r/539317 (https://phabricator.wikimedia.org/T233915) [11:55:51] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539316 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [11:56:17] (03CR) 10Muehlenhoff: "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539317 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [11:58:28] (03CR) 10Jbond: [C: 03+2] puppetmaster1002: first add master back as a canary [puppet] - 10https://gerrit.wikimedia.org/r/539316 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [12:06:06] (03CR) 10Jbond: [C: 03+2] puppetmaster1002: promote puppetmaster1002 [puppet] - 10https://gerrit.wikimedia.org/r/539317 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [12:06:11] (03PS1) 10Marostegui: sX-pager.sql: Remove partitioning from logging table [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) [12:06:49] (03PS2) 10Marostegui: sX-pager.sql: Remove partitioning from logging table [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) [12:07:52] (03PS3) 10Marostegui: sX-pager.sql: Remove partitioning from logging table [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) [12:08:59] (03PS4) 10Marostegui: sX-pager.sql: Remove partitioning from logging table [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) [12:15:26] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 27452 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [12:18:19] (03PS1) 10Jbond: puppetmaster2002: Offline 
puppetmaster2002 to upgrade [puppet] - 10https://gerrit.wikimedia.org/r/539322 (https://phabricator.wikimedia.org/T233915) [12:19:21] (03PS1) 10CDanis: dbctl test fixtures: one schema to rule them all [software/conftool] - 10https://gerrit.wikimedia.org/r/539323 [12:24:46] (03CR) 10Jbond: Script to upgrade phatality (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:26:31] 10Operations, 10serviceops, 10PHP 7.2 support, 10Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (10Joe) 05Open→03Resolved I will close this bug as resolved. We've not seen a recurrence in a long time,... [12:29:28] (03PS6) 10Filippo Giunchedi: nrpe::monitor_service: Make notes_url optional for ensure=absent [puppet] - 10https://gerrit.wikimedia.org/r/529590 (owner: 10Alex Monk) [12:30:01] (03CR) 1020after4: "> Patch Set 3:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:32:51] (03PS1) 10Giuseppe Lavagetto: mediawiki: remove the PHP/HHVM conditionals from the code [puppet] - 10https://gerrit.wikimedia.org/r/539326 (https://phabricator.wikimedia.org/T192166) [12:33:02] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [12:34:50] (03PS4) 1020after4: Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 [12:35:12] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: remove the PHP/HHVM conditionals from the code [puppet] - 10https://gerrit.wikimedia.org/r/539326 (https://phabricator.wikimedia.org/T192166) (owner: 10Giuseppe Lavagetto) [12:35:43] (03CR) 1020after4: "Addressed review comments" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:35:46] (03CR) 10jerkins-bot: [V: 04-1] Script to 
upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:37:23] (03PS5) 1020after4: Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 [12:38:29] (03CR) 10Jbond: [C: 03+1] "lgtm" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:40:27] (03CR) 10Filippo Giunchedi: [C: 03+1] "Multiple sudo rules WFM too, I think we can skip the script at this point" [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:40:40] twentyafterfour jbond42 thanks for the quick iteration! [12:41:24] 10Operations, 10ops-eqiad, 10decommission: Decommission neodymium - https://phabricator.wikimedia.org/T220503 (10RobH) a:05Cmjohnson→03None [12:44:01] (03CR) 10Jcrespo: "@akosiaris Do you think I can deploy this on the new hosts only (stopping puppet on the others) to validate an progress further (then reve" [puppet] - 10https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo) [12:49:13] (03CR) 10Filippo Giunchedi: [C: 03+2] Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:49:21] (03PS6) 10Filippo Giunchedi: Script to upgrade phatality [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:49:52] (03PS2) 10CDanis: dbctl test fixtures: one schema to rule them all [software/conftool] - 10https://gerrit.wikimedia.org/r/539323 [12:53:13] twentyafterfour: deployed [12:53:20] (03CR) 1020after4: "Thank you!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539191 (owner: 1020after4) [12:53:35] err, in a couple of minutes actually when puppet finishes [12:54:45] 10Operations: Create a staging environment for CAS - https://phabricator.wikimedia.org/T233930 (10MoritzMuehlenhoff) [12:56:00] 10Operations: Cross data center setup for CAS - https://phabricator.wikimedia.org/T233931 (10MoritzMuehlenhoff) [12:56:58] 10Operations: Replicated ticket registry - https://phabricator.wikimedia.org/T233933 (10MoritzMuehlenhoff) [12:58:15] 10Operations: Collects metrics for CAS - https://phabricator.wikimedia.org/T233934 (10MoritzMuehlenhoff) [12:59:11] 10Operations: Icinga Monitoring for CAS - https://phabricator.wikimedia.org/T233935 (10MoritzMuehlenhoff) [12:59:26] (03PS7) 10Elukey: Remove Python 2 packages from Analytics Client nodes [puppet] - 10https://gerrit.wikimedia.org/r/538750 (https://phabricator.wikimedia.org/T204734) [13:00:04] 10Operations: Integrate CAS into backup infrastructure - https://phabricator.wikimedia.org/T233936 (10MoritzMuehlenhoff) [13:00:08] (03CR) 10Filippo Giunchedi: "> Patch Set 12:" [puppet] - 10https://gerrit.wikimedia.org/r/505762 (https://phabricator.wikimedia.org/T213933) (owner: 10Filippo Giunchedi) [13:03:42] (03CR) 10Elukey: [C: 03+2] Remove Python 2 packages from Analytics Client nodes [puppet] - 10https://gerrit.wikimedia.org/r/538750 (https://phabricator.wikimedia.org/T204734) (owner: 10Elukey) [13:03:47] 10Operations: Add U2F/FIDO as second factor for CAS - https://phabricator.wikimedia.org/T233937 (10MoritzMuehlenhoff) [13:04:06] 10Operations: SSO kill switch for crucial services - https://phabricator.wikimedia.org/T233938 (10MoritzMuehlenhoff) [13:04:42] 10Operations: Wikimedia theme for SSO login page - https://phabricator.wikimedia.org/T233939 (10MoritzMuehlenhoff) [13:06:06] 10Operations: CLI tools for CAS administration - https://phabricator.wikimedia.org/T233940 (10MoritzMuehlenhoff) [13:06:43] 10Operations: Validate Single 
Logout Flow - https://phabricator.wikimedia.org/T233941 (10MoritzMuehlenhoff) [13:07:37] 10Operations: Maintain session history / audit log - https://phabricator.wikimedia.org/T233942 (10MoritzMuehlenhoff) [13:08:58] 10Operations: Log / alert on too many failing logins / Throttling login attempts - https://phabricator.wikimedia.org/T233944 (10MoritzMuehlenhoff) [13:10:06] 10Operations: Banning IPs / subnets from accessing login/validation endpoint - https://phabricator.wikimedia.org/T233945 (10MoritzMuehlenhoff) [13:10:43] 10Operations: Validate user lockout - https://phabricator.wikimedia.org/T233946 (10MoritzMuehlenhoff) [13:11:08] PROBLEM - Check systemd state on puppetmaster1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:11:30] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 27115 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [13:11:53] 10Operations: CAS build as a deb - https://phabricator.wikimedia.org/T233947 (10MoritzMuehlenhoff) [13:12:28] 10Operations: Review ticket policies - https://phabricator.wikimedia.org/T233948 (10MoritzMuehlenhoff) [13:13:01] 10Operations: Fine-tune CAS logging - https://phabricator.wikimedia.org/T233949 (10MoritzMuehlenhoff) [13:13:37] (03PS1) 10Elukey: profile::analytics::cluster::packages::common: avoid absent python-yaml [puppet] - 10https://gerrit.wikimedia.org/r/539329 [13:13:39] 10Operations: Revisit Tomcat deployment of CAS - https://phabricator.wikimedia.org/T233950 (10MoritzMuehlenhoff) [13:13:52] (03CR) 10Elukey: [C: 03+2] profile::analytics::cluster::packages::common: avoid absent python-yaml [puppet] - 10https://gerrit.wikimedia.org/r/539329 (owner: 10Elukey) [13:14:08] 10Operations: Systemd hardening of CAS service unit - 
https://phabricator.wikimedia.org/T233951 (10MoritzMuehlenhoff) [13:16:29] 10Operations, 10Discovery, 10Elasticsearch, 10Discovery-Search (Current work), 10Patch-For-Review: Icinga should alert on free disk space < 15% (now < 12%) on Elasticsearch hosts - https://phabricator.wikimedia.org/T130329 (10fgiunchedi) Still ongoing from time to time (e.g. in September) ` #wikimedia-o... [13:21:49] 10Operations, 10serviceops, 10PHP 7.2 support: Mysterious, coordinated slowdowns every ~ 25 minutes on mw1347,mw1348 (php7 api servers) - https://phabricator.wikimedia.org/T231011 (10jijiki) Looks like this is back since yesterday: {F30473970} {F30473972} Logs show many requests that take a long time to fi... [13:23:47] (03PS1) 10CDanis: updated dbctl JSON schemata for 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539330 [13:24:29] gehel, onimisionipe o/ - should we worry about elastic1025's disk space? [13:24:33] or is it temporary? [13:25:36] (03PS1) 10Elukey: profile::analytics::cluster::packages::hadoop: absent python2 packages [puppet] - 10https://gerrit.wikimedia.org/r/539331 (https://phabricator.wikimedia.org/T204734) [13:27:05] (03PS1) 10CDanis: schema.yaml for conftool/dbctl 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539332 [13:28:09] elukey: it should be temporary, elastic1025 is moving a very big shard (commonswiki_file) away [13:30:00] 10Operations, 10ops-codfw: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10Papaul) p:05Triage→03High [13:30:04] sorry about these flapping alerts, hopefully after we replace these old nodes they should stop bothering us [13:33:29] dcausse: thanks :) [13:33:58] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [13:34:02] no bother, I noticed the alert and wanted to follow up 
:) [13:34:22] sure, thanks :) [13:36:46] (03PS2) 10Elukey: profile::analytics::cluster::packages::hadoop: absent python2 packages [puppet] - 10https://gerrit.wikimedia.org/r/539331 (https://phabricator.wikimedia.org/T204734) [13:39:21] (03CR) 10Elukey: [C: 03+2] profile::analytics::cluster::packages::hadoop: absent python2 packages [puppet] - 10https://gerrit.wikimedia.org/r/539331 (https://phabricator.wikimedia.org/T204734) (owner: 10Elukey) [13:43:37] (03CR) 10Giuseppe Lavagetto: [C: 03+1] schema.yaml for conftool/dbctl 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539332 (owner: 10CDanis) [13:46:22] RECOVERY - Check systemd state on puppetmaster1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:46:43] (03CR) 10Giuseppe Lavagetto: [C: 03+1] updated dbctl JSON schemata for 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539330 (owner: 10CDanis) [14:04:04] (03PS1) 10Jbond: apereo_cas: add ability to use groovy script to determine MFA [puppet] - 10https://gerrit.wikimedia.org/r/539336 (https://phabricator.wikimedia.org/T233937) [14:05:58] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 22892 MB (4% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [14:06:44] 10Operations, 10Patch-For-Review: Add U2F/FIDO as second factor for CAS - https://phabricator.wikimedia.org/T233937 (10jbond) The upstream [[ https://github.com/apereo/cas/pull/4188 | PR ]] I have is to support multiple MFA providers; further, I have just created a script to use groovy to do the same. If we o... 
[14:08:56] (03PS1) 10Elukey: Add role::kerberos::kdc to krb1001 [puppet] - 10https://gerrit.wikimedia.org/r/539338 (https://phabricator.wikimedia.org/T226089) [14:10:21] 10Operations, 10MediaWiki-Releasing, 10Parsoid: signatures were invalid: EXPKEYSIG 90E9F83F22250DD7 MediaWiki releases repository - https://phabricator.wikimedia.org/T225601 (10Misterms735) @fgiunchedi I understand what you are saying, but I'm very new and not skilled with in... [14:10:56] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/539338 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [14:13:46] (03CR) 10Elukey: [C: 03+2] Add role::kerberos::kdc to krb1001 [puppet] - 10https://gerrit.wikimedia.org/r/539338 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [14:16:08] !log ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.2.0-1+deb10u1_amd64.changes ; sudo -E reprepro -C main include stretch-wikimedia conftool_1.2.0-1_amd64.changes [14:16:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:03] !log ✔️ cdanis@install1002.wikimedia.org ~ 🕥☕ sudo -E reprepro -C main include jessie-wikimedia conftool_1.2.0-1+deb8u1_amd64.changes [14:19:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:22] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [14:28:58] (03PS2) 10CDanis: updated dbctl JSON schemata for 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539330 [14:29:04] (03CR) 10CDanis: [C: 03+2] updated dbctl JSON schemata for 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539330 (owner: 10CDanis) [14:31:23] 10Operations, 10Patch-For-Review: Deploy federation for Prometheus - https://phabricator.wikimedia.org/T150486 
(10fgiunchedi) [14:31:35] (03PS2) 10CDanis: schema.yaml for conftool/dbctl 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539332 [14:31:36] 10Operations, 10Patch-For-Review: Deploy federation for Prometheus - https://phabricator.wikimedia.org/T150486 (10fgiunchedi) 05Open→03Resolved This has happened in the meantime! [14:31:42] (03CR) 10CDanis: [V: 03+2 C: 03+2] schema.yaml for conftool/dbctl 2019-09-26 release [puppet] - 10https://gerrit.wikimedia.org/r/539332 (owner: 10CDanis) [14:32:08] <_joe_> *please everyone*: cdanis and I are taking over puppetmasters for a conftool release. Please do not run puppet-merge until we give a green light [14:36:17] !log ✔️ cdanis@puppetmaster1001.eqiad.wmnet ~ 🕥☕ sudo apt install python3-conftool [14:36:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:53] 10Operations, 10ops-codfw: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10Papaul) @MoritzMuehlenhoff i checked the cable and switch side all look good. This has to be at another level ` papaul@asw-b-codfw> show interfaces ge-1/0/7 descriptions Interface... 
[14:37:56] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s puppetmaster [14:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:01] <_joe_> *puppet merging can resume* [14:38:06] (03PS1) 10Muehlenhoff: Sort distros in generate-debdeploy-spec [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/539340 [14:41:17] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s cumin [14:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:19] <_joe_> moritzm: <3 [14:43:29] !log cdanis@cumin1001 dbctl commit (dc=all): 'dbctl 1.2.0 adds hostByName to the output, but it is not used by Mediawiki; this commit is the first made with the new release; no-op change', diff saved to https://phabricator.wikimedia.org/P9208 and previous config saved to /var/cache/conftool/dbconfig/20190926-144328-cdanis.json [14:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:27] <_joe_> cdanis: +1 [14:46:55] (03PS1) 10Filippo Giunchedi: role: drop Thanos labels from global Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/539342 (https://phabricator.wikimedia.org/T233956) [14:47:29] !log dbctl schema migration on instances to add note field https://wikitech.wikimedia.org/wiki/Dbctl#Schema_upgrades T229677 [14:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:33] T229677: #dbctl: add 'comment'/'description' metadata to instances - https://phabricator.wikimedia.org/T229677 [14:47:38] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 48.19 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [14:47:52] 10Operations, 10ops-codfw: Broken network connection on ganeti2001 after reboot - 
https://phabricator.wikimedia.org/T233906 (10MoritzMuehlenhoff) Maybe the NIC on the server broke? Are there some self-tests/diagnostics for that on the hardware side? [14:48:42] (03CR) 10jerkins-bot: [V: 04-1] role: drop Thanos labels from global Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/539342 (https://phabricator.wikimedia.org/T233956) (owner: 10Filippo Giunchedi) [14:49:12] PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 59.95 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [14:49:31] spike? [14:51:16] RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 74.9 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [14:51:49] <_joe_> vgutierrez: I think so, but also [14:52:48] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 74.67 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [14:54:46] 10Operations, 10Traffic, 10Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (10TheDJ) @matmarex i think at the very least, when we trip the limit, we would want to have that proactively logged, regardless of the solut... 
[14:59:13] (03PS3) 10CRusnov: netbox: Setup automated DNS generation [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) [15:00:02] !log dbctl schema migration done T229677 [15:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:06] T229677: #dbctl: add 'comment'/'description' metadata to instances - https://phabricator.wikimedia.org/T229677 [15:01:00] (03PS1) 10Muehlenhoff: Use correct database name for PuppetDB Postgres replication check [puppet] - 10https://gerrit.wikimedia.org/r/539346 [15:01:43] (03PS2) 10Filippo Giunchedi: role: drop Thanos labels from global Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/539342 (https://phabricator.wikimedia.org/T233956) [15:03:43] !log mforns@deploy1001 Started deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap [15:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:27] !log mforns@deploy1001 Finished deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap (duration: 02m 44s) [15:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:12] 10Operations, 10Traffic, 10Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (10Krinkle) I recall seeing it on other/smaller responses as well, but haven't seen those recently. >>! In T220022#5524618, @matmarex wrote:... [15:08:18] PROBLEM - Check systemd state on krb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:09:15] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10akosiaris) I don't think this is hardware related. 
` root@ganeti2001:/etc/network# ifup private Error: argument "private" is wrong: dev is invalid ` [15:12:30] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10akosiaris) Found it. I had to comment out from `/etc/network/interfaces` the line ` pre-up /sbin/ip token set ::10:192:16:125 dev private ` which makes sense that it fails given that it tries on pr... [15:12:58] RECOVERY - Check systemd state on krb1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:14:19] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10akosiaris) p:05High→03Normal Changing priority to normal since the host is now up and running, but we have a chicken and egg problem to solve here. [15:15:59] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqsin [15:16:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:05] 10Operations, 10MediaWiki-Releasing, 10Parsoid: signatures were invalid: EXPKEYSIG 90E9F83F22250DD7 MediaWiki releases repository - https://phabricator.wikimedia.org/T225601 (10fgiunchedi) >>! In T225601#5526711, @Misterms735 wrote: > @fgiunchedi I understand what you are say... [15:35:34] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s ulsfo [15:35:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:25] (03CR) 10Filippo Giunchedi: "I'm not really sure about 'prometheus' external label name that we would assign to each prometheus "instance" we have. 
It'd be sth like pr" [puppet] - 10https://gerrit.wikimedia.org/r/539342 (https://phabricator.wikimedia.org/T233956) (owner: 10Filippo Giunchedi) [15:45:13] 10Operations, 10Analytics, 10User-Elukey: setup/install codfw kerbos node WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) @RobH any chance to get this done by today/tomorrow? Really sorry to press you but it would help a lot in trying to make a quarterly goal.. If you are busy no problem! [15:46:24] 10Operations, 10Analytics, 10User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (10elukey) 05Open→03Resolved [15:46:27] 10Operations, 10Analytics, 10hardware-requests, 10User-Elukey: eqiad: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227288 (10elukey) [15:48:07] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10Papaul) @akosiaris can you take over the task then? [15:48:41] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10akosiaris) a:05Papaul→03akosiaris Sure. [15:49:20] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10Papaul) Thanks [15:50:17] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [15:51:55] 10Operations, 10ops-codfw: apply hostname labels for krb2001/WMF6577 - https://phabricator.wikimedia.org/T233962 (10RobH) [15:52:46] (03CR) 10Jforrester: "Nice." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/539011 (https://phabricator.wikimedia.org/T233771) (owner: 10Krinkle) [15:53:33] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) p:05Triage→03High [15:54:00] (03PS1) 10Andrew Bogott: cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) [15:54:48] (03CR) 10Jforrester: [C: 03+1] build: Upgrade from PHPUnit 6 to PHPUnit 8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539008 (https://phabricator.wikimedia.org/T233771) (owner: 10Krinkle) [15:54:52] (03CR) 10jerkins-bot: [V: 04-1] cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [15:55:11] (03PS5) 10Jforrester: MWConfigCacheGenerator: Provide getCachableMWConfig() which doesn't rely on wgConf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538078 [15:56:20] (03PS2) 10Andrew Bogott: cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) [15:56:36] (03PS2) 10Jbond: apereo_cas: add ability to use groovy script to determine MFA [puppet] - 10https://gerrit.wikimedia.org/r/539336 (https://phabricator.wikimedia.org/T233937) [15:56:54] !log sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s esams [15:56:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:01] (03CR) 10jerkins-bot: [V: 04-1] cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [15:57:17] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) 
a:05RobH→03Papaul Unfortunately, it appears the switch port for this system is not labeled on asw-d8-codfw, so we'll need @papaul to trace it out and update this task with the port... [15:57:57] (03PS3) 10Andrew Bogott: cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) [15:58:52] (03CR) 10jerkins-bot: [V: 04-1] cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [15:59:38] (03PS4) 10Andrew Bogott: cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) [16:00:05] godog and _joe_: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Puppet SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T1600). [16:00:05] No GERRIT patches in the queue for this window AFAICS. 
[16:02:14] (03PS5) 10Andrew Bogott: cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) [16:03:56] (03PS1) 10RobH: updating krb2001 mgmt and prod dns [dns] - 10https://gerrit.wikimedia.org/r/539358 (https://phabricator.wikimedia.org/T233142) [16:06:39] (03CR) 10RobH: [C: 03+2] updating krb2001 mgmt and prod dns [dns] - 10https://gerrit.wikimedia.org/r/539358 (https://phabricator.wikimedia.org/T233142) (owner: 10RobH) [16:07:51] (03PS14) 10Jforrester: Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602) [16:09:09] (03CR) 10jerkins-bot: [V: 04-1] Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602) (owner: 10Jforrester) [16:12:13] (03CR) 10Jhedden: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [16:12:50] (03PS15) 10Jforrester: Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602) [16:13:16] (03CR) 10Anomie: [C: 03+1] "Seems sane as far as I can tell. No idea how to test it, and I can't +2 in this repo anyway." [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) (owner: 10Marostegui) [16:13:46] (03CR) 10Anomie: [C: 03+1] "> At some point in the not too distant future we'll need to review the `revision` partitioning for T233625 as well." 
[software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) (owner: 10Marostegui) [16:15:03] (03CR) 10Andrew Bogott: [C: 03+2] cloud ldap: support acls for n keystone hosts (in this case, n=2) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [16:17:58] (03CR) 10Arturo Borrero Gonzalez: cloud ldap: support acls for n keystone hosts (in this case, n=2) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [16:18:14] (03PS7) 10Ayounsi: [WIP] Netbox Juniper installed base report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 [16:19:08] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) [16:19:11] (03CR) 10Andrew Bogott: [C: 03+2] cloud ldap: support acls for n keystone hosts (in this case, n=2) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539354 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [16:19:14] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05Papaul→03RobH [16:20:21] (03CR) 10Volans: [C: 03+1] "LGTM" [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/539340 (owner: 10Muehlenhoff) [16:21:16] (03CR) 10Volans: [C: 03+1] "LGTM, at least I didn't see anything obviously wrong." 
[software/conftool] - 10https://gerrit.wikimedia.org/r/539323 (owner: 10CDanis) [16:22:23] (03PS1) 10Ayounsi: Add fake deploy homer ssh keys [labs/private] - 10https://gerrit.wikimedia.org/r/539360 [16:22:50] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539346 (owner: 10Muehlenhoff) [16:23:39] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Add fake deploy homer ssh keys [labs/private] - 10https://gerrit.wikimedia.org/r/539360 (owner: 10Ayounsi) [16:23:53] (03PS2) 10Jforrester: CommonSettings: Switch from getMWConfigForCacheing to getCachableMWConfig to avoid wgConf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538342 [16:24:23] (03PS2) 10Jforrester: Drop getMWConfigForCacheing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538343 [16:25:04] (03PS2) 10Jforrester: [WiP] YAML files for every wiki, and a basic inheritance tree [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538354 [16:28:34] PROBLEM - Check systemd state on seaborgium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:29:28] PROBLEM - Labs LDAP on seaborgium is CRITICAL: Could not bind to the LDAP server https://wikitech.wikimedia.org/wiki/LDAP%23Troubleshooting [16:30:38] 10Operations, 10ops-codfw, 10decommission: Decommission db2036 - https://phabricator.wikimedia.org/T223885 (10Papaul) [16:31:19] what's going on with LDAP? 
[16:32:20] (03PS2) 10Jforrester: CirrusSettings-labs: Move as much as possible to InitialiseSettings-Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538318 [16:33:12] probably a puppet patch, is being reverted now [16:33:26] RECOVERY - Check systemd state on seaborgium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:34:18] RECOVERY - Labs LDAP on seaborgium is OK: LDAP OK - 0.007 seconds response time https://wikitech.wikimedia.org/wiki/LDAP%23Troubleshooting [16:34:58] (03CR) 10Jbond: [C: 03+1] Use correct database name for PuppetDB Postgres replication check [puppet] - 10https://gerrit.wikimedia.org/r/539346 (owner: 10Muehlenhoff) [16:36:10] (03PS1) 10RobH: krb2001 install params [puppet] - 10https://gerrit.wikimedia.org/r/539363 (https://phabricator.wikimedia.org/T233142) [16:36:50] (03CR) 10RobH: [C: 03+2] krb2001 install params [puppet] - 10https://gerrit.wikimedia.org/r/539363 (https://phabricator.wikimedia.org/T233142) (owner: 10RobH) [16:37:24] (03PS2) 10RobH: krb2001 install params [puppet] - 10https://gerrit.wikimedia.org/r/539363 (https://phabricator.wikimedia.org/T233142) [16:37:49] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s codfw [16:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:19] 10Operations, 10CPT Initiatives (PHP7 (TEC4)), 10HHVM, 10MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Jdforrester-WMF) [16:40:51] !log upgrading firmware on scs-c1-codfw [16:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:24] (03PS1) 10Andrew Bogott: openldap: remove some extra newlines in labs-acls [puppet] - 10https://gerrit.wikimedia.org/r/539365 (https://phabricator.wikimedia.org/T223907) [16:41:53] (03PS3) 10Bstorm: toolforge-kubernetes: restructure pod 
security policies [puppet] - 10https://gerrit.wikimedia.org/r/537732 (https://phabricator.wikimedia.org/T227290) [16:43:17] (03CR) 10Andrew Bogott: [C: 03+2] openldap: remove some extra newlines in labs-acls [puppet] - 10https://gerrit.wikimedia.org/r/539365 (https://phabricator.wikimedia.org/T223907) (owner: 10Andrew Bogott) [16:43:34] 10Operations, 10ops-codfw, 10decommission: Decommission db2037 - https://phabricator.wikimedia.org/T224720 (10Papaul) [16:45:14] (03PS1) 10Ayounsi: Homer, use deploy-homer user for deploy + fix files perms [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) [16:46:12] (03PS4) 10Arturo Borrero Gonzalez: toolforge: update nginx-ingress configuration [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) [16:49:21] (03PS1) 10Andrew Bogott: openldap: pluralize labs_keystone_host(s) for codfw and labs [puppet] - 10https://gerrit.wikimedia.org/r/539368 [16:50:19] (03CR) 10Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1001/18602/cumin1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [16:50:48] (03CR) 10Andrew Bogott: [C: 03+2] openldap: pluralize labs_keystone_host(s) for codfw and labs [puppet] - 10https://gerrit.wikimedia.org/r/539368 (owner: 10Andrew Bogott) [16:51:13] 10Operations, 10ops-codfw: refresh/replace scs-c1-codfw - https://phabricator.wikimedia.org/T231687 (10Papaul) [16:51:40] (03PS2) 10Jforrester: CirrusSettings-common: Move as much as possible to VariantSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 [16:52:39] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move as much as possible to VariantSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [16:52:56] (03CR) 10Jeena Huneidi: "> Patch Set 12: Code-Review-1" (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 
(https://phabricator.wikimedia.org/T224935) (owner: 10Jeena Huneidi) [16:53:07] (03CR) 10Volans: "LGTM, just a couple of nits:" [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [16:54:20] (03PS3) 10Jforrester: CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 [16:54:22] (03PS2) 10Jforrester: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 [16:54:24] (03PS2) 10Jforrester: InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 [16:54:26] (03PS2) 10Jforrester: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 [16:54:55] (03PS2) 10Ayounsi: Homer, use deploy-homer user for deploy + fix files perms [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) [16:55:31] (03PS1) 10CDanis: conftool-data: remove service tree [puppet] - 10https://gerrit.wikimedia.org/r/539371 [16:55:39] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [16:55:41] (03CR) 10Ayounsi: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [16:55:58] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [16:56:06] (03CR) 10jerkins-bot: [V: 04-1] InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [16:56:16] (03CR) 
10Krinkle: "IS/IS-labs load much earlier than the bulk of CommonSettings and Cirrus-common.php (included at the bottom of CommonSettings.php, no idea w" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538318 (owner: 10Jforrester) [16:56:30] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [16:56:45] James_F: wow, I wonder why Cirrus is loaded that latet [16:56:47] late* [16:57:55] * James_F looks. [16:57:57] [ IS, IS-labs, (bulk of CS), Cirrus-common, Cirrus-labs ] [16:58:10] Oh, I see, yeah. [16:58:22] I guess it has to do with wfLoadExtension [16:58:25] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: decommission db2038 - https://phabricator.wikimedia.org/T227565 (10Papaul) [16:58:28] Probably just the extension… yes. [16:58:42] I think require_once had the issue of needing to be before setting, which is why we have all those wmg/wg [16:58:52] (as otherwise extension defaults win) [16:59:02] but afaik wfLoadExtension doesn't have that issue. [16:59:04] Oh, meh, yes. [16:59:24] (03PS3) 10Ayounsi: Homer, use deploy-homer user for deploy + fix files perms [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) [16:59:49] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) [17:00:04] cscott, arlolra, subbu, halfak, and accraze: #bothumor My software never has bugs. It just develops random features. Rise for Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T1700). 
[17:00:57] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05RobH→03Papaul ` ge-8/0/3 down down krb2001 ` Everything is ready for this to install, but it doesn't see any network attachment on its primary interface when trying to... [17:01:46] (03PS4) 10Jforrester: CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 [17:01:48] (03PS3) 10Jforrester: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 [17:01:50] (03PS3) 10Jforrester: InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 [17:01:52] (03PS3) 10Jforrester: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 [17:02:36] (03CR) 10Ayounsi: [C: 03+2] Homer, use deploy-homer user for deploy + fix files perms [puppet] - 10https://gerrit.wikimedia.org/r/539367 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [17:03:02] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:03:14] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [17:03:20] (03CR) 10jerkins-bot: [V: 04-1] InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [17:03:47] (03CR) 10CDanis: [C: 03+2] conftool-data: remove service tree [puppet] - 10https://gerrit.wikimedia.org/r/539371 (owner: 10CDanis) [17:04:00] (03PS2) 10CDanis: conftool-data: remove service 
tree [puppet] - 10https://gerrit.wikimedia.org/r/539371 [17:04:04] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [17:04:14] (03PS5) 10Jforrester: CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 [17:04:16] (03PS4) 10Jforrester: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 [17:04:18] (03PS4) 10Jforrester: InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 [17:04:20] (03PS4) 10Jforrester: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 [17:05:21] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [17:05:24] (03PS1) 10Ottomata: Bump eventgate-main image version and pre cache revision-score 2.0.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/539373 (https://phabricator.wikimedia.org/T225211) [17:05:28] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:05:48] (03CR) 10jerkins-bot: [V: 04-1] InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [17:06:02] (03CR) 10Ottomata: [C: 03+2] Bump eventgate-main image version and pre cache revision-score 2.0.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/539373 (https://phabricator.wikimedia.org/T225211) (owner: 10Ottomata) [17:06:08] (03CR) 
10jerkins-bot: [V: 04-1] CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [17:06:33] no parsoid deploy today [17:07:07] (03CR) 10Bstorm: "Looks awesome! I haven't tested it locally, but I know you and Hieu were doing that stuff." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [17:07:19] 10Operations, 10CPT Initiatives (PHP7 (TEC4)), 10HHVM, 10MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Ladsgroup) It seems all of traffic is on php7. Can we drop things now? 😈😈😈 [17:07:21] (03CR) 10Jforrester: CirrusSettings-labs: Move as much as possible to InitialiseSettings-Labs (038 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538318 (owner: 10Jforrester) [17:07:25] 10Operations, 10conftool: remove service objects from etcd and update documentation - https://phabricator.wikimedia.org/T233973 (10CDanis) [17:07:43] Krinkle: Only value not already set in IS is wgCirrusSearchEnableSearchLogging. I think it's good to go. [17:08:09] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05Papaul→03RobH it was in disabled and i missed it, papaul pointed it out, fixed. [17:08:40] !log @ helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' . 
[17:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:57] (03PS6) 10Jforrester: CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 [17:10:41] (03CR) 10Bstorm: toolforge-kubernetes: restructure pod security policies (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/537732 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [17:10:57] (03PS4) 10Ottomata: Rsync analytics mediawiki history dumps to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/538312 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [17:11:00] !log @ helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' . [17:11:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:11:47] (03CR) 10Bstorm: "An important thing here is that we have to remove the existing clusterrolebinding for authenticated users in our test cluster to properly " [puppet] - 10https://gerrit.wikimedia.org/r/537732 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [17:12:52] (03PS5) 10Jforrester: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 [17:12:54] (03CR) 10Ottomata: [C: 03+2] Rsync analytics mediawiki history dumps to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/538312 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [17:13:33] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqiad [17:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:52] !log @ helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' . 
[17:14:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:58] (03PS3) 10Bstorm: toolforge-k8s: proposed role for all tools [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) [17:15:46] 10Operations: Systemd hardening of CAS service unit - https://phabricator.wikimedia.org/T233951 (10herron) p:05Triage→03Normal [17:16:01] 10Operations: Fine-tune CAS logging - https://phabricator.wikimedia.org/T233949 (10herron) p:05Triage→03Normal [17:16:16] 10Operations: CAS build as a deb - https://phabricator.wikimedia.org/T233947 (10herron) p:05Triage→03Normal [17:16:26] 10Operations: Revisit Tomcat deployment of CAS - https://phabricator.wikimedia.org/T233950 (10herron) p:05Triage→03Normal [17:16:38] 10Operations: Review ticket policies - https://phabricator.wikimedia.org/T233948 (10herron) p:05Triage→03Normal [17:16:52] 10Operations: Validate user lockout - https://phabricator.wikimedia.org/T233946 (10herron) p:05Triage→03Normal [17:16:59] 10Operations, 10ops-eqiad, 10DBA: es1019 IPMI and its management interface are unresponsive (again2) - https://phabricator.wikimedia.org/T233698 (10Cmjohnson) @Marostegui Can you depool it leave it for us to do when we get a free moment. It's an easy thing to do but may not happen until later in the day aft... [17:17:04] 10Operations: Banning IPs / subnets from accessing login/validation endpoint - https://phabricator.wikimedia.org/T233945 (10herron) p:05Triage→03Normal [17:17:16] 10Operations: Log / alert on too many failing logins / Throttling login attempts - https://phabricator.wikimedia.org/T233944 (10herron) p:05Triage→03Normal [17:17:30] 10Operations: Maintain session history / audit log - https://phabricator.wikimedia.org/T233942 (10herron) p:05Triage→03Normal [17:17:41] 10Operations: Validate Single Logout Flow - https://phabricator.wikimedia.org/T233941 (10herron) p:05Triage→03Normal [17:17:47] (03CR) 10CDanis: [C: 03+1] "thanks!" 
[debs/debdeploy] - 10https://gerrit.wikimedia.org/r/539340 (owner: 10Muehlenhoff) [17:17:52] 10Operations: CLI tools for CAS administration - https://phabricator.wikimedia.org/T233940 (10herron) p:05Triage→03Normal [17:18:01] 10Operations: Wikimedia theme for SSO login page - https://phabricator.wikimedia.org/T233939 (10herron) p:05Triage→03Normal [17:18:12] 10Operations: SSO kill switch for crucial services - https://phabricator.wikimedia.org/T233938 (10herron) p:05Triage→03Normal [17:18:23] 10Operations, 10Patch-For-Review: Add U2F/FIDO as second factor for CAS - https://phabricator.wikimedia.org/T233937 (10herron) p:05Triage→03Normal [17:18:30] 10Operations: Integrate CAS into backup infrastructure - https://phabricator.wikimedia.org/T233936 (10herron) p:05Triage→03Normal [17:18:38] 10Operations: Icinga Monitoring for CAS - https://phabricator.wikimedia.org/T233935 (10herron) p:05Triage→03Normal [17:18:49] 10Operations: Collects metrics for CAS - https://phabricator.wikimedia.org/T233934 (10herron) p:05Triage→03Normal [17:18:58] 10Operations: Replicated ticket registry - https://phabricator.wikimedia.org/T233933 (10herron) p:05Triage→03Normal [17:19:07] 10Operations: Cross data center setup for CAS - https://phabricator.wikimedia.org/T233931 (10herron) p:05Triage→03Normal [17:19:14] 10Operations: Create a staging environment for CAS - https://phabricator.wikimedia.org/T233930 (10herron) p:05Triage→03Normal [17:19:22] 10Operations: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (10herron) p:05Triage→03Normal [17:19:44] PROBLEM - Check systemd state on ms-be1026 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:20:17] (03CR) 10Jforrester: [C: 03+2] CirrusSettings-labs: Move as much as possible to InitialiseSettings-Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538318 (owner: 10Jforrester) [17:20:30] 10Operations, 10Wikimedia-Logstash, 10observability, 10serviceops: Errors managed by wmf-errors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (10herron) [17:20:53] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) [17:21:12] (03Merged) 10jenkins-bot: CirrusSettings-labs: Move as much as possible to InitialiseSettings-Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538318 (owner: 10Jforrester) [17:22:13] (03CR) 10Arturo Borrero Gonzalez: toolforge: update nginx-ingress configuration (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [17:22:20] (03PS1) 10Mforns: dumps::manifests::web::fetches::stats: correct path for mediawiki history [puppet] - 10https://gerrit.wikimedia.org/r/539374 (https://phabricator.wikimedia.org/T208612) [17:23:04] Krinkle: Also, wfLoadExtension preserves existing globals when unpacking extension.json. [17:23:12] Otherwise lots of things would break. 
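Note on the load-order exchange above (16:58:42 to 17:23:12): the point being made is that require_once-style loading lets extension defaults overwrite anything set earlier, while wfLoadExtension keeps globals that already exist. A minimal Python sketch of that difference follows. It is only a model of the behaviour described in the chat, not MediaWiki code; every name in it (EXTENSION_DEFAULTS, load_legacy, load_registered, SearchTimeout, EnableLogging) is hypothetical, and it ignores details such as how array-valued settings are handled.

```python
# Simplified model of the two load orders discussed in the chat.
# This is NOT MediaWiki code; all names and values here are hypothetical.

EXTENSION_DEFAULTS = {"SearchTimeout": 10, "EnableLogging": False}


def load_legacy(site_config):
    """Model of require_once-style loading: the extension entry point
    assigns its defaults unconditionally, so earlier settings are lost."""
    config = dict(site_config)
    config.update(EXTENSION_DEFAULTS)
    return config


def load_registered(site_config):
    """Model of wfLoadExtension-style loading: a default is applied only
    when the site configuration has not already set that key."""
    config = dict(site_config)
    for key, default in EXTENSION_DEFAULTS.items():
        config.setdefault(key, default)
    return config


# A value set early, before the extension is loaded.
site = {"SearchTimeout": 40}

legacy = load_legacy(site)
registered = load_registered(site)

print(legacy["SearchTimeout"])      # the extension default wins
print(registered["SearchTimeout"])  # the early site value is preserved
```

Under this model, the legacy path is why an early value has to be parked in a separate wmg-prefixed variable and copied into the wg-prefixed one after the extension loads, which is the wmg/wg pattern mentioned at 16:58:42.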
[17:23:39] (03CR) 10Ottomata: [C: 03+2] dumps::manifests::web::fetches::stats: correct path for mediawiki history [puppet] - 10https://gerrit.wikimedia.org/r/539374 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [17:23:54] 10Operations, 10ops-codfw: apply hostname labels for krb2001/WMF6577 - https://phabricator.wikimedia.org/T233962 (10Papaul) 05Open→03Resolved complete [17:23:57] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10Papaul) [17:24:13] (03CR) 10jenkins-bot: CirrusSettings-labs: Move as much as possible to InitialiseSettings-Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538318 (owner: 10Jforrester) [17:24:36] 10Operations, 10ops-codfw, 10decommission: Decommission db2041 - https://phabricator.wikimedia.org/T223950 (10Papaul) [17:26:09] (03Abandoned) 10Jforrester: tests: Skip the Cirrus configuration tests as they're inextricable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/537756 (owner: 10Jforrester) [17:26:24] (03CR) 10Jforrester: CirrusSettings-common: Move as much as possible to InitialiseSettings (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [17:28:03] (03PS8) 10Ayounsi: Netbox Juniper installed base report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 [17:28:59] (03CR) 10Volans: Netbox Juniper installed base report (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 (owner: 10Ayounsi) [17:31:02] !log start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246) [17:31:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:07] T233246: Please add Wikidata support for newly created hi.wikisource - https://phabricator.wikimedia.org/T233246 [17:31:08] T233835: Add Wikidata support for nqowiki - 
https://phabricator.wikimedia.org/T233835 [17:31:47] (03CR) 10Ayounsi: "This change is ready for review." [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 (owner: 10Ayounsi) [17:32:48] oh, moar Wikidata [17:35:35] !log run apt-get autoremove on stat* and notebook* to clean up old python2 deps [17:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:01] * Amir1 meows [17:39:51] 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10Cmjohnson) [17:39:55] (03CR) 10Volans: "puppet compiler is failing" [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [17:40:16] elukey: <3 [17:40:26] James_F: yeah, wfLoadExtension made it safe to set config before loading [17:40:27] (03CR) 10CRusnov: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [17:40:40] but for anything using require .php-style loading, we'd still need wmg/wg to set things from IS. [17:40:54] 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10Cmjohnson) @fgiunchedi I see you said for raid Partitioning/Raid: "use existing ms-be setup" Unfortunately my memory is not that great anymore can you please remind what the exist... 
[17:41:04] which I hope we have very few left of :) [17:41:06] (03CR) 10Bstorm: toolforge: update nginx-ingress configuration (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [17:41:34] !log ppchelko@deploy1001 Started deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211 [17:41:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:37] T225211: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 [17:43:38] !log ppchelko@deploy1001 Finished deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211 (duration: 02m 04s) [17:43:38] (03CR) 10Jforrester: [C: 03+2] CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [17:43:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:59] (03Merged) 10jenkins-bot: CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [17:47:32] (03PS1) 10Andrew Bogott: openldap: move keystone_hosts lookup to the profile [puppet] - 10https://gerrit.wikimedia.org/r/539376 [17:47:44] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Move static settings from CirrusSettings-common (duration: 01m 05s) [17:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:18] (03CR) 10jenkins-bot: CirrusSettings-common: Move as much as possible to InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538319 (owner: 10Jforrester) [17:49:10] !log end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246) [17:49:14] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:49:15] T233246: Please add Wikidata support for newly created hi.wikisource - https://phabricator.wikimedia.org/T233246 [17:49:15] T233835: Add Wikidata support for nqowiki - https://phabricator.wikimedia.org/T233835 [17:49:23] Amir1: Fun. [17:49:35] !log jforrester@deploy1001 Synchronized wmf-config/CirrusSearch-common.php: Stop setting static values now set in InitialiseSettings (duration: 01m 04s) [17:49:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:04] (03PS2) 10Andrew Bogott: openldap: move keystone_hosts lookup to the profile [puppet] - 10https://gerrit.wikimedia.org/r/539376 [17:50:18] James_F: It's usually a harmless script, the fun begins when it breaks for any reason: https://wikitech.wikimedia.org/wiki/Add_a_wiki#Wikidata [17:50:21] James_F: copied some from CirrusSearch/extension.json? [17:50:26] * James_F grins. [17:50:34] * Krinkle can tell by the double quotes [17:50:56] (03PS1) 10Ayounsi: homer->deploy-homer on deploy server [puppet] - 10https://gerrit.wikimedia.org/r/539377 (https://phabricator.wikimedia.org/T228388) [17:51:04] Krinkle: No? That was cut+paste from CirrusSearch-common with some tweaks. [17:51:26] (03PS5) 10Arturo Borrero Gonzalez: toolforge: update nginx-ingress configuration [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) [17:51:33] (03CR) 10Krinkle: [C: 04-1] CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:51:33] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/538323/ is the only one where I'm explicitly re-setting values from extension.json. 
[17:51:37] (03Abandoned) 10Andrew Bogott: openldap: move keystone_hosts lookup to the profile [puppet] - 10https://gerrit.wikimedia.org/r/539376 (owner: 10Andrew Bogott) [17:51:45] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539377 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [17:51:55] (03CR) 10Ayounsi: [C: 03+2] homer->deploy-homer on deploy server [puppet] - 10https://gerrit.wikimedia.org/r/539377 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [17:51:55] James_F: yeah, that's the one I'm looking at no worries [17:51:59] I like all quotes equally [17:52:38] (03CR) 10Jforrester: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:52:40] (03CR) 10Krinkle: [C: 03+1] CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:52:55] Ha. :-) [17:52:57] (03CR) 10Jforrester: [C: 03+2] CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:53:01] James_F: yeah, I saw the CS.php code assign ['default'] [17:53:03] which is unusual [17:53:10] and didn't see an array there on the other hand [17:53:13] but all good [17:53:14] Yeah. [17:53:54] (03Merged) 10jenkins-bot: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:53:55] I'm sad that expensive code like `foreach ( $wmgCirrusSearchMLRModel as $name => $mlrModel )` runs on every request init. 
[17:54:11] (03CR) 10jenkins-bot: CirrusSettings-common: Move timeouts to InitialiseSettings (full array, not just over-writes) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538323 (owner: 10Jforrester) [17:54:30] (03CR) 10Krinkle: "end result LGTM, but this is a scap trap. not sure there's a way to roll it out currently." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [17:55:11] Krinkle: Oh, good point. Meh, will split. [17:55:17] (03CR) 10Ayounsi: "Adding Chris for emoji compliance." [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) (owner: 10Ayounsi) [17:55:51] (03CR) 10Bstorm: [C: 03+1] "Until we deploy this we can always go back and realize we were wrong on all the naming 😅" [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [17:56:31] (03CR) 10CDanis: [C: 03+1] "LGTM for emoji readability" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) (owner: 10Ayounsi) [17:57:12] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Set the whole of the CirrusSearch timeoutes arrays directly (duration: 01m 00s) [17:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:58:22] 10Operations, 10serviceops, 10PHP 7.2 support: Mysterious, coordinated slowdowns every ~ 25 minutes on mw1347,mw1348 (php7 api servers) - https://phabricator.wikimedia.org/T231011 (10Ladsgroup) euwiki heavily uses wikidata, it might be related to wb_terms table. Does it correlate with spikes in https://grafa... 
[17:58:50] (03PS6) 10Arturo Borrero Gonzalez: toolforge: update nginx-ingress configuration [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) [17:58:53] !log jforrester@deploy1001 Synchronized wmf-config/CirrusSearch-common.php: Stop setting bits of the CirrusSearch timeoutes arrays, already set in IS (duration: 01m 04s) [17:58:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:36] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: update nginx-ingress configuration [puppet] - 10https://gerrit.wikimedia.org/r/539087 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [18:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: That opportune time is upon us again. Time for a Morning SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:01:00] (03PS5) 10Jforrester: CirrusSearch-common: Stop indirectly setting wgWMESearchRelevancePages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 [18:01:02] (03PS1) 10Jforrester: InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539381 [18:01:14] 10Operations, 10CPT Initiatives (PHP7 (TEC4)), 10HHVM, 10MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Reedy) >>! In T176370#5527357, @Ladsgroup wrote: > It seems all of traffic is on php7. Can we drop things now?... 
[18:01:25] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/539322 (https://phabricator.wikimedia.org/T233915) (owner: 10Jbond) [18:01:47] (03CR) 10Muehlenhoff: [V: 03+2 C: 03+2] Sort distros in generate-debdeploy-spec [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/539340 (owner: 10Muehlenhoff) [18:02:19] 10Operations, 10CPT Initiatives (PHP7 (TEC4)), 10HHVM, 10MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10Jdforrester-WMF) >>! In T176370#5527544, @Reedy wrote: >>>! In T176370#5527357, @Ladsgroup wrote: >> It seems a... [18:02:37] !log volans@deploy1001 Started deploy [homer/deploy@68ac5cc]: Initial Homer release [18:02:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:20] !log volans@deploy1001 Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 42s) [18:03:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:28] !log volans@deploy1001 Started deploy [homer/deploy@68ac5cc]: Initial Homer release [18:04:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:31] !log volans@deploy1001 Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 03s) [18:04:31] !log running mcrouter_generate_certs to add a cert for wtp2001.codfw.wmnet for T233654 [18:04:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:37] T233654: Make the parsoid cluster to support parsoid/PHP - https://phabricator.wikimedia.org/T233654 [18:04:54] (03CR) 10Jforrester: [C: 03+2] InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539381 (owner: 10Jforrester) [18:06:01] (03Merged) 10jenkins-bot: InitialiseSettings: Set wgWMESearchRelevancePages directly 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/539381 (owner: 10Jforrester) [18:06:07] (03PS5) 10Dzahn: parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) [18:06:55] !log ayounsi@deploy1001 Started deploy [homer/deploy@68ac5cc]: Initial Homer release [18:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:49] (03CR) 10jenkins-bot: InitialiseSettings: Set wgWMESearchRelevancePages directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539381 (owner: 10Jforrester) [18:07:51] !log ayounsi@deploy1001 Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 55s) [18:07:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:02] RECOVERY - Check systemd state on ms-be1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:08:18] (03CR) 10jerkins-bot: [V: 04-1] parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn) [18:10:16] (03CR) 10Dzahn: "mcrouter cert for wtp2001 generated and committed on puppetmaster" [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn) [18:10:49] (03CR) 10Jforrester: [C: 03+2] CirrusSearch-common: Stop indirectly setting wgWMESearchRelevancePages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [18:10:56] (03PS4) 10Bstorm: toolforge-k8s: proposed role for all tools [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) [18:11:32] (03CR) 10jerkins-bot: [V: 04-1] toolforge-k8s: proposed role for all tools [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [18:11:49] !log jforrester@deploy1001 
Synchronized wmf-config/InitialiseSettings.php: Set wgWMESearchRelevancePages directly in InitialiseSettings (duration: 01m 04s) [18:11:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:04] (03Merged) 10jenkins-bot: CirrusSearch-common: Stop indirectly setting wgWMESearchRelevancePages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [18:12:15] Hi. Who can do a mediawiki/services/parsoid/deploy.git bump? [18:13:51] 10Operations, 10ops-eqiad, 10vm-requests: rack/setup/install ganeti10([09]|1[0-8[).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (10Cmjohnson) [18:13:54] (03CR) 10jenkins-bot: CirrusSearch-common: Stop indirectly setting wgWMESearchRelevancePages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538324 (owner: 10Jforrester) [18:15:04] (03PS4) 10Bstorm: toolforge-kubernetes: restructure pod security policies [puppet] - 10https://gerrit.wikimedia.org/r/537732 (https://phabricator.wikimedia.org/T227290) [18:15:06] (03PS5) 10Bstorm: toolforge-k8s: proposed role for all tools [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) [18:15:11] !log volans@deploy1001 Started deploy [homer/deploy@68ac5cc]: Initial Homer release [18:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:42] !log volans@deploy1001 Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 31s) [18:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:37] (03CR) 10Bstorm: "This is much better structured. Testing it now." 
[puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [18:16:44] (03PS5) 10Jforrester: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 [18:17:38] !log jforrester@deploy1001 Synchronized wmf-config/CirrusSearch-common.php: Stop indirectly setting wgWMESearchRelevancePages (duration: 01m 04s) [18:17:39] (03PS1) 10Mforns: analytics::refinery::job::Refine: bump up refinery jar version to v0.0.101 [puppet] - 10https://gerrit.wikimedia.org/r/539385 [18:17:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:51] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [18:21:29] (03PS6) 10Jforrester: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 [18:22:25] (03CR) 10jerkins-bot: [V: 04-1] CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [18:24:02] (03PS1) 10Volans: Fix script path [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539387 [18:24:46] (03CR) 10Volans: [V: 03+2 C: 03+2] "Merging to unblock deployment" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/539387 (owner: 10Volans) [18:25:07] !log volans@deploy1001 Started deploy [homer/deploy@715d842]: Initial Homer release [18:25:07] (03PS7) 10Jforrester: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 [18:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:30] !log volans@deploy1001 Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 23s) 
[18:25:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:46] (03CR) 10Bstorm: "Yup, it still works. Now maybe it'll be more sensible to reason over. I'm realizing that I don't have a reason to restrict controllerrev" [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [18:29:03] !log mforns@deploy1001 Started deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) [18:29:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:45] (03CR) 10Jforrester: [C: 03+2] CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [18:31:50] (03Merged) 10jenkins-bot: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [18:32:07] (03CR) 10jenkins-bot: CirrusSettings-production: Move settings to InitialiseSettings where static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538325 (owner: 10Jforrester) [18:34:00] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Set last static Cirrus settings directly in IS (duration: 01m 07s) [18:34:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:07] !log mforns@deploy1001 Finished deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) (duration: 06m 04s) [18:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:27] OK, clear for now. 
[18:35:42] !log jforrester@deploy1001 Synchronized wmf-config/CirrusSearch-production.php: Stop setting various static settings, now set in IS (duration: 01m 04s) [18:35:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:06] (03PS3) 10Jforrester: Remove unused math config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/537166 (https://phabricator.wikimedia.org/T228547) (owner: 10Physikerwelt) [18:37:37] (03CR) 10Krinkle: [C: 03+1] Lower gzip threshold for SVGs served by MediaWiki [puppet] - 10https://gerrit.wikimedia.org/r/537974 (https://phabricator.wikimedia.org/T232615) (owner: 10Gilles) [18:37:42] (03PS2) 10Jforrester: Remove more unused math config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/537241 (https://phabricator.wikimedia.org/T228547) (owner: 10Physikerwelt) [18:39:45] (03PS1) 10Ayounsi: Homer, add make package for scap deploy [puppet] - 10https://gerrit.wikimedia.org/r/539390 (https://phabricator.wikimedia.org/T228388) [18:42:59] (03CR) 10Ayounsi: [C: 03+2] Homer, add make package for scap deploy [puppet] - 10https://gerrit.wikimedia.org/r/539390 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [18:44:54] !log ayounsi@deploy1001 Started deploy [homer/deploy@715d842]: Initial Homer release [18:44:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:17] !log ayounsi@deploy1001 Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 22s) [18:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:08] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) grub is failing to install on sda, regardless of distro. I think this may be a hardware failure, perhaps we should swap them around and see if the error follows the disk. 
[18:49:58] (03CR) 10Vgutierrez: [C: 03+2] Lower gzip threshold for SVGs served by MediaWiki [puppet] - 10https://gerrit.wikimedia.org/r/537974 (https://phabricator.wikimedia.org/T232615) (owner: 10Gilles) [18:50:11] (03PS5) 10Vgutierrez: Lower gzip threshold for SVGs served by MediaWiki [puppet] - 10https://gerrit.wikimedia.org/r/537974 (https://phabricator.wikimedia.org/T232615) (owner: 10Gilles) [18:54:07] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05RobH→03Papaul @papaul, Since this is failing grub on sda, I'd like to see if it is a disk issue (most likely), bay issue (moderately likely if the backplane is bad), or softwa... [19:00:04] twentyafterfour: Time to snap out of that daydream and deploy MediaWiki train - American version. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T1900). [19:03:06] (03PS1) 10Jhedden: openstack: update haproxy ferm rules for remote services [puppet] - 10https://gerrit.wikimedia.org/r/539398 (https://phabricator.wikimedia.org/T223907) [19:03:48] (03PS1) 10Dzahn: add fake mcrouter certs for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539399 [19:05:35] 10Operations, 10serviceops, 10Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (10Dzahn) [19:05:47] (03CR) 10Dzahn: [V: 03+2 C: 03+2] add fake mcrouter certs for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539399 (owner: 10Dzahn) [19:06:13] 10Operations, 10DC-Ops, 10netops: Juniper network device audit - all sites - https://phabricator.wikimedia.org/T213843 (10ayounsi) I wrote a Netbox report to check against Juniper's installed base ( https://netbox.wikimedia.org/extras/reports/juniper.Juniper/ ) Still in review in https://gerrit.wikimedia.org... 
[19:06:23] (03CR) 10Jhedden: "PCC results: https://puppet-compiler.wmflabs.org/compiler1001/18607/" [puppet] - 10https://gerrit.wikimedia.org/r/539398 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [19:06:38] (03PS2) 10Dzahn: add fake mcrouter certs for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539399 [19:10:03] (03CR) 10Dzahn: [V: 03+2 C: 03+2] add fake mcrouter certs for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539399 (owner: 10Dzahn) [19:12:26] It would seem there are still no train blockers. [19:13:03] !log preparing to deploy the mediawiki train for 1.34.0-wmf.24. refs T220749 [19:13:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:07] T220749: 1.34.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T220749 [19:13:41] (03PS1) 1020after4: all wikis to 1.34.0-wmf.24 refs T220749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539400 [19:13:43] (03CR) 1020after4: [C: 03+2] all wikis to 1.34.0-wmf.24 refs T220749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539400 (owner: 1020after4) [19:14:59] (03Merged) 10jenkins-bot: all wikis to 1.34.0-wmf.24 refs T220749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539400 (owner: 1020after4) [19:16:35] (03CR) 10jenkins-bot: all wikis to 1.34.0-wmf.24 refs T220749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539400 (owner: 1020after4) [19:17:07] !log volans@deploy1001 Started deploy [homer/deploy@715d842]: Initial Homer release (test) [19:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:23] !log volans@deploy1001 Finished deploy [homer/deploy@715d842]: Initial Homer release (test) (duration: 00m 16s) [19:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:39] (03PS6) 10Dzahn: parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 
(https://phabricator.wikimedia.org/T233654) [19:21:28] (03PS7) 10Dzahn: parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) [19:22:25] (03CR) 10Jhedden: [C: 03+2] openstack: update haproxy ferm rules for remote services [puppet] - 10https://gerrit.wikimedia.org/r/539398 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [19:23:32] !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.24 refs T220749 [19:23:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:36] T220749: 1.34.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T220749 [19:27:22] (03PS8) 10Dzahn: parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) [19:36:45] (03PS1) 10Volans: Homer: add bash wrapper to make it easy to run it [puppet] - 10https://gerrit.wikimedia.org/r/539404 (https://phabricator.wikimedia.org/T228388) [19:37:37] 10Operations, 10DNS, 10Mail, 10Traffic, and 2 others: wikidata.org lacks SPF record - https://phabricator.wikimedia.org/T210134 (10revi) [19:38:52] (03PS1) 10Jhedden: openstack: add designate to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539406 (https://phabricator.wikimedia.org/T223907) [19:41:25] (03PS9) 10Ayounsi: Netbox Juniper installed base report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 [19:42:18] (03CR) 10Jhedden: [C: 03+2] openstack: add designate to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539406 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [19:43:15] (03CR) 10Ayounsi: [C: 03+1] Homer: add bash wrapper to make it easy to run it [puppet] - 10https://gerrit.wikimedia.org/r/539404 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [19:46:29] !log phedenskog@deploy1001 Started deploy 
[performance/navtiming@f2a0863]: (no justification provided) [19:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:34] !log phedenskog@deploy1001 Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s) [19:46:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:09] (03PS2) 10Volans: Homer: add bash wrapper to make it easy to run it [puppet] - 10https://gerrit.wikimedia.org/r/539404 (https://phabricator.wikimedia.org/T228388) [19:50:08] (03CR) 10CDanis: [C: 03+2] dbctl test fixtures: one schema to rule them all [software/conftool] - 10https://gerrit.wikimedia.org/r/539323 (owner: 10CDanis) [19:50:48] (03CR) 10Volans: [C: 03+2] Homer: add bash wrapper to make it easy to run it [puppet] - 10https://gerrit.wikimedia.org/r/539404 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [19:52:05] 10Operations, 10Phabricator, 10Traffic, 10Release-Engineering-Team (Development services), and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10CDanis) >>! In T226044#5524171, @greg wrote: > @mmodell @ema After a discussion with Tech... 
[19:52:06] !log krinkle@deploy1001 Started deploy [performance/navtiming@f2a0863]: (no justification provided) [19:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:11] !log krinkle@deploy1001 Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s) [19:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:01] (03Merged) 10jenkins-bot: dbctl test fixtures: one schema to rule them all [software/conftool] - 10https://gerrit.wikimedia.org/r/539323 (owner: 10CDanis) [19:53:38] PROBLEM - MariaDB read only wikireplica on labsdb1011 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [19:54:02] PROBLEM - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy [19:54:14] PROBLEM - Check systemd state on labsdb1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:54:46] PROBLEM - haproxy failover on dbproxy1010 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy [19:55:16] RECOVERY - MariaDB read only wikireplica on labsdb1011 is OK: Version 10.1.39-MariaDB, Uptime 310s, read_only: True, 10.87 QPS, connection latency: 0.001735s, query latency: 0.000392s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [19:58:22] 10Operations, 10serviceops, 10Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (10ssastry) [19:58:44] !log phedenskog@deploy1001 Started deploy [performance/navtiming@1880a79]: Test deploy [19:58:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:48] !log phedenskog@deploy1001 Finished deploy [performance/navtiming@1880a79]: Test deploy (duration: 00m 05s) [19:58:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:15] (03PS1) 10Ottomata: Start mediawiki_analytics camus as a with a new history, include api-request [puppet] - 10https://gerrit.wikimedia.org/r/539408 (https://phabricator.wikimedia.org/T233718) [20:02:05] (03PS9) 10Dzahn: parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) [20:02:20] RECOVERY - Check systemd state on labsdb1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:02:27] 10Operations, 10serviceops: Set up LVS for parsoid/PHP - https://phabricator.wikimedia.org/T233722 (10Dzahn) [20:03:28] (03CR) 10Ottomata: [C: 03+2] Start mediawiki_analytics camus as a with a new history, include api-request [puppet] - 10https://gerrit.wikimedia.org/r/539408 (https://phabricator.wikimedia.org/T233718) (owner: 10Ottomata) [20:06:15] (03PS1) 10Vgutierrez: ATS: Fix phabricator websocket remap 
rule for ats-tls instance [puppet] - 10https://gerrit.wikimedia.org/r/539409 [20:07:33] 10Operations, 10serviceops, 10Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (10ssastry) [20:08:39] (03CR) 10Vgutierrez: [C: 03+2] ATS: Fix phabricator websocket remap rule for ats-tls instance [puppet] - 10https://gerrit.wikimedia.org/r/539409 (owner: 10Vgutierrez) [20:08:48] (03PS2) 10Vgutierrez: ATS: Fix phabricator websocket remap rule for ats-tls instance [puppet] - 10https://gerrit.wikimedia.org/r/539409 [20:10:48] PROBLEM - Check systemd state on labsdb1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:10:51] (03PS1) 10Dzahn: add fake SSL key for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539410 (https://phabricator.wikimedia.org/T233654) [20:11:16] PROBLEM - haproxy failover on dbproxy1011 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy [20:11:18] PROBLEM - haproxy failover on dbproxy1019 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy [20:11:52] (03PS2) 10Dzahn: add fake SSL key for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539410 (https://phabricator.wikimedia.org/T233654) [20:12:06] 10Operations, 10serviceops, 10Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (10Dzahn) https://gerrit.wikimedia.org/r/c/labs/private/+/539399/ [20:13:29] (03CR) 10Dzahn: [V: 03+2 C: 03+2] add fake SSL key for wtp2001.codfw.wmnet [labs/private] - 10https://gerrit.wikimedia.org/r/539410 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn) [20:14:48] (03PS1) 10Ayounsi: Homer, add secondary server for sync [puppet] - 10https://gerrit.wikimedia.org/r/539411 
(https://phabricator.wikimedia.org/T228388) [20:17:04] (03PS1) 10Papaul: DNS: Add mgmt and production DNS for ms-be205[1-6] [dns] - 10https://gerrit.wikimedia.org/r/539412 [20:17:32] (03CR) 10jerkins-bot: [V: 04-1] DNS: Add mgmt and production DNS for ms-be205[1-6] [dns] - 10https://gerrit.wikimedia.org/r/539412 (owner: 10Papaul) [20:21:20] (03PS2) 10Papaul: DNS: Add mgmt and production DNS for ms-be205[1-6] [dns] - 10https://gerrit.wikimedia.org/r/539412 [20:21:38] (03PS10) 10Dzahn: parsoid: introduce parameter to use parsoid/PHP [puppet] - 10https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) [20:22:11] (03CR) 10Dzahn: DNS: Add mgmt and production DNS for ms-be205[1-6] (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/539412 (owner: 10Papaul) [20:30:32] (03CR) 10Dzahn: [C: 03+1] DNS: Add mgmt and production DNS for ms-be205[1-6] [dns] - 10https://gerrit.wikimedia.org/r/539412 (owner: 10Papaul) [20:32:29] (03CR) 10Papaul: [C: 03+2] DNS: Add mgmt and production DNS for ms-be205[1-6] [dns] - 10https://gerrit.wikimedia.org/r/539412 (owner: 10Papaul) [20:33:19] RECOVERY - Check systemd state on labsdb1010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:40:49] 10Operations, 10ops-codfw, 10media-storage, 10Patch-For-Review: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [20:40:55] (03PS6) 10Jforrester: MWConfigCacheGenerator: Provide getCachableMWConfig() which doesn't rely on wgConf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538078 [20:40:57] (03PS16) 10Jforrester: Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602) [20:40:59] (03PS3) 10Jforrester: [WiP] YAML files for every wiki, and a basic inheritance tree [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/538354 [20:41:01] (03PS1) 10Jforrester: [WIP] Demonstration move of select dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539414 [21:00:34] 10Operations, 10ops-codfw, 10media-storage, 10Patch-For-Review: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [21:01:54] (03PS4) 10Alexandros Kosiaris: rsyslog: Support adding metadata to input, default to off [puppet] - 10https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200) [21:01:56] (03PS4) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [21:08:19] is it just me or gerrit rest api is returning invalid json? [21:08:22] curl "https://gerrit.wikimedia.org/r/changes/?q=status:open&n=2" [21:08:47] (03PS5) 10Alexandros Kosiaris: rsyslog: Support adding metadata to input, default to off [puppet] - 10https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200) [21:08:49] (03PS5) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [21:08:53] (03PS1) 10Alexandros Kosiaris: rsyslog::input::file: Fix regular expression [puppet] - 10https://gerrit.wikimedia.org/r/539418 [21:08:55] (03PS1) 10Alexandros Kosiaris: rsyslog: Support adding cee tag to input file [puppet] - 10https://gerrit.wikimedia.org/r/539419 (https://phabricator.wikimedia.org/T207200) [21:09:51] (03PS2) 10Ayounsi: Homer, remove rsync [puppet] - 10https://gerrit.wikimedia.org/r/539411 (https://phabricator.wikimedia.org/T228388) [21:11:05] Krinkle: Thoughts about me proceeding with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/538078 ? 
[21:11:41] (03PS3) 10Jforrester: CommonSettings: Switch from getMWConfigForCacheing to getCachableMWConfig to avoid wgConf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538342 [21:11:53] (03PS3) 10Jforrester: Drop getMWConfigForCacheing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538343 [21:11:54] dmaza hi, that's not invalid. [21:12:04] you need to strip out the top line [21:12:07] it's done on purpose [21:12:32] right ok. Why is it done on purpose? [21:13:19] (03PS6) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [21:14:16] dmaza see https://gerrit-review.googlesource.com/Documentation/rest-api.html#output [21:14:23] "To prevent against Cross Site Script Inclusion (XSSI) attacks, the JSON response body starts with a magic prefix line that must be stripped before feeding the rest of the response body to a JSON parser:" [21:15:35] paladox: thank you very much [21:15:56] you're welcome :) [21:17:47] James_F: Hm.. not sure. I think there's at least several other references to wgConf outside of this that need to be accommodated. [21:18:06] Including a few uses of looking at other wikis' configuration. [21:18:28] outside wmf-config, that is. [21:18:50] Really? Hmm. [21:19:05] Yeah, it's a long tail of silly things like that. [21:19:09] This doesn't stop us having SiteConfiguration, it just moves it out of the critical path. [21:19:39] (03CR) 10Ayounsi: [C: 03+2] Homer, remove rsync [puppet] - 10https://gerrit.wikimedia.org/r/539411 (https://phabricator.wikimedia.org/T228388) (owner: 10Ayounsi) [21:19:42] Well, yes, but if we still need to populate and generate it fully for compat, I'd rather have 1 thing to test and worry about than two. [21:19:46] If nothing else, the Cirrus tests rely on it. [21:19:56] Ah well, we know how much we love those. [21:20:00] We already have the Cirrus tests use an old fork of the code. 
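A minimal Python sketch of the prefix stripping paladox describes above (21:12 to 21:14), assuming the magic prefix is the `)]}'` line given in the Gerrit REST API documentation linked in the log. The function name `parse_gerrit_json` and the sample body are hypothetical and made up for illustration; no real response from gerrit.wikimedia.org is reproduced here.

```python
import json

# Magic prefix line that Gerrit puts in front of JSON bodies to prevent XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Strip the anti-XSSI prefix (if present) and parse the rest as JSON."""
    if body.startswith(XSSI_PREFIX):
        # Drop only the prefix itself; json.loads ignores the newline
        # (leading whitespace) that follows it.
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


# Hypothetical body shaped like a /changes/ reply; the values are invented.
sample = ")]}'\n[{\"_number\": 12345, \"status\": \"NEW\"}]"
changes = parse_gerrit_json(sample)
print(changes[0]["_number"])
```

Feeding the raw body straight to a JSON parser fails on the first character, which is why the output of the `curl` call at 21:08:22 looked like invalid JSON.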
[21:20:02] ;-) [21:20:04] There's a couple of things in core and in extensions [21:20:31] I don't see a plausible path to pre-generated config with a SiteConfiguration object as the initial store. [21:20:32] We'd be in a better position if we got rid of the '+'/array_merge feature, at least for wmf's input to wgConf. Not sure how feasible that is, but it would be really nice [21:20:44] James_F: agreed. [21:20:55] But there's nothing to stop us keeping it later. [21:21:13] I want to have as little negative impact as possible on the current InitialiseSettings.php world until the switch. [21:21:25] Yes [21:22:13] Getting rid of + syntax would be quite disruptive. [21:22:30] Can we take a brief break and summarise the how/what on the task for some wider input? I know not many people seem to care, but I also don't want us to go at this alone, or at least for me to take more dedicated time to look at it. [21:22:34] Once we're all in YAML files we can't use the + syntax anyway. [21:22:43] Sure. [21:25:04] yeah, letting core/extension defaults shine through is okay (and inevitable, it's outside wmf-config control). But letting the subset of vars that we override dynamically change their override based on partial inheritance and merging is... problematic, so yeah, that's gotta go one way or another so that extract() can remain clean [21:26:04] It's also kind of buggy now in that if InitialiseSettings.php does 'wgFoo': '+enwiki' => [ 'x' => 'y' ], and you change core from ['a' => 1] to ['a' => 2], I don't think our current /tmp cache invalidation would detect that [21:26:16] so it will keep expanding wgFoo as ['a' => 1, 'x' => 'y' ] [21:26:20] unless and until IS.php is touched [21:26:29] .. which maybe scap does always, so that might be fine, [21:26:36] but still far from pretty [21:26:59] Anyway, gotta finish up some things now and then dinner. 
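The stale-cache concern raised at 21:26 can be illustrated with a toy model. This is a sketch in Python, not MediaWiki's SiteConfiguration code: the functions `expand` and `cached_expand`, the integer `overrides_mtime`, and the in-memory `cache` dict are all hypothetical stand-ins, and the assumption that the cache is keyed only on the overrides file is taken from the "I don't think ... would detect that" remark, which the speaker himself hedges.

```python
def expand(core_defaults, overrides, wiki):
    """Toy '+' merge: a '+wiki' key is merged over the core default
    instead of replacing it (roughly what array_merge would do)."""
    result = {}
    for name, per_wiki in overrides.items():
        if "+" + wiki in per_wiki:
            result[name] = {**core_defaults.get(name, {}), **per_wiki["+" + wiki]}
        elif wiki in per_wiki:
            result[name] = per_wiki[wiki]
    return result


cache = {}


def cached_expand(core_defaults, overrides, overrides_mtime, wiki):
    # Assumption: the key covers only the overrides file, so a change to
    # core_defaults alone never invalidates the cached expansion.
    key = (wiki, overrides_mtime)
    if key not in cache:
        cache[key] = expand(core_defaults, overrides, wiki)
    return cache[key]


overrides = {"wgFoo": {"+enwiki": {"x": "y"}}}
first = cached_expand({"wgFoo": {"a": 1}}, overrides, 100, "enwiki")
# Core default changes to a=2, overrides file untouched: stale result.
stale = cached_expand({"wgFoo": {"a": 2}}, overrides, 100, "enwiki")
# Overrides file touched (new mtime): the expansion is recomputed.
fresh = cached_expand({"wgFoo": {"a": 2}}, overrides, 101, "enwiki")
print(stale, fresh)
```

In this model the stale value persists until the overrides file's modification time changes, which matches the "unless and until IS.php is touched" caveat in the log.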
[21:27:08] ttyl :) - Nice work on the Cirrus configs
[21:29:50] (PS1) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[21:32:15] (CR) jerkins-bot: [V: -1] openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[21:34:06] (PS2) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[21:42:28] (PS3) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[21:52:50] (PS4) Jforrester: Remove unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537166 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[21:53:38] (CR) Jforrester: [C: +2] Remove unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537166 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[21:54:31] (Merged) jenkins-bot: Remove unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537166 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[21:56:17] (CR) BryanDavis: [C: +1] "Basically a copy-and-paste from urlproxy" [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709) (owner: Andrew Bogott)
[21:56:35] (CR) jenkins-bot: Remove unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537166 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[21:57:00] (PS3) Jforrester: Remove more unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537241 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[21:57:32] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T228547 Stop setting wgTexvc, wgMathTexvcCheckExecutable, wgMathCheckFiles (unused) (duration: 01m 00s)
[21:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:38] T228547: Remove unused configuration variables for Math Extension from codebase - https://phabricator.wikimedia.org/T228547
[21:58:45] Operations, Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (Krenair)
[21:58:50] (CR) Jforrester: [C: +2] Remove more unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537241 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[21:59:00] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T228547 Stop setting wgMathFileBackend, wgMathPath, wgMathDirectory (unused) (duration: 00m 56s)
[21:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:48] (Merged) jenkins-bot: Remove more unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537241 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[22:00:07] (CR) jenkins-bot: Remove more unused math config [mediawiki-config] - https://gerrit.wikimedia.org/r/537241 (https://phabricator.wikimedia.org/T228547) (owner: Physikerwelt)
[22:02:39] !log jforrester@deploy1001 Synchronized wmf-config/filebackend.php: T228547 Stop sharding wgFileBackends shardViaHashLevels for math-render (duration: 00m 56s)
[22:02:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:02:43] T228547: Remove unused configuration variables for Math Extension from codebase - https://phabricator.wikimedia.org/T228547
[22:03:12] Operations, Math, MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), Patch-For-Review: Remove unused configuration variables for Math Extension from codebase - https://phabricator.wikimedia.org/T228547 (Jdforrester-WMF) Open→Resolved a:Jdforrester-WMF Deployed. Thank you.
[22:03:15] Operations, Math: Clean up artifacts from LaTeX based math rendering - https://phabricator.wikimedia.org/T195847 (Jdforrester-WMF)
[22:03:56] (PS3) Andrew Bogott: novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709)
[22:04:24] (CR) Dzahn: "this version compiles now with no change on an unrelated wtp and this on wtp2001: https://puppet-compiler.wmflabs.org/compiler1002/18613/" [puppet] - https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) (owner: Dzahn)
[22:05:00] (Abandoned) Jforrester: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) (owner: Zoranzoki21)
[22:06:16] (PS4) Jforrester: Enable emails for certain notification types by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) (owner: Catrope)
[22:07:26] (PS2) Jforrester: Remove $wgPageTriageNoIndexTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/523295 (owner: MaxSem)
[22:07:33] (PS4) Andrew Bogott: novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709)
[22:07:57] (CR) Jforrester: [C: +2] "Dependency has been live everywhere for two months." [mediawiki-config] - https://gerrit.wikimedia.org/r/523295 (owner: MaxSem)
[22:08:41] (CR) Jforrester: [C: +2] Enable emails for certain notification types by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) (owner: Catrope)
[22:09:04] (Merged) jenkins-bot: Remove $wgPageTriageNoIndexTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/523295 (owner: MaxSem)
[22:09:15] (PS5) Jforrester: Create new http://www.mediawiki.org/xml/sitelist-1.1/ to reference sitelist-1.1.xsd [mediawiki-config] - https://gerrit.wikimedia.org/r/508130 (https://phabricator.wikimedia.org/T222516) (owner: Luca Mauri)
[22:09:54] (PS3) Jforrester: Add some HIDPI Wikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/529464 (https://phabricator.wikimedia.org/T230114) (owner: Jc86035)
[22:09:56] (PS4) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[22:11:16] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:11:31] (CR) jenkins-bot: Remove $wgPageTriageNoIndexTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/523295 (owner: MaxSem)
[22:11:37] (CR) Jforrester: "I don't understand the purpose of this change. Just point the wikis at the enwikivoyage logos?" [mediawiki-config] - https://gerrit.wikimedia.org/r/529464 (https://phabricator.wikimedia.org/T230114) (owner: Jc86035)
[22:11:39] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Stop setting wgPageTriageNoIndexTemplates, never read (duration: 00m 57s)
[22:11:39] (CR) jerkins-bot: [V: -1] Add some HIDPI Wikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/529464 (https://phabricator.wikimedia.org/T230114) (owner: Jc86035)
[22:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:11:52] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 72, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:12:08] (PS5) Jforrester: Enable emails for certain notification types by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) (owner: Catrope)
[22:12:14] (CR) Jforrester: [C: +2] "…" [mediawiki-config] - https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) (owner: Catrope)
[22:13:02] (PS5) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[22:13:14] (CR) Jforrester: "Is this going anywhere? Seems reasonable (but you should move this to the static tests file, much faster as you don't need the rest of the" [mediawiki-config] - https://gerrit.wikimedia.org/r/529047 (https://phabricator.wikimedia.org/T230103) (owner: Urbanecm)
[22:13:24] (Merged) jenkins-bot: Enable emails for certain notification types by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) (owner: Catrope)
[22:13:44] (PS5) Andrew Bogott: novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709)
[22:14:22] (CR) jenkins-bot: Enable emails for certain notification types by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) (owner: Catrope)
[22:15:49] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T211620 Enable emails for certain notification types by default on officewiki (duration: 00m 56s)
[22:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:53] T211620: Change some default Notification settings on officewiki - https://phabricator.wikimedia.org/T211620
[22:16:25] (PS6) Andrew Bogott: novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709)
[22:16:57] (CR) Jforrester: [C: -1] "Per my last. If there's a simple way to exempt "temporary" logos, I'd love to get this (re-written into a static test and) merged." [mediawiki-config] - https://gerrit.wikimedia.org/r/521281 (https://phabricator.wikimedia.org/T227419) (owner: Urbanecm)
[22:18:01] (CR) Jforrester: "> Patch Set 4:" [mediawiki-config] - https://gerrit.wikimedia.org/r/508476 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[22:20:09] (PS7) Andrew Bogott: novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709)
[22:21:12] (Abandoned) Jforrester: Enable FlaggedRevisions on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/507932 (https://phabricator.wikimedia.org/T221933) (owner: 星耀晨曦)
[22:21:56] (CR) Dzahn: "questions i have: has_lvs is disabled but webserver has_tls is enabled. I added real and fake mcrouter certs. but what about the real equi" [puppet] - https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) (owner: Dzahn)
[22:22:38] (CR) Jforrester: "This new global is landed in master and will ship in 1.34.0-wmf.25; should we deploy this now, or wait for the train?" [mediawiki-config] - https://gerrit.wikimedia.org/r/538867 (https://phabricator.wikimedia.org/T232986) (owner: KartikMistry)
[22:23:05] (CR) Dzahn: "see all the new resources in https://puppet-compiler.wmflabs.org/compiler1002/18613/wtp2001.codfw.wmnet/ and also which get removed due to" [puppet] - https://gerrit.wikimedia.org/r/539181 (https://phabricator.wikimedia.org/T233654) (owner: Dzahn)
[22:26:03] (Abandoned) Jforrester: Set wgGEConfirmEmailEnabled to false for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/538100 (https://phabricator.wikimedia.org/T233363) (owner: Urbanecm)
[22:26:24] (CR) BryanDavis: [C: +1] "LGTM. There is an existing blocked UA in the domainproxy template that could be moved into hiera config with this patch, but that can be a" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709) (owner: Andrew Bogott)
[22:28:30] Operations, Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (Dzahn) a:Dzahn→None
[22:28:35] (CR) Bstorm: [C: +1] "Kind of what I imagine. I wish the compiler was more useful for templating. https://puppet-compiler.wmflabs.org/compiler1002/18624/proxy-" [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709) (owner: Andrew Bogott)
[22:29:41] (PS15) Jforrester: Update HD logo for wikisource using default [mediawiki-config] - https://gerrit.wikimedia.org/r/529175 (owner: Viztor)
[22:29:56] (CR) Jforrester: "PS15: Rebased." [mediawiki-config] - https://gerrit.wikimedia.org/r/529175 (owner: Viztor)
[22:30:28] (CR) Jforrester: "Is this ready to ship?" [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222) (owner: Kosta Harlan)
[22:30:51] (PS3) Jforrester: Echo: Enable poll for updates feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222) (owner: Kosta Harlan)
[22:31:34] (CR) Bstorm: sssd: Add a whole duplicate hierarchy of sssd images (1 comment) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/536692 (https://phabricator.wikimedia.org/T229058) (owner: Bstorm)
[22:31:59] (PS6) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[22:34:05] (CR) Bstorm: "So overall, I think I'll swap toollabs for toolforge in this, which would allow me to drop the "sssd" portion, really. I can add runtime " [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/536692 (https://phabricator.wikimedia.org/T229058) (owner: Bstorm)
[22:35:47] (CR) Jforrester: "Should we just run this with "periodical" at first, and then specialise with messages once we're sure it works?" [mediawiki-config] - https://gerrit.wikimedia.org/r/530871 (https://phabricator.wikimedia.org/T78711) (owner: Umherirrender)
[22:37:51] Operations, Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (Dzahn) @HMarcus I am not really the right person for this. I can say though that "lawroom" only shows up twice in recent logs on the main production mail server and there it says "DKIM: d=lawroom.com .. [ve...
[22:40:08] (PS7) Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907)
[22:45:35] (CR) Jhedden: "PCC results: https://puppet-compiler.wmflabs.org/compiler1002/18626/" [puppet] - https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[22:46:08] (Abandoned) Jforrester: Even more invariant config moved over to CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/512418 (owner: Jforrester)
[22:50:04] Operations, serviceops, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn) @Joe @jijiki This change compiles now. On wtp2001 it adds all these resources: https://puppet-compiler.wmflabs.org/compiler1002/18613/wtp2001.codfw.wmnet/ and...
[22:50:40] (PS4) Jforrester: [WiP] YAML files for every wiki, and a basic inheritance tree [mediawiki-config] - https://gerrit.wikimedia.org/r/538354
[22:51:16] (CR) Jforrester: "PS4: Fixed the accidental squash of child into this commit." [mediawiki-config] - https://gerrit.wikimedia.org/r/538354 (owner: Jforrester)
[22:55:06] (PS1) Papaul: DHCP: Add MAC address entries for ms-be205[1-6] [puppet] - https://gerrit.wikimedia.org/r/539432 (https://phabricator.wikimedia.org/T233638)
[23:00:05] MaxSem, RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190926T2300). Please do the needful.
[23:00:05] No GERRIT patches in the queue for this window AFAICS.
[23:05:10] (CR) Papaul: [C: +2] DHCP: Add MAC address entries for ms-be205[1-6] [puppet] - https://gerrit.wikimedia.org/r/539432 (https://phabricator.wikimedia.org/T233638) (owner: Papaul)
[23:19:21] (PS5) Jforrester: Static configuration: Provide basic YAML files for each wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538354
[23:19:43] (PS6) Jforrester: Static configuration: Provide basic YAML files for each wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538354
[23:19:49] (CR) Jforrester: [C: +2] Static configuration: Provide basic YAML files for each wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538354 (owner: Jforrester)
[23:22:48] (Merged) jenkins-bot: Static configuration: Provide basic YAML files for each wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538354 (owner: Jforrester)
[23:24:12] (PS7) Jforrester: MWConfigCacheGenerator: Provide getCachableMWConfig() which doesn't rely on wgConf [mediawiki-config] - https://gerrit.wikimedia.org/r/538078
[23:24:25] (PS17) Jforrester: Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602)
[23:26:22] (CR) jerkins-bot: [V: -1] Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602) (owner: Jforrester)
[23:29:00] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:29:38] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 74, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:30:02] PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[23:30:32] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[23:31:40] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[23:32:10] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[23:33:37] (CR) jenkins-bot: Static configuration: Provide basic YAML files for each wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538354 (owner: Jforrester)
[23:38:22] (PS8) Jforrester: MWConfigCacheGenerator: Provide getCachableMWConfig() which doesn't rely on wgConf [mediawiki-config] - https://gerrit.wikimedia.org/r/538078
[23:38:24] (PS18) Jforrester: Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602)
[23:38:26] (PS5) Jforrester: [WIP] Provide for YAML-based inherited configuration to eventually replace InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/538129
[23:38:30] (PS1) Jforrester: config: Also provide a config file for nqowiki, new since generated [mediawiki-config] - https://gerrit.wikimedia.org/r/539433
[23:39:10] (CR) Jforrester: [C: +2] config: Also provide a config file for nqowiki, new since generated [mediawiki-config] - https://gerrit.wikimedia.org/r/539433 (owner: Jforrester)
[23:39:53] Operations, Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (HMarcus) Thanks @Dzahn , appreciate the input from the logs. Will wait for confirmation from @herron but it sounds like the bottleneck doesn't live on your end.
[23:40:03] (CR) jerkins-bot: [V: -1] [WIP] Provide for YAML-based inherited configuration to eventually replace InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/538129 (owner: Jforrester)
[23:40:13] (Merged) jenkins-bot: config: Also provide a config file for nqowiki, new since generated [mediawiki-config] - https://gerrit.wikimedia.org/r/539433 (owner: Jforrester)
[23:42:28] (CR) jenkins-bot: config: Also provide a config file for nqowiki, new since generated [mediawiki-config] - https://gerrit.wikimedia.org/r/539433 (owner: Jforrester)