[00:04:30] !log install1002 - exported indices for new scap version - copied back from buster to stretch - upgraded scap version on mw2250 - scap pull now works and starts to rsync (T228482, T228328, T226948)
[00:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:04:43] T226948: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948
[00:04:43] T228482: Deploy scap 3.11.1-1 - https://phabricator.wikimedia.org/T228482
[00:04:44] T228328: 'scap pull' stopped working on appservers ? - https://phabricator.wikimedia.org/T228328
[00:05:18] Operations, ops-codfw, serviceops, User-jijiki: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948 (Dzahn) the above was after "19:50 < mutante> !log built new scap version 3.11.1-1 on boron, copied to install1002, imported package with reprepro, copied from stretch to jessie and...
[00:07:49] RECOVERY - puppet last run on mw2250 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[00:08:25] RECOVERY - PHP7 rendering on mw2250 is OK: HTTP OK: HTTP/1.1 200 OK - 327 bytes in 0.074 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:09:03] ^ yay
[00:09:12] that is with new scap
[00:10:06] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
[00:10:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:11:43] Operations, ops-codfw, serviceops, User-jijiki: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948 (Dzahn) Stalled→Resolved 20:08 <+icinga-wm> RECOVERY - PHP7 rendering on mw2250 is OK: HTTP OK: HTTP/1.1 200 OK - 327 bytes in 0.074 second response time 20:10 <+logmsgbot...
[00:12:57] (PS1) Andrew Bogott: bootstrap-vz: remove a few packages from the buster base image [puppet] - https://gerrit.wikimedia.org/r/524387
[00:13:01] RECOVERY - mediawiki-installation DSH group on mw2250 is OK: OK https://wikitech.wikimedia.org/wiki/Application_servers%23Apache_setup_checklist
[00:13:12] (PS2) Andrew Bogott: bootstrap-vz: remove a few packages from the buster base image [puppet] - https://gerrit.wikimedia.org/r/524387
[00:13:52] (CR) Andrew Bogott: [C: +2] bootstrap-vz: remove a few packages from the buster base image [puppet] - https://gerrit.wikimedia.org/r/524387 (owner: Andrew Bogott)
[00:15:21] Operations: Host mw2250 is not in mediawiki-installation dsh group - https://phabricator.wikimedia.org/T227547 (Dzahn) server could be synced again because a new scap version was deployed (T228482) which fixes scap pull (T228328). after doing a scap pull this happens automatically: 20:13 <+icinga-wm> RECOV...
[00:15:42] Operations: Host mw2250 is not in mediawiki-installation dsh group - https://phabricator.wikimedia.org/T227547 (Dzahn) the issue is resolved, keeping it open to improve the docs
[00:15:48] Operations: Host mw2250 is not in mediawiki-installation dsh group - https://phabricator.wikimedia.org/T227547 (Dzahn) a:Dzahn
[00:24:00] (PS9) Jeena Huneidi: Add mediawiki development chart. [deployment-charts] - https://gerrit.wikimedia.org/r/522584 (https://phabricator.wikimedia.org/T224935)
[00:28:07] PROBLEM - puppet last run on mwmaint2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[00:36:42] runs puppet on mwmaint2001
[00:37:31] uh oh. errors are real and it fails to remove a couple directories
[00:38:51] !log mwmaint2001 - puppet fails - not removing a bunch of log dirs for maintenance crons
[00:38:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:25] RECOVERY - puppet last run on mwmaint2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[00:39:59] nah, they were just existing warnings
[00:41:39] Hey - was going to scap out another /private change. Please let me know if I shouldn't.
[00:42:52] no concerns from me, there is a new scap version but it's not rolled out yet except to 1 server
[00:42:57] if that's why you asked
[00:44:08] i am gonna say this is the time with the least coverage though
[00:45:53] this will affect like 6 smaller projects
[00:46:39] (CR) Dzahn: profile::mediawiki::jobrunner: Enable feature flags (1 comment) [puppet] - https://gerrit.wikimedia.org/r/523908 (owner: Effie Mouzeli)
[00:52:02] !log sbassett@deploy1001 Synchronized private/PrivateSettings.php: Add even more severe rate limits for eswikiquote and some other, smaller wikis (T227416) (duration: 00m 58s)
[00:52:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:03:03] PROBLEM - puppet last run on an-worker1084 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[01:10:33] RECOVERY - High lag on wdqs1010 is OK: (C)3600 ge (W)1200 ge 788.8 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[01:25:41] RECOVERY - puppet last run on an-worker1084 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[01:52:35] !log enable outbound sampling on eqiad's router
[01:52:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:00:08] (PS1) Andrew Bogott: bootstrap-vz: prune out a couple of extra firstboot steps [puppet] - https://gerrit.wikimedia.org/r/524392
[02:06:48] (PS2) Andrew Bogott: bootstrap-vz: prune out a couple of extra firstboot steps [puppet] - https://gerrit.wikimedia.org/r/524392
[02:14:00] (CR) Andrew Bogott: [C: +2] bootstrap-vz: prune out a couple of extra firstboot steps [puppet] - https://gerrit.wikimedia.org/r/524392 (owner: Andrew Bogott)
[03:20:43] RECOVERY - Check systemd state on mw2250 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[03:25:37] RECOVERY - Check the last execution of php7.2-fpm_check_restart on mw2250 is OK: OK: Status of the systemd unit php7.2-fpm_check_restart https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:33:59] Puppet, cloud-services-team (Kanban): Help people remember to merge labs/private git - https://phabricator.wikimedia.org/T228443 (CDanis) I updated the docs: https://wikitech.wikimedia.org/w/index.php?title=Puppet&type=revision&diff=1833007&oldid=1828050
[04:20:29] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:52:20] Operations, serviceops, Patch-For-Review, Release-Engineering-Team-TODO (201907), Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (fsero) I did a complete pull of all images and tags of our...
[04:52:29] Operations, serviceops, Patch-For-Review, Release-Engineering-Team-TODO (201907), Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (fsero) p:High→Normal
[05:08:13] (PS1) Marostegui: Revert "db-codfw.php: Depool db2116" [mediawiki-config] - https://gerrit.wikimedia.org/r/524398
[05:09:20] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool db2116" [mediawiki-config] - https://gerrit.wikimedia.org/r/524398 (owner: Marostegui)
[05:10:13] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2116" [mediawiki-config] - https://gerrit.wikimedia.org/r/524398 (owner: Marostegui)
[05:10:28] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2116" [mediawiki-config] - https://gerrit.wikimedia.org/r/524398 (owner: Marostegui)
[05:10:48] (PS7) Marostegui: mariadb: Provision dbproxy2001 into codfw m1 [puppet] - https://gerrit.wikimedia.org/r/518251 (https://phabricator.wikimedia.org/T202367)
[05:11:26] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2116 (duration: 00m 55s)
[05:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:12:09] Operations, SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (fsero) @cchen as stated in https://wikitech.wikimedia.org/wiki/Production_shell_access we need your public SSH key...
[05:12:21] Operations, SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (fsero) p:Triage→Normal
[05:26:18] !log repool ms-fe2005 - T228196
[05:26:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:25] T228196: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196
[05:27:08] (PS17) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[05:31:07] (PS18) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[05:38:09] (PS19) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[05:43:06] (PS4) Vgutierrez: fifo_log_demux: Provide pipe creation capabilities [puppet] - https://gerrit.wikimedia.org/r/524176 (https://phabricator.wikimedia.org/T228382)
[05:43:08] (PS20) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[05:56:39] (PS1) Fsero: admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524404 (https://phabricator.wikimedia.org/T227633)
[05:58:51] (PS2) Fsero: admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524404 (https://phabricator.wikimedia.org/T227633)
[06:00:31] (CR) Fsero: [C: +2] admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524404 (https://phabricator.wikimedia.org/T227633) (owner: Fsero)
[06:02:16] (PS21) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[06:02:26] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003, and notebook1004] and groups for Mayakpwiki - https://phabricator.wikimedia.org/T227633 (fsero) @kzimmerman @Mayakp.wiki done, feel free to reopen if you find any issues.
[06:02:33] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003, and notebook1004] and groups for Mayakpwiki - https://phabricator.wikimedia.org/T227633 (fsero) Open→Resolved a:fsero
[06:02:36] if anyone's around, I'd appreciate a sanity check on https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/524403 to get some more debug logs for a UBN
[06:07:18] (PS22) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[06:08:13] (PS23) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[06:08:45] PROBLEM - puppet last run on bast2002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:10:11] (PS1) Fsero: admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524405 (https://phabricator.wikimedia.org/T227633)
[06:10:29] (CR) jerkins-bot: [V: -1] admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524405 (https://phabricator.wikimedia.org/T227633) (owner: Fsero)
[06:10:56] (PS24) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[06:11:59] (PS2) Fsero: admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524405 (https://phabricator.wikimedia.org/T227633)
[06:12:17] (CR) jerkins-bot: [V: -1] admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524405 (https://phabricator.wikimedia.org/T227633) (owner: Fsero)
[06:12:17] PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:12:45] PROBLEM - puppet last run on notebook1004 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:13:30] (PS3) Fsero: admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524405 (https://phabricator.wikimedia.org/T227633)
[06:14:13] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:14:37] (CR) Fsero: [C: +2] admin: creates Mayakpwiki shell access and membership to groups [puppet] - https://gerrit.wikimedia.org/r/524405 (https://phabricator.wikimedia.org/T227633) (owner: Fsero)
[06:14:57] (PS25) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[06:15:01] PROBLEM - puppet last run on people1001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:15:20] !log clear opcache on mwdebug*
[06:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:29] PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:23:09] RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:23:35] RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:28:17] (CR) Vgutierrez: "pcc is happy https://puppet-compiler.wmflabs.org/compiler1002/17492/ and our traffic-ncredir instance in labs is running fine as well" [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382) (owner: Vgutierrez)
[06:30:26] !log legoktm@deploy1001 Synchronized php-1.34.0-wmf.14/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: T225199 (duration: 00m 55s)
[06:30:27] PROBLEM - puppet last run on db1132 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.lookup.d/lookup_table_output.json] https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:30:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:34] T225199: Fatal error during RecentChange::notifyEdit (deferred update) from ORES/RecentChangeSaveHookHandler - https://phabricator.wikimedia.org/T225199
[06:30:35] PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/maven/ivysettings.xml] https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:30:57] (PS4) Fsero: Capture calico deployment in code. [deployment-charts] - https://gerrit.wikimedia.org/r/523580 (https://phabricator.wikimedia.org/T227775)
[06:32:01] (CR) Fsero: [V: +2 C: +2] Capture calico deployment in code. [deployment-charts] - https://gerrit.wikimedia.org/r/523580 (https://phabricator.wikimedia.org/T227775) (owner: Fsero)
[06:32:14] !log legoktm@deploy1001 Synchronized php-1.34.0-wmf.13/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: T225199 (duration: 00m 55s)
[06:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:59] RECOVERY - puppet last run on bast2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:40:03] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: / (spec from root) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[06:40:59] RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:41:39] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[06:42:29] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:43:17] RECOVERY - puppet last run on people1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:46:02] (PS4) Muehlenhoff: Add DHCP config for idp1001 [puppet] - https://gerrit.wikimedia.org/r/524275 (https://phabricator.wikimedia.org/T228403)
[06:48:13] (CR) Muehlenhoff: [C: +2] Add DHCP config for idp1001 [puppet] - https://gerrit.wikimedia.org/r/524275 (https://phabricator.wikimedia.org/T228403) (owner: Muehlenhoff)
[06:58:03] (Abandoned) Muehlenhoff: Switch puppetdb1001/1002 to facter 3/puppet 5 [puppet] - https://gerrit.wikimedia.org/r/510171 (https://phabricator.wikimedia.org/T219803) (owner: Muehlenhoff)
[06:58:43] RECOVERY - puppet last run on db1132 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:58:49] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[07:01:47] !log depool wdqs2004 from all services (waiting for maintenance)
[07:01:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:18] !log reloading dbproxy1004/9
[07:02:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:57] RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[07:03:13] !log restart php-fpm on mw1330 - op-cache hit ratio low
[07:03:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:29] jijiki: --^ (I saved the opcache-info if you need it)
[07:03:49] RECOVERY - haproxy failover on dbproxy1009 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[07:09:37] (PS1) Vgutierrez: prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382)
[07:10:39] (CR) jerkins-bot: [V: -1] prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382) (owner: Vgutierrez)
[07:11:57] (PS2) Vgutierrez: prometheus: Collect ncredir nginx metrics [puppet] - https://gerrit.wikimedia.org/r/524409 (https://phabricator.wikimedia.org/T228382)
[07:16:17] (CR) Jcrespo: [C: +1] "Remember to upgrade switchover.py to HEAD" [puppet] - https://gerrit.wikimedia.org/r/523941 (https://phabricator.wikimedia.org/T228243) (owner: Marostegui)
[07:16:51] (PS1) Marostegui: mariadb: Promote db1104 to s8 master [puppet] - https://gerrit.wikimedia.org/r/524411 (https://phabricator.wikimedia.org/T227062)
[07:17:21] (CR) Marostegui: [C: -2] "Wait for failover day" [puppet] - https://gerrit.wikimedia.org/r/524411 (https://phabricator.wikimedia.org/T227062) (owner: Marostegui)
[07:18:31] (PS2) Marostegui: mariadb: Promote db1104 to s8 master [puppet] - https://gerrit.wikimedia.org/r/524411 (https://phabricator.wikimedia.org/T227062)
[07:20:42] (PS1) Marostegui: db-eqiad.php: Pool db1109 into API [mediawiki-config] - https://gerrit.wikimedia.org/r/524412 (https://phabricator.wikimedia.org/T227062)
[07:21:41] (PS1) Muehlenhoff: Add idp1001 to site.pp [puppet] - https://gerrit.wikimedia.org/r/524413
[07:25:01] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[07:25:03] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:25:07] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[07:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:08] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:49] (CR) Muehlenhoff: [C: +2] Add idp1001 to site.pp [puppet] - https://gerrit.wikimedia.org/r/524413 (owner: Muehlenhoff)
[07:37:11] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[07:38:19] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[07:38:20] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:55] !log rebooting tungsten for kernel update
[07:39:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:23] (CR) Filippo Giunchedi: [C: +1] "LGTM, having a pcc run would help too" [puppet] - https://gerrit.wikimedia.org/r/524274 (owner: Jbond)
[07:57:15] !log installing idp1001 T228403
[07:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:22] T228403: eqiad: One VM request for identity provider - https://phabricator.wikimedia.org/T228403
[07:57:33] elukey: thank you, in theory the service would be restarted today or tomorrow due to that
[07:59:34] I suspected that, should I leave them as warnings?
[07:59:47] I saw a couple of other ones today (not mwdebug, regular appservers)
[08:01:04] (CR) Filippo Giunchedi: "> Patch Set 4:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/524037 (owner: Ayounsi)
[08:01:44] (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/524288 (https://phabricator.wikimedia.org/T196066) (owner: Cwhite)
[08:06:04] (PS2) ArielGlenn: add a few more public sql tables to default list to be dumped [dumps] - https://gerrit.wikimedia.org/r/521565 (https://phabricator.wikimedia.org/T226167)
[08:07:35] (CR) Marostegui: [C: +2] db-eqiad.php: Pool db1109 into API [mediawiki-config] - https://gerrit.wikimedia.org/r/524412 (https://phabricator.wikimedia.org/T227062) (owner: Marostegui)
[08:08:30] (Merged) jenkins-bot: db-eqiad.php: Pool db1109 into API [mediawiki-config] - https://gerrit.wikimedia.org/r/524412 (https://phabricator.wikimedia.org/T227062) (owner: Marostegui)
[08:08:47] (CR) jenkins-bot: db-eqiad.php: Pool db1109 into API [mediawiki-config] - https://gerrit.wikimedia.org/r/524412 (https://phabricator.wikimedia.org/T227062) (owner: Marostegui)
[08:10:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Pool db1109 into API (duration: 00m 54s)
[08:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:22] (PS1) ArielGlenn: Revert "add a few more public sql tables to default list to be dumped": this was pushed directly by accident. Will be resubmitted the usual way through gerrit. [dumps] - https://gerrit.wikimedia.org/r/524437
[08:10:51] (CR) ArielGlenn: [C: +2] Revert "add a few more public sql tables to default list to be dumped": this was pushed directly by accident. Will be resubmitted the usual [dumps] - https://gerrit.wikimedia.org/r/524437 (owner: ArielGlenn)
[08:13:23] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[08:13:54] (PS13) Effie Mouzeli: profile::mediawiki::jobrunner: Enable feature flags [puppet] - https://gerrit.wikimedia.org/r/523908
[08:14:20] (CR) Effie Mouzeli: [V: +1] profile::mediawiki::jobrunner: Enable feature flags (1 comment) [puppet] - https://gerrit.wikimedia.org/r/523908 (owner: Effie Mouzeli)
[08:15:08] (PS1) ArielGlenn: add a few more public sql tables to default list to be dumped [dumps] - https://gerrit.wikimedia.org/r/524438 (https://phabricator.wikimedia.org/T226167)
[08:16:29] !log restart pybal on lvs2006
[08:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:19] (PS2) Effie Mouzeli: profile::mediawiki::jobrunner: Configure php7_only flag [puppet] - https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148)
[08:20:04] !log restart pybal on lvs2003
[08:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:04] (PS14) Effie Mouzeli: profile::mediawiki::jobrunner: Enable feature flags [puppet] - https://gerrit.wikimedia.org/r/523908
[08:21:27] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[08:22:57] !log repooling wdqs2003 - T228122
[08:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:04] T228122: DB reload for WDQS - https://phabricator.wikimedia.org/T228122
[08:24:02] !log repooling wdqs2004 - T228122
[08:24:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:17] Operations, Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (jcrespo)
[08:28:51] RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[08:30:36] (PS1) Muehlenhoff: Add contract end date for Maya [puppet] - https://gerrit.wikimedia.org/r/524473 (https://phabricator.wikimedia.org/T227633)
[08:34:56] (PS26) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[08:35:01] (CR) ArielGlenn: [C: +2] add a few more public sql tables to default list to be dumped [dumps] - https://gerrit.wikimedia.org/r/524438 (https://phabricator.wikimedia.org/T226167) (owner: ArielGlenn)
[08:36:24] !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer
[08:36:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:02] !log ariel@deploy1001 Started deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default
[08:37:06] !log ariel@deploy1001 Finished deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default (duration: 00m 04s)
[08:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:51] RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[08:39:57] (PS2) ArielGlenn: add more public sql tables to xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/521566 (https://phabricator.wikimedia.org/T226167)
[08:40:09] (CR) Muehlenhoff: [C: +2] Add contract end date for Maya [puppet] - https://gerrit.wikimedia.org/r/524473 (https://phabricator.wikimedia.org/T227633) (owner: Muehlenhoff)
[08:40:53] (PS27) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[08:41:17] (PS3) ArielGlenn: add more public sql tables to xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/521566 (https://phabricator.wikimedia.org/T226167)
[08:42:25] (CR) ArielGlenn: [C: +2] add more public sql tables to xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/521566 (https://phabricator.wikimedia.org/T226167) (owner: ArielGlenn)
[08:43:38] Operations, hardware-requests: eqiad+codfw: 6x hardware request for swift backend (each site) - https://phabricator.wikimedia.org/T227314 (fgiunchedi) Thanks @robh! Both tasks updated with racking and need-by info
[08:55:40] (PS28) Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382)
[08:55:49] Operations, Maps: Review sizing of maps cluster - https://phabricator.wikimedia.org/T228497 (Gehel)
[08:55:57] Operations, Maps: Review sizing of maps cluster - https://phabricator.wikimedia.org/T228497 (Gehel) p:Triage→High
[08:56:34] Operations, Maps, Wikimedia-Incident: Review sizing of maps cluster - https://phabricator.wikimedia.org/T228497 (Gehel)
[09:00:08] (PS1) Gehel: Revert "Disable replicate and admin cron in eqiad" [puppet] - https://gerrit.wikimedia.org/r/524475
[09:00:47] (PS2) Gehel: Revert "Disable replicate and admin cron in eqiad" [puppet] - https://gerrit.wikimedia.org/r/524475
[09:03:34] Operations, Cassandra, Goal, Patch-For-Review, and 2 others: Handle HBA controllers in get-raid-status-hpssacli - https://phabricator.wikimedia.org/T185216 (fgiunchedi) a:fgiunchedi→None >>! In T185216#5347580, @Eevans wrote: > @fgiunchedi is this still outstanding (and/or relevant)? Sti...
[09:13:35] Operations, Gerrit, Release-Engineering-Team-TODO, Patch-For-Review, Release-Engineering-Team (Development services): Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (hashar) >>! In T184086#5345246, @fgiunchedi wrote: > Sort of orthogonal, please consider also ad...
[09:16:33] (03CR) 10Ema: [C: 03+1] fifo_log_demux: Provide pipe creation capabilities [puppet] - 10https://gerrit.wikimedia.org/r/524176 (https://phabricator.wikimedia.org/T228382) (owner: 10Vgutierrez) [09:22:47] (03CR) 10Muehlenhoff: puppetmaster: Add the abbilty to have canary beckends (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [09:23:09] (03CR) 10Gehel: [C: 03+2] Revert "Disable replicate and admin cron in eqiad" [puppet] - 10https://gerrit.wikimedia.org/r/524475 (owner: 10Gehel) [09:26:23] (03CR) 10Ema: "Other than the inline comments, it could be that ncredirlog.sh isn't really necessary in this case. I've added a similar script for ATS (a" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382) (owner: 10Vgutierrez) [09:26:38] (03PS1) 10Alexandros Kosiaris: build_notes_url(): Don't quote if single url [puppet] - 10https://gerrit.wikimedia.org/r/524476 [09:27:52] (03PS22) 10Alexandros Kosiaris: nrpe/icinga: make notes_url a required parameter of nrpe::monitor_service [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) (owner: 10Dzahn) [09:28:37] 10Operations, 10Traffic, 10Patch-For-Review: Provide prometheus metrics for the ncredir service - https://phabricator.wikimedia.org/T228382 (10ema) p:05Triage→03Normal [09:28:56] (03CR) 10jerkins-bot: [V: 04-1] nrpe/icinga: make notes_url a required parameter of nrpe::monitor_service [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) (owner: 10Dzahn) [09:36:55] (03PS1) 10Fsero: Keeping in code what i did in boron for T228196. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/524478 (https://phabricator.wikimedia.org/T228196) [09:38:50] 10Operations, 10User-fgiunchedi: CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 (10fgiunchedi) >>! 
In T225713#5335975, @jcrespo wrote: > FYI, after applying the above change, I expected a huge shift on reported load (even if performance didn't change) or on temperatures, given this (wik... [09:38:51] (03PS2) 10Fsero: Keeping in code what i did in boron for T228196. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/524478 (https://phabricator.wikimedia.org/T228196) [09:39:31] (03CR) 10Fsero: [V: 03+2 C: 03+2] Keeping in code what i did in boron for T228196. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/524478 (https://phabricator.wikimedia.org/T228196) (owner: 10Fsero) [09:39:38] (03CR) 10Alexandros Kosiaris: [C: 03+1] Keeping in code what i did in boron for T228196. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/524478 (https://phabricator.wikimedia.org/T228196) (owner: 10Fsero) [09:39:54] 10Operations, 10Traffic: Upgrade Varnish to 5.1.3-1wm11 - https://phabricator.wikimedia.org/T227672 (10ema) 05Open→03Resolved [09:41:07] (03PS1) 10Muehlenhoff: piwik: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) [09:41:53] (03CR) 10Alexandros Kosiaris: [C: 03+1] piwik: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [09:43:11] (03PS2) 10Jbond: puppetmaster: add type checking ro puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/524274 [09:43:28] (03PS2) 10Jbond: puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [09:43:37] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/524274 (owner: 10Jbond) [09:43:44] (03CR) 10jerkins-bot: [V: 04-1] puppetmaster: add type checking ro puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/524274 (owner: 10Jbond) 
[09:43:53] (03PS3) 10Jbond: puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [09:44:10] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [09:44:21] (03PS2) 10Alexandros Kosiaris: Pass use_nodejs10 to proton [puppet] - 10https://gerrit.wikimedia.org/r/524353 (https://phabricator.wikimedia.org/T217114) (owner: 10MSantos) [09:44:38] (03CR) 10jerkins-bot: [V: 04-1] puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [09:44:41] (03CR) 10Alexandros Kosiaris: [C: 03+2] Pass use_nodejs10 to proton [puppet] - 10https://gerrit.wikimedia.org/r/524353 (https://phabricator.wikimedia.org/T217114) (owner: 10MSantos) [09:45:12] (03PS3) 10Jbond: puppetmaster: add type checking ro puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/524274 [09:45:17] (03PS4) 10Jbond: puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [09:49:29] (03PS1) 10Ema: ATS: use TLS to connect to analytics-tool hosts [puppet] - 10https://gerrit.wikimedia.org/r/524482 (https://phabricator.wikimedia.org/T210411) [09:59:11] 10Operations, 10ops-codfw, 10serviceops: (OoW) restbase2009 lockup - https://phabricator.wikimedia.org/T227408 (10jijiki) @Eevans Shall we mark restbase2009 as inactive on conftool? 
[10:00:57] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [10:01:04] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/524274 (owner: 10Jbond) [10:03:17] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/17494/" [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:07:26] (03CR) 10Muehlenhoff: "Best to read the LDAP server from the new Hiera variables, see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524479/" [puppet] - 10https://gerrit.wikimedia.org/r/523991 (owner: 10Dzahn) [10:07:41] (03CR) 10Muehlenhoff: "Best to read the LDAP server from the new Hiera variables, see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524479/" [puppet] - 10https://gerrit.wikimedia.org/r/523992 (owner: 10Dzahn) [10:07:51] (03CR) 10Muehlenhoff: "Best to read the LDAP server from the new Hiera variables, see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524479/" [puppet] - 10https://gerrit.wikimedia.org/r/523993 (owner: 10Dzahn) [10:08:03] (03CR) 10Muehlenhoff: "Best to read the LDAP server from the new Hiera variables, see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524479/" [puppet] - 10https://gerrit.wikimedia.org/r/523994 (owner: 10Dzahn) [10:08:13] (03CR) 10Muehlenhoff: "Best to read the LDAP server from the new Hiera variables, see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524479/" [puppet] - 10https://gerrit.wikimedia.org/r/523995 (owner: 10Dzahn) [10:08:34] (03PS2) 10Ema: restbase: add certificate for restbase.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/523956 (https://phabricator.wikimedia.org/T210411) [10:09:10] 10Operations, 10Continuous-Integration-Infrastructure, 10Packaging: PCC always has an ERROR when compiling for servers with profile::redis::slave - https://phabricator.wikimedia.org/T228266 (10jbond) [10:09:12] 
10Operations, 10puppet-compiler: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section - https://phabricator.wikimedia.org/T224977 (10jbond) [10:09:49] 10Operations, 10puppet-compiler: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section - https://phabricator.wikimedia.org/T224977 (10jbond) >>! In T224977#5265178, @Vgutierrez wrote: > After checking https://puppet-compiler.wmflabs.org/compiler1001/16855/ change error/war... [10:11:26] (03CR) 10Alexandros Kosiaris: [C: 04-1] puppetmaster: Add the abbilty to have canary beckends (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [10:11:48] (03CR) 10Ema: [C: 03+2] restbase: add certificate for restbase.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/523956 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [10:12:21] (03Abandoned) 10Alexandros Kosiaris: Enable discovery for termbox-test [dns] - 10https://gerrit.wikimedia.org/r/521459 (https://phabricator.wikimedia.org/T226814) (owner: 10Tarrow) [10:12:45] (03Abandoned) 10Alexandros Kosiaris: Introduce termbox-test LVS configuration [puppet] - 10https://gerrit.wikimedia.org/r/521449 (https://phabricator.wikimedia.org/T226814) (owner: 10Tarrow) [10:13:50] (03PS1) 10Muehlenhoff: puppetboard: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) [10:15:50] (03CR) 10Alexandros Kosiaris: [C: 04-1] Assign termbox-test.svc.{eqiad,codfw}.wmnet LVS IPs (037 comments) [dns] - 10https://gerrit.wikimedia.org/r/521456 (https://phabricator.wikimedia.org/T226814) (owner: 10Tarrow) [10:16:15] (03Abandoned) 10Alexandros Kosiaris: termbox: add Kubernetes stanzas for test [puppet] - 10https://gerrit.wikimedia.org/r/521452 (https://phabricator.wikimedia.org/T226814) (owner: 10Tarrow) [10:16:56] (03Abandoned) 10Alexandros Kosiaris: WIP DNM: Introduce 
wikibase-termbox chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/497262 (owner: 10Tarrow) [10:18:15] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/17497/" [puppet] - 10https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:20:34] (03PS5) 10Jbond: puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [10:20:42] (03PS1) 10Elukey: role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) [10:20:49] (03CR) 10Jbond: "check experimental" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [10:22:20] (03PS1) 10Muehlenhoff: debmonitor: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524489 (https://phabricator.wikimedia.org/T227650) [10:22:22] (03CR) 10Elukey: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/17498/matomo1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:22:52] (03CR) 10Alexandros Kosiaris: [C: 04-1] profile::mediawiki::jobrunner: Enable feature flags (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/523908 (owner: 10Effie Mouzeli) [10:23:09] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/17499/analytics-tool1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [10:23:34] (03CR) 10Alexandros Kosiaris: [C: 04-1] "I am too pendantic, I know. 
Otherwise LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/523908 (owner: 10Effie Mouzeli) [10:23:52] (03PS4) 10Jbond: puppetmaster: add type checking ro puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/524274 [10:23:54] (03PS15) 10Effie Mouzeli: profile::mediawiki::jobrunner: Enable feature flags [puppet] - 10https://gerrit.wikimedia.org/r/523908 [10:24:16] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/524274 (owner: 10Jbond) [10:24:44] (03PS2) 10Elukey: role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) [10:25:25] (03CR) 10Volans: [C: 03+1] "LGTM, one question inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:25:51] (03CR) 10Alexandros Kosiaris: [C: 03+1] profile::mediawiki::jobrunner: Enable feature flags [puppet] - 10https://gerrit.wikimedia.org/r/523908 (owner: 10Effie Mouzeli) [10:26:41] !log disable puppet on jobrunners for 523908 [10:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:47] (03CR) 10Effie Mouzeli: [C: 03+2] profile::mediawiki::jobrunner: Enable feature flags [puppet] - 10https://gerrit.wikimedia.org/r/523908 (owner: 10Effie Mouzeli) [10:28:53] (03CR) 10Muehlenhoff: puppetboard: Read LDAP servers from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:28:58] (03PS16) 10Effie Mouzeli: profile::mediawiki::jobrunner: Enable feature flags [puppet] - 10https://gerrit.wikimedia.org/r/523908 [10:29:15] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/17500/" [puppet] - 10https://gerrit.wikimedia.org/r/524489 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:30:17] (03CR) 10Volans: [C: 03+1] 
puppetboard: Read LDAP servers from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [10:30:42] (03CR) 10Alexandros Kosiaris: [C: 04-1] profile::mediawiki::jobrunner: Configure php7_only flag (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [10:30:51] (03CR) 10Ema: [C: 03+1] role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [10:31:01] (03PS29) 10Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - 10https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382) [10:32:18] (03CR) 10Vgutierrez: "> Patch Set 28:" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382) (owner: 10Vgutierrez) [10:32:32] (03PS3) 10Fsero: helmfile,k8s: Add a coredns deployment for DNS in-cluster service [deployment-charts] - 10https://gerrit.wikimedia.org/r/523722 (https://phabricator.wikimedia.org/T226516) [10:32:46] (03CR) 10Alexandros Kosiaris: [C: 04-1] puppetmaster: Add the abbilty to have canary beckends (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [10:35:18] !log enable puppet on jobrunners [10:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:26] 10Operations, 10Machine vision, 10serviceops, 10Reading-Infrastructure-Team-Backlog (Kanban), and 2 others: Update open_nsfw-- for Wikimedia production deployment - https://phabricator.wikimedia.org/T225664 (10Tgr) How is this related to {T214201}? It seems unnecessary to do both. 
[10:36:56] (03PS4) 10Fsero: helmfile,k8s: Add a coredns deployment for DNS in-cluster service [deployment-charts] - 10https://gerrit.wikimedia.org/r/523722 (https://phabricator.wikimedia.org/T226516) [10:38:02] (03CR) 10Fsero: "Updated with your comments, if its good to go i'll merge" [deployment-charts] - 10https://gerrit.wikimedia.org/r/523722 (https://phabricator.wikimedia.org/T226516) (owner: 10Fsero) [10:38:38] (03PS1) 10Muehlenhoff: Switch account consistency script to using the read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524491 [10:39:10] (03CR) 10jerkins-bot: [V: 04-1] Switch account consistency script to using the read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524491 (owner: 10Muehlenhoff) [10:44:07] (03PS1) 10Fsero: helmfile,deploy: bug: secrets.yaml should be inside private folder [puppet] - 10https://gerrit.wikimedia.org/r/524492 (https://phabricator.wikimedia.org/T227775) [10:45:14] 10Operations, 10Gerrit, 10Release-Engineering-Team-TODO, 10Patch-For-Review, 10Release-Engineering-Team (Development services): Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (10fgiunchedi) >>! In T184086#5348933, @hashar wrote: >>>! In T184086#5345246, @fgiunchedi wrote: >... 
[10:46:26] (03PS3) 10Effie Mouzeli: profile::mediawiki::jobrunner: Configure php7_only flag [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) [10:47:15] (03CR) 10Effie Mouzeli: profile::mediawiki::jobrunner: Configure php7_only flag (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [10:48:00] (03PS2) 10Fsero: helmfile,deploy: bug: secrets.yaml should be inside private folder [puppet] - 10https://gerrit.wikimedia.org/r/524492 (https://phabricator.wikimedia.org/T227775) [10:49:40] (03CR) 10Fsero: [C: 03+2] helmfile,deploy: bug: secrets.yaml should be inside private folder [puppet] - 10https://gerrit.wikimedia.org/r/524492 (https://phabricator.wikimedia.org/T227775) (owner: 10Fsero) [10:53:43] !log deploying calico from helmfile in staging T227775 [10:53:43] (03CR) 10Effie Mouzeli: "LGTM: https://puppet-compiler.wmflabs.org/compiler1001/17502/mw1300.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [10:53:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:51] T227775: recreate staging cluster namespaces using helmfile - https://phabricator.wikimedia.org/T227775 [10:53:53] (03PS2) 10Muehlenhoff: Switch account consistency script to using the read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524491 [10:53:59] !log root@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' . 
[10:54:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:16] (03PS30) 10Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - 10https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382) [10:55:44] !log root@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' . [10:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:56:08] !log root@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' . [10:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:29] (03CR) 10Alexandros Kosiaris: [C: 04-1] "minor last comment, rest LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [11:04:05] (03PS1) 10Muehlenhoff: netbox: Read LDAP server from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) [11:04:36] (03CR) 10jerkins-bot: [V: 04-1] netbox: Read LDAP server from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [11:05:29] (03CR) 10Effie Mouzeli: profile::mediawiki::jobrunner: Configure php7_only flag (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [11:07:08] (03PS4) 10Effie Mouzeli: profile::mediawiki::jobrunner: Configure php7_only flag [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) [11:07:13] (03PS2) 10Muehlenhoff: netbox: Read LDAP server from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) [11:08:12] (03CR) 10jerkins-bot: [V: 04-1] 
netbox: Read LDAP server from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [11:11:11] (03PS2) 10Effie Mouzeli: remove pdfrender records [dns] - 10https://gerrit.wikimedia.org/r/521582 (https://phabricator.wikimedia.org/T226675) (owner: 10Dzahn) [11:11:41] (03CR) 10Effie Mouzeli: [C: 03+2] remove pdfrender records [dns] - 10https://gerrit.wikimedia.org/r/521582 (https://phabricator.wikimedia.org/T226675) (owner: 10Dzahn) [11:19:40] 10Operations, 10Cloud-VPS, 10Traffic, 10cloud-services-team (Kanban): cloudcontrol: decide on FQDN for service endpoints - https://phabricator.wikimedia.org/T223902 (10aborrero) 05Open→03Resolved a:03aborrero This is written to wikitech here: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Servic... [11:27:17] (03PS3) 10Muehlenhoff: netbox: Read LDAP server from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) [11:37:01] (03PS31) 10Vgutierrez: ncredir: Use pipes instead of files for the access_log [puppet] - 10https://gerrit.wikimedia.org/r/524185 (https://phabricator.wikimedia.org/T228382) [11:37:03] (03PS1) 10Vgutierrez: fifo_log_demux: Allow to specify a service that requires fifo_log_demux [puppet] - 10https://gerrit.wikimedia.org/r/524496 (https://phabricator.wikimedia.org/T228382) [11:40:14] (03PS6) 10Jbond: puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [11:43:19] (03CR) 10Volans: [C: 03+1] "LGTM, one partially out of scope caveat inline." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [11:46:52] (03PS7) 10Jbond: puppetmaster: Add the abbilty to have canary beckends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [11:54:28] jbond42: typo: backends :-P [11:54:31] * volans hides [11:55:16] (03PS8) 10Jbond: puppetmaster: Add the ability to have canary backends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [11:55:26] cheers :) [12:09:07] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [12:14:07] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [12:20:25] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/524476 (owner: 10Alexandros Kosiaris) [12:22:53] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [12:28:45] !log gehel@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [12:28:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:11] ACKNOWLEDGEMENT - High lag on wdqs1008 is CRITICAL: 1.394e+04 ge 3600 Gehel catching up after data transfer - https://phabricator.wikimedia.org/T228122 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [12:32:15] ACKNOWLEDGEMENT - High lag on wdqs1010 is CRITICAL: 1.404e+04 ge 3600 Gehel catching up after data transfer - https://phabricator.wikimedia.org/T228122 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag 
https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [12:39:03] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [12:44:09] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [12:54:58] (03PS1) 10Muehlenhoff: Add library hint for bzip2 [puppet] - 10https://gerrit.wikimedia.org/r/524503 [12:56:21] (03PS2) 10Muehlenhoff: Add library hint for bzip2 [puppet] - 10https://gerrit.wikimedia.org/r/524503 [12:57:35] (03CR) 10Jbond: Add Fastnetmon to the netinsights role (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [12:59:21] (03CR) 10Muehlenhoff: [C: 03+2] Add library hint for bzip2 [puppet] - 10https://gerrit.wikimedia.org/r/524503 (owner: 10Muehlenhoff) [12:59:59] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/524491 (owner: 10Muehlenhoff) [13:00:01] (03CR) 10BBlack: [C: 03+1] varnish/templates/text-frontend.inc.vcl.erb: Fix doc ref to renamed variable [puppet] - 10https://gerrit.wikimedia.org/r/514394 (owner: 10Jforrester) [13:03:29] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [13:04:31] !log installing bzip2 security updates on jessie [13:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:07] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [13:10:34] (03PS3) 10Elukey: role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 
(https://phabricator.wikimedia.org/T227860) [13:11:35] (03CR) 10Hashar: "Alexandros raised that concern earlier indeed. I have to dig into it and find out the appropriate strategy :-\" [puppet] - 10https://gerrit.wikimedia.org/r/480957 (owner: 10Hashar) [13:12:16] (03PS3) 10Muehlenhoff: Switch account consistency script to using the read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524491 [13:14:09] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [13:14:37] (03CR) 10Muehlenhoff: [C: 03+2] Switch account consistency script to using the read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524491 (owner: 10Muehlenhoff) [13:18:34] (03PS4) 10Muehlenhoff: netbox: Read LDAP server from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) [13:18:36] (03PS2) 10Elukey: piwik: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [13:19:44] (03CR) 10Elukey: [C: 03+2] piwik: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [13:20:19] (03PS4) 10Elukey: role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) [13:23:35] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/17509/analytics-tool1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [13:24:21] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/17508/" [puppet] - 
10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [13:24:40] (03PS5) 10Elukey: role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) [13:29:43] (03Abandoned) 10Awight: Prune nonexistent config files [dumps] - 10https://gerrit.wikimedia.org/r/348011 (owner: 10Awight) [13:30:07] !log tarrow@ helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' . [13:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:32] (03CR) 10Ema: [C: 03+1] role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [13:33:36] (03PS6) 10Elukey: role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) [13:35:06] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::ui: add health check for Yarn and Hue [puppet] - 10https://gerrit.wikimedia.org/r/524488 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [13:47:10] (03PS1) 10Elukey: profile::tlsproxy::service: allow to modify the Nagios contact_group [puppet] - 10https://gerrit.wikimedia.org/r/524516 (https://phabricator.wikimedia.org/T227860) [13:48:42] (03PS5) 10Jbond: puppetmaster: add type checking ro puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/524274 [13:49:56] (03CR) 10Jbond: [C: 03+2] puppetmaster: add type checking ro puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/524274 (owner: 10Jbond) [13:51:52] (03CR) 10Ema: [C: 03+1] profile::tlsproxy::service: allow to modify the Nagios contact_group [puppet] - 10https://gerrit.wikimedia.org/r/524516 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [13:54:35] (03CR) 10Elukey: [C: 03+2] 
profile::tlsproxy::service: allow to modify the Nagios contact_group [puppet] - 10https://gerrit.wikimedia.org/r/524516 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [13:54:42] (03PS2) 10Elukey: profile::tlsproxy::service: allow to modify the Nagios contact_group [puppet] - 10https://gerrit.wikimedia.org/r/524516 (https://phabricator.wikimedia.org/T227860) [14:03:58] (03CR) 10Marostegui: [C: 04-2] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler1002/17513/" [puppet] - 10https://gerrit.wikimedia.org/r/524411 (https://phabricator.wikimedia.org/T227062) (owner: 10Marostegui) [14:10:01] (03PS2) 10Eevans: sessionstore: update to Kask v1.0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/524377 [14:11:32] (03PS1) 10Muehlenhoff: grafana: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) [14:12:27] (03PS9) 10Jbond: puppetmaster: Add the ability to have canary backends [puppet] - 10https://gerrit.wikimedia.org/r/524287 [14:12:44] (03CR) 10Jbond: "check experimental" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524287 (owner: 10Jbond) [14:18:22] (03PS1) 10Elukey: Update yarn.wikimedia.org's crt file [puppet] - 10https://gerrit.wikimedia.org/r/524524 (https://phabricator.wikimedia.org/T227860) [14:19:00] (03CR) 10Ema: [C: 03+1] Update yarn.wikimedia.org's crt file [puppet] - 10https://gerrit.wikimedia.org/r/524524 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [14:19:35] (03CR) 10Elukey: [C: 03+2] Update yarn.wikimedia.org's crt file [puppet] - 10https://gerrit.wikimedia.org/r/524524 (https://phabricator.wikimedia.org/T227860) (owner: 10Elukey) [14:20:51] (03PS1) 10Alexandros Kosiaris: Update tests to using puppet 4.10.2 [puppet] - 10https://gerrit.wikimedia.org/r/524525 [14:21:53] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/524476 (owner: 10Alexandros Kosiaris) [14:21:59] (03PS2) 10Alexandros Kosiaris: build_notes_url(): Don't quote if single url [puppet] - 10https://gerrit.wikimedia.org/r/524476 [14:23:42] (03CR) 10Alexandros Kosiaris: [C: 04-1] helmfile,k8s: Add a coredns deployment for DNS in-cluster service (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/523722 (https://phabricator.wikimedia.org/T226516) (owner: 10Fsero) [14:23:46] (03CR) 10Alexandros Kosiaris: [C: 03+1] helmfile,k8s: Add a coredns deployment for DNS in-cluster service [deployment-charts] - 10https://gerrit.wikimedia.org/r/523722 (https://phabricator.wikimedia.org/T226516) (owner: 10Fsero) [14:24:28] (03PS2) 10ArielGlenn: handle exception when setting up Wiki object for misc dumps [dumps] - 10https://gerrit.wikimedia.org/r/522018 (https://phabricator.wikimedia.org/T227730) [14:26:24] * Krinkle staging om mwdebug1002 [14:27:12] (03PS2) 10Muehlenhoff: grafana: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) [14:28:05] (03CR) 10ArielGlenn: [C: 03+2] handle exception when setting up Wiki object for misc dumps [dumps] - 10https://gerrit.wikimedia.org/r/522018 (https://phabricator.wikimedia.org/T227730) (owner: 10ArielGlenn) [14:28:17] !log krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34-wmf.13 and php-1.34-wmf.14 [14:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:43] !log krinkle@deploy1001: Untracked file found in php-1.34-wmf.13 [14:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:08] !log ariel@deploy1001 Started deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps [14:29:11] !log ariel@deploy1001 Finished deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps (duration: 00m 03s) [14:29:13] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:43] (03PS1) 10Ema: logstash: add TLS support via profile::tlsproxy::service [puppet] - 10https://gerrit.wikimedia.org/r/524527 (https://phabricator.wikimedia.org/T210411) [14:31:14] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/17516/" [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [14:31:28] (03CR) 10Muehlenhoff: "https://puppet-compiler.wmflabs.org/compiler1001/17516/" [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [14:34:04] PROBLEM - Maps - OSM synchronization lag - eqiad on icinga1001 is CRITICAL: 3.98e+05 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1 [14:35:20] !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.13/extensions/Collection/Collection.php: 66ce154d7d734209c76a62cf / T87899 (duration: 00m 56s) [14:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:27] T87899: Convert Collection to use extension registration - https://phabricator.wikimedia.org/T87899 [14:38:47] (03CR) 10CDanis: [C: 03+1] grafana: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [14:39:52] ACKNOWLEDGEMENT - Maps - OSM synchronization lag - eqiad on icinga1001 is CRITICAL: 3.983e+05 ge 2.592e+05 Gehel catching up after data re-import, should resolve in the next 24h https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1 [14:41:16] (03PS1) 10Jhedden: cloudstore: allow rpc.mountd traffic between hosts [puppet] - 
10https://gerrit.wikimedia.org/r/524528 (https://phabricator.wikimedia.org/T225265) [14:41:36] (03PS1) 10Ema: kibana: add certificate [puppet] - 10https://gerrit.wikimedia.org/r/524529 (https://phabricator.wikimedia.org/T210411) [14:42:22] !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.14/extensions/Collection/Collection.php: 90eed0fad / T87899 (duration: 00m 54s) [14:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:29] T87899: Convert Collection to use extension registration - https://phabricator.wikimedia.org/T87899 [14:43:17] (03PS1) 10Elukey: superset: move httpd proxy config to a profile [puppet] - 10https://gerrit.wikimedia.org/r/524531 (https://phabricator.wikimedia.org/T227860) [14:43:21] (03PS1) 10Ema: secret: dummy key for kibana [labs/private] - 10https://gerrit.wikimedia.org/r/524532 (https://phabricator.wikimedia.org/T210411) [14:43:48] (03CR) 10Bstorm: [C: 03+1] "Looks good to me. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/524528 (https://phabricator.wikimedia.org/T225265) (owner: 10Jhedden) [14:44:16] (03CR) 10Ema: [V: 03+2 C: 03+2] secret: dummy key for kibana [labs/private] - 10https://gerrit.wikimedia.org/r/524532 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [14:46:20] (03CR) 10Andrew Bogott: [C: 03+1] grafana: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [14:47:41] PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [14:48:32] (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler1002/17519/logstash1007.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524527 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [14:50:50] (03Abandoned) 10Gehel: Re-enable eqiad crons [puppet] - 10https://gerrit.wikimedia.org/r/524232 (https://phabricator.wikimedia.org/T218097) (owner: 10MSantos) [14:51:11] (03PS2) 10Jhedden: cloudstore: allow rpc.mountd traffic between hosts [puppet] - 10https://gerrit.wikimedia.org/r/524528 (https://phabricator.wikimedia.org/T225265) [14:54:10] (03PS3) 10Jhedden: cloudstore: allow rpc.mountd traffic between hosts [puppet] - 10https://gerrit.wikimedia.org/r/524528 (https://phabricator.wikimedia.org/T225265) [14:54:41] (03PS2) 10Elukey: superset: move httpd proxy config to a profile [puppet] - 10https://gerrit.wikimedia.org/r/524531 (https://phabricator.wikimedia.org/T227860) [14:57:13] (03CR) 10Jhedden: [V: 03+2 C: 03+2] cloudstore: allow rpc.mountd traffic between hosts [puppet] - 10https://gerrit.wikimedia.org/r/524528 (https://phabricator.wikimedia.org/T225265) (owner: 10Jhedden) [14:58:52] (03PS2) 10Gehel: wdqs: introduced tuned journal options to wdqs1005. [puppet] - 10https://gerrit.wikimedia.org/r/523874 (https://phabricator.wikimedia.org/T228122) [14:59:43] (03CR) 10Gehel: [C: 03+2] wdqs: introduced tuned journal options to wdqs1005. [puppet] - 10https://gerrit.wikimedia.org/r/523874 (https://phabricator.wikimedia.org/T228122) (owner: 10Gehel) [15:00:48] ema: looks like you have an unmerged change on puppetmaster [15:00:58] looks trivial enough, want me to merge? 
[15:01:41] Oh, that's in labs/private, I did not realize that was now checked as well [15:01:52] ema: I'm merging it [15:02:03] gehel: I have a change in queue too, if you hit that it's OK to merge [15:02:16] jeh: ack, merging as well [15:02:22] thank you [15:03:53] !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer [15:03:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:23] gehel: thanks! [15:08:49] RECOVERY - puppet last run on ms-be1025 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [15:11:45] (03CR) 10Filippo Giunchedi: [C: 03+1] grafana: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [15:19:38] PROBLEM - puppet last run on cloudvirt1009 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [15:20:59] (03CR) 10Fsero: [V: 03+2 C: 03+2] helmfile,k8s: Add a coredns deployment for DNS in-cluster service [deployment-charts] - 10https://gerrit.wikimedia.org/r/523722 (https://phabricator.wikimedia.org/T226516) (owner: 10Fsero) [15:22:16] (03PS1) 10Muehlenhoff: icinga: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) [15:22:47] !log deploy coredns in staging T226516 [15:22:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:55] T226516: deploy CoreDNS as a in-cluster DNS service - https://phabricator.wikimedia.org/T226516 [15:23:10] (03CR) 10jerkins-bot: [V: 04-1] icinga: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [15:24:19] (03PS2) 10Muehlenhoff: icinga: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) [15:26:00] !log root@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . [15:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:45] !log root@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . 
[15:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:15] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/17526/" [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [15:31:24] (03PS2) 10Cwhite: hiera: canary enable varnishkafka_exporter on cp1088 [puppet] - 10https://gerrit.wikimedia.org/r/524288 (https://phabricator.wikimedia.org/T196066) [15:32:01] (03CR) 10Cwhite: [C: 03+2] hiera: canary enable varnishkafka_exporter on cp1088 [puppet] - 10https://gerrit.wikimedia.org/r/524288 (https://phabricator.wikimedia.org/T196066) (owner: 10Cwhite) [15:35:54] (03CR) 10CDanis: [C: 03+1] icinga: Read LDAP servers from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [15:40:31] !log root@ helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . 
[15:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:48] (03CR) 10Alexandros Kosiaris: [C: 03+1] sessionstore: update to Kask v1.0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/524377 (owner: 10Eevans) [15:46:48] RECOVERY - puppet last run on cloudvirt1009 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [15:50:06] (03PS1) 10Cwhite: prometheus: refresh service on configuration change [puppet] - 10https://gerrit.wikimedia.org/r/524545 (https://phabricator.wikimedia.org/T196066) [15:50:43] (03CR) 10jerkins-bot: [V: 04-1] prometheus: refresh service on configuration change [puppet] - 10https://gerrit.wikimedia.org/r/524545 (https://phabricator.wikimedia.org/T196066) (owner: 10Cwhite) [15:51:39] (03PS2) 10Cwhite: prometheus: refresh service on configuration change [puppet] - 10https://gerrit.wikimedia.org/r/524545 (https://phabricator.wikimedia.org/T196066) [15:54:47] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [15:54:48] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [15:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:11] * Krinkle testing on mwdebug1002 [15:55:58] !log rebooting mw2164 for a test [15:56:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:27] (03PS1) 10CDanis: noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) [16:17:47] !log gehel@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [16:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:46] Krinkle: I'm completely faking it when it comes to PHP, so please don't be afraid to be nitpicky :) [16:18:50] 
PROBLEM - High lag on wdqs1010 is CRITICAL: 4495 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:20:50] ACKNOWLEDGEMENT - High lag on wdqs1010 is CRITICAL: 4342 ge 3600 Gehel catching up after data transfer - https://phabricator.wikimedia.org/T228122 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:20:57] cdanis: context? [16:21:06] Krinkle: https://gerrit.wikimedia.org/r/524551 [16:21:58] (03PS3) 10Elukey: superset: move httpd proxy config to a profile [puppet] - 10https://gerrit.wikimedia.org/r/524531 (https://phabricator.wikimedia.org/T227860) [16:25:56] (03CR) 10Krinkle: noc: allow db.php?dc=codfw instead of just eqiad (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [16:26:01] cdanis: cool [16:26:54] (03CR) 10Krinkle: noc: allow db.php?dc=codfw instead of just eqiad (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [16:27:53] (03CR) 10CDanis: noc: allow db.php?dc=codfw instead of just eqiad (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [16:33:41] (03PS2) 10CDanis: noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) [16:33:43] (03CR) 10CDanis: noc: allow db.php?dc=codfw instead of just eqiad (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [16:50:09] (03CR) 10Krinkle: noc: allow db.php?dc=codfw instead of just eqiad (031 comment) [mediawiki-config] 
- 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [16:50:37] (03PS1) 10Jforrester: extension-list: Load Collection via extension.json directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524556 [16:50:39] (03PS1) 10Jforrester: Load Collection from extension.json directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524557 (https://phabricator.wikimedia.org/T87899) [16:51:06] (03CR) 10Ayounsi: "> Patch Set 4:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524037 (owner: 10Ayounsi) [16:51:18] (03CR) 10CDanis: noc: allow db.php?dc=codfw instead of just eqiad (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [16:51:36] (03PS5) 10Ayounsi: Add an anycast endpoint to syslog centralservers [puppet] - 10https://gerrit.wikimedia.org/r/524037 [16:51:38] (03PS3) 10CDanis: noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) [16:53:38] RECOVERY - High lag on wdqs1010 is OK: (C)3600 ge (W)1200 ge 1198 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:54:44] (03CR) 10Ayounsi: "Adding John as it's 100% Puppet." 
[puppet] - 10https://gerrit.wikimedia.org/r/524076 (owner: 10Ayounsi) [16:57:31] (03CR) 10Ayounsi: "We discussed it over IRC but the check is from https://github.com/unixsurfer/anycast_healthchecker/blob/master/contrib/nagios/check_anycas" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/520643 (https://phabricator.wikimedia.org/T186550) (owner: 10Ayounsi) [16:59:33] (03PS4) 10Krinkle: noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:00:26] (03CR) 10Krinkle: "simplified slightly to keep it more like now. It seems the extra var isn't currently needed for this, so might keep that for a separate cha" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:01:42] (03CR) 10Krinkle: "Confirmed http://localhost:9412/db.php and http://localhost:9412/db.php?dc=codfw work as expected." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:01:45] (03CR) 10Krinkle: [C: 03+1] noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:01:55] cdanis: wanna give it a try locally as well? 
[17:02:43] (03PS1) 10Cwhite: prometheus: add varnishkafka job to prometheus scrapes [puppet] - 10https://gerrit.wikimedia.org/r/524561 (https://phabricator.wikimedia.org/T196066) [17:02:46] (03PS1) 10RobH: adding new skus [software] - 10https://gerrit.wikimedia.org/r/524562 [17:03:33] (03Abandoned) 10RobH: adding in R440 single CPU SKU [software] - 10https://gerrit.wikimedia.org/r/484771 (owner: 10RobH) [17:04:25] (03CR) 10RobH: [C: 03+2] adding new skus [software] - 10https://gerrit.wikimedia.org/r/524562 (owner: 10RobH) [17:05:12] (03PS5) 10Ayounsi: Add Fastnetmon to the netinsights role [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) [17:05:39] (03CR) 10Ayounsi: "Thanks! All addressed." (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [17:14:52] Krinkle: thanks, that's a lot cleaner [17:21:13] (03PS23) 10Dzahn: nrpe/icinga: make notes_url a required parameter of nrpe::monitor_service [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) [17:24:17] is it really going to happen.. no jerkins -1 on PS23 ?.. 
[17:26:11] volans: or you managed to kill jerkins itself, one of the two :) [17:26:15] err oops [17:26:20] mutante: or you managed to kill jerkins itself, one of the two :) [17:26:31] lol [17:28:40] lol, indeed [17:28:55] no, it said +2 now, woot [17:30:27] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [17:36:34] (03CR) 10Ayounsi: [C: 03+2] Add Fastnetmon to the netinsights role [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [17:36:39] (03CR) 10Ayounsi: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/17527/netflow1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [17:36:51] (03PS6) 10Ayounsi: Add Fastnetmon to the netinsights role [puppet] - 10https://gerrit.wikimedia.org/r/524253 (https://phabricator.wikimedia.org/T226810) [17:37:08] mutante: \o/ [17:39:36] cdanis: partially because https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524476/ fixed tests [17:39:57] before it was not possible to get the quoting right [17:40:31] cdanis: got a follow-up locally based on your commit that cleans it up a bit more. 
seems simple enough to roll out now if you're ready with this one [17:41:02] Krinkle: sure, your changes that I've seen so far LGTM [17:41:29] (03PS1) 10Krinkle: noc: Reduce code inclusion from db.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524569 [17:41:32] (03CR) 10Krinkle: [C: 03+2] noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:42:37] (03CR) 10CDanis: [C: 03+1] noc: Reduce code inclusion from db.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524569 (owner: 10Krinkle) [17:42:58] (03CR) 10Krinkle: noc: Reduce code inclusion from db.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524569 (owner: 10Krinkle) [17:43:02] (03CR) 10Krinkle: [C: 03+2] noc: Reduce code inclusion from db.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524569 (owner: 10Krinkle) [17:43:10] (03Merged) 10jenkins-bot: noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:44:47] !log change netflow target port to 2055 in eqiad [17:44:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:49:53] (03Merged) 10jenkins-bot: noc: Reduce code inclusion from db.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524569 (owner: 10Krinkle) [17:53:09] !log cdanis@deploy1001 Synchronized docroot/noc/db.php: noc: db.php: support ?dc=codfw, and cleanups (duration: 00m 56s) [17:53:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:17] (03CR) 10jenkins-bot: noc: allow db.php?dc=codfw instead of just eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524551 (https://phabricator.wikimedia.org/T197126) (owner: 10CDanis) [17:54:20] (03CR) 10jenkins-bot: noc: Reduce code inclusion from db.php [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/524569 (owner: 10Krinkle) [17:58:16] cdanis: nice :) [17:58:17] https://noc.wikimedia.org/db.php?dc=codfw&format=json [17:58:28] Krinkle: thanks again for the help!! [17:58:40] yw [18:02:13] (03PS3) 10Dzahn: static-rt: LDAP config, use ro, Hiera and new password classes [puppet] - 10https://gerrit.wikimedia.org/r/523992 [18:03:59] (03PS1) 10DannyS712: Add `flood` group to ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524574 (https://phabricator.wikimedia.org/T228521) [18:04:58] (03PS4) 10Dzahn: static-rt: LDAP config, use ro, Hiera and new password classes [puppet] - 10https://gerrit.wikimedia.org/r/523992 [18:06:37] (03PS5) 10Dzahn: static-rt: LDAP config, use ro, Hiera and new password classes [puppet] - 10https://gerrit.wikimedia.org/r/523992 (https://phabricator.wikimedia.org/T227650) [18:08:03] (03PS2) 10DannyS712: Add `flood` group to ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/524574 (https://phabricator.wikimedia.org/T228521) [18:11:04] (03CR) 10Dzahn: piwik: Read LDAP servers from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [18:31:32] (03PS3) 10Dzahn: librenms: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/523994 (https://phabricator.wikimedia.org/T227650) [18:32:40] (03CR) 10Dzahn: piwik: Read LDAP servers from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524479 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [18:35:19] (03PS4) 10Dzahn: microsites/transparency: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/523991 (https://phabricator.wikimedia.org/T227650) [18:38:14] (03PS3) 10Dzahn: xhgui::app: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/523995 
(https://phabricator.wikimedia.org/T227650) [18:38:47] (03CR) 10jerkins-bot: [V: 04-1] xhgui::app: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/523995 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [18:42:28] PROBLEM - puppet last run on ganeti1002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [18:45:49] (03CR) 10Krinkle: Drop zero.wikimedia.org (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [18:45:54] (03PS3) 10Dzahn: tendril: use ldap-ro, use Hiera, refactor to profile [puppet] - 10https://gerrit.wikimedia.org/r/523993 (https://phabricator.wikimedia.org/T227650) [18:46:59] (03PS4) 10Dzahn: xhgui::app: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/523995 (https://phabricator.wikimedia.org/T227650) [18:48:20] (03PS4) 10Dzahn: tendril: use ldap-ro, use Hiera, refactor to profile [puppet] - 10https://gerrit.wikimedia.org/r/523993 (https://phabricator.wikimedia.org/T227650) [18:50:32] (03CR) 10jerkins-bot: [V: 04-1] xhgui::app: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/523995 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [18:50:42] (03PS1) 10Dzahn: netbox: stop using ::passwords::ldap:wmf_cluster [puppet] - 10https://gerrit.wikimedia.org/r/524583 (https://phabricator.wikimedia.org/T227650) [18:52:02] (03PS4) 10Jforrester: Drop zero.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) [18:52:04] (03PS1) 10Dzahn: icinga: stop using ::passwords::ldap:wmf_cluster [puppet] - 10https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650) [18:52:12] (03CR) 10Jforrester: Drop zero.wikimedia.org (031 comment) 
[dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [18:57:50] (03PS1) 10Dzahn: grafana: stop using ldap-labs, use ldap-ro [puppet] - 10https://gerrit.wikimedia.org/r/524585 (https://phabricator.wikimedia.org/T227650) [19:00:09] (03PS3) 10Eevans: sessionstore: update to Kask v1.0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/524377 [19:00:41] (03CR) 10Eevans: [V: 03+2 C: 03+2] sessionstore: update to Kask v1.0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/524377 (owner: 10Eevans) [19:01:16] (03PS1) 10Dzahn: graphite: use ldap-ro, stop using ldap-labs, use Hiera [puppet] - 10https://gerrit.wikimedia.org/r/524586 (https://phabricator.wikimedia.org/T227650) [19:02:14] !log eevans@ helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' . [19:02:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:27] (03CR) 10Dzahn: [C: 04-1] "would need refactor to profile but is deprecated anyways..." [puppet] - 10https://gerrit.wikimedia.org/r/523995 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [19:07:22] !log eevans@ helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' . [19:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:14] (03CR) 10Dzahn: "maybe you could add to this patch to switch the flag for one single server, like a canary or test server. 
then we could compile on that an" [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [19:10:46] RECOVERY - puppet last run on ganeti1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [19:12:04] (03PS3) 10Dzahn: Remove support for jessie from Phabricator classes [puppet] - 10https://gerrit.wikimedia.org/r/522420 (owner: 10Muehlenhoff) [19:12:13] (03CR) 10Effie Mouzeli: "> maybe you could add to this patch to switch the flag for one single" [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [19:12:56] (03CR) 10Dzahn: "amended to fix ERROR two-space soft tabs not used (2sp_soft_tabs) lint issue" [puppet] - 10https://gerrit.wikimedia.org/r/522420 (owner: 10Muehlenhoff) [19:14:29] (03CR) 10Dzahn: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [19:19:17] (03CR) 10Dzahn: "is this the expected behaviour for random host mw1307? https://puppet-compiler.wmflabs.org/compiler1001/17528/mw1307.eqiad.wmnet/ or did " [puppet] - 10https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: 10Effie Mouzeli) [19:31:33] (03PS2) 10Gehel: wdqs: introduced tuned journal options to wdqs1006. [puppet] - 10https://gerrit.wikimedia.org/r/523875 (https://phabricator.wikimedia.org/T228122) [19:32:20] (03CR) 10Gehel: [C: 03+2] wdqs: introduced tuned journal options to wdqs1006. 
[puppet] - 10https://gerrit.wikimedia.org/r/523875 (https://phabricator.wikimedia.org/T228122) (owner: 10Gehel) [19:34:02] !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer [19:34:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:34] !log gehel@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [19:35:37] !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer [19:35:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:32] (03CR) 10CDanis: "mostly LGTM, thanks!" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) (owner: 10Dzahn) [19:48:24] (03CR) 10CDanis: nrpe/icinga: make notes_url a required parameter of nrpe::monitor_service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) (owner: 10Dzahn) [19:51:33] (03CR) 10Cwhite: [C: 03+1] icinga: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:07:44] (03CR) 10Muehlenhoff: "Already covered at" [puppet] - 10https://gerrit.wikimedia.org/r/524585 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [20:11:51] 10Operations, 10Wikimedia-Mailing-lists: New Mailing lists for AzWiki sysops - https://phabricator.wikimedia.org/T228542 (10Eldarado) @MarcoAurelio , @Aklapper Thank you so much for your help. 
I'll update my request tomorrow :) [20:18:46] (03Abandoned) 10Dzahn: grafana: stop using ldap-labs, use ldap-ro [puppet] - 10https://gerrit.wikimedia.org/r/524585 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [20:20:52] (03PS2) 10Krinkle: Add entries to wgCSPFalsePositiveUrls [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504474 (https://phabricator.wikimedia.org/T207900) [20:21:14] (03CR) 10Dzahn: nrpe/icinga: make notes_url a required parameter of nrpe::monitor_service (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) (owner: 10Dzahn) [20:21:27] (03PS3) 10Krinkle: Add entries to wgCSPFalsePositiveUrls [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504474 (https://phabricator.wikimedia.org/T207900) [20:23:42] (03CR) 10Dzahn: icinga: Read LDAP servers from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:25:00] (03CR) 10Dzahn: [C: 03+1] "compiler output looks good, just the nitpick about using Stdlib::Fqdn for the server name" [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:25:06] PROBLEM - puppet last run on cloudvirt1016 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [20:26:17] (03PS4) 10Dzahn: Remove support for jessie from Phabricator classes [puppet] - 10https://gerrit.wikimedia.org/r/522420 (owner: 10Muehlenhoff) [20:27:55] (03CR) 10Dzahn: [C: 03+1] "also see https://gerrit.wikimedia.org/r/c/operations/puppet/+/524584 and merge that with it or merge it into your change" [puppet] - 10https://gerrit.wikimedia.org/r/524540 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:33:37] (03CR) 10Paladox: [C: 03+1] Remove support for jessie from Phabricator classes [puppet] - 10https://gerrit.wikimedia.org/r/522420 (owner: 10Muehlenhoff) [20:36:07] (03PS5) 10Dzahn: tendril: use ldap-ro, use Hiera, refactor to profile [puppet] - 10https://gerrit.wikimedia.org/r/523993 (https://phabricator.wikimedia.org/T227650) [20:37:01] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/17529/" [puppet] - 10https://gerrit.wikimedia.org/r/523993 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [20:37:47] (03CR) 10Dzahn: "see compiler output. 
only the LDAP URL changes to 'ro'" [puppet] - 10https://gerrit.wikimedia.org/r/523993 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [20:40:59] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/17530/" [puppet] - 10https://gerrit.wikimedia.org/r/522420 (owner: 10Muehlenhoff) [20:45:35] (03CR) 10Dzahn: [C: 03+1] grafana: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 10https://gerrit.wikimedia.org/r/524519 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:45:47] 10Operations, 10SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10cchen) [20:47:01] (03CR) 10CDanis: [C: 03+1] nrpe/icinga: make notes_url a required parameter of nrpe::monitor_service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/496830 (https://phabricator.wikimedia.org/T197873) (owner: 10Dzahn) [20:47:22] (03CR) 10CDanis: [C: 03+1] icinga: stop using ::passwords::ldap:wmf_cluster [puppet] - 10https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650) (owner: 10Dzahn) [20:47:40] RECOVERY - puppet last run on cloudvirt1016 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link [20:48:52] (03CR) 10Dzahn: netbox: Read LDAP server from Hiera and switch to read-only replicas (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:49:56] (03CR) 10Dzahn: netbox: Read LDAP server from Hiera and switch to read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/524494 (https://phabricator.wikimedia.org/T227650) (owner: 10Muehlenhoff) [20:51:39] (03CR) 10Dzahn: [C: 03+1] puppetboard: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - 
https://gerrit.wikimedia.org/r/524487 (https://phabricator.wikimedia.org/T227650) (owner: Muehlenhoff)
[20:53:08] (CR) Dzahn: [C: +1] debmonitor: Read LDAP servers from Hiera and switch to read-only replicas [puppet] - https://gerrit.wikimedia.org/r/524489 (https://phabricator.wikimedia.org/T227650) (owner: Muehlenhoff)
[20:54:45] (CR) Dzahn: [C: +1] Decommission old jessie-based ORES pool counters [puppet] - https://gerrit.wikimedia.org/r/524162 (https://phabricator.wikimedia.org/T227640) (owner: Muehlenhoff)
[20:55:26] (CR) Dzahn: [C: +1] Allow gpu-testers to run radeontop [puppet] - https://gerrit.wikimedia.org/r/518210 (https://phabricator.wikimedia.org/T220811) (owner: Muehlenhoff)
[21:05:12] (PS2) Dzahn: icinga: stop using ::passwords::ldap:wmf_cluster [puppet] - https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650)
[21:14:18] (PS1) CDanis: cdanis dotfiles: some truly delightful patches from upstream [puppet] - https://gerrit.wikimedia.org/r/524595
[21:16:15] (CR) CDanis: [C: +2] "I can't believe I've done this" [puppet] - https://gerrit.wikimedia.org/r/524595 (owner: CDanis)
[21:16:44] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[21:18:32] ^ fixed in -releng
[21:21:19] Operations, SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (cchen) Thanks for the info! Just updated my SSH in description. 
[21:34:37] !log gehel@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
[21:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:16] (PS1) Volans: docstrings: fix newly reported pep257 violations [software/cumin] - https://gerrit.wikimedia.org/r/524598
[21:36:18] (PS1) Volans: tests: temporarily limit max version of prospector [software/cumin] - https://gerrit.wikimedia.org/r/524599
[21:36:20] (PS1) Volans: setup.py: limit max version of tqdm [software/cumin] - https://gerrit.wikimedia.org/r/524600
[21:36:22] (PS1) Volans: dependency: replace colorama with custom module [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038)
[21:36:56] the first will fail CI
[21:37:22] (CR) Gehel: [C: +1] "LGTM, trivial enough" [software/cumin] - https://gerrit.wikimedia.org/r/524599 (owner: Volans)
[21:40:27] (CR) Gehel: "LGTM, trivial enough" [software/cumin] - https://gerrit.wikimedia.org/r/524598 (owner: Volans)
[21:41:50] (CR) jerkins-bot: [V: -1] docstrings: fix newly reported pep257 violations [software/cumin] - https://gerrit.wikimedia.org/r/524598 (owner: Volans)
[21:42:40] (CR) Volans: "Integration tests results FYI:" [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[21:44:11] (CR) Volans: "The CI failure is expected because of:" [software/cumin] - https://gerrit.wikimedia.org/r/524598 (owner: Volans)
[21:45:11] (CR) Gehel: [C: -1] "minor comment inline" (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[21:45:45] (CR) Gehel: [C: +1] "LGTM" [software/cumin] - https://gerrit.wikimedia.org/r/524600 (owner: Volans)
[21:45:52] (PS4) Krinkle: Add entries to wgCSPFalsePositiveUrls [mediawiki-config] - https://gerrit.wikimedia.org/r/504474 
(https://phabricator.wikimedia.org/T207900)
[21:49:26] (PS1) Bartosz Dziewoński: Set wgCategoryCollation to 'uca-sl-u-kn' on Slovene projects (sl) [mediawiki-config] - https://gerrit.wikimedia.org/r/524605 (https://phabricator.wikimedia.org/T208984)
[21:49:42] (PS2) Volans: dependency: replace colorama with custom module [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038)
[21:49:44] (CR) Volans: dependency: replace colorama with custom module (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[21:51:19] (CR) Gehel: "minor nitpick, otherwise LGTM" (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[21:53:02] (PS3) Volans: dependency: replace colorama with custom module [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038)
[21:53:15] (CR) Volans: dependency: replace colorama with custom module (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[21:54:32] (CR) Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/17531/icinga1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650) (owner: Dzahn)
[21:54:36] (CR) Volans: [V: +2 C: +2] "bypassing CI as stated, the next CR fixes it." 
[software/cumin] - https://gerrit.wikimedia.org/r/524598 (owner: Volans)
[21:54:44] (CR) Dzahn: [C: +2] icinga: stop using ::passwords::ldap:wmf_cluster [puppet] - https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650) (owner: Dzahn)
[21:54:53] (PS3) Dzahn: icinga: stop using ::passwords::ldap:wmf_cluster [puppet] - https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650)
[21:55:02] (CR) Volans: [C: +2] tests: temporarily limit max version of prospector [software/cumin] - https://gerrit.wikimedia.org/r/524599 (owner: Volans)
[21:55:20] Operations, SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003, and notebook1004] and groups for Mayakpwiki - https://phabricator.wikimedia.org/T227633 (kzimmerman) @fsero Maya's contract end date, if still needed, is available here: https://docs.google.com/do...
[21:55:56] (CR) jenkins-bot: docstrings: fix newly reported pep257 violations [software/cumin] - https://gerrit.wikimedia.org/r/524598 (owner: Volans)
[21:56:34] (PS5) Krinkle: Add entries to wgCSPFalsePositiveUrls [mediawiki-config] - https://gerrit.wikimedia.org/r/504474 (https://phabricator.wikimedia.org/T207900)
[22:00:42] RECOVERY - Memory correctable errors -EDAC- on mw1239 is OK: (C)4 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops
[22:00:56] (Merged) jenkins-bot: tests: temporarily limit max version of prospector [software/cumin] - https://gerrit.wikimedia.org/r/524599 (owner: Volans)
[22:01:46] (CR) Krinkle: "Verified on mwdebug1002:" [mediawiki-config] - https://gerrit.wikimedia.org/r/504474 (https://phabricator.wikimedia.org/T207900) (owner: Krinkle)
[22:01:49] (CR) jenkins-bot: tests: temporarily limit max version of prospector 
[software/cumin] - https://gerrit.wikimedia.org/r/524599 (owner: Volans)
[22:03:17] (CR) Volans: [C: +2] setup.py: limit max version of tqdm [software/cumin] - https://gerrit.wikimedia.org/r/524600 (owner: Volans)
[22:04:40] (CR) Gehel: [C: +1] "LGTM" [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[22:06:25] ACKNOWLEDGEMENT - High lag on wdqs1010 is CRITICAL: 7452 ge 3600 Gehel catching up after data transfer https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[22:09:06] (Merged) jenkins-bot: setup.py: limit max version of tqdm [software/cumin] - https://gerrit.wikimedia.org/r/524600 (owner: Volans)
[22:09:59] (CR) jenkins-bot: setup.py: limit max version of tqdm [software/cumin] - https://gerrit.wikimedia.org/r/524600 (owner: Volans)
[22:12:08] (CR) Dzahn: "noop" [puppet] - https://gerrit.wikimedia.org/r/524584 (https://phabricator.wikimedia.org/T227650) (owner: Dzahn)
[22:12:10] (CR) Krinkle: "Actually – This is what I verified via XWD/mwdebug1002:" [mediawiki-config] - https://gerrit.wikimedia.org/r/504474 (https://phabricator.wikimedia.org/T207900) (owner: Krinkle)
[22:12:55] (CR) Volans: [C: +2] dependency: replace colorama with custom module [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[22:12:57] (PS2) Dzahn: netbox: stop using ::passwords::ldap:wmf_cluster [puppet] - https://gerrit.wikimedia.org/r/524583 (https://phabricator.wikimedia.org/T227650)
[22:18:55] (Merged) jenkins-bot: dependency: replace colorama with custom module [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[22:19:42] (PS6) Paladox: phabricator: enable php-fpm in Hiera on both 
hosts [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568) (owner: Dzahn)
[22:19:52] (CR) jenkins-bot: dependency: replace colorama with custom module [software/cumin] - https://gerrit.wikimedia.org/r/524601 (https://phabricator.wikimedia.org/T217038) (owner: Volans)
[22:20:02] (PS7) Paladox: phabricator: enable php-fpm in Hiera on both hosts [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568) (owner: Dzahn)
[22:23:44] (PS8) Dzahn: phabricator: enable php-fpm in Hiera on both hosts [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568)
[22:27:07] (PS1) Jhedden: toolschecker: check for existing webservice [puppet] - https://gerrit.wikimedia.org/r/524610 (https://phabricator.wikimedia.org/T221301)
[22:27:24] paladox: https://puppet-compiler.wmflabs.org/compiler1001/17532/phab2001.codfw.wmnet/
[22:27:45] :)
[22:28:07] 👍
[22:28:32] (CR) Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/17532/" [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568) (owner: Dzahn)
[22:29:02] paladox: that even works ;)
[22:29:10] :)
[22:29:26] (CR) Paladox: [C: +1] phabricator: enable php-fpm in Hiera on both hosts [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568) (owner: Dzahn)
[22:31:29] (CR) Dzahn: [C: +2] "only affects 2001" [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568) (owner: Dzahn)
[22:31:52] (PS1) Andrew Bogott: bootstrapvz: try to pre-generate a few more config files [puppet] - https://gerrit.wikimedia.org/r/524611
[22:32:43] (PS2) Andrew Bogott: bootstrapvz: try to pre-generate a few more config files [puppet] - https://gerrit.wikimedia.org/r/524611
[22:33:34] (CR) Andrew Bogott: [C: +2] bootstrapvz: try to pre-generate a few more config files [puppet] 
- https://gerrit.wikimedia.org/r/524611 (owner: Andrew Bogott)
[22:36:01] !log phab2001 - switching apache to php-fpm and worker instead of mpm-prefork (to match phab1001) (T190568 T137928 T190572)
[22:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:36:11] T137928: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928
[22:36:11] T190568: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568
[22:36:12] T190572: Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572
[22:44:46] (CR) Dzahn: "noop on phab1003, switched it over on phab2001. apache service got refreshed by puppet .. no issues :)" [puppet] - https://gerrit.wikimedia.org/r/510597 (https://phabricator.wikimedia.org/T190568) (owner: Dzahn)
[22:46:04] Operations, Phabricator, serviceops, Patch-For-Review, and 3 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (Dzahn) now also phab2001 has been switched to php-fpm and worker . it matches... 
[23:02:16] Operations, ops-codfw, ops-eqiad: Document PDU models - https://phabricator.wikimedia.org/T227632 (wiki_willy) a:RobH
[23:03:38] (PS1) Andrew Bogott: bootstrap: copy modprobe.d from build VM into base image [puppet] - https://gerrit.wikimedia.org/r/524618
[23:03:40] (PS1) Andrew Bogott: wmcs-cold-migrate: show instance fqdn on cleanup prompt [puppet] - https://gerrit.wikimedia.org/r/524619
[23:03:42] (PS1) Andrew Bogott: wmcs-cold-migrate: provide the --cleanup switch [puppet] - https://gerrit.wikimedia.org/r/524620
[23:04:22] (CR) jerkins-bot: [V: -1] wmcs-cold-migrate: show instance fqdn on cleanup prompt [puppet] - https://gerrit.wikimedia.org/r/524619 (owner: Andrew Bogott)
[23:04:40] (CR) jerkins-bot: [V: -1] wmcs-cold-migrate: provide the --cleanup switch [puppet] - https://gerrit.wikimedia.org/r/524620 (owner: Andrew Bogott)
[23:05:41] (CR) Andrew Bogott: [C: +2] bootstrap: copy modprobe.d from build VM into base image [puppet] - https://gerrit.wikimedia.org/r/524618 (owner: Andrew Bogott)
[23:07:49] (PS2) Andrew Bogott: wmcs-cold-migrate: show instance fqdn on cleanup prompt [puppet] - https://gerrit.wikimedia.org/r/524619
[23:07:51] (PS2) Andrew Bogott: wmcs-cold-migrate: provide the --cleanup switch [puppet] - https://gerrit.wikimedia.org/r/524620
[23:08:30] (CR) Andrew Bogott: [C: +2] wmcs-cold-migrate: show instance fqdn on cleanup prompt [puppet] - https://gerrit.wikimedia.org/r/524619 (owner: Andrew Bogott)
[23:08:55] (CR) Andrew Bogott: [C: +2] wmcs-cold-migrate: provide the --cleanup switch [puppet] - https://gerrit.wikimedia.org/r/524620 (owner: Andrew Bogott)
[23:09:09] (PS1) Dzahn: planet: re-add support for https for traffic server [puppet] - https://gerrit.wikimedia.org/r/524621 (https://phabricator.wikimedia.org/T210411)
[23:10:20] (CR) Paladox: [C: -1] planet: re-add support for https for traffic server (1 comment) [puppet] - 
https://gerrit.wikimedia.org/r/524621 (https://phabricator.wikimedia.org/T210411) (owner: Dzahn)
[23:20:26] (PS1) Cwhite: hiera: deploy varnishkafka exporter to eqsin [puppet] - https://gerrit.wikimedia.org/r/524622 (https://phabricator.wikimedia.org/T196066)
[23:24:19] (PS2) Cwhite: hiera: deploy varnishkafka exporter to eqsin [puppet] - https://gerrit.wikimedia.org/r/524622 (https://phabricator.wikimedia.org/T196066)
[23:26:19] (CR) Cwhite: "Appears to do the right thing: https://puppet-compiler.wmflabs.org/compiler1002/17534/" [puppet] - https://gerrit.wikimedia.org/r/524622 (https://phabricator.wikimedia.org/T196066) (owner: Cwhite)
[23:30:14] Is there any way for me to quickly check in which commit something was removed in gerrit? thanks!
[23:31:15] Tks4Fish: do you mean a file was deleted from the repository, or something else?
[23:31:32] something else
[23:32:55] InitialiseSettings.php once enabled ptwiki's 'crats to add the usergroup "account-creator", per T65750, but that was removed somewhere, and the first ptwiki relevant commit I was able to find no longer has it (T212735)
[23:32:56] T212735: ptwikipedia: Allow bureaucrats to grant and remove rollbacker usergroup - https://phabricator.wikimedia.org/T212735
[23:32:56] T65750: Enable ability to add and remove account creator flag on Portuguese Wikipedia locally - https://phabricator.wikimedia.org/T65750
[23:46:23] (PS1) EBernhardson: Add swift analytics_mjolnir dummy account key [labs/private] - https://gerrit.wikimedia.org/r/524624 (https://phabricator.wikimedia.org/T227364)
[23:48:18] (PS1) EBernhardson: Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364)
[23:48:46] (CR) jerkins-bot: [V: -1] Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364) (owner: EBernhardson)
[23:50:29] (PS2) EBernhardson: 
Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364)
[23:50:58] (CR) jerkins-bot: [V: -1] Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364) (owner: EBernhardson)
[23:53:54] (PS3) EBernhardson: Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364)
[23:54:44] (CR) jerkins-bot: [V: -1] Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364) (owner: EBernhardson)
[23:57:16] (PS4) EBernhardson: Add swift read credentials for mjolnir [puppet] - https://gerrit.wikimedia.org/r/524625 (https://phabricator.wikimedia.org/T227364)
[23:58:18] Operations, Thumbor, serviceops, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (Aklapper)
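(Editor's note on the 23:30:14 question about finding the commit in which something was removed: no answer is recorded in this log. One common approach, offered only as a hedged sketch, is Git's pickaxe search, `git log -S<string>`, run in a local clone of the repository; it lists commits that changed the number of occurrences of a string, newest first, so a removal would normally be among the first hits. The helper name `pickaxe_command`, the search string, and the path `wmf-config/InitialiseSettings.php` are assumptions for illustration, not something stated in the channel.)

```python
import shlex


def pickaxe_command(needle, path=None, repo="."):
    """Build a `git log` command listing commits that added or removed `needle`.

    -S<string> is Git's "pickaxe" option: it selects commits whose diff
    changes the number of occurrences of the string.
    """
    cmd = ["git", "-C", repo, "log", "--oneline", "-S" + needle]
    if path is not None:
        # Everything after "--" is treated as a path limiter.
        cmd += ["--", path]
    return cmd


if __name__ == "__main__":
    # Hypothetical search string and path, for illustration only.
    cmd = pickaxe_command("account-creator", "wmf-config/InitialiseSettings.php")
    # Print a copy-pasteable shell command; run it inside a local clone.
    print(shlex.join(cmd))
```

(The helper only builds the command and does not run Git itself; appending `-p` to the printed command would also show the matching diffs.)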